Article

A ConvLSTM Conjunction Model for Groundwater Level Forecasting in a Karst Aquifer Considering Connectivity Characteristics

1
School of Geography, Nanjing Normal University, Nanjing 210023, China
2
Key Laboratory of Virtual Geographic Environment (Nanjing Normal University), Ministry of Education, Nanjing 210023, China
3
Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
4
Jinan Rail Transit Group Co., Ltd., Jinan 250101, China
*
Author to whom correspondence should be addressed.
Water 2021, 13(19), 2759; https://doi.org/10.3390/w13192759
Submission received: 27 August 2021 / Revised: 23 September 2021 / Accepted: 1 October 2021 / Published: 5 October 2021
(This article belongs to the Special Issue Advances in Hydroinformatics for Water Data Management and Analysis)

Abstract

Groundwater is an important water resource, and groundwater level (GWL) forecasting is a useful tool for supporting the sustainable management of water resources. Existing studies have shown that GWLs can be accurately predicted by combining an artificial neural network model with meteorological and hydrological factors. However, GWL data are typically geographic spatiotemporal series data, and current studies have considered only the spatial distance factor when predicting GWLs. In karst aquifers, the GWL is affected by the developmental degree of the karst, topographic factors, structural features, and other factors; considering only the spatial distance is not enough, and the real spatial connectivity characteristics need to be considered. Thus, in this paper, we proposed a new method for forecasting GWLs in karst aquifers while considering connectivity characteristics using a neural network prediction model. The connectivity of a karst aquifer was analyzed by a multidimensional feature clustering method based on the distance index and hydrogeological characteristics recorded at observation wells, and a convolutional long short-term memory (ConvLSTM) conjunction model was constructed. The proposed approach was validated through GWL simulations and predictions in karst aquifers in Jinan, China, and four experiments were conducted for comparison. The experimental results show that the proposed method provided the most consistent results with the measured observation well data among the analyzed methods. These findings demonstrate that the proposed method, which considers connectivity characteristics in karst aquifers, has a higher simulation accuracy than other methods. This method is therefore effective and provides a new idea for the real-time prediction of the GWLs of karst aquifers.

1. Introduction

Groundwater serves as a critical source of water for domestic water uses, agricultural irrigation, and industrial uses, and the rational use of water resources can support sustainable development. In contrast to surface water characteristics, the volume and dynamic evolution characteristics of groundwater are difficult to directly obtain; these features must be revealed by hydrogeological surveys and long-term sequence analyses. Among these characteristics, the groundwater level (GWL) is the most important index and can reveal the influence of human activities, meteorological conditions, and other factors on the groundwater environment [1,2,3]. Therefore, GWL prediction research has become a hot topic.
GWL prediction methods can be divided into methods that use deterministic models and those that use stochastic models. Deterministic models, such as MODFLOW and FEFLOW [4,5], can estimate the hydrological process in a specific region based on the governing equation of the groundwater flow, but they require large quantities of long-term hydrometeorological data and many parameters to describe the physical characteristics of the studied aquifer system. For regional simulations, a groundwater aquifer is often generalized, which makes it difficult to reflect local details and introduces large errors at some stations. In deterministic models, the GWL prediction process is complex, and the acquisition of parameters is expensive and time consuming [6].
Therefore, in cases of insufficient groundwater aquifer information, many studies have focused on stochastic models to forecast GWLs. Stochastic models are data-driven models that are established by using probability statistics theory. These models do not require sufficient information on hydrogeological aquifer parameters in the model calibration process [6,7,8]; thus, they are more suitable for predicting GWLs when no detailed groundwater attribute data are available.
The stochastic models used for GWL forecasting mainly include time series models and neural network models. Time series models, of which the autoregressive integrated moving average (ARIMA) model is typical, have been widely used in GWL prediction research [9]. However, ARIMA is limited because the GWL is affected by many factors that the model cannot consider. For example, the GWL is closely related to rainfall, and ARIMA cannot describe this correlation or its delayed effect [7], although the improved ARIMAX model can be used to consider the relationship between exogenous variables and groundwater level [10]. Moreover, time series models are restricted to linear processes of a system, so their use for predicting groundwater levels is questionable, especially in complex environments [11,12].
Currently, with the development of deep learning technology, artificial neural network (ANN) models have been gradually applied in time series data research such as GWL research, and some achievements have been obtained [13,14]. Compared with time series models, ANNs can better identify the nonlinear behaviors of GWL time series. ANNs can learn and summarize inherent regularity from a set of provided examples and do not require information on the interactions among multiple factors that affect the GWL [15]. Common neural network models used for predicting GWLs include the back-propagation (BP) neural network, radial basis function (RBF) neural network, and long short-term memory (LSTM) neural network [16,17,18,19].
In recent years, the LSTM neural network has developed rapidly [20,21,22]. The LSTM neural network, a kind of recurrent neural network, can simulate and predict GWLs using only historical data [21,22,23]. Existing studies have shown that LSTM prediction models have higher efficiencies and accuracies than time series models and BP neural networks [24,25,26].
However, since GWL data are typical geographic spatiotemporal series data, LSTM prediction models should consider the temporal autocorrelation of GWL data as well as the spatial correlations among different observation wells in the prediction process [27]. Existing studies have considered only the spatial distance factor and have not involved analyses of spatial correlations among sites under the influence of complex factors [27,28,29]. GWL dynamic characteristics are closely related to external factors, such as rainfall, and internal hydrological characteristics, especially in karst aquifers. Due to the differentiation that occurs during the formation of karst aquifers, the spatial heterogeneity of karst water is very obvious. The flow process within a karst aquifer includes flow in the karst network and flow in the matrix, which together characterize dual-medium flow [30]. The flow features in a karst aquifer are closely related to its spatial connectivity, which is affected by the developmental degree of the karst, the local topography, structural features, and other factors [31]. Observation wells that are relatively close to each other may have weak mutual connectivity because of the spatial heterogeneity of karst development. In contrast, observation wells that are relatively far apart may have strong mutual spatial connectivity. The influence of the spatial connectivity of karst aquifers on GWL forecasting should thus be considered.
Therefore, we propose a new convolutional LSTM conjunction model (ConvLSTM) for forecasting the GWL in a karst aquifer considering its connectivity characteristics. In this study, the connectivity characteristics of the studied aquifer were analyzed based on the multidimensional k-means clustering method, and the ConvLSTM prediction model was constructed based on a convolutional neural network and LSTM networks. Thus, the proposed method considers not only the temporal autocorrelation of GWL data but also the spatial correlation among different observation wells. The proposed method was validated through its application to GWL predictions in a karst aquifer in Jinan, China.

2. Materials and Methods

2.1. Study Area

Jinan, the capital city of Shandong Province, is located in a mid-latitude inland zone and belongs to a warm, temperate continental monsoon climate zone in northern China. The annual average precipitation is 641.68 mm (1956–2012), and the annual average evaporation is 1500–1900 mm [32]. Jinan is famous for its springs and is named “The City of Springs”; 108 springs are distributed within 2.6 km² in the center of Jinan [33].
Jinan is a typical area containing distributed karst water resources. Karst groundwater is an indispensable and important resource for industrial and agricultural production and social development in Jinan. The carbonate karst aquifers in Jinan are mainly located in the Majiagou Group in the Cambrian–Ordovician strata. Its lithology is dominated by thick microcrystalline limestone, dolomite, and argillaceous dolomite [33,34].
The main supply source of the karst aquifer system in the Jinan spring catchment is meteoric precipitation. The groundwater runoff of the karst aquifer in the spring catchment is controlled by the local topographic features and lithological and geological structure features, and the runoff has obvious spatial heterogeneity [34]. Figure 1 shows the general situation of the study area; the observation wells selected in this paper are mainly located in the carbonate karst aquifer between the Mashan Fault and the Ganggou Fault in Jinan spring catchment.

2.2. Data

The data used in this study include long-term GWL observation data, rainfall data, geological structure data, and hydrogeological parameter data.
1. GWL observation data
The GWL time series data collected in the study area represent the period from 2009 to 2012, and these GWL data were sampled 6 times a month with a sampling interval of 5 days. We selected 16 observation wells from which to obtain experimental data; these wells of the karst aquifer system in the Jinan spring catchment are numbered No. 1~No. 16, and the spatial location distribution of these GWL monitoring wells is shown in Figure 1.
2. Rainfall
GWL change characteristics are closely related to the atmospheric rainfall process; to obtain rainfall data, we selected 12 rainfall observation stations in the Jinan spring catchment study area that recorded data from 2009 to 2012. The daily rainfall data were resampled 6 times a month with a sampling interval of 5 days in the temporal domain, and the rainfall data and the GWL data recorded in observation wells were matched in the temporal and spatial domains by interpolation and by extraction to points in the spatial domain.
3. Hydrogeological characteristics data
The hydrogeological characteristics data included the groundwater depth, terrain slope, relief amplitude, variance in the GWL, distance from a fault, and water yield of a single well (Table A1). The terrain slope and relief amplitude represent the topographic variation characteristics of the area where the corresponding observation well is located. The GWL variance was used to indicate the degree of variation in the water level. The distance from a fault indicates the degree of karst development. The water yield of a single well indicates the ability of the aquifer to produce groundwater, which is an important index to classify the water yield grade. In this paper, spatial clustering was first carried out using the spatial distances between the observation wells; then, the hydrogeological characteristics were taken as the clustering index set for a second cluster analysis, and the two results were used together to study the connectivity of the karst water.

2.3. Methods

2.3.1. K-Means Clustering of Multidimensional Features

The k-means algorithm is a widely used partitional clustering method based on the idea of using the cluster centers (means) as representatives of each cluster [35]. The major factors that can affect the performance of clustering algorithms are the choice of the initial centroids and the estimate of the number of clusters [36]. In this paper, the silhouette coefficient, which considers both the intra-cluster and inter-cluster distances for cluster validation, was used to select the optimal number of clusters and evaluate the clustering performance [10,37,38]. First, for each candidate number of clusters k within the range of the predefined minimum and maximum cluster numbers, the classification data are divided into k groups with the k-means clustering algorithm, and the average silhouette value is calculated. Finally, the k with the largest average silhouette value is selected as the optimal number of clusters for the classification data.
In this paper, the k-means clustering algorithm was used to first cluster the observation wells based on their mutual spatial distances; then, further clustering was performed based on the hydrogeological characteristics index values recorded at each station. Through this method, the observation wells with strong spatial connectivity were divided into the same category, while observation wells with weak connectivity were divided into different categories.
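The selection of the number of clusters by the average silhouette value can be illustrated with a minimal sketch. The well coordinates and the two candidate labelings below are hypothetical stand-ins for k-means output, not data from the study, and the helper name `silhouette_mean` is an assumption introduced only for this illustration.

```python
import math

def silhouette_mean(points, labels):
    """Average silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # A point alone in its cluster gets a silhouette of 0 by convention.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical well coordinates (km); not taken from the paper.
wells = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
         (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# Hypothetical labelings standing in for k-means results with k = 2 and k = 3.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 2, 1, 1, 1],
}

scores = {k: silhouette_mean(wells, labs) for k, labs in candidates.items()}
best_k = max(scores, key=scores.get)  # k with the largest average silhouette
```

In this toy case the two-cluster labeling scores higher, because splitting one tight group into two lowers the silhouette of its members.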

2.3.2. LSTM and ConvLSTM

LSTM was originally proposed by Hochreiter and Schmidhuber in 1997 [16]. It improved the recurrent neural network model by introducing gates and well-defined memory cells to solve the vanishing and exploding gradient problems that arise when a recurrent neural network processes excessively long time series [20]. The neuron structure of the LSTM network is shown in Figure 2.
Although an LSTM network can effectively extract the temporal characteristics of time series, the network cannot capture the spatial features of the data. To better integrate the karst water connectivity analysis, this research designed a ConvLSTM module to analyze the temporal and spatial characteristics of GWL changes [39]. The ConvLSTM module includes a convolutional neural network and an LSTM network; the resulting module has the temporal modeling ability of an LSTM but can also depict local features like a convolutional neural network (CNN) [40]. The module captures basic spatial features by performing convolution operations on multidimensional data: it replaces the matrix multiplication in each gate of the LSTM unit with a convolution operation to extract the spatiotemporal characteristics of GWL changes. This module is the main component of the GWL prediction model developed in this study.
The internal structure of ConvLSTM is shown in Figure 3, and its calculation process (Equation (1)) is as follows:
$$
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
\tag{1}
$$
where $i$, $f$, $c$ and $o$ are the input gate, forget gate, control unit and output gate in the LSTM structure, respectively, $\sigma$ is the nonlinear activation function, $X_t$ is the input at time $t$, $W_{xi}$, $W_{hi}$, $W_{ci}$, $W_{xf}$, $W_{hf}$, $W_{cf}$, $W_{xc}$, $W_{hc}$, $W_{xo}$, $W_{ho}$, $W_{co}$ are the weight matrix parameters (for example, $W_{xi}$ is the weight matrix value from the input to the input gate), $b_i$, $b_f$, $b_c$ and $b_o$ are the bias terms, $\circ$ represents the Hadamard product, $*$ represents the convolution operation, $H_t$ represents the output value at time $t$, and $o_t$ represents the gating information in the output gate.
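To make the update in Equation (1) concrete, the following is a minimal single-channel, one-dimensional sketch of one ConvLSTM step written with the Python standard library only. It is an illustration under stated assumptions, not the implementation used in the study: the function names, the kernel values, the zero padding, and the use of a 1-D signal instead of multidimensional data are all hypothetical choices made for brevity.

```python
import math

def conv1d_same(signal, kernel):
    """Zero-padded 1-D convolution (cross-correlation form, as is common in
    deep learning) that keeps the input length; assumes an odd kernel size."""
    half = len(kernel) // 2
    out = []
    for pos in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            src = pos + k - half
            if 0 <= src < len(signal):
                acc += w * signal[src]
        out.append(acc)
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def convlstm_step(x_t, h_prev, c_prev, p):
    """One ConvLSTM update following Equation (1) for a single channel."""
    def pre(gate, cell=None):
        # W_x* * X_t + W_h* * H_{t-1} (+ W_c* o C for gated terms) + b
        xs = conv1d_same(x_t, p["W_x" + gate])
        hs = conv1d_same(h_prev, p["W_h" + gate])
        if cell is None:
            peep = [0.0] * len(x_t)
        else:
            peep = [w * s for w, s in zip(p["W_c" + gate], cell)]  # Hadamard
        return [a + b + q + p["b_" + gate] for a, b, q in zip(xs, hs, peep)]

    i_t = [sigmoid(v) for v in pre("i", c_prev)]
    f_t = [sigmoid(v) for v in pre("f", c_prev)]
    g_t = [math.tanh(v) for v in pre("c")]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    o_t = [sigmoid(v) for v in pre("o", c_t)]  # output gate sees the new C_t
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Hypothetical parameters: size-3 kernels, per-position Hadamard weights.
n = 5
params = {}
for gate in "ifco":
    params["W_x" + gate] = [0.1, 0.2, 0.1]
    params["W_h" + gate] = [0.05, 0.1, 0.05]
    params["b_" + gate] = 0.0
for gate in "ifo":
    params["W_c" + gate] = [0.1] * n

x_t = [1.0, 0.5, 0.0, -0.5, -1.0]
h_t, c_t = convlstm_step(x_t, [0.0] * n, [0.0] * n, params)
```

The only difference from a standard LSTM step is that the products with the input and the previous output are convolutions, so each position is updated from its neighbors as well as from itself.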
The ConvLSTM structure used in this article is shown in Figure 4. It consists of two convolutional layers and two LSTM layers and introduces the attention mechanism [41].

3. GWL Simulation and Prediction Experiments

3.1. Description of the GWL Prediction Process

In this paper, a ConvLSTM conjunction model that considers connectivity characteristics was proposed for GWL forecasting in a karst aquifer. First, the connectivity of the karst water was analyzed by the mixed clustering method, and the observation wells were classified by spatial clustering based on their mutual distances and the results of the attribute clustering based on the hydrogeological characteristics of each well.
Then, the ConvLSTM neural network model was constructed to simulate and predict GWLs; this model has the time series modeling ability of an LSTM network and can also describe local spatial characteristics like a CNN. When predicting the GWL of each observation well, the rainfall data of the observation well, its historical GWL data, and the historical GWL data of associated wells with strong connectivity are selected as relevant variables; the model comprehensively considers the GWLs, meteorological factors, and spatial correlation and heterogeneity between observation wells to simulate and predict the GWL of the karst aquifer of the Jinan spring catchment.

3.2. Input Data Consideration

Existing GWL prediction research has shown that the past steps of the GWL time series and precipitation are the most commonly used variables input into artificial intelligence models to predict GWLs [15]. Based on the actual situation and data availability in the study area, this paper used the lasso regression method to extract important variables from the original data based on regression weights [42]. Among the GWL, rainfall, surface runoff, evaporation, and temperature, the GWL and rainfall were selected as input variables. This result is consistent with existing research conclusions.
In this paper, a mixed clustering method was designed to analyze the connectivity of karst water by the k-means clustering algorithm. First, the observation wells were preliminarily clustered based on their spatial distance index values; then, a further cluster analysis was carried out according to the hydrogeological characteristics of the observation wells by k-means clustering. The most suitable number of clusters was determined by the silhouette coefficient, which can evaluate the performance of clustering results.
The final results were obtained by combining the results of these two methods. Any two observation wells are finally classified into the same category only if they belong to the same category in the clustering based on spatial distance and also to the same category in the clustering based on hydrogeological characteristics; otherwise, they are divided into different categories. This method can avoid two kinds of errors. First, it avoids considering only the distances between wells and ignoring whether the observation wells are truly connected. Second, it avoids the phenomenon in which the clustering results are not spatially clustered when considering only their corresponding hydrogeological characteristics. However, this method may cause the problem of classification redundancy, which needs to be improved in the future.
Figure 5a shows the results of clustering based on the distance index, in which the wells were grouped into three categories, the number for which the average silhouette coefficient value is the largest. Figure 5b shows the results of clustering based on hydrogeological characteristics, which differ in part from Figure 5a. For instance, No. 11, No. 12, No. 15, and No. 16 are classified into the same category in the distance-based clustering, since the four observation wells are very close in actual distance. However, in the results of clustering based on hydrogeological characteristics, they are quite different in the water yield of a single well and the variance in the GWL, so they are divided into two categories. Furthermore, due to the terrain slope and relief amplitude factors, No. 10 is placed in a different category from the other observation wells, and the same holds for No. 1.
Therefore, based on the idea of considering the connectivity of karst aquifers, combined with the spatial distance index and hydrogeological characteristics clustering results, using the cross-merging method, all observation wells were finally divided into seven categories, as shown in Figure 5c and Table 1. When predicting each observation well individually, the GWLs of the remaining observation wells belonging to the same category could be taken as an input variable to consider the spatial connectivity between observation wells and reveal the spatial characteristics of the GWL in the karst aquifer.
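The cross-merging rule described above (two wells share a final category only if they share a category in both clusterings) can be sketched as follows. The labels are hypothetical and cover six wells only; the function name `cross_merge` is an assumption made for this illustration and does not reproduce the categories in Table 1.

```python
def cross_merge(distance_labels, hydro_labels):
    """Assign one final category per distinct (distance, hydro) label pair,
    so wells share a final category only if they agree in both clusterings."""
    pair_to_category = {}
    merged = []
    for pair in zip(distance_labels, hydro_labels):
        if pair not in pair_to_category:
            pair_to_category[pair] = len(pair_to_category) + 1
        merged.append(pair_to_category[pair])
    return merged

# Hypothetical cluster labels for six wells (not the study's results).
distance_labels = [1, 1, 2, 2, 2, 3]
hydro_labels = [1, 2, 2, 2, 1, 1]

final_labels = cross_merge(distance_labels, hydro_labels)
# final_labels == [1, 2, 3, 3, 4, 5]
```

Because every distinct label pair becomes its own category, the number of final categories can exceed the number in either input clustering, which is the classification redundancy noted in the text.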
Thus, the input variables ultimately selected in this study were the GWL, past steps of the GWL time series, rainfall, past steps of the rainfall series, and the GWL sequences of wells with the same category.

3.3. Data Set Processing

The GWL and related variables of the prediction well are expressed as a one-dimensional vector as
$$
x = \left(G_p,\ R_p,\ G_{cat}\right) \tag{2}
$$
where $G_p$ and $R_p$ are the GWL and rainfall values of the target prediction well, and $G_{cat}$ is the average GWL value of the remaining observation wells that belong to the same category as the predicted well.
Then, the one-dimensional vectors of multiple time steps are formed into a two-dimensional matrix, which represents the input data within a period of time, i.e., it is used as a time window (Equation (3)). Sliding this time window along the complete data sequence of the prediction well in the time direction yields multiple time windows of data for this prediction well (Equation (4)).
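$$
X = \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix} = \begin{bmatrix} G_p^1 & R_p^1 & G_{cat}^1 \\ \vdots & \vdots & \vdots \\ G_p^m & R_p^m & G_{cat}^m \end{bmatrix} \tag{3}
$$
$$
P^k = \begin{bmatrix} X_1^k \\ \vdots \\ X_n^k \end{bmatrix} = \begin{bmatrix} x_1^1 & \cdots & x_m^1 \\ \vdots & \ddots & \vdots \\ x_1^n & \cdots & x_m^n \end{bmatrix} \tag{4}
$$
where $X$ is the data in a time window, and $m$ is the size of the time window; $G_p^m$, $R_p^m$ and $G_{cat}^m$ are $G_p$, $R_p$ and $G_{cat}$ at the moment $m$ in the time window; $P^k$ is the data of the target prediction well $k$, and $X_n^k$ is the time window $n$ of the target prediction well $k$; $x_m^n$ is the vector $x$ at the time $m$ in the time window $n$.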
Finally, the data of the 16 observation wells are integrated into an input matrix that is fed into the ConvLSTM model (Equation (5)). The network can thus learn the common GWL features of the observation wells in the area and provide real-time, efficient GWL predictions for multiple observation wells in the area with a single prediction model.
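$$
Data = \begin{bmatrix} P^1 \\ \vdots \\ P^{16} \end{bmatrix} = \begin{bmatrix} X_1^1 & \cdots & X_n^1 \\ \vdots & \ddots & \vdots \\ X_1^{16} & \cdots & X_n^{16} \end{bmatrix} \tag{5}
$$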
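The construction of the time windows for one well can be sketched as below. The series values, the window size of 3, and the stride of one time step are assumptions chosen for illustration; the source does not state the window size or stride used, and `make_windows` is a hypothetical helper name.

```python
def make_windows(gwl, rain, gwl_cat, m):
    """Stack x = (G_p, R_p, G_cat) per time step, then slide a window of
    size m along the series with a stride of one step (assumed)."""
    rows = [list(r) for r in zip(gwl, rain, gwl_cat)]
    return [rows[start:start + m] for start in range(len(rows) - m + 1)]

# Hypothetical five-step series for one prediction well.
gwl = [30.1, 30.0, 29.8, 29.9, 30.3]       # G_p
rain = [0.0, 2.5, 0.0, 12.0, 4.0]          # R_p
gwl_cat = [29.5, 29.4, 29.3, 29.5, 29.8]   # G_cat, mean GWL of same-category wells

P_k = make_windows(gwl, rain, gwl_cat, m=3)
# P_k holds 3 windows; each window is an m x 3 matrix as in Equation (3).
```

Repeating this for every well and stacking the results gives the overall input described by Equation (5).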

3.4. Experimental Design

To verify the validity of the ConvLSTM conjunction model that considers connectivity characteristics in predicting the GWL, this paper designed the following four experiments. The training stage was from January 2009 to December 2011, and the testing stage was from January to December 2012. In each batch of training, 20% of the training set was set aside for validation. The parameter settings, input variables, and data sets of the experimental models are shown in Table 2 and Table 3. The experiments are as follows:
  • Experiment 1: Single-variable LSTM (SV-LSTM) model. In this experiment, the GWL and past steps of the GWL time series (GWLt−1) values of the 16 observation wells were used as inputs. In the model structure, Experiment 1 used the LSTM model to predict groundwater levels.
  • Experiment 2: Multivariate LSTM (MV-LSTM) model. In this experiment, the GWL, GWLt−1, rainfall, and past steps of the rainfall time series (Rt−1) values recorded at the 16 observation wells were used as inputs. Similarly, Experiment 2 used the LSTM model to predict groundwater levels.
  • Experiment 3: Multivariate ConvLSTM considering only the spatial distance (D-MV-ConvLSTM) model. In input variables, the D-MV-ConvLSTM model was based on Experiment 2 and additionally considered the GWL sequences of wells with the same category of the clustering results based on the distance index (GWLdistance_cat); all observation wells are divided into three categories, as shown in Figure 5a. In the model structure, the ConvLSTM module was introduced to improve the LSTM structure and effectively extract the temporal and spatial characteristics of the GWL fluctuations recorded in the observation wells.
  • Experiment 4: Modified multivariate ConvLSTM considering connectivity characteristics (M-MV-ConvLSTM) model. In terms of input variables, the M-MV-ConvLSTM model was based on Experiment 2 and additionally considered the GWL sequences of wells with the same category of the clustering results based on the connectivity characteristics (GWLconnectivity_cat); all observation wells are divided into seven categories, as shown in Figure 5c and Table 1. Similarly, Experiment 4 used the ConvLSTM model to predict groundwater levels.

3.5. Models and Network Architecture

Table 2 describes the network settings and data set for predicting the groundwater levels at the 16 wells in this study. Many combinations of model parameters were tested. The candidate numbers of hidden nodes (4, 5, 6, 7, 8), kernel sizes of the convolution layer (3, 5, 7), and learning rates (0.001, 0.003, 0.005) gave a total of 45 combinations, from which one optimal combination was determined by a trial-and-error method in the validation stage [6]. Table 3 describes the best combination of parameters for each experimental model.
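The count of 45 combinations follows from 5 × 3 × 3 candidate values, which the short sketch below enumerates. The dictionary keys and the `select_best` helper are hypothetical names introduced here, and scanning every combination is one possible reading of the trial-and-error procedure rather than a description of the authors' code.

```python
from itertools import product

hidden_nodes = (4, 5, 6, 7, 8)
kernel_sizes = (3, 5, 7)
learning_rates = (0.001, 0.003, 0.005)

# Every combination of the three candidate lists: 5 * 3 * 3 = 45 settings.
grid = [
    {"hidden_nodes": h, "kernel_size": k, "learning_rate": lr}
    for h, k, lr in product(hidden_nodes, kernel_sizes, learning_rates)
]

def select_best(candidates, validation_error):
    """Return the setting with the smallest validation error.
    `validation_error` is any callable that maps a setting to a number,
    for example the validation RMSE of a model trained with that setting."""
    return min(candidates, key=validation_error)
```

In practice `validation_error` would train a model with the given setting and return its error on the 20% validation split.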

4. Results

To examine the applicability of the ConvLSTM conjunction model for GWL forecasting in a karst aquifer while considering connectivity characteristics, we used the root mean square error (RMSE) and Nash–Sutcliffe efficiency coefficient (NSE) to evaluate the simulation results.
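For reference, the two metrics can be computed as in the following sketch, which uses their standard definitions: the RMSE is the square root of the mean squared difference between observed and predicted values, and the NSE is one minus the ratio of the squared prediction error to the squared deviation of the observations from their mean. The sample values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the skill
    of always predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance

# Hypothetical observed GWLs and two trivial predictions.
observed = [1.0, 2.0, 3.0, 4.0]
perfect = list(observed)
mean_only = [2.5] * 4
```

A lower RMSE and an NSE closer to 1 both indicate a better match between the simulated and observed GWLs.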
Table 4 lists the RMSE and NSE values of the four GWL simulation and prediction experiments, and Figure 6 shows histograms of the RMSEs. For the SV-LSTM, MV-LSTM, D-MV-ConvLSTM and M-MV-ConvLSTM models, the average RMSE values were 1.38, 0.75, 0.56 and 0.46, respectively. The experimental results showed that the MV-LSTM model, which considered the rainfall factor, performed better than the SV-LSTM model at each well. The D-MV-ConvLSTM, which considered the GWLdistance_cat, performed better than the SV-LSTM and MV-LSTM models. Moreover, the M-MV-ConvLSTM model considered the connectivity between observation wells, and its accuracy was further improved.
Furthermore, it can be seen from the NSE values listed in Table 4 that the NSEs of the D-MV-ConvLSTM and M-MV-ConvLSTM models had certain degrees of improvement, compared with those of the SV-LSTM model and MV-LSTM, and the NSE values of the M-MV-ConvLSTM model were further improved, compared with those of D-MV-ConvLSTM. With this model, among all wells, 50% had NSE values above 0.9, and 80% had NSE values above 0.85, indicating that the M-MV-ConvLSTM model, which considered both meteorological factors and the connectivity among observation wells, had the best prediction effect and high credibility.
Figure 7 shows the prediction results obtained by using the four models for a selected number of wells; for all wells, the results can be seen in Figure A1. The predicted data are the GWL observation data from January through December 2012, which were sampled 6 times a month with a sampling interval of 5 days. It can be seen from the figure that, in general, the four models and the observed values had the same trend. However, the prediction results of the M-MV-ConvLSTM model were more in line with the actual observation results than the prediction results of the other three models. This was because the M-MV-ConvLSTM model considered the GWL data and meteorological data of the target observation well itself and additionally considered the GWL data of the associated observation wells.
Moreover, compared with the single-structure LSTM model, the ConvLSTM model comprehensively analyzed the spatiotemporal characteristics of the data and revealed a certain improvement in the time-lag problem of the prediction model. From the prediction results of each well shown in Figure 7, it can be seen that the forecast lag problem was greatly improved by the M-MV-ConvLSTM model. Therefore, considering the prediction accuracy and improvement of the lag problem, the M-MV-ConvLSTM prediction model proposed in this paper, which considers the connectivity of observation wells, has good applicability for predicting GWLs.

5. Discussion

5.1. Influence of Water Level Fluctuations on Model Accuracy

As shown in Table 4, in the SV-LSTM model simulation experiment, 2 of the 16 wells (No. 1 and No. 2) exhibited relatively large errors, with RMSE values of 9.19 and 5.01, respectively. This is because the GWLs of these two wells have a relatively high degree of water level change (Figure 8), so it is difficult to accurately simulate the GWLs of these wells using only the GWL data itself. However, by adding rainfall variables and analyzing the observation well connectivity, the RMSE values of wells No. 1 and No. 2 obtained with the M-MV-ConvLSTM model were improved to 2.95 and 0.94, respectively.
These results show that in cases of large GWL fluctuations, the prediction accuracy can be effectively improved by D-MV-ConvLSTM and M-MV-ConvLSTM, and for No. 1, the accuracy of M-MV-ConvLSTM is further improved. However, for No. 2, the accuracy was not further improved by M-MV-ConvLSTM (Table 4). We believe this is because the well is located on the edge of the hill (Figure 1), where it is relatively isolated, has a poor water yield (water yield of a single well <500 m³/d), and is vulnerable to external factors. The model cannot accurately reflect the dynamic change of the GWL at this well, so the accuracy for No. 2 was not effectively improved.

5.2. Effect of Spatial Connectivity on Model Accuracy

As shown in Figure 5, when clustering based on distance, the four observation wells No. 11, No. 12, No. 15, and No. 16 were undoubtedly clustered into the same category because of the short distances between these wells. However, the clustering method based on hydrogeological characteristics divided wells No. 11 and No. 12 into category 5 and No. 15 and No. 16 into category 4.
Figure 9 displays the prediction results obtained for these four observation wells. We found that although the four observation wells were very close in their spatial locations, No. 11, No. 12, No. 15, and No. 16 obviously have different GWL fluctuation characteristics and show typical spatial heterogeneity. Moreover, the water yield of the four wells is significantly different, and they are in two different levels in the water-rich classification of karst aquifers, as shown in Figure 1. From the results presented in Table 4, and the results of Experiment 4, it can be inferred that the RMSE and NSE values of the M-MV-ConvLSTM model are the best for these four observation wells.
The results reveal the effectiveness of the conjunction ConvLSTM method proposed in this paper for predicting the GWLs of karst aquifers while considering connectivity characteristics. The mixed clustering method in this paper may cause the problem of classification redundancy, which needs to be improved in the future. Moreover, comprehensive geographic research usually focuses on various geographic elements to explore the relationships and interactions of elements behind geographic phenomena and processes [43]; the results of this article show that meteorological data are very important for predicting GWL, and coupling these data with meteorological forecast models in the modeling would be a promising research direction.

6. Conclusions

In this paper, a ConvLSTM conjunction model and a GWL prediction method that considers connectivity characteristics in a karst aquifer were proposed. The work mainly verifies the idea that the spatial connectivity of observation wells at different locations should be considered to improve the prediction of GWLs. The connectivity of a karst aquifer was analyzed on the basis of the distances between observation wells and the hydrogeological characteristics of the aquifer recorded at these wells, and the results of the connectivity analysis were incorporated into the ConvLSTM neural network prediction model. On this basis, four artificial neural network prediction models were designed to simulate and predict the GWL in the Jinan karst water area. The experimental results show that the multivariable ConvLSTM model that considers the connectivity characteristics among observation wells has a higher simulation accuracy than the other analyzed models. The method is therefore effective and provides a new idea for obtaining real-time GWL predictions in karst water areas.

Author Contributions

Conceptualization, F.G. and Z.Z.; methodology, F.G. and Z.Z.; software, J.Y.; validation, F.G., Z.Z., and H.L.; formal analysis, F.G.; investigation, Z.Z.; resources, H.L. and G.L.; data curation, J.Y.; writing—original draft preparation, F.G., Z.Z., and J.Y.; writing—review and editing, H.L. and G.L.; visualization, F.G. and Z.Z.; supervision, H.L. and G.L.; project administration, H.L.; funding acquisition, F.G., Z.Z., and H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 41571386 and No. U1811464) and the Key Research and Developmental Program of Shandong Province, Major Scientific and Technological Innovation Project, 2019JZZY020105.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The meteorological data and other model simulation data sets used in this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors acknowledge the data support from the Yangtze River Delta Science Data Center, National Earth System Science Data Center, National Science & Technology Infrastructure of China.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Appendix A

Table A1. Hydrogeological characteristics data of all observation wells.
| ID | Longitude | Latitude | Water_Yield (m³/d) | Relief Amplitude (m) | Terrain Slope (Degree) | Var_GWL (m) | Dis_Faults (m) |
|---|---|---|---|---|---|---|---|
| No. 1 | 117.12 | 36.63 | <500 | 70 | 4.55 | 13.38 | 3502.63 |
| No. 2 | 117.01 | 36.54 | <500 | 34 | 3.43 | 1.73 | 5943.78 |
| No. 3 | 116.91 | 36.59 | 1000–5000 | 39 | 1.18 | 1.88 | 4031.99 |
| No. 4 | 117.21 | 36.72 | 1000–5000 | 31 | 2.56 | 1.07 | 838.63 |
| No. 5 | 116.71 | 36.56 | 5000–10,000 | 22 | 7.34 | 1.04 | 160.50 |
| No. 6 | 116.78 | 36.60 | 5000–10,000 | 21 | 1.97 | 1.17 | 1558.72 |
| No. 7 | 116.71 | 36.57 | 5000–10,000 | 26 | 3.43 | 1.18 | 427.73 |
| No. 8 | 116.84 | 36.65 | 5000–10,000 | 23 | 3.71 | 0.99 | 3654.12 |
| No. 9 | 116.86 | 36.62 | 5000–10,000 | 22 | 0.66 | 1.00 | 895.40 |
| No. 10 | 116.87 | 36.59 | 5000–10,000 | 43 | 6.29 | 1.88 | 208.02 |
| No. 11 | 117.01 | 36.66 | 5000–10,000 | 44 | 1.18 | 0.49 | 912.59 |
| No. 12 | 117.03 | 36.66 | 5000–10,000 | 31 | 3.03 | 0.53 | 1751.44 |
| No. 13 | 117.17 | 36.70 | 1000–5000 | 22 | 1.86 | 1.39 | 2441.75 |
| No. 14 | 117.14 | 36.73 | 1000–5000 | 26 | 6.87 | 0.66 | 1465.39 |
| No. 15 | 117.01 | 36.67 | 1000–5000 | 25 | 12.86 | 0.06 | 1526.24 |
| No. 16 | 117.02 | 36.67 | 1000–5000 | 23 | 3.84 | 0.03 | 1926.57 |

Abbreviations: Var_GWL, variance in the GWL; Dis_Faults, distance from a fault.
Figure A1. Modeling simulation results obtained at all wells in the four experiments.

References

  1. Scibek, J.; Allen, D.M. Modeled impacts of predicted climate change on recharge and groundwater levels. Water Resour. Res. 2006, 42, 1–18.
  2. Panagoulia, D.; Dimou, G. Groundwater-Streamflow interactions under changing climate conditions. In Proceedings of the Man’s Influence on Freshwater Ecosystems and Water Use XXI General Assembly of the International Union of Geodesy & Geophysics, Boulder, CO, USA, July 1995; pp. 191–196.
  3. Panagoulia, D.; Dimou, G. Sensitivities of groundwater streamflow interaction to global climate change. Hydrol. Sci. J. 1996, 41, 781–796.
  4. Trefry, M.G.; Muffels, C. FEFLOW: A finite-element ground water flow and transport modeling tool. Groundwater 2007, 45, 525–528.
  5. Wang, S.; Shao, J.; Song, X.; Zhang, Y.; Huo, Z.; Zhou, X. Application of MODFLOW and geographic information system to groundwater flow simulation in North China Plain, China. Environ. Geol. 2008, 55, 1449–1462.
  6. Lee, S.; Lee, K.K.; Yoon, H. Using artificial neural network models for groundwater level forecasting and assessment of the relative impacts of influencing factors. Hydrogeol. J. 2018, 27, 1–4.
  7. Shirmohammadi, B.; Vafakhah, M.; Moosavi, V.; Moghaddamnia, A. Application of Several Data-Driven Techniques for Predicting Groundwater Level. Water Resour. Manag. 2013, 27, 419–432.
  8. Maiti, S.; Tiwari, R.K. A comparative study of artificial neural networks, Bayesian neural networks and adaptive neuro-fuzzy inference system in groundwater level prediction. Environ. Earth Sci. 2014, 7, 3147–3160.
  9. Yan, Q.; Ma, C. Application of integrated ARIMA and RBF network for groundwater level forecasting. Environ. Earth Sci. 2016, 75, 396.
  10. Nourani, V.; Alami, M.T.; Vousoughi, F.D. Wavelet-entropy data pre-processing approach for ANN-based groundwater level modeling. J. Hydrol. 2015, 524, 255–269.
  11. Wong, H.; Ip, W.C.; Zhang, R.Q.; Xia, J. Non-parametric time series models for hydrological forecasting. J. Hydrol. 2007, 332, 337–347.
  12. Yang, Z.P.; Lu, W.X.; Long, Y.Q.; Li, P. Application and comparison of two prediction models for groundwater levels: A case study in western Jilin Province, China. J. Arid Environ. 2009, 73, 487–492.
  13. Yoon, H.; Jun, S.C.; Hyun, Y.; Bae, G.O.; Lee, K.K. A comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer. J. Hydrol. 2011, 396, 128–138.
  14. Chang, J.; Wang, G.; Mao, T. Simulation and prediction of suprapermafrost groundwater level variation in response to climate change using a neural network model. J. Hydrol. 2015, 529, 1211–1220.
  15. Rajaee, T.; Ebrahimi, H.; Nourani, V. A review of the artificial intelligence methods in groundwater level modeling. J. Hydrol. 2019, 572, 336–351.
  16. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
  17. Coulibaly, P.; Anctil, F.; Aravena, R.; Bobee, B. Artificial neural network modeling of water table depth fluctuations. Water Resour. Res. 2001, 37, 885–896.
  18. Daliakopoulos, I.N.; Coulibaly, P.; Tsanis, I.K. Groundwater level forecasting using artificial neural networks. J. Hydrol. 2005, 309, 229–240.
  19. Zanotti, C.; Rotiroti, M.; Sterlacchini, S.; Cappellini, G.; Bonomi, T. Choosing between linear and nonlinear models and avoiding overfitting for short and long term groundwater level forecasting in a linear system. J. Hydrol. 2019, 578, 124015.
  20. Sahoo, B.B.; Jha, R.; Singh, A.; Kumar, D. Long short-term memory (LSTM) recurrent neural network for low-flow hydrological time series forecasting. Acta Geophys. 2019, 67, 1471–1481.
  21. Supreetha, B.S.; Shenoy, N.; Nayak, P. Lion algorithm-optimized long short-term memory network for groundwater level forecasting in Udupi district, India. Appl. Comput. Intell. Soft Comput. 2019.
  22. Wunsch, A.; Liesch, T.; Broda, S. Groundwater level forecasting with artificial neural networks: A comparison of long short-term memory (LSTM), convolutional neural networks (CNNs), and non-linear autoregressive networks with exogenous input (NARX). Hydrol. Earth Syst. Sci. 2021, 25, 1671–1687.
  23. Zhang, Z.; Wang, W.; Qu, S.; Huang, Q.; Liu, S.; Xu, Q.; Ni, L. A New Perspective to Explore the Hydraulic Connectivity of Karst Aquifer System in Jinan Spring Catchment, China. Water 2018, 10, 1368.
  24. Bowes, B.D.; Sadler, J.M.; Morsy, M.M.; Behl, M.; Goodall, J.L. Forecasting groundwater table in a flood prone coastal city with long short-term memory and recurrent neural networks. Water 2019, 11, 1098.
  25. Jeong, J.; Park, E. Comparative applications of data-driven models representing water table fluctuations. J. Hydrol. 2019, 572, 261–273.
  26. Wu, C.C.; Zhang, X.Q.; Wang, W.J.; Lu, C.; Zhang, Y.; Qin, W.; Tick, G.R.; Liu, B.; Shu, L. Groundwater level modeling framework by combining the wavelet transform with a long short-term memory data-driven model. Sci. Total Environ. 2021, 783, 146948.
  27. He, L.; Hou, M.; Chen, S.; Zhang, J.; Chen, J.; Qi, H. Construction of a spatio-temporal coupling model for groundwater level prediction: A case study of Changwu area, Yangtze River Delta region of China. Water Supply 2021.
  28. Nayak, P.C.; Rao, Y.; Sudheer, K.P. Groundwater level forecasting in a shallow aquifer using artificial neural network approach. Water Resour. Manag. 2006, 20, 77–90.
  29. Tang, Y.; Zang, C.; Wei, Y.; Jiang, M. Data-driven modeling of groundwater level with least-square support vector machine and spatial–temporal analysis. Geotech. Geol. Eng. 2019, 37, 1661–1670.
  30. Zhang, J.F.; Zhu, Y.; Zhang, X.P.; Ye, M.; Yang, J.Z. Developing a long short-term memory (LSTM) based model for predicting water table depth in agricultural areas. J. Hydrol. 2018, 561, 918–929.
  31. Fiorillo, F.; Pagnozzi, M.; Ventafridda, G. A model to simulate recharge processes of karst massifs. Hydrol. Process. 2015, 29, 2301–2314.
  32. Wang, J.; Jin, M.; Lu, G.; Zhang, D.; Kang, F.; Jia, B. Investigation of discharge-area groundwaters for recharge source characterization on different scales: The case of Jinan in northern China. Hydrogeol. J. 2016, 24, 1723–1737.
  33. Kang, F.; Jin, M.; Qin, P. Sustainable yield of a karst aquifer system: A case study of Jinan springs in northern China. Hydrogeol. J. 2011, 19, 851–863.
  34. Li, C.M. Karst groundwater resources and springs protection in Jinan City. Carsol. Sin. 1985, 1, 31–39.
  35. MacQueen, J. Some methods for classification and analysis of multivariate observations. Proc. Fifth Berkeley Symp. Math. Stat. Probab. 1967, 1, 281–297.
  36. Reddy, C.K.; Vinzamuri, B. A survey of partitional and hierarchical clustering algorithms. In Data Clustering: Algorithms and Applications, 1st ed.; Aggarwal, C.C., Reddy, C.K., Eds.; Chapman & Hall/CRC: Boca Raton, FL, USA, 2014; pp. 87–110.
  37. Dinh, D.T.; Fujinami, T.; Huynh, V.N. Estimating the Optimal Number of Clusters in Categorical Data Clustering by Silhouette Coefficient. In International Symposium on Knowledge and Systems Sciences; Springer: Singapore, 2019; pp. 1–17.
  38. Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
  39. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Proceedings of the 29th Annual Conference in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 802–810.
  40. Zhang, J.; Zheng, Y.; Sun, J.; Qi, D. Flow Prediction in Spatio-Temporal Networks Based on Multitask Deep Learning. IEEE Trans. Knowl. Data Eng. 2020, 32, 468–478.
  41. Cho, K.; Merrienboer, B.V.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. Comput. Sci. 2014.
  42. Yuan, M.; Lin, Y. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. 2006, 68, 49–67.
  43. Chen, M.; Lv, G.; Zhou, C.; Lin, H.; Ma, Z.; Yue, S.; Wen, Y.; Zhang, F.; Wang, J.; Zhu, Z.; et al. Geographic modeling and simulation systems for geographic research in the new era: Some thoughts on their development and construction. Sci. China Earth Sci. 2021, 64, 1207–1223.
Figure 1. The division of karst water systems and locations of observation wells in Jinan spring catchment.
Figure 2. The neuron structure of the LSTM network.
Figure 3. The inner structure of ConvLSTM.
Figure 4. The ConvLSTM structure of this article.
Figure 5. The results of the mixed cluster analysis in which wells were ultimately grouped into seven categories: (a) results based on the spatial distance index; (b) results based on hydrogeological characteristics; (c) final results.
Figure 6. RMSE values obtained at the 16 wells in the four experiments.
Figure 7. Modeling simulation results obtained at certain wells in the four experiments.
Figure 8. Modeling simulation results obtained for wells No. 1 and No. 2 in the four experiments.
Figure 9. Prediction results of four observation wells (No. 11, No. 12, No. 15, and No. 16) obtained in the four experiments.
Table 1. Mixed clustering results of observation wells.
| Category 1 | Category 2 | Category 3 | Category 4 | Category 5 | Category 6 | Category 7 |
|---|---|---|---|---|---|---|
| No. 5 | No. 4 | No. 2 | No. 15 | No. 11 | No. 1 | No. 10 |
| No. 6 | No. 13 | No. 3 | No. 16 | No. 12 | | |
| No. 7 | No. 14 | | | | | |
| No. 8 | | | | | | |
| No. 9 | | | | | | |
Table 2. Description of the data set and network set for ANN models.
| Set Type | Stages and Parameters | Data |
|---|---|---|
| Data set | Training: number | 3456 |
| Data set | Training: period | January 2009 to December 2011 |
| Data set | Validation | 20% of the training set in each batch of training |
| Data set | Testing: number | 1152 |
| Data set | Testing: period | January through December 2012 |
| Model parameter settings | Training algorithm | Back-propagation algorithm |
| Model parameter settings | Kernel sizes of convolution | 3, 5, 7 |
| Model parameter settings | Number of hidden nodes | 4, 5, 6, 7, 8 |
| Model parameter settings | Learning rate | 0.001, 0.003, 0.005 |
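The candidate settings in Table 2 define a small search space, which can be enumerated with a short sketch. Whether the authors searched all combinations exhaustively is not stated here, so the full grid is an assumption, and the dictionary keys are illustrative names rather than parameters of any particular library.

```python
from itertools import product

# Candidate values as listed in Table 2.
kernel_sizes = [3, 5, 7]
hidden_nodes = [4, 5, 6, 7, 8]
learning_rates = [0.001, 0.003, 0.005]

# Assumed exhaustive grid: one configuration per combination of the three settings.
grid = [
    {"kernel_size": k, "hidden_nodes": h, "learning_rate": lr}
    for k, h, lr in product(kernel_sizes, hidden_nodes, learning_rates)
]

print(len(grid))  # 45 candidate configurations

# The setting reported for M-MV-ConvLSTM in Table 3 is one point of this grid.
selected = {"kernel_size": 5, "hidden_nodes": 5, "learning_rate": 0.005}
print(selected in grid)  # True
```

For the two LSTM models the convolution kernel size does not apply (marked "/" in Table 3), so their search space reduces to the 15 combinations of hidden nodes and learning rate.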
Table 3. Input variables and model parameters that were selected for four experiments.
| Models | Input Variables | Kernel Size of Convolution | Number of Hidden Nodes | Learning Rate | Validation Error |
|---|---|---|---|---|---|
| SV-LSTM | GWL, GWLt1 | / | 8 | 0.003 | 0.0236 |
| MV-LSTM | GWL, R, GWLt1, Rt1 | / | 6 | 0.003 | 0.0138 |
| D-MV-ConvLSTM | GWL, R, GWLt1, Rt1, GWLdistance_cat | 5 | 7 | 0.003 | 0.0111 |
| M-MV-ConvLSTM | GWL, R, GWLt1, Rt1, GWLconnectivity_cat | 5 | 5 | 0.005 | 0.0100 |
Table 4. RMSE and NSE values of the GWL prediction results obtained from the four experiments at 16 wells.
| Observation Wells | Name | SV-LSTM RMSE | SV-LSTM NSE | MV-LSTM RMSE | MV-LSTM NSE | D-MV-ConvLSTM RMSE | D-MV-ConvLSTM NSE | M-MV-ConvLSTM RMSE | M-MV-ConvLSTM NSE |
|---|---|---|---|---|---|---|---|---|---|
| No. 1 | Shigou Village | 9.192 | −0.146 | 4.836 | 0.677 | 3.518 | 0.832 | 2.954 | 0.881 |
| No. 2 | Beihou Village | 5.012 | −24.152 | 2.339 | −1.063 | 0.770 | 0.406 | 0.935 | 0.124 |
| No. 3 | South of Xiaozhuang Village | 0.507 | 0.854 | 0.259 | 0.895 | 0.373 | 0.921 | 0.257 | 0.962 |
| No. 4 | Jinping Village | 1.129 | −0.212 | 0.915 | 0.056 | 0.916 | 0.202 | 0.960 | 0.122 |
| No. 5 | Ximenli Village | 0.342 | 0.807 | 0.313 | 0.768 | 0.315 | 0.836 | 0.261 | 0.888 |
| No. 6 | Dayu Village | 0.285 | 0.845 | 0.269 | 0.784 | 0.263 | 0.867 | 0.217 | 0.910 |
| No. 7 | North of Ximenli | 0.367 | 0.766 | 0.284 | 0.774 | 0.295 | 0.848 | 0.221 | 0.916 |
| No. 8 | Kuangli Village | 0.262 | 0.746 | 0.208 | 0.686 | 0.216 | 0.827 | 0.174 | 0.888 |
| No. 9 | North of Hongwei Village | 0.387 | 0.927 | 0.218 | 0.939 | 0.214 | 0.978 | 0.112 | 0.994 |
| No. 10 | Xiaozhuang Village | 0.550 | 0.798 | 0.401 | 0.837 | 0.456 | 0.861 | 0.358 | 0.914 |
| No. 11 | Baotu Spring | 0.341 | 0.075 | 0.181 | 0.502 | 0.155 | 0.809 | 0.088 | 0.939 |
| No. 12 | Heihu Spring | 0.346 | 0.144 | 0.187 | 0.545 | 0.162 | 0.813 | 0.094 | 0.937 |
| No. 13 | Yinchen Village | 0.707 | 0.670 | 0.556 | 0.813 | 0.567 | 0.788 | 0.428 | 0.879 |
| No. 14 | Zhaoxian Village | 0.786 | −0.765 | 0.283 | 0.508 | 0.218 | 0.864 | 0.155 | 0.931 |
| No. 15 | Shuangzhongci Street | 1.055 | −498.447 | 0.504 | −8.943 | 0.274 | −32.608 | 0.032 | 0.540 |
| No. 16 | Zhenzhu Spring | 0.792 | −1261.636 | 0.176 | −331.881 | 0.230 | −105.330 | 0.068 | −8.361 |
| Average | | 1.379 | −111.171 | 0.746 | −20.819 | 0.559 | −7.943 | 0.457 | 0.216 |
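The Average row of Table 4 can be reproduced with a few lines of arithmetic. The sketch below uses the M-MV-ConvLSTM NSE column as an example and also reports the median, which is an addition made here for comparison and not a statistic used in the paper.

```python
from statistics import mean, median

# M-MV-ConvLSTM NSE values for wells No. 1 to No. 16, copied from Table 4.
nse_m_mv = [0.881, 0.124, 0.962, 0.122, 0.888, 0.910, 0.916, 0.888,
            0.994, 0.914, 0.939, 0.937, 0.879, 0.931, 0.540, -8.361]

avg = mean(nse_m_mv)                  # arithmetic mean over all 16 wells
med = median(nse_m_mv)                # median, less sensitive to single outliers
avg_without_16 = mean(nse_m_mv[:-1])  # mean over wells No. 1 to No. 15 only

print(f"mean = {avg:.4f}, median = {med:.3f}, mean without No. 16 = {avg_without_16:.3f}")
```

The mean agrees with the 0.216 reported in the Average row, while the median is close to 0.9. The average NSE is therefore dominated by the single strongly negative value at No. 16, which is worth keeping in mind when comparing the averages of the four models.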
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Guo, F.; Yang, J.; Li, H.; Li, G.; Zhang, Z. A ConvLSTM Conjunction Model for Groundwater Level Forecasting in a Karst Aquifer Considering Connectivity Characteristics. Water 2021, 13, 2759. https://doi.org/10.3390/w13192759

AMA Style

Guo F, Yang J, Li H, Li G, Zhang Z. A ConvLSTM Conjunction Model for Groundwater Level Forecasting in a Karst Aquifer Considering Connectivity Characteristics. Water. 2021; 13(19):2759. https://doi.org/10.3390/w13192759

Chicago/Turabian Style

Guo, Fei, Jing Yang, Hu Li, Gang Li, and Zhuo Zhang. 2021. "A ConvLSTM Conjunction Model for Groundwater Level Forecasting in a Karst Aquifer Considering Connectivity Characteristics" Water 13, no. 19: 2759. https://doi.org/10.3390/w13192759
