
A CNN-LSTM Model Based on a Meta-Learning Algorithm to Predict Groundwater Level in the Middle and Lower Reaches of the Heihe River, China

School of Mathematics and Physics, Lanzhou Jiaotong University, Lanzhou 730070, China
Author to whom correspondence should be addressed.
Water 2022, 14(15), 2377;
Submission received: 9 June 2022 / Revised: 28 July 2022 / Accepted: 29 July 2022 / Published: 31 July 2022
(This article belongs to the Section Hydrogeology)


In this study, a deep learning model is proposed that predicts groundwater levels accurately even when the available data are insufficient. The hybrid model we developed, CNN-LSTM-ML, combines a convolutional neural network (CNN) and a long short-term memory (LSTM) network to extract the time dependence of groundwater level on meteorological factors, and uses a meta-learning framework to maintain the network's performance when few training samples are available. The study predicts groundwater levels at 66 observation wells in the middle and lower reaches of the Heihe River, an arid region, and compares the results with those of other data-driven models. Experiments show that CNN-LSTM-ML outperforms the other models in prediction accuracy in both the short term (1 month) and the long term (12 months). When the training data are reduced by 50%, the MAE of the proposed model is 33.6% lower than that of LSTM. Ablation experiments show that the RMSE of CNN-LSTM-ML is 26.5% lower than that of the original CNN-LSTM structure. The model provides an effective method for groundwater level prediction and contributes to the sustainable management of water resources in arid regions.

1. Introduction

Groundwater is the world's largest distributed reservoir of fresh water [1] and meets the daily water needs of 2.5 billion people around the world [2]. In addition, more than half of the world's irrigation water for food production comes from groundwater [3]. In arid and semi-arid regions with limited rainfall, groundwater resources are especially important for economic and agricultural development [4,5]. In recent years, due to climate change, geological change, and population growth, global demand for groundwater has increased sharply [6,7,8]. At the same time, groundwater reserves have decreased rapidly [9], while groundwater management policies have been neglected. The depletion of groundwater resources brings a series of problems, including reduced food production and environmental degradation [10,11]. Therefore, accurate prediction of groundwater levels is crucial for government agencies seeking to ensure the sustainable use of groundwater resources [12].
The depth of the groundwater level varies over time and is affected by various factors. These include meteorological variables, such as precipitation, evapotranspiration, and snowmelt [13,14,15,16,17,18,19], as well as surface water around groundwater monitoring sites [20], geological hazards [21], and human activities [22,23]. Predicting the groundwater level is therefore a highly dynamic and complex problem.
A model that can accurately predict the groundwater level needs to consider time series information on both the existing level and the factors that can change it. Groundwater models based on modern geophysical methods study certain hydrogeological properties of aquifers through electrical conductivity [24,25,26]. These methods are expensive because they usually require manual surveying. With the development of satellite technology and measurement tools, remote sensing (RS) and geographic information system (GIS) technologies have played a major role in groundwater research [27,28,29]. These methods have clear advantages in data collection and mapping and provide technical support for groundwater level monitoring. However, because RS and GIS methods capture real-time conditions, their predictions of groundwater level rely on human judgment, and the results are imperfect.
By contrast, data-driven models, enabled by advances in sensor technology, have played a greater role in water resource management [30,31,32]. Multiple linear regression is one of the classic and most widely used methods for groundwater level prediction [33]. It takes into account the influence of factors such as rainfall, temperature, and river water level on the groundwater level [34], and the resulting model is highly interpretable. However, multiple regression performs poorly in predicting groundwater levels when the sample size is small. In addition, with too many input variables, problems such as multicollinearity are prone to occur and interfere with the performance of the model. The autoregressive integrated moving average (ARIMA) model, a classic time series forecasting method, performs well in short-term forecasting of groundwater levels [35,36,37]. However, as with multiple regression, the predictive performance of ARIMA degrades greatly when little data are available for modeling. In addition, setting appropriate parameters for ARIMA is complicated, and its long-term predictions can lack specificity.
With the rapid development of computer technology, deep learning models composed of large numbers of neurons have begun to be used for groundwater level prediction. These include multilayer perceptron (MLP) models, also known as artificial neural networks (ANNs). Because these models have high predictive accuracy and do not require the physical relationships among variables to be specified, they have many applications in groundwater prediction [38,39,40]. Recurrent neural networks (RNNs) are naturally suited to time series forecasting and have therefore been used for groundwater level forecasting [41]. However, because RNNs are prone to vanishing and exploding gradients, the long short-term memory (LSTM) network is usually used instead. LSTM fully considers the temporal dependence of the input sequence and performs outstandingly in extracting temporal information, so many models use LSTM for groundwater level prediction [42,43,44,45,46]. Although LSTM has clear advantages in long time series prediction, its ability to extract features of the influencing factors is limited, which restricts its capacity to capture groundwater dynamics under multiple influences. Convolutional neural networks (CNNs) are prominent in deep learning owing to their powerful feature extraction capabilities, and some studies have applied CNNs to groundwater level prediction [47,48]. However, a CNN extracts features from local neighborhoods, which can reduce its predictive accuracy on long time series. The long- and short-term time-series network (LSTNet) [49], which incorporates both CNN and LSTM components, is suited to multivariate time series forecasting, particularly for series with periodic changes. Correspondingly, when the amount of data is small or the periodicity of the sample data is weak, it generalizes poorly.
On the one hand, deep learning models for groundwater level prediction require large-scale training samples [50,51], while one of the main problems in groundwater prediction is the lack of sufficient data [52]; it is therefore difficult to accumulate enough training samples for the parameters of a deep learning model to converge. On the other hand, existing groundwater level prediction models are weak at modeling multivariate influencing factors at different times, and most of them cannot simulate the dynamic changes in groundwater level over time. As an emerging method in deep learning, meta-learning has shown strong performance in model generalization, self-learning, and few-shot classification and regression [53,54,55]. Meta-learning can handle problems with few training samples, but it currently lacks a network structure matched to the target task; providing such a structure is the purpose of this study. The model-agnostic meta-learning (MAML) algorithm [56] determines the initial parameters of the model for the target prediction task by meta-training on the training samples. This means that the model can learn by itself even when the sample size is insufficient and can adapt quickly to new prediction tasks, so good performance can still be obtained with a small number of target samples.
To address these problems, this study proposes a deep learning hybrid model, convolutional neural networks and long short-term memory networks based on meta-learning (CNN-LSTM-ML), to accurately predict groundwater levels. The main contributions of this study are as follows:
Existing methods are not effective for long-term prediction of groundwater levels and have limited ability to model multiple influencing factors. To resolve these problems, we design a hybrid CNN-LSTM deep learning network structure, in which the CNN module effectively extracts the multivariate features that affect the groundwater level and the LSTM module is naturally suited to long-term time series prediction.
Deep learning models usually need large-scale training data, whereas real groundwater prediction problems usually lack sufficient samples. In this study, a meta-learning algorithm is added to the CNN-LSTM network structure so that the model can train a meta-learner from fewer training samples to complete the groundwater level prediction task. To the authors' knowledge, no previous paper has applied meta-learning to groundwater level prediction, so this study also extends the range of applications of meta-learning algorithms.
To verify the performance of the CNN-LSTM-ML, this study is conducted on a real groundwater level dataset in the middle and lower reaches of the Heihe River. Experimental results show that in short-term prediction (1 month), the MAE of CNN-LSTM-ML is 11.7% lower than that of the multiple regression method. In long-term prediction (12 months), the RMSE of CNN-LSTM-ML is 5% lower than that of LSTM. At the same time, the model can still maintain a high prediction accuracy even when the training samples are reduced. All in all, the proposed model can accurately predict groundwater levels, which can help relevant government departments manage water resources and make evidence-based decisions.

2. Materials and Methods

2.1. Groundwater Level Prediction Process

In this study, the groundwater level prediction process is divided into three main stages: data processing, modeling and prediction, and model evaluation. The specific flow chart is shown in Figure 1.
Data processing identifies outliers and imputes missing data to ensure the integrity of the input data. Data normalization is then carried out to eliminate the influence of different dimensions (units and scales) on model predictions. To build the model, we use the meta-learning framework and add the CNN and LSTM networks to extract the spatial features and temporal dependencies of the data; details of the model are given in the following sections. Finally, standard evaluation metrics are used to evaluate the model.

2.2. Study Area and Data Processing

2.2.1. Study Area and Data

Located in an arid region of northwestern China, the Heihe is the second-largest inland river in China and is representative of inland rivers [57]. The Heihe River lies within 37.7°–42.7° N, 97.1°–102.0° E. It originates in the Qilian Mountains of Qinghai Province and flows northward through Gansu Province to the Inner Mongolia Autonomous Region, with a total length of about 821 km and a basin area of about 143,200 km². Glaciers, permafrost, forests, and deserts are found in the Heihe River Basin (HRB) [58], which supports the production and livelihoods of millions of people in this arid area.
As shown in Figure 2, this study is carried out in the middle and lower reaches of the HRB, covering Zhangye, Linze, Gaotai, and Jinta in Gansu Province and EJinaqi in the Inner Mongolia Autonomous Region. Simulated water level data from 66 observation wells in these five study areas, covering 1 January 2003 to 1 December 2012, were selected [59]; all are monthly data. Seven meteorological factors in the study area are selected: evaporation, precipitation, air pressure, relative humidity, sunshine duration, temperature, and wind speed. Historical meteorological data were collected from local meteorological observation sites with the support of local meteorological bureaus. This paper uses daily meteorological data from 1 January 2003 to 31 December 2012; many dates have missing values that require post-processing.

2.2.2. Missing Data Processing

Because of failures of data acquisition equipment, unstable data transmission networks, storage equipment failures, and possible human factors, the data used in this type of research are usually incomplete. Missing data can lead to biased parameter estimates, loss of key location information, increased standard errors, and poor generalizability of the results [60]. Because the amount of data collected in this experiment is small and the main focus of this study is prediction under few-sample conditions, missing values would strongly affect both the proposed model and the comparison models. Methods such as mean imputation, linear imputation, multiple imputation, and k-nearest neighbor (KNN) imputation are commonly used to handle missing values in time series [61,62,63].
To fill the missing values in the meteorological data collected in this study, we first used data from five national-level surface meteorological observation stations provided by the China Meteorological Data Service Centre (CMDC) (Zhangye 39°05′ N, 100°17′ E; Gaotai 39°22′ N, 99°50′ E; Linze 39°09′ N, 100°10′ E; Jinta 40°00′ N, 98°53′ E; EJinaqi 41°57′ N, 101°04′ E). If a single isolated value is still missing, we impute it by mean interpolation as follows:
$$x_{NA} = \frac{x_{t-1} + x_{t+1}}{2}$$
Here, $x_{NA}$ is the missing value at the current time $t$, and $x_{t-1}$ and $x_{t+1}$ are the complete data at the times immediately before and after the missing value, respectively. When multiple consecutive values are missing in the time series, we use linear interpolation instead.
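The two imputation rules can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the helper name `impute_gaps` is hypothetical, and it assumes a 1-D series with `NaN` marking missing values and at least one observed value. For an isolated gap, linear interpolation between the two neighbours gives exactly the mean $(x_{t-1}+x_{t+1})/2$, so a single `np.interp` call covers both cases.

```python
import numpy as np

def impute_gaps(series):
    """Fill NaN gaps in a 1-D series by linear interpolation (hypothetical helper).

    An isolated NaN becomes the mean of its two neighbours; a run of NaNs is
    filled along the straight line joining the observed values on either side.
    Leading/trailing NaNs take the nearest observed value (np.interp's edge
    behaviour), which is an assumption, not something the text specifies.
    """
    x = np.asarray(series, dtype=float).copy()
    missing = np.isnan(x)
    if missing.any():
        t = np.arange(x.size)
        x[missing] = np.interp(t[missing], t[~missing], x[~missing])
    return x

print(impute_gaps([1.0, np.nan, 3.0]))          # isolated gap -> [1. 2. 3.]
print(impute_gaps([0.0, np.nan, np.nan, 3.0]))  # consecutive gaps -> [0. 1. 2. 3.]
```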

2.2.3. Outlier Data Processing

We use the boxplot method to detect outliers. QL is the lower quartile: 25% of all observations are less than QL. QU is the upper quartile: 25% of observations are greater than QU. The interquartile range (IQR) is the difference between the upper quartile QU and the lower quartile QL, that is, IQR = QU − QL. Outliers are usually defined as values less than QL − 1.5IQR or greater than QU + 1.5IQR [64], and they can be treated as missing values.
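The boxplot rule translates directly into code. A minimal sketch, assuming a NaN-free 1-D array (the helper name is hypothetical); flagged points are set to `NaN` so that they can be passed to the same imputation step as genuinely missing values. Quartiles here use `np.percentile`'s default linear interpolation, and other quartile conventions can shift the fences slightly.

```python
import numpy as np

def iqr_outliers_to_nan(values, k=1.5):
    """Flag values outside [QL - k*IQR, QU + k*IQR] and replace them with NaN."""
    x = np.asarray(values, dtype=float)
    q_l, q_u = np.percentile(x, [25, 75])   # lower and upper quartiles
    iqr = q_u - q_l                         # interquartile range
    is_outlier = (x < q_l - k * iqr) | (x > q_u + k * iqr)
    cleaned = np.where(is_outlier, np.nan, x)
    return cleaned, is_outlier

cleaned, mask = iqr_outliers_to_nan([1.0, 2.0, 3.0, 4.0, 100.0])
print(mask)  # only the last value is flagged (QL = 2, QU = 4, upper fence = 7)
```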
Because there are many observation wells, we select one well from each of the Zhangye and Jinta study areas for demonstration. The results of the boxplot analysis are shown in Figure 3. One observation well in the Zhangye study area has an outlier, which must be analyzed to decide whether to remove it and impute a replacement or to keep it. In addition, because the original meteorological data are daily whereas the groundwater level data are monthly, the monthly averages of the meteorological data are calculated. After this processing, data usable for the experiments are obtained.
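Aligning the daily meteorological series with the monthly groundwater levels amounts to a calendar-month mean. A minimal pandas sketch, assuming the daily data sit in a `DataFrame` indexed by date; the column name and toy values are illustrative. The `"MS"` frequency labels each month by its first day, which matches the groundwater records dated on the 1st of each month.

```python
import numpy as np
import pandas as pd

def daily_to_monthly_mean(daily):
    """Average a date-indexed daily DataFrame into calendar months (NaNs are skipped)."""
    return daily.resample("MS").mean()

# Toy example: 1.0 every day in January 2003, 2.0 every day in February 2003.
days = pd.date_range("2003-01-01", "2003-02-28", freq="D")
daily = pd.DataFrame({"precipitation": np.where(days.month == 1, 1.0, 2.0)}, index=days)
monthly = daily_to_monthly_mean(daily)
print(monthly)  # two rows: 2003-01-01 -> 1.0, 2003-02-01 -> 2.0
```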

2.2.4. Data Normalization

Before being fed into some of the deep learning models used in this article, the data need to be normalized. Normalization eliminates the influence of the differing scales of the feature attributes, which accelerates model convergence and improves predictive accuracy. In the experiments, we adopt min-max normalization:
$$x' = \frac{x - x_{min}}{x_{max} - x_{min}}$$
where $x'$ is the normalized value, $x$ is the current observation, and $x_{max}$ and $x_{min}$ are the maximum and minimum values of the observed data, respectively.
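A small normalizer makes the formula and its inverse explicit; the inverse is presumably needed to map predictions back to original units before errors are reported. This is a sketch under stated assumptions: the class name is hypothetical, scaling is per feature column, and fitting $x_{min}$ and $x_{max}$ on the training portion only (then reusing them on test data) is our choice, since the text does not say which split the extremes come from.

```python
import numpy as np

class MinMaxNormalizer:
    """Per-column x' = (x - x_min) / (x_max - x_min) with its inverse (hypothetical helper)."""

    def fit(self, data):
        data = np.asarray(data, dtype=float)
        self.x_min = data.min(axis=0)
        self.x_max = data.max(axis=0)
        # Guard against constant columns, which would otherwise divide by zero.
        self.span = np.where(self.x_max > self.x_min, self.x_max - self.x_min, 1.0)
        return self

    def transform(self, data):
        return (np.asarray(data, dtype=float) - self.x_min) / self.span

    def inverse_transform(self, scaled):
        return np.asarray(scaled, dtype=float) * self.span + self.x_min

levels = np.array([[2.0], [4.0], [6.0]])      # toy groundwater levels, one column
scaler = MinMaxNormalizer().fit(levels)
scaled = scaler.transform(levels)             # column becomes 0.0, 0.5, 1.0
restored = scaler.inverse_transform(scaled)   # back to 2.0, 4.0, 6.0
```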

2.3. Convolutional Neural Network (CNN)

A CNN is a type of neural network that uses convolution operations and is specialized for processing grid-like data. The concept of convolutional networks was first proposed in 1989 to solve computer vision problems [65]. Owing to their weight sharing and local connectivity, convolutional neural networks have made great contributions to deep learning and have been widely used in various fields [66,67,68]. The main purpose of a convolutional neural network is to extract feature information from the input, which is done mainly by the convolutional layers. Because convolution is a linear operation whereas the input data usually have nonlinear characteristics, a nonlinear function, the activation function, must be introduced. In practical machine learning applications, the input data are usually not one-dimensional arrays. Here, two-dimensional discrete convolution is used as an example of the convolution operation, with the ReLU function as the activation function:
$$H(i,j) = (I * K)(i,j) = \sum_{m}\sum_{n} K(m,n)\, I(i+m,\, j+n)$$
$$f(x) = \max(0, x)$$
where $H(i,j)$ is the value at position $(i,j)$ of the feature map after convolution, $I$ is the input array, and $K(m,n)$ is the convolution kernel. $x$ is the feature map after the convolution operation, which is usually a tensor. As the formula shows, the kernel slides over the input array; at each position, the overlapping array elements are multiplied and summed, and after the kernel has traversed the whole input, the feature map matrix is obtained. (Strictly, this indexing is cross-correlation, which is what deep learning libraries implement under the name convolution.) Each element of the feature map represents feature information extracted by the convolution. Finally, the feature map is passed through the activation function to give it nonlinear characteristics, which strengthens the expressive ability of the entire network and plays a key role in data fitting.
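The sliding multiply-and-sum described above can be written out explicitly. A minimal NumPy sketch (helper names are illustrative) implementing the formula with no padding and stride 1, followed by ReLU:

```python
import numpy as np

def conv2d_valid(I, K):
    """H(i, j) = sum_m sum_n K(m, n) * I(i + m, j + n) wherever K fits inside I.

    No padding, stride 1; as in deep learning libraries, the kernel is not flipped.
    """
    I = np.asarray(I, dtype=float)
    K = np.asarray(K, dtype=float)
    out_h = I.shape[0] - K.shape[0] + 1
    out_w = I.shape[1] - K.shape[1] + 1
    H = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the kernel with the patch it currently covers and sum.
            H[i, j] = np.sum(K * I[i:i + K.shape[0], j:j + K.shape[1]])
    return H

def relu(x):
    return np.maximum(0.0, x)

I = np.arange(9.0).reshape(3, 3)
print(conv2d_valid(I, np.ones((2, 2))))                  # 2 x 2 map: 8, 12, 20, 24
print(relu(conv2d_valid(I, [[1.0, 0.0], [0.0, -1.0]])))  # raw values are all -4, so ReLU gives zeros
```

For instance, treating a window of 12 months by 7 meteorological factors as the input array and using a kernel that spans 3 months and all 7 factors yields a 10 × 1 feature map; these sizes are illustrative, not the paper's configuration.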
Because this research involves many meteorological factors that affect the groundwater level, the primary challenge of our work involves the feature engineering of multivariate time series. Therefore, it is natural to consider using convolutional neural networks for feature extraction. The entire feature engineering process is shown in Figure 4.
The collected multivariate time series is regarded as a two-dimensional array, the corresponding filters are used for feature extraction and integration, and the activation function is used for nonlinear fitting. The final feature map stores the feature information that affects the groundwater level.

2.4. Long-Short Term Memory (LSTM) Network

LSTM is a neural network for learning long-term dependencies in time series data. It was first proposed in 1997 [69] and now has many applications in time series forecasting [70,71,72]. LSTM is a variant of the recurrent neural network that mitigates the vanishing gradient problem in traditional RNN training and can therefore preserve long-term information. The internal structure of LSTM is shown in Figure 5.
The biggest difference between LSTM and a traditional RNN is the introduction of memory cells, which give LSTM the ability to memorize long-term information in the input sequence. The memory cell is denoted $C_t$ in Figure 5. The state of the memory cell passes directly along the recurrent chain, and the LSTM chooses to update or forget the cell state through various gates. Specifically, the LSTM has three inputs at time $t$: the current input $X_t$, the previous hidden state $h_{t-1}$, and the previous cell state $C_{t-1}$. The forget gate $f_t$ controls which information is discarded from the cell state, the input gate $i_t$ controls which new information is added to it, and the output gate $O_t$ is used to compute the current hidden state $h_t$. The final LSTM outputs are the current cell state $C_t$ and the hidden state $h_t$. The gating units are computed as follows:
$$f_t = \sigma(W_f [h_{t-1}, X_t] + b_f)$$
$$i_t = \sigma(W_i [h_{t-1}, X_t] + b_i)$$
$$O_t = \sigma(W_o [h_{t-1}, X_t] + b_o)$$
where $[h_{t-1}, X_t]$ denotes the concatenation of the previous hidden state and the current input, which are jointly fed into the network. $W_f$, $W_i$, $W_o$ and $b_f$, $b_i$, $b_o$ are the corresponding network parameters: $W_f$, $W_i$, and $W_o$ are the weight matrices of the gates, and $b_f$, $b_i$, and $b_o$ are the biases. $\sigma$ is the sigmoid activation function of each gate. The cell state is updated as follows:
$$\tilde{C}_t = \tanh(W_c [h_{t-1}, X_t] + b_c)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
where $\odot$ denotes element-wise multiplication and $\tanh$ is the hyperbolic tangent activation function. The updated cell state $C_t$ consists of the part of the previous cell state $C_{t-1}$ retained by the forget gate plus the new candidate values $\tilde{C}_t$ admitted by the input gate. Finally, the updated cell state is used to generate the new hidden state through the output gate:
$$h_t = O_t \odot \tanh(C_t)$$
The output gate ensures that only relevant information is included in the new hidden state. It is worth noting that the memory cell is updated additively, which helps prevent the gradient from vanishing during backpropagation and allows LSTM to retain the cell state over long periods.
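The gate equations map one-to-one onto a single recurrent step. The NumPy sketch below is illustrative only: parameter shapes, the dictionary layout, the small random initialization, and the 12-month by 7-factor toy sequence are assumptions, and a real model would use a framework's LSTM layer with trained weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(X_t, h_prev, C_prev, params):
    """One LSTM time step following the gate equations above.

    params holds W_f, W_i, W_o, W_c of shape (hidden, hidden + n_inputs)
    and b_f, b_i, b_o, b_c of shape (hidden,).
    """
    z = np.concatenate([h_prev, X_t])                    # [h_{t-1}, X_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])      # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])      # input gate
    O_t = sigmoid(params["W_o"] @ z + params["b_o"])      # output gate
    C_tilde = np.tanh(params["W_c"] @ z + params["b_c"])  # candidate cell values
    C_t = f_t * C_prev + i_t * C_tilde                    # additive, element-wise cell update
    h_t = O_t * np.tanh(C_t)                              # new hidden state
    return h_t, C_t

def init_params(n_inputs, hidden, rng):
    params = {}
    for gate in ("f", "i", "o", "c"):
        params[f"W_{gate}"] = 0.1 * rng.standard_normal((hidden, hidden + n_inputs))
        params[f"b_{gate}"] = np.zeros(hidden)
    return params

# Run a 12-step sequence of 7 (normalized, synthetic) meteorological features through the cell.
rng = np.random.default_rng(0)
params = init_params(n_inputs=7, hidden=4, rng=rng)
h, C = np.zeros(4), np.zeros(4)
for X_t in rng.random((12, 7)):
    h, C = lstm_cell_step(X_t, h, C, params)
```

With the forget gate saturated at 1 and the input gate at 0, the cell state passes through unchanged, which is the mechanism behind long-term memory described in the text.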

2.5. Meta-Learning Algorithm

At present, groundwater levels are monitored mainly by two methods: instrument collection and manual measurement. The collected data are mostly monthly, and many records have missing values, so the amount of data available for research is generally limited. Most deep learning models require large-scale datasets for training and usually fail when there are few data points [73]. To address this issue, a meta-learning algorithm is introduced in this study. Meta-learning was proposed to address the limited generalization of traditional neural network models and their poor adaptability to new types of tasks. The aim is for the model to acquire a “learning to learn” ability, so that it can build on existing “knowledge” and quickly learn new tasks.
Considering the structure of our overall model and the training process, the model-agnostic meta-learning (MAML) algorithm [56] was finally chosen. The purpose is to train a deep learning model with better predictions and stronger generalization ability with limited data. We will describe the content of MAML in detail below.
The key idea of MAML is to find good initial parameters during training, from which the model can perform well on new tasks after only a few gradient steps. Suppose there is a distribution $p(T)$ over a set of tasks and a model $f_\theta$ with parameters $\theta$. A batch of tasks $T_i$ is drawn from the task distribution. For each task $T_i$, the parameters are adapted from the initial $\theta$ to $\theta'_i$. In MAML, the adapted parameters $\theta'_i$ are computed by one or more gradient descent steps on task $T_i$. With a single gradient step:
$$\theta'_i = \theta - \alpha \nabla_\theta \mathcal{L}_{T_i}(f_\theta)$$
where $\alpha$ is the inner step size (a hyperparameter), $\mathcal{L}_{T_i}$ is the loss of task $T_i$, and $\nabla_\theta$ denotes the gradient with respect to $\theta$. After this update, each sampled task has its own adapted parameters $\theta'_i$. The ultimate goal is for $\theta$ to perform well, after adaptation, on any task $T_i$ drawn from $p(T)$. The objective of MAML can therefore be expressed as:
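$$\min_\theta \sum_{T_i \sim p(T)} \mathcal{L}_{T_i}(f_{\theta'_i}) = \min_\theta \sum_{T_i \sim p(T)} \mathcal{L}_{T_i}\left(f_{\theta - \alpha \nabla_\theta \mathcal{L}_{T_i}(f_\theta)}\right)$$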
In other words, the goal is to find the best model parameter θ to minimize the sum of the losses of all tasks after the gradient update. The updating process of the model parameters θ is as follows:
$$\theta \leftarrow \theta - \beta \nabla_\theta \sum_{T_i \sim p(T)} \mathcal{L}_{T_i}(f_{\theta'_i})$$
where β is a hyperparameter. The above steps are also known as meta-optimization or meta-updating.
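Because $\theta'_i$ itself depends on $\theta$, differentiating the meta-objective requires the chain rule through the inner update. The worked step below is the standard MAML derivation, added for clarity rather than taken from this paper. Following Algorithm 1, which adapts on $D_i^{train}$ and meta-updates on $D_i^{test}$, the two losses are written $\mathcal{L}^{train}_{T_i}$ and $\mathcal{L}^{test}_{T_i}$:

```latex
\begin{aligned}
\theta'_i &= \theta - \alpha \nabla_\theta \mathcal{L}^{train}_{T_i}(f_\theta) \\
\frac{\partial \theta'_i}{\partial \theta} &= I - \alpha \nabla^2_\theta \mathcal{L}^{train}_{T_i}(f_\theta) \\
\nabla_\theta \mathcal{L}^{test}_{T_i}(f_{\theta'_i}) &= \left(I - \alpha \nabla^2_\theta \mathcal{L}^{train}_{T_i}(f_\theta)\right) \nabla_{\theta'} \mathcal{L}^{test}_{T_i}(f_{\theta'}) \Big|_{\theta' = \theta'_i}
\end{aligned}
```

The Hessian is symmetric, so no transpose is needed. This Hessian term is what makes exact MAML a second-order method; the first-order approximation drops it and uses the gradient at $\theta'_i$ directly, trading some accuracy for lower cost.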

2.6. CNN-LSTM-ML Prediction Model

2.6.1. Problem Definition

Groundwater levels are usually affected by a variety of factors, including changes in climate, changes in geological and topographical conditions, and human activities. These factors usually vary over time and across geographical locations. The change in groundwater level in a region is usually reflected directly by the water level data of each monitoring point. Because of problems such as outdated monitoring equipment and insufficient human resources in arid areas, the amount of data that can be collected is limited. Therefore, in this study, we treat the prediction and modeling of groundwater levels in arid areas as a multivariate time series prediction problem with few samples. By predicting the groundwater level of each monitoring well, this method provides a dynamic simulation of groundwater. The notation used in the prediction process is as follows:
Definition 1
(Monitoring wells). In this study, the monitoring wells in the data are denoted $S = \{s_1, s_2, \dots, s_n\}$, where $n$ is the total number of monitoring wells in the study area.
Definition 2
(Relevant features that affect groundwater level). As mentioned above, the meteorological factors used to predict the groundwater level are evaporation, precipitation, pressure, humidity, sunshine duration, average temperature, and wind speed. Let $X_t^j$ denote the $j$-th factor influencing a monitoring well $s$ in time interval $t$. All the features of monitoring well $s$ in time interval $t$ are then $X_t = \{X_t^1, \dots, X_t^k\}$, where $k$ is the total number of features.
Definition 3
(Groundwater level situation). In this study, the groundwater level of each monitoring well $s$ in time interval $t$ is denoted $y_t$, which is the prediction target of the model.
Definition 4
(Groundwater level training/prediction task). A groundwater level training/prediction task $T_i$ consists of a set $D = \{X_i^t, y_i^t\}$ over a continuous time interval of length $T$. In this study, each task $T_i$ is divided into training data $D_i^{train}$ and test data $D_i^{test}$, namely $T_i = \{(X_i^1, y_i^1), \dots, (X_i^T, y_i^T)\} = \{D_i^{train}, D_i^{test}\}$.
Usually, to predict the groundwater level of a specified target well $s$, all historical groundwater level data and meteorological factor data are fed into a model $f_\phi$ with parameters $\phi$ for training, and the trained model is then used to make predictions. However, because this paper solves the problem with a meta-learning algorithm, the problem is reformulated as follows, taking one-step-ahead prediction as an example. We divide the historical groundwater level data and related meteorological feature data into $\tau$ tasks and assume that all tasks are sampled from the same distribution, i.e., $T_i \sim p(T)$. The meta-learning process is then divided into a meta-training stage and a fine-tune testing stage. In the meta-training stage, the goal is to obtain a model $f_\theta$ that retains high predictive accuracy when data are insufficient. In the fine-tune testing stage, the trained model $f_\theta$ is adapted on the few target samples $D^{train} = \{(X_1, y_1), \dots, (X_{t-1}, y_{t-1})\}$, yielding a fine-tuned model $f_{\theta^*}$ that completes the prediction task:
$$\hat{y}_t = f_{\theta^*}(D^{test})$$
where $\hat{y}_t$ is the final prediction.
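The paper does not spell out how the monthly records are cut into tasks, so the following is only one plausible construction, stated as an assumption: each task is a contiguous window of `task_len` months from a single well, and its last `n_test` months form $D_i^{test}$ (setting `n_test=1` gives the one-step-ahead case above). The helper name `make_tasks` and the window, stride, and split sizes are hypothetical, and the data are synthetic placeholders.

```python
import numpy as np

def make_tasks(X, y, task_len, n_test, stride):
    """Cut one well's aligned series into tasks T_i = {D_i^train, D_i^test} (hypothetical scheme).

    X: (n_months, k) meteorological features; y: (n_months,) groundwater levels.
    Each task is a contiguous window; its last n_test months are held out as D_i^test.
    """
    if not 0 < n_test < task_len <= len(y):
        raise ValueError("need 0 < n_test < task_len <= number of months")
    split = task_len - n_test
    tasks = []
    for start in range(0, len(y) - task_len + 1, stride):
        Xw, yw = X[start:start + task_len], y[start:start + task_len]
        tasks.append({"train": (Xw[:split], yw[:split]), "test": (Xw[split:], yw[split:])})
    return tasks

# 120 months (2003-2012) of 7 features for one well, with synthetic placeholder values.
rng = np.random.default_rng(0)
X, y = rng.random((120, 7)), rng.random(120)
tasks = make_tasks(X, y, task_len=24, n_test=6, stride=12)
print(len(tasks), tasks[0]["train"][0].shape, tasks[0]["test"][0].shape)  # 9 (18, 7) (6, 7)
```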

2.6.2. Model Structure

This subsection introduces the network structure of CNN-LSTM-ML and the role of each module in capturing complex groundwater level dynamics. First, the algorithm samples a batch of tasks $T_i$ from the distribution of groundwater level training/prediction tasks $p(T)$. Each sampled task $T_i$ is then divided into training data $D_i^{train}$ and test data $D_i^{test}$. The training data are used to find the optimal parameters for each task, while the test data are used to update the parameters of the entire model. Both pass through the same network structure. The training data first pass through the CNN, which captures the multiple influences of each meteorological factor on the groundwater level and integrates the spatial relationships among the data, facilitating feature extraction. Because the observations are essentially time series, the feature maps output by the CNN are then fed into the LSTM to learn the long-term dependencies of the sequence. The output of the LSTM is passed to a fully connected layer that produces the prediction, and gradient descent is used to minimize the loss, yielding the optimal parameters for the task. The test data pass through the same network structure, but with the task-specific optimal parameters. On this basis, gradient descent is performed again to minimize the loss, which finally yields the updated parameters of the entire model. Figure 6 shows the structure of the training phase of CNN-LSTM-ML.
It is worth noting that the process in the fine-tune testing stage is roughly the same as that in the training stage. The main difference is that the fine-tune testing stage does not need to initialize the model parameters, but rather uses the trained model parameters. Second, the fine-tune testing stage does not have a second gradient update, but instead uses the results of one gradient update to update the model parameters. Subsequent subsections will detail the training and testing algorithm flow.

2.6.3. CNN-LSTM-ML Meta-Training and Fine-Tune Testing

From a machine learning perspective, groundwater level prediction is a regression problem, and we use a meta-learning algorithm to address the small-sample setting of this regression. The goal is to learn a function of the groundwater table depth and ultimately to make predictions from few data points. Because this is a regression problem, we use the mean squared error loss function:
$$\mathcal{L}_{T_i}(f_\theta) = \sum_{X_j, y_j \sim T_i} \left\| f_\theta(X_j) - y_j \right\|_2^2$$
where $X_j$ and $y_j$ are an input sample and its label drawn from task $T_i$, respectively, and $\|\cdot\|_2$ denotes the Euclidean (L2) norm.
The CNN-LSTM-ML model designed in this study is divided into two processes: meta-training and fine-tune testing (see Algorithms 1 and 2 for details). The meta-training stage aims to find optimal initial parameters so that the model can still make accurate predictions when only a small amount of data is available.
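Algorithm 1 CNN-LSTM-ML Meta-Training
Require: Distribution of tasks $p(\mathcal{T})$; step size hyperparameters $\alpha$, $\beta$
1: Randomly initialize parameters $\theta$
2: while not done do
3: Sample a batch of tasks from $p(\mathcal{T})$, i.e., $(\mathcal{T}_1, \mathcal{T}_2, \cdots, \mathcal{T}_i) \sim p(\mathcal{T})$
4: for all $\mathcal{T}_i$ do
5: Set up the training set $\mathcal{D}_i^{\text{train}}$ for task $\mathcal{T}_i$
6: Evaluate $\nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)$ using $\mathcal{D}_i^{\text{train}}$ and $\mathcal{L}_{\mathcal{T}_i}$ in Equation (15)
7: Compute adapted parameters with gradient descent: $\theta_i' = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)$
8: Set up the test set $\mathcal{D}_i^{\text{test}}$ for task $\mathcal{T}_i$ for the meta-update
9: end for
10: Update $\theta \leftarrow \theta - \beta \nabla_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}(f_{\theta_i'})$ using each $\mathcal{D}_i^{\text{test}}$ and $\mathcal{L}_{\mathcal{T}_i}$ in Equation (15)
11: end while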
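To make the two-level update in Algorithm 1 concrete, the sketch below runs it for a linear stand-in model $f_\theta(X) = X\theta$ instead of CNN-LSTM, so that the second-order term of the meta-gradient has a closed form. The synthetic tasks, step sizes, batch size, and number of iterations are illustrative assumptions; with a neural network, the same gradient would normally be obtained by automatic differentiation. Comments map each line to the step numbers of Algorithm 1.

```python
import numpy as np

def loss(theta, X, y):
    """Equation (15) for a linear stand-in model f_theta(X) = X @ theta."""
    r = X @ theta - y
    return float(r @ r)

def grad(theta, X, y):
    """Gradient of the summed squared error with respect to theta."""
    return 2.0 * X.T @ (X @ theta - y)

def sample_task(rng, w_shared, n_train=5, n_test=10):
    """Hypothetical task (e.g. one well): weights deviate slightly from w_shared."""
    w = w_shared + 0.1 * rng.normal(size=w_shared.size)
    X_tr = rng.normal(size=(n_train, w.size))
    X_te = rng.normal(size=(n_test, w.size))
    return (X_tr, X_tr @ w), (X_te, X_te @ w)

def maml_train(rng, w_shared, alpha=0.01, beta=0.001, n_tasks=5, n_iters=300):
    d = w_shared.size
    theta = 0.1 * rng.normal(size=d)                                  # 1: random init
    for _ in range(n_iters):                                          # 2: while not done
        meta_grad = np.zeros(d)
        for _ in range(n_tasks):                                      # 3-4: batch of tasks
            (X_tr, y_tr), (X_te, y_te) = sample_task(rng, w_shared)   # 5, 8: D_train, D_test
            theta_i = theta - alpha * grad(theta, X_tr, y_tr)         # 6-7: inner step
            # Second-order term: d(theta_i)/d(theta) = I - alpha * H_train,
            # where H_train = 2 X_tr^T X_tr is the Hessian of the training loss.
            jac = np.eye(d) - alpha * 2.0 * X_tr.T @ X_tr
            meta_grad += jac.T @ grad(theta_i, X_te, y_te)
        theta = theta - beta * meta_grad                              # 10: meta-update
    return theta

rng = np.random.default_rng(0)
w_shared = np.array([1.0, -2.0, 0.5])
theta = maml_train(rng, w_shared)
```

Because every synthetic task is a small perturbation of `w_shared`, the meta-learned initialization should settle near `w_shared`, from which a single inner gradient step adapts well to a new task.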
The fine-tune testing stage adapts the parameters learned in the meta-training stage to new tasks and then makes predictions on the test set. The procedure is shown in Algorithm 2:
Algorithm 2 CNN-LSTM-ML Fine-tune Testing
Require: Distribution of tasks $p(\mathcal{T})$; step size hyperparameter $\gamma$
1: Well-trained parameters $\theta$
2: while not done do
3: Sample a task from $p(\mathcal{T})$, i.e., $\mathcal{T}_i \sim p(\mathcal{T})$
4: Set up the training set $\mathcal{D}_i^{\text{train}}$ for task $\mathcal{T}_i$
5: Evaluate $\nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)$ using $\mathcal{D}_i^{\text{train}}$ and $\mathcal{L}_{\mathcal{T}_i}$ in Equation (15)
6: Compute adapted parameters with gradient descent: $\theta_i^* = \theta - \gamma \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)$
7: Set up the test set $\mathcal{D}_i^{\text{test}}$ for task $\mathcal{T}_i$ for prediction
8: Compute predictions $\hat{y} = f_{\theta_i^*}(\mathcal{D}_i^{\text{test}})$
9: end while
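Algorithm 2 reduces to a single adaptation step followed by prediction. The sketch below uses a linear stand-in for $f_\theta$; the meta-trained parameters, the new task, and $\gamma = 0.01$ are hypothetical values chosen only for illustration.

```python
import numpy as np

def grad(theta, X, y):
    """Gradient of Equation (15) for a linear stand-in model f_theta(X) = X @ theta."""
    return 2.0 * X.T @ (X @ theta - y)

def fine_tune_and_predict(theta, X_tr, y_tr, X_te, gamma=0.01):
    theta_star = theta - gamma * grad(theta, X_tr, y_tr)  # step 6: one gradient step
    return theta_star, X_te @ theta_star                   # step 8: y_hat = f_{theta*}(X_te)

rng = np.random.default_rng(42)
theta = np.array([1.0, -2.0, 0.5])            # hypothetical meta-trained parameters
w_new = theta + np.array([0.5, -0.5, 0.3])    # hypothetical new well (task)
X_tr, X_te = rng.normal(size=(20, 3)), rng.normal(size=(50, 3))
y_tr, y_te = X_tr @ w_new, X_te @ w_new
theta_star, y_hat = fine_tune_and_predict(theta, X_tr, y_tr, X_te)
```

Unlike Algorithm 1, no outer update follows: the adapted parameters $\theta_i^*$ are used directly for prediction.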

2.6.4. Model Evaluation

To describe the predictive performance of CNN-LSTM-ML, this study uses two common accuracy metrics, the root mean square error (RMSE) and the mean absolute error (MAE) [74], calculated as follows:
$$\mathrm{MAE} = \frac{1}{N} \sum_{n=1}^{N} \left| \hat{y}_n - y_n \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left( \hat{y}_n - y_n \right)^2}$$
where $\hat{y}_n$ is the groundwater level predicted by the model, $y_n$ is the observed groundwater level, and $N$ is the number of predicted values.
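Both metrics are straightforward to compute. The sketch below uses plain Python lists; the example values are hypothetical and are not data from this study.

```python
import math

def mae(y_pred, y_obs):
    """Mean absolute error between predicted and observed groundwater levels."""
    return sum(abs(p - o) for p, o in zip(y_pred, y_obs)) / len(y_obs)

def rmse(y_pred, y_obs):
    """Root mean square error: square root of the mean squared difference."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(y_pred, y_obs)) / len(y_obs))

# Hypothetical predicted and observed values (not data from this study).
y_pred = [1.0, 2.0, 4.0]
y_obs = [1.0, 3.0, 2.0]
print(mae(y_pred, y_obs), rmse(y_pred, y_obs))
```

Because RMSE squares the errors before averaging, it is always at least as large as MAE and weights large deviations more heavily. In the toy example, the single error of 2 pulls RMSE (about 1.29) above MAE (1.0).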

3. Results

3.1. Meteorological Factor Time Series

After data processing, the meteorological data of a station in Gaotai County are shown as an example in Figure 7. The annual average temperature is 8.5 °C, the maximum monthly average temperature is 23.5 °C (July), the minimum monthly average temperature is −10.2 °C (January), and the maximum temperature shows a rising trend. Monthly average evapotranspiration is higher in spring and summer each year, and the maximum monthly average evaporation is 5 mm. Precipitation in the arid study area is sparse: one to two months each year have no precipitation, and the annual average rainfall is about 24 mm. Over the ten-year period, the maximum sunshine duration in the study area reached 11.7 h, and the monthly average sunshine duration reached 8.7 h. Wind speed is relatively stable from year to year, with occasional windy weather in spring. In addition, few months have a relative humidity above 60%, and most monthly values lie between 40% and 60%.
In general, the region has a dry climate with little rainfall, high evaporation, large temperature differences between day and night, and long sunshine hours. Groundwater resources are therefore of great significance to the lives of people in this arid area.

3.2. Prediction Performance

In this section, we use different models to predict the groundwater level of all monitoring wells in the middle and lower reaches of the Heihe River. In the model validation phase, all models, including CNN-LSTM-ML, were evaluated by MAE and RMSE, and the overall result is the average of these metrics over all wells. Table 1 details the prediction performance.
The results in Table 1 show that the proposed CNN-LSTM-ML model outperforms the other models. In short-term prediction in particular, both its MAE and RMSE are small. As the prediction horizon increases, the errors of all models increase, but Table 1 clearly shows that the error of CNN-LSTM-ML grows slowly and its predictive performance remains the best. We attribute this to the model's ability to extract features from multiple meteorological factors; it also reflects the adaptability and robustness of the meta-learning algorithm on new tasks. Compared with the classic multiple linear regression method, the MAE and RMSE of the proposed model are 0.117 and 0.254 lower, respectively, at a one-month horizon, and 21.5% and 44.6% lower, respectively, at a twelve-month horizon. Compared with the other deep learning models, the MAE of CNN-LSTM-ML at a one-month horizon is 0.014, 0.021, and 0.028 lower than that of MLP, LSTM, and LSTNet, respectively, and at the long-term (12-month) horizon its RMSE is 23.3%, 5%, and 9.1% lower than that of MLP, LSTM, and LSTNet, respectively. Table 1 also shows that the long-term (12-month) prediction ability of ARIMA is slightly stronger than that of MLP, whereas the short-term (1-month) prediction ability of MLP is better than that of the other baseline deep learning models; the performance of CNN-LSTM-ML remains consistently stable.
We further evaluate CNN-LSTM-ML by comparing plots of observed and predicted values, since line charts make the fit between observed and predicted data easier to assess. The visualization is shown in Figure 8.
Because Table 1 reports the average performance of each model over all monitoring wells, and such averages can be influenced by extreme values, we also show predictions for a randomly selected monitoring well. As Figure 8 shows, the predictions of CNN-LSTM-ML fit the well observations more closely than those of the other models. From 1 January 2012 to 1 June 2012, the groundwater depth at this well shows a significant downward trend. Most models capture this decline, and the decline predicted by CNN-LSTM-ML is clearly the closest to the measured values, whereas multiple linear regression and ARIMA predict an upward trend over this period, which does not match the observations. After 1 June 2012, the groundwater level depth increased; CNN-LSTM-ML predicts this change well, whereas LSTM appears to predict the timing of the change too early, which increases its error. The prediction errors of the other models all grow with the number of months, and faster than that of CNN-LSTM-ML. Figure 8 also shows that, in terms of overall prediction and fit, ARIMA and MLP perform slightly better than LSTM at some time points, which may be related to the amount of training data. This also suggests that the deep learning model using the meta-learning algorithm is robust in few-shot regression.
To verify the predictive ability of CNN-LSTM-ML when the sample size is insufficient, we also reduced the existing training data by different proportions and trained the models on the reduced datasets. The validation results are shown in Figure 9.
Figure 9 shows that, with less training data, CNN-LSTM-ML still outperforms the other models. With the training data reduced by 10%, the MAE and RMSE of CNN-LSTM-ML are approximately 30% and 17% lower, respectively, than those of ARIMA. With the training data reduced by 30%, the predictive performance of CNN-LSTM-ML is close to that of the other models except ARIMA, but it remains the best. With the training data reduced by 50%, the MAE and RMSE of CNN-LSTM-ML are 0.145 and 0.147 lower, respectively, than those of multiple linear regression. These results show that CNN-LSTM-ML retains strong predictive ability when data are insufficient, which we attribute to the meta-learning algorithm; they also suggest that the model can still extract useful features from the reduced data. Figure 9 also shows that the predictive ability of ARIMA drops sharply once the data volume is reduced by 10%. Relative to training on the full dataset, a 10% reduction increases the MAE of CNN-LSTM-ML by only 0.069 and its RMSE by 0.21, while the MAE and RMSE of LSTM rise by 0.114 and 0.215, respectively. With a 50% reduction, CNN-LSTM-ML remains robust: its MAE increases by 60% and its RMSE by 70% relative to the full dataset, whereas the MAE and RMSE of LSTM rise by 90.8% and 94.5%, respectively. ARIMA performs poorly, with its MAE and RMSE rising by 2.612 and 2.538, respectively. In contrast, MLP and multiple linear regression hold up comparatively well when the amount of training data is reduced.
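The text does not state how the training data were reduced. One plausible scheme for monthly series, sketched below as an assumption rather than the authors' procedure, is to drop the earliest share of each training series so that the remaining records stay contiguous and the test period is unchanged. The helper name and the 120-month placeholder series are hypothetical.

```python
def reduce_training_series(series, fraction_removed):
    """Drop the earliest `fraction_removed` share of a chronologically ordered series.

    Hypothetical helper: keeps the most recent records so the retained training
    window stays contiguous and adjacent to the (unchanged) test period.
    """
    if not 0.0 <= fraction_removed < 1.0:
        raise ValueError("fraction_removed must be in [0, 1)")
    n_drop = int(round(len(series) * fraction_removed))
    return series[n_drop:]

months = list(range(120))   # placeholder: ten years of monthly records
reduced = {r: reduce_training_series(months, r) for r in (0.1, 0.3, 0.5)}
```

Dropping the oldest records rather than random months avoids gaps inside the input windows, at the cost of also shortening the span of trend and seasonal history available to models such as ARIMA.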
In addition to the above experiments, we conduct ablation experiments to demonstrate the contribution of the meta-learning algorithm to the model's predictive performance. In these experiments, we remove the meta-learning training framework, use the CNN-LSTM network structure alone for prediction, and compare the results with those of CNN-LSTM-ML. The results are shown in Figure 10 and Figure 11.
The ablation results clearly show that the model with the meta-learning algorithm provides better predictive performance than the model without it, especially when few samples are available. As Figure 10 shows, CNN-LSTM-ML consistently outperforms CNN-LSTM in both short-term and long-term prediction. In a further experiment, we randomly select one monitoring well and use only its data for training and prediction; the results are shown in Figure 11. Despite the small number of training samples, the values predicted by CNN-LSTM-ML remain very close to the observations. By contrast, the fit of CNN-LSTM is poor, which again suggests that deep learning models require large amounts of training data. However, the extreme point of 1 September 2021 in Figure 11 shows that, with data from only one well, the ability of CNN-LSTM-ML to predict rapidly changing groundwater levels needs improvement; we leave this for future study.

4. Discussion

In this study, we propose a method that uses a meta-learning algorithm as the training and prediction framework. The model extracts features relating groundwater levels to meteorological factors through a CNN, and an LSTM layer takes the multivariate feature time series output by the CNN as input and captures the long-term dependencies in the series. Because the entire network is trained within a meta-learning framework, the model retains strong predictive ability when the sample size is small. In comparative experiments with baseline models, CNN-LSTM-ML outperforms the other models in both short-term and long-term prediction. Compared with the traditional multiple linear regression model, CNN-LSTM-ML greatly improves prediction performance; we believe this is due to the limitations of multiple linear regression in fitting nonlinear relationships, which is consistent with [33]. Compared with LSTNet, a deep learning model for multivariate time series prediction, CNN-LSTM-ML also produces considerably better predictions. This improvement may stem from the advantages of meta-learning for regression on limited data [56]. LSTNet may also be better suited to time series with fixed patterns [49], whereas some of the data in this study lack periodic characteristics. This is consistent with Table 1, in which LSTM outperforms LSTNet at some time steps.
Interestingly, the predictive power of the models varies widely across observation wells, and deep learning models are not necessarily better than traditional time series forecasting models. For example, ARIMA performs slightly better than LSTM at certain time points, and LSTM, although suited to predicting longer horizons, may underperform MLP on some observation wells. We believe this may be related to the size of the training data, as the experiments with reduced datasets suggest. Reducing the training dataset affects all models, but CNN-LSTM-ML is more robust, and ARIMA is affected most severely. This may be because reducing the dataset also removes part of the trend and seasonal characteristics of the data, so that ARIMA can no longer capture them well and its forecasting ability declines. The predictive performance of the deep learning models LSTM and LSTNet is also greatly reduced when the amount of training data is sharply reduced. With the training data reduced by 50%, the traditional multiple linear regression model and MLP perform better than these deep learning models. This shows that deep learning models depend heavily on the size of the training set: if the available data are insufficient, their performance may be similar to that of traditional models. This is consistent with the findings of [50,51].
Deep learning models are often described as “black boxes” [75]: they have strong predictive ability but weak interpretability, and CNN-LSTM-ML cannot fully explain the impact of each meteorological factor on the groundwater level. From the perspective of model optimization, attention mechanisms have been widely used in deep learning [76,77,78,79,80,81,82]. An attention mechanism mimics human learning by focusing on the most important parts of the information; when computing power is limited, computing resources can be allocated to the more important computations, improving efficiency. In follow-up research, we will consider adding an attention mechanism to the model. On the one hand, this may improve computation speed; on the other hand, the weight that the attention mechanism assigns to each meteorological factor may help explain changes in groundwater level and improve the model’s interpretability. In addition, CNN-LSTM-ML requires the dataset to be divided into training and test data in both the meta-training and fine-tune testing stages, and parameter updating involves multiple gradient updates, which consumes considerable computing resources. We will consider optimizing the entire process in future versions of the model.
Besides meteorological factors, the groundwater level is also affected by factors such as landforms, aquifer lithology, and human activities. Future research should collect data on these factors to update the model and improve its predictive accuracy. In addition, this study uses monthly data; more fine-grained multivariate time series could be used in future studies to simulate groundwater level dynamics in real time.
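As one possible form of the factor-level attention discussed above, the sketch below scores each meteorological factor in an input window, normalizes the scores with a softmax, and re-weights the factors. The resulting weights are the kind of per-factor quantity that could be inspected as a relative importance score. The scoring vector `w_score` and the window shape are hypothetical, and this is not a design proposed or evaluated in this paper.

```python
import numpy as np

def factor_attention(x, w_score):
    """Softmax attention over the F meteorological factors of a (T, F) input window.

    w_score: (T,) hypothetical learnable scoring vector.
    Returns the (F,) attention weights and the re-weighted (T, F) window.
    """
    scores = w_score @ x                  # one scalar score per factor
    scores = scores - scores.max()        # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights, x * weights           # broadcast weights across time steps

rng = np.random.default_rng(3)
x = rng.normal(size=(12, 6))              # placeholder window: 12 months x 6 factors
weights, x_weighted = factor_attention(x, rng.normal(size=12))
```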

5. Conclusions

This study proposes CNN-LSTM-ML, a model whose network structure combines CNN and LSTM and is trained within a meta-learning framework, to predict groundwater levels in the middle and lower reaches of the Heihe River. The CNN-LSTM structure is designed to handle the complex nonlinear relationships among multiple meteorological factors and the long-term temporal dependence involved in groundwater level prediction. Because groundwater level monitoring data are limited, we use a meta-learning algorithm to give the deep learning model better initial parameters that converge quickly, thereby addressing prediction with limited data. Experimental results show that CNN-LSTM-ML outperforms the other models in both short-term and long-term prediction, and it still achieves the best prediction performance when fewer samples are available for training. Compared with the model using the CNN-LSTM structure alone, adding the meta-learning algorithm improves the accuracy of groundwater level prediction for the existing sample size, which demonstrates the effectiveness of the meta-learning algorithm. However, CNN-LSTM-ML still has room for improvement in interpretability, in incorporating additional influencing factors, and in the efficiency of its gradient updates; future research should address these directions. In summary, this paper demonstrates the successful application of CNN-LSTM-ML to groundwater level prediction in the middle and lower reaches of the Heihe River. The approach can be extended to other limited-sample time series forecasting problems with multivariate effects and is of significance for water resources management in arid regions.

Author Contributions

Conceptualization, X.Y. and Z.Z.; methodology, X.Y.; software, X.Y.; validation, X.Y.; formal analysis, X.Y.; investigation, X.Y. and Z.Z.; resources, Z.Z.; data curation, X.Y.; writing—original draft preparation, X.Y. and Z.Z.; writing—review and editing, X.Y. and Z.Z.; visualization, X.Y.; supervision, Z.Z.; project administration, Z.Z.; funding acquisition, Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (41930101), the Gansu Province Science and Technology Program Funding (20YF3GA013), the Gansu Science and Technology SME Technology Innovation Fund Project Funding (20CX9JA128), the Education Department of Gansu Province: the Young Doctoral Foundation (2021QB-055), Young Scholars Science Foundation of Lanzhou Jiaotong University (2020022), 2022 Gansu Province Outstanding Graduate “Innovation Star” Project (2022CXZX-590).

Data Availability Statement

The groundwater dataset was provided by the National Tibetan Plateau Data Center (accessed on 14 April 2022). Meteorological data are not publicly available due to project requirements; readers may contact the corresponding author for details.

Acknowledgments

The authors are very grateful to the National Tibetan Plateau Data Center (accessed on 28 July 2022) for providing data support and to Zhongjing Wang of Tsinghua University for providing specific information on the monitoring wells in the Heihe River Basin. We also thank the anonymous reviewers for their constructive comments, which substantially improved the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Taylor, R.G.; Scanlon, B.; Doll, P.; Rodell, M.; van Beek, R.; Wada, Y.; Longuevergne, L.; Leblanc, M.; Famiglietti, J.S.; Edmunds, M.; et al. Ground water and climate change. Nat. Clim. Chang. 2013, 3, 322–329.
  2. UNESCO. The Groundwater Resources of the World are Suffering the Effects of Poor Governance, Experts Say. Available online: (accessed on 22 March 2022).
  3. Famiglietti, J.S. The global groundwater crisis. Nat. Clim. Chang. 2014, 4, 945–948.
  4. Li, P.Y.; Qian, H.; Howard, K.W.F.; Wu, J.H. Building a new and sustainable “Silk Road economic belt”. Environ. Earth Sci. 2015, 74, 7267–7270.
  5. Ostad-Ali-Askari, K.; Shayannejad, M. Quantity and quality modelling of groundwater to manage water resources in Isfahan-Borkhar Aquifer. Environ. Dev. Sustain. 2021, 23, 15943–15959.
  6. Neshat, A.; Pradhan, B.; Pirasteh, S.; Shafri, H.Z.M. Estimating groundwater vulnerability to pollution using a modified DRASTIC model in the Kerman agricultural area, Iran. Environ. Earth Sci. 2014, 71, 3119–3131.
  7. Sajedi-Hosseini, F.; Malekian, A.; Choubin, B.; Rahmati, O.; Cipullo, S.; Coulon, F.; Pradhan, B. A novel machine learning-based approach for the risk assessment of nitrate groundwater contamination. Sci. Total Environ. 2018, 644, 954–962.
  8. Shekhar, S.; Purohit, R.; Kaushik, Y. Technical paper included in the special session on Groundwater in the 5th Asian Regional Conference of INCID. In Groundwater Management in NCT Delhi; Vigyan Bhawan: New Delhi, India, 2009; pp. 23–35.
  9. Richey, A.S.; Thomas, B.F.; Lo, M.H.; Famiglietti, J.S.; Swenson, S.; Rodell, M. Uncertainty in global groundwater storage estimates in a Total Groundwater Stress framework. Water Resour. Res. 2015, 51, 5198–5216.
  10. Dalin, C.; Wada, Y.; Kastner, T.; Puma, M.J. Groundwater depletion embedded in international food trade. Nature 2017, 543, 700–704.
  11. Konikow, L.F.; Kendy, E. Groundwater depletion: A global problem. Hydrogeol. J. 2005, 13, 317–320.
  12. Chen, W.; Li, H.; Hou, E.K.; Wang, S.Q.; Wang, G.R.; Panahi, M.; Li, T.; Peng, T.; Guo, C.; Niu, C.; et al. GIS-based groundwater potential analysis using novel ensemble weights-of-evidence with logistic regression and functional tree models. Sci. Total Environ. 2018, 634, 853–867.
  13. Cavelan, A.; Golfier, F.; Colombano, S.; Davarzani, H.; Deparis, J.; Faure, P. A critical review of the influence of groundwater level fluctuations and temperature on LNAPL contaminations in the context of climate change. Sci. Total Environ. 2022, 806, 150412.
  14. Fu, G.B.; Crosbie, R.S.; Barron, O.; Charles, S.P.; Dawes, W.; Shi, X.G.; Niel, T.V.; Li, C. Attributing variations of temporal and spatial groundwater recharge: A statistical analysis of climatic and non-climatic factors. J. Hydrol. 2019, 568, 816–834.
  15. Klove, B.; Ala-Aho, P.; Bertrand, G.; Gurdak, J.J.; Kupfersberger, H.; Kvaerner, J.; Muotka, T.; Mykra, H.; Preda, E.; Rossi, P.; et al. Climate change impacts on groundwater and dependent ecosystems. J. Hydrol. 2014, 518, 250–266.
  16. Latif, Y.; Ma, Y.; Ma, W. Climatic trends variability and concerning flow regime of Upper Indus Basin, Jehlum, and Kabul river basins Pakistan. Theor. Appl. Climatol. 2021, 144, 447–468.
  17. Latif, Y.; Ma, Y.; Ma, W.; Muhammad, S.; Adnan, M.; Yaseen, M.; Fealy, R. Differentiating Snow and Glacier Melt Contribution to Runoff in the Gilgit River Basin via Degree-Day Modelling Approach. Atmosphere 2020, 11, 1023.
  18. Latif, Y.; Yaoming, M.; Yaseen, M. Spatial analysis of precipitation time series over the Upper Indus Basin. Theor. Appl. Climatol. 2018, 131, 761–775.
  19. Latif, Y.; Yaoming, M.; Yaseen, M.; Muhammad, S.; Wazir, M.A. Spatial analysis of temperature time series over the Upper Indus Basin (UIB) Pakistan. Theor. Appl. Climatol. 2020, 139, 741–758.
  20. Winter, T.C. Relation of streams, lakes, and wetlands to groundwater flow systems. Hydrogeol. J. 1999, 7, 28–45.
  21. Lyu, H.; Wu, T.T.; Su, X.S.; Wang, Y.Q.; Wang, C.; Yuan, Z.J. Factors controlling the rise and fall of groundwater level during the freezing-thawing period in seasonal frozen regions. J. Hydrol. 2022, 606, 127442.
  22. Delinom, R.M.; Assegaf, A.; Abidin, H.Z.; Taniguchi, M.; Suherman, D.; Lubis, R.F.; Yulianto, E. The contribution of human activities to subsurface environment degradation in Greater Jakarta Area, Indonesia. Sci. Total Environ. 2009, 407, 3129–3141.
  23. Lamb, S.E.; Haacker, E.M.K.; Smidt, S.J. Influence of Irrigation Drivers Using Boosted Regression Trees: Kansas High Plains. Water Resour. Res. 2021, 57, e2020WR028867.
  24. Gerke, H.H.; Koszinski, S.; Kalettka, T.; Sommer, M. Structures and hydrologic function of soil landscapes with kettle holes using an integrated hydropedological approach. J. Hydrol. 2010, 393, 123–132.
  25. Goldman, M.; Neubauer, F.M. Groundwater exploration using integrated geophysical techniques. Surv. Geophys. 1994, 15, 331–361.
  26. Owen, R.J.; Gwavava, O.; Gwaze, P. Multi-electrode resistivity survey for groundwater exploration in the Harare greenstone belt, Zimbabwe. Hydrogeol. J. 2006, 14, 244–252.
  27. Allafta, H.; Opp, C.; Patra, S. Identification of Groundwater Potential Zones Using Remote Sensing and GIS Techniques: A Case Study of the Shatt Al-Arab Basin. Remote Sens. 2021, 13, 112.
  28. Celik, R.; Aslan, V. Evaluation of hydrological and hydrogeological characteristics affecting the groundwater potential of Harran Basin. Arab. J. Geosci. 2020, 13, 1–13.
  29. Rahmati, O.; Samani, A.N.; Mahdavi, M.; Pourghasemi, H.R.; Zeinivand, H. Groundwater potential mapping at Kurdistan region of Iran using analytic hierarchy process and GIS. Arab. J. Geosci. 2015, 8, 7059–7071.
  30. Castrillo, M.; Garcia, A.L. Estimation of high frequency nutrient concentrations from water quality surrogates using machine learning methods. Water Res. 2020, 172, 115490.
  31. Herrera, M.; Torgo, L.; Izquierdo, J.; Perez-Garcia, R. Predictive models for forecasting hourly urban water demand. J. Hydrol. 2010, 387, 141–150.
  32. Yaseen, Z.M.; Naganna, S.R.; Sa’adi, Z.; Samui, P.; Ghorbani, M.A.; Salih, S.Q.; Shahid, S. Hourly River Flow Forecasting: Application of Emotional Neural Network Versus Multiple Machine Learning Paradigms. Water Resour. Manag. 2020, 34, 1075–1091.
  33. Khalil, B.; Broda, S.; Adamowski, J.; Ozga-Zielinski, B.; Donohoe, A. Short-term forecasting of groundwater levels under conditions of mine-tailings recharge using wavelet ensemble neural network models. Hydrogeol. J. 2015, 23, 121–141.
  34. Sahoo, S.; Jha, M.K. Groundwater-level prediction using multiple linear regression and artificial neural network techniques: A comparative assessment. Hydrogeol. J. 2013, 21, 1865–1887.
  35. Adhikary, S.; Rahman, M.; Das Gupta, A. A Stochastic Modelling Technique for Predicting Groundwater Table Fluctuations with Time Series Analysis. Int. J. Appl. Sci. Eng. Res. 2012, 1, 238–249.
  36. Mirzavand, M.; Ghazavi, R. A Stochastic Modelling Technique for Groundwater Level Forecasting in an Arid Environment Using Time Series Methods. Water Resour. Manag. 2015, 29, 1315–1328.
  37. Patle, G.T.; Singh, D.K.; Sarangi, A.; Rai, A.; Khanna, M.; Sahoo, R.N. Time Series Analysis of Groundwater Levels and Projection of Future Trend. J. Geol. Soc. India 2015, 85, 232–242.
  38. Gholami, V.; Chau, K.W.; Fadaee, F.; Torkaman, J.; Ghaffari, A. Modeling of groundwater level fluctuations using dendrochronology in alluvial aquifers. J. Hydrol. 2015, 529, 1060–1069.
  39. Muller, J.; Park, J.; Sahu, R.; Varadharajan, C.; Arora, B.; Faybishenko, B.; Agarwal, D. Surrogate optimization of deep neural networks for groundwater predictions. J. Glob. Optim. 2021, 81, 203–231.
  40. Sahu, R.K.; Muller, J.; Park, J.; Varadharajan, C.; Arora, B.; Faybishenko, B.; Agarwal, D. Impact of Input Feature Selection on Groundwater Level Prediction From a Multi-Layer Perceptron Neural Network. Front. Water 2020, 2, 573034.
  41. Coulibaly, P.; Anctil, F.; Aravena, R.; Bobee, B. Artificial neural network modeling of water table depth fluctuations. Water Resour. Res. 2001, 37, 885–896.
  42. Bowes, B.D.; Sadler, J.M.; Morsy, M.M.; Behl, M.; Goodall, J.L. Forecasting Groundwater Table in a Flood Prone Coastal City with Long Short-term Memory and Recurrent Neural Networks. Water 2019, 11, 1098.
  43. Jeong, J.; Park, E. Comparative applications of data-driven models representing water table fluctuations. J. Hydrol. 2019, 572, 261–273.
  44. Jeong, J.; Park, E.; Chen, H.L.; Kim, K.Y.; Han, W.S.; Suk, H. Estimation of groundwater level based on the robust training of recurrent neural networks using corrupted data. J. Hydrol. 2019, 582, 124512.
  45. Supreetha, B.S.; Shenoy, N.; Nayak, P. Lion Algorithm-Optimized Long Short-Term Memory Network for Groundwater Level Forecasting in Udupi District, India. Appl. Comput. Intell. Soft Comput. 2020, 2020, 8685724.
  46. Zhang, J.F.; Zhu, Y.; Zhang, X.P.; Ye, M.; Yang, J.Z. Developing a Long Short-Term Memory (LSTM) based model for predicting water table depth in agricultural areas. J. Hydrol. 2018, 561, 918–929.
  47. Afzaal, H.; Farooque, A.A.; Abbas, F.; Acharya, B.; Esau, T. Groundwater Estimation from Major Physical Hydrology Components Using Artificial Neural Networks and Deep Learning. Water 2019, 12, 5.
  48. Lahivaara, T.; Malehmir, A.; Pasanen, A.; Karkkainen, L.; Huttunen, J.M.J.; Hesthaven, J.S. Estimation of groundwater storage from seismic data using deep learning. Geophys. Prospect. 2019, 67, 2115–2126.
  49. Lai, G.K.; Chang, W.C.; Yang, Y.M.; Liu, H.X. Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks. In Proceedings of the 41st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Ann Arbor, MI, USA, 8–12 July 2018; pp. 95–104.
  50. Fei-Fei, L.; Fergus, R.; Perona, P. A Bayesian approach to unsupervised one-shot learning of object categories. In Proceedings of the 9th IEEE International Conference on Computer Vision, Nice, France, 13–16 October 2003; pp. 1134–1141.
  51. Li, F.F.; Fergus, R.; Perona, P. One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 594–611.
  52. Malik, A.; Bhagwat, A. Modelling groundwater level fluctuations in urban areas using artificial neural network. Groundw. Sustain. Dev. 2021, 12, 100484.
  53. Fort, S. Gaussian prototypical networks for few-shot learning on omniglot. arXiv 2017, arXiv:1708.02735.
  54. Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese Neural Networks for One-Shot Image Recognition; ICML Deep Learning Workshop: Lille, France, 2015; pp. 1–8.
  55. Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. Adv. Neural Inf. Processing Syst. 2017, 30, 1–13.
  56. Finn, C.; Abbeel, P.; Levine, S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017.
  57. Cheng, G.D.; Li, X.; Zhao, W.Z.; Xu, Z.M.; Feng, Q.; Xiao, S.C.; Xiao, H.L. Integrated study of the water-ecosystem-economy in the Heihe River Basin. Natl. Sci. Rev. 2014, 1, 413–428.
  58. Li, X.; Cheng, G.D.; Liu, S.M.; Xiao, Q.; Ma, M.G.; Jin, R.; Che, T.; Liu, Q.H.; Wang, W.Z.; Qi, Y.; et al. Heihe Watershed Allied Telemetry Experimental Research (HiWATER): Scientific Objectives and Experimental Design. Bull. Am. Meteorol. Soc. 2013, 94, 1145–1160.
  59. Zhongjing, W. Groundwater Simulation Data in the Middle Reaches of Heihe (2003–2012); National Tibetan Plateau Data, C., Ed.; National Tibetan Plateau Data Center: Beijing, China, 2016. [Google Scholar]
  60. Dong, Y.R.; Peng, C.Y.J. Principled missing data methods for researchers. Springerplus 2013, 2, 222. [Google Scholar] [CrossRef] [Green Version]
  61. Gnauck, A. Interpolation and approximation of water quality time series and process identification. Anal. Bioanal. Chem. 2004, 380, 484–492. [Google Scholar] [CrossRef] [PubMed]
  62. Kulesh, M.; Holschneider, M.; Kurennaya, K. Adaptive metrics in the nearest neighbours method. Phys. D Nonlinear Phenom. 2008, 237, 283–291. [Google Scholar] [CrossRef]
  63. Lepot, M.; Aubin, J.B.; Clemens, F. Interpolation in Time Series: An Introductive Overview of Existing Methods, Their Performance Criteria and Uncertainty Assessment. Water 2017, 9, 796. [Google Scholar] [CrossRef] [Green Version]
  64. Schwertman, N.C.; Owens, M.A.; Adnan, R. A simple more general boxplot method for identifying outliers. Comput. Stat. Data Anal. 2004, 47, 165–174. [Google Scholar] [CrossRef]
  65. LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Comput. 1989, 1, 541–551. [Google Scholar] [CrossRef]
  66. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  67. Yin, C.Y.; Zhang, S.; Wang, J.; Xiong, N.N. Anomaly Detection Based on Convolutional Recurrent Autoencoder for IoT Time Series. IEEE Trans. Syst. Man Cybern.-Syst. 2022, 52, 112–122. [Google Scholar] [CrossRef]
  68. Zhang, Y.; Wallace, B. A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification. arXiv 2015, arXiv:1510.03820. [Google Scholar]
  69. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  70. Ke, J.T.; Zheng, H.Y.; Yang, H.; Chen, X.Q. Short-term forecasting of passenger demand under on-demand ride services: A spatio-temporal deep learning approach. Transp. Res. Part C-Emerg. Technol. 2017, 85, 591–608. [Google Scholar] [CrossRef] [Green Version]
  71. Zhao, Z.; Chen, W.H.; Wu, X.M.; Chen, P.C.Y.; Liu, J.M. LSTM network: A deep learning approach for short-term traffic forecast. Iet Intell. Transp. Syst. 2017, 11, 68–75. [Google Scholar] [CrossRef] [Green Version]
  72. Zhao, J.C.; Deng, F.; Cai, Y.Y.; Chen, J. Long short-term memory—Fully connected (LSTM-FC) neural network for PM2.5 concentration prediction. Chemosphere 2019, 220, 486–492. [Google Scholar] [CrossRef]
  73. Geng, C.X.; Huang, S.J.; Chen, S.C. Recent Advances in Open Set Recognition: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 3614–3631. [Google Scholar] [CrossRef] [Green Version]
  74. Willmott, C.J.; Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in a ssessing average model performance. Clim. Res. 2005, 30, 79–82. [Google Scholar] [CrossRef]
  75. Zhang, Q.S.; Zhu, S.C. Visual interpretability for deep learning: A survey. Front. Inf. Technol. Electron. Eng. 2018, 19, 27–39. [Google Scholar] [CrossRef] [Green Version]
  76. Chorowski, J.K.; Bahdanau, D.; Serdyuk, D.; Cho, K.; Bengio, Y. Attention-based models for speech recognition. Adv. Neural Inf. Process. Syst. 2015, 28, 1–19. [Google Scholar]
  77. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  78. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  79. Shan, C.H.; Zhang, J.B.; Wang, Y.J.; Xie, L. Ieee In Attention-based end-to-end speech recognition on voice search. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 4764–4768. [Google Scholar]
  80. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 1–15. [Google Scholar]
  81. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. In Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  82. Yin, W.; Schütze, H.; Xiang, B.; Zhou, B. Abcnn: Attention-based convolutional neural network for modeling sentence pairs. Trans. Assoc. Comput. Linguist. 2016, 4, 259–272. [Google Scholar] [CrossRef]
Figure 1. Groundwater level prediction process.
Figure 2. The research area of the middle and lower reaches of the Heihe River.
Figure 3. Outlier detection process using box plot analysis.
Figure 4. Schematic diagram of the convolution process.
Figure 5. The internal structure of LSTM.
Figure 6. The internal structure of CNN-LSTM-ML.
Figure 7. Multivariate meteorological factors after data processing.
Figure 8. Prediction results of the model for the data of one observation well.
Figure 9. Average RMSE and MAE of the models trained with different amounts of training data: (a,b) training data reduced by 10%; (c,d) training data reduced by 30%; (e,f) training data reduced by 50%.
Figure 10. Comparison of short-term and long-term predictive performance of CNN-LSTM and CNN-LSTM-ML.
Figure 11. Visualization of ablation experiment prediction results.
Table 1. Predictive accuracy of groundwater level.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Yang, X.; Zhang, Z. A CNN-LSTM Model Based on a Meta-Learning Algorithm to Predict Groundwater Level in the Middle and Lower Reaches of the Heihe River, China. Water 2022, 14, 2377.

AMA Style

Yang X, Zhang Z. A CNN-LSTM Model Based on a Meta-Learning Algorithm to Predict Groundwater Level in the Middle and Lower Reaches of the Heihe River, China. Water. 2022; 14(15):2377.

Chicago/Turabian Style

Yang, Xingyu, and Zhongrong Zhang. 2022. "A CNN-LSTM Model Based on a Meta-Learning Algorithm to Predict Groundwater Level in the Middle and Lower Reaches of the Heihe River, China." Water 14, no. 15: 2377.

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
