1. Introduction
Soil moisture (SM), as an essential ground diagnostic variable in the Earth system, plays a vital role in energy, water, and carbon cycles between the atmosphere and land surface [1]. SM was identified as one of the Essential Climate Variables by the Global Climate Observing System [2]. Therefore, accurate forecast of SM is important for many scientifically and socially critical applications, such as numerical weather forecasts [3], drought monitoring [4], carbon cycle modelling [5], and agricultural yield assessment [6].
The microwave signal observed by satellites is strongly related to the soil dielectric constant, so satellite data can be used to retrieve SM from regional to global scales at daily or hourly temporal resolution [7], yielding SM datasets with high information content, large observation range, rapid acquisition, and relatively high accuracy. The Soil Moisture Active Passive (SMAP) mission [1,8,9] was launched in 2015 and received the highest ranking of mission priorities from the National Research Council [10]. The mission seeks to provide SM retrievals for the top 5 cm of the soil column using L-band microwaves with a spatial resolution of 40 km. Because the L-band penetrates vegetation better than the C- or X-bands, it has been deemed the most appropriate band for SM retrieval in densely vegetated zones. The mission is also committed to producing long-term, global SM products with an accuracy requirement of 0.04 cm³/cm³ volumetric SM in terms of the unbiased root mean square error [1]. Furthermore, anti-interference hardware and kurtosis-based algorithms help reduce the radio frequency interference (RFI) produced by human activity [11]. SMAP products have therefore attracted much attention and have substantially increased the capability to measure SM. However, SMAP has a revisit time of 2–3 days, which leaves gaps between swaths. These temporal and spatial gaps can limit the applications of SMAP products.
Many researchers [
12,
13,
14,
15,
16,
17] used deep learning (DL) models to forecast SM because of their ability to learn nonlinear mappings, automatically extract features, and build dynamic systems. DL models can be divided into three categories: spatial models [
18], temporal models [
19,
20], and spatial–temporal models [
21]. Spatial models, such as the Convolutional Neural Network (CNN), were proposed by LeCun and Bengio (1995) [
18]. CNNs typically comprise a convolutional layer, a pooling layer, and a fully connected layer, with the convolutional and pooling layers alternated. Convolutional layers can extract spatial characteristics. As a result, the grid of SM and meteorological time series data can be taken as CNN-processed images. For example, Wang et al. [
22] used CNN to predict SM content using near infrared spectroscopy. Hegazi et al. [
23] used Sentinel-1 images to train a CNN-based algorithm to estimate SM content over agricultural areas. Temporal models include the long short-term memory (LSTM) and the gated recurrent unit (GRU). LSTM and GRU can successfully correlate contextual information when processing serial data and are frequently employed in Earth system research for time series data prediction. For example, Fang et al. [
12,
13] employed LSTM to predict surface SM based on climate forcing and soil texture. In addition, Filipović et al. [
24] forecasted the second layer SM using LSTM. Spatial–temporal models, such as the convolutional LSTM (ConvLSTM) [
21], substituted matrix multiplication with a convolution operation of each gate in an LSTM cell. The model can capture and use both spatial and temporal correlations, making it an effective tool for forecasting spatial–temporal variables. For example, Li et al. [
15] showed that ConvLSTM outperformed independent CNN or LSTM in terms of SM forecast accuracy. A et al. [
25] utilized ConvLSTM to assess the root zone SM. Li et al. [
16] proposed an attention-based ConvLSTM for SM prediction and demonstrated the relevance of temporal and spatial correlation to model performance. It is well known that the SM state of each pixel at each time step depends not only on its own historical observations but also on the state of its neighboring pixels at the current time. It may also be directly influenced by the historical state of neighboring pixels, as well as by the time series of meteorological forcing variables and static geographical attributes. In spatial models, convolutional layers extract spatial features; however, “flattening” loses the spatial autocorrelation and the inter-grid order, and forecasting along the time dimension cannot draw on a long-term memory structure. Temporal models build mapping relationships between the time series of variables at a particular point, but they suffer from significant issues such as missing spatial information. As a result, spatial–temporal models have the advantage of concurrently capturing spatial and temporal changes for SM forecast.
In addition, both LSTM and GRU retain important features through their gate functions, which ensures that information vital for SM prediction is not lost during long-term propagation. GRU has one fewer gate than LSTM, so it has fewer parameters and trains faster while achieving similar accuracy. ConvGRU combines GRU with CNN: by applying convolutional operations within the recurrent cell, it captures both temporal relationships and spatial features and can therefore better predict future change patterns.
All the aforementioned DL-based SM prediction methods share some drawbacks, including the inability to handle large amounts of missing remote sensing data. SM memory is the key to forecasting SM, so using lagged SM as an input to DL models is common practice; however, this is challenging with remotely sensed data because of missing values. Previously, Fang et al. [14] proposed an LSTM with an adaptive data integration kernel (DI_LSTM) for training on SMAP L3 SM with incomplete spatial coverage at a given time step. In this approach, the lagged SM is added as an input variable; when the lagged SM is missing, the value predicted at the previous time step is used in its place. However, LSTM-based models are trained in a point-to-point manner, which allows them to learn the temporal correlation of SM but ignores the impact of the spatial distribution of SM on the forecast.
Therefore, to address the challenges mentioned above, we propose a Convolutional Gated Recurrent Unit with Data Integration (DI_ConvGRU) model for SM forecast. The SMAP L3 SM was used as the training target of the model, and the daily lagged SM from the SMAP L3 remote sensing data set, meteorological time series, and static characteristics were used as predictors. The contributions of this paper are fourfold: (1) given the spatial–temporal complexity of SM and the gaps caused by the revisit time of remote sensing data, we developed a spatial–temporal DL model (DI_ConvGRU) for real-time SM forecast; (2) we validated the model’s prediction performance by comparing it with other models in two ways: spatial–temporal model versus time series model (DI_LSTM), and DI term versus linear interpolation (interp_ConvGRU, that is, ConvGRU with linear interpolation); (3) we evaluated the computational performance of DI_ConvGRU by comparing it with other DL models (ConvGRU, LSTM, DI_LSTM, and interp_ConvGRU); (4) we displayed and analyzed the performance of the prediction models for various climatic zones and input factors. We expect this to help researchers improve the predictive performance for SM in a targeted and strategic way.
2. Data Sources
The domain of this study was restricted to China (18°–54°N, 73°–135°E). We used the SMAP version 8, Level 3 passive radiometer SM product [26] as the target for training the DL models and forecasting SM. The SMAP L3 SM product is a composite product released daily that contains SM observations from the descending orbit (6:00 AM) and the ascending orbit (6:00 PM). It has a spatial resolution of about 36 km and a revisit time of 2–3 days. The revisit time is the interval between two successive satellite observations of the same point. Satellite scan gaps can result in a large number of SMAP pixels with missing values; vegetation water content, urban areas, and water bodies can also result in missing values. To increase the spatial coverage of the data, the average of the descending and ascending observations was taken as the day’s SM. Chan et al. and Zhang et al. [27,28] showed that the ascending and descending orbit SM products performed similarly. Like Koster et al. [29], we did not screen the SM data according to their quality control information, which allowed greater spatial and temporal coverage. The SMAP L3 SM product used in this study covers the 3-year period from 1 April 2015 to 31 March 2018 and is freely available from the National Snow and Ice Data Center (NSIDC) (https://nsidc.org/data/SPL3SMP, last accessed on 15 November 2022).
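The daily compositing step can be illustrated with a minimal sketch. It assumes that missing pixels are stored as NaN and that a pixel observed in only one pass keeps that single value (the text states only that the two passes were averaged to increase coverage); the function name `combine_passes` and the toy grids are hypothetical.

```python
import math

def combine_passes(am, pm):
    """Average two 2-D SM grids pixel by pixel.

    Assumption: NaN marks a missing pixel, and a pixel observed in only one
    pass keeps that value; a pixel missing in both passes stays NaN.
    """
    combined = []
    for row_am, row_pm in zip(am, pm):
        row = []
        for a, p in zip(row_am, row_pm):
            values = [v for v in (a, p) if not math.isnan(v)]
            row.append(sum(values) / len(values) if values else math.nan)
        combined.append(row)
    return combined

nan = math.nan
am = [[0.20, nan], [nan, 0.30]]   # toy descending-pass grid
pm = [[0.30, 0.10], [nan, 0.34]]  # toy ascending-pass grid
daily = combine_passes(am, pm)
```

In this toy case, only the pixel that is missing in both passes remains a gap in the daily grid.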
The spatial and temporal variation of SM is controlled by a variety of factors. This work selected lagged SM, meteorological forcing variables, and static physiographic attributes, which have been commonly used in DL studies in recent years [12,13,14,15,16]. The lagged climatic forcing (with a lag of 1 day) was extracted from the land component of the fifth generation of European Reanalysis (ERA5-Land) [30] and included precipitation, temperature, radiation, humidity, and wind speed. The physiographic attributes comprised soil properties (sand, silt, and clay content and bulk density) extracted from the China soil dataset for land surface modeling (CSDL) [31], the land cover type from the United States Geological Survey [32], and the digital elevation model (DEM) [33]. All data were resampled to the SMAP L3 grid at a resolution of 36 km, and the time series data were aggregated to a daily time scale.
3. Methods
3.1. Components of the Model
3.1.1. ConvGRU Model
ConvGRU is derived from ConvLSTM [21] by replacing the LSTM with a GRU [34]. The GRU is an adaptation of the LSTM that simplifies the cell structure to reduce the number of parameters and speed up training while maintaining similar accuracy. The main concept of ConvGRU is to replace the matrix multiplications in the GRU with convolutions, so that spatial characteristics are extracted by the convolutions and temporal features are retrieved by the GRU. As shown in Figure 1, all the linear layers in GRU are replaced by Conv layers, and the two-dimensional input variables of GRU become three-dimensional variables in ConvGRU. The convolution filter is then applied to the input-to-state and state-to-state transitions of each grid cell.
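For each time step $t$, the update gate $Z_t$, the reset gate $R_t$, and the hidden memory of ConvGRU are formulated as follows, in the standard ConvGRU form (for further detail, refer to [21]):

$$Z_t = \sigma\left(W_{xz} * X_t + W_{hz} * H_{t-1}\right)$$

$$R_t = \sigma\left(W_{xr} * X_t + W_{hr} * H_{t-1}\right)$$

$$\tilde{H}_t = \tanh\left(W_{xh} * X_t + R_t \circ \left(W_{hh} * H_{t-1}\right)\right)$$

$$H_t = \left(1 - Z_t\right) \circ \tilde{H}_t + Z_t \circ H_{t-1}$$

where $t$ represents the $t$-th of the input time steps. The term $X_t = [S_t, F_t, A]$ is the input, in which $S_t$, $F_t$, and $A$ are the SM data, the forcing, and the static attributes, respectively. $H_t$ is the hidden state. The operator $*$ denotes the convolution operator and $\circ$ denotes the Hadamard product. $\sigma$ is the sigmoid function, $\tanh$ represents the hyperbolic tangent function, and $W$ denotes the 2D Conv kernels for the spatial dimension. The update gate and reset gate are both four-dimensional tensors, where the four dimensions denote time, feature, and the rows and columns of the spatial data.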
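A single ConvGRU update can be sketched in pure Python to make the gate arithmetic concrete. This is an illustration under stated assumptions, not the implementation used in this work: it handles one channel only, omits biases, uses 3 × 3 kernels with zero ("same") padding, and follows the common convention in which the update gate weights the previous hidden state. The names `conv2d_same`, `combine`, and `convgru_step` are hypothetical; a practical model would use PyTorch convolution layers.

```python
import math

def conv2d_same(grid, kernel):
    """3x3 convolution (cross-correlation) with zero padding; output keeps the grid size."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        acc += kernel[di + 1][dj + 1] * grid[ii][jj]
            out[i][j] = acc
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def combine(fn, *grids):
    """Apply fn element-wise across grids of identical shape."""
    return [[fn(*vals) for vals in zip(*rows)] for rows in zip(*grids)]

def convgru_step(x, h_prev, w):
    """One single-channel ConvGRU update (biases omitted; an assumption of this sketch).

    w maps the names 'xz', 'hz', 'xr', 'hr', 'xh', 'hh' to 3x3 kernels.
    """
    # Update gate and reset gate: sigmoid of input-to-state plus state-to-state convolutions.
    z = combine(lambda a, b: sigmoid(a + b),
                conv2d_same(x, w["xz"]), conv2d_same(h_prev, w["hz"]))
    r = combine(lambda a, b: sigmoid(a + b),
                conv2d_same(x, w["xr"]), conv2d_same(h_prev, w["hr"]))
    # Candidate memory: the reset gate scales the state-to-state term (Hadamard product).
    cand = combine(lambda a, rr, b: math.tanh(a + rr * b),
                   conv2d_same(x, w["xh"]), r, conv2d_same(h_prev, w["hh"]))
    # New hidden state: blend of candidate memory and previous state.
    return combine(lambda zz, c, hp: (1.0 - zz) * c + zz * hp, z, cand, h_prev)

identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
weights = {name: identity for name in ("xz", "hz", "xr", "hr", "xh", "hh")}
x = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # toy 2 x 3 input grid
h0 = [[0.0] * 3 for _ in range(2)]       # zero initial hidden state
h1 = convgru_step(x, h0, weights)
```

Because the kernels act on a neighbourhood of each pixel, the hidden state at a pixel depends on the surrounding pixels as well as on its own history, which is the property the spatial–temporal model relies on.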
3.1.2. Data Integration (DI)
Data Integration (DI) was used to handle the large number of missing values in the SMAP L3 data, because neural networks cannot process missing values directly [14]. In this method, the observed values at time t are used when SM data are available, while the predicted values at time t − 1 are used when there are no SM data at time t. Therefore, during the model run, the most recent observations or forecasts are injected into the model input data. Although the early stages of training produce inaccurate gap filling and poor forecasts, the forecast network improves its capacity to fill the missing values as the iterations proceed, and the training converges.
3.2. Structure of DI_ConvGRU Model
To better capture the spatial interactions and temporal evolution of surrounding SM simultaneously, we proposed a ConvGRU model with Data Integration (DI_ConvGRU) for SM forecast. As shown in
Figure 2, the model consists of an input layer, three ConvGRU layers, and an output layer.
For the input layer, we first checked whether there were any SM observations at the first time step ($t = 0$). Where there were no SM observations, we filled the missing values with zeros; experiments verified that filling with zeros or with average values had a similar effect on the prediction results. After the SM image was completely filled, the lagged SM was combined with the other variables as input to the first ConvGRU cell to obtain the forecast $\hat{y}_0$ for the first day. At the next time step ($t = 1$), we repeated the procedure, except that missing SM observations were replaced with the first day’s forecast $\hat{y}_0$, which yielded the next day’s forecast $\hat{y}_1$. The same operation was repeated at $t = 2$ and at all later time steps. Because of the inaccurate initial filling in this setup, the network produced poor predictions during the first few training epochs. However, as the prediction network improved during training, it also became better at filling the gaps, so the training eventually converged. For DI_ConvGRU, the SM input $x_t^{(i,j)}$ to the ConvGRU cell is expressed as:

$$x_t^{(i,j)} = \begin{cases} s_t^{(i,j)}, & \text{if } s_t^{(i,j)} \text{ is observed} \\ \hat{y}_{t-1}^{(i,j)}, & \text{otherwise} \end{cases}$$

where $s_t^{(i,j)}$ is the value of pixel $(i,j)$ in the SM grid data $S_t$ at time step $t$, and $\hat{y}_{t-1}^{(i,j)}$ is the predicted value of pixel $(i,j)$ at time step $t - 1$.
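The filling procedure described above can be sketched as a short loop. The sketch assumes that missing pixels are stored as NaN; the names `integrate` and `di_rollout` are hypothetical, and `predict` stands in for the trained network (here replaced by a toy persistence forecaster so that the example runs on its own).

```python
import math

def integrate(observed, fallback):
    """Keep each observed pixel; where the observation is NaN, use the fallback value."""
    return [
        [f if math.isnan(o) else o for o, f in zip(obs_row, fb_row)]
        for obs_row, fb_row in zip(observed, fallback)
    ]

def di_rollout(observations, predict, initial_fill=0.0):
    """Step through a sequence of gappy SM grids, filling gaps with the latest forecast.

    observations: list of 2-D grids, NaN marks a missing pixel.
    predict: hypothetical one-step model mapping a gap-free grid to a forecast grid.
    """
    rows, cols = len(observations[0]), len(observations[0][0])
    fallback = [[initial_fill] * cols for _ in range(rows)]  # zero fill, used only at t = 0
    forecasts = []
    for observed in observations:
        filled = integrate(observed, fallback)
        forecast = predict(filled)
        forecasts.append(forecast)
        fallback = forecast  # gaps at the next step are filled with this forecast
    return forecasts

nan = math.nan
sequence = [[[0.2, nan]], [[nan, 0.3]], [[nan, nan]]]  # three 1 x 2 toy grids
persistence = lambda grid: [row[:] for row in grid]     # toy stand-in for the network
forecasts = di_rollout(sequence, persistence)
```

In the toy run, the third grid contains no observations at all, yet the model still receives a complete input because every pixel is carried forward from the previous forecast.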
At each time step, the loss function was calculated only for the pixels with SMAP observations, as in [14]:

$$L = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(y_t - \hat{y}_t\right)^2}$$

where $N$ is the length of the time series, and $y_t$ and $\hat{y}_t$ are the SMAP SM observations and the model’s predicted value at time $t$, respectively.
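Restricting the loss to observed pixels amounts to masking. The sketch below assumes missing observations are stored as NaN and uses a root-mean-square form; the exact functional form of the loss is an assumption here, and the name `masked_rmse` is hypothetical.

```python
import math

def masked_rmse(observed, predicted):
    """RMSE over entries whose observation is present (not NaN).

    Assumption: RMSE is used as the loss; entries without an observation are skipped.
    """
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted) if not math.isnan(o)]
    if not squared:
        return math.nan  # nothing to compare against
    return math.sqrt(sum(squared) / len(squared))

obs = [0.2, math.nan, 0.4]
pred = [0.3, 0.9, 0.4]
loss = masked_rmse(obs, pred)
```

The second entry has no observation, so its large prediction error does not contribute to the loss; this is what allows gap-filled inputs to be used without the filled values being treated as ground truth.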
3.3. Experiment Setup and Model Setting
In this work, DI_ConvGRU was compared with four other DL models, namely ConvGRU, LSTM, DI_LSTM, and interp_ConvGRU. For brevity, we mainly show the comparison of DI_ConvGRU, DI_LSTM, and interp_ConvGRU in the results section. DI_LSTM combines DI with LSTM rather than ConvGRU, while interp_ConvGRU uses linear interpolation to fill the missing values of the SM images. All codes are available at
https://github.com/YeZhang929/DI_ConvGRU (last accessed on 15 November 2022). The codes of LSTM and DI_LSTM were modified from Fang et al. [
14] (
https://github.com/mhpi/hydroDL, last accessed on 15 November 2022).
The whole data set was divided in chronological order into three parts for training, validation, and testing, with ratios of 60%, 10%, and 30%, respectively. We arranged the input data as the 5-D tensors required by the network (the five dimensions are samples, time steps, width, height, and features). All the variables were normalized using the min-max method, which rescales them to a common range and helps to improve the prediction accuracy and fitting speed of the model.
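The split and the normalization can be sketched as follows. The sketch assumes that the min-max bounds are taken from the training portion only and then reused for the other portions (the text does not say which portion supplies the bounds); the function names and the sample count of 1000 are hypothetical.

```python
def chronological_split(n_steps, train_pct=60, val_pct=10):
    """Split time indices in order: first train, then validation, then test."""
    train_end = n_steps * train_pct // 100
    val_end = train_end + n_steps * val_pct // 100
    return range(0, train_end), range(train_end, val_end), range(val_end, n_steps)

def fit_min_max(values):
    """Return the (min, max) bounds used for scaling."""
    return min(values), max(values)

def apply_min_max(values, lo, hi):
    """Scale values to [0, 1] relative to the given bounds (0.0 if the range is empty)."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

train_idx, val_idx, test_idx = chronological_split(1000)

# Assumption: bounds come from the training values and are reused elsewhere.
lo, hi = fit_min_max([2.0, 4.0, 6.0])
scaled_test = apply_min_max([4.0, 8.0], lo, hi)
```

Because the bounds come from the training period in this sketch, later values can fall outside [0, 1], as the second scaled test value does.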
For the spatial–temporal models (i.e., ConvGRU, DI_ConvGRU and interp_ConvGRU), the input feature map size is 50 × 50, the number of ConvGRU layers is 3, and the number of filters in each ConvGRU layer is 64, 64, and 1, respectively. The default hyperbolic tangent “
tanh” activation function is utilized, and each layer kernel is 3 × 3. The network layers were joined by stacking. The size of feature maps (50 × 50) was kept constant with the input SM by using the same padding technique. The number of filters in the LSTM layers was set to 128 for DI_LSTM, and all other parameter values were made in accordance with Fang et al. [
14].
We used early stopping during model training to improve generalization and prevent overfitting: training was stopped when the validation loss had not decreased for 20 consecutive training cycles. Empirically, 100 iterations and 16 batches were chosen as the parameters. The Adam optimizer was used to train the model, and the time step was set at 16. In addition, a timer was set up in the code to record the training time, which served as a benchmark for assessing the model fitting efficiency.
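The stopping rule can be written as a small function over the history of validation losses. This is a minimal sketch with a hypothetical name (`early_stopping_epoch`); it treats any loss that does not fall below the best value so far as "not decreased".

```python
def early_stopping_epoch(val_losses, patience=20):
    """Return the index of the epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    stale = 0  # consecutive epochs without a new best validation loss
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None

# Toy history: the loss improves for three epochs, then stays above the best value.
history = [1.0, 0.9, 0.8] + [0.85] * 25
stop_at = early_stopping_epoch(history, patience=20)
```

In practice, the weights from the best epoch (index 2 in the toy history) would usually be kept; whether this work restored them is not stated.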
We performed the experiments on a server with an AMD Ryzen 7 5800X 8-core CPU (3.80 GHz), an RTX 2080Ti GPU, and 11 GB of RAM, using PyCharm. Anaconda was used as the base platform for DL training, PyTorch 1.10.2 was the backend, and CUDA was used to run the computation on the GPU. The Python version was 3.9.
To explore the sensitivity of predictions to different inputs, we designed four experiments to test the prediction performance of the DL models based on different input data for 1-, 2-, and 3-day forecasts. Experiment I served as the baseline, in which the model was built using ConvGRU with only climate forcing as input. In Experiment II, interp_ConvGRU was built using climate forcing and lagged SM as input, to show the effect of adding lagged SM with gaps filled by linear interpolation compared with Experiment I. Because of the frequent gaps in the original SMAP SM, it is not suitable to build a ConvGRU with lagged SM but without gap filling; we therefore did not build a ConvGRU with climate forcing and unfilled lagged SM. In Experiment III, DI_ConvGRU was built using climate forcing and lagged SM, to show the effect of using DI instead of the linear interpolation in Experiment II. In Experiment IV, DI_ConvGRU was built using climate forcing, lagged SM, and static physiographic attributes, to show the effect of static data compared with Experiment III.
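For comparison with the DI term, the gap filling used by the interpolation baseline can be sketched for one pixel. The sketch assumes interpolation along the time axis of each pixel and that gaps at the start or end of the series copy the nearest observation; neither detail is specified in the text, and the name `interpolate_gaps` is hypothetical.

```python
import math

def interpolate_gaps(series):
    """Linearly interpolate NaN gaps in a 1-D time series.

    Assumption: gaps at either end copy the nearest available value.
    """
    known = [i for i, v in enumerate(series) if not math.isnan(v)]
    if not known:
        return list(series)  # nothing to interpolate from
    filled = list(series)
    for i in range(len(series)):
        if not math.isnan(series[i]):
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled

nan = math.nan
pixel_series = [nan, 0.2, nan, nan, 0.5, nan]
filled_series = interpolate_gaps(pixel_series)
```

Unlike the DI term, the filled values here depend only on neighbouring observations in time, not on the forcing, so a rain event inside a gap cannot be reflected in the filled input.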
3.4. Performance Evaluation Measures
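This work used several indicators to evaluate the predictive performance of the DL models: the Bias, the Root Mean Square Error (RMSE), the unbiased RMSE (ubRMSE), Pearson’s Correlation Coefficient (R), and the Kling-Gupta efficiency (KGE). These criteria are calculated as follows:

$$\mathrm{Bias} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}$$

$$\mathrm{ubRMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[\left(\hat{y}_i - \mu_{\hat{y}}\right) - \left(y_i - \mu_y\right)\right]^2}$$

$$R = \frac{\sum_{i=1}^{N}\left(\hat{y}_i - \mu_{\hat{y}}\right)\left(y_i - \mu_y\right)}{\sqrt{\sum_{i=1}^{N}\left(\hat{y}_i - \mu_{\hat{y}}\right)^2}\,\sqrt{\sum_{i=1}^{N}\left(y_i - \mu_y\right)^2}}$$

$$\mathrm{KGE} = 1 - \sqrt{\left(R - 1\right)^2 + \left(\frac{\sigma_{\hat{y}}}{\sigma_y} - 1\right)^2 + \left(\frac{\mu_{\hat{y}}}{\mu_y} - 1\right)^2}$$

where $\mu_{\hat{y}}$, $\mu_y$ and $\sigma_{\hat{y}}$, $\sigma_y$ are the corresponding means and standard deviations of the predicted values $\hat{y}_i$ and the observed values $y_i$ for the $i$-th test value, and $N$ is the number of timesteps. KGE [35] is a composite indicator reflecting the agreement between observed and predicted values. KGE varies between $-\infty$ and 1, and values close to 1 indicate that model forecasts are accurate.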
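The five indicators can be computed together, as in the sketch below. It uses the standard definitions with the bias taken as prediction minus observation and population (1/N) standard deviations; these conventions and the function name `evaluate` are assumptions of the sketch.

```python
import math

def evaluate(pred, obs):
    """Return Bias, RMSE, ubRMSE, Pearson R, and KGE for paired predictions and observations."""
    n = len(obs)
    mean_p, mean_o = sum(pred) / n, sum(obs) / n
    bias = mean_p - mean_o
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / n)
    # ubRMSE: RMSE after removing the mean of each series.
    ubrmse = math.sqrt(
        sum(((p - mean_p) - (o - mean_o)) ** 2 for p, o in zip(pred, obs)) / n
    )
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred) / n)
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs)) / n
    r = cov / (std_p * std_o)
    # KGE combines correlation, variability ratio, and mean ratio.
    kge = 1.0 - math.sqrt(
        (r - 1.0) ** 2 + (std_p / std_o - 1.0) ** 2 + (mean_p / mean_o - 1.0) ** 2
    )
    return {"bias": bias, "rmse": rmse, "ubrmse": ubrmse, "r": r, "kge": kge}

obs = [0.10, 0.20, 0.30, 0.40]
pred = [0.12, 0.22, 0.32, 0.42]  # observations shifted by a constant offset
scores = evaluate(pred, obs)
```

A constant offset, as in the toy data, shows up in the Bias and the RMSE but leaves the ubRMSE at zero and R at 1, which illustrates why the ubRMSE is reported alongside the RMSE. The identity RMSE² = Bias² + ubRMSE² holds for these definitions.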
The performance of the DL models was evaluated both for the whole of China and for eight climate regions. The former was used to evaluate the overall performance of the models, while the latter was used to reveal how and why model performance differed among regions, given that soil moisture has different behavior and influencing factors in different climates [
1].
5. Discussion
DL models have been used in hydrology, particularly in prediction tasks such as streamflow [39], pan evaporation [40], and floods [41], demonstrating their robustness and generality. We consider that the current state of each pixel of the SM depends not only on its own historical observations, but also on the state of its surrounding pixels. In addition, it may be directly influenced by meteorological forcing and static geographic attributes of neighboring pixels. The gaps in the SMAP L3 remote sensing data further complicate the forecast of SM. Therefore, we proposed DI_ConvGRU to predict SM, as it can both capture the spatial–temporal features and handle the missing data.
We compared the prediction performance of DI_ConvGRU with interp_ConvGRU and DI_LSTM in terms of temporal analysis and spatial analysis (
Figure 3,
Figure 4,
Figure 5 and
Figure 6,
Figures S1 and S2), and found that DI_ConvGRU can not only capture the spatial–temporal characteristics of SM better, but it can also effectively handle the missing data in SMAP SM. From the spatial–temporal statistical scatter plot (
Figure 3 and
Figure 5), DI_ConvGRU was the best DL model used for SMAP SM prediction, which had the highest R, very close to 1, compared with interp_ConvGRU and DI_LSTM. This indicated that predictions of DI_ConvGRU were very close to SMAP SM observations. However, the spatial analysis results are not very satisfactory. In Eastern China (
Figure 4b,e,h,
Figure S1b,e,h, Figure S2b,e,h), we found that the RMSE values of DL models were generally large at some pixel points, especially in the case of predicted 2- and 3-day forecasts. The reason may come from the standard deviation of SMAP SM (
Figure S3a), where we found a significant correlation between the value of RMSE and the standard deviation of SMAP SM. However, for different models, we found that the correlation of RMSE with the standard deviation of SMAP SM was different. For example, in the eastern part of Heilongjiang province, this correlation was not significant for DI_ConvGRU, but it was larger for interp_ConvGRU and DI_LSTM. It can be seen from
Figure 4c and
Figure S3b that the mapping of R was consistent with the mapping of lagged R, and that the lagged R of SM memory had a significant positive correlation (0.793) with the R of DI_ConvGRU (
Figure S3c). The stronger the SM memory, the better the prediction performance of DI_ConvGRU. This is consistent with previous studies [
35].
The overall improvement of DI_ConvGRU compared with DI_LSTM was larger than that compared with interp_ConvGRU (
Figure 6), but the improvement was uneven across regions, and in some regions the performance even degraded (
Figure 7). For example, the R values of DI_LSTM were higher than DI_ConvGRU by more than 10% in northern Xinjiang as well as the Inner Mongolia region. This indicated that DI_LSTM may be more suitable for the prediction in these areas. We also found that the improvement in RMSE of DI_ConvGRU (
Figure 5c) was consistent with the trend of lagged correlation of SM (
Figure S3b). This indicated that DI_ConvGRU can capture the SM memory characteristic better than the other two DL models, especially DI_LSTM.
We also investigated the impact of missing SMAP SM data (
Figure S4) on the prediction performance of the model.
Figure S4a,b show the percentage of missing SMAP SM data per day and per pixel in the test phase, respectively. As shown in
Figure S4a, the winter between 2017 and 2018 had the highest rate of missing SM values of SMAP, over 70%. This was followed by spring and autumn, while summer had the lowest rate of missing data (40%). As can be seen from
Figures S5 and S6, the prediction performance of the DL models, especially DI_ConvGRU, was affected by the missing rate of SMAP data in different seasons. However, interp_ConvGRU was subject to larger errors caused by the interpolation (blue box plot), and these errors accumulated: the large amount of missing data in winter had little impact on the performance in winter but a large impact in the following spring. Therefore, the DI term can effectively prevent the accumulation of errors.
Figure S4b shows that there are more missing data in the western part of China than in the eastern part. However, the DL models (Figure 4, Figures S1 and S2) performed well in most areas of China, except the northern part. The above results indicate that the missing data rate at different locations did not affect the model performance much, but the missing data rate in different seasons did affect it to some extent.
It is expected that the DI_ConvGRU proposed in this work can be applied to remote sensing soil moisture data other than SMAP. Furthermore, it has the potential to be applied to the prediction of other remote sensing variables with gaps, although further study is needed to verify its suitability. Because this work shows that the performance of DI_ConvGRU depends heavily on the soil moisture memory effect represented by the lagged SM, it remains an open question whether the method is useful for gap-filling other variables, such as the leaf area index, and how well it would perform in that type of usage.
6. Conclusions
SM is a key physical parameter in land surface processes and is involved in hydrological processes, surface runoff, and land-atmosphere interactions. It also provides the basis for meteorological services such as drought and flood warnings. In this study, we proposed a convolutional gated recurrent unit model with data integration (DI_ConvGRU) for accurate and real-time SM prediction using SMAP L3 SM data. The model can capture the spatial and temporal correlations of time series SM and adapt to the irregular observations of SMAP SM. We compared it with interp_ConvGRU (to verify the role of the DI term) and DI_LSTM (to verify whether the spatial–temporal model improves the prediction accuracy of SM); according to RMSE, DI_ConvGRU improved the model performance in 74.88% and 68.99% of the regions relative to interp_ConvGRU and DI_LSTM, respectively. We analyzed and discussed three aspects of the model performance: the overall performance of the model, the model performance in different climatic regions, and the influence of different factors. The conclusions are as follows:
- (1)
DI_ConvGRU can not only better capture the spatial–temporal characteristics of SM, but also effectively handle the missing data in SMAP SM. In terms of prediction accuracy and convergence speed, the DI_ConvGRU model outperformed the other DL models (ConvGRU, LSTM, DI_LSTM, and interp_ConvGRU), and it achieved good performance with a bias of 0.0132 m³/m³, a ubRMSE of 0.022 m³/m³, and an R of 0.977. Replacing LSTM with ConvGRU had a greater impact on the model performance than replacing linear interpolation with the DI term.
- (2)
Among the eight climate zones, the polar regions had the best prediction performance, and the tropical regions had the worst. We found that the prediction performance of the model was strongly related to the lagged R of the SM and the coefficient of variation of the SM. The image-to-image training strategy of the spatial–temporal model captured not only the time series information but also the spatial information of surrounding pixels, whereas the DI term-based model better captured the peaks.
- (3)
The lagged SM had the most significant impact on the model performance, followed by the DI term and the static data. The linear interpolation-based DL model may accumulate errors, whereas the DI-based model can avoid this. Additionally, while the missing data rate at different locations had little impact on the model’s performance, the missing data rate in different seasons had some effect.
In general, the research results presented in this work can provide some reference value for the improvement of prediction models for other meteorological variables. However, the work has some limitations, and further research can be carried out in the following aspects in the future. First, the use of linear interpolation for meteorological forcing variables as well as geographically static attribute data may produce errors and reduce the predictive performance of the model. Second, the quality of SMAP SM data and severe data deficiencies can affect the prediction results of the model, and the deep learning model can be optimized by using high-quality data (for data quality control) and by combining multiple data. Third, because of GPU memory restrictions and the relatively low performance of GPU (RTX 2080Ti) we employed in our study, we were forced to divide the study area into 50 × 50 images. Therefore, utilizing a more powerful GPU could further shorten the training time of DI_ConvGRU and enhance prediction performance. Fourth, our evaluation of the model prediction performance was incomplete because it only considers cases where SMAP SM observations had values. We can evaluate the predicted values using site data instead of the missing pixel points of SMAP SM values. Finally, the DI_ConvGRU model will be useful for long-term SM hindcasts or forecasts as well as weather modeling; the model can also be employed as a spatial–temporal gap filling strategy for remote sensing data reconstruction.