3.3.1. Results Derived from Multi-Satellite LSTM Model Retrieval
With Rrl as input data and TB and TR as auxiliary data, the optimal parameters of the LSTM network were found by running a Bayesian optimisation algorithm for 100 iterations, and the LSTM soil moisture retrieval model was established. The field measurements from the ThetaProbe soil moisture sensor were used as the training target. The first 70% of the experimental data (1 October 2014–24 March 2016) was used for training, and the last 30% (25 March 2016–31 March 2016) was used as the test set for model prediction and evaluation.
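The chronological 70%/30% division described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `chronological_split`, the record layout, and the choice to round the boundary index down are all assumptions, since the text does not state how the boundary was computed.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(samples: Sequence[T], train_percent: int = 70) -> Tuple[List[T], List[T]]:
    """Split time-ordered samples into a leading training part and a trailing test part."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must lie strictly between 0 and 100")
    # Assumption: the boundary index is rounded down (integer arithmetic avoids float error).
    cut = len(samples) * train_percent // 100
    # No shuffling: the test set must follow the training set in time.
    return list(samples[:cut]), list(samples[cut:])


# Hypothetical, already time-sorted records: (time index, Rrl, TB, TR, in situ soil moisture).
records = [(t, 0.10 + 0.001 * t, 250.0 + t, 0.5, 0.13) for t in range(10)]
train, test = chronological_split(records)
print(len(train), len(test))
```

Keeping the samples in time order, rather than shuffling them, means the model is always evaluated on data that come after everything it was trained on. With this rounding rule, 37,783 samples split into 26,448 and 11,335, which agrees with the low-roughness counts reported later in this section.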
The results are shown in Figure 13. Compared with the in situ measurements (black dots), the blue and red dots match best, the purple dots match less well, and the green dots match worst. When Rrl alone (green) is used as input data for the prediction, the correlation coefficient is only 0.50, and the RMSE is 0.019. In this case, the retrieved soil moisture fluctuates only around 0.13 m³/m³ and does not reflect the real soil moisture change. This is because the reflectivity data contain a lot of noise, and the model cannot extract features well from a single input variable, so the retrieved soil moisture appears as an almost horizontal line. Because the accuracy is poor, the subsequent multi-satellite soil moisture retrieval no longer uses Rrl alone as input data. When Rrl is used as input data for prediction together with TR, the correlation coefficient improves to 0.58, and the RMSE is reduced to 0.018 m³/m³. When Rrl is used as input data for prediction together with TB, the correlation coefficient improves to 0.81, and the RMSE decreases to 0.013. Finally, when Rrl, TR, and TB are used together as input data for prediction, the correlation coefficient increases to 0.83, and the RMSE is 0.013 (Table 3).
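The two evaluation metrics quoted above can be computed as in the sketch below. It assumes that the correlation coefficient is Pearson's r, which the text does not state explicitly. The function names and the sample values are hypothetical and are not taken from the experiment.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between retrieved and in situ soil moisture."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def pearson_r(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumed to be the metric used in the text)."""
    if len(predicted) != len(observed) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0.0 or var_o == 0.0:
        # A perfectly flat series has no defined correlation.
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_p * var_o)


# Hypothetical retrieved and in situ soil moisture values in m^3/m^3.
retrieved = [0.12, 0.14, 0.13, 0.15]
in_situ = [0.11, 0.15, 0.12, 0.16]
print(round(pearson_r(retrieved, in_situ), 3), round(rmse(retrieved, in_situ), 3))
```

A retrieval that stays almost flat can still reach a small RMSE if the in situ values vary little, which is why the correlation coefficient is reported alongside the RMSE.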
Subsequently, we performed an uncertainty analysis of the model. According to Table 4, the model performs well under different data allocation methods, which implies that it has low variability and low uncertainty.
For comparison, we used a multilayer perceptron with the backpropagation learning algorithm (MLP-BP) and a support vector machine (SVM) to retrieve soil moisture [33,34]. Table 5 shows that the results of MLP-BP and SVM were not as good as those of LSTM. Therefore, in the following analyses, we used only the LSTM model to retrieve soil moisture.
Because soil surface roughness, vegetation scattering, and other factors affect the accuracy of model retrieval, distinguishing model performance under different roughness conditions can further improve accuracy. This section analyses the retrieval accuracy of two models, each trained and used for prediction under a different roughness condition. Since the LSTM model is applied to continuous time series, the roughness threshold is set to 0.21: data above 0.21 are treated as high roughness, and data below 0.21 are treated as low roughness (Figure 14).
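The threshold rule above amounts to partitioning the time series into two subsets before training. The sketch below is illustrative only: the dictionary key `roughness`, the function name, and the sample values are hypothetical, and assigning a value exactly equal to 0.21 to the low-roughness subset is an assumption, because the text does not say how ties are handled.

```python
from typing import Dict, List, Sequence, Tuple

ROUGHNESS_THRESHOLD = 0.21  # threshold quoted in the text

Sample = Dict[str, float]


def partition_by_roughness(
    samples: Sequence[Sample], threshold: float = ROUGHNESS_THRESHOLD
) -> Tuple[List[Sample], List[Sample]]:
    """Return (low_roughness, high_roughness), each kept in the original time order."""
    # Assumption: a value exactly at the threshold counts as low roughness.
    low = [s for s in samples if s["roughness"] <= threshold]
    high = [s for s in samples if s["roughness"] > threshold]
    return low, high


# Hypothetical time-ordered samples; only the roughness field matters here.
series = [
    {"t": 0.0, "roughness": 0.15},
    {"t": 1.0, "roughness": 0.25},
    {"t": 2.0, "roughness": 0.21},
    {"t": 3.0, "roughness": 0.30},
]
low, high = partition_by_roughness(series)
print(len(low), len(high))
```

Each subset is then split 70%/30% and used to train and test its own model.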
According to the threshold, in this first analysis, experimental data with high roughness are discarded, leaving a total of 37,783 moments. As before, Rrl, TB, and TR are selected as input data, and the data are divided into a training set and a test set at the standard ratio of 70% to 30%, giving 26,448 moments in the training set and 11,335 moments in the test set. In Figure 15, the red dots match the black dots most closely. When Rrl is used as input for prediction together with TR, the correlation coefficient is 0.67, and the RMSE is 0.017. When Rrl is used as input for prediction with TB, the correlation coefficient improves to 0.90, and the RMSE decreases to 0.009. When Rrl is used as input for prediction together with TR and TB, the correlation coefficient is 0.85, and the RMSE is 0.012 (Table 6).
Similarly, in the second analysis, experimental data with low roughness are discarded, leaving a total of 24,509 moments, of which 17,156 moments form the training set and 7,353 moments form the test set. Figure 16 shows the prediction performance of the model under high roughness, where the blue dots match the black dots most closely. When Rrl is used as input for prediction with TR, the correlation coefficient is only 0.36, and the RMSE is 0.010. When Rrl is used as input data for prediction with TB, the correlation coefficient improves to 0.53, and the RMSE is reduced to 0.009. When Rrl is used as input data for prediction together with TR and TB, the correlation coefficient is 0.83, and the RMSE is 0.007 (Table 7).
3.3.2. Results Derived from Single-Satellite LSTM Model Retrieval
Because the penetration and reflection characteristics of electromagnetic waves are closely related to their frequencies, reflected GNSS signals in different frequency bands carry different information on soil moisture [35]. The soil moisture retrieval experiment using single-satellite data was carried out with 29 satellites. The values of Rrl, TB, and TR at each point in time are used as model inputs for retrieval. Taking PRN13, 20, and 32 as examples, the retrieval results from the LSTM model are shown in Figure 17, and the retrieval results for all satellites are shown in Figure 18 and Figure 19.
Compared with using Rrl alone as input data, the RMSE is reduced by 39% on average after adding TR and TB; PRN32 has the best retrieval performance, with a maximum RMSE reduction of 63%. For demonstration purposes, satellites with a correlation coefficient greater than 0.5 are considered valid for retrieval. We counted the number of valid satellites for the four input configurations. If only reflectivity is used as input, only 2 satellites are valid. If TR is added, the number increases to 7. If TB is added, it increases to 24 (Table 8). When both TR and TB are added, all satellites are valid (R > 0.5), the highest correlation coefficient is 0.94, and several satellites, such as PRN13, 20, and 32, perform particularly well (R > 0.9). The single-satellite retrieval statistics show that the model has the highest accuracy after adding the two radiometer output products.
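The two summary statistics used in this subsection, the number of valid satellites and the relative RMSE reduction, can be sketched as follows. The per-satellite values below are invented purely for illustration and are not the values in Table 8. The function names are hypothetical, and the reduction is assumed to be measured relative to the Rrl-only RMSE.

```python
from typing import Dict

VALID_R_THRESHOLD = 0.5  # validity criterion quoted in the text


def count_valid_satellites(r_by_prn: Dict[str, float], threshold: float = VALID_R_THRESHOLD) -> int:
    """Count satellites whose correlation coefficient exceeds the threshold."""
    return sum(1 for r in r_by_prn.values() if r > threshold)


def rmse_reduction_percent(rmse_baseline: float, rmse_new: float) -> float:
    """Relative RMSE reduction in percent (assumed to be relative to the baseline)."""
    if rmse_baseline <= 0.0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (rmse_baseline - rmse_new) / rmse_baseline


# Hypothetical correlation coefficients for three satellites under two input configurations.
r_rrl_only = {"PRN13": 0.42, "PRN20": 0.55, "PRN32": 0.48}
r_all_inputs = {"PRN13": 0.91, "PRN20": 0.92, "PRN32": 0.94}

print(count_valid_satellites(r_rrl_only), count_valid_satellites(r_all_inputs))
print(rmse_reduction_percent(0.020, 0.010))
```

The comparison is strict, so a satellite with a correlation coefficient of exactly 0.5 is not counted as valid, in line with the "greater than 0.5" criterion.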