Article

Inversion of County-Level Farmland Soil Moisture Based on SHAP and Stacking Models

1
College of Science, Shihezi University, Shihezi 832003, China
2
Key Laboratory of Oasis Town and Basin System Ecological Corps, Shihezi 832003, China
*
Author to whom correspondence should be addressed.
Agriculture 2025, 15(14), 1506; https://doi.org/10.3390/agriculture15141506
Submission received: 29 May 2025 / Revised: 9 July 2025 / Accepted: 10 July 2025 / Published: 13 July 2025
(This article belongs to the Section Digital Agriculture)

Abstract

Accurate monitoring of soil moisture in arid agricultural regions is essential for improving crop production and managing water resources efficiently. This study focuses on Shihezi City in Xinjiang, China, and proposes a novel soil moisture retrieval method that integrates Sentinel-1 and Sentinel-2 remote sensing data. Dual-polarization parameter combinations were constructed and tested, and Pearson correlation analysis showed that two of them (VV + VH and VV × VH) carried the most useful information for soil moisture estimation. Shapley Additive exPlanations (SHAP) values derived from coupled support vector regression (SVR) and random forest regression (RFR) models were then used to select an additional six key features, and a Stacking model performed the soil moisture inversion on the selected features. Building on this framework, the predictive performance of multivariate linear regression (MLR), RFR, SVR, and a Stacking model that integrates these three models was compared. The Stacking model outperformed the other approaches, achieving an R² of 0.70 compared with 0.52, 0.61, and 0.62 for MLR, RFR, and SVR, respectively. Finally, the Stacking model was used to generate a county-level farmland soil moisture distribution map, which offers an objective and practical basis for agricultural management and the optimized allocation of water resources in arid regions.

1. Introduction

Soil moisture is one of the critical factors that determines agricultural production, ecosystem stability, and water resource management in arid and semi-arid regions. Its spatiotemporal distributions significantly influence crop growth, groundwater circulation, and regional ecological balance [1,2,3,4]. Accurate monitoring and prediction of soil moisture conditions are essential for optimizing agricultural irrigation, crop yield improvement, and sustainable utilization of limited water resources [5,6,7,8].
In recent years, the rapid advancement of remote sensing technology has provided effective means for large-scale soil moisture monitoring [9,10,11]. Synthetic Aperture Radar (SAR) has been widely used in soil moisture retrieval studies because of its all-weather, day-and-night imaging capabilities and high sensitivity to surface moisture [12,13,14,15]. SAR data capture surface moisture dynamics through backscattering coefficients, while optical remote sensing provides supplementary vegetation and environmental parameters [16,17,18,19]. The synergistic use of these datasets enhances the realization of reliable results in agricultural soil moisture monitoring. In parallel, the integration of SAR and optical data has also proven effective in monitoring surface water and mesic vegetation dynamics in semi-arid regions, further underscoring the value of multisource data fusion in hydrological applications [10]. For instance, He et al. [20] improved retrieval accuracy by integrating Landsat-8 and Sentinel-1 data and employing the Alpha approximation change detection method to reduce roughness and vegetation interference. Holtgrave et al. [21] validated the role of vegetation indices in mitigating vegetation effects by fusing SAR with NDVI data. Wang et al. [22] established a database based on the Advanced Integral Equation Model (AIEM), achieving high-precision surface soil moisture prediction in oasis regions. These studies have demonstrated the superior performance of combining SAR and optical data for soil moisture retrieval.
However, the complexity of feature parameters and their nonlinear relationship with soil moisture continue to pose challenges in constructing accurate retrieval models. Therefore, numerous recent studies have increasingly employed machine learning algorithms for soil moisture retrieval using various remote sensing datasets [23,24,25,26,27,28,29,30]. For instance, Li et al. [26] utilized the random forest (RF) algorithm to screen polarimetric features from Radarsat-2, optimizing model performance through feature selection. In a similar vein, Bourgeau-Chavez et al. [27] successfully retrieved soil moisture in U.S. black spruce forests by extracting polarimetric features from Radarsat-2 data using Van Zyl and H-A-α decomposition methods. Other initiatives come from (1) Wang et al. [28], who combined ALOS-2 and Landsat-8 data by selecting features based on parameter importance scores to construct a synergistic RF-based inversion model that demonstrated substantial applicability in arid desert-oasis regions, (2) Zhao et al. [29], who tested six different machine learning algorithms (including RF, SVR, and neural networks) with Sentinel-1/2 and Radarsat-2 data for farmland soil moisture inversion, reporting an R2 of 0.6395, and (3) Wang et al. [30], who successfully improved soil moisture monitoring accuracy in winter wheat fields by optimizing a convolutional neural network (CNN) model using a frost optimization algorithm. These studies collectively demonstrate the wide application and flexibility of machine learning methods in soil moisture retrieval. Despite some progress, existing research still presents several limitations that constrain broader applicability. Many studies have primarily relied on single-model frameworks, with limited exploration of the potential benefits of multi-model fusion—particularly at the sub-regional county scale, where related investigations remain scarce. 
Some studies have focused on specific vegetation types, with little attention to farmland [27,28], or concentrated solely on single-crop conditions and comparisons of individual models [29,30]. Moreover, single-model frameworks have shown limited generalizability; for example, Li et al. [26] reported signs of overfitting in the validation set when using RF, with an overall sample root mean square error (RMSE) of approximately 6%.
Building on these identified gaps, this study proposes an interpretable ensemble modeling approach for soil moisture retrieval that is tailored for small-sample conditions at the county scale. Unlike previous efforts that primarily focused on single algorithms or single-crop conditions, our framework integrates SAR and optical features using a Stacking ensemble model, which leverages the complementary strengths of multiple base learners—namely multivariate linear regression (MLR), support vector regression (SVR), and random forest regression (RFR). To enhance interpretability and feature transparency, we further incorporate Shapley Additive exPlanations (SHAP) to quantify the relative importance of individual input features [31,32]. Specifically, we aim to (1) integrate SAR and optical remote sensing features, (2) apply SHAP and correlation analysis to identify the most informative and non-redundant variables, and (3) assess the performance of a Stacking model relative to single-machine learning models across county-scale agricultural land. These objectives collectively aim to improve the accuracy, interpretability, and applicability of soil moisture retrieval under real-world constraints.

2. Study Area and Datasets

2.1. Study Area

The study area is located in Shihezi, a county-level city in Xinjiang, China, administered by the Eighth Division of the Xinjiang Production and Construction Corps, spanning 85°59′12″–86°08′13″ E and 44°15′43″–44°19′13″ N. Situated at the northern foothills of the Tianshan Mountains, between Shawan City and Manas County, the region is predominantly flat, with terrain gradually descending from the southeast to the northwest. The area experiences a typical temperate continental climate, characterized by an average annual temperature of 6.5–7.2 °C and annual precipitation of 125.0–207.7 mm, with intense evaporation and pronounced aridity. Shihezi features contiguous, well-organized farmland, primarily cultivating cotton, maize, wheat, sugar beets, and zucchini, making it a representative oasis agricultural zone. Farmland data focusing on cropland land-use types (Figure 1) were extracted from the China Land Cover Dataset (CLCD) provided by Wuhan University.

2.2. Data Sources and Preprocessing

2.2.1. Remote Sensing Data

The remote sensing data used in this study comprise Sentinel-1 and Sentinel-2 imagery. Sentinel-1 provides C-band SAR data, which are downloadable from the European Space Agency’s (ESA) Copernicus Open Access Hub (https://browser.dataspace.copernicus.eu, accessed on 11 October 2024). Sentinel-1 offers multiple imaging modes, including Stripmap (SM), Interferometric Wide Swath (IW), and Extra Wide Swath (EW). Considering the characteristics of the study area and the analytical requirements, we selected Level-1 Single Look Complex (SLC) and Ground Range Detected (GRD) products in IW mode, with VV/VH dual polarization and an average incidence angle of 39.36°. The image was acquired on 8 October 2024, a date chosen for its close temporal correspondence with the ground observation period. GRD data were used to extract backscattering coefficients and to calculate related parameters. SLC data were used for polarimetric decomposition, which yields polarimetric parameters that extend the model’s input feature set.
Sentinel-2 carries the MultiSpectral Instrument (MSI), which acquires data at spatial resolutions of 10 m, 20 m, and 60 m, including three red-edge bands that are particularly valuable for soil moisture retrieval in vegetated areas. These data were downloaded from the Copernicus Data Space Ecosystem (https://browser.dataspace.copernicus.eu, accessed on 26 October 2024). We selected an atmospherically corrected Level-2A product acquired on 9 October 2024, one day after the Sentinel-1 acquisition. The image was clipped to the study area’s footprint. All datasets were then resampled to 10 m resolution using bilinear interpolation to ensure spatial alignment and enable multisource pixel-level analysis.

2.2.2. Ground Measurement Data

Ground data were collected on 6 October and 8 October 2024, two closely spaced sampling dates chosen to coincide with the satellite overpasses. Weather conditions remained stable throughout this period. Within the study area, four easily accessible farmland zones covering the typical crop types were selected for ground truth collection, each containing 10–20 sampling points (Figure 1). At each sampling point, the latitude and longitude coordinates, soil moisture content, and soil temperature were recorded. Coordinates were obtained by Beidou positioning with an accuracy of about ±3 m. Although the sampling points are limited in number and unevenly distributed in space, they cover the major land cover types of the study area, of which wheat, cotton, corn, and post-harvest bare soil were the most dominant. For each cover type, topsoil samples (0–10 cm) were extracted at the selected sampling points and sealed in aluminum boxes for laboratory analysis. In the laboratory, the gravimetric soil moisture and bulk density of each sample were measured. Soil moisture was determined using the standard oven-drying method at 105 °C until constant weight was reached. Sampling was deliberately concentrated in areas with greater land type heterogeneity to make the training data more representative, and fewer samples were collected in homogeneous land-use areas, because repeatedly sampling the same conditions adds little value to remote sensing-based retrieval.

2.2.3. Data Preprocessing

To ensure data quality, all remote sensing and ground measurement datasets underwent systematic preprocessing. For the field measurements, soil samples were oven-dried to determine the gravimetric water content at each sampling point. Soil bulk density was measured using the cutting ring method, with an average value of 1.33 g/cm³. The gravimetric water content was then converted to volumetric water content using the bulk density, giving a range of 14.62% to 36.95%. Three outliers that clearly deviated from expected values were removed, leaving 60 sampling points with volumetric soil moisture data for model input and validation. Sentinel-1 GRD data were preprocessed using ESA’s Sentinel Application Platform (SNAP 10.0.0) with the following steps: orbit correction, border noise removal, thermal noise reduction, speckle filtering with a 7 × 7 pixel Refined Lee filter, radiometric calibration, terrain correction, geocoding, and image cropping. For Sentinel-1 SLC data, PolSARpro software (version 6.0) was used for multi-looking, polarimetric filtering, and H/A/α polarimetric decomposition to extract polarimetric parameters. Sentinel-2 Level-2A data contain atmospherically corrected bottom-of-atmosphere reflectance and therefore did not require additional atmospheric correction. However, because a single image did not fully cover the study area, the images were mosaicked and then clipped to the study area’s footprint.
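The gravimetric-to-volumetric conversion mentioned above can be sketched as follows. This is a minimal illustration that assumes the standard relation θv = θg × ρb / ρw with a water density of 1.0 g/cm³. The function name and the example gravimetric value are hypothetical; only the mean bulk density of 1.33 g/cm³ comes from the text.

```python
def gravimetric_to_volumetric(theta_g_percent, bulk_density, water_density=1.0):
    """Convert gravimetric water content (%) to volumetric water content (%).

    Assumes theta_v = theta_g * rho_b / rho_w, with densities in g/cm^3.
    """
    if bulk_density <= 0 or water_density <= 0:
        raise ValueError("densities must be positive")
    return theta_g_percent * bulk_density / water_density


# Hypothetical sample: 20% gravimetric moisture at the reported mean bulk density
theta_v = gravimetric_to_volumetric(20.0, 1.33)
```

In practice the bulk density measured for each sample, rather than the mean, could be passed in per sampling point; the text does not state which was used.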

3. Methodology

This study utilized Sentinel-1 and Sentinel-2 data to extract 21 features. Feature selection, informed by Pearson correlation analysis and SHAP values derived from the integrated RFR and SVR models, laid the foundation for building a multisource remote sensing ensemble model. The objective is to enhance farmland soil moisture retrieval accuracy at the county scale, providing theoretical support and practical guidance for agricultural water resource management in arid regions. The technical workflow is illustrated in Figure 2.

3.1. Feature Parameter Extraction

3.1.1. Backscattering Coefficients

The backscattering coefficient is a critical parameter in active microwave remote sensing for soil moisture inversion because it reliably captures physical properties by directly reflecting surface scattering intensity. Based on the longitude and latitude coordinates of sampling points, the VV-polarized backscattering coefficient ($\sigma^{o}_{VV}$) and VH-polarized backscattering coefficient ($\sigma^{o}_{VH}$) were extracted from the preprocessed SAR data. Research has shown that surface roughness correlates with polarization ratios [33]. To enrich the feature space, four combined features were constructed based on VV and VH polarizations: VV + VH, VV − VH, VV × VH, and VV/VH.
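The four combined features can be built with a few array operations. The sketch below assumes the combinations are computed directly on backscatter values in dB, which the text does not state explicitly; the function name and the sample values are hypothetical.

```python
import numpy as np


def polarization_combinations(vv_db, vh_db):
    """Build the four VV/VH combination features from backscatter arrays."""
    vv = np.asarray(vv_db, dtype=float)
    vh = np.asarray(vh_db, dtype=float)
    return {
        "VV+VH": vv + vh,
        "VV-VH": vv - vh,
        "VVxVH": vv * vh,
        "VV/VH": vv / vh,  # assumes VH is never exactly zero
    }


# Hypothetical backscatter values (dB) at three sampling points
features = polarization_combinations([-10.0, -12.5, -8.0], [-18.0, -20.0, -16.0])
```

With both channels negative in dB, the sum is negative while the product is positive, so the two features move in opposite directions as backscatter strengthens.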

3.1.2. Incidence Angle and Surface Roughness

The incidence angle $\theta$ is a key parameter in microwave scattering, with both $\sin\theta$ and $\cos\theta$ exhibiting specific relationships with soil moisture [34]. After obtaining the incidence angle for each sampling point based on its coordinates, this study calculated $\sin\theta$, $\cos\theta$, and $\tan\theta$ through functional transformations. Surface roughness, as a crucial parameter affecting surface scattering characteristics, is closely related to soil moisture and backscattering coefficients. Zhao et al. [35] used Sentinel-1 data to calculate the composite surface roughness ($Z_s$), as shown in Equation (1), where nonlinear least squares and linear regression methods were employed to fit the C-band parameters $A_V(\theta)$ and $B_V(\theta)$, given in Equations (2) and (3):
$$Z_s = \exp\left(\frac{\sigma^{o}_{VH} - \sigma^{o}_{VV} - B_V(\theta)}{A_V(\theta)}\right) \tag{1}$$
$$A_V(\theta) = 2.6408\sin^{3}\theta + 5.293\sin^{2}\theta - 3.838\sin\theta + 2.2042 \tag{2}$$
$$B_V(\theta) = 4.1522\sin^{3}\theta - 13.1\sin^{2}\theta + 16.9472\sin\theta - 16.4228 \tag{3}$$
Because in situ roughness measurements are time-consuming, labor-intensive, and limited in coverage, this study did not measure roughness in the field and instead adopted the validated roughness formulation from the literature [35]. This approach avoids the limitations of field measurements and keeps the model applicable across the whole study area.

3.1.3. Polarimetric Decomposition Parameters

Polarimetric decomposition techniques enhance feature extraction by reliably dissecting complex scattering processes. For dual-polarization Sentinel-1 data, H-A-α decomposition was performed to derive three polarimetric parameters from the target’s coherence or covariance matrix [28]. These parameters are Entropy (H), which measures the degree of polarization in target scattering; Alpha angle (α), representing the dominant scattering mechanism; and Anisotropy (A), which characterizes the complexity of scattering. Narayanarao et al. [36] demonstrated that pseudo-scattering-type parameters ($\theta_c$), pseudo-entropy parameters ($H_c$), and co-polarization purity parameters ($m_c$) can be extracted from dual-polarization Sentinel-1 GRD SAR data, effectively reflecting crop phenology and scattering mechanisms. Their calculations are given in Equations (7)–(9), with intermediate steps in Equations (5) and (6). First, the backscattering coefficients $\sigma^{o}_{VV}$ and $\sigma^{o}_{VH}$ (in dB) were converted to linear values using Equation (4).
$$\sigma^{o}_{VV,line} = 10^{\sigma^{o}_{VV}/10}, \quad \sigma^{o}_{VH,line} = 10^{\sigma^{o}_{VH}/10} \tag{4}$$
$$q = \frac{\sigma^{o}_{VH,line}}{\sigma^{o}_{VV,line}} \tag{5}$$
$$p_1 = \frac{1}{1+q}, \quad p_2 = \frac{q}{1+q} \tag{6}$$
$$m_c = \frac{1-q}{1+q}, \quad 0 \le m_c \le 1 \tag{7}$$
$$\theta_c = \tan^{-1}\left(\frac{(1-q)^{2}}{1+q^{2}-q}\right), \quad 0^{\circ} \le \theta_c \le 45^{\circ} \tag{8}$$
$$H_c = -\sum_{i=1}^{2} p_i \log_2 p_i, \quad 0 \le H_c \le 1 \tag{9}$$
In the equations, $m_c$ represents the purity of target scattering, describing the varying degrees of complexity in target scattering. $\theta_c$ characterizes the scattering type of the target, providing gradual change information about the target’s scattering mechanism. $H_c$ represents the entropy of target scattering, quantifying the randomness of the scattering process.
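The chain from dB backscatter to the three descriptors can be sketched as below. This is an illustrative reading of Equations (4)–(9), not the authors' code: it assumes the arctangent form for the scattering-type parameter (consistent with its stated 0°–45° range) and a cross- to co-polarized ratio q between 0 and 1. The function name and the sample values are hypothetical.

```python
import numpy as np


def dual_pol_descriptors(vv_db, vh_db):
    """Return (m_c, theta_c in degrees, H_c) from VV and VH backscatter in dB."""
    # Eq. (4): dB to linear power
    vv_lin = 10.0 ** (np.asarray(vv_db, dtype=float) / 10.0)
    vh_lin = 10.0 ** (np.asarray(vh_db, dtype=float) / 10.0)
    # Eq. (5): cross- to co-polarized ratio
    q = vh_lin / vv_lin
    # Eq. (6): pseudo-probabilities, which sum to 1
    p1 = 1.0 / (1.0 + q)
    p2 = q / (1.0 + q)
    # Eq. (7): co-polarization purity
    m_c = (1.0 - q) / (1.0 + q)
    # Eq. (8): pseudo scattering-type parameter (arctangent form assumed)
    theta_c = np.degrees(np.arctan((1.0 - q) ** 2 / (1.0 + q ** 2 - q)))
    # Eq. (9): pseudo-entropy with base-2 logarithm
    h_c = -(p1 * np.log2(p1) + p2 * np.log2(p2))
    return m_c, theta_c, h_c


# Hypothetical pixel: VV = -10 dB, VH = -20 dB, so q = 0.1
m_c, theta_c, h_c = dual_pol_descriptors(-10.0, -20.0)
```

When VH equals VV (q = 1), this formulation gives a purity of 0, a scattering-type angle of 0°, and an entropy of 1, which matches the stated bounds of the three parameters.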

3.1.4. Vegetation Indices

Vegetation significantly affects surface scattering information, necessitating compensation through optical remote sensing data. This study utilized Sentinel-2 data and selected four vegetation indices commonly used in soil moisture inversion based on the actual vegetation coverage in the study area [35]: the Normalized Difference Vegetation Index (NDVI), Normalized Difference Water Index (NDWI), Fusion Vegetation Index (FVI), and Modified Soil-Adjusted Vegetation Index (MSAVI). These indices were calculated for each sampling point, with their specific expressions shown in Table 1.
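Two of the four indices can be illustrated with their commonly used formulations. The expressions in Table 1 remain authoritative; the sketch below assumes the usual NDVI definition and the widely used closed-form MSAVI, and it omits NDWI and FVI because their definitions vary between studies and are not given in this section. The function names and reflectance values are hypothetical.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)


def msavi(nir, red):
    """Modified Soil-Adjusted Vegetation Index (common closed form, assumed here)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (2.0 * nir + 1.0 - np.sqrt((2.0 * nir + 1.0) ** 2 - 8.0 * (nir - red))) / 2.0


# Hypothetical surface reflectances for one pixel (NIR = 0.4, Red = 0.1)
ndvi_value = ndvi(0.4, 0.1)
msavi_value = msavi(0.4, 0.1)
```

For Sentinel-2 Level-2A data, the near-infrared and red inputs would typically be bands B8 and B4 scaled to reflectance, which is an assumption about the band choice rather than a statement from the text.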

3.2. Model Construction

3.2.1. Feature Selection

The accuracy of remote sensing-based soil moisture inversion depends strongly on the composition of the input features. The 21 feature parameters extracted above, with the symbols assigned in Table 2, were therefore screened before modeling. First, the correlations between soil moisture content and the VV and VH backscattering coefficients and their combined polarization features were calculated, and the two best-performing polarization features (VV + VH and VV × VH) were retained as input variables for the modeling process.
To further quantify the contribution of the other parameters (e.g., incidence angle and surface roughness) to soil moisture retrieval, this study employed SHAP values, which are grounded in game theory [37]. SHAP values were computed for both the RFR and SVR models, so that the marginal contribution of each feature to the prediction could be quantified. The SHAP explanation model is defined as follows [38]:
$$g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i, \quad z' \in \{0,1\}^{M} \tag{10}$$
where $g$ is the explanation model, $z'$ is the simplified binary input indicating which features are present, $M$ is the number of input features in the model, $\phi_0$ is the mean prediction over the training data, and $\phi_i$ is the marginal contribution of feature $i$, which is its SHAP value.
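The additive form of the explanation model can be checked on a toy example by computing exact Shapley values through enumeration of all feature coalitions. The sketch below is purely illustrative and uses only the standard library; the toy value function, its numbers, and all names are hypothetical, and practical SHAP implementations use approximations rather than full enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition of features."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for a coalition of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                coalition = set(subset)
                gain = value_fn(coalition | {i}) - value_fn(coalition)
                phi[i] += weight * gain
    return phi


def toy_value(present):
    """Hypothetical model output when only the features in `present` are known."""
    effects = {0: 3.0, 1: -1.5, 2: 0.5}
    total = 20.0 + sum(effects[j] for j in present)  # 20.0 plays the role of phi_0
    if 0 in present and 1 in present:                # interaction of features 0 and 1
        total += 1.0
    return total


phi = shapley_values(toy_value, 3)
phi_0 = toy_value(set())
full_prediction = toy_value({0, 1, 2})
```

In this toy case the interaction term is split equally between the two interacting features, and the base value plus the three contributions reproduces the full prediction, which is the additivity expressed by the explanation model.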
Combining the RFR + SHAP and SVR + SHAP approaches exploits the complementary advantages of the two models. The procedure was as follows:
(1) RFR + SHAP: random forest regression models global feature relationships stably, and its SHAP values quantify the importance of each feature within the overall model.
(2) SVR + SHAP: support vector regression is sensitive to local nonlinear patterns, and its SHAP values capture the feature contributions associated with those patterns.
(3) The SHAP outputs from RFR and SVR were combined with equal weights (a weighting factor of 0.5 each) to obtain a comprehensive feature contribution ranking. This balances global and local feature importance and reduces the biases and instabilities that may arise from relying on a single model.
(4) The six features ranked highest by weighted average SHAP value (NDWI, $H_c$, $\cos\theta$, MSAVI, FVI, and $m_c$) were selected as input variables for the subsequent Stacking model.
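The equal-weight combination in step (3) can be sketched as follows, starting from two matrices of per-sample SHAP values (for example, as produced by a SHAP explainer for each model). The text does not say whether the two models' importances were rescaled before averaging, so the normalization to unit sum below is an assumption, as are the function name and the toy matrices.

```python
import numpy as np


def combined_shap_ranking(shap_rfr, shap_svr, feature_names, weight_rfr=0.5):
    """Rank features by a weighted average of mean absolute SHAP values."""
    imp_rfr = np.abs(np.asarray(shap_rfr, dtype=float)).mean(axis=0)
    imp_svr = np.abs(np.asarray(shap_svr, dtype=float)).mean(axis=0)
    # Assumption: rescale each model's importances to sum to 1 so that
    # neither model dominates purely because of its SHAP value scale.
    imp_rfr = imp_rfr / imp_rfr.sum()
    imp_svr = imp_svr / imp_svr.sum()
    combined = weight_rfr * imp_rfr + (1.0 - weight_rfr) * imp_svr
    order = np.argsort(combined)[::-1]
    return [(feature_names[k], float(combined[k])) for k in order]


# Hypothetical SHAP matrices (rows: samples, columns: features)
names = ["NDWI", "Hc", "cos_theta"]
shap_rfr = [[0.8, -0.2, 0.1], [-0.6, 0.4, -0.1]]
shap_svr = [[0.3, -0.5, 0.2], [-0.5, 0.3, -0.2]]
ranking = combined_shap_ranking(shap_rfr, shap_svr, names)
```

Keeping the top six entries of such a ranking would correspond to the selection in step (4).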

3.2.2. Stacking Model Construction

This study employed four distinct models (MLR, RFR, SVR, and Stacking) for soil moisture inversion through feature engineering. These models were intentionally selected and used because of their respective capabilities to (1) effectively fit linear trends (MLR); (2) demonstrate stable performance with high-dimensional nonlinear data (RFR); (3) be suitable for small sample sizes with strong pattern-fitting capabilities (SVR); and (4) enhance prediction accuracy (Stacking). The latter (Stacking) was constructed by integrating SVR and RFR as base learners to leverage their complementary strengths. In a Stacking ensemble, a meta-learner is used to combine base model predictions and to capture higher-level patterns. In this study, Bayesian ridge regression served as the meta-learner, which improved model stability by automatically adjusting the regularization parameter (α) within a probabilistic framework. This property is especially beneficial in small-sample scenarios, as it helps reduce overfitting and enhances the generalizability of the ensemble. Figure 3 presents the architectural structure of the Stacking model, and implementation details are described in Steps 1–5 below. All models were developed using Python 3.11. Core modeling procedures were implemented with the scikit-learn library [39], and model interpretability was supported by the SHAP library [40], along with other standard scientific computing tools.
(1) The 60 samples were randomly divided into 42 training samples and 18 validation samples. To reduce the influence of data partitioning randomness, multiple random splits were performed, and one representative split whose performance fell within one standard deviation of the mean across all splits was selected for presentation. All feature variables were standardized before modeling to eliminate the influence of differing units and to improve training performance and convergence speed.
(2) Grid search was applied to optimize the hyperparameters of SVR and RFR. The SVR used an RBF kernel, which captures complex nonlinear relationships, with C = 400 and γ = 0.0032. The RFR was configured with max_features = log2, n_estimators = 6, and max_depth = 2 to control complexity and enhance robustness with small samples.
(3) A stacked feature matrix was constructed from the predictions of SVR and RFR on the training set and the test set, yielding Xstack-train and Xstack-test, respectively.
(4) The Bayesian ridge regression meta-learner was fitted on Xstack-train to integrate the base learners’ predictions.
(5) The trained meta-learner was applied to Xstack-test to produce the final soil moisture inversion results, whose accuracy and stability were verified using the RMSE, mean absolute error (MAE), and mean bias error (Bias).
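Steps 1–5 can be sketched with scikit-learn as below. This is a minimal illustration on synthetic stand-in data, not the study's measurements or code: the hyperparameters are those reported in the text, while the random seeds, the synthetic target, and the variable names are assumptions. Grid search is omitted because the tuned values are already given.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data: 60 samples, 8 features (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = 25.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=60)

# Step 1: 42/18 split, then standardization fitted on the training set only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=18, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Step 2: base learners with the hyperparameters reported in the text
svr = SVR(kernel="rbf", C=400, gamma=0.0032).fit(X_train_s, y_train)
rfr = RandomForestRegressor(n_estimators=6, max_depth=2, max_features="log2",
                            random_state=0).fit(X_train_s, y_train)

# Step 3: stacked feature matrices built from the base learners' predictions
X_stack_train = np.column_stack([svr.predict(X_train_s), rfr.predict(X_train_s)])
X_stack_test = np.column_stack([svr.predict(X_test_s), rfr.predict(X_test_s)])

# Step 4: Bayesian ridge meta-learner fitted on the stacked training matrix
meta = BayesianRidge().fit(X_stack_train, y_train)

# Step 5: final predictions for the held-out samples
y_pred = meta.predict(X_stack_test)
```

The sketch follows the steps as described, so the meta-learner sees in-sample base predictions. A common alternative, used by scikit-learn's `StackingRegressor`, is to train the meta-learner on cross-validated (out-of-fold) base predictions, which can reduce overfitting on small samples.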

3.3. Accuracy Validation

This study used the coefficient of determination (R²) and the RMSE to evaluate the accuracy of the prediction results, as defined in Equations (11) and (12). The closer R² is to 1 and the smaller the RMSE, the higher the inversion accuracy of the model for soil moisture. The MAE and the Bias were added as complementary metrics. The MAE, given in Equation (13), reflects the typical magnitude of the model’s error across data points. The Bias, given in Equation (14), is the average difference between predicted and measured values and indicates whether predictions are systematically too high or too low; a Bias close to 0 indicates no systematic over- or underestimation.
$$R^{2} = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^{2}}{\sum_{i=1}^{n}(y_i - \bar{y})^{2}} \tag{11}$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}(\hat{y}_i - y_i)^{2}}{n}} \tag{12}$$
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{13}$$
$$Bias = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i) \tag{14}$$
In these equations, $\hat{y}_i$ is the predicted soil moisture value at sampling point $i$; $\bar{y}$ is the mean of the measured soil moisture values; $y_i$ is the measured soil moisture value at sampling point $i$; and $n$ is the number of samples.
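The four metrics can be computed in a few lines. The sketch below implements R² as the ratio of explained to total sum of squares, as written in Equation (11), and the RMSE in its standard form based on the difference between predicted and measured values. The function name and the sample values are hypothetical.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """Return R^2 (explained/total sum of squares), RMSE, MAE and Bias."""
    y = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_pred, dtype=float)
    y_bar = y.mean()
    r2 = np.sum((y_hat - y_bar) ** 2) / np.sum((y - y_bar) ** 2)
    rmse = np.sqrt(np.mean((y_hat - y) ** 2))
    mae = np.mean(np.abs(y - y_hat))
    bias = np.mean(y_hat - y)
    return {"R2": float(r2), "RMSE": float(rmse), "MAE": float(mae), "Bias": float(bias)}


# Hypothetical measured and predicted volumetric moisture values (%)
metrics = evaluate([10.0, 20.0, 30.0], [12.0, 20.0, 28.0])
```

Note that scikit-learn's `r2_score` uses the form 1 − SS_res/SS_tot instead. The two definitions coincide for an ordinary least-squares fit with an intercept evaluated on its training data, but they can differ for other models or for held-out data.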

4. Results

4.1. Parameter Combination Selection

To enhance the performance of polarimetric features in soil moisture retrieval, this study constructed multiple polarization parameter combinations and analyzed their correlation with soil moisture and linear fitting performance. Table 3 shows the correlation coefficients and fitting accuracies of various polarimetric features. It is worth noting that correlation filtering was only applied to variables $x_1$ to $x_6$, which are directly derived from dual-polarization SAR backscatter and tend to exhibit strong internal correlations due to their common physical origin and mathematical formulation. In contrast, variables $x_7$ to $x_{21}$ were not subjected to correlation analysis, as they stem from more diverse sources (geometric descriptors, vegetation indices, surface roughness parameters, and polarimetric decomposition features) and are therefore less susceptible to multicollinearity. This selective strategy aimed to reduce redundancy while preserving complementary information for model training.
The two single-polarization features (VV and VH) showed statistically significant negative correlations with soil moisture, with Pearson correlation coefficients of −0.66 and −0.62 and linear fitting R² values of 0.44 and 0.38, respectively, indicating relatively low fitting accuracy. In contrast, two combined features showed stronger correlations and fitting capability: VV + VH, with a correlation coefficient of −0.71, and VV × VH, with a positive correlation of 0.73. Both combinations yielded R² values exceeding 0.50, indicating considerable modeling potential. VV − VH and VV/VH displayed weak correlations, with Pearson coefficients of 0.04 and 0.28 and R² values of only 0.00 and 0.08, respectively. VV + VH and VV × VH were therefore retained as input variables for the subsequent models.
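This kind of screening can be sketched with a Pearson correlation per candidate feature. For a simple linear fit with one predictor, R² equals the squared correlation coefficient, which is consistent with the reported pairs (for example, (−0.71)² ≈ 0.50 and 0.73² ≈ 0.53). The function name and the toy data below are hypothetical.

```python
import numpy as np


def pearson_screen(features, target):
    """Rank candidate features by |Pearson r| with the target.

    Returns a list of (name, (r, r_squared)) sorted by decreasing |r|.
    """
    y = np.asarray(target, dtype=float)
    results = {}
    for name, values in features.items():
        r = float(np.corrcoef(np.asarray(values, dtype=float), y)[0, 1])
        results[name] = (r, r * r)  # single-predictor linear fit: R^2 = r^2
    return sorted(results.items(), key=lambda item: abs(item[1][0]), reverse=True)


# Hypothetical soil moisture (%) and two candidate features at four points
moisture = [15.0, 20.0, 25.0, 30.0]
candidates = {
    "VV+VH": [-24.0, -26.0, -28.0, -30.0],
    "VV/VH": [1.0, 3.0, 2.0, 4.0],
}
ranking = pearson_screen(candidates, moisture)
```

Ranking by the absolute value of r keeps strongly negative correlations, such as those reported for VV + VH, on an equal footing with strongly positive ones.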

4.2. Importance Analysis

In machine learning, feature selection strongly affects prediction accuracy: too many features can lead to overfitting and too few to underfitting, either of which prevents the model from achieving optimal performance [32,41]. After the optimal polarization combination features had been selected, the remaining 15 features still had to be screened to avoid redundancy and potential accuracy degradation. The six backscatter-based features ($x_1$–$x_6$) were not included in the SHAP analysis to avoid instability caused by their high mutual correlations. SHAP values were used to rank feature importance, as shown in Table 4. The mean absolute SHAP value across all samples was used as the global importance measure for each feature, giving the ranking $x_{18} > x_{16} > x_{9} > x_{20} > x_{19} > x_{14} > x_{8} > x_{7} > x_{10} > x_{15} > x_{11} > x_{13} > x_{12} > x_{17} > x_{21}$.
The importance ranking revealed that $x_{18}$ (NDWI) contributed most to soil moisture retrieval, accounting for 40% of the total SHAP value, followed by $H_c$, $\cos\theta$, MSAVI, FVI, and $m_c$ (Table 4). To identify optimal features for achieving the best inversion results, we conducted soil moisture inversion analysis using the Stacking model based on the top 10 important features from Table 4, combined with the previously selected optimal polarization features $x_3$ (VV + VH) and $x_5$ (VV × VH), to ensure a comprehensive feature set. Ten different feature combinations were evaluated for their test set error metrics and variation trends (Table 5).
Across the ten feature combinations, the test-set performance of the Stacking model (R², RMSE, MAE) generally first improved and then declined. The combination of the six top-ranked features (NDWI, $H_c$, $\cos\theta$, MSAVI, FVI, and $m_c$) with the two polarization features performed best among all ten combinations (R² = 0.7007, RMSE = 1.8837%, MAE = 3.5484%). These six features improved soil moisture inversion accuracy compared with the other feature combinations, demonstrating the substantial impact of feature selection on prediction precision.

4.3. Model Comparison

Table 6 presents a comparison of evaluation metrics for soil volumetric water content inversion results across four different models. Figure 4 shows the comparison between inversion results and measured values for RFR, SVR, and the Stacking model.
These results demonstrate that the proposed Stacking model exhibits superior performance by outperforming individual MLR, RFR, and SVR models across all metrics. Specifically, the Stacking model achieves an R2 of 0.7007, representing improvements of 34.6%, 15.7%, and 13.1% over MLR, RFR, and SVR, respectively. Its RMSE reduces to 1.8837%, showing decreases of 21.0%, 12.9%, and 11.3% compared to MLR, RFR, and SVR. Although the Stacking model’s MAE (3.5484%) is higher than that of MLR, it is significantly lower than both RFR and SVR, indicating an optimal balance between prediction accuracy and stability. In terms of bias, the Stacking model demonstrates relatively stable performance (0.5466%), significantly lower than RFR (0.7234%) though slightly higher than SVR (0.3029%), suggesting consistent systematic error correction. Comparative analysis reveals that RFR exhibits substantial deviation from zero, indicating systematic overestimation or underestimation, while SVR shows better bias performance. Although MLR’s bias approaches zero (0.0250%), its inferior RMSE and R2 performance point to limited predictive capability. Despite the sample size being relatively small, the fitting results of each model on both the training and testing sets (Figure 4) show a high level of consistency between the predicted and observed values, with strong correlations and a relatively uniform distribution of residuals that suggests no bias. These results indicate that the constructed models exhibit good generalization ability under the current data split, without significant signs of overfitting, as further supported by similar trends observed across multiple random splits. The major strength of the Stacking model is that it effectively integrates predictions from MLR, RFR, and SVR, capitalizing on each model’s strengths while mitigating individual limitations, thereby significantly enhancing inversion accuracy. 
Overall, these results indicate the ability of our approach to successfully eliminate redundant features and to substantially improve the accuracy of soil volumetric water content estimation by combining the predictive capabilities of different models.
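The evaluation metrics compared above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, R2 is assumed to be the coefficient of determination, and bias is assumed to be the mean of predicted minus observed values, which may differ from the exact definitions used for Table 6.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE, MAE and bias (mean error) for paired samples."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)    # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "bias": sum(errors) / n,
    }


def relative_change(new, reference):
    """Percentage change of `new` relative to `reference`."""
    return 100.0 * (new - reference) / reference


# R2 values quoted in the text: Stacking 0.7007, MLR 0.5206.
gain_over_mlr = relative_change(0.7007, 0.5206)
print(f"R2 gain of Stacking over MLR: {gain_over_mlr:.1f}%")
```

Applied to the two R2 values quoted in the text, `relative_change` reproduces the reported 34.6% improvement of the Stacking model over MLR.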

4.4. County-Level Farmland Soil Moisture Inversion Results

Figure 5 shows the soil moisture inversion results of county-level farmland using the RFR, SVR, and Stacking models.
Analysis of the results shown in Figure 5 indicates that the RFR model produced a county-level mean soil moisture of 25.3348%, with predicted values ranging between 16.8043% and 33.8070%. The predicted minimum is higher than the observed minimum (14.6236%), and the predicted maximum is lower than the observed maximum (36.9474%). This narrower range suggests potential underfitting at the distribution tails, particularly in wetter areas, despite overall spatial stability. The SVR model yielded a county-level mean soil moisture of 24.2063%, slightly lower than that of RFR, but its predictions ranged from −11.1754% to 77.2737%. These extreme values suggest that the SVR model substantially overestimates and underestimates soil moisture in some areas, leading to instability at the county scale.
To compare the performance differences between models in soil moisture inversion, the original SVR predictions without physical constraints were retained to reveal its deviation under unconstrained conditions. Although some predicted values exceeded the physically reasonable range of soil volumetric water content (0–60%), this anomalous deviation reflects the limited generalization ability of the traditional SVR model in extreme regions (Figure 5). In contrast, the Stacking model demonstrated a stronger ability to maintain reasonable output ranges, highlighting its advantage in boundary control and robustness. The Stacking model has a county-level mean soil moisture inversion of 24.9761%, between RFR and SVR means, with a maximum of 56.7838% and a minimum of 2.4275%, showing better stability than the SVR model while retaining certain nonlinear characteristics with reasonable inversion results. At the sampling-point scale, the predicted mean soil moisture values are 24.5872%, 24.104%, and 24.4795% for the RFR, SVR, and Stacking models, respectively, with close correspondence to the measured data mean of 23.5953%, indicating that all models can adequately capture soil moisture distribution characteristics at the local scale.
Overall, the findings indicate that (1) RFR’s performance is relatively stable at the county scale and suitable for large-scale soil moisture inversion, but it may underestimate soil moisture to some extent; (2) SVR captured the mean soil moisture at sampling points reasonably well, but showed extreme errors at the county scale due to extrapolation issues; and (3) the Stacking model performs well at both the sampling-point and county scales because it combines the advantages of the individual models. The inversion results of all three models also show frequency distributions that are consistent with the measured data, indicating that all three models yielded usable results in this study. Unlike previous studies, which have primarily focused on plot-scale analyses, this study applies the Stacking model to soil moisture inversion at the county scale, thereby addressing a long-standing limitation of earlier work.
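The county-level comparison above rests on the mean, the extremes, and the share of predictions that leave the physically reasonable range of 0–60%. A minimal sketch of such a summary follows; the function names and the four sample values are hypothetical (the two extremes are the SVR values reported above, the other two are made up), and the clipping helper is shown only for contrast, since the study deliberately retained the unconstrained SVR output.

```python
def summarize_predictions(values, lower=0.0, upper=60.0):
    """Mean, extremes and share of predictions outside [lower, upper] (in %)."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    outside = sum(1 for v in values if v < lower or v > upper)
    return {
        "mean": sum(values) / n,
        "min": min(values),
        "max": max(values),
        "share_outside": outside / n,
    }


def clip_to_physical_range(values, lower=0.0, upper=60.0):
    """Constrain predictions to the physical range (not applied in the study)."""
    return [min(max(v, lower), upper) for v in values]


# Hypothetical pixel values; only the two extremes come from the text.
svr_pixels = [-11.1754, 24.2, 77.2737, 30.0]
summary = summarize_predictions(svr_pixels)
clipped = clip_to_physical_range(svr_pixels)
```

Reporting `share_outside` alongside the mean makes it visible when a plausible county-level mean hides implausible individual pixels, which is the situation described for SVR.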

5. Discussion

5.1. Advantages of Feature Combinations

Single-polarization features exhibit certain limitations in characterizing complex surface properties, primarily manifested in their weak responsiveness to surface roughness and vegetation cover variations. Although VV and VH can capture partial backscattering information, they are constrained by insufficient sensitivity in areas with complex cover types and insignificant moisture variations. Polarization feature combinations can effectively overcome these limitations by integrating dual-polarization information to enhance the capture of multi-dimensional scattering variables.
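As an illustration of how the four dual-polarization combinations compared in this study (sum, product, difference, and ratio) can be screened against measured soil moisture, the sketch below builds each combination and ranks them by the absolute Pearson correlation. The function names are hypothetical, and the backscatter values are assumed to be supplied as plain per-sample lists in whatever unit the inputs already use.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def polarization_combinations(vv, vh):
    """Build the sum, product, difference and ratio of VV and VH per sample."""
    return {
        "VV+VH": [a + b for a, b in zip(vv, vh)],
        "VV*VH": [a * b for a, b in zip(vv, vh)],
        "VV-VH": [a - b for a, b in zip(vv, vh)],
        "VV/VH": [a / b for a, b in zip(vv, vh)],  # unstable when VH is near zero
    }


def rank_by_correlation(vv, vh, soil_moisture):
    """Rank the combinations by |r| with soil moisture, strongest first."""
    combos = polarization_combinations(vv, vh)
    scores = {name: pearson_r(vals, soil_moisture) for name, vals in combos.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

Ranking by the absolute value keeps strongly negative relationships in view; whether the sign matters for selection is a choice this sketch leaves open.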
Specifically, the VV + VH linear combination of dual-polarization signals improves the signal-to-noise ratio and highlights the target’s overall scattering characteristics, whereas VV × VH strengthens nonlinear feature responses and shows higher correlation. This combination approach can, to some extent, reduce the dependence of single-polarization features on specific scenarios and enhance sensitivity to soil moisture variations. The poor performance of VV − VH and VV/VH may be attributed to the high sensitivity of the difference and ratio forms to noise. The difference form responds weakly when surface contrast is small, while the ratio form tends to amplify noise when VH values are low, which degrades correlation. Therefore, VV + VH and VV × VH were ultimately selected as the polarization combination input features for the soil moisture inversion model; this selection offers stronger fitting capability, stable performance on the actual data, and a solid foundation for feature selection in subsequent model construction.
To investigate the importance of the six selected feature variables for farmland soil moisture inversion and to interpret each feature’s specific impact on predictions, a beeswarm plot of the test set was used for interpretability analysis (Figure 6). This plot displays each feature’s SHAP values for every sample. Each row represents a feature, the horizontal coordinate indicates the SHAP value, each dot represents a sample, and color indicates feature value magnitude, with red and blue denoting high and low values, respectively.
Using the zero SHAP value as the central dividing line, samples on the left side have negative effects on the estimated values, while those on the right have positive effects, with color indicating the magnitude of the corresponding feature value. Since the model predicts soil moisture content, a feature has a negative effect when its higher values correspond to lower predicted soil moisture, and a positive effect when its higher values correspond to higher predicted soil moisture. The plot shows that for NDWI, MSAVI, FVI, and m_c, most high values (red dots) fall within the negative SHAP value range, demonstrating that elevated values of these features reduce the model’s predicted soil moisture content. Conversely, their low values (blue dots) primarily appear in the positive SHAP value range, indicating that they increase the predicted soil moisture content. In contrast, high values (red dots) of H_c and cos θ cluster in the positive SHAP value region, while their low values (blue dots) are distributed in the negative SHAP value range, revealing that for these two features low values decrease and high values increase the model’s predicted soil moisture content.
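The visual reading of the plot described above (red dots on one side of zero, blue dots on the other) can be approximated numerically. The sketch below is a simplification, not the procedure used in this study: it assumes that a feature’s values and its per-sample SHAP values are already available as two lists, and it splits the samples at the median feature value as a stand-in for the red/blue color scale. The function name is hypothetical.

```python
from statistics import mean, median


def beeswarm_direction(feature_values, shap_values):
    """Compare mean SHAP values of high-valued and low-valued samples."""
    cut = median(feature_values)
    high = [s for f, s in zip(feature_values, shap_values) if f > cut]
    low = [s for f, s in zip(feature_values, shap_values) if f <= cut]
    if not high or not low:
        raise ValueError("feature has no spread around its median")
    mean_high, mean_low = mean(high), mean(low)
    if mean_high > mean_low:
        effect = "high values raise the prediction"
    elif mean_high < mean_low:
        effect = "high values lower the prediction"
    else:
        effect = "no clear direction"
    return {"mean_shap_high": mean_high, "mean_shap_low": mean_low, "effect": effect}
```

Under this reading, a feature behaving like NDWI in the plot would return "high values lower the prediction", while one behaving like cos θ would return "high values raise the prediction".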
Equally important is this study’s identification of optimal feature parameters: the combined polarization features VV + VH and VV × VH, which demonstrated higher correlation and stronger fitting capability in estimating soil moisture variations. Building upon this foundation, six additional features (NDWI, H_c, cos θ, MSAVI, FVI, m_c) selected through SHAP value analysis were incorporated to construct a “polarization combination + key variables” input system. This integrated framework not only encompasses radar signal scattering characteristics but also incorporates vegetation physiological status, surface structural attributes, and radar geometric features. The integration of this multisource heterogeneous information offers the following three key advantages:
(1)
Strong complementary information: The polarization combinations capture electromagnetic wave scattering intensity and mechanisms, while the SHAP-selected features supplement vegetation, roughness, and radar angle factors, achieving comprehensive coverage of both physical scattering and biogeographical influences.
(2)
Reduced redundancy and improved efficiency: SHAP-based ranking effectively eliminates low-contribution features, maintaining the feature dimension below 10, thereby controlling model complexity and mitigating overfitting risks.
(3)
Significant prediction improvement: The Stacking model, using VV + VH, VV × VH, and the top six SHAP features as inputs, achieved optimal performance on the test set (R2 = 0.7007, RMSE = 1.8837%, MAE = 3.5484%), substantially outperforming single-feature-group approaches.
In summary, the proposed feature combination strategy fully leverages the complementary advantages of polarization parameters and surface characteristics in soil moisture retrieval. It enhances both the physical rationality of the model and the accuracy/stability of predictions, providing an effective pathway for developing high-precision and interpretable soil moisture inversion models. The methodology demonstrates particular value in addressing the complex scattering–vegetation–moisture interactions characteristic of agricultural monitoring scenarios.
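The SHAP-based ranking step behind the six selected features can be illustrated as follows. How the SVR and RFR SHAP values were coupled is not restated in this section, so the averaging of normalized mean absolute SHAP values used here is an assumption, and all function names are hypothetical.

```python
def mean_abs_shap(shap_matrix):
    """Mean |SHAP| per feature; rows are samples, columns are features."""
    n_samples = len(shap_matrix)
    n_features = len(shap_matrix[0])
    return [sum(abs(row[j]) for row in shap_matrix) / n_samples
            for j in range(n_features)]


def normalize(importances):
    """Scale importances so that they sum to one."""
    total = sum(importances)
    return [value / total for value in importances]


def select_top_features(names, shap_svr, shap_rfr, k=6):
    """Average the normalized importances of both models and keep the top k.

    Assumption: equal weighting of the two models; the study may couple them differently.
    """
    imp_svr = normalize(mean_abs_shap(shap_svr))
    imp_rfr = normalize(mean_abs_shap(shap_rfr))
    combined = [(a + b) / 2.0 for a, b in zip(imp_svr, imp_rfr)]
    ranked = sorted(zip(names, combined), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Normalizing before averaging keeps one model from dominating the ranking simply because its SHAP values are on a larger scale.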

5.2. Model Error Analysis

To systematically understand the differences in prediction results among the models, we conducted an error analysis from the perspectives of model mechanisms and spatial-scale adaptability. The results show that MLR’s soil moisture estimates have a marginal bias of ~0.0250%, but their accuracy is low, with an R2 of only 0.5206. This limitation primarily stems from the model’s linear nature, which undermines its ability to capture the complex nonlinear relationships between soil moisture and remote sensing features. In remote sensing inversion, backscattering coefficients often respond nonlinearly to soil moisture because of complex vegetation–moisture–roughness interactions and other external factors, and simple linear models represent these responses inadequately. These limitations are aggravated by MLR’s susceptibility to multicollinearity, which compromises its robustness. Although MLR offers simplicity and interpretability, these inherent limitations undermine its performance in multisource remote sensing-based inversion tasks. Although SVR effectively captured mean soil moisture at the sampling-point scale, it exhibited severe value fluctuations at the county scale, with a maximum and minimum of 77.27% and −11.18%, respectively (Figure 5), both of which are physically implausible. This anomalous behavior originates from its lack of prediction boundary control and its sensitivity to outliers and boundary samples. Such runaway predictions tend to emerge when input variables fall near or beyond the edges of the training data’s distribution, which renders the model prone to overestimation and underestimation. 
These discrepancies are further aggravated by SVR’s global optimization procedure during training, which may lead to insufficient learning of local structures and unstable performance across complex terrains and highly variable land cover types. In contrast, RFR demonstrated good spatial stability but conservative predictions. Its minimum predicted value (16.8%) noticeably exceeded the measured minimum (14.6%), indicating that it failed to adequately capture extreme drought conditions (Figure 5). This conservative bias appears to stem from its “voting-averaging” approach. When high-moisture samples are sparse, for example, the model tends to produce outputs close to the median, overestimating soil moisture in dry areas and underestimating it in wet areas. This downside is further worsened by the independence of individual decision trees, which reduces global prediction accuracy by placing overemphasis on poorly representative sample features. These shortcomings are, however, partly mitigated by RFR’s robustness with high-dimensional data when sufficient samples are available, which makes it suitable for large-scale inversion of soil moisture across heterogeneous landscapes. Although the performance of the individual models was variable, the Stacking model produced predictions ranging from 2.4% to 56.8%, with the former closely approximating extreme drought measurements in arid regions [42] and the latter matching the transient high moisture of the pre-winter wheat irrigation period (Figure 7), consistent with this area’s October farming practices. This finding is important because it demonstrates that SVR’s extrapolation anomalies and RFR’s insensitivity to extremes can be addressed through the systematic integration of base learners.
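The contrast between unbounded extrapolation and averaging-based conservatism can be reproduced with two deliberately simple stand-ins. Neither is the SVR or RFR model used in this study: a least-squares line stands in for a regressor without boundary control, and a bin-averaging predictor stands in for the averaging of training targets performed by tree ensembles. The data and function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares line; its predictions are unbounded outside the training range."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return lambda q: intercept + slope * q


def fit_bin_average(x, y, n_bins=4):
    """Piecewise-constant predictor returning the mean target of the nearest bin."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins

    def bin_of(q):
        # Queries outside the training range fall back to the edge bins.
        return min(max(int((q - lo) / width), 0), n_bins - 1)

    sums, counts = [0.0] * n_bins, [0] * n_bins
    for a, b in zip(x, y):
        sums[bin_of(a)] += b
        counts[bin_of(a)] += 1
    overall = sum(y) / len(y)
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda q: means[bin_of(q)]


# Hypothetical training data: one input feature, soil moisture in percent.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_train = [15.0, 18.0, 21.0, 24.0, 27.0, 30.0, 33.0, 36.0]
line = fit_line(x_train, y_train)
binned = fit_bin_average(x_train, y_train)
far_inputs = [-8.0, 20.0]  # well outside the training range
line_out = [line(q) for q in far_inputs]
binned_out = [binned(q) for q in far_inputs]
```

For inputs far outside the training range, the line returns a negative value and a value well above the training maximum, while the averaging predictor never leaves the range of the training targets and does not even reach their extremes. This mirrors the qualitative behavior reported for SVR and RFR, without claiming to reproduce either model.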
Regarding spatial-scale adaptability, marked differences emerge between models. SVR performs well at the sampling-point scale for localized retrieval but shows unstable extreme fluctuations at larger scales. The occurrence of negative and excessively large values in the SVR results suggests that, without objectively informed parameterization of input factors, the model is prone to generating physically unrealistic inversion outputs. This observation motivated our use of the Stacking model, whose application under the ensemble learning framework demonstrated substantial robustness and physical plausibility. The findings also show that although RFR is serviceable for county-scale applications, Stacking is preferable because it exploits the advantages of MLR, SVR, and RFR to produce accurate results at both point and regional scales by combining nonlinear fitting with spatial generalization capabilities. Although the spatial distribution maps derived from the three models captured general moisture trends (Figure 5), SVR exhibited unstable edge distributions with extreme values, while RFR generated smoother spatial patterns that failed to detect extremely dry or wet conditions. In contrast, the Stacking model’s output closely captured natural transitions, remained within physically plausible ranges, and showed coherent spatial continuity. This superior spatial consistency is further supported by comparisons with existing studies. Our findings indicate that the Stacking model outperforms the single-model approaches commonly used in previous studies in terms of both soil moisture prediction accuracy and robustness [26,29]. Despite being trained on a limited sample size (60 samples, R2 = 0.70), our model achieved a prediction accuracy comparable to that of Wang et al. [30], who used a much larger dataset (approximately 700 samples, R2 = 0.72), highlighting the effectiveness of multi-model fusion. Moreover, the RMSE of our Stacking model (1.8837%) is substantially lower than that reported by Li et al. [26] (6%), indicating stronger generalizability and model stability. These improvements are primarily attributed to the complementary strengths of the base learners and the incorporation of interpretable feature selection, which together help mitigate issues such as overfitting and physically unrealistic predictions observed in earlier studies. Taken together, these observations suggest that the Stacking model is the best suited of the models tested for producing reliable estimates with enhanced physical consistency and spatial interpretability across various land cover types, which is why we recommend it for similar studies elsewhere.
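For readers who wish to experiment with this kind of multi-model fusion, the sketch below assembles MLR, RFR, and SVR base learners with scikit-learn [39]. It is not the configuration used in this study: the linear meta-learner, the hyperparameters, the standardization step for SVR, and the synthetic data (60 samples, 8 features) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def build_stacking_model(random_state=0):
    """Stack MLR, RFR and SVR; the linear meta-learner is an assumption."""
    base_learners = [
        ("mlr", LinearRegression()),
        ("rfr", RandomForestRegressor(n_estimators=100, random_state=random_state)),
        # SVR is scale-sensitive, so its inputs are standardized first.
        ("svr", make_pipeline(StandardScaler(), SVR(kernel="rbf"))),
    ]
    # cv=5: the meta-learner is trained on out-of-fold base-learner predictions.
    return StackingRegressor(estimators=base_learners,
                             final_estimator=LinearRegression(), cv=5)


# Synthetic stand-in data; not the field measurements of this study.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = 24.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=60)

model = build_stacking_model()
model.fit(X[:45], y[:45])
predictions = model.predict(X[45:])
```

Training the meta-learner on out-of-fold predictions is what keeps it from simply rewarding whichever base learner overfits the training set most.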
However, it is important to note that these results are based on a limited training dataset, which does not capture the full variability of soil moisture conditions across the county. The observed extreme outputs from SVR reflect extrapolation failures due to insufficient representation of input combinations, while RFR’s relatively narrow prediction range stems from its tendency to regress toward the mean, potentially overlooking more extreme conditions. Although the Stacking model demonstrated improved predictive stability and accuracy within this dataset, its ability to generalize beyond the observed data distribution remains uncertain. Therefore, conclusions about its broader applicability should be made with caution. Further validation using larger and more spatially diverse datasets is necessary to confirm the model’s generalizability and robustness.
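One inexpensive way to probe the sensitivity of a small dataset to the particular train/test partition is to repeat the random split and inspect the spread of test-set R2 values. The sketch below is generic: `fit_predict` is a hypothetical callable that trains any model on the training part and returns predictions for the test part, and the mean-value baseline is included only as a reference point.

```python
import random


def r2_score(observed, predicted):
    """Coefficient of determination."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def repeated_split_r2(features, targets, fit_predict,
                      n_repeats=20, test_fraction=0.25, seed=0):
    """Test-set R2 for several random train/test splits of the same data."""
    rng = random.Random(seed)
    n = len(targets)
    n_test = max(2, int(round(n * test_fraction)))
    scores = []
    for _ in range(n_repeats):
        order = list(range(n))
        rng.shuffle(order)
        test_idx, train_idx = order[:n_test], order[n_test:]
        predictions = fit_predict([features[i] for i in train_idx],
                                  [targets[i] for i in train_idx],
                                  [features[i] for i in test_idx])
        scores.append(r2_score([targets[i] for i in test_idx], predictions))
    return scores


def mean_baseline(train_x, train_y, test_x):
    """Reference model: always predict the training mean."""
    average = sum(train_y) / len(train_y)
    return [average for _ in test_x]
```

A wide spread of scores across splits would indicate that a single reported R2 depends strongly on which samples happened to land in the test set.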

5.3. Limitations and Further Investigation

Although we recommend the Stacking model over the stand-alone RFR and SVR models because of its superior soil moisture retrieval capabilities, several limitations need to be addressed. Limited access to equipment and time constraints restricted the size of our sample. The spatial distribution of sampling points was also uneven, mainly because some of the target areas were inaccessible. These drawbacks may have adversely affected model training and the accuracy of our results. Image-based soil moisture estimation was further affected by the heterogeneous surface cover types at the county level, which undermined the ability of our models to fully capture the complex spatial variations in soil moisture. This explains why certain samples exhibited unrealistic extremes. In addition, although the Stacking model performed well, its strong dependence on parameter selection and optimization strategies may compromise its usability. Future research can address these limitations by (1) exploring sampling strategies that improve spatial representativeness; (2) integrating readily accessible multisource and multi-sensor data; and (3) developing more adaptable and efficient model fusion and feature optimization techniques for timely and cost-effective monitoring of soil moisture at multiple temporal and spatial scales. We hope that future work incorporating larger and more diverse datasets will further validate and enhance the applicability of our approach.

6. Conclusions

This study addresses critical challenges in soil moisture monitoring in arid farmland by developing a Stacking model that integrates SVR and RFR to enhance soil moisture retrieval at the county scale in Shihezi, a county-level city in Xinjiang, China. Using selected remote sensing datasets to retrieve soil moisture in Shihezi’s agricultural region, the following key conclusions were drawn: (1) The combined polarization features (VV + VH and VV × VH) exhibited higher correlations with soil moisture than single-polarization features (VV, VH) and other combination methods. (2) The Stacking model, which combines SVR’s nonlinear modeling capabilities with RFR’s high-dimensional feature processing, (a) improved prediction accuracy, (b) enhanced adaptability to complex surface conditions, and (c) showed that multi-model fusion can provide timely, usable farm-level soil moisture estimates. Comparative analysis with single models (MLR, SVR, and RFR) confirmed the Stacking model’s superior stability and accuracy, establishing model fusion as an effective approach for complex soil moisture inversion. The integrated model presented in this paper improves the accuracy of soil moisture estimation and is robust enough to be applied under different settings. We invite other researchers to build on this work by addressing the gaps identified in our study.

Author Contributions

Conceptualization, P.G.; methodology, H.Z.; software, H.Z.; validation, H.Z.; formal analysis, H.Z.; investigation, J.H., Z.W. and H.Z.; resources, P.G.; data curation, J.H. and J.L.; writing—original draft preparation, H.Z.; writing—review and editing, P.G. and H.Z.; visualization, Z.W. and J.L.; supervision, P.G.; project administration, P.G.; funding acquisition, P.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Tianshan Talent Team Program of Xinjiang Uygur Autonomous Region under Grant 2025TSYCTD003 and the National Natural Science Foundation of China under Grant U2003109.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Tong, C.F.; Hou, H.F.; Zheng, H.X.; Wang, Y.; Zhao, Y. Investigation of sunflower soil moisture and development in the Kubugi Desert using extensive time series remote sensing data. China Rural Water Hydropower 2025, 6, 157–165.
  2. Tian, Y.; Wu, K.; Li, H.; Shen, Y.; Qiu, H.; Zhai, J.; Zhang, J. Method for evaluating habitat suitability of rice sheath blight at a regional scale based on multi-source remote sensing information. Natl. Remote Sens. Bull. 2024, 28, 2469–2484.
  3. Zhu, Y.Q.; Wu, S.R.; Wang, D. Soil moisture retrieval by radar remote sensing. China Agric. Inform. 2024, 36, 45–62.
  4. Zhang, R.; Bao, X.; Hong, R.; He, X.; Yin, G.; Chen, J.; Ouyang, X.; Wang, Y.; Liu, G. Soil moisture retrieval over croplands using novel dual-polarization SAR vegetation index. Agric. Water Manag. 2024, 306, 109159.
  5. Fan, D.; Zhao, T.; Jiang, X.; García-García, A.; Schmidt, T.; Samaniego, L.; Attinger, S.; Wu, H.; Jiang, Y.; Shi, J.; et al. A Sentinel-1 SAR-based global 1-km resolution soil moisture data product: Algorithm and preliminary assessment. Remote Sens. Environ. 2025, 318, 114579.
  6. Da, Q.; Yan, J.; Li, G.; Guo, Z.; Li, H.; Wang, W.; Li, J.; Ma, W.; Li, X.; Cheng, K. Inversion of Soil Moisture Content in Silage Corn Root Zones Based on UAV Remote Sensing. Agriculture 2025, 15, 331.
  7. Arias, M.; Notarnicola, C.; Campo-Bescós, M.Á.; Arregui, L.M.; Álvarez-Mozos, J. Evaluation of soil moisture estimation techniques based on Sentinel-1 observations over wheat fields. Agric. Water Manag. 2023, 287, 108422.
  8. Zhou, Y.N.; Wang, B.Y.; Zhu, W.W.; Feng, L.; He, Q.S.; Zhang, X.; Wu, T.J.; Yan, N.N. Spatial-temporal constraints for surface soil moisture mapping using Sentinel-1 and Sentinel-2 data over agricultural regions. Comput. Electron. Agric. 2024, 219, 108835.
  9. Paloscia, S.; Pettinato, S.; Santi, E.; Notarnicola, C.; Pasolli, L.; Reppucci, A. Soil moisture mapping using Sentinel-1 images: Algorithm and preliminary validation. Remote Sens. Environ. 2013, 134, 234–248.
  10. Kolarik, N.E.; Roopsind, A.; Pickens, A.; Brandt, J.S. A satellite-based monitoring system for quantifying surface water and mesic vegetation dynamics in a semi-arid region. Ecol. Indic. 2023, 147, 109965.
  11. Garg, S.; Dasgupta, A.; Motagh, M.; Martinis, S.; Selvakumaran, S. Unlocking the full potential of Sentinel-1 for flood detection in arid regions. Remote Sens. Environ. 2024, 315, 114417.
  12. Ma, Z.; Liu, C.; Xue, H.; Li, J.; Fang, X.; Zhou, J. Identification of winter wheat by integrating active and passive remote sensing data based on Google Earth Engine platform. Trans. Chin. Soc. Agric. Mach. 2021, 52, 195–205.
  13. Yang, L.; Hou, C.; Su, Z.; Bai, Y.; Wang, T.; Feng, R. Soil moisture inversion in arid areas by using machine learning and fully polarimetric SAR imagery. Trans. Chin. Soc. Agric. Eng. 2021, 37, 74–82.
  14. Muhuri, A.; Goïta, K.; Magagi, R.; Wang, H. Soil moisture retrieval during crop growth cycle using satellite SAR time series. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 9302–9319.
  15. Guan, H.X.; Huang, J.X.; Li, L.; Li, X.C.; Miao, S.X.; Su, W.; Ma, Y.Y.; Niu, Q.D.; Huang, H. Improved Gaussian mixture model to map the flooded crops of VV and VH polarization data. Remote Sens. Environ. 2023, 295, 113714.
  16. Wang, J.; Ding, J.; Ge, X.; Peng, J.; Hu, Z. Monitoring soil salinization on the basis of remote sensing and proximal soil sensing: Progress and perspective. Natl. Remote Sens. Bull. 2024, 28, 2187–2208.
  17. Lv, C.; Xie, Q.; Peng, X.; Dou, Q.; Wang, J.; Lopez-Sanchez, J.M.; Shang, J.; Chen, L.; Fu, H.; Zhu, J.; et al. Soil moisture retrieval over agricultural fields with machine learning: A comparison of quad-, compact-, and dual-polarimetric time-series SAR data. J. Hydrol. 2024, 644, 132093.
  18. Bousbih, S.; Zribi, M.; El Hajj, M.; Baghdadi, N.; Lili-Chabaane, Z.; Gao, Q.; Fanise, P. Soil Moisture and Irrigation Mapping in A Semi-Arid Region, Based on the Synergetic Use of Sentinel-1 and Sentinel-2 Data. Remote Sens. 2018, 10, 1953.
  19. Zhu, L.J.; Yuan, S.S.; Liu, Y.; Chen, C.; Walker, J.P. Time series soil moisture retrieval from SAR data: Multi-temporal constraints and a global validation. Remote Sens. Environ. 2023, 287, 113466.
  20. He, L.; Qin, Q.; Ren, H.; Du, J.; Meng, J.; Du, C. Soil moisture retrieval using multi-temporal Sentinel-1 SAR data in agricultural areas. Trans. Chin. Soc. Agric. Eng. 2016, 32, 142–148.
  21. Holtgrave, A.-K.; Förster, M.; Greifeneder, F.; Notarnicola, C.; Kleinschmit, B. Estimation of soil moisture in vegetation-covered floodplains with Sentinel-1 SAR data using support vector regression. PFG J. Photogramm. Remote Sens. Geoinf. Sci. 2018, 86, 85–101.
  22. Wang, J.; Ding, J.; Chen, W.; Yang, A. Microwave modeling of soil moisture in oasis regional scale based on Sentinel-1 radar images. Infrared Millim. Waves 2017, 36, 120–126.
  23. Yao, Y.; Yan, J.; Li, G.; Ma, W.; Yao, X.; Song, M.; Li, Q.; Li, J. A GNSS-IR Soil Moisture Inversion Method Considering Multi-Factor Influences Under Different Vegetation Covers. Agriculture 2025, 15, 837.
  24. Wang, S.N.; Li, R.P.; Wu, Y.J.; Wang, W.J. Estimation of surface soil moisture by combining a structural equation model and an artificial neural network (SEM-ANN). Sci. Total Environ. 2023, 876, 162558.
  25. Abdikan, S.; Sekertekin, A.; Madenoglu, S.; Ozcan, H.; Peker, M.; Pinar, M.O.; Koc, A.; Akgul, S.; Secmen, H.; Kececi, M.; et al. Surface soil moisture estimation from multi-frequency SAR images using ANN and experimental data on a semi-arid environment region in Konya, Turkey. Soil Tillage Res. 2023, 228, 105646.
  26. Li, P.X.; Liu, Z.Q.; Yang, J.; Sun, W.; Li, M.; Ren, Y. Soil moisture retrieval of winter wheat fields based on random forest regression using quad-polarimetric SAR images. Geomat. Inf. Sci. Wuhan Univ. 2019, 44, 405–412.
  27. Bourgeau-Chavez, L.L.; Leblon, B.; Charbonneau, F.; Buckley, J.R. Evaluation of polarimetric Radarsat-2 SAR data for development of soil moisture retrieval algorithms over a chronosequence of black spruce boreal forests. Remote Sens. Environ. 2013, 132, 71–85.
  28. Wang, Y.; Yang, L.; Ren, J.; Zhang, J.; Kong, J.; Hou, C. An oasis soil moisture inversion model using ALOS-2 and Landsat 8 data. Geomat. Inf. Sci. Wuhan Univ. 2024, 49, 1630–1638.
  29. Zhao, J.; Zhang, C.; Min, L.; Guo, Z.; Li, N. Retrieval of Farmland Surface Soil Moisture Based on Feature Optimization and Machine Learning. Remote Sens. 2022, 14, 5102.
  30. Wang, R.; Zhao, J.H.; Yang, H.J.; Li, N. Inversion of soil moisture in wheat farmlands using the RIME-CNN-SVR model. Trans. Chin. Soc. Agric. Eng. 2024, 40, 94–102.
  31. Tao, S.Y.; Zhang, X.; Feng, R.; Qi, W.C.; Wang, Y.B.; Shrestha, B. Retrieving soil moisture from grape growing areas using multi-feature and stacking-based ensemble learning modeling. Comput. Electron. Agric. 2023, 204, 107537.
  32. Ge, J.; Lei, G.; Chen, H.; Zhang, B.; Chen, L.; Bai, M.; Su, N.; Yu, Z. Irrigation district channel dispatch flow prediction based on SHAP importance ranking and machine learning algorithm. Trans. Chin. Soc. Agric. Eng. 2023, 39, 113–122.
  33. Guan, Y.T.; Li, J.P. Soil moisture inversion based on genetic optimization neural network and multi-source remote sensing data. J. Water Resour. Water Eng. 2019, 30, 252–256.
  34. Lin, L.B. Soil Moisture Retrieval under Vegetation Cover Using Multi-Source Remote Sensing Data. Master’s Thesis, Nanjing University of Information Science and Technology, Nanjing, China, 2018.
  35. Zhao, J.; Zhang, B.; Li, N.; Guo, Z. Cooperative inversion of winter wheat covered surface soil moisture based on Sentinel-1/2 remote sensing data. J. Electron. Inf. Technol. 2021, 43, 692–699.
  36. Bhogapurapu, N.R.; Dey, S.; Bhattacharya, A.; Mandal, D.; Lopez-Sanchez, J.M.; McNairn, H.; López-Martínez, C.; Rao, Y. Dual-polarimetric descriptors from Sentinel-1 GRD SAR data for crop growth assessment. ISPRS J. Photogramm. Remote Sens. 2021, 178, 20–35.
  37. Aumann, R.J.; Maschler, M.B.; Stearns, R.E. Repeated Games with Incomplete Information; MIT Press: Cambridge, MA, USA, 1995.
  38. Zhou, Y.N.; Chen, H.; Liu, H.B. Land cover classification in hilly and mountainous areas using multi-source data and Stacking-SHAP technique. Trans. Chin. Soc. Agric. Eng. 2022, 38, 213–222.
  39. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  40. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4765–4774.
  41. Aliferis, C.; Simon, G. Overfitting, underfitting and general model overconfidence and under-performance pitfalls and best practices in machine learning and AI. In Artificial Intelligence and Machine Learning in Health Care and Medical Sciences; Simon, G.J., Aliferis, C., Eds.; Springer: Cham, Switzerland, 2024.
  42. Xu, C.Y. Dynamic Research on Shallow Groundwater and Soil Moisture of Shihezi Agricultural Areas. Ph.D. Thesis, Changan University, Xi’an, China, 2013.
Figure 1. Location of the study area and spatial distributions of sampling points: (a) Xinjiang, China; (b) distribution of points in the study area (base map: Sentinel-2 imagery).
Figure 2. Technical workflow diagram.
Figure 3. Architectural configuration of the Stacking model.
Figure 4. Soil moisture inversion results and measurement data: (a) MLR; (b) RFR; (c) SVR; (d) Stacking model.
Figure 5. Inversion results of soil moisture in farmland at the county level: (a) RFR: The predicted values do not reach the observed minimum and maximum, indicating a narrower range compared to field observations. (b) SVR: The inversion results exceeding the physically reasonable range reflect the model’s predictive limitations under unconstrained conditions. (c) Output of the Stacking model.
Figure 6. Beeswarm distribution of SHAP values for the optimal feature combination on the test set.
Figure 7. Schematic diagram of extreme values in the Stacking model: (a) county-level retrieval results from the Stacking model; (b) hotspots of extreme high values; (c) coldspots of anomalous minima.
Table 1. Expressions of vegetation indices.
| Vegetation Index | Expression |
| --- | --- |
| NDVI | (NIR − R)/(NIR + R) |
| NDWI | (NIR − SWIR)/(NIR + SWIR) |
| FVI | (2NIR − R − SWIR)/(2NIR + R + SWIR) |
| MSAVI | $\left(2\,\mathrm{NIR} + 1 - \sqrt{(2\,\mathrm{NIR} + 1)^{2} - 8(\mathrm{NIR} - \mathrm{R})}\right)/2$ |
Note: NIR, R, and SWIR represent the band values in Sentinel-2 data that correspond to the 842 nm, 665 nm, and 1610 nm central wavelengths, respectively.
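The expressions in Table 1 can be sketched per pixel as follows. This is a minimal illustration that assumes the band values are reflectances between 0 and 1; the function name `vegetation_indices` and the sample reflectances are hypothetical and do not come from the study.

```python
from math import sqrt


def vegetation_indices(nir: float, red: float, swir: float) -> dict:
    """Compute the four Table 1 indices from Sentinel-2 band values.

    nir, red, swir: band values at 842 nm, 665 nm, and 1610 nm.
    """
    ndvi = (nir - red) / (nir + red)
    ndwi = (nir - swir) / (nir + swir)
    fvi = (2 * nir - red - swir) / (2 * nir + red + swir)
    # The radicand equals (2*nir - 1)**2 + 8*red, so it is non-negative
    # for non-negative reflectances.
    msavi = (2 * nir + 1 - sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2
    return {"NDVI": ndvi, "NDWI": ndwi, "FVI": fvi, "MSAVI": msavi}


# Hypothetical reflectances for a vegetated pixel (illustrative only).
indices = vegetation_indices(nir=0.40, red=0.08, swir=0.20)
for name, value in indices.items():
    print(f"{name}: {value:.4f}")
```

In practice the same arithmetic would be applied band-wise to whole rasters; the scalar form is used here only to keep the sketch dependency-free.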
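Table 2. Characteristic parameters.
| Symbols | Characteristic Parameters | Symbols | Characteristic Parameters |
| --- | --- | --- | --- |
| $x_{1}$ | $\sigma^{o}_{VV}$ | $x_{12}$ | $A$ |
| $x_{2}$ | $\sigma^{o}_{VH}$ | $x_{13}$ | $\alpha$ |
| $x_{3}$ | VV + VH | $x_{14}$ | $m_{c}$ |
| $x_{4}$ | VV − VH | $x_{15}$ | $\theta_{c}$ |
| $x_{5}$ | VV × VH | $x_{16}$ | $H_{c}$ |
| $x_{6}$ | VV/VH | $x_{17}$ | NDVI |
| $x_{7}$ | $\theta$ | $x_{18}$ | NDWI |
| $x_{8}$ | $\sin\theta$ | $x_{19}$ | FVI |
| $x_{9}$ | $\cos\theta$ | $x_{20}$ | MSAVI |
| $x_{10}$ | $\tan\theta$ | $x_{21}$ | $Z_{s}$ |
| $x_{11}$ | $H$ | | |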
Table 3. Correlation of polarization characteristic parameters and linear fitting accuracy.
| Polarization Feature | Pearson Correlation Coefficient | Coefficient of Determination (R²) |
| --- | --- | --- |
| VV | −0.66 ** | 0.44 |
| VH | −0.62 ** | 0.38 |
| VV + VH | −0.71 ** | 0.51 |
| VV − VH | 0.04 | 0.00 |
| VV × VH | 0.73 ** | 0.53 |
| VV/VH | 0.28 * | 0.08 |
Note: * significant at the 0.05 level; ** significant at the 0.01 level.
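The screening summarized in Table 3 can be sketched as follows: build the polarization combinations from VV and VH, then compute the Pearson correlation and the R² of a single-predictor linear fit against measured soil moisture. This is a minimal sketch, not the authors' code; the helper names `pearson_r` and `linear_fit_r2` and all sample values are hypothetical, so the printed numbers are not meant to reproduce Table 3.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def linear_fit_r2(x, y):
    """R2 of the ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical backscatter (dB) and soil moisture (%) samples, illustrative only.
vv = [-8.1, -9.4, -10.2, -11.8, -12.5, -13.9]
vh = [-15.3, -16.8, -16.1, -18.9, -19.4, -20.2]
soil_moisture = [21.0, 18.5, 19.2, 14.8, 15.5, 12.1]

features = {
    "VV": vv,
    "VH": vh,
    "VV + VH": [a + b for a, b in zip(vv, vh)],
    "VV - VH": [a - b for a, b in zip(vv, vh)],
    "VV x VH": [a * b for a, b in zip(vv, vh)],
    "VV/VH": [a / b for a, b in zip(vv, vh)],
}

results = {
    name: (pearson_r(values, soil_moisture), linear_fit_r2(values, soil_moisture))
    for name, values in features.items()
}
for name, (r, r2) in results.items():
    print(f"{name:8s} r = {r:+.2f}  R2 = {r2:.2f}")
```

For a linear fit with one predictor, R² equals the square of the Pearson coefficient, which agrees with the rounded values in Table 3 (for example, (−0.66)² ≈ 0.44 and 0.73² ≈ 0.53).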
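Table 4. Feature importance and SHAP values.
| Symbols | Feature | SHAP Value |
| --- | --- | --- |
| $x_{18}$ | NDWI | 0.0915 |
| $x_{16}$ | $H_{c}$ | 0.0176 |
| $x_{9}$ | $\cos\theta$ | 0.0139 |
| $x_{20}$ | MSAVI | 0.0106 |
| $x_{19}$ | FVI | 0.0105 |
| $x_{14}$ | $m_{c}$ | 0.0105 |
| $x_{8}$ | $\sin\theta$ | 0.0103 |
| $x_{7}$ | $\theta$ | 0.0100 |
| $x_{10}$ | $\tan\theta$ | 0.0100 |
| $x_{15}$ | $\theta_{c}$ | 0.0090 |
| $x_{11}$ | $H$ | 0.0081 |
| $x_{13}$ | $\alpha$ | 0.0074 |
| $x_{12}$ | $A$ | 0.0066 |
| $x_{17}$ | NDVI | 0.0055 |
| $x_{21}$ | $Z_{s}$ | 0.0048 |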
Table 6. Comparison of model evaluation indices.
| Model | R² | RMSE | MAE | Bias |
| --- | --- | --- | --- | --- |
| MLR | 0.5206 | 2.3839 | 1.7849 | 0.0250 |
| RFR | 0.6054 | 2.1629 | 4.6782 | 0.7234 |
| SVR | 0.6197 | 2.1235 | 4.5091 | 0.3029 |
| Stacking | 0.7007 | 1.8837 | 3.5484 | 0.5466 |
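The four indices in Table 6 can be sketched with their textbook definitions. This is an illustration under stated assumptions: Bias is taken here as the mean of (predicted − observed), which the table itself does not define, and the function name `evaluate` and the sample values are hypothetical.

```python
from math import sqrt


def evaluate(observed, predicted):
    """Return R2, RMSE, MAE, and Bias for paired observations and predictions."""
    n = len(observed)
    # Assumed sign convention: residual = predicted - observed.
    residuals = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in residuals) / n,
        "Bias": sum(residuals) / n,
    }


# Hypothetical soil moisture values (%), illustrative only.
observed = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 12.0, 13.0, 17.0, 18.0]
metrics = evaluate(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions MAE can never exceed RMSE for the same residuals. Table 6 reports MAE above RMSE for RFR, SVR, and Stacking, so the study's MAE may follow a different definition or scale than the one assumed in this sketch.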
Table 5. Evaluation metrics for 10 combined test sets based on SHAP and Stacking models.
| Group | R² | RMSE | MAE |
| --- | --- | --- | --- |
| $x_{3} + x_{5} + x_{18}$ | 0.3495 | 2.7770 | 7.7119 |
| $x_{3} + x_{5} + x_{18} + x_{16}$ | 0.4885 | 2.4624 | 6.0635 |
| $x_{3} + x_{5} + x_{18} + x_{16} + x_{9}$ | 0.4887 | 2.4620 | 6.0614 |
| $x_{3} + x_{5} + x_{18} + x_{16} + x_{9} + x_{20}$ | 0.5206 | 2.3841 | 5.6838 |
| $x_{3} + x_{5} + x_{18} + x_{16} + x_{9} + x_{20} + x_{19}$ | 0.6544 | 2.0242 | 4.0974 |
| $x_{3} + x_{5} + x_{18} + x_{16} + x_{9} + x_{20} + x_{19} + x_{14}$ | 0.7007 | 1.8837 | 3.5484 |
| $x_{3} + x_{5} + x_{18} + x_{16} + x_{9} + x_{20} + x_{19} + x_{14} + x_{8}$ | 0.5762 | 2.2414 | 5.0241 |
| $x_{3} + x_{5} + x_{18} + x_{16} + x_{9} + x_{20} + x_{19} + x_{14} + x_{8} + x_{7}$ | 0.4198 | 2.6226 | 6.8783 |
| $x_{3} + x_{5} + x_{18} + x_{16} + x_{9} + x_{20} + x_{19} + x_{14} + x_{8} + x_{7} + x_{10}$ | 0.4932 | 2.4511 | 6.0077 |
| $x_{3} + x_{5} + x_{18} + x_{16} + x_{9} + x_{20} + x_{19} + x_{14} + x_{8} + x_{7} + x_{10} + x_{15}$ | 0.4568 | 2.5378 | 6.4403 |
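The grouping logic behind Table 5 can be sketched as an incremental search: start from the two polarization features, add the SHAP-ranked features one at a time in Table 4 order, and keep the group with the highest test-set R². The feature order and R² values below are copied from Tables 4 and 5; the function names `build_groups` and `select_best` are hypothetical, and the sketch only replays the reported scores rather than retraining any model.

```python
# Fixed polarization features (VV + VH and VV x VH) used in every group.
BASE = ["x3", "x5"]
# Candidate features in descending SHAP order (first ten rows of Table 4).
RANKED = ["x18", "x16", "x9", "x20", "x19", "x14", "x8", "x7", "x10", "x15"]
# Test-set R2 of the Stacking model for each group, copied from Table 5.
R2_BY_GROUP = [0.3495, 0.4885, 0.4887, 0.5206, 0.6544,
               0.7007, 0.5762, 0.4198, 0.4932, 0.4568]


def build_groups(base, ranked):
    """Group k holds the base features plus the top-k ranked features."""
    return [base + ranked[:k] for k in range(1, len(ranked) + 1)]


def select_best(groups, scores):
    """Return the group with the highest score and that score."""
    best = max(range(len(groups)), key=lambda i: scores[i])
    return groups[best], scores[best]


groups = build_groups(BASE, RANKED)
best_group, best_r2 = select_best(groups, R2_BY_GROUP)
print(best_group, best_r2)
```

With the reported scores, the selected group is the two polarization features plus the six top-ranked SHAP features, matching the row with R² = 0.7007 in Table 5.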
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Zhan, H.; Guo, P.; Hao, J.; Li, J.; Wang, Z. Inversion of County-Level Farmland Soil Moisture Based on SHAP and Stacking Models. Agriculture 2025, 15, 1506. https://doi.org/10.3390/agriculture15141506
