An Interpretable Machine Learning Framework for Unraveling the Dynamics of Surface Soil Moisture Drivers

Nikraftar, Zahir; Parizi, Esmaeel; Saber, Mohsen; Boueshagh, Mahboubeh; Tavakoli, Mortaza; Esmaeili Mahmoudabadi, Abazar; Ekradi, Mohammad Hassan; Mbuvha, Rendani; Hosseini, Seiyed Mossa

doi:10.3390/rs17142505


by Zahir Nikraftar ¹,*, Esmaeel Parizi ², Mohsen Saber ³, Mahboubeh Boueshagh ⁴, Mortaza Tavakoli ⁵, Abazar Esmaeili Mahmoudabadi ², Mohammad Hassan Ekradi ⁶, Rendani Mbuvha ⁷ and Seiyed Mossa Hosseini ²

¹ Machine Intelligence and Decision Systems (MInDS) Research Group, School of Electronic Engineering and Computer Science, Queen Mary University of London (QMUL), London E1 4NS, UK
² Physical Geography Department, University of Tehran, Tehran P.O. Box 14155-6465, Iran
³ School of Surveying and Geospatial Engineering, University of Tehran, Tehran P.O. Box 14174-66191, Iran
⁴ Department of Earth and Environmental Sciences, Lehigh University, Bethlehem, PA 18015, USA
⁵ Department of Geography and Planning, Tarbiat Modares University, Tehran P.O. Box 14115-111, Iran
⁶ Iran Meteorological Organization, Tehran P.O. Box 13185-461, Iran
⁷ Statistics and Probability Group, Department of Mathematics, University of Manchester, Manchester M13 9PL, UK
* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(14), 2505; https://doi.org/10.3390/rs17142505

Submission received: 13 May 2025 / Revised: 9 July 2025 / Accepted: 16 July 2025 / Published: 18 July 2025

(This article belongs to the Special Issue Earth Observation Satellites for Soil Moisture Monitoring)


Abstract

Understanding the impacts of the spatial non-stationarity of environmental factors on surface soil moisture (SSM) in different seasons is crucial for effective environmental management. Yet, our knowledge of this phenomenon remains limited. This study introduces an interpretable machine learning framework that combines the SHapley Additive exPlanations (SHAP) method with two-step clustering to unravel the spatial drivers of SSM across Iran. Due to the limited availability of in situ SSM data, the performance of three global SSM datasets—SMAP, MERRA-2, and CFSv2—from 2015 to 2023 was evaluated using agrometeorological stations. SMAP outperformed the others, showing the highest median correlation and the lowest Root Mean Square Error (RMSE). Using SMAP, we estimated SSM across 609 catchments employing the Random Forest (RF) algorithm. The RF model yielded R² values of 0.89, 0.83, 0.70, and 0.75 for winter, spring, summer, and autumn, respectively, with corresponding RMSE values of 0.076, 0.081, 0.098, and 0.061 m³/m³. SHAP analysis revealed that climatic factors primarily drive SSM in winter and autumn, while vegetation and soil characteristics are more influential in spring and summer. The clustering results showed that Iran’s catchments can be grouped into five categories based on the SHAP method coefficients, highlighting regional differences in SSM controls.

Keywords: surface soil moisture dynamics; machine learning; random forest; SHapley additive exPlanations; environmental factors


1. Introduction

Surface soil moisture (SSM) influences climate processes by governing the distribution of precipitation into runoff, evapotranspiration, and infiltration, and by affecting the partitioning of incoming energy into latent and sensible heat fluxes [1,2,3]. SSM also plays a crucial role in the global hydrological cycle, as well as in understanding water resource management, flood generation, and climate change at local and global scales [4,5]. The significance of SSM becomes more pronounced in arid and semi-arid regions, where water scarcity is a prominent concern [6]. In these regions, variations in SSM can trigger cascading effects, influencing groundwater recharge rates [7], vegetation dynamics [8], and regional climate patterns [1]. Hence, precise monitoring and prediction of SSM, along with the investigation of influencing factors, are indispensable for sustainable water resource management, risk assessment, and mitigation of drought and other hydrometeorological hazards in these regions [9,10,11].

However, a major bottleneck in SSM monitoring, particularly in arid and semi-arid regions, is the lack of high-quality and reliable in situ data [12,13,14]. While in situ sensors provide localized, high-quality measurements, their geographic coverage is often limited due to financial, logistical, and accessibility constraints [15]. In many countries, particularly those with vast arid and semi-arid regions, installing and maintaining in situ networks is challenging [12,13,16]. Thus, validating global SSM datasets with in situ measurements is becoming increasingly important [17]. Country-specific validation of these global datasets allows for the calibration of models to regional conditions, enhancing their reliability and applicability for resource management strategies [16,18].

Understanding the factors that influence SSM is key to effective monitoring and management [14,19]. SSM is influenced by numerous variables, including precipitation [20,21], texture and organic matter content of soil [3,22], topography [23], vegetation [24], and groundwater [25]. Seasonal trends, land-use changes, and evapotranspiration rates are also critical elements affecting SSM dynamics [26,27]. Understanding these variables and their interactions is vital for developing strategies for SSM management, particularly in arid and semi-arid regions where every drop of water matters [28]. Insight into these factors can enhance the precision of hydrological models and contribute to the development of adaptive management strategies for water resources [29].

The increasing complexity and non-linearity of relationships among environmental variables affecting SSM present significant challenges for traditional modeling approaches [29,30]. To address these challenges, machine learning (ML) techniques have gained popularity in hydrological modeling due to their ability to process large datasets and capture intricate interactions among multiple predictors [31]. Among various machine learning models, Random Forest (RF) stands out for its ability to deliver consistent predictions by reducing variance without increasing prediction bias [32,33]. However, a major criticism of ML approaches is their “black-box” nature, which often limits interpretability and hinders their adoption in environmental sciences [34]. To overcome this limitation, recent studies have employed SHapley Additive exPlanations (SHAP)—a model-agnostic, game-theoretic technique that explains model predictions by quantifying the contribution of each input variable [35,36]. Unlike traditional variable importance measures, SHAP not only indicates which variables are most influential, but also reveals the direction of their effects (positive or negative), thus providing deeper insight into model behavior [34].

Therefore, the objective of this study is to develop an interpretable machine learning framework to investigate the key environmental drivers influencing SSM—including the magnitude and direction of their effects—across diverse hydro-climatic regions in Iran. We achieve this by integrating the RF model with SHAP and a two-step clustering method to identify and interpret the spatial and seasonal dynamics of key environmental drivers of SSM. Unlike previous studies that are often limited to local scales or lack interpretability, our approach provides a scalable, interpretable, and seasonally explicit analysis of SSM drivers across 609 catchments. The insights derived from this framework aim to support data-driven water resource management in arid and semi-arid regions.

2. Materials and Methods

2.1. Study Area

Iran, with an area of approximately 1,648,195 km², is located in the Middle East and faces challenges related to water resource scarcity [37]. Iran is typically characterized by an arid and semi-arid climate, with an average annual precipitation of approximately 250 mm [38]. As indicated by the climate classification shown in Figure 1, a significant portion of the country falls under the category of a warm-dry climate, naturally resulting in SSM deficits [39]. In recent years, SSM deficits in Iran have been exacerbated by factors that include climate change, excessive groundwater extraction, mismanagement of surface water, and inefficient irrigation practices [40]. Such SSM shortages can lead to detrimental environmental consequences, including desertification, soil degradation, dust storms, wind erosion, and the degradation of air and water quality [41]. Therefore, it is imperative to monitor SSM and study the factors influencing it in Iran to ensure effective environmental management [17,42]. Nonetheless, SSM remains inadequately monitored in numerous regions of Iran, and the existing measurements, as depicted in Figure 1, lack sufficient temporal and spatial resolution [38].

Under these circumstances, remote sensing data can offer a viable solution for monitoring SSM in Iran [43]. Consequently, it is essential to validate and assess global SSM products using in situ stations to gauge their real-world effectiveness before using them across all climatic regions in Iran. In this study, after validating the SSM products, the environmental factors affecting SSM were investigated across 609 catchments in Iran. The catchments cover a total drainage area of 1,648,195 km² [44].

These catchments are distributed across diverse climates, spanning from cold-dry to warm-humid, as illustrated in Figure 1. They exhibit different topographic characteristics, geological compositions, soil types, and vegetation, leading to diverse hydrological conditions throughout the country. Additionally, the land cover map obtained from ESA with a 10 m resolution [45,46] reveals that bare/sparse vegetation, grasslands, croplands, and forests constitute the predominant land cover in the studied catchments, accounting for 64.2%, 18.9%, 11.8%, and 1.70%, respectively, as given in Table S1. Given its vast area and predominantly arid to semi-arid climate, Iran experiences severe soil moisture deficits driven by both natural and anthropogenic factors. Its diverse topography, land cover, and climate zones make it an ideal case study for evaluating SSM dynamics. Moreover, the insights gained, particularly regarding the role of environmental drivers, can be valuable for other semi-arid regions worldwide facing similar climatic and hydrological challenges.

2.2. Datasets

2.2.1. SSM Data

Following the flowchart illustrated in Figure 2, daily in situ SSM data from agrometeorological stations were gathered from the Iran Meteorological Organization (IMO) [39] for the period spanning 1 February 2006 to 31 March 2023. Unfortunately, SSM data are not widely available in numerous regions of Iran and contain several gaps [38,42]. Hence, a total of 42 stations were chosen throughout Iran, distributed as follows: 13 stations in cold-dry climates, 2 in cold-humid, 11 in warm-dry, 3 in warm-humid, 8 in moderate-dry, and 5 in moderate-humid (Figure 1). The limited number of in situ stations largely stems from financial, logistical, and accessibility challenges that hinder the establishment and maintenance of dense monitoring networks in Iran [39]. Although the in situ dataset is available from 2006, we selected the period from April 2015 to March 2023 to ensure consistency with the temporal coverage limitations of the other datasets used in this study. Considering previous studies (e.g., [15,47,48]), we evaluated three primary SSM datasets, namely Soil Moisture Active Passive (SMAP), Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2), and Climate Forecast System, Version 2 (CFSv2), against in situ SSM data at a depth of 0–5 cm (Table 1).

The SMAP satellite mission by NASA was launched on 31 January 2015, to globally map soil moisture and the freeze/thaw state of landscapes [49]. Although the satellite was initially equipped with both an L-band radar and an L-band radiometer, the radar instrument encountered a failure after approximately 11 weeks of operation, and the production of soil moisture data continues based solely on the radiometer measurements [47]. SMAP measurements provide direct sensing of SSM in the upper 5 cm of the soil [15,50]. In this study, we employed the SMAP Level 4 passive product [51], which features a spatial resolution of 9 km (Table 1).

Table 1. A summary of the SSM datasets used in this study.

Data Name	Institution	Examined Period	Spatial Resolution	Temporal Resolution	Reference
In situ	IMO	2015–2023	point location	daily	[39]
SMAP Level 4	NASA	2015–2023	9 km × 9 km	daily	[52]
MERRA-2	NASA	2015–2023	56 km × 70 km	daily	[53]
CFSv2	NCEP	2015–2023	22 km × 22 km	daily	[54]

MERRA-2, generated by the NASA Global Modeling and Assimilation Office (GMAO), is the most recent atmospheric reanalysis covering the modern satellite era, offering global reanalysis data spanning from 1980 to the present [53]. MERRA-2 is validated against in situ measurements in North America, Europe, and Australia, demonstrating that its performance is slightly superior to that of ERA-Interim/Land [55]. In this study, we utilized MERRA-2, with a spatial resolution of 56 km by 70 km (Table 1).

CFSv2 was made operational at the National Centers for Environmental Prediction (NCEP) in March 2011 [56,57]. The soil moisture dataset within the CFSv2 encompasses four layers (at 5, 25, 70, and 150 cm depths) and has a spatial resolution of approximately 22 km [54]. It is worth mentioning that while SMAP directly measures SSM, MERRA-2 and CFSv2 rely on model-based reanalysis data generated through physical models, observations, and data assimilation. These three SSM datasets were resampled to 0.25° × 0.25° by bilinear interpolation [19] to ensure consistency across the datasets.
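The resampling step can be illustrated with a short Python sketch. It is only a minimal example under stated assumptions: the grid extent, the 0.5° source spacing, and the synthetic field are hypothetical, and the function name `resample_bilinear` is introduced here for illustration. On a regular grid, linear interpolation along each axis with SciPy's `RegularGridInterpolator` corresponds to bilinear interpolation.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator


def resample_bilinear(lat, lon, field, target_lat, target_lon):
    """Bilinearly resample a 2-D field from (lat, lon) onto a target grid.

    lat and lon must be strictly ascending 1-D arrays, and field must have
    shape (len(lat), len(lon)).
    """
    interp = RegularGridInterpolator((lat, lon), field, method="linear")
    grid_lat, grid_lon = np.meshgrid(target_lat, target_lon, indexing="ij")
    points = np.stack([grid_lat.ravel(), grid_lon.ravel()], axis=-1)
    return interp(points).reshape(grid_lat.shape)


# Hypothetical coarse source grid (0.5 degree spacing) over an illustrative box.
src_lat = np.arange(25.0, 40.5, 0.5)
src_lon = np.arange(44.0, 64.5, 0.5)
lat2d, lon2d = np.meshgrid(src_lat, src_lon, indexing="ij")
src_field = 0.002 * lat2d + 0.001 * lon2d  # synthetic field, linear in lat/lon

# Target 0.25 degree grid, kept inside the source extent (no extrapolation).
tgt_lat = np.arange(25.0, 40.0 + 1e-9, 0.25)
tgt_lon = np.arange(44.0, 64.0 + 1e-9, 0.25)
resampled = resample_bilinear(src_lat, src_lon, src_field, tgt_lat, tgt_lon)
```

Because the synthetic field is linear in latitude and longitude, bilinear interpolation reproduces it exactly, which makes the sketch easy to check. Real products would additionally need handling of missing values and of cells outside the source extent.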

2.2.2. Factors Influencing SSM

We selected the candidate factors that influence SSM based on previous studies (e.g., [3,58,59]) and data availability (Table 2, Figure S1). We used the catchment-averaged monthly data of the following parameters: precipitation, potential evapotranspiration, solar radiation, wind speed, normalized difference vegetation index (NDVI), and groundwater table depth. Additionally, time-invariant catchment attributes such as distance from water bodies, clay fraction, organic matter fraction, elevation, and topography roughness index were included (Table 2). The mean monthly precipitation for the studied catchments was computed using the Radial Basis Function (RBF) interpolation method [60] in Python (v 3.11.5). This was based on daily precipitation data collected from 422 synoptic stations between 2015 and 2023 from IMO [39]. The mean monthly potential evapotranspiration for the studied catchments was determined using the MODIS global evapotranspiration product MOD16A2 [61] at a 500 m resolution.

Solar radiation and wind speed data were obtained from the ERA5-Land dataset [62,63] at a spatial resolution of 11 km. Vegetation dynamics were analyzed using the NDVI derived from Sentinel-2 data [64] at 10 m resolution. The NDVI has proven to be a suitable index for detecting vegetation changes, particularly within arid and semi-arid regions [7]. To generate a time series of groundwater table depth, we utilized monthly data from 11,003 observation wells collected by the Iran Water Resources Management Company [44]. We derived the average monthly groundwater table depth for each catchment using the RBF interpolation method.
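As a rough illustration of how point observations can be turned into a catchment average with RBF interpolation, the sketch below uses SciPy's `RBFInterpolator`. Everything specific in it is an assumption made for the example: the station coordinates and values are synthetic, the thin-plate-spline kernel is just one possible choice (the kernel used in the study is not stated here), and the "catchment" is a hypothetical rectangle sampled on a regular grid.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)

# Hypothetical station coordinates (lon, lat) and monthly precipitation in mm.
stations = rng.uniform([44.0, 25.0], [63.0, 40.0], size=(60, 2))
precip = 20.0 + 3.0 * (stations[:, 1] - 25.0) + rng.normal(0.0, 2.0, 60)

# Thin-plate-spline RBF; smoothing=0 makes the surface pass through the stations.
rbf = RBFInterpolator(stations, precip, kernel="thin_plate_spline", smoothing=0.0)

# Regular grid of points inside a hypothetical rectangular "catchment".
gx, gy = np.meshgrid(np.linspace(50.0, 52.0, 21), np.linspace(30.0, 32.0, 21))
cells = np.column_stack([gx.ravel(), gy.ravel()])

# Catchment-averaged value: mean of the interpolated surface over the cells.
catchment_mean = float(rbf(cells).mean())
```

In practice the grid cells would be clipped to the actual catchment polygon before averaging, and the same pattern would apply to the groundwater table depth observations.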

Table 2. Summary of the candidate factors considered in this study that impact the SSM.

Type	Dataset	Source	Spatial Resolution	Reference
Dynamic	Precipitation	In situ observations	point location	[39]
	Potential evapotranspiration	MODIS	500 m	[61]
	Solar radiation	ERA5-Land	11 km	[62]
	Wind speed	ERA5-Land	11 km	[62]
	Normalized difference vegetation index	Sentinel-2	10 m	[64]
	Groundwater table depth	In situ observations	point location	[44]
Static	Distance from water bodies	NOAA	-	[65]
	Clay fraction	SoilGrids	250 m	[66]
	Organic matter fraction	SoilGrids	250 m	[66]
	Elevation	ALOS AW3D30	30 m	[67]
	Topography roughness index	ALOS AW3D30	30 m	[67]

The mean distance of each catchment from water bodies was determined by utilizing the water bodies’ data [65] and the Euclidean Distance method in ArcGIS [68]. The SoilGrids250m dataset [66] was employed to extract the clay and organic matter fractions within the upper vadose zone. Finally, we calculated the mean elevation and topography roughness index for the studied catchments using an ALOS DEM with a 30 m resolution [67] and Focal Statistics tools in ArcGIS. All factors influencing SSM were extracted using the Google Earth Engine platform [69], Python (v 3.11.5), and ArcGIS software (v 10.7.1). All SSM-influencing factors, derived from datasets with varying spatial resolutions, were resampled to a common resolution of 0.25° × 0.25° by bilinear interpolation to ensure consistency with the SSM datasets. Additional details regarding the factors influencing SSM and their correlation are provided in the Supplementary Material.

2.3. Methods

2.3.1. Statistical Metrics

The performance of global SSM products was assessed using the Root Mean Squared Error (RMSE), Relative Bias (RBias), Kendall’s Tau (τ), and Kling–Gupta efficiency (KGE′, Figure 2) [70,71,72,73]:

RMSE = \sqrt{\frac{\sum_{i = 1}^{N} {({Tar}_{i} - {Ref}_{i})}^{2}}{N}}

(1)

RBias = \frac{\sum_{i = 1}^{N} ({Tar}_{i} - {Ref}_{i})}{\sum_{i = 1}^{N} {Ref}_{i}}

(2)

τ = \frac{\sum_{i = 1}^{N} \sum_{j = 1}^{N} sign ({Tar}_{i} - {Tar}_{j}) sign ({Ref}_{i} - {Ref}_{j})}{N (N - 1)}

(3)

where N represents the number of samples, Ref is the reference values (i.e., in situ data), and Tar is the target values (i.e., SMAP, MERRA-2, and CFSv2 data) for each record (i). Furthermore, we utilized the KGE′ statistic, initially introduced by [74] and subsequently modified by [75]. KGE′ balances the contributions of correlation, bias, and variability terms as follows [70]:

KGE′ = 1 - \sqrt{{(r - 1)}^{2} + {(β - 1)}^{2} + {(γ - 1)}^{2}}

(4)

β = \frac{μ_{Tar}}{μ_{Ref}}

(5)

γ = \frac{{CV}_{Tar}}{{CV}_{Ref}} = \frac{σ_{Tar} / μ_{Tar}}{σ_{Ref} / μ_{Ref}}

(6)

where r represents the correlation coefficient between the Ref and Tar datasets, β is the bias ratio, γ is the variability ratio, μ is the mean SSM, σ is the standard deviation of SSM, and CV is the coefficient of variation (the standard deviation divided by the mean). Higher KGE′ values indicate better agreement, with KGE′ = 1 as the optimum [71]. The results of these metrics are illustrated in Figure 3, where τ, KGE′, RMSE, and RBias are presented in sub-panels (a) through (d), respectively, for the period from 1 April 2015 to 31 March 2023.
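The four metrics can be written compactly in Python with NumPy. The sketch below is illustrative only: the function names and the sample values are assumptions, the Kendall's τ follows the double sum of Equation (3) without any correction for ties, and γ is computed as the ratio of the coefficients of variation of the target and reference series.

```python
import numpy as np


def rmse(tar, ref):
    """Root Mean Squared Error, Equation (1)."""
    return float(np.sqrt(np.mean((tar - ref) ** 2)))


def rbias(tar, ref):
    """Relative Bias, Equation (2)."""
    return float(np.sum(tar - ref) / np.sum(ref))


def kendall_tau(tar, ref):
    """Kendall's tau as the double sum over all ordered pairs, Equation (3)."""
    s_tar = np.sign(tar[:, None] - tar[None, :])
    s_ref = np.sign(ref[:, None] - ref[None, :])
    n = tar.size
    return float(np.sum(s_tar * s_ref) / (n * (n - 1)))


def kge_prime(tar, ref):
    """Modified Kling-Gupta efficiency, Equations (4)-(6)."""
    r = np.corrcoef(tar, ref)[0, 1]
    beta = tar.mean() / ref.mean()
    gamma = (tar.std() / tar.mean()) / (ref.std() / ref.mean())
    return float(1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2))


# Hypothetical in situ (Ref) and product (Tar) values in m3/m3.
ref = np.array([0.10, 0.15, 0.20, 0.25, 0.30])
tar = ref + 0.02  # product with a constant wet bias

scores = {
    "RMSE": rmse(tar, ref),
    "RBias": rbias(tar, ref),
    "tau": kendall_tau(tar, ref),
    "KGE'": kge_prime(tar, ref),
}
```

In this toy case the ranking of the two series is identical, so τ equals 1, while the constant offset lowers KGE′ through both the bias ratio and the variability ratio.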

2.3.2. Random Forest (RF)

Due to its ability to produce consistent predictions by reducing variance without increasing bias [32], we chose RF to model SSM in 609 catchments across Iran. The RF model was initially introduced by Leo Breiman in 2001 [76]. The RF algorithm does not necessitate any alterations, conversions, or modifications to the input data, and it autonomously handles missing values [77,78]. An RF model comprises a multitude of decision trees designed to be as uncorrelated as possible [79]. To create an uncorrelated collection of trees, the RF employs bagging and feature randomization during the construction of the decision trees. This implies that each tree is trained on a random sample drawn from the training set with replacement, a technique known as bootstrapping [80]. Additionally, each tree is limited to a random subset of the available features [77].

In this study, we used the RandomForestRegressor from the machine learning library sklearn [81,82] for the Python programming language. To speed up convergence and mitigate the impact of local extremes on training, the input variables were normalized before training. Because some models encounter issues when inputs are normalized to the range of 0 to 1, we used the adjusted min-max method, which rescales each input variable to the range of 0.1 to 0.9 [83]. The global SSM products were first evaluated against in situ observations from 42 agrometeorological stations for the period 2015–2023, and the product demonstrating the highest agreement was selected to train the RF model over 609 catchments. Although the dataset covers the period from 2015 to 2023, our analysis did not involve direct time-series modeling. Instead, we aggregated the data seasonally, calculating the average SSM and associated predictors for each season. This seasonal averaging approach effectively removed the temporal sequence from the data, making serial autocorrelation irrelevant to both our modeling and validation processes. First, 80% of the data (487 catchments) was randomly allocated for training, and the remaining 20% (122 catchments) was reserved for testing. Then, 10-fold cross-validation was applied solely to the training set for hyperparameter tuning, serving as a safeguard against model overfitting [79]. Finally, the RF model’s performance in estimating SSM across different seasons was evaluated on the reserved test set using R-squared (R²), RMSE, and Mean Absolute Error (MAE).
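The workflow described above (adjusted min-max scaling, an 80/20 split, 10-fold cross-validation on the training set, and evaluation on the held-out set) can be sketched with sklearn as follows. This is a minimal sketch on synthetic data, not the configuration used in the study: the predictor matrix, the hyperparameter grid, the number of trees, the random seeds, and the choice to fit the scaling bounds on the training set only are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split


def scale_01_09(X_train, X_other):
    """Adjusted min-max scaling to [0.1, 0.9], with bounds from the training set."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    return tuple(0.1 + 0.8 * (X - lo) / (hi - lo) for X in (X_train, X_other))


rng = np.random.default_rng(42)
X = rng.normal(size=(609, 11))  # synthetic stand-in: 609 catchments, 11 factors
y = 0.12 + 0.03 * X[:, 0] - 0.02 * X[:, 1] + rng.normal(0.0, 0.01, 609)

# 80/20 random split: 487 training and 122 testing catchments.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
X_tr_s, X_te_s = scale_01_09(X_tr, X_te)

# Hypothetical grid; the tuned hyperparameters are not listed in the text.
grid = {"max_depth": [None, 8], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=42),
    grid,
    cv=10,  # 10-fold cross-validation on the training set only
    scoring="neg_root_mean_squared_error",
)
search.fit(X_tr_s, y_tr)

# Final evaluation on the reserved test set.
pred = search.predict(X_te_s)
r2 = r2_score(y_te, pred)
rmse = float(np.sqrt(mean_squared_error(y_te, pred)))
mae = float(mean_absolute_error(y_te, pred))
```

In a seasonal setup, the same steps would be repeated once per season with the corresponding seasonal means of SSM and of the dynamic predictors.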

2.3.3. SHAP

The SHAP method represents a model-agnostic game-theoretic technique for interpreting machine learning models [35]. Unlike conventional methods that only quantify the influence of input variables on the model output, SHAP can reveal whether each variable exerts a positive or negative impact on the model [34]. In other words, SHAP can analyze an individual prediction by considering it as a composite result of the combined effects of each input variable on the output value (i.e., the predicted value). This approach allows users to gain insight into the magnitude and direction of each variable’s influence on the output [35,84]. In recent years, SHAP has been widely applied to environmental data to improve the interpretability of ML models (e.g., [85,86,87,88]). Using a pre-trained machine learning model denoted as M and a set of input variables x = {x₁, …, x_q}, SHAP employs an explanation model E to ascertain the individual influence of each variable on the behavior of model M [36]. SHAP is expressed as

E = ϕ_{0} + \sum_{i = 1}^{q} ϕ_{i} t_{i}

(7)

ϕ_{i} (M, x) = \sum_{t \subseteq x} \frac{|t|! (q - |t| - 1)!}{q!} [M (t) - M (t \setminus i)]

(8)

where q represents the number of input variables, t is the variable simplification, ϕ_i ∈ R represents the contribution of each variable to the machine learning model, and \ is the difference-set notation for set operations [34,36]. Recent enhancements to SHAP have focused on optimizing TreeExplainer for tree-based models, reducing computational costs while improving accuracy. Notably, ref. [89] proposed polynomial-time algorithms for computing SHAP values by leveraging the structural properties of tree ensembles, enabling exact attributions without the exponential complexity typically associated with Shapley value calculations. Earlier, ref. [90] introduced Fast TreeSHAP, an algorithm that accelerates SHAP value computation by up to 2.5 times through strategic caching and precomputation, significantly improving scalability for large datasets. Additionally, ref. [91] developed a functional decomposition approach for gradient-boosted trees, which separates main effects and interactions to enhance the interpretability of SHAP outputs. These methodological advancements have made TreeExplainer both more efficient and more robust in capturing detailed feature contributions. Accordingly, in our study involving ensemble tree models, we employed these improved SHAP implementations to ensure scalable and reliable feature attribution.
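To make the Shapley attribution concrete, the following sketch computes exact Shapley values by brute-force enumeration for a small toy model. It is an illustration under stated assumptions and not the TreeExplainer algorithm: it uses the classic formulation over coalitions S that exclude variable i, it defines the model on a subset by replacing the absent variables with a baseline value (one common convention), and `toy_model`, `x`, and `baseline` are hypothetical. The cost grows exponentially with q, which is why the tree-specific algorithms cited above matter in practice.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(model, x, baseline):
    """Exact Shapley values of model at x, relative to a baseline input.

    Variables outside a coalition are set to their baseline value, which is
    an assumed way of evaluating the model on a subset of variables.
    """
    q = len(x)

    def value(coalition):
        z = baseline.copy()
        idx = np.array(coalition, dtype=int)
        z[idx] = x[idx]
        return model(z)

    phi = np.zeros(q)
    for i in range(q):
        others = [j for j in range(q) if j != i]
        for size in range(q):
            weight = factorial(size) * factorial(q - size - 1) / factorial(q)
            for S in combinations(others, size):
                # Marginal contribution of variable i to coalition S.
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


def toy_model(z):
    # Hypothetical model with one interaction term between variables 0 and 2.
    return 0.1 + 0.5 * z[0] - 0.3 * z[1] + 0.2 * z[0] * z[2]


x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction at x and the prediction at the baseline, and the interaction term is shared equally between the two variables involved.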

Two major types of SHAP plots are beeswarm and waterfall plots. In a beeswarm plot, features are displayed on the vertical axis and SHAP values on the horizontal axis. Each point represents a SHAP value for a specific feature in an individual sample. Red points indicate high feature values, while blue points indicate low ones. Red points located on the right suggest that high feature values increase SSM, whereas their presence on the left implies that high values reduce SSM. Similarly, blue points on the right suggest that low feature values increase SSM, while those on the left indicate that a decrease in the feature leads to lower SSM, as illustrated in Figure 7. In a waterfall plot, red bars represent features that contribute to increasing the predicted SSM, while blue bars indicate features that reduce it. The horizontal length of each bar reflects the magnitude of the feature’s impact on the prediction relative to the model’s baseline, as shown in Figure 8.
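The information conveyed by a beeswarm plot can also be summarized numerically. The sketch below is only an illustration: the SHAP matrix is synthetic, the feature names are placeholders, and using the sign of the correlation between feature values and SHAP values as a proxy for the direction of an effect is an assumption that works for roughly monotonic relationships.

```python
import numpy as np


def summarize_shap(shap_values, features, names):
    """Rank features by mean |SHAP| and report the sign of their effect."""
    importance = np.abs(shap_values).mean(axis=0)
    rows = []
    for j in np.argsort(importance)[::-1]:
        # Positive correlation: high feature values push the prediction up.
        corr = np.corrcoef(features[:, j], shap_values[:, j])[0, 1]
        direction = "positive" if corr > 0 else "negative"
        rows.append((names[j], float(importance[j]), direction))
    return rows


rng = np.random.default_rng(1)
names = ["precipitation", "solar radiation", "elevation"]  # placeholder names
features = rng.normal(size=(200, 3))

# Synthetic SHAP matrix: strong positive, moderate negative, weak positive effect.
centered = features - features.mean(axis=0)
shap_values = centered * np.array([0.05, -0.02, 0.001])

summary = summarize_shap(shap_values, features, names)
```

Each row of `summary` holds a feature name, its mean absolute SHAP value, and the inferred direction, ordered from most to least influential, which mirrors the top-to-bottom ordering of a beeswarm plot.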

2.3.4. Cluster Analysis

A two-step cluster analysis was performed on the SHAP model outputs to enhance our spatial understanding of the effects of factors influencing SSM within the studied catchments. In other words, a two-step cluster analysis can offer spatial interpretations of the factors that affect SSM, based on the outcomes generated by RF and SHAP. Compared to k-means and balanced iterative reducing and clustering using hierarchies (BIRCH), the two-step cluster analysis offers several advantages. These include its ability to handle both categorical and continuous variables, automatically determine the optimal number of clusters, and scale effectively for large datasets [92,93]. We follow the methodology for the two-step cluster analysis outlined by [94]. Only a summary of the technique is provided here; the reader is directed to [94] for a detailed description of the approach. The method involves two steps: (1) all records are scanned and grouped by distance to construct a classification tree, in which records in the same tree node are the most similar [95]; (2) the nodes are classified using the cohesion technique, and the clustering results are evaluated using the Bayesian information criterion (BIC) or the Akaike information criterion (AIC), which determines the structure of the final clusters [96].

3. Results and Discussion

3.1. Performances of SSM Products

Figure 3 shows the validation results of the three global SSM datasets based on data from 42 in situ stations between April 2015 and March 2023. Among the evaluated datasets, SMAP has the highest median values for τ and KGE′ (0.740 and 0.690) and the lowest median values for RMSE and RBias (0.068 and 0.030). MERRA-2 follows, with median τ and KGE′ values of 0.684 and 0.604 and median RMSE and RBias values of 0.085 and 0.034, respectively. CFSv2 shows the lowest median values for both τ and KGE′ (0.550 and 0.500) and the highest median values for RMSE and RBias (0.113 and 0.059) among the three products, as illustrated in Figure 3. These findings indicate that SMAP provides superior performance in estimating SSM across Iran compared to the other datasets.

The performance of SMAP aligns with the findings of the study conducted by [97], which reported that SMAP has a global average anomaly correlation of 0.76. Ref. [15] evaluated eight global root zone soil moisture products (0–1 m depth) across the globe. Their findings indicated that SMAP, MERRA-2, JRA-55, and ERA-5 consistently showed stronger correlations with in situ root zone soil moisture measurements compared to GLDAS, NCEP R1, and NCEP R2. Ref. [47] validated SMAP SSM using core validation sites. They reported that the SMAP radiometer-based SSM product meets its expected performance, achieving an unbiased RMSE of 0.04 m³/m³ for volumetric SSM. It is noteworthy that global evaluations have not incorporated the in situ SSM data from Iran. Few studies have explicitly focused on evaluating SSM in Iran. These studies either concentrate on a particular local area, such as the Lake Urmia Basin [98,99] or cover a short period in Iran (i.e., 2015–2016, as demonstrated in [42]). Additionally, some studies were solely concerned with validating a single product (e.g., [17,100,101]). Ref. [42] validated SSM products from SMAP, SMOS, and AMSR2, using 23 in situ stations in Iran from 2015 to 2016. Their results pointed to SMAP as the best-performing satellite-based product. Also, ref. [100] stated that SMAP has a strong capacity for SSM data retrieval in Iran.

3.2. Spatial-Temporal Pattern of SSM

We calculated SSM for 609 catchments in Iran using the SMAP dataset, which performed best in the validation against in situ data. Figure 4 illustrates the mean catchment-averaged daily SSM across the 609 studied catchments from April 2015 to March 2023, delineated by season and based on the SMAP dataset. The mean SSM for different seasons reveals distinct patterns, with winter having the highest median at 0.175 m³/m³, followed by spring at 0.160 m³/m³, while autumn and summer exhibit lower medians of 0.096 m³/m³ and 0.081 m³/m³, respectively. These findings indicate notable seasonal variations in SSM within the studied catchments. Ref. [102] stated that many catchments in Iran lack natural moisture, especially during the summer, leading to a heightened demand for irrigation in agriculture during this season. According to [103], differences in soil moisture levels between dry (summer) and wet (winter) conditions are more pronounced in the upper surface layers (0 to 20 cm) than in deeper layers. Figure 4 also demonstrates that catchments with SSM exceeding 0.20 m³/m³ are predominantly located in northern, northwestern, western, and southwestern Iran, primarily in regions characterized by cold-humid and moderate-humid climates. In contrast, catchments with SSM below 0.05 m³/m³ are primarily concentrated in central and southeastern Iran, which are characterized by warm-dry climates.

3.3. SSM in Different Land Covers

Numerous studies have shown that changes in SSM exhibit varying characteristics under different land cover types (e.g., [104,105,106]). The boxplots of SSM within the six primary land cover categories in Iran, as determined by the land cover map of ESA with a 10 m resolution [45] for the period spanning 1 April 2015 to 31 March 2023, are displayed in Figure 5a. The results reveal significant variations in median SSM across different land covers. Specifically, we found that forests exhibited the highest SSM with a median value of 0.180 m³/m³, followed by grasslands at 0.148 m³/m³, croplands at 0.148 m³/m³, shrubland at 0.117 m³/m³, built-up areas at 0.107 m³/m³, and bare/sparse vegetation at 0.093 m³/m³. A study of SSM under various land-use patterns along the lower Bhavani River in India [107] likewise concluded that SSM is higher in forested areas than in fallow land and built-up areas. Figure 5a also indicates that grasslands exhibit the highest soil moisture variability, while bare/sparse vegetation shows the lowest variability across Iran.

Figure 5b shows a time series of SSM for various land cover types in Iran, spanning from 1 April 2015 to 31 March 2023. The results show that in the early months of 2019, especially in March and April, SSM reached higher levels compared to the same months in the preceding and subsequent years. The precipitation time series for Iran over 2015–2023 (also shown in Figure 5b) indicates that the increase in SSM during those months coincides with an increase in precipitation. This finding aligns with previous research. A study has demonstrated that a significant increase in precipitation during the early months of 2019 led to a rise in the water level of Lake Urmia in northwestern Iran [108]. Due to the heavy and unprecedented precipitation between mid-March and April 2019, widespread flooding affected 25 out of the 31 provinces in Iran [73,109,110]. These events resulted in more than 77 human fatalities and caused approximately USD 2.2 billion in damages.

3.4. RF and SHAP

Figure 6 compares the SMAP-derived SSM with the RF model estimates for each season during the testing phase. The RF model yields SSM estimates with R² values of 0.89, 0.83, 0.70, and 0.75 for the winter, spring, summer, and autumn seasons, respectively. Corresponding RMSE values are 0.076, 0.081, 0.098, and 0.061 m³/m³, while MAE values are 0.058, 0.060, 0.076, and 0.047 m³/m³, demonstrating consistent performance across seasons. The decrease in R² observed during summer is likely due to the significant shortage of SSM during this season compared to others, notably winter (as shown in Figure 4). The good performance of the RF method in estimating SSM in this study aligns with the findings of previous research. For example, in the semi-arid region of West Khorasan-Razavi province in Iran, ref. [111] utilized several machine learning algorithms for SSM estimation and concluded that the RF method provided the most precise results. Another study [112] compared spectral and spatial-based approaches to map local SSM variations in the Balikhli-Chay watershed in northwestern Iran. Their findings revealed that the RF approach outperformed the others, demonstrating the highest level of performance in SSM modeling.
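For readers who want to reproduce the three reported metrics, the following is a minimal sketch of how R², RMSE, and MAE can be computed from paired observed and predicted values. It assumes R² is the coefficient of determination (the source does not state whether a squared correlation was used instead), and the sample values are hypothetical.

```python
from math import sqrt


def regression_metrics(observed, predicted):
    """Return (R², RMSE, MAE) for paired observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Hypothetical SMAP SSM (observed) and RF estimates (predicted), in m³/m³.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]
r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.3f}, RMSE={rmse:.4f}, MAE={mae:.4f}")
```

Because R² is normalized by the variance of the observations, a season with a narrow range of SSM can show a lower R² even when RMSE and MAE are comparable to those of other seasons.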

Machine learning techniques are often considered black-box models, which limits the interpretability of their predictions [113]. SHAP offers a way to quantify the influence of each feature on the model’s outputs [114]. Figure 7 presents beeswarm plots from the SHAP analysis for each season. The primary factors influencing SSM vary from season to season (Figure 7). In winter, the most influential factors are, in descending order of importance, precipitation, distance from water bodies, solar radiation, clay fraction, potential evapotranspiration, and elevation (Figure 7a). Winter is typically associated with increased precipitation in many of Iran’s catchments [115,116], and previous research has demonstrated a direct contribution of precipitation to SSM (e.g., [117,118]). The proximity of the studied catchments to water bodies, such as the Caspian Sea and the Persian Gulf, also plays a vital role in determining SSM during winter. For example, the distance from water bodies can influence relative humidity, which in turn affects SSM [33,119]. Solar radiation, which has an inverse effect on SSM, tends to be lower in winter because of shorter days and reduced sunlight, and this lowers the rate of moisture evaporation from the soil, as reported by [120]. Lower solar radiation in winter can therefore help maintain higher SSM.
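To make the idea of a per-feature SHAP contribution concrete, the sketch below computes exact Shapley values by brute-force enumeration of feature coalitions. It is an illustration under stated assumptions, not the procedure used in the study: `toy_model`, the feature names, and all numeric values are hypothetical, and "absent" features are replaced by a single baseline value rather than averaged over a background dataset.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        inputs = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(inputs)

    phi = {}
    for feature in names:
        others = [k for k in names if k != feature]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {feature}) - value(coalition))
        phi[feature] = total
    return phi


def toy_model(x):
    """Hypothetical SSM model with one interaction term (not the study's RF)."""
    return (0.10 + 0.002 * x["precip"] - 0.001 * x["radiation"]
            + 0.05 * x["clay"] * x["precip"] / 100)


instance = {"precip": 60.0, "radiation": 120.0, "clay": 0.4}
baseline = {"precip": 30.0, "radiation": 150.0, "clay": 0.2}
phi = shapley_values(toy_model, instance, baseline)
print({k: round(v, 4) for k, v in phi.items()})
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the additivity property that makes SHAP values directly comparable across features. In this toy case, radiation below the baseline yields a positive contribution, consistent with the inverse effect of solar radiation described above.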

Recently, the spatial and temporal variability of soil moisture and its influencing factors across the northern agricultural regions of China were investigated [121]. The findings indicated that soil moisture was negatively correlated with temperature and sunshine duration and positively correlated with precipitation and relative humidity. Another study [105] demonstrated that as soil depth increases, the influence of natural factors on soil moisture anomalies gradually diminishes, whereas the impact of human-related factors becomes more pronounced. This suggests that human activities exert a stronger influence on deeper soil layers. A further study [122] examined the driving factors of soil moisture in the Heihe River Basin and found that during months with low soil moisture, land cover and elevation were the main influencing factors. In contrast, during months with higher soil moisture, NDVI and land surface temperature played the primary roles.

As the fourth important factor influencing SSM in winter, the clay fraction can enhance the soil’s water-holding capacity [3,123,124]. Clay soils retain moisture more effectively, contributing to higher SSM levels. One of the other factors affecting SSM is potential evapotranspiration, which tends to be lower during the winter in the studied catchments due to cooler temperatures. This reduction in potential evapotranspiration can contribute to SSM preservation. Ref. [33] highlighted potential evapotranspiration as a key determinant in estimating root zone soil moisture within the Raam catchment in the Netherlands. Ref. [125] analyzed the spatiotemporal variability of soil moisture and its dominant driving factors, showing that evapotranspiration played a more significant role in tropical areas. Ref. [126] examined the long-term evolution of soil moisture and its driving factors across China’s agroecosystems. Their findings revealed that in the plateau mountain and temperate continental climate zones, relative soil moisture was primarily influenced by temperature and precipitation, respectively. In temperate humid regions, climate change emerged as the dominant controlling factor, while in subtropical humid zones, grain output exhibited a negative impact on relative soil moisture.

Elevation, identified as the sixth most influential factor, has a negative impact on SSM during winter. Catchments at different elevations may experience variations in temperature and slope, which can influence SSM retention [127,128]. The analysis of the published data on seven potential factors influencing the temporal stability of soil water content indicated that the influence of these factors appears to be interconnected rather than solely driven by a single dominant factor [129]. These results align with hydrological theory, which states that increased precipitation and reduced radiation contribute to higher SSM during winter.

As the spring season commences and plants and trees in Iran start to grow, NDVI, distance from water bodies, clay fraction, and organic matter fraction become more important than precipitation (Figure 7b). In other words, during spring, land cover and soil characteristics take on greater importance than in winter. In addition, increased plant growth and the decomposition of organic materials, driven by rising temperatures, can increase soil organic matter content in spring. This, in turn, notably influences the soil’s ability to retain water. According to [3], an increase in organic matter content enhances water-holding capacity because of the inherent affinity of organic matter for water. Ref. [102] investigated long-term spatiotemporal variations in SSM and vegetation indices across Iran and concluded that NDVI exerts a noteworthy influence on the spatiotemporal variations in SSM. Ref. [130] estimated agricultural farm SSM using spectral indices and demonstrated that NDVI and land surface temperature possess substantial potential for extracting valuable SSM information. In spring, the increasing importance of NDVI and organic matter aligns with established ecohydrological understanding, as vegetation and organic matter enhance infiltration and water retention capacity.

With the significant decrease in precipitation in summer, the dominant factors affecting SSM are proximity to water bodies, clay fraction, potential evapotranspiration, NDVI, organic matter fraction, and elevation (Figure 7c). These findings reveal that the influence of the clay fraction on SSM reaches its peak during this dry season, surpassing the impact seen in other seasons. Catchments with a substantial clay content can effectively retain SSM during this period. In summer, the dominance of the clay fraction highlights the importance of soil texture in arid climates. This aligns with the water balance concept, which suggests that soils with higher water-holding capacity—such as those rich in clay or organic matter—are better able to buffer against drying.

As autumn arrives and precipitation begins, the hierarchy of factors influencing SSM shifts, with precipitation emerging as the predominant factor during this season (Figure 7d). It is worth noting that additional factors, such as groundwater table depth, wind speed, and topography roughness index, exert a negative influence on SSM, although their impact is relatively minor compared with that of the other factors. The low influence of groundwater table depth on SSM can be attributed to the large average depth of the groundwater table in the studied catchments, approximately 31 m (as indicated in Figure S1f). The impact of groundwater depth on SSM tends to become significant mainly in catchments with shallower groundwater tables, particularly in winter. Overall, the SHAP-derived feature importance patterns align well with established hydrological processes, including seasonal water inputs, storage dynamics, and land surface–atmosphere interactions. This suggests that the model reflects not only statistical relationships but also meaningful hydrological behavior.
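The seasonal rankings and the "positive" or "negative" influences discussed above can be derived from per-catchment SHAP values in a simple way. The sketch below ranks features by mean absolute SHAP value, a common convention for ordering SHAP summary plots, and infers the direction of influence from the sign of the covariance between feature values and SHAP values. All data and helper names are hypothetical.

```python
from statistics import mean

# Hypothetical feature values and SHAP values for four catchments.
feature_values = {
    "precipitation": [20.0, 40.0, 60.0, 80.0],
    "groundwater_depth": [10.0, 25.0, 40.0, 55.0],
}
shap_values = {
    "precipitation": [-0.03, -0.01, 0.01, 0.03],
    "groundwater_depth": [0.004, 0.001, -0.001, -0.004],
}


def mean_abs_shap(shap_by_feature):
    """Global importance: mean of |SHAP| across catchments for each feature."""
    return {f: mean(abs(v) for v in vals) for f, vals in shap_by_feature.items()}


def direction(values, shap_vals):
    """Sign of the covariance between a feature and its SHAP values."""
    mx, ms = mean(values), mean(shap_vals)
    cov = sum((x - mx) * (s - ms) for x, s in zip(values, shap_vals))
    return "positive" if cov > 0 else "negative"


importance = mean_abs_shap(shap_values)
ranking = sorted(importance, key=importance.get, reverse=True)
directions = {f: direction(feature_values[f], shap_values[f]) for f in ranking}
for f in ranking:
    print(f"{f}: mean|SHAP|={importance[f]:.4f}, influence={directions[f]}")
```

Separating magnitude from direction in this way shows how a factor such as groundwater table depth can have a negative influence while still ranking low in overall importance.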

Ref. [88] applied explainable transfer learning to predict subsurface soil moisture in the Yellow River Basin. Their findings, based on SHAP values, revealed that as the model was transferred from arid to humid regions, the influence of evapotranspiration-related factors declined significantly. Additionally, the effect of precipitation no longer increased with its amount, while the influence of SSM became more prominent. Also, ref. [131] conducted a comparative analysis of machine learning models for soil moisture estimation using high-resolution remote sensing data in the ShanDian River basin. Their SHAP-based analysis revealed that elevation was the most influential feature across all models, exerting a negative impact on soil moisture—indicating that higher elevations correspond to lower soil moisture levels. This aligns with the well-known pattern in which elevation influences precipitation distribution and runoff generation, ultimately reducing soil moisture at higher altitudes.

Figure 8 displays the SHAP waterfall plots for four representative catchments (i.e., Gorgan, Mahabad, Hamun, and the Lut Desert). The analysis reveals distinct regional patterns. In Gorgan and Mahabad, precipitation, clay fraction, and distance from water bodies appear as the most influential features, all contributing significantly to SSM prediction. This aligns with their relatively wetter climates and proximity to major water bodies—the Caspian Sea near Gorgan and Lake Urmia near Mahabad. Conversely, in the Hamun and Lut Desert regions, distance from water bodies shows a strong negative contribution, reflecting their remoteness from surface water sources. Additionally, the clay content in these arid regions further suppresses SSM estimates. These SHAP-based insights emphasize the importance of both climatic inputs and static geographic variables in shaping regional SSM dynamics and enhance the interpretability of the model for hydrological assessments across diverse environments.
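A waterfall plot starts from a base value (the expected model output) and adds one feature contribution at a time until it reaches the prediction for a single catchment. The following is a minimal sketch of that accumulation, assuming hypothetical contributions for one arid catchment; the values are illustrative and are not taken from Figure 8.

```python
def waterfall(base_value, contributions):
    """Return (feature, contribution, running total) ordered by |contribution|."""
    ordered = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    steps, running = [], base_value
    for feature, phi in ordered:
        running += phi
        steps.append((feature, phi, running))
    return steps


# Hypothetical base value and SHAP contributions (m³/m³) for one catchment.
base_value = 0.15
contributions = {
    "precipitation": 0.04,
    "distance_from_water_bodies": -0.05,
    "clay_fraction": 0.02,
}

steps = waterfall(base_value, contributions)
for feature, phi, running in steps:
    print(f"{feature:28s} {phi:+.3f} -> {running:.3f}")
```

The final running total equals the base value plus the sum of all contributions, so a strong negative contribution from one static variable can offset positive climatic inputs in the prediction for a given catchment.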

3.5. Cluster Analysis

The SHAP analysis yielded a large number of coefficients for the 609 studied catchments, which makes direct interpretation challenging. A method is therefore needed to categorize the spatially non-stationary effects of the factors influencing SSM across seasons in Iran. In this study, cluster analysis was applied to the SHAP coefficients. After clustering, catchments grouped within the same category showed a discernible resemblance in the factors affecting them, whereas the factors differed substantially between categories. In SPSS software (v 24), the two-step clustering algorithm automatically assesses the suitability of segmenting catchments into multiple categories, favoring solutions with lower BIC values. Once the optimal number of clusters is determined, more precise insights into the distinctions among categories can be obtained, and the clustering results show how each factor influences SSM. To ensure the reliability of the results, we assessed the clustering quality using the BIC. For each season, BIC values were calculated for a range of cluster numbers (from 1 to 10). The lowest BIC consistently occurred at five clusters, with only minimal changes observed beyond this point. Consequently, SPSS automatically identified five clusters as the optimal solution across all four seasons. The corresponding BIC values are presented in Table 3.
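The selection rule described above, choosing the number of clusters with the lowest BIC, can be sketched as follows. The generic BIC definition is used (minus twice the log-likelihood plus a penalty that grows with the number of parameters and the sample size). The BIC values in `bic_by_k` are hypothetical and are not those of Table 3, and the automatic selection in SPSS may apply further criteria that are not reproduced here.

```python
from math import log


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_params * log(n_samples)


def select_k(bic_by_k):
    """Return the number of clusters with the lowest BIC."""
    return min(bic_by_k, key=bic_by_k.get)


# Hypothetical BIC values for k = 1..10 cluster solutions (illustrative only).
bic_by_k = {
    1: 5200.0, 2: 4700.0, 3: 4400.0, 4: 4200.0, 5: 4100.0,
    6: 4104.0, 7: 4109.0, 8: 4115.0, 9: 4122.0, 10: 4130.0,
}

best_k = select_k(bic_by_k)
print("selected number of clusters:", best_k)

# Example of the penalty term alone for 609 catchments and 3 free parameters.
print("BIC example:", round(bic(-100.0, 3, 609), 2))
```

Because the penalty term increases with every added cluster, the BIC stops improving once extra clusters no longer raise the likelihood enough, which is the behavior described in the text as minimal changes beyond five clusters.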

Table 4 presents the average SHAP values of the six primary factors for every season and cluster, and Figure 9 displays the spatial distribution of the clustering results across catchments. Features with larger absolute SHAP values exert a more pronounced influence on SSM. Together, Table 4 and Figure 9 help in understanding the spatial distribution of each catchment type and in identifying the factors that exert the most substantial influence on SSM in each catchment.
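A table of cluster-average SHAP values, and the per-cluster factor ordering derived from it, can be built with a simple group-and-average step. The sketch below assumes hypothetical records of the form (cluster label, SHAP values per feature); none of the numbers come from Table 4.

```python
from collections import defaultdict

# Hypothetical (cluster label, SHAP values) records for four catchments.
records = [
    (1, {"precipitation": 0.04, "distance_from_water_bodies": -0.01}),
    (1, {"precipitation": 0.02, "distance_from_water_bodies": -0.03}),
    (2, {"precipitation": 0.01, "distance_from_water_bodies": -0.06}),
    (2, {"precipitation": 0.01, "distance_from_water_bodies": -0.04}),
]


def cluster_mean_shap(rows):
    """Average the SHAP value of every feature within each cluster."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for cluster, shap_values in rows:
        counts[cluster] += 1
        for feature, value in shap_values.items():
            sums[cluster][feature] += value
    return {c: {f: s / counts[c] for f, s in feats.items()} for c, feats in sums.items()}


def rank_by_magnitude(mean_shap):
    """Order features by the absolute value of their cluster-average SHAP."""
    return sorted(mean_shap, key=lambda f: abs(mean_shap[f]), reverse=True)


table = cluster_mean_shap(records)
for cluster, means in sorted(table.items()):
    print(cluster, rank_by_magnitude(means), {f: round(v, 3) for f, v in means.items()})
```

Ranking by absolute value keeps the sign available for interpretation, so a factor can lead the ordering in one cluster with a negative average contribution and rank lower in another with a positive one.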

The clustering results for winter are shown in Figure 9a,e. The first catchment type is located mainly in northwestern and western Iran. In these catchments, the factors that most significantly affect SSM are, in respective order, precipitation, clay fraction, solar radiation, potential evapotranspiration, distance from water bodies, and elevation, as shown in Figure 9a,e and Table 4. Ref. [132] investigated the spatial and temporal variations in SSM with respect to topographic and meteorological factors in Ardabil province, which falls into the first catchment category of this study. Their research underscored a significant correlation between SSM and variables such as precipitation. The second catchment type is primarily found in central, eastern, and northeastern Iran. The most influential factors affecting SSM in these catchments are precipitation, distance from water bodies, solar radiation, clay fraction, potential evapotranspiration, and elevation. The third catchment category is predominantly located in southern and northern Iran, in proximity to the Persian Gulf and the Caspian Sea. Within this catchment type, the most pivotal factor affecting SSM is the distance from water bodies, with precipitation, clay fraction, solar radiation, elevation, and potential evapotranspiration following in significance. In the fourth catchment category, similar to the third catchment type, the primary factor influencing SSM is the distance from water bodies, albeit with an inverse effect. This is followed by solar radiation, precipitation, potential evapotranspiration, elevation, and clay fraction (Table 4).

The fifth catchment category is distributed in southeastern Iran. In order of influence, the most significant factors on SSM are precipitation, clay fraction, solar radiation, distance from water bodies, potential evapotranspiration, and elevation.

The clustering results for the spring season are shown in Figure 9b,f. In this season, vegetation and organic matter fraction have a greater impact on SSM than in winter. The first catchment type is located mainly in northern, northwestern, and western Iran. Within these catchments, the factors exerting the most substantial influence on SSM are, in order of impact, NDVI, organic matter fraction, clay fraction, precipitation, solar radiation, and distance from water bodies, as demonstrated in Figure 9 and expounded upon in Table 4.

The second catchment type is mainly distributed in southern, southwestern, and northern Iran. The most influential factors affecting SSM are distance from water bodies, NDVI, clay fraction, organic matter fraction, solar radiation, and precipitation. In the spring of 2020, Ref. [133] collected 394 surface soil samples in Golestan province located in northern Iran, a region categorized within the second catchment type of this study. The findings of their research demonstrated a strong correlation between NDVI and SSM. The third catchment category is primarily located in southern and central Iran. Within this catchment category, the most dominant factor affecting SSM is the NDVI, with organic matter fraction, precipitation, clay fraction, distance from water bodies, and solar radiation subsequently ranking in significance. In the fourth catchment category, the predominant factor influencing SSM is the distance from water bodies. This is followed by clay fraction, organic matter fraction, precipitation, solar radiation, and NDVI (Table 4). The fifth catchment category is distributed in southeastern and eastern Iran. The most significant factors influencing SSM are NDVI, clay fraction, precipitation, organic matter fraction, solar radiation, and distance from water bodies.

Figure 9c,g and Table 4 indicate the clustering results for the summer season. During this season, the influence of distance from water bodies and clay fraction on SSM is more pronounced compared to other seasons. The first catchment type is located in northern, northwestern, and northeastern Iran around the Caspian Sea. In these catchments, the most significant influences on SSM are associated with potential evapotranspiration, proximity to water bodies, clay fraction, NDVI, organic matter fraction, and elevation (Figure 9c,g and Table 4). The second catchment type is mainly located in southern and southeastern Iran around the Persian Gulf and the Gulf of Oman. In this category, distance from water bodies is the most important factor, followed by clay fraction, potential evapotranspiration, organic matter fraction, NDVI, and elevation. In the third and fourth catchment categories, the hierarchy of influencing factors closely mirrors that of the second catchment type, except that NDVI exerts a more significant influence than the organic matter fraction. The fifth catchment category is primarily located in southeastern and eastern Iran. The most influential factors are distance from water bodies, clay fraction, potential evapotranspiration, elevation, NDVI, and organic matter fraction (Table 4).

The clustering results for autumn are shown in Figure 9d,h and Table 4. As precipitation begins in Iran, it becomes the predominant factor affecting SSM in most classes. The first catchment category is primarily located in central and eastern Iran. Within these catchments, the factors exerting the most substantial influence on SSM are precipitation, distance from water bodies, potential evapotranspiration, elevation, solar radiation, and clay fraction, as demonstrated in Figure 9d,h and described in Table 4. The second catchment type is mainly distributed in southwestern and western Iran. The factors that most significantly affect SSM are precipitation, potential evapotranspiration, clay fraction, distance from water bodies, solar radiation, and elevation. The third catchment category is primarily located in southern (near the Persian Gulf) and northeastern Iran. In this catchment type, precipitation is the most influential factor on SSM, followed by the distance from water bodies, elevation, potential evapotranspiration, clay fraction, and solar radiation. In the fourth catchment category, the primary influencing factors on SSM are precipitation, clay fraction, potential evapotranspiration, distance from water bodies, solar radiation, and elevation (Table 4). Finally, the fifth catchment category is distributed in northern Iran near the Caspian Sea. The most significant factors on SSM are potential evapotranspiration, precipitation, distance from water bodies, solar radiation, clay fraction, and elevation.

4. Conclusions

In this study, we unraveled the impact of the spatial non-stationarity of the critical environmental factors influencing SSM. To this end, we have introduced a framework that combines the SHAP technique with a two-step clustering analysis to provide spatial interpretations for machine learning models such as RF. Given the limited availability of reliable in situ SSM data in Iran, we initially validated the global SSM datasets (SMAP, MERRA-2, and CFSv2) at a depth of 0–5 cm, against available in situ measurements. While overfitting, sensitivity to hyperparameters, and the presence of correlated covariates are common concerns in machine learning applications, we implemented several strategies to address these challenges. Aggregating the data at a seasonal scale helped reduce noise and the risk of overfitting. To further improve model robustness, we used k-fold cross-validation for hyperparameter tuning and evaluated RF performance using multiple statistical metrics (R², RMSE, MAE). Additionally, we carefully examined the predictor variables to minimize the impact of multicollinearity. These measures collectively enhanced the reliability and generalizability of our results. The main conclusions are as follows:

(1) Results of the validation analysis demonstrated that among the datasets, SMAP exhibited the highest median correlation and the lowest median RMSE compared to in situ stations. Hence, it is recommended for applications such as hydrological modeling, water resources management, and drought monitoring in Iran, where SSM data are scarce.

(2) Investigation of SSM across different land cover types in Iran revealed significant variations. Specifically, forests and bare/sparse vegetation regions exhibited the highest and lowest SSM with median values of 0.180 m³/m³ and 0.093 m³/m³, respectively. These findings highlight the importance of understanding the spatial distribution of SSM across different land cover types, which can have implications for various environmental and ecological processes.

(3) The results indicated that the RF model can produce SSM estimates with R² values of 0.89, 0.83, 0.70, and 0.75 for the winter, spring, summer, and autumn seasons, respectively. Corresponding RMSE values are 0.076, 0.081, 0.098, and 0.061 m³/m³, while MAE values are 0.058, 0.060, 0.076, and 0.047 m³/m³, demonstrating consistent performance across seasons. These findings highlight the importance of seasonal investigation of SSM. The lowest accuracy occurred in summer, likely because the shortage of SSM in the dry season reduces the prediction accuracy of machine learning models.

(4) The SHAP analysis and two-step cluster analysis indicated that SSM in winter and autumn is primarily influenced by climatic factors. In contrast, SSM in spring and summer is largely controlled by vegetation and soil characteristics. These findings highlight the dynamic nature of SSM and how it is influenced by different environmental factors across seasons. Understanding these seasonal variations is essential for effective SSM management and prediction in diverse regions.
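The k-fold cross-validation mentioned among the robustness measures above can be illustrated with a short index-splitting sketch. It assumes five folds and a fixed random seed, neither of which is stated in the source; only the number of catchments (609) comes from the study, and `k_fold_indices` is a hypothetical helper name.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, validation) index lists for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# 609 catchments as in the study; k = 5 is an assumption for illustration.
folds = list(k_fold_indices(609, 5))
for i, (train, validation) in enumerate(folds, start=1):
    print(f"fold {i}: train={len(train)}, validation={len(validation)}")
```

Each catchment appears in exactly one validation fold, so every hyperparameter setting is scored on data that was not used to fit the model in that fold.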

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs17142505/s1, Figure S1: Spatial distribution of factors considered in this study that impact the SSM; Figure S2: Correlation matrix of the factors influencing SSM in winter (a), spring (b), summer (c), and autumn (d). P: precipitation, PET: potential evapotranspiration, SR: solar radiation, WS: wind speed, NDVI: normalized difference vegetation index, GWTD: groundwater table depth, DWB: distance from water bodies, CF: clay fraction, OMF: organic matter fraction, E: elevation, and TRI: topography roughness index; Table S1: The type and percentages of various land covers in the 609 studied catchments. References [3,134,135,136,137,138] are cited in the supplementary materials.

Author Contributions

Conceptualization, Z.N. and R.M.; Methodology, E.P., M.S. and M.B.; Software, Z.N., E.P. and M.S.; Validation, Z.N., M.S. and M.T.; Formal analysis, Z.N. and M.B.; Investigation, R.M. and S.M.H.; Resources, Z.N., M.T., A.E.M. and M.H.E.; Data curation, E.P., M.B., A.E.M. and M.H.E.; Writing—original draft, E.P. and M.B.; Writing—review & editing, Z.N., M.S., M.T., R.M. and S.M.H.; Project administration, Z.N. and S.M.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors thank the Iran Meteorological Organization, which provided SSM and precipitation data for this paper. We also express our sincere gratitude to the editor and anonymous reviewers for their insightful comments, which significantly improved the quality of the manuscript.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Seneviratne, S.I.; Corti, T.; Davin, E.L.; Hirschi, M.; Jaeger, E.B.; Lehner, I.; Orlowsky, B.; Teuling, A.J. Investigating Soil Moisture–Climate Interactions in a Changing Climate: A Review. Earth-Sci. Rev. 2010, 99, 125–161.
Zhang, L.; Zeng, Y.; Zhuang, R.; Szabó, B.; Manfreda, S.; Han, Q.; Su, Z. In Situ Observation-Constrained Global Surface Soil Moisture Using Random Forest Model. Remote Sens. 2021, 13, 4893.
Han, Q.; Zeng, Y.; Zhang, L.; Wang, C.; Prikaziuk, E.; Niu, Z.; Su, B. Global Long Term Daily 1 Km Surface Soil Moisture Dataset with Physics Informed Machine Learning. Sci. Data 2023, 10, 101.
Sure, A.; Dikshit, O. Estimation of Root Zone Soil Moisture Using Passive Microwave Remote Sensing: A Case Study for Rice and Wheat Crops for Three States in the Indo-Gangetic Basin. J. Environ. Manag. 2019, 234, 75–89.
Yang, Q.; Fan, J.; Luo, Z. Response of Soil Moisture and Vegetation Growth to Precipitation under Different Land Uses in the Northern Loess Plateau, China. Catena 2024, 236, 107728.
Cosh, M.H.; Jackson, T.J.; Moran, S.; Bindlish, R. Temporal Persistence and Stability of Surface Soil Moisture in a Semi-Arid Watershed. Remote Sens. Environ. 2008, 112, 304–313.
Parizi, E.; Hosseini, S.M.; Ataie-Ashtiani, B.; Simmons, C.T. Normalized Difference Vegetation Index as the Dominant Predicting Factor of Groundwater Recharge in Phreatic Aquifers: Case Studies across Iran. Sci. Rep. 2020, 10, 17473.
D’Odorico, P.; Caylor, K.; Okin, G.S.; Scanlon, T.M. On Soil Moisture–Vegetation Feedbacks and Their Possible Effects on the Dynamics of Dryland Ecosystems. J. Geophys. Res. Biogeosci. 2007, 112, G04010.
Cai, Y.; Zheng, W.; Zhang, X.; Zhangzhong, L.; Xue, X. Research on Soil Moisture Prediction Model Based on Deep Learning. PLoS ONE 2019, 14, e0214508.
Lagos, M.; Serna, J.L.; Muñoz, J.F.; Suárez, F. Challenges in Determining Soil Moisture and Evaporation Fluxes Using Distributed Temperature Sensing Methods. J. Environ. Manag. 2020, 261, 110232.
Nikraftar, Z.; Mostafaie, A.; Sadegh, M.; Afkueieh, J.H.; Pradhan, B. Multi-Type Assessment of Global Droughts and Teleconnections. Weather Clim. Extrem. 2021, 34, 100402.
Gruber, A.; Dorigo, W.A.; Zwieback, S.; Xaver, A.; Wagner, W. Characterizing Coarse-Scale Representativeness of in Situ Soil Moisture Measurements from the International Soil Moisture Network. Vadose Zone J. 2013, 12, vzj2012-0170.
Everson, C.; Mengistu, M.; Vather, T. The Validation of the Variables (Evaporation and Soil Water) in Hydrometeorological Models: Phase II, Application of Cosmic Ray Probes for Soil Water Measurement. Water Res. Comm. Pretoria S. Afr. WRC Rep. 2017, 17, 4. Available online: https://www.wrc.org.za/wp-content/uploads/mdocs/2323-1-171.pdf (accessed on 17 July 2017).
Rasheed, M.W.; Tang, J.; Sarwar, A.; Shah, S.; Saddique, N.; Khan, M.U.; Imran Khan, M.; Nawaz, S.; Shamshiri, R.R.; Aziz, M. Soil Moisture Measuring Techniques and Factors Affecting the Moisture Dynamics: A Comprehensive Review. Sustainability 2022, 14, 11538.
Xu, L.; Chen, N.; Zhang, X.; Moradkhani, H.; Zhang, C.; Hu, C. In-Situ and Triple-Collocation Based Evaluations of Eight Global Root Zone Soil Moisture Products. Remote Sens. Environ. 2021, 254, 112248.
Dorigo, W.; Himmelbauer, I.; Aberer, D.; Schremmer, L.; Petrakovic, I.; Zappa, L.; Preimesberger, W.; Xaver, A.; Annor, F.; Ardö, J. The International Soil Moisture Network: Serving Earth System Science for over a Decade. Hydrol. Earth Syst. Sci. Discuss. 2021, 2021, 1–83.
Jamei, M.; Mousavi Baygi, M.; Oskouei, E.A.; Lopez-Baeza, E. Validation of the SMOS Level 1C Brightness Temperature and Level 2 Soil Moisture Data over the West and Southwest of Iran. Remote Sens. 2020, 12, 2819.
Guevara, M.; Taufer, M.; Vargas, R. Gap-Free Global Annual Soil Moisture: 15 Km Grids for 1991–2018. Earth Syst. Sci. Data Discuss. 2020, 2020, 1–65.
Xu, L.; Chen, N.; Chen, Z.; Zhang, C.; Yu, H. Spatiotemporal Forecasting in Earth System Science: Methods, Uncertainties, Predictability and Future Directions. Earth-Sci. Rev. 2021, 222, 103828.
Cho, E.; Choi, M. Regional Scale Spatio-Temporal Variability of Soil Moisture and Its Relationship with Meteorological Factors over the Korean Peninsula. J. Hydrol. 2014, 516, 317–329.
Fu, X.; Jiang, X.; Yu, Z.; Ding, Y.; Lü, H.; Zheng, D. Understanding the Key Factors That Influence Soil Moisture Estimation Using the Unscented Weighted Ensemble Kalman Filter. Agric. For. Meteorol. 2022, 313, 108745.
Wang, T.; Franz, T.E. Field Observations of Regional Controls of Soil Hydraulic Properties on Soil Moisture Spatial Variability in Different Climate Zones. Vadose Zone J. 2015, 14, vzj2015-02.
Perry, M.A.; Niemann, J.D. Analysis and Estimation of Soil Moisture at the Catchment Scale Using EOFs. J. Hydrol. 2007, 334, 388–404.
Jiang, Z.; Huete, A.R.; Didan, K.; Miura, T. Development of a Two-Band Enhanced Vegetation Index without a Blue Band. Remote Sens. Environ. 2008, 112, 3833–3845.
Meng, F.; Luo, M.; Sa, C.; Wang, M.; Bao, Y. Quantitative Assessment of the Effects of Climate, Vegetation, Soil and Groundwater on Soil Moisture Spatiotemporal Variability in the Mongolian Plateau. Sci. Total Environ. 2022, 809, 152198.
Jung, M.; Reichstein, M.; Ciais, P.; Seneviratne, S.I.; Sheffield, J.; Goulden, M.L.; Bonan, G.; Cescatti, A.; Chen, J.; De Jeu, R. Recent Decline in the Global Land Evapotranspiration Trend Due to Limited Moisture Supply. Nature 2010, 467, 951–954.
de Queiroz, M.G.; da Silva, T.G.F.; Zolnier, S.; Jardim, A.M.d.R.F.; de Souza, C.A.A.; Júnior, G.D.N.A.; de Morais, J.E.F.; de Souza, L.S.B. Spatial and Temporal Dynamics of Soil Moisture for Surfaces with a Change in Land Use in the Semi-Arid Region of Brazil. Catena 2020, 188, 104457.
Wang, Y.; Yang, J.; Chen, Y.; Fang, G.; Duan, W.; Li, Y.; De Maeyer, P. Quantifying the Effects of Climate and Vegetation on Soil Moisture in an Arid Area, China. Water 2019, 11, 767.
Grayson, R.B.; Western, A.W.; Chiew, F.H.; Blöschl, G. Preferred States in Spatial Soil Moisture Patterns: Local and Nonlocal Controls. Water Resour. Res. 1997, 33, 2897–2908.
Boueshagh, M.; Hasanlou, M. Estimating Water Level in the Urmia Lake Using Satellite Data: A Machine Learning Approach. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, 42, 219–226.
Ahmad, S.; Kalra, A.; Stephen, H. Estimating Soil Moisture Using Remote Sensing Data: A Machine Learning Approach. Adv. Water Resour. 2010, 33, 69–80.
Tyralis, H.; Papacharalampous, G.; Langousis, A. A Brief Review of Random Forests for Water Scientists and Practitioners and Their Recent History in Water Resources. Water 2019, 11, 910.
Carranza, C.; Nolet, C.; Pezij, M.; van der Ploeg, M. Root Zone Soil Moisture Estimation with Random Forest. J. Hydrol. 2021, 593, 125840.
Wang, S.; Peng, H.; Liang, S. Prediction of Estuarine Water Quality Using Interpretable Machine Learning Approach. J. Hydrol. 2022, 605, 127320.
Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 2.
Liu, Q.; Gui, D.; Zhang, L.; Niu, J.; Dai, H.; Wei, G.; Hu, B.X. Simulation of Regional Groundwater Levels in Arid Regions Using Interpretable Machine Learning Models. Sci. Total Environ. 2022, 831, 154902.
Nikraftar, Z.; Parizi, E.; Saber, M.; Hosseini, S.M.; Ataie-Ashtiani, B.; Simmons, C.T. Groundwater Sustainability Assessment in the Middle East Using GRACE/GRACE-FO Data. Hydrogeol. J. 2024, 32, 321–337.
Rahmani, A.; Golian, S.; Brocca, L. Multiyear Monitoring of Soil Moisture over Iran through Satellite and Reanalysis Soil Moisture Products. Int. J. Appl. Earth Obs. Geoinf. 2016, 48, 85–95.
IMO Iran Meteorological Organization. 2023. Available online: https://www.irimo.ir/index.php?newlang=eng (accessed on 17 July 2017).
Ashraf, S.; Nazemi, A.; AghaKouchak, A. Anthropogenic Drought Dominates Groundwater Depletion in Iran. Sci. Rep. 2021, 11, 9135.
Sivakumar, M.V.K.; Stefanski, R. Climate and Land Degradation—An Overview. In Climate and Land Degradation; Sivakumar, M.V.K., Ndiang’ui, N., Eds.; Environmental Science and Engineering; Springer: Berlin/Heidelberg, Germany, 2007; pp. 105–135. ISBN 978-3-540-72437-7.
Gheybi, F.; Paridad, P.; Faridani, F.; Farid, A.; Pizarro, A.; Fiorentino, M.; Manfreda, S. Soil Moisture Monitoring in Iran by Implementing Satellite Data into the Root-Zone SMAR Model. Hydrology 2019, 6, 44.
Saadatabadi, A.R.; Izadi, N.; Karakani, E.G.; Fattahi, E.; Shamsipour, A.A. Investigating Relationship between Soil Moisture, Hydro-Climatic Parameters, Vegetation, and Climate Change Impacts in a Semi-Arid Basin in Iran. Arab. J. Geosci. 2021, 14, 1796. [Google Scholar] [CrossRef]
IWRMC Iran Water Resources Management Company. 2023. Available online: https://www.wrm.ir/?l=EN (accessed on 17 July 2017).
ESA-WorldCover Worldwide Land Cover Mapping. 2020. Available online: https://esa-worldcover.org/en (accessed on 17 July 2017).
Hossein-Panahi, B.; Golestani, A.; Amani, K.; Hosseini, S.M.; Parizi, E. Suspended Sediment Yield Estimation Using Geomorphologic Instantaneous Unit Sedimentgraph: A Case Study from the Southern Caspian Sea Iran. Int. J. River Basin Manag. 2024, 1–13. [Google Scholar] [CrossRef]
Colliander, A.; Cosh, M.H.; Misra, S.; Jackson, T.J.; Crow, W.T.; Chan, S.; Bindlish, R.; Chae, C.; Collins, C.H.; Yueh, S.H. Validation and Scaling of Soil Moisture in a Semi-Arid Environment: SMAP Validation Experiment 2015 (SMAPVEX15). Remote Sens. Environ. 2017, 196, 101–112. [Google Scholar] [CrossRef]
Tian, J.; Zhang, Y. Comprehensive Validation of Seven Root Zone Soil Moisture Products at 1153 Ground Sites across China. Int. J. Digit. Earth 2023, 16, 4008–4022. [Google Scholar] [CrossRef]
Wu, X.; Lu, G.; Wu, Z.; He, H.; Scanlon, T.; Dorigo, W. Triple Collocation-Based Assessment of Satellite Soil Moisture Products with in Situ Measurements in China: Understanding the Error Sources. Remote Sens. 2020, 12, 2275. [Google Scholar] [CrossRef]
Kimball, J.; Jones, L.; Glassy, J.; Reichle, R. SMAP L4 Global Daily 9 Km Carbon Net Ecosystem Exchange, Version 2; NASA National Snow and Ice Data Center Distributed Active Archive Center (DAAC) Data Set; National Snow and Ice Data Center: Boulder, CO, USA, 2016. [Google Scholar] [CrossRef]
Reichle, R.H.; Liu, Q.; Koster, R.D.; Crow, W.T.; De Lannoy, G.J.; Kimball, J.S.; Ardizzone, J.V.; Bosch, D.; Colliander, A.; Cosh, M. Version 4 of the SMAP Level-4 Soil Moisture Algorithm and Data Product. J. Adv. Model. Earth Syst. 2019, 11, 3106–3130. [Google Scholar] [CrossRef]
Kimball, J.; Jones, L.; Endsley, A.; Kundig, T.; Reichle, R. SMAP L4 Global Daily 9 Km EASE-Grid Carbon Net Ecosystem Exchange, Version 4; National Snow and Ice Data Center: Boulder, CO, USA, 2018. [Google Scholar] [CrossRef]
Gelaro, R.; McCarty, W.; Suárez, M.J.; Todling, R.; Molod, A.; Takacs, L.; Randles, C.A.; Darmenov, A.; Bosilovich, M.G.; Reichle, R. The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2). J. Clim. 2017, 30, 5419–5454. [Google Scholar] [CrossRef] [PubMed]
Saha, S.; Tripp, P. CFSv2 Retrospective Forecasts; NOAA/NWS/NCEP Environmental Modeling Center Tech. Rep: College Park, Maryland, 2011. Available online: https://www.cpc.ncep.noaa.gov/products/CFSv2/CFSv2_body.html (accessed on 17 July 2017).
Reichle, R.H.; Draper, C.S.; Liu, Q.; Girotto, M.; Mahanama, S.P.; Koster, R.D.; De Lannoy, G.J. Assessment of MERRA-2 Land Surface Hydrology Estimates. J. Clim. 2017, 30, 2937–2960. [Google Scholar] [CrossRef]
Saha, S.; Moorthi, S.; Wu, X.; Wang, J.; Nadiga, S.; Tripp, P.; Behringer, D.; Hou, Y.-T.; Chuang, H.; Iredell, M. The NCEP Climate Forecast System Version 2. J. Clim. 2014, 27, 2185–2208. [Google Scholar] [CrossRef]
Dirmeyer, P.A.; Halder, S. Sensitivity of Numerical Weather Forecasts to Initial Soil Moisture Variations in CFSv2. Weather Forecast. 2016, 31, 1973–1983. [Google Scholar] [CrossRef]
Patel, N.; Anapashsha, R.; Kumar, S.; Saha, S.; Dadhwal, V. Assessing Potential of MODIS Derived Temperature/Vegetation Condition Index (TVDI) to Infer Soil Moisture Status. Int. J. Remote Sens. 2009, 30, 23–39. [Google Scholar] [CrossRef]
Zhao, W.; Sánchez, N.; Lu, H.; Li, A. A Spatial Downscaling Approach for the SMAP Passive Surface Soil Moisture Product Using Random Forest Regression. J. Hydrol. 2018, 563, 1009–1024. [Google Scholar] [CrossRef]
Du Toit, W. Radial Basis Function Interpolation. 2008. Available online: https://scholar.sun.ac.za/handle/10019.1/2002 (accessed on 17 July 2017).
Running, S.; Mu, Q.; Zhao, M. Mod16a2 Modis/Terra Net Evapotranspiration 8-Day L4 Global 500m Sin Grid V006. 2017. Available online: https://ladsweb.modaps.eosdis.nasa.gov/missions-and-measurements/products/MOD16A2 (accessed on 17 July 2017).
Muñoz-Sabater, J.; Dutra, E.; Agustí-Panareda, A.; Albergel, C.; Arduini, G.; Balsamo, G.; Boussetta, S.; Choulga, M.; Harrigan, S.; Hersbach, H. ERA5-Land: A State-of-the-Art Global Reanalysis Dataset for Land Applications. Earth Syst. Sci. Data 2021, 13, 4349–4383. [Google Scholar] [CrossRef]
Hossein-Panahi, B.; Samani, S.M.; Sadeghi, A.-R.; Shahi, M.; Hosseini, S.M.; Parizi, E. River Baseflow in Supplying Reservoirs Inflows of Tehran Metropolis: A Machine Learning Modeling Based on Influencing Factors. J. Hydrol. Reg. Stud. 2025, 60, 102528. [Google Scholar] [CrossRef]
European Space Agency (ESA). Google. Harmonized Sentinel-2 MSI: MultiSpectral Instrument, Level-2A (Surface Reflectance) [COPERNICUS/S2_SR_HARMONIZED]. 2023. Available online: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR_HARMONIZED (accessed on 14 September 2024).
National Oceanic and Atmospheric Administration Ocean and Coasts 2024. Available online: https://www.noaa.gov/ocean-coasts (accessed on 17 July 2017).
Hengl, T.; Mendes De Jesus, J.; Heuvelink, G.B.M.; Ruiperez Gonzalez, M.; Kilibarda, M.; Blagotić, A.; Shangguan, W.; Wright, M.N.; Geng, X.; Bauer-Marschallinger, B.; et al. SoilGrids250m: Global Gridded Soil Information Based on Machine Learning. PLoS ONE 2017, 12, e0169748. [Google Scholar] [CrossRef] [PubMed]
Tadono, T.; Nagai, H.; Ishida, H.; Oda, F.; Naito, S.; Minakawa, K.; Iwamoto, H. Generation of the 30 M-Mesh Global Digital Surface Model by ALOS PRISM. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 41, 157–162. [Google Scholar] [CrossRef]
ESRI Spatial Analysis. 2013. Available online: https://www.esri.com/en-us/home (accessed on 17 July 2017).
Amani, M.; Ghorbanian, A.; Ahmadi, S.A.; Kakooei, M.; Moghimi, A.; Mirmazloumi, S.M.; Moghaddam, S.H.A.; Mahdavi, S.; Ghahremanloo, M.; Parsian, S. Google Earth Engine Cloud Computing Platform for Remote Sensing Big Data Applications: A Comprehensive Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5326–5350. [Google Scholar] [CrossRef]
Tang, G.; Clark, M.P.; Papalexiou, S.M.; Ma, Z.; Hong, Y. Have Satellite Precipitation Products Improved over Last Two Decades? A Comprehensive Comparison of GPM IMERG with Nine Satellite and Reanalysis Datasets. Remote Sens. Environ. 2020, 240, 111697. [Google Scholar] [CrossRef]
Saemian, P.; Hosseini-Moghari, S.-M.; Fatehi, I.; Shoarinezhad, V.; Modiri, E.; Tourian, M.J.; Tang, Q.; Nowak, W.; Bárdossy, A.; Sneeuw, N. Comprehensive Evaluation of Precipitation Datasets over Iran. J. Hydrol. 2021, 603, 127054. [Google Scholar] [CrossRef]
Fahrudin, T.; Wijaya, D.R.; Agung, A.A.G. COVID-19 Confirmed Case Correlation Analysis Based on Spearman and Kendall Correlation. In Proceedings of the 2020 International Conference on Data Science and Its Applications (ICoDSA), Bandung, Indonesia, 5–6 August 2020; pp. 1–4. [Google Scholar] [CrossRef]
Parizi, E.; Khojeh, S.; Hosseini, S.M.; Moghadam, Y.J. Application of Unmanned Aerial Vehicle DEM in Flood Modeling and Comparison with Global DEMs: Case Study of Atrak River Basin, Iran. J. Environ. Manag. 2022, 317, 115492. [Google Scholar] [CrossRef] [PubMed]
Gupta, H.V.; Kling, H.; Yilmaz, K.K.; Martinez, G.F. Decomposition of the Mean Squared Error and NSE Performance Criteria: Implications for Improving Hydrological Modelling. J. Hydrol. 2009, 377, 80–91. [Google Scholar] [CrossRef]
Kling, H.; Fuchs, M.; Paulin, M. Runoff Conditions in the Upper Danube Basin under an Ensemble of Climate Change Scenarios. J. Hydrol. 2012, 424, 264–277. [Google Scholar] [CrossRef]
Naghibi, S.A.; Pourghasemi, H.R.; Dixon, B. GIS-Based Groundwater Potential Mapping Using Boosted Regression Tree, Classification and Regression Tree, and Random Forest Machine Learning Models in Iran. Environ. Monit. Assess. 2016, 188, 44. [Google Scholar] [CrossRef] [PubMed]
Breiman, L.; Cutler, A. State of the Art of Data Mining Using Random Forest. In Proceedings of the Salford Data Mining Conference, San Diego, CA, USA, 24–25 May 2012; pp. 24–25. [Google Scholar] [CrossRef]
Pouyan, S.; Pourghasemi, H.R.; Bordbar, M.; Rahmanian, S.; Clague, J.J. A Multi-Hazard Map-Based Flooding, Gully Erosion, Forest Fires, and Earthquakes in Iran. Sci. Rep. 2021, 11, 14889. [Google Scholar] [CrossRef] [PubMed]
Kaiser, M.; Günnemann, S.; Disse, M. Regional-Scale Prediction of Pluvial and Flash Flood Susceptible Areas Using Tree-Based Classifiers. J. Hydrol. 2022, 612, 128088. [Google Scholar] [CrossRef]
Amini, S.; Saber, M.; Rabiei-Dastjerdi, H.; Homayouni, S. Urban Land Use and Land Cover Change Analysis Using Random Forest Classification of Landsat Time Series. Remote Sens. 2022, 14, 2654. [Google Scholar] [CrossRef]
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
Buitinck, L.; Louppe, G.; Blondel, M.; Pedregosa, F.; Mueller, A.; Grisel, O.; Niculae, V.; Prettenhofer, P.; Gramfort, A.; Grobler, J. API Design for Machine Learning Software: Experiences from the Scikit-Learn Project. arXiv 2013, arXiv:1309.0238. [Google Scholar]
Aksu, G.; Güzeller, C.O.; Eser, M.T. The Effect of the Normalization Method Used in Different Sample Sizes on the Success of Artificial Neural Network Model. Int. J. Assess. Tools Educ. 2019, 6, 170–192. [Google Scholar] [CrossRef]
Van den Broeck, G.; Lykov, A.; Schleich, M.; Suciu, D. On the Tractability of SHAP Explanations. J. Artif. Intell. Res. 2022, 74, 851–886. [Google Scholar] [CrossRef]
Başağaoğlu, H.; Chakraborty, D.; Lago, C.D.; Gutierrez, L.; Şahinli, M.A.; Giacomoni, M.; Furl, C.; Mirchi, A.; Moriasi, D.; Şengör, S.S. A Review on Interpretable and Explainable Artificial Intelligence in Hydroclimatic Applications. Water 2022, 14, 1230. [Google Scholar] [CrossRef]
Zhang, B.; Salem, F.K.A.; Hayes, M.J.; Smith, K.H.; Tadesse, T.; Wardlow, B.D. Explainable Machine Learning for the Prediction and Assessment of Complex Drought Impacts. Sci. Total Environ. 2023, 898, 165509. [Google Scholar] [CrossRef] [PubMed]
Kanani-Sadat, Y.; Safari, A.; Nasseri, M.; Homayouni, S. A Novel Explainable PSO-XGBoost Model for Regional Flood Frequency Analysis at a National Scale: Exploring Spatial Heterogeneity in Flood Drivers. J. Hydrol. 2024, 638, 131493. [Google Scholar] [CrossRef]
Ye, S.; Chai, Y.; Li, J.; Wang, J.; Deng, X.; Ran, Q. Explainable Transfer Learning for Subsurface Soil Moisture Prediction. J. Hydrol. 2025, 661, 133473. [Google Scholar] [CrossRef]
Hu, L.; Wang, K. Computing SHAP Efficiently Using Model Structure Information. arXiv 2023. [Google Scholar] [CrossRef]
Yang, J. Fast TreeSHAP: Accelerating SHAP Value Computation for Trees. arXiv 2021. [Google Scholar] [CrossRef]
Hiabu, M.; Meyer, J.T.; Wright, M.N. Unifying Local and Global Model Explanations by Functional Decomposition of Low Dimensional Structures. arXiv 2022. [Google Scholar] [CrossRef]
Chiu, T.; Fang, D.; Chen, J.; Wang, Y.; Jeris, C. A Robust and Scalable Clustering Algorithm for Mixed Type Attributes in Large Database Environment. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Minin, San Francisco, CA, USA, 26–29 August 2001; pp. 263–268. [Google Scholar] [CrossRef]
Wu, X.; Benjamin Zhan, F.; Zhang, K.; Deng, Q. Application of a Two-Step Cluster Analysis and the Apriori Algorithm to Classify the Deformation States of Two Typical Colluvial Landslides in the Three Gorges, China. Environ. Earth Sci. 2016, 75, 146. [Google Scholar] [CrossRef]
Fahy, B.; Brenneman, E.; Chang, H.; Shandas, V. Spatial Analysis of Urban Flooding and Extreme Heat Hazard Potential in Portland, OR. Int. J. Disaster Risk Reduct. 2019, 39, 101117. [Google Scholar] [CrossRef]
Qin, H.; Huang, Q.; Zhang, Z.; Lu, Y.; Li, M.; Xu, L.; Chen, Z. Carbon Dioxide Emission Driving Factors Analysis and Policy Implications of Chinese Cities: Combining Geographically Weighted Regression with Two-Step Cluster. Sci. Total Environ. 2019, 684, 413–424. [Google Scholar] [CrossRef] [PubMed]
Satish, S.; Bharadhwaj, S. Information Search Behaviour among New Car Buyers: A Two-Step Cluster Analysis. IIMB Manag. Rev. 2010, 22, 5–15. [Google Scholar] [CrossRef]
Chen, F.; Crow, W.T.; Bindlish, R.; Colliander, A.; Burgin, M.S.; Asanuma, J.; Aida, K. Global-Scale Evaluation of SMAP, SMOS and ASCAT Soil Moisture Products Using Triple Collocation. Remote Sens. Environ. 2018, 214, 1–13. [Google Scholar] [CrossRef] [PubMed]
Maleki, K.H.; Vaezi, A.R.; Sarmadian, F.; Crow, W.T. Validation of Satellite-Based Soil Moisture Retrievals from SMAP with in Situ Observation in the Simineh-Zarrineh (Bokan) Catchment, NW of Iran. Eurasian J. Soil Sci. 2019, 8, 340–350. [Google Scholar] [CrossRef]
Saeedi, M.; Sharafati, A.; Tavakol, A. Evaluation of Gridded Soil Moisture Products over Varied Land Covers, Climates, and Soil Textures Using in Situ Measurements: A Case Study of Lake Urmia Basin. Theor. Appl. Climatol. 2021, 145, 1053–1074. [Google Scholar] [CrossRef]
Jamei, M.; Lopez-Baeza, E.; Asadi, E. Validation of SMAP Surface Soil Moisture Products over Iran. In Proceedings of the 44th COSPAR Sci. Assembly, Athens, Greece, 16–24 July 2022; Volume 44, p. 123. Available online: https://ui.adsabs.harvard.edu/abs/2022cosp...44..123J/abstract (accessed on 17 July 2017).
Amini, A.; Moghadam, M.K.; Kolahchi, A.A.; Raheli-Namin, M.; Ahmed, K.O. Evaluation of GLDAS Soil Moisture Product over Kermanshah Province, Iran. H2Open J. 2023, 6, 373–386. [Google Scholar] [CrossRef]
Fakharizadehshirazi, E.; Sabziparvar, A.A.; Sodoudi, S. Long-Term Spatiotemporal Variations in Satellite-Based Soil Moisture and Vegetation Indices over Iran. Environ. Earth Sci. 2019, 78, 342. [Google Scholar] [CrossRef]
Garcia-Estringana, P.; Latron, J.; Llorens, P.; Gallart, F. Spatial and Temporal Dynamics of Soil Moisture in a Mediterranean Mountain Area (Vallcebre, NE Spain). Ecohydrology 2013, 6, 741–753. [Google Scholar] [CrossRef]
Jin, Z.; Guo, L.; Lin, H.; Wang, Y.; Yu, Y.; Chu, G.; Zhang, J. Soil Moisture Response to Rainfall on the Chinese Loess Plateau after a Long-term Vegetation Rehabilitation. Hydrol. Process. 2018, 32, 1738–1754. [Google Scholar] [CrossRef]
Feng, T.; Shen, Y.; Wang, F.; Chen, Q.; Ji, K. Spatiotemporal Variability and Driving Factors of the Shallow Soil Moisture in North China during the Past 31 Years. J. Hydrol. 2023, 619, 129331. [Google Scholar] [CrossRef]
Zhou, Q.; Sun, Z.; Liu, X.; Wei, X.; Peng, Z.; Yue, C.; Luo, Y. Temporal Soil Moisture Variations in Different Vegetation Cover Types in Karst Areas of Southwest China: A Plot Scale Case Study. Water 2019, 11, 1423. [Google Scholar] [CrossRef]
Janani, N.; Kannan, B.; Nagarajan, K.; Thiyagarajan, G.; Duraisamy, M.R. Soil Moisture Mapping for Different Land-Use Patterns of Lower Bhavani River Basin Using Vegetative Index and Land Surface Temperature. Env. Dev Sustain 2023, 26, 4533–4549. [Google Scholar] [CrossRef]
Nikraftar, Z.; Parizi, E.; Hosseini, S.M.; Ataie-Ashtiani, B. Lake Urmia Restoration Success Story: A Natural Trend or a Planned Remedy? J. Great Lakes Res. 2021, 47, 955–969. [Google Scholar] [CrossRef]
Khosravi, K.; Panahi, M.; Golkarian, A.; Keesstra, S.D.; Saco, P.M.; Bui, D.T.; Lee, S. Convolutional Neural Network Approach for Spatial Prediction of Flood Hazard at National Scale of Iran. J. Hydrol. 2020, 591, 125552. [Google Scholar] [CrossRef]
Sadeghi, M.; Shearer, E.J.; Mosaffa, H.; Gorooh, V.A.; Naeini, M.R.; Hayatbini, N.; Katiraie-Boroujerdy, P.-S.; Analui, B.; Nguyen, P.; Sorooshian, S. Application of Remote Sensing Precipitation Data and the CONNECT Algorithm to Investigate Spatiotemporal Variations of Heavy Precipitation: Case Study of Major Floods across Iran (Spring 2019). J. Hydrol. 2021, 600, 126569. [Google Scholar] [CrossRef]
Adab, H.; Morbidelli, R.; Saltalippi, C.; Moradian, M.; Ghalhari, G.A.F. Machine Learning to Estimate Surface Soil Moisture from Remote Sensing Data. Water 2020, 12, 3223. [Google Scholar] [CrossRef]
Fathololoumi, S.; Vaezi, A.R.; Alavipanah, S.K.; Ghorbani, A.; Biswas, A. Comparison of Spectral and Spatial-Based Approaches for Mapping the Local Variation of Soil Moisture in a Semi-Arid Mountainous Area. Sci. Total Environ. 2020, 724, 138319. [Google Scholar] [CrossRef] [PubMed]
Choubin, B.; Darabi, H.; Rahmati, O.; Sajedi-Hosseini, F.; Kløve, B. River Suspended Sediment Modelling Using the CART Model: A Comparative Study of Machine Learning Techniques. Sci. Total Environ. 2018, 615, 272–281. [Google Scholar] [CrossRef] [PubMed]
Ekmekcioğlu, Ö.; Koc, K.; Özger, M.; Işık, Z. Exploring the Additional Value of Class Imbalance Distributions on Interpretable Flash Flood Susceptibility Prediction in the Black Warrior River Basin, Alabama, United States. J. Hydrol. 2022, 610, 127877. [Google Scholar] [CrossRef]
Tabari, H.; Talaee, P.H. Temporal Variability of Precipitation over Iran: 1966–2005. J. Hydrol. 2011, 396, 313–320. [Google Scholar] [CrossRef]
Javari, M. Trend and Homogeneity Analysis of Precipitation in Iran. Climate 2016, 4, 44. [Google Scholar] [CrossRef]
Feng, H.; Liu, Y. Combined Effects of Precipitation and Air Temperature on Soil Moisture in Different Land Covers in a Humid Basin. J. Hydrol. 2015, 531, 1129–1140. [Google Scholar] [CrossRef]
Rascón-Ramos, A.E.; Martínez-Salvador, M.; Sosa-Pérez, G.; Villarreal-Guerrero, F.; Pinedo-Alvarez, A.; Santellano-Estrada, E.; Corrales-Lerma, R. Soil Moisture Dynamics in Response to Precipitation and Thinning in a Semi-Dry Forest in Northern Mexico. Water 2021, 13, 105. [Google Scholar] [CrossRef]
Du, M.; Zhang, J.; Elmahdi, A.; Wang, Z.; Yang, Q.; Liu, H.; Liu, C.; Hu, Y.; Gu, N.; Bao, Z. Variation Characteristics and Influencing Factors of Soil Moisture Content in the Lime Concretion Black Soil Region in Northern Anhui. Water 2021, 13, 2251. [Google Scholar] [CrossRef]
Wenwu, Z.; Xuening, F.; Daryanto, S.; Zhang, X.; Yaping, W. Factors Influencing Soil Moisture in the Loess Plateau, China: A Review. Earth Environ. Sci. Trans. R. Soc. Edinb. 2018, 109, 501–509. [Google Scholar] [CrossRef]
Cai, J.; Zhou, B.; Chen, S.; Wang, X.; Yang, S.; Cheng, Z.; Wang, F.; Mei, X.; Wu, D. Spatial and Temporal Variability of Soil Moisture and Its Driving Factors in the Northern Agricultural Regions of China. Water 2024, 16, 556. [Google Scholar] [CrossRef]
Yin, D.; Song, X.; Zhu, X.; Guo, H.; Zhang, Y.; Zhang, Y. Spatiotemporal Analysis of Soil Moisture Variability and Its Driving Factor. Remote Sens. 2023, 15, 5768. [Google Scholar] [CrossRef]
Vachaud, G.; Passerat de Silans, A.; Balabanis, P.; Vauclin, M. Temporal Stability of Spatially Measured Soil Water Probability Density Function. Soil Sci. Soc. Am. J. 1985, 49, 822–828. [Google Scholar] [CrossRef]
Ojha, R.; Morbidelli, R.; Saltalippi, C.; Flammini, A.; Govindaraju, R.S. Scaling of Surface Soil Moisture over Heterogeneous Fields Subjected to a Single Rainfall Event. J. Hydrol. 2014, 516, 21–36. [Google Scholar] [CrossRef]
Li, Y.-X.; Leng, P.; Kasim, A.A.; Li, Z.-L. Spatiotemporal Variability and Dominant Driving Factors of Satellite Observed Global Soil Moisture from 2001 to 2020. J. Hydrol. 2025, 654, 132848. [Google Scholar] [CrossRef]
Zhu, P.; Jia, X.; Zhao, C.; Shao, M. Long-Term Soil Moisture Evolution and Its Driving Factors across China’s Agroecosystems. Agric. Water Manag. 2022, 269, 107735. [Google Scholar] [CrossRef]
Pellet, C.; Hauck, C. Monitoring Soil Moisture from Middle to High Elevation in Switzerland: Set-up and First Results from the SOMOMOUNT Network. Hydrol. Earth Syst. Sci. 2017, 21, 3199–3220. [Google Scholar] [CrossRef]
Xu, M.; Xu, G.; Cheng, Y.; Min, Z.; Li, P.; Zhao, B.; Shi, P.; Xiao, L. Soil Moisture Estimation and Its Influencing Factors Based on Temporal Stability on a Semiarid Sloped Forestland. Front. Earth Sci. 2021, 9, 629826. [Google Scholar] [CrossRef]
Vanderlinden, K.; Vereecken, H.; Hardelauf, H.; Herbst, M.; Martínez, G.; Cosh, M.H.; Pachepsky, Y.A. Temporal Stability of Soil Water Contents: A Review of Data and Analyses. Vadose Zone J. 2012, 11, vzj2011-0178. [Google Scholar] [CrossRef]
Ghasemloo, N.; Matkan, A.A.; Alimohammadi, A.; Aghighi, H.; Mirbagheri, B. Estimating the Agricultural Farm Soil Moisture Using Spectral Indices of Landsat 8, and Sentinel-1, and Artificial Neural Networks. J. Geovisualization Spat. Anal. 2022, 6, 19. [Google Scholar] [CrossRef]
Li, M.; Yan, Y. Comparative Analysis of Machine-Learning Models for Soil Moisture Estimation Using High-Resolution Remote-Sensing Data. Land 2024, 13, 1331. [Google Scholar] [CrossRef]
Majdar, H.A.; Vafakhah, M.; Sharifikia, M.; Ghorbani, A. Spatial and Temporal Variability of Soil Moisture in Relation with Topographic and Meteorological Factors in South of Ardabil Province, Iran. Environ. Monit. Assess. 2018, 190, 500. [Google Scholar] [CrossRef] [PubMed]
Bandak, S.; Boali, A.; Yaghobi, S.; Taghizadeh-Mehrjardi, R. Ensemble Machine Learning Approaches for Estimating Soil Texture Components in Loess Soils of Golestan Province. Earth Sci. Inform. 2025, 18, 396. [Google Scholar] [CrossRef]
Laity, J.J. Deserts and Desert Environments; John Wiley & Sons: Hoboken, NJ, USA, 2009; Volume 3, ISBN 1-4443-0074-1. Available online: https://www.wiley.com/en-us/Deserts+and+Desert+Environments-p-9781444300741 (accessed on 17 July 2017).
Parizi, E.; Hosseini, S.M.; Ataie-Ashtiani, B.; Simmons, C.T. Representative Pumping Wells Network to Estimate Groundwater Withdrawal from Aquifers: Lessons from a Developing Country, Iran. J. Hydrol. 2019, 578, 124090. [Google Scholar] [CrossRef]
Goward, S.N.; Markham, B.; Dye, D.G.; Dulaney, W.; Yang, J. Normalized Difference Vegetation Index Measurements from the Advanced Very High Resolution Radiometer. Remote Sens. Environ. 1991, 35, 257–277. [Google Scholar] [CrossRef]
Chen, X.; Hu, Q. Groundwater Influences on Soil Moisture and Surface Evaporation. J. Hydrol. 2004, 297, 285–300. [Google Scholar] [CrossRef]
Fan, Y.; Li, H.; Miguez-Macho, G. Global Patterns of Groundwater Table Depth. Science 2013, 339, 940–943. [Google Scholar] [CrossRef] [PubMed]

Figure 1. The location of the study area and the agrometeorological stations with the climate classification and topography as the background.

Figure 2. Flowchart of the methodology used in this study.

Figure 3. The SSM evaluation based on in situ data and statistical metrics of τ (a), KGE (b), RMSE (c), and RBias (d) from 1 April 2015 to 31 March 2023. The asterisk symbol indicates outliers.

Figure 4. Spatial pattern of SSM in winter (a), spring (b), summer (c), and autumn (d) in Iran’s 609 studied catchments from April 2015 to March 2023.

Figure 5. Boxplot (a) and temporal pattern (b) of monthly SSM in various land covers across Iran from 1 April 2015 to 31 March 2023. The temporal pattern of precipitation is also shown in panel (b).

Figure 6. The scatterplot, trend line, R², RMSE, and MAE of the predicted and observed SSM in the testing phase: winter (a), spring (b), summer (c), and autumn (d).

Figure 7. SHAP beeswarm plots showing the effects of the influencing factors on SSM: winter (a), spring (b), summer (c), and autumn (d). Each data point represents a SHAP value for a feature. P: precipitation, PET: potential evapotranspiration, SR: solar radiation, WS: wind speed, NDVI: normalized difference vegetation index, GWTD: groundwater table depth, DWB: distance from water bodies, CF: clay fraction, OMF: organic matter fraction, E: elevation, and TRI: topography roughness index.

Figure 8. Waterfall plots of SHAP values for four major catchments in Iran. P: precipitation, PET: potential evapotranspiration, SR: solar radiation, WS: wind speed, GWTD: groundwater table depth, DWB: distance from water bodies, CF: clay fraction, E: elevation, and TRI: topography roughness index. (a–d) correspond to the Gorgan, Mahabad, Hamun, and Lut Desert catchments, respectively.

Figure 9. Catchment types obtained by clustering the SHAP values for winter (a,b), spring (c,d), summer (e,f), and autumn (g,h). P: precipitation, PET: potential evapotranspiration, SR: solar radiation, WS: wind speed, NDVI: normalized difference vegetation index, GWTD: groundwater table depth, DWB: distance from water bodies, CF: clay fraction, OMF: organic matter fraction, E: elevation, and TRI: topography roughness index. The SHAP values are multiplied by 1000 for visualization purposes.

Table 3. BIC values for different numbers of clusters across seasons. Boldface indicates the optimal BIC value.

Number of Clusters	BIC (Winter)	BIC (Spring)	BIC (Summer)	BIC (Autumn)
1	2607	2607	2607	2607
2	2173	1952	2012	2059
3	1959	1706	1753	1707
4	1803	1594	1579	1514
5	1549	1432	1320	1268
6	1546	1428	1330	1260
7	1549	1432	1345	1291
8	1573	1440	1384	1328
9	1602	1453	1425	1378
10	1638	1480	1467	1428
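
The BIC values in Table 3 can be inspected with a short script. The following is a minimal sketch, not the authors' procedure: it locates the cluster count with the lowest BIC in each season and compares the BIC drop from four to five clusters with the drop from five to six. The names `bic` and `summarize` are illustrative, and reading the table through diminishing BIC improvement is an assumption made here for illustration.

```python
# BIC values transcribed from Table 3; list position i holds the value for i + 1 clusters.
bic = {
    "Winter": [2607, 2173, 1959, 1803, 1549, 1546, 1549, 1573, 1602, 1638],
    "Spring": [2607, 1952, 1706, 1594, 1432, 1428, 1432, 1440, 1453, 1480],
    "Summer": [2607, 2012, 1753, 1579, 1320, 1330, 1345, 1384, 1425, 1467],
    "Autumn": [2607, 2059, 1707, 1514, 1268, 1260, 1291, 1328, 1378, 1428],
}


def summarize(values):
    """Return (k at minimum BIC, BIC drop from k=4 to k=5, BIC drop from k=5 to k=6)."""
    k_min = values.index(min(values)) + 1  # cluster counts start at 1
    drop_4_to_5 = values[3] - values[4]
    drop_5_to_6 = values[4] - values[5]
    return k_min, drop_4_to_5, drop_5_to_6


summary = {season: summarize(values) for season, values in bic.items()}

for season, (k_min, d45, d56) in summary.items():
    print(f"{season}: min BIC at k={k_min}, drop 4->5 = {d45}, drop 5->6 = {d56}")
```

With the tabulated values, moving from four to five clusters lowers the BIC by 162 to 259 units in every season, while moving from five to six changes it by at most 10 units (and raises it in summer). This is consistent with the five catchment categories reported in the abstract, even though the raw BIC minimum falls at six clusters in winter, spring, and autumn.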

Table 4. Clustering results of SHAP values for different seasons. P: precipitation, PET: potential evapotranspiration, SR: solar radiation, NDVI: normalized difference vegetation index, DWB: distance from water bodies, CF: clay fraction, OMF: organic matter fraction, and E: elevation. The SHAP values are multiplied by 1000.

Season	Class	P	DWB	SR	CF	PET	E
Winter	1	34.79	0.61	4.05	5.63	2.67	−0.21
	2	−36.45	−13.00	2.87	−0.59	−0.45	0.02
	3	3.19	19.96	−2.10	2.19	−0.81	1.42
	4	2.57	−14.18	−7.06	1.45	−1.73	−1.58
	5	−47.93	3.53	−6.71	−12.87	−2.06	1.11
		NDVI	DWB	CF	OMF	P	SR
Spring	1	37.69	1.15	5.63	8.79	4.71	1.82
	2	12.72	13.93	6.31	−2.80	−1.18	1.85
	3	−14.65	−3.57	3.69	−4.19	−4.17	−3.42
	4	−1.35	−8.68	4.55	−4.46	3.58	2.02
	5	−35.12	−1.56	−17.79	−3.20	−5.07	−1.97
		DWB	CF	PET	NDVI	OMF	E
Summer	1	6.92	4.72	18.13	1.77	1.26	0.08
	2	10.63	−8.47	1.32	−0.95	−1.14	0.25
	3	−12.85	3.59	−3.01	1.96	1.32	−0.57
	4	−13.40	−5.15	−2.50	−2.14	−1.64	−0.62
	5	8.47	3.83	−3.15	0.38	0.24	0.90
		P	DWB	PET	SR	E	CF
Autumn	1	−8.40	−5.55	−2.73	−1.03	−2.25	0.68
	2	12.68	−0.15	−4.29	0.07	0.04	0.47
	3	−8.54	3.59	−1.83	0.34	1.96	0.54
	4	−20.97	−3.00	−3.21	−2.15	0.55	−3.35
	5	11.52	3.92	13.74	4.62	0.26	0.49
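
One way to summarize Table 4 is to pick, for each class in each season, the factor with the largest absolute SHAP value. The sketch below does this with the tabulated values (already multiplied by 1000). The names `table4` and `dominant_features` are illustrative, and using the largest absolute value as a proxy for the dominant control is a simplifying assumption, not a rule stated in the source.

```python
# Values transcribed from Table 4 (SHAP values x 1000); each season has its own column order.
table4 = {
    "Winter": (["P", "DWB", "SR", "CF", "PET", "E"], [
        [34.79, 0.61, 4.05, 5.63, 2.67, -0.21],
        [-36.45, -13.00, 2.87, -0.59, -0.45, 0.02],
        [3.19, 19.96, -2.10, 2.19, -0.81, 1.42],
        [2.57, -14.18, -7.06, 1.45, -1.73, -1.58],
        [-47.93, 3.53, -6.71, -12.87, -2.06, 1.11],
    ]),
    "Spring": (["NDVI", "DWB", "CF", "OMF", "P", "SR"], [
        [37.69, 1.15, 5.63, 8.79, 4.71, 1.82],
        [12.72, 13.93, 6.31, -2.80, -1.18, 1.85],
        [-14.65, -3.57, 3.69, -4.19, -4.17, -3.42],
        [-1.35, -8.68, 4.55, -4.46, 3.58, 2.02],
        [-35.12, -1.56, -17.79, -3.20, -5.07, -1.97],
    ]),
    "Summer": (["DWB", "CF", "PET", "NDVI", "OMF", "E"], [
        [6.92, 4.72, 18.13, 1.77, 1.26, 0.08],
        [10.63, -8.47, 1.32, -0.95, -1.14, 0.25],
        [-12.85, 3.59, -3.01, 1.96, 1.32, -0.57],
        [-13.40, -5.15, -2.50, -2.14, -1.64, -0.62],
        [8.47, 3.83, -3.15, 0.38, 0.24, 0.90],
    ]),
    "Autumn": (["P", "DWB", "PET", "SR", "E", "CF"], [
        [-8.40, -5.55, -2.73, -1.03, -2.25, 0.68],
        [12.68, -0.15, -4.29, 0.07, 0.04, 0.47],
        [-8.54, 3.59, -1.83, 0.34, 1.96, 0.54],
        [-20.97, -3.00, -3.21, -2.15, 0.55, -3.35],
        [11.52, 3.92, 13.74, 4.62, 0.26, 0.49],
    ]),
}


def dominant_features(columns, rows):
    """For each class row, return the (factor, value) pair with the largest absolute value."""
    result = []
    for row in rows:
        idx = max(range(len(row)), key=lambda i: abs(row[i]))
        result.append((columns[idx], row[idx]))
    return result


dominant = {season: dominant_features(cols, rows) for season, (cols, rows) in table4.items()}

for season, pairs in dominant.items():
    for class_id, (factor, value) in enumerate(pairs, start=1):
        print(f"{season} class {class_id}: {factor} ({value:+.2f})")
```

The sign of each value indicates whether the factor raises or lowers the predicted SSM relative to the model baseline for that class, so the listing separates, for example, winter classes where precipitation contributes positively from those where it contributes negatively.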

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
