Integrating Sentinel-1 SAR and Machine Learning Models for Optimal Soil Moisture Sensor Placement at Catchment Scale

Xie, Yi; Cui, Guotao; Zheng, Kaifeng; Tang, Guoping

doi:10.3390/rs17132330

Carbon-Water Research Station in Karst Regions of Northern Guangdong, School of Geography and Planning, Sun Yat-Sen University, Guangzhou 510006, China

* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(13), 2330; https://doi.org/10.3390/rs17132330

Submission received: 30 April 2025 / Revised: 29 June 2025 / Accepted: 4 July 2025 / Published: 7 July 2025

(This article belongs to the Special Issue High-Resolution Soil Moisture Products for Hydrology, Agriculture, and Hazard Applications)

Abstract

Accurate calibration and validation of remote sensing soil moisture products critically depend on high-quality in situ measurements. However, effectively capturing representative soil moisture patterns across heterogeneous catchments using ground-based sensors remains a significant challenge. To address this, we propose a machine-learning-based framework for optimizing soil moisture sensor network deployment at the catchment scale. The framework was validated using Sentinel-1 SAR-derived soil moisture data within a humid catchment in southern China. Results show that a network of nine optimally placed sensors minimized prediction errors (RMSE: 7.20%), outperforming both sparser and denser configurations. The optimized sensor network achieved a 52.45% reduction in RMSE compared to random placement. Moreover, the optimal number of sensors varied with seasonal dynamics: the wet season required 11 sensors due to increased precipitation-induced spatial variability, whereas the dry season could be adequately monitored with only six sensors. The proposed optimization approach offers a cost-effective strategy for collecting reliable in situ data, which is essential for improving the accuracy and applicability of remote sensing products in catchment-scale soil moisture monitoring.

Keywords:

soil moisture; sensor network; machine learning; Sentinel-1; SAR data

1. Introduction

Soil moisture is a critical environmental variable that plays a pivotal role in hydrological cycles [1], agricultural productivity [2], and climate regulation [3]. As a key component of the Earth’s surface energy balance, it directly influences evapotranspiration processes [4], vegetation dynamics [5], and groundwater recharge [6], while also serving as a sensitive indicator of extreme weather events such as droughts and floods [7]. Accurate soil moisture monitoring is essential for optimizing irrigation strategies, improving crop yield predictions, and enhancing climate models [8]. However, at the catchment scale, soil moisture exhibits significant spatiotemporal heterogeneity, which arises from the combined effects of terrain, vegetation cover, and other environmental factors [9]. These factors interact in complex and often nonlinear ways, complicating efforts to capture representative soil moisture dynamics using limited in situ observations [10]. Understanding and characterizing such spatial heterogeneity is crucial for improving the accuracy and applicability of hydrological remote sensing [11].

Remote sensing facilitates large-scale and dynamic observation. However, the accuracy of some remote sensing data can be limited by relatively coarse spatial resolution [12]. Synthetic Aperture Radar (SAR) data from Sentinel-1, an active microwave mission, have attracted considerable attention for soil moisture retrieval due to their high spatial resolution (~10 m) and regular 12-day revisit interval [13,14]. Both brightness temperature and the backscattering coefficient are strongly influenced by soil dielectric properties, and soil moisture is the primary driver of variations in the dielectric constant [15]. Consequently, Sentinel-1’s C-band SAR offers detailed surface backscatter information, enabling the assessment of soil moisture dynamics at both field and catchment scales.

Although traditional in situ soil sampling through manual collection yields accurate measurements, its practical application is limited because it is labor-intensive, time-consuming, and unsuitable for continuous monitoring [16]. Large-scale observation experiments have been conducted globally over the past two decades. Among them, the Soil Moisture Experiments 2002–2005 (SMEX02–05), including the Georgia Soil Moisture Experiment (SMEX03), yielded extensive in situ datasets that were critical for validating remote sensing products and underscored the importance of spatial heterogeneity in hydrological processes [17]. In Australia, the National Airborne Field Experiments (NAFE) and the follow-up Soil Moisture Active Passive Experiments (SMAPEx) focused on bridging the spatial scale gap between point measurements and satellite observations. These campaigns substantially advanced the development of retrieval algorithms at intermediate to coarse resolutions [18,19]. In China, the Watershed Allied Telemetry Experimental Research (WATER) and its successor, the Heihe Watershed Allied Telemetry Experimental Research (Hi-WATER), established comprehensive observation networks integrating multi-scale, multi-sensor platforms. These initiatives provided benchmark datasets for remote sensing validation and contributed significantly to understanding land–atmosphere interactions in cold and arid regions [17,20].

More recently, wireless sensor networks (WSNs) have emerged as a promising approach to soil moisture monitoring, offering high temporal resolution, scalability, and the potential for real-time data acquisition across diverse environments [21]. At small catchment scales (0.1 to 80 km²), WSNs represent a significant advancement by utilizing spatially optimized low-power sensor nodes.
These systems retain the millimeter-level precision and minute-level temporal resolution characteristic of traditional in situ methods, while wireless communication technologies enable the real-time acquisition of spatially distributed data [22]. WSN technology provides a cost-effective and scalable solution for instrumenting large areas, enabling real-time data acquisition from spatially distributed sensors through a centralized communication infrastructure. For example, a WSN comprising over 300 sensors was deployed in a forested 1 km² headwater catchment in California’s southern Sierra Nevada to monitor variables such as snow depth, soil moisture, and solar radiation. This deployment demonstrated the system’s effectiveness in delivering real-time, catchment-scale hydrological monitoring at relatively low cost [23]. Similarly, the STH-net, established at the Schäfertal Hillslope site in Germany, offers high-resolution observations of key hydrological variables and soil properties. This network includes eight monitoring stations equipped with time-domain reflectometry probes, soil temperature sensors, and monitoring wells, complemented by a weather station recording various meteorological parameters. The spatial arrangement of sensors was strategically optimized based on prior soil mapping and soil moisture monitoring, allowing the network to effectively capture both lateral and vertical variability in soil properties and water dynamics along the hillslope [24]. Another example is the deployment of SoilNet in the 27-hectare forested Wüstebach catchment, where 150 end devices and 600 EC-5 soil water content sensors were installed. This network demonstrated the feasibility of near real-time soil moisture monitoring at both field and headwater catchment scales [25]. Moreover, the self-powered design and remote transmission capabilities of WSNs substantially reduce the costs associated with deployment and maintenance in complex terrains. 
Even with a limited number of nodes, WSNs can be deployed at representative sites to effectively capture the spatiotemporal heterogeneity of soil moisture, thereby enhancing monitoring efficiency and representativeness [26].

Optimal node deployment is crucial for maximizing the efficiency and reliability of WSNs. Random placement of sensor nodes often leads to uneven spatial distribution, resulting in coverage gaps, redundant nodes, and inefficient data transmission, all of which degrade network performance and accelerate energy depletion [27]. To address these limitations, dynamic deployment and scheduling algorithms have been adopted to improve coverage quality and enhance the representativeness of collected data. Liang proposed an adaptive Cauchy variant butterfly optimization algorithm for optimizing sensor deployment in soil moisture wireless sensor networks and developed a coverage model integrating node coverage and network quality of service. The algorithm enhanced global and local search capabilities through Cauchy variants and adaptive factors [28]. Xiao proposed a sensor target allocation model and applied a quantum clone elite genetic algorithm to optimize sensor selection in soil moisture wireless sensor networks, significantly improving convergence speed and placement efficiency for agricultural monitoring [29]. Similarly, Dursun combined an artificial neural network with a genetic algorithm to estimate soil moisture and optimize sensor locations in a solar-powered irrigation system, achieving a 32% reduction in daily energy and water use with only five sensors [30]. However, these algorithms were primarily designed for soil moisture monitoring in flat agricultural areas and have not been utilized at the catchment scale with complex geospatial features [31].

Machine learning-based clustering techniques offer significant advantages for optimizing WSN node deployment in complex catchment landscapes. Unlike traditional scheduling algorithms, these methods use unsupervised learning to analyze features, enabling the automatic detection of similarities and differences within data. By analyzing geospatial variables, they partition the monitoring region into distinct clusters, which supports strategic sensor deployment that emphasizes critical areas and reduces spatial redundancy. For example, a random forest algorithm utilized the concept of virtual stations to optimize sensor network deployment under future climate scenarios, outperforming traditional random or distance-based placement approaches [32]. Similarly, for snow depth monitoring, a machine learning approach has been proposed to enhance network design at the basin scale [33]. Nevertheless, there remains a need for integrated frameworks that leverage machine learning not only for efficient spatial optimization but also to strengthen the synergy between targeted in situ measurements and satellite remote sensing for catchment-scale soil moisture monitoring.

In this study, we propose a machine learning framework to identify an optimal sensor deployment strategy for catchment-scale soil moisture monitoring, one that minimizes the number of sensors required while maintaining reliable estimation accuracy. The framework uses machine learning models to guide sensor deployment, improving the efficiency of obtaining representative soil moisture data and enhancing the accuracy of large-scale monitoring. Its primary focus is a methodology for optimizing sensor placement before in situ data become available. Specifically, the study consists of the following steps: (1) retrieving soil moisture from Sentinel-1 imagery; (2) determining the optimal number and spatial arrangement of WSN sensors within the catchment; and (3) evaluating the performance of the optimized sensor placement by comparing it with randomized configurations.

2. Materials and Methods

To optimize the WSN configuration, we developed a sensor selection framework that integrates unsupervised learning with regression techniques (Figure 1). The methodology first retrieves soil moisture from Sentinel-1 imagery. Next, representative sites are identified through multi-scale Gaussian Mixture Model (GMM) clustering based on geospatial features. Finally, a Gaussian Process Regression (GPR) model is used to estimate the spatial distribution of soil moisture across the entire catchment. To identify the optimal sensor network configuration, model performance is evaluated under varying numbers and spatial distributions of stations by calculating prediction errors.

2.1. Study Area

The Shikenghe (SKH) Catchment, located in Yingde City, Guangdong Province, China, spans 24.22°N–24.30°N and 113.20°E–113.25°E (Figure 2), covering approximately 75.4 km² with a maximum elevation of 1174 m. The land cover distribution shown in Figure 2b is based on the China Land Cover Dataset (CLCD) [34]. The study area lies within the Shimentai (SMT) National Nature Reserve, a key ecological conservation zone in South China, and is rich in biodiversity and natural resources [35]. Positioned in the upper reaches of the Pearl River, the catchment features a dense hydrological network and abundant water resources. As one of the main sources of the Pearl River, it plays a vital role in regulating and sustaining regional water availability. Given its ecological significance, the region has long been a focal point of interdisciplinary research, particularly in the fields of ecology, hydrology, and climate change.

The terrain is predominantly mountainous and hilly, with higher elevations in the western and northern regions, while lowland plains are mainly concentrated in the central part of the watershed. In terms of land cover, forests are the dominant type, followed by shrublands and riparian vegetation. The forests are primarily located in the northern mountainous regions, where human disturbance is minimal and vegetation is dense and well preserved. Evergreen broadleaf forests dominate these areas [36]. The catchment’s soils are mainly red soils [37], which are loose, highly acidic, and well-drained, making them susceptible to moisture loss and erosion. Climatically, the region lies in a transitional zone between the southern and mid-subtropical regions and experiences a subtropical monsoon climate with distinct wet and dry seasons. The wet season, which extends from May to August, accounts for over 70% of the annual precipitation and is characterized by frequent and intense rainfall events [35]. In contrast, the dry season, spanning from October to March, receives significantly less rainfall, with November to January being the driest period, during which the average monthly precipitation often falls below 50 mm.

2.2. Input Data and Preprocessing

2.2.1. SAR-Based Soil Moisture Using Sentinel-1 Data

No ground-based soil moisture observation network has been deployed in our study area. Instead, we employed SAR-derived soil moisture saturation data. Soil moisture retrieval from SAR data is based on the sensitivity of radar backscatter to the dielectric properties of the soil, which are primarily influenced by its moisture content [38]. Upon interacting with the land surface, the incident radar signal may be absorbed, scattered, or reflected, and a portion of the backscattered energy is recorded by the sensor. Despite the complexity introduced by vegetation and surface heterogeneity, the soil dielectric constant can be isolated with appropriate modeling approaches to estimate surface soil moisture [39]. Sentinel-1, equipped with a C-band SAR sensor, provides dual-polarized (VV and VH) backscatter data at high spatial (10 m) and temporal (6–12 day) resolutions, with global coverage and free data access. These features make it particularly suitable for monitoring soil moisture dynamics under various land cover conditions. Recent applications range from high-precision, field-scale mapping that integrates Sentinel-1 SAR with UAV stereoscopy [40] to regional empirical modeling that combines Sentinel-1 and Sentinel-2 data for vegetation and soil moisture estimation in diverse environments such as Portugal’s Atlantic mountains [41]. Furthermore, the utility of Sentinel-1 for large-scale monitoring is evidenced by global 1 km soil moisture products derived from its dual-polarization data, which show strong agreement with in situ observations and support high-resolution hydrological and ecological applications [42].

Sentinel-1 SAR data were acquired from the Google Earth Engine (GEE) platform using the COPERNICUS/S1_GRD dataset (https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S1_GRD?hl=zh-cn, accessed on 1 February 2025). The dataset provides vertically transmitted and vertically received (VV) polarization data with a spatial resolution of 10 m. A total of 178 images spanning 2017 to 2024 were selected through temporal filtering, restricted to VV polarization to ensure data consistency. To reduce the coherent speckle noise inherent in Sentinel-1 SAR imagery, we applied a multi-temporal median compositing approach: for each month of the study period, all available Sentinel-1 images were collected and aggregated using a pixel-wise median.
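The monthly compositing step can be illustrated with a minimal NumPy sketch. It assumes that the VV backscatter images have already been exported as a co-registered array stack with one month label per image. The function name `monthly_median_composite`, the label format, and the toy values are illustrative assumptions, not the GEE workflow used in this study.

```python
import numpy as np

def monthly_median_composite(stack, months):
    """Pixel-wise median of all images that share a month label.

    stack  : array (n_images, rows, cols) of VV backscatter, NaN where masked
    months : sequence of n_images labels such as "2017-05" (assumed format)
    """
    months = np.asarray(months)
    composites = {}
    for month in np.unique(months):
        # The median is less sensitive to speckle-like outliers than the mean.
        composites[str(month)] = np.nanmedian(stack[months == month], axis=0)
    return composites

# Toy example: three May images (one with an outlier) and one June image.
stack = np.array([np.full((2, 2), v) for v in (-10.0, -12.0, -30.0, -9.0)])
months = ["2017-05", "2017-05", "2017-05", "2017-06"]
composites = monthly_median_composite(stack, months)
print(composites["2017-05"])  # every pixel holds the May median
```

In this toy case the outlying value of −30 does not shift the May composite, which is the property that motivates a median rather than a mean.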

For soil moisture retrieval, we employed a normalized backscatter model that quantifies relative changes in surface wetness conditions [43]. This model builds on the TU Wien Change Detection Model [44], which interprets temporal variations in radar backscatter primarily as changes in soil moisture, assuming that other surface properties such as geometry, surface roughness, and vegetation structure remain temporally stable [45]. The retrieved quantity is a dimensionless, normalized index derived from Sentinel-1 SAR backscatter time series, which serves as an indicator for comparing wetness levels across locations and time periods. The model is self-calibrated at the pixel level through statistical analysis of long-term backscatter time series to estimate parameters representing dry and wet reference conditions, as well as average contributions from vegetation and surface geometry [46]. For each observation, the backscatter value Δσ⁰(θ, t) is normalized to a reference incidence angle and linearly scaled between the dry and wet reference values to derive the relative soil water saturation, expressed as a percentage [47]. This approach leverages the sensitivity of VV-polarized SAR signals to dielectric changes in surface materials, enabling robust estimation of soil moisture dynamics while minimizing terrain-induced artifacts [48]. Vegetation effects are implicitly accounted for in the model by treating the vegetation contribution to the radar signal as a temporally stable component. As such, their influence is incorporated into the estimation of the dry and wet reference backscatter values through long-term statistical analysis [43]. The model has demonstrated reliable performance in extracting geophysical information from C-band backscatter measurements, as formulated in Equations (1) and (2).

SM(t) = \frac{\Delta\sigma^{0}(\theta, t)}{\sigma^{0}_{wet}(\theta) - \sigma^{0}_{dry}(\theta)}

(1)

S(\theta) = \sigma^{0}_{wet}(\theta) - \sigma^{0}_{dry}(\theta)

(2)

where SM(t) represents the soil moisture saturation at time t, expressed as a percentage (%). Δσ⁰(θ, t) denotes the temporal change in the normalized radar backscatter coefficient relative to dry reference conditions, serving as the primary indicator of soil moisture variation. θ represents the radar incidence angle, which significantly affects backscatter measurements. σ⁰_wet(θ) and σ⁰_dry(θ) refer to the normalized backscatter coefficients under saturated and dry soil conditions, respectively, representing the upper and lower bounds of the soil moisture response, and S(θ) in Equation (2) is the difference between these two reference values. The actual backscatter σ⁰(θ, t) at incidence angle θ and time t is normalized and scaled between these reference values, yielding relative surface soil moisture saturation (SSM) in percent.
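The scaling in Equations (1) and (2) can be sketched for a single pixel as follows. The sketch assumes that the backscatter values are already normalized to the reference incidence angle, that the dry and wet references are taken as the minimum and maximum of the pixel's time series, and that the result is multiplied by 100 and clipped to 0–100%. These choices, the function name `relative_soil_moisture`, and the sample values are illustrative assumptions rather than the calibration used in this study.

```python
import numpy as np

def relative_soil_moisture(sigma0, sigma0_dry, sigma0_wet):
    """Scale backscatter linearly between dry and wet references (percent)."""
    sensitivity = sigma0_wet - sigma0_dry      # S(theta), Equation (2)
    delta = sigma0 - sigma0_dry                # change relative to the dry reference
    saturation = 100.0 * delta / sensitivity   # Equation (1), scaled to percent (assumption)
    return np.clip(saturation, 0.0, 100.0)     # clipping is an assumption of this sketch

# Hypothetical backscatter time series (dB) for one pixel.
series = np.array([-15.0, -12.0, -9.0, -13.5])
sigma0_dry = series.min()   # assumed dry reference
sigma0_wet = series.max()   # assumed wet reference
saturation = relative_soil_moisture(series, sigma0_dry, sigma0_wet)
print(saturation)
```

A pixel whose dry and wet references coincide would produce a division by zero, so a practical implementation would need to mask such pixels first.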

2.2.2. Catchment Physiographic Data

We use elevation and canopy height data from two Earth observation datasets, namely the Shuttle Radar Topography Mission (SRTM) and the Global Ecosystem Dynamics Investigation (GEDI) Level 2A. The SRTM dataset (http://srtm.csi.cgiar.org, accessed on 1 February 2025) utilizes C-band radar interferometry to generate global digital elevation models (DEMs) at a spatial resolution of 30 m. The DEM data was imported using geospatial processing libraries to extract elevation information, record the corresponding spatial reference system and coordinate information, and further calculate slope and aspect. The GEDI L2A dataset (https://earthdata.nasa.gov/, accessed on 1 February 2025), collected by a lidar instrument onboard the International Space Station (ISS) since 2018, provides high-resolution canopy height measurements at a 25 m resolution. The Global Ecosystem Dynamics Investigation (GEDI) instrument comprises three lasers that generate eight ground transects, spaced approximately 600 m apart in the across-track direction [49]. Each transect consists of 25 m diameter footprints, sampled at 60 m intervals along the orbital path. Each waveform captures the vertical structure of intercepted surfaces, including canopy tops and the underlying ground. Specifically, the GEDI Level 2A product delivers footprint level (25 m) elevation measurements, including the elevation of the lowest mode, which represents the estimated ground surface elevation beneath each laser footprint [50]. This dataset offers several advantages for canopy analysis, including high spatial resolution, global coverage, and precise vertical structural information of vegetation. Representative sites were selected using multiple spatial data, including slope, aspect, and canopy height. To ensure consistency across variables, all datasets were resampled to a 30 m spatial resolution. Each representative site corresponds to a single 30 m grid cell, encapsulating local topographic and vegetation characteristics.
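As an illustration of how slope and aspect can be derived from a gridded DEM, the following NumPy sketch uses central finite differences. It assumes a north-up raster with square cells (row index increasing southward, column index increasing eastward) and reports aspect as the compass bearing of the downslope direction. The function name `slope_aspect` and these conventions are assumptions; GIS packages often use other schemes (for example Horn's method), so values may differ slightly from those used in this study.

```python
import numpy as np

def slope_aspect(dem, cell_size=30.0):
    """Return slope (degrees) and aspect (degrees clockwise from north)."""
    d_row, d_col = np.gradient(dem, cell_size)
    dz_dx = d_col      # elevation change toward the east
    dz_dy = -d_row     # elevation change toward the north (rows grow southward)
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Aspect is the bearing of the downslope vector (-dz_dx, -dz_dy).
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect

# Toy DEM rising by 30 m per 30 m cell toward the east.
dem_east = np.tile(np.arange(5) * 30.0, (5, 1))
slope_e, aspect_e = slope_aspect(dem_east)
print(slope_e[2, 2], aspect_e[2, 2])
```

On flat cells the aspect is undefined; this sketch returns 0 there, and a real workflow would typically flag such cells separately.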

2.3. Machine Learning Models

To construct the GPR model, we selected four independent variables: elevation, aspect, and slope, all derived from the DEM data, and canopy height, obtained from the GEDI L2A dataset. The dependent variable was satellite-derived soil moisture saturation. Representative training locations across the study area were identified using the GMM clustering algorithm based on the input features. Model evaluation followed a leave-one-station-out cross-validation (LOSO-CV) strategy, in which each representative site was iteratively withheld as the validation set and the remaining representative sites were used for training. All non-representative locations were used as an independent test set to evaluate the generalization capability of the model.
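The LOSO-CV loop described above can be sketched generically. The sketch takes any `fit_predict` callable, so a GPR model could be plugged in; here a trivial mean predictor stands in for it. The names `loso_cv` and `mean_predictor` and the toy data are hypothetical and serve only to show the hold-out logic.

```python
import numpy as np

def loso_cv(features, targets, fit_predict):
    """Leave-one-station-out errors (prediction minus observation)."""
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n = len(targets)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i   # train on all stations except station i
        prediction = fit_predict(features[keep], targets[keep], features[i:i + 1])
        errors[i] = float(np.ravel(prediction)[0]) - targets[i]
    return errors

def mean_predictor(x_train, y_train, x_test):
    """Stand-in model: predicts the training mean everywhere."""
    return np.full(len(x_test), y_train.mean())

# Three hypothetical stations with one feature each.
features = np.array([[100.0], [400.0], [900.0]])
targets = np.array([10.0, 20.0, 30.0])
errors = loso_cv(features, targets, mean_predictor)
print(errors)
```

Each station is predicted by a model that never saw it, so the spread of the returned errors indicates how well the model transfers between representative sites.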

2.3.1. GMM for Soil Moisture Sensors Clustering

The proposed method begins by applying a GMM to cluster soil moisture monitoring stations according to their representative geospatial characteristics, such as elevation, aspect, slope, and canopy height. This step aims to cluster geospatial features to delineate the spatial heterogeneity of soil distribution within the catchment, thereby optimizing the placement of monitoring stations to support efficient network design for comprehensive hydrological observation. Previous studies suggest that elevation, aspect, slope, and canopy height are key factors affecting catchment water distribution [51]. To characterize the catchment heterogeneity, we defined a four-dimensional feature space vector for the GMM, x = [elevation, aspect, slope, canopy height]. In this vector, elevation reflects terrain height, which influences drainage potential; aspect captures slope orientation, which affects solar radiation input and evapotranspiration; slope quantifies surface steepness, governing surface runoff and infiltration dynamics; and canopy height represents vegetation cover, modulating water retention through interception and transpiration processes.

The GMM enables the identification of spatially coherent clusters, where stations within a cluster exhibit minimal intra-cluster variability in soil moisture-influencing features [33]. The GMM assumes that the observed feature space arises from M latent Gaussian components, each corresponding to a cluster of stations sharing homogeneous environmental characteristics. Each component m is modeled as a multivariate normal distribution, as shown in Equation (3).

N(x \mid \mu_{m}, \Sigma_{m})

(3)

The probability density function of the GMM is expressed as Equation (4).

p(x) = \sum_{m=1}^{M} \pi_{m} N(x \mid \mu_{m}, \Sigma_{m})

(4)

where π_m is the weight of the m-th Gaussian component; μ_m defines the centroid of the m-th cluster in the feature space, representing the mean values within the cluster; and Σ_m captures the covariance structure among the four features, characterizing spatial interdependencies. The Bayesian Information Criterion (BIC) guides the selection of M by balancing model complexity and goodness of fit.
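To make Equation (4) and the BIC trade-off concrete, the following NumPy sketch evaluates a mixture density and computes the BIC from a given log-likelihood. It assumes full covariance matrices and the common convention BIC = −2 ln L + k ln n, where lower values are preferred. The function names and the example inputs are illustrative assumptions; fitting the mixture itself (for example by expectation maximization) is not shown.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x | mean, cov) for each row of x."""
    d = mean.shape[0]
    diff = x - mean
    solved = np.linalg.solve(cov, diff.T).T
    mahalanobis = np.sum(diff * solved, axis=1)
    log_norm = -0.5 * (d * np.log(2.0 * np.pi) + np.linalg.slogdet(cov)[1])
    return np.exp(log_norm - 0.5 * mahalanobis)

def gmm_density(x, weights, means, covs):
    """Mixture density p(x) of Equation (4)."""
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))

def gmm_n_parameters(n_components, n_features):
    """Free parameters of a full-covariance GMM: means, covariances, weights."""
    covariance_terms = n_features * (n_features + 1) // 2
    return n_components * (n_features + covariance_terms) + (n_components - 1)

def gmm_bic(log_likelihood, n_samples, n_components, n_features):
    """BIC = -2 ln L + k ln n (lower is better under this convention)."""
    k = gmm_n_parameters(n_components, n_features)
    return -2.0 * log_likelihood + k * np.log(n_samples)

# Two hypothetical components in a two-dimensional feature space.
points = np.array([[0.0, 0.0], [3.0, 3.0]])
density = gmm_density(points, [0.6, 0.4],
                      [np.zeros(2), np.full(2, 3.0)], [np.eye(2), np.eye(2)])
print(density, gmm_n_parameters(9, 4))
```

With four features, each additional component adds 15 free parameters under these assumptions, which is the complexity penalty that the BIC weighs against any gain in log-likelihood.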

2.3.2. Spatial Prediction Using GPR

The GPR model is a non-parametric Bayesian approach widely used for regression tasks, particularly when modeling complex and nonlinear relationships between a target variable and its covariates [52]. In the context of this study, GPR is applied to predict soil moisture based on geospatial features (elevation, aspect, slope, canopy height).

Technical details on GPR theory and implementation can be found in [53]. Formally, a Gaussian process (GP) defines a prior over functions, which can be converted into a posterior distribution after observing training data. A GP is completely specified by its mean function m(x) and covariance function k(x, x’), formally expressed as Equation (5).

Y \sim \mathcal{GP}(m(x), k(x, x'))

(5)

where x denotes the input vector and k(x, x′) encodes assumptions about the function’s smoothness and periodicity. A common choice is the squared exponential kernel given in Equation (6). Its most notable property is infinite differentiability, which yields functions that are extremely smooth and continuous.

k(x, x') = \sigma^{2} \exp\left(-\frac{\|x - x'\|^{2}}{2l^{2}}\right)

(6)

where σ² is the variance of all representative samples, and l is the length scale of all samples.

GPR was chosen for soil moisture prediction due to its ability to model complex, nonlinear, and interactive relationships among geospatial features. Unlike traditional regression methods that rely on assumptions of linearity or additivity, GPR employs covariance kernels to capture intricate spatial dependencies [54]. Moreover, field-based soil moisture observations are typically sparse, resulting in limited training data. GPR is well-suited for such small data scenarios, as its Bayesian framework mitigates overfitting and supports robust prediction even with limited samples.
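A minimal NumPy sketch of GPR prediction with the squared exponential kernel is given below. It assumes a constant mean equal to the training average, fixed hyperparameters, and a small noise term added for numerical stability; in practice the hyperparameters would typically be estimated by maximizing the marginal likelihood. The function names, the station features, and the soil moisture values are hypothetical.

```python
import numpy as np

def sq_exp_kernel(a, b, variance, length_scale):
    """Squared exponential kernel between the rows of a and b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-sq_dist / (2.0 * length_scale ** 2))

def gpr_predict(x_train, y_train, x_test, variance=1.0, length_scale=1.0, noise=1e-6):
    """Posterior mean and variance of a GP with a constant mean (assumption)."""
    y_mean = y_train.mean()
    k_tt = sq_exp_kernel(x_train, x_train, variance, length_scale)
    k_st = sq_exp_kernel(x_test, x_train, variance, length_scale)
    k_ss = sq_exp_kernel(x_test, x_test, variance, length_scale)
    chol = np.linalg.cholesky(k_tt + noise * np.eye(len(x_train)))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mean = y_mean + k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    var = np.diag(k_ss) - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)

# Four hypothetical stations in a standardized four-feature space.
x_train = 2.0 * np.eye(4)
y_train = np.array([30.0, 45.0, 60.0, 25.0])   # soil moisture saturation (%)
fit_mean, fit_var = gpr_predict(x_train, y_train, x_train)
far_mean, far_var = gpr_predict(x_train, y_train, np.full((1, 4), 100.0))
print(fit_mean, far_mean, far_var)
```

At the training stations the posterior mean reproduces the observations closely, while far from every station the prediction falls back to the training average and the predictive variance returns to the prior variance. This behavior is one reason representative site selection matters for a model trained on few samples.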

2.3.3. Model Evaluation

The accuracy of the GPR model was evaluated by comparing the predicted soil moisture under each scenario with the SAR-based soil moisture. To ensure the robustness and reliability of the GPR model, we conducted leave-one-station-out cross-validation. In each iteration, one representative station was left out as the validation sample, while the remaining representative stations were used as the training set. Three metrics were employed to quantify model performance: Root Mean Squared Error (RMSE), bias, and mean absolute error (MAE) as shown in Equations (7)–(9).

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_{i} - y_{i})^{2}}

(7)

Bias = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_{i} - y_{i})

(8)

MAE = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_{i} - y_{i}|

(9)

where n is the total number of points, ŷ_i represents the GPR-predicted soil moisture at point i, and y_i is the SAR-based soil moisture.
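The three metrics can be computed as in the sketch below. Bias is taken here as the mean signed error, which is an assumption of this sketch chosen so that it is averaged over n in the same way as RMSE and MAE. The sample values are hypothetical.

```python
import numpy as np

def rmse(pred, obs):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def bias(pred, obs):
    """Mean signed error (positive values indicate overestimation)."""
    return float(np.mean(pred - obs))

def mae(pred, obs):
    """Mean absolute error."""
    return float(np.mean(np.abs(pred - obs)))

# Hypothetical predicted and SAR-based soil moisture saturation (%).
pred = np.array([30.0, 50.0, 70.0])
obs = np.array([33.0, 46.0, 70.0])
print(rmse(pred, obs), bias(pred, obs), mae(pred, obs))
```

Positive and negative errors partly cancel in the bias but not in the MAE or RMSE, which is why the three metrics are reported together.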

3. Results

3.1. SAR-Based Soil Moisture in the SKH Catchment

The spatial distribution of soil moisture in the SKH Catchment for the wet seasons (May to August) and dry seasons (October to March) from 2017 to 2023 was retrieved from Sentinel-1 SAR data (Figure 3). Soil moisture exhibits clear seasonal patterns: it ranges from 62.3% to 75.5% with an average of 67.1% during the wet season, whereas in the dry season it varies between 16.3% and 40.9%, with an average of 27.6%. Average soil moisture in the wet season is 142% higher than in the dry season. The standard deviation of soil moisture also varies seasonally, ranging from 20.1% to 21.8% with a mean of 20.9% during the wet season, and from 16.4% to 23.9%, averaging 19.0%, during the dry season. On average, soil moisture variability is therefore greater in the wet season, and the histogram of soil moisture shows a wider spread in the wet season than in the dry season.

This indicates that during the wet season, frequent and intense precipitation events lead to significant spatial and temporal variability in soil moisture inputs, thereby increasing soil moisture variability and resulting in a more heterogeneous distribution. In contrast, during the dry season, reduced precipitation and enhanced evapotranspiration result in generally lower soil moisture levels and diminished differences among regions, leading to a more concentrated and homogeneous distribution. These seasonal differences highlight the critical role of climatic conditions in regulating soil hydrological processes and spatial heterogeneity.

3.2. Model Performance and Optimal Placement of Sensors

To identify an optimal sensor deployment strategy that minimizes the number of sensors required while maintaining reliable estimation accuracy, the average soil moisture over the seven-year study period was first calculated and used to determine the optimal number of stations. Representative sites identified through GMM clustering were selected as training data to build the GPR model. The RMSE and bias were then evaluated for sensor deployments ranging from 5 to 20 sensors (Figure 4). To ensure effective model training, the number of training samples must exceed the feature dimensionality (4 dimensions); therefore, five sensors were selected as the lower limit for deployment. We also investigated the effect of sensor network density by evaluating configurations of up to 30 sensors. As detailed in Figure S1, improvements in model performance, measured by RMSE ratio, became marginal beyond a network size of 20 sensors. Balancing this finding against the substantial costs of deploying additional sensors, we adopted 20 sensors as the upper limit for this study. For practical applications in other catchments, this upper limit can be adjusted to suit different budgetary constraints and objectives.

RMSE reaches its minimum value of 7.20% when nine sensor sites are deployed (Table 1), indicating that this configuration allows the GPR model to capture the average soil moisture over the study period most accurately. The results indicate that simply increasing the number of stations does not continuously reduce the model’s RMSE. Adding more stations reduces the RMSE only until the optimal number of stations is reached. Once this optimal number is surpassed, further increases in sensor count do not significantly lower the RMSE. For example, when the number of sensors increased from seven to nine, the RMSE decreased from 9.09% to 7.20%. When the number of sensors increased from 10 to 20, there was no significant decrease in the RMSE. In fact, within this range the model performed worst with 15 sensors, where the RMSE reached 8.66%. When the number of sensors is small (fewer than 10), the model’s bias remains relatively stable and its predictive performance is consistent. However, when the number of sensors increases (between 10 and 20), the bias shows greater variation and predictive performance becomes less stable.

Our results indicate that when the optimal number of sensors has not been reached, the model’s predictive ability improves as the number of sensors increases. With too few sensors, the data provided to the model may not fully represent the underlying patterns of soil moisture distribution, so the model might miss important spatial variations or local changes in the environment, leading to poorer predictions. However, once the optimal number of sensors is reached, adding further sensors can reduce prediction accuracy because of data redundancy. When additional sensors are deployed in areas already covered by existing sensors, the new data may simply duplicate the information already provided and contribute little that is new. In such cases, the model becomes overly reliant on redundant data, which can lead to overfitting. Furthermore, interactions between sensors may also degrade model performance. In regions with high sensor density, the measurements from neighboring sensors may be highly correlated, and this correlation makes the model overly sensitive to small variations in the data, which may not reflect the true soil moisture distribution patterns. When the sensor network becomes too dense, the model may fail to capture the spatial heterogeneity of soil moisture distribution, leading to unstable and inaccurate predictions.

To further assess whether the number of deployed sensors leads to statistically significant differences in model performance, we calculated the residuals for each clustering configuration and performed pairwise paired t-tests across all scenarios. The resulting p-values were visualized as a heatmap (Figure 5), highlighting the significance of residual differences between sensor deployment strategies. The analysis showed that 82.50% of pairwise comparisons yielded statistically significant differences (p < 0.05), indicating that prediction accuracy is substantially influenced by the number of sensors deployed.
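The pairwise comparison can be sketched as follows. This is an illustrative example only: the residuals are synthetic, the function name `pairwise_paired_ttests` is hypothetical, and it assumes that residuals from every configuration are evaluated on the same set of pixels, which a paired test requires.

```python
import itertools

import numpy as np
from scipy.stats import ttest_rel


def pairwise_paired_ttests(residuals_by_config):
    """Return configuration keys and a symmetric matrix of paired t-test p-values."""
    keys = sorted(residuals_by_config)
    pvals = np.full((len(keys), len(keys)), np.nan)  # diagonal stays NaN
    for (i, a), (j, b) in itertools.combinations(enumerate(keys), 2):
        p = ttest_rel(residuals_by_config[a], residuals_by_config[b]).pvalue
        pvals[i, j] = pvals[j, i] = p
    return keys, pvals


rng = np.random.default_rng(1)
# Hypothetical residuals for three sensor counts on the same 200 pixels.
residuals = {k: rng.normal(loc=0.1 * k, scale=1.0, size=200) for k in (5, 9, 15)}

keys, pvals = pairwise_paired_ttests(residuals)
upper = pvals[np.triu_indices(len(keys), k=1)]
share_significant = float(np.mean(upper < 0.05))
```

The matrix `pvals` is what a heatmap would display. If all 16 configurations (5 to 20 sensors) are compared, there are 120 unique pairs, so a share of 82.50% would correspond to 99 significant pairs.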

To evaluate the generalization performance of the GPR model across spatial domains, we implemented a leave-one-station-out cross-validation (LOSO-CV) procedure based on the nine selected optimal stations. The observed values, predicted values, and corresponding prediction errors for each held-out station are summarized in Table S1 (Supplementary Material). The results show that the GPR model maintains reasonable predictive accuracy across most stations, with prediction errors ranging from −9.67 to 17.14. While some stations exhibit larger discrepancies, the overall performance demonstrates the robustness of the GPR-based approach.
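A LOSO-CV loop of this kind might look like the sketch below. The station data are synthetic, the function name `loso_cv` is hypothetical, the kernel choice is an assumption, and the sign convention (predicted minus observed) is assumed for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import LeaveOneOut


def loso_cv(features, targets):
    """Hold out each station once; return predicted-minus-observed errors."""
    errors = np.empty(len(targets))
    for train_idx, test_idx in LeaveOneOut().split(features):
        gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                                       normalize_y=True, random_state=0)
        gpr.fit(features[train_idx], targets[train_idx])
        errors[test_idx] = gpr.predict(features[test_idx]) - targets[test_idx]
    return errors


rng = np.random.default_rng(2)
# Hypothetical stand-in for nine stations with four features each.
station_features = rng.normal(size=(9, 4))
station_moisture = 50 + 8 * station_features[:, 0] + rng.normal(scale=1.0, size=9)

errors = loso_cv(station_features, station_moisture)
```

Each entry of `errors` corresponds to one held-out station, matching the per-station layout of a table such as Table S1.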

To determine the optimal placement of sensors in the study area and validate the effectiveness of the proposed site selection method, we compared four site selection scenarios (i.e., scenarios #1–4) and MAEs (Table 2). These four scenarios are based on the results presented in Figure 4, where selecting nine sites is considered the optimal number of sites for the catchment. This study also examines the impact of reducing the number of sites (7 sites) and increasing the number of sites (11 sites) on the spatial distribution of model-predicted soil moisture. Additionally, we compared the results of model training using nine sites selected by our proposed method with those using nine sites selected randomly.

As shown in Figure 3 and Figure 6a–c, before the optimal number of sites (nine sites) is reached, increasing the number of selected sites consistently enhances model prediction performance. The MAE for the model with seven sites is 9.09%, which is 27.54% higher than that of the optimal nine-site configuration. However, once the optimal number of sites is exceeded, further increasing the number of sites degrades performance. The MAE for the model with 11 sites increases to 6.53%, representing a 14.53% increase compared to that for the optimal nine-site scenario. In Figure 6b,d, the MAE of the nine sites selected by the random method is 8.69%, which is substantially higher than that obtained with the method proposed in this study. The performance of the proposed method is 52.45% better than that of random site selection.

Figure 6e–h illustrate the spatial distribution of the selected sensors. When the number of sensors is seven, their distribution is relatively sparse, making it difficult to capture the soil moisture distribution across the catchment. As shown in Figure 6i, the model exhibits larger biases in areas farther from the sensors. In contrast, when the number of sensors increases to 11, the distribution becomes redundant, with some sensors placed close to each other. Figure 6k shows that the model bias in this case is larger than that obtained with the optimal number of sites (nine sites). Figure 6h,k show that the randomly selected station locations fail to adequately capture the spatial distribution of soil moisture across the catchment. The site distribution is overly dense in some areas and overly sparse in others, and this uneven spatial coverage results in greater model bias.

Therefore, the site selection method proposed in this study effectively identifies the optimal number and placement of sensors. In the design of sensor networks, it is crucial to balance deployment costs with model accuracy. Although increasing the number of sensors may enhance model performance to some extent, an excessive number of stations can lead to data redundancy and ultimately reduce the efficiency of the monitoring network.

3.3. Seasonal Impacts on Optimal Site Selection

To assess the influence of dry and wet seasons on the optimal sensor network design for the study area, the average soil moisture during the dry and wet seasons over the seven-year study period was first calculated, and then model performance was evaluated across these different seasons.

As shown in Figure 7a,b, in most site selection scenarios, the soil moisture model trained on the selected sites performs relatively worse during the wet season. The mean RMSE of the soil moisture prediction model was 34.49% for the wet season and 31.03% for the dry season, meaning that the model’s predictive ability for the wet season was 11.15% lower than that for the dry season.
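The 11.15% figure is consistent with the relative difference between the two seasonal mean RMSE values when the dry season is taken as the reference. The formula below is our reading of how the percentage is obtained; the symbols are introduced here for illustration only.

```latex
\[
\frac{\mathrm{RMSE}_{\mathrm{wet}} - \mathrm{RMSE}_{\mathrm{dry}}}{\mathrm{RMSE}_{\mathrm{dry}}}
= \frac{34.49 - 31.03}{31.03}
= \frac{3.46}{31.03}
\approx 0.1115 = 11.15\%
\]
```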

To further examine seasonal differences in model performance, we analyzed the prediction variances estimated by the GPR models for both wet and dry seasons across the study period. For each year, we calculated the distribution of prediction variances and visualized them using boxplots (Supplementary Material, Figure S2). Results show that the prediction variance is higher during the wet season, suggesting greater model uncertainty under wetter conditions. This increased uncertainty is likely driven by higher spatial heterogeneity and more dynamic hydrological processes, such as rapid changes in precipitation and vegetation water uptake, which complicate soil moisture dynamics.
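The prediction variance of a GPR model can be obtained as sketched below. This example assumes scikit-learn's `GaussianProcessRegressor`; the data, noise levels, and the helper name `prediction_variance` are hypothetical and are not taken from this study.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def prediction_variance(train_x, train_y, grid_x):
    """Fit a GPR and return its predictive variance at every grid point."""
    gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                                   normalize_y=True, random_state=0)
    gpr.fit(train_x, train_y)
    _, std = gpr.predict(grid_x, return_std=True)
    return std ** 2


rng = np.random.default_rng(4)
grid = rng.normal(size=(300, 4))   # hypothetical catchment pixels
sites = grid[:9]                   # hypothetical sensor locations
signal = 50 + 8 * sites[:, 0]

# Synthetic seasonal targets; the two noise levels are arbitrary choices.
seasonal_targets = {
    "dry": signal + rng.normal(scale=1.0, size=9),
    "wet": signal + rng.normal(scale=4.0, size=9),
}
variances = {season: prediction_variance(sites, target, grid)
             for season, target in seasonal_targets.items()}
medians = {season: float(np.median(v)) for season, v in variances.items()}
```

The per-season arrays in `variances` are the kind of distributions that a boxplot comparison would summarize.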

The kernel density maps of all representative sites and the optimal number and spatial distribution of soil moisture sensors during different seasons are shown in Figure 8. Representative sites were obtained using the GMM algorithm from all sensor numbers (5–20). The optimal network sensor number was determined by the minimum RMSE. Results revealed distinct seasonal requirements, with the wet season necessitating 11 sites compared to only 6 sites during the dry season. This marked difference in sensor deployment between seasons likely stems from greater spatial heterogeneity in soil moisture patterns during periods of higher precipitation, requiring denser network coverage to achieve equivalent monitoring accuracy.

3.4. Physiographic Characteristics of Representative Sites

Figure 9 illustrates the distribution of four physiographic characteristics for all representative soil monitoring sites. The results indicate that these representative sites are primarily located at elevations between 200 and 400 m, suggesting that mid-altitude zones capture important landscape variability within the study area. The canopy height at these sites generally ranges from 15 to 30 m, indicating that areas with mature vegetation cover were frequently selected as representative sites. The slope is typically less than 30 degrees, implying that relatively gentle terrain was prioritized. The orientations of the representative sites are generally southeast-facing. These geospatial features are critical for accurately capturing the overall hydrological response and variability of this humid southern China catchment.

4. Discussion

4.1. Soil Moisture Monitoring Networks for Remote Sensing

Optimization algorithms for moisture monitoring networks play a critical role in enhancing the effectiveness of remote sensing applications by improving the spatial representativeness of in situ observations [55]. Since remote sensing products often suffer from limitations such as coarse resolution [56], sensor noise [57], and retrieval uncertainty [58], a well-designed ground-based sensor network can provide essential validation data and support downscaling efforts. By strategically selecting station locations based on environmental heterogeneity, topographic complexity, and hydrological variability, optimization algorithms ensure that the limited number of ground stations can maximally capture the spatial and temporal dynamics of soil moisture [58]. Integrating optimized monitoring networks with remote sensing enhances the accuracy of satellite-derived soil moisture estimates and strengthens their utility for hydrological modeling, drought assessment, and climate change studies [59]. Optimized sensor networks improve the temporal and spatial consistency of ground observations, which is crucial for capturing transient hydrological events and seasonal dynamics.

4.2. Seasonal Dynamics of Soil Moisture Heterogeneity

Our findings demonstrate that optimizing sensor network density according to wet and dry season variations holds critical significance for long-term accurate soil moisture monitoring in watersheds. As revealed in Figure 5 and Figure 6, the enhanced spatial heterogeneity during wet seasons necessitates denser sensor deployment to capture moisture variability patterns. These findings have practical implications for the deployment of mobile soil moisture sensors. Specifically, adaptive sensor placement strategies can be designed to target areas and periods with elevated predictive uncertainty. For example, during the wet season, mobile sensors can be prioritized for deployment in regions with high topographic complexity or land cover transitions, where soil moisture variability tends to be greater. Such targeted deployments can complement fixed sensor networks, enhance the spatial representativeness of observations, and improve the robustness of model-based estimations for hydrological applications. The observed seasonal dynamics in soil moisture spatial heterogeneity can be attributed to the nonlinear hydraulic response of soil systems under high-intensity rainfall events, as characterized by the Brooks–Corey model [60]. During wet seasons, short-duration, high-intensity precipitation induces rapid transitions between unsaturated and saturated flow regimes, leading to threshold-controlled changes in hydraulic conductivity. When rainfall intensity exceeds the soil’s infiltration capacity, the relative hydraulic conductivity follows a power law relationship with effective saturation [61]. This nonlinear dependence enhances spatial variability, as saturation tends to occur preferentially in areas with higher antecedent moisture content or coarser soil textures, while neighboring drier regions remain in low conductivity unsaturated states [62]. 
As a result, emergent patterns of hydrologic connectivity generate pronounced soil moisture gradients that may be inadequately captured by sparse sensor spacing.
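For reference, the power-law dependence of relative hydraulic conductivity on effective saturation in the Brooks–Corey model is commonly written as follows. The notation is introduced here and does not appear elsewhere in this paper: θ is the volumetric water content, θ_r and θ_s are the residual and saturated water contents, K_s is the saturated hydraulic conductivity, and λ is the pore-size distribution index.

```latex
\[
S_e = \frac{\theta - \theta_r}{\theta_s - \theta_r}, \qquad
K_r(S_e) = \frac{K(S_e)}{K_s} = S_e^{(2 + 3\lambda)/\lambda}
\]
```

As a purely illustrative case with an assumed λ = 0.5, the exponent equals 7, so raising S_e from 0.5 to 0.9 increases K_r from about 0.008 to about 0.48, roughly a 60-fold change. Sensitivity of this kind is one way in which modest differences in antecedent moisture could translate into large contrasts in conductivity between neighboring locations.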

Vegetation also plays a crucial role in the seasonal dynamics of soil moisture spatial heterogeneity, as it regulates both biological processes (such as root water uptake) and physical mechanisms (such as the spatial redistribution of throughfall) [63]. Under wet conditions, the absence of water stress enables plants to exhibit spatially heterogeneous root uptake patterns, thereby amplifying soil moisture variability [64]. Conversely, during drought periods, water extraction becomes localized to remaining moist zones, driving moisture distribution toward homogenization [64]. While this moisture-dependent variability pattern has been validated in agricultural and grassland ecosystems [65], forest systems exhibit more pronounced regulatory effects due to their deep root architectures and elevated transpiration demands [66], which exacerbate water redistribution heterogeneity [67]. Furthermore, canopy structural features (e.g., leaf drip points) amplify this spatial differentiation through focused water channeling [68]. Integration becomes especially crucial for forest ecosystems, where microscale water redistribution processes may exert asymmetric impacts on regional hydrological cycles through cumulative effects across spatial scales.

4.3. Scaling and Adaptation of Soil Moisture Sensor Network Optimization

While the proposed framework demonstrated effectiveness within a humid catchment, its broader applicability warrants further investigation. Future research should prioritize evaluating the scalability of soil moisture sensor networks from small experimental catchments to larger watershed systems. Larger basins often exhibit increased spatial heterogeneity in terms of topography, soil properties, land use, and anthropogenic influences. As a result, more sophisticated deployment strategies may be necessary to ensure adequate spatial representation and monitoring efficiency.

Furthermore, given that the present study was conducted in a subtropical monsoonal climate, assessing the adaptability of the sensor network design across varying climatic zones is critical to enhance its generalizability and robustness. In arid and semi-arid regions, for instance, high evapotranspiration rates and infrequent precipitation events result in more pronounced soil moisture variability in deeper soil layers. Previous studies have highlighted the importance of placing sensors at greater depths and maintaining long-term monitoring to capture episodic recharge events [69]. Additionally, the harsh environmental conditions in such regions necessitate the use of low-power and weather-resistant sensors [70]. Conversely, in humid and monsoonal environments, frequent precipitation events lead to rapid and dynamic fluctuations in near-surface soil moisture. High-density sensor networks with fine temporal resolution are therefore essential for capturing infiltration dynamics. Redundant sensor deployment can also help mitigate the risk of data loss due to sensor failure caused by waterlogging or biological interference [71]. Therefore, future efforts should focus on developing climate-sensitive and scalable sensor network frameworks. Key design parameters (such as sensor depth, spatial density, and temporal resolution) should be tailored to both watershed scale and prevailing climatic conditions to improve the applicability, adaptability, and resilience of soil moisture monitoring systems across diverse hydroclimatic contexts.

4.4. Limitations

While GMM is widely used for clustering spatial data due to its capacity to model multimodal distributions, its application in areas with highly heterogeneous or overlapping geographic attributes presents certain limitations [33]. In particular, the performance of GMM can be sensitive to the predefined number of mixture components, which may not fully capture the underlying spatial complexity if the data distribution deviates significantly from Gaussian assumptions or exhibits strong local variation. This may lead to suboptimal cluster delineation and reduced interpretability of the results. Moreover, when clusters have varying densities or irregular shapes, the standard GMM may oversimplify the spatial structure. Future work may benefit from exploring more flexible non-parametric models such as the Dirichlet Process Gaussian Mixture Model (DP-GMM), which can adaptively infer the number of components from the data and potentially offer improved robustness in complex spatial domains.
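One readily available approximation to a DP-GMM is scikit-learn's `BayesianGaussianMixture` with a Dirichlet process prior, sketched below on synthetic data. This is an illustration of the idea rather than a method applied in this study; the feature matrix and the 0.01 weight threshold are assumptions.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(3)
# Synthetic stand-in for standardized physiographic features: three clusters.
features = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(150, 4))
    for center in (-2.0, 0.0, 2.0)
])

dpgmm = BayesianGaussianMixture(
    n_components=20,  # upper bound only; unused components get near-zero weight
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    max_iter=500,
    random_state=0,
)
labels = dpgmm.fit_predict(features)

# Count components with non-negligible weight; the threshold is arbitrary.
n_effective = int(np.sum(dpgmm.weights_ > 0.01))
```

Unlike a standard GMM, `n_components` here is only an upper bound, and `n_effective` gives a rough indication of how many components the data support.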

One assumption in the change detection method used for soil moisture retrieval is that surface roughness and vegetation conditions remain constant during the inversion period. While this assumption simplifies the complex dynamics of land surface properties, its influence on the study’s outcomes is likely minor, as the primary objective is the optimal placement of soil moisture sensors, which is a preparatory step that occurs before long-term data collection begins. The soil moisture data used for model training and analysis were obtained from the same period, during which the overall surface conditions remained relatively stable. As a result, potential variations in surface roughness and vegetation are minimized, and their impact on the sensor placement optimization results is considered negligible. Nonetheless, future studies incorporating dynamic vegetation and surface roughness parameters may further improve the accuracy of soil moisture estimation and sensor network design.
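The core scaling step of a change detection retrieval can be sketched as follows. This is a simplified illustration and not the exact retrieval used in this study: it assumes that the driest and wettest references for each pixel can be approximated by the minimum and maximum backscatter in the time series, it omits incidence-angle normalization, and the function name `relative_saturation` is hypothetical.

```python
import numpy as np


def relative_saturation(sigma0_db, axis=0):
    """Scale backscatter (dB) between per-pixel dry and wet references, in percent."""
    dry = np.nanmin(sigma0_db, axis=axis, keepdims=True)
    wet = np.nanmax(sigma0_db, axis=axis, keepdims=True)
    # Pixels with no dynamic range yield NaN instead of a division by zero.
    span = np.where(wet > dry, wet - dry, np.nan)
    return 100.0 * (sigma0_db - dry) / span


# Hypothetical stack: 4 acquisitions x 2 pixels of backscatter in dB.
stack = np.array([
    [-14.0, -12.0],
    [-10.0, -11.0],
    [-12.0, -9.0],
    [-8.0, -12.0],
])
saturation = relative_saturation(stack)
```

Because the references are fixed per pixel, any change in backscatter is attributed entirely to soil moisture, which is where the assumption of constant surface roughness and vegetation enters.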

In this study, we did not use actual ground-based observations; instead, we employed SAR-derived soil moisture saturation data to demonstrate the proposed method. While it is possible to convert SAR-based saturation values to soil moisture content through site-specific calibration [72], the lack of in situ data in our study area prevents a direct validation of the soil moisture estimates. This study aims to establish a framework for optimizing sensor placement in ungauged basins; nevertheless, validating surface parameters against measured data remains essential. Future studies in regions with established ground sensor networks can quantitatively evaluate and enhance the proposed framework, supporting its robustness and applicability in broader contexts where such data exist.

5. Conclusions

In this study, we proposed a machine learning-based framework for optimizing soil moisture sensor network deployment at the catchment scale, validated using Sentinel-1 SAR-derived data in the SKH catchment. The main findings are as follows:

(1) Soil moisture exhibited marked seasonal variation from 2017 to 2023, with a wet season average of 67.1%, which is 142% higher than the dry season average of 27.6%, and greater variability during the wet season.

(2) The optimal sensor network configuration includes nine stations, yielding the lowest RMSE (7.20%) and outperforming both sparse and redundant deployments. Compared to random placement, our approach improves prediction accuracy by 52.45% through strategic site selection.

(3) Due to higher spatial heterogeneity in the wet season, model performance declines by 11.15% relative to the dry season. Accordingly, more sensors are required for optimal coverage in the wet season (11 sensors) than in the dry season (six sensors).

Overall, the proposed method supports the efficient acquisition of in situ data for calibrating and validating remote sensing products, enhancing their accuracy and utility in catchment-scale soil moisture monitoring.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs17132330/s1, Figure S1: RMSE ratio between consecutive sensor configurations (20–30 sensors); Figure S2: Boxplot comparison of GPR prediction variance between wet and dry seasons; Table S1: Results of leave-one-station-out cross-validation (LOSO-CV) for the GPR model.

Author Contributions

Conceptualization, Y.X. and G.C.; data curation, Y.X. and G.C.; formal analysis, Y.X. and G.C.; funding acquisition, G.C. and G.T.; investigation, Y.X., K.Z. and G.C.; methodology, Y.X.; supervision, G.C. and G.T.; validation, Y.X.; visualization, Y.X. and K.Z.; writing—original draft, Y.X. and G.C.; writing—review and editing, Y.X., K.Z., G.C. and G.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (Grant No. 42301012), the National Key R&D Program of China (2024YFD1700801-04), the Guangzhou Science and Technology Plan Project (2024A04J3814), the 100 Talents Plan Foundation of Sun Yat-sen University (37000-12230030), the Fundamental Research Funds for the Central Universities of Ministry of Education of China (24qnpy020), and the Guangdong science and technology plan project for “research and innovation platform” (#2024B1212040005).

Data Availability Statement

Data will be available on request from the authors.

Acknowledgments

We are grateful to Saswata Nandi for his valuable help in editing the manuscript. We are also very grateful to the three anonymous reviewers for their insightful comments and suggestions, which have substantially strengthened our work.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

WSNs	Wireless Sensor Networks
SAR	Synthetic Aperture Radar
GPR	Gaussian Process Regression
GMM	Gaussian Mixture Model
GP	Gaussian Process
RMSE	Root Mean Square Error
MAE	Mean Absolute Error

References

Rodriguez-Iturbe, I. Ecohydrology: A Hydrologic Perspective of Climate-Soil-Vegetation Dynamics. Water Resour. Res. 2000, 36, 3–9.
Champagne, C.; Berg, A.A.; McNairn, H.; Drewitt, G.; Huffman, T. Evaluation of Soil Moisture Extremes for Agricultural Productivity in the Canadian Prairies. Agric. For. Meteorol. 2012, 165, 1–11.
Quan, Q.; Liang, W.; Yan, D.; Lei, J. Influences of Joint Action of Natural and Social Factors on Atmospheric Process of Hydrological Cycle in Inner Mongolia, China. Urban Clim. 2022, 41, 101043.
Zhang, K.; Ali, A.; Antonarakis, A.; Moghaddam, M.; Saatchi, S.; Tabatabaeenejad, A.; Chen, R.; Jaruwatanadilok, S.; Cuenca, R.; Crow, W.T.; et al. The Sensitivity of North American Terrestrial Carbon Fluxes to Spatial and Temporal Variation in Soil Moisture: An Analysis Using Radar-Derived Estimates of Root-Zone Soil Moisture. J. Geophys. Res. Biogeosciences 2019, 124, 3208–3231.
Morbidelli, R.; Saltalippi, C.; Flammini, A.; Corradini, C.; Brocca, L.; Govindaraju, R.S. An Investigation of the Effects of Spatial Heterogeneity of Initial Soil Moisture Content on Surface Runoff Simulation at a Small Watershed Scale. J. Hydrol. 2016, 539, 589–598.
He, M.; Kimball, J.S.; Running, S.; Ballantyne, A.; Guan, K.; Huemmrich, F. Satellite Detection of Soil Moisture Related Water Stress Impacts on Ecosystem Productivity Using the MODIS-Based Photochemical Reflectance Index. Remote Sens. Environ. 2016, 186, 173–183.
Furtak, K.; Wolińska, A. The Impact of Extreme Weather Events as a Consequence of Climate Change on the Soil Moisture and on the Quality of the Soil Environment and Agriculture—A Review. Catena 2023, 231, 107378.
Liu, Y.; Yang, Y. Advances in the Quality of Global Soil Moisture Products: A Review. Remote Sens. 2022, 14, 3741.
Feng, Q.; Zhao, W.; Qiu, Y.; Zhao, M.; Zhong, L. Spatial Heterogeneity of Soil Moisture and the Scale Variability of Its Influencing Factors: A Case Study in the Loess Plateau of China. Water 2013, 5, 1226–1242.
Stark, J.R.; Fridley, J.D. Topographic Drivers of Soil Moisture Across a Large Sensor Network in the Southern Appalachian Mountains (USA). Water Resour. Res. 2023, 59, e2022WR034315.
Lakhankar, T.; Ghedira, H.; Temimi, M.; Azar, A.E.; Khanbilvardi, R. Effect of Land Cover Heterogeneity on Soil Moisture Retrieval Using Active Microwave Remote Sensing Data. Remote Sens. 2009, 1, 80–91.
Stanyer, C.; Seco-Rizo, I.; Atzberger, C.; Marti-Cardona, B. Soil Texture, Soil Moisture, and Sentinel-1 Backscattering: Towards the Retrieval of Field-Scale Soil Hydrological Properties. Remote Sens. 2025, 17, 542.
Munda, M.K.; Parida, B.R. Soil Moisture Modeling over Agricultural Fields Using C-Band Synthetic Aperture Radar and Modified Dubois Model. Appl. Geomat. 2023, 15, 97–108.
Zhu, L.; Cai, Q.; Jin, J.; Yuan, S.; Shen, X.; Walker, J.P. Multi-Scale Domain Adaptation for High-Resolution Soil Moisture Retrieval from Synthetic Aperture Radar in Data-Scarce Regions. J. Hydrol. 2025, 657, 133073.
Wigneron, J.-P.; Calvet, J.-C.; Pellarin, T.; Van De Griend, A.A.; Berger, M.; Ferrazzoli, P. Retrieving Near-Surface Soil Moisture from Microwave Radiometric Observations: Current Status and Future Plans. Remote Sens. Environ. 2003, 85, 489–506.
Duarte, E.; Hernandez, A. A Review on Soil Moisture Dynamics Monitoring in Semi-Arid Ecosystems: Methods, Techniques, and Tools Applied at Different Scales. Appl. Sci. 2024, 14, 7677.
Li, X.; Li, X.; Li, Z.; Ma, M.; Wang, J.; Xiao, Q.; Liu, Q.; Che, T.; Chen, E.; Yan, G.; et al. Watershed Allied Telemetry Experimental Research. J. Geophys. Res. Atmos. 2009, 114, D22103.
Merlin, O.; Walker, J.P.; Kalma, J.D.; Kim, E.J.; Hacker, J.; Panciera, R.; Young, R.; Summerell, G.; Hornbuckle, J.; Hafeez, M.; et al. The NAFE’06 Data Set: Towards Soil Moisture Retrieval at Intermediate Resolution. Adv. Water Resour. 2008, 31, 1444–1455.
Panciera, R.; Walker, J.P.; Jackson, T.J.; Gray, D.A.; Tanase, M.A.; Ryu, D.; Monerris, A.; Yardley, H.; Rüdiger, C.; Wu, X.; et al. The Soil Moisture Active Passive Experiments (SMAPEx): Toward Soil Moisture Retrieval From the SMAP Mission. IEEE Trans. Geosci. Remote Sens. 2014, 52, 490–507.
Li, X.; Cheng, G.; Liu, S.; Xiao, Q.; Ma, M.; Jin, R.; Che, T.; Liu, Q.; Wang, W.; Qi, Y.; et al. Heihe Watershed Allied Telemetry Experimental Research (HiWATER): Scientific Objectives and Experimental Design. Bull. Am. Meteorol. Soc. 2013, 94, 1145–1160.
Rasheed, M.W.; Tang, J.; Sarwar, A.; Shah, S.; Saddique, N.; Khan, M.U.; Imran Khan, M.; Nawaz, S.; Shamshiri, R.R.; Aziz, M.; et al. Soil Moisture Measuring Techniques and Factors Affecting the Moisture Dynamics: A Comprehensive Review. Sustainability 2022, 14, 11538.
Briciu-Burghina, C.; Zhou, J.; Ali, M.I.; Regan, F. Demonstrating the Potential of a Low-Cost Soil Moisture Sensor Network. Sensors 2022, 22, 987.
Kerkez, B.; Glaser, S.D.; Bales, R.C.; Meadows, M.W. Design and Performance of a Wireless Sensor Network for Catchment-scale Snow and Soil Moisture Measurements. Water Resour. Res. 2012, 48, W09515.
Martini, E.; Bauckholt, M.; Kögler, S.; Kreck, M.; Roth, K.; Werban, U.; Wollschläger, U.; Zacharias, S. STH-Net: A Soil Monitoring Network for Process-Based Hydrological Modelling from the Pedon to the Hillslope Scale. Earth Syst. Sci. Data 2021, 13, 2529–2539.
Bogena, H.R.; Herbst, M.; Huisman, J.A.; Rosenbaum, U.; Weuthen, A.; Vereecken, H. Potential of Wireless Sensor Networks for Measuring Soil Water Content Variability. Vadose Zone J. 2010, 9, 1002–1013.
Placidi, P.; Gasperini, L.; Grassi, A.; Cecconi, M.; Scorzoni, A. Characterization of Low-Cost Capacitive Soil Moisture Sensors for IoT Networks. Sensors 2020, 20, 3585.
Abdulwahid, H.M.; Mishra, A. Deployment Optimization Algorithms in Wireless Sensor Networks for Smart Cities: A Systematic Mapping Study. Sensors 2022, 22, 5094.
Liang, J.; Tian, M.; Liu, Y.; Zhou, J. Coverage Optimization of Soil Moisture Wireless Sensor Networks Based on Adaptive Cauchy Variant Butterfly Optimization Algorithm. Sci. Rep. 2022, 12, 11687.
Xiao, J.; Liu, Y.; Zhou, J. Quantum Clone Elite Genetic Algorithm-Based Evaluation Mechanism for Maximizing Network Efficiency in Soil Moisture Wireless Sensor Networks. J. Sens. 2021, 2021, 5590472.
Dursun, M.; Özden, S. Optimization of Soil Moisture Sensor Placement for a PV-Powered Drip Irrigation System Using a Genetic Algorithm and Artificial Neural Network. Electr. Eng. 2017, 99, 407–419.
Haseeb, K.; Ud Din, I.; Almogren, A.; Islam, N. An Energy Efficient and Secure IoT-Based WSN Framework: An Application to Smart Agriculture. Sensors 2020, 20, 2081.
Bessenbacher, V.; Gudmundsson, L.; Seneviratne, S.I. Optimizing Soil Moisture Station Networks for Future Climates. Geophys. Res. Lett. 2023, 50, e2022GL101667.
Oroza, C.A.; Zheng, Z.; Glaser, S.D.; Tuia, D.; Bales, R.C. Optimizing Embedded Sensor Network Design for Catchment-Scale Snow-Depth Estimation Using LiDAR and Machine Learning: Optimizing Snow Sensor Placements. Water Resour. Res. 2016, 52, 8174–8189.
Yang, J.; Huang, X. The 30 m Annual Land Cover Dataset and Its Dynamics in China from 1990 to 2019. Earth Syst. Sci. Data 2021, 13, 3907–3925.
Xu, Q.; Yang, Y.; Yang, R.; Zha, L.-S.; Lin, Z.-Q.; Shang, S.-H. Spatial Trade-Offs and Synergies between Ecosystem Services in Guangdong Province, China. Land 2024, 13, 32.
Jiang, S.; Guo, X.; Zhao, P.; Liang, H. Radial Growth–Climate Relationship Varies with Spatial Distribution of Schima Superba Stands in Southeast China’s Subtropical Forests. Forests 2023, 14, 1291.
Yu, B.; Kang, J.; Tang, J.; Wang, Z.; Zhang, S.; Ma, Q.; Su, H. Effect of Nitrogen Addition on the Intra-Annual Leaf and Stem Traits and Their Relationships in Two Dominant Species in a Subtropical Forest. Forests 2025, 16, 28.
Tao, L.; Wang, G.; Chen, W.; Chen, X.; Li, J.; Cai, Q. Soil Moisture Retrieval from SAR and Optical Data Using a Combined Model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 637–647.
Kornelsen, K.C.; Coulibaly, P. Advances in Soil Moisture Retrieval from Synthetic Aperture Radar and Hydrological Applications. J. Hydrol. 2013, 476, 460–489.
Zeyliger, A.M.; Muzalevskiy, K.V.; Zinchenko, E.V.; Ermolaeva, O.S. Field Test of the Surface Soil Moisture Mapping Using Sentinel-1 Radar Data. Sci. Total Environ. 2022, 807, 151121.
Monteiro, A.T.; Arenas-Castro, S.; Punalekar, S.M.; Cunha, M.; Mendes, I.; Giamberini, M.; Marques da Costa, E.; Fava, F.; Lucas, R. Remote Sensing of Vegetation and Soil Moisture Content in Atlantic Humid Mountains with Sentinel-1 and 2 Satellite Sensor Data. Ecol. Indic. 2024, 163, 112123.
Fan, D.; Zhao, T.; Jiang, X.; García-García, A.; Schmidt, T.; Samaniego, L.; Attinger, S.; Wu, H.; Jiang, Y.; Shi, J.; et al. A Sentinel-1 SAR-Based Global 1-Km Resolution Soil Moisture Data Product: Algorithm and Preliminary Assessment. Remote Sens. Environ. 2025, 318, 114579.
Bauer-Marschallinger, B.; Freeman, V.; Cao, S.; Paulik, C.; Schaufler, S.; Stachl, T.; Modanesi, S.; Massari, C.; Ciabatta, L.; Brocca, L.; et al. Toward Global Soil Moisture Monitoring with Sentinel-1: Harnessing Assets and Overcoming Obstacles. IEEE Trans. Geosci. Remote Sens. 2019, 57, 520–539.
Wagner, W.; Lemoine, G.; Rott, H. A Method for Estimating Soil Moisture from ERS Scatterometer and Soil Data. Remote Sens. Environ. 1999, 70, 191–207.
Peng, J.; Loew, A.; Merlin, O.; Verhoest, N.E.C. A Review of Spatial Downscaling of Satellite Remotely Sensed Soil Moisture. Rev. Geophys. 2017, 55, 341–366.
Dostálová, A.; Doubková, M.; Sabel, D.; Bauer-Marschallinger, B.; Wagner, W. Seven Years of Advanced Synthetic Aperture Radar (ASAR) Global Monitoring (GM) of Surface Soil Moisture over Africa. Remote Sens. 2014, 6, 7683–7707.
Scipal, K.; Drusch, M.; Wagner, W. Assimilation of a ERS Scatterometer Derived Soil Moisture Index in the ECMWF Numerical Weather Prediction System. Adv. Water Resour. 2008, 31, 1101–1112.
Wagner, W.; Noll, J.; Borgeaud, M.; Rott, H. Monitoring Soil Moisture over the Canadian Prairies with the ERS Scatterometer. IEEE Trans. Geosci. Remote Sens. 1999, 37, 206–216.
Liu, A.; Cheng, X.; Chen, Z. Performance Evaluation of GEDI and ICESat-2 Laser Altimeter Data for Terrain and Canopy Height Retrievals. Remote Sens. Environ. 2021, 264, 112571.
Fayad, I.; Baghdadi, N.; Lahssini, K. An Assessment of the GEDI Lasers’ Capabilities in Detecting Canopy Tops and Their Penetration in a Densely Vegetated, Tropical Area. Remote Sens. 2022, 14, 2969.
Schneider, K.; Helfricht, K.; Sailer, R.; Kuhn, M.; Schöber, J. Interannual Persistence of the Seasonal Snow Cover in a Glacierized Catchment. J. Glaciol. 2014, 60, 889–904.
Bukkapatnam, S.T.S.; Cheng, C. Forecasting the Evolution of Nonlinear and Nonstationary Systems Using Recurrence-Based Local Gaussian Process Models. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 2010, 82, 056206.
Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, USA, 2005; ISBN 978-0-262-25683-4. [Google Scholar]
Zhang, X.; Sun, X.; Lin, Z. Improving Soil Moisture Prediction Using Gaussian Process Regression. Smart Agric. Technol. 2025, 11, 100905. [Google Scholar] [CrossRef]
Jackson, T.J.; Bindlish, R.; Cosh, M.H.; Zhao, T.; Starks, P.J.; Bosch, D.D.; Seyfried, M.; Moran, M.S.; Goodrich, D.C.; Kerr, Y.H.; et al. Validation of Soil Moisture and Ocean Salinity (SMOS) Soil Moisture Over Watershed Networks in the U.S. IEEE Trans. Geosci. Remote Sens. 2012, 50, 1530–1543. [Google Scholar] [CrossRef]
Li, X.; Ling, F.; Foody, G.M.; Ge, Y.; Zhang, Y.; Du, Y. Generating a Series of Fine Spatial and Temporal Resolution Land Cover Maps by Fusing Coarse Spatial Resolution Remotely Sensed Images and Fine Spatial Resolution Land Cover Maps. Remote Sens. Environ. 2017, 196, 293–311. [Google Scholar] [CrossRef]
Narayanan, R.M.; Ponnappan, S.K.; Reichenbach, S.E. Effects of Noise on the Information Content of Remote Sensing Images. Geocarto Int. 2003, 18, 15–26. [Google Scholar] [CrossRef]
Sayer, A.M.; Govaerts, Y.; Kolmonen, P.; Lipponen, A.; Luffarelli, M.; Mielonen, T.; Patadia, F.; Popp, T.; Povey, A.C.; Stebel, K. A Review and Framework for the Evaluation of Pixel-Level Uncertainty Estimates in Satellite Aerosol Remote Sensing. Atmos. Meas. Tech. 2020, 13, 373–404. [Google Scholar] [CrossRef]
Chang, N.-B.; Makkeasorn, A. Optimal Site Selection of Watershed Hydrological Monitoring Stations Using Remote Sensing and Grey Integer Programming. Environ. Model. Assess. 2010, 15, 469–486. [Google Scholar] [CrossRef]
Su, L.; Wang, J.; Qin, X.; Wang, Q. Approximate Solution of a One-Dimensional Soil Water Infiltration Equation Based on the Brooks-Corey Model. Geoderma 2017, 297, 28–37. [Google Scholar] [CrossRef]
Hayek, M. An Exact Explicit Solution for One-Dimensional, Transient, Nonlinear Richards’ Equation for Modeling Infiltration with Special Hydraulic Functions. J. Hydrol. 2016, 535, 662–670. [Google Scholar] [CrossRef]
Ma, D.; Zhang, J.; Lu, Y.; Wu, L.; Wang, Q. Derivation of the Relationships between Green–Ampt Model Parameters and Soil Hydraulic Properties. Soil Sci. Soc. Am. J. 2015, 79, 1030–1042. [Google Scholar] [CrossRef]
Rosenbaum, U.; Bogena, H.R.; Herbst, M.; Huisman, J.A.; Peterson, T.J.; Weuthen, A.; Western, A.W.; Vereecken, H. Seasonal and Event Dynamics of Spatial Soil Moisture Patterns at the Small Catchment Scale. Water Resour. Res. 2012, 48, W10544. [Google Scholar] [CrossRef]
Ivanov, V.Y.; Fatichi, S.; Jenerette, G.D.; Espeleta, J.F.; Troch, P.A.; Huxman, T.E. Hysteresis of Soil Moisture Spatial Heterogeneity and the “Homogenizing” Effect of Vegetation. Water Resour. Res. 2010, 46, W09521. [Google Scholar] [CrossRef]
Western, A.W.; Blöschl, G.; Grayson, R.B. Geostatistical Characterisation of Soil Moisture Patterns in the Tarrawarra Catchment. J. Hydrol. 1998, 205, 20–37. [Google Scholar] [CrossRef]
Grayson, R.B.; Western, A.W.; Chiew, F.H.S.; Blöschl, G. Preferred States in Spatial Soil Moisture Patterns: Local and Nonlocal Controls. Water Resour. Res. 1997, 33, 2897–2908. [Google Scholar] [CrossRef]
Vivoni, E.R.; Rodríguez, J.C.; Watts, C.J. On the Spatiotemporal Variability of Soil Moisture and Evapotranspiration in a Mountainous Basin within the North American Monsoon Region. Water Resour. Res. 2010, 46, W02509. [Google Scholar] [CrossRef]
Schume, H.; Jost, G.; Katzensteiner, K. Spatio-Temporal Analysis of the Soil Water Content in a Mixed Norway Spruce (Picea abies (L.) Karst.)–European Beech (Fagus sylvatica L.) Stand. Geoderma 2003, 112, 273–287. [Google Scholar] [CrossRef]
Scanlon, B.R.; Healy, R.W.; Cook, P.G. Choosing Appropriate Techniques for Quantifying Groundwater Recharge. Hydrogeol. J. 2002, 10, 18–39. [Google Scholar] [CrossRef]
Zhuo, L.; Dai, Q.; Zhao, B.; Han, D. Soil Moisture Sensor Network Design for Hydrological Applications. Hydrol. Earth Syst. Sci. 2020, 24, 2577–2591. [Google Scholar] [CrossRef]
Ochsner, T.E.; Cosh, M.H.; Cuenca, R.H.; Dorigo, W.A.; Draper, C.S.; Hagimoto, Y.; Kerr, Y.H.; Larson, K.M.; Njoku, E.G.; Small, E.E.; et al. State of the Art in Large-Scale Soil Moisture Monitoring. Soil Sci. Soc. Am. J. 2013, 77, 1888–1919. [Google Scholar] [CrossRef]
Balenzano, A.; Mattia, F.; Satalino, G.; Lovergine, F.P.; Palmisano, D.; Peng, J.; Marzahn, P.; Wegmüller, U.; Cartus, O.; Dąbrowska-Zielińska, K.; et al. Sentinel-1 Soil Moisture at 1 Km Resolution: A Validation Study. Remote Sens. Environ. 2021, 263, 112554. [Google Scholar] [CrossRef]

Figure 1. Workflow for the optimization of soil moisture sensor placement, illustrating the flow of data between processing stages. The process begins by (1) generating a seasonal soil moisture dataset from Sentinel-1 imagery; the output of this step serves as the training data for the final optimization. Concurrently, (2) potential sensor sites are identified by applying GMM clustering to the catchment’s physiographic data. In the final stage, (3) a GPR model systematically evaluates sensor configurations drawn from the sites identified in Step 2, using the satellite data from Step 1 to find the network that minimizes prediction error. Example outputs are (a) the optimal sensor locations in the wet season and (b) the optimal sensor locations in the dry season.
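
The three stages in Figure 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes scikit-learn, uses synthetic arrays in place of the Sentinel-1 saturation map and the physiographic layers, feeds pixel coordinates to the GPR, and simplifies Step 3 by taking one representative pixel per GMM component instead of searching over configurations of candidate sites. The names `representative_sites` and `network_rmse` and all kernel settings are hypothetical.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Step 1 (stand-in): a synthetic "SAR-derived" saturation map in percent.
n_pixels = 400
coords = rng.uniform(0.0, 1.0, size=(n_pixels, 2))
moisture = (50 + 20 * np.sin(3 * coords[:, 0]) + 10 * coords[:, 1]
            + rng.normal(0.0, 2.0, n_pixels))

# Synthetic physiographic features (assumed): elevation, canopy height,
# slope, aspect. They are standardized so no single feature dominates.
raw = np.column_stack([
    500 * coords[:, 0] + rng.normal(0, 20, n_pixels),
    20 * coords[:, 1] + rng.normal(0, 2, n_pixels),
    rng.uniform(0, 40, n_pixels),
    rng.uniform(0, 360, n_pixels),
])
features = (raw - raw.mean(axis=0)) / raw.std(axis=0)


def representative_sites(features, n_sites, seed=0):
    """Step 2: fit a GMM and return the pixel nearest each component mean."""
    gmm = GaussianMixture(n_components=n_sites, random_state=seed).fit(features)
    dists = np.linalg.norm(features[:, None, :] - gmm.means_[None, :, :], axis=2)
    return np.unique(dists.argmin(axis=0))


def network_rmse(site_idx, coords, moisture):
    """Step 3: train a GPR on the sensor pixels and score it on every pixel."""
    kernel = RBF(length_scale=0.3) + WhiteKernel(noise_level=1.0)
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
    gpr.fit(coords[site_idx], moisture[site_idx])
    pred = gpr.predict(coords)
    return float(np.sqrt(np.mean((pred - moisture) ** 2)))


# Compare network sizes and keep the one with the lowest catchment-wide RMSE.
results = {n: network_rmse(representative_sites(features, n), coords, moisture)
           for n in range(5, 12)}
best_n = min(results, key=results.get)
print(best_n, round(results[best_n], 2))
```

Running the same loop separately on wet-season and dry-season maps would mirror the seasonal comparison in panels (a) and (b), although the numbers from this synthetic example carry no meaning for the SKH catchment.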

Figure 2. (a) Location of the study area in China; (b) land cover of the SKH catchment.

Figure 3. Spatial distribution and histogram of SAR-derived soil moisture saturation during wet and dry seasons (2017–2023): (a) spatial distribution during the wet season; (b) spatial distribution during the dry season; (c) histogram of soil moisture saturation in the wet season; (d) histogram of soil moisture saturation in the dry season.

Figure 4. (a) RMSE and (b) bias of soil moisture predictions from the GPR model with different numbers of sensors.

Figure 5. Heatmap of paired t-test p-values for residuals. ‘***’ denotes statistical significance (p < 0.05).

Figure 6. Performance of the GPR model under four site selection scenarios: (a–d) bias histograms with MAE; (e–h) spatial distribution of predicted soil moisture and sensor locations (black pentagrams); and (i–l) spatial distribution of bias.

Figure 7. (a) Histogram and (b) violin plot of the RMSE of GPR model predictions for soil moisture during dry and wet seasons with different numbers of sensors.

Figure 8. Kernel density map showing all representative sites and the distribution of optimal sensor locations, indicated by black pentagrams, during the (a) wet season and (b) dry season.

Figure 9. Box plots of the four physiographic characteristics of the representative sites: (a) elevation; (b) canopy height; (c) slope; and (d) aspect.

Table 1. RMSE and bias of GPR model with different sensor numbers.

Sensor Number	RMSE (%)	Bias (%)
5	8.55	1.03
6	7.39	2.08
7	9.09	−0.38
8	8.05	3.21
9	7.20	1.23
10	8.16	0.18
11	8.23	−0.28
12	8.56	1.01
13	8.04	2.57
14	7.27	1.18
15	8.66	−3.91
16	7.24	0.86
17	7.67	−1.82
18	7.21	1.30
19	7.68	2.29
20	7.15	0.89

Table 2. The MAEs of four site selection scenarios with different sensor numbers and selection methods.

Scenario	Sensor Number	Site Selection Method	MAE (%)
#1	7	proposed method	7.27
#2	9	proposed method	5.70
#3	11	proposed method	6.53
#4	9	random selection	8.69

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Xie, Y.; Cui, G.; Zheng, K.; Tang, G. Integrating Sentinel-1 SAR and Machine Learning Models for Optimal Soil Moisture Sensor Placement at Catchment Scale. Remote Sens. 2025, 17, 2330. https://doi.org/10.3390/rs17132330

