Article

Soil Moisture Retrieval from TM-1 GNSS-R Reflections with Auxiliary Geophysical Variables: A Multi-Cluster and Seasonal Evaluation

1
College of Geodesy and Geomatics, Shandong University of Science and Technology, Qingdao 266590, China
2
Shandong Engineering Research Center for Beidou Navigation and Intelligent Spatial Information Technology Application, Qingdao 266590, China
3
Qingdao Key Laboratory of Beidou Navigation and Intelligent Spatial Information Technology Application, Qingdao 266590, China
4
Qingdao Surveying & Mapping Institute, Qingdao 266033, China
5
Shandong Engineering Research Center of Digital Intelligence Technology in Underground Space, Qingdao 266033, China
*
Author to whom correspondence should be addressed.
Land 2026, 15(1), 36; https://doi.org/10.3390/land15010036
Submission received: 3 November 2025 / Revised: 18 December 2025 / Accepted: 19 December 2025 / Published: 24 December 2025

Abstract

Current passive microwave satellites like SMAP still face limitations in observational frequency and responsiveness in regions with frequent cloud cover, dense vegetation, or complex terrain, making it difficult to achieve continuous global monitoring with high spatio-temporal resolution. To enhance global high-frequency monitoring capabilities, this study utilizes global reflectivity data provided by the Tianmu-1 (TM-1) constellation since 2023, combined with multiple auxiliary variables, including NDVI, VWC, precipitation, and elevation, to develop a 9 km resolution soil moisture retrieval model. Several spatial clustering and temporal partitioning strategies are incorporated for systematic evaluation. Additionally, since the publicly available TM-1 L1 reflectivity data do not provide separable polarization channels, this study uses DDM/specular point reflectivity as the primary observable quantity for modeling and mitigates non-soil factor interference by introducing multi-source priors such as NDVI, VWC, precipitation, terrain, and roughness. Unlike SMAP’s “single orbit daily fixed local time” observation mode, TM-1, leveraging multi-constellation and multi-orbit reflection geometry, offers more balanced temporal sampling and availability in cloudy, rainy, and mid-to-high latitude regions. This enables temporal gap filling and rapid event response (such as moisture transitions within hours after precipitation events) during periods of SMAP’s quality masking or intermittent data loss. Results indicate that the model combining LC-cluster with seasonal partitioning delivers the best performance at the cluster level, achieving a correlation coefficient (R) of 0.8155 and an unbiased RMSE (ubRMSE) of 0.0689 cm³/cm³, with a particularly strong performance in barren and shrub ecosystems.
Comparisons with SMAP and ISMN datasets show that TM-1 is consistent with mainstream products in trend tracking and systematic error control, providing valuable support for global and high-latitude studies of dynamic hydrothermal processes due to its more balanced mid- and high-latitude orbital coverage.

1. Introduction

Soil moisture (SM) is a key variable in the global water cycle and energy exchange, regulating processes such as precipitation infiltration, evapotranspiration, and surface runoff. It plays a crucial role in agricultural production, hydrological modeling, drought monitoring, and climate prediction. However, due to its significant spatio-temporal variability, traditional ground-based observations are unable to meet the high-resolution monitoring requirements at regional to global scales in terms of coverage and temporal frequency [1]. Against this backdrop, satellite-based microwave remote sensing has gradually become the primary method. In particular, L-band passive microwave sensors are widely used due to their strong sensitivity to soil moisture and relatively low sensitivity to vegetation effects. Representative missions, such as ESA’s SMOS and NASA’s SMAP, have been providing global SM products over long periods, with spatial resolutions of approximately 25–36 km and enhanced resolution products reaching 9 km, widely applied in land surface processes and climate research [2]. However, in regions with frequent cloudy and rainy conditions, dense vegetation, or complex terrain, passive microwave observations remain constrained by antenna size and orbital characteristics, leading to insufficient revisit frequency and dynamic response capabilities [3,4].
In recent years, spaceborne GNSS-R has shown stable and cloud-independent sensing capabilities for surface moisture changes, and previous research has progressed in three main areas. First, the observability and physical sensitivity have been systematically verified: from comparisons between CYGNSS reflectivity and SMAP soil moisture to consistency checks across multiple climatic zones and the reconstruction of observations in complex terrain/freeze–thaw environments in the Third Pole, as well as interannual consistency evaluations, all of which demonstrate that GNSS-R can effectively track SM dynamics and provide quasi-global usability [5,6,7,8,9,10]. Second, accuracy improvements depend on the ‘GNSS-R + prior’ multi-source approach: machine learning integration of SMAP and CYGNSS observations enables quasi-global estimates at daily and 9 km spatial scales [11]; introducing LST and SIF alleviates systematic biases due to vegetation–temperature coupling [12,13]; applications targeting regional hydrological mechanisms and feedbacks have also started to emerge [14]; at the same time, land-cover-based stratified modeling and semi-empirical corrections suggest that ‘land cover/scenario optimization’ generally outperforms single-parameter schemes [15,16]; and quality control and emissivity estimation work have reduced coarse grid errors from the data and radiative transfer perspectives, supporting higher-resolution gridded modeling [17,18,19]. Third, coverage and evaluation still face structural weaknesses: due to the low-inclination orbit and specular point distribution of CYGNSS, spatio-temporal coverage and systematic evaluations in mid- and high-latitude regions and complex terrains are relatively insufficient.
Although multi-constellation explorations and sensitivity analyses based on Fengyun-3 and GNOS-II exist, common limitations include coarse resolution (∼36 km), insufficient integration of multi-source priors, or validation limited to regional/short-term periods [20,21,22,23]. Overall, existing research indicates that, under a unified evaluation grid, incorporating interpretable spatial stratification (e.g., land cover) and seasonal partitioning to quantify and independently validate GNSS-R retrievals, especially in mid- and high-latitude regions and cloud-prone regions, remains a practical strategy for improving usability and timeliness.
Against this backdrop, the TM-1 constellation consists of 23 low-Earth-orbit satellites that support full reception of the GPS, BDS, GLONASS, and Galileo systems. Compared with CYGNSS, which primarily enhances sampling coverage over low- to mid-latitude land areas, TM-1 also maintains high specular point density and more balanced orbital visibility in mid- and high-latitude regions. Relative to the current FY-3/GNOS-R configuration, where a small number of operational satellites carry the reflectometry payload and the field of view and orbital geometry of the reflection channels constrain the ground swath and incidence-angle distribution, the multi-constellation, multi-track layout of TM-1 provides higher land sampling density and better revisit capability, substantially improving intra-day and day-to-day spatio-temporal sampling [24]. The passive bistatic GNSS-R geometry is insensitive to cloud cover, and the wide variety of observation azimuths makes TM-1 more effective than traditional passive microwave sensing in complex terrain and high-latitude regions, offering higher effective pixel availability and usability. When SMAP encounters quality masking or limited local overpass times, TM-1 can fill temporal gaps and respond to moisture transitions within hours post-precipitation [25]. On this basis, this study uses TM-1 L1 DDM statistics/specular point reflectivity, combined with NDVI, VWC, precipitation, terrain, and roughness priors, to systematically assess the impact of various spatial clustering (including LC-cluster) and seasonal partitioning strategies on SM retrieval performance, with SMAP and ISMN used as benchmarks for independent validation. The aim is to complement, rather than replace, SMAP in terms of temporal sampling and mid- to high-latitude coverage, providing more balanced temporal sampling and availability evidence for global and high-latitude dynamic hydrothermal process studies.
In terms of feature selection, a data-driven approach was employed, combining existing literature and experimental data to verify the relationship between features and soil moisture. This process not only summarizes existing research findings but also analyzes experimental data to ensure that the selected features contribute to retrieval accuracy and stability. While this study does not rely on specific physical mechanism models, empirical analysis validated the effectiveness of these features, and importance evaluation confirmed their contribution to model performance. Therefore, the selected features effectively capture the spatio-temporal variability of soil moisture and enhance the model’s generalization ability in different environments.
In summary, this study aims to alleviate the limited temporal sampling and spatial coverage of existing L-band soil moisture products in certain mid- and high-latitude and cloudy or rainy regions by exploiting TM-1 GNSS-R reflectivity as a complementary data source to SMAP. Building on a unified 9 km grid and multi-source geophysical variables, we: (i) develop a global 9 km soil moisture retrieval model for TM-1; (ii) systematically evaluate the impact of different spatial clustering schemes (including land-cover-based clusters) and seasonal vs. chronological temporal partitioning on retrieval performance; and (iii) compare TM-1 soil moisture estimates with SMAP and ISMN to quantify their consistency and assess the potential added value of TM-1 in regions where SMAP retrievals are sparse or quality-flagged.
This paper is organized as follows: Section 2 describes the TM-1 data, auxiliary variables, and SMAP and ISMN reference data. Section 3 presents the clustering strategies, Random Forest modeling, preprocessing and temporal partitioning, and evaluation methods. Section 4 evaluates the model performance under different clustering and temporal partitioning schemes, followed by discussions on spatial distribution, land cover responses, and feature importance. Section 5 concludes the study and discusses limitations and future directions.

2. Datasets

This section describes the datasets used to construct the machine learning-based soil moisture retrieval model, including SMAP soil moisture data and auxiliary variables representing surface conditions. In addition, in situ soil moisture measurements are incorporated to evaluate the TM-1 retrievals. Subsequent sections will explain data preprocessing, the machine learning procedures, model construction methods, and evaluation metrics.

2.1. Tianmu-1 GNSS Radio Occultation and Reflectometry Constellation

Tianmu-1 is China’s first commercial low-Earth-orbit meteorological satellite constellation system, constructed and operated by TMSat (Chongqing) Satellite Technology Co., Ltd., a subsidiary of China Aerospace Science and Industry Corporation. Currently, the constellation consists of 23 operational satellites, primarily equipped with GNSS occultation and GNSS-R reflectometry payloads. It supports four major navigation systems: BeiDou, GPS, GLONASS, and Galileo, providing high-frequency, wide-coverage Earth observation capabilities. The Tianmu GNSS-R products offer all-weather, all-day global coverage of atmospheric and land surface parameters.
With the official launch of the constellation on 9 January 2023, this study utilizes the TM-1 Level-1 reflectivity data products obtained from that date onward (https://tmsats.com/satellite (accessed on 2 April 2025)). The dataset is stored with an hourly sampling frequency, with the Delay–Doppler Map (DDM) being the most important parameter. The DDM consists of a 61 × 20 (Delay × Doppler) grid, with a delay resolution of 0.125 chips and a Doppler frequency resolution of 500 Hz. Its characteristics vary across different reflection surfaces [26,27] (see Figure 1). The specific key TM-1 parameters are summarized in Table 1. The GNSS transmitter is RHCP (Right-Hand Circular Polarization), and after surface scattering, it contains both LHCP (Left-Hand Circular Polarization) and RHCP components. However, the TM-1 L1 public product used in this study does not provide separable polarization channels or multipolarization scattering coefficients at the receiver end. Instead, it primarily provides DDM statistics and specular-point equivalent reflectivity.
Therefore, this study employs a joint constraint framework of “DDM shape parameters (skewness/kurtosis) + incidence angle + surface priors (roughness/vegetation/ precipitation/terrain)” to absorb the uncertainties introduced by the absence of polarization, which is consistent with the GNSS-R literature’s approach of “replacing polarization decomposition with geometric/surface priors.” It should be noted that the reflectivity parameter directly provided by TM-1 assumes coherent scattering at the specular point. If higher-level data containing polarization components become available in the future, we will further assess their contribution to retrieval stability and spatio-temporal generalization.
Following the conventional assumption of the soil-moisture penetration depth used in SMAP retrievals, TM-1 reflectivity is assumed to represent the top 5 cm of the soil profile. Prior to estimating soil moisture from TM-1 observables, we perform quality control to remove records with large uncertainties caused by instrument artifacts or complex surface conditions. Specifically, we apply the following filters: (i) exclude observations with incidence angles θ_inc > 65° to avoid path-elongation distortions at large angles; (ii) require signal-to-noise ratio (SNR) > 2 to ensure signal strength and reliability; (iii) retain only samples with reflectivity > 30 and antenna gain > 0 dB to eliminate extremely weak reflections and receiver-malfunction cases; (iv) keep only records whose 20-bit quality flags are all zero, indicating no attitude disturbance, thermal drift, automatic-gain-control (AGC) anomaly, or radio-frequency interference (RFI) contamination; and (v) require the land–sea mask value Sp_land_sea_mask to meet the 0.95 threshold to maximize land-data retention.
Additionally, considering the significant effects of terrain, water bodies, and snow/ice, we remove observations over areas with surface elevation > 1500 m and with land-cover codes of 0 (Water) or 15 (Snow and Ice). Details of the elevation and land-cover datasets are provided in Section 2.3.
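The screening rules above can be sketched as a single predicate. This is only an illustration, not the authors' code: the record field names are hypothetical, the reflectivity threshold is applied in whatever unit the L1 product reports, and the land–sea mask test is assumed to be "at least 0.95" because the comparison operator is not stated in the text.

```python
from dataclasses import dataclass

# Land-cover codes excluded in the text: 0 (Water) and 15 (Snow and Ice).
EXCLUDED_LAND_COVER = {0, 15}


@dataclass
class Tm1Record:
    """One TM-1 specular-point record (field names are hypothetical)."""
    incidence_deg: float
    snr: float
    reflectivity: float
    antenna_gain_db: float
    quality_flags: int      # 20-bit flag word; 0 means no flag is raised
    land_sea_mask: float
    elevation_m: float
    land_cover: int


def passes_quality_control(rec: Tm1Record) -> bool:
    """Return True if the record survives every filter listed in the text."""
    return (
        rec.incidence_deg <= 65.0          # (i) drop incidence angles > 65 deg
        and rec.snr > 2.0                  # (ii) SNR > 2
        and rec.reflectivity > 30.0        # (iii) reflectivity > 30
        and rec.antenna_gain_db > 0.0      # (iii) antenna gain > 0 dB
        and rec.quality_flags == 0         # (iv) all 20 flag bits are zero
        and rec.land_sea_mask >= 0.95      # (v) ASSUMPTION: ">=" not stated
        and rec.elevation_m <= 1500.0      # drop elevation > 1500 m
        and rec.land_cover not in EXCLUDED_LAND_COVER
    )
```

Keeping the rules in one function makes it easy to count how many records each condition removes by toggling them one at a time.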

2.2. Soil Moisture Active Passive (SMAP)

The Soil Moisture Active Passive (SMAP) satellite, launched by NASA in 2015, was designed to observe global surface soil moisture (depth∼5 cm) and freeze/thaw states. SMAP employs an L-band radiometer with a revisit cycle of 2–3 days.
In this study, we use the enhanced Level-3 radiometer soil moisture product (SPL3SMP_E) from SMAP, available via the National Snow and Ice Data Center (https://doi.org/10.5067/M20OXIZHY3RJ (accessed on 14 April 2025)). This product is derived from brightness temperature observations acquired by SMAP’s L-band radiometer. It originates from the Level-1C antenna temperature data, which are optimally interpolated using the Backus–Gilbert method to produce enhanced-resolution brightness temperatures (L2_SM_P_E) and subsequently composited into daily Level-3 gridded soil moisture. The data are provided in EASE-Grid 2.0 format at 9 km spatial resolution and daily temporal resolution, including both descending (06:00 local time) and ascending (18:00 local time) overpasses, thereby offering comprehensive coverage of global land-surface hydrothermal conditions. Because the thermal equilibrium at the soil–vegetation interface depends on the local acquisition time, which affects retrieval accuracy, descending-pass SMAP retrievals are generally considered more reliable. Nevertheless, in this study, both ascending and descending SMAP observations are employed to increase spatio-temporal sampling. Prior to use, quality control is applied to retain only grid cells that (i) are under unfrozen conditions (land surface temperature > 273.15 K), and (ii) are flagged as valid (quality_flag = 1), as shown in Figure 2.

2.3. Environmental Auxiliary Variables

The GNSS-R signals used in this study are not only sensitive to soil moisture but are also easily influenced by multiple factors such as surface roughness, vegetation cover, topographic relief, and meteorological conditions. To accurately characterize signal propagation and reflection and to improve retrieval accuracy and stability, we introduce multi-source auxiliary variables as prior constraints [28]. Physically, GNSS-R reflections are jointly controlled by three categories of factors:
  • (1) dielectric response, dominated by volumetric water content and soil particle composition;
  • (2) surface geometry, where surface roughness and slope/relief determine the balance between specular and diffuse scattering;
  • (3) volume scattering/occlusion and hydrothermal forcing, driven by vegetation water status and short-term precipitation–evapotranspiration processes.
Guided by this “signal–surface–meteorology” perspective, we construct a minimal sufficient feature set: NDVI and VWC to represent vegetation growth and water status; Roughness, slope, and elevation to describe surface geometric characteristics; and precipitation and LST to capture short-term hydrothermal forcing. These are modeled jointly with TM-1 DDM shape parameters (skewness, kurtosis) and incidence angle, thereby explicitly absorbing confounding from non-soil factors—such as vegetation occlusion, geometric effects, and short-term hydrothermal perturbations—and reducing their interference with soil moisture retrieval. Meanwhile, land-cover type is used both to identify and exclude areas unsuitable for retrieval (e.g., water, snow/ice) and as a key basis for spatial organization and clustered modeling. In sum, accurately characterizing background conditions—topography, vegetation, and land cover—is essential for high-quality GNSS-R soil moisture retrieval.
The precipitation data are sourced from the ERA5-Land hourly dataset (https://cds.climate.copernicus.eu (accessed on 16 April 2025)), offering globally consistent meteorological records at ∼9 km spatial and hourly temporal resolution. NDVI data are obtained from the VIIRS Global NDVI product released by USGS (https://earthexplorer.usgs.gov/ (accessed on 25 April 2025)), with a 10-day compositing period and 1 km spatial resolution. Terrain data are from the GTOPO30 global digital elevation model (DEM), also available via USGS, with ∼1 km resolution. Land cover information is derived from the MODIS MCD12C1 Climate Modeling Grid (CMG) product (https://search.earthdata.nasa.gov (accessed on 11 April 2025)), which provides 17 IGBP land cover classes at a 0.05° spatial resolution.

2.4. In Situ Soil Moisture for Independent Validation

Given that machine learning models are often sensitive to the distribution characteristics of their training datasets, this study employs in situ soil moisture observations from the International Soil Moisture Network (ISMN) (https://ismn.earth/en/ (accessed on 12 June 2025)) for independent validation. ISMN compiles data from a wide range of regional networks, with particularly dense coverage in North America and Europe [29,30]. Considering that L-band microwave signals typically penetrate the top 0–5 cm of the soil surface, only ISMN measurements within the 0–5 cm depth layer are used for validation. To ensure data consistency and quality, only hourly ISMN records flagged as “G” (good quality) are retained and subsequently aggregated to daily averages. These ISMN observations are used for independent validation of the TM-1 soil moisture retrievals, and their spatial distribution is shown in Figure 3.
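The flag filtering and daily aggregation of station records might look like the sketch below. The tuple layout and the function name are assumptions made for illustration; only the "keep flag G, then average per day" logic comes from the text.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import fmean
from typing import Dict, Iterable, Tuple

# One hourly station record: (timestamp, soil moisture, ISMN quality flag).
HourlyRecord = Tuple[datetime, float, str]


def daily_means(records: Iterable[HourlyRecord]) -> Dict[date, float]:
    """Average the good-quality ("G") hourly values of one station per day."""
    by_day = defaultdict(list)
    for timestamp, soil_moisture, flag in records:
        if flag == "G":                       # drop every non-"G" record
            by_day[timestamp.date()].append(soil_moisture)
    # Days without any "G" record simply do not appear in the result.
    return {day: fmean(values) for day, values in sorted(by_day.items())}
```

Days with no good-quality record are left out rather than filled, so they cannot contribute spurious matches during validation.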

3. Methodology

Using the auxiliary datasets described above, several features are derived and combined with GNSS-R observables to construct the input layer of the machine learning model. In total, eleven input features are used: (1) reflectivity, (2) kurtosis, (3) incidence angle, (4) skewness, (5) NDVI, (6) VWC, (7) clay content, (8) surface roughness, (9) elevation, (10) precipitation, and (11) land surface temperature (LST). The overall training and testing framework is illustrated in Figure 4.

3.1. Spatial Clustering and Data Organization Strategies

To construct the machine learning model for global soil moisture retrieval using Tianmu-1 observations, multiple clustering strategies were applied to organize the 9 km grid cells. The corresponding statistics are presented in Table 2. One baseline method involves merging all global 9 km grid cells into a single unified model, hereafter referred to as the ONE-cluster. In addition, a regular-interval spatial partitioning approach was employed, dividing the global coverage of Tianmu-1 into square regions with side lengths of 72 km, 288 km, 720 km, 1080 km, 1440 km, and 2250 km. For example, in the 288 km case, all 9 km grid cells within each 288 km × 288 km region were aggregated to train an independent model (hereafter referred to as the 288KM-cluster). Using this method, the global land surface was divided into 2606 grid boxes at the 288 km scale, and the total number of models at each scale depended on the number of land grids containing valid data.
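As a rough illustration of the regular-interval partitioning, each 9 km cell can be mapped to its square block by integer division of its grid indices. The sketch assumes cells are addressed by (row, column) on the 9 km grid and that blocks are aligned to the grid origin; the text does not specify how the boxes are laid out, so both points are assumptions.

```python
CELL_KM = 9
BLOCK_SIDES_KM = (72, 288, 720, 1080, 1440, 2250)


def cells_per_side(block_km: int) -> int:
    """Number of 9 km cells along one side of a square block."""
    if block_km % CELL_KM != 0:
        raise ValueError("block side must be a multiple of the 9 km cell size")
    return block_km // CELL_KM


def block_id(row: int, col: int, block_km: int) -> tuple:
    """Block index of a 9 km cell (ASSUMPTION: blocks start at row 0, col 0)."""
    n = cells_per_side(block_km)
    return (row // n, col // n)
```

All six side lengths are exact multiples of 9 km, so a 288 km block holds at most 32 × 32 = 1024 cells and a 2250 km block at most 250 × 250 = 62,500 cells.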
Beyond the spatial partitioning method, a land-cover-based clustering strategy (LC-cluster) was also implemented. Specifically, the 17 IGBP land-cover classes were reclassified into eight categories: forest, shrubland, grassland, wetland, cropland, urban, barren, and snow/ice or water. Because snow/ice and water areas are unsuitable for soil-moisture retrieval, they were excluded, leaving seven classes for modeling. Under this strategy, each 9 km grid cell was assigned to the corresponding land-cover category, and separate models were constructed for each class. Owing to the uneven spatial extent of different land-cover types, the number of samples available for model training varied across categories.
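The reclassification can be pictured as a lookup table from IGBP code to modeling class. The text names the eight target categories but not which IGBP classes fall into each, so the grouping below (in particular the placement of savannas and of the cropland/natural vegetation mosaic) is an assumption made for illustration, using the MCD12C1 numbering in which 0 is water and 15 is snow and ice.

```python
from collections import defaultdict

# ASSUMED grouping of the 17 IGBP classes (codes 0-16); None marks the
# snow/ice-or-water category that is excluded from modeling.
IGBP_TO_GROUP = {0: None, 15: None}
IGBP_TO_GROUP.update({code: "forest" for code in (1, 2, 3, 4, 5)})
IGBP_TO_GROUP.update({code: "shrubland" for code in (6, 7)})
IGBP_TO_GROUP.update({code: "grassland" for code in (8, 9, 10)})  # savannas: assumption
IGBP_TO_GROUP.update({11: "wetland", 13: "urban", 16: "barren"})
IGBP_TO_GROUP.update({code: "cropland" for code in (12, 14)})     # mosaic: assumption


def group_cells(cells):
    """Group (cell_id, igbp_code) pairs by modeling class, skipping excluded cells."""
    groups = defaultdict(list)
    for cell_id, igbp_code in cells:
        group = IGBP_TO_GROUP[igbp_code]
        if group is not None:
            groups[group].append(cell_id)
    return dict(groups)
```

Each key of the returned dictionary would then correspond to one independently trained model, with as many training cells as that land-cover class provides.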

3.2. Random Forest Modeling

In this study, the Random Forest (RF) algorithm was employed to construct the soil moisture (SM) retrieval model. As an ensemble learning method, RF exhibits strong nonlinear modeling capabilities, robustness, and adaptability to high-dimensional heterogeneous features, making it suitable for addressing the complex land surface structures and nonlinear responses inherent in Tianmu-1 remote sensing data. Moreover, RF does not require strict assumptions about the statistical distribution of input features, allowing stable performance even under imbalanced data distributions or in the presence of outliers. Considering its comprehensive advantages in predictive accuracy, computational efficiency, and generalization ability, RF was selected as the core regression model in this study. To further enhance the model’s sensitivity to regional spatial heterogeneity, hyperparameter optimization was conducted under multiple clustering strategies, thereby adapting to data characteristics at different scales and improving overall model expressiveness.
In each clustering strategy, we adopted a unified hyperparameter search procedure to determine the optimal configuration of the random forest model. Considering practical constraints on computational resources, we first selected a representative subset consisting of the top 5% of clusters in terms of sample size, and conducted a grid search for hyperparameters on this subset. Specifically, the number of trees was varied from 10 to 500 with a step of 10, and the maximum tree depth was varied from 10 to 150 with a step of 5. A 10-fold cross-validation was used, and the configuration that minimized the validation unbiased root mean square error (ubRMSE) was selected as the optimal choice. For each parameter combination in the grid, the model was trained on the representative samples and the validation error was computed, yielding a configuration that performed most stably across different clustering strategies and temporal partitioning schemes. Based on these experiments, we unified the core hyperparameters of the random forest model as n_estimators = 250 and max_depth = 25, and kept them fixed in all LC-cluster/ONE-cluster and temporal-splitting experiments to ensure comparability across configurations and to avoid over-tuning in small-sample clusters. Under the current data volume and hardware conditions, a full hyperparameter search for each clustering strategy takes approximately 2–3 days; once the optimal parameters were determined, the final model was retrained on all available training samples and then used for subsequent TM-1 soil moisture prediction.
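To give a sense of the size of this search, the grid described above can be enumerated directly. The snippet only counts configurations and shows how a minimum-ubRMSE configuration would be picked from a table of cross-validation scores; the function name and the score table are hypothetical, and no model is trained here.

```python
from itertools import product

N_TREES = range(10, 501, 10)    # 10, 20, ..., 500
MAX_DEPTH = range(10, 151, 5)   # 10, 15, ..., 150
N_FOLDS = 10

# Every (number of trees, maximum depth) pair in the grid.
GRID = list(product(N_TREES, MAX_DEPTH))
# Model fits needed per cluster for a full 10-fold search.
FITS_PER_CLUSTER = len(GRID) * N_FOLDS


def select_best(cv_ubrmse):
    """Pick the configuration with the lowest mean validation ubRMSE.

    cv_ubrmse maps (n_trees, max_depth) to a mean cross-validated ubRMSE.
    """
    return min(cv_ubrmse, key=cv_ubrmse.get)
```

The grid contains 50 × 29 = 1450 configurations, i.e. 14,500 fits per cluster under 10-fold cross-validation, which is consistent with restricting the search to a representative subset of clusters. The adopted setting (250 trees, depth 25) is one of the grid points.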

3.3. Data Processing and Temporal Partitioning

To ensure spatial consistency between model inputs and outputs, all auxiliary variables (e.g., NDVI, VWC, precipitation, elevation) were preprocessed and matched to the locations of Tianmu-1 GNSS-R specular reflection points using a nearest-neighbor method. These were subsequently mapped onto the 9 km EASE-Grid framework to maintain spatial alignment with SMAP soil moisture products, thereby avoiding scale-induced biases across datasets. To ensure spatiotemporal consistency, all datasets were synchronized at the daily scale: within each UTC day and within the same 9 km grid cell, hourly TM-1 observations were averaged to form a daily value; SMAP’s two daily overpasses were averaged to obtain a 9 km daily value; ISMN minute-/hour-level records were aggregated to daily means at each station. For each TM-1 specular point, under the same-day condition, we extracted the nearest pixel value from each auxiliary grid (nearest-neighbor sampling) and then aggregated the point samples onto the 9 km EASE-Grid for training and evaluation.
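Nearest-neighbor sampling from a regular latitude/longitude auxiliary grid reduces to an index computation. The sketch below assumes a global grid whose first pixel has its upper-left corner at 90° N, 180° W and whose pixels are square in degrees (as for a 0.05° climate modeling grid); the layout of each actual auxiliary product should be checked before reusing it.

```python
import math


def nearest_pixel_index(lat: float, lon: float, res_deg: float) -> tuple:
    """Row/column of the pixel containing (lat, lon) on a global regular grid.

    ASSUMPTION: row 0 starts at 90 N, column 0 starts at 180 W. On such a
    grid, the pixel containing a point is also the one with the nearest center.
    """
    n_rows = round(180.0 / res_deg)
    n_cols = round(360.0 / res_deg)
    row = int(math.floor((90.0 - lat) / res_deg))
    col = int(math.floor((lon + 180.0) / res_deg))
    # Clamp so that lat = -90 or lon = 180 fall into the last row/column.
    return (min(max(row, 0), n_rows - 1), min(max(col, 0), n_cols - 1))
```

For a specular point near Qingdao (36.06° N, 120.38° E), `nearest_pixel_index(36.06, 120.38, 0.05)` gives the pixel whose value would be attached to that sample before aggregation onto the 9 km grid.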
On the temporal dimension, two partitioning strategies were designed to examine model robustness and generalization:
  • Chronological split: Data from January to August 2023 were used for training and data from September to December 2023 were reserved for testing to evaluate predictive performance across different time periods.
  • Seasonal split: Observations from winter (January–February), spring (March–April), summer (June–July), and autumn (September–October) were used for training, while the remaining months (May, August, November, and December 2023) served as the test set. This setup better reflects the seasonal variability of soil moisture dynamics.
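The two partitioning schemes can be written down as month sets, which makes it easy to check that each scheme covers all twelve months of 2023 without overlap. The function and constant names are illustrative only.

```python
# Month numbers (1 = January) taken from the two schemes described above.
SPLITS = {
    "chronological": {
        "train": {1, 2, 3, 4, 5, 6, 7, 8},
        "test": {9, 10, 11, 12},
    },
    "seasonal": {
        "train": {1, 2, 3, 4, 6, 7, 9, 10},
        "test": {5, 8, 11, 12},
    },
}


def assign_split(month: int, scheme: str) -> str:
    """Return "train" or "test" for a calendar month under the given scheme."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    parts = SPLITS[scheme]
    return "train" if month in parts["train"] else "test"
```

Both schemes train on eight months and test on four; they differ only in whether the test months are contiguous (September to December) or spread across the year.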

3.4. Model Evaluation and Validation Methods

To comprehensively evaluate the performance of the TM-1 soil moisture retrieval model, assessments were carried out at both the cluster and grid-cell levels, using multiple statistical metrics and independent validation data. The evaluation metrics included the correlation coefficient (R), ubRMSE, mean absolute error (MAE), and root mean square error (RMSE), which quantify trend consistency, random error after bias removal, average error magnitude, and total error, respectively.
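For reference, the four metrics can be computed as in the sketch below. It uses the standard definitions (Pearson correlation, and ubRMSE² = RMSE² − bias²); these are assumed to match the definitions used here, and the function name is illustrative.

```python
import math


def evaluate(pred, ref):
    """Return R, RMSE, ubRMSE, MAE and bias for paired retrieval/reference values."""
    n = len(pred)
    if n != len(ref) or n < 2:
        raise ValueError("need two equally long series with at least 2 samples")
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    errors = [p - r for p, r in zip(pred, ref)]
    bias = mean_p - mean_r
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Unbiased RMSE: remove the mean-error contribution from the total error.
    ubrmse = math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    denom = math.sqrt(var_p * var_r)
    r_value = cov / denom if denom > 0 else float("nan")  # undefined for constant series
    return {"R": r_value, "RMSE": rmse, "ubRMSE": ubrmse, "MAE": mae, "bias": bias}
```

A retrieval that tracks the reference perfectly but sits 0.05 cm³/cm³ too low has R = 1, RMSE = MAE = 0.05 and ubRMSE ≈ 0, which is why R and ubRMSE are reported alongside RMSE and MAE.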
For reference datasets, the SMAP-enhanced L3 soil moisture product was used to ensure consistency with mainstream passive microwave retrievals, while in situ SM measurements from the ISMN (0–5 cm depth) provided an independent ground benchmark, ensuring objective and reliable validation. Furthermore, to identify the contributions of different input features, permutation importance analysis was applied. This method quantifies the importance of each variable by evaluating the change in model performance when the feature values are randomly permuted. The results highlighted VWC, precipitation, and surface state as dominant drivers of soil moisture retrieval, providing both insight into model mechanisms and guidance for future optimization.
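The idea behind permutation importance can be sketched in a few lines. This is a generic, library-free illustration rather than the procedure actually used: `predict` and `metric` are placeholders for any fitted model and any error metric where lower is better, and the number of repeats is an arbitrary choice.

```python
import random


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean increase in error when each feature column is shuffled in turn.

    predict: callable mapping a list of feature rows to a list of predictions.
    metric:  callable (predictions, targets) -> error, lower is better.
    X:       list of feature rows (each row a list); y: list of targets.
    """
    rng = random.Random(seed)
    baseline = metric(predict(X), y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(metric(predict(X_perm), y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A feature the model ignores yields an importance of exactly zero, while a feature the model relies on yields a positive error increase. With a fitted scikit-learn model, the library routine `sklearn.inspection.permutation_importance` offers the same kind of analysis.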

4. Results

In this section, the performance of the Tianmu-1 soil moisture (SM) retrieval using the RF model is evaluated from multiple perspectives, as outlined in Section 3.4.

4.1. Model Performance Under Temporal Partitioning

As described in Section 3.1, multiple clustering strategies (72KM, 288KM, 720KM, 1080KM, 1440KM, 2250KM, LC, ONE) were implemented using Tianmu-1 GNSS-R data from 2023 to construct retrieval models for SMAP soil moisture. The models were trained with data from January–August and tested with data from September–December. At the 9 km grid scale, ubRMSE and correlation coefficient R were calculated as performance metrics, with global medians shown in Figure 5, where vertical bars represent the interquartile range (25–75%). Figure 5 presents results under two evaluation schemes: panel (a) shows cluster-level performance, reflecting the average accuracy of each cluster, while panel (b) shows grid-level performance, representing the actual accuracy at individual 9 km cells.
From Figure 5a, the ONE-cluster approach achieved the highest correlation coefficient R at the cluster level; however, its ubRMSE was relatively high, suggesting strong correlation performance but weaker error control. The LC-cluster achieved slightly lower R but much smaller ubRMSE, demonstrating more balanced overall performance. Since the ONE-cluster does not involve subset partitioning, interquartile ranges could not be calculated, and thus error bars are not displayed in the figure. From Figure 5b, performance differences among clusters converged at the grid level, but the LC-cluster consistently maintained the highest R and lowest ubRMSE, showing the most stable and reliable performance, followed by the ONE-cluster strategy. Other clustering methods showed certain advantages at specific spatial scales: smaller clusters benefited from more balanced local samples, leading to strong localized performance, though sometimes limited by insufficient sample sizes that risk underfitting; in contrast, larger clusters had abundant samples but suffered from imbalanced distributions, which affected training consistency. Overall, the LC-cluster strategy achieved the best trade-off between accuracy and stability across evaluation schemes.

4.2. Accuracy Evaluation with Seasonal Temporal Splitting

Given that the LC-cluster and ONE-cluster strategies demonstrated representative modeling performance in the previous assessments, a seasonal partitioning scheme was introduced in addition to the chronological split (i.e., January–August 2023 as training and September–December 2023 as testing).
Chronological partitioning is suitable for testing a model’s predictive capacity for future time periods; however, as it does not account for seasonal structures, training and testing datasets are often under different climatic regimes, potentially introducing strong seasonal heterogeneity and reducing model generalization. To mitigate this distribution shift, the seasonal scheme assigned January–February, March–April, June–July, and September–October as training samples (representing winter, spring, summer, and autumn, respectively), while the remaining months (May, August, November–December) were used for testing. Although this partitioning is not a strict global definition of seasons, it better reflects the climatic cyclicity influencing soil moisture dynamics.
Figure 6 shows the results: seasonal partitioning outperformed chronological partitioning on most evaluation metrics, yielding greater stability and accuracy. At the cluster level, both LC-cluster and ONE-cluster showed higher R values under seasonal partitioning. For example, LC-cluster improved from R = 0.7336 to R = 0.8155, with ubRMSE reduced from 0.0716 to 0.0689, a clear gain in both metrics. Similar trends were observed at the grid level, where seasonal partitioning improved correlation and most error metrics. For LC-cluster, the median R increased from 0.4956 to 0.5251; although ubRMSE rose slightly (0.0459 → 0.0499), both RMSE and MAE decreased, suggesting that the higher correlation was achieved while maintaining a stable error structure.
Thus, seasonal partitioning demonstrated superior temporal adaptability and spatial generalization for soil moisture modeling. It not only captured the seasonal coupling between climate and SM dynamics but also alleviated distribution shifts inherent in chronological splits. Across evaluation metrics, LC-cluster consistently outperformed ONE-cluster, offering higher predictive stability and modeling accuracy and providing a robust methodological basis for regional retrievals and large-scale applications. We note that, building on this configuration, Section 4.5 further selects four representative months (May, August, November, and December) from the test period and analyzes the seasonal maps of R and ubRMSE between TM-1 and SMAP, thereby providing a more detailed assessment of the seasonal-splitting scheme from both regional and seasonal perspectives.

4.3. Spatial Generalization Performance Using LC-Cluster and Seasonal Partitioning

To further evaluate the adaptability and generalization of the seasonal partitioning with the LC-cluster strategy, TM-1 predictions obtained under the seasonal scheme were assessed globally at the grid-cell scale using four metrics—R, RMSE, ubRMSE, and MAE. Results are shown in Figure 7.
The spatial distribution of R demonstrates excellent model performance across multiple key ecological zones and climatic regions. High correlations (R > 0.8, in some areas approaching 0.9) were observed in the Sahel (West Africa), western Mongolian Plateau, northeastern China, Australia, South African grasslands, northwestern India, parts of Southeast Asia, central–southern Europe, northern Argentina, central Brazil, southwestern Canada, and much of the United States. These regions typically exhibit clear vegetation cycles and strong climatic forcing, resulting in stable relationships between remote sensing predictors (NDVI, VWC, precipitation) and soil moisture responses, which facilitate accurate model fitting.
Error metrics revealed spatial variability in prediction accuracy. High errors (RMSE, ubRMSE, MAE > 0.06 cm3/cm3) occurred in the Amazon Basin, northern DRC, mountainous Japan, southern India, western Siberia, and parts of the Russian boreal forest, likely due to strong hydrological variability, weak remote sensing signals, or complex terrain. Conversely, lower errors (RMSE and MAE < 0.04 cm3/cm3, ubRMSE similarly low) were recorded in the North China Plain, northeastern Chinese croplands, U.S. Midwest, central Europe, southern Africa agricultural regions, and South American Pampas grasslands, indicating stable and accurate predictions in major agricultural zones.
Notably, in most regions, ubRMSE was significantly lower than RMSE, suggesting that systematic bias, rather than random noise, dominated the errors—implying that predictions are more correctable and reliable. MAE patterns further confirmed model robustness, showing low values in vegetation-rich and seasonally dynamic zones (e.g., eastern China, North American farmlands, African savannas, and South American highlands).
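The relationship between RMSE and ubRMSE follows from the standard decomposition RMSE² = bias² + ubRMSE², which holds when both are computed with the population (1/n) form. The sketch below uses that form with made-up values; the function name and the numbers are illustrative only and do not reproduce any result in this study.

```python
import math


def error_metrics(pred, ref):
    """Bias, RMSE, unbiased RMSE, and MAE for paired series (1/n form)."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    bias = mean_p - mean_r
    rmse = math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / n)
    # ubRMSE removes the mean of each series before differencing.
    ubrmse = math.sqrt(
        sum(((p - mean_p) - (r - mean_r)) ** 2 for p, r in zip(pred, ref)) / n
    )
    mae = sum(abs(p - r) for p, r in zip(pred, ref)) / n
    return {"bias": bias, "rmse": rmse, "ubrmse": ubrmse, "mae": mae}


# Toy series (hypothetical): predictions track the reference with a wet offset.
pred = [0.30, 0.34, 0.28, 0.36, 0.32]
ref = [0.25, 0.30, 0.22, 0.31, 0.27]
m = error_metrics(pred, ref)

print({k: round(v, 4) for k, v in m.items()})
print("rmse^2 - (bias^2 + ubrmse^2) =", m["rmse"] ** 2 - (m["bias"] ** 2 + m["ubrmse"] ** 2))
```

When ubRMSE is much smaller than RMSE, as in this toy case, most of the error is a constant offset that a simple bias correction could remove.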
For representativeness, extreme regions such as the Tibetan Plateau, Andes (elevations > 1500 m), and permafrost zones with annual mean temperatures < 0 °C (e.g., northern Siberia, Greenland) were excluded. Blank areas in Figure 7 correspond to deserts (Sahara, Taklamakan, Arabian, Thar, Turan, Gobi, Great Basin, etc.), where SM is essentially invariant and features cannot be retrieved.
Therefore, all spatial evaluation results are based on TM-1 models trained with seasonal partitioning and the LC-cluster. These findings demonstrate that the strategy provides strong temporal adaptability in climate-sensitive zones and reliable spatial generalization in agro-pastoral regions. Accordingly, all subsequent TM-1 SM results presented are based on the LC-cluster with seasonal partitioning.

4.4. In Situ Validation

To further assess temporal consistency and predictive accuracy, two representative ISMN validation sites (PSA6Plaenterwald and Sandstone-6-W) were selected. Predictions for May, August, November, and December 2023 were compared with in situ observations (Figure 8). As TM-1/SMAP and ground measurements require precise temporal alignment and only “G”-quality in situ data were retained, some gaps in the time series exist.
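The temporal alignment and quality filtering described here can be sketched as a nearest-neighbor match in time. The 30-minute tolerance, the tuple layouts, and the function name are assumptions made for illustration; the source does not state the matching window that was actually used.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def pair_with_in_situ(sat_obs, in_situ, max_gap=timedelta(minutes=30)):
    """Pair each satellite value with the nearest 'G'-flagged in situ value.

    sat_obs: list of (time, soil_moisture)
    in_situ: list of (time, soil_moisture, quality_flag)
    max_gap: hypothetical tolerance; pairs further apart are dropped.
    """
    good = sorted((t, v) for t, v, flag in in_situ if flag == "G")
    times = [t for t, _ in good]
    pairs = []
    for t_sat, v_sat in sat_obs:
        i = bisect_left(times, t_sat)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(good)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(times[k] - t_sat))
        if abs(times[j] - t_sat) <= max_gap:
            pairs.append((t_sat, v_sat, good[j][1]))
    return pairs


# Toy data (hypothetical): the 07:00 reading carries a non-'G' flag.
in_situ = [
    (datetime(2023, 5, 1, 6, 0), 0.21, "G"),
    (datetime(2023, 5, 1, 7, 0), 0.50, "D"),
    (datetime(2023, 5, 1, 8, 0), 0.23, "G"),
]
sat_obs = [
    (datetime(2023, 5, 1, 6, 10), 0.20),
    (datetime(2023, 5, 1, 7, 5), 0.30),
]
pairs = pair_with_in_situ(sat_obs, in_situ)
print(pairs)
```

The second satellite observation finds no 'G'-quality reading within the tolerance and is dropped, which is the mechanism behind the gaps in the validation time series.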
Results show that TM-1 tracked the in situ time series closely at both sites. At PSA6Plaenterwald, TM-1 achieved R = 0.8335, exceeding SMAP (R = 0.7581); at Sandstone-6-W, TM-1 reached R = 0.8015, slightly below SMAP (R = 0.8231). In terms of errors, TM-1 achieved ubRMSE = 0.03902 at PSA6Plaenterwald, lower than SMAP (0.04635); at Sandstone-6-W, TM-1 and SMAP recorded 0.04366 and 0.05457, respectively, indicating smaller random errors for TM-1 at both sites. Overall, TM-1 provided lower ubRMSE than SMAP at both sites and a comparable or higher correlation, particularly during periods with strong precipitation variability.
Figure 9 shows scatterplots of predictions versus ISMN observations for the testing months, with the red dashed line indicating the 1:1 consistency line (y = x). Most samples clustered near the line, confirming good agreement for both TM-1 and SMAP. TM-1 achieved R = 0.560 and ubRMSE = 0.0895, compared with SMAP’s R = 0.605 and ubRMSE = 0.0865, suggesting both captured spatial SM variability effectively. Although SMAP showed slightly higher overall correlation, TM-1 results were very close, demonstrating comparable capability to mainstream products. Importantly, TM-1 relies only on SMAP data and auxiliary predictors, yet achieved robust independent performance, highlighting its efficiency and potential for high-resolution SM monitoring.
To better illustrate the spatial and land-cover representativeness of the in situ validation samples, we further summarized the paired TM-1/SMAP–ISMN samples by ISMN subnetwork and dominant land-cover type, as reported in Table 3. Because each ISMN subnetwork may contain a large number of stations and observations, listing individual sites would be impractical; instead, Table 3 provides, for each “network × land-cover type” combination, the number of paired samples n_samples and the corresponding correlation coefficient R and unbiased root-mean-square error (ubRMSE) for TM-1–ISMN and SMAP–ISMN. The results show that effective samples are mainly concentrated in North American networks such as SCAN, USCRN, SNOTEL, and SOILSCAPE, and in European networks such as Berlin, REMEDHUS, and SMOSMANIA, with most samples falling into Grass, Cultivated, and Forest land-cover types. Although sample sizes for Barren and Urban are relatively small, the R and ubRMSE values for TM-1 and SMAP are of the same order across the major land-cover types, indicating that both products exhibit broadly comparable temporal consistency and error levels for near-surface (0–5 cm) soil moisture variations within the ISMN-covered regions.

4.5. Analysis of Spatio-Temporal Dynamics of Soil Moisture Retrievals

Figure 10, Figure 11, Figure 12 and Figure 13 illustrate spatial distributions of TM-1 and SMAP soil moisture for May, August, November, and December (test months), along with evaluation metrics (ubRMSE and R). At the global scale, both products exhibited strong agreement in capturing large-scale SM patterns. However, differences emerged under varying land cover, climate, and seasonal conditions, reflecting heterogeneous model adaptability.
In May, during late spring/early summer at mid-latitudes, vegetation greening drives generally high soil moisture. TM-1 and SMAP showed consistent patterns across the East European Plain, eastern Siberia, North China Plain, and eastern U.S., with R > 0.7 and low ubRMSE. Agricultural regions such as southern China’s rice paddies and the U.S. Corn Belt exhibited excellent agreement, with correlations approaching or exceeding 0.8. Particularly strong performance (R > 0.8) was observed in Whitehorse (western Canada), eastern Mongolia, and southern Russia, indicating robust model adaptability in grassland–forest transition zones.
By contrast, higher errors were found in central Russia and western U.S. mountainous regions, likely due to permafrost remnants, hydrological–terrain interactions, or complex land cover. In arid regions (Australia, North Africa), soil moisture variability is minimal year-round, resulting in sparse evaluation coverage or missing values.
In August, peak summer in the Northern Hemisphere, rainfall and evapotranspiration maxima intensified SM variability. TM-1 and SMAP maintained good agreement in eastern China, Southeast Asia, tropical Africa, and northern South America. High precision was observed in western/eastern Kazakhstan grasslands (high R, low ubRMSE), underscoring strong generalization in moderately vegetated, relatively flat terrain. Meanwhile, deserts in Australia and North Africa again showed sparse or missing evaluation due to negligible SM dynamics.
In November–December, northern high latitudes (> 60° N; Siberia, northern Canada, Scandinavia) entered deep winter, with frozen soils limiting remote sensing retrievals, leading to widespread metric gaps. Nonetheless, consistent SM patterns were maintained in warm, humid zones such as southeastern China, southern U.S., Gulf Coast, Southeast Asia, and southern–central Africa. In Forrest (southwestern Australia), the onset of the rainy season yielded good model performance (R ≈ 0.8, low ubRMSE), demonstrating TM-1’s responsiveness in wet–dry transition regions.
Across tropical South America and Africa, TM-1 consistently exhibited stronger wetness responses than SMAP, particularly under heavy rainfall conditions, with broader spatial extent of wet zones. This suggests greater sensitivity of TM-1 to short-term precipitation events.
Overall, TM-1 effectively captured seasonal transitions in the Northern Hemisphere, from wetter conditions in May/August to drier states in November/December, consistent with vegetation senescence, reduced precipitation, and permafrost formation. Evaluation metrics also declined synchronously with these seasonal shifts. However, in South Asia, Southeast Asia, and East Africa, TM-1 tended to overestimate wetness extent relative to SMAP, possibly due to dense vegetation or high soil organic matter.

4.6. Difference Analysis Between TM-1 and SMAP

Figure 14 shows the distribution of differences between the TM-1 and SMAP soil moisture products (TM-1 − SMAP) for the test months of May, August, November, and December. The differences are concentrated within ±0.1 cm3/cm3, indicating high consistency between the two datasets in most regions. In May, a noticeable underestimation is observed in central Russia. In the central United States and the Amazon Basin, a slight underestimation in May and August shifts to a slight overestimation in November and December. The spatial distribution of the differences thus exhibits a seasonal pattern, reflecting the varying responses of TM-1 and SMAP across regions and time periods.
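A difference summary of this kind can be sketched as follows. The function name, the use of `None` for missing cells, and the toy values are hypothetical; only the ±0.1 cm3/cm3 band comes from the text.

```python
def difference_summary(tm1, smap, tol=0.1):
    """Summarize TM-1 minus SMAP differences over cells valid in both maps."""
    diffs = [a - b for a, b in zip(tm1, smap) if a is not None and b is not None]
    n = len(diffs)
    return {
        "n": n,
        "mean_diff": sum(diffs) / n,  # negative: TM-1 drier than SMAP
        "frac_within_tol": sum(1 for d in diffs if abs(d) <= tol) / n,
    }


# Toy maps flattened to lists (hypothetical values in cm3/cm3).
tm1 = [0.20, 0.35, None, 0.10, 0.50]
smap = [0.25, 0.30, 0.40, 0.12, 0.35]
summary = difference_summary(tm1, smap)
print(summary)
```

Applying the same summary month by month and region by region would give a numerical counterpart to the seasonal sign changes visible in the difference maps.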

4.7. Feature Contributions and Prediction Performance of Soil Moisture Retrieval Across Different Land Cover Types

Table 4 presents the results of the feature importance evaluation for soil moisture estimation using TM-1 data across seven land cover (LC) types. For each LC type, the permutation importance method was used to assess the significance of seven core input features. The final results were averaged over the seven LC-specific models to obtain a more representative global feature importance ranking.
The evaluation shows that VWC, Precipitation, and Surface Roughness (Rough) exert strong influence on soil moisture prediction, indicating that vegetation water content, precipitation supply, and surface structure are key factors affecting soil moisture distribution. Land Surface Temperature (LST) and Clay content also contribute substantially, reflecting the role of surface thermal conditions and soil particle composition in regulating soil moisture dynamics. Although the importance score for NDVI is relatively low, as a comprehensive indicator of vegetation growth status, it still provides supplementary value to the model’s generalization under seasonal variations. Therefore, the TM-1 soil moisture estimation model retains all features as input parameters to ensure comprehensiveness and stability of the information.
In the importance evaluation, we did not compute the scores separately for individual seasons. Instead, we constructed a joint evaluation sample from the four representative months of the test period (May, August, November, and December, corresponding to different seasonal backgrounds) and calculated permutation importance on this combined sample. This approach guarantees a sufficient sample size and yields a ranking closer to an annual mean state, which is consistent with our goal of building a unified TM-1 retrieval framework focused on overall cross-season applicability. It should be emphasized that the DDM shape parameters (kurtosis and skewness) are native internal waveform statistics in the TM-1 L1 product, whose values are strongly coupled with observation geometry and land-cover conditions. To avoid drawing confounded conclusions when geometry and land cover are not explicitly stratified, we do not list them separately in the importance ranking of Table 4; instead, they are included together with incidence angle and other internal geometric quantities as joint features for modeling and evaluation. Further permutation-importance analysis shows that, among all 11 input features, the mean importance of kurtosis and skewness is below 0.01, much lower than that of dominant factors such as VWC, precipitation, and roughness. This indicates that, within the current 9 km global framework, their marginal contribution to overall retrieval performance is limited, although they are still retained in the model as TM-1-specific geometric information to avoid potential information loss.
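Permutation importance is model-agnostic: one feature column is shuffled at a time and the resulting loss of accuracy is recorded. The sketch below illustrates the idea with a hypothetical stand-in predictor (`toy_model`) rather than the Random Forest used in this study. Scoring by the increase in RMSE, the number of repeats, and all coefficients and feature values are assumptions made for illustration.

```python
import math
import random


def rmse(pred, target):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = rmse([predict(row) for row in X], y)
    importances = {}
    for name in X[0]:
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[name] for row in X]
            rng.shuffle(shuffled)
            # Copy each row, replacing only the shuffled feature.
            X_perm = [dict(row, **{name: v}) for row, v in zip(X, shuffled)]
            increases.append(rmse([predict(r) for r in X_perm], y) - baseline)
        importances[name] = sum(increases) / n_repeats
    return importances


def toy_model(row):
    """Hypothetical predictor standing in for a trained regressor."""
    return 0.05 + 0.5 * row["precip"] + 0.1 * row["vwc"] + 0.0 * row["ndvi"]


# Synthetic features in [0, 1); the targets equal the toy model's output.
data_rng = random.Random(42)
X = [
    {"precip": data_rng.random(), "vwc": data_rng.random(), "ndvi": data_rng.random()}
    for _ in range(200)
]
y = [toy_model(row) for row in X]

imp = permutation_importance(toy_model, X, y)
for name, score in sorted(imp.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.4f}")
```

A feature that the predictor ignores (here `ndvi`) receives an importance of zero. Because importance measured this way reflects what the fitted model uses, strongly correlated inputs can share or mask each other's scores.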
For different land cover types, Barren areas performed the best at both evaluation scales, with an R value of 0.94 and a grid-level median of 0.59, showing the smallest errors (ubRMSE: 0.043/0.007), indicating high inversion accuracy and strong temporal stability for this surface type. Shrubland also showed excellent performance, with a cluster-level R value of 0.87 and a grid-level R median of 0.66, making it the second best-performing category after Barren. Cultivated and Urban areas both had cluster-level R values exceeding 0.81 and grid-level R values above 0.60, demonstrating the model’s strong explanatory power for moisture changes in areas with significant human intervention. In contrast, Forest and Wetland categories had relatively lower R values, with grid-level R medians not exceeding 0.45, and larger ubRMSE, indicating challenges in prediction accuracy in environments with high complexity or variability. Overall, the trends in evaluation results across land cover types are consistent at both cluster and grid levels, with differences mainly reflecting the model’s varying responses to regional averages and local temporal performance, as shown in Figure 15.
From the perspective of sample distribution and within-class variability, Table 5 summarizes the test-set sample size, SMAP soil moisture statistics, and cluster-level accuracy metrics for each land-cover type. The number of test samples N_test ranges from about 8.8 × 10^4 (Urban) to 3.6 × 10^6 (Grass). For the dominant classes such as Forest, Grass, and Cultivated, N_test exceeds 8 × 10^5, indicating that training and evaluation of the LC models are based on a relatively sufficient sample foundation. The within-class standard deviation of SMAP, std(SMAP), generally lies between 0.10 and 0.30 cm3/cm3. Grass and Cultivated exhibit relatively larger variance, reflecting the stronger heterogeneity of their climatic and surface conditions, whereas Barren shows a smaller std(SMAP), consistent with the overall low moisture level and limited temporal variability in arid regions. Against this background, the cluster-level correlation coefficient R_test remains in the range of approximately 0.71–0.94, and ubRMSE_test is about 0.043–0.093 cm3/cm3 across land-cover types. Even for internally heterogeneous classes such as Cultivated and Urban, no obvious degradation in performance is observed. Combined with the broadly consistent cluster-level and grid-level patterns in Figure 15, these results suggest that the LC-cluster strategy achieves reasonable sample utilization and error control across different land-cover types, without significant performance deterioration caused by sample imbalance or large within-class variance.
From the perspective of uncertainty structure, combining the feature-importance results in Table 4 with the accuracy contrasts in Figure 15 and Table 5 reveals that the land-cover types with the smallest errors (Barren and Shrub) are precisely those located in regions with relatively low VWC and Preci. and more homogeneous surface structure. In contrast, land-cover types with lower correlations and larger ubRMSE (Forest and Wetland) are concentrated in areas with high vegetation cover, strong precipitation, or complex surface geometry. This spatial–categorical correspondence indicates that the amplification of TM-1 retrieval errors is mainly driven by vegetation masking, precipitation forcing, and roughness enhancement, which is consistent with the leading roles of VWC, Preci. and Rough in the feature-importance ranking and provides empirical support for our qualitative attribution of GNSS-R error sources.

5. Discussion

5.1. Justification of the Random Forest Model and Comparison with Other Algorithms

Building on the LC-cluster + seasonal partitioning configuration described above, we further evaluated the suitability of different machine-learning algorithms for TM-1 soil moisture retrieval. Under the same input features and data organization, three representative gradient boosting tree models (CatBoost, LightGBM, and XGBoost) were introduced as benchmarks for comparison (Table 6 and Table 7). At the cluster level (Table 6), the Random Forest model achieves the highest correlation coefficient (R = 0.8155) among all four algorithms, while simultaneously yielding the lowest ubRMSE, RMSE, and MAE. This indicates that Random Forest provides the best overall balance between trend consistency and error control. CatBoost and LightGBM show clearly weaker performance in both correlation and error metrics. XGBoost is closer to Random Forest, but its R remains lower by about 0.02, and both RMSE and MAE are slightly higher, suggesting that at the cluster scale Random Forest attains a more favorable trade-off between accuracy and robustness.
At the 9 km grid scale (Table 7), the differences among the models become less pronounced, yet Random Forest still attains the highest correlation and the smallest RMSE and MAE, indicating stronger overall error control for fine-scale spatial prediction. Although XGBoost shows a marginal advantage in terms of ubRMSE, the combined performance of Random Forest in correlation and total root-mean-square error is more balanced. Consequently, from cluster to grid scales, Random Forest exhibits a more stable error structure. This is consistent with the spatial generalization advantages of the LC-cluster + seasonal partitioning configuration discussed in Section 4.1, Section 4.2 and Section 4.3, and further supports Random Forest as the more robust choice under the current data organization and temporal splitting framework.
From the perspective of algorithmic mechanisms and data characteristics, the TM-1 reflectivity and multi-source auxiliary variables form a training set that exhibits strong spatial heterogeneity and non-negligible noise at the quasi-global scale, with pronounced differences in feature distributions across land-cover types, climate zones, and seasons. Compared with gradient boosting algorithms that rely on sequentially fitting residuals, Random Forest aggregates multiple subsampled decision trees to capture nonlinear relationships, making it less sensitive to outliers and local noise and less dependent on meticulous hyperparameter tuning. These properties make Random Forest particularly suitable for constructing a unified retrieval model on large, highly heterogeneous global datasets. In addition, by combining Random Forest with permutation importance, we quantitatively assessed the dominant roles of VWC, precipitation, and roughness in Section 4.7, which helps interpret the model behavior from a physical-process standpoint and mitigates the “black-box” nature of the machine learning approach. Considering the cross-scale accuracy comparisons, the robustness of the algorithms to noise and heterogeneity, and the interpretability of feature contributions, we ultimately adopt Random Forest as the core regression model for TM-1 soil moisture retrieval, while treating CatBoost, LightGBM, and XGBoost as important baselines to justify the model choice and to highlight the consistency of our framework across different algorithms.

5.2. Comparison with Existing Spaceborne GNSS-R Soil Moisture Studies

Within the overall framework of existing spaceborne GNSS-R soil moisture retrieval studies, the TM-1 results presented here can be interpreted in the context of the current mainstream performance level. Previous work has typically combined GNSS-R reflectivity with passive microwave soil moisture products and auxiliary meteorological and vegetation variables, and carried out stratified modeling by land-cover type or climate zone on a unified grid, achieving correlations and root-mean-square errors comparable to passive microwave products at spatial resolutions of about 9 km. The review papers by Rahmani et al. [1] and Rohil and Mathur [6] likewise point out that, across most climate regimes, the accuracy of spaceborne GNSS-R soil moisture retrievals can generally approach that of mainstream passive microwave products. In comparison with this typical range, the cluster-level performance obtained in this study under the LC-cluster plus seasonal-splitting configuration (R ≈ 0.82, ubRMSE ≈ 0.069 cm3/cm3, with particularly strong results over Barren and Shrub ecosystems; Section 4.3, Section 4.4, Section 4.5, Section 4.6 and Section 4.7) indicates that the RF retrieval based on TM-1 reflectivity and multi-source auxiliary variables has reached the mainstream level of current spaceborne GNSS-R soil moisture products and exhibits strong dynamic responsiveness in arid and semi-arid regions.
Moreover, most existing studies still rely primarily on CYGNSS or single-constellation GNSS-R missions, for which spatio-temporal coverage and systematic evaluation in mid- and high-latitude regions and complex terrain are relatively limited. Yang et al. [8] and Wang et al. [24], using CYGNSS reconstructed observations, the FY-3 GNSS-R constellation, and regional TM-1 data, have shown that multi-constellation layouts and improved orbital geometries can substantially enhance data availability over the cryosphere and high-latitude regions. Building on a unified 9 km grid and the LC-cluster plus seasonal-splitting framework, our results further demonstrate that TM-1 achieves correlation and error levels in southwestern Canada, the Mongolian Plateau, northeastern China, and parts of mid- to high-latitude Europe that are comparable to those over mid-latitude agricultural belts (Section 4.3, Section 4.4, Section 4.5 and Section 4.6), thereby partly compensating for the coverage gaps of CYGNSS’s low-inclination orbits in these regions. This is consistent with the multi-constellation and regional-network analyses of Setti and Tabibi [22,28] and Arellana et al. [30], and from the perspective of constellation design and coverage patterns further confirms the potential of multi-constellation GNSS-R for soil moisture monitoring in climatically complex regions.

5.3. Limitations and Future Work

Despite the generally good accuracy and spatial generalization achieved by the random forest model under the LC-cluster plus seasonal-splitting configuration, several limitations remain that call for further improvement. First, the training and validation targets rely primarily on SMAP products, so TM-1 retrievals inevitably inherit part of SMAP’s error structure and systematic biases. The current in situ validation, based on a limited number of ISMN sites, mainly constrains the local-scale consistency between SMAP and TM-1 and does not yet provide a fully independent absolute calibration at the global scale. As TM-1 and other spaceborne passive/active microwave missions (e.g., SMOS, FY-3 GNSS-R) continue to accumulate long-term data records, future work can perform systematic bias analyses and cross-calibration among multiple soil moisture products and assess, within land surface models or data assimilation frameworks, the added value of TM-1 retrievals for simulating soil moisture, evapotranspiration, and related water and energy fluxes. In parallel, as the TM-1 operational record extends and multi-year datasets become available, it will be possible to conduct anomaly and trend analyses at annual scales, thereby evaluating the robustness of TM-1 soil moisture retrievals under interannual climate variability and extreme events.
Second, independent validation is still dominated by ISMN stations in Europe and North America, with virtually no available in situ networks in large parts of Africa, South America, and Asia. As a result, the true performance of the model in these regions remains uncertain. Future efforts should therefore seek collaboration with regional hydrological and ecological monitoring networks to expand the in situ validation sample and to perform application-oriented evaluations in specific hydrological and agricultural contexts, for example assessing the contribution of TM-1 retrievals to drought monitoring, crop yield estimation, or diagnosis of precipitation–soil moisture feedbacks. On this basis, it will be important to develop, starting from the unified TM-1 retrieval framework, a family of soil moisture retrieval models that are optimized for specific regions, land-cover types, and application needs, and to extend these models to multi-year TM-1 data in order to analyze interannual variability and long-term trends.
Additionally, to ensure grid consistency and reproducibility for cross-regional modeling, we prioritized a quality-controlled global 9 km clay fraction dataset that is strictly aligned with our retrieval grid. In contrast, the robust incorporation of organic carbon (OC) and other static soil properties at the global scale still requires a dedicated harmonization workflow, including resolution unification to 9 km, unit/definition alignment, and regional bias correction. In line with recent studies showing that DSM-derived soil properties (e.g., clay fraction and OC) can improve soil moisture prediction accuracy [31], we will, in future work, harmonize OC and other key soil property variables and systematically assess their contributions to the performance of TM-1 soil moisture retrievals.
Finally, we note that the uncertainty analysis in this study is mainly based on the spatio-temporal distribution of errors and the performance differences among land-cover types, combined with feature-importance results for a first-order attribution of dominant error sources such as vegetation, hydro-meteorological conditions, and surface geometry. A more formal pixel-scale decomposition of error components into instrument noise, vegetation masking, precipitation-related disturbances, and other factors will be a key focus of future work.

6. Conclusions

This study develops a 9 km global soil-moisture retrieval model using 2023 TM-1 GNSS-R reflectivity fused with multi-source auxiliary variables (NDVI, VWC, precipitation, elevation). By introducing a land-cover-based spatial clustering strategy (LC-cluster) and a seasonal partitioning scheme, the model achieves higher accuracy, better stability, and improved adaptability across diverse ecozones and land-cover types. LC-cluster delivers consistent gains at both the cluster and grid levels (cluster-level R = 0.8155, ubRMSE = 0.0689 cm3/cm3; grid-level median R = 0.5251, ubRMSE ≈ 0.0499 cm3/cm3), with particularly strong performance over Barren and Shrub surfaces (cluster-level R = 0.94 and 0.87; grid-level R = 0.59 and 0.66; minimum ubRMSE = 0.043 cm3/cm3). Seasonal partitioning further enhances generalization by aligning the model with intra-annual climate–soil-moisture dynamics, yielding notable accuracy gains in seasonally driven regions such as Northeast China, southern Africa, and the Argentine Pampas. Overall, the proposed LC-cluster and seasonal partitioning approaches improve predictive accuracy and robustness while maintaining strong temporal adaptability and spatial generalization, providing a practical pathway for global GNSS-R soil-moisture monitoring with TM-1.

Author Contributions

Conceptualization, Y.J. and M.J.; methodology, Y.J.; software, Z.Z.; validation, N.Z.; formal analysis, P.D.; investigation, Q.Z., M.J. and N.Z.; resources, N.Z. and M.J.; data curation, Q.Z., Z.Z. and P.D.; writing—original draft preparation, Y.J. and M.J.; writing—review and editing, M.J. and N.Z.; visualization, Q.Z.; supervision, Z.Z.; project administration, N.Z. and M.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shandong Province Key R&D Program (Competitive Innovation Platform) Project, Grant No. 2024CXPT101, “Research on Key Technologies for Monitoring and Evaluation of Ecological Restoration Effect of Mines in the Yellow River Basin (Shandong Section)”, the Qingdao Science and Technology for the Benefit of People Project, Grant No. 25-1-5-xdny-12-nsh, “High-Standard Farmland Integrated Sky–Aerial–Ground Intelligent Supervision Technology and Application”, the China Postdoctoral Science Foundation (76th General Program), Grant No. 2024M761845, “Airborne GNSS-IR High-Precision Sea-Surface Height Inversion Methods and Model”, the Open Fund of the Key Laboratory of Marine Environmental Survey Technology and Application, Ministry of Natural Resources, Grant No. MESTA-2024-B001, “Spatiotemporally Intelligent Sea-Surface Wind-Speed Retrieval via Multi-source Collaborative Spaceborne GNSS-R Sensing”, the Qingdao Natural Science Foundation (Young Scientists Program), Grant No. 25-1-1-54-zyyd-jch, “Multi-source-Collaborative Spaceborne GNSS-R Sea-Surface Salinity Inversion Methods and Model”, the Shandong Provincial Postdoctoral Innovation Project, Grant No. SDCX-ZG-202502046, “Global Sea-Surface Salinity Inversion via Multi-source Collaborative Spaceborne GNSS-R”, and the National Program for Funding Postdoctoral Researchers (Category B), Grant No. GZB20250067, “Global Sea-Surface Salinity Inversion via Multi-source Collaborative Spaceborne GNSS-R”.

Data Availability Statement

Publicly available datasets were analyzed in this study. The data can be accessed at: TM-1 Level-1 reflectivity data products (https://tmsats.com/satellite); SMAP Enhanced L3 Radiometer Global and Polar Grid Daily 9 km EASE-Grid Soil Moisture, Version 6 (https://doi.org/10.5067/M20OXIZHY3RJ); ERA5-Land hourly dataset (https://cds.climate.copernicus.eu); VIIRS Global NDVI (https://earthexplorer.usgs.gov/); MODIS MCD12C1 Climate Modeling Grid (CMG) (https://search.earthdata.nasa.gov); International Soil Moisture Network (ISMN) (https://ismn.earth/en/).

Acknowledgments

We hereby express our sincere gratitude to Aerospace Tianmu (Chongqing) Satellite Technology Co., Ltd. for providing the Tianmu data for the entire year of 2023.

Conflicts of Interest

Authors Zhihua Zhang, Penghui Ding, and Qian Zhao were employed by the Qingdao Surveying & Mapping Institute. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Rahmani, M.; Asgari, J.; Asgarimehr, M. Soil moisture retrieval using space-borne GNSS reflectometry: A comprehensive review. Int. J. Remote Sens. 2022, 43, 5173–5203. [Google Scholar] [CrossRef]
  2. Kim, H.; Lakshmi, V.; Kwon, Y.; Kumar, S.V. First attempt of global-scale assimilation of subdaily scale soil moisture estimates from CYGNSS and SMAP into a land surface model. Environ. Res. Lett. 2021, 16, 074041. [Google Scholar] [CrossRef]
  3. Yan, Q.; Huang, W.; Jin, S.; Jia, Y. Pan-tropical soil moisture mapping based on a three-layer model from CYGNSS GNSS-R data. Remote Sens. Environ. 2020, 247, 111944. [Google Scholar] [CrossRef]
  4. Chen, F.; Ye, Y.; Liu, L.; Huang, L.; Guo, F.; Chen, Y. A Novel SlidingWindow Algorithm based Approach for Global Bias Correction in CYGNSS Soil Moisture Retrievals. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4414811. [Google Scholar]
  5. Chew, C.C.; Small, E.E. Soil moisture sensing using spaceborne GNSS reflections: Comparison of CYGNSS reflectivity to SMAP soil moisture. Geophys. Res. Lett. 2018, 45, 4049–4057. [Google Scholar] [CrossRef]
  6. Rohil, M.K.; Mathur, S. CYGNSS-derived soil moisture: Status, challenges and future. Ecol. Inform. 2022, 69, 101621. [Google Scholar] [CrossRef]
  7. Kim, H.; Lakshmi, V. Use of cyclone global navigation satellite system (CyGNSS) observations for estimation of soil moisture. Geophys. Res. Lett. 2018, 45, 8272–8282. [Google Scholar] [CrossRef]
  8. Yang, W.; Guo, F.; Zhang, X.; Zhu, Y.; Zhang, Z.; Li, Z.; Mei, D. High-resolution soil moisture and freeze–thaw records toward the third pole using GNSS-R reconstructed observations during 2018–2022. GPS Solut. 2025, 29, 9. [Google Scholar] [CrossRef]
  9. Nguyen, H.H.; Kim, H.; Crow, W.; Yueh, S.; Wagner, W.; Lei, F.; Wigneron, J.P.; Colliander, A.; Frappart, F. From theory to hydrological practice: Leveraging CYGNSS data over seven years for advanced soil moisture monitoring. Remote Sens. Environ. 2025, 316, 114509. [Google Scholar] [CrossRef]
  10. Wang, H.; Yuan, Q.; Zhao, H.; Xu, H. In-situ and triple-collocation based assessments of CYGNSS-R soil moisture compared with satellite and merged estimates quasi-globally. J. Hydrol. 2022, 615, 128716. [Google Scholar] [CrossRef]
  11. Lei, F.; Senyurek, V.; Kurum, M.; Gurbuz, A.C.; Boyd, D.; Moorhead, R.; Crow, W.T.; Eroglu, O. Quasi-global machine learning-based soil moisture estimates at high spatio-temporal scales using CYGNSS and SMAP observations. Remote Sens. Environ. 2022, 276, 113041. [Google Scholar] [CrossRef]
  12. Zhu, Y.; Guo, F.; Zhang, X. Effect of surface temperature on soil moisture retrieval using CYGNSS. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102929. [Google Scholar] [CrossRef]
  13. Li, Y.; Yan, S.; Gong, J.; Xiao, J.; Asgarimehr, M.; Wickert, J. Soil moisture retrieval by a novel hybrid model based on CYGNSS and Sun-induced fluorescence data. J. Hydrol. 2024, 632, 130845. [Google Scholar] [CrossRef]
  14. Bui, H.X.; Li, Y.X.; Sherwood, S.C.; Reid, K.J.; Dommenget, D. Assessing the soil moisture-precipitation feedback in Australia: CYGNSS observations. Environ. Res. Lett. 2023, 19, 014055. [Google Scholar] [CrossRef]
  15. Song, S.; Zhu, Y.; Qu, X.; Tao, T. Spaceborne GNSS-R for sensing soil moisture using CYGNSS considering land cover type. Water Resour. Manag. 2025, 39, 3499–3519. [Google Scholar] [CrossRef]
  16. Dong, Z.; Yan, Q.; Jin, S.; Li, L.; Chen, G. Refined semi-empirical models for soil moisture retrieval in spaceborne GNSS-Reflectometry: Evaluation across diverse land cover types. Measurement 2025, 242, 115849. [Google Scholar] [CrossRef]
  17. Tang, F.; Yan, S. CYGNSS soil moisture estimations based on quality control. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5. [Google Scholar]
  18. Zhu, Y.; Guo, F.; Yang, W.; Zhang, Z.; Li, Z.; Wu, Z.; Zhang, X. Analysis of 9 km quasi-global microwave land surface emissivity estimates derived from SMAP radiometer and CYGNSS reflectometer. Geo-Spat. Inf. Sci. 2025, 1–16. [Google Scholar] [CrossRef]
  19. Zhang, S.; Guo, Q.; Liu, Q.; Ma, Z.; Liu, N.; Hu, S.; Bao, L.; Zhou, X.; Zhao, H.; Wang, L.; et al. Improvement of CYGNSS soil moisture retrieval model considering water and surface temperature. Adv. Space Res. 2023, 72, 3048–3064. [Google Scholar] [CrossRef]
  20. Yang, W.; Guo, F.; Zhang, X.; Zhu, Y.; Li, Z.; Zhang, Z. First quasi-global soil moisture retrieval using Fengyun-3 GNSS-R constellation observations. Remote Sens. Environ. 2025, 321, 114653. [Google Scholar] [CrossRef]
  21. Ma, Z.; Camps, A.; Park, H.; Zhang, S.; Li, X.; Wigneron, J.P. Sensitivity study of multi-constellation GNSS-R to soil moisture and surface roughness using FY-3E GNOS-II data. IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 2024, 18, 413–423. [Google Scholar] [CrossRef]
  22. Setti, P.T., Jr.; Tabibi, S. Evaluation of Spire GNSS-R reflectivity from multiple GNSS constellations for soil moisture estimation. Int. J. Remote Sens. 2023, 44, 6422–6441. [Google Scholar] [CrossRef]
  23. Zhu, Y.; Guo, F.; Zhang, X. Spaceborne GNSS-R soil moisture retrieval from GPS/BDS-3/Galileo satellites. GPS Solut. 2025, 29, 10. [Google Scholar] [CrossRef]
  24. Wang, R.; Li, C.; Zheng, N. Exploration of Tianmu-1 and CYGNSS to estimate soil moisture with GNSS-R in southwest China. Phys. Scr. 2025, 100, 056005. [Google Scholar] [CrossRef]
  25. Ruf, C.; Gleason, S. Spaceborne GNSS-R Bistatic Radar Remote Sensing, CYGNSS, and Future Missions. Proc. IEEE 2025, in press. [Google Scholar] [CrossRef]
  26. Farhad, M.M.; Senyurek, V.; Rafi, M.A.S.; Baray, S.B.; McCraine, C.; Hathcock, L.A.; Adeli, A.; Yanbo, H.; Gurbuz, A.C.; Kurum, M. Integrating UAS-based GNSS-R, LiDAR, and Multispectral Data for Soil Moisture Estimation: Summary of Results from a Three-Year Long Field Campaign. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 16896–16915. [Google Scholar]
  27. Wu, X.; Han, S.; Huang, X.; Cao, K. Soil Moisture Retrieval Using Single-Frequency Dual-Polarization GNSS-R Data From Airborne GLORI Experiment. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 20655–20665. [Google Scholar]
  28. Setti, P.T.; Tabibi, S. Comprehensive analysis of CYGNSS GNSS-R data for enhanced soil moisture retrieval. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 18, 663–679. [Google Scholar] [CrossRef]
  29. Yang, T.; Cong, N. A preliminary view of the CYGNSS soil moisture-vegetation activity linkage. Front. For. Glob. Change 2023, 6, 1320432. [Google Scholar]
  30. Arellana, J.; Grings, F.; Franco, M. Enhanced CyGNSS Soil Moisture retrieval validated by in-situ data in Argentina’s Pampas. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025. early access. [Google Scholar] [CrossRef]
  31. Fathololoumi, S.; Biswas, A. Incorporating digital soil mapping-derived soil properties for enhanced soil moisture prediction. Soil Tillage Res. 2026, 256, 106901. [Google Scholar] [CrossRef]
Figure 1. TM-1 Delay–Doppler Maps (DDMs): (a) ocean and (b) land.
Figure 2. Spatial distribution of SMAP soil moisture on 28 August 2023. The color bar below the map represents the soil moisture values and their corresponding color range.
Figure 3. Spatial distribution of ISMN in situ stations. Green dots represent the ISMN surface soil moisture observation sites (0–5 cm) used for independent validation in this study.
Figure 4. Overall methodological framework. “M” denotes month in 2023; under the time-sequence split, 1–8M (January–August) are used for training and 9–12M (September–December) for testing; under the seasonal-sequence split, 1–4M, 6–7M, and 9–10M are used for training, while 5M, 8M, 11M, and 12M are held out for testing.
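The two month-based splits described in the Figure 4 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the month sets come from the caption, while the function name `assign_split` and the sample dates are hypothetical.

```python
from datetime import date

# Month sets taken from the Figure 4 caption (all months are in 2023).
TIME_SEQUENCE = {"train": set(range(1, 9)), "test": set(range(9, 13))}
SEASONAL_SEQUENCE = {
    "train": {1, 2, 3, 4, 6, 7, 9, 10},
    "test": {5, 8, 11, 12},
}


def assign_split(day, scheme):
    """Return 'train' or 'test' for an observation date under a month-based scheme."""
    for name, months in scheme.items():
        if day.month in months:
            return name
    raise ValueError(f"month {day.month} is not covered by the scheme")


# Hypothetical observation dates, used only to exercise the function.
samples = [date(2023, 5, 14), date(2023, 6, 2), date(2023, 11, 30)]
time_labels = [assign_split(d, TIME_SEQUENCE) for d in samples]
seasonal_labels = [assign_split(d, SEASONAL_SEQUENCE) for d in samples]
```

Under the seasonal-sequence split each held-out month falls in a different part of the year, whereas the time-sequence split tests only on the last four months.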
Figure 5. Comparison of ubRMSE (cm3/cm3) and R between TM-1 and SMAP soil moisture during the testing period under two evaluation schemes: (a) cluster-level and (b) grid-level. The teal curve denotes the ubRMSE metric, and the orange curve denotes the correlation coefficient R; vertical error bars indicate the interquartile range (25–75%), reflecting performance dispersion across cluster or grid scales.
Figure 6. Test results of TM-1 and SMAP soil moisture under different data-splitting schemes. Panels (a–d) show the cluster-level evaluation metrics (R, RMSE, ubRMSE, and MAE) for the LC-cluster and ONE-cluster strategies, while panels (e–h) present the corresponding grid-level evaluation metrics. Yellow and pink bars denote the Time Sequence and Seasonal Sequence splits, respectively.
Figure 7. Spatial distribution of test evaluation metrics for TM-1 and SMAP soil moisture under the LC-cluster strategy with seasonal partitioning. Panels (a–d) show R, RMSE (cm3/cm3), ubRMSE (cm3/cm3), and MAE (cm3/cm3), respectively. The color bar beneath each panel indicates the metric values and their corresponding color ranges.
Figure 8. Comparison of TM-1 and SMAP soil moisture predictions against in situ observations at PSA6Plaenterwald and Sandstone-6-W stations (May, August, November, and December 2023). The left y-axis denotes soil moisture, while the right y-axis denotes precipitation.
Figure 9. Scatterplots of TM-1 and SMAP soil moisture predictions versus in situ ISMN observations in the test months (May, August, November, December 2023). The red dashed line represents the 1:1 consistency line (y = x). The right-hand color bar maps soil moisture values to their corresponding colors.
Figure 10. Spatial distribution of TM-1 and SMAP soil moisture and evaluation metrics (ubRMSE (cm3/cm3) and R) for the May test set. The color bar beneath the soil moisture panels indicates the soil moisture values and their corresponding color ranges; the color bars beneath the evaluation-metric panels indicate the respective metric values and their corresponding color ranges.
Figure 11. Spatial distribution of TM-1 and SMAP soil moisture and evaluation metrics (ubRMSE (cm3/cm3) and R) for the August test set. The color bar beneath the soil moisture panels indicates the soil moisture values and their corresponding color ranges; the color bars beneath the evaluation-metric panels indicate the respective metric values and their corresponding color ranges.
Figure 12. Spatial distribution of TM-1 and SMAP soil moisture and evaluation metrics (ubRMSE (cm3/cm3) and R) for the November test set. The color bar beneath the soil moisture panels indicates the soil moisture values and their corresponding color ranges; the color bars beneath the evaluation-metric panels indicate the respective metric values and their corresponding color ranges.
Figure 13. Spatial distribution of TM-1 and SMAP soil moisture and evaluation metrics (ubRMSE (cm3/cm3) and R) for the December test set. The color bar beneath the soil moisture panels indicates the soil moisture values and their corresponding color ranges; the color bars beneath the evaluation-metric panels indicate the respective metric values and their corresponding color ranges.
Figure 14. Seasonal mean spatial distribution of the differences (TM-1 − SMAP) between TM-1 and SMAP soil moisture retrievals during the 2023 test period (May, August, November, and December). The color bar ranges from −0.30 to 0.30 cm3/cm3; purple indicates that TM-1 is lower than SMAP, while green indicates that TM-1 is higher than SMAP.
Figure 15. Test-set performance of TM-1 soil moisture retrievals against SMAP under different land-cover types for the LC-cluster plus seasonal-splitting configuration. Purple bars denote cluster-level results (Cluster-level), and green bars denote grid-level results (Grid-level). The metrics shown are the correlation coefficient (R) and the unbiased root-mean-square error (ubRMSE).
Table 1. Key parameters of TM-1 GNSS-R observations. Units: “–” denotes dimensionless; ° denotes degrees; s denotes seconds; dB denotes decibels. Abbreviations: DDM, Delay–Doppler Map; SNR, signal-to-noise ratio; Rx, receiver.
| Name | Units | Description |
|---|---|---|
| Ddm_sp_reflectivity | – | Specular point reflectivity |
| Sp_inc_angle | ° | Specular point incidence angle |
| Ddm_kurtosis | – | DDM kurtosis |
| Ddm_skewness | – | DDM skewness |
| Sp_lat | ° | Specular point latitude |
| Sp_lon | ° | Specular point longitude |
| Ddm_time_utc | s | DDM sample time UTC |
| Sp_antenna_gain | dB | Specular point Rx antenna gain |
| Ddm_sp_snr | dB | DDM specular point SNR |
| Ddm_quality_flag | – | DDM quality flag |
| Sp_land_sea_mask | – | Specular point land–sea mask |
Table 2. Statistics of different clustering strategies for global coverage. 72KM, 288KM, 720KM, 1080KM, 1440KM, and 2250KM clusters represent box-shaped clustering methods with different side lengths at regular intervals. LC-cluster refers to clustering based on land cover types. ONE-cluster represents a single model trained with all global grid cells. The sample count refers to the number of training samples.
| Clustering Strategy | 72KM | 288KM | 720KM | 1080KM | 1440KM | 2250KM | LC | ONE |
|---|---|---|---|---|---|---|---|---|
| Number of Models | 26,720 | 2482 | 535 | 277 | 169 | 83 | 7 | 1 |
| Average Samples per Model | 436 | 4640 | 21,540 | 41,615 | 6.82 × 10^4 | 1.39 × 10^5 | 1.44 × 10^6 | 1.15 × 10^7 |
| Standard Deviation of Samples per Model | 499 | 5648 | 27,029 | 5.24 × 10^4 | 8.74 × 10^4 | 1.58 × 10^5 | 1.84 × 10^6 | / |
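One way to read the box-shaped clustering in Table 2 is as integer binning of the 9 km grid indices. The sketch below assumes boxes aligned to the grid origin and side lengths that are multiples of 9 km; the paper does not specify the box layout, so `box_cluster_id` and the example cells are hypothetical.

```python
GRID_KM = 9  # grid resolution of the retrieval product, as stated in the paper


def box_cluster_id(row, col, side_km):
    """Map a grid cell (row, col) to the index of the square box that contains it.

    Assumption: boxes are aligned to the grid origin (row 0, col 0).
    """
    if side_km % GRID_KM != 0:
        raise ValueError("box side must be a multiple of the grid spacing")
    cells_per_side = side_km // GRID_KM
    return (row // cells_per_side, col // cells_per_side)


# Hypothetical grid cells, used only to exercise the function.
cells = [(0, 0), (7, 7), (8, 0), (100, 250)]
ids_72 = [box_cluster_id(r, c, 72) for r, c in cells]      # 8 x 8 cells per box
ids_2250 = [box_cluster_id(r, c, 2250) for r, c in cells]  # 250 x 250 cells per box
```

All six side lengths in Table 2 are multiples of 9 km, so each box holds a whole number of grid cells under this assumption. One model would then be trained per distinct box index.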
Table 3. Sample statistics and TM-1/SMAP–ISMN validation metrics for each ISMN network and land-cover type.
Network | Land-Cover Type: Forest | Shrub | Grass | Cultivated | Barren | Urban
Berlin22820211442
CW3E10232378
FMI49
REMEDHUS1240100
RSMN172
SCAN3778514978748812
SMOSMANIA167552
SNOTEL12613683261510
SOILSCAPE98128
USCRN30312411913304641
XMS-CAT71851
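| Metric | Forest | Shrub | Grass | Cultivated | Barren | Urban |
|---|---|---|---|---|---|---|
| n_samples | 1159 | 455 | 4173 | 1648 | 1495 | 15 |
| R(TM1–ISMN) | 0.531 | 0.7488 | 0.519 | 0.5375 | 0.7581 | 0.6057 |
| R(SMAP–ISMN) | 0.5814 | 0.7189 | 0.6121 | 0.5618 | 0.7737 | 0.6524 |
| ubRMSE(TM1–ISMN) | 0.0869 | 0.0652 | 0.0919 | 0.0845 | 0.0581 | 0.704 |
| ubRMSE(SMAP–ISMN) | 0.0834 | 0.0685 | 0.0841 | 0.0838 | 0.0572 | 0.0673 |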
Table 4. Feature importance ranking in the TM-1 soil moisture estimation model.
| Feature | Importance Mean | Importance Std |
|---|---|---|
| VWC | 0.16673 | 0.00079 |
| Precip. | 0.15280 | 0.00058 |
| Rough | 0.12448 | 0.00066 |
| LST | 0.11625 | 0.00065 |
| Clay | 0.10914 | 0.00068 |
| Elev. | 0.05429 | 0.00034 |
| NDVI | 0.01174 | 0.00018 |
Table 5. Test-set sample statistics and cluster-level performance for each land-cover class.
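| LC_type | N_test | mean_SMAP | std_SMAP | R | ubRMSE |
|---|---|---|---|---|---|
| Forest | 2,025,259 | 0.3577 | 0.1319 | 0.7227 | 0.0926 |
| Shrub | 542,109 | 0.2057 | 0.1037 | 0.8656 | 0.0533 |
| Grass | 3,606,260 | 0.2677 | 0.1168 | 0.7669 | 0.0754 |
| Wetland | 100,491 | 0.2991 | 0.1014 | 0.7140 | 0.0717 |
| Cultivated | 857,491 | 0.2738 | 0.1155 | 0.8155 | 0.0670 |
| Urban | 88,017 | 0.2505 | 0.1198 | 0.8180 | 0.0689 |
| Barren | 216,146 | 0.1222 | 0.1254 | 0.9395 | 0.0433 |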
Notes: N_test is the number of test samples; mean_SMAP and std_SMAP denote the mean and standard deviation of SMAP soil moisture within the test set; R and ubRMSE are the cluster-level correlation coefficient and unbiased root-mean-square error between TM-1 and SMAP soil moisture, respectively. All statistics are computed on the TM-1 test period (May, August, November, and December 2023).
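The R and ubRMSE metrics named in the notes can be computed as sketched below. This uses the standard definitions (Pearson correlation, and RMSE after removing each series' mean), which are assumed here because this section does not restate the formulas. The function names and sample values are hypothetical.

```python
import math


def pearson_r(pred, ref):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(pred)
    mp, mr = sum(pred) / n, sum(ref) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, ref))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_r = sum((r - mr) ** 2 for r in ref)
    return cov / math.sqrt(var_p * var_r)


def bias(pred, ref):
    """Mean difference (pred - ref)."""
    return sum(p - r for p, r in zip(pred, ref)) / len(pred)


def rmse(pred, ref):
    """Root-mean-square error."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def ubrmse(pred, ref):
    """Unbiased RMSE: RMSE computed after subtracting each series' own mean."""
    n = len(pred)
    mp, mr = sum(pred) / n, sum(ref) / n
    return math.sqrt(sum(((p - mp) - (r - mr)) ** 2 for p, r in zip(pred, ref)) / n)


# Hypothetical soil moisture values in cm3/cm3: the prediction has a constant wet bias.
ref = [0.10, 0.20, 0.30, 0.25]
pred = [r + 0.05 for r in ref]
```

With these definitions RMSE² = ubRMSE² + bias², so a constant offset raises RMSE but leaves ubRMSE and R unchanged. By the same identity, a row whose RMSE and ubRMSE coincide at the reported precision, as for Random Forest in Table 6, corresponds to a near-zero mean bias.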
Table 6. Cluster-level performance comparison of Random Forest and three gradient boosting tree models under the LC-cluster + seasonal partitioning configuration.
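| Model | R | ubRMSE | RMSE | MAE |
|---|---|---|---|---|
| CatBoost | 0.7294 | 0.0756 | 0.0772 | 0.0579 |
| LightGBM | 0.7919 | 0.0716 | 0.0733 | 0.0542 |
| XGBoost | 0.7948 | 0.0707 | 0.0726 | 0.0533 |
| Random Forest | 0.8155 | 0.0689 | 0.0689 | 0.0511 |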
Table 7. Grid-level performance comparison of Random Forest and three gradient boosting tree models under the LC-cluster + seasonal partitioning configuration.
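| Model | R | ubRMSE | RMSE | MAE |
|---|---|---|---|---|
| CatBoost | 0.4820 | 0.0502 | 0.0692 | 0.0578 |
| LightGBM | 0.5033 | 0.0498 | 0.0662 | 0.0548 |
| XGBoost | 0.5202 | 0.0493 | 0.0648 | 0.0535 |
| Random Forest | 0.5251 | 0.0499 | 0.0621 | 0.0506 |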

Share and Cite

MDPI and ACS Style

Jin, Y.; Ji, M.; Zheng, N.; Zhang, Z.; Ding, P.; Zhao, Q. Soil Moisture Retrieval from TM-1 GNSS-R Reflections with Auxiliary Geophysical Variables: A Multi-Cluster and Seasonal Evaluation. Land 2026, 15, 36. https://doi.org/10.3390/land15010036