1. Introduction
As one of the world’s most pivotal food crops, rice serves as the primary caloric source for over half of the global population, playing a vital role in hunger alleviation efforts [
1]. Accurate rice mapping is crucial for addressing global food security challenges, supporting rice growth monitoring and yield forecasting—particularly vital under increasing climate variability [
2]. With advances in remote sensing technology, large-scale rice mapping has been achieved using optical and/or SAR imagery [
3,
4,
5]. However, most existing studies rely on complete time-series data spanning the entire growth cycle for retrospective mapping [
6,
7,
8]. This introduces a significant time lag, limiting their utility for rapid in-season damage assessment following sudden events such as floods or typhoons. Accurate in-season rice mapping is therefore essential for timely damage assessment and post-disaster recovery [
9].
Current in-season identification approaches rely primarily on historical data and fall into three categories: phenology-based methods, deep learning models, and traditional machine learning methods. Phenology-based rice mapping methods exploit the unique phenological characteristics of rice by applying thresholds derived from temporal vegetation index curves or segmentation algorithms. Multiple indices have been developed for this purpose, including the Temporal Spectral Descriptor (TSD) [
10], composite dual-threshold strategies combining radar backscatter and vegetation dynamics [
11], the 3-Sigmoid index (SSSI) [
12], and the cumulative spectral and phenological characteristics index (CSP) [
13]. Additionally, approaches assessing time-series similarity, such as Time-Weighted Dynamic Time Warping (TWDTW) [14], Moving Average Convergence Divergence (MACD) [15], and standard growth curve matching [16], complement threshold-based identification by extracting reference phenological curves. However, most curve-matching and similarity-assessment methods (such as TWDTW) are inherently retrospective: they require full-season observations to construct complete crop profiles, which prevents them from capturing transient early-season transplanting signals or supporting dynamic, month-scale sample updates. While these phenology-based approaches are computationally efficient, their dependence on localized prior knowledge and their sensitivity to environmental variability limit their applicability to large-scale in-season mapping.
Deep learning and traditional machine learning methods effectively overcome the limitations of hand-crafted thresholds through their self-learning capabilities. However, despite possessing exceptional non-linear feature extraction capabilities, deep learning models are limited by their extreme reliance on massive high-quality labeled samples and high computational overhead during large-scale spatial operations. Therefore, for large-scale in-season rice mapping, traditional machine learning methods, which offer computational efficiency and relatively controllable sample requirements, currently remain the mainstream strategy. Meanwhile, both deep learning and machine learning methods are affected by the quality and acquisition timing of training data. Sample acquisition methods based on field collection and visual interpretation are commonly employed [
17,
18,
19,
20,
21,
22,
23,
24], but often suffer from severe timing lags. For instance, Zhou et al. [
25] and Fontanelli et al. [
26] implemented advanced transformer or convolutional architectures for crop recognition, but their frameworks required training samples collected at crop maturity or months after sowing, introducing delays that hinder the prompt deployment of in-season classifiers. To reduce reliance on in-season samples, recent research explores historical data through two strategic directions: model transfer and label transfer. Model transfer approaches involve training classifiers on historical crop maps and applying them to new seasons or regions [
27,
28,
29,
30,
31,
32,
33], often guided by historical crop indices [
34] or cascade filtering strategies [
35] to advance early crop classification under target domain label scarcity [
36,
37]. Label transfer methods generate training labels from historical classification data using advanced statistical or probabilistic modeling [
38,
39,
40]. For instance, Zhang et al. [
41] and Li et al. [
42] leveraged multi-year historical data and crop rotation patterns to produce training labels for target-year modeling. At the same time, unsupervised clustering approaches have been introduced as label-free alternatives to delineate crop distributions without requiring in-season ground truth [
43,
44,
45,
46]. While label-free methods lessen the reliance on sample annotation, this advantage comes at the cost of reduced classification accuracy. Therefore, the use of historical samples remains a common practice in most current studies. Nevertheless, both traditional machine learning and deep learning approaches based on historical data remain dependent on static sample sets. Crucially, these static labels fail to capture the spatiotemporal variations in planting schedules across different regions. Consequently, during the early in-season period, fields still undergoing land preparation are frequently misclassified as already planted, compromising the accuracy and reliability of early-stage classification.
Studies have leveraged a wide range of features through multi-sensor integration strategies for crop monitoring and classification [
15,
19,
21,
23,
29,
30,
45]. For example, Tiwari et al. [
11] employed Sentinel-1 (S1) VV backscatter to identify the rice transplanting stage and Sentinel-2 (S2) NDVI to detect maturity. Additionally, Rußwurm et al. [
47] incorporated all available bands from S1, S2, and PlanetScope, along with associated vegetation indices, for model training. These approaches reflect a growing trend toward utilizing a multitude of features from complementary remote sensing sources to enhance model performance in agricultural applications. However, the use of numerous features may lead to significant redundancy (e.g., multicollinearity), which can compromise model performance. Thus, identifying an optimal feature subset is crucial for enhancing efficiency and accuracy. Current approaches typically evaluate feature importance across the entire growing season to develop streamlined feature sets. For instance, Guo et al. [
19] ranked features by importance and selected VH backscatter along with four vegetation indices—NDVI, EVI, NGRDI, and LSWI—from a larger feature set, while Wang et al. [
48] applied the Boruta algorithm to remove irrelevant features by comparing the importance of original attributes with their randomly shuffled counterparts. However, these methods often fail to account for the varying contributions of different phenological stages to crop identification. This is particularly relevant for rice, which exhibits characteristic flooding signals during the transplanting phase.
To address these limitations, this study introduces two core innovations. First, moving beyond conventional static historical sample sets, which fail to capture regional variations in planting schedules, we develop a dynamic threshold-driven pseudo-sample generation mechanism. This mechanism tracks changes in rice planting status so that high-confidence training data are updated continuously during the early season. Second, unlike standard feature selection approaches that evaluate importance uniformly across the entire growing season, we introduce a phenologically optimized feature weighting scheme. This scheme applies exponential weighting functions to multi-source remote sensing features at key growth stages, amplifying stage-specific diagnostic indicators such as the characteristic flooding signals during the transplanting phase.
Based on these mechanisms, the proposed Multi-Source Dynamic Sample Generation and Phenology-Guided Feature Selection Framework for In-Season Rice Identification (MSDF-RiceID) enables month-scale rice mapping via a grid-search-tuned Random Forest classifier. The objectives of this study are to: (i) generate and update rice samples dynamically based on historical data; (ii) optimize features through adaptive thresholds and feature selection; and (iii) achieve monthly rice mapping using a fine-tuned Random Forest classifier.
3. Methods
The in-season rice mapping framework proposed in this study integrated rice phenological characteristics into both sample selection and feature optimization through multi-source data fusion. As illustrated in
Figure 3, the methodological framework consisted of six sequential components: (1) data preprocessing to ensure consistency across sensor inputs, (2) vegetation index calculation for feature extraction, (3) rule-based generation of high-confidence pseudo-samples, (4) phenology-guided feature optimization, (5) Random Forest (RF) training with hyperparameter tuning, and (6) accuracy assessment.
3.1. Data Preprocessing
The multi-source data processing pipeline incorporated several key steps to ensure data quality and consistency. The Sentinel-1 GRD data were processed using the Refined Lee filter to reduce speckle noise while preserving edge features [
54]. For Sentinel-2 imagery, only acquisitions with cloud cover below 40% were retained for analysis, with cloudy pixels reconstructed through temporal linear interpolation to maintain spectral continuity. To address the temporal resolution limitations of Sentinel-2, MODIS data were incorporated and upsampled to 10 m resolution via bilinear interpolation to establish a unified feature space for the pixel-based Random Forest classifier. Although upsampling 500 m data to 10 m introduces mixed-pixel effects rather than genuine fine-scale spatial detail, it is necessary to preserve the native 10 m resolution of the Sentinel sensors. All remote sensing time-series data underwent monthly median compositing to (1) guarantee sufficient valid observations throughout the rice growth cycle, (2) suppress outliers more effectively than mean filtering, and (3) minimize weather-induced noise, which is particularly important given southern China’s persistently cloudy conditions [
29,
55]. Finally, NASADEM elevation data were upsampled to 10 m resolution using cubic convolution to match the target spatial scale, ensuring proper integration with other datasets for terrain analysis of paddy rice.
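The monthly median compositing step can be illustrated for a single pixel with a minimal sketch. The NDVI values, the acquisition dates, and the function name `monthly_median_composite` are hypothetical and serve only to show why a median resists an undetected cloud better than a mean; this is not the GEE implementation used in the study.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median


def monthly_median_composite(observations):
    """Return {(year, month): median value} for one pixel.

    observations: iterable of (date, value) pairs; value is None for
    cloud-masked or otherwise invalid acquisitions.
    """
    by_month = defaultdict(list)
    for acquired, value in observations:
        if value is not None:  # skip masked observations
            by_month[(acquired.year, acquired.month)].append(value)
    return {month: median(values) for month, values in sorted(by_month.items())}


# Hypothetical NDVI series; 0.05 mimics an undetected cloud in May.
series = [
    (date(2023, 5, 3), 0.41),
    (date(2023, 5, 13), 0.05),
    (date(2023, 5, 23), 0.45),
    (date(2023, 6, 2), None),
    (date(2023, 6, 12), 0.62),
]
composite = monthly_median_composite(series)

# For comparison: the May mean is pulled down by the outlier.
may_mean = mean(v for d, v in series if d.month == 5 and v is not None)
```

In this toy series the May median stays at a plausible vegetation value, whereas the May mean is pulled toward the contaminated observation.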
3.2. Feature Construction
Eight polarization features were derived from the Sentinel-1 GRD VV and VH backscatter coefficients (
Table 4). Notably, this study proposes a novel Polarization Ratio Index (PRI), specifically designed to capture the unique temporal backscatter signature of paddy rice, defined as
$$\mathrm{PRI} = \frac{VV \times VH}{VV + VH}$$
where VV and VH are the backscatter coefficients expressed as linear power intensity. By nonlinearly combining dual-polarization information in the linear domain, the PRI significantly enhances sensitivity to rice phenological development. As illustrated in
Figure 4, the PRI exhibits a distinct dynamic trajectory throughout the growth cycle. During the early flooding stage (transplanting), the specular reflection from the water surface yields minimum values for both VV and VH. Because the product of two small values in the numerator decreases more rapidly than their sum in the denominator, the PRI reaches its minimum during this phase. As rice progresses through vegetative stages, the increasing canopy complexity and multiple scattering effects drive a rapid increase in both backscatter coefficients. In this growth phase, the multiplicative numerator increases more rapidly than the additive denominator, thereby driving the PRI sharply upward. This characteristic “V-shaped” trajectory enables precise tracking of rice development from transplanting to maturity.
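The behaviour described above can be checked numerically with a short sketch. It assumes the PRI takes the product-over-sum form implied by the description (product of VV and VH in the numerator, their sum in the denominator, both in linear power). The dB values for the flooded and canopy stages are hypothetical, and the helper names `db_to_linear` and `pri` are illustrative.

```python
def db_to_linear(db):
    """Convert a backscatter coefficient from dB to linear power."""
    return 10.0 ** (db / 10.0)


def pri(vv_db, vh_db):
    """Product-over-sum index in the linear domain (assumed form of the PRI)."""
    vv = db_to_linear(vv_db)
    vh = db_to_linear(vh_db)
    return (vv * vh) / (vv + vh)


# Hypothetical backscatter values (dB) for two growth stages.
flooded = pri(-16.0, -22.0)  # specular reflection at transplanting
canopy = pri(-9.0, -15.0)    # volume scattering during vegetative growth
```

With these illustrative inputs the index is several times smaller under flooding than under a developed canopy, which matches the minimum-then-rise trajectory described in the text.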
Distinct spectral signatures across various land cover types provide essential information for accurate classification. To leverage these differences, we computed eight optical indices (
Table 5) from Sentinel-2 imagery specifically designed to discriminate key land cover classes (water bodies, croplands, natural vegetation, and built-up areas). These spectral features (including both the computed indices and original bands) were integrated into model training. To mitigate temporal gaps caused by Sentinel-2’s 5-day revisit cycle and frequent cloud cover in subtropical regions, we incorporated MODIS products, which are 8-day composites generated from daily observations. Due to spectral band differences between sensors, two additional MODIS-specific vegetation indices were derived to ensure data consistency (
Table 5). Furthermore, elevation and slope parameters generated from NASADEM were included to account for topographic influences on rice cultivation, particularly important in terraced landscapes of Hunan Province.
3.3. High Confidence Pseudo Sample Generation and Update
3.3.1. High-Confidence Sample Generation from Historical Products
The stringent temporal requirements of in-season crop monitoring present significant challenges in acquiring reliable and timely rice samples. To address this constraint, we developed a robust sampling methodology that capitalizes on historical rice mapping. The approach employs three consecutive years (2020–2022) of rice classification data from the TWDTW-Rice product, retaining only those pixels consistently identified as rice throughout the three years to guarantee temporal reliability. To further refine sample quality, we implemented the ESA WorldCover 10 m land cover dataset as an agricultural mask, thereby excluding non-cropland areas. Through this integrated validation framework—combining multi-temporal consistency checks with spatial filtering—we established a high-confidence reference dataset appropriate for in-season crop monitoring.
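The multi-year consistency check and cropland masking reduce to a per-pixel logical AND, sketched below on toy grids. The 2 × 3 rasters and the function name `high_confidence_rice` are hypothetical; in practice the inputs would be the three annual TWDTW-Rice layers and the ESA WorldCover cropland class.

```python
def high_confidence_rice(rice_maps, cropland_mask):
    """Keep pixels labelled rice in every annual map and lying on cropland."""
    rows, cols = len(cropland_mask), len(cropland_mask[0])
    return [
        [
            all(year_map[r][c] for year_map in rice_maps)
            and bool(cropland_mask[r][c])
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical binary rice maps (1 = rice) for 2020, 2021, and 2022.
rice_2020 = [[1, 1, 0], [1, 0, 1]]
rice_2021 = [[1, 0, 0], [1, 1, 1]]
rice_2022 = [[1, 1, 0], [1, 1, 1]]
# Hypothetical cropland mask (1 = cropland).
cropland = [[1, 1, 1], [0, 1, 1]]

samples = high_confidence_rice([rice_2020, rice_2021, rice_2022], cropland)
```

Only pixels that are rice in all three years and fall on cropland survive, so a pixel that is persistently mapped as rice but lies outside the cropland mask is discarded.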
3.3.2. Dynamic Threshold-Based Pseudo Sample Update
The dynamic threshold (DT) algorithm, previously validated for estimating rice transplanting dates using time-series Sentinel-1 GRD imagery [67], becomes more accurate as the number of images acquired after transplanting increases, that is, with more days after transplanting (DAT). Building on the high-confidence samples derived from historical rice maps, we applied the DT algorithm to further update the samples according to their transplanting dates (
Figure 5).
To support temporal tracking, we added the Day of Year (DOY) as an additional band to each image, enabling precise monitoring of phenological stages. Furthermore, to better characterize temporal dynamics, we calculated the difference ($\mathrm{Diff}_t$) in backscatter coefficients between consecutive observations with a 12-day revisit cycle, defined as
$$\mathrm{Diff}_t = \sigma_t - \sigma_{t-12}$$
where $\sigma_t$ represents the backscatter coefficient at time $t$ (DOY).
The dynamic threshold algorithm comprises four steps:
(1) Identify the dates at which the VH backscatter coefficient of the current pixel drops below −18 dB. This threshold, derived from the rice backscatter coefficient curve (Figure 4), captures the “flooding signals” during the transplanting period: the VH backscatter on the transplanting date is below −18 dB, as shown in Figure 4.
(2) Extract the dates at which the backscatter difference (Diff) reaches the adaptive difference threshold and the subsequent Diff exceeds 0 dB. As illustrated in Figure 5, the difference threshold operates as a pixel-level adaptive threshold to ensure sufficient temporal nodes (Diff_num) for capturing the rice growth trajectory. Initialized at 2 dB, the threshold iteratively decreases by 0.1 dB if Diff_num < 3, which accommodates natural intra-class variations in backscatter increments. This relaxation loop terminates when Diff_num ≥ 3 is satisfied or the 1 dB lower bound is reached, below which the pixel is rejected as non-rice. The second criterion reflects the analysis of rice backscatter curves, which revealed that the VH backscatter on the second post-transplanting date increases relative to that on the first post-transplanting date.
(3) Extract the date set that satisfies the criteria of Steps 1 and 2. This set contains potential key nodes of rice phenology.
(4) Iterate through the date set and extract the date $t_x$ that satisfies all of the conditions listed below. When $t_x$ satisfies all conditions, the loop terminates and $t_x$ is designated as the post-transplanting date, with the transplanting date assigned relative to $t_x$. Otherwise, the loop continues iterating through the entire range. The conditions applied in this step are:
(i) The VH backscatter is ≥−21 dB and ≤−14 dB. From the second post-transplanting date to the maturity period, the VH backscatter typically ranges between −21 dB and −14 dB, as shown in Figure 4.
(ii) The number of observations between $t_x$ and $t_{max}$ at which Diff < 0 stays within the allowed threshold, where $t_{max}$ denotes the DOY of the maximum VH backscatter coefficient ($VH_{max}$) observed from $t_x$ until the end of the observation period. A Diff below 0 represents a decrease (e.g., in soil moisture) caused by rainfall. While the previous study [67] used a threshold of 3, this study uses a threshold of 5 due to the high rainfall characteristic of Hunan Province.
(iii) The rise in VH backscatter up to $VH_{max}$ is ≥4 dB. This threshold promotes a strong positive trend in the temporal profile of the VH backscatter coefficient during the growing season (Figure 4).
Through the above multi-level conditional screening, this study constructed a high-confidence pseudo sample set integrating time-series dynamic features. By dynamically adjusting thresholds and time-series constraints, this method effectively overcomes the insufficient adaptability of single static thresholds to different years and rice types. The generated samples can be directly applied to cross-year transfer model training. Compared with traditional static samples, those extracted by the dynamic threshold algorithm can reflect the in-season rice planting status, reduce model contamination from unplanted rice pseudo samples, and their accuracy progressively improves with the accumulation of time-series data.
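The adaptive relaxation of the difference threshold described above (start at 2 dB, step down by 0.1 dB while fewer than three temporal nodes are found, reject the pixel below 1 dB) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name, the use of "greater than or equal" when counting nodes, and the example difference series are assumptions.

```python
def relax_diff_threshold(diffs_db, start_db=2.0, floor_db=1.0,
                         step_db=0.1, min_nodes=3):
    """Lower the threshold until at least `min_nodes` differences reach it.

    Returns (threshold, node_count), or (None, 0) when the pixel is
    rejected because the lower bound is reached without enough nodes.
    """
    # Iterate over integer step counts to avoid floating-point drift.
    n_steps = int(round((start_db - floor_db) / step_db))
    for k in range(n_steps + 1):
        threshold = round(start_db - k * step_db, 10)
        # ASSUMPTION: a node is a difference at or above the threshold.
        n_nodes = sum(1 for d in diffs_db if d >= threshold)
        if n_nodes >= min_nodes:
            return threshold, n_nodes
    return None, 0


# Hypothetical 12-day VH differences (dB) for two pixels.
rice_like = [2.5, 1.75, 1.55, 0.3]
flat_pixel = [0.5, 0.2, -0.1]

rice_result = relax_diff_threshold(rice_like)
flat_result = relax_diff_threshold(flat_pixel)
```

For the rice-like series the loop stops once the threshold has relaxed far enough to admit three increments, while the flat series never yields three nodes above 1 dB and is rejected.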
3.4. Exponentially Weighted Feature Selection
While high-dimensional feature sets enhance classification robustness, they also raise the risk of model overfitting and increase computational complexity. In addition, feature importance varies significantly across rice phenological stages. Therefore, we implemented an exponentially weighted feature selection method that incorporates rice phenological knowledge and temporally integrates feature importance scores across phenological stages into a single comprehensive importance measure. Based on the rice phenological calendar (
Figure 2), we divided the growth cycle into three critical phases: (1) sowing–transplanting, (2) vegetative growth, and (3) maturity.
The weighted importance scores of features were calculated using an exponential weighting approach based on the time span of each phenological stage, the decaying weight, and the original feature importance scores. The resulting scores were then used to select an optimal training feature set that maintains classification performance while mitigating model overfitting and improving computational efficiency.
The calculation formula is as follows:
where
represents the
-th phenological period, and
represents the importance scores of features in
-th phenological period. The phenological periods comprised three key phases (sowing–transplanting, vegetative growth, and maturity), which are illustrated in
Figure 2. For each phase, we quantitatively evaluated feature importance (
) using GEE’s ee.Classifier.explain() function, analyzing features from multiple data sources (Sentinel-1 and -2, MODIS). Importance scores were normalized within each phenological stage to allow cross-stage comparison.
The
is the normalized weight of
. The formulas are as follows:
where
is the adjusted time span of each phenological stage, which is calculated using Equation (5):
where
denotes the time span of the
-th phenological period;
is the adjustment factor. The
denotes the decaying weight of the
-th phenological period, and
is the decay coefficient. To balance information entropy and phenological prioritization while adhering to mathematical parsimony (avoiding dataset-specific over-tuning), both
and
were set to 0.5 in this study.
By highlighting the unique phenological characteristics of rice during the transplanting and early vegetative stages, this early-season optimization strategy ensures the selection of the most suitable feature combination for in-season recognition, while effectively maintaining model simplicity and efficiency.
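A minimal sketch of how stage-wise importance scores might be combined is given below. The exact functional forms are not reproduced here, so the sketch rests on explicit assumptions: the adjusted time span is taken as the span raised to the adjustment factor, the decaying weight as an exponential in the stage index, and the stage weight as their normalized product. The stage lengths, feature names, and importance values are hypothetical.

```python
import math


def stage_weights(spans_days, alpha=0.5, lam=0.5):
    """Normalized stage weights.

    ASSUMPTION: adjusted span = span ** alpha and
    decay = exp(-lam * stage_index); the study's equations may differ.
    """
    raw = [(span ** alpha) * math.exp(-lam * i)
           for i, span in enumerate(spans_days)]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_importance(stage_scores, weights):
    """Weighted sum of per-stage (already normalized) importance scores."""
    features = stage_scores[0].keys()
    return {
        f: sum(w * scores[f] for w, scores in zip(weights, stage_scores))
        for f in features
    }


# Hypothetical stage lengths (days) and per-stage normalized importances.
spans = [30, 60, 30]
scores = [
    {"VH": 0.6, "NDVI": 0.4},  # sowing to transplanting
    {"VH": 0.3, "NDVI": 0.7},  # vegetative growth
    {"VH": 0.2, "NDVI": 0.8},  # maturity
]
weights = stage_weights(spans)
combined = weighted_importance(scores, weights)
ranked = sorted(combined, key=combined.get, reverse=True)
```

Under these assumed forms, an early stage receives more weight than a later stage of equal length, which is the behaviour the weighting scheme is meant to produce; a feature subset would then be chosen from the top of `ranked`.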
3.5. Random Forest with Model Hyperparameter Optimization and Cross-Year Transfer
The RF classifier serves as this study’s classification core. This ensemble method, widely adopted in crop mapping [
28,
68], generates predictions through aggregated decision tree outputs via majority voting. However, suboptimal hyperparameter tuning (e.g., excessive tree depth) may lead to overfitting due to high dimensionality.
In this study, an RF model was implemented within the GEE platform. To optimize predictive performance and learning dynamics, key hyperparameters were rigorously tuned using a grid search strategy. Given the memory restrictions of GEE and the relatively low-dimensional parameter space of its RF API, grid search enables a systematic evaluation across discretized grids to reliably identify the optimal configuration [
69]. Based on comprehensive experimental testing, the optimal RF parameters were determined as follows. First, the number of decision trees was set to 300, balancing classification accuracy against computational efficiency. Second, the number of features evaluated at each node split was set to 9 (corresponding to the square root of the total feature count) to guarantee adequate diversity among the base classifiers. Finally, the minimum size of a terminal node was set to 1, enabling the model to capture fine-grained spatial characteristics of heterogeneous landscapes. To support reproducibility, this configuration was deployed via the GEE JavaScript API using the ee.Classifier.smileRandomForest() function, mapping the optimized hyperparameters to numberOfTrees, variablesPerSplit, and minLeafPopulation, respectively. Furthermore, the high-confidence pseudo-sample extraction was executed via the native ee.Image.stratifiedSample() function with a fixed random seed of 42 to make sample partitioning deterministic. Spatial interpolation and multi-scale sensor alignment were standardized using the native bilinear resampling (resample('bilinear')) and cubic convolution re-projection algorithms to ensure consistency across datasets.
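The grid search itself is a plain exhaustive loop over a discretized parameter grid, sketched below in Python. The candidate values and the scoring function `toy_score` are hypothetical: in the study the score would come from training and validating a GEE Random Forest for each configuration, whereas the surrogate here is constructed only so that the toy run returns the configuration reported above.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical surrogate for validation accuracy (not a real result)."""
    return -(
        abs(params["numberOfTrees"] - 300) / 1000
        + abs(params["variablesPerSplit"] - 9) / 100
        + abs(params["minLeafPopulation"] - 1) / 100
    )


# Hypothetical candidate values for the three tuned hyperparameters.
grid = {
    "numberOfTrees": [100, 200, 300, 400, 500],
    "variablesPerSplit": [3, 6, 9, 12],
    "minLeafPopulation": [1, 5, 10],
}
best_params, best_score = grid_search(grid, toy_score)
```

This toy grid has 5 × 4 × 3 = 60 configurations, so the low-dimensional search space stays cheap to enumerate exhaustively.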
Regarding the mapping framework, this study employed a cross-temporal model transfer strategy executed within individual Sentinel-1 tile-sized blocks to mitigate spatial autocorrelation. Initially, the RF classifier was trained using optimized features and a sparse subset of 20,000 dynamically generated pseudo-samples (10,000 per class) from 2022, which were strictly temporally aligned with the observational windows of the 2023 rice growing season. To ensure a genuine zero-shot evaluation for cross-year application, strict temporal data isolation was maintained; no target-year data from 2023 was involved in the training or hyperparameter selection of this historical model. Furthermore, while a separate hold-out field survey dataset exclusively for 2022 was unavailable due to logistical constraints, the baseline performance of the historical model was robustly validated using the inherent Random Forest Out-of-Bag (OOB) metric. The model achieved an excellent baseline OOB accuracy of 97.38% (OOB error: 0.0262), confirming its high reliability in capturing stable phenological features. Subsequently, this trained model was directly transferred to the 2023 time-series data for prediction. During the continuous in-season mapping process, as current-season satellite imagery gradually accumulated, the corresponding historical time windows were synchronously expanded. This mechanism allowed the historical features and pseudo-labels to be dynamically updated and augmented. Consequently, the model effectively captured the consistent inter-annual phenological response patterns of rice, progressively refining its identification performance and ultimately achieving robust in-season rice mapping for 2023.
3.6. Accuracy Assessment
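For the in-season rice recognition results, this study evaluated accuracy using five metrics: producer accuracy (PA), user accuracy (UA), overall accuracy (OA), Kappa coefficient, and F1-score, based on field survey data and verification samples derived from visual interpretation. The calculation formulas for these five accuracy metrics are as follows:
$$PA = \frac{TP}{TP + FN}$$
$$UA = \frac{TP}{TP + FP}$$
$$OA = \frac{TP + TN}{TP + FP + TN + FN}$$
$$Kappa = \frac{P_o - P_e}{1 - P_e}, \quad P_e = \frac{(TP + FP)(TP + FN) + (FN + TN)(FP + TN)}{(TP + FP + TN + FN)^2}$$
$$F1 = \frac{2 \times PA \times UA}{PA + UA}$$
where $TP$ denotes rice samples correctly classified as rice, $FP$ represents non-rice samples misclassified as rice, $TN$ refers to non-rice samples correctly classified as non-rice, $FN$ is rice samples misclassified as non-rice, $P_e$ is the expected chance agreement, and $P_o$ is the overall accuracy.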
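As a worked example, the five metrics can be computed from the four confusion-matrix counts using their standard binary-classification definitions. The counts below are hypothetical and are not results from this study.

```python
def accuracy_metrics(tp, fp, tn, fn):
    """Standard binary accuracy metrics from confusion-matrix counts."""
    n = tp + fp + tn + fn
    pa = tp / (tp + fn)   # producer accuracy (recall for rice)
    ua = tp / (tp + fp)   # user accuracy (precision for rice)
    oa = (tp + tn) / n    # overall accuracy
    # Expected chance agreement from the row and column marginals.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (oa - pe) / (1 - pe)
    f1 = 2 * pa * ua / (pa + ua)
    return {"PA": pa, "UA": ua, "OA": oa, "Kappa": kappa, "F1": f1}


# Hypothetical validation counts.
metrics = accuracy_metrics(tp=90, fp=10, tn=85, fn=15)
```

For these counts the overall accuracy is 0.875 and the expected chance agreement is 0.5, giving a Kappa of 0.75.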
Furthermore, to evaluate the statistical precision of the generated error matrices, the 95% confidence intervals (CIs) for the primary classification metrics were analytically derived using the standard binomial proportion error formulation:
$$CI_{95\%} = p \pm 1.96\sqrt{\frac{p(1 - p)}{n}}$$
where $p$ represents the overall accuracy, and $n$ denotes the total number of validation pixels.
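The interval can be evaluated with a few lines of Python, assuming the standard normal-approximation (Wald) form of the binomial proportion interval. The sample size used below is hypothetical, because the number of validation pixels is not stated in this section.

```python
import math


def binomial_ci(p, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clip to the valid range of a proportion.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical example: OA = 0.97 estimated from 1000 validation pixels.
lower, upper = binomial_ci(0.97, 1000)
half_width = (upper - lower) / 2
```

Because the half-width shrinks with the square root of the sample size, quadrupling the number of validation pixels halves the interval.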
5. Discussion
5.1. In-Season Updating of Rice Samples
To obtain high-quality rice samples, a multi-step filtering strategy was adopted. Historical rice maps from TWDTW-Rice spanning 2020 to 2022 were first extracted and composited into
Figure 14a. However, the analysis based on TWDTW-Rice revealed that these maps contained a small proportion of misclassified pixels, including wetlands and aquatic vegetation incorrectly labeled as rice. These misclassifications could significantly degrade the performance of rice classification. To mitigate this, high-confidence rice pixels persistently classified as rice across all three years were regarded as rice samples. Further refinement was achieved using a farmland mask to eliminate non-agricultural pixels mislabeled as rice (
Figure 14b).
The DT algorithm was subsequently employed to identify high-confidence rice samples. This step not only reinforced the reliability of rice samples but also allowed the extraction of transplanting dates for individual pixels based on phenological signals (
Figure 14c). Following regional rice phenology studies in Hunan Province [
49], the normal transplanting window was defined as DOY 90 to 220. Samples with transplanting dates outside this range were excluded as anomalous, resulting in a final set of high-confidence rice samples (
Figure 14d). Specifically, transplanting dates falling within DOY 100–112, 124–148, 160–172, 184–208, and 220 represent rice fields transplanted in April, May, June, July, and August, respectively.
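The correspondence between DOY values and calendar months can be verified with a short sketch. It assumes the 2023 (non-leap) calendar and a 12-day sampling grid starting at DOY 100, consistent with the DOY ranges listed above; the function names are illustrative.

```python
from datetime import date, timedelta


def doy_to_month(doy, year=2023):
    """Calendar month (1-12) of a given day of year."""
    return (date(year, 1, 1) + timedelta(days=doy - 1)).month


def in_transplanting_window(doy, start=90, end=220):
    """True when the date falls inside the normal transplanting window."""
    return start <= doy <= end


# Group the 12-day observation dates inside the window by month.
by_month = {}
for doy in range(100, 221, 12):
    if in_transplanting_window(doy):
        by_month.setdefault(doy_to_month(doy), []).append(doy)
```

The grouping reproduces the monthly bins given in the text: DOY 100–112 fall in April, 124–148 in May, 160–172 in June, 184–208 in July, and 220 in August.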
While utilizing such historical products and automated phenological rules introduces an inherent risk of error propagation, this two-stage purification framework effectively bounds and suppresses potential noise amplification. The first-stage continuous spatial consensus across TWDTW-Rice (2020–2022) functions as a robust spatial filter that successfully sweeps away annual crop rotation anomalies and random mapping errors present in individual annual layers. Subsequently, the second-stage phenological purification via the DT algorithm calculates the precise localized transplanting dates on a pixel-by-pixel basis. By enforcing these exceptionally stringent multi-criterion constraints during the flooding window, the sample pool explicitly identifies and excludes confounded non-rice pixels (e.g., wetlands and aquatic vegetation) that frequently leak through static historical products.
From a machine learning standpoint, this dual-filtering architecture ensures that the final ensemble classifier is trained on localized, phenologically synchronized pure pixels rather than mixed or contaminated inputs. Furthermore, bagging ensembles such as Random Forest are relatively robust to low-level residual label noise, because aggregating many trees favors the generalized regional scattering and reflectance trajectories over individual anomalous pixels. The high overall accuracy (OA = 0.97) obtained under independent ground-truth validation indicates that error propagation from the historical products did not substantially degrade the results, suggesting that the sample updating strategy stabilizes and purifies the automated sample stream under operational constraints.
During the in-season mapping process, the sample set was iteratively updated based on the transplanting dates of rice pixels falling within their corresponding monthly time windows. Furthermore,
Figure 15 reveals distinct regional variations in transplanting timing, clearly demonstrating that samples in the southwestern plains were transplanted earlier than those in the northeastern mountainous areas. Specifically, early-season rice—primarily cultivated in the low-altitude southwestern region—was generally transplanted around day of year (DOY) 100 to 120 in April (
Figure 15a). From May to June, with the progressive accumulation of time-series imagery and the application of the DT algorithm, mid-season rice samples largely concentrated in the northeastern region were gradually identified (
Figure 15b,c). Because lower temperatures at higher altitudes delay the suitable planting period, these mid-season rice crops in the high-elevation mountains were typically transplanted near DOY 160 in June. Subsequently, late-season rice, predominantly distributed in the southwestern lowlands, was typically transplanted around DOY 184 in July (
Figure 15d). This spatial pattern aligns consistently with field survey results, confirming that double-season rice is mainly cultivated in the flatter western regions, whereas single-season rice dominates the northeastern uplands. The strong agreement between the remotely detected samples and ground observations demonstrates that the MSDF-RiceID framework reliably represents real-world farming practices, thereby enabling the in-season classification model to accurately capture the monthly spatial distribution of rice cultivation.
Despite the robust operational performance demonstrated, a comprehensive evaluation requires explicitly delimiting the physical boundaries, sensor-specific errors, and failure conditions of the MSDF-RiceID framework. First, because the dynamic threshold algorithm relies on identifying unique low backscatter coefficients during the specular reflection phase of paddy flooding, it remains inherently vulnerable to SAR polarimetric anomalies induced by intense precipitation. Heavy rainfall during the critical transplanting window introduces a dual-directional error propagation path: on one hand, it alters paddy water surface geometry, shifting specular reflection to diffuse scattering and artificially inflating VH backscatter above the −18 dB threshold, causing false non-rice omissions. On the other hand, intense downpours can induce localized surface pooling or flatten adjacent short-stature non-rice vegetation, smoothing their surface structures and suppressing their backscatter coefficients below the threshold, thereby introducing false-positive commission errors. Second, the framework is susceptible to failure under extreme meteorological anomalies, such as prolonged summer floods that submerge fields well into vegetative phases or severe early-season droughts that completely alter irrigation calendars, both of which deform the characteristic “V-shaped” profile and disrupt fixed chronological rules. Finally, mixed-pixel edge effects within highly fragmented agricultural landscapes continue to introduce localized geometric noise at parcel boundaries, limiting fine-scale mapping fidelity.
5.2. Comparison of Multi-Source Data Combinations
Based on the empirical accuracy metrics evaluated across various feature configurations in
Section 4.3, this section further interprets the underlying physical, spectral, and environmental response mechanisms driving these results. Under single-source conditions (
Figure 10a,f,k), notable performance differences were observed across sensors. Classification using S1 data achieved higher accuracy from April (OA = 0.72, F1-score = 0.6, Kappa = 0.42) to June (OA = 0.78, F1-score = 0.78, Kappa = 0.56). This advantage can be attributed to the SAR signal’s sensitivity to shallow water and sparse rice seedlings during transplanting, which results in distinctive low backscatter signatures. In contrast, crop discrimination with optical imagery during this period relied on spectral differences between land covers—differences that were not yet sufficiently distinct early in the season. As the growing season advanced, spectral separability improved, enabling optical-based classifications to exceed SAR performance after August. Models using MODIS data (OA = 0.78, F1-score = 0.79, Kappa = 0.56) outperformed those using S2 (OA = 0.73, F1-score = 0.67, Kappa = 0.46) prior to August, due to higher temporal frequency and greater data availability in cloud-prone periods. Although S2 data has a finer spatial resolution, its limited number of clear observations early in the season increased susceptibility to cloud and rainfall interference. With the accumulation of imagery over time, the spatial detail provided by S2 became increasingly advantageous, ultimately achieving the highest accuracy (OA = 0.91, F1-score = 0.92, Kappa = 0.83).
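For reference, the three metrics quoted throughout this section can be computed from a binary rice/non-rice confusion matrix. The following is a minimal sketch using the standard definitions of overall accuracy, F1-score, and Cohen's Kappa; the function name and the pixel counts are hypothetical and are not taken from the validation data of this study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (OA, F1, Kappa) for a binary confusion matrix.

    tp: rice predicted as rice, fp: non-rice predicted as rice,
    fn: rice predicted as non-rice, tn: non-rice predicted as non-rice.
    """
    n = tp + fp + fn + tn
    oa = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Expected agreement by chance, from the row and column marginals.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (oa - pe) / (1 - pe)
    return oa, f1, kappa


# Hypothetical counts for illustration only.
oa, f1, kappa = binary_metrics(tp=40, fp=10, fn=10, tn=40)
print(f"OA = {oa:.2f}, F1 = {f1:.2f}, Kappa = {kappa:.2f}")
```

Because Kappa discounts chance agreement, it is lower than OA for the same map, which is consistent with the pattern in the values reported above.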
Under multi-source fusion (
Figure 10c,e,h,m), the combination of S1, S2, and MODIS data achieved the best performance (OA = 0.96, F1-score = 0.96, Kappa = 0.93). This result can be attributed to the complementary nature of spatial, spectral, and temporal information provided by the three sensors: S1 data enabled the detection of paddy flooding and structural features in early growth stages, S2 imagery supported crop differentiation in mid-to-late stages through its high spectral and spatial resolution, and MODIS contributed continuous temporal coverage. The integrated use of these datasets compensated for limitations in continuity or spatial detail inherent in single- and dual-source configurations, while also improving spectral separability, which collectively enhanced classification accuracy and robustness. The S1 + S2 combination ranked second (OA = 0.95, F1-score = 0.94, Kappa = 0.90), benefiting from strong complementary characteristics though still limited by the acquisition frequency and cloud-induced data gaps of S2. The S1 + MODIS combination (OA = 0.95, F1-score = 0.94, Kappa = 0.89) partially alleviated constraints related to spatial resolution and temporal density; however, the coarse resolution of MODIS (500 m) led to classification uncertainty in heterogeneous and small-field landscapes, resulting in lower stability compared to S1 + S2. The S2 + MODIS combination (OA = 0.93, F1-score = 0.93, Kappa = 0.86), consisting solely of optical sensors, exhibited lower complementarity. While the high temporal density of MODIS partly compensated for the scarcity of early-season S2 images, the mixed-pixel effect of MODIS reduced the effective spatial advantage of S2, leading to lower performance compared to combinations including SAR.
Furthermore, it is important to explicitly address the scale effect introduced by the MODIS data. Upsampling 500 m MODIS imagery to 10 m via bilinear interpolation inherently introduces an apparent “false resolution” with mixed-pixel effects. However, in cloud-prone regions like Hunan, the temporal continuity provided by MODIS is indispensable. This spatial-temporal trade-off is validated by our results: the S2 + MODIS combination outperforms S2 alone, demonstrating that the temporal information gain from MODIS effectively outweighs its spatial penalty. Nevertheless, to counteract the potential misclassifications in fragmented farmlands caused by MODIS’s spatial ambiguity, our framework relies on the synergistic S1 + S2 + MODIS combination. In this configuration, the native 10 m SAR and optical data provide the rigorous spatial constraints needed to delineate fragmented field boundaries and suppress artificial spatial autocorrelation, ensuring both high spatial fidelity and temporal continuity.
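The “false resolution” effect can be made concrete with a small sketch of bilinear upsampling. It is a plain-Python illustration, not the resampling code used in this study: the function name, the 2 × 2 grid, and its NDVI-like values are hypothetical, and the factor of 50 corresponds to going from 500 m to 10 m pixels.

```python
def bilinear_upsample(grid, factor):
    """Upsample a 2-D list of floats by an integer factor (bilinear interpolation)."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for i in range(rows * factor):
        # Map the fine-pixel centre back to coarse-grid coordinates, clamped to the grid.
        y = min(max((i + 0.5) / factor - 0.5, 0.0), rows - 1)
        y0 = int(y)
        y1 = min(y0 + 1, rows - 1)
        wy = y - y0
        row = []
        for j in range(cols * factor):
            x = min(max((j + 0.5) / factor - 0.5, 0.0), cols - 1)
            x0 = int(x)
            x1 = min(x0 + 1, cols - 1)
            wx = x - x0
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# Hypothetical 2 x 2 block of 500 m pixels (NDVI-like values, illustration only).
coarse = [[0.2, 0.8], [0.4, 0.6]]
fine = bilinear_upsample(coarse, 50)  # 100 x 100 grid of nominal 10 m pixels

flat = [v for row in fine for v in row]
print(len(fine), len(fine[0]), min(flat), max(flat))
```

Every fine pixel is a weighted average of its coarse neighbours, so the upsampled grid gains pixels but no values outside the original range and no sub-pixel detail. This is why the native 10 m S1 and S2 data are needed to delineate field boundaries.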
The effect of terrain factors on rice distribution was observed to vary depending on the data source used. Under single-source conditions, the use of DEM led to the greatest improvement in models using S1 (average gain ≈ 0.04), as radar backscatter is sensitive to surface geometry and water presence. Previous studies have shown that rice cultivation is primarily concentrated in areas below 200 m in altitude and on slopes under 6°, while it rarely occurs above 800 m or on slopes exceeding 16° [
70]. Accordingly, DEM helped constrain plausible planting zones and reduce topographic noise in SAR-based classification. In contrast, optical data rely on spectral reflectance and vegetation indices, which are less influenced by topography than radar scattering mechanisms. Therefore, DEM provided only moderate benefits for optical imagery, mainly reducing confusion in early stages when spectral separability was low. Its contribution diminished once optical features became more distinctive later in the season. In multi-source fusion, although the relative benefit of DEM decreased with more abundant optical time-series data, its inclusion consistently improved performance. This was particularly evident when DEM was incorporated into the S1 + S2 + MODIS combination, enhancing overall accuracy and stability—with OA and F1-score increasing by approximately 0.02 and Kappa by 0.03 (
Figure 10j). These improvements were most pronounced during the early growing period (April–June).
These results demonstrate that multi-source feature fusion effectively improves rice classification under challenging conditions. The combination of SAR, optical, and terrain data significantly enhances both spatial discrimination and temporal continuity, especially in regions with complex terrain and frequent cloud cover. Among all approaches, the “SAR + multi-scale optical + terrain” strategy yielded the most reliable and accurate mapping results by effectively mitigating atmospheric interference and capturing key phenological dynamics with improved temporal consistency, proving particularly beneficial for in-season rice classification.
5.3. Comparison of Feature Selection Strategies
Unlike conventional approaches where features are ranked with equal weighting across all growth periods, our exponential-weighted feature selection strategy dynamically adjusts feature importance according to phenological information. It emphasizes early-stage phenological features driven by the distinct flooding signal during the transplanting phase. Compared to the conventional full-season strategy (
Figure 16), the proposed approach significantly improves early-stage identification performance, as evidenced by higher F1-scores and Kappa coefficients. By prioritizing key features from Sentinel-1 (top 3) and integrated Sentinel-2/MODIS (top 4), the approach achieves consistent early-stage gains: F1-scores improve by 0.03 on average (April–June), and Kappa coefficients rise by 0.04–0.05 in May–June (
Figure 16). It is worth noting that the performance gap narrowed after September, as phenological signals became more distinct and full-season features provided sufficient separability. This trend further emphasizes the importance of assigning greater weight to early-stage features when timely identification is necessary.
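The contrast between equal and exponential weighting can be sketched as follows. This is only an assumed form of the idea: the decay function exp(−decay · t), the `decay` parameter, the function names, and the per-period importance values are all hypothetical and do not reproduce the weighting formula defined for MSDF-RiceID.

```python
import math


def exp_weights(n_periods, decay):
    """Normalized weights that decrease exponentially with the period index."""
    raw = [math.exp(-decay * t) for t in range(n_periods)]
    total = sum(raw)
    return [w / total for w in raw]


def rank_features(importance_by_period, decay):
    """Rank features by the weighted sum of their per-period importance."""
    n_periods = len(next(iter(importance_by_period.values())))
    weights = exp_weights(n_periods, decay)
    scores = {
        name: sum(w * v for w, v in zip(weights, values))
        for name, values in importance_by_period.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical importance per period (period 0 = transplanting), illustration only.
importance = {
    "VH": [0.9, 0.5, 0.2, 0.1],    # strong flooding signal early in the season
    "NDVI": [0.1, 0.4, 0.8, 0.9],  # strong canopy signal later in the season
}

print(rank_features(importance, decay=0.0))  # equal weighting across periods
print(rank_features(importance, decay=1.0))  # early periods emphasized
```

With these assumed values, equal weighting ranks the late-season optical feature first, whereas the exponentially decaying weights promote the early flooding-sensitive SAR feature, in line with the early-stage gains discussed above.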
5.4. Comparative Evaluation and Suitability of Alternative Classifiers
As systematically illustrated in
Figure 11, the time-series accuracy trajectories reveal distinct behavior among the classifiers across the phenological stages. During the initial transplanting window in April, all models started from a low baseline (RF F1-score: 0.61, GBDT F1-score: 0.62, KNN F1-score: 0.51), reflecting the strong spectral and polarimetric confusion between transiently flooded paddies and nearby permanent water bodies. From June onward, as rice plants advanced into peak vegetative and reproductive stages, the rapid accumulation of phenological information led to a sharp rise in accuracy. By September, all models reached their annual performance peaks: KNN obtained an F1-score of 0.89, GBDT a competitive 0.93, and RF the highest at 0.97 (
Figure 11a).
In contrast, the late-season harvesting phase from October to November degraded the performance of some classifiers. GBDT was the most affected, with its F1-score falling from 0.93 in September to 0.86 in October and November (
Figure 11b). This divergence in F1-score plausibly reflects the different tolerance of Bagging and Boosting ensembles to noisy late-season observations. After October, widespread harvesting in Hunan Province transforms homogeneous crop canopies into heterogeneous patches of bare soil and crop residue. Because GBDT fits each successive tree to the residuals of the previous ones, its late-stage trees tend to minimize training error by fitting these post-harvest scattering variations, and the resulting overfitting lowers its performance on the rice class. Conversely, Random Forest draws a random subset of features at each node split across hundreds of independently trained decision trees. This Bagging architecture dilutes the influence of the noisy post-harvest time steps, so the ensemble relies mainly on the observations captured during the peak growth window, and its rice-class F1-score remains stable at 0.97 (
Figure 11a). Consequently, these F1-score dynamics support RF as the most reliable choice for operational in-season rice mapping.
5.5. Robustness Assessment of the MSDF-RiceID
5.5.1. Cross-Regional Performance Consistency
To evaluate the cross-regional applicability of the proposed MSDF-RiceID framework, in-season experiments were conducted in two geographically diverse rice-growing regions in southern and northern China: Taishan in Guangdong Province and Panjin in Liaoning Province. To verify the statistical reliability of the mapping outcomes across these heterogeneous study areas, 95% confidence intervals were derived analytically using Equation (13). The resulting margins of error fall within ±0.05% across all peak operational phases for both regions, indicating that the reported rice mapping results are statistically stable and robust to random spatial variation. Based on a previous rice phenology study [
49], the monitoring period spanned March–December 2019 for Taishan and May–October 2024 for Panjin. However, since Sentinel-2 Level-2A surface reflectance products were unavailable in Taishan before 2019, only Sentinel-1 and MODIS data from 2018 and 2019 were used for classification in this region to maintain consistency across datasets. The classification results for Taishan and Panjin are presented in
Figure 17 and
Figure 18, respectively.
Early March classifications (OA = 0.8, F1-score = 0.6, Kappa = 0.47) misidentified fishponds as rice fields in southern and central Taishan subregions (red circles,
Figure 17a). This confusion arises because flooded paddies in the transplanting window resemble open water bodies in both SAR backscatter (VH: −21.44 dB vs. −20.98 dB) and optical water indices (mean MNDWI: 0.67 vs. 0.72). Accuracy improved progressively in later months as the added observations increased temporal feature separability and better distinguished rice phenological trajectories from permanent water bodies. Meanwhile, the paucity of early time-series observations left the optical imagery in the southwestern and northeastern regions of Taishan exposed to cloud and rainfall interference, resulting in the omission of rice (red boxes,
Figure 17a). From April onward, accumulating time-series data enabled the correct identification of previously undetected rice fields and reduced fishpond misclassification (red circles,
Figure 17b). The high accuracy reached by May (OA = 0.95, F1-score = 0.91, Kappa = 0.88) reflects established phenological divergence: as rice develops canopy structure beyond the flooding phase, increased backscatter and greater spectral separability from water bodies (ΔNDVI > 0.6) largely suppress early-season confusion. This stabilization enables reliable in-season mapping with a lead of 2–3 months over conventional full-season approaches.
Compared to Taishan, Panjin achieved earlier high-precision classification (OA = 0.92, F1-score = 0.92, Kappa = 0.85) within the first month post-transplanting, and its final F1-score ultimately exceeded that of Taishan by 0.05 (
Figure 19b). Two factors explain this advantage. First, low annual precipitation minimized cloud interference, enabling high observational continuity that enhanced temporal feature separability, which is critical for distinguishing flooded paddies from aquaculture ponds during early growth stages. Second, Panjin’s landscape homogeneity (
Figure 18) suppressed edge-mixed pixels that typically degrade early-stage classification in heterogeneous southern landscapes. Although May marked the earliest identifiable stage, initial classifications exhibited limited geometric fidelity with poorly resolved field boundaries (
Figure 18a). As time-series data accumulated, accuracy steadily improved as the model captured finer intra-field structures, culminating in a peak F1-score of 0.98.
The earliest identifiable stage was achieved earlier in Taishan and Panjin than in Hunan Province. This difference can be attributed to the greater phenological complexity of Hunan’s triple-cropping systems—particularly transplanting phases spanning more than three months—which was further compounded by a limited number of early-season validation samples. Both factors contributed to lower initial classification accuracy. By contrast, the single-season rice system in Panjin is characterized by highly concentrated transplanting windows, whereas the double-season system in Taishan maintains distinct seasonal synchrony. Both patterns produce coherent phenological signatures, allowing earlier detection compared to the staggered triple-cropping system found in Hunan. Field investigations confirmed the predominance of rice transplanting during the initial detection window, where MSDF-RiceID successfully identified newly established fields. This effective deployment across divergent agroecological regions demonstrates the framework’s cross-regional adaptability and operational capacity for timely, precise monitoring of emerging planting patterns.
5.5.2. Multi-Product Algorithm Benchmarking: Accuracy Gains and Persistent Constraints
The superior performance of the proposed MSDF-RiceID framework over benchmark products stems from its ability to resolve persistent confusion in spectral and backscatter signals between rice paddies and wetlands, leading to substantial improvements in classification accuracy (OA: +0.12–0.18; Kappa: +0.23–0.35; F1-score: +0.09–0.15 relative to TWDTW-Rice and EARice10, as listed in
Table 6). To statistically verify this thematic superiority, a pixel-level McNemar’s test was conducted on the discordant cells across the validation regions. In the paired map comparisons, MSDF-RiceID achieved 76,870 uniquely correct pixels against TWDTW-Rice (which yielded 9361), resulting in a Chi-squared statistic of 52,850.25. Against EARice10 (which yielded 11,734), MSDF-RiceID achieved 145,538 uniquely correct pixels, resulting in a Chi-squared statistic of 113,836.17. The calculated
p-values for both tests were effectively zero (
p < 0.001), demonstrating that the observed performance advantages are statistically significant rather than the result of random data fluctuations. Spatial mapping results across representative heterogeneous zones are displayed in
Figure 20, where misclassified non-rice areas identified as rice by alternative products are highlighted with red circles.
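The reported Chi-squared statistics can be checked from the discordant pixel counts alone. The sketch below assumes the continuity-corrected form of McNemar's statistic, (|b − c| − 1)² / (b + c); the text does not state which variant was used, but this form reproduces both reported values. The function name is hypothetical, and the p-value uses the closed-form survival function of a Chi-squared distribution with one degree of freedom.

```python
import math


def mcnemar_test(b, c):
    """Continuity-corrected McNemar test on discordant counts b and c.

    b: pixels correct only in map A, c: pixels correct only in map B.
    Returns (chi-squared statistic, p-value) with 1 degree of freedom.
    """
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Discordant counts quoted in the text.
chi2_twdtw, p_twdtw = mcnemar_test(76870, 9361)      # MSDF-RiceID vs. TWDTW-Rice
chi2_earice, p_earice = mcnemar_test(145538, 11734)  # MSDF-RiceID vs. EARice10

print(f"TWDTW-Rice: chi2 = {chi2_twdtw:.2f}, p < 0.001: {p_twdtw < 0.001}")
print(f"EARice10:   chi2 = {chi2_earice:.2f}, p < 0.001: {p_earice < 0.001}")
```

With samples this large, the p-value underflows to zero in double precision, so the test mainly confirms the direction of the difference; the magnitude of the accuracy gains is better read from Table 6.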
The enhanced discrimination is achieved through a multi-temporal feature weighting approach that identifies flooded paddies based on phenological trajectory divergence after initial flooding stages—a period when conventional spectral and backscatter methods often fail due to similar sensor responses between rice and wetlands. Notably, although the 250 m MODIS time-series offers lower spatial resolution than Sentinel-2, its high temporal density plays a critical role in capturing key phenological cues. The daily revisit frequency of MODIS overcomes the 5–16-day gaps in Sentinel-2 acquisitions caused by cloudy seasons, enabling the capture of rapid phenophase transitions. For instance, in areas where EARice10 misclassified wetlands near Dongting Lake (
Figure 20), short-duration flooding events indicative of rice cultivation were identified using time-series MODIS-derived vegetation indices within the proposed framework. Furthermore, the double-peak signature in MODIS EVI trajectories—associated with rice tillering and heading phases—provides an additional distinguishing feature absent in perennial wetlands.
While MSDF-RiceID achieves high classification accuracy, its performance faces one key constraint. The scarcity of early-season remote sensing data necessitated reliance on historical satellite imagery and existing rice products for model training and sample generation. This approach is inherently limited by historical data reliability, whereby substantial misclassification rates may propagate sample contamination into the current framework. To address this fundamental challenge, our ongoing work focuses on developing systematically validated annual rice products to enhance sample purity for future applications.
6. Conclusions
This study introduces MSDF-RiceID, a novel framework designed for in-season, month-scale rice mapping based on multi-source remote sensing imagery. The framework addresses the critical challenge of sparse early-season observations by integrating a dynamic threshold algorithm that automatically generates high-confidence pseudo samples from historical maps. Furthermore, it employs a phenologically guided exponentially weighted strategy for feature optimization, enhancing cross-year generalizability through adaptive model transfer.
Experimental results revealed regional variations in early detectability: the earliest identifiable stage (F1-score > 0.9) occurred in May for both Taishan (transplanted in March) and Panjin (transplanted in May), whereas Hunan Province (transplanted in April) did not reach this stage until July, owing to its more complex triple-cropping system with an extended transplanting window exceeding three months. For Hunan, in-season samples were successfully differentiated based on key transplanting dates in April (DOY 100 and 120), June (DOY 160), and July (DOY 184), corresponding to early, middle, and late rice seasons. Panjin notably achieved high classification accuracy (OA = 0.92, F1-score = 0.92, Kappa = 0.85) within the first month post-transplanting, outperforming Taishan by approximately 0.05 in final F1-score.
In terms of feature strategy, the “SAR + multi-scale optical + terrain” strategy produced the most reliable and accurate mapping results for in-season rice classification. The optimal feature set included indices from Sentinel-1 (PRI, VH, VV_VH), Sentinel-2 (NDYI, PSRI, NDBI, NDWI), and MODIS (NDVI, EVI, NDBI, LSWI). Dual-source models exhibited only marginally lower accuracy than their triple-source counterparts during most growth periods. Additionally, the exponentially weighted mechanism was essential for highlighting discriminative features in early-season identification. As expected, its advantage over equal weighting diminished as the season progressed into mid- and late-stages, when more distinct phenological signals were captured in the accumulated time-series data.
MSDF-RiceID significantly outperformed existing rice mapping products such as TWDTW-Rice and EARice10. Specifically, the framework achieved an overall accuracy of 0.97, a Kappa coefficient of 0.95, and an F1-score of 0.97. These represent improvements of 0.12–0.18 in OA, 0.23–0.35 in Kappa, and 0.09–0.15 in F1-score. The results highlight the framework’s robustness and adaptability across diverse cropping systems and environments. With its ability to support high-accuracy, timely rice monitoring, MSDF-RiceID is a scalable operational solution for large-scale agricultural management and disaster response.