MMDFRNet: Dynamic Cross-Modal Decoupling and Alignment for Robust Rice Mapping
Highlights
- Developed MMDFRNet, a mechanism-driven dual-stream framework featuring dynamic cross-modal decoupling and alignment to resolve statistical disparities between Sentinel-1 SAR and Sentinel-2 optical imagery.
- Achieved state-of-the-art performance (IoU of 0.8612), outperforming recent advanced paradigms (e.g., UNetFormer, STMA) in both mapping accuracy and inference speed.
- Resolved the multi-modal “degradation paradox” by suppressing SAR speckle noise through pixel-wise adaptive alignment, turning SAR data from a source of degradation into a performance booster.
- Methodological Implication: Validates that pixel-wise adaptive recalibration bridges the physical gap between SAR structural data and optical spectral data, providing a robust paradigm for fusing modalities whose quality varies independently (e.g., cloud-contaminated optical pixels alongside speckled SAR).
- Practical Implication: Delivers a highly robust and computationally efficient tool for regional food security monitoring, demonstrating exceptional generalization across both fragmented smallholder plots and large-scale agricultural landscapes.
Abstract
1. Introduction
- Mechanism-driven modality decoupling: We design independent dual-stream encoders to decouple the distinct characteristics of heterogeneous data. This structure effectively separates the penetrative texture features of SAR from the spectral details of optical images, preventing the feature interference and noise propagation common in traditional shared-encoder approaches.
- Dynamic cross-modal alignment via the multi-modal feature fusion (MMF) module: We propose an attention-based MMF module that performs pixel-wise dynamic alignment. Unlike static fusion, this module acts as a dynamic gate that suppresses noise from compromised modalities (e.g., cloud-covered optical pixels) while enhancing reliable features, ensuring robust cross-modal complementarity.
- Multi-scale feature fusion (MSF) module: To address the scale variation inherent in diverse agricultural landscapes, we incorporate an MSF module. This component hierarchically fuses low-level spatial features with high-level semantic representations, significantly enhancing boundary delineation accuracy in both fragmented and large-scale rice fields.
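The MMF module is described above as a dynamic, pixel-wise gate, but its equations are not given in this section. The NumPy sketch below shows one common way such gating can be realized; it is an assumption for illustration, not the authors' implementation. A hypothetical 1 × 1 projection (`weight`, `bias`) of the concatenated optical and SAR features produces a sigmoid gate per pixel and channel, and the output is a convex combination of the two streams. Where the gate approaches 1 the optical features dominate; where an optical pixel is unreliable (for example, cloud-covered), a trained gate could move toward 0 and lean on the SAR features instead.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Logistic function mapping logits to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(f_opt: np.ndarray, f_sar: np.ndarray,
                 weight: np.ndarray, bias: np.ndarray):
    """Pixel-wise gated fusion of two (C, H, W) feature maps (hypothetical sketch).

    weight: (C, 2C) matrix acting like a 1x1 convolution over the concatenated channels.
    bias:   (C,) per-channel offset.
    Returns the fused (C, H, W) map and the gate used to produce it.
    """
    stacked = np.concatenate([f_opt, f_sar], axis=0)          # (2C, H, W)
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    logits = np.einsum("oc,chw->ohw", weight, stacked) + bias[:, None, None]
    gate = sigmoid(logits)                                     # (C, H, W), values in (0, 1)
    fused = gate * f_opt + (1.0 - gate) * f_sar               # convex combination per pixel
    return fused, gate


# Usage with random features; in a trained network, weight and bias would be learned.
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
f_opt = rng.normal(size=(C, H, W))
f_sar = rng.normal(size=(C, H, W))
weight = rng.normal(scale=0.1, size=(C, 2 * C))
fused, gate = gated_fusion(f_opt, f_sar, weight, np.zeros(C))
print(fused.shape)  # (4, 8, 8)
```

By contrast, the direct concatenation used in the MSFNet and RNet ablation variants forwards both streams with input-independent weights, so a degraded optical pixel cannot be down-weighted on its own.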
2. Materials and Methods
2.1. Study Area
2.2. Data Acquisition and Preprocessing
2.3. Ground Truth Generation and Quality Control
2.4. Models and Principles
2.4.1. MMDFRNet
2.4.2. Multi-Modal Fusion Module
2.4.3. ASPP
2.4.4. Multi-Scale Feature Fusion Module
2.4.5. OFE and SFE
2.5. Experimental Design for Ablation Studies
2.5.1. Module Effectiveness Analysis
- MMFNet (w/o MSF): As shown in Figure 6a, this variant is constructed by removing the MSF module from MMDFRNet to verify the necessity of multi-scale feature integration. With the MMF module retained, we can assess whether the network loses local detail and generalization capability when hierarchical semantic fusion is absent.
- MSFNet (w/o MMF): To validate the efficacy of the proposed adaptive fusion mechanism, we constructed the MSFNet (Figure 6b) by removing the MMF module. In this setup, features from the optical and SAR encoders are directly concatenated and fed into the MSF module. This comparison helps clarify whether the dynamic recalibration and cross-modal enhancement (provided by MMF) offer a significant advantage over simple static fusion.
- MMDFNet (w/o ASPP): To verify the contribution of the atrous spatial pyramid pooling module, we constructed this variant by replacing the dilated convolutions in the ASPP block with standard 1 × 1 convolutions, as illustrated in Figure 6c. This configuration removes the multi-scale receptive field expansion capability, allowing us to quantify the module’s specific role in capturing long-range contextual dependencies and handling scale variations in rice fields.
- RNet (Baseline): As a baseline, the MMF, MSF, and ASPP modules are all removed (Figure 6d). The dual-stream encoder outputs are directly concatenated and passed to the decoder. This model quantifies the overall performance gain attributable to the proposed fusion architecture.
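The MMDFNet variant removes ASPP's receptive-field expansion by replacing its dilated kernels with 1 × 1 convolutions. The effect can be quantified with the standard effective-kernel formula for a k × k convolution with dilation rate d, k_eff = k + (k − 1)(d − 1). The branch configuration below (a 1 × 1 branch plus 3 × 3 branches at rates 6, 12, and 18) is the DeepLabv3 default and is an assumption; the rates used in MMDFRNet are not stated in this section.

```python
def effective_kernel(k: int, d: int) -> int:
    """Spatial extent (in feature-map pixels) covered by a k x k kernel with dilation d."""
    return k + (k - 1) * (d - 1)


# Assumed ASPP branches (DeepLabv3 defaults), given as (kernel size, dilation rate).
aspp_branches = [(1, 1), (3, 6), (3, 12), (3, 18)]

for k, d in aspp_branches:
    full = effective_kernel(k, d)      # spans 1, 13, 25, 37 px for the branches above
    ablated = effective_kernel(1, d)   # MMDFNet variant: every branch becomes 1 x 1
    print(f"k={k}, rate={d:>2}: ASPP covers {full:>2} px, 1x1 replacement covers {ablated} px")
```

Because a 1 × 1 kernel has k − 1 = 0, the dilation rate no longer matters and every ablated branch collapses to a single-pixel footprint, which is what lets this variant isolate ASPP's contribution to context aggregation.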
2.5.2. Modality Necessity Analysis
2.6. Comparative Methods and Degradation Verification
2.6.1. Classic and SOTA Comparison Models
- Classic semantic segmentation models: We selected U-Net [30] and PSPNet [32]. U-Net is a pioneering encoder–decoder architecture that recovers spatial details through symmetric skip connections, though it primarily relies on simple channel-wise concatenation for feature integration. PSPNet utilizes a pyramid pooling module to aggregate global context information at multiple scales, effectively capturing scene-level semantics through fixed-grid spatial pooling.
- Domain-specific fusion models: We included R-Unet [15] and CCRNet [33]. R-Unet is a dedicated rice mapping model that incorporates deep residual blocks into a U-Net structure to enhance feature extraction from bi-temporal remote sensing data. CCRNet is a representative multi-modal network that employs a cross-level regional response mechanism to align and fuse features from heterogeneous sensors through cross-attention.
- SOTA models: To benchmark against advanced deep learning paradigms, we selected UNetFormer [34] and STMA [22]. UNetFormer introduces a hybrid Transformer-based architecture (combining a ResNet encoder with a Transformer decoder) for efficient global modeling. STMA is designed to capture multi-scale spatio-temporal features through learnable positional encodings.
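PSPNet's fixed-grid pyramid pooling can be illustrated with a short NumPy sketch. It average-pools a feature map into 1 × 1, 2 × 2, 3 × 3, and 6 × 6 grids (the bin sizes of the original PSPNet), upsamples each grid back to full resolution, and concatenates the results with the input. For brevity, this sketch assumes nearest-neighbour upsampling and omits the 1 × 1 channel-reduction convolutions and bilinear interpolation of the original design; it does not reproduce the configuration used in this study.

```python
import math

import numpy as np


def adaptive_avg_pool(x: np.ndarray, bins: int) -> np.ndarray:
    """Average-pool a (C, H, W) map into a (C, bins, bins) grid.

    Bin edges use a floor/ceil convention, so neighbouring bins may
    overlap by one pixel when H or W is not divisible by bins.
    """
    c, h, w = x.shape
    out = np.empty((c, bins, bins))
    for i in range(bins):
        y0, y1 = (i * h) // bins, math.ceil((i + 1) * h / bins)
        for j in range(bins):
            x0, x1 = (j * w) // bins, math.ceil((j + 1) * w / bins)
            out[:, i, j] = x[:, y0:y1, x0:x1].mean(axis=(1, 2))
    return out


def upsample_nearest(grid: np.ndarray, h: int, w: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, b, b) grid to (C, h, w)."""
    rows = (np.arange(h) * grid.shape[1]) // h
    cols = (np.arange(w) * grid.shape[2]) // w
    return grid[:, rows][:, :, cols]


def pyramid_pooling(x: np.ndarray, bin_sizes=(1, 2, 3, 6)) -> np.ndarray:
    """Concatenate x with pooled, upsampled context from several fixed grids."""
    _, h, w = x.shape
    context = [upsample_nearest(adaptive_avg_pool(x, b), h, w) for b in bin_sizes]
    return np.concatenate([x] + context, axis=0)


x = np.random.default_rng(0).normal(size=(2, 12, 12))
print(pyramid_pooling(x).shape)  # (10, 12, 12)
```

The grid sizes are fixed regardless of image content, which is the "fixed-grid spatial pooling" contrasted above with content-adaptive fusion.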
2.6.2. Degradation Verification in Classic Models
2.7. Hyperparameter Settings
2.8. Model Evaluation
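The result tables report Precision, IoU, F1-Score, and MCC. A minimal sketch of the standard pixel-level definitions for a binary rice/non-rice map follows. It assumes the metrics are computed from a single confusion matrix; whether the paper pools pixels across all test patches or averages per patch is not stated in this section, and the confusion counts in the usage line are hypothetical.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard pixel-level metrics for a binary rice / non-rice map.

    tp, fp, fn, tn: counts of true positives, false positives,
    false negatives, and true negatives (rice = positive class).
    """
    precision = tp / (tp + fp)
    iou = tp / (tp + fp + fn)                  # intersection over union of the rice class
    f1 = 2 * tp / (2 * tp + fp + fn)           # harmonic mean of precision and recall
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom          # Matthews correlation coefficient
    return {"Precision": precision, "IoU": iou, "F1-Score": f1, "MCC": mcc}


# Hypothetical counts for a single 256 x 256 patch (65,536 pixels); not data from the paper.
m = binary_metrics(tp=30000, fp=2500, fn=3500, tn=29536)
print({name: round(value, 4) for name, value in m.items()})
```

When both are computed from the same confusion counts, F1 and IoU satisfy F1 = 2·IoU/(1 + IoU), and MCC additionally uses true negatives, which is why it can rank models differently from the other three metrics.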
3. Results
3.1. Training Dynamics and Convergence Analysis
3.2. Comparative Analysis with SOTA Models
3.3. Internal Module Effectiveness Analysis
3.4. Modality Necessity and Degradation Verification
3.4.1. Necessity of Multi-Modal Fusion in MMDFRNet
3.4.2. Verification of Degradation in Classic Models
3.5. Model Adaptability Assessment Results
4. Discussion
4.1. The Analysis of Performance Degradation in Concatenation-Based Fusion
4.2. The Comparative Assessment Against Advanced Deep Learning Paradigms
4.3. Evidence of MMF Necessity from Regional Performance Variations
4.4. The Impact of Data Distribution Shifts on Generalization Mechanisms
4.5. Limitations and Prospects
5. Conclusions
- Superior mapping accuracy and methodological robustness: MMDFRNet establishes a new benchmark for rice mapping, achieving a Precision of 0.9234 and an IoU of 0.8612 on the Qianjiang test set. Crucially, our framework reverses the “degradation paradox” observed in classic concatenation-based models, in which adding SAR data lowers accuracy. By employing adaptive recalibration, MMDFRNet leverages SAR data to raise IoU by 8.09 percentage points (from 0.7803 to 0.8612) compared with its optical-only variant. Furthermore, it outperforms state-of-the-art paradigms, surpassing the Transformer-based UNetFormer and the time-series model STMA in both segmentation accuracy and boundary delineation.
- Exceptional generalization and operational efficiency: Evaluation across four spatially distinct study areas (Qianjiang, Jianli, Huoqiu, and Yongxiu) confirms the model’s generalization capability, with IoU ranging from 0.8129 to 0.8612. The synergistic combination of the MMF and MSF modules enables the model to capture long-range dependencies and multi-scale features, ensuring high structural consistency in both fragmented smallholder plots and large-scale agricultural landscapes. Furthermore, the model's inference time (6.0274 s per image) is shorter than that of UNetFormer (6.4432 s) and STMA (6.7643 s), demonstrating that our mechanism-driven dual-stream design is not only theoretically rigorous but also efficient enough for practical, national-level crop mapping tasks in complex environments.
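The 8.09 IoU gain attributed to SAR is an absolute difference between two IoU values (0.8612 − 0.7803), which is easy to confuse with a relative change. The short calculation below uses the optical-only and optical-SAR IoU values reported in the modality-necessity tables for MMDFRNet (Qianjiang) and U-Net to make the distinction explicit, and shows the opposite signs that define the “degradation paradox”. The helper name `iou_gain` is illustrative.

```python
def iou_gain(fused: float, single: float) -> tuple[float, float]:
    """Return (absolute change in percentage points, relative change in percent)."""
    absolute_pp = (fused - single) * 100.0
    relative_pct = (fused - single) / single * 100.0
    return absolute_pp, relative_pct


# IoU values copied from the modality-necessity tables (optical-only vs optical-SAR).
mmdfrnet_pp, mmdfrnet_rel = iou_gain(fused=0.8612, single=0.7803)
unet_pp, unet_rel = iou_gain(fused=0.7873, single=0.8044)

print(f"MMDFRNet: {mmdfrnet_pp:+.2f} pp ({mmdfrnet_rel:+.2f}% relative)")  # +8.09 pp (+10.37% relative)
print(f"U-Net:    {unet_pp:+.2f} pp ({unet_rel:+.2f}% relative)")          # -1.71 pp (-2.13% relative)
```

Reporting the percentage-point figure alongside the relative change avoids overstating or understating the contribution of SAR data.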
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Wu, B.; Zhang, M.; Zeng, H.; Tian, F.; Potgieter, A.B.; Qin, X.; Yan, N.; Chang, S.; Zhao, Y.; Dong, Q.; et al. Challenges and Opportunities in Remote Sensing-Based Crop Monitoring: A Review. Natl. Sci. Rev. 2023, 10, nwac290.
2. Bhandari, B.; Mayer, T. Comparing Deep Learning Models for Mapping Rice Cultivation Area in Bhutan Using High-Resolution Satellite Imagery. ISPRS Open J. Photogramm. Remote Sens. 2025, 15, 100084.
3. Weiss, M.; Jacob, F.; Duveiller, G. Remote Sensing for Agricultural Applications: A Meta-Review. Remote Sens. Environ. 2020, 236, 111402.
4. Hashemi, M.G.Z.; Jalilvand, E.; Alemohammad, H.; Tan, P.-N.; Das, N.N. Review of Synthetic Aperture Radar with Deep Learning in Agricultural Applications. ISPRS J. Photogramm. Remote Sens. 2024, 218, 20–49.
5. Chen, W.; Ouyang, S.; Tong, W.; Li, X.; Zheng, X.; Wang, L. GCSANet: A Global Context Spatial Attention Deep Learning Network for Remote Sensing Scene Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 1150–1162.
6. Chen, Q.; Kuang, G.; Li, J.; Sui, L.; Li, D. Unsupervised Land Cover/Land Use Classification Using PolSAR Imagery Based on Scattering Similarity. IEEE Trans. Geosci. Remote Sens. 2013, 51, 1817–1825.
7. Zhan, L.; Ye, P.; Fan, J.; Chen, T. U2ConvFormer: Marrying and Evolving Nested U-Net and Scale-Aware Transformer for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5517114.
8. Yuan, Y.; Lin, L.; Liu, Q.; Hang, R.; Zhou, Z.-G. SITS-Former: A Pre-Trained Spatio-Spectral-Temporal Representation Model for Sentinel-2 Time Series Classification. Int. J. Appl. Earth Obs. Geoinf. 2022, 106, 102651.
9. Wang, Y.; Feng, L.; Zhang, Z.; Tian, F. An Unsupervised Domain Adaptation Deep Learning Method for Spatial and Temporal Transferable Crop Type Mapping Using Sentinel-2 Imagery. ISPRS J. Photogramm. Remote Sens. 2023, 199, 102–117.
10. Xu, Y.; Zhou, J.; Zhang, Z. A New Bayesian Semi-Supervised Active Learning Framework for Large-Scale Crop Mapping Using Sentinel-2 Imagery. ISPRS J. Photogramm. Remote Sens. 2024, 209, 17–34.
11. Fan, L.; Xia, L.; Yang, J.; Sun, X.; Wu, S.; Qiu, B.; Chen, J.; Wu, W.; Yang, P. A Temporal-Spatial Deep Learning Network for Winter Wheat Mapping Using Time-Series Sentinel-2 Imagery. ISPRS J. Photogramm. Remote Sens. 2024, 214, 48–64.
12. Xia, L.; Zhao, F.; Chen, J.; Yu, L.; Lu, M.; Yu, Q.; Liang, S.; Fan, L.; Sun, X.; Wu, S.; et al. A Full Resolution Deep Learning Network for Paddy Rice Mapping Using Landsat Data. ISPRS J. Photogramm. Remote Sens. 2022, 194, 91–107.
13. Du, M.; Huang, J.; Wei, P.; Yang, L.; Chai, D.; Peng, D.; Sha, J.; Sun, W.; Huang, R. Dynamic Mapping of Paddy Rice Using Multi-Temporal Landsat Data Based on a Deep Semantic Segmentation Model. Agronomy 2022, 12, 1583.
14. Zhao, F.; Xia, L.; Kylling, A.; Li, R.Q.; Shang, H.; Xu, M. Detection Flying Aircraft from Landsat 8 OLI Data. ISPRS J. Photogramm. Remote Sens. 2018, 141, 176–184.
15. Fu, T.; Tian, S.; Ge, J. R-Unet: A Deep Learning Model for Rice Extraction in Rio Grande Do Sul, Brazil. Remote Sens. 2023, 15, 4021.
16. Zhong, L.; Hu, L.; Zhou, H. Deep Learning Based Multi-Temporal Crop Classification. Remote Sens. Environ. 2019, 221, 430–443.
17. Jiang, T.; Liu, X.; Wu, L. Method for Mapping Rice Fields in Complex Landscape Areas Based on Pre-Trained Convolutional Neural Network from HJ-1 A/B Data. ISPRS Int. J. Geo-Inf. 2018, 7, 418.
18. Yang, L.; Huang, R.; Zhang, J.; Huang, J.; Wang, L.; Dong, J.; Shao, J. Inter-Continental Transfer of Pre-Trained Deep Learning Rice Mapping Model and Its Generalization Ability. Remote Sens. 2023, 15, 2443.
19. Fu, T.; Tian, S.; Zhan, Q. Phenological Analysis and Yield Estimation of Rice Based on Multi-Spectral and SAR Data in Maha Sarakham, Thailand. J. Spat. Sci. 2024, 69, 149–165.
20. Yang, H.; Pan, B.; Li, N.; Wang, W.; Zhang, J.; Zhang, X. A Systematic Method for Spatio-Temporal Phenology Estimation of Paddy Rice Using Time Series Sentinel-1 Images. Remote Sens. Environ. 2021, 259, 112394.
21. Lasko, K.; Vadrevu, K.P.; Tran, V.T.; Justice, C. Mapping Double and Single Crop Paddy Rice with Sentinel-1A at Varying Spatial Scales and Polarizations in Hanoi, Vietnam. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 498–512.
22. Han, Z.; Zhang, C.; Gao, L.; Zeng, Z.; Zhang, B.; Atkinson, P.M. Spatio-Temporal Multi-Level Attention Crop Mapping Method Using Time-Series SAR Imagery. ISPRS J. Photogramm. Remote Sens. 2023, 206, 293–310.
23. Adrian, J.; Sagan, V.; Maimaitijiang, M. Sentinel SAR-Optical Fusion for Crop Type Mapping Using Deep Learning and Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2021, 175, 215–235.
24. Wei, P.; Huang, R.; Lin, T.; Huang, J. Rice Mapping in Training Sample Shortage Regions Using a Deep Semantic Segmentation Model Trained on Pseudo-Labels. Remote Sens. 2022, 14, 328.
25. Wei, P.; Chai, D.; Lin, T.; Tang, C.; Du, M.; Huang, J. Large-Scale Rice Mapping under Different Years Based on Time-Series Sentinel-1 Images Using Deep Semantic Segmentation Model. ISPRS J. Photogramm. Remote Sens. 2021, 174, 198–214.
26. Wei, P.; Chai, D.; Huang, R.; Peng, D.; Lin, T.; Sha, J.; Sun, W.; Huang, J. Rice Mapping Based on Sentinel-1 Images Using the Coupling of Prior Knowledge and Deep Semantic Segmentation Network: A Case Study in Northeast China from 2019 to 2021. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102948.
27. Cai, Z.; Hu, Q.; Zhang, X.; Yang, J.; Wei, H.; Wang, J.; Zeng, Y.; Yin, G.; Li, W.; You, L.; et al. Improving Agricultural Field Parcel Delineation with a Dual Branch Spatiotemporal Fusion Network by Integrating Multimodal Satellite Data. ISPRS J. Photogramm. Remote Sens. 2023, 205, 34–49.
28. Cai, Z.; Wei, H.; Hu, Q.; Zhou, W.; Zhang, X.; Jin, W.; Wang, L.; Yu, S.; Wang, Z.; Xu, B.; et al. Learning Spectral-Spatial Representations from VHR Images for Fine-Scale Crop Type Mapping: A Case Study of Rice-Crayfish Field Extraction in South China. ISPRS J. Photogramm. Remote Sens. 2023, 199, 28–39.
29. Li, Y.; Zhou, Y.; Zhang, Y.; Zhong, L.; Wang, J.; Chen, J. DKDFN: Domain Knowledge-Guided Deep Collaborative Fusion Network for Multimodal Unitemporal Remote Sensing Land Cover Classification. ISPRS J. Photogramm. Remote Sens. 2022, 186, 170–189.
30. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
31. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
32. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. arXiv 2017, arXiv:1612.01105.
33. Wu, X.; Hong, D.; Chanussot, J. Convolutional Neural Networks for Multimodal Remote Sensing Data Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5517010.
34. Wang, L.; Li, R.; Zhang, C.; Fang, S.; Duan, C.; Meng, X.; Atkinson, P.M. UNetFormer: A UNet-like Transformer for Efficient Semantic Segmentation of Remote Sensing Urban Scene Imagery. ISPRS J. Photogramm. Remote Sens. 2022, 190, 196–214.
35. Lin, S.; Qi, Z.; Li, X.; Zhang, H.; Lv, Q.; Huang, D. A Phenological-Knowledge-Independent Method for Automatic Paddy Rice Mapping with Time Series of Polarimetric SAR Images. ISPRS J. Photogramm. Remote Sens. 2024, 218, 628–644.
36. Sainte Fare Garnot, V.; Landrieu, L.; Chehata, N. Multi-Modal Temporal Attention Models for Crop Mapping from Satellite Time Series. ISPRS J. Photogramm. Remote Sens. 2022, 187, 294–305.
37. Yang, J.; Hu, Q.; Li, W.; Song, Q.; Cai, Z.; Zhang, X.; Wei, H.; Wu, W. An Automated Sample Generation Method by Integrating Phenology Domain Optical-SAR Features in Rice Cropping Pattern Mapping. Remote Sens. Environ. 2024, 314, 114387.
38. Deng, J.; Hong, D.; Li, C.; Yokoya, N. Joint Super-Resolution and Segmentation for 1-m Impervious Surface Area Mapping in China’s Yangtze River Economic Belt. arXiv 2025, arXiv:2505.05367.
39. Li, X.; Li, C.; Ghamisi, P.; Hong, D. FlexiMo: A Flexible Remote Sensing Foundation Model. arXiv 2025, arXiv:2503.23844.
40. Li, H.; Qiu, K.; Chen, L.; Mei, X.; Hong, L.; Tao, C. SCAttNet: Semantic Segmentation Network with Spatial and Channel Attention Mechanism for High-Resolution Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2021, 18, 905–909.
41. Ghamisi, P.; Rasti, B.; Yokoya, N.; Wang, Q.; Höfle, B.; Bruzzone, L.; Bovolo, F.; Chi, M.; Anders, K.; Gloaguen, R.; et al. Multisource and Multitemporal Data Fusion in Remote Sensing: A Comprehensive Review of the State of the Art. IEEE Geosci. Remote Sens. Mag. 2019, 7, 6–39.
42. Liu, C.; Sun, Y.; Xu, Y.; Sun, Z.; Zhang, X.; Lei, L.; Kuang, G. A Review of Optical and SAR Image Deep Feature Fusion in Semantic Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 12910–12930.
43. Wang, H.; Liu, X.; Qiao, Z.; Wang, G.; Chen, H. Multimodal Remote Sensing Data Classification Based on Gaussian Mixture Variational Dynamic Fusion Network. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5621214.
44. Huang, Y.; Wang, Z.; Tang, T.; Ohtsuki, T.; Gui, G. Dual-Stream Multimodal Fusion with Local–Global Attention for Remote-Sensing Object Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2026, 19, 1691–1702.
45. Cheng, B.; Xu, B.; Deng, Q.; Shen, T. MIFNet: Multi-Modal Interactive Fusion Network For Remote Sensing Semantic Segmentation. In Proceedings of the IGARSS 2025—2025 IEEE International Geoscience and Remote Sensing Symposium; IEEE: New York, NY, USA, 2025; pp. 6980–6984.
46. Schmitt, M.; Zhu, X.X. Data Fusion and Remote Sensing: An Ever-Growing Relationship. IEEE Geosci. Remote Sens. Mag. 2016, 4, 6–23.
47. Schmitt, M.; Tupin, F.; Zhu, X.X. Fusion of SAR and Optical Remote Sensing Data—Challenges and Recent Trends. In Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS); IEEE: New York, NY, USA, 2017; pp. 5458–5461.
48. Rußwurm, M.; Courty, N.; Emonet, R.; Lefèvre, S.; Tuia, D.; Tavenard, R. End-to-End Learned Early Classification of Time Series for in-Season Crop Type Mapping. ISPRS J. Photogramm. Remote Sens. 2023, 196, 445–456.
49. Seong, S.; Chang, A.; Mo, J.; Na, S.; Ahn, H.; Oh, J.; Choi, J. Crop Classification in South Korea for Multitemporal PlanetScope Imagery Using SFC-DenseNet-AM. Int. J. Appl. Earth Obs. Geoinf. 2024, 126, 103619.
50. Hong, D.; Gao, L.; Yokoya, N.; Yao, J.; Chanussot, J.; Du, Q.; Zhang, B. More Diverse Means Better: Multimodal Deep Learning Meets Remote-Sensing Imagery Classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 4340–4354.
51. Hang, R.; Li, Z.; Ghamisi, P.; Hong, D.; Xia, G.; Liu, Q. Classification of Hyperspectral and LiDAR Data Using Coupled CNNs. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4939–4950.
52. Zhang, P.; Ke, Y.; Zhang, Z.; Wang, M.; Li, P.; Zhang, S. Urban Land Use and Land Cover Classification Using Novel Deep Learning Models Based on High Spatial Resolution Satellite Imagery. Sensors 2018, 18, 3717.
53. Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv 2017, arXiv:1608.03983.
54. Zhang, L.; Chen, X.; Zhang, J.; Dong, R.; Ma, K. Contrastive Deep Supervision. arXiv 2022, arXiv:2207.05306.
55. Zhang, C.; Yue, P.; Tapete, D.; Jiang, L.; Shangguan, B.; Huang, L.; Liu, G. A Deeply Supervised Image Fusion Network for Change Detection in High Resolution Bi-Temporal Remote Sensing Images. ISPRS J. Photogramm. Remote Sens. 2020, 166, 183–200.
56. Jiang, N.; Li, P.; Feng, Z. Remote Sensing of Swidden Agriculture in the Tropics: A Review. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102876.
57. Veloso, A.; Mermoz, S.; Bouvet, A.; Le Toan, T.; Planells, M.; Dejoux, J.-F.; Ceschia, E. Understanding the Temporal Behavior of Crops Using Sentinel-1 and Sentinel-2-like Data for Agricultural Applications. Remote Sens. Environ. 2017, 199, 415–426.
58. Liu, L.; Xiao, X.; Qin, Y.; Wang, J.; Xu, X.; Hu, Y.; Qiao, Z. Mapping Cropping Intensity in China Using Time Series Landsat and Sentinel-2 Images and Google Earth Engine. Remote Sens. Environ. 2020, 239, 111624.
59. Zhang, G.; Xiao, X.; Dong, J.; Kou, W.; Jin, C.; Qin, Y.; Zhou, Y.; Wang, J.; Menarguez, M.A.; Biradar, C. Mapping Paddy Rice Planting Areas through Time Series Analysis of MODIS Land Surface Temperature and Vegetation Index Data. ISPRS J. Photogramm. Remote Sens. 2015, 106, 157–171.
60. Zhu, Y.; Pan, Y.; Zhang, D.; Wu, H.; Zhao, C. A Deep Learning Method for Cultivated Land Parcels (CLPs) Delineation from High-Resolution Remote Sensing Images with High-Generalization Capability. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4410525.













| Study area | Sentinel-1 acquisition | Sentinel-2 acquisition |
|---|---|---|
| Qianjiang | April to September 2023 | April and August 2023 |
| Jianli | March and August 2023 | |
| Huoqiu | May and August 2023 | |
| Yongxiu | May to October 2023 | May and October 2023 |

| Study area | Training and validation patches | Test patches |
|---|---|---|
| Qianjiang | 2025 | 539 |
| Jianli | 1125 | 343 |
| Huoqiu | 450 | 196 |
| Yongxiu | 675 | 196 |

| Model | Precision | IoU | F1-Score | MCC |
|---|---|---|---|---|
| U-Net | 0.8062 | 0.7873 | 0.8800 | 0.8200 |
| PSPNet | 0.7535 | 0.7121 | 0.8295 | 0.7447 |
| R-Unet | 0.7610 | 0.7503 | 0.8554 | 0.7873 |
| CCRNet | 0.8775 | 0.8342 | 0.9094 | 0.8314 |
| UNetFormer | 0.8632 | 0.8207 | 0.9015 | 0.8413 |
| STMA | 0.8491 | 0.8091 | 0.8945 | 0.8299 |
| MMDFRNet | 0.9234 | 0.8612 | 0.9252 | 0.8879 |

| Model | U-Net | PSPNet | R-Unet | CCRNet | UNetFormer | STMA | MMDFRNet |
|---|---|---|---|---|---|---|---|
| Inference time per image (s) | 0.1649 | 0.1488 | 0.099 | 1.3929 | 6.4432 | 6.7643 | 6.0274 |

| Region | MMDFRNet input | Precision | IoU | F1-Score | MCC |
|---|---|---|---|---|---|
| Qianjiang | Optical-only | 0.8091 | 0.7803 | 0.8766 | 0.8014 |
| Qianjiang | SAR-only | 0.7107 | 0.6618 | 0.7965 | 0.6658 |
| Qianjiang | Optical-SAR | 0.9234 | 0.8612 | 0.9252 | 0.8879 |
| Yongxiu | Optical-only | 0.7687 | 0.7469 | 0.8551 | 0.7712 |
| Yongxiu | SAR-only | 0.8537 | 0.7438 | 0.8531 | 0.7698 |
| Yongxiu | Optical-SAR | 0.8834 | 0.8465 | 0.9160 | 0.8612 |

| U-Net input | Precision | IoU | F1-Score | MCC |
|---|---|---|---|---|
| Optical-only | 0.8373 | 0.8044 | 0.8916 | 0.8255 |
| SAR-only | 0.6888 | 0.6444 | 0.7838 | 0.6441 |
| Optical-SAR | 0.8062 | 0.7873 | 0.8800 | 0.8200 |

| Region | Precision | IoU | F1-Score | MCC |
|---|---|---|---|---|
| Qianjiang | 0.9234 | 0.8612 | 0.9252 | 0.8879 |
| Jianli | 0.8665 | 0.8129 | 0.8963 | 0.8501 |
| Huoqiu | 0.8848 | 0.8456 | 0.9159 | 0.7707 |
| Yongxiu | 0.8834 | 0.8465 | 0.9160 | 0.8612 |

| Region | Model | Precision | IoU | F1-Score | MCC |
|---|---|---|---|---|---|
| Qianjiang | MMDFRNet | 0.9234 | 0.8612 | 0.9252 | 0.8879 |
| Qianjiang | MMFNet | 0.8914 | 0.8311 | 0.9075 | 0.8589 |
| Qianjiang | MSFNet | 0.8548 | 0.8152 | 0.8974 | 0.8449 |
| Qianjiang | MMDFNet | 0.8102 | 0.7703 | 0.8676 | 0.7998 |
| Qianjiang | RNet | 0.8061 | 0.7848 | 0.8734 | 0.8010 |
| Jianli | MMDFRNet | 0.8665 | 0.8129 | 0.8963 | 0.8501 |
| Jianli | MMFNet | 0.8247 | 0.7476 | 0.8538 | 0.7905 |
| Jianli | MSFNet | 0.8404 | 0.7812 | 0.8767 | 0.8190 |
| Jianli | MMDFNet | 0.7742 | 0.7330 | 0.8442 | 0.7810 |
| Jianli | RNet | 0.8451 | 0.7543 | 0.859 | 0.7917 |
| Huoqiu | MMDFRNet | 0.8848 | 0.8456 | 0.9159 | 0.7707 |
| Huoqiu | MMFNet | 0.9173 | 0.8405 | 0.9121 | 0.7775 |
| Huoqiu | MSFNet | 0.8578 | 0.8115 | 0.8950 | 0.7092 |
| Huoqiu | MMDFNet | 0.8896 | 0.8259 | 0.9033 | 0.7519 |
| Huoqiu | RNet | 0.9199 | 0.6959 | 0.8045 | 0.6450 |
| Yongxiu | MMDFRNet | 0.8834 | 0.8465 | 0.9160 | 0.8612 |
| Yongxiu | MMFNet | 0.8694 | 0.8285 | 0.9048 | 0.8443 |
| Yongxiu | MSFNet | 0.8061 | 0.7848 | 0.8734 | 0.8010 |
| Yongxiu | MMDFNet | 0.6548 | 0.6496 | 0.7644 | 0.6883 |
| Yongxiu | RNet | 0.8138 | 0.7484 | 0.8486 | 0.7945 |

| Region | Model | Precision | IoU | F1-Score | MCC |
|---|---|---|---|---|---|
| Jianli | U-Net | 0.8372 | 0.7668 | 0.8665 | 0.8079 |
| Jianli | PSPNet | 0.8234 | 0.7068 | 0.8249 | 0.7546 |
| Jianli | R-Unet | 0.8297 | 0.6921 | 0.8085 | 0.7447 |
| Jianli | CCRNet | 0.8400 | 0.7448 | 0.8525 | 0.7876 |
| Jianli | UNetFormer | 0.8667 | 0.7446 | 0.8536 | 0.7902 |
| Jianli | STMA | 0.8284 | 0.7947 | 0.8856 | 0.8340 |
| Jianli | MMDFRNet | 0.8665 | 0.8129 | 0.8963 | 0.8501 |
| Huoqiu | U-Net | 0.8570 | 0.8051 | 0.8847 | 0.7477 |
| Huoqiu | PSPNet | 0.9088 | 0.8099 | 0.8938 | 0.7366 |
| Huoqiu | R-Unet | 0.8792 | 0.7979 | 0.8781 | 0.7516 |
| Huoqiu | CCRNet | 0.8966 | 0.8327 | 0.9081 | 0.7584 |
| Huoqiu | UNetFormer | 0.8595 | 0.8160 | 0.8987 | 0.7799 |
| Huoqiu | STMA | 0.8924 | 0.8152 | 0.8982 | 0.7847 |
| Huoqiu | MMDFRNet | 0.8848 | 0.8456 | 0.9159 | 0.7707 |
| Yongxiu | U-Net | 0.9370 | 0.8032 | 0.8863 | 0.8337 |
| Yongxiu | PSPNet | 0.9265 | 0.789 | 0.8781 | 0.8189 |
| Yongxiu | R-Unet | 0.9315 | 0.8042 | 0.8871 | 0.834 |
| Yongxiu | CCRNet | 0.8605 | 0.8244 | 0.8982 | 0.8325 |
| Yongxiu | UNetFormer | 0.8417 | 0.7660 | 0.8675 | 0.8260 |
| Yongxiu | STMA | 0.8698 | 0.8311 | 0.9078 | 0.8538 |
| Yongxiu | MMDFRNet | 0.8834 | 0.8465 | 0.9160 | 0.8612 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Fu, T.; Ge, J.; Tian, S. MMDFRNet: Dynamic Cross-Modal Decoupling and Alignment for Robust Rice Mapping. Remote Sens. 2026, 18, 1413. https://doi.org/10.3390/rs18091413

