Transformer-Based Dual-Branch Spatial–Temporal–Spectral Feature Fusion Network for Paddy Rice Mapping
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area
2.2. Remote Sensing Data Preparation
2.2.1. Sentinel-1 Time-Series SAR Images
2.2.2. Sentinel-2 Multispectral Image
2.2.3. The Ground Truth Data
2.2.4. Training, Validation and Test Samples
2.3. Analysis of Diversity Across Regions
2.4. Model and Principles
2.4.1. RDTFNet
2.4.2. SAR Branch Encoder
2.4.3. Optical Branch Encoder
2.4.4. Feature Fusion Module (FFM)
2.4.5. Loss Function
2.4.6. Performance Evaluation and Experimental Setup
3. Results
3.1. Cross-Regional Temporal Diversity Analysis
3.2. Comparative Results of Basic Performance of Rice Extraction
3.3. Comparative Results of Cross-Regional Generalization Capabilities
3.4. Comparison Results of Temporal Generalization Capabilities
4. Discussion
4.1. Ablation Study
- (1) Effect of the SAR-temporal branch encoder: As shown in Table 7, replacing the plain convolutional branch with the Restormer CNN (R-CNN) branch developed in this study improved the overall accuracy by 0.19%, the IoU by 0.82%, and the recall by 1.42% on the test set compared with DE-UNet, at the cost of a 0.56% drop in precision but with a 0.47% gain in F1-score. The Restormer module emphasizes local context while implicitly modeling global relationships between pixels through its covariance-based attention maps [24], and its cross-channel self-attention matches the requirements of temporal feature extraction. This result validates that, compared with a traditional convolutional branch, the Restormer CNN branch captures richer spatial and temporal features from SAR imagery, improving the model's capacity to recognize rice (a minimal sketch of this channel-wise attention appears after this list).
- (2) Effect of the optical–spatial branch encoder: As shown in Table 7, replacing the convolutional branch with the multiscale transformer-CNN (MTCNN) module as the encoding branch improved the overall accuracy by 0.15%, the IoU by 0.61%, the recall by 0.82%, and the F1-score by 0.35% on the test set compared with DE-UNet. This confirms the effectiveness of multiscale global attention in rice extraction and demonstrates that adding a multiscale mechanism compensates for insufficient granularity in global context information [25] (this mechanism is also sketched below).
- (3) Effect of the FFM: Table 7 shows the results of applying the FFM to fuse the optical and SAR features from the dual-branch encoder on the test set images. The overall accuracy improved by 0.22%, the IoU by 0.79%, the precision by 0.76%, the recall by 0.15%, and the F1-score by 0.45%. The FFM integrates the temporal information from SAR imagery with the spatial information from optical imagery at the feature level, learning complementary information from the multimodal data [45]. With the FFM, the network became more sensitive to the phenological characteristics of rice, producing fewer misclassifications between rice-growing areas and rivers or lakes; this confusion arises because the unique irrigation period of rice makes its optical features resemble those of water bodies. The results indicate that the FFM effectively captures the phenological characteristics of rice, enhances category differentiation, and improves segmentation between rice and non-rice areas (an illustrative fusion module is included in the sketch below).
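To make the three components above concrete, the following is a minimal PyTorch sketch, not the authors' implementation: all module names, shapes, and hyperparameters are our assumptions. The channel-wise attention follows the "transposed" attention of Restormer [24]; the multiscale attention loosely follows pyramid-pooled global attention in the spirit of CMTFNet [25]; and the FFM is illustrated as concatenation followed by SE-style channel gating, one common way to fuse features at the feature level.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelSelfAttention(nn.Module):
    """Restormer-style 'transposed' attention [24]: the attention map is
    C x C (a normalized covariance across channels), so when the input
    channels are time-stacked SAR composites, attention acts across
    acquisition dates instead of across pixels."""

    def __init__(self, dim: int, heads: int = 4):  # dim must be divisible by heads
        super().__init__()
        self.heads = heads
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)
        self.temperature = nn.Parameter(torch.ones(heads, 1, 1))
        self.out = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        shape = (b, self.heads, c // self.heads, h * w)
        q, k, v = (t.reshape(shape) for t in (q, k, v))
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature  # (b, heads, c', c')
        out = attn.softmax(dim=-1) @ v                       # (b, heads, c', h*w)
        return self.out(out.reshape(b, c, h, w)) + x


class MultiScaleAttention(nn.Module):
    """Illustrative multiscale global attention for the optical branch:
    keys/values are drawn from feature maps pooled at several scales, so
    each query position attends to global context at several granularities."""

    def __init__(self, dim: int, pool_sizes=(1, 2, 4)):
        super().__init__()
        self.pool_sizes = pool_sizes
        self.q = nn.Conv2d(dim, dim, kernel_size=1)
        self.kv = nn.Conv1d(dim, dim * 2, kernel_size=1)
        self.out = nn.Conv2d(dim, dim, kernel_size=1)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)              # (b, h*w, c)
        pooled = torch.cat(
            [F.adaptive_avg_pool2d(x, s).flatten(2) for s in self.pool_sizes],
            dim=2,
        )                                                     # (b, c, 1+4+16)
        k, v = self.kv(pooled).chunk(2, dim=1)                # (b, c, n) each
        attn = torch.softmax(q @ k * self.scale, dim=-1)      # (b, h*w, n)
        out = (attn @ v.transpose(1, 2)).transpose(1, 2)      # (b, c, h*w)
        return self.out(out.reshape(b, c, h, w)) + x


class FFM(nn.Module):
    """Illustrative feature fusion module: concatenate SAR and optical
    feature maps, project, then reweight channels (SE-style gating) so the
    decoder can emphasize the complementary cues of the two modalities."""

    def __init__(self, dim_sar: int, dim_opt: int, dim_out: int):
        super().__init__()
        self.proj = nn.Conv2d(dim_sar + dim_opt, dim_out, kernel_size=1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(dim_out, dim_out, 1), nn.Sigmoid()
        )

    def forward(self, f_sar: torch.Tensor, f_opt: torch.Tensor) -> torch.Tensor:
        fused = self.proj(torch.cat([f_sar, f_opt], dim=1))
        return fused * self.gate(fused)


if __name__ == "__main__":
    f_sar = ChannelSelfAttention(dim=64)(torch.randn(2, 64, 32, 32))
    f_opt = MultiScaleAttention(dim=64)(torch.randn(2, 64, 32, 32))
    print(FFM(64, 64, 128)(f_sar, f_opt).shape)  # torch.Size([2, 128, 32, 32])
```

Because the channel attention map is C × C rather than (HW) × (HW), attention over a time-stacked SAR input acts across acquisition dates at full spatial resolution, which is the property the ablation attributes to the temporal branch.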
4.2. The Impact of Single/Dual-Branch Model Structures on Rice Extraction
4.3. The Impact of Unimodal/Multimodal Data on Rice Extraction
5. Conclusions
- (1) In terms of baseline performance and generalization capability, RDTFNet achieves high classification accuracy in rice extraction, outperforming several existing deep learning models. On in-region test data, it reached an overall accuracy (OA) of 96.95%, an intersection over union (IoU) of 88.12%, and an F1-score of 93.68%, improvements of 1.61%, 5.37%, and 2.53%, respectively, over the compared models. In cross-region tests, RDTFNet showed the smallest performance drop, achieving a 92.55% F1-score with only a 1.13% decrease, demonstrating strong spatial generalization. In cross-temporal evaluation, it again had the smallest metric reductions, maintaining an F1-score of 90.53%, just 2.03% lower than in cross-region testing, highlighting its robust temporal generalization.
- (2) Compared with single-branch architectures, the dual-branch design offers higher accuracy and better overall performance. On the 2019 dataset of Study Area A, models using only a single branch (spatial or temporal) achieved at most an F1-score of 92.61% and an IoU of 86.24%, indicating their limitations in fully exploiting multimodal information. In contrast, RDTFNet achieved an IoU of 88.12%, a precision of 95.14%, and an F1-score of 93.68%. Compared with single-branch models such as U-Net, Restormer-UNet, and MSTransformer-UNet, RDTFNet achieved the highest scores across all evaluation metrics, demonstrating superior performance in rice mapping.
- (3) To evaluate the impact of unimodal and multimodal data, experiments were conducted using either optical or SAR data alone, and the results were compared with those obtained from the multimodal RDTFNet. The results demonstrate that the multimodal RDTFNet achieved the best performance, with IoU scores exceeding those of the unimodal optical and SAR inputs by 11.08% and 10.33%, respectively, highlighting the complementary advantages of optical and SAR data.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Boddiger, D. Boosting biofuel crops could threaten food security. Lancet 2007, 370, 923–924.
2. Bongaarts, J. The State of Food Security and Nutrition in the World 2020. Transforming food systems for affordable healthy diets. Popul. Dev. Rev. 2021, 10–11.
3. Funk, C.C.; Brown, M.E. Declining global per capita agricultural production and warming oceans threaten food security. Food Secur. 2009, 1, 271–289.
4. Godfray, H.C.J.; Beddington, J.R.; Crute, I.R.; Haddad, L.; Lawrence, D.; Muir, J.F.; Pretty, J.; Robinson, S.; Thomas, S.M.; Toulmin, C. Food security: The challenge of feeding 9 billion people. Science 2010, 327, 812–818.
5. Bioucas-Dias, J.M.; Plaza, A.; Camps-Valls, G.; Scheunders, P.; Nasrabadi, N.; Chanussot, J. Hyperspectral remote sensing data analysis and future challenges. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–36.
6. Bo, L.; Xiaoyang, X.; Xingxing, W.; Wenting, T. Ship detection and classification from optical remote sensing images: A survey. Chin. J. Aeronaut. 2021, 34, 145–163.
7. Hu, Q.; Yin, H.; Friedl, M.A.; You, L.; Li, Z.; Tang, H.; Wu, W. Integrating coarse-resolution images and agricultural statistics to generate sub-pixel crop type maps and reconciled area estimates. Remote Sens. Environ. 2021, 258, 112365.
8. Lorenz, S.; Ghamisi, P.; Kirsch, M.; Jackisch, R.; Rasti, B.; Gloaguen, R. Feature extraction for hyperspectral mineral domain mapping: A test of conventional and innovative methods. Remote Sens. Environ. 2021, 252, 112129.
9. Wei, P.; Huang, R.; Lin, T.; Huang, J. Rice mapping in training sample shortage regions using a deep semantic segmentation model trained on pseudo-labels. Remote Sens. 2022, 14, 328.
10. Zhu, S.; Li, S.; Yang, Z. Research on the Distribution Map of Weeds in Rice Field Based on SegNet. In 3D Imaging—Multidimensional Signal Processing and Deep Learning: Multidimensional Signals, Images, Video Processing and Applications; Springer: Berlin/Heidelberg, Germany, 2022; Volume 2, pp. 91–99.
11. Wang, M.; Wang, J.; Cui, Y.; Liu, J.; Chen, L. Agricultural field boundary delineation with satellite image segmentation for high-resolution crop mapping: A case study of rice paddy. Agronomy 2022, 12, 2342.
12. Fan, X.; Yan, C.; Fan, J.; Wang, N. Improved U-net remote sensing classification algorithm fusing attention and multiscale features. Remote Sens. 2022, 14, 3591.
13. Crisóstomo de Castro Filho, H.; Abílio de Carvalho Júnior, O.; Ferreira de Carvalho, O.L.; Pozzobon de Bem, P.; dos Santos de Moura, R.; Olino de Albuquerque, A.; Rosa Silva, C.; Guimaraes Ferreira, P.H.; Fontes Guimarães, R.; Trancoso Gomes, R.A. Rice crop detection using LSTM, Bi-LSTM, and machine learning models from Sentinel-1 time series. Remote Sens. 2020, 12, 2655.
14. Wang, X.; Zhang, J.; Xun, L.; Wang, J.; Wu, Z.; Henchiri, M.; Zhang, S.; Zhang, S.; Bai, Y.; Yang, S. Evaluating the effectiveness of machine learning and deep learning models combined time-series satellite data for multiple crop types classification over a large-scale region. Remote Sens. 2022, 14, 2341.
15. Radford, A. Improving Language Understanding by Generative Pre-Training. 2018. Available online: https://www.mikecaptain.com/resources/pdf/GPT-1.pdf (accessed on 20 April 2024).
16. Garnot, V.S.F.; Landrieu, L. Panoptic segmentation of satellite image time series with convolutional temporal attention networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 4872–4881.
17. Zhang, Q.; Yang, Y.-B. ResT: An efficient transformer for visual recognition. Adv. Neural Inf. Process. Syst. 2021, 34, 15475–15485.
18. Srinivas, A.; Lin, T.-Y.; Parmar, N.; Shlens, J.; Abbeel, P.; Vaswani, A. Bottleneck transformers for visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 16519–16529.
19. Dai, Z.; Liu, H.; Le, Q.V.; Tan, M. CoAtNet: Marrying convolution and attention for all data sizes. Adv. Neural Inf. Process. Syst. 2021, 34, 3965–3977.
20. Niu, B.; Feng, Q.; Chen, B.; Ou, C.; Liu, Y.; Yang, J. HSI-TransUNet: A transformer based semantic segmentation model for crop mapping from UAV hyperspectral imagery. Comput. Electron. Agric. 2022, 201, 107297.
21. He, X.; Zhou, Y.; Zhao, J.; Zhang, D.; Yao, R.; Xue, Y. Swin transformer embedding UNet for remote sensing image semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15.
22. Li, H.; Chen, S.-B.; Huang, L.-L.; Ding, C.; Tang, J.; Luo, B. DEGANet: Road Extraction Using Dual-branch Encoder with Gated Attention Mechanism. IEEE Geosci. Remote Sens. Lett. 2024, 21, 8003705.
23. Wei, H.; Xu, X.; Ou, N.; Zhang, X.; Dai, Y. DEANet: Dual encoder with attention network for semantic segmentation of remote sensing imagery. Remote Sens. 2021, 13, 3900.
24. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.-H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 5728–5739.
25. Wu, H.; Huang, P.; Zhang, M.; Tang, W.; Yu, X. CMTFNet: CNN and multiscale transformer fusion network for remote sensing image semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 2004612.
26. Yang, L.; Huang, R.; Huang, J.; Lin, T.; Wang, L.; Mijiti, R.; Wei, P.; Tang, C.; Shao, J.; Li, Q. Semantic segmentation based on temporal features: Learning of temporal–spatial information from time-series SAR images for paddy rice mapping. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–16.
27. Fu, T.; Tian, S.; Ge, J. R-Unet: A Deep Learning Model for Rice Extraction in Rio Grande do Sul, Brazil. Remote Sens. 2023, 15, 4021.
28. Shipp, M. Rice Crop Timeline for the Southern States of Arkansas, Louisiana, and Mississippi; NSF Center for Integrated Pest Management: Raleigh, NC, USA, 2005.
29. Wilson, C., Jr.; Branson, J. Trends in Arkansas rice production. BR Wells Rice Res. Stud. 2004, 550, 13–22.
30. Hill, J.; Williams, J.; Mutters, R.; Greer, C. The California rice cropping system: Agronomic and natural resource issues for long-term sustainability. Paddy Water Environ. 2006, 4, 13–19.
31. Torres, R.; Snoeij, P.; Geudtner, D.; Bibby, D.; Davidson, M.; Attema, E.; Potin, P.; Rommen, B.; Floury, N.; Brown, M. GMES Sentinel-1 mission. Remote Sens. Environ. 2012, 120, 9–24.
32. Yang, C.; Zhang, D.; Zhao, C.; Han, B.; Sun, R.; Du, J.; Chen, L. Ground deformation revealed by Sentinel-1 MSBAS-InSAR time-series over Karamay Oilfield, China. Remote Sens. 2019, 11, 2027.
33. Onojeghuo, A.O.; Miao, Y.; Blackburn, G.A. Deep ResU-Net Convolutional Neural Networks Segmentation for Smallholder Paddy Rice Mapping Using Sentinel 1 SAR and Sentinel 2 Optical Imagery. Remote Sens. 2023, 15, 1517.
34. Gumma, M.K.; Nelson, A.; Thenkabail, P.S.; Singh, A.N. Mapping rice areas of South Asia using MODIS multitemporal data. J. Appl. Remote Sens. 2011, 5, 53547–53573.
35. Huete, A.; Didan, K.; Miura, T.; Rodriguez, E.P.; Gao, X.; Ferreira, L.G. Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sens. Environ. 2002, 83, 195–213.
36. Liu, W.; Dong, J.; Xiang, K.; Wang, S.; Han, W.; Yuan, W. A sub-pixel method for estimating planting fraction of paddy rice in Northeast China. Remote Sens. Environ. 2018, 205, 305–314.
37. Boryan, C.; Yang, Z.; Mueller, R.; Craig, M. Monitoring US agriculture: The US department of agriculture, national agricultural statistics service, cropland data layer program. Geocarto Int. 2011, 26, 341–358.
38. Shao, Y.; Lunetta, R.S.; Wheeler, B.; Iiames, J.S.; Campbell, J.B. An evaluation of time-series smoothing algorithms for land-cover classifications using MODIS-NDVI multi-temporal data. Remote Sens. Environ. 2016, 174, 258–265.
39. Zhong, L.; Hu, L.; Zhou, H.; Tao, X. Deep learning based winter wheat mapping using statistical data as ground references in Kansas and northern Texas, US. Remote Sens. Environ. 2019, 233, 111411.
40. Sun, Z.; Di, L.; Fang, H. Using long short-term memory recurrent neural network in land cover classification on Landsat and Cropland data layer time series. Int. J. Remote Sens. 2019, 40, 593–614.
41. Wang, S.; Chen, W.; Xie, S.M.; Azzari, G.; Lobell, D.B. Weakly supervised deep learning for segmentation of remote sensing imagery. Remote Sens. 2020, 12, 207.
42. Zhang, X.; Liu, C.; Yang, D.; Song, T.; Ye, Y.; Li, K.; Song, Y. RFAConv: Innovating spatial attention and standard convolutional operation. arXiv 2023, arXiv:2304.03198.
43. de Bem, P.P.; de Carvalho Júnior, O.A.; de Carvalho, O.L.F.; Gomes, R.A.T.; Guimarães, R.F.; Pimentel, C.M.M. Irrigated rice crop identification in Southern Brazil using convolutional neural networks and Sentinel-1 time series. Remote Sens. Appl. Soc. Environ. 2021, 24, 100627.
44. Xia, L.; Zhao, F.; Chen, J.; Yu, L.; Lu, M.; Yu, Q.; Liang, S.; Fan, L.; Sun, X.; Wu, S. A full resolution deep learning network for paddy rice mapping using Landsat data. ISPRS J. Photogramm. Remote Sens. 2022, 194, 91–107.
45. Yang, X.; Li, S.; Chen, Z.; Chanussot, J.; Jia, X.; Zhang, B.; Li, B.; Chen, P. An attention-fused network for semantic segmentation of very-high-resolution remote sensing imagery. ISPRS J. Photogramm. Remote Sens. 2021, 177, 238–262.
Index | Start Date | End Date |
---|---|---|
1 | April 1 | April 24 |
2 | April 25 | May 18 |
3 | May 19 | June 11 |
4 | June 12 | July 5 |
5 | July 6 | July 29 |
6 | July 30 | August 22 |
7 | August 23 | September 15 |
8 | September 16 | October 9 |
9 | October 10 | November 3 |
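As a reading aid (not from the paper): the nine windows above tile the April–early-November rice season into roughly 24-day compositing periods, and the following minimal Python sketch reproduces their boundaries; the year 2019 is illustrative.

```python
from datetime import date, timedelta

season_start = date(2019, 4, 1)                # year chosen for illustration
window = timedelta(days=24)

begin = season_start
for index in range(1, 10):                     # nine compositing periods
    end = begin + window - timedelta(days=1)   # inclusive end date
    print(f"{index}: {begin:%B} {begin.day} - {end:%B} {end.day}")
    begin = end + timedelta(days=1)
# Periods 1-8 match the table exactly; the table's final period runs one
# day longer (to November 3), apparently to close out the season.
```

Nine periods paired with the two SAR polarizations (VV, VH) account for the 18 SAR input channels listed in the next table.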
Datasets | Input Bands | Channels | Description |
---|---|---|---|
SAR | VV+VH | 18 | VV: vertical–vertical polarization; VH: vertical–horizontal polarization |
Optical | Indices | 3 | Indices: NDVI+EVI+LSWI; NDVI: normalized difference vegetation index; EVI: enhanced vegetation index; LSWI: land surface water index |
SAR and Optical | VV+VH and Indices | 18 and 3 | |
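For reference, the three optical indices listed above have standard definitions [35]; the mapping of NIR/Red/Blue/SWIR to Sentinel-2 bands (e.g., B8/B4/B2/B11) is our assumption, since the excerpt does not state it:

$$
\mathrm{NDVI}=\frac{\rho_{\mathrm{NIR}}-\rho_{\mathrm{Red}}}{\rho_{\mathrm{NIR}}+\rho_{\mathrm{Red}}},\qquad
\mathrm{EVI}=2.5\,\frac{\rho_{\mathrm{NIR}}-\rho_{\mathrm{Red}}}{\rho_{\mathrm{NIR}}+6\rho_{\mathrm{Red}}-7.5\rho_{\mathrm{Blue}}+1},\qquad
\mathrm{LSWI}=\frac{\rho_{\mathrm{NIR}}-\rho_{\mathrm{SWIR}}}{\rho_{\mathrm{NIR}}+\rho_{\mathrm{SWIR}}}
$$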
Study Area | Class | Number of Samples | Area/km² (%) |
---|---|---|---|
Study Area A | Corn | 4,146,263 | 3734.5 (13.40%) |
Study Area A | Cotton | 5,083,928 | 4579.1 (16.43%) |
Study Area A | Rice | 5,376,608 | 4842.7 (17.37%) |
Study Area A | Sorghum | 32,314 | 29.1 (0.10%) |
Study Area A | Soybeans | 16,070,391 | 14,474.6 (51.93%) |
Study Area A | Peanuts | 166,313 | 149.8 (0.54%) |
Study Area A | Winter Wheat | 39,842 | 35.9 (0.13%) |
Study Area A | Alfalfa | 4200 | 3.8 (0.01%) |
Study Area A | Sweet Potatoes | 25,682 | 23.1 (0.08%) |
Study Area B | Corn | 88,449 | 79.7 (2.65%) |
Study Area B | Cotton | 15,640 | 14.1 (0.47%) |
Study Area B | Rice | 2,324,439 | 2093.6 (69.54%) |
Study Area B | Sorghum | 5897 | 5.3 (0.18%) |
Study Area B | Sunflowers | 229,388 | 206.6 (6.86%) |
Study Area B | Barley | 17,126 | 15.4 (0.51%) |
Study Area B | Winter Wheat | 276,386 | 248.9 (8.27%) |
Study Area B | Alfalfa | 344,142 | 310.0 (10.30%) |
Study Area B | Dry Beans | 41,128 | 37.0 (1.23%) |
Model | OA | IoU | Precision | Recall | F1-Score | Specificity |
---|---|---|---|---|---|---|
DeepLabV3 | 95.34% | 82.74% | 89.98% | 91.15% | 91.15% | 96.70% |
SegNet | 95.67% | 83.54% | 92.26% | 89.84% | 91.03% | 97.88% |
U-Net | 96.32% | 85.80% | 94.04% | 90.73% | 92.35% | 97.87% |
CMTFNet | 96.32% | 85.84% | 93.72% | 91.08% | 92.38% | 98.02% |
TFBS | 96.46% | 86.24% | 94.81% | 90.51% | 92.61% | 98.39% |
R-Unet | 96.90% | 87.86% | 94.46% | 91.70% | 93.05% | 98.58% |
RDTFNet | 96.95% | 88.12% | 95.14% | 92.27% | 93.68% | 98.71% |
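The metrics in this and the following tables are the conventional per-pixel confusion-matrix measures for the rice class, with TP, FP, TN, FN denoting true/false positives/negatives (the paper defines its metrics in Section 2.4.6, which is not shown in this excerpt):

$$
\begin{aligned}
\mathrm{OA}&=\frac{TP+TN}{TP+TN+FP+FN}, &\quad \mathrm{IoU}&=\frac{TP}{TP+FP+FN},\\[2pt]
\mathrm{Precision}&=\frac{TP}{TP+FP}, &\quad \mathrm{Recall}&=\frac{TP}{TP+FN},\\[2pt]
\mathrm{F1}&=\frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}, &\quad \mathrm{Specificity}&=\frac{TN}{TN+FP}.
\end{aligned}
$$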
Model | OA | IoU | Precision | Recall | F1-Score | Specificity |
---|---|---|---|---|---|---|
DeepLabV3 | 96.60% (+1.25%) | 74.61% (−8.14%) | 82.49% (−7.49%) | 88.65% (−2.50%) | 85.44% (−5.71%) | 97.61% (+0.90%) |
SegNet | 96.95% (+1.29%) | 76.29% (−7.26%) | 86.32% (−5.95%) | 86.90% (−2.94%) | 86.41% (−4.62%) | 98.23% (+0.35%) |
U-Net | 97.37% (+1.05%) | 79.16% (−6.64%) | 88.16% (−5.88%) | 88.60% (−2.13%) | 88.33% (−4.03%) | 98.48% (+0.62%) |
CMTFNet | 97.50% (+1.18%) | 79.82% (−6.02%) | 89.88% (−3.83%) | 87.70% (−3.39%) | 88.74% (−3.64%) | 98.75% (+0.73%) |
TFBS | 98.20% (+1.72%) | 83.20% (−3.04%) | 91.80% (−3.01%) | 89.24% (−1.27%) | 90.49% (−2.12%) | 98.95% (+0.56%) |
R-Unet | 97.49% (+0.60%) | 80.13% (−7.72%) | 88.46% (−6.00%) | 89.50% (−2.19%) | 88.93% (−4.12%) | 99.09% (+0.50%) |
RDTFNet | 98.33% (+1.38%) | 86.15% (−1.96%) | 93.25% (−1.89%) | 91.91% (−0.36%) | 92.55% (−1.13%) | 99.15% (+0.68%) |

Values in parentheses give the change relative to the corresponding in-region results above.
Model | OA | IoU | Precision | Recall | F1-Score |
---|---|---|---|---|---|
DeepLabV3 | 96.12% (−0.48%) | 64.53% (−10.08%) | 78.91% (−3.58%) | 78.05% (−10.60%) | 78.42% (−7.02%) |
SegNet | 96.80% (−0.16%) | 70.72% (−5.57%) | 83.64% (−2.68%) | 82.18% (−4.72%) | 82.58% (−3.84%) |
U-Net | 97.15% (−0.22%) | 74.52% (−4.63%) | 87.10% (−1.07%) | 82.54% (−6.06%) | 85.22% (−3.11%) |
CMTFNet | 97.23% (−0.28%) | 72.13% (−7.68%) | 88.84% (−1.04%) | 79.32% (−8.38%) | 83.73% (−5.01%) |
TFBS | 97.66% (−0.53%) | 77.75% (−5.45%) | 84.84% (−6.95%) | 87.30% (−1.94%) | 86.05% (−4.44%) |
R-Unet | 97.38% (−0.11%) | 75.63% (−4.50%) | 87.62% (−0.84%) | 82.90% (−4.72%) | 85.10% (−3.84%) |
RDTFNet | 98.32% (−0.01%) | 82.71% (−3.44%) | 92.44% (−0.81%) | 88.75% (−3.16%) | 90.53% (−2.03%) |

Values in parentheses give the change relative to the corresponding cross-region results above.
Model | OA | IoU | Precision | Recall | F1-Score |
---|---|---|---|---|---|
Base (U-Net) | 96.32% | 85.80% | 94.04% | 90.73% | 92.35% |
Base + dual branch (DE-UNet) | 96.50% | 86.38% | 94.76% | 90.73% | 92.69% |
Base + dual branch + spatial encoder (Spatial-DE-UNet) | 96.65% | 86.99% | 94.60% | 91.55% | 93.04% |
Base + dual branch + temporal encoder (Temporal-DE-UNet) | 96.69% | 87.20% | 94.20% | 92.15% | 93.16% |
Base + dual branch + spatial–temporal encoders (WithoutFFM) | 96.73% | 87.33% | 94.38% | 92.12% | 93.23% |
Base + dual branch + spatial–temporal encoders + FFM (RDTFNet) | 96.95% | 88.12% | 95.14% | 92.27% | 93.68% |
Model | OA | IoU | Precision | Recall | F1-Score |
---|---|---|---|---|---|
U-Net | 96.32% | 85.80% | 94.04% | 90.73% | 92.35% |
DE-UNet | 96.50% | 86.38% | 94.76% | 90.73% | 92.69% |
MSTransformer-UNet | 96.36% | 85.96% | 93.96% | 90.99% | 92.45% |
Restormer-UNet | 96.45% | 86.24% | 94.47% | 90.83% | 92.61% |
RDTFNet | 96.95% | 88.12% | 95.14% | 92.27% | 93.68% |
Model | Modal | OA | IoU | Precision | Recall | F1-Score |
---|---|---|---|---|---|---|
U-Net | Unimodal (OPT) | 95.28% | 67.69% | 75.21% | 87.30% | 80.70% |
U-Net | Unimodal (SAR) | 95.90% | 67.34% | 86.60% | 75.25% | 80.31% |
U-Net | Multimodal | 97.37% | 79.16% | 88.16% | 88.60% | 88.33% |
RDTFNet | Unimodal (OPT) | 95.51% | 75.07% | 80.05% | 92.42% | 85.75% |
RDTFNet | Unimodal (SAR) | 96.57% | 75.82% | 88.20% | 84.39% | 86.23% |
RDTFNet | Multimodal | 98.33% | 86.15% | 93.25% | 91.91% | 92.55% |