Dual-Branch Transformer–CNN Fusion for Enhanced Cloud Segmentation in Remote Sensing Imagery
Abstract
1. Introduction
- (1) A novel cloud detection approach is introduced that combines Transformer and CNN branches to capture both local details and global context. The framework improves the precision of cloud region segmentation by aggregating features from both models (an illustrative sketch of this dual-branch pattern follows this list).
- (2) The proposed feature aggregation module (FAM) efficiently merges multi-level outputs from both pathways, integrating spatial and contextual information. This design overcomes the limitations of using a CNN or a Transformer alone, retaining more of the details relevant for accurate cloud delineation.
- (3) A large-scale, high-resolution cloud detection dataset, named CHLandsat-8, was assembled from 64 full scenes of Landsat-8 satellite imagery covering various regions of China from January to December 2021. Each image carries manually labeled pixel-level cloud annotations. The dataset will be made publicly available, serving as a benchmark for the development and testing of cloud recognition techniques and supporting ongoing research in the field.
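The full architecture is specified in Section 3; purely to illustrate the dual-branch pattern described in contributions (1) and (2), the sketch below pairs a truncated ResNet-34 with a small Transformer encoder and fuses them through a hypothetical FeatureAggregation block. All module names, channel widths, and the concatenate-then-convolve fusion rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a dual-branch (CNN + Transformer) cloud segmentation network.
# Names, channel widths, and the fusion rule are assumptions for illustration only.
import torch
import torch.nn as nn
import torchvision.models as models


class TransformerBranch(nn.Module):
    """Stand-in global-context branch: patch embedding + Transformer encoder."""
    def __init__(self, in_ch=3, dim=256, depth=2, patch=16):
        super().__init__()
        self.patch_embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        tokens = self.patch_embed(x)                     # B x dim x H/16 x W/16
        b, c, h, w = tokens.shape
        seq = tokens.flatten(2).transpose(1, 2)          # B x (h*w) x dim
        seq = self.encoder(seq)                          # global self-attention
        return seq.transpose(1, 2).reshape(b, c, h, w)   # back to a feature map


class FeatureAggregation(nn.Module):
    """Hypothetical FAM: align channels, concatenate, fuse with a 3x3 conv."""
    def __init__(self, cnn_ch, trans_ch, out_ch):
        super().__init__()
        self.proj = nn.Conv2d(trans_ch, cnn_ch, kernel_size=1)
        self.fuse = nn.Sequential(
            nn.Conv2d(cnn_ch * 2, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, f_cnn, f_trans):
        f_trans = nn.functional.interpolate(
            self.proj(f_trans), size=f_cnn.shape[-2:],
            mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([f_cnn, f_trans], dim=1))


class DualBranchNet(nn.Module):
    """Illustrative dual-branch model (not the paper's TransCNet)."""
    def __init__(self):
        super().__init__()
        resnet = models.resnet34(weights=None)
        # Keep the stem through layer3: stride-16 features with 256 channels.
        self.cnn = nn.Sequential(*list(resnet.children())[:-3])
        self.transformer = TransformerBranch(dim=256)
        self.fam = FeatureAggregation(cnn_ch=256, trans_ch=256, out_ch=64)
        self.head = nn.Conv2d(64, 1, kernel_size=1)      # cloud / non-cloud logit

    def forward(self, x):
        fused = self.fam(self.cnn(x), self.transformer(x))
        logits = self.head(fused)
        return nn.functional.interpolate(
            logits, size=x.shape[-2:], mode="bilinear", align_corners=False)


if __name__ == "__main__":
    net = DualBranchNet()
    print(net(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1, 224, 224])
```

A real implementation would fuse features at several encoder stages and decode them progressively; the sketch collapses everything to a single stage so the data flow between the two branches and the fusion block stays visible.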
2. Related Works
2.1. Cloud Detection in Remote Sensing Images
2.2. Vision Transformer
3. Method
3.1. Problem Formulation
3.2. Overall Framework
3.3. Feature Aggregation Module
4. Experimental Results and Analysis
4.1. Experimental Setup
4.2. Dataset
4.3. Evaluation Indicators
4.4. Comparative Experiments
4.4.1. Quantitative Experiments
4.4.2. Qualitative Experiments
4.5. Ablation Experiments
4.6. Interpretive Experiments
5. Discussion and Analysis
5.1. Computational Complexity Analysis
5.2. Limitations
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Hagolle, O.; Huc, M.; Pascual, D.V.; Dedieu, G. A multi-temporal method for cloud detection, applied to FORMOSAT-2, VENµS, LANDSAT and SENTINEL-2 images. Remote Sens. Environ. 2010, 114, 1747–1755.
- Zhu, Z.; Wang, S.; Woodcock, C.E. Improvement and expansion of the Fmask algorithm: Cloud, cloud shadow, and snow detection for Landsats 4–7, 8, and Sentinel 2 images. Remote Sens. Environ. 2015, 159, 269–277.
- Stowe, L.L.; Davis, P.A.; McClain, E.P. Scientific basis and initial evaluation of the CLAVR-1 global clear/cloud classification algorithm for the Advanced Very High Resolution Radiometer. J. Atmos. Ocean Technol. 1999, 16, 656–681.
- Irish, R.R.; Barker, J.L.; Goward, S.N.; Arvidson, T. Characterization of the Landsat-7 ETM+ automated cloud-cover assessment (ACCA) algorithm. Photogramm. Eng. Remote Sens. 2006, 72, 1179–1188.
- Mohajerani, S.; Saeedi, P. Cloud-Net: An end-to-end cloud detection algorithm for Landsat 8 imagery. In Proceedings of the IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019.
- Mohajerani, S.; Saeedi, P. Cloud and cloud shadow segmentation for remote sensing imagery via filtered Jaccard loss function and parametric augmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 4254–4266.
- Hou, Q.; Cheng, M.M.; Hu, X.; Borji, A.; Tu, Z.; Torr, P.H. Deeply supervised salient object detection with short connections. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
- Yang, J.; Guo, J.; Yue, H.; Liu, Z.; Hu, H.; Li, K. CDnet: CNN-based cloud detection for remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6195–6211.
- Guo, J.; Yang, J.; Yue, H.; Tan, H.; Hou, C.; Li, K. CDnetV2: CNN-based cloud detection for remote sensing imagery with cloud-snow coexistence. IEEE Trans. Geosci. Remote Sens. 2020, 59, 700–713.
- Lu, C.; Xia, M.; Qian, M.; Chen, B. Dual-branch network for cloud and cloud shadow segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–12.
- Zhang, Z.; Xu, Z.; Liu, C.A.; Tian, Q.; Wang, Y. Cloudformer: Supplementary aggregation feature and mask-classification network for cloud detection. Appl. Sci. 2022, 12, 3221.
- Wang, Y.; Xu, Z.; Wang, X.; Shen, C.; Cheng, B.; Shen, H.; Xia, H. End-to-end video instance segmentation with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Hu, K.; Zhang, D.; Xia, M. CDUNet: Cloud detection UNet for remote sensing imagery. Remote Sens. 2021, 13, 4533.
- Garofalo, S.P.; Ardito, F.; Sanitate, N.; De Carolis, G.; Ruggieri, S.; Giannico, V.; Rana, G.; Ferrara, R.M. Robustness of actual evapotranspiration predicted by random forest model integrating remote sensing and meteorological information: Case of watermelon (Citrullus lanatus, (Thunb.) Matsum. & Nakai, 1916). Water 2025, 17, 323.
- Zhang, H.K.; Qiu, S.; Suh, J.W.; Luo, D.; Zhu, Z. Machine learning and deep learning in remote sensing data analysis. Ref. Modul. Earth Syst. Environ. Sci. 2024.
- Pu, W.; Wang, Z.; Liu, D.; Zhang, Q. Optical remote sensing image cloud detection with self-attention and spatial pyramid pooling fusion. Remote Sens. 2022, 14, 4312.
- Lu, C.; Xia, M.; Lin, H. Multi-scale strip pooling feature aggregation network for cloud and cloud shadow segmentation. Neural Comput. Appl. 2022, 34, 6149–6162.
- Xia, M.; Wang, T.; Zhang, Y.; Liu, J.; Xu, Y. Cloud/shadow segmentation based on global attention feature fusion residual network for remote sensing imagery. Int. J. Remote Sens. 2021, 42, 2022–2045.
- Guo, H.; Bai, H.; Qin, W. ClouDet: A dilated separable CNN-based cloud detection framework for remote sensing imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 9743–9755.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Houlsby, N. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Fei-Fei, L. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
- Liu, N.; Zhang, N.; Wan, K.; Shao, L.; Han, J. Visual saliency transformer. In Proceedings of the 18th IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021.
- Qiu, Y.; Liu, Y.; Zhang, L.; Xu, J. Boosting salient object detection with Transformer-based asymmetric bilateral U-Net. arXiv 2021, arXiv:2108.07851.
- Zhang, Y.; Liu, H.; Hu, Q. TransFuse: Fusing transformers and CNNs for medical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Strasbourg, France, 27 September–1 October 2021.
- Du, X.; Wu, H. Cloud-Graph: A feature interaction graph convolutional network for remote sensing image cloud detection. J. Intell. Fuzzy Syst. 2023, 45, 9123–9139.
- Du, X.; Wu, H. Gated aggregation network for cloud detection in remote sensing image. Vis. Comput. 2024, 40, 2517–2536.
- Samplawski, C.; Marlin, B.M. Towards Transformer-based real-time object detection at the edge: A benchmarking study. In Proceedings of the MILCOM 2021-2021 IEEE Military Communications Conference, San Diego, CA, USA, 29 November–2 December 2021.
- Sun, Z.; Cao, S.; Yang, Y.; Kitani, K.M. Rethinking transformer-based set prediction for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021.
- Wang, W.; Lai, Q.; Fu, H.; Shen, J.; Ling, H.; Yang, R. Salient object detection in the deep learning era: An in-depth survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3239–3259.
- Dang, L.M.; Wang, H.; Li, Y.; Nguyen, T.N.; Moon, H. DefectTR: End-to-end defect detection for sewage networks using a transformer. Constr. Build. Mater. 2022, 325, 126584.
- Botach, A.; Zheltonozhskii, E.; Baskin, C. End-to-end referring video object segmentation with multimodal transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018.
- De Boer, P.T.; Kroese, D.P.; Mannor, S.; Rubinstein, R.Y. A tutorial on the cross-entropy method. Ann. Oper. Res. 2005, 134, 19–67.
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019.
- Achanta, R.; Hemami, S.; Estrada, F.; Susstrunk, S. Frequency-tuned salient region detection. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009.
- Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Shao, L. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021.
- Hughes, M.J.; Hayes, D.J. Automated detection of cloud and cloud shadow in single-date Landsat imagery using neural networks and spatial post-processing. Remote Sens. 2014, 6, 4907–4926.
- Fu, K.; Fan, D.P.; Ji, G.P.; Zhao, Q.; Shen, J.; Zhu, C. Siamese network for RGB-D salient object detection and beyond. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 5541–5559.
- Margolin, R.; Zelnik-Manor, L.; Tal, A. How to evaluate foreground maps? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014.
- Fan, D.P.; Cheng, M.M.; Liu, Y.; Li, T.; Borji, A. Structure-measure: A new way to evaluate foreground maps. In Proceedings of the 16th IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
- Fan, D.P.; Gong, C.; Cao, Y.; Ren, B.; Cheng, M.M.; Borji, A. Enhanced-alignment measure for binary foreground map evaluation. arXiv 2018, arXiv:1805.10421.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Amirul Islam, M.; Rochan, M.; Bruce, N.D.; Wang, Y. Gated feedback refinement network for dense image labeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
- Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-attention mask transformer for universal image segmentation. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 1280–1289.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: New York, NY, USA, 2021; Volume 34, pp. 12077–12090.
- Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022.
| Dataset | Scenes | Images | Train/Test |
|---|---|---|---|
| CHLandsat-8-TR | 44 | 22,616 | Train |
| CHLandsat-8-TE | 20 | 10,080 | Test |
| 38-Cloud-Test | 20 | 10,906 | Test |
| SPARCS | 80 | 720 | Test |
Quantitative comparison on CHLandsat-8-TE (20 scenes); lower is better for MAE, higher is better for all other metrics.

| Model | MAE | MaxFm | AvgFm | WFm | Sm | Em |
|---|---|---|---|---|---|---|
| FCN8s | 0.106 | 0.874 | 0.858 | 0.784 | 0.742 | 0.827 |
| UNet | 0.113 | 0.862 | 0.826 | 0.745 | 0.729 | 0.789 |
| PSPNet | 0.097 | 0.879 | 0.869 | 0.799 | 0.767 | 0.858 |
| SEGNet | 0.102 | 0.874 | 0.850 | 0.778 | 0.754 | 0.824 |
| GFRNet | 0.123 | 0.852 | 0.827 | 0.736 | 0.716 | 0.779 |
| Cloud-Net | 0.101 | 0.875 | 0.846 | 0.764 | 0.736 | 0.795 |
| ClouDet | 0.095 | 0.884 | 0.870 | 0.796 | 0.764 | 0.828 |
| CDNet | 0.129 | 0.848 | 0.814 | 0.722 | 0.709 | 0.763 |
| CDNetV2 | 0.125 | 0.842 | 0.823 | 0.735 | 0.714 | 0.790 |
| ResNet-34 | 0.133 | 0.841 | 0.823 | 0.716 | 0.691 | 0.763 |
| PVT | 0.144 | 0.850 | 0.818 | 0.706 | 0.753 | 0.824 |
| Mask2former | 0.102 | 0.877 | 0.839 | 0.765 | 0.737 | 0.797 |
| Segformer | 0.109 | 0.867 | 0.831 | 0.752 | 0.733 | 0.791 |
| Cloudformer | 0.112 | 0.856 | 0.828 | 0.742 | 0.766 | 0.781 |
| TransCNet | 0.082 | 0.893 | 0.874 | 0.815 | 0.850 | 0.844 |
Quantitative comparison on 38-Cloud-Test (20 scenes):

| Model | MAE | MaxFm | AvgFm | WFm | Sm | Em |
|---|---|---|---|---|---|---|
| FCN8s | 0.069 | 0.857 | 0.833 | 0.768 | 0.768 | 0.817 |
| UNet | 0.064 | 0.879 | 0.862 | 0.797 | 0.785 | 0.856 |
| PSPNet | 0.065 | 0.831 | 0.823 | 0.759 | 0.777 | 0.889 |
| SEGNet | 0.056 | 0.857 | 0.850 | 0.800 | 0.806 | 0.912 |
| GFRNet | 0.079 | 0.843 | 0.824 | 0.751 | 0.754 | 0.817 |
| Cloud-Net | 0.055 | 0.890 | 0.878 | 0.761 | 0.798 | 0.870 |
| ClouDet | 0.052 | 0.896 | 0.882 | 0.819 | 0.824 | 0.899 |
| CDNet | 0.106 | 0.835 | 0.823 | 0.738 | 0.727 | 0.793 |
| CDNetV2 | 0.108 | 0.817 | 0.805 | 0.718 | 0.721 | 0.781 |
| ResNet-34 | 0.090 | 0.840 | 0.797 | 0.704 | 0.719 | 0.770 |
| PVT | 0.101 | 0.851 | 0.697 | 0.647 | 0.737 | 0.735 |
| Mask2former | 0.078 | 0.846 | 0.841 | 0.754 | 0.761 | 0.845 |
| Segformer | 0.080 | 0.842 | 0.831 | 0.751 | 0.746 | 0.821 |
| Cloudformer | 0.082 | 0.841 | 0.835 | 0.752 | 0.742 | 0.831 |
| TransCNet | 0.045 | 0.867 | 0.866 | 0.814 | 0.859 | 0.892 |
Quantitative comparison on SPARCS (80 scenes):

| Model | MAE | MaxFm | AvgFm | WFm | Sm | Em |
|---|---|---|---|---|---|---|
| FCN8s | 0.143 | 0.464 | 0.386 | 0.307 | 0.517 | 0.456 |
| UNet | 0.131 | 0.527 | 0.451 | 0.365 | 0.542 | 0.506 |
| PSPNet | 0.126 | 0.543 | 0.480 | 0.376 | 0.541 | 0.550 |
| SEGNet | 0.110 | 0.631 | 0.555 | 0.470 | 0.592 | 0.595 |
| GFRNet | 0.131 | 0.516 | 0.444 | 0.363 | 0.547 | 0.512 |
| Cloud-Net | 0.121 | 0.547 | 0.462 | 0.380 | 0.553 | 0.517 |
| ClouDet | 0.105 | 0.566 | 0.502 | 0.452 | 0.581 | 0.554 |
| CDNet | 0.116 | 0.616 | 0.546 | 0.459 | 0.592 | 0.595 |
| CDNetV2 | 0.122 | 0.587 | 0.514 | 0.425 | 0.570 | 0.560 |
| ResNet-34 | 0.148 | 0.403 | 0.364 | 0.301 | 0.490 | 0.442 |
| PVT | 0.180 | 0.512 | 0.442 | 0.355 | 0.479 | 0.495 |
| Mask2former | 0.115 | 0.618 | 0.557 | 0.466 | 0.595 | 0.602 |
| Segformer | 0.122 | 0.545 | 0.471 | 0.378 | 0.562 | 0.546 |
| Cloudformer | 0.112 | 0.624 | 0.563 | 0.475 | 0.602 | 0.611 |
| TransCNet | 0.105 | 0.645 | 0.582 | 0.490 | 0.606 | 0.627 |
The evaluation indicators are defined from the following confusion-matrix entries (rows: prediction; columns: ground truth):

| | Cloud Region | Non-Cloud Region |
|---|---|---|
| Predicted: cloud | True Positive (TP) | False Positive (FP) |
| Predicted: non-cloud | False Negative (FN) | True Negative (TN) |
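The comparisons above report MAE together with the maximum, average, and weighted F-measures (MaxFm, AvgFm, WFm; Achanta et al.; Margolin et al.), the structure measure (Sm; Fan et al. 2017), and the enhanced-alignment measure (Em; Fan et al. 2018). As a minimal sketch of how MAE and the thresholded F-measures derive from the confusion-matrix entries above, assuming the common beta^2 = 0.3 weighting and a 255-step threshold sweep (both conventions from the cited saliency literature, not necessarily the authors' exact protocol):

```python
import numpy as np


def mae(pred, gt):
    """Mean absolute error between a [0,1] prediction map and a binary mask."""
    return np.abs(pred.astype(np.float64) - gt.astype(np.float64)).mean()


def f_measure(pred, gt, threshold=0.5, beta2=0.3):
    """F-measure at one threshold from TP/FP/FN counts.

    beta^2 = 0.3 is the common saliency/cloud-detection convention
    (an assumption here, taken from Achanta et al.).
    """
    gt = gt.astype(bool)
    binary = pred >= threshold
    tp = np.logical_and(binary, gt).sum()
    fp = np.logical_and(binary, ~gt).sum()
    fn = np.logical_and(~binary, gt).sum()
    precision = tp / (tp + fp + 1e-8)
    recall = tp / (tp + fn + 1e-8)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)


def max_and_avg_f(pred, gt, steps=255):
    """MaxFm / AvgFm: sweep thresholds, take the maximum and the mean."""
    scores = [f_measure(pred, gt, t / steps) for t in range(1, steps)]
    return max(scores), sum(scores) / len(scores)


# Toy usage: a random prediction map against a random ground-truth mask.
pred = np.random.rand(256, 256)
gt = np.random.rand(256, 256) > 0.5
print(mae(pred, gt), *max_and_avg_f(pred, gt))
```

WFm, Sm, and Em involve spatially weighted errors and structural or alignment terms that do not reduce to per-pixel counts; the cited papers give their full definitions.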
Ablation study; each dataset block reports MAE, MaxFm, and Sm on CHLandsat-8-TE (20), 38-Cloud-Test (20), and SPARCS (80), left to right.

| Backbone | FAM | CA | MAE | MaxFm | Sm | MAE | MaxFm | Sm | MAE | MaxFm | Sm |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ResNet-34 | ✗ | ✗ | 0.133 | 0.841 | 0.691 | 0.090 | 0.840 | 0.719 | 0.148 | 0.403 | 0.490 |
| PVT-Tiny | ✗ | ✗ | 0.144 | 0.850 | 0.753 | 0.101 | 0.851 | 0.737 | 0.180 | 0.512 | 0.479 |
| ResNet-34 | ✗ | ✓ | 0.128 | 0.840 | 0.698 | 0.082 | 0.851 | 0.708 | 0.149 | 0.408 | 0.451 |
| PVT-Tiny | ✗ | ✓ | 0.137 | 0.855 | 0.747 | 0.096 | 0.848 | 0.743 | 0.174 | 0.522 | 0.483 |
| ResNet-34 + PVT-Tiny | ✗ | ✗ | 0.106 | 0.859 | 0.794 | 0.068 | 0.827 | 0.785 | 0.136 | 0.602 | 0.571 |
| ResNet-34 + PVT-Tiny | ✓ | ✗ | 0.088 | 0.886 | 0.853 | 0.051 | 0.869 | 0.828 | 0.108 | 0.697 | 0.637 |
| ResNet-34 + PVT-Tiny | ✓ | ✓ | 0.082 | 0.893 | 0.850 | 0.045 | 0.867 | 0.859 | 0.098 | 0.728 | 0.662 |
| ConvNeXt-Tiny + Swin-T | ✗ | ✗ | 0.097 | 0.871 | 0.813 | 0.061 | 0.832 | 0.819 | 0.125 | 0.647 | 0.608 |
| ConvNeXt-Tiny + Swin-T | ✓ | ✗ | 0.081 | 0.894 | 0.859 | 0.047 | 0.871 | 0.856 | 0.097 | 0.731 | 0.669 |
| ConvNeXt-Tiny + Swin-T | ✓ | ✓ | 0.076 | 0.902 | 0.854 | 0.042 | 0.869 | 0.887 | 0.090 | 0.758 | 0.693 |
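This excerpt does not expand the ablation's "CA" component; given the cited squeeze-and-excitation work (Hu et al.), it plausibly denotes an SE-style channel-attention gate applied to the fused features. A minimal sketch under that assumption (the reduction ratio of 16 is the SE default, not a value from the paper):

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """SE-style channel attention: global pooling -> bottleneck MLP -> sigmoid gate.

    Illustrative only; the paper's CA block may differ in detail.
    """
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: B x C x 1 x 1
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.mlp(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights                             # excitation: reweight channels


# Usage: gate a 256-channel fused feature map.
feat = torch.randn(2, 256, 14, 14)
print(ChannelAttention(256)(feat).shape)  # torch.Size([2, 256, 14, 14])
```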
| Model | Params (M) | FLOPs (G, 224 × 224 input) | Running Time (s, 1k × 1k input) |
|---|---|---|---|
| UNet | 8.6 | 25.2 | 1.09 |
| PSPNet | 46.6 | 19.3 | 1.05 |
| SEGNet | 29.7 | 90.2 | 1.28 |
| CDNetV1 | 64.8 | 48.5 | 1.26 |
| CDNetV2 | 65.9 | 31.5 | 1.31 |
| ConvNeXt-Tiny + Swin-T | 31.6 | 33.8 | 1.38 |
| TransCNet | 23.0 | 24.6 | 1.29 |
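Parameter counts such as those in the table can be reproduced by summing tensor element counts, and running time by timing a forward pass; FLOPs need a dedicated profiler and are omitted here. A minimal sketch, applicable to any PyTorch module such as the DualBranchNet illustration after Section 1's contribution list (the table's figures come from the authors' models and hardware, so absolute numbers will differ):

```python
import time
import torch


def count_params_m(model):
    """Trainable parameters in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6


@torch.no_grad()
def time_inference(model, size=1024, runs=10):
    """Average forward-pass time in seconds on a size x size RGB input."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(1, 3, size, size, device=device)
    model(x)                                   # warm-up pass
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()               # wait for queued GPU kernels
    return (time.perf_counter() - start) / runs


# Usage (with any nn.Module, e.g. the DualBranchNet sketch above):
# net = DualBranchNet()
# print(count_params_m(net), time_inference(net))
```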