Attention Guide Axial Sharing Mixed Attention (AGASMA) Network for Cloud Segmentation and Cloud Shadow Segmentation
Abstract
1. Introduction
- We developed a network for cloud and cloud shadow segmentation that preserves high-resolution representations throughout the entire process. Sub-networks from high to low scales are connected in parallel, so that high-resolution features are maintained end to end and high-quality cloud detection can be achieved.
- The parallel-connected Transformer and CNN sub-models create a platform for interaction between fine and coarse features at the same level. This approach mitigates the limitations of using the Transformer and CNN models separately, enhances the acquisition of semantic and detailed information, and improves the model’s ability to accurately locate and segment clouds and cloud shadows.
- Most aggregation of high- and low-resolution features relies on concatenation and addition operations. In contrast, we use ASMA to capture the spatial dependencies between pixels in the high-resolution and low-resolution feature maps, which effectively aggregates multi-scale information and mitigates problems such as poor target localization and unclear boundaries caused by information loss. In addition, incorporating the AGM strengthens the channel modeling capability of self-attention, improving the integration of high- and low-resolution channel and spatial features. A minimal sketch of the axial attention idea used inside ASMA follows this list.
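To make the multi-scale fusion idea concrete, below is a minimal sketch of axial self-attention, i.e., self-attention factorized along the height and width axes of a feature map, applied after a naive resize-and-add fusion of a coarse and a fine feature map. The class name `AxialAttention`, the head count, and the addition-based fusion baseline are illustrative assumptions, not the authors’ released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AxialAttention(nn.Module):
    """Self-attention along one spatial axis (height or width).

    Factorizing 2D attention into two 1D passes reduces its cost from
    O((HW)^2) to O(HW(H + W)).  Illustrative sketch only.
    """
    def __init__(self, dim, heads=4, axis="h"):
        super().__init__()
        self.axis = axis
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                            # x: (B, C, H, W)
        b, c, h, w = x.shape
        if self.axis == "h":                         # sequences run along the height axis
            seq = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        else:                                        # sequences run along the width axis
            seq = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        out, _ = self.attn(seq, seq, seq)
        if self.axis == "h":
            out = out.reshape(b, w, h, c).permute(0, 3, 2, 1)
        else:
            out = out.reshape(b, h, w, c).permute(0, 3, 1, 2)
        return out + x                               # residual connection

# Fuse a coarse (low-resolution) map into a fine (high-resolution) map,
# then let axial attention re-establish long-range pixel dependencies.
high = torch.randn(1, 32, 64, 64)                    # fine-branch features
low = torch.randn(1, 32, 16, 16)                     # coarse-branch features
fused = high + F.interpolate(low, size=high.shape[2:], mode="bilinear", align_corners=False)
fused = AxialAttention(32, axis="h")(fused)
fused = AxialAttention(32, axis="w")(fused)
print(fused.shape)                                   # torch.Size([1, 32, 64, 64])
```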
2. Related Work
2.1. Cloud Detection
2.2. Remote Sensing Image Processing
2.3. Multi-Scale Fusion
2.4. Attention
3. Methodology
3.1. Network Structure
3.2. Attention Guide Axial Sharing Mixed Attention Module (AGASMA)
3.2.1. Axial Attention
3.2.2. Sharing Mixed Attention (SMA)
3.2.3. Attention Guide Module (AGM)
4. Datasets
4.1. Cloud and Cloud Shadow Dataset
4.2. SPARCS Dataset
4.3. CSWV Dataset
5. Experimental Analysis
5.1. Multi-Scale Fusion Experiment
5.2. Backbone Fusion Experiment
5.3. Comparative Study of Attention Mechanism
5.4. Network Ablation Experiments
- Effect of the CNN and Transformer cross-architecture: The results are presented in Table 6. The architecture with parallel CNN and Transformer sub-modules effectively enhanced the segmentation performance of the Backbone. By replacing certain CNN sub-modules with Transformer sub-modules, we fused fine and coarse features at the same feature level. Compared with the Backbone, this model improved the MIOU by 0.61%, 1.48%, and 0.40% on the three datasets, respectively. Figure 13 visualizes the heatmaps before and after the Swin Transformer module; the global receptive field enriches the model’s semantic information and improves target prediction accuracy.
- Effect of the Axial Sharing Mixed Attention (ASMA) module: Table 6 shows that adding the ASMA module increased the MIOU by 1.11%, 1.55%, and 1.28% on the three datasets, verifying that this multi-scale-fusion attention mechanism can effectively reconstruct the dependencies between pixels. The visual segmentation results in Figure 13 show this more intuitively. For the first image, ASMA alleviated the interference of thin cloud and residual cloud with boundary segmentation; for the second image, where strong ground noise hindered the accurate identification of clouds and cloud shadows, ASMA greatly reduced this interference. These results demonstrate that fusing multi-scale features with ASMA improves the model’s segmentation accuracy in complex backgrounds containing thin clouds, residual clouds, and strong noise.
- Effect of the Attention Guide Module (AGM): As shown in Table 6, we added CBAM and the AGM, respectively, on top of Backbone + Swin + ASMA. The AGM clearly outperformed CBAM, improving the MIOU by 0.29%, 0.55%, and 0.24% on the three datasets, respectively, which verifies the effectiveness of the AGM in our network. For the noise-interference and target-positioning problems discussed above, our model achieved the best results. The visualization in Figure 13 shows that the cloud and cloud shadow regions appear redder, indicating that the attention weights are concentrated on the target areas, while non-target areas are affected less and appear bluer, indicating that noise interference is suppressed. These results demonstrate that using the AGM to guide attention during feature fusion effectively reduces noise interference and improves the model’s target-positioning accuracy. (The metric sketch after this list shows how the MIOU and F1 values used in these comparisons can be computed.)
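Since the ablation comparisons in Table 6 are reported as F1 and MIOU, the following is a minimal sketch of how both metrics can be derived from a confusion matrix. The function names and the three-class label convention (0 = background, 1 = cloud, 2 = cloud shadow) are illustrative assumptions, not the exact evaluation code used in the paper.

```python
import numpy as np

def confusion_matrix(pred, label, num_classes=3):
    """Accumulate a num_classes x num_classes confusion matrix.

    pred, label: integer arrays of the same shape with values in [0, num_classes).
    Assumed class convention: 0 = background, 1 = cloud, 2 = cloud shadow.
    """
    mask = (label >= 0) & (label < num_classes)
    idx = num_classes * label[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def miou_and_f1(cm):
    """Mean IoU and mean F1 over classes, computed from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp              # predicted as class c but labeled otherwise
    fn = cm.sum(axis=1) - tp              # labeled as class c but predicted otherwise
    iou = tp / np.maximum(tp + fp + fn, 1)
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    return iou.mean(), f1.mean()

# Toy usage with random masks of the same shape
pred = np.random.randint(0, 3, size=(256, 256))
label = np.random.randint(0, 3, size=(256, 256))
miou, f1 = miou_and_f1(confusion_matrix(pred, label))
print(f"MIOU = {100 * miou:.2f}%, F1 = {100 * f1:.2f}%")
```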
5.5. Comparative Experiments of Different Algorithms on the Cloud and Cloud Shadow Dataset
5.6. Comparative Experiments of Different Algorithms on the SPARCS Dataset
5.7. Comparative Experiments of Different Algorithms on the CSWV Dataset
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Ceppi, P.; Nowack, P. Observational evidence that cloud feedback amplifies global warming. Proc. Natl. Acad. Sci. USA 2021, 118, e2026290118. [Google Scholar] [CrossRef]
- Chen, K.; Dai, X.; Xia, M.; Weng, L.; Hu, K.; Lin, H. MSFANet: Multi-Scale Strip Feature Attention Network for Cloud and Cloud Shadow Segmentation. Remote Sens. 2023, 15, 4853. [Google Scholar] [CrossRef]
- Carn, S.A.; Krueger, A.J.; Krotkov, N.A.; Yang, K.; Evans, K. Tracking volcanic sulfur dioxide clouds for aviation hazard mitigation. Nat. Hazards 2009, 51, 325–343. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
- Chen, K.; Xia, M.; Lin, H.; Qian, M. Multi-scale Attention Feature Aggregation Network for Cloud and Cloud Shadow Segmentation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5612216. [Google Scholar]
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5693–5703. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Ren, H.; Xia, M.; Weng, L.; Hu, K.; Lin, H. Dual-Attention-Guided Multiscale Feature Aggregation Network for Remote Sensing Image Change Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 4899–4916. [Google Scholar] [CrossRef]
- Dai, X.; Xia, M.; Weng, L.; Hu, K.; Lin, H.; Qian, M. Multi-Scale Location Attention Network for Building and Water Segmentation of Remote Sensing Image. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5609519. [Google Scholar] [CrossRef]
- Ren, W.; Wang, Z.; Xia, M.; Lin, H. MFINet: Multi-Scale Feature Interaction Network for Change Detection of High-Resolution Remote Sensing Images. Remote Sens. 2024, 16, 1269. [Google Scholar] [CrossRef]
- Yin, H.; Weng, L.; Li, Y.; Xia, M.; Hu, K.; Lin, H.; Qian, M. Attention-guided siamese networks for change detection in high resolution remote sensing images. Int. J. Appl. Earth Obs. Geoinf. 2023, 117, 103206. [Google Scholar] [CrossRef]
- Lee, Y.; Kim, J.; Willette, J.; Hwang, S.J. Mpvit: Multi-path vision transformer for dense prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 7287–7296. [Google Scholar]
- Xu, J.; Xiong, Z.; Bhattacharyya, S.P. PIDNet: A Real-Time Semantic Segmentation Network Inspired by PID Controllers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 19529–19539. [Google Scholar]
- Dai, X.; Chen, K.; Xia, M.; Weng, L.; Lin, H. LPMSNet: Location pooling multi-scale network for cloud and cloud shadow segmentation. Remote Sens. 2023, 15, 4005. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
- Huang, Z.; Wang, X.; Huang, L.; Huang, C.; Wei, Y.; Liu, W. Ccnet: Criss-cross attention for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 603–612. [Google Scholar]
- Feng, Y.; Jiang, J.; Xu, H.; Zheng, J. Change detection on remote sensing images using dual-branch multilevel intertemporal network. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4401015. [Google Scholar] [CrossRef]
- Chen, C.F.R.; Fan, Q.; Panda, R. Crossvit: Cross-attention multi-scale vision transformer for image classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 357–366. [Google Scholar]
- Manolakis, D.G.; Shaw, G.A.; Keshava, N. Comparative analysis of hyperspectral adaptive matched filter detectors. In Proceedings of the Algorithms for Multispectral, Hyperspectral, and Ultraspectral Imagery VI, Orlando, FL, USA, 23 August 2000; SPIE: Bellingham, WA, USA, 2000; Volume 4049, pp. 2–17. [Google Scholar]
- Liu, X.; Xu, J.M.; Du, B. A bi-channel dynamic threshold algorithm used in automatically identifying clouds on GMS-5 imagery. J. Appl. Meteorol. Sci. 2005, 16, 134–444. [Google Scholar]
- Ma, F.; Zhang, Q.; Guo, N.; Zhang, J. The study of cloud detection with multi-channel data of satellite. Chin. J. Atmos. Sci.-Chin. Ed. 2007, 31, 119. [Google Scholar]
- Ji, H.; Xia, M.; Zhang, D.; Lin, H. Multi-Supervised Feature Fusion Attention Network for Clouds and Shadows Detection. ISPRS Int. J. Geo-Inf. 2023, 12, 247. [Google Scholar] [CrossRef]
- Ding, L.; Xia, M.; Lin, H.; Hu, K. Multi-Level Attention Interactive Network for Cloud and Snow Detection Segmentation. Remote Sens. 2024, 16, 112. [Google Scholar] [CrossRef]
- Gkioxari, G.; Toshev, A.; Jaitly, N. Chained predictions using convolutional neural networks. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part IV 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 728–743. [Google Scholar]
- Yao, J.; Zhai, H.; Wang, G. Cloud detection of multi-feature remote sensing images based on deep learning. In Proceedings of the IOP Conference Series: Earth and Environmental Science; IOP Publishing: Bristol, UK, 2021; Volume 687, p. 012155. [Google Scholar]
- Xia, M.; Wang, T.; Zhang, Y.; Liu, J.; Xu, Y. Cloud/shadow segmentation based on global attention feature fusion residual network for remote sensing imagery. Int. J. Remote Sens. 2021, 42, 2022–2045. [Google Scholar] [CrossRef]
- Hu, Z.; Weng, L.; Xia, M.; Hu, K.; Lin, H. HyCloudX: A Multibranch Hybrid Segmentation Network With Band Fusion for Cloud/Shadow. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 6762–6778. [Google Scholar] [CrossRef]
- Mohajerani, S.; Saeedi, P. Cloud-Net: An end-to-end cloud detection algorithm for Landsat 8 imagery. In Proceedings of the IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1029–1032. [Google Scholar]
- Jiang, S.; Lin, H.; Ren, H.; Hu, Z.; Weng, L.; Xia, M. MDANet: A High-Resolution City Change Detection Network Based on Difference and Attention Mechanisms under Multi-Scale Feature Fusion. Remote Sens. 2024, 16, 1387. [Google Scholar] [CrossRef]
- Chen, Y.; Weng, Q.; Tang, L.; Liu, Q.; Fan, R. An automatic cloud detection neural network for high-resolution remote sensing imagery with cloud–snow coexistence. IEEE Geosci. Remote Sens. Lett. 2021, 19, 6004205. [Google Scholar] [CrossRef]
- Zhang, G.; Gao, X.; Yang, J.; Yang, Y.; Tan, M.; Xu, J.; Wang, Y. A multi-task driven and reconfigurable network for cloud detection in cloud-snow coexistence regions from very-high-resolution remote sensing images. Int. J. Appl. Earth Obs. Geoinf. 2022, 114, 103070. [Google Scholar] [CrossRef]
- Lu, C.; Xia, M.; Qian, M.; Chen, B. Dual-branch network for cloud and cloud shadow segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5410012. [Google Scholar] [CrossRef]
- Wu, H.; Xiao, B.; Codella, N.; Liu, M.; Dai, X.; Yuan, L.; Zhang, L. Cvt: Introducing convolutions to vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 22–31. [Google Scholar]
- Gu, G.; Weng, L.; Xia, M.; Hu, K.; Lin, H. Multi-Path Multi-Scale Attention Network for Cloud and Cloud Shadow Segmentation. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5404215. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
- Yang, J.; Matsushita, B.; Zhang, H. Improving building rooftop segmentation accuracy through the optimization of UNet basic elements and image foreground-background balance. ISPRS J. Photogramm. Remote Sens. 2023, 201, 123–137. [Google Scholar] [CrossRef]
- Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. Bisenet: Bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 325–341. [Google Scholar]
- Zhao, H.; Qi, X.; Shen, X.; Shi, J.; Jia, J. Icnet for real-time semantic segmentation on high-resolution images. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 405–420. [Google Scholar]
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar]
- Li, Y.; Weng, L.; Xia, M.; Hu, K.; Lin, H. Multi-Scale Fusion Siamese Network Based on Three-Branch Attention Mechanism for High-Resolution Remote Sensing Image Change Detection. Remote Sens. 2024, 16, 1665. [Google Scholar] [CrossRef]
- Kitaev, N.; Kaiser, Ł.; Levskaya, A. Reformer: The efficient transformer. arXiv 2020, arXiv:2001.04451. [Google Scholar]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229. [Google Scholar]
- Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 568–578. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-unet: Unet-like pure transformer for medical image segmentation. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2022; pp. 205–218. [Google Scholar]
- Lin, A.; Chen, B.; Xu, J.; Zhang, Z.; Lu, G.; Zhang, D. Ds-transunet: Dual swin transformer u-net for medical image segmentation. IEEE Trans. Instrum. Meas. 2022, 71, 4005615. [Google Scholar] [CrossRef]
- He, X.; Zhou, Y.; Zhao, J.; Zhang, D.; Yao, R.; Xue, Y. Swin transformer embedding UNet for remote sensing image semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4408715. [Google Scholar] [CrossRef]
- Wang, L.; Li, R.; Zhang, C.; Fang, S.; Duan, C.; Meng, X.; Atkinson, P.M. UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery. ISPRS J. Photogramm. Remote Sens. 2022, 190, 196–214. [Google Scholar] [CrossRef]
- Xu, F.; Wong, M.S.; Zhu, R.; Heo, J.; Shi, G. Semantic segmentation of urban building surface materials using multi-scale contextual attention network. ISPRS J. Photogramm. Remote Sens. 2023, 202, 158–168. [Google Scholar] [CrossRef]
- Chen, J.; Xia, M.; Wang, D.; Lin, H. Double Branch Parallel Network for Segmentation of Buildings and Waters in Remote Sensing Images. Remote Sens. 2023, 15, 1536. [Google Scholar] [CrossRef]
- Wang, Z.; Xia, M.; Weng, L.; Hu, K.; Lin, H. Dual Encoder–Decoder Network for Land Cover Segmentation of Remote Sensing Image. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 2372–2385. [Google Scholar] [CrossRef]
- Zhan, Z.; Ren, H.; Xia, M.; Lin, H.; Wang, X.; Li, X. AMFNet: Attention-Guided Multi-Scale Fusion Network for Bi-Temporal Change Detection in Remote Sensing Images. Remote Sens. 2024, 16, 1765. [Google Scholar] [CrossRef]
- Wang, Z.; Gu, G.; Xia, M.; Weng, L.; Hu, K. Bitemporal Attention Sharing Network for Remote Sensing Image Change Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 10368–10379. [Google Scholar] [CrossRef]
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154. [Google Scholar]
- Chai, D.; Newsam, S.; Zhang, H.K.; Qiu, Y.; Huang, J. Cloud and cloud shadow detection in Landsat imagery based on deep convolutional neural networks. Remote Sens. Environ. 2019, 225, 307–316. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Jeppesen, J.H.; Jacobsen, R.H.; Inceoglu, F.; Toftegaard, T.S. A cloud detection algorithm for satellite imagery based on deep learning. Remote Sens. Environ. 2019, 229, 247–259. [Google Scholar] [CrossRef]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Mou, L.; Hua, Y.; Zhu, X.X. Relation matters: Relational context-aware fully convolutional network for semantic segmentation of high-resolution aerial images. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7557–7569. [Google Scholar] [CrossRef]
- Hughes, M.J.; Hayes, D.J. Automated detection of cloud and cloud shadow in single-date Landsat imagery using neural networks and spatial post-processing. Remote Sens. 2014, 6, 4907–4926. [Google Scholar] [CrossRef]
- Zhang, G.; Gao, X.; Yang, Y.; Wang, M.; Ran, S. Controllably deep supervision and multi-scale feature fusion network for cloud and snow detection based on medium-and high-resolution imagery dataset. Remote Sens. 2021, 13, 4805. [Google Scholar] [CrossRef]
- Guo, J.; Han, K.; Wu, H.; Tang, Y.; Chen, X.; Wang, Y.; Xu, C. Cmt: Convolutional neural networks meet vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12175–12185. [Google Scholar]
- Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pvt v2: Improved baselines with pyramid vision transformer. Comput. Vis. Media 2022, 8, 415–424. [Google Scholar] [CrossRef]
- Gu, J.; Kwon, H.; Wang, D.; Ye, W.; Li, M.; Chen, Y.H.; Lai, L.; Chandra, V.; Pan, D.Z. Multi-scale high-resolution vision transformer for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12094–12103. [Google Scholar]
- Yuan, Y.; Chen, X.; Wang, J. Object-contextual representations for semantic segmentation. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part VI 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 173–190. [Google Scholar]
Layer | | | | Size |
---|---|---|---|---|---
Stem | L1 | L2 | L3 | Input Size | Output Size
- | - | - | - | h, 3 | h/4, 256
- | Swin. | CNN. | Swin. | h/4, 32 | h/4, 32
- | CNN. | Swin. | CNN. | h/8, 64 | h/8, 64
- | - | CNN. | Swin. | h/16, 128 | h/16, 128
- | - | - | CNN. | h/32, 256 | h/32, 256
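The configuration above interleaves Swin Transformer and CNN sub-modules across parallel resolution branches (h/4 down to h/32), with every branch keeping its own resolution from end to end. The snippet below is a hypothetical sketch of one such parallel stage with resize-and-add cross-scale fusion; the class names (`ConvBlock`, `ParallelStage`), the 1 × 1 projection convolutions, and the use of plain convolutional blocks on every branch are assumptions made only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBlock(nn.Module):
    """Residual 3x3 convolution block standing in for a CNN (or Swin) sub-module."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))

    def forward(self, x):
        return F.relu(self.body(x) + x)

class ParallelStage(nn.Module):
    """One stage of parallel resolution branches with resize-and-add fusion."""
    def __init__(self, channels):
        super().__init__()
        self.blocks = nn.ModuleList(ConvBlock(c) for c in channels)
        # 1x1 convs projecting branch j's width to branch i's width before fusion
        self.proj = nn.ModuleList(
            nn.ModuleList(nn.Conv2d(cj, ci, 1) if j != i else nn.Identity()
                          for j, cj in enumerate(channels))
            for i, ci in enumerate(channels))

    def forward(self, feats):                        # feats[i]: (B, channels[i], H_i, W_i)
        feats = [blk(f) for blk, f in zip(self.blocks, feats)]
        fused = []
        for i, fi in enumerate(feats):
            agg = fi
            for j, fj in enumerate(feats):
                if j == i:
                    continue
                fj = self.proj[i][j](fj)             # match channel width
                fj = F.interpolate(fj, size=fi.shape[2:], mode="bilinear", align_corners=False)
                agg = agg + fj                       # exchange information across scales
            fused.append(agg)
        return fused

channels = (32, 64, 128, 256)                        # branch widths as in the table
feats = [torch.randn(2, c, 64 >> i, 64 >> i) for i, c in enumerate(channels)]
out = ParallelStage(channels)(feats)
print([tuple(f.shape) for f in out])                 # every branch keeps its own resolution
```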
Satellite | Band | Wavelength (nm) | Resolution (m)
---|---|---|---
Landsat-8 | 1 (Coastal) | 430–450 | 30
Landsat-8 | 2 (Blue) | 450–515 | 30
Landsat-8 | 3 (Green) | 525–600 | 30
Landsat-8 | 4 (Red) | 630–680 | 30
Landsat-8 | 5 (NIR) | 845–885 | 30
Landsat-8 | 6 (SWIR-1) | 1560–1660 | 30
Landsat-8 | 7 (SWIR-2) | 2100–2300 | 30
Landsat-8 | 8 (PAN) | 503–676 | 30
Method | CC | SPARCS | CSWV
---|---|---|---
Ours (HRNet) | 94.12 | 80.21 | 89.98
Ours | 94.42 | 81.48 | 90.13
CNN | Transformer | MIOU (%) | |||
---|---|---|---|---|---|
BasicBlock | Bottleneck | Pvtv2 | Cvt | Swin | |
✓ | 92.04 | ||||
✓ | 91.02 | ||||
✓ | ✓ | 92.39 | |||
✓ | ✓ | 92.12 | |||
✓ | ✓ | 92.05 | |||
✓ | ✓ | 91.56 | |||
✓ | ✓ | 92.68 | |||
✓ | ✓ | 92.33 |
Method | CC | SPARCS | CSWV |
---|---|---|---|
Without | 93.06 | 79.43 | 88.86 |
Cross-attention | 93.84 | 79.88 | 89.03 |
Mixed-attention (a) | 94.26 | 80.54 | 89.67 |
Mixed-attention (b) | 94.21 | 80.65 | 89.57 |
Sharing-mixed-attention | 94.42 | 81.48 | 90.13 |
Method | CC | SPARCS | CSWV | |||
---|---|---|---|---|---|---|
F1 | MIOU | F1 | MIOU | F1 | MIOU | |
Backbone | 93.84 | 92.04 | 82.22 | 77.56 | 90.42 | 88.06 |
Backbone + Swin | 94.49 | 92.65 | 82.97 | 79.04 | 90.60 | 88.46 |
Backbone + Swin + SMA | 95.29 | 93.76 | 85.17 | 80.59 | 91.43 | 89.74 |
Backbone + Swin + AGSMA (CBAM) | 95.55 | 94.13 | 85.22 | 80.93 | 91.74 | 89.89 |
Backbone + Swin + AGSMA (Ours) | 95.59 | 94.42 | 85.48 | 81.48 | 92.00 | 90.13 |
Cloud | Cloud Shadow | Evaluation Index | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Method | P (%) | R (%) | F1 (%) | P (%) | R (%) | F1 (%) | PA (%) | MPA (%) | MIOU (%) | FLOPs (G) | Params (M)
UNet [4] | 93.28 | 94.80 | 91.68 | 92.82 | 91.56 | 88.43 | 95.24 | 94.24 | 89.15 | 23.96 | 13.40 |
CVT [34] | 94.58 | 95.34 | 92.81 | 93.84 | 91.34 | 88.68 | 95.74 | 95.03 | 90.16 | 1.56 | 74.13 |
DANet [56] | 94.59 | 95.91 | 93.35 | 93.26 | 93.38 | 90.33 | 96.11 | 95.09 | 90.95 | 37.76 | 62.01 |
CloudNet [27] | 94.32 | 96.61 | 93.89 | 92.67 | 93.85 | 90.49 | 96.19 | 94.93 | 91.07 | 2.50 | 27.58 |
CMT [64] | 93.65 | 97.28 | 94.18 | 94.71 | 92.49 | 90.16 | 96.31 | 95.36 | 91.35 | 3.07 | 26.37 |
DeepLabv3 [5] | 96.42 | 94.31 | 92.69 | 95.10 | 93.25 | 91.07 | 96.35 | 96.06 | 91.62 | 34.85 | 54.92 |
SwinUNet [47] | 95.47 | 96.18 | 94.03 | 94.72 | 93.42 | 91.05 | 96.57 | 95.88 | 91.96 | 1.42 | 41.38 |
MPvit [13] | 95.43 | 96.77 | 94.59 | 92.37 | 95.61 | 91.98 | 96.68 | 95.36 | 92.13 | 1.13 | 45.61 |
CCNet [17] | 95.36 | 96.62 | 94.40 | 93.33 | 95.32 | 92.18 | 96.73 | 95.61 | 92.29 | 42.63 | 52.28 |
PVTv2 [65] | 95.97 | 96.45 | 94.54 | 94.88 | 94.14 | 91.81 | 96.84 | 96.17 | 92.59 | 2.12 | 63.24 |
HRNet [7] | 95.44 | 97.00 | 94.80 | 93.73 | 95.46 | 92.51 | 96.90 | 95.82 | 92.60 | 17.85 | 65.85 |
PSPNet [37] | 95.99 | 96.20 | 94.30 | 95.39 | 94.02 | 91.93 | 96.87 | 96.32 | 92.66 | 33.93 | 48.94 |
HRvit [66] | 95.34 | 97.89 | 94.34 | 95.35 | 93.97 | 91.87 | 96.95 | 96.39 | 92.82 | 1.70 | 35.70 |
DBNet [33] | 95.38 | 97.19 | 94.95 | 95.42 | 94.24 | 92.15 | 96.98 | 96.26 | 92.89 | 17.84 | 95.29 |
OCRNet [67] | 97.83 | 96.84 | 94.26 | 95.60 | 94.33 | 92.45 | 97.04 | 96.50 | 93.03 | 30.93 | 70.35 |
DBPNet [52] | 95.81 | 97.15 | 95.13 | 95.14 | 94.86 | 92.61 | 97.12 | 96.35 | 93.17 | 5.93 | 82.10 |
Ours | 96.62 | 97.75 | 96.10 | 96.12 | 95.77 | 93.96 | 97.63 | 97.05 | 94.42 | 7.86 | 39.20 |
Class Pixel Accuracy | Evaluation Index | ||||||||
---|---|---|---|---|---|---|---|---|---|
Method | Cloud (%) | Shadow (%) | Water (%) | Snow (%) | Land (%) | Shadow over Water (%) | Flooded (%) | F1 (%) | MIOU (%) |
UNet | 79.73 | 58.31 | 84.29 | 89.72 | 95.90 | 12.03 | 68.33 | 65.94 | 58.36 |
DANet | 82.13 | 42.39 | 78.78 | 90.61 | 94.32 | 16.47 | 82.04 | 66.41 | 60.42 |
CMT | 87.98 | 71.25 | 87.30 | 90.90 | 94.90 | 15.24 | 80.10 | 72.25 | 66.92 |
CloudNet | 88.47 | 71.27 | 89.25 | 94.03 | 96.07 | 45.60 | 90.68 | 79.37 | 74.27 |
HRvit | 89.39 | 70.82 | 89.84 | 95.00 | 96.38 | 45.92 | 80.95 | 80.16 | 74.38 |
CVT | 88.43 | 73.76 | 91.08 | 94.91 | 96.20 | 52.85 | 89.84 | 80.92 | 76.03 |
PVT | 90.46 | 76.95 | 90.02 | 94.18 | 95.45 | 58.50 | 85.78 | 80.81 | 76.12 |
Swin-Unet | 89.82 | 74.70 | 92.08 | 94.81 | 96.63 | 50.43 | 91.50 | 81.68 | 77.00 |
CCNet | 88.21 | 75.32 | 93.07 | 95.64 | 97.04 | 48.36 | 91.04 | 81.70 | 77.04 |
HRNet | 92.13 | 77.81 | 90.06 | 95.60 | 95.83 | 55.55 | 91.69 | 82.09 | 77.71 |
PSPNet | 91.39 | 74.11 | 87.25 | 95.24 | 96.85 | 62.32 | 93.00 | 82.20 | 77.80 |
MPvit | 90.83 | 76.67 | 91.29 | 95.20 | 96.37 | 55.59 | 94.37 | 82.60 | 78.23 |
DBNet | 91.25 | 76.76 | 92.91 | 95.47 | 97.13 | 52.01 | 88.68 | 83.05 | 78.31 |
OCRNet | 91.03 | 76.97 | 92.06 | 94.35 | 97.13 | 56.65 | 87.72 | 83.03 | 78.31 |
DeepLabV3 | 91.13 | 81.35 | 93.18 | 94.93 | 95.82 | 65.86 | 91.63 | 82.93 | 79.07 |
DBPNet | 90.13 | 78.73 | 92.68 | 95.50 | 97.04 | 63.50 | 93.95 | 83.87 | 79.82 |
Ours | 92.45 | 81.59 | 91.47 | 96.48 | 97.34 | 63.71 | 92.99 | 85.52 | 81.48 |
Method | PA (%) | MPA (%) | R (%) | F1 (%) | MIOU (%) |
---|---|---|---|---|---|
DANet | 93.74 | 86.43 | 87.81 | 81.45 | 76.67 |
CMT | 94.23 | 89.32 | 88.61 | 83.47 | 79.46 |
PVT | 94.44 | 89.75 | 88.74 | 84.62 | 80.98 |
CloudNet | 94.71 | 89.11 | 90.27 | 85.38 | 81.31 |
CVT | 94.76 | 91.45 | 89.21 | 85.44 | 82.22 |
CCNet | 95.37 | 91.72 | 91.21 | 87.38 | 84.11 |
MPvit | 95.65 | 92.28 | 92.08 | 88.47 | 85.34 |
HRvit | 95.54 | 92.65 | 90.19 | 88.63 | 85.75 |
Swin-Unet | 95.64 | 92.53 | 92.20 | 88.86 | 85.90 |
UNet | 95.52 | 92.16 | 92.91 | 89.27 | 86.13 |
PSPNet | 95.80 | 92.44 | 92.82 | 89.50 | 86.46 |
HRNet | 95.86 | 93.77 | 92.31 | 89.63 | 87.15 |
DBNet | 96.13 | 93.45 | 93.08 | 90.03 | 87.34 |
DeepLabv3 | 96.01 | 92.21 | 94.13 | 90.53 | 87.42 |
DBPNet | 96.18 | 93.51 | 93.59 | 90.66 | 88.01 |
OCRNet | 96.42 | 94.13 | 93.83 | 91.22 | 88.80 |
Ours | 96.82 | 94.22 | 95.33 | 92.61 | 90.13 |