An Efficient Semantic Segmentation Method for Remote-Sensing Imagery Using Improved Coordinate Attention
Abstract
1. Introduction
The main contributions of this work are summarized as follows:
- (1) An encoder–decoder framework based on TransUNet is introduced, designed specifically for the semantic segmentation of remote-sensing images. The framework leverages both fine-grained detail and global context to enrich feature representation, and the Content-Aware ReAssembly of FEatures (CARAFE++) operator is employed to upsample feature maps while preserving important details during decoding.
- (2) To improve performance, an efficient improved coordinate attention module is incorporated. It applies four pooling operations to suppress background information and accentuate small objects, uses the h-swish activation function to strengthen the model's nonlinear fitting capability, and includes a weight-generation submodule that helps the network localize objects of interest precisely.
- (3) The Transformer module is improved to reduce the time complexity of attention computation: the attention matrix is sparsified, and a row–column attention (RCA) mechanism replaces multi-head attention while supplementing contextual information. In addition, the layer normalization (LN) and multi-layer perceptron (MLP) layers are replaced with an asymmetric convolutional block (ACB) and a Leaky ReLU activation layer.
2. Methods
2.1. Datasets
2.2. Methodology
2.2.1. Network Architecture
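As context for the subsections that follow, the sketch below shows the overall shape of a TransUNet-style encoder–decoder in PyTorch: a convolutional encoder extracts multi-scale features, a transformer stage models global context on the deepest feature map, and the decoder fuses skip connections while restoring resolution. This is a minimal stand-in, not the authors' network: the class names and channel widths are illustrative, a stock `nn.TransformerEncoderLayer` takes the place of the improved transformer of Section 2.2.3, and plain bilinear upsampling takes the place of CARAFE++ (Section 2.2.4).

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Conv-BN-ReLU block used as a stand-in encoder/decoder stage."""
    def __init__(self, cin: int, cout: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(cin, cout, 3, padding=1),
            nn.BatchNorm2d(cout),
            nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.body(x)

class TransUNetSkeleton(nn.Module):
    """Illustrative TransUNet-style skeleton (not the paper's network)."""
    def __init__(self, in_ch: int = 3, num_classes: int = 6,
                 dims: tuple = (64, 128, 256)):
        super().__init__()
        self.enc1 = ConvBlock(in_ch, dims[0])
        self.enc2 = ConvBlock(dims[0], dims[1])
        self.enc3 = ConvBlock(dims[1], dims[2])
        self.pool = nn.MaxPool2d(2)
        # Global-context stage; stand-in for the improved transformer.
        self.transformer = nn.TransformerEncoderLayer(
            d_model=dims[2], nhead=4, batch_first=True)
        # Stand-in for CARAFE++ upsampling in the decoder.
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.dec2 = ConvBlock(dims[2] + dims[1], dims[1])
        self.dec1 = ConvBlock(dims[1] + dims[0], dims[0])
        self.head = nn.Conv2d(dims[0], num_classes, 1)

    def forward(self, x):
        f1 = self.enc1(x)                       # full resolution
        f2 = self.enc2(self.pool(f1))           # 1/2 resolution
        f3 = self.enc3(self.pool(f2))           # 1/4 resolution
        b, c, h, w = f3.shape
        # Flatten the deepest map into tokens for global attention.
        t = self.transformer(f3.flatten(2).transpose(1, 2))
        f3 = t.transpose(1, 2).reshape(b, c, h, w)
        # Decode with skip connections from the encoder.
        d2 = self.dec2(torch.cat([self.up(f3), f2], dim=1))
        d1 = self.dec1(torch.cat([self.up(d2), f1], dim=1))
        return self.head(d1)                    # per-pixel class logits
```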
2.2.2. Improved Coordinate Attention Module
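A minimal PyTorch sketch of one plausible reading of this module follows. It assumes the four pooling enhancements are average and max pooling along each spatial axis, applies the h-swish nonlinearity (`nn.Hardswish`), and uses a pair of 1×1 convolutions as the weight-generation submodule; the class name, reduction ratio, and exact pooling combination are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImprovedCoordinateAttention(nn.Module):
    """Coordinate-attention-style block with four pooling paths.

    Assumption: the "four pooling enhancements" are average and max
    pooling along each spatial axis (H and W); the paper's exact
    design may differ.
    """
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()  # h-swish: x * ReLU6(x + 3) / 6
        # Weight-generation submodule: one 1x1 conv per spatial axis.
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Four pooling paths: avg + max along W (keep H) and along H (keep W).
        x_h = (F.adaptive_avg_pool2d(x, (h, 1))
               + F.adaptive_max_pool2d(x, (h, 1)))          # B, C, H, 1
        x_w = (F.adaptive_avg_pool2d(x, (1, w))
               + F.adaptive_max_pool2d(x, (1, w)))          # B, C, 1, W
        # Share one transform by stacking both axes along the spatial dim.
        y = torch.cat([x_h, x_w.permute(0, 1, 3, 2)], dim=2)  # B, C, H+W, 1
        y = self.act(self.bn1(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                       # B, C, H, 1
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))   # B, C, 1, W
        # Directional weight maps re-scale the input feature map.
        return x * a_h * a_w
```

Multiplying the input by the two directional weight maps lets the block encode position along one axis while attending along the other, which is what helps suppress background responses and localize small objects.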
2.2.3. Improved Vision Transformer Module
- (1) Classical Transformer model
- (2) Improved Vision Transformer module
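As a rough illustration of the row–column idea, the sketch below restricts self-attention first to pixels in the same row and then to pixels in the same column, so the cost falls from O((HW)²) to O(HW(H + W)). It is single-head and channel-last for clarity, shares one QKV projection across both passes, and omits the sparsification step and the ACB/Leaky ReLU replacement for the LN and MLP layers; these simplifications are assumptions of the sketch rather than the paper's design.

```python
import torch
import torch.nn as nn

class RowColumnAttention(nn.Module):
    """Row-then-column self-attention sketch (single head).

    Attention is restricted to pixels sharing a row, then to pixels
    sharing a column, instead of attending over all H*W positions.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def _attend(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch-like, length, dim); attend over the length axis.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, h, w, c = x.shape                       # channel-last feature map
        # Row pass: each of the B*H rows attends over its W positions.
        rows = self._attend(x.reshape(b * h, w, c)).reshape(b, h, w, c)
        # Column pass: each of the B*W columns attends over its H positions.
        cols = rows.permute(0, 2, 1, 3).reshape(b * w, h, c)
        cols = self._attend(cols).reshape(b, w, h, c).permute(0, 2, 1, 3)
        return self.proj(cols)
```

Chaining the two passes still lets every position receive information from every other position (through its row and that row's columns), which is how the mechanism supplements contextual information at far lower cost than full multi-head attention.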
2.2.4. CARAFE++ Upsampling Module
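The following minimal PyTorch sketch captures the core of content-aware reassembly as described for CARAFE++ [Wang et al.]: a lightweight branch predicts a normalized k_up × k_up reassembly kernel for every upsampled location, and each output value is a weighted sum over the corresponding input neighborhood. The hyperparameters (`c_mid`, `k_enc`, `k_up`) and the module name are illustrative, and the unified downsampling variant of CARAFE++ is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CARAFEUpsample(nn.Module):
    """Content-aware reassembly upsampler in the spirit of CARAFE++."""
    def __init__(self, channels: int, scale: int = 2,
                 c_mid: int = 64, k_enc: int = 3, k_up: int = 5):
        super().__init__()
        self.scale, self.k_up = scale, k_up
        # Kernel-prediction branch: compress, then encode one
        # k_up*k_up kernel per output location (scale^2 per input pixel).
        self.compress = nn.Conv2d(channels, c_mid, 1)
        self.encode = nn.Conv2d(c_mid, scale ** 2 * k_up ** 2,
                                k_enc, padding=k_enc // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # 1) Predict reassembly kernels, one per upsampled location.
        kernels = self.encode(self.compress(x))           # B, s^2*k^2, H, W
        kernels = F.pixel_shuffle(kernels, self.scale)    # B, k^2, sH, sW
        kernels = F.softmax(kernels, dim=1)               # normalize kernels
        # 2) Gather the k_up x k_up neighborhood of every input pixel.
        patches = F.unfold(x, self.k_up, padding=self.k_up // 2)
        patches = patches.view(b, c, self.k_up ** 2, h, w)
        # Map each output location to its source pixel's neighborhood.
        patches = F.interpolate(
            patches.view(b, c * self.k_up ** 2, h, w),
            scale_factor=self.scale, mode="nearest",
        ).view(b, c, self.k_up ** 2, h * self.scale, w * self.scale)
        # 3) Weighted reassembly: content-aware sum over the neighborhood.
        return (patches * kernels.unsqueeze(1)).sum(dim=2)
```

Because the kernels are predicted from the feature content itself, boundaries and small objects are reassembled from semantically relevant neighbors rather than blurred by a fixed bilinear kernel, which is why details survive the decoding stage better.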
2.2.5. Loss Function
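Since Dice loss [Li et al.] is among the cited works, a plausible objective for these class-imbalanced datasets combines cross-entropy with a multi-class soft-Dice term. The sketch below assumes an equally weighted sum; the authors' actual formulation and weighting may differ.

```python
import torch
import torch.nn.functional as F

def ce_dice_loss(logits: torch.Tensor, target: torch.Tensor,
                 num_classes: int, eps: float = 1e-6,
                 dice_weight: float = 1.0) -> torch.Tensor:
    """Cross-entropy plus multi-class soft-Dice loss (illustrative).

    logits: (B, C, H, W) raw class scores.
    target: (B, H, W) integer class indices (dtype long).
    """
    ce = F.cross_entropy(logits, target)
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)  # reduce over batch and spatial dims, keep classes
    intersection = (probs * onehot).sum(dims)
    cardinality = probs.sum(dims) + onehot.sum(dims)
    dice = 1.0 - (2.0 * intersection + eps) / (cardinality + eps)
    return ce + dice_weight * dice.mean()
```

The Dice term normalizes by per-class cardinality, so rare classes (e.g., Car or Barren in the result tables) contribute as much gradient as dominant ones, complementing the pixel-averaged cross-entropy.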
3. Results
3.1. Evaluation Metrics
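The tables in Section 3.3 report per-class IoU together with mean IoU (mIoU), mean F1 (mF1), and overall accuracy (OA). These follow the standard confusion-matrix definitions, computed as in the short helper below (the function name is illustrative).

```python
import numpy as np

def segmentation_metrics(conf: np.ndarray):
    """mIoU, mF1, and OA from a (C x C) confusion matrix.

    conf[i, j] counts pixels of true class i predicted as class j.
    """
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp          # predicted as class, but wrong
    fn = conf.sum(axis=1) - tp          # missed pixels of the class
    iou = tp / np.maximum(tp + fp + fn, 1)
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    oa = tp.sum() / conf.sum()          # pixel-weighted accuracy
    return iou.mean(), f1.mean(), oa
```

Note that OA is pixel-weighted, so it can remain high even when minority classes score poorly; mIoU and mF1 weight all classes equally and are therefore more sensitive to small-object performance.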
3.2. Dataset Settings and Implementation Details
3.3. Comparison of Different Methods
4. Discussion
4.1. Analysis of the Attention Mechanism
4.2. Limitations
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Yuan, X.; Shi, J.; Gu, L. A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Syst. Appl. 2021, 169, 114417.
- He, X.; Zhou, Y.; Zhao, J.; Zhang, D.; Yao, R.; Xue, Y. Swin Transformer embedding UNet for remote sensing image semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15.
- Huang, L.; Jiang, B.; Lv, S.; Liu, Y.; Fu, Y. Deep-learning-based semantic segmentation of remote sensing images: A survey. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 8370–8396.
- Diakogiannis, F.I.; Waldner, F.; Caccetta, P.; Wu, C. ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data. ISPRS J. Photogramm. Remote Sens. 2020, 162, 94–114.
- Huo, Y.; Gang, S.; Guan, C. FCIHMRT: Feature cross-layer interaction hybrid method based on Res2Net and Transformer for remote sensing scene classification. Electronics 2023, 12, 4362.
- Wu, X.; Wang, L.; Wu, C.; Guo, C.; Yan, H.; Qiao, Z. Semantic segmentation of remote sensing images using multiway fusion network. Signal Process. 2024, 215, 109272.
- Pal, S.K.; Ghosh, A.; Shankar, B.U. Segmentation of remotely sensed images with fuzzy thresholding, and quantitative evaluation. Int. J. Remote Sens. 2000, 21, 2269–2300.
- Li, D.; Zhang, G.; Wu, Z.; Yi, L. An edge embedded marker-based watershed algorithm for high spatial resolution remote sensing image segmentation. IEEE Trans. Image Process. 2010, 19, 2781–2787.
- Saha, I.; Maulik, U.; Bandyopadhyay, S.; Plewczynski, D. SVMeFC: SVM ensemble fuzzy clustering for satellite image segmentation. IEEE Geosci. Remote Sens. Lett. 2012, 9, 52–55.
- Yu, A.; Quan, Y.; Yu, R.; Guo, W.; Wang, X.; Hong, D.; Zhang, H.; Chen, J.; Hu, Q.; He, P. Deep learning methods for semantic segmentation in remote sensing with small data: A survey. Remote Sens. 2023, 15, 4987.
- Yi, Y.; Zhang, Z.; Zhang, W.; Zhang, C.; Li, W.; Zhao, T. Semantic segmentation of urban buildings from VHR remote sensing imagery using a deep convolutional neural network. Remote Sens. 2019, 11, 1774.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
- Kampffmeyer, M.; Salberg, A.-B.; Jenssen, R. Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 680–688.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
- Huo, Y.; Li, X.; Tu, B. Image measurement of crystal size growth during cooling crystallization using high-speed imaging and a U-Net network. Crystals 2022, 12, 1690.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder–decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Ma, B.; Chang, C.-Y. Semantic segmentation of high-resolution remote sensing images using multiscale skip connection network. IEEE Sens. J. 2021, 22, 3745–3755.
- Zhou, W.; Jin, J.; Lei, J.; Yu, L. CIMFNet: Cross-layer interaction and multiscale fusion network for semantic segmentation of high-resolution remote sensing images. IEEE J. Sel. Top. Signal Process. 2022, 16, 666–676.
- Zeng, Q.; Zhou, J.; Niu, X. Cross-scale feature propagation network for semantic segmentation of high-resolution remote sensing images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 6008305.
- Liu, J.; Gu, H.; Li, Z.; Chen, H.; Chen, H. Multi-scale feature fusion attention network for building extraction in remote sensing images. Electronics 2024, 13, 923.
- Xu, D.; Li, Z.; Feng, H.; Wu, F.; Wang, Y. Multi-scale feature fusion network with symmetric attention for land cover classification using SAR and optical images. Remote Sens. 2024, 16, 957.
- Ding, L.; Tang, H.; Bruzzone, L. LANet: Local attention embedding to improve the semantic segmentation of remote sensing images. IEEE Trans. Geosci. Remote Sens. 2020, 59, 426–435.
- Liu, R.; Tao, F.; Liu, X.; Na, J.; Leng, H.; Wu, J.; Zhou, T. RAANet: A residual ASPP with attention framework for semantic segmentation of high-resolution remote sensing images. Remote Sens. 2022, 14, 3109.
- Li, X.; Xu, F.; Liu, F.; Lyu, X.; Tong, Y.; Xu, Z.; Zhou, J. A synergistical attention model for semantic segmentation of remote sensing images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5400916.
- Hu, X.; Zhang, P.; Zhang, Q.; Yuan, F. GLSANet: Global–local self-attention network for remote sensing image semantic segmentation. IEEE Geosci. Remote Sens. Lett. 2023, 20, 6000105.
- Wang, L.; Li, R.; Zhang, C.; Fang, S.; Duan, C.; Meng, X.; Atkinson, P.M. UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery. ISPRS J. Photogramm. Remote Sens. 2022, 190, 196–214.
- Xu, Z.; Geng, J.; Jiang, W. MMT: Mixed-mask transformer for remote sensing image semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5613415.
- Wu, H.; Huang, P.; Zhang, M.; Tang, W. CTFNet: CNN–Transformer fusion network for remote-sensing image semantic segmentation. IEEE Geosci. Remote Sens. Lett. 2024, 21, 5000305.
- Wu, H.; Huang, P.; Zhang, M.; Tang, W.; Yu, X. CMTFNet: CNN and multiscale transformer fusion network for remote sensing image semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 2004612.
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306.
- Ghamisi, P.; Yokoya, N. IMG2DSM: Height simulation from single imagery using conditional generative adversarial net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 794–798.
- Wang, J.; Zheng, Z.; Ma, A.; Lu, X.; Zhong, Y. LoveDA: A remote sensing land-cover dataset for domain adaptive semantic segmentation. arXiv 2021, arXiv:2110.08733.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13708–13717.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010.
- Tay, Y.; Dehghani, M.; Bahri, D.; Metzler, D. Efficient transformers: A survey. ACM Comput. Surv. 2022, 55, 109.
- Wang, J.; Chen, K.; Xu, R.; Liu, Z.; Loy, C.C.; Lin, D. CARAFE++: Unified content-aware reassembly of features. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 4674–4687.
- Li, X.; Sun, X.; Meng, Y.; Liang, J.; Wu, F.; Li, J. Dice loss for data-imbalanced NLP tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), Online, 5–10 July 2020; pp. 465–476.
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder–decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 833–851.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090.
- Ni, J.; Wu, J.; Elazab, A.; Tong, J.; Chen, Z. DNL-Net: Deformed non-local neural network for blood vessel segmentation. BMC Med. Imaging 2022, 22, 109.
- Strudel, R.; Garcia, R.; Laptev, I.; Schmid, C. Segmenter: Transformer for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 7262–7272.
- Guo, M.-H.; Lu, C.-Z.; Hou, Q.; Liu, Z.; Cheng, M.-M.; Hu, S.-M. SegNeXt: Rethinking convolutional attention design for semantic segmentation. Adv. Neural Inf. Process. Syst. 2022, 35, 1140–1156.
- Xu, M.; Zhang, Z.; Wei, F.; Hu, H.; Bai, X. Side adapter network for open-vocabulary semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 2945–2954.
Quantitative comparison on the Vaihingen dataset: per-class IoU and overall metrics (%).

Method | Imp. Surf. | Building | Low Veg. | Tree | Car | Background | mIoU | mF1 | OA
---|---|---|---|---|---|---|---|---|---
DeepLabv3+ | 84.86 | 90.86 | 71.46 | 80.88 | 59.81 | 14.01 | 66.98 | 76.54 | 89.17
SegFormer | 84.11 | 88.28 | 70.53 | 80.51 | 58.30 | 10.38 | 65.35 | 74.92 | 87.97
DNLNet | 84.52 | 90.00 | 70.33 | 80.04 | 60.67 | 24.03 | 68.27 | 78.68 | 89.44
Segmenter | 83.09 | 89.23 | 71.65 | 79.55 | 44.95 | 13.54 | 63.67 | 73.84 | 89.05
SegNeXt | 81.09 | 86.24 | 67.50 | 78.22 | 34.27 | 11.52 | 59.81 | 70.38 | 87.40
CMTFNet | 84.36 | 89.68 | 69.79 | 78.79 | 67.21 | 34.82 | 70.78 | 81.41 | 89.89
SAN | 81.77 | 87.34 | 67.53 | 77.63 | 57.09 | 22.64 | 65.67 | 76.81 | 87.60
Proposed | 86.52 | 92.40 | 71.51 | 80.93 | 61.27 | 36.28 | 71.48 | 81.81 | 90.50
Quantitative comparison on the Potsdam dataset: per-class IoU and overall metrics (%).

Method | Imp. Surf. | Building | Low Veg. | Tree | Car | Background | mIoU | mF1 | OA
---|---|---|---|---|---|---|---|---|---
DeepLabv3+ | 80.40 | 88.09 | 70.79 | 72.55 | 76.04 | 33.05 | 70.15 | 80.98 | 86.59
SegFormer | 81.79 | 89.64 | 71.72 | 73.72 | 77.98 | 32.97 | 71.30 | 81.69 | 87.35
DNLNet | 82.14 | 89.54 | 71.91 | 73.95 | 81.43 | 33.15 | 72.02 | 82.15 | 87.15
Segmenter | 82.38 | 90.69 | 73.09 | 74.76 | 75.80 | 36.71 | 72.24 | 82.57 | 87.99
SegNeXt | 80.73 | 88.11 | 70.90 | 73.39 | 72.53 | 34.05 | 69.95 | 80.92 | 86.79
CMTFNet | 84.70 | 90.28 | 74.31 | 76.13 | 90.28 | 40.77 | 76.08 | 85.19 | 86.92
SAN | 84.80 | 91.36 | 74.23 | 74.73 | 90.50 | 35.13 | 75.13 | 84.17 | 87.68
Proposed | 84.19 | 92.08 | 73.94 | 76.30 | 84.06 | 35.48 | 74.34 | 83.77 | 88.90
Quantitative comparison on the LoveDA dataset: per-class IoU and overall metrics (%).

Method | Background | Building | Road | Water | Barren | Forest | Agricultural | mIoU | mF1 | OA
---|---|---|---|---|---|---|---|---|---|---
DeepLabv3+ | 54.22 | 57.94 | 49.58 | 65.57 | 20.87 | 38.24 | 51.06 | 48.21 | 63.81 | 68.61
SegFormer | 51.52 | 60.37 | 51.26 | 67.87 | 32.70 | 41.52 | 54.09 | 51.33 | 67.16 | 69.31
DNLNet | 53.28 | 57.70 | 49.17 | 64.85 | 31.50 | 40.30 | 54.09 | 50.13 | 66.17 | 69.31
Segmenter | 52.99 | 58.55 | 49.98 | 69.58 | 30.37 | 42.25 | 49.46 | 50.45 | 66.29 | 68.82
SegNeXt | 53.51 | 55.95 | 48.46 | 69.76 | 22.07 | 39.78 | 51.58 | 48.73 | 64.30 | 68.62
CMTFNet | 52.61 | 55.05 | 51.15 | 57.95 | 22.02 | 37.31 | 45.87 | 45.99 | 62.05 | 68.48
SAN | 53.51 | 64.00 | 56.90 | 69.73 | 26.23 | 39.53 | 51.50 | 51.63 | 66.95 | 72.01
Proposed | 52.76 | 62.49 | 54.94 | 68.16 | 27.60 | 44.23 | 57.81 | 52.57 | 67.98 | 70.80
Overall comparison between the baseline and the proposed method on the three datasets (%).

Dataset | Method | mIoU | mF1 | OA
---|---|---|---|---
Vaihingen | Baseline | 67.90 | 77.20 | 90.11
Vaihingen | Proposed | 71.48 | 81.81 | 90.50
Potsdam | Baseline | 71.15 | 81.54 | 87.13
Potsdam | Proposed | 74.34 | 83.77 | 88.90
LoveDA | Baseline | 48.88 | 64.14 | 67.79
LoveDA | Proposed | 52.57 | 67.98 | 70.80