A New Semantic Segmentation Method for Remote Sensing Images Integrating Coordinate Attention and SPD-Conv
Abstract
1. Introduction
- (1) A new segmentation method for small objects in remote sensing images is proposed. We construct an asymmetric encoder–decoder structure based on ResNet101 with added SPD-Conv layers, which reduces the loss of fine-grained information and improves the segmentation accuracy of small objects.
- (2) We adopt the coordinate attention mechanism in the feature extraction stage to obtain more orientation-sensitive and position-sensitive feature information from remote sensing images and to improve the segmentation accuracy of object edges.
- (3) We introduce the Dice coefficient into the cross-entropy loss function. The Dice term reflects the degree of overlap between the predicted and ground-truth regions even when the classes are extremely unbalanced, which reduces the accuracy loss caused by class imbalance.
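The third contribution combines a Dice term with cross-entropy. As a minimal sketch, assuming a plain weighted sum of the two terms (the weight `dice_weight`, the smoothing constant `eps`, and all function names are illustrative assumptions, not the paper's exact formulation), the idea can be written in plain Python over a flat list of pixels:

```python
import math

def cross_entropy(probs, labels):
    """Mean cross-entropy; probs[i][c] is the predicted probability of class c at pixel i."""
    return -sum(math.log(max(p[y], 1e-12)) for p, y in zip(probs, labels)) / len(labels)

def dice_loss(probs, labels, num_classes, eps=1e-6):
    """One minus the soft Dice coefficient, averaged over classes."""
    total = 0.0
    for c in range(num_classes):
        inter = sum(p[c] for p, y in zip(probs, labels) if y == c)  # overlap with class c
        pred_sum = sum(p[c] for p in probs)                         # predicted mass of class c
        true_sum = sum(1 for y in labels if y == c)                 # ground-truth pixels of class c
        total += (2.0 * inter + eps) / (pred_sum + true_sum + eps)
    return 1.0 - total / num_classes

def combined_loss(probs, labels, num_classes, dice_weight=1.0):
    """Assumed combination: cross-entropy plus a weighted Dice loss."""
    return cross_entropy(probs, labels) + dice_weight * dice_loss(probs, labels, num_classes)

# Toy usage: three pixels, two classes.
probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
labels = [0, 0, 1]
loss = combined_loss(probs, labels, num_classes=2)
```

Because the Dice term is computed per class and then averaged, a rare class contributes as much to it as a frequent one, which is the property the contribution relies on for unbalanced data.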
2. Related Work
2.1. Attention Mechanism
2.2. Small Objects
3. Method
3.1. Feature Extraction Module
3.2. Multiscale Module Based on Coordinate Attention Mechanism (CA-ASPP)
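Coordinate attention, on which this module builds, pools a feature map separately along its height and its width, passes both descriptors through a shared transform, and produces one gate per row and one per column. The sketch below illustrates generic coordinate attention only, not the CA-ASPP module itself; the weight arguments `w_reduce`, `w_height`, and `w_width` are hypothetical names, batch normalization is omitted, and the hard-swish nonlinearity is an assumption.

```python
import math

def hard_swish(v):
    return v * min(max(v + 3.0, 0.0), 6.0) / 6.0

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def mix_channels(features, weights):
    """1x1 convolution over [C][L] features: weights[o][i] maps input channel i to output o."""
    length = len(features[0])
    return [[sum(w * features[i][k] for i, w in enumerate(row)) for k in range(length)]
            for row in weights]

def coordinate_attention(x, w_reduce, w_height, w_width):
    """x is [C][H][W]; returns x reweighted by per-row and per-column gates."""
    height, width = len(x[0]), len(x[0][0])
    # Directional pooling: one average per row, and one average per column.
    pooled_h = [[sum(row) / width for row in ch] for ch in x]
    pooled_w = [[sum(ch[r][c] for r in range(height)) / height for c in range(width)]
                for ch in x]
    # Shared transform on the concatenated descriptors.
    joint = [ph + pw for ph, pw in zip(pooled_h, pooled_w)]
    hidden = [[hard_swish(v) for v in row] for row in mix_channels(joint, w_reduce)]
    # Split back into the height part and the width part.
    hidden_h = [row[:height] for row in hidden]
    hidden_w = [row[height:] for row in hidden]
    gate_h = [[sigmoid(v) for v in row] for row in mix_channels(hidden_h, w_height)]
    gate_w = [[sigmoid(v) for v in row] for row in mix_channels(hidden_w, w_width)]
    return [[[x[c][r][k] * gate_h[c][r] * gate_w[c][k] for k in range(width)]
             for r in range(height)] for c in range(len(x))]
```

Each output value is scaled by the gate of its row and the gate of its column, so the attention map carries positional information along both axes rather than a single per-channel weight.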
3.3. Loss Function
4. Experiments and Discussion
4.1. Data Set
4.2. Evaluation Indicators
4.3. Experimental Setup
4.4. Experimental Results
4.5. Ablation Experiment
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Xu, Z.Y.; Zhang, W.C.; Zhang, T.X.; Yang, Z.F.; Li, J.Y. Efficient transformer for remote sensing image segmentation. Remote Sens. 2021, 13, 3585.
- Zhou, X.K.; Xu, X.S.; Liang, W.; Zeng, Z.; Yan, Z. Deep-Learning-Enhanced Multitarget Detection for End-Edge-Cloud Surveillance in Smart IoT. IEEE Internet Things J. 2021, 8, 12588–12596.
- Ali, I.; Rehman, A.U.; Khan, D.M.; Khan, Z.; Shafiq, M.; Choi, J.-G. Model Selection Using K-Means Clustering Algorithm for the Symmetrical Segmentation of Remote Sensing Datasets. Symmetry 2022, 14, 1149.
- Li, J.Y.; Huang, X.; Gong, J.Y. Deep neural network for remote-sensing image interpretation: Status and perspectives. Natl. Sci. Rev. 2019, 6, 1082–1086.
- Chen, X.L.; Zhu, G.B.; Liu, M.Q. Remote sensing image scene classification with self-supervised learning based on partially unlabeled datasets. Remote Sens. 2022, 14, 5838.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 25 June–2 July 2016; pp. 3213–3223.
- Pan, J.S.; Wei, Z.Q.; Zhao, Y.H.; Zhou, Y. Enhanced FCN for farmland extraction from remote sensing image. Multimed. Tools Appl. 2022, 81, 38123–38150.
- Liu, Y.; Gao, L.R.; Xiao, C.C.; Qu, Y.; Zheng, K. Hyperspectral image classification based on a shuffled group convolutional neural network with transfer learning. Remote Sens. 2020, 12, 1780.
- Yuan, X.H.; Shi, J.F.; Gu, L.C. A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Syst. Appl. 2021, 169, 114417.
- Tuia, D.; Volpi, M.; Copa, L.; Kanevski, M.; Munoz-Mari, J. A survey of active learning algorithms for supervised remote sensing image classification. IEEE J. Sel. Top. Signal Process. 2011, 5, 606–617.
- Geng, J.; Wang, H.Y.; Fan, J.C.; Ma, X.R. SAR image classification via deep recurrent encoding neural networks. IEEE Trans. Geosci. Remote Sens. 2017, 56, 2255–2269.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Mou, L.C.; Hua, Y.S.; Zhu, X.X. Spatial relational reasoning in networks for improving semantic segmentation of aerial images. In Proceedings of the IEEE Conference on International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 5232–5235.
- Tao, C.; Qi, J.; Li, Y.S.; Wang, H.; Li, H.F. Spatial information inference net: Road extraction using road-specific contextual information. ISPRS J. Photogramm. Remote Sens. 2019, 158, 155–166.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
- He, Q.; Dong, Z.; Chen, F. Pyramid: Enabling Hierarchical Neural Networks with Edge Computing. In Proceedings of the ACM Web Conference 2022, Lyon, France, 25–29 April 2022; pp. 1860–1870.
- Wang, Z.; Zhang, J.; Xia, S.; Shi, B.; Bai, X.; Zhang, L. Symmetry-enhanced deep learning for spatiotemporal prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Los Angeles, CA, USA, 15–21 June 2019; pp. 5989–5998.
- Ma, J.; Lu, D.; Li, Y.; Shi, G. CLHF-Net: A Channel-Level Hierarchical Feature Fusion Network for Remote Sensing Image Change Detection. Symmetry 2022, 14, 1138.
- Qi, L.Y.; Yang, Y.H.; Zhou, X.K. Fast anomaly identification based on multiaspect data streams for intelligent intrusion detection toward secure industry 4.0. IEEE Trans. Ind. Inform. 2021, 18, 6503–6511.
- Liang, W.; Hu, Y.; Zhou, X. Variational few-shot learning for microservice-oriented intrusion detection in distributed industrial IoT. IEEE Trans. Ind. Inform. 2021, 18, 5087–5095.
- Lv, Y.; Feng, W.; Wang, S.; Dauphin, G.; Zhang, Y.; Xing, M. Spectral-Spatial Feature Enhancement Algorithm for Nighttime Object Detection and Tracking. Symmetry 2023, 15, 546.
- Park, J.; Lee, M.; Chang, H.J.; Lee, K.; Choi, J.Y. Symmetric graph convolutional autoencoder for unsupervised graph representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Los Angeles, CA, USA, 15–21 June 2019; pp. 6519–6528.
- Kampffmeyer, M.; Salberg, A.; Jenssen, R. Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1–9.
- Kemker, R.; Kanan, C. Self-taught feature learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2693–2705.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–10 June 2015; pp. 3431–3440.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Zhao, H.; Shi, J.; Qi, X. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
- Chen, L.C.; Zhu, Y.; Papandreou, G. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 801–818.
- Sun, Y.; Tian, Y.; Xu, Y. Problems of encoder-decoder frameworks for high-resolution remote sensing image segmentation: Structural stereotype and insufficient learning. Neurocomputing 2019, 330, 297–304.
- Guo, M.H.; Xu, T.X.; Liu, J.J.; Liu, Z.N. Attention mechanisms in computer vision: A survey. Comput. Vis. Media 2022, 8, 331–368.
- Chen, L.; Zhang, H.W.; Xiao, J.; Nie, L.Q. Sca-cnn: Spatial and channel-wise attention in convolutional networks for image captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5659–5667.
- Zhao, Q.; Liu, J.H.; Li, Y.W.; Zhang, H. Semantic segmentation with attention mechanism for remote sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5403913.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Warsaw, Poland, 17–19 September 2018; pp. 7132–7141.
- Tong, W.; Chen, W.T.; Han, W. Channel-attention-based DenseNet network for remote sensing image scene classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 4121–4132.
- Zhu, M.H.; Jiao, L.C.; Liu, F.; Yang, S.Y. Residual spectral–spatial attention network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 449–462.
- Ren, Y.; Li, X.; Yang, X. Development of a dual-attention U-Net model for sea ice and open water classification on SAR images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 4010205.
- Wang, H.; Zhu, Y.; Green, B. Axial-deeplab: Stand-alone axial-attention for panoptic segmentation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 108–126.
- Woo, S.; Park, J.; Lee, J.Y. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Hou, Q.B.; Zhou, D.Q.; Feng, J.S. Coordinate attention for efficient mobile network design. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Beijing, China, 29 October–21 November 2021; pp. 13713–13722.
- Sunkara, R.; Luo, T. No more strided convolutions or pooling: A new CNN building block for low-resolution images and small objects. arXiv 2022, arXiv:2208.03641.
- Mnih, V.; Heess, N.; Graves, A. Recurrent models of visual attention. In Advances in Neural Information Processing Systems 27, Proceedings of the 28th Annual Conference on Neural Information Processing Systems, Montreal, Canada, 8–13 December 2014; Neural Information Processing Systems Foundation, Inc. (NeurIPS): La Jolla, CA, USA, 2014.
- Zhang, S.Y.; Li, C.R.; Qiu, S. EMMCNN: An ETPS-based multi-scale and multi-feature method using CNN for high spatial resolution image land-cover classification. Remote Sens. 2019, 12, 66.
- Gao, H.; Cao, L.; Yu, D.F. Semantic segmentation of marine remote sensing based on a cross direction attention mechanism. IEEE Access 2020, 8, 142483–142494.
- Zheng, J.W.; Feng, Y.C.; Bai, C.; Zhang, J.L. Hyperspectral image classification using mixed convolutions and covariance pooling. IEEE Trans. Geosci. Remote Sens. 2021, 59, 522–534.
- Zhou, M.; Zou, Z.; Shi, Z.; Zeng, W.J.; Gui, J. Local Attention networks for occluded airplane detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2020, 17, 381–385.
- Qi, X.; Li, K.; Liu, P.; Zhou, X.; Sun, M. Deep attention and multi-scale networks for accurate remote sensing image segmentation. IEEE Access 2020, 8, 146627–146639.
- Li, J.; Xiu, J.; Yang, Z. Dual path attention net for remote sensing semantic image segmentation. ISPRS Int. J. Geo-Inf. 2020, 9, 571.
- Mnih, V. Machine Learning for Aerial Image Labeling. Ph.D. Thesis, University of Toronto, Toronto, Canada, 2013.
- Saito, S.; Yamashita, T.; Aoki, Y. Multiple object extraction from aerial imagery with convolutional neural networks. Electron. Imaging 2016, 10, 1–9.
- He, K.; Zhang, X.; Ren, S. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Rottensteiner, F.; Sohn, G.; Jung, J. The ISPRS benchmark on urban object classification and 3D building reconstruction. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2012, 1, 293–298.
Model | Contribution | Backbone | Dataset | Mean Intersection over Union (mIoU)/% |
---|---|---|---|---|
FCN-8s [25] | A fully convolutional network without fully connected (fc) layers, so it accepts input images of any size | VGG16 | VOC 2012 | 62.2 |
SegNet [26] | Reduces the number of parameters that must be trained for upsampling compared with FCN | VGG16 | CamVid | 60.1 |
PSPNet [27] | Provides effective global context priors for pixel-level scene parsing | ResNet | VOC 2012 | 82.6 |
DeepLab V3 [28] | Atrous convolution is applied to the expansion module, and the atrous spatial pyramid pooling module is improved | ResNet | Cityscapes | 81.3 |
DeepLab V3+ [28] | The improved Xception backbone computes faster without reducing accuracy | Xception | Cityscapes | 82.1 |
ReSegNet [29] | A new residual encoder–decoder architecture is proposed to alleviate the problem of inadequate learning | VGG16 | ISPRS Vaihingen | 74.63 |
Layer Name | ResNet101-SPD |
---|---|
spd1 | SPD-Conv |
conv1 | 7 × 7, 64 output channels |
conv2 | 3 × 3 max pool |
spd2 | SPD-Conv |
conv3 | |
spd3 | SPD-Conv |
conv4 | |
spd4 | SPD-Conv |
conv5 | |
fc (fully connected) | Global average pooling + fc (number of classes) + softmax |
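The SPD-Conv rows in the table refer to the building block of Sunkara and Luo: a space-to-depth rearrangement followed by a non-strided convolution, so the spatial resolution is reduced without discarding any pixel. A minimal sketch in plain Python, assuming a scale of 2 and using a 1 × 1 convolution for brevity (the function names, the sub-map ordering, and the kernel size are illustrative assumptions, not the paper's configuration):

```python
def space_to_depth(x, scale=2):
    """Rearrange a [C][H][W] nested list into [C*scale*scale][H//scale][W//scale].

    Every value is kept: each scale x scale neighbourhood is moved into the channel axis.
    """
    height, width = len(x[0]), len(x[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    out = []
    for dy in range(scale):
        for dx in range(scale):
            for channel in x:
                # Sub-map made of rows dy, dy+scale, ... and columns dx, dx+scale, ...
                out.append([row[dx::scale] for row in channel[dy::scale]])
    return out

def pointwise_conv(x, weights):
    """Non-strided 1x1 convolution: weights[o][i] mixes input channel i into output channel o."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [[sum(w * x[i][r][c] for i, w in enumerate(w_row)) for c in range(width)]
         for r in range(height)]
        for w_row in weights
    ]

# Toy usage: one 4 x 4 channel becomes four 2 x 2 channels, then is mixed to one channel.
x = [[[1, 2, 3, 4],
      [5, 6, 7, 8],
      [9, 10, 11, 12],
      [13, 14, 15, 16]]]
y = space_to_depth(x)
z = pointwise_conv(y, [[1, 1, 1, 1]])
```

Unlike a stride-2 convolution or pooling layer, the downsampling step here only moves values between axes; any information reduction is left to the learned convolution that follows.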
Set Name | Image Number |
---|---|
Training set | 1, 3, 5, 6, 11, 13, 15, 16, 17, 21, 22, 23, 24, 26, 28, 29, 31, 32, 33, 34, 35, 37, 38 |
Validation set | 2, 7, 14, 20, 30 |
Test set | 4, 8, 10, 12, 27 |
Model | Impervious Surfaces (F1/%) | Buildings (F1/%) | Trees (F1/%) | Cars (F1/%) | Low Vegetation (F1/%) | mIoU/% | OA/% |
---|---|---|---|---|---|---|---|
FCN-8s | 82.96 | 84.10 | 78.59 | 56.34 | 77.24 | 64.57 | 81.57 |
SegNet | 85.32 | 86.48 | 78.96 | 66.24 | 78.35 | 67.06 | 82.91 |
PSPNet | 87.82 | 89.31 | 80.14 | 73.67 | 79.24 | 70.32 | 84.65 |
DeepLab V3+ | 88.84 | 92.66 | 84.61 | 74.06 | 82.35 | 74.32 | 88.16 |
ReSegNet | 91.37 | 93.47 | 85.34 | 76.18 | 80.23 | 73.36 | 88.03 |
CAS-Net | 90.96 | 94.56 | 87.73 | 80.78 | 82.07 | 75.58 | 89.83 |
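Per-class F1-score, mean intersection over union (mIoU), and overall accuracy (OA) are conventionally derived from a pixel-level confusion matrix. The sketch below uses the standard definitions; the paper's evaluation code is not shown, so the function name and the toy matrix are illustrative assumptions.

```python
def segmentation_metrics(confusion):
    """confusion[t][p] counts pixels of true class t predicted as class p.

    Returns (per-class F1 scores, mean IoU, overall accuracy) as fractions in [0, 1].
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    f1_scores, ious = [], []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # class c pixels that were missed
        fp = sum(confusion[t][c] for t in range(n)) - tp     # other pixels labelled as class c
        denom = tp + fp + fn
        f1_scores.append(2 * tp / (2 * tp + fp + fn) if denom else 0.0)
        ious.append(tp / denom if denom else 0.0)
    overall_accuracy = sum(confusion[c][c] for c in range(n)) / total
    return f1_scores, sum(ious) / n, overall_accuracy

# Toy usage with two classes.
f1, miou, oa = segmentation_metrics([[8, 2], [1, 9]])
```

OA weights every pixel equally and is therefore dominated by large classes, whereas mIoU averages over classes, which is why a small class such as cars affects mIoU more visibly than OA.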
Model | Impervious Surfaces (F1/%) | Buildings (F1/%) | Trees (F1/%) | Cars (F1/%) | Low Vegetation (F1/%) | Backgrounds (F1/%) | mIoU/% | OA/% |
---|---|---|---|---|---|---|---|---|
FCN-8s | 82.87 | 83.81 | 79.32 | 58.64 | 76.38 | 41.32 | 65.92 | 80.96 |
SegNet | 86.41 | 88.93 | 78.85 | 66.58 | 79.46 | 43.92 | 69.46 | 83.58 |
PSPNet | 88.06 | 89.25 | 82.56 | 72.04 | 79.64 | 46.53 | 69.58 | 85.75 |
DeepLab V3+ | 89.54 | 93.76 | 83.78 | 75.33 | 81.35 | 50.64 | 73.45 | 87.45 |
ReSegNet | 89.43 | 93.78 | 86.92 | 77.79 | 81.54 | 48.63 | 72.78 | 87.93 |
CAS-Net | 91.24 | 94.91 | 88.97 | 81.46 | 82.92 | 53.22 | 76.34 | 89.94 |
Model | Impervious Surfaces (F1/%) | Buildings (F1/%) | Trees (F1/%) | Cars (F1/%) | Low Vegetation (F1/%) | mIoU/% | OA/% |
---|---|---|---|---|---|---|---|
Baseline | 88.73 | 91.36 | 83.47 | 75.73 | 82.38 | 74.07 | 88.52 |
+SPD-Conv | 89.72 | 93.53 | 86.52 | 78.23 | 82.03 | 75.26 | 89.14 |
+SPD-Conv+CA-ASPP | 90.67 | 93.62 | 87.02 | 79.51 | 82.02 | 75.38 | 89.57 |
CAS-Net | 90.96 | 94.56 | 87.73 | 80.78 | 82.07 | 75.58 | 89.83 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yang, Z.; Wu, Q.; Zhang, F.; Zhang, X.; Chen, X.; Gao, Y. A New Semantic Segmentation Method for Remote Sensing Images Integrating Coordinate Attention and SPD-Conv. Symmetry 2023, 15, 1037. https://doi.org/10.3390/sym15051037