LPASS-Net: Lightweight Progressive Attention Semantic Segmentation Network for Automatic Segmentation of Remote Sensing Images
Abstract
1. Introduction
- A lightweight progressive attention semantic segmentation network (LPASS-Net) is proposed. It combines an efficient, lightweight backbone network, atrous spatial pyramid pooling (ASPP) modules, and reverse progressive attention. With its enhanced feature fusion network, the algorithm is more robust when segmenting targets of different scales and addresses the problem of local information loss. In addition, the design of the feature fusion network enriches the diversity of the fused features, which helps improve segmentation accuracy when object size and viewing perspective change.
- A lightweight non-local convolutional attention network (LNCA-Net) is proposed. It is a spatial-dimension attention mechanism that overcomes the limitation of integrating only local features by using an improved autocorrelation matrix operation.
- An edge padding cut prediction (EPCP) method is proposed. It segments and splices images using edge padding, which effectively removes the splice traces produced when direct prediction is performed.
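The edge-padding idea in the last contribution can be illustrated with a minimal NumPy sketch. It is not the authors' EPCP implementation: the function name `predict_with_edge_padding`, the `tile` and `pad` parameters, the reflect padding, and the stand-in `toy_predict` model are all assumptions made for illustration. Each tile is predicted together with a border of surrounding context, and only the tile center is kept when the results are spliced, so no prediction made at a patch edge reaches the output.

```python
import numpy as np


def predict_with_edge_padding(image, predict_fn, tile=256, pad=32):
    """Predict a large (H, W, C) image tile by tile and splice the tile centers.

    `predict_fn` is a hypothetical model wrapper: it takes a patch of shape
    (tile + 2*pad, tile + 2*pad, C) and returns integer labels of shape
    (tile + 2*pad, tile + 2*pad).
    """
    h, w = image.shape[:2]
    n_rows = -(-h // tile)  # ceiling division
    n_cols = -(-w // tile)
    extra_h = n_rows * tile - h
    extra_w = n_cols * tile - w
    # Add `pad` pixels of mirrored context on every side, plus whatever is
    # needed to round the canvas up to a whole number of tiles.
    padded = np.pad(
        image,
        ((pad, pad + extra_h), (pad, pad + extra_w), (0, 0)),
        mode="reflect",
    )
    out = np.zeros((n_rows * tile, n_cols * tile), dtype=np.int64)
    for r in range(n_rows):
        for c in range(n_cols):
            y, x = r * tile, c * tile
            patch = padded[y:y + tile + 2 * pad, x:x + tile + 2 * pad]
            labels = predict_fn(patch)
            # Discard the padded border and keep only the tile center.
            out[y:y + tile, x:x + tile] = labels[pad:pad + tile, pad:pad + tile]
    return out[:h, :w]


def toy_predict(patch):
    """Stand-in for a segmentation model: per-pixel threshold on channel 0."""
    return (patch[..., 0] > 0.5).astype(np.int64)


rng = np.random.default_rng(0)
image = rng.random((300, 420, 3))
stitched = predict_with_edge_padding(image, toy_predict, tile=128, pad=16)
```

Because `toy_predict` is purely per-pixel, the spliced result here matches a direct prediction on the whole image, which makes the bookkeeping easy to check. With a real convolutional model the patch passed to the network is `tile + 2 * pad` pixels wide, so `tile` and `pad` would have to be chosen to match the network input size.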
2. Related Work
3. Methodologies
3.1. Lightweight and Efficient Backbone Network
3.2. Enhanced Feature Extraction Network
3.3. Splicing Optimization Method
3.4. Experiments
3.4.1. Dataset and Experimental Environment
3.4.2. Evaluation Metrics and Experimental Details
4. Results and Discussion


5. Conclusions
Author Contributions
Funding
Conflicts of Interest

| Category | Item | Description |
|---|---|---|
| H/W | CPU | Intel(R) Core(TM) i5-11400F |
| H/W | RAM | 16 GB |
| H/W | SSD | Samsung SSD 500 GB |
| H/W | Graphics Card | NVIDIA GeForce RTX 3050 |
| S/W | Operating System | Windows 11 Pro, 64-bit |
| S/W | Programming Language | Python 3.7 |
| S/W | Learning Framework | TensorFlow 2.2.0 |

| Backbone Feature Extraction | Baseline | √ | √ | √ |
|---|---|---|---|---|
| ASPP module |  | √ | √ | √ |
| RPA-Net (No attention) |  |  | √ | × |
| RPA-Net (LNCA-Net) |  |  | × | √ |
| mIoU (%) | 73.50 | 76.11 | 80.91 | 83.17 |

| Settings | Input Size | Parameters (Millions) | mIoU (%) |
|---|---|---|---|
| Baseline | 256 × 256 | 7.14 | 80.91 |
| + CBAM | 256 × 256 | 7.24 | 82.03 |
| + LRCA-Net | 256 × 256 | 7.29 | 82.87 |
| + LNCA-Net | 256 × 256 | 7.18 | 83.17 |

| Baseline | LNCA-Net A1 | LNCA-Net A2 | LNCA-Net A3 | mIoU (%) |
|---|---|---|---|---|
| √ |  |  |  | 80.91 |
| √ | √ |  |  | 81.51 |
| √ | √ | √ |  | 82.47 |
| √ | √ | √ | √ | 83.17 |

| Method | Backbone | Parameters (Millions) | mF1, BDCI 2017 | mF1, ISPRS Potsdam | mIoU (%), BDCI 2017 | mIoU (%), ISPRS Potsdam | Inference Time (s) |
|---|---|---|---|---|---|---|---|
| SegNet | VGG16 | 21.61 | 74.01 | 76.96 | 67.82 | 70.41 | 0.0358 |
| SegNet | ResNet50 | 44.86 | 77.64 | 80.32 | 71.16 | 75.47 | 0.0428 |
| PSPNet | ResNet50 | 46.77 | 84.42 | 87.09 | 79.91 | 82.35 | 0.0397 |
| PSPNet | MobileNetv3 | 2.41 | 79.11 | 84.82 | 74.75 | 78.13 | 0.0260 |
| U-Net | VGG16 | 24.89 | 81.14 | 84.62 | 76.43 | 81.82 | 0.0364 |
| U-Net | ResNet50 | 44.01 | 81.94 | 84.87 | 77.96 | 81.46 | 0.0411 |
| DeepLabv3+ | Xception | 41.25 | 86.22 | 91.14 | 82.15 | 86.39 | 0.0386 |
| DeepLabv3+ | ResNet50 | 27.75 | 85.31 | 88.80 | 80.56 | 82.27 | 0.0365 |
| LPASS-Net | MobileNetv3 | 7.17 | 89.17 | 94.55 | 83.17 | 88.86 | 0.0271 |

| Method | Vegetation | Buildings | Water Bodies | Roads | Others |
|---|---|---|---|---|---|
| SegNet | 78.7 | 68.1 | 73.8 | 70.4 | 64.8 |
| PSPNet | 83.4 | 79.3 | 83.2 | 80.5 | 73.15 |
| U-Net | 82.7 | 74.9 | 81.5 | 79.6 | 71.1 |
| DeepLabv3+ | 86.9 | 80.1 | 84.1 | 82.7 | 76.95 |
| LPASS-Net | 88.3 | 80.5 | 85.4 | 81.1 | 80.55 |

| Method | Impervious Surfaces | Buildings | Low Vegetation | Trees | Cars | Clutter |
|---|---|---|---|---|---|---|
| SegNet | 77.1 | 73.2 | 83.1 | 72.9 | 76.7 | 69.8 |
| PSPNet | 81.5 | 80.6 | 87.4 | 84.6 | 88.3 | 71.7 |
| U-Net | 80.5 | 81.1 | 85.8 | 84.5 | 89.6 | 69.4 |
| DeepLabv3+ | 83.7 | 84.1 | 91.5 | 86.1 | 93.8 | 78.9 |
| LPASS-Net | 83.4 | 87.9 | 95.4 | 90.4 | 95.6 | 80.5 |

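As a consistency check on the two per-class tables, the sketch below averages the LPASS-Net rows and compares the result with the reported mIoU values of 83.17 and 88.86. It assumes the per-class entries are IoU percentages, which the tables do not state explicitly. The helper `per_class_iou` and its toy confusion matrix are illustrative and not taken from the paper; they show the usual definition IoU = TP / (TP + FP + FN).

```python
import numpy as np

# LPASS-Net rows copied from the two per-class tables.
bdci_iou = [88.3, 80.5, 85.4, 81.1, 80.55]          # five-class table
potsdam_iou = [83.4, 87.9, 95.4, 90.4, 95.6, 80.5]  # six-class table

miou_bdci = float(np.mean(bdci_iou))
miou_potsdam = float(np.mean(potsdam_iou))


def per_class_iou(confusion):
    """Per-class IoU from a square confusion matrix of pixel counts.

    Rows are ground-truth classes and columns are predicted classes.
    """
    confusion = np.asarray(confusion, dtype=np.float64)
    tp = np.diag(confusion)
    # Union = ground-truth pixels + predicted pixels - intersection.
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - tp
    return tp / np.maximum(union, 1.0)


# Toy two-class example (hypothetical counts, not from the paper).
toy_iou = per_class_iou([[3, 1], [1, 5]])
```

The five-class mean is 83.17, matching the reported value, and the six-class mean is about 88.87, which agrees with the reported 88.86 up to rounding of the per-class entries.
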
| Dataset | Categories |
|---|---|
| The BDCI 2017 Dataset | Vegetation, Building, Water Bodies, Road, Other |
| The ISPRS Potsdam Dataset | Impervious Surface, Building, Low Vegetation, Tree, Car, Clutter |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liang, H.; Seo, S. LPASS-Net: Lightweight Progressive Attention Semantic Segmentation Network for Automatic Segmentation of Remote Sensing Images. Remote Sens. 2022, 14, 6057. https://doi.org/10.3390/rs14236057