NPSFF-Net: Enhanced Building Segmentation in Remote Sensing Images via Novel Pseudo-Siamese Feature Fusion
Abstract
1. Introduction
- We propose NPSFF-Net, a building segmentation network that extends the traditional encoder–decoder structure.
- We build an improved pseudo-Siamese network from ResNet-34 and ResNet-50 to learn building features from high-resolution remote sensing images (HRSIs), and combine transfer learning with fusion encoding to encode the deep semantic features of building images efficiently.
- We design a double-stream decoder that operates on two neighboring deep encoded features and obtains decoded building features through skip connections while fusing deep and transposed convolutions.
- We conduct extensive experiments on Satellite Dataset I, the Massachusetts Buildings Dataset, and the Aerial Imagery Dataset; the results demonstrate the effectiveness of NPSFF-Net.
2. Related Work
2.1. Encoders
2.2. Decoders
2.3. Attention Mechanisms
3. Methodology
3.1. The Overall Structure of NPSFF-Net
3.2. Pseudo-Siamese Feature Fusion Encoding
3.3. Double-Stream Feature Fusion Decoding
3.4. Building Feature Generation and Prediction
4. Results
4.1. Experimental Datasets
4.1.1. Satellite Dataset I
4.1.2. Massachusetts Buildings Dataset
4.2. Evaluation Metrics
4.2.1. IoU
4.2.2. Accuracy
4.2.3. Precision
4.2.4. Recall
4.2.5. F1-Score
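The five metrics listed in Section 4.2 can be illustrated with a minimal sketch, assuming the standard pixel-wise definitions with "building" as the positive class. The helper names `confusion_counts` and `segmentation_metrics` are hypothetical and are not taken from the authors' code; values are returned as percentages to match the result tables.

```python
def confusion_counts(pred, target):
    """Count TP, FP, FN, TN over two binary masks given as nested lists (1 = building)."""
    tp = fp = fn = tn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(tp, fp, fn, tn):
    """Return IoU, accuracy, precision, recall, and F1-score as percentages."""

    def ratio(num, den):
        # Guard against empty denominators (e.g., no predicted building pixels).
        return 100.0 * num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    pr_sum = precision + recall
    return {
        "iou": ratio(tp, tp + fp + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / pr_sum if pr_sum else 0.0,
    }


if __name__ == "__main__":
    # Toy 2x3 masks, purely illustrative.
    pred = [[1, 1, 0], [0, 1, 0]]
    target = [[1, 0, 0], [0, 1, 1]]
    print(segmentation_metrics(*confusion_counts(pred, target)))
```

Whether the reported scores pool the counts over the whole test set or average per-image scores is not stated here, so the sketch leaves that choice to the caller.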
4.3. Experimental Conditions
4.3.1. Computing Environment
4.3.2. Hyperparameter Settings
4.4. Experimental Results
4.4.1. Satellite Dataset I
4.4.2. Massachusetts Buildings Dataset
4.5. Ablation Experiments
4.5.1. Encoding Networks
4.5.2. Attention Mechanisms
4.5.3. Decoding Patterns
4.5.4. Loss Functions
5. Discussion
5.1. Applicability
5.2. Limitations
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Huang, X.; Wen, D.; Li, J.; Qin, R. Multi-level monitoring of subtle urban changes for the megacities of China using high-resolution multi-view satellite imagery. Remote Sens. Environ. 2017, 196, 56–75.
- Vardanjani, S.M.; Fathi, A.; Moradkhani, K. Grsnet: Gated residual supervision network for pixel-wise building segmentation in remote sensing imagery. Int. J. Remote Sens. 2022, 43, 4872–4887.
- Feng, W.; Sui, H.; Hua, L.; Xu, C.; Ma, G.; Huang, W. Building extraction from VHR remote sensing imagery by combining an improved deep convolutional encoder-decoder architecture and historical land use vector map. Int. J. Remote Sens. 2020, 41, 6595–6617.
- Zhang, B.; Wu, Y.; Zhao, B.; Chanussot, J.; Hong, D.; Yao, J.; Gao, L. Progress and challenges in intelligent remote sensing satellite systems. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 1814–1822.
- Yang, J.; Matsushita, B.; Zhang, H. Improving building rooftop segmentation accuracy through the optimization of UNet basic elements and image foreground-background balance. ISPRS J. Photogramm. Remote Sens. 2023, 201, 123–137.
- Yu, Y.; Wang, C.; Fu, Q.; Kou, R.; Huang, F.; Yang, B.; Yang, T.; Gao, M. Techniques and Challenges of Image Segmentation: A Review. Electronics 2023, 12, 1199.
- Wang, Y.; Lv, H.; Deng, R.; Zhuang, S. A Comprehensive Survey of Optical Remote Sensing Image Segmentation Methods. Can. J. Remote Sens. 2020, 46, 501–531.
- Bhargavi, K.; Jyothi, S. A survey on threshold based segmentation technique in image processing. Int. J. Innov. Res. Dev. 2014, 3, 234–239.
- Cheng, Z.; Wang, J. Improved region growing method for image segmentation of three-phase materials. Powder Technol. 2020, 368, 80–89.
- Muthukrishnan, R.; Radha, M. Edge detection techniques for image segmentation. Int. J. Comput. Sci. Inf. Technol. 2011, 3, 259.
- Wu, Y.; Peng, X.; Ruan, K.; Hu, Z. Improved image segmentation method based on morphological reconstruction. Multimed. Tools Appl. 2017, 76, 19781–19793.
- Pan, Z.; Xu, J.; Guo, Y.; Hu, Y.; Wang, G. Deep Learning Segmentation and Classification for Urban Village Using a Worldview Satellite Image Based on U-Net. Remote Sens. 2020, 12, 1574.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Atik, S.O.; Atik, M.E.; Ipbuker, C. Comparative research on different backbone architectures of DeepLabV3+ for building segmentation. J. Appl. Remote Sens. 2022, 16, 024510.
- Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image segmentation using deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3523–3542.
- Khan, S.D.; Alarabi, L.; Basalamah, S. An Encoder–Decoder Deep Learning Framework for Building Footprints Extraction from Aerial Imagery. Arab. J. Sci. Eng. 2022, 48, 1273–1284.
- Luo, L.; Li, P.; Yan, X. Deep Learning-Based Building Extraction from Remote Sensing Images: A Comprehensive Review. Energies 2021, 14, 7982.
- Ji, Y.; Zhang, H.; Zhang, Z.; Liu, M. CNN-based encoder-decoder networks for salient object detection: A comprehensive review and recent advances. Inf. Sci. 2020, 546, 835–857.
- Wang, L.; Fang, S.; Meng, X.; Li, R. Building extraction with vision transformer. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–11.
- Guo, X.; Wang, Z.; Yang, Q.; Lv, W.; Liu, X.; Wu, Q.; Huang, J. GAN-Based virtual-to-real image translation for urban scene semantic segmentation. Neurocomputing 2020, 394, 127–135.
- Gao, H.; Yuan, H.; Wang, Z.; Ji, S. Pixel transposed convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 42, 1218–1227.
- Sediqi, K.M.; Lee, H.J. A Novel Upsampling and Context Convolution for Image Semantic Segmentation. Sensors 2021, 21, 2170.
- Zhang, X.; Quan, Z.; Li, Q.; Zhu, D.; Yang, W. SED: Searching Enhanced Decoder with switchable skip connection for semantic segmentation. Pattern Recognit. 2024, 149, 110196.
- Zhao, Q.; Liu, J.; Li, Y.; Zhang, H. Semantic Segmentation with Attention Mechanism for Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–13.
- Jaderberg, M.; Simonyan, K.; Zisserman, A. Spatial transformer networks. In Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, Montreal, QC, Canada, 7–12 December 2015; p. 28.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Bastidas, A.A.; Tang, H. Channel attention networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019; pp. 0–0.
- Li, J.; Huang, X.; Tu, L.; Zhang, T.; Wang, L. A review of building detection from very high resolution optical remote sensing images. GIScience Remote Sens. 2022, 59, 1199–1225.
- Chicco, D. Siamese neural networks: An overview. Artif. Neural Netw. 2021, 73–94.
- Xu, Q.; Chen, K.; Sun, X.; Zhang, Y.; Li, H.; Xu, G. Pseudo-Siamese Capsule Network for Aerial Remote Sensing Images Change Detection. IEEE Geosci. Remote Sens. Lett. 2020, 19, 1–5.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Krishna, S.T.; Kalluri, H.K. Deep learning and transfer learning approaches for image classification. Int. J. Recent Technol. Eng. 2019, 7, 427–432.
- Santurkar, S.; Tsipras, D.; Ilyas, A.; Madry, A. How does batch normalization help optimization? Adv. Neural Inf. Process. Syst. 2018, 31.
- Stock, P.; Gribonval, R. An Embedding of ReLU Networks and an Analysis of Their Identifiability. Constr. Approx. 2022, 57, 853–899.
- Zhang, Z.; Sabuncu, M. Generalized cross entropy loss for training deep neural networks with noisy labels. Adv. Neural Inf. Process. Syst. 2018, 31.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Ji, S.; Wei, S.; Lu, M. Fully Convolutional Networks for Multisource Building Extraction from an Open Aerial and Satellite Imagery Data Set. IEEE Trans. Geosci. Remote Sens. 2018, 57, 574–586.
- Mnih, V. Machine Learning for Aerial Image Labeling. Ph.D. Thesis, University of Toronto, Toronto, ON, Canada, 2013.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-assisted Intervention–MICCAI 2015, Proceedings of the 18th International Conference, Munich, Germany, 5–9 October 2015, Proceedings, Part III 18; Springer: Cham, Switzerland, 2015; pp. 234–241.
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Gu, Z.; Cheng, J.; Fu, H.; Zhou, K.; Hao, H.; Zhao, Y.; Zhang, T.; Gao, S.; Liu, J. CE-Net: Context Encoder Network for 2D Medical Image Segmentation. IEEE Trans. Med. Imaging 2019, 38, 2281–2292.
- Yang, F.; Sun, Q.; Jin, H.; Zhou, Z. Superpixel segmentation with fully convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 13964–13973.
- Chen, J.; Zhang, D.; Wu, Y.; Chen, Y.; Yan, X. A Context Feature Enhancement Network for Building Extraction from High-Resolution Remote Sensing Imagery. Remote Sens. 2022, 14, 2276.
- Li, R.; Zheng, S.; Duan, C.; Su, J.; Zhang, C. Multistage attention ResU-Net for semantic segmentation of fine-resolution remote sensing images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5.
- Che, Z.; Shen, L.; Huo, L.; Hu, C.; Wang, Y.; Lu, Y.; Bi, F. MAFF-HRNet: Multi-Attention Feature Fusion HRNet for Building Segmentation in Remote Sensing Images. Remote Sens. 2023, 15, 1382.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Mukhoti, J.; Kulharia, V.; Sanyal, A.; Golodetz, S.; Torr, P.; Dokania, P. Calibrating deep neural networks using focal loss. Adv. Neural Inf. Process. Syst. 2020, 33, 15288–15299.
- Nordström, M.; Hult, H.; Maki, A.; Löfman, F. Noisy Image Segmentation with Soft-Dice. arXiv 2023, arXiv:00801.2023.
- Salehi, S.S.M.; Erdogmus, D.; Gholipour, A. Tversky loss function for image segmentation using 3D fully convolutional deep networks. In Machine Learning in Medical Imaging; Springer: Cham, Switzerland, 2017.
- Ren, Y.; Zhang, X.; Ma, Y.; Yang, Q.; Wang, C.; Liu, H.; Qi, Q. Full Convolutional Neural Network Based on Multi-Scale Feature Fusion for the Class Imbalance Remote Sensing Image Classification. Remote Sens. 2020, 12, 3547.
Computing Setting | Specification |
---|---|
Operating System | Windows 10 |
Processor | Intel(R) Xeon(R) W-2245 CPU (Santa Clara, CA, USA) |
RAM | 128 GB |
Graphics Card | NVIDIA RTX A6000 (Santa Clara, CA, USA) |
GPU Memory | 48 GB |
Programming Language | Python |
Deep Learning Framework | PyTorch |
Comparison of segmentation results on Satellite Dataset I (all metrics in %).
ID | Approach | Accuracy | IoU | Precision | Recall | F1-Score |
---|---|---|---|---|---|---|
(a) | UNet [39] | 88.75 | 58.99 | 77.91 | 70.83 | 74.20 |
(b) | SegNet [13] | 84.07 | 39.65 | 74.70 | 45.80 | 56.79 |
(c) | DeepLabV3-Plus [40] | 89.73 | 61.42 | 81.23 | 71.58 | 76.10 |
(d) | CE-Net [41] | 90.12 | 64.12 | 79.04 | 77.25 | 78.12 |
(e) | SSFC-Net [42] | 84.82 | 49.43 | 67.43 | 64.92 | 66.16 |
(f) | CFENet [43] | 84.70 | 42.84 | 74.55 | 50.18 | 59.98 |
(g) | MaResU-Net [44] | 89.67 | 61.12 | 81.33 | 71.10 | 75.87 |
(h) | MAFF-HRNet [45] | 86.37 | 53.73 | 70.56 | 69.26 | 69.90 |
(ours) | NPSFF-Net | 91.37 | 68.72 | 80.02 | 82.95 | 81.46 |
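As a quick sanity check on the table above, F1-score and IoU can both be derived from precision and recall, since F1 = 2PR / (P + R) and IoU = PR / (P + R − PR) when all three are computed from the same TP, FP, and FN counts. The sketch below is a minimal illustration under that assumption; the helper names are hypothetical, and the inputs are the rounded precision and recall values from two rows of the table.

```python
def f1_from_pr(precision, recall):
    """F1-score (in %) from precision and recall given in %."""
    return 2 * precision * recall / (precision + recall)


def iou_from_pr(precision, recall):
    """IoU (in %) from precision and recall given in %.

    Uses 1/IoU = 1/P + 1/R - 1, which holds when P, R, and IoU
    share the same TP/FP/FN counts.
    """
    p, r = precision / 100.0, recall / 100.0
    return 100.0 * p * r / (p + r - p * r)


if __name__ == "__main__":
    # (precision, recall) pairs copied from the table rows.
    rows = {
        "UNet": (77.91, 70.83),
        "NPSFF-Net": (80.02, 82.95),
    }
    for name, (p, r) in rows.items():
        print(f"{name}: F1 = {f1_from_pr(p, r):.2f}, IoU = {iou_from_pr(p, r):.2f}")
```

For these two rows the derived values agree with the reported F1-Score and IoU columns to within rounding of the inputs, which suggests the metrics were computed from pooled pixel counts rather than averaged per image.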
Comparison of segmentation results on the Massachusetts Buildings Dataset (all metrics in %).
ID | Approach | Accuracy | IoU | Precision | Recall | F1-Score |
---|---|---|---|---|---|---|
(a) | UNet [39] | 93.02 | 66.02 | 87.64 | 72.80 | 79.53 |
(b) | SegNet [13] | 90.27 | 51.96 | 86.59 | 56.51 | 68.39 |
(c) | DeepLabV3-Plus [40] | 92.67 | 65.23 | 84.80 | 73.87 | 78.96 |
(d) | CE-Net [41] | 93.74 | 70.24 | 85.92 | 79.37 | 82.52 |
(e) | SSFC-Net [42] | 92.28 | 64.74 | 81.27 | 76.09 | 78.60 |
(f) | CFENet [43] | 87.23 | 44.32 | 70.16 | 54.62 | 61.42 |
(g) | MaResU-Net [44] | 93.14 | 66.47 | 88.04 | 73.07 | 79.86 |
(h) | MAFF-HRNet [45] | 92.36 | 62.79 | 87.06 | 69.26 | 77.15 |
(ours) | NPSFF-Net | 93.94 | 71.88 | 84.03 | 83.24 | 83.64 |
Dataset | Attention Mechanisms | IoU | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|---|---|
Satellite Dataset I | No FEM | 67.63 | 91.00 | 79.16 | 82.28 | 80.69 |
Satellite Dataset I | Replaced with CBAM | 67.45 | 91.31 | 82.43 | 78.78 | 80.56 |
Satellite Dataset I | Double FEM | 68.21 | 91.47 | 82.06 | 80.17 | 81.10 |
Satellite Dataset I | Single FEM | 68.72 | 91.37 | 80.02 | 82.95 | 81.46 |
Massachusetts Buildings Dataset | No FEM | 71.13 | 93.71 | 83.00 | 83.26 | 83.13 |
Massachusetts Buildings Dataset | Replaced with CBAM | 70.69 | 93.76 | 84.87 | 80.88 | 82.83 |
Massachusetts Buildings Dataset | Double FEM | 71.60 | 93.89 | 84.15 | 82.77 | 83.45 |
Massachusetts Buildings Dataset | Single FEM | 71.88 | 93.94 | 84.03 | 83.24 | 83.64 |
Dataset | Decoding Pattern | IoU | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|---|---|
Satellite Dataset I | Single-Stream | 67.99 | 91.47 | 82.64 | 79.31 | 80.94 |
Satellite Dataset I | Double-Stream | 68.72 | 91.37 | 80.02 | 82.95 | 81.46 |
Satellite Dataset I | Concatenation | 67.65 | 91.33 | 82.10 | 79.35 | 80.70 |
Satellite Dataset I | Summation | 68.72 | 91.37 | 80.02 | 82.95 | 81.46 |
Massachusetts Buildings Dataset | Single-Stream | 70.82 | 93.74 | 84.31 | 81.58 | 82.92 |
Massachusetts Buildings Dataset | Double-Stream | 71.88 | 93.94 | 84.03 | 83.24 | 83.64 |
Massachusetts Buildings Dataset | Concatenation | 70.70 | 93.83 | 85.96 | 79.94 | 82.84 |
Massachusetts Buildings Dataset | Summation | 71.88 | 93.94 | 84.03 | 83.24 | 83.64 |
Comparison of segmentation results on the Aerial Imagery Dataset (all metrics in %).
ID | Approach | Accuracy | IoU | Precision | Recall | F1-Score |
---|---|---|---|---|---|---|
(a) | UNet [39] | 98.68 | 88.61 | 95.84 | 92.15 | 93.96 |
(b) | SegNet [13] | 98.37 | 86.02 | 94.89 | 90.20 | 92.48 |
(c) | DeepLabV3-Plus [40] | 98.61 | 88.18 | 94.46 | 92.99 | 93.72 |
(d) | CE-Net [41] | 98.71 | 89.04 | 94.27 | 94.14 | 94.20 |
(e) | SSFC-Net [42] | 98.49 | 87.35 | 93.01 | 93.48 | 93.25 |
(f) | CFENet [43] | 96.89 | 73.47 | 93.57 | 77.38 | 84.71 |
(g) | MaResU-Net [44] | 98.58 | 87.68 | 96.51 | 90.56 | 93.44 |
(h) | MAFF-HRNet [45] | 98.63 | 88.23 | 95.32 | 92.22 | 93.74 |
(ours) | NPSFF-Net | 98.77 | 89.45 | 95.47 | 93.41 | 94.43 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Guo, N.; Jiang, M.; Hu, X.; Su, Z.; Zhang, W.; Li, R.; Luo, J. NPSFF-Net: Enhanced Building Segmentation in Remote Sensing Images via Novel Pseudo-Siamese Feature Fusion. Remote Sens. 2024, 16, 3266. https://doi.org/10.3390/rs16173266