Multi-Scale Feature Fusion Attention Network for Building Extraction in Remote Sensing Images
Abstract
1. Introduction
- Multi-resolution attention network: We introduce a multi-resolution attention network that uses attention mechanisms to concentrate on the most critical features and integrates multi-scale information, effectively addressing the blurred boundaries that arise during segmentation.
- Multiscale channel and spatial attention module: In response to the unique characteristics of remote sensing building images, we have designed the MCAM. This module operates adaptively, more effectively capturing crucial features and filtering out irrelevant information. This capability enables the model to selectively concentrate on the critical parts of the image, thus enhancing the accuracy of building extraction.
- Layered residual connectivity module: We designed the LRCM. This module aims to augment the expression of information across various scales by merging features from multiple levels. This enhancement not only improves context understanding but also yields significant results in capturing fine edge details and fusing high-level abstract features. This approach offers an effective means of enhancing the performance of building extraction models.
2. Related Works
2.1. Traditional Feature-Based Segmentation Methods
2.2. Segmentation Based on Deep Learning Methods
3. Method
3.1. Multi-Resolution Attention Network Framework
- Multiple pooling layers in the multi-layer feature extraction section capture features at various scales, ensuring continuous feature extraction for subsequent modules.
- The adaptive feature enhancement attention module implements an attention mechanism on the input of the feature extraction section, achieving adaptive feature enhancement by assigning different weights to different features.
- The task of the feature aggregation fusion module is to organically fuse high-resolution features with low-resolution ones, thereby enhancing detail extraction.
- In the progressive upsampling section, the decoder further fuses the already aggregated features with bottom-layer features to produce the final prediction.
Algorithm 1 Enhanced Segmentation Model with VGG Backbone and Attention Mechanisms
Input: Input image
Output: Segmentation map
1: Initialize a VGG16 backbone with optional pretrained weights.
2: Use the first four layers of VGG16 to extract initial feature maps from the input image.
3: Employ subsequent layers of VGG16 to extract deeper, more complex features.
4: Apply MCAM modules to the extracted features for channel and spatial attention enhancement.
5: Utilize LRCM modules to integrate and fuse features from various hierarchical levels.
6: Perform progressive upsampling and feature concatenation:
   a. For each upsampling stage, concatenate the upsampled feature with the corresponding LRCM-enhanced feature.
   b. Apply convolution operations followed by ReLU activations.
7: Generate the final segmentation map using a convolutional layer.
8: Output the segmentation map.
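To make the layout of Algorithm 1 concrete, the following is a minimal, self-contained PyTorch sketch of the same encoder-attention-fusion-decoder pipeline. It is not the authors' code: the VGG16 split points, channel counts, and decoder widths are assumptions made for illustration, and `MCAMPlaceholder`/`LRCMPlaceholder` only stand in for the modules detailed in Sections 3.2 and 3.3.

```python
# Minimal sketch of Algorithm 1 (assumed layer sizes, not the authors' implementation).
# Requires torchvision >= 0.13 for the `weights` argument of vgg16().
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16


class MCAMPlaceholder(nn.Module):
    """Stand-in for the multiscale channel and spatial attention module (Section 3.2)."""
    def __init__(self, channels):
        super().__init__()
        self.refine = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        return self.refine(x)


class LRCMPlaceholder(nn.Module):
    """Stand-in for the layered residual connectivity module (Section 3.3)."""
    def __init__(self, channels):
        super().__init__()
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        return x + self.refine(x)  # residual refinement of a single level


class SegmentationSketch(nn.Module):
    def __init__(self, num_classes=1, pretrained=False):
        super().__init__()
        feats = vgg16(weights="DEFAULT" if pretrained else None).features  # step 1
        # Steps 2-3: split the VGG16 feature extractor into progressively deeper stages.
        self.stage1 = feats[:16]    # 256 channels, 1/4 resolution
        self.stage2 = feats[16:23]  # 512 channels, 1/8 resolution
        self.stage3 = feats[23:30]  # 512 channels, 1/16 resolution
        # Step 4: attention enhancement of each stage output.
        self.att = nn.ModuleList([MCAMPlaceholder(c) for c in (256, 512, 512)])
        # Step 5: hierarchical feature integration.
        self.fuse = nn.ModuleList([LRCMPlaceholder(c) for c in (256, 512, 512)])
        # Step 6b: convolution + ReLU after each upsample-and-concatenate step.
        self.dec2 = nn.Sequential(nn.Conv2d(512 + 512, 512, 3, padding=1), nn.ReLU(inplace=True))
        self.dec1 = nn.Sequential(nn.Conv2d(512 + 256, 256, 3, padding=1), nn.ReLU(inplace=True))
        # Step 7: final convolution producing the segmentation map.
        self.head = nn.Conv2d(256, num_classes, kernel_size=1)

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        e1 = self.fuse[0](self.att[0](f1))
        e2 = self.fuse[1](self.att[1](f2))
        e3 = self.fuse[2](self.att[2](f3))
        # Step 6a: upsample the deeper features and concatenate with the shallower ones.
        d2 = self.dec2(torch.cat([F.interpolate(e3, size=e2.shape[-2:], mode="bilinear",
                                                align_corners=False), e2], dim=1))
        d1 = self.dec1(torch.cat([F.interpolate(d2, size=e1.shape[-2:], mode="bilinear",
                                                align_corners=False), e1], dim=1))
        out = self.head(d1)  # step 7
        # Step 8: return the prediction at the input resolution.
        return F.interpolate(out, size=x.shape[-2:], mode="bilinear", align_corners=False)


if __name__ == "__main__":
    model = SegmentationSketch()
    print(model(torch.randn(1, 3, 512, 512)).shape)  # torch.Size([1, 1, 512, 512])
```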
3.2. Multiscale Channel and Spatial Attention Module
Algorithm 2 Multiscale Channel and Spatial Attention Module
Input: Feature maps X ∈ ℝ^(C×H×W)
Output: The refined feature maps Y
1: for each feature map in the batch do
2:   Channel Attention Module:
3:     Perform average pooling and max pooling along the spatial dimensions (H and W).
4:     Apply a shared multi-layer perceptron (MLP) to both pooled features.
5:     Sum the outputs of the MLP.
6:     Apply the sigmoid function to obtain the channel attention map.
7:     Multiply the input feature maps by the channel attention map to obtain the channel-refined feature maps.
8:   Spatial Attention Module:
9:     Perform average pooling and max pooling along the channel axis.
10:    Concatenate the pooled features along the channel axis.
11:    Convolve the concatenated features and apply the sigmoid function to obtain the spatial attention map.
12:    Multiply the channel-refined feature maps by the spatial attention map.
13: end for
14: Output the refined feature maps Y after sequentially applying the channel and spatial attention mechanisms.
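Algorithm 2 follows a channel-then-spatial attention pipeline. The PyTorch sketch below is one way such a module can be written; the reduction ratio of 16 and the 7 × 7 spatial kernel are assumed defaults rather than values reported in the paper, and any multiscale pooling specific to MCAM is not reproduced here.

```python
# Minimal sketch of the channel + spatial attention steps in Algorithm 2 (assumed defaults).
import torch
import torch.nn as nn


class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Shared MLP applied to both the average- and max-pooled descriptors (step 4).
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )
        # Convolution over the concatenated channel-wise pooled maps (step 11).
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=spatial_kernel,
                                      padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        # Channel attention: pool over H and W (step 3), MLP + sum + sigmoid (steps 4-6).
        avg_pool = torch.mean(x, dim=(2, 3), keepdim=True)
        max_pool = torch.amax(x, dim=(2, 3), keepdim=True)
        channel_att = torch.sigmoid(self.mlp(avg_pool) + self.mlp(max_pool))
        x = x * channel_att                                   # step 7: channel-refined features
        # Spatial attention: pool over channels (step 9), concatenate (step 10),
        # convolve + sigmoid (step 11).
        avg_map = torch.mean(x, dim=1, keepdim=True)
        max_map = torch.amax(x, dim=1, keepdim=True)
        spatial_att = torch.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        return x * spatial_att                                # step 12: refined output Y


# Example: refine a batch of 512-channel feature maps.
y = ChannelSpatialAttention(512)(torch.randn(2, 512, 64, 64))
print(y.shape)  # torch.Size([2, 512, 64, 64])
```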
3.3. Layered Residual Connectivity Module
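The extracted text does not include the detailed formulation of the LRCM. Based only on the description in the Introduction (merging features from multiple hierarchical levels and strengthening them through residual connections), one hypothetical form of such a fusion block is sketched below; the projection convolutions, the additive fusion, and the residual refinement are all assumptions, not the authors' design.

```python
# Hypothetical layered residual fusion block: merges a detailed low-level feature map with a
# coarser high-level one and refines the result through a residual connection.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LayeredResidualFusion(nn.Module):
    def __init__(self, low_channels, high_channels, out_channels):
        super().__init__()
        self.project_low = nn.Conv2d(low_channels, out_channels, kernel_size=1)
        self.project_high = nn.Conv2d(high_channels, out_channels, kernel_size=1)
        self.refine = nn.Sequential(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, low, high):
        # Bring the coarser, high-level map to the resolution of the detailed low-level map.
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear", align_corners=False)
        fused = self.project_low(low) + self.project_high(high)
        return fused + self.refine(fused)  # residual connection around the refinement


low = torch.randn(1, 256, 128, 128)   # detailed, high-resolution features
high = torch.randn(1, 512, 64, 64)    # abstract, low-resolution features
print(LayeredResidualFusion(256, 512, 256)(low, high).shape)  # torch.Size([1, 256, 128, 128])
```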
4. Experiments
4.1. Data and Hardware Environment
- WHU dataset [37]. This dataset comprises two subsets of remote sensing images (aerial and satellite). The aerial subset was selected to validate the proposed method. Its images, which cover Christchurch, New Zealand, were obtained from the New Zealand Land Information Service website and contain 187,000 buildings across 8188 images. Of these, 4736 images were used for training, 1036 for validation, and 2416 for testing. Each image is 512 × 512 pixels, has a spatial resolution of 0.3 m, and includes three bands (red, green, and blue).
- Massachusetts building dataset [38]. This dataset comprises 151 remote sensing images of the city of Boston and its suburbs, each 1500 × 1500 pixels. It was divided into 136 images for training, 11 for testing, and the remaining 4 for validation. Because the original 1500 × 1500 tiles exceeded the available memory, they were cropped to 512 × 512 patches in these experiments (one possible cropping script is sketched after this list).
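Since the cropping step is only mentioned in passing, the following is one possible way to cut the 1500 × 1500 Massachusetts tiles into 512 × 512 patches. It is not the authors' pipeline: the directory names are placeholders, and whether overlapping crops or padding were used for the 476-pixel remainder of each tile is not stated.

```python
# One possible preprocessing script: cut 1500x1500 tiles into non-overlapping 512x512 patches.
# Directory names are placeholders; border pixels that do not fill a full patch are dropped.
from pathlib import Path
from PIL import Image

PATCH = 512
src_dir, dst_dir = Path("massachusetts/train"), Path("massachusetts/train_512")
dst_dir.mkdir(parents=True, exist_ok=True)

for tile_path in sorted(src_dir.glob("*.png")):
    tile = Image.open(tile_path)
    w, h = tile.size
    for top in range(0, h - PATCH + 1, PATCH):
        for left in range(0, w - PATCH + 1, PATCH):
            patch = tile.crop((left, top, left + PATCH, top + PATCH))
            patch.save(dst_dir / f"{tile_path.stem}_{top}_{left}.png")
```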
4.2. Evaluation Metrics
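The experiments report intersection over union (IoU), recall, precision, and F1. These are presumably the standard pixel-wise definitions in terms of true positives (TP), false positives (FP), and false negatives (FN):

```latex
\mathrm{IoU} = \frac{TP}{TP + FP + FN}, \quad
\mathrm{Precision} = \frac{TP}{TP + FP}, \quad
\mathrm{Recall} = \frac{TP}{TP + FN}, \quad
\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```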
4.3. Experiment Analysis
- Quantitative results
- Qualitative results
4.4. Ablation Study
- Quantitative results
- Qualitative results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Kotaridis, I.; Lazaridou, M. Remote sensing image segmentation advances: A meta-analysis. ISPRS J. Photogramm. Remote Sens. 2021, 173, 309–322.
2. Ok, A.O. Automated detection of buildings from single VHR multispectral images using shadow information and graph cuts. ISPRS J. Photogramm. Remote Sens. 2013, 86, 21–40.
3. Zhao, W.; Persello, C.; Stein, A. Building instance segmentation and boundary regularization from high-resolution remote sensing images. In Proceedings of the IGARSS 2020—2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020; pp. 3916–3919.
4. Chen, J.; Yuan, Z.; Peng, J.; Chen, L.; Huang, H.; Zhu, J.; Liu, Y.; Li, H. DASNet: Dual attentive fully convolutional siamese networks for change detection in high-resolution satellite images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 1194–1206.
5. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 2017, 60, 84–90.
6. Abdollahi, A.; Pradhan, B. Integrating semantic edges and segmentation information for building extraction from aerial images using UNet. Mach. Learn. Appl. 2021, 6, 100194.
7. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
8. Duarte, D.; Nex, F.; Kerle, N.; Vosselman, G. Multi-Resolution Feature Fusion for Image Classification of Building Damages with Convolutional Neural Networks. Remote Sens. 2018, 10, 1636.
9. Dong, S.; Chen, Z. A Multi-Level Feature Fusion Network for Remote Sensing Image Segmentation. Sensors 2021, 21, 1267.
10. Zheng, Z.; Du, S.; Taubenböck, H.; Zhang, X. Remote sensing techniques in the investigation of aeolian sand dunes: A review of recent advances. Remote Sens. Environ. 2022, 271, 112913.
11. Yuan, X.; Shi, J.; Gu, L. A Review of Deep Learning Methods for Semantic Segmentation of Remote Sensing Imagery. Expert Syst. Appl. 2021, 169, 114417.
12. Cheng, Y.; Wang, W.; Zhang, W.; Yang, L.; Wang, J.; Ni, H.; Guan, T.; He, J.; Gu, Y.; Tran, N.N. A Multi-Feature Fusion and Attention Network for Multi-Scale Object Detection in Remote Sensing Images. Remote Sens. 2023, 15, 2096.
13. Inglada, J. Automatic Recognition of Man-Made Objects in High Resolution Optical Remote Sensing Images by SVM Classification of Geometric Image Features. ISPRS J. Photogramm. Remote Sens. 2007, 62, 236–248.
14. Cetin, M.; Halici, U.; Aytekin, O. Building detection in satellite images by textural features and Adaboost. In Proceedings of the 2010 IAPR Workshop on Pattern Recognition in Remote Sensing (PRRS 2010), Istanbul, Turkey, 22 August 2010; pp. 1–4.
15. Peng, J.; Liu, Y.C. Model and Context-Driven Building Extraction in Dense Urban Aerial Images. Int. J. Remote Sens. 2007, 26, 1289–1307.
16. Wei, Y.; Zhao, Z.; Song, J. Urban Building Extraction from High-Resolution Satellite Panchromatic Image Using Clustering and Edge Detection. In Proceedings of the IGARSS 2004—2004 IEEE International Geoscience and Remote Sensing Symposium, Anchorage, AK, USA, 20–24 September 2004; Volume 3, pp. 2008–2010.
17. Li, E.; Femiani, J.; Xu, S.; Zhang, X.; Wonka, P. Robust rooftop extraction from visible band images using higher order CRF. IEEE Trans. Geosci. Remote Sens. 2015, 53, 4483–4495.
18. Du, S.; Zhang, F.; Zhang, X. Semantic classification of urban buildings combining VHR image and GIS data: An improved random forest approach. ISPRS J. Photogramm. Remote Sens. 2015, 105, 107–119.
19. Gavankar, N.L.; Ghosh, S.K. Automatic building footprint extraction from high-resolution satellite image using mathematical morphology. Eur. J. Remote Sens. 2018, 51, 182–193.
20. Xu, L.; Kong, M.; Pan, B. Building Extraction by Stroke Width Transform from Satellite Imagery. In Proceedings of the Second CCF Chinese Conference on Computer Vision (CCCV 2017), Tianjin, China, 11–14 October 2017; Springer: Singapore, 2017; pp. 340–351.
21. Cheng, B.; Cui, S.; Ma, X.; Liang, C. Research on an Urban Building Area Extraction Method with High-Resolution PolSAR Imaging Based on Adaptive Neighborhood Selection Neighborhoods for Preserving Embedding. ISPRS Int. J. Geo-Inf. 2020, 9, 109.
22. Dai, Y.; Gong, J.; Li, Y.; Feng, Q. Building Segmentation and Outline Extraction from UAV Image-Derived Point Clouds by a Line Growing Algorithm. Int. J. Digit. Earth 2017, 10, 1077–1097.
23. Adegun, A.A.; Viriri, S.; Tapamo, J.-R. Review of Deep Learning Methods for Remote Sensing Satellite Images Classification: Experimental Survey and Comparative Analysis. J. Big Data 2023, 10, 9.
24. Liu, S.; Shi, Q.; Zhang, L. Few-Shot Hyperspectral Image Classification with Unknown Classes Using Multitask Deep Learning. IEEE Trans. Geosci. Remote Sens. 2020, 59, 5085–5102.
25. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015.
26. Tong, Z.; Li, Y.; Li, Y.; Fan, K.; Si, Y.; He, L. New Network Based on UNet++ and DenseNet for Building Extraction from High Resolution Satellite Imagery. In Proceedings of the 2020 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Waikoloa, HI, USA, 26 September–2 October 2020; pp. 2268–2271.
27. Dey, M.S.; Chaudhuri, U.; Banerjee, B.; Bhattacharya, A. Dual-Path Morph-UNet for Road and Building Segmentation From Satellite Images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5.
28. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
29. Guo, H.; Du, B.; Zhang, L.; Su, X. A Coarse-to-Fine Boundary Refinement Network for Building Footprint Extraction from Remote Sensing Imagery. ISPRS J. Photogramm. Remote Sens. 2022, 183, 240–252.
30. Chen, F.; Wang, N.; Yu, B.; Wang, L. Res2-Unet, a New Deep Architecture for Building Detection From High Spatial Resolution Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 1494–1501.
31. Zhang, C.; Jiang, W.S.; Zhang, Y.; Wang, W.; Zhao, Q.; Wang, C.J. Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–20.
32. Zhou, Y.; Chen, Z.; Wang, B.; Li, S.; Liu, H.; Xu, D.; Ma, C. BOMSC-Net: Boundary Optimization and Multi-Scale Context Awareness Based Building Extraction From High-Resolution Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–17.
33. Liu, T.; Yao, L.; Qin, J.; Lu, N.; Jiang, H.; Zhang, F.; Zhou, C. Multi-Scale Attention Integrated Hierarchical Networks for High-Resolution Building Footprint Extraction. Int. J. Appl. Earth Obs. Geoinf. 2022, 109, 102768.
34. Wang, Y.; Zeng, X.; Liao, X.; Zhuang, D. B-FGC-Net: A Building Extraction Network from High Resolution Remote Sensing Imagery. Remote Sens. 2022, 14, 269.
35. Ku, T.; Yang, Q.; Zhang, H. Multilevel Feature Fusion Dilated Convolutional Network for Semantic Segmentation. Int. J. Adv. Robot. Syst. 2021, 18, 20.
36. Zhang, R.; Zhang, Q.; Zhang, G. SDSC-UNet: Dual Skip Connection ViT-Based U-Shaped Model for Building Extraction. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5.
37. Ji, S.; Wei, S.; Lu, M. Fully Convolutional Networks for Multisource Building Extraction From an Open Aerial and Satellite Imagery Data Set. IEEE Trans. Geosci. Remote Sens. 2018, 57, 574–586.
38. Huang, Z.; Cheng, G.; Wang, H.; Li, H.; Shi, L.; Pan, C. Building Extraction from Multi-Source Remote Sensing Images via Deep Deconvolution Neural Networks. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 1835–1838.
39. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
40. Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
41. Sun, K.; Zhao, Y.; Jiang, B.; Cheng, T.; Xiao, B.; Liu, D.; Wang, J. High-Resolution Representations for Labeling Pixels and Regions. arXiv 2019, arXiv:1904.04514.
42. Huang, H.; Chen, Y.; Wang, R. A Lightweight Network for Building Extraction from Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–12.
43. Chen, Y.; Jiang, W.; Wang, M.; Kang, M.; Weise, T.; Wang, X.; Zhang, C. LightFGCNet: A Lightweight and Focusing on Global Context Information Semantic Segmentation Network for Remote Sensing Imagery. Remote Sens. 2022, 14, 6193.
44. Guo, H.; Su, X.; Tang, S.; Du, B.; Zhang, L. Scale-Robust Deep-Supervision Network for Mapping Building Footprints From High-Resolution Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 10091–10100.
45. Zhang, Z.; Liu, Q.; Wang, Y. Road Extraction by Deep Residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753.
Table 1. Quantitative comparison with other methods on the WHU aerial building dataset (values in %).

Method | IoU | Recall | Precision | F1 |
---|---|---|---|---|
U-net [25] | 88.26 | 93.66 | 94.34 | 93.77 |
Segnet [39] | 85.20 | 91.55 | 92.48 | 92.00 |
DeepLabv3+ [40] | 87.07 | 92.65 | 93.59 | 93.12 |
HRNet [41] | 86.51 | 92.67 | 93.66 | 93.16 |
RSR-Net [42] | 88.69 | 92.31 | 95.61 | 93.93 |
LightFGCNet-C [43] | 89.87 | 94.57 | - | 94.86 |
DS-Net [44] | 90.4 | 95.06 | 94.85 | 94.96 |
Ours | 90.66 | 95.2 | 95.0 | 95.1 |
Table 2. Quantitative comparison with other methods on the Massachusetts building dataset (values in %).

Method | IoU | Recall | Precision | F1 |
---|---|---|---|---|
Segnet [39] | 58.07 | 82.75 | 66.06 | 73.47 |
Res-Unet [45] | 66.21 | 82.58 | 76.97 | 79.67 |
DeepLabv3+ [40] | 69.23 | 79.1 | 84.73 | 81.82 |
HRNet [41] | 67.89 | 82.28 | 79.51 | 80.87 |
Ours | 69.53 | 82.96 | 81.12 | 82.03 |
Table 3. Ablation study of the MCAM and LRCM modules (WHU dataset, values in %).

Method | IoU | Recall | Precision | F1 |
---|---|---|---|---|
Baseline | 87.02 | 92.87 | 93.25 | 93.06 |
Baseline + MCAM | 89.15 | 94.62 | 93.92 | 94.27 |
Baseline + LRCM | 88.97 | 94.75 | 93.58 | 94.16 |
Baseline + MCAM + LRCM | 90.66 | 95.2 | 95.0 | 95.1 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).