A Geometric Significance-Aware Deep Mutual Learning Network for Building Extraction from Aerial Images
Abstract
1. Introduction
- This study incorporates the geometric significance (GS) of buildings as prior knowledge within the deep learning network, directing the network's attention toward geometrically salient building regions. This integration markedly improves the specificity with which the network learns building features (an illustrative sketch follows this list).
- A novel bi-directional guidance attention module (BGAM) is proposed to capture the interdependency between same-scale feature maps of the RGB and GS semantic branches. The deep mutual learning enabled by the BGAM strengthens feature extraction within the target area (sketched below).
- An enhanced flow alignment module (FAM++) is introduced to mitigate voids in building feature maps and to recover highly semantic information at high resolution. By adding a gating mechanism to the original FAM design, FAM++ learns the spatial and semantic relationships across adjacent feature layers (sketched below).
- A multi-objective loss function is devised to fine-tune the network and improve the precision of the extraction results. It supervises both the final prediction map and the feature maps derived from the encoder, so that the two objectives jointly enhance the network's performance (sketched below).
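To make the GS prior concrete, the sketch below derives a geometric-significance map from corner responses. This is only a loose illustration, not the authors' exact procedure in Section 3.1 (which follows junction-based saliency in the spirit of GeoSay); the function name `geometric_significance_map` and all parameter values are assumptions.

```python
# Illustrative sketch only: a corner-response stand-in for the paper's GS map.
import cv2
import numpy as np

def geometric_significance_map(rgb: np.ndarray) -> np.ndarray:
    """Return a [0, 1] single-channel map highlighting corner-rich regions."""
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY).astype(np.float32)
    # Harris corner response as a proxy for junction-based building saliency.
    response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
    response = np.maximum(response, 0.0)
    # Spread point responses outward so the map covers building outlines.
    response = cv2.GaussianBlur(response, (0, 0), sigmaX=5.0)
    return response / (response.max() + 1e-8)
```

The resulting single-channel map can then be fed to the GS branch in parallel with the RGB input.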
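A minimal PyTorch sketch of how a bi-directional guidance attention module could couple the two branches, assuming each branch emits a sigmoid spatial-attention map that gates the other's same-scale features; the class name `BGAM` and its internal layout are assumptions rather than the authors' exact design.

```python
import torch
import torch.nn as nn

class BGAM(nn.Module):
    """Hypothetical bi-directional guidance attention: each branch produces a
    spatial attention map that re-weights the other branch's features."""
    def __init__(self, channels: int):
        super().__init__()
        self.att_rgb = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
        self.att_gs = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, f_rgb: torch.Tensor, f_gs: torch.Tensor):
        # GS features guide the RGB branch toward geometrically salient pixels,
        # and vice versa; residual connections preserve the original features.
        f_rgb_out = f_rgb + f_rgb * self.att_gs(f_gs)
        f_gs_out = f_gs + f_gs * self.att_rgb(f_rgb)
        return f_rgb_out, f_gs_out
```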
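The next sketch illustrates gated flow alignment in the spirit of FAM++: a learned 2D flow field warps the upsampled coarse features onto the fine grid, and a gate blends warped semantics with fine details. It assumes the SFNet-style FAM as the base plus a sigmoid gating branch; the module name and internals are hypothetical, not the published design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FAMPlusPlus(nn.Module):
    """Hypothetical gated flow alignment across adjacent feature layers."""
    def __init__(self, channels: int):
        super().__init__()
        self.flow = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, 1, 1), nn.Sigmoid())

    def forward(self, fine: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        n, _, h, w = fine.shape
        coarse_up = F.interpolate(coarse, size=(h, w), mode='bilinear',
                                  align_corners=False)
        pair = torch.cat([fine, coarse_up], dim=1)
        flow = self.flow(pair)  # per-pixel 2D offsets in pixel units
        # Build a normalized sampling grid shifted by the predicted flow.
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h, device=fine.device),
                                torch.linspace(-1, 1, w, device=fine.device),
                                indexing='ij')
        grid = torch.stack((xs, ys), dim=-1).expand(n, h, w, 2)
        offset = flow.permute(0, 2, 3, 1) / torch.tensor(
            [max(w - 1, 1) / 2, max(h - 1, 1) / 2], device=fine.device)
        warped = F.grid_sample(coarse_up, grid + offset, mode='bilinear',
                               align_corners=False)
        g = self.gate(pair)  # gate decides where warped semantics dominate
        return g * warped + (1 - g) * fine
```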
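Finally, a hedged sketch of a multi-objective loss of the form L = L_main + λ·L_aux, supervising the final prediction and encoder-side auxiliary predictions. The choice of BCE plus Dice for the main term is an assumption (the paper cites Dice and focal losses), and λ = 1.0 mirrors the best setting in the Section 4.4 sensitivity analysis.

```python
import torch
import torch.nn.functional as F

def dice_loss(logits: torch.Tensor, target: torch.Tensor,
              eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss for binary building masks (V-Net style)."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2, 3))
    union = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (1 - (2 * inter + eps) / (union + eps)).mean()

def multi_objective_loss(final_logits, aux_logits, target, lam: float = 1.0):
    """Main term on the final map plus lambda-weighted auxiliary terms on
    intermediate predictions; term composition is assumed, not the paper's."""
    main = F.binary_cross_entropy_with_logits(final_logits, target) \
           + dice_loss(final_logits, target)
    # Upsample each auxiliary prediction to the label size before scoring.
    aux = sum(F.binary_cross_entropy_with_logits(
                  F.interpolate(a, size=target.shape[-2:], mode='bilinear',
                                align_corners=False), target)
              for a in aux_logits) / max(len(aux_logits), 1)
    return main + lam * aux
```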
2. Related Works
2.1. Knowledge-Driven Building Extraction
2.2. Data-Driven Building Extraction
3. Methodology
3.1. Acquiring Geometric Significance of Buildings
3.2. Dual-Branch Network
3.3. Bi-Directional Guidance Attention Module
3.4. Improved Flow Alignment Module
3.5. Simple Multi-Layer Perceptron Module
3.6. Multi-Objective Loss Function
4. Experiments
4.1. Datasets
4.2. Implementation Details and Evaluation Metrics
4.3. Comparison and Analysis
4.4. Sensitivity Analysis of Key Parameters
4.5. Comparison with Microsoft's Building Extraction Method
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Jain, A.; Gue, I.H.; Jain, P. Research trends, themes, and insights on artificial neural networks for smart cities towards SDG-11. J. Clean. Prod. 2023, 412, 137300.
- Allam, Z.; Jones, D.S. Future (post-COVID) digital, smart and sustainable cities in the wake of 6G: Digital twins, immersive realities and new urban economies. Land Use Policy 2021, 101, 105201.
- Zhang, S.; Wang, C.; Li, J.; Sui, Y. MF-DFNet: A deep learning method for pixel-wise classification of very high-resolution remote sensing images. Int. J. Remote Sens. 2022, 43, 330–348.
- Wei, S.; Zhang, T.; Ji, S.; Luo, M.; Gong, J. BuildMapper: A fully learnable framework for vectorized building contour extraction. ISPRS J. Photogramm. Remote Sens. 2023, 197, 87–104.
- Zhu, X.; Zhang, X.; Zhang, T.; Tang, X.; Chen, P.; Zhou, H.; Jiao, L. Semantics and contour based interactive learning network for building footprint extraction. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5623513.
- Hu, A.; Wu, L.; Chen, S.; Xu, Y.; Wang, H.; Xie, Z. Boundary shape-preserving model for building mapping from high-resolution remote sensing images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5610217.
- Cui, S.; Yan, Q.; Reinartz, P. Complex building description and extraction based on Hough transformation and cycle detection. Remote Sens. Lett. 2012, 3, 151–159.
- El Merabet, Y.; Meurie, C.; Ruichek, Y.; Sbihi, A.; Touahni, R. Building roof segmentation from aerial images using a line- and region-based watershed segmentation technique. Sensors 2015, 15, 3172–3203.
- Ok, A.O. Automated detection of buildings from single VHR multispectral images using shadow information and graph cuts. ISPRS J. Photogramm. Remote Sens. 2013, 86, 21–40.
- Izadi, M.; Saeedi, P. Three-dimensional polygonal building model estimation from single satellite images. IEEE Trans. Geosci. Remote Sens. 2012, 50, 2254–2272.
- Liasis, G.; Stavrou, S. Satellite images analysis for shadow detection and building height estimation. ISPRS J. Photogramm. Remote Sens. 2016, 119, 437–450.
- Gao, X.; Wang, M.; Yang, Y.; Li, G. Building extraction from RGB VHR images using shifted shadow algorithm. IEEE Access 2018, 6, 22034–22045.
- Zhou, G.; Sha, H. Building shadow detection on ghost images. Remote Sens. 2020, 12, 679.
- Lei, H.; Jin, Z.; Feng, G. A building extraction method using shadow in high resolution multispectral images. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Vancouver, BC, Canada, 24–29 July 2011; pp. 1862–1865.
- Huang, X.; Zhang, L. A multidirectional and multiscale morphological index for automatic building extraction from multispectral GeoEye-1 imagery. Photogramm. Eng. Remote Sens. 2011, 77, 721–732.
- Ding, Z.; Wang, X.Q.; Li, Y.L.; Zhang, S.S. Study on building extraction from high-resolution images using MBI. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, XLII-3, 283–287.
- Zhang, C.; Hu, Y.; Cui, W. Semiautomatic right-angle building extraction from very high-resolution aerial images using graph cuts with star shape constraint and regularization. J. Appl. Remote Sens. 2018, 12, 26005.
- Ning, X.; Lin, X. An index based on joint density of corners and line segments for built-up area detection from high resolution satellite imagery. ISPRS Int. J. Geo-Inf. 2017, 6, 338.
- Karantzalos, K.; Paragios, N. Recognition-driven two-dimensional competing priors toward automatic and accurate building detection. IEEE Trans. Geosci. Remote Sens. 2009, 47, 133–144.
- Ahmadi, S.; Zoej, M.J.V.; Ebadi, H.; Moghaddam, H.A.; Mohammadzadeh, A. Automatic urban building boundary extraction from high resolution aerial images using an innovative model of active contours. Int. J. Appl. Earth Obs. Geoinf. 2010, 12, 150–157.
- Ywata, M.S.Y.; Dal Poz, A.P.; Shimabukuro, M.H.; de Oliveira, H.C. Snake-based model for automatic roof boundary extraction in the object space integrating a high-resolution aerial images stereo pair and 3D roof models. Remote Sens. 2021, 13, 1429.
- Shelhamer, E.; Long, J.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239.
- Chen, L.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015; Volume 9351, pp. 234–241.
- Liu, Y.; Sun, Y.; Tao, S.; Wang, M.; Shen, Q.; Huang, J. Discovering potential illegal construction within building roofs from UAV images using semantic segmentation and object-based change detection. Photogramm. Eng. Remote Sens. 2021, 87, 263–271.
- Zheng, Z.; Wan, Y.; Zhang, Y.; Xiang, S.; Peng, D.; Zhang, B. CLNet: Cross-layer convolutional neural network for change detection in optical remote sensing imagery. ISPRS J. Photogramm. Remote Sens. 2021, 175, 247–267.
- Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
- Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: A nested U-Net architecture for medical image segmentation. In International Workshop on Deep Learning in Medical Image Analysis (DLMIA); Springer: Cham, Switzerland, 2018; Volume 11045, pp. 3–11.
- Tong, Z.; Li, Y.; Li, Y.; Fan, K.; Si, Y.; He, L. New network based on UNet++ and DenseNet for building extraction from high resolution satellite imagery. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), New York, NY, USA, 26 September–2 October 2020; pp. 2268–2271.
- Zhao, H.; Zhang, H.; Zheng, X. A multiscale attention-guided UNet++ with edge constraint for building extraction from high spatial resolution imagery. Appl. Sci. 2022, 12, 5960.
- Guo, H.; Su, X.; Wu, C.; Du, B.; Zhang, L. Decoupling semantic and edge representations for building footprint extraction from remote sensing images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5613116.
- Yang, D.; Wang, B.; Li, W.; He, C. Exploring the user guidance for more accurate building segmentation from high-resolution remote sensing images. Int. J. Appl. Earth Obs. Geoinf. 2024, 126, 103609.
- Sun, Z.; Zhang, Z.; Chen, M.; Qian, Z.; Cao, M.; Wen, Y. Improving the performance of automated rooftop extraction through geospatial stratified and optimized sampling. Remote Sens. 2022, 14, 4961.
- Xue, N.; Xia, G.; Bai, X.; Zhang, L.; Shen, W. Anisotropic-scale junction detection and matching for indoor images. IEEE Trans. Image Process. 2018, 27, 78–91.
- Xia, G.; Huang, J.; Xue, N.; Lu, Q.; Zhu, X. GeoSay: A geometric saliency for extracting buildings in remote sensing images. Comput. Vis. Image Underst. 2019, 186, 37–47.
- Li, X.; You, A.; Zhu, Z.; Zhao, H.; Yang, M.; Yang, K.; Tan, S.; Tong, Y. Semantic flow for fast and accurate scene parsing. In Computer Vision—ECCV 2020; Springer: Cham, Switzerland, 2020; pp. 775–793.
- Milletari, F.; Navab, N.; Ahmadi, S. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the IEEE International Conference on 3D Vision (3DV), New York, NY, USA, 25–28 October 2016; pp. 565–571.
- Lin, T.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
- Ji, S.; Wei, S.; Lu, M. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans. Geosci. Remote Sens. 2019, 57, 574–586.
- Mnih, V. Machine Learning for Aerial Image Labeling. Ph.D. Thesis, University of Toronto, Toronto, ON, Canada, 2013.
- Du, S.; Du, S.; Liu, B.; Zhang, X. Incorporating DeepLabv3+ and object-based image analysis for semantic segmentation of very high resolution remote sensing images. Int. J. Digit. Earth 2021, 14, 357–378.
- Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 3349–3364.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS), Online, 6–14 December 2021; Volume 34.
- Lin, H.; Hao, M.; Luo, W.; Yu, H.; Zheng, N. BEARNet: A novel buildings edge-aware refined network for building extraction from high-resolution remote sensing images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 6005305.
| Dataset | Collection Area | Cover Area (km²) | Spatial Resolution (m) | Sample Size (pixels) | Train Samples | Val Samples | Test Samples | Number of Buildings |
|---|---|---|---|---|---|---|---|---|
| WHU Building Dataset | Christchurch, New Zealand | 450 | 0.2 | 512 × 512 | 8460 | 3626 | 2782 | 28,873 |
| Massachusetts Building Dataset | Boston | 340 | 1 | 1500 × 1500 | 137 | 4 | 10 | – |
| Dataset | Method | Precision (%) | Recall (%) | F1 (%) | IoU (%) | OA (%) | Params (M) |
|---|---|---|---|---|---|---|---|
| WHU Building Dataset | PSPNet | 96.10 | 96.01 | 96.05 | 92.54 | 97.96 | 2.38 |
| | DeepLabv3+ | 91.56 | 95.83 | 93.65 | 88.15 | 96.48 | 5.81 |
| | HRNet | 95.69 | 93.96 | 94.82 | 90.34 | 97.36 | 9.64 |
| | SegFormer | 97.57 | 97.24 | 97.40 | 95.00 | 98.63 | 3.72 |
| | BEARNet | 97.70 | 97.15 | 97.42 | 95.03 | 98.67 | 19.42 |
| | GSDMLNet | 97.89 | 97.57 | 97.73 | 95.61 | 98.83 | 6.09 |
| Massachusetts Building Dataset | PSPNet | 75.00 | 73.20 | 73.09 | 63.03 | 89.95 | 2.38 |
| | DeepLabv3+ | 72.46 | 82.18 | 77.01 | 64.28 | 88.13 | 5.81 |
| | HRNet | 78.46 | 76.21 | 77.32 | 66.46 | 91.23 | 9.64 |
| | SegFormer | 84.54 | 83.81 | 83.68 | 74.63 | 93.71 | 3.72 |
| | BEARNet | 84.92 | 85.27 | 85.09 | 75.82 | 93.99 | 19.42 |
| | GSDMLNet | 85.89 | 86.58 | 86.23 | 77.34 | 94.42 | 6.09 |
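The five accuracy scores reported above are standard per-pixel metrics with buildings as the positive class; the minimal NumPy sketch below (function name hypothetical) shows how they are computed from binary prediction and ground-truth masks.

```python
import numpy as np

def building_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Per-pixel Precision, Recall, F1, IoU, and OA (building = 1)."""
    tp = np.logical_and(pred == 1, gt == 1).sum()
    fp = np.logical_and(pred == 1, gt == 0).sum()
    fn = np.logical_and(pred == 0, gt == 1).sum()
    tn = np.logical_and(pred == 0, gt == 0).sum()
    precision = tp / (tp + fp + 1e-12)
    recall = tp / (tp + fn + 1e-12)
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    iou = tp / (tp + fp + fn + 1e-12)  # intersection over union
    oa = (tp + tn) / (tp + tn + fp + fn)  # overall accuracy
    return {"Precision": precision, "Recall": recall,
            "F1": f1, "IoU": iou, "OA": oa}
```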
| Dataset | λ | F1 (%) | IoU (%) | OA (%) |
|---|---|---|---|---|
| WHU Building Dataset | 0 | 97.69 | 95.55 | 98.81 |
| | 0.5 | 97.71 | 95.56 | 98.82 |
| | 1.0 | 97.73 | 95.61 | 98.83 |
| | 1.5 | 97.72 | 95.60 | 98.83 |
| | 2.0 | 97.72 | 95.57 | 98.82 |
| Massachusetts Building Dataset | 0 | 85.86 | 76.84 | 94.31 |
| | 0.5 | 85.91 | 76.92 | 94.35 |
| | 1.0 | 86.23 | 77.34 | 94.42 |
| | 1.5 | 85.99 | 77.03 | 94.37 |
| | 2.0 | 86.00 | 77.04 | 94.37 |
Citation: Hao, M.; Lin, H.; Chen, S.; Luo, W.; Zhang, H.; Zheng, N. A Geometric Significance-Aware Deep Mutual Learning Network for Building Extraction from Aerial Images. Drones 2024, 8, 593. https://doi.org/10.3390/drones8100593