DMU-Net: A Dual-Stream Multi-Scale U-Net Network Using Multi-Dimensional Spatial Information for Urban Building Extraction
Abstract
1. Introduction
- (1) The dual-stream structure of DMU-Net effectively extracts features from multi-modal data, and building features at different scales are effectively integrated through the IFPN structure.
- (2) Fusing three-dimensional data with two-dimensional data significantly improves the accuracy of urban building extraction.
- (3) Compared with other semantic segmentation networks, DMU-Net achieves higher accuracy while preserving edge details.
2. Study Area and Dataset
3. Materials and Methods
3.1. Data Preprocessing
3.2. DMU-Net Architecture
3.2.1. Fusion Strategy
3.2.2. Improved Feature Pyramid Network
3.3. Post-Processing of Buildings
3.3.1. Digital Morphological Processing
3.3.2. Boundary Regularization
4. Experiments and Analysis
4.1. Evaluation Metric
4.2. Experimental Details
4.3. Results
4.3.1. Comparison with SOTA Methods
4.3.2. Results of Building Extraction in the Study Area
5. Discussion
5.1. Comparative Analysis
5.1.1. Validity of NIR and nDSM Data
5.1.2. Comparison of Different Network Structures
5.1.3. Different Fusion Methods
5.1.4. Advantages of Regularization
5.2. Limitations and Future Works
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Cheng, L.; Zhang, F.; Li, S.; Mao, J.; Xu, H.; Ju, W.; Liu, X.; Wu, J.; Min, K.; Zhang, X.; et al. Solar energy potential of urban buildings in 10 cities of China. Energy 2020, 196, 117038.
2. Xu, M.; Cao, C.; Jia, P. Mapping fine-scale urban spatial population distribution based on high-resolution stereo pair images, points of interest, and land cover data. Remote Sens. 2020, 12, 608.
3. Shen, Y.; Zhu, S.; Yang, T.; Chen, C.; Pan, D.; Chen, J.; Xiao, L.; Du, Q. BDANet: Multiscale convolutional neural network with cross-directional attention for building damage assessment from satellite images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–14.
4. White, G.; Zink, A.; Codecá, L.; Clarke, S. A digital twin smart city for citizen feedback. Cities 2021, 110, 103064.
5. Du, S.; Zhang, Y.; Zou, Z.; Xu, S.; He, X.; Chen, S. Automatic building extraction from LiDAR data fusion of point and grid-based features. ISPRS J. Photogramm. Remote Sens. 2017, 130, 294–307.
6. Shahzad, M.; Maurer, M.; Fraundorfer, F.; Wang, Y.; Zhu, X.X. Buildings detection in VHR SAR images using fully convolution neural networks. IEEE Trans. Geosci. Remote Sens. 2018, 57, 1100–1116.
7. Feng, D.; Chen, H.; Xie, Y.; Liu, Z.; Liao, Z.; Zhu, J.; Zhang, H. GCCINet: Global feature capture and cross-layer information interaction network for building extraction from remote sensing imagery. Int. J. Appl. Earth Obs. Geoinf. 2022, 114, 103046.
8. Huang, J.; Zhang, X.; Xin, Q.; Sun, Y.; Zhang, P. Automatic building extraction from high-resolution aerial images and LiDAR data using gated residual refinement network. ISPRS J. Photogramm. Remote Sens. 2019, 151, 91–105.
9. Kang, J.; Wang, Z.; Zhu, R.; Xia, J.; Sun, X.; Fernandez-Beltran, R.; Plaza, A. DisOptNet: Distilling semantic knowledge from optical images for weather-independent building segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15.
10. Zhang, P.; Du, P.; Lin, C.; Wang, X.; Li, E.; Xue, Z.; Bai, X. A hybrid attention-aware fusion network (HAFNet) for building extraction from high-resolution imagery and LiDAR data. Remote Sens. 2020, 12, 3764.
11. Amjadipour, F.; Ghassemian, H.; Imani, M. Building detection using very high resolution SAR images with multi-direction based on weighted-morphological indexes. In Proceedings of the 2022 International Conference on Machine Vision and Image Processing (MVIP), Ahvaz, Iran, 23–24 February 2022; pp. 1–6.
12. Sun, Y.; Hua, Y.; Mou, L.; Zhu, X.X. CG-Net: Conditional GIS-aware network for individual building segmentation in VHR SAR images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–15.
13. Li, J.; Huang, X.; Tu, L.; Zhang, T.; Wang, L. A review of building detection from very high resolution optical remote sensing images. GIScience Remote Sens. 2022, 59, 1199–1225.
14. Ji, X.; Yang, B.; Tang, Q.; Xu, W.; Li, J. Feature fusion-based registration of satellite images to airborne LiDAR bathymetry in island area. Int. J. Appl. Earth Obs. Geoinf. 2022, 109, 102778.
15. Zhu, X.; Tang, X.; Zhang, G.; Liu, B.; Hu, W. Accuracy comparison and assessment of DSM derived from GFDM satellite and GF-7 satellite imagery. Remote Sens. 2021, 13, 4791.
16. Luo, H.; He, B.; Guo, R.; Wang, W.; Kuai, X.; Xia, B.; Wan, Y.; Ma, D.; Xie, L. Urban building extraction and modeling using GF-7 DLC and MUX images. Remote Sens. 2021, 13, 3414.
17. Wang, J.; Hu, X.; Meng, Q.; Zhang, L.; Wang, C.; Liu, X.; Zhao, M. Developing a method to extract building 3D information from GF-7 data. Remote Sens. 2021, 13, 4532.
18. Gharibbafghi, Z.; Tian, J.; Reinartz, P. Modified superpixel segmentation for digital surface model refinement and building extraction from satellite stereo imagery. Remote Sens. 2018, 10, 1824.
19. Kumar, M.; Bhardwaj, A. Building extraction from very high resolution stereo satellite images using OBIA and topographic information. Environ. Sci. Proc. 2020, 5, 1.
20. Jin, X.; Davis, C.H. Automated building extraction from high-resolution satellite imagery in urban areas using structural, contextual, and spectral information. EURASIP J. Adv. Signal Process. 2005, 2005, 74.
21. Huang, X.; Zhang, L. A multidirectional and multiscale morphological index for automatic building extraction from multispectral GeoEye-1 imagery. Photogramm. Eng. Remote Sens. 2011, 77, 721–732.
22. Singh, D.; Maurya, R.; Shukla, A.S.; Sharma, M.K.; Gupta, P. Building extraction from very high resolution multispectral images using NDVI based segmentation and morphological operators. In Proceedings of the 2012 Students Conference on Engineering and Systems, Allahabad, India, 16–18 March 2012; pp. 1–5.
23. Dahiya, S.; Garg, P.K.; Jat, M.K. Object oriented approach for building extraction from high resolution satellite images. In Proceedings of the 2013 3rd IEEE International Advance Computing Conference (IACC), Ghaziabad, India, 22–23 February 2013; pp. 1300–1305.
24. Grinias, I.; Panagiotakis, C.; Tziritas, G. MRF-based segmentation and unsupervised classification for building and road detection in peri-urban areas of high-resolution satellite images. ISPRS J. Photogramm. Remote Sens. 2016, 122, 145–166.
25. Sanlang, S.; Cao, S.; Du, M.; Mo, Y.; Chen, Q.; He, W. Integrating aerial LiDAR and very-high-resolution images for urban functional zone mapping. Remote Sens. 2021, 13, 2573.
26. Zhang, L.; Wu, J.; Fan, Y.; Gao, H.; Shao, Y. An efficient building extraction method from high spatial resolution remote sensing images based on improved mask R-CNN. Sensors 2020, 20, 1465.
27. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
28. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
29. Sun, G.; Huang, H.; Zhang, A.; Li, F.; Zhao, H.; Fu, H. Fusion of multiscale convolutional neural networks for building extraction in very high-resolution images. Remote Sens. 2019, 11, 227.
30. Yu, B.; Yang, L.; Chen, F. Semantic segmentation for high spatial resolution remote sensing images based on convolution neural network and pyramid pooling module. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 3252–3261.
31. Liu, W.; Yang, M.; Xie, M.; Guo, Z.; Li, E.; Zhang, L.; Pei, T.; Wang, D. Accurate building extraction from fused DSM and UAV images using a chain fully convolutional neural network. Remote Sens. 2019, 11, 2912.
32. Kang, W.; Xiang, Y.; Wang, F.; You, H. EU-Net: An efficient fully convolutional network for building extraction from optical remote sensing images. Remote Sens. 2019, 11, 2813.
33. Chen, M.; Wu, J.; Liu, L.; Zhao, W.; Tian, F.; Shen, Q.; Zhao, B.; Du, R. DR-Net: An improved network for building extraction from high resolution remote sensing image. Remote Sens. 2021, 13, 294.
34. Deng, W.; Shi, Q.; Li, J. Attention-gate-based encoder–decoder network for automatical building extraction. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 2611–2620.
35. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
36. Wei, S.; Ji, S.; Lu, M. Toward automatic building footprint delineation from aerial images using CNN and regularization. IEEE Trans. Geosci. Remote Sens. 2019, 58, 2178–2189.
37. Ran, S.; Gao, X.; Yang, Y.; Li, S.; Zhang, G.; Wang, P. Building multi-feature fusion refined network for building extraction from high-resolution remote sensing images. Remote Sens. 2021, 13, 2794.
38. Zhang, P.; Ke, Y.; Zhang, Z.; Wang, M.; Li, P.; Zhang, S. Urban land use and land cover classification using novel deep learning models based on high spatial resolution satellite imagery. Sensors 2018, 18, 3717.
39. Tamilarasi, R.; Prabu, S. Automated building and road classifications from hyperspectral imagery through a fully convolutional network and support vector machine. J. Supercomput. 2021, 77, 13243–13261.
40. Audebert, N.; Le Saux, B.; Lefèvre, S. Beyond RGB: Very high resolution urban remote sensing with multimodal deep networks. ISPRS J. Photogramm. Remote Sens. 2018, 140, 20–32.
41. Piramanayagam, S.; Saber, E.; Schwartzkopf, W.; Koehler, F.W. Supervised classification of multisensor remotely sensed images using a deep learning framework. Remote Sens. 2018, 10, 1429.
42. Marmanis, D.; Schindler, K.; Wegner, J.D.; Galliani, S.; Datcu, M.; Stilla, U. Classification with an edge: Improving semantic image segmentation with boundary detection. ISPRS J. Photogramm. Remote Sens. 2018, 135, 158–172.
43. Hirschmuller, H. Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 30, 328–341.
44. Zhang, W.; Huang, H.; Schmitz, M.; Sun, X.; Wang, H.; Mayer, H. Effective fusion of multi-modal remote sensing data in a fully convolutional network for semantic labeling. Remote Sens. 2017, 10, 52.
45. Ji, S.; Wei, S. Building extraction via convolutional neural networks from an open remote sensing building dataset. Acta Geod. Cartogr. Sin. 2019, 48, 448.
46. Wang, Z.; Zhou, Y.; Wang, S.; Wang, F.; Xu, Z. House building extraction from high resolution remote sensing image based on IEU-Net. J. Remote Sens. 2021, 12, 3845.
47. Jiang, J.; Liu, F.; Xu, Y.; Huang, H. Multi-spectral RGB-NIR image classification using double-channel CNN. IEEE Access 2019, 7, 20607–20613.
48. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456.
49. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
50. Said, K.A.M.; Jambek, A.B.; Sulaiman, N. A study of image processing using morphological opening and closing processes. Int. J. Control. Theory Appl. 2016, 9, 15–21.
51. Gribov, A. Searching for a compressed polyline with a minimum number of vertices (discrete solution). In Proceedings of the International Workshop on Graphics Recognition, Kyoto, Japan, 9–10 November 2017; pp. 54–68.
52. Gribov, A. Optimal compression of a polyline while aligning to preferred directions. In Proceedings of the 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), Sydney, Australia, 22–25 September 2019; Volume 1, pp. 98–102.
53. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
54. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
55. Wang, H.; Miao, F. Building extraction from remote sensing images using deep residual U-Net. Eur. J. Remote Sens. 2022, 55, 71–85.
56. Avbelj, J.; Müller, R.; Bamler, R. A metric for polygon comparison and building extraction evaluation. IEEE Geosci. Remote Sens. Lett. 2014, 12, 170–174.
57. Zhao, W.; Persello, C.; Stein, A. Building outline delineation: From aerial images to polygons with an improved end-to-end learning framework. ISPRS J. Photogramm. Remote Sens. 2021, 175, 119–131.
Region | Area (km²) | Buildings | Training Images | Validation Images | Test Images |
---|---|---|---|---|---|
(a) | 6 | 1518 | 120 | 30 | - |
(b) | 5 | 720 | 80 | 20 | - |
(c) | 18 | 2328 | 340 | 50 | - |
(d) | 5 | 629 | - | - | 49 |
(e) | 4 | 436 | - | - | 40 |
Sum | 38 | 5631 | 540 | 100 | 89 |
Model | OA (%) | IoU (%) | F1 (%) |
---|---|---|---|
DMU-Net | 96.16 | 84.49 | 91.59 |
PSPNet [53] | 92.25 | 71.06 | 83.08 |
DeepLab V3+ [54] | 95.29 | 81.21 | 89.63 |
EU-Net [32] | 95.64 | 82.41 | 90.36 |
RU-Net [55] | 94.89 | 79.71 | 88.71 |
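The tables report overall accuracy (OA), intersection over union (IoU), and F1-score. As an illustration of how these metrics are conventionally computed from binary building masks (a minimal sketch, not the authors' evaluation code), the three values follow directly from the per-pixel confusion counts:

```python
def confusion_counts(pred, truth):
    """Count TP/FP/FN/TN over two equal-length binary masks (flat lists)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return tp, fp, fn, tn

def metrics(pred, truth):
    """Return (OA, IoU, F1) for a binary building mask against ground truth."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    oa = (tp + tn) / (tp + fp + fn + tn)   # overall accuracy
    iou = tp / (tp + fp + fn)              # intersection over union
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return oa, iou, f1
```

These are the standard pixel-wise definitions; in practice the counts would be accumulated over all test tiles before computing the ratios.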
Model | First Stream | Second Stream | OA (%) | IoU (%) | F1 (%) |
---|---|---|---|---|---|
M1 | NIR + nDSM | RGB | 96.16 | 84.49 | 91.59 |
M2 | NIR | RGB | 95.21 | 81.30 | 89.69 |
M3 | nDSM | RGB | 95.91 | 83.79 | 91.18 |
M4 | - | RGB | 93.56 | 76.18 | 86.48 |
Model | Description |
---|---|
DMU-Net | The method proposed in this paper. |
DMU-Net (FPN) | Data are input through a dual-stream CNN; the original FPN structure is retained. |
SMU-Net | Data are input through a single-stream CNN; the IFPN structure is retained. |
DU-Net | The IFPN structure is removed; the dual-stream CNN structure is retained. |
IEU-Net | Data are input through a single-stream CNN; the IFPN structure is removed [46]. |
Model | OA (%) | IoU (%) | F1 (%) | Trainable Params (M) | FLOPs (M) |
---|---|---|---|---|---|
DMU-Net | 96.16 | 84.49 | 91.59 | 12.49 | 24.97 |
DMU-Net (FPN) | 96.08 | 84.27 | 91.47 | 12.48 | 24.95 |
SMU-Net | 96.00 | 83.75 | 91.15 | 7.78 | 15.54 |
DU-Net | 96.08 | 83.94 | 91.27 | 12.48 | 24.95 |
IEU-Net | 95.75 | 82.84 | 90.61 | 7.77 | 15.53 |
Model | OA (%) | IoU (%) | F1 (%) |
---|---|---|---|
Data-level fusion | 95.96 | 83.69 | 91.12 |
Feature-level fusion (ours) | 96.12 | 84.27 | 91.46 |
Decision-level fusion | 95.65 | 82.19 | 90.22 |
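The three fusion strategies compared above differ only in where the RGB and NIR/nDSM information is combined. The sketch below illustrates the distinction on toy per-pixel feature vectors; `f_rgb`, `f_aux`, and `head` are hypothetical stand-ins for the two encoder streams and the decision head, not the paper's network:

```python
# Hypothetical stand-ins: two encoder streams and a decision head.
def f_rgb(x):   return [2 * v for v in x]      # stand-in RGB stream
def f_aux(x):   return [v + 1 for v in x]      # stand-in NIR/nDSM stream
def head(feat): return sum(feat) / len(feat)   # stand-in decision head

def data_level(rgb, aux):
    # Stack raw inputs first, then run a single stream end to end.
    return head(f_rgb(rgb + aux))

def feature_level(rgb, aux):
    # Run each stream separately, concatenate intermediate features,
    # then decide once on the fused representation (the DMU-Net choice).
    return head(f_rgb(rgb) + f_aux(aux))

def decision_level(rgb, aux):
    # Run each stream to its own decision, then average the decisions.
    return (head(f_rgb(rgb)) + head(f_aux(aux))) / 2
```

Feature-level fusion lets each modality keep its own low-level filters while the decision is still made jointly, which is consistent with it scoring highest in the table.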
Method | OA (%) | IoU (%) | F1 (%) | PoLiS |
---|---|---|---|---|
Original prediction | 96.16 | 84.49 | 91.59 | 16.53 |
After morphological processing and regularization | 95.95 | 83.73 | 91.15 | 8.48 |
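The post-processing in Section 3.3.1 relies on morphological opening and closing to suppress small noise in the predicted masks. A minimal pure-Python sketch of binary opening with a 3×3 square structuring element (an illustration of the operation, not the authors' implementation, which would typically use a library such as OpenCV or scikit-image):

```python
NEIGHBORHOOD = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]

def erode(mask):
    """Binary erosion: a pixel survives only if its full 3x3 window is 1."""
    h, w = len(mask), len(mask[0])
    return [[int(all(0 <= i + di < h and 0 <= j + dj < w
                     and mask[i + di][j + dj]
                     for di, dj in NEIGHBORHOOD))
             for j in range(w)] for i in range(h)]

def dilate(mask):
    """Binary dilation: a pixel is set if any 3x3 neighbor is 1."""
    h, w = len(mask), len(mask[0])
    return [[int(any(0 <= i + di < h and 0 <= j + dj < w
                     and mask[i + di][j + dj]
                     for di, dj in NEIGHBORHOOD))
             for j in range(w)] for i in range(h)]

def opening(mask):
    """Opening = erosion then dilation: removes specks smaller than 3x3."""
    return dilate(erode(mask))
```

Closing (dilation then erosion) works symmetrically, filling small holes inside buildings instead of removing small false positives outside them.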
Share and Cite
Li, P.; Sun, Z.; Duan, G.; Wang, D.; Meng, Q.; Sun, Y. DMU-Net: A Dual-Stream Multi-Scale U-Net Network Using Multi-Dimensional Spatial Information for Urban Building Extraction. Sensors 2023, 23, 1991. https://doi.org/10.3390/s23041991