A Stacking Ensemble Deep Learning Model for Building Extraction from Remote Sensing Images
Abstract
1. Introduction
- (1) A deep learning feature-integration method is proposed for extracting buildings from remote sensing images. It combines deep learning with ensemble learning, integrating the complementary strengths of different CNN models to improve the generalization and robustness of the overall model.
- (2) An optimization method based on fully connected conditional random fields (CRFs) is proposed to refine the predictions of the base models. The influence of the number of CRF inference computations on the refined result is analyzed, and the number of computations that yields the best result is determined.
- (3) A stacking ensemble method based on a sparse autoencoder [41] is proposed to combine the optimized base-model predictions. The sparse autoencoder extracts features from these optimized predictions, and the features are then integrated using the stacking ensemble technique.
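
Contributions (1) and (3) can be illustrated with a small stacking sketch. This is a minimal NumPy example under stated assumptions, not the authors' SENet implementation: each pixel is described only by the building probabilities of three base models (stand-ins for SegNet, FCN-8s and U-Net, simulated here with synthetic noise), the sparse autoencoder is a single sigmoid hidden layer trained with the classic KL-divergence sparsity penalty, and the meta-learner is a logistic regression. The helper names (`train_sparse_autoencoder`, `train_logistic`, `stack_predict`) and all hyperparameters (hidden size, `rho`, `beta`, learning rates) are hypothetical.

```python
import numpy as np

# Sketch under assumptions: per-pixel features are the base-model building
# probabilities; every hyperparameter below is illustrative, not from the paper.


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_sparse_autoencoder(x, n_hidden=8, rho=0.05, beta=3.0, lam=1e-4,
                             lr=0.5, epochs=300, seed=0):
    """Full-batch gradient descent on a one-hidden-layer sparse autoencoder.

    Loss = half mean squared reconstruction error + L2 weight decay
           + beta * sum_j KL(rho || mean activation of hidden unit j).
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    r = np.sqrt(6.0 / (n_hidden + d + 1))
    w1 = rng.uniform(-r, r, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.uniform(-r, r, (n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        h = sigmoid(x @ w1 + b1)            # hidden (sparse) code, (n, n_hidden)
        x_hat = sigmoid(h @ w2 + b2)        # reconstruction, (n, d)
        rho_hat = np.clip(h.mean(axis=0), 1e-6, 1 - 1e-6)
        # Backpropagate the reconstruction error through the output sigmoid.
        delta3 = (x_hat - x) / n * x_hat * (1 - x_hat)
        # Gradient of the KL sparsity penalty w.r.t. each hidden activation.
        sparse_grad = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / n
        delta2 = (delta3 @ w2.T + sparse_grad) * h * (1 - h)
        w2 -= lr * (h.T @ delta3 + lam * w2)
        b2 -= lr * delta3.sum(axis=0)
        w1 -= lr * (x.T @ delta2 + lam * w1)
        b1 -= lr * delta2.sum(axis=0)
    return w1, b1  # only the encoder is needed for stacking


def train_logistic(f, y, lr=0.5, epochs=500):
    """Logistic-regression meta-learner fitted by gradient descent on BCE."""
    w, b = np.zeros(f.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(f @ w + b)
        w -= lr * f.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


def stack_predict(base_train, y_train, base_test):
    """base_*: (n_pixels, n_models) building probabilities from the base models."""
    w1, b1 = train_sparse_autoencoder(base_train)
    h_tr = sigmoid(base_train @ w1 + b1)
    mu, sd = h_tr.mean(axis=0), h_tr.std(axis=0) + 1e-8  # standardize codes
    w, b = train_logistic((h_tr - mu) / sd, y_train)
    f_te = (sigmoid(base_test @ w1 + b1) - mu) / sd
    return sigmoid(f_te @ w + b)


# Synthetic demo: three noisy but informative "base models".
rng = np.random.default_rng(42)
y = rng.integers(0, 2, 4000).astype(float)
probs = np.clip(0.5 + 0.25 * (2 * y[:, None] - 1)
                + rng.normal(0, 0.2, (4000, 3)), 0, 1)
train, test = slice(0, 3000), slice(3000, None)
p_stack = stack_predict(probs[train], y[train], probs[test])
stack_acc = np.mean((p_stack > 0.5) == y[test])
single_acc = [np.mean((probs[test, k] > 0.5) == y[test]) for k in range(3)]
print(f"single models: {np.round(single_acc, 3)}, stacked: {stack_acc:.3f}")
```

The autoencoder is fit without labels to learn a sparse code of the base predictions, and only the small meta-learner sees the labels. In a real stacking pipeline the meta-learner should be trained on out-of-fold (held-out) base-model predictions, so that it does not learn to trust overfitted training outputs.
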
2. Methodology
2.1. Overview of the Proposed Model
2.2. Basic Model Construction
2.2.1. Semantic Segmentation Models for Building Extraction
2.2.2. Optimized Basic Prediction Results
2.3. Basic Model Combination
A Stacking Ensemble Method Based on a Sparse Autoencoder
3. Experiments and Results
3.1. Dataset
3.2. Experimental Setting
3.3. Model Performance
3.3.1. Overall Performance
3.3.2. Building Colors
3.3.3. Building Sizes
3.3.4. Building Shapes
3.3.5. Building Shadows
4. Discussion
4.1. Dataset Evaluation
4.2. Applicability Analysis of SENet
4.3. Complexity Comparison of Deep Learning Models
4.4. Ablation Study
- Net 1: SegNet
- Net 2: SegNet + FCN-8s
- Net 3: SegNet + U-Net
- Net 4: FCN-8s + U-Net
- Net 5: SegNet + FCN-8s + U-Net
4.5. Analysis of the Number of CRF Optimization Calculations
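
Contribution (2) refines each base model's output with a fully connected CRF, and this section studies how many inference computations give the best result. The sketch below shows what that iteration count controls. It is a minimal sketch assuming the mean-field inference of Krähenbühl and Koltun with a Potts label compatibility and their two Gaussian kernels (an appearance kernel on position and color, and a smoothness kernel on position only). Unlike their efficient permutohedral-lattice filtering, it builds the full N × N kernel matrix, so it is only usable on tiny tiles. The function name `dense_crf_mean_field` and all kernel weights and widths are illustrative assumptions, not the paper's settings; `n_iters` stands in for the number of inference computations.

```python
import numpy as np


def _softmax(x):
    # Numerically stable softmax over the label axis.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def dense_crf_mean_field(prob, image, n_iters=5, w_app=10.0, theta_alpha=80.0,
                         theta_beta=13.0, w_smooth=3.0, theta_gamma=3.0):
    """Naive mean-field inference for a fully connected CRF (Potts model).

    prob:  (H, W, L) per-pixel class probabilities from a base CNN.
    image: (H, W, 3) RGB image used by the appearance (bilateral) kernel.
    Kernel parameters are illustrative assumptions, not the paper's values.
    """
    h, w, n_labels = prob.shape
    n = h * w
    yy, xx = np.mgrid[0:h, 0:w]
    pos = np.stack([yy.ravel(), xx.ravel()], axis=1).astype(float)
    rgb = image.reshape(n, -1).astype(float)
    d_pos = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=-1)
    d_rgb = ((rgb[:, None, :] - rgb[None, :, :]) ** 2).sum(axis=-1)
    # Appearance kernel (position + color) plus smoothness kernel (position).
    kernel = (w_app * np.exp(-d_pos / (2 * theta_alpha ** 2)
                             - d_rgb / (2 * theta_beta ** 2))
              + w_smooth * np.exp(-d_pos / (2 * theta_gamma ** 2)))
    np.fill_diagonal(kernel, 0.0)  # message passing excludes j == i
    unary = -np.log(np.clip(prob.reshape(n, n_labels), 1e-8, 1.0))
    q = _softmax(-unary)
    for _ in range(n_iters):
        # Potts model: label l is penalized by the kernel-weighted mass of
        # pixels that currently disagree with it, sum_j k_ij * (1 - Q_j(l)).
        pairwise = kernel @ (1.0 - q)
        q = _softmax(-unary - pairwise)
    return q.reshape(h, w, n_labels)


# Toy tile: a bright 6 x 6 roof on a dark background, with one wrong pixel
# inside and one outside the building in the base-model probabilities.
img = np.zeros((12, 12, 3))
img[3:9, 3:9] = 200.0
p_building = np.full((12, 12), 0.2)
p_building[3:9, 3:9] = 0.8
p_building[5, 5] = 0.4  # false negative inside the roof
p_building[0, 0] = 0.6  # false positive in the background
prob = np.stack([1 - p_building, p_building], axis=-1)
refined = dense_crf_mean_field(prob, img, n_iters=5)
labels = refined.argmax(axis=-1)
```

On the toy tile, both isolated errors are corrected because each pixel is pulled toward the labels of nearby, similarly colored pixels. Sweeping `n_iters` on a validation tile and scoring each refined map is one simple way to choose the iteration count.
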
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Ghanea, M.; Moallem, P.; Momeni, M. Building extraction from high-resolution satellite images in urban areas: Recent methods and strategies against significant challenges. Int. J. Remote Sens. 2016, 37, 5234–5248.
- Grinias, I.; Panagiotakis, C.; Tziritas, G. MRF-based segmentation and unsupervised classification for building and road detection in peri-urban areas of high-resolution satellite images. ISPRS J. Photogramm. Remote Sens. 2016, 122, 145–166.
- Chen, R.; Li, X.; Li, J. Object-Based Features for House Detection from RGB High-Resolution Images. Remote Sens. 2018, 10, 451.
- Hui, J.; Du, M.; Ye, X.; Qin, Q.; Sui, J. Effective Building Extraction From High-Resolution Remote Sensing Images With Multitask Driven Deep Neural Network. IEEE Geosci. Remote Sens. Lett. 2018, 16, 786–790.
- Jing, W.; Xu, Z.; Ying, L. Texture-based segmentation for extracting image shape features. In Proceedings of the 2013 19th International Conference on Automation and Computing (ICAC), London, UK, 13–14 September 2013.
- Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Ojala, T.; Pietikainen, M.; Maenpaa, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987.
- Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893.
- Inglada, J. Automatic recognition of man-made objects in high resolution optical remote sensing images by SVM classification of geometric image features. ISPRS J. Photogramm. Remote Sens. 2007, 62, 236–248.
- Aytekin, Ö.; Zongur, U.; Halici, U. Texture-Based Airport Runway Detection. IEEE Geosci. Remote Sens. Lett. 2012, 10, 471–475.
- Dong, Y.; Du, B.; Zhang, L. Target Detection Based on Random Forest Metric Learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 1830–1838.
- Li, E.; Femiani, J.; Xu, S.; Zhang, X.; Wonka, P. Robust Rooftop Extraction From Visible Band Images Using Higher Order CRF. IEEE Trans. Geosci. Remote Sens. 2015, 53, 4483–4495.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. arXiv 2014, arXiv:1411.4038.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
- Chaurasia, A.; Culurciello, E. LinkNet: Exploiting encoder representations for efficient semantic segmentation. In Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP), Saint Petersburg, FL, USA, 10–13 December 2017; pp. 1–4.
- Mnih, V. Machine Learning for Aerial Image Labeling; University of Toronto: Toronto, ON, Canada, 2013.
- Saito, S.; Yamashita, T.; Aoki, Y. Multiple Object Extraction from Aerial Imagery with Convolutional Neural Networks. J. Imaging Sci. Technol. 2016, 60, 104021–104029.
- Bittner, K.; Adam, F.; Cui, S.; Korner, M.; Reinartz, P. Building Footprint Extraction From VHR Remote Sensing Images Combined With Normalized DSMs Using Fused Fully Convolutional Networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 2615–2629.
- Yi, Y.; Zhang, Z.; Zhang, W.; Zhang, C.; Li, W.; Zhao, T. Semantic Segmentation of Urban Buildings from VHR Remote Sensing Imagery Using a Deep Convolutional Neural Network. Remote Sens. 2019, 11, 1774.
- Noh, H.; Hong, S.; Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1520–1528.
- Zhang, Z.; Liu, Q.; Wang, Y. Road Extraction by Deep Residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753.
- Li, R.; Liu, W.; Yang, L.; Sun, S.; Hu, W.; Zhang, F.; Li, W. DeepUNet: A Deep Fully Convolutional Network for Pixel-Level Sea-Land Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 3954–3962.
- Pan, X.; Yang, F.; Gao, L.; Chen, Z.; Zhang, B.; Fan, H.; Ren, J. Building Extraction from High-Resolution Aerial Imagery Using a Generative Adversarial Network with Spatial and Channel Attention Mechanisms. Remote Sens. 2019, 11, 917.
- Ye, Z.; Fu, Y.; Gan, M.; Deng, J.; Comber, A.; Wang, K. Building Extraction from Very High Resolution Aerial Imagery Using Joint Attention Deep Neural Network. Remote Sens. 2019, 11, 2970.
- Lin, G.; Milan, A.; Shen, C.; Reid, I. RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5168–5177.
- Jegou, S.; Drozdzal, M.; Vazquez, D.; Romero, A.; Bengio, Y. The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 1175–1183.
- Liu, P.; Liu, X.; Liu, M.; Shi, Q.; Yang, J.; Xu, X.; Zhang, Y. Building Footprint Extraction from High-Resolution Images via Spatial Residual Inception Convolutional Neural Network. Remote Sens. 2019, 11, 830.
- Lin, L.; Jian, L.; Min, W.; Zhu, H. A Multiple-Feature Reuse Network to Extract Buildings from Remote Sensing Imagery. Remote Sens. 2018, 10, 1350.
- Liu, W.; Yang, M.; Xie, M.; Guo, Z.; Li, E.; Zhang, L.; Pei, T.; Wang, D. Accurate Building Extraction from Fused DSM and UAV Images Using a Chain Fully Convolutional Neural Network. Remote Sens. 2019, 11, 2912.
- Zhang, S.; Chen, Y.; Zhang, W.; Feng, R. A novel ensemble deep learning model with dynamic error correction and multi-objective ensemble pruning for time series forecasting. Inf. Sci. 2021, 544, 427–445.
- Ma, J.; Wu, L.; Tang, X.; Liu, F.; Zhang, X.; Jiao, L. Building Extraction of Aerial Images by a Global and Multi-Scale Encoder-Decoder Network. Remote Sens. 2020, 12, 2350.
- Ju, M.; Ding, C.; Ren, W.; Yang, Y.; Zhang, D.; Guo, Y.J. IDE: Image Dehazing and Exposure Using an Enhanced Atmospheric Scattering Model. IEEE Trans. Image Process. 2021, 30, 2180–2192.
- Ju, M.; Ding, C.; Guo, Y.J.; Zhang, D. IDGCP: Image Dehazing Based on Gamma Correction Prior. IEEE Trans. Image Process. 2020, 29, 3104–3118.
- Shao, H.; Jiang, H.; Lin, Y.; Li, X. A novel method for intelligent fault diagnosis of rolling bearings using ensemble deep auto-encoders. Mech. Syst. Signal Process. 2018, 102, 278–297.
- Zhou, J.; Peng, T.; Zhang, C.; Sun, N. Data Pre-Analysis and Ensemble of Various Artificial Neural Networks for Monthly Streamflow Forecasting. Water 2018, 10, 628.
- David, B. Online cross-validation-based ensemble learning. Stat. Med. 2018, 2, 37.
- Sun, G.; Huang, H.; Zhang, A.; Li, F.; Zhao, H.; Fu, H. Fusion of Multiscale Convolutional Neural Networks for Building Extraction in Very High-Resolution Images. Remote Sens. 2019, 11, 227.
- Saqlain, M.; Jargalsaikhan, B.; Lee, J.Y. A Voting Ensemble Classifier for Wafer Map Defect Patterns Identification in Semiconductor Manufacturing. IEEE Trans. Semicond. Manuf. 2019, 32, 171–182.
- Cheng, J.; Aurélien, B.; van der, L.M. The relative performance of ensemble methods with deep convolutional neural networks for image classification. J. Appl. Stat. 2018, 45, 2800–2818.
- Gong, M.; Liu, J.; Li, H.; Cai, Q.; Su, L. A Multiobjective Sparse Feature Learning Model for Deep Neural Networks. IEEE Trans. Neural Networks Learn. Syst. 2015, 26, 3263–3277.
- Huang, B.; Lu, K.; Audebert, N.; Khalel, A.; Tarabalka, Y.; Malof, J.; Boulch, A.; Le Saux, B.; Collins, L.; Bradbury, K.; et al. Large-Scale Semantic Classification: Outcome of the First Year of Inria Aerial Image Labeling Benchmark. In Proceedings of the 2018 IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2018), Valencia, Spain, 22–27 July 2018; pp. 6947–6950.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Bischke, B.; Helber, P.; Folz, J.; Borth, D.; Dengel, A. Multi-Task Learning for Segmentation of Building Footprints with Deep Neural Networks. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1480–1484.
- Krähenbühl, P.; Koltun, V. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. Adv. Neural Inf. Process. Syst. 2012, 24, 10–20.
- Zhang, B.; Wang, C.; Shen, Y.; Liu, Y. Fully Connected Conditional Random Fields for High-Resolution Remote Sensing Land Use/Land Cover Classification with Convolutional Neural Networks. Remote Sens. 2018, 10, 1889.
- Orlando, J.I.; Prokofyeva, E.; Blaschko, M. A Discriminatively Trained Fully Connected Conditional Random Field Model for Blood Vessel Segmentation in Fundus Images. IEEE Trans. Biomed. Eng. 2016, 64, 16–27.
- Wagner, S.A. SAR ATR by a combination of convolutional neural network and support vector machines. IEEE Trans. Aerosp. Electron. Syst. 2016, 52, 2861–2872.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Zhang, Y.; Gong, W.; Sun, J.; Li, W. Web-Net: A Novel Nest Networks with Ultra-Hierarchical Sampling for Building Extraction from Aerial Imageries. Remote Sens. 2019, 11, 1897.
- Castagno, J.; Atkins, E. Roof Shape Classification from LiDAR and Satellite Image Data Fusion Using Supervised Learning. Sensors 2018, 18, 3960.
- Gabay, H.; Meir, I.A.; Schwartz, M.; Werzberger, E. Cost-benefit analysis of green buildings: An Israeli office buildings case study. Energy Build. 2014, 76, 558–564.
- Ji, S.; Wei, S.; Lu, M. Fully Convolutional Networks for Multisource Building Extraction From an Open Aerial and Satellite Imagery Data Set. IEEE Trans. Geosci. Remote Sens. 2019, 57, 574–586.
- Maggiori, E.; Tarabalka, Y.; Charpiat, G.; Alliez, P. Can semantic labeling methods generalize to any city? The inria aerial image labeling benchmark. In Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017; pp. 3226–3229.

| Methods | Structure | Backbone | LRP (learning-rate policy) | Loss |
|---|---|---|---|---|
| FCN-8s | Multi-scale | VGG-16 | Fixed | Cross-entropy |
| U-Net | Encoder-decoder | VGG-16 | Step | Cross-entropy |
| SegNet | Encoder-decoder | VGG-16 | Step | Cross-entropy |

| Feature | Sub-feature | Options | Definition |
|---|---|---|---|
| Color | | 1. red; 2. blue; 3. white; 4. gray; 5. others | Describes the building roof color |
| Size | | 1. small; 2. medium; 3. large | Varying in size (1000, 4000, and 10,000 m²) |
| Shape | Structure | 1. simple; 2. complex | Describes building shapes through their structure and edge contours |
| Shape | Edge contour | 1. obvious; 2. blurry | |
| Shadow | | 1. yes; 2. no | Describes whether a building is covered by shadows |

| Metric | Method | 1 | 2 | 3 | 4 | Average |
|---|---|---|---|---|---|---|
| Precision | SENet | 0.972 | 0.963 | 0.926 | 0.956 | 0.954 |
| | U-Net | 0.935 | 0.825 | 0.856 | 0.933 | 0.887 |
| | SegNet | 0.814 | 0.912 | 0.895 | 0.951 | 0.893 |
| | FCN-8s | 0.836 | 0.842 | 0.824 | 0.922 | 0.856 |
| Recall | SENet | 0.965 | 0.824 | 0.832 | 0.935 | 0.889 |
| | U-Net | 0.878 | 0.763 | 0.742 | 0.824 | 0.802 |
| | SegNet | 0.960 | 0.733 | 0.785 | 0.913 | 0.848 |
| | FCN-8s | 0.834 | 0.915 | 0.793 | 0.873 | 0.854 |
| F1 | SENet | 0.932 | 0.886 | 0.878 | 0.924 | 0.905 |
| | U-Net | 0.893 | 0.795 | 0.794 | 0.871 | 0.838 |
| | SegNet | 0.855 | 0.811 | 0.832 | 0.925 | 0.856 |
| | FCN-8s | 0.834 | 0.872 | 0.813 | 0.896 | 0.854 |
| IoU | SENet | 0.752 | 0.737 | 0.685 | 0.826 | 0.750 |
| | U-Net | 0.733 | 0.622 | 0.635 | 0.645 | 0.659 |
| | SegNet | 0.675 | 0.618 | 0.634 | 0.813 | 0.685 |
| | FCN-8s | 0.714 | 0.734 | 0.652 | 0.764 | 0.716 |
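
The accuracy tables report precision, recall, F1 and IoU, and the "Average" column is the arithmetic mean of the four per-area values (for SENet precision, (0.972 + 0.963 + 0.926 + 0.956)/4 ≈ 0.954). The paper's evaluation code is not given here; the sketch below assumes the standard pixel-wise definitions for a binary building mask, P = TP/(TP + FP), R = TP/(TP + FN), F1 = 2PR/(P + R) and IoU = TP/(TP + FP + FN). The helper name `pixel_metrics` is hypothetical.

```python
import numpy as np


def pixel_metrics(pred, truth):
    """Pixel-wise precision, recall, F1 and IoU for binary building masks
    (standard definitions; assumed, not taken from the paper's code)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))   # building pixels correctly detected
    fp = int(np.sum(pred & ~truth))  # background labelled as building
    fn = int(np.sum(~pred & truth))  # building pixels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


truth = np.zeros((4, 4), dtype=bool)
truth[:2, :2] = True  # 4 building pixels
pred = np.zeros((4, 4), dtype=bool)
pred[:2, :3] = True   # 6 predicted pixels, 2 of them false positives
m = pixel_metrics(pred, truth)
print(m)  # precision = iou = 4/6, recall = 1.0, f1 ≈ 0.8

# The "Average" column is the mean over the four test areas.
senet_precision_avg = np.mean([0.972, 0.963, 0.926, 0.956])
```
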

| Building color | Precision: SENet | Precision: U-Net | Precision: SegNet | Precision: FCN-8s | Recall: SENet | Recall: U-Net | Recall: SegNet | Recall: FCN-8s |
|---|---|---|---|---|---|---|---|---|
| Red | 0.959 | 0.958 | 0.925 | 0.885 | 0.951 | 0.885 | 0.938 | 0.936 |
| Blue | 0.923 | 0.826 | 0.861 | 0.826 | 0.789 | 0.715 | 0.748 | 0.764 |
| White | 0.949 | 0.877 | 0.893 | 0.850 | 0.881 | 0.806 | 0.858 | 0.857 |
| Gray | 0.943 | 0.886 | 0.892 | 0.862 | 0.883 | 0.802 | 0.848 | 0.859 |
| Average | 0.943 | 0.887 | 0.893 | 0.856 | 0.876 | 0.802 | 0.848 | 0.854 |

| Building size | Precision: SENet | Precision: U-Net | Precision: SegNet | Precision: FCN-8s | Recall: SENet | Recall: U-Net | Recall: SegNet | Recall: FCN-8s |
|---|---|---|---|---|---|---|---|---|
| Small | 0.968 | 0.951 | 0.913 | 0.905 | 0.956 | 0.804 | 0.948 | 0.803 |
| Medium | 0.933 | 0.827 | 0.881 | 0.815 | 0.846 | 0.862 | 0.787 | 0.811 |
| Large | 0.945 | 0.866 | 0.885 | 0.848 | 0.897 | 0.791 | 0.809 | 0.826 |
| Average | 0.949 | 0.881 | 0.893 | 0.856 | 0.900 | 0.819 | 0.848 | 0.813 |

| Dataset | GSD (m) | Area (km²) | Source | Tiles | Tile size (pixels) | Label format |
|---|---|---|---|---|---|---|
| Ours | 0.27 | 1830 | Satellite | 650 | 5000 × 3500 | Vector/raster |
| WHU | 0.075/2.7 | 450/550 | Aerial/satellite | 8189/17,388 | 512 × 512 | Vector/raster |
| ISPRS | 0.05/0.09 | 2/11 | Aerial | 24/16 | 6000 × 6000 | Raster |
| Massachusetts | 1.00 | 340 | Aerial | 151 | 11,500 × 7500 | Raster |
| Inria | 0.3 | 405 | Aerial | 180 | 1500 × 1500 | Raster |

| Model | Precision | Recall | F1 | IoU |
|---|---|---|---|---|
| U-Net | 0.891 | 0.896 | 0.847 | 0.685 |
| SegNet | 0.924 | 0.901 | 0.863 | 0.705 |
| SENet | 0.957 | 0.915 | 0.923 | 0.785 |

| Model | SegNet | FCN-8s | U-Net | DeconvNet | DeepUNet | SENet |
|---|---|---|---|---|---|---|
| Training time (s/epoch) | 1186 | 976 | 724 | 2359 | 493 | 1769 |
| Testing time (ms/image) | 58.6 | 84.3 | 48.3 | 206.7 | 42.8 | 63.1 |


Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cao, D.; Xing, H.; Wong, M.S.; Kwan, M.-P.; Xing, H.; Meng, Y. A Stacking Ensemble Deep Learning Model for Building Extraction from Remote Sensing Images. Remote Sens. 2021, 13, 3898. https://doi.org/10.3390/rs13193898
Chicago/Turabian style: Cao, Duanguang, Hanfa Xing, Man Sing Wong, Mei-Po Kwan, Huaqiao Xing, and Yuan Meng. 2021. "A Stacking Ensemble Deep Learning Model for Building Extraction from Remote Sensing Images" Remote Sensing 13, no. 19: 3898. https://doi.org/10.3390/rs13193898