Improving Building Extraction by Using Knowledge Distillation to Reduce the Impact of Label Noise
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area and Data
2.2. Methods
2.2.1. Overall Architecture
2.2.2. Swin Transformer Encoder
- (a) W-MSA module
- (b) SW-MSA module
- (c) Encoder
2.2.3. UPerNet Decoder
2.2.4. Loss Function
3. Results
3.1. Experimental Settings and Evaluation Metrics
3.2. Experimental Parameter Settings
3.3. Experimental Results
3.3.1. Quantitative Analysis
3.3.2. Qualitative Analysis
4. Discussion
4.1. Experiment of Different Features for Distillation
4.2. Method Applicability and Model Complexity
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Xu, Y.; Du, B.; Zhang, L.; Cerra, D.; Pato, M.; Carmona, E.; Prasad, S.; Yokoya, N.; Hänsch, R.; Le Saux, B. Advanced multi-sensor optical remote sensing for urban land use and land cover classification: Outcome of the 2018 IEEE GRSS data fusion contest. IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 2019, 12, 1709–1724. [Google Scholar] [CrossRef]
- Rashidian, V.; Baise, L.G.; Koch, M. Detecting collapsed buildings after a natural hazard on VHR optical satellite imagery using U-Net convolutional neural networks. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 9394–9397. [Google Scholar]
- Zou, S.; Wang, L. Individual vacant house detection in very-high-resolution remote sensing images. Ann. Am. Assoc. Geogr. 2020, 110, 449–461. [Google Scholar] [CrossRef]
- Doulamis, A.; Grammalidis, N.; Ioannides, M.; Potsiou, C.; Doulamis, N.D.; Stathopoulou, E.K.; Ioannidis, C.; Chrysouli, C.; Dimitropoulos, K. 5D modelling: An efficient approach for creating spatiotemporal predictive 3D maps of large-scale cultural resources. In Proceedings of the 25th International CIPA Symposium, CIPA 2015, Taipei, Taiwan, 31 August–4 September 2015; ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences; Volume 2. [Google Scholar]
- Osco, L.P.; Junior, J.M.; Ramos, A.P.M.; de Castro Jorge, L.A.; Fatholahi, S.N.; de Andrade Silva, J.; Matsubara, E.T.; Pistori, H.; Gonçalves, W.N.; Li, J. A review on deep learning in UAV remote sensing. Int. J. Appl. Earth Obs. Geoinf. 2021, 102, 102456. [Google Scholar] [CrossRef]
- Hoeser, T.; Kuenzer, C. Object detection and image segmentation with deep learning on earth observation data: A review-part I: Evolution and recent trends. Remote. Sens. 2020, 12, 1667. [Google Scholar] [CrossRef]
- Luo, L.; Li, P.; Yan, X. Deep learning-based building extraction from remote sensing images: A comprehensive review. Energies 2021, 14, 7982. [Google Scholar] [CrossRef]
- Kang, J.; Wang, Z.; Zhu, R.; Xia, J.; Sun, X.; Fernandez-Beltran, R.; Plaza, A. DisOptNet: Distilling Semantic Knowledge From Optical Images for Weather-Independent Building Segmentation. IEEE Trans. Geosci. Remote. Sens. 2022, 60, 1–15. [Google Scholar] [CrossRef]
- Wei, S.; Ji, S.; Lu, M. Toward automatic building footprint delineation from aerial images using CNN and regularization. IEEE Trans. Geosci. Remote. Sens. 2019, 58, 2178–2189. [Google Scholar] [CrossRef]
- Feng, W.; Sui, H.; Hua, L.; Xu, C.; Ma, G.; Huang, W. Building extraction from VHR remote sensing imagery by combining an improved deep convolutional encoder-decoder architecture and historical land use vector map. Int. J. Remote. Sens. 2020, 41, 6595–6617. [Google Scholar] [CrossRef]
- Hosseinpoor, H.; Samadzadegan, F. Convolutional neural network for building extraction from high-resolution remote sensing images. In Proceedings of the 2020 International Conference on Machine Vision and Image Processing (MVIP), Qom, Iran, 18–20 February 2020; pp. 1–5. [Google Scholar]
- Ji, S.; Wei, S.; Lu, M. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans. Geosci. Remote. Sens. 2018, 57, 574–586. [Google Scholar] [CrossRef]
- Ma, J.; Wu, L.; Tang, X.; Liu, F.; Zhang, X.; Jiao, L. Building extraction of aerial images by a global and multi-scale encoder-decoder network. Remote. Sens. 2020, 12, 2350. [Google Scholar] [CrossRef]
- Maltezos, E.; Doulamis, A.; Doulamis, N.; Ioannidis, C. Building extraction from LiDAR data applying deep convolutional neural networks. IEEE Geosci. Remote. Sens. Lett. 2018, 16, 155–159. [Google Scholar] [CrossRef]
- Pan, X.; Yang, F.; Gao, L.; Chen, Z.; Zhang, B.; Fan, H.; Ren, J. Building extraction from high-resolution aerial imagery using a generative adversarial network with spatial and channel attention mechanisms. Remote. Sens. 2019, 11, 917. [Google Scholar] [CrossRef]
- Shao, Z.; Tang, P.; Wang, Z.; Saleem, N.; Yam, S.; Sommai, C. BRRNet: A fully convolutional neural network for automatic building extraction from high-resolution remote sensing images. Remote. Sens. 2020, 12, 1050. [Google Scholar] [CrossRef]
- Cheng, G.; Han, J. A survey on object detection in optical remote sensing images. ISPRS J. Photogramm. Remote. Sens. 2016, 117, 11–28. [Google Scholar] [CrossRef]
- Ahmadi, S.; Zoej, M.V.; Ebadi, H.; Moghaddam, H.A.; Mohammadzadeh, A. Automatic urban building boundary extraction from high resolution aerial images using an innovative model of active contours. Int. J. Appl. Earth Obs. Geoinf. 2010, 12, 150–157. [Google Scholar] [CrossRef]
- Belgiu, M.; Drǎguţ, L. Comparing supervised and unsupervised multiresolution segmentation approaches for extracting buildings from very high resolution imagery. ISPRS J. Photogramm. Remote. Sens. 2014, 96, 67–75. [Google Scholar] [CrossRef]
- Maggiori, E.; Tarabalka, Y.; Charpiat, G.; Alliez, P. Convolutional neural networks for large-scale remote-sensing image classification. IEEE Trans. Geosci. Remote. Sens. 2016, 55, 645–657. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Wu, G.; Shao, X.; Guo, Z.; Chen, Q.; Yuan, W.; Shi, X.; Xu, Y.; Shibasaki, R. Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks. Remote. Sens. 2018, 10, 407. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.; et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6881–6890. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
- Frénay, B.; Verleysen, M. Classification in the presence of label noise: A survey. IEEE Trans. Neural Netw. Learn. Syst. 2013, 25, 845–869. [Google Scholar] [CrossRef] [PubMed]
- Mnih, V.; Hinton, G.E. Learning to label aerial images from noisy data. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), Edinburgh, Scotland, 26 June–1 July 2012; pp. 567–574. [Google Scholar]
- Xiao, T.; Xia, T.; Yang, Y.; Huang, C.; Wang, X. Learning from massive noisy labeled data for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2691–2699. [Google Scholar]
- Sukhbaatar, S.; Bruna, J.; Paluri, M.; Bourdev, L.; Fergus, R. Training convolutional networks with noisy labels. arXiv 2014, arXiv:1406.2080. [Google Scholar]
- Goldberger, J.; Ben-Reuven, E. Training deep neural-networks using a noise adaptation layer. In Proceedings of the 5th International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
- Yuan, J. Learning building extraction in aerial scenes with convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 2793–2798. [Google Scholar] [CrossRef] [PubMed]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
- Xiao, T.; Liu, Y.; Zhou, B.; Jiang, Y.; Sun, J. Unified perceptual parsing for scene understanding. In Proceedings of the European conference on computer vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 418–434. [Google Scholar]
- TianDiTu WenZhou. Available online: https://zhejiang.tianditu.gov.cn/wenzhou/ (accessed on 20 May 2022).
- Yuan, Y.; Chen, X.; Chen, X.; Wang, J. Object-contextual representations for semantic segmentation. arXiv 2019, arXiv:1909.11065. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
- Zhang, Y.; Xiang, T.; Hospedales, T.M.; Lu, H. Deep mutual learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4320–4328. [Google Scholar]
- Shu, C.; Liu, Y.; Gao, J.; Yan, Z.; Shen, C. Channel-wise knowledge distillation for dense prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 5311–5320. [Google Scholar]
- Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5693–5703. [Google Scholar]
- Fan, M.; Lai, S.; Huang, J.; Wei, X.; Chai, Z.; Luo, J.; Wei, X. Rethinking BiSeNet For Real-time Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9716–9725. [Google Scholar]
- Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
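The tables below report Precision, Recall, F1 score, and IoU in percent. For readers reproducing them, here is a minimal NumPy sketch of how these standard metrics are computed from binary building masks (function and variable names are ours, not from the paper):

```python
import numpy as np

def building_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Compute Precision, Recall, F1, and IoU (in %) for binary masks.

    pred, gt: 0/1 arrays of identical shape (building = 1).
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # correctly predicted building pixels
    fp = np.logical_and(pred, ~gt).sum()   # false alarms
    fn = np.logical_and(~pred, gt).sum()   # missed building pixels

    precision = tp / (tp + fp + 1e-12)
    recall = tp / (tp + fn + 1e-12)
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    iou = tp / (tp + fp + fn + 1e-12)
    return {name: 100.0 * value for name, value in
            {"Precision": precision, "Recall": recall,
             "F1": f1, "IoU": iou}.items()}
```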
Quantitative comparison with representative semantic segmentation methods (all metrics in %):

| Method | Backbone | Precision | Recall | F1 Score | IoU |
|---|---|---|---|---|---|
| U-Net [39] | - | 85.12 | 78.92 | 81.90 | 69.35 |
| PSPNet [38] | ResNet-101 [47] | 84.87 | 84.64 | 84.76 | 73.55 |
| FPN [37] | ResNet-101 [47] | 84.76 | 82.02 | 83.37 | 71.48 |
| UPerNet [34] | ResNet-101 [47] | 85.28 | 84.95 | 85.11 | 74.08 |
| DeepLabV3+ [40] | ResNet-101 [47] | 84.97 | 84.05 | 84.51 | 73.17 |
| OCRNet [36] | HRNet-48 [44] | 86.62 | 86.58 | 86.60 | 76.36 |
| UPerNet [34] | Swin Transformer [25] | 86.91 | 87.38 | 87.15 | 77.22 |
| SegFormer [26] | MiT-B5 | 83.57 | 80.19 | 81.84 | 69.26 |
| STDC [45] | STDC2 | 83.13 | 84.38 | 83.75 | 72.05 |
| ConvNeXt [46] | ConvNeXt-L | 88.35 | 86.57 | 87.45 | 77.70 |
| Proposed | Swin Transformer | 89.22 | 90.71 | 89.87 | 81.61 |
Comparison with other training and distillation strategies; values in parentheses are gains over the UPerNet (Swin Transformer) baseline (all metrics in %):

| Method | Backbone | Precision | Recall | F1 Score | IoU |
|---|---|---|---|---|---|
| UPerNet [34] | Swin Transformer [25] | 86.91 | 87.38 | 87.15 | 77.22 |
| Teacher | Swin Transformer [25] | 88.37 | 86.86 | 87.60 | 77.84 |
| Fine-tune | Swin Transformer [25] | 89.06 | 87.07 | 88.13 | 78.78 |
| DML [41] | Swin Transformer [25] | 87.54 | 86.34 | 86.93 | 76.89 |
| CWD [42] | Swin Transformer [25] | 87.36 | 88.09 | 87.72 | 78.13 |
| Proposed | Swin Transformer [25] | 89.22 (+2.31) | 90.71 (+3.33) | 89.87 (+2.72) | 81.61 (+4.39) |
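For orientation, a minimal PyTorch sketch of pixel-wise soft-label distillation in the spirit of Hinton et al. [33], where a trained teacher's softened class probabilities supervise the student alongside the (possibly noisy) hard labels; the temperature `T` and weight `alpha` are illustrative choices, not the paper's settings:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Pixel-wise knowledge distillation for segmentation.

    student_logits, teacher_logits: (N, C, H, W); labels: (N, H, W) int64.
    """
    # Hard-label term on the (possibly noisy) annotations.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-softened class
    # distributions, computed independently at every pixel.
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
    return (1 - alpha) * ce + alpha * kd
```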
Ablation of the proposed components (all metrics in %):

| Method | Precision | Recall | F1 Score | IoU |
|---|---|---|---|---|
| Baseline | 86.91 | 87.38 | 87.15 | 77.22 |
| Proposed (+distill) | 88.75 | 89.40 | 89.08 | 80.31 |
| Proposed (+MS + distill) | 89.22 | 90.71 | 89.87 | 81.61 |
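The "+MS" row above reads as a multi-scale component; assuming it denotes multi-scale inference (an assumption on our part, as this outline does not define MS), a minimal PyTorch sketch of averaging predictions over rescaled inputs:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def multiscale_predict(model, image, scales=(0.75, 1.0, 1.25)):
    """Average softmax predictions over rescaled copies of the input.

    image: (N, 3, H, W); returns (N, C, H, W) averaged class probabilities.
    """
    n, _, h, w = image.shape
    prob_sum = None
    for s in scales:
        x = F.interpolate(image, scale_factor=s, mode="bilinear",
                          align_corners=False)
        logits = model(x)
        # Resize logits back to the original resolution before averaging.
        logits = F.interpolate(logits, size=(h, w), mode="bilinear",
                               align_corners=False)
        p = logits.softmax(dim=1)
        prob_sum = p if prob_sum is None else prob_sum + p
    return prob_sum / len(scales)
```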
Results of distilling different combinations of the four candidate features examined in Section 4.1; each row is one combination, summarized by the number of features distilled (all metrics in %):

| No. of Distilled Features | Precision | Recall | F1 Score | IoU |
|---|---|---|---|---|
| 1 | 88.75 | 89.40 | 89.08 | 80.31 |
| 1 | 88.56 | 89.56 | 89.06 | 80.28 |
| 1 | 88.78 | 89.57 | 89.17 | 80.36 |
| 1 | 87.95 | 88.96 | 88.45 | 79.29 |
| 2 | 88.80 | 89.71 | 89.25 | 80.59 |
| 2 | 88.80 | 89.47 | 89.13 | 80.40 |
| 2 | 88.81 | 89.61 | 89.21 | 80.52 |
| 3 | 89.22 | 90.71 | 89.87 | 81.61 |
| 2 | 88.64 | 89.82 | 89.23 | 80.55 |
| 3 | 88.33 | 88.50 | 88.41 | 79.23 |
| 4 | 89.38 | 89.18 | 89.28 | 80.64 |
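The combination study above sums distillation signals from several features. A minimal PyTorch sketch of such a multi-feature distillation term (the MSE objective, the per-level weights, and the assumption of shape-matched teacher/student features are illustrative choices, not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def multi_feature_distill(student_feats, teacher_feats, weights=None):
    """Sum feature-level distillation terms over several scales.

    student_feats, teacher_feats: lists of (N, C_i, H_i, W_i) tensors,
    assumed already projected to matching shapes at each level.
    """
    weights = weights or [1.0] * len(student_feats)
    loss = 0.0
    for w, fs, ft in zip(weights, student_feats, teacher_feats):
        # Teacher features serve as fixed regression targets.
        loss = loss + w * F.mse_loss(fs, ft.detach())
    return loss
```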
Generalization to other backbone-decoder combinations; values in parentheses are gains over the corresponding baselines (all metrics in %):

| Method | Backbone | Precision | Recall | F1 Score | IoU |
|---|---|---|---|---|---|
| UPerNet | ResNet-101 | 85.28 | 84.95 | 85.11 | 74.08 |
| UPerNet + MS + distill | ResNet-101 | 86.68 (+1.40) | 86.99 (+2.04) | 86.84 (+1.73) | 76.74 (+2.66) |
| FPN | Swin Transformer | 87.54 | 86.85 | 87.19 | 77.30 |
| FPN + MS + distill | Swin Transformer | 88.96 (+1.42) | 89.54 (+2.69) | 89.25 (+2.06) | 80.95 (+3.65) |
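The transfer experiments above suggest the scheme is backbone- and decoder-agnostic. A generic PyTorch skeleton of an interchangeable encoder-decoder segmenter (class and method names are ours, for illustration only):

```python
import torch
import torch.nn as nn

class Segmenter(nn.Module):
    """Encoder-decoder segmenter with swappable parts.

    encoder: maps an image to a list of multi-scale feature maps.
    decoder: fuses those features into per-pixel class logits.
    """
    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(x)   # e.g., 4 stages of a Swin or ResNet backbone
        return self.decoder(feats)  # e.g., a UPerNet or FPN head

# Swapping backbones (Swin vs. ResNet-101) or heads (UPerNet vs. FPN) only
# changes the constructor arguments; the distillation losses sketched above
# attach to `feats` and to the output logits unchanged.
```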
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).