A Light-Weight Neural Network Using Multiscale Hybrid Attention for Building Change Detection
Abstract
1. Introduction
- (1)
- A novel lightweight Siamese convolutional network is proposed. It consists of a lightweight backbone built from parallel dilated convolutions and a contextual semantic-guided module. The parallel dilated convolutions are computed in a depthwise-separable manner: the input channels are split into independent groups and convolved separately, which greatly reduces the number of model parameters and improves efficiency compared with standard convolution. Unlike traditional lightweight models that rely on depthwise-separable convolution alone, we combine dilated convolution with the depthwise-separable form to reduce the parameter count further. In addition, the contextual semantic-guided module effectively aggregates the global features of the input remote sensing images and strengthens the feature extraction capability.
- (2)
- To make full use of the contextual feature information of the input image pair, we introduce a hybrid attention mechanism on top of the backbone network, which enhances the model's ability to recognize the edge details of changed buildings. Because changed buildings span a wide range of scales, a hybrid attention module that simply fuses channel attention and spatial attention cannot handle this challenge effectively. We therefore construct a multiscale hybrid attention module that aggregates spatiotemporal information at different scales to generate multiscale attention feature tensors, improving the model's ability to recognize fine details and small-scale buildings. Beyond the theoretical analysis, the proposed method was evaluated on standard change detection datasets (WHU-CD [26], LEVIR-CD [27]) and compared with other change detection models to demonstrate its feasibility and superiority.
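The parameter savings claimed for the depthwise-separable (dilated) convolution can be verified by simple counting. The sketch below is illustrative: the function names and the example channel sizes are not from the paper, and dilation is noted only because it enlarges the receptive field without adding parameters.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Parameters of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution. A dilated depthwise
    filter has the same parameter count, since dilation only spreads
    the taps apart."""
    depthwise = c_in * k * k   # one k x k filter per channel
    pointwise = c_in * c_out   # 1 x 1 channel-mixing convolution
    return depthwise + pointwise

# Illustrative layer sizes (not the paper's actual configuration):
std = standard_conv_params(64, 128, 3)        # 64 * 128 * 9  = 73,728
sep = depthwise_separable_params(64, 128, 3)  # 576 + 8,192   = 8,768
```

With these example sizes the separable form uses roughly 8x fewer parameters, which is the efficiency argument made above.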
2. Related Work
3. Materials and Methods
3.1. Overview
Algorithm 1: The process of the building change detection method.
- Input: bitemporal RS images. Output: CDM, the final binary result L.
- 1: read the VHR image pair;
- 2: compute the feature matrices;
- 3: compute the value of the attention matrix A;
- 4: compute the value of the matrix;
- 5: update the feature matrix Z;
- 6: compute the distance map;
- 7: compute the global threshold value according to the attention matrix A.
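The final steps of Algorithm 1 (distance map and thresholding) can be sketched as follows. The function name, the tensor shapes, and the fixed threshold `tau` are illustrative assumptions; the paper derives the threshold from the attention matrix A rather than fixing it.

```python
import numpy as np

def change_map(z1: np.ndarray, z2: np.ndarray, tau: float) -> np.ndarray:
    """Sketch of steps 6-7: a per-pixel Euclidean distance map between
    bitemporal feature tensors of shape (C, H, W), binarized with a
    global threshold tau (fixed here only for illustration)."""
    dist = np.sqrt(((z1 - z2) ** 2).sum(axis=0))  # (H, W) distance map
    return (dist > tau).astype(np.uint8)          # 1 = changed, 0 = unchanged

# Toy example: identical features except one "changed" pixel.
z1 = np.zeros((4, 2, 2))
z2 = np.zeros((4, 2, 2))
z2[:, 0, 0] = 1.0                 # distance at (0, 0) is sqrt(4) = 2
cdm = change_map(z1, z2, tau=1.0)
```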
3.2. Lightweight Siamese Feature Extraction Backbone
3.3. Multiscale Hybrid Attention Module
- (1)
- First, a global average pooling operation is performed on the input feature map X to aggregate the feature maps of the different channels. The resulting channel vector encodes each channel's feature information and yields the channel attention matrix (Equation (6)). To compute the channel attention, we use a multilayer perceptron (MLP) and add a batch normalization (BN) layer after the MLP to adjust the output size:
- (2)
- The spatial attention module uses a 1 × 1 convolution to reduce the dimensionality of the input feature map X, merging and compressing it across the channel dimension. Two 3 × 3 dilated convolutions then expand the receptive field (RF) to make full use of the contextual spatiotemporal semantic information while keeping the number of parameters low. Finally, a 1 × 1 convolution reduces the feature map to a spatial attention map, and a BN layer at the end of the spatial attention branch adjusts the size of the output. The spatial attention calculation is shown in Equation (7):
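As a rough illustration of the channel-attention branch in (1), the following numpy sketch applies global average pooling, a two-layer MLP, and a sigmoid gate. The weight shapes, the sigmoid choice, and the omission of the BN layer and the parallel spatial branch are simplifying assumptions, not details from the paper.

```python
import numpy as np

def channel_attention(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Minimal channel-attention sketch.
    x: feature map (C, H, W); w1: (C, C//r); w2: (C//r, C) with an
    illustrative reduction ratio r. Returns x reweighted per channel."""
    gap = x.mean(axis=(1, 2))                    # (C,) global average pooling
    hidden = np.maximum(gap @ w1, 0.0)           # ReLU bottleneck of the MLP
    attn = 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # (C,) weights in (0, 1)
    return x * attn[:, None, None]               # broadcast over H and W

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((8, 2))   # reduction ratio r = 4 (assumed)
w2 = rng.standard_normal((2, 8))
y = channel_attention(x, w1, w2)
```

Because each attention weight lies in (0, 1), the output is a channel-wise damping of the input, which is the gating behavior the module relies on.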
3.4. Evaluation Metrics Module
4. Experiments
4.1. Data Sets
4.1.1. Dataset Introduction
- The study area of the WHU-CD [26] dataset is Christchurch, New Zealand. The dataset consists of high-resolution aerial imagery covering an area struck by a magnitude 6.3 earthquake in February 2011 and rebuilt in the following years; it contains 12,796 buildings over 20.5 square kilometers (the 2016 imagery of the same area contains 16,077 buildings). The original image size is 32,507 × 15,354 pixels, with three RGB channels and a spatial resolution of 0.075 m.
- LEVIR-CD [27] is a large-scale remote sensing dataset for building change detection provided by the LEVIR Laboratory of Beihang University. The dataset was collected with the Google Earth API and comprises 637 high-resolution remote sensing image pairs. Each image is 1024 × 1024 pixels, with three RGB channels and a spatial resolution of 0.5 m. The image pairs span 5 to 14 years, during which land use changed significantly, especially through building additions. LEVIR-CD covers various types of buildings, and the changed objects span a wide range of sizes, so it can effectively test the performance of a change detection method.
4.1.2. Data Preprocessing
4.2. Experimental Metrics
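The metrics reported in the results tables (Rec, Pre, OA, MIoU, F1) presumably follow the standard definitions over a binary confusion matrix, with "changed" as the positive class. A sketch under that assumption:

```python
def cd_metrics(tp: int, fp: int, fn: int, tn: int):
    """Standard change detection metrics from a binary confusion matrix.
    MIoU here averages the IoU of the changed and unchanged classes;
    this is an assumed definition, not taken verbatim from the paper."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    oa = (tp + tn) / (tp + fp + fn + tn)          # overall accuracy
    iou_change = tp / (tp + fp + fn)              # IoU of changed class
    iou_nochange = tn / (tn + fp + fn)            # IoU of unchanged class
    miou = (iou_change + iou_nochange) / 2
    return precision, recall, f1, oa, miou

# Toy confusion matrix for illustration:
p, r, f1, oa, miou = cd_metrics(tp=90, fp=10, fn=10, tn=890)
```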
4.3. Comparative Methods
- Fully Convolutional-Early Fusion (FC-EF) [31]: FC-EF is a fully convolutional neural network based on the UNet structure. It fuses the bitemporal images along the channel dimension before feeding them into the network, treating them as different channels of a single image.
- Fully Convolutional-Siamese-Concatenation (FC-Siam-conc) [31]: FC-Siam-conc is a Siamese network structure based on FC-EF. It concatenates, via skip connections, the feature maps from the two encoder branches with the corresponding decoder layers, and finally generates a change map.
- Fully Convolutional-Siamese-Difference (FC-Siam-diff) [31]: FC-Siam-diff is similar to FC-Siam-conc; both are Siamese structures based on FC-EF. The difference is that this structure first takes the absolute value of the difference between the feature maps of the two encoder branches and then skip-connects the result to the corresponding decoder layer.
- Bitemporal Image Transformer (BIT) [42]: BIT uses a transformer to build a network consisting of a Siamese semantic tokenizer, a transformer encoder, and a decoder.
- Deeply Supervised Image Fusion Network (IFNet) [45]: IFNet applies channel attention to each feature extraction level of the decoder and computes a supervision loss at each level.
- Deeply Supervised Attention Metric-based Network (DSAMNet) [46]: DSAMNet is a deeply supervised metric learning network that integrates CBAM to obtain more representative features and achieve better performance.
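The three FC-EF-family baselines above differ only in how bitemporal information is merged; a minimal numpy sketch of the three fusion strategies (function names are illustrative, and the tensors stand in for images or encoder features of shape (C, H, W)):

```python
import numpy as np

def early_fusion(img1: np.ndarray, img2: np.ndarray) -> np.ndarray:
    """FC-EF style: stack the bitemporal images along the channel axis
    before the network sees them."""
    return np.concatenate([img1, img2], axis=0)

def siam_concat(f1: np.ndarray, f2: np.ndarray) -> np.ndarray:
    """FC-Siam-conc style: concatenate encoder features from the two
    Siamese branches at a skip connection."""
    return np.concatenate([f1, f2], axis=0)

def siam_diff(f1: np.ndarray, f2: np.ndarray) -> np.ndarray:
    """FC-Siam-diff style: absolute difference of the two branches'
    features at a skip connection."""
    return np.abs(f1 - f2)
```

Note that the concatenating variants double the channel count at the fusion point, while the difference variant keeps it fixed, which is why FC-Siam-diff has slightly fewer parameters in the complexity table below.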
4.4. Results Evaluation
4.5. Model Computation Complexity
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Singh, A. Review article digital change detection techniques using remotely-sensed data. Int. J. Remote Sens. 1989, 10, 989–1003. [Google Scholar] [CrossRef]
- Dong, H.; Ma, W.; Wu, Y.; Gong, M.; Jiao, L. Local descriptor learning for change detection in synthetic aperture radar images via convolutional neural networks. IEEE Access 2018, 7, 15389–15403. [Google Scholar] [CrossRef]
- Zhang, J.; Pan, B.; Zhang, Y.; Liu, Z.; Zheng, X. Building Change Detection in Remote Sensing Images Based on Dual Multi-Scale Attention. Remote Sens. 2022, 14, 5405. [Google Scholar] [CrossRef]
- Liu, R.; Jiang, D.; Zhang, L.; Zhang, Z. Deep depthwise separable convolutional network for change detection in optical aerial images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 1109–1118. [Google Scholar] [CrossRef]
- Patil, P.S.; Holambe, R.S.; Waghmare, L.M. EffCDNet: Transfer learning with deep attention network for change detection in high spatial resolution satellite images. Digit. Signal Process. 2021, 118, 103250–103262. [Google Scholar] [CrossRef]
- Luo, H.; Liu, C.; Wu, C.; Guo, X. Urban change detection based on Dempster–Shafer theory for multitemporal very high-resolution imagery. Remote Sens. 2018, 10, 980. [Google Scholar] [CrossRef]
- Wu, J.; Xie, C.; Zhang, Z.; Zhu, Y. A Deeply Supervised Attentive High-Resolution Network for Change Detection in Remote Sensing Images. Remote Sens. 2022, 15, 45. [Google Scholar] [CrossRef]
- Wang, Q.; Yuan, Z.; Du, Q.; Li, X. GETNET: A general end-to-end 2D CNN framework for hyperspectral image change detection. IEEE Trans. Geosci. Remote Sens. 2018, 57, 3–13. [Google Scholar] [CrossRef]
- Li, H.; Xiao, P.; Feng, X.; Yang, Y.; Wang, L.; Zhang, W.; Wang, X.; Feng, W.; Chang, X. Using land long-term data records to map land cover changes in China over 1981–2010. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 1372–1389. [Google Scholar] [CrossRef]
- Li, Z.; Shi, W.; Myint, S.W.; Lu, P.; Wang, Q. Semi-automated landslide inventory mapping from bitemporal aerial photographs using change detection and level set method. Remote Sens. Environ. 2016, 175, 215–230. [Google Scholar] [CrossRef]
- Azzouzi, S.A.; Vidal-Pantaleoni, A.; Bentounes, H.A. Desertification monitoring in Biskra, Algeria, with Landsat imagery by means of supervised classification and change detection methods. IEEE Access 2017, 5, 9065–9072. [Google Scholar] [CrossRef]
- Lv, Z.; Liu, T.; Wan, Y.; Benediktsson, J.A.; Zhang, X. Post-processing approach for refining raw land cover change detection of very high-resolution remote sensing images. Remote Sens. 2018, 10, 472. [Google Scholar] [CrossRef]
- Yu, L.; Wang, Z.; Tian, S.; Ye, F.; Ding, J.; Kong, J. Convolutional neural networks for water body extraction from Landsat imagery. Int. J. Comput. Intell. Appl. 2017, 16, 1750001–17500013. [Google Scholar] [CrossRef]
- Kemker, R.; Salvaggio, C.; Kanan, C. Algorithms for semantic segmentation of multispectral remote sensing imagery using deep learning. ISPRS J. Photogramm. Remote Sens. 2018, 145, 60–77. [Google Scholar] [CrossRef]
- Yang, G.; Li, H.C.; Wang, W.Y.; Yang, W.; Emery, W.J. Unsupervised change detection based on a unified framework for weighted collaborative representation with RDDL and fuzzy clustering. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8890–8903. [Google Scholar] [CrossRef]
- Ma, W.; Yang, H.; Wu, Y.; Xiong, Y.; Hu, T.; Jiao, L.; Hou, B. Change detection based on multi-grained cascade forest and multi-scale fusion for SAR images. Remote Sens. 2019, 11, 142. [Google Scholar] [CrossRef]
- Kit, O.; Lüdeke, M. Automated detection of slum area change in Hyderabad, India using multitemporal satellite imagery. ISPRS J. Photogramm. Remote Sens. 2013, 83, 130–137. [Google Scholar] [CrossRef]
- Qiu, L.; Gao, L.; Ding, Y.; Li, Y.; Lu, H.; Yu, W. Change detection method using a new difference image for remote sensing images. In Proceedings of the 2013 IEEE International Geoscience and Remote Sensing Symposium-IGARSS, Melbourne, Australia, 21–26 July 2013; pp. 4293–4296. [Google Scholar]
- Bruzzone, L.; Prieto, D.F. Automatic analysis of the difference image for unsupervised change detection. IEEE Trans. Geosci. Remote Sens. 2000, 38, 1171–1182. [Google Scholar] [CrossRef]
- Malila, W.A. Change vector analysis: An approach for detecting forest changes with Landsat. LARS Symp. 1980, 1, 385–397. [Google Scholar]
- Celik, T. Unsupervised change detection in satellite images using principal component analysis and k-means clustering. IEEE Geosci. Remote Sens. Lett. 2009, 6, 772–776. [Google Scholar] [CrossRef]
- Chen, H.; Wu, C.; Du, B.; Zhang, L.; Wang, L. Change detection in multisource VHR images via deep Siamese convolutional multiple-layers recurrent neural network. IEEE Trans. Geosci. Remote Sens. 2019, 58, 2848–2864. [Google Scholar] [CrossRef]
- Zhang, W.; Fan, H. Application of isolated forest algorithm in deep learning change detection of high resolution remote sensing image. In Proceedings of the 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China, 27–29 June 2020; pp. 753–756. [Google Scholar]
- Duan, H.; Dong, X.; You, S.; Han, S. A Deep Learning Denoising Framework Based on FFDNet for SAR Image Change Detection. In Proceedings of the 2021 IEEE 11th International Conference on Electronics Information and Emergency Communication (ICEIEC), Beijing, China, 18–20 June 2021; pp. 1–4. [Google Scholar]
- Zhu, Q.; Guo, X.; Deng, W.; Guan, Q.; Zhong, Y.; Zhang, L.; Li, D. Land-use/land-cover change detection based on a Siamese global learning framework for high spatial resolution remote sensing imagery. ISPRS J. Photogramm. Remote Sens. 2022, 184, 63–78. [Google Scholar] [CrossRef]
- Ji, S.; Wei, S.; Lu, M. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans. Geosci. Remote Sens. 2018, 57, 574–586. [Google Scholar] [CrossRef]
- Chen, H.; Shi, Z. A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sens. 2020, 12, 1662. [Google Scholar] [CrossRef]
- Ren, Q.; Yang, W.; Wang, C.; Wei, W.; Qian, Y. Review of remote sensing image change detection. J. Comput. Appl. 2021, 41, 2294–2307. [Google Scholar]
- Zhao, J.; Liu, S.; Wan, J.; Yasir, M.; Li, H. Change detection method of high resolution remote sensing image based on DS evidence theory feature fusion. IEEE Access 2020, 9, 4673–4687. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Daudt, R.C.; Le Saux, B.; Boulch, A. Fully convolutional Siamese networks for change detection. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 4063–4067. [Google Scholar]
- Liu, M.; Chai, Z.; Deng, H.; Liu, R. A CNN-transformer network with multiscale context aggregation for fine-grained cropland change detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 4297–4306. [Google Scholar] [CrossRef]
- Liu, R.; Tao, F.; Liu, X.; Na, J.; Leng, H.; Wu, J.; Zhou, T. RAANet: A Residual ASPP with Attention Framework for Semantic Segmentation of High-Resolution Remote Sensing Images. Remote Sens. 2022, 14, 3109. [Google Scholar] [CrossRef]
- Bhujel, A.; Kim, N.E.; Arulmozhi, E.; Basak, J.K.; Kim, H.T. A lightweight Attention-based convolutional neural networks for tomato leaf disease classification. Agriculture 2022, 12, 228. [Google Scholar] [CrossRef]
- Liu, W.; Yang, J.; Zhao, J.; Yang, L. A novel method of unsupervised change detection using multi-temporal PolSAR images. Remote Sens. 2017, 9, 1135. [Google Scholar] [CrossRef]
- Yan, L.; Xia, W.; Zhao, Z.; Wang, Y. A novel approach to unsupervised change detection based on hybrid spectral difference. Remote Sens. 2018, 10, 841. [Google Scholar] [CrossRef]
- Zhang, M.; Liu, Z.; Feng, J.; Liu, L.; Jiao, L. Remote Sensing Image Change Detection Based on Deep Multi-Scale Multi-Attention Siamese Transformer Network. Remote Sens. 2023, 15, 842. [Google Scholar] [CrossRef]
- Chen, Y.; Bruzzone, L. Self-supervised remote sensing images change detection at pixel-level. arXiv 2021, arXiv:2105.08501. [Google Scholar]
- Fang, S.; Li, K.; Shao, J.; Li, Z. SNUNet-CD: A densely connected Siamese network for change detection of VHR images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5. [Google Scholar] [CrossRef]
- Hafner, S.; Ban, Y.; Nascetti, A. Urban Change Detection Using a Dual-Task Siamese Network and Semi-Supervised Learning. arXiv 2022, arXiv:2204.12202. [Google Scholar]
- Peng, D.; Zhang, Y.; Guan, H. End-to-end change detection for high resolution satellite images using improved UNet++. Remote Sens. 2019, 11, 1382. [Google Scholar] [CrossRef]
- Chen, H.; Qi, Z.; Shi, Z. Remote sensing image change detection with transformers. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–14. [Google Scholar] [CrossRef]
- Zheng, H.; Gong, M.; Liu, T.; Jiang, F.; Zhan, T.; Lu, D.; Zhang, M. HFA-Net: High frequency attention Siamese network for building change detection in VHR remote sensing images. Pattern Recognit. 2022, 129, 108717–108728. [Google Scholar] [CrossRef]
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
- Zhang, C.; Yue, P.; Tapete, D.; Jiang, L.; Shangguan, B.; Huang, L.; Liu, G. A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images. ISPRS J. Photogramm. Remote Sens. 2020, 166, 183–200. [Google Scholar] [CrossRef]
- Shi, Q.; Liu, M.; Li, S.; Liu, X.; Wang, F.; Zhang, L. A deeply supervised attention metric-based network and an open aerial image dataset for remote sensing change detection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–16. [Google Scholar] [CrossRef]
| Method | Rec (%) | Pre (%) | OA (%) | MIoU (%) | F1 (%) |
|---|---|---|---|---|---|
| FC-EF [31] | 78.8 | 82.3 | 94.1 | 80.6 | 80.5 |
| FC-Siam-Conc [31] | 71.6 | 73.1 | 92.7 | 74.8 | 72.3 |
| FC-Siam-Diff [31] | 75.0 | 76.6 | 93.3 | 77.3 | 75.8 |
| IFNet [45] | 81.4 | 83.9 | 95.5 | 82.3 | 82.6 |
| DSAMNet [46] | 87.4 | 85.1 | 96.4 | 85.9 | 86.2 |
| BIT [42] | 85.3 | 86.6 | 96.1 | 85.5 | 85.9 |
| The proposed | 90.9 | 84.9 | 97.0 | 87.6 | 87.8 |
| Method | Rec (%) | Pre (%) | OA (%) | MIoU (%) | F1 (%) |
|---|---|---|---|---|---|
| FC-EF [31] | 81.9 | 84.1 | 95.7 | 83.2 | 83.0 |
| FC-Siam-Conc [31] | 81.2 | 83.6 | 95.3 | 82.1 | 82.4 |
| FC-Siam-Diff [31] | 82.6 | 84.8 | 96.0 | 84.1 | 83.7 |
| IFNet [45] | 83.6 | 84.2 | 96.2 | 84.3 | 83.9 |
| DSAMNet [46] | 87.5 | 86.3 | 97.1 | 86.6 | 86.9 |
| BIT [42] | 86.2 | 86.8 | 97.0 | 86.1 | 86.5 |
| The proposed | 91.2 | 85.2 | 97.3 | 87.9 | 88.1 |
| Method | MACs (G) | Params (M) |
|---|---|---|
| FC-EF [31] | 3.57 | 1.35 |
| FC-Siam-Conc [31] | 5.31 | 1.54 |
| FC-Siam-Diff [31] | 4.70 | 1.34 |
| IFNet [45] | 82.26 | 35.73 |
| DSAMNet [46] | 75.29 | 16.95 |
| BIT [42] | 12.52 | 3.55 |
| The proposed | 9.15 | 3.20 |
| Method | Precision (%) | Recall (%) | OA (%) | MIoU (%) | F1-Score (%) |
|---|---|---|---|---|---|
| Backbone | 75.1 | 88.9 | 94.9 | 81.9 | 81.4 |
| Backbone + BAB | 81.1 | 90.0 | 95.8 | 84.0 | 85.3 |
| Backbone + MSAB | 84.9 | 90.9 | 97.0 | 87.6 | 87.8 |
| Method | Precision (%) | Recall (%) | OA (%) | MIoU (%) | F1-Score (%) |
|---|---|---|---|---|---|
| Backbone | 77.1 | 89.2 | 95.1 | 82.3 | 82.7 |
| Backbone + BAB | 81.0 | 90.5 | 96.2 | 84.5 | 85.5 |
| Backbone + MSAB | 85.2 | 91.2 | 97.3 | 87.9 | 88.1 |
Share and Cite
Hua, Z.; Yu, H.; Jing, P.; Song, C.; Xie, S. A Light-Weight Neural Network Using Multiscale Hybrid Attention for Building Change Detection. Sustainability 2023, 15, 3343. https://doi.org/10.3390/su15043343