Siamese InternImage for Change Detection
Abstract
1. Introduction
- (1) A Siamese InternImage framework is proposed for change detection. It represents the first CNN-based change detection vision foundation model, combining the strengths of the Siamese architecture with the merits of InternImage (e.g., reducing the inductive bias of traditional CNNs, learning stronger and more robust feature representations through a large effective receptive field, and integrating an adaptive spatial aggregation mechanism conditioned on the inputs).
- (2) To compensate for the lack of local information, a refinement block is introduced to enhance local feature representation.
- (3) Channel attention is integrated into the InternImage model to capture the temporal relationship between the bitemporal features and to improve detection accuracy at different scales.
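The contributions above name two reusable ingredients: a shared-weight (Siamese) encoder applied to both temporal images, and SE-style channel attention over the fused bitemporal features. The following is an illustrative sketch of those two ideas only, not the paper's implementation; the toy 1×1-convolution encoder and all weights here are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(img, w):
    """Toy shared-weight encoder: a 1x1 'convolution' mixing input channels.
    Both temporal images pass through the SAME weights (Siamese sharing)."""
    return np.einsum("oc,chw->ohw", w, img)

def se_layer(x, w1, w2):
    """Squeeze-and-Excitation: derive per-channel weights from globally
    pooled statistics and rescale the (C, H, W) feature map."""
    squeeze = x.mean(axis=(1, 2))                  # squeeze: (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)         # FC + ReLU (channel reduction)
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # FC + sigmoid: (C,) in (0, 1)
    return x * scale[:, None, None]                # excite: channel-wise rescale

# Bitemporal inputs (3 channels, 16x16) and shared encoder weights (3 -> 8 channels)
img_t1 = rng.standard_normal((3, 16, 16))
img_t2 = rng.standard_normal((3, 16, 16))
w_enc = rng.standard_normal((8, 3))

f1, f2 = encode(img_t1, w_enc), encode(img_t2, w_enc)

# Fuse the two temporal feature maps, then reweight channels with SE
# (reduction ratio 4: 16 -> 4 -> 16)
fused = np.concatenate([f1, f2], axis=0)           # (16, 16, 16)
w1 = rng.standard_normal((4, 16))
w2 = rng.standard_normal((16, 4))
out = se_layer(fused, w1, w2)
print(out.shape)  # (16, 16, 16)
```

Because both images pass through the same weights, corresponding channels of the two feature maps are directly comparable, which is what makes concatenation- or difference-based fusion meaningful.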
2. Related Work
2.1. CNN-Based Change Detection Methods
2.2. Transformer-Based Change Detection Methods
3. Materials and Methods
3.1. Stem and Downsampling Layers
3.2. Basic Stage
3.2.1. Basic Block
3.2.2. CAS Block
3.2.3. Refinement Block
3.3. Fusion Layer
3.4. SE Layer
3.5. UperNet
4. Results
4.1. Experiment Settings
- (1) FC-EF [15]. FC-EF is a variant of the U-Net model that directly fuses the bitemporal patches by concatenating them along the channel dimension.
- (2) FC-Siam-Conc [15]. This Siamese extension of FC-EF features two parallel encoding streams with identical structures and shared weights, as in traditional Siamese networks. FC-Siam-Conc fuses the skip connections from the two encoding streams by concatenation.
- (3) FC-Siam-Diff [15]. This Siamese extension of FC-EF likewise features two parallel encoding streams with identical structures and shared weights. FC-Siam-Diff fuses the two streams via the absolute differences between corresponding features extracted from the two temporal images.
- (4) USSFC-Net [7]. USSFC-Net utilizes multiscale decoupled convolution to reduce computational redundancy and introduces an effective spatial–temporal feature fusion strategy to generate richer and more comprehensive features. By capturing details across various scales and dimensions, USSFC-Net improves the overall accuracy and robustness of change detection.
- (5) HANet [20]. HANet integrates features across various scales and refines detailed features using a hierarchical attention network implemented with a Siamese architecture.
- (6) STNet [23]. STNet proposes a temporal feature fusion module and a spatial feature fusion module to capture temporal information and fine-grained spatial information, respectively.
- (7) BIT [24]. BIT is a Transformer-based model that uses a CNN to extract features, a Transformer encoder–decoder to learn contextual information, and a shallow CNN to predict the change detection results.
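The FC-* baselines above differ chiefly in where and how the bitemporal information is merged. As a minimal, shapes-only sketch of the three merge operations (assuming `(C, H, W)` arrays; this is not the models' full code):

```python
import numpy as np

def early_fusion(img_a, img_b):
    """FC-EF: concatenate the bitemporal patches along the channel axis
    and feed the stack to a single encoder."""
    return np.concatenate([img_a, img_b], axis=0)

def siamese_concat(feat_a, feat_b):
    """FC-Siam-Conc: concatenate skip features from the two shared-weight
    encoding streams."""
    return np.concatenate([feat_a, feat_b], axis=0)

def siamese_diff(feat_a, feat_b):
    """FC-Siam-Diff: absolute difference between corresponding features
    extracted from the two temporal images."""
    return np.abs(feat_a - feat_b)

a = np.random.default_rng(1).standard_normal((3, 8, 8))
b = np.random.default_rng(2).standard_normal((3, 8, 8))

print(early_fusion(a, b).shape)    # (6, 8, 8)  -- channel count doubles
print(siamese_concat(a, b).shape)  # (6, 8, 8)
print(siamese_diff(a, b).shape)    # (3, 8, 8)  -- channel count preserved
```

Concatenation keeps both temporal views for the decoder to compare, at the cost of doubled channels; the absolute difference bakes the comparison in and is symmetric in the two acquisition dates.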
4.2. Experimental Analysis
4.2.1. LEVIR-CD
4.2.2. WHU-CD
- (1) Robustness to irregular shapes: Our model demonstrates exceptional robustness in detecting irregularly shaped changes, accurately identifying challenging examples.
- (2) Global information capture: Our model effectively captures global information and identifies large change areas with high precision, significantly surpassing other models.
- (3) Recognition of isolated and small change areas: Our model excels at recognizing very small, isolated change areas and small areas at the edges of images.
- (4) Handling noise: Our model overcomes noise between images without compromising the detection of genuine changes.
- (5) Precision in mixed regions: Our model accurately identifies change areas in images containing both changed and unchanged regions, delineating them with clear boundaries.
- (6) Performance on regular areas: Our model performs well on regular areas, including densely packed regular-shaped change areas and conventional rectangular regions.
4.3. Ablation Study
4.3.1. Proposed Modules
- (1) Base + Cros: the baseline model with the CAS block added, but excluding the SE layer.
- (2) Base + Refine: the baseline model with the refinement block added.
- (3) Base + Cros + Refine: the baseline model with both the CAS block and the refinement block added, but excluding the SE layer.
- (4) Base + Cros + Refine + SE: the complete model with all modules: the baseline, the CAS block, the refinement block, and the SE layer.
4.3.2. CrosDCNv3
4.3.3. SE Layer
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Li, Z.; Shi, W.; Zhang, H.; Hao, M. Change detection based on Gabor wavelet features for very high resolution remote sensing images. IEEE Geosci. Remote Sens. Lett. 2017, 14, 783–787. [Google Scholar] [CrossRef]
- Celik, T. Unsupervised change detection in satellite images using principal component analysis and k-means clustering. IEEE Geosci. Remote Sens. Lett. 2009, 6, 772–776. [Google Scholar] [CrossRef]
- Byrne, G.F.; Crapper, P.F.; Mayo, K.K. Monitoring land-cover change by principal component analysis of multitemporal Landsat data. Remote Sens. Environ. 1980, 10, 175–184. [Google Scholar] [CrossRef]
- Volpi, M.; Tuia, D.; Bovolo, F.; Kanevski, M.; Bruzzone, L. Supervised change detection in VHR images using contextual information and support vector machines. Int. J. Appl. Earth Obs. Geoinf. 2013, 20, 77–85. [Google Scholar] [CrossRef]
- Chen, H.; Shi, Z. A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sens. 2020, 12, 1662. [Google Scholar] [CrossRef]
- Chen, Y.; Ouyang, X.; Agam, G. MFCNET: End-to-end approach for change detection in images. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP) 2018, Athens, Greece, 7–10 October 2018; pp. 4008–4012. [Google Scholar]
- Lei, T.; Geng, X.; Ning, H.; Lv, Z.; Gong, M.; Jin, Y.; Nandi, A.K. Ultralightweight spatial–spectral feature cooperation network for change detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4402114. [Google Scholar] [CrossRef]
- Zhang, C.; Wang, L.; Cheng, S.; Li, Y. SwinSUNet: Pure Transformer Network for Remote Sensing Image Change Detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5224713. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773. [Google Scholar]
- Zhu, X.; Hu, H.; Lin, S.; Dai, J. Deformable convnets v2: More deformable, better results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9308–9316. [Google Scholar]
- Wang, W.; Dai, J.; Chen, Z.; Huang, Z.; Li, Z.; Zhu, X.; Hu, X.; Lu, T.; Lu, L.; Li, H.; et al. InternImage: Exploring large-scale vision foundation models with deformable convolutions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 14408–14419. [Google Scholar]
- Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122. [Google Scholar]
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
- Caye Daudt, R.; Le Saux, B.; Boulch, A. Fully Convolutional Siamese Networks for Change Detection. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 4063–4067. [Google Scholar]
- Fang, S.; Li, K.; Shao, J.; Li, Z. SNUNet-CD: A Densely Connected Siamese Network for Change Detection of VHR Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 8007805. [Google Scholar] [CrossRef]
- Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Proceedings of the 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, 20 September 2018; Springer: Cham, Switzerland, 2018; Proceedings 4; pp. 3–11. [Google Scholar]
- Chen, J.; Yuan, Z.; Peng, J.; Chen, L.; Huang, H.; Zhu, J.; Liu, Y.; Li, H. DASNet: Dual Attentive Fully Convolutional Siamese Networks for Change Detection in High-Resolution Satellite Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 2021, 14, 1194–1206. [Google Scholar] [CrossRef]
- Huang, J.; Fu, Q.; Wang, X.; Ji, Y. Remote sensing building change detection based on improved U-Net. In Proceedings of the 2022 3rd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), Xi’an, China, 15–17 July 2022; pp. 772–775. [Google Scholar]
- Han, C.; Wu, C.; Guo, H.; Hu, M.; Chen, H. HANet: A hierarchical attention network for change detection with bitemporal very-high-resolution remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 3867–3878. [Google Scholar] [CrossRef]
- Zhang, C.; Yue, P.; Tapete, D.; Jiang, L.; Shangguan, B.; Huang, L.; Liu, G. A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images. ISPRS J. Photogramm. Remote Sens. 2020, 166, 183–200. [Google Scholar]
- Fang, S.; Li, K.; Li, Z. Changer: Feature interaction is what you need for change detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5610111. [Google Scholar] [CrossRef]
- Ma, X.; Yang, J.; Hong, T.; Ma, M.; Zhao, Z.; Feng, T.; Zhang, W. STNet: Spatial and Temporal feature fusion network for change detection in remote sensing images. In Proceedings of the IEEE International Conference on Multimedia and Expo, Brisbane, Australia, 10–14 July 2023; pp. 2195–2200. [Google Scholar]
- Chen, H.; Qi, Z.; Shi, Z. Remote sensing image change detection with transformers. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5607514. [Google Scholar] [CrossRef]
- Bandara, W.G.C.; Patel, V.M. A transformer-based siamese network for change detection. In Proceedings of the IGARSS 2022–2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 207–210. [Google Scholar]
- Feng, Y.; Xu, H.; Jiang, J.; Liu, H.; Zheng, J. ICIF-Net: Intra-scale cross-interaction and inter-scale feature fusion network for bitemporal remote sensing images change detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4410213. [Google Scholar] [CrossRef]
- Zhou, Y.; Huo, C.; Zhu, J.; Huo, L.; Pan, C. DCAT: Dual cross-attention-based transformer for change detection. Remote Sens. 2023, 15, 2395. [Google Scholar] [CrossRef]
- Zhu, J.; Zhou, Y.; Xu, N.; Huo, C. Collaborative Learning Network for Change Detection and Semantic Segmentation of Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 6012305. [Google Scholar] [CrossRef]
- Xiao, T.; Liu, Y.; Zhou, B.; Jiang, Y.; Sun, J. Unified perceptual parsing for scene understanding. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 418–434. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
Table 1. Comparison results on the LEVIR-CD dataset.

Method | Precision (%) | Recall (%) | F1 Score (%) | IoU (%) |
---|---|---|---|---|
FC-EF | 85.42 | 87.67 | 86.53 | 76.26 |
FC-Siam-Conc | 83.98 | 89.19 | 86.51 | 76.23 |
FC-Siam-Diff | 84.62 | 89.18 | 87.12 | 77.18 |
STANet | 85.70 | 87.70 | 86.70 | 76.50 |
USSFC-Net | 86.84 | 89.67 | 88.23 | 78.94 |
HANet | 91.21 | 89.36 | 90.28 | 82.27 |
BIT | 89.24 | 89.37 | 89.31 | 80.68 |
ChangeStar(FarSeg) | 89.88 | 88.72 | 89.30 | 80.66 |
ChangeFormer | 92.05 | 88.80 | 90.40 | 82.48 |
STNet | 92.06 | 89.03 | 90.52 | 82.09 |
DDLNet | 91.72 | 89.41 | 90.60 | 82.49 |
ours | 90.47 | 90.92 | 90.69 | 82.97 |
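The four columns reported throughout these tables are standard pixel-level binary change-detection metrics, computed from true positives (TP), false positives (FP), and false negatives (FN) of the "change" class. For a single binary class, IoU is determined by F1 via IoU = F1 / (2 − F1), which is why the two columns move together (e.g., 90.69% F1 gives 0.9069 / 1.0931 ≈ 82.97% IoU). A sketch with hypothetical counts:

```python
def change_metrics(tp, fp, fn):
    """Pixel-level metrics for the 'change' class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # = 2*TP / (2*TP + FP + FN)
    iou = tp / (tp + fp + fn)                            # = f1 / (2 - f1)
    return precision, recall, f1, iou

# hypothetical pixel counts, for illustration only
p, r, f1, iou = change_metrics(tp=9000, fp=950, fn=900)
print(round(f1, 4), round(iou, 4))  # 0.9068 0.8295
```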
Table 2. Comparison results on the WHU-CD dataset.

Method | Precision (%) | Recall (%) | F1 Score (%) | IoU (%) |
---|---|---|---|---|
FC-EF | 80.40 | 82.26 | 81.32 | 68.52 |
FC-Siam-Conc | 88.16 | 75.96 | 81.61 | 68.93 |
FC-Siam-Diff | 85.56 | 74.68 | 79.75 | 66.32 |
USSFC | 85.46 | 92.40 | 88.88 | 79.99 |
HANet | 88.30 | 88.01 | 88.16 | 78.82 |
BIT | 86.64 | 81.48 | 83.98 | 72.39 |
ChangeStar(FarSeg) | 88.78 | 85.31 | 87.01 | 77.00 |
ChangeFormer | 87.25 | 77.03 | 81.82 | 69.24 |
STNet | 87.84 | 87.08 | 87.46 | 77.72 |
DDLNet | 91.56 | 90.03 | 90.56 | 82.75 |
ours | 89.22 | 92.03 | 90.60 | 82.82 |
Table 3. Ablation study of the proposed modules.

Module | Precision (%) | Recall (%) | F1 Score (%) | IoU (%) |
---|---|---|---|---|
Baseline | 90.18 | 89.10 | 89.64 | 81.22 |
Base + Cros | 90.11 | 89.63 | 89.87 | 81.60 |
Base + Refine | 89.50 | 90.96 | 90.22 | 82.19 |
Base + Cros + Refine | 90.57 | 90.65 | 90.61 | 82.83 |
Base + Cros + Refine + SE | 90.47 | 90.92 | 90.69 | 82.97 |
Table 4. Ablation study on the parameter N of CrosDCNv3.

N | Precision (%) | Recall (%) | F1 Score (%) | IoU (%) |
---|---|---|---|---|
1 | 89.99 | 90.74 | 90.36 | 82.42 |
2 | 90.98 | 89.78 | 90.38 | 82.45 |
3 | 90.47 | 90.92 | 90.69 | 82.97 |
4 | 91.17 | 89.50 | 90.33 | 82.36 |
Table 5. Effect of adding the SE layer to other methods.

Method | Precision (%) | Recall (%) | F1 Score (%) | IoU (%) |
---|---|---|---|---|
FC-EF | 85.42 | 87.67 | 86.53 | 76.26 |
FC-Siam-Conc | 83.98 | 89.19 | 86.51 | 76.23 |
FC-Siam-Diff | 84.62 | 89.18 | 87.12 | 77.18 |
USSFC | 86.84 | 89.67 | 88.23 | 78.94 |
FC-EF + SElayer | 86.29 | 87.74 | 87.01 | 77.01 |
FC-Siam-Conc + SElayer | 86.16 | 88.76 | 87.44 | 77.68 |
FC-Siam-Diff + SElayer | 86.00 | 89.12 | 87.53 | 77.83 |
USSFC + SElayer | 86.89 | 89.92 | 88.38 | 79.18 |
Citation: Shen, J.; Huo, C.; Xiang, S. Siamese InternImage for Change Detection. Remote Sens. 2024, 16, 3642. https://doi.org/10.3390/rs16193642