Hybrid-TransCD: A Hybrid Transformer Remote Sensing Image Change Detection Network via Token Aggregation
Abstract
1. Introduction
2. Related Work
2.1. Attention-Based Methods
2.2. Vision Transformer
3. Materials and Methods
3.1. Network Overview
3.2. Hybrid-Transformer Encoder
3.3. Hybrid-Transformer Decoder
3.4. Cascaded Feature Decoder
4. Experiments
4.1. Datasets and Implementation Details
- Baseline: A light CNN backbone (ResNet18) with a single-level decoder sub-network. The decoder sub-network comprises four upsampling blocks for progressively restoring the image scale, and the fusion among multiple outputs is used to predict the final change map.
- H-Res-E4-D4-ED-CFD: The CNN backbone combined with the proposed hybrid transformer layer, which comprises four H-TE and four H-TD blocks; the ED decoder structure serves as the H-TD layer. The cascaded feature decoder is used in the feature decoding stage.
- H-Res-E4-D4-LD-CFD: The same as H-Res-E4-D4-ED-CFD, except that the ED decoder is replaced by LD.
- H-Res-E1-D1-ED-CFD: The numbers of H-TE blocks (M) and H-TD blocks (N) are both reduced to 1.
- H-Res-E1-D1-LD-CFD: Identical to the previous one except the ED decoder structure is replaced by LD.
- H-Res-E4-D0-LD-CFD: The H-TD behind H-TE is removed by setting N to 0 while M is 4.
- H-Res-E0-D4-LD-CFD: The H-TE is removed by setting M to 0 while four H-TDs are employed.
- H-E4-D4-LD-CFD: Unlike the variants above, which combine CNN-based and transformer-based features, the input here is processed directly by our hybrid transformer network. Specifically, the bitemporal images are linearly projected rather than first passed through the CNN backbone.
- H-Res-E4-D4-LD-Single: Compared to H-Res-E4-D4-LD-CFD, the cascaded feature decoder is not applied in this structure. Instead, the feature maps from the last decoder stage are concatenated with the skip connections to produce the final features.
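The variant names above follow a regular pattern, so the configuration each one encodes can be read off mechanically. The sketch below is illustrative only: `Variant`, `parse_variant`, and the field names are hypothetical and not part of the paper. It assumes the naming pattern `H-[Res-]E<M>-D<N>-<ED|LD>-<CFD|Single>` inferred from the list, and it does not cover the Baseline entry.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical container for the settings encoded in a variant name."""
    cnn_backbone: bool        # True when "Res" is present (ResNet18 features are used)
    num_encoder_blocks: int   # M, the number of H-TE blocks
    num_decoder_blocks: int   # N, the number of H-TD blocks
    td_structure: str         # "ED" or "LD", the structure used for the H-TD layer
    feature_decoder: str      # "CFD" (cascaded feature decoder) or "Single"


# Assumed pattern, inferred from the names in the list above.
_NAME_PATTERN = re.compile(r"^H-(?:(Res)-)?E(\d+)-D(\d+)-(ED|LD)-(CFD|Single)$")


def parse_variant(name: str) -> Variant:
    """Split an ablation variant name into its configuration fields."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised variant name: {name!r}")
    res, m, n, td, fd = match.groups()
    return Variant(
        cnn_backbone=res is not None,
        num_encoder_blocks=int(m),
        num_decoder_blocks=int(n),
        td_structure=td,
        feature_decoder=fd,
    )


if __name__ == "__main__":
    for variant_name in ("H-Res-E4-D4-ED-CFD", "H-Res-E4-D0-LD-CFD", "H-E4-D4-LD-CFD"):
        print(variant_name, "->", parse_variant(variant_name))
```

Read this way, H-Res-E4-D0-LD-CFD is the setting with four H-TE blocks and no H-TD block, and H-E4-D4-LD-CFD is the only listed variant without the CNN backbone.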
4.2. Ablation Study of Existing Methods
4.3. Ablation Study of Proposed Modules
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Method | Precision | Recall | F1 | IoU | OA | Kappa |
---|---|---|---|---|---|---|
FC-EF | 81.26 | 80.17 | 80.71 | 71.53 | 98.39 | 84.10 |
FC-Siam-Conc | 90.99 | 76.77 | 83.69 | 79.96 | 98.49 | 85.96 |
FC-Siam-Diff | 89.64 | 82.68 | 86.02 | 78.86 | 98.65 | 85.78 |
U-Net++ | 90.66 | 85.32 | 87.91 | 80.94 | 98.24 | 86.79 |
DASNet | 80.76 | 79.53 | 79.91 | 74.65 | 94.32 | 85.14 |
STANet | 83.81 | 91.02 | 87.27 | 78.64 | 98.87 | 86.66 |
BiT | 89.24 | 89.37 | 89.31 | 80.68 | 98.92 | 88.97 |
Hybrid-TransCD (ours) | 91.45 | 88.72 | 90.06 | 81.92 | 99.00 | 89.54 |
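For readers who want to relate the accuracy metrics named in the table above to pixel counts, the following is a minimal sketch using the standard binary confusion-matrix definitions. This excerpt does not state the formulas the authors used, so the definitions are an assumption and the reported columns may have been computed differently; the function name `change_metrics` and the example counts are hypothetical.

```python
def change_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary change-detection metrics from confusion-matrix counts.

    tp/fp/fn/tn are pixel counts for the "changed" class. The counts are
    assumed to be non-degenerate (no zero denominators).
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    # IoU of the changed class: overlap divided by union.
    iou = tp / (tp + fp + fn)
    # Overall accuracy: fraction of correctly labelled pixels.
    oa = (tp + tn) / total
    # Expected chance agreement, used by Cohen's kappa.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (total * total)
    kappa = (oa - pe) / (1 - pe)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "iou": iou,
        "oa": oa,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not data from the paper.
    for name, value in change_metrics(tp=80, fp=20, fn=20, tn=880).items():
        print(f"{name}: {100 * value:.2f}")
```

Because unchanged pixels usually dominate change-detection imagery, OA tends to sit close to 100 while F1, IoU, and Kappa separate the methods more clearly.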
Method | Precision | Recall | F1 | IoU | OA | Kappa |
---|---|---|---|---|---|---|
FC-EF | 74.32 | 75.84 | 75.07 | 60.09 | 86.02 | 72.14 |
FC-Siam-Conc | 82.54 | 71.03 | 76.35 | 61.75 | 86.17 | 72.33 |
FC-Siam-Diff | 89.13 | 61.21 | 72.57 | 59.96 | 82.11 | 71.04 |
U-Net++ | 81.36 | 75.39 | 78.26 | 62.14 | 86.39 | 72.36 |
DASNet | 68.14 | 70.01 | 69.14 | 60.65 | 80.14 | 68.37 |
STANet | 70.76 | 85.33 | 77.37 | 63.09 | 87.96 | 71.24 |
BiT | 82.18 | 74.49 | 78.15 | 64.13 | 90.18 | 73.14 |
Hybrid-TransCD (ours) | 83.05 | 77.40 | 80.13 | 66.84 | 90.95 | 74.27 |
Method | Params (M) | FLOPs (G) |
---|---|---|
FC-EF | 81.35 | 20.36 |
FC-Siam-Conc | 81.54 | 21.58 |
FC-Siam-Diff | 81.35 | 21.42 |
U-Net++ | 131.26 | 47.35 |
DASNet | 108.69 | 31.33 |
STANet | 116.93 | 36.58 |
BiT | 121.85 | 42.99 |
Hybrid-TransCD (ours) | 166.57 | 51.38 |
Method | LEVIR-CD F1 | LEVIR-CD Kappa | LEVIR-CD IoU | SYSU-CD F1 | SYSU-CD Kappa | SYSU-CD IoU | Params (M) | FLOPs (G) |
---|---|---|---|---|---|---|---|---|
Baseline | 86.99 | 86.35 | 76.99 | 75.25 | 67.92 | 62.87 | 16.64 | 26.86 |
H-Res-E4-D4-ED-CFD | 90.06 | 89.54 | 81.92 | 80.13 | 74.27 | 66.84 | 183.83 | 67.58 |
H-Res-E4-D4-LD-CFD | 89.93 | 89.41 | 81.70 | 79.53 | 73.37 | 66.02 | 173.73 | 55.18 |
H-Res-E1-D1-LD-CFD | 89.10 | 88.36 | 81.25 | 79.24 | 72.72 | 65.61 | 27.69 | 27.72 |
H-Res-E1-D1-ED-CFD | 89.73 | 89.24 | 81.44 | 77.46 | 70.90 | 63.22 | 27.69 | 27.44 |
H-Res-E0-D4-LD-CFD | 89.21 | 88.65 | 81.33 | 79.05 | 72.97 | 65.36 | 106.08 | 60.63 |
H-Res-E4-D0-LD-CFD | 88.23 | 87.63 | 78.93 | 71.94 | 64.00 | 56.18 | 106.08 | 60.63 |
H-E4-D4-LD-CFD | 84.40 | 83.60 | 73.01 | 78.13 | 71.84 | 64.12 | 23.11 | 8.66 |
H-Res-E4-D4-LD-Single | 88.87 | 88.65 | 80.84 | 78.68 | 72.68 | 65.47 | 166.57 | 51.38 |
Aggregation | LEVIR-CD F1 | LEVIR-CD Kappa | LEVIR-CD IoU | SYSU-CD F1 | SYSU-CD Kappa | SYSU-CD IoU | Params (M) | FLOPs (G) |
---|---|---|---|---|---|---|---|---|
Linear | 89.77 | 89.24 | 81.44 | 76.16 | 70.04 | 61.50 | 147.6 | 42.18 |
Convolution | 89.82 | 89.27 | 81.52 | 77.17 | 69.73 | 62.82 | 171.71 | 53.47 |
Ours | 89.93 | 89.41 | 81.70 | 79.53 | 73.37 | 66.02 | 173.73 | 55.18 |
Layers | LEVIR-CD F1 | LEVIR-CD Kappa | LEVIR-CD IoU | SYSU-CD F1 | SYSU-CD Kappa | SYSU-CD IoU | Params (M) | FLOPs (G) |
---|---|---|---|---|---|---|---|---|
FeedForward | 87.84 | 87.24 | 78.32 | 76.57 | 69.80 | 62.03 | 173.65 | 55.14 |
DW-FeedForward | 89.08 | 88.52 | 80.30 | 77.44 | 70.07 | 63.18 | 173.73 | 55.19 |
DE-FeedForward | 89.93 | 89.41 | 81.70 | 79.53 | 73.37 | 66.02 | 173.73 | 55.19 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ke, Q.; Zhang, P. Hybrid-TransCD: A Hybrid Transformer Remote Sensing Image Change Detection Network via Token Aggregation. ISPRS Int. J. Geo-Inf. 2022, 11, 263. https://doi.org/10.3390/ijgi11040263