DTT-CGINet: A Dual Temporal Transformer Network with Multi-Scale Contour-Guided Graph Interaction for Change Detection
Abstract
1. Introduction
- (1) We propose a novel dual temporal attention mechanism that considers non-local structural relationships between bi-temporal images and effectively models contextual semantic relationships.
- (2) We propose a contour extraction module (CEM) based on a Sobel convolutional block to effectively extract contours from remote sensing images; a minimal sketch of such a block follows this list. Additionally, we introduce the contour-guided graph reasoning module (CGRM), which uses contour maps to guide the generation of graph representations within contour-enclosed regions. To enhance CGRM's graph reasoning, we employ a joint attention mechanism that improves information propagation between graph vertices, preserving boundary integrity in the change detection results.
- (3) Extensive experiments on three CD datasets demonstrate that our proposed method outperforms previous state-of-the-art methods in terms of accuracy and robustness.
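As referenced in contribution (2), the sketch below shows one way to realize a Sobel-based contour block in PyTorch. The depthwise fixed-kernel design, the 1 × 1 fusion convolution, and all layer names are our assumptions for illustration, not the authors' exact CEM:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SobelContourBlock(nn.Module):
    """Hypothetical Sobel convolutional block: frozen horizontal/vertical
    Sobel kernels applied depthwise, fused into a single contour map."""
    def __init__(self, channels: int):
        super().__init__()
        gx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        kernel = torch.stack([gx, gx.t()])                       # (2, 3, 3): Gx, Gy
        kernel = kernel.unsqueeze(1).repeat(channels, 1, 1, 1)   # (2C, 1, 3, 3)
        self.register_buffer("kernel", kernel)                   # fixed, not trained
        self.channels = channels
        self.fuse = nn.Sequential(                               # 2C gradient maps -> 1 contour map
            nn.Conv2d(2 * channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x):                                        # x: (B, C, H, W)
        grad = F.conv2d(x, self.kernel, padding=1, groups=self.channels)
        return self.fuse(grad.abs())                             # (B, 1, H, W) in [0, 1]
```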
2. Related Work
2.1. Traditional Methods
2.2. CNN-Based Models
2.3. Transformer
2.4. Graph Convolutional Network
3. Materials and Methods
3.1. Overall Architecture
3.2. Feature Extraction Backbone
3.3. Contour-Guided Graph Interaction Module
3.3.1. Contour Extraction Module
3.3.2. Contour-Guided Graph Projection
3.3.3. Graph Interaction Module
- Joint Attention: To facilitate information interaction between the graph representations of bi-temporal images, we introduce joint attention from [61] to focus on graph nodes that undergo genuine changes. Specifically, since the graph itself is a one-dimensional sequence, we use a convolutional operation to generate the query, key, and value for the graph representations of temporal images 1 and 2, denoted as $Q_t$, $K_t$, and $V_t$ ($t \in \{1, 2\}$). Note that the channel dimension of $Q_t$ is halved. Subsequently, we concatenate $Q_1$ and $Q_2$ to obtain the joint query $\hat{Q}$, and through sequential matrix multiplication and Softmax we obtain the similarity matrices between $\hat{Q}$ and $K_t$ (where $t \in \{1, 2\}$). As $\hat{Q}$ is a joint query from both temporal phases, it enables dual temporal interaction among graph nodes. Mathematically, JointAtt can be expressed as $\mathrm{JointAtt}(\hat{Q}, K_t, V_t) = \mathrm{Softmax}\big(\hat{Q} K_t^{T} / \sqrt{d}\big) V_t$, where $d$ is the channel dimension.
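A minimal PyTorch sketch of this joint attention over the 1D node sequences; single-head attention, the scaling factor, and the layer names are our assumptions rather than the paper's exact implementation:

```python
import torch
import torch.nn as nn

class JointAttention(nn.Module):
    """Joint attention over bi-temporal graph nodes: queries from both
    temporal phases are concatenated into one joint query."""
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Conv1d(dim, dim // 2, 1)  # halved so cat(Q1, Q2) matches K
        self.to_k = nn.Conv1d(dim, dim, 1)
        self.to_v = nn.Conv1d(dim, dim, 1)
        self.scale = dim ** -0.5

    def forward(self, g1, g2):                   # g1, g2: (B, C, N) node features
        q = torch.cat([self.to_q(g1), self.to_q(g2)], dim=1)  # joint query (B, C, N)
        outs = []
        for g in (g1, g2):                       # attend to each temporal phase
            k, v = self.to_k(g), self.to_v(g)
            attn = torch.softmax(q.transpose(1, 2) @ k * self.scale, dim=-1)
            outs.append((attn @ v.transpose(1, 2)).transpose(1, 2))
        return outs                              # updated nodes for temporal 1 and 2
```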
- Graph Convolution: The architecture of the graph convolution unit is illustrated in Figure 6b, consisting of two 1D convolutional layers that operate independently on the channel and node dimensions. The final output can be expressed as $Z = \big((I - A) X\big) W$. Here, $I$ denotes the identity matrix, $A$ represents the adjacency matrix, $X$ is the input node feature matrix, and $W$ denotes the learnable parameters of the convolutional layer. $A$ and $W$ are randomly initialized and updated by gradient descent during training.
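Under the formula above, the unit can be sketched as follows; the layer ordering and the ReLU are assumptions, with the propagation pattern following Li and Gupta [54]:

```python
import torch.nn as nn

class GraphConvUnit(nn.Module):
    """Z = ((I - A) X) W realized with two 1D convolutions: one over the
    node dimension (playing the role of A) and one over channels (W)."""
    def __init__(self, channels: int, nodes: int):
        super().__init__()
        self.node_conv = nn.Conv1d(nodes, nodes, 1)            # learns A implicitly
        self.channel_conv = nn.Conv1d(channels, channels, 1)   # state update W
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                                      # x: (B, C, N)
        # Laplacian-style propagation (I - A) along the node axis.
        h = x - self.node_conv(x.transpose(1, 2)).transpose(1, 2)
        return self.relu(self.channel_conv(h))
```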
3.3.4. Graph Reprojection
3.4. Feature Pyramid Decoder
3.5. Convolutional Block Attention Module (CBAM)
3.6. Dual Temporal Transformer
3.6.1. Tokenizer
3.6.2. DTT Encoder
- 1. Self-Attention in Transformer
- 2. The Dual Temporal Attention in DTT Encoder
- Situation of unchanged: As shown in Figure 12a, suppose token $i$ represents the same region in both temporal images, and token $i$ at temporal 2 is similar to token $j$ at temporal 1. The $QK^{T}$ term in the transformer can be intuitively understood as calculating the similarity between two tokens; thus, we have $q_j^1 \cdot k_i^2 = a$, where $a$ is a relatively large positive value. It can also be obtained that $q_j^1 \cdot k_i^1 \approx a$, indicating that token $i$ in temporal 1 is similar to token $j$. At the same time, we have $q_j^1 \cdot k_i^2 - q_j^1 \cdot k_i^1 \approx 0$. Through transitive similarity, it can be inferred that token $i$ is similar in both temporal images; in the context of bi-temporal images, this implies that the region represented by token $i$ has remained unchanged. Consequently, when the attention output of token $j$ is computed by weighted summation, the feature component from token $i$ is suppressed.
- Situation of changed: As shown in Figure 12b, assuming that tokens $i$ and $j$ of temporal image 1 are dissimilar, we have $q_j^1 \cdot k_i^1 = -a$. If $q_j^1 \cdot k_i^2 = a$, i.e., token $i$ at temporal 2 is similar to token $j$ at temporal 1, this indicates that the region represented by token $i$ has changed. At the same time, we have $q_j^1 \cdot k_i^2 - q_j^1 \cdot k_i^1 = 2a$, a large positive value. This means that when computing the attention output of token $j$, the features of token $i$ are strengthened to highlight changed regions. In contrast, for the self-attention in Equation (11), the tokens representing changed and unchanged regions are treated equally when calculating the attention output, which is not conducive to highlighting features of changed regions while suppressing features of unchanged ones.
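One way to realize the behaviour argued in the two situations above is to score each key token by its cross-temporal similarity minus its intra-temporal similarity, so tokens of unchanged regions (where the two similarities coincide) are suppressed and changed tokens strengthened. The sketch below follows this reading; it is our interpretation with single-head linear projections, not the authors' verbatim formulation:

```python
import torch
import torch.nn as nn

class DualTemporalAttention(nn.Module):
    """Attention whose scores contrast cross- and intra-temporal similarity,
    so changed tokens dominate the weighted summation."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, t1, t2):                    # t1, t2: (B, N, C) token sequences
        q1 = self.q(t1)
        cross = q1 @ self.k(t2).transpose(1, 2)   # similarity across time
        intra = q1 @ self.k(t1).transpose(1, 2)   # similarity within temporal 1
        attn = torch.softmax((cross - intra) * self.scale, dim=-1)
        return attn @ self.v(t2)                  # unchanged tokens score ~0 and are suppressed
```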
3.6.3. DTT Decoder
3.7. Loss Function
3.7.1. Focal Loss
3.7.2. Dice Loss
3.7.3. Contrastive Loss
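Sections 3.7.1–3.7.3 name the three loss terms; a hedged sketch of how such an objective can be combined is given below. The equal weighting of the focal and dice terms, the γ/α/margin defaults, and the feature maps fed to the contrastive term are assumptions; λ weights the contrastive term, following the parameter analysis in Section 5.2:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0, alpha=0.25):
    """Binary focal loss [63]; gamma/alpha defaults follow the original paper."""
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p_t = torch.exp(-bce)                       # probability of the true class
    a_t = alpha * target + (1 - alpha) * (1 - target)
    return (a_t * (1 - p_t) ** gamma * bce).mean()

def dice_loss(logits, target, eps=1.0):
    """Dice loss on the predicted change probability map."""
    p, t = torch.sigmoid(logits).flatten(1), target.flatten(1)
    return 1 - ((2 * (p * t).sum(1) + eps) / (p.sum(1) + t.sum(1) + eps)).mean()

def contrastive_loss(f1, f2, target, margin=2.0):
    """Pull unchanged-pixel features together, push changed ones apart [67]."""
    d = torch.norm(f1 - f2, dim=1)              # per-pixel feature distance (B, H, W)
    return ((1 - target) * d.pow(2) + target * F.relu(margin - d).pow(2)).mean()

def total_loss(logits, f1, f2, target, lam=0.5):
    # logits/target: (B, 1, H, W); f1, f2: (B, C, H, W). lam = 0.5 is the
    # best-performing weight in the parameter analysis of Section 5.2.
    return (focal_loss(logits, target) + dice_loss(logits, target)
            + lam * contrastive_loss(f1, f2, target.squeeze(1)))
```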
4. Results
4.1. Description of Datasets
- WHU-CD [64] is a public building CD dataset from Christchurch, New Zealand, with a spatial size of 32,507 × 15,354 pixels at a resolution of 0.2 m. It comprises images in the red (R), green (G), and blue (B) bands. To facilitate efficient handling, we divided the large image into non-overlapping slices of 256 × 256 pixels, yielding 6096/762/762 training/validation/test samples, respectively.
- LEVIR-CD [65] consists of 637 very high-resolution (VHR, 0.5 m/pixel) Google Earth image patch pairs with a size of 1024 × 1024 pixels. These bi-temporal images, with a time span of 5 to 14 years, contain significant land-use changes, especially construction growth. LEVIR-CD covers various types of buildings, such as villa residences, tall apartments, small garages, and large warehouses. We followed the default configuration and partitioned the input images into smaller patches of 256 × 256 pixels, splitting the dataset into 7120 image pairs for training, 1024 for validation, and 2048 for testing.
- CDD [66] is a dataset of 11 pairs of multispectral images for remote sensing change detection. It contains seven pairs of season-varying images with a dimension of 4725 × 2700 pixels and four pairs of images with a dimension of 1900 × 1000 pixels. The spatial resolution of these images varies from 3 to 100 cm per pixel. The authors divided the image pairs into non-overlapping patches of 256 × 256 pixels, obtaining 15,998 pairs of bi-temporal remote sensing images, which were split into training, validation, and test sets of 10,000, 2998, and 3000 pairs, respectively.
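All three datasets are consumed as non-overlapping tiles; below is a preprocessing sketch under the 256 × 256 patch size assumed above (boundary handling and naming are ours, not the authors' script):

```python
import numpy as np

def tile_pairs(img_a, img_b, mask, patch=256):
    """Cut a co-registered bi-temporal pair and its change mask into
    non-overlapping patch x patch tiles, dropping ragged borders."""
    h, w = mask.shape[:2]
    tiles = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            window = (slice(y, y + patch), slice(x, x + patch))
            tiles.append((img_a[window], img_b[window], mask[window]))
    return tiles

# Usage: tiles = tile_pairs(t1_scene, t2_scene, gt_mask); the resulting list
# is then shuffled and split into training/validation/test subsets.
```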
4.2. Metrics
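The tables below report precision, recall, F1, intersection over union (IoU), and overall accuracy (OA) over the binary change map. A sketch of these metrics from the pixel-wise confusion matrix, assuming the standard definitions:

```python
import numpy as np

def change_metrics(pred, gt):
    """Pixel-wise precision/recall/F1/IoU/OA for a binary change map."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)          # changed pixels correctly detected
    fp = np.sum(pred & ~gt)         # false alarms
    fn = np.sum(~pred & gt)         # missed changes
    tn = np.sum(~pred & ~gt)        # unchanged pixels correctly rejected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "iou": tp / (tp + fp + fn),
        "oa": (tp + tn) / (tp + fp + fn + tn),
    }
```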
4.3. Experimental Settings
4.4. Compared Methods
- FC-EF [40]: FC-EF is a change detection network that employs early fusion, based on the U-Net architecture of a fully convolutional network.
- FC-Siam-Conc [40]: FC-Siam-Conc is a variant of FC-EF that employs a late fusion strategy during the decoding stage, using concatenation as the fusion method.
- FC-Siam-Diff [40]: FC-Siam-Diff is another FC-EF variant, using the absolute value of the feature difference as the fusion method.
- STANet [65]: STANet improves remote sensing image change detection by capturing spatial–temporal dependencies at different scales.
- SNUNet [41]: SNUNet is a Unet-type network that introduces dense skip connections.
- DMINet [61]: DMINet is a novel dual temporal change detection network based on joint attention.
- BIT [17]: BIT enhances high-resolution remote sensing change detection by efficiently capturing spatial–temporal contexts through a transformer-based approach.
- ICIF-Net [68]: ICIF-Net is a hybrid network that combines CNN and transformer in parallel, aiming to leverage the respective strengths of CNN and transformer.
- ChangeFormer [20]: ChangeFormer is a transformer-based siamese network that pairs a hierarchical transformer encoder with a lightweight MLP decoder to capture multi-scale long-range details.
4.5. Evaluation Results
4.5.1. Quantitative Comparisons
- On the LEVIR-CD dataset, Table 3 shows that our method achieves precision, recall, and F1 values of 92.45%, 89.83%, and 91.12%, respectively. FC-Siam-Diff achieves the highest F1 score among the first three comparison methods, reaching 85.72%. Compared to the second-best method, ChangeFormer, our approach increases the F1 value by 0.78%.
- On the WHU-CD dataset, our method again shows clear improvement, as reported in Table 4, with precision, recall, and F1 values of 95.51%, 92.80%, and 94.14%, respectively. At the same time, our model has a relatively small number of parameters (4.71 M). Consequently, our approach strikes a commendable balance between model complexity, time cost, and accuracy. In contrast, earlier methods such as FC-EF, FC-Siam-Conc, and FC-Siam-Diff perform less satisfactorily. The results also indicate that, on the WHU-CD dataset, concatenation preserves more useful information for change detection than the difference operation. Moreover, our approach exhibits notable advantages over transformer-based methods, including BIT [17] and ChangeFormer [20], and it also outperforms the hybrid network ICIF-Net [68], which employs a parallel structure of CNN and transformer.
- On the CDD dataset, our method also achieves the best performance, with an F1 score of 95.78%, as shown in Table 5. Because the CDD training set is larger (10,000 image pairs, versus 6096 for WHU-CD and 7120 for LEVIR-CD), neural network models can learn better feature representations, leading to stronger generalization. Consequently, almost all methods improve their F1 scores significantly on the CDD dataset, with transformer-based approaches showing particularly notable gains.
4.5.2. Qualitative Comparisons
- LEVIR-CD dataset: To further demonstrate the effectiveness of our approach, Figure 13 shows the visual results of different methods in three types of regions: isolated regions, dense regions, and large-span regions. In particular, Figure 13(1,2) suggests that our method can effectively capture isolated regions, while many previous methods fail to provide complete segmentation results (d–f). Furthermore, our approach outperforms other methods in detecting dense regions, as shown in Figure 13(3,4); the change maps preserve boundary integrity, with clearer gaps between adjacent objects. Our results are also more consistent than those of other transformer-based methods along the boundaries of large-span areas in complex scenes, as Figure 13(j6,k6,l6) shows. At the same time, some earlier methods (FC-EF, FC-Siam-Conc, FC-Siam-Diff) exhibit not only unclear boundaries but also significant voids in changed regions, as shown in Figure 13d–f. These experiments show that DTT-CGINet performs excellently on the LEVIR-CD dataset, with clear, complete boundaries, sensitivity to small targets, and no holes in large-area detection.
- WHU-CD dataset: Consistent with LEVIR-CD, we selected three region types representing independent, dense, and large-span areas to evaluate our method visually from diverse perspectives. For the independent area in Figure 14(1,2), many previous methods tended to produce more false positives and false negatives. Additionally, previous methods commonly exhibited boundary sticking issues for the dense area in Figure 14(3,4), whereas our results were more accurate. As for the large-span area in Figure 14(5,6), although most methods detected it fairly well, our approach had fewer holes, higher boundary integrity, and clearer boundaries.
- CDD Dataset: We also selected three types of regions (isolated, dense, and large-span) to visually validate our method's performance. Some early methods, such as FC-EF, FC-Siam-Conc, and FC-Siam-Diff, could not detect changes in isolated or dense regions and performed poorly in large-span regions. Later methods, such as STANet and SNUNet, detected isolated-region changes more completely but exhibited sticking in dense regions and unclear, incomplete boundaries across large spans. The latest transformer-based methods provide relatively satisfactory results, but our method outperforms them in maintaining boundary integrity, as demonstrated in Figure 15(m3,m6).
5. Discussion
5.1. Ablation Study on the Network Components
- In Ablation 1, we removed the CGIM to validate its effectiveness. The CGIM constrains graph projection to generate graph nodes within edge regions, yielding well-defined boundary segmentation results. As indicated in Table 6, removing the CGIM decreased all evaluation metrics; in particular, the primary metric, F1, declines by 1.32%. Figure 16(1,2) illustrates the boundary blurring that results from the absence of the CGIM.
- In Ablation 2, to demonstrate the effectiveness of the feature pyramid decoder (FPD) in DTT-CGINet, we removed it. Since the FPD aggregates features from multiple scales, its absence leaves the network applying the CGIM only to the output features of the backbone's layer 3, underutilizing the shallow features from layers 1 and 2. As indicated in the second row of Table 6, removing the FPD degraded overall performance, with the F1 score decreasing by 0.74%. Visually, this impairs the model's ability to detect changes across regions of varying scales: Figure 16(d3) shows many small regions fused together, with the network failing to capture finer local details, and Figure 16(d4) shows missed detections of small objects.
- In Ablation 3, we removed the CBAM submodule. CBAM reweights feature maps along the channel and spatial dimensions, guiding the network to focus on relevant changes and ignore irrelevant ones. Removing it degraded overall performance, as shown in Table 6, with a 0.36% decrease in the F1 score.
- In Ablation 4, we assessed the efficacy of the dual temporal transformer (DTT). As described in Section 3.6, the DTT models non-local structural relationships between bi-temporal images. The fourth row of Table 6 indicates that overall performance declines when the DTT is absent, with the F1 score dropping by 2.67%. Visually, as shown in Figure 16(5,6), the lack of the DTT hinders the network's long-range context modeling, impacting change detection in large-span regions.
5.2. Parameter Analysis of Loss
5.3. Ablation on the CGIM
5.4. Ablation on the Tokenizer
5.5. Ablation Study on Pre-training
5.6. Model Efficiency Analysis
5.7. Visualization of Network
6. Conclusions
Limitation
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Gong, M.; Zhao, J.; Liu, J.; Miao, Q.; Jiao, L. Change detection in synthetic aperture radar images based on deep neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2015, 27, 125–138.
- Radke, R.J.; Andra, S.; Al-Kofahi, O.; Roysam, B. Image change detection algorithms: A systematic survey. IEEE Trans. Image Process. 2005, 14, 294–307.
- Zerrouki, N.; Harrou, F.; Sun, Y.; Hocini, L. A machine learning-based approach for land cover change detection using remote sensing and radiometric measurements. IEEE Sens. J. 2019, 19, 5843–5850.
- Marin, C.; Bovolo, F.; Bruzzone, L. Building change detection in multitemporal very high resolution SAR images. IEEE Trans. Geosci. Remote Sens. 2014, 53, 2664–2682.
- Eismann, M.T.; Meola, J.; Hardie, R.C. Hyperspectral change detection in the presence of diurnal and seasonal variations. IEEE Trans. Geosci. Remote Sens. 2007, 46, 237–249.
- Zhou, J.; Kwan, C.; Ayhan, B.; Eismann, M.T. A novel cluster kernel RX algorithm for anomaly and change detection using hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6497–6504.
- Kwan, C. Methods and challenges using multispectral and hyperspectral images for practical change detection applications. Information 2019, 10, 353.
- Marmanis, D.; Datcu, M.; Esch, T.; Stilla, U. Deep learning earth observation classification using ImageNet pretrained networks. IEEE Geosci. Remote Sens. Lett. 2015, 13, 105–109.
- Shi, Q.; Liu, M.; Li, S.; Liu, X.; Wang, F.; Zhang, L. A deeply supervised attention metric-based network and an open aerial image dataset for remote sensing change detection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5604816.
- Chen, H.; Li, W.; Shi, Z. Adversarial instance augmentation for building change detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5603216.
- Peng, D.; Zhang, Y.; Guan, H. End-to-end change detection for high resolution satellite images using improved UNet++. Remote Sens. 2019, 11, 1382.
- Liu, Y.; Pang, C.; Zhan, Z.; Zhang, X.; Yang, X. Building change detection for remote sensing images using a dual-task constrained deep siamese convolutional network model. IEEE Geosci. Remote Sens. Lett. 2020, 18, 811–815.
- Peng, X.; Zhong, R.; Li, Z.; Li, Q. Optical remote sensing image change detection based on attention mechanism and image difference. IEEE Trans. Geosci. Remote Sens. 2020, 59, 7296–7307.
- Jiang, H.; Hu, X.; Li, K.; Zhang, J.; Gong, J.; Zhang, M. PGA-SiamNet: Pyramid feature-based attention-guided Siamese network for remote sensing orthoimagery building change detection. Remote Sens. 2020, 12, 484.
- Zhang, M.; Xu, G.; Chen, K.; Yan, M.; Sun, X. Triplet-based semantic relation learning for aerial remote sensing image change detection. IEEE Geosci. Remote Sens. Lett. 2018, 16, 266–270.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Chen, H.; Qi, Z.; Shi, Z. Remote sensing image change detection with transformers. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5607514.
- Song, F.; Zhang, S.; Lei, T.; Song, Y.; Peng, Z. MSTDSNet-CD: Multiscale swin transformer and deeply supervised network for change detection of the fast-growing urban regions. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6508505.
- Liu, M.; Chai, Z.; Deng, H.; Liu, R. A CNN-transformer network with multiscale context aggregation for fine-grained cropland change detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 4297–4306.
- Bandara, W.G.C.; Patel, V.M. A transformer-based siamese network for change detection. In Proceedings of the IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 207–210.
- Ding, L.; Guo, H.; Liu, S.; Mou, L.; Zhang, J.; Bruzzone, L. Bi-temporal semantic reasoning for the semantic change detection in HR remote sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5620014.
- Zhou, Y.; Huo, C.; Zhu, J.; Huo, L.; Pan, C. DCAT: Dual Cross-Attention-Based Transformer for Change Detection. Remote Sens. 2023, 15, 2395.
- Xu, C.; Ye, Z.; Mei, L.; Shen, S.; Zhang, Q.; Sui, H.; Yang, W.; Sun, S. SCAD: A Siamese Cross-Attention Discrimination Network for Bitemporal Building Change Detection. Remote Sens. 2022, 14, 6213.
- Wang, K.; Zhang, X.; Lu, Y.; Zhang, X.; Zhang, W. CGRNet: Contour-guided graph reasoning network for ambiguous biomedical image segmentation. Biomed. Signal Process. Control 2022, 75, 103621.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Manas, O.; Lacoste, A.; Giró-i Nieto, X.; Vazquez, D.; Rodriguez, P. Seasonal contrast: Unsupervised pre-training from uncurated remote sensing data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 9414–9423.
- Bourdis, N.; Marraud, D.; Sahbi, H. Constrained optical flow for aerial image change detection. In Proceedings of the 2011 IEEE International Geoscience and Remote Sensing Symposium, Vancouver, BC, Canada, 24–29 July 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 4176–4179.
- Johnson, R.D.; Kasischke, E. Change vector analysis: A technique for the multispectral monitoring of land cover and condition. Int. J. Remote Sens. 1998, 19, 411–426.
- Nielsen, A.A.; Conradsen, K.; Simpson, J.J. Multivariate alteration detection (MAD) and MAF postprocessing in multispectral, bitemporal image data: New approaches to change detection studies. Remote Sens. Environ. 1998, 64, 1–19.
- Nielsen, A.A. The regularized iteratively reweighted MAD method for change detection in multi- and hyperspectral data. IEEE Trans. Image Process. 2007, 16, 463–478.
- Deng, J.; Wang, K.; Deng, Y.; Qi, G. PCA-based land-use change detection and analysis using multitemporal and multisensor satellite data. Int. J. Remote Sens. 2008, 29, 4823–4838.
- Wu, C.; Du, B.; Zhang, L. Slow feature analysis for change detection in multispectral imagery. IEEE Trans. Geosci. Remote Sens. 2013, 52, 2858–2874.
- Lv, P.; Zhong, Y.; Zhao, J.; Zhang, L. Unsupervised change detection based on hybrid conditional random field model for high spatial resolution remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2018, 56, 4002–4015.
- Nemmour, H.; Chibani, Y. Multiple support vector machines for land cover change detection: An application for mapping urban extensions. ISPRS J. Photogramm. Remote Sens. 2006, 61, 125–133.
- Im, J.; Jensen, J.R. A change detection model based on neighborhood correlation image analysis and decision tree classification. Remote Sens. Environ. 2005, 99, 326–340.
- Wessels, K.J.; Van den Bergh, F.; Roy, D.P.; Salmon, B.P.; Steenkamp, K.C.; MacAlister, B.; Swanepoel, D.; Jewitt, D. Rapid land cover map updates using change detection and robust random forest classifiers. Remote Sens. 2016, 8, 888.
- Moser, G.; Angiati, E.; Serpico, S.B. Multiscale unsupervised change detection on optical images by Markov random fields and wavelets. IEEE Geosci. Remote Sens. Lett. 2011, 8, 725–729.
- Ma, B.; Chang, C.Y. Semantic segmentation of high-resolution remote sensing images using multiscale skip connection network. IEEE Sens. J. 2021, 22, 3745–3755.
- Sun, L.; Cheng, S.; Zheng, Y.; Wu, Z.; Zhang, J. SPANet: Successive pooling attention network for semantic segmentation of remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 4045–4057.
- Daudt, R.C.; Le Saux, B.; Boulch, A. Fully convolutional siamese networks for change detection. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 4063–4067.
- Fang, S.; Li, K.; Shao, J.; Li, Z. SNUNet-CD: A densely connected Siamese network for change detection of VHR images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 8007805.
- Zhang, C.; Yue, P.; Tapete, D.; Jiang, L.; Shangguan, B.; Huang, L.; Liu, G. A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images. ISPRS J. Photogramm. Remote Sens. 2020, 166, 183–200.
- Zhang, Y.; Fu, L.; Li, Y.; Zhang, Y. HDFNet: Hierarchical dynamic fusion network for change detection in optical aerial images. Remote Sens. 2021, 13, 1440.
- Huang, J.; Shen, Q.; Wang, M.; Yang, M. Multiple attention Siamese network for high-resolution image change detection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5406216.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010.
- Yin, W.; Kann, K.; Yu, M.; Schütze, H. Comparative study of CNN and RNN for natural language processing. arXiv 2017, arXiv:1702.01923.
- Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270.
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159.
- Ding, M.; Yang, Z.; Hong, W.; Zheng, W.; Zhou, C.; Yin, D.; Lin, J.; Zou, X.; Shao, Z.; Yang, H. CogView: Mastering text-to-image generation via transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 19822–19835.
- Li, Q.; Zhong, R.; Du, X.; Du, Y. TransUNetCD: A hybrid transformer network for change detection in optical remote-sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5622519.
- Pujara, J.; Miao, H.; Getoor, L.; Cohen, W. Knowledge graph identification. In Proceedings of the 12th International Semantic Web Conference (ISWC 2013), Sydney, NSW, Australia, 21–25 October 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 542–557.
- Isinkaye, F.O.; Folajimi, Y.O.; Ojokoh, B.A. Recommendation systems: Principles, methods and evaluation. Egypt. Inform. J. 2015, 16, 261–273.
- Romero, C.; Ventura, S. Data mining in education. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2013, 3, 12–27.
- Li, Y.; Gupta, A. Beyond grids: Learning graph representations for visual recognition. Adv. Neural Inf. Process. Syst. 2018, 31, 9245–9255.
- Liu, Q.; Xiao, L.; Yang, J.; Wei, Z. CNN-enhanced graph convolutional network with pixel- and superpixel-level feature fusion for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 8657–8671.
- Zhang, X.; Tan, X.; Chen, G.; Zhu, K.; Liao, P.; Wang, T. Object-based classification framework of remote sensing images with graph convolutional networks. IEEE Geosci. Remote Sens. Lett. 2021, 19, 8010905.
- Liu, C. Remote Sensing Image Change Detection with Graph Interaction. arXiv 2023, arXiv:2307.02007.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1–9.
- Feng, Y.; Jiang, J.; Xu, H.; Zheng, J. Change detection on remote sensing images using dual-branch multilevel intertemporal network. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4401015.
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Ji, S.; Wei, S.; Lu, M. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans. Geosci. Remote Sens. 2018, 57, 574–586.
- Chen, H.; Shi, Z. A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sens. 2020, 12, 1662.
- Lebedev, M.; Vizilter, Y.V.; Vygolov, O.; Knyaz, V.A.; Rubis, A.Y. Change detection in remote sensing images using conditional adversarial networks. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, 42, 565–571.
- Hadsell, R.; Chopra, S.; LeCun, Y. Dimensionality reduction by learning an invariant mapping. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), New York, NY, USA, 17–22 June 2006; IEEE: Piscataway, NJ, USA, 2006; Volume 2, pp. 1735–1742.
- Feng, Y.; Xu, H.; Jiang, J.; Liu, H.; Zheng, J. ICIF-Net: Intra-scale cross-interaction and inter-scale feature fusion network for bitemporal remote sensing images change detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4410213.
- Selvaraju, R.R.; Das, A.; Vedantam, R.; Cogswell, M.; Parikh, D.; Batra, D. Grad-CAM: Why did you say that? arXiv 2016, arXiv:1611.07450.
Layer Name | Output Size | Details
---|---|---
Conv1 | | stride 2
Max Pooling | | max pool, stride 2
layer1 | |
layer2 | |
layer3 | |
layer4 | |
Upsample | |
Conv2 | | 32, stride 1
Dataset | Pairs | Size (Pixels) | Change Pixels | Change Ratio
---|---|---|---|---
WHU-CD [64] | 1 | 32,507 × 15,354 | 21,352,815 | 4.28%
LEVIR-CD [65] | 637 | 1024 × 1024 | 30,913,975 | 4.63%
CDD [66] | 7 and 4 | 4725 × 2700 and 1900 × 1000 | 9,198,562 and 400,279 | 10.30% and 5.27%
Model | Precision | Recall | F1 | IoU | OA | Params (M) | FLOPs (G) |
---|---|---|---|---|---|---|---|
FC-EF [40] | 86.69 | 77.95 | 82.09 | 77.95 | 98.28 | 1.35 | 3.58 |
FC-Siam-Conc [40] | 84.37 | 81.51 | 82.91 | 70.80 | 98.49 | 1.55 | 5.33 |
FC-Siam-Diff [40] | 89.33 | 82.39 | 85.72 | 75.00 | 98.61 | 1.35 | 4.73 |
STANet [65] | 70.88 | 96.01 | 81.56 | 68.86 | 97.79 | 16.89 | 16.9 |
DMINet [61] | 91.09 | 85.26 | 88.08 | 78.70 | 98.82 | 6.24 | 14.55 |
SNUNet [41] | 91.21 | 86.69 | 88.89 | 78.83 | 98.82 | 12.04 | 54.83 |
BIT [17] | 89.10 | 89.16 | 89.13 | 80.68 | 98.92 | 3.50 | 10.63 |
ICIF-Net [68] | 91.63 | 88.10 | 89.83 | 81.54 | 98.98 | 23.83 | 25.37 |
ChangeFormer [20] | 91.94 | 88.81 | 90.34 | 82.37 | 99.04 | 41.03 | 202.79 |
Ours | 92.45 | 89.83 | 91.12 | 83.50 | 99.12 | 4.71 | 18.42 |
Model | Precision | Recall | F1 | IoU | OA | Params (M) | FLOPs (G) |
---|---|---|---|---|---|---|---|
FC-EF [40] | 77.67 | 77.16 | 77.42 | 63.16 | 98.08 | 1.35 | 3.58 |
FC-Siam-Conc [40] | 36.49 | 82.75 | 50.65 | 40.03 | 93.40 | 1.55 | 5.33 |
FC-Siam-Diff [40] | 45.18 | 82.28 | 58.33 | 41.17 | 94.98 | 1.35 | 4.73 |
STANet [65] | 79.37 | 85.50 | 82.32 | 69.95 | 98.66 | 16.89 | 16.9 |
DMINet [61] | 83.98 | 91.09 | 87.39 | 77.61 | 98.88 | 6.24 | 14.55 |
SNUNet [41] | 91.72 | 86.75 | 89.16 | 80.43 | 99.10 | 12.04 | 54.83 |
BIT [17] | 90.46 | 77.55 | 83.51 | 71.69 | 98.69 | 3.50 | 10.63 |
ICIF-Net [68] | 92.25 | 89.28 | 90.74 | 83.04 | 99.22 | 23.83 | 25.37 |
ChangeFormer [20] | 94.18 | 89.14 | 91.86 | 89.24 | 99.37 | 41.03 | 202.79 |
Ours | 95.51 | 92.80 | 94.14 | 88.93 | 99.51 | 4.71 | 18.42 |
Model | Precision | Recall | F1 | IoU | OA | Params (M) | FLOPs (G) |
---|---|---|---|---|---|---|---|
FC-EF [40] | 88.46 | 49.73 | 63.67 | 46.70 | 92.99 | 1.35 | 3.58 |
FC-Siam-Conc [40] | 89.71 | 58.73 | 70.98 | 55.02 | 94.21 | 1.55 | 5.33 |
FC-Siam-Diff [40] | 90.16 | 51.38 | 65.46 | 48.65 | 93.31 | 1.35 | 4.73 |
STANet [65] | 76.97 | 94.55 | 84.86 | 73.71 | 95.84 | 16.89 | 16.9 |
DMINet [61] | 96.02 | 95.23 | 95.61 | 91.88 | 98.89 | 6.24 | 14.55 |
SNUNet [41] | 94.46 | 89.72 | 92.03 | 85.23 | 98.08 | 12.04 | 54.83
BIT [17] | 95.46 | 90.68 | 93.01 | 86.94 | 98.32 | 3.50 | 10.63 |
ICIF-Net [68] | 95.04 | 93.79 | 94.41 | 89.41 | 98.03 | 23.83 | 25.37 |
ChangeFormer [20] | 95.47 | 94.31 | 94.88 | 90.27 | 98.74 | 41.03 | 202.79 |
Ours | 96.79 | 94.78 | 95.78 | 91.90 | 98.92 | 4.71 | 18.42 |
Model | Precision | Recall | F1 |
---|---|---|---|
No CGIM | 90.19 | 89.43 | 89.80 |
No FPD | 91.35 | 89.43 | 90.38 |
No CBAM | 91.28 | 90.24 | 90.76
No DTT | 90.47 | 86.51 | 88.45
Ours | 92.45 | 89.83 | 91.12 |
λ | Precision | Recall | F1
---|---|---|---
0 | 95.18 | 91.33 | 93.31 |
0.1 | 95.52 | 91.07 | 93.54 |
0.3 | 95.94 | 91.63 | 93.74 |
0.5 | 95.67 | 92.80 | 94.14 |
0.7 | 95.51 | 91.25 | 93.63 |
1 | 95.48 | 91.66 | 93.79 |
Number | Precision | Recall | F1 | Params (M) | FLOPs (G) |
---|---|---|---|---|---|
0 | 90.25 | 89.28 | 89.76 | 4.41 | 18.21 |
1 | 91.19 | 89.45 | 90.40 | 4.62 | 18.26 |
2 | 91.83 | 89.94 | 90.72 | 4.67 | 18.31 |
3 | 92.45 | 89.83 | 91.12 | 4.71 | 18.42 |
Nodes per Scale | Precision | Recall | F1 | Params (M) | FLOPs (G)
---|---|---|---|---|---
(16,16,16) | 90.25 | 89.28 | 89.76 | 4.56 | 18.39 |
(64,36,16) | 92.45 | 89.83 | 91.12 | 4.71 | 18.42 |
(64,64,64) | 91.83 | 89.94 | 90.88 | 4.93 | 18.66 |
Number | Precision | Recall | F1 |
---|---|---|---|
0 | 92.62 | 88.79 | 90.66 |
2 | 92.53 | 89.16 | 90.81 |
4 | 92.45 | 89.83 | 91.12 |
8 | 91.71 | 89.85 | 90.77 |
Number | Precision | Recall | F1 |
---|---|---|---|
0 | 92.13 | 91.80 | 92.95 |
2 | 95.63 | 92.07 | 93.82 |
4 | 95.51 | 92.80 | 94.14 |
8 | 95.08 | 91.65 | 93.33 |