DAAINet: Domain Adversarial Anti-Interference Network for Bi-Temporal Change Detection

Yang, Jiyuan; Gao, Kun; Hu, Baiyang; Zhang, Zefeng; Wang, Jingyi; He, Yuqing; Feng, Yunpeng

doi:10.3390/rs18101656


DAAINet: Domain Adversarial Anti-Interference Network for Bi-Temporal Change Detection †

by Jiyuan Yang ¹, Kun Gao ¹, Baiyang Hu ¹, Zefeng Zhang ¹, Jingyi Wang ¹, Yuqing He ²,* and Yunpeng Feng ²

¹ National Key Laboratory on Near-Surface Detection, Beijing Institute of Technology, Beijing 100072, China

² Key Laboratory of Photoelectronic Imaging Technology and System, Ministry of Education of China, Beijing Institute of Technology, Beijing 100081, China

* Author to whom correspondence should be addressed.

† This paper was supported by Beijing Nova Program under Grant No. 20240484543, the National Natural Science Foundation of China under Grant 62575028 and the National Key R&D Program of China under Grant 2022YFC3301602.

Remote Sens. 2026, 18(10), 1656; https://doi.org/10.3390/rs18101656

Submission received: 7 April 2026 / Revised: 10 May 2026 / Accepted: 11 May 2026 / Published: 21 May 2026


Highlights

What are the main findings?

We developed DAAINet, which utilizes domain adversarial learning and GCN-based semantic interaction to mitigate pseudo-changes.
We introduced the CICD dataset for cloud interference, where our method achieved a 5.67% improvement in IoU and 1.42% in recall.

What are the implications of the main findings?

The DAAINet framework provides a practical paradigm for achieving robust change detection in complex, real-world atmospheric conditions.
The CICD dataset establishes a new benchmark for evaluating the interference resistance of future remote sensing algorithms.

Abstract

Bi-temporal change detection (CD) in remote sensing (RS) aims to map image pairs acquired at different times into a shared feature space so that changed regions can be discriminated effectively. However, factors such as cloud interference can disrupt the feature distribution of RS images and cause pseudo-changes, and existing public CD datasets pay little attention to such phenomena. To address pseudo-changes in CD applications, we propose a Domain Adversarial Anti-Interference Change Detection Network (DAAINet). DAAINet uses ResNet to extract multi-scale features from the input images and introduces a domain adversarial structure to align the feature spaces of the two RS images. The resulting semantic features are soft-clustered and fed into a graph convolution module, where the context associated with each node is used to predict the adjacency relationships between objects. We also collected data and constructed a real-world dataset called “Cloud Interference Change Detection” (CICD), which focuses on real bi-temporal remote sensing image pairs containing cloud interference and includes pseudo-changes caused by factors such as temporary objects and illumination changes. Experimental results demonstrate that our method is more robust and efficient than other state-of-the-art methods on two public CD datasets, and that it achieves state-of-the-art performance on the noise-corrupted CICD dataset, surpassing prior methods by up to 5.67 percentage points in IoU and 1.42 percentage points in recall.

Keywords:

change detection; graph convolutional networks; convolutional neural network; domain adaptation; remote sensing; pseudo-change; cloud interference

1. Introduction

Bi-temporal change detection for remote sensing images generates change maps by comparing and analyzing two images acquired at different times, and it constitutes a vital task within the domain of remote sensing. It is commonly applied in disaster assessment, urban design, agricultural planning, geographic surveys, and river monitoring [1,2,3]. Although deep learning has made considerable progress in analyzing large-scale, high-resolution remote sensing images [4,5,6], bi-temporal CD still faces a major challenge: interfering factors can give rise to pseudo-changes, in which unchanged areas are misclassified as changed. Common causes of pseudo-change include temporary objects, shadow variations, and cloud interference. As shown in Figure 1a, these interferences cause shadow and projection differences in unchanged areas to be erroneously detected as changes [7,8,9]. Among these, pseudo-change caused by thin cloud interference is a particularly noteworthy problem [10]. Moreover, existing remote sensing CD datasets generally do not focus on the effects of thin cloud interference, so CD models trained on them generally fail to account for its influence.

Conventional change detection methods produce difference maps by examining variations in the spectral information of RS images at the pixel level, and then obtain change maps through thresholding or clustering. Widely adopted techniques include principal component analysis (PCA) [11], change vector analysis (CVA) [12], and support vector machine methods [13]. However, these methods mainly focus on the spectral changes of individual pixels and ignore the spatial context between pixels. They are also sensitive to the subjective selection of empirical thresholds and lack robustness.
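To make the pixel-level pipeline concrete, the following is a minimal sketch of change vector analysis followed by a fixed threshold. The function name `cva_change_map`, the toy 4 × 4 three-band images, and both threshold values are illustrative assumptions rather than anything taken from the cited methods.

```python
import numpy as np

def cva_change_map(img_t1, img_t2, threshold):
    """Per-pixel magnitude of the spectral difference vector, then a fixed threshold."""
    diff = img_t2.astype(np.float64) - img_t1.astype(np.float64)
    magnitude = np.sqrt((diff ** 2).sum(axis=-1))   # shape (H, W)
    return magnitude, magnitude > threshold

# Toy bi-temporal pair with 3 spectral bands (hypothetical values).
t1 = np.zeros((4, 4, 3))
t2 = np.zeros((4, 4, 3))
t2[1:3, 1:3, :] = 3.0   # a genuine change covering 4 pixels
t2[0, 0, :] = 2.0       # a brightness shift on one unchanged pixel (e.g. haze)

magnitude, change = cva_change_map(t1, t2, threshold=4.0)
_, loose_change = cva_change_map(t1, t2, threshold=3.0)
```

With the stricter threshold only the four genuinely changed pixels are flagged, whereas the looser threshold also flags the brightened pixel. This is the threshold sensitivity noted above, and the extra pixel is a small instance of a pseudo-change.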

Deep learning-based algorithms exhibit superior performance in comparison to traditional methods that rely on handcrafted features, primarily because they possess the capability to automatically learn discriminative features from large-scale, high-quality datasets. Among these, algorithms based on deep Convolutional Neural Networks (CNN) show particularly strong results. CNNs are widely used in CD for extracting discriminative features, encompassing classical CNN architectures and their extended versions, such as ResNet [14], U-Net [15], and HRNet [16]. The CNN model captures the spatial distribution and structure of ground objects in an image through convolution operations, and can abstractly represent the information in the image layer by layer. However, the CNN model is limited by a fixed receptive field, which leads to insufficient response to changes in ground objects of different scales and difficulty in distinguishing some complex changes of ground objects [17]. After adding the attention mechanism, the model can more effectively focus on areas with important information in the image [18]. It can adaptively learn the importance of different parts of the original input image, thereby alleviating the limitation of the fixed receptive field inherent in the pure CNN method. Therefore, in recent years, numerous studies have incorporated attention mechanisms into change detection tasks to enhance detection accuracy. For example, Liu et al. [19] designed feature exchange and channel attention modules to effectively simulate contextual information in bi-temporal images. Feng et al. [20] introduced a joint attention module by combining self-attention and cross-attention to guide the global feature distribution of the input and promote information coupling. However, CNNs often struggle to model long-range dependencies and contextual relationships between pixels due to their inherent locality constraints.

The Transformer [21], initially developed for natural language processing, has drawn significant attention within the computer vision community. Compared with pure CNN-based models, transformers are powerful at modeling long-distance dependencies [22]: they can capture global context information and process features that are related over long distances. This development has greatly promoted the progress of CD algorithms. For example, Chen et al. [23] proposed a dual spatiotemporal image transformer that uses an encoder to capture spatiotemporal context and a decoder to refine the original features, thereby promoting the effective use and modeling of spatial context information and significantly improving the effectiveness of CD. Zhang et al. [24] designed a U-shaped transformer framework based on the Swin Transformer as the basic unit. The model processes bi-temporal images through an encoder to extract multi-scale features, and restores detailed change information through a decoder, thereby obtaining more accurate CD results. A joint CNN–Transformer model can integrate the advantages of the two architectures and effectively handle the multi-scale changes of ground objects, since it captures local details while considering global correlations [18,21,22]. Although the CNN–Transformer model has shown excellent performance in CD tasks, it still has shortcomings. In particular, the increasing complexity of ground objects in bi-temporal remote sensing images, coupled with style changes such as shadow changes, temporary objects, and cloud occlusions, may lead to inconsistent feature representations of the same object, and previous CNN–Transformer models have difficulty in effectively alleviating the pseudo-changes caused by these differences in feature distribution.

GCN-based change detection networks have also made increasing progress recently [25]. These approaches are capable of modeling relationships among pixels in remote sensing images, enabling the extraction of more complete feature information. Leveraging graph-based feature representations allows for the exploration of semantic similarities among unlabeled pixels [26]. The quadratic growth of graph computation with respect to node numbers often restricts their application to small-scale scenes or necessitates simplified graph constructions. Because graph operations are computationally expensive, many studies employ graph convolution merely as an auxiliary feature extractor to improve information interaction between bi-temporal images [27]. In particular, within fully supervised change detection, achieving an appropriate trade-off between model complexity and detection performance remains a key challenge when applying graph convolution.

Despite the remarkable progress of CNN, Transformer, and GCN-based CD methods, an implicit assumption is commonly shared by most existing works: the bi-temporal images can be mapped into a naturally consistent feature space through feature extraction and interaction alone. In other words, these methods focus primarily on cross-temporal feature fusion while overlooking a more fundamental issue—the potential distribution discrepancy between features extracted from images acquired at different times. However, in real-world remote sensing scenarios, this assumption is frequently violated. Factors such as cloud occlusion, shadow variation, illumination changes, and temporary objects introduce significant perturbations to the visual appearance of the scene without altering the actual land-cover semantics. As a consequence, the feature distributions of bi-temporal images may exhibit noticeable divergence even before feature interaction is performed, as shown in Figure 1b. This phenomenon leads to what is commonly observed as “pseudo-change”, where models mistakenly interpret domain-induced feature shifts as semantic changes. Unlike most existing change detection methods that directly model cross-temporal feature interaction, this work starts from a different observation: pseudo changes caused by clouds, shadows, and transient objects are essentially the result of domain-induced feature distribution discrepancies between bi-temporal images. If features from two temporal phases are not aligned into a consistent representation space beforehand, subsequent interaction mechanisms (e.g., attention, GCN, or transformer-based modeling) may unintentionally amplify these discrepancies and misinterpret them as real changes.

We argue that pseudo-change is not merely a feature representation problem, but essentially a bi-temporal covariate domain shift problem. Formally, although the conditional relationship between the scene and its change label remains invariant, i.e., $P(Y \mid X_{1}) = P(Y \mid X_{2})$, the marginal distributions of the inputs differ significantly, i.e., $P(X_{1}) \neq P(X_{2})$. Existing CD frameworks, regardless of whether they employ CNNs, Transformers, or GCNs, rarely address this discrepancy explicitly. Instead, they attempt to enhance cross-temporal interaction under the implicit assumption that the feature spaces are already aligned. This observation motivates a fundamentally different perspective for change detection under interference conditions: before performing cross-temporal feature interaction, it is necessary to first align the feature distributions of bi-temporal images to suppress domain-induced variations.

Domain adaptation is widely used in remote sensing image processing to align the feature spaces between different domains—such as images from different sensors, locations, or resolutions—so that models trained on one domain can perform well on another. Techniques include joint distribution adaptive-alignment methods, Maximum Mean Discrepancy (MMD) methods, and adversarial learning methods. These domain adaptation strategies significantly improve the transferability and generalization of remote sensing models across diverse and challenging scenarios. Adversarial learning methods, such as Generative Adversarial Networks (GANs), make it difficult to distinguish between the source and target domain features, thereby enabling more seamless feature mapping. In this work, we first introduce domain adversarial learning into the change detection pipeline to explicitly minimize the distribution discrepancy between features extracted from two temporal phases. Unlike conventional feature alignment or attention-based fusion strategies, our approach treats the bi-temporal images as samples drawn from different domains and performs adversarial domain alignment guided by clustering-based domain discovery.

Furthermore, we observe that once domain-induced variations are suppressed, semantic reasoning across temporal images becomes more reliable. To this end, we project the aligned feature maps into a compact graph space via soft semantic clustering, where graph convolution and inter-graph interaction (GFIM) are employed to model non-local semantic dependencies efficiently. This design not only reduces computational complexity compared to pixel-level graph reasoning, but also enhances robustness to imaging disturbances by operating at a higher semantic abstraction level. The Boundary Feature Module (BFM) further refines spatial details by integrating shallow texture cues with semantic reasoning. To validate the above hypothesis that pseudo-change originates from domain shift, we construct a new real-world dataset named “Cloud Interference Change Detection (CICD)”. Unlike existing benchmarks that mainly focus on seasonal or structural variations, CICD explicitly contains cloud occlusion, shadows, illumination differences, and temporary objects, providing a dedicated testbed for evaluating domain-robust change detection methods.

Therefore, instead of strengthening cross-temporal interaction first, we propose to mitigate domain discrepancy prior to interaction. This design philosophy forms the core of the proposed DAAINet.

In summary, this paper makes the following contributions:

1. We propose DAAINet, a framework that prioritizes bi-temporal feature distribution alignment through domain adversarial learning guided by clustering-based pseudo-domain discovery. This alignment ensures that subsequent cross-temporal interaction operates on domain-consistent representations, effectively suppressing interference-induced false responses.
2. We design a sequential pipeline of alignment, interaction, and refinement, where semantic graph interaction (GFIM) and boundary correction (BFM) are performed only after domain alignment, forming a logically consistent mechanism to reduce pseudo changes.
3. We construct the CICD dataset with diverse interference conditions to specifically evaluate domain-induced pseudo changes. Extensive experiments demonstrate that the proposed framework achieves superior robustness in interference-rich scenarios while maintaining competitive performance on standard benchmarks, validating the effectiveness of the proposed domain alignment perspective.

2. Related Work

With the rapid development of deep learning in CD, most mainstream approaches have predominantly focused on feature learning. Early studies relied on conventional CNN models to extract local features and perform feature fusion. Subsequently, attention mechanisms were introduced to enhance feature representations, followed by the integration of GCNs and transformers to capture global features and model long-range dependencies. More recently, domain adaptation (DA) strategies have been employed to alleviate distribution shifts and reduce artifacts induced by modality differences or other interference factors. This section reviews related works from four perspectives: CNN-based models, GCN-based models, Transformer-based models, and DA-based models.

2.1. CNN-Based Methods

CNN-based models exploit deeper architectures and multi-scale information to effectively capture local spatial and texture details of RS images [24,28,29]. Residual networks, by introducing skip connections, facilitate direct information propagation across layers, enabling deeper architectures while mitigating the vanishing gradient problem. Ding et al. [30] incorporated a deep supervision module to strengthen the feature extraction capability of middle and deep layers. To better exploit deep feature information, Zhang et al. [31] proposed a deep supervised fusion network with a dual-stream architecture, followed by a differential discriminant network for CD. Li et al. [32] leveraged deep contextual features to enhance the differentiation of change regions. Addressing the limited rotation invariance of CNNs, Mei et al. [33] designed cyclic polar coordinate convolutional layers to model band correlations and rotational invariance in feature learning.

Moreover, Ma et al. [34] proposed a spectral correlation-based band selection algorithm to improve classification performance. Multi-scale feature fusion has been shown to be particularly effective, as it enables adaptation to objects of varying sizes and enlarges the receptive field, thereby improving the discrimination of complex surface structures and subtle changes in RS images. Lv et al. [35] integrated cross-layer connections into a UNet framework to fuse multi-scale and multi-level information, while Peng et al. [36] developed UNet++ to generate higher-precision feature maps by combining global and fine-grained features. Shuai et al. [37] further incorporated graph convolutional operations with attention mechanisms to capture multi-scale neighborhood relationships. Overall, CNNs demonstrate strong capabilities in extracting local details and subtle changes, making them highly effective for CD. However, their limited ability to model long-range dependencies and contextual interactions has motivated the integration of transformer architectures to enhance global feature representation.

2.2. GCN-Based Methods

GCN-based approaches [38] have attracted increasing attention due to their suitability for modeling non-Euclidean data, making them well-suited for semantic reasoning in vision tasks. By capturing long-range pixel dependencies, non-local networks expanded the receptive field and demonstrated effectiveness in tasks such as video classification and object detection [31]. GraphFCN [39] constructed a graph structure from intermediate feature layers, transforming pixel-level classification into node-level classification. DGCNet [40] modeled spatial and channel-wise relationships using projection strategies, enabling contextual interactions across long distances. BI-GCN [41] jointly exploited regional and boundary information, enhancing semantic reasoning at boundaries through boundary-aware correlation modeling.

In semi-supervised and unsupervised CD, DCVA [25] utilized GCNs to efficiently exploit sparse labeled samples from large unlabeled datasets, capturing the spatial as well as spectral correlations contained in bi-temporal images. GMCD [38] further captured both short- and long-range contextual patterns at the feature-map level, while FD-MCD [42] introduced Fourier-domain frequency consistency measures and non-local structural graphs to generate more robust difference maps. These studies highlight the strength of GCNs in modeling structural relationships and contextual dependencies, offering complementary advantages to CNN-based approaches.

2.3. Transformer-Based Methods

Transformer architectures [21] have gained prominence in CD due to their ability to model long-range dependencies, overcoming the local receptive field constraints of CNNs through global self-attention [23]. Specifically, Chen et al. proposed the Bitemporal Image Transformer (BIT) to model spatiotemporal context using compact semantic tokens. Bandara and Patel [28] further introduced ChangeFormer, a pure Siamese transformer network that achieves competitive performance without a CNN backbone.

To balance performance and efficiency, window-based and hybrid strategies have emerged. SwinSUNet [24] leverages shifted-window attention within a U-shaped framework to reduce computational complexity. Alternatively, hybrid models like TransUNetCD [43] and Hybrid-TransCD [44] integrate CNN-based local extraction with transformer-based global modeling. Similar architectural synergies are seen in Siamese Swin-UNet [45] and the pyramid-based Fully Transformer Network (FTN) [46].

2.4. Domain Adaptation Methods

Domain adaptation (DA) has been widely explored to address domain shift between source and target datasets [47]. Early shallow methods focused on statistical alignment, such as Maximum Mean Discrepancy (MMD) for kernel-space divergence reduction and Transfer Component Analysis (TCA) for domain-invariant subspace learning. While theoretically sound, these approaches were often limited in handling nonlinear shifts.
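As a rough illustration of the statistical alignment idea, the sketch below computes a biased estimate of the squared MMD between two feature sets with an RBF kernel. The kernel bandwidth `gamma`, the synthetic Gaussian features, and the function names are assumptions chosen for the example, not settings used in this paper.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Pairwise RBF kernel exp(-gamma * ||a_i - b_j||^2) for rows of a and b."""
    sq_dist = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy between samples x and y."""
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(200, 8))     # features from one domain
same = rng.normal(0.0, 1.0, size=(200, 8))      # second sample, same distribution
shifted = rng.normal(1.0, 1.0, size=(200, 8))   # mean-shifted domain

mmd_same = mmd2(clean, same, gamma=0.1)
mmd_shift = mmd2(clean, shifted, gamma=0.1)
```

The estimate stays close to zero when both samples come from the same distribution and grows when one of them is shifted, which is why it can serve as an alignment penalty.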

With the advent of deep learning, DA research shifted towards representation learning. A seminal work, Domain-Adversarial Neural Networks (DANN) [48], employed adversarial training with a gradient reversal layer to align distributions. Reconstruction-based methods [49] simultaneously performed supervised learning on the source domain and reconstruction on the target domain, enabling shared feature representations. Discrepancy-based methods [50] aligned higher-order statistics for computational efficiency. Meanwhile, self-training and semi-supervised DA strategies leveraged unlabeled target data. Recently, contrastive learning has emerged to enhance cross-domain feature discriminability by constructing positive and negative pairs.

More recently, BAN [51] introduces a bi-temporal adapter network aiming to extract the knowledge of foundation models for CD. UniCD [52] integrates supervised, weakly-supervised, and unsupervised CD into a coupled multi-branch architecture, utilizing semantic priors and representation regularization to bridge the gap between different annotation granularities. Despite these advances, challenges remain in open-set and multi-source DA. Several CD methods [53,54,55,56] have incorporated DA strategies and achieved promising performance. Nevertheless, additional efforts are required to design DA frameworks that offer stronger robustness and improved generalization for change detection tasks.

3. Methodology

In real-world remote sensing, domain discrepancies often exist between training and testing data [57]. Specifically, under interference conditions, models are typically trained on clean bi-temporal image pairs. Such discrepancy primarily alters the input distribution $P(X)$, while the underlying mapping from input clean images $X_{1}$ and interference images $X_{2}$ to change labels $Y$ remains consistent. Formally, this corresponds to a covariate shift scenario, defined as:

$$P(X_{1}) \neq P(X_{2}), \tag{1}$$

$$P(Y \mid X_{1}) = P(Y \mid X_{2}). \tag{2}$$

In this case, various interference sources cause a significant shift in the feature distribution of the visual inputs, without altering the intrinsic semantic relationship between the observed scene and its change label. Consequently, the model's performance degrades not because the change-detection concept itself has changed, but because of a mismatch in the input domain statistics.

3.1. Dataset Construction

In the domain of remote sensing change detection, most publicly available datasets have primarily focused on factors such as seasonal variations, illumination differences, shadow effects, vegetation dynamics, and architectural diversity. In contrast, the impact of cloud interference—despite its frequent occurrence in optical remote sensing imagery—has received comparatively limited attention. Clouds and their associated shadows often obscure ground objects, distort spectral signatures, and disrupt temporal consistency, thereby introducing significant challenges for accurate change detection.

To address this critical gap, we propose a real-world cloud interference change detection dataset, specifically designed to evaluate algorithmic robustness under complex atmospheric conditions. The dataset is constructed from multi-temporal high-resolution optical imagery and consists of 294 image pairs of $1024 \times 1024$ pixels. It covers diverse land-cover types, including forests, grasslands, rivers, bridges, and residential buildings, ensuring representative coverage of urban, agricultural, and natural environments. Each image pair is accompanied by pixel-level annotations that delineate true surface changes, while explicitly accounting for areas affected by cloud cover and shadows. This enables a more realistic assessment of change detection methods in scenarios where atmospheric perturbations may be mistaken for land-cover transitions. Some examples of images in the dataset are shown in Figure 2. The images are collected from multiple remote sensing competition datasets covering diverse geographic regions, sensors, and acquisition intervals, which helps reduce geographic bias and improve generalization.

Annotation Rules: To ensure that the CICD dataset specifically evaluates the robustness of change detection algorithms against pseudo-change caused by interference, we follow strict annotation principles during label generation.

Only genuine land-cover changes are annotated as positive change regions. Areas affected by cloud occlusion, cloud shadows, illumination variation, seasonal chromatic differences, and temporary objects (e.g., vehicles, construction materials, movable facilities) are explicitly annotated as non-change regions, even though they exhibit significant visual differences between the two temporal images. This annotation strategy forces change detection models to distinguish between true semantic changes and domain-induced visual perturbations, which is the core objective of CICD.

Data Split Protocol: The dataset consists of 294 bi-temporal image pairs with a spatial resolution of 1024 × 1024 pixels. To ensure fair evaluation and reproducibility, we adopt a fixed split protocol:

1. Training set: 204 pairs (70%)
2. Validation set: 30 pairs (10%)
3. Test set: 60 pairs (20%)

All splits are scene-independent, meaning that image pairs from the same geographic region are not shared across different subsets, preventing spatial leakage.
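A scene-independent split of this kind can be sketched as follows. The source does not describe how scenes are identified or assigned, so the `scene_id` field, the greedy assignment rule, and the toy layout of 98 scenes with three pairs each are all hypothetical.

```python
import random
from collections import defaultdict

def scene_independent_split(pairs, ratios=(0.7, 0.1, 0.2), seed=0):
    """Assign whole scenes to train/val/test so that no scene spans two subsets.

    pairs: list of (pair_id, scene_id) tuples.
    """
    by_scene = defaultdict(list)
    for pair_id, scene_id in pairs:
        by_scene[scene_id].append(pair_id)

    scenes = sorted(by_scene)
    random.Random(seed).shuffle(scenes)

    targets = dict(zip(("train", "val", "test"), (r * len(pairs) for r in ratios)))
    splits = {name: [] for name in targets}
    for scene in scenes:
        # Give the scene to the subset that is currently furthest below its target size.
        name = max(splits, key=lambda n: targets[n] - len(splits[n]))
        splits[name].extend(by_scene[scene])
    return splits

# Hypothetical layout: 294 pairs grouped into 98 scenes of 3 pairs each.
pairs = [(i, i // 3) for i in range(294)]
splits = scene_independent_split(pairs)
scene_of = dict(pairs)
scene_sets = {name: {scene_of[p] for p in ids} for name, ids in splits.items()}
```

Because scenes are assigned as indivisible units, the subset sizes only approximate the 70/10/20 targets, but no geographic region can leak from training into validation or test.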

Interference Type Distribution: Each image pair is manually inspected and categorized according to the dominant interference factors. The interference sources in CICD mainly include thin cloud occlusion, cloud shadow, illumination variation, temporary objects, and mixed interference conditions. The statistical distribution of these interference types is reported in Table 1, providing a clearer understanding of the dataset complexity.

Unlike standard benchmarks that focus on clean temporal pairs, CICD is intentionally constructed to include diverse interference conditions collected from multiple remote sensing datasets. This diversity introduces significant domain discrepancies between temporal images, making CICD particularly suitable for evaluating methods designed to handle domain-induced pseudo changes.

3.2. Architecture Overview

As discussed in Section 1, pseudo-change in bi-temporal remote sensing images is mainly caused by domain-induced visual perturbations rather than genuine land-cover variations. Therefore, the core challenge of interference-robust change detection is not merely how to enhance cross-temporal feature interaction, but how to suppress the distribution discrepancy between features extracted from the two temporal images before interaction.

Based on this observation, the proposed DAAINet follows a three-stage design philosophy:

Domain Alignment: reduce the feature distribution gap between bi-temporal images caused by clouds, shadows, and illumination variations;

Semantic Reasoning: perform reliable cross-temporal interaction after domain-induced variations are suppressed;

Boundary Refinement: recover fine spatial details that may be lost during high-level reasoning.

The overall pipeline of DAAINet follows a principle that differs from conventional change detection frameworks. Rather than immediately establishing interaction between bi-temporal features, DAAINet first enforces domain-level alignment to ensure that features extracted from different temporal phases lie in a consistent representation space. Only after this alignment step are cross-temporal interaction and boundary refinement performed. This sequential design is essential for suppressing domain-induced pseudo changes under complex interference conditions.

The overall pipeline of DAAINet is illustrated in Figure 3. Given a pair of bi-temporal images $X_{1}$ and $X_{2}$, they are first passed through a shared backbone to obtain feature maps $F_{1}$ and $F_{2}$. These features are then aligned via a domain adversarial learning branch guided by clustering-based domain discovery. After alignment, the features are projected into a compact graph space, where the Graph Feature Interaction Module (GFIM) performs semantic reasoning across temporal domains. Finally, the Boundary Feature Module (BFM) integrates shallow texture cues with high-level semantics to produce the final change map.

A key question in applying domain adversarial learning to change detection is how to define domain labels. Unlike conventional domain adaptation problems where domain identity is explicitly known (e.g., different sensors or datasets), the domain discrepancy in CICD is implicitly caused by interference factors such as clouds and shadows. Therefore, we employ unsupervised clustering on the feature space to automatically discover latent domain groups that correspond to different interference patterns. The clustering results serve as pseudo domain labels to guide the adversarial learning branch. This design has two important implications: (1) It enables domain alignment without requiring manual domain annotation; (2) It ensures that the adversarial learning specifically targets interference-induced distribution shifts rather than semantic content.
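One way to picture clustering-based domain discovery is the sketch below, which runs plain k-means on globally pooled feature descriptors and uses the cluster indices as pseudo domain labels. The source does not specify the clustering algorithm, the pooling step, or the number of clusters, so k-means, global average pooling, `k=2`, and the synthetic feature maps are all assumptions made for illustration.

```python
import numpy as np

def kmeans_domain_labels(descriptors, k, iters=20, seed=0):
    """Plain k-means on image-level descriptors; cluster ids act as pseudo domain labels."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(iters):
        sq_dist = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = sq_dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):               # keep the old centre if a cluster is empty
                centers[j] = descriptors[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(1)
# Hypothetical backbone outputs, shape (N, C, H, W): 10 "clear" and 10 "hazy" images,
# where the haze is mimicked by a constant offset on every channel.
clear_maps = rng.normal(0.0, 0.1, size=(10, 4, 8, 8))
hazy_maps = rng.normal(0.0, 0.1, size=(10, 4, 8, 8)) + 5.0
feature_maps = np.concatenate([clear_maps, hazy_maps], axis=0)

descriptors = feature_maps.mean(axis=(2, 3))      # global average pooling -> (N, C)
domain_labels, _ = kmeans_domain_labels(descriptors, k=2)
```

In this toy setting the two groups fall into separate clusters without any manual domain annotation, and the resulting labels are what a domain classifier would be trained to predict.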

After domain alignment, the feature distributions of the two temporal images become more consistent. At this stage, cross-temporal interaction becomes more reliable because the model is less likely to confuse domain-induced variations with real changes. Instead of performing interaction at the pixel level, which is sensitive to noise and computationally expensive, we project the aligned features into a graph space via soft semantic clustering. The GFIM operates on these graph nodes to capture non-local semantic dependencies across temporal domains efficiently. High-level graph reasoning may weaken spatial boundary details. Therefore, the BFM is introduced to fuse shallow spatial features with graph-enhanced semantic representations, ensuring accurate boundary localization in the final change map.
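The projection into graph space and the node-level reasoning can be sketched as follows. This is not the authors' GFIM: the softmax over negative squared distances, the similarity-based adjacency, the residual cross-graph step, and all function names are assumptions used to show the general pattern of soft clustering, graph convolution, and reprojection.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def project_to_graph(feat, centers):
    """Soft-assign N pixel features (N, C) to K cluster centres (K, C)."""
    sq_dist = ((feat[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)   # (N, K)
    assign = softmax(-sq_dist, axis=1)                                        # rows sum to 1
    nodes = (assign.T @ feat) / (assign.sum(axis=0)[:, None] + 1e-8)          # (K, C)
    return assign, nodes

def graph_conv(nodes, weight):
    """One graph convolution with a row-normalised similarity adjacency and ReLU."""
    adj = softmax(nodes @ nodes.T, axis=1)                                    # (K, K)
    return np.maximum(adj @ nodes @ weight, 0.0)

def inter_graph(nodes_a, nodes_b):
    """Each node of graph A aggregates the nodes of graph B that it resembles."""
    attn = softmax(nodes_a @ nodes_b.T, axis=1)                               # (K, K)
    return nodes_a + attn @ nodes_b

rng = np.random.default_rng(0)
n_pixels, channels, k = 16 * 16, 8, 4
feat_t1 = rng.normal(size=(n_pixels, channels))   # stand-ins for aligned features
feat_t2 = rng.normal(size=(n_pixels, channels))
centers = feat_t1[rng.choice(n_pixels, size=k, replace=False)]
weight = 0.1 * rng.normal(size=(channels, channels))

assign_t1, nodes_t1 = project_to_graph(feat_t1, centers)
assign_t2, nodes_t2 = project_to_graph(feat_t2, centers)
reasoned_t1 = graph_conv(inter_graph(nodes_t1, nodes_t2), weight)
refined_t1 = assign_t1 @ reasoned_t1              # reproject nodes to pixels, (N, C)
```

In this toy configuration the adjacency is 4 × 4, whereas a pixel-level graph over the same 16 × 16 feature map would need a 256 × 256 adjacency, which illustrates why node-level reasoning is cheaper.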

Feature Extractor: Maps the input data into a specific feature space, enabling the change detector to distinguish between classes in the source domain data while preventing the domain classifier from identifying the data’s origin.

Change Detector: Performs change detection on the input data, aiming to accurately identify changed regions.

Domain Classifier: Classifies the feature-space data, attempting to determine which domain (source or target) the data originates from.

The feature extractor and change detector together form a feedforward neural network. A domain classifier is then appended after the feature extractor, connected via a Gradient Reversal Layer (GRL). During training, for bi-temporal remote sensing images, the network continuously minimizes the change detector’s loss based on the change detection labels of the input data. Simultaneously, guided by the domain labels assigned by the clustering module, the network minimizes the domain classifier’s loss.

The training objective is two-fold: (1) Ensure the change detector is sufficiently robust to accurately identify changed regions. (2) Encourage the feature representations of bi-temporal remote sensing images to be as similar as possible in the feature space, thereby confusing the domain classifier.

Formally, let F denote the feature extractor, C the change detector, and D the domain classifier attached to F through the GRL. For bi-temporal remote sensing images

X_{1}, X_{2} \in R^{H \times W \times C}

, the network optimizes:

min_{F, C} L_{CD} (C (F (X_{1}), F (X_{2})), Y)

(3)

min_{F} max_{D} L_{DA} (D (F (X)), d)

(4)

where

Y

denotes change detection labels and d represents domain labels. The training objective encourages domain-invariant feature representations:

∥ F (X_{1}) - F (X_{2}) ∥_{2} \to 0

(5)

The model implements an adversarial mechanism through a gradient reversal in the backward pass:

\nabla_{θ_{F}} \leftarrow \nabla_{θ_{F}} - λ \frac{\partial L_{DA}}{\partial θ_{F}}

(6)

where

λ

controls the adversarial strength. This induces a feature space

F

where:

P (F (X_{1})) \approx P (F (X_{2}))

(7)

while maintaining discriminative power for change detection.
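The effect of the gradient reversal in Equation (6) can be seen in a deliberately tiny example. The sketch below is not the released implementation: it assumes a one-parameter feature extractor f = θ_F · x, a one-parameter logistic domain classifier, and a binary cross-entropy domain loss, all of which are hypothetical simplifications, and it computes the gradients by hand. The values of `lam` and `lr` are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def domain_loss(theta_f, theta_d, x, d):
    """Binary cross-entropy of a toy domain classifier D(F(x))."""
    p = sigmoid(theta_d * theta_f * x)
    return -(d * math.log(p) + (1 - d) * math.log(1 - p))

def grl_step(theta_f, theta_d, x, d, lam=1.0, lr=0.1):
    """One update with a gradient reversal layer between F and D."""
    feat = theta_f * x                  # forward pass: the GRL is the identity
    p = sigmoid(theta_d * feat)
    dloss_dlogit = p - d                # derivative of BCE w.r.t. the logit
    grad_d = dloss_dlogit * feat        # ordinary gradient for the classifier
    grad_feat = dloss_dlogit * theta_d  # gradient arriving at the GRL
    grad_f = (-lam * grad_feat) * x     # backward pass: GRL multiplies by -lam
    # Both parameters take a plain descent step on the gradient they receive.
    return theta_f - lr * grad_f, theta_d - lr * grad_d

theta_f, theta_d, x, d = 1.0, 0.5, 2.0, 1
new_f, new_d = grl_step(theta_f, theta_d, x, d)
print(domain_loss(theta_f, theta_d, x, d))   # loss before the step
print(domain_loss(theta_f, new_d, x, d))     # classifier update alone lowers it
print(domain_loss(new_f, theta_d, x, d))     # extractor update alone raises it
```

The classifier is trained to separate the domains, while the reversed gradient pushes the extractor toward features that the classifier cannot separate, which is the min-max behavior written in Equation (4).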

3.3. Graph Feature Interaction Module

The structure of the GFIM module is shown in Figure 4. The bidirectional arrows in GFIM indicate mutual feature interaction between graph representations of the two temporal images. This interaction is implemented through feature concatenation, rather than through any explicit loss function.

Although the three outputs of the MLP resemble the Q, K, V structure in self-attention, GFIM does not operate at the pixel level. Instead, the spatial dimension

H \times W

is first compressed into a small number of semantic graph nodes via soft clustering. Therefore, the query matrix no longer represents pixel-wise queries but semantic node queries. This compression significantly reduces computational complexity and makes the interaction more robust to interference such as clouds and shadows, which primarily affect local pixels but not high-level semantic clusters.

GFIM does not align features at the pixel level. Instead, after domain alignment, the two feature maps are projected into a shared semantic graph space via clustering. Nodes in this space represent high-level semantic regions rather than local pixels. Since clouds, shadows, and temporary objects mainly affect local appearance, their influence is significantly reduced in this graph space. As a result, GFIM aligns the two temporal representations at the semantic level, enabling reliable comparison even under severe interference.

The computational complexity of conventional pixel-level GCN is

O ({(H W)}^{2})

. In contrast, GFIM first compresses the feature map into K semantic nodes (

K ≪ H W

) via clustering. Therefore, the complexity is reduced to

O (K^{2})

. In our implementation,

K = 64

, while HW = 65,536 for a

256 \times 256

feature map. This is a 1024× reduction in node count and therefore roughly a 10^6× reduction in pairwise computation.
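The arithmetic behind this comparison can be checked directly. The short sketch below only counts pairwise affinity terms, using the values K = 64 and a 256 × 256 feature map quoted above; the helper name `pairwise_cost` is hypothetical, and the cost of the soft-clustering projection itself, which scales with HW · K, is not included.

```python
def pairwise_cost(num_nodes):
    """Number of pairwise affinities among num_nodes graph nodes."""
    return num_nodes * num_nodes

H, W, K = 256, 256, 64
pixel_nodes = H * W                      # 65,536 pixel-level nodes
node_ratio = pixel_nodes // K            # reduction in node count
pair_ratio = pairwise_cost(pixel_nodes) // pairwise_cost(K)
print(node_ratio, pair_ratio)            # 1024 1048576
```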

3.4. Boundary Feature Module

The structure of the BFM module is shown in Figure 5. In the BFM, the feature map at resolution

H / 4 \times W / 4

is concatenated twice before fusion. This design is not redundant but intentionally acts as a form of multi-scale residual reinforcement.

Residual errors mainly appear around object boundaries due to fine-grained spatial ambiguity introduced by interference. Graph reasoning in GFIM operates on highly abstracted semantic nodes, which may weaken fine-grained spatial textures. By repeatedly injecting the original shallow feature map, BFM explicitly strengthens low-level spatial cues and stabilizes boundary localization during upsampling.

3.5. Domain Clustering

Since the original dataset lacks domain labels, it remains uncertain whether all data samples belong to the same domain or exhibit feature distribution shifts due to interference factors. To address this, we preliminarily assign temporary domain labels to all remote sensing images, enabling subsequent domain alignment through the gradient reversal layer in the domain classification network.

After extracting feature maps from all remote sensing images, we perform k-means clustering on the complete set of feature representations. We empirically tested

k \in \{2, 3, 4\}

. We observed that larger k values tend to fragment semantic regions and weaken domain discrimination. As shown in Table 2, the best performance is achieved at

k = 2

, indicating that the dominant distribution discrepancy mainly lies between interference-affected and normal regions. This parameter can be adjusted when more interference factors lead to increased domain diversity. Each image is then assigned a domain label based on its nearest cluster center. These annotated images subsequently proceed to the change detection and domain classification modules.
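The clustering step can be sketched as plain Lloyd's k-means over per-image descriptors. This is an illustration under stated assumptions rather than the authors' code: the two-dimensional descriptors, the deterministic farthest-point initialization, and the function name `kmeans` are all hypothetical, and the text does not specify how feature maps are reduced to vectors before clustering.

```python
import math

def kmeans(points, k=2, iters=20):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    # Deterministic init (an assumption): the first point, then the point
    # farthest from the centers chosen so far.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: the nearest center gives the pseudo domain label.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

# Hypothetical per-image descriptors (e.g. pooled backbone features).
features = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.1],   # "clean" images
            [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]   # "interfered" images
centers, domain_labels = kmeans(features, k=2)
print(domain_labels)                               # [0, 0, 0, 1, 1, 1]
```

The resulting labels play the role of d in Equation (4); they carry no semantic meaning beyond cluster membership.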

3.6. Graph Projection and Reprojection

This section describes the procedure for projecting bi-temporal images into a graph representation. For clarity, we illustrate the feature projection for temporal instance

t_{1}

. Following the strategy in [58], two matrices

W \in R^{| V | \times d}

and

M \in R^{| V | \times d}

are learned, where W holds the learnable node anchors, M holds the learnable per-node scale factors,

| V |

denotes the predefined number of graph nodes (written K in the equations below), and d is the feature dimension (the channel count C). Each row vector

w_{k}

in W functions as the anchor of the k-th vertex. The soft assignment

q_{i j}^{k}

between feature vector

x_{i j}

and anchor

w_{k}

is then computed as:

q_{i j}^{k} = \frac{exp (- {∥\frac{x_{i j} - w_{k}}{σ_{k}}∥}_{2}^{2} / 2)}{\sum_{l} exp (- {∥\frac{x_{i j} - w_{l}}{σ_{l}}∥}_{2}^{2} / 2)} .

(8)

In this formulation,

σ_{k}

denotes the k-th row of matrix M, whose values are restricted to the range

(0, 1)

through a sigmoid activation. The numerator measures the affinity between the feature vector and its corresponding anchor, whereas the denominator normalizes this assignment over all graph nodes.

Next, the vertex feature matrix

Z \in R^{| V | \times d}

is formed by aggregating the associated pixel-level features. For each vertex k, its representation

z_{k}

is computed as the weighted mean of the residuals between feature vectors

x_{i j}

and the anchor

w_{k}

. The resulting vector

z_{k}

is then normalized to yield a unit vector

z_{k}^{'}

:

z_{k}^{'} = \frac{z_{k}}{∥ z_{k} ∥_{2}}, z_{k} = (\frac{\sum_{i j} q_{i j}^{k} (x_{i j} - w_{k})}{\sum_{i j} q_{i j}^{k}}) / σ_{k} .

(9)

Finally, the graph adjacency matrix A is computed as:

A = Z Z^{T} \in R^{| V | \times | V |} .

(10)
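Equations (8)–(10) can be traced on a handful of pixels with the following sketch. It uses nested Python lists instead of tensors, the function name `project_to_graph` is hypothetical, and it builds the adjacency from the normalized node vectors, which is an assumption because Equation (10) does not state whether Z is taken before or after normalization.

```python
import math

def project_to_graph(X, W, S):
    """X: N pixel features (N x d); W: K anchors (K x d); S: K scales (K x d)."""
    K, d = len(W), len(W[0])
    # Eq. (8): softmax over nodes of -||(x - w_k) / sigma_k||^2 / 2.
    Q = []
    for x in X:
        logits = [-0.5 * sum(((x[c] - W[k][c]) / S[k][c]) ** 2 for c in range(d))
                  for k in range(K)]
        m = max(logits)                      # subtract the max for stability
        e = [math.exp(v - m) for v in logits]
        Q.append([v / sum(e) for v in e])
    # Eq. (9): weighted mean residual per node, divided by sigma_k,
    # then L2-normalized.
    Z = []
    for k in range(K):
        mass = sum(q[k] for q in Q)
        z = [sum(q[k] * (x[c] - W[k][c]) for q, x in zip(Q, X)) / mass / S[k][c]
             for c in range(d)]
        norm = math.sqrt(sum(v * v for v in z)) or 1.0   # guard a zero vector
        Z.append([v / norm for v in z])
    # Eq. (10): adjacency as the Gram matrix of the node features.
    A = [[sum(a * b for a, b in zip(zi, zj)) for zj in Z] for zi in Z]
    return Q, Z, A

X = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [0.9, 1.0]]   # four toy pixels
W = [[0.0, 0.1], [1.0, 0.9]]                           # two anchors
S = [[0.5, 0.5], [0.5, 0.5]]                           # scales in (0, 1)
Q, Z, A = project_to_graph(X, W, S)
print(Q[0], A)
```

Each row of Q sums to one, so a pixel distributes its feature over the nodes instead of being assigned to exactly one of them.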

To transform the refined graph features back into the original spatial domain, we make use of the assignment relationships established during the graph projection stage:

X_{1}^{new} = Q_{1} {\tilde{Z}}_{1}^{'} + X_{1}, \quad Q_{1} \in R^{(h \times w) \times K}

(11)

X_{2}^{new} = Q_{2} {\tilde{Z}}_{2}^{'} + X_{2}, \quad Q_{2} \in R^{(h \times w) \times K}

(12)

where

X_{1}^{new} \in R^{(h \times w) \times C}

and

X_{2}^{new} \in R^{(h \times w) \times C}

are the interacted feature maps.
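The reprojection in Equations (11) and (12) is a matrix product followed by a residual addition. A minimal list-based sketch, with a hypothetical function name and toy values chosen only to make the arithmetic easy to follow:

```python
def reproject(Q, Z_refined, X):
    """X_new = Q @ Z_refined + X, with a residual connection.

    Q: N x K soft assignments, Z_refined: K x C node features,
    X: N x C flattened pixel features.
    """
    return [[sum(q[k] * Z_refined[k][c] for k in range(len(Z_refined))) + x[c]
             for c in range(len(x))]
            for q, x in zip(Q, X)]

Q = [[1.0, 0.0], [0.25, 0.75]]          # two pixels, two nodes
Z_refined = [[1.0, 0.0], [0.0, 2.0]]    # two nodes, two channels
X = [[0.5, 0.5], [0.0, 1.0]]
print(reproject(Q, Z_refined, X))       # [[1.5, 0.5], [0.25, 2.5]]
```

Because the same assignment matrix Q is reused, each pixel receives the refined node features in proportion to how strongly it contributed to those nodes during projection.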

3.7. Bitemporal Graph Semantic Interaction

This module implements the semantic reasoning stage after domain alignment. The proposed GFIM module takes as input the graph embeddings

G_{1} = (V_{1}, Z_{1}, A_{1})

and

G_{2} = (V_{2}, Z_{2}, A_{2})

, derived from the feature mappings of temporal

t_{1}

and

t_{2}

, respectively. The model facilitates inter-graph semantic interaction by guiding bidirectional message passing from

V_{1}

to

V_{2}

and vice versa.

We use different multilayer perceptrons (MLPs) to transform

Z_{1}

into: the query graph

Z_{1}^{q}

, the key graph

Z_{1}^{k}

, and the value graph

Z_{1}^{v}

; and transform

Z_{2}

into: the query graph

Z_{2}^{q}

, the key graph

Z_{2}^{k}

, and the value graph

Z_{2}^{v}

.

Then, we unify

Z_{1}^{q}

and

Z_{2}^{q}

as:

Z^{q} = concat (Z_{1}^{q}, Z_{2}^{q})

(13)

The similarity matrices

A_{1 \to 2}^{inter}

and

A_{2 \to 1}^{inter} \in R^{K \times K}

are calculated as:

A_{1 \to 2}^{inter} = f (Z^{q ⊤} Z_{2}^{k})

(14)

A_{2 \to 1}^{inter} = f (Z^{q ⊤} Z_{1}^{k})

(15)

where

f (\cdot)

is the softmax function. After that, information between

Z_{1}

and

Z_{2}

is exchanged by:

Z_{1}^{'} = f_{GFIM} (Z_{2}, Z_{1}) = (A_{2 \to 1}^{inter} Z_{1}^{v ⊤}) + Z_{1}

(16)

Z_{2}^{'} = f_{GFIM} (Z_{1}, Z_{2}) = (A_{1 \to 2}^{inter} Z_{2}^{v ⊤}) + Z_{2}

(17)

Following the inter-graph interaction, intra-graph reasoning is performed using

Z_{1}^{'}

and

Z_{2}^{'}

as inputs, yielding refined graph representations:

\begin{matrix} {\tilde{Z}}_{1}^{'} = f_{GCN} (Z_{1}^{'}) = g (A_{2 \to 1}^{inter} Z_{1}^{'} W_{1}) \in R^{K \times C} \\ {\tilde{Z}}_{2}^{'} = f_{GCN} (Z_{2}^{'}) = g (A_{1 \to 2}^{inter} Z_{2}^{'} W_{2}) \in R^{K \times C} \end{matrix}

(18)

where

g (\cdot)

is the ReLU activation function and

W_{1} \in R^{C \times C}

,

W_{2} \in R^{C \times C}

are learnable parameters of the graph convolutional layer.
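The data flow of Equations (13)–(18) can be followed in the sketch below. It is a simplified reading, not the authors' implementation: the query, key, and value graphs are passed in directly instead of being produced by MLPs, all matrices are stored node-major (K rows), the queries are concatenated along the channel axis, and the keys are assumed to have 2C channels so that the affinity matrices come out as K × K. These layout choices and the function names are assumptions made to keep the shapes consistent.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def softmax_rows(A):
    out = []
    for row in A:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        out.append([v / sum(e) for v in e])
    return out

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def relu(A):
    return [[max(0.0, v) for v in row] for row in A]

def gfim(Z1, Z2, q1, q2, k1, k2, v1, v2, W1, W2):
    """Z*, q*, v*: K x C; k*: K x 2C (assumed width); W*: C x C."""
    # Eq. (13): per-node list concatenation of the two queries -> K x 2C.
    Zq = [row1 + row2 for row1, row2 in zip(q1, q2)]
    A12 = softmax_rows(matmul(Zq, transpose(k2)))   # Eq. (14), K x K
    A21 = softmax_rows(matmul(Zq, transpose(k1)))   # Eq. (15), K x K
    Z1p = add(matmul(A21, v1), Z1)                  # Eq. (16), residual
    Z2p = add(matmul(A12, v2), Z2)                  # Eq. (17), residual
    Z1t = relu(matmul(matmul(A21, Z1p), W1))        # Eq. (18), first row
    Z2t = relu(matmul(matmul(A12, Z2p), W2))        # Eq. (18), second row
    return Z1t, Z2t

I2 = [[1.0, 0.0], [0.0, 1.0]]
Z1, Z2 = [[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]
zero_keys = [[0.0] * 4, [0.0] * 4]      # zero keys give uniform affinities
print(gfim(Z1, Z2, Z1, Z2, zero_keys, zero_keys, Z1, Z2, I2, I2))
```

With zero keys every affinity equals 1/K, so the toy call reduces to averaging over nodes; learned keys would instead concentrate each row of the affinity matrix on semantically related nodes.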

3.8. Domain Classifier

This module corresponds to the domain alignment stage. The purpose of domain clustering is not to perform semantic grouping, but to provide reliable pseudo-domain labels for adversarial alignment. Since interference factors (clouds, shadows, transient objects) introduce distinct feature distribution patterns, clustering enables the network to explicitly model these distribution differences and guide the domain adversarial branch to reduce them. This step ensures that subsequent cross-temporal interaction operates on domain-consistent features rather than misaligned representations.

The features produced by the backbone extractor are forwarded simultaneously to the change detection module and the domain classification module. As illustrated in Figure 3, the domain classifier is composed of a Gradient Reversal Layer (GRL), two fully connected layers, a ReLU activation, and a log-softmax output layer. The domain labels of all feature maps in a batch are obtained through domain clustering. This module converts the single input feature maps into probability vectors

(P_{domain 1}, P_{domain 2}, \dots)

, representing the likelihood of belonging to each domain, and then calculates the loss with the domain label. During training, the domain classification loss is first computed and optimized via gradient descent. When backpropagating to the feature extractor, the domain classifier parameters remain fixed, and the negative gradient (inverse gradient) of the domain classification loss is used to update the feature extractor parameters. This adversarial update strategy intentionally maximizes domain classification errors, thereby forcing the feature extractor to learn domain-invariant feature representations.

3.9. Prediction Head and Loss Function

In the final step of producing the binary change map, a unified prediction head is applied. First, three decoded aggregated maps at different scales are upsampled to match the input image size using bilinear interpolation. These maps are then processed through two

1 \times 1

convolutional layers followed by batch normalization to generate a differential prediction. After multiscale differential computation, aggregation, and reorganization, the model is trained by minimizing both the domain adaptation loss

L_{D A}

, computed between the predicted and clustered domain labels, and the change detection loss

L_{C D}

, computed between the predicted change map and the ground truth.

L_{C D} = L_{focal} + α L_{dice}

(19)

L_{D A} (d, P) = - \frac{1}{N} \sum_{i = 1}^{N} \sum_{c = 1}^{C} d_{i}^{c} log P_{i}^{c}

(20)

Here, we set

α = 0.5

in all experiments. In Equation (20), N is the number of samples, C is the number of domain categories,

d_{i}^{c}

denotes the one-hot domain label of the i-th sample for the c-th domain category, and

P_{i}^{c}

represents the predicted probability that the i-th sample belongs to domain c produced by the domain classifier. The total training loss is given by:

L = (1 - λ) L_{D A} + λ L_{C D}

(21)

where

λ

balances the contribution between the domain adaptation loss

L_{D A}

and the change detection loss

L_{C D}

.
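Equations (19)–(21) can be assembled as follows. The text does not spell out the focal and Dice terms, so the sketch assumes their common forms (binary focal loss with a hypothetical γ = 2 and soft Dice loss); only α = 0.5, the cross-entropy form of Equation (20), and the weighting of Equation (21) with λ = 0.4 are taken from the paper. All function names are illustrative.

```python
import math

def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over pixels (standard form; gamma is assumed)."""
    total = 0.0
    for p, y in zip(probs, targets):
        pt = p if y == 1 else 1.0 - p          # probability of the true class
        pt = min(max(pt, eps), 1.0 - eps)      # keep log() finite
        total += -((1.0 - pt) ** gamma) * math.log(pt)
    return total / len(probs)

def dice_loss(probs, targets, eps=1e-7):
    """Soft Dice loss: 1 - 2 * overlap / (sum of predictions + sum of targets)."""
    inter = sum(p * y for p, y in zip(probs, targets))
    return 1.0 - (2.0 * inter + eps) / (sum(probs) + sum(targets) + eps)

def change_loss(probs, targets, alpha=0.5):
    """Eq. (19): L_CD = L_focal + alpha * L_dice."""
    return focal_loss(probs, targets) + alpha * dice_loss(probs, targets)

def domain_ce(log_probs, labels):
    """Eq. (20): mean negative log-likelihood of the true domain index."""
    return -sum(lp[d] for lp, d in zip(log_probs, labels)) / len(labels)

def total_loss(l_da, l_cd, lam=0.4):
    """Eq. (21): L = (1 - lambda) * L_DA + lambda * L_CD."""
    return (1.0 - lam) * l_da + lam * l_cd

probs, targets = [0.9, 0.2, 0.8, 0.1], [1, 0, 1, 0]    # toy change map
log_probs = [[math.log(0.7), math.log(0.3)]]           # one sample, two domains
print(total_loss(domain_ce(log_probs, [0]), change_loss(probs, targets)))
```

With one-hot domain labels, the double sum in Equation (20) collapses to the log-probability of the labeled domain, which is what `domain_ce` indexes directly.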

4. Results

4.1. Dataset

To assess the practical performance of the proposed DAAINet, we conducted extensive experiments on two representative public RS CD datasets and on our newly constructed CICD dataset.

1. LEVIR-CD [59]: LEVIR-CD is an extensive very high-resolution (VHR, 0.5 m/pixel) CD dataset. It encompasses 637 pairs of image patches of 1024 × 1024 pixels, capturing significant changes over a span of 5–14 years. The bi-temporal images of LEVIR-CD are sourced from 20 distinct regions across various cities in Texas, USA. For experimental purposes, we cropped them into smaller image pairs of 256 × 256 pixels, which were randomly divided into 7120/1024/2048 pairs for training, validation, and testing.
2. WHU-CD [60]: The Wuhan University dataset encompasses the region affected by the earthquake in February 2011, along with the area that underwent reconstruction several years after the earthquake. The dataset spans approximately 20 km² and includes 12,796 buildings. It comprises a pair of aerial images with a spatial size of 32,507 × 15,354 pixels and a resolution of 0.2 m/pixel. The original images were cropped into smaller image pairs of 256 × 256 pixels and randomly divided into 6096/762/762 pairs for training, validation, and testing.
3. CICD: The Cloud Interference Change Detection dataset consists of 294 image pairs of 1024 × 1024 pixels, encompassing diverse land cover types including forests, grasslands, rivers, bridges, and residential buildings. Significantly, the dataset incorporates various challenging interference factors such as thin cloud cover, temporary objects, and extensive shadow areas.
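The reported split sizes are consistent with non-overlapping 256 × 256 tiling, as the following check shows. The helper `tiles` is hypothetical, and the `pad_edges` option for WHU-CD is an assumption: keeping partial edge tiles (by padding or shifting) is the reading under which the tile count equals the reported total, but the text does not state how edges were handled.

```python
import math

def tiles(height, width, tile=256, pad_edges=False):
    """Number of non-overlapping tile x tile crops from one image."""
    count = math.ceil if pad_edges else math.floor
    return count(height / tile) * count(width / tile)

levir = 637 * tiles(1024, 1024)                # 637 pairs, 16 crops each
whu = tiles(15354, 32507, pad_edges=True)      # one large aerial pair
print(levir, 7120 + 1024 + 2048)               # 10192 10192
print(whu, 6096 + 762 + 762)                   # 7620 7620
```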

4.2. Experimental Setup

To assess the effectiveness and accuracy of our DAAINet in CD, we compared with several state-of-the-art approaches, including FC-EF [58], FC-Siam-Diff [58], FC-Siam-Conc [58], SNUNet [61], BiT [23], DASNet [60], AMTNet [19], and DMINet [20].

Since some of the baseline methods (ChangeFormer, BAN-CF, UniCD) were not re-implemented in our codebase, their reported metrics are quoted from the original papers. This comparison aims to position our method within the context of current state-of-the-art approaches rather than to claim a strictly controlled experimental reproduction.

The proposed DAAINet and all compared methods were evaluated on an NVIDIA GeForce RTX 3080Ti workstation, using the PyTorch framework. Throughout the model training phase, we applied data augmentation techniques including vertical and horizontal flipping, scaling, and cropping to mitigate overfitting. The model was trained using the Adam optimizer, with a batch size of 8,

λ

set to 0.4, an initial learning rate of

1 \times 10^{- 4}

, and 100 training epochs. The number of graph nodes is set to

| V | = 64

. Other hyperparameters follow default settings unless specified.

To assess the performance and effectiveness of our proposed model relative to other state-of-the-art methods, we employ five evaluation metrics: Precision (Pre), Recall (Rec), F1 score (F1), Intersection over Union (IoU), and Kappa coefficient (Kap). Their definitions are as follows:

Pre = \frac{T P}{T P + F P}

(22)

Rec = \frac{T P}{T P + F N}

(23)

F 1 = \frac{2 \times Pre \times Rec}{Pre + Rec}

(24)

IoU = \frac{T P}{T P + F P + F N}

(25)

Kap = \frac{p_{o} - p_{e}}{1 - p_{e}}

(26)
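Equations (22)–(26) can be computed from the four pixel counts of the binary confusion matrix. The sketch below takes TP, FP, FN, and TN as true positives, false positives, false negatives, and true negatives, and uses the standard Cohen's kappa definitions for p_o (observed agreement) and p_e (agreement expected by chance), which the text does not define explicitly. The function name is hypothetical, and degenerate cases with zero denominators are not handled.

```python
def cd_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, IoU and Cohen's kappa from pixel counts."""
    n = tp + fp + fn + tn
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * pre * rec / (pre + rec)
    iou = tp / (tp + fp + fn)
    p_o = (tp + tn) / n                                    # observed agreement
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)  # chance
    kap = (p_o - p_e) / (1 - p_e)
    return {"Pre": pre, "Rec": rec, "F1": f1, "IoU": iou, "Kap": kap}

print(cd_metrics(tp=40, fp=10, fn=10, tn=40))
```

For this toy count, precision, recall, and F1 are all 0.8, IoU is 40/60, and kappa is 0.6, since p_o = 0.8 and p_e = 0.5.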

4.3. Performance Comparison and Result Analysis

The quantitative comparison of CD indicators is presented in Table 3, Table 4 and Table 5. DAAINet is competitive with other state-of-the-art models on the two public benchmarks and shows its clearest advantage on CICD.

Evaluation on the LEVIR-CD dataset (Table 3) and the WHU-CD benchmark (Table 4) indicates that our method achieves performance comparable to current state-of-the-art (SOTA) approaches. Although UniCD yields marginally higher metrics, the numerical differences are small and likely acceptable in practical applications. This demonstrates that our method maintains competitive performance even on datasets with minimal interference. It should be noted that LEVIR-CD contains relatively clean bi-temporal pairs. In such scenarios, the domain discrepancy between temporal phases is limited, reducing the necessity of explicit domain alignment. As a result, methods that directly model cross-temporal interaction may achieve comparable or even slightly better performance. This observation is consistent with the design motivation of DAAINet, which is specifically targeted at interference-rich scenarios such as those in CICD.

On the CICD dataset (Table 5), the proposed method records the highest F1-score (72.94%) and IoU (64.51%), indicating superior change detection capability under interference. Although FC-EF shows higher precision (96.01% vs. 80.41%), this comes at the cost of significantly lower recall (57.90% vs. 68.48%). FC-EF thus tends to produce conservative predictions that miss many true change pixels, a behavior that may be related to class imbalance in the dataset and that is undesirable in practical change detection tasks. Apart from precision, our method leads on all other metrics, demonstrating greater sensitivity to change, which is particularly important for applications such as urban growth monitoring.

4.4. Visualization Comparison

To assess the effectiveness of various change detection algorithms, comparative experiments were carried out on the WHU-CD and LEVIR-CD datasets. The results indicate that our proposed DAAINet outperforms the other methods. Specifically, DAAINet generates significantly fewer false alarms than most competing algorithms on both datasets. Furthermore, our method exhibits excellent capability in suppressing pseudo-changes caused by shadows and chromatic aberrations. In terms of false detection and missed detection rates, DAAINet achieves comparable or better performance than the best baseline methods. While most existing algorithms struggle to accurately detect changes in irregularly shaped buildings of varying sizes, our DAAINet maintains clear identification of building boundaries, as evidenced in Figure 6, Figure 7 and Figure 8.

4.5. Ablation Analysis

Selection of Backbone Network: In CNNs, shallow feature maps capture detailed texture and color information, whereas deeper feature maps encode richer semantic content. We perform ablation studies to investigate the impact of network depth on the GCN Encoder and GCN Decoder, aiming to validate our model’s capability to aggregate semantic information from feature maps. Notably, the experiment configuration corresponds to the “DA + GFIM” setting reported in Table 6.

The results, presented in Table 7, indicate that using ResNet50 as the backbone yields the best performance across both datasets. Consequently, ResNet50 is chosen as the backbone of our framework.

Further integration of the BFM module yields additional improvements of approximately 1.1% in F1-score and 0.93% in IoU across both datasets. This demonstrates that combining texture information from shallow layers with retrospective learning enables the network to generate more complete and regularized change detection predictions.

Effectiveness of Clustering: To verify whether the clustering-based domain structure is meaningful, we replace the clustering-based domain labels with randomly generated binary labels while keeping all other settings unchanged. As shown in Table 8, using random domain labels leads to a clear performance drop. This indicates that the adversarial learning branch does not work simply due to the existence of domain supervision, but specifically relies on meaningful domain grouping induced by clustering. This experiment confirms that clustering plays a critical role in identifying interference-related distribution patterns and guiding effective domain alignment.

Effectiveness of GFIM interaction: To investigate whether the performance gain of GFIM comes from graph modeling rather than generic cross-temporal interaction, we replace GFIM with a simple element-wise addition between the graph representations of two temporal features (i.e.,

Z_{1}^{n e w} = Z_{1} + Z_{2}

,

Z_{2}^{n e w} = Z_{2} + Z_{1}

). This operation preserves cross-temporal information exchange but removes graph-based message passing and affinity modeling. As reported in Table 9, the simple addition strategy results in noticeably inferior performance compared to the full GFIM. This demonstrates that the improvement is not merely due to feature interaction across time, but specifically benefits from semantic-level message passing and affinity reasoning enabled by the graph structure. The result validates that GFIM performs meaningful semantic reasoning rather than acting as a trivial cross-temporal fusion module.

Effectiveness of BFM repeated concatenation: We empirically observed that removing the repeated concatenation leads to blurred change boundaries and inferior performance (see Ablation Study in Table 10), confirming that this design effectively compensates for the loss of spatial detail introduced by high-level semantic reasoning.

Module Effectiveness: We conduct experiments on both datasets to verify the effectiveness of our proposed core modules in change detection tasks. The full model includes GFIM, BFM and the DA structure in DAAINet. The experimental results are shown in Table 6.

Incorporating DA and GFIM into the network leads to performance gains on the LEVIR-CD dataset of 2.48% in F1-score and 4.08% in IoU compared to the baseline. On the WHU-CD dataset, the improvements are even more pronounced, reaching 7.71% (F1) and 12.16% (IoU). These findings indicate that our domain adversarial graph convolutional framework effectively enhances the change detection network’s performance.

In addition, we compared the feature extractors trained with and without the DA structure. We randomly selected 100 image pairs; the resulting t-SNE plot is shown in Figure 9. Although the visual difference is subtle, the quantitative improvement in Table 6 also supports the effectiveness of DA.

4.6. Model Efficiency Assessment

To assess the computational efficiency of different models, we report FLOPs and parameter counts for input image sizes of

256 \times 256

and

512 \times 512

pixels, as summarized in Table 11. FC-EF, FC-Siam-Conc, and FC-Siam-Diff exhibit lower FLOPs and Params due to their reliance on conventional CNN architectures without additional interaction modules. For the proposed DAAINet, the computational cost is 13.36G FLOPs and 11.34M parameters for

256 \times 256

inputs. Although DAAINet contains more parameters than DMINet and BiT, its graph reasoning is performed on clustered semantic nodes rather than dense pixel grids, which reduces the scale of pairwise interaction compared to pixel-level graph modeling. These results indicate that DAAINet introduces additional computational cost due to domain alignment and graph interaction, while remaining within a comparable complexity range to other interaction-based change detection models.

5. Discussion

The consistent top-tier performance across all benchmark and noisy datasets suggests that our approach offers more reliable change detection for practical remote sensing applications where data quality varies significantly. These numerical results provide clear evidence that the DA structure contributes meaningfully to CD tasks.

The visualization results for each model on different datasets are presented in Figure 6, Figure 7 and Figure 8. These figures showcase the accuracy of different models in detecting changing areas. DAAINet generates significantly fewer false alarms than most competing algorithms on both datasets, and exhibits excellent capability in suppressing pseudo-changes caused by shadows and chromatic aberrations. While most existing algorithms struggle to accurately detect changes in irregularly shaped buildings of varying sizes, our DAAINet maintains clear identification of building boundaries.

Despite the effectiveness of the proposed framework under interference-rich scenarios, several limitations remain.

First, the proposed domain alignment mechanism relies on clustering-based pseudo-domain discovery. Although this strategy provides useful guidance for adversarial learning, the clustering quality may vary across datasets and interference types. In cases where domain discrepancy is subtle, such as clean benchmarks with minimal interference, the benefit of explicit domain alignment becomes less significant. Second, the CICD dataset, although specifically constructed to evaluate interference-induced pseudo changes, is relatively limited in scale and collected from heterogeneous sources. The diversity of sensors, resolutions, and acquisition conditions introduces domain variety, but also makes standardized evaluation more challenging.

Future work will focus on expanding CICD into a larger, standardized benchmark with richer annotations and metadata. In addition, integrating the proposed domain alignment perspective with transformer-based architectures may further improve generalization ability under complex interference conditions.

6. Conclusions

Aiming at the pseudo-change phenomena of CD tasks based on remote sensing images, this paper proposes a CNN-GCN model that integrates a domain adversarial (DA) architecture. Specifically, we first extract features at three different scales using a CNN and use the DA structure for feature alignment to reduce the distribution differences between input images. Then, a GCN performs the encoding and decoding stages to aggregate contextual information. The proposed DA architecture effectively mitigates artifacts caused by modality differences or other interference factors by narrowing the feature distribution between the two temporal images. Our proposed GFIM effectively utilizes the interactive difference discriminative information of the input images and dynamically minimizes the interference of irrelevant information. To address the lack of cloud interference in existing CD datasets for remote sensing images, we construct a CICD dataset using real images to introduce cloud interference into the CD task.

We conduct extensive experiments on three datasets: LEVIR-CD, WHU-CD, and CICD. Experimental results show that, although it does not always achieve the highest score, DAAINet maintains stable and competitive performance on the two public datasets with minimal interference. Furthermore, our approach achieves the best F1-score and IoU on CICD, surpassing prior methods by up to 5.67%p in IoU and 1.42%p in recall, which demonstrates its interference resistance. Future research can explore the application of the proposed model to heterogeneous images.

Author Contributions

Conceptualization, K.G. and Y.H.; methodology, J.Y.; software, J.Y. and B.H.; validation, J.Y., Z.Z. and J.W.; formal analysis, J.Y.; investigation, J.Y.; resources, K.G. and Y.H.; data curation, B.H. and Z.Z.; writing—original draft preparation, J.Y.; writing—review and editing, K.G. and Y.H.; visualization, J.Y.; supervision, K.G. and Y.H.; project administration, K.G. and Y.F.; funding acquisition, K.G., Y.H. and Y.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Beijing Nova Program under Grant No. 20240484543, the National Natural Science Foundation of China under Grant 62575028 and the National Key R&D Program of China under Grant 2022YFC3301602.

Data Availability Statement

The CICD dataset constructed in this work will be made publicly available upon acceptance of the manuscript. The source code and dataset can be found at https://github.com/GeraldYJY/DAAINet (accessed on 6 April 2026).

Acknowledgments

The authors thank the open-source platforms for providing the remote sensing imagery used to construct the CICD dataset.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

CD	Change Detection
RS	Remote Sensing
CNN	Convolutional Neural Network
GCN	Graph Convolutional Network
DA	Domain Adaptation/Domain Adversarial
GRL	Gradient Reversal Layer
GFIM	Graph Feature Interaction Module
BFM	Boundary Feature Module
CICD	Cloud Interference Change Detection
VHR	Very High Resolution
MMD	Maximum Mean Discrepancy
PCA	Principal Component Analysis
CVA	Change Vector Analysis
IoU	Intersection over Union
OA	Overall Accuracy
SOTA	State-of-the-Art
MLP	Multilayer Perceptron

Figure 1. Illustration of pseudo-change and domain shift. (a) Bi-temporal heterogeneous images and the pseudo-change phenomena. The factors from top to bottom are: cloud interference, temporary objects, and shadow variation. (b) Corresponding data domain distributions of bi-temporal heterogeneous images, visualized with PCA (Principal Component Analysis). Blue dots represent Time A and green dots represent Time B.

Figure 2. Time A, Time B and label images in the CICD dataset.

Figure 3. The architecture of DAAINet: (a) feature extractor; (b) change detector; (c) domain classifier.

Figure 4. The architecture of the Graph Feature Interaction Module (GFIM).

Figure 5. The architecture of the Boundary Feature Module (BFM).

Figure 6. Visual results on the LEVIR-CD dataset.

Figure 7. Visual results on the WHU-CD dataset.

Figure 8. Visual results on the CICD dataset.

Figure 9. t-SNE visualization for the DA module ablation study: (a) with DA; (b) without DA.

Table 1. Distribution of Interference Types.

Interference Type	Image Pairs	Percentage
Thin Cloud Occlusion	86	29.3%
Thick Cloud Shadow	54	18.4%
Illumination Variation	47	16.0%
Temporary Objects	39	13.2%
Mixed Interference	68	23.1%
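
The percentage column of Table 1 follows directly from the image-pair counts. The short sketch below recomputes each share from the counts listed in the table; the dictionary and function names are illustrative and not part of the original work, and small last-digit differences from the tabulated values can arise from rounding.

```python
# Image-pair counts copied from Table 1.
counts = {
    "Thin Cloud Occlusion": 86,
    "Thick Cloud Shadow": 54,
    "Illumination Variation": 47,
    "Temporary Objects": 39,
    "Mixed Interference": 68,
}


def shares(counts):
    """Return each category's share of the total, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


total_pairs = sum(counts.values())
percentages = shares(counts)

for name, pct in percentages.items():
    print(f"{name}: {counts[name]} pairs, {pct:.2f}%")
print(f"Total: {total_pairs} pairs")
```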

Table 2. Quantitative comparison under different cluster numbers k.

k	F1 (%)	IoU (%)
2	90.22	81.18
3	88.47	79.02
4	87.95	78.41

Table 3. Performance comparison on LEVIR-CD dataset.

Methods	Pre (%)	Recall (%)	F1 (%)	IoU (%)	OA (%)
FC-EF	87.90	62.93	73.35	57.91	97.08
FC-conc	90.78	64.83	75.64	60.82	97.22
FC-diff	90.58	62.20	73.50	58.42	97.17
SNUNet	90.30	71.23	79.64	66.71	97.68
DASNet	81.18	78.17	83.78	63.59	98.31
BiT	89.24	89.37	89.31	80.68	98.92
AMTNet	90.62	89.00	89.80	81.49	99.10
DMINet	91.79	89.67	90.72	83.02	99.07
ChangeFormer	92.05	88.80	90.40	82.48	99.04
BAN-CF	93.47	90.30	91.86	84.94	-
UniCD	-	-	92.10	85.36	99.19
Ours	92.48	88.52	90.46	81.74	99.12
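
For readers who wish to relate the columns of Table 3 to one another, the sketch below uses the standard definitions of precision, recall, F1, IoU, and OA for the change class. It assumes the metrics are computed from pooled pixel-level confusion counts (TP, FP, FN, TN); the evaluation code of the paper is not shown here, so the function names are illustrative. Under this assumption, F1 is the harmonic mean of precision and recall, and IoU = F1 / (2 − F1). Metrics that are averaged per image need not satisfy these identities.

```python
def change_metrics(tp, fp, fn, tn):
    """Standard binary change-detection metrics from pixel counts (as fractions)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    oa = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou, "oa": oa}


def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    return 2 * precision * recall / (precision + recall)


def iou_from_f1(f1):
    """IoU implied by F1 when both come from the same pooled counts (fractions)."""
    return f1 / (2 - f1)


# Toy confusion counts, chosen only for illustration.
toy = change_metrics(tp=8, fp=2, fn=2, tn=88)

# Consistency check on the FC-EF row of Table 3 (values in percent).
fc_ef_f1 = f1_from_pr(87.90, 62.93)
fc_ef_iou = 100 * iou_from_f1(fc_ef_f1 / 100)
print(f"FC-EF: F1 = {fc_ef_f1:.2f}%, IoU = {fc_ef_iou:.2f}%")
```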

Table 4. Performance comparison on WHU-CD dataset.

Methods	Pre (%)	Recall (%)	F1 (%)	IoU (%)	OA (%)
FC-EF	81.12	68.12	74.05	58.80	98.39
FC-conc	54.40	82.88	65.69	48.91	96.56
FC-diff	62.68	80.25	70.39	54.31	97.32
SNUNet	76.25	85.09	80.43	67.26	98.35
DASNet	86.59	81.15	83.78	73.96	98.36
BiT	86.64	81.48	83.98	72.39	98.75
AMTNet	91.99	89.96	90.96	83.42	99.07
DMINet	93.00	89.52	91.23	83.87	99.32
UniCD	-	-	93.94	88.57	99.51
Ours	92.36	90.88	91.61	84.41	99.40

Table 5. Performance comparison on CICD dataset.

Methods	Pre (%)	Recall (%)	F1 (%)	IoU (%)	OA (%)
FC-EF	96.01	57.90	63.21	57.19	98.71
FC-conc	90.93	54.11	57.15	54.35	98.61
FC-diff	93.08	57.37	62.31	56.58	98.69
SNUNet	57.32	64.99	59.66	54.10	96.37
DASNet	60.61	67.06	63.01	56.53	97.16
BiT	66.20	63.24	64.57	57.91	98.61
AMTNet	82.93	60.41	65.65	58.84	98.66
DMINet	89.59	55.55	59.44	54.74	98.63
Ours	80.41	68.48	72.94	64.51	98.73

Table 6. Ablation study of different modules.

Configuration	LEVIR-CD F1 (%)	LEVIR-CD IoU (%)	WHU-CD F1 (%)	WHU-CD IoU (%)
Baseline	86.64	76.17	80.72	67.00
+DA	86.89	76.45	82.15	69.78
+DA+GFIM	89.12	80.25	88.43	79.16
Full Model	90.22	81.18	89.36	80.09

Table 7. Performance comparison of different backbone networks.

Backbone	LEVIR-CD F1 (%)	LEVIR-CD IoU (%)	WHU-CD F1 (%)	WHU-CD IoU (%)
VGG16	85.32	74.56	83.45	71.89
ResNet18	87.45	77.89	85.67	74.92
ResNet50	89.12	80.25	88.43	79.16

Table 8. Ablation study on the clustering design on LEVIR-CD.

Clustering Design	F1-Score (%)	IoU (%)
Random Domain Label	82.91	74.70
Clustering (Ours)	90.22	81.18

Table 9. Ablation study on the GFIM design on LEVIR-CD.

GFIM Design	F1-Score (%)	IoU (%)
sum	85.65	77.39
GFIM	90.22	81.18

Table 10. Ablation study on the repeated concatenation design in BFM on LEVIR-CD.

BFM Design	F1-Score (%)	IoU (%)
Single concatenation	88.79	80.20
Repeated concatenation (ours)	90.22	81.18

Table 11. Efficiency comparison analysis.

Methods	Avg. F1 (%)	FLOPs (G), Image Size 256	Params (M), Image Size 256	FLOPs (G), Image Size 512	Params (M), Image Size 512
FC_EF	67.20	3.58	1.35	14.30	1.35
FC_Siam_conc	66.98	5.33	1.55	21.32	1.55
FC_Siam_diff	67.13	4.73	1.35	18.91	1.35
SNUNet	78.55	54.83	12.04	219.34	12.04
BiT	82.09	8.75	3.49	42.37	3.49
DASNet	81.22	107.98	55.31	432.00	55.31
AMTNet	80.83	21.56	24.67	86.26	24.67
DMINet	82.78	14.55	6.24	59.49	6.24
Ours	84.19	13.36	11.34	53.46	11.34
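
The two FLOPs columns of Table 11 can be related by a simple rule of thumb: for layers whose cost is proportional to the number of pixels, such as convolutions, doubling the image side length multiplies FLOPs by about four, while the parameter count is unchanged. The sketch below applies this quadratic scaling to a few rows copied from the table. The scaling rule is an assumption of this illustration rather than a statement from the paper, and models with attention or token-based components may deviate from it (the BiT row, for example, grows by more than a factor of four).

```python
def scaled_flops(flops_g, size_from, size_to):
    """Predict FLOPs at a new square input size, assuming cost grows with pixel count."""
    return flops_g * (size_to / size_from) ** 2


# (FLOPs at 256, FLOPs at 512) in GFLOPs, copied from Table 11.
rows = {
    "FC_EF": (3.58, 14.30),
    "SNUNet": (54.83, 219.34),
    "DASNet": (107.98, 432.00),
    "Ours": (13.36, 53.46),
}

deviations = {}
for name, (flops_256, flops_512) in rows.items():
    predicted = scaled_flops(flops_256, 256, 512)
    deviations[name] = abs(predicted - flops_512)
    print(f"{name}: predicted {predicted:.2f} G, reported {flops_512:.2f} G")
```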

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yang, J.; Gao, K.; Hu, B.; Zhang, Z.; Wang, J.; He, Y.; Feng, Y. DAAINet: Domain Adversarial Anti-Interference Network for Bi-Temporal Change Detection. Remote Sens. 2026, 18, 1656. https://doi.org/10.3390/rs18101656
