1.1. Background
With the development of geographic information technology, Earth observation has made significant progress, and a wide variety of sensors and platforms have come into use. Remote sensing captures target objects through satellite-borne sensors, analyzes the resulting characteristic information, and extracts key features for downstream applications [1,2,3,4]. Remote sensing can also capture spectral information beyond the visible range, including ultraviolet and infrared bands, enriching the available spatial feature information. Owing to these advantages, remote sensing images have gradually become a reliable and indispensable data source for obtaining surface information [5,6,7,8,9]. Change detection on dual-temporal remote sensing images refers to imaging the same ground area at different times from platforms such as satellites and drones, and then locating the areas that have changed and those that have not [10,11,12]. Simply put, it identifies differences in the state of geographical objects across time. Remote sensing image change detection is widely applied in geological surveying [13,14,15], environmental monitoring [16,17,18], resource management [19,20], urban expansion [21,22], and many other fields [23,24,25,26].
Effectively extracting true change information from dual-temporal remote sensing images is a crucial current research direction, and over the past decades many change detection methods have been proposed and applied. Images obtained in real environments are subject to many uncertain factors, such as lighting, season, and shooting angle. According to the basic unit of analysis, change detection methods can be divided into pixel-level, feature-level, and object-level approaches. Pixel-level change detection treats each pixel as the basic unit and determines whether its values have changed through direct comparison; common methods include image differencing [27], the ratio method [28], and principal component analysis (PCA) [29]. However, these methods place high demands on the accuracy of image registration and are now mostly used as components of larger frameworks rather than on their own. Deng et al. [30] used PCA to enhance the data and fed the processed features into a classifier to detect land changes. Celik [31] proposed a detection method combining PCA and K-means. Feature-level change detection extracts salient features, most often texture and edge features, from the original dual-temporal images and analyzes them for change. Zhang et al. [32] proposed a texture feature extraction method that better describes texture and spatial feature distributions based on local detail features. Guiming and Jidong [33] proposed an edge detection method that improves the Canny operator, achieving better noise smoothing while retaining more detail. Object-level change detection treats the image as a collection of objects carrying different semantic information and performs semantic detection on those objects. Peng et al. [34] proposed a UNet++-based encoder–decoder structure that uses multi-side output fusion to obtain a higher-accuracy change map.
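As a concrete illustration of the pixel-level family described above, the following minimal sketch flags changed pixels by image differencing with a simple global statistical threshold. The function name, the toy 6×6 arrays, and the mean-plus-k-standard-deviations rule are illustrative assumptions, not a method from the cited works:

```python
import numpy as np

def difference_change_map(img_t1, img_t2, k=1.0):
    """Pixel-level change detection by image differencing.

    A pixel is flagged as changed when the absolute intensity
    difference exceeds mean + k * std of the difference image
    (a simple global threshold; practical systems often use Otsu
    thresholding or clustering instead).
    """
    diff = np.abs(img_t1.astype(float) - img_t2.astype(float))
    thresh = diff.mean() + k * diff.std()
    return diff > thresh  # boolean change mask

# Toy bitemporal pair: one 2x2 block changes between acquisitions.
t1 = np.zeros((6, 6))
t2 = np.zeros((6, 6))
t2[2:4, 2:4] = 100.0  # simulated land-cover change
mask = difference_change_map(t1, t2)
```

Because the threshold is computed globally, such a scheme is sensitive to registration errors and illumination shifts, which is why, as noted above, pixel-level methods are now mostly embedded in larger frameworks.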
According to the technical means used, change detection methods can also be divided into traditional methods and deep learning-based methods. Traditional change detection generally consists of three steps: image preprocessing, difference map generation, and difference map analysis. In the preprocessing stage, geometric correction and image registration make the dual-temporal remote sensing images comparable in space and spectrum. In the difference map generation stage, methods such as differencing and ratioing produce a feature matrix representing the distance between the dual-temporal images. In the difference map analysis stage, thresholding and clustering methods classify the pixels of the difference map. He et al. [35] proposed a dynamic threshold algorithm based on a merged fuzzy C-means algorithm that generates each pixel's membership value and a global initial threshold, reducing speckle noise while better retaining detail. The main drawback of such methods is that they ignore neighboring pixel information when judging whether a detected pixel is correct, relying solely on how well a statistical model fits the actual data distribution. Thonfeld et al. [36] proposed robust change vector analysis (CVA), which considers the neighborhood of each pixel to alleviate the effect of poor co-registration between images but does not capture object-level information. Zheng et al. [37] proposed a method based on combined difference images and K-means clustering; its use of mean and median filters preserves edge information well while accounting for local consistency. Luppino et al. [38] used a cycle-consistent generative adversarial network to transcode images from different sensors into the same domain in an unsupervised manner, and then performed change detection with CVA. Overall, traditional detection methods are complicated to operate and yield relatively low detection accuracy.
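The three-step pipeline above can be sketched end to end, using a CVA-style magnitude for difference-map generation and a two-cluster K-means for difference-map analysis. This is a minimal stand-in on toy data (function names, shapes, and the centroid initialization are illustrative assumptions), not a reimplementation of any cited method:

```python
import numpy as np

def cva_magnitude(img_t1, img_t2):
    """Change vector analysis (CVA): per-pixel Euclidean norm of the
    spectral difference vector across bands (inputs shaped H x W x B)."""
    diff = img_t1.astype(float) - img_t2.astype(float)
    return np.sqrt((diff ** 2).sum(axis=-1))

def kmeans_binary(x, iters=20):
    """1-D K-means with k=2; returns True for the high-value cluster.
    A minimal stand-in for the difference-map analysis step."""
    flat = x.ravel()
    c = np.array([flat.min(), flat.max()], dtype=float)  # initial centroids
    labels = np.zeros(flat.size, dtype=int)
    for _ in range(iters):
        # Assign each pixel to its nearest centroid, then update centroids.
        labels = np.abs(flat[:, None] - c[None, :]).argmin(axis=1)
        for j in (0, 1):
            if np.any(labels == j):
                c[j] = flat[labels == j].mean()
    return (labels == c.argmax()).reshape(x.shape)

# Toy 3-band bitemporal pair with one changed region.
t1 = np.zeros((6, 6, 3))
t2 = np.zeros((6, 6, 3))
t2[2:4, 2:4, :] = 80.0           # spectral change in a 2x2 block
mag = cva_magnitude(t1, t2)      # step 2: difference map generation
change_map = kmeans_binary(mag)  # step 3: difference map analysis
```

In this sketch each pixel is clustered independently, which reproduces the drawback noted above: neighborhood context is ignored, so isolated noisy pixels can be misclassified.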
In recent years, the development of deep learning has provided new solutions for change detection, and deep learning has also been widely applied in object detection and image segmentation [39]. A neural network is built as the model framework, the dataset is fed into the network, and the desired outputs are produced according to the needs of the task; the resulting models tend to be more robust. Compared with traditional methods, deep learning-based methods achieve higher scores, reduce cumbersome data preprocessing steps, and, as end-to-end frameworks, are easier to deploy directly. Gong et al. [40] first applied a CNN to remote sensing image change detection and achieved good results, demonstrating the feasibility of CNNs for this task. Subsequently, many researchers proposed CNN-based change detection methods. Zhan et al. [41] proposed a deep Siamese convolutional neural network that extracts change information from dual-temporal remote sensing images through a weight-sharing Siamese architecture, improving the model's operational efficiency. On this basis, weight-sharing Siamese networks have been widely adopted for remote sensing image change detection. Zhang et al. [42] proposed a deep Siamese semantic network change detection method with an improved loss function, using a piecewise triplet loss to strengthen the model's robustness. Chen and Shi [43] proposed a Siamese spatio-temporal attention neural network that divides the image into sub-regions at multiple scales and applies self-attention within them, capturing spatio-temporal dependencies at various scales. Song et al. [44] proposed a U-shaped change detection network that extracts and learns the similar, differing, and global feature information of dual-temporal remote sensing images through multiple branches. Wang et al. [45] proposed a hyperspectral image change detection method that first encodes the position of each pixel, then uses a spectral transformer encoder and a spatial transformer encoder to extract spectral sequence information and spatial texture information, respectively; finally, a temporal transformer extracts the useful change features between the image pair, and a multi-layer perceptron produces the detection result. Zhang et al. [46] proposed a cascaded attention-induced difference representation learning method for multispectral change detection that exploits correlations between features extracted from the bitemporal images to obtain more discriminative features, from which the final detection map is produced.
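The weight-sharing idea behind the Siamese networks discussed above can be illustrated with a toy numpy example: both temporal branches apply the same parameters, so an unchanged scene yields identical features and a zero difference signal. The shapes and the single linear-plus-ReLU layer are illustrative assumptions, not the architecture of any cited network:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))  # one weight matrix shared by both branches

def shared_encoder(patch):
    # Linear layer + ReLU; both temporal inputs pass through the SAME W,
    # which is what "weight-sharing Siamese" means.
    return np.maximum(W @ patch.ravel(), 0.0)

p_t1 = rng.standard_normal((4, 4))   # 4x4 patch at time 1
p_t2 = p_t1.copy()                   # unchanged scene at time 2
f_t1, f_t2 = shared_encoder(p_t1), shared_encoder(p_t2)
change_signal = np.abs(f_t1 - f_t2)  # exactly zero where nothing changed
```

Because the two branches cannot drift apart, any nonzero difference between their outputs reflects a difference between the inputs rather than between the encoders, which is what makes the difference signal usable for change detection.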
With the emergence of high-resolution remote sensing images, traditional change detection methods can no longer solve the related problems well. Ground object scenes in high-resolution imagery are generally more complex, and differences in season or lighting between acquisitions mean that ground objects with the same semantic meaning may exhibit different spectral characteristics at different times and spatial positions. This introduces substantial noise and makes it harder to detect changes in specific ground objects. Deep learning-based change detection, by virtue of its powerful representation ability, can better model the relationship between remote sensing images and real-world objects. Wang et al. [47] used Faster R-CNN to detect changes in high-resolution remote sensing images; experiments show that this method improves detection accuracy and reduces the probability of false detections, thereby extracting more of the real change information. Ding et al. [48] proposed a novel dual-branch end-to-end change detection network, innovatively introducing a spatial attention-guided cross-layer addition and skip-connection module to aggregate multi-level contextual information, reduce the heterogeneity between original image features and difference features, and direct the network's attention to the regions where changes occur. Shu et al. [49] proposed a dual-perspective change context network for change detection that extracts and refines change features through bitemporal feature fusion and context modeling. Yin et al. [50] proposed an attention-guided Siamese change detection network that combines shallow spatial information and deep semantic information to help restore the edge details of changed areas and reconstruct small targets during upsampling.
1.2. Related Work
Considering how dual-temporal remote sensing images are collected, factors such as sensor characteristics, illumination, and solar angle inevitably cause the same object to appear at different positions and with different spectra, producing visual changes in the imagery. These changes are not caused by actual changes in the ground objects and should not be detected. Such redundant information creates a problem during model training: if an earlier block fails to correctly segment the entire target, its incorrect predictions may accumulate in subsequent blocks, making it impossible to obtain structurally complete change detection results. Therefore, how to fully exploit the information contained in the dual-temporal remote sensing images while eliminating redundant, useless information is the difficulty we currently face. In this paper, we propose a new change detection network to address these issues: a dual-temporal remote sensing image change detection network based on the Siamese-attention feedback system architecture (SAFNet). First, we propose a global semantic module (GSM) in the encoder network that generates a low-resolution semantic change map to roughly capture the changing objects and provide semantic guidance for reconstructing the change map. Then, we propose a temporal interaction module (TIM). Unlike previous methods that fuse dual-temporal features with only a single concatenation or subtraction operation, TIM enhances the interaction between dual-temporal features, filters out redundant information, and improves the network's perception of the entire changing target. Finally, we propose a change feature extraction module (CFEM) to capture temporal difference information at different feature levels, and a feature refinement module (FRM) to adaptively focus on the changed area, strengthening the network's ability to detect edge information and small targets.
Our work’s main contributions are summarized as follows:
- 1.
We propose a bitemporal remote sensing image change detection network based on the Siamese-attention feedback system architecture (SAFNet) to address the challenges in change detection tasks. We design a temporal interaction module (TIM). When multi-scale features in the encoder block are passed into the corresponding decoder, the network’s perception of the changing target is enhanced by using TIM to implement feature feedback between the two time steps, thus producing better detection results. SAFNet produces prediction outputs step by step and eventually obtains the best change prediction map.
- 2.
We propose the global semantic module (GSM), change feature extraction module (CFEM), and feature refinement module (FRM). By introducing GSM into the deep layers of the encoder network to obtain context-aware semantic change information across multiple scales and receptive fields, the network is guided to better locate significant change areas during learning, reducing false detections and missed detections. CFEM extracts difference information from the dual-temporal features at each level, better learning the edge and texture characteristics of the change features. FRM enables the network to capture change features in both spatial and channel dimensions, eliminates and suppresses redundant features, and weights the feature extraction for the next time step, thereby improving the network's detection accuracy.
- 3.
Extensive experiments on two remote sensing image change detection datasets show that compared with other deep learning-based change detection algorithms, our proposed SAFNet network demonstrates robustness and high precision.
The rest of the paper is organized as follows:
Section 2 introduces each module in the model and analyzes the composition of the dataset.
Section 3 analyzes the performance of the model through experiments.
Section 4 discusses the strengths of our approach and directions for future work.
Section 5 summarizes our work in this paper.