1. Introduction
In recent years, remote sensing technology has advanced rapidly, resulting in significant improvements in the imaging capability and quality of remote sensing satellites. Optical remote sensing images now achieve a ground sampling distance (GSD, the ground distance represented by each pixel) finer than 0.1 m. This enhanced resolution facilitates the clear and precise identification of surface objects using remote sensing images. As a result, remote sensing images have become a widely used tool for identifying and analyzing ground objects; change detection is an important application of this technique.
Remote sensing change detection refers to the analysis of images from the same geographical area captured at different times to identify changes in features of interest within a specific time range [1]. It can be applied to a variety of scenarios, such as arable land changes, building changes, lake and river changes, road network changes, etc. [2,3,4]. Remote sensing change detection technology is important for national land regulation, modern urban planning, natural disaster assessment [5], and military facility reconnaissance. Therefore, studying change detection algorithms with higher accuracy is of great theoretical significance and application value.
Numerous studies have explored the problem of change detection in remote sensing images. The accuracy of traditional methods is relatively low due to the effects of atmospheric conditions, seasonal variations, satellite sensor characteristics, and solar elevation.
Recently, deep learning methods have gained widespread use in remote sensing change detection. These techniques are capable of automatically extracting complex, hierarchical, and nonlinear features from raw data, overcoming the limitations of traditional change detection methods and exhibiting outstanding performance. Based on the deep feature extraction process of dual-temporal images, deep-learning-based change detection frameworks can be categorized into three types: single-stream, double-stream, and multi-model integrated [6]. Double-stream Siamese networks have received more attention due to their simpler structure and stronger performance. In double-stream Siamese networks, the deep models used to extract features can be classified as convolutional neural-network-based models [7,8], recurrent neural-network-based models [9,10,11], Transformer-based models [12,13], generative adversarial network-based models [14], etc. [15].
Convolutional neural networks can preserve neighborhood connections and local features and can process large images thanks to their shared convolutional kernels. The FC-EF and FC-diff algorithms [7] were the pioneers in utilizing a fully convolutional Siamese network architecture with skip connections. These methods were the first to employ end-to-end training and fully convolutional neural networks, improving the network's accuracy and inference speed without increasing the training time. Subsequently, researchers [8,12,16,17,18,19,20,21,22] extensively employed the Siamese network encoder and UNet decoder architecture as a base model for change detection. SNUNet-CD [8] increases the flow paths of multi-scale features in the decoder, reducing the loss of localization information in shallow features. ECAM [23] is designed to refine the most representative outputs at different feature scales.
Recurrent neural networks are very effective in capturing sequence relationships; in change detection, they can effectively establish change relationships between dual-temporal images. REFEREE [9] is based on an improved long short-term memory (LSTM) model to acquire and record change information from long-term serial remote sensing data, using core storage units to learn change rules from information about binary or multi-class changes. In addition, there are algorithms that combine CNNs and RNNs to implement change detection. SiamCRNN [10] uses a deep Siamese convolutional neural network for feature extraction and maps the spatial-spectral features extracted by stacked long short-term memory units to a new latent feature space to mine the change information between them. In FCD-R2U-net [11], to reduce the possible loss of topological information in changing regions, the classical R2U-Net structure is improved by using a pair of R2CUs instead of a single R2CU in each convolutional layer of the encoder and decoder paths, making the model focus on detailed changing forest objects.
Transformer [24] can extract contextually relevant feature representations through a multi-head self-attention mechanism and has been widely used in remote sensing image processing in recent years. BIT [12] is an effective transformer-based [24] change detection method for remote sensing images. It expresses the input image as visual words and models the context in a compact token-based space-time, facilitating the identification of change features of interest while excluding irrelevant non-change features. The Changeformer [13] algorithm is the first pure transformer-based change detection model. It leverages the MiT [25] backbone network, which excels in semantic segmentation models, for the change detection task, integrating a hierarchically structured transformer encoder with a multi-layer perceptron decoder to effectively extract the desired multi-scale long-range relations.
However, these algorithms operate within a Siamese network architecture in which the encoder is not optimized for the change detection task: change features are extracted only in the decoder, leaving the encoder parameters underutilized.
Furthermore, extraction of change features is crucial for change detection tasks. Some studies [1,19,20,26,27] have sought to improve the performance of change detection by enhancing the fusion of multi-scale features. The STANet [1] algorithm incorporates a change detection self-attention module after the encoder network, allowing the computation of spatiotemporal relationships between any two pixels in the input images. Additionally, it uses self-attention at different scales to account for the scale diversity of building targets, resulting in more effective change features. The FCCDN [26] algorithm introduces a feature fusion module, DFM, based on dense connectivity that is both simple and effective. The module includes difference and summation branches, where the summation branch enhances edge information and the difference branch generates regions of variation. Each branch is constructed from two densely connected streams with shared weights, reducing feature misalignment. The DSANet [19] algorithm uses a remote sensing change detection method based on deep metric learning, employing a dual attention module to improve feature discrimination and distinguish changes more robustly. SCFNet [27] introduces a fused self-attention and convolutional structure in the deepest layer of the encoder to better capture the semantics and positions of different buildings in the study area, providing more informative features for subsequent feature fusion.
However, the change detection task should be symmetric with respect to the two temporal phases: the prediction should be the same regardless of whether the image of temporal phase one or the image of temporal phase two is input first. Some existing change feature fusion algorithms do not consider this symmetry, while others rely on complex attention mechanisms.
In this paper, we propose a solution to the problem of symmetry in change features by introducing a symmetric change feature fusion module (SCFM). The SCFM uses a two-branch feature selection approach, which preserves feature information while incorporating strong prior knowledge, together with a spatial feature attention mechanism based on cosine similarity. To fully utilize the encoder parameters and address the issue of delayed change feature extraction, we propose the mutual feature-aware module (MFAM) in the encoder stage. The MFAM incorporates change features into the encoder stage and uses a cross-type attention mechanism to model long-range dependencies. We also address sample imbalance and poor edge detection in change detection by introducing a Dice-based loss function for edge region detection.
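To make the symmetry requirement concrete, the following NumPy sketch is purely illustrative (it is not the paper's SCFM implementation; function names and feature shapes are hypothetical). It shows that fusing dual-temporal features through an absolute-difference branch and a summation branch is invariant to the temporal input order, and that a cosine-similarity-based spatial weight shares the same property:

```python
import numpy as np

def symmetric_fuse(f1, f2):
    """Order-invariant fusion via |f1 - f2| and (f1 + f2) branches.

    A simplified stand-in for the two branches of a symmetric change
    feature fusion module; the actual SCFM additionally applies channel
    selection attention and cosine-similarity spatial attention.
    """
    return np.concatenate([np.abs(f1 - f2), f1 + f2], axis=0)

def cosine_dissimilarity_map(f1, f2, eps=1e-8):
    """Per-pixel change weight: 1 - cosine similarity along the channel
    axis, so dissimilar (likely changed) pixels get larger weights."""
    num = (f1 * f2).sum(axis=0)
    den = np.linalg.norm(f1, axis=0) * np.linalg.norm(f2, axis=0) + eps
    return 1.0 - num / den

rng = np.random.default_rng(0)
f1 = rng.standard_normal((4, 8, 8))  # (channels, H, W) features, time 1
f2 = rng.standard_normal((4, 8, 8))  # features, time 2

# Swapping the temporal order leaves the fused features unchanged.
assert np.allclose(symmetric_fuse(f1, f2), symmetric_fuse(f2, f1))
# The cosine-based spatial weighting is also symmetric in its inputs.
assert np.allclose(cosine_dissimilarity_map(f1, f2),
                   cosine_dissimilarity_map(f2, f1))
```

Note that a plain signed difference `f1 - f2` would flip sign when the inputs are swapped, which is why the absolute difference and the sum are the natural order-invariant branches.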
The main contributions of this paper are summarized as follows:
A symmetric change feature fusion module (SCFM) is proposed, which uses parallel feature differencing and feature summation to maintain the symmetry of the results, while employing feature channel selection attention and explicit spatial attention based on cosine similarity to enhance the model’s extraction of change features.
A mutual feature-aware module (MFAM) is proposed to introduce change features in the deep stage of encoders based on the Siamese network architecture, which, compared to previous work, makes fuller use of the encoder's large parameter capacity and focuses feature extraction on the regions that need attention.
We propose a new edge loss function (EL) for improving change detection in edge regions.
Based on the above three structures, we propose a mutual feature-aware network (MFNet) for remote sensing image change detection. Moreover, we conducted extensive experiments on the public SYSU-CD and LEVIR-CD datasets and achieved advanced performance with F1 scores of 83.11% and 91.52%, respectively.
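The edge loss above builds on the Dice formulation, which is a ratio of overlap to total size and is therefore insensitive to the imbalance between changed and unchanged pixels. The sketch below is a hypothetical illustration of the idea, not the paper's exact EL: it assumes the edge region is a band obtained by dilating and eroding the ground-truth change mask, which may differ from the authors' definition.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P.G| / (|P| + |G|)."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def dilate(m):
    """Binary dilation with a 4-connected structuring element."""
    p = np.pad(m, 1)
    return (p[:-2, 1:-1] | p[2:, 1:-1] |
            p[1:-1, :-2] | p[1:-1, 2:] | p[1:-1, 1:-1])

def erode(m):
    """Binary erosion with a 4-connected structuring element."""
    p = np.pad(m, 1)
    return (p[:-2, 1:-1] & p[2:, 1:-1] &
            p[1:-1, :-2] & p[1:-1, 2:] & p[1:-1, 1:-1])

gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True                # a 4x4 changed region
band = dilate(gt) & ~erode(gt)     # band around the change boundary

pred = gt.astype(float)            # a perfect soft prediction
total = dice_loss(pred, gt.astype(float))           # Dice over all pixels
edge = dice_loss(pred[band], gt[band].astype(float))  # Dice on edge band only
```

Restricting the Dice term to the boundary band concentrates the gradient signal on the pixels that are hardest to classify, while the ratio form keeps the loss bounded regardless of how few changed pixels the image contains.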
This paper is organized as follows: In Section 2, we describe the overall framework and detailed structure of the proposed algorithm. In Section 3, evaluation results on public datasets are presented and compared with current state-of-the-art algorithms. In Section 4, detailed ablation experiments with visualizations are discussed. Section 5 concludes the paper.
5. Conclusions
In this paper, we propose MFNet, a mutual feature-aware network for remote sensing image change detection. Based on the encoder-decoder and Siamese network structure, we address three problems in current remote sensing image change detection tasks: asymmetric change feature fusion, delayed change feature extraction, and sample imbalance together with difficult edge detection. To this end, we propose a symmetric change feature fusion module (SCFM), a mutual feature-aware module (MFAM), and an edge loss function (EL). The SCFM introduces prior knowledge about change features through two-branch feature selection without losing feature information and explicitly weights features in the spatial dimension based on cosine similarity. The MFAM introduces change features in advance at the encoder, allowing the model to target feature extraction toward the feature comparison performed in subsequent decoders. The edge loss guides the model to focus on the more difficult edge regions while alleviating the imbalance between positive and negative samples.
We experimented on two commonly used change detection datasets, SYSU-CD and LEVIR-CD, and compared the results with current mainstream remote sensing change detection algorithms. Detailed ablation experiments and feature visualization analyses were also performed to demonstrate the effectiveness of the proposed method.
Regarding future work, in the proposed mutual feature-aware module, the interaction between the mutual features and the original features is performed by a simple concatenation followed by a convolution; more efficient structures for feature interaction can be explored in the future. Moreover, to improve change detection performance, our method uses a heavy encoder, which results in high computational complexity and prevents its application when computational resources are restricted. In the next stage, we will study lightweight remote sensing image change detection models under computational resource constraints.