Building Change Detection with Deep Learning by Fusing Spectral and Texture Features of Multisource Remote Sensing Images: A GF-1 and Sentinel 2B Data Case

Building change detection is an important task in the remote sensing field, and the powerful feature extraction ability of the deep neural network model shows strong advantages in this task. However, the datasets used for this study are mostly three-band high-resolution remote sensing images from a single data source, and few spectral features limit the development of building change detection from multisource remote sensing images. To investigate the influence of spectral and texture features on the effect of building change detection based on deep learning, a multisource building change detection dataset (MS-HS BCD dataset) is produced in this paper using GF-1 high-resolution remote sensing images and Sentinel-2B multispectral remote sensing images. According to the different resolutions of each Sentinel-2B band, eight different multisource spectral data combinations are designed, and six advanced network models are selected for the experiments. After adding multisource spectral and texture feature data, the results show that the detection effects of the six networks improve to different degrees. Taking the MSF-Net network as an example, the F1-score and IOU improved by 0.67% and 1.09%, respectively, compared with high-resolution images, and by 7.57% and 6.21% compared with multispectral images.


Introduction
Building change detection is an important research topic in the field of remote sensing. It refers to the design of algorithms that extract building changes from images of the same area acquired in different periods [1], and it plays an important role in the investigation of damaged buildings and the study of urbanization processes [2,3]. The development of remote sensing technology has led to the continuous improvement of image quality [4]; high-resolution images have fine texture features, and multispectral images are rich in spectral features. Combining these two types of images and investigating their effects on building change detection is particularly important for further expansion of the field.
Traditional change detection methods [5] include principal component analysis [6] and arithmetic operations [7,8]. These methods are simple to operate and only require arithmetic calculations on remote sensing images from different periods, but their detection performance is poor and they cannot process images in batches, which makes it difficult to cope with the rapidly growing volume of remote sensing data [4,9,10]. The rapid development of deep learning [11], especially the powerful image processing capability of convolutional neural networks (CNNs) [12], has led to the wide use of CNNs in the remote sensing field [13,14], including building change detection from remote sensing imagery [15,16].
Early fusion networks such as fully convolutional early fusion (FC-EF) [17], based on fully convolutional networks (FCNs) [18], and UNet++_MSOF [19], based on UNet++ [20], were proposed successively; they perform change detection by feeding the concatenated dual-temporal remote sensing images into the network. Daudt et al. proposed the FC-Siam-conc and FC-Siam-diff networks [17], which combine FCN and Siamese networks [21] for change detection. This approach can learn the deep-level features of each single remote sensing image and improve detection accuracy. Since then, Siamese networks such as NestNet [10], Siamese Nested UNet [22], DASNet [23], and PGA-SiamNet [24] have been proposed to further develop the remote sensing change detection field. STA-Net [25], ADS-Net [26], and MSF-Net [9] have also been proposed specifically to improve building change detection accuracy in remote sensing images.
In addition to the Siamese network structures, scholars have added attention mechanisms [27,28] to change detection networks. Zhang et al. designed the deeply supervised image fusion network (DSIFN) [4], which introduces a spatial attention mechanism [29] and a channel attention mechanism [30] to effectively improve change feature detection. Chen et al. designed STA-Net [25], which introduces a spatial-temporal attention mechanism to capture long-term spatial-temporal dependencies and learn better building features. Wang et al. designed ADS-Net [26], adding a convolutional block attention module [31] to the network to reconstruct features for multiscale information and improve the building change detection ability. Chen et al. designed MSF-Net [9], introducing selective kernel convolution [32] and multiple channel and spatial attention mechanisms to fuse multiscale features for building change extraction.
Although building change detection networks have matured, current research is mostly based on single-source high-resolution image datasets (e.g., LEVIR-CD [25], WHU Building Dataset [33], AIST Building Change Detection [34], SYSU-CD dataset [35]) and single-source multispectral image datasets (OSCD dataset [36]). A single satellite sensor has a specific revisit period for the same area, is affected by weather such as clouds and fog, and cannot provide continuous high-quality image data for the same area [37]. This condition limits ground-surface dynamic monitoring tasks such as change detection, so it is necessary to conduct research on building change detection with multiple data sources. For remote sensing imagery, the high-resolution satellite-based datasets described above have higher spatial resolution and rich texture features, but their spectral features are insufficient. Multispectral satellite-based datasets have rich spectral features, but their spatial resolution is lower and their texture features are insufficient; in addition, the existing OSCD multispectral dataset containing building change information has only 24 image pairs, which limits the training effect of the neural network. Effectively combining multisource remote sensing images and using more textural and spectral features plays an important role in improving building change detection accuracy.
For multisource change detection, Liu proposed a change discovery and update method for high-consequence areas based on multisource remote sensing imagery [38], using Landsat-8 multispectral images to assist high-resolution imagery in change detection and achieving higher accuracy. Zhao proposed a land use change detection method based on multisource data, combining imagery and vector data to achieve fully automatic and efficient land use change detection and extraction [39]. Wang conducted change detection experiments based on a convolutional neural network and multisource high-resolution remote sensing data from ZY-3 and GF-2, designing a hybrid convolutional feature extraction module, a hybrid interleaved group convolution module, and a multiloss supervised training method to obtain fine change detection results [37]. Zhang et al. designed W-Net [40], which can be used for change detection tasks on single-source and multisource remote sensing data. Their experiments showed that using combined multisource data as the model input, which merges the advantages of spectral, texture, and structural information, can significantly improve the robustness of the model. Chen et al. designed the deep Siamese convolutional multiple-layer recurrent neural network (SiamCRNN) [41], which can perform change detection when the pre- and post-change images come from heterogeneous sources. Seydi et al. proposed an end-to-end multidimensional CNN framework [42] for land use change detection in multisource remote sensing data, using three different types of remote sensing datasets (multispectral, hyperspectral, and polarimetric synthetic aperture radar) to evaluate the effectiveness and reliability of the proposed method.
From the above research, two conclusions can be drawn. (1) The existing building change detection datasets are mostly single-source high-resolution images or single-source multispectral images, and there is a lack of open-source datasets based on multisource remote sensing images. High-resolution datasets such as LEVIR-CD are rich in texture features but insufficient in spectral information, while multispectral datasets such as OSCD are sufficient in spectral information but insufficient in texture features, and the small size of the OSCD dataset limits the learning capability of the model. Therefore, the lack of a multisource building change detection dataset limits the further development of this field. (2) Current research on multisource change detection focuses on designing change detection methods that can be used with multiple data sources. However, there are few comparative studies on how adding multisource spectral and texture features affects detection accuracy; the main reason for this gap is again the lack of datasets.
To solve the above problems, this paper uses GF-1 high-resolution images and Sentinel-2B multispectral images to produce a multisource remote sensing image building change detection dataset (MS-HS BCD dataset), which provides a database for studying multisource building change detection; the proposed dataset will be released through GitHub (https://github.com/arcgislearner/MS-HS-BCD-dataset (accessed on 6 April 2023)). Additionally, based on the MS-HS BCD dataset, six open-source state-of-the-art change detection network models are selected to explore how fusing multisource spectral and texture features affects building change detection. The experimental results show that after fusing the multisource texture and spectral information, the detection performance of all six network models improved compared with that obtained from a single data source. The proposed dataset and study provide a database for multisource building change detection research and a reference for further research applications in this field.

Study Area
In this paper, dual-temporal images of Huangdao District, Qingdao City, Shandong Province, were selected to produce a multisource image building change detection dataset. Huangdao District is located in the southeastern part of the Shandong Peninsula, near the Yellow Sea, at latitude 35°35′ to 36°08′ N and longitude 119°30′ to 120°11′ E (as shown in Figure 1), and is the ninth national new district of the People's Republic of China. In recent years, Huangdao District has experienced rapid economic development, accelerated urbanization, and continuous reconstruction of the old city. This has resulted in significant changes in its buildings, which makes it possible to produce the building change detection dataset in this paper.

Data Selection and Preprocessing
In this study, GF-1 high-resolution remote sensing images and Sentinel-2B multispectral remote sensing images from February 2019 and December 2019 were used to produce the dataset. The GF-1 data selected for this paper were obtained from Shandong University of Science and Technology; owing to restrictions imposed by the data provider, it is only known that the GF-1 data were acquired in February and December. The Sentinel-2B data were obtained from the ESA Copernicus Data Center (https://scihub.copernicus.eu/dhus/#/home (accessed on 12 April 2021)), and the acquisition dates of the two Sentinel-2B images are 16 February 2019 and 7 December 2019. To ensure the reliability of the dataset, the buildings in the GF-1 and Sentinel-2B images from February and December were manually checked for consistency. The GF-1 images are 3-band RGB images with a spatial resolution of 2 m/pixel after image fusion, and the Sentinel-2B data are 13-band images with spatial resolutions of 10, 20, and 60 m/pixel. The parameters of the two satellite images are shown in Tables 1 and 2.
The GF-1 images had already been preprocessed with radiometric correction and georeferencing, so only image resampling was performed in this paper. The finest spatial resolution of the Sentinel-2B multispectral image is 10 m/pixel. To facilitate cropping the corresponding areas of the two images and inspecting the features in the cropped images, the GF-1 image was resampled to 2.5 m/pixel using the resampling tool in ArcGIS with the nearest neighbor method. The Sentinel-2B images were obtained at the L1C level, with orthorectification and geometric correction already applied. To produce the L2A-level data, the L1C-level data were radiometrically calibrated and atmospherically corrected using the Sen2cor plug-in released by ESA. The 10th band was then removed from the L2A-level data, leaving 12 bands. The L2A-level image was then resampled using the Sen2cor tool to bring the 20 m/pixel and 60 m/pixel bands to 10 m/pixel, and each band was saved as a TIFF file. Finally, all bands of the L2A Sentinel-2B images were resampled to 2.5 m/pixel, again using the ArcGIS resampling tool with the nearest neighbor method, so that they had the same resolution as the GF-1 images, which was convenient for subsequent data processing and neural network training. We also used the geographic registration tool in ArcGIS to register the GF-1 and Sentinel-2B images of each of the two periods. The registration method was an affine transformation. Due to the limitation of resolution, the registration error was kept at about half a pixel.
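The nearest neighbor resampling described above can be illustrated with a minimal NumPy sketch. It is not the ArcGIS resampling tool; the function name `resample_nearest` and the pixel-centre mapping are assumptions made for illustration only.

```python
import numpy as np

def resample_nearest(band: np.ndarray, src_res: float, dst_res: float) -> np.ndarray:
    """Nearest neighbor resampling of a 2-D band from src_res to dst_res (m/pixel).

    Illustrative only: each output pixel centre is mapped back to the source
    grid and takes the value of the source pixel it falls in.
    """
    scale = src_res / dst_res  # e.g. 10 m -> 2.5 m gives a factor of 4
    out_h = int(round(band.shape[0] * scale))
    out_w = int(round(band.shape[1] * scale))
    rows = np.minimum(((np.arange(out_h) + 0.5) / scale).astype(int), band.shape[0] - 1)
    cols = np.minimum(((np.arange(out_w) + 0.5) / scale).astype(int), band.shape[1] - 1)
    return band[np.ix_(rows, cols)]

# A 10 m/pixel Sentinel-2B band becomes 4 x 4 copies of each pixel at 2.5 m/pixel.
s2_band = np.array([[1, 2], [3, 4]], dtype=np.uint16)
s2_resampled = resample_nearest(s2_band, src_res=10.0, dst_res=2.5)

# A 2 m/pixel GF-1 band is slightly downsampled to 2.5 m/pixel.
gf1_band = np.arange(25).reshape(5, 5)
gf1_resampled = resample_nearest(gf1_band, src_res=2.0, dst_res=2.5)
```

Nearest neighbor resampling keeps the original pixel values unchanged, so upsampling a 10 m band to 2.5 m adds no new spatial detail; it only aligns the pixel grids of the two sources.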

Dataset Production
ArcGIS was used in this paper to label buildings that changed between the two periods, covering objects such as buildings, movable houses, and agricultural sheds, for a total of 3646 elements. The annotation files were exported to raster data files; using a raster calculator, the pixels of changed buildings were marked as one and those of unchanged buildings as zero. Considering the computing power of the GPU, the annotation file, the 3-band GF-1 images, and the 12 single-band images of the processed L2A-level Sentinel-2B data were cropped into 256 × 256 tiles, and the cropped annotation images were checked to correct mislabeled and omitted objects. To reduce the negative impact of the imbalance between changed and unchanged samples on model training, the cropped image pairs without building changes were excluded, and a total of 600 sets of building change images were obtained. Each set contains the prechange image, the postchange image, and the change annotation file. The pre- and postchange images are composed of the GF-1 images and the 12 single-band Sentinel-2B images.
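The cropping and filtering step can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function name `tile_pairs`, the `(C, H, W)` array layout, and the choice to discard incomplete edge tiles are assumptions.

```python
import numpy as np

TILE = 256  # tile size stated in the text

def tile_pairs(pre, post, label, tile=TILE):
    """Yield (pre, post, label) tiles that contain at least one changed pixel.

    pre, post: arrays of shape (C, H, W); label: (H, W) with 1 = changed, 0 = unchanged.
    Incomplete tiles at the right and bottom edges are skipped (an assumption).
    """
    h, w = label.shape
    for r in range(0, h - tile + 1, tile):
        for c in range(0, w - tile + 1, tile):
            lab = label[r:r + tile, c:c + tile]
            if lab.any():  # exclude tiles without building change
                yield pre[:, r:r + tile, c:c + tile], post[:, r:r + tile, c:c + tile], lab

# Toy example: 15 channels (3 GF-1 + 12 Sentinel-2B), one changed pixel in one tile.
pre = np.zeros((15, 512, 512), dtype=np.uint8)
post = np.zeros((15, 512, 512), dtype=np.uint8)
label = np.zeros((512, 512), dtype=np.uint8)
label[10, 300] = 1
tiles = list(tile_pairs(pre, post, label))
```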
After a series of operations such as resampling and cropping of the Sentinel-2B multispectral images, the single-band data were stored as 16-bit unsigned integers, with possible values from 0 to 65,535, and some pixel values in the image area exceeded 7000. If such image data are loaded directly for neural network training, the excessively large values make the loss decrease unstably and increase the difficulty of model training. Therefore, this study converted the 16-bit unsigned image data into 8-bit unsigned data with values ranging from 0 to 255.
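The text does not state how the 16-bit values were mapped to 8 bits. One common option is a per-band linear min-max stretch, sketched below as an assumption; the function name `to_uint8` is hypothetical.

```python
import numpy as np

def to_uint8(band: np.ndarray) -> np.ndarray:
    """Linearly stretch a 16-bit band to the 0-255 range (min-max scaling).

    Assumed conversion for illustration; the paper does not specify the method.
    """
    band = band.astype(np.float64)
    lo, hi = band.min(), band.max()
    if hi == lo:  # constant band: avoid division by zero
        return np.zeros(band.shape, dtype=np.uint8)
    scaled = (band - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)

band16 = np.array([[0, 1200], [4300, 7000]], dtype=np.uint16)
band8 = to_uint8(band16)
```

A percentile-based stretch (for example clipping at the 2nd and 98th percentiles before scaling) is a frequent alternative when a few very bright pixels would otherwise compress the rest of the range.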
The 600 image sets finally generated were divided into 540 training sets, 30 validation sets, and 30 test sets. This multisource image building change detection dataset was named the multispectral-high-resolution building change detection dataset (MS-HS BCD dataset). Dataset production and the experimental flow of this paper are shown in Figure 2.
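A split with these sizes could be produced as in the sketch below. The random shuffling and the fixed seed are assumptions; the text gives only the sizes of the three subsets, not how the sets were assigned.

```python
import random

def split_dataset(n_sets=600, n_val=30, n_test=30, seed=0):
    """Return (train, val, test) lists of image-set indices.

    The shuffle and seed are illustrative assumptions, not the authors' procedure.
    """
    ids = list(range(n_sets))
    random.Random(seed).shuffle(ids)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test

train_ids, val_ids, test_ids = split_dataset()
```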

Deep Neural Network for Building Change Detection
In this paper, six open-source, state-of-the-art or widely used change detection networks, MSF-Net [9], Siam-conc, Siam-diff, Siam-conc-diff [22], SNUNet-CD [43], and NestNet [10], are selected to explore the impact of multisource spectral data on building change detection improvement.

1. MSF-Net: A state-of-the-art multiscale supervised fusion network based on attention mechanisms; the network structure is shown in Figure 3. MSF-Net builds a dual-context fusion module (Figure 3b) to obtain global context information of buildings and introduces a channel attention mechanism (Figure 3e) and selective kernel convolution (Figure 3f) into the network encoding (Figure 3a) and decoding (Figure 3d) modules to enhance the building change detection capability. A new multiscale fusion module and a multiscale output module are designed to enable the network model to extract buildings at different scales simultaneously. We use it in this paper because of its powerful feature extraction capability and state-of-the-art performance.
The network structures of Siam-conc, Siam-diff, and Siam-conc-diff are shown in Figure 4. These three networks combine a Siamese structure with UNet++. This combination is widely used in the field of change detection: the UNet++ model improves the multiscale detection capability, and the Siamese structure enables the model to learn deeper building features from both dual-temporal images simultaneously, effectively improving the building change detection accuracy.
2. Siam-conc: As shown in Figure 4, in the Siam-conc network, the "Operation" is channel concatenation; Siamese UNet++ concatenates the channels of the pre- and postchange images for building change detection.
3. Siam-diff: In the Siam-diff network, the "Operation" in Figure 4 calculates the difference between the two images before and after the change; Siamese UNet++ then feeds the calculated difference into the next network structure to detect the changed buildings.
4. Siam-conc-diff: In the Siam-conc-diff network, the "Operation" in Figure 4 concatenates the channels of the images before and after the change together with the result of their difference operation; Siamese UNet++ then performs feature extraction on the concatenation result to detect the changed buildings.
5. SNUNet-CD: The SNUNet-CD network structure is shown in Figure 5. An improved Siamese UNet++ uses the Ensemble Channel Attention Module (ECAM, Figure 5b) to combine the outputs of multiple branches into one output and obtain representative features at different scales. This process can improve the building change detection accuracy at different scales. This network is widely cited for its advanced structure, which is why we chose it for our study.
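The three "Operation" variants of the Siamese UNet++ networks differ only in how the pre- and postchange tensors are combined. The NumPy sketch below illustrates the idea on `(C, H, W)` arrays. The function name `fuse` is hypothetical, and using the absolute difference (rather than a signed difference) is an assumption, since the text only says "difference".

```python
import numpy as np

def fuse(feat_pre: np.ndarray, feat_post: np.ndarray, mode: str) -> np.ndarray:
    """Combine two (C, H, W) tensors from the shared-weight branches.

    mode: "conc"      -> channel concatenation (2C channels)
          "diff"      -> absolute difference (C channels; absolute value is an assumption)
          "conc-diff" -> both inputs plus their difference (3C channels)
    """
    if mode == "conc":
        return np.concatenate([feat_pre, feat_post], axis=0)
    diff = np.abs(feat_pre - feat_post)
    if mode == "diff":
        return diff
    if mode == "conc-diff":
        return np.concatenate([feat_pre, feat_post, diff], axis=0)
    raise ValueError(f"unknown fusion mode: {mode}")

a = np.ones((4, 8, 8), dtype=np.float32)
b = np.zeros((4, 8, 8), dtype=np.float32)
shapes = {m: fuse(a, b, m).shape for m in ("conc", "diff", "conc-diff")}
```

The channel count of the fused tensor therefore changes with the variant, which is why the layer that follows the "Operation" must be sized accordingly.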

Combination of MS-HS BCD Datasets
To investigate building change detection with multisource spectral and texture feature data, we set up eight data combinations based on the MS-HS BCD dataset for the training, validation, and test sets used by the deep neural networks in this paper. One combination contains only the single-source GF-1 high-resolution images with three bands (red, green, blue), as shown in Table 3. Four combinations contain only the single-source Sentinel-2B multispectral images and are grouped according to the original resolution of the Sentinel-2B bands before resampling, as shown in Tables 4-7. The remaining combinations are multisource image pairs that combine high-resolution images with multispectral images. Again depending on the original Sentinel-2B resolution, there are three such combinations, which are shown in Tables 8-10. Table 8 shows the combination of the RGB bands of the GF-1 image and the NIR band of the Sentinel-2B image, whose original resolution is 10 m/pixel. Table 9 extends the band combination in Table 8 with the Vegetation Red Edge and SWIR bands, whose original resolution is 20 m/pixel. Table 10 extends the band combination in Table 9 with the Coastal Aerosol and Water Vapor bands, whose original resolution is 60 m/pixel.
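The eight combinations can be summarized as channel lists, as in the sketch below. The band identifiers follow the standard Sentinel-2 designations (B2, B3, B4, B8 at 10 m; B5, B6, B7, B8A, B11, B12 at 20 m; B1, B9 at 60 m, with B10 removed). The combination names, the exact membership of each table, and the inclusion of B8A in the 20 m group are assumptions inferred from the band counts of 3, 4, 10, and 12 reported in the experiments.

```python
# Hypothetical channel names; only the grouping by original resolution comes from the text.
GF1_RGB = ["GF1_R", "GF1_G", "GF1_B"]
S2_RGB_10M = ["B2", "B3", "B4"]                      # blue, green, red
S2_NIR_10M = ["B8"]
S2_20M = ["B5", "B6", "B7", "B8A", "B11", "B12"]     # red edge, narrow NIR, SWIR
S2_60M = ["B1", "B9"]                                # coastal aerosol, water vapor

COMBINATIONS = {
    # single-source high resolution (Table 3)
    "GF1_3": GF1_RGB,
    # single-source multispectral (Tables 4-7, assumed order)
    "S2_3": S2_RGB_10M,
    "S2_4": S2_RGB_10M + S2_NIR_10M,
    "S2_10": S2_RGB_10M + S2_NIR_10M + S2_20M,
    "S2_12": S2_RGB_10M + S2_NIR_10M + S2_20M + S2_60M,
    # multisource (Tables 8-10)
    "GF1_S2_4": GF1_RGB + S2_NIR_10M,
    "GF1_S2_10": GF1_RGB + S2_NIR_10M + S2_20M,
    "GF1_S2_12": GF1_RGB + S2_NIR_10M + S2_20M + S2_60M,
}

channel_counts = {name: len(bands) for name, bands in COMBINATIONS.items()}
```

The number of input channels of each network's first convolution has to match the selected combination, so a lookup of this kind is a convenient place to derive it.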

Experiment Environment
The eight dataset combinations designed in this paper are input to the six selected building change detection models for training and validation. During training, the training dataset is augmented using methods such as random rotation, random noise, and random flipping to reduce the adverse effects of the small dataset on network training and to enhance the robustness of the networks. The experiment environment is shown in Table 11; all of the network models are built on the PyTorch deep learning framework, the programming language is Python, the programming environment is PyCharm, and the training parameters of each network model are consistent with those of its original paper. The experiments are run on a workstation with an AMD Ryzen 9 5950X 16-core (3.4 GHz) CPU, 128 GB RAM, and an Nvidia GeForce RTX 3090 (24 GB) GPU.
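For change detection, geometric augmentations must be applied identically to the prechange image, the postchange image, and the label, while noise should affect the images only. The NumPy sketch below illustrates this under stated assumptions: rotations are restricted to multiples of 90 degrees, the flip is horizontal, and `noise_std` is a hypothetical value; none of these settings is given in the text.

```python
import numpy as np

def augment(pre, post, label, rng, noise_std=2.0):
    """Apply one random rotation/flip to all three arrays and add noise to the images only.

    pre, post: (C, H, W) float arrays; label: (H, W) array of 0/1.
    noise_std is a hypothetical value, not taken from the paper.
    """
    k = int(rng.integers(0, 4))      # rotate by k * 90 degrees
    flip = rng.random() < 0.5        # horizontal flip with probability 0.5
    out = []
    for arr in (pre, post, label):
        arr = np.rot90(arr, k, axes=(-2, -1))
        if flip:
            arr = np.flip(arr, axis=-1)
        out.append(np.ascontiguousarray(arr))
    pre_a, post_a, label_a = out
    # Noise perturbs image values only; the label must stay binary.
    pre_a = pre_a + rng.normal(0.0, noise_std, pre_a.shape)
    post_a = post_a + rng.normal(0.0, noise_std, post_a.shape)
    return pre_a, post_a, label_a

rng = np.random.default_rng(0)
pre = np.zeros((4, 8, 8))
post = np.ones((4, 8, 8))
label = np.zeros((8, 8), dtype=np.uint8)
label[1, 2] = 1
pre_aug, post_aug, label_aug = augment(pre, post, label, rng)
```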

Evaluation Metric
In this paper, four metrics, precision (P), recall (R), F1-score (F1), and intersection over union (IOU), are selected to verify the effectiveness of the models. The higher the precision is, the fewer unchanged pixels the model falsely detects as changed. The higher the recall is, the fewer changed pixels the model misses. The higher the F1-score and IOU are, the better the overall performance of the model. The four metrics are calculated as follows:

$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \times P \times R}{P + R}$$

$$IOU = \frac{TP}{TP + FP + FN}$$

where TP represents pixels that actually changed and were predicted to change by the model, TN represents pixels that actually did not change and were predicted to not change by the model, FP represents pixels that actually did not change but were predicted to change by the model, and FN represents pixels that actually changed but were predicted to not change by the model.
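As a worked illustration of the four metrics, the sketch below counts TP, FP, and FN over flattened binary masks and applies the standard definitions. The function name `change_metrics` and the convention of returning 0.0 for an undefined ratio are assumptions.

```python
def change_metrics(pred, truth):
    """Return (precision, recall, f1, iou) for flat sequences of 0/1 pixel labels.

    A ratio with a zero denominator is reported as 0.0 (an assumed convention).
    """
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou

# Five pixels: TP = 2, FP = 1, FN = 1, TN = 1.
pred = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
precision, recall, f1, iou = change_metrics(pred, truth)
```

TN does not enter any of the four metrics, so a model cannot raise its scores simply by predicting the dominant unchanged class.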

Single-Source High-Resolution Remote Sensing Images Building Change Detection
Based on the dataset combinations in the MS-HS BCD dataset, this paper first performs change detection on single-source high-resolution remote sensing images, with the data source being the three-band GF-1 RGB images (Table 3). The training loss curves for the six network structures are shown in Figure 7; the losses of the six network models remained flat after epoch 50, with NestNet having the highest loss and SNUNet-CD having the lowest loss. The detection results of the six network models are shown in Table 12 and Figure 8.
From Table 12, the experimental results show that the MSF-Net algorithm performs the best, with the four metrics of precision, recall, F1-score, and IOU reaching 61.1%, 65.02%, 58.55%, and 43.31%, respectively. These results are 11.03%, 10.25%, 10.53%, and 8.24% higher, respectively, than those of the Siam-diff algorithm, which performs the best among the other advanced algorithms. The worst performing algorithm is NestNet, with an F1-score and IOU of only 43.84% and 30.96%, respectively, although its recall is markedly higher than that of other network models, reaching 60.46%.
Figure 8 shows that the Siam-conc and SNUNet-CD algorithms produce more false detection areas, and the detected changed buildings are unclear, with more broken edges. The changed buildings detected by Siam-conc-diff and NestNet are incomplete, which indicates an insufficient ability to extract information on changed buildings. The detection results of Siam-diff are slightly better than those of the remaining four advanced algorithms, but the detected building boundaries are still incomplete and blurred. MSF-Net improves on the other models: it produces fewer false detections, the detected changed buildings are more complete, and the boundaries are clearer.

Single-Source Multispectral Remote Sensing Images Building Change Detection
Using the annotated MS-HS BCD dataset, building change detection on single-source multispectral images is investigated, and experiments are conducted using the six network models on the four multispectral band combinations designed (Tables 4-7). The training loss curves for the six network structures are shown in Figure 9; the losses of NestNet and MSF-Net remained flat after epoch 60, and those of the other models remained flat after epoch 50. The detection results are shown in Tables 13-18.
Table 13 shows the experimental results of the Siam-conc network under the four multispectral image data combinations; it works best when the input data have three bands, with the F1-score and IOU reaching 47.79% and 33.98%, respectively. The network becomes less effective after spectral features from more bands are added, indicating that this network cannot make good use of additional spectral features and that adding low-resolution spectral features reduces its effectiveness.
Table 14 shows the experimental results of the Siam-conc-diff network under the four multispectral image data combinations; the Siam-conc-diff model similarly achieves the best results for three-band input data, with the F1-score and IOU reaching 47.89% and 34.46%, respectively, and the low-resolution spectral features reduce its detection performance as the number of input spectral bands increases.
Table 15 shows the experimental results of the Siam-diff network under the four multispectral image data combinations; the Siam-diff network performs best when the number of input bands is ten, where the precision, recall, F1-score, and IOU are all the highest, reaching 48.09%, 59.65%, 49.51%, and 35.15%, respectively. This indicates that with more spectral bands, Siam-diff learns more spectral features and detects changes more effectively, but its effectiveness decreases once the 60 m/pixel bands are added, indicating that an excessively low resolution degrades the detection performance.
Table 16 shows the experimental results of the SNUNet-CD network under the four multispectral image data combinations; SNUNet-CD achieves optimal results when the input data have twelve bands, with the F1-score and IOU reaching 50.08% and 36.19%, respectively. This indicates that the attention mechanism designed at the output side can capture building change characteristics at different scales and improve the detection effectiveness when bands of multiple resolutions are input.
Table 17 shows the experimental results of the NestNet network under the four multispectral image data combinations; NestNet achieves the best results with only three bands, with the F1-score and IOU reaching 49.73% and 35.84%, respectively. When more data bands are added, the network becomes less effective, indicating that it is not capable of handling more data bands.
Table 18 shows the experimental results of the MSF-Net network under the four multispectral image data combinations; MSF-Net works best with four bands, with the F1-score and IOU reaching 51.65% and 38.19%, respectively. Its detection becomes less effective once band data with original resolutions of 20 m/pixel and 60 m/pixel are included, indicating that the MSF-Net algorithm can exploit additional spectral features at higher resolutions but is less capable of processing spectral information at lower resolutions.
The performance of the six deep neural networks on the single-source multispectral remote sensing image building change detection dataset shows that Siam-conc, Siam-conc-diff, and NestNet achieve optimal results when the number of input bands is three, while Siam-diff, SNUNet-CD, and MSF-Net achieve optimal results when the number of input bands is 10, 12, and 4, respectively. This reflects the different learning abilities of different networks for different data bands. Among the six algorithms, MSF-Net has the best detection effect when using four-band multispectral image data as the input; it improves the F1-score by 1.57% and the IOU by 2.00% compared with SNUNet-CD, which has the best performance among the remaining algorithms when using 12 bands as the data input.
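The four scores reported in Tables 13-18 are commonly computed pixel-wise for the "changed" class. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `change_metrics` and the flattened 0/1 mask representation are assumptions made for illustration, and this section does not state how the scores are aggregated over images.

```python
def change_metrics(pred, truth):
    """Pixel-wise precision, recall, F1-score, and IoU for the 'changed' class.

    `pred` and `truth` are equal-length sequences of 0/1 labels
    (a flattened change mask); this layout is an assumption of the sketch.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # changed and detected
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # false alarm
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)  # missed change
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


# Toy masks: 2 true positives, 1 false positive, 1 false negative.
scores = change_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
print(scores)
```

Each score defaults to zero when its denominator is empty, which keeps the sketch usable on image pairs that contain no changed pixels.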

Multisource Spectral and Texture Feature Building Change Detection
Using six network models based on the MS-HS BCD dataset, experiments on building change detection with multisource spectral and texture features are conducted based on three different band combinations of high-resolution and multispectral images. We compared the experimental results of the three multisource data combinations with the single-source high-resolution results and the single-source multispectral results. It is important to note that, among the results for single-source multispectral data, the Siam-conc, Siam-conc-diff, Siam-diff, SNUNet-CD, NestNet, and MSF-Net networks achieve the best results at three, three, ten, twelve, three, and four bands, respectively, so only these data combinations are used for the comparison with multisource data. The losses of the six network models on the three multisource data combinations are shown in Figure 10; the losses of the six network models remained flat after epoch 50, and the multisource data combination method had a greater impact on MSF-Net, whose loss is significantly reduced for the four-band combination. The quantitative comparison results of each network model are shown in Tables 19-24, and the results of building change detection for the different data combinations are shown in Figures 11-16.
Table 19 shows the experimental results of the Siam-conc network under multisource spectral and texture data combinations. The Siam-conc algorithm achieves the best detection results when the four-band multisource data are combined; the precision reaches 52.53%, the F1-score reaches 48.97%, and the IOU reaches 35.54%, which are 4.1%, 3.67%, and 4.05% higher than those of the high-resolution images and 3.77%, 1.18%, and 1.56% higher than those of the multispectral images, respectively. Recall is optimal for the ten-band combination of multisource data. These results indicate that the detection effectiveness of the Siam-conc algorithm can be improved by adding multisource spectral information with higher resolution. As more multisource spectral information is added, the detection effect decreases for the ten-band and twelve-band combinations, indicating that adding too much low-resolution multisource data has a negative effect on this network model and reduces its detection effectiveness. In Figure 11, when the four-band multisource data were combined, Siam-conc detected changed buildings more completely and clearly.
Table 20 shows the experimental results of the Siam-conc-diff network under multisource spectral and texture data combinations. The Siam-conc-diff algorithm achieves the best detection at the ten-band combination, with recall reaching 62.22%, F1-score reaching 55.02%, and IOU reaching 40.09%, improving by 8.97%, 7.96%, and 6.1%, respectively, compared with the high-resolution images, and by 9.86%, 7.13%, and 5.63%, respectively, compared with the multispectral images. Precision was highest, at 55.37%, for the four-band combination. This indicates that the algorithm can better utilize more spectral information to improve detection precision after merging the dual-temporal image features and their differential features. The detection results for twelve-band data decreased significantly compared with those for ten-band data, indicating that adding low-resolution band data has a greater negative impact on the Siam-conc-diff network. Figure 12 shows that when ten multisource spectral bands are combined, Siam-conc-diff can detect changed buildings more precisely.
Table 21 shows the experimental results of the Siam-diff network under multisource spectral and texture data combinations. The Siam-diff algorithm has the best detection effect in the four-band combination; although the recall decreases, the precision reaches 62.76%, the F1-score reaches 52.64%, and the IOU reaches 38.28%. The F1-score improves by 4.62% and 3.13% compared with the high-resolution images and the multispectral images, respectively, and the IOU improves by 3.21% and 3.13%, respectively. The F1-score and IOU of the model continue to decrease as more multisource spectral information is added, indicating that the Siam-diff network can use higher-resolution multisource image data to improve the detection precision but cannot handle more low-resolution spectral data. Figure 13 shows that when four multisource spectral bands are combined, Siam-diff can detect clearer building boundaries.
Table 22 shows the experimental results of the SNUNet-CD network under multisource spectral and texture data combinations. Judged by the two most important metrics, F1-score and IOU, the SNUNet-CD algorithm works best in the four-band combination of multisource data, with the F1-score and IOU reaching 50.43% and 36.41%, respectively. However, the network is weak in extracting information from multisource image data: although the F1-score and IOU improve by 5.75% and 4.58%, respectively, compared with high-resolution images, they improve by only 0.35% and 0.22% compared with single-source multispectral images. These results indicate that the SNUNet-CD network achieves better results in extracting multispectral image features. Adding more multisource data does not significantly improve its effect, and the effect of the network becomes worse when more low-resolution multisource spectral features are added. In Figure 14, when four multisource spectral bands are combined, SNUNet-CD can detect changed buildings more completely.
Table 23 shows the experimental results of the NestNet network under multisource spectral and texture data combinations. The F1-score and IOU, the two most important metrics, achieved the best results for NestNet when the twelve-band multisource image data were combined, reaching 50.92% and 36.78%, respectively. Precision was highest, at 53.8%, for the four-band combination. Recall was highest, at 61.23%, for the ten-band combination. The four evaluation indicators showed a significant increase compared with single-source images. Thanks to the improved UNet++ dense skip connection module, the NestNet algorithm can use more multisource spectral data to effectively improve the building change detection effect compared with single-source remote sensing image data. In Figure 15, it can be seen that NestNet works best when twelve multisource spectral bands are combined and that the boundaries of the changed buildings are clearer.
Table 24 shows the experimental results of the MSF-Net network under multisource spectral and texture data combinations. MSF-Net works best with the combination of four-band multisource data. Although recall decreases by 0.61% compared with single-source high-resolution images, the precision, F1-score, and IOU reach 61.63%, 59.22%, and 44.4%, respectively. This indicates that adding multisource spectral information with higher-resolution data can make MSF-Net learn more building features and improve the detection effectiveness; the model detection becomes less effective as more multisource spectral information is added, indicating that the addition of too much low-resolution data affects the algorithm performance. Figure 16 shows the building change detection results under different band combinations. When the four-band multisource data were combined, the changed buildings were detected more completely and with clearer boundaries. This indicates that MSF-Net can use more spectral features and texture information to improve the effect of building change detection.
To show more clearly how each model's evaluation metrics vary with the data source, we organized the results of Tables 19-24 into a bar chart, as shown in Figure 17. It is important to note that the multispectral combinations for Siam-conc, Siam-conc-diff, Siam-diff, SNUNet-CD, NestNet, and MSF-Net are the three-, three-, ten-, twelve-, three-, and four-band combinations, respectively. The data combinations for the high-resolution, multisource four-band, multisource ten-band, and multisource twelve-band data are shown in Tables 3 and 8-10.
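The improvements quoted in this section are differences in percentage points between the best multisource score and the best single-source multispectral score of each network. The short sketch below recomputes them from the F1-score and IOU values given in the text; the dictionary and function names are hypothetical, and the MSF-Net multisource values are read as attained scores, consistent with the gains quoted in the abstract.

```python
# Best (F1-score, IOU) in % per network, transcribed from the text of this paper.
MULTISPECTRAL_BEST = {
    "Siam-conc": (47.79, 33.98),
    "Siam-conc-diff": (47.89, 34.46),
    "Siam-diff": (49.51, 35.15),
    "SNUNet-CD": (50.08, 36.19),
    "NestNet": (49.73, 35.84),
    "MSF-Net": (51.65, 38.19),
}
MULTISOURCE_BEST = {
    "Siam-conc": (48.97, 35.54),
    "Siam-conc-diff": (55.02, 40.09),
    "Siam-diff": (52.64, 38.28),
    "SNUNet-CD": (50.43, 36.41),
    "NestNet": (50.92, 36.78),
    "MSF-Net": (59.22, 44.40),  # read as attained scores (assumption, see lead-in)
}


def gains_in_points(after, before):
    """Return {network: (F1 gain, IOU gain)} in percentage points."""
    return {
        name: tuple(round(a - b, 2) for a, b in zip(after[name], before[name]))
        for name in after
    }


gains = gains_in_points(MULTISOURCE_BEST, MULTISPECTRAL_BEST)
for name, (f1_gain, iou_gain) in gains.items():
    print(f"{name}: F1 +{f1_gain:.2f}, IOU +{iou_gain:.2f}")
```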

Conclusions
In this paper, a multisource remote sensing image building change detection dataset (MS-HS BCD dataset) is produced based on two kinds of images, GF-1 and Sentinel-2B, and three kinds of data combinations are defined: single-source high-resolution data, single-source multispectral data with four band-combination schemes, and multisource data with three band-combination schemes. Based on these data combinations, six state-of-the-art change detection neural network models are selected for building change detection experiments with multisource spectral and texture feature data. The experimental results show the following: (1) When multisource spectral and texture feature data are input into the neural network, all six network models achieve different degrees of improvement. SNUNet-CD has the smallest improvement, with the F1-score and IOU improving by less than 1% compared with single-source multispectral images. The improvement of Siam-conc-diff is the largest, with the F1-score and IOU improving by 7.13% and 5.63%, respectively, compared with single-source multispectral images, indicating that different network models have different learning abilities for multisource data. (2) Siam-conc, Siam-diff, SNUNet-CD, and MSF-Net achieve the best results when the four-band data are combined, and Siam-conc-diff and NestNet achieve the best results when the ten-band and twelve-band data are combined, respectively. The detection effect of some models decreases after lower-resolution band data are added, indicating that the excessive addition of multisource spectral feature data with lower resolutions reduces the effectiveness of the model. Combining the above experimental results, it can be concluded that the detection effect of the model is improved compared with the results based on single-source data when multiple sources are fed into the model simultaneously, indicating that the limited spectral features of single-source high-resolution images and the coarse texture features of single-source multispectral images limit the learning of changed-building features by the network model. By combining them, the network model can exploit both the richer spectral features and the finer texture features of these data sources to enhance its learning of building features, so that the model can detect changed buildings more completely and improve the detection effect.
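One simple way to feed both sources to a network at once is to resample the coarser bands onto the finer grid and concatenate all bands along the channel axis. The sketch below illustrates that idea only; it is not the preprocessing used to build the MS-HS BCD dataset. The function names, the nearest-neighbour resampling, and the integer scale factor are assumptions, and plain nested lists are used to keep the example dependency-free.

```python
def upsample_nearest(band, factor):
    """Repeat every pixel `factor` times along both axes (nearest neighbour)."""
    out = []
    for row in band:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def stack_bands(high_res_bands, low_res_bands, factor):
    """Return a channel-first list of bands on the high-resolution grid.

    `high_res_bands` and `low_res_bands` are lists of 2-D bands (lists of rows);
    `factor` is the assumed integer ratio between the two pixel sizes.
    """
    height = len(high_res_bands[0])
    width = len(high_res_bands[0][0])
    stacked = [[list(row) for row in band] for band in high_res_bands]
    for band in low_res_bands:
        resampled = upsample_nearest(band, factor)
        if len(resampled) != height or len(resampled[0]) != width:
            raise ValueError("resampled band does not match the high-resolution grid")
        stacked.append(resampled)
    return stacked


# Toy example: one 2x2 high-resolution band plus one 1x1 coarse band.
multisource = stack_bands([[[1, 2], [3, 4]]], [[[9]]], factor=2)
print(len(multisource), multisource[1])
```

In practice an array library and a resampling method suited to the sensors would replace the nested lists, and the two images would first need to be co-registered.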
In summary, this paper first proposes an open-source multisource building change detection dataset, which provides a database for multisource building change detection research and makes up for the lack of such datasets. Next, the effect of multisource data on building change detection is investigated based on this dataset. The experimental results show that the detection effect can be significantly improved when multisource spectral and texture features are input to the model simultaneously, and the research in this paper also provides a reference for continued exploration in this field.
Although we obtained some research results in building change detection with multisource remote sensing data, there are still some limitations in the current research. The multisource building change detection dataset (MS-HS BCD dataset) produced in this paper is a small dataset with only 600 image sets, which may affect the detection effectiveness of the models to some extent. The preprocessing of the dataset can also be further improved; for example, our future research will explore the influence of resampling to a higher resolution (2 m) on the experimental results and more accurate geographic registration methods for the two kinds of images.
Our future work will also introduce radar, SHP vector data, etc., to explore the impact of more types of data sources on building change detection.

Figure 2. Flowchart of the method used in this study.

Figure 3. The network structure of MSF-Net. (a) basic structure of encoding module, (b) dual-context fusion module, (c) context module, (d) basic structure of decoding module, (e) channel attention mechanism, (f) selective kernel convolution.

The network structures of Siam-conc, Siam-diff, and Siam-conc-diff are shown in Figure 4. These three network structures are a combination of a Siamese structure and Unet++. This combination is widely used in the field of change detection, where the Unet++ network model improves the multiscale detection capability and the Siamese structure enables the model to simultaneously learn deeper building features in the dual-temporal images, effectively improving the building change detection accuracy.

6. NestNet: The network structure of NestNet is shown in Figure 6. NestNet improves the dense skip connection module based on Siamese Unet++ and uses the absolute difference operation to process remote sensing images and learn building change features at multiple scales. Its effective network structure gives it an excellent detection effect, and its open-source availability and advanced design are further reasons why we selected it as a research model.

Figure 7. Training loss of single-source high-resolution images.

Figure 8. Building change detection results of single-source high-resolution images.

Figure 9. Training loss of single-source multispectral remote sensing images.

Figure 11. The multisource spectral and texture feature detection results of Siam-conc.

Figure 12. The multisource spectral and texture feature detection results of Siam-conc-diff.

Figure 13. The multisource spectral and texture feature detection results of Siam-diff.

Figure 14. The multisource spectral and texture feature detection results of SNUNet-CD.

Figure 15. The multisource spectral and texture feature detection results of NestNet.

Figure 16. The multisource spectral and texture feature detection results of MSF-Net.
It can be concluded that building change detection based on multisource spectral and texture feature data can effectively improve the detection effect of the algorithm model. Except for the decrease in recall of the Siam-diff, SNUNet-CD, and MSF-Net algorithms, all of the other indices increase to different degrees.

Figure 17. The evaluation index of each model changes with the data source.

Table 4. The first combination of the single-source Sentinel-2B multispectral images.

Table 5. The second combination of the single-source Sentinel-2B multispectral images.

Table 6. The third combination of the single-source Sentinel-2B multispectral images.

Table 7. The fourth combination of the single-source Sentinel-2B multispectral images.

Table 11. The specific experiment environment.

Table 12. Building change detection results of single-source high-resolution images.

Table 13. Siam-conc building change detection results of multispectral images.

Table 14. Siam-conc-diff building change detection results of multispectral images.

Table 15. Siam-diff building change detection results of multispectral images.

Table 16. SNUNet-CD building change detection results of multispectral images.

Table 17. NestNet building change detection results of multispectral images.

Table 18. MSF-Net building change detection results of multispectral images.

Table 20. Siam-conc-diff building change detection results of multisource spectral and texture features.

Table 21. Siam-diff building change detection results of multisource spectral and texture features.

Table 22. SNUNet-CD building change detection results of multisource spectral and texture features.

Table 23. NestNet building change detection results of multisource spectral and texture features.

Table 24. MSF-Net building change detection results of multisource spectral and texture features.