Article

Use of GAN to Help Networks to Detect Urban Change Accurately

Chenyang He, Yindi Zhao, Jihong Dong and Yang Xiang
1 Key Laboratory of Degraded and Unused Land Consolidation Engineering, Ministry of Natural Resources, Xi’an 710075, China
2 School of Environment and Spatial Informatics, China University of Mining and Technology, Xuzhou 221116, China
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(21), 5448; https://doi.org/10.3390/rs14215448
Submission received: 31 August 2022 / Revised: 4 October 2022 / Accepted: 28 October 2022 / Published: 29 October 2022
(This article belongs to the Section Remote Sensing Image Processing)

Abstract

Mastering urban change information is of great importance in practical areas such as urban development planning, land management, and vegetation cover monitoring. At present, high-resolution remote sensing images and deep learning techniques are widely used to detect urban change. However, most existing change detection networks are Siamese networks based on encoder–decoder architectures, which tend to ignore pixel-to-pixel relationships and thus degrade the change detection results. To address this problem, we introduced a generative adversarial network (GAN). A change detection network based on an encoder–decoder architecture was used as the generator of the GAN, and the Jensen–Shannon (JS) divergence in the original GAN model was replaced by the Wasserstein distance. An urban-scene change detection dataset named XI’AN-CDD was produced to verify the effectiveness of the algorithm. Compared with the baseline change detection models, our generator performed significantly better and produced results with higher feature integrity. When the GAN was added, the integrity of the detected objects improved further, and the F1-score increased by 4.4%.

1. Introduction

With the rapid development of the economy and society, urbanization is advancing quickly, and Land-Use and Land-Cover Change (LUCC) information is changing rapidly. Timely and accurate information on changes in the Earth’s surface is therefore essential, with important applications in urban development planning [1,2], land management [3], vegetation cover monitoring [4], and other practical uses. In recent years, high-resolution remote sensing imagery has become more accessible and is increasingly used to capture land cover change information. Because high-resolution remote sensing images provide richer and clearer spatial feature information, their application in urban change detection has gradually increased [5,6].
Scholars have proposed a variety of remote sensing image change detection methods. Most early change detection methods obtained change results by performing algebraic operations on remote sensing images, such as Change Vector Analysis (CVA) [7]. More recent methods extract hand-crafted features such as spectrum, texture, shape, and color from the pre- and post-change remote sensing images to obtain change intensity maps; thresholds are then set, or sample points are manually selected, to obtain the change detection results. Such methods include Random Forest (RF) [8] and the Support Vector Machine (SVM) [9].
Traditional change detection methods are complicated to operate and have poor robustness. In recent years, deep learning technology has performed well in computer vision fields such as image recognition and semantic segmentation, and some scholars have applied it to change detection tasks. Post-classification change detection is one of the commonly used approaches: a semantic segmentation network segments the bi-temporal images separately, and the difference between the segmentation results is taken as the change area. Semantic segmentation networks such as FCN [10] and U-Net [11] are widely used for this purpose. Siamese neural networks have since become the standard method for change detection [12], and scholars have used various strategies to improve their change detection capability. In [13], pyramid pooling is introduced on top of fully convolutional neural networks. In [14], an encoder–decoder architecture based on UNet++ is proposed. In [15], an SLN model that requires no image pre-processing is proposed. Although CNNs can detect non-linear changes in high-resolution images, they usually predict the class of each pixel independently; pixel-level accuracy may therefore be high, but pixel-to-pixel correlations are easily ignored, which degrades the integrity of the detection results.
The generative adversarial network (GAN) [16] and its variants have been widely used in computer vision in recent years, for example in image style transfer [17] and image attribute editing [18]. A GAN consists of two parts: a Generator (G) and a Discriminator (D). G aims to generate images that D cannot distinguish from real ones, while D aims to classify the generated images as fake. G and D are trained simultaneously in a min–max game against each other, so that the images produced by the generator gradually approach the real images. When the discriminator judges a generated image to be real, the data distribution learned by the generator approximates the training data distribution.
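For reference, the standard adversarial objective of Goodfellow et al. [16] can be written as the following min–max problem (this is the generic formulation; Section 3.3 describes how this work replaces the Jensen–Shannon divergence implied by it with the Wasserstein distance):

```latex
\min_{G}\max_{D} V(D,G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
```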
The wide application of GANs in computer vision has inspired scholars to use them in remote sensing image processing. In 2016, Luc et al. used a GAN for image semantic segmentation, pioneering the application of GANs to semantic segmentation [19]. Since most available methods for road extraction from high-resolution remote sensing images cannot automatically extract roads with smooth and accurate boundaries, Shi et al. [20] proposed a GAN-based road extraction method. Kim et al. [21] used the conditional generative adversarial network (CGAN) to convert remote sensing images into simple road map images and then extracted the roads. Niu et al. [22] addressed change detection in heterogeneous remote sensing images by converting the heterogeneous images into homogeneous images with a conditional GAN and performing change detection on the converted results, with good results. We applied a GAN to a Siamese change detection network and propose a GAN-enhanced change detection network. The main contributions of this paper include the following:
(1) We produced a change detection dataset for urban scenes based on high-resolution images of Xi’an, named XI’AN-CDD. It mainly included the changes in three types of land objects, namely, roads, cultivated land, and buildings;
(2) We built a Siamese-structured change detection network based on the center-surround, ASPP, and attention fusion modules, named UNET-CD. We then built a GAN-based change detection network named UNET-GAN-CD using UNET-CD as the generator. The performance of these networks was verified on the XI’AN-CDD and CDD datasets, respectively.

2. Dataset

2.1. Xi’an Change Detection Dataset

In this paper, change detection experiments were conducted using high-resolution remote sensing image data. The bi-temporal remote sensing images were GF-1 images acquired on 2 May 2015 and 2 May 2017. The panchromatic band of the GF-1 data had a spatial resolution of 2 m, and the multispectral data contained four bands (red, green, blue, and NIR) with a spatial resolution of 8 m. The image size of the experimental data was 17,165 × 17,535 pixels. The R-G-B true color images are shown in Figure 1a,b.
We visually interpreted the remote sensing images of Xi’an in 2015 and 2017 to obtain the changed label as shown in Figure 2, where the white color indicates the changed areas and the black color indicates the unchanged areas.
The bi-temporal remote sensing images and the corresponding changed labels were cropped. The cropped image blocks, with a size of 3400 × 3500 pixels, are shown in Figure 3. The image blocks numbered 8, 12, 14, 15, and 17 were selected as the testing images. The remaining image blocks were cropped into 224 × 224 pixel patches with a cropping overlap of 150 pixels.
To increase the number of training samples, improve sample feature diversity, and enhance the robustness and generalization ability of the change detection network, data augmentation was performed on the training set images together with the corresponding changed labels. Data augmentation mainly included random rotation by 90°, 135°, and 270°, as well as horizontal and vertical flips. During the augmentation of the training image pairs, the number of changed pixels in each changed label was counted, and if it was less than 200 pixels, the pair was not augmented. The background of the images in the training dataset was not involved in the training process of the change detection network model.
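As a minimal sketch of the augmentation rule described above (the function and variable names are illustrative, not the authors' code; the 135° rotation is omitted here because it requires interpolation):

```python
import numpy as np

def augment_pair(img_t1, img_t2, label, min_changed_pixels=200):
    """Rotate/flip a bi-temporal image pair and its change label.

    Pairs whose label contains fewer than `min_changed_pixels` changed
    pixels are returned without augmentation, as described above.
    """
    samples = [(img_t1, img_t2, label)]
    if np.count_nonzero(label) < min_changed_pixels:
        return samples

    # Rotations by multiples of 90 degrees (90 and 270 shown here).
    for k in (1, 3):
        samples.append((np.rot90(img_t1, k), np.rot90(img_t2, k), np.rot90(label, k)))

    # Horizontal and vertical flips.
    samples.append((np.fliplr(img_t1), np.fliplr(img_t2), np.fliplr(label)))
    samples.append((np.flipud(img_t1), np.flipud(img_t2), np.flipud(label)))
    return samples
```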
The dataset was named XI’AN-CDD (XI’AN change detection dataset). The change types of XI’AN-CDD were mainly divided into three categories: road change, cultivated land change, and building change, as shown in Figure 4.

2.2. Change Detection Dataset

The CDD is one of the most common evaluation datasets in the field of change detection. It contains 11 pairs of RGB images taken in different seasons, obtained from Google Earth, with spatial resolutions ranging from 3 cm/px to 100 cm/px. A total of 16,000 image pairs of size 256 × 256 pixels were generated from the original images using cropping and rotation operations. The training set contains 10,000 image pairs, and the validation and test sets contain 3000 image pairs each.

3. Method

The structural composition of the constructed GAN-based remote sensing image change detection network is shown in Figure 5. It mainly consists of two convolutional neural networks: the generator and the discriminator. This model is named UNET-GAN-CD.
In this framework, the generator continuously generates new data distributions based on the input data, while the discriminator determines whether the data produced by the generator are real or fake. The two networks confront each other during the training of the GAN until the generator and discriminator together reach an optimal state. This framework corresponds to a min–max game between the two networks.

3.1. Generator

The generator is an encoder–decoder Siamese network named UNET-CD, as shown in Figure 6. In order to solve the problems of image information loss and image resolution reduction caused by the pooling in the Siamese network structure, a 3 × 3 convolutional layer with a stride of two is used to replace the pooling layer.
The encoder uses the Siamese structure with shared weights and is designed as a center-surround module, as shown in Figure 6. In order to obtain the multi-scale features of the image, the central part of the input image is cropped and resampled to the original image size. The bi-temporal images and their corresponding cropped images are input into the network to extract the features. The feature maps obtained from the original images are called the surround features, and the feature maps obtained from the cropped images are called the center features. The surround features and the central features extracted by the encoder are fused in the band dimension, as shown in Figure 6, where m represents the feature fusion; the expression is:
Feature = merge(Ssub_i, Csub_i), i = 1, 2, 3, 4, 5,
where Feature represents the fused feature maps; Ssub_i and Csub_i denote the differential feature maps of the surround features and the center features, respectively; and merge denotes the fusion operation.
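A minimal TensorFlow sketch of this fusion at one encoder scale is given below; the use of the absolute difference for the differential feature maps and the channel-wise concatenation for merge are assumptions, since the exact operators are not spelled out above:

```python
import tensorflow as tf

def fuse_center_surround(s_t1, s_t2, c_t1, c_t2):
    """Fuse surround and center features at one encoder scale.

    s_t1/s_t2 are the surround features of the two dates; c_t1/c_t2 are
    the corresponding center features. The differential feature maps are
    concatenated along the channel (band) dimension, mirroring
    Feature = merge(Ssub_i, Csub_i).
    """
    s_diff = tf.abs(s_t1 - s_t2)  # surround differential features (Ssub_i)
    c_diff = tf.abs(c_t1 - c_t2)  # center differential features (Csub_i)
    return tf.concat([s_diff, c_diff], axis=-1)
```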
In the up-sampling process, deep features at the decoder and shallow features at the encoder are fused through the attention module. The last layer of the change detection network uses the Softmax activation function to normalize the output neuron values, and the change and unchanged values are mapped between 0 and 1 using the sigmoid function.

3.1.1. Generator Architecture

The specific network architecture of the generator is shown in Table 1. I, O, and K represent the number of input channels, the number of output channels, and the kernel size, respectively. ReLU is the activation function, and P denotes the pooling operation.

3.1.2. ASPP Module

As shown in Figure 7, the ASPP module uses a dilated convolution with expansion rates of 6, 12, and 18 to increase the perceptual field of the model and capture the multi-scale features of the image. To obtain more global contextual information on the image, the ASPP module also acquires the image-level features through GAP. Finally, the acquired multi-scale image features are up-sampled to the appropriate image size using a bilinear interpolation method, and feature fusion is performed.
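The following Keras-style sketch shows one way such an ASPP block can be assembled with dilation rates of 6, 12, and 18, a GAP branch, and bilinear up-sampling; the channel counts and layer arrangement are illustrative rather than the exact configuration of UNET-CD:

```python
import tensorflow as tf
from tensorflow.keras import layers

def aspp_block(x, filters=128, rates=(6, 12, 18)):
    """ASPP: parallel dilated convolutions plus an image-level (GAP) branch."""
    size = tf.shape(x)[1:3]

    # 1x1 branch plus three dilated 3x3 branches.
    branches = [layers.Conv2D(filters, 1, padding="same", activation="relu")(x)]
    for rate in rates:
        branches.append(layers.Conv2D(filters, 3, padding="same",
                                      dilation_rate=rate, activation="relu")(x))

    # Image-level features: global average pooling, 1x1 conv, bilinear resize.
    gap = layers.GlobalAveragePooling2D(keepdims=True)(x)
    gap = layers.Conv2D(filters, 1, activation="relu")(gap)
    gap = tf.image.resize(gap, size, method="bilinear")

    merged = layers.Concatenate()(branches + [gap])
    return layers.Conv2D(filters, 1, padding="same", activation="relu")(merged)
```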

3.1.3. Center-Surround Architecture

In Figure 8, assuming that the original image has a size of 224 × 224 pixels, the cropped center sub-image has the size of 112 × 112 pixels and is up-sampled to 224 × 224 pixels. Then, the original image is input to the surround module, and the up-sampled center sub-image is input to the central module. The multi-resolution information of the image is generated via the above process.
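A short sketch of this cropping and resampling step (array indexing shown for clarity; the crop fraction of 1/2 follows the 224 → 112 example above):

```python
import tensorflow as tf

def center_input(image, crop_frac=0.5):
    """Crop the central region of `image` and resample it to the original size."""
    h, w = image.shape[0], image.shape[1]
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    top, left = (h - ch) // 2, (w - cw) // 2
    center = image[top:top + ch, left:left + cw]
    # e.g. 112 x 112 -> 224 x 224 with bilinear interpolation
    return tf.image.resize(center, (h, w), method="bilinear")
```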

3.1.4. Attention Fusion Module

In Figure 9, Flow denotes the shallow features of the network at the encoder; Fhigh denotes the deep features at the decoder; and F denotes the fused feature image.
Cross-channel interaction and information integration of the feature values are first achieved by applying 1 × 1 convolution kernels to the shallow and deep features separately; the integrated deep and shallow feature maps are then summed pixel by pixel to obtain per-pixel weights. The weight values are mapped to the range 0–1 using the sigmoid activation function. Finally, the resulting weight map is multiplied pixel by pixel with the integrated deep features, adjusting all values of the deep feature map to obtain the final result.
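A minimal Keras sketch of this attention fusion is given below; the number of filters and the choice to take the re-weighted deep features as the output F are assumptions where the description leaves details open:

```python
from tensorflow.keras import layers

def attention_fusion(f_low, f_high, filters=64):
    """Fuse shallow encoder features (f_low) with deep decoder features (f_high)."""
    low = layers.Conv2D(filters, 1, padding="same")(f_low)    # cross-channel integration, shallow
    high = layers.Conv2D(filters, 1, padding="same")(f_high)  # cross-channel integration, deep

    weights = layers.Add()([low, high])              # pixel-by-pixel sum of the two streams
    weights = layers.Activation("sigmoid")(weights)  # per-pixel weights in [0, 1]

    # Re-weight the integrated deep features to obtain the fused output F.
    return layers.Multiply()([high, weights])
```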

3.2. Discriminator

To reduce the number of parameters in the network and improve the model training and prediction time, the discriminator in this paper uses the fully convolutional network FCN16s, and the last layer of the discriminator uses the Softmax activation function, while the rest of the layers use the Leaky ReLU activation function.
As shown in Figure 10, after the result of the generator is fused with the input image, it is input to the discriminator network for judgment, and the output result of the discriminator is only restored to 1/4 of the size of the original input image. Therefore, the input discriminator image size is 224 × 224, and the output image size of the discriminator is 56 × 56.
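The eight input channels of the discriminator's first layer (Table 2) are consistent with concatenating the two RGB images with the two-channel change map produced by the generator; the following sketch illustrates this assumed fusion:

```python
import tensorflow as tf

def discriminator_input(img_t1, img_t2, change_map):
    """Concatenate the bi-temporal RGB images with the 2-channel change map.

    3 + 3 + 2 = 8 channels, matching the first discriminator layer in Table 2.
    The exact fusion used by the authors is assumed here.
    """
    return tf.concat([img_t1, img_t2, change_map], axis=-1)
```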

Discriminator Architecture

The network architecture of the discriminator is shown in Table 2. I, O, and K represent the number of input channels, the number of output channels, and the kernel size, respectively. LeakyReLU is the activation function, and BN denotes the Batch Normalization.

3.3. Loss Function

The loss function of UNET-GAN-CD consists of two parts: the Wasserstein distance and the categorical cross-entropy. The Wasserstein distance was proposed in WGAN. We use it instead of the Jensen–Shannon (JS) divergence in the original GAN model, and it is defined by the following formula:
W(P_r, P_g) = inf_{γ∈Π(P_r, P_g)} E_{(x,y)~γ}[‖x − y‖],
where Π(P_r, P_g) denotes the set of all joint distributions whose marginals are the real distribution P_r and the generated distribution P_g. For each possible joint distribution γ, one can sample (x, y) ~ γ to obtain a real sample x and a generated sample y and compute the expectation E_{(x,y)~γ}[‖x − y‖] of the distance between the sample pair under γ; the Wasserstein distance is the infimum of this expectation over all joint distributions.
The results obtained from the generator and the real changed labels are used for calculating the category cross-entropy and for the binary classification of change detection; when the change category label is 1, the category cross-entropy is calculated as:
CE(x) = −y_i log(f(x)),
where y_i is the label category information (the changed label pixel value is 1, and the unchanged pixel value is 0), and f(x) is the change probability of the image element obtained by the generator.
The loss function of the generator can be expressed as:
LG(x) = CE(x) + W(x),
The loss function of the discriminator can be expressed as:
LD(x) = W(x),
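A WGAN-style implementation of these two losses could look as follows; the sign conventions are the usual critic formulation, and a weight constraint (clipping or a gradient penalty) on the discriminator, which WGAN training normally requires, is omitted here for brevity:

```python
import tensorflow as tf

def discriminator_loss(real_scores, fake_scores):
    """WGAN-style critic loss LD: maximize E[D(real)] - E[D(fake)],
    i.e. minimize the negative of the Wasserstein estimate."""
    return tf.reduce_mean(fake_scores) - tf.reduce_mean(real_scores)

def generator_loss(fake_scores, y_true, y_pred, eps=1e-7):
    """Generator loss LG = CE + W: categorical cross-entropy on the change map
    plus the adversarial (Wasserstein) term."""
    ce = -tf.reduce_mean(y_true * tf.math.log(y_pred + eps))  # CE(x) = -y_i log(f(x))
    adv = -tf.reduce_mean(fake_scores)  # push the critic to score generated maps as real
    return ce + adv
```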

4. Experiments and Result Analysis

4.1. Evaluation Indicators

As shown in Table 3, we used a confusion matrix to calculate the evaluation indicators. TP, TN, FP, and FN are the true positive, true negative, false positive, and false negative cases, respectively. The overall accuracy (OA), Kappa coefficient, F1-score, and other accuracy indicators of change detection were calculated from the confusion matrix.
The entries of the confusion matrix were obtained from the prediction statistics, and the accuracy indicators of the change detection results were calculated as shown below.
(1) The overall accuracy (OA) is the ratio between the number of pixels correctly predicted by the model on all test sets and the overall number, expressed by the formula:
OA = (TP + TN)/(TP + FP + TN + FN),
(2) The Kappa coefficient is used for consistency tests and can also be used to measure classification accuracy. The Kappa coefficient is calculated based on confusion matrices. The Kappa coefficient can be expressed by the formula:
Kappa = (po − pe)/(1 − pe),
where p_o is equivalent to the OA, and p_e is the expected chance agreement computed from the row totals (T1, T2) and column totals (C1, C2) of the confusion matrix:
p_e = (T1 × C1 + T2 × C2)/(TP + FP + TN + FN)²,
(3) The F1-score is a comprehensive evaluation index for evaluating precision and recall and is the harmonic mean of precision and recall, expressed by the formula:
F1 = 2 × (P × R)/(P + R),
P and R represent precision and recall, and their formulas are expressed as:
Precision = TP/(TP + FP),
Recall = TP/(TP + FN),
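A small sketch for computing these indicators from a binary prediction map and a ground-truth map (written against the confusion matrix of Table 3) might be:

```python
import numpy as np

def change_detection_metrics(pred, gt):
    """Compute OA, Kappa, and F1 from binary prediction and ground-truth maps."""
    pred, gt = pred.astype(bool).ravel(), gt.astype(bool).ravel()
    tp = np.sum(pred & gt)
    tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    total = tp + tn + fp + fn

    oa = (tp + tn) / total
    # Chance agreement p_e from row totals (T1 = TP+FP, T2 = FN+TN)
    # and column totals (C1 = TP+FN, C2 = FP+TN).
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (oa - pe) / (1 - pe)

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return oa, kappa, f1
```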

4.2. Experiment Design

We implemented the network using the TensorFlow framework. During training, the batch size was set to 4, and the RMSprop function was applied as the optimizer. The learning rate was set to 0.001. The weights of the convolutional layers were initialized using Kaiming (He) normalization. We conducted the experiments on a single NVIDIA RTX 3060 Ti GPU (8 GB) and trained for 300 epochs to make the model converge. When the validation loss no longer decreased for five epochs, the learning rate of the change detection network was decayed by a factor of 0.1. When the validation accuracy had not improved for 10 epochs, model training was terminated using the Early Stopping strategy.
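As an illustration of these settings with Keras callbacks (the model and dataset objects are placeholders, not the authors' released code), the learning-rate decay and early stopping could be configured as:

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.RMSprop(learning_rate=1e-3)

callbacks = [
    # Decay the learning rate by a factor of 0.1 when the validation loss
    # stops decreasing for 5 epochs.
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=5),
    # Stop training when the validation accuracy has not improved for 10 epochs.
    tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=10,
                                     restore_best_weights=True),
]

# model.compile(optimizer=optimizer, loss=..., metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=300, callbacks=callbacks)
```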

4.3. Analysis and Comparison

As shown in Figure 11, we used a sliding window to detect changes in the test images, and only the center region of each window was taken as the final prediction result. Assuming that the size of the sliding window was W, we only kept the middle area of size W/2 as the prediction result. The sliding window reduced the black-edge artifacts caused by image mosaicking, and since only the central region of each window was retained, edge effects in the predicted result image were also reduced.
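The prediction strategy can be sketched as follows (the model interface and the handling of the outermost image border are simplifications; only the central W/2 region of each window is written to the output, as described above):

```python
import numpy as np

def sliding_window_predict(model, img_t1, img_t2, window=224):
    """Predict a large scene with a sliding window, keeping only the central
    window/2 region of each prediction to suppress mosaic edge artifacts."""
    h, w = img_t1.shape[:2]
    step = window // 2          # windows overlap so that the kept centers tile the scene
    margin = window // 4        # border discarded on each side of a window
    out = np.zeros((h, w), dtype=np.uint8)

    for top in range(0, h - window + 1, step):
        for left in range(0, w - window + 1, step):
            t1 = img_t1[None, top:top + window, left:left + window]
            t2 = img_t2[None, top:top + window, left:left + window]
            pred = np.argmax(model.predict([t1, t2])[0], axis=-1)
            out[top + margin:top + window - margin,
                left + margin:left + window - margin] = \
                pred[margin:window - margin, margin:window - margin]
    return out
```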

4.3.1. Comparison of UNET-CD and Existing CD Algorithms

The results of the evaluation indexes of the generator UNET-CD and the comparison methods SLN and FC-Siam-diff are shown in Table 4.
It could be seen that the results of UNET-CD were better than those of SLN and FC-Siam-diff. In this paper, the remote sensing image blocks numbered 12 and 15 in the test set were selected as the display data, and the change detection results obtained are shown in Figure 12.
The results obtained with UNET-CD detected the image change areas better. Compared with UNET-CD, the results obtained with FC-Siam-diff were fragmented, their integrity was poor, and there were many false detections. In contrast, the detection results obtained with UNET-CD and SLN had better feature integrity, although some complex change areas were still missed.
From the comparison of image block 15, it could be found that all three methods detected the change areas, but the method proposed in this paper produced more accurate change-area boundaries.
To further evaluate and analyze the results of UNET-CD, three sub-regions of images were selected to show the detection results obtained with UNET-CD, FC-Siam-diff, and SLN. As shown in Figure 13, the edges of the UNET-CD change detection results were more accurate, and there were fewer missing detection areas.

4.3.2. Enhancement of UNET-CD with GAN

The change detection model with the GAN introduced is called UNET-GAN-CD. The accuracy results of UNET-GAN-CD and UNET-CD are shown in Table 5. The OA improved by 0.76%, the Kappa coefficient by 3.4%, and the F1-score by 4.4%.
The change detection results are shown in Figure 14. To further demonstrate the improvement brought by the GAN to UNET-CD, we selected the results of three sub-regions for demonstration (Figure 15). UNET-CD had more missed detections in sub-region 1, while UNET-GAN-CD detected the boundaries more accurately. In sub-region 3, UNET-CD missed some change regions and mistakenly detected some unchanged regions as changed, while UNET-GAN-CD accurately detected all change region locations, and the obtained change regions were more accurate and reliable.

4.3.3. Comparison of UNET-GAN-CD and Existing CD Algorithms on CDD

FC-EF, FC-Siam-conc, and FC-Siam-diff [12] are baseline models for change detection tasks. Unet++_MSOF [14] adopts UNet++ to extract features and uses multi-side output fusion for deep supervision. IFN [23] proposes a multi-scale deep supervised change detection model.
As shown in Table 6, our model achieved the best F1-score and recall. Notably, the precision and recall of UNET-GAN-CD differed from each other by only 0.002, which is a much better balance between P and R than that achieved by the other models.
It can be seen from the first line of Figure 16 that for seasonal changes, UNET-GAN-CD could accurately detect snow-covered areas. SLN was disturbed by vegetation in the previous phase images and could not identify objects. As shown in the second line of Figure 16, SLN could only detect some new road additions, and FC-Siam-diff was completely unable to detect any changes in the image. UNET-GAN-CD not only accurately detected changes in the road, but also changes in vehicles. As shown in the third line of Figure 16, both SLN and FC-Siam-diff could detect building changes, but the edges of their detection results were not smooth enough, and the integrity of ground objects was poor. The detection results of UNET-GAN-CD were more accurate for building boundaries, and there was no void phenomenon inside the changing area.

5. Conclusions

In this paper, we proposed a GAN-based change detection method for urban remote sensing images. The model can detect changes in roads, buildings, and cultivated land in urban scenes, and its effectiveness was demonstrated via quantitative and qualitative evaluations. To verify the effectiveness of the GAN in the change detection model, we conducted an ablation experiment for the GAN on XI’AN-CDD. Specifically, adding the GAN improved the overall accuracy, Kappa coefficient, and F1-score by 0.76%, 3.4%, and 4.4%, respectively; it also further improved the completeness of the detection results and reduced missed detections. On the public change detection dataset (CDD), UNET-GAN-CD achieved the best F1-score and the best balance between precision and recall.
During model training and testing, only the RGB bands were used, and future research can be devoted to exploring multi-band remote sensing image processing. Most existing change detection datasets are binary change detection datasets, so multi-category change detection is also a direction of our future research. In addition, training a change detection model with excellent performance using only a small sample of change category training data is another research focus.

Author Contributions

Conceptualization, C.H. and Y.Z.; methodology, C.H. and Y.Z.; writing—original draft preparation, C.H. and Y.Z.; writing—review and editing, C.H., Y.Z., J.D. and Y.X.; supervision, Y.Z.; funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Open Fund of the Key Laboratory of Degraded and Unused Land Consolidation Engineering, Ministry of Natural Resources (No. SXDJ2019-4); the National Key Research and Development Program of China (No. 2017YFE0119600); and the Natural Science Foundation of China (No. 51874306).

Data Availability Statement

The data presented in this study are available in the CDD dataset [24] (https://drive.google.com/file/d/1GX656JqqOyBi_Ef0w65kDGVto-nHrNs9/edit) (accessed on 1 October 2022). In addition, the XI’AN-CDD dataset is available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Malmir, M.; Zarkesh, M.M.K.; Monavari, S.M.; Jozi, S.A.; Sharifi, E. Urban development change detection based on Multi-Temporal Satellite Images as a fast tracking approach—A case study of Ahwaz County, southwestern Iran. Environ. Monit. Assess. 2015, 187, 108–117.
2. Huang, X.; Zhang, L.; Zhu, T. Building change detection from multitemporal high-resolution remotely sensed images based on a morphological building index. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 7, 105–115.
3. Xiao, P.; Zhang, X.; Wang, D.; Yuan, M.; Feng, X.; Kelly, M. Change detection of built-up land: A framework of combining pixel-based detection and object-based recognition. ISPRS J. Photogramm. Remote Sens. 2016, 119, 402–414.
4. Zhou, J.; Yu, B.; Qin, J. Multi-level spatial analysis for change detection of urban vegetation at individual tree scale. Remote Sens. 2014, 6, 9086–9103.
5. Wang, B.; Choi, S.; Han, Y.; Lee, S.; Choi, J. Application of IR-MAD using synthetically fused images for change detection in hyperspectral data. Remote Sens. Lett. 2015, 6, 578–586.
6. Wen, D.; Huang, X.; Zhang, L.; Benediktsson, J. A Novel Automatic Change Detection Method for Urban High-Resolution Remotely Sensed Imagery Based on Multiindex Scene Representation. IEEE Trans. Geosci. Remote Sens. 2016, 54, 609–625.
7. Malila, W.A. Change vector analysis: An approach for detecting forest changes with Landsat. LARS Symp. 1980, 385, 326–335.
8. Feng, W.; Sui, H.; Tu, J.; Sun, K.; Huang, W. Change detection method for high resolution remote sensing images using random forest. Acta Geod. Cartogr. Sin. 2017, 46, 1880–1890.
9. Platt, J. Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines; Technical Report MSR-TR-98-14; Microsoft Research: Redmond, WA, USA, 1998.
10. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
11. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
12. Daudt, R.C.; Le Saux, B.; Boulch, A. Fully convolutional siamese networks for change detection. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 4063–4067.
13. Lei, T.; Zhang, Y.; Lv, Z.; Li, S.; Liu, S.; Nandi, A.K. Landslide inventory mapping from bi-temporal images using deep convolutional neural networks. IEEE Geosci. Remote Sens. Lett. 2019, 16, 982–986.
14. Peng, D.; Zhang, Y.; Guan, H. End-to-end change detection for high resolution satellite images using improved UNet++. Remote Sens. 2019, 11, 1382.
15. Cai, Y.; Liu, C.; Cheng, P.; Du, D.; Zhang, L.; Wang, W.; Ye, Q. Scale-residual learning network for scene text detection. IEEE Trans. Circ. Syst. Video Technol. 2020, 31, 2725–2738.
16. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680.
17. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
18. Song, G.; Luo, L.; Liu, J.; Ma, W.C.; Lai, C.; Zheng, C.; Cham, T.J. AgileGAN: Stylizing portraits by inversion-consistent transfer learning. ACM Trans. Graph. 2021, 40, 1–13.
19. Luc, P.; Couprie, C.; Chintala, S.; Verbeek, J. Semantic Segmentation using Adversarial Networks. arXiv 2016, arXiv:1611.08408.
20. Shi, Q.; Liu, X.; Li, X. Road detection from remote sensing images by generative adversarial networks. IEEE Access 2017, 6, 25486–25494.
21. Kim, S.; Park, S.; Yu, K. Proposal for a Method of Extracting Road Layers from Remote Sensing Images Using Conditional GANs. In Proceedings of the 2nd International Conference on Digital Signal Processing, Tokyo, Japan, 25–27 February 2018; pp. 84–87.
22. Niu, X.; Gong, M.; Zhan, T.; Yang, Y. A conditional adversarial network for change detection in heterogeneous images. IEEE Geosci. Remote Sens. Lett. 2018, 16, 45–49.
23. Zhang, C.; Yue, P.; Tapete, D. A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images. ISPRS J. Photogramm. Remote Sens. 2020, 166, 183–200.
24. Lebedev, M.A.; Vizilter, Y.V.; Vygolov, O.V.; Knyaz, V.A.; Rubis, A.Y. Change Detection in Remote Sensing Images Using Conditional Adversarial Networks. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, 42, 2.
Figure 1. Bi-temporal images: (a) 2015 Xi’an GF1 image and (b) 2017 Xi’an GF1 image.
Figure 2. Ground truth changed label. Changed pixels are shown in white, and unchanged pixels are shown in black.
Figure 3. Dataset division.
Figure 4. Change types in XI’AN-CDD: (a) road change, (b) cultivated land change, and (c) building change.
Figure 5. UNET-GAN-CD architecture.
Figure 6. UNET-CD architecture.
Figure 7. ASPP architecture.
Figure 8. Center-surround architecture. Conv1-5 is the generator encoder shared weight.
Figure 9. Attention fusion module. Flow and Fhigh are inputs, and F is the output.
Figure 10. Discriminator architecture.
Figure 11. Sliding window image prediction. W is the window size.
Figure 12. Change detection results: (a) detection results of image block 12 and (b) detection results of image block 15.
Figure 13. Sub-region change detection results: (a) image before the change, (b) image after the change, (c) ground truth, (d) detection results of SLN, (e) detection results of FC-Siam-diff, and (f) detection results of UNET-CD.
Figure 14. Change detection results: (a) detection results of image block 12 and (b) detection results of image block 8.
Figure 15. Sub-region change detection results on XI’AN-CDD: (a) image before the change, (b) image after the change, (c) ground truth, (d) detection results of UNET-CD, and (e) detection results of UNET-GAN-CD.
Figure 16. Change detection results on CDD: (a) image before the change, (b) image after the change, (c) ground truth, (d) detection results of SLN, (e) detection results of FC-Siam-diff, and (f) detection results of UNET-CD.
Table 1. Generator architecture.
Layer   Generator, UNET-CD
L1      ConvBlock (I3, O32, K3), ReLU, P
L2      ConvBlock (I32, O64, K3), ReLU, P
L3      ConvBlock (I64, O96, K3), ReLU, P
L4      ConvBlock (I96, O128, K3), ReLU, P
L5      ConvBlock (I128, O128, K3)
L6      ASPPBlock (I128, O128)
L7      UpSampling2D (2), DeconvBlock (I512, O128, K3)
L8      UpSampling2D (2), DeconvBlock (I320, O96, K3)
L9      UpSampling2D (2), DeconvBlock (I224, O64, K3)
L10     UpSampling2D (2), DeconvBlock (I96, O32, K3)
L11     FinalConvBlock (I32, O3, K1)
Table 2. Discriminator architecture.
Layer   Discriminator, D
L1      Conv (I8, O16, K3, S2), LeakyReLU, BN
L2      Conv (I16, O32, K3, S2), LeakyReLU, BN
L3      Conv (I32, O64, K3, S2), LeakyReLU, BN
L4      Conv (I64, O128, K3, S2), LeakyReLU, BN
L5      UpSampling2D (2), Conv (I192, O64, K3, S1), LeakyReLU
L6      UpSampling2D (4), Conv (I64, O64, K3, S1), LeakyReLU
L7      Conv (I64, O2, K3, S1)
Table 3. Confusion matrix.
Prediction \ Ground Truth    1       0       Total
1                            TP      FP      T1
0                            FN      TN      T2
Total                        C1      C2      All_Num
Table 4. Accuracy comparison of change detection results.
Method          OA       Kappa    F1
UNET-CD         0.9515   0.5654   0.5767
SLN             0.9495   0.5097   0.5129
FC-Siam-diff    0.9272   0.4072   0.4437
Table 5. Accuracy comparison of change detection results.
Method          OA       Kappa    F1
UNET-CD         0.9515   0.5654   0.5767
UNET-GAN-CD     0.9587   0.5846   0.6022
Table 6. Accuracy comparison of change detection results on the CDD.
Method/Channel        Params (M)   Precision   Recall   F1-Score   FPS
FC-EF/16              1.35         0.609       0.583    0.592      -
FC-Siam-conc/16       1.54         0.709       0.603    0.637      -
FC-Siam-diff/16 *     1.35         0.879       0.436    0.583      38
FC-Siam-diff/32       5.39         0.783       0.626    0.692      -
Unet++_MSOF/32        11.00        0.895       0.871    0.876      -
IFN/-                 35.72        0.950       0.861    0.903      -
SLN/16 *              1.66         0.793       0.622    0.697      35
UNET-GAN-CD (Ours)    5.35         0.913       0.915    0.914      21
The symbol * denotes our re-implemented results; the others are the results reported in the original papers.
