Two-Phase Object-Based Deep Learning for Multi-temporal SAR Image Change Detection

Change detection is one of the fundamental applications of synthetic aperture radar (SAR) images. However, speckle noise presented in SAR images has a much negative effect on change detection. In this research, a novel two-phase object-based deep learning approach is proposed for multi-temporal SAR image change detection. Compared with traditional methods, the proposed approach brings two main innovations. One is to classify all pixels into three categories rather than two categories: unchanged pixels, changed pixels caused by strong speckle (false changes), and changed pixels formed by real terrain variation (real changes). The other is to group neighboring pixels into segmented into superpixel objects (from pixels) such as to exploit local spatial context. Two phases are designed in the methodology: 1) Generate objects based on the simple linear iterative clustering algorithm, and discriminate these objects into changed and unchanged classes using fuzzy c-means (FCM) clustering and a deep PCANet. The prediction of this Phase is the set of changed and unchanged superpixels. 2) Deep learning on the pixel sets over the changed superpixels only, obtained in the first phase, to discriminate real changes from false changes. SLIC is employed again to achieve new superpixels in the second phase. Low rank and sparse decomposition are applied to these new superpixels to suppress speckle noise significantly. A further clustering step is applied to these new superpixels via FCM. A new PCANet is then trained to classify two kinds of changed superpixels to achieve the final change maps. Numerical experiments demonstrate that, compared with benchmark methods, the proposed approach can distinguish real changes from false changes effectively with significantly reduced false alarm rates, and achieve up to 99.71% change detection accuracy using multi-temporal SAR imagery.


Introduction
With its cloud penetrating capability, synthetic aperture radar (SAR) images have drawn a large amount of attention, for example, in environmental surveillance, urban planning and military applications over the past decades. Using SAR images for change detection often involves two images acquired over the same area at different times, utilising the information in the differences between them.
Depending on the availability of a difference image (DI), change detection approaches can be divided into two categories. One is post-classification comparison which is undertaken to identify changed and unchanged regions directly from two images that were classified independently before the analysis. In this approach, the change detection result is not influenced by radiation normalization and geometric correction. However, the accuracy of the change detection relies on the quality of the classification results, with errors propagating to the outcome. The other approach is post-comparison analysis, in which change detection is achieved by generating a DI from two multi-temporal images, and obtaining the final change map from it. The classification errors in this case do not accumulate, but the way that the DI is generted may influence the validity of the change detection results [1].
From a machine learning perspective, change detection can also be categorized into supervised and unsupervised approaches, depending on whether labeled data are used or not [2][3]. For supervised methods, features extracted from labeled data are fed into a subsequent classifier. This strategy requires a significant number of ground reference data to train the algorithm, and the labelling process can be extremely labor-intensive and time-consuming [4]. In [5], a context-sensitive similarity measure is presented based on supervised classification to amplify the dissimilarity between changed and unchanged pixels. Unsupervised methods for change detection can be viewed as a clustering approach which divides the data into changed and unchanged classes [6][7]. In [8], the DI is cast into an eigenvector space and k-means clustering is used to partition the space into two clusters. In [9], a modified Markov Random Field (MRF) energy function is employed to update iteratively the membership association of fuzzy c-means (FCM), to cluster the DI into two classes. In [10] a novel method based on spatial fuzzy clustering was used to add spatial information to enhance change detection performance.
Recently, deep learning has gained widespread attention in the field of computer vision and pattern recognition, and demonstrated state-of-the-art prediction accuracy in various challenging tasks, such as target detection, image classification, etc.. The major benefit of deep learning is that it can extract abstract and high-level representations that are hard to hand-code through feature engineering [11,12]. Besides, deep networks are often pre-trained using a large-scale dataset (e.g. ImageNet), and fine-tuned to other domains including remote sensing. Convolutional neural networks (CNNs) are considered as the pioneer of deep learning methods which mimic the receptive fields of the human brain neural cortex, with less redundancy and complexity through the weightsharing architecture [12,13]. Some well-developed CNN models, such as AlexNet [12], VGG [14] and ResNet [15], have been adopted quickly in the remote sensing community to solve real-world challenges (e.g., land cover and land use classification).
Given the advantages of deep learning, some pioneering methods have been proposed for multitemporal SAR image change detection. In [1], a stack of restricted Boltzmann machine (RBM) networks was used to learn efficiently the relationship between two multi-temporal SAR images for change detection. A dual-channel CNN structure was used to extract features of two SAR images for change detection [16]. [17] presents a local restricted CNN for SAR image change detection, which is formed by imposing a spatial constraint on the output layer of the CNN, such as to learn from several layered difference images. In [18], a stacked contractive autoencoder (sCAE) using a contractive penalty was proposed to promote local invariance and robustness, such that robust features can be extracted from superpixels of SAR images for change detection. In [19], a deep learning-based weakly supervised framework was developed for urban change detection using multi-temporal polarimetric SAR data. In [20], a transferred multi-level fusion network (MLFN) was trained using a large dataset and fine-tuned to extract features from SAR image patches for sea ice change detection. PCANet is an alternative deep learning model suitable for SAR image change detection [22,23,24]. In PCANet, the cascaded PCA filters and binary quantization (hashing) are used as a data-adapting convolution filter bank in each stage and in the nonlinearity layer [21]. During the PCANet training process, there is no requirement for regularized parameters and numerical optimization solvers, which promotes the efficiency and accuracy of the network. In [22], PCANet was shown to be accurate, with great potential for SAR image change detection. In [23], context-aware saliency detection was employed to obtain training samples for PCANet in SAR image change detection, which reduces the number of training samples required while maintaining the reliability of the training sample sets, leading to less training time and computational efficiency. In [24], a morphologically supervised PCANet was designed to overcome the class imbalance problem in SAR image change detection (changed pixels are far less common than unchanged pixels).
Although the above-mentioned deep learning methods exhibit excellent performance in SAR image change detection, there are still some shortcomings. First of all, all the above methods are actually binary classification algorithms, which separate pixels of the changed class (CC) from pixels of the unchanged class (UC). In reality, variation in the pixel values caused by strong speckle noise may lead to allocation to the changed class, potentially producing a large number of false alarms. There are actually two kinds of changed pixels: one is produced by real terrain object changes (i.e. real changed class, RCC), and the other caused by strong speckle noise (i.e. false changed class, FCC). Even if deep learning models have powerful classification capabilities, there will still be several false alarms due to strong speckle noise. Secondly, in current deep learning-based SAR image change detection, high quality training samples are required to train the networks. Those training samples are commonly taken as rectangular patches centering around the pixels that are of interest. However, this operation often introduces artefacts on the border of these rectangular patches, which produces uncertainty in the classification maps. For example, unchanged pixels and changed pixels could potentially exist in one image patch simultaneously. Heterogeneous pixels can also be found in one rectangular patch, which will increase the difficulty of distinguishing between CC and UC classes.
In this research, a new framework of two-phase object-based deep learning (TPOBDL) is proposed for SAR image change detection. Object-based deep learning has been shown to be suitable for remote sensing applications [25]. Thus, in TPOBDL, change detection is implemented in an objectbased rather than pixel-wise fashion. Superpixel generation is applied to SAR images to acquire image objects (also called superpixels in computer science, and here) using a simple linear iterative clustering (SLIC) algorithm [26]. In fact, all processing steps in TPOBDL are based on image superpixels. Since a superpixel is a local set of homogeneous pixels, superpixels can reflect the local spatial context [27,28,29]. Therefore, this approach can overcome the problems caused by operations involving rectangular patches, such as introducing artefacts and uncertainty in the classification. The proposed approach involves two phases to differentiate RCC and FCC objects in an automated approach. Our two-phase deep learning strategy is, thus: Phase 1 deep learning to classify the objects of CC and those of UC, and Phase 2 deep learning to classify objects of CC into RCC and FCC objects. This two-phase framework reduces the classification difficulty faced by deep learning models at each phase, and is conducive to increasing the overall accuracy of change detection.
Our major contributions are as follows: 1) Change detection through an object-based rather than pixel-wise approach. Superpixel generation is applied to SAR images to obtain objects via SLIC, such that the local spatial context is captured.
2) A two-phase approach is designed for multi-temporal SAR image change detection. Deep learning methods are developed to identify objects of FCC and RCC by combining low rank and sparse decomposition (LRSD) with reduced false alarms.
The remainder of this paper is organized as follows. In Section 2, the proposed approach is described in detail. Section 3 presents the experimental datasets and results. Discussion on the experiment results and the proposed approach are shown in Section 4. Finally, conclusions are drawn in Section 5.

Problem Statement and Overview of the Proposed Method
Consider two SAR images taken from the same location, but at different times 1 I and 2 I , both of size N M  . Change detection is required to generate a binary change map labeling changed pixels and unchanged pixels between 1 I and 2 I . Figure 1 shows the scheme of TPOBDL, which consists mainly of two phases of deep learning, described in detail as follows.

Superpixel Generation of Multi-Temporal SAR Images
In existing deep learning-based SAR image change detection methods, the patches for the training and testing of deep neural networks are generated mainly in the shape of rectangles, which is convenient [24]. However, the operation of taking rectangular patches has significant disadvantages for SAR image change detection. Firstly, when the current pixel is near the boundary between changed and unchanged regions, the patch generated will contain both changed and unchanged pixels, which may introduce uncertainty to the deep neural network and impair the learning process [25]. Secondly, rectangular patch generation ignores the local spatial context, which is conducive to the change detection. Instead of taking a rectangular patch, in this paper, patches come from superpixels, where all pixels are homogeneous. This reduces the likelihood that heterogeneous pixels, or even changed and unchanged pixels appear in one patch simultaneously. Patches that are superpixels, compared with traditional rectangular patches, provide more valid information to the deep learning model. In fact, deep learning based on superpixels is an object-based approach, which have more advantages.
In this research, we use SLIC to apply superpixel generation to two multitemporal SAR images 1 I and 2 I . SLIC is chosen for its simplicity, flexibility in compactness, memory efficiency and high accuracy, as applied to SAR image processing [30,31]. First, superpixels of 1 I are obtained by SLIC.
Then the superpixel pattern from 1 I is copied to 2 I , as shown in Fig 2. Pattern copying ensures that the corresponding two superpixels of 1 I and 2 I represent the same local region.  Figure 2. Illustration of copying superpixel pattern from 1 The principles of SLIC are briefly described as follows. Firstly, the number of superpixels is set as v , which means 1 I is portioned into v pixel-blocks at the beginning. The center of each pixelblock is called a seed. The distance (step length) between two seeds is defined as To avoid seeds falling on the contour boundary with a larger gradient, the seeds are redefined where the gradient is the smallest in the neighborhood. Then searching in the neighborhood of each seed, the distance between a pixel in the neighborhood and the seed, including distance in feature (colour) space c d and in geographical space where c d means feature (color) distance , s d means spatial distance, and D is the distance metric. of the pixel in the neighborhood. In this manner, a pixel will be searched many times with different seeds. The seed with the smallest D is taken as the clustering center of this pixel. Then the seeds are updated. Generally, 10 iterations is enough to obtain satisfactory superpixels.
Superpixels possess a range of geometries and sizes (i.e., numbers of pixels). In contrast, the inputs of the deep neural network are required to be uniform rectangles with the same numbers of pixels. Thus, the superpixels need to be reshaped into rectangles before being fed into the network. Assume that the input patches are of size k k  . Then, each reshaped superpixel should also have 2 k pixels. If a superpixel contains p pixels, there are two ways to reshape the superpixel. One is 2 k p  . For this case, assume that a superpixel represented as m i n, k pixels, and an extra vector with

Superpixel DI Generation and FCM
The reshaped superpixel vectors

Training PCANet1
As a type of deep learning model, PCANet is easy to train and can be adapted to other tasks. For SAR image change detection, PCANet has been shown to learn non-linear relations from multitemporal SAR images, which is an advantage compared to other deep neural networks [22]. It has already been employed in SAR image change detection [22,23,24]. Considering these superiorities of PCANet in SAR image change detection tasks, we use PCANet here to further classify those superpixel pairs identified to the intermediate class in the previous phase. Since PCANet is used in the second phase, the network in the first phase is called PCANet1.  The structure of PCANet1 is shown in Fig. 4 (5) where , ih y denotes mean-removed and vectorized , ih R .   (6) where T () ql YY means l  th principal eigenvector and mat( ) where the  operator means 2-D convolution. , l ih R forms the input of the second layer.
In the second layer, all , l ih R have their means removed and are vectorized to be , ,, L    Z Z Z Z (8) The following step is similar to that for the first layer. We choose 2 L principal eigenvectors of T ZZ as the PCA filters 2 l W of the first layer, that is: (9) And then the outputs of the second convolution layer are: After these two convolution layers, every The features obtained as above are fed into a support vector machine (SVM) to train a model which can classify superpixels of intermediate class to CC or UC. It is worth noting that the CC of the first phase includes not only the changed pixels caused by real terrain variation, but also changed pixels caused by strong speckle noise.

Second Phase Deep Learning
As stated above, when SAR images are contaminated by strong speckle noise, the CC of the first phase contains two categories of change. One is false change caused by speckle noise called FCC, the other is caused by real terrain variation called RCC. Thus, in the second phase, we aim to separate FCC and RCC, between which the intra-class interval is so small that they are difficult to distinguish. However, the hypostatic difference between the two categories is such that the change caused by strong speckle noise has strong randomness. If the influence of the random noise can be greatly weakened, discrimination between the RCC and FCC can be increased. Therefore, in the second deep learning phase, we adopt different methods to the first phase. One key step in the second phase is speckle noise suppression based on low rank and sparse decomposition. Details are as follows.

Superpixel Generation on the Updated SAR Images
In the second phase, we firstly use mask processing on the original SAR images 1 I and 2 I to set the pixels classified as UC in the first phase to zero, thus, easing the burden on the classifier in this phase. Then SLIC is conducted on these two masked images to generate new superpixel objects denoted by 2 , bi S . The superpixel generation in the phase has two differences from that in the first phase. Firstly, the superpixel generation of this phase is based on the masked images, so the spatial context of the pixels has altered significantly leading to different superpixel patterns. Secondly, when applying SLIC in this phase, we set the number of pixels of each superpixel to be less than that in the first phase because there are many discontinuous areas caused by the mask operation compared to the generation in the first phase. Then we reshape the superpixel objects using a strategy similar to that in the first phase.

Low Rank and Sparse Decomposition
The principle of using LRSD is that the pair of noisy superpixels from the same unchanged area of 1 I and 2 I , have an inherent large correlation with a low rank characteristic. Therefore, to discriminate RCC and FCC，we propose an idea based on LRSD to suppress speckle noise and restore the superpixel objects. The LRSD model establishes the effective expression of observed data with noise [33,34]. Low rank regularization constraints and sparse regularization constraints can separate noise effectively from observed data and recover data. By optimizing the LRSD model, speckle noise can be separated and observed objects restored, which may greatly increase the discrimination between RCC and FCC.
At first, we apply a logarithmic operation on each vector of superpixel objects to convert multiplicative speckle noise to additive noise. Then, each vector can be formulated as follows.
, as shown in Fig. 5. Thus, we can obtain the matrix version of equation (13) as equation (14).

Figure 5. Construction of matrix Φ .
According to the principle of low rank representation, in order to estimate a low rank matrix U and a spare matrix E from a noise-contaminated observed Φ , we formulate an optimization problem as follows.  (15) Where ‖⋅‖ * indicates the nuclear norm, ‖⋅‖ 2,1 indicates the 1 norm of a vector formed by the 2 norm of the column vector of the underlying matrix. ‖⋅‖ * induces sparsity of the singular values of the matrix, and ‖⋅‖ 2,1 induces sparsity of the elements of the matrix.
The optimization problem can be solved by an augmented Lagrange algorithm. The augmented Lagrange formula of the problem (15) is as follows:   Fig. 6. Figure 6. LRSD of the vectors from superpixel objects.

SPDI Generation and FCM
In the second phase, the difference vector is obtained from the superpixel vectors restored by LRSD, and FCM clustering is also adopted. At this stage,

Experiments and Results
To demonstrate the accuracy and effectiveness of the proposed approach, we compared TPOBDL with other state-of-the-art methods: principal component analysis and k-means clustering (PCAKM) [8], Gabor feature extraction and PCANet (GaborPCANet) [22], neighborhood-based ratio and extreme learning machine (NR_ELM) [35] and convolutional-wavelet neural network (CWNN) [36].

Datasets and Experimental Setup
We applied the proposed and benchmark methods to three real space-borne SAR datasets to evaluate the performance of TPOBDL. The three datasets used are co-registered and geometrically corrected SAR images acquired by the COSMO-Skymed satellite sensor, as shown in Fig. 7. The images in Fig. 7(a)(b)(c) were acquired on June 10, 2016 and those in Fig. 7(d)(e)(f) on April 26, 2017. The three areas are selected to represent different landscapes containing a river, a plain, mountain and buildings. They are all of size 400 × 400 pixels. It is obvious that the three SAR datasets suffer from speckle noise. Many speckle noise reduction filters exist, but the balance between speckle suppression and detail preservation remains challenging [35]. Therefore, no speckle filters were applied to the three SAR datasets. The corresponding ground reference datasets are shown in Fig.  7 How to evaluate the performance of SAR image change detection algorithms is a key issue. Here, we utilized several state-of-the-art evaluation metrics, including the false alarm probability f P , 13 missing detection probability m P , percentage correct classification PCC , Kappa coefficient KC and GD OE [1,22]. Assume that the actual numbers of pixels belonging to UC and CC are denoted by u N and c N , respectively, in the ground reference data, then = 100% Where n F denotes the number of unchanged pixels detected as changed, while n M represents the number of changed pixels detected as unchanged.
The definition of GD OE is then as follows.

Experiments
We analyzed and evaluated the final results visually and quantitatively.
The change detection results of multi-temporal SAR dataset C1 are shown in Fig. 8 and Table 1. As presented in Fig. 8, the change map of PCAKM contains many false alarms, scattered widely across the image with f P reaching 39.23%. This is because PCAKM is unable to classify the false changes caused by strong speckle noise and real changes caused by terrain variation as shown in Fig.  8 (a). However, different from PCAKM, the false alarms of GaborPCANet, NR_ELM and CWNN are centred in the river, as shown in Fig.8 (b)(c)(d). On one hand, PCAKM uses pixel values for change detection, which are affected by strong speckle noise. Thus, the f P of PCAKM is very high.
However, GaborPCANet and CWNN, two deep learning-based methods, can extract deep features and have a certain speckle noise suppression capability, so the f P is greatly reduced compared to PCAKM. Moreover, the extreme learning machine in NR_ELM can also effectively extract features and suppress speckle noise. Therefore, the performance of GaborPCANet, NR_ELM and CWNN is better than that of PCAKM. On the other hand, compared to the original two SAR images, we found that false alarms occur in the river region for the latter three methods. The river region in the two SAR images looks very dark, because the river backscatter of electromagnetic waves is relatively weak. Thus, under strong speckle noise, the signal-to-noise ratio (SNR) in the river region of the SAR image is very low. Therefore, in this case, the difference in values of pixels between the two images in the river region is relatively large, and pixels in the river region are easily classified as CC.
It can be seen that the final change map obtained by the proposed approach TPOBDL is very close to the ground reference, as shown in Fig. 8 (f). Compared with the former methods, the f P obtained by TPOBDL is only 0.18% (see Table 1), which is a remarkable result. This is because the second phase of TPOBDL uses a special network to identify the pixels of FCC and those of RCC. In addition, compared to CWNN, our approach uses object-based deep learning removing those scattered false alarms effectively, which demonstrates the advantages of object-based deep learning. Therefore, TPOBDL can eliminate effectively the false alarms caused by strong speckle noise.
As can be seen from Table 1, the quantitative analysis is consistent with the visual analysis. The performance of TPOBDL is better than for the benchmark algorithms in terms of PCC ,   9 and Table 2 present the final change detection results on dataset C2. In terms of visual comparison, PCAKM still includes many false alarms. The performance of GaborPCANet is better than that of PCAKM in terms of f P . However, there are several false alarms due to speckle noise.
Moreover, for each of PCAKM, GaborPCANet or NR_ELM, there is an obvious long and narrow area with fewer false alarms in the upper right corner of the change map. Comparing the original two multi-temporal SAR images, we find that this long and narrow area has an area of relatively strong back-scattering (visually white), which means the amplitude value of these pixels is relatively large. This indicates that change detection in areas with strong scattering is less affected by speckle noise because of the high SNR. This situation is exactly the opposite of the high false alarm phenomenon in the river region in the experiments on C1. As for CWNN, it is clear that the value of f P due to speckle noise is smaller than for the three benchmarks. This benefit arises from the wavelet pooling layers in CWNN, which suppress speckle noise by losing high-frequency sub-bands while preserving low-frequency sub-bands to extract features. However, TPOBDL has less false alarms than CWNN, because the object-based methodology is adopted, which greatly reduces classification uncertainty induced by rectangular patches. As for TPOBDL, two-phase deep learning is not only effective for change detection in low SNR region, but also for change detection in high SNR regions. This is due to the influence of the LRSD, which greatly constrains the influence of speckle noise. Among the five methods, TPOBDL has the best performance in terms of PCC ,  The results of experiments on dataset C3 are exhibited in Fig. 10 and Table 3. The performance of PCAKM is again the least good. Compared with the first two datasets, there are no weak backscattering regions (like river, C1) or strong backscattering regions (like mountain, C2). However, the contrast in the whole scene of C3 is relatively low, which means that classification may be more challenging due to low discrimination. Thus, it can be seen from Table  3

Parameters Selection
In the proposed approach, there exist four parameters to be discussed, which are the number of superpixels 1 SP and the patch size 1 k in the first phase, and the equivalents, 2 SP and 2 k , in the second phase. These four parameters affect the ability to learn neighborhood information in the twophase object-based deep learning approach. As indicated in [21], when the patch size is set as 55  , it leads to an optimal result. Hence, we fix 1 =5 k at the beginning. As for 1 SP and 2 SP , to reduce redundancy and increase superpixel generation efficiency, we assume , which means that the number of pixels in a superpixel and the number of pixels in a patch should be the same, as far as possible. So we fix 1 =6400 SP . Then, we conduct experiments on 2 = SP 17800, 6400, 3200 and 2 k = 3, 5, 7, 9 in pair-wise fashion, respectively. The experimental results are shown in Fig. 11 -12. Observing from Fig. 11-12, we found that when 2 = SP 17800 and 2 k = 3, the values of PCC and KC were the best. The experimental result is consistent with the principle of the proposed approach. As mentioned before, the spatial context of the pixels has altered significantly after masking in the second phase. And, there may be many discontinuous areas after masking. Hence, superpixel objects with a small number of pixels have the benefit of avoiding heterogeneous pixels inside the objects, which reduces classification uncertainty in PCANet2. This reveals that, in the second phase, the relatively small superpixels helps the PCANet2 to exploit more details, which cater to the purpose of distinguishing RCC and FCC. We then fixed the parameters of the second phase as 2 = SP 17800 and 2 k = 3 to conduct experiments on 1 = SP 17800,6400,3200 and 1 k = 3,5,7,9 in a pair-wise fashion, respectively. The experimental results are presented in Fig. 13-14.
As shown in Fig. 13 , which indicates that theoretically the number of pixels in a superpixel should be similar to the number of pixels in a patch. Thus, the best parameter combination is 1 SP = 3200, 1 k = 7 for the first phase, and 2 = SP 17800, 2 k = 3 for the second phase.

Comparison with Other Methods
Firstly, we compare the proposed approach with four other methods. The experimental results of all methods are presented in Fig. 8-10 and Tables 1-3. TPOBDL outperforms other methods in all evaluation indicators, except for missing alarms rate. This is because by using superpixel objects and two phases of PCANet, TPOBDL is more robust to speckle noise, able to extract deep features and capable of learning the nonlinear relations from multi-temporal SAR images efficiently. The patches reshaped from superpixel objects with homogeneous pixels are beneficial to the deep feature extraction and PCANet training, which avoids uncertainty due to rectangular patches.
The two deep learning phases in TPOBDL are important for acquiring the desired change detection performance. The first phase generally classifies pixels into two classes, CC and UC. However, there are actually two kinds of changes in CC. One is strong speckle noise-induced change, and the other is real terrain variation-induced change. In the second phase, the pixels belonging to UC are set to zero so that the PCANet2 can focus on identifying two indistinguishable changes. PCANet2 faces a more difficult classification tasks than PCANet1. Hence, we equip the second phase with LRSD to suppress noise and increase the ability to discriminate the two previously indistinguishable changes. Despite noise interference, multi-temporal SAR images of the same object should have a strong correlation. Based on this principle, we established the LRSD model. LRSD can not only suppress speckle noise, but also highlight the correlation between objects via the low rank constraint, as shown in Fig. 15. Through this, TPOBDL achieves the best performance amongst the five methods when facing strong speckle noise. It is worth noting that there is no speckle filtering in TPOBDL.

Modular Deep Learning Framework for change detection
In the proposed approach, PCANet1 in the first phase completes the classification tasks of CC and UC, and PCANet2 in the second phase completes the classification tasks of RCC and FCC. In fact, other deep neural networks can also be used in the first stage, instead of PCANet. In the same way, it is not necessary to use the PCANet in the second phase. Therefore, the two phase deep learning framework proposed in this paper can be regarded as a modular structure. The structure does not actually limit what deep learning models are used. The key to this modular structure is hierarchical classification. Moreover, the advantage of this modular deep learning framework is that the deep neural network in each module can complete a specialized, and not particularly complicated task, so the difficulty of classification in each module is reduced. For example, in this research, if only one PCANet is used to complete the classification of UC, RCC and FCC simultaneously, it is easy to generate more misclassifications, which will lead to a larger number of false alarms or larger number of missing alarms. In addition, this modular deep learning-based change detection structure is particularly suitable for engineering implementation.

Time-series SAR Images to Suppress Speckle Noise
As is commonly known, speckle noise is a major factor affecting the detection of changes in multitemporal SAR images. Although speckle filtering is not used in the proposed approach, it is conceivable that the performance of the algorithm will be improved to a certain extent if speckle noise filtering is performed before change detection. Since multi-temporal SAR images can be regarded as a time-series, and the images of the same observation scene are coherent, speckle noise can be suppressed based on this principle effectively. How to suppress speckle noise using time-series SAR images is also a topic for subsequent research.

Conclusions
In this research, a novel change detection algorithm with two-phase object-based deep learning approach for multi-temporal SAR images is presented. An object-based approach is used instead of a pixel-wise approach. The object-based change detection approach can effectively exploit the spatial context of neighborhood pixels, which is conducive to increasing the ability to identify UC and CC. Using superpixel objects, the pixels in each object are generally more homogeneous, which avoids the classification uncertainty caused by heterogeneous pixels and provides high-quality training samples for subsequent PCANets. In addition, this paper uses a two-phase deep learning framework to implement change detection on multi-temporal SAR images. The first phase of deep learning realizes the distinction between UC and CC. The second phase of deep learning realizes the distinction between RCC and FCC. The two-phase deep learning framework can tackle effectively the classification challenge faced by deep learning in each phase, and can effectively distinguish RCC and FCC, while maintaining a very low false alarm under strong speckle noise. The experimental results illustrate that the proposed approach can achieve high accuracy and validity.