Land Cover Change Detection from High-Resolution Remote Sensing Imagery Using Multitemporal Deep Feature Collaborative Learning and a Semi-supervised Chan–Vese Model

Abstract: This paper presents a novel approach for automatically detecting land cover changes from multitemporal high-resolution remote sensing images in the deep feature space. This is accomplished by using multitemporal deep feature collaborative learning and a semi-supervised Chan–Vese (SCV) model. The multitemporal deep feature collaborative learning model is developed to obtain the multitemporal deep feature representations in the same high-level feature space and to improve the separability between changed and unchanged patterns. The deep difference feature map at the object level is then extracted through a feature similarity measure. Based on the deep difference feature map, the SCV model is proposed to detect changes, in which labeled patterns automatically derived from uncertainty analysis are integrated into the energy functional to efficiently drive the contour towards accurate boundaries of changed objects. The experimental results obtained on four data sets acquired by different high-resolution sensors corroborate the effectiveness of the proposed approach.


Introduction
Land cover change information is extremely important for the study of global climate change, biodiversity, environmental monitoring, and national resources management [1][2][3][4]. In recent decades, change detection (CD), which uses multitemporal remote sensing datasets to quantify changes and temporal effects on the Earth's surface, has become a research hotspot [5,6]. Along with the rapid development of Earth observing technology, a large number of CD methodologies for remote sensing imagery have been developed, and newer techniques are still emerging [6][7][8].
In the literature, CD approaches are commonly classified into two categories, namely post-classification comparison and direct comparison [9,10]. Post-classification comparison classifies the pixels of each multitemporal image independently and then compares the classified maps for change analysis [11][12][13]. The direct comparison of multispectral images is generally achieved in two steps: generating a difference feature map containing change magnitudes and analyzing the feature map to detect the changed areas [14][15][16]. Layer arithmetic operations (e.g., image differencing and change vector analysis [14]) and data transformations (e.g., principal component analysis [17] and histogram trend similarity [7]) can be used to generate the difference feature map. Analysis methods including thresholding, clustering, image segmentation, and machine learning are then applied to the feature map to discriminate changed from unchanged areas [6]. Direct comparison approaches based on the difference feature map have been widely implemented to automatically detect changes from multitemporal remote sensing data without the need for any prior information.
Remote Sens. 2019, 11, 2787
Based on the unit of analysis, these CD approaches can be categorized into pixel-based and object-based methods [18][19][20][21][22]. Pixel-based approaches mainly include the expectation maximization (EM) algorithm, fuzzy C-means (FCM), and active contour models (ACMs) [10,23]. The Chan-Vese (CV) model reduces the complexity of the optimization problem of ACMs and has been extensively studied in pixel-based CD approaches [24][25][26]. In [27], the local uncertainty of pixels was incorporated into the CV model to construct energy constraints that improve both the accuracy of CD results and the computational efficiency. Li et al. [28] added local fuzzy information to the CV model to enhance the changed information and reduce speckle noise. Li et al. [26] combined a thresholding method, morphology operations, and fast level set evolution for landslide mapping from bitemporal orthophotos. Compared to pixel-based approaches, object-based methods can delineate landscape features at different levels and reduce small spurious changes [6]. Therefore, object-based approaches are considered more suitable for remote sensing images with high spatial resolution [29][30][31]. Image segmentation is a preliminary step for object-based CD (OBCD), which divides the image into homogeneous objects at different scales. These image objects are then used as the basic units for developing a CD strategy [30,32].
Numerous machine learning algorithms have been used in CD applications, such as support vector machines (SVM) [33][34][35], neural networks [36,37], and decision trees [38][39][40]. Recently, with the development of machine learning techniques, deep learning has attracted increasing attention due to its ability to mine latent features and representations from raw data [41]. The advantages of deep learning have been demonstrated in various remote sensing applications [42][43][44], such as semantic segmentation [45], object detection [46], and complex land cover mapping based on remote sensing imagery [47]. For CD applications, Khan et al. [48] detected forest changes from contaminated SLC-off Landsat images using a convolutional neural network (CNN) model. Mou et al. [49] proposed a recurrent CNN architecture to learn spectral-spatial-temporal features for CD in multispectral remote sensing images. Wang et al. [50] presented a general CNN framework for discriminative feature extraction and CD from multisource hyperspectral images. Zhang et al. [51] utilized feature learning based on deep neural networks and mapping transformation for CD from images with different spatial resolutions. Gong et al. [52] presented a CD framework using deep difference representations at the superpixel level obtained by deep belief networks. However, it remains difficult to exploit robust features that highlight changes in high-resolution images, and there is a tradeoff between the level of required supervision and the possibility of defining automatic criteria for the generation of CD maps [34,52].
In this paper, we propose a novel CD framework which combines deep feature learning (DFL) and a novel semi-supervised CV (SCV) model for detecting changes from multitemporal high-resolution remote sensing images. The multitemporal deep feature collaborative learning is conducted based on a stacked denoising autoencoder (SDAE) model to obtain deep representations of the multitemporal images from the spatial contextual information of each pixel. The object-level difference feature map is then obtained through the feature similarity measure and multi-scale segmentation. After that, the SCV algorithm is proposed to detect the changed objects; it automatically exploits seed patterns with labeling information to guide the level set evolution. This CD procedure does not require any prior information. The contributions of this work can be summarized as follows: (1) This paper proposes a new scheme for solving CD problems for high-resolution multispectral remote sensing images, which is able to measure changes accurately and efficiently. (2) The multitemporal deep feature collaborative learning transforms the original multitemporal images into the same high-level feature space, obtaining an abstract representation of the difference in intensities and improving the separability between changed and unchanged objects. (3) The pseudo-training set containing changed and unchanged patterns, derived by uncertainty analysis of object labels, is incorporated into the level set evolution process to efficiently drive the level curves towards the accurate boundaries of changed objects.
The rest of this paper is organized as follows. Section 2 describes the proposed CD approach. Then, Section 3 presents the experimental results on four remote sensing datasets from different sensors. After that, the findings are discussed in Section 4. Finally, the conclusions of this research are drawn in Section 5.

Methodology
The general framework of the proposed CD approach is depicted in Figure 1. The approach consists of two principal steps: (1) DFL of multitemporal images and (2) the SCV model based on the object-level deep difference feature map. First, deep feature collaborative learning based on SDAE is applied to the well-preprocessed multitemporal images to obtain deep feature representations in the same high-level feature space. Second, the object-level deep difference feature map is obtained by co-segmentation of the stacked bitemporal images and the feature similarity measure. After that, the SCV model is proposed, in which the pseudo-training set containing labeled patterns derived from the uncertainty analysis is integrated into the level set energy functional to guide the level set evolution. Finally, the CD map is obtained through the level set evolution.
Figure 1. Flowchart of the proposed change detection (CD) approach.

Multitemporal Deep Feature Collaborative Learning
The proposed deep feature collaborative learning aims to transform the multitemporal images into the same high-level feature space to highlight changes and improve the separability between changed and unchanged patterns. The SDAE is used for the multitemporal deep feature collaborative learning because of its capability to learn robust and abstract representations from raw data in an unsupervised way [53,54].
Let us consider two remote sensing images, $I_1$ and $I_2$, of size $R \times P$, acquired over the same geographical area at two different times, $T_1$ and $T_2$, each having $b$ bands. Both images have been well preprocessed, including co-registration and radiometric calibration. For each point $(r, p) \in \Omega$, we use the point $(r, p)$ together with its spatial neighboring pixels $N_{r,p}(\omega)$ as the input vector, where $\omega$ represents the local window size of its neighborhood. The corresponding image patches in the bitemporal images are both vectorized as training samples with dimension $d = b \times \omega \times \omega$, as displayed in Figure 2. Then, the feature vectors from the bitemporal images are trained together through a deep feature learning algorithm based on an SDAE model.
An autoencoder is a multi-layer neural network that is trained to reconstruct its original input and thereby learn features. A denoising autoencoder (DAE) introduces a denoising criterion into the basic autoencoder to make it robust to noise: the original inputs are explicitly contaminated with random noise during training. After training, a clean "repaired" input is reconstructed from the corrupted one, and the output values are as close as possible to the original uncontaminated values [51,55].
This is done by corrupting the original input $x$ to get a partially contaminated version $\tilde{x}$ according to a stochastic mapping $\tilde{x} \sim q_{\mathcal{D}}(\tilde{x} \mid x)$. The corrupted input $\tilde{x} \in [0, 1]^d$ is then transformed into a hidden representation $y$ through a deterministic mapping $f_\theta$:
$$y = f_\theta(\tilde{x}) = \sigma(W\tilde{x} + b)$$
where the parameter set is $\theta = \{W, b\}$, $\sigma$ is the activation function, $b$ is a bias vector of dimensionality $d'$, and $W$ is a $d' \times d$ weight matrix. The activation function $\sigma$ is set to the sigmoid function in this paper, i.e., $\sigma(x) = 1/(1 + e^{-x})$. Then we reconstruct a $d$-dimensional vector $z$ by mapping the hidden representation $y$ back to the input space. This mapping $g_{\theta'}$ is an affine mapping, optionally followed by a squashing non-linearity:
$$z = g_{\theta'}(y) = \sigma(W'y + b')$$
parameterized by $\theta' = \{W', b'\}$, where $b'$ is a bias vector of dimensionality $d$ and $W'$ is a $d \times d'$ weight matrix.
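As an illustration, the construction of the input vectors described earlier in this subsection (each pixel with its ω × ω neighborhood, vectorized to dimension d = b × ω × ω and stacked over both dates) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names `extract_patch_vectors` and `build_training_set` are hypothetical, and reflection padding at the image border is an assumption, since border handling is not specified in the text.

```python
import numpy as np

def extract_patch_vectors(image, omega=3):
    """Vectorize the omega x omega neighborhood of every pixel.

    image: array of shape (R, P, b).
    Returns an array of shape (R * P, b * omega * omega).
    """
    if omega % 2 != 1:
        raise ValueError("omega must be odd so that the window has a center pixel")
    half = omega // 2
    # Assumption: borders are handled by reflection padding.
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="reflect")
    # windows has shape (R, P, b, omega, omega)
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, (omega, omega), axis=(0, 1)
    )
    rows, cols = image.shape[:2]
    return windows.reshape(rows * cols, -1)

def build_training_set(image_t1, image_t2, omega=3):
    """Stack the patch vectors of both dates into one (2 * R * P, d) sample matrix."""
    return np.vstack([
        extract_patch_vectors(image_t1, omega),
        extract_patch_vectors(image_t2, omega),
    ])
```

With ω = 3 and three bands, each sample has d = 27 entries, which matches the input width of the 27-15-5-2 network reported in the experimental settings.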
The parameters of the DAE model are optimized in an unsupervised way by minimizing the average reconstruction error between each clean input $x^{(i)}$ and its reconstruction $z^{(i)}$, that is, by solving the following optimization problem:
$$\theta^{*}, \theta'^{*} = \arg\min_{\theta, \theta'} \frac{1}{n} \sum_{i=1}^{n} L\left(x^{(i)}, z^{(i)}\right)$$
where $L$ is the squared error function $L(x, z) = \|x - z\|^2$ and $n$ is the sample size, which is equal to $2 \times R \times P$. After training, the reconstruction layer $z$ is removed, and the values of the hidden layer $y$ can be used as the representation of the input features in a new feature space [51,55].
By stacking multiple DAEs in a hierarchical manner, such that the hidden-layer values of one DAE become the input to the next upper DAE, an SDAE model can be constructed. The SDAE is learnt in a greedy layer-wise fashion using gradient descent [51,55]. After training the (k-1)th DAE, its learnt representation is used as input to train the kth DAE to learn the next-level representation [54,56]. The procedure is repeated until all the DAEs are trained and the highest-level output representation is obtained. In this paper, the parameters of the SDAE are initialized at random and then optimized by stochastic gradient descent. The multitemporal deep features are learned collaboratively in the same high-level feature space based on an SDAE model, as illustrated in Figure 2; thus, the multitemporal deep features can be compared directly.
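The DAE training and greedy layer-wise stacking described above can be sketched in plain NumPy as follows. This is only a sketch under stated assumptions and not the TensorFlow implementation used in the experiments: the class `DenoisingAutoencoder`, the function `train_sdae`, additive Gaussian noise clipped to [0, 1] as the corruption process, and the values of `noise_std`, `lr`, `epochs`, and `batch` are all hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class DenoisingAutoencoder:
    """One DAE with a sigmoid encoder and a sigmoid decoder."""

    def __init__(self, d_in, d_hidden, rng):
        scale = 1.0 / np.sqrt(d_in)
        self.W = rng.uniform(-scale, scale, (d_hidden, d_in))    # encoder weights, d' x d
        self.b = np.zeros(d_hidden)
        self.W2 = rng.uniform(-scale, scale, (d_in, d_hidden))   # decoder weights, d x d'
        self.b2 = np.zeros(d_in)

    def encode(self, x):
        return sigmoid(x @ self.W.T + self.b)

    def decode(self, y):
        return sigmoid(y @ self.W2.T + self.b2)

    def train_step(self, x, noise_std, lr, rng):
        # Assumed corruption: additive Gaussian noise, clipped back to [0, 1].
        x_tilde = np.clip(x + rng.normal(0.0, noise_std, x.shape), 0.0, 1.0)
        y = self.encode(x_tilde)
        z = self.decode(y)
        n = x.shape[0]
        # Gradients of the mean squared reconstruction error w.r.t. pre-activations.
        dz = 2.0 * (z - x) * z * (1.0 - z) / n
        dy = (dz @ self.W2) * y * (1.0 - y)
        self.W2 -= lr * dz.T @ y
        self.b2 -= lr * dz.sum(axis=0)
        self.W -= lr * dy.T @ x_tilde
        self.b -= lr * dy.sum(axis=0)
        # Loss is measured against the clean input, as in the denoising criterion.
        return float(np.mean(np.sum((x - z) ** 2, axis=1)))

def train_sdae(samples, layer_sizes, epochs=20, batch=64,
               noise_std=0.1, lr=0.5, seed=0):
    """Greedy layer-wise training; returns the DAEs and the top-level features."""
    rng = np.random.default_rng(seed)
    layers, features = [], samples
    for d_in, d_hidden in zip(layer_sizes[:-1], layer_sizes[1:]):
        dae = DenoisingAutoencoder(d_in, d_hidden, rng)
        for _ in range(epochs):
            order = rng.permutation(len(features))
            for start in range(0, len(features), batch):
                dae.train_step(features[order[start:start + batch]],
                               noise_std, lr, rng)
        layers.append(dae)
        features = dae.encode(features)  # hidden values feed the next DAE
    return layers, features
```

Calling `train_sdae(samples, [27, 15, 5, 2])` on the stacked bitemporal samples would produce a two-dimensional deep feature per pixel and date. Because both dates pass through the same trained layers, the resulting features lie in one feature space and can be compared directly.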

Deep Difference Feature Extraction
In the proposed CD framework, co-segmentation using the fractal net evolution approach (FNEA) is applied directly to the stacked bitemporal images to create spatially corresponding objects. FNEA is a region growing algorithm based on a minimum heterogeneity criterion that builds a multi-scale hierarchical structure by merging neighboring image objects [57,58]. In this paper, the segmentation parameters for FNEA are adjusted and determined with the aid of the estimation of scale parameter (ESP) tool.
The deep difference feature map $Q$ is then generated at the object level by applying the cosine similarity measure to the multitemporal deep features. The deep difference feature $q_k$ of the kth object is computed over the $\|A_k\|$ pixels in its region $A_k$ from the deep feature vectors $y_1$ and $y_2$ of images $I_1$ and $I_2$, respectively. Here, $\mathrm{sim}(\cdot)$ denotes the cosine similarity of two vectors:
$$\mathrm{sim}(y_1, y_2) = \frac{y_1 \cdot y_2}{\|y_1\| \, \|y_2\|}$$
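The object-level aggregation described above can be sketched as follows. The exact form of the per-object feature is not recoverable from this text, so the sketch assumes that $q_k$ is the mean cosine dissimilarity (one minus the cosine similarity) over the pixels of object $A_k$, which makes larger values correspond to stronger change. That assumption, the function name `object_difference_features`, and the integer label image are hypothetical and should be checked against the original equation.

```python
import numpy as np

def object_difference_features(y1, y2, labels, eps=1e-12):
    """Average per-pixel cosine dissimilarity inside each segmented object.

    y1, y2: deep features of the two dates, shape (R, P, m).
    labels: integer object ids in 0..K-1 from the co-segmentation, shape (R, P).
    Returns (q, Q): q[k] is the feature of object k, Q is the per-pixel map.
    """
    dot = np.sum(y1 * y2, axis=-1)
    norms = np.linalg.norm(y1, axis=-1) * np.linalg.norm(y2, axis=-1)
    # Assumption: difference = 1 - cosine similarity.
    dissim = 1.0 - dot / np.maximum(norms, eps)
    flat = labels.ravel()
    sums = np.bincount(flat, weights=dissim.ravel())
    counts = np.bincount(flat)                      # ||A_k|| for each object
    q = sums / np.maximum(counts, 1)
    return q, q[labels]
```

Here `np.bincount` accumulates the per-pixel values by object id, so every pixel of an object receives the same feature value in the returned map.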

Uncertainty Analysis
In CD problems, it is difficult to obtain reliable supervised information when no ground truth is available. In this study, we propose to identify changed and unchanged patterns through an uncertainty analysis of object labels. Compared to traditional hard clustering methods, the FCM algorithm provides additional useful information such as the fuzzy membership grade [59,60]; thus, it is adopted to initially cluster the objects in this research. It is an unsupervised method that classifies the deep difference features of the objects into fuzzy clusters. The objective function of the initial clustering algorithm for the deep difference features is represented by the following equation:
$$J = \sum_{j=1}^{c} \sum_{k=1}^{N} u_{jk}^{m} \, \|q_k - v_j\|^2$$
where $q_k$ is the deep difference feature vector of the kth object, $v_j$ is the cluster center of the jth cluster, $u_{jk}$ indicates the fuzzy membership grade of $q_k$ associated with the jth cluster, and $\|q_k - v_j\|^2$ is the squared distance between the feature vector $q_k$ and the cluster center $v_j$. Following the standard FCM formulation, $c$ is the number of clusters, $N$ is the number of objects, and $m$ is the fuzzification exponent.
Based on the initial clustering by FCM, the label uncertainty $E_k$ of each object is measured by the information entropy of its fuzzy membership grades. The pseudo-training set identified as seed patterns is then obtained by thresholding the uncertainty values and comparing the changed and unchanged fuzzy membership grades: an object with $E_k < T$ is assigned to $S_c$ if its membership grade for the changed cluster is the larger one, and to $S_u$ if its membership grade for the unchanged cluster is the larger one. Here, $E_k$ is the initial label uncertainty of the kth object, $T$ denotes the uncertainty threshold that determines the range of the nearly certain patterns, and $S_c$ and $S_u$ contain the changed and unchanged patterns, respectively. As objects with high uncertainty are more likely to be confused between the changed and unchanged classes, we define a relatively small certain region to guarantee that the objects contained in the sets $S_c$ and $S_u$ are accurately labeled with a high probability. Consequently, a pseudo-training set containing relatively reliable samples can be obtained from the deep difference feature map by using these rules.
The pseudo-training set $\Psi = \{S, L\}$ is made up of pairs of the changed samples with their labels, $(q_c, l = w_c)$, and the unchanged samples with their labels, $(q_u, l = w_u)$, to be used as seed patterns, i.e.,
$$\Psi = \{(q_c, l = w_c) \mid q_c \in S_c\} \cup \{(q_u, l = w_u) \mid q_u \in S_u\}$$
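The uncertainty analysis described above (initial FCM clustering, entropy of the memberships, and thresholded seed selection) can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the function names are hypothetical, the fuzzification exponent is set to the common default m = 2, the entropy uses a base-2 logarithm so that it lies in [0, 1] for two clusters, and the cluster with the larger center is assumed to be the changed class.

```python
import numpy as np

def fuzzy_c_means(q, n_clusters=2, m=2.0, n_iter=100, tol=1e-6):
    """Standard FCM; returns memberships u of shape (c, N) and the centers."""
    q = np.asarray(q, dtype=float).reshape(len(q), -1)            # (N, dim)
    # Deterministic initialization: centers spread between min and max feature.
    centers = np.linspace(q.min(axis=0), q.max(axis=0), n_clusters)
    for _ in range(n_iter):
        dist2 = ((q[None, :, :] - centers[:, None, :]) ** 2).sum(-1) + 1e-12
        inv = dist2 ** (-1.0 / (m - 1.0))
        u = inv / inv.sum(axis=0, keepdims=True)                  # membership update
        um = u ** m
        new_centers = (um @ q) / um.sum(axis=1, keepdims=True)    # center update
        converged = np.abs(new_centers - centers).max() < tol
        centers = new_centers
        if converged:
            break
    return u, centers

def select_seed_patterns(u, centers, T=0.1):
    """Entropy-based uncertainty and seed selection for two clusters."""
    # Assumption: the cluster with the larger center is the changed class.
    changed = int(np.argmax(centers.sum(axis=1)))
    unchanged = 1 - changed
    p = np.clip(u, 1e-12, 1.0)
    entropy = -(p * np.log2(p)).sum(axis=0)                       # E_k per object
    certain = entropy < T
    s_c = np.flatnonzero(certain & (u[changed] > u[unchanged]))   # changed seeds
    s_u = np.flatnonzero(certain & (u[changed] < u[unchanged]))   # unchanged seeds
    return entropy, s_c, s_u
```

A smaller T keeps only objects whose memberships are close to 0 or 1, which trades the number of seed patterns for their reliability.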

SCV Model
The proposed SCV model aims at finding an optimal contour that splits the deep difference feature map into non-overlapping regions associated with the changed and unchanged classes. In this paper, the pseudo-training information containing labeled patterns is introduced into the traditional CV model. For the given deep difference feature map $Q$, the proposed energy functional $F_{SCV}$ combines the global energy term $F_{glo}$ derived from the CV model with the supervised term $F_{sup}$ that incorporates the labeled patterns. In this functional, $\phi$ is the level set function and $H(\phi)$ is the Heaviside step function, i.e., $H(z) = 1$ if $z \geq 0$ and $H(z) = 0$ otherwise. $c_1$ and $c_2$ approximate the change intensities inside and outside the contour, respectively. The regularization term based on the mean curvature that is typically used in the CV model is eliminated in the proposed model to reduce the computational complexity and to constrain the curves towards the object boundaries in the feature map.
Keeping $\phi$ fixed and minimizing the energy $F_{SCV}$, we solve for $c_1$ and $c_2$. The energy functional is then minimized with respect to $\phi$ by deriving the associated Euler-Lagrange equation while $c_1$ and $c_2$ remain fixed, which gives the variational formulation for the level set evolution $\partial\phi/\partial t$. Regularized versions of the Heaviside step function $H$ and the Dirac delta function $\delta$, controlled by a small number $\varepsilon$, are used in the evolution. The implementation of the proposed algorithm is presented in Table 1.
Table 1. Implementation of the proposed algorithm.
1: Initialize $\phi$ as a signed distance function, $n = 0$
2: Initial clustering
3: Get the pseudo-training set $\Psi$ through uncertainty analysis
4: Repeat
5: Compute $c_1(\phi^n)$ and $c_2(\phi^n)$
6: Solve the partial differential equation in $\phi^n$
7: Update the level set function $\phi^{n+1} = \phi^n + \Delta t \, \partial\phi^n / \partial t$
8: Until the convergence criterion $|F^n - F^{n-1}| < \xi$ is satisfied
9: Return $\phi > 0$, i.e., the binary result of CD
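The iteration loop of Table 1 can be illustrated with the following sketch, which evolves a level set using only the global CV term without curvature regularization. It deliberately omits the supervised term, because its exact form is not recoverable from this text, so it is not the SCV model itself. The arctangent-based regularized Heaviside function and the corresponding Dirac function are the forms commonly used with the CV model and are assumed here; the function names and the values of `dt`, `eps`, `xi`, and `max_iter` are hypothetical.

```python
import numpy as np

def heaviside(phi, eps=1.0):
    # Assumed regularization: the arctangent form common in CV implementations.
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / eps))

def dirac(phi, eps=1.0):
    # Derivative of the regularized Heaviside function above.
    return (eps / np.pi) / (eps ** 2 + phi ** 2)

def cv_global_evolution(Q, phi0, dt=0.5, eps=1.0, xi=1e-4, max_iter=500):
    """Evolve phi with the global CV force only (no curvature, no supervised term)."""
    phi = phi0.astype(float).copy()
    prev_energy = np.inf
    for _ in range(max_iter):
        H = heaviside(phi, eps)
        # Step 5: region means inside (c1) and outside (c2) the contour.
        c1 = (Q * H).sum() / (H.sum() + 1e-12)
        c2 = (Q * (1.0 - H)).sum() / ((1.0 - H).sum() + 1e-12)
        energy = (((Q - c1) ** 2) * H + ((Q - c2) ** 2) * (1.0 - H)).sum()
        # Step 8: stop when the energy change falls below xi.
        if abs(prev_energy - energy) < xi:
            break
        prev_energy = energy
        # Steps 6-7: gradient descent update of the level set function.
        force = -(Q - c1) ** 2 + (Q - c2) ** 2
        phi = phi + dt * dirac(phi, eps) * force
    # Step 9: phi > 0 is taken as the changed region.
    return phi > 0, c1, c2
```

In the SCV model, the force in steps 6 and 7 would additionally contain the contribution of the labeled patterns, which is what accelerates the evolution towards the boundaries of changed objects.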

Datasets
To verify the advantages of the proposed CD approach, four high-resolution multitemporal remote sensing datasets acquired by different platforms and sensors, namely, QuickBird, GF 1, SPOT 5, and Aerial, were considered in the experiments.
The first data set consists of two images of size 598 × 497 pixels, acquired by the QuickBird satellite covering the Xinzhou district in the city of Wuhan, China, in April 2002 and July 2009, with the same spatial resolution of 2.4 m, as shown in Figure 3a.
The second data set comprises two 2 m high-resolution images acquired by the GF 1 satellite over the Caidian district in Wuhan, China, in April 2016 and August 2018. The images were generated by fusing panchromatic and multispectral images. An area of 800 × 1050 pixels was cropped from the full scenes, as displayed in Figure 3b.
The third data set was acquired by the SPOT 5 satellite covering the Wuqing district in the city of Tianjin, China, in April 2008 and February 2009. The size of the dataset is 450 × 400 pixels with a spatial resolution of 2.5 m, as shown in Figure 3c.
The fourth data set contains a pair of bitemporal aerial orthophotos of Lantau Island, Hong Kong, China. The orthophotos were acquired by a Zeiss RMK TOP Aerial Survey Camera System in December 2005 and November 2008, respectively. The images have a size of 743 × 1107 pixels and a spatial resolution of 0.5 m, as presented in Figure 3d.
Before applying the proposed CD approach, the multitemporal images of the four data sets were preprocessed using ENVI software, including image co-registration and radiometric correction. The ground truth maps were produced by visual interpretation using ArcGIS software.

Evaluation Criteria and Experimental Settings
In the experiments, five unsupervised CD methods are selected as the comparison algorithms to verify the advantages of the proposed CD approach, including the classic PCA-K-Means method [17], multi-scale superpixel and deep neural networks (MSDNN) for CD [61], region-based level set evolution (RLSE) method [26], multi-scale object histogram distance (MOHD) method [57], and the object-based unsupervised CD based on the SVM method, denoted as object-based SVM (OSVM) [25].
To verify the effectiveness of the proposed approach, the CD results were evaluated by the following four widely used indices: 1) false alarm (FA) rate, 2) missed detection (MD) rate, 3) total error (TE) rate, and 4) Kappa coefficient [62][63][64].
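The four indices can be computed from a binary change map and the ground truth as sketched below. The definitions used here are the ones commonly adopted in the CD literature and are an assumption, since they are not spelled out in this text: the FA rate is the fraction of truly unchanged pixels detected as changed, the MD rate is the fraction of truly changed pixels that are missed, and the TE rate is the fraction of all pixels that are misclassified. The function name `cd_accuracy` is hypothetical, and both classes are assumed to be present in the ground truth.

```python
import numpy as np

def cd_accuracy(truth, pred):
    """FA, MD, and TE rates (as fractions) and the Kappa coefficient.

    truth, pred: binary arrays where True (or 1) marks changed pixels.
    """
    truth = np.asarray(truth, dtype=bool).ravel()
    pred = np.asarray(pred, dtype=bool).ravel()
    tp = float(np.sum(truth & pred))      # changed, detected as changed
    tn = float(np.sum(~truth & ~pred))    # unchanged, detected as unchanged
    fp = float(np.sum(~truth & pred))     # false alarms
    fn = float(np.sum(truth & ~pred))     # missed detections
    n = tp + tn + fp + fn
    fa = fp / (fp + tn)                   # relative to all unchanged pixels
    md = fn / (fn + tp)                   # relative to all changed pixels
    te = (fp + fn) / n
    po = (tp + tn) / n                    # observed agreement
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2  # chance agreement
    kappa = (po - pe) / (1.0 - pe)
    return {"FA": fa, "MD": md, "TE": te, "KC": kappa}
```

Multiplying the three rates by 100 gives the percentages reported in the accuracy tables.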
In the experiments, the local window size of the given pixel ω = 3 and the threshold of uncertainty T = 0.1 were set for the proposed approach. The multitemporal deep feature collaborative learning adopted a 3-layer SDAE with structure 27-15-5-2, built by stacking three DAEs. Furthermore, we set h = 3 and s = 3 for PCA-K-Means; α = 1.5 and l = 60 for MSDNN; $c_0$ = 1 and δ = 1 for RLSE; scale = 40, compactness = 0.8, and shape = 0.9 for MOHD; and $T_{max}$ = 0.5 and $T_{min}$ = 0.1 for OSVM, for which the Gaussian radial basis function was used as the SVM kernel. The deep feature learning was implemented in the Python programming language using TensorFlow 1.13.1 (GPU version) on a workstation with an Intel Core i7 CPU and an NVIDIA GeForce GTX 1070. The SCV model was implemented in MATLAB R2016a on the same workstation.

Experimental Analysis
The CD maps obtained from the proposed approach and the comparison algorithms on the four datasets are shown in Figures 4-7, respectively. From a qualitative point of view, the change maps generated with PCA-K-Means display significant noise in both the changed and unchanged regions. Although homogeneous changes are well detected by MSDNN, MOHD, and OSVM, many noise spots still exist in their change maps. RLSE uses a thresholding method and morphology operations to reduce errors, but it produces change maps that lose a large number of details in the changed regions. By contrast, the proposed approach significantly reduces noise spots while retaining detailed changes in the change maps.
Deep difference feature maps and uncertainty analysis results of the four test data sets are given in Figure 8. As can be seen, the deep difference feature maps highlight the change intensities of objects. In addition, the changed and unchanged samples identified as seed patterns can be obtained from the deep difference feature maps through uncertainty analysis.
Figure 9 presents the evolution process of the level set function from the initial contour to the final result in the SCV model compared to the traditional CV model. The initial curves were circles evenly covering the entire deep difference feature map. As shown in Figure 9, the SCV model drives the level curves towards the object boundaries more rapidly than the CV model.
Tables 2-5 report the quantitative error measures obtained by all the CD methods used in this research. In terms of the Kappa coefficient (KC) and TE rates, the proposed approach clearly outperforms the other five methods, which indicates that it achieves the most accurate CD results with respect to the ground truth. For the QuickBird data, although the proposed approach generates more FAs than RLSE and OSVM, it significantly reduces the MD rate and consequently yields the lowest TE. Similar behavior can be found in the results for the GF 1 and SPOT 5 data: RLSE generates the fewest FAs but the highest MD rates, whereas the proposed approach extracts more complete changed areas and therefore produces the lowest TEs. For the aerial data, PCA-K-Means and OSVM achieve lower MD rates but larger FA rates. In comparison, the proposed approach performs significantly better in terms of TEs and FAs. Overall, the proposed approach has demonstrated competitive advantages over the compared methods throughout the experiments.
Table 3. Accuracy comparison among different methods on the GF 1 data set. Columns: Method, FA (%), MD (%), TE (%), KC.
To test the impact of the network structure of the SDAE in our deep feature learning model, different network structures were considered to evaluate their influence on the accuracy of the CD results. Figure 10 illustrates the variations in the TE rates and KC values of the CD results obtained by SDAEs with different structures, in which SDAE-2 is a 2-layer SDAE with structure 27-15-2 built by stacking two DAEs, SDAE-3 is a 3-layer SDAE with structure 27-15-5-2 built by stacking three DAEs, SDAE-4 is a 4-layer SDAE with structure 27-20-15-5-2 built by stacking four DAEs, and SDAE-5 is a 5-layer SDAE with structure 27-20-15-10-5-2 built by stacking five DAEs.
In general, a deeper network can learn more useful abstract features from the input data. The CD results in this paper depend on both the deep difference features and the SCV model. For the SPOT 5 and Aerial data sets, deeper networks such as SDAE-3, SDAE-4, and SDAE-5 generate more accurate results than SDAE-2, and the TE and KC values obtained by SDAE-3, SDAE-4, and SDAE-5 are close to one another. Similarly, for the GF 1 data set, SDAE-4 and SDAE-5 obtain more accurate results than SDAE-2 and SDAE-3. Nevertheless, for the QuickBird data set, SDAE-5 undergoes a decline in accuracy because some detailed changes are missing from the CD result.
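The four SDAE structures compared above can be written down as layer-size lists, as in the following minimal sketch. Only the layer sizes come from the text. The sigmoid activation, the masking noise, the tied decoder weights, the random (untrained) initialization, and all function names are assumptions chosen for illustration, not the training procedure used in the experiments.

```python
import numpy as np

# Layer sizes as stated in the text: 27 inputs, 2 output features.
SDAE_STRUCTURES = {
    "SDAE-2": [27, 15, 2],
    "SDAE-3": [27, 15, 5, 2],
    "SDAE-4": [27, 20, 15, 5, 2],
    "SDAE-5": [27, 20, 15, 10, 5, 2],
}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_encoder(layer_sizes, rng):
    """One (W, b) pair per DAE. Weights are random here, i.e. untrained."""
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def encode(x, params):
    """Pass the input through every stacked encoder layer in turn."""
    h = x
    for W, b in params:
        h = sigmoid(h @ W + b)
    return h

def dae_reconstruction_loss(x, W, b, b_dec, rate, rng):
    """Denoising objective for a single DAE (assumed form): corrupt the
    input with masking noise, encode, decode with tied weights W.T, and
    compare the reconstruction with the clean input."""
    x_noisy = x * (rng.random(x.shape) >= rate)
    hidden = sigmoid(x_noisy @ W + b)
    x_rec = sigmoid(hidden @ W.T + b_dec)
    return float(np.mean((x_rec - x) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patterns = rng.random((4, 27))  # four hypothetical 27-dimensional inputs
    for name, sizes in SDAE_STRUCTURES.items():
        features = encode(patterns, init_encoder(sizes, rng))
        print(name, "DAEs:", len(sizes) - 1, "feature shape:", features.shape)
```

Every structure ends in a two-dimensional feature, so the variants differ only in how many intermediate DAEs compress the 27 inputs, which is the factor varied in Figure 10.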
In the SCV model, the value of T determines the uncertainty of the pseudo-training samples and hence the reliability of the labeled patterns. The variations in the TE rates and KC values with different T values in Equation (9) of the SCV model are displayed in Figure 11. Change maps produced with T values ranging from 0.1 to 0.9 in steps of 0.1 are used to analyze the effect of T on the CD results. In general, as T increases, the TE rates rise and the KC values decline. This indicates that the more reliable pseudo-training samples obtained with smaller T values generate more accurate CD results. Labeled patterns obtained with larger T values may guide the level curves to unexpected object boundaries and decrease the accuracy of the CD maps, especially for the Aerial data.
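The effect of T on the pseudo-training set can be illustrated with a small sweep. Equation (9) is not reproduced in this section, so the rule below is a hypothetical stand-in: each pattern is assumed to have a change probability in [0, 1], its uncertainty is taken as 1 - |2p - 1|, and a pattern is labeled only when that uncertainty is below T. The function name `select_labeled_patterns` and the sample probabilities are likewise invented for illustration.

```python
import numpy as np

def select_labeled_patterns(change_prob, T):
    """Hypothetical selection rule (not Equation (9) itself).

    Returns +1 for patterns labeled as changed, -1 for patterns labeled
    as unchanged, and 0 for patterns left unlabeled.
    """
    p = np.asarray(change_prob, dtype=float)
    # 0 means fully certain, 1 means maximally ambiguous (p = 0.5).
    uncertainty = 1.0 - np.abs(2.0 * p - 1.0)
    confident = uncertainty < T

    labels = np.zeros(p.shape, dtype=int)
    labels[confident & (p > 0.5)] = 1
    labels[confident & (p < 0.5)] = -1
    return labels

if __name__ == "__main__":
    # Hypothetical change probabilities for six patterns.
    change_prob = np.array([0.02, 0.33, 0.48, 0.57, 0.81, 0.97])
    # Same range as in the experiment: 0.1 to 0.9 in steps of 0.1.
    for T in np.arange(1, 10) / 10:
        labels = select_labeled_patterns(change_prob, T)
        print(f"T={T:.1f}  labeled={np.count_nonzero(labels)}  labels={labels.tolist()}")
```

Under this assumed rule, a small T admits only the most certain patterns, while a large T also admits ambiguous ones, which is consistent with the loss of accuracy reported for larger T values.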

Discussion
The experimental results on four remote sensing data sets from different sensors corroborate that the proposed CD approach is superior to the other methods in both qualitative and quantitative analysis. In the multitemporal deep feature collaborative learning, a deeper network can generate more abstract difference features, but detailed changes may be lost in the deep feature map when the network has too many layers. Smaller T values yield more reliable labeled patterns for the SCV model, which guide the level curves to the changed object boundaries.
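For reference, the classical two-phase Chan–Vese energy on which the SCV model builds is usually written as below, where u_0 is the image (here, the deep difference feature map), C is the evolving contour, c_1 and c_2 are the mean values inside and outside C, and \mu, \nu, \lambda_1, \lambda_2 are non-negative weights. This is only the standard unsupervised form, and its notation may differ from that used in the method section; the additional term through which the SCV model incorporates the labeled patterns is not reproduced here.

```latex
E(c_1, c_2, C) = \mu \,\mathrm{Length}(C) + \nu \,\mathrm{Area}(\mathrm{inside}(C))
  + \lambda_1 \int_{\mathrm{inside}(C)} \lvert u_0(x,y) - c_1 \rvert^2 \,dx\,dy
  + \lambda_2 \int_{\mathrm{outside}(C)} \lvert u_0(x,y) - c_2 \rvert^2 \,dx\,dy
```

Minimizing this energy moves the contour so that each region is as homogeneous as possible around its mean; labeled patterns add further guidance about which side of the contour a pattern should fall on.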
The proposed CD approach is mainly based on comparing images acquired at two different times (the bitemporal approach), but it can also be used for CD with more than two images. For example, the image objects can be generated by segmenting the multitemporal images together, and the multitemporal deep feature collaborative learning can take all of the multitemporal images as input. The experimental results have confirmed the robustness of the proposed approach and its ability to handle different land cover change types, such as urban sprawl (QuickBird, GF 1, and SPOT 5 data), vegetation restoration (QuickBird data), and disaster monitoring (Aerial data). This research focuses on automatically and efficiently detecting land cover changes from remote sensing images, and it provides a solution for CD without high-quality samples or prior knowledge. To further improve the reliability of CD results, additional information can be used in specific CD applications, such as analyzing historical and other supplementary data to identify the driving factors of land cover changes, or integrating slope and aspect information in landslide mapping.

Conclusions
This paper has presented a novel approach for CD from multitemporal high-resolution remote sensing images without any prior information. Multitemporal deep feature collaborative learning based on SDAE is developed to obtain the deep feature representations of the multitemporal images. The object-level abstract difference features are then obtained through multitemporal co-segmentation and a feature similarity measure. Finally, an SCV model integrated with the labeled patterns derived from an uncertainty analysis is used to extract the final changed regions.
The experimental results on four data sets acquired by different sensors have corroborated the effectiveness and reliability of the proposed approach for CD. Compared to PCA-K-Means, MSDNN, RLSE, MOHD, and OSVM, the proposed approach performs better through qualitative and quantitative evaluations. The proposed approach can not only reduce the influence of speckle noise, but also retain the detailed changes. Thus, it achieves the most accurate CD results among all the methods in the experiments.
The main advantage of the proposed approach is that the deep features of the original multitemporal images are represented in the same high-level feature space through deep feature collaborative learning, which effectively exploits the abstract difference features and improves the separability between changed and unchanged patterns. Moreover, the pseudo-training set containing the labeled patterns derived from the uncertainty analysis is incorporated into the level set evolution functional to efficiently drive the level curves towards more accurate changed object boundaries.
Future work will consider modifying the algorithm to handle CD when the multitemporal images come from different sensors and to detect multi-class changes.
Author Contributions: X.Z. and W.S. were responsible for the overall design of the study. X.Z. performed the experiments and drafted the manuscript, which was revised by all authors. X.Z., Z.L., and F.P. carried out the data processing. All authors read and approved the final manuscript.