Change Detection for Heterogeneous Remote Sensing Images with Improved Training of Hierarchical Extreme Learning Machine (HELM)

Abstract: To solve the problems of susceptibility to image noise, subjectivity of training sample selection, and inefficiency of state-of-the-art change detection methods with heterogeneous images, this study proposes a post-classification change detection method for heterogeneous images with improved training of hierarchical extreme learning machine (HELM). After smoothing the images to suppress noise, a sample selection method is defined to train the HELM for each image, in which the feature extraction is respectively implemented for heterogeneous images and the parameters need not be fine-tuned. Then, the multi-temporal feature maps extracted from the trained HELM are segmented to obtain classification maps and then compared to generate a change map with changed types. The proposed method is validated experimentally by using one set of synthetic aperture radar (SAR) images obtained from Sentinel-1, one set of optical images acquired from Google Earth, and two sets of heterogeneous SAR and optical images. The results show that compared to state-of-the-art change detection methods, the proposed method can improve the accuracy of change detection by more than 8% in terms of the kappa coefficient and greatly reduce run time regardless of the type of images used. Such enhancement reflects the robustness and superiority of the proposed method.


Introduction
Change detection intends to identify the changes between a given image pair of the same scene acquired at different times [1]. With the advancement of remote sensing technology, remote sensing data have developed into multi-temporal, multi-channel, and multi-source data that serve as the main source for detecting changes on the Earth's surface. Change detection is widely used with various types of remote sensing data (e.g., synthetic aperture radar (SAR), optical, light detection and ranging (LiDAR), and geographic information system (GIS) data) in many fields, such as land cover monitoring [2], forest cover monitoring [3], and disaster assessment [4]. According to the characteristics of multi-temporal images, change detection methods can be divided into change detection with homogeneous or heterogeneous images.
In change detection with homogeneous images, the multi-temporal images used for change detection are acquired from the same remote sensing sensor. Numerous methods have been proposed for change detection with homogeneous images. According to processing units, these change detection methods can be divided into pixel-based and object-based . . . transformed into the same feature space for comparison by training them with unchanged pixels. To enlarge the difference between changed and unchanged regions, the work in [35] proposed an approximately symmetrical deep neural network (ASDNN), which transforms heterogeneous images into common feature spaces and then highlights the changed regions by training them with changed and unchanged labels. To perform joint feature extraction on transformed SAR and optical images for the selection of training labels, the work in [36] proposed a logarithmic transformation feature learning (LTFL) framework and obtained the change map by using the trained classifier. Niu et al. used a conditional generative adversarial network (CGAN) [37] to transform heterogeneous images into the same feature space for comparison. All these methods can adequately learn from training samples, but they are time consuming.
To address image noise susceptibility and manual sample selection, and to take advantage of machine learning while improving algorithm efficiency, this work proposes a change detection method for heterogeneous images that is based on hierarchical extreme learning machine classification (HELMC). Despite the large differences in imaging characteristics between heterogeneous images, the same ground objects can be represented by the same labels, and change information can be retrieved by comparing the labels.
In this method, SAR and optical images are first smoothed separately to reduce the effects of image noise. Then, a measurement between pixels and cluster centers is proposed for the automatic selection of training samples among the multi-temporal images. In training the hierarchical extreme learning machine (HELM), the hidden layers can use parameters that are given randomly without adjustment [38] to obtain meaningful feature representations and high learning efficiency. After the multi-temporal feature maps are segmented to obtain the classification maps, a change map is generated by comparing the multi-temporal classification maps. The main contributions of this work are summarized as follows: (1) This paper proposes a new change detection framework that is applicable to both homogeneous and heterogeneous images and that not only obtains the changed regions but also distinguishes the changed types. (2) This paper proposes a separable training sample selection method to train the network, which accurately selects training samples and does not need additional training datasets. (3) HELM, which requires less parameter adjustment in network training, is introduced for multi-temporal feature extraction to improve the accuracy and efficiency of change detection with heterogeneous images.
The rest of the paper is organized as follows: Section 2 introduces the related theories about HELM and describes HELMC in detail. Section 3 displays the experimental results for heterogeneous and homogeneous images. Section 4 discusses the paper. Section 5 provides the conclusion and future work directions.

HELM
HELM is a new hierarchical learning framework based on extreme learning machine (ELM) [39]. HELM has a deeper architecture than ELM and is thus able to achieve more meaningful features. As shown in Figure 1, HELM has two parts: (1) multilayer forward encoding and (2) original ELM.

Multilayer Forward Encoding
In the first part, HELM transforms data into an ELM random feature space to exploit hidden layer information. The output of each hidden layer can be expressed as

$$H_i = g(H_{i-1} \cdot \beta) \qquad (1)$$

where $\beta$ is the hidden layer weight, $H_i$ and $H_{i-1}$ respectively represent the output matrices of the $i$th and $(i-1)$th ($i \in [1, K]$) hidden layers, and $g(\cdot)$ is the activation function of the hidden layer. Through the multilayer forward encoding among hidden layers in the HELM framework, the parameters of the following hidden layer are determined as soon as the features of the previous hidden layer are extracted. Therefore, HELM has a higher computational efficiency than some traditional DL frameworks [40] that require parameter adjustment.
The hidden layer weights $\beta$ can be optimized by an ELM sparse autoencoder to obtain good generalization and a fast learning speed:

$$O_\beta = \arg\min_{\beta} \left\{ \| H\beta - X \|^2 + \| \beta \|_{\ell_1} \right\} \qquad (2)$$

where $X$ is the training set and $\ell_1$ is a constraint term.

Original ELM
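In the second part, the final output $F$ is obtained on the basis of the original ELM [39], that is,

$$F = H_K \beta_{out} \qquad (3)$$

where $H_K$ is the output of the last hidden layer and the output weight $\beta_{out}$ is calculated by

$$\beta_{out} = H_K^T \left( \frac{I}{\lambda} + H_K H_K^T \right)^{-1} T \qquad (4)$$

where $T$ is the target matrix and $\lambda$ is a positive value that improves the stability of ELM.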
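As a minimal sketch of the two parts described above, the following pure-Python example stacks randomly weighted hidden layers (forward encoding with a tanh activation) and then solves for the output weights with the regularized least-squares form $\beta_{out} = H^T (I/\lambda + H H^T)^{-1} T$. It assumes plain random hidden weights in place of the ELM sparse autoencoder, and the helper names (`encode`, `output_weights`, `solve`), the toy data, and the layer sizes are illustrative choices, not taken from the paper.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def encode(h, hidden_weights):
    """Forward encoding: H_i = g(H_{i-1} * beta_i), with g = tanh."""
    for beta in hidden_weights:
        h = [[math.tanh(v) for v in row] for row in matmul(h, beta)]
    return h


def output_weights(h, t, lam):
    """Regularized ELM readout: beta_out = H^T (I/lam + H H^T)^{-1} T."""
    ht = transpose(h)
    gram = matmul(h, ht)
    for i in range(len(gram)):
        gram[i][i] += 1.0 / lam
    return matmul(ht, solve(gram, t))


# Toy data (hypothetical): scalar samples in [0, 1] with targets t = x^2.
rng = random.Random(0)
x = [[i / 5.0] for i in range(6)]
t = [[row[0] ** 2] for row in x]
sizes = [1, 8, 8]  # input width followed by two hidden layers (illustrative)
hidden = [[[rng.uniform(-1.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]
          for n_in, n_out in zip(sizes, sizes[1:])]

h_last = encode(x, hidden)
beta_out = output_weights(h_last, t, lam=1e8)
prediction = matmul(h_last, beta_out)  # final output F = H * beta_out
```

Because the hidden weights are never revisited after they are drawn, the only fitted quantity in this sketch is `beta_out`, which is obtained from a single linear solve.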

Methodology
Consider two co-registered heterogeneous images of the same scene, $I_1$ and $I_2$, which are acquired from SAR and optical sensors at different times $t_1$ and $t_2$, respectively. $H$ and $W$ are the height and width of each image, respectively. Figure 2 shows the proposed change detection framework for images $I_1$ and $I_2$ with HELMC. After the images are smoothed, HELM is trained separately for $I_1$ and $I_2$ with training samples that are selected automatically by the proposed method. Feature maps for each image are then generated. The feature maps are then segmented to obtain the multi-temporal classification maps, which are subsequently compared to identify changed regions and their categories.
To reduce the effect of image noise on change detection, this work applies the mean shift (MS) method [41] to smooth the images according to their distribution without any prior information.
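The smoothing idea can be illustrated with a deliberately simplified, one-dimensional mean shift filter. This is only a sketch under stated assumptions: it works on a single intensity profile rather than a full image, it uses a flat kernel with a fixed spatial window, and the parameter names and values (`spatial_radius`, `range_bandwidth`) are hypothetical rather than the settings of the MS method in [41].

```python
def mean_shift_smooth(signal, spatial_radius=2, range_bandwidth=0.1, max_iter=20):
    """Smooth a 1-D intensity profile with a simplified mean shift filter."""
    smoothed = []
    for i, value in enumerate(signal):
        # Fixed spatial neighbourhood around position i.
        window = signal[max(0, i - spatial_radius): i + spatial_radius + 1]
        mode = value
        for _ in range(max_iter):
            # Keep only neighbours whose intensity is close to the current mode.
            near = [v for v in window if abs(v - mode) <= range_bandwidth]
            if not near:
                break
            shifted = sum(near) / len(near)
            if abs(shifted - mode) < 1e-9:
                break
            mode = shifted
        smoothed.append(mode)
    return smoothed


# Toy profile with two intensity levels and small fluctuations.
profile = [0.10, 0.12, 0.11, 0.90, 0.92, 0.91]
smoothed = mean_shift_smooth(profile)
```

Each value moves toward the local mode of similar intensities, so the fluctuations inside each level are flattened while the step between the two levels is preserved.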

Training Sample Selection
The proposed selection of training samples aims to select pixels whose values are close to the cluster centers in a given smooth image. Consider an image with values $I \in [I_{min}, I_{max}]$, where $I_{min}$ and $I_{max}$ respectively represent the minimum and maximum values of the image. The cluster centers of the smooth image are obtained by FCM [8], with the center values being $c_i$ ($i = 1, 2, \ldots, C$, and $c_i > c_{i-1}$), where $C$ represents the total number of image categories.
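To make the clustering step concrete, the sketch below runs the standard fuzzy c-means updates on scalar pixel values and returns the sorted cluster centers. The function name `fcm_1d`, the evenly spaced initialization, and the toy values are assumptions made for illustration; the paper does not specify these details.

```python
def fcm_1d(values, n_clusters, m=2.0, max_iter=100, tol=1e-6):
    """Fuzzy c-means on scalar samples; returns the sorted cluster centers."""
    lo, hi = min(values), max(values)
    # Deterministic start: centers spread evenly over the value range.
    centers = [lo + (hi - lo) * (i + 0.5) / n_clusters for i in range(n_clusters)]
    exponent = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if min(dists) == 0.0:
                # The sample sits exactly on a center: give it full membership.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                u = [h / sum(hits) for h in hits]
            else:
                # u_ij is proportional to d_ij^(-2/(m-1)), normalized over clusters.
                inv = [d ** -exponent for d in dists]
                u = [v / sum(inv) for v in inv]
            memberships.append(u)
        new_centers = []
        for i in range(n_clusters):
            # c_i = sum_j u_ij^m x_j / sum_j u_ij^m
            weights = [u[i] ** m for u in memberships]
            new_centers.append(sum(w * x for w, x in zip(weights, values)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return sorted(centers)


# Toy pixel values forming three intensity groups (hypothetical data).
values = [0.10, 0.12, 0.11, 0.50, 0.52, 0.49, 0.90, 0.88, 0.91]
centers = fcm_1d(values, n_clusters=3, m=2.5)
```

Sorting the returned centers matches the ordering $c_i > c_{i-1}$ used in the text.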
The objective function and constraints of the FCM are defined as

$$J = \sum_{i=1}^{C} \sum_{j=1}^{n} u_{ij}^{m} \, \| x_j - c_i \|^2, \qquad \text{s.t.} \; \sum_{i=1}^{C} u_{ij} = 1, \; u_{ij} \in [0, 1]$$

where $m$ is the fuzzy factor of affiliation $u_{ij}$, $x_j$ is the sample data, and $n$ is the total number of samples in the dataset. Each cluster center corresponds to a value interval of samples, $sample_i$, with $\min(sample_i) < c_i < \max(sample_i)$ ($i = 1, 2, \ldots, C$), where $\min(\cdot)$ and $\max(\cdot)$ are the minimum and maximum operators, respectively. In this way, we have two cases: (1) the sample intervals do not overlap, and (2) the sample intervals overlap. A schematic example is displayed in Figure 3 to illustrate the definition of the training sample selection. The case of the sample intervals not overlapping is shown in Figure 3a, which can be described as . . .
where $th_i$ ($i = 1, 2, \ldots, C-1$) represents the boundaries of the sample intervals, which are defined as . . . As shown in Figure 3b, if the sample intervals overlap, then the upper boundary of $sample_{i-1}$ might be higher than the lower boundary of $sample_i$. In this situation, the training samples are not pure enough, which can easily reduce the image classification accuracy. In this condition, the values of the boundaries $th_i$ are defined as follows to improve the separability of the training samples:

Training of HELM
By attaching the cluster centers (target matrix T) to the training samples, the training sets X are constructed to train HELM.
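The construction of the training set can be sketched as follows. Note the assumptions: the interval boundaries used here are simply the midpoints between adjacent sorted cluster centers, and only pixels within a fraction `keep` of the way from a center to its boundaries are retained. Both rules are hypothetical placeholders chosen to keep the intervals disjoint; they are not the boundary definitions of the paper.

```python
def select_training_samples(pixels, centers, keep=0.5):
    """Return (samples, targets): selected pixel values and their attached centers.

    ASSUMPTION: boundaries are midpoints between adjacent sorted centers, and
    only values within `keep` of the center-to-boundary distance are kept.
    """
    centers = sorted(centers)
    lo, hi = min(pixels), max(pixels)
    bounds = [lo] + [(a + b) / 2.0 for a, b in zip(centers, centers[1:])] + [hi]
    samples, targets = [], []
    for value in pixels:
        for i, c in enumerate(centers):
            lower = c - keep * (c - bounds[i])
            upper = c + keep * (bounds[i + 1] - c)
            if lower <= value <= upper:
                samples.append(value)   # training input
                targets.append(c)       # cluster center attached as the target
                break
    return samples, targets


# Toy example (hypothetical values): three centers and seven pixels.
pixels = [0.10, 0.12, 0.30, 0.50, 0.52, 0.70, 0.90]
centers = [0.1, 0.5, 0.9]
samples, targets = select_training_samples(pixels, centers)
```

Pixels that fall between the retained intervals (0.30 and 0.70 in this toy case) are left out, which is one simple way to keep the samples of different categories from overlapping.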
In multilayer forward encoding, the original data are converted to an ELM random space to extract the hidden layer features of the training samples. According to Equation (1), the output of each hidden layer is obtained. To obtain other sparse features and improve learning accuracy, this work uses an ELM sparse autoencoder in the optimization of the hidden layer weights according to Equation (2). In the original ELM, the universal approximation capability of ELM is used to obtain the output weight $\beta_{out}$ according to Equation (4).
Remote Sens. 2021, 13, 4918
Each hidden layer of HELM is a separate feature extractor. Once the parameters $O_\beta$ and $\beta_{out}$ in HELM are calculated, the training of HELM is completed.

Feature Map Generation
As the parameters $O_\beta$ and $\beta_{out}$ in HELM are determined in the training process, the multi-temporal smooth images are separately used as inputs to the trained HELM to extract the feature maps. This process is similar to the training of HELM, and the hidden layer parameters used are obtained from the training of HELM. According to Equation (3), the final output is the feature map. The value of each category in the feature map is a value near the cluster center. Hence, the homogeneity of the same image category in the feature map is greater than that in the smooth image.

Change Detection Map Generation
The multi-temporal feature maps cannot be directly compared because their features are extracted from each image and they are unequal despite originating from the same category. Therefore, the FCM algorithm [8] is used to perform cluster analysis on the feature maps and mark the multi-temporal clusters with the same sets of category labels to generate comparable classification maps.
Subsequently, the change detection between the classification maps of $I_1$ and $I_2$ is carried out by comparing $Cmap_{i,j}(I_1)$ and $Cmap_{i,j}(I_2)$, which represent the category labels of pixel $(i, j)$ in the classification maps of $I_1$ and $I_2$, respectively.
In the changed regions, if the category labels of pixel $(i, j)$ in $Cmap_{i,j}(I_1)$ and $Cmap_{i,j}(I_2)$ are respectively "a" and "b", then the changed type of pixel $(i, j)$ is "from a to b."
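The comparison step can be sketched in a few lines. The example assumes that a pixel is marked as changed whenever its two category labels differ; the function name `change_map` and the labels "water" and "land" are hypothetical.

```python
def change_map(cmap1, cmap2):
    """Compare two label maps; return a binary change mask and the changed types."""
    mask, types = [], []
    for row1, row2 in zip(cmap1, cmap2):
        # 1 marks a changed pixel, 0 an unchanged one.
        mask.append([int(a != b) for a, b in zip(row1, row2)])
        # A changed pixel carries the pair (a, b), read as "from a to b".
        types.append([None if a == b else (a, b) for a, b in zip(row1, row2)])
    return mask, types


# Toy 2 x 2 classification maps with hypothetical category labels.
cmap_t1 = [["water", "land"], ["land", "land"]]
cmap_t2 = [["water", "water"], ["land", "water"]]
mask, types = change_map(cmap_t1, cmap_t2)
```

Because the comparison operates on category labels rather than raw pixel values, it does not depend on which sensor produced each image.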

Dataset Description
To validate the proposed change detection method not only for heterogeneous images but also for homogeneous images, we used the following four datasets in the experiments: homogeneous SAR images, homogeneous optical images, and two sets of heterogeneous images acquired by SAR and optical sensors. Figure 4 shows the homogeneous SAR images with a spatial resolution of 5 m and size of 1500 × 898 pixels. These images were acquired by the SAR sensor of Sentinel-1 satellite over a part of Yangtze River in China in November 2017 and May 2018. They reflected the changes in Yangtze River during this period.

Evaluation Criteria
For the quantitative assessment of the proposed change detection method, several evaluation indicators are used: overall error (OE), percentage of correct change detection (PCC), kappa coefficient (KC) [42], $F_1$-measure ($F_1$) [43], and run time. OE and PCC are defined as

$$OE = FP + FN, \qquad PCC = \frac{TP + TN}{N}$$

where $N$ denotes the total number of pixels in the image. FP and FN respectively denote the numbers of the changed and unchanged pixels incorrectly detected. TP and TN respectively represent the numbers of changed and unchanged pixels detected correctly; they can be respectively denoted by NC and FN as follows: . . . where NC and NU are the numbers of changed and unchanged pixels in the change detection results. The kappa coefficient (KC) is defined as follows:

$$KC = \frac{PCC - PRE}{1 - PRE}$$

where

$$PRE = \frac{(TP + FP)(TP + FN) + (FN + TN)(FP + TN)}{N^2}$$

The $F_1$-measure ($F_1$) is defined as

$$F_1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

where

$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}$$

The values of KC and $F_1$ are between 0 and 1, and larger values of KC and $F_1$ indicate higher change detection accuracy. Compared with the PCC and OE, KC and $F_1$ consider more details because they also consider the detection accuracy for unchanged areas.
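For reference, the indicators can be computed from the four confusion counts as in the sketch below. It assumes the standard confusion-matrix definitions of overall error, percentage of correct classification, Cohen's kappa, and the $F_1$-measure; the function name `change_metrics` and the example counts are hypothetical.

```python
def change_metrics(tp, tn, fp, fn):
    """Compute OE, PCC, KC, and F1 from confusion counts (standard definitions)."""
    n = tp + tn + fp + fn
    oe = fp + fn
    pcc = (tp + tn) / n
    # Expected agreement by chance, as used in Cohen's kappa.
    pre = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kc = (pcc - pre) / (1.0 - pre)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2.0 * precision * recall / (precision + recall)
    return {"OE": oe, "PCC": pcc, "KC": kc, "F1": f1}


# Hypothetical counts for a 100-pixel change map.
metrics = change_metrics(tp=40, tn=50, fp=5, fn=5)
```

With these counts, PCC is 0.9 while KC is lower (about 0.80), which shows how KC discounts the agreement that would be expected by chance.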

Comparison Methods
To evaluate the efficiency of the proposed HELMC for homogeneous/heterogeneous images, we used the following methods for comparison:

Parameter Setting
λ is set to $10^8$ according to the conclusion on HELM in [38]. Our experiments also verified that the results are better when λ is larger than $10^5$, and that values of λ within this range have little effect on the accuracy of the change detection results. The activation function $g(\cdot)$ of the hidden layer is set as tansig, which is defined as follows:

$$y = \frac{2}{1 + e^{-2x}} - 1$$

where $x$ and $y$ are the input and output data, respectively. The fuzzy factor $m$ of FCM and the architecture of the hidden layers of HELMC (net) need to be discussed. In our experiments, KC and run time (in seconds) were used as the validation metrics for the four datasets.
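As a small illustration of the activation function, the sketch below implements the tansig formula as documented for MATLAB, $2/(1 + e^{-2x}) - 1$, and compares it with the hyperbolic tangent, to which it is mathematically equal. The function name mirrors the MATLAB name, and the sample inputs are arbitrary.

```python
import math


def tansig(x):
    """MATLAB-style tansig: 2 / (1 + exp(-2x)) - 1, mathematically equal to tanh(x)."""
    # Note: math.exp overflows for very negative x (below about -350);
    # math.tanh is the safer choice outside of illustrations like this one.
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


# Compare against math.tanh on a few arbitrary inputs.
differences = [abs(tansig(v) - math.tanh(v)) for v in (-2.0, -0.5, 0.0, 0.5, 2.0)]
```

The output is bounded in (-1, 1), so hidden layer activations stay on a fixed scale regardless of the magnitude of the random weights.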
(1) Effect of fuzzy factor m: The fuzzy factor m is an important parameter of FCM. Therefore, the effect of m on the performance of HELMC needs to be evaluated. The work in [44] considered m = 2 as the most appropriate; thus, we evaluated the performance of HELMC for values of m from 1.5 to 3.0 in steps of 0.1. To ensure the accuracy and efficiency of HELMC, we fixed the hidden layer net to [30-75-100-200]. Figure 8 shows the influence of m on the accuracy of change detection. The line graphs of different colors indicate the results of different datasets. The KCs of the four datasets are stable within a range as m varies, thereby indicating that HELMC is not sensitive to m. To avoid the selection of different m for different datasets, we set m to 2.5.
(2) Effect of hidden layer net: Different network depths affect the richness of the features extracted by HELMC and thus affect the accuracy of HELMC. As network depth increases, the run time of HELMC increases and consequently affects the efficiency of the algorithm. Therefore, the effect of net on the accuracy and efficiency of HELMC should be analyzed. We set six different nets: net1 = [30], net2 = . . . As shown in Figure 9, the KCs of the four datasets are relatively low in the case of a single hidden layer. When the number of hidden layers increases, the KCs of the four datasets increase and remain within a stable range. This result is due to the fact that the feature extraction ability of the multilayer network becomes stronger than that of a single-layer network. However, a deeper network does not always equate to high accuracy because overfitting may occur as network depth increases; this condition then leads to a decrease in network accuracy (e.g., dataset 1). Figure 8 also shows that the run time of the network increases sharply as it deepens. Given the accuracy and efficiency of HELMC, setting net to net4 in this work is reasonable.

Running Environment
All algorithms were written in MATLAB and tested in the following running environment: AMD Ryzen 7 3800X 8-Core CPU at 3.89 GHz, 64 GB RAM, Windows 10 (64 bit), MATLAB 2020a.

Results on Homogeneous Images
Results on Yangtze River Dataset 1
HELMC was verified with Yangtze River dataset 1 (Figure 4), and the results are shown in Figure 10. The image noises and various distributions of shadows resulting from the different imaging conditions between the multi-temporal images presented difficulties. As determined from the visual comparison among the methods for the Yangtze River dataset 1 (Figure 10), the results of FCM, PCA-Kmeans, FLICM, PCA-Net, and CWNN showed large amounts of noise. Although NPSG showed less noise than the other methods, the contours of the changed regions were not clear enough. INLPG showed good results with a clear representation of the changed regions, but it did not detect some changed regions. Overall, the result of HELMC was the least noisy and was closest to the reference image. This result was due to the separable sample selection method of HELMC that suppresses image noise and clearly classifies images.
To further compare HELMC with other methods, we present in Table 1 their accuracies and run times. In this table, OE, PCC, and KC reflect the overall accuracies of the post-classification change detection methods; FP and FN respectively denote the numbers of false and missed alarms. HELMC had the lowest OE and highest PCC, KC, and F1 for the Yangtze River dataset 1. Although the FN of FCM was less than that of HELMC, its FP was extremely high (Figure 10a). Moreover, FCM and PCA-Kmeans had a shorter run time than HELMC, but their accuracies were much lower. The accuracy of INLPG was closest to that of HELMC, but its run time was 47.52 times that of HELMC.
Figure 11 shows the change detection results of the different methods for the Wanghong dataset. As a result of the different imaging times and conditions, the spectral characteristics of the multi-temporal images varied (e.g., the optical image acquired in April 2014 was blurred) and thus made the change detection difficult. Visually, the results presented many missed and false alarms for the methods. In the area of the wheat field with one obvious change in the lower right corner of the image, only HELMC detected the correct result (Figure 11g). Moreover, HELMC showed the best detection result and was closest to the reference image. This outcome was attributed to the accurate classification mechanism of HELMC, which accurately classifies images regardless of differences in imaging conditions.
Table 2 shows the quantitative comparison among the methods for the Wanghong dataset. The PCC and KC values were low because the existing methods produced many false and missed alarms. By contrast, the PCC, KC, and F1 values of HELMC were the highest. Although the run times of CVA, FCM, and PCA-Kmeans were short, their accuracies were much lower than that of HELMC.
The proposed method was further verified using the Yangtze River dataset 2 (Figure 12). The difficulty of this dataset stems from the differences in the shadow distribution and the imaging characteristics of the heterogeneous images. Visually, the results of HPT and NPSG showed isolated noise. Those of LTFL, CGAN, and INLPG were less noisy but had many false and missed alarms. Given the separable sample selection and learning ability of HELMC for the internal features of the images, the influence of these differences on the change maps was reduced. By visual interpretation, in the change map of HELMC (Figure 12h), the blue (W3) and red (W4) regions might indicate the changes from river to land and from land to river, respectively.
Table 3 shows the comparison of the accuracy and run times of the methods. Although INLPG had the lowest FP, its FN was high. HELMC had the lowest OE and the highest PCC, KC, and F1. Moreover, the run time of HELMC of 7.97 s was much lower than those of the other methods.
Table 4 shows the quantitative comparison of the methods. Although CGAN had the lowest FP, its FN was high. HELMC yielded the lowest OE and the highest PCC, KC, and F1. The run time of LTFL, which had the closest accuracy to HELMC, was 229 times that of HELMC.
Considering the results of the visual comparison and quantitative assessment, we deduced that the proposed HELM method could effectively detect changes in homogeneous and heterogeneous images acquired by SAR and optical sensors. In addition, HELMC is able to improve efficiency while maintaining accuracy.

Discussion
Currently, change detection tasks require higher detection accuracy and speed and need to be less data-dependent (i.e., less dependent on homogeneous images). However, heterogeneous remote sensing images exhibit different imaging characteristics due to different imaging mechanisms, and thus change information cannot be obtained by directly comparing heterogeneous images.
We consider that the images can be classified so that the same ground objects have the same category labels, thus circumventing the effect of different imaging characteristics on change detection. In this way, both the changed region and the changed category can be obtained by directly comparing the category labels. However, most traditional image classification methods can only be used for one kind of remote sensing image (e.g., optical or SAR images) and require samples to be selected manually to train the classifier. We consider the spectral characteristics of the same image category to be similar. Therefore, we propose a separable sample selection method that obtains the cluster center of each image category by FCM and automatically selects the image pixels close to the cluster center as the training samples of the corresponding image category. Moreover, the training samples of different image categories do not overlap, which ensures the purity of the training samples corresponding to each image category and helps to obtain more accurate classification results.
Unlike other neural networks, HELM is a feed-forward neural network in which the weights of the current layer are determined once the feature extraction of the previous layer is completed without fine-tuning. Therefore, the introduction of HELM as a classifier improves the efficiency of the proposed method, which is of great importance when applied to change detection tasks that require real-time/quasi-real-time performance.

Conclusions
A change detection method with heterogeneous images is proposed on the basis of HELMC. After the multi-temporal images are smoothed to suppress image noise, appropriate samples are automatically selected through a proposed rule to train the HELM network. After HELM training, the multi-temporal feature maps are separately extracted without parameter adjustment and classified by the FCM algorithm. The changed regions and changed types are obtained by comparing the classification maps.
In the experiments, the HELMC method is compared with existing ones by using homogeneous and heterogeneous images acquired by optical and SAR sensors. In the comparison of the homogeneous images, the result of HELMC shows the least noise and the fewest false and missed alarms in the unchanged areas. It also correctly detects the reduction of wheat fields. Meanwhile, HELMC has the lowest OE and the highest PCC and KC for the homogeneous SAR and optical datasets. As for the heterogeneous images acquired by optical and SAR sensors, the HELMC method shows the fewest false alarms because of the separability of its sample selection and its learning ability. The run time of HELMC is also considerably short, thus making the algorithm highly efficient in practical applications. The theoretical explanation and experimental validation show that HELMC presents robustness and superiority in change detection with homogeneous and heterogeneous images acquired by optical and SAR sensors.
In future research, we plan to apply change detection to more types of data, such as LiDAR images, infrared images, and GIS maps. We will also further strengthen the change detection research of heterogeneous images in complex scenarios.