Abstract
Hyperspectral remote sensing images (HSIs) contain rich spectral information that can be very beneficial for change detection (CD). Because many mixed pixels exist, pixel-wise approaches can lead to considerable errors in the resulting CD map. Spectral unmixing (SU) is a potential solution to this problem, as it decomposes mixed pixels into a set of land cover fractions. The CD map is then created by comparing the abundance images. However, abundance images created through SU alone cannot effectively provide detailed change information. Moreover, the traditional sub-pixel CD framework cannot sufficiently extract the change features, which leads to a poor CD result. To address these problems, this paper presents an integrated CD method for HSIs based on multi-endmember spectral unmixing, joint matrix and CNN (MSUJMC). The method has three main steps. First, considering the endmember spectral variability, more reliable endmember abundance information is obtained by multi-endmember spectral unmixing (MSU). Second, the original image features are fused with the abundance images using a joint matrix (JM) algorithm to provide more temporal and spatial information on land cover change. Third, to efficiently extract the change features and better handle the fused multi-source information, a convolutional neural network (CNN) is introduced to achieve a high-accuracy CD result. The proposed method is verified on simulated and real multitemporal HSI datasets that contain multiple changes. Experimental results verify the effectiveness of the proposed approach.
1. Introduction
Change detection (CD) [,] using remote sensing images is essential for protecting the ecological environment, managing natural resources, and studying social development [,,,]. Traditional CD techniques are mainly applied to multispectral images (MSIs) with the aim of detecting land cover changes in multi-temporal remote sensing images [,]. However, due to the limited spectral resolution of MSIs, only significant changes can be detected. In contrast, hyperspectral images (HSIs) carry more abundant spectral information about ground objects, which can reflect the subtle spectral properties of the measured objects in detail []. Many researchers have studied CD in HSIs. For example, sparse unmixing with dictionary pruning for HSI-CD was proposed to reduce the computation time of the CD process and improve the CD performance by using the high spectral resolution of HSIs []. A hierarchical CD method that considers both change magnitude and spectral change information was proposed, aiming to identify the change classes with discriminable spectral behaviour and thereby improve the accuracy of CD results []. A subspace-based HSI-CD method was proposed, which makes full use of the rich spectral information in the HSI and obtains more reliable CD results by measuring the spectral changes [].
Even so, mixed pixels occur widely in HSIs due to the complexity of the ground cover and the limited sensor resolution, which causes difficulties for CD []. A mixed pixel does not contain a single element but rather a combination of several elements [,]. If these mixed pixels are used in CD at the pixel level, some significant information will be lost []. To address the effect of mixed pixels, many researchers have attempted to explore efficient and robust CD methods. Spectral unmixing (SU) is a representative method that solves the problem of mixed pixels to a certain extent []. The whole procedure consists of the following steps. First, the endmember spectra of each feature type are extracted from the HSI. Second, the abundance of each feature type is estimated. Finally, the CD map is created by comparing the corresponding abundance results. The CD method at the sub-pixel level has been shown to be superior to the traditional CD method at the pixel level [,,]. During this process, obtaining abundance results with high accuracy is very important. As the imaging environment of each region of an image is likely to be different, the same ground element may have different endmember spectra, sometimes significantly so. Illumination variations or shadows degrade the performance of spectral unmixing due to the spectral variability within each endmember class. If this issue is not addressed, the validity of the endmember abundance information will be adversely affected, leading to a poor CD result []. Many existing unmixing models that include spectral variability or nonlinearity aim to address these problems. For example, an unsupervised multitemporal SU method that considers endmember spectral variability was proposed for detecting multiple changes in HSIs []. Another CD approach is based on SU of stacked multitemporal remote sensing images with endmember variability []. A spectral mixing model called the augmented linear mixing model (ALMM) was proposed to solve the spectral variability problem by applying a data-driven learning strategy to inverse problems of HSI unmixing []. These methods have achieved better results in handling endmember spectral variability.
However, previous SU methods were mainly focused on obtaining the abundance images, after which traditional comparison methods were used to obtain the CD results. In such a framework, CD methods at the "sub-pixel level" are often unable to provide detailed change information effectively because they rely only on the abundance information. The original image feature information at the pixel level, which is a significant complement, is not utilized. This means that the original image feature information should be sufficiently considered and extracted for the HSI-CD task. Thus, to improve the accuracy of the HSI-CD results, the original image feature information is fused with the endmember abundance information to fully exploit the information contained in the HSI. Subsequently, a suitable method is adopted to efficiently extract change features from the fused information. In the early stages, methods such as the k-means clustering algorithm (k-means) [], change vector analysis (CVA) [], random forest (RF) [], support vector machine (SVM) [], Markov random field (MRF) [], decision trees (DT) [], neural network (NN) [], and logistic regression (LR) [] were widely used. However, these methods only extract shallow features based on the spectral information of the HSI, using individual pixels and all their bands as input. As a result, these linear and nonlinear classifiers do not extract subtle features of HSIs well, which limits their application. In recent years, deep learning has become increasingly popular in the field of remote sensing, as it is capable of dealing with certain sophisticated and abstract problems. Deep learning uses training samples to train a neural network, which allows the network to recognize subtle and abstract features and eventually extract them efficiently [,,].
Based on the above discussion, an integrated CD method based on multi-endmember spectral unmixing, joint matrix and CNN (MSUJMC) is proposed in this paper. It fuses the original image feature information with the sub-pixel level endmember abundance information. In addition, a CNN is used to extract multiple types of change features for detecting multiple changes in the HSI. The MSUJMC approach consists of the following steps: (1) considering the variability of the endmember spectra, MSU is used to generate the abundance maps for the whole image; (2) to make full use of the valid information contained in the HSI data, the JM algorithm is used to fuse the multi-source information; (3) the CNN is used to extract complex fine-grained change features and improve the CD performance of the method. The method performed well on both simulated and real HSI datasets, and the experimental results validate its effectiveness in detecting multiple changes in HSIs. The remainder of this paper is structured as follows. Section 2 reviews the related research on remote sensing image CD. Section 3 describes the proposed MSUJMC method in detail. Section 4 presents the datasets, experimental results, and analysis. Finally, Section 5 concludes the paper.
2. Related Works
As mentioned in Section 1, CD is a common application in the field of remote sensing. The aim is to identify the pixels in multitemporal remote sensing images where the ground cover type has changed. The relevant CD methods are summarized below.
2.1. Spectral Unmixing
With the increasing availability of HSIs, the problem of mixed pixels in HSIs is becoming increasingly apparent, limiting the accuracy of HSI-CD results. To handle mixed pixels in HSIs, SU has been applied to HSI-CD and achieved good results. For example, a framework based on multi-level SU was proposed for HSI-CD tasks []. An unsupervised method based on SU and a new formulation for CD was proposed to detect binary changes and multiple changes []. In reference [], HSI-CD by SU is investigated, and the advantages that can be gained by using such an approach are systematically presented. A sub-pixel CD algorithm using variability in endmembers was also proposed; through a simple but effective model that takes into account the real change in endmember combination, the performance of SU is enhanced and the accuracy of the CD results is improved [].
2.2. Machine Learning
Machine learning was widely used in the field of HSI-CD when deep learning was not widespread. These methods use a number of pixels and all their bands as input to extract features from the spectral information of the HSI and achieve good results. For example, through a formal definition and theoretical study of the CVA technique, a suitable framework is proposed to solve the unsupervised CD problem []. An improved RF algorithm is proposed for the classification step in the feature selection process to improve the accuracy of CD []. A framework based on image calibration, SVM training and tuning, statistical evaluation of model accuracy, and temporal pixel-based image differencing was used for the coral reef change detection task []. Furthermore, a method based on spatial domain analysis and MRF was used to detect building changes in remotely sensed images with good results [], etc.
2.3. Deep Learning
Deep learning is an important approach for processing high-dimensional data. Different from traditional machine learning algorithms, these deep learning-based methods can automatically extract high-level semantic information from HSIs with no handcrafted feature extraction. Numerous studies have shown that deep learning has outstanding capabilities in feature extraction. When deep learning was introduced into HSI-CD, it achieved remarkable performance. For example, a three-dimensional spectral spatial CNN was proposed to extract the spectral and spatial features of HSIs, which improved the accuracy of the HSI-CD results []. A method of the HSI-CD based on tensor and deep learning was proposed, which improves the accuracy of the results by extracting the change information of the underlying features []. A CNN-based CD framework was proposed to detect the subtle change features as a way to improve the accuracy of the binary HSI-CD results [], etc.
3. Methodology
In the proposed method, we first consider the variability of the endmember spectra, using the MSU method to obtain more reliable endmember abundance information. Then, to exploit the rich information contained in the HSI, the JM algorithm is used to fuse the sub-pixel level endmember abundance information and the original image feature information. This procedure converts the corresponding two one-dimensional pixel vectors into a two-dimensional matrix to provide richer cross-channel gradient information. Finally, to perform efficient feature extraction, the CNN is introduced to detect various change categories and generate a CD result map. Figure 1 shows the general architecture of the proposed HSI-CD approach. Accordingly, the approach consists of the following three steps: (1) the abundance maps are obtained with the MSU method; (2) the multi-source information is combined with the JM algorithm; (3) multiple changes are detected with the CNN.

Figure 1.
The architecture of the HSI-CD method based on MSUJMC.
3.1. MSU Method for Acquiring Abundance Images
We use the MSU technique to obtain reliable sub-pixel level endmember abundance information, which improves the performance of the HSI-CD method. Figure 2 shows the flow chart of the MSU method. Because an HSI covers a large area, the same land cover category may appear under different imaging conditions or in different states, so its endmember spectra are often not identical and may even differ greatly. Therefore, the image is divided into several patch images (P1–Pn) based on the size and complexity of the HSI, so that subtle endmember features can be highlighted and the endmember spectral signatures can be sufficiently analysed in each patch image. Each patch image is considerably smaller than the entire image. The patch scheme simultaneously handles both a possibly large number of endmembers and the effects of local spectral variability. Although the number and type of endmembers are increased, the redundant endmembers that are generated can be merged or eliminated later. In this case, the relative change in the deformation can be fully considered, which effectively reduces the errors of the endmember extraction.

Figure 2.
The flow chart of the MSU method.
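The patch division described above can be illustrated with a minimal sketch. The function name `split_into_patches`, the near-equal grid split, and the random test cube are assumptions made for illustration; the paper only states that the image is divided according to its size and complexity.

```python
import numpy as np

def split_into_patches(image, n_rows, n_cols):
    """Split an (H, W, B) cube into n_rows * n_cols spatial patches.

    Assumption: a regular grid with near-equal patch sizes.
    """
    row_blocks = np.array_split(np.arange(image.shape[0]), n_rows)
    col_blocks = np.array_split(np.arange(image.shape[1]), n_cols)
    patches = []
    for r in row_blocks:
        for c in col_blocks:
            # Each patch keeps the full spectral dimension.
            patches.append(image[r[0]:r[-1] + 1, c[0]:c[-1] + 1, :])
    return patches

# Hypothetical cube with the spatial size of the simulated dataset (100 x 217)
# and only 8 bands to keep the example small.
cube = np.random.default_rng(0).random((100, 217, 8))
patches = split_into_patches(cube, n_rows=2, n_cols=2)
```

Each patch would then be passed to the endmember extraction step on its own, giving one endmember pool per patch.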
Afterward, the endmember spectra of each patch image are identified by the vertex component analysis (VCA) [] algorithm, since it is a robust and universal tool. This technique builds the observation vector into a convex cone, and the vertices of the convex cone are considered endmembers. Compared with other techniques, the VCA method is computationally inexpensive and efficient. In addition, the VCA method has a relatively good extraction accuracy and noise immunity. Following the endmember extraction, each patch image has its own endmember pool (U1–Un). Then, the spectra in each endmember pool are compared for classification. The defined rule is obviously the key factor in the procedure. Due to the explicit physical meaning, the spectral angle mapper (SAM) technique [] is used to distinguish the similarity by calculating the angle through the following formula:
$$\theta(\mathbf{a}, \mathbf{b}) = \arccos\left(\frac{\mathbf{a}^{\mathsf{T}}\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}\right) \tag{1}$$
where $\mathbf{a}$ and $\mathbf{b}$ are the spectra of the two specified endmembers. The more similar the two spectra are, the smaller the angle is. If the angle between two endmembers is smaller than the empirical threshold, they belong to the same type. After the classification of the endmember spectra, each ground cover category corresponds to numerous endmember spectra. However, a large amount of redundant computation arises when there are many candidate endmember spectra. In particular, dividing the image into patches easily produces many similar spectra for the same land cover type. In this case, the endmember average root mean square error (EAR) indicator is used to optimize the selection of the endmember spectra []. The EAR measures the average error of an endmember spectrum within a ground cover type. Among several comparable spectra, those with a smaller EAR are the more representative endmember spectra of the ground cover type. Assuming a land cover type with m candidate endmember spectra $\{\mathbf{e}_1, \ldots, \mathbf{e}_m\}$, the EAR of the ith endmember spectrum can be stated as follows:
$$\mathrm{EAR}_i = \frac{\sum_{j=1}^{m}\mathrm{RMSE}(\mathbf{e}_i, \mathbf{e}_j)}{m-1} \tag{2}$$
where $\mathrm{EAR}_i$ represents the EAR of the ith endmember spectrum, and $\mathrm{RMSE}(\mathbf{e}_i, \mathbf{e}_j)$ represents the root mean square error between $\mathbf{e}_i$ and $\mathbf{e}_j$, so that $\mathrm{EAR}_i$ is its average over the other $m-1$ candidates. The lower the EAR value is, the more representative the spectrum is. According to the calculated EAR values, the more representative spectra are selected as the endmember spectra of the ground cover type. In this way, the endmember spectra of the various ground cover categories are optimized and selected. Finally, the endmember pools of all the patch images are combined to obtain the final endmember pool (U). In this endmember pool, a land cover type has several representative endmember spectra.
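As an illustration of the two selection criteria, the following sketch computes the spectral angle between two spectra and the EAR of each candidate in a class. It assumes that the RMSE is taken directly band by band between two candidate spectra; the function names and the toy spectra are hypothetical.

```python
import numpy as np

def spectral_angle(a, b):
    """Spectral angle (radians) between two spectra."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip guards against values slightly outside [-1, 1] from rounding.
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def ear(spectra):
    """EAR of each of m candidate spectra, given as an (m, B) array.

    Assumption: RMSE(e_i, e_j) is the band-wise RMSE between the two spectra.
    """
    s = np.asarray(spectra, dtype=float)
    diff = s[:, None, :] - s[None, :, :]
    rmse = np.sqrt(np.mean(diff ** 2, axis=2))  # (m, m), zero on the diagonal
    return rmse.sum(axis=1) / (s.shape[0] - 1)

candidates = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
ear_values = ear(candidates)
most_representative = int(np.argmin(ear_values))
```

In this toy case the second spectrum differs from the other two, so it receives the largest EAR and would be the first candidate to discard.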
The linear mixing model (LMM) is widely used to identify and quantify pure components in remotely sensed images due to its simple physical interpretation and tractable estimation process []. This model assumes that the reflectance measured within each pixel is a linear combination of the reflectance of each sub-pixel endmember, weighted by its abundance, plus some noise []. The spectrum of a mixed pixel is modelled as follows:
$$\mathbf{x} = \mathbf{E}\boldsymbol{\alpha} + \mathbf{n} \tag{3}$$
where $\mathbf{x}$ is the spectral vector of the mixed pixel, $\mathbf{E}$ is the endmember matrix, $\boldsymbol{\alpha}$ is the abundance column vector of the endmembers, and $\mathbf{n}$ is the noise. The least squares approach is then used to obtain the best-fitting abundance. Two constraints apply when computing the abundance: the abundances must be nonnegative, and they must sum to one []. If the number of endmembers in the image is N, Formula (3) can be represented as:
$$\mathbf{x} = \sum_{i=1}^{N}\mathbf{e}_i\,\alpha_i + \mathbf{n} \tag{4}$$
where $\mathbf{e}_i$ and $\alpha_i$ are the ith endmember spectrum and the corresponding abundance value, respectively. The mixed pixels are decomposed according to the endmember pool (U) obtained previously. The MSU model extends Equation (4) as follows:
$$\mathbf{x} = \sum_{i=1}^{N}\sum_{j=1}^{M_i} q_{ij}\,\mathbf{e}_{ij}\,\alpha_{ij} + \mathbf{n} \tag{5}$$
where N is the number of endmembers and $M_i$ is the number of spectra of the ith land cover type. $\mathbf{e}_{ij}$ and $\alpha_{ij}$ are the jth endmember spectrum of the ith class and the corresponding abundance value, respectively, $q_{ij}$ is the label (zero or one) indicating whether the relevant endmember spectrum is used, and $\mathbf{n}$ is the noise. Like the LMM, the MSU has the following two constraints: $\alpha_{ij} \ge 0$ and $\sum_{i=1}^{N}\sum_{j=1}^{M_i} q_{ij}\,\alpha_{ij} = 1$. Finally, more reliable endmember abundance maps are obtained.
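The constrained least squares step can be sketched as follows. This is one possible solver, not necessarily the one used in the paper: it applies projected gradient descent to the linear mixing model, with a Euclidean projection onto the probability simplex to enforce the nonnegativity and sum-to-one constraints. The names `project_to_simplex` and `unmix_pixel`, the iteration count, and the toy endmember matrix are assumptions.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    tau = css[rho] / (rho + 1)
    return np.maximum(v - tau, 0.0)

def unmix_pixel(x, E, n_iter=500):
    """Minimize ||x - E a||^2 subject to a >= 0 and sum(a) = 1.

    x: (B,) pixel spectrum; E: (B, N) endmember matrix (one column per endmember).
    """
    step = 1.0 / np.linalg.norm(E, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.full(E.shape[1], 1.0 / E.shape[1])
    for _ in range(n_iter):
        grad = E.T @ (E @ a - x)
        a = project_to_simplex(a - step * grad)
    return a

# Toy example: 4 bands, 3 endmembers, noise-free mixture.
E = np.array([[1.0, 0.2, 0.1],
              [0.1, 1.0, 0.3],
              [0.2, 0.1, 1.0],
              [0.5, 0.5, 0.5]])
true_abundance = np.array([0.2, 0.5, 0.3])
x = E @ true_abundance
estimated = unmix_pixel(x, E)
```

For the multi-endmember model, the same solver could be run on different candidate subsets of the endmember pool (the choice encoded by the zero/one labels), keeping the subset with the lowest reconstruction error; that selection loop is omitted here.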
3.2. JM Algorithm for Information Fusion
After obtaining the abundance images, it can be seen that the traditional direct comparison approaches are not effective in extracting the change information. In addition, the detailed change information is not effectively provided if it was only based on the abundance images that were obtained through the MSU method. Therefore, we consider fusing the abundance image information with the original image feature information to provide more information on the land cover change. In this process, principal component analysis (PCA) [] is used to extract the feature information of the original image. It can map the high-dimensional data to the low-dimensional space. In other words, the original features are projected onto the selected feature vector to obtain new low-dimensional feature information. This process removes redundant information and noise to obtain high-quality original image feature information and then normalizes it to limit the value to the range of zero to one. Subsequently, the JM algorithm is used to efficiently process the multi-source information simultaneously, fusing the sub-pixel level endmember abundance information with the original image feature information to fully explore the rich information contained in the HSI. By converting the corresponding two one-dimensional pixel vectors into a two-dimensional matrix, the change patterns between two corresponding spectral vectors on the spatial pixels are better explored. The flow chart is shown in Figure 3.

Figure 3.
The flow chart of the information fusion.
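The PCA feature extraction and normalization described in this subsection can be sketched as below. The sketch uses a singular value decomposition of the mean-centred pixel matrix and rescales each retained component to the range of zero to one; the per-component min-max scaling and the function name `pca_features` are assumptions, since the paper does not specify how the normalization is carried out.

```python
import numpy as np

def pca_features(cube, n_components):
    """Project an (H, W, B) cube onto its first principal components.

    Returns an (H, W, n_components) array scaled to [0, 1] per component
    (assumed min-max normalization).
    """
    h, w, b = cube.shape
    X = cube.reshape(-1, b).astype(float)
    X -= X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ vt[:n_components].T
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    scores = (scores - lo) / np.where(hi > lo, hi - lo, 1.0)
    return scores.reshape(h, w, n_components)

# Hypothetical small cube; 10 components are retained as in the simulated experiment.
cube = np.random.default_rng(1).random((20, 30, 12))
features = pca_features(cube, n_components=10)
```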
Assuming that the original image size is $H \times W$, m bands are retained after the PCA processing. Suppose that the abundance images (F1, F2) have n bands. The abundance image information and its corresponding original image feature information are stacked in the spectral dimension to obtain two stacked images of size $H \times W \times (n+m)$. To properly characterize the local spatial patterns, a neighbourhood is used. For each stacked image, a neighbourhood of size $w \times w$ is extracted with each pixel as the centre. Then, the JM algorithm is used to reshape and stack the corresponding neighbourhood image information to obtain tensor K. The tensor K contains the context information of each pixel, and the noise and misalignment artefacts can be suppressed or removed. This can be expressed with the following formula:
$$K_{ij}(p, q) = \left| T1_{ij}(p) - T2_{ij}(q) \right| \tag{6}$$
where $K_{ij}(p, q)$ represents the value of row p and column q of the joint matrix generated by the pixel vector of row i and column j of the corresponding neighbourhood image; $T1_{ij}(p)$ is the p-band value of the pixel vector of row i and column j in the T1 image; and $T2_{ij}(q)$ is the q-band value of the pixel vector of row i and column j in the T2 image.
According to Equation (6), each pair of corresponding one-dimensional pixel vectors is converted into a two-dimensional joint matrix of size $(n+m) \times (n+m)$. In addition, each pair of corresponding neighbourhood images is converted into a joint matrix cluster K of size $w \times w \times (n+m) \times (n+m)$. As shown in Figure 3, each $K_{ij}$ contains three parts of information. In Part A, the abundance of a certain endmember for one pixel in the T1 image is differenced with the abundances of the n endmembers of the corresponding pixel in the T2 image. This part represents the difference information at the sub-pixel level of the HSI and is distributed in the upper left corner of the two-dimensional matrix. In Part B, a certain band of a pixel in the T1 image is differenced with the m bands of the corresponding pixel in the T2 image. This part represents the pixel level difference information of the HSI and is distributed in the lower right corner of the two-dimensional matrix. Since the affinity between the endmember abundance information and the original image feature information is meaningless, the remaining part is set to zero. The smaller the value of $K_{ij}(p, q)$ is, the more similar the corresponding pixels are. In contrast, the greater the value of $K_{ij}(p, q)$ is, the greater the likelihood of pixel change. After the joint matrices are calculated, $H \times W$ joint matrix clusters K are obtained, one for each pixel.
This whole process combines the endmember abundance information with the original HSI feature information. Furthermore, the JM algorithm maps the differences between the corresponding one-dimensional pixel vectors in the neighbourhood to a two-dimensional matrix, which can provide richer cross-channel gradient information and maximize the utilization of multi-source information. This is an efficient way to simultaneously process multi-source information.
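A minimal sketch of the joint matrix construction is given below. It assumes that each stacked pixel vector stores the n abundance values first and the m PCA features second, and that the entries are absolute differences between one band of the T1 vector and one band of the T2 vector; both the absolute value and the function names are assumptions made for illustration.

```python
import numpy as np

def joint_matrix(v1, v2, n_abund):
    """Joint matrix of two stacked pixel vectors of length n + m.

    Upper-left block: abundance differences (Part A).
    Lower-right block: image feature differences (Part B).
    Cross blocks are set to zero.
    """
    K = np.abs(v1[:, None] - v2[None, :])
    K[:n_abund, n_abund:] = 0.0
    K[n_abund:, :n_abund] = 0.0
    return K

def joint_matrix_cluster(nb1, nb2, n_abund):
    """Joint matrices for two (w, w, n + m) neighbourhoods -> (w, w, n + m, n + m)."""
    K = np.abs(nb1[..., :, None] - nb2[..., None, :])
    K[..., :n_abund, n_abund:] = 0.0
    K[..., n_abund:, :n_abund] = 0.0
    return K

# Toy pixel vectors: n = 2 abundances followed by m = 1 image feature.
v_t1 = np.array([0.2, 0.8, 0.5])
v_t2 = np.array([0.6, 0.4, 0.1])
K_pixel = joint_matrix(v_t1, v_t2, n_abund=2)

# Toy 3 x 3 neighbourhoods with n + m = 5 stacked bands.
rng = np.random.default_rng(2)
K_cluster = joint_matrix_cluster(rng.random((3, 3, 5)), rng.random((3, 3, 5)), n_abund=2)
```

The cluster for each centre pixel would then be reshaped as needed and passed to the network as one input sample.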
3.3. CNN for Detecting Multiple Changes
To accurately extract the change features from the fused information, the CNN is introduced to improve the accuracy of the HSI-CD results []. The CNN is significant in computer vision. It maintains spatial information by using convolution, pooling, and activation functions and learns locally invariant spatial characteristics as well as nonlinear features. It is commonly utilized for image classification, target detection, change detection, and other image processing tasks. Furthermore, the CNN convolution layer with an activation function can be described through the following calculation:
$$Z^{l+1}_{k}(i, j) = f\left(\sum_{p}\sum_{x=1}^{s^{l+1}}\sum_{y=1}^{s^{l+1}} Z^{l}_{p}(i+x,\, j+y)\, w^{l+1}_{k}(p, x, y) + b^{l+1}_{k}\right) \tag{7}$$
where $Z^{l}_{p}(i, j)$ is the value at pixel (i, j) of feature map p output by the lth layer, $w^{l+1}_{k}$ and $b^{l+1}_{k}$ are the kth convolution kernel and bias of layer l + 1, respectively, $s^{l+1}$ is the size of the convolution kernel of layer l + 1, k indexes the feature maps of this layer, and $f(\cdot)$ is the activation function, such as the sigmoid function (sigmoid) [], hyperbolic tangent function (tanh) [], and rectified linear unit (relu) [,].
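To make the convolution layer concrete, the following sketch implements a single layer with relu activation in plain NumPy. It assumes a "valid" convolution (no padding) with stride one and omits batch normalization; the function name and the toy tensors are hypothetical, and a deep learning framework would normally be used instead.

```python
import numpy as np

def conv2d_relu(x, weights, bias):
    """One convolution layer followed by relu.

    x: (C_in, H, W) input feature maps.
    weights: (C_out, C_in, s, s) convolution kernels.
    bias: (C_out,) one bias per output feature map.
    """
    c_out, _, s, _ = weights.shape
    h_out, w_out = x.shape[1] - s + 1, x.shape[2] - s + 1
    out = np.empty((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            window = x[:, i:i + s, j:j + s]
            # Sum over input maps and kernel positions for every output map.
            out[:, i, j] = np.tensordot(weights, window, axes=3) + bias
    return np.maximum(out, 0.0)

x = np.ones((1, 3, 3))
weights = np.ones((2, 1, 2, 2))
bias = np.array([0.0, -5.0])
feature_maps = conv2d_relu(x, weights, bias)
```

With these toy values the first output map is 4 everywhere, while the second is clipped to zero by the relu because its pre-activation value is 4 - 5 = -1.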
The overall network architecture is shown in Figure 4, and the specific parameters are shown in Table 1. After convolution, a batch normalization (BN) is used to accelerate the training process, and the relu function is used for nonlinear activation. The previously obtained joint matrix clusters are fed into this neural network to integrate this rich multi-source information. Finally, the predicted type labels of the pixels are output to obtain the HSI multicategory CD result map.

Figure 4.
Diagram of the network frame.

Table 1.
Specific details of network architecture.
To optimize the network model, the cross-entropy loss function is used as the loss function of the neural network to detect multicategory changes in the HSI. The cross-entropy loss function is as follows:
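$$L = -\frac{1}{M}\sum_{i=1}^{M}\sum_{c=1}^{C} y_{ic}\,\log(p_{ic}) \tag{8}$$
where y and p are the real labels and the predicted outputs, respectively ($y_{ic}$ is one if sample i belongs to category c and zero otherwise, and $p_{ic}$ is the predicted probability of that category), C is the number of categories, and M is the number of samples. The parameters of the model are updated by backpropagation and stochastic gradient descent.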
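The loss can be computed as in the following sketch, which assumes integer class labels and predicted probabilities that already sum to one per sample (for example, softmax outputs). The small constant `eps` is an added safeguard against taking the logarithm of zero and is not part of the formula.

```python
import numpy as np

def cross_entropy(y_true, probs, eps=1e-12):
    """Mean cross-entropy over M samples.

    y_true: (M,) integer class labels in [0, C).
    probs: (M, C) predicted class probabilities.
    """
    m = y_true.shape[0]
    # With one-hot labels, only the probability of the true class contributes.
    picked = probs[np.arange(m), y_true]
    return float(-np.mean(np.log(picked + eps)))

labels = np.array([0, 1])
probs = np.array([[0.5, 0.5],
                  [0.25, 0.75]])
loss = cross_entropy(labels, probs)
```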
4. Experiment
In this section, we will first introduce the HSI datasets used in the experiment, then introduce the evaluation measures of the selected method, followed by the discussion of the results, and finally, give the computational cost analysis.
4.1. Dataset Description
To make the experiment more representative, we chose three datasets with large differences in style for the experiment. The simulated experimental dataset had a large number of change types but a strong similarity between the images, which was relatively simple. The first real experimental data had fewer types of changes and a more concentrated range of changes, and the main difficulty was reflected in the complex farm paths. The second real experimental dataset had poor image similarity, and the images had a complex environment with multiple scattered change types.
The first dataset is made up of an HSI acquired by the AVIRIS sensor in 1998 on Salinas Valley, California. The original image has 224 contiguous spectral bands with wavelengths from 400 to 2500 nm, which are characterized by a spatial resolution of 3.7 m and a spectral resolution of 10 nm. Ground truth data that contain 16 material classes (vegetation, bare soil, vineyard, etc.) are available. A subset of the whole image was selected, having a size of 100 × 217 pixels. In pre-processing, 20 water absorption bands (bands 108–112, 154–167, and 224) were discarded, resulting in 204 bands for the experiments. Taking this image as the T1 image, the T2 image was simulated based on the T1 image. To obtain realistic simulation data, six different regions were extracted from the T1 image and inserted back into different spatial positions on the T1 image by replacing the whole spectral vector. Thus, the T2 image was generated with six simulated change classes. To simulate the radiation measurement difference, Gaussian white noise (mean value = 0, variance = 0.001) was added. A pair of simulated images constructed from the original image was obtained. Figure 5a,b show the original T1 image and simulated T2 image with noise. The reference land cover change map in Figure 5c was generated from artificial exchange areas. Change 1 is shown in red, change 2 is shown in green, change 3 is shown in blue, change 4 is shown in yellow, change 5 is shown in magenta, change 6 is shown in cyan, and the unchanged is shown in black.

Figure 5.
Simulated multitemporal image dataset: False-colour composite image (bands: R: 40, G: 30, B: 20): (a) Salinas original hyperspectral image; (b) simulated image with Gaussian white noise (mean = 0, variance = 0.001); and (c) reference image.
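The construction of the simulated T2 image can be sketched as follows. The region coordinates, the function name `simulate_t2`, and the random stand-in for the T1 cube are hypothetical; only the general procedure (copying regions to new positions and adding zero-mean Gaussian white noise with variance 0.001) follows the description above.

```python
import numpy as np

def simulate_t2(t1, swaps, noise_var=0.001, seed=0):
    """Build a simulated T2 cube and its change map from a T1 cube.

    t1: (H, W, B) cube.
    swaps: list of ((src_row, src_col), (dst_row, dst_col), (height, width)).
    Each copied region becomes one change class (labels 1, 2, ...).
    """
    rng = np.random.default_rng(seed)
    t2 = t1.copy()
    change_map = np.zeros(t1.shape[:2], dtype=int)
    for label, ((sr, sc), (dr, dc), (h, w)) in enumerate(swaps, start=1):
        # Replace the whole spectral vector of every pixel in the target region.
        t2[dr:dr + h, dc:dc + w, :] = t1[sr:sr + h, sc:sc + w, :]
        change_map[dr:dr + h, dc:dc + w] = label
    # Gaussian white noise: the standard deviation is the square root of the variance.
    t2 = t2 + rng.normal(0.0, np.sqrt(noise_var), size=t2.shape)
    return t2, change_map

t1 = np.random.default_rng(3).random((20, 30, 5))
t2, change_map = simulate_t2(t1, swaps=[((0, 0), (10, 10), (4, 5))])
```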
The second dataset, “farmland”, is from the Earth Observation-1 (EO-1) Hyperion images, as shown in Figure 6a,b. The dataset covers farmland near Yancheng City, Jiangsu Province, China, with a size of 450 × 140 pixels. The two images were acquired on 3 May 2006, and 23 April 2007, respectively. The spectral resolution is 10 nm, and the spatial resolution is 30 m. The original image has 242 bands. Some bands with a low signal-to-noise ratio (uncalibrated, overlapping, highly influenced by water vapour and noisy bands) are eliminated in the pre-processing process, reserving 155 bands for the experiments. Visually, the main change in the dataset is the size of farmland. The change reference image is shown in Figure 6c, change 1 is shown in red, change 2 is shown in green, and the unchanged is shown in black.

Figure 6.
Real multitemporal image dataset-1 and dataset-2: False-colour composites (bands: R: 23, G: 13, B: 6) images: Farmland images acquired on (a) 3 May 2006, and (b) 23 April 2007, and (c) reference image. False-colour composites (bands: R: 23, G: 13, B: 6) images: Agricultural irrigated images acquired on (d) 1 May 2004 and (e) 8 May 2007, and (f) reference image.
The third dataset is a pair of real bitemporal images acquired by a Hyperion sensor mounted on the EO-1 satellite on 1 May 2004, and 8 May 2007. The study area is an agricultural irrigated land of Umatilla County, Oregon, USA, which has a size of 390 × 200 pixels. The wavelength range of the image is 350~2580 nm, the spectral resolution is 10 nm, and the spatial resolution is 30 m. After pre-processing (repairing bad stripes, removing uncalibrated and noisy bands, atmospheric correction and registration), 159 bands of the original 242 bands (8–57, 82–119, 131–164, 182–184, and 187–220) were used for the CD experiment. Changes in this scenario mainly include the land-cover transitions between the crops, bare soil, and variations in soil moisture and water content of the vegetation. Figure 6d,e show the false-colour composite of the images. The change reference image is shown in Figure 6f. Change 1 is shown in red, change 2 is shown in green, change 3 is shown in blue, change 4 is shown in yellow, change 5 is shown in magenta, and the unchanged is shown in black.
4.2. Evaluation Measures
To assess the performance of the CD method, the CD map was compared to the real reference map. The precision of the computation results can directly represent a CD method’s dependability and practicability. In this research, the following four assessment indices were used: overall accuracy (OA); Kappa coefficient; precision and recall for each class.
When computing the assessment statistics, each category in turn was considered "positive", while the remaining categories were considered "negative". The TP is the number of pixels correctly predicted as the positive class, while the TN is the number of pixels correctly predicted as the negative class. The FP is the number of negative class pixels incorrectly predicted as positive, and the FN is the number of positive class pixels incorrectly predicted as negative. The OA is the proportion of correctly classified pixels. The formula is as follows:
$$\mathrm{OA} = \frac{TP + TN}{TP + TN + FP + FN} \tag{9}$$
The Kappa coefficient measures the consistency between the CD result map and the actual surface map. In comparison to the OA, the Kappa coefficient more objectively reflects the correctness of the CD result. The Kappa coefficient is calculated as follows:
$$\mathrm{Kappa} = \frac{\mathrm{OA} - P}{1 - P} \tag{10}$$
where the formula to calculate P is the following:
$$P = \frac{(TP + FP)(TP + FN) + (FN + TN)(FP + TN)}{(TP + TN + FP + FN)^2} \tag{11}$$
Precision is the proportion of the pixels in the predicted positive class that truly belong to the positive class, which demonstrates the prediction ability of the algorithm. The formula is as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{12}$$
Recall refers to the proportion of the positive class pixels that are correctly predicted among all positive class pixels, which reflects the differentiation ability of the algorithm. The following formula calculates the recall:
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{13}$$
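The four indices can be computed from a reference map and a predicted map as in the sketch below, which treats one class as positive and all others as negative. The function name and the toy label arrays are hypothetical, and the Kappa computation assumes that the expected agreement P is smaller than one.

```python
import numpy as np

def one_vs_rest_metrics(reference, predicted, positive):
    """OA, Kappa, precision and recall with `positive` as the positive class."""
    ref_pos = reference == positive
    pred_pos = predicted == positive
    tp = int(np.sum(ref_pos & pred_pos))
    tn = int(np.sum(~ref_pos & ~pred_pos))
    fp = int(np.sum(~ref_pos & pred_pos))
    fn = int(np.sum(ref_pos & ~pred_pos))
    total = tp + tn + fp + fn
    oa = (tp + tn) / total
    p = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (oa - p) / (1 - p)  # assumes p < 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"OA": oa, "Kappa": kappa, "Precision": precision, "Recall": recall}

reference = np.array([1, 1, 0, 0, 0, 2])
predicted = np.array([1, 0, 0, 0, 1, 2])
metrics = one_vs_rest_metrics(reference, predicted, positive=1)
```

For this toy example TP = 1, FN = 1, FP = 1 and TN = 3, which gives OA = 4/6, P = 20/36 and Kappa = 0.25.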
4.3. Results and Discussion
To demonstrate the feasibility of the proposed method (MSUJMC), experiments were performed on three different HSI datasets. To evaluate its performance, the method was compared with seven methods: pixel-level CD methods based on machine learning (K-means, RF, SVM), sub-pixel level CD methods based on spectral unmixing (SU, MSU), a CNN method with abundance maps as input (MSUC), and a method combining traditional spectral unmixing, the JM algorithm and the CNN (SUJMC). During the experiments, the neighbourhood size was set to 3 × 3. The batch size for training was set to 64, the optimizer was Adam, the learning rate was 0.0005, and the decay was $10^{-3}$. The network was trained for up to 300 epochs on the training samples, and early stopping was used to avoid overfitting. The number of training samples was 10% of the total pixels. Each experiment was repeated five times, and the average accuracy is reported.
4.3.1. Simulation Dataset
The MSUJMC method was used to perform the CD on the simulated images. In acquiring the endmember abundance information, the number of rows and columns was set to two, and the whole image was segmented into four patch images. In acquiring the original image feature information, 10 bands were retained after the PCA processing. Figure 7b–i show the CD results obtained by the different CD methods: K-means, RF, SVM, SU, MSU, MSUC, SUJMC and MSUJMC, respectively. The evaluation statistics of the simulated data are shown in Table 2.
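The patch segmentation mentioned above can be illustrated with a short sketch that computes the pixel bounds of each patch for a given number of rows and columns. The function name `patch_bounds`, the image size, and the choice of near-equal patch sizes are assumptions made for illustration; the paper's exact partitioning rule is not restated here.

```python
def patch_bounds(height, width, n_rows, n_cols):
    """Return (row_start, row_end, col_start, col_end) for every patch.

    End indices are exclusive, so the patches tile the image without overlap.
    """
    row_edges = [i * height // n_rows for i in range(n_rows + 1)]
    col_edges = [j * width // n_cols for j in range(n_cols + 1)]
    return [
        (row_edges[i], row_edges[i + 1], col_edges[j], col_edges[j + 1])
        for i in range(n_rows)
        for j in range(n_cols)
    ]


# Hypothetical 100 x 90 image split into 2 rows and 2 columns (four patches).
patches = patch_bounds(height=100, width=90, n_rows=2, n_cols=2)
for r0, r1, c0, c1 in patches:
    print(f"rows {r0}:{r1}, cols {c0}:{c1}")
```

Calling the same function with three rows and two columns yields six patches, which corresponds to the setting reported for the second real dataset.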

Figure 7.
The experimental results of the simulated multitemporal image dataset: (a) reference image; (b) K-means; (c) RF; (d) SVM; (e) SU; (f) MSU; (g) MSUC; (h) SUJMC; and (i) MSUJMC.

Table 2.
Accuracy statistics of simulation dataset.
The evaluation statistics and a visual comparison with the reference map show that the K-means method produced the worst result. There are many isolated misclassified pixels, and some of the change 5 and change 6 pixels were not detected. The RF and SVM methods are much better than K-means. The six change classes were almost completely detected, but a large number of unchanged pixels were detected as change 5 and change 6. Therefore, in these two results, the precision statistics of these two change classes are very low, barely exceeding 0.1, and the SVM performed better than the RF. In terms of the OA and Kappa, the SU method outperforms RF but is weaker than SVM. The MSU method is a significant improvement over the SU method but is still not as good as the methods based on the CNN. The SUJMC and MSUJMC methods, which fuse the sub-pixel endmember abundance information with the original image feature information, perform better than the MSUC method, which does not consider the original image feature information. In particular, since the MSUJMC method considers the endmember spectral variability, it significantly improves the CD accuracy and produces fewer false pixels. Although some pixels of change 5 and change 6 were not detected, the MSUJMC method has the best CD performance overall.
To explore the stability of the method under different levels of noise, we ran two additional simulations for comparison. Increasing the noise variance in the simulated experiments (variance = 0.003 and variance = 0.005) makes the difference in performance between the methods more apparent. Figure 8a–d show the spectral curves at different noise levels. It can be seen that the higher the noise level, the greater the effect on the spectral curve, which greatly increases the difficulty of CD. By analysing the statistics of these three simulated experiments, we found that the CD results of the various methods were affected to different degrees as the noise level increased. As shown in Figure 8e,f, the MSUJMC method is the least affected by the noise and the most resistant to interference. When the noise level increases, the OA decreases by only 0.18% and 0.25%, and the Kappa decreases by only 0.016 and 0.02.
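The noise experiment above can be illustrated with a small sketch that adds zero-mean Gaussian white noise of a given variance to a spectrum. The function name `add_gaussian_noise` and the flat test spectrum are hypothetical, and the sketch assumes the noise is added independently to each band of reflectance values; the paper's simulation procedure may differ in its details.

```python
import math
import random


def add_gaussian_noise(spectrum, variance, rng):
    """Add zero-mean Gaussian white noise with the given variance to each band."""
    sigma = math.sqrt(variance)  # random.gauss expects a standard deviation
    return [value + rng.gauss(0.0, sigma) for value in spectrum]


rng = random.Random(42)
# Hypothetical flat spectrum; it is long only so the variance estimate is stable.
clean = [0.30] * 20_000

for variance in (0.001, 0.003, 0.005):
    noisy = add_gaussian_noise(clean, variance, rng)
    residual = [n - c for n, c in zip(noisy, clean)]
    measured = sum(r * r for r in residual) / len(residual)
    print(f"target variance {variance:.3f}, measured {measured:.5f}")
```

Note that the variance, not the standard deviation, is the quantity quoted for each noise level, so the square root is taken before sampling.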

Figure 8.
Spectral curves at different noise levels: (a) without noise; (b) with Gaussian white noise (variance = 0.001); (c) with Gaussian white noise (variance = 0.003); (d) with Gaussian white noise (variance = 0.005). Experimental results of the simulated dataset under different levels of noise: (e) the OA of each method for the simulation dataset under different levels of noise; (f) the Kappa of each method for the simulation dataset under different levels of noise.
4.3.2. Real HSI Dataset-1
In this experiment, when acquiring the endmember abundance information, the number of rows and columns was set to two, and the whole image was segmented into four patch images. When acquiring the original image feature information, 10 bands were retained after the PCA processing. The result of the MSUJMC method is shown in Figure 9i, and the results of the comparison methods (K-means, RF, SVM, SU, MSU, MSUC, SUJMC) are shown in Figure 9b–h, respectively. The evaluation statistics are shown in Table 3.


Figure 9.
The experimental results of real multitemporal image dataset-1: (a) reference image; (b) K-means; (c) RF; (d) SVM; (e) SU; (f) MSU; (g) MSUC; (h) SUJMC; (i) MSUJMC.

Table 3.
Accuracy statistics of the real HSI dataset-1.
By comprehensively analysing the CD result maps of the various methods and their corresponding evaluation statistics, we found that the K-means method generated the least accurate result, with a large number of undetected change 2 pixels and a recall statistic of 0.53. In the resulting maps obtained by the RF, SVM and SU methods, a large number of roads are misidentified as change regions. The performance of the SVM and SU methods was similar, and both outperformed the RF method. Compared with these methods, the result obtained by the MSU method shows a clear visual and statistical improvement. Owing to the strong learning capability of the CNN, the methods based on the CNN are far superior to the traditional methods. Among them, the SUJMC and MSUJMC methods fuse multi-source information and outperform the MSUC method, which only considers the sub-pixel level endmember abundance information. However, the endmember abundance information used by the SUJMC method does not consider the variability of the endmember spectra; only one candidate spectrum is considered for each endmember type. This makes the SUJMC method less reliable than the MSUJMC method. It is also clear from the resulting maps and statistics that the MSUJMC method handles the HSI-CD task very effectively. All the evaluation statistics are above 0.95, and most of them are above 0.98, indicating that the resulting maps obtained by the MSUJMC method are generally consistent with the ground truth maps.
4.3.3. Real HSI Dataset-2
This dataset is more complex than the previous datasets. Experiments were carried out using the proposed MSUJMC method. In acquiring the endmember abundance information, the numbers of rows and columns were set to three and two, respectively, and the whole image was divided into six patch images. When acquiring the original image feature information, 15 bands were retained after the PCA processing. The result of the MSUJMC method is shown in Figure 10i. Figure 10b–h show the CD results obtained by the comparison methods: K-means, RF, SVM, SU, MSU, MSUC and SUJMC, respectively. The evaluation statistics are shown in Table 4.

Figure 10.
The experimental results of real multitemporal image dataset-2: (a) reference image; (b) K-means; (c) RF; (d) SVM; (e) SU; (f) MSU; (g) MSUC; (h) SUJMC; (i) MSUJMC.

Table 4.
Accuracy statistics of real HSI dataset-2.
We found that the result of the K-means method was significantly different from the real surface shown in the reference map. Although most of the change areas were detected, the various change classes were mixed together, resulting in many misjudgements. In particular, change 3 was not detected, so the precision and recall statistics for this change type are zero. Compared with the K-means method, the RF and SVM methods provide more reliable and clearer information between change classes. The discernibility between the change classes is strong, but many unchanged pixels are detected as changed pixels. The overall performance of the SU and MSU methods is similar to that of the RF and SVM methods. In their results, fewer unchanged pixels are detected as change pixels than in the RF and SVM results. However, some change 4 pixels were mistaken for change 5, resulting in lower precision and recall statistics for both change types. Moreover, in the result of the SU method, the pixels of change 1 and change 2 are mixed and difficult to distinguish. The MSUC, SUJMC and MSUJMC methods based on the CNN show higher performance than the traditional CD methods. Only a small number of unchanged pixels are detected as change pixels in the results of these three methods. The detection ability of the MSUC method for change 4 is slightly inferior, and the SUJMC method is less capable of detecting change 3. The MSUJMC method has the highest overall performance in terms of both the resulting map and the evaluation statistics. The result of the MSUJMC method is the most consistent with the reference map, and only very few pixels are incorrectly detected. The precision and recall statistics for each change type are very high. The OA and Kappa are also the highest among all the experimental methods.
4.4. Computational Cost Analysis
The hardware and software configuration was as follows: Lenovo desktop, AMD Ryzen 7 5800H with Radeon Graphics at 3.20 GHz, 16.0 GB RAM, NVIDIA GeForce RTX 3050 Ti GPU with 4 GB of memory, Windows 10 and Python 3.8. The average computational costs for the three datasets are shown in Table 5. Among these methods, the machine-learning-based methods (K-means, RF, SVM) are the least time consuming, because they only extract shallow features during training. The K-means method has the highest time cost of the three, while RF and SVM perform much better in comparison, and RF has the lowest computational time cost. The spectral-unmixing-based methods (SU, MSU) are somewhat more costly in terms of computational time than the machine-learning-based methods. The MSU method takes longer than the SU method mainly because it adopts a patching strategy, dividing the original image into multiple patch images to expand and optimize the endmember pool. The time costs of the deep-learning-based methods (MSUC, SUJMC, MSUJMC) increase sharply, mainly because neural networks need more training time to extract deep features. Among these three methods, the MSUC method has the lowest time cost, mainly because the amount of data input to its network is much smaller. The difference in computational time cost between the SUJMC and MSUJMC methods depends primarily on the time spent on data processing prior to network input. In the simulation experiments, the time consumption increases with increasing noise levels, but not significantly. In summary, the MSUJMC method has the highest computational time cost, but the cost is still acceptable.

Table 5.
Computational cost of the three datasets.
5. Conclusions
In this paper, the MSUJMC method is proposed to solve the challenging problem of detecting multiple changes in multitemporal HSIs. The main contributions of the method are reflected in the following three aspects: (1) The MSU method provides sub-pixel endmember abundance information for the CD. In this process, the variability of the endmember spectra is fully considered so that the obtained endmember abundance information is more accurate and reliable; (2) The JM algorithm fuses the sub-pixel level endmember abundance information and the original image feature information into lightweight matrices to make full use of the valid information in the HSI to determine the type of pixel change; (3) The CNN is used to fully exploit a series of change features in the joint matrix clusters to better accomplish the task of detecting multiple changes in HSI.
Although the proposed method consumes more time than the comparative methods in this paper, the time costs were still acceptable. Experimental results on the simulated and real HSI datasets verify the effectiveness of the method. The following conclusions can be drawn from the analysis of the theoretical and experimental results:
(1) The variability of the endmember spectra should be considered when obtaining image endmember abundance information. Dividing the original image into multiple patch images according to the complexity of the image enables the selection of a large number of possible candidate endmember spectra. The redundant candidate endmember spectra can be removed by the EAR index optimization. Finally, the mixed pixels of the whole image can be unmixed using the MSU method and the final endmember pool to obtain reliable sub-pixel level endmember abundance information.
(2) In the process of fusing the multi-source information, the JM algorithm also considers the neighbouring pixels around each pixel, which jointly participate in the decision on the type of change of the central pixel. The one-dimensional pixel information is converted into two-dimensional matrix information, providing richer cross-channel gradient information. This maximizes the use of the valid information in the HSI.
(3) The powerful learning ability of the CNN can work seamlessly with the joint matrix clusters. It can identify the change features and integrate the multi-source information. Therefore, the task of detecting multiple changes in the HSIs can be performed efficiently.
HSI datasets with real change reference maps are relatively scarce, and such reference maps are costly to produce. In order to achieve high-accuracy CD on HSI datasets without a real change reference map, in future work we will explore the application of transfer learning and of remote sensing images from different data sources to HSI-CD.
Author Contributions
H.L. and K.W. proposed the algorithm, conceived and designed the experiments and performed the experiments; H.L., K.W. and Y.X. provided article revision opinions and H.L. wrote the paper. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Natural Science Foundation of China under Grant U21A2013 and the Global Change and Air-Sea Interaction II under Grant GASI-01-DLYG-WIND01.
Data Availability Statement
The data presented in this study are available on request from the corresponding author.
Acknowledgments
The authors would like to thank the anonymous reviewers and associate editor for their valuable comments and suggestions to improve the quality of the paper.
Conflicts of Interest
The authors declare no conflict of interest.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).