A Novel Approach to Unsupervised Change Detection Based on Hybrid Spectral Difference

: The most commonly used features in unsupervised change detection are spectral characteristics. Traditional methods describe the degree of the change between two pixels by quantifying the difference in spectral values or spectral shapes (spectral curve shapes). However, traditional methods based on variation in spectral shapes tend to miss the change between two pixels if their spectral curves are close to ﬂat; and traditional methods based on variation in spectral values tend to miss the change between two pixels if their values are low (dark objects). To inhibit the weaknesses of traditional methods, a novel approach to unsupervised change detection based on hybrid spectral difference (HSD) is proposed which combines the difference between spectral values and spectral shapes. First, a new method referred to as change detection based on spectral shapes (CDSS) is proposed that fuses the difference images produced by the spectral correlation mapper (SCM) and spectral gradient difference (SGD) in order to describe the variation in spectral shapes. Second, a method called change detection based on spectral values (CDSV), computing the Euclidean distance between two spectral vectors, is used to obtain a difference image based on the variation in spectral values. Then, the credibility of CDSS and CDSV for every pixel is calculated to describe how appropriate these two methods are for detecting the change. Finally, the difference images produced by CDSS and CDSV are fused with the corresponding credibility to generate the hybrid spectral difference image. Two experiments were carried out on worldview-2/3 and Landsat-7 Enhanced Thematic Mapper Plus (ETM+) datasets, and both qualitative and quantitative results indicated that HSD had superior capabilities of change detection compared with standard change vector analysis (CVA), SCM, SGD and multivariate alteration detection (MAD). The accuracy of CDSS is higher than CDSV in case-1 but lower in case-2 and, compared to the higher one, the overall accuracy and the kappa coefﬁcient of HSD improved by 3.45% and 6.92%, respectively, in the ﬁrst experiment, and by 1.66% and 3.31%, respectively, in the second experiment. The omission rate dropped by approx. 4.4% in both tests.


Introduction
Change detection in remote sensing is the process of detecting changes that have occurred on the Earth's surface by remote-sensing images of the same region acquired at different times. In the past few decades, many change detection algorithms have been proposed and reviewed [1][2][3][4][5], which can be simply divided into two categories of supervised and unsupervised.
Supervised change detection is to identify changes through the process of overlaying coincident thematic maps which are generated by classifying the images [4]. The distinct advantage of this sensitive to gain factors in theory, that is, SGD cannot eliminate the effects of illumination variation and shadows. To get a better representation of the difference between spectral shapes, a new method which combines SGD and SCM is proposed.
In this paper, the method based on the variation in spectral values is referred to CDSV (change detection based on spectral values), and the method based on the variation in spectral shapes is referred to CDSS (change detection based on spectral shapes). It is difficult to identify correctly all change types using only CDSV or CDSS, because different change types have different spectral curves. For example, the change between two pixels will be easily missed by CDSV if both are dark. In other words, the credibility of CDSV is low for this change type. Besides, the change tends to be missed by CDSS if two pixels have "close-to-flat" spectral curves (which means the spectral curves are close to flat). In other words, the credibility of CDSS is low for this change type.
According to the above analysis, this paper presents a new hybrid spectral difference (HSD) method, considering the difference of both spectral values and spectral shapes. Firstly, two difference images are produced by CDSV and CDSS, respectively. Then, the credibility of two methods is calculated. Finally, two difference images are fused with the corresponding credibility to produce the result.
The superiority of HSD lies in the calculation of weights in the fusion of CDSS and CDSV. The weights are calculated by the credibility of two methods. For example, if CDSS has high credibility and CDSV has low credibility for a changed pixel, CDSV tend to miss the change but have the low weight in the fusion, and CDSS tend to find the change and have the high weight in the fusion. Therefore, HSD will find the change which is missed by only one method.
The rest of this paper is organized as follows. The next section presents the theory of the proposed method. Section 3 presents the results of two experiments on different datasets. Section 4 presents the discussion. The conclusions are presented in Section 5.

Methodology
Let X 1 , X 2 represent two co-registered remote-sensing images with L bands. There are three major steps for HSD to obtain the difference image: (1) generate two difference images by CDSV and CDSS; (2) calculate the credibility of the two methods; (3) fuse two difference images with the corresponding credibility.

Preprocessing
During data preprocessing, atmospheric correction [30,31], geometric and relative radiometric corrections would be carried out to make the two images as comparable as possible. Atmospheric correction and geometric correlation would be implemented by Envi5.3 and ERDAS IMAGINE 2014 respectively.
In the present work, relative radiometric normalization was implemented based on pseudoinvariant features (PIFs) [32,33] with an assumption that the invariant pixels in both images are linearly related, and PIFs are unchanged pixels only affected by radiometric variance [34].
The selection of PIFs would be decided by CDSV, SCM and SGD together. We assume that at least 50% of the scene is unchanged, which is right in most applications. Three difference images are produced by three algorithms respectively. Then, the 'unchange map' of each algorithm is produced by thresholding the corresponding difference image and the darker 50% of the image is labeled as unchange. Let U CDSV , U SGD , U SCM represent the 'unchange map' of CDSV, SGD and SCM respectively, and PIFs can be selected by the following equation: The main purpose of this framework is to compare and assess the algorithms fairly, as the relative radiometric normalization has a big influence on change detection. HSD was a fusion of CDSS and Remote Sens. 2018, 10, 841 4 of 21 CDSV, and CDSS was a fusion of SGD and SCM. The PIFs are decided by the areas identified as unchanged by CDSV, SCM and SGD together, so the relative radiometric normalization will not influence the comparison of these algorithms.

Change Detection Based on Spectral Value (CDSV) and Change Detection Based on Spectral Shapes (CDSS) Algorithms
This section details with the CDSV and CDSS for producing two difference images by spectral values and shapes respectively.

CDSV Algorithm
The Euclidean distance between spectral vectors can characterize the difference between spectral values excellently. Although the Mahalanobis distance can increase the importance of bands having little variation of reflectance, image normalization already ensures that each band has the same impact on change detection. Let DISV represent the difference image produced by CDSV [9,10]: where X i 1 and X i 2 repersents the i-th band in X 1 and X 2 respectively.

CDSS Algorithm
SCM and SGD represent the difference in spectral shapes from different perspectives. SCM computes the difference between spectral shapes by the Pearson's correlation of them. Let SCM CD be the difference image obtained by SCM [11,28]: where X 1 and X 2 represent the mean of all the bands of X 1 and X 2 , L represents the number of bands and the superscript i means the i-th band. SCM computes the Pearson's correlation between spectral vectors, so the value range is [−1, 1]. SGD directly represents the spectral shape by the spectral gradient. Let GD represent the spectral gradient and SGD CD be the difference image produced by SGD [13,29] where the superscript i and subscript k means the i-th band in the k-th image, L represents the number of bands. In Figure 1, the blue curve is a spectral curve or the corresponding spectral gradients of a pixel in WordlView-2 image, and the red curve is the simulation of pseudo changes. The simulated spectra are changed by an additive factor of 1 in Figure 1a, a multiplicative factor of 2 in Figure 1b, and both two factors in Figure 1c. The spectral gradient curves are just below the corresponding spectral curves. We can find that two gradient curves coincide in Figure 1d, and the gradient curves in Figure 1e,f are identical. Therefore, SGD is insensitive to the additive factor even if the spectra are changed by a multiplicative factor simultaneously. However, the multiplicative factor is of negative impact on SGD, because the two gradient curves in Figure 1e are different from each other. are changed by an additive factor of 1 in Figure 1a, a multiplicative factor of 2 in Figure 1b, and both two factors in Figure 1c. The spectral gradient curves are just below the corresponding spectral curves. We can find that two gradient curves coincide in Figure 1d, and the gradient curves in Figure 1e,f are identical. Therefore, SGD is insensitive to the additive factor even if the spectra are changed by a multiplicative factor simultaneously. However, the multiplicative factor is of negative impact on SGD, because the two gradient curves in Figure 1e are different from each other. Spectral difference caused by an additive factor; (b) spectral difference caused by a multiplicative factor; (c) spectral difference caused by a multiplicative factor and an additive factor; (d) the spectral gradients of two spectral curves in Figure 1a (the two curves coincide); (e) the spectral gradients of two spectral curves in Figure 1b; (f) the spectral gradients of two spectral curves in Figure 1c.
In practice, the spectral changes caused by multiplicative and additive factors cannot be avoided because of shading, illumination variation, clouds and sensor calibration. The Pearson's correlations between two curves in Figure 1a-c are all 1, because SCM is insensitive to linear transformation of the spectra. SCM has an imperfect assumption that Pearson's correlation between spectral vectors is negatively correlated with the degree of change between them, which is not always true in practice, because the correlation between spectra of different objects is arbitrary. For example, the degree of change of negatively correlated spectral vectors cannot be regarded as higher than it of uncorrelated ones. What can be sure is that spectral vectors of changed areas are not highly positively corrected, but there is no specific mathematic relationship between Pearson's correlation and the degree of change.
Consequently, this paper proposes a new method which combines SCM and SGD to comprehensively consider the variation in spectral shapes, and the method is referred to CDSS. Let DISS represents the difference image produced by CDSS: CDSS can be seen as a weighted SGD method, and the weight is decided by Pearson's correlation between spectral vectors. When spectral changes are caused by multiplicative factors, SGD might wrongly identify it as change, so the weight for SGD needs to be low. The weight is negatively correlated with Pearson's correlation, so it is decided by 1 − SCM CD (x, y) and is normalized by being divided by 2 as the righthand part in Equation (6). When the Pearson's correlation between two curves is close to 1, the value of DISS will be close to 0 even if the value of SGD CD is high, so CDSS is insensitive to multiplicative factors and can eliminate the pseudo changes caused by illumination variation and shadow.

The Credibility of CDSV and CDSS
Since the credibility of CDSV is low when the pixels at different times are dark, we can calculate the spectral magnitude of each pixel at both times and use the larger one to represent the credibility. Let Cred_V representthe credibility of CDSV: Cred_V(x, y) = MAX{Cred_V 1 (x, y), Cred_V 2 (x, y)}, where Cred_V represent the spectral magnitude of the k-th image. In practice, the spectral changes caused by multiplicative and additive factors cannot be avoided because of shading, illumination variation, clouds and sensor calibration. The Pearson's correlations between two curves in Figure 1a-c are all 1, because SCM is insensitive to linear transformation of the spectra. SCM has an imperfect assumption that Pearson's correlation between spectral vectors is negatively correlated with the degree of change between them, which is not always true in practice, because the correlation between spectra of different objects is arbitrary. For example, the degree of change of negatively correlated spectral vectors cannot be regarded as higher than it of uncorrelated ones. What can be sure is that spectral vectors of changed areas are not highly positively corrected, but there is no specific mathematic relationship between Pearson's correlation and the degree of change.
Consequently, this paper proposes a new method which combines SCM and SGD to comprehensively consider the variation in spectral shapes, and the method is referred to CDSS. Let DISS represents the difference image produced by CDSS: CDSS can be seen as a weighted SGD method, and the weight is decided by Pearson's correlation between spectral vectors. When spectral changes are caused by multiplicative factors, SGD might wrongly identify it as change, so the weight for SGD needs to be low. The weight is negatively correlated with Pearson's correlation, so it is decided by 1 − SCM CD (x, y) and is normalized by being divided by 2 as the righthand part in Equation (6). When the Pearson's correlation between two curves is close to 1, the value of DISS will be close to 0 even if the value of SGD CD is high, so CDSS is insensitive to multiplicative factors and can eliminate the pseudo changes caused by illumination variation and shadow.

The Credibility of CDSV and CDSS
Since the credibility of CDSV is low when the pixels at different times are dark, we can calculate the spectral magnitude of each pixel at both times and use the larger one to represent the credibility. Let Cred_V representthe credibility of CDSV: Cred_V(x, y) = MAX{Cred_V 1 (x, y), Cred_V 2 (x, y)}, (8) where Cred_V k represent the spectral magnitude of the k-th image.
The key to describing the credibility of CDSS is to quantify how steep the spectral curves are. This paper uses spectral gradient to describe the spectral shape, and the magnitude of spectral gradient can be an indicator to describe how steep the curve is. Therefore, we can calculate the magnitude of the spectral gradients of each pixel at both times and use the larger one to represent the credibility. Let MOSG k representthe magnitude of the spectral gradient of the k-th image, and Cred_S represent the credibility of CDSS: Cred_S(x, y) = MAX{MOSG 1 (x, y), MOSG 2 (x, y)}.

The Hybrid Spectral Difference (HSD) Algorithm
The HSD algorithm fuses DISV and DISS with their corresponding credibility. Let DIHSD be the difference image produced by HSD, and we can get DIHSD as follows: where ω 1 and ω 2 represent the weights in the fusion process. To ensure that every pixel in ω 1 and ω 2 satisfies ω 1 (x, y) + ω 2 (x, y) = 1, the weights can be calculated by the credibility as follows: However, it is not appropriate to calculate ω 1 and ω 2 by Cred_V and Cred_S directly, because Cred_V and Cred_S have different pixel value range and distributions. Usually, pixels in Cred_V have much higher values than pixels in Cred_S; therefore, most pixels in ω 1 would have higher values than them in ω 2 . As a result, DISV would play a predominant role in the fusion process.
To ensure that the weights can correctly describe the importance of DISV and DISS, Cred_V and Cred_S should have the same pixel value distributions, that is, their histograms should be the same. There are two ways to ensure that two images have the same pixel value distributions:

1.
Apply the histogram equalization transformation to two images.

2.
Apply the histogram matching transformation to one image and make it have the similar histogram as the other one.
Before histogram processing, the linear stretching needs to be applied to the images to ensure that the range of pixel values is 0 to 255. It is better to adopt histogram equalization rather than histogram matching to make Cred_V and Cred_S have the same histograms, because histogram equalization makes pixel values distributed evenly from 0 to 255. The histogram matching may lead to a narrow range of pixel values, which may cause the most pixel values in ω 1 and ω 2 to be close to 0.5, so the weights lose their effectiveness.
Let HECred_V and HECred_S be the Cred_V and Cred_S after histogram equalization; then Equations (12) and (13) are changed to: HECred_V(x, y) HECred_S(x, y) + HECred_V(x, y) Similarly, DISS and DISV should have the same histograms before fusion by Equation (11). Just like Cred_V and Cred_S, linear stretching should be applied to DISV and DISS before histogram processing, but here the histogram matching transformation is adopted rather than histogram equalization. Normally, only a small portion of the scene is changed in most applications, so DISV and DISS are dominated by large, dark areas, resulting in histograms characterized by a large concentration of pixels having levels near 0. In this circumstance, the histograms cannot become uniform by histogram equalization transformation, and a very narrow interval of dark pixels is mapped into the upper end of the gray levels of the output images [35]. What's more, the histogram equalization will compress the gray levels which are occupied by very little pixels and the output image would have few intensity levels occupied, hence, some changed area may be merged into unchanged area.
Let LSDISV and LSDISS be DISV and DISS after linear stretching, and chooses LSDISS as the specified image. Let HMLSDISV be LSDISV after histogram matching transformation, Equation (11) is changed to: with ω 1 and ω 2 are calculated by Equations (14) and (15). And the flowchart of the HSD is shown in Figure 2.
Remote Sens. 2017, 9, x FOR PEER REVIEW 7 of 22 image would have few intensity levels occupied, hence, some changed area may be merged into unchanged area. Let LSDISV and LSDISS be DISV and DISS after linear stretching, and chooses LSDISS as the specified image. Let HMLSDISV be LSDISV after histogram matching transformation, Equation (11) is changed to: with ω 1 and ω 2 are calculated by Equations (14) and (15). And the flowchart of the HSD is shown in Figure 2.

Accuracy Assessment and Reference Map
To prove the validity and effectiveness of the proposed algorithm, HSD, three algorithms which are used to generate HSD and other two algorithms were implemented for comparison. Each algorithm generated a difference image, but the result needed an appropriate threshold. The threshold directly affected the accuracy of the change detection. A series of thresholds were set by adjusting the parameters, and the calculation method was as follows:

Accuracy Assessment and Reference Map
To prove the validity and effectiveness of the proposed algorithm, HSD, three algorithms which are used to generate HSD and other two algorithms were implemented for comparison. Each algorithm generated a difference image, but the result needed an appropriate threshold. The threshold directly affected the accuracy of the change detection. A series of thresholds were set by adjusting the parameters, and the calculation method was as follows: where µ and σ represent the mean and standard deviation of the difference image. m was a series of adjustable parameters with equidistant intervals and the range of m varied in different experiments. After binarizing the difference images, the results would be evaluated by the reference image using the omission rate and the commission rate which are widely used in change detection assessment [5]. The omission rate is used to show the algorithm's ability to find real changes while the commission rate is used to show the algorithm's ability to avoid producing pseudo changes. Let ref c represent the number of changed pixels in the reference image, n c represent the number of changed pixels correctly identified, and n p represent the number of pixels of pseudo changes. Then, the omission rate and commission rate are calculated as follows: The reference map made from manual interpretation was important for accuracy assessment. Normally, change maps were constructed with three areas of changed, unchanged and undefined. Foody found that the percentage of change would impact the accuracy assessment and approximately 50% of change would make both of sensitivity and specificity reach balance [36][37][38]. In this paper, change maps would contain approximately equal area of changed and unchanged areas to allow algorithms to be evaluated and compared better.
Apart from the omission rate and the commission rate, the kappa coefficient [39] is also used to assess algorithms. Some researches showed that kappa indices had flaws [40,41] and other advanced assessment features performed better such as quantity disagreement, allocation disagreement [40] or Bradley-Terry model [41]. Still, the kappa coefficient is widely used and acceptable in change detection assessment, as change detection can be seen as a simple classification problem with only two categories.

Experiments and Results
Two experiments were carried out to validate the effectiveness of HSD on remote-sensing images with different sensors. The first used Worldview-2/3 very high-resolution (VHR) multispectral images of Chang Sha (China), and the second was carried out on Landsat-7 ETM+ images of Washington. These images contained most common land-use/land-cover (LULC) classes to ensure the effectiveness of the proposed methods. Details are described below.

Material
The first experiment was carried out on a pair of Worldview-2/3 images of Chang Sha (China) at November 2011 and February 2015 as shown in Figure 3 and their information was shown in Table 1. The first experiment was carried out on a pair of Worldview-2/3 images of Chang Sha (China) at November 2011 and February 2015 as shown in Figure 3 and their information was shown in Table 1.  After radiometric calibration, atmospheric correction was implemented by the FLAASH model in ENVI 5.3. The input parameters for a sensor's information can be found in Table 1 and some other important input parameters of the FLAASH model are listed here: gain factor (1.0), atmosphere model (tropical), aerosol model (urban). Model adjustments for the other parameters were kept by their default values according to the FLAASH user's manual for multispectral imagery [42].
Before further processing, the Worldview-3 image was resampled to 2.0 m so both images had the same resolution. Geometric correction was carried out using ERDAS IMAGEINE 2014 and misregistration errors were less than one pixel; then relative radiometric correction was executed by the method introduced in Section 2.1.
The scene mainly contained roads, land and industrial buildings, and changes occurred mainly due to buildings. Reference information shown in Figure 3 with changes delineated with red polygons would be used to generate quantitative results, while changes in undefined areas were just for visual and qualitative analysis.
The attention in this case was paid to LULC change and most changes with no vague meaning would be marked as changed area. Changes with "vague meaning" means that there is ambiguity or argument in understanding the change type, even when these changes had significant visual difference of color or texture. For example, roof rust or contamination did not change the type of buildings, while it might make a big visual difference between two-date images. Other changes would lie in undefined areas such as the change caused by shadows, different sensor angles or the presence of cars. To avoid seasonal effects of vegetation transition, changes about vegetation type would not be delineated.
Based on Foody' theory [36][37][38], equal area of unchanged areas delineated with green polygons in Figure 3 were chosen for a balance of sensitivity and specificity.

Results
The different images generated by six algorithms were thresholding by a series of thresholds calculated in Equation (17), and the range of m was [−0.3, 1.6]. Table 2 shows the values of the accuracy features when they got their highest kappa coefficients. The change maps of different algorithms in Figure 4 were generated by image thresholding with their optimal thresholds at which the corresponding algorithms got the maximal kappa coefficients.   In Figure 4d, changes in red ellipses were almost missed by CDSS but identified by CDSV; in Figure 4e, changes in red ellipse were partly missed by CDSV but identified by CDSS. These changes were identified better by HSD than the corresponding methods which missed them. Overall, HSD In Figure 4d, changes in red ellipses were almost missed by CDSS but identified by CDSV; in Figure 4e, changes in red ellipse were partly missed by CDSV but identified by CDSS. These changes were identified better by HSD than the corresponding methods which missed them. Overall, HSD identified more changes and had best performance. However, there were some missed changes not completely identified by HSD, for example, the changes in the leftmost ellipse in Figure 4d were partly missed by HSD but almost identified by CDSS.
In Figure 4f, changes in the red ellipse were missed by all algorithms, because the different objects in two-date images here had similar spectra. identified more changes and had best performance. However, there were some missed changes not completely identified by HSD, for example, the changes in the leftmost ellipse in Figure 4d were partly missed by HSD but almost identified by CDSS.
In Figure 4f, changes in the red ellipse were missed by all algorithms, because the different objects in two-date images here had similar spectra.

Material and Preprocessing
The second experiment was carried out on a pair of Landsat-7 ETM+ images of Washington acquired at the same season of 2000 and 2005. Both images are of 668*668 pixel size with 6 bands and the spatial resolution is 30 m. Figure 5 shows the image of 2000, and images were downloaded on USGS GloVis website (https://glovis.usgs.gov/). The image of 2005 had the scan line corrector off (SLC-off) error problem and was processed by the specialized plugin in ENVI before the follow-up processing. After 2003, ETM+ had a major onboard failure resulting in what is known as the SLC-off error, which leads to significant artifacts and approximately 22% data loss. A technique was developed to fill the gaps and was implemented in terms of a plugin of ENVI, and the plugin can be downloaded from the Yale Center for Earth Observation (YCEO) website (https://yceo.yale.edu/how-fill-gaps-landsat-etm-images). Then, after radiometric calibration of two-date images, atmospheric correction was implemented by FLAASH model in ENVI 5.3. Some important input parameters of FLAASH model are listed here: gain factor (1.0), atmosphere model (sub-Arctic summer), aerosol model (rural). Other input parameters of FLAASH model of Landsat-7 ETM+ are already listed in FLAASH user's manual for multispectral imagery [42].
After relative radiometric correction by the method introduced in 2.1, two patches with 150*150 pixel size were chosen for evaluation, as shown in Figures 5 and 6. Areas with significant visual difference of color or texture would be delineated as changed, and equal area of unchanged areas delineated with green polygons were chosen for quantitative evaluation. The image of 2005 had the scan line corrector off (SLC-off) error problem and was processed by the specialized plugin in ENVI before the follow-up processing. After 2003, ETM+ had a major onboard failure resulting in what is known as the SLC-off error, which leads to significant artifacts and approximately 22% data loss. A technique was developed to fill the gaps and was implemented in terms of a plugin of ENVI, and the plugin can be downloaded from the Yale Center for Earth Observation (YCEO) website (https://yceo.yale.edu/how-fill-gaps-landsat-etm-images). Then, after radiometric calibration of two-date images, atmospheric correction was implemented by FLAASH model in ENVI 5.3. Some important input parameters of FLAASH model are listed here: gain factor (1.0), atmosphere model (sub-Arctic summer), aerosol model (rural). Other input parameters of FLAASH model of Landsat-7 ETM+ are already listed in FLAASH user's manual for multispectral imagery [42].
After relative radiometric correction by the method introduced in Section 2.1, two patches with 150*150 pixel size were chosen for evaluation, as shown in Figures 5 and 6. Areas with significant visual difference of color or texture would be delineated as changed, and equal area of unchanged areas delineated with green polygons were chosen for quantitative evaluation.  Figure 5. The reference information of changed areas and unchanged areas were delineated by red polygons and green polygons respectively.

Results
The different images generated by six algorithms were thresholding by a series of thresholds calculated in Equation (17), and the range of m was [−0.3, 1.6]. Table 3 shows the values of the accuracy features and the threshold of different algorithms when they got their highest kappa coefficients. The change maps of different algorithms for two patches were generated by image thresholding with their optimal thresholds at which the corresponding algorithms obtained the maximal kappa coefficients, as shown in Figures 7 and 8.

Results
The different images generated by six algorithms were thresholding by a series of thresholds calculated in Equation (17), and the range of m was [−0.3, 1.6]. Table 3 shows the values of the accuracy features and the threshold of different algorithms when they got their highest kappa coefficients. The change maps of different algorithms for two patches were generated by image thresholding with their optimal thresholds at which the corresponding algorithms obtained the maximal kappa coefficients, as shown in Figures 7 and 8.  In Figures 7d and 8d, changes in red ellipses were almost missed by CDSS but identified by CDSV, while in Figures 7e and 8e, changes in red ellipse were partly missed by CDSV but identified by CDSS. These changes were identified better by HSD than the corresponding methods which missed them. Overall, HSD identified more changes and had best performance. However, there were some missed changes not completely identified by HSD, for example, the changes in the ellipses in Figure 8e were partly missed by HSD but almost identified by CDSS.

Limitation of Traditional Methods
SGD-computed spectral gradient vectors as shape descriptor and difference images were generated by the difference between vectors [13]. When objects on both dates had "close to flat" spectral shape, elements in their spectral gradient vectors would be close to 0 and difference between gradient vectors would be small. SCM assumed that the Pearson's correlation between spectral In Figures 7d and 8d, changes in red ellipses were almost missed by CDSS but identified by CDSV, while in Figures 7e and 8e, changes in red ellipse were partly missed by CDSV but identified by CDSS. These changes were identified better by HSD than the corresponding methods which missed them. Overall, HSD identified more changes and had best performance. However, there were some missed changes not completely identified by HSD, for example, the changes in the ellipses in Figure 8e were partly missed by HSD but almost identified by CDSS.

Limitation of Traditional Methods
SGD-computed spectral gradient vectors as shape descriptor and difference images were generated by the difference between vectors [13]. When objects on both dates had "close to flat" spectral shape, elements in their spectral gradient vectors would be close to 0 and difference between gradient vectors would be small. SCM assumed that the Pearson's correlation between spectral In Figures 7d and 8d, changes in red ellipses were almost missed by CDSS but identified by CDSV, while in Figures 7e and 8e, changes in red ellipse were partly missed by CDSV but identified by CDSS. These changes were identified better by HSD than the corresponding methods which missed them. Overall, HSD identified more changes and had best performance. However, there were some missed changes not completely identified by HSD, for example, the changes in the ellipses in Figure 8e were partly missed by HSD but almost identified by CDSS.

Limitation of Traditional Methods
SGD-computed spectral gradient vectors as shape descriptor and difference images were generated by the difference between vectors [13]. When objects on both dates had "close to flat" spectral shape, elements in their spectral gradient vectors would be close to 0 and difference between gradient vectors would be small. SCM assumed that the Pearson's correlation between spectral vectors is negatively correlated with the degree of change between them [11]. The Pearson's correlation is calculated by Equation (3), and when spectral shape is close to flat, the denominator will be small and therefore SCM will be unstable. The most extreme case is that when spectral shape is completely flat, then the denominator in Equation (3) will be 0.
The image of HECred_S for case-1 is shown in Figure 9a, with changes in red ellipses partly missed by SGD or SCM (as shown in Figure 4). The images of HECred_S described how steep the spectral curves were for every pixel. We can find that those missed changes were dark in the image of HECred_S. The statistics of pixel values of the specific areas in the image of HECred_S were calculated in Table 4, and the specific areas were the delineated changes areas and the omission areas of SGD, SCM, CDSS. We can find that the mean values of the omission areas of SGD, SCM and CDSS were lower than the delineated changes areas in all datasets, and SGD was the lowest. Therefore, SGD and SCM tended to miss the changes between two pixels if both of their spectral curves were close to flat. spectral curves were for every pixel. We can find that those missed changes were dark in the image of HECred_S. The statistics of pixel values of the specific areas in the image of HECred_S were calculated in Table 4, and the specific areas were the delineated changes areas and the omission areas of SGD, SCM, CDSS. We can find that the mean values of the omission areas of SGD, SCM and CDSS were lower than the delineated changes areas in all datasets, and SGD was the lowest. Therefore, SGD and SCM tended to miss the changes between two pixels if both of their spectral curves were close to flat.  The CDSS assumes that the degree of change is determined by the variation in spectral values which is computed by the distance between spectral vectors [11]. When spectral values of objects on both dates are low, their distance will be small. The image of HECred_V for case-1 is shown in Figure  9b, with changes in red ellipses partly missed by CDSV (as shown in Figure 4). The images of HECred_V described how dark the spectra were for every pixel. We can find that those missed changes were dark in the image of HECred_V. The statistics of pixel values of the specific areas in the image of HECred_V were calculated in Table 5, and the specific areas were delineated the change  The CDSS assumes that the degree of change is determined by the variation in spectral values which is computed by the distance between spectral vectors [11]. When spectral values of objects on both dates are low, their distance will be small. The image of HECred_V for case-1 is shown in Figure 9b, with changes in red ellipses partly missed by CDSV (as shown in Figure 4). The images of HECred_V described how dark the spectra were for every pixel. We can find that those missed changes were dark in the image of HECred_V. The statistics of pixel values of the specific areas in the image of HECred_V were calculated in Table 5, and the specific areas were delineated the change areas and omission areas of CDSV. We can find that the mean values of the omission areas of CDSV were lower than the delineated changes areas in all datasets. Therefore, CDSV tended to miss the changes between two pixels if both were dark. Table 5. The statistics of pixel values of the specific areas in the image of HECred_V of two cases. The specific areas were the delineated change areas and the omission areas of CDSV. The accuracy of MAD in both cases were low. There were a non-negligible proportion of both scenes changed, so the results became unstable because MAD requires that a large portion of the image should be unchanged.

Discussion of CDSS
In Tables 2 and 3, we found that the accuracy of CDSS was better than SGD and SCM in both cases. Figure 10 shows the accuracy assessment curves of six algorithms for case-1, in which the x-axis pertains to the parameter m in Equation (17) and the y-axis pertains to overall accuracy and kappa coefficients. We found that the curves of CDSS rose to the peak dramatically and then declined rapidly. The maximum kappa coefficient of CDSS was slightly higher than it of SCM and SGD but dropped rapidly.
Remote Sens. 2017, 9, x FOR PEER REVIEW 16 of 22 areas and omission areas of CDSV. We can find that the mean values of the omission areas of CDSV were lower than the delineated changes areas in all datasets. Therefore, CDSV tended to miss the changes between two pixels if both were dark. Patch-2 in case-2 mean 100 124 std 32 32 The accuracy of MAD in both cases were low. There were a non-negligible proportion of both scenes changed, so the results became unstable because MAD requires that a large portion of the image should be unchanged.

Discussion of CDSS
In Tables 2 and 3, we found that the accuracy of CDSS was better than SGD and SCM in both cases. Figure 10 shows the accuracy assessment curves of six algorithms for case-1, in which the xaxis pertains to the parameter in Equation (17) and the y-axis pertains to overall accuracy and kappa coefficients. We found that the curves of CDSS rose to the peak dramatically and then declined rapidly. The maximum kappa coefficient of CDSS was slightly higher than it of SCM and SGD but dropped rapidly. CDSS was not stable and not appropriate to be used alone for change detection. Although, the optimal accuracy of CDSS was better than SGD and SCM, but the former was overly dependent on the choice of thresholds and it was difficult to find optimal threshold in practical applications.
As shown in Figure 4d, changes in red ellipses were missed by SGD, SCM and CDSS. In Figure  9a, we knew these areas had "close-to-flat" spectral shapes and change detection methods based on spectral shape tended to miss these changes.

Superiority and Limitation of HSD
One superiority of HSD is that it identified changes which were missed by CDSS and CDSV but identified by the other algorithm, and the superiority was created by the calculation of weights of CDSS was not stable and not appropriate to be used alone for change detection. Although, the optimal accuracy of CDSS was better than SGD and SCM, but the former was overly dependent on the choice of thresholds and it was difficult to find optimal threshold in practical applications.
As shown in Figure 4d, changes in red ellipses were missed by SGD, SCM and CDSS. In Figure 9a, we knew these areas had "close-to-flat" spectral shapes and change detection methods based on spectral shape tended to miss these changes.

Superiority and Limitation of HSD
One superiority of HSD is that it identified changes which were missed by CDSS and CDSV but identified by the other algorithm, and the superiority was created by the calculation of weights of two methods in the fusion. Figure 11 showed the weight images of CDSS and CDSV in case-1, which were calculated by Equations (13) and (12) and denoted by ω 2 and ω 1 respectively. ω 2 and ω 1 satisfied that ω 1 + ω 2 = 1, so the gray value of Figure 11a,b was inverted. In Figure 11, we found that those missed changes in red ellipses by CDSV or CDSS had low weights for the corresponding methods, and the methods which identified changes had higher weighs, so HSD combined their advantages in the fusion. As shown in Tables 2 and 3, the accuracy of CDSS is higher than CDSV in case-1 but lower in case-2, but HSD had highest accuracy in both cases.
Remote Sens. 2017, 9, x FOR PEER REVIEW 17 of 22 were calculated by Equations (13) and (12) and denoted by ω 2 and ω 1 respectively. ω 2 and ω 1 satisfied that ω 1 +ω 2 = 1, so the gray value of Figure 11a,b was inverted. In Figure 11, we found that those missed changes in red ellipses by CDSV or CDSS had low weights for the corresponding methods, and the methods which identified changes had higher weighs, so HSD combined their advantages in the fusion. As shown in Tables 2 and 3, the accuracy of CDSS is higher than CDSV in case-1 but lower in case-2, but HSD had highest accuracy in both cases. Another superiority of HSD is that it is highly automatic. Change detection methods have two steps in a broad sense. The first is to produce a difference image and the second is to analyze the difference image to determine the final change detection result. HSD is non-parametric in the first step, and the only parameter is the threshold in step two.

(b) weights for CDSV (a) weights for CDSS
Step two is not the focus in this paper, so a thresholding method [19,20] with a series of thresholds are used to analyze the result. Besides thresholding, many researchers used classifiers, e.g., Naïve Bayes [43] and support vector machine [44], to determine the final result. In addition, a non-parametric framework using the least squares probabilistic classifier [45] is proposed and outperforms other classifiers.
One limitation of HSD is that it could not help identify pseudo changes. The commission rate of HSD is lowest in case-1 but higher than CDSV in case-2, as shown in Tables 2 and 3. HSD was the fusion of CDSS and CDSV by weights. The perfect fusion is that if a change is missed or a pseudo change is made by one of two methods, this method will have a low weight in the fusion. However, the way we calculated the weight only solves one problem that the method had low weight when it missed changes. Hence, HSD might inherit the pseudo changes as the weight might be high in the fusion.
Another limitation of HSD is that it could not help identify changes if they were missed by both methods no matter how to calculate the weights. In this situation, spectral characteristics were not enough to find changes, and more features like texture, e.g., grey-level co-occurrence matrix (GLCM) [46] and histogram of oriented gradients (HOG) [47], should be used.
HSD can be easily implemented on objects, because all the methods in this paper care about the spectral feature, no matter whether it is a pixel's or object's spectral feature. The spectral value of an object is the mean value of all the pixels' values in the object. Object-based change detection has many challenges like the segment strategies [38]. Normally, more context information and more kinds of object features [48] are used in research works. Hence, how to combine HSD and other features like texture feature [48] or shape feature [49] is a study direction in the future. Another superiority of HSD is that it is highly automatic. Change detection methods have two steps in a broad sense. The first is to produce a difference image and the second is to analyze the difference image to determine the final change detection result. HSD is non-parametric in the first step, and the only parameter is the threshold in step two.

Discussion of Histogram Processing
Step two is not the focus in this paper, so a thresholding method [19,20] with a series of thresholds are used to analyze the result. Besides thresholding, many researchers used classifiers, e.g., Naïve Bayes [43] and support vector machine [44], to determine the final result. In addition, a non-parametric framework using the least squares probabilistic classifier [45] is proposed and outperforms other classifiers.
One limitation of HSD is that it could not help identify pseudo changes. The commission rate of HSD is lowest in case-1 but higher than CDSV in case-2, as shown in Tables 2 and 3. HSD was the fusion of CDSS and CDSV by weights. The perfect fusion is that if a change is missed or a pseudo change is made by one of two methods, this method will have a low weight in the fusion. However, the way we calculated the weight only solves one problem that the method had low weight when it missed changes. Hence, HSD might inherit the pseudo changes as the weight might be high in the fusion.
Another limitation of HSD is that it could not help identify changes if they were missed by both methods no matter how to calculate the weights. In this situation, spectral characteristics were not enough to find changes, and more features like texture, e.g., grey-level co-occurrence matrix (GLCM) [46] and histogram of oriented gradients (HOG) [47], should be used.
HSD can be easily implemented on objects, because all the methods in this paper care about the spectral feature, no matter whether it is a pixel's or object's spectral feature. The spectral value of an object is the mean value of all the pixels' values in the object. Object-based change detection has many challenges like the segment strategies [38]. Normally, more context information and more kinds of object features [48] are used in research works. Hence, how to combine HSD and other features like texture feature [48] or shape feature [49] is a study direction in the future.

Discussion of Histogram Processing
There are two places in HSD with the use of the histogram processing. The histogram equalization transformation was applied to Cred_V and Cred_S before they were used to calculate the weights, and the histogram matching transformation was adopted for DISS and DISV before the fusion of them. Figure 12a,b show the histograms of Cred_V and Cred_S after linear stretching and we can find that they have different histogram shapes. If histogram matching was adopted, for example, let Cred_S be the specified image and Cred_V be mapped to match Cred_S; then Cred_V and Cred_S would have the same histogram as in Figure 12b. Most pixel values in Figure 12b were distributed over a narrow range from 0 to 100, which might lead the most pixel values in ω 1 and ω 2 to be close to 0.5, so the weights lose their effectiveness. As a consequence, it was better to adopt histogram equalization rather than histogram matching to make Cred_V and Cred_S have the same pixel value distributions.
Remote Sens. 2017, 9, x FOR PEER REVIEW 18 of 22 There are two places in HSD with the use of the histogram processing. The histogram equalization transformation was applied to Cred_V and Cred_S before they were used to calculate the weights, and the histogram matching transformation was adopted for DISS and DISV before the fusion of them. Figure 12a,b show the histograms of Cred_V and Cred_S after linear stretching and we can find that they have different histogram shapes. If histogram matching was adopted, for example, let Cred_S be the specified image and Cred_V be mapped to match Cred_S; then Cred_V and Cred_S would have the same histogram as in Figure 12b. Most pixel values in Figure 12b were distributed over a narrow range from 0 to 100, which might lead the most pixel values in ω 1 and ω 2 to be close to 0.5, so the weights lose their effectiveness. As a consequence, it was better to adopt histogram equalization rather than histogram matching to make Cred_V and Cred_S have the same pixel value distributions.  Most pixel values of them are less than 50, and most intensity levels are not occupied or just are occupied by few pixels. As we can see in Figure 12e,f, the histograms of LSDISV and LSDISS after histogram equalization were not uniform and the narrow intervals of dark pixels of them were mapped into the upper end of the gray levels of the output images [35]. What's more, their histograms had few intensity levels occupied; hence, some changed areas might be merged into unchanged areas. As a consequence, it was better to adopt histogram matching rather than histogram equalization to make DISS and DISV have the same pixel value distributions.

Conclusions
This paper proposes a novel approach to unsupervised change detection based on hybrid spectral difference (HSD). The main idea of HSD is to combine the two types of spectral differences: (a) spectral shapes; (b) spectral values. First, this paper uses a new method referred to CDSS which fuses the difference images produced by SCM and SGD to describe the variation in spectral shapes. Second, CDSV, which computes the Euclidean distance between two spectral vectors, is used to obtain a difference image based on the variation in spectral values. Then, the credibility of two types of spectral characteristics for every pixel is calculated to describe how appropriate these two spectral characteristics are for detecting the change. Finally, the difference images produced by CDSS and CDSV are fused with the corresponding credibility to generate the hybrid spectral difference image.
Two experiments were carried out to confirm the effectiveness of HSD. Both qualitative and quantitative results indicate that HSD had superior capabilities of change detection compared with CDSS, CDSV, SCM, SGD and MAD. The accuracy of CDSS is higher than CDSV in case-1 but lower in case-2, and compared to the higher one, the overall accuracy and the kappa coefficient of HSD improved by 3.45% and 6.92% and the omission rate dropped by 4.41% in the first experiments; the overall accuracy and the kappa coefficient of HSD improved by 1.66% and 3.31% and the omission rate dropped by 4.44% in the second experiments.
CDSS tends to miss the real changes if both spectral curves are close to flat, and CDSV tends to miss the changes if both pixels are dark. HSD used the credibility to compute the weights in the fusion, and successfully detected more changes.
The advantages of the method proposed in this paper are two-fold. First, HSD is highly automatic, and the only parameter in HSD is the threshold of the difference image. Second, HSD successfully combines the advantage of CDSS and CDSV and can detect more changes.
However, some change types are difficult to detect by only the spectral features of individual pixels. To solve this problem, texture features like grey-level co-occurrence matrix (GLCM) [46] and histogram of oriented gradients (HOG) [47], which use context information, can be used in our future investigations.
Author Contributions: W.X. and Z.Z. conceived and designed the experiments; W.X. and Y.W. performed the experiments; W.X., L.Y. and Z.Z. analyzed the data; L.Y. contributed materials and analysis tools; W.X. and Z.Z. wrote the paper.