Spectrally-Spatially Regularized Low-Rank and Sparse Decomposition : A Novel Method for Change Detection in Multitemporal Hyperspectral Images

Change detection (CD) for multitemporal hyperspectral images (HSI) can be approached as classification consisting of two steps, change feature extraction and change identification. This paper is focused on binary classification of the changed and the unchanged samples, which is the essential case of change detection. Meanwhile, it is challenging to extract clean change features from heavily corrupted spectral change vectors (SCV) of multitemporal HSI. The corruptions can be characterized as gross sample-specific errors, i.e., outliers, and small entry-wise noise following Gaussian distribution. To address the issue, this paper proposes a novel Spectrally-Spatially (SS) Regularized Low-Rank and Sparse Decomposition (LRSD) model, denoted by LRSD_SS. It decomposes the SCV into three components, a locally smoothed low-rank matrix for the clean change features, a sparse matrix for the outliers and an error matrix for the small Gaussian noise. The proposed method is effective in change feature extraction and robust to noise corruptions as it exploits the underlying data structures of the SCV, especially local spectral-spatial smoothness. It is also efficient since there is a closed-form solution for the feature component in the optimization problem of LRSD_SS. The experimental results in the paper show that the proposed method outperforms several classic methods which only deal with the spectral domain of image samples, as well as some state-of-the-art methods which use both spectral and spatial information.


Introduction
Detection of land changes in multitemporal remote sensing images is useful to various areas, such as disaster monitoring, resource management and urban development [1][2][3][4].Change detection (CD) is mostly studied in two ways [5][6][7][8][9][10][11][12][13][14][15].One is to design methods specifically for CD, e.g., the well-established framework for automatic and unsupervised detection [5][6][7] using the polar-coordinates of spectral change vectors (SCV) of multitemporal images.Another exemplary method of this type proposes a novel concept, change endmember, for unsupervised identification of hierarchical changes [8].The other way is to adapt existing image pattern recognition techniques to change detection.A typical method in this category relies on fusion and classification to increase change detection accuracy [9].A family of subspace-based CD (SCD) methods regards CD as target detection [10].When approached as a classification task, CD can be accomplished in two steps, change feature extraction and change identification [9,11].The first step is to obtain features of changes, usually from the SCV of multitemporal images.The second step is to identify changed samples by these enhanced features, perhaps also specifying different kinds of changes, including the unchanged class, i.e., background [16][17][18][19].In this way, many conventional feature extraction (e.g., Principal Component Analysis (PCA) [11,19]) and classification methods (e.g., K-Means clustering [5,20]) have been directly applied to CD.Since this approach can be very inclusive and flexible, the classification framework is adopted for CD in this paper.
Recently, multitemporal hyperspectral images (HSI) have begun to attract attentions by exhibiting great potentials for CD [8,10,16,21,22].Produced as the subtraction result of bitemporal HSI, the SCV are basically a set of HSI, featuring in high spectral resolution.Thus, it is natural to detect changes by making use of the rich spectral details of the SCV as did by most existing methods [8,10,22].However, these methods neglect spatial information, which is proven very useful in analysis of HSI [23][24][25][26][27][28].Then, it is natural to expect that advanced spectral-spatial classification algorithms could have some effects on the increase of change detection accuracy.Unfortunately, there are potential hazards in directly applying these methods to HSI change detection, since they are not able to deal with the problems that particularly exist in the SCV.
One of the most serious problems is that the SCV are usually corrupted.Apart from the actual noise in HSI [29][30][31][32][33][34][35], there are many other causes for corruptions, such as misregistration, spectral variability over the spatial domain and spectral deviation over the sampling time.Take misregistration as an example [36][37][38][39][40].There might be a great spectral dissimilarity between two misaligned pixels of two different images, where it should not be if the images were properly registered.The spectral difference could be treated as change feature by a pixel-based method, thus creating false alarms and hurting detection accuracy.An adaptive SCD (ASCD) method [10] managed to deal with the misregistrations but neglected corruptions of other sources.What is worse, the amount of corruptions in one original set of temporal HSI is likely to be doubled in SCV, since the SCV is the difference between the bitemporal HSI.
To solve the problem, this paper proposes a Spectrally-Spatially Regularized Low-Rank and Sparse Decomposition (LRSD_SS) method for change feature extraction.It addresses the most essential case in HSI change detection, which is binary classification of the changed and the unchanged samples [5][6][7]10].The proposed model is built upon the reasonable assumption that the desired change features and the unwanted corruptions are separable due to their intrinsically different data structures in SCV [41][42][43][44][45][46][47][48].The motivation is analyzed in three aspects as follows.
First, being high-dimensional data, the SCV exhibit intrinsic low-rank features [41][42][43][44][45][46][47][48].This assumption is based on two facts: the number of endmembers (i.e., spectral signatures of substances) in the temporal HSI is far less than the number of bands [49,50]; and the number of real changes is far less than the number of bands [8,10].Therefore, it is safe to infer that the features of real changes, including the background, lie in some low-dimensional subspace.The background data are naturally low-rank because the unchanged samples have approximately zero SCV.
Second, the corruptions of various sources in the SCV can be characterized as gross sample specific noise or small entry-wise noise added to the clean low-rank data [42][43][44].The former is known as outliers, e.g., impulse noise and gross Gaussian noise.Since they only corrupt specific pixels, the outliers exhibit sparsity in high-dimensional data such as SCV.The latter is regarded as thermal noise, being widely spread in HSI while following the Gaussian distribution with a relatively low intensity, compared to that of the outliers.The underlying structures of the high-dimensional noisy data, i.e., SCV in this case, can be exploited by the well-established LRSD model [43] to simultaneously separate the low-rank features, the sparse outliers and the small Gaussian noise.LRSD is much more robust than its original model, PCA, as the former considers the outliers [42][43][44].Given the two assumptions above, LRSD is chosen as the basic model for designing the proposed change feature extraction method.
Third, samples of real changes in the SCV exhibit local smoothness, as ground objects within a small spatial region are likely to belong to the same class and share similar spectral signatures.Meanwhile, noise is very unlikely to be locally smooth due to its sparsity or spatial randomness.The spectral-spatial information can be crucial in differentiation of low-rank change features from low-rank noise, e.g., dead lines that are simultaneously low-rank and sparse [44,45].Therefore, the novel SS regularization is designed.It characterizes the local patterns by averaging pixel-wise similarity measurements within each square neighborhood centered at one sample of SCV.In this way, the proposed method, LRSD_SS, maintains the change features by enhancing the local spectral-spatial smoothness in the low-rank data and further suppressing the noise that could not be completely contained or removed in the noise components yielded by the original LRSD.
Apart from effectiveness, efficiency is also considered by the proposed method in the following aspects.Firstly, the SS regularization is designed to process all the spectral bands of the SCV simultaneously, which is less time-consuming than processing each band image separately, as done by a recently published total-variation (TV) [51,52] regularized low-rank model [44] denoted by LRSD_TV in this paper.Secondly, the SS regularization is formulated based on the Frobenius norm, which results in a closed-form solution for the feature data when LRSD_SS is solved using the Augmented Lagrange Multiplier (ALM) [44,53].
After the low-rank change features are extracted, they are fed to some classifier to be identified.As the temporal changes are mostly unpredictable, prior knowledge is usually scarce and unsupervised classification is adopted [54,55].In fact, many unsupervised classifiers can be used, as long as they are able to perform binary classification.This paper chooses the classic K-Means [20] since it is quite efficient while being plain enough to fully expose the effects of the previous change feature extraction methods on detection accuracy.Our experiments show that LRSD_SS has outperformed several classic algorithms (e.g., Spectral Angle Mapping (SAM) [10], PCA [11], and LRSD [43]), which only consider spectral information, as well as some of the state-of-the-art ones, which consider both spectral and spatial information (e.g., ASCD [10] and LRSD_TV [44]).
The main contributions of this paper are summarized as follows: 1.
It deals with a critical but not well solved problem with CD in HSI, which is to recover clean change data from noisy SCV of bitemporal HSI.Moreover, it addresses the issue from a relatively new angle for CD, which is to extract change features by exploiting inherent data structures exhibited in the SCV.To do so, this paper proposes a novel method, LRSD_SS, based on LRSD.Although the original LRSD has been applied in some areas of non-temporal HSI processing [38,49], it has not been used in temporal HSI change detection, to the best of our knowledge.This paper tries to fill the void by construction of LRSD_SS and demonstration of its capacity in solving the aforementioned problem of CD in multitemporal HSI.

2.
For better characterization of the underlying data structures, this paper designs a novel SS regularization superimposed on LRSD.The regularization enhances the local spectral-spatial smoothness, which normally exhibits in real change data, but is seldom observed from noise.Thus, the proposed LRSD_SS can further suppress the noise, especially those being as low-rank as the change features, which, however, has not been considered by the original LRSD.Experimental results show that LRSD_SS is robust to noise of various forms and intensities while being able to extract change features and increase change detection accuracy.

3.
This paper offers an implementation-friendly method for change detection in temporal HSI.The SS regularization processes all the bands of the SCV simultaneously and yields a closed form solution for the smoothed low-rank data matrix, thus ensuring efficiency.
The rest of the paper is organized as follows.Section 2 provides background knowledge on low-rank decomposition.Section 3 presents the proposed method, LRSD_SS, in full details.Section 4 analyzes experimental results.Finally, Section 5 concludes the paper.

The LRSD Model
LRSD [43] is sometimes referred to as Low-Rank Matrix Recovery (LRMR) [42].The model is derived from robust PCA (RPCA), which originates from PCA [19,[41][42][43][44][45][46][47].It is designed to separate both gross sample-specific outliers and small entry-wise Gaussian noise from clean low-rank data, thus being more robust than PCA in low-rank approximation and denoising [19,42,43,47].Suppose Y ∈ R M×B (B ≤ M) is an observed high-dimensional matrix corrupted by noise, where M and B are the number of samples and the dimension of each sample, respectively.The LRSD model of Y is formulated as where L ∈ R M×B is the low-rank data, S ∈ R M×B holds the outliers which exhibit sparsity, and N ∈ R M×B represents the small entry-wise noise which are i.i.d.Gaussian.
In the case of change detection, Y is SCV of bitemporal HSI T 1 , T 2 ∈ R M×B while M and B are the number of pixels and the number of bands, respectively.L maintains the change features.The reasons for choosing LRSD instead of other low-rank decomposition models for change detection are manifold.First, LRSD accurately models the intrinsic data structures of the SCV.Second, LRSD deals with both small noise and gross outliers, which widely exist in the SCV.Plus, it can be solved efficiently by bilateral random projections (BRP) [43,45,56,57].
where λ is a constant for the trade-off between rank(L) and ||S|| 0 .Equation (2) can be rewritten as where r and k stand for the upper bound of the rank of L and the cardinality of S, respectively [37].
Their values are set a priori.Since Equation ( 2) is highly nonconvex with no efficient solution, a tractable optimization problem is obtained by relaxing Equation (2), i.e., replacing the rank(•) and the 0 -norm in Function (2) with * -norm (nuclear norm) and 1 -norm, respectively [43,58,59], where ε is a constant related to the standard deviation of the random noise N.

Spatial Regularization
Let ||•|| Spa be an arbitrary spatial regularization.By the maximum a posteriori (MAP) estimation theory [35,44], ||•|| Spa can be superimposed on Equation (4), resulting in a spatially regularized optimization function as follows where X ∈ R M×B is a set of spatially smoothed low-rank data and τ is a constant for trade-off and X is introduced to solve L conveniently by iterations. and is amplitude vector of change features.

The Proposed Model: LRSD_SS
The main idea of the proposed method, LRSD_SS, is to extract change features by exploiting intrinsic data structures in SCV of bitemporal HSI.The features of real changes and background are assumed to be lying in a low-dimensional subspace, thus being low-rank, compared to the original high-dimensional SCV.Meanwhile, image corruptions in SCV may lead to false changes and must be removed from the low-rank change features of interest.They can be characterized as gross sample-specific corruptions, i.e., outliers, or small entry-wise noise added to the clean, low-rank data.The former exhibits sparsity while the latter behaves as thermal noise and follows Gaussian distribution.Therefore, we use the LRSD model in Equation (1), which is able to simultaneously extract low-rank features ( L ) and remove sparse outliers ( S ) and Gaussian noise ( N ).
However, the spectral data structures of the change features and those of the corruptions may not always be totally different.Some noise can be simultaneously low-rank and sparse, for instance, dead lines located at the same row or column in all the band images.In that case, conventional LRSD algorithms may fail to recover the clean data [44,45].Since the spectral information alone is not enough for clean data recovery, spatial information is engaged.It is assumed that the real change features exhibit local spectral-spatial smoothness while the noise does not.This data structure can help in eliminating the aforementioned noise.We design a spectral-spatial (SS) regularization for LRSD to exploit the local patterns and enhance the change features, thus proposing the LRSD_SS model.The SS regularization is defined as follows, where , i.e., the squared Euclidean distance between pixel x ij and pixel x is located at the i th row and the j th column of the feature data , whereas Θ is an operator that reshapes a 1 IJ × vector into a I J × matrix.
For fixed i , j and w ,

The Proposed Model: LRSD_SS
The main idea of the proposed method, LRSD_SS, is to extract change features by exploiting intrinsic data structures in SCV of bitemporal HSI.The features of real changes and background are assumed to be lying in a low-dimensional subspace, thus being low-rank, compared to the original high-dimensional SCV.Meanwhile, image corruptions in SCV may lead to false changes and must be removed from the low-rank change features of interest.They can be characterized as gross sample-specific corruptions, i.e., outliers, or small entry-wise noise added to the clean, low-rank data.The former exhibits sparsity while the latter behaves as thermal noise and follows Gaussian distribution.Therefore, we use the LRSD model in Equation (1), which is able to simultaneously extract low-rank features (L) and remove sparse outliers (S) and Gaussian noise (N).
However, the spectral data structures of the change features and those of the corruptions may not always be totally different.Some noise can be simultaneously low-rank and sparse, for instance, dead lines located at the same row or column in all the band images.In that case, conventional LRSD algorithms may fail to recover the clean data [44,45].Since the spectral information alone is not enough for clean data recovery, spatial information is engaged.It is assumed that the real change features exhibit local spectral-spatial smoothness while the noise does not.This data structure can help in eliminating the aforementioned noise.We design a spectral-spatial (SS) regularization for LRSD to exploit the local patterns and enhance the change features, thus proposing the LRSD_SS model.The SS regularization is defined as follows, where , i.e., the squared Euclidean distance between pixel x ij and pixel x w ij x ij is located at the ith row and the jth column of the feature data ΘX ∈ R I×J×B , whereas Θ is an operator that reshapes a I J × 1 vector into a I × J matrix.x w ij is a sample within the W × W neighborhood centered at x ij while x w ij = x ij .Let W be an odd number and (i n , j n ) be the spatial location of x w ij in ΘX, For fixed i, j and w, ω w ij is a constant weighing the dissimilarity d w ij between x w ij and x ij , which can be adjusted a priori so as to avoid over-smoothness.Figure 2 illustrates the generation of the neighborhood and the weights for computation of Equation ( 6) when W = 3.As shown in the figure, x ij can be denoted by x 0 ij and no weight is needed for With the proposed SS regularization in (6) being substituted for ||•|| Spa in Equation ( 5), the optimization problem of LRSD_SS is obtained as follows, Since ||•|| SS is particularly designed based on the Frobenius norm, the smoothed images X in Equation ( 8

Optimization for LRSD_SS
The optimization problem formulated by Equation ( 8) is solved by the ALM method in this paper.ALM is adopted mainly because of its excellent convergence property and simple implementation [53], which are not typical with other methods used for solving low-rank decomposition models, such as the Alternating Direction Method of Multipliers (ADMM) [60,61].To be specific, the inexact ALM algorithm is implemented since the validity and optimality of the algorithm is theoretically guaranteed [53].It has also been proved practical as He et al. [44] successfully solved their TV-regularized low-rank decomposition model for HSI denoising by the algorithm.Within the ALM framework, Equation ( 8) can be rewritten as follows, ( ) ( ) where 1 Λ and 2 Λ are Lagrange multipliers.μ is a penalty parameter.
Since there are more than one unknown variables in (8), they are solved alternatively and iteratively by optimizing Equation ( 8) over one variable while fixing the others.Let t be the iteration counter.Derived from Equation ( 8), the optimization function to update L is as follows [44],

Optimization for LRSD_SS
The optimization problem formulated by Equation ( 8) is solved by the ALM method in this paper.ALM is adopted mainly because of its excellent convergence property and simple implementation [53], which are not typical with other methods used for solving low-rank decomposition models, such as the Alternating Direction Method of Multipliers (ADMM) [60,61].To be specific, the inexact ALM algorithm is implemented since the validity and optimality of the algorithm is theoretically guaranteed [53].It has also been proved practical as He et al. [44] successfully solved their TV-regularized low-rank decomposition model for HSI denoising by the algorithm.Within the ALM framework, Equation ( 8) can be rewritten as follows, where Λ 1 and Λ 2 are Lagrange multipliers.µ is a penalty parameter.
Since there are more than one unknown variables in (8), they are solved alternatively and iteratively by optimizing Equation ( 8) over one variable while fixing the others.Let t be the iteration counter.Derived from Equation ( 8), the optimization function to update L is as follows [44], In this paper, Bilateral Random Projections (BRP) is adopted to estimate L, which has been fully developed and successfully applied in an algorithm named Go Decomposition (GoDec) algorithm to solve the LRSD model [42][43][44]56,57].
and H = HH T q H.As suggested by the power scheme [43], BRP is applied to H instead of H, since the singular values of H decay faster than those of H.The BRP results of H are Z 1 and Z 2 , where A 1 ∈ R B×r and A 2 ∈ R M×r are two random matrices, and r is the upper bound of the rank of L. Given that, L (t+1) in Equation ( 10) is estimated as follows [43], where Conventionally, L in Equation ( 10) is solved by Singular Value Decomposition (SVD) [43,44,62].The BRP technique used here is basically an efficient approximation algorithm for SVD.For solving the same L ∈ R M×B , SVD requires min M 2 B, MB 2 flops while BRP requires r 2 (M + 3B + 4r) + (4q + 4)MBr [43].When r << B << M, BRP is more time-saving than SVD.Moreover, the relative error of the BRP approximation can be very close to that of SVD if q is sufficiently large, say, q ≥ 3 [57].To be noted, r and q are set a priori.Derived from Equation ( 8), X is updated as follows, where T , whereas x w m is a sample within the neighborhood of x m defined in Equation ( 6) and m = (j − 1) × I + i.Thus, Let where , q m and 1 ∈ R W 2 ×1 is a vector with all the elements equal to 1.It should be noted that X (t+1) is solved by Equation ( 14) in a closed form.The computation complexity of updating According to Equation ( 8), the optimization function of S is written as follows, Given the soft-thresholding (shrinkage) technique [44,63], where the shrinkage operator is defined as After L (t+1) and S (t+1) are obtained, the Gaussian noise N (t+1) in Equation ( 1) is updated by The procedures of LRSD_SS are summarized in Algorithm 1, whereas L or X is the output change feature matrix.The technique of updating the parameter µ in every iteration can reinforce the convergence of the ALM-based algorithms [44,53].Apart from the iteration number t, two types of estimation errors are used as stopping criteria for LRSD_SS, which are defined as follows, The upper bounds of Error1 and Error2 are ε 1 and ε 2 , respectively, whereas ε 1 > 0 and ε 2 > 0.

Implementation
As shown in Figure 1, the proposed LRSD_SS is used as a change feature extraction method for the first step of CD.Let Y be the SCV of bitemporal HSI, M is the number of pixels and B is the number of bands.This type of SCV is widely used [5][6][7].Y is input to LRSD_SS in Algorithm 1, which separates the locally smoothed low-rank data (L), outliers (S) and small entry-wise Gaussian noise (N).The proposed method is expected to enhance the real change features by maintaining L, while suppressing the unwanted false changes by removing S and N. Therefore, LRSD_SS makes it easier to determine whether a sample is changed or unchanged in the next step.
The binary classification task is completed by K-Means [5,20].Let A ∈ R M×1 hold the amplitude of each sample in L, whereas A(m) = ||L(m, :)|| F and m = 1, 2, • • • , M. A, instead of L, is fed to the classifier, because the spectral amplitude alone is sufficient for the binary classification required by CD [5][6][7].As an efficient unsupervised method, K-Means is chosen for change identification since the temporal changes can be quite unexpected and prior knowledge is usually scarce [54,55].The main procedures of LRSD_SS + K-Means for CD can be found in Figure 1.The first step is illustrated in Figure 3. First, three identical copies of the original Pavia HSI are created, denoted as T 10 , T temp and T 20 .Then, three parts of T temp are selected as "source" areas and five parts of T 20 are selected as "target" areas.The pixels in one "target" area are replaced with those in one "source" area, as indicated by the arrows in Figure 3.If the sizes of two areas are not the same, the pixels in the "source" are randomly replicated or removed to fill in the "target" area exactly right.Thereby, T 20 is synthesized with five kinds of temporal changes with respect to T 10 .It is made sure that most changed samples in SCV of T 10 and T 20 have far-above-zero spectral amplitudes, so that they are distinguishable from the unchanged ones.The SCV is denoted by Y 0 and computed as Y 0 = T 10 − T 20 , which is used as clean data for reference in some of our experiments.The false color composition of T 10 and T 20 are presented in Figure 4a,b, respectively.The ground truth map indicating the changed areas and the unchanged background is given by Figure 4c. ) simulation and corruption simulation [42,44], as explained in the following.The first step is illustrated in Figure 3. First, three identical copies of the original Pavia HSI are created, denoted as 10 T , temp T and 20 T .Then, three parts of temp T are selected as "source" areas and five parts of 20 T are selected as "target" areas.The pixels in one "target" area are replaced with those in one "source" area, as indicated by the arrows in Figure 3.If the sizes of two areas are not the same, the pixels in the "source" are randomly replicated or removed to fill in the "target" area exactly right.Thereby, 20 T is synthesized with five kinds of temporal changes with respect to 10 T .

4
It is made sure that most changed samples in SCV of 10 T and 20 T have far-above-zero spectral amplitudes, so that they are distinguishable from the unchanged ones.The SCV is denoted by 0 Y and computed as 0 The ground truth map indicating the changed areas and the unchanged background is given by Figure 4c.In the second step, artificial noise is superimposed on 10 T and 20 T to simulate corruptions of various causes, as explained in Section 1.There are mainly two kinds of noise generated: small entry-wise Gaussian noise and gross sample-specific errors (outliers).Three types of outliers are simulated: Gaussian noise with high intensity, impulse noise and dead lines.By different combination of the artificial noise, ten types of corrupted bitemporal HSI, 1 T and 2 T , are created.
All the simulated datasets are normalized.Each type of the artificial noise and the bitemporal HSI are described in details as follows.In the second step, artificial noise is superimposed on T 10 and T 20 to simulate corruptions of various causes, as explained in Section 1.There are mainly two kinds of noise generated: small entry-wise Gaussian noise and gross sample-specific errors (outliers).Three types of outliers are simulated: Gaussian noise with high intensity, impulse noise and dead lines.By different combination of the artificial noise, ten types of corrupted bitemporal HSI, T 1 and T 2 , are created.All the simulated datasets are normalized.Each type of the artificial noise and the bitemporal HSI are described in details as follows.
Remote Sens. 2017, 9, 1044 10 of 21 ), superimposed on 20 randomly selected bands of some randomly selected pixels, which occupy % η of all the pixels; • Noise 3: impulse noise with an intensity of the unitary distribution, superimposed on 20 randomly selected bands of some randomly selected pixels, which occupy 0.5% of all the pixels; and

1.
Noise: • Noise 1: random Gaussian noise with a low intensity (σ 2 L ), superimposed on all the bands of all the pixels; • Noise 2: random Gaussian noise with a high intensity (σ 2 H = 0.5), superimposed on 20 randomly selected bands of some randomly selected pixels, which occupy η% of all the pixels; • Noise 3: impulse noise with an intensity of the unitary distribution, superimposed on 20 randomly selected bands of some randomly selected pixels, which occupy 0.5% of all the pixels; and • Noise 4: four dead lines with zero intensity, two of them superimposed on 20 randomly selected bands of all the pixels in two randomly selected rows and the other two superimposed on 20 randomly selected bands of all the pixels in two randomly selected columns.

Efficacy
Table 1 shows the change detection results of each feature extraction method for each simulated data type, whereas K-Means is adopted for binary classification.It can be seen that the proposed LRSD_SS outperforms all the other methods for all the datasets.Figure 6a-h gives an exemplary

Setup
The proposed LRSD_SS described in Algorithm 1 is compared with two conventional method, SAM [10] and PCA [11], one classic low-rank decomposition method, LRSD [43], one state-of-the-art method based on subspace projection, ASCD [10], and one state-of-the-art method based on low-rank decomposition, LRSD_TV [44].ASCD is an upgraded version of SAM as the former considers local spatial information but the latter does not.These methods exploit the projection distance between T 1 and T 2 .The LRSD-based methods use Y as SCV.The TV regularization in LRSD_TV is solved by the toolbox provided by its original publisher (http://iew3.technion.ac.il/~becka/papers/tv_fista.zip), which was successfully adopted by reference [44] for HSI denoising.For fairness, all the LRSD-based methods engage BRP to estimate L. The original LRSD is realized by the GoDec algorithm [43].
For each type of the simulated data, as there are five kinds of changes and one kind of unchanged background, the upper bound of rank r is fixed at 6 for LRSD_SS, LRSD_TV, LRSD and PCA.For the real-world data, as there are approximately six kinds of changes (including the background), r is also is fixed at 6.The parameter µ 0 of LRSD_SS is adjusted within (0.4, 1) and tailored to each type of the bitemporal HSI.Other parameters of LRSD_SS are fixed for all the datasets as follows, τ = 0.01, q = 3, µ max = 10 6 , ρ = 1.05, λ = 1/ √ M, ε 1 = ε 2 = 10 −6 and t max = 30.The weights ω w (w = 1, 2, • • • , W 2 − 1) defined in Equation ( 6) for the SS regularization are set by whereas the neighborhood size W is 3.The major parameters required by LRSD_TV, LRSD or PCA are delicately tailored to each type of data.
All the feature extraction methods are followed by K-Means (MATLAB 2013b toolbox) [20] to distinguish the changed samples from the unchanged in an unsupervised fashion.The parameters of K-Means are default.As to provide baseline results, A of Y is directly fed to K-Means.Since HSI change detection is regarded as classification, the final results are evaluated by overall accuracy (OA), average accuracy (AA) and Kappa coefficient (κ), which are standard criteria for classification evaluation.All the algorithms are implemented in MATLAB R2013b running on a workstation with an Intel(R) Xeon(R) CPU X5667 @ 3.00 GHz (dual core) and 48.0 GB of RAM.

Efficacy
Table 1 shows the change detection results of each feature extraction method for each simulated data type, whereas K-Means is adopted for binary classification.It can be seen that the proposed LRSD_SS outperforms all the other methods for all the datasets.Figure 6a-h gives an exemplary result of each method for Data 3, while Figure 7a-h does for Data 10.These figures show that LRSD_SS can produce a nice and smooth label map, with most errors removed and spatial structures maintained while outliers and small Gaussian noise are present.In addition, it can be inferred that the proposed method is robust to corruptions.
The effectiveness of LRSD_SS is also validated by the real-world dataset, however, this time, LRSD_SS only slightly outperforms the others, as shown in Table 2. Since the dataset contains large areas of nice and clean background, it is not challenging enough for most existing change detection methods.In the future, we shall find some more complex HSI to test our method.
Compared to the methods based on low-rank approximation in Table 1, SAM and ASCD lack robustness to noise.It can be easily seen in Figure 7 that SAM/ASCD fails to detect the changes, whereas the bitemporal HSI are heavily corrupted by all types of the simulated noise.These projection-distance methods directly use the spectral distance between a target sample in T 1 and a "background subspace" spanned by samples in T 2 , as the change feature.The measurement is not very robust to noise, even though local spatial information is considered by ASCD.(a To further evaluate the performances of change feature extraction, 4 columns of samples in L produced by LRSD_SS, LRSD_TV and LRSD tested on Data 10 are analyzed.The samples at these columns of Y are corrupted by all types of the simulated noise.As presented in Figure 8a-h As can be inferred, the proposed method is effective in removal of the unwanted corruptions and extraction of the desired low-rank and locally smooth features, especially the clean background.In the binary classification, the changed areas are naturally detected as long as the background is identified.Thus, LRSD_SS can be well applied in HSI change detection.To further evaluate the performances of change feature extraction, 4 columns of samples in L produced by LRSD_SS, LRSD_TV and LRSD tested on Data 10 are analyzed.The samples at these columns of Y are corrupted by all types of the simulated noise.As presented in Figure 8a-h  All the samples at these locations in Y are corrupted by all the types of simulated noise.On each box, the central mark represents the median, the edges are the 25th and the 75th percentiles, and the whiskers extend to the most extreme data points not considered as "outliers" defined by function "boxplot" in MATLAB 2013b (these "outliers" are not the same as those defined by LRSD, except for the name).The maximum whisker length is 1.5.

Efficiency
Table 3 shows the time consumption in solving each component of different LRSD-based models per iteration.Each given value is the average time cost by a low-rank decomposition method tested on Data 3, Data 5, Data 6 and Data 10.Within the same ALM framework, LRSD_TV is nearly 47 times slower than the proposed LRSD_SS in optimization of the locally smoothed low-rank data X .While the SS regularization in LRSD_SS processes all the bands of SCV simultaneously, characterizes spectral-spatial features and produces a closed-form solution for X , the TV regularization in LRSD_TV has to be applied to each band of SCV since the TV method is originally designed for two-dimensional images [35,51,52].Among the three algorithms in Table 3, the default LRSD is the fastest since it does not involve any additional regularizations [43].
Error , defined in Equations ( 18) and ( 19), respectively, are tracked with respect to iterations when LRSD_SS is tested on Data 3, Data 5, Data 6 and Data 10. Figure 10a-l present one set of the exemplary results for each case.Data 3, Data 5 and Data 6 are analyzed because the bitemporal HSI in each of these cases include a unique type of the outliers described in Section 4.1.Data 10 is selected since the bitemporal HSI in this case are corrupted by all types of the simulated noise.Figure 10 shows that the curves of the optimization errors against the iteration number become flat after about 20 iterations in whichever the cases.It also indicates that the OA of LRSD_SS stops increasing approximately around the iteration whereas the errors stop decreasing.With the average time cast per run presented in Table 3, it can be inferred that LRSD_SS does not require much time to yield the optimal result for change detection.Therefore, the proposed method can be considered quite practical.On each box, the central mark represents the median, the edges are the 25th and the 75th percentiles, and the whiskers extend to the most extreme data points not considered as "outliers" defined by function "boxplot" in MATLAB 2013b (these "outliers" are not the same as those defined by LRSD, except for the name).The maximum whisker length is 1.5.
As can be inferred, the proposed method is effective in removal of the unwanted corruptions and extraction of the desired low-rank and locally smooth features, especially the clean background.In the binary classification, the changed areas are naturally detected as long as the background is identified.Thus, LRSD_SS can be well applied in HSI change detection.

Efficiency
Table 3 shows the time consumption in solving each component of different LRSD-based models per iteration.Each given value is the average time cost by a low-rank decomposition method tested on Data 3, Data 5, Data 6 and Data 10.Within the same ALM framework, LRSD_TV is nearly 47 times slower than the proposed LRSD_SS in optimization of the locally smoothed low-rank data X.While the SS regularization in LRSD_SS processes all the bands of SCV simultaneously, characterizes spectral-spatial features and produces a closed-form solution for X, the TV regularization in LRSD_TV has to be applied to each band of SCV since the TV method is originally designed for two-dimensional images [35,51,52].Among the three algorithms in Table 3, the default LRSD is the fastest since it does not involve any additional regularizations [43].The optimization errors of LRSD_SS, Error1 and Error2, defined in Equations ( 18) and ( 19), respectively, are tracked with respect to iterations when LRSD_SS is tested on Data 3, Data 5, Data 6 and Data 10. Figure 10a-l present one set of the exemplary results for each case.Data 3, Data 5 and Data 6 are analyzed because the bitemporal HSI in each of these cases include a unique type of the outliers described in Section 4.1.Data 10 is selected since the bitemporal HSI in this case are corrupted by all types of the simulated noise.Figure 10 shows that the curves of the optimization errors against the iteration number become flat after about 20 iterations in whichever the cases.It also indicates that the OA of LRSD_SS stops increasing approximately around the iteration whereas the errors stop decreasing.With the average time cast per run presented in Table 3, it can be inferred that LRSD_SS does not require much time to yield the optimal result for change detection.Therefore, the proposed method can be considered quite practical.(i) OA in an exemplary case of Data 6; and (j) Error ; and (l) OA in an exemplary case of Data 10.In each graph, the x-axis represents iteration number and the y-axis indicates evaluation criteria.OA is short for overall accuracy.

Discussion
From the analysis above, the strengths of LRSD_SS can be summarized as follows: 1. LRSD_SS effectively addresses the essential issue of change detection in multitemporal HSI by exploiting inherent data structures of SCV.With the proposed SS regularization which maintains the local patterns in the SCV, LRSD_SS can further characterize the nature of the change features, making it easier to separate the features from various types of noise and identify the changes.It is especially effective in recovering the clean background.Therefore, in In each graph, the x-axis represents iteration number and the y-axis indicates evaluation criteria.OA is short for overall accuracy.

Discussion
From the analysis above, the strengths of LRSD_SS can be summarized as follows: 1. LRSD_SS effectively addresses the essential issue of change detection in multitemporal HSI by exploiting inherent data structures of SCV.With the proposed SS regularization which maintains the local patterns in the SCV, LRSD_SS can further characterize the nature of the change features, making it easier to separate the features from various types of noise and identify the changes.It is especially effective in recovering the clean background.Therefore, in the binary classification of the changed and the unchanged samples, LRSD_SS yields better results than many classic or state-of-the-art methods do.

2.
The proposed method is also efficient.The SS regularization extracts spectral-spatial features, processing all the bands of SCV simultaneously.Thus, it is much faster than the TV regularization on the LRSD model, which processes each band image separately.Moreover, the design of SS contributes a closed-form solution for the feature data within the ALM framework, so that the algorithm of LRSD_SS is implementation-friendly.As can be seen from the experimental results, only a few iterations are needed.
Since the proposed method has been proved useful to binary change detection, it will be further explored and adapted to multiple change detection in our future research.In addition, supervised or semi-supervised classification methods will be employed to increase detection accuracy.

Conclusions
For better change detection in HSI, this paper considers change detection as a classification problem and proposes a novel method, LRSD_SS.It designs a unique spectral-spatial regularization superimposed on a well-established LRSD model and exploits intrinsic structures of different data components, especially the local smoothness.Since LRSD_SS deals with spectral and spatial information simultaneously, it is effective in change feature extraction and able to increase accuracy in change identification.LRSD_SS is also efficient, as there exists a closed-form solution for the change features within the ALM framework.Experiments on various types of bitemporal HSI have shown the advantages of LRSD_SS, compared to many other methods for change feature extraction.For the synthetic data corrupted by all types of the simulated noise (Data 10), LRSD_SS outperforms the state-of-the-art methods, LRSD_TV and ASCD, by 2.14 percentage points and 27.45 percentage points in terms of OA, respectively.It is also demonstrated that the time consumption of LRSD_SS is about 47 times less than that of LRSD_TV in solving the change feature matrix.For follow-up work, we shall acquire some more real-world datasets to further exam the proposed method and modify the design of the SS regularization so that it could be adapted to detection of multiple changes in multitemporal HSI.

21 Figure 1 .
Figure 1.A framework for change detection in HSI, where 1

Figure 1 .
Figure 1.A framework for change detection in HSI, where T 1 ∈ R M×B and T 2 ∈ R M×B are bitemporal hyperspectral images acquired at time 1 and time 2, respectively; Y ∈ R M×B is SCV of T 1 and T 2 ; and A ∈ R M×1 is amplitude vector of change features.

Figure 2 .
Figure 2.An illustration of the proposed Spectral-Spatial (SS) regularization, with: (a) W W × neighborhood centered by pixel x ij ( 0 ij x ); and (b) weights superimposed on
.1.Datasets 4.1.1.Simulated Bitemporal Hyperspectral Data Ten sets of simulated bitemporal HSI are used for performance evaluation of the proposed LRSD_SS.Each dataset is created from the same non-temporal HSI of 610 × 340 pixels, captured over the University of Pavia, Italy in 2001 by the ROSIS instrument (http://www.ehu.es/ccwintco/uploads/e/ee/PaviaU.mat).After removing the bands seriously affected by noise or water absorption, 103 spectral channels are left.Thus, I = 610, J = 340, M = I × J = 207, 400 and B = 103.The Pavia dataset makes the CD task very challenging as it covers several (at least nine) different kinds of ground objects and exhibits spectral variability.The simulation of T 1 and T 2 is carried out in two steps: clean bitemporal HSI (T 10 , T 20 ∈ R M×B ) simulation and corruption simulation[42,44], as explained in the following.

1 .
Simulated Bitemporal Hyperspectral DataTen sets of simulated bitemporal HSI are used for performance evaluation of the proposed LRSD_SS.Each dataset is created from the same non-temporal HSI of 610 × 340 pixels, captured over the University of Pavia, Italy in 2001 by the ROSIS instrument (http://www.ehu.es/ccwintco/uploads/e/ee/PaviaU.mat).After removing the bands seriously affected by noise or water absorption, 103 spectral channels are left.Thus, dataset makes the CD task very challenging as it covers several (at least nine) different kinds of ground objects and exhibits spectral variability.The simulation of 1 T and 2 T is carried out in two steps: clean bitemporal HSI ( 10 20 , , which is used as clean data for reference in some of our experiments.The false color composition of 10 T and 20 T are presented in Figure 4a,b, respectively.

Figure 5 .
Figure 5. Real-world bitemporal data, including false color composition by bands 20, 40 and 60 of: (a) images acquired on 3 May 2006; (b) images acquired on 23 April 2007; and (c) ground truth map with changed areas in white and unchanged areas (background) in black.

Figure 5 .
Figure 5. Real-world bitemporal data, including false color composition by bands 20, 40 and 60 of: (a) images acquired on 3 May 2006; (b) images acquired on 23 April 2007; and (c) ground truth map with changed areas in white and unchanged areas (background) in black.
, these samples in L yielded by LRSD_SS are the most similar to those at the same columns in the clean data 0 Y , in terms of spectral signature or spectral amplitude.It indicates that LRSD_SS is better than the other LRSD-based methods in recovering clean data and suppressing noise.The box plots in Figure 9a,b illustrate that LRSD_SS makes the centers of the change and the non-change clusters the farthest apart than the other LRSD-based methods.The non-change cluster becomes nicely compact after LRSD_SS.Therefore, it will be easy to classify the changed samples and the unchanged ones based on the change features produced by LRSD_SS.
, these samples in L yielded by LRSD_SS are the most similar to those at the same columns in the clean data Y 0 , in terms of spectral signature or spectral amplitude.It indicates that LRSD_SS is better than the other LRSD-based methods in recovering clean data and suppressing noise.The box plots in Figure9a,b illustrate that LRSD_SS makes the centers of the change and the non-change clusters the farthest apart than the other LRSD-based methods.The non-change cluster becomes nicely compact after LRSD_SS.Therefore, it will be easy to classify the changed samples and the unchanged ones based on the change features produced by LRSD_SS.

Figure 8 .
Figure 8.In an exemplary case of Data 10: (a) spectral signatures and (b) amplitudes of the samples at column 84; (c) spectral signatures and (d) amplitudes of the samples at column 170; (e) spectral signatures and (f) amplitudes of the samples at column 286; and (g) spectral signatures and (h) amplitudes of the samples at column 299 in Y , 0 Y and L produced by LRSD_TV/LRSD_SS/LRSD.

Figure 8 .Figure 9 .
Figure 8.In an exemplary case of Data 10: (a) spectral signatures and (b) amplitudes of the samples at column 84; (c) spectral signatures and (d) amplitudes of the samples at column 170; (e) spectral signatures and (f) amplitudes of the samples at column 286; and (g) spectral signatures and (h) amplitudes of the samples at column 299 in Y, Y 0 and L produced by LRSD_TV/LRSD_SS/LRSD.All the samples at these locations in Y are corrupted by all the types of simulated noise.

Figure 9 .
Figure 9.In an exemplary case of Data 10, box plots of: (a) spectral amplitudes of all the samples in Y, Y 0 and L produced by LRSD_SS/ LRSD_TV/ LRSD; and (b) spectral values of these samples in band 92.On each box, the central mark represents the median, the edges are the 25th and the 75th percentiles, and the whiskers extend to the most extreme data points not considered as "outliers" defined by function "boxplot" in MATLAB 2013b (these "outliers" are not the same as those defined by LRSD, except for the name).The maximum whisker length is 1.5.

Figure 10 .
Figure 10.Performance curves of LRSD_SS: (a) Error1; (b) Error2; and (c) OA in an exemplary case of Data 3; (d) Error1; (e) Error2; and (f) OA in an exemplary case of Data 5; (g) Error1; (h) Error2; (i) OA in an exemplary case of Data 6; and (j) Error1; (k) Error2; and (l) OA in an exemplary case of Data 10.In each graph, the x-axis represents iteration number and the y-axis indicates evaluation criteria.OA is short for overall accuracy.

Table 1 .
Evaluation of change detection results for each type of the simulated bitemporal HSI.

Table 2 .
Evaluation of change detection results for the real-world bitemporal HSI.

Table 3 .
Time cost (in seconds) of three LRSD-based change feature extraction methods.
1Error and

Table 3 .
Time cost (in seconds) of three LRSD-based change feature extraction methods.