Robust Hyperspectral Image Classification by Multi-Layer Spatial-Spectral Sparse Representations

Sparse representation (SR)-driven classifiers have been widely adopted for hyperspectral image (HSI) classification, and many algorithms have been presented recently. However, most of the existing methods exploit a single-layer hard assignment based on class-wise reconstruction errors under the subspace assumption; moreover, the single-layer SR is biased and less stable due to the high coherence of the training samples. In this paper, motivated by category sparsity, a novel multi-layer spatial-spectral sparse representation (mlSR) framework for HSI classification is proposed. The mlSR assignment framework effectively classifies the test samples based on adaptive dictionary assembling in a multi-layer manner and the intrinsic class-dependent distribution. In the proposed framework, three algorithms, multi-layer SR classification (mlSRC), multi-layer collaborative representation classification (mlCRC) and multi-layer elastic net representation-based classification (mlENRC) for HSI, are developed. All three algorithms can achieve a better SR for the test samples, which benefits HSI classification. Experiments are conducted on three real HSI datasets. Compared with several state-of-the-art approaches, the increases in overall accuracy (OA), kappa and average accuracy (AA) on the Indian Pines image range from 3.02% to 17.13%, 0.034 to 0.178 and 1.51% to 11.56%, respectively. The improvements in OA, kappa and AA for the University of Pavia are from 1.4% to 21.93%, 0.016 to 0.251 and 0.12% to 22.49%, respectively. Furthermore, the OA, kappa and AA for the Salinas image can be improved from 2.35% to 6.91%, 0.026 to 0.074 and 0.88% to 5.19%, respectively. This demonstrates that the proposed mlSR framework can achieve comparable or better performance than the state-of-the-art classification methods.


Introduction
The quantitative information provided by high-resolution sensors is helpful to distinguish between land cover classes with different spectral responses. Hyperspectral image (HSI) classification remains one of the most challenging problems due to within-class variation and spatial details [1][2][3][4][5][6]. In the past few decades, significant efforts have been made to develop various classification methods. For example, there have been a variety of studies that utilize spatial-spectral information for HSI classification [7,8]. However, most of the previous works mainly focus on various dense feature extractions, such as Gabor, patch-based [9], scale-invariant feature transform (SIFT) [10], local binary pattern (LBP) [11], local Gabor binary pattern (LGBP) [12], random projection (RP) [13] and bag-of-visual-words (BOVW) [14] features, where the extracted features are fed into a k-nearest neighbor (K-NN), support vector machine (SVM) or Markov random field (MRF) [15] classifier to perform HSI classification. Besides, some feature-matching methods [16,17] from the computer vision area can also be generalized for HSI classification, but they have the prerequisite that the spectral features be extracted in advance. Moreover, these local features may be contradictory because they overlap with each other, resulting in less contribution to the classifiers.
Recently, researchers have exploited sparse representation (SR) techniques for HSI classification and other computer vision applications, e.g., [18,19]. Sparse representation classification (SRC) assumes that the input samples of the same class lie in a class-dependent low-dimensional subspace, so that a test sample can be sparsely represented as a linear combination of the labeled samples via ℓ1 regularization. Different from the conventional classifiers mentioned above, it does not require training, and the class label of a test sample is determined as the class whose dictionary atoms provide the minimal approximation error. Although SRC has achieved promising results in HSI classification [20,21], it suffers from unstable representation coefficients across multiple classes, especially with similar input features. Subsequently, kernelized SRC (KSRC) [22] and structured sparse priors, such as the Laplacian-regularized Lasso [23] and low-rank group Lasso [24], were presented for HSI classification, and improved accuracies were reported. Later on, collaborative representation classification (CRC) via ℓ2 regularization was introduced in face recognition, where it was demonstrated that CRC can achieve comparable performance to SRC at much lower computational cost [25]. Recently, CRC has been actively adopted for HSI classification [26], where the test sample is collaboratively represented with dictionary atoms from all of the classes, rather than under the sparsity constraint as in SRC. However, CRC has limited discriminative ability when the labeled samples include mixed information.
It is generally agreed that SR coefficients follow a class-dependent distribution, which means that the nonzero entries of the recovered coefficients from the same class tend to be located at a specific sub-dictionary, and the magnitudes of the coefficients in accordance with the true class are larger than the others. Therefore, in [27], the class-dependent sparse representation classifier (cdSRC) was proposed for HSI classification, where SRC is combined with K-NN in a class-wise manner to exploit both the correlation and the Euclidean distance between test and training samples, increasing the classification performance. Furthermore, the K-NN Euclidean distance and the spatial neighboring information of test pixels have been introduced into CR classifiers. In [28], a nonlocal joint CR with a locally-adaptive dictionary was developed. In [29], spatially multiscale adaptive sparse representation in a pixel-wise manner is utilized to construct a structural dictionary and outperforms its counterparts; however, the spatially multiscale pixel-wise operation requires extra computational cost. In [30], spatial filter banks were included to enhance the logistic classifier with group-Lasso regularization. In addition, kernelized CRC (KCRC) is investigated for HSI classification in [31], and accumulated assignment using a sparse code histogram is discussed in [32].
More recently, sparse representation-based nearest neighbor (SRNN) and elastic net representation-based classification (ENRC) methods for HSI have also been reported. In [33], three sparse representation-based NN classifiers, i.e., SRNN, local SRNN and spatially-joint SRNN, were proposed and achieve much higher classification accuracy than the traditional Euclidean distance and representation residual. In [34], the proposed ENRC classification method produces more robust weight coefficients by adopting ℓ1 and ℓ2 penalties in the objective function, thereby turning out to be more discriminative than the original SRC and CRC. In short, such representation-based methods are designed to improve the stability of the sparse codes and their discriminability by modeling the spectral variations or by collaboratively coding multiple samples.
Although the aforementioned representation-based classification methods perform well to some extent, all of them reconstruct sparse coefficients in a single layer, and how to estimate the "true" reconstruction coefficients for the test sample still needs to be further exploited. In fact, a multi-layer sparse representation-based (mlSR) assignment framework is preferred in order to stabilize the sparse codes for representation-based classification. In this paper, we propose a multi-layer spatial-spectral SR assignment framework under a structural dictionary for HSI classification, which effectively combines a multi-layer SR assignment with adaptive dictionary assembling and adaptive regularization parameter selection. Specifically, three SR algorithms, multi-layer SRC (mlSRC), multi-layer CRC (mlCRC) and multi-layer ENRC (mlENRC), are developed. The proposed mlSR assignment framework forces the selected bases (dictionary atoms) into as few categories as possible, and the estimated reconstruction coefficients are thereby refined, which boosts the discriminative accuracy of the model. This is one feature of our method. Another is that the proposed mlSR assignment framework exploits the intrinsic class-dependent distribution, which is utilized to stabilize the test distribution estimation across multiple classes and leads to a selective multi-layer representation-based classification framework. Moreover, we consider the construction of the structural dictionary; that is, a dictionary consisting of spectral and spatial features obtained via a group of globally-spatial filter banks is first constructed, thus integrating the spatial consistency of the dictionary atoms and allowing drastic savings in computational time. The proposed mlSR assignment framework is not only natural and simple, but also genuinely beneficial for HSI classification. Note that these features compose our major contributions in this work and make the proposed methods unique with regard to previously-proposed approaches in this area (e.g., [29,35,36]). Our mlSR framework differs from [35] in its implementation principle: the latter can be viewed as a kind of weighted sparse coding, where classification is done by maximizing the feature probability, but without dictionary assembling. Meanwhile, unlike [29,36], which are designed to capture spatial correlations by introducing the neighboring pixels of the test sample for sparse coding and are often time consuming, our proposed methods are in essence a multi-layer framework, which assembles adaptive dictionaries for test samples. The experimental results demonstrate that classification accuracy is consistently improved by the proposed mlSR assignment framework.
There are three main contributions in this work. First, a multi-layer spatial-spectral sparse representation (mlSR) framework for HSI classification is proposed. In the proposed framework, three algorithms, multi-layer SR classification (mlSRC), multi-layer collaborative representation classification (mlCRC) and multi-layer elastic net representation-based classification (mlENRC), are developed and achieve stable assignment distributions via adaptive atom selection in a multi-layer manner. Second, both a test distribution evaluation-based filtering rule and dictionary assembling based on the classes ranked within the top half of the minimal residuals are developed to convey discriminative information for classification and decrease the computational time. Last, but not least, a structural dictionary consisting of globally-filtered spatial and spectral information is constructed to further boost the classification performance. It is also worth mentioning that our proposed mlSR framework has the additional nice property that it can easily be plugged into any representation-based classification model using different HSI features (e.g., spectral features, spatial features and spatial-spectral features). The proposed approach is evaluated using three real HSI datasets. The experimental results verify the effectiveness of our proposed methods as compared to state-of-the-art algorithms.
The remainder of this paper is organized as follows. Section 2 briefly reviews representation-based techniques for HSI classification. Section 3 presents the proposed mlSR framework and classification approaches in detail. Section 4 evaluates the proposed approaches against various state-of-the-art methods on three real HSI datasets in terms of classification accuracy and computational time. Section 5 includes discussions of our framework and method. Finally, Section 6 concludes the paper.

Sparse and Collaborative Representations
As a natural way of signal representation, sparse representation (SR) assumes that the input samples of a particular class lie in a low-dimensional subspace spanned by dictionary atoms (training samples) from the same class. A test sample can be represented as a linear combination of training samples from all classes. Formally, in SR-based classification (SRC), for a test sample $y \in \mathbb{R}^d$ ($d$ is the number of features of the HSI), the objective of SR under the $\ell_1$-norm is to find the sparse coefficient vector $\hat{\alpha}^{(SR)}$ with a given $d \times N$ structural dictionary $D$, so the objective function can be formulated as:

$$\hat{\alpha}^{(SR)} = \arg\min_{\alpha} \|y - D\alpha\|_2^2 + \lambda \|\alpha\|_1, \quad (1)$$

where $N$ is the total number of atoms in $D$, $\|\cdot\|_1$ denotes the $\ell_1$-norm, and $\lambda$ is the regularization parameter that balances the contribution of the reconstruction error and the sparsity of the reconstruction weights.
Once the sparse coefficient vector $\hat{\alpha}^{(SR)}$ is obtained, the class label of the test sample $y$ can be determined by the minimal residual error between $y$ and its approximation from the class-dependent sub-dictionary of each class:
$$r_c^{SRC}(y) = \|y - D_c \hat{\alpha}_c^{(SR)}\|_2, \quad c = 1, \ldots, C, \quad (2)$$

where $C$ is the number of classes, $D_c$ is the sub-dictionary of class $c$ and $\hat{\alpha}_c^{(SR)}$ represents the coefficients in $\hat{\alpha}^{(SR)}$ belonging to the $c$-th class. The class label is given by SRC:

$$\mathrm{class}(y) = \arg\min_{c} r_c^{SRC}(y). \quad (3)$$

Different from SRC, in collaborative representation-based classification (CRC), a test sample is represented collaboratively over all of the training samples, and the objective is to find the weight vector $\hat{\alpha}^{(CR)}$, which can be expressed as:

$$\hat{\alpha}^{(CR)} = \arg\min_{\alpha} \|y - D\alpha\|_2^2 + \lambda \|\alpha\|_2^2, \quad (4)$$

with $\lambda$ being the regularization parameter. By taking the derivative with regard to $\alpha$ in Equation (4) and setting it to zero, $\hat{\alpha}^{(CR)}$ can be solved in closed form as:

$$\hat{\alpha}^{(CR)} = (D^T D + \lambda I)^{-1} D^T y. \quad (5)$$

Then, the class label assignment by CRC is determined according to the minimum residual $r_c^{CRC}(y)$. Obviously, CRC is more computationally efficient than SRC due to the closed-form solution in Equation (5).
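To make the residual-based assignment concrete, the following sketch implements the CRC closed-form solution of Equation (5) together with the class-wise residual rule of Equation (2) on a toy dictionary; the dimensions, data and regularization value are illustrative choices of ours, not from the paper.

```python
import numpy as np

def crc_coefficients(D, y, lam):
    """Closed-form CRC weights: alpha = (D^T D + lam * I)^(-1) D^T y (Equation (5))."""
    G = D.T @ D + lam * np.eye(D.shape[1])
    return np.linalg.solve(G, D.T @ y)

def class_residuals(D, y, alpha, labels):
    """Per-class reconstruction errors ||y - D_c alpha_c||_2 (Equation (2))."""
    return {c: np.linalg.norm(y - D[:, labels == c] @ alpha[labels == c])
            for c in np.unique(labels)}

# Toy structural dictionary: 2 classes x 3 atoms, 20-dimensional features.
rng = np.random.default_rng(0)
D = rng.normal(size=(20, 6))
D /= np.linalg.norm(D, axis=0)                  # unit-norm atoms
labels = np.array([0, 0, 0, 1, 1, 1])
y = 0.9 * D[:, 1] + 0.01 * rng.normal(size=20)  # test sample near a class-0 atom

alpha = crc_coefficients(D, y, lam=1e-3)
residuals = class_residuals(D, y, alpha, labels)
pred = min(residuals, key=residuals.get)        # label with minimal residual
```

The same residual rule applies unchanged to the SRC and elastic net coefficient vectors once they are computed by an ℓ1-capable solver.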

Elastic Net
The reconstruction weights play an important role in representation-based classification. Thus, many representation-based methods aim to obtain the weight vector under some reasonable constraint. For instance, in SRC, the training samples are projected onto a subspace, and only a few dictionary atoms (sparsity) are allowed to be selected to form the sparse representation, which becomes inaccurate when the dictionary atoms are less related and few in number; while in CRC, many dictionary atoms collaborate on the representation of a test sample and contribute to the reconstruction (collaboratively). Nevertheless, the non-sparse coefficient vector of CRC may be distributed across multiple classes, and its discriminative ability is limited. Recent literature [34] has pointed out that the classification improvement in some cases is brought by SR, while in other cases, the gain is brought by CR. In order to avoid the aforementioned problems, the elastic net model was recently presented, yielding robust coefficients via a convex combination of SR and CR [37]. The objective function of elastic net representation-based (ENR) classification (ENRC) is defined as:

$$\hat{\alpha}^{(EN)} = \arg\min_{\alpha} \|y - D\alpha\|_2^2 + \lambda_1 \|\alpha\|_1 + \lambda_2 \|\alpha\|_2^2, \quad (6)$$

where the nonnegative parameters $\lambda_1$ and $\lambda_2$ are used to control the contributions of the sparsity constraint and the self-similarity constraint, respectively. The first constraint encourages sparsity in the reconstruction weights, and the second enforces similarity in their collaborations.
The ℓ1-norm and ℓ2-norm regularization terms are utilized together in the objective function to overcome the limitations of SR-based and CR-based methods, respectively. Therefore, the highly correlated samples are guaranteed to be selected, and the intrinsic sparsity is enforced by the ENRC. After obtaining $\hat{\alpha}^{(EN)}$, the class label of the test sample $y$ is determined according to the minimum residual $r_c^{ENRC}(y)$, similar to Equation (2). As a result, the ENRC may offer a correct label assignment even when both SRC and CRC give wrong labels.
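As a hedged illustration of how the elastic net objective of Equation (6) can be solved, the snippet below implements a plain ISTA (proximal gradient) iteration; the toy dictionary and the choices of λ1 and λ2 are ours, and the paper does not specify which solver the authors use.

```python
import numpy as np

def soft_threshold(x, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def elastic_net_ista(D, y, lam1, lam2, n_iter=500):
    """ISTA for min_a ||y - D a||_2^2 + lam1 * ||a||_1 + lam2 * ||a||_2^2."""
    # Step size from the Lipschitz constant of the smooth part of the objective.
    L = 2.0 * (np.linalg.norm(D, 2) ** 2 + lam2)
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ a - y) + 2.0 * lam2 * a
        a = soft_threshold(a - grad / L, lam1 / L)
    return a

# Toy example: recover a 2-sparse signal over a 20 x 10 dictionary.
rng = np.random.default_rng(1)
D = rng.normal(size=(20, 10))
D /= np.linalg.norm(D, axis=0)
a_true = np.zeros(10); a_true[2] = 1.0; a_true[7] = -0.5
y = D @ a_true

a_hat = elastic_net_ista(D, y, lam1=0.01, lam2=0.01)
```

With small penalties the recovered vector stays close to the sparse ground truth while the ℓ2 term keeps the iteration well conditioned when atoms are highly correlated.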

Motivation for the Proposed Approach
It is well known that HSI data are characterized by high mutual correlation and spatial variation of the spectral signatures, which makes single-layer SR/CR/ENR-based classification methods challenging, because the recovered coefficients under such scenarios are potentially unstable. The instability implies that the nonzero entries of the recovered coefficients might be distributed across multiple classes, thus deteriorating the discriminability. In other words, multi-layer sparse representation is preferred. Intuitively, the more classes associated with the top several minimal residuals that can be kept for dictionary assembling, the more accurately the class label of the test sample can be expected to be assigned. Therefore, we can enforce dictionary assembling into few categories in a multi-layer manner, with the regularization parameter for each test sample adaptively chosen by cross-validation at each layer. To better understand the working mechanism of the proposed method, we randomly take four test samples located at (21,6), (25,7), (18,13) and (18,6) in the Indian Pines image as examples and calculate their respective recovered coefficients using SRC, CRC and ENRC for at most three layers. The sparse coefficients and corresponding residuals of each test sample under the various norms are shown in Figure 1. From this figure, one can easily notice that, although all four test samples belong to Class 2, the single-layer hard assignments of SRC, CRC and ENRC for these test samples are inaccurate because the residual error computed from Class 2 is higher than that from some other classes. However, when the samples with obviously wrong class labels assigned by the single-layer SRC, CRC and ENRC are forced to carry out a second-layer SR, or even a third-layer SR, they are then assigned the correct class label, which clearly demonstrates the effectiveness and superiority of the proposed mlSR assignment framework. It should be noted that the structural dictionary consisting of globally-spatial and spectral information is combined to further boost the classification performance.

Test Distribution Evaluation
According to the above observation, the recovered coefficients follow a class-dependent overall distribution despite the instability in a single-layer SR: the nonzero entries of the recovered coefficients from the same class tend to be located at a specific sub-dictionary, and the magnitudes of the coefficients corresponding to the true class are usually larger than the others. Perceptually, a test sample $y$ is correctly assigned its class label because the sample has the largest magnitude of sparse coefficients within the active sub-dictionary. We introduce the following heuristics to find the obviously misclassified samples, which will be accepted to perform a second-layer SR, after which the newly-assigned class labels of those samples are updated. Similarly, a third-layer SR can be done in the same way. This is based on the following reasons. First, fewer test samples accepted to carry out multi-layer SR means less computational time for the proposed method. Second, it is unnecessary to run obviously correctly-assigned samples through the subsequent layers. As a result, the classification accuracy of such a selective multi-layer SR assignment framework is consistently improved. To this end, we first adopt the sparsity concentration index (SCI) [38] as the degree measurement across multiple classes:

$$\mathrm{SCI}(\alpha) = \frac{C \cdot \max_i \|\delta_i(\alpha)\|_1 / \|\alpha\|_1 - 1}{C - 1} \in [0, 1], \quad (7)$$

where $\delta_i(\alpha)$ indicates the entries associated with the $i$-th class. Obviously, in SR via the ℓ1-norm, for a test sample $y$, if SCI($\alpha_y$) = 1, $y$ is represented using a unique class, and if SCI($\alpha_y$) = 0, the sparse coefficients are spread evenly over all classes. Furthermore, we define the heuristic as follows. Specifically, a test sample $y$ with its label assigned by the $l$th ($l > 0$) layer classification as $L_l(y)$ is accepted to do a multi-layer SR when the following condition is 'true'.
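The SCI measure of Equation (7) can be sketched in a few lines; the class layout below is a toy example of ours, not the paper's dictionary.

```python
import numpy as np

def sci(alpha, labels):
    """Sparsity concentration index of Equation (7):
       (C * max_c ||alpha_c||_1 / ||alpha||_1 - 1) / (C - 1)."""
    classes = np.unique(labels)
    C = len(classes)
    total = np.abs(alpha).sum()
    top = max(np.abs(alpha[labels == c]).sum() for c in classes)
    return (C * top / total - 1.0) / (C - 1.0)

# Three classes with two atoms each.
labels = np.array([0, 0, 1, 1, 2, 2])
concentrated = np.array([1.0, 2.0, 0.0, 0.0, 0.0, 0.0])  # one active class -> SCI = 1
uniform = np.ones(6)                                     # spread evenly   -> SCI = 0
```

Intermediate values quantify how strongly the coefficient mass concentrates on a single class's sub-dictionary, which is exactly the degree measurement used by the filtering rule.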
where Position(Peak($\alpha_y$)) denotes the position of the maximal peak of the sparse coefficients $\alpha$ of test sample $y$, and $X(D_c^l)$ indicates the $l$th-layer class-dependent overall distribution from class $c$, which can be expressed as a triplet <Peak, Position, SCI>. In addition, the slight fluctuation $\varepsilon_l$ is introduced to account for the possible bias of the sparse coefficients in each layer. Thus, in a sense, the proposed filtering rule for multi-layer classification, via the residual and sparse coefficients together, can pick out the obviously misclassified samples for the next-layer SR. Let us take the Indian Pines image as an example, as illustrated in Figure 2.

[Figure 1: sparse coefficients and residuals of the example test samples; (c) elastic net (ℓ1 + ℓ2) for pixel (21,6); (d) elastic net (ℓ1 + ℓ2) for pixel (25,7); (e) collaborative representation classification (CRC) (ℓ2) for pixel (18,13); and (f) CRC (ℓ2) for pixel (18,6).]
The class-dependent overall distributions (blue curve and red blocks) of the sparse coefficients of the twelve classes are obtained using all of the labeled samples of the corresponding class, under a structural dictionary with forty training samples per class. Furthermore, the sparse coefficients (green curve) of a test sample are also plotted. As can be seen, the class-dependent sparse coefficients and magnitudes mainly concentrate at some fixed blocks; that is, the nonzero entries of the sparse coefficients and the larger magnitudes are in accordance with the true class. Hence, the class-dependent overall distributions convey discriminative information and are exploited to find the samples obviously misclassified at the previous layer. With this treatment, the plotted test sample is recognized as a 'good' sample, i.e., this sample is partitioned into the same class as the true class (ID: 2), and it is unnecessary for it to proceed to multi-layer classification. For non-ℓ1-norm SR, we use a similar rule to filter the obviously misclassified samples into multi-layer SR. Note that the heuristic rule in Equation (8) has the advantage of both computational efficiency and better classification performance over several state-of-the-art methods.
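Since Equation (8) is only described textually here, the following is a hypothetical sketch of such a filtering rule: a sample is flagged for the next layer when the peak of its coefficient vector falls outside the atom-index range of its assigned class, expanded by the fluctuation ε (taken as 10% of the class range, matching the experimental settings). The function name and the exact condition are our own illustration, not the paper's Equation (8).

```python
import numpy as np

def needs_next_layer(alpha, assigned_class, labels, eps_frac=0.1):
    """Hypothetical filtering rule in the spirit of Equation (8): flag a test
    sample for the next SR layer when the peak of |alpha| lies outside the
    eps-expanded atom-index range of the assigned class's sub-dictionary."""
    peak_pos = int(np.argmax(np.abs(alpha)))
    idx = np.where(labels == assigned_class)[0]
    lo, hi = idx.min(), idx.max()
    eps = eps_frac * (hi - lo + 1)      # fluctuation: 10% of the class range
    return not (lo - eps <= peak_pos <= hi + eps)

labels = np.repeat([0, 1, 2], 4)                 # 3 classes x 4 contiguous atoms
alpha_good = np.zeros(12); alpha_good[5] = 1.0   # peak inside class 1's block
alpha_bad = np.zeros(12); alpha_bad[0] = 1.0     # peak far from class 1's block
```

In the full rule, the SCI value and the class-dependent triplet <Peak, Position, SCI> would also be consulted before a sample is sent to the next layer.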

mlSR Framework
Motivated by the above observations, we propose a novel multi-layer sparse representation (mlSR) framework to explore the stable assignment distribution. The overall outline of the proposed mlSR is shown in Figure 3. As depicted in Figure 3, for each test sample, the method consists of the following main steps: (1) compute the sparse coefficients and residual matrices at the first layer; (2) select the obviously misclassified samples at the first layer to perform the second-layer SR according to the top C/2 minimal residuals, based on the test distribution evaluation and dictionary assembling, and update the corresponding class label assignments; (3) choose the obviously misclassified samples at the second layer to carry out the third-layer SR according to the top C/4 minimal residuals, on the basis of the predefined test distribution evaluation, and update the corresponding class labels; (4) output the final class labels.
One of the key ingredients of the proposed mlSR framework is the adaptive selection of sub-dictionary atoms; that is, a new sub-dictionary is re-assembled for each test sample based on the test distribution evaluation and the classes ranked within the top half of the minimal residuals (i.e., C/2 classes in the second layer, C/4 in the third layer, and so on), which is appropriate for representing the test sample. In other words, a subset of the structured dictionary is re-selected for the SR of each test sample, favoring a stable assignment distribution and resulting in better discriminative ability of the proposed approach. In addition, the filtering rule by which obviously misclassified samples are sieved into the next-layer SR is another core part of the proposed method because, on the one hand, the samples correctly assigned at the first layer need not undergo subsequent-layer SR according to Equation (8); on the other hand, the new cross-validation parameter search from the second layer onward is conducted for each test sample and is time costly. Thus, a tradeoff between the number of samples filtered into the multi-layer SR and the classification performance should be made. The proposed mlSR framework is detailed in Algorithm 1.
In the proposed framework, the globally-filtered spatial features, such as the widely-used band ratios from the first three principal components (PCs) of the original spectral features, 2D Gabor energy [12] and morphological profiles [5], are extracted and employed to construct the structural dictionary. Note that the different types of global features exploit the local information of each considered pixel and should contribute to the discrimination of the dictionary atoms; meanwhile, the globally spatial features are much faster to extract. The considered features are reported in Table 1. As shown in Table 1, D{s, r, g, m} indicates that the different types of features, except for the spectra, are globally extracted via different spatial filter banks.

Algorithm 1. The proposed mlSR framework.
Input: Layer l = 1; a structural dictionary D^l with M features and N samples; number of classes C; regularization parameter set λ_all; test index set U^l; threshold ε_l; residual r^l = φ
Step 1: Calculate the lth-layer class-dependent overall distribution X(D^l)
Step 2: for each test sample y do
Step 3: Determine the optimal regularization parameters λ, λ2 via five-fold cross-validation searching in λ_all under the dictionary D^l
Step 4: Compute the sparse coefficients α(SR), α(CR), α(EN) using Equations (1), (5) and (6), respectively
Step 5: Compute the class-wise residuals r^l
Step 6: Evaluate the test distribution τ_l(y) based on X(D^l)
Step 7: Add y into the (l + 1)th test set U^(l+1) if τ_l(y) = true
Step 8: Find the newly-selected atoms' indexes and assemble the sub-dictionary according to the classes ranked within the top half of the minimal residuals
Step 9: l ← l + 1; if l > 2 or U^l = NULL, go to Step 12
Step 10: Go to Step 3
Step 11: end for
Step 12: Decide the final class labels class_SRC(y), class_CRC(y), class_ENRC(y) according to Equation (3)
Step 13: Output: class(y)
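The layer-wise loop of Algorithm 1 can be sketched for the CRC branch as follows. This is a simplified single-sample illustration with a fixed λ (the paper selects it per layer by cross-validation) and without the test distribution filter of Steps 6-7; the data are toy values of ours.

```python
import numpy as np

def ml_crc_label(D, labels, y, lam=1e-3, n_layers=3):
    """Simplified multi-layer CRC: at each layer, solve CRC on the assembled
    sub-dictionary, rank the classes by residual, and keep the top half of
    the classes to assemble the next layer's dictionary."""
    keep = list(np.unique(labels))
    for _ in range(n_layers):
        mask = np.isin(labels, keep)
        Dl, ll = D[:, mask], labels[mask]
        G = Dl.T @ Dl + lam * np.eye(Dl.shape[1])
        a = np.linalg.solve(G, Dl.T @ y)                        # CRC, Equation (5)
        res = {c: np.linalg.norm(y - Dl[:, ll == c] @ a[ll == c]) for c in keep}
        ranked = sorted(res, key=res.get)
        if len(keep) == 1:
            break
        keep = ranked[:max(1, len(keep) // 2)]                  # top-half classes
    return ranked[0]

# Toy data: 4 classes x 3 atoms in 20 dimensions; y sits near a class-0 atom.
rng = np.random.default_rng(2)
D = rng.normal(size=(20, 12))
D /= np.linalg.norm(D, axis=0)
labels = np.repeat(np.arange(4), 3)
y = 0.9 * D[:, 0] + 0.01 * rng.normal(size=20)
```

Each pruning step shrinks the candidate set (C, then C/2, then C/4, ...), so the later layers solve smaller, less coherent systems, which is the source of both the stability gain and the computational saving.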

Experiments
In this section, in order to demonstrate the superiority of the proposed method for HSI classification, the proposed multi-layer spatial-spectral sparse representation (mlSR) method is compared with various state-of-the-art methods on three benchmark hyperspectral remote sensing images: Indian Pines, University of Pavia and Salinas. Note that the proposed method utilizes a structural dictionary consisting of globally-filtered spatial features, such as 2D Gabor (scale = 2, orient = 8) and morphological profiles, and spectral features along all bands. To further validate the effectiveness of the proposed model at exploring structural consistency in the classification scenarios, we compare the proposed mlSR assignment framework with competitors built on only spectral features. Meanwhile, the number of layers is set to three in order to balance computational complexity and classification performance. Additionally, we also analyze the influence of several key model parameters.

Hyperspectral Images and Experiment Setting
Three hyperspectral remote sensing images are utilized for extensive evaluations of the proposed approach in the experiments: Indian Pines image captured by AVIRIS (Airborne Visible/Infrared Imaging Spectrometer), University of Pavia image captured by ROSIS (Reflective Optics System Imaging Spectrometer) and Salinas image collected by AVIRIS sensor.
The Indian Pines image was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over northwest Indiana's Indian Pine test site in June 1992 [39]. The image contains 16 classes of different crops at a 20-m spatial resolution with a size of 145 × 145 pixels. After the uncalibrated and noisy bands were removed, 200 bands remained. We use the whole scene, and twelve large classes are investigated. The numbers of training and testing samples are shown in Table 2.
The University of Pavia image utilized in the experiments covers an urban area and was taken by the ROSIS-03 optical sensor over the University of Pavia, Italy [40]. The image consists of 115 spectral channels of size 610 × 340 pixels, with a spectral range from 0.43 to 0.86 µm and a spatial resolution of 1.3 m. The 12 noisy channels were removed, and the remaining 103 bands were used for the experiments. The ground survey contains nine classes of interest, and all classes are considered. The numbers of training and testing samples are summarized in Table 2.
The Salinas image was also collected by the AVIRIS sensor, capturing an area over Salinas Valley, CA, USA, with a spatial resolution of 3.7 m. The image comprises 512 × 217 pixels with 204 bands after 20 water absorption bands are removed. It mainly contains vegetables, bare soils and vineyard fields. The calibrated data are available online (along with detailed ground-truth information) from http://www.ehu.es/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes. There are also 16 different classes, and all are utilized; the numbers of training and testing samples are listed in Table 2.
The parameter settings in our experiments are given as follows.
(1) For training set generation, we first randomly select a subset of labeled samples from the ground truth. Then, we randomly choose some samples from the selected training set to build the dictionary. For all of the considered images, different training rates are employed to examine the classification performance of the various algorithms. We randomly select a reduced number of labeled samples ({5, 10, 20, 40, 60, 80, 100, 120} samples per class) for training, and the rest are used for testing. The classification results and maps of our approach and the other compared methods are generated with 120 training samples per class.
(2) For classification, we report the overall accuracy (OA), average accuracy (AA), class-specific accuracies (%), kappa statistic (κ), standard deviation and computational time (including searching the optimal regularization parameters) derived from averaging the results after conducting ten independent runs with respect to the initial training set.
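The reported metrics can be computed from a confusion matrix as in the generic sketch below (the per-run averaging and the standard deviations over the ten runs are omitted here).

```python
import numpy as np

def classification_scores(y_true, y_pred, n_classes):
    """Overall accuracy (OA), average accuracy (AA) and the kappa statistic
    derived from the confusion matrix of one run."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1.0
    n = cm.sum()
    oa = np.trace(cm) / n                                   # overall accuracy
    per_class = np.diag(cm) / cm.sum(axis=1)                # class-specific accuracies
    aa = per_class.mean()                                   # average accuracy
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2   # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa

# Tiny worked example: one class-1 sample is mislabeled as class 0.
oa, aa, kappa = classification_scores([0, 0, 1, 1], [0, 0, 1, 0], n_classes=2)
```

OA weights every test sample equally, AA weights every class equally, and kappa discounts the agreement expected by chance, which is why all three are reported together for imbalanced HSI ground truths.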
(3) For implementation details, to make the comparisons as meaningful as possible, we use the same experimental settings as [41], and all results are originally reported. For the Indian Pines and Salinas image datasets, the attribute profiles (APs) [42] were built using threshold values in the range from 2.5% to 10% with respect to the mean of the individual features, with a step of 2.5% for the standard deviation attribute and thresholds of 200, 500 and 1000 for the area attribute, whereas the APs for the University of Pavia image were built using threshold values in the range from 2.5% to 10% with respect to the mean of the individual features, with a step of 2.5% for the criteria based on the standard deviation attribute; values of 100, 200, 500 and 1000 were selected as references for the area attribute. The fluctuation ε in Equation (8) is heuristically set to 10% of the class atom position range in our experiments. It should be noted that each sample is normalized to zero mean and unit standard deviation, and all of the results are reported over ten random partitions of the training and testing sets. All of the implementations were carried out using MATLAB R2015a on a desktop PC equipped with an Intel Core i7 CPU (3.4 GHz) and 32 GB of RAM.
As can be seen from Table 4, SVM is the fastest, while our proposed mlSRC, mlCRC and mlENRC methods require a larger computational effort, but also achieve better classification accuracy than all competitors. Nevertheless, a fusion strategy using multiple parameters, instead of cross-validation for regularization parameter selection at the subsequent layers, could be utilized to reduce the computational time. The classification maps of the Indian Pines image generated using the proposed methods and the baseline algorithms are shown in Figure 5 to test the generalization capability of these methods. Figure 5 shows that the three proposed mlSRC, mlCRC and mlENRC methods result in more accurate and "smoother" classification maps (with reduced salt-and-pepper classification noise) compared with traditional SRC/CRC, and even kernelized SRC/CRC and SVM, which further validates the effectiveness and superiority of the proposed mlSR assignment framework for HSI classification. The results also show that the single-layer SRC, CRC and ENRC always produce inferior performance on this test set, most likely in part due to the instability of the single-layer SR. Our analysis also shows that KSRC and KCRC have comparable performance to SVM, mlENRC and mlAPCRC as well.
training ratio), respectively; the reason is that the attribute profiles provide better discriminative features than the globally-filtered features. To the best of our knowledge, this result is very competitive on this dataset, which indicates the effectiveness of the proposed mlSR framework.
As can be seen from Table 4, SVM is the fastest, while our proposed mlSRC, mlCRC and mlENRC methods require greater computational effort but also achieve better classification accuracy than all competitors. Nevertheless, a fusion strategy that uses multiple parameters instead of cross-validation for regularization-parameter selection at subsequent layers can be employed to reduce the computational time. The classification maps of the Indian Pines image generated by the proposed methods and the baseline algorithms are shown in Figure 5 to test the generalization capability of these methods. Figure 5 shows that the three proposed methods, mlSRC, mlCRC and mlENRC, produce more accurate and "smoother" classification maps (with reduced salt-and-pepper classification noise) than traditional SRC/CRC, kernelized SRC/CRC and SVM, which further validates the effectiveness and superiority of the proposed mlSR assignment framework for HSI classification. The results also show that the single-layer SRC, CRC and ENRC consistently produce inferior performance on this test set, most likely due in part to the instability of the single-layer SR. Our analysis also shows that KSRC and KCRC perform comparably to SVM, mlENRC and mlAPCRC. The results in this experiment show that the proposed multi-layer assignment framework effectively boosts classification performance, with an accuracy improvement of about 3% to 14% via the multi-layer SR. The underlying mechanism of our methods accords with the observation that the sparse coefficients obtained from the second layer lead to entirely correct label assignments, where the classes ranked within the top half of the minimal residuals are used to assemble the dictionary for each test sample. As a result, the classification performance is guaranteed to increase, which clearly demonstrates the effectiveness and superiority of the proposed mlSR assignment framework.
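The residual ranking and dictionary-assembling step described here can be sketched as follows. This is a simplified illustration under our own assumptions (the function names, the "top half" cut-off and the given coefficient vector are ours, not the paper's exact algorithm):

```python
import numpy as np

def classwise_residuals(y, D, labels, x):
    """r_c(y) = ||y - D_c x_c||_2: reconstruction residual using only class c's atoms."""
    return {c: np.linalg.norm(y - D[:, labels == c] @ x[labels == c])
            for c in np.unique(labels)}

def assemble_subdictionary(y, D, labels, x):
    """Keep the classes ranked within the top half of the minimal residuals and
    re-assemble the dictionary from their atoms for the next layer."""
    res = classwise_residuals(y, D, labels, x)
    ranked = sorted(res, key=res.get)          # classes ordered by residual, smallest first
    keep = set(ranked[: max(1, len(ranked) // 2)])
    mask = np.isin(labels, list(keep))
    return D[:, mask], labels[mask]
```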

Experiment 2: Results on the University of Pavia Classification
The classification results of the proposed methods and the baseline algorithms for the University of Pavia are summarized in Tables 5 and 6. We compare the classification accuracies of our approaches with traditional SRC and CRC, kernelized SRC and CRC, and SVM on this dataset. As in the Indian Pines experiments, our proposed mlSRC, mlCRC and mlENRC methods yield higher classification accuracies than all the other baseline algorithms. Observing Table 5, we find that the three proposed approaches, mlSRC, mlCRC and mlENRC, are consistently better than all baseline methods (except the AP-based and mlAP-based ones), from a small number of training samples (five and ten per class) to a larger one (one hundred and twenty per class). Specifically, as provided in Table 6 (the mlAP-based methods and APSVM are not presented due to limited column space), the OA, κ and AA for our best approach, mlENRC, are improved by 1.4% to 21.93%, 0.016 to 0.251 and 0.12% to 22.49%, respectively. More specifically, the increases for mlENRC in OA, κ and AA over the fourth-best method, KSRC, are about 1.4%, 0.016 and 0.12%, respectively. Interestingly, the mlAP-based methods (i.e., mlAPSRC and mlAPCRC) achieve better accuracies than their counterparts (APSRC and APCRC, respectively). This can be attributed to the better stability of the proposed mlSR assignment framework. Moreover, we observe that our proposed methods require the largest computational time, mainly due to the cross-validation performed again from the second layer onward, but the classification performance is improved, albeit relatively slightly. In addition, CRC and KCRC score somewhat lower than SRC and KSRC. For this dataset, the small spatial homogeneity of the image might cause training samples from other classes to also participate in the linear representation of the test samples, which leads to some misclassification. A visualization of the classification map using 120 training samples per class is shown in Figure 6. The classification accuracies can be further confirmed by carefully inspecting the classification maps visually. The obvious misclassification between the asphalt class and the shadow class by CRC illustrates the inadequacy of the single-layer SR; it is greatly alleviated in Figure 6m-o, and the best result is achieved in Figure 6o. Therefore, the proposed mlSR framework helps the classifiers discriminate among different types of land cover classes. A similar phenomenon is observed in that the multi-layer assignment framework achieves improvements of about 1.2% to 14%. The highest accuracy is achieved by mlAPSRC in all training ratios (the second-best method is mlAPCRC), which may be associated with the fact that the highly related samples chosen after AP-based processing give SRC more discriminative power. Another interesting finding is that the AP/mlAP-based methods are always better than the non-AP-based ones.

Experiment 3: Results on the Salinas Classification
To validate the performance of the proposed mlSRC, mlCRC and mlENRC with both under-complete and over-complete dictionaries, we tested over a wide range of numbers of training samples, varying from five to 120 samples per class; the classification results for this dataset are shown in Tables 7 and 8. Likewise, it can be observed from the results that the proposed mlSRC, mlCRC and mlENRC give consistently better performance than the other non-AP-based algorithms. It is evident from Table 8 (the mlAP-based methods and APSVM are not presented due to limited column space) that almost all of the class-specific accuracies are improved, confirming the consistency of the three proposed mlSRC, mlCRC and mlENRC algorithms. Overall, the OA, κ and AA for this dataset are improved by 2.35% to 6.91%, 0.026 to 0.074 and 0.88% to 5.19%, respectively. Specifically, the increases in OA, κ and AA for the overall best method, mlENRC, over the fourth-best method, KCRC, are 2.35%, 0.026 and 0.88%, respectively. The best approach is mlAPCRC, which reaches 97.67% when the training ratio is 120 samples per class; the proposed multi-layer assignment framework and the large structures in this HSI may account for this. The classification maps shown in Figure 7 were generated using the proposed algorithms and the baselines. Based on visual inspection of Figure 7, the maps generated using the multi-layer SR framework are less noisy and more accurate than those using the single-layer SR. For example, the classification map of mlENRC (Figure 7o) is more accurate than that of SVM (Figure 7e). The misclassification of SVM occurred mostly between the grapes-untrained and vineyard-untrained classes. This is explained by the fact that most of the classes in the image form large structures, and features with little spatial information cannot capture them well. Similarly, the proposed methods are computationally intensive during testing. In this case, multiple-parameter fusion instead of cross-validation can be employed in order to decrease the computational time. Therefore, we conclude that the classification performance of the proposed approaches can be greatly improved via the novel multi-layer SR framework.
As with the previous two HSI datasets, the multi-layer assignment framework obtains an increase of about 2% to 11% through the introduction of the multi-layer SR in the Salinas image. The proposed multi-layer framework accumulates the classification results from different layers, which results in greater accuracy and is superior to the single-layer hard assignment, whose coefficients are unstable when based on the minimal residual alone. Therefore, the proposed mlSR framework is competent to improve classification performance.
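The multi-layer accumulation described above can be sketched as a loop that, at each layer, codes the test sample over the current dictionary, ranks the classes by reconstruction residual, and re-assembles the dictionary from the surviving classes. The solver interface, the "keep roughly the top half" cut-off and the stopping rule below are our own simplifying assumptions, not the paper's exact algorithm:

```python
import numpy as np

def multilayer_classify(y, D, labels, solve, n_layers=3):
    """Sketch of an mlSR-style loop: at each layer, code y over the current
    dictionary, rank classes by residual, keep roughly the top half, and
    rebuild the dictionary from their atoms before the next layer."""
    for _ in range(n_layers):
        x = solve(y, D)
        classes = np.unique(labels)
        res = np.array([np.linalg.norm(y - D[:, labels == c] @ x[labels == c])
                        for c in classes])
        order = np.argsort(res)
        if len(classes) <= 2:  # nothing left to prune
            break
        keep = classes[order[: max(2, len(classes) // 2)]]
        mask = np.isin(labels, keep)
        D, labels = D[:, mask], labels[mask]
    return classes[order[0]]  # minimal-residual class at the last layer
```

Here `solve` can be any coder (an l1 solver for mlSRC, the ridge solution for mlCRC, or an elastic net for mlENRC), which is what distinguishes the three methods within the shared loop.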

Discussion
The design of a proper SR-based classification framework is the first important issue we face, as HSI datasets are complex, and the within-class variation and spatial details in complex scenes cannot be well captured by a single-layer SR. In designing the SR-based model, we propose a multi-layer SR framework that produces a discriminative SR for the test samples and achieves stable assignment distributions via adaptive atom selection in a multi-layer manner; three approaches, mlSRC, mlCRC and mlENRC, are then developed. The proposed mlSRC, mlCRC and mlENRC are based on the same idea but adopt different sparse optimization criteria. These different criteria account for the differences among them for HSI classification, and the differences among the three proposed methods lie in the construction of the sparse optimization solver. In order to balance the classification performance and the complexity of the framework, a three-layer SR is adopted. Meanwhile, a filtering rule is heuristically exploited to identify the obviously misclassified samples for the next-layer SR; moreover, dictionary assembling and a new cross-validation over the parameter search are conducted for each test sample. These enhancements lead to a substantial improvement in performance and save computational time during testing. Another important observation is that our proposed methods are computationally intensive; this is mainly because the optimal regularization parameter for each test sample is searched via cross-validation again from the second layer onward. Thus, multiple-parameter fusion is expected to be a good alternative to cross-validation in terms of computational efficiency. Nevertheless, our proposed mlSR framework has another nice property: it can easily be plugged into any representation-based classification model using different HSI features (e.g., spectral features, spatial features and spatial-spectral features). Last but not least, a structural dictionary consisting of globally-spatial and spectral information is constructed to further boost the classification performance.
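The "different sparse optimization criteria" can be made concrete: an l2-regularized (CRC-style) coder has a closed-form solution, while an elastic net coder (as in ENRC) can be approximated with a simple ISTA loop. The sketch below is ours, with hypothetical helper names, shown only to contrast the two criteria; a production solver would use a dedicated optimization package:

```python
import numpy as np

def crc_code(y, D, lam):
    """CRC-style coding: min ||y - Dx||^2 + lam*||x||^2 has a closed form."""
    G = D.T @ D + lam * np.eye(D.shape[1])
    return np.linalg.solve(G, D.T @ y)

def enrc_code(y, D, lam1, lam2, n_iter=2000):
    """Elastic-net-style coding via ISTA on
    0.5*||y - Dx||^2 + lam1*||x||_1 + 0.5*lam2*||x||^2."""
    x = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2 + lam2  # Lipschitz constant of the smooth part
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y) + lam2 * x
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam1 / L, 0.0)  # soft threshold
    return x
```

With the l1 weight set to zero, the elastic net coder reduces to the same ridge solution as the CRC coder, which makes the relationship between the two criteria explicit.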
Overall, by comparing the classification performances of Experiments 1, 2 and 3, it is clear that the proposed multi-layer assignment framework is superior to the single-layer competitors in terms of classification accuracy, as expected. The improvements come mainly from the proposed multi-layer SR framework, which confirms our earlier statement. It is interesting to note that for small classes, such as wheat (C13) in the Indian Pines image and metal sheets (C5) in the University of Pavia image, and for difficult classes, for instance grapes (C8) in the Salinas image, the proposed methods exhibit very good generalization performance, with an OA of 100% or a remarkable increase, which validates well our observation that mlSRC, mlCRC, mlENRC and the mlAP-based methods can improve the performance of the learnt model for a specific class.
In order to further assess the performance of the proposed methods, we select some methods that use joint/spectral-spatial sparse representation classification for comparison. Reference results were provided in [34] for fused representation-based classification, and in [33] for the sparse representation-based nearest neighbor classifier (SRNN), the local sparse representation-based nearest neighbor classifier (LSRNN), simultaneous orthogonal matching pursuit (SOMP) and the joint sparse representation-based nearest neighbor classifier (JSRNN). Additionally, we show the accuracies reported in [28] for joint sparse representation classification (JSRC), collaborative representation classification with a locally-adaptive dictionary (CRC-LAD) and nonlocal joint CR classification with a locally-adaptive dictionary (NJCRC-LAD), and in [36] for pixel-wise learning sparse representation classification with spatial co-occurrence probabilities estimated point-wise without any regularization (LSRC-P) and its patch-based version (pLSRC-P). Finally, logistic regression via variable splitting and augmented Lagrangian-multilevel logistic (LORSAL-MLL), the joint sparse representation model (JSRM) and multiscale joint sparse representation (MJSR) from [29] are compared.
Tables 9-11 illustrate the overall classification accuracy of mlSRC, mlCRC, mlENRC, mlAPSRC and mlAPCRC in comparison with the above methods for the Indian Pines, University of Pavia and Salinas datasets, respectively. For a fair comparison, the same number of training samples is kept for the same image. As can be seen from Tables 9-11, the classification accuracies of our approaches are comparable to or better than those of the other compared methods on the same image. For the Indian Pines image, the OA of mlENRC is 2.11% higher than that of CRC-LAD. For the University of Pavia, the OA of mlAPSRC is 3.71% higher than that of NJCRC-LAD and 5.74% higher than that of JSRNN. For the Salinas image, the improvement in OA of mlAPCRC over JSRM is 1.31%. The reason is that we use a multi-layer sparse representation framework with methods that differ from one another, and the classification performances are consistently improved.

Table 9. Comparison of the methods, denoted mlSRC, mlCRC, mlENRC, mlAPSRC and mlAPCRC, with the results reported in (1) [34], (2) to (3) [33] and (4) to (5) [28].

Conclusions
In this paper, a novel multi-layer spatial-spectral sparse representation (mlSR) classification framework and three mlSR methods, namely mlSRC, mlCRC and mlENRC, have been proposed for HSI classification. In the proposed mlSR assignment framework, a test sample is represented in a selective multi-layer manner that exploits potentially multiple class label assignments and the adaptive selection of sub-dictionary atoms. Furthermore, the mlSR assignment framework is integrated with a filtering rule under which the obviously misclassified samples are selected to undergo a multi-layer SR for classification, which results in better performance and lower computational complexity. The proposed mlSRC, mlCRC, mlENRC and AP/mlAP-based methods were tested on three real HSI datasets and achieve comparable or higher classification accuracy than several state-of-the-art methods, both quantitatively and qualitatively. The novelty of our proposed methods lies in the multi-layer sparse representation framework and its effectiveness in modeling discriminative information for representation-based classification. We believe the performance can be further improved; in future work, we will explore a multiple-kernel SR assignment framework to enhance it.

Figure 2. Distribution of sparse codes of samples (twelve classes and a test sample) in the Indian Pines image under a structural dictionary with 480 atoms in total. The red curve indicates the corresponding class-dependent atoms. The test sample used belongs to Class 2.

Step 5: Obtain the individual residuals r_c^SRC(y), r_c^CRC(y) and r_c^ENRC(y) according to Equation (2), and update the respective class label matrices L_l^SRC, …
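The decision rule implied by this step (assigning the minimal-residual class) can be sketched as follows; the names are ours, and the coefficient vector x is assumed to be given by one of the coders:

```python
import numpy as np

def predict_label(y, D, labels, x):
    """Assign y to the class whose atoms reconstruct it with the smallest
    residual r_c(y) = ||y - D_c x_c||_2 (the Equation (2)-style rule)."""
    classes = np.unique(labels)
    res = [np.linalg.norm(y - D[:, labels == c] @ x[labels == c]) for c in classes]
    return classes[int(np.argmin(res))]
```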

Figure 4. Overall classification accuracy (%) of representation-based classifiers versus λ using 40 training samples per class: (a,b) λ for mlSRC, mlCRC and multi-layer elastic net representation-based classification (mlENRC) at the first layer in the Indian Pines image; (c,d) λ for mlSRC, mlCRC and mlENRC at the first layer in the University of Pavia image; (e,f) λ for mlSRC, mlCRC and mlENRC at the first layer in the Salinas image.


Table 1. Types of features, other than spectra, obtained via global filtering in the experiments.

Table 3. Overall classification accuracy (%) and standard deviation as a function of the number of training samples per class for the Indian Pines image. KSRC, kernelized SRC; AP, attribute profile.

Table 5. Overall classification accuracy (%) and standard deviation as a function of the number of training samples per class for the University of Pavia image.

Table 6. Class-specific accuracy (%), overall accuracy (OA), average accuracy (AA) and kappa (κ), as well as computational time in seconds, with 120 training samples per class for the University of Pavia image.

Table 7. Overall classification accuracy (%) and standard deviation as a function of the number of training samples per class for the Salinas image.

Table 8. Class-specific accuracy (%), overall accuracy (OA), average accuracy (AA) and kappa (κ), as well as computational time in seconds, with 120 training samples per class for the Salinas image.