A Symmetric Sparse Representation Based Band Selection Method for Hyperspectral Imagery Classification

A novel Symmetric Sparse Representation (SSR) method is presented to solve the band selection problem in hyperspectral imagery (HSI) classification. The method assumes that the selected bands and the original HSI bands sparsely represent each other, i.e., they are symmetrically represented. It formulates band selection as the well-known archetypal analysis problem and selects the representative bands by finding the archetypes of the minimal convex hull containing the HSI band points (each band corresponds to one band point in the high-dimensional feature space). Apart from the size of the band subset, SSR requires no parameter tuning, and it solves the band selection problem with a block-coordinate descent scheme. Four state-of-the-art methods are compared with SSR on the Indian Pines and PaviaU HSI datasets. Experimental results show that SSR outperforms all four methods in classification accuracy (Average Classification Accuracy (ACA) and Overall Classification Accuracy (OCA)) and in three quantitative evaluation measures (Average Information Entropy (AIE), Average Correlation Coefficient (ACC) and Average Relative Entropy (ARE)), with the single exception of AIE on Indian Pines, where SNMF scores higher, while requiring the second shortest computational time. Therefore, the proposed SSR is a good alternative for band selection in practical HSI classification applications.


Introduction
Thanks to its ability to collect both the spectra and the images of ground objects on the Earth's surface, hyperspectral imaging is a popular technique in many application fields, including environmental monitoring [1,2], precision agriculture [3,4], mineral exploration [5,6] and so on. However, hyperspectral imagery (HSI) processing faces many challenging problems, especially the "curse of dimensionality" [7][8][9]. The problem results from the large number of bands and the strong intra-band correlations, and it means that achieving higher classification accuracy requires more training samples. However, collecting many training samples is expensive and time-consuming [10,11]. Therefore, dimensionality reduction is an effective way to overcome this problem and to promote the application of HSI data.
Dimensionality reduction methods can usually be divided into two main groups: band selection and feature extraction [12,13]. Feature extraction reduces the dimensionality of HSI data by transforming it into a low-dimensional feature space, whereas band selection selects a proper band subset from the original band set [14,15]. In this study, we focus on band selection because, compared with feature extraction, it preserves the original spectral meaning of the HSI data.
The history of band selection research dates back to the birth of hyperspectral imaging. Many classical methods from information theory were introduced into the hyperspectral community. Entropy-based methods select a band subset with maximal information entropy or relative entropy [16,17]. These methods usually neglect the effects of intra-band correlations, so the representative bands tend to be highly correlated and do not necessarily perform well in practical applications [18]. Distance-measure-based methods instead maximize class divergences measured by the Euclidean distance, Spectral Information Divergence (SID), Mahalanobis distance and so on [18,19]. They outperform the entropy-based methods in many instances, but the selected band subsets vary greatly across distance measures. In addition, some measures, such as the Spectral Angle Mapper (SAM), do not consider intra-band correlations, so these methods may yield unstable band selection results. Intra-band correlation based methods select a band subset with minimal band correlations; typical examples are the mutual information method [20], the joint band-prioritization and band-decorrelation method [21], the semi-supervised band clustering method [22] and the column subset selection method [23]. These methods generally perform better than the other classical methods, but they rely heavily on prior knowledge of the intra-band relationships, and each has its own disadvantages. For example, band clustering algorithms typically involve complex combinatorial optimization, which leads to a plethora of heuristics, and the choice of cluster centers strongly affects the resulting representative bands [22].
With the maturing of artificial intelligence, many related algorithms have been adopted to solve the band selection problem. Particle swarm optimization based methods run an iterative search with a defined criterion function to obtain a band subset that maximizes the intra-class separabilities. Typical algorithms are the simple particle swarm optimization algorithm with the minimum estimated abundance covariance criterion [24], the parallel particle swarm optimization algorithm [25], and the improved particle swarm optimization algorithm [26]. Particle swarm optimization based methods have low computational complexity and require little parameter tuning, but they are easily trapped in local minima and cannot guarantee global optimization. Ant colony optimization based methods implement a positive feedback scheme and continually update the pheromones to optimize the band subset combination. Representative algorithms are the parallel ant colony optimization algorithm [27] and a specific ant colony algorithm for urban data classification [28]. Because they lack sufficient initial information, ant colony based methods usually take a long time to reach a stable optimal solution. Complex network based methods feed the HSI dataset into complex networks and find the band subset best suited to differentiating all ground objects [29,30]. The band subsets from complex networks identify different ground objects better than those of classical methods, but the high computational complexity of constructing and analyzing the complex network hinders their use in practical work. Other artificial intelligence based methods in recent literature include the progressive band selection method [31], the constrained energy minimization based method [32] and the supervised trivariate mutual information based method [33]. In summary, most artificial intelligence based methods cannot balance computational speed and solution quality well. In addition, the estimated band subset is difficult to interpret physically because of the complicated search strategies adopted.
More recently, the popularity of compressive sensing has brought new perspectives to band selection, and many sparsity-based algorithms have been presented in the literature [34][35][36]. Sparsity theory states that each band vector (i.e., the image of each band reshaped into a column vector) can be sparsely represented using only a few nonzero coefficients in a proper basis or dictionary [37,38]. Sparse representation can uncover underlying features within the HSI band collection and help select a proper band subset. Sparse Nonnegative Matrix Factorization (SNMF) based methods originate from the idea of "blind source separation" and simultaneously factorize the HSI data matrix into a dictionary and a sparse coefficient matrix [36]. The band subset is then estimated from the sparse coefficient matrix. Examples of SNMF based methods are the improved SNMF with thresholded earth mover's distance algorithm [39] and the constrained nonnegative matrix factorization algorithm [40]. SNMF based methods rely on low-rank approximations and are highly flexible in capturing the variances among different band vectors. Unfortunately, the band subset from SNMF based methods can be hard to interpret, and its physical or geometric meaning is unclear. Unlike SNMF based methods, sparse coding based methods use dictionaries that are learned or manually defined in advance. They integrate regular band selection models with the sparse representation model of band vectors to estimate representative bands. Typical methods are the sparse representation based (SpaBS) method [35], the sparse support vector machine method [41], the sparse constrained energy minimization method [42], the discriminative sparse multimodal learning based method [43], the multitask sparsity pursuit method [44] and the least absolute shrinkage and selection operator based method [45]. As with SNMF, the band subset from sparse coding lacks a clear physical or geometric explanation. When the dictionary in sparse coding is set equal to the HSI data matrix itself, all band vectors can be assumed to be sampled from several independent subspaces, which leads to the Sparse Subspace Clustering (SSC) model. Typical methods include the collaborative sparse model based method [34] and the Improved Sparse Subspace Clustering (ISSC) method [46]. SSC based methods combine the sparse coding model with subspace clustering, and the clustering makes the resulting band subset easy to interpret. Nevertheless, the cluster centers in these methods are difficult to determine uniquely because they depend on the number of clusters.
In this study, unlike previous works, a Symmetric Sparse Representation (SSR) method is proposed for the band selection problem. The aim of SSR is to combine the advantages of SNMF and SSC while avoiding their respective disadvantages. Compared with SNMF and SSC, the SSR method offers three main innovations. (1) SSR combines the assumptions of SNMF and SSC and integrates the benefits of both. SNMF assumes that each band vector can be sparsely represented by the aimed band subset with a sparse, nonnegative coefficient vector; that is, each band vector in the HSI data is a convex combination of the aimed band subset, even though the subset itself is undetermined. SSC assumes that each selected band vector can be sparsely represented in the feature space spanned by all the band vectors, i.e., each selected band vector is a convex combination of all the band vectors in the HSI data. SSR combines these two symmetric assumptions and thus integrates the advantages of SNMF with the virtues of SSC. (2) SSR has a clearer geometric interpretation than many current methods. It formulates band selection as the optimization program of archetypal analysis, which gives SSR a clear geometric meaning: selecting the representative bands amounts to finding the archetypes (i.e., representative corners) of the minimal convex hull containing the HSI band points (each band vector corresponds to a high-dimensional band point). In contrast, current sparsity-based methods, including SNMF and sparse coding based methods, can capture the low-rank features of the HSI band set, but the meanings of the selected bands are difficult to interpret [47]. (3) SSR involves no tuning of inner parameters, which makes it easier to apply in practice. In particular, SSR has no clustering procedure, so the estimated SSR band subset avoids the negative effects of the clustering steps used in SNMF and SSC based methods.
The rest of this paper is organized as follows. Section 2 presents the band selection procedure using the proposed SSR method. Section 3 describes the experimental results of SSR in band selection for classification on two widely used HSI datasets. Section 4 discusses the performance of SSR compared with four other methods. Section 5 states the conclusions.

Methods
In this section, the Symmetric Sparse Representation (SSR) method is proposed. Section 2.1 describes the symmetric sparse representation model of HSI bands, Section 2.2 presents the solution of the model, and Section 2.3 summarizes the proposed band selection method.

Symmetric Sparse Representation of HSI Bands
SNMF assumes that each band vector in HSI data can be sparsely represented by a coefficient vector over a basis or dictionary constituted by the aimed band subset. SNMF simultaneously decomposes the HSI band matrix into the dictionary and a sparse coefficient matrix. SNMF was inspired by the idea of "blind source separation", and its flexibility makes it efficient at capturing the variances among different bands when selecting representative bands. However, the low-rank approximations of SNMF cannot provide reasonable explanations of the selected band subset. In contrast, SSC based methods build on subspace clustering and assume that each selected band is sampled from a defined subspace and can be sparsely represented by all the other bands of the HSI data. The clustering and subspace assumptions give SSC an easily interpretable band subset. Nevertheless, the binary assignments in clustering reduce the flexibility of the SSC model, and the clustering result depends strongly on the heuristic choice of cluster centers. Therefore, we propose the Symmetric Sparse Representation (SSR) model to combine the virtue of SSC with the flexibility of SNMF.
The SSR model assumes that the selected bands are convex combinations of the original HSI bands and that all HSI bands are approximated by convex combinations of the selected band subset. Let all the HSI band vectors constitute a band matrix $\mathbf{Y} = \{\mathbf{y}_j\}_{j=1}^{N} \in \mathbb{R}^{D \times N}$, where each column $\mathbf{y}_j$ is a band vector corresponding to a band point in the $D$-dimensional feature space, $D$ is the number of pixels in the image scene and $N$ is the number of bands, with $N \ll D$. Band selection aims to find representative (exemplar) bands in the original HSI band set; accordingly, SSR assumes that the HSI band matrix can be reconstructed from the selected band vectors using a sparse coefficient matrix. This assumption yields the SNMF equation, in which the HSI band matrix is decomposed into the aimed band subset and a sparse coefficient matrix [36]:
$$\mathbf{Y} = \mathbf{Z}\mathbf{A} + \mathbf{E}_1 \tag{1}$$
where $\mathbf{Z} \in \mathbb{R}^{D \times k}$ is the dictionary matrix constituted by the selected band vectors, $k$ is the size of the band subset, $\mathbf{A} = \{\mathbf{a}_i\}_{i=1}^{N} \in \mathbb{R}^{k \times N}$ is the sparse coefficient matrix with $\mathbf{a}_i \geq 0$ and $\|\mathbf{a}_i\|_1 = 1$, and $\mathbf{E}_1 \in \mathbb{R}^{D \times N}$ is the error term of all the band vectors. The constraint $\mathbf{a}_i \geq 0$ ensures nonnegative coefficients, consistent with the physical nature of HSI band vectors. The constraint $\|\mathbf{a}_i\|_1 = 1$ guarantees that the coefficients with which the selected bands represent the $i$-th band sum to 1, so they can be read as probabilities. The error matrix $\mathbf{E}_1$ mainly originates from approximation errors in the representation by the selected bands and from Gaussian noise in all band vectors.
Meanwhile, SSR assumes that all the HSI bands are sampled from a union of independent subspaces, each constituted by several bands. Each representative band $\mathbf{z}_j$ can then be approximately sparsely represented in the feature space spanned by all the bands [46]:
$$\mathbf{z}_j \approx \mathbf{Y}\mathbf{b}_j \tag{2}$$
where $\mathbf{b}_j$ is a sparse coefficient vector that gives the coordinates of $\mathbf{z}_j$ in the feature space, with $\mathbf{b}_j \geq 0$, $b_{jj} = 0$ and $\|\mathbf{b}_j\|_1 = 1$. The constraint $\mathbf{b}_j \geq 0$ ensures nonnegative coefficients, consistent with the nature of HSI band vectors. The constraint $b_{jj} = 0$ eliminates the trivial solution in which each selected band simply represents itself. The constraint $\|\mathbf{b}_j\|_1 = 1$ guarantees that the coefficients with which all the other band vectors represent an arbitrary selected band $\mathbf{z}_j$ sum to 1. The positions of the nonzero entries in $\mathbf{b}_j$ indicate the other bands from the same subspace (i.e., cluster) to which the representative band $\mathbf{z}_j$ belongs. Stacking all $k$ selected bands column-wise, the selected band vectors are sparsely represented by the original band vectors:
$$\mathbf{Z} = \mathbf{Y}\mathbf{B} + \mathbf{E}_2, \quad \operatorname{diag}(\mathbf{B}) = \mathbf{0} \tag{3}$$
where $\mathbf{B} = \{\mathbf{b}_j\}_{j=1}^{k} \in \mathbb{R}^{N \times k}$ is the sparse coefficient matrix and $\mathbf{E}_2 \in \mathbb{R}^{D \times k}$ is the error term arising from Gaussian noise in the bands and approximation errors in the representation model. The constraint $\operatorname{diag}(\mathbf{B}) = \mathbf{0}$ avoids the trivial solution in which the selected bands are represented by themselves. The nonzero entries in each column of $\mathbf{B}$ identify the band constituents of its subspace, and all the bands are concentrated into $k$ independent subspaces. Substituting Equation (3) into Equation (1) gives the Symmetric Sparse Representation (SSR) model for the HSI bands:
$$
\mathbf{Y} = \mathbf{Y}\mathbf{B}\mathbf{A} + \mathbf{E}, \quad
\text{s.t.}\;
\begin{cases}
\mathbf{b}_j \geq 0,\; b_{jj} = 0,\; \|\mathbf{b}_j\|_1 = 1, & \forall j \in \{1, \dots, k\} \\
\mathbf{a}_i \geq 0,\; \|\mathbf{a}_i\|_1 = 1, & \forall i \in \{1, \dots, N\}
\end{cases}
\tag{4}
$$
where $\mathbf{Y}$ is the band matrix constituted by all band column vectors, $\mathbf{B}$ and $\mathbf{A}$ are sparse, column-stochastic coefficient matrices, and the error term $\mathbf{E} = \mathbf{E}_2\mathbf{A} + \mathbf{E}_1$ combines the errors in Equations (1) and (3), which come from noise in the band vectors and approximation errors in the sparse representation models. Equation (4) integrates SNMF and SSC, so the SSR model inherits both flexibility and ease of interpretation.
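To make the shapes and constraints of Equation (4) concrete, the following Python/NumPy sketch builds a toy band matrix Y, a column-stochastic B (N × k, with b_jj = 0) and a column-stochastic A (k × N), and evaluates the residual E = Y − YBA together with its squared Frobenius norm. The sizes and the helper names are illustrative assumptions, and B and A are dense random draws rather than the sparse solutions SSR would estimate.

```python
import numpy as np

def random_column_stochastic(rows, cols, rng, zero_diag=False):
    """Draw a nonnegative matrix whose columns each sum to 1 (hypothetical helper)."""
    M = rng.random((rows, cols))
    if zero_diag:
        # Enforce b_jj = 0 for j < min(rows, cols), as in the SSR constraints.
        idx = np.arange(min(rows, cols))
        M[idx, idx] = 0.0
    return M / M.sum(axis=0, keepdims=True)

def ssr_residual(Y, B, A):
    """Return E = Y - Y B A and its squared Frobenius norm."""
    Z = Y @ B            # estimated (archetypal) band vectors, D x k
    E = Y - Z @ A        # error term of Equation (4), D x N
    return E, np.linalg.norm(E, "fro") ** 2

rng = np.random.default_rng(0)
D, N, k = 500, 40, 5     # toy sizes with k << N << D
Y = rng.random((D, N))   # each column is one band vector
B = random_column_stochastic(N, k, rng, zero_diag=True)
A = random_column_stochastic(k, N, rng)
E, obj = ssr_residual(Y, B, A)
print(B.shape, A.shape, E.shape)  # prints: (40, 5) (5, 40) (500, 40)
```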

The Solution of SSR Model for Band Selection
The solution of Equation (4) can be transformed into the well-known archetypal analysis problem [48]:
$$
\mathop{\arg\min}_{\mathbf{B},\,\mathbf{A}} \; \|\mathbf{Y} - \mathbf{Y}\mathbf{B}\mathbf{A}\|_F^2, \quad
\text{s.t.}\;
\begin{cases}
\mathbf{b}_j \geq 0,\; b_{jj} = 0,\; \|\mathbf{b}_j\|_1 = 1, & \forall j \in \{1, \dots, k\} \\
\mathbf{a}_i \geq 0,\; \|\mathbf{a}_i\|_1 = 1, & \forall i \in \{1, \dots, N\}
\end{cases}
\tag{5}
$$
where $\|\cdot\|_F$ is the Frobenius norm. Archetypal analysis assumes that archetypes are convex combinations of all band points and that all band points are approximated by convex combinations of the archetypes [49]. Selecting a proper band subset can therefore be interpreted as finding the archetypes of the minimal convex hull of the high-dimensional band points [50]. Both coefficient matrices $\mathbf{B}$ and $\mathbf{A}$ in Equation (5) are unknown, which makes the problem non-convex and challenging to solve. Fortunately, the problem becomes convex in one of the variables $\mathbf{A}$ or $\mathbf{B}$ when the other is fixed. In this study, we use a block-coordinate descent scheme to solve Equation (5).
First, the selected band subset $\mathbf{Z}^{(0)}$ is initialized with the FURTHESTSUM algorithm [51], and the initial sparse coefficient matrix $\mathbf{B}^{(0)}$ is obtained from $\mathbf{Z}^{(0)} = \mathbf{Y}\mathbf{B}^{(0)}$. FURTHESTSUM proceeds in three steps: (1) a subset $\mathbf{Z}^{(0)}$ of $k$ bands is randomly selected from the original band set; (2) for an arbitrary $j$-th random band $\mathbf{z}_j^{(0)} \in \mathbf{Z}^{(0)}$, the band $\mathbf{z}_j'^{(0)}$ with the maximal Euclidean distance to $\mathbf{z}_j^{(0)}$ is chosen from the original band set; and (3) each random band $\mathbf{z}_j^{(0)}$ is replaced by its corresponding band $\mathbf{z}_j'^{(0)}$, which renews $\mathbf{Z}^{(0)}$ as the FURTHESTSUM band subset. The carefully selected initial subset $\mathbf{Z}^{(0)}$ speeds up the convergence of Equation (5) and lowers the risk of finding insignificant bands, in particular of selecting bands that are too close to each other.
After that, the block-coordinate descent scheme optimizes $\mathbf{B}$ and $\mathbf{A}$ iteratively and updates each variable at iteration $t+1$ as follows [48]. With $\mathbf{B}^{(t)}$ fixed, each column $\mathbf{a}_i$ of $\mathbf{A}$ is updated by solving the quadratic program
$$
\mathbf{a}_i^{(t+1)} = \mathop{\arg\min}_{\mathbf{a} \geq 0,\; \|\mathbf{a}\|_1 = 1} \; \|\mathbf{y}_i - \mathbf{Y}\mathbf{B}^{(t)}\mathbf{a}\|_2^2, \quad \forall i \in \{1, \dots, N\}
\tag{6}
$$
After a sweep over all columns, $\mathbf{A}^{(t+1)}$ is obtained. Then, with $\mathbf{A}^{(t+1)}$ fixed, $\mathbf{B}^{(t+1)}$ is estimated column by column, where each $\mathbf{b}_j$ at iteration $t+1$ is optimized with the quadratic program
$$
\mathbf{b}_j^{(t+1)} = \mathop{\arg\min}_{\mathbf{b} \geq 0,\; b_{jj} = 0,\; \|\mathbf{b}\|_1 = 1} \;
\left\| \mathbf{Y}\mathbf{b}_j^{(t)} + \frac{1}{\|\mathbf{a}^{j}\|_2^2}\left(\mathbf{Y} - \mathbf{Y}\mathbf{B}^{(t)}\mathbf{A}^{(t+1)}\right)\mathbf{a}^{j\top} - \mathbf{Y}\mathbf{b} \right\|_2^2
\tag{7}
$$
where $\mathbf{b}_j^{(t)}$ is the $j$-th column of $\mathbf{B}^{(t)}$ and $\mathbf{a}^{j}$ is the $j$-th row of $\mathbf{A}^{(t+1)}$. $\mathbf{B}^{(t+1)}$ is obtained after a sweep over all its columns. The active-set algorithm [52] is used to solve the quadratic programs in Equations (6) and (7); it implements an aggressive strategy that exploits the underlying sparsity of $\mathbf{B}$ and $\mathbf{A}$. The updates of $\mathbf{A}^{(t+1)}$ and $\mathbf{B}^{(t+1)}$ are repeated until the convergence condition is satisfied or the number of iterations exceeds a predefined maximum. The convergence condition is $\|\mathbf{Y} - \mathbf{Y}\mathbf{B}^{(t+1)}\mathbf{A}^{(t+1)}\|_\infty \leq \varepsilon$, where $\varepsilon$ is the error tolerance for the residuals. The matrices $\mathbf{A}^{(t+1)}$ and $\mathbf{B}^{(t+1)}$ at the stopping iteration are taken as the optimal sparse coefficient matrices, and the estimated band subset is obtained as $\mathbf{Z} = \mathbf{Y}\mathbf{B}^{(t+1)}$. Because of the approximation errors in Equation (4), the columns of $\mathbf{Z}$ are generally not real bands of the HSI data. Therefore, each estimated band is replaced by the nearest real band in the original band collection. The index set $\mathbf{c} = \{c_j\}_{j=1}^{k}$ of the real band subset is obtained from
$$
c_j = \mathop{\arg\min}_{i = 1, \dots, N} \|\mathbf{z}_j - \mathbf{y}_i\|_2^2, \quad \forall j \in \{1, \dots, k\}
\tag{8}
$$
where $\mathbf{z}_j$ is the $j$-th column of the estimated $\mathbf{Z}$. The final band subset $\hat{\mathbf{Z}}$ is picked with the index set $\mathbf{c}$.
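The alternating scheme above can be sketched in Python/NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the quadratic programs of Equations (6) and (7) are solved approximately with a few projected-gradient steps onto the probability simplex instead of the active-set algorithm [52], the initialization follows the three FURTHESTSUM steps exactly as described in this section, and the ∞-norm in the stopping rule is read as the largest absolute residual entry. The function names (project_simplex, furthest_init, ssr_bcd) and the defaults for max_iter, inner and eps are hypothetical.

```python
import numpy as np

def project_simplex(v, exclude=None):
    """Euclidean projection onto {x >= 0, sum(x) = 1}; optionally force x[exclude] = 0."""
    x = np.zeros_like(v)
    mask = np.ones(v.size, dtype=bool)
    if exclude is not None:
        mask[exclude] = False
    w = v[mask]
    u = np.sort(w)[::-1]                          # sort-based simplex projection
    css = np.cumsum(u)
    ks = np.arange(1, w.size + 1)
    rho = np.nonzero(u - (css - 1.0) / ks > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    x[mask] = np.maximum(w - theta, 0.0)
    return x

def furthest_init(Y, k, rng):
    """Initialization as described in the text: pick k random bands, then replace
    each one by the band with the largest Euclidean distance to it."""
    N = Y.shape[1]
    sq = (Y ** 2).sum(axis=0)
    idx = []
    for j in rng.choice(N, size=k, replace=False):
        d2 = sq + sq[j] - 2.0 * (Y.T @ Y[:, j])   # squared distances to band j
        idx.append(int(np.argmax(d2)))
    B = np.zeros((N, k))
    B[idx, np.arange(k)] = 1.0                    # Z0 = Y @ B0 selects these bands
    return B

def ssr_bcd(Y, k, max_iter=100, inner=20, eps=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    N = Y.shape[1]
    B = furthest_init(Y, k, rng)
    A = np.full((k, N), 1.0 / k)
    LY = np.linalg.norm(Y, 2) ** 2                # Lipschitz constant for B-subproblems
    for _ in range(max_iter):
        # A-step (Eq. 6): each column of A is a simplex-constrained least squares.
        Z = Y @ B
        LZ = max(np.linalg.norm(Z, 2) ** 2, 1e-12)
        for _ in range(inner):
            G = Z.T @ (Z @ A - Y)
            A = np.apply_along_axis(project_simplex, 0, A - G / LZ)
        # B-step (Eq. 7): update one column b_j at a time with A fixed.
        R = Y - Y @ B @ A
        for j in range(k):
            aj = A[j]
            nrm = aj @ aj
            if nrm < 1e-12:
                # Archetype j is unused by A; only restore feasibility (b_jj = 0).
                b = project_simplex(B[:, j], exclude=j)
            else:
                target = Y @ B[:, j] + R @ aj / nrm
                b = B[:, j].copy()
                for _ in range(inner):
                    b = project_simplex(b - Y.T @ (Y @ b - target) / LY, exclude=j)
            R += np.outer(Y @ (B[:, j] - b), aj)  # keep R = Y - Y B A in sync
            B[:, j] = b
        if np.abs(R).max() <= eps:                # stopping rule on the residual
            break
    return B, A

rng = np.random.default_rng(1)
Y = rng.random((300, 30))                         # toy data: D = 300 pixels, N = 30 bands
B, A = ssr_bcd(Y, k=4, max_iter=30)
Z = Y @ B                                         # estimated band subset
```

Projected gradient with step size 1/L (L being the squared spectral norm of the design matrix) keeps the sketch short and, from a feasible start, never increases the convex subproblem objective; an active-set solver, as used in the paper, instead exploits the sparsity of the solutions.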

The Summary of SSR for Band Selection
The SSR method rests on two symmetric assumptions: the original band set is sparsely represented by the dictionary matrix of the selected band subset, and each band in the selected subset can be sparsely represented by all the original bands except itself. SSR formulates band selection as an archetypal analysis problem and solves it with a block-coordinate descent scheme, using the FURTHESTSUM algorithm to obtain a good initial band subset $\mathbf{Z}^{(0)}$. The sparse coefficient matrices $\mathbf{B}$ and $\mathbf{A}$ are obtained when the convergence condition is satisfied or the number of iterations exceeds the maximum. Because the estimated bands are generally not members of the original band set, the real bands with the smallest distance to the estimated bands are selected from the original band collection to form the final band subset. The SSR method proceeds as follows:
(1) The hyperspectral data cube is transformed into a two-dimensional band matrix $\mathbf{Y} \in \mathbb{R}^{D \times N}$, where $D$ is the number of pixels and $N$ is the number of bands.
(2) With a predefined band subset size $k$, the SSR model represents the HSI bands with Equation (4), where $\mathbf{B}$ and $\mathbf{A}$ are the unknown sparse coefficient matrices.
(3) The SSR problem is reformulated as the archetypal analysis problem in Equation (5), which is solved with a block-coordinate descent algorithm. The algorithm is iterative, and each column of $\mathbf{A}$ and $\mathbf{B}$ is updated by solving the quadratic programs in Equations (6) and (7), respectively.
(4) The matrices $\mathbf{A}$ and $\mathbf{B}$ at the stopping iteration are taken as the estimates, and the estimated band subset is obtained as $\mathbf{Z} = \mathbf{Y}\mathbf{B}$.
(5) For each estimated band $\mathbf{z}_j \in \mathbf{Z}$, the nearest band $\mathbf{y}_i \in \mathbf{Y}$ according to Equation (8) is selected as a member of the final subset, which yields the real band subset $\hat{\mathbf{Z}}$.
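Steps (1) and (5) are simple array operations. The sketch below, assuming the data cube is stored as a NumPy array of shape (rows, cols, N), reshapes it into the band matrix Y (one band per column) and applies Equation (8) to map each estimated band z_j to the index of its nearest real band. The function names and the toy data are hypothetical.

```python
import numpy as np

def cube_to_band_matrix(cube):
    """Step (1): reshape a (rows, cols, N) cube into Y of shape (D, N), D = rows * cols."""
    rows, cols, n_bands = cube.shape
    return cube.reshape(rows * cols, n_bands)

def snap_to_real_bands(Y, Z):
    """Step (5) / Eq. (8): for each column z_j of Z, the index of the nearest column of Y."""
    # Squared distances ||z_j - y_i||_2^2 for all pairs, shape (N, k).
    d2 = (Y ** 2).sum(axis=0)[:, None] + (Z ** 2).sum(axis=0)[None, :] - 2.0 * Y.T @ Z
    return np.argmin(d2, axis=0)

rng = np.random.default_rng(0)
cube = rng.random((20, 15, 12))          # toy cube: 20 x 15 pixels, 12 bands
Y = cube_to_band_matrix(cube)            # D = 300, N = 12
B = np.zeros((12, 3))
B[[2, 7, 10], [0, 1, 2]] = 0.9           # each estimated band is mostly one real band
B[[3, 6, 11], [0, 1, 2]] = 0.1
Z = Y @ B                                # estimated (non-real) bands
c = snap_to_real_bands(Y, Z)             # indices of the real bands to keep
Z_hat = Y[:, c]                          # final band subset
```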
The computational complexity of the FURTHESTSUM procedure is $O(DNk + Nk\log N)$, where $D$ is the number of pixels in the image scene, $N$ is the number of HSI bands and $k$ is the size of the band subset. In Equation (6), updating one column of $\mathbf{A}$ costs less than $O(Dk + k^2)$, so updating all of $\mathbf{A}$ at each iteration costs approximately $O(DNk + Nk^2)$. Similarly, updating $\mathbf{B}$ at each iteration costs approximately $O(Dk^2 + kN^2)$. Therefore, with $t$ iterations, the total complexity of the SSR method for band selection is less than $O(Nk(D + \log N) + kt(D + N)(k + N))$, which approaches $O(Nk(D + \log N) + ktDN)$ because $k \ll N \ll D$.
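As a rough check of the claim that the DN term dominates each iteration, the sketch below plugs the Indian Pines sizes used later in the experiments (D = 145 × 145 pixels, N = 200 bands, k = 12) into the per-iteration operation counts above. Constants are dropped, so these numbers indicate only relative magnitudes, not actual run time; the function name is hypothetical.

```python
def ssr_cost_terms(D, N, k):
    """Per-iteration operation counts from the complexity analysis (constants dropped)."""
    a_step = D * N * k + N * k ** 2      # updating all columns of A
    b_step = D * k ** 2 + k * N ** 2     # updating all columns of B
    return a_step, b_step

D, N, k = 145 * 145, 200, 12             # Indian Pines sizes from the experiments
a_step, b_step = ssr_cost_terms(D, N, k)
per_iter = a_step + b_step
assert per_iter == k * (D + N) * (k + N) # matches the factored form in the text
share = D * N * k / per_iter             # fraction contributed by the D*N*k term
print(per_iter, round(share, 3))         # prints: 53996400 0.935
```

With roughly 93% of the per-iteration count coming from the DNk term, dropping the other terms in the final bound is a reasonable approximation at these sizes.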

Experiments
In this section, three groups of experiments on two HSI datasets are designed to evaluate the SSR method for band selection. Section 3.1 describes the two HSI datasets, and Section 3.2 reports the detailed results of the three groups of experiments.

Descriptions of Two HSI Datasets
The Indian Pines dataset was collected by NASA on 12 June 1992 with the AVIRIS sensor from JPL. It has a spatial resolution of 20 m and a spectral resolution of 10 nm, covering a spectral range of 200-2400 nm. A subset of the image scene of 145 × 145 pixels, covering an area six miles west of West Lafayette, Indiana, is used in the experiments. The dataset was preprocessed with radiometric corrections and bad band removal, leaving 200 bands with calibrated data values proportional to radiance. Sixteen classes of ground objects exist in the image scene (Figure 1), and the ground-truth training and testing samples of each class are listed in Table 1. The Pavia University (PaviaU) dataset was acquired by the ROSIS sensor with a spatial resolution of 1.3 m and 115 bands. After removing low-SNR bands, the remaining 103 bands are used in the following experiments. A subset of the larger scene, shown in Figure 2, contains 350 × 340 pixels and covers the Pavia University area. The image scene has nine classes of ground objects, including shadows, and the ground-truth training and testing samples of each class are listed in Table 2.

Experimental Results
In the following, we design three groups of experiments on both HSI datasets to explore the performance of the proposed method. Four state-of-the-art methods, SID [18], MVPCA [21], SNMF [36] and SpaBS [35], are compared with SSR. The first experiment quantifies the band selection performance of SSR and compares it with that of the four other methods. The second experiment compares the classification accuracies of SSR and the four other methods using three popular classifiers: Support Vector Machine (SVM) [53], K-Nearest Neighbor (KNN) [54] and Random Forest (RF) [55]. Classification accuracy is quantified with the Overall Classification Accuracy (OCA) and the Average Classification Accuracy (ACA). The SVM classifier is implemented with the LIBSVM package using the Radial Basis Function (RBF) kernel [56], and its kernel variance parameter and penalty factor are estimated via cross-validation. The KNN classifier uses the Euclidean distance, and the RF classifier is implemented with the "randomforest" package using default parameters [57]. The third experiment compares the computational complexity and computational times of all five methods. Unless otherwise stated, the results below are averaged over ten independent experiments.
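For reference, OCA and ACA can be computed from a confusion matrix with their usual definitions, which we assume the experiments follow: OCA is the fraction of all test pixels classified correctly, and ACA is the unweighted mean of the per-class accuracies. The following NumPy sketch works on integer class labels and assumes every class has at least one test sample; the function names and toy labels are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the reference class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def oca_aca(y_true, y_pred, n_classes):
    cm = confusion_matrix(y_true, y_pred, n_classes)
    oca = np.trace(cm) / cm.sum()              # correct pixels / all test pixels
    per_class = np.diag(cm) / cm.sum(axis=1)   # per-class accuracy (recall)
    aca = per_class.mean()                     # unweighted mean over classes
    return oca, aca

y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 1, 0, 2, 1])
oca, aca = oca_aca(y_true, y_pred, 3)          # oca = 0.75, aca = 2/3
```

The example shows why both measures are reported: the large class 0 is classified perfectly, which lifts OCA, while ACA weights the two weaker small classes equally.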
(1) Quantitative evaluation of the SSR band subset. This experiment investigates the band selection performance of SSR before classification. We use three quantitative measures, the Average Information Entropy (AIE), the Average Correlation Coefficient (ACC) and the Average Relative Entropy (ARE, also called the Average Kullback-Leibler Divergence, AKLD), to estimate the richness of spectral information, the intra-band correlations and the intra-class separabilities of the selected band subset, respectively. We use these three measures because we argue that a proper band subset should carry a large amount of information and have low intra-band correlations and high intra-class separabilities. In this experiment, we manually choose the subset size k and use it for all five methods: k is 12 for the Indian Pines dataset and 10 for the PaviaU dataset. In the SNMF method, the parameter α controls the magnitude of the dictionary matrix entries and the parameter γ determines the sparseness of the coefficient matrix. Using cross-validation, α and γ of SNMF are set to 3.0 and 0.05 on the Indian Pines dataset and to 4.0 and 0.001 on the PaviaU dataset. The number of dictionary-learning iterations t in SpaBS is manually set to 5 for both HSI datasets.
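The exact formulas behind AIE, ACC and ARE are not given in this section, so the following sketch uses common choices that are assumptions on our part: AIE as the mean Shannon entropy of each selected band's gray-level histogram, ACC as the mean absolute Pearson correlation over all pairs of selected bands, and ARE as the mean symmetric Kullback-Leibler divergence between pairs of band histograms. The bin count (64), the smoothing constant eps and all function names are hypothetical.

```python
import numpy as np

def band_histograms(Ysel, bins=64):
    """Normalized gray-level histogram of each selected band (one column per band)."""
    lo, hi = Ysel.min(), Ysel.max()
    H = np.stack([np.histogram(Ysel[:, j], bins=bins, range=(lo, hi))[0]
                  for j in range(Ysel.shape[1])], axis=1).astype(float)
    return H / H.sum(axis=0, keepdims=True)

def aie(P):
    """Average Shannon entropy (bits) over bands; empty bins contribute 0."""
    logs = np.log2(np.where(P > 0, P, 1.0))
    return float((-(P * logs).sum(axis=0)).mean())

def acc(Ysel):
    """Mean absolute Pearson correlation over all pairs of selected bands."""
    C = np.corrcoef(Ysel, rowvar=False)
    iu = np.triu_indices(C.shape[0], k=1)
    return float(np.abs(C[iu]).mean())

def are(P, eps=1e-10):
    """Mean symmetric KL divergence (bits) between pairs of band histograms."""
    Q = P + eps                                 # smoothing avoids log(0)
    Q = Q / Q.sum(axis=0, keepdims=True)
    vals = []
    for i in range(Q.shape[1]):
        for j in range(i + 1, Q.shape[1]):
            vals.append(np.sum(Q[:, i] * np.log2(Q[:, i] / Q[:, j]))
                        + np.sum(Q[:, j] * np.log2(Q[:, j] / Q[:, i])))
    return float(np.mean(vals))

rng = np.random.default_rng(0)
Y = rng.random((1000, 6))                       # toy band matrix: 1000 pixels, 6 bands
sel = [0, 2, 5]                                 # hypothetical selected band indices
P = band_histograms(Y[:, sel])
print(aie(P), acc(Y[:, sel]), are(P))
```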
Table 3 compares the quantitative evaluation results of SSR and the four other methods on both datasets. For the Indian Pines dataset, SSR has the highest ARE and the lowest ACC, whereas SNMF has the highest AIE. The AIE of SSR is lower than that of SNMF but clearly higher than those of SID, MVPCA and SpaBS. SID and MVPCA perform worse than the three other methods. For the PaviaU dataset, SSR outperforms the four other methods in all three quantitative measures.
(2) Classification performance of SSR. This experiment comprehensively evaluates the classification performance of SSR by varying the band subset size k. Classification accuracies are quantified with OCA and ACA, and the results are averaged over ten independent experiments. The band subset size k ranges from 5 to 45 for both the Indian Pines and PaviaU datasets. The neighborhood size of the KNN classifier and the total distortion threshold of the SVM classifier are set to 3 and 0.01, respectively. Using cross-validation, α and γ of SNMF are estimated as 3.0 and 0.1 on the Indian Pines dataset and chosen as 4.0 and 1.5 on the PaviaU dataset. All other parameters are the same as in Experiment (1).
Figure 3 plots the OCAs of the original HSI band sets and of the band subsets from all five methods using the SVM, KNN and RF classifiers on both datasets. The ACA plots are omitted because they are similar to the OCA plots. For both datasets and all three classifiers, all the curves in Figure 3a to Figure 3f rise from small values and level off beyond a certain k. SID performs worst in all plots, regardless of classifier or dataset, which agrees with the observations in Experiment (1). In all six panels, the OCA curves of SSR clearly surpass those of the four other methods (SID, SpaBS, MVPCA and SNMF). Once k exceeds a certain value, the SSR band subsets achieve higher OCAs than the original band sets. In contrast, the curves of the four other methods stay below those of the original band sets for every k. Moreover, we compare the ACAs and OCAs of all five methods using the band subset sizes k of Experiment (1). The ACAs and OCAs of all five methods are listed in Table 4, and Figures 4 and 5 show the classification maps of all methods on both datasets using the SVM classifier. The results from the three classifiers show that SSR performs better than the four other methods, further confirming the above conclusions.
(3) Computational performance of SSR. This experiment compares the computational performance of SSR with that of the four other methods. Table 5 lists the computational complexity of all five methods, where D is the number of pixels in the image scene, k is the size of the band subset, N is the number of bands (i.e., the size of the original band set), t is the number of iterations, and K is the sparsity level in the SpaBS method. The table shows that SpaBS has the highest computational complexity among all the methods, whereas SSR has a comparatively low complexity.
Table 5. Comparison of the computational complexity of SSR and the four other methods.

Furthermore, we compare the computational times of the five methods as the band subset size increases from 10 to 50 in steps of 10. The experiment is carried out on a Windows 7 computer with an Intel i5-4570 quad-core processor and 8 GB of RAM, and SSR and the four other methods are implemented in Matlab 2014a. The results in Table 6 show that the computational times of all five methods increase with k. Among all the methods, SID is the fastest and takes the shortest time for the same k on the same HSI dataset. SSR runs faster than MVPCA, SNMF and SpaBS and is therefore the second fastest. SNMF is slower than MVPCA but clearly faster than SpaBS, which is the slowest of all the methods. In descending order of computational speed, the methods rank as follows: SID, SSR, MVPCA, SNMF and SpaBS.

Discussions
This section discusses in detail the performance of SSR compared with the four other state-of-the-art methods from Section 3.2. Three experiments were designed on the Indian Pines and PaviaU datasets to compare SSR with SID, MVPCA, SpaBS and SNMF. The three quantitative measures, AIE, ACC and ARE, show that SSR outperforms the four other methods. SSR assumes that the selected bands and the original HSI bands are sparsely represented by each other, i.e., symmetrically. These two sparse representation assumptions cast band selection as finding the archetypes (i.e., the representative corners) of the minimal convex hull containing the HSI band points. Hence, the SSR subset has a high information content, high inter-class separability and low inter-band correlation. SSR thus meets the requirements of band subset selection and is better suited to band selection than the four other methods, especially SID and MVPCA.
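For reference, the two symmetric assumptions can be written in the standard archetypal-analysis form below. This is a sketch under stated assumptions: the band matrix $\mathbf{X} \in \mathbb{R}^{D \times N}$ (one column per band point, with D pixels and N bands) and the unregularized least-squares objective are our notation, chosen to agree with the symbols $\mathbf{Z}$, $\mathbf{B}^{(t)}$ and $\mathbf{a}_i$ used elsewhere in this section; the paper's exact objective is not restated here.

$$
\min_{\mathbf{A},\,\mathbf{B}} \; \left\| \mathbf{X} - \mathbf{Z}\mathbf{A} \right\|_F^2,
\qquad \mathbf{Z} = \mathbf{X}\mathbf{B}, \quad \mathbf{A} \in \mathbb{R}^{k \times N}, \quad \mathbf{B} \in \mathbb{R}^{N \times k},
$$
$$
\text{s.t.} \quad \mathbf{a}_i \geq 0, \; \|\mathbf{a}_i\|_1 = 1, \; \forall i \in \{1, \cdots, N\};
\qquad \mathbf{b}_j \geq 0, \; \|\mathbf{b}_j\|_1 = 1, \; \forall j \in \{1, \cdots, k\}.
$$

The constraints on $\mathbf{a}_i$ make every original band a convex combination of the k estimated bands ($\mathbf{X} \approx \mathbf{Z}\mathbf{A}$), and the constraints on $\mathbf{b}_j$ make every estimated band a convex combination of the original bands ($\mathbf{Z} = \mathbf{X}\mathbf{B}$), so the columns of $\mathbf{Z}$ lie inside the convex hull of the band points. A block-coordinate descent scheme alternates between updating $\mathbf{A}$ with $\mathbf{B}$ fixed and updating $\mathbf{B}$ with $\mathbf{A}$ fixed.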
The classification and computation experiments compare the classification and computational performance of the SSR band subset with those of the four other methods. SID is the fastest, but its band subset obtains the worst ACAs and OCAs; its speed stems from having the lowest computational complexity, as it computes the diagonal elements of its similarity matrix. SpaBS obtains better ACAs and OCAs than SID but requires the longest computational time, owing to the very high complexity of dictionary learning with the K-SVD algorithm. MVPCA is slower than SSR because of the computational cost of its principal component analysis transformation. SSR achieves the best OCAs and ACAs of all five methods while requiring the second shortest computational time. Moreover, among all five methods, only the SSR band subsets achieve better OCAs than the original band sets on both HSI datasets once k exceeds a certain value. This implies that SSR can select a proper band subset and can help address the "curse of dimensionality" problem in HSI classification.
It should be noted, however, that SSR requires no parameter tuning other than the band subset size k. We set k manually and did not carefully investigate a proper size for the selected bands. Setting the subset size aside, SSR is the best candidate among the five methods for selecting a proper band subset from HSI bands because of its overall performance in classification and computation. We did not explore how to set a proper size for SSR because the different estimation criteria used by various methods make it confusing, and even difficult, to estimate a unique and proper size. Unifying the current methods for estimating the band subset size is therefore the first significant problem we aim to solve in future work. A major possible source of uncertainty for SSR in band selection for HSI classification is the effect of atmospheric calibration on both HSI datasets. We applied no atmospheric correction in this manuscript to facilitate comparison with other methods. Nevertheless, atmospheric calibration does clearly affect the classification results of HSI datasets. Therefore, the second aim of our future work is to analyze the effects of atmospheric calibration on the classification performance of SSR and to further improve the classification results of the SSR band subset in realistic applications.

Conclusions
In this study, we propose the SSR method for the band selection problem of HSI datasets. The SSR method rests on two symmetric assumptions: the HSI bands can be reconstructed from the selected bands with a sparse coefficient matrix, and the selected bands can be sparsely represented in the feature space spanned by the HSI bands. SSR selects the representative bands by finding the archetypes of the minimal convex hull containing the HSI band points. It estimates the representative bands by solving an optimization program with the block-coordinate descent scheme, and obtains the final representative bands by picking, for each element of the estimated $\mathbf{Z}$, the real band with the smallest difference from it. Three groups of experiments on the Indian Pines and PaviaU datasets were designed to test SSR, and the results were compared with those of four state-of-the-art methods and of the original band sets. SSR outperforms the four other methods in three quantitative measures, AIE, ACC and ARE, and achieves the best classification accuracies, ACAs and OCAs. Moreover, the SSR subset can obtain better classification accuracies than the original band set and can thus help deal with the "curse of dimensionality" problem in HSI classification. In addition, the comparison of computational times shows that SSR has the second shortest computational time among all five methods. Therefore, SSR is a good alternative for band selection on HSI datasets in realistic classification applications.
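The final step, mapping each estimated band in $\mathbf{Z}$ back to a real band, can be sketched in Python as below. The Euclidean distance used as the "difference", the (D, N) layout of the band matrix `X`, and the function name `match_real_bands` are assumptions made for illustration.

```python
import numpy as np

def match_real_bands(X, Z):
    """Return, for each column of Z, the index of the closest column of X.

    X: (D, N) array, one column per original band (D pixels, N bands).
    Z: (D, k) array, one column per estimated representative band.
    Distance is Euclidean (an assumption; the text only says "smallest difference").
    """
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    # Squared distances via ||z||^2 - 2 z.x + ||x||^2, which avoids building
    # a (D, k, N) difference array; result has shape (k, N).
    d2 = (
        (Z ** 2).sum(axis=0)[:, None]
        - 2.0 * Z.T @ X
        + (X ** 2).sum(axis=0)[None, :]
    )
    return np.argmin(d2, axis=1)

# Synthetic check: Z holds slightly perturbed copies of bands 3, 7 and 12.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
Z = X[:, [3, 7, 12]] + 0.01 * rng.standard_normal((100, 3))
idx = match_real_bands(X, Z)
print(idx)
```

In the synthetic check the perturbation is tiny compared with the distance between distinct random bands, so bands 3, 7 and 12 are recovered. Note that two estimated bands could map to the same real band; a variant that assigns pairs in order of increasing distance would be needed if k distinct bands are required.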

Figure 1. The image of the Indian Pines dataset.

Figure 2. The image of the PaviaU dataset.
When fixing the variable $\mathbf{B}^{(t)}$ at the $t$-th iteration, each column $\mathbf{a}_i$ satisfies $\mathbf{a}_i \geq 0$ and $\|\mathbf{a}_i\|_1 = 1$, $\forall i \in \{1, \cdots, N\}$.
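The constraint on each column $\mathbf{a}_i$ is membership in the probability simplex (for a nonnegative vector, the L1 norm is simply the sum of its entries). If the A-step is solved by projected gradient, which is an assumption since this section does not say which solver SSR uses, every column must be projected back onto the simplex after each step. A minimal Python sketch of the standard sort-based Euclidean projection follows; the function names are illustrative.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                  # entries in descending order
    css = np.cumsum(u)                    # running sums of sorted entries
    j = np.arange(1, v.size + 1)
    # Largest 0-based index rho with u[rho] - (css[rho] - 1) / (rho + 1) > 0;
    # the condition always holds at index 0, so rho is well defined.
    rho = np.nonzero(u - (css - 1.0) / j > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)  # shift that makes the result sum to 1
    return np.maximum(v - theta, 0.0)

def project_columns(A):
    """Project every column a_i of A onto the simplex independently."""
    A = np.asarray(A, dtype=float)
    return np.column_stack([project_to_simplex(A[:, i]) for i in range(A.shape[1])])

A = np.array([[2.0, 0.3],
              [0.0, 0.1],
              [-1.0, -0.2]])
P = project_columns(A)
print(P)
print(P.sum(axis=0))
```

The first column is pushed to a vertex of the simplex, while the second keeps all three entries positive after a uniform shift; both columns are nonnegative and sum to 1 afterwards.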

Table 1. The ground truth of training and testing samples in each class for the Indian Pines dataset.

Table 2. The ground truth of training and testing samples in each class for the PaviaU dataset.

Table 3. Comparison of the quantitative evaluation of the band subsets from all five methods on both datasets.

Table 4. Classification accuracies of all five methods with a fixed k on both datasets.

Table 6. Computational times of all five methods on both HSI datasets.