Hyperspectral Classification via Superpixel Kernel Learning-Based Low Rank Representation

Abstract: High dimensional image classification is a fundamental technique for information retrieval from hyperspectral remote sensing data. However, data quality is readily affected by the atmosphere and noise in the imaging process, which makes it difficult to achieve good classification performance. In this paper, multiple kernel learning-based low rank representation at superpixel level (Sp_MKL_LRR) is proposed to improve the classification accuracy for hyperspectral images. Superpixels are generated first from the hyperspectral image to reduce the noise effect and form homogeneous regions. An optimal superpixel kernel parameter is then selected by the kernel matrix using a multiple kernel learning framework. Finally, a kernel low rank representation is applied to classify the hyperspectral image. The proposed method offers two advantages: (1) the global correlation constraint is exploited by the low rank representation, while the local neighborhood information is extracted as the superpixel kernel adaptively learns the high-dimensional manifold features of the samples in each class; (2) it can meet the challenges of multiscale feature learning and adaptive parameter determination in the conventional kernel methods. Experimental results on several hyperspectral image datasets demonstrate that the proposed method outperforms several state-of-the-art classifiers in terms of overall accuracy, average accuracy, and kappa statistic.


Introduction
A hyperspectral image (HSI) records information in hundreds of adjacent narrow spectral bands collected by airborne or space-borne hyperspectral imagers. The abundant spectral information of HSIs makes them suitable for many important applications, such as mineral exploration [1], agricultural production [2], and military target detection [3,4]. Thus, HSI classification is a hot topic in the field of remote sensing image processing [5-10]. Based on the rich spectral information of HSIs, many pixel-by-pixel classification methods have been used for hyperspectral image classification, such as multinomial logistic regression (MLR) [11], support vector machine (SVM) [12], artificial neural network (ANN) [13], and the maximum likelihood method [14]. In recent years, sparse/low rank classifiers [15-17] have been applied to HSI classification. These methods use sparse or low rank properties as prior knowledge. Given a training sample set, any test sample can be represented by a small number of training samples, so the representation coefficient is sparse or of low rank.
Due to the noise in HSIs, the accuracy of pixel-by-pixel classification is low when only spectral information is used. Spectral-spatial combination methods and kernel-based methods have been shown to improve the accuracy of HSI classification effectively [18-22]. The spectral-spatial joint classification methods assume that adjacent pixels in the image belong to the same category. The spatial information constraints are then integrated into the classification model to improve accuracy. For example, the support vector machine and Markov random field (SVM-MRF) method [23] assumes that the terrain distribution of the HSI conforms to a Markov random field and then uses an MRF regularization term to build spatial information into the Bayesian framework. The joint sparse representation methods [24,25] use the training samples as a dictionary to express the object spectrum and usually introduce its neighborhood spectra to represent the spatial information. In addition, the total variation (TV) method [26] and the extended morphological profiles (EMPs) approach [27] based on morphological analysis [28] generate spatial information by describing the texture characteristics of the image, and effectively improve the classification accuracy. In recent years, tensor learning methods [29] have been developed in the area of hyperspectral image processing. In [30], Zhang et al. proposed tensor discriminative locality alignment for hyperspectral image spectral-spatial feature extraction to improve HSI classification accuracy. In addition, a multiclass support tensor machine was proposed for HSI classification in Reference [31]. In that work, a tensorial image interpretation framework was constructed for tensor-based HSI feature representation, feature extraction, and classification.
For the linearly non-separable high-dimensional data in HSIs, the kernel-based methods make them linearly separable by mapping the data to a higher dimensional nonlinear feature space. The commonly used kernel functions include the radial basis function (RBF), the mean filtering kernel (MF), and the neighborhood filtering kernel (NF). In addition, the composite kernel (CK) is also widely used in HSI classification, such as in the support vector machine composite kernel method (SVMCK) [32], the multinomial logistic regression composite kernel method (MLRCK) [33], and the sparse representation composite kernel method [21]. These CK methods introduce the spatial information to nonlinear data extracted by different kernel functions and show good classification performance. Unlike the CK methods, which use spatial filtering to generate spatial information, the spatial-spectral kernel (SSK) method [34] considers the similarity of the samples directly in the high-dimensional kernel feature space, so that it can reflect the complex manifold of the data hidden in the high-dimensional space. Hence, SSK-based methods can achieve better classification performance with a small set of training samples.
In the above methods, spatial information is often extracted through a square window, which is not consistent with the spatial distribution of HSIs. Using image features and superpixels [35,36] to select homogeneous regions adaptively can overcome the shortcomings of the fixed square window. For example, the superpixel-based CK (SPCK) method [37] has been developed. However, no single kernel function can cope with complicated HSIs. Compared with the single kernel-based methods, multiple kernel learning (MKL)-based methods [38,39] are more conducive to enhancing the interpretability of decision functions and to representing the properties of the original sample space fully. In Reference [38], the authors proposed the representative multiple kernel learning (RMKL) method, which selects the optimal kernel combination to map the original data to the high-dimensional space and classifies the data with an SVM classifier.
In this study, multiple kernel learning is extended and applied at the superpixel level. Low rank representation is then integrated with multiple superpixel kernel learning to perform HSI classification. The proposed method (Sp_MKL_LRR) consists of three processing steps. First, principal component analysis (PCA) [40] reduces the dimension of the hyperspectral image, and entropy rate segmentation [41] is applied to the dimension reduction results to generate the adaptive superpixels. Second, the superpixel spectral-spatial kernel is obtained by using the RBF kernel on the superpixels, and the optimal kernel combination is selected by the RMKL method [38] in the multiple kernel learning framework. Finally, a superpixel kernel low rank representation method classifies the hyperspectral image. The proposed method offers two advantages over the previously described approaches. First, the global correlation constraint is exploited by the low rank representation, while the high-dimensional manifold features of the samples in each class are adaptively learned by the superpixel kernel and the local neighborhood information of the samples is fully extracted. Second, the multiple kernel learning method is adopted to overcome the challenges of multiscale feature learning and adaptive parameter determination in the conventional kernel methods, which yields more accurate classification results. Experimental results on the Indian Pines and the University of Pavia datasets demonstrate that the proposed method outperforms many state-of-the-art classifiers in terms of the overall accuracy, average accuracy, and the kappa coefficient.
The rest of this paper is outlined as follows. Section 2 introduces the proposed method step by step. In Section 2.1, we first provide a brief introduction to the superpixel kernel generation, which is the theoretical basis of the proposed method. Then, RMKL is extended and applied at the superpixel level to select the optimal superpixel kernel combination in Section 2.2. In Section 2.3, a superpixel kernel low rank representation method is proposed to classify the hyperspectral image. The experimental results and analysis are given in Section 3. Finally, Sections 4 and 5 give further discussion and the conclusion, respectively.

The Proposed Sp_MKL_LRR Method
Figure 1 presents the architecture of the proposed method, which is followed by detailed descriptions of each component.


Remote Sens. 2018, 10, x FOR PEER REVIEW 4 of 18

Superpixel Kernel Generation
First, the PCA method is used to reduce the dimension of the hyperspectral data. Next, the entropy rate segmentation (ERS) superpixel method [41] generates several superpixels in the first principal component image. Figure 2 shows the superpixel segmentation result of the Indian Pines dataset, in which each connected region is a superpixel. Let $x_i$ denote the i-th sample in the image, let $x^{sp_i}$ denote the superpixel containing $x_i$, and let $\phi$ be a function mapping $x$ to the high-dimensional feature space to obtain the new feature $\phi(x)$. The neighborhood information of $x_i$ in the kernel feature space is extracted by mean filtering, which is defined as

$$\phi_{SPMF}(x_i) = \frac{1}{n_i} \sum_{m=1}^{n_i} \phi\left(x_m^{sp_i}\right), \quad (1)$$

where $n_i$ and $x_m^{sp_i}$ represent the number of pixels located in $x^{sp_i}$ and the m-th pixel in $x^{sp_i}$, respectively. The superpixel kernel between $x_i$ and $x_j$ can be represented as

$$K^{\sigma}_{SPMF}(x_i, x_j) = \left\langle \phi_{SPMF}(x_i), \phi_{SPMF}(x_j) \right\rangle = \frac{1}{n_i n_j} \sum_{m=1}^{n_i} \sum_{l=1}^{n_j} K_{\sigma}\left(x_m^{sp_i}, x_l^{sp_j}\right), \quad (2)$$

where $n_j$ is the number of pixels located in $x^{sp_j}$, $K_{\sigma}(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle = \exp\left(-\|x_i - x_j\|^2 / (2\sigma^2)\right)$ is the Gaussian RBF kernel function, and $\sigma$ is the kernel scale.
Considering the training set $X = [x_1, x_2, \cdots, x_t] \in R^{b \times t}$ with b bands and t training samples, and a testing sample $y \in R^{b \times 1}$, the column feature vectors for training and testing samples can be given as

$$K^{\sigma}_{SPMF}(X, X) = \left[ K^{\sigma}_{SPMF}(x_i, x_j) \right]_{i,j=1}^{t} \in R^{t \times t}, \quad (3)$$

$$K^{\sigma}_{SPMF}(X, y) = \left[ K^{\sigma}_{SPMF}(x_1, y), \cdots, K^{\sigma}_{SPMF}(x_t, y) \right]^{T} \in R^{t \times 1}. \quad (4)$$

From the above definitions, the superpixel kernel directly calculates the similarity between two pixels by averaging the pixel values in the kernel feature space within the corresponding superpixels. Thus, it eases the problem caused by window-based techniques, effectively overcomes the influence of outliers in superpixels, and reflects the similarities between two superpixels in the kernel feature space rather than the similarities between two vectors.
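The superpixel kernel defined above can be illustrated with a short NumPy sketch. It is only a minimal illustration under stated assumptions: the superpixel label map `labels` is taken as given (ERS itself is not implemented here), the RBF kernel uses the common form exp(-d^2 / (2 sigma^2)), and the function names, array layout, and toy data are hypothetical rather than taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian RBF kernel between the columns of A (b x n) and B (b x m)."""
    sq = (A * A).sum(0)[:, None] + (B * B).sum(0)[None, :] - 2.0 * A.T @ B
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def superpixel_kernel(pixels, labels, idx_a, idx_b, sigma):
    """Superpixel mean-filtering kernel between pixels idx_a and pixels idx_b.

    pixels : (b, N) array, one spectrum per column (assumed layout)
    labels : (N,) integer superpixel label of every pixel (assumed given)
    """
    # Collect the pixel indices of every superpixel once
    members = {s: np.flatnonzero(labels == s) for s in np.unique(labels)}
    K = np.empty((len(idx_a), len(idx_b)))
    for p, i in enumerate(idx_a):
        Si = pixels[:, members[labels[i]]]
        for q, j in enumerate(idx_b):
            Sj = pixels[:, members[labels[j]]]
            # Mean of all pairwise RBF values equals the inner product
            # of the two feature-space means
            K[p, q] = rbf_kernel(Si, Sj, sigma).mean()
    return K

# Toy data: 12 pixels with 5 bands, grouped into 4 superpixels of 3 pixels
rng = np.random.default_rng(0)
pixels = rng.random((5, 12))
labels = np.repeat(np.arange(4), 3)
train = np.array([0, 4, 7])
K = superpixel_kernel(pixels, labels, train, train, sigma=1.0)
```

Because the kernel depends only on the superpixels that contain the two pixels, any two pixels of the same superpixel produce identical kernel rows; caching the result per superpixel pair would avoid the repeated work in this naive double loop.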

Multiple Kernel Learning
From Equations (3) and (4), the columns of $K^{\sigma}_{SPMF}(X, X)$ and $K^{\sigma}_{SPMF}(X, y)$ can be viewed as new feature vectors that can be used by pixel-based classifiers. However, the value of the kernel scale $\sigma$ also affects the classification accuracy. In this subsection, the representative multiple kernel learning method is utilized to determine the final multiple kernel expression by seeking the optimal low-dimensional representation in the original space, which is comprised of multiple basic kernel matrices in the superpixel. Given f kernel scales $\sigma_1, \cdots, \sigma_f$, the base kernel matrices are computed using Equation (3), and each matrix is transformed to a column vector according to a fixed order, which yields a new expression of the f kernel matrices,

$$H_{SPMF} = \left[ vec\left(K^{\sigma_1}_{SPMF}(X, X)\right), \cdots, vec\left(K^{\sigma_f}_{SPMF}(X, X)\right) \right]^{T} \in R^{f \times t^2}.$$

Here, $vec(\cdot)$ is a stacking operator that turns a matrix into a vector.
According to Reference [38], the following model is established to find the low-dimensional linear subspace in the kernel matrix group:

$$\min_{W, Z} \left\| H_{SPMF} - W Z \right\|_F^2 \quad s.t. \quad W^{T} W = I_p, \quad (5)$$

where $W \in R^{f \times p}$ is a matrix whose column vectors $\{w_r\}_{r=1}^{p}$ span a linear subspace, and $Z \in R^{p \times t^2}$ is the projection of $H_{SPMF}$ onto the linear subspace spanned by W. The dual form of minimizing Equation (5) with regard to W is given as

$$\max_{W} Tr\left(W^{T} \Sigma_{H_{SPMF}} W\right) \quad s.t. \quad W^{T} W = I_p, \quad (6)$$

where $\Sigma_{H_{SPMF}} = H_{SPMF} H_{SPMF}^{T}$ and $I_p$ is the identity matrix of size p × p.
The optimization of Equations (5) and (6) is solved by eigenvalue decomposition or singular value decomposition. By searching for $W^*$, the variance of Z is maximized. Using the same strategy as in [38], we only take the max-variance projection vector into account and set p = 1. Then, the projection vector is

$$W^* = \left[ w_1^*, w_2^*, \cdots, w_f^* \right]^{T}.$$

Here, $W^*$ represents the optimal weight vector of the kernel functions, and the optimal kernel function is a linear combination of the base kernels with these weights, such as

$$K^* = \sum_{k=1}^{f} w_k^* K^{\sigma_k}.$$

Finally, the optimal superpixel kernel in Equation (2) is formulated as

$$K^*_{SPMF}(x_i, x_j) = \sum_{k=1}^{f} w_k^* K^{\sigma_k}_{SPMF}(x_i, x_j). \quad (8)$$

The procedure for the superpixel multiple kernel learning method is outlined in Algorithm 1.
Step 2: Give the range of kernel scale values $[\sigma_{min}, \sigma_{max}]$.
Step 5: Transform the superpixel kernel matrices to vectors and use Equation (6) to determine the optimal weights $W^*$.
Step 6: Compute the optimal superpixel kernel functions using Equation (8).
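The weight selection in Steps 5 and 6 can be sketched as follows. This is a minimal NumPy illustration, assuming p = 1 as in the text, so the weights are the leading eigenvector of the f × f matrix formed from the vectorised base kernels. The function names are hypothetical, and plain RBF kernels on random data stand in for the superpixel kernels. Whether and how the weights are rescaled afterwards is not specified in the text, so the sketch keeps the unit-norm eigenvector as returned.

```python
import numpy as np

def rmkl_weights(kernels):
    """Max-variance projection weights for a list of (t x t) base kernel matrices."""
    # Each row of H is one vectorised base kernel, so H has shape (f, t*t)
    H = np.stack([K.ravel() for K in kernels])
    S = H @ H.T                            # f x f matrix H H^T
    eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
    w = eigvecs[:, -1]                     # leading eigenvector (p = 1)
    if w.sum() < 0:                        # eigenvectors are defined up to sign
        w = -w
    return w

def combine_kernels(kernels, w):
    """Linear combination sum_k w_k K_k of the base kernels."""
    return sum(wk * K for wk, K in zip(w, kernels))

# Toy data: 10 samples with 6 bands, three kernel scales (stand-in RBF kernels)
rng = np.random.default_rng(0)
X = rng.random((6, 10))
d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
kernels = [np.exp(-d2 / (2.0 * s ** 2)) for s in (0.5, 1.0, 2.0)]
w = rmkl_weights(kernels)
K_opt = combine_kernels(kernels, w)
```

Since every entry of an RBF kernel matrix is positive, the matrix H H^T has positive entries and its leading eigenvector can be chosen with all components positive, which is why a single sign flip is enough to obtain non-negative weights here.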

Superpixel Kernel Low Rank Representation Classifier
In HSIs, the spectral characteristics of a homogeneous region also change because of light, environment, weather, and other factors. Consequently, the spectra of pixels belonging to the same class may differ. This inconsistency decreases the classification accuracy. To solve this problem, it is necessary to exploit the characteristics of the spectral kernel space in HSIs and to build a more robust classification model using a structured prior. In Reference [42], low rank representation was employed for HSI classification, resulting in smooth boundaries between different classes in HSIs. Compared with other sparse prior-based methods, the effect becomes more apparent within a much larger homogeneous region. Inspired by References [42-44], the superpixel kernel is applied to the low rank representation model for HSI classification. Specifically, the smoothing effect of low rank representation is combined with the spatial information and high-dimensional separability constructed by the superpixel kernel to improve the classification accuracy further.
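Let $Y = [y_1, y_2, \ldots, y_r] \in R^{b \times r}$ be the testing sample set with b bands and r samples. We use the superpixel mapping function $\phi_{SPMF}$ to map the testing sample set Y and the training sample set X to the high-dimensional feature space, where

$$\phi_{SPMF}(Y) = \left[ \phi_{SPMF}(y_1), \cdots, \phi_{SPMF}(y_r) \right], \quad \phi_{SPMF}(X) = \left[ \phi_{SPMF}(x_1), \cdots, \phi_{SPMF}(x_t) \right].$$

Having these definitions in mind, the low rank representation-based classification is given as

$$\min_{U} \frac{1}{2} \left\| \phi_{SPMF}(Y) - \phi_{SPMF}(X) U \right\|_F^2 + \lambda \left\| U \right\|_*, \quad (9)$$

where $U \in R^{t \times r}$ is an unknown low rank coefficient matrix, $\|\cdot\|_*$ is the nuclear norm, and $\lambda$ is a regularization factor. A lower value of $\lambda$ indicates a weaker constraint on the rank of U.
After solving for U, the classification criterion based on the kernel low rank representation can be defined as

$$class(y_i) = \arg\min_{c} \left\| \phi_{SPMF}(y_i) - \phi_{SPMF}(X) \delta_c(U_i) \right\|_2^2, \quad (10)$$

where $c \in \{1, 2, \cdots, C\}$ is the category index of a pixel, $U_i$ is the i-th column of U, and $\delta_c(U_i)$ is an indicator operation zeroing out all elements of $U_i$ that do not belong to the class c.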
Having $K(X, X) = \phi_{SPMF}(X)^{T} \phi_{SPMF}(X)$, all high-dimensional mappings in Equation (9) are expressed in the form of inner products as

$$\min_{U} \frac{1}{2} Tr\left(U^{T} G U\right) - Tr\left(U^{T} P\right) + \lambda \left\| U \right\|_* + const, \quad (11)$$

where const is a constant term, G is a positive semi-definite matrix with elements $G_{ij} = K^*_{SPMF}(x_i, x_j)$, and P is a matrix with elements $P_{ij} = K^*_{SPMF}(x_i, y_j)$. Thus, the classification criterion is rewritten as

$$class(y_i) = \arg\min_{c} \; \delta_c(U_i)^{T} G \, \delta_c(U_i) - 2 \, \delta_c(U_i)^{T} P_i, \quad (12)$$

where $P_i$ is the i-th column of P. The optimization of Equation (11) is a convex problem solvable using ADMM [45]. Introducing an auxiliary variable V for U, Equation (11) is transformed into the constrained optimization problem of Equation (13). The Lagrange multiplier method then transforms Equation (13) into an unconstrained optimization problem, in which L is the Lagrangian multiplier and $\mu$ is the Lagrangian parameter.
ADMM adopts an alternating update strategy to solve the above optimization. The optimum solution of Equation (16) is then formulated as $A \, \Theta_{2\lambda/\mu}(\Sigma) \, B^{T}$, where $A \Sigma B^{T}$ is the singular value decomposition of the matrix $V + L/\mu$ and $\Theta_{2\lambda/\mu}$ is a soft threshold operator applied to the singular values: $\Theta_{\tau}(x) = sign(x) \max(|x| - \tau, 0)$. The optimization problem of Equation (17) has an explicit solution. In Equations (14)-(19), $\mu$ is a penalty parameter. A dynamic update strategy with parameters $\rho \geq 1$ and $0 \leq \varepsilon_1 \leq 1$ is applied to $\mu$ to accelerate the iteration. The iteration stopping condition is given in Equation (21). The process of the superpixel kernel and low rank representation-based classifier is provided in Algorithm 2.

Algorithm 2. Superpixel kernel low rank representation-based classification algorithm
Step 1: Inputs: training sample set X and corresponding category set along with the testing sample set Y.
Step 2: Select the optimal superpixel kernel function using Algorithm 1.
Step 9: Calculate the iteration stopping condition according to Equation (21).

end end while
Step 10: Determine the class of each pixel with Equation (12).
Step 11: Output: the categories of testing samples.
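The overall flow of Algorithm 2 can be sketched in NumPy as below. This is a hedged illustration, not the authors' implementation: it assumes the objective 0.5 Tr(U'GU) - Tr(U'P) + lam ||U||_*, a variable splitting U = V with the nuclear norm on U, a threshold of lam / mu that follows from that splitting, a simple growing-penalty rule for mu, and a residual-based stopping test. These constants and rules may differ from the paper's Equations (13)-(21). All function names are hypothetical, and a plain RBF kernel on synthetic data stands in for the optimal superpixel kernel.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: soft-threshold the singular values of M by tau."""
    A, s, Bt = np.linalg.svd(M, full_matrices=False)
    return (A * np.maximum(s - tau, 0.0)) @ Bt

def kernel_lrr(G, P, lam=1e-3, mu=1e-2, rho=1.1, mu_max=1e6, tol=1e-6, max_iter=500):
    """ADMM sketch for min_U 0.5*Tr(U'GU) - Tr(U'P) + lam*||U||_* (assumed splitting U = V)."""
    t, r = P.shape
    U = np.zeros((t, r))
    V = np.zeros((t, r))
    L = np.zeros((t, r))
    I = np.eye(t)
    for _ in range(max_iter):
        U = svt(V + L / mu, lam / mu)                     # nuclear-norm subproblem
        V = np.linalg.solve(G + mu * I, P - L + mu * U)   # quadratic subproblem
        R = V - U                                         # constraint residual
        L = L + mu * R                                    # multiplier update
        mu = min(rho * mu, mu_max)                        # growing penalty (assumed rule)
        if np.linalg.norm(R) <= tol * max(1.0, np.linalg.norm(U)):
            break
    return U

def classify(G, P, U, train_labels):
    """Assign each test sample to the class with the smallest kernel residual."""
    classes = np.unique(train_labels)
    res = np.empty((len(classes), P.shape[1]))
    for k, c in enumerate(classes):
        # delta_c: keep only the coefficients of class-c training samples
        Uc = np.where((train_labels == c)[:, None], U, 0.0)
        # Column-wise u'Gu - 2u'p, the residual up to a class-independent constant
        res[k] = np.einsum('ij,ij->j', Uc, G @ Uc) - 2.0 * np.einsum('ij,ij->j', Uc, P)
    return classes[np.argmin(res, axis=0)]

def rbf(A, B, sigma=1.0):
    """Plain RBF kernel between columns of A and B (stand-in for the superpixel kernel)."""
    d2 = ((A[:, :, None] - B[:, None, :]) ** 2).sum(axis=0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Synthetic two-class data with 3 bands: 10 training and 8 testing samples
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [3.0, 3.0, 3.0]]).T
train_labels = np.repeat([0, 1], 5)
test_labels = np.repeat([0, 1], 4)
X = centers[:, train_labels] + 0.1 * rng.standard_normal((3, 10))
Y = centers[:, test_labels] + 0.1 * rng.standard_normal((3, 8))
G, P = rbf(X, X), rbf(X, Y)
U = kernel_lrr(G, P)
pred = classify(G, P, U, train_labels)
```

In this splitting, the nuclear-norm step has the closed-form singular value thresholding solution and the quadratic step reduces to one linear solve with G + mu I, so each iteration costs one SVD and one solve.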

Datasets Description and Assessment Indicators
To verify the effectiveness of the proposed method, two real hyperspectral image datasets are employed for the performance evaluation of classification. They were downloaded from http://lesun.weebly.com/hyperspectral-data-set.html. These two datasets have been well pre-processed. Therefore, we can mainly focus on the task of HSI classification. The only preprocessing applied to these two datasets is normalization.
Indian Pines Data: This dataset was collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over the Indian Pines test site in Northwest Indiana, USA. The spatial size of the image is 145 × 145 pixels and the spatial resolution is 20 m/pixel. The original dataset contains 224 bands across the spectral range from 0.2 to 2.4 µm. In this experiment, 4 bands containing only zeros and 20 water vapor absorption bands are removed, and the remaining 200 bands are used for classification. Figure 3a shows a pseudo color image, and Figure 3b shows the corresponding ground truth, which contains sixteen types of objects.

University of Pavia: This dataset was collected by the Reflective Optics System Imaging Spectrometer optical sensor (ROSIS) over an urban area surrounding the University of Pavia. The spatial size of the image is 610 × 340 and the spatial resolution is 1.3 m per pixel. The original dataset contains 115 bands across the spectral range from 0.43 to 0.86 µm. After removing 12 noisy bands, 103 bands remain for classification. Figure 4a shows its false color image and Figure 4b shows the corresponding ground truth, which contains nine types of objects.
Experiments have been carried out to compare several HSI classification methods, including the proposed Sp_MKL_LRR method, traditional classifiers (e.g., SVM and LRR), a spectral-spatial combined method (e.g., SMLR_SpTV), a kernel-based method (e.g., SVMCK), superpixel-based methods (e.g., SPCK and SCMK), and a multiple kernel learning method (e.g., RMKL). Brief definitions of these methods are given as follows: (1) SVM: support vector machine-based classifier [46]; (2) LRR: low rank representation-based classifier [44]; (3) SVMCK: composite kernels and SVM-based method [32]; (4) SMLR_SpTV: multinomial logistic regression and spatially adaptive total variation-based method [26]; (5) SPCK: superpixel-based composite kernel and SVM classifier [37]; (6) SCMK: superpixel, multiple kernels, and SVM-based method [42]; (7) RMKL: representative multiple kernel learning and SVM-based method [38]; (8) Sp_MKL_SVM: the proposed superpixel multiple kernel learning and SVM-based method; (9) Sp_MKL_LRR: the proposed method.
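The overall accuracy (OA), average accuracy (AA), and the kappa (κ) coefficient are used as the performance evaluation indicators. Let M denote the confusion matrix with C classes, in which the element $M_{ij}$ is the number of samples of the i-th class that are classified as the j-th class. The expressions of OA, AA, and κ are given as follows:

$$OA = \frac{1}{r} \sum_{i=1}^{C} M_{ii}, \quad AA = \frac{1}{C} \sum_{i=1}^{C} \frac{M_{ii}}{r_i}, \quad \kappa = \frac{r \sum_{i=1}^{C} M_{ii} - \sum_{i=1}^{C} \left( \sum_{j=1}^{C} M_{ij} \right) \left( \sum_{j=1}^{C} M_{ji} \right)}{r^2 - \sum_{i=1}^{C} \left( \sum_{j=1}^{C} M_{ij} \right) \left( \sum_{j=1}^{C} M_{ji} \right)},$$

where r is the number of all testing samples and $r_i$ is the number of testing samples in the i-th class.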
The experimental results are calculated by averaging the values obtained after ten Monte Carlo runs.
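The three indicators can be computed from a confusion matrix as in the following NumPy sketch, which uses the standard definitions of OA, AA, and the kappa coefficient. The function name and the toy confusion matrix are hypothetical and are not results from the paper.

```python
import numpy as np

def accuracy_metrics(M):
    """Return (OA, AA, kappa) for a confusion matrix M with true classes in rows."""
    M = np.asarray(M, dtype=float)
    r = M.sum()                       # number of all testing samples
    row = M.sum(axis=1)               # r_i: testing samples in each true class
    col = M.sum(axis=0)               # samples assigned to each class
    oa = np.trace(M) / r
    aa = np.mean(np.diag(M) / row)
    pe = (row * col).sum() / r ** 2   # agreement expected by chance
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa

# Toy two-class confusion matrix (illustrative numbers only)
M = [[50, 0],
     [10, 40]]
oa, aa, kappa = accuracy_metrics(M)   # 0.9, 0.9, 0.8
```

For the Monte Carlo protocol described above, the same function would be applied to the confusion matrix of each run and the resulting values averaged.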

The Number of Superpixels
Different numbers of superpixels are used in the proposed method to study their influence on HSI classification accuracy. Figure 5 shows the OA values obtained by the proposed method on the two datasets. The classification performance is poor when the number of superpixels is extremely small or extremely large. When the number of superpixels is extremely small, each superpixel is very large and may contain pixels from different materials, extending beyond a homogeneous region. Conversely, when the number of superpixels is extremely large, each superpixel is very small, so the spatial constraint weakens and the classification accuracy decreases. In the experiments, the proposed method achieves better classification performance when the number of superpixels is in the range [200, 500] for the Indian Pines dataset and [600, 1600] for the University of Pavia dataset, with optimal numbers of superpixels of 300 and 800, respectively.


Impact of Parameter λ
Figure 6 plots the OA results as a function of the parameter λ from Equation (9) on the Indian Pines and University of Pavia datasets. The best classification performance is obtained when the value of λ is in the range [0.0001, 0.001]. The OA value drops rapidly once λ exceeds 0.001. This is because the low rank constraint becomes stronger when a large value is set for λ. Such a strong low rank constraint also weakens the similarity term in the first half of Equation (9) and forces pixels belonging to different categories to be classified into the same category. In the experiments, the value of λ is set to 0.0001.

Classification Results on AVIRIS Indian Pines Dataset
Figure 8 shows the classification results using different methods on the Indian Pines dataset. The corresponding OA, AA, and kappa coefficient are included in Table 1. The classification accuracy of the SVM classifier is much lower when using fewer training samples. The accuracy of the LRR classifier is much higher than that of the SVM classifier, which demonstrates that the LRR classifier can ensure better classification accuracy with fewer training samples. In the SVMCK method, a square window is used to select the homogeneous region, so the classification accuracy is not satisfactory. The SMLR_SpTV method uses the MRF regularization term of the TV first-order neighborhood system to describe the spatial information. Although the effect is good at the edge regions in the image, the classification accuracy within small regions is very low. Compared with SVMCK, the SPCK method, which uses superpixels to select homogeneous regions, improves the classification accuracy of the edge pixels significantly. SCMK utilizes the multiple kernel technique to improve the classification accuracy further. MKL_LRR is a multiple kernel learning-based low rank representation method, which has a higher classification accuracy in small object areas compared with the SCMK method. Sp_MKL_SVM is a method combining superpixel multiple kernel learning and SVM classification, with an overall classification accuracy higher than that of the previous methods. The proposed Sp_MKL_LRR method provides the highest classification accuracy, especially for small objects, because it integrates the advantages of the superpixel kernel, multiple kernel learning, and low rank representation in HSI classification.


Classification Results on ROSIS University of Pavia Dataset
In this experiment, the proposed method is evaluated on the ROSIS University of Pavia dataset and compared with the state-of-the-art methods mentioned above. Figure 9 shows the classification results of the different methods on the ROSIS University of Pavia dataset. The corresponding OA, AA, and kappa coefficient are listed in Table 2. As concluded previously, the proposed Sp_MKL_LRR classifier achieves the highest accuracy among all the classifiers. The results also show that the proposed method obtains better classification performance on irregularly shaped regions by using the superpixel kernel method. A kernel-based low rank classifier can also obtain better classification results on small object areas with fewer training samples. Meanwhile, multiple kernel learning overcomes the single feature scale issue and the difficult parameter determination of the kernel methods. All these advantages lead to the proposed method achieving the highest classification accuracy among all the compared classifiers.

Discussion
Airborne and space-borne hyperspectral sensors collect data in hundreds of adjacent narrow spectral bands. The differences in the spectral features of materials are of great significance for discriminating between them. In the last decade, several HSI classification methods have been proposed to improve classification performance. In this paper, we proposed a novel superpixel kernel learning-based low rank representation method for HSI classification. In this study, we found that integrating spatial information into the classification process yields better results than methods that ignore spatial information, and that superpixels are an effective way to introduce spatial information. Kernel-based methods make linearly non-separable high-dimensional data linearly separable by mapping the data into a higher dimensional nonlinear feature space. Thus, kernel-based methods are able to improve HSI classification accuracy further. Compared with single kernel-based methods, multiple kernel-based methods are more conducive to enhancing the interpretability of decision functions and to representing the properties of the original sample space fully. In this paper, the KA criterion is applied to find the optimal kernel function, which effectively solves the problem of kernel selection. In the classifier design, we use a low rank representation classifier to carry out the HSI classification task. The experimental results on two datasets demonstrate that the classification performance of the low rank representation classifier is better than that of the SVM and MLR classifiers. Moreover, the low rank classifier is less demanding than the other classifiers in terms of the number of training samples.
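The idea of selecting a kernel by an alignment criterion can be illustrated with the following minimal sketch. It assumes the common kernel-target alignment definition (the normalized Frobenius inner product between a candidate kernel matrix and an ideal same-class indicator matrix) and a plain Gaussian kernel on raw feature vectors. The exact KA criterion and the superpixel kernel of the proposed method are not reproduced here; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def rbf_kernel(x, sigma):
    """Gaussian kernel matrix for the rows of x with bandwidth sigma."""
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def kernel_alignment(k1, k2):
    """Normalized Frobenius inner product between two kernel matrices."""
    return np.sum(k1 * k2) / (np.linalg.norm(k1) * np.linalg.norm(k2))

def select_bandwidth(x, labels, sigmas):
    """Pick the candidate bandwidth whose kernel best aligns with the labels."""
    # Ideal kernel (assumption): 1 for same-class pairs, 0 otherwise.
    ideal = (labels[:, None] == labels[None, :]).astype(float)
    scores = [kernel_alignment(rbf_kernel(x, s), ideal) for s in sigmas]
    return sigmas[int(np.argmax(scores))], scores

# Two synthetic, well-separated classes (made-up data).
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.3, (20, 5)), rng.normal(3.0, 0.3, (20, 5))])
labels = np.repeat([0, 1], 20)
best_sigma, scores = select_bandwidth(x, labels, [0.01, 1.0, 100.0])
```

A very small bandwidth makes the kernel close to the identity and a very large one makes it close to a constant matrix, so both align poorly with the class structure. The intermediate candidate is selected in this toy case.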
There are three parameters in the proposed Sp_MKL_LRR method. The first is the number of superpixels. We find that the classification accuracy is not satisfactory when the number of superpixels is either extremely large or extremely small. The spatial constraint is weakened when there are too many superpixels, and the purity of a single superpixel is reduced when there are too few. From the experimental results, we conclude that the choice of the superpixel number is related to the size and content complexity of the HSI. A number of superpixels between 0.3% and 0.5% of the image size delivers good classification performance. We also suggest reducing the number of superpixels if the content of the HSI is relatively simple and increasing it if the content is quite complex. The second parameter is λ in the low rank representation. This parameter balances the class discrimination ability and the low rank constraint. We suggest choosing λ in the range [0.0001, 0.001] when using the proposed KLRR method presented in Equation (7). The third parameter is the number of training samples. The experimental results show that the proposed method is not demanding with respect to the number of training samples: using 15% of the samples in each class for training is sufficient to obtain an outstanding classification result. This demonstrates that the low rank representation-based classifier is robust to the number of training samples. Based on the above analysis and discussion, future work will focus on multi-scale superpixel fusion for HSI classification, automatic parameter selection in the LRR classifier, and high-performance computing. We will continue to improve the efficiency of the proposed method to meet the practical demands of massive hyperspectral imagery.
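As a worked example of the superpixel guideline, the sketch below converts the 0.3% to 0.5% range into superpixel counts. It assumes that "image size" means the number of pixels and uses the commonly cited spatial sizes of the two scenes (145 × 145 for Indian Pines and 610 × 340 for University of Pavia), which are not stated in this section. The helper name is hypothetical.

```python
def superpixel_count_range(height, width, low=0.003, high=0.005):
    """Return the (min, max) superpixel counts for a given image size.

    Assumption: the 0.3%-0.5% guideline refers to the pixel count.
    """
    n_pixels = height * width
    return round(n_pixels * low), round(n_pixels * high)

# Commonly cited scene sizes (assumed, not taken from this section).
indian_pines = superpixel_count_range(145, 145)
pavia_university = superpixel_count_range(610, 340)
```

Under these assumptions the guideline corresponds to roughly 63 to 105 superpixels for Indian Pines and roughly 622 to 1037 for University of Pavia.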

Conclusions
A hyperspectral classification method based on a superpixel kernel, multiple kernel learning, and low rank representation is proposed. With this method, we first generate superpixels from the dimensionality reduction results of the hyperspectral image to select homogeneous regions. Second, following the multiple kernel learning framework, an optimal superpixel kernel function is selected from the features of the superpixel kernel matrix. Finally, the optimal superpixel kernel and the low rank representation classifier are integrated to carry out HSI classification. The proposed method is applied to the Indian Pines and University of Pavia datasets. The OA, AA, and kappa coefficient obtained on the two datasets are 0.9685, 0.9560, 0.9641 and 0.9391, 0.9093, 0.9192, respectively. Compared with the SVM classifier, the OA, AA, and kappa coefficient obtained by the proposed method improved by 16%, 27%, 20% on the Indian Pines dataset and by 14%, 15%, 17% on the University of Pavia dataset. Compared with the LRR classifier, the OA, AA, and kappa coefficient obtained by the proposed method improved by 14%, 15%, 17% on the Indian Pines dataset and by 16%, 8%, 21% on the University of Pavia dataset. Compared with the other state-of-the-art methods, the OA, AA, and kappa coefficient obtained by the proposed method improved by 5-11%, 5-16%, 7-13% on the Indian Pines dataset and by 5-10%, −0.1-6%, 6-13% on the University of Pavia dataset. These results demonstrate the superiority of the proposed method in HSI classification. In addition, the proposed method obtains higher classification accuracy under a variety of conditions, such as fewer training samples, small object areas, and irregular regions.

Figure 1. The workflow of the proposed Sp_MKL_LRR method.
Assuming x_i represents the i-th sample in the image and

Figure 2. The superpixel segmentation of the Indian Pines dataset.

Figure 3. (a) False color map and (b) ground truth of the Indian Pines dataset.

Figure 4. (a) False color map and (b) ground truth of the University of Pavia dataset.

Figure 5. Classification performances from different numbers of superpixels for (a) the Indian Pines and (b) University of Pavia datasets.

Impact of Parameter λ
Figure 6 plots the OA results as a function of the parameter λ from Equation (7) for the Indian Pines and University of Pavia datasets. The best classification performance is obtained when λ lies in the range [0.0001, 0.001]. The OA drops rapidly once λ exceeds 0.001. This is because the low rank constraint becomes stronger when λ is set to a large value. A large λ also affects the similarity in the first half of Equation (9), so that such a strong low rank constraint forces pixels belonging to different categories into the same category. In the experiments, λ is set to 0.0001.
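The effect of a stronger low rank constraint can be illustrated with singular value thresholding, the proximal operator of the nuclear norm that is widely used inside low rank solvers. The sketch below is an illustration only: it assumes that the weight on the low rank term acts like the threshold `tau`, and it does not reproduce Equation (7) or the solver used in the experiments. The matrix is random, made-up data.

```python
import numpy as np

def singular_value_threshold(m, tau):
    """Shrink every singular value of m by tau and clip at zero."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of u by the shrunk singular values, then recombine.
    return (u * s_shrunk) @ vt

rng = np.random.default_rng(0)
m = rng.normal(size=(30, 30))
# Rank of the thresholded matrix for increasingly strong thresholds.
ranks = [int(np.linalg.matrix_rank(singular_value_threshold(m, tau)))
         for tau in (0.0, 3.0, 6.0, 20.0)]
```

The rank never increases as the threshold grows, and a sufficiently large threshold collapses the matrix to zero. In a representation matrix, this kind of over-shrinkage removes the structure that separates classes, which is consistent with the drop in OA described above.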

Figure 6. Impact of the low rank constraint parameter λ.

Figure 7 shows the classification accuracy of the proposed method and of the superpixel multiple kernel learning-based SVM classifier (Sp_MKL_SVM) for different numbers of training samples. The Sp_MKL_SVM classifier is obtained by replacing the low rank representation classifier in Sp_MKL_LRR with an SVM classifier. These results show that the classification accuracy of the Sp_MKL_SVM method depends more heavily on the number of training samples. Compared with Sp_MKL_SVM, the proposed method offers better classification accuracy with fewer training samples. The classification accuracy becomes more stable when more than 3% and 15% of the samples are selected as training samples from the Indian Pines and University of Pavia datasets, respectively. The comparison shows that the low rank representation method obtains better classification results when the training set is small.
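A per-class training split of the kind varied in this experiment can be sketched as follows. The sketch assumes that a fixed fraction of the labeled samples is drawn at random from every class, with at least one sample per class. The function name, the seed handling, and the toy label vector are hypothetical, and the sampling protocol actually used in the experiments may differ.

```python
import numpy as np

def sample_per_class(labels, fraction, seed=0):
    """Split sample indices into train/test with a fixed fraction per class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        # Keep at least one training sample even for very small classes.
        n_train = max(1, int(round(fraction * idx.size)))
        train.append(rng.choice(idx, size=n_train, replace=False))
    train = np.sort(np.concatenate(train))
    test = np.setdiff1d(np.arange(labels.size), train)
    return train, test

# Toy label vector with imbalanced classes (made-up sizes).
labels = np.repeat([0, 1, 2], [200, 100, 20])
train_idx, test_idx = sample_per_class(labels, 0.03)
```

With a 3% fraction, the smallest toy class contributes a single training sample, which shows why small classes are the hardest case when the training set shrinks.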

Figure 7. Impact of the number of training samples. (a) Indian Pines dataset and (b) University of Pavia dataset.

Table 1. The classification results on the Indian Pines dataset.

Table 2. The classification results on the University of Pavia dataset.