Multiscale Superpixel-Based Sparse Representation for Hyperspectral Image Classification

Recently, superpixel segmentation has been proven to be a powerful tool for hyperspectral image (HSI) classification. Nonetheless, the selection of the optimal superpixel size is a nontrivial task. In addition, compared with single-scale superpixel segmentation, segmenting the same image on different scales yields different structural information. To overcome this drawback while exploiting such structural information, a multiscale superpixel-based sparse representation (MSSR) algorithm for HSI classification is proposed. Specifically, a modified multiscale superpixel segmentation strategy is first applied to the HSI. Once the superpixels on different scales are obtained, joint sparse representation classification is used to classify the multiscale superpixels. Furthermore, majority voting is utilized to fuse the labels of the different-scale superpixels and to obtain the final classification result. Two merits are realized by the MSSR. First, multiscale information fusion can more effectively explore the spatial information of the HSI. Second, in the multiscale superpixel segmentation, except for the first scale, the superpixel number on each scale can be adaptively changed for different HSI datasets based on the spatial complexity of the corresponding HSI. Experiments on four real HSI datasets demonstrate the qualitative and quantitative superiority of the proposed MSSR algorithm over several well-known classifiers.


Introduction
A hyperspectral sensor can capture hundreds of narrow contiguous spectral bands from the visible to the infrared spectrum for each image pixel. Therefore, hyperspectral images contain rich spectral-spatial information, which has attracted great attention in different application domains, such as national defense [1], urban planning [2], precision agriculture [3,4] and environment monitoring [5][6][7].
In the last few decades, HSI classification has been an important issue in remote sensing. In earlier research, different pixel-wise approaches were developed [8][9][10][11]. However, without considering the spatial information, the classification results obtained by these approaches usually contain much noise. To further improve the classification performance, methods incorporating the spatial information of the HSI have been proposed recently. In these methods, pixels in a small region are assumed to belong to the same material and to have similar spectral properties. Various texture feature extraction methods used for traditional two-dimensional images have been extended to the HSI to improve the classification performance, such as the Gabor filter [12], the local binary pattern (LBP) filter [13], the edge-preserving filter (EPF) [14], the two-dimensional Gaussian derivative (GD) filter [15] and the extended morphological profiles (EMPs) filter [16,17]. In addition, to exploit nonlinear information, kernel techniques have been widely used for the HSI. For instance, the generalized composite kernel [18], graph kernel [19], spatial-spectral derivative-aided kernel [20] and probabilistic kernel [21] have been introduced to HSI classification. Furthermore, an HSI can be regarded as "cube" data; in this case, tensor-based classification methods have been used for the HSI [22][23][24]. Moreover, with the rise of deep learning, spectral-spatial information-based deep learning algorithms [25][26][27] have also been applied to HSI classification, which can extract latent and invariant features of the HSI.
In light of the operating mechanism of human vision, the key information of a natural image can be captured by learning a sparse coding. The sparse representation technique has been developed according to this theory. In the last few years, this technique has been extensively employed in the computer vision domain, such as face recognition [28], feature matching [29,30] and image fusion [31]. In these different signal processing tasks, state-of-the-art performance is usually obtained. Recently, sparse representation classification (SRC) has also attracted much attention for the classification of the HSI [27,[32][33][34][35]. The SRC assumes that a test pixel can be approximately represented by a linear combination of all training samples. The class label of the test pixel is determined by the class that leads to the minimum reconstruction error. For the pixel-wise SRC, the HSI classification result usually appears very noisy. To gain better classification accuracies, Chen et al. [36] proposed a joint sparse representation classification (JSRC) for the HSI. The JSRC assumes that pixels in a fixed window belong to the same class and thus can be simultaneously represented by a set of common atoms of a training dictionary. In the past few years, several modified versions of the JSRC have been proposed [20,[37][38][39][40][41]. Although these approaches obtain improved performance, the neighborhood of the test pixel is a fixed square window. That is, if a test pixel is located at an image edge or in a detailed region, its neighborhood may contain pixels from different classes, and the classification results are usually unsatisfactory. Therefore, to solve the aforementioned problem, the shape of the regions should be adaptively changed according to the varying spatial structure of the HSI.
In the image processing field, various superpixel segmentation methods have been widely used [42][43][44]. The superpixel has also been introduced for HSI classification in recent years [45][46][47][48][49]. Each superpixel of the image is a segmentation region adapted to the spatial structure. Therefore, it can exploit the spatial information more effectively than a fixed window centered at the test pixel. Meanwhile, developing classification methods in a superpixel-by-superpixel manner has lower computational complexity than pixel-wise approaches. However, for single-scale superpixel-based algorithms, the accuracy of the superpixel segmentation directly affects the final results [45,46,50]. Therefore, the choice of the superpixel size is important, yet choosing the optimal superpixel size is not a trivial task: a small size may not capture enough information, while a large size may result in segmentation errors. In fact, for single-scale superpixel segmentation, some mixed superpixels consisting of pixels from different classes will still exist in the segmented image. In addition, for the same region of an image, different structural information can be explored by segmenting superpixels on different scales. For this reason, multiscale superpixel-based methods have been used for feature representation, target detection and recognition in some very recent works [50][51][52]. For different applications, the superpixel information of different scales is usually integrated via different strategies, such as adopting the similarity between a pixel and the average of pixels within the superpixel [50], converting the problem to a sparse constraint problem [51] and utilizing a convolutional neural network (CNN) [52]. These methods can effectively integrate multiscale information to obtain the optimal result. In this paper, a modified segmentation strategy of multiscale superpixels is proposed, in which the number of superpixels on each scale is related to the complexity of the first principal component of the HSI. Adopting this segmentation strategy, a multiscale superpixel-based sparse representation (MSSR) algorithm is proposed.
The rest of this paper is organized as follows. In Section 2, the JSRC algorithm for HSI classification is briefly introduced. The proposed MSSR for HSI classification is detailed in Section 3. In Section 4, the experimental results and discussions are given. Finally, in Section 5, the paper is summarized, and future works are suggested.

JSRC Algorithm for HSI Classification
For HSI, pixels in a fixed window are assumed to come from the same ground material and to share similar spectral characteristics. According to sparse representation theory, the correlations among the pixels within the window can be captured by a joint sparsity regularization. Specifically, denote one pixel of an HSI with $B$ bands as $\mathbf{y}_c \in \mathbb{R}^B$, and collect the pixels in the window centered at $\mathbf{y}_c$ into the matrix $\mathbf{Y}_c = [\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_S] \in \mathbb{R}^{B \times S}$. Let $\mathbf{D} = [\mathbf{D}_1, \mathbf{D}_2, \dots, \mathbf{D}_M] \in \mathbb{R}^{B \times T}$ be the structure dictionary built from $T$ training samples of $M$ distinct classes, where the $\mathbf{D}_j$ are class-wise sub-dictionaries. Let $t_j$ be the number of training samples from the $j$-th class, so that $\sum_{j=1}^{M} t_j = T$. Then, the pixels $\mathbf{Y}_c$ in the window can be represented as

$$\mathbf{Y}_c = \mathbf{D}\mathbf{A} + \mathbf{N}, \tag{1}$$

where $\mathbf{N}$ is the possible noise and $\mathbf{A} \in \mathbb{R}^{T \times S}$ is the sparse coefficient matrix. According to the JSRC algorithm, the sparse regularization places an $\ell_{\mathrm{row},0}$-norm on the sparse matrix $\mathbf{A}$, which selects a small number of the most representative nonzero rows of $\mathbf{A}$. The joint sparse matrix $\mathbf{A}$ can be obtained by solving the following optimization problem:

$$\hat{\mathbf{A}} = \arg\min_{\mathbf{A}} \|\mathbf{Y}_c - \mathbf{D}\mathbf{A}\|_F \quad \text{s.t.} \quad \|\mathbf{A}\|_{\mathrm{row},0} \leq K, \tag{2}$$

where $K$ represents the sparsity level. The simultaneous orthogonal matching pursuit (SOMP) algorithm [53] can efficiently solve (2). After the sparse coefficient matrix $\hat{\mathbf{A}}$ is obtained, the class label of $\mathbf{Y}_c$ is determined by the minimum residual error:

$$\mathrm{class}(\mathbf{Y}_c) = \arg\min_{j=1,\dots,M} r_j(\mathbf{Y}_c), \tag{3}$$

where $r_j(\mathbf{Y}_c) = \|\mathbf{Y}_c - \mathbf{D}_j \hat{\mathbf{A}}_j\|_F$ is the corresponding reconstruction residual error of the $j$-th class and $\hat{\mathbf{A}}_j$ is the sub-matrix of $\hat{\mathbf{A}}$ associated with $\mathbf{D}_j$.
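As a concrete illustration, the JSRC pipeline above can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: `somp` is a simplified simultaneous OMP that greedily selects K atoms, and `jsrc_label` applies the minimum-residual rule of Equation (3); the function names and toy dictionary layout are our own.

```python
import numpy as np

def somp(Y, D, K):
    """Simultaneous OMP: greedily pick K dictionary atoms that jointly
    represent every column (pixel) of Y."""
    residual = Y.copy()
    selected = []
    for _ in range(K):
        # Atom with the largest total correlation with the joint residual
        corr = np.abs(D.T @ residual).sum(axis=1)
        corr[selected] = -np.inf          # never reselect an atom
        selected.append(int(np.argmax(corr)))
        Dk = D[:, selected]
        # Least-squares coefficients over the selected atoms
        A_sel, *_ = np.linalg.lstsq(Dk, Y, rcond=None)
        residual = Y - Dk @ A_sel
    A = np.zeros((D.shape[1], Y.shape[1]))
    A[selected, :] = A_sel
    return A

def jsrc_label(Y, D, class_index, K=3):
    """Classify the pixel block Y by the class with the minimum
    reconstruction residual, as in Equation (3)."""
    A = somp(Y, D, K)
    classes = np.unique(class_index)
    residuals = []
    for j in classes:
        mask = class_index == j
        residuals.append(np.linalg.norm(Y - D[:, mask] @ A[mask, :], 'fro'))
    return int(classes[int(np.argmin(residuals))])
```

In practice the dictionary columns are the (normalized) training spectra, grouped class by class, and `Y` holds the pixels of one window or superpixel.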

Proposed MSSR for HSI Classification
Compared with the fixed-shape neighborhood in the JSRC method, a superpixel is an adaptive spatial region, which is beneficial for obtaining better classification performance [45][46][47]. However, as previously mentioned, it is difficult to determine the optimal superpixel size. Meanwhile, the land covers in an HSI have very complex structures with different sizes. Therefore, a multiscale superpixel-based approach is applied in the MSSR algorithm, which can more effectively exploit the spatial information of the HSI. The proposed MSSR algorithm for HSI classification consists of three parts: (1) the generation of multiscale superpixels in the HSI; (2) the sparse representation for the HSI with multiscale superpixels; (3) the fusion of the multiscale classification results. The algorithmic schematic is demonstrated in Figure 1, and the detailed description is given below.

Generation of Multiscale Superpixels in HSI
As shown in Figure 1, to reduce the computational cost, the principal component analysis (PCA) algorithm [54] is first applied to the original HSI. Since the first principal component contains the major information of the HSI, we denote it as the fundamental image. Then, multiscale superpixel segmentation is applied to the fundamental image. For the multiscale superpixel segmentation, let $F$ represent the fundamental image, let $S_n$ denote the number of superpixels on the $n$-th scale, and let $Y_k^n$ represent the $k$-th superpixel on the $n$-th scale. Then, the fundamental image $F$ can be described as

$$F = \bigcup_{k=1}^{S_n} Y_k^n, \quad Y_i^n \cap Y_j^n = \emptyset \ (i \neq j), \quad n = -N, \dots, N. \tag{4}$$

In terms of Equation (4), the total number of superpixel scales is $(2N + 1)$.
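The extraction of the fundamental image can be sketched as below. This is a generic PCA-via-eigendecomposition sketch with a helper name of our own, not the authors' code:

```python
import numpy as np

def fundamental_image(hsi):
    """First principal component of an HSI cube (H x W x B), used as
    the base image for superpixel segmentation."""
    H, W, B = hsi.shape
    X = hsi.reshape(-1, B).astype(float)
    X -= X.mean(axis=0)                   # center each spectral band
    # Eigenvector of the band covariance with the largest eigenvalue
    cov = X.T @ X / (X.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)
    pc1 = X @ vecs[:, -1]                 # eigh sorts eigenvalues ascending
    return pc1.reshape(H, W)
```

For the hundreds of bands in a real HSI, a truncated SVD of the centered data matrix is the usual, cheaper route to the same first component.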
In general, the more complicated the structure of the fundamental image is, the greater the number of segmented superpixels should be. Therefore, in the MSSR algorithm, we connect the number of superpixels on the $n$-th scale, $S_n$, with the complexity of the fundamental image. To be specific, the Canny operator [55] is applied to the fundamental image $F$ to obtain the corresponding edge image. The edge ratio $C$ [56], which is the proportion of nonzero pixels among the total pixels of the edge image, reflects the complexity of the fundamental image. Then, $S_n$ is defined as

$$S_n = 2^{nC} \times S_f, \quad n = -N, \dots, N, \tag{5}$$

where $S_f$ is the fundamental number of superpixels, which is empirically selected. In general, the more complicated the fundamental image is, the larger the value of $S_f$ should be. In addition, in terms of Equation (5), when the fundamental image is more complicated, the number of superpixels on the same scale is also larger. Therefore, the step length between consecutive superpixel scales is related to the complexity of the fundamental image. At the same time, the variation range of the multiscale superpixel numbers is also connected with the image complexity. It should be noted that other advanced methods depicting image complexity might be applied to enhance the performance, but that would increase the computational cost [57,58].
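The scale schedule can be sketched as follows. Two assumptions are flagged here: a plain gradient-magnitude threshold stands in for the Canny detector, and the closed form $S_n = 2^{nC} S_f$ is our reading of Equation (5) as restored above (it satisfies $S_0 = S_f$ and makes both the step and the range grow with $C$), so treat it as an assumption rather than the authors' exact formula.

```python
import numpy as np

def edge_ratio(F, threshold=0.1):
    """Fraction of edge pixels in the fundamental image F.  A simple
    gradient-magnitude threshold stands in for the Canny detector."""
    gy, gx = np.gradient(F.astype(float))
    mag = np.hypot(gx, gy)
    edges = mag > threshold * mag.max() if mag.max() > 0 else mag > 0
    return edges.mean()

def superpixel_numbers(S_f, C, N):
    """Superpixel count per scale, assuming Equation (5) has the form
    S_n = 2^(n*C) * S_f for n = -N, ..., N (our reading)."""
    return {n: int(round(2 ** (n * C) * S_f)) for n in range(-N, N + 1)}
```

With $S_f = 3200$, $N = 3$ and a moderately complex image ($C \approx 0.5$), this yields seven scales whose counts spread geometrically around 3200.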
According to the number of superpixels on the $n$-th scale, $S_n$, a graph-based segmentation algorithm is used to generate the $n$-th scale superpixel segmentation result. Graph-based image segmentation algorithms are widely used in superpixel segmentation [59][60][61]. Among these, the entropy rate superpixel (ERS) [61] segmentation method has been demonstrated to be very efficient. Specifically, the fundamental image $F$ is first mapped to a graph $G = (V, E)$, where $V$ is the vertex set denoting pixels of the fundamental image and $E$ is the edge set representing the pairwise similarities given in the form of a similarity matrix. In the ERS, for the $n$-th scale of superpixel segmentation, the graph is partitioned into $S_n$ connected subgraphs by choosing a subset of edges $A_n \subseteq E$. To obtain compact and homogeneous superpixels, an entropy rate term $H_n(A_n)$ is adopted. Meanwhile, a balancing term $B_n(A_n)$ is utilized to encourage superpixels of similar sizes. Therefore, the objective function of the ERS method is given by

$$\max_{A_n} \ H_n(A_n) + \omega_n B_n(A_n) \quad \text{s.t.} \quad A_n \subseteq E, \tag{6}$$

where $\omega_n \geq 0$ is the weight of the balancing term. As described in [62], a greedy algorithm effectively solves the optimization problem in (6). After multiscale superpixel segmentation, for each test pixel there are $(2N + 1)$ corresponding superpixels that contain the test pixel. Then, the spatial information of each superpixel will be combined with the spectral information of the pixels within the superpixel for HSI classification. Therefore, for each test pixel, there will be $(2N + 1)$ classification results.

Sparse Representation for HSI with Multiscale Superpixels
The multiscale superpixel segmentation results are combined with the original HSI to obtain a group of HSIs marked with multiscale superpixels. Therefore, there are $(2N + 1)$ different marked regions corresponding to each test pixel in the HSI. Pixels within each region are supposed to have similar spectral characteristics. Hence, these pixels are simultaneously represented by a few common atoms from the structure dictionary. Assume the superpixel $Y_k^n$ contains $p$ spectral pixels, i.e., $\mathbf{Y}_k^n = [\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_p] \in \mathbb{R}^{B \times p}$. The joint sparse matrix $\mathbf{A}_k^n$ can be obtained by applying (2):

$$\hat{\mathbf{A}}_k^n = \arg\min_{\mathbf{A}_k^n} \|\mathbf{Y}_k^n - \mathbf{D}\mathbf{A}_k^n\|_F \quad \text{s.t.} \quad \|\mathbf{A}_k^n\|_{\mathrm{row},0} \leq K. \tag{7}$$

The reconstruction residual error of each class can be described as

$$r_j(\mathbf{Y}_k^n) = \|\mathbf{Y}_k^n - \mathbf{D}_j \hat{\mathbf{A}}_{k,j}^n\|_F, \quad j = 1, 2, \dots, M, \tag{8}$$

where $\hat{\mathbf{A}}_{k,j}^n$ is the component of $\hat{\mathbf{A}}_k^n$ corresponding to the sub-dictionary $\mathbf{D}_j$. The class label of $Y_k^n$ is represented as

$$\mathrm{class}(\mathbf{Y}_k^n) = \arg\min_{j=1,\dots,M} r_j(\mathbf{Y}_k^n). \tag{9}$$
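The superpixel-wise labeling across scales can be sketched as below. `classify_block` is a placeholder for the joint sparse classifier of Section 2 (any callable mapping a B x p block of spectra to a label works), and the helper name is our own:

```python
import numpy as np

def classify_scales(hsi, superpixel_maps, classify_block):
    """For each scale n, assign every pixel the label of its enclosing
    superpixel, producing one label map per scale.  `superpixel_maps`
    holds one integer segmentation map (H x W) per scale."""
    H, W, B = hsi.shape
    X = hsi.reshape(-1, B).T                    # spectra as columns
    label_maps = []
    for seg in superpixel_maps:                 # one map per scale
        flat = seg.reshape(-1)
        labels = np.empty(flat.shape, dtype=int)
        for k in np.unique(flat):
            idx = np.where(flat == k)[0]
            # All pixels of superpixel k share one joint label
            labels[idx] = classify_block(X[:, idx])
        label_maps.append(labels.reshape(H, W))
    return label_maps
```

Because each superpixel is classified once rather than pixel by pixel, this loop also realizes the speed advantage mentioned in the Introduction.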

Fusion of Multiscale Classification Results
For each test pixel, the class labels of the corresponding multiscale superpixels may be different. That is, there are $(2N + 1)$ classification results for an HSI. For these multiscale classification results, a quick and effective decision fusion strategy, majority voting, is utilized to obtain the final classification result. Specifically, assume the class labels of a test pixel under the different scales are $l_1, l_2, \dots, l_{2N+1}$, respectively. We count the number of occurrences of each class and denote them as $v_1, v_2, \dots, v_M$. The class label of the test pixel can then be obtained by

$$l = \arg\max_{j=1,\dots,M} v_j. \tag{10}$$
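Majority voting over the $2N+1$ label maps can be implemented in a few lines; this is a generic sketch in which ties are broken in favor of the smallest class index:

```python
import numpy as np

def majority_vote(label_maps):
    """Fuse per-scale label maps pixel-wise: each pixel takes the class
    that occurs most often among its 2N+1 scale labels."""
    stack = np.stack(label_maps)                  # (scales, H, W)
    n_classes = stack.max() + 1
    # votes[c, i, j] = number of scales assigning class c to pixel (i, j)
    votes = np.stack([(stack == c).sum(axis=0) for c in range(n_classes)])
    return votes.argmax(axis=0)
```
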

Experimental Results and Discussion
In this section, the effectiveness of the proposed MSSR algorithm is tested on the classification of four hyperspectral datasets, i.e., the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) Indian Pines image, the Reflective Optics System Imaging Spectrometer (ROSIS-03) University of Pavia image, the AVIRIS Salinas image and the Hyperspectral Digital Image Collection Experiment (HYDICE) Washington DC image. The performance of the proposed MSSR algorithm is compared with those of seven competing classification algorithms, i.e., SVM [8], EMP [16], SRC [36], JSRC [36], multiscale adaptive sparse representation (MASR) [63], superpixel-based classification via multiple kernels (SCMK) [46] and the superpixel-based discriminative sparse model (SBDSM) [64]. The EMP, JSRC, MASR, SCMK, SBDSM and MSSR algorithms take advantage of spectral-spatial information for HSI classification, while the SVM and SRC algorithms exploit only the spectral information. It should be noted that, for the SBDSM algorithm, the sparse dictionary is built by directly extracting pixels from the HSI. Therefore, compared with the MSSR algorithm, the SBDSM algorithm is based on single-scale superpixels and sparse representation. The Washington DC image was recorded by the HYDICE sensor over the Washington DC Mall. The image consists of 280 × 307 pixels, each pixel including 210 spectral bands. The spectral coverage ranges from 0.4 to 2.5 µm, and the spatial resolution of the image is 3 m per pixel. In the experiments, bands ranging from 0.9 to 1.4 µm, where the atmosphere is opaque, are discarded from the dataset, leaving 191 bands. Figure 5 demonstrates the false-color composite of the Washington DC image and the corresponding reference data, which consider six classes of interest.

Comparison of Results
In the experiments, the SVM algorithm adopting a spectral Gaussian kernel is implemented with the LIBSVM [65] package, which is accelerated with Visual C++ software (Version 6.0). The parameters C and σ of the SVM are obtained by ten-fold cross-validation. For the EMP algorithm, the parameters of feature extraction are set to the defaults in [16]. Once these morphological features are acquired, an SVM classifier is applied for the HSI classification. For the MASR and SCMK algorithms, the parameters are set the same as in [63] and [46], respectively. The parameters for the SRC and JSRC algorithms are tuned to reach the best results in these experiments. For the MSSR algorithm, the fundamental number of superpixels $S_f$ is set to 3200, 3200, 1600 and 12,800 for the Indian Pines, University of Pavia, Salinas and Washington DC images, respectively. The number of scales for the four images is set to 7, 7, 5 and 11, respectively. For the SBDSM algorithm, the number of superpixels is obtained by applying Equation (5), in which the fundamental numbers of superpixels for the four images are the same as in the MSSR algorithm and the power exponent n is set to zero. In the following subsection, the parameters of the proposed MSSR algorithm and of the SBDSM algorithm will be further analyzed. In addition, the different algorithms are compared based on the overall accuracy (OA), average accuracy (AA) and kappa coefficient. These quantitative values of each algorithm were averaged over ten runs to diminish possible bias.
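For reference, OA, AA and the kappa coefficient all follow directly from the confusion matrix; the small sketch below (a helper of our own, assuming integer labels 0..n_classes-1 and every class present in the test set) shows the standard definitions:

```python
import numpy as np

def accuracy_metrics(y_true, y_pred, n_classes):
    """Overall accuracy, average (per-class) accuracy and Cohen's kappa
    computed from the confusion matrix."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    total = cm.sum()
    oa = np.trace(cm) / total                     # fraction correct
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))    # mean per-class recall
    # Chance agreement from the row and column marginals
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```
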
In the experiment on the Indian Pines image, 10% of the labeled samples of each class are randomly selected as the training set and the remainder as the test set (see Table 1). Figures 6 and 7, respectively, show the superpixel segmentation maps and classification maps under different single scales. In the two figures, the number of single-scale superpixels is obtained by using Equation (5), in which the fundamental superpixel number is set to 3200 and the power exponent n is an integer varying from −3 to 3. Table 2 lists the quantitative values under different scales. Obviously, different scales yield different performances for different classes. For example, when the power exponent is set to −3, the OA value is minimal. However, at this scale, the optimal classification performances are obtained for the sixth, eighth, ninth and thirteenth classes. Conversely, although the optimal OA is acquired when the power exponent n is zero, the classification accuracy of the third class at this scale is minimal. These results demonstrate that, for HSI classification, a single optimal scale is not suitable for all spatial structure regions; multiscale information fusion may be a better approach. In addition, the classification maps from the compared classifiers are illustrated in Figure 8, and the quantitative results are tabulated in Table 3. As can be seen, the SVM and SRC algorithms, which only consider the spectral information, deliver classification maps with much noise. Meanwhile, the spectral-spatial-based classification algorithms (EMP, JSRC, MASR, SCMK, SBDSM and MSSR) significantly outperform the pixel-wise algorithms. Compared with the SBDSM algorithm, the MSSR algorithm achieves more accurate estimations in the detailed areas. This result indicates that the multiscale strategy can overcome the problem of the non-uniformity of the spatial structure. At the same time, the problem caused by the mixed superpixels remaining in single-scale superpixel
segmentation can be well resolved. Moreover, in terms of OA, AA and the kappa coefficient, the proposed MSSR algorithm also outperforms the other compared algorithms.
In the experiment on the University of Pavia image, we randomly select 1% of the labeled pixels of each class as training samples and the rest as testing samples (see Table 4). Superpixel segmentation maps and classification maps under different scales are shown in Figures 9 and 10, and the quantitative results are given in Table 5. As can be seen from Figures 9 and 10 and Table 5, with the increase of the superpixel number, more details are presented in the segmentation maps and the classification maps. In this case, for the regions containing an abundance of details, the classification performances are improved. For example, the regions of the fourth and ninth classes are small, and their classification accuracies improve with the increase of the superpixel number. Meanwhile, the classification accuracy of the seventh class is always 100%. The reason is that the region of this class is relatively smooth, so the superpixel-based classification method can obtain a good classification result. In addition, the classification maps and quantitative results from the different classifiers on the University of Pavia image are shown in Figure 11 and Table 6. As shown there, the proposed MSSR algorithm achieves competitive results in terms of visual quality and quantitative metrics. Moreover, the MSSR algorithm obtains higher classification accuracies than the SBDSM algorithm for almost all classes. For some classes, the increase is obvious. For instance, in Table 6, the classification accuracy of pixels representing gravel climbs from 78.19% to 98.78%. These results demonstrate the superiority of multiple scales over a single scale, which allows pixels in detailed or near-edge regions to be classified more accurately.
The third and fourth experiments are conducted on the Salinas and Washington DC images. For the Salinas image, 0.2% of the labeled data are randomly chosen for training, and the remaining 99.8% for testing (see Table 7). Given the very small proportion of training samples, this experiment is quite challenging. For the Washington DC image, 2% of the labeled pixels are selected as the training set and the remaining 98% as the testing set (see Table 8). For the Salinas image, Figures 12 and 13 respectively show the superpixel segmentation maps and classification maps under different scales. The corresponding quantitative values of the classification results are tabulated in Table 9. For the Washington DC image, the superpixel segmentation maps and classification maps under different scales are illustrated in Figures 14 and 15, and Table 10 shows the classification accuracies under different scales. As can be seen, for the Salinas image, although the proportion of training samples is very small, a high OA is acquired under all scales. Meanwhile, some classes, such as the first, third and ninth classes, obtain 100% classification accuracy. This is because the Salinas image has many large homogeneous regions, which make superpixel segmentation easier. Obviously, segmentation errors will appear when the number of superpixels is too small. In this case, the classification performance of some classes with small regions deteriorates sharply, such as the eleventh and twelfth classes. The Washington DC image, in contrast, consists of many heterogeneous regions. Therefore, the optimal number of segmented superpixels is large, and the OA value is relatively steady near the optimal scale. The qualitative and quantitative results from the different algorithms on the Salinas and Washington DC images are shown in Figures 16 and 17
and Tables 11 and 12. We can observe that the proposed MSSR algorithm is usually superior to the other classifiers on the two datasets. Especially for the Salinas image, compared with the other methods, the superpixel-based sparse representation algorithms greatly improve the classification accuracy under the condition of very limited training samples.
Table 9. Classification accuracy of the Salinas image under different scales. The number of single-scale superpixels is obtained by using Equation (5), in which the fundamental superpixel number is set to 1600 and the power exponent n is an integer varying from −2 to 2. Class-specific accuracy values are in percentage. The best results are highlighted in bold typeface.
Table 10. Classification accuracy of the Washington DC image under different scales. The number of single-scale superpixels is obtained by using Equation (5), in which the fundamental superpixel number is set to 12,800 and the power exponent n is an integer varying from −5 to 5. Class-specific accuracy values are in percentage. The best results are highlighted in bold typeface.
The average running times over ten realizations of the proposed MSSR algorithm and the other algorithms are given in Table 13. We implemented these experiments using MATLAB on a computer with an Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60 GHz and 96 GB of RAM. As can be seen, in the case of a comparatively large proportion of training samples, the SBDSM algorithm consumes much less execution time than the other algorithms. This demonstrates the high efficiency of the superpixel-based sparse classification strategy. However, on account of the multiscale information procedure, the MSSR algorithm consumes more computation time. Meanwhile, under the condition of fewer training samples, the SBDSM algorithm has no advantage in computation speed. The main reason is that the computational cost of the training processes for the SVM, EMP and SCMK algorithms is then significantly reduced. In addition, for the multiscale-based sparse algorithms, the time consumption of the MSSR algorithm is decreased compared with the MASR method, which illustrates the effectiveness of the superpixel-based strategy. Moreover, the running time is expected to be further reduced by adopting a general-purpose graphics processing unit.

Effect of Superpixel Scale Selection
We first analyze the effect of the number of single-scale superpixels. In this analysis, the training and testing sets for the Indian Pines, University of Pavia, Salinas and Washington DC images are the same as in the aforementioned comparison experiments. The average results over ten runs are reported. Figure 18 shows the OA values under different fundamental superpixel numbers. The number of single-scale superpixels is obtained by applying Equation (5), in which the power exponent n is set to zero. In these experiments, the fundamental superpixel number $S_f$ is varied from 400 to 51,200. Because the range of $S_f$ is large, we adopt a log scale to represent it. The log values of 400, 800, 1600, 3200, 6400, 12,800, 25,600 and 51,200 are 2.6, 2.9, 3.2, 3.5, 3.8, 4.1, 4.4 and 4.7, respectively. As shown in Figure 18, with the increase of the fundamental superpixel number, the OA values of the four images first increase and then decrease. This demonstrates that the classification accuracy deteriorates at too small or too large a superpixel scale. Moreover, in Figure 18, the highest OA values of the four images are obtained when $S_f$ reaches 3200, 3200, 1600 and 12,800 for the Indian Pines, University of Pavia, Salinas and Washington DC images, respectively. This result illustrates that the classification accuracy is closely related to the complexity of the image. Compared with the other three images, the Washington DC image needs the most superpixels to realize the optimal classification performance, although its size is relatively small. The main reason is that the spatial structure of this image is comparatively complicated. In addition, the OA values remain high over a large dynamic range. For example, when the fundamental superpixel number is between 6400 and 51,200, the OA values of the Washington DC image are always over 90%. Therefore, multiscale superpixel information can be used for HSI classification. Figure 19 illustrates the relationship among the OA value, the number of multiscale superpixels and the fundamental superpixel number $S_f$. As before, $S_f$ in Figure 19 is shown on a log scale. The training sets for the four images are the same as before. The number of superpixel scales is an odd number rising from 3 to 15. In these experiments, five contiguous fundamental superpixel numbers from the previous experiment are selected, of which the third corresponds to the optimal OA in the previous experiment. As can be seen, for the Indian Pines and University of Pavia images, the optimal accuracies are obtained when the fundamental superpixel number is set to 3200 and the number of scales is set to 7. For the Salinas image, when the fundamental superpixel number reaches 1600 and the number of scales reaches 5, the optimal OA is acquired. For the Washington DC image, when the fundamental superpixel number is 12,800 and the number of scales is 11, the best classification performance is obtained. Among these four hyperspectral images, the Salinas image contains many homogeneous regions, while the Washington DC image has a large number of heterogeneous regions. The experimental results show that, to obtain the best classification performance, a relatively complex image requires more scales and more superpixels at each scale.

Comparison of Different Superpixel Segmentation Methods
In this section, we compare the performance of the adopted ERS algorithm with that of two competing superpixel segmentation algorithms, i.e., the Felzenszwalb-Huttenlocher (FH) algorithm [59] and the simple linear iterative clustering (SLIC) algorithm [42]. The Indian Pines image is used in the comparison, with the same training and testing sets as before. In the ERS algorithm, the fundamental superpixel number is 3200 and the power exponent n in Equation (5) is an integer ranging from −3 to 3. In the FH algorithm, multiscale superpixels are generated with various scale and smoothing parameters, σ and k_S, where σ is the Gaussian smoothing parameter and k_S controls the region size. In the SLIC algorithm, the number of multiscale superpixels is obtained by presetting the number of superpixel segments. In the two comparison experiments, the superpixel number on each scale approximately matches that generated by the ERS algorithm. The qualitative and quantitative results under different scales are shown in Figures 7, 22 and 23, and in Tables 2, 14 and 15. In addition, the three over-segmentation methods are applied in the proposed MSSR algorithm, yielding the MSSR_ERS, MSSR_FH and MSSR_SLIC algorithms, respectively. The classification accuracies and maps obtained by these algorithms are shown in Table 16 and Figure 24. As can be seen from Figures 6, 20 and 21, the FH algorithm produces the most irregular segmentation shapes, since it imposes no explicit constraint on boundary length. The SLIC algorithm yields segmentation regions of similar size by setting a uniform grid spacing, while the ERS algorithm employs a balancing term to encourage superpixels of similar size. In terms of single-scale classification performance, the OA values first increase and then decrease as the superpixel number grows. When the superpixel number is too small, classes with few pixels, such as the seventh class, are completely misclassified by the FH and SLIC algorithms; this misclassification is induced by erroneous segmentation. Conversely, when the superpixel number is too large, the small segmented regions deteriorate the classification performance, since they lack sufficient spatial information. The three superpixel segmentation methods consume almost the same computational time in the proposed MSSR algorithm. The classification results show that the ERS-based classification algorithm outperforms the other two over-segmentation-based classification algorithms.
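According to the MSSR pipeline, the labels obtained on the different superpixel scales are fused by per-pixel majority voting. A minimal NumPy sketch of such label fusion (function and variable names are illustrative, not taken from the paper's implementation):

```python
import numpy as np

def majority_vote(label_maps):
    """Fuse per-scale label maps (a list of equally shaped integer arrays)
    by taking, for each pixel, the most frequent label across scales."""
    stacked = np.stack(label_maps, axis=0)          # (n_scales, H, W)
    n_classes = int(stacked.max()) + 1
    # count the votes each class receives along the scale axis
    votes = np.stack([(stacked == c).sum(axis=0) for c in range(n_classes)])
    return votes.argmax(axis=0)                     # (H, W) fused label map

# toy example: three 2x2 label maps from three scales
maps = [np.array([[1, 2], [0, 2]]),
        np.array([[1, 2], [1, 1]]),
        np.array([[1, 0], [0, 2]])]
fused = majority_vote(maps)
# fused == [[1, 2], [0, 2]]
```

On ties, `argmax` returns the smallest class index; the paper does not specify its tie-breaking rule, so this behavior is an assumption of the sketch.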
The structured dictionary D = [D_1, D_2, . . . , D_M] contains T training samples from M distinct classes, where D_j (j = 1, 2, . . . , M) are the class-wise sub-dictionaries.
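A minimal sketch of assembling such a structured dictionary from labeled training pixels, assuming unit-norm atoms as is usual in sparse-representation classification (the function name and toy data are illustrative):

```python
import numpy as np

def build_structured_dictionary(samples, labels):
    """Stack class-wise sub-dictionaries D_j into D = [D_1, ..., D_M].

    samples: (B, T) array, one spectral vector (B bands) per training pixel.
    labels:  (T,) class indices.
    Columns (atoms) are l2-normalised.
    """
    classes = np.unique(labels)
    subdicts = []
    for j in classes:
        D_j = samples[:, labels == j].astype(float)
        D_j /= np.linalg.norm(D_j, axis=0, keepdims=True)  # unit-norm atoms
        subdicts.append(D_j)
    return np.hstack(subdicts), classes

# toy example: 5 bands, 8 training pixels, 3 classes
rng = np.random.default_rng(0)
X = rng.random((5, 8))
y = np.array([1, 1, 2, 2, 2, 3, 3, 3])
D, classes = build_structured_dictionary(X, y)  # D has shape (5, 8)
```

Grouping the atoms by class in this way is what lets the classifier later measure the class-wise residual of a test pixel against each sub-dictionary D_j.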

4.1. Datasets Description
The Indian Pines image was acquired by the AVIRIS sensor over the agricultural Indian Pines site in northwestern Indiana. The size of this image is 145 × 145 × 220, of which 20 water absorption bands are discarded. The spatial resolution of the image is 20 m per pixel and the spectral coverage ranges from 0.4 to 2.5 µm. The reference data of this image contain sixteen classes, most of which are different kinds of crops. Figure 2 shows the false-color composite of the Indian Pines image and the corresponding reference data.

Figure 3. University of Pavia image: (a) false-color image; and (b) reference image.

Figure 6. Superpixel segmentation results of the Indian Pines image under different scales. The number of single-scale superpixels is obtained by using Equation (5), in which the fundamental superpixel number is set to 3200 and the power exponent n is an integer ranging from −3 to 3: (a) n = −3; (b) n = −2; (c) n = −1; (d) n = 0; (e) n = 1; (f) n = 2; and (g) n = 3.
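Assuming Equation (5) has the common dyadic form S_n = S_f · 2^n, which is consistent with the fundamental number 3200 and exponents −3 to 3 described in this caption, the per-scale superpixel numbers can be computed as:

```python
def multiscale_superpixel_counts(fundamental, exponents):
    """Per-scale superpixel number S_n = S_f * 2**n (assumed form of Equation (5))."""
    return [round(fundamental * 2.0 ** n) for n in exponents]

counts = multiscale_superpixel_counts(3200, range(-3, 4))
# S_f = 3200, n = -3..3 -> [400, 800, 1600, 3200, 6400, 12800, 25600]
```

Each halving of n merges superpixels into coarser regions, which is how the seven panels of Figure 6 span roughly two orders of magnitude in region size.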

Figure 9. Superpixel segmentation results of the University of Pavia image under different scales. The number of single-scale superpixels is obtained by using Equation (5), in which the fundamental superpixel number is set to 3200 and the power exponent n is an integer ranging from −3 to 3: (a) n = −3; (b) n = −2; (c) n = −1; (d) n = 0; (e) n = 1; (f) n = 2; and (g) n = 3.

Figure 12. Superpixel segmentation results of the Salinas image under different scales. The number of single-scale superpixels is obtained by using Equation (5), in which the fundamental superpixel number is set to 1600 and the power exponent n is an integer ranging from −2 to 2: (a) n = −2; (b) n = −1; (c) n = 0; (d) n = 1; and (e) n = 2.

Figure 18. Classification accuracy OA versus different fundamental superpixel numbers S_f on the four test images.

Figure 19. Relationship among the OA value, the number of multiscale superpixels and the fundamental superpixel number S_f: (a) Indian Pines image; (b) University of Pavia image; (c) Salinas image; and (d) Washington DC image.
Figures 6, 20 and 21 illustrate the superpixel segmentation results under different scales obtained by the ERS, FH and SLIC algorithms, respectively.

Figure 25. Effect of the number of training samples on SVM, EMP, SRC, JSRC, MASR, SCMK, SBDSM and MSSR for the: (a) Indian Pines image; (b) University of Pavia image; (c) Salinas image; and (d) Washington DC image.

Table 1. Number of training and test samples of sixteen classes in the Indian Pines image.

Table 2. Classification accuracy of the Indian Pines image under different scales. The number of single-scale superpixels is obtained by using Equation (5), in which the fundamental superpixel number is set to 3200 and the power exponent n is an integer ranging from −3 to 3. Class-specific accuracy values are in percentage. The best results are highlighted in bold typeface. AA, average accuracy.

Table 3. Classification accuracy of the Indian Pines image by the classification algorithms used in this work for comparison. Class-specific accuracy values are in percentage. The best results are highlighted in bold typeface. EMP, extended morphological profile; JSRC, joint sparse representation classification; MASR, multiscale adaptive sparse representation; SCMK, superpixel-based classification via multiple kernel; SBDSM, superpixel-based discriminative sparse model; MSSR, multiscale superpixel-based sparse representation.

Table 4. Number of training and test samples of nine classes in the University of Pavia image.

Table 5. Classification accuracy of the University of Pavia image under different scales. The number of single-scale superpixels is obtained by using Equation (5), in which the fundamental superpixel number is set to 3200 and the power exponent n is an integer ranging from −3 to 3. Class-specific accuracy values are in percentage. The best results are highlighted in bold typeface.

Table 6. Classification accuracy of the University of Pavia image by the classification algorithms used in this work for comparison. Class-specific accuracy values are in percentage. The best results are highlighted in bold typeface.

Table 7. Number of training and test samples of sixteen classes in the Salinas image.

Table 8. Number of training and test samples of six classes in the Washington DC image.

Table 11. Classification accuracy of the Salinas image by the classification algorithms used in this work for comparison. Class-specific accuracy values are in percentage. The best results are highlighted in bold typeface.

Table 12. Classification accuracy of the Washington DC image by the classification algorithms used in this work for comparison. Class-specific accuracy values are in percentage. The best results are highlighted in bold typeface.

Table 13. Average running time (seconds) over ten realizations for the classification of the Indian Pines, University of Pavia, Salinas and Washington DC images by the algorithms used in this work.

Table 14. Classification accuracy of the Indian Pines image under different scales. The FH segmentation method is applied in the SBDSM algorithm. Multiscale superpixels are generated with various scale and smoothing parameters, σ and k_S. Class-specific accuracy values are in percentage. The best results are highlighted in bold typeface.

Table 15. Classification accuracy of the Indian Pines image under different scales. The SLIC segmentation method is applied in the SBDSM algorithm. The number of multiscale superpixels is obtained by presetting the number of superpixel segmentations n_S. The preset superpixel numbers per scale are n_S = 147, 206, 288, 415, 562, 780 and 1055. Class-specific accuracy values are in percentage. The best results are highlighted in bold typeface.

Table 16. Classification accuracy of the Indian Pines image by applying the MSSR_FH, MSSR_SLIC and MSSR_ERS algorithms. Class-specific accuracy values are in percentage. The best results are highlighted in bold typeface.