Hyperspectral Imagery Classification Based on Multiscale Superpixel-Level Constraint Representation

Abstract: Sparse representation (SR)-based models have been widely applied to hyperspectral image classification. In our previously established constraint representation (CR) model, we exploited the underlying significance of the sparse coefficient and proposed the participation degree (PD) to represent the contribution of each training sample in representing the testing pixel. However, the spatial variants of the original residual error-driven frameworks often face obstacles to optimization because of their strong constraints. In this paper, based on the object-based image classification (OBIC) framework, we first propose a spectral-spatial classification method called superpixel-level constraint representation (SPCR). It first computes the PD with respect to the sparse coefficient from the CR model. It then transforms the individual PD into a united activity degree (UAD)-driven mechanism via a spatial constraint generated by the superpixel segmentation algorithm. The final classification is determined by the UAD-driven mechanism. Because SPCR is sensitive to the segmentation scale, an improved multiscale superpixel-level constraint representation (MSPCR) is further proposed through decision fusion of SPCR at different scales. The SPCR method is first performed at each scale, and the final category of the testing pixel is the label predicted most often among the classification results at each scale. Experimental results on four real hyperspectral datasets, including a GF-5 satellite dataset, verified the efficiency and practicability of the two proposed methods.


Introduction
Hyperspectral remote sensing is a leading technology developed from remote sensing (RS) in the field of Earth observation, which accesses multidimensional information by combining imaging technology and spectral technology [1,2]. A hyperspectral image (HSI) can be viewed as a data cube with a diagnostic continuous spectrum that provides abundant spectral-spatial information, and different substances usually exhibit diverse spectral curves [3,4]. Because of its ability to characterize and discriminate ground objects, HSI has become an indispensable technology in a wide range of applications, such as civil construction and military fields [5,6]. As one of the popular applications in remote sensing, HSI classification (HSIC) uses a mapping function to assign each pixel a class label based on its spectral characteristics and spatial information [7][8][9]. A large number of HSIC methods have been proposed, and they fall mainly into two groups. The first classifies on the basis of spectral information and focuses on the study of spectral features and spectral classifiers, such as support vector machines (SVM) and the maximum likelihood classifier (MLC). The second extracts spatial features to assist the discrimination, for example, SVM-based Markov Random Field (SVM-MRF) and some segmentation-based classification frameworks [10][11][12][13]. However, because of the high dimensionality of HSI, high correlation and redundancy are found in both the spectral and spatial domains. It can therefore be inferred that HSI is mainly low-rank and can be represented sparsely, even though the original HSI is not sparse [14,15].
In this context, sparse representation (SR)-based methods have been widely applied to HSIC and have achieved state-of-the-art performance [16]. The classic SR-based classifier (SRC) uses as few samples as possible to represent the testing pixel [17]. Concretely, SRC first constructs a dictionary from labeled samples of the different classes, and then represents the testing pixel by means of a linear combination of the dictionary and a weight coefficient under a sparse constraint. After the approximation of the testing pixel is obtained, classification is realized by determining which class yields the smallest reconstruction error [18]. However, this residual error-driven mechanism ignores, to a certain extent, the underlying significance and properties of the sparse coefficient. The sparse coefficient plays a decisive role in the constraint representation (CR) model, in which the category of the testing pixel is determined by the maximum participation degree (PD), where PD is the contribution of labeled samples from each class in representing the testing pixel. The CR model makes full use of the sparsity principle to handle the sparse coefficient, and achieves an effect equivalent to, and simpler than, the classic SRC. As powerful pattern recognition tools, both the SRC model and the CR model are effective representation-based models, and they generate rather accurate results compared with SVM and some other spectral classifiers [19].
However, because the sparse coefficients are susceptible to spectral variability, several joint representation (JR)-based frameworks that consider local spatial consistency have appeared, such as the joint sparse representation-based classifier (JSRC) and the joint collaborative representation-based classifier (JCRC) [20,21]. Similarly, based on the concept of PD and the PD-driven decision mechanism, adjacent CR (ACR) utilizes the PD of adjacent pixels as class-dependent constraints to classify the testing pixel. Although ACR imposes no strong constraint in comparison with the JSRC model, it defines the adjacent pixels within a fixed window and therefore does not consider the correlation of ground objects. Therefore, in order to better characterize the image for classification, it is reasonable to utilize various features from the spectral and spatial domains of the image [22,23].
Object-based image classification (OBIC) is a widely adopted classification framework with spatial discriminant characteristics. OBIC usually performs classification after segmentation [24]. Segmentation divides an image into several non-overlapping homogeneous regions according to agreed similarity criteria. Some segmentation algorithms, such as partitioned clustering and watershed segmentation, have proven effective on HSI [25][26][27][28]. In particular, the combination of vector quantization clustering methods with representation-based methods has shown good classification performance in the related literature [29]. Therefore, OBIC is a well-established framework that can be widely applied to HSIC tasks.
In this paper, a superpixel-level constraint representation (SPCR) model is proposed, which adds a spatial constraint, obtained by simple linear iterative clustering (SLIC) superpixel segmentation, to the CR model [30]. Unlike the ACR model, the proposed SPCR method extracts the spectral-spatial information of the pixels inside the superpixel block, preserves most of the edge information in the image, and estimates the real distribution of ground objects [31]. In general, the SPCR model utilizes the spectral features of adjacent pixels and transforms the individual PD into a united activity degree (UAD) through a relaxed and adaptive constraint. As shown on the right side of Figure 1, the decision mechanism of the SPCR model classifies the testing pixel into the category with the maximum UAD. However, like most OBIC-based methods, constrained representation classification with a single fixed scale needs to find the optimal scale. To address this issue, it is necessary to propose a multiscale OBIC framework that comprehensively utilizes image information [32]. As illustrated in Figure 1, we propose an improved version of the above SPCR model, called the multiscale superpixel-level constraint representation (MSPCR) method.
Remote Sens. 2020, 12, 3342
The MSPCR merges the classification maps generated by SPCR at different superpixel segmentation scales, which is implemented in three steps: (1) a segmentation step, in which the processed hyperspectral image is segmented into superpixel images at different scales by the SLIC algorithm; (2) a classification step, in which the PD of the pixels inside the superpixel is utilized to shape the class-dependent constraint of the testing pixel; and (3) a decision fusion step, in which the final classification map of MSPCR is obtained through a decision fusion process based on the classification result of SPCR at each scale.
As mentioned above, the CR model classifies the testing pixel based on the PD-driven decision mechanism and obtains reliable performance with relatively low computational time. Considering the influence of spectral variability, the ACR model adopts the PD of adjacent pixels to obtain the category of the testing pixel. However, ACR only regards the pixels within a fixed window as adjacent pixels and does not consider the correlation of ground objects. To address this issue, the SPCR model is first established by joining the CR model with the SLIC superpixel segmentation algorithm. Then, the MSPCR approach is proposed to alleviate the impact of the segmentation scale on the classification result of the SPCR method, and it obtains high accuracies. Experimental results on four real hyperspectral datasets, including a GF-5 satellite dataset, are used to evaluate the classification performance of the proposed SPCR and MSPCR methods.
The rest of this paper is organized as follows. Section 2 reviews the related models, including classic representation-based classification methods and the superpixel segmentation algorithm used in this paper, i.e., SLIC. Section 3 presents our proposed methods: it first introduces the CR method and the ACR classifier, and then presents the SPCR model and the MSPCR method proposed in this paper. Section 4 evaluates the classification performance of our proposed methods and other related methods through experiments on three real hyperspectral datasets. Section 5 presents a practical application and analysis of our proposed methods through an experiment on a GF-5 satellite dataset. Section 6 concludes this paper with some remarks.

Related Methods
In this section, we introduce several related methods of our framework. The classic sparse representation (SR)-based model and the joint representation (JR)-based framework are firstly reviewed in Section 2.1. Then the simple linear iterative clustering (SLIC) is presented in Section 2.2.

Representation-Based Classification Methods
Consider a testing pixel $x_{i,j} \in X$ at location $(i, j)$ of an HSI $X$ that contains $B$ spectral bands and $N = r \times c$ pixels, where $r$ and $c$ are the numbers of rows and columns of the scene. The dictionary is denoted as $D = (D_1, \dots, D_K) \in X$, in which the columns of each $D_k$ are the samples selected from class $k \in [1, K]$ ($K$ is the number of classes).

Sparse Representation-Based Model
Since pixels in HSI can be represented sparsely, representation-based methods have been widely applied to process HSI because they make no assumption about the data density distribution [33]. The SRC is a classic SR-based model that implements classification in several steps. Firstly, it constructs a dictionary from the available labeled samples and then represents the testing pixel by a sparse linear combination of the dictionary. Moreover, in order to use as few labeled samples as possible to represent the testing pixel, the weight coefficients used in the representation are sparsely constrained. Finally, the classification is conducted by a residual error-driven decision mechanism, which assigns the testing pixel to the class with the minimum class-dependent residual error using the following formula: where $\|\alpha_{i,j}\|_1 = \sum_{m=1}^{n} |\alpha_m|$ denotes the $\ell_1$-norm and $\|\cdot\|_2$ is the $\ell_2$-norm. Because optimizing the $\ell_0$-norm is a combinatorial NP-hard problem, the sparse constraint on the weight coefficients $\alpha_{i,j}$ adopts the $\ell_1$-norm in place of the $\ell_0$-norm, since the $\ell_1$-norm is the closest convex function to the $\ell_0$-norm. Moreover, $\lambda$ is a scalar regularization parameter. As an indicator function, $\delta_k(\alpha_{i,j})$ assigns zero to the elements that do not belong to class $k$. The weight vector $\alpha_{i,j}$ can be optimized by the basis pursuit (BP) or basis pursuit denoising (BPDN) algorithm.
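For orientation, the residual error-driven rule described above is usually written in the following standard form. This is a sketch of the textbook SRC formulation built only from the symbols defined in this section ($x_{i,j}$, $D$, $\alpha_{i,j}$, $\lambda$, $\delta_k$). The hat on the optimized coefficient vector and the squared data-fit term are assumptions, and the typeset form in the published article may differ.

```latex
\begin{aligned}
\hat{\alpha}_{i,j} &= \arg\min_{\alpha_{i,j}} \; \|x_{i,j} - D\alpha_{i,j}\|_2^2 + \lambda \|\alpha_{i,j}\|_1, \\
\mathrm{class}(x_{i,j}) &= \arg\min_{k \in [1, K]} \; \|x_{i,j} - D\,\delta_k(\hat{\alpha}_{i,j})\|_2 .
\end{aligned}
```

The first line is the sparse coding step solved by BP or BPDN, and the second line is the class-dependent residual comparison.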

Joint Representation-Based Framework
HSIC initially focused on the spectral information because of its data characteristic, while the spatial information can be further exploited to reduce classification errors, according to the similar spectral characteristic among neighborhood pixels. As the second generation of SRC, the joint SRC (JSRC) is introduced under the JR-based framework, which has a solid classification performance after integrating spectral information with the local spatial coherence.
Based on the local spatial consistency, the fundamental assumption of JSRC is that the sparse vectors associated with the adjacent pixels share a common sparsity support [34]. In JSRC, the testing pixel and its neighboring pixels are stacked into a joint signal matrix, which is sparsely represented using the dictionary and a row-sparse coefficient matrix [35]. The final classification result of JSRC is obtained by calculating the minimum total residual error as follows: where $X_{i,j} = (x_{i-w,j-w}, \dots, x_{i,j}, \dots, x_{i+w,j+w})$ is a $w \times w$ pixel-sized square neighborhood centered on $x_{i,j}$, and $A_{i,j}$ is the corresponding coefficient matrix. $\|\cdot\|_F$ is the Frobenius norm and $\|A_{i,j}\|_{1,2} = \sum_{s=1}^{n} \|a_s\|_2$ is the $\ell_{1,2}$-norm, in which $a_s$ is the $s$-th row of $A_{i,j}$.
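The joint rule described above can be sketched in the same way. The form below is the commonly used JSRC formulation expressed with the symbols of this section. The regularization weight $\lambda$, the hat notation, and the squared Frobenius term are assumptions carried over from the SRC case rather than details confirmed by the text.

```latex
\begin{aligned}
\hat{A}_{i,j} &= \arg\min_{A_{i,j}} \; \|X_{i,j} - D A_{i,j}\|_F^2 + \lambda \|A_{i,j}\|_{1,2}, \\
\mathrm{class}(x_{i,j}) &= \arg\min_{k \in [1, K]} \; \|X_{i,j} - D\,\delta_k(\hat{A}_{i,j})\|_F .
\end{aligned}
```

The $\ell_{1,2}$ penalty encourages whole rows of $A_{i,j}$ to vanish, which is what forces the neighboring pixels to share the same dictionary atoms.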

Simple Linear Iterative Clustering
OBIC is a widely used spectral-spatial classification framework that utilizes spatial information after a segmentation step [36]. As one of the widely used segmentation methods, the SLIC algorithm identifies superpixels through over-segmentation. The idea of SLIC is to apply the K-means algorithm locally to obtain an effective cluster segmentation result. Specifically, it measures the distance from each cluster center to the pixels within a $2S \times 2S$ block, where $S = \sqrt{N/P}$. Here, $N$ is the number of pixels and $P$ is the number of cluster centers, which equals the total number of superpixels [37].
In general, the SLIC algorithm is implemented in several steps. The first step selects $P$ initial cluster centers from the original image. Each pixel is then assigned to the nearest cluster center, which forms the clusters. The iterative clustering process is repeated until the positions of the cluster centers become stable. As stated above, the original K-means algorithm calculates distances over the whole map, whereas SLIC searches only the local area of each superpixel, which greatly reduces the computational complexity. The distance in SLIC is defined as follows:
where $D_{\mathrm{spectral}}$ is a spectral distance, which ensures homogeneity inside the superpixel; the spectral distance between pixel $i$ and pixel $j$ is described as follows: where $x_{i,d}$ is the value of pixel $i$ in band $d$, and $D_{\mathrm{spatial}}$ represents the spatial distance, which controls the compactness and regularity of the superpixels; the spatial distance between pixel $i$ and pixel $j$ is defined as follows: where $(a_i, b_i)$ is the location of pixel $i$, and $m$ and $\rho$ in Equation (3) are the scale parameters of the superpixels.
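The quantities described above can be illustrated with a minimal Python sketch. It assumes that the spectral and spatial terms are plain Euclidean distances, which matches the symbol definitions but is not confirmed by the text. The way the two terms are combined is purely hypothetical: `spatial_weight` is an illustrative stand-in for the role of the scale parameters $m$ and $\rho$, and all function names are made up for this example.

```python
import math


def grid_interval(num_pixels, num_superpixels):
    """Return S = sqrt(N / P), the spacing between initial cluster centers."""
    return math.sqrt(num_pixels / num_superpixels)


def spectral_distance(spectrum_i, spectrum_j):
    """Euclidean distance over bands (assumed form of the spectral term)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(spectrum_i, spectrum_j)))


def spatial_distance(loc_i, loc_j):
    """Euclidean distance between (row, col) locations (assumed form)."""
    return math.hypot(loc_i[0] - loc_j[0], loc_i[1] - loc_j[1])


def combined_distance(spectrum_i, spectrum_j, loc_i, loc_j, spatial_weight):
    """Hypothetical combination: spectral term plus weighted spatial term.

    `spatial_weight` is an illustrative stand-in for the role played by the
    scale parameters m and rho; it is not taken from the paper.
    """
    return (spectral_distance(spectrum_i, spectrum_j)
            + spatial_weight * spatial_distance(loc_i, loc_j))


# A 145 x 145 image split into 841 superpixels gives centers 5 pixels apart.
S = grid_interval(145 * 145, 841)
d = combined_distance([0.0, 0.0], [3.0, 4.0], (0, 0), (3, 4), spatial_weight=0.5)
```

A larger spatial weight favors compact, regular superpixels, while a smaller one lets the superpixels follow spectral boundaries more closely.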

Proposed Approach
As introduced in Section 2.1, both the classic SR-based model and the JR-based variant conduct classification using the class-dependent minimum residual error between the original observation and its approximate representation. However, the residual error-based decision mechanism in the SR-based and JR-based frameworks ignores the importance of the sparse coefficients. Section 3.1 introduces the CR method and the ACR classifier, which exploit the characteristics of the sparse coefficient. After that, we present the details of SPCR and MSPCR in Section 3.2. Both methods are based on spatial correlation. Specifically, SPCR utilizes the spectral consistency among adjacent pixels, as in ACR, and MSPCR then achieves comprehensive utilization of the various regional distributions.

CR Model
According to the principle of the representation-based model, classification can be regarded as representing the testing pixel via a sparse linear combination of the labeled samples. For ease of understanding, a simple case can be assumed as in Equation (6). The testing pixel is represented by single elements with nonzero coefficients $(\alpha_p, \alpha_q, \alpha_m, \dots, \alpha_n)$ from certain classes $(k, k+1, \dots, k^* \in [1, K])$ as follows [38]: Since $\alpha_{i,j}$ is sparsely constrained, the labeled samples that contribute to representing the testing pixel are those whose coefficients are nonzero. In the representation, the larger the magnitude of a coefficient, the higher its contribution to representing the testing pixel, and the more likely the testing pixel belongs to the corresponding category. Therefore, CR directly exploits the sparse coefficient to conduct the classification, which is concise and equivalent to the residual error-driven decision mechanism. Specifically, it defines the participation degree (PD) from the perspective of the sparse coefficient, which estimates the contribution of labeled samples from each class in representing the testing pixel $x_{i,j}$. The PD of each class is calculated from the corresponding weight vector using an $\ell_d$-norm ($d = 1$ or $d = 2$) as follows: The PD-driven decision mechanism of CR assigns the category with the maximum PD, as expressed in Equation (8):
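The PD-driven decision described above can be sketched in a few lines of Python. The sketch assumes that the PD of class $k$ is the $\ell_d$-norm of the coefficients attached to the dictionary atoms of that class, as the text states, and that the sparse coefficients have already been computed by some solver. The function names, the zero-based class indices, and the toy coefficient values are illustrative assumptions.

```python
def participation_degrees(alpha, atom_classes, num_classes, d=1):
    """Return the l_d-norm of the coefficients of each class (d = 1 or 2).

    alpha        -- sparse coefficient vector, one entry per dictionary atom
    atom_classes -- class index (0-based, an assumption) of each atom
    """
    if d not in (1, 2):
        raise ValueError("d must be 1 or 2")
    sums = [0.0] * num_classes
    for coeff, k in zip(alpha, atom_classes):
        sums[k] += abs(coeff) ** d
    return [s ** (1.0 / d) for s in sums]


def classify_by_pd(alpha, atom_classes, num_classes, d=1):
    """Assign the class with the maximum participation degree."""
    pd = participation_degrees(alpha, atom_classes, num_classes, d)
    return max(range(num_classes), key=lambda k: pd[k])


# Toy coefficients: two atoms of class 0, two of class 1, one of class 2.
alpha = [0.3, 0.4, 0.0, -0.6, 0.1]
atom_classes = [0, 0, 1, 1, 2]
label_l1 = classify_by_pd(alpha, atom_classes, 3, d=1)
label_l2 = classify_by_pd(alpha, atom_classes, 3, d=2)
```

In this toy case the two norms disagree: with $d = 1$ class 0 wins (0.7 against 0.6), while with $d = 2$ class 1 wins (0.6 against 0.5), which shows that the choice of $d$ can matter when a class is supported by several small coefficients.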

ACR Model
Based on the PD-driven mechanism, an improved version, ACR, has been proposed to correct for spectral variation by imposing spatial constraints during classification. Because adjacent pixels have similar spectral characteristics, they are more likely to belong to the same class [39]. In this context, ACR achieves better classification performance than the CR model by augmenting the PD-driven mechanism with the spatial consistency of the adjacent pixels. The main principle of ACR is to use the PD of adjacent pixels as a constraint when determining the category of the testing pixel. Specifically, ACR first defines the adjacent pixels within a $w \times w$ pixel-sized window centered on the testing pixel and then constructs a $K$-dimensional PD image, in which each dimension shows the PD values of the pixels for one class. The class-dependent activity degree (CAD) of each element is obtained by normalizing the PD image in each dimension in turn, which can be expressed as follows: where $k \in [1, K]$ denotes the class index and $(i, j)$ is the location of the testing pixel. To account for the spatial constraint of the adjacent pixels, the relative activity degree (RAD) is generated by combining the CAD of the testing pixel with the inactivity degree of its adjacent pixels through a scale compensation parameter $\tau$, where the index of the adjacent pixels is $v \in [1, w^2]$. ACR uses the RAD as the final contribution degree in representing the testing pixel $x_{i,j}$, and the class of $x_{i,j}$ is determined by the maximum RAD as follows:

Superpixel-Level CR (SPCR) and Multiscale SPCR (MSPCR)

SPCR Model
As mentioned above, the ACR model defines the adjacent pixels as the pixels within a fixed-size window centered on the testing pixel. However, it does not consider the real distribution of ground objects. The superpixel block obtained by the superpixel segmentation algorithm is made up of neighboring pixels with similar spatial characteristics. By combining the CR model with the superpixel segmentation algorithm, we establish the SPCR model to further utilize the spectral consistency of this subset of adjacent pixels. In this way, the SPCR model conducts class-dependent constrained representation according to the PD of the pixels inside the superpixel block that contains the testing pixel. Compared with the sample selection in a fixed window in ACR, this preserves most of the edge information of the image and gives a more objective account of the spatial distribution around the testing pixel. As illustrated in Figure 1, the schematic diagram of the SPCR model is equivalent to MSPCR at a single segmentation scale, and it can be implemented in several steps as follows.
Firstly, we obtain superpixel blocks with the SLIC algorithm. Since SLIC can only process an image in the CIELAB color space, an HSI must be converted to a three-band image before being processed by SLIC. Therefore, the principal component analysis (PCA) method is adopted in the SPCR method to reduce the spectral dimensionality, and the first three components are selected as the input of SLIC to generate a stable superpixel segmentation result [40]. Then, the category of the testing pixel can be determined from the PD values of the pixels inside the superpixel in which the testing pixel is located. Specifically, using the PD values of the pixels at the corresponding positions of the superpixel, we build an SPD image surrounding $x_{i,j}$ with $K$ dimensions, in which each dimension shows the PD values of the pixels for one class. Similar to ACR, the normalized value of each pixel in the $k$-th SPD image is defined as the class-dependent activity degree (CAD) with regard to class $k$. In order to further utilize the correlation of ground objects, SPCR combines the CAD of $x_{i,j}$ with the CAD of the other pixels inside the superpixel through the scale compensation parameter, so that the other pixels provide a proper constraint in classifying the testing pixel $x_{i,j}$. Compared with the constraint based on local spatial information in the RAD shown in Equation (10), the united activity degree (UAD) utilizes the correlation of ground objects via a similar combination, represented as follows: where $e \in [1, l]$ indicates the element index in the superpixel block and $\gamma$ represents a scale compensation parameter. Moreover, the SPCR model classifies $x_{i,j}$ by determining which class leads to the maximum $\mathrm{UAD}_{i,j}$ as follows:

MSPCR Model
As shown above, the proposed SPCR method, based on the OBIC framework, achieves solid performance by exploiting spatial contextual information. However, because the classification results of SPCR differ across segmentation scales, superpixel segmentation-based HSI classification may not generate a comprehensive and stable result under a single fixed segmentation scale. In particular, the performance of SPCR is highly affected by the scale level [41]. To solve these problems, it is reasonable to propose a multiscale OBIC framework that comprehensively utilizes image information. In this paper, MSPCR is proposed by means of decision fusion of the classification maps obtained by the SPCR method at different segmentation scales. Compared with SPCR, the improved MSPCR not only uses multiple scales to balance the different sizes and distributions of ground objects, but also solves the problem of selecting the optimal segmentation scale.
Specifically, Figure 1 and Algorithm 1 illustrate the general schematic diagram and the pseudo-procedure of the MSPCR method, respectively. Firstly, similar to the workflow of the SPCR method, we obtain the classification results of the testing pixel at the different superpixel segmentation scales simultaneously. In this process, the superpixel blocks are generated by feeding the PCA result into the SLIC algorithm, and the testing pixel is then classified using a relaxed and adaptive constraint inside the superpixel. After the SPCR method has been performed at each scale, a decision fusion process is applied to obtain the classification result of MSPCR, in which the category $y_i$ of the testing pixel is the label assigned to $x_{i,j}$ most often among the classification results at each scale. The decision fusion process is expressed as follows: where $y_i$ denotes the final class label of $x_{i,j}$, $y_i^q$ represents the classification result of $x_{i,j}$ when the segmentation scale parameter is $q$, and mod is a modular function which assigns $y_i$ the most frequent category in [y

Algorithm 1. The proposed MSPCR method
Input: A hyperspectral image (HSI) $X$, dictionary $D$, the testing pixel $x_{i,j}$, regularization parameter $\lambda$, scale compensation parameter $\gamma$.
Step 1: Reshape $X$ into a color image by compositing the first three principal component analysis (PCA) bands.
Step 3: Obtain the participation degree (PD) image of $X$ according to Equation (7).
Step 4: Extract the superpixel centered on the testing pixel $x_{i,j}$ from the PD image of $X$ to get the multiscale SPD image.
Step 5: Class-dependent normalization at each scale according to Equation (9).
Step 7: Assign the class of $x_{i,j}$ at each scale according to Equation (12).
Step 8: Determine the final class label by the decision fusion according to Equation (13).
Output: The class labels $y$.
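The decision fusion in Step 8 amounts to a per-pixel majority vote over the labels produced at each scale. The following minimal sketch assumes that the per-scale label maps are already available as nested lists of equal shape. The function names are illustrative, and the tie-breaking rule (the label seen first wins) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter


def fuse_labels(labels_per_scale):
    """Return the most frequent label among the per-scale predictions.

    Ties are resolved in favor of the label encountered first, which is an
    assumption made for this sketch.
    """
    return Counter(labels_per_scale).most_common(1)[0][0]


def fuse_maps(label_maps):
    """Fuse several label maps (one per scale, same shape) pixel by pixel."""
    rows, cols = len(label_maps[0]), len(label_maps[0][0])
    return [
        [fuse_labels([m[r][c] for m in label_maps]) for c in range(cols)]
        for r in range(rows)
    ]


# One pixel classified at five scales: label 2 is predicted three times.
fused_pixel = fuse_labels([2, 3, 2, 1, 2])

# Three 1 x 2 label maps, one per scale.
fused_map = fuse_maps([[[1, 2]], [[1, 3]], [[0, 3]]])
```

Using an odd number of scales reduces, but does not eliminate, the chance of ties when more than two classes are involved.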

Experimental Results and Analysis
In this section, we investigate the effectiveness of the proposed SPCR and MSPCR models on three hyperspectral datasets. The detailed description of the datasets is given in Section 4.1. The parameter tuning for the proposed models and the compared methods is presented in Section 4.2. We evaluate the performance of the two proposed methods in comparison with methods in the spectral domain and the spectral-spatial domain. The classic SR-based methods, including SRC and its simplified model CR, and the classic SVM are first selected for the comparative experiments in the spectral domain. Then, the classic JR-based model JSRC, the typical model with post-processing of spatial information SVM-MRF, and the previously proposed ACR are further tested in the spectral-spatial domain. We randomly selected training samples 20 times in each experiment and calculated the overall accuracy (OA) and class-dependent accuracy (CA). We analyze the experimental results of the two proposed methods and the other related methods in Sections 4.3-4.5.
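The two accuracy measures used in the experiments can be computed as in the short sketch below. It assumes the usual definitions, which the text does not spell out: OA is the fraction of correctly labeled test pixels, and CA is that fraction computed separately for each reference class. The function names and the toy labels are illustrative.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def class_accuracies(y_true, y_pred):
    """Per-class accuracy, keyed by reference class label."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + (t == p)
    return {k: hits[k] / totals[k] for k in totals}


# Toy example with three classes and six test pixels.
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0, 2]
oa = overall_accuracy(y_true, y_pred)
ca = class_accuracies(y_true, y_pred)
```

With repeated random draws of the training samples, these values would be averaged over the runs before being reported.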

Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) Indian Pines Scene
The first dataset is the Indian Pines scene, acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over northwestern Indiana, with a spatial resolution of 20 m. The scene covers 220 spectral bands ranging from 0.4 to 2.5 µm, and the size of the image is 145 × 145 pixels. To satisfy the sparsity assumption, eight ground-truth classes with a total of 8624 labeled samples are extracted from the original sixteen-category reference data. Figure 2a,b shows the false-color composite image and the reference map of this scene, respectively.

Reflective Optics Spectrographic Imaging System (ROSIS) University of Pavia Scene
The second dataset is the University of Pavia scene, collected by the Reflective Optics Spectrographic Imaging System (ROSIS) over a downtown area near the University of Pavia in Italy, with a spatial resolution of 1.3 m. After removing 12 bands with high noise and water absorption, the scene has 103 spectral bands ranging from 0.43 to 0.86 µm, with 610 × 340 pixels. Nine ground-truth classes with a total of 42,776 labeled samples are contained in the reference data. Figure 3a,b shows the false-color composite image and the reference map of this scene, respectively.

Hyperspectral Digital Image Collection Experiment (HYDICE) Washington, DC, National Mall Scene
The third dataset is the Washington, DC, National Mall scene, captured by the Hyperspectral Digital Image Collection Experiment (HYDICE) sensor over Washington, DC, USA, with a spatial resolution of 3 m. The original scene contains 210 spectral bands ranging from 0.4 to 2.5 µm, with 280 × 307 pixels. After removing the atmospheric absorption bands from 0.9 to 1.4 µm, 191 bands remain. Six ground-truth classes with a total of 10,190 labeled samples are included in the reference data. Figure 4a,b shows the false-color composite image and the reference map of this scene, respectively.

Parameter Tuning
In our experiments, the regularization parameter λ for all SR-based models was selected from 10⁻³ to 10⁻¹. The scale compensation parameters τ and γ in the ACR and SPCR-based methods were set in a proper range according to the value of w and the number of superpixels P. Because w differs, the distribution of ground objects within the w × w pixel window centered on the testing pixel also differs. This produces a critical constraint, since the adjacent pixels inside the window are assumed to belong to the same class. Following the definition of w in [22], each scene usually has a proper w that reflects its spatial consistency, and an oversized window can degrade the result. Therefore, to obtain a high classification accuracy, we optimized the window size w for each experimental scene.
In addition, the number of superpixels P in the SPCR and MSPCR classifiers is determined by the segmentation scale S and the number of pixels N via P = √ N/S. Figures 5 and 6 illustrate how P affects the classification accuracy. Because P is fixed by S for a given image, the relationship between S and the classification accuracy follows directly from that between P and the accuracy. First, Figure 5 shows the impact of the number of superpixels on the classification accuracy (50 samples per class). We display five classes from the AVIRIS Indian Pines dataset and four from the HYDICE Washington, DC, National Mall dataset. As illustrated in Figure 5a, the optimal segmentation scale varies across classes. For example, in Figure 5b the optimal segmentation scale of class 2 differs from that of the other three classes. The relationship among the number of superpixels, the overall accuracy, and the number of labeled samples is shown in Figure 6. Generally, the overall accuracy increases with the number of labeled samples at each segmentation scale. Notably, the segmentation scale that achieves the highest classification accuracy changes with the number of labeled samples. Like most OBIC frameworks, the proposed SPCR method requires the optimal segmentation scale to be set, whereas the improved MSPCR method overcomes this drawback by fusing the spectral-spatial characteristics of the HSI at different segmentation scales.
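The multiscale fusion described above can be illustrated with a minimal sketch of per-pixel majority voting over label maps produced at several segmentation scales. The function name `fuse_label_maps`, the toy label maps, and the tie-breaking rule (smallest label wins) are assumptions made for illustration; the text does not specify how ties are resolved, and this is not the authors' implementation.

```python
import numpy as np

def fuse_label_maps(label_maps):
    """Majority-vote fusion of per-scale label maps (illustrative sketch).

    label_maps: sequence of integer arrays of identical shape, one per
    segmentation scale, holding non-negative class labels.
    Ties go to the smallest label (an assumption, not from the paper).
    """
    stacked = np.stack(label_maps, axis=0)        # (n_scales, H, W)
    n_classes = int(stacked.max()) + 1
    # votes[c] counts, for every pixel, how many scales predicted class c
    votes = np.stack(
        [(stacked == c).sum(axis=0) for c in range(n_classes)], axis=0
    )
    return votes.argmax(axis=0)                   # first maximum wins ties

# Toy example: three hypothetical SPCR label maps on a 2 x 2 image
maps = [
    np.array([[0, 1], [2, 2]]),
    np.array([[0, 1], [1, 2]]),
    np.array([[1, 1], [1, 0]]),
]
print(fuse_label_maps(maps))
```

Each pixel takes the label predicted at the largest number of scales, so a class that is poorly segmented at one scale can still be recovered if the other scales agree.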

Experiments with the AVIRIS Indian Pines Scene
In the first experiment with the AVIRIS Indian Pines hyperspectral scene, we randomly selected 90 labeled samples per class, 720 samples in total, to construct the dictionary and train the model. The selected training samples constitute approximately 8.35% of the labeled samples in the reference map, and the remaining samples are used for validation. Table 1 lists the OAs and CAs of the different methods, and Figure 7 presents the corresponding classification maps. We analyze the classification results as follows:

1. As a widely applied supervised classification framework, the SVM classifier performs reasonably well in HSI classification. However, some isolated pixels appear in its result because of noise and spectral variability, as shown in Figure 7. Compared with the SVM, the classic SRC method obtains a better classification result, which indicates that SR-based classifiers are suitable for hyperspectral image classification tasks. Compared with the SRC, the CR model obtains an approximately equivalent classification result at a lower computational cost. This result not only shows that the CR model simplifies the SRC model by dispensing with the calculation of the residual error, but also verifies the effectiveness of the PD-driven decision mechanism in HSIC.

2. In the spectral-spatial domain, as shown in Figure 7, the SVM-MRF model outperforms the SVM classifier, which demonstrates that exploiting spatial information can further improve spectral classifiers. Similarly, since the JSRC performs classification by sharing a common sparsity support among all neighborhood pixels, its overall accuracy also improves on that of SRC. Compared with the CR model, the ACR classifier obtains a significant improvement. It addresses the spectral variability problem in CR by setting a spatial constraint, and shows that changing the decision mechanism from PD-driven to RAD-driven is effective for HSIC tasks. The improvements of the SVM-MRF, JSRC, and ACR models over their original counterparts SVM, SRC, and CR confirm the effectiveness of introducing spatial information into spectral-domain classifiers.

3. From Figure 7, the JSRC has a better classification performance than the SVM-MRF on the AVIRIS Indian Pines scene. As illustrated in Table 1, the ACR classifier achieves a better classification result than JSRC and SVM-MRF: its overall accuracy is 2.38 percentage points higher than that of JSRC and 6.11 percentage points higher than that of SVM-MRF. On the one hand, the RAD-driven mechanism in ACR is more effective than the hybrid norm constraint in JSRC. On the other hand, the spatial post-processing in SVM-MRF places more emphasis on adjusting the initial classification result generated from spectral features, and lacks an effective strategy for integrating spatial information with spectral information.

4. Compared with the ACR, the proposed SPCR has a slightly higher OA. Table 1 demonstrates the effectiveness of introducing superpixel segmentation, which preserves edge information and fully considers the distribution of ground objects. The result also confirms the practicability and reliability of the sparse coefficient, which plays an important role in both the PD-driven and the UAD-driven decision mechanisms. Thus, the combination of superpixel segmentation and sparse coefficients is effective: the overall accuracy of SPCR reaches 92.90%, which is 1.66, 4.04, and 7.77 percentage points higher than that of ACR, JSRC, and SVM-MRF, respectively.

5. Compared with the SPCR, the proposed MSPCR model brings a further improvement. First, this verifies that the MSPCR performs better than the SPCR by alleviating the impact of the superpixel segmentation scale on the classification results. Second, it indicates that decision fusion takes comprehensive account of the different spatial features and distributions of the various object categories, which raises the final classification accuracy. Overall, the proposed MSPCR obtains an overall accuracy of 95.30%, which is 2.40 and 4.06 percentage points higher than that of SPCR and ACR, respectively, and 12.11 percentage points higher than that of CR. It also provides high individual class accuracies, especially for classes 2, 6, and 7. The classification maps in Figure 7 verify the improvement achieved by the MSPCR.

In the second test with the AVIRIS Indian Pines scene, we randomly selected 10 to 90 samples per class as training samples to evaluate the proposed SPCR and MSPCR. Figure 8 shows the overall classification accuracies obtained by the different methods with different numbers of labeled samples. The results can be summarized as follows:

1. The overall accuracy is positively related to the number of labeled samples: it increases as more labeled samples are used. However, this holds only up to a certain number of labeled samples, beyond which the growth levels off.

2. Integrating spatial and spectral information yields more precise classification than pixel-based classification, as verified by the improvement of SVM-MRF, JSRC, ACR, SPCR, and MSPCR over their original counterparts, i.e., SVM, SRC, and CR.

3. Compared with the traditional classifiers, the PD-driven classifiers provide better classification performance. This is confirmed by the overall accuracies of ACR and SPCR relative to JSRC and SVM-MRF, and of CR relative to SVM. Moreover, the proposed MSPCR achieves the best performance among these classifiers.
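For readers reproducing the OA and CA figures reported in this section, the following is a minimal sketch of how the two metrics are commonly computed from validation labels. It assumes OA is the fraction of correctly labeled validation pixels and CA is the per-class fraction of correctly labeled pixels among the true members of that class; the function name and the toy labels are illustrative and not taken from the paper.

```python
import numpy as np

def overall_and_class_accuracy(y_true, y_pred, n_classes):
    """Return (OA, CA) for integer labels in [0, n_classes).

    OA: fraction of validation samples whose prediction is correct.
    CA: for each class, fraction of its true samples predicted correctly
        (NaN when the class has no validation samples).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    oa = float(np.mean(y_true == y_pred))
    ca = np.full(n_classes, np.nan)
    for c in range(n_classes):
        mask = y_true == c
        if mask.any():
            ca[c] = np.mean(y_pred[mask] == c)
    return oa, ca

# Toy validation set with three classes (hypothetical values)
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0, 2]
oa, ca = overall_and_class_accuracy(y_true, y_pred, n_classes=3)
print(f"OA = {oa:.4f}")   # 4 of 6 correct
print("CA =", ca)
```

Reporting CA alongside OA matters here because the classes differ greatly in size, so a high OA can hide a weak result on a small class.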

Experiments with the ROSIS University of Pavia Scene
In the first test with the ROSIS University of Pavia scene, we selected 90 labeled samples per class, 810 samples in total (approximately 1.89% of the available labeled samples in the reference map), and used the remaining labeled samples for validation. Table 2 and Figure 9 show the OAs and CAs of the classifiers and the corresponding classification maps. The experimental results are similar to those obtained on the AVIRIS Indian Pines dataset. First, SRC and CR achieve similar classification results, which are comparable to that of the SVM in the spectral domain. In the spectral-spatial domain, SVM-MRF, JSRC, and ACR improve significantly on the SVM, SRC, and CR models by integrating spatial information. Moreover, SVM-MRF attains a better classification accuracy than JSRC, unlike on the AVIRIS Indian Pines dataset. In comparison with the ACR, the introduction of the superpixel segmentation algorithm contributes to a higher accuracy in SPCR. Last but not least, the proposed MSPCR achieves the best classification result with an overall accuracy of 96.90%, which is 3.64 and 4.71 percentage points higher than that of SPCR and ACR, respectively, and 16.7 percentage points higher than that of CR. It also brings considerable improvements in individual class accuracy, especially for class 2 and class 4, as can be seen in the classification maps in Figure 9.
Our second test on the ROSIS University of Pavia scene evaluated the proposed SPCR and MSPCR with different numbers of labeled samples (from 10 to 90 samples per class). Figure 10 shows the overall classification accuracies obtained by the different methods under different numbers of training samples. As the number of labeled samples increases, the accuracy of most of the tested methods increases. Relative to the SVM, the SRC and CR perform better at first and then worse as the number of labeled samples increases. By taking the correlation of ground objects into account, ACR and SVM-MRF improve significantly as the number of samples increases, with a higher classification accuracy than the JSRC in most cases. In addition, the combination of the PD-driven decision mechanism and the superpixel segmentation algorithm brings reliable and stable improvement, as confirmed by the overall classification accuracies obtained by the SPCR method in all cases. As shown in Figure 10, the MSPCR method achieves the best classification result among the compared methods, because decision fusion alleviates the difficulty of adapting a single fixed segmentation scale to the spatial characteristics of all categories in the image.
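The sampling protocol used in both tests (a fixed number of randomly chosen labeled samples per class for training, with the remainder held out for validation) can be sketched as follows. The function name `split_per_class`, the fixed seed, and the synthetic label vector are assumptions for illustration; the paper does not describe its random number generation or how classes with fewer samples than requested are handled.

```python
import numpy as np

def split_per_class(labels, n_per_class, seed=0):
    """Randomly pick n_per_class training indices from every class.

    labels: 1-D integer array with the class of each labeled pixel.
    Returns (train_idx, test_idx) as sorted index arrays. If a class has
    fewer than n_per_class samples, all of them go to training
    (an assumption made for this sketch).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    picked = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        k = min(n_per_class, idx.size)
        picked.append(rng.choice(idx, size=k, replace=False))
    train_idx = np.sort(np.concatenate(picked))
    test_mask = np.ones(labels.size, dtype=bool)
    test_mask[train_idx] = False
    return train_idx, np.flatnonzero(test_mask)

# Synthetic stand-in: nine classes with 100 labeled pixels each
labels = np.repeat(np.arange(9), 100)
for n in (10, 50, 90):            # a subset of the 10-to-90 sweep
    train_idx, test_idx = split_per_class(labels, n)
    print(n, train_idx.size, test_idx.size)
```

With nine classes, 90 samples per class gives the 810 training samples used in the first test; repeating the split for each training size reproduces the structure of the sweep in the second test.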

Experiments with the HYDICE Washington, DC, National Mall Scene
In our first test with the HYDICE Washington, DC, National Mall scene, we randomly selected 50 labeled samples per class, 300 samples in total, for training and dictionary construction (approximately 2.94% of the available labeled samples); the remaining samples are used for validation. Table 3 shows the OAs and CAs obtained by the different methods, and Figure 11 shows the corresponding classification maps. In the spectral domain, the traditional SRC provides a result approximately equivalent to that of CR, and both outperform the traditional SVM method, once again indicating that the sparse coefficient represents the spectral characteristics well. In the spectral-spatial domain, the SVM-MRF, ACR, and SPCR perform well relative to their original counterparts, i.e., SVM and CR. The overall accuracies of the SRC method and the JSRC model also show that an improper spatial constraint may have a negative impact on classification performance. Unlike on the above two datasets, the ACR achieves a better classification performance than the proposed SPCR method on the HYDICE Washington, DC, National Mall scene, indicating that the SPCR model is susceptible to the superpixel segmentation scale. This is our original motivation for proposing the MSPCR method, which mitigates the impact of the number of superpixels on classification by fusing the classification results at different segmentation scales. Furthermore, the proposed MSPCR method achieves the highest accuracy, 98.32%, consistent with the results on the AVIRIS Indian Pines and ROSIS University of Pavia scenes. In addition, the proposed MSPCR provides reliable classification accuracy for each individual class, especially for classes 1 and 2, as can be seen in the classification maps in Figure 11.
In our second test with the HYDICE Washington, DC, National Mall scene, we evaluated the classification performance of the proposed methods in the spectral-spatial domain with different numbers of training samples. As shown in Figure 12, the classification accuracy rises as the number of training samples increases, and the curve flattens once the number of training samples reaches a certain amount. First, in the spectral domain, the SRC and CR obtain better classification results than SVM as the number of labeled samples increases. Although JSRC obtains poorer results than SRC, the SVM-MRF, ACR, and SPCR still provide competitive classification performance relative to the SVM and CR as the number of training samples increases, which shows that integrating spectral feature discrimination with spatial coherence is a reliable processing framework for HSIC in most cases. In addition, combining the PD-driven mechanism with a spatial constraint brings further improvement, as indicated by the performance of the ACR and SPCR-based methods versus SVM-MRF and JSRC. In all cases, the proposed MSPCR yields the best overall accuracy among the spectral-spatial methods and improves significantly on the proposed SPCR.
In addition, we compared the computational cost of several spectral-spatial methods on the above three hyperspectral datasets, with the labeled-sample settings corresponding to those in Tables 1-3. As shown in Figure 13, on all three datasets the JSRC is the fastest but has the lowest classification accuracy. The proposed MSPCR achieves the best classification accuracy, but its running time is about five times that of the SPCR because of the decision fusion process. On the ROSIS University of Pavia and AVIRIS Indian Pines datasets, the SPCR is the second most accurate, with a running time approximately equivalent to that of ACR. On the HYDICE Washington, DC, National Mall dataset, the ACR achieves the second highest classification accuracy at a speed similar to that of SPCR.
Synthesizing the above experimental results and analysis, the proposed SPCR method obtains considerable overall and individual classification accuracy, and the improved MSPCR performs better than the SPCR method. Moreover, the experimental results on different datasets show that MSPCR outperforms several other related methods. Furthermore, the classification results under different numbers of training samples also indicate the superiority and practicability of the proposed SPCR and MSPCR methods.
It should also be noted that the computational cost of the proposed MSPCR is relatively high, and reducing it is part of our future work. Moreover, other potential directions, for instance, a sample selection mechanism related to the adaptive capability of the method, could be follow-up lines of research.

Practical Application and Analysis
Unlike the above three experimental datasets, the hyperspectral image data used here were collected by the GF-5 satellite and serve to assess the practicability of the proposed SPCR and MSPCR methods. GF-5 is the first hyperspectral comprehensive observation satellite of China, with a spatial resolution of 30 m. There are six payloads on GF-5, including two land imagers and four atmospheric sounders. In this paper, we select a scene from the hyperspectral image data obtained by the visible and shortwave infrared hyperspectral camera.
First, we select the visible to near-infrared range of the spectrum in the original data. After atmospheric correction and radiometric correction, the scene covers 150 spectral bands ranging from 0.4 to 2.5 µm, and the size of the image is 200 × 200 pixels. The reference data contain six ground-truth classes with a total of 2216 labeled samples. Figure 14 shows the false-color composite image and the reference map of this scene.

In the experiment with the GF-5 satellite dataset, we randomly selected five labeled samples per class, 30 samples in total, to construct the dictionary and train the model. The selected training samples constitute approximately 1.35% of the labeled samples in the reference map, and the remaining samples are used for validation. Figure 14 displays the classification maps of the different methods. We analyze the classification results as follows. Compared with the SVM-MRF and JSRC, the ACR has a better classification performance: its overall accuracy is 6.20 percentage points higher than that of JSRC and 2.00 percentage points higher than that of SVM-MRF. This confirms that the PD-driven decision mechanism plays an important role in classification. Compared with the ACR, the SPCR method obtains a better classification result, which verifies the effectiveness of integrating the PD-driven mechanism with the superpixel segmentation algorithm. The MSPCR outperforms the SPCR and yields the best accuracy among the compared methods, which not only shows that the MSPCR alleviates the impact of the superpixel segmentation scale on the classification result, but also indicates that decision fusion plays a decisive role in adapting to the different spatial characteristics of the various object categories.
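As a quick check of the training-set proportions quoted in the experiments, the short sketch below recomputes them from the sample counts stated in the text. The helper name `training_percentage` is introduced only for illustration; the HYDICE scene is omitted because its total number of labeled samples is not given in this section.

```python
def training_percentage(n_train, n_labeled):
    """Share of the labeled reference samples used for training, in percent."""
    return 100.0 * n_train / n_labeled

# (training samples, labeled samples in the reference map), from the text
settings = {
    "AVIRIS Indian Pines": (720, 8624),
    "ROSIS University of Pavia": (810, 42776),
    "GF-5": (30, 2216),
}

for name, (n_train, n_labeled) in settings.items():
    print(f"{name}: {training_percentage(n_train, n_labeled):.2f}%")
# AVIRIS Indian Pines: 8.35%
# ROSIS University of Pavia: 1.89%
# GF-5: 1.35%
```

The GF-5 experiment therefore uses the smallest absolute training set (five samples per class), although its share of the labeled samples is comparable to that of the University of Pavia test.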

Conclusions
In this paper, a novel classification framework based on sparse representation, called the superpixel-level constraint representation (SPCR), was first proposed for hyperspectral imagery classification. SPCR uses the spectral consistency of the pixels inside a superpixel to determine the category of the testing pixel. In addition, we proposed an improved multiscale superpixel-level constraint representation (MSPCR) method, which obtains the final classification result by fusing the classification maps of SPCR at different segmentation scales. The proposed SPCR method exploits the latent property of the sparse coefficient and improves the contextual constraint by taking spatial characteristics into account. Moreover, the proposed MSPCR makes comprehensive use of the various regional distributions, resulting in strong classification performance. The experimental results on four real hyperspectral datasets, including a GF-5 satellite dataset, demonstrated that the SPCR outperforms several other classification methods, and that the MSPCR yields a better classification accuracy than SPCR.