Multi-Scale Superpixels Dimension Reduction Hyperspectral Image Classification Algorithm Based on Low Rank Sparse Representation Joint Hierarchical Recursive Filtering

The original hyperspectral image (HSI) suffers from varying degrees of the Hughes phenomenon and mixed noise, leading to a decline in classification accuracy. To make full use of the spatial-spectral joint information of HSI and improve the classification accuracy, a novel dual feature extraction framework combining transform domain and spatial domain filtering, based on multi-scale superpixel dimensionality reduction (LRS-HRFMSuperPCA), is proposed. Our framework uses the low-rank structure and sparse representation of HSI to repair the parts of the original HSI left unobserved because of noise, and then denoises it with a block-matching 3D (BM3D) algorithm. Next, the dimension of the reconstructed HSI is reduced by principal component analysis (PCA), and the reduced images are segmented by multi-scale entropy rate superpixels. All the superpixel-segmented principal component images are projected onto the reconstructed HSI in parallel, and PCA is applied once again to reduce the dimension of every superpixel-segmented HSI at each scale. Moreover, hierarchical domain transform recursive filtering is utilized to obtain the feature images; ultimately, a decision fusion strategy based on a support vector machine (SVM) is used for classification. According to the overall accuracy (OA), average accuracy (AA) and Kappa coefficient on the three datasets (Indian Pines, University of Pavia and Salinas), the experimental results show that our proposed method outperforms other state-of-the-art methods. The conclusion is that LRS-HRFMSuperPCA can denoise and reconstruct the original HSI and then fully extract the joint spatial-spectral information.


Introduction
HSI uses numerous continuous narrow-band electromagnetic wave bands to image surface species and obtain rich joint spatial-spectral information. With the rapid development of HSI processing and analysis in recent years, HSI classification technology is extensively utilized in agriculture [1], environmental detection [2], marine monitoring [3], and other fields. However, owing to imaging sensor breakdown [4], environmental pollution [5], and other factors, the obtained HSI contains mixed noise [6][7][8][9] that reduces the classification accuracy. For example, Tu et al. [7] proposed a kernel entropy component analysis (KECA)-based method for noisy label detection that can remove noisy labels: on PaviaU with 50 true samples and 10 noisy labels per class, KECA obtains 81.11% OA, while with 50 true samples and 30 noisy labels per class, the OA drops to 71%. Xu et al. [8] proposed a spectral-spatial classification of HSI based on low-rank decomposition; by removing the sparse part, LRD-NWFE-GC [8] obtains 92.3% OA on Indian Pines even with a small training sample. In contrast, NWFE-GC [8], which does not remove the sparse noise, achieves only 74.53%. Therefore, HSI denoising needs to be addressed.

1.
Compared to other HSI classification algorithms, we propose to integrate a new joint feature extraction framework of spatial domain filtering and transform domain filtering (LRS-HRF) into the MSuperPCA. Firstly, the sparse features and low-rank structure of the HSI are utilized to realize transform domain filtering, which separates the mixed noise in the image and improves the PSNR of the HSI. Then, the separated feature images are filtered recursively by a multi-level domain transform based on the spatial domain. To enhance the spatial-spectral features of the feature images, the strong edges of different objects in the image are preserved and the fine texture structure is removed.

2.
The restored and denoised HSI obtained by transform domain filtering is segmented by multi-scale superpixels, and the low-dimensional characteristics of different species in different regions are fully obtained. Then, PCA is performed on each superpixel-segmented HSI at every scale, and the spatial information of each HSI is fully utilized.

3.
Compared to the state-of-the-art HSI classification algorithms, the LRS-HRFMSuperPCA algorithm achieves the highest OA, AA, and Kappa coefficient on the three real datasets.

Estimation of Noise in Hyperspectral Image
Suppose an original HSI Y_{L×Q} ≡ [y_1, y_2, ..., y_Q], y_i ∈ R^L, where Q is the number of pixels in each band and L is the number of bands. The high correlation between neighboring spectral bands and the spatial correlation within each band are the reasons underlying the good performance of linear regression theory in hyperspectral applications [37,38].
Calculate the autocorrelation matrix R̂ := YY^T and its inverse R′ := R̂^{−1}. Traversing all bands, we assume that z_i is explained by a linear combination of the remaining L − 1 bands, where z_i contains the data read by the hyperspectral sensor at the i-th band for all image pixels and Z = Y^T, so we have [39]

z_i = Z_∂i β_i + ξ_i,

where Z_∂i = [z_1, ..., z_{i−1}, z_{i+1}, ..., z_L] denotes the data from the other band images after removing the i-th band, β_i is the regression vector of size (L − 1) × 1, and ξ_i is the noise vector of size Q × 1. The least squares estimator of the regression vector β_i is given by [39]

β̂_i = (Z_∂i^T Z_∂i)^{−1} Z_∂i^T z_i.

In order to reduce the computational complexity, define the pseudoinverse Z_∂i^#, of size (L − 1) × (L − 1). Let the L × L symmetric and positive definite matrices R̂ and R̂^{−1} be partitioned into block matrices

R̂ = [A  b; b^T  c],   R̂^{−1} = [A′  b′; b′^T  c′],

where A and A′ are (L − 1) × (L − 1) matrices, b and b′ are (L − 1) × 1 vectors, and c and c′ are scalars. Because they are symmetric and positive definite, the inverse of the autocorrelation matrix obtained by deleting row i and column i can be computed from these blocks as A′ − b′b′^T/c′ (with the i-th row and column placed last). The noise estimate ξ̂_i := z_i − Z_∂i β̂_i is calculated from the regression vector β̂_i of each band; at the end of the traversal, the output ξ̂ := [ξ̂_1, ξ̂_2, ..., ξ̂_L] yields the estimated noise in the HSI.
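To make the band-wise regression concrete, the following minimal Python sketch (not the authors' code) estimates the noise of each band as the residual of a least-squares regression on the remaining bands; np.linalg.lstsq is used as a simple stand-in for the block-matrix shortcut described above.

```python
# Minimal sketch of band-wise noise estimation: each band z_i is regressed on
# the remaining L-1 bands and the residual is taken as the noise estimate.
import numpy as np

def estimate_noise(Y):
    """Y: (L, Q) HSI matrix, L bands, Q pixels. Returns an (L, Q) noise estimate."""
    L, Q = Y.shape
    Z = Y.T                                      # (Q, L), column i holds band i for all pixels
    xi = np.zeros((L, Q))
    for i in range(L):
        z_i = Z[:, i]                            # observed band i
        Z_rest = np.delete(Z, i, axis=1)         # remaining L-1 bands
        # least-squares regression vector beta_i (stand-in for the block-matrix shortcut)
        beta_i, *_ = np.linalg.lstsq(Z_rest, z_i, rcond=None)
        xi[i] = z_i - Z_rest @ beta_i            # residual = noise estimate for band i
    return xi

# Example: synthetic low-rank cube plus small noise
rng = np.random.default_rng(0)
Y = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 1000)) \
    + 0.1 * rng.standard_normal((5, 1000))
print(estimate_noise(Y).std(axis=1))             # per-band noise level estimates
```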

Subspace Prediction of the Signal from the Hyperspectral Image
This section performs a subspace prediction of the signal from the HSI. The signal correlation matrix R̂_x is computed for the predicted (noise-removed) signals, and the eigenvectors E := [e_1, ..., e_L] of R̂_x are computed. Given a permutation π = {i_1, ..., i_L} of the indices i = 1, ..., L, let us decompose the space R^L into two orthogonal subspaces: the k-dimensional subspace E_k ≡ ⟨e_{i_1}, ..., e_{i_k}⟩ and E_k^1 ≡ ⟨e_{i_{k+1}}, ..., e_{i_L}⟩, where E_k^1 is the orthogonal complement of the subspace E_k. Let U_k be the projection matrix onto E_k and x̂_k ≡ U_k y be the projection of the observed spectral vector y onto the subspace E_k. We introduce the minimum mean square error between x and x̂_k as the subspace selection criterion [40]:
mse(k|π) = c + tr(U_k^1 R̂_y) + 2 tr(U_k R̂_n),

where c is an irrelevant constant, R̂_y and R̂_n are the correlation matrices of the observed data and of the estimated noise, respectively, and the criterion is minimized with respect to all the permutations π = {i_1, ..., i_L} of size L and to k. So, we can determine the subspace dimension k̂ as follows: let δ_i = 2 e_i^T R̂_n e_i − e_i^T R̂_y e_i be the contribution of eigenvector e_i to the criterion, collected in δ := [δ_1, ..., δ_L], and let k̂ be the number of negative values in δ. Ultimately, the proper subset of the eigenvectors E := [e_1, ..., e_L] corresponding to negative δ_i is found from δ as the prediction, and the signal subspace Ŝ = [e_{î_1}, ..., e_{î_k̂}] of the HSI is retrieved from k̂ and π.
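The subspace selection step can be sketched as follows, assuming the HySime-style criterion reconstructed above (δ_i = 2 e_i^T R̂_n e_i − e_i^T R̂_y e_i); the correlation-matrix estimates and function names are illustrative, not the authors' implementation.

```python
# Sketch of signal-subspace selection: keep the eigenvectors of the signal
# correlation matrix whose criterion terms delta_i are negative.
import numpy as np

def signal_subspace(Y, Xi):
    """Y: (L, Q) observed HSI, Xi: (L, Q) estimated noise. Returns basis (L, k)."""
    L, Q = Y.shape
    X = Y - Xi                          # predicted (noise-removed) signal
    Ry = (Y @ Y.T) / Q                  # observed-data correlation matrix
    Rn = (Xi @ Xi.T) / Q                # noise correlation matrix
    Rx = (X @ X.T) / Q                  # signal correlation matrix
    _, E = np.linalg.eigh(Rx)           # eigenvectors of Rx (columns of E)
    # delta_i = 2 e_i^T Rn e_i - e_i^T Ry e_i for every eigenvector e_i
    delta = 2 * np.einsum('li,lm,mi->i', E, Rn, E) \
              - np.einsum('li,lm,mi->i', E, Ry, E)
    keep = delta < 0                    # negative terms reduce the mse criterion
    return E[:, keep]                   # estimated signal subspace basis S_hat
```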

Domain Transform Recursive Filtering (DTRF)
The principle of DTRF is to find a transform t : R² → R for the input signal I such that, in the new domain R, the Euclidean distance between neighboring samples equals the l_1 distance between them in the original domain R² [13]:

|ct(x_i) − ct(x_{i−1})| = ||(x_i, I(x_i)) − (x_{i−1}, I(x_{i−1}))||_1,   (8)

where S = {x_0, x_1, ..., x_n} represents the sample set on the signal and ct(x) = t(x, I(x)). With h the sampling interval, Equation (8) can be converted to

|ct(x + h) − ct(x)| = h + |I(x + h) − I(x)|.

In general, to avoid calculating the absolute value, ct is defined as a simple increasing function with ct(0) = 0. Letting h → 0, we have

ct(u) = ∫_0^u (1 + |I′(x)|) dx.

In the actual filtering process, two adjustable parameters are introduced to adjust the filter, the spatial standard deviation σ_s and the spectral (range) standard deviation σ_r:

ct(u) = ∫_0^u (1 + (σ_s/σ_r) |I′(x)|) dx,

where u is the transform domain signal and I′(x) represents the derivative of the input one-dimensional signal I(x). Then, the input signal I is processed by recursive filtering as

K_i = (1 − a^b) I_i + a^b K_{i−1},

where a ∈ [0, 1] refers to the feedback coefficient, b is the distance between two adjacent samples in the transform domain, and K_i represents the filtered result. By increasing b, a^b tends to zero, which stops the propagation chain to preserve sharp edges in the signal.
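A one-dimensional sketch of the recursive pass described by these equations is given below; the feedback coefficient a = exp(−√2/σ_s) follows the original domain-transform filter and is an assumption here, as are the parameter values.

```python
# Minimal 1-D sketch of domain transform recursive filtering (one left-to-right pass).
import numpy as np

def dtrf_1d(I, sigma_s=30.0, sigma_r=0.8):
    I = np.asarray(I, dtype=float)
    # transform-domain distances: ct'(x) = 1 + (sigma_s / sigma_r) * |I'(x)|
    dIdx = np.abs(np.diff(I))
    b = 1.0 + (sigma_s / sigma_r) * dIdx        # distance between neighbouring samples
    a = np.exp(-np.sqrt(2.0) / sigma_s)         # assumed feedback coefficient in [0, 1]
    K = I.copy()
    for i in range(1, len(I)):
        w = a ** b[i - 1]                       # a^b -> 0 across strong edges
        K[i] = (1.0 - w) * I[i] + w * K[i - 1]  # recursive filtering equation
    return K

signal = np.r_[np.zeros(50), np.ones(50)] + 0.1 * np.random.randn(100)
smooth = dtrf_1d(signal)                        # noise smoothed, step edge preserved
```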

Proposed LRS-HRFMSuperPCA Method
HSIs are inevitably contaminated by various noises due to equipment limitations and the impact of the atmospheric environment. On the other hand, the computational complexity grows exponentially as the dimension increases, which causes the "curse of dimensionality". These two problems limit the subsequent classification. We propose a novel transform domain-spatial domain feature extraction framework, called LRS-HRFMSuperPCA, which can effectively improve the classification accuracy and is effective for HSI denoising.
The framework of this joint feature extraction algorithm includes four parts: low-rank sparse representation denoising, multi-scale superpixel dimensionality reduction, hierarchical domain transform recursive filtering, and SVM classification. We detail all parts and provide pseudo code in the following.

Low-Rank Sparse Representation Denoising Based on MSuperPCA
Transform domain filtering is used to remove the mixed noise in the original HSI. Following Jiang et al. [35] and Lina et al. [17], we propose a low-rank sparse representation denoising algorithm to remove noise from the original HSI and use its output as the input of the MSuperPCA algorithm. According to Section 2.2, we can obtain the subspace Ŝ of the original HSI Y_{L×Q}. In the observation model, ⊗ represents the Kronecker product, z := vec(Z), X is the clean image without noise, I is the identity matrix, and Z represents the correlation coefficient matrix. The denoising and inpainting problem is formulated as a regularized least-squares problem in which φ(z) is the regularization function, λ is the regularization coefficient, and y := vec(Y). Let the mask M_i act on the unobserved pixel i of the noise-affected HSI y_{o,i}, where n_{o,i} is the noise part; according to the literature [40], we can obtain the estimate ẑ_i. By calculating ŷ_i = Ŝẑ_i, we obtain the fully observed HSI Ŷ = [ŷ_1, ŷ_2, ..., ŷ_L]. After recovering the unobserved component, we solve the denoising problem. For the optimization problem of Equation (17), we choose BM3D for denoising [15], because the image obtained by the BM3D algorithm keeps a high PSNR. Then, the correlation coefficient matrix Ẑ is obtained, and the denoised HSI X̂ is recovered as X̂ = ŜẐ.
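The subspace-projection denoising step can be illustrated with the following sketch, in which a Gaussian filter stands in for BM3D and the inpainting of unobserved pixels is omitted; names and parameters are placeholders, not the authors' implementation.

```python
# Illustrative sketch: project the HSI onto the estimated signal subspace S_hat,
# denoise each eigen-image, and reconstruct X_hat = S_hat @ Z_hat.
import numpy as np
from scipy.ndimage import gaussian_filter

def subspace_denoise(Y, S_hat, rows, cols, sigma=1.0):
    """Y: (L, Q) HSI, S_hat: (L, k) subspace basis, Q = rows * cols."""
    Z = S_hat.T @ Y                              # (k, Q) correlation coefficient images
    Z_hat = np.stack([
        gaussian_filter(z.reshape(rows, cols), sigma).ravel()   # per-eigen-image denoiser
        for z in Z                                               # (BM3D in the paper)
    ])
    return S_hat @ Z_hat                         # reconstructed, denoised HSI X_hat
```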
The dimension of the HSI X̂ is reduced by PCA, and the first principal component image I_f is segmented into N segmentation maps with different numbers of superpixel blocks using the entropy rate superpixel algorithm [41], where N = 2C + 1 and C is a user-defined number of scales. The number of superpixels per image at scale c is S_c = (√2)^c · S, where c ∈ [−C, −C + 1, ..., 0, ..., C − 1, C] denotes the rank of each superpixel-segmented image and S is the initial superpixel number. The segmented superpixel blocks of each scale are projected in parallel onto the repaired and denoised HSI X̂. SuperPCA is then performed on X̂ [35]; we obtain the principal component images J_m, where num_PC is the dimension after PCA and J_m represents the principal component image after superpixel segmentation.
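A rough sketch of the multi-scale superpixel SuperPCA step is shown below. SLIC is used as a stand-in for the entropy rate superpixel algorithm, and the per-scale superpixel count follows the rule S_c = (√2)^c · S assumed above; both choices are illustrative.

```python
# Sketch of multi-scale superpixel SuperPCA (SLIC as a stand-in for ERS).
import numpy as np
from sklearn.decomposition import PCA
from skimage.segmentation import slic          # requires scikit-image >= 0.19

def multiscale_superpca(X_hat, rows, cols, S=40, C=2, num_pc=30):
    """X_hat: (L, Q) denoised HSI with Q = rows * cols. Returns 2C+1 feature cubes."""
    L, Q = X_hat.shape
    first_pc = PCA(n_components=1).fit_transform(X_hat.T).reshape(rows, cols)
    first_pc = (first_pc - first_pc.min()) / (first_pc.max() - first_pc.min() + 1e-12)
    cube = X_hat.T.reshape(rows, cols, L)
    features = []
    for c in range(-C, C + 1):
        n_seg = max(1, int(round((np.sqrt(2) ** c) * S)))   # S_c = (sqrt(2))**c * S
        labels = slic(first_pc, n_segments=n_seg, compactness=0.1, channel_axis=None)
        out = np.zeros((rows, cols, num_pc))
        for lab in np.unique(labels):
            mask = labels == lab
            pixels = cube[mask]                              # (n_pix, L) spectra in one superpixel
            k = min(num_pc, pixels.shape[0], L)
            out[mask, :k] = PCA(n_components=k).fit_transform(pixels)  # SuperPCA per block
        features.append(out)
    return features
```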

Hierarchical Domain Transform Recursive Filtering (HDTRF)
Inspired by the idea of deep learning [42,43], we set up a hierarchical structure based on DTRF [13], as shown in Figure 2: the output feature set of the upper-level DTRF unit is used as the input of the next-level DTRF unit. In this way, the spatial and spectral information of the hyperspectral data is fully mined, and the obtained spatial-spectral features preserve the global similarity structure, local geometric structure and rich spatial information of the hyperspectral dataset.
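The hierarchical structure itself is simply a cascade, as the following sketch shows; dtrf_1d is assumed to be a single-level edge-preserving filter such as the one sketched earlier, and any such filter can be plugged in.

```python
# Minimal sketch of HDTRF: the output of one DTRF unit feeds the next unit.
import numpy as np

def hierarchical_filter(signal, single_level_filter, levels=2, **kwargs):
    out = np.asarray(signal, dtype=float)
    for _ in range(levels):                  # cascade: level l output is level l+1 input
        out = single_level_filter(out, **kwargs)
    return out

# e.g. features = hierarchical_filter(band, dtrf_1d, levels=2, sigma_s=30, sigma_r=0.8)
```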

Decision Fusion Strategy Based on Support Vector Machine
We use multi-scale entropy rate superpixel segmentation to obtain multiple principal component images, and we classify the feature image sets after hierarchical recursive filtering using SVM. SVM is a supervised learning model that is mainly oriented to the linearly separable case.

In the case of linear inseparability, the linearly inseparable samples in the low-dimensional input space are transformed into a high-dimensional feature space by a nonlinear transformation. The nonlinear transformation maps the sample information in the input space into a high-dimensional space through a properly defined inner product function, and the optimal linear classification surface is then found in the new space. The optimal classification surface is obtained when the maximum-interval classification line is extended to the n-dimensional space. The optimal classification line can not only correctly separate samples of different categories, but also maximize the overall classification interval. The general principle of SVM is shown in Figure 3.
In the face of feature classification in HSIs, the problem is always nonlinear. Therefore, we use a nonlinear feature transformation to map the sample information in the original input space to a high-dimensional feature space, and then find the optimal classification surface in the new space. The objective function is

min_{ω,τ} ϕ(ω, τ) = (1/2)||ω||² + c_1 Σ_i τ_i,

where ϕ(ω, τ) represents the spatial variation function, τ_i represents the relaxation factor, and c_1 represents the regularization parameter. Facing different classification tasks, we need to construct different kernel functions. The RBF kernel function has the characteristic of unique best approximation; as a kernel function, RBF can map input samples to a high-dimensional feature space in SVM. We use the RBF kernel function in the LIBSVM toolbox. As one feature image yields one classification result, we can obtain 2C + 1 classification results from the different feature images. A decision fusion algorithm is used to improve the classification accuracy. We leverage the majority-voting-based decision fusion strategy due to its insensitivity to inaccurate estimates of posterior probabilities; the majority vote is given by [44]

N(v) = Σ_{j=1}^{2C+1} α_j I_1(v_j = v),

where I_1 is the indicator function, v is the class label from one of the C_1 possible classes for the test pixel, v_j is the label predicted by classifier j, j is the classifier index, 2C + 1 is the number of classifiers, N(i) is the number of times class i was detected in the bank of classifiers, and α_j denotes the voting strength of the j-th classifier. In our method, we use the equal voting strength α_j = 1/(2C + 1) (j = 1, 2, ..., 2C + 1). The detailed procedure for the LRS-HRFMSuperPCA classification is described in Algorithm 1.
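A minimal sketch of the decision-fusion step follows: one RBF-kernel SVM per feature image and an equal-weight majority vote over the 2C + 1 predictions. The scikit-learn classifier stands in for LIBSVM, and the parameter values are placeholders.

```python
# Sketch of SVM training per feature image and equal-weight majority-vote fusion.
import numpy as np
from sklearn.svm import SVC

def fuse_by_majority_vote(feature_sets, y_train, train_idx, test_idx, gamma=300, c1=1e6):
    votes = []
    for F in feature_sets:                       # F: (n_pixels, n_features), one per scale
        clf = SVC(kernel='rbf', gamma=gamma, C=c1)
        clf.fit(F[train_idx], y_train)
        votes.append(clf.predict(F[test_idx]))   # one label vector per classifier
    votes = np.stack(votes)                      # (2C+1, n_test)
    # equal voting strength: the most frequently predicted class wins per pixel
    fused = np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
    return fused
```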

Experimental Datasets
Indian Pines is a 145 × 145 pixel HSI collected by the AVIRIS hyperspectral spectrometer sensor at the Indiana agricultural test site. It contains 220 bands covering 16 types of ground objects; 20 noisy bands are removed and the remaining 200 bands are utilized as the experimental dataset. The total number of labeled pixels is 10,249. Figure 4 shows Indian Pines' ground truth map, specific species and false color map.
PaviaU is a 610 × 340 pixel HSI collected by the ROSIS hyperspectral spectrometer sensor over Pavia University. A total of 115 bands covering 9 types of ground objects are included; 12 noisy bands are eliminated and the remaining 103 bands are utilized as the experimental dataset. The total number of labeled pixels is 42,776. Figure 5 shows PaviaU's ground truth map, specific species and false color map.
Salinas is a 512 × 217 pixel HSI collected by the AVIRIS hyperspectral spectrometer sensor over the Salinas Valley, California. A total of 224 bands covering 16 types of ground objects are included; 20 noisy bands are eliminated and the remaining 204 bands are employed as the experimental dataset. The total number of labeled pixels is 54,129. Figure 6 shows Salinas' ground truth map, specific species and false color map. Table 1 shows the number of samples in Indian Pines, PaviaU, and Salinas.

Parameter Analysis
The commonly used HSI classification performance indicators are as follows: OA, AA and the Kappa coefficient. Taking a hyperspectral dataset with C_1 kinds of surface features as an example, by comparing the classification results with the real surface feature distribution data, the confusion matrix M_1 ∈ R^{C_1×C_1} is obtained; in the matrix, m_ij represents the number of samples that belong to class i and are assigned to class j. Therefore, the diagonal element m_ii in the confusion matrix represents the number of correctly classified samples in each feature category. Next, each evaluation index can be calculated separately.
OA is the proportion of correctly classified samples, OA = Σ_i m_ii / Q_1, where Q_1 represents the total number of samples in the hyperspectral dataset. AA: according to the per-class accuracies, the AA is the mean of the class-wise accuracies, AA = (1/C_1) Σ_i (m_ii / Σ_j m_ij). Kappa coefficient: in order to make full use of the information of the confusion matrix to measure the consistency between the classification results and the real surface feature distribution map, researchers put forward the concept of Kappa, which is defined as Kappa = (Q_1 Σ_i m_ii − Σ_i (m_i+ · m_+i)) / (Q_1² − Σ_i (m_i+ · m_+i)), where m_i+ and m_+i are the i-th row sum and column sum of the confusion matrix. There are three types of parameters in our paper.
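For reference, the three indices can be computed from the confusion matrix as in the following sketch (standard definitions, not code from the paper).

```python
# Compute OA, AA and the Kappa coefficient from a confusion matrix M1,
# where m_ij = number of samples of class i assigned to class j.
import numpy as np

def classification_metrics(M1):
    M1 = np.asarray(M1, dtype=float)
    Q = M1.sum()                                   # total number of samples
    oa = np.trace(M1) / Q                          # overall accuracy
    aa = np.mean(np.diag(M1) / M1.sum(axis=1))     # mean of per-class accuracies
    pe = np.sum(M1.sum(axis=1) * M1.sum(axis=0)) / Q**2
    kappa = (oa - pe) / (1 - pe)                   # chance-corrected agreement
    return oa, aa, kappa
```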

Scale Number C and Initial Superpixel Number S
As seen in Figure 7, when C is 0, the model degenerates to single-scale superpixel dimension reduction. As a result, it cannot make full use of the spatial information of multiple images, leading to a decline in classification accuracy. When the C value is too large, over-segmentation occurs: individual superpixel blocks become too small to be distinguished and the correct label cannot be identified. As shown in Figure 7a, when C is 7, OA is maximized on Indian Pines. Figure 7b reveals that, when C is 10, the maximum value is obtained on PaviaU, since datasets with complex structures and textures and large sizes (for instance, PaviaU) need to be segmented into more superpixels, while fewer superpixels are needed when the dataset size is smaller (for example, Indian Pines).
For the number of initial superpixels S, according to Figure 8, the classification accuracy for the three datasets first increases and then decreases with an increasing number of initial superpixels. Too few or too many initial superpixels lead to poor classification accuracy. When the initial number of superpixel blocks is too small, the number of superpixel blocks calculated for each image at every scale is also small, and the blocks contain various ground objects, which cannot make full use of the spectral characteristics of different species. When the number of superpixels is too large, a single superpixel block becomes smaller and contains fewer pixels, and thus cannot utilize all the samples belonging to a uniform area. As seen in Figure 8, when S is 40, the maximum OA is obtained for the three datasets. This indicates that superpixel segmentation of the reduced-dimension image can capture the spatial structure of the HSI and improve the OA of the subsequent process.

The Spatial and Range Standard Deviations of the Filter
As seen in Figure 9, on Indian Pines, when σ_s is 30 and σ_r is 0.8, the method obtains the optimal classification accuracy; this analysis is not repeated for Salinas and PaviaU. For the recursive layer number L, when the recursion level is 0, the LRS-HRFMSuperPCA method degenerates to the LRS-MSuperPCA method, which fails to guarantee similar eigenvalues for adjacent pixels on the same side of an edge, and the classification accuracy is minimal. When the recursion level is 1, the fine texture structure cannot be completely eliminated by single-layer recursive filtering; with more levels, the fine texture structure is eliminated while the edges are preserved. This can be inferred from the SuperPCA part in the schematic of Figure 1. According to Figures 9-11, the proposed HDTRF can enhance the classification accuracy of HSI.

Kernel Function Parameter G and Regularization Parameter c 1 in SVM
The SVM classifier used in the LRS-HRFMSuperPCA technique utilizes the LIBSVM toolbox in MATLAB. Moreover, the comparison algorithms with an SVM classifier also use the LIBSVM toolbox in MATLAB, which contains the kernel parameter gamma G and the regularization parameter c_1. Figures 12 and 13 represent the effect of different kernel parameters G and regularization parameters c_1 on the classification accuracy for the three datasets, respectively. It is observed that, for the three datasets, the maximum classification accuracy is achieved when the kernel parameter G is 300, 40 and 400, respectively, and the regularization parameter c_1 is set to 10^6, 10^6 and 10^8, respectively. In our classifier, a higher gamma value G leads to a higher dimension of the mapping and a better training result. We choose a higher c_1, as a higher c_1 aims to classify all training samples correctly by giving the model the freedom to choose more samples as support vectors. Since the Salinas training sample is larger than those of Indian Pines and PaviaU, the value of c_1 increases with the number of training samples. The experimental parameter settings of the LRS-HRFMSuperPCA method in subsequent experiments are shown in Table 2.

Experimental Results and Analysis
The HSI restored by the transform domain filtering algorithm based on low-rank sparse representation is better than the original image. In our paper, the peak signal-to-noise ratio (PSNR) was utilized to compare the quality of the denoised HSI and the original HSI, as follows:

PSNR = 10 · log_10(MAX_I² / MSE),

where MAX_I is the maximum value of the color of the image point and MSE represents the mean square error between two monochromatic images I and K of size m × n, MSE = (1/(mn)) Σ_{x,y} [I(x,y) − K(x,y)]². In our method, the original HSI with additive Gaussian noise with a noise variance σ² of 0.01 is used as the reference image to determine the PSNR. In the experiment, the original HSI and the denoised HSI obtained by low-rank sparse representation are considered as the processed images. Figure 14 represents the PSNR values of all bands of the original HSI and the HSI denoised by low-rank sparse representation on Indian Pines, and Table 3 gives the average PSNR values over all bands of the two HSIs.
When comparing the PSNR values, the reference image used in our method is a noisy HSI with additive Gaussian noise. Thus, the greater the MSE between the processed image and the reference image, the greater the difference between them and the smaller the corresponding PSNR value; in our method, a smaller PSNR therefore means that the processed image differs more from the reference image damaged by additive noise, i.e., more noise has been removed, whereas a larger PSNR value indicates that the quality of the processed image is poor. As seen in Figure 14, the PSNR values of the reconstructed HSI obtained by our method are lower than those of the original HSI in all bands. The third row of Table 3 reveals that the OA of the LRS-HRFMSuperPCA is better than that of the HRFMSuperPCA without low-rank sparse representation denoising on the 10% training set. Therefore, the transform domain filtering denoising algorithm based on low-rank sparse representation can enhance the quality of the original HSI and improve the OA.
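The per-band PSNR computation can be sketched as follows; the function name and the choice of peak value are illustrative.

```python
# Sketch of the per-band PSNR computation: the noisy HSI band serves as the
# reference I and the processed band as K; MAX_I is the peak value.
import numpy as np

def psnr(I, K, max_i=None):
    I, K = np.asarray(I, float), np.asarray(K, float)
    mse = np.mean((I - K) ** 2)                  # mean squared error over m*n pixels
    max_i = I.max() if max_i is None else max_i  # peak value of the reference band
    return 10.0 * np.log10(max_i ** 2 / mse)
```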
The LRS-HRFMSuperPCA is compared with several HSI classification models, including the original classification algorithms SVM and PCA, and the recently proposed IFRF [26], CCJSR [27], MSuperPCA [35], SSRN [30], KNNRS [36] and GhoMR [33]. To reduce the randomness of the experimental results, the average value of five runs is considered as the final result, both for our algorithm and for all comparison algorithms. At the same time, for a fair comparison, the parameters of the comparison algorithms are set to the default optimal parameters given in the relevant literature. Table 4 lists the specific experimental parameters of the comparison algorithms. The experimental environment of the LRS-HRFMSuperPCA method and the other comparative algorithms is a notebook with 12 GB memory and an Intel Core i5 2.2 GHz CPU.

IFRF(SVM)
The parameters of recursive filtering are set to δ_s = 200 and δ_r = 0.3. The number of features k is set to 20, the penalty parameter c = 100, and the kernel function parameter G = 4.

CCJSR
The parameters are the joint neighboring scale S_e, the sparsity level S_l, the number of nearest neighbors N, and the regularized parameter λ. The highest classification accuracy is obtained when S_e is set to 6, S_l is set to 2, λ equals 0.6 and N is set to 4.

MSuperPCA(SVM)
The scale number C for Indian Pines, PaviaU, and Salinas is set to 4, 6, and 4, respectively. The number of superpixels is set to the initial optimal value, S = 100, 20, and 100 for Indian Pines, PaviaU, and Salinas, respectively; for SVM, the penalty parameter c = 10^5 and the kernel function parameter G = 500.

SSRN
The SSRN includes two convolutional layers and two spectral residual blocks as a spectral feature learning section, a 3-D convolutional layer and two spatial residual blocks as a spatial feature learning section, an average pooling layer, and a fully connected layer.

KNNRS
The parameters of recursive filtering are set to δ s =204 and δ r = 0.6. The parameters of KNN are set to k 1 = 1 and k 2 = 9. The number of superpixels N for Indian Pines, PaviaU, and Salinas is set to 3300, 8000, and 9000.

GhoMR
The GhoMR has two hyperparameters, the number of Ghost transformations (T) and the spatial size of ghost filters (K_T): for Indian Pines T = 2 and K_T = 3, for PaviaU T = 4 and K_T = 7, and for Salinas T = 2 and K_T = 3.
In order to compare fully with the training sets used in the references corresponding to the comparison algorithms, the contrast experiments are carried out on training sets of t_1 = 77 (10% training set) and t_1 = 173 (20% training set) samples per class on Indian Pines, t_2 = 475 (10% training set) and t_2 = 1139 (20% training set) on PaviaU, and t_3 = 102 (3% training set) and t_3 = 170 (5% training set) on Salinas, leaving the remaining samples to form the testing sets. To compare 10%, 20%, 3% and 5% of the total randomly selected training samples, the eight comparison algorithms in Table 4 and the LRS-HRFMSuperPCA method are used. In this section, only the comparison results on Indian Pines are discussed. Table 7 shows that, when the training set is 10%, the OA of our method is improved by 17.65% and 20.27%, respectively, compared with the original SVM and PCA on Indian Pines. Compared to the IFRF, according to Table 3, the low-rank sparse representation denoising algorithm can enhance the quality of the original HSI, thus improving the OA by 0.9% on 10% of the Indian Pines samples. Compared to the CCJSR, the OA of our method is enhanced by 3.32% and 2.78% over the two training sets on Indian Pines. In comparison to the MSuperPCA, when the training set is 20%, the OA of our method on the Indian Pines dataset is improved by 1.78%. The reason is that our method combines the spatial domain filtering algorithm and the transform domain filtering algorithm: the mixed noise is removed by the transform domain filtering denoising algorithm based on low-rank sparse representation of the original HSI, while the small-scale structure of the HSI is eliminated by the multi-level spatial domain recursive filtering algorithm, fully obtaining the spatial-spectral joint information of the HSI and improving the OA. Compared to KNNRS, the OA of LRS-HRFMSuperPCA is always higher on the three datasets (see Tables 7 and 8). Compared to GhoMR, the single projection space constructed by the GhoMR makes it impossible to make full use of the spectral features of different species, whereas our method can freely control the scale of multi-scale superpixel segmentation, and each obtained superpixel block is a set of pixels with similar characteristics; subsequently, a hierarchical recursive filtering algorithm is used to extract the spatial-spectral joint information in each pixel block region to improve the consistency and similarity within the region. At the same time, our method runs in a non-GPU environment; compared to the GhoMR running on a GPU, we use a CPU to reduce the cost of the experimental equipment. Table 7 reports the OA, AA, and Kappa obtained by the LRS-HRFMSuperPCA and the other state-of-the-art methods on 10% and 20% randomly chosen training samples. Figures 15-17 represent the feature classification maps obtained by our method and the different comparison algorithms and the OA of the corresponding classification maps. As seen in Figure 15, the OA of the SVM and PCA is not ideal since they do not combine the joint spatial-spectral information, and the "salt and pepper" noise phenomenon appears in the classification result maps.
In contrast, IFRF and KNNRS, which combine spatial-spectral information, can effectively eliminate the noisy pixels, and the "salt and pepper" noise phenomenon is greatly reduced. Compared to PCA and SVM, the OA of IFRF is improved by 18.33% and 15.92%, respectively, and that of KNNRS is improved by 17.3% and 19.6%, respectively. According to the surface feature classification maps of Indian Pines, the OA of our method is 1.43%, 2.47%, and 0.16% higher than that of IFRF, CCJSR, and KNNRS, respectively. Compared to the advanced MSuperPCA, SSRN and GhoMR algorithms, the OA of our algorithm on Indian Pines is improved by 1.58%, 0.61% and 0.68%, respectively. Improvements or comparable results are obtained on PaviaU and Salinas as well and are not repeated here.
This section analyzes the effects of various training and test sets on the OA of our method and the other methods. Figure 18 represents the classification results of the LRS-HRFMSuperPCA method as the number of training samples of the three datasets increases from 1% to 6%. According to Figure 18, with different training sample numbers, the OA can be improved significantly by our method by increasing the number of training samples. Hence, we can conclude that our method achieves the best OA on the three datasets compared to the other state-of-the-art comparison algorithms.
The computational times of the LRS-HRFMSuperPCA algorithm and the other comparison algorithms are reported in Table 9. The running environment of the LRS-HRFMSuperPCA and all the comparison algorithms is a notebook computer with 12 GB memory and a 2.2 GHz CPU processor, except for GhoMR and SSRN, whose running environment is PyTorch 1.6.0 and CUDA 10.1 in the open-source GPU environment of Google Colab; the running environment of the other algorithms is MATLAB 2018a. Table 8 shows the number of training samples, as a percentage of the total sample number, used in the run-time measurements on the three datasets. According to the analysis of Table 9, the LRS-HRFMSuperPCA algorithm takes a long time on Indian Pines. The main reason is that this technique utilizes multi-level domain transform recursive filtering and multi-scale superpixel segmentation. As seen in Figure 18a,b, under the different training sets of Indian Pines, the OA of our method is enhanced compared to the other comparison algorithms. On PaviaU and Salinas, the CCJSR took the most time since the computation of the correlation coefficients of the training set grows with the training set, followed by the KNNRS, whose main computation is caused by the KNN operations. The running time of the MSuperPCA is mainly caused by the multi-scale superpixel segmentation of the HSIs. The run time of our method on PaviaU is lower than those of the KNNRS, MSuperPCA, and CCJSR, while the OA of our method is also improved compared to the other algorithms on PaviaU. As for the GhoMR, although the time cost of our algorithm on Indian Pines and PaviaU is higher than that of the GhoMR, the classification accuracy of our algorithm is better than that of the GhoMR on the various training sets, and the time cost on Salinas is lower than that of GhoMR. Thus, the experimental results show that the run time of LRS-HRFMSuperPCA is acceptable.

Discussion
We reveal some interesting points about the LRS-HRFMSuperPCA method by conducting several comparison experiments and parameter discussions on the three HSIs.

•
In general, we notice that the mixed noise in an HSI reduces the classification accuracy [9]. Considering the effects of imaging sensor breakdown, environmental pollution, and other factors, these degradation factors account for a small proportion of the HSI, so we usually model them as sparse noise. A clean natural image (in our article, an HSI) often has a low-rank structure; therefore, when dealing with the mixed noise problem of natural images, we usually use a low-rank sparse representation algorithm to solve it, and domain transform recursive filtering is used to remove the small-scale textures in the HSI. Another knowledge base is that, after the stable low-rank feature image is extracted (in our article, by LRS-MSuperPCA in Section 3.1), the joint spatial-spectral information of the feature image can be obtained by hierarchical filtering (in our article, HDTRF in Section 3.2). To facilitate understanding, we apply hierarchical recursive filtering to IFRF (Figure 19) for experimental verification. Compared with the LRS-HRFMSuperPCA algorithm using hierarchical filtering in Figure 9c, as the level increases, the OA of the three classifications in Figure 19 decreases significantly. The experimental results show that our knowledge base is correct.

•
Based on the two knowledge bases proposed in the previous point, denoising based on low-rank sparse representation (denote it as function A) belongs to the transform domain filtering algorithms, while denoising based on hierarchical recursive filtering (denote it as function B) belongs to the spatial domain filtering algorithms. Let us set a penalty factor ρ that controls the proportion between function A and function B, ρ ∈ [0, 1]; the optimal filtering F can then be expressed as a function, which can be used in future research to find an optimal framework:
F = ρ × A + (1 − ρ) × B. (26)
• According to the data in Tables 7 and 8, LRS-HRFMSuperPCA is superior to the feature extraction algorithms using a single filter. Compared with IFRF and KNNRS, our method uses a low-rank sparse representation algorithm to denoise the original HSI. It can be seen from Table 3 that the OA of LRS-HRFMSuperPCA is 0.23% higher than that of HRFMSuperPCA, and the PSNR has also improved. This supports the correctness of LRS [17], which can remove mixed noise in the original HSI. Compared with LRS-MSuperPCA, the small-scale textures of the HSI can be eliminated by the multi-level spatial domain recursive filtering algorithm, fully obtaining the spatial-spectral joint information of the HSI and improving the OA. It can be seen from Figures 9c, 10c and 11c that the OA of LRS-HRFMSuperPCA is always higher than that of LRS-MSuperPCA on the three datasets. The comparative experiments verify that the DTRF [13] and IFRF [26] can remove texture structure and improve classification accuracy. These results support earlier studies.

•
Our finding is a novel feature extraction framework with joint low-rank sparse representation and hierarchical domain transform recursive filtering. The framework agrees with the existence of mixed noise in the original HSI [9] and with the fact that the spectral vectors live in a low-dimensional subspace [45]. The proposed framework is applied to HSI feature extraction (MSuperPCA), yielding the LRS-HRFMSuperPCA algorithm. As can be seen from Figure 1, we use hierarchical recursive filtering to extract feature images from the HSI. According to the experimental results in Tables 7 and 8, it improves the OA, AA and Kappa. In the field of remote sensing science, there are two expectations for remote sensing image classification [46]: the first is feature mining, and the second is to design a classifier with high classification accuracy. Our LRS-HRFMSuperPCA algorithm achieves both expectations. • Compared with the HSI classification algorithms based on non-deep learning [26,27,35,36] and deep learning [30,33], the advantage of the LRS-HRFMSuperPCA algorithm is the use of the dual filtering framework. Specifically, compared with IFRF [26] and KNNRS [36], which only use spatial domain filtering, LRS-HRFMSuperPCA uses spatial-transform dual domain filtering and reconstructs the original HSI by recovering the pixels not observed in some known bands. We conduct an experiment to prove the anti-noise performance of the algorithm, in which zero-mean additive Gaussian noise is added to the original HSI (a minimal sketch of this setup is given after this paragraph). According to Figure 20, with the increase of Gaussian noise intensity, our algorithm still maintains an OA of more than 98% with 10% of the Indian Pines samples as the training set. Compared with CCJSR [27], which only uses a sparse representation, LRS-HRFMSuperPCA uses hierarchical domain transform recursive filtering to make full use of spatial context information. Compared with SSRN [30] and GhoMR [33], these two space-spectrum joint classification algorithms based on deep learning use a single projection space and cannot make full use of the low-dimensional characteristics of each species, whereas multi-scale superpixel segmentation can learn the low-dimensional features of different regions in HSIs by inputting the number of scales. At the same time, the architectures of SSRN [30] and GhoMR [33] need expensive GPU hardware to train and store; LRS-HRFMSuperPCA uses a CPU to reduce the cost of the experimental equipment. However, we find that most feature extraction algorithms (such as PCA in LRS-HRFMSuperPCA) compress the information of the original data into a low-dimensional feature space, which may lose the physical meaning of the original features. In future research, we suggest focusing on unsupervised band selection algorithms instead of SuperPCA. Band selection selects a desired band subset from the original bands and preserves the spectral meaning of the spectral channels. In practice, prior knowledge about the surface features is difficult to obtain, so it is necessary to develop an unsupervised feature extraction algorithm. Specifically, for the reconstructed HSI with superpixels X̂, hybrid scheme-based methods implement multiple schemes to select appropriate bands from X̂. We can try to integrate both clustering- and ranking-based techniques to search for the best combination of bands with higher classification accuracy.
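As referenced above, a minimal sketch of the noise-injection step of the anti-noise experiment is given below; the variance values are illustrative.

```python
# Sketch of the anti-noise experiment setup: zero-mean additive Gaussian noise
# of increasing variance is added to the original HSI before classification.
import numpy as np

def add_gaussian_noise(Y, variance, seed=0):
    rng = np.random.default_rng(seed)
    return Y + rng.normal(0.0, np.sqrt(variance), size=Y.shape)

# e.g. evaluate OA for noisy_Y = add_gaussian_noise(Y, v) with v in [0.01, 0.05, 0.1]
```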
Another major drawback of our method is that its performance is still affected by the number of superpixels in the segmentation, and the shape of the superpixels in our method is irregular. Specifically, once pixels of different classes exist in the same superpixel region, the pixels in these superpixels cannot be effectively classified. In future research, for different categories of pixels in the same superpixel, we will segment the pixels again according to the weight of each pixel in the superpixel; the weight can be measured by relative entropy or Euclidean distance, among others. On the other hand, a gradient-based superpixel method has the advantages of regular shape and a controllable number of generated superpixels, which is a direction for solving the problem of irregular superpixel shapes in the future.

Conclusions
In this paper, a simple and effective HSI classification algorithm combining transform domain filtering and spatial domain filtering is proposed. The objective of our method is to combine PCA based on multi-scale entropy rate superpixel dimension reduction with the two types of joint filtering denoising framework. To remove the mixed noise in the original HSI, the denoising algorithm based on a low-rank sparse representation is used to denoise and restore the original HSI, improving the peak signal-to-noise ratio of the original HSI and the subsequent classification accuracy. A multi-scale entropy rate superpixel segmentation algorithm is utilized to segment the repaired image; by dividing the whole HSI into several regions, each superpixel block has similar reflection characteristics, which is convenient for the subsequent dimension reduction processing and for finding the low-dimensional features of the HSI. Secondly, hierarchical recursive filtering is applied to the feature images after SuperPCA, making full use of the spatial information contained in the HSI. Experiments on three real HSIs reveal that the proposed LRS-HRFMSuperPCA algorithm enhances the classification accuracy compared with the original MSuperPCA algorithm and the recently proposed HSI classification algorithms. However, the disadvantage of the LRS-HRFMSuperPCA algorithm is that its parameters are manually selected, thereby reducing the operability of the algorithm. Therefore, in future research, we will focus on how to automatically determine the optimal parameters of our algorithm.
Author Contributions: All authors made significant contributions to the manuscript. X.L. and S.Q. put forward the idea; S.Q. and X.L. conceived and designed the experiments; S.Q. and X.L. presented tools and carried out the data analysis; X.L. and S.Q. wrote the paper. S.Q. and S.L. guided and revised the paper; S.Q. provided the funding; all authors have read and agreed to the published version of the manuscript.


Data Availability Statement:
The data presented in this study are available on request from the corresponding author.