Spectral-Spatial Joint Classification of Hyperspectral Image Based on Broad Learning System

Abstract: At present, many researchers combine spectral and spatial features to enhance hyperspectral image (HSI) classification accuracy. However, some methods exploit the spatial features insufficiently. To further improve HSI classification performance, this paper proposes the spectral-spatial joint classification of HSI based on the broad learning system (BLS), named SSBLS, which consists of three parts. First, a Gaussian filter smooths each band of the original spectra based on the spatial information to remove noise. Second, the labels of the test samples are obtained using an optimal BLS classification model trained on the Gaussian-smoothed spectral features. Finally, a guided filter corrects the BLS classification results based on the spatial contextual information to improve the classification accuracy. Experiments on three real HSI datasets demonstrate that the mean overall accuracies (OAs) over ten runs are 99.83% on the Indian Pines dataset, 99.96% on the Salinas dataset, and 99.49% on the Pavia University dataset. Compared with the other methods considered, the proposed method achieves the best performance.


Introduction
Hyperspectral images (HSIs) are widely used in various fields [1][2][3][4] owing to characteristics such as high-resolution spectral imaging, the unity of spectral and spatial information, and rapid non-destructive testing. One of the important tasks in HSI applications is HSI classification. At first, researchers utilized only spectral features for classification. However, spectral information is easily affected by factors such as illumination, noise, and sensors, so the phenomenon of "same material with different spectra, and different materials with the same spectrum" often appears. This increases the difficulty of object recognition and seriously reduces classification accuracy. Researchers therefore began to combine spectral and spatial features to improve classification accuracy.
The spectral features of an HSI can be extracted by unsupervised [5,6], supervised [7,8], and semi-supervised methods [7,9,10]. Representative unsupervised methods include principal component analysis (PCA) [11], independent component analysis (ICA) [12], and locality preserving projections (LPP) [13]; several well-known unsupervised feature extraction methods build on PCA and ICA. Some supervised feature extraction techniques for HSIs [14,15] are founded on the well-known linear discriminant analysis (LDA). Semi-supervised spectral feature extraction methods often combine supervised and unsupervised approaches to classify HSIs using limited labeled samples together with unlabeled samples. For example, Cai et al. [16] proposed semi-supervised discriminant analysis (SDA), which adopts a graph Laplacian-based regularization constraint. The deep cascade broad learning system (DCBLS) method applies such regularization to large-scale data and has been successful in image denoising. The discriminative locality preserving broad learning system (DPBLS) [54] was utilized to capture the manifold structure between neighboring pixels of hyperspectral images. Wang et al. [55] proposed an HSI classification method based on domain adaptation broad learning (DABL) to address the limitation or absence of available labeled samples. Kong et al. [56] proposed a semi-supervised BLS (SBLS), which first uses the HGF to preprocess the HSI data, then applies the class-probability structure (CP) and the BLS for classification, achieving semi-supervised classification with small sample sets.
In order to make full use of the spectral-spatial joint features to improve HSI classification performance, we put forward the SSBLS method. It incorporates three parts. First, the Gaussian filter is used to smooth the spectral features on each band of the original HSI based on the spatial information to remove noise; the inherent spectral characteristics of the pixels are extracted, realizing the first fusion of spectral and spatial information. Second, the pixel vectors of spectral-spatial joint features are input into the BLS, which extracts sparse and compact features through a random weight matrix fine-tuned by a sparse autoencoder to predict the labels of the test samples; the initial probability maps are constructed. In the last step, a guided filter corrects the initial probability maps under the guidance of a grey-scale image, which is obtained by reducing the spectral dimensionality of the original HSI to one via PCA; the spatial context information is fully utilized in the operation of the guided filter. In SSBLS, the spatial information is used in the first and third steps; in the second step, BLS uses the spectral-spatial joint features for classification; and in the third step, the first principal component of the spectral information is used to obtain the grey-scale image. Therefore, the full use of spectral-spatial joint features in the proposed method contributes to better classification performance. The major contributions of our work can be summarized as follows: (1) We found that the organic combination of the Gaussian filter and BLS could enhance the classification accuracy. The Gaussian filter captures the inherent spectral information of each pixel based on the HSI spatial information, and BLS extracts sparse and compact features using the random weights fine-tuned by the sparse autoencoder in the feature mapping process.
Sparse features can represent low-level structures, such as edges, and high-level structures, such as local curvatures and shapes [57]; these contribute to the improvement of classification accuracy. The inherent spectral features are input to BLS for training and prediction, thereby improving the classification accuracy of the proposed method; the experimental data support this conclusion. (2) We take full advantage of the spectral-spatial features in SSBLS. The Gaussian filter first smooths each spectral band based on the spatial information of the HSI to achieve the first fusion of spectral-spatial information; the guided filter then corrects the results of the BLS classification based on the spatial context information, with its grey-scale guidance image obtained via the first principal component of PCA from the original HSI. These three operations sufficiently join the spectral and spatial information together, which helps improve the accuracy of SSBLS. (3) SSBLS utilizes the guided filter to rectify the misclassified hyperspectral pixels based on the spatial context information to obtain the correct classification labels, thereby improving the overall accuracy of SSBLS. The experimental results also support this point.
The rest of this paper is organized as follows. Section II describes the proposed method in detail. Section III presents the experiments and analysis. The discussion of the proposed method is in Section IV. Section V is the summary.

Proposed Method of Spectral-Spatial Joint Classification of HSI Based on Broad Learning System
The flowchart of the SSBLS proposed in this paper is shown in Figure 1; the method mainly consists of three steps: (1) After the original HSI data are input, a Gaussian filter with an appropriately sized window is applied to extract the inherent spectral features of the samples based on the spatial information. (2) The test sample labels are obtained using the optimal BLS classification model trained with the pixel vectors smoothed by the Gaussian filter, and the initial probability maps are constructed according to the results of the BLS classification. (3) To improve the classification accuracy, the guided filter is adopted to correct the initial probability maps based on the spatial context information of the HSI under the guidance of the grey-scale guidance image, which is obtained via the first principal component of PCA. Figure 1. The flowchart of hyperspectral image (HSI) classification via the spectral-spatial joint classification broad learning system (SSBLS).

Spectral Feature Extraction of HSI Based on Gaussian Filter
The first step of the proposed method is that a 2-dimensional (2-D) Gaussian filter smooths the spectral features on each band based on the spatial information of the HSI. The Gaussian filter is one of the most widely used and effective window-based filtering methods. It is usually used as a low-pass filter to suppress high-frequency noise, and it can repair detected missing regions [58]. When the Gaussian filter captures the spectral features of the HSI, the weight of each hyperspectral pixel in the Gaussian filter window decays exponentially with its distance from the center pixel: the closer a neighboring pixel is to the center pixel, the greater its weight, and the farther it is, the smaller its weight. The weight of each pixel in the Gaussian filter window is determined by the following 2-D Gaussian function:

G(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²)) (1)

where x and y are the coordinates of the pixels in the Gaussian filter window on each band of the HSI, the coordinate of the center pixel of the window is (0, 0), and σ is the standard deviation of the Gaussian filter, which controls the degree of blurring of the spectral information; that is to say, the greater the value of σ, the smoother the blurred spectral features. The Gaussian function [59] is separable, so a larger-sized Gaussian filter can be realized efficiently. The 2-D Gaussian convolution can be performed in two steps: first, the spectral image on each band of the HSI is convolved with a 1-D Gaussian function; then, the result is convolved with the same 1-D Gaussian function rotated by 90 degrees. Therefore, the computation of 2-D Gaussian filtering grows linearly with the size of the filter window instead of quadratically. The original HSI data with n samples are denoted as X = {x_1, x_2, ..., x_n}, where each pixel vector belongs to the m-dimensional space.
Y^GaF = {y_1^GaF, y_2^GaF, ..., y_n^GaF} is obtained from X blurred by the Gaussian filter, where each y_i^GaF belongs to the m-dimensional space and m is the number of HSI bands. The superscript "GaF" represents the Gaussian filter, and O^GaF stands for the Gaussian filtering operation. The spectral feature extraction of the HSI based on the Gaussian filter can be represented as Equation (2):

Y^GaF = O^GaF(X) (2)
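The band-wise smoothing above can be sketched in a few lines. This is a minimal illustration, not the authors' MATLAB implementation; the function name `smooth_bands`, the cube layout (H, W, bands), and the sigma value are assumptions. SciPy's `gaussian_filter` internally applies the separable 1-D kernels along each axis, matching the two-step convolution described in the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_bands(hsi, sigma=2.0):
    """Smooth each spectral band of an HSI cube (H, W, bands) with a 2-D
    Gaussian filter. The filter is separable, so the cost grows linearly
    with the window size rather than quadratically."""
    out = np.empty_like(hsi, dtype=float)
    for b in range(hsi.shape[2]):
        out[:, :, b] = gaussian_filter(hsi[:, :, b].astype(float), sigma=sigma)
    return out

# Toy cube: 8x8 pixels, 3 bands of random "noise".
rng = np.random.default_rng(0)
cube = rng.normal(size=(8, 8, 3))
smoothed = smooth_bands(cube, sigma=1.5)
print(smoothed.shape)
```

Smoothing reduces the per-band variance of uncorrelated noise while keeping the cube's shape, which is exactly the denoising role the Gaussian filter plays in step one of SSBLS.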

HSI Classification Based on the Combination of Gaussian Filter and BLS
Chen and Liu put forward the BLS based on the rapid and dynamic learning features of the functional-link network [60][61][62]. BLS is built as a flat network in which the input data are first mapped into mapped feature nodes, and all mapped feature nodes are then mapped into enhancement nodes for expansion; the BLS network thus expands through both mapped feature nodes and enhancement nodes. Moreover, through rigorous mathematical methods, Igelnik and Pao [63] have proven that enhancement nodes contribute to the improvement of classification accuracy. BLS is built on the basis of the traditional random vector functional-link neural network (RVFLNN) [64]. However, unlike the traditional RVFLNN, in which the enhancement nodes are constructed by forming a linear combination of the input nodes and then applying a nonlinear activation function, BLS first maps the inputs into a set of mapped feature nodes via some mapping functions and then maps all mapped feature nodes into enhancement nodes through other activation functions.
The second step of the proposed method is to input the HSI pixel vectors smoothed by the Gaussian filter to train the BLS classification model; the test sample labels are then calculated by the optimal BLS classification model to construct the initial probability maps. The notation in Table 1 is used to present the described HSI classification procedure. The HSI samples smoothed by the Gaussian filter are split into a training set and a test set. The training pixel vectors are mapped into mapped feature nodes by applying a random weight matrix, and a sparse autoencoder is used to fine-tune this random weight matrix. Then, the mapped feature nodes are mapped into enhancement nodes using other random weights. The optimal connecting weights from all mapped feature nodes and enhancement nodes to the output are obtained through the ridge regression approximation of the L2-norm regularized optimization problem, yielding the optimal BLS model. The test sample labels are predicted by the optimal model to construct the initial probability maps.

First, the HSI data smoothed by the Gaussian filter, Y^GaF, with n samples and m dimensions, are mapped into mapped feature nodes. The output of BLS, Y^BLS, is the result of classification, where C is the number of sample classes. There are d feature mappings, each with e nodes; the i-th group of mapped feature nodes can be represented as in Equation (3) [19]:

Z_i = φ_i(Y^GaF W_{e_i} + β_{e_i}), i = 1, 2, ..., d (3)

where W_{e_i} and β_{e_i} are randomly generated and fine-tuned by the sparse autoencoder, and Z^d ≡ [Z_1, Z_2, ..., Z_d] is the concatenation of all d groups of mapped feature nodes. Similarly, the j-th group of enhancement nodes is

H_j = ξ_j(Z^d W_{h_j} + β_{h_j}), j = 1, 2, ..., l (4)

where H^l ≡ [H_1, H_2, ..., H_l] is the concatenation of all the first l groups of enhancement nodes [19]. Combined with Equation (4), the output of BLS can be expressed by Equation (5):

Y^BLS = [Z^d | H^l] W^op (5)

where W^op is the connecting weight matrix from all mapped feature nodes and all enhancement nodes to the output of the BLS; the superscript "op" represents the optimal weight [19]. The optimal connecting weight matrix can be obtained from the L2-norm regularized least squares problem shown in Equation (6):

W^op = argmin_W ( ||[Z^d | H^l] W − Y||_2^2 + λ ||W||_2^2 ) (6)

where Y is the training label matrix, λ is applied to further restrict the square of the L2-norm of W, ||·||_2 represents the L2-norm, and ||·||_2^2 stands for its square. Equation (7) is obtained by the ridge regression approximation [19]:

W^op = (λI + [Z^d | H^l]^T [Z^d | H^l])^{-1} [Z^d | H^l]^T Y (7)
When λ → 0, Equation (7) reduces to solving the least squares problem; when λ → ∞, the solution of Equation (7) is bounded and tends to zero. Therefore, we set λ → 0 and add a small positive number to the diagonal of [Z^d | H^l]^T [Z^d | H^l] to obtain the approximate Moore-Penrose generalized inverse [19]. Consequently, we have Equation (8):

W^op = [Z^d | H^l]^+ Y (8)
Finally, the output of BLS is Y^BLS = [Z^d | H^l] W^op. After inputting the spectral features smoothed by the Gaussian filter into BLS, the initial classification result Y^BLS is obtained. The probability maps of this result are expressed as P = [p_1, p_2, ..., p_C], where p_c is the probability map that each pixel belongs to class c.
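Equations (3)-(8) can be condensed into a small NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the sparse-autoencoder fine-tuning of the mapping weights is omitted, tanh stands in for the activations φ and ξ, and the node counts, bias handling, and function names are all illustrative.

```python
import numpy as np

def train_bls(X, Y, d=10, e=20, l=100, lam=1e-3, seed=0):
    """Minimal BLS sketch: d groups of e mapped feature nodes (Eq. (3)),
    l enhancement nodes (Eq. (4)), output weights by ridge regression
    (Eq. (7)). Sparse-autoencoder fine-tuning is omitted for brevity."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    We = [rng.normal(size=(m + 1, e)) for _ in range(d)]   # random mapping weights
    Xb = np.hstack([X, np.ones((n, 1))])                   # append bias column
    Z = np.hstack([np.tanh(Xb @ W) for W in We])           # Z^d = [Z_1, ..., Z_d]
    Wh = rng.normal(size=(Z.shape[1] + 1, l))              # enhancement weights
    Zb = np.hstack([Z, np.ones((n, 1))])
    H = np.tanh(Zb @ Wh)                                   # enhancement nodes H^l
    A = np.hstack([Z, H])                                  # A = [Z^d | H^l]
    Wop = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    return We, Wh, Wop

def predict_bls(X, We, Wh, Wop):
    """Y^BLS = [Z^d | H^l] W^op, Eq. (5)."""
    n = X.shape[0]
    Xb = np.hstack([X, np.ones((n, 1))])
    Z = np.hstack([np.tanh(Xb @ W) for W in We])
    Zb = np.hstack([Z, np.ones((n, 1))])
    H = np.tanh(Zb @ Wh)
    return np.hstack([Z, H]) @ Wop

# Tiny two-class demo with one-hot labels.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(3, 1, (50, 5))])
Y = np.vstack([np.tile([1, 0], (50, 1)), np.tile([0, 1], (50, 1))])
We, Wh, Wop = train_bls(X, Y)
pred = predict_bls(X, We, Wh, Wop).argmax(axis=1)
print((pred == Y.argmax(axis=1)).mean())
```

The rows of the prediction act as the class scores from which the initial probability maps p_1, ..., p_C are assembled; the per-sample argmax gives the initial BLS label.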

Correction to the Results of BLS Classification Based on Guided Filter
In the third step of the proposed method, the guided filter is performed to correct each probability map p_c under the guidance of the grey-scale guidance image V, obtaining the output q_c (c = 1, 2, ..., C). V is obtained from the original HSI by the first principal component of PCA. The output of the guided filter [38] is a local linear transformation of the guidance image and has a good edge-preserving characteristic; at the same time, under the guidance of the guidance image, the output image becomes more structured and less smooth than the input image. For grey-scale and high-dimensional images, the guided filter essentially has low time complexity, regardless of the kernel size and the intensity range. In this step, the filtering output is q_c, the probability map that each pixel belongs to class c. The probability q_{i,c} that pixel i belongs to class c (c = 1, 2, ..., C) can be expressed as a linear transformation of the guidance image in a window ω_k centered at pixel k, as shown in Equation (12):

q_{i,c} = a_k V_i + b_k, ∀ i ∈ ω_k (12)
Here, a_k and b_k are linear coefficients assumed to be constant in ω_k, and ω_k is a window of radius r. This local linear model guarantees that q_c has an edge only where V has an edge, because ∇q_c = a ∇V. The cost function in the window ω_k is minimized as shown in Equation (13), which not only realizes the linear model of Equation (12) but also minimizes the difference between q_c and p_c:

E(a_k, b_k) = Σ_{i∈ω_k} [ (a_k V_i + b_k − p_{i,c})² + ε a_k² ] (13)

where ε, which defines the degree of blurring of the guided filter, is a regularization parameter penalizing large a_k. Equation (13) is a linear ridge regression model and is solved by Equation (14):

a_k = ( (1/|ω|) Σ_{i∈ω_k} V_i p_{i,c} − μ_k p̄_{k,c} ) / (σ_k² + ε), b_k = p̄_{k,c} − a_k μ_k (14)
Here, μ_k and σ_k² are the mean and variance of the guidance image in ω_k, |ω| is the number of pixels in ω_k, and p̄_{k,c} is the mean of p_c in ω_k. Pixel i is involved in all the overlapping windows that cover it, so the value of q_{i,c} in Equation (12) differs when computed in different windows; averaging all the possible values gives Equations (15) and (16):

q_{i,c} = (1/|ω|) Σ_{k: i∈ω_k} (a_k V_i + b_k) (15)

ā_i = (1/|ω|) Σ_{k: i∈ω_k} a_k, b̄_i = (1/|ω|) Σ_{k: i∈ω_k} b_k (16)

The window ω_k is symmetrical, so Equation (15) can be expressed via Equation (16) as Equation (17):

q_{i,c} = ā_i V_i + b̄_i (17)

where ā_i and b̄_i are the mean coefficients of all windows covering pixel i. In fact, the output can be rewritten as a weighted sum of the input image p_c, with the kernel weight explicitly expressed by Equation (18):

W_{ij}(V) = (1/|ω|²) Σ_{k: (i,j)∈ω_k} [ 1 + (V_i − μ_k)(V_j − μ_k) / (σ_k² + ε) ] (18)

So, Equation (17) can be changed to Equation (19):

q_{i,c} = Σ_j W_{ij}(V) p_{j,c} (19)
After the initial probability maps are corrected by the guided filter, pixel i has C probabilities q_{i,1}, ..., q_{i,C}. We take the index of the highest of these C probabilities as the label of pixel i, namely:

y_i^GuF = argmax_{c = 1, 2, ..., C} q_{i,c} (20)

After the guided filter corrects the initial probability maps, the labels of all labeled samples of the HSI are Y^GuF = {y_1^GuF, y_2^GuF, ..., y_n^GuF}. The superscript "GuF" represents the guided filtering operation. In summary, the algorithmic steps of HSI classification based on SSBLS are summarized in Algorithm 1.
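Equations (12)-(17) admit a fast box-filter implementation, sketched below under stated assumptions: this is an illustrative reimplementation of the standard guided filter (He et al.), not the authors' code, and the window size, eps value, and toy guidance image are invented for the demo.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(V, p, r=3, eps=1e-3):
    """Guided filter sketch: V is the grey-scale guidance image, p the
    probability map to correct. Window means are box filters of size 2r+1,
    following Eqs. (12)-(17)."""
    box = lambda img: uniform_filter(img, size=2 * r + 1, mode='reflect')
    mu, pbar = box(V), box(p)
    var = box(V * V) - mu * mu                  # sigma_k^2 per window
    a = (box(V * p) - mu * pbar) / (var + eps)  # Eq. (14)
    b = pbar - a * mu
    return box(a) * V + box(b)                  # Eq. (17): q = a_bar*V + b_bar

# Demo: correct one noisy probability map under a two-region guidance image.
rng = np.random.default_rng(0)
V = np.zeros((32, 32)); V[:, 16:] = 1.0       # guidance with a sharp edge
p = V + rng.normal(0, 0.3, V.shape)           # noisy initial probability map
q = guided_filter(V, p, r=3, eps=1e-2)
labels = (q > 0.5).astype(int)
print((labels == V).mean())
```

The filtered map follows the edges of the guidance image while averaging out the noise in flat regions, which is how the third step of SSBLS rectifies pixels misclassified by BLS.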

Input:
Original HSI dataset X; S, the size of the Gaussian filter window; σ, the standard deviation of the Gaussian function; N, the number of training samples; M, the number of mapped feature windows; F, the number of mapped feature nodes per window; E, the number of enhancement nodes; r, the radius of the guided filter window ω_k; ε, the penalty parameter of the guided filter.

2.
Select the optimal parameters S and σ, perform Gaussian filtering to smooth each spectral band of the original HSI data X, and obtain Y^GaF.

10.
Based on the original HSI, generate the grey-level guidance map V by the first principal component of PCA. According to Equations (18) and (19) and the optimal parameters r and ε, correct each initial probability map p_c, obtaining the final probability maps q_c (c = 1, 2, ..., C).

11.
According to Equation (20) and the maximum probability principle, obtain Y^GuF, the classification results of all samples; the test sample labels are obtained after removing the training samples.
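Step 11 reduces to a per-pixel argmax over the C corrected probability maps, as in Equation (20). A minimal sketch with random stand-in maps (the shapes and values are purely illustrative, not real SSBLS outputs):

```python
import numpy as np

# Stack the C corrected probability maps q_1..q_C and take the per-pixel
# argmax as the final class label (Eq. (20)); labels are numbered 1..C.
rng = np.random.default_rng(0)
C, H, W = 4, 5, 5
q = rng.random((C, H, W))       # stand-in for the corrected maps
labels = q.argmax(axis=0) + 1   # final label map Y^GuF
print(labels.shape)
```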

Experiment Results
We assess the proposed SSBLS through extensive experiments. All experiments are performed in MATLAB R2014a on a computer with a 2.90 GHz Intel Core i7-7500U central processing unit (CPU), 32 GB of memory, and Windows 10.

Hyperspectral Image Dataset
The performance of the SSBLS method and the other comparison methods is evaluated on three public hyperspectral datasets: the Indian Pines, Salinas, and Pavia University datasets (available at http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes, accessed on 04-11-2018).
The Indian Pines dataset was acquired by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) sensor flying over the Indian Pines test site in north-western Indiana. This scene has 21,025 pixels and 200 bands, with wavelengths from 0.4 to 2.5 μm. The image is two-thirds agriculture and one-third forest or other perennial natural vegetation, and contains two major two-lane highways, a railway line, some low-density housing, other built structures, and pathways. It has 16 land-cover classes. In our experiments, we selected the nine classes with more than 400 samples each. The original hyperspectral image and ground truth are given in Figure 2.
The Salinas scene was obtained by the 224-band AVIRIS sensor over the Salinas Valley, California, USA, with a high spatial resolution of 3.7 m. The dataset has 512 × 217 pixels and 204 bands after the 20 water absorption bands were discarded. We made use of the 16 classes of samples in the scene. The original hyperspectral image and ground truth are given in Figure 3. The Pavia University dataset was collected by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor over Pavia in northern Italy. The image has 610 × 340 pixels with 103 bands. Pixels containing no information were removed. Nine sample categories were used in our experiments. Figure 4 shows the original hyperspectral image, the category names with labeled samples, and the ground truth.

Parameters Analysis
After analyzing SSBLS, it was found that the adjustable parameters are the size of the Gaussian filter window (S), the standard deviation of the Gaussian function (σ), the number of mapped feature windows in BLS (M), the number of mapped feature nodes per window in BLS (F), the number of enhancement nodes (E), the radius of the guided filter window ω_k (r), and the penalty parameter of the guided filter (ε). These parameters are analyzed with the overall accuracy (OA) used to evaluate the performance of SSBLS. The influence of S and σ is shown in Figure 5: as S and σ increased, the OAs gradually increased and then gradually decreased after reaching a peak. If S is too small, a larger-sized target is divided into multiple parts distributed over different Gaussian filter windows; if S is too large, the window contains multiple small-sized targets; both cause misclassification. When σ is too small, the weights change drastically from the center to the boundary; when σ becomes larger, the weights change smoothly from the center to the boundary and are relatively evenly distributed within the window, which approaches a mean filter. Therefore, the optimal values of S and σ differ across HSI datasets. In the Indian Pines dataset, the OA is largest when S = 18 and σ = 7, so S and σ were 18 and 7, respectively, in the subsequent experiments. In the Salinas dataset, the performance of SSBLS was best when S = 24 and σ = 7, so S and σ were taken as 24 and 7, respectively, in the later experiments. Similarly, the best values of S and σ were 21 and 4, respectively, in the Pavia University dataset. From Figure 6, we can see that as M and F became larger, the OAs of SSBLS gradually grew; when M and F were too small, less feature information was extracted and the mean OA of ten experiments was lower.
When M and F were too large, although the performance of SSBLS was improved, the computation and the consumed time also rose. Therefore, in the subsequent experiments, the best values of M and F were 6 and 34 respectively in the Indian Pines dataset, 12 and 36 in the Salinas dataset, and 8 and 26 in the Pavia University dataset.

Influence of Parameter E on OA
In the three datasets, S, σ, M, and F were set to the optimal values obtained from the above experiments, and r and ε were 2 and 10^-3, respectively. E was chosen from {500, 550, 600, ..., 1200} in the Indian Pines dataset and from {50, 100, 150, ..., 800} in the Salinas and Pavia University datasets. In the three datasets, the average OAs of ten experiments trended upward with the increase of E, as shown in Figure 7. As E grew, the features extracted by BLS increased, but the computation and consumed time also grew. Therefore, the number of enhancement nodes was set to 1050 in the Indian Pines dataset and 700 in both the Salinas and Pavia University datasets.

Influence of Parameter r on OA
The experiments were carried out on the three datasets. The values of S, σ, M, F, and E were the optimal values analyzed previously, ε was 10^-3, and r was chosen from {1, 2, 3, ..., 9}. Figure 8 indicates that as r grew, the average OAs of ten experiments first increased and then decreased. In the Indian Pines dataset, the mean OA was largest when r = 3, so r was set to 3. In the Salinas dataset, the performance of SSBLS was best when r = 5, so r was set to 5. On the Pavia University dataset, the average OA was greatest when r = 3, so r was set to 3. In the Indian Pines and Salinas datasets, as ε increased, the mean OAs first increased and then decreased, as shown in Figure 9. In the Indian Pines dataset, the average OA was largest when ε = 10^-3, so ε was 10^-3 in the subsequent comparison experiments. On the Salinas dataset, the performance of SSBLS was best when ε = 10^-1, so the optimal value of ε was 10^-1. In the Pavia University dataset, the classification effect was best when ε = 10^-7, so the best value of ε was 10^-7. Figure 9. The relationship between OA and ε in the three datasets.

Ablation Studies on SSBLS
We conducted several ablation experiments to investigate the behavior of SSBLS on the three datasets. In these ablation experiments, we randomly took 200 labeled samples from each class as training samples and the remaining labeled samples as test samples. We utilized OA, average accuracy (AA), and the kappa coefficient (Kappa) to measure the performance of the different methods. First, we used only BLS to classify the original hyperspectral data. On the Salinas dataset the effect was good: the OA reached 91.98%. However, the results were unsatisfactory on the Indian Pines and Pavia University datasets.
Second, we disentangled the influence of the Gaussian filter on the classification results. We used the Gaussian filter to smooth the original HSI and then used BLS to classify, namely BLS based on the Gaussian filter (GBLS). The OA was about 20% higher than that of BLS in the Indian Pines dataset, about 7% higher in the Salinas dataset, and about 17% higher in the Pavia University dataset. This shows that the Gaussian filter helps to improve the classification accuracy.
Next, we used BLS to classify the original hyperspectral data and then applied the guided filter to rectify the pixels misclassified by BLS. The results in terms of OA, AA, and Kappa were also better than those of BLS, showing that the guided filter also plays a role in improving classification performance.
Finally, we used the proposed method for HSI classification. This method first uses the Gaussian filter to smooth the original spectral features based on the spatial information of the HSI, then applies BLS classification, and finally applies the guided filter to correct the pixels misclassified by BLS. The results are the best of the four methods, showing that both the Gaussian filter and the guided filter contribute to the improvement of classification performance.
From the above analysis, we know that the combination of Gaussian filtering and BLS greatly improves classification performance, especially on the Indian Pines and Pavia University datasets. Although the classification accuracy of GBLS was already relatively high, adding the guided filter to GBLS improved it further, indicating that the guided filter also helps improve the classification accuracy.

Experimental Comparison
In order to demonstrate the advantages of SSBLS on the three real datasets, we compare SSBLS with SVM [65], HiFi-We [42], SSG [66], spectral-spatial hyperspectral image classification with edge-preserving filtering (EPF) [41], support vector machine based on the Gaussian filter (GSVM), feature extraction of hyperspectral images with image fusion and recursive filtering (IFRF) [67], LPP_LBP_BLS [19], BLS [50], and GBLS. All methods take the original HSI data as input, and the experimental parameters are set to their optimal values. In each experiment, 200 labeled samples are randomly selected from each class as the training set, and the remaining labeled samples form the test set. We report the individual classification accuracy (ICA), OA, AA, Kappa, overall consumed time (t), and test time (tt). All results are the mean values of ten experiments, as shown in Tables 3-5, with the highest values shown in bold. (1) Compared with the conventional SVM classifier, the effects of BLS approximate those of SVM on the Indian Pines and Salinas datasets. However, when BLS and SVM use the HSI data filtered by the Gaussian filter, the performance of GBLS is obviously better than that of GSVM. In the Pavia University dataset, the OA of BLS was 16.56% lower than that of SVM; after filtering the Pavia University data with the Gaussian filter, the OA of GBLS was about 3% higher than that of GSVM. SSBLS had the best performance. From Tables 3-5, the experimental results illustrate that the combination of the Gaussian filter and BLS contributes to improving the classification accuracy.
(2) HiFi-We first extracts different spatial context information of the samples by HGF, which can generate diverse sample sets. As the hierarchy level increases, the pixel spectral features tend to be smooth and the pixel spatial features are enhanced; based on the output of HGF, a series of classifiers can be obtained. Second, the matrix of spectral angle distance (mSAD) is defined to measure the diversity among training samples in each hierarchy. Finally, an ensemble strategy combines the obtained individual classifiers and the mSAD. This method achieved good performance, but its OA, AA, and Kappa were not as good as those of SSBLS. The main reason is that SSBLS sufficiently exploits the spectral-spatial joint features in the three operations of the Gaussian filter, BLS, and the guided filter, which improves its accuracy.
(3) SSG assigns labels to unlabeled samples based on a graph method; integrates the spatial information, the spectral information, and the cross-information between them through a complete composite kernel; forms a huge kernel matrix of labeled and unlabeled samples; and finally applies the Nyström method for classification. The computational complexity of the huge kernel matrix is large, which increases the consumed time of classification. In contrast, SSBLS not only has a higher OA than SSG but also takes less time.
(4) The EPF method adopts SVM for classification to construct the initial probability maps and then utilizes the bilateral filter or the guided filter to correct the initial probability maps, improving the final classification accuracy. Its results were very good on the three real hyperspectral datasets. However, SSBLS performed better than EPF, mainly because SSBLS first utilizes the Gaussian filter to extract the inherent spectral features based on spatial information and then applies the guided filter to rectify the pixels misclassified by BLS based on the spatial context information.
(5) IFRF divides the HSI bands into multiple subsets of neighboring hyperspectral bands, applies averaging to fuse each subset, and finally uses transform-domain recursive filtering to extract features from each fused subset for classification with SVM. This method works very well, but the performance of SSBLS was better: specifically, the mean OA of SSBLS was 1.03% higher than that of IFRF in the Indian Pines dataset, 0.24% higher in the Salinas dataset, and 1.5% higher in the Pavia University dataset. There are three reasons. First, when SSBLS uses the Gaussian filter to smooth the HSI spectral features based on the spatial information, the weight of each neighboring pixel in the Gaussian filter window decreases as its distance from the center pixel increases, and the Gaussian filtering removes noise. Second, in SSBLS, the integration of the Gaussian filter and BLS contributes to extracting sparse and compact spectral features fused with the spatial features, achieving outstanding classification accuracy. Third, SSBLS applies the guided filter based on the spatial context information to rectify the misclassified hyperspectral pixels, improving the final classification accuracy.
(6) The LPP_LBP_BLS method uses LPP to reduce the dimensionality of HSI in the spectral domain, then utilizes LBP to extract spatial features in the spatial domain, and finally applies BLS for classification. The performance of LPP_LBP_BLS was very good, but it has two disadvantages. First, the LBP operation greatly increases the number of spectral-spatial features to be processed. For example, when the number of spectral bands of each pixel after dimensionality reduction was 50, the number of spectral-spatial features of each pixel after the LBP operation was 2950. Second, LPP_LBP_BLS worked very well on the Indian Pines and Salinas datasets, but its mean OA only reached 97.14% on the Pavia University dataset, which indicates that this method is selective about data and not robust enough. The average OAs of SSBLS on the three datasets are all above 99.49%. On the Indian Pines dataset, the mean OA is 99.83%, and the highest OA we obtained during the experiments is 99.97%. On the Salinas dataset, the average OA is 99.96%, and the highest OA sometimes reaches 100%. This shows that SSBLS is more robust, especially on the Pavia University dataset. As the parameters change, the OAs change regularly, as shown in Figures 5c and 6c.
(7) Compared with BLS and GBLS: it can be seen in Tables 3-5 that BLS had an unsatisfactory classification effect using only the original HSI data; however, when GBLS adopted the spectral features smoothed by the Gaussian filter, its OA improved greatly. This indicates that the combination of Gaussian filtering and BLS contributed to the improvement of classification accuracy. The classification accuracy of SSBLS was higher than those of BLS and GBLS because SSBLS applied the guided filter based on the spatial contextual information to rectify the misclassified pixels, further improving the classification accuracy.
In summary, on the three datasets, the OA, AA, and Kappa of SSBLS were better than those of the nine other comparison methods, as can be clearly seen from Figures 10-12. From Tables 3-5, it can be seen that the execution time of SSBLS was less than that of SVM, HiFi-We, SSG, EPF, GSVM, IFRF, and LPP_LBP_BLS, and the pretreatment and training time of SSBLS was less than that of HiFi-We, SSG, EPF, IFRF, and LPP_LBP_BLS.

Discussion
The experimental results on the three public datasets indicate that SSBLS had the best performance among all the compared methods in terms of the three measurements (OA, AA, and Kappa). There were three main reasons for this. Firstly, the combination of the Gaussian filter and BLS contributed to the improvement of SSBLS classification accuracy. The Gaussian filter could effectively fuse the spectral and spatial features of HSI to extract the inherent spectral characteristics of each pixel. BLS expressed the smoothed spectral information as sparse and compact features in the feature-mapping process, using random weight matrixes fine-tuned by the sparse autoencoder, which also improved the classification accuracy. It can be clearly seen from Tables 3-5 that the performances of GBLS and SSBLS, which use the HSI data smoothed by the Gaussian filter, were greatly improved. Secondly, SSBLS takes full advantage of the spectral-spatial joint features to improve its performance. The Gaussian filter first smooths each band in the spectral domain based on the spatial information to achieve the first fusion of spectral and spatial information. The guided filter then corrects the results of BLS classification under the guidance of the grey-scale guidance image, which is obtained as the first principal component of the original HSI spectral information. These operations join the spectral features and spatial information together sufficiently. Finally, SSBLS applies the guided filter to rectify the misclassified HSI pixels to further enhance its classification accuracy.
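The grey-scale guidance image mentioned above can be computed as the first principal component of the spectral cube. The following is a minimal sketch of that step; the centering, SVD-based PCA, and rescaling to [0, 1] are our assumptions about a standard PCA projection, not the paper's exact preprocessing.

```python
import numpy as np

def first_pc_image(cube):
    """Project each pixel's spectrum onto the first principal component,
    yielding a single grey-scale guidance image of shape (H, W)."""
    h, w, b = cube.shape
    X = cube.reshape(-1, b).astype(float)
    X -= X.mean(axis=0)                        # center the spectra
    # the first right singular vector is the first principal axis
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    pc1 = X @ vt[0]
    pc1 = (pc1 - pc1.min()) / (np.ptp(pc1) + 1e-12)   # rescale to [0, 1]
    return pc1.reshape(h, w)

# toy cube: 6x6 pixels, 10 spectral bands
rng = np.random.default_rng(1)
cube = rng.random((6, 6, 10))
guide = first_pc_image(cube)
print(guide.shape)   # (6, 6)
```

The first principal component captures the direction of greatest spectral variance, so the resulting grey-scale image retains the dominant object boundaries that the guided filter needs.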

Conclusions
To take full advantage of the spectral-spatial joint features to improve HSI classification accuracy, we proposed the SSBLS method in this paper. The method is divided into three parts. Firstly, the Gaussian filter smooths each spectral band based on the spatial information of HSI to remove noise in the spectral domain and fuse the spectral and spatial information. Secondly, the optimal BLS model is obtained by training the BLS with the spectral features smoothed by the Gaussian filter, and the test sample labels are computed to construct the initial probability maps. Finally, the guided filter is applied to rectify the pixels misclassified by BLS based on the HSI spatial context information to improve the classification accuracy. The results of experiments on the three public datasets show that the proposed method outperforms the other methods (SVM, HiFi-We, SSG, EPF, GSVM, IFRF, LPP_LBP_BLS, BLS, and GBLS) in terms of OA, AA, and Kappa.
The proposed method is a supervised learning classification method that requires many labeled samples. However, the number of labeled HSI samples is very limited, and labeling unlabeled samples is costly. Therefore, the next step is to study a semi-supervised learning classification method to improve the semi-supervised classification accuracy of HSI.
Author Contributions: All of the authors made significant contributions to the work. G.Z. and Y.C. conceived and designed the experiments; G.Z., X.W., Y.K., and Y.C. performed the experiments; G.Z., X.W., Y.K., and Y.C. analyzed the data; G.Z. wrote the original paper; X.W., Y.K., and Y.C. reviewed and edited the paper. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement: Publicly available datasets were analyzed in this study. The data can be found here: http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes.

Conflicts of Interest:
The authors declare no conflict of interest.