Dimensionality Reduction of Hyperspectral Images Based on Improved Spatial–Spectral Weight Manifold Embedding

Due to the spectral complexity and high dimensionality of hyperspectral images (HSIs), the processing of HSIs is susceptible to the curse of dimensionality, and classification accuracy against the ground truth is often unsatisfactory. To overcome the curse of dimensionality and improve classification accuracy, an improved spatial–spectral weight manifold embedding (ISS-WME) algorithm is proposed in this study; it exploits both the intrinsic manifold structure of hyperspectral data and the local neighbor relationships among samples. The manifold structure is constructed from a structural weight matrix and a distance weight matrix. The structural weight matrix is composed of within-class and between-class coefficient representation matrices, obtained using the collaborative representation method, while the distance weight matrix integrates the spatial and spectral information of HSIs. ISS-WME thus describes the whole structure of the data through a weight matrix that combines the within-class and between-class matrices with the spatial–spectral information of HSIs, and it keeps the nearest neighbor samples of each data point unchanged when embedding into the low-dimensional space. To verify the classification performance of ISS-WME, experiments were conducted on three classical data sets: Indian Pines, Pavia University, and Salinas scene. Six dimensionality reduction (DR) methods were used for comparison, with different classifiers such as k-nearest neighbor (KNN) and support vector machine (SVM). The experimental results show that ISS-WME represents the HSI structure better than the other methods and effectively improves the classification accuracy of HSIs.


Introduction
With the development of science and technology, hyperspectral images (HSIs) have become a main research direction in the field of modern remote sensing. HSIs have a large number of spectral bands, which provide detailed spectral information about objects [1,2]. However, due to the strong correlation between adjacent spectral bands, HSIs contain much redundant information, which takes up large storage space and requires considerable computation time. Moreover, when classifying HSIs, classification accuracy is subject to the curse of dimensionality [3]. To improve classification accuracy, dimensionality reduction (DR) is a necessary and feasible preprocessing step for HSIs [4,5].
Abnormal points easily appear if only the structural distribution between data points is considered, while sparseness easily arises if only the nearest neighbor relationships are kept unchanged during the projection transformation. To overcome both the abnormal-point and sparseness problems, this paper takes the structure and the neighbor sample relationships into account simultaneously. Finally, the model can be solved efficiently by computing the smallest eigenvalues of a generalized eigenvalue problem to obtain a projection matrix. The main contributions of the proposed algorithm are as follows:

1.
A new weight matrix is constructed to describe the structure between samples, in which the product of the spatial-spectral distance weight matrix and the structure weight matrix is taken as the new data weight matrix. Compared with previous weight matrices, which consider only spectral distance or only spatial distance, the new weight matrix integrates the spatial-spectral information and the structural characteristics of the data.
2.
The model not only keeps the manifold structure invariant, but also preserves the nearest neighbor relationships of the samples, when the high-dimensional data are projected into the low-dimensional space.
This paper is arranged as follows. Section 2 briefly summarizes the LLE and LE methods and reviews the related works of these models. Section 3 provides the detailed description and the solving process of ISS-WME. Section 4 compares the performance of the proposed method and other DR methods with respect to three public data sets. Finally, the conclusions and perspectives are provided in Section 5.

Local Linear Embedding
Given the data set X = [x_1, ..., x_N] ∈ R^{D×N}, where x_i ∈ R^D denotes the i-th sample with D-dimensional features and N is the number of samples, we assume that the D-dimensional sample x_i is projected into a d-dimensional space M, with d ≪ D. The low-dimensional coordinates of the transformed data are Y = [y_1, ..., y_N] ∈ R^{d×N}, where y_i ∈ R^d. The core of the LLE algorithm is to keep x_i and its local neighbor samples unchanged after DR; a point and its local neighbor points are regarded as belonging to the same class. Under the principle of minimizing the reconstruction error, the sample x_i can be linearly represented by these neighbor samples. By reconstructing the weight matrix, the original space is connected with the low-dimensional embedding space; the reconstruction weight matrix between each sample and its nearest neighbors is kept unchanged, and the embedding in the low-dimensional space is obtained by minimizing the reconstruction errors. Therefore, the weight coefficient matrix describing the relationship between x_i and its local neighbors can be obtained by solving the following optimization problem [21]:

\min_{W} \sum_{i=1}^{N} \Big\| x_i - \sum_{j=1}^{k} w_{ij} x_j \Big\|^2, \quad \text{s.t.} \ \sum_{j=1}^{k} w_{ij} = 1   (1)

In Equation (1), x_j (j = 1, ..., k) is one of the k samples closest to x_i (i = 1, ..., N), and w_{ij} stands for the weight of the neighbor relationship between samples x_i and x_j; if they are not neighbors, then w_{ij} = 0. When projecting the D-dimensional samples into the d-dimensional space, it is desirable to maintain the same linear relationship:

\min_{Y} \sum_{i=1}^{N} \Big\| y_i - \sum_{j=1}^{k} w_{ij} y_j \Big\|^2, \quad \text{s.t.} \ YY^T = I   (2)

where I is the identity matrix and y_i = YI_i, with I_i the i-th column of I.
Letting M = (I − W)^T (I − W), Equation (2) can be rewritten as the following problem:

\min_{Y} \mathrm{tr}(Y M Y^T), \quad \text{s.t.} \ YY^T = I   (3)

Using the method of Lagrangian multipliers, Equation (3) can be easily solved by the generalized eigenvalue decomposition approach:

M y = \lambda y   (4)

We then take the eigenvectors corresponding to the d smallest non-zero eigenvalues, and the low-dimensional embedding matrix can be represented as Y = [y_1, ..., y_d]^T.
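The LLE pipeline above (nearest neighbors, sum-to-one reconstruction weights, then the eigenvectors of M with the smallest non-zero eigenvalues) can be sketched in a few lines of numpy. This is a minimal illustrative sketch, not the authors' implementation; the function name and the regularization term are assumptions added for numerical stability.

```python
import numpy as np

def lle_embedding(X, k=5, d=2, reg=1e-3):
    """Minimal LLE sketch: X is D x N, returns a d x N embedding Y."""
    D, N = X.shape
    # 1. find the k nearest neighbours of each sample (Euclidean distance)
    dist = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
    np.fill_diagonal(dist, np.inf)                 # exclude the point itself
    nbrs = np.argsort(dist, axis=1)[:, :k]

    # 2. reconstruction weights: minimise ||x_i - sum_j w_ij x_j||^2, sum_j w_ij = 1
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[:, nbrs[i]] - X[:, [i]]              # centred neighbours, D x k
        C = Z.T @ Z
        C += reg * np.trace(C) * np.eye(k)         # regularise the local Gram matrix
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()                # enforce the sum-to-one constraint

    # 3. embedding: eigenvectors of M = (I - W)^T (I - W) for the smallest
    #    non-zero eigenvalues (Equations (3) and (4))
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, 1:d + 1].T                   # skip the constant eigenvector
```

The constant eigenvector (eigenvalue 0) is discarded, matching the "smallest non-zero eigenvalues" rule in the text.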
The LLE algorithm [22] can successfully maintain the local neighbor geometric structure and has a fast calculation speed. However, as the data dimensionality and data size increase, it suffers from severe sparsity, poor robustness to noise, and other problems.

Laplacian Eigenmaps
Given the data set X = [x_1, ..., x_N] ∈ R^{D×N}, the KNN method is used to find the k nearest neighbors of each sample x_i, forming an overall data structure matrix, where x_j is the j-th nearest sample of x_i. The weight between neighboring samples is given by the heat kernel:

h_{ij} = \exp\big(-\|x_i - x_j\|^2 / t\big)

Let Y = [y_1, ..., y_N] ∈ R^{d×N} denote the low-dimensional embedding of the data set X, with Y = P^T X. Then Y can be found by solving the following optimization problem [23]:

\min_{Y} \sum_{i,j} \|y_i - y_j\|^2 h_{ij} = \min_{Y} \mathrm{tr}(Y L Y^T), \quad \text{s.t.} \ Y D Y^T = I   (5)

The constraint in Equation (5) ensures that the problem has a non-trivial solution, and it can be solved by the generalized eigenvalue decomposition approach:

X L X^T p = \lambda X D X^T p   (6)

where D_{ii} = \sum_j h_{ij} is a diagonal matrix, L = D − H is the Laplacian matrix, and H is the weight matrix composed of h_{ij}. The embedded samples in the d-dimensional space are constructed from the eigenvectors corresponding to the d minimum eigenvalues. The LE algorithm [24] introduces graph theory to achieve DR. Nevertheless, because its weight matrix is inaccurate for complex hyperspectral data, the traditional LE algorithm cannot accurately describe the data structure, so the data in the low-dimensional space cannot fully express the original data features.
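A minimal numpy sketch of Laplacian Eigenmaps follows: heat-kernel weights on a kNN graph, then the generalized problem L y = λ D y solved through the symmetric normalization D^{-1/2} L D^{-1/2}. Function name, the kernel bandwidth t, and the graph symmetrization step are assumptions of this sketch.

```python
import numpy as np

def laplacian_eigenmaps(X, k=5, d=2, t=1.0):
    """Minimal LE sketch: heat-kernel weights on a kNN graph; X is D x N."""
    N = X.shape[1]
    sqdist = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)
    np.fill_diagonal(sqdist, np.inf)
    nbrs = np.argsort(sqdist, axis=1)[:, :k]

    # heat-kernel weights h_ij = exp(-||x_i - x_j||^2 / t) on the kNN graph
    H = np.zeros((N, N))
    for i in range(N):
        H[i, nbrs[i]] = np.exp(-sqdist[i, nbrs[i]] / t)
    H = np.maximum(H, H.T)                       # symmetrise the graph

    deg = H.sum(axis=1)                          # D_ii = sum_j h_ij
    L = np.diag(deg) - H                         # graph Laplacian L = D - H
    # solve L y = lambda D y via the symmetric form D^{-1/2} L D^{-1/2}
    Dinv = np.diag(1.0 / np.sqrt(deg))
    eigvals, eigvecs = np.linalg.eigh(Dinv @ L @ Dinv)
    return (Dinv @ eigvecs[:, 1:d + 1]).T        # skip the trivial eigenvector
```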

Improved Spatial-Spectral Weight Manifold Embedding
To address the sparsity, inexact weights, and other problems of the LLE and LE [25] algorithms, an improved spatial-spectral weight manifold embedding (ISS-WME) algorithm is proposed in this paper. It combines spatial-spectral information with the high-dimensional manifold structure to construct a weight matrix corresponding to the HSI structure. Considering the multi-manifold structure of HSIs, combining this structure with the nearest neighbor samples keeps the data neighbor relationships invariant, without breaking the original structure when embedding into the low-dimensional space. In this regard, Section 3.1 analyzes how to construct a weight matrix that is more consistent with the sample structure, and Section 3.2 describes the final optimization objective function.

Spatial-Spectral Weight Setting
Through experimental study, researchers have found that classification accuracy can be improved by incorporating spatial information into the analysis of HSIs. Hence, the ISS-WME method is based on both spatial and spectral information, using variants of the Gaussian function to represent the spectral and spatial distances, respectively. Given the HSI data set X, each sample is written as x_i = [x^f_i, x^p_i], where x^f_i is the spectral reflectance of a pixel and x^p_i its spatial coordinates. For each pair of samples x_i and x_j (i, j = 1, ..., N), the spectral distance matrix and spatial distance matrix are represented, respectively, as:

D^f_{ij} = \exp\big(-\|x^f_i - x^f_j\|^2 / \sigma_f^2\big)   (7)

D^p_{ij} = \exp\big(-\|x^p_i - x^p_j\|^2 / \sigma_p^2\big)   (8)

Therefore, the spatial-spectral distance weight matrix is their elementwise product:

W^D_{ij} = D^f_{ij} \cdot D^p_{ij}

In an HSI, adjacent pixels in the same homogeneous region usually belong to the same class, so any sample in the same class can be linearly represented by homogeneous neighbor samples. Similarly, the overall data sample centers can be represented by the centers of the different classes [26]. Hence, the HSI should maintain this characteristic after DR. We obtain the within-class representation coefficient matrix by minimizing the error of a collaborative representation model; to prevent overfitting, a regularization constraint is added to the optimization model. The objective function of the within-class collaborative representation model is:

\min_{\theta^w_k} \|x_i - X_k \theta^w_k\|_2^2 + \lambda \|\theta^w_k - \bar{\theta}^w_k\|_2^2   (9)

In Equation (9), l_k is the number of samples in the k-th class, τ_k is the sample set outside the k-th class, and X_k is the set of samples from the same class as x_i, excluding x_i. θ^w_k is the within-class linear representation coefficient matrix of the k-th class, the within-class mean coefficient matrix is \bar{\theta}^w_k = [1/(n−1), ..., 1/(n−1)]^T ∈ R^{(n−1)×1}, and θ^w denotes all the within-class linear representation coefficients θ^w_k (k = 1, ..., c).
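The Gaussian spectral and spatial weights of Equations (7) and (8), and their elementwise product, can be sketched as follows. This is an illustrative sketch under the assumption that the combined weight is the elementwise product; the function and argument names are not from the paper.

```python
import numpy as np

def spatial_spectral_weights(Xf, Xp, sigma_f=0.1, sigma_p=100.0):
    """Gaussian spectral weights times Gaussian spatial weights (elementwise).
    Xf: B x N spectral vectors; Xp: 2 x N pixel coordinates."""
    def gauss(Z, sigma):
        # pairwise squared distances, then Gaussian kernel exp(-d^2 / sigma^2)
        sq = np.sum((Z[:, :, None] - Z[:, None, :]) ** 2, axis=0)
        return np.exp(-sq / sigma ** 2)
    Wf = gauss(Xf, sigma_f)   # spectral distance weights, Eq. (7)
    Wp = gauss(Xp, sigma_p)   # spatial distance weights, Eq. (8)
    return Wf * Wp            # combined spatial-spectral weight matrix W^D
```

With the paper's setting (σ_f, σ_p) = (0.1, 100), pairs that are close both spectrally and spatially receive weights near 1, and distant pairs decay toward 0.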
Likewise, the objective function of the between-class representation coefficient model is:

\min_{\theta^b_k} \|\bar{x} - \bar{X}_k \theta^b_k\|_2^2 + \lambda \|\theta^b_k - \bar{\theta}^b_k\|_2^2   (10)

In Equation (10), \bar{x} is the mean of all the samples and \bar{X}_k is the set of class-center samples. θ^b_k denotes the between-class representation coefficient matrix of the k-th class, the between-class mean coefficient matrix is \bar{\theta}^b_k = [1/(n−1), ..., 1/(n−1)]^T ∈ R^{(n−1)×1}, and θ^b denotes all the between-class representation coefficients θ^b_k (k = 1, ..., c). The within-class representation matrix θ^w_k is obtained by minimizing Equation (9), i.e., setting the derivative of the objective function with respect to the within-class representation coefficients to zero:

-X_k^T (x_i - X_k \theta^w_k) + \lambda (\theta^w_k - \bar{\theta}^w_k) = 0   (11)

Therefore, the within-class coefficient matrix is:

\theta^w_k = (X_k^T X_k + \lambda I)^{-1} (X_k^T x_i + \lambda \bar{\theta}^w_k)   (12)

In the same way, setting the derivative with respect to the between-class representation coefficients to zero gives the between-class coefficient matrix:

\theta^b_k = (\bar{X}_k^T \bar{X}_k + \lambda I)^{-1} (\bar{X}_k^T \bar{x} + \lambda \bar{\theta}^b_k)   (13)
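The closed-form ridge solution of Equation (12) is a one-line linear solve. The sketch below illustrates the within-class case; the between-class case of Equation (13) is identical with the class-center samples and the overall mean substituted. Function name and the regularization value are assumptions.

```python
import numpy as np

def collab_coeffs(x_i, X_k, lam=0.01):
    """Closed-form within-class collaborative representation, Eq. (12):
    theta = (X_k^T X_k + lam I)^{-1} (X_k^T x_i + lam theta_bar).
    X_k: D x (n-1) same-class samples, excluding x_i."""
    n1 = X_k.shape[1]
    theta_bar = np.full(n1, 1.0 / n1)          # mean coefficient vector
    A = X_k.T @ X_k + lam * np.eye(n1)
    b = X_k.T @ x_i + lam * theta_bar
    return np.linalg.solve(A, b)
```

Solving the regularized normal equations this way avoids explicitly forming the matrix inverse.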

ISS-WME Model
Given the HSI data set X = [x_1, ..., x_N], x_i ∈ R^D, we assume a projection matrix P ∈ R^{D×d} that projects the data X into the low-dimensional space. Y = [y_1, ..., y_N], y_i ∈ R^d, represents the samples in the low-dimensional space, with Y = P^T X. As proposed by Wu [15], both distance and structural factors are taken into account in this paper. We regard the spatial-spectral matrix as the distance weight W^D_{ij} and the coefficient matrices as the structure weight W^S_{ij}; together, W^D_{ij} and W^S_{ij} constitute the new weight matrix between samples:

W_{ij} = W^D_{ij} \cdot W^S_{ij}, \quad W^S_{ij} = \beta\,\theta^w_{ij} + (1-\beta)\,\theta^b_{ij}   (14)

where β controls the proportions of the within-class and between-class matrices in the structure weight. Furthermore, mapping the high-dimensional data to the low-dimensional space should not only keep the local manifold structure unchanged, but also maintain the local neighbor relationships. Introducing the weight of Equation (14) to increase the robustness of the model, the improved weight manifold embedding optimization problem is:

\min_{Y} \sum_{i,j} \|y_i - y_j\|^2 G_{ij} + \alpha \sum_{i} \Big\| y_i - \sum_{j} W_{ij} y_j \Big\|^2   (15)

where α is a compromise parameter, W_{ij} is the spatial-spectral weight matrix of Equation (14), and G_{ij} is the weight matrix representing the nearest neighbor relationship: if x_i and x_j are neighbors, G_{ij} = d_G(i, j), the geodesic distance; otherwise, G_{ij} = 0. According to Equations (2) and (5), optimization problem (15) is equivalent to:

\min_{Y} \mathrm{tr}\big(Y (L + \alpha M) Y^T\big)   (16)

where M = (I − W)^T (I − W) and L is the Laplacian matrix of G. Letting B = L + αM and Y = P^T X, the objective function becomes the following optimization problem:

\min_{P} \mathrm{tr}(P^T X B X^T P), \quad \text{s.t.} \ P^T X D X^T P = I   (17)

With the method of Lagrange multipliers, the optimization problem reduces to the generalized eigenvalue problem:

X B X^T p = \lambda X D X^T p   (18)

where p_d is the generalized eigenvector of Equation (18) corresponding to the eigenvalues λ_1 ≤ ... ≤ λ_d.
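Putting the pieces together, the projection step (combine the weights, form B = L + αM, and solve the generalized eigenvalue problem of Equation (18)) can be sketched in numpy. This is a schematic sketch, not the authors' code: the weight matrices are assumed to be precomputed, a small jitter term keeps the right-hand-side matrix positive definite, and the generalized problem is solved by Cholesky whitening.

```python
import numpy as np

def iss_wme_projection(X, WD, Tw, Tb, G, beta=0.5, alpha=0.2, d=2):
    """Sketch of the ISS-WME projection. X: D x N data; WD: distance weights;
    Tw/Tb: within-/between-class structure matrices; G: neighbour graph."""
    N = X.shape[1]
    WS = beta * Tw + (1.0 - beta) * Tb           # structure weight
    W = WD * WS                                  # Eq. (14): elementwise product
    I = np.eye(N)
    M = (I - W).T @ (I - W)                      # LLE-style structure term
    G = (G + G.T) / 2.0                          # ensure a symmetric graph
    Dg = np.diag(G.sum(axis=1))
    L = Dg - G                                   # Laplacian of the neighbour graph
    B = L + alpha * M                            # B = L + alpha * M
    A1 = X @ B @ X.T
    A2 = X @ Dg @ X.T + 1e-6 * np.eye(X.shape[0])  # jitter keeps A2 positive definite
    # generalised problem A1 p = lambda A2 p via Cholesky whitening of A2
    R = np.linalg.cholesky(A2)
    Rinv = np.linalg.inv(R)
    eigvals, V = np.linalg.eigh(Rinv @ A1 @ Rinv.T)
    return Rinv.T @ V[:, :d]                     # eigenvectors of the d smallest eigenvalues
```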
Then, we can learn a projection matrix P = [p_1, ..., p_d]. In summary, Algorithm 1 is as follows:

Algorithm 1 Process of the ISS-WME Algorithm
Input: HSI data set X = [x_1, ..., x_N] ∈ R^{D×N} with x_i = [x^f_i, x^p_i]; low-dimensional dimension d ≪ D; number of nearest neighbors K.
1: Segment the HSI into superpixels using the SLIC segmentation method and randomly select training samples (for Pavia University, training samples are 2%, 4%, 6%, 8%, 10%), ensuring that the numbers of superpixels and training samples are the same; then use Equations (7) and (8) to calculate the spatial-spectral distance matrix between superpixels.
2: Use Equations (12) and (13) to obtain the structure representation matrices between training samples; take the product of the two types of matrices as the new weight matrix of Equation (14).
3: According to the local manifold structure and the nearest neighbor relationships of the samples, construct the objective function of Equation (16).
4: Solve the generalized eigenvalue problem of Equation (18) to obtain the corresponding eigenvectors.
5: Learn the projection matrix P.
Output: The data in the low-dimensional space, Y = P^T X.

Experiments and Discussion
In order to verify the effectiveness of the proposed ISS-WME algorithm, we conducted experiments on three commonly used HSI data sets, namely Indian Pines, Pavia University, and Salinas scene. We considered the overall accuracy (OA) [19], classification accuracy (CA), average accuracy (AA), and kappa coefficient (kappa) [27] of the classification results as evaluation metrics. We compared the ISS-WME algorithm with six other representative DR algorithms, i.e., PCA, Isomap [28], LLE, LE, SSSE, and WLE-LLE. We used two widely used classifiers, i.e., the Euclidean distance-based k-nearest neighbor (KNN) algorithm [29] and the support vector machine (SVM), to classify the low-dimensional data. We performed the experiments using MATLAB on a computer with an Intel Core 2.59 GHz CPU and 8 GB RAM.
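The evaluation metrics named above all derive from the confusion matrix: OA is the trace over the total, CA is the per-class recall, AA is the mean of the CA values, and kappa corrects OA for chance agreement. A minimal sketch (function name assumed) computing all four:

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """OA, per-class accuracy (CA), AA, and Cohen's kappa from label arrays."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                                 # confusion matrix
    total = cm.sum()
    oa = np.trace(cm) / total                         # overall accuracy
    ca = np.diag(cm) / cm.sum(axis=1)                 # per-class accuracy
    aa = ca.mean()                                    # average accuracy
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, ca, aa, kappa
```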

Data Sets
The Indian Pines, Pavia University, and Salinas scene data sets were subjected to experiments in the paper.
The Indian Pines data set [30,31] and Salinas scene data set [2,30] were the scenes gathered by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor. Indian Pines consisted of 145 × 145 pixels and 220 spectral bands. However, several spectral bands with noise and water absorption phenomena were removed from the data set, leaving a total of 200 radiance channels to be used in the experiments. Salinas had 512 × 217 pixels and 204 spectral bands.
The Pavia University data set [30,32] was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor. Its size was 610 × 340 pixels. Some channels were removed due to noise and the remaining number of spectral bands was 103.

Experimental Parameter Settings
For this paper, six different DR algorithms were compared with the proposed ISS-WME method. These comparison algorithms are described as follows: PCA, Isomap, LLE, and LE are four classical DR algorithms; the SSSE algorithm combines spectral and spatial information; and WLE-LLE combines spectral and structural information. For the LE, LLE, WLE-LLE, and SSSE algorithms, the number of nearest neighbor samples must be set in the experiment; to compare and analyze the classification results consistently, the number of nearest neighbor samples was set to 15 in all experiments. The SSSE and ISS-WME algorithms also require spatial and spectral information to be computed, so we set the parameters as (σ_f, σ_p) = (0.1, 100).
In each experiment, each data set was divided into training samples and testing samples. We used the different DR algorithms to learn a projection matrix on the training samples, and then utilized the acquired embedding matrix to project the testing samples into the low-dimensional space. Finally, we used a KNN or SVM classifier to classify the data in the low-dimensional space. Moreover, to reduce systematic error, each experiment was run 10 times, and the average value and associated standard deviation were reported for each result. We used OA, CA, AA, and kappa to evaluate the performance of the different algorithms. In the Indian Pines experiment, the parameters were set to (β, α) = (0.5, 0.2). In the same way, the parameters were set to (β, α) = (0.5, 0.1) in the Pavia University experiment. Finally, in the Salinas scene experiment, the parameters were (β, α) = (0.5, 0.2).
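The repeated-runs protocol described above (random split, learn a projection on the training set, project the test set, classify, and report mean ± standard deviation over 10 runs) can be sketched generically. All names here are assumptions; the `fit_projection` and `classify` callbacks stand in for any DR method and classifier.

```python
import numpy as np

def evaluate_dr_method(X, y, fit_projection, classify, train_frac=0.3,
                       n_runs=10, seed=0):
    """Mean and std of OA over repeated random train/test splits.
    X: D x N data, y: length-N integer labels."""
    rng = np.random.default_rng(seed)
    oas = []
    for _ in range(n_runs):
        idx = rng.permutation(len(y))
        n_train = int(train_frac * len(y))
        tr, te = idx[:n_train], idx[n_train:]
        P = fit_projection(X[:, tr], y[tr])        # learn projection on train set
        Y_tr, Y_te = P.T @ X[:, tr], P.T @ X[:, te]
        y_hat = classify(Y_tr, y[tr], Y_te)        # classify in embedded space
        oas.append(np.mean(y_hat == y[te]))
    return float(np.mean(oas)), float(np.std(oas))
```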

Results for the Indian Pines Data Set
To fully assess the performance of the ISS-WME method, experiments were carried out with different numbers of training samples, different embedding dimensions, and different DR methods. We randomly selected n% (n = 10, 20, 30, 40, 50) of the samples from each class as the training set, and the rest formed the testing set. We set the hyperspectral dimensionality (HD) of the low-dimensional embedding from 10 to 50. The results of the proposed ISS-WME method were compared with those of the other DR methods. Figures 1 and 2 show the OA of the KNN and SVM classifiers on different embedding dimensions using different DR methods; specifically, (a)-(e) represent the different training sample sets. The OA of Indian Pines with different training samples directly classified by a KNN or SVM classifier was used as the baseline. Compared with the five other dimensionality reduction methods, ISS-WME and WLE-LLE achieved the best and second-best overall accuracy, respectively, under different dimensions and different training samples. Comparing Figures 1 and 2, the overall accuracy of SVM is higher than that of KNN. As can be seen in Figures 1 and 2, the OA decreases as the dimension increases.
In Figure 1, it can be observed that, for the KNN classifier, the proposed ISS-WME method obtains classification results similar to those of WLE-LLE for almost all embedding dimensions, and achieves the best classification result at hyperspectral dimensionality (HD) = 50. Figure 2c shows the OA versus HD for 30% of the Indian Pines data used as the training set. Compared with RAW, PCA, Isomap, LLE, LE, SSSE, and WLE-LLE, when HD = 50, ISS-WME increases the OA by 12.01%, 8.2%, 7.98%, 5.28%, 4.15%, and 2.69%, respectively. To further illustrate the classification results of the DR algorithms, the comparison results for 50% of the Indian Pines data trained with the SVM classifier at HD = 20, which gave the best overall accuracy, are presented visually in Figure 3. It includes (a) the false-color image, (b) the corresponding ground-truth map, and the classification maps of the different DR methods (c)-(j). It can be observed that the proposed ISS-WME algorithm performs better on the land-cover classes than the other compared DR methods.
In order to further describe the comparison results, the quantitative comparison of classification accuracy using the SVM classifier at HD = 20 for the different DR methods is summarized in Table 1. The results include the OA and kappa coefficient for each method, and each result is the average of 10 runs with the associated standard deviation. As can be seen in Table 1, in most cases, the classification results (OA and kappa) generated by ISS-WME are the best. Table 2 provides the training and testing sample numbers of each class in the Indian Pines data set, as well as the classification results of the SVM classifier using the different DR methods. In contrast to Table 1, Table 2 reports the per-class evaluation index CA, with the best results shown in bold. It can also be seen in Table 2 that the ISS-WME method achieves the best accuracy in 10 classes of samples.

Results for the Pavia University Data Set
In order to fully assess the performance of ISS-WME, experiments were carried out with different numbers of training samples, different embedding dimensions, and different DR methods. We randomly selected n% (n = 2, 4, 6, 8, 10) of the samples from each class as the training set, and the rest formed the testing set. We set the hyperspectral dimensionality (HD) of the low-dimensional embedding from 10 to 50. The results of the proposed ISS-WME method were compared with those of the other DR methods. Figures 4 and 5 show the OA of the KNN and SVM classifiers on different embedding dimensions using different DR methods; specifically, (a)-(e) represent the different training sets. The OA obtained by directly applying the classifiers to the original data was used as the baseline. Compared to the six other algorithms, ISS-WME achieved the best OA in almost all cases, across different embedding dimensions and numbers of training samples. As can be seen in Figure 5, the classification accuracy is more or less susceptible to distortion as the embedding dimension increases; whichever DR algorithm is adopted, the curse of dimensionality occurs to some extent. Comparing Figures 4 and 5, the distortion is more serious when using the SVM classifier. As can be seen in Figure 4, the OA of the different DR methods is relatively stable as the training set grows when KNN is used as the classifier. Figure 5e shows the impact of the hyperspectral dimensionality (HD) on the OA for 10% of the Pavia University data used as the training set. Compared with RAW, PCA, Isomap, LLE, LE, SSSE, and WLE-LLE, when HD = 50, ISS-WME increases the OA by 0.13%, 0.55%, 1.24%, 3.34%, 0.64%, and 0.31%, respectively.
In order to further illustrate the classification results of the DR algorithms, the classification maps for 10% of the Pavia University data trained with the SVM classifier at HD = 20 are presented visually in Figure 6, including (a) the false-color image, (b) the corresponding ground-truth map, and the classification maps of the different DR methods (c)-(j). It can be observed that the proposed ISS-WME algorithm performs better than the other compared DR methods in most land-cover classes. To further describe the comparison results, the quantitative comparison of the OA of the different DR methods at HD = 20 is summarized in Table 3. The results include the overall accuracy and kappa coefficients of each method, and each result is the average of 10 runs with the associated standard deviation. As can be seen in Table 3, the classification results (OA and kappa) produced by ISS-WME are the best in most cases.
Table 4 provides the number of training and testing samples for each class in the Pavia University data set, as well as the classification results under the SVM classifier using the different dimensionality reduction methods. In contrast to Table 3, Table 4 reports the classification accuracy (CA), where the best results are shown in bold. As can be seen in Table 4, the ISS-WME method achieves the best accuracy in six classes of samples.

Results for the Salinas Scene Data Set
To describe the comparison results, the quantitative comparison of the OA of the different DR methods at HD = 20 is summarized in Table 5. The results include the overall accuracy and kappa coefficients of each method, and each result is the average of 10 runs with the associated standard deviation. As can be seen in Table 5, the classification results (OA and kappa) produced by ISS-WME are the best in most cases. Table 6 provides the number of training and testing samples for each class in the Salinas scene data set, as well as the classification results under the SVM classifier using the different dimensionality reduction methods. In contrast to Table 5, Table 6 reports the classification accuracy (CA), where the best results are shown in bold. ISS-WME obtained the best classification accuracy in 12 classes, and in three classes its results tie with those of the WLE-LLE algorithm. The visual comparison of the different dimensionality reduction methods on the Salinas data set is provided in Appendix A.

Conclusions
In this paper, a dimensionality reduction method combining the manifold structure of high-dimensional data with a linear nearest neighbor relationship was proposed. The method aims to keep the data nearest neighbor relationships unchanged when the high-dimensional data are projected into the low-dimensional space. Furthermore, the manifold structure of the data combines the spatial-spectral distance with structural features. To fully verify the superiority of the proposed method, the data obtained by the ISS-WME method and the six other dimensionality reduction methods were classified by two common classifiers. The results of several experiments show that the ISS-WME algorithm improves the ground-object recognition ability of hyperspectral data, and the OA and kappa coefficients also support this conclusion. In the future, label information will be further incorporated into the dimensionality reduction through a semi-supervised learning framework to improve the classification performance.

Figure A1. OA obtained by using an SVM classifier, with respect to (a-e) different numbers of training sets (2%, 4%, 6%, 8%, 10%) and different HD (from 10 to 50), for the Salinas scene data set.