Classification of Hyperspectral Images Based on Supervised Sparse Embedded Preserving Projection

Dimensionality reduction is an important research area for hyperspectral remote sensing images due to the redundancy of spectral information. Sparsity preserving projection (SPP) is a dimensionality reduction (DR) algorithm based on the l1-graph, which establishes the relations among samples by sparse representation. However, SPP is an unsupervised algorithm that ignores the label information of samples, and its objective function considers only the reconstruction error, which limits classification performance. To solve this problem, this paper proposes a dimensionality reduction algorithm called supervised sparse embedded preserving projection (SSEPP). SSEPP considers the manifold structure information of samples and makes full use of the available label information to enhance the discriminative ability of the projection subspace. While maintaining the sparse reconstruction error, the algorithm also minimizes the error between samples of the same class. Experiments were performed on the Indian Pines hyperspectral dataset and on HJ1A-HSI remote sensing images from the Zhangjiang estuary in southeastern China. The results show that the proposed method effectively improves classification accuracy.


Introduction
Alongside the development of hyperspectral remote sensing imaging technology, hyperspectral imaging (HSI) is providing increasingly useful information and is widely used in many fields, such as the military, agriculture, and geological exploration [1][2][3]. In recent years, hyperspectral imaging has become a research hotspot both domestically and internationally. However, hyperspectral images contain a large number of bands and are computationally expensive to process. In addition, the bands are highly correlated and redundant, which makes processing prone to the curse of dimensionality [4]. As a result, dimensionality reduction (DR) is an essential step in the classification of hyperspectral images [5].
In the research field of high-dimensional data processing, scholars have proposed a series of classic algorithms. Common DR algorithms include principal component analysis (PCA) [6] and linear discriminant analysis (LDA) [7]. PCA and LDA are both based on the assumption that the embedded subspace of the high-dimensional data space is linear. Therefore, they have difficulty finding the intrinsic properties of high-dimensional data and revealing the low-dimensional manifold structure of hyperspectral data [8]. In recent years, sparse representation has been applied to signal processing, pattern recognition, and other fields, and has achieved certain

Graph Embedding
Graph embedding (GE) [16] reveals geometrical features of the data through spectral graph theory. It applies the Laplacian operator to a constructed graph so that the low-dimensional embedding retains the useful information of the graph and suppresses the useless information. GE requires an intrinsic graph, which describes the geometric characteristics of similar data that should be preserved, and a penalty graph, which represents the geometric characteristics that should be avoided. The intrinsic graph $G = \{X, W\}$ and the penalty graph $G_p = \{X, W_p\}$ are both undirected graphs, where $X$ is the vertex set and $W \in \mathbb{R}^{n \times n}$ and $W_p \in \mathbb{R}^{n \times n}$ are the weight matrices. In graph $G$, the weight $w_{ij}$ is the edge weight between vertices $x_i$ and $x_j$ and maintains the similarity characteristics of similar data. In graph $G_p$, the weight $w^p_{ij}$ suppresses the similarity characteristics of heterogeneous data in the low-dimensional embedding. According to the graph embedding principle, the objective function can be written as:
$$Y^* = \arg\min_{\operatorname{tr}(Y B Y^T) = c} \sum_{i \neq j} \|y_i - y_j\|^2 w_{ij} = \arg\min_{\operatorname{tr}(Y B Y^T) = c} \operatorname{tr}(Y L Y^T), \tag{1}$$
where $c$ is a constant. To avoid degeneracy, an extra constraint matrix $B$ is added. $L$ is the Laplacian matrix of the intrinsic graph $G$. $B$ is typically a diagonal matrix used for scale normalization and can also be set as the Laplacian matrix of the penalty graph $G_p$, that is, $B = L_p$. The Laplacian matrices $L$ and $L_p$ are defined as:
$$L = D - W, \quad D_{ii} = \sum_{j \neq i} w_{ij}; \qquad L_p = D_p - W_p, \quad (D_p)_{ii} = \sum_{j \neq i} w^p_{ij}, \tag{2}$$
where $D$ and $D_p$ are both diagonal matrices. If the embedding is linear, that is, $Y = V^T X$, the graph embedding objective function can be converted to:
$$V^* = \arg\min_{V} \frac{\operatorname{tr}(V^T X L X^T V)}{\operatorname{tr}(V^T X B X^T V)}. \tag{3}$$
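As an illustrative check of the Laplacian form used in graph embedding, the following minimal Python sketch builds L = D − W for a small hand-made weight matrix and verifies numerically that the weighted sum of squared differences equals twice the quadratic form y^T L y (the constant factor 2 does not change the minimizer). The toy weights, the one-dimensional embedding y, and the function names are assumptions chosen for illustration, not data or code from the experiments.

```python
def laplacian(W):
    """Return L = D - W, where D is the diagonal degree matrix of W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def pairwise_cost(W, y):
    """Sum over all ordered pairs (i, j) of w_ij * (y_i - y_j)^2."""
    n = len(W)
    return sum(W[i][j] * (y[i] - y[j]) ** 2 for i in range(n) for j in range(n))


def quadratic_form(L, y):
    """Compute y^T L y for a one-dimensional embedding y."""
    n = len(L)
    return sum(y[i] * L[i][j] * y[j] for i in range(n) for j in range(n))


# Toy symmetric weight matrix for 3 vertices (zero diagonal) and a toy embedding.
W = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 0.0],
     [0.5, 0.0, 0.0]]
y = [0.0, 1.0, 3.0]

L = laplacian(W)
# Both values agree: the pairwise cost equals 2 * y^T L y.
print(pairwise_cost(W, y), 2 * quadratic_form(L, y))
```

Minimizing the quadratic form therefore pulls together exactly those vertices that are joined by large weights, which is why the intrinsic graph encodes what should be preserved.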

Sparsity Preserving Projection
Sparse representation can use a small amount of data to represent the main information of an image. Sparsity preserving projection (SPP) is an unsupervised dimensionality reduction algorithm based on sparse representation theory. Unlike traditional graph construction methods, the algorithm constructs a graph representing the relationships between data samples through sparse representation. This is a global sparse graph construction method, so there is no need to manually select neighborhood parameter values. The core idea of SPP is to construct the sparse reconstructive weight matrix and to look for an optimal projection matrix V that minimizes the error between each projected sample and its projected sparse reconstruction.
Assume that the sample set is $X = \{x_1, x_2, \ldots, x_N\}$, where $N$ is the total number of samples. The objective function for the original sparse reconstruction weights is:
$$\min_{s_i} \|s_i\|_1 \quad \text{s.t.} \quad x_i = X s_i, \quad 1 = \mathbf{1}^T s_i, \tag{4}$$
where the $i$-th entry of $s_i$ is fixed to zero so that $x_i$ is excluded from its own reconstruction. Due to noise in the samples, the equality constraint is not necessarily satisfied, so it can be relaxed to obtain the following problem:
$$\min_{s_i} \|s_i\|_1 \quad \text{s.t.} \quad \|x_i - X s_i\|_2 < \xi, \quad 1 = \mathbf{1}^T s_i, \tag{5}$$
where $\xi$ is the allowable error for sparse reconstruction. Solving the above problem yields the sparse representation coefficient matrix $S = [s_1, s_2, \ldots, s_N]$, where $s_i$ is the sparse representation coefficient vector of $x_i$.
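The paper solves the sparse reconstruction problem with SPGL1. As a rough illustration only, the sketch below swaps in a different and simpler technique, the iterative shrinkage-thresholding algorithm (ISTA), applied to the Lagrangian form min 0.5‖x − Ds‖² + λ‖s‖₁ rather than to the constrained form. The function names `ista` and `soft_threshold`, the parameter `lam`, and the omission of the sum-to-one constraint are all assumptions of this sketch; in SPP the dictionary D would be the sample matrix with the column of the sample being coded excluded.

```python
import numpy as np


def soft_threshold(z, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def ista(D, x, lam=0.1, n_iter=500):
    """Approximately minimize 0.5 * ||x - D s||^2 + lam * ||s||_1 over s."""
    D = np.asarray(D, dtype=float)
    x = np.asarray(x, dtype=float)
    # Step size 1 / L, where L is the largest eigenvalue of D^T D
    # (the squared largest singular value of D).
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)
    s = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ s - x)              # gradient of the smooth term
        s = soft_threshold(s - step * grad, step * lam)
    return s


# Toy check: with an identity dictionary the minimizer is the
# soft-thresholded signal, so small entries are set exactly to zero.
s = ista(np.eye(3), [3.0, -0.5, 1.0], lam=1.0)
print(s)
```

A larger `lam` gives sparser coefficients at the cost of a larger reconstruction error, which plays a role loosely analogous to the tolerance ξ in the constrained formulation.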
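In the process of dimensionality reduction, the purpose of SPP is to preserve the sparse representation relationships between samples, and thus the objective function of SPP can be written as:
$$\min_{V} \sum_{i=1}^{N} \|V^T x_i - V^T X s_i\|^2, \tag{6}$$
where $V$ is the projection matrix. After a simple algebraic operation, the objective function of SPP can be written as:
$$\min_{V} \operatorname{tr}\left(V^T X (I - S - S^T + S^T S) X^T V\right). \tag{7}$$
To prevent degeneracy, an extra constraint is added: $V^T X X^T V = I$. Thus, the objective function of SPP is expressed as:
$$\min_{V} \operatorname{tr}\left(V^T X (I - S - S^T + S^T S) X^T V\right) \quad \text{s.t.} \quad V^T X X^T V = I. \tag{8}$$
To obtain more stable numerical solutions, the above minimization problem is transformed into the following maximization problem:
$$\max_{V} \operatorname{tr}\left(V^T X S_\beta X^T V\right) \quad \text{s.t.} \quad V^T X X^T V = I, \tag{9}$$
where $S_\beta = S + S^T - S^T S$. The optimization problem in Equation (9) can be solved through the generalized eigenvalue problem $X S_\beta X^T V = \lambda X X^T V$, and the best projection matrix is obtained from the resulting eigenvectors.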
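The generalized eigenvalue step of SPP can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `spp_projection`, the ridge term `reg`, and the random placeholder coefficient matrix are hypothetical, and the coefficients are stored one sample per row so that the expression S + S^T − S^T S matches the definition of S_β in the text.

```python
import numpy as np


def spp_projection(X, S, d, reg=1e-6):
    """Return (V, eigenvalues) for the SPP maximization problem.

    X   : B x N data matrix, one sample per column.
    S   : N x N coefficient matrix, row i holding the weights of sample i
          (row-stacked convention, assumed so that S_beta matches the text).
    d   : number of projection directions to keep.
    reg : small ridge term (an assumption of this sketch) that keeps
          X X^T positive definite.
    """
    S_beta = S + S.T - S.T @ S
    A = X @ S_beta @ X.T
    A = (A + A.T) / 2.0                       # remove round-off asymmetry
    Bm = X @ X.T + reg * np.eye(X.shape[0])

    # Reduce A v = lambda * Bm v to a standard symmetric eigenproblem
    # using the Cholesky factor Bm = C C^T.
    C = np.linalg.cholesky(Bm)
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    eigvals, U = np.linalg.eigh(M)            # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:d]       # indices of the d largest
    V = C_inv.T @ U[:, top]                   # map back: v = C^{-T} u
    return V, eigvals[top]


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))              # toy data: 5 bands, 20 samples
S = rng.standard_normal((20, 20)) * 0.1       # placeholder, not real sparse codes
np.fill_diagonal(S, 0.0)                      # a sample never reconstructs itself
V, lam = spp_projection(X, S, d=2)
print(V.shape)
```

Because the eigenvectors of the reduced problem are orthonormal, the returned V satisfies the constraint V^T (X X^T) V = I up to the ridge term, which is the normalization required by Equation (9).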

Supervised Sparse Embedded Preserving Projection
SPP is an unsupervised algorithm that ignores the label information of samples, and its objective function only considers the sparse reconstruction error, which does not reflect the local structural relationships of samples. The supervised sparse embedded preserving projection (SSEPP) algorithm is proposed on the basis of SPP. SSEPP uses the label information of samples to construct the weight matrix, which captures the manifold structure of the data and enhances the discriminative ability of the projection subspace, and it also minimizes the spacing between samples of the same class. Assume the sample set $X = \{x_1, x_2, \ldots, x_n\}$, where $x_i$ is the $i$-th sample of $X$. To improve the objective function of the SPP algorithm, the objective function of SSEPP is recast as the optimization problem in Equation (10), where $V$ is the projection matrix and $g_{ij}$ is an element of the weight matrix, that is, the set of weights among all the samples, defined in Equation (11). In Equation (11), if $x_i$ and $x_j$ both belong to the same class, the weight value is not 0, and the closer $x_i$ and $x_j$ are, the smaller the value. This is used to enhance the discriminative ability of the projection subspace. After a few simple algebraic operations, the objective function is simplified to Equation (12). To obtain a more stable numerical solution, this minimization problem is converted into the maximization problem in Equation (13), where $S_\alpha = (sg)^T - sg + (sg)(sg)^T$ is a sparse reconstruction matrix.
The SPP objective function only considers the reconstruction error. On this basis, SSEPP also considers the same-class spacing and minimizes the error between samples of the same class in the projection space through an additional objective function. If $y_i = V^T x_i$ and $y_j = V^T x_j$ are the projections of two different training samples in the projection space, this objective function is given by Equation (14), with pairwise weights $P_{ij}$. After some simple operations, it can be converted into Equation (15), where $S_\beta = D - P$, and $D$ is a diagonal matrix with diagonal elements $d_{ii} = \sum_j P_{ij}$. By combining Equation (13) and Equation (15), the multi-objective optimization function of the SSEPP algorithm, Equation (16), is obtained. The SSEPP objective function not only highlights the role of sample labels, but also minimizes the sparse reconstruction error while minimizing the distance between samples of the same class, so that the projections of same-class samples are more compact.
Equation (16) is solved through the generalized eigenvalue problem in Equation (17). The eigenvalues are sorted in descending order, and the projection matrix $V = [v_1, v_2, \ldots, v_d]$ is formed from the eigenvectors corresponding to the $d$ largest eigenvalues. The dimensionality-reduced sample features are then given by Equation (18), where $m_i(x_i)$ is the reduced feature of the $i$-th sample. The specific steps of the supervised sparse embedded preserving projection algorithm are described in Algorithm 1.
Algorithm 1: Supervised Sparse Embedded Preserving Projection (SSEPP)
Input: The sample set $X = \{x_i \mid x_i \in \mathbb{R}^B, 1 \le i \le n\}$, the error value $\xi$, and the reduced dimensionality $d$.
Output: The feature set M after dimensionality reduction.
(1) Standardize the sample set;
(2) Use the SPGL1 [17] algorithm to solve for the sparse reconstruction coefficients using Equation (5);
(3) Calculate the weight matrix of the labeled samples using Equation (11);
(4) Use Equation (15) to calculate the objective function that minimizes the error between samples of the same class. Then, obtain the multi-objective function according to Equation (16);
(5) Obtain the projection matrix by solving for the eigenvectors of the generalized eigenvalue problem in Equation (17);
(6) Use Equation (18) to obtain the reduced-dimension sample feature set M.

Experimental Results and Analysis
In this section, we validate the proposed method on two HSI datasets and present experimental results that demonstrate the benefits of SSEPP for DR of HSI data.

Indian Pines Dataset
The Indian Pines dataset [18] was used in the first experiment. The image has 145 × 145 pixels and a spatial resolution of 20 m. It has a total of 220 spectral bands covering the wavelength range from 0.4 to 2.5 μm; 20 bands that are strongly affected by water absorption were removed before the experiment [19], and the remaining 200 bands were used. In the entire HSI image, a total of 10,249 pixels have label information. There are 16 ground truth classes; 10% of the labeled samples were used as training samples and the rest were used for testing. Table 1 lists the classes and the number of their training samples. In the Indian Pines experiment, the output dimensionality of all dimensionality reduction algorithms was uniformly set to 30. Then, the nearest neighbor classifier (1-NN) was used for classification, that is, K was set to 1. The SPGL1 algorithm was used to solve for the sparse reconstruction coefficients. To obtain better performance, the sparse reconstruction error value ξ was set to 0.9.
Figure 1 shows a comparison of the classification results for Indian Pines. Figure 1a shows the false color image of Indian Pines, Figure 1b the ground truth, Figure 1c the result of sparsity preserving projection (SPP), Figure 1d that of principal component analysis (PCA), Figure 1e that of sparse discriminant embedding (SDE), and Figure 1f that of supervised sparse embedded preserving projection (SSEPP).
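The 1-NN classification step applied to the reduced features can be sketched in plain Python as follows. This is a minimal illustration that assumes Euclidean distance; the function name, the two-dimensional toy features, and the class names are hypothetical and are not taken from the experiments.

```python
import math


def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training sample closest to query (Euclidean)."""
    best_label, best_dist = None, math.inf
    for features, label in zip(train_x, train_y):
        dist = math.dist(features, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Toy reduced features (2-D) for two hypothetical classes.
train_x = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.2)]
train_y = ["corn", "corn", "woods", "woods"]

print(nearest_neighbor_predict(train_x, train_y, (0.1, 0.0)))   # corn
print(nearest_neighbor_predict(train_x, train_y, (1.1, 0.9)))   # woods
```

Because 1-NN has no trainable parameters, differences in accuracy between the compared methods can be attributed to the quality of the reduced feature space rather than to classifier tuning.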
Figure 1 shows that the SSEPP algorithm outperforms the SPP, PCA, and SDE comparison algorithms. SSEPP has the fewest misclassified and omitted pixels and is the closest to the ground truth image of Indian Pines. The SPP algorithm has the worst classification result because SPP is an unsupervised algorithm. SSEPP and SDE are both supervised algorithms, which means that they can use sample label information to enhance their discriminative ability, thus improving the classification accuracy. In addition, unlike SDE, SSEPP constructs a similarity matrix that assigns different weights to samples of the same class according to their similarity; it therefore enhances the natural discriminative ability of sparse representation. To improve the classification accuracy on the Indian Pines dataset, it also builds a multi-objective optimization on top of the original objective function that minimizes both the sparse reconstruction error and the same-class sample spacing.
As shown in Table 2 and Figure 2, the number of feature dimensions directly affects the classification accuracy of hyperspectral remote sensing images. As the number of feature dimensions increases, the overall accuracy shows an upward trend and stabilizes at about 30 dimensions. SSEPP is always better than SPP because SSEPP incorporates the sample labels and minimizes the same-class sample spacing, both of which are very helpful for feature extraction. Below 7 dimensions, PCA performs better than SSEPP, but above 7 dimensions, the proposed algorithm is always better than PCA. Once the feature dimension reaches 8, SSEPP is always the best of the compared methods. This verifies the effectiveness of the algorithm.
Figure 2. Overall accuracy curves of different DR algorithms in different dimensions.

Zhangjiangkou Mangrove Nature Reserve HJ1A-HSI Dataset
The Environment and Disaster Detection Small Satellite (HJ) is a new civilian satellite system in China. It was successfully launched from Taiyuan at 11:25 a.m. on 6 September 2008. The HJ1A satellite is equipped with a CCD camera and a hyperspectral imager (HSI). The remote sensing data used in these experiments were acquired by the HJ-HSI [20] on 28 March 2010 over the Zhangjiangkou Mangrove Nature Reserve in Fujian province. The mangrove area is one of the national key nature reserves. The location is 22°53′45″ to 23°56′00″ N and 117°24′07″ to 117°30′00″ E, with a total area of 2360 hectares. The spatial resolution of this image is 100 m and the band range is 450-950 nm, with a total of 115 bands. According to field investigations in the study area, the land cover is divided into seven categories. Table 3 lists the seven land cover types and their descriptions.
In this experiment, 100 samples were randomly selected as training samples, of which 10 were mangroves and 15 were from each of the other six land cover types. Each dimensionality reduction algorithm reduced the 115 spectral bands to 10 dimensions. These reduced spectral features were combined with four texture features (mean, variance, dissimilarity, and second moment) [21], the digital elevation model (DEM), and the normalized difference vegetation index (NDVI) [22] into one decision-making feature set with a total of 16 dimensions. Finally, classification was performed by the 1-NN classifier. The false color composite of the Zhangjiangkou Mangrove Nature Reserve HJ1A-HSI image is shown in Figure 3a. Figure 3b-e shows a comparison of the classification results of the different algorithms: Figure 3b shows the sparsity preserving projection (SPP) algorithm, Figure 3c the principal component analysis (PCA) algorithm, Figure 3d the sparse discriminant embedding (SDE) algorithm, and Figure 3e the supervised sparse embedded preserving projection (SSEPP) algorithm.
A confusion matrix is a very effective way to assess classification accuracy. We used the confusion matrix, the overall accuracy, and the kappa coefficient to evaluate the classification performance of the algorithms. Using field sampling and high-resolution imagery, we randomly took 30 mangrove samples and 50 samples of the remaining six land cover types as the ground truth samples to build the confusion matrices. The confusion matrices for the classification results of the four algorithms are listed in Table 4. Overall accuracy and kappa coefficients are shown in Table 5.
Both Figure 3 and Table 4 show that the SSEPP algorithm greatly reduces the misidentification of mangroves, that is, it produces the fewest red noise points, and its classification result is the closest to the ground truth of the Zhangjiang estuary mangroves HJ1A-HSI image. Table 5 shows that the proposed SSEPP method yields the best overall accuracy and kappa coefficient, followed by the SDE algorithm. Relative to the traditional sparsity preserving projection algorithm, SSEPP increases the classification accuracy and the kappa coefficient by 3% and 0.04, respectively, which indicates that introducing the sample label information and considering the spacing of same-class samples both lead to better classification. Compared with the SDE and PCA algorithms, the proposed algorithm improves the overall accuracy by 2% and 3%, respectively, and the kappa coefficient by 0.02 and 0.03, respectively.
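The overall accuracy and kappa coefficient used in the accuracy assessment can be computed from a confusion matrix as in the following minimal sketch. The function names, the class names other than mangrove, and the ten toy labels are hypothetical and serve only to illustrate the arithmetic; they are not the ground truth samples of the study.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are reference (ground truth) classes, columns are predicted classes."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[k][k] for k in range(len(matrix))) / total


def kappa(matrix):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    p_o = overall_accuracy(matrix)
    # Chance agreement from the products of row and column marginals.
    p_e = sum(sum(matrix[k]) * sum(row[k] for row in matrix)
              for k in range(n)) / total ** 2
    return (p_o - p_e) / (1.0 - p_e)


labels = ["mangrove", "water", "farmland"]
y_true = ["mangrove"] * 3 + ["water"] * 3 + ["farmland"] * 4
y_pred = ["mangrove", "mangrove", "water",
          "water", "water", "water",
          "farmland", "farmland", "farmland", "mangrove"]

cm = confusion_matrix(y_true, y_pred, labels)
oa = overall_accuracy(cm)
k = kappa(cm)
print(cm, oa, round(k, 3))
```

Kappa is lower than the overall accuracy whenever part of the agreement could be explained by the class proportions alone, which is why both measures are reported together.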

Table 1. Information about the Indian Pines dataset.

Table 2. Indian Pines classification results.

Table 3. Classification of land cover types.

Table 4. Comparison of the confusion matrices.

Table 5. Comparison of the overall accuracy and kappa coefficients.