Hypergraph Embedding for Spatial-Spectral Joint Feature Extraction in Hyperspectral Images

The fusion of spatial and spectral information in hyperspectral images (HSIs) is useful for improving the classification accuracy. However, this approach usually results in features of higher dimension and the curse of the dimensionality problem may arise resulting from the small ratio between the number of training samples and the dimensionality of features. To ease this problem, we propose a novel algorithm for spatial-spectral feature extraction based on hypergraph embedding. Firstly, each HSI pixel is regarded as a vertex and the joint of extended morphological profiles (EMP) and spectral features is adopted as the feature associated with the vertex. A hypergraph is then constructed by the K-Nearest-Neighbor method, in which each pixel and its most K relevant pixels are linked as one hyperedge to represent the complex relationships between HSI pixels. Secondly, the hypergraph embedding model is designed to learn a low dimensional feature with the reservation of geometric structure of HSI. An adaptive hyperedge weight estimation scheme is also introduced to preserve the prominent hyperedges by the regularization constraint on the weight. Finally, the learned low-dimensional features are fed to the support vector machine (SVM) for classification. The experimental results on three benchmark hyperspectral databases are presented. They highlight the importance of spatial–spectral joint features embedding for the accurate classification of HSI data. The weight estimation is better for further improving the classification accuracy. These experimental results verify the proposed method.


Introduction
Hyperspectral imaging is an important mode of remote sensing imaging, which has been widely used in a diverse range of applications, including environment monitoring, urban planning, precision agriculture, geological exploration, etc. [1][2][3].Most of these applications depend on the key problem of classifying the image pixels within hyperspectral imagery (HSI) into multiple categories, i.e., HSI classification, and extensive research efforts have been focused on this problem [4][5][6][7][8][9].
In HSI, each pixel contains hundreds of spectral bands from the visible to the infrared range of the electromagnetic spectrum.In general, the spectral signature of each pixel can be directly used as the feature for classification.However, due to the noise corruption and high correlation between spectral bands, the using of the spectral feature alone is often unable to obtain good classification results.It is well accepted that the HSI pixels within a small spatial neighborhood are often made up of the same materials.Thus, spatial contextual information is also useful for classification [10,11].Landgrebe and Ketting proposed the well-known extraction and classification of homogeneous objects (ECHO) approach that partitioned the HSI pixels into homogeneous object and classified homogeneous object as different categories [12].Later, Markov random field (MRF) modeling was widely adopted to capture the interpixel dependency through the neighbor system [13,14].However, the optimization of MRF-based methods is very time-consuming.Due to the high dimensionality of HSI data, the computationally effective algorithm is desirable.In this sense, Pesaresi and Benediktsson [15] proposed the use of morphological transformations to build a morphological profile (MP) for extracting the structural information.Palmason et al. [16] extended the method proposed in [15] to the high-resolution hyperspectral data classification.They first extracted several principal components of the hyperspectral data.Then, the MP is constructed based on each selected principal component.At last, all MPs are jointed as extended MP (EMP), which is input into a neural network for classification.However, EMP was primarily designed for classification of urban structures and it did not fully utilize the spectral information in the data.Regrading this issue, Fauvel et al. [17] proposed fusing the morphological information and the original hyperspectral data, i.e., the two vectors of attributes are concatenated into one feature vector.The final classification is achieved by using a support vector machine classifier.Many other spectral and spatial joint features [18][19][20][21][22], such as 3D wavelet [18], spatial and spectral kernel [19], matrix-based discriminant subspace analysis [20], etc. are used for classification.
These joint features usually have a high dimension.In order to avoid the Hughes phenomenon, feature extraction and dimensionality reduction must be conducted before classification.Principal component analysis (PCA) and Fisher's linear discriminant analysis (LDA) [23] are two simple and effective approaches for dimension reduction.PCA aims at projecting the data along the directions of maximal variance.LDA is designed to generate the optimal linear projection matrix by maximizing the between-class distance while minimizing the within-class distance.Apart from these linear methods, many nonlinear versions have been developed, such as kernel PCA [24] and kernel LDA [25].Some other feature extraction techniques have also been proposed, e.g., locality preserving projection (LPP) [26], independent component analysis (ICA) [27,28], and locally linear embedding (LLE) [29].In particular, Yan et al. [30] proposed a general graph embedding (GE) model that seamlessly includes many existing feature extraction techniques.In this GE model, each data point is visualized as a vertex and a pairwise edge is used to represent the association relationship between two data points.They consider each feature extraction algorithm as an undirected weighted graph that describes geometric structures of data.GE algorithms have been widely explored for dimension reduction of HSI.Besides the geometric structures of data, sparsity is also explored to construct the graph embedding model.Luo et al. proposed constructing a graph with the sparse coefficients that reveals the sparse properties of data, and the transformation matrix is obtained for feature reduction [31].In addition, by regarding different band sets as different views of land covers, multiview graph ensemble-based graph embedding is also utilized to promote the performance of graph embedding for hyperspectral image classification [32].
A hypergraph is a generalization of a pairwise graph.Different from pairwise graphs, each edge in a hypergraph is capable of connecting more than two vertices [33].Thus, the complex relationships of the dataset can be captured by a hypergraph, and hypergraphs have been gaining more and more attention in recent years.Bu et al. [34] presented a hypergraph learning based music recommendation method with the use of hyperedges to exploit the complex social media information.A hypergraph semi-supervised learning model [35] was also proposed for image classification.Yuan et al. [36] utilized a hypergraph embedding model for HSI feature reduction, in which the spatial hypergraph models (SHs) are construed by selecting the K-nearest neighbors within the spatial region of the centroid pixel.Experimental results demonstrated that SH outperformed many existing feature extract methods for HSI classification, including raw spectral feature (RAW), PCA, LPP, LDA, nonparametric weighted feature extraction (NWFE) [37] and semi-supervised local discriminant analysis (SELD) [38].However, SH is designed to learn the projection matrix for reducing the spectral feature.The spatial structure is not exploited for hypergraph embedding, which is not capable of simultaneously extracting the spectral-spatial features.Furthermore, the hyperedge weight is computed in advance and fixed in the hypergraph embedding procedure.As the discussion stated in [39,40], all of the hyperedges do not have the same effect on the learning procedure.Some hyperedges are not as informative as others.The hypergraph embedding should be enhanced by estimating the hyperedge weights adaptively.
In order to cope with these issues, we propose a novel algorithm for HSI spatial-spectral joint feature extraction.We combine the EMP and spectral features and adopt the KNN method to construct a hypergraph, where each sample and its K nearest neighbors are enclosed in one hyperedge.Similar to [36], a linear projection matrix P can be learnt by solving the hypergraph embedding model.However, in [36], the hyperedges' weights in the hypergraph embedded model are fixed.Inspired by [39,40], we introduce a scheme to update the weights adaptively to preserve the prominent hyperedge and further learn the low-dimensional structure.It helps improve the accuracy of the final HSI classification to a certain extent.Finally, the leaned low-dimensional features are fed to the SVM for classification.The flowchart of the proposed method is shown in Figure 1.Experiments conducted on three widely used types of HSI demonstrate that the proposed method achieves superior performance over many other feature extract methods for HSI classification.

Hypergraph Model
Denote a hypergraph as G = (V,E,W), which consists of a set of vertices V, a family of hyperedge E and a weight matrix W of hyperedges.Different from pairwise graphs (For convenience, we call it a simple graph in the following), every hyperedge e i can contain multiple vertices and is assigned a weight w(e i ).As shown in Figure 2b, hyperedge e 1 is composed of vertices v 1 , v 2 and v 3 .e 2 is composed of vertices v 3 and v 4 .e 3 is composed of vertices v 4 , v 5 , v 6 and v 7 .W is a diagonal matrix of the hyperedge weights.The connection relationship of hypergraph G can be represented by an incidence matrix H ∈ R |V|×|E| , which can be defined as: The degree of vertex v and hyperedge e can be respectively represented as: According to the above definition, the main difference between hypergraphs and simple graphs is that every hyperedge can link more than two vertexes.Therefore, hypergraph is suitable to represent local group information and the high-order relationship of data.For example, considering seven vertices in Figure 2b, they are attributed to three groups and the corresponding incidence matrix is shown in Figure 2c.In terms of building a simple graph with these seven data points, the complex relations within the group are broken into multiple pairwise links.Some valuable information may be lost in this procedure; therefore, a simple graph can not describe the group structure well.

Hypergraph Embedding of Spatial-Spectral Joint Features
As shown in Figure 1, our algorithm mainly consists of three steps: spatial-spectral joint feature construction, hypergraph embedding and SVM classification.

Spatial-Spectral Joint Feature Construction
Following [16], we first extract several PCs from the original HSI I(x) and then build an MP from each of the PCs: where n is the number of the circular structural element (SE) with different radius sizes, OP n (x) and CP n (x) are the opening profile (OP) and the closing profile (CP) at the pixel x with an SE of a size n, respectively.Specifically, we have CP 0 (x) = OP 0 (x) = I(x).The MP of I contains the original image I, n opening profile and n closing profile.Therefore, each MP is a (2n + 1)-dimensional vector.Finally, all MPs are stacked together in one as EMP: where m represents the number of PCs.The EMP is defined as an m(2n + 1)-dimensional vector.
After obtaining the EMP feature, we represent the spatial and spectral joint feature of the i-th HSI pixel as where d is the number of the spectral bands.Denote the spectral features matrix of HSI as , where x i is the i-th pixel, and N is the number of HSI pixels.Then, the joint feature matrix of HSI can be represented as: V = X EMP ∈ R (m(2n + 1) + d)×N .

Hypergraph Embedding
We take each pixel of HSI as a vertex and construct a hypergraph G = (V, E, W) to represent the correlation between HSI pixels.Each vertex v i is associated with the spatial and spectral joint feature defined in Equation (6).The hypergraph G is constructed by the K-nearest neighbor method.In detail, each pixel v i and its K nearest neighbors are enclosed as hyperedge e i .Thus, hyperedge set E = {e 1 , e 2 , . . ., e N } contains N hyperedges.Meanwhile, the weight w(e i ) of hyperedge e i is defined as: where σ is the mean distance between all vertices and can be calculated by σ = 1 d v i , v j is the distance between vertex v i and vertex v j .The degree of vertex v i and the degree of hyperedge e i can be computed by Equations ( 2) and (3), respectively.Based on this definition, the more "compact" hyperedge (local group) is assigned with a higher weight.Denote D v and D e as two diagonal matrices of the vertex degrees and the hyperedge degrees, respectively, and P ∈ R (m(2n+1)+d)×u (generally, m (2n + 1) + d >> u) as the linear projection matrix.The objective of hypergraph embedding model is to learn the projection matrix P for reducing the feature dimension with the preservation of geometric property in the original space.The objective function is formulated as: where L = D v − HWD −1 e H T is the hypergraph laplacian matrix.The constraint P T VD v V T P = 1 is used for scale normalization of the low-dimensional representations.This objective function induces the constraint that if v i and v j are similar and belong to the same hyperedge, they should also be adjacent in embedded space.In addition, an efficient hypergraph weight estimation scheme is proposed to preserve the prominent hyperedges.Assuming that w = (w 1 , w 2 , . . . ,w N ) T is composed of the elements lying in the main diagonal of W, we enforce 1 T N w = 1 and add an l 2 norm regularizer on w.Then, our proposed embedding model is finally defined as: {P * , w * } = arg min

Optimization Algorithm
The objective function Equation ( 9) is a multiple variables optimization problem, and it is non-convex with respect to w and P jointly.However, it is convex with either of them individually when the other is fixed.Thus, an alternative iteration strategy is adopted to get the solution of Equation ( 9).We first initialize w according to Equation (7).With w fixed, we optimize P according to Equation (8).The solution of Equation ( 8) is to find the eigenvectors corresponding to the first u largest eigenvalues of the matrix Next, fix P and optimize w: In this paper, we employ the Lagrangian algorithm to optimize the Equation (10).The Lagrangian function of the objective function ( 10) is defined as: The partial derivatives of ψ w.r.t.w i , i = 1, 2, • • • , M are given by: By simplifying Equation ( 12), w k can be calculated as: According to the constraint 1 T N w = 1, the Lagrange multiplier can be calculated as: By substituting Equation ( 14) into Equation ( 13), we can obtain w finally.Following this iteration process, w and P are alternately optimized until the maximal iteration number is reached or the relative difference of objective function value of Equation ( 9) is smaller than a given tolerance const ε, i.e., where f (t + 1) and f (t) is the function value of Equation ( 9) at iteration t + 1 and t, respectively.In addition, we can obtain the final projection matrix P * .At last, the joint feature set V is reduced as a low-dimensional feature set Y = (P * ) T v 1 , . . . ,(P * ) T v N , which is then transmitted into an SVM classifier.Based on the above analysis, the proposed method can be summarized in Algorithm 1.
reduction of the stacked feature set without adaptive weight estimation.The other is SSHG* shown in Algorithm 1.They are compared with the following feature extraction methods: (1) the method by using PCA to extract spectral features (denoted as PCA); (2) the method by using EMP features without dimension reduction (denoted as EMP); (3) the method [17] stacking the EMP and the spectral features as feature without dimension reduction (denoted as EMPSpe); and (4) the spatial hypergraph embedding method proposed in [36] (denoted as SH).In order to facilitate comparisons with these competing feature extraction methods, we adopt the overall accuracy (OA), the average accuracy (AA), the per-class accuracy and Kappa coefficient (κ) to evaluate the classification performance.Furthermore, the SVM classifier with Gaussian kernel is adopted to classify all of the aforementioned feature data of these feature extraction methods.The grid search tool is used to select the parameters of the optimal penalty term and Gaussian kernel variance in SVM within the given sets 2 −10 , ..., 2 10 and 2 −10 , ..., 2 10 , respectively.The one-against-all strategy is adopted for multi-class classification.
Regarding the three data sets, we select 15 samples from each class randomly to form a training set and the remaining samples are used as the test set.The training sample selection and the classification process are repeated ten times to reduce the bias induced by random sampling.We retain the average results.The parameters setting of SH is the same as the original paper [36].With respect to our algorithm, the tolerance const ε is set as 1 × 10 −3 and the regularization parameter λ is set as 100.The number of nearest neighbors K is selected as 10, 15, 5 for Indian Pines, Pavia University and Botswana data sets, respectively.

Experimental Results
The classification results of various methods upon three types of HSI are reported in Tables 1-3, respectively.The best results are highlighted with bold fonts.The number in brackets corresponds to the optimal dimensionality of reduced features.Classification maps of these different approaches are shown in Figures 3-5, respectively.According to the experimental results, our proposed method achieves the highest OA, AA, and κ among all of the competing methods, which shows the effectiveness of our feature extraction algorithm.The effectiveness of our SSHG method owes much to the hypergraph embedding of spatial and spectral joint features.Comparing the EMP and EMPSpe method, we can find that EMPSpe method is always slightly better than EMP due to the fusion of EMP and spectral features for classification.As mentioned in [17], the stacked EMP and spectral features are transformed to low dimensional features by the decision boundary feature extraction (DBFE) and NWFE methods before classification.However, the DBFE and NWFE did not bring about the effective improvement of algorithm performance.SH utilized the hypergraph embedding model for feature reduction.Compared with PCA, the SH method has much better classification performance, which verifies the capacity of the hypergraph to capture the intrinsic complex relationships between HSI pixels.However, SH utilized only the spectral similarity for finding the nearest neighbors within a given spatial region.The superiority of SSHG over SH demonstrates that the embedding of EMP and spectral features is better for HSI classification.Specifically, our SSHG method can extract the rich spatial structures in the Pavia University data and achieve the maximum improvement upon this data.SSHG* obtains better classification results than SSHG, which demonstrates that adaptive hypergraph weight estimation is also beneficial for improving the classification accuracy.There are two parameters, i.e., K and u, in our proposed method.The parameter K is the number of nearest neighbors, which determines how many pixels are included in the hyperedge.u is the dimensionality of the embedded low-dimensional feature.To evaluate their effects on the classification performance, we conduct the experiments on the above three datasets.We firstly fix the reduced dimensionality as u = 40 and evaluate the influence of different K on the OA.As seen in Figure 6, when K is set as 10, 15, 5 for Indian Pines, Pavia University and Botswana data sets, respectively, the OA achieves the highest value.Taken as a whole, [5,15] is usually a good range for the selection of parameter K.We then fix the K as 10, 15, 5 for the three datasets, respectively, and evaluate the influence of different us on the OA. Figure 7 shows the changes of OA with the reduced dimensions on three types of HSI.We can see that the inflection point of classification results is around the dimensionality 25 for these three HSIs, and there was no significant improvement on the classification results if the dimension continues to grow up.

Conclusions
In this paper, we propose a novel algorithm for spatial-spectral feature extraction based on hypergraph learning.A hypergraph is constructed by the KNN method and the embedding operation is conducted to transform the joint EMP and spectral features into the low-dimensional representation.Meanwhile, an efficient hypergraph weight estimation scheme is adopted to preserve the prominent hyperedges.Classification is performed with SVM using the embedded features.The experimental results on three benchmark hyperspectral datasets verify that our embedded representation can enhance the classification accuracy effectively.The hypergraph weight estimation can further improve the accuracy of HSI classification.

Figure 1 .
Figure 1.The flowchart of the proposed method.

3 eFigure 2 .
Figure 2. The example of graph and hypergraph (a) simple graph, each edge consists of only two data points; (b) hypergraph G, each hyperedge is marked by an ellipse and consists of at least two data points; (c) taking the seven vertices as example, H is the incidence matrix of G, whose values are usually binary.

Table 1 .
Classification accuracy of various algorithms on the Indian Pines image.

Table 2 .
Classification accuracy of various algorithms on the Pavia university image.

Table 3 .
Classification accuracy of various algorithms on the Botswana image.