Open Access
Remote Sensing 2017, 9(5), 506; https://doi.org/10.3390/rs9050506
Article
Hypergraph Embedding for Spatial–Spectral Joint Feature Extraction in Hyperspectral Images
Jiangsu Key Laboratory of Big Data Analysis Technology, Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China
* Author to whom correspondence should be addressed.
Academic Editors: Qi Wang, Nicolas H. Younan, Carlos López-Martínez, Xiaofeng Li and Prasad S. Thenkabail
Received: 18 March 2017 / Accepted: 14 May 2017 / Published: 22 May 2017
Abstract: The fusion of spatial and spectral information in hyperspectral images (HSIs) is useful for improving classification accuracy. However, this approach usually yields features of higher dimension, and the curse of dimensionality may arise from the small ratio between the number of training samples and the dimensionality of the features. To ease this problem, we propose a novel algorithm for spatial–spectral feature extraction based on hypergraph embedding. Firstly, each HSI pixel is regarded as a vertex, and the concatenation of extended morphological profiles (EMP) and spectral features is adopted as the feature associated with the vertex. A hypergraph is then constructed by the K-nearest-neighbor method, in which each pixel and its K most relevant pixels are linked as one hyperedge to represent the complex relationships between HSI pixels. Secondly, a hypergraph embedding model is designed to learn a low-dimensional feature that preserves the geometric structure of the HSI. An adaptive hyperedge weight estimation scheme is also introduced to preserve the prominent hyperedges by a regularization constraint on the weights. Finally, the learned low-dimensional features are fed to a support vector machine (SVM) for classification. Experimental results on three benchmark hyperspectral databases highlight the importance of spatial–spectral joint feature embedding for the accurate classification of HSI data and show that the adaptive weight estimation further improves the classification accuracy.
Keywords:
feature extraction; hypergraph learning; morphological profiles; hyperedge weight estimation

1. Introduction
Hyperspectral imaging is an important mode of remote sensing imaging, which has been widely used in a diverse range of applications, including environment monitoring, urban planning, precision agriculture, geological exploration, etc. [1,2,3]. Most of these applications depend on the key problem of classifying the image pixels within hyperspectral imagery (HSI) into multiple categories, i.e., HSI classification, and extensive research efforts have been focused on this problem [4,5,6,7,8,9].
In HSI, each pixel contains hundreds of spectral bands from the visible to the infrared range of the electromagnetic spectrum. In general, the spectral signature of each pixel can be directly used as the feature for classification. However, due to noise corruption and the high correlation between spectral bands, using the spectral feature alone is often unable to obtain good classification results. It is well accepted that HSI pixels within a small spatial neighborhood are often made up of the same materials. Thus, spatial contextual information is also useful for classification [10,11]. Landgrebe and Kettig proposed the well-known extraction and classification of homogeneous objects (ECHO) approach, which partitioned the HSI pixels into homogeneous objects and classified them into different categories [12]. Later, Markov random field (MRF) modeling was widely adopted to capture the inter-pixel dependency through the neighbor system [13,14]. However, the optimization of MRF-based methods is very time-consuming. Due to the high dimensionality of HSI data, computationally efficient algorithms are desirable. In this sense, Pesaresi and Benediktsson [15] proposed the use of morphological transformations to build a morphological profile (MP) for extracting structural information. Palmason et al. [16] extended the method proposed in [15] to high-resolution hyperspectral data classification. They first extracted several principal components of the hyperspectral data. Then, an MP is constructed from each selected principal component. At last, all MPs are joined into an extended MP (EMP), which is input into a neural network for classification. However, EMP was primarily designed for the classification of urban structures, and it did not fully utilize the spectral information in the data. Regarding this issue, Fauvel et al. [17] proposed fusing the morphological information and the original hyperspectral data, i.e., the two vectors of attributes are concatenated into one feature vector. The final classification is achieved by using a support vector machine classifier. Many other spectral and spatial joint features [18,19,20,21,22], such as the 3D wavelet [18], spatial and spectral kernels [19] and matrix-based discriminant subspace analysis [20], are also used for classification.
These joint features usually have a high dimension. In order to avoid the Hughes phenomenon, feature extraction and dimensionality reduction must be conducted before classification. Principal component analysis (PCA) and Fisher’s linear discriminant analysis (LDA) [23] are two simple and effective approaches for dimension reduction. PCA aims at projecting the data along the directions of maximal variance. LDA is designed to generate the optimal linear projection matrix by maximizing the between-class distance while minimizing the within-class distance. Apart from these linear methods, many nonlinear versions have been developed, such as kernel PCA [24] and kernel LDA [25]. Some other feature extraction techniques have also been proposed, e.g., locality preserving projection (LPP) [26], independent component analysis (ICA) [27,28], and locally linear embedding (LLE) [29]. In particular, Yan et al. [30] proposed a general graph embedding (GE) model that seamlessly includes many existing feature extraction techniques. In this GE model, each data point is visualized as a vertex and a pairwise edge is used to represent the association relationship between two data points. They consider each feature extraction algorithm as an undirected weighted graph that describes the geometric structure of the data. GE algorithms have been widely explored for dimension reduction of HSI. Besides the geometric structure of the data, sparsity has also been explored to construct the graph embedding model. Luo et al. proposed constructing a graph with the sparse coefficients that reveal the sparse properties of the data, from which the transformation matrix is obtained for feature reduction [31]. In addition, by regarding different band sets as different views of land covers, multi-view graph ensemble-based graph embedding has also been utilized to promote the performance of graph embedding for hyperspectral image classification [32].
A hypergraph is a generalization of a pairwise graph. Different from pairwise graphs, each edge in a hypergraph is capable of connecting more than two vertices [33]. Thus, the complex relationships of a dataset can be captured by a hypergraph, and hypergraphs have been gaining more and more attention in recent years. Bu et al. [34] presented a hypergraph-learning-based music recommendation method with the use of hyperedges to exploit complex social media information. A hypergraph semi-supervised learning model [35] was also proposed for image classification. Yuan et al. [36] utilized a hypergraph embedding model for HSI feature reduction, in which the spatial hypergraph models (SHs) are constructed by selecting the K nearest neighbors within the spatial region of the centroid pixel. Experimental results demonstrated that SH outperformed many existing feature extraction methods for HSI classification, including the raw spectral feature (RAW), PCA, LPP, LDA, nonparametric weighted feature extraction (NWFE) [37] and semi-supervised local discriminant analysis (SELD) [38]. However, SH is designed to learn the projection matrix for reducing the spectral feature. The spatial structure is not exploited for hypergraph embedding, which makes it incapable of simultaneously extracting spectral–spatial features. Furthermore, the hyperedge weights are computed in advance and fixed in the hypergraph embedding procedure. As discussed in [39,40], not all hyperedges have the same effect on the learning procedure; some hyperedges are not as informative as others. Hypergraph embedding should therefore be enhanced by estimating the hyperedge weights adaptively.
In order to cope with these issues, we propose a novel algorithm for HSI spatial–spectral joint feature extraction. We combine the EMP and spectral features and adopt the KNN method to construct a hypergraph, where each sample and its K nearest neighbors are enclosed in one hyperedge. Similar to [36], a linear projection matrix P can be learnt by solving the hypergraph embedding model. However, in [36], the hyperedge weights in the hypergraph embedding model are fixed. Inspired by [39,40], we introduce a scheme to update the weights adaptively so as to preserve the prominent hyperedges and further learn the low-dimensional structure, which helps improve the accuracy of the final HSI classification to a certain extent. Finally, the learned low-dimensional features are fed to the SVM for classification. The flowchart of the proposed method is shown in Figure 1. Experiments conducted on three widely used HSI datasets demonstrate that the proposed method achieves superior performance over many other feature extraction methods for HSI classification.
2. Hypergraph Model
Denote a hypergraph as $\mathbf{G}=(\mathbf{V},\mathbf{E},\mathbf{W})$, which consists of a set of vertices V, a family of hyperedges E and a weight matrix W of the hyperedges. Different from a pairwise graph (for convenience, we call it a simple graph in the following), every hyperedge ${e}_{i}$ can contain multiple vertices and is assigned a weight $w\left({e}_{i}\right)$. As shown in Figure 2b, hyperedge ${e}_{1}$ is composed of vertices ${v}_{1}$, ${v}_{2}$ and ${v}_{3}$; ${e}_{2}$ is composed of vertices ${v}_{3}$ and ${v}_{4}$; and ${e}_{3}$ is composed of vertices ${v}_{4}$, ${v}_{5}$, ${v}_{6}$ and ${v}_{7}$. W is a diagonal matrix of the hyperedge weights. The connection relationships of hypergraph G can be represented by an incidence matrix $\mathbf{H}\in {\mathbf{R}}^{|\mathbf{V}|\times |\mathbf{E}|}$, which can be defined as:
$${H}_{ij}=\mathbf{H}({v}_{i},{e}_{j})=\left\{\begin{array}{cc}1,\hfill & \mathrm{if}\phantom{\rule{4.pt}{0ex}}{v}_{i}\in {e}_{j},\hfill \\ 0,\hfill & \mathrm{if}\phantom{\rule{4.pt}{0ex}}{v}_{i}\notin {e}_{j}.\hfill \end{array}\right.$$
The degree of vertex v and hyperedge e can be respectively represented as:
$$d\left({v}_{i}\right)=\sum _{{e}_{j}\in \mathbf{E}}w\left({e}_{j}\right)\mathbf{H}({v}_{i},{e}_{j}),$$
$$\delta \left({e}_{j}\right)={\delta}_{j}=\sum _{{v}_{i}\in \mathbf{V}}\mathbf{H}({v}_{i},{e}_{j}).$$
According to the above definition, the main difference between hypergraphs and simple graphs is that every hyperedge can link more than two vertices. Therefore, a hypergraph is suitable for representing local group information and the high-order relationships of data. For example, considering the seven vertices in Figure 2b, they are attributed to three groups and the corresponding incidence matrix is shown in Figure 2c. When building a simple graph with these seven data points, the complex relations within each group are broken into multiple pairwise links. Some valuable information may be lost in this procedure; therefore, a simple graph cannot describe the group structure well.
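As a concrete illustration (not from the paper's code), the incidence matrix and degree computations of Equations (1)–(3) for the toy hypergraph of Figure 2b can be sketched in a few lines of NumPy:

```python
import numpy as np

# Toy hypergraph from Figure 2b (0-based indices): e1 = {v1, v2, v3},
# e2 = {v3, v4}, e3 = {v4, v5, v6, v7}.
hyperedges = [[0, 1, 2], [2, 3], [3, 4, 5, 6]]
n_vertices = 7

# Incidence matrix H, Equation (1): H[i, j] = 1 iff vertex i lies in hyperedge j.
H = np.zeros((n_vertices, len(hyperedges)))
for j, e in enumerate(hyperedges):
    H[e, j] = 1.0

w = np.ones(len(hyperedges))   # unit hyperedge weights, for illustration only
d_v = H @ w                    # vertex degrees, Equation (2)
delta_e = H.sum(axis=0)        # hyperedge degrees, Equation (3)
```

Note how $d(v_3)=d(v_4)=2$: these are exactly the two vertices shared by two hyperedges, which a simple graph would have to encode with separate pairwise links.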
3. Hypergraph Embedding of Spatial–Spectral Joint Features
As shown in Figure 1, our algorithm mainly consists of three steps: spatial–spectral joint feature construction, hypergraph embedding and SVM classification.
3.1. Spatial–Spectral Joint Feature Construction
Following [16], we first extract several PCs from the original HSI $\mathbf{I}\left(x\right)$ and then build an MP from each of the PCs:
$$\mathbf{MP}\left(x\right)=\left\{{\mathbf{CP}}_{n}\left(x\right),\dots ,\mathbf{I}\left(x\right),\dots ,{\mathbf{OP}}_{n}\left(x\right)\right\},$$
where n is the number of circular structural elements (SEs) with different radii, and ${\mathbf{OP}}_{n}\left(x\right)$ and ${\mathbf{CP}}_{n}\left(x\right)$ are the opening profile (OP) and the closing profile (CP) at pixel x with an SE of size n, respectively. Specifically, we have ${\mathbf{CP}}_{0}\left(x\right)={\mathbf{OP}}_{0}\left(x\right)=\mathbf{I}\left(x\right)$. The MP of $\mathbf{I}$ contains the original image $\mathbf{I}$, n opening profiles and n closing profiles; therefore, each MP is a $(2n+1)$-dimensional vector. Finally, all MPs are stacked together into one EMP:
$$\mathbf{EMP}\left(x\right)=\left\{{\mathbf{MP}}_{{\mathrm{PC}}^{1}}\left(x\right),{\mathbf{MP}}_{{\mathrm{PC}}^{2}}\left(x\right),\dots ,{\mathbf{MP}}_{{\mathrm{PC}}^{m}}\left(x\right)\right\},$$
where m represents the number of PCs. The EMP is therefore an $m(2n+1)$-dimensional vector.
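A minimal sketch of the MP/EMP construction of Equations (4) and (5), using SciPy's grey-scale morphology as a stand-in for the paper's implementation (the function names, the `disk` helper and the toy radii are our own illustrative choices):

```python
import numpy as np
from scipy.ndimage import grey_closing, grey_opening

def disk(radius):
    # Circular structural element (SE) of the given radius.
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x**2 + y**2 <= radius**2

def morphological_profile(pc, radii):
    # MP of one principal component, Equation (4):
    # {CP_n, ..., I, ..., OP_n} stacked into a (2n+1)-band cube.
    closings = [grey_closing(pc, footprint=disk(r)) for r in reversed(radii)]
    openings = [grey_opening(pc, footprint=disk(r)) for r in radii]
    return np.stack(closings + [pc] + openings, axis=0)

def extended_morphological_profile(pcs, radii):
    # EMP, Equation (5): the MPs of all m PCs joined into one m(2n+1)-band cube.
    return np.concatenate([morphological_profile(pc, radii) for pc in pcs], axis=0)
```

With $m$ PCs and $n$ radii, the output has exactly $m(2n+1)$ bands per pixel, matching the dimensionality stated above.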
After obtaining the EMP feature, we represent the spatial–spectral joint feature of the ith HSI pixel as
$${v}_{i}=\left[\begin{array}{c}{x}_{i}\hfill \\ \mathrm{EMP}\left({x}_{i}\right)\hfill \end{array}\right]\in {\mathbf{R}}^{m(2n+1)+d},$$
where d is the number of spectral bands. Denote the spectral feature matrix of the HSI as $\mathbf{X}=[{\mathbf{x}}_{1},{\mathbf{x}}_{2},\dots ,{\mathbf{x}}_{N}]\in {\mathbf{R}}^{d\times N}$ and the EMP matrix of the HSI as $\mathbf{EMP}=\left[\mathrm{EMP}\left({x}_{1}\right),\cdots ,\mathrm{EMP}\left({x}_{N}\right)\right]$, where ${x}_{i}$ is the ith pixel and N is the number of HSI pixels. Then, the joint feature matrix of the HSI can be represented as $\mathbf{V}=\left[\begin{array}{c}\mathbf{X}\hfill \\ \mathbf{EMP}\hfill \end{array}\right]\in {\mathbf{R}}^{(m(2n+1)+d)\times N}$.
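The dimension bookkeeping of Equation (6) amounts to stacking the spectral and EMP feature matrices column-wise; a toy sketch (all sizes are illustrative only):

```python
import numpy as np

# Toy sizes (illustrative only): d spectral bands, m PCs, n SE radii, N pixels.
d, m, n, N = 103, 4, 2, 6
X = np.random.rand(d, N)                  # spectral feature matrix, one column per pixel
EMP = np.random.rand(m * (2 * n + 1), N)  # EMP feature matrix (Equation (5))

# Joint spatial-spectral feature matrix V (Equation (6)); column i is v_i.
V = np.vstack([X, EMP])
```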
3.2. Hypergraph Embedding
We take each pixel of the HSI as a vertex and construct a hypergraph $\mathbf{G}=(\mathbf{V},\mathbf{E},\mathbf{W})$ to represent the correlations between HSI pixels. Each vertex ${v}_{i}$ is associated with the spatial–spectral joint feature defined in Equation (6). The hypergraph G is constructed by the K-nearest-neighbor method. In detail, each pixel ${\mathit{v}}_{i}$ and its K nearest neighbors are enclosed as hyperedge ${e}_{i}$. Thus, the hyperedge set $\mathbf{E}=\{{e}_{1},{e}_{2},\dots ,{e}_{N}\}$ contains N hyperedges. Meanwhile, the weight $w\left({e}_{i}\right)$ of hyperedge ${e}_{i}$ is defined as:
$$w\left({e}_{i}\right)=\sum _{{v}_{i},{v}_{j}\in {e}_{i}}\mathrm{exp}\left(-\frac{{\Vert {v}_{j}-{v}_{i}\Vert}_{2}^{2}}{2{\sigma}^{2}}\right),$$
where $\sigma $ is the mean distance between all vertices, calculated as $\sigma =\frac{1}{{N}^{2}}{\displaystyle \sum _{i}}{\displaystyle \sum _{j}}d\left({\mathit{v}}_{i},{\mathit{v}}_{j}\right)$, and $d\left({\mathit{v}}_{i},{\mathit{v}}_{j}\right)$ is the distance between vertices ${\mathit{v}}_{i}$ and ${\mathit{v}}_{j}$. The degree of vertex ${\mathit{v}}_{i}$ and the degree of hyperedge ${e}_{i}$ can be computed by Equations (2) and (3), respectively. Under this definition, a more "compact" hyperedge (local group) is assigned a higher weight.
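A sketch of the KNN hypergraph construction and the weight of Equation (7); `knn_hypergraph` is our own name, the brute-force pairwise distances are an illustrative choice, and we read the sum in Equation (7) as running over all member pairs of the hyperedge:

```python
import numpy as np

def knn_hypergraph(V, K):
    # KNN hypergraph on the joint features V (one column per pixel): hyperedge
    # e_i encloses pixel i and its K nearest neighbours, so H is N x N.
    N = V.shape[1]
    D = np.linalg.norm(V[:, :, None] - V[:, None, :], axis=0)  # pairwise distances
    sigma = D.mean()                       # mean distance between all vertices
    H = np.zeros((N, N))
    w = np.zeros(N)
    for i in range(N):
        members = np.argsort(D[i])[:K + 1]                     # pixel i plus K neighbours
        H[members, i] = 1.0
        # Hyperedge weight, Equation (7): Gaussian affinities over member pairs.
        sub = D[np.ix_(members, members)]
        w[i] = np.exp(-sub**2 / (2.0 * sigma**2)).sum()
    return H, w, sigma
```

For a real HSI, the $O(N^2)$ distance matrix would be replaced by an approximate nearest-neighbour search; the brute-force form is kept here only to mirror the equations directly.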
Denote ${\mathbf{D}}_{v}$ and ${\mathbf{D}}_{e}$ as two diagonal matrices of the vertex degrees and the hyperedge degrees, respectively, and $\mathbf{P}\in {\mathbf{R}}^{\left(m\left(2n+1\right)+d\right)\times u}$ (generally, $m\left(2n+1\right)+d\gg u$) as the linear projection matrix. The objective of the hypergraph embedding model is to learn the projection matrix $\mathbf{P}$ that reduces the feature dimension while preserving the geometric properties of the original space. The objective function is formulated as:
$$\begin{array}{c}\underset{{\mathbf{P}}^{T}\mathbf{V}{\mathbf{D}}_{v}{\mathbf{V}}^{T}\mathbf{P}=\mathbf{I}}{\mathrm{min}}\frac{1}{2}{\displaystyle \sum _{e\in \mathbf{E}}}{\displaystyle \sum _{{v}_{i},{v}_{j}\in e}}\frac{w\left(e\right)h\left({v}_{i},e\right)h\left({v}_{j},e\right)}{\delta \left(e\right)}{\Vert {\mathbf{P}}^{T}{v}_{i}-{\mathbf{P}}^{T}{v}_{j}\Vert}_{2}^{2}\hfill \\ =\frac{1}{2}{\displaystyle \sum _{k=1}^{N}}{\displaystyle \sum _{i,j=1}^{N}}\frac{{w}_{k}{h}_{ik}{h}_{jk}}{{\delta}_{k}}{\Vert {\mathbf{P}}^{T}{v}_{i}-{\mathbf{P}}^{T}{v}_{j}\Vert}_{2}^{2}\hfill \\ =\mathrm{trace}\left({\mathbf{P}}^{T}\mathbf{VL}{\mathbf{V}}^{T}\mathbf{P}\right),\hfill \end{array}$$
where $\mathbf{L}={\mathbf{D}}_{v}-\mathbf{H}\mathbf{W}{\mathbf{D}}_{e}^{-1}{\mathbf{H}}^{T}$ is the hypergraph Laplacian matrix. The constraint ${\mathbf{P}}^{T}\mathbf{V}{\mathbf{D}}_{v}{\mathbf{V}}^{T}\mathbf{P}=\mathbf{I}$ is used for scale normalization of the low-dimensional representations. This objective function enforces the constraint that if ${v}_{i}$ and ${v}_{j}$ are similar and belong to the same hyperedge, they should also be adjacent in the embedded space. In addition, an efficient hyperedge weight estimation scheme is proposed to preserve the prominent hyperedges. Assuming that $\mathit{w}={\left({w}_{1},{w}_{2},\dots ,{w}_{N}\right)}^{T}$ collects the elements on the main diagonal of W, we enforce ${\mathbf{1}}_{N}^{T}\mathit{w}=1$ and add an ${l}_{2}$-norm regularizer on $\mathit{w}$. Our proposed embedding model is then finally defined as:
$$\left\{{\mathbf{P}}^{*},{w}^{*}\right\}=\mathrm{arg}\underset{{\mathbf{P}}^{T}\mathbf{V}{\mathbf{D}}_{v}{\mathbf{V}}^{T}\mathbf{P}=\mathbf{I}}{\mathrm{min}}\left\{\mathrm{trace}\left({\mathbf{P}}^{T}\mathbf{VL}{\mathbf{V}}^{T}\mathbf{P}\right)+\lambda {\Vert w\Vert}^{2}\right\}\phantom{\rule{1em}{0ex}}\mathrm{s}.\mathrm{t}.\phantom{\rule{4.pt}{0ex}}{\mathbf{1}}_{N}^{T}w=1.$$
3.3. Optimization Algorithm
The objective function in Equation (9) is a multi-variable optimization problem, and it is non-convex with respect to w and $\mathbf{P}$ jointly. However, it is convex in either of them when the other is fixed. Thus, an alternating iteration strategy is adopted to solve Equation (9). We first initialize w according to Equation (7). With w fixed, we optimize P according to Equation (8). The solution of Equation (8) is given by the eigenvectors corresponding to the u largest eigenvalues of the matrix ${\left(\mathbf{VL}{\mathbf{V}}^{T}\right)}^{-1}\left(\mathbf{V}{\mathbf{D}}_{v}{\mathbf{V}}^{T}\right)$.
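The P-step above can be sketched as a generalised eigenproblem; `solve_projection` is our own name, and the small ridge added to the right-hand matrix is a numerical safeguard of ours, not stated in the paper:

```python
import numpy as np
from scipy.linalg import eigh

def solve_projection(V, H, w, u, ridge=1e-6):
    # P-step: with w fixed, the minimiser of Equation (8) is given by the
    # eigenvectors of (V L V^T)^{-1} (V D_v V^T) for the u largest eigenvalues,
    # solved here as the generalised problem (V D_v V^T) p = lam (V L V^T) p.
    delta = H.sum(axis=0)                         # hyperedge degrees
    Dv = np.diag(H @ w)                           # vertex degrees
    L = Dv - H @ np.diag(w / delta) @ H.T         # hypergraph Laplacian
    A = V @ Dv @ V.T
    B = V @ L @ V.T + ridge * np.eye(V.shape[0])  # ridge keeps B positive definite
    _, vecs = eigh(A, B)                          # eigenvalues in ascending order
    return vecs[:, -u:]                           # P: the u leading eigenvectors
```

`scipy.linalg.eigh(A, B)` solves $A p=\lambda B p$ directly, so the explicit inverse in the text is never formed.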
Next, fix P and optimize w:
$$\mathrm{arg}\underset{\mathit{w}}{\mathrm{min}}\left\{\mathrm{trace}\left({\mathbf{P}}^{T}\mathbf{VL}{\mathbf{V}}^{T}\mathbf{P}\right)+\lambda {\u2225\mathit{w}\u2225}^{2}\right\}\text{\hspace{1em}}\mathrm{s}.\mathrm{t}.\text{}{\mathbf{1}}_{N}^{T}\mathit{w}=1.$$
In this paper, we employ the Lagrangian method to optimize Equation (10). The Lagrangian function of the objective in Equation (10) is defined as:
$$\begin{array}{cc}\hfill \psi \left(\mathit{w},c\right)& =\mathrm{trace}\left({\mathbf{P}}^{T}\mathbf{VL}{\mathbf{V}}^{T}\mathbf{P}\right)+\lambda {\mathit{w}}^{T}\mathit{w}+c\left({\mathbf{1}}_{N}^{T}\mathit{w}-1\right)\hfill \\ & =\frac{1}{2}\sum _{k=1}^{N}\sum _{i,j=1}^{N}\frac{{w}_{k}{h}_{ik}{h}_{jk}}{{\delta}_{k}}{\Vert {\mathbf{P}}^{T}{v}_{i}-{\mathbf{P}}^{T}{v}_{j}\Vert}_{2}^{2}+\lambda {\mathit{w}}^{T}\mathit{w}+c\left({\mathbf{1}}_{N}^{T}\mathit{w}-1\right).\hfill \end{array}$$
The partial derivatives of $\psi $ w.r.t. ${w}_{k},k=1,2,\cdots ,N$ are given by:
$$\frac{\partial \psi \left(w,c\right)}{\partial {w}_{k}}=\frac{1}{2}\sum _{i,j=1}^{N}\frac{{h}_{ik}{h}_{jk}}{{\delta}_{k}}{\Vert {\mathbf{P}}^{T}{v}_{i}-{\mathbf{P}}^{T}{v}_{j}\Vert}_{2}^{2}+2\lambda {w}_{k}+c=0.$$
By simplifying Equation (12), ${w}_{k}$ can be calculated as:
$${w}_{k}=-\frac{\frac{1}{2}{\displaystyle \sum _{i,j=1}^{N}}\frac{{h}_{ik}{h}_{jk}}{{\delta}_{k}}{\Vert {\mathbf{P}}^{T}{v}_{i}-{\mathbf{P}}^{T}{v}_{j}\Vert}_{2}^{2}+c}{2\lambda}.$$
According to the constraint ${1}_{N}^{T}w=1$, the Lagrange multiplier can be calculated as:
$$c=-\frac{1}{N}\left[\frac{1}{2}\sum _{k=1}^{N}\sum _{i,j=1}^{N}\frac{{h}_{ik}{h}_{jk}}{{\delta}_{k}}{\Vert {\mathbf{P}}^{T}{v}_{i}-{\mathbf{P}}^{T}{v}_{j}\Vert}_{2}^{2}+2\lambda \right].$$
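The closed-form w-step of Equations (13) and (14) can be sketched as follows (`update_weights` is an illustrative name, and computing the embedded pairwise distances from the Gram matrix is our own vectorisation choice):

```python
import numpy as np

def update_weights(V, H, P, lam):
    # w-step, Equations (12)-(14): with P fixed, each hyperedge weight w_k has
    # a closed form derived from the Lagrangian of Equation (10).
    Y = P.T @ V                            # embedded features
    delta = H.sum(axis=0)                  # hyperedge degrees
    G = Y.T @ Y                            # Gram matrix of embedded features
    sq = np.diag(G)[:, None] + np.diag(G)[None, :] - 2.0 * G   # ||y_i - y_j||^2
    # S_k = (1/2) sum_{i,j} (h_ik h_jk / delta_k) ||P^T v_i - P^T v_j||^2
    S = 0.5 * np.einsum('ik,jk,ij->k', H, H, sq) / delta
    N = H.shape[1]
    c = -(S.sum() + 2.0 * lam) / N         # Lagrange multiplier, Equation (14)
    return -(S + c) / (2.0 * lam)          # w_k, Equation (13)
```

By construction the returned weights satisfy $\mathbf{1}_N^T w = 1$ exactly, and hyperedges that remain compact after projection (small $S_k$) receive larger weights.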
Following this iteration process, w and P are alternately optimized until the maximal iteration number is reached or the relative difference of the objective function value of Equation (9) is smaller than a given tolerance constant $\epsilon $, i.e.,
$$\frac{\left|f(t+1)-f\left(t\right)\right|}{\left|f\left(t\right)\right|}\leqslant \epsilon ,$$
where $f(t+1)$ and $f\left(t\right)$ are the objective function values of Equation (9) at iterations $t+1$ and t, respectively. We thereby obtain the final projection matrix ${\mathbf{P}}^{*}$. At last, the joint feature set $\mathbf{V}$ is reduced to a low-dimensional feature set $\mathbf{Y}=\left[{\left({\mathbf{P}}^{*}\right)}^{T}{\mathit{v}}_{1},\dots ,{\left({\mathbf{P}}^{*}\right)}^{T}{\mathit{v}}_{N}\right]$, which is then fed into an SVM classifier. Based on the above analysis, the proposed method is summarized in Algorithm 1.
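Putting the two steps together, the alternating scheme with the stopping rule of Equation (15) might look like the sketch below; the uniform initialisation, the ridge term and the non-negativity clipping of w are our own numerical safeguards, not part of the paper's derivation:

```python
import numpy as np
from scipy.linalg import eigh

def sshg_embed(V, H, u, lam=100.0, eps=1e-3, max_iter=20, ridge=1e-6):
    # Alternating optimisation of Equation (9): a P-step (generalised
    # eigenproblem) and a w-step (closed-form weights), stopped when the
    # relative change of the objective falls below eps (Equation (15)).
    N = H.shape[1]
    delta = H.sum(axis=0)                  # hyperedge degrees
    w = np.full(N, 1.0 / N)                # uniform initialisation (illustrative)
    prev = np.inf
    for _ in range(max_iter):
        # P-step: eigenvectors for the u largest generalised eigenvalues.
        Dv = np.diag(H @ w)
        L = Dv - H @ np.diag(w / delta) @ H.T
        B = V @ L @ V.T + ridge * np.eye(V.shape[0])
        _, vecs = eigh(V @ Dv @ V.T, B)    # ascending eigenvalues
        P = vecs[:, -u:]
        # w-step: closed form from Equations (13) and (14).
        Y = P.T @ V
        G = Y.T @ Y
        sq = np.diag(G)[:, None] + np.diag(G)[None, :] - 2.0 * G
        S = 0.5 * np.einsum('ik,jk,ij->k', H, H, sq) / delta
        c = -(S.sum() + 2.0 * lam) / N
        w = -(S + c) / (2.0 * lam)
        # Numerical safeguard (not in the paper): keep weights non-negative.
        w = np.maximum(w, 0.0)
        w /= w.sum()
        obj = S @ w + lam * (w @ w)        # objective value of Equation (9)
        if np.isfinite(prev) and abs(prev - obj) <= eps * abs(prev):
            break
        prev = obj
    return P, w
```

The returned columns of `P.T @ V` are the low-dimensional features Y that are subsequently passed to the SVM.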
Algorithm 1: The proposed method (denoted as SSHG*) for HSI classification.
4. Experiments and Discussion
4.1. Data Sets
In order to verify the performance of our proposed method, we conduct the experiments on the following three benchmark datasets.
(1) Indian Pines data set: acquired by the AVIRIS sensor over the Indian Pines test site in Northwestern Indiana, USA. The size of the image is 145 × 145 pixels with a spatial resolution of 20 m per pixel. Twenty water absorption bands (104–108, 150–163, 220) were removed, and the remaining 200-band image is used for the experiments. Sixteen classes of interest are considered.
(2) Pavia University data set: acquired by the ROSIS sensor during a flight campaign over Pavia, northern Italy. The size of the image is 610 × 340 pixels with a spatial resolution of 1.3 m per pixel. Twelve channels were removed due to noise, and the remaining 103 spectral bands are processed. Nine classes of interest are considered.
(3) Botswana data set: acquired by the NASA EO-1 satellite over the Okavango Delta, Botswana, in 2001. The size of the image is 1476 × 256 pixels with a spatial resolution of 30 m per pixel. Uncalibrated and noisy bands that cover water absorption features were removed, and the remaining 145 bands are used for the experiments. Fourteen classes of interest are considered.
4.2. Experimental Setting
In order to demonstrate the effectiveness of the adaptive weight estimation, we implement our algorithm in two versions. One is SSHG, which only utilizes the KNN hypergraph model for dimension reduction of the stacked feature set, without adaptive weight estimation. The other is SSHG*, shown in Algorithm 1. They are compared with the following feature extraction methods: (1) PCA applied to the spectral features (denoted as PCA); (2) the EMP features without dimension reduction (denoted as EMP); (3) the method of [17], stacking the EMP and the spectral features without dimension reduction (denoted as EMPSpe); and (4) the spatial hypergraph embedding method proposed in [36] (denoted as SH). To facilitate comparisons with these competing feature extraction methods, we adopt the overall accuracy (OA), the average accuracy (AA), the per-class accuracy and the Kappa coefficient ($\kappa $) to evaluate the classification performance. Furthermore, an SVM classifier with a Gaussian kernel is adopted to classify the feature data produced by all of the aforementioned feature extraction methods. A grid search is used to select the optimal penalty term and Gaussian kernel variance of the SVM within the sets $\left\{{2}^{-10},\dots,{2}^{10}\right\}$ and $\left\{{2}^{-10},\dots,{2}^{10}\right\}$, respectively. The one-against-all strategy is adopted for multi-class classification. For each of the three data sets, we randomly select 15 samples from each class to form the training set, and the remaining samples are used as the test set. The training sample selection and the classification process are repeated ten times to reduce the bias induced by random sampling, and we report the average results. The parameter settings of SH are the same as in the original paper [36]. With respect to our algorithm, the tolerance constant $\epsilon $ is set to $1\times {10}^{-3}$ and the regularization parameter $\lambda $ is set to 100. The number of nearest neighbors K is selected as 10, 15 and 5 for the Indian Pines, Pavia University and Botswana data sets, respectively.
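The SVM stage could be sketched with scikit-learn's grid search. Note this is an illustrative substitute: `fit_svm` is our own name, a coarse sub-grid of the $\{2^{-10},\dots,2^{10}\}$ sets is used for speed, and scikit-learn's multi-class handling differs slightly from the paper's one-against-all scheme:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

def fit_svm(Y_train, labels, n_folds=3):
    # RBF-kernel SVM on the embedded features; penalty C and kernel parameter
    # gamma are grid-searched over a coarse sub-grid of {2^-10, ..., 2^10}
    # (step 4 in the exponent, to keep the sketch fast).
    grid = {'C': 2.0 ** np.arange(-10, 11, 4),
            'gamma': 2.0 ** np.arange(-10, 11, 4)}
    search = GridSearchCV(SVC(kernel='rbf'), grid, cv=StratifiedKFold(n_folds))
    search.fit(Y_train.T, labels)          # samples are the columns of Y_train
    return search.best_estimator_
```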
4.3. Experimental Results
The classification results of various methods upon three types of HSI are reported in Table 1, Table 2 and Table 3, respectively. The best results are highlighted with bold fonts. The number in brackets corresponds to the optimal dimensionality of reduced features. Classification maps of these different approaches are shown in Figure 3, Figure 4 and Figure 5, respectively. According to the experimental results, our proposed method achieves the highest OA, AA, and $\kappa $ among all of the competing methods, which shows the effectiveness of our feature extraction algorithm. The effectiveness of our SSHG method owes much to the hypergraph embedding of spatial and spectral joint features.
Comparing the EMP and EMPSpe methods, we find that EMPSpe is always slightly better than EMP due to the fusion of EMP and spectral features for classification. As mentioned in [17], the stacked EMP and spectral features are transformed to low-dimensional features by the decision boundary feature extraction (DBFE) and NWFE methods before classification. However, DBFE and NWFE did not bring an effective improvement in performance. SH utilizes the hypergraph embedding model for feature reduction. Compared with PCA, the SH method has much better classification performance, which verifies the capacity of the hypergraph to capture the intrinsic complex relationships between HSI pixels. However, SH utilizes only the spectral similarity for finding the nearest neighbors within a given spatial region. The superiority of SSHG over SH demonstrates that the embedding of EMP and spectral features is better for HSI classification. Specifically, our SSHG method can extract the rich spatial structures in the Pavia University data and achieves the maximum improvement on this data set. SSHG* obtains better classification results than SSHG, which demonstrates that adaptive hyperedge weight estimation is also beneficial for improving the classification accuracy.
There are two parameters, i.e., K and u, in our proposed method. The parameter K is the number of nearest neighbors, which determines how many pixels are included in a hyperedge; u is the dimensionality of the embedded low-dimensional feature. To evaluate their effects on the classification performance, we conduct experiments on the above three datasets. We first fix the reduced dimensionality as $u=40$ and evaluate the influence of different K on the OA. As seen in Figure 6, when K is set to 10, 15 and 5 for the Indian Pines, Pavia University and Botswana data sets, respectively, the OA achieves its highest value. Taken as a whole, $[5,15]$ is usually a good range for the selection of K. We then fix K as 10, 15 and 5 for the three datasets, respectively, and evaluate the influence of different values of u on the OA. Figure 7 shows the changes of OA with the reduced dimensionality on the three datasets. The inflection point of the classification results is around dimensionality 25 for all three HSIs, and there is no significant improvement in the classification results as the dimensionality continues to grow.
5. Conclusions
In this paper, we propose a novel algorithm for spatial–spectral feature extraction based on hypergraph learning. A hypergraph is constructed by the KNN method and the embedding operation is conducted to transform the joint EMP and spectral features into a low-dimensional representation. Meanwhile, an efficient hyperedge weight estimation scheme is adopted to preserve the prominent hyperedges. Classification is performed with SVM using the embedded features. The experimental results on three benchmark hyperspectral datasets verify that our embedded representation can enhance the classification accuracy effectively. The hyperedge weight estimation can further improve the accuracy of HSI classification.
Acknowledgments
This work was supported in part by the Natural Science Foundation of China under Grant Numbers: 61672292, 61532009, 61622305, 61502238, 61300162 and, in part, by the Six Talent Peaks Project of Jiangsu Province, China, under Grant DZXX037.
Author Contributions
Yubao Sun and Sujuan Wang contributed equally to this work. They proposed the algorithm and performed the experiments. Qingshan Liu supervised the study, analyzed the results and gave insightful suggestions for the manuscript. Sujuan Wang and Yubao Sun drafted the manuscript. Guangcan Liu and Renlong Hang contributed to the revision of the manuscript. All authors read and approved the submitted manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
References
 Clement, A. Advances in remote sensing of agriculture: context description, existing operational monitoring systems and major information needs. Remote Sens. 2013, 5, 949–981. [Google Scholar] [CrossRef]
 Shafri, H.Z.M.; Taherzadeh, E.; Mansor, S.; Ashurov, R. Hyperspectral remote sensing of urban areas: An overview of techniques and applications. Res. J. Appl. Sci. Eng. Technol. 2012, 4, 1557–1565. [Google Scholar]
 Abbate, G.; Fiumi, L.; De Lorenzo, C.; Vintila, R. Avaluation of remote sensing data for urban planning. Applicative examples by means of multispectral and hyperspectral data. In Proceedings of the GRSS/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas, Berlin, Germany, 22–23 May 2003; pp. 201–205. [Google Scholar]
 Wu, Z.; Wang, Q.; Plaza, A.; Li, J.; Sun, L. Parallel spatialspectral hyperspectral image classification with sparse representation and markov random fields on GPUs. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2926–2938. [Google Scholar] [CrossRef]
 Yuan, Y.; Lin, J.; Wang, Q. Hyperspectral image classification via multitask joint sparse representation and stepwise MRF optimization. IEEE Trans. Cybern. 2016, 46, 2966–2977. [Google Scholar] [CrossRef] [PubMed]
 Wang, Q.; Lin, J.; Yuan, Y. Salient band selection for hyperspectral image classification via manifold ranking. IEEE IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 1279–1289. [Google Scholar] [CrossRef] [PubMed]
 Hang, R.; Liu, Q.; Sun, Y.; Yuan, X.; Pei, H.; Plaza, J.; Plaza, A. Robust matrix discriminative analysis for feature extraction from hyperspectral images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 2002–2011. [Google Scholar] [CrossRef]
 Wu, Z.; Li, Y.; Plaza, A.; Li, J.; Xiao, F.; Wei, Z. Parallel and distributed dimensionality reduction of hyperspectral data on cloud computing architectures. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 2270–2278. [Google Scholar] [CrossRef]
Sun, Y.; Hang, R.; Liu, Q.; Zhu, F.; Pei, H. Graph-regularized low-rank representation for aerosol optical depth retrieval. Int. J. Remote Sens. 2016, 37, 5749–5762. [Google Scholar] [CrossRef]
Fauvel, M.; Tarabalka, Y.; Benediktsson, J.A.; Chanussot, J.; Tilton, J.C. Advances in spectral-spatial classification of hyperspectral images. Proc. IEEE 2013, 101, 652–675. [Google Scholar] [CrossRef]
Yuan, Y.; Lin, J.; Wang, Q. Dual-clustering-based hyperspectral band selection by contextual analysis. IEEE Trans. Geosci. Remote Sens. 2016, 54, 1431–1445. [Google Scholar] [CrossRef]
 Kettig, R.L.; Landgrebe, D.A. Classification of multispectral image data by extraction and classification of homogeneous objects. IEEE Trans. Geosci. Electron. 1976, 14, 19–26. [Google Scholar] [CrossRef]
Descombes, X.; Sigelle, M.; Prêteux, F. GMRF parameter estimation in a non-stationary framework by a renormalization technique: Application to remote sensing imaging. IEEE Trans. Image Process. 1999, 8, 490–503. [Google Scholar] [CrossRef] [PubMed]
Jackson, Q.; Landgrebe, D.A. Adaptive Bayesian contextual classification based on Markov random fields. IEEE Trans. Geosci. Remote Sens. 2002, 40, 2454–2463. [Google Scholar] [CrossRef]
Pesaresi, M.; Benediktsson, J.A. A new approach for the morphological segmentation of high-resolution satellite imagery. IEEE Trans. Geosci. Remote Sens. 2001, 39, 309–320. [Google Scholar] [CrossRef]
 Benediktsson, J.A.; Palmason, J.A.; Sveinsson, J.R. Classification of hyperspectral data from urban areas based on extended morphological profiles. IEEE Trans. Geosci. Remote Sens. 2005, 43, 480–491. [Google Scholar] [CrossRef]
Fauvel, M.; Benediktsson, J.A.; Chanussot, J.; Sveinsson, J.R. Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles. IEEE Trans. Geosci. Remote Sens. 2008, 46, 3804–3814. [Google Scholar] [CrossRef]
Guo, X.; Huang, X.; Zhang, L. Three-dimensional wavelet texture feature extraction and classification for multi/hyperspectral imagery. IEEE Geosci. Remote Sens. Lett. 2014, 11, 2183–2187. [Google Scholar] [CrossRef]
Li, L.; Marpu, P.R.; Plaza, A.; Bioucas-Dias, J.M. Generalized composite kernel framework for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2013, 51, 4816–4829. [Google Scholar] [CrossRef]
 Hang, R.; Liu, Q.; Song, H.; Sun, Y. Matrixbased discriminant subspace ensemble for hyperspectral image spatial–spectral feature fusion. IEEE Trans. Geosci. Remote Sens. 2016, 54, 783–794. [Google Scholar] [CrossRef]
Fang, L.; Li, S.; Kang, X.; Benediktsson, J.A. Spectral-spatial hyperspectral image classification via multiscale adaptive sparse representation. IEEE Trans. Geosci. Remote Sens. 2014, 52, 7738–7749. [Google Scholar] [CrossRef]
 Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep learningbased classification of hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. 2014, 7, 2094–2107. [Google Scholar] [CrossRef]
Du, Q. Modified Fisher’s linear discriminant analysis for hyperspectral imagery. IEEE Geosci. Remote Sens. Lett. 2007, 4, 503–507. [Google Scholar] [CrossRef]
Fauvel, M.; Chanussot, J.; Benediktsson, J.A. Kernel principal component analysis for the classification of hyperspectral remote sensing data over urban areas. EURASIP J. Adv. Signal Process. 2009, 2009, 1–14. [Google Scholar] [CrossRef]
Li, W.; Prasad, S.; Fowler, J.E. Decision fusion in kernel-induced spaces for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2014, 52, 3399–3411. [Google Scholar] [CrossRef]
 He, X.; Niyogi, P. Locality preserving projections. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Vancouver, BC, Canada, 8–13 December 2003; pp. 186–197. [Google Scholar]
 Villa, A.; Benediktsson, J.A.; Chanussot, J.; Jutten, C. Hyperspectral image classification with independent component discriminant analysis. IEEE Trans. Geosci. Remote Sens. 2011, 49, 4865–4876. [Google Scholar] [CrossRef]
 Mura, M.D.; Villa, A.; Benediktsson, J.A.; Chanussot, J. Classification of hyperspectral images by using extended morphological attribute profiles and independent component analysis. IEEE Geosci. Remote Sens. Lett. 2011, 8, 542–546. [Google Scholar] [CrossRef]
 Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323–2326. [Google Scholar] [CrossRef] [PubMed]
 Yan, S.; Xu, D.; Zhang, B.; Zhang, H.J. Graph embedding and extensions: A general framework for dimensionality reduction. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 40–51. [Google Scholar] [CrossRef] [PubMed]
 Luo, F.; Huang, H.; Liu, J.; Ma, Z. Fusion of graph embedding and sparse representation for feature extraction and classification of hyperspectral imagery. Photogramm. Eng. Remote Sens. 2017, 83, 37–46. [Google Scholar] [CrossRef]
 Chen, P.; Jiao, L.; Liu, F.; Zhao, J.; Zhao, Z. Dimensionality reduction for hyperspectral image classification based on multiview graphs ensemble. J. Appl. Remote Sens. 2016, 10, 030501. [Google Scholar] [CrossRef]
Zhou, D.; Huang, J.; Schölkopf, B. Learning with hypergraphs: Clustering, classification, and embedding. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 3–6 December 2007; pp. 1601–1608. [Google Scholar]
Bu, J.; Tan, S.; Chen, C.; Wang, C.; Wu, H.; Zhang, L.; He, X. Music recommendation by unified hypergraph: Combining social media information and music content. In Proceedings of the 18th ACM International Conference on Multimedia, Firenze, Italy, 25–29 October 2010; pp. 391–400. [Google Scholar]
Liu, Q.; Sun, Y.; Wang, C.; Liu, T.; Tao, D. Elastic net hypergraph learning for image clustering and semi-supervised classification. IEEE Trans. Image Process. 2017, 26, 452–463. [Google Scholar] [CrossRef] [PubMed]
Yuan, H.; Tang, Y.Y. Learning with hypergraph for hyperspectral image feature extraction. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1695–1699. [Google Scholar] [CrossRef]
 Kuo, B.C.; Landgrebe, D.A. Nonparametric weighted feature extraction for classification. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1096–1105. [Google Scholar] [CrossRef]
Liao, W.; Pizurica, A.; Scheunders, P.; Philips, W.; Pi, Y. Semi-supervised local discriminant analysis for feature extraction in hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2013, 51, 184–198. [Google Scholar] [CrossRef]
 Pliakos, K.; Kotropoulos, C. Weight estimation in hypergraph learning. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, South Brisbane, Australia, 19–24 April 2015; pp. 1161–1165. [Google Scholar]
Gao, Y.; Wang, W.; Zha, Z.J.; Shen, J.; Li, X.; Wu, X. Visual-textual joint relevance learning for tag-based social image search. IEEE Trans. Image Process. 2013, 22, 363–376. [Google Scholar] [CrossRef] [PubMed]
Figure 2.
An example of a graph and a hypergraph: (a) a simple graph, in which each edge connects exactly two data points; (b) a hypergraph G, in which each hyperedge is marked by an ellipse and consists of at least two data points; (c) taking the seven vertices as an example, H is the incidence matrix of G, whose entries are usually binary.
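As an illustrative sketch of the binary incidence matrix H above, the following code builds a K-nearest-neighbor hypergraph over a small point set, where each vertex and its K nearest neighbors form one hyperedge. The function name `knn_hypergraph_incidence` and the toy data are our own illustration, not from the paper, and the paper's EMP and spectral features would replace the toy coordinates in practice.

```python
import numpy as np

def knn_hypergraph_incidence(X, k=2):
    """Build a binary hypergraph incidence matrix by k-nearest neighbors.

    Each vertex (row of X) spawns one hyperedge containing itself and its
    k nearest neighbors, so H is n x n with H[v, e] = 1 when vertex v
    belongs to hyperedge e.
    """
    n = X.shape[0]
    # pairwise squared Euclidean distances between all vertices
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    H = np.zeros((n, n), dtype=int)
    for e in range(n):
        # the centroid vertex itself (distance 0) plus its k nearest neighbors
        members = np.argsort(d2[e])[: k + 1]
        H[members, e] = 1
    return H

# toy example: 7 points in 2-D, k = 2, so every hyperedge has 3 vertices
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [10, 10]], float)
H = knn_hypergraph_incidence(X, k=2)
print(H.sum(axis=0))  # each column (hyperedge) sums to k + 1 = 3
```

With n pixels this yields n hyperedges of size K + 1, matching the caption's point that hyperedges group more than two data points, unlike the edges of a simple graph.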
Figure 3.
Indian Pines. (a) Three-channel color composite image with bands 65, 52, and 36; (b,c) ground-truth map and class labels; (d–i) classification maps of PCA, EMP, EMPSpe, SH, SSHG, and SSHG*, respectively.
Figure 4.
Pavia University. (a) Three-channel color composite image with bands 102, 56, and 31; (b,c) ground-truth map and class labels; (d–i) classification maps of PCA, EMP, EMPSpe, SH, SSHG, and SSHG*, respectively.
Figure 5.
Botswana. (a) Three-channel color composite image with bands 65, 52, and 36; (b,c) ground-truth map and class labels; (d–i) classification maps of PCA, EMP, EMPSpe, SH, SSHG, and SSHG*, respectively.
Figure 6.
Effect of the number of nearest neighbors K on the overall accuracy (OA). (a) Indian Pines; (b) Pavia University; (c) Botswana.
Class  PCA (25)  EMP (27)  EMPSpe (227)  SH (22)  SSHG (44)  SSHG* (44) 

1  91.61  98.71  99.03  94.87  98.06  98.06 
2  47.36  61.46  64.28  82.59  72.53  73.96 
3  48.60  78.75  77.14  73.50  84.06  84.85 
4  68.29  95.90  91.76  91.32  96.76  97.21 
5  75.75  87.78  88.85  92.12  89.83  90.32 
6  85.37  91.48  92.36  98.22  93.93  94.04 
7  91.54  99.23  99.23  100  100  100 
8  79.52  98.47  98.92  98.31  99.57  99.63 
9  96.00  100  100  100  100  100 
10  56.22  74.23  71.61  87.51  76.81  77.68 
11  49.62  69.51  71.02  64.41  75.65  75.57 
12  45.43  75.67  77.40  84.31  84.33  84.79 
13  93.47  98.68  99.00  99.49  99.37  99.37 
14  69.55  93.25  94.83  94.84  97.57  97.58 
15  46.42  95.96  95.85  75.07  97.74  97.76 
16  89.62  97.56  98.46  98.75  99.74  99.87 
OA  58.90  79.14  79.88  82.33  84.36  84.75 
AA  70.90  88.54  88.73  89.71  91.62  91.92 
kappa  53.88  76.42  77.24  80.06  82.27  82.73 
Class  PCA (10)  EMP (27)  EMPSpe (130)  SH (30)  SSHG (46)  SSHG* (46) 

1  66.21  82.40  81.57  70.33  81.67  82.70 
2  65.14  83.44  84.09  82.13  92.02  91.44 
3  70.00  77.04  77.79  72.37  80.47  80.08 
4  85.26  97.42  97.44  89.58  93.93  94.90 
5  99.37  99.76  99.75  99.61  99.79  99.80 
6  69.16  78.91  80.16  91.76  86.50  89.63 
7  90.45  94.07  93.28  92.68  94.16  94.44 
8  71.34  86.12  85.30  72.16  83.07  84.06 
9  99.72  96.04  97.44  99.51  98.26  98.15 
OA  70.59  84.77  85.05  81.88  89.01  89.43 
AA  79.63  88.35  88.53  85.57  89.99  90.58 
kappa  63.20  80.38  80.78  76.80  85.64  86.24 
Class  PCA (22)  EMP (27)  EMPSpe (172)  SH (25)  SSHG (34)  SSHG* (34) 

1  100  99.92  99.89  100  100  100 
2  96.51  100  97.99  100  99.68  98.05 
3  96.19  94.79  95.85  99.15  96.76  100 
4  99.00  95.85  98.83  99.50  98.41  93.27 
5  81.10  79.76  82.32  82.86  91.79  96.38 
6  69.29  81.73  88.34  81.89  96.37  99.22 
7  96.31  97.70  99.20  98.77  99.72  99.95 
8  98.40  99.63  99.48  99.47  100  97.42 
9  79.93  92.34  94.47  96.32  98.86  99.79 
10  95.28  98.33  97.98  99.57  99.92  97.97 
11  83.45  97.24  95.19  97.59  94.97  99.88 
12  93.98  99.94  99.88  88.55  100  99.49 
13  89.33  99.60  98.37  94.47  99.92  99.75 
14  98.75  99.25  98.35  100  91.36  99.63 
OA  89.83  94.69  95.65  95.10  97.79  98.38 
AA  91.25  95.43  96.15  95.58  97.70  98.63 
kappa  88.98  94.24  95.36  94.68  97.60  98.24 
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( http://creativecommons.org/licenses/by/4.0/).