Multi-Feature Manifold Discriminant Analysis for Hyperspectral Image Classification

Hyperspectral image (HSI) provides both spatial structure and spectral information for classification, but many traditional methods simply concatenate spatial features and spectral features together that usually lead to the curse-of-dimensionality and unbalanced representation of different features. To address this issue, a new dimensionality reduction (DR) method, termed multi-feature manifold discriminant analysis (MFMDA), was proposed in this paper. At first, MFMDA explores local binary patterns (LBP) operator to extract textural features for encoding the spatial information in HSI. Then, under graph embedding framework, the intrinsic and penalty graphs of LBP and spectral features are constructed to explore the discriminant manifold structure in both spatial and spectral domains, respectively. After that, a new spatial-spectral DR model for multi-feature fusion is built to extract discriminant spatial-spectral combined features, and it not only preserves the similarity relationship between spectral features and LBP features but also possesses strong discriminating ability in the low-dimensional embedding space. Experiments on Indian Pines, Heihe and Pavia University (PaviaU) hyperspectral data sets demonstrate that the proposed MFMDA method performs significantly better than some state-of-the-art methods using only single feature or simply stacking spectral features and spatial features together, and the classification accuracies of it can reach 95.43%, 97.19% and 96.60%, respectively.

Recent investigations have demonstrated that combining spatial and spectral information is beneficial to the feature extraction and classification of HSI data [20][21][22][23][24][25][26][27][28].In recent years, many effective spatial-based features have been proposed by concerning structure, shape, texture, geometric, etc. Li et al. [29] extracted textural features of HSI using LBP operator and then classified them through extreme learning machine (ELM).Mauro et al. [30] designed an extended multi-attribute profiles (EMAP) algorithm to explore morphological features from HSI, and the extracted features were classified by a random forest classifier.Li et al. [31] introduced generalized composite kernel machines to explore spatial information through EMAP, then used the multinomial logistic regression for classification.However, in real applications, it is impossible to find a single feature that is suitable for different image scenes due to the variety and irregular distribution of ground objects.The conventional method for addressing this issue is to explore a feature stacking (FS) approach for the combination of different types of features.Li et al. [32] tried to get combined features by fusing spectral features and EMAP features which improved the classification accuracy of HSI.Song et al. [33] used LBP operator for extracting textural features, and then stacked spectral features and textural features for classification.However, the feature stacking method commonly poses the problem of the curse-of-dimensionality for the increase in the dimension of stacked features, and thus such methods do not necessarily ensure better performance for HSI classification.Therefore, an urgent challenge in multi-feature classification of HSI data is how to reduce the dimension of spatial and spectral combined features largely with some valuable intrinsic information preserved [34].
To solve this problem, many DR methods have been proposed to reduce the number of bands and obtain some desired information in HSI [35][36][37][38].Principal component analysis (PCA) and Linear Discriminant Analysis (LDA) are two classical DR methods [39,40].However, the two subspace methods cannot analyze the data that lies on or near a submanifold embedded in the original space.Therefore, the graph-based manifold learning methods have attracted wide attention recently [41].Such methods include isometric mapping (Isomap), Laplacian eigenmaps (LE), locality preserving projections (LPP), locally linear embedding (LLE), neighborhood preserving embedding (NPE), and local tangent space alignment (LTSA) [42][43][44][45][46][47].These graph embedding (GE) methods are unsupervised learning methods without using the discriminant information in training samples.Some supervised learning methods were designed to explore the label information of training data to enhance the discriminating ability for classification, such as marginal Fisher analysis (MFA), locality sensitive discriminant analysis (LSDA), coupled discriminant multi-manifold analysis (CDMMA), and local geometric structure Fisher analysis (LGSFA) [48][49][50][51].However, the above DR methods only make use of spectral features in HSI, while it is commonly accepted that exploiting multiple features, spectral, texture and shape features, brings significant benefits in terms of improving the classification performance.
To explore DR of multiple features for HSI classification, Fauvel et al. [52] used PCA to reduce the dimension of EMAP features and stacked them with spectral features to form fused feature vectors.Huo et al. [53] selected the first three PCs of HSI to extract Gabor textures, then concatenated Gabor textures and spectral features from the same pixel to form the combined feature for classification.However, the above multi-feature-based methods simply stacked the reduced spectral and spatial features together after applying DR on the different types of features, respectively.The embedding features are obtained in different subspaces that cannot ensure global optimization.Furthermore, the direct stacking strategy may lead to unbalanced representation of different features.
To overcome above drawbacks, we propose a novel DR algorithm termed multi-feature manifold discriminant analysis for HSI data.The MFMDA method first exploits the spatial information in HSI by extracting LBP textural features.Then it constructs the intrinsic graphs and penalty graphs of spectral features and LBP features within GE framework, which can effectively discover the manifold structure of spatial features and spectral features.After that, MFMDA learns low-dimensional embedding space from original spectral features as well as LBP features for compacting the intramanifold samples while separating intermanifold samples, which will increase the margins between different manifolds.As a result, the spatial-spectral embedding features possess stronger discriminating ability for HSI classification.Experimental results on three real hyperspectral data sets show that the proposed MFMDA algorithm can significantly improve the classification accuracy compared with some state-of-art DR methods, especially in the case of limited training samples are available.
The remainder of this paper is organized as follows.In Section 2, we briefly introduce the spectral features, textural features, and the GE framework.The details of our algorithm are introduced in Section 3. Section 4 gives experimental results to demonstrate the effectiveness of our algorithm.We give some concluding remarks and suggestions for further work in Section 5.

Spectral and LBP Features of HSI
Spectral and textural information are the fundamental properties of hyperspectral imagery.Spectral information provides densely sampled reflectance values over a wide range of the electro-magnetic spectrum to distinguish similar materials, while texture is a typical spatial feature which gives a description of the homogeneity of an image using the texture element as the fundamental unit.Recent studies show that combining spatial context into pixel-based spectral classification can substantially improve the classification performance of HSI [54].
Local binary pattern is a discriminative and computationally local texture descriptor that has shown promising performance in classification.The original LBP operator represents the pixels of an image with binary numbers called LBP codes, which encode the local structure around each pixel, and then the codes are used for further analysis [29].The procedure of it is shown in Figure 1, where the 10th band of PaviaU hyperspectral image is used to extract LBP features.As in Figure 1, for a given center pixel in a 3 × 3 window, the neighbor pixels are assigned with binary labels ("0" or "1"), depending on whether the gray value of center pixel is larger or not.An 8-digit binary number can be obtained by concatenating all these binary codes in a clockwise direction starting from the top-left one, and the derived binary numbers are referred to as LBP code.According to the aforementioned analysis, spectral features and LBP features represent the information in HSI from different perspectives.Spectral features provide continuous spectral measurement across the entire electromagnetic spectrum, while LBP features present a better expression of detailed local spatial features, such as edges, corners, and knots.Thus, it is promising to apply LBP features as a supplement to spectral features that lack the consideration of spatial relations between pixels in HSI.However, both spectral features and LBP features are characterized by high dimensionality.A common approach to address the problem is to explore DR methods which will reduce the dimension of high-dimensional features largely without loss of information.

Graph Embedding
The GE framework is explored to unify many classical DR algorithms such as PCA, LDA, ISOMAP, LLE, LE, LPP and NPE.In GE, an intrinsic graph is constructed to characterize the statistical or geometrical properties that need to be preserved, and a penalty graph is explored to describe some properties which should be avoided.The intrinsic graph G I (X, W w ) and the penalty graph G p (X, W b ) are both undirected weighted graphs, where X is the vertex set of graph, W w ∈ n×n and W b ∈ n×n are the weight matrices of G I and G P , respectively.w w ij indicates the similarity between vertices x i and x j in G I , while w b ij measures the dissimilarity of vertices x i and x j in G P .Under this framework, MFA has been proposed for dimensionality reduction of high-dimensioanl data.In MFA, G I connects each point with its neighbors from the same class to represent intraclass compactness, and G P connects the neighbor points which from different classes to represent the interclass separability.In low-dimensional embedding space, the intraclass compactness and interclass separability should be enhanced.Therefore, the optimal projection matrix V can be obtained with the following optimization problem: where L = D w − W w is the Laplacian matrix of graph G I , W w = [w w ij ] n i,j=1 , D w is a diagonal matrix,

Proposed Approach
Let us suppose that a hyperspectral data set X= [x 1 ,x 2 ,x 3 , • • • , x N ] ∈ D×N , where D is the number of bands and N indicates the number of pixels in HSI data.
denote the spectral features and LBP features of X, respectively.The class label of x i is indicated by

Motivation
Since different types of features represent HSI data from different perspectives, multiple feature fusion will bring benefits to enhance the discrimination capability for classification.The most common way to combine these features is to simply concatenate different types of features together, and then a classifier is employed to classify the stacked features.However, such stack-based methods have witnessed limited performance due to the simple strategy, and they may even perform worse than using a single feature in HSI data.The reasons for this phenomenon are summarized as follows:

•
Simply stacking spatial and spectral features may yield redundant information, and it remains difficult to achieve an optimal combination for different kinds of features;

•
The spatial information and spectral information is not equally represented by simply stacking;

•
The stacked features greatly increase the dimensionality of spatial-spectral combined features, this will make HSI classification fairly challenging for the curse-of-dimensionality problem, especially when only limited training samples are available.
Many DR methods have been explored to reduce the dimension of stacked features.However, different types of features usually lie on different manifolds.Performing dimensionality reduction directly on the simply stacked features cannot reveal the manifold structure of different features in HSI.As a result, the discriminant information contained by different features is not effectively represented, which will restrict their discriminant capability for classification.
To overcome the shortcomings as discussed above, a new DR method called MFMDA is introduced in next section.By exploring the manifold structures of different features, it can effectively extract the spatial-spectral combined features and subsequently improve the classification performance of HSI.

MFMDA
The goal of the proposed MFMDA method is to find an optimized projection matrix which can couple dimensionality reduction and data fusion of original features (from HSI data) and spatial features (LBP features generated from HSI) based on GE framework.MFMDA simultaneously learns a low-dimensional embedding space from original spectral features as well as LBP features for compacting the intramanifold samples while separating intermanifold samples, which will increase the margins between different manifolds.As a result, the obtained embedding features possess stronger discriminating ability that helps to subsequent classification.The flowchart of MFMDA is shown in Figure 2. As illustrated in Figure 2, due to the fact that the similarity relationship between spectral features and LBP features from the same pixel should be preserved in the low-dimensional embedding space.Let us assume A S ∈ D×d and A L ∈ D×d are the corresponding projection matrices of spectral features and LBP features, respectively.A S and A L should be explored to minimize the distance between the two embedding features from the same pixel, and the objective function can be defined as follows:

Low-dimensional embedding space Textural Features Adjacent Graphs different features of class 1 different features of class 2 Similarity Preserving
With some mathematical operations, Equation ( 2) can be reduced as: where A S and A L are respectively parameterized as A S = X S B and A L = X L C, B and C are projection matrices that map spectral information and texture information in high-dimensional features to the low-dimensional embedded space, respectively.
From the view point of classification, in the low-dimensional embedding space, we expect that the samples are as close as possible if they belong to the same manifold, while samples are as far as possible if they are from different manifolds.To achieve this goal, we define the objective function as follows: where w w s ij and w b s ij are the affinity weights to characterize the similarity between spectral features x s i and x s j of intrinsic graph G S I as well as the dissimilarity between x s i and x s j of the penalty graph G S P , w w l ij and w b l ij are the affinity weights to characterize the similarity between LBP features x l i and x l j of the intrinsic graph G L I and the dissimilarity between x l i and x l j of the penalty graph G L P , respectively.In the intrinsic graph G S I of spectral features, the vertices x s i and x s j are connected by an edge if l(x i ) = l(x j ) and they are close to each other in terms of some distance.When it comes to the penalty graph G S P , the vertices x s i and x s j are connected by an edge if l(x i ) = l(x j ) and x s j belongs the k b nearest neighbors of x s i .The weights w w s ij and w b s ij in two spectral-based graphs are defined as: where N s,w (x s i ) is the n w -intramanifold neighbors of spectral feature x s i , N s,b (x s i ) indicates the n b -intermanifold neighbors of x s i , and In the intrinsic graph G L I of LBP features, an edge is added between the vertices x l i and x l j if l(x i ) = l(x j ) and x l j belongs the k w nearest neighbors of x l i ; in the penalty graph G L P , an edge is connected by x l i and x l j if l(x i ) = l(x j ) and x l j belongs the k b nearest neighbors of x l i .The weights w w l ij and w b l ij in two LBP-based graphs can be set as: where N l,w (x l i ) is the n w -intramanifold neighbors of spectral feature x l i , N l,b (x l i ) indicates the n b -intermanifold neighbors of x l i , and The objective function of J 2 (A S , A L ) in Equation ( 4) is to minimize the intramanifold distance to ensuring the samples from the same manifold should be as close as possible, and the objective function of J 3 (A S , A L ) in Equation ( 5) is to maximize the intermanifold distance for enlarging the manifold margins in the low-dimensional embedding space.
With some mathematical operations, Equations ( 4) and ( 5) can be reduced as: where ).As discussed, the MFMDA method not only preserves the similarity relationship between spectral features and LBP features but also possesses strong discriminating ability in the low-dimensional embedding space.Therefore, a reasonable criterion for choosing a good projection matrix is to optimize the following objective functions: The multi-objective function optimization problem in Equation ( 12) can be equivalent to: where α, β > 0 are two tradeoff parameters which can adjust intramanifold compactness and intermanifold separability, A constraint A T EE T A = I is imposed to remove an arbitrary scaling factor in the projection, and the objective function can be recast as follows: With the method of Lagrangian multiplier, the optimization solution is formulated as where λ is the Lagrangian multiplier.Then the optimization problem is transformed to solve a generalized eigenvalue problem, i.e., where the optimal projection matrix A = [a 1 , a 2 , • • • , a d ] is composed of d minimum eigenvalues of Equation ( 16) corresponding eigenvectors.Then the low-dimensional feature is given by: The proposed MFMDA algorithm is summarized in Algorithm 1.
the number of intraclass neighbor points n w and the number of interclass neighbor points n b , balance parameter α and β.
1: Get LBP features generated from the data set, denote the spectral and LBP features.
2: Find the n w intraclass neighbor points and n b interclass neighbor points of spectral features and LBP features, respectively.
3: Calculate the edge weights of the intrinsic and penalty graphs by Equations ( 6)-( 9).
), respectively.5: Obtain the Lagrangian matrix which contain manifold structure through according to Equation ( 16).
8: Obtain the projection matrix of spectral and LBP features by A = B 0 0 C , A S = X S B and 9: Achieve the low-dimensional features Y through Equation (17).
Projection matrix of spectral and LBP features:

Experimental Results and Discussion
In this section, experiments are conducted on three real HSI data sets to evaluate the effectiveness of the proposed MFMDA method.

Experimental Setup
In each experiment, the HSI data set was randomly divided into training and test sets.For the classes that are very small, i.e., Alfalfa, Grass/pasture-mowed, and Oats in Indian Pines data set, the number of training samples was set to 10 per class.The training samples are used to learn a low-dimensional embedding space, then all test samples are mapped into the embedding space for extracting low-dimensional features.After that, support vector machine (SVM) with the radial basis function (RBF) kernel were used to classify test samples, and the library for SVM (LibSVM) Toolbox was employed to implement SVM [57].The parameters for SVM were optimized by a grid search.The classification accuracy for each class, overall classification accuracies (OAs), average classification accuracies (AAs), and kappa coefficient (k) are used to evaluate the classification performance of different DR methods.To robustly evaluate the performance of different methods under different conditions, we repeated the experiments 10 times in each condition, and presented the results in the form of mean with standard deviation (STD).
The proposed MFMDA algorithm was compared with some state-of-art DR algorithms including Baseline, PCA [39], LDA [40], NPE [46], LPP [44], MFA [48] and LGSFA [51], where Baseline represents that test samples are classified directly by a classifier without dimensionality reduction.To verify the effectiveness of MFMDA, the above DR algorithms were applied to spectral features, LBP features and stacked features, respectively.Notice that LBP features are obtained by the "uniform LBP" pattern, and the ratio of neighborhood radius and the number of sampling points are set to 1 and 8, respectively [58].The stacked features refer to stack original spectral features and LBP features after applying normalization.
For all methods, the parameters are optimized by using cross validation to achieve good results.The numbers of neighbors for NPE and LPP is set to 9. For MFA and LGSFA, the numbers of intraclass and interclass neighbors are chosen as 9 and 180, respectively.All experiments were performed on a personal computer with i7-7800X central processing unit, 32-G memory, and 64-bit Windows 10 using MATLAB 2014b.

Dimension Selection
To analyze the influence of different embedding dimensions on each DR algorithm, 40 samples were randomly selected from the stacked features of each class in three HSI data sets for training, and the remaining samples were tested.Figure 6 shows the overall classification accuracy under different embedding sizes.As shown in Figure 6, with the increase of embedding dimension, the OAs of all methods are gradually improved.The reason for this is that the more discriminant information can be contained with the increase of embedding features, which is helpful for classification.However, when the dimension has been increased to a certain extent, the low-dimensional embedding space contains enough information for classification, and then the increase of embedding dimension has little effect on the improvement of classification performance.Meanwhile, MFMDA achieves better classification results than other methods, because the MFMDA can better characterize the intrinsic manifold structure of HSI and obtain more effective low-dimensional discriminant features.To achieve better classification performance for each algorithm, the embedding dimensions of all methods are set to 40.When it comes to LDA, the embedding dimension is set to c − 1, and c is the class number of the data set.

Experiments on the Indian Pines Data Set
In this section, the experiments were conducted on the Indian Pines data set to evaluate the effectiveness of the proposed algorithm.The proposed MFMDA method has different parameters, and we conducted experiments to analysis the sensitivity of parameters.In the experiments, 40 samples  As can be seen in Figure 7a, with the increase of n w , the classification accuracy improves and then tends to be stable, for the reason is that a large number of intraclass neighbors are conducive to reveal the intrinsic structure of HSI data.When the value of n w is lower than 7, the OAs maintain a stable value with an increase of n b , but the OAs significantly decline when the value of n b exceeded 10.The reason is that too large values of n b will cause the phenomenon of over-learning in the margins between interclass samples.In Figure 7b, the classification performance improves with the increase of parameter α and then slightly fluctuates.While the proposed method can achieve good results at a wide range.However, when β has a very large value, the effect of the intraclass separability will be limited.Thus, parameters α and β can balance the contribution between intraclass compactness and interclass separability.According to this figure, we set the parameters n w and n b to 6 and 4, α and β to 0.8 and 0.5 for achieving a satisfactory performance.
To analyze the classification performance of each algorithm under different numbers of training samples, n i (n i = 5, 10, 20, 30, 40) samples were randomly selected from each class for training, and the remaining data were used as test samples.Table 1 shows the average OAs with STD for different DR methods with different numbers of training samples.
According to Table 1, with the increase in the sample size of training set, the OAs of all methods continuously raise, for the reason is that a large training set contains more available information to learn discriminant features for classification.Furthermore, the classification results of each algorithm on LBP features are superior to that of spectral features, this is because LBP features are spatial-based features which bring benefits to classification.However, the classification performance of simply stacked features is even worse than LBP features, this may be due to the fact that spatial features and spectral features are not equally represented by simply stacking.While the proposed MFMDA algorithm produces a better classification effect than other methods in all conditions, especially when there are only a small number of labeled samples.The reason for this is that the proposed algorithm not only guarantees the similarity for spectral features and LBP features of the same pixel in the low-dimensional embedding space, but also discovers the manifold structure in the hyperspectral data by constructing intrinsic graphs and penalty graphs, and then extracts the spatial-spectral combined discriminant features to achieve the compactness for intraclass data and the separability for interclass data.To explore the classification performance of MFMDA on each class, 3% samples per class were randomly selected for training, and remaining samples were used for testing.We can see from Table 1, the experimental results of DR methods on LBP features are better than spectral features and stacked features, and thus LBP features were chosen for comparison in the following experiments.Table 2 lists the classification accuracy of each class, OAs, AAs, and Kappa coefficients of different methods, and Figure 8 shows the corresponding classification maps.As illustrated in Table 2, the MFMDA algorithm achieved good classification results in most classes, especially for the areas labeled as Wheat, Grass/Trees, Soybeans-min and Woods.By observing Figure 8, the classification map of MFMDA algorithm produced more homogenous regions than other methods.
The above results show that the proposed method compacts spectral features and LBP features from the same class and separates the features belong to different classes in low-dimensional embedding space, it can make better use of the manifold structure hidden in hyperspectral data.

Experiments on the Heihe Data Set
In this section, the Heihe hyperspectral image was used to further evaluate the classification performance of the proposed algorithm.In the parameter sensitivity experiments, we randomly selected 40 samples from each class for training and the rest for testing.At first, we analyze the influence of different parameters on MFMDA algorithm, and the OAs with different values of parameters are displayed in Figure 9.As in Figure 9a, the OA increases and then declines with the increase of n w , it is because a small value of n w cannot get enough information to represent the intraclass structure, and a large value of n w will lead to overfitting.At the same time, an appropriate size of interclass neighbor points can prevent overfitting and effectively obtain discriminant information of HSI data.In Figure 9b, it can be observed that the OAs increase and then maintain slight fluctuation with the increase of α, and a too small value of β will lead to unsatisfactory classification performance.This indicates that the suitable α and β can balance the intramanifold and intermanifold relations of spectral features and textual features.Based on the above analysis, we set the parameters n w and n b to 24 and 6, α and β to 0.8 and 0.4.
To compare the MFMDA algorithm with other DR methods under different numbers of training samples, we randomly selected n i samples from each class for training, and the remaining samples were used for testing.Table 3 is the classification results of various algorithms.According to Table 3, the classification accuracy increases as the number of training samples increases.Meanwhile, the experimental results of supervised learning methods, LDA, MFA and LGSFA, are superior to the unsupervised ones in most conditions, because the class information of data are used to enhance the discriminating capability of embedded features.The proposed method is more effective than other methods under various conditions, especially when a training set contains few samples.This shows that MFMDA can extract effective spatial-spectral joint features by exploring the inherent manifold structure of HSI data on the basis of GE, and then improve the classification accuracy.
To further show the classification results of each class, 0.1% samples were randomly selected for training, and the rest were used as test samples.The classification results of different methods on the Heihe data set is shown in Table 4, and Figure 10 shows the corresponding classification maps.
As illustrated in Table 4, it can be concluded that the proposed method achieves good classification performance on many classes, such as Endive Sprout and Artificial Surfaces.In addition, it possesses a smoother classification map, which is more conductive to practical application scenarios.Please note that OA and k coefficients are given in parentheses.

Experiments on the PaviaU Data Set
In this section, we used PaviaU data set to analyze the classification performance of the proposed algorithm under different scenes.We randomly selected 40 samples per class as training set to explore OAs with respect to different parameters.The results are displayed in Figure 11.
In Figure 11a, as the increase of n w , the OA rises first and then decreases slightly, the reason for this is that a small number of intraclass neighbor points cannot effectively explore intramanifold structure, while a large value of n w will include redundant information and lead to a decrease in classification accuracy.At the same time, when n b is lower than 8, the OAs can maintain a stable value.As shown in Figure 11b, the classification accuracy can fluctuate in a small range when the values of α and β continue to increase.It shows that α and β can balance the information between the intramanifold and intermanifold structures in HSI data.To achieve good classification performance, we selected n w , n b , α and β as 28, 4, 0.5, 0.3, respectively.To verify the effectiveness of the proposed algorithm, we randomly selected n i (n i = 5, 10, 20, 30, 40) samples from each class for training and remaining samples for testing.The average OAs with STD are given in Table 5.It can be seen from Table 5, the OAs of each method are improved when more samples are used for training.MFMDA achieves better results than other algorithms in most cases, the reason is that it can increase the margins between different classes, so the discriminant features are obtained for classification.
To compare the classification performance of various DR methods, we randomly selected 1% data in each class for training, and remaining data were used as test samples.As shown in Table 5, LBP features and stacked features achieve better experiment results than spectral features, so we choose stacked features compared with the MFMDA method.Table 6 gives the classification accuracies of different methods and Figure 12 shows the corresponding classification maps.As shown in Table 6, the proposed method obtained the best classification results in most classes, especially in Asphalt, Gravel, Bare Soil, Bitumen, Bricks.The reason is that the MFMDA algorithm effectively fuses the multiple features by compacting spectral features and LBP features from the same class in low-dimensional space.As displayed in Figure 12, MFMDA algorithm has fewer misclassified points and the classification map is smoother than other methods.

Discussion
The experiments on three HSI data sets reveal some interesting points.

•
As shown in Tables 1, 3 and 5, the classification performance of simply stacked features is even worse than LBP features in most cases, for the reason that simply stacked spatial and spectral features may yield redundant information and even lead to the curse-of-dimensionality.

•
From the experimental results, it is obviously that DR methods on LBP features or spectral features usually perform better than DR methods on the simply stacked features.This may be due to the fact that performing dimensionality reduction directly on the simply stacked features cannot reveal the manifold structure of different features in HSI, which will restrict their discriminant capability for classification.

•
The proposed MFMDA algorithm is superior to other DR methods under different training conditions.The reason is that MFMDA constructs the intrinsic graphs and penalty graphs of spectral features and LBP features to discover the manifold structure of spatial features and spectral features, then it learns low-dimensional embedding space from original spectral features as well as LBP features for compacting the intramanifold samples while separating intermanifold samples.As a result, the spatial-spectral embedding features possess stronger discriminating ability for HSI classification.

Conclusions
Traditional methods explore only a single feature or simply stacked features in hyperspectral image, which will restrict their discriminant capability for classification.In this paper, we proposed a new dimensionality reduction method termed MFMDA to couple DR and fusion of spectral and textual features of HSI data.MFMDA first explores LBP operator to extract textural features for encoding the spatial information in HSI.Then, within GE framework, the intrinsic and penalty graphs of LBP and spectral features are constructed to explore the discriminant manifold structure in both spatial and spectral domains, respectively.After that, a new spatial-spectral DR model is built to extract discriminant spatial-spectral combined features which not only preserve the similarity relationship between spectral features and LBP features but also possess strong discriminating ability in the low-dimensional embedding space.Experiments on Indian Pines, Heihe and PaviaU hyperspectral data sets demonstrate that the proposed MFMDA method can significantly improve classification performance and result in smoother classification maps than some state-of-the-art methods, and with fewer training samples, the classification accuracy can reach 95.43%, 97.19% and 96.60%, respectively.In the future, we will focus on conducting a more detailed investigation of other possible features to further improve the performance of MFMDA.

Figure 1 .
Figure 1.The procedure of local binary patterns (LBP) operator on the PaviaU image.

Figure 2 .
Figure 2. The flowchart of multi-feature manifold discriminant analysis.

4 :
Compute the D w s , D w l , D b s and D b l by D

4. 1 .
Experiment Data SetIndian Pines data set: This HSI data set was collected by NASA using the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor in Northwest Indiana.After removing water absorption bands, the remaining 200 bands were used in the experiments.The size of this image is 145 × 145 pixels with a spatial resolution of 20 m, and it contains sixteen land cover types such as Wheat, Woods and Oats.This scene in false color and its corresponding ground truth are shown in Figure3, and the values in brackets indicate the sample size of each class.

Figure 3 .
Figure 3. Indian Pines hyperspectral image.(a) HSI in false-color (bands 50, 27 and 17 for RGB); (b) Ground-truth map (please note that the number of samples is given in parentheses).Heihe data set[55,56]: This data set is provided by Heihe Plan Science Data Center which is sponsored by the integrated research on the eco-hydrological process of the Heihe River Basin of the National Natural Science Foundation of China, and it was captured by Compact Airborne Spectrographic Imager (CASI)/Shortwave Infrared Airborne Spectrogrpahic Imager (SASI) in Zhangye basin which is located in the middle reaches of Heihe watershed, Gansu Province, China.The data possesses a spatial size of 684 × 453 pixels, and it has a geometric resolution of 2.4 m.Exactly 135 bands remained after removal of 14 bands which have noise and atmospheric effects.The data contains 9 different kinds of land covers.The scene in false color and its ground-truth map are shown in Figure4.PaviaU data set: This data set is a scene of the PaviaU collected by the reflective optics system imaging spectrometer (ROSIS) sensor.It consists of 610 × 340 pixels and the spatial resolution is 1.3 m. 103 spectral bands remained after the removal of some channels as a result of dense water vapor and atmospheric effects.There are nine classes of ground objects are considered in the data set such as trees, soil and meadows.This HSI in false color and its corresponding ground truth are shown in Figure5.

Figure 4 .
Figure 4. Heihe hyperspectral image.(a) HSI in false-color (bands 57, 19 and 80 for RGB); (b) Ground-truth map (please note that the number of samples is given in parentheses).

Figure 5 .
Figure 5. PaviaU hyperspectral image.(a) HSI in false-color (bands 60, 100 and 20 for RGB); (b) Ground-truth map (please note that the number of samples is given in parentheses).
per class were randomly selected for training, and the remaining ones for testing.The SVM classifier is used for classification.To investigate the classification influence on intraclass and interclass neighbors, parameters n w and n b are tuned with a set of {1,2,3,• • • ,9} and a set of {2,4,6,• • • ,20}, respectively.Figure 7a shows the OAs with different values of n w and n b .When it comes to tradeoff parameters, parameters α and β are all tuned with the set of {0,0.1,0.2,•• • ,1}.The OAs with different values of α and β are displayed in Figure 7b.

Figure 7 .
Figure 7.The experiments for parameter analysis of MFMDA on Indian Pines data set.(a) Classification results of MFMDA with different values of n w and n b ; (b) Classification results of MFMDA with different parameters α and β.

Figure 9 .
Figure 9.The experiments for parameter analysis of MFMDA on Heihe data set.(a) OAs of MFMDA with different values of n w and n b ; (b) OAs of MFMDA with different values of α and β.

Figure 11 .
Figure 11.The experiments for parameter analysis of MFMDA on PaviaU data set.(a) Classification results of MFMDA with different parameters of n w and n b ; (b) Classification results of MFMDA with different parameters α and β.
2, • • • , c}, where c is the number of classes.The purpose of DR is to find a low-dimensional embedding space Y= [y 1 ,y 2 ,y 3 , • • • , y N ] ∈ d×N , where d (d D) is the embedding dimensionality of extracted features.

Table 2 .
Classification results of each class samples via different DR methods in Indian Pines data set (%).

SVM Classifier Train Test Baseline PCA LDA NPE LPP MFA LGSFA MFMDA
Notes: The bold numbers represent the maximum OA of the row.

Table 3 .
Classification results using different methods with different classifiers for the Heihe data set.[Overall Accuracy ± Std (%)].
Notes: The bold numbers represent the maximum OA of the column.

Table 4 .
Classification results of each class samples via different DR methods in Heihe data set (%).

Table 5 .
Classification results using different methods with different classifiers for the PaviaU data set.[Overall Accuracy ± Std (%)].

Table 6 .
Classification results of each class samples via different DR methods in PaviaU data set (%).
Notes: The bold numbers represent the maximum OA of the row.