Spatial-Spectral Multiple Manifold Discriminant Analysis for Dimensionality Reduction of Hyperspectral Imagery

Abstract: Hyperspectral images (HSI) possess abundant spectral bands and rich spatial information, which can be utilized to discriminate different types of land cover. However, the high-dimensional characteristics of spatial-spectral information commonly cause the Hughes phenomenon. Traditional feature learning methods can reduce the dimensionality of HSI data and preserve the useful intrinsic information, but they ignore the multi-manifold structure in hyperspectral images. In this paper, a novel dimensionality reduction (DR) method called spatial-spectral multiple manifold discriminant analysis (SSMMDA) is proposed for HSI classification. At first, several subsets are obtained from HSI data according to the prior label information. Then, a spectral-domain intramanifold graph is constructed for each submanifold to preserve the local neighborhood structure, while a spatial-domain intramanifold scatter matrix and a spatial-domain intermanifold scatter matrix are constructed for each submanifold to characterize the within-manifold compactness and the between-manifold separability, respectively. Finally, a spatial-spectral combined objective function is designed for each submanifold to obtain an optimal projection, and the discriminative features from the different submanifolds are fused to improve the classification performance on HSI data. SSMMDA can explore spatial-spectral combined information and reveal the intrinsic multi-manifold structure in HSI. Experiments on three public HSI data sets demonstrate that the proposed SSMMDA method achieves better classification accuracies than many state-of-the-art methods.


Introduction
Hyperspectral images (HSI) are acquired by different imaging spectrometer sensors (e.g., EO-1 Hyperion, HyMap and AVIRIS), and they possess abundant spectral information about ground objects in hundreds of spectral bands [1][2][3]. Due to the advancement of imaging spectrometer technology, the spectral resolution of hyperspectral sensors has been improved significantly, which provides richer spectral information to differentiate different ground objects [4,5]. However, sensors with higher resolution produce a very large volume of data, which renders traditional image processing algorithms designed for multispectral imagery ineffective [6,7]. In particular, the high dimensionality of HSI data brings about the "curse of dimensionality" problem, that is, under a fixed, small number of training samples, the classification accuracy of HSI data decreases when the dimensionality of HSI data increases [8][9][10]. Besides, the spectral bands are highly correlated, and some spectral bands may not carry discriminant information in a specific application. The reason is that a sample point in HSI often exhibits similar responses to electromagnetic waves in adjacent bands, which produces much redundant information and restricts the classification performance of HSI. Therefore, to achieve excellent classification performance, it is an urgent task to perform dimensionality reduction (DR) on HSI data while preserving the intrinsic valuable information.
Serving as a good tool for data mining, manifold learning has been a main focus of DR [11][12][13]. Recently, many manifold learning algorithms have been introduced to discover the intrinsic structure in high-dimensional data; such methods include local linear embedding (LLE) [14], Laplacian eigenmaps (LE) [15] and isometric feature mapping (ISOMAP) [16]. LLE assumes that the data are linear over a small locality, and it preserves the local linear structure of the data in the low-dimensional space [17]. LE builds a graph incorporating neighborhood information of the data and computes a low-dimensional representation by optimally preserving local neighborhood information [18]. ISOMAP characterizes the data distribution by geodesic distances instead of Euclidean distances; it seeks a lower-dimensional embedding that maintains the geodesic distances between all points [19]. However, due to their nonlinear characteristics, these methods suffer from the out-of-sample problem and cannot process unknown samples [20][21][22]. To address this issue, many linear manifold learning methods were designed to obtain explicit feature mappings, which can map unknown samples into the low-dimensional space [23,24]. Representative methods include neighborhood preserving embedding (NPE) [25], locality preserving projection (LPP) [26] and local scaling cut (LSC) [27]. NPE aims at preserving the local manifold structure constructed from the k-nearest neighbors [28]. LPP seeks an optimal linear projection while preserving the local geometric structure of the original data [29]. LSC uses the spectral information to explore the intrinsic manifold structure of HSI data by constructing a pairwise dissimilarity matrix [30]. Although the motivations of these methods differ, their common objective is to derive a lower-dimensional representation and facilitate the subsequent classification task.
In order to unify these methods, a graph embedding (GE) framework was proposed to describe many existing DR techniques [31]. Based on this framework, local geometric structure Fisher analysis (LGSFA) was proposed, which compacts homogeneous data while separating heterogeneous data [32]. However, the above manifold learning methods assume that the observed data lie on a single manifold, and they cannot discover the multi-manifold structure embedded in HSI. Therefore, single-manifold methods have limited discriminating power to distinguish different land covers, and their classification performance degrades rapidly when applied to more complex scenes.
In practical applications, the observed data can be divided into many different subsets, and each subset resides on a low-dimensional submanifold [33]. To reveal the multi-manifold structure in high-dimensional data, some multiple manifold learning methods have been explored for feature learning. Hettiarachchi et al. [34] proposed a multiple manifold locally linear embedding (MM-LLE) algorithm, which adopts a supervised LLE form of neighborhood selection in learning individual manifolds and uses a manifold-manifold distance (MMD) as a measure to find the optimal embedding dimension. Jiang et al. [35] proposed a coupled discriminant multi-manifold analysis (CDMMA) algorithm, which explores the neighborhood information as well as the local geometric structure of the multi-manifold space spanned by the samples, and exploits discriminant information by simultaneously minimizing the intramanifold distance and maximizing the intermanifold distance. Chu et al. [36] proposed a multi-feature multi-manifold learning (M³L) method, which learns multiple discriminative feature subspaces by maximizing the multi-feature manifold margins of different classes and recognizes the probe subjects with a multi-feature manifold-manifold distance. Shi et al. [37] proposed a supervised multi-manifold learning (SMML) algorithm, which extracts multi-manifold features by maximizing the between-class Laplacian graph and projects samples from different classes into their respective submanifolds. Although the multiple manifold learning methods can enhance the classification performance of high-dimensional data, they ignore the spatial consistency property of HSI and do not utilize its abundant spatial information.
Spatial information has been proven to be useful for improving the classification performance of HSI [38,39]. Recently, many DR methods have been proposed based on the spatial distribution of hyperspectral data, and they can be divided into two main types: spatial filtering methods and spatial DR methods [40][41][42]. Spatial filtering methods focus on how to utilize spatially homogeneous regions to smooth the pixel-wise classification map, while spatial DR methods incorporate spatial information into the process of DR by modeling the spatial neighboring correlations [43,44]. Although spatial filtering techniques can improve the classification accuracy of HSI data, they are often used as pre-processing techniques to process the whole image [45]. In view of this, spatial DR methods have been widely studied for HSI classification [46,47]. Zhou et al. [48] proposed a spatial-domain local pixel NPE (LPNPE) method, which seeks a linear projection by minimizing the local pixel neighborhood preserving scatter and maximizing the total scatter of the data. Feng et al. [49] developed a discriminative spectral-spatial margins (DSSM) method, which reveals the local neighborhood information of HSI data and explores the global structure of labeled and unlabeled data via low rank representation. Ramanarayan et al. [50] designed a neighboring pixel local scaling cut (NPLSC) method, which combines the unlabeled spectral weight with a local graph cut-based segmentation strategy on the homogeneous HSI data to maintain label consistency in the spatial domain. The spatial-spectral combined DR methods have improved the classification performance significantly, but they do not consider the multi-manifold structure of HSI data to further enhance the classification performance.
To explore the multiple manifold structure and spatial information in HSI, we propose a novel spatial-spectral multi-manifold DR method, called spatial-spectral multiple manifold discriminant analysis (SSMMDA), for HSI classification. Unlike spatial filtering methods, SSMMDA incorporates spatial information into the DR process for feature learning. The main contributions of SSMMDA can be summarized as follows. (1) According to the prior label information of the training samples, the sample data are divided into several different subsets, and each subset is treated as a submanifold. (2) Based on graph embedding theory, an intramanifold graph is constructed for each submanifold to preserve the local neighborhood structure in the spectral domain, and an intramanifold scatter matrix and an intermanifold scatter matrix are constructed for each submanifold to characterize the within-manifold compactness and the between-manifold separability in the spatial domain. (3) A spatial-spectral combined objective function is designed for each submanifold, and each submanifold obtains a discriminant projection matrix by maximizing the spatial-spectral intermanifold scatter and minimizing the spatial-spectral intramanifold scatter simultaneously. (4) Embedding features for each submanifold are obtained via the corresponding projection matrix, and the different embedding features are then fused to enhance the classification performance.
The remainder of the paper is structured as follows. Section 2 briefly reviews some related works, including GE and LSC. Section 3 details our proposed SSMMDA method. Section 4 reports the experimental results. Section 5 presents some analysis and discussion of the experiments. Finally, Section 6 provides our concluding remarks and gives recommendations for future work.

Related Works
Suppose that an HSI data set X contains n samples with D bands, so that the i-th pixel can be represented as x_i ∈ R^D. Denote the class label of x_i as l_i ∈ {1, 2, . . . , c}, where c is the number of land-cover types. For linear manifold learning methods, the goal is to seek a projection matrix V ∈ R^{D×d}, which maps X ∈ R^{D×n} into a d-dimensional embedding space, where d ≪ D. With the projection matrix V, the low-dimensional embedding features Y ∈ R^{d×n} are computed as Y = V^T X.
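As a minimal numerical illustration of this notation, the mapping Y = V^T X is simply a matrix product. The sizes below are made up for the example, and NumPy is assumed; this is a sketch of the notation, not the paper's implementation.

```python
import numpy as np

# Toy sizes: n = 100 pixels, D = 200 bands, d = 10 embedding dimensions.
rng = np.random.default_rng(0)
D, n, d = 200, 100, 10

X = rng.normal(size=(D, n))                    # samples as columns, X in R^{D x n}
V = np.linalg.qr(rng.normal(size=(D, d)))[0]   # an orthonormal projection, V in R^{D x d}

Y = V.T @ X                                    # low-dimensional features, Y in R^{d x n}
print(Y.shape)                                 # (10, 100)
```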

Graph Embedding
The graph embedding (GE) framework embeds a graph into a vector space in which the inherent properties of the graph can be preserved. In GE, an intrinsic graph G = {X, W} and a penalty graph G^P = {X, W^P} are constructed based on the statistical or geometric properties of the data, where X is the vertex set, and W ∈ R^{n×n} and W^P ∈ R^{n×n} are the similarity matrices of G and G^P, respectively. The intrinsic graph G describes the desirable statistical or geometrical properties of the data, while the penalty graph G^P characterizes the dissimilarity relationships between vertex pairs. The purpose of GE is to represent each sample as a low-dimensional vector that preserves the similarity relationships of the vertex pairs in the graphs. The optimal low-dimensional embedding is given by the graph preserving criterion of Equation (1):

y* = arg min_{y^T B y = h} Σ_{i≠j} ‖y_i − y_j‖² W_ij = arg min_{y^T B y = h} y^T L y, (1)

where h is a constant and B is a constraint matrix defined to avoid a trivial solution of Equation (1). For scale normalization, B is typically set as a diagonal matrix or as the Laplacian matrix of G^P. The Laplacian matrices L and L^P of the graphs G and G^P are defined as Equations (2) and (3):

L = D − W, D_ii = Σ_{j≠i} W_ij, (2)

L^P = D^P − W^P, D^P_ii = Σ_{j≠i} W^P_ij. (3)
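The Laplacian construction and the identity behind the graph preserving criterion can be sketched numerically as follows. The graph here is a random toy example, not the paper's implementation; the point is that y^T L y equals the weighted sum of squared differences that Equation (1) minimizes.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
# A symmetric nonnegative similarity matrix W for the intrinsic graph G.
A = rng.random((n, n))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)

Dmat = np.diag(W.sum(axis=1))   # degree matrix D
L = Dmat - W                    # graph Laplacian L = D - W

# Check that y^T L y = (1/2) * sum_ij W_ij (y_i - y_j)^2, so minimizing
# y^T L y pulls strongly connected vertices together in the embedding.
y = rng.normal(size=n)
lhs = y @ L @ y
rhs = 0.5 * sum(W[i, j] * (y[i] - y[j]) ** 2 for i in range(n) for j in range(n))
assert np.isclose(lhs, rhs)
```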

Local Scaling Cut
Local scaling cut (LSC) was proposed based on the GE framework; it compacts data from the same class and separates data from different classes. LSC seeks a linear projection matrix such that the between-class dissimilarity matrix is maximized while the total-class dissimilarity matrix is minimized in the projected space.
To exploit the intrinsic geometry and local manifold structure of the data, LSC constructs a localized k-nearest-neighbor graph to characterize the similarity relationships between neighboring pixels. Suppose N_b(x_i) represents the k_b-nearest neighbors of x_i from different classes and N_w(x_i) represents the k_w-nearest neighbors of x_i from the same class. Then the local between-class dissimilarity matrix S_b and the local within-class dissimilarity matrix S_w are given in Equations (4) and (5) as

S_b = Σ_{s=1}^{c} Σ_{x_i ∈ U_s} Σ_{x_j ∈ N_b(x_i)} H^b_ij (x_i − x_j)(x_i − x_j)^T, (4)

S_w = Σ_{s=1}^{c} Σ_{x_i ∈ U_s} Σ_{x_j ∈ N_w(x_i)} H^w_ij (x_i − x_j)(x_i − x_j)^T, (5)

where U_s denotes all the samples in the s-th class, and H^b_ij and H^w_ij are the weights of x_i and x_j in N_b(x_i) and N_w(x_i), respectively. The weights H^b_ij and H^w_ij are defined in Equations (6) and (7), in which n_s is the total number of elements in the s-th class. The local scaling cut criterion is then formulated as Equation (8).
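A simplified sketch of the dissimilarity matrices of Equations (4) and (5) follows. Uniform weights are substituted here for the paper's H^b and H^w (an assumption, since Equations (6) and (7) are not reproduced above), so this illustrates the neighbor selection and outer-product accumulation only.

```python
import numpy as np

def knn(idx, pool, X, k):
    """Indices of the k nearest samples to column `idx` of X among `pool`."""
    d = np.linalg.norm(X[:, pool] - X[:, [idx]], axis=0)
    return [pool[j] for j in np.argsort(d)[:k]]

def lsc_scatter(X, labels, kw=3, kb=3):
    """Local within/between-class dissimilarity matrices in the spirit of LSC.
    Uniform weights replace H^w_ij and H^b_ij (an assumption)."""
    D, n = X.shape
    Sw = np.zeros((D, D))
    Sb = np.zeros((D, D))
    for i in range(n):
        same = [j for j in range(n) if labels[j] == labels[i] and j != i]
        diff = [j for j in range(n) if labels[j] != labels[i]]
        for j in knn(i, same, X, kw):        # within-class neighbors N_w(x_i)
            d = (X[:, i] - X[:, j])[:, None]
            Sw += d @ d.T
        for j in knn(i, diff, X, kb):        # between-class neighbors N_b(x_i)
            d = (X[:, i] - X[:, j])[:, None]
            Sb += d @ d.T
    return Sw, Sb

rng = np.random.default_rng(2)
X = np.hstack([rng.normal(0, 1, (5, 10)), rng.normal(3, 1, (5, 10))])
labels = np.array([0] * 10 + [1] * 10)
Sw, Sb = lsc_scatter(X, labels)
# Both matrices are symmetric positive semidefinite by construction.
assert np.allclose(Sw, Sw.T) and np.linalg.eigvalsh(Sb).min() > -1e-9
```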

Spatial-Spectral Multi-Manifold Discriminant Analysis
To effectively reveal the intrinsic multi-manifold structure of HSI data, a spatial-spectral multi-manifold discriminant analysis (SSMMDA) method is proposed for DR. SSMMDA divides samples into different subsets based on their label information, and each subset is treated as a submanifold. Then, for each submanifold, it selects spectral neighbors to construct a spectral-domain intramanifold graph that preserves the local neighborhood structure of HSI, and it constructs two weighted scatter matrices in the spatial domain to characterize the within-manifold compactness and the between-manifold separability. After that, a spatial-spectral combined objective function is designed for each submanifold, and a discriminant projection matrix is sought for each submanifold by maximizing the spatial-spectral intermanifold scatter and minimizing the spatial-spectral intramanifold scatter simultaneously. Finally, the low-dimensional embedding for each submanifold is obtained through the corresponding projection matrix, and the embedding features of the submanifolds are fused for classification. SSMMDA improves both the intramanifold compactness and the intermanifold separability of HSI data, so it possesses better discriminative power to improve classification performance. The process of the SSMMDA method is shown in Figure 1.

Spectral-Domain Multi-Manifold Analysis Model
In some situations, high-dimensional data do not lie on a single manifold; samples from different classes lie on different submanifolds [33,34]. Existing manifold learning algorithms usually characterize the similarity relationships of high-dimensional data in a single manifold space, and they cannot discover the multi-manifold structure of the data. Therefore, it is a crucial issue to preserve the local structure of different submanifolds by performing multiple manifold learning. In the proposed SSMMDA method, a multi-manifold learning model in the spectral domain is designed to reveal the multi-manifold structure of HSI. At first, according to the label information of the training samples, the HSI data are divided into different subsets. Then, several intramanifold graphs are constructed in the spectral domain for the different subsets, which effectively characterize the similarity relationships between samples belonging to the same submanifold.
For HSI data, each class can be treated as a submanifold. Therefore, an HSI data set with D bands can be represented as X = [M_1, M_2, . . . , M_c] ∈ R^{D×n}, where c is the number of land-cover types. Denote the i-th point of the r-th submanifold as M_r^i; then the r-th submanifold can be given as M_r = [M_r^1, M_r^2, . . . , M_r^{n_r}], where n_r is the number of samples in M_r. To characterize the discriminative manifold structure of M_r, a spectral-domain intramanifold graph G_r(M_r, W_r) is constructed, where M_r is the vertex set of the graph and W_r is the weight matrix. In graph G_r, an edge is placed between nodes i and j if M_r^i and M_r^j are neighbors; otherwise they are not connected. The weight between M_r^i and M_r^j is defined as Equation (9):

W_r^{ij} = exp(−‖M_r^i − M_r^j‖²/t_r²), if M_r^j ∈ N_k(M_r^i) or M_r^i ∈ N_k(M_r^j); W_r^{ij} = 0, otherwise, (9)

where N_k(M_r^i) is the neighbor set of M_r^i, k is the number of intramanifold neighbors for graph G_r, and t_r is a heat kernel parameter. To enhance the aggregation of data points that possess similar spectral characteristics in the low-dimensional space, the objective function is given in Equation (10):

min Σ_{i,j} ‖y_r^i − y_r^j‖² W_r^{ij}, (10)

in which y_r^i and y_r^j are the low-dimensional representations of M_r^i and M_r^j for submanifold M_r, respectively.
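The spectral-domain intramanifold graph of Equation (9) can be sketched as follows. Setting the heat-kernel bandwidth t_r to the mean neighbor distance is an assumption made for this sketch; the paper's exact definition of the parameter is not reproduced here.

```python
import numpy as np

def intramanifold_graph(Mr, k):
    """Heat-kernel weight matrix W_r for one submanifold Mr (bands x pixels):
    weights between k-nearest neighbors, zero elsewhere, as in Equation (9).
    Bandwidth t_r = mean neighbor distance (an assumption)."""
    n = Mr.shape[1]
    dist = np.linalg.norm(Mr[:, :, None] - Mr[:, None, :], axis=0)
    nbrs = np.argsort(dist, axis=1)[:, 1:k + 1]        # skip self at position 0
    t = dist[np.arange(n)[:, None], nbrs].mean()       # heat-kernel bandwidth t_r
    W = np.zeros((n, n))
    for i in range(n):
        for j in nbrs[i]:
            W[i, j] = W[j, i] = np.exp(-dist[i, j] ** 2 / t ** 2)
    return W

rng = np.random.default_rng(6)
Mr = rng.normal(size=(10, 12))     # one toy submanifold: 10 bands, 12 pixels
W = intramanifold_graph(Mr, k=3)
assert np.allclose(W, W.T) and np.all(W >= 0) and np.all(np.diag(W) == 0)
```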
According to Equation (10), c different projection matrices can be obtained. For the r-th submanifold in the spectral domain, the optimization problem of submanifold M_r can be transformed into Equation (11):

min_{V_r} tr(V_r^T M_r B_r M_r^T V_r), (11)

where the affinity matrix W_r is used to generate the intramanifold graph Laplacian matrix B_r = D_r − W_r, and D_r is a diagonal matrix whose entries are the column sums of W_r.
To eliminate the impact of the scale factor, the constraint Y_r D_r Y_r^T = V_r^T M_r D_r M_r^T V_r = I is added to Equation (11), where I = diag[1, · · · , 1]. Therefore, the optimization of Equation (11) is equivalent to the following problem of Equation (12):

min_{V_r} tr(V_r^T M_r B_r M_r^T V_r), s.t. V_r^T M_r D_r M_r^T V_r = I. (12)

With the Lagrange multiplier method, the optimization problem in Equation (12) can be transformed into the following generalized eigenvalue problem of Equation (13):

M_r B_r M_r^T v_r = λ_r M_r D_r M_r^T v_r, (13)

where λ_r is the eigenvalue.
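A generalized eigenvalue problem of the form of Equation (13) can be solved numerically as in the following sketch. The regularization term and the plain-NumPy solver are implementation choices for this illustration, not part of the paper (`scipy.linalg.eigh(A, B)` would be the more common route).

```python
import numpy as np

def smallest_generalized_eigvecs(A, B, d, reg=1e-6):
    """Eigenvectors of A v = lambda B v for the d smallest eigenvalues,
    via a plain-NumPy reduction to a standard eigenproblem (a sketch)."""
    Breg = B + reg * np.eye(B.shape[0])            # guard against singular B
    vals, vecs = np.linalg.eig(np.linalg.solve(Breg, A))
    order = np.argsort(vals.real)
    return vals.real[order[:d]], vecs.real[:, order[:d]]

# Toy instance of Equation (13): A = M B_r M^T and B = M D_r M^T
# built from a random symmetric affinity matrix W.
rng = np.random.default_rng(3)
M = rng.normal(size=(8, 20))
W = rng.random((20, 20))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
Dr = np.diag(W.sum(axis=1))
Br = Dr - W                                        # graph Laplacian B_r
vals, V = smallest_generalized_eigvecs(M @ Br @ M.T, M @ Dr @ M.T, d=3)
assert V.shape == (8, 3) and np.all(np.diff(vals) >= 0)
```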

Spatial-Domain Multi-Manifold Analysis Model
According to the spatial distribution consistency of HSI, pixels are generally distributed in blocks, such as Soil, Water, Building and Woods [51]. Therefore, the neighboring pixels in a spatial window are more likely to belong to the same class, and they lie on the same manifold [52]. To utilize the spatial information in HSI, a spatial-domain weighted intramanifold scatter matrix is designed to characterize the similarity relationships within each submanifold, and a spatial-domain weighted intermanifold scatter matrix is defined to represent the dissimilarity relationships between different submanifolds.
Suppose a pixel x_i in the HSI lies on the r-th spectral submanifold; then it can be represented as M_r^i. Since the pixels in the local spatial patch of x_i commonly lie on the same submanifold as x_i, the neighboring pixels can be given as {M_r^{i1}, M_r^{i2}, . . . , M_r^{i(w²−1)}}, where the odd number w is the size of the spatial window.
From the viewpoint of classification, we aim to minimize the distances between intramanifold points and maximize the distances between different submanifolds in the low-dimensional space, so that the submanifold margins are maximized for DR. Denoting y_r^i and y_r^{ij} as the low-dimensional representations of the i-th sample and its j-th spatial neighbor in the r-th manifold space, the optimization problem can be formulated as Equation (14):

min Σ_{r=1}^{c} [ Σ_i Σ_j s_r^{ij} ‖y_r^i − y_r^{ij}‖² / Σ_i s_r^i ‖y_r^i − ȳ_r‖² ], (14)

where ȳ_r is the mean of all training data after they are mapped to submanifold M_r, s_r^{ij} = exp(−‖y_r^i − y_r^{ij}‖²/2(t_{ij})²) is the similarity weight between y_r^i and y_r^{ij}, s_r^i = exp(−‖y_r^i − ȳ_r‖²/2(t_{ij})²) is the similarity weight between y_r^i and ȳ_r, and t_{ij} = (1/k) Σ_{j=1}^{k} ‖y_r^i − y_r^{ij}‖ is a heat kernel parameter. Considering that the c projection matrices are independent of each other, we can divide Equation (14) into different parts and solve them separately. For submanifold M_r, the optimization problem can be written as Equation (15):

min_{V_r} tr(V_r^T H_r V_r) / tr(V_r^T S_r V_r), (15)

in which H_r and S_r are the intramanifold scatter matrix and the intermanifold scatter matrix of submanifold M_r, defined as Equations (16) and (17):

H_r = Σ_i Σ_j s_r^{ij} (M_r^i − M_r^{ij})(M_r^i − M_r^{ij})^T, (16)

S_r = Σ_i s_r^i (M_r^i − M)(M_r^i − M)^T, (17)

where M is the mean of all samples in the HSI data. After that, an optimal projection matrix V_r = [v_r1, v_r2, . . . , v_rd] can be obtained by solving the generalized eigenvalue problem of Equation (18):

H_r v_r = λ_r S_r v_r, (18)

in which λ_r is the eigenvalue of Equation (18).
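A rough sketch of the two spatial-domain scatter matrices for a few pixels of a toy image cube follows. The fixed heat-kernel bandwidth t and the toy data are assumptions of this sketch; the paper's weights are computed per neighbor pair.

```python
import numpy as np

def spatial_scatters(cube, coords, w, t=1.0):
    """Intramanifold scatter H_r (each pixel vs. its window neighbors) and
    intermanifold scatter S_r (each pixel vs. the global mean) accumulated
    over the listed pixel coordinates. Fixed bandwidth t is an assumption."""
    nrows, ncols, D = cube.shape
    mean = cube.reshape(-1, D).mean(axis=0)
    r = w // 2
    H = np.zeros((D, D))
    S = np.zeros((D, D))
    for (i, j) in coords:
        x = cube[i, j]
        for di in range(-r, r + 1):            # scan the w x w spatial window
            for dj in range(-r, r + 1):
                if (di, dj) == (0, 0):
                    continue
                ii, jj = i + di, j + dj
                if 0 <= ii < nrows and 0 <= jj < ncols:
                    diff = (x - cube[ii, jj])[:, None]
                    s = np.exp(-(diff.T @ diff).item() / (2 * t ** 2))
                    H += s * (diff @ diff.T)   # within-manifold term
        diff = (x - mean)[:, None]
        s = np.exp(-(diff.T @ diff).item() / (2 * t ** 2))
        S += s * (diff @ diff.T)               # between-manifold term
    return H, S

rng = np.random.default_rng(4)
cube = rng.normal(size=(9, 9, 4))              # toy 9x9 image with 4 bands
H, S = spatial_scatters(cube, [(4, 4), (2, 3)], w=3)
assert np.allclose(H, H.T) and np.linalg.eigvalsh(S).min() >= -1e-9
```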

Spatial-Spectral Multi-Manifold Analysis Model
In HSI, spectral information and spatial structure are complementary to each other: spectral information characterizes the spectral reflectance of land covers, while spatial information reflects the spatial relationships of objects. To learn discriminant projections, the spectral and spatial information can be combined for feature learning. Therefore, we propose a spatial-spectral multiple manifold-based DR method for HSI classification, which extracts feature information from both the spatial and spectral domains and effectively reveals the multi-manifold structure in HSI.
By combining the spatial information and the spectral information, a spatial-spectral intramanifold scatter matrix and a spatial-spectral intermanifold scatter matrix can be obtained for submanifold M_r as Equations (19) and (20), in which α and β are tradeoff parameters that balance the contributions of the spatial and spectral information in the process of DR.
With the Lagrange multiplier method, the optimization problem for submanifold M_r can be transformed into the generalized eigenvalue problem of Equation (21):

Z_r^intra v_r = λ_r Z_r^inter v_r, (21)

in which λ_r is the eigenvalue of Equation (21). After obtaining the eigenvectors v_r1, v_r2, v_r3, . . . , v_rd corresponding to the d smallest eigenvalues, the optimal projection matrix is given by Equation (22):

V_r = [v_r1, v_r2, . . . , v_rd] ∈ R^{D×d}. (22)

With the same operations, different projection matrices V_1, V_2, . . . , V_c can be obtained as in Equation (22), which correspond to the different submanifolds embedded in the HSI data. Denote the feature representations of X in each low-dimensional space as Y_1, Y_2, . . . , Y_c, where Y_r = V_r^T X. The main steps of SSMMDA are summarized in Algorithm 1.

Algorithm 1: SSMMDA
1: Input: data set X, class labels, neighbor number k, window size w, tradeoff parameters α and β, embedding dimension d.
2: Divide X into c submanifolds M_1, M_2, . . . , M_c according to the label information.
3: for r = 1 to c do
4: Construct the spectral-domain intramanifold graph G_r and compute its weight matrix W_r.
5: Compute the intramanifold graph Laplacian matrix B_r = D_r − W_r.
6: Find the spatial-domain neighboring pixel set of M_r^i.
7: Compute the intramanifold scatter matrix H_r and the intermanifold scatter matrix S_r in the spatial domain.
8: Compute the spatial-spectral intramanifold scatter matrix Z_r^intra and the spatial-spectral intermanifold scatter matrix Z_r^inter.
9: Solve the generalized eigenvalue problem of Equation (21).
10: Obtain the projection matrix V_r for submanifold M_r: V_r = [v_r1, v_r2, . . . , v_rd] ∈ R^{D×d}.
11: end for
12: Obtain c different projection matrices V_1, V_2, . . . , V_c.
13: Calculate the low-dimensional features of X in each submanifold as Y_1, Y_2, . . . , Y_c.
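The final steps, projecting with each V_r to obtain Y_1, . . . , Y_c, fusing the embeddings, and classifying with the NN rule, might be sketched as follows. The concatenation fusion rule and the stand-in orthonormal projection matrices are assumptions of this sketch; the paper's exact fusion scheme is not reproduced here.

```python
import numpy as np

def fuse_and_classify(X_train, y_train, X_test, projections):
    """Project data with each submanifold's V_r, fuse the embeddings by
    concatenation (an assumed fusion rule), and classify with 1-NN."""
    def fuse(X):
        return np.vstack([V.T @ X for V in projections])   # stack Y_1..Y_c
    Ftr, Fte = fuse(X_train), fuse(X_test)
    preds = []
    for t in Fte.T:                                        # 1-NN over fused features
        preds.append(y_train[np.argmin(np.linalg.norm(Ftr - t[:, None], axis=0))])
    return np.array(preds)

rng = np.random.default_rng(5)
Xtr = np.hstack([rng.normal(0, 0.5, (6, 15)), rng.normal(4, 0.5, (6, 15))])
ytr = np.array([0] * 15 + [1] * 15)
Xte = np.hstack([rng.normal(0, 0.5, (6, 5)), rng.normal(4, 0.5, (6, 5))])
# Stand-in projection matrices in place of the learned V_r (an assumption).
Vs = [np.linalg.qr(rng.normal(size=(6, 2)))[0] for _ in range(2)]
pred = fuse_and_classify(Xtr, ytr, Xte, Vs)
assert pred.shape == (10,)
```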

Experimental Results
In this section, three public HSI data sets are adopted to evaluate the effectiveness of SSMMDA by comparing it with several state-of-the-art DR algorithms. Experiments on these data sets demonstrate that the proposed SSMMDA method can explore spatial-spectral combined information and reveal the intrinsic multi-manifold structure in HSI, which significantly improves the classification performance of HSI.

Experiment Data Set
PaviaU data set: This data set was captured over the city of Pavia, Italy by the ROSIS-03 (Reflective Optics Spectrographic Imaging System) airborne instrument. The ROSIS-03 sensor comprises 115 data channels with a spectral coverage ranging from 0.43 to 0.86 µm. After removing water absorption bands, we adopted the remaining 103 bands for the experiments. The image contains 610 × 340 pixels with a spatial resolution of 1.3 m. Nine classes are considered in this data set, that is, asphalt, meadows, gravel, trees, metal sheets, soil, bitumen, bricks and shadows. Figure 2 shows its false color scene and the corresponding ground truth.

Heihe data set [53,54]: This data set was captured by CASI/SASI sensors in the Zhangye basin, Gansu Province, China. It is provided by the Heihe Plan Science Data Center, which is sponsored by the integrated research on the eco-hydrological process of the Heihe River Basin of the National Natural Science Foundation of China. The data set possesses a spatial size of 684 × 453 pixels with a geometric resolution of 2.4 m. Considering that 14 bands are severely affected by water absorption, the remaining 135 spectral bands were used for the experiments. The data set contains 9 different land cover types. The scene in false color and its ground truth are shown in Figure 3.

Washington DC Mall data set: This data set was acquired by the airborne HYDICE sensor over the mall in Washington DC. The spatial size of this data set is 250 × 307 pixels, and the spatial resolution equals 3 m per pixel. This data set covers a 0.4 µm to 2.4 µm spectral range with 210 bands. There are six different land cover types in the data set, such as Road, Building, Trail and Vegetation. The scene in false color and its corresponding ground truth are shown in Figure 4.

Experimental Setup
In each experiment, we randomly divided the HSI data into training and test sets. The training set is used to construct a DR model, and the test set is adopted to verify the effectiveness of the DR model. Then, the nearest neighbor (NN) classifier [55] was used for classification. After that, the classification accuracy of each class (CA), the overall classification accuracy (OA), the average classification accuracy (AA) and the kappa coefficient (KC) were adopted to evaluate the performance of the different DR methods [47,52]. Among them, CA is the classification accuracy on each class of land covers, AA is the mean value of the classification accuracies of all classes, OA refers to the number of correctly classified instances divided by the total number of test samples, and KC is a statistical measurement of the consistency between the ground truth map and the final classification map. Suppose N_ij is the number of samples in the j-th class that are classified into the i-th class; then the classification accuracy of the i-th class (CA_i), AA, OA and KC can be defined as Equations (23)-(26):

CA_i = N_ii / Σ_{j=1}^{c} N_ji, (23)

AA = (1/c) Σ_{i=1}^{c} CA_i, (24)

OA = (1/n) Σ_{i=1}^{c} N_ii, (25)

KC = (OA − P_e) / (1 − P_e), with P_e = (1/n²) Σ_{i=1}^{c} (Σ_{j=1}^{c} N_ij)(Σ_{j=1}^{c} N_ji), (26)

in which c is the number of land cover types of the HSI data and n is the total number of test samples. To evaluate the classification performance of the proposed SSMMDA algorithm, some state-of-the-art DR methods were selected for comparison; such methods include principal component analysis (PCA) [56], NPE [28], LPP [29], linear discriminant analysis (LDA) [57], maximum margin criterion (MMC) [58], MFA [31], nonparametric weighted feature extraction (NWFE) [59], sparsity preserving projections (SPP) [29], sparse discriminant embedding (SDE) [60], sparse manifold embedding (SME) [61], SMML [37], MMDA [33], DSSM [49], LPNPE [48] and weighted spatial-spectral and global-local discriminant analysis (WSSGLDA) [62]. RAW indicates that samples are classified without dimensionality reduction. The former twelve methods are spectral-based DR methods; they only consider the spectral information in HSI.
The latter three algorithms are spatial-spectral combined DR methods, which simultaneously consider the spectral information and the spatial structure of HSI data. Among all the spectral-based DR algorithms, PCA, NPE, LPP, LDA, MMC and MFA are single manifold learning methods, while SMML and MMDA assume that the high-dimensional HSI data lie on a multi-manifold structure.
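The accuracy measures defined in Equations (23)-(26) can be computed from a confusion matrix as in the following sketch (NumPy assumed). The small two-class matrix is made-up example data.

```python
import numpy as np

def accuracy_metrics(N):
    """CA, AA, OA and KC from a confusion matrix N, where N[i, j] counts
    samples of class j assigned to class i (the paper's convention)."""
    n = N.sum()
    ca = np.diag(N) / N.sum(axis=0)          # per-class accuracy CA_i, Eq. (23)
    aa = ca.mean()                           # average accuracy AA,    Eq. (24)
    oa = np.trace(N) / n                     # overall accuracy OA,    Eq. (25)
    pe = (N.sum(axis=0) * N.sum(axis=1)).sum() / n ** 2   # chance agreement P_e
    kc = (oa - pe) / (1 - pe)                # kappa coefficient KC,   Eq. (26)
    return ca, aa, oa, kc

# A small two-class example: 45/50 and 40/50 test samples correct.
N = np.array([[45, 10],
              [5, 40]])
ca, aa, oa, kc = accuracy_metrics(N)
# Here CA = (0.9, 0.8), AA = OA = 0.85, and KC = 0.7.
```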
In the experiments, we optimized the parameters to achieve good results for each method. For NPE, LPP, SMML and MMDA, we set the number of neighbors to 5. For MFA and DSSM, the numbers of intraclass and interclass neighbors were set to 7. For DSSM, LPNPE and WSSGLDA, the window size was set to 15 for the PaviaU and Heihe data sets and to 5 for the Washington DC data set. To reduce noise in the HSI, the HSI was pre-processed with a 5 × 5 spatial filter. For robustness, all the experiments were repeated 10 times in each condition.

Analysis of Window Size w and Neighbor Number k
To investigate the classification performance of SSMMDA with different window sizes w and neighbor numbers k, 20 samples were randomly selected from each class as the training set, and the remaining samples were used as the test set. In the experiment, parameters w and k were chosen from the sets {3, 5, 7, . . . , 25} and {1, 2, 3, . . . , 20}, respectively, and the NN classifier was used to classify the test samples. Figure 5 shows the OAs versus different values of w and k. As shown in Figure 5, as the window size w increases, the OAs first improve and then maintain a stable value. The reason is that a larger spatial window possesses more spatial neighbors, which can be utilized to construct a more effective DR model. However, if w is too large, the spatial information in the neighbors becomes redundant for the DR of HSI, so the classification accuracies of SSMMDA tend to remain stable. Besides, the classification accuracies improve with the increase of k and then fluctuate slightly, because a small number of neighbor points cannot effectively reveal the multi-manifold structure of HSI in the spectral domain, while a large k produces too much useless information for adjacency graph construction. To balance running time and classification performance, we set w = 15 and k = 5 for the PaviaU data set, w = 15 and k = 8 for the Heihe data set, and w = 17 and k = 8 for the Washington DC Mall data set in the following experiments.

Analysis of Tradeoff Parameters α and β
SSMMDA has two tradeoff parameters α and β, which are used to balance the contributions of the spectral information and the spatial structure of HSI. To verify the classification performance under different values of α and β, we randomly selected 20 samples from each type of land cover for training, and the rest were used for testing. Parameters α and β were both tuned over the set {0, 0.1, 0.2, 0.3, . . . , 0.9, 1}. After that, the NN classifier was used for classification. Figure 6 shows the average OAs with different α and β.
As can be seen from Figure 6, an increasing α causes only a subtle change in classification accuracy, because both the spectral-domain intramanifold graph and the spatial-domain intramanifold scatter matrix can enhance the intramanifold compactness, and a suitable α balances the contributions of the spectral information and the spatial structure in HSI. For parameter β, the OAs improve with the increase of β and then reach a stable peak value, which indicates that the intermanifold scatter matrix can effectively enhance the intermanifold separability. To achieve better classification performance, we set α and β to 0.8 and 0.6 for the PaviaU data set, and to 0.9 and 0.6 for the Heihe and Washington DC Mall data sets in the experiments.

Investigation of Embedding Dimension d
The proposed SSMMDA method is a dimensionality reduction method; thus, we analyze the classification accuracy with different embedding dimensions d. To evaluate the influence of different embedding dimensions on classification performance, 30 samples per class were randomly chosen as training samples, and the remaining samples were used as test samples. Figure 7 shows the classification results of the different algorithms under different embedding dimensions. In Figure 7, similar conclusions can be drawn on the PaviaU, Heihe and Washington DC Mall data sets. When the value of d is lower than 15, the OAs of the DR algorithms increase significantly as the dimension increases. The reason is that a larger d means that more discriminant features are utilized in the classification process, which is helpful for distinguishing land covers with similar spectra. When d is larger than 15, the feature information contained in the embedding space is close to saturation, and an increasing d no longer leads to an increase of the OAs. Besides, most DR methods obtain better classification performance than RAW, which indicates that these algorithms can effectively reduce the useless information in HSI. To achieve better classification results for all algorithms, we set d = 30 for all DR methods on the three data sets; the dimension of LDA is c − 1, where c is the number of land cover types in each data set.

Analysis of Training Sample Size
To investigate the classification performance of the algorithms under different numbers of training samples, n_i (n_i = 10, 20, 30, 40, 50, 60) samples were randomly selected from each class for training, and the rest of the samples were used for testing. For robustness, all the experiments were repeated 10 times under each condition. Tables 1-3 report the classification results of the different DR methods with different numbers of training samples on the PaviaU, Heihe and Washington DC Mall data sets.
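The sampling protocol above (draw n_i labeled pixels per class, use the rest for testing, and repeat to average out randomness) can be sketched as follows; the labels here are synthetic, since only the split logic is being illustrated.

```python
import numpy as np

def per_class_split(y, n_per_class, rng):
    """Randomly draw n_per_class training indices from every class;
    all remaining indices form the test set."""
    train = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_per_class, replace=False)
        for c in np.unique(y)
    ])
    test = np.setdiff1d(np.arange(len(y)), train)
    return train, test

rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 100)   # synthetic labels: 3 classes, 100 pixels each
train_idx, test_idx = per_class_split(y, 30, rng)
```

Repeating this split 10 times with fresh draws and averaging the resulting OAs gives the per-condition figures of the kind reported in Tables 1-3.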
According to Tables 1-3, the OAs of the above algorithms improve significantly as n_i increases. The reason is that a larger number of training samples contains more abundant feature information, which is conducive to the feature learning of HSI data. Among the single-manifold algorithms that use only spectral information, the supervised methods LDA, SME and MFA are superior to unsupervised ones such as PCA, NPE, LPP and SPP, which indicates that the prior label information of HSI helps to improve the classification performance. Among the single-manifold DR algorithms, most spatial-spectral combined methods achieve higher classification accuracies than the spectral-only DR methods, because HSI possesses the spatial consistency property and pixels from the same class are commonly distributed in blocks. Therefore, spatial information is of great significance for improving the feature representation and classification performance of HSI. Among the multiple-manifold DR algorithms, the proposed SSMMDA method achieves better classification results than SMML and MMDA in most conditions, because SMML and MMDA only consider the spectral information of HSI and do not utilize the spatial structure to further enhance the classification performance. Among all the DR methods, SSMMDA obtains the best OAs and KCs, because it simultaneously considers the multi-manifold structure and the spatial-spectral combined information in HSI.

Analysis of Classification Results
To explore the classification performance of SSMMDA on each class, we randomly selected a percentage of samples per class as the training set, and the remaining samples were used as the test set. The training set is used to construct a DR model, and the test set is adopted to verify the effectiveness of the model. In the experiments, we set the percentage to 0.5% for the PaviaU data set, 0.1% for the Heihe data set and 1% for the Washington DC Mall data set. The CAs, OAs, AAs and KCs of the different methods on the PaviaU, Heihe and Washington DC Mall data sets are detailed in Tables 4-6, respectively, and Figures 8-10 show the corresponding classification maps of the different algorithms on these three data sets.
As shown in Tables 4-6, SSMMDA achieves better classification results than the other algorithms in most classes, and it possesses the best OA, AA and KC on all three HSI data sets. This indicates that the proposed SSMMDA method has a stronger ability to characterize the geometric relations between samples and to extract more effective spatial-spectral combined information for classification. Besides, the classification maps of SSMMDA in Figures 8-10 contain more homogeneous regions than those of the other methods, especially in the classes of Asphalt, Bitumen and Bricks for the PaviaU data set; Corn, Watermelon and Artificial Surface for the Heihe data set; and Road, Water, Building and Shadow for the Washington DC Mall data set. The reason is that SSMMDA not only explores the multiple-manifold structure of HSI data but also utilizes the spatial-spectral combined information to extract discriminant features, which is conducive to the classification of HSI data.

Analysis of Computational Efficiency
To quantitatively compare the computational efficiency of the algorithms, Table 7 reports the computational time of each algorithm, where 30 labeled samples per class were selected for training and the remaining samples were used for testing. All the results in Table 7 were obtained on a personal computer with an Intel i3-7100 CPU and 12 GB of memory, running 64-bit Windows 10 and MATLAB 2017a. As shown in Table 7, the proposed SSMMDA method requires more time in the classification process of HSI. The reason is that SSMMDA simultaneously considers the spatial and spectral information, so it needs to build two models, one in the spatial domain and one in the spectral domain. Besides, since SSMMDA learns c different mapping matrices, it needs to project the samples into c low-dimensional spaces when calculating their embedding features, which slightly increases the computational complexity. However, this slight increase is acceptable relative to the improvement in classification performance.
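The per-submanifold projection described above can be sketched as follows. Here the c mapping matrices W_k and the class means are random placeholders (the real matrices come from solving SSMMDA's objective on each submanifold), and the fusion rule, assigning the class whose mean lies nearest to the sample in that class's own subspace, is one plausible choice for illustration rather than the paper's exact rule.

```python
import numpy as np

rng = np.random.default_rng(1)
n_bands, d, c = 50, 10, 3

# Placeholder per-submanifold projections: orthonormal n_bands x d matrices.
# In SSMMDA these would be the c optimal projections, one per submanifold.
Ws = [np.linalg.qr(rng.normal(size=(n_bands, d)))[0] for _ in range(c)]
class_means = rng.normal(size=(c, n_bands))   # placeholder class prototypes

def classify(x, Ws, class_means):
    """Project x into each of the c subspaces and assign the class whose
    mean is nearest to x within its own subspace."""
    dists = [np.linalg.norm(W.T @ (x - m)) for W, m in zip(Ws, class_means)]
    return int(np.argmin(dists))

# A sample lying near the third class prototype should be assigned class 2.
label = classify(class_means[2] + 0.01 * rng.normal(size=n_bands), Ws, class_means)
```

This also makes the cost remark concrete: classifying one pixel requires c matrix-vector products instead of one, which is the slight overhead noted above.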

Conclusions
High-dimensional data can be divided into different subsets, each of which lies on a particular submanifold. Traditional feature learning methods cannot effectively discover the multi-manifold structure in hyperspectral imagery, which restricts their classification performance on HSI. In this paper, we proposed a new DR method termed spatial-spectral multiple manifold discriminant analysis (SSMMDA) for HSI classification. SSMMDA first divides the HSI data into several subsets according to the label information of the training samples. Then, it constructs an intramanifold graph for each submanifold to characterize the within-manifold similarity in the spectral domain, and designs an intramanifold scatter matrix and an intermanifold scatter matrix for each submanifold to characterize the within-manifold similarity and the between-manifold dissimilarity in the spatial domain. After that, a spatial-spectral combined DR model is built for each submanifold to obtain an optimal projection, and the discriminative features on the different submanifolds are fused to enhance the classification performance. Experiments on the PaviaU, Heihe and Washington DC Mall hyperspectral data sets demonstrate that the proposed SSMMDA method can obtain effective discriminative features and significantly improve the classification performance of HSI. In future work, we will focus on how to obtain the optimum embedding dimension for each submanifold to further improve the classification performance of HSI.