Multiscale Superpixelwise Locality Preserving Projection for Hyperspectral Image Classification

Abstract: Manifold learning is a powerful dimensionality reduction tool for hyperspectral image (HSI) classification, relieving the curse of dimensionality and revealing the intrinsic low-dimensional manifold. However, a specific characteristic of HSIs, i.e., irregular spatial dependency, is not taken into consideration in the method design, even though it yields many spatially homogeneous subregions in an HSI scene. Conventional manifold learning methods, such as locality preserving projection (LPP), pursue a unified projection on the entire HSI, while neglecting the local homogeneities on the HSI manifold caused by those spatially homogeneous subregions. In this work, we propose a novel multiscale superpixelwise LPP (MSuperLPP) for HSI classification to overcome this challenge. First, we partition an HSI into homogeneous subregions with a multiscale superpixel segmentation. Then, on each scale, subregion-specific LPPs and the associated preliminary classifications are performed. Finally, we aggregate the classification results from all scales using a decision fusion strategy to achieve the final result. Experimental results on three real hyperspectral data sets validate the effectiveness of our method.


Introduction
Hyperspectral image (HSI) classification has been a research hotspot in recent years, since it plays a critical role in military target detection, precision agriculture, mine exploration, and many other applications [1][2][3]. With abundant spectral information, HSIs show great potential in identifying different ground objects of interest. However, the hundreds of spectral bands in an HSI also bring about problems such as a heavy computational burden and the Hughes phenomenon, which means that a large number of training HSI pixels is required to maintain statistical confidence in the HSI classification task [4,5]. Such problems usually hinder HSI classifiers from achieving excellent performance in real applications. In fact, HSI spectral bands are strongly correlated; hence, the spectral signature of each HSI pixel can be represented by only a few features [6][7][8]. Therefore, one often-used strategy to overcome the dimensionality dilemma mentioned above is dimensionality reduction, which aims to represent the high-dimensional HSI data in a low-dimensional space without losing information important for discriminative tasks.
Numerous dimensionality reduction methods have been developed, which can be roughly categorized into two groups: feature transform and feature selection [9,10]. Feature transform projects the original HSI data into an appropriate low-dimensional space, while feature selection chooses the most representative bands from the HSI. Feature transform has an advantage over feature selection in that it has the potential to maintain the original information in the high-dimensional data during dimensionality reduction and thus generate more discriminative features [11,12]. Principal component analysis (PCA) and linear discriminant analysis (LDA) are two typical feature transform methods. PCA attempts to map the data along the directions of maximal variance [13]. LDA tends to maximize between-class distances while minimizing within-class distances [14]. Both PCA and LDA assume that the information contained in high-dimensional data lies in a linear low-dimensional subspace, whereas nonlinear structures are often exhibited in real HSI data. To cope with the nonlinearities, manifold learning methods were developed. Manifold learning assumes that the high-dimensional data actually lie on a low-dimensional manifold structure, which can be parameterized with a group of identifiable coordinates. One of the most popular manifold methods is locality preserving projection (LPP), which builds a graph to capture geometric structures of data and subsequently establishes a projection from the original data space to the low-dimensional space [15][16][17][18][19].
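As a concrete illustration of the maximal-variance mapping that PCA performs, the following is a minimal NumPy sketch (a generic textbook implementation, not the paper's code; the function name and the toy data are ours):

```python
import numpy as np

def pca_project(X, d):
    """Project a D x N data matrix X onto its d principal components.

    X: (D, N) array, one spectral signature per column.
    Returns the (d, N) low-dimensional representation.
    """
    # Center each band (row) around its mean.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Eigen-decompose the D x D sample covariance matrix.
    cov = Xc @ Xc.T / (X.shape[1] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    A = eigvecs[:, ::-1][:, :d]              # top-d maximal-variance directions
    return A.T @ Xc

# Toy example: 200 synthetic 5-band "pixels" reduced to 2 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200))
Y = pca_project(X, 2)
print(Y.shape)  # (2, 200)
```

Because the data are centered before projection, each output feature has zero mean, which is the usual convention before feeding such features to a classifier.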
Other manifold learning approaches include locally linear embedding (LLE), which supposes that the structure represented by the linear combinations of the data's nearest neighbors is unchanged in both the high-dimensional and the associated low-dimensional spaces [20]; isometric mapping (ISOMAP), which utilizes the geodesic distance to perform low-dimensional embedding [21]; and local tangent space alignment (LTSA), which aims to recover the intrinsic manifold of the data by aligning the local tangent space of each pixel [22]. Irregular spatial dependency is an important characteristic of HSIs, which is caused by the usual occurrence of complex irregular ground objects in HSI scenes. This dependency brings about spatially local homogeneous subregions of different shapes and sizes in an HSI. Such subregions can be detected effectively with appropriate techniques such as superpixel segmentation [23][24][25], where pixels in a homogeneous subregion have similar spectral properties but vary relatively significantly across different subregions. Intuitively, such homogeneous subregions would result in local homogeneities on an HSI manifold. However, conventional manifold learning-based HSI dimensionality reduction methods, such as LPP, directly apply a unified projection on the entire HSI, missing those local homogeneities on the HSI manifold. Motivated by this consideration, we propose a multiscale superpixelwise LPP (MSuperLPP) method for HSI classification in this work. The method is able to deal with spatially local homogeneities in HSIs during dimensionality reduction, thus offering the potential to improve the subsequent classification. To the best of our knowledge, no similar work in the existing literature performs such local-homogeneity-aware manifold-based HSI dimensionality reduction. Our method comprises three major phases.
First, we segment an HSI into many homogeneous subregions using entropy rate superpixel segmentation (ERS) with a series of scales, which can fully exploit the rich spatial dependencies in different shapes and sizes. Next, on each scale, LPP is run on each subregion with spectral-spatial covariance features, and the obtained low-dimensional features are fed into a preliminary classifier. In the final step, the results on all the scales are aggregated with a decision fusion strategy to yield the final classification result.
The remainder of this paper is organized as follows. In Section 2, we introduce the background related to the proposed approach. Section 3 gives the details of the proposed MSuperLPP. Section 4 presents the experimental results, and Section 5 provides a summary of this paper.

Related Work and Background
Denote an HSI with N pixels and D bands as X = [x_1, x_2, ..., x_N] ∈ R^(D×N), and the corresponding data set in a low-dimensional space as Y = [y_1, y_2, ..., y_N] ∈ R^(d×N), where d ≪ D. LPP is a widely used manifold learning method for HSI dimensionality reduction [16][17][18][19][26], while the region covariance descriptor is an effective spectral-spatial feature for HSI classification [26][27][28]. In the following, both are briefly introduced as background for our proposed method.

Locality Preserving Projection
LPP builds a graph to capture the connections among all HSI pixels and then seeks an optimal projection A that maps the original HSI data into a low-dimensional space while maintaining the local connection relationships [15][16][17][18][19]. Its objective function can be mathematically represented as

min_A Σ_{i,j} ||y_i − y_j||² W_ij,  with y_i = Aᵀ x_i,   (1)

and the weight W_ij is defined as

W_ij = exp(−||x_i − x_j||² / σ), if x_j ∈ N(x_i) or x_i ∈ N(x_j); W_ij = 0, otherwise,   (2)

where σ denotes the scale and N(x_i) stands for the neighbors of x_i. Equation (2) measures the similarity between pixels x_i and x_j. Under this similarity, the objective in Equation (1) incurs a heavy penalty if similar pixels x_i and x_j are mapped far away from each other; thus, minimizing it guarantees that a small distance between x_i and x_j tends to force a small distance between y_i and y_j. Given the constraint Y D Yᵀ = I, which removes the arbitrary scaling factor, the aforementioned optimization problem can be further formulated as

min_A tr(Aᵀ X L Xᵀ A)  s.t.  Aᵀ X D Xᵀ A = I,   (3)

where D is the diagonal degree matrix with D_ii = Σ_j W_ij and L = D − W is the graph Laplacian matrix. The problem in Equation (3) can be solved by the following generalized eigenvalue formulation:

X L Xᵀ a = λ X D Xᵀ a,   (4)

where λ denotes the eigenvalue; the eigenvectors corresponding to the d smallest eigenvalues of Equation (4) form the projection matrix A = [a_1, a_2, ..., a_d].
In brief, LPP aims to obtain a projection that performs HSI dimensionality reduction while preserving the neighborhood structure of the data. Besides the standard LPP described above, additional constraints such as sparsity, tensor structure, and orthogonality can be imposed when solving for the projection [17][18][19][29].
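The LPP procedure described above (heat-kernel graph, graph Laplacian, generalized eigenproblem) can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation; the parameters k and sigma and the small ridge added for numerical stability are our assumptions:

```python
import numpy as np

def lpp(X, d, k=5, sigma=1.0):
    """Minimal LPP sketch. X is (D, N); returns the (d, N) embedding.

    Builds a k-NN heat-kernel graph, forms the Laplacian L = D - W, and
    solves the generalized eigenproblem X L X^T a = lambda X D X^T a.
    """
    D_dim, N = X.shape
    # Pairwise squared Euclidean distances between pixels.
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(sq[i])[1:k + 1]          # k nearest neighbors of x_i
        W[i, nbrs] = np.exp(-sq[i, nbrs] / sigma)  # heat-kernel weights
    W = np.maximum(W, W.T)                         # symmetrize the graph
    Dg = np.diag(W.sum(axis=1))                    # degree matrix
    L = Dg - W                                     # graph Laplacian
    XDX = X @ Dg @ X.T + 1e-9 * np.eye(D_dim)      # ridge keeps it positive definite
    XLX = X @ L @ X.T
    # Whiten with XDX^{-1/2} to reduce the problem to a standard eigenproblem.
    w, U = np.linalg.eigh(XDX)
    B = U @ np.diag(w ** -0.5) @ U.T
    vals, vecs = np.linalg.eigh(B @ XLX @ B)
    A = B @ vecs[:, :d]        # eigenvectors of the d smallest eigenvalues
    return A.T @ X

# Toy example: 60 synthetic 10-band pixels embedded into 3 dimensions.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 60))
Y = lpp(X, d=3, sigma=10.0)
print(Y.shape)  # (3, 60)
```

The whitening step is one standard way to solve a generalized symmetric eigenproblem with a plain eigendecomposition; a generalized eigensolver would serve equally well.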

Region Covariance Descriptor
The region covariance descriptor has been applied in computer vision and brain-computer interface problems [30][31][32]. Deng et al. introduced the descriptor to HSI processing [26][27][28]. Suppose that X ∈ R^(W×H×D) represents the original HSI cube, X_i ∈ R^(w×h×D) denotes the spatial region around the ith pixel, and s = w × h is the size of a spatially local window. The region covariance descriptor of the ith pixel is then

C_i = (1 / (s − 1)) Σ_{t=1}^{s} (x_t − μ_i)(x_t − μ_i)ᵀ,   (5)

where x_t (t = 1, 2, ..., s) are the pixels in X_i and μ_i is their mean; C_i can be regarded as a spectral-spatial feature of the ith pixel. Since the covariance feature is a symmetric positive definite matrix and lies on a Riemannian manifold [32], the Log-Euclidean distance metric

d(C_i, C_j) = ||log(C_i) − log(C_j)||_F   (6)

can measure the similarity between C_i and C_j, where log(·) is the matrix logarithm and ||·||_F denotes the Frobenius norm. The region covariance descriptor yields an excellent spectral-spatial covariance feature for HSI classification [26][27][28], which is more robust to noise and spectral variabilities than the traditional spectral feature (the original spectral signature) [33][34][35].
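The covariance descriptor and the Log-Euclidean distance can be sketched as follows (a minimal illustration under our own naming; the small ridge added to the toy covariances is an assumption to keep them strictly positive definite):

```python
import numpy as np

def region_covariance(patch):
    """Covariance descriptor of a w x h x D patch (Eq. 5 style)."""
    s = patch.shape[0] * patch.shape[1]
    V = patch.reshape(s, -1)              # s pixels, D bands
    Vc = V - V.mean(axis=0)               # subtract the mean spectrum
    return Vc.T @ Vc / (s - 1)            # D x D covariance matrix

def spd_log(C):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, U = np.linalg.eigh(C)
    return U @ np.diag(np.log(w)) @ U.T

def log_euclidean_distance(Ci, Cj):
    """Log-Euclidean distance ||log(Ci) - log(Cj)||_F (Eq. 6 style)."""
    return np.linalg.norm(spd_log(Ci) - spd_log(Cj), ord='fro')

# Toy example: two 3 x 3 patches with 4 bands each.
rng = np.random.default_rng(2)
p1, p2 = rng.normal(size=(3, 3, 4)), rng.normal(size=(3, 3, 4))
C1 = region_covariance(p1) + 1e-3 * np.eye(4)  # ridge keeps C SPD
C2 = region_covariance(p2) + 1e-3 * np.eye(4)
print(log_euclidean_distance(C1, C2) > 0)  # True
```

Computing the matrix logarithm via the eigendecomposition is valid precisely because the descriptor is symmetric positive definite, which is what places it on the Riemannian manifold mentioned above.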
Following one of the most effective HSI classification routines, which comprises three successive phases, i.e., feature extraction, dimensionality reduction, and classification, combinations of different features and different manifold-based dimensionality reduction methods yield different HSI manifold learning schemes, which can be evaluated with a subsequent HSI classifier.

Proposed Method
A remotely sensed HSI scene usually comprises irregular ground objects in various shapes and sizes, which leads to the irregular spatial dependency characteristic of an HSI and, in turn, to local homogeneities on the corresponding HSI manifold. In the following, we develop a new method called MSuperLPP to deal with these manifold local homogeneities. Different from a conventional LPP, which applies a unified projection on the entire HSI data, our method adaptively determines homogeneous subregions, and thus the local homogeneities of the HSI manifold, and then employs a divide-and-conquer strategy to perform LPP processing on those subregions. Therefore, our method is able to fully explore local homogeneities on the HSI manifold and thus collect more useful discriminative information, which offers the potential to enhance the final classification performance.

Determination of Manifold Local Homogeneity with Multiscale Superpixel Segmentation
Taking into consideration the irregular spatial dependency characteristic of HSIs, we adopt a multiscale superpixel segmentation strategy here to determine the irregular homogeneous subregions of various shapes and sizes in an HSI. ERS can achieve a natural representation of visual scenes [36,37]. It is based on an undirected graph G(V, E), where V is the vertex set and E is the edge set. The vertices can be the pixels of an HSI, and the weight of an edge then measures the similarity of the two connected pixels. Thus, the image segmentation task becomes the problem of how to partition the graph properly. ERS aims to choose a subset S of E that partitions the graph into K subgraphs using the following criterion:

max_S H(S) + λ B(S)  s.t.  S ⊆ E,   (7)

where the entropy rate term H(S) prefers homogeneous clusters, the regularization term B(S) forces the clusters to be of similar sizes, and the weight λ is nonnegative. To solve the optimization problem in Equation (7), Liu et al. gave an efficient greedy algorithm [36]. In our method, we first use PCA to obtain the first principal component of the HSI and then, for convenience, perform ERS on the extracted principal component. ERS tends to divide the image into subregions of similar sizes once the number of superpixels is given. However, the homogeneous subregions in the HSI are of different sizes. That is to say, ERS with a single scale cannot represent the homogeneities well. Therefore, we perform the segmentation in a multiscale way to tackle this issue, where ERS is run at several scales. Specifically, if an HSI is segmented with 2P + 1 scales, then the number of superpixels on the pth scale is determined by

N_sp^p = 2^p × N_sf,  p = −P, ..., 0, ..., P,   (8)

where p is the index of scales and N_sf is the fundamental number of superpixels, given empirically. With a series of scales, the segmentation is expected to adapt to homogeneities of various sizes in the scene and to offer the potential to further reduce the impact of the irregular spatial dependency.
A superpixel obtained in the HSI represents a spatially homogeneous subregion, while different superpixels exhibit spectral signature differences to some degree. Such homogeneous subregions in an HSI imply that there are local homogeneities on the associated HSI manifold. Thus, we can determine local homogeneities on the corresponding HSI manifold by performing the multiscale superpixel segmentation.
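The multiscale superpixel-count schedule can be sketched as follows. The doubling/halving rule N_p = 2^p · N_sf for p = −P, ..., P is an assumption consistent with the 2P + 1 scales around a fundamental number described above; the function name is ours:

```python
def superpixel_scales(n_sf, P):
    """Return the 2P + 1 superpixel counts around the fundamental number n_sf.

    Assumes the counts double/halve across scales; each count is rounded and
    clamped to at least 1 superpixel.
    """
    return [max(1, round(n_sf * 2 ** p)) for p in range(-P, P + 1)]

# With n_sf = 100 and P = 4, as used for Indian Pines in the experiments:
print(superpixel_scales(100, 4))
# [6, 12, 25, 50, 100, 200, 400, 800, 1600]
```

Each count would then be passed to ERS as the target number of superpixels for that scale.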

Divide-and-Conquer-Based LPP Classification
As local homogeneities on an HSI manifold can be reflected with superpixels, performing an LPP in each superpixel with a superpixel specific projection is expected to preserve the spatially local geometric structure of the data well.
Our method utilizes the region covariance descriptor to construct the graph for an LPP. The descriptor yields spectral-spatial covariance features that are more robust to noise and spectral variabilities than the spectral feature [26][27][28]. More specifically, we characterize pixels with their region covariance descriptors, calculate the pairwise distances among pixels in a superpixel with the Log-Euclidean distance metric, find the nearest neighbors of the pixels in a superpixel, and compute their similarities as

W_ij = exp(−d²(C_i, C_j) / σ), if C_j ∈ N(C_i); W_ij = 0, otherwise,   (9)

where N(C_i) denotes the neighbors of C_i and d(·, ·) is the Log-Euclidean distance in Equation (6). Then, we run an LPP on each superpixel. After performing the dimensionality reduction on the HSI with 2P + 1 different segmentation scales, we classify the obtained low-dimensional data on each scale, and then majority voting decision fusion is utilized to aggregate the results from all the scales. Suppose that the classification result on the jth scale for a pixel is l_j. Then, the score for the ith class is

v_i = Σ_{j=1}^{2P+1} I(l_j = i),   (10)

where I(l_j = i), which equals 1 if l_j = i and 0 otherwise, denotes an indicator function. The final classification result is achieved by l = arg max_{i ∈ {1, 2, ..., L}} v_i, where L is the number of all possible classes. The flowchart of our proposed MSuperLPP for HSI classification is shown in Figure 1, and the corresponding steps can be found in Algorithm 1.
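The majority voting fusion step can be sketched as follows (a minimal illustration with our own function name; per-scale label maps are assumed to be integer class indices):

```python
import numpy as np

def majority_vote(labels_per_scale):
    """Fuse per-scale label maps of shape (S, N) by majority voting.

    labels_per_scale[j, n] is the class assigned to pixel n on scale j.
    Ties are broken toward the smallest class index by argmax.
    """
    labels = np.asarray(labels_per_scale)
    n_classes = labels.max() + 1
    # score[i, n] counts how many scales voted class i for pixel n.
    score = np.stack([(labels == i).sum(axis=0) for i in range(n_classes)])
    return score.argmax(axis=0)

# Toy example: 3 scales, 3 pixels.
votes = np.array([[0, 1, 2],
                  [0, 1, 1],
                  [1, 1, 2]])
print(majority_vote(votes))  # [0 1 2]
```

Each row of the score matrix realizes the per-class vote count of Equation (10), and the argmax over classes realizes the final decision rule.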

Algorithm 1 MSuperLPP
Input: HSI X ∈ R^(W×H×D); scale set N_sp = {N_i | i = 1, 2, ..., S} obtained by (8); window size s = w × h.
Extract spectral-spatial covariance features using (5) and perform PCA to extract the first principal component I_f.
for i = 1 to S do
    Segment X into N_i homogeneous subregions using ERS on I_f;
    for j = 1 to N_i do
        Perform LPP in subregion R_j, where the spectral-spatial covariance features are used to search for the k nearest neighbors.
    end for
    Combine the low-dimensional features of all the subregions on the same scale to form the low-dimensional data on this scale.
    Perform classification on scale i to get the preliminary output T_i.
end for
Aggregate the classification results T_i (i = 1, 2, ..., S) using (10) to obtain the final result.

Experimental Results
In our experiments, three real HSI data sets are used to evaluate the proposed MSuperLPP. The first is the Indian Pines data set, which was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over northwestern Indiana; it includes 220 spectral bands and 145 × 145 pixels, and its available ground truth contains 10,249 labeled pixels from 16 classes. Forty noisy bands are removed, and the remaining 180 bands are used for the experiments. The second is the Zaoyuan data set, which was collected by the Operational Modular Imaging Spectrometer (OMIS) over the Zaoyuan region, China, in 2001. The data set comprises 137 × 202 pixels and 128 bands covering the region; after removing the noisy bands, 80 bands remain, and 23,821 labeled pixels from 8 classes are used for classification. The last is the Salinas data set, which was collected by AVIRIS over Salinas Valley, California. The scene used for the experiments comprises 250 × 110 pixels and 224 bands and has 14 test classes; after removing the noisy bands, 174 bands are left for the experiments.
For the Indian Pines and Zaoyuan data sets, we randomly choose 5, 10, 30, and 50 pixels per class for training, while the remaining pixels are used for testing. Since some classes have only a few samples, we select at most half of a class's samples for training. For the Salinas data set, 1, 3, 5, and 10 pixels per class are selected as training samples. After dimensionality reduction, both the nearest neighbor (NN) classifier and the support vector machine (SVM) with the radial basis function (RBF) kernel are applied to evaluate the proposed method. The free parameters of the two classifiers are set with cross-validation [38], including the number of nearest neighbors for the NN classifier and the RBF scale and regularization coefficient for the SVM classifier. We repeat all the classification experiments ten times to avoid random bias, and the average accuracies are presented.
We compare the proposed MSuperLPP with GlobalLPP-SF (global LPP on the spectral feature), GlobalLPP-SSCF (global LPP on the region covariance descriptor-based spectral-spatial covariance feature), SuperLPP-SF (superpixelwise LPP on the spectral feature), and SuperLPP-SSCF (superpixelwise LPP on the spectral-spatial covariance feature). In our experimental settings, some other parameters are empirically determined as follows. The window size of the region covariance descriptor is 3 × 3 for GlobalLPP-SSCF, SuperLPP-SSCF, and MSuperLPP. In MSuperLPP, the fundamental number of superpixels N_sf is set to 100 for Indian Pines, 40 for Zaoyuan, and 15 for Salinas, while P is set to 4 for all three data sets.
Tables 1-3 report the overall classification accuracies on the Indian Pines, Zaoyuan, and Salinas data sets, respectively. Figure 2 shows the influence of the number of training samples on the classification performance. As observed, GlobalLPP-SSCF performs better than GlobalLPP-SF, and SuperLPP-SSCF performs better than SuperLPP-SF, which suggests that the spectral-spatial covariance feature is beneficial to improving classification accuracy. Meanwhile, SuperLPP-SSCF and SuperLPP-SF achieve higher accuracies than GlobalLPP-SSCF and GlobalLPP-SF, respectively, which indicates that superpixel segmentation can enhance classification accuracy. Among all five considered methods, our proposed MSuperLPP yields the highest accuracies.
For visual inspection purposes, the classification maps obtained with the five compared methods are also presented. Here, we only show the results obtained with the nearest neighbor classifier when the training set size per class is set to 50 for Indian Pines and Zaoyuan and to 10 for Salinas. It can be seen that our MSuperLPP yields the best regional consistency and agrees best with the ground truth. Therefore, the experimental results verify that our MSuperLPP, with its multiscale superpixel segmentation strategy, achieves excellent classification performance.

Conclusions
Taking into consideration the local homogeneities on the HSI manifold arising from the irregular spatial dependency characteristic of an HSI, which are usually ignored by existing manifold learning-based dimensionality reduction methods, we propose an MSuperLPP method for HSI classification. In MSuperLPP, we adopt a divide-and-conquer strategy: first dividing the HSI into many homogeneous subregions at various scales to reveal those local homogeneities, then performing an LPP in each subregion and a preliminary classification on each scale, and finally fusing all the preliminary classification results to yield the final result. The experimental results on real HSI data sets verify the excellent performance of our method.