Locally Weighted Discriminant Analysis for Hyperspectral Image Classification

A hyperspectral image (HSI) contains a large number of spectral bands for each pixel, which limits the ability of conventional image classification methods to distinguish the land-cover type of each pixel. Dimensionality reduction is an effective way to improve classification performance. Linear discriminant analysis (LDA) is a popular dimensionality reduction method for HSI classification, which assumes that all the samples obey the same distribution. However, different samples may contribute differently to the computation of the scatter matrices. To address the problem of feature redundancy, a new supervised HSI classification method based on locally weighted discriminant analysis (LWDA) is presented. The proposed LWDA method constructs a weighted discriminant scatter matrix model and an optimal projection matrix model for each training sample on the basis of discriminant information and spatial-spectral information. For each test sample, LWDA searches for its nearest training sample using spatial information and then uses the corresponding projection matrix to project the test sample and all the training samples into a low-dimensional feature space. LWDA can effectively preserve the spatial-spectral local structures of the original HSI data and improve the discriminating power of the projected data for the final classification. Experimental results on two real-world HSI datasets show the effectiveness of the proposed LWDA method compared with some state-of-the-art algorithms. In particular, when the data partition factor is small, i.e., 0.05, the overall accuracy obtained by LWDA increases by about 20% for Indian Pines and 17% for Kennedy Space Center (KSC) in comparison with the results obtained when directly using the original high-dimensional data.


Introduction
A hyperspectral image (HSI) is captured by an imaging spectrometer with hundreds of spectral bands for each image pixel, and it plays an important role in the fields of urban planning, precision agriculture, and land-cover classification [1][2][3][4][5]. Generally, the spectral bands of each pixel are treated as high-dimensional features. The high dimensionality of the original HSI data leads to feature redundancy and increases the computational complexity [6,7]. To overcome these drawbacks, it is critical to perform dimensionality reduction, which projects the original high-dimensional data into a low-dimensional feature subspace while preserving some desirable information.
The existing dimensionality reduction approaches can be classified into two categories: feature selection [8][9][10][11] and feature extraction. The focus of this paper is feature extraction, which constructs a low-dimensional embedding subspace and obtains meaningful information by projecting the original high-dimensional data onto it. Then, traditional classification methods (e.g., the support vector machine classifier) can be directly applied to the projected data. Therefore, classifying HSI data in a low-dimensional space helps to avoid feature redundancy and the Hughes phenomenon [12] and to reduce the computational complexity. Consequently, many feature extraction approaches have been presented [6][13][14][15][16][17][18][19][20]. Popular feature extraction methods include principal component analysis (PCA) [21], linear discriminant analysis (LDA) [22], locality preserving projection (LPP) [23], and modified locality preserving projection (MLPP) [24]. Compared with PCA and LPP, LDA can learn a linear transformation by simultaneously minimizing the intraclass distances and maximizing the interclass discrepancy. However, when LDA is directly applied to HSI data, it still faces several problems [25]: (1) when the dimensionality of the data exceeds the number of training samples, LDA suffers from an ill-posed problem; (2) when the reduced dimensionality is less than the number of classes, LDA has an over-reducing problem; (3) LDA neglects the spatial information in the discriminant analysis; (4) LDA assumes that all the samples obey the Gaussian distribution, which makes it difficult to construct the local classification boundary.
Recently, there have been many variants of LDA that try to improve the classification performance using some constraints, such as regularized local discriminant embedding (RLDE) [26], local geometric structure Fisher analysis (LGSFA) [18], and locality adaptive discriminant analysis (LADA) [27]. RLDE employs a regularized discriminant model to preserve the local structure of the HSI data. LGSFA retains the local structure among the within-class and between-class samples during the analysis process. LADA constructs a scatter matrix for each pixel with its small neighborhood, which is considered a regularization term. The above approaches can alleviate the ill-posed and over-reducing problems of the original LDA method. However, they only represent the local structure relationship of the HSI data as one-to-one. Moreover, the preservation of local structure still remains an open issue. Some new methods have been developed on the basis of graph learning. Ly et al. [28] used graph learning to construct scatter matrices, and then they conducted a discriminant analysis. Li et al. [29] proposed a two-stage framework to learn the data graph in the low-dimensional feature subspace. However, the above methods only consider the spectral information of the HSI data, which cannot accurately determine the local classification boundary.
To effectively exploit the spectral information and spatial information of the high-dimensional HSI data, a locally weighted discriminant analysis (LWDA)-based dimensionality reduction method is proposed for HSI classification in this paper. In order to apply the spatial information to the projection process, the proposed method learns the data structures adaptively during the transformation of subspace projection. Furthermore, to guarantee the spatial consistency of land cover, samples within a small neighborhood in the embedding space should be similar, which is considered a regularized constraint term during the optimization. The main contributions of this paper can be summarized as follows: (1) A weighted scatter matrix model is proposed by exploiting the label information and spectral information of the samples, which is able to reduce the effect of the image difference of the HSI data. (2) The proposed method considers the spatial consistency and the similarity relationship among the samples in a small spatial neighborhood, which is able to describe the local structure of the samples. (3) An optimization function is constructed on the basis of the spatial-spectral information and label information, which is able to preserve the within-class characteristics and suppress the between-class properties in the embedding feature subspace.
The remainder of this paper is organized as follows. Section 2 briefly introduces some related works, including the original LDA and MFA approaches. Section 3 provides our proposed method in detail. In Section 4.2, experimental results are presented to demonstrate the effectiveness of the proposed method compared with several state-of-the-art dimensionality reduction algorithms. Finally, a conclusion of this work is provided in Section 5.

Related Works
Let $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_n] \in \mathbb{R}^{d \times n}$ be the original HSI data, where $d$ is the number of spectral bands for each image pixel, i.e., the dimensionality of the HSI data, and $n$ is the number of image pixels considered as samples. The label of the $i$th pixel is denoted as $\ell(\mathbf{x}_i) \in \{1, 2, \cdots, c\}$, where $c$ is the number of classes. The goal of dimensionality reduction is to construct a projection matrix $\mathbf{P} \in \mathbb{R}^{d \times m}$, where $m$ is the reduced dimensionality of the projected data. For a linear mapping function, the projected data are denoted as $\mathbf{Y} = \mathbf{P}^T \mathbf{X}$. Generally, the value of $m$ is considerably smaller than $d$.

Linear Discriminant Analysis
Linear discriminant analysis (LDA) is a supervised method that compacts the within-class samples and separates the between-class samples. It defines a between-class scatter matrix $\mathbf{S}_b$ and a within-class scatter matrix $\mathbf{S}_w$ as follows:
$$\mathbf{S}_b = \sum_{k=1}^{c} n_k (\mathbf{u}_k - \mathbf{u})(\mathbf{u}_k - \mathbf{u})^T, \tag{1}$$
$$\mathbf{S}_w = \sum_{k=1}^{c} \sum_{i=1}^{n_k} (\mathbf{x}_i^k - \mathbf{u}_k)(\mathbf{x}_i^k - \mathbf{u}_k)^T, \tag{2}$$
where $n_k$ is the number of samples in the $k$th class, $\mathbf{x}_i^k$ is the $i$th sample from the $k$th class, and $\mathbf{u}$ is the total mean of all the samples. $\mathbf{u}_k$ is the mean of the $k$th class, computed by
$$\mathbf{u}_k = \frac{1}{n_k} \sum_{i=1}^{n_k} \mathbf{x}_i^k. \tag{3}$$
In addition, $T$ represents the transpose operation. With the above definitions, LDA tries to learn the linear transformation matrix $\mathbf{P}$ by maximizing the ratio of the between-class scatter to the within-class scatter. The projection matrix can be obtained by the following optimization function [22]:
$$\max_{\mathbf{P}} \operatorname{tr}\left( \frac{\mathbf{P}^T \mathbf{S}_b \mathbf{P}}{\mathbf{P}^T \mathbf{S}_w \mathbf{P}} \right), \tag{4}$$
where $\operatorname{tr}(\cdot)$ represents the trace operator. The optimal projection matrix $\mathbf{P}$ can be obtained by analytically solving the generalized eigenvalue decomposition and then choosing the $m$ eigenvectors that correspond to the $m$ largest eigenvalues. Then, the $m$-dimensional projected data can be computed by $\mathbf{Y} = \mathbf{P}^T \mathbf{X}$. Equations (1) and (2) reveal that the between-class scatter matrix depends only on the difference between each class mean and the total mean. Moreover, LDA is unable to capture the local manifold structure of the HSI data. Due to these two drawbacks, it is difficult for LDA to achieve satisfactory performance in real-world HSI applications.
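As an illustration of the LDA procedure described above, the following is a minimal NumPy sketch rather than the authors' implementation. The function name `lda_projection`, the synthetic two-class data, and the use of a pseudo-inverse to tolerate a singular within-class scatter matrix are all assumptions made for this example.

```python
import numpy as np

def lda_projection(X, labels, m):
    """Return a d x m LDA projection matrix for X (d x n, one sample per column)."""
    d, _ = X.shape
    total_mean = X.mean(axis=1, keepdims=True)
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for k in np.unique(labels):
        X_k = X[:, labels == k]                    # samples of class k
        u_k = X_k.mean(axis=1, keepdims=True)      # class mean
        S_b += X_k.shape[1] * (u_k - total_mean) @ (u_k - total_mean).T
        S_w += (X_k - u_k) @ (X_k - u_k).T
    # Generalized eigenproblem S_b p = lambda * S_w p, solved via pinv(S_w) @ S_b.
    # The pseudo-inverse is an assumption of this sketch, not part of the source.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(-eigvals.real)[:m]          # m largest eigenvalues
    return eigvecs[:, order].real

# Synthetic example: two Gaussian classes in five dimensions (hypothetical data).
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(0.0, 1.0, (5, 30)), rng.normal(3.0, 1.0, (5, 30))])
labels = np.array([0] * 30 + [1] * 30)
P = lda_projection(X, labels, m=1)
Y = P.T @ X                                        # projected data, shape (1, 60)
```

With only two classes the between-class scatter has rank one, so a single projection direction carries all of the discriminant information in this toy case.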

Marginal Fisher Analysis
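Marginal Fisher analysis (MFA) is a supervised graph learning method, which constructs an inherent graph and a penalty graph [18]. The inherent graph tries to obtain certain geometrical information of the input dataset, while the penalty graph reveals the unwanted properties of the inputs. MFA designs two weight matrices. Let $\mathbf{W} = \{w_{ij}\}_{i,j=1}^{n}$ and $\mathbf{W}^p = \{w_{ij}^p\}_{i,j=1}^{n}$ be the similarity matrix and the penalty matrix. $w_{ij}$ represents the similarity relationship between two data points $\mathbf{x}_i$ and $\mathbf{x}_j$ that are from the same class. On the other hand, $w_{ij}^p$ describes the similarity characteristic between $\mathbf{x}_i$ and $\mathbf{x}_j$ that are from different classes. The mathematical descriptions of $w_{ij}$ and $w_{ij}^p$ are defined as follows:
$$w_{ij} = \begin{cases} 1, & \mathbf{x}_i \in N_1(\mathbf{x}_j) \text{ or } \mathbf{x}_j \in N_1(\mathbf{x}_i), \\ 0, & \text{otherwise}, \end{cases} \tag{5}$$
$$w_{ij}^p = \begin{cases} 1, & \mathbf{x}_i \in N_2(\mathbf{x}_j) \text{ or } \mathbf{x}_j \in N_2(\mathbf{x}_i), \\ 0, & \text{otherwise}, \end{cases} \tag{6}$$
where $N_1(\mathbf{x}_i)$ and $N_1(\mathbf{x}_j)$ represent the $k_1$ nearest neighbors of data points $\mathbf{x}_i$ and $\mathbf{x}_j$ that are from the same class, and $N_2(\mathbf{x}_i)$ and $N_2(\mathbf{x}_j)$ represent the $k_2$ nearest neighbors of data points $\mathbf{x}_i$ and $\mathbf{x}_j$ that are from different classes, respectively. With the definition of the two weight matrices, the optimization function of MFA is designed to obtain a projection matrix $\mathbf{P}$ as follows:
$$\min_{\mathbf{P}} \frac{\sum_{i,j} \| \mathbf{P}^T \mathbf{x}_i - \mathbf{P}^T \mathbf{x}_j \|^2 w_{ij}}{\sum_{i,j} \| \mathbf{P}^T \mathbf{x}_i - \mathbf{P}^T \mathbf{x}_j \|^2 w_{ij}^p} \;\Rightarrow\; \min_{\mathbf{P}} \operatorname{tr}\left( \frac{\mathbf{P}^T \mathbf{X} \mathbf{L} \mathbf{X}^T \mathbf{P}}{\mathbf{P}^T \mathbf{X} \mathbf{L}^p \mathbf{X}^T \mathbf{P}} \right), \tag{7}$$
where $\mathbf{L}$ and $\mathbf{L}^p$ are the Laplacian matrices, which are defined as $\mathbf{L} = \operatorname{diag}(\sum_j w_{ij}) - \mathbf{W}$ and $\mathbf{L}^p = \operatorname{diag}(\sum_j w_{ij}^p) - \mathbf{W}^p$. Here, $\operatorname{diag}(\cdot)$ forms a diagonal matrix from its arguments. Equation (7) can be solved analytically through the generalized eigenvalue decomposition of $\mathbf{X} \mathbf{L} \mathbf{X}^T$ and $\mathbf{X} \mathbf{L}^p \mathbf{X}^T$. Then, the optimal projection matrix $\mathbf{P}$ is formed by the $m$ eigenvectors corresponding to the $m$ smallest eigenvalues.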
MFA tries to enhance the compactness of the data points from the same class and to improve the separability of the data points from different classes in the embedding feature subspace. However, the similarity relationship between two data points in a small neighborhood is simplified to 1, which limits its ability to learn the local manifold structure of the HSI data.
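The binary graph construction used by MFA can be sketched as follows. This is an illustrative NumPy example under stated assumptions: the function names `mfa_graphs` and `laplacian`, the Euclidean distance for neighbor search, and the random toy data are not taken from the source.

```python
import numpy as np

def mfa_graphs(X, labels, k1, k2):
    """Build the 0/1 similarity matrix W and penalty matrix W_p for X (d x n)."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * X.T @ X   # pairwise squared distances
    W = np.zeros((n, n))
    W_p = np.zeros((n, n))
    for i in range(n):
        same = np.flatnonzero((labels == labels[i]) & (np.arange(n) != i))
        diff = np.flatnonzero(labels != labels[i])
        for j in same[np.argsort(dist[i, same])[:k1]]:
            W[i, j] = W[j, i] = 1.0        # x_j is among the k1 same-class neighbors of x_i
        for j in diff[np.argsort(dist[i, diff])[:k2]]:
            W_p[i, j] = W_p[j, i] = 1.0    # x_j is among the k2 other-class neighbors of x_i
    return W, W_p

def laplacian(W):
    """Graph Laplacian: diagonal matrix of row sums minus W."""
    return np.diag(W.sum(axis=1)) - W

# Toy data: 12 samples, 4 features, two classes (hypothetical).
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 12))
labels = np.array([0] * 6 + [1] * 6)
W, W_p = mfa_graphs(X, labels, k1=2, k2=3)
L, L_p = laplacian(W), laplacian(W_p)
```

Because every nonzero weight equals 1, all neighbors contribute equally regardless of how close they are, which is the limitation noted in the paragraph above.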

Proposed Method
To take advantage of the discriminant information and the spatial-spectral information of the input HSI data, a new supervised dimensionality reduction method, named locally weighted discriminant analysis (LWDA), is presented for HSI classification. LWDA constructs a weighted scatter matrix model on the basis of the within-class and between-class scatter matrices of the traditional LDA method. The weighted scatter matrix model defines a weighted within-class scatter matrix and a weighted between-class scatter matrix to improve the discriminating power. Furthermore, LWDA preserves the spatial consistency among the samples in a small spatial neighborhood. To construct the optimal low-dimensional feature subspace, the proposed method obtains the corresponding projection matrix by minimizing the weighted within-class scatter and the spatial consistency term while maximizing the weighted between-class scatter.
The flowchart of the proposed method is shown in Figure 1, where the high-dimensional HSI data are projected onto a two-dimensional subspace for visualization. Taking the classification process of the Indian Pines dataset as an example, the steps of the proposed algorithm can be summarized as follows: (1) on the basis of the training samples and the corresponding training labels, the weighted within- and between-class scatter matrices are computed; (2) with the help of the training and test samples, the spatial consistency matrix for each training sample is computed; (3) with the foregoing weighted scatter matrices and spatial consistency matrix, the optimal projection matrix corresponding to each training sample is obtained; (4) for each test sample, the projection matrix of the spatially closest training sample is retrieved and used to construct the embedding features; (5) the class estimate of each test sample is obtained by a classifier with the training labels and the embedding features.

Weighted Scatter Matrix Model
LDA assumes that all the samples contribute equally, i.e., that they obey the Gaussian distribution. In LDA, the within-class scatter matrix only considers the data variances of the within-class samples, while the between-class scatter matrix just considers the data variance between the mean of each individual class and the total mean. However, different within-class samples should have different contribution rates in the within-class scatter matrix. Moreover, the properties of any two different individual class means may be different in the between-class scatter matrix. To better represent the similarity characteristic of the within-class samples and different individual class means, the proposed method constructs two weighted scatter matrices, i.e., the weighted within-class scatter matrix $\mathbf{S}_w$ and the weighted between-class scatter matrix $\mathbf{S}_b$, which are defined in Equations (8) and (9), where $g_{i,j}^k$ is the similarity weight between the samples $\mathbf{x}_i^k$ and $\mathbf{x}_j^k$, and $h_{i,j}$ is the similarity weight between the class means $\mathbf{u}_i$ and $\mathbf{u}_j$. The similarity weights are defined in Equations (10) and (11), where $\varepsilon$ is a small value that avoids a zero denominator.
Similar to LDA, the optimization function is designed to improve the aggregation of the within-class samples and enhance the diversity of the between-class samples in a low-dimensional feature subspace. So, the optimal projection matrix can be obtained by the following formula:
$$\min_{\mathbf{P}} \operatorname{tr}\left( \frac{\mathbf{P}^T \mathbf{S}_w \mathbf{P}}{\mathbf{P}^T \mathbf{S}_b \mathbf{P}} \right). \tag{12}$$
Supposing the minimum value of the above function is $\alpha$, the optimal $\mathbf{P}$ should make the value of $\operatorname{tr}(\mathbf{P}^T \mathbf{S}_w \mathbf{P}) - \alpha \operatorname{tr}(\mathbf{P}^T \mathbf{S}_b \mathbf{P})$ close to 0. Thus, Equation (12) is equivalent to
$$\min_{\mathbf{P}} \operatorname{tr}(\mathbf{P}^T \mathbf{S}_w \mathbf{P}) - \alpha \operatorname{tr}(\mathbf{P}^T \mathbf{S}_b \mathbf{P}). \tag{13}$$
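The step from the ratio form to the difference form can be spelled out as follows. This short derivation is a sketch that reads the ratio objective as a ratio of traces and assumes the between-class term is strictly positive for every admissible projection; both are assumptions about the notation rather than statements from the source.

```latex
% Assumption: J(P) is read as a ratio of traces, with tr(P^T S_b P) > 0.
\[
J(\mathbf{P}) = \frac{\operatorname{tr}(\mathbf{P}^T \mathbf{S}_w \mathbf{P})}
                     {\operatorname{tr}(\mathbf{P}^T \mathbf{S}_b \mathbf{P})},
\qquad
\alpha = \min_{\mathbf{P}} J(\mathbf{P}).
\]
% For every admissible P, factor out the positive denominator:
\[
\operatorname{tr}(\mathbf{P}^T \mathbf{S}_w \mathbf{P})
 - \alpha \operatorname{tr}(\mathbf{P}^T \mathbf{S}_b \mathbf{P})
 = \operatorname{tr}(\mathbf{P}^T \mathbf{S}_b \mathbf{P})
   \bigl( J(\mathbf{P}) - \alpha \bigr) \;\ge\; 0,
\]
% with equality exactly when J(P) = alpha, so both problems share their minimizers.
```

In practice the minimum value is not known in advance, which is consistent with $\alpha$ being treated as a parameter of the method.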

Spatial Consistency Matrix
For real-world HSI data, the data points within a small spatial region are often highly correlated and belong to the same class [25]. Hence, spatial consistency is essential for an accurate classification. Given a data point $\mathbf{x}_i \in \mathbb{R}^{d \times 1}$ ($i = 1, \cdots, n$), the spatial surroundings are found within a search region with a size of $r \times r$, where $r$ must be odd. Therefore, $r^2 - 1$ neighbors are selected for each sample, which are denoted as the set $\mathbf{Z}_i$. For different samples $\mathbf{x}_i$ and $\mathbf{x}_j$, the subsets $\mathbf{Z}_i$ and $\mathbf{Z}_j$ may partially overlap. In a desired feature subspace, these neighbors are encouraged to be close to each other. The problem of spatial consistency is formulated in Equation (14), and the spatial consistency matrix $\mathbf{S}_z$ is defined in Equation (15). Then, Equation (14) can be further reduced to
$$\min_{\mathbf{P}} \operatorname{tr}(\mathbf{P}^T \mathbf{S}_z \mathbf{P}). \tag{16}$$
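The neighbor selection in an r × r search region can be sketched in plain Python. The function name `spatial_neighbors` and the choice to drop window positions that fall outside the image are assumptions of this example; the source does not describe how image borders are handled.

```python
def spatial_neighbors(row, col, r, height, width):
    """Coordinates of the pixels in the r x r window centred at (row, col), centre excluded."""
    if r % 2 == 0:
        raise ValueError("r must be odd")
    half = r // 2
    neighbors = []
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            rr, cc = row + dr, col + dc
            inside = 0 <= rr < height and 0 <= cc < width
            # Border handling (dropping out-of-image positions) is an assumption.
            if (dr, dc) != (0, 0) and inside:
                neighbors.append((rr, cc))
    return neighbors

interior = spatial_neighbors(5, 5, r=3, height=10, width=10)   # full window
corner = spatial_neighbors(0, 0, r=3, height=10, width=10)     # clipped window
```

For an interior pixel the window yields exactly r² − 1 neighbors, matching the count given in the text; near the border this sketch returns fewer.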

Optimization Algorithm
Integrating Equations (13) and (16), the objective function of the proposed LWDA method is summarized as
$$\min_{\mathbf{P}_i} \operatorname{tr}(\mathbf{P}_i^T \mathbf{S}_w \mathbf{P}_i) - \alpha \operatorname{tr}(\mathbf{P}_i^T \mathbf{S}_b \mathbf{P}_i) + \beta \operatorname{tr}(\mathbf{P}_i^T \mathbf{S}_z \mathbf{P}_i) \tag{17}$$
$$\Rightarrow \min_{\mathbf{P}_i} \operatorname{tr}\left( \mathbf{P}_i^T (\mathbf{S}_w - \alpha \mathbf{S}_b + \beta \mathbf{S}_z) \mathbf{P}_i \right), \tag{18}$$
where $\alpha$ and $\beta$ are parameters, and $\mathbf{P}_i$ is the desired projection matrix for the sample $\mathbf{x}_i$. With the proposed objective function, the spatial consistency between the data points is captured, and the local data relationship is also investigated during the discriminant analysis. The optimal $\mathbf{P}_i$ for Equation (18) can be obtained from the $m$ ($m \ll d$) eigenvectors of the critical matrix $\mathbf{S}_w - \alpha \mathbf{S}_b + \beta \mathbf{S}_z$ corresponding to the $m$ smallest eigenvalues.
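The eigenvector selection described above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: the function name `lwda_projection`, the explicit symmetrization, the implied orthonormality of the eigenvectors, and the random positive semidefinite matrices standing in for the three scatter matrices are assumptions.

```python
import numpy as np

def lwda_projection(S_w, S_b, S_z, alpha, beta, m):
    """Return the m eigenvectors of S_w - alpha*S_b + beta*S_z with the smallest eigenvalues."""
    S = S_w - alpha * S_b + beta * S_z
    S = (S + S.T) / 2.0                      # guard against round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(S)     # eigenvalues returned in ascending order
    return eigvecs[:, :m]

# Stand-in scatter matrices (hypothetical random symmetric PSD matrices).
rng = np.random.default_rng(3)
A, B, C = (rng.normal(size=(6, 6)) for _ in range(3))
S_w, S_b, S_z = A @ A.T, B @ B.T, C @ C.T
P_i = lwda_projection(S_w, S_b, S_z, alpha=0.5, beta=0.05, m=2)
```

Since the combined matrix is symmetric, `numpy.linalg.eigh` is used here; it returns orthonormal eigenvectors, so the trace objective at the returned matrix equals the sum of the m smallest eigenvalues.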
With a certain dataset partition factor $\tau$ ($0 < \tau < 1$), the input HSI dataset is divided into the training subset $\mathbf{X}_s$ and the test subset $\mathbf{X}_t$. That is, for the $k$th class, the number of randomly selected samples for the training subset is $n_{s,k} = n_k \times \tau$, while the number of samples chosen for the test subset is $n_{t,k} = n_k - n_{s,k}$, where $n_k$ is the total number of samples belonging to the $k$th class. For simplicity, the training and test subsets are denoted as $\mathbf{X}_s$ and $\mathbf{X}_t$, where $\mathbf{X}_s$ contains $n_s$ training samples. Steps 3 to 9 of Algorithm 1 are as follows.
Training (for each training sample $\mathbf{x}_s^i$, $i = 1, \cdots, n_s$):
3. Construct the neighbor set $\mathbf{Z}_i$ and then compute the spatial consistency matrix $\mathbf{S}_z$ according to Equation (15);
4. Compute the total matrix $\mathbf{S} = \mathbf{S}_w - \alpha \mathbf{S}_b + \beta \mathbf{S}_z$;
5. Obtain the optimal projection matrix $\mathbf{P}_i$ by choosing the $m$ eigenvectors of $\mathbf{S}$ corresponding to the $m$ smallest eigenvalues; then, set $i = i + 1$.
Testing (for each test sample $\mathbf{x}_t^i$):
6. Find the spatially nearest training sample $\mathbf{x}_s^{i'}$ and the corresponding optimal projection matrix $\mathbf{P}_{i'}$, where $i' \in \{1, \cdots, n_s\}$;
7. Compute the low-dimensional embedding features as $\mathbf{X}_{s,m} = \mathbf{P}_{i'}^T \mathbf{X}_s$ and $\mathbf{x}_{t,m}^i = \mathbf{P}_{i'}^T \mathbf{x}_t^i$;
8. Using the nearest-neighbor classifier, find the index of the nearest training sample in the low-dimensional feature subspace, computed as $\hat{j} = \arg\min_j \| \mathbf{x}_{t,m}^i - \mathbf{x}_{s,m}^j \|_2$, where $j = 1, \cdots, n_s$;
9. Obtain the corresponding class information, i.e., $y_t^i = y_s^{\hat{j}}$; then, set $i = i + 1$.
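The testing stage (finding the spatially nearest training sample, projecting, and applying the nearest-neighbor rule) can be sketched as follows. The function name `classify_test_sample`, the use of Euclidean distance for both the pixel coordinates and the embedded features, and the tiny hand-made dataset are assumptions for illustration only.

```python
import numpy as np

def classify_test_sample(x_t, pos_t, X_s, pos_s, y_s, projections):
    """Classify one test pixel.

    x_t: (d,) spectrum; pos_t: (2,) pixel coordinates;
    X_s: (d, n_s) training spectra; pos_s: (n_s, 2) training coordinates;
    y_s: (n_s,) training labels; projections: list of n_s arrays of shape (d, m).
    """
    # Spatially nearest training sample (Euclidean pixel distance is an assumption).
    i_near = int(np.argmin(np.sum((pos_s - pos_t) ** 2, axis=1)))
    P = projections[i_near]
    # Project all training samples and the test sample with that sample's matrix.
    X_sm = P.T @ X_s
    x_tm = P.T @ x_t
    # Nearest neighbor in the embedded space, then copy its label.
    j_near = int(np.argmin(np.sum((X_sm - x_tm[:, None]) ** 2, axis=0)))
    return y_s[j_near]

# Hypothetical toy data: 4 training pixels with 3 bands, projected to 2 dimensions.
X_s = np.array([[0.0, 1.0, 10.0, 11.0],
                [0.0, 0.0, 10.0, 10.0],
                [0.0, 0.0, 10.0, 10.0]])
y_s = np.array([1, 1, 2, 2])
pos_s = np.array([[0, 0], [0, 1], [5, 5], [5, 6]])
projections = [np.eye(3)[:, :2] for _ in range(4)]
label = classify_test_sample(np.array([10.2, 9.9, 10.0]), np.array([5, 4]),
                             X_s, pos_s, y_s, projections)
```

Note that the training features are re-projected for every test sample, because the projection matrix depends on which training sample is spatially closest.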

Experimental Setting
In the experiments, two real-world hyperspectral image datasets were employed, i.e., the Indian Pines and Kennedy Space Center (KSC) datasets [30]. The Indian Pines dataset contains 10,249 data points from 16 classes. Each data point (pixel) has 200 spectral bands. The KSC dataset annotates 5211 valid pixels (excluding the background pixels with the class information of 0) from 13 classes. Each pixel has 176 spectral bands. The two HSI datasets were both captured by an Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor. Table 1 shows the land-cover types and the corresponding number of samples for the two aforementioned HSI datasets. To investigate the classification performance, each dataset was randomly divided into training and test samples with a data partition factor τ. For instance, the total number of samples for the "Corn-notill" land-cover type is 1428, as shown in Table 1. Setting τ to 0.05, the number of samples chosen for training is 1428 × 0.05 = 71.4, rounded up to 72, while the remaining 1356 samples were used for testing.
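The per-class split can be reproduced with a few lines of Python. Rounding up is an assumption inferred from the "Corn-notill" example (71.4 becoming 72); the function name `split_counts` is hypothetical.

```python
import math

def split_counts(n_k, tau):
    """Return (training count, test count) for a class with n_k samples."""
    n_train = math.ceil(n_k * tau)   # rounding up is inferred from the worked example
    return n_train, n_k - n_train

corn_notill = split_counts(1428, 0.05)
print(corn_notill)  # (72, 1356)
```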
According to different dimensionality reduction algorithms, the projection matrix can be obtained to achieve the low-dimensional embedding features of training and test samples. After that, a certain classifier, e.g., nearest neighbor (NN) or support vector machine (SVM) [31], is exploited to discriminate the land-cover types of the test samples with the help of the class information of training samples. In this study, three widely used classification measurements, i.e., average classification accuracy of all the classes (AA), overall classification accuracy (OA), and kappa coefficient (KC), were used to evaluate the objective results of each method. To alleviate the random error caused by the partition of training and test samples, each experiment was repeated five times in each condition; the average AAs, the average OAs, the average KCs, and their standard deviations (STDs) are reported. All the experiments were performed on a personal computer with Intel Xeon CPU E5-2643 v3, 3.40 GHz, 64 GB memory, and 64-bit Windows 7 using Matlab R2017b.
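For reference, the three measurements can be computed from a confusion matrix as in the sketch below. The standard definitions are assumed (rows are true classes, columns are predicted classes, AA is the mean per-class recall, and KC is Cohen's kappa); the function name and the 2 × 2 example matrix are hypothetical.

```python
def classification_metrics(confusion):
    """Return (OA, AA, kappa) for a square confusion matrix given as a list of rows."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))
    oa = correct / total
    # Average accuracy: mean of the per-class recalls.
    aa = sum(confusion[k][k] / sum(confusion[k]) for k in range(n_classes)) / n_classes
    # Expected agreement by chance, from row and column marginals.
    col_sums = [sum(confusion[i][k] for i in range(n_classes)) for k in range(n_classes)]
    p_e = sum(sum(confusion[k]) * col_sums[k] for k in range(n_classes)) / (total ** 2)
    kappa = (oa - p_e) / (1.0 - p_e)
    return oa, aa, kappa

# Hypothetical two-class confusion matrix.
oa, aa, kappa = classification_metrics([[5, 1], [2, 4]])
```

For this small matrix, 9 of 12 samples are correct, so OA is 0.75; the chance agreement is 0.5, which gives a kappa of 0.5.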

Performance on Hyperspectral Image Datasets
The quantitative results of the proposed method and the baselines are given in Tables 2 and 3 for the Indian Pines and KSC datasets, respectively. The two tables show the average classification accuracy of each class, the AAs, OAs, and KCs, as well as their STDs, which were obtained by repeating each experiment five times. All the values are given as percentages. Tables 2 and 3 demonstrate that the proposed LWDA method achieves better classification results in most classes compared with the other methods, and it outperforms all the competitors in terms of AA, OA, and KC. PCA neglects the nonlinear relationship in the original high-dimensional feature data, although it achieves dimensionality reduction. LDA and DLPP preserve the local manifold structure by exploiting the discrimination information of the training samples. However, they neglect the global structure in the dimensionality reduction. Since MFA simply sets the similarity between two samples in a small neighborhood to one, it has difficulty learning an accurate local manifold structure. LGSFA can retain the local structure among the within- and between-class samples during the discriminant analysis. TwoSP preserves the global structure in the first-stage subspace projection, and it investigates the local structure of the HSI data adaptively. DAGL combines the spatial neighborhood information and data graph in the discriminant analysis process. However, the data graph is constructed in the original high-dimensional space, which still introduces data noise into the final projection process. The proposed LWDA method exploits the spectral and spatial information of the HSI data, and it then enhances the spatial consistency during the discriminant analysis. Therefore, LWDA produces the best classification performance on the two HSI datasets.
Moreover, the classification maps of the aforementioned methods on the Indian Pines and KSC datasets are shown in Figures 2 and 4. For better visualization, local magnifications are displayed in Figures 3 and 5. Figures 2-5 show that the proposed LWDA method generates smoother classification maps with more homogeneous areas. LWDA exploits the spectral-spatial information during the discriminant analysis. It preserves not only the global structure but also the local manifold structure with the help of the proposed weighted scatter matrix model and the construction of spatial consistency.

Analysis of Computational Cost
T1 denotes the running time for constructing the projection matrix. T2 and T3 denote the classification times of the NN and SVM classifiers in the testing process, respectively.
Table 5 shows T1, T2, and T3 in seconds for the different methods. The low dimensionality of the embedding features reduces the running time of the classification process. To obtain the optimal projection matrix, the proposed LWDA method estimates the class information for each test sample. In addition, LWDA takes the most running time to extract the spectral-spatial information in the discriminant analysis for each training sample. Thus, the computational cost of dimensionality reduction in LWDA is larger than that of the other methods, except TwoSP, which involves a large kernel matrix computation. The running time with the NN classifier (T2) is smaller than T3, which shows that the NN classifier is more efficient in the classification process. LWDA with the SVM classifier needs to construct an SVM model for each test sample, so T3 of LWDA is much larger than that of the baselines. Therefore, the NN classifier was applied in the experiments.

Analysis of Reduced Dimensionality
The optimal reduced dimensionality of each method is discussed in this section. Figure 6 shows the curves of OAs varying with different dimensionalities of the projection matrix, from 2 to 50, for the Indian Pines and KSC datasets.
Figure 6 demonstrates that the proposed LWDA method consistently achieves the highest OAs. In particular, LWDA exceeds the baselines to a large extent when the dimensionality is less than 5. Furthermore, the classification performance of all the methods becomes stable or decreases when the dimensionality increases to a certain value, which also indicates that a low-dimensional feature subspace is sufficient for preserving the valuable information of HSI data.

Analysis of Classifier
To evaluate the classification performance of each method with two different classifiers, i.e., NN and SVM, the experiments were repeated five times. For SVM, the MATLAB version of the LibSVM Toolbox was applied with a radial basis function (RBF) kernel [34]. Once the projected features were obtained by each method, the NN and SVM classifiers were applied for the classification process, respectively. Figure 7 shows the classification results obtained by different methods with NN and SVM classifiers on the two HSI datasets.

Analysis of Parameters
The proposed LWDA method has two trade-off parameters, r and β. The value of r affects the number of neighbors in the spatial space, while the value of β balances the contribution between the weighted scatter matrix model and the spatial consistency matrix. r was tuned over the set {3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27}, and β was varied over the set {0, 0.001, 0.005, 0.01, 0.02, 0.04, 0.05, 0.06, 0.08, 0.1, 0.5, 1, 5, 10, 50, 100}. Figure 8 shows the average OAs with respect to the values of parameters r and β. According to Figure 8, when the value of β increases, the OAs change only slightly for a fixed r. That is because the spatial consistency matrix generates a similar contribution for the Indian Pines and KSC datasets. An increased r introduces more between-class samples into the construction of the spatial consistency matrix. For the Indian Pines dataset, a peak value is generated in the curved surface map when the value of r reaches 11, and then the OAs begin to slowly decrease when the value of r continues to increase. If r is large, the contribution of the preservation of the local manifold structure will be reduced. Similarly, a peak value of the curved surface is obtained when r increases to 25 for the KSC dataset. Therefore, the parameters r and β were set to 11 and 0.05 for the Indian Pines dataset and 25 and 0.04 for the KSC dataset in the experiments.
The proposed method can be divided into two versions: online and offline. The online version of LWDA needs to construct the optimal projection matrix for each test sample, which leads to a large computational cost. To reduce the computational time, the experiments in this paper used the offline version. Figure 9 shows the histograms of the spatial distance between the input test sample and its nearest training sample. The two histograms for the Indian Pines and KSC datasets illustrate that the spatial distance is mainly distributed in the range of [1,5]. The samples in a small neighborhood should be close to each other in the desired feature subspace. Furthermore, the experimental results shown in Section 4.2 demonstrate that the offline version of the proposed method achieves better classification performance than the existing approaches.

Analysis of Data Partition Factor τ
The data partition factor τ affects the number of selected samples in the training process. The influence of different values of τ was investigated, and results are presented in this section. The value of τ was tuned over the set {0.05, 0.06, 0.07, 0.08, 0.09, 0.1}. Tables 6 and 7 show the classification results in terms of average OAs, average KCs, and their STDs for the Indian Pines and KSC datasets.
In Tables 6 and 7, the OAs and KCs improve with the increase in the data partition factor for all the methods on the two HSI datasets. This implies that a larger number of training samples contains more valuable information for the feature representation. DLPP achieves better classification results than PCA and LDA, because DLPP applies the discrimination information to preserve the local structure. TwoSP shows better OAs and KCs than MFA and LGSFA in most conditions, since TwoSP effectively alleviates the nonlinear problem in HSI data and simultaneously preserves the global and local structures. In all experiments, the proposed LWDA method achieves the best classification results under different values of τ, especially with a small τ value, which indicates that a small number of training samples is enough for a good performance. Moreover, LWDA applies the spectral-spatial information to the discriminant analysis process, which largely enhances the discriminating power of low-dimensional embedding features.

Conclusions
In this paper, a new supervised dimensionality reduction method, named LWDA, is proposed on the basis of the spectral-spatial information of HSI data. During the discriminant analysis, LWDA uses the proposed weighted scatter matrix model and computes the spatial consistency matrix for each data sample, which can adaptively learn local manifold structures of the original HSI data. In addition, LWDA preserves the within-class properties and suppresses the between-class characteristics in an optimal low-dimensional feature subspace.
Through the experiments on two real-world HSI datasets, i.e., Indian Pines and KSC, LWDA achieves better classification performance than the existing dimensionality reduction approaches.
In particular, a small portion of the data used as the training set is enough for a satisfactory classification performance. The overall accuracy obtained by LWDA increases by at least 17% in comparison with RAW when the data partition factor is 0.05. In addition, the McNemar test demonstrates that the differences between LWDA and the baselines are statistically significant. For LWDA, the absolute value of the McNemar test statistic is at least 10, which is larger than 1.96. LWDA learns the similarity relationships of the within-class samples and the means of different classes, and it provides more useful information for the subsequent classification. Hence, LWDA achieves the best qualitative and quantitative results in the experiments.
Our future work will focus on how to extend the online version of the proposed method to quickly represent the spectral-spatial information and improve the computational efficiency.

Figure 1. Flowchart of the proposed LWDA method. With a partition factor τ, the training and test sample sets (shown in two-dimensional space), as well as the training label set, are obtained. Then, the dataset is used to construct the weighted scatter matrix model, spatial consistency matrix, and optimal projection matrix for each training sample. The next step is to find the spatially nearest training sample's projection matrix, which is applied to construct the embedding features of the training samples and the input test sample. Once all the predicted test labels are obtained, the classification map (including the training labels) is generated by exploiting a fixed classifier.

Figure 4. Classification maps of different dimensionality reduction methods with NN classifier on the KSC dataset (τ = 0.05). (a) Ground truth; (b) PCA; (c) LDA; (d) DLPP; (e) MFA; (f) LGSFA; (g) TwoSP; (h) DAGL; (i) proposed LWDA.
Furthermore, a McNemar test [32,33] was conducted by pairwise comparison to validate the effectiveness of different methods. In the McNemar test, the significance level is set to 0.05. The results of the McNemar test are shown in Table 4, where the methods in the horizontal direction are denoted as the test methods while those in the vertical direction are marked as the reference methods. When the value is smaller than zero, the classification performance of the test method is better than that of the reference method; otherwise, the reference method outperforms the test method. Moreover, when the absolute value is larger than 1.96, the difference between the two methods is statistically significant. Compared with all the competitors, the absolute values in the LWDA column are larger than 10, which demonstrates that LWDA has a distinct advantage.
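A pairwise McNemar statistic of the kind reported in Table 4 can be sketched as below. The formula used is the common standardized form based on the two disagreement counts; the sign convention (negative when the test method is better, to match the description above), the function name `mcnemar_z`, and the example data are assumptions, since the source does not give the exact expression.

```python
import math

def mcnemar_z(correct_test, correct_ref):
    """Standardized McNemar statistic from per-sample correctness flags.

    Negative values mean the test method is correct more often where the two
    methods disagree (sign convention assumed to match the text).
    """
    ref_only = sum(1 for t, r in zip(correct_test, correct_ref) if r and not t)
    test_only = sum(1 for t, r in zip(correct_test, correct_ref) if t and not r)
    disagreements = ref_only + test_only
    if disagreements == 0:
        return 0.0
    return (ref_only - test_only) / math.sqrt(disagreements)

# Hypothetical example: the test method is right on 16 samples the reference misses.
correct_test = [True] * 20
correct_ref = [False] * 16 + [True] * 4
z = mcnemar_z(correct_test, correct_ref)
```

In this example the statistic is −4, whose absolute value exceeds the 1.96 threshold quoted for the 0.05 significance level.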

Figure 6. Overall classification accuracy (OA) versus the reduced dimensionality of different methods with NN classifier on the (a) Indian Pines and (b) KSC datasets using τ = 0.05.

Figure 7. OAs obtained by different methods with the NN and support vector machine (SVM) classifiers on the (a) Indian Pines and (b) KSC datasets using τ = 0.05.

Figure 7 illustrates that the proposed LWDA method with the NN classifier presents the best classification performance compared with the other dimensionality reduction methods. For most cases on the two datasets, the results with the NN classifier are superior to those with SVM. To unify the classifier in the classification process, the NN classifier was used in all the experiments.

Figure 8. OAs versus the value of parameters r and β in the proposed LWDA method with NN classifier on the (a) Indian Pines and (b) KSC datasets using τ = 0.05.

Figure 9. Histogram of spatial distance for the (a) Indian Pines and (b) KSC datasets.
The details of the whole framework are described in Algorithm 1.
Algorithm 1. Locally weighted discriminant analysis (LWDA).
Input: Training dataset $\mathbf{X}_s$, training class information set $\mathbf{Y}_s$, test dataset $\mathbf{X}_t$, parameters $\alpha$ and $\beta$, dimensionality $m$ of the desired projection matrix.
Output: Estimated test class information set $\mathbf{Y}_t$.
1. Compute the weighted within-class scatter matrix $\mathbf{S}_w$ according to Equation (8);
2. Compute the weighted between-class scatter matrix $\mathbf{S}_b$ according to Equation (9);

Table 1. Number of total, training, and test samples, with a partition factor τ = 0.05, of each land-cover class for the Indian Pines and Kennedy Space Center (KSC) datasets.

Table 4. McNemar test of different methods on the Indian Pines and KSC datasets.

Table 5. Average computational time (unit: second) of different methods with NN classifier on the Indian Pines and KSC datasets using τ = 0.05.

Table 6. Classification results (%) with different values of τ on the Indian Pines dataset with NN classifier. Each method has two rows, where the first row is the OA ± standard deviation (STD) and the second row is the kappa coefficient (KC) ± STD.

Table 7. Classification results (%) with different values of τ on the KSC dataset with NN classifier. Each method has two rows, where the first row is the OA ± STD and the second row is the KC ± STD.