Feature Line Embedding Based on Support Vector Machine for Hyperspectral Image Classification

In this paper, a novel feature line embedding (FLE) algorithm based on the support vector machine (SVM), referred to as SVMFLE, is proposed for dimension reduction (DR) and for improving the performance of the generative adversarial network (GAN) in hyperspectral image (HSI) classification. The GAN has shown high discriminative capability in many applications. However, because the pre-processing step of the GAN traditionally relies on linear principal component analysis (PCA), it cannot effectively capture nonlinear information; SVMFLE was proposed to overcome this problem. The proposed SVMFLE DR scheme is implemented in two stages. In the first stage, scatter matrix calculation, the FLE within-class scatter matrix, the FLE between-class scatter matrix, and the support vector-based FLE between-class scatter matrix are obtained. In the second stage, weight determination, the dispersion indices of the training samples are calculated against the weight of the SVM-based FLE between-class matrix to determine the best weight between the scatter matrices and obtain the final transformation matrix. Since the reduced feature space obtained by the SVMFLE scheme is more representative and discriminative than that obtained using conventional schemes, the GAN achieves better HSI classification performance. The effectiveness of the proposed SVMFLE scheme with GAN or nearest neighbor (NN) classifiers was evaluated against state-of-the-art methods on three benchmark datasets. According to the experimental results, the proposed SVMFLE scheme with the GAN or NN classifier outperformed the state-of-the-art schemes in three performance indices. Accuracies of 96.3%, 89.2%, and 87.0% were obtained with the GAN classifier for the Salinas, Pavia University, and Indian Pines Site datasets, respectively.
Similarly, the scheme with the NN classifier achieved accuracy rates of 89.8%, 86.0%, and 76.2% on these three datasets.


Introduction
Recently, hyperspectral image (HSI) classification has attracted the attention of researchers owing to its numerous applications, such as land change monitoring, urban development, resource management, disaster prevention, and scene interpretation [1]. Generally, in HSI classification, a specific category is assigned to each pixel in an image. However, since most multispectral, ultraspectral, and hyperspectral images generate a large number of high-dimensional pixels with many category labels, it is a challenging task to effectively separate pixels with similar land-cover spectral properties. First, because of the large number of HSI pixels, numerous schemes applied in HSI classification are based on supervised learning [1], including the support vector machine (SVM) [2,3], nearest feature line [4,5], random forest [6], manifold learning [7,8], sparse representation [9], and deep learning (DL) [1]. Second, because HSI contains high-dimensional information, many studies have focused on dimension reduction (DR) and reported the importance of DR in HSI classification [10].
Principal component analysis (PCA) [11] is the most popular DR algorithm; it subtracts the mean of the population from each sample to obtain the covariance matrix and extracts a transformation matrix by maximizing the scatter of the samples. PCA also serves as a pre-processing step for other advanced DR algorithms to remove noise and mitigate overfitting [1,10]. Linear discriminant analysis (LDA) [12] and discriminant common vectors [13] are advanced versions of PCA. Since the PCA algorithm is based on linear measurement, it is ineffective at revealing the local structure of samples when the samples are distributed on a manifold. Many methods based on manifold learning and kernel methods have been proposed to overcome this problem. Manifold learning was proposed to preserve the topology of the locality of training samples. He et al. [14] proposed a locality preserving projection (LPP) scheme to preserve the local topology of training data for face recognition. Because the sample scatter obtained through LPP is based on the relationship between neighbors, the local manifold information of the samples is preserved, and the performance of the LPP scheme was therefore shown to be higher than that of linear measurement methods. Tu et al. [7] presented the Laplacian eigenmaps (LE) method, which uses polarimetric synthetic aperture radar data for land cover classification. The LE scheme maintains the high-dimensional polarimetric manifold information in an intrinsic low-dimensional space. Wang and He [15] also used LPP to preprocess data for HSI classification. Kim et al. [16] presented a manifold-based method called locally linear embedding (LLE) to reduce the dimensionality of HSI. Li et al. [8,17] presented the local Fisher discriminant analysis (LFDA) scheme, which combines the merits of LDA and LPP to reduce the dimensionality of HSI. Luo et al.
[18] proposed a supervised neighborhood preserving embedding method to extract the salient features for HSI classification. Zhang et al. [19] employed a sparse low-rank approximation scheme to regularize a manifold structure, and HSI data were treated as cube data for classification.
Generally, these manifold-learning-based methods all preserve the topology of the locality of training samples and outperform traditional linear measurement methods. However, according to Boots and Gordon [20], nonlinear information cannot be fully extracted through manifold learning, and the effectiveness of manifold learning is limited by noise. Therefore, kernel tricks were employed to obtain a nonlinear feature space and improve the extraction of nonlinear information. Because the use of kernel tricks improves the performance of a given method [21], the kernelization approach was adopted in both linear measurement methods and manifold learning methods to improve HSI classification. Boots and Gordon [20] employed the kernelization method to alleviate the noise effect in manifold learning. Scholkopf et al. [22] proposed a kernelized PCA scheme, which uses kernel tricks to find a high-dimensional Hilbert space and extract nonlinear salient features missed by PCA. In addition, Lin et al. [23] presented a framework for DR based on multiple kernel learning. The multiple kernels were first unified into a high-dimensional space by a weighted summation, and then the multiple features were projected to a low-dimensional space. However, their framework also attempted to determine the proper weights for the kernel combination and the DR simultaneously, which increased the complexity of the method. Hence, Nazarpour and Adibi [24] presented a novel weight combination algorithm used only to extract good kernels from a set of basic kernels. Although this was a simple and effective idea for multiple kernel learning, it used kernel discriminant analysis based on linear measurement for classification, and thus did not preserve the manifold topology of the multiple kernels in the high-dimensional space. Li et al.
[25] proposed a kernel integration algorithm that linearly assembles multiple kernels to extract both spatial and spectral information. Chen et al. [26] used a kernelization method based on sparse representation for HSI classification. In their approach, a query sample was represented by all training data in a generated kernel space, and all training samples were also represented in a linear combination of their neighboring pixels. Resembling the idea of multiple kernel learning, Zhang et al. [27] presented an algorithm for multiple-feature integration based on multiple kernel learning and employed it to classify HSI data; their proposed algorithm assembles shape, texture, and spectral information to improve the performance of HSI classification. In addition to obtaining a salient feature space for HSI classification, DR can be a critical pre-processing step for DL. Zhu et al. [10] proposed an HSI classification method based on 3D generative adversarial networks (GANs) and PCA. Their experimental results demonstrated that the performance of GANs is adversely affected if there is no PCA pre-processing step. However, as mentioned earlier, because PCA is a linear measurement method, it may miss some useful manifold information. Therefore, a DR algorithm that can extract manifold information should be used to improve the performance of GANs.
Finally, because of the numerous category labels of HSI, a more powerful classifier is required to improve the performance of HSI classification. Recently, DL has been viewed as the most powerful tool in pattern recognition. Zhu et al. [10] used GANs for HSI classification and obtained a favorable result. Liu et al. [28] proposed a Siamese convolutional neural network to extract salient features for improving the performance of HSI classification. He et al. [29] proposed multiscale covariance maps to improve CNNs and integrate both spatial and spectral information in a natural manner. Hu et al. [30] proposed a convolutional long short-term memory method to effectively extract both spectral and spatial features and improve the performance of HSI classification. Deng et al. [31] proposed a deep metric learning-based feature embedding model that can overcome the problem of having only a limited number of training samples. Deng et al. [32] also proposed a unified deep network integrated with active transfer learning, which likewise overcomes this problem. Chen et al. [1] proposed fine-grained classification based on DL; a densely connected CNN was employed for supervised HSI classification and a GAN for semi-supervised HSI classification.
From the presented introduction, the challenges in HSI classification can be summarized as follows: (1) owing to the high dimensionality of HSI, an effective DR method is required to improve classification performance and overcome the overfitting problem; (2) owing to the numerous category labels of HSI, a powerful classifier is required to improve classification performance.
Because DR improves HSI classification considerably and overcomes the overfitting problem, in this study, a modification of the feature line embedding (FLE) algorithm [4,5] based on SVM [33], referred to as SVMFLE, is proposed for DR. In this algorithm, SVM is first employed to select boundary samples and calculate the between-class scatter matrix, enhancing the discriminant ability. The scatter calculated from the boundary samples between classes can reduce the impact of noise and improve the classification results. Second, the dispersion degree among samples is devised to automatically determine a better weight for the between-class scatter. In this way, a reduced space with more discriminant power is obtained. Three benchmark datasets were used to evaluate the proposed algorithm. The experimental results demonstrate that the proposed SVMFLE method effectively improves the performance of both nearest neighbor (NN) and GAN classifiers.
The rest of this paper is organized as follows: In Section 2, previous and related works are discussed. In Section 3, the proposed algorithm of SVM-based sample selection incorporated into FLE is introduced. In Section 4, the proposed method is compared with other state-of-the-art schemes for HSI classification to demonstrate its effectiveness. Finally, in Section 5, the conclusions are drawn.

Feature Line Embedding (FLE)
In this study, a novel SVMFLE DR algorithm based on FLE [4,5] and SVM [33] was proposed to reduce the number of feature dimensions and improve the performance of NN or GAN classifiers for HSI classification. A brief review of FLE, SVM, and GAN is provided in the following sections, after which the proposed algorithm is discussed. Consider N training samples X = [x_1, x_2, ..., x_N] ∈ R^{D×N} in a D-dimensional space with labels C = {c_1, c_2, ..., c_N}. These samples comprise N_c land-cover classes for training. The projected samples in the low-dimensional feature space are obtained by the linear transformation Y = V^T X, in which V is the linear transformation matrix for DR. For clarity, the notation and definitions used throughout this paper are presented in Table 1.
FLE is a DR algorithm based on manifold learning. The training sample scatter is represented by a Laplacian matrix that preserves the local structure by applying a point-to-line strategy. In general, the main objective of FLE is to measure the distance between a sample y_i and its projected point f_{m,n}(y_i) on the feature line L_{m,n} that passes through points y_m and y_n. The FLE objective function is defined and minimized as follows:

min_V Σ_i Σ_{(m,n)} w_{i,(m,n)} ||y_i − f_{m,n}(y_i)||^2 = tr(V^T X L X^T V), (1)

Notation | Definition
N | The total number of samples
D | The dimension of the original space
N_c | The number of classes
X, Y | The sets of training data in the original space and the transformed space
x_i ∈ X, y_i ∈ Y | Training sample of the i-th object (1 ≤ i ≤ N) in the original space and the transformed space
C | Pre-defined set of class labels
c_i ∈ C | Class label of the i-th object (1 ≤ i ≤ N)
V | Obtained linear transformation matrix for DR

where L = D − W and the diagonal matrix D contains the column sums of the affinity matrix W. Based on the summary of Yan et al. [34], each projected point can be written as f_{m,n}(x_i) = (1 − t_{m,n}) x_m + t_{m,n} x_n, so the point-to-line distances decompose into point-to-point weights w_{i,(m,n)}, which are nonzero only when i ≠ m, n. Matrix L in Equation (1) is thus a Laplacian matrix. Refer to [34] for more details.
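The point-to-line projection that underlies the FLE objective can be sketched as follows; this is a minimal numpy illustration, and the function names are ours, not from the original work:

```python
import numpy as np

def feature_line_projection(x, xm, xn):
    """Project sample x onto the feature line through xm and xn.
    Returns the projected point f = xm + t * (xn - xm) and the
    scalar position parameter t."""
    d = xn - xm
    t = float(np.dot(x - xm, d) / np.dot(d, d))  # position on the line
    return xm + t * d, t

def feature_line_distance(x, xm, xn):
    """Point-to-line distance used in the FLE objective."""
    f, _ = feature_line_projection(x, xm, xn)
    return float(np.linalg.norm(x - f))
```

For example, projecting x = (1, 1) onto the line through (0, 0) and (2, 0) gives t = 0.5, the foot point (1, 0), and a point-to-line distance of 1.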
In supervised FLE, the label information is used, and two parameters k_1 and k_2 for obtaining the within-class scatter matrix S_w^FLE and the between-class scatter matrix S_b^FLE are manually determined: N_1(x_i, c_i) denotes the set of k_1 nearest feature lines (NFLs) within the same class c_i as a specified point x_i, and N_2(x_i) denotes the set of k_2 NFLs belonging to classes different from that of point x_i. The Fisher criterion tr(S_w^{-1} S_b) is then maximized to obtain the transformation matrix V, which consists of the eigenvectors with the largest corresponding eigenvalues. In the final step, a new projected sample in the low-dimensional feature space is obtained by the linear projection y = V^T x, and an NN or GAN classifier is used for classification.
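The Fisher-criterion maximization step reduces to a generalized eigendecomposition; the sketch below is a minimal illustration, and the small ridge term added to the within-class scatter for numerical stability is our assumption, not part of the original formulation:

```python
import numpy as np
from scipy.linalg import eigh

def fisher_projection(S_w, S_b, dim):
    """Solve the generalized eigenproblem S_b v = lambda * S_w v and
    keep the `dim` eigenvectors with the largest eigenvalues as the
    columns of the transformation matrix V."""
    S_w_reg = S_w + 1e-6 * np.eye(S_w.shape[0])  # keeps S_w invertible
    eigvals, eigvecs = eigh(S_b, S_w_reg)        # ascending eigenvalues
    return eigvecs[:, ::-1][:, :dim]             # top-`dim` eigenvectors
```

With diagonal toy scatters, the leading eigenvector aligns with the axis of the largest between-to-within scatter ratio, as expected from the criterion.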

Support Vector Machine (SVM)
SVM is used for binary classification with labels y_i ∈ {−1, 1}. It creates the hyperplane with the maximum margin separating the two classes. The following optimization problem is solved to obtain the maximum-margin model with parameters w ∈ R^D and b ∈ R:

min_{w,b,ξ} (1/2)||w||^2 + C Σ_{i=1}^{N} ξ_i, subject to y_i(w^T x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0,

where ξ_i are the slack variables and C > 0 is the regularizer handling the trade-off between the classification error and the maximum margin. Moreover, SVM can be extended to a multi-class version. Assume a training set (x_i, c_i), where x_i ∈ R^D and c_i ∈ {1, ..., N_c} is the data label. The decision boundary for class m is the weighted sum of support vectors, with weight w_m ∈ R^D and scalar bias b_m ∈ R, where m = 1, ..., N_c indexes the classes. A multi-class SVM then solves the following optimization problem:

min (1/2) Σ_{m=1}^{N_c} ||w_m||^2 + C Σ_{i=1}^{N} ξ_i, subject to w_{c_i}^T x_i + b_{c_i} ≥ w_m^T x_i + b_m + 1 − ξ_i for all m ≠ c_i, ξ_i ≥ 0.
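As a concrete illustration of how boundary samples fall out of SVM training, the snippet below fits a linear SVM on toy two-class data with scikit-learn and reads off the support vectors; the data and setup are purely illustrative, and the paper's one-against-all scheme would repeat this once per class:

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data; the SVM's support vectors are the boundary samples
# later used for the SV-based between-class scatter.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

svm = SVC(kernel="linear", C=1.0).fit(X, y)
sv = svm.support_vectors_            # samples on or inside the margin
pos_sv = sv[y[svm.support_] == 1]    # positive SV set
neg_sv = sv[y[svm.support_] == -1]   # negative SV set
```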

Generative Adversarial Networks (GAN)
GAN originated from the concept of game theory and provides an alternative to the maximum likelihood method. Two major components, a generator G and a discriminator D, compete with each other during the training process to reach a Nash equilibrium. According to [10], in order to learn the generator's distribution over data x, the real samples follow a distribution p_data(x), and the input noise variable z follows a prior p_z(z). The random noise z is the input of the generator, which produces a mapped sample G(z). The discriminator D(x) outputs the probability that x is a real sample from the training data. During training, the generator G is trained to minimize log(1 − D(G(z))), while the discriminator D is trained to maximize log(D(x)). The objective function is therefore:

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))],

in which E is the expectation operator.
To ensure that the generator receives useful gradients when the discriminator's classification accuracy is high, the generator's loss function is usually reformulated to maximize the probability of a generated sample being classified as real, instead of minimizing the probability of it being classified as fake [10,28]. Hence, the loss function can be rewritten as:

max_G E_{z∼p_z(z)}[log D(G(z))].

In addition, the GAN classifier in [10] takes both spectral and spatial features into account for HSI classification, achieving better performance than with spectral features alone. For more details about the GAN, refer to [10].
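The two objectives above can be written down directly; the sketch below evaluates the standard discriminator objective and the non-saturating generator objective on arrays of discriminator outputs (a numpy illustration under our own function names):

```python
import numpy as np

def discriminator_objective(d_real, d_fake):
    """Standard GAN discriminator objective (to be maximized):
    E[log D(x)] + E[log(1 - D(G(z)))], where d_real = D(x) and
    d_fake = D(G(z)) are probabilities in (0, 1)."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

def generator_objective_nonsaturating(d_fake):
    """Modified (non-saturating) generator objective: maximize
    E[log D(G(z))] instead of minimizing E[log(1 - D(G(z)))],
    which keeps gradients alive when the discriminator is strong."""
    return np.mean(np.log(d_fake))
```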

Feature Line Embedding Based on Support Vector Machine (SVMFLE)
Hyperspectral images are sensing data with high dimensionality. Owing to this high-dimensional property, the original pixel dimensions are first reduced by PCA. Generally, PCA finds the transformation matrix giving the best representation in the reduced feature space. Moreover, PCA alleviates the small-sample-size problem for subsequent supervised eigenspace-based DR methods, such as LDA [12] and LPP [14]. The within-class and between-class scatters are the two essential matrices in eigenspace-based DR. The FLE-based class scatters have been shown to be an effective representation for HSI classification [4]. Selecting discriminant samples is the key issue in scatter computation. In [4], the selection strategy of taking the first k nearest neighbors of a specified sample is adopted to preserve the local topological structure among training samples. In the proposed DR method, shown in Figure 1, the samples with more discriminant power, that is, the support vectors (SVs), are selected for enhancement. SVMFLE calculates the between-class scatter using the SVs found by SVM. In our experience, a scatter calculated from the boundary samples between classes reduces the impact of noise and improves the classification results. In the proposed framework shown in Figure 1, all HSI pixels are used to obtain a transformation matrix V_PCA and projected into a low-dimensional PCA feature space. Next, some training samples are randomly selected, and the SVs between classes are extracted by SVM in the PCA feature space. After that, the FLE within-class scatter matrix S_w, the FLE between-class scatter matrix S_b^FLE, and the SV-based FLE between-class scatter matrix S_b^SV are calculated, and Fisher's criterion is maximized to obtain the transformation matrix V*. The final projection matrix is V = V_PCA V*. All HSI pixels are then projected into a five-dimensional feature space by the linear transformation Y = V^T X.
Next, 200 training samples of dimension five in the SVMFLE feature space are randomly selected to train a GAN classifier. Finally, all HSI pixels in this SVMFLE feature space are used to evaluate the performance of the GAN classifier.
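The first step of this pipeline, the PCA projection of all HSI pixels, can be sketched with a plain eigendecomposition of the covariance matrix; the shapes and band counts below are illustrative, not those of an actual dataset:

```python
import numpy as np

def pca_fit(X, dim):
    """PCA pre-processing sketch: X is N x D (one spectral vector per
    pixel). Returns the sample mean and the top-`dim` principal
    directions (columns of V_PCA)."""
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)       # ascending eigenvalues
    return mu, vecs[:, ::-1][:, :dim]      # V_PCA: D x dim

def pca_transform(X, mu, V):
    """Project centered pixels into the PCA feature space."""
    return (X - mu) @ V

rng = np.random.default_rng(0)
pixels = rng.normal(size=(500, 50))        # e.g., 50 spectral bands
mu, V_pca = pca_fit(pixels, 10)
reduced = pca_transform(pixels, mu, V_pca) # pixels in PCA feature space
```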
The within-class scatter S_w is calculated from the discriminant vectors of the same class using the FLE strategy. For a specified sample x_i, the discriminant vectors are chosen from the first k_1 nearest feature lines generated by its eight nearest neighbors of the same class, as shown in Equation (2). The between-class scatter, on the other hand, is obtained from two parts. The first part is similar to the approach in [4]: the first k_2 nearest feature lines generated by the six nearest neighbors of sample x_i with different class labels are selected, as shown in Equation (3). Here, parameters k_1 and k_2 are set to 24 and 12, respectively, in the experiments. The second part of the between-class scatter is calculated from the support vectors generated by SVM. The one-against-all strategy is adopted: for a specified class c, the samples of class c are treated as positive and the samples of all other classes as negative, and these training samples are input to SVM for a two-class classification. After the learning process, the decision boundary is determined by a weighted summation of support vectors, which are the samples located near the boundaries between classes. Two SV sets, the positive and negative SV sets, are obtained to calculate the SV-based between-class scatter S_b^SV. The two scatters S_b^FLE and S_b^SV are then integrated with a weight α, as in Equation (10), where α indicates the ratio between the scatter of all points, S_b^FLE, and the scatter of the support vectors, S_b^SV. Since support vectors are usually located in the class-boundary regions, they are the samples with more discriminant power for learning. The transformation matrix V is found by maximizing the Fisher criterion tr(S_w^{-1} S_b), in which V is composed of the eigenvectors with the largest corresponding eigenvalues. The projected sample in the low-dimensional space is calculated by the linear projection y = V^T x.
Furthermore, the reduced data are used to train the GAN classifier of [10], and the reduced HSI pixels are tested by the discriminator in the GAN. The pseudocode of the SVMFLE DR algorithm is listed in Table 2.
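Since the exact form of Equation (10) is not recoverable from the text, the integration of the two between-class scatters can be illustrated as a convex combination with weight α in [0, 1]; this weighted form is an assumption made for the sketch, not the paper's verified formula:

```python
import numpy as np

def integrate_between_scatter(Sb_fle, Sb_sv, alpha):
    """Combine the FLE between-class scatter with the SV-based one;
    alpha weights the support-vector scatter (assumed convex form)."""
    return (1.0 - alpha) * Sb_fle + alpha * Sb_sv

# Illustrative 3x3 scatters; alpha = 0.74 echoes the IPS experiment.
Sb_fle = np.diag([2.0, 1.0, 0.5])
Sb_sv = np.diag([3.0, 0.5, 0.2])
Sb = integrate_between_scatter(Sb_fle, Sb_sv, 0.74)
```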

Input: A D-dimensional training set X = {x_1, x_2, ..., x_N} consisting of N_c classes.
Output: The transformation matrix V.
Step 1: PCA projection: Perform PCA on the whole set of HSI pixels to obtain the transformation matrix V_PCA. All HSI pixels are projected from the high-dimensional space to a low-dimensional PCA feature space using matrix V_PCA.
Step 2: Randomly select the training samples from the HSI in the PCA feature space.
Step 3: Obtain the support vectors between classes by applying SVM.
Step 4: Scatter computation: Compute the FLE within-class scatter S_w, the FLE between-class scatter S_b^FLE, and the SV-based between-class scatter S_b^SV.
Step 5: Scatter integration: Integrate S_b^FLE and S_b^SV with weight α as in Equation (10).
Step 6: Fisher's criterion maximization: Maximize Fisher's criterion to obtain the transformation matrix V* = arg max_V tr(S_w^{-1} S_b), which is composed of the eigenvectors with the largest eigenvalues.
Next, an indicator is defined to determine the better value of α in Equation (10) by measuring the degree of overlap among class samples. Consider the samples of a specified class c: the Euclidean distances from every sample to the corresponding class mean μ_c are summed to represent the dispersion degree of the samples:

d_c = Σ_{c_i = c} ||x_i − μ_c||,

which is defined as the within-class distance for class c. On the other hand, the total Euclidean distance from every sample to the population mean μ is also calculated:

d = Σ_{i=1}^{N} ||x_i − μ||.

The dispersion index is then defined as the ratio between the summation of the within-class distances and the population distance:

ρ = (Σ_{c=1}^{N_c} d_c) / d.
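The dispersion index can be computed directly from this definition; below is a short numpy sketch (the function name is ours), in which a smaller value indicates more compact, less overlapping classes:

```python
import numpy as np

def dispersion_index(X, labels):
    """Ratio of the summed within-class distances to the total distance
    of all samples from the population mean. X is N x D; labels is a
    length-N array of class labels."""
    mu = X.mean(axis=0)
    total = np.linalg.norm(X - mu, axis=1).sum()
    within = 0.0
    for c in np.unique(labels):
        Xc = X[labels == c]
        within += np.linalg.norm(Xc - Xc.mean(axis=0), axis=1).sum()
    return within / total
```

For two well-separated classes the index is small; if all samples form a single class, the within-class and population distances coincide and the index equals 1.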

Measurement Metrics
In this section, experiments are reported to evaluate the performance of the proposed SVMFLE DR algorithm in HSI classification. Three HSI benchmark datasets, Salinas, Pavia University, and Indian Pines Site, were used for evaluation.
Before presenting the experimental results, three metric indices, overall accuracy (OA), class-averaged accuracy (AA), and the Kappa value, are defined for performance evaluation. The OA index is the ratio of correct predictions to total predictions. The AA index is defined as the average of all per-class accuracies and reflects the ability to classify the different classes of data. The last index, the Kappa value, is a consistency indicator for the classification model and is calculated from an N_c × N_c confusion matrix. These three indices are defined in Equations (14)-(16), respectively.
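The three indices follow directly from the confusion matrix; the sketch below implements them under the standard definitions (treating rows as true classes is our convention, since the paper does not state the orientation):

```python
import numpy as np

def classification_indices(cm):
    """Compute OA, AA, and the Kappa value from an Nc x Nc confusion
    matrix `cm` (rows: true class, columns: predicted class)."""
    n = cm.sum()
    oa = np.trace(cm) / n                          # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))     # class-averaged accuracy
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / n**2  # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa
```

For example, a 2 × 2 confusion matrix [[8, 2], [1, 9]] yields OA = 0.85, AA = 0.85, and Kappa = 0.7.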

Classification Results of Dataset Salinas
In this sub-section, the classification results on the Salinas dataset are shown by comparing the proposed method with the state-of-the-art methods. The Salinas image was captured by the 224-band AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) sensor over Salinas Valley in California. This HSI is composed of 512 × 217 pixels with 204 bands after removing 20 water absorption bands. A false-color IR image and its corresponding class label map are shown in Figure 3. The 16 land-cover classes and the total of 54,129 labeled pixels of the Salinas dataset are tabulated in Table 3.
In the training phase, all samples in the Salinas dataset were used to construct the PCA feature space, and 300 samples of each class, i.e., 4,800 samples in total, were randomly chosen to generate the transformation matrix for SVMFLE-based DR. The better value of α in Equation (10) was determined using the dispersion index. After determining α = 1.0, the whole dataset of 54,129 samples was tested for evaluation. To show the robustness of the proposed method, the experiments were run 30 times for all methods, and the average rates were compared, as shown in Table 4. These results are reported as average accuracy rates plus/minus the standard deviation. The classification results of three DR algorithms, PCA, PCA plus FLE, and PCA plus SVMFLE, using the NN classifier are tabulated in Table 4(a). In addition to the NN classifier, a GAN classifier was also trained on the reduced data for comparison with Zhu's work [10]. For a fair comparison, the GAN classifier was trained using the source code released with [10], and the same GAN architecture with the same initial weights was trained. A total of 200 training samples of dimension 5 were randomly chosen for training the GAN classifier, and the remaining 53,929 samples were tested. Here, the training epochs were set to 100. The classification results of the various DR algorithms using the GAN classifier are listed in Table 4(b). From the classification results in Table 4, the proposed SVMFLE-based DR method outperforms the other methods. Moreover, the standard deviations of SVMFLE are also smaller than those of the other methods. We can therefore claim that the SVMFLE method provides a DR transformation matrix with more discriminant power than the other methods. To visualize the classification results, the classified label maps for the Salinas dataset are presented as shown in Figure

Classification Results of Dataset Pavia University
In this sub-section, the Pavia University dataset was used to evaluate the performance of the proposed SVMFLE DR algorithm, and several state-of-the-art algorithms were compared to show its effectiveness. The Pavia University HSI was captured by the Reflective Optics System Imaging Spectrometer (ROSIS) instrument over the city of Pavia, Italy. Nine land-cover classes cover 610 × 340 pixels with 103 spectral bands from 0.43 to 0.86 μm. The false-color image and class label map are displayed in Figure 6, and the corresponding pixel numbers for all classes are listed in Table 5. In the experiment, all samples were used to generate the PCA feature space, i.e., the same DR process as in Zhu's method. After the projection, 180 reduced samples of each class, i.e., 2,700 samples, were randomly selected to generate the SVMFLE-based transformation matrix. Similar to the process for the Salinas dataset, the dispersion indices were calculated for various values of α, as shown in Figure 7a, and the distributions of the reduced samples are displayed on a 2D plane in Figure 7b-e. The best value α = 0.76, corresponding to the smallest dispersion index ρ = 0.14, was selected for use in Equation (10). After determining α = 0.76, 42,776 pixels were tested to evaluate the performance. The experiments were executed 30 times to calculate the average accuracy rates. The accuracy rates of the various DR methods, PCA, FLE, and SVMFLE, using the NN classifier are listed in Table 6(a). Similarly, the reduced features were also classified by a GAN classifier: 200 training samples of dimension 5 were randomly chosen for training the GAN classifier, and the remaining 42,576 samples in the dataset were tested for evaluation. Here, 100 epochs were performed during the training process. The classification results using the GAN classifier are shown in Table 6(b).
From the classification results in Table 6, the proposed SVMFLE DR method outperforms the other methods. Similar to the experiments on the Salinas dataset, all pixels in the Pavia University dataset were projected into a five-dimensional feature space by the linear transformation Y = V^T X learned by the DR methods. Next, 200 five-dimensional samples were randomly selected to train the NN and GAN classifiers, with the discriminator in the GAN serving as the classifier. The classified label maps for the Pavia University dataset generated by the six algorithms (three DR methods and two classifiers) are shown in Figure 8.

Classification Results of Dataset Indian Pines Site(IPS)
In this sub-section, the Indian Pines Site (IPS) dataset was used to evaluate the effectiveness of the proposed method in comparison with the state-of-the-art methods. The IPS image was captured by AVIRIS, which was built by the Jet Propulsion Laboratory and NASA/Ames in 1992. The area six miles west of Northwest Tippecanoe County (NTC) was scanned to obtain this HSI dataset. The false-color image and its corresponding class label map are shown in Figure 9. A total of 16 land-cover classes over 145 × 145 pixels in 224 spectral bands are manually labeled, and the sample numbers of each class are tabulated in Table 7. After removing the bands covering the water absorption region, 10,249 labeled pixels with 200 spectral bands are used in the experiments. Three DR methods (PCA, FLE, and SVMFLE) and two classifiers (NN and GAN) were implemented for performance comparison. Similar to the experimental configurations for the Salinas and Pavia University datasets, PCA was first performed to obtain the PCA feature space. Since the samples per class are imbalanced, the training samples for SVMFLE DR training were selected by the following rule: 300 training samples of a specified class were randomly chosen for obtaining the SVMFLE transformation matrix if the class contains more than 300 samples; otherwise, 75% of the class samples were randomly selected for training. The resulting training sample numbers are tabulated in the rightmost column of Table 7.
Next, the value of α in Equation (10) was also determined by the dispersion index computed from the training samples. It was varied from 0 to 1 in steps of 0.01, and the corresponding dispersion index was calculated, as shown in Figure 10a. The smallest index ρ = 0.29 was obtained at α = 0.74. Similarly, all training samples were projected into 2D space, and the sample distributions for various α and ρ are displayed in Figure 10b-e. Observing the sample distributions marked by the circles, the class separation in Figure 10d is the clearest, corresponding to ρ = 0.29 and α = 0.74. In a similar manner, the proposed SVMFLE DR method combined with the GAN classifier was also compared with Zhu's work [10]. A total of 200 samples of dimension 5 were randomly chosen for training the GAN classifier, and the remaining 10,049 samples were tested with the trained GAN classifier. One hundred epochs were executed during the training process, and the same network architecture with the same initial weights was used for all compared methods. The experiments were run 30 times to obtain the average accuracy rates and standard deviations for a fair comparison. The classification results of the PCA, FLE, and SVMFLE DR methods combined with the two classifiers are tabulated in Table 8. From the results in Table 8, the proposed SVMFLE DR method outperforms the other two DR methods with both the NN and GAN classifiers. Figure 11 shows the classification maps of the IPS dataset for the three DR methods and two classifiers. In our opinion, the number of training samples per class is biased; therefore, the improvement of the proposed SVMFLE DR algorithm is limited in this case.

The proposed SVMFLE method with the GAN classifier was employed for HSI classification. The SVMFLE algorithm uses the SVM-selected samples to improve the representation of the between-class scatter matrix, which extracts much more useful discriminative information.
Accordingly, it obtained a better overall accuracy (OA) than Zhu's method, with improvements of 3.5%, 1.7%, and 0.4% for the Salinas, Pavia University, and Indian Pines Site datasets, respectively. Meanwhile, the proposed SVMFLE with the GAN classifier also obtained a better average accuracy (AA) than Zhu's method, with improvements of 2.3%, 3.1%, and 0.7% for these three datasets, respectively. Because the pixel numbers of the classes in the IPS dataset vary greatly, a biased classifier is easily trained from the biased training samples; the proposed SVMFLE with the GAN classifier therefore outperforms Zhu's method by only 0.4% in OA. On the other hand, the pixel numbers of the classes in the Salinas dataset are large enough, and the trained SVMFLE with the GAN classifier outperforms Zhu's method by 3.5% in OA. Therefore, when the training samples are sufficiently numerous and unbiased, the proposed method obtains a powerfully discriminant feature space and significantly improves classification performance; otherwise, the improvement is limited when the training samples are few and biased.
Finally, Figures 12 and 13 show the overall accuracy versus the reduced dimensionality for the NN and GAN classifiers, respectively. From the results in Figures 12 and 13, the proposed SVMFLE DR method combined with the NN or GAN classifier outperforms the other methods. Only five principal components were retained in the reduced feature space, in consideration of the computational complexity and training time of the GAN classifier. In addition, the classification performance decreased as the dimensionality increased (e.g., at 40 components).

Conclusions
In this paper, we proposed a DR scheme that uses the feature line embedding strategy with SVM-selected samples. The SVM-selected training samples were used to calculate the between-class scatter, and the dispersion degree among samples was calculated to automatically determine the better weight α for the between-class scatter. With the selected samples, a reduced space with more discriminant power is obtained, and, as the experimental results show, the classification ability of the classifiers improves in this reduced space. In addition, several state-of-the-art DR methods were implemented in combination with the NN and GAN classifiers for performance comparison. From the experimental results, the proposed SVMFLE DR method outperforms the other methods in the three performance indices. Accuracies of 96.3%, 89.2%, and 87.0% were obtained for the Salinas, Pavia University, and Indian Pines Site datasets using the GAN classifier, respectively, while the scheme with the NN classifier achieved accuracy rates of 89.8%, 86.0%, and 76.2% on these three datasets. The improvements of the proposed method are significant on both the Salinas and Pavia University datasets because of the large and unbiased training samples; in contrast, when the training samples of each class are few and biased, as in the IPS dataset, the improvement is less evident. Mitigating the impact of biased training samples will be the subject of future work.

Conflicts of Interest:
The authors declare no conflict of interest.