Kernel Entropy Component Analysis-Based Robust Hyperspectral Image Supervised Classification

Abstract: Recently, the "noisy label" problem has become a hot topic in the supervised classification of hyperspectral images (HSI). Nonetheless, effectively removing noisy labels from a training set with mislabeled samples is a nontrivial task for a multitude of supervised classification methods in HSI processing. This paper is the first to propose a kernel entropy component analysis (KECA)-based method for noisy label detection that can remove noisy labels from a training set with mislabeled samples and improve the performance of supervised classification in HSI. It consists of the following steps. First, the kernel matrix of the training samples with noisy labels for each class is obtained by exploiting a nonlinear mapping function to enlarge the sample separability. Then, the eigenvectors and eigenvalues of the kernel matrix are obtained by employing symmetric matrix decomposition. Next, the entropy corresponding to each training sample in each class is calculated based on entropy component analysis, using the eigenvalues arranged in descending order and the corresponding eigenvectors. Finally, the sigmoid function is applied to the entropy of each sample to obtain a probability distribution, and a decision probability threshold is introduced into this distribution to cleanse the noisy labels of the training samples for each class. The effectiveness of the proposed method is evaluated with support vector machines on several real hyperspectral data sets. The experimental results show that the proposed KECA method is more efficient than other noisy label detection methods in terms of improving the performance of the supervised classification of HSI.


Introduction
Hyperspectral images (HSI) are captured by hundreds of continuous and narrow spectral bands that simultaneously reflect target areas of interest. HSI offers great potential for the development of classification techniques because different materials exhibit different spectral information. Advancements in classification technology bring high-level interpretations of remotely sensed scenes and are therefore now widely used in various application domains such as environmental monitoring [1,2], precision agriculture [3,4], and mineral exploration [5][6][7]. Specifically, these application scenarios are almost always highly dependent on supervised classification algorithms such as support vector machines (SVM) [8][9][10][11][12], sparse representation (SR) [13][14][15][16][17][18][19], the naive Bayesian method [20][21][22], and decision trees [23][24][25]. In addition, most existing supervised classifiers are modeled on the assumption that the labeled pixels used for training the classification model are highly trusted [26][27][28][29]. However, in practical applications, the acquisition of training samples usually generates mislabeled samples (noisy labels) that result in significant degradation of performance for the supervised classifier. Therefore, it is often necessary to denoise a training set with noisy labels, and then use the improved training set in subsequent experiments.
Unlike the "noisy label" issue in general computer vision applications, the appearance of noisy labels in hyperspectral remote sensing images can be summarized in three aspects: (1) Noisy labels caused by global positioning system (GPS) positioning errors. To obtain training samples, field exploration with GPS and the Environment for Visualizing Images (ENVI) is the most convenient way to label pixels. However, the positioning accuracy of the GPS cannot match the spatial resolution of the pixel, and it is difficult to accurately distinguish the land cover distribution of candidate pixels. Once the GPS produces a position error, land covers may be mislabeled, thus generating noisy labels.
(2) Noisy labels caused by manual labeling errors. Labeling pixels by visual interpretation is the most reliable way of acquiring training samples. However, manual labeling requires substantial manpower and a labeling expert who has knowledge of the environment that corresponds to the pixel to be labeled. Specifically, some noncongeneric land cover may exist within a large regular region. However, those noncongeneric regions are generally labeled as the same class as the surrounding regions to reduce human effort. (3) Noisy labels caused by complex environmental factors. For some scenes, such as ocean and wetland, ground investigation is impossible since they may be unreachable for exploration experts. Moreover, labeling errors may also be produced by other environmental factors such as adverse weather. In the above situations, the training sample acquisition process generally cannot avoid the generation of noisy labels. Therefore, we can conclude that the "noisy label" problem is, indeed, a major challenge in HSI classification.
For supervised classification in HSI, the noisy label problem is increasingly becoming a focus of attention in the fields of computer vision and remote sensing. For example, in computer vision, Xiao et al. [30] proposed a probabilistic graphical framework to train convolutional neural networks with a few clean labels and millions of noisy labels. Lu et al. [31] proposed an L1 optimization-based sparse learning model to detect and remove noisy labels for semantic segmentation. Yao et al. [32] introduced a generative model called latent stability analysis to discover stable patterns among images with noisy labels. In remote sensing, Kang et al. [33] first introduced the reasons for the formation of noisy labels in HSI supervised classification and proposed an edge-preserving filtering (EPF) and spectral detection-based method to correct mislabeled training samples. Experiments show that this method can effectively remove noisy labels and improve the performance of supervised classifiers. Jiang et al. [34] proposed a random label propagation algorithm (RLPA) to cleanse the noisy labels in the training set; the key idea of RLPA is to exploit knowledge (e.g., superpixel-based spectral-spatial constraints) from the observed hyperspectral images and apply it to the process of label propagation. A fusion of the spectral angle and the local outlier factor (SALOF) is proposed in [35] to detect noisy labels in HSI classification. Tu et al. [36,37] proposed a density peak (DP) clustering-based method to detect noisy labels. The experimental results show that the DP-based detection method can effectively promote classification performance. Jie et al. [38] provided a noisy label detection method based on joint spectral-spatial distributed sparse representation that exploits the intraband structure and the interband correlation in the process of joint sparse representation and joint dictionary learning.
In recent years, various processing technologies based on entropy analysis and the kernel method have been successfully applied to HSI classification. For instance, He et al. [39] proposed an HSI anomaly detection algorithm based on maximum entropy and nonparametric estimation. According to the low probability of the target, the maximum entropy principle is used to estimate the probability density of the target, and the generalized likelihood ratio test is simplified to test only the background likelihood. Cheng et al. [40] proposed an image segmentation algorithm based on 2D Renyi gray entropy and fuzzy clustering, in which the traditional Renyi threshold is replaced by a two-dimensional Renyi entropy thresholding to improve the global segmentation performance.
The validating experiment shows the effectiveness of the improved algorithm. Jie et al. [41] introduced a multiple kernel learning method based on discriminative kernel clustering (DKC) to choose the optimal bands in HSI. The experiments were conducted on several real hyperspectral data sets to demonstrate the effectiveness of the DKC band selection method in terms of classification performance and computational efficiency.
In this paper, a kernel entropy component analysis (KECA)-based noisy label detection method is proposed to improve a training set with noisy labels in HSI supervised classification. The proposed method consists of the following steps. First, a kernel matrix of the training samples with noisy labels for each class is created by exploiting the RBF kernel function. Then, the eigenvectors and eigenvalues of the kernel matrix are obtained by employing symmetric matrix decomposition. Next, the entropy that corresponds to each training sample in each class is calculated based on entropy component analysis, using the eigenvalues arranged in descending order and their corresponding eigenvectors. Finally, the sigmoid function is applied to the entropy of each sample to obtain a probability distribution, and a decision probability threshold is introduced into this distribution to remove the noisy labels of the training set with mislabeled samples for each class. The major contributions of the proposed KECA method are as follows:

1. KECA is first introduced into HSI supervised classification to cleanse the original training set with noisy labels. Noisy labels often have very high local entropies, which is the basic motivation behind this paper.

2. Five commonly used kernel functions are analyzed in the proposed detection framework, where the RBF kernel function is found to be a robust kernel trick for detecting noisy labels.

3. The effectiveness of the proposed method is demonstrated by adopting several real hyperspectral datasets and multiple classifiers, i.e., spectral classifiers and spectral-spatial classifiers. The experimental results show that the proposed KECA is more efficient than other noisy label detection methods in terms of improving the performance of supervised classification in HSI.
The rest of this paper is organized as follows. Entropy component analysis and related works are reviewed in Section 2. Section 3 describes the proposed KECA-based noisy label detection method in detail. Section 4 analyzes the experimental results, Section 5 presents the extended discussion, and conclusions are given in Section 6.

Review of Related Methods
In this section, we briefly review the kernel tricks and the Renyi entropy method.

Kernel Tricks
Recently, some kernel tricks have been demonstrated to provide optimal performance for HSI classification. For instance, Toksöz et al. [42] proposed a nonlinear kernel version of a recently introduced basic thresholding classifier for HSI classification, showing that the proposal and its spatial extension yield better classification results. Li et al. [43] presented a new framework for the development of generalized composite kernel machines for hyperspectral image classification that was proven to lead to state-of-the-art classification performance in complex analysis scenarios. Fang et al. [44] presented a novel framework to effectively utilize the spectral-spatial information of superpixels via multiple kernels, which was shown to outperform several well-known classification methods. Assume that $r_i$ and $r_j$ are pixels that belong to a sample set $C = \{r_\tau\}_{\tau=1}^{n}$, where n represents the number of pixels. Then, the kernel function is expressed as

$K(r_i, r_j) = \langle \varphi(r_i), \varphi(r_j) \rangle$,

where $\varphi(\cdot)$ is a mapping function that maps a spectral vector from a low-dimensional space to a high-dimensional one. Furthermore, this section reviews several common kernel functions: the radial basis function (RBF), linear kernel function (LKF), polynomial kernel function (PKF), wavelet kernel function (WKF), and Laplacian kernel function (LNKF).
• Radial basis function:
$K_{RBF}(r_i, r_j) = \exp\left(-\|r_i - r_j\|^2 / (2\sigma^2)\right)$,
where σ refers to the width of the Gaussian function.

• Linear kernel function:
$K_{LKF}(r_i, r_j) = r_i^T r_j + c$,
where c refers to the constant term.

• Polynomial kernel function:
$K_{PKF}(r_i, r_j) = \left(\alpha\, r_i^T r_j + c\right)^q$,
where α refers to the free parameter, and q is the power term that controls the polynomial.

• Wavelet kernel function:
$K_{WKF}(r_i, r_j) = \prod_{d=1}^{D} \psi\left((r_{i,d} - r_{j,d}) / \beta\right)$,
where ψ and β are the mother wavelet function and the translation coefficient, respectively.

• Laplacian kernel function:
$K_{LNKF}(r_i, r_j) = \exp\left(-\|r_i - r_j\| / \sigma\right)$,
where σ is the kernel width.
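As an illustration, the kernel functions above can be sketched in a few lines of NumPy. The default parameter values here are illustrative assumptions, not values prescribed by the paper, and the wavelet kernel is omitted since it depends on the choice of mother wavelet:

```python
import numpy as np

def rbf_kernel(ri, rj, sigma=0.1):
    # K(ri, rj) = exp(-||ri - rj||^2 / (2 * sigma^2))
    return np.exp(-np.sum((ri - rj) ** 2) / (2.0 * sigma ** 2))

def linear_kernel(ri, rj, c=0.0):
    # K(ri, rj) = ri . rj + c
    return float(ri @ rj) + c

def polynomial_kernel(ri, rj, alpha=1.0, c=1.0, q=2):
    # K(ri, rj) = (alpha * ri . rj + c)^q
    return (alpha * float(ri @ rj) + c) ** q

def laplacian_kernel(ri, rj, sigma=0.1):
    # K(ri, rj) = exp(-||ri - rj|| / sigma)
    return np.exp(-np.linalg.norm(ri - rj) / sigma)
```

All four return 1 for `rbf_kernel(x, x)`-style identical inputs (RBF and Laplacian) or the plain inner-product value (linear and polynomial), matching the closed forms above.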

Renyi Entropy
Suppose that p(x) is a probability density function on the sample set $C = \{r_\tau\}_{\tau=1}^{n}$; the Renyi quadratic entropy can be defined as

$H(p) = -\log \int p^2(x)\, dx$.

Then, the Parzen window density estimator is introduced:

$\hat{p}(x) = \frac{1}{n} \sum_{u=1}^{n} K(x, r_u)$,

where $K(r, r_u)$ is a Parzen window, which is also called the kernel density. Finally, the entropy based on the Renyi method can be obtained as

$\hat{H}(p) = -\log \left( \frac{1}{n^2} \sum_{\tau=1}^{n} \sum_{u=1}^{n} K(r_\tau, r_u) \right)$.
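A minimal sketch of this Parzen-window entropy estimate, assuming the RBF kernel as the window (the choice of σ is illustrative):

```python
import numpy as np

def renyi_entropy_estimate(C, sigma=0.1):
    # Renyi quadratic entropy H(p) = -log V(p), estimated with an RBF
    # Parzen window: V(p) ~= (1/n^2) * sum over all pairs of K(r_t, r_u).
    n = C.shape[0]
    sq_dists = np.sum((C[:, None, :] - C[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2.0 * sigma ** 2))
    V = K.sum() / n ** 2
    return -np.log(V)
```

When all samples coincide the estimate is 0 (minimal entropy), and for n well-separated samples the kernel matrix approaches the identity and the estimate approaches log(n), as expected of an entropy.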

Proposed KECA Method for HSI Classification with Noisy Labels
Unlike kernel principal component analysis (KPCA), KECA extracts low-dimensional features by considering both the magnitude of the eigenvalues and the contribution of the corresponding eigenvectors, which can achieve discriminative features from the training samples of each class and better reflect the cluster structure. KPCA only considers the ranking of the eigenvalues, and thus the latent discriminative features of the training samples may be lost. This paper proposes a new noisy label detection method that extracts the features with the greatest contribution to the Renyi entropy, which is composed of the following parts: (1) construct the kernel matrix; (2) acquire the entropy distribution; and (3) cleanse the training set with noisy labels. The proposed KECA method for noisy label detection is summarized as Algorithm 1 (see Figure 1). Each part consists of certain steps, the details of which are presented as follows.

Acquire the Entropy Distribution
Using the kernel matrix obtained above, the Renyi quadratic entropy of each class in the original training set is defined as

$H(p) = -\log \int p^2(x)\, dx$,

where p(x) is the probability density function of each class in the original training set. Taking into account the monotonic nature of the logarithmic function, the following quantity is introduced:

$V(p) = \int p^2(x)\, dx$.

To estimate V(p), the Parzen window density function is employed, defined as

$\hat{p}(x) = \frac{1}{n} \sum_{a=1}^{n} K_{RBF}(x, x_a^l)$,

where $K_{RBF}(x, x_a^l)$ is the Parzen window, or kernel, centered at $x_a^l$; its width is governed by the kernel parameter, and it must be a density function. Therefore, the estimator of V(p) is defined as

$\hat{V}(p) = \frac{1}{n^2}\, \mathbf{1}^T K\, \mathbf{1}$,   (14)

where 1 is a unit vector of length n. In addition, the value of the Renyi entropy can be expressed in terms of the eigenvalues and eigenvectors of the kernel matrix, which is decomposed as

$K = E D E^T$,   (15)

where $D = \mathrm{diag}(\gamma_1, \cdots, \gamma_n)$ is a diagonal matrix for each class, and the columns of E are the eigenvectors $e_1, e_2, \ldots, e_n$ corresponding to $\gamma_1, \cdots, \gamma_n$. By substituting Equation (15) into Equation (14), the following can be obtained:

$\hat{V}(p) = \frac{1}{n^2} \sum_{a=1}^{n} \left( \sqrt{\gamma_a}\, e_a^T \mathbf{1} \right)^2$.   (16)

Specifically, it can be seen from Equation (16) that each $\gamma_a$ and $e_a$ jointly contribute to the entropy estimate; thus it is easy to find the eigenvalues and eigenvectors with the greatest contribution to the entropy estimation. Finally, the Renyi entropy $H(X^l) = [h(x_1^l), h(x_2^l), \ldots, h(x_n^l)]$ of the training samples for each class can be calculated from these per-sample contributions, where $h(x_a^l)$ refers to the value of the Renyi entropy for the a-th training sample in the l-th class.
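The eigendecomposition and entropy-distribution steps can be sketched as follows. Since the exact per-sample formula is not reproduced above, the way this sketch splits the total entropy estimate across samples (via each sample's squared eigenvector coordinates) is an assumption:

```python
import numpy as np

def entropy_distribution(K):
    # Eigendecompose the symmetric kernel matrix, K = E D E^T.
    gamma, E = np.linalg.eigh(K)
    order = np.argsort(gamma)[::-1]        # eigenvalues in descending order
    gamma, E = np.clip(gamma[order], 0.0, None), E[:, order]
    n = K.shape[0]
    proj = E.T @ np.ones(n)                # e_a^T 1 for each eigenvector
    weights = gamma * proj ** 2 / n ** 2   # term a of V(p) in Eq. (16)
    V = weights.sum()                      # equals (1/n^2) 1^T K 1
    # Assumed per-sample split: sample i receives each eigen-term in
    # proportion to its squared coordinate in that eigenvector.
    h = (E ** 2 * (gamma * proj ** 2)).sum(axis=1) / n ** 2
    return h, V
```

A useful sanity check of this split is that the per-sample values sum back to the total estimate of Equation (14), because each eigenvector has unit norm.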

Cleanse the Training Set with Noisy Labels
First, the anomaly probability for each training sample in the original training set can be calculated by applying the sigmoid function to its entropy value:

$p(x_a^l) = \frac{1}{1 + \exp\left(-h(x_a^l)\right)}$.

Once the anomaly probabilities of the training samples for each class have been obtained, the noisy labels in the noisy training set can be easily detected and removed by comparing each probability with a threshold, where t is the threshold of the anomaly probabilities for each class, set according to the optimal results of experiments with the SVM trained on the improved training set. Finally, the improved training set consists of the retained samples.
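The cleansing step can be sketched as below. The exact decision rule is not reproduced above, so the direction of the comparison is an assumption chosen to match the behavior of t described in the parameter tuning section (a small t removes little, while a too-large t also removes true samples):

```python
import numpy as np

def cleanse_training_set(X, h, t=0.6):
    # Map each sample's entropy value to a probability with the sigmoid.
    p = 1.0 / (1.0 + np.exp(-np.asarray(h)))
    # Assumed rule: samples whose probability falls below the threshold t
    # are flagged as noisy labels and removed (t = 0.6 is the paper's
    # suggested default from the tuning experiments).
    keep = p >= t
    return X[keep], p
```

The function returns both the cleansed sample array and the per-sample probabilities, so the same pass can be inspected or iterated.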

Datasets and Experiments Description
In this section, the proposed detection method for noisy labels is performed using the University of Pavia, Salinas, Kennedy Space Center (KSC), and Washington DC datasets.
ROSIS University of Pavia Dataset: The University of Pavia image was acquired by the ROSIS 03 sensor over the campus of the University of Pavia, Italy. The image is of size 610×340×120, with a spatial resolution of 1.3 m per pixel and spectral coverage in the range 0.43-0.86 µm. Twelve spectral bands were removed before classification due to high noise. Figure 2a-c show the false-color composite of the University of Pavia image, the corresponding reference data, which consider nine classes of interest, and the corresponding color code. Table 1 gives the experimental conditions. AVIRIS Salinas Dataset: The Salinas image was collected by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) sensor over the Salinas Valley, California; it has 224 bands of size 512×217 pixels. In the experiments, 20 water absorption and noisy bands (no. 108-112, 145-167, and 224) were removed. Figure 3a-c show the false-color composite of the Salinas image, the reference classification map, which contains 16 different classes, and the corresponding color code. Table 2 presents the experimental conditions. AVIRIS KSC Dataset: The Kennedy Space Center (KSC) image was collected by the AVIRIS sensor over the Kennedy Space Center in Florida. The image is 512×614 pixels, from which 48 bands were removed as water absorption and low-SNR bands. Figure 4a-c show the false-color composite of the KSC image, the reference classification map, which contains 13 different classes, and the corresponding color code. Table 3 shows the experimental conditions. HYDICE Washington DC Dataset: The Washington DC image was collected by the Hyperspectral Digital Image Collection Experiment
(HYDICE) sensor over the Washington DC Mall. The sensor system measured 210 bands of the visible and infrared spectrum in the 0.4-2.4 µm range (for our experiments, 19 bands in the spectral range 0.9-1.4 µm were omitted). The dataset contains 280 scan lines, each of which contains 307 pixels. Figure 5a-c show the false-color composite of the Washington DC image, the corresponding reference data, which contain six ground reference classes, and the corresponding color code. The experimental conditions are recorded in Table 4. As one of the most widely used pixelwise classifiers, the SVM is adopted in this paper to evaluate the performance of the proposed KECA method; it is implemented with the LIBSVM library [45] using the radial basis function kernel. Moreover, the parameters of the SVM are determined using fivefold cross-validation. To make the comparison fair, the reported quality indexes of the overall accuracy (OA), average accuracy (AA), Kappa coefficient (Kappa), and individual class accuracies are calculated by averaging the results achieved in ten repeated Monte Carlo experiments with different randomly selected training samples and noisy labels, and the mean and standard deviation of the accuracies over the repeated experiments are reported. The training sets are constructed using samples from the ground truth. For each class, some pixels randomly selected from other classes are added to simulate the "noisy label" problem.

Parameter Tuning
This section starts by analyzing the influence of the parameters on the performance of the proposed KECA method. Figure 6 gives the experimental results achieved for the University of Pavia, Salinas, KSC, and Washington DC datasets, respectively. For the University of Pavia dataset, 50 true samples and ten noisy labels are selected randomly for each class. For the Salinas, KSC, and Washington DC datasets, 25 true samples and five noisy labels are selected randomly for each class. The kernel parameter σ is varied over the range 0.02-0.22. As shown in Figure 6, the classification accuracy rises first and then falls. The reason is that the width parameter σ controls the radial range of the RBF kernel. Taking Figure 6a as an example, a radial range that is too large or too small can lead to performance degradation of the proposed method. When σ is set to 0.12, the classification accuracy reaches 81.10%. For the Salinas, KSC, and Washington DC datasets, the highest classification accuracies are obtained when σ is set to 0.1, 0.12, and 0.18, respectively. Specifically, σ = 0.13 is suggested as the default parameter when the proposed KECA method is conducted on a new dataset. The second experiment sought to analyze the effectiveness of the threshold parameter t. The experiment is conducted on the University of Pavia dataset with 50 true samples and ten noisy labels per class, and on the Salinas, KSC, and Washington DC datasets with 25 true samples and five noisy labels per class, respectively. Moreover, the range of the parameter t is set to 0.5-1. Based on the experimental results presented in Figure 7, it can be found that the optimal classification accuracies are obtained by the proposed KECA method when t is set to 0.6. The reason is that the value of t controls the number of noisy labels to be removed. If the t value is too small, the noisy labels in the training set will not be cleaned, whereas an excessive t value will remove true samples from the training set. Therefore, the threshold parameter is set to a default of t = 0.6 in the proposed method, according to the highest classification accuracy of the SVM trained using the training set improved by the proposed KECA method. Similarly, given a new dataset, applying the default t = 0.6 is suggested for the proposed method.
In addition, the proposed KECA method may achieve better performance when using an optimization procedure for parameter setting [46]. However, KECA involves only two important parameters in the noisy label detection process, and using the optimal parameter setting based on the experimental results effectively avoids the time cost of automatic tuning. Therefore, in order to balance detection accuracy and time validity, we adopted the experimentally optimal parameter setting in the subsequent experiments.
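The two-parameter tuning described above can be sketched as a simple grid search. The grids follow the ranges stated in the text (σ in 0.02-0.22, t in 0.5-1); the step sizes and the `evaluate` callback, which is assumed to cleanse the training set with the given (σ, t), train the SVM, and return the OA, are hypothetical:

```python
import itertools
import numpy as np

def tune_parameters(evaluate,
                    sigmas=np.arange(0.02, 0.24, 0.02),
                    thresholds=np.arange(0.5, 1.01, 0.1)):
    # Exhaustive search over the (sigma, t) grid; returns the pair that
    # maximizes the downstream classification accuracy reported by
    # the user-supplied `evaluate(sigma, t)` callback.
    return max(itertools.product(sigmas, thresholds),
               key=lambda st: evaluate(st[0], st[1]))
```

For example, plugging in a mock accuracy surface peaked at (0.12, 0.6) recovers exactly those defaults.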

Component Analysis
Here, an experiment is performed on the KSC dataset with 25 true training samples and five noisy labels per class, in which the performance of the proposed method with different kernel tricks, namely the linear kernel function (LKF), polynomial kernel function (PKF), wavelet kernel function (WKF), Laplacian kernel function (LNKF), and radial basis function (RBF), is analyzed in Table 5. We can observe from Table 5 that the proposed RBF-based KECA method achieves the best performance in terms of classification accuracies. Therefore, the RBF kernel trick is adopted for the proposed method and used in the following experiments. In addition, to further demonstrate the effectiveness of the RBF kernel in the proposed method, an illustration of the RBF-based kernel matrix for different classes of the four real hyperspectral datasets is given in Figure 8. Evidently, the RBF kernel-based discrimination between true samples and noisy labels is quite obvious (see outside the red dotted box). Focusing on Figure 8c,d, it can be observed that good separation between true samples and noisy labels still exists when the original training set contains a larger number of mislabeled samples. Specifically, the difference is more pronounced in the Salinas, KSC, and Washington DC datasets. This means that the RBF kernel can be an effective component of the proposed method for noisy label detection in supervised HSI classification tasks.

Detection Performance Analysis
In this section, the first experiment analyzes the influence of the number of iterations on the performance of the proposed KECA method. For the University of Pavia dataset, the noisy original training set contains 50 true samples and different numbers of noisy labels for each class. For the KSC, Salinas, and Washington DC scenes, the training sets each contain 25 true samples and various numbers of noisy labels for each class. The main iteration step in the proposed method is to repeat Equations (10)-(19), and the main idea of iteration is that the previous output can be used as the next input until the stop criterion has been satisfied. As shown in Table 6, the proposed KECA method achieves a low false detection rate (see the third column), which means that only a few true samples were detected as false. However, the improved training set still contains some noisy labels (fourth column), particularly when a large number of noisy labels exist in the original training set. The reason is that the decision threshold-based removal solution is insufficient for an original training set that contains a large number of noisy labels. Therefore, iterative detection is applied to the improved training set to further remove noisy labels. As shown in Figure 9, the OA decreases as the number of iterations increases when the original training set has few noisy labels (see the red curve) on the different HSI datasets. However, when a training set contains more noisy labels (see the green curve), the OAs rise and then fall with the number of iterations. Therefore, iterative detection can achieve better detection accuracy on a training set with many noisy labels. Taking into account the balance between computational efficiency and classification accuracy, the number of iterations of the proposed method is set to one for the HSI datasets in this paper. In addition, Table 7 contains the classification results for the University of Pavia dataset; it shows that the proposed KECA method can effectively improve the classification accuracies for most classes. For instance, when a training set contains 30 noisy labels for each class, the classification accuracy of the SVM trained using the training set improved by the KECA increases from 80.15% to 91.10% for the asphalt class and from 70.81% to 90.45% for the shadows class compared to the other improved training sets. Meanwhile, the OAs can be improved by approximately 5%. This demonstrates that the noisy labels in the original training set can be effectively removed by the proposed KECA method with respect to the SALOF and DP methods. Table 8 and Figure 11 present the classification results for the Salinas dataset. As shown in Table 8, when the original training set contains five noisy labels for each class, the SVM trained using the training set improved by the KECA is promoted by about 4% in terms of classification accuracy (i.e., OA) with respect to the SVM trained using the original training set. Similarly, the classification accuracy of the SVM is clearly promoted using the training set improved by the KECA (by about 5%). In addition, the SVM trained using the training set improved by the KECA achieves better classification results in terms of OA, AA, and Kappa compared to the SVM trained using the training sets improved by other methods such as SALOF or DP. Figure 11 shows the corresponding classification maps. Table 9 and Figure 12 present the experimental results for the KSC dataset. As shown in Table 9, when the number of noisy labels increases, the classification performance of the proposed KECA method is more obviously promoted in terms of OAs, AAs, and Kappas with respect to the other methods. For instance, when the original training set contains 25 true samples and five noisy labels per class, the classification accuracy can be improved by approximately 3% by applying the proposed noisy label detection method. When the rate of noisy labels reaches 50%, the improvement in classification accuracy reaches 5%. Specifically, the accuracy improvement of the KECA is significantly greater than that of the comparison methods (see Table 9). As shown in the local window comparisons presented in Figure 12, the classification results achieved by the proposed KECA method are more similar to the reference data (see Figure 4b). Besides the University of Pavia, Salinas, and KSC datasets, the proposed method has also been applied to the Washington DC dataset to further demonstrate the effectiveness of the KECA in HSI noisy label detection. Table 10 contains the classification results. The proposed method can improve the accuracies by 1.75-6.47%. Therefore, the proposed KECA method is robust in increasing the classification accuracy in HSI supervised tasks. Finally, Table 11 shows the time consumption (in seconds) of the different methods on the four real HSI datasets. All codes are run on a computer with an Intel(R) Core(TM) i5-7300HQ 2.50 GHz CPU and 8 GB of RAM, and the software platform is MATLAB R2019a (MathWorks, Natick, MA, USA). As shown in Table 11, the training time of the SVM trained using a training set cleansed by the proposed method was less than that of the SVM trained using the original noisy training set. In addition, the proposed KECA method has run-time advantages over the SALOF and DP methods in most simulations. This shows that the proposed method can effectively detect and remove noisy labels from a training set, and thus reduce the training time of the SVM. Besides the time spent on classification, Table 11 also shows the detection time of the different detection methods. The proposed KECA method typically needed less time to execute the detection of noisy labels compared to the competitive methods. This demonstrates the effectiveness and robustness of the proposed KECA method. In this section, the performance of the proposed KECA
method is evaluated by employing some widely used spectral classifiers, such as the basic thresholding classifier (BTC) [47], the kernel basic thresholding classifier (KBTC) [42], the sparse representation classifier (SRC) [48], and the extreme learning machine (ELM) [49], to demonstrate the effectiveness of the proposed noisy label detection method. In this experiment, the proposed KECA method is conducted on the Salinas dataset with 25 true training samples and different numbers of noisy labels for each class. To make the comparison more objective, the experiment was repeated ten times to obtain the average value and the standard deviation of the classification accuracies. Table 12 reports the classification accuracies of the different spectral classifiers trained with the original training set and with the training sets improved by the different noisy label detection methods. It can be seen that the spectral classifiers using the improved training set always obtain better classification accuracies than those trained with the original noisy training set. Specifically, the spectral classifiers trained using the training set cleansed by the KECA often achieve higher classification results than those trained using the training sets cleansed by the competitive methods in terms of OAs, AAs, and Kappas. This demonstrates that the proposed KECA method can be widely employed in supervised hyperspectral processing tasks to promote the accuracy of spectral classifiers. Furthermore, the proposed method is extended to some spectral-spatial classifiers; representative spectral-spatial classification methods are adopted, including extended morphological profiles (EMP) [50], logistic regression and multilevel logistic (LMLL) [51], the joint sparse representation classifier (JSRC) [13], and edge-preserving filtering (EPF) [52], to prove that the classification performance of a spectral-spatial classifier trained using a training set with different numbers of noisy labels can be improved by exploiting the proposed KECA method. The experiment was also performed on the Salinas dataset with 25 true training samples and different numbers of noisy labels for each class. Similarly, the experiments were repeated ten times to ensure fair comparison. Table 13 shows that the classification accuracies of a spectral-spatial classifier trained using the original training set are much lower than those trained with the corrected training sets. In particular, when the number of noisy labels increases, the effectiveness of the proposed KECA method in terms of accuracy is more obvious than that of competitive methods such as the SALOF and DP methods. This experiment further demonstrates that the proposed method can effectively detect and remove noisy labels, and it is also useful for improving the performance of spectral-spatial classifiers.
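The iteration scheme discussed above (feeding the cleansed set back into the detector until a stopping criterion is met) can be sketched generically; `detect_pass`, standing in for one pass of the KECA detection, is a hypothetical callback of this sketch:

```python
def iterative_cleanse(training_set, detect_pass, n_iter=1):
    # The paper sets n_iter = 1 as the accuracy/cost trade-off; larger
    # values help mainly when the training set is heavily corrupted.
    for _ in range(n_iter):
        training_set = detect_pass(training_set)
    return training_set
```

Each pass shrinks the training set by whatever the detector flags, so the previous output becomes the next input exactly as described in the text.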

Discussion on Feature Extraction
In this section, to further show that the proposed method can effectively detect and remove noisy labels in the training set under different feature-extraction methods, i.e., linear discriminant analysis (LDA), principal component analysis (PCA), the recursive filter (RF), and extended morphological profiles (EMPs), an experimental analysis of detection performance is conducted on the KSC dataset with 25 true samples and five noisy labels per class. As shown in Table 14, the EMP-based KECA obtains better OA, AA, and Kappa than the other features. This means that discriminative features can further improve the detection performance of the proposed KECA method. However, to keep the KECA method as general as possible, the results reported in this paper are obtained with the original spectral features.
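To illustrate how the feature-extraction step compared in Table 14 slots in before detection, the sketch below (a hypothetical helper, not code from this paper) either passes the raw spectra through unchanged or replaces them with PCA features; LDA, RF, and EMP features would plug in the same way.

```python
import numpy as np
from sklearn.decomposition import PCA


def features_for_detection(X, method="original", n_components=10):
    """Return the per-class features fed to the noisy-label detector.

    "original" keeps the raw spectral bands (the setting reported in
    this paper); "pca" is one of the alternatives compared in Table 14.
    """
    if method == "original":
        return X
    if method == "pca":
        # Project the spectra onto the leading principal components.
        return PCA(n_components=n_components).fit_transform(X)
    raise ValueError(f"unknown feature method: {method}")
```

The detector itself is unchanged; only its input features differ between the rows of Table 14.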

Conclusions and Future Lines
This paper is the first to propose kernel entropy component analysis for cleansing a noisy training set in supervised HSI classification. The key idea of this work is to exploit a kernel-based entropy distribution to detect noisy labels in the original training set. Experimental results on several real hyperspectral scenes show the effectiveness of the proposed method in terms of classification performance. However, one limitation of the proposed method is that it does not take into account the contextual information of the training samples in the detection process. Therefore, utilizing the kernel-based spectral and spatial information of hyperspectral data to further improve the detection performance will be an important topic in our future work.

Figure 1. Illustrating the framework of the proposed KECA method to detect noisy labels.

3.1. Construct the Kernel Matrix

Let X = {X_1, X_2, ..., X_L} represent the original training set that contains noisy labels, where L refers to the number of classes, x_a^l is the ath training sample in the lth class (a = 1, 2, ..., n), and n refers to the number of training samples in each class.
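The kernel-matrix construction above, followed by the eigendecomposition, entropy, and sigmoid-thresholding steps of the proposed framework, can be sketched for one class as follows. This is a minimal illustration rather than the authors' implementation: the RBF width `gamma`, the number of retained components `k`, the threshold `tau`, and the per-sample entropy measure (the energy each sample projects onto the leading entropy components) are all assumptions made for the sketch.

```python
import numpy as np


def keca_keep_mask(X, gamma=1.0, k=5, tau=0.5):
    """Flag likely-correct labels among the samples of one class.

    X: (n, d) array of training samples that all carry the same
    (possibly wrong) class label. Returns a boolean keep-mask.
    """
    # 1) RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2).
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))

    # 2) Symmetric eigendecomposition, eigenvalues sorted descending.
    vals, vecs = np.linalg.eigh(K)
    order = np.argsort(vals)[::-1]
    vals, vecs = np.maximum(vals[order], 0.0), vecs[:, order]

    # 3) One plausible per-sample entropy measure: the energy each
    #    sample projects onto the k leading entropy components.
    entropy = (vecs[:, :k] ** 2) @ vals[:k]

    # 4) Sigmoid of the standardized entropies gives a probability,
    #    thresholded by tau to cleanse suspected noisy labels.
    z = (entropy - entropy.mean()) / (entropy.std() + 1e-12)
    prob = 1.0 / (1.0 + np.exp(-z))
    return prob >= tau
```

Samples whose spectra lie far from the dominant structure of their class contribute little to the leading entropy components, receive a low probability, and are removed from the training set.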

Figure 2. University of Pavia data set. (a) Three-band color composite; (b) reference data; (c) color code.

Figure 6. Influence of the parameter k on the proposed KECA method. (a) University of Pavia dataset with 50 true samples and ten noisy labels for each class; (b) Salinas dataset with 25 true samples and five noisy labels for each class; (c) KSC dataset with 25 true samples and five noisy labels for each class; (d) Washington DC dataset with 25 true samples and five noisy labels for each class.

Figure 7. Effect of the parameter k on the performance of the proposed KECA method. (a) University of Pavia dataset with 50 true samples and ten noisy labels per class; (b) Salinas dataset with 25 true samples and five noisy labels per class; (c) KSC dataset with 25 true samples and five noisy labels per class; (d) Washington DC dataset with 25 true samples and five noisy labels per class.

Figure 8. Illustration of the RBF-based kernel matrix in different classes for the four real hyperspectral datasets. (a,b) University of Pavia (50 true samples and ten/thirty noisy labels); (c,d) Salinas (25 true samples and five/fifteen noisy labels); (e,f) KSC (25 true samples and five/fifteen noisy labels); (g,h) Washington DC (25 true samples and five/fifteen noisy labels).

Figure 9. Classification accuracy achieved by the KECA with different numbers of iterations. (a) The University of Pavia dataset; (b) the Salinas dataset; (c) the KSC dataset; (d) the Washington DC dataset.

4.5. Performance Evaluation Using the SVM

In this section, the classification results of the different methods, i.e., the SVM, SALOF, DP, and the proposed KECA method, are evaluated by an SVM trained with the different improved training sets on the University of Pavia, KSC, Salinas, and Washington DC datasets. For the University of Pavia dataset, experiments are conducted with 50 true samples and 1-25 noisy labels per class. For the KSC, Salinas, and Washington DC datasets, the experiments are conducted with 25 true samples and 1-15 noisy labels per class. Figure 10 shows the classification performance of the SVM trained with the different training sets, reporting the average values and standard deviations of the obtained OAs, AAs, and Kappas across ten repeated experiments. The accuracies of the SVM trained with the improved training sets are higher than those obtained with the original training set. Specifically, the SVM trained with the training set improved by the KECA always achieves the best classification results, compared to the training sets improved by the other detection methods, in terms of OA, AA, and Kappa. Therefore, introducing kernel-trick-based entropy component analysis can further improve both the detection accuracy of noisy labels and the classification performance of the SVM.
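The evaluation protocol described above can be sketched as follows. This is a hedged illustration rather than the paper's code: the SVM hyperparameters (`C`, RBF `gamma`) and the helper name `evaluate_svm` are assumptions, and `keep_mask` stands in for the keep/remove decision produced by any of the detectors (SALOF, DP, or the proposed KECA).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, cohen_kappa_score


def evaluate_svm(X_tr, y_tr, X_te, y_te, keep_mask=None):
    """Train an RBF-SVM on a (possibly cleansed) training set and
    report overall accuracy (OA) and Kappa on the test set."""
    if keep_mask is not None:
        # Drop the training samples flagged as noisy labels.
        X_tr, y_tr = X_tr[keep_mask], y_tr[keep_mask]
    clf = SVC(kernel="rbf", C=100.0, gamma="scale").fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    return accuracy_score(y_te, pred), cohen_kappa_score(y_te, pred)
```

In the experiments of this section, such a comparison is repeated ten times with freshly drawn training sets, and the means and standard deviations of the OA, AA, and Kappa values are reported.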

Figure 10. Performance comparison between the SVM trained using the original training sets and using the improved training sets obtained by the SALOF, DP, and KECA methods in terms of OA (first column), AA (second column), and Kappa (third column). (a-c) Experiments on the University of Pavia data set with different numbers of mislabeled samples (in the range 1-25) and 50 true samples for each class; (d-f) experiments on the KSC data set with different numbers of mislabeled samples (in the range 1-15) and 25 true samples for each class; (g-i) experiments on the Salinas data set with different numbers of mislabeled samples (in the range 1-15) and 25 true samples for each class; (j-l) experiments on the Washington DC data set with different numbers of mislabeled samples (in the range 1-15) and 25 true samples for each class.
It can be observed that the classification results for fallow smooth and vineyard trellis achieved by the SVM trained with the noisy training set show serious misclassification. By comparison, such misclassification is obviously reduced in the classification results obtained by the SVM trained with a training set improved by the KECA. Specifically, the classification results of the SVM trained with the training set improved by the KECA show significant improvements over those of the SVM trained with training sets cleansed by the other methods. This further demonstrates that the proposed KECA method can effectively cleanse a noisy training set and improve supervised classification performance in HSI.

Figure 11. Classification results (%) of the various methods on the University of Pavia dataset. Classification maps obtained by the SVM (first column), DP (second column), SALOF (third column), and the proposed KECA method (fourth column), trained with 50 true samples and different numbers of noisy labels per class: (a-d) 10 noisy labels per class, and (e-h) 30 noisy labels per class.

Figure 12. Classification results (%) of the different methods on the KSC dataset. Classification maps obtained by the SVM (first column), DP (second column), SALOF (third column), and the proposed KECA method (fourth column), trained with 25 true samples and different numbers of noisy labels per class: (a-d) 5 noisy labels per class, and (e-h) 15 noisy labels per class.

Table 1. Number of samples, i.e., training samples, test samples, and noisy labels, of the nine classes in the University of Pavia data set.

Table 2. Number of samples, i.e., training samples, test samples, and noisy labels, of the sixteen classes in the Salinas data set.

Table 3. Number of samples, i.e., training samples, test samples, and noisy labels, of the thirteen classes in the KSC data set.

Table 4. Number of samples, i.e., training samples, test samples, and noisy labels, of the six classes in the Washington DC data set.

Table 5. Classification performance obtained by the KECA method using different kernel tricks on the KSC dataset with 25 true samples and five noisy labels for each class. The number in parentheses represents the standard deviation of the accuracies obtained over repeated experiments.

Table 6. Detection performance (numbers) of noisy labels for the proposed method on the different datasets. Note that the experimental results are the averages of ten repeated experiments for objective evaluation. Here, T_l × L represents the total number of noisy labels in the training set, T_l refers to the number of noisy labels per class, and the number in parentheses represents the standard deviation of the accuracies obtained in repeated experiments.

Table 7. Classification performance of the SVM, SALOF, DP, and KECA methods on the University of Pavia dataset with 50 true samples and different numbers of noisy labels per class as the training set. The number in parentheses represents the standard deviation of the accuracies obtained over repeated experiments.

Table 8. Classification performance of the SVM, SALOF, DP, and KECA methods on the Salinas dataset with 25 true samples and different numbers of noisy labels per class as the training set. The number in parentheses represents the standard deviation of the accuracies obtained over repeated experiments.

Table 9. Classification performance of the SVM, SALOF, DP, and KECA methods on the KSC dataset with 25 true samples and different numbers of noisy labels per class as the training set. The number in parentheses represents the standard deviation of the accuracies obtained over repeated experiments.

Table 10. Classification performance of the SVM, SALOF, DP, and KECA methods on the Washington DC dataset with 25 true samples and different numbers of noisy labels per class as the training set. The number in parentheses represents the standard deviation of the accuracies obtained over repeated experiments.

Table 11. Comparison of the time consumption (in seconds) of the various methods. For the University of Pavia dataset, the training set contains 50 true samples and 10 noisy labels per class; the Salinas, KSC, and Washington DC datasets contain 25 true samples and 15 noisy labels per class. The detection time is marked as DT and the classification time as CT.

Table 12. Classification accuracy on the Salinas dataset. The classification accuracy obtained with 25 true samples per class is given as a reference, and the SVM, SALOF, DP, and KECA methods are compared with 25 true samples and different numbers of noisy labels per class.

Table 13. Classification accuracy on the Salinas dataset. The classification accuracy obtained with 25 true samples per class is given as a reference, and the SVM, SALOF, DP, and KECA methods are compared with 25 true samples and different numbers of noisy labels for each class.

Table 14. Classification performance of the KECA based on different feature-extraction methods, i.e., original spectral features (Original), linear discriminant analysis (LDA), principal component analysis (PCA), the recursive filter (RF), and extended morphological profiles (EMPs), on the KSC dataset with 25 true samples and five noisy labels per class as the training set. The number in parentheses represents the standard deviation of the accuracies obtained over repeated experiments.