Optimized Kernel Minimum Noise Fraction Transformation for Hyperspectral Image Classification

This paper presents an optimized kernel minimum noise fraction transformation (OKMNF) for feature extraction from hyperspectral imagery. The proposed approach is based on the kernel minimum noise fraction (KMNF) transformation, a nonlinear dimensionality reduction method. KMNF maps the original data into a higher dimensional feature space and provides a small number of high-quality features for classification and other post-processing. Noise estimation is an important component of KMNF, and the noise is usually estimated by assuming a strong correlation between adjacent pixels. However, hyperspectral images have limited spatial resolution and usually contain a large number of mixed pixels, which makes spatial information less reliable for noise estimation. This is the main reason that KMNF generally shows unstable performance in feature extraction for classification. To overcome this problem, this paper uses more accurate noise estimation to improve KMNF. We propose two new noise estimation methods, together with a framework for improved noise estimation in which both spectral and spatial de-correlation are exploited. Experimental results on a variety of hyperspectral images indicate that the proposed OKMNF is superior to several related dimensionality reduction methods in most cases. Compared to the conventional KMNF, the proposed OKMNF yields significant improvements in overall classification accuracy.


Introduction
Hyperspectral images provide very rich spectral information of earth objects [1,2]. In general, a hyperspectral image contains hundreds of spectral bands with high spectral resolution. However, the high dimensionality reduces the efficiency of hyperspectral data processing. Moreover, hyperspectral image classification suffers from the curse of dimensionality, also known as the Hughes phenomenon [3]: the more spectral bands the image has, the more training samples are needed to achieve an acceptable classification accuracy. This requirement is difficult to satisfy for hyperspectral data [4]. Dimensionality reduction is a very effective technique to solve this problem [5,6]. Dimensionality-reduced data should represent the original data well, and can be considered as the extracted features for classification [7][8][9]. When the data dimensionality is lower, the computing time is reduced and fewer training samples are required [10][11][12][13]. Therefore, dimensionality reduction is a critical pre-processing step for hyperspectral image classification [14][15][16].
Approaches to dimensionality reduction of hyperspectral data can be split into two major groups. The first group consists of band selection approaches, which aim at selecting a subset of relevant bands from the original data. This group includes supervised methods such as Bhattacharyya distance, Jeffries-Matusita distance, divergence, kernel dependence, mutual information, and spectral angle mapper, as well as unsupervised methods such as geometric-based representative bands, dissimilar bands based on linear projection, manifold ranking [17] and dual clustering [18,19], which have proven valuable for achieving superior classification results. The second group consists of feature extraction approaches. Feature extraction methods transform the original hyperspectral data into an optimized feature space by mathematical transformation, and then achieve dimensionality reduction through feature selection. A number of techniques have been developed for feature extraction, and they can be categorized into two major classes. The first class includes supervised feature extraction methods such as linear discriminant analysis (LDA) [20], nonparametric weighted feature extraction (NWFE) [21], sparse graph based feature extraction and their extensions [22][23][24]. The second class includes unsupervised feature extraction approaches such as principal component analysis (PCA) [25], minimum noise fraction (MNF) [26], and the sparse-graph learning-based dimensionality reduction method [27], which do not need prior knowledge of label information.
PCA and MNF are two widely adopted methods for dimensionality reduction of hyperspectral images. The performance of PCA relies heavily on noise characteristics [26,28]. When the noise is not uniformly distributed across all of the spectral bands, or when the noise variance is larger than the signal variance in one band, PCA cannot guarantee that the first few principal components have the highest image quality [26]. MNF generates new components ordered by image quality and provides better spectral features in the major components than PCA, no matter how the spectral noise is distributed [28]. The original MNF is a linear dimensionality reduction method. It is simple to apply and works in most conditions, but it cannot easily handle nonlinear characteristics within the data. The nonlinear characteristics of hyperspectral data are often due to the nonlinear nature of scattering as described in the bidirectional reflectance distribution function, multiple scattering within a pixel, and the heterogeneity of subpixel constituents [29,30]. The kernel MNF (KMNF) method was developed to overcome this weakness of MNF [31][32][33]. KMNF is a nonlinear dimensionality reduction method that uses kernel functions [34] to model the nonlinear characteristics within the data. The nonlinear transformation based on a kernel function maps the original data into a higher dimensional feature space, where a linear analysis can then be performed, because the complex nonlinear characteristics in the original input space become simpler linear characteristics in the new feature space [35][36][37][38][39]. Based on similar kernel theory, kernel PCA (KPCA) was also proposed for nonlinear dimensionality reduction of hyperspectral images [40].
While MNF is a valuable dimensionality reduction method for hyperspectral image classification, the traditional version of MNF often fails to provide the desired results in real applications. Theoretical and experimental analyses have shown that noise estimation is the key factor behind this problem [41][42][43]. The traditional MNF assumes that spatially neighboring pixels are very highly correlated, so that the differences between these pixels can be considered as noise. This assumption holds when the image has very high spatial resolution. Due to the limitations of hyperspectral sensors, however, hyperspectral images often cannot offer high spatial resolution, and mixed pixels are very common in a hyperspectral image [44]. Thus, the spatial information adopted in the traditional MNF is less reliable for estimating noise in a hyperspectral image. In contrast, the spectral resolution of hyperspectral images is very high, which means that hyperspectral images have strong spectral correlation between bands [45]. It has been found that combining spatial and spectral information is much more appropriate for estimating noise in hyperspectral images than using spatial information alone [46,47]. Optimized MNF (OMNF) utilizes spectral and spatial de-correlation (SSDC) [48][49][50] to improve noise estimation [51]. However, existing SSDC combines the spectral information with only one spatial neighbor for noise estimation [48][49][50], so the spatial information is not fully exploited. KMNF is a kernel version of MNF and can handle nonlinear characteristics within the data well. However, the classification results using the features extracted by KMNF are often disappointing, and sometimes even worse than those using MNF. The fundamental reason is again that the original KMNF adopts only spatial information to estimate noise, which makes the estimate error-prone and unstable.
To overcome the above limitations, we propose a new framework to optimize KMNF (OKMNF) for feature extraction of hyperspectral data. Instead of relying on spatial information alone for noise estimation, the proposed OKMNF estimates noise by taking into account both spectral and spatial correlations through multiple linear regression. We also propose a more general method than SSDC [51][52][53] for noise estimation, in which more spatial neighbors are exploited. Moreover, the proposed OKMNF can handle nonlinear characteristics within the data, which cannot be effectively processed by the linear OMNF and MNF. Therefore, OKMNF is much more stable and accurate than KMNF in noise estimation, and enables better performance in both dimensionality reduction and its subsequent application to classification. Last but not least, the proposed framework can be extended to a general model whenever other accurate noise estimation methods are available.
The remainder of this paper is organized as follows. In Section 2, the OKMNF method is introduced in detail. Section 3 validates the proposed approach and reports experimental results, comparing them to several state-of-the-art alternatives. Section 4 discusses the performance of the noise estimation algorithms and dimensionality reduction methods. Section 5 states the conclusions.

Proposed OKMNF Method
Let us consider a hyperspectral image data set with n pixels and b spectral bands, organized as a matrix X with n rows and b columns. Hyperspectral images inevitably contain noise due to sensor errors and other environmental factors. Normally, we can consider the original hyperspectral image X as the sum of a signal part and a noise part [26,54,55,56]:

$$x(p) = x_S(p) + x_N(p) \tag{1}$$

where $x(p)$ is the pixel vector at position p, and $x_N(p)$ and $x_S(p)$ are the noise and signal contained in $x(p)$, respectively. In optical images, noise and signal are often considered to be independent. Thus, the covariance matrix $S$ of image X can be written as the sum of the noise covariance matrix $S_N$ and the signal covariance matrix $S_S$:

$$S = S_N + S_S \tag{2}$$

Let $\bar{x}_k$ be the average of the kth band. We then obtain the matrix $X_{\text{mean}}$ with n rows and b columns:

$$X_{\text{mean}} = \begin{bmatrix} \bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_b \\ \vdots & \vdots & & \vdots \\ \bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_b \end{bmatrix} \tag{3}$$

$Z$, the centered matrix of X, is given by

$$Z = X - X_{\text{mean}} \tag{4}$$

The covariance matrix $S$ of image X can be written as

$$S = \frac{Z^T Z}{n-1} \tag{5}$$

Let $\bar{x}_{Nk}$ be the average of the noise in the kth band. We then obtain the matrix $X_{N\text{mean}}$ with n rows and b columns:

$$X_{N\text{mean}} = \begin{bmatrix} \bar{x}_{N1} & \bar{x}_{N2} & \cdots & \bar{x}_{Nb} \\ \vdots & \vdots & & \vdots \\ \bar{x}_{N1} & \bar{x}_{N2} & \cdots & \bar{x}_{Nb} \end{bmatrix} \tag{6}$$

$Z_N$, the centered matrix of the noise matrix $X_N$, can be computed as

$$Z_N = X_N - X_{N\text{mean}} \tag{7}$$

The covariance matrix $S_N$ of $X_N$ can be expressed as

$$S_N = \frac{Z_N^T Z_N}{n-1} \tag{8}$$

The noise fraction NF is defined as the ratio of the noise variance to the total variance, so for a linear combination $a^T z(p)$ [26,31] we get

$$NF = \frac{a^T S_N a}{a^T S a} = \frac{a^T Z_N^T Z_N a}{a^T Z^T Z a} \tag{9}$$

where a is the eigenmatrix of NF. Reliable noise estimation is therefore essential to NF. The original KMNF method [31] mainly adopts the spatial neighborhood (3 by 3) of a hyperspectral image to estimate the noise $Z_N$ [57]:

$$n_{i,j,k} = z_{i,j,k} - \hat{z}_{i,j,k} \tag{10}$$

where $z_{i,j,k}$ is the value of the pixel located at line i, column j, and band k of the original hyperspectral image Z, $\hat{z}_{i,j,k}$ is the value of this pixel estimated from its 3 by 3 spatial neighborhood, and $n_{i,j,k}$ is the estimated noise value of $z_{i,j,k}$. However, noise estimation based on spatial information alone can be unstable and data-selective [25,51,53]. This is because hyperspectral images do not always have very high spatial resolution, so the difference between pixels may contain a significant amount of signal instead of pure noise. In contrast, the correlation between bands in hyperspectral images is generally very high. Therefore, we can exploit the high correlation between bands for noise estimation, as in SSDC, which is a useful method for hyperspectral image noise estimation. In SSDC, the spatial and spectral correlations are removed through a multiple linear regression model, and the remaining residuals are the estimates of noise [49,50,58]. Recent works show that SSDC can offer reliable results for noise estimation when there are different land cover types in the hyperspectral images [50].
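The noise fraction and the directions that minimize it can be illustrated with a small NumPy sketch of the linear case. This is only an illustration under stated assumptions: the noise matrix is taken as known (synthetic), and the helper names `noise_fraction` and `mnf_directions` are hypothetical, not part of the original method description.

```python
import numpy as np

def noise_fraction(a, Z, Z_N):
    """NF = (a^T Z_N^T Z_N a) / (a^T Z^T Z a) for centered Z and Z_N."""
    num = a @ (Z_N.T @ Z_N) @ a
    den = a @ (Z.T @ Z) @ a
    return num / den

def mnf_directions(Z, Z_N):
    """Solve S_N a = nf * S a; columns of A are ordered by increasing NF."""
    n = Z.shape[0]
    S = Z.T @ Z / (n - 1)
    S_N = Z_N.T @ Z_N / (n - 1)
    L = np.linalg.cholesky(S)          # S = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ S_N @ L_inv.T          # symmetric whitened noise covariance
    nf, V = np.linalg.eigh(M)          # eigenvalues in ascending order
    A = L_inv.T @ V                    # back-transform to the original space
    return nf, A

# Synthetic data (assumption): low-rank signal plus small white noise.
rng = np.random.default_rng(0)
n, b = 500, 4
signal = rng.normal(size=(n, 2)) @ rng.normal(size=(2, b))
noise = 0.1 * rng.normal(size=(n, b))
X = signal + noise
Z = X - X.mean(axis=0)
Z_N = noise - noise.mean(axis=0)

nf, A = mnf_directions(Z, Z_N)
```

Each eigenvalue in `nf` equals the noise fraction of the corresponding column of `A`, so the first columns are the least noisy directions. In practice the noise is unknown and has to be estimated, which is why the quality of the noise estimate matters so much.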

Noise Estimation
In noise estimation based on spectral and spatial de-correlation, an image is uniformly divided into non-overlapping small sub-blocks $X_{\text{sub}}$ of w × h pixels, in order to reduce the influence of variations in ground cover types. In SSDC, the following multiple linear regression formula is adopted for each pixel [49,50]:

$$x_{i,j,k} = a + b\,x_{i,j,k-1} + c\,x_{i,j,k+1} + d\,x_{p,k} + \varepsilon_{i,j,k} \tag{11}$$

where 1 ≤ i ≤ w, 1 ≤ j ≤ h, and (i, j) ≠ (1, 1); $x_{i,j,k-1}$ and $x_{i,j,k+1}$ are the values of the same pixel in the two adjacent bands, $x_{p,k}$ is the value of one spatial neighbor in band k, and a, b, c, and d are the coefficients to be determined. For each sub-block $X_{\text{sub}}$, the multiple linear regression model can be written as

$$X_{\text{sub}} = B\mu + \varepsilon$$

where $X_{\text{sub}}$ is the sub-block matrix, B is the spectral-spatial neighborhood matrix, µ is the coefficient matrix, and ε is the residual. However, SSDC integrates the spectral information with only one spatial neighbor in the multiple linear regression, so the spatial information might not be well exploited for noise estimation. To solve this problem, we propose two methods that improve SSDC, named SSDC1 and SSDC2, in which more spatial neighbors are incorporated into the multiple linear regression.
SSDC1 uses the same multiple linear regression framework as Equation (11), but redefines the spatial neighbor term $x_{p,k}$ so that more spatial neighbors contribute; $X_{\text{sub}}$ and µ are the same as in SSDC, whereas B changes accordingly. SSDC2 instead extends the multiple linear regression model itself (Equation (17)), with five coefficients a, b, c, d, and e; $X_{\text{sub}}$ is the same as in SSDC, whereas B and µ are redefined accordingly. In all three cases, µ can be estimated by least squares,

$$\hat{\mu} = (B^T B)^{-1} B^T X_{\text{sub}}$$

the signal value can be estimated through

$$\hat{X}_{\text{sub}} = B\hat{\mu}$$

and, finally, the noise value $N_{\text{sub}}$ can be obtained by

$$N_{\text{sub}} = X_{\text{sub}} - \hat{X}_{\text{sub}}$$

The procedure of noise estimation is summarized in Algorithm 1.
Algorithm 1. Noise estimation.
Input: hyperspectral image X, sub-block size w × h.
Step 1: compute the coefficients a, b, c, d and e of the multiple linear regression models for each sub-block using Equation (11) or Equation (17); then estimate the signal: $\hat{X}_{\text{sub}} = B\hat{\mu}$.
Step 2: estimate the noise: $N_{\text{sub}} = X_{\text{sub}} - \hat{X}_{\text{sub}}$.

We analyze the influence of the sub-block size using the hyperspectral image shown in Figure 1a. From the experiments, we found that when the sub-block size is 4 × 4 or 5 × 5, some sub-blocks are homogeneous and have similar DN values in certain bands, which makes the matrix inversion in the multiple linear regression infeasible. When the sub-block size is too large, such as 15 × 15 or 30 × 30, some sub-blocks contain multiple types of earth surface features, and the results of noise estimation become inaccurate and unstable. When the sub-block size is 6 × 6, as shown in Figures 2 and 3, the results of noise estimation are reliable and stable. Therefore, we set the sub-block size to 6 × 6 for SSDC, SSDC1 and SSDC2, that is, w = 6 and h = 6.
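The regression-and-residual procedure above can be sketched for a single sub-block and a single band as follows. This is a minimal sketch of the four-coefficient (SSDC-style) model only; the function name `ssdc_noise_block` is hypothetical, and the choice of spatial neighbor (the pixel in the previous row, or the previous column for the first row) is an assumption made for illustration.

```python
import numpy as np

def ssdc_noise_block(block, k):
    """Residuals of the four-coefficient regression for band k of one sub-block.

    block: array of shape (w, h, bands); band k needs both spectral neighbours.
    Assumption: the spatial neighbour of (i, j) is (i-1, j), or (i, j-1) if i == 0.
    """
    w, h, _ = block.shape
    rows, targets = [], []
    for i in range(w):
        for j in range(h):
            if i == 0 and j == 0:
                continue  # the first pixel has no previous spatial neighbour
            p = block[i - 1, j, k] if i > 0 else block[i, j - 1, k]
            rows.append([1.0, block[i, j, k - 1], block[i, j, k + 1], p])
            targets.append(block[i, j, k])
    B = np.asarray(rows)       # spectral-spatial neighbourhood matrix
    x = np.asarray(targets)    # observed values of band k
    mu, *_ = np.linalg.lstsq(B, x, rcond=None)  # least-squares (a, b, c, d)
    return x - B @ mu          # residuals serve as the noise estimate

# Synthetic 6 x 6 sub-block with 5 spectrally correlated bands (assumption).
rng = np.random.default_rng(1)
base = rng.normal(size=(6, 6, 1))
gains = np.linspace(1.0, 2.0, 5)
block = base * gains + 0.05 * rng.normal(size=(6, 6, 5))

residuals = ssdc_noise_block(block, k=2)
```

`np.linalg.lstsq` is used instead of an explicit matrix inverse because it still returns a solution when a homogeneous sub-block makes $B^T B$ nearly singular, which is the failure mode reported for very small sub-blocks.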

Kernelization and Regularization
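After the noise is estimated through SSDC, SSDC1 or SSDC2, it is included in KMNF. In KMNF, in order to obtain new components ordered by image quality after dimensionality reduction, we should minimize NF. For mathematical convenience, we can instead maximize 1/NF, which can be written as

$$\frac{1}{NF} = \frac{a^T S a}{a^T S_N a} = \frac{a^T Z^T Z a}{a^T Z_N^T Z_N a} \tag{22}$$

We can obtain the dual formulation by reparametrizing and setting $a \propto Z^T b$ [31,34]:

$$\frac{1}{NF} = \frac{b^T Z Z^T Z Z^T b}{b^T Z Z_N^T Z_N Z^T b} \tag{23}$$

For the kernelization of 1/NF, we consider an embedding map

$$\Phi: x \rightarrow \Phi(x) \tag{24}$$

where $x \in R^n$, $\Phi(x) \in R^N$, $N > n$, and the nonlinear mapping $\Phi(x)$ transforms the original data x into a higher dimensional feature space F [34]. After the mapping $\Phi(x)$, the kernelized 1/NF can be expressed as

$$\frac{1}{NF} = \frac{b^T \Phi(Z)\Phi(Z)^T \Phi(Z)\Phi(Z)^T b}{b^T \Phi(Z)\Phi(Z_N)^T \Phi(Z_N)\Phi(Z)^T b} \tag{25}$$

The inner products $\langle \Phi(x), \Phi(y) \rangle$ ($x, y \in R^n$) can sometimes be computed more efficiently as a direct function of the input features, without explicitly computing the mapping $\Phi(x)$ [34]. This function is called the kernel function κ, which can be expressed as

$$\kappa(x, y) = \langle \Phi(x), \Phi(y) \rangle \tag{26}$$

Therefore, Equation (25) can be written as

$$\frac{1}{NF} = \frac{b^T \kappa^2 b}{b^T \kappa_N \kappa_N^T b} \tag{27}$$

where $\kappa = \Phi(Z)\Phi(Z)^T$ with elements $\kappa(z_i, z_j)$, and $\kappa_N = \Phi(Z)\Phi(Z_N)^T$ with elements $\kappa(z_i, z_{Nj})$.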
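The idea that an inner product in the feature space can be evaluated directly from the input vectors can be checked on a toy case. The sketch below uses a homogeneous polynomial kernel of degree 2 with its explicit feature map from R^2 to R^3; this kernel is chosen purely for illustration and is not the kernel used in the experiments.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map from R^2 to R^3."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

def kappa(x, y):
    """Homogeneous polynomial kernel of degree 2: (x . y)^2."""
    return float(np.dot(x, y)) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

lhs = float(phi(x) @ phi(y))   # inner product computed in the feature space
rhs = kappa(x, y)              # same quantity computed from the inputs only
```

Both values agree (here x · y = 1, so both equal 1), even though `kappa` never forms the mapped vectors. For kernels whose feature space is very high dimensional, only the kernel evaluation is practical.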
To ensure the uniqueness of the result in Equation (27), we regularize 1/NF by introducing a regularization parameter r, similarly to what other kernel methods (e.g., KMNF, KPCA [28,31]) have done. This gives the regularized version

$$\frac{1}{NF} = \frac{b^T \left[(1-r)\kappa^2 + r\kappa\right] b}{b^T \kappa_N \kappa_N^T b} \tag{28}$$

OKMNF Transformation
The regularized version described above is a symmetric generalized eigenvalue problem, which can be solved by maximizing the Rayleigh quotient in Equation (28). Therefore, this problem can be written as

$$\left[(1-r)\kappa^2 + r\kappa\right] b = \lambda\, \kappa_N \kappa_N^T b$$

or, equivalently,

$$(\kappa_N \kappa_N^T)^{-1/2} \left[(1-r)\kappa^2 + r\kappa\right] (\kappa_N \kappa_N^T)^{-1/2} \left[(\kappa_N \kappa_N^T)^{1/2} b\right] = \lambda \left[(\kappa_N \kappa_N^T)^{1/2} b\right]$$

where λ and $(\kappa_N \kappa_N^T)^{1/2} b$ are the eigenvalues and eigenvectors of $(\kappa_N \kappa_N^T)^{-1/2} \left[(1-r)\kappa^2 + r\kappa\right] (\kappa_N \kappa_N^T)^{-1/2}$, respectively. Since $a \propto Z^T b$, after the mapping $\Phi(x)$, $Z^T b$ becomes $\Phi(Z)^T b$. Thus, once b is obtained, the feature extraction result Y can be computed by

$$Y = \Phi(Z)\Phi(Z)^T b = \kappa b$$

From the above analysis, we can see that noise estimation is a very critical step in the OKMNF method. First, in the original data space, we compute the estimated data $\hat{Z}$ from the original hyperspectral data Z using the multiple linear regression models. Then, we transform the original hyperspectral data Z and the estimated data $\hat{Z}$ to the kernel space. In this space, we obtain the noise estimate by calculating the difference between kernel Z and kernel $\hat{Z}$, which means that the noise is estimated in the kernel space. Finally, we obtain the transformation matrix by maximizing the regularized 1/NF and achieve the dimensionality reduction. A good noise estimate is important for effective dimensionality reduction.
In many real applications, a hyperspectral image has a huge number of pixels, so the kernel matrix can be very large (the matrices κ and κ N are both n by n, where n is the number of pixels). Even for conventional hyperspectral remote sensing images, the kernel matrix will exceed the memory capacity of an ordinary personal computer. For example, for a hyperspectral image of n = 512 × 512 pixels, the kernel matrix has n × n = (512 × 512) × (512 × 512) elements. To reduce memory cost and computational complexity, we can randomly subsample the image and perform the kernel eigenvalue analysis only on the selected samples (say, m of them), which can be used as training samples. We can then generate a transformed version of the entire image by mapping all pixels onto the primal eigenvectors obtained from the subset of samples. The procedure of OKMNF is summarized in Algorithm 2.

Algorithm 2. OKMNF.
Step 1: compute the residuals (noise) of the training samples: $n_{i,j,k} = x_{i,j,k} - \hat{x}_{i,j,k}$.
Step 3: compute the eigenvectors of the regularized eigenvalue problem formed from κ and κ N.
Step 4: map all pixels onto the primal eigenvectors.
Output: feature extraction result Y.
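The subsampling strategy can be sketched end to end in NumPy. For scale, n = 512 × 512 gives 2^36 (about 6.9 × 10^10) kernel entries, which is 512 GiB at 8 bytes per entry, whereas m = 30 training samples give a 30 × 30 matrix. The sketch below is illustrative only and rests on several assumptions: a Gaussian kernel, the regularized numerator (1 − r)κ² + rκ, no kernel centering, and a small ridge added to the denominator for numerical stability. The function names `rbf`, `okmnf_fit`, and `okmnf_transform` are hypothetical.

```python
import numpy as np

def rbf(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def okmnf_fit(Z_train, N_train, sigma, r=0.01, n_features=3, ridge=1e-6):
    """Eigen-analysis on m training samples; returns dual vectors b."""
    m = Z_train.shape[0]
    K = rbf(Z_train, Z_train, sigma)       # kappa
    K_N = rbf(Z_train, N_train, sigma)     # kappa_N, elements k(z_i, z_Nj)
    num = (1.0 - r) * K @ K + r * K        # regularized numerator (assumed form)
    den = K_N @ K_N.T
    den = den + ridge * np.trace(den) / m * np.eye(m)  # ridge added in this sketch
    L = np.linalg.cholesky(den)            # den = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ num @ L_inv.T
    M = 0.5 * (M + M.T)                    # enforce symmetry against rounding
    lam, V = np.linalg.eigh(M)
    order = np.argsort(lam)[::-1][:n_features]  # largest 1/NF first
    b = L_inv.T @ V[:, order]
    return b, lam[order]

def okmnf_transform(Z_all, Z_train, b, sigma):
    """Map every pixel onto the eigenvectors found from the training subset."""
    return rbf(Z_all, Z_train, sigma) @ b

# Synthetic stand-ins (assumption): 200 pixels, 5 bands, small residual noise.
rng = np.random.default_rng(0)
Z_all = rng.normal(size=(200, 5))
N_all = 0.05 * rng.normal(size=(200, 5))
idx = rng.choice(200, size=30, replace=False)   # m = 30 random training pixels

b, lam = okmnf_fit(Z_all[idx], N_all[idx], sigma=3.0)
Y = okmnf_transform(Z_all, Z_all[idx], b, sigma=3.0)
```

Only the m × m matrices are ever decomposed; the full image enters through the n × m kernel between all pixels and the training samples, which keeps memory use linear in n.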

Experiments and Results
Three experiments are designed in this section to evaluate the performance of several noise estimation algorithms and dimensionality reduction methods. The first experiment uses real images with different land covers to assess the robustness of the noise estimation algorithms adopted in OKMNF, and the results are shown in Figure 4. The other two experiments validate the performance of the dimensionality reduction methods in terms of maximum likelihood (ML) classification on two real hyperspectral images. The experimental results for the Indian Pines image (shown in Figure 5) are given in Figures 6-9. The experimental results for the Minamimaki scene image (shown in Figure 10) are given in Figures 11-13.

Parameter Tuning
In Equation (28), we introduced a parameter r to guarantee the uniqueness of the eigenvectors. Figures 7a and 12a show the sensitivity of the kernel dimensionality reduction methods (KPCA, KMNF, and OKMNF) with respect to r. We can see that the value of r has little effect on the kernel dimensionality reduction methods, and that OKMNF achieves better or comparable overall accuracy than KMNF and KPCA. To compare the different dimensionality reduction methods fairly, we adopt for each method the value of r within the tested range that maximizes the classification accuracy. According to our empirical study, in the Indian Pines scene, r is set to 0.0025 for OKMNF, KMNF, and KPCA; in the Minamimaki scene, r is set to 0.1 for KMNF and to 0.005 for both OKMNF and KPCA.
Another important parameter is the number of subsamples (pixels), m, used to derive the eigenvectors for data transformation. Figures 7b and 12b show the sensitivity of the kernel dimensionality reduction methods (KPCA, KMNF, and OKMNF) with respect to m. We can see that the value of m has little effect on KPCA. For both the Indian Pines scene and the Minamimaki scene, the classification accuracies of OKMNF and KMNF decrease noticeably when m is greater than 100. However, OKMNF is less sensitive to m than KMNF, and is better than or comparable to KPCA when m is less than 80. We fix the number of extracted features to see the impact of the subsample size on classification, and we see the performance decrease as the number of subsamples increases. The reason is that when m increases, more extracted features are required. To reduce the computational time and memory use, we adopt a small number of subsamples. This is an important empirical rule that can be considered in applications of OKMNF. Here, we again adopt for each method the value of m within the tested range that maximizes the classification accuracy. According to our empirical study, in the Indian Pines scene, m is set to 63 for both OKMNF and KPCA, and to 42 for KMNF. In the Minamimaki scene, m is set to 30 for both KMNF and KPCA, and to 25 for OKMNF.
In this paper, the Gaussian radial basis function is used as the kernel function for KPCA, KMNF, and OKMNF alike [59]. The Gaussian radial basis function is defined as

$$\kappa(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$

where $x_i$ and $x_j$ are vectors of observations, $\sigma = s\sigma_0$, $\sigma_0$ is the mean distance between the observations in feature space, and s is a scale factor [33,37]. Figures 7c and 12c show the sensitivity of KPCA, KMNF, and OKMNF with respect to s. We can see that both OKMNF and KPCA show better performance than KMNF. In the Indian Pines scene, OKMNF performs better than KPCA. As above, we adopt for each method the value of s within the tested range that maximizes the classification accuracy. According to our empirical study, s is set to 35, 1, and 15 for KPCA, KMNF, and OKMNF, respectively, for the Indian Pines scene. For the Minamimaki scene, s is set to 25 for OKMNF and to 10 for both KPCA and KMNF.
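The role of the scale factor s can be illustrated with a short NumPy sketch. It assumes that the mean distance between observations is computed as the mean pairwise Euclidean distance of the observation vectors; the function name `gaussian_kernel` is hypothetical.

```python
import numpy as np

def gaussian_kernel(X, s=1.0):
    """Gaussian RBF kernel with sigma = s * sigma0.

    Assumption: sigma0 is the mean pairwise Euclidean distance between rows of X.
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    m = X.shape[0]
    sigma0 = dist[np.triu_indices(m, k=1)].mean()  # distinct pairs only
    sigma = s * sigma0
    K = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    return K, sigma

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))            # 20 synthetic observations, 5 bands

K1, sigma1 = gaussian_kernel(X, s=1.0)
K5, sigma5 = gaussian_kernel(X, s=5.0)
```

A larger s widens the kernel, so the off-diagonal entries of `K5` are closer to 1 than those of `K1`. Tying σ to the mean distance makes s comparable across scenes with different radiometric ranges.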

Experiments on Noise Estimation Algorithms in KMNF and OKMNF
To assess the performance of the noise estimation algorithms adopted in KMNF and OKMNF, six real Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) radiance images with very different land cover types were used in this experiment. These images are shown in Figure 1. Each of them contains 300 × 300 pixels and covers spectral wavelengths from 400 nm to 2500 nm. Normally, the random noise in AVIRIS sensor images is mainly additive and uncorrelated with the signal [60]. More detailed descriptions are shown in Table 1.
We assess the performance of the noise estimation algorithms by computing the noise standard deviation after obtaining the noise data through Algorithm 1. The local standard deviation (LSD) of each sub-block is estimated by

$$LSD = \left[\frac{1}{w \times h - 4} \sum_{i=1}^{w} \sum_{j=1}^{h} n_{i,j,k}^2\right]^{1/2}$$

where w × h − 4 reflects the fact that four parameters are used in the multiple linear regression model, so that the number of degrees of freedom is w × h − 4. The LSD of each sub-block is taken as the noise estimate of that region. The mean value of these LSDs is considered the best estimate of the band noise.
The AVIRIS hyperspectral images in Figure 1 were acquired from July 1996 to June 1997. Figure 1a-f are cut from the same image, respectively. Therefore, their noise level should be the same [50].
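The local standard deviation with its degrees-of-freedom correction, and the band-level noise estimate as the mean over sub-blocks, can be sketched as follows. The function names `local_std` and `band_noise` are hypothetical, and the residual values are synthetic.

```python
import numpy as np

def local_std(residuals, w, h, n_params=4):
    """LSD of one sub-block: sqrt(sum of squared residuals / (w*h - n_params))."""
    dof = w * h - n_params
    return float(np.sqrt(np.sum(np.square(residuals)) / dof))

def band_noise(lsd_values):
    """Band noise estimate: mean of the sub-block LSDs."""
    return float(np.mean(lsd_values))

# Toy check: 32 residuals of magnitude 2 in a 6 x 6 block give
# sum of squares 128 over 36 - 4 = 32 degrees of freedom, so LSD = 2.
lsd_example = local_std(np.full(32, 2.0), w=6, h=6)

rng = np.random.default_rng(0)
lsds = [local_std(0.1 * rng.normal(size=35), 6, 6) for _ in range(100)]
noise_estimate = band_noise(lsds)
```

Dividing by w × h − 4 rather than by the number of residuals compensates for the four regression coefficients fitted to each sub-block.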
Remote Sens. 2017, 9, 548

Experiments on Dimensionality Reduction Methods
In these experiments, the dimensionality reduction performance of OKMNF is evaluated in terms of classification results on two real hyperspectral images. Classification accuracies using the features extracted by PCA, KPCA, MNF, KMNF, OMNF, and OKMNF (OKMNF-SSDC, OKMNF-SSDC1, and OKMNF-SSDC2) are compared. Each experiment was run ten times, and the average of the ten runs is reported for comparison.

Experiments on the Indian Pines Image
The experimental dataset was collected by AVIRIS at Indian Pines. The image contains 145 × 145 pixels with a spatial resolution of 20 m, and has 220 spectral bands from 400 nm to 2500 nm. In this experiment, we compare different dimensionality reduction methods on the original image including all 220 bands. It is worth noting that 20 bands covering the water absorption region are very noisy, which allows us to analyze the robustness of the different dimensionality reduction methods to real noise. As shown in Figures 5 and 9, large classes are considered in this experiment. In addition, 25% of the samples are randomly selected for training and the remaining 75% are used for testing [61,62]. The numbers of training and testing samples are listed in Table 2. The first three features extracted by the different dimensionality reduction methods are shown in Figure 8. The overall accuracies of ML classification after the different dimensionality reduction methods are shown in Table 3 and Figure 6. The results of ML classification after the different dimensionality reduction methods (number of features = 5) are shown in Figure 9.

Discussion
This section discusses the performances of noise estimation algorithms, and these results are shown in Section 3.2.In addition, the results of the dimensionality reduction methods are shown in Section 3.3.
Based on the experiment of assessing the performance of noise estimation algorithms adopted in KMNF and OKMNF, it can be seen in Figure 4 that the estimated noise curves through the difference of spatial neighborhood used in KMNF show a strong relationship with land cover types in the scene, and the noise levels are not the same for the two subimages from the same image.There are no such problems when the noise is estimated by OKMNF through SSDC, SSDC 1 and SSDC 2 .We can see that SSDC, SSDC 1 and SSDC 2 are more reliable noise estimation methods than that used in KMNF.Thus, we can adopt SSDC, SSDC 1 and SSDC 2 to estimate noise for OKMNF.
Based on the experiment of assessing the performance of dimensionality reduction methods from Section 3.3.1, it can be seen in Figure 8 that the feature quality of KMNF is worse than other dimensionality reduction methods.OKMNF, by considering SSDC, SSDC 1 or SSDC 2 for noise estimation, outperforms the other dimensionality reduction methods.It can be seen in Table 3, and Figures 6 and 9 that the classification results using transformed data by MNF are not always better than those of PCA on low dimension space.KMNF performs worse than KPCA.By considering the spectral and spatial de-correlation for noise estimation, linear OMNF always performs better than PCA and mostly better than MNF.OKMNF, by considering SSDC, SSDC 1 or SSDC 2 for noise estimation, outperforms the other dimensionality reduction methods (including linear OMNF and kernel MNF), with less sensitivity for parameter settings, as well as better performances for classification.This is because OKMNF not only can treat nonlinear characteristics well within the data but also take into account both spectral and spatial correlations for reliable noise estimation.Moreover, OKMNF-SSDC 1 and OKMNF-SSDC 2 perform better than OKMNF-SSDC.This indicates that, by incorporating more spatial neighbors, we enable better noise estimation, as well as improve the classification performances.
In the dimensionality reduction experiment of Section 3.3.2, Table 5 and Figures 11 and 13 show that PCA, KPCA, MNF, and OMNF perform very similarly, and that all of them perform better than KMNF. Optimizing KMNF through SSDC, SSDC1, and SSDC2 noise estimation greatly improves its performance: OKMNF obtains much better results than KMNF and also performs slightly better than the other four dimensionality reduction methods.
Taken together, the two dimensionality reduction experiments show that: (1) classification accuracy generally increases with the number of features extracted; (2) in many cases KMNF is a poor choice for dimensionality reduction, since the overall accuracies of ML classification after KMNF are lower than those after MNF and the other dimensionality reduction methods; and (3) the proposed OKMNF variants (OKMNF-SSDC, OKMNF-SSDC1, and OKMNF-SSDC2) perform much better than KMNF and mostly better than OMNF and MNF. These results imply that the dimensionality reduction results of KMNF are not well suited to image classification. By exploiting both spectral and spatial information for noise estimation, the proposed OKMNF benefits both dimensionality reduction and its downstream applications (e.g., classification). Compared to linear MNF, the proposed OKMNF not only performs well in dimensionality reduction for classification but also deals better with nonlinear problems.
To compare the efficiency of the feature extraction methods, we took the Indian Pines data as an example. The times consumed in extracting 30 features by OKMNF-SSDC, OKMNF-SSDC1, OKMNF-SSDC2, KPCA, KMNF, OMNF, MNF, and PCA are 23.07 s, 25.27 s, 22.80 s, 1.03 s, 1.26 s, 22.87 s, 0.52 s, and 0.20 s, respectively. The proposed OKMNF methods (OKMNF-SSDC, OKMNF-SSDC1, and OKMNF-SSDC2) take comparatively longer but deliver better dimensionality reduction performance. High performance computing techniques, such as graphics processing units, could be used to reduce the processing time of OKMNF. In real applications, the number of features kept for classification should be chosen with regard to both classification performance and computing cost. Too few features may not provide adequate class separability. On the other hand, more features do not always bring higher classification accuracy, as can be seen from the results listed in Table 3. It is important to use as few features as possible to avoid overfitting and minimize the computational load.
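The reported timings can be put on a common scale with a few lines of arithmetic. The sketch below uses only the figures quoted above. Reading the OMNF-MNF and OKMNF-KMNF gaps as the cost of the SSDC-type noise estimation is an interpretation, not something the timings are broken down to show.

```python
# Reported times (seconds) for extracting 30 features from the Indian Pines data.
times_s = {
    "OKMNF-SSDC": 23.07,
    "OKMNF-SSDC1": 25.27,
    "OKMNF-SSDC2": 22.80,
    "KPCA": 1.03,
    "KMNF": 1.26,
    "OMNF": 22.87,
    "MNF": 0.52,
    "PCA": 0.20,
}

# Cost of each method relative to the conventional KMNF.
relative_to_kmnf = {name: t / times_s["KMNF"] for name, t in times_s.items()}
for name, ratio in sorted(relative_to_kmnf.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} {times_s[name]:6.2f} s  {ratio:5.1f}x KMNF")

# Extra time of the optimized variants over their conventional counterparts.
extra_linear = times_s["OMNF"] - times_s["MNF"]
extra_kernel = times_s["OKMNF-SSDC"] - times_s["KMNF"]
print(f"OMNF - MNF = {extra_linear:.2f} s, OKMNF-SSDC - KMNF = {extra_kernel:.2f} s")
```

The two gaps are of similar size (about 22 s each), which would be consistent with the noise estimation step, rather than the kernel computation, dominating the additional cost.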

Conclusions
This paper proposes an optimized KMNF (OKMNF) for dimensionality reduction of hyperspectral imagery. The main factor limiting the original KMNF in dimensionality reduction is the large error and instability of its noise estimation. We conducted a comparative study of noise estimation algorithms using real images with different land cover types. The experimental results show that combining spatial and spectral correlation information provides better results than algorithms that use only spatial neighborhood information. OKMNF adopts SSDC, SSDC1, and SSDC2 to estimate noise stably from hyperspectral images. Through this optimization, the overall accuracies of ML classification after OKMNF are much higher than those after KMNF, and the dimensionality reduction results of OKMNF are also better than those of OMNF, MNF, KPCA, and PCA in most situations. We conclude that OKMNF addresses the problems of the original KMNF well and improves the quality of dimensionality reduction. Moreover, OKMNF is valuable for reducing the dimensionality of nonlinear data. We also expect that OKMNF will enhance the separability among endmember classes and improve the quality of spectral unmixing. Our future work will focus on validating OKMNF in other applications (e.g., target detection).

Algorithm 2. The proposed OKMNF. Input: hyperspectral image X and m training samples.

Figure 1. Airborne Visible/Infrared Imaging Spectrometer radiance images used for noise estimation, where (a) is the first subimage of Jasper Ridge; (b) is the second subimage of Jasper Ridge; (c) is the first subimage of Low Altitude; (d) is the second subimage of Low Altitude; (e) is the first subimage of Moffett Field; and (f) is the second subimage of Moffett Field.

Figure 2. Noise estimation results of spectral and spatial de-correlation (SSDC) for Figure 1a with different sub-block sizes.

Figure 3. Noise estimation results of SSDC, SSDC1, and SSDC2 for Figure 1a with a 6 × 6 sub-block.

The results of ML classification after different dimensionality reduction methods (number of features = 5) are shown in Figure 9.
Remote Sens. 2017, 9, 548

Figure 5. (a) Original Indian Pines image; (b) ground reference map containing nine land-cover classes.

Figure 6. Comparison of accuracies of maximum likelihood (ML) classification after different dimensionality reduction methods.

Figure 7. Parameter tuning in the experiments using the Indian Pines dataset for ML classification after different feature extraction methods (number of features = 8), where (a) is r versus accuracy; (b) is m versus accuracy; (c) is s versus accuracy.

Figure 9. The results of ML classification after different dimensionality reduction methods (number of features = 5).

Figure 10. (a) True color image of the Minamimaki scene; (b) ground reference map with six classes.

Figure 11. Comparison of accuracies of ML classification after different dimensionality reduction methods.

Figure 12. Parameter tuning in the experiments using the Minamimaki dataset for ML classification after different dimensionality reduction methods (number of features = 8), where (a) is r versus accuracy; (b) is m versus accuracy; (c) is s versus accuracy.

Figure 13. The results of ML classification after different dimensionality reduction methods (number of features = 3).

Table 1. Detailed description of the Airborne Visible/Infrared Imaging Spectrometer images shown in Figure 1.

Table 2. Training and testing samples used in the Indian Pines image.

Table 3. The overall accuracies of maximum likelihood (ML) classification after different dimensionality reduction methods.

Table 4. Training and testing samples used in the Minamimaki scene.

Table 5. The overall accuracies of ML classification after different dimensionality reduction methods.
