An Efficient Clustering Method for Hyperspectral Optimal Band Selection via Shared Nearest Neighbor

A hyperspectral image (HSI) has many bands, which leads to high correlation between adjacent bands, so it is necessary to find representative subsets before further analysis. To address this issue, band selection is considered an effective approach that removes redundant bands from an HSI. Recently, many band selection methods have been proposed, but the majority of them achieve poor accuracy when only a small number of bands is selected, and require multiple iterations, which does not meet the purpose of band selection. Therefore, we propose an efficient clustering method based on shared nearest neighbor (SNNC) for hyperspectral optimal band selection, claiming the following contributions: (1) the local density of each band is obtained by the shared nearest neighbor, which more accurately reflects the local distribution characteristics; (2) to acquire a band subset containing a large amount of information, the information entropy is taken as one of the weight factors; (3) a method for automatically selecting the optimal band subset is designed based on the slope change. The experimental results reveal that, compared with other methods, the proposed method has competitive computational time, and the selected bands achieve higher overall classification accuracy on different data sets, especially when the number of bands is small.


Introduction
Hyperspectral sensors capture many narrow spectral bands at different wavelengths. Although these bands provide more information for image processing, they also bring some issues. For instance, because of the large correlation between adjacent bands, classification or recognition accuracy tends to rise first and then decrease as more bands are used; this is usually called the Hughes phenomenon [1]. Furthermore, since an HSI has many bands, using all bands directly for classification or recognition dramatically increases the time complexity, and the curse of dimensionality easily appears. Hence, it is essential to reduce hyperspectral data. Feature selection has been a hot topic in machine learning [2][3][4] and is viewed as an effective measure for dimensionality reduction in HSI analysis. It removes redundant information while still obtaining satisfactory results in comparison with the raw data. Band selection is a form of feature selection in which a band subset with low correlation and large information content is selected from all hyperspectral bands to represent the entire spectrum. It enables fast subsequent analysis of hyperspectral data, including change detection [5], anomaly detection [6], and classification [7,8].
Recently, many band selection methods have been proposed, and they can be grouped into two main categories: supervised [9][10][11] and unsupervised [12][13][14][15] band selection methods. Supervised methods first divide the data set into training and test samples; then, using available prior knowledge, e.g., ground truth, the labeled samples are used to find the most representative bands. However, such knowledge may not be available in practice, so this paper focuses on unsupervised band selection methods. Unsupervised methods use different criteria to measure a qualified subset of all bands, including information divergence [13], mutual information [14], maximum ellipsoid volume [16], and Euclidean distance [17].
Moreover, the various unsupervised methods can be summarized into three main categories: (1) Linear prediction-based band selection [18]. Methods in this category achieve band selection by searching for the most dissimilar bands. First, two bands are randomly selected from the original bands as the initial subset. Then, given the selected bands, a sequential forward search estimates each remaining candidate band one by one via the least squares method. Finally, the band with the maximum prediction error, i.e., the most dissimilar band, is chosen next, until the desired number of bands is reached.
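As an illustrative aside, the greedy loop described above can be sketched in a few lines. This is a minimal reconstruction of the generic linear-prediction scheme, not the exact implementation of [18]; the function name, random initialization, and demo data are illustrative only.

```python
import numpy as np

def lp_band_selection(X, n_bands, seed=0):
    """Sketch of linear prediction-based band selection.

    X: (pixels, L) matrix, one column per band. Greedily adds the band
    that is worst reconstructed (largest least-squares residual) from
    the bands already selected."""
    rng = np.random.default_rng(seed)
    L = X.shape[1]
    selected = list(rng.choice(L, size=2, replace=False))  # random initial pair
    while len(selected) < n_bands:
        A = X[:, selected]                                 # currently selected bands
        errors = []
        for b in range(L):
            if b in selected:
                errors.append(-np.inf)
                continue
            coef, *_ = np.linalg.lstsq(A, X[:, b], rcond=None)
            errors.append(np.linalg.norm(X[:, b] - A @ coef))
        selected.append(int(np.argmax(errors)))            # most dissimilar band
    return sorted(selected)

# Demo on synthetic data: 100 pixels, 20 bands.
X = np.random.default_rng(1).normal(size=(100, 20))
print(lp_band_selection(X, 5))
```

Note the cost: each added band triggers a least-squares fit per remaining candidate, which is exactly the kind of repeated computation the clustering approach in this paper avoids.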
(2) Orthogonal subspace projection-based band selection [16,19]. These methods project all bands into a subspace and search for the longest projection component as the selected band, which is also considered the most dissimilar to the band subset already selected.
(3) Clustering-based band selection [17,20,21,22]. These methods first build a similarity matrix according to a certain criterion; then, a clustering method is applied to this matrix to achieve band selection. Usually, the selected band is the one closest to the cluster center within each cluster.
However, to obtain different numbers of selected bands, most of these methods require repeated calculations, resulting in a heavy computational load, which is inconsistent with the purpose of band selection. Recently, clustering by density peaks (DPC) was proposed [23], which measures the local density of each point and its minimum distance to other points. It effectively avoids multiple iterations, but it requires setting a cutoff distance during clustering. Jia et al. [24] first introduced DPC into the hyperspectral domain for band selection. Although their method develops a rule that automatically adjusts the cutoff distance when computing local density, it does not achieve good classification performance, mainly because the local density evaluation is inaccurate when the influence of other bands is ignored. Furthermore, when bands with large weights are selected, their information content is not taken into account, causing the loss of key information in the subset.
To address the aforementioned issues, we propose an efficient clustering method based on shared nearest neighbor (SNNC) for hyperspectral optimal band selection. The main contributions are as follows: (a) We consider the similarity between each band and the other bands via the shared nearest neighbor [25]. The shared nearest neighbor accurately reflects the local distribution characteristics of each band in space using the k-nearest neighborhood, which better expresses the local density of each band for band selection. (b) We take information entropy as one of the evaluation indicators. When calculating the weight of each band, the information content of the band is used as one of the weight factors, which retains useful information in a relatively complete way. (c) We design an automatic method to determine the optimal band subset. Through the slope change of the weight curve, the maximum index of the significant critical points is found, which represents the optimal number of clusters for band subset selection.

Datasets Description
In this section, we introduce three aspects in detail. First, we introduce two public HSI data sets. Then, the functions of the proposed framework are specifically explained. Finally, the experimental setup is described from five aspects.
In this work, two public HSI data sets are used to evaluate our method. The first, the Indian Pines image, was collected by the AVIRIS sensor in 1992. It has 224 spectral bands of 145 × 145 pixels, and the image contains 16 classes of interest (see Figure 1a). Some bands are heavily degraded by noise due to water absorption, so we remove these bands and keep 200 bands. The second, the Pavia University image, was acquired by the ROSIS sensor over Pavia. It includes 103 spectral bands of 610 × 340 pixels, and the image contains 9 classes of interest (see Figure 1b).

Proposed Method
In this paper, we propose SNNC to achieve optimal band selection. First, the local density of each band is computed using the k-nearest neighborhood and shared nearest neighbor. Then, following the idea that a cluster center has a large local density, the minimum distance from each band to other higher-density bands is obtained. Next, information entropy is used to calculate the information content of each band, and the product of the three factors (local density, minimum distance, and information entropy) is taken as the band weight; the bands with the largest weights are selected as cluster centers. Lastly, a method for automatically selecting the optimal subset is designed based on the slope change. In the following, we describe the specific functions of the proposed framework in more detail.

Weight Computation
Let X ∈ R^(W×H×L) denote the HSI cube, where L is the number of spectral band images and W and H are the width and height of each band image, respectively. Each spatial band image is stretched into a one-dimensional vector, giving X = [x_1, x_2, ..., x_L], where X ∈ R^(WH×L) and x_i is the vector of the i-th band image. The distance between any two band images is computed by the Euclidean distance:

d(x_i, x_j) = ‖x_i − x_j‖_2. (1)

Using Equation (1), the distances from each band image to the others are arranged in ascending order. Let d_i denote the K-th smallest of these distances; then the K-neighbor set of x_i is defined in accordance with the k-nearest neighborhood as

KNN(x_i) = { x_j | d(x_i, x_j) ≤ d_i, j ≠ i }. (2)

KNN(x_i) reveals the local distribution of the band: the smaller the distances are, the denser the distribution around the band is.
To analyze the degree of similarity between bands in space, the shared nearest neighbor is introduced to describe the relationship between the i-th and j-th bands:

SNN(x_i, x_j) = |KNN(x_i) ∩ KNN(x_j)|, (3)

where SNN(x_i, x_j) is the number of elements shared by the k-nearest neighbor sets of x_i and x_j. The shared neighbor count reflects the distribution characteristics between a band and its neighbors in local space. In general, the larger it is, the closer the distributions of the band and its neighborhood are, which further indicates that they are more likely to belong to the same cluster. Conversely, if the count is small, the probability that the two bands belong to the same cluster is low. Following the above distance and similarity matrices, one of the factors, the local density of each band image, can be defined through the ratio of the two matrices and is expressed as ρ_i. Since the number of bands in an HSI is not large, directly using this raw local density ρ_i for further analysis would cause statistical errors. A common solution is to estimate ρ_i with a Gaussian kernel function, i.e.,

ρ_i = Σ_{x_j ∈ KNN(x_i)} exp( −( d(x_i, x_j) / SNN(x_i, x_j) )² ). (4)

Then, the distance factor σ_i of each band image is measured by the minimum distance to any band image with a higher density [23]; for the band image with the highest density, it is set directly to the maximum distance to the other band images:

σ_i = min_{j: ρ_j > ρ_i} d(x_i, x_j), and σ_i = max_j d(x_i, x_j) if ρ_i is the largest density. (5)

After ρ_i and σ_i have been computed, we can observe that a high-density band image that is far away from other high-density band images is more likely to become a cluster center, which further explains that a cluster center should have large ρ_i and σ_i simultaneously.
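The density and distance factors above can be sketched as follows. This is a minimal Python reconstruction under stated assumptions: the extracted text does not preserve the exact kernel, so the Gaussian form of ρ_i (and the +1 guard against bands sharing no neighbors) is assumed, and the function name is illustrative.

```python
import numpy as np

def snn_density(X, K):
    """Local density rho via shared nearest neighbors, plus the DPC-style
    minimum distance sigma to a denser band.

    X: (pixels, L) matrix, one column per band vector.
    The Gaussian kernel form of rho is an assumed reconstruction."""
    L = X.shape[1]
    # Pairwise Euclidean distances between band vectors, Eq. (1).
    D = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
    # K nearest neighbors of each band (skip position 0: the band itself).
    knn = [set(np.argsort(D[i])[1:K + 1]) for i in range(L)]
    # Shared-nearest-neighbor counts, Eq. (3).
    snn = np.array([[len(knn[i] & knn[j]) for j in range(L)] for i in range(L)])
    # Gaussian-smoothed local density; +1 avoids division by zero when
    # two bands share no neighbors (illustrative guard).
    rho = np.array([sum(np.exp(-(D[i, j] / (snn[i, j] + 1)) ** 2) for j in knn[i])
                    for i in range(L)])
    sigma = np.empty(L)
    for i in range(L):
        denser = np.where(rho > rho[i])[0]          # bands denser than band i
        # Highest-density band: fall back to its maximum distance.
        sigma[i] = D[i, denser].min() if denser.size else D[i].max()
    return rho, sigma
```

A cluster-center candidate is then a band where both returned factors are large at once, matching the discussion above.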
To obtain a subset containing a large amount of information, the information entropy is introduced to measure the information hidden in a stochastic variable. For a band x_i, its information entropy is expressed as

H_i = − Σ_{z ∈ Ω} p(z) log p(z), (6)

where Ω is the gray-scale color space and p(z) represents the probability of a certain gray level appearing in the image, which can be obtained from the gray level histogram. Through the above analysis, it can be found that when all three factors are largest at the same time, a subset with low correlation and a large amount of information is obtained, which conforms to the criteria of band selection. In addition, since the values of the three factors do not belong to the same order of magnitude, they are each normalized to the scale [0, 1], giving the three factors equal weight in the decision. Based on the above analyses, the weight w_i of each band image is expressed as the product of the three factors, i.e.,

w_i = ρ_i × σ_i × H_i. (7)

With respect to Equation (7), the weights w of all band images are sorted in decreasing order, and we only need to select the bands with the first N weights as cluster centers according to the desired band number. In the band selection step, compared with most clustering-based band selection methods, our method does not need multiple iterations; it simply selects the bands with the largest weights as the band subset Y, which greatly reduces processing time. Consequently, SNNC is a quite simple and effective method. The detailed procedure is summarized in Algorithm 1.
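The entropy and weight computation can be sketched as below. This is an illustrative reconstruction: the histogram bin count and logarithm base are assumptions (the base is immaterial once the factors are min-max normalized), and the function names are ours.

```python
import numpy as np

def band_weights(bands, rho, sigma, bins=256):
    """Weight w_i = rho_i * sigma_i * H_i, Eq. (6)-(7) style, with each
    factor min-max normalized to [0, 1].

    bands: (pixels, L) matrix, one flattened band image per column."""
    def entropy(v):
        hist, _ = np.histogram(v, bins=bins)     # gray level histogram
        p = hist / hist.sum()
        p = p[p > 0]                             # drop empty gray levels
        return float(-(p * np.log2(p)).sum())

    H = np.array([entropy(bands[:, i]) for i in range(bands.shape[1])])

    def minmax(a):
        span = a.max() - a.min()
        return (a - a.min()) / span if span > 0 else np.ones_like(a)

    # Product of the three equally weighted factors.
    return minmax(rho) * minmax(sigma) * minmax(H)
```

Selecting a band subset then reduces to taking the indices of the N largest weights, with no iterative refinement.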

Algorithm 1 Framework of SNNC
Input: HSI cube X ∈ R^(W×H×L), the number of selected bands N, the k-nearest neighborhood parameter K.
Output: The selected subset Y.
1: Stretch each spatial band image into a one-dimensional vector x_i.
2: Get the local density ρ_i of each band by calculating the k-nearest neighborhood and shared nearest neighbors.
3: Obtain the minimum distance σ_i from each band to other higher-density bands.
4: Compute the information entropy H_i in accordance with the gray level histogram.
5: Normalize the three factors to the scale [0, 1], respectively, and select the first N bands as the subset Y according to the product of the three factors for each band.
6: return Y.

Optimal Band Selection
To acquire the optimal number of bands, we use the weights of all bands. For ease of description, each band weight is treated as a point. We can observe that the band weight changes greatly at some points at the beginning; as the number of bands increases, the band weight becomes small, and the slope change between two adjacent points approaches zero. Such points are regarded as critical points p, i.e.,

p = { i | |k_i − k_{i+1}| > γ }, (8)

where k_i is the slope between point i and point i + 1, and γ is the average of the slope differences between adjacent points, defined as

γ = q(L − 2) / (L − 2), (9)

where q(j) is the cumulative sum of the slope differences between adjacent points, i.e.,

q(j) = Σ_{i=1}^{j} |k_i − k_{i+1}|. (10)

According to the above equations, we can obtain many critical points, which generally have larger weights. For band selection, a small number of bands cannot provide high accuracy because too much information is lost. In addition, if an HSI data set contains many classes and the scene is complicated, a basic rule is that more bands need to be selected to perform well in practice. Therefore, we choose the maximum index among the critical points as the optimal number N, i.e.,

N = max(p). (11)

Finally, the desired band subset is obtained from the original bands according to the band indices represented by the selected cluster centers, achieving optimal band selection.
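The critical-point rule can be sketched as follows. Since the equations were garbled in extraction, this is an assumed reconstruction of the rule: a point is critical when its slope change exceeds the average slope change, and N is the largest critical index. The function name and the 1-based index convention are ours.

```python
import numpy as np

def optimal_band_number(w):
    """Assumed reconstruction of the slope-change rule, Eq. (8)-(11):
    pick N as the largest index whose slope change on the sorted weight
    curve exceeds the average slope change gamma."""
    w = np.sort(np.asarray(w, dtype=float))[::-1]  # weights in decreasing order
    k = np.diff(w)                                 # slope k_i between adjacent points
    q = np.abs(np.diff(k))                         # slope change |k_i - k_{i+1}|
    gamma = q.mean()                               # average slope change
    critical = np.where(q > gamma)[0] + 1          # critical points (1-based)
    return int(critical.max()) if critical.size else 1
```

On a weight curve that drops sharply for the first few bands and then flattens, the rule returns the index where the curve stops changing significantly, which is the behavior Figure 4 illustrates.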

Computational Complexity Analysis
In this section, the time complexity of the proposed method is analyzed. To obtain the number of elements in the shared neighbor space, the distance and shared nearest neighbor matrices need to be calculated; this costs O(2L²) for a hyperspectral data set with L bands. Moreover, when the k-nearest neighborhood parameter is K, the time complexities of the local density ρ_i, minimum distance σ_i, and information entropy H_i are O(LK), O(L), and O(LWH), respectively. Considering K < L ≪ WH, the weight computation costs approximately O(LWH) time. For optimal band selection, the calculation of the critical points takes O(L) time. In summary, the total time complexity of the proposed method is about O(LWH), which reveals that the proposed method has low time complexity in theory, since it achieves clustering by a ranking approach and avoids multiple iterations.
(b) Classification Setting: Two well-known classifiers, k-nearest neighborhood (KNN) and support vector machine (SVM), are used to evaluate the classification performance of the band selection methods. KNN is one of the simplest classifiers and does not depend on any assumption about the data distribution; its only parameter, the number of neighbors, is set to 5 in the experiments. The SVM is a discriminant classifier defined by a separating hyperplane and is often used in image classification; it is implemented with an RBF kernel, and the penalty C and gamma are optimized via five-fold cross validation. Additionally, 10% of the samples from each class are randomly chosen as the training set, and the remaining 90% are used for testing. The sample images and corresponding ground truth maps for the two data sets are shown in Figure 1.
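This evaluation protocol can be sketched with scikit-learn. It is a minimal sketch, not the paper's code: the parameter grid for C and gamma and the function name are assumptions; only the 10% stratified split, KNN with 5 neighbors, and the five-fold cross-validated RBF SVM come from the text above.

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def evaluate_band_subset(pixels, labels, bands, seed=0):
    """Evaluate a selected band subset: 10% stratified training split,
    KNN (5 neighbors) and an RBF SVM with C/gamma tuned by 5-fold CV.
    Returns the two overall accuracies on the 90% test split."""
    X = pixels[:, bands]                      # keep only the selected bands
    Xtr, Xte, ytr, yte = train_test_split(
        X, labels, train_size=0.1, stratify=labels, random_state=seed)
    knn = KNeighborsClassifier(n_neighbors=5).fit(Xtr, ytr)
    svm = GridSearchCV(SVC(kernel='rbf'),
                       {'C': [1, 10, 100],            # illustrative grid
                        'gamma': ['scale', 0.1, 1]},
                       cv=5).fit(Xtr, ytr)
    return knn.score(Xte, yte), svm.score(Xte, yte)
```

With the HSI cube flattened to a (pixels, bands) matrix and labeled pixels extracted from the ground truth, the same call evaluates any method's selected subset.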
(c) Number of Required Bands: For the two public HSI data sets, to examine the influence of the number of selected bands on classification accuracy, the number of bands ranges from 5 to 60 for all algorithms.
(d) Accuracy Measures: Three measures are used to evaluate classification performance: overall accuracy (OA), average overall accuracy (AOA), and the kappa coefficient.
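For reference, OA and kappa have standard definitions that can be computed directly; AOA is simply OA averaged over the different numbers of selected bands. A small sketch (function names are ours):

```python
import numpy as np

def overall_accuracy(y_true, y_pred):
    """OA: fraction of correctly classified samples."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def kappa(y_true, y_pred):
    """Cohen's kappa: agreement beyond chance, from the confusion matrix."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(np.concatenate([y_true, y_pred]))
    idx = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)))
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1
    n = cm.sum()
    po = np.trace(cm) / n                     # observed agreement
    pe = (cm.sum(0) @ cm.sum(1)) / n ** 2     # chance agreement
    return float((po - pe) / (1 - pe))
```

For example, with y_true = [0, 0, 1, 1] and y_pred = [0, 0, 1, 0], OA is 0.75 while kappa is 0.5, showing how kappa discounts chance agreement.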
(e) Runtime Environment: The experiments are performed on an Intel Core i5-3470 CPU with 16 GB RAM, and all methods are implemented in MATLAB 2016a.

Results
In this section, comparative tests are conducted to assess the effectiveness of the proposed band selection method. First, we analyze the influence of the parameter K. Then, several methods are compared in terms of classification performance, number of recommended bands, and processing time.

Parameter K Analysis
In the proposed band selection method, before solving the shared nearest neighbor matrix, the distances between each band image and the other band images are calculated, and the first K distances are obtained in ascending order. To analyze the influence of the parameter K, we select different numbers of bands and set different values of K in the experiment. For simplicity, only the Indian Pines data set is analyzed, illustrating the classification results of the KNN and SVM classifiers by OA (see Figure 2). Overall, the results show that K = 3 performs best across the different numbers of bands. Furthermore, the standard deviation shows that when K = 3, the OA fluctuates least compared with the other settings (the standard deviations for K = 3 with the KNN and SVM classifiers are 6.81 × 10⁻⁵ and 7.32 × 10⁻⁴, respectively). Based on these facts, and by the analogous analysis for the Pavia University data set, we empirically set K = 3 for Indian Pines and K = 5 for Pavia University in the following experiments, without further parameter tuning.

Classification Performance Comparison
In this section, we compare the proposed method with some state-of-the-art algorithms using the three accuracy criteria. For the Indian Pines data set, as can be seen in Table 1 and Figure 3, SNNC is clearly superior to the other methods in AOA and kappa, especially when the number of bands is small. Among all methods, the bands selected by OPBS and WaLuDi obtain approximately the same result at every number of selected bands (Figure 3a,b), which agrees with their AOA and kappa results. Moreover, RMBS fluctuates greatly for small numbers of selected bands, owing to its selection of adjacent bands with high correlation. As for E-FDPC, its result is close to the accuracy of the proposed method, but SNNC classifies the pixels more precisely, especially when the number of bands is less than 7. For the KNN classifier, it is worth pointing out that the AOA using all bands is lower than that of some methods in Table 1, because certain noise remains in this data set even after some bands are removed.
For the Pavia University data set, the proposed SNNC method also outperforms the other methods in each dimension for both classifiers, as shown in Table 1 and Figure 3. At each number of selected bands (see Figure 3c,d), the differences among results are not obvious, and most competitors achieve a satisfying performance, as also illustrated in Table 1. Nevertheless, Figure 3c,d shows that the results of E-FDPC and WaLuDi are poor and lower than the OA of all bands, even when the number of bands exceeds 50. Compared with its behavior on the Indian Pines data set, OPBS obtains a higher classification result here, which indicates that OPBS is not robust across different data sets. Moreover, UBS performs well, but its accuracy is slightly lower than that of the proposed method.
From the experiments on the two HSI data sets, some crucial conclusions can be drawn. SNNC attains better and more stable performance across different data sets in general, indicating that our method is robust for band selection. Furthermore, SNNC still achieves higher OA when only a small number of bands is taken.

Number of Recommended Bands
With respect to the optimal band number, Equation (11) automatically gives a promising estimate N of the suitable number of selected bands. For the Indian Pines data set, this can be observed from two aspects. On the one hand, as the number of bands increases, the OA increases rapidly at the beginning (see Figure 3); when the number of selected bands reaches the estimated N, the OA tends to be stable. On the other hand, Figure 4 also illustrates this point: the slope of the band weight curve changes greatly at the critical points, as shown by the horizontal dashed line. In sum, the recommended number of selected bands is roughly 11, which is exactly consistent with the optimal number given by Equation (11). Accordingly, the recommended band number for the Pavia University data set is 7. To further illustrate the quality of the recommended band number, we calculate the average correlation coefficient (ACC) of the recommended bands on the two data sets. In general, the smaller the correlation between the bands, the lower the redundancy of the selected subset. Tables 2 and 3 show the results on the two data sets. When the number of selected bands is 11, the ACC of OPBS is lower than that of the other methods on the Indian Pines data set (see Table 2), because it selects some bands with low correlation. However, Table 3 reveals that the ACC of OPBS is relatively large on the Pavia University data set, indicating that OPBS is not robust on both data sets. Viewed across the two tables, there is no significant difference in the ACC of the selected bands; nevertheless, SNNC obtains low band correlation among all the methods, which is the main reason for its better classification performance (see Table 1). Thus, SNNC effectively avoids selecting adjacent bands, and its classification results are superior to those of the other methods when selecting the same number of bands.
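The ACC used above is the mean pairwise correlation of the selected bands; a minimal sketch (the function name is ours, and we take absolute correlations so that strongly anti-correlated bands also count as redundant):

```python
import numpy as np

def average_correlation(bands):
    """Average absolute Pearson correlation over all distinct pairs of
    selected bands (columns); lower values mean less redundancy."""
    R = np.abs(np.corrcoef(bands, rowvar=False))   # (N, N) correlation matrix
    N = R.shape[1]
    iu = np.triu_indices(N, k=1)                   # upper triangle, no diagonal
    return float(R[iu].mean())
```

Applying this to the columns of the flattened HSI matrix that correspond to a method's selected bands reproduces the kind of comparison shown in Tables 2 and 3.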

Figure 1. Sample image and corresponding ground truth map. (a) Indian Pines data set. (b) Pavia University data set.

Figure 2. OA for different parameters K with different numbers of selected bands on the Indian Pines data set. (a,b) OA by the KNN and SVM classifiers, respectively.

Figure 3. OA for several band selection methods with different numbers of selected bands on the two data sets. (a,b) OA by KNN and SVM on the Indian Pines data set, respectively. (c,d) OA by KNN and SVM on the Pavia University data set, respectively.

Figure 4. Band weight curve. (a) Band weight curve on the Indian Pines data set. (b) Band weight curve on the Pavia University data set.

Table 1. AOA and kappa on the two data sets for different band selection methods.

Table 2. Number of recommended bands on the Indian Pines data set.

Table 3. Number of recommended bands on the Pavia University data set.

Table 4 reveals the time taken by different methods to select a certain number of bands on the two data sets. As observed, the computation time required by WaLuDi and RMBS is larger than other