1. Introduction
Hyperspectral remote sensing images (HSIs) contain rich spectral and spatial information about ground objects. Thus, they are widely used in land cover classification [1,2], environmental protection [3], mineral exploration [4], precision agriculture [5], and other fields [6,7]. Among these applications, hyperspectral classification is important because it can identify different materials with subtle spectral divergences. However, hyperspectral image cubes contain much redundant information because of the strong correlations among adjacent bands, which causes the Hughes phenomenon [8]. This may deteriorate the performance of hyperspectral classification [9]. Therefore, reducing the dimensionality of HSIs before classification is necessary.
The dimensionality reduction techniques for HSIs mainly include feature extraction [10,11,12] and band selection [13,14,15,16,17,18,19,20]. Compared with feature extraction, which maps high-dimensional data into a low-dimensional space, band selection selects a subset of representative bands from all bands and thus retains the original physical meaning of HSIs [9]. According to whether labeled samples are available, band selection methods can be classified as supervised [13], semi-supervised [14], and unsupervised [15,16,17,18,19,20]. Due to the high cost of obtaining labeled samples, supervised and semi-supervised methods are difficult to apply in practice. In contrast, unsupervised methods do not require labeled samples and are thus better suited to real tasks [18].
Unsupervised band selection methods can be further categorized into ranking-based, sparsity-based, searching-based, and clustering-based methods. Ranking-based methods, such as maximum-variance principal component analysis (MVPCA) [16] and the manifold ranking-based method [21], assign a weight to each band according to some criterion and then select the top-ranked bands. Because ranking-based methods neglect the correlations between bands, the selected bands may contain considerable redundancy [22]. Sparsity-based methods, such as sparse non-negative matrix factorization [23] and low-rank representation [24], select proper bands by fully exploiting the sparse representation of hyperspectral bands. However, the global structure of HSIs is hard to capture effectively in the learned sparse coefficient matrix, which limits the effectiveness of sparsity-based band selection [25]. Searching-based methods, such as the firefly algorithm [26] and the particle swarm optimization algorithm [27], select representative bands by optimizing a given criterion, but the computational complexity of their nonlinear search process is relatively high [28]. Clustering-based methods have attracted much attention due to the low redundancy of their selected bands and their high classification accuracy [15]. These methods, such as Ward's linkage strategy using divergence (WaLuDi) [17], normalized cut-based optimal clustering with a ranking criterion using information entropy (NC-OC-IE) [18], and the adaptive subspace partition strategy (ASPS) [19], first divide all bands into multiple clusters and then select the most representative band in each cluster. Recently, Zeng et al. [29] proposed deep subspace clustering (DSC), which combines subspace clustering with a convolutional autoencoder to obtain more accurate clustering results. Although clustering-based methods have improved the effectiveness of band selection to some extent, they also have shortcomings in the clustering process. For example, most of them use a single clustering algorithm, which may be sensitive to the randomly chosen initial centroids; thus, the effectiveness and robustness of the clustering results cannot be guaranteed for high-dimensional data [15,16,17]. In addition, most clustering-based methods neglect the problem-dependent information of band selection during clustering.
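The single-clustering pipeline shared by these methods can be illustrated with a minimal, hypothetical sketch (our own illustration, not any specific published algorithm): bands are treated as samples, partitioned by plain K-means, and the band nearest each centroid is kept as the representative. The sensitivity to randomly chosen initial centroids discussed above corresponds to the `seed` argument here.

```python
# Minimal sketch of clustering-based band selection (illustrative only):
# treat each band as one sample, cluster bands with plain K-means, then
# keep the band closest to each centroid as the cluster representative.
import numpy as np

def cluster_select_bands(cube, k, seed=0):
    """cube: (H, W, B) hyperspectral image; returns up to k band indices."""
    H, W, B = cube.shape
    bands = cube.reshape(-1, B).T                    # (B, H*W): one row per band
    rng = np.random.default_rng(seed)
    centroids = bands[rng.choice(B, k, replace=False)]
    for _ in range(50):                              # plain Lloyd iterations
        d = np.linalg.norm(bands[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([bands[labels == c].mean(axis=0) if np.any(labels == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # per cluster, pick the band nearest its centroid
    selected = []
    for c in range(k):
        idx = np.where(labels == c)[0]
        if idx.size:
            selected.append(idx[np.linalg.norm(bands[idx] - centroids[c],
                                               axis=1).argmin()])
    return sorted(selected)
```

Because the initial centroids are drawn at random, different seeds can yield different band subsets, which is exactly the instability that motivates the ensemble approach below.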
In recent years, ensemble clustering has attracted extensive attention because it can combine multiple base partitions into a more effective clustering. Moreover, ensemble clustering has shown advantages in generating robust partitions, dealing with noisy features, and mining novel structures [30,31]. Generally, ensemble clustering methods can be classified into objective function-based and heuristic-based methods [30]. Objective function-based methods treat the similarity measures between partitions as an explicit global objective for designing an effective consensus function; representative methods include combination regularization [32] and the K-means-like algorithm [33]. In contrast, heuristic-based methods, such as voting-based [34] and co-association matrix-based methods [31], employ heuristics instead of objective functions to search for approximate solutions. For example, Huang et al. [31] recently proposed a locally weighted ensemble clustering method, which estimates the uncertainty of each cluster in all base clusterings via entropy theory and improves the consensus clustering results by exploiting a locally weighted strategy in the consensus function. Although existing ensemble clustering methods have significantly improved clustering performance, they have not been fully tested on band selection tasks. In addition, they were developed without considering the inherent characteristics of HSIs. Therefore, introducing problem-dependent information of hyperspectral band selection into the clustering strategy and designing an effective consensus function for generating superior consensus clustering results remain challenging.
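As background for the co-association matrix-based family mentioned above, the following sketch (our own illustration, not code from any cited method) builds the standard co-association matrix, whose entry (i, j) is the fraction of base partitions that place samples i and j in the same cluster.

```python
# Standard co-association matrix used by many heuristic ensemble-clustering
# methods: average, over all base partitions, the indicator of whether two
# samples share a cluster.
import numpy as np

def co_association(partitions):
    """partitions: list of 1-D label arrays over the same N samples."""
    partitions = [np.asarray(p) for p in partitions]
    n = partitions[0].size
    ca = np.zeros((n, n))
    for labels in partitions:
        ca += (labels[:, None] == labels[None, :]).astype(float)
    return ca / len(partitions)
```

A consensus partition is then typically obtained by treating 1 − CA as a distance matrix for agglomerative clustering, which is the step the proposed method modifies.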
Aiming to select more representative bands from HSIs by improving clustering accuracy, in this paper we propose a novel correlation-guided ensemble clustering (CGEC) approach for hyperspectral band selection. By exploiting ensemble clustering, more effective clustering results are expected from multiple band partitions obtained by K-means with different parameter settings. In practice, adjacent bands are strongly correlated because of the spectral continuity of HSIs [19]. To exploit this property in ensemble clustering, the proposed CGEC approach incorporates the similarity relationship between adjacent bands into the design of its consensus function. Consequently, the clustering results yielded by CGEC better satisfy the needs of band selection. Specifically, multiple initial base clustering results are first obtained by running K-means with diverse parameters. Then, a novel consensus function is proposed to generate consensus clustering results under the assumption that adjacent bands are most probably located in the same cluster [18]. Finally, the target bands are obtained by an improved manifold ranking method that selects a representative band from each cluster. In our experiments, the proposed approach is compared with seven representative competitors on three real hyperspectral datasets, and the results show the superiority of our proposed method.
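The paper's improved manifold ranking is not reproduced here; as background only, a generic manifold ranking score in the style of graph-based ranking (not the authors' modified variant) solves the closed form f = (I − αS)⁻¹(1 − α)y, where S is the symmetrically normalized affinity matrix and α is a balance parameter.

```python
# Background sketch of generic (Zhou et al.-style) manifold ranking,
# NOT the paper's improved variant: scores solve
#   f = (I - alpha * S)^(-1) (1 - alpha) * y,
# with S = D^(-1/2) W D^(-1/2) the normalized affinity.
import numpy as np

def manifold_rank(W, y, alpha=0.99):
    """W: (n, n) symmetric non-negative affinity; y: (n,) query vector."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * S, (1.0 - alpha) * y)
```

Here α plays the role of a balance parameter (the experiments later set such a parameter to 0.99); larger α propagates more score mass along the similarity graph.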
3. Results
3.1. Datasets
In our experiments, the three benchmark datasets listed in Table 1 and displayed in Figure 4 were chosen to test the performance of the proposed approach according to classification accuracy criteria.
1. Pavia University dataset: The Pavia University dataset is part of a hyperspectral image of the Italian city of Pavia acquired in 2002 by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor. The image has 610 × 340 pixels and 115 bands with wavelengths ranging from 0.43 µm to 0.86 µm, and its spatial resolution is 1.3 m. In our experiments, 12 bands were eliminated because of noise, and the image made up of the remaining 103 spectral bands was used. There are 9 classes in this image.
2. Botswana dataset: The Botswana dataset was acquired by the NASA EO-1 satellite over the Okavango Delta, Botswana, in 2001. The original Botswana image has 242 bands covering wavelengths from 0.4 µm to 2.5 µm, and its spatial resolution is 30 m. After some uncalibrated and noisy bands were removed, the remaining 145 spectral bands were used in this study. The adopted image has 1476 × 256 pixels and 14 classes.
3. Pavia Centre dataset: The Pavia Centre dataset was obtained by the ROSIS sensor during a flight campaign over Pavia in northern Italy. Thus, it has the same spectral and spatial resolution as the first dataset. In our experiments, the noisy bands were removed, and the remaining 102 bands were used. This image has 1096 × 715 pixels belonging to nine different classes.
3.2. Comparison Methods
Seven representative band selection methods, briefly introduced in this section, were used as baselines to verify the effectiveness of the proposed method.
1. E-FDPC [15]: Enhanced fast density-peak-based clustering (E-FDPC) is a clustering-based method that improves the fast density peak-based clustering [38] algorithm by weighting the local density and the within-cluster distance. In addition, it adopts an exponential-based learning rule to control the number of selected bands.
2. ASPS [19]: ASPS is also a clustering-based method for band selection. It first divides the HSI cube into several subcubes by maximizing the ratio of the intercluster distance to the intracluster distance and then estimates the band noise in each subcube. The band containing the minimum noise in each cluster is selected as the target band.
3. WaLuDi [17]: This method uses hierarchical clustering to select representative bands. The Kullback–Leibler divergence is employed in the clustering procedure to measure the dissimilarity among bands.
4. NC-OC-IE [18]: This clustering-based method adopts an optimal clustering framework (OCF) to search for the optimal clustering structure of HSIs. First, an objective function based on a normalized cut criterion is designed. Then, the best band partition is obtained using OCF. Next, the importance of all bands is evaluated using an information entropy-based criterion. Finally, the target bands are found by selecting the highest-ranked band in each cluster.
5. MVPCA [16]: MVPCA is a ranking-based method that evaluates band priority by constructing a data-sample covariance matrix. All bands are ranked according to this matrix, and the band subset is obtained by selecting the top-ranked bands.
6. LWEA [31]: LWEA is an ensemble clustering method based on hierarchical agglomerative clustering. It takes a similarity matrix as input and iteratively merges the two clusters with the maximum similarity.
7. DSC [29]: DSC is a clustering-based band selection approach that exploits a convolutional autoencoder and deep subspace clustering to obtain the clustering results. The final band subset is obtained by selecting, in each cluster, the band closest to its cluster center.
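To make the ranking-based idea concrete, the following simplified sketch scores each band by its variance and keeps the top-k. This is a stand-in for the spirit of MVPCA only, not its exact loading-factor construction from the covariance matrix.

```python
# Simplified variance-based band ranking (in the spirit of ranking-based
# methods such as MVPCA, but NOT its exact criterion): score each band
# by its variance across all pixels and keep the k highest-scoring bands.
import numpy as np

def rank_bands_by_variance(cube, k):
    """cube: (H, W, B) hyperspectral image; returns k band indices."""
    B = cube.shape[-1]
    var = cube.reshape(-1, B).var(axis=0)        # per-band variance
    return sorted(np.argsort(var)[::-1][:k].tolist())
```

As noted in the introduction, such a ranking ignores inter-band correlation, so two highly correlated high-variance bands can both be selected, which is the redundancy problem the clustering-based methods address.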
3.3. Experimental Setup
1. Classification setting
Support vector machine (SVM) [39] and K-nearest neighbor (KNN) [40] classifiers were used to test the classification accuracy of the different band selection methods. We randomly selected 20% of the samples as the training set and used the rest as the test set. Each method was run 10 times, and the average performance was reported. To test the influence of the number of bands on overall accuracy, we conducted experiments in the range of 5–50 bands. In addition, following [31], 10 base clusterings were randomly produced by running the K-means algorithm with different numbers of clusters L, where L ranges from 2 to an upper bound determined by the number of bands N in the HSI. The balance parameter in Equation (19) is set to 0.99 according to [37].
2. Accuracy measures
Three accuracy criteria are used to analyze the accuracy of the classified pixels. These criteria are overall accuracy (OA), average overall accuracy (AOA), and Kappa coefficient (Kappa).
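These criteria can be computed directly from the predicted and true labels. The sketch below (our own illustration) implements OA and Kappa from a confusion matrix; AOA, as used in this paper, is OA averaged over the tested band numbers, so it follows by averaging `overall_accuracy` across runs.

```python
# OA (fraction of correctly classified pixels) and Cohen's Kappa
# (agreement corrected for chance), computed from label vectors.
import numpy as np

def overall_accuracy(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def kappa(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(np.concatenate([y_true, y_pred]))
    n = y_true.size
    cm = np.zeros((classes.size, classes.size))
    for t, p in zip(y_true, y_pred):
        cm[np.searchsorted(classes, t), np.searchsorted(classes, p)] += 1
    po = np.trace(cm) / n                        # observed agreement
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / n**2  # chance agreement
    return float((po - pe) / (1.0 - pe))
```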
3.4. Experimental Results
To test the performance of the proposed approach, two classifiers were adopted to analyze three hyperspectral datasets. The average performance comparison of all the methods for different numbers of bands (the range of 5–50) is reported in
Table 2, where the classification performance of the SVM and KNN classifiers is indicated by the AOA and Kappa. Each row represents the classification accuracy of a specified classifier for the target dataset using the bands given by different methods. The values in red bold and blue italic fonts denote the best and second-best results, respectively.
Table 2 shows that the superiority of CGEC is evident in comparison with the other band selection approaches. In particular, when using the SVM classifier on the Pavia Centre dataset, our method achieves improvements of 2.84% in AOA and 2.85% in Kappa compared with LWEA, which had the second-best performance with the SVM classifier on all three datasets. DSC obtained the second-best results with the KNN classifier on the Pavia University and Pavia Centre datasets, while NC-OC-IE was second best on the Botswana dataset.
To further demonstrate the performance of all methods, the classification results for each class on the three datasets using 30 selected bands are listed in Table 3, Table 4 and Table 5. Values in red bold and blue italic fonts denote the best and second-best results, respectively. Clearly, our proposed method performs best or second best for most classes on all three datasets. Some methods are slightly unstable: for example, LWEA performs better on the Botswana dataset but slightly worse on the other datasets, and DSC likewise performs better on the Botswana dataset than on the others. This shows the effectiveness and stability of our method on the three datasets. In addition, Figure 5, Figure 6 and Figure 7 illustrate the OA values of all eight methods on the three datasets when using the SVM and KNN classifiers. More detailed analyses are given as follows.
Pavia University dataset. For this dataset, Figure 5a,b show the OA results of the SVM and KNN classifiers using the bands given by all methods, with the number of selected bands ranging from 5 to 50. Figure 5a clearly shows that, with the SVM classifier, CGEC outperforms the other algorithms for most numbers of selected bands. More specifically, when the number of selected bands is 10, 15, or 25, our method surpasses the others and achieves satisfactory classification accuracy. It is worth noting that data redundancy is substantially reduced by band selection: more than 90% of the redundant bands in the original dataset are removed. When the number of selected bands exceeds 30, our method still performs best, and the OA values of LWEA, DSC, ASPS, and NC-OC-IE are close to each other; these four methods outperform the remaining ones. In Figure 5b, with the KNN classifier, the OA value of CGEC is the best when using 15, 20, 30, 35, and 50 bands. For the other band numbers, although our method is slightly below NC-OC-IE or MVPCA, it outperforms the remaining approaches.
Botswana dataset. Similar to the Pavia University dataset, Figure 6a,b illustrate the OA results of classification using SVM and KNN, respectively. Figure 6a shows that the OA value of CGEC is the highest except when the number of selected bands is 35, where our method achieves the second-best performance. In Figure 6b, our method shows significant superiority when the number of selected bands is 5, 10, 15, or 25. For the other band numbers, the OA values of CGEC, LWEA, ASPS, WaLuDi, and NC-OC-IE are close to each other and exceed those of the remaining methods. In general, the effectiveness of our method is verified.
Pavia Centre dataset. For this dataset, the advantage of our approach is more apparent with the SVM classifier, as shown in Figure 7a. When the number of selected bands is 5, our method is significantly better than the others; remarkably, it achieves a satisfactory result with only 5% of the bands of the dataset. When the number of selected bands exceeds 5, our method also performs very well. In Figure 7b, with KNN, the differences in OA values are not obvious, and all methods attain satisfactory results except MVPCA.
In addition, Figure 8, Figure 9 and Figure 10 compare the classification maps with the ground truth using 30 selected bands. These maps indicate that our method provides satisfactory results even though 79% of the bands of the Botswana dataset and about 70% of the bands of the Pavia Centre and Pavia University datasets are removed. Therefore, our method can remove a large amount of redundant information while maintaining good classification accuracy.
4. Discussion
Band selection is an important dimensionality reduction technique for hyperspectral classification. Aiming to select more representative bands from HSIs by enhancing clustering accuracy, in this paper we improved LWEA, a recently proposed ensemble clustering method, to tackle the band selection problem. The original LWEA achieves good clustering performance through agglomerative clustering. However, it merges clusters only by finding the two clusters with the maximum similarity among all obtained clusters, without considering the characteristics of HSIs. Moreover, the similarity measurement between two clusters used in LWEA weights the similarities among the data samples in the two clusters equally, which may not meet the needs of band selection. Our experimental results demonstrate that these issues limit the performance of the algorithm for band selection. Based on the assumption that adjacent bands in HSIs are highly correlated and thus are most probably located in the same cluster, we proposed CGEC by improving the cluster merging procedure of LWEA, making full use of the similarity relationship between adjacent bands to generate effective ensemble clustering results. Moreover, based on the clustering results provided by CGEC, our modified manifold ranking method contributes to selecting more representative bands. To the best of our knowledge, this is the first time that ensemble clustering has been applied to the band selection of HSIs. The experimental results presented in the previous section demonstrate that ensemble clustering is more effective for band selection than single clustering-based band selection methods and LWEA. In addition, exploiting the similarity relationship between adjacent bands in the design of the consensus function effectively enhances the performance of ensemble clustering for band selection.
According to the experimental results, the clustering-based methods achieved better performance than the ranking-based method (i.e., MVPCA) because the bands selected by MVPCA have higher redundancy. This finding is consistent with a previous study [41]. In contrast, clustering-based methods can remove redundant bands by selecting a representative band from each cluster. Compared with representative clustering-based methods (i.e., E-FDPC, NC-OC-IE, ASPS, WaLuDi, DSC, and LWEA), our proposed CGEC performs remarkably well due to its use of ensemble clustering guided by the band correlation property of HSIs. A clear explanation is that the clustering results given by CGEC meet the needs of band selection, which helps to select more representative bands. Thus, we believe that our method can provide more effective clustering results on HSIs in which each band is strongly correlated with its adjacent bands.
The experimental results also demonstrate that the number of selected bands has a significant influence on classification performance. As shown in Figure 5, Figure 6 and Figure 7, a general phenomenon for all methods is that the OA values rise rapidly as the number of bands increases, but beyond a certain number of bands the increase becomes very slight or the values even decrease. In accordance with our results, a previous study [42] demonstrated that the best performance is not always obtained by the band subset with the most bands, because more bands bring more redundancy. Consequently, selecting more bands does not guarantee better classification accuracy, whereas a reasonable number of bands achieves the best performance.
We must point out that our study neglects noise interference in HSIs; thus, the proposed CGEC may choose noisy bands, which would degrade classification performance. In addition, base clustering in our method is carried out on the original high-dimensional data, so the quality of the base clusterings is limited, which affects the effectiveness of the ensemble clustering result. In future studies, we will explore strategies for generating more effective base clusterings based on representation learning to further improve the band selection performance of ensemble clustering.
5. Conclusions
In this paper, we proposed a correlation-guided ensemble clustering approach for hyperspectral band selection. By adopting ensemble clustering, a more accurate band partition can be obtained than with single clustering methods. With the help of a proposed consensus function designed on the assumption that adjacent bands are most probably located in the same cluster, the clustering results of the proposed method better satisfy the needs of band selection. In addition, our approach employs an improved manifold ranking algorithm to select a more representative band subset from the final band partition. Extensive experiments on three real hyperspectral datasets indicate that the proposed method is superior to the other competitors. For the sake of clarity, the main conclusions of this paper are as follows.
An ensemble clustering-based approach is proposed to select representative bands for hyperspectral classification. To the best of our knowledge, this is the first time that ensemble clustering has been applied to the band selection of HSIs. The proposed approach consists of two stages, i.e., ensemble clustering and manifold ranking. The ensemble clustering stage is designed to improve the effectiveness of clustering, whereas the manifold ranking stage is exploited to select a representative band from each cluster. Consequently, the chosen band subset has good distinguishability and high representativeness for classification tasks.
In addition, we proposed a novel consensus function for generating consensus clustering results via agglomerative clustering. By utilizing the fact that adjacent bands are highly likely to be located in the same cluster, the proposed consensus function can simultaneously exploit the problem-dependent information of band selection and the power of ensemble clustering, so that the results of ensemble clustering better satisfy the needs of band selection.
To verify the effectiveness of the proposed method, we conducted extensive experiments on three real HSI datasets. The experimental results of our method were compared with those of seven representative methods, demonstrating the superiority of our proposed method.