1. Introduction
Hyperspectral images (HSIs) record hundreds of contiguous narrow spectral bands in each pixel and are collected by aircraft, satellites, and unmanned aerial vehicles [1,2,3,4]. Since HSIs offer rich spectral and spatial resolution, they make it possible to discriminate more detailed classes and support a broad range of land-cover classification and clustering applications [5,6,7,8]. At the same time, HSIs are difficult to process: the numerous spectral bands significantly increase the computational complexity, and the noise in HSIs can severely degrade classification accuracy [9,10]. The existing work reported by most scholars can be roughly divided into two categories according to whether a certain number of training samples are required, as demonstrated in [11,12]: (1) supervised learning, i.e., HSI classification; and (2) unsupervised learning, i.e., HSI clustering. In the literature, many HSI classification algorithms have been proposed and have achieved excellent performance. One popular approach for HSI classification is to first apply dimension reduction and then a classifier such as a support vector machine (SVM) [13,14]. Because of the noise and redundancy among spectral bands, many feature extraction, band selection, and dimension reduction techniques have been developed over the past years. Representative work, such as principal component analysis [15] and feature-selection algorithms [16,17], is also widely applied to HSI classification. Kernel-based algorithms such as the SVM and its variants [14] have been shown to improve performance [18]. Sparse representation [19] has also been introduced to the task of HSI classification, and newly developed deep learning techniques [20] have proved useful for supervised HSI classification.
HSI classification based on supervised methods achieves excellent performance on standard datasets (e.g., more than 95% overall accuracy) [21]. However, the reported HSI classification algorithms require a certain number of high-quality samples to obtain an optimal model. Recently, many researchers have noticed that it is expensive or even impossible to collect enough labeled training data in some cases, and some recent work pays more attention to the "small sample size" problem and presents encouraging results, e.g., semi-supervised learning [22], active learning [23], domain adaptation [24], and tensor learning [25]. Although these methods can achieve classification results similar to supervised ones while using fewer training samples, they are still supervised methods that require high-quality training samples to learn the classification model. In contrast, clustering-based techniques require little prior knowledge and can serve as data preprocessing methods that provide necessary reference information for supervised classification, target detection, or spectral unmixing. Therefore, unsupervised HSI classification is an extremely important technique and has attracted significant attention in recent years. Wang et al. [26] illustrated that the existing algorithms can be coarsely divided into the following four categories: (1) Centroid-based clustering methods, such as k-means [27] and fuzzy c-means [28], minimize the within-cluster sample distance, but are sensitive to initialization and noise and cannot provide robust performance. (2) Density-based methods include clustering by fast search and find of density peaks [29], density-based spatial clustering of applications with noise [30], and the clustering-in-quest method [31]; they are not well suited to HSIs because it is difficult to find density peaks in the sparse feature space. (3) Biological clustering methods include artificial immune networks for unsupervised remote sensing image classification [32] and automatic fuzzy clustering based on adaptive multiobjective differential evolution [33]; their results are not always satisfactory because biological models do not always fit the characteristics of HSIs. (4) Graph-based methods, such as spectral clustering [34,35], perform well in unsupervised HSI classification, but most of them spend too much time on the eigenvalue decomposition and the affinity matrix construction.
In general, the accuracy of existing unsupervised HSI classification algorithms is far from satisfactory compared to supervised techniques, because the large spectral variability makes the data distribution fairly uniform. In this paper, we focus on the family of graph-based clustering algorithms (i.e., spectral clustering algorithms) [36,37]. Compared with other clustering techniques, spectral clustering performs well on irregularly-shaped clusters and gradual variation within groups. In general, spectral clustering performs a low-dimensional embedding of the affinity matrix followed by k-means clustering in the low-dimensional space [38]. The use of the graph model and manifold information makes it possible to process data with complicated structure. Accordingly, algorithms based on spectral clustering have been widely applied and have shown their effectiveness in HSI processing. Although spectral clustering methods perform well, it is too expensive to calculate the pairwise distances of enormous numbers of samples and difficult to obtain an optimal approximation for the eigenvalue decomposition of a large affinity matrix. In the clustering process, the complexity mainly arises from two aspects. First, the storage complexity of the affinity matrix is $O(n^2)$ and the corresponding time complexity is $O(n^2 d)$. The second is the eigenvalue decomposition of the Laplacian matrix, which has $O(n^3)$ time complexity. Note that $n$, $d$, and $c$ are the number of pixels, feature dimensions, and classes of the HSI, respectively. It is obvious that high spatial resolution (i.e., the number of pixels $n$) is a major constraint on applying spectral clustering to real-life HSI applications. In our experiments, spectral clustering techniques can be applied to small-scale HSI datasets such as Samson, Jasper, SalinasA, and Indian Pines, as these datasets contain only about 10,000 pixels. However, as the spatial resolution of HSIs increases, the cost becomes unacceptable for large-scale HSI datasets such as Salinas, Pavia University, Kennedy Space Center, and Urban, which contain about 100,000 pixels, because of the rapid growth of the storage and time complexity of affinity matrix construction and the eigenvalue decomposition of the Laplacian matrix.
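For reference, the standard spectral clustering pipeline discussed above can be sketched as follows. This is a minimal NumPy/SciPy sketch under a Gaussian-affinity assumption, not the implementation used in this paper; it makes the two bottlenecks visible: the $O(n^2)$ affinity matrix and the $O(n^3)$ dense eigendecomposition.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

def spectral_clustering(X, n_clusters, sigma=1.0):
    """Plain spectral clustering; the affinity matrix costs O(n^2 d) time and
    O(n^2) memory, and the dense eigensolve costs O(n^3) time."""
    n = len(X)
    # Pairwise squared Euclidean distances and Gaussian affinity.
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0)
    W = np.exp(-D2 / (2 * sigma**2))
    np.fill_diagonal(W, 0)
    # Symmetrically normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Bottom n_clusters eigenvectors give the spectral embedding.
    _, U = eigh(L, subset_by_index=[0, n_clusters - 1])
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    # k-means on the row-normalized embedding yields the final labels.
    _, labels = kmeans2(U, n_clusters, minit="++", seed=0)
    return labels
```

Both the affinity matrix `W` and the call to `eigh` operate on dense $n \times n$ arrays, which is exactly why the method does not scale to HSIs with about 100,000 pixels.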
To alleviate the above problem, several improved spectral clustering methods have been proposed for large-scale HSIs with high spatial resolution. An efficient way to obtain a low-rank matrix approximation based on the Nyström extension has been widely applied in many kernel-based clustering tasks [39,40], and recent studies have shown good performance in HSI processing [41,42]. Another method, proposed by Nie et al. [43,44], constructs an anchor-based affinity matrix with a balanced-k-means-based hierarchical k-means algorithm. Wang et al. [26] improved the anchor-based affinity matrix by incorporating spatial information. Meanwhile, the Nonnegative Matrix Factorization (NMF) technique [45,46] and its variants also provide an efficient solution for HSI classification. Motivated by the existing approaches, we propose an improved spectral clustering method based on a multiplicative update algorithm, together with two efficient methods for affinity matrix approximation. In general, the spectral clustering problem can be solved by standard trace minimization of the objective function, and we propose an efficient solution through multiplicative update optimization according to the derivative of the objective function. Meanwhile, the nonnegative constraint and the orthonormal constraint yield a better indicator matrix, which makes it easier to obtain a robust clustering result in the subsequent processing such as k-means. Furthermore, the anchor-based graph and the Nyström extension are introduced to reduce the computational complexity through affinity matrix approximation for large-scale HSIs. There are three main contributions of this work:
A novel multiplicative update optimization for eigenvalue decomposition is proposed for large-scale unsupervised HSI classification. It is worth noting that the proposed method can easily be ported to variants of spectral clustering with different regularization terms, provided the constraints are convex functions.
Two affinity matrix approximation techniques, namely the anchor-based graph and the Nyström extension, are introduced to approximate the affinity matrix by sampling a limited number of samples (i.e., pixels or anchors).
Comprehensive experiments on the HSI datasets illustrate that the proposed method achieves good results in terms of efficiency and effectiveness, and that the combination of the multiplicative update method and affinity matrix approximation provides an even better performance.
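To make the first contribution concrete, one representative multiplicative update scheme of this kind can be sketched as follows. It maximizes $\mathrm{tr}(F^{\top} W F)$ under nonnegativity and (approximate) orthonormality via the classic nonnegative-relaxation rule $F \leftarrow F \circ \sqrt{WF / (F F^{\top} W F)}$; this is a well-known representative scheme, not necessarily the exact update rule derived later in this paper.

```python
import numpy as np

def nonneg_spectral_embedding(W, c, n_iter=150, eps=1e-12):
    """Multiplicative updates for max tr(F^T W F) with F >= 0 and F^T F ~ I.

    Classic nonnegative-relaxation rule F <- F * sqrt(WF / (F F^T W F));
    a representative scheme, not necessarily the exact rule of this paper.
    """
    rng = np.random.default_rng(0)
    # Positive random initialization; multiplicative updates preserve F >= 0.
    F = np.abs(rng.normal(size=(W.shape[0], c))) + eps
    for _ in range(n_iter):
        WF = W @ F
        F *= np.sqrt(WF / (F @ (F.T @ WF) + eps))
    # Column-normalize; the row-wise argmax then serves as the cluster label.
    return F / (np.linalg.norm(F, axis=0, keepdims=True) + eps)
```

Because the resulting indicator matrix is nonnegative with nearly disjoint column supports, cluster labels can be read off by a row-wise argmax, which is the property exploited when a k-means post-processing step is applied.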
The rest of this paper is organized as follows. Section 2 provides notations and a brief view of the general spectral clustering algorithm. Next, we present the motivation and formulate the proposed multiplicative update algorithm, and an effective multiplicative update method for eigenvalue decomposition is presented in Section 3. To further reduce the computational complexity of the affinity matrix, we introduce two efficient approximation techniques in Section 4. The experimental results, including performance analyses, computational complexity, and parameter determination, are given in Section 5. Section 6 concludes this paper.
5. Experiments
In the experiments, we verified the performance of the proposed unsupervised HSI classification algorithm on both synthetic datasets and HSI datasets, and then present several useful analyses. The synthetic benchmarks are three sets of data with manifold structure, and the HSI datasets are several hyperspectral images (i.e., Salinas, Pavia University, Kennedy Space Center, Samson, Indian Pines, Urban, and Jasper).
5.1. Experimental Datasets
We conducted experiments on eight widely used hyperspectral datasets:
Salinas and Salinas-A were acquired by the 224-band AVIRIS sensor over Salinas Valley, California, and are characterized by high spatial resolution (3.7-m pixels). Salinas covers 512 lines by 217 samples, and its ground truth contains 16 classes. Salinas-A is a small subscene of the Salinas image, comprising the pixels located within the same scene at [samples, lines] = [591–676, 158–240], and includes six classes.
Pavia University was collected by the ROSIS sensor during a flight campaign over Pavia, northern Italy. The number of spectral bands is 103. Some pixels of the image contain no information, and these samples are discarded. The ground truth differentiates nine classes.
Kennedy Space Center (KSC) was acquired by the NASA AVIRIS instrument over the Kennedy Space Center, Florida, on 23 March 1996. AVIRIS acquires data in 224 bands of 10 nm width with center wavelengths from 400 to 2500 nm, and 176 bands were used for the analysis. For classification purposes, 13 classes representing the various land cover types that occur in this environment were defined for the site.
The Samson dataset is an image of 95 × 95 pixels, each recorded at 156 channels covering wavelengths from 401 nm to 889 nm. The spectral resolution is as high as 3.13 nm, and the data are not degraded by blank or noisy channels. There are three targets in this image: Soil, Tree, and Water.
Jasper Ridge is a hyperspectral image in which each pixel was recorded at 224 channels ranging from 380 nm to 2500 nm. The spectral resolution is up to 9.46 nm. There are four endmembers latent in these data: Road, Soil, Water, and Tree.
Urban has 210 wavelengths ranging from 400 nm to 2500 nm, resulting in a spectral resolution of 10 nm. There are three versions of the ground truth, which contain 4, 5, and 6 endmembers, respectively.
Indian Pines was gathered by the AVIRIS sensor in northwestern Indiana and consists of 224 spectral reflectance bands. The scene contains two-thirds agriculture and one-third forest or other natural perennial vegetation. The available ground truth is designated into sixteen classes, and we reduced the number of bands to 200 by removing bands covering the region of water absorption.
5.2. Evaluation Metrics
In the experiments, we evaluated the clustering results by Purity (P.) and Normalized Mutual Information (NMI).
P. is the most common metric for evaluating clustering results and can be formulated as

$$\mathrm{Purity}(\Omega, C) = \frac{1}{n} \sum_{k} \max_{j} |\omega_k \cap c_j|,$$

where $\Omega = \{\omega_1, \omega_2, \ldots\}$ is the clustering result set and $C = \{c_1, c_2, \ldots\}$ is the ground truth. The worst clustering results have purity values close to 0, and the best clustering result has a purity value equal to 1.
NMI is a normalization of the mutual information score that scales the results between 0 and 1:

$$\mathrm{NMI}(\Omega, C) = \frac{\sum_{k}\sum_{j} \frac{n_{k,j}}{n} \log \frac{n \, n_{k,j}}{n_k n_j}}{\frac{1}{2}\left(-\sum_{k} \frac{n_k}{n}\log\frac{n_k}{n} - \sum_{j} \frac{n_j}{n}\log\frac{n_j}{n}\right)},$$

where $n_k$ denotes the number of data points contained in the cluster $\omega_k$, $n_j$ is the number of data points belonging to the class $c_j$, and $n_{k,j}$ denotes the number of data points in the intersection between the cluster $\omega_k$ and the class $c_j$. The larger the NMI, the better the clustering result.
We ran the experiments under the same environment: Intel(R) Core(TM) i7-5930K CPU, 3.50 GHz, 64 GB memory, Ubuntu 14.04.5 LTS system, and Matlab version R2014b. We compared our algorithm with Spectral Clustering (SC), Anchor-based Graph Clustering (AGC), and Nyström Extension Clustering (NEC). The corresponding improved algorithms based on multiplicative update optimization are SC-I, NEC-I, and AGC-I. The affinity matrices of the above algorithms were constructed in three ways, and a detailed description of these affinity matrices is presented in the next section.
5.3. Toy Example
We first explored the performance of our algorithm on three synthetic datasets to verify the effectiveness of the multiplicative update optimization and the two approximated affinity matrices. Three synthetic datasets were used: Cluster in Cluster (CC), Two Spirals (TS), and Crescent Moon (CM).
Figure 1 presents the manifold structure of the synthetic datasets in detail. These synthetic datasets contain 2000–40,000 data points divided into two groups, and they are extremely challenging since clustering algorithms that only consider data point distances have difficulty obtaining a robust result. Algorithms based on spectral graph theory provide a more powerful way to handle the manifold information. The solution for spectral clustering can be divided into two parts: affinity matrix construction and eigenvalue decomposition of the Laplacian matrix. In this paper, we consider three formulations for the affinity matrix construction:

$$W_{ij} = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right),$$

$$W \approx \begin{bmatrix} A \\ B^{\top} \end{bmatrix} A^{-1} \begin{bmatrix} A & B \end{bmatrix},$$

$$Z_{ij} = \frac{d_{i,k+1} - d_{ij}}{\sum_{j'=1}^{k}\left(d_{i,k+1} - d_{ij'}\right)},$$

where $X = \{x_1, \ldots, x_n\}$ is the whole sample set. $\sigma$ is the parameter controlling the neighborhood of data points for the Euclidean distance, and we set $\sigma = 10$. $A$ is the affinity matrix for the anchors (chosen data points) and $B$ stores the similarity between the anchors (chosen data points) and the remaining ones. $d_{ij}$ denotes the distance between the $i$th data point and the $j$th anchor, which can be considered a chosen data point, and the distances $d_{i1}, d_{i2}, \ldots$ are ordered from small to large. According to [27], the parameter $k$ for the anchor-based graph was set to 10, which provides good performance in most cases. Note that the last two affinity matrices are approximated solutions for the original affinity matrix. The sampling scale was set to 10, meaning we randomly selected one-tenth of the data points as the anchors or the chosen data points.
Compared with the traditional eigenvalue decomposition of the Laplacian matrix, we propose a multiplicative update optimization to get a more efficient solution of eigenvalue decomposition. In our experiments, the number of iterations was about 150 and we obtained good results in most cases. Besides the above-mentioned parameters, the other parameters of the compared algorithms and our improved algorithms were tuned to the optimum.
Table 2, Table 3 and Table 4 present the performance of the above six methods on the three synthetic datasets. SC and SC-I provided good clustering results since their affinity matrix considers the similarity of all data points; however, these two methods also needed more time to calculate the Euclidean distances among samples. Note that the proposed multiplicative update algorithm delivered a substantial efficiency increase, taking only half the time to obtain a similar clustering result. NEC and AGC benefited from the approximated affinity matrix and took only about one-tenth the time, but NEC was not robust enough to obtain a stable solution of the eigenvalue decomposition. Compared with NEC, the improved algorithm NEC-I provided a better clustering result because of the orthonormal and nonnegative constraints. AGC performed better than SC and NEC in terms of effectiveness and efficiency, as it utilizes the anchor-based affinity matrix, and the proposed AGC-I also performed well.
5.4. HSI Clustering Analysis
In this section, a further study is presented to illustrate the performance of the proposed multiplicative update algorithm and the efficiency of the approximated affinity matrices mentioned in Section 4 on several popular hyperspectral image datasets. We followed the experimental settings of the previous section, where the parameter $\sigma$ was set to 10 and the parameter $k$ was set to 10. In addition, the remaining trade-off parameter was set to 0.5, and the other parameters were tuned to the optimum for a fair comparison. Note that the affinity matrix for the hyperspectral image datasets differs from the previous section because it needs to consider both the brightness values and the spatial information. In this case, the affinity matrix $W$ can be rewritten as

$$W_{ij} = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)\exp\!\left(-\frac{\|p_i - p_j\|^2}{2\sigma^2}\right),$$

where $p_i$ is the pixel location and the parameter $\sigma$ was set to 10 for both the brightness values and the spatial information. The affinity matrices $A$ and $B$ for NEC were constructed in the same way. Meanwhile, the affinity matrix for AGC is given by

$$Z_{ij} = \frac{d_{i,k+1} - d_{ij}}{\sum_{j'=1}^{k}\left(d_{i,k+1} - d_{ij'}\right)}, \qquad d_{ij} = \|\bar{x}_i - a_j\|^2,$$

where $a_j$ is the $j$th anchor and $\bar{x}_i$ is the mean of the brightness values around pixel $i$.
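As an illustration, a product-of-kernels spectral–spatial affinity, one common way to combine brightness and pixel-location cues, can be computed as below. The product form and the shared kernel width are assumptions for this sketch, not necessarily the paper's exact definition.

```python
import numpy as np

def spatial_spectral_affinity(cube, sigma=10.0):
    """Affinity for an HSI cube of shape (H, W, B): elementwise product of a
    Gaussian kernel on spectra and a Gaussian kernel on pixel coordinates.

    The product-of-kernels form is an assumed illustration; the paper's exact
    spatial weighting may differ.
    """
    H, W_, B = cube.shape
    X = cube.reshape(-1, B).astype(float)                 # n x B spectra
    rows, cols = np.meshgrid(np.arange(H), np.arange(W_), indexing="ij")
    P = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)  # n x 2
    def gauss(F):
        sq = (F**2).sum(axis=1)
        D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * F @ F.T, 0)
        return np.exp(-D2 / (2 * sigma**2))
    return gauss(X) * gauss(P)   # elementwise product couples both cues
```

The product kernel assigns a high affinity only to pixel pairs that are both spectrally similar and spatially close, which is the intent of incorporating spatial information into the affinity matrix.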
Figure 2 and
Table 5 present the experimental results, which were evaluated by Purity and NMI on the hyperspectral image datasets. We made the following observations:
SC and the corresponding improved algorithm SC-I achieved competitive performance in terms of Purity and NMI. However, SC took more time solving the eigenvalue decomposition of the Laplacian matrix, and our improved algorithm provided a more efficient solution because of the multiplicative update optimization. Meanwhile, it took more time to process Indian Pines because of the rapid growth of the time complexity of the eigenvalue decomposition of the Laplacian matrix caused by the increase in spatial resolution and number of classes. Note that SC-I, which is based on the multiplicative update algorithm, slightly outperformed SC in terms of Purity and NMI, illustrating that the nonnegative constraint and the orthonormal constraint provide a better indicator matrix. This makes it easier to obtain a robust clustering result in the subsequent processing, such as k-means.
NEC and AGC are two efficient improved algorithms, and they took only one-twentieth the time in our experiments. Moreover, NEC and AGC could be used on large-scale hyperspectral image datasets such as KSC and Urban, while SC ran out of memory on these large-scale datasets because of the storage and time complexity of the affinity matrix. However, the experimental results also illustrate that NEC was not robust enough, which might be because the sampled affinity matrix can be indefinite, so its inverse square root contains complex elements, making it difficult to obtain a robust clustering result with k-means. The other methods did not suffer from this problem and also performed better than NEC.
The proposed NEC-I and AGC-I outperformed the other methods in terms of effectiveness and efficiency. NEC-I and AGC-I first take advantage of sampling techniques, namely the Nyström extension and the anchor-based graph, which allows them to be used on large-scale hyperspectral image datasets. Furthermore, the proposed multiplicative update algorithm provides an efficient solution for the eigenvalue decomposition of the Laplacian matrix. The results presented in Table 5 illustrate that NEC-I and AGC-I performed better than NEC and AGC in most cases. The proposed multiplicative update optimization is flexible and integrates well with approximated affinity matrices such as the Nyström extension and the anchor-based graph.
5.5. Computational Time
Figure 3 lists the computational time on the three synthetic datasets. The methods listed in Figure 3 achieved similar clustering results when there were fewer than 10,000 data points, while SC and SC-I took more time than the other methods beyond 10,000 data points, with the computational time growing rapidly as the number of data points increased. The proposed improved algorithm SC-I took only about half the time with more than 30,000 data points. Compared with the above two methods, NEC, AGC, and the corresponding improved algorithms NEC-I and AGC-I provided better performance in terms of computational time. Meanwhile, the affinity matrix constructed by the anchor-based graph was better than the one constructed by the Nyström extension, as the anchor-based graph provides a better way to measure the similarity of data points.