Fast Spectral Clustering for Unsupervised Hyperspectral Image Classification

Abstract: Hyperspectral image classification is a challenging and significant domain in the field of remote sensing, with numerous applications in agriculture, environmental science, mineralogy, and surveillance. In recent years, a growing number of advanced hyperspectral remote sensing image classification techniques based on manifold learning, sparse representation, and deep learning have been proposed, reporting good accuracy and efficiency on state-of-the-art public datasets. However, most existing methods still struggle with large-scale hyperspectral image datasets due to their high computational complexity. In this work, we propose an improved spectral clustering method for large-scale hyperspectral image classification that requires no prior information. The proposed algorithm introduces two efficient approximation techniques, the Nyström extension and an anchor-based graph, to construct the affinity matrix. We also propose an effective solution to the eigenvalue decomposition problem via multiplicative update optimization. Experiments on both synthetic datasets and hyperspectral image datasets demonstrate the efficiency and effectiveness of the proposed algorithm.


Introduction
Hyperspectral images (HSIs) contain information on hundreds of continuous narrow spectral wavelengths, collected for each HSI pixel by aircraft, satellites, and unmanned aerial vehicles [1][2][3][4]. Since HSIs offer rich spectral and spatial resolution, they have the potential to discriminate more detailed classes and support even broader applications in land-cover classification and clustering [5][6][7][8]. At the same time, HSIs are difficult to process because the numerous spectral bands significantly increase the computational complexity, and the noise in HSIs can badly degrade classification accuracy [9,10]. Existing work can be roughly divided into two categories according to whether a certain number of training samples are required, as demonstrated in [11,12]: (1) supervised learning, known as HSI classification; and (2) unsupervised learning, known as HSI clustering. In the literature, many HSI classification algorithms have been proposed and have achieved excellent performance. One popular approach is to first apply dimension reduction and then a classifier such as support vector machines [13,14]. Due to the noise and redundancy among spectral bands, many feature extraction, band selection, and dimension reduction techniques have been developed in recent years; representative work such as principal component analysis [15] and feature-selection algorithms [16,17] is also widely applied. Unsupervised spectral clustering, in contrast, is hard to apply directly to large-scale HSIs, which contain about 100,000 pixels, because of the rapid growth of the storage and time complexity of affinity matrix construction and eigenvalue decomposition of the Laplacian matrix.
To alleviate the above problem, several improved spectral clustering methods have been proposed for large-scale HSIs with high spatial resolution. An efficient way to obtain a low-rank matrix approximation based on the Nyström extension has been widely applied in many kernel-based clustering tasks [39,40], and recent studies have shown good performance on HSI processing tasks [41,42]. Another method, proposed by Nie et al. [43,44], constructs an anchor-based affinity matrix with a balanced k-means based hierarchical k-means algorithm. Wang et al. [26] improved the anchor-based affinity matrix by incorporating spatial information. Meanwhile, the Nonnegative Matrix Factorization (NMF) technique [45,46] and its variants also provide an efficient solution for HSI classification. Motivated by these approaches, we propose an improved spectral clustering method based on a multiplicative update algorithm and two efficient affinity matrix approximations. In general, the spectral clustering problem can be solved by standard trace minimization of the objective function, and we propose an efficient solution through multiplicative update optimization according to the derivative of the objective function. Meanwhile, the nonnegative constraint and the orthonormal constraint yield a better indicator matrix, which makes it easier to obtain a robust clustering result in the subsequent processing, such as k-means. Furthermore, the anchor-based graph and the Nyström extension are introduced to reduce the computational complexity through affinity matrix approximation for large-scale HSIs. There are three main contributions of this work:
1. A novel multiplicative update optimization for eigenvalue decomposition is proposed for large-scale unsupervised HSI classification. It is worth noting that the proposed method is easily portable to variants of spectral clustering with different regularization terms, provided the constraints are convex functions.
2. Two affinity matrix approximation techniques, namely the anchor-based graph and the Nyström extension, are introduced to approximate the affinity matrix by sampling a limited number of samples (i.e., pixels or anchors).
3. Comprehensive experiments on HSI datasets illustrate that the proposed method achieves good results in terms of efficiency and effectiveness, and that the combination of the multiplicative update method and affinity matrix approximation provides better performance.
The rest of this paper is organized as follows. Section 2 provides notation and a brief overview of the general spectral clustering algorithm. Next, we present the motivation and formulate the proposed multiplicative update algorithm; an effective multiplicative update method for eigenvalue decomposition is presented in Section 3. To further reduce the computational cost of affinity matrix construction, we introduce two efficient approximation techniques in Section 4. The experimental results, including performance analyses, computational complexity, and parameter determination, are given in Section 5. Section 6 concludes this paper.

Overview
We begin by reviewing the classical spectral clustering algorithm; before going into the details, we first introduce the notation.

Notation
In this part, we define some notation to ensure that the mathematical meaning of the proposed method can be formulated clearly. The pixels of an HSI can be considered as {x_i ∈ R^d, i = 1, 2, ..., n}, where d is the dimensionality (i.e., the number of spectral bands). Let {y_1, y_2, ..., y_n} ⊂ R^c be the indicator vectors of the pixels {x_1, x_2, ..., x_n}, respectively. Here, y_i = [y_i1, y_i2, ..., y_ic], where c is the predefined number of clusters, y_ij = 1 if and only if x_i belongs to the jth cluster, and y_ij = 0 otherwise. Denote Y = [y_1^T, y_2^T, ..., y_n^T]^T ∈ R^{n×c}, where Y ≥ 0 indicates that all elements of Y are nonnegative. The affinity matrix is denoted by W and the Laplacian matrix by L. The trace of W is denoted by Tr(W) and the Frobenius norm of W by ||W||_F. The notation is summarized in Table 1, and we explain the meaning of each term when it is first used.

Normalized Cuts Revisit
A set of samples (i.e., pixels) {x_1, x_2, ..., x_n} can be considered as an undirected graph G = {Vertices, Edges}. Each vertex represents a sample x_i, and each edge is weighted by the similarity of its endpoints. In general, the corresponding affinity (or similarity) matrix W can be defined as

W_ij = exp(−||x_i − x_j||^2 / (2σ^2)),  (1)

where σ is the bandwidth of the neighborhood, W is a symmetric matrix, and W_ij is the affinity of samples x_i and x_j. Let A and B represent a bipartition of Vertices, where A ∪ B = Vertices and A ∩ B = ∅. Let cut(A, B) denote the sum of the weights between A and B, i.e., cut(A, B) = ∑_{i∈A, j∈B} W_ij. The volume of a set is defined as the sum of the degrees within that set: vol(A) = ∑_{i∈A} D_ii and vol(B) = ∑_{i∈B} D_ii, where D_ii = ∑_j W_ij. The normalized cut between A and B is then

NCut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B),  (2)

which can equivalently be written as 2·cut(A, B)/(vol(A) ∥ vol(B)), where ∥ denotes the harmonic mean. According to [47], an optimal solution of NCut(A, B) can be obtained by solving the minimization

min_y (y^T (D − W) y) / (y^T D y),  (3)

where D is the diagonal matrix with elements D_ii = ∑_j W_ij and y is the indicator vector. According to spectral graph theory, an approximate solution of Equation (3) can be obtained by thresholding the eigenvector corresponding to the second smallest eigenvalue of the normalized Laplacian

L = I − D^{−1/2} W D^{−1/2}.  (4)

Shi and Malik [47] showed that the normalized Laplacian matrix L is positive semidefinite even when W is indefinite, and its eigenvalues lie in the interval [0, 2]. For the multigroup case, the problem generalizes to

min_Y Tr(Y^T L Y) s.t. Y^T Y = I,  (5)

where Y is the indicator matrix. This can be solved as a standard trace minimization problem according to the normalized spectral clustering proposed in [47]: the solution Y consists of the top c eigenvectors (those with the smallest eigenvalues) of the normalized Laplacian matrix L as columns.
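As a concrete reference point, the classical pipeline above can be sketched in a few lines of NumPy. This is a generic illustration, not the paper's implementation: the helper name `spectral_clustering` and the fixed bandwidth `sigma` are our own choices.

```python
import numpy as np

def spectral_clustering(X, c, sigma=1.0):
    """Minimal classical spectral clustering sketch.

    Builds the full n x n Gaussian affinity, forms the normalized Laplacian
    L = I - D^{-1/2} W D^{-1/2}, and returns the c eigenvectors with the
    smallest eigenvalues as the relaxed indicator matrix Y.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    W = np.exp(-dist2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Eigenvectors of L with the smallest eigenvalues (eigh sorts ascending).
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, :c]
    # Row-normalize before the final k-means step (a common choice).
    Y /= np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    return Y
```

The resulting Y is then fed to k-means, as in the standard pipeline; both costs named in the next paragraph (affinity construction and the dense eigendecomposition) are visible here.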
However, there are two difficult obstacles to an efficient and effective solution with the classical spectral clustering technique: one is the eigenvalue decomposition of the Laplacian matrix L, which takes O(n^2 c) time; the other is the storage and time complexity of the affinity matrix, which are O(n^2) and O(n^2 d), respectively. Either of these can become an unbearable burden as the number of samples increases. To alleviate this problem, motivated by recent work such as the Nyström extension, anchor-based graphs, and nonnegative matrix factorization, we propose a novel approach to large-scale, high-dimensional HSI clustering (or unsupervised HSI classification); the details are presented in the following sections.

Improved Spectral Clustering with Multiplicative Update Algorithm
In this section, we propose a multiplicative update algorithm to obtain an efficient solution of the eigenvalue decomposition of the Laplacian matrix L. We first present the formulation and our motivation, and then a novel solution for spectral clustering based on the multiplicative update algorithm is proposed in Section 3.2.

Formulation and Motivation
In general, a multigroup spectral clustering problem (i.e., c > 2) can be considered as the minimization of the following objective:

min_Y Tr(Y^T L Y) + λ ||Y^T Y − I||_F^2,  (6)

where λ > 0 is the Lagrangian multiplier and ||Y^T Y − I||_F^2 is the term enforcing the orthonormal constraint. However, Equation (6) is still a non-smooth objective under the discrete constraint, so it is difficult to obtain an efficient solution by eigenvalue decomposition of the Laplacian matrix L. Motivated by NMF, which handles clustering well through relaxation techniques, we relax the discreteness condition and propose a multiplicative update optimization to solve the eigenvalue decomposition; the details are illustrated in the next section.

Multiplicative Update Optimization
Spectral clustering cannot provide an efficient solution when dealing with large-scale datasets, since obtaining an optimal approximation of the eigenvalue decomposition is too expensive. Motivated by recent work on NMF, we introduce the nonnegative constraint on the indicator matrix Y, where Y_ij > 0. Moreover, traditional spectral relaxation approaches relax the indicator matrix Y to the orthonormal constraint Y^T Y = I. According to a recent work [48], if the indicator matrix Y is orthonormal and nonnegative simultaneously, only one element in each row of Y is positive and the others are zeros. Thus, by imposing both constraints, Y > 0 and Y^T Y = I, we can obtain an ideal indicator matrix Y as defined in Section 2.1. These constraints are significant: they help us solve the eigenvalue decomposition more efficiently and are easy to implement.
By relaxing the discreteness condition and considering the above two constraints, Equation (6) can be rewritten as

min_{Y > 0} Tr(Y^T L Y) + λ ||Y^T Y − I||_F^2.  (7)

Equation (7) can be considered as a cost function whose minimum we seek. The derivative of Equation (7) with respect to Y is

∂J/∂Y = 2LY + 4λY(Y^T Y − I),  (8)

where L = I − D^{−1/2} W D^{−1/2}, so Equation (8) can be rewritten (up to a factor of 2) as

∂J/∂Y ∝ (Y + 2λYY^T Y) − (2λY + D^{−1/2} W D^{−1/2} Y).  (9)

In this way, the derivative of Equation (7) is divided into two parts. Both Y + 2λYY^T Y and 2λY + D^{−1/2} W D^{−1/2} Y are nonnegative matrices since Y > 0, D > 0, and W ≥ 0. For convenience, we denote the former factor as Q = Y + 2λYY^T Y and the latter as P = 2λY + D^{−1/2} W D^{−1/2} Y. According to the multiplicative update rule for the standard NMF algorithm [49], we can minimize the cost function in Equation (7) by updating Y as

Y ← Y ∘ (P ⊘ Q),  (10)

where ∘ and ⊘ denote the Hadamard product and Hadamard division (i.e., element-wise multiplication and division), respectively, so that Y_ij ← Y_ij · P_ij / Q_ij. We iterate until the cost function converges; the implementation details are presented in Algorithm 1. Since only one element in each row of the indicator matrix Y is positive and the others approximate zero, Y can be considered a nearly ideal indicator matrix for clustering representation.

Algorithm 1: Improved spectral clustering with multiplicative updates.
Input: affinity matrix W, number of clusters c, parameter λ.
Initialize Y with positive random values.
repeat: update Y by Equation (10).
until: the cost function converges.
Input Y to k-means to get the clustering result S.
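The update loop described above can be sketched as follows. This is a minimal sketch under our own naming conventions (`multiplicative_update_sc`, `n_iter`, the small positive initialization, and the numerical floor in the division are assumptions, not part of the paper's pseudocode); it expects the already-normalized nonnegative affinity D^{−1/2} W D^{−1/2} as input.

```python
import numpy as np

def multiplicative_update_sc(W_norm, c, lam=0.5, n_iter=150, seed=0):
    """Multiplicative update sketch for
        min_{Y>0} Tr(Y^T L Y) + lam * ||Y^T Y - I||_F^2,
    with L = I - W_norm and W_norm = D^{-1/2} W D^{-1/2} (nonnegative).

    Update rule (Equation (10) style): Y <- Y * P / Q with
        P = 2*lam*Y + W_norm @ Y,
        Q = Y + 2*lam*Y @ (Y.T @ Y).
    """
    n = W_norm.shape[0]
    rng = np.random.default_rng(seed)
    Y = np.abs(rng.normal(size=(n, c))) + 1e-3  # strictly positive init
    for _ in range(n_iter):
        P = 2.0 * lam * Y + W_norm @ Y
        Q = Y + 2.0 * lam * Y @ (Y.T @ Y)
        Y *= P / np.maximum(Q, 1e-12)  # Hadamard product and division
    return Y  # feed Y to k-means (or take row-wise argmax) for labels
```

On a graph with clear block structure, the rows of the returned Y tend toward having a single dominant entry, which is the near-ideal indicator behavior the text describes.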

Approximated Affinity Matrix
To further reduce the time and storage complexity of affinity matrix computation and make the spectral clustering algorithm applicable to large-scale datasets such as HSIs, we introduce the anchor-based graph and the Nyström extension to approximate the original affinity matrix with a limited number of samples.

Affinity Matrix with Nyström Extension
The Nyström extension is a technique for finding numerical approximations to eigenfunction problems; a detailed illustration can be found in [50]. It allows us to extend an eigenvector computed for a set of sample points to an arbitrary sample x using interpolation weights.
Inspired by [47], the affinity matrix considers both the brightness values of the pixels and their spatial locations, and we can define the similarity of two samples x_i and x_j as

W_ij = exp(−||x_i − x_j||^2 / σ_x^2) · exp(−||l_i − l_j||^2 / σ_l^2),  (11)

where l_i and l_j are the spatial locations of the HSI pixels, and σ_l and σ_x are the bandwidths for neighboring pixels; these parameters are sensitive to different HSIs. To alleviate this problem, Zhao et al. [35] introduced adaptive parameters σ̂_l and σ̂_x, computed from the data as in [35] (Equation (12)), so that Equation (11) becomes

W_ij = exp(−||x_i − x_j||^2 / (α σ̂_x^2)) · exp(−||l_i − l_j||^2 / (α σ̂_l^2)),  (13)

where the parameter α controls the neighborhood of the affinity matrix. For uniformity of notation, the affinity matrix A among the m chosen samples is similarly defined by Equation (11). The affinities between the remaining n − m samples and the chosen samples are denoted by B, and C is the affinity matrix among the remaining samples. The affinity matrix W can then be written in block form as

W = [A, B; B^T, C],  (14)

where A ∈ R^{m×m}, B ∈ R^{m×(n−m)}, and C ∈ R^{(n−m)×(n−m)}. According to the Nyström extension, C can be approximated by C ≈ B^T A^{−1} B, and the approximated affinity matrix Ŵ can be formulated as

Ŵ = [A, B; B^T, B^T A^{−1} B].  (15)

The quality of this approximation is governed by the extent to which C is spanned by the rows of B, and the Nyström extension provides an approximation to the entire affinity matrix using only a subset of its rows or columns.
To extend the above matrix form of the Nyström method to NCut, we need the row sums of the matrix Ŵ; these can be computed without explicitly evaluating the sub-matrix B^T A^{−1} B:

d̂ = [A 1_m + B 1_{n−m}; B^T 1_m + B^T A^{−1} B 1_{n−m}],  (16)

where A 1_m and B 1_{n−m} are the row sums of A and B, respectively, and B^T 1_m is the column sum of B. The blocks A and B can then be normalized as

Â_ij = A_ij / sqrt(d̂_i d̂_j),  B̂_ij = B_ij / sqrt(d̂_i d̂_{j+m}),  (17)

and we obtain the normalized affinity matrix D^{−1/2} Ŵ D^{−1/2} (refer to Equation (15)) in terms of the Â and B̂ from Equation (17). However, the elements of D^{−1/2} Ŵ D^{−1/2} can be negative since the matrix A^{−1} may contain negative elements, while we must keep D^{−1/2} Ŵ D^{−1/2} ≥ 0 to satisfy the constraints of the proposed multiplicative update algorithm. Because of this, we split the offending matrix into its positive and negative parts, A^+ = (|A| + A)./2 and A^− = (|A| − A)./2. Note that both A^+ and A^− are nonnegative matrices, so P and Q can be formulated by accumulating the positive terms in P and the negative terms in Q, keeping both factors nonnegative.
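The row-sum trick above can be sketched as follows. This is a minimal sketch (the function name `nystrom_normalized_affinity`, the use of a pseudo-inverse, and the numerical floors are our assumptions): given the sampled block A and cross block B, it returns the degree-normalized blocks of D^{−1/2} Ŵ D^{−1/2} without ever forming the large (n−m) × (n−m) block.

```python
import numpy as np

def nystrom_normalized_affinity(A, B):
    """Nystrom row-sum sketch.

    A: m x m affinity among sampled pixels.
    B: m x (n-m) affinity between sampled and remaining pixels.
    Returns (A_hat, B_hat), the normalized blocks of
    W_hat = [[A, B], [B^T, B^T A^{-1} B]], computed without forming
    the (n-m) x (n-m) block explicitly.
    """
    A_inv = np.linalg.pinv(A)
    # Approximate row sums of W_hat (Equation (16) style).
    d1 = A.sum(axis=1) + B.sum(axis=1)                   # first m rows
    d2 = B.sum(axis=0) + B.T @ (A_inv @ B.sum(axis=1))   # remaining rows
    s1 = 1.0 / np.sqrt(np.maximum(d1, 1e-12))
    s2 = 1.0 / np.sqrt(np.maximum(d2, 1e-12))
    # Degree-normalize the two stored blocks.
    A_hat = A * np.outer(s1, s1)
    B_hat = B * np.outer(s1, s2)
    return A_hat, B_hat
```

For a rank-one affinity (where C = B^T A^{−1} B holds exactly), these blocks match the exactly normalized full matrix, which is a convenient sanity check on the row-sum formula.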

Affinity Matrix with Anchor-Based Graph
The anchor-based graph was proposed for large-scale data clustering problems [43]. It makes the label prediction function a weighted average of the labels on a subset of anchor samples: once the labels of the anchors are inferred, the labels of the other unlabeled samples are easily obtained by a simple linear combination. Specifically, the label prediction function f(·) can be represented through a subset A = {a_j}_{j=1}^m ⊂ R^D, in which each a_j acts as an anchor sample:

f(x_i) = ∑_{j=1}^m Z_ij f(a_j),  (20)

where Z is the data-adaptive weight matrix that measures the similarity between samples and anchors. Defining the two vectors f = [f(x_1), ..., f(x_n)]^T and f_a = [f(a_1), ..., f(a_m)]^T, Equation (20) can be rewritten as f = Z f_a. This reduces the solution space of unknown labels from the large f to the smaller f_a. The first problem of anchor-based graph construction is how to choose the anchors. In general, the anchors can be random samples or representative samples such as k-means cluster centers. Random selection chooses m anchors by random sampling with computational complexity O(1); however, randomly chosen samples cannot guarantee that the approximated affinity matrix is always robust. Liu et al. [51] suggested using k-means cluster centers as anchors instead, since they have sufficient representation power to adequately cover the whole dataset; the computational complexity of k-means, however, is O(ndmt), where t is the number of iterations.
The second problem is how to design a regression matrix Z that measures the underlying relationship between the samples and the chosen anchors. Liu et al. [51] proposed a method named Local Anchor Embedding (LAE) to construct the regression matrix, where {a_1, a_2, ..., a_m} denote the chosen anchors and K(·) is a given kernel function with a bandwidth parameter:

Z_ij = K(x_i, a_j) / ∑_{j'∈Φ_i} K(x_i, a_{j'}) for j ∈ Φ_i, and Z_ij = 0 otherwise,  (22)

where Φ_i ⊂ [1, 2, ..., m] is the set of indexes of the s nearest anchors of x_i in A, and the Gaussian kernel K(x_i, a_j) = exp(−||x_i − a_j||_2^2 / 2σ^2) is adopted for the kernel regression. However, kernel-based methods need an extra parameter (i.e., the bandwidth σ). Nie et al. [27] adopted a parameter-free yet effective neighbor assignment strategy, obtaining the ith row of Z by solving the following problem:

min_{Z_i^T 1 = 1, Z_i ≥ 0} ∑_{j=1}^m ||x_i − a_j||_2^2 Z_ij + γ ||Z_i||_2^2,  (23)

where Z_i^T denotes the ith row of Z and γ is the regularization parameter. Note that Equation (23) does not consider the spatial information of HSIs, which may result in isolated pixels in the clustered HSI due to noise, outliers, or mixed pixels. Recent studies incorporate the spatial information by directly modifying the cost function in Equation (23) as follows:

min_{Z_i^T 1 = 1, Z_i ≥ 0} ∑_{j=1}^m (||x_i − a_j||_2^2 + β ||x̄_i − a_j||_2^2) Z_ij + γ ||Z_i||_2^2,  (24)

where x̄_i is the mean of the neighboring pixels lying within a window around x_i and the parameter β controls the tradeoff between spectral and spatial information. Let d^x_ij = ||x_i − a_j||_2^2 and d^x̄_ij = ||x̄_i − a_j||_2^2, and denote by d_i ∈ R^m the vector with jth element d_ij = d^x_ij + β d^x̄_ij. Equation (24) can then be rewritten in vector form as

min_{Z_i^T 1 = 1, Z_i ≥ 0} ||Z_i + d_i/(2γ)||_2^2.  (25)

Following Nie et al. [43], the parameter γ can be set as γ = (s/2) d_{i,s+1} − (1/2) ∑_{j=1}^s d_ij, and with the d_ij sorted in ascending order, the solution of Equation (25) is

Z_ij = (d_{i,s+1} − d_ij) / (s·d_{i,s+1} − ∑_{j'=1}^s d_{ij'}) for j ≤ s, and Z_ij = 0 otherwise.  (26)

For the detailed derivation, see [27].
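The closed-form solution above is simple enough to write out directly. The sketch below (function name `anchor_weights_row` and the numerical floor are our own choices) computes one row of Z from the anchor distances: only the s nearest anchors get positive weight, and the row sums to one by construction.

```python
import numpy as np

def anchor_weights_row(d_i, s):
    """Parameter-free neighbor assignment for one sample (Equation (26) style).

    d_i: length-m vector of squared distances from a pixel to all m anchors.
    s:   number of nearest anchors to keep (requires s < m).
    Returns a length-m weight row that is nonnegative and sums to one.
    """
    m = d_i.shape[0]
    order = np.argsort(d_i)
    d_sorted = d_i[order]
    # Denominator: s * d_{i,s+1} - sum of the s smallest distances.
    denom = s * d_sorted[s] - d_sorted[:s].sum()
    z = np.zeros(m)
    z[order[:s]] = (d_sorted[s] - d_sorted[:s]) / np.maximum(denom, 1e-12)
    return z
```

For example, with squared distances [1, 2, 3, 10] and s = 2, the row becomes [2/3, 1/3, 0, 0]: closer anchors receive proportionally larger weight, and anchors beyond the s nearest are ignored.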
After obtaining the regression matrix Z, the affinity matrix W can be computed as

W = Z Δ^{−1} Z^T,  (27)

where Δ is a diagonal matrix whose jth entry is defined as ∑_{i=1}^n Z_ij; W is symmetric, positive semidefinite, and doubly stochastic. Note that since Z_i^T 1 = 1 and Z_i ≥ 0, it can be verified that W is a doubly stochastic matrix with W_ij ≥ 0. More importantly, the approximated matrix W is automatically normalized, so that W = D^{−1/2} W D^{−1/2}. In this case, the Laplacian matrix is simply L = I − W, and we can rewrite P and Q as P = 2λY + Z Δ^{−1} Z^T Y and Q = Y + 2λ Y Y^T Y.
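The anchor-graph factors can then be plugged into the multiplicative update without ever materializing the n × n affinity. A minimal sketch (the helper name `anchor_affinity_update_terms` and the numerical floor are our assumptions) computes P and Q using only Z and Y:

```python
import numpy as np

def anchor_affinity_update_terms(Z, Y, lam=0.5):
    """Update factors for the anchor graph W = Z @ Delta^{-1} @ Z.T,
    with Delta_jj = sum_i Z_ij.

    Since this W is already normalized (doubly stochastic), L = I - W and
    the multiplicative-update factors reduce to
        P = 2*lam*Y + Z @ Delta^{-1} @ (Z.T @ Y),
        Q = Y + 2*lam*Y @ (Y.T @ Y),
    evaluated right-to-left so the n x n matrix W is never formed.
    """
    delta = Z.sum(axis=0)  # anchor degrees (diagonal of Delta)
    ZtY = Z.T @ Y          # m x c, cheap when m << n
    P = 2.0 * lam * Y + Z @ (ZtY / np.maximum(delta[:, None], 1e-12))
    Q = Y + 2.0 * lam * Y @ (Y.T @ Y)
    return P, Q
```

The cost per iteration is O(nmc) instead of O(n^2 c), which is the point of the anchor approximation.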

Experiments
In the experiments, we verified the performance of the proposed unsupervised HSI classification algorithm on both synthetic datasets and HSI datasets, and then present several useful analyses. The synthetic benchmark datasets were three sets of data with manifold structure, and the HSI datasets were several hyperspectral images (i.e., Salinas, Pavia University, Kennedy Space Center, Samson, Indian Pines, Urban, and Jasper Ridge).

Experimental Datasets
We conducted experiments on eight widely used hyperspectral datasets, including the following:
• Jasper Ridge is a hyperspectral image with 100 × 100 pixels. Each pixel was recorded at 224 channels ranging from 380 nm to 2500 nm, with a spectral resolution of up to 9.46 nm. There are four end-members latent in these data: Road, Soil, Water, and Tree.
• Urban has 210 wavelengths ranging from 400 nm to 2500 nm, resulting in a spectral resolution of 10 nm. There are 307 × 307 pixels, each corresponding to a 2 × 2 m² area. There are three versions of the ground truth, containing 4, 5, and 6 end-members, respectively.
• Indian Pines was gathered by the AVIRIS sensor in northwestern Indiana and consists of 145 × 145 pixels and 224 spectral reflectance bands. The scene contains two-thirds agriculture and one-third forest or other natural perennial vegetation. The available ground truth is designated into sixteen classes, and we reduced the number of bands to 200 by removing bands covering the region of water absorption.

Evaluation Metrics
In the experiments, we evaluated the clustering results by Purity (P.) and Normalized Mutual Information (NMI).
• P. (Purity) is the most common metric for clustering evaluation; it can be formulated as

P(Ω, C) = (1/n) ∑_i max_j |ω_i ∩ c_j|,

where Ω = {ω_1, ..., ω_c} is the set of clusters from the clustering result and C = {c_1, ..., c_c} is the set of ground-truth classes. The worst clustering results have purity close to 0, and the best clustering result has a purity value equal to 1.
• NMI is a normalization of the mutual information score that scales the results between 0 and 1:

NMI = ∑_{i,j} (n_{i,j}/n) log(n·n_{i,j}/(n_i n_j)) / sqrt((−∑_i (n_i/n) log(n_i/n)) (−∑_j (n_j/n) log(n_j/n))),

where n_i denotes the number of data points contained in the cluster C_i (1 ≤ i ≤ c), n_j is the number of data points belonging to the class L_j (1 ≤ j ≤ c), and n_{i,j} denotes the number of data points in the intersection of cluster C_i and class L_j. The larger the NMI, the better the clustering result.
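Both metrics are straightforward to compute from the predicted and ground-truth label vectors. The sketch below implements the standard definitions (function names `purity` and `nmi` are our own; it assumes integer labels starting at 0):

```python
import numpy as np

def purity(labels_pred, labels_true):
    """Purity: each cluster votes for its majority ground-truth class."""
    n = len(labels_true)
    total = 0
    for k in np.unique(labels_pred):
        members = labels_true[labels_pred == k]
        total += np.bincount(members).max()  # size of the majority class
    return total / n

def nmi(labels_pred, labels_true):
    """NMI with sqrt normalization, matching the n_i, n_j, n_ij formulation."""
    n = len(labels_true)
    clusters, classes = np.unique(labels_pred), np.unique(labels_true)
    mi = 0.0
    for i in clusters:
        for j in classes:
            n_ij = np.sum((labels_pred == i) & (labels_true == j))
            if n_ij == 0:
                continue
            n_i = np.sum(labels_pred == i)
            n_j = np.sum(labels_true == j)
            mi += (n_ij / n) * np.log(n * n_ij / (n_i * n_j))
    h_pred = -sum(np.sum(labels_pred == i) / n * np.log(np.sum(labels_pred == i) / n)
                  for i in clusters)
    h_true = -sum(np.sum(labels_true == j) / n * np.log(np.sum(labels_true == j) / n)
                  for j in classes)
    if h_pred <= 0 or h_true <= 0:
        return 0.0
    return mi / np.sqrt(h_pred * h_true)
```

A clustering that matches the ground truth up to a relabeling scores 1.0 on both metrics, while random assignments score near the bottom of the range.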
We ran the experiments under the same environment: Intel(R) Core(TM) i7-5930K CPU, 3.50 GHz, 64 GB memory, Ubuntu 14.04.5 LTS system, and Matlab version R2014b. We compared our algorithm with Spectral Clustering (SC), Anchor-based Graph Clustering (AGC), and Nyström Extension Clustering (NEC). The corresponding improved algorithms based on multiplicative update optimization are SC-I, NEC-I, and AGC-I. The affinity matrices of the above algorithms were constructed in three ways; a detailed description is presented in the next section.

Toy Example
We first explored the performance of our algorithm on three synthetic datasets to verify the effectiveness of the multiplicative update optimization and the two approximated affinity matrices. Three synthetic datasets were introduced in this experiment: Cluster in Cluster (CC), Two Spirals (TS), and Crescent Moon (CM). Figure 1 presents the manifold structure of the synthetic datasets in detail. These datasets contain 2000-40,000 data points divided into two groups, and they are extremely challenging, since clustering algorithms that only consider point-to-point distances have difficulty obtaining a robust result; algorithms based on spectral graph theory handle such manifold information more effectively. The solution for spectral clustering can be divided into two parts: affinity matrix construction and eigenvalue decomposition of the Laplacian matrix. In this paper, we use three formulations for affinity matrix construction: the original affinity over all pairs of points, the Nyström extension, and the anchor-based graph. Here, x denotes the whole set of samples and u the chosen data points; α is the parameter controlling the neighborhood of data points for the Euclidean distance, and we set α = 10. A is the affinity matrix of the anchors (chosen data points) and B stores the similarity between the anchors and the remaining points. d_ij denotes the distance between the ith data point and the jth anchor (which can be a chosen data point), with d_i1, d_i2, ..., d_im ordered from small to large. According to [27], the parameter k for the anchor-based graph was set to 10, which provides good performance in most cases. Note that the last two affinity matrices are approximations of the original affinity matrix. The sampling scale was set to 10, meaning we randomly selected one-tenth of the data points as the anchors or the chosen data points.
Compared with the traditional eigenvalue decomposition of the Laplacian matrix, the proposed multiplicative update optimization provides a more efficient solution. In our experiments, the number of iterations was about 150, which produced good results in most cases. Apart from the above-mentioned parameters, the other parameters of the compared algorithms and our improved algorithms were tuned to the optimum.
Tables 2-4 present the performance of the above six methods on three synthetic datasets. SC and SC-I provided a good clustering result since the corresponding affinity matrix considered the similarity of the whole data points; however, these two methods also needed more time to calculate the Euclidean distance among samples. Note that the proposed multiplicative update algorithm delivered a substantial efficiency increase, taking only half the time to get a similar clustering result. NEC and AGC had the benefit of the approximated affinity matrix and took only about one-tenth the time, but NEC was not robust enough to get a stable resolution of the eigenvalue decomposition. Compared with NEC, the improved algorithm NEC-I provided a better clustering result because of the orthonormal constraint and nonnegative constraint. AGC performed better than SC and NEC in terms of effectiveness and efficiency in the experiments, as it utilized the anchor-based affinity matrix, and the proposed AGC-I also had a good performance.

HSI Clustering Analysis
In this section, a further study is presented to illustrate the performance of the proposed multiplicative update algorithm and the efficiency of the approximated affinity matrices of Section 4 on several popular hyperspectral image datasets. We followed the experimental settings of the previous section, with α = 10 and k = 10. In addition, the parameter λ was set to 0.5, and the other parameters were tuned to the optimum for fair comparison. Note that the affinity matrix for the hyperspectral image datasets differs from the previous section because it must consider both the brightness values and the spatial information. In this case, the affinity matrix W can be rewritten as

W_ij = exp(−||x_i − x_j||^2 / (α σ̂_x^2)) · exp(−||l_i − l_j||^2 / (α σ̂_l^2)),

where l is the pixel location and the parameter α was set to 10 for both the brightness value and the spatial information. The affinity matrices A and B for NEC were constructed in the same way. Meanwhile, the affinity matrix for AGC uses the spatially regularized distance

d_ij = ||x_i − u_j||_2^2 + ||x̄_i − u_j||_2^2,

where x̄ is the mean of the brightness values around pixel x. Figure 2 and Table 5 present the experimental results, evaluated by Purity and NMI on the hyperspectral image datasets. We made the following observations:
• SC and the corresponding improved algorithm SC-I achieved competitive performance in terms of Purity and NMI. However, SC spent more time solving the eigenvalue decomposition of the Laplacian matrix, and our improved algorithm provided a more efficient solution thanks to the multiplicative update optimization. Meanwhile, processing Indian Pines took more time because of the rapid growth of the time complexity of eigenvalue decomposition with the increase in spatial resolution and number of classes.
Note that SC-I, which is based on the multiplicative update algorithm, slightly outperformed SC in terms of Purity and NMI, illustrating that the nonnegative constraint and the orthonormal constraint provided a better indicator matrix. This made it easier to get a robust clustering result by the later processing, such as k-means.
• NEC and AGC are two efficient improved algorithms, taking only about one-twentieth the time in our experiments. Moreover, NEC and AGC could be applied to large-scale hyperspectral image datasets such as KSC and Urban, whereas SC ran out of memory on these datasets because of the storage and time complexity of the affinity matrix. However, the experimental results also illustrate that NEC was not sufficiently robust, which may be because the affinity matrix A can be indefinite, so the required matrix inversion can introduce complex elements, making it difficult to obtain a robust clustering result with k-means. Apart from NEC, the other methods did not suffer from this problem, and they also outperformed NEC.

• The proposed NEC-I and AGC-I outperformed the other methods in terms of effectiveness and efficiency. NEC-I and AGC-I first take advantage of sampling techniques, including the Nyström extension and the anchor-based graph, which allow them to be used on large-scale hyperspectral image datasets. Furthermore, the proposed multiplicative update algorithm provides an efficient solution for the eigenvalue decomposition of the Laplacian matrix. The results presented in Table 5 illustrate that NEC-I and AGC-I performed better than NEC and AGC in most cases; the proposed multiplicative update optimization is flexible and integrates well with approximated affinity matrices such as the Nyström extension and the anchor-based graph.

Figure 3 lists the computational time on the three synthetic datasets. The methods listed in Figure 3 achieved similar clustering results with fewer than 10,000 data points, while SC and SC-I took more time than the other methods beyond 10,000 data points, and their computational time grew rapidly with the number of data points. The proposed improved algorithm SC-I took only about half the time with more than 30,000 data points. Compared with these two methods, NEC, AGC, and the corresponding improved algorithms NEC-I and AGC-I performed better in terms of computational time. Meanwhile, the affinity matrix constructed by the anchor-based graph outperformed the affinity matrix constructed by the Nyström extension, as the anchor-based graph provides a better way to measure the similarity of data points.

Conclusions
In this paper, we briefly reviewed the classical spectral clustering technique for unsupervised HSI classification and the two major problems in dealing with large-scale HSI datasets, namely affinity matrix construction and eigenvalue decomposition of the Laplacian matrix. First, we introduced two efficient affinity matrix approximation methods, the Nyström extension and the anchor-based graph, which sample a subset of HSI pixels. Furthermore, we proposed an efficient and effective multiplicative update algorithm to obtain a robust solution of the eigenvalue decomposition, and the experimental results illustrate that the combination of affinity matrix approximation and multiplicative update optimization outperformed the other methods. More importantly, the proposed improved algorithm provides an efficient solution for large-scale HSI classification that traditional spectral clustering cannot handle.