Article

Sketch-Based Subspace Clustering of Hyperspectral Images

1 Department of Telecommunications and Information Processing, TELIN-GAIM, Ghent University, 9000 Ghent, Belgium
2 State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan 430079, China
3 Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS 39762, USA
* Author to whom correspondence should be addressed.
Remote Sens. 2020, 12(5), 775; https://doi.org/10.3390/rs12050775
Submission received: 8 January 2020 / Revised: 22 February 2020 / Accepted: 24 February 2020 / Published: 29 February 2020
(This article belongs to the Special Issue Advances on Clustering Algorithms for Image Processing)

Abstract

Sparse subspace clustering (SSC) techniques provide the state of the art in clustering of hyperspectral images (HSIs). However, their computational complexity hinders their applicability to large-scale HSIs. In this paper, we propose a large-scale SSC-based method which can effectively process large HSIs while also achieving improved clustering accuracy compared to current SSC methods. We build our approach on the emerging concept of sketched subspace clustering, which, to our knowledge, has not yet been explored in hyperspectral imaging. Moreover, only scarce results exist on any large-scale SSC approach for HSIs. We show that a direct application of sketched SSC does not perform satisfactorily on HSIs, but it provides an excellent basis for an effective and elegant method that we build by extending this approach with a spatial prior and deriving the corresponding solver. In particular, a random matrix constructed by the Johnson-Lindenstrauss transform is first used to sketch the self-representation dictionary into a compact dictionary, which significantly reduces the number of sparse coefficients to be solved and thereby the overall complexity. In order to alleviate the effect of noise and within-class spectral variations of HSIs, we impose a total variation constraint on the coefficient matrix, which accounts for the spatial dependencies among neighbouring pixels. We derive an efficient solver for the resulting optimization problem and theoretically prove its convergence under mild conditions. Experimental results on real HSIs show a notable improvement over the traditional SSC-based methods and the state-of-the-art methods for clustering of large-scale images.

Graphical Abstract

1. Introduction

Hyperspectral images (HSIs), acquired by hyperspectral cameras, record the spectrum of materials over a wide range of wavelengths. The rich spectral information of HSIs enables discriminating materials that are often visually indistinguishable, which has led to numerous applications in remote sensing, such as target detection [1,2], environmental monitoring [3], geosciences, and defense and security [4]. It is often desired to categorize the pixels in the imaged scene into different classes, corresponding to different materials or different types of objects. When no training data are available, this task is called clustering; hence, clustering is also referred to as unsupervised classification.
The two most popular clustering methods are fuzzy c-means (FCM) [5] and k-means [6,7], owing to their simplicity and computational efficiency. They group data points by assigning each point to its nearest cluster centroid, with the centroids updated iteratively. However, their performance is sensitive to initialization and noise.
Recently, spectral clustering-based methods [8,9,10,11,12,13] have achieved great success and have been widely applied in various fields due to their excellent performance and robustness to noise [14]. In general, these methods first define a similarity matrix to construct a graph of data points, which is learned from the input data under different criteria. The resulting similarity matrix is then used within the spectral clustering framework. The performance of spectral clustering heavily depends on the similarity matrix [14]; hence, its construction is a crucial step. Many of these methods, such as local subspace affinity (LSA) [15], spectral local best-fit flats (SLBF) [16] and locally linear manifold clustering (LLMC) [17], build the similarity matrix with the k nearest neighbours (KNN) using the angle or Euclidean distance between data points. This approach tends to treat data points near the intersection of two subspaces erroneously, because their nearest neighbours often lie in the other subspace.
The recent sparse subspace clustering (SSC) method [11] constructs the similarity matrix based on the self-expressiveness model, where the input data is employed as the representation dictionary. SSC models a high-dimensional data space as a union of low-dimensional subspaces. The key insight is that, for each data point in a subspace $\mathcal{S}_i$, the global solution of the sparse coding problem with the self-representation dictionary automatically selects data points from the same subspace $\mathcal{S}_i$. Thus, each data point is automatically represented as a sparse linear or affine combination of other points in the same subspace. This is called the subspace preserving property and is explicitly expressed by the non-zero entries of the coefficient matrix $\mathbf{C}$: the $i$-th and $j$-th data points are in the same subspace if $C_{ij} \neq 0$. The coefficient matrix leads directly to the similarity matrix for spectral clustering.
As the SSC model calculates sparse coefficients individually and independently for each input data point, the clustering performance is sensitive to noise. To alleviate this problem, various extensions have been proposed that encode the spatial dependencies among neighbouring data points in hyperspectral data and thereby obtain more accurate similarity matrices and improved clustering results [18,19,20,21,22,23,24,25]. Guo et al. [18,19] focus on the clustering of 1-D drill-hole hyperspectral data and regularize the coefficients of data points that are neighbours in depth to be similar through an $\ell_1$-norm-based smoothing regularization. For 2-D spatial-wise hyperspectral images, a smoothing strategy was introduced in Reference [20] by minimizing the difference between the coefficients of the central pixel and the mean of the pixels in a local square window. A kernel version of SSC incorporating max pooling of the sparse coefficient matrix was presented in Reference [21]. The spectral-spatial SSC method of Reference [22] integrates an $\ell_2$ spatial regularizer with the SSC model (L2-SSC) to penalize abrupt differences between the coefficients of nearby pixels. In References [23,25], an $\ell_{1,2}$-norm constraint on the coefficients of pixels in each local region was incorporated into the SSC model. Based on collaborative representation with an $\ell_2$-norm constraint on the coefficients, a novel model with a locally adaptive dictionary was proposed in Reference [24].
While showing excellent performance, the above-mentioned methods also have considerable computational complexity resulting from iterative optimization. The time complexity per iteration is typically of the order of $O((MN)^3)$, where $M$ and $N$ are the numbers of rows and columns in each band. For large-scale HSIs with millions of pixels per band, this bound can thus exceed $10^{18}$ elementary operations per iteration, and such processing often becomes infeasible on common computing platforms. The approaches reported in References [26,27] addressed this problem by constructing a graph based on a set of selected representative samples. In combination with modified spectral clustering methods, a lower complexity is reached, but the clustering results are sensitive to the initially selected samples. Recently, some generalized large-scale methods [28,29,30] based on SSC have been proposed for clustering tasks in computer vision. In Reference [28], a scalable SSC method was designed for large-scale data sets, where a small subset of samples is first randomly selected and clustered with the SSC model, and the remaining samples are then clustered by sparse coding with respect to a dictionary constructed from the previously selected samples. The work in Reference [29] studied an efficient SSC model based on orthogonal matching pursuit (OMP) and discussed theoretical conditions for a subspace preserving representation. The recent sketched SSC model of Reference [30] lowers the computational burden of SSC by using a clever random projection technique to sketch and compress the input data to a computationally affordable level. While these large-scale SSC-based methods demonstrated success in real applications with facial images, handwritten text and news corpus data, to the best of our knowledge none of them has been applied to the clustering of HSIs before. Our experiments show that, despite the scalability of these methods, their clustering performance on HSIs turns out to be poor. This can be attributed to the complex spatial structure of HSIs, spectral noise and spectral variability.
In view of this, we propose a sketched sparse subspace clustering method with total variation (TV) spatial regularization, termed Sketch-SSC-TV, which can handle large-scale HSIs while achieving a high level of clustering performance. A sketching matrix constructed from a random matrix is first employed to build a sketched dictionary, which is much smaller than the self-representation dictionary, resulting in a significant reduction of the number of coefficients to be solved. By incorporating the spatial constraint as a TV norm on the coefficient matrix, the proposed model greatly promotes the connectivity of neighbouring pixels and improves the piecewise smoothness of the clustering maps. Furthermore, we propose an algorithm with theoretically guaranteed convergence to solve the resulting optimization problem. By adopting the sketching matrix, the complexity of the TV-related sub-problem reduces from $O((MN)^2\log(MN))$ to $O(MNn\log(MN))$ with $n \ll MN$, greatly facilitating the processing of large-scale data. The similarity matrix is constructed by applying KNN to the obtained coefficient matrix and is further employed within the spectral clustering method. Experiments conducted on four HSIs show superior clustering performance compared to both traditional SSC-based methods and related large-scale clustering methods. The major contributions of the paper can be summarized as follows.
  • The most important contribution of this paper is a new SSC-based framework that can be applied to large-scale HSIs while achieving excellent clustering accuracy. To the best of our knowledge, this is the first work to address the large-scale clustering problem of HSIs based on the SSC model.
  • Different from the traditional SSC-based methods, which use all the input data as a dictionary, we adopt a compressed dictionary obtained by a random projection technique to reduce the dictionary size, which effectively enables a scalable subspace clustering approach.
  • To account for the spatial dependencies among the neighbouring pixels, we incorporate a powerful TV regularization in our model, leading to a more discriminative coefficient matrix. The resulting model proves to be more robust to spectral noise and spectral variability.
  • We develop an efficient algorithm to solve the resulting optimization problem and prove its convergence property theoretically.
The rest of this paper is organized as follows. Section 2 briefly introduces the clustering of HSIs with the SSC model. Section 3 describes the proposed Sketch-SSC-TV model and the resulting optimization problem. Experimental results on real HSIs are presented in Section 4. Section 5 concludes the paper.

2. HSI Clustering with the SSC Model

Let a B-band HSI be denoted as $\mathbf{Y} \in \mathbb{R}^{B \times MN}$, where the $i$-th column $\mathbf{y}_i \in \mathbb{R}^{B}$ represents the spectral signature of the $i$-th pixel and $MN$ is the total number of pixels. Sparse subspace clustering (SSC) partitions the high-dimensional data space into a union of lower-dimensional subspaces. Concretely, it assumes that all high-dimensional data points $\mathbf{y}_i$, that is, the spectral signatures of all pixels of a given HSI $\mathbf{Y}$, are drawn from a union of subspaces, each of which corresponds to a particular class. The key idea is that among the infinitely many possibilities to represent a data point $\mathbf{y}_i$ in terms of other points, a sparse representation will select a few points that belong to the same subspace as $\mathbf{y}_i$. This is known as the subspace preserving property. Thus, SSC starts from a self-representation model where the input data matrix $\mathbf{Y}$ is employed as a dictionary, $\mathbf{Y} = \mathbf{Y}\mathbf{C}$, and infers the coefficient matrix $\mathbf{C} \in \mathbb{R}^{MN \times MN}$ by solving the sparse coding problem (requiring that $\mathbf{C}$ is sparse) while excluding the trivial solution in which each sample is simply represented by itself. The non-zero entries in $\mathbf{C}$ then indicate directly which data points lie within common subspaces. Formally, SSC solves the following optimization problem:
$$\arg\min_{\mathbf{C}} \; \|\mathbf{C}\|_1 + \frac{\lambda}{2}\|\mathbf{Y}-\mathbf{Y}\mathbf{C}\|_F^2 \quad \mathrm{s.t.} \quad \mathrm{diag}(\mathbf{C}) = \mathbf{0}, \; \mathbf{1}^T\mathbf{C} = \mathbf{1}^T, \tag{1}$$
where $\|\mathbf{C}\|_1 = \sum_{i,j}|C_{ij}|$; $\mathbf{1}$ is an all-one vector; $\mathrm{diag}(\mathbf{C})$ denotes the diagonal of $\mathbf{C}$; and $\lambda$ is a parameter which controls the balance between the data fidelity and the sparsity of the coefficient matrix. The constraint $\mathrm{diag}(\mathbf{C}) = \mathbf{0}$ is introduced to avoid the trivial solution of representing a sample by itself, and the second constraint $\mathbf{1}^T\mathbf{C} = \mathbf{1}^T$ ensures that each data point is an affine combination of other data points.
The problem in (1) can be solved by the ADMM algorithm [31], with a time complexity of $O((MN)^2 B + (MN)^3(I+1))$, where $I$ is the number of iterations. The coefficient matrix $\mathbf{C}$ yields directly the dependence structure among the data points: a non-zero entry $C_{ij}$ indicates that the samples $\mathbf{y}_i$ and $\mathbf{y}_j$ are in the same class. Thus, it is reasonable to construct the similarity matrix $\mathbf{W} \in \mathbb{R}^{MN \times MN}$ as
$$\mathbf{W} = |\mathbf{C}| + |\mathbf{C}|^T, \tag{2}$$
where $|\mathbf{C}|$ takes the absolute values of the entries of $\mathbf{C}$. The symmetric structure of $\mathbf{W}$ ensures that each pair of samples is connected whenever either one is selected to represent the other, which results in a strengthened connection of the graph. The similarity matrix $\mathbf{W}$ is then used as an input to spectral clustering [32] to produce the clustering result. Specifically, the Laplacian matrix $\mathbf{L} \in \mathbb{R}^{MN \times MN}$ is first formed by
$$\mathbf{L} := \mathbf{D} - \mathbf{W}, \tag{3}$$
where $\mathbf{D} \in \mathbb{R}^{MN \times MN}$ is a diagonal matrix with $D_{ii} = \sum_j W_{ij}$ [33]. Then the $c$ eigenvectors $\{\mathbf{v}_k\}_{k=1}^{c}$ of $\mathbf{L}$ corresponding to its $c$ smallest non-zero eigenvalues are calculated via singular-value decomposition (SVD). Finally, the clustering result is obtained by running k-means clustering on the $MN \times c$ matrix $\mathbf{V} = [\mathbf{v}_1, \ldots, \mathbf{v}_c]$.
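For illustration, the following is a minimal sketch of this spectral clustering step (the experiments in this paper were run in MATLAB; here Python with NumPy, SciPy and scikit-learn is assumed, and the function name is ours). For simplicity it uses the eigenvectors of the $c$ smallest eigenvalues of $\mathbf{L}$ rather than the smallest non-zero ones.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(W, c, seed=0):
    """Sketch of the pipeline in Eqs. (2)-(3): Laplacian, eigenvectors, k-means."""
    D = np.diag(W.sum(axis=1))        # degree matrix
    L = D - W                         # unnormalised graph Laplacian, Eq. (3)
    # eigenvectors of the c smallest eigenvalues (L is symmetric)
    _, V = eigh(L, subset_by_index=[0, c - 1])
    # k-means on the rows of the (MN x c) eigenvector matrix V
    return KMeans(n_clusters=c, n_init=10, random_state=seed).fit_predict(V)
```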

3. Sketch-SSC-TV Model for HSIs

In this section, we first introduce our new SSC-based model with TV regularization (SSC-TV), to effectively account for the spatial dependencies between the input data points. Next, we incorporate a random sketching technique into this model, leading to our unified Sketch-SSC-TV model for large-scale HSIs. Finally, we develop an efficient optimization algorithm for the resulting model based on ADMM.

3.1. The SSC-TV Model

HSIs not only record the spectrum of materials in the spectral domain but also capture the distribution of ground objects in the spatial domain. As the distribution of ground materials typically shows some continuity, HSIs are composed of various nearly homogeneous regions made of pixels that belong to the same class with a very high probability [34,35,36,37,38,39,40]. For this reason, the spectral signatures of pixels within small local regions are typically very similar. Conversely, pixels belonging to different classes are more likely to occupy different spatial locations and to exhibit significantly different spectral characteristics. HSI clustering assigns pixels to distinct groups according to their spectral similarities, such that the pixels from the same group are more similar to each other than to those from other groups. Here, each cluster is viewed as a subspace. Thus, by conducting subspace clustering, the pixels of HSIs in local homogeneous regions are likely to be grouped together in the same cluster. At the same time, pixels showing significant spectral differences, which are typically also spatially separated, are assigned to different clusters. In this way, subspace clustering results in a meaningful interpretation of the spatial content of HSIs. Ideally, the subspaces in the spectral domain thus correspond to the cluster structure in the spatial domain. However, due to noise and spectral variability, the actual results of the subspace clustering model differ from the ideal cluster structure and do not agree perfectly with the spatial content. Such sensitivity to noise and to spectral variability is inherent to all methods that perform pixel-wise processing, and thus also to the SSC model in (1), where sparse coefficients are calculated independently for each pixel. Random variations in the recorded spectral responses affect the solution of the sparse coding problem such that, in the resulting sparse representation, some data points may be represented by data points from different subspaces. This degrades the construction of the similarity matrix, thereby deteriorating the spectral clustering performance. We aim to alleviate this problem by imposing a spatial constraint that makes the model less sensitive to random spectral variations of individual pixels.
To accommodate the fact that pixels within a local homogeneous region are likely to belong to the same class, we explicitly require that the sparse coefficients of nearby pixels be mutually similar, that is, that they select similar sets of pixels in the subspace-sparse representation. Formally, this means that the coefficient matrix $\mathbf{C}$ exhibits a certain local smoothness. Recall that pixels $\mathbf{y}_i$ and $\mathbf{y}_j$ are likely to belong to the same class if $C_{ij} \neq 0$. In reality, $C_{ij}$ is rarely exactly 0, but the larger $C_{ij}$ is, the more likely it is that $\mathbf{y}_i$ and $\mathbf{y}_j$ are from the same class. Ideally, $\mathbf{y}_i$, as an atom in the dictionary, only contributes to the representation of pixels in the same class. Since neighbouring pixels from a local region of the input image $\mathbf{Y}$ usually belong to the same class, in the ideal case all of them are likely to select the same atoms in the subspace representation. Hence, any row of $\mathbf{C}$, $\mathbf{c}^i = [C_{i1}, C_{i2}, \ldots, C_{iMN}]$, composed of the coefficients that correspond to an atom $\mathbf{y}_i$, will reflect some aspect of the spatial structure of the HSI. In other words, the ideal coefficients should reflect the local smoothness and discontinuities that are present in the original HSI, as shown in Figure 1, where each $\mathbf{c}^i$ is reshaped to an $M \times N$ 2-D slice. This motivates us to introduce a TV spatial regularization on the sparse coefficients, which effectively promotes piece-wise smoothness while preserving sharp transitions between distinct regions.
Let $\mathbf{x} \in \mathbb{R}^{MN}$ denote a vector of raster-scanned pixel values from a grayscale image of size $M \times N$ and define the anisotropic TV norm (an alternative isotropic TV norm formulation is $\|\mathbf{x}\|_{TV} = \sum_{i=1}^{MN}\sqrt{[(\mathbf{H}_x\mathbf{x})_i]^2 + [(\mathbf{H}_y\mathbf{x})_i]^2}$, where $(\cdot)_i$ is the $i$-th element of a vector) as
$$\|\mathbf{x}\|_{TV} = \|\mathbf{H}_x\mathbf{x}\|_1 + \|\mathbf{H}_y\mathbf{x}\|_1, \tag{4}$$
where $\mathbf{H}_x$ and $\mathbf{H}_y$ are the forward finite-difference operators in the horizontal and vertical directions, respectively, with periodic boundary conditions.
For the 2-D matrix $\mathbf{Y}$ reshaped from a 3-D HSI cube $\mathcal{Y} \in \mathbb{R}^{M \times N \times B}$, the corresponding TV norm is formulated as
$$\|\mathbf{Y}\|_{TV} = \|\mathbf{H}_x\mathbf{Y}^T\|_1 + \|\mathbf{H}_y\mathbf{Y}^T\|_1. \tag{5}$$
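As an illustration of the definition in (4), the following minimal sketch (Python/NumPy assumed; the function name is ours) evaluates the anisotropic TV norm of one row of the coefficient matrix reshaped to an $M \times N$ slice, using forward differences with periodic boundary conditions.

```python
import numpy as np

def anisotropic_tv(c_row, M, N):
    """Anisotropic TV norm of Eq. (4) for one raster-scanned M x N slice."""
    X = c_row.reshape(M, N)
    dx = np.roll(X, -1, axis=1) - X   # horizontal forward differences (periodic)
    dy = np.roll(X, -1, axis=0) - X   # vertical forward differences (periodic)
    return np.abs(dx).sum() + np.abs(dy).sum()
```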
Now we incorporate the spatial constraint into the SSC model. In particular, we impose the TV norm as defined above on the sparse coefficient matrix C , and formulate our SSC-TV model as
$$\arg\min_{\mathbf{C}} \; \frac{1}{2}\|\mathbf{Y}-\mathbf{Y}\mathbf{C}\|_F^2 + \lambda\|\mathbf{C}\|_1 + \lambda_{tv}\|\mathbf{C}\|_{TV} \quad \mathrm{s.t.} \quad \mathrm{diag}(\mathbf{C}) = \mathbf{0}, \; \mathbf{1}^T\mathbf{C} = \mathbf{1}^T, \tag{6}$$
where $\lambda$ and $\lambda_{tv}$ are two penalty parameters corresponding to the sparsity and the spatial constraint, respectively. As with the standard SSC, the similarity matrix is obtained from $\mathbf{C}$ by applying (2) and fed into spectral clustering. The TV norm imposed on $\mathbf{C}$ promotes the local smoothness of the resulting subspace-sparse representation, which encourages neighbouring pixels to select a common set of pixels from the same class. Since pixels belonging to the same class tend to be spatially clustered as well (within one or multiple local regions), this locally smooth coefficient structure also leads to an improved agreement of the resulting spectral clustering with the underlying spatial structure.

3.2. The Sketch-SSC-TV Model for Large-Scale HSIs

The problem at this point is to solve for the sparse coefficient matrix $\mathbf{C}$ in the cost function (6). However, as the number of pixels in HSIs, $MN$, is typically very large, the matrix $\mathbf{C} \in \mathbb{R}^{MN \times MN}$ in (6) is huge. The optimization problem of the SSC-TV model therefore cannot be efficiently solved in practice due to its prohibitively high computational complexity. The traditional SSC-based methods [11,20,21,22,23,41,42] suffer from the same problem. One key obstacle is that, within the ADMM algorithm, they have to calculate and store in memory the inverse of the entire large matrix $(\mathbf{Y}^T\mathbf{Y} + \mu\mathbf{I}) \in \mathbb{R}^{MN \times MN}$, whose time complexity reaches $O((MN)^3)$, which is infeasible for large-scale data sets. In addition, for the TV-regularized model in (6), the complexity of solving the subproblem with respect to the TV norm is $O((MN)^2\log(MN))$, which further increases the computational burden. Despite the effectiveness of the TV norm in tasks such as HSI unmixing [43,44], superresolution [45] and denoising [46,47,48], exploiting TV regularization in the SSC model is impractical, especially for large-scale HSIs. In the following, the sketched SSC (Sketch-SSC) method [30] designed for large-scale data sets is first introduced, and then our Sketch-SSC-TV model is presented.

3.2.1. The Sketch-SSC Model

The recently proposed Sketch-SSC method [30], which was explored in the context of computer vision, employs a random projection matrix $\mathbf{R} \in \mathbb{R}^{MN \times n}$ to sketch the input data, compressing the self-representation dictionary $\mathbf{Y}$ in (1) to a compact one $\mathbf{D} \in \mathbb{R}^{B \times n} := \mathbf{Y}\mathbf{R}$. The objective function of Sketch-SSC with respect to the sparse coefficient matrix $\mathbf{A} \in \mathbb{R}^{n \times MN}$ can be formulated as
$$\arg\min_{\mathbf{A}} \; \|\mathbf{A}\|_1 + \frac{\lambda}{2}\|\mathbf{Y}-\mathbf{D}\mathbf{A}\|_F^2. \tag{7}$$
By using the random sketching matrix R , the number of optimization variables in the sparse matrix is significantly reduced, making the Sketch-SSC model applicable to large-scale data sets. We illustrate this pictorially in Figure 2. After obtaining the sparse matrix A , the similarity matrix W is built via the KNN graph of A for spectral clustering.
The random matrix $\mathbf{R} \in \mathbb{R}^{MN \times n}$ used here is known as a Johnson-Lindenstrauss transform (JLT), which can compress $\mathbf{Y}$ to a very small dictionary while preserving the major information in $\mathbf{Y}$. The typically used JLTs are matrices with independent and identically distributed (i.i.d.) $\pm 1$ entries multiplied by $1/\sqrt{n}$ [49]. It was proved in Reference [30] that, with a properly selected sketching matrix $\mathbf{R}$, the compressed dictionary $\mathbf{D}$ has an expressive capability equal to that of $\mathbf{Y}$, since it preserves the column space of $\mathbf{Y}$ with high probability.
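A minimal sketch of this sketching step, under the assumption of a $\pm 1$ JLT scaled by $1/\sqrt{n}$ as described above (Python/NumPy; the function name is ours), could look as follows.

```python
import numpy as np

def sketch_dictionary(Y, n, seed=0):
    """Compress the self-representation dictionary Y (B x MN) to D = Y R (B x n)."""
    rng = np.random.default_rng(seed)
    MN = Y.shape[1]
    # JLT matrix with i.i.d. +/-1 entries scaled by 1/sqrt(n)
    R = rng.choice([-1.0, 1.0], size=(MN, n)) / np.sqrt(n)
    D = Y @ R
    return D, R
```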

3.2.2. The Sketch-SSC-TV Model

By using the sketching technique in Reference [30], we convert the SSC-TV model in (6) to the following Sketch-SSC-TV model:
$$\arg\min_{\mathbf{A}} \; \frac{1}{2}\|\mathbf{Y}-\mathbf{D}\mathbf{A}\|_F^2 + \lambda\|\mathbf{A}\|_1 + \lambda_{tv}\|\mathbf{A}\|_{TV}. \tag{8}$$
Compared with (6), the self-representation dictionary $\mathbf{Y}$ is replaced with the sketched $\mathbf{D}$, and the constraint $\mathrm{diag}(\mathbf{C}) = \mathbf{0}$ is no longer necessary because $\mathbf{I}$ is not a trivial solution of (8). For simplicity, we also remove the affine subspace constraint $\mathbf{1}^T\mathbf{C} = \mathbf{1}^T$. $\mathbf{D}$ serves as the basis representing the whole data set, and the pixels of the HSI now lie in the union of subspaces described by $\mathbf{D}$. Similarly to the self-representation method SSC, the coefficients with respect to $\mathbf{D}$ should preserve the smoothness of pixel values in local image regions. Since neighbouring pixels are often in the same class, they ideally select the same or similar sets of atoms of $\mathbf{D}$, which together constitute that particular class. As for the computational complexity, the heaviest part of the traditional SSC-based methods, inverting $(\mathbf{Y}^T\mathbf{Y} + \mu\mathbf{I}) \in \mathbb{R}^{MN \times MN}$, is replaced in (8) with inverting $(\mathbf{D}^T\mathbf{D} + \mu\mathbf{I}) \in \mathbb{R}^{n \times n}$, which reduces the complexity from $O((MN)^3)$ to $O(n^3)$. In addition, the complexity of the solver for the TV term is reduced from $O((MN)^2\log(MN))$ in (6) to $O(MNn\log(MN))$ in (8). Note that $n$ is much smaller than $MN$. Our experimental results shown later indicate that there is no obvious performance improvement when $n$ exceeds 100, and thus $n$ can be empirically set to a value around 100, which can be more than a thousand times smaller than $MN$ in large-scale HSIs. Therefore, the computational complexity of the Sketch-SSC-TV model can be reduced significantly.
We solve the resulting model with the ADMM algorithm, as described in the following subsection. After obtaining the sparse coefficient matrix $\mathbf{A}$, we cannot apply it directly to construct the similarity matrix in the same way as the traditional SSC-based methods, since the size of $\mathbf{A}$ is $n \times MN$ and it does not explicitly indicate the connections between input data points.
Here we use a KNN graph to build the similarity matrix from the sparse matrix $\mathbf{A}$. For each column $\mathbf{a}_i$ of $\mathbf{A}$, the first $k$ nearest neighbours in Euclidean distance are located, denoted as $N_k(\mathbf{a}_i)$. Then the similarity matrix $\mathbf{W}$ is calculated as
$$W_{ij} = \begin{cases} w_{ij} & \mathbf{a}_i \in N_k(\mathbf{a}_j) \ \text{or} \ \mathbf{a}_j \in N_k(\mathbf{a}_i), \\ 0 & \text{otherwise}, \end{cases} \tag{9}$$
where $w_{ij}$ is obtained with a Gaussian kernel function:
$$w_{ij} = e^{-\frac{\|\mathbf{a}_i-\mathbf{a}_j\|_2^2}{2\sigma^2}}. \tag{10}$$
For large-scale HSIs, the construction of the KNN graph may incur a high computational burden; however, various methods [50,51,52] can be used to speed up this procedure. The obtained sparse similarity matrix $\mathbf{W}$ serves as an input to the spectral clustering framework to produce the clustering result. The complete procedure of the proposed Sketch-SSC-TV method is summarised in Algorithm 1, and a minimal code sketch of this graph construction is given after the algorithm.
Algorithm 1 The complete procedure of the proposed Sketch-SSC-TV method
1: Input: An input matrix $\mathbf{Y} \in \mathbb{R}^{B \times MN}$, $\mathbf{D} \in \mathbb{R}^{B \times n}$, $\lambda$, $\lambda_{tv}$, $k$, $\sigma^2$ and $c$.
2: Calculate A by solving (8).
3: Construct W using (9).
4: Plug W into spectral clustering.
5: Output: A clustering map.
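As announced above, the following is a minimal sketch of the KNN-graph construction of (9) and (10) used in steps 3–4 of Algorithm 1 (Python with NumPy and scikit-learn assumed; the function name is ours). The dense matrix is for clarity only; large HSIs would require a sparse representation and the accelerated neighbour search of [50,51,52].

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_similarity(A, k, sigma2):
    """Similarity matrix of Eqs. (9)-(10) from the coefficient matrix A (n x MN)."""
    X = A.T                                          # one row per pixel
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nbrs.kneighbors(X)                   # neighbour 0 is the point itself
    MN = X.shape[0]
    W = np.zeros((MN, MN))
    for i in range(MN):
        # Gaussian-kernel weights of Eq. (10) for the k nearest neighbours of pixel i
        W[i, idx[i, 1:]] = np.exp(-dist[i, 1:] ** 2 / (2.0 * sigma2))
    return np.maximum(W, W.T)                        # keep w_ij if i in N_k(j) or j in N_k(i)
```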

3.3. Optimization

In order to solve model (8), three auxiliary variables $\mathbf{B}, \mathbf{Z} \in \mathbb{R}^{n \times MN}$ and $\mathbf{U} \in \mathbb{R}^{2MN \times n}$ are introduced, and model (8) becomes
$$\arg\min_{\mathbf{B},\mathbf{A},\mathbf{Z},\mathbf{U}} \; \frac{1}{2}\|\mathbf{Y}-\mathbf{D}\mathbf{B}\|_F^2 + \lambda\|\mathbf{Z}\|_1 + \lambda_{tv}\|\mathbf{U}\|_1 \quad \mathrm{s.t.} \quad \mathbf{A} = \mathbf{B}, \; \mathbf{A} = \mathbf{Z}, \; \mathbf{H}\mathbf{A}^T = \mathbf{U}, \tag{11}$$
where $\mathbf{H} = [\mathbf{H}_x; \mathbf{H}_y]$ is the TV operator acting in the spatial directions of the HSI.
Based on the efficient ADMM algorithm, the optimization problem (11) can be solved by minimizing the resulting augmented Lagrangian function as:
$$\begin{aligned} \mathcal{L}(\mathbf{B},\mathbf{A},\mathbf{Z},\mathbf{U},\mathbf{Y}_1,\mathbf{Y}_2,\mathbf{Y}_3) = {} & \frac{1}{2}\|\mathbf{Y}-\mathbf{D}\mathbf{B}\|_F^2 + \lambda\|\mathbf{Z}\|_1 + \lambda_{tv}\|\mathbf{U}\|_1 + \langle\mathbf{Y}_1, \mathbf{A}-\mathbf{B}\rangle + \langle\mathbf{Y}_2, \mathbf{A}-\mathbf{Z}\rangle + \langle\mathbf{Y}_3, \mathbf{H}\mathbf{A}^T-\mathbf{U}\rangle \\ & + \frac{\mu}{2}\|\mathbf{A}-\mathbf{B}\|_F^2 + \frac{\mu}{2}\|\mathbf{A}-\mathbf{Z}\|_F^2 + \frac{\mu}{2}\|\mathbf{H}\mathbf{A}^T-\mathbf{U}\|_F^2, \end{aligned} \tag{12}$$
where $\mathbf{Y}_1, \mathbf{Y}_2 \in \mathbb{R}^{n \times MN}$ and $\mathbf{Y}_3 \in \mathbb{R}^{2MN \times n}$ are the Lagrange multipliers, and $\mu$ is a weighting parameter. To this end, the following subproblems are solved iteratively. In each subproblem, one variable is updated while the others are kept fixed.

3.3.1. Update B

The objective function with respect to B is given by:
$$\mathbf{B}^{r+1} = \arg\min_{\mathbf{B}} \; \frac{1}{2}\|\mathbf{Y}-\mathbf{D}\mathbf{B}\|_F^2 + \frac{\mu}{2}\Big\|\mathbf{A}^{r}-\mathbf{B}+\frac{\mathbf{Y}_1^{r}}{\mu}\Big\|_F^2. \tag{13}$$
The solution can be obtained by setting the first-order derivative to zero:
$$\mathbf{B}^{r+1} = (\mathbf{D}^T\mathbf{D} + \mu\mathbf{I})^{-1}(\mathbf{D}^T\mathbf{Y} + \mu\mathbf{A}^{r} + \mathbf{Y}_1^{r}). \tag{14}$$
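A minimal sketch of this closed-form update (Python/NumPy assumed; the function name is ours) is given below; note that the system matrix is only $n \times n$, which is the main source of the computational savings.

```python
import numpy as np

def update_B(Y, D, A, Y1, mu):
    """Closed-form B-update of Eq. (14): solve (D^T D + mu I) B = D^T Y + mu A + Y1."""
    n = D.shape[1]
    lhs = D.T @ D + mu * np.eye(n)     # n x n system matrix
    rhs = D.T @ Y + mu * A + Y1
    return np.linalg.solve(lhs, rhs)
```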

3.3.2. Update A

The objective function with respect to A is given by:
$$\mathbf{A}^{r+1} = \arg\min_{\mathbf{A}} \; \frac{1}{2}\Big\|\mathbf{A}-\mathbf{B}^{r+1}+\frac{\mathbf{Y}_1^{r}}{\mu}\Big\|_F^2 + \frac{1}{2}\Big\|\mathbf{A}-\mathbf{Z}^{r}+\frac{\mathbf{Y}_2^{r}}{\mu}\Big\|_F^2 + \frac{1}{2}\Big\|\mathbf{H}\mathbf{A}^T-\mathbf{U}^{r}+\frac{\mathbf{Y}_3^{r}}{\mu}\Big\|_F^2. \tag{15}$$
By setting the first-order derivative to zero, we can obtain
$$\mathbf{A}(\mathbf{H}^T\mathbf{H} + 2\mathbf{I}) = \mathbf{Z}^{r} + \mathbf{B}^{r+1} - \frac{\mathbf{Y}_1^{r}}{\mu} - \frac{\mathbf{Y}_2^{r}}{\mu} + \Big(\mathbf{U}^{rT} - \frac{\mathbf{Y}_3^{rT}}{\mu}\Big)\mathbf{H}. \tag{16}$$
As $\mathbf{H}$ acts as a convolution on each row of $\mathbf{A}$ (reshaped to an $M \times N$ image), the above problem can be solved efficiently with the fast Fourier transform (FFT):
$$\mathbf{A}^{r+1} = \mathcal{F}^{-1}\left[\frac{\mathbf{G}}{2 + |\mathcal{F}(\mathbf{H}_x)|^2 + |\mathcal{F}(\mathbf{H}_y)|^2}\right], \tag{17}$$
where $\mathbf{G} = \mathcal{F}\big(\mathbf{Z}^{r} + \mathbf{B}^{r+1} - \mathbf{Y}_1^{r}/\mu - \mathbf{Y}_2^{r}/\mu + (\mathbf{U}^{rT} - \mathbf{Y}_3^{rT}/\mu)\mathbf{H}\big)$, and $\mathcal{F}(\cdot)$ and $\mathcal{F}^{-1}(\cdot)$ denote the FFT and the inverse FFT, respectively.
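The following is a minimal sketch of this FFT-based update (Python/NumPy assumed; the function name is ours), under the assumption stated above that $\mathbf{H}_x$ and $\mathbf{H}_y$ act as periodic forward-difference convolutions on each row of $\mathbf{A}$ reshaped to an $M \times N$ image.

```python
import numpy as np

def update_A(rhs, M, N):
    """FFT-based A-update of Eq. (17): solve A (H^T H + 2 I) = rhs row by row."""
    # point-spread functions of the forward-difference filters H_x and H_y
    hx = np.zeros((M, N)); hx[0, 0] = -1.0; hx[0, 1 % N] += 1.0
    hy = np.zeros((M, N)); hy[0, 0] = -1.0; hy[1 % M, 0] += 1.0
    denom = 2.0 + np.abs(np.fft.fft2(hx)) ** 2 + np.abs(np.fft.fft2(hy)) ** 2
    A = np.empty_like(rhs)
    for i, row in enumerate(rhs):                    # rhs has shape (n, M*N)
        G = np.fft.fft2(row.reshape(M, N))           # Fourier transform of one row
        A[i] = np.real(np.fft.ifft2(G / denom)).ravel()
    return A
```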

3.3.3. Update Z

The objective function with respect to Z is given by:
$$\mathbf{Z}^{r+1} = \arg\min_{\mathbf{Z}} \; \lambda\|\mathbf{Z}\|_1 + \frac{\mu}{2}\Big\|\mathbf{A}^{r+1}-\mathbf{Z}+\frac{\mathbf{Y}_2^{r}}{\mu}\Big\|_F^2. \tag{18}$$
By introducing the following soft-thresholding operator:
$$\mathcal{R}_{\tau}(x) = \begin{cases} \mathrm{sgn}(x)\,(|x|-\tau) & |x| > \tau, \\ 0 & \text{otherwise}, \end{cases} \tag{19}$$
the problem in (18) can be solved by
$$\mathbf{Z}^{r+1} = \mathcal{R}_{\frac{\lambda}{\mu}}\Big(\mathbf{A}^{r+1}+\frac{\mathbf{Y}_2^{r}}{\mu}\Big). \tag{20}$$
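For completeness, a minimal sketch of the soft-thresholding operator in (19) and its use in the $\mathbf{Z}$-update (20) (Python/NumPy assumed; the names are ours):

```python
import numpy as np

def soft_threshold(X, tau):
    """Element-wise soft-thresholding operator R_tau of Eq. (19)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

# Z-update of Eq. (20), assuming A_next, Y2, mu and lam are already defined:
# Z_next = soft_threshold(A_next + Y2 / mu, lam / mu)
```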

3.3.4. Update U

The objective function with respect to U is given by:
$$\mathbf{U}^{r+1} = \arg\min_{\mathbf{U}} \; \lambda_{tv}\|\mathbf{U}\|_1 + \frac{\mu}{2}\Big\|\mathbf{H}\mathbf{A}^{(r+1)T}-\mathbf{U}+\frac{\mathbf{Y}_3^{r}}{\mu}\Big\|_F^2. \tag{21}$$
Similarly, U can be updated by
$$\mathbf{U}^{r+1} = \mathcal{R}_{\frac{\lambda_{tv}}{\mu}}\Big(\mathbf{H}\mathbf{A}^{(r+1)T}+\frac{\mathbf{Y}_3^{r}}{\mu}\Big). \tag{22}$$

3.3.5. Update Other Parameters

The next step is to update the multipliers $\mathbf{Y}_1$, $\mathbf{Y}_2$ and $\mathbf{Y}_3$ by
$$\begin{aligned} \mathbf{Y}_1^{r+1} &= \mathbf{Y}_1^{r} + \mu(\mathbf{A}^{r+1}-\mathbf{B}^{r+1}), \\ \mathbf{Y}_2^{r+1} &= \mathbf{Y}_2^{r} + \mu(\mathbf{A}^{r+1}-\mathbf{Z}^{r+1}), \\ \mathbf{Y}_3^{r+1} &= \mathbf{Y}_3^{r} + \mu(\mathbf{H}\mathbf{A}^{(r+1)T}-\mathbf{U}^{r+1}). \end{aligned} \tag{23}$$
The above five steps are iterated until the stopping criterion is satisfied, that is, $\|\mathbf{A}^{r+1}-\mathbf{B}^{r+1}\| < \epsilon$, $\|\mathbf{A}^{r+1}-\mathbf{Z}^{r+1}\| < \epsilon$ and $\|\mathbf{H}\mathbf{A}^{(r+1)T}-\mathbf{U}^{r+1}\| < \epsilon$, or $r > MaxIter$, where $MaxIter$ is the predefined maximum number of iterations. Algorithm 2 summarizes the whole optimization procedure of the Sketch-SSC-TV model.
Algorithm 2 ADMM for solving the Sketch-SSC-TV model
1: Input: $\mathbf{Y}$, $\mathbf{R}$, $\lambda$ and $\lambda_{tv}$.
2: Initialize: $\mathbf{A} = \mathbf{0}$, $\mathbf{Z} = \mathbf{0}$, $\mathbf{U} = \mathbf{0}$, $\mathbf{Y}_1 = \mathbf{0}$, $\mathbf{Y}_2 = \mathbf{0}$, $\mathbf{Y}_3 = \mathbf{0}$, $\epsilon = 10^{-5}$, $MaxIter = 100$
3: Do
4: Update B by (14).
5: Update A by (17).
6: Update Z by (18).
7: Update U by (22).
8: Update other parameters by (23).
9: While ($\|\mathbf{A}^{r+1}-\mathbf{B}^{r+1}\| > \epsilon$ or $\|\mathbf{A}^{r+1}-\mathbf{Z}^{r+1}\| > \epsilon$ or $\|\mathbf{H}\mathbf{A}^{(r+1)T}-\mathbf{U}^{r+1}\| > \epsilon$) and $r \leq MaxIter$
10: Output: Sparse matrix Z .

3.4. Convergence Analysis

The convergence of the ADMM algorithm has been theoretically proven for the case in which two blocks of variables are updated alternately [31,53,54]. However, it is difficult to guarantee the convergence of ADMM with more than two blocks [55]. In our problem (11), there are four blocks of variables $\{\mathbf{B}, \mathbf{A}, \mathbf{Z}, \mathbf{U}\}$. We show a weak convergence property of our algorithm by proving that the solution obtained by Algorithm 2 converges to a Karush-Kuhn-Tucker (KKT) point under some mild conditions. We refer to these conditions as “mild” because they are fulfilled most of the time in practice, which is evidenced by the experimental results shown later. This weak convergence property is stated in Theorem 1 below. We first introduce a lemma from Reference [56] that we will need in proving this theorem.
Lemma 1
([56]). Let $\mathcal{X}$ be a real Hilbert space endowed with an inner product $\langle\cdot,\cdot\rangle$ and a norm $\|\cdot\|$ with dual norm $\|\cdot\|_{dual}$, and let $\mathbf{y} \in \partial\|\mathbf{x}\|$, where $\partial f(\cdot)$ is the subgradient of $f(\cdot)$. Then $\|\mathbf{y}\|_{dual} = 1$ if $\mathbf{x} \neq \mathbf{0}$, and $\|\mathbf{y}\|_{dual} \leq 1$ if $\mathbf{x} = \mathbf{0}$.
Theorem 1.
Let $\{\Gamma^r = (\mathbf{B}^r, \mathbf{A}^r, \mathbf{Z}^r, \mathbf{U}^r, \mathbf{Y}_1^r, \mathbf{Y}_2^r, \mathbf{Y}_3^r)\}_{r=1}^{\infty}$ be the sequence generated by Algorithm 2. If $\lim_{r\to\infty}\mu(\mathbf{Z}^{r+1}-\mathbf{Z}^{r}) = \mathbf{0}$ and $\lim_{r\to\infty}\mu(\mathbf{U}^{(r+1)T}-\mathbf{U}^{rT}) = \mathbf{0}$, then the sequence $\{\Gamma^r\}_{r=1}^{\infty}$ is bounded, its accumulation point $\Gamma^* = (\mathbf{B}^*, \mathbf{A}^*, \mathbf{Z}^*, \mathbf{U}^*, \mathbf{Y}_1^*, \mathbf{Y}_2^*, \mathbf{Y}_3^*)$ satisfies the KKT conditions, and the sequence $\{\Gamma^r\}_{r=1}^{\infty}$ converges to a KKT point.
Proof. 
We first prove the boundedness of the variable sequences $\{\mathbf{B}^r, \mathbf{A}^r, \mathbf{Z}^r, \mathbf{U}^r, \mathbf{Y}_1^r, \mathbf{Y}_2^r, \mathbf{Y}_3^r\}$. With the definition of $\mathcal{L}(\cdot)$ in (12) and the solver of the $\mathbf{U}$-subproblem (22), we obtain
$$\mathbf{0} \in \partial_{\mathbf{U}}\mathcal{L}(\mathbf{B}^{r+1},\mathbf{A}^{r+1},\mathbf{Z}^{r+1},\mathbf{U},\mathbf{Y}_1^{r},\mathbf{Y}_2^{r},\mathbf{Y}_3^{r})\big|_{\mathbf{U}=\mathbf{U}^{r+1}} = \lambda_{tv}\,\partial\|\mathbf{U}^{r+1}\|_1 - \mu\Big(\mathbf{H}\mathbf{A}^{(r+1)T}-\mathbf{U}^{r+1}+\frac{\mathbf{Y}_3^{r}}{\mu}\Big) = \lambda_{tv}\,\partial\|\mathbf{U}^{r+1}\|_1 - \mathbf{Y}_3^{r+1}, \tag{24}$$
where $\partial_{\mathbf{U}}\mathcal{L}(\cdot)$ is the subgradient of the non-smooth function $\mathcal{L}(\cdot)$ with respect to $\mathbf{U}$. Based on the above-stated Lemma 1, we derive from (24) that $\|\mathbf{Y}_3^{r+1}/\lambda_{tv}\|_{dual} \leq 1$, so the sequence $\{\mathbf{Y}_3^{r+1}\}$ is bounded. For the update of $\mathbf{Z}$, we have
$$\mathbf{0} \in \partial_{\mathbf{Z}}\mathcal{L}(\mathbf{B}^{r+1},\mathbf{A}^{r+1},\mathbf{Z},\mathbf{U}^{r},\mathbf{Y}_1^{r},\mathbf{Y}_2^{r},\mathbf{Y}_3^{r})\big|_{\mathbf{Z}=\mathbf{Z}^{r+1}} = \lambda\,\partial\|\mathbf{Z}^{r+1}\|_1 - \mu\Big(\mathbf{A}^{r+1}-\mathbf{Z}^{r+1}+\frac{\mathbf{Y}_2^{r}}{\mu}\Big) = \lambda\,\partial\|\mathbf{Z}^{r+1}\|_1 - \mathbf{Y}_2^{r+1}. \tag{25}$$
Similarly, we obtain $\|\mathbf{Y}_2^{r+1}/\lambda\|_{dual} \leq 1$ from Lemma 1, and thus the sequence $\{\mathbf{Y}_2^{r+1}\}$ is bounded. For the update of $\mathbf{A}$, we have
$$\mathbf{0} = \nabla_{\mathbf{A}}\mathcal{L}(\mathbf{B}^{r+1},\mathbf{A},\mathbf{Z}^{r},\mathbf{U}^{r},\mathbf{Y}_1^{r},\mathbf{Y}_2^{r},\mathbf{Y}_3^{r})\big|_{\mathbf{A}=\mathbf{A}^{r+1}} = \mu\mathbf{A}^{r+1}(\mathbf{H}^T\mathbf{H}+2\mathbf{I}) - \mu\mathbf{Z}^{r} - \mu\mathbf{B}^{r+1} + \mathbf{Y}_1^{r} + \mathbf{Y}_2^{r} - (\mu\mathbf{U}^{rT}-\mathbf{Y}_3^{rT})\mathbf{H}, \tag{26}$$
where $\nabla_{\mathbf{A}}\mathcal{L}$ denotes the gradient of the smooth part of $\mathcal{L}(\cdot)$ with respect to $\mathbf{A}$. With the updating rules in (23), we reformulate the equation in (26) as follows:
$$\mathbf{0} = \mathbf{Y}_1^{r+1} + \mathbf{Y}_2^{r+1} + \mu(\mathbf{Z}^{r+1}-\mathbf{Z}^{r}) + \mu(\mathbf{U}^{(r+1)T}-\mathbf{U}^{rT})\mathbf{H} + \mathbf{Y}_3^{(r+1)T}\mathbf{H}. \tag{27}$$
When $\lim_{r\to\infty}\mu(\mathbf{Z}^{r+1}-\mathbf{Z}^{r}) = \mathbf{0}$ and $\lim_{r\to\infty}\mu(\mathbf{U}^{(r+1)T}-\mathbf{U}^{rT}) = \mathbf{0}$, we deduce from (27) that the sequence $\{\mathbf{Y}_1^{r+1}\}$ is bounded, due to the boundedness of $\{\mathbf{Y}_2^{r+1}\}$ and $\{\mathbf{Y}_3^{(r+1)T}\mathbf{H}\}$. More specifically, $\mathbf{Y}_3^{(r+1)T} = \mathbf{Y}_3^{(r+1)T}\mathbf{H}\mathbf{H}^{T}(\mathbf{H}\mathbf{H}^{T})^{-1}$ as $\mathbf{H}\mathbf{H}^{T}$ is invertible, so the boundedness of $\{\mathbf{Y}_3^{(r+1)T}\mathbf{H}\}$ follows from the boundedness of $\{\mathbf{Y}_3^{r+1}\}$. According to the updating steps in Algorithm 2, we have
$$\begin{aligned} \mathcal{L}(\mathbf{B}^{r+1},\mathbf{A}^{r+1},\mathbf{Z}^{r+1},\mathbf{U}^{r+1},\mathbf{Y}_1^{r},\mathbf{Y}_2^{r},\mathbf{Y}_3^{r}) \leq {} & \mathcal{L}(\mathbf{B}^{r},\mathbf{A}^{r},\mathbf{Z}^{r},\mathbf{U}^{r},\mathbf{Y}_1^{r},\mathbf{Y}_2^{r},\mathbf{Y}_3^{r}) \\ = {} & \mathcal{L}(\mathbf{B}^{r},\mathbf{A}^{r},\mathbf{Z}^{r},\mathbf{U}^{r},\mathbf{Y}_1^{r-1},\mathbf{Y}_2^{r-1},\mathbf{Y}_3^{r-1}) + \langle\mathbf{Y}_1^{r}-\mathbf{Y}_1^{r-1}, \mathbf{A}^{r}-\mathbf{B}^{r}\rangle \\ & + \langle\mathbf{Y}_2^{r}-\mathbf{Y}_2^{r-1}, \mathbf{A}^{r}-\mathbf{Z}^{r}\rangle + \langle\mathbf{Y}_3^{r}-\mathbf{Y}_3^{r-1}, \mathbf{H}\mathbf{A}^{rT}-\mathbf{U}^{r}\rangle \\ = {} & \mathcal{L}(\mathbf{B}^{r},\mathbf{A}^{r},\mathbf{Z}^{r},\mathbf{U}^{r},\mathbf{Y}_1^{r-1},\mathbf{Y}_2^{r-1},\mathbf{Y}_3^{r-1}) \\ & + \frac{1}{\mu}\big(\|\mathbf{Y}_1^{r}-\mathbf{Y}_1^{r-1}\|_F^2 + \|\mathbf{Y}_2^{r}-\mathbf{Y}_2^{r-1}\|_F^2 + \|\mathbf{Y}_3^{r}-\mathbf{Y}_3^{r-1}\|_F^2\big). \end{aligned} \tag{28}$$
Letting $r$ vary from 1 to $t$ and summing both sides of (28), we get
$$\mathcal{L}(\mathbf{B}^{t+1},\mathbf{A}^{t+1},\mathbf{Z}^{t+1},\mathbf{U}^{t+1},\mathbf{Y}_1^{t},\mathbf{Y}_2^{t},\mathbf{Y}_3^{t}) \leq \mathcal{L}(\mathbf{B}^{1},\mathbf{A}^{1},\mathbf{Z}^{1},\mathbf{U}^{1},\mathbf{Y}_1^{0},\mathbf{Y}_2^{0},\mathbf{Y}_3^{0}) + \frac{1}{\mu}\sum_{r=1}^{t}\big(\|\mathbf{Y}_1^{r}-\mathbf{Y}_1^{r-1}\|_F^2 + \|\mathbf{Y}_2^{r}-\mathbf{Y}_2^{r-1}\|_F^2 + \|\mathbf{Y}_3^{r}-\mathbf{Y}_3^{r-1}\|_F^2\big). \tag{29}$$
Because of the boundedness of $\mathbf{Y}_1^{r}$, $\mathbf{Y}_2^{r}$ and $\mathbf{Y}_3^{r}$, we conclude that $\mathcal{L}(\mathbf{B}^{t+1},\mathbf{A}^{t+1},\mathbf{Z}^{t+1},\mathbf{U}^{t+1},\mathbf{Y}_1^{t},\mathbf{Y}_2^{t},\mathbf{Y}_3^{t})$ is also bounded. With (12), we obtain the following identity:
$$\begin{aligned} \mathcal{L}(\mathbf{B}^{r+1},\mathbf{A}^{r+1},\mathbf{Z}^{r+1},\mathbf{U}^{r+1},\mathbf{Y}_1^{r},\mathbf{Y}_2^{r},\mathbf{Y}_3^{r}) + \frac{1}{2\mu}\big(\|\mathbf{Y}_1^{r}\|_F^2 + \|\mathbf{Y}_2^{r}\|_F^2 + \|\mathbf{Y}_3^{r}\|_F^2\big) = {} & \frac{1}{2}\|\mathbf{Y}-\mathbf{D}\mathbf{B}^{r+1}\|_F^2 + \lambda\|\mathbf{Z}^{r+1}\|_1 + \lambda_{tv}\|\mathbf{U}^{r+1}\|_1 \\ & + \frac{\mu}{2}\Big\|\mathbf{A}^{r+1}-\mathbf{B}^{r+1}+\frac{\mathbf{Y}_1^{r}}{\mu}\Big\|_F^2 + \frac{\mu}{2}\Big\|\mathbf{A}^{r+1}-\mathbf{Z}^{r+1}+\frac{\mathbf{Y}_2^{r}}{\mu}\Big\|_F^2 + \frac{\mu}{2}\Big\|\mathbf{H}\mathbf{A}^{(r+1)T}-\mathbf{U}^{r+1}+\frac{\mathbf{Y}_3^{r}}{\mu}\Big\|_F^2. \end{aligned} \tag{30}$$
Due to the boundedness of $\mathcal{L}(\mathbf{B}^{r+1},\mathbf{A}^{r+1},\mathbf{Z}^{r+1},\mathbf{U}^{r+1},\mathbf{Y}_1^{r},\mathbf{Y}_2^{r},\mathbf{Y}_3^{r})$, $\mathbf{Y}_1^{r}$, $\mathbf{Y}_2^{r}$ and $\mathbf{Y}_3^{r}$, the left side of (30) is bounded, and thus the right side is bounded as well, which implies that each term on the right side of (30) is bounded. Therefore, the sequences $\{\mathbf{B}^{r}\}$, $\{\mathbf{A}^{r}\}$, $\{\mathbf{Z}^{r}\}$ and $\{\mathbf{U}^{r}\}$ are bounded. This completes the proof of the boundedness of the variable sequences $\{\mathbf{B}^{r}, \mathbf{A}^{r}, \mathbf{Z}^{r}, \mathbf{U}^{r}, \mathbf{Y}_1^{r}, \mathbf{Y}_2^{r}, \mathbf{Y}_3^{r}\}$.
Let $\{\Gamma^{r} = (\mathbf{B}^{r}, \mathbf{A}^{r}, \mathbf{Z}^{r}, \mathbf{U}^{r}, \mathbf{Y}_1^{r}, \mathbf{Y}_2^{r}, \mathbf{Y}_3^{r})\}$ be the sequence generated by Algorithm 2. By the Bolzano-Weierstrass theorem [57], a bounded sequence has at least one accumulation point. We denote by $\Gamma^{*} = (\mathbf{B}^{*}, \mathbf{A}^{*}, \mathbf{Z}^{*}, \mathbf{U}^{*}, \mathbf{Y}_1^{*}, \mathbf{Y}_2^{*}, \mathbf{Y}_3^{*})$ the accumulation point of the sequence $\{\Gamma^{r}\}_{r=1}^{\infty}$, that is,
$$\lim_{r\to\infty}(\mathbf{B}^{r}, \mathbf{A}^{r}, \mathbf{Z}^{r}, \mathbf{U}^{r}, \mathbf{Y}_1^{r}, \mathbf{Y}_2^{r}, \mathbf{Y}_3^{r}) = (\mathbf{B}^{*}, \mathbf{A}^{*}, \mathbf{Z}^{*}, \mathbf{U}^{*}, \mathbf{Y}_1^{*}, \mathbf{Y}_2^{*}, \mathbf{Y}_3^{*}). \tag{31}$$
Next, we prove that the accumulation point $\Gamma^{*}$ satisfies the KKT conditions [58], which means that $\{\Gamma^{r}\}_{r=1}^{\infty}$ converges to a KKT point. The proof is similar to those in References [59,60].
A KKT point of (11) should satisfy the following KKT conditions:
$$\mathbf{A} - \mathbf{B} = \mathbf{0}, \tag{32}$$
$$\mathbf{A} - \mathbf{Z} = \mathbf{0}, \tag{33}$$
$$\mathbf{H}\mathbf{A}^{T} - \mathbf{U} = \mathbf{0}, \tag{34}$$
$$\frac{\partial\mathcal{L}}{\partial\mathbf{B}} = \mathbf{D}^{T}\mathbf{D}\mathbf{B} - \mathbf{D}^{T}\mathbf{Y} - \mathbf{Y}_1 = \mathbf{0}, \tag{35}$$
$$\frac{\partial\mathcal{L}}{\partial\mathbf{A}} = \mathbf{Y}_3^{T}\mathbf{H} + \mathbf{Y}_1 + \mathbf{Y}_2 = \mathbf{0}, \tag{36}$$
$$\mathbf{Y}_2 \in \lambda\,\partial\|\mathbf{Z}\|_1, \tag{37}$$
$$\mathbf{Y}_3 \in \lambda_{tv}\,\partial\|\mathbf{U}\|_1. \tag{38}$$
According to the updating rules in (23), we have:
$$\begin{aligned} \frac{\mathbf{Y}_1^{r+1}-\mathbf{Y}_1^{r}}{\mu} &= \mathbf{A}^{r+1}-\mathbf{B}^{r+1}, \\ \frac{\mathbf{Y}_2^{r+1}-\mathbf{Y}_2^{r}}{\mu} &= \mathbf{A}^{r+1}-\mathbf{Z}^{r+1}, \end{aligned} \tag{39}$$
$$\frac{\mathbf{Y}_3^{r+1}-\mathbf{Y}_3^{r}}{\mu} = \mathbf{H}\mathbf{A}^{(r+1)T}-\mathbf{U}^{r+1}. \tag{40}$$
As $\lim_{r\to\infty}(\mathbf{Y}_1^{r+1}-\mathbf{Y}_1^{r}) = \mathbf{0}$, $\lim_{r\to\infty}(\mathbf{Y}_2^{r+1}-\mathbf{Y}_2^{r}) = \mathbf{0}$ and $\lim_{r\to\infty}(\mathbf{Y}_3^{r+1}-\mathbf{Y}_3^{r}) = \mathbf{0}$, we obtain $\mathbf{A}^{*} = \mathbf{B}^{*}$, $\mathbf{A}^{*} = \mathbf{Z}^{*}$ and $\mathbf{H}\mathbf{A}^{*T} - \mathbf{U}^{*} = \mathbf{0}$. Thus, the KKT conditions (32)–(34) hold.
Based on the updating rule (14), we have:
$$\mathbf{B}^{r+1} - \mathbf{B}^{r} = (\mathbf{D}^{T}\mathbf{D}+\mu\mathbf{I})^{-1}(\mathbf{D}^{T}\mathbf{Y}+\mu\mathbf{A}^{r}+\mathbf{Y}_1^{r}) - \mathbf{B}^{r}. \tag{41}$$
By left-multiplying by $(\mathbf{D}^{T}\mathbf{D}+\mu\mathbf{I})$, we obtain
$$(\mathbf{D}^{T}\mathbf{D}+\mu\mathbf{I})(\mathbf{B}^{r+1}-\mathbf{B}^{r}) = (\mathbf{D}^{T}\mathbf{Y}+\mu\mathbf{A}^{r}+\mathbf{Y}_1^{r}) - (\mathbf{D}^{T}\mathbf{D}+\mu\mathbf{I})\mathbf{B}^{r}. \tag{42}$$
Considering $\lim_{r\to\infty}(\mathbf{B}^{r+1}-\mathbf{B}^{r}) = \mathbf{0}$, we derive the following equation:
$$(\mathbf{D}^{T}\mathbf{Y}+\mu\mathbf{A}^{*}+\mathbf{Y}_1^{*}) - (\mathbf{D}^{T}\mathbf{D}+\mu\mathbf{I})\mathbf{B}^{*} = \mathbf{0}. \tag{43}$$
Due to $\mathbf{A}^{*} = \mathbf{B}^{*}$, we can infer that $\mathbf{D}^{T}\mathbf{D}\mathbf{B}^{*} - \mathbf{D}^{T}\mathbf{Y} - \mathbf{Y}_1^{*} = \mathbf{0}$, which satisfies the KKT condition (35). Similarly, we have the following equation according to the updating rule (17):
$$\mathbf{A}^{r+1} - \mathbf{A}^{r} = \Big(\mathbf{Z}^{r} + \mathbf{B}^{r+1} - \frac{\mathbf{Y}_1^{r}}{\mu} - \frac{\mathbf{Y}_2^{r}}{\mu} + \Big(\mathbf{U}^{rT}-\frac{\mathbf{Y}_3^{rT}}{\mu}\Big)\mathbf{H}\Big)(\mathbf{H}^{T}\mathbf{H}+2\mathbf{I})^{-1} - \mathbf{A}^{r}. \tag{44}$$
Thus,
$$(\mathbf{A}^{r+1}-\mathbf{A}^{r})(\mathbf{H}^{T}\mathbf{H}+2\mathbf{I}) = \Big(\mathbf{Z}^{r} + \mathbf{B}^{r+1} - \frac{\mathbf{Y}_1^{r}}{\mu} - \frac{\mathbf{Y}_2^{r}}{\mu} + \Big(\mathbf{U}^{rT}-\frac{\mathbf{Y}_3^{rT}}{\mu}\Big)\mathbf{H}\Big) - \mathbf{A}^{r}(\mathbf{H}^{T}\mathbf{H}+2\mathbf{I}). \tag{45}$$
Combining $\lim_{r\to\infty}(\mathbf{A}^{r+1}-\mathbf{A}^{r}) = \mathbf{0}$ with the proved conditions (32)–(34), we obtain $\mathbf{Y}_3^{*T}\mathbf{H} + \mathbf{Y}_1^{*} + \mathbf{Y}_2^{*} = \mathbf{0}$, which satisfies the KKT condition (36). To prove condition (37), we reformulate it as follows:
$$\mathbf{Z} + \frac{\mathbf{Y}_2}{\mu} \in \mathbf{Z} + \frac{\lambda}{\mu}\partial\|\mathbf{Z}\|_1 = \Theta_{\frac{\lambda}{\mu}}(\mathbf{Z}), \tag{46}$$
where the scalar mapping $\Theta_{\frac{\lambda}{\mu}}(t) = t + \frac{\lambda}{\mu}\partial|t|$ is applied to $\mathbf{Z}$ element-wise. Based on Reference [61], we have the following relation:
$$\mathbf{Z} = \Theta_{\frac{\lambda}{\mu}}^{-1}\Big(\mathbf{Z}+\frac{\mathbf{Y}_2}{\mu}\Big) = \mathcal{R}_{\frac{\lambda}{\mu}}\Big(\mathbf{Z}+\frac{\mathbf{Y}_2}{\mu}\Big). \tag{47}$$
Condition (37) is thus transformed equivalently into (47). Based on the updating rule in (18), we have
$$\mathbf{Z}^{r+1} - \mathbf{Z}^{r} = \mathcal{R}_{\frac{\lambda}{\mu}}\Big(\mathbf{A}^{r+1}+\frac{\mathbf{Y}_2^{r}}{\mu}\Big) - \mathbf{Z}^{r}. \tag{48}$$
As $\lim_{r\to\infty}(\mathbf{Z}^{r+1}-\mathbf{Z}^{r}) = \mathbf{0}$ and $\mathbf{A}^{*} = \mathbf{Z}^{*}$, we derive that $\mathbf{Z}^{*} = \mathcal{R}_{\frac{\lambda}{\mu}}\big(\mathbf{Z}^{*}+\frac{\mathbf{Y}_2^{*}}{\mu}\big)$, which satisfies condition (37). The last KKT condition (38) can be proved similarly to (37). We first reformulate it into the following equivalent condition:
$$\mathbf{U} = \mathcal{R}_{\frac{\lambda_{tv}}{\mu}}\Big(\mathbf{U}+\frac{\mathbf{Y}_3}{\mu}\Big). \tag{49}$$
With the updating rule (22), we have
$$\mathbf{U}^{r+1} - \mathbf{U}^{r} = \mathcal{R}_{\frac{\lambda_{tv}}{\mu}}\Big(\mathbf{H}\mathbf{A}^{(r+1)T}+\frac{\mathbf{Y}_3^{r}}{\mu}\Big) - \mathbf{U}^{r}. \tag{50}$$
Taking $\lim_{r\to\infty}(\mathbf{U}^{r+1}-\mathbf{U}^{r}) = \mathbf{0}$ and $\mathbf{H}\mathbf{A}^{*T}-\mathbf{U}^{*} = \mathbf{0}$, we conclude that $\mathbf{U}^{*} = \mathcal{R}_{\frac{\lambda_{tv}}{\mu}}\big(\mathbf{U}^{*}+\frac{\mathbf{Y}_3^{*}}{\mu}\big)$, which proves condition (38). Overall, the accumulation point $\Gamma^{*}$ satisfies all the KKT conditions. This completes the proof of Theorem 1. ☐
Theorem 1 assures the theoretical convergence of our algorithm under mild conditions. In the next section, we demonstrate the convergence empirically through experiments on real data sets.

4. Experiments

4.1. Experimental Settings

We conduct experiments on three widely used benchmark data sets: Indian Pines, Pavia University and Salinas. The results of two classical clustering methods, FCM [5] and k-means [7], the random swap clustering (RSC) [62], the original SSC method [11], the SSC-based extensions L2-SSC [22] and JSSC [23], and the state-of-the-art large-scale clustering methods SSSC [28], SSC-OMP [29] and Sketch-SSC [30] are reported and analysed. The clustering methods FCM, k-means, RSC, SSC, SSSC, SSC-OMP and Sketch-SSC yield their results based on spectral information alone, while the L2-SSC, JSSC and Sketch-SSC-TV methods employ both spatial and spectral information.
We conduct four independent experiments using the three data sets. The traditional SSC-based methods [11,22,23] cannot cluster large-scale data sets. For this reason, we first test all the methods on a cropped version of Indian Pines. In order to compare with the methods [28,29,30] designed for large-scale data sets, we also test the performance on the original large-scale HSIs. We refer to the cropped data set as the small HSI and to the original data sets as the large-scale HSIs.
Two commonly utilized quantitative metrics, the overall accuracy (OA) and the Kappa coefficient ($\kappa$), are employed to evaluate the clustering performance. In addition, we report the running time ($t$) for all the methods. For a dataset $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_N] \in \mathbb{R}^{B \times N}$ with $N$ samples, the OA is obtained as $\sum_{i=1}^{N}\delta(\mathrm{map}(r_i), l_i)/N$, where $r_i$ and $l_i$ are the cluster label obtained by clustering and the true label of $\mathbf{y}_i$, respectively, and $\delta(x, y)$ equals one if $x = y$ and zero otherwise. The $\mathrm{map}(\cdot)$ is a pair-wise mapping function that finds the best match between the clustering results and the ground truth. We apply the Hungarian algorithm [63] to derive the best mapping function; for more details about cluster matching, we refer to Reference [64]. The obtained mapping function assigns a label to each pixel, so $\kappa$ can be directly computed from the corresponding confusion matrix [65]. The running time covers the whole clustering procedure of each method. The optimal parameters for the traditional SSC-based methods SSC, L2-SSC and JSSC on the small HSI are set according to References [20,22,23]. The parameters of the other analysed methods were tuned to produce the best results in terms of OA, to guarantee a fair comparison. The total number of randomly selected samples in the SSSC method is set equal to $n$ for the small HSI; for simplicity, the number of samples for the large-scale HSIs is set to 10 per class. In order to avoid biased clustering results caused by randomness, the methods FCM, k-means, SSSC, Sketch-SSC and Sketch-SSC-TV are repeated five times and the averaged performance is reported. In the spectral clustering method, to reduce the computational complexity, we only calculate the $c$ eigenvectors of the Laplacian matrix $\mathbf{L}$, whose time complexity is $O(c(MN)^2)$. For the Sketch-SSC and Sketch-SSC-TV models, the sketching matrices $\mathbf{R}$ are shared in each simulation. We set $n = 70$, $\sigma^2 = \sum_{i,j}\|\mathbf{a}_i-\mathbf{a}_j\|_2^2/(MN)^2$ and $k = 30$ for the proposed Sketch-SSC-TV model based on empirical optimization. We search $\lambda$ in the range $\{1\times10^{-4}, 5\times10^{-4}, 1\times10^{-3}, 5\times10^{-3}, 1\times10^{-2}, 5\times10^{-2}\}$ and $\lambda_{tv}$ in the range $\{1\times10^{-4}, 5\times10^{-4}, 1\times10^{-3}, 5\times10^{-3}, 1\times10^{-2}, 5\times10^{-2}, 1\times10^{-1}, 5\times10^{-1}\}$. All the methods were implemented in MATLAB on a computer with an Intel Core i7-3930K CPU and 64 GB of RAM.
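As an illustration of the OA computation and the Hungarian-algorithm-based label matching described above, the following is a minimal sketch (Python with NumPy and SciPy assumed, rather than the MATLAB implementation used in the experiments; the function name is ours).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def overall_accuracy(pred, truth):
    """OA with the best cluster-to-class mapping map(.) found by the Hungarian algorithm."""
    clusters, classes = np.unique(pred), np.unique(truth)
    cost = np.zeros((clusters.size, classes.size))
    for i, u in enumerate(clusters):
        for j, v in enumerate(classes):
            cost[i, j] = -np.sum((pred == u) & (truth == v))   # negate to maximise matches
    rows, cols = linear_sum_assignment(cost)
    mapping = {clusters[r]: classes[c] for r, c in zip(rows, cols)}
    mapped = np.array([mapping.get(p, -1) for p in pred])
    return np.mean(mapped == truth)
```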

4.2. Data Description

4.2.1. Indian Pines

This image was captured in 1992 by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over the Indian Pines region in north-western Indiana. The image has a spatial resolution of 20 m per pixel and contains 16 ground-truth classes and 220 spectral reflectance bands in the wavelength range 0.4–2.5 μm. The image size is 145 × 145 × 220. During the test, 20 spectral bands (104–108, 150–163 and 220) are removed due to water absorption. Figure 3a,b show the false color image and the ground truth of the cropped Indian Pines with a size of 85 × 70, which includes 4 classes. The complete data set is shown in Figure 5.

4.2.2. Pavia University

This data set was collected by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor during a flight campaign over Pavia, northern Italy. The typically used image consists of 610 × 340 pixels, that is, 207,400 pixels in total, with 103 spectral reflectance bands. The resolution is 1.3 m per pixel and the number of ground-truth classes is 9. The false color image and ground truth are shown in Figure 6a,b.

4.2.3. Salinas

The third image was acquired by the AVIRIS sensor over the Salinas Valley, CA, USA. The geometric resolution is 3.7 m per pixel, and the image size is 512 × 217 × 224 . There are 16 ground-truth classes. Twenty bands in 108–112, 154–167 and 224 are removed due to water absorption. The false color image and ground truth are shown in Figure 7a,b.

4.3. Experiments on the Small HSI

In this part, the experiments are conducted on the data in Figure 3 that is cropped from the original Indian Pines as indicated by the yellow box in Figure 5a. The specific class names and their corresponding clustering results are shown in Table 1, where the best result is marked in bold and the sub-optimal result is underlined.
The results reveal that our proposed Sketch-SSC-TV model achieves the best performance in terms of clustering accuracy and $\kappa$. The optimal parameters of Sketch-SSC-TV in terms of OA are $\lambda = 10^{-3}$ and $\lambda_{tv} = 10^{-2}$. Compared with the classical clustering methods FCM and k-means, the SSC-based methods SSC, L2-SSC, JSSC, Sketch-SSC and Sketch-SSC-TV usually yield higher accuracy, showing the superior capability of the SSC model in this clustering task. RSC is a k-means-based clustering method which consists of a sequence of centroid swaps and fine-tuning of the centroids with k-means. It performs better than k-means in Table 1 in terms of OA and $\kappa$. The number of iterations in RSC is set by default to 5000 in the source code (http://www.uef.fi/web/machine-learning/software), resulting in a longer running time than k-means and FCM. The classification accuracy of k-means for the class “Soybean-notill” is zero, which means that all the pixels belonging to class 3 are wrongly assigned to other classes. Compared with the original SSC model, the extensions L2-SSC and JSSC, which incorporate different types of spatial information, obtain improved performance with higher accuracy, which indicates the importance of spatial information in HSI clustering. It is interesting, and also surprising, that the proposed Sketch-SSC-TV method yields higher clustering accuracy than the L2-SSC and JSSC methods, which not only use the uncompressed self-representation dictionary but also take the spatial information into account. Compared with the large-scale clustering methods SSSC, SSC-OMP and Sketch-SSC, our method yields a significant accuracy improvement of more than 20%. The SSSC model relies heavily on the initially selected samples, so when the data sets are contaminated by noise or very diverse within each class, its performance may be greatly degraded. Compared with the Sketch-SSC model, our method offers a significant improvement as well, which demonstrates the effectiveness of our approach.
Figure 4 shows the similarity matrices obtained by different clustering methods. For a better visual comparison, we randomly select 75 samples per class and arrange them in sequential order by class. The ideal similarity matrix should be block-diagonal, as only the samples of the same class are connected in the graph [11]. The results in Figure 4 indicate that our method (Figure 4f) preserves this block-diagonal structure best, which is also the main reason why our approach achieves the highest accuracy in the spectral clustering in Table 1. In general, the similarity matrices in Figure 4e,f, constructed by KNN, are sparser than those constructed by (2) in Figure 4a–c, but surprisingly they achieve comparable or even better spectral clustering performance, demonstrating the efficiency of sparse graphs in spectral clustering. The similarity matrix in Figure 4d is overly sparse due to the strict sparsity constraint in the OMP algorithm, leading to poor clustering performance. It is clearly observed that there are many wrong connections between class 3 and class 4 in Figure 4a–c,e, which consequently results in the low accuracy for these classes shown in Table 1. This effect is much less pronounced in Figure 4f, which shows a much closer block-diagonal structure and achieves the highest accuracy for classes 3 and 4. Such improved graph connectivity mainly benefits from the utilization of the TV spatial regularization.
The results in Table 1 show that the SSSC method achieves the shortest computational time. k-means is known to be an efficient clustering algorithm with a complexity of $O(IBcMN)$, where $I$ is the number of iterations. The time complexity of SSSC is $O(I_1Bn^3 + I_2nc^2 + MNn^2)$, where $n$ is set to 70 in the experiment. k-means took a slightly longer running time than SSSC because it needed many more iterations to converge on this data set. However, the results on the big data sets such as Pavia University and Salinas, shown later, indicate that k-means is consistently the fastest algorithm. It is also observed that the computation time $t$ of the SSSC method increases when the number of initially selected samples becomes larger. Among the clustering methods designed for large-scale data, SSC-OMP takes the longest time. In general, the large-scale clustering methods are much faster than the traditional SSC-based methods, with a speed improvement of more than a hundred times. The reason for this significant speed-up is that the traditional SSC-based methods SSC, L2-SSC and JSSC use the self-representation dictionary $\mathbf{Y}$, which is commonly huge for large-scale data and therefore involves many time-consuming matrix multiplications and inverse calculations on the large dense matrix $\mathbf{Y}^T\mathbf{Y} \in \mathbb{R}^{MN \times MN}$ in the optimization loop, while the scalable clustering methods SSSC, Sketch-SSC and Sketch-SSC-TV employ a compressed dictionary, thus enabling a much lower computational complexity. In our method, the column size of the sketched dictionary $\mathbf{D}$ is 70, so the cost of matrix multiplications and inverse calculation on the new matrix $\mathbf{D}^T\mathbf{D} \in \mathbb{R}^{70 \times 70}$ in the optimization algorithm is significantly reduced compared with that on the huge matrix $\mathbf{Y}^T\mathbf{Y}$ in the traditional SSC-based methods. That is why our method needs only about 6 seconds to obtain the clustering result, while the traditional SSC-based methods take around 10 minutes. The computation time of our sketching-based method is comparable to that of FCM and k-means, and hence much smaller than that of the traditional SSC-based methods. Compared with the other large-scale clustering methods, the computational cost of Sketch-SSC-TV is similar.

4.4. Experiments on the Large-Scale HSIs

In this part, we conduct three more experiments on the entire HSIs. Due to the high memory requirements of the traditional SSC-based methods on large-scale HSIs, the SSC, L2-SSC and JSSC methods cannot be run on our computer for Pavia University and Salinas. We estimate with MATLAB the memory required just for storing the large matrix $\mathbf{Y}^T\mathbf{Y}$ of the three HSIs, as reported in Table 2. The required memory for Pavia University is 320.5 GB, without considering the extra memory cost of operations such as matrix multiplications and inverse calculations, which is unaffordable for normal computational devices. We report the experimental results in Table 3, Table 4 and Table 5. The clustering maps are shown in Figure 5, Figure 6 and Figure 7. The optimal parameters of the Sketch-SSC-TV model are $\lambda = 10^{-3}$, $\lambda_{tv} = 10^{-1}$ for Indian Pines, $\lambda = 5\times10^{-2}$, $\lambda_{tv} = 5\times10^{-1}$ for Pavia University, and $\lambda = 10^{-3}$, $\lambda_{tv} = 10^{-4}$ for Salinas.
The results in Table 3, Table 4 and Table 5 reveal that our method consistently achieves the highest clustering accuracy in the three HSIs, which confirms its effectiveness. Clustering for the Indian Pines is a very challenging task as some of the spectral signatures from different classes are very close and also parts of the spectrum are highly mixed due to low spatial resolution [20]. As depicted in Table 3 most of the approaches achieve quite low clustering accuracy, while our method yields a much better result with the accuracy of 60.48%.
The FCM, k-means and RSC produce similar accuracy on Indian Pines and Pavia University, but k-means is more efficient in terms of computation time than FCM and RSC. The traditional SSC-based methods SSC, L2-SSC and JSSC can be run only on Indian Pines, whereas our method is not only capable of running on all three large-scale HSIs but also improves the clustering performance, which mainly benefits from the exploitation of the TV-norm spatial constraint and the sketching technique. Also, in Table 3 we can see that the computation time of the Sketch-SSC-TV method is at least 600 times shorter than that of the SSC, L2-SSC and JSSC methods, indicating the efficiency of using a compressed dictionary in our method instead of the large self-representation dictionary. Among the clustering methods designed for large-scale data, SSC-OMP takes much longer than SSSC, Sketch-SSC and our method. The reason is that the sparse coding for each sample is performed serially; the computational time could be reduced by running the simulation in parallel. Compared with the SSC clustering map in Figure 5f, the L2-SSC, JSSC and Sketch-SSC-TV methods show less impulse noise in their clustering maps, which is due to the use of spatial information to promote the connectivity between neighbouring pixels, leading to a more robust similarity matrix. The clustering results in Table 4 show that the accuracy of our method on “Self-Blocking Bricks” is much lower than that of the reference methods. This can mainly be attributed to the over-smoothed clustering results shown in Figure 6i, where the “Painted Metal Sheets” and the neighbouring “Self-Blocking Bricks” are merged. This can be alleviated by relaxing the spatial constraint with a smaller $\lambda_{tv}$, at the possible risk of a reduced overall accuracy.
The large-scale clustering methods SSSC and SSC-OMP typically yield worse accuracy than the k-means and RSC methods on the three large-scale HSIs, which indicates the limitation of their performance in HSI clustering. In Figure 5i, Figure 6f and Figure 7f, the SSSC clustering maps are seriously deteriorated by impulse noise, which is caused by the limited discriminative information in the spectral domain. The SSC-OMP method suffers from the same problem on Indian Pines and Pavia University, as shown in Figure 5j and Figure 6g. Compared with the Sketch-SSC method, our method achieves a significant improvement in accuracy on the three large-scale HSIs, especially on Indian Pines with an accuracy enhancement of 23.7%. The cost is a slight increase in computational time that comes from the TV-norm regularization. The Sketch-SSC model also suffers from the same impulse-noise problem as the SSSC and SSC-OMP approaches in the clustering maps shown in Figure 5k, Figure 6h and Figure 7h, while in our method this problem is greatly alleviated as the connections between neighbouring pixels are strengthened by the TV-norm constraint.

4.5. Analysis of Parameters

In this part, we analyse the effect of the parameters λ , λ t v , n and k on the clustering performance of the Sketch-SSC-TV method in the large-scale HSIs.

4.5.1. Effect of λ and λ t v

$\lambda$ and $\lambda_{tv}$ in (8) control the sparsity level and the spatial constraint of the sparse matrix, respectively, and are two important parameters of the model. Let $\lambda$ be varied in the range $\{1\times10^{-4}, 5\times10^{-4}, 1\times10^{-3}, 5\times10^{-3}, 1\times10^{-2}, 5\times10^{-2}\}$ and $\lambda_{tv}$ in the range $\{1\times10^{-4}, 5\times10^{-4}, 1\times10^{-3}, 5\times10^{-3}, 1\times10^{-2}, 5\times10^{-2}, 1\times10^{-1}, 5\times10^{-1}\}$. The clustering results with respect to $\lambda$ and $\lambda_{tv}$ are shown in Figure 8 for the three large-scale HSIs. The results indicate that the clustering performance is more stable with respect to $\lambda$ than to $\lambda_{tv}$. According to the experimental results, we recommend setting $\lambda = 10^{-3}$ for all the data. The best value of $\lambda_{tv}$ may differ between data sets, but the clustering accuracy is stable and superior to that of the other methods over a wide range, that is, for $\lambda_{tv} \in [5\times10^{-3}, 10^{-1}]$ on Indian Pines, $\lambda_{tv} \in [5\times10^{-3}, 5\times10^{-1}]$ on Pavia University and $\lambda_{tv} \in [10^{-4}, 5\times10^{-3}]$ on Salinas. The results for Salinas in Figure 8c are quite different from those for Indian Pines and Pavia University. For Salinas, the results are typically better when the values of $\lambda_{tv}$ and $\lambda$ are similar, which means the sparsity constraint and the spatial constraint are equally important. In contrast, for Indian Pines and Pavia University our method achieves better performance when $\lambda_{tv}$ is larger than $\lambda$, indicating that the spatial constraint is more important than the sparsity. The reason may lie in the different types of HSIs and the different levels of data quality. As each crop in Salinas is planted regularly in blocks, there are more homogeneous regions and fewer edges than in Indian Pines and Pavia University, resulting in a much smaller value of the TV norm for Salinas. In addition, due to the high data quality of Salinas, spatial information may be less important than in the other two HSIs; a larger value of $\lambda_{tv}$ can thus over-smooth the clustering result, leading to a lower accuracy. Among the three terms of the Sketch-SSC-TV model, the data fidelity term is the most important for Salinas. Overall, based on the results in Figure 8, our method is robust and stable with respect to $\lambda$ and $\lambda_{tv}$.

4.5.2. Effect of the Parameter n

n is the number of columns of the sketching matrix R, which determines the size of the sketched dictionary and hence the computational efficiency of the proposed method. We vary n in the range {10, 20, 40, 70, 100, 140} for the three large-scale HSIs. The results are reported in Figure 9 and show that a larger n typically yields better clustering, because a larger sketched dictionary better preserves the original column space of the input data. For the Indian Pines and Salinas, the number of classes is 16; when n = 10, the sketched dictionary cannot represent the input data space well, which explains the drop in accuracy compared to n = 20. Figure 9 also reveals that a small value of n (for example 20) already gives satisfactory clustering performance on the three HSIs, which is consistent with the fact that HSI data actually lie in low-dimensional subspaces.
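As a concrete illustration of the role of n, the sketch below forms the compact dictionary by right-multiplying the data matrix with a random N × n matrix, so that only an n × N coefficient matrix has to be estimated instead of an N × N one. It assumes a dense Gaussian Johnson-Lindenstrauss matrix; the exact JL construction used in the paper may differ, and the function name is illustrative only.

```python
import numpy as np

def sketch_dictionary(Y, n, seed=None):
    """Compress the self-representation dictionary Y (B bands x N pixels)
    into a compact dictionary D = Y R using a random N x n JL-type matrix R."""
    rng = np.random.default_rng(seed)
    N = Y.shape[1]
    R = rng.standard_normal((N, n)) / np.sqrt(n)   # Gaussian JL sketching matrix
    return Y @ R                                   # B x n sketched dictionary

# Toy example: a 200-band image with 10,000 pixels, sketched to n = 40 atoms;
# the coefficient matrix to solve is then 40 x 10,000 instead of 10,000 x 10,000.
Y = np.random.rand(200, 10_000)
D = sketch_dictionary(Y, n=40, seed=0)
print(D.shape)   # (200, 40)
```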

4.5.3. Effect of the Parameter k

We investigate the effect of the number of neighbours k on the clustering performance of our method by varying k in the range {5, 10, 15, 20, 30, 50} for the three large-scale HSIs. The results in Figure 10 show that a larger k generally yields higher clustering accuracy. The accuracy curves rise more steeply for k < 20 than for k ≥ 20 on the three HSIs; for k ≥ 20 the accuracy becomes much more stable. Based on these results, we set k to 30 in this paper.
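To make the role of k explicit, the snippet below shows one possible way, as a hedged sketch rather than the authors' exact construction, to build a symmetric k-nearest-neighbour affinity between pixels from their coefficient vectors and feed it to spectral clustering; scikit-learn is assumed to be available, and knn_spectral_clustering is an illustrative name.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import SpectralClustering

def knn_spectral_clustering(A, n_clusters, k=30):
    """Cluster pixels from their coefficient vectors via a k-NN affinity graph.

    A : (n, N) coefficient matrix over the sketched dictionary; column j is
        the representation of pixel j.  Illustrative pipeline only.
    """
    X = A.T                                                   # one row per pixel
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)  # normalise codes
    W = kneighbors_graph(X, n_neighbors=k, mode='connectivity', include_self=False)
    W = 0.5 * (W + W.T)                                       # symmetric affinity
    model = SpectralClustering(n_clusters=n_clusters, affinity='precomputed')
    return model.fit_predict(W.toarray())

# Toy example: 1000 pixels with 20-dimensional codes, 5 clusters, k = 30 neighbours.
labels = knn_spectral_clustering(np.random.rand(20, 1000), n_clusters=5, k=30)
```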

4.6. Experimental Convergence Analysis

Figure 11 shows the squared Frobenius norm of the differences between the values of Z and U in two subsequent iterations, ‖Z^{r+1} − Z^{r}‖_F^2 and ‖U^{r+1} − U^{r}‖_F^2. We refer to these distances as updating errors. The results show that, after a sufficient number of iterations, the updating errors tend to a very small value, meaning that the solutions for Z and U eventually become stable. Moreover, on all three data sets the updating errors decline monotonically after a certain number of iterations. Thus, we have lim_{r→∞} μ(Z^{r+1} − Z^{r}) = 0 and lim_{r→∞} μ(U^{r+1} − U^{r}) = 0, where μ is a constant. This empirically demonstrates that the conditions of Theorem 1 are satisfied for all three analysed data sets, and it is thus reasonable to assume that they will be satisfied in a similar manner for most other HSIs in practice. It can also be observed that the updating errors of Z and U are zero during the first iterations for some data sets. This is mainly caused by the small values of A^{r+1} + Y_2^{r}/μ in (18) and H A^{(r+1)T} + Y_3^{r}/μ in (22) in the first few iterations, for which the thresholding operator R(·) outputs zero matrices. After a certain number of iterations these values increase, the output of the thresholding operator is no longer zero, and the updating errors temporarily increase, as shown in Figure 11; eventually they tend to a value close to zero.
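The updating errors in Figure 11 can be tracked with a few lines around the solver loop. The sketch below assumes a generic, hypothetical admm_step callable that performs one pass over all sub-problems and returns the updated Z and U; it is a monitoring wrapper, not the solver derived in the paper.

```python
import numpy as np

def run_with_monitoring(admm_step, Z, U, max_iter=200, tol=1e-8):
    """Run an iterative solver while recording the squared Frobenius
    updating errors ||Z^{r+1} - Z^r||_F^2 and ||U^{r+1} - U^r||_F^2."""
    err_Z, err_U = [], []
    for r in range(max_iter):
        Z_new, U_new = admm_step(Z, U)                       # one pass over all sub-problems
        err_Z.append(np.linalg.norm(Z_new - Z, 'fro') ** 2)  # updating error of Z
        err_U.append(np.linalg.norm(U_new - U, 'fro') ** 2)  # updating error of U
        Z, U = Z_new, U_new
        if err_Z[-1] < tol and err_U[-1] < tol:              # both updates have stabilised
            break
    return Z, U, err_Z, err_U
```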
The diagrams in Figure 12 show the evolution of the objective function values with respect to the number of iterations. The objective function decreases monotonically to a stable level on the three data sets, demonstrating the practical convergence of our optimization algorithm. In particular, the curves in Figure 12a,c drop sharply in the first few iterations and then saturate. These results are consistent with the theoretical convergence analysis above.

5. Conclusions

In this paper, the problem of large-scale HSI clustering based on the SSC model is addressed for the first time, and a novel clustering method, named Sketch-SSC-TV, is proposed. It incorporates a random-projection-based sketching technique that significantly reduces the number of optimization variables. In addition, a TV-norm constraint on the sparse coefficient matrix promotes the dependencies between neighbouring pixels, which enhances the block-diagonal structure of the similarity matrix and thereby improves the performance of spectral clustering. We derived an efficient solver for the resulting model based on the ADMM algorithm and proved its convergence property theoretically. Unlike the traditional SSC-based methods, which cannot be applied to large-scale HSIs due to their extremely high computational burden, the proposed method is not only applicable to big data sets but also achieves a high level of clustering accuracy. The extensive experimental results clearly demonstrate that our method outperforms the state-of-the-art clustering methods.

Author Contributions

Conceptualization, S.H. and A.P.; Formal analysis, H.Z., Q.D. and A.P.; Funding acquisition, H.Z. and A.P.; Methodology, S.H.; Software, S.H.; Supervision, H.Z. and A.P.; Validation, H.Z. and A.P.; writing—original draft, S.H.; writing—review & editing, H.Z., Q.D. and A.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded in part by the Fonds voor Wetenschappelijk Onderzoek (FWO) project: G.OA26.17N, in part by Artificial Intelligence Research Flanders funded by the Flemish Government, in part by the grants from the China Scholarship Council (CSC) and UGent Bijzonder Onderzoeksfonds (BOF) cofunding-CSC and in part by the National Natural Science Foundation of China under grants 61871298 and 41711530709.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. A motivation for applying the TV norm in the sparse subspace clustering (SSC) model. In the ideal case, the coefficient matrices of pixels in hyperspectral images (HSIs) should be piece-wise smooth in local regions and share the edge structure of the original HSI, which becomes apparent after reshaping them into a 3-D cube. Each M × N slice in this cube corresponds to one row of the 2-D matrix C and resembles the spatial structures of the original HSI. The TV spatial constraint is employed to preserve this local smoothness and edge structure of the coefficient matrix.
Figure 2. Illustration of the traditional SSC-based models (top) and the sketched SSC model (bottom) where C and A are two sparse coefficient matrices to be computed and R is a random matrix for sketching.
Figure 3. The false color image (a) and the corresponding ground truth (b) of the tested Indian Pines image.
Figure 4. Similarity matrix obtained by (a) SSC, (b) L2-SSC, (c) JSSC, (d) SSC-OMP, (e) Sketch-SSC and (f) Sketch-SSC-TV.
Figure 5. Indian Pines image. (a) False color image (yellow box is the cropped image), (b) Ground truth, and Clustering maps of (c) Fuzzy c-means (FCM), (d) k-means, (e) Random swap clustering (RSC), (f) SSC, (g) L2-SSC, (h) JSSC, (i) SSSC, (j) SSC-OMP, (k) Sketch-SSC and (l) Sketch-SSC-TV.
Figure 6. Pavia University image. (a) False color image, (b) Ground truth, and Clustering maps of (c) FCM, (d) k-means, (e) RSC, (f) SSSC, (g) SSC-OMP, (h) Sketch-SSC and (i) Sketch-SSC-TV.
Figure 7. Salinas image. (a) False color image, (b) Ground truth, and Clustering maps of (c) FCM, (d) k-means, (e) RSC, (f) SSSC, (g) SSC-OMP, (h) Sketch-SSC and (i) Sketch-SSC-TV.
Figure 8. Grid search of λ and λ_tv for Sketch-SSC-TV on three data sets: (a) Indian Pines, (b) Pavia University, (c) Salinas.
Figure 9. Performance of the proposed method with respect to n.
Figure 10. Performance of the proposed method with respect to the number of neighbours k in the k-nearest neighbours (KNN) graph.
Figure 11. The evolution of the errors ‖Z^{r+1} − Z^{r}‖_F^2 (top row) and ‖U^{r+1} − U^{r}‖_F^2 (bottom row) with respect to the number of iterations for three datasets: Indian Pines (left), Pavia University (middle) and Salinas (right).
Figure 12. The evolution of the objective function of the proposed model with respect to the number of iterations for three datasets: (a) Indian Pines, (b) Pavia University and (c) Salinas.
Table 1. Clustering results on the cropped part of the Indian Pines image.
No. | Class Name | FCM | k-Means | RSC | SSC | L2-SSC | JSSC | SSSC | SSC-OMP | Sketch-SSC | Sketch-SSC-TV
1 | Corn-notill | 62.39 | 69.85 | 69.65 | 60.00 | 61.09 | 74.03 | 53.31 | 69.35 | 62.19 | 61.41
2 | Grass-trees | 94.66 | 53.84 | 51.10 | 98.36 | 99.32 | 100 | 89.73 | 99.86 | 100 | 100
3 | Soybean-notill | 44.13 | 0 | 1.23 | 76.91 | 79.37 | 86.20 | 49.13 | 44.40 | 68.80 | 100
4 | Soybean-mintill | 63.83 | 57.59 | 58.63 | 50.68 | 54.89 | 87.79 | 63.85 | 41.68 | 58.87 | 93.81
 | OA (%) | 65.34 | 50.17 | 50.33 | 65.11 | 67.78 | 86.40 | 63.28 | 58.14 | 68.12 | 88.46
 | κ | 0.5118 | 0.2833 | 0.2851 | 0.5296 | 0.5629 | 0.8069 | 0.4772 | 0.4419 | 0.5628 | 0.8342
t (seconds)5.62.5815436242702.2222.85.8
Table 2. Required memory for saving Y T Y in different HSIs.
 | Indian Pines | Pavia University | Salinas
Spatial image size | 145 × 145 | 610 × 340 | 512 × 217
Matrix size of Y^T Y | 21,025 × 21,025 | 207,400 × 207,400 | 111,104 × 111,104
Required memory (GB) | 3.5 | 320.5 | 92
Table 3. Clustering accuracy for Indian Pines.
No. | Class Name | FCM | k-Means | RSC | SSC | L2-SSC | JSSC | SSSC | SSC-OMP | Sketch-SSC | Sketch-SSC-TV
1 | Alfalfa | 23.91 | 0 | 17.39 | 36.96 | 0 | 0 | 14.78 | 0 | 7.39 | 57.83
2 | Corn-notill | 25.70 | 28.71 | 29.06 | 23.39 | 43 | 48.25 | 25.48 | 19.33 | 2.28 | 33.70
3 | Corn-mintill | 24.82 | 44.34 | 43.49 | 34.34 | 20.48 | 18.19 | 24.24 | 35.90 | 0.53 | 32.53
4 | Corn | 6.33 | 14.35 | 20.25 | 9.28 | 0 | 0.42 | 5.91 | 53.16 | 2.62 | 42.53
5 | Grass-pasture | 43.89 | 49.69 | 49.69 | 65.01 | 55.49 | 65.22 | 46.54 | 36.02 | 1.90 | 65.84
6 | Grass-trees | 25.75 | 40.82 | 44.52 | 37.95 | 56.71 | 75.21 | 48.49 | 49.04 | 12.41 | 45.18
7 | Grass-pasture-mowed | 0 | 71.43 | 0 | 0 | 85.71 | 75.00 | 7.14 | 0 | 12.86 | 0
8 | Hay-windrowed | 89.33 | 85.15 | 81.80 | 55.02 | 71.13 | 98.74 | 56.32 | 77.41 | 15.10 | 100
9 | Oats | 0 | 0 | 30.00 | 65.00 | 45 | 0 | 6.00 | 65.00 | 3.00 | 20.00
10 | Soybean-notill | 23.46 | 18.83 | 18.42 | 30.04 | 58.54 | 62.04 | 24.96 | 27.16 | 3.48 | 75.23
11 | Soybean-mintill | 28.35 | 38.98 | 39.55 | 33.93 | 37.64 | 43.34 | 38.07 | 23.87 | 89.42 | 76.85
12 | Soybean-clean | 23.61 | 18.21 | 17.20 | 22.26 | 24.28 | 44.69 | 16.42 | 19.06 | 3.27 | 64.99
13 | Wheat | 99.51 | 97.07 | 96.59 | 96.10 | 98.54 | 100 | 55.71 | 58.54 | 5.66 | 79.61
14 | Woods | 30.99 | 41.82 | 41.50 | 38.50 | 38.42 | 53.12 | 39.43 | 41.11 | 99.81 | 71.46
15 | Bldgs-grass-trees-drives | 17.62 | 18.13 | 17.36 | 21.50 | 16.06 | 38.08 | 15.18 | 11.14 | 1.40 | 11.92
16 | Stone-steel-towers | 59.14 | 86.02 | 87.10 | 19.35 | 95.70 | 67.74 | 25.59 | 18.28 | 19.78 | 79.14
 | OA | 31.31 | 38.08 | 38.22 | 34.80 | 42.10 | 50.90 | 33.24 | 31.98 | 36.78 | 60.48
 | κ | 0.2556 | 0.3099 | 0.3118 | 0.2864 | 0.3593 | 0.4525 | 0.2563 | 0.2659 | 0.2234 | 0.5575
t (seconds)74105031690620769183269462726
Table 4. Clustering accuracy for Pavia University.
No. | Class Name | FCM | k-Means | RSC | SSC * | L2-SSC * | JSSC * | SSSC | SSC-OMP | Sketch-SSC | Sketch-SSC-TV
1 | Asphalt | 84.54 | 90.51 | 90.63 | - | - | - | 35.60 | 59.64 | 64.88 | 99.78
2 | Meadows | 38.61 | 43.83 | 44.09 | - | - | - | 28.47 | 27.55 | 42.55 | 57.25
3 | Gravel | 7.58 | 0.10 | 0.10 | - | - | - | 11.92 | 1.05 | 20.14 | 19.43
4 | Trees | 70.33 | 63.67 | 64.07 | - | - | - | 61.29 | 82.38 | 91.17 | 75.05
5 | Painted Metal Sheets | 74.80 | 48.25 | 48.77 | - | - | - | 62.05 | 97.10 | 99.79 | 100
6 | Bare Soil | 37.78 | 32.89 | 32.49 | - | - | - | 18.56 | 31.64 | 27.94 | 60.93
7 | Bitumen | 0 | 0 | 0 | - | - | - | 5.86 | 0 | 0.38 | 0
8 | Self-Blocking Bricks | 87.48 | 94.24 | 93.75 | - | - | - | 31.48 | 77.49 | 65.11 | 0.15
9 | Shadows | 99.89 | 100 | 100 | - | - | - | 8.91 | 0 | 75.73 | 73.75
 | OA | 51.88 | 53.41 | 53.50 | - | - | - | 30.13 | 40.65 | 49.84 | 58.71
 | κ | 0.4238 | 0.4337 | 0.4343 | - | - | - | 0.1794 | 0.3093 | 0.3957 | 0.4858
t (seconds)209171640---3072397838974
* Note: SSC, L2-SSC and JSSC could not be run on our computer for this data set due to the out-of-memory problem.
Table 5. Clustering accuracy for Salinas.
No. | Class Name | FCM | k-Means | RSC | SSC * | L2-SSC * | JSSC * | SSSC | SSC-OMP | Sketch-SSC | Sketch-SSC-TV
1 | Brocoli-green-weeds-1 | 99.75 | 98.36 | 99.90 | - | - | - | 66.99 | 0 | 99.43 | 99.94
2 | Brocoli-green-weeds-2 | 39.59 | 66.56 | 30.30 | - | - | - | 88.60 | 0.05 | 98.91 | 99.53
3 | Fallow | 19.13 | 0 | 0.00 | - | - | - | 28.64 | 0 | 11.92 | 8.21
4 | Fallow-rough-plow | 99.21 | 90.32 | 99.21 | - | - | - | 50.63 | 0 | 19.90 | 59.83
5 | Fallow-smooth | 91.67 | 76.14 | 92.87 | - | - | - | 44.20 | 0 | 99.45 | 98.92
6 | Stubble | 94.44 | 87.95 | 94.49 | - | - | - | 99.50 | 0.05 | 99.54 | 99.55
7 | Celery | 98.63 | 97.99 | 98.21 | - | - | - | 90.98 | 0 | 55.25 | 82.16
8 | Grapes-untrained | 34.08 | 93.60 | 70.95 | - | - | - | 58.62 | 98.82 | 98.67 | 98.95
9 | Soil-vinyard-develop | 57.97 | 74.13 | 75.58 | - | - | - | 77.94 | 99.92 | 99.72 | 99.94
10 | Corn-senesced-green-weeds | 7.29 | 30.96 | 33.10 | - | - | - | 44.23 | 0.06 | 88.04 | 94.26
11 | Lettuce-romaine-4wk | 4.12 | 0 | 0.00 | - | - | - | 34.01 | 0 | 54.21 | 56.95
12 | Lettuce-romaine-5wk | 89.52 | 91.65 | 96.11 | - | - | - | 12.36 | 0 | 70.97 | 90.75
13 | Lettuce-romaine-6wk | 99.02 | 98.58 | 98.80 | - | - | - | 8.84 | 0 | 0 | 0
14 | Lettuce-romaine-7wk | 87.38 | 89.25 | 88.41 | - | - | - | 54.04 | 0 | 97.78 | 78.77
15 | Vinyard-untrained | 30.02 | 0.01 | 48.56 | - | - | - | 28.46 | 0 | 0.28 | 0.30
16 | Vinyard-vertical-trellis | 12.06 | 0 | 0.00 | - | - | - | 47.58 | 0 | 97.61 | 98.17
 | OA | 52.93 | 63.79 | 65.15 | - | - | - | 57.97 | 32.04 | 73.43 | 77.00
 | κ | 0.4900 | 0.5926 | 0.6116 | - | - | - | 0.5340 | 0.1743 | 0.7007 | 0.7411
t (seconds)394311946---3721831269335
* Note: SSC, L2-SSC and JSSC could not be run on our computer for this data set due to the out-of-memory problem.
