1. Introduction
Hyperspectral images (HSIs), acquired by hyperspectral cameras, record the spectra of materials over a wide range of wavelengths. The rich spectral information of HSIs enables discrimination of materials that are often visually indistinguishable, which has led to a number of applications in remote sensing, such as target detection [1,2], environmental monitoring [3], geosciences, and defense and security [4]. It is often desired to categorize pixels in the imaged scene into different classes, corresponding to different materials or different types of objects. When no training data are available, this task is called clustering; hence, clustering is also referred to as unsupervised classification.
The two most popular clustering methods are fuzzy c-means (FCM) [5] and k-means [6,7], owing to their simplicity and superior computational efficiency. They group data points by assigning each point to the nearest cluster centroid, with the centroids updated iteratively. However, their performance is sensitive to initial conditions and noise.
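To make the iterative centroid-assignment-and-update procedure concrete, here is a minimal numpy sketch of k-means; the first-k-rows initialisation is an illustrative simplification (practical implementations use random or k-means++ initialisation, and FCM differs in using soft, fuzzy memberships):

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Minimal k-means on the rows of X (n_samples x n_features).
    Centroids are initialised with the first k rows for simplicity."""
    centroids = X[:k].astype(float)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```

The sensitivity to initialisation mentioned above is visible here: a poor choice of the initial centroids can leave the loop in a bad local optimum.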
Recently, spectral clustering-based methods [8,9,10,11,12,13] have achieved great success and have been widely applied in various fields due to their excellent performance and robustness to noise [14]. In general, these methods first define a similarity matrix to construct a graph of data points, which is learned from the input data under different criteria. The resulting similarity matrix is then used within the spectral clustering framework. The performance of spectral clustering heavily depends on the similarity matrix [14]; hence, its construction is a crucial step. Many of these methods, such as local subspace affinity (LSA) [15], spectral local best-fit flats (SLBF) [16] and locally linear manifold clustering (LLMC) [17], build the similarity matrix with k nearest neighbours (KNN) using the angle or Euclidean distance between two data points. This approach tends to treat erroneously the data points near the intersection of two subspaces, because their closest points often lie in another subspace.
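A minimal sketch of such a KNN-based similarity construction is given below. The Gaussian kernel with a global median-distance scale is one common, illustrative choice, not the specific construction prescribed by any of the methods cited above:

```python
import numpy as np

def knn_similarity(X, k=5):
    """Symmetric k-NN affinity matrix from pairwise Euclidean distances.
    X: (n_samples, n_features)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    sigma = np.median(d[d > 0])                 # global kernel scale
    np.fill_diagonal(d, np.inf)                 # exclude self-neighbours
    W = np.zeros_like(d)
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest neighbours per point
    rows = np.repeat(np.arange(len(X)), k)
    W[rows, nn.ravel()] = np.exp(-d[rows, nn.ravel()] ** 2 / (2 * sigma ** 2))
    return np.maximum(W, W.T)                   # symmetrise the graph
```

The failure mode described above shows up directly here: for a point near a subspace intersection, some of its k nearest neighbours belong to the other subspace, so the graph connects the two clusters.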
The recent sparse subspace clustering (SSC) method [11] constructs the similarity matrix based on the self-expressiveness model, where the input data is employed as the representation dictionary. SSC models a high-dimensional data space as a union of low-dimensional subspaces. The key insight is that, for each data point lying in one of the subspaces, the global solution of the sparse coding problem with the self-representation dictionary automatically selects data points from that same subspace. Thus, each data point gets automatically represented as a sparse linear or affine combination of other points in the same subspace. This is called the subspace preserving property and is explicitly expressed by the non-zero entries of the coefficient matrix C: the i-th and j-th data points are in the same subspace if c_ij ≠ 0. The coefficient matrix leads directly to the similarity matrix for spectral clustering.
As the SSC model calculates sparse coefficients individually and independently for each input data point, its clustering performance is sensitive to noise. To solve this problem, various extensions have been proposed that encode the spatial dependencies among neighbouring data points in hyperspectral data, thereby obtaining more accurate similarity matrices and improved clustering results [18,19,20,21,22,23,24,25]. Guo et al. [18,19] focus on the clustering of 1-D drill hole hyperspectral data and regularize the coefficients of data points that neighbour each other in depth to be similar through a norm-based smoothing regularization. For 2-D spatial-wise hyperspectral images, a smoothing strategy was introduced in Reference [20] by minimizing the difference between the coefficients corresponding to the central pixel and to the mean of the pixels in a local square window. A kernel version of SSC incorporating max pooling of the sparse coefficient matrix was presented in Reference [21]. The spectral-spatial SSC method of Reference [22] integrates an ℓ2 spatial regularizer with the SSC model (L2-SSC) to penalize abrupt differences between the coefficients of nearby pixels. In References [23,25], a norm constraint on the coefficients of pixels in each local region was incorporated into the SSC model. Based on collaborative representation with an ℓ2 norm constraint on the coefficients, a novel model with a locally adaptive dictionary was proposed in Reference [24].
While showing excellent performance, the above-mentioned methods also have considerable computational complexity, resulting from iterative optimization. The per-iteration time complexity is typically on the order of O(M^2 N^2), where M and N are the number of rows and columns in each band, so that MN is the total number of pixels. For large-scale HSIs with millions of pixels in each band, this bound can thus exceed 10^12 elementary operations per iteration, and such processing often becomes infeasible on common computing platforms. The approaches reported in References [26,27] addressed this problem by constructing a graph based on a set of selected representative samples. In combination with modified spectral clustering methods, a lower complexity has been reached, but the clustering results are sensitive to the initially selected samples. Recently, some generalized large-scale methods [28,29,30] based on SSC have been proposed for clustering tasks in computer vision. In Reference [28], a scalable SSC method was designed for large-scale data sets: a small subset of samples is first randomly selected and clustered with the SSC model, and the remaining samples are then clustered by sparse coding with respect to a dictionary constructed from the previously selected samples. The work in Reference [29] studied an efficient SSC model based on orthogonal matching pursuit (OMP) and discussed theoretical conditions for subspace-preserving representation. The recent sketched SSC model of Reference [30] lowers the computational burden of SSC by using a clever random projection technique to sketch and compress the input data to a computationally affordable level. While these large-scale SSC-based methods have demonstrated success in real applications with facial images, handwritten text and news corpus data, to the best of our knowledge none of them has previously been applied to the clustering of HSIs. Our experiments show that, despite the scalability of these methods, their clustering performance on HSIs turns out to be poor. This can be attributed to the complex spatial structure of HSIs, spectral noise and spectral variability.
In view of this, we propose a sketched sparse subspace clustering method with total variation (TV) spatial regularization, termed Sketch-SSC-TV, which can handle large-scale HSIs while achieving a high level of clustering performance. A random sketching matrix is first employed to build a sketched dictionary, which is much smaller than the self-representation dictionary, resulting in a significant reduction of the number of coefficients to be solved. By incorporating the spatial constraint as a TV norm on the coefficient matrix, the proposed model greatly promotes the connectivity of neighbouring pixels and improves the piecewise smoothness of the clustering maps. Furthermore, we propose an algorithm with theoretically guaranteed global convergence to solve the resulting optimization problem. By adopting the sketching matrix, the optimization complexity of the TV-related sub-problem is reduced substantially, thus greatly facilitating the processing of large-scale data. The similarity matrix is constructed by applying KNN to the obtained coefficient matrix, and is further employed within the spectral clustering method. Experiments conducted on four HSIs show superior clustering performance compared to both traditional SSC-based methods and the related large-scale clustering methods. The major contributions of the paper can be summarized as follows.
The most important contribution of this paper is a new SSC-based framework that can be applied to large-scale HSIs while achieving excellent clustering accuracy. To the best of our knowledge, this is the first work to address the large-scale clustering problem of HSIs based on the SSC model.
Different from the traditional SSC-based methods, which use all the input data as a dictionary, we adopt a compressed dictionary obtained by a random projection technique to reduce the dictionary size, which effectively enables a scalable subspace clustering approach.
To account for the spatial dependencies among the neighbouring pixels, we incorporate a powerful TV regularization in our model, leading to a more discriminative coefficient matrix. The resulting model proves to be more robust to spectral noise and spectral variability.
We develop an efficient algorithm to solve the resulting optimization problem and prove its convergence property theoretically.
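For intuition, the dictionary-compression idea in the second contribution can be sketched in a few lines of numpy. The Gaussian sketching matrix and the 1/sqrt(k) scaling below are illustrative assumptions for a generic random projection, not necessarily the exact construction used in our model:

```python
import numpy as np

def sketched_dictionary(Y, k, seed=0):
    """Compress the N-atom self-representation dictionary Y (B x N)
    into a k-atom sketched dictionary (B x k), with k << N, using a
    Gaussian random sketching matrix R (N x k)."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((Y.shape[1], k)) / np.sqrt(k)  # random sketch
    return Y @ R                                           # B x k dictionary
```

With a sketched dictionary of k atoms, the coefficient matrix to be estimated has size k x N instead of N x N, which is where the computational savings originate.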
The rest of this paper is organized as follows. Section 2 briefly introduces the clustering of HSIs with the SSC model. Section 3 describes the proposed Sketch-SSC-TV model and the resulting optimization problem. Experimental results on real HSIs are presented in Section 4. Section 5 concludes the paper.
2. HSI Clustering with the SSC Model
Let a B-band HSI be denoted as Y = [y_1, ..., y_N] ∈ R^{B×N}, where the i-th column y_i ∈ R^B represents the spectral signature of the i-th pixel in the HSI and N is the total number of pixels. Sparse subspace clustering (SSC) partitions the high-dimensional data space into a union of lower-dimensional subspaces. Concretely, it assumes that all the high-dimensional data points y_i, that is, the spectral signatures of all the pixels from a given HSI Y, are drawn from a union of subspaces, each of which corresponds to a particular class. The key idea is that, among the infinitely many possibilities to represent a data point y_i in terms of other points, a sparse representation will select a few points that belong to the same subspace as y_i. This is known as the subspace preserving property. Thus, SSC starts from a self-representation model in which the input data matrix Y is employed as a dictionary, Y = YC, and infers the coefficient matrix C ∈ R^{N×N} by solving a sparse coding problem (requiring that C is sparse) while ensuring that the trivial solution, where each sample would simply be represented by itself, is excluded. The non-zero entries in C
will then indicate directly which data points lie within the common subspaces. Formally, SSC solves the following optimization problem:

min_C ||C||_1 + (λ/2) ||Y − YC||_F^2   s.t.  diag(C) = 0,  1^T C = 1^T,   (1)

where ||C||_1 = Σ_{i,j} |c_ij|; 1 is an all-one vector; diag(C) is a diagonal matrix whose entries outside the main diagonal are zero, formed from the diagonal of C; and λ > 0 is a parameter which controls the balance between the data fidelity and the sparsity of the coefficient matrix. The constraint diag(C) = 0 is introduced to avoid the trivial solution of representing a sample by itself, and the second constraint 1^T C = 1^T ensures that each data point is an affine combination of other data points.
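As an illustration of the sparse self-representation step, the following numpy sketch solves a simplified version of the problem with ISTA (iterative soft-thresholding). This is a compact stand-in, not the solver used in the SSC literature: the affine constraint is omitted, and the diagonal constraint is enforced by projection after each step.

```python
import numpy as np

def ssc_coefficients(Y, lam=0.01, n_iter=500):
    """Sparse self-representation via ISTA on
        min_C 0.5 * ||Y - Y C||_F^2 + lam * ||C||_1,  with diag(C) = 0.
    Simplified sketch: the affine constraint 1^T C = 1^T is dropped."""
    N = Y.shape[1]
    C = np.zeros((N, N))
    step = 1.0 / (np.linalg.norm(Y, 2) ** 2)   # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = Y.T @ (Y @ C - Y)               # gradient of the quadratic term
        C = C - step * grad
        C = np.sign(C) * np.maximum(np.abs(C) - step * lam, 0.0)  # soft-threshold
        np.fill_diagonal(C, 0.0)               # forbid trivial self-representation
    return C
```

On data drawn from well-separated subspaces, the recovered coefficients concentrate within each subspace, which is exactly the subspace preserving property exploited to build the similarity matrix.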
The problem in (1) can be solved by the ADMM algorithm [31], with a running time that grows linearly with the number of iterations I and polynomially with the number of pixels N. The coefficient matrix C yields directly the dependence structure among the data points: a non-zero entry c_ij indicates that the samples y_i and y_j are in the same class. Thus, it is reasonable to construct the similarity matrix W as

W = |C| + |C|^T,

where |C| takes the absolute values of the entries of C. The symmetric structure of W ensures that each pair of samples is connected if either one is selected to represent the other, which results in a strengthened connection of the graph. The similarity matrix W is then used as an input to spectral clustering [32] to produce the clustering result. Specifically, the Laplacian matrix L is first formed as L = D^{−1/2}(D − W)D^{−1/2}, where D is a diagonal matrix with d_ii = Σ_j w_ij [33]. Then the c eigenvectors e_1, ..., e_c of L corresponding to the c smallest non-zero eigenvalues of L are calculated via singular-value decomposition (SVD). Finally, the clustering result is obtained by running k-means clustering on the rows of the N × c matrix E = [e_1, ..., e_c].
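The spectral embedding step just described can be sketched as follows. This sketch uses a dense eigendecomposition of the normalised Laplacian and takes the c bottom eigenvectors for clarity; large-scale implementations would use sparse or randomized eigensolvers instead:

```python
import numpy as np

def spectral_embedding(W, c):
    """Embed the samples with the c bottom eigenvectors of the normalised
    graph Laplacian built from a symmetric affinity matrix W.  The rows of
    the returned N x c matrix are then clustered with k-means."""
    d = W.sum(axis=1)                            # node degrees
    D_isqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(len(W)) - D_isqrt @ W @ D_isqrt   # normalised Laplacian
    vals, vecs = np.linalg.eigh(L)               # eigenvalues in ascending order
    return vecs[:, :c]                           # bottom-c eigenvector embedding
```

Some variants (e.g., the Ng–Jordan–Weiss algorithm) additionally normalise each row of the embedding to unit length before running k-means.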