Unified Low-Rank Subspace Clustering with Dynamic Hypergraph for Hyperspectral Image

Abstract: Low-rank representation with hypergraph regularization has achieved great success in hyperspectral imagery, as it explores the global structure while also incorporating local information. Existing hypergraph learning methods construct the hypergraph either from a fixed similarity matrix or adaptively in the original feature space; they do not update the hypergraph in the low-dimensional subspace. In addition, the clustering performance of existing k-means-based methods is unstable because k-means is sensitive to the initialization of the cluster centers. To address these issues, we propose a novel unified low-rank subspace clustering method with dynamic hypergraph for hyperspectral images (HSIs). In our method, the hypergraph is adaptively learned from the low-rank subspace feature, which captures a more complex manifold structure effectively. In addition, we introduce a rotation matrix to simultaneously learn continuous and discrete clustering labels without the information loss caused by relaxation. The unified model jointly learns the hypergraph and the discrete clustering labels, and the subspace feature is adaptively learned by considering the optimal dynamic hypergraph with the self-taught property. Experimental results on real HSIs show that the proposed methods achieve better performance than eight state-of-the-art clustering methods.


Introduction
Hyperspectral image (HSI) classification is an important problem in the remote sensing community. Extensive prior literature addresses it in the framework of supervised classification [1][2][3], in which the training of the classifier relies on labeled data (with ground-truth information). However, labeled datasets are scarce and, in some applications, impossible to obtain manually [4]. To exploit unlabeled remote sensing data, unsupervised classification methods, which segment the dataset into several groups with no prior label information, are necessary.
According to the existing literature, clustering methods can be divided into several categories [5]. The two categories best suited to the characteristics of HSIs are centroid-based methods and spectral-based methods. Among the centroid-based methods, k-means [6] and fuzzy c-means (FCM) [7] have received the most attention owing to their computational efficiency and simplicity; they group pixels by iteratively minimizing the distance between each pixel and the cluster centroids. More recently, spectral-based clustering methods have become popular and are widely used for hyperspectral data clustering. In general, these methods first construct a similarity matrix from the original data and then apply a centroid-based method to the eigenspace of the Laplacian matrix to segment the pixels. Specifically, locally spectral clustering (LSC) [8] and globally spectral clustering (GSC) [9] use the local and global neighbors of each pixel, respectively, to construct the similarity matrix that represents the relationships between pairs of pixels, and then apply k-means to the eigenspace of the Laplacian matrix; however, they cannot distinguish which subspace each pixel belongs to. Moreover, the large spectral variability results in a uniform distribution of feature points, which increases the difficulty of HSI clustering [5]. The more recently developed sparse subspace clustering (SSC) [10,11] and low-rank subspace clustering (LRSC) [12,13] methods use the sparse and low-rank representation coefficients, respectively, to define the adjacency matrix, and then apply spectral clustering to obtain the segmentation result. Compared with SSC, LRSC is better at exploring the global structure information because it finds the lowest-rank representation of all the data jointly. Nevertheless, the original LRSC model cannot explore the local latent structure of the data while exploiting the corresponding subspaces.
Inspired by the theory of manifold learning in image processing [14], Lu et al. [15] proposed the graph-embedded low-rank representation (GLRR) by incorporating graph regularization into the low-rank representation objective function. However, the general graph regularization model only uses the pairwise relationship between two pixels and therefore cannot capture the complex high-order relationships among pixels. In fact, the relationship between the hyperspectral pixels we are interested in is not just a pairwise relationship, but a multi-way or even more complex one. Instead of considering pairwise relations, the hypergraph, a concept first proposed by Berge [16], models the data manifold structure by exploring the high-order relations among data points. Zhou et al. [17] extended spectral clustering from undirected graphs to hypergraphs. Since then, hypergraphs have been widely used for feature extraction [18], band selection [19], dimension reduction [20], and noise reduction [21] in hyperspectral images. According to the prior literature, hypergraph methods based on representation learning fall into two categories. The first uses the representation coefficients as hyperedge weights to construct the hypergraph; for example, [2,22] regard the sparse and low-rank coefficients, respectively, as new features to measure the similarity of pixels and adaptively select neighbors for constructing the hypergraph. The second uses the hypergraph as a regularizer to optimize the representation coefficients by capturing the intrinsic geometrical structure. Gao et al. [23] first introduced the hypergraph into sparse coding, in which the hypergraph explores the similarity among the pixels within the same hyperedge and simultaneously encourages their sparse representation coefficients to be similar to each other.
Motivated by this idea, hypergraph regularization was subsequently introduced into non-negative matrix factorization (NMF) [24], sparse NMF [25], and low-rank representation [26,27].
It is noteworthy that there are two main problems in existing hypergraph-based representation learning methods. First, the pre-constructed hypergraph is usually learned from the original data with a certain distance measure and is not optimized dynamically. To address this, Zhang et al. [28] proposed a unified framework for data structure estimation and feature selection, which updates the hyperedge weights during the hypergraph learning process. In Reference [29], a dynamic hypergraph structure learning method was proposed, in which the incidence matrix of the hypergraph is learned by considering the data correlation in both the label space and the feature space. In addition, data in the original feature space may contain various kinds of noise, which can degrade performance because these methods depend heavily on the constructed hypergraph. Zhu et al. [30,31] proposed an unsupervised sparse feature selection method that embeds a hypergraph Laplacian regularizer, in which the hypergraph is learned dynamically from the optimized sparse subspace feature. Alternatively, in [32,33] the hypergraph is adaptively learned from the latent representation space, which robustly characterizes the intrinsic data structure. Second, the clustering performance obtained by existing k-means-based methods is unstable, because the initialization of the cluster centers has a strong impact on the performance of k-means. Therefore, it is necessary to construct a unified framework that directly generates discrete clustering labels [34][35][36]. However, the existing unified clustering frameworks are based on the general graph structure, which may lead to significant information loss and reduce clustering performance.
To address these issues, we propose a novel unified dynamic hypergraph low-rank subspace clustering method for hyperspectral images, referred to as UDHLR. First, we develop a dynamic hypergraph low-rank subspace clustering method, referred to as DHLR, in which hypergraph regularization is used to preserve the complex local structure of the low-dimensional data, while the hypergraph itself is adaptively learned from the low-rank subspace feature. However, the DHLR algorithm works in two separate steps: learning the low-rank coefficient matrix as the similarity graph, and then generating the discrete clustering labels by k-means. We therefore integrate these two subtasks into a unified framework, in which the low-rank representation coefficients, the hypergraph structure, and the discrete clustering labels are each optimized using the results of the others to reach an overall optimal result. The main contributions of our methods are summarized as follows: (1) Instead of pre-constructing fixed hypergraph incidence and weight matrices, the hypergraph is adaptively learned from the low-rank subspace feature. The dynamically constructed hypergraph is well structured and theoretically suitable for clustering. (2) The proposed method simultaneously optimizes the continuous labels and the discrete cluster labels through a rotation matrix, without the information loss caused by relaxation. (3) It jointly learns the similarity hypergraph from the learned low-rank subspace data and the discrete clustering labels by solving a unified optimization problem, in which the low-rank subspace feature and the hypergraph are adaptively learned by considering the clustering performance, and the continuous clustering labels serve only as intermediate products.
The remainder of this paper is organized as follows: Section 2 revisits the low-rank representation and the hypergraph. Section 3 describes the proposed DHLR and UDHLR models. Section 4 presents the experimental settings and results. Section 5 discusses the computational complexity. Finally, conclusions are presented in Section 6. The framework of the proposed methods is shown in Figure 1.

Related Work
The important notations in the paper are summarized in Table 1.

Low-Rank Representation
Let $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$ denote a hyperspectral image with $n$ samples, where $x_i \in \mathbb{R}^{d}$ represents the $i$-th pixel with $d$ spectral bands. Low-rank representation (LRR) seeks the lowest-rank representation for clustering by solving the following problem:

$$\min_{Z,N} \ \operatorname{rank}(Z) + \lambda \|N\|_{2,1} \quad \text{s.t.} \quad X = XZ + N,$$

where $Z \in \mathbb{R}^{n \times n}$ denotes the lowest-rank representation matrix under a self-expressive dictionary [13], $\operatorname{rank}(Z)$ is the rank of matrix $Z$, $N$ is a sparse matrix of outliers, and $\lambda > 0$ is a trade-off parameter. However, the rank minimization problem is NP-hard and difficult to optimize, so the nuclear norm is adopted instead, yielding the following optimization problem [13]:

$$\min_{Z,N} \ \|Z\|_{*} + \lambda \|N\|_{2,1} \quad \text{s.t.} \quad X = XZ + N,$$

where $\|Z\|_{*}$ is the nuclear norm of matrix $Z$, calculated as the sum of its singular values, $\|Z\|_{*} = \sum_{i} \sigma_i(Z)$. The representation matrix $Z$ can be obtained by solving the above problem with the inexact augmented Lagrange multiplier (ALM) method [37]. Finally, the adjacency matrix $|Z| + |Z|^{\top}$, which provides the edge weights, is constructed from the obtained low-rank coefficient matrix, and the clustering result is obtained by applying k-means to the eigenspace of the Laplacian matrix induced by this adjacency matrix.
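The last step above, turning a coefficient matrix into a graph, can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the function names and the toy matrix are hypothetical, and the unnormalized Laplacian D − A is an assumption, since the text does not state which Laplacian normalization is used before k-means.

```python
def affinity_from_coefficients(Z):
    """Symmetric affinity A = |Z| + |Z|^T for a square coefficient matrix Z."""
    n = len(Z)
    return [[abs(Z[i][j]) + abs(Z[j][i]) for j in range(n)] for i in range(n)]


def unnormalized_laplacian(A):
    """Graph Laplacian L = D - A (assumed variant), with D the diagonal degree matrix."""
    n = len(A)
    degrees = [sum(row) for row in A]
    return [[(degrees[i] if i == j else 0.0) - A[i][j] for j in range(n)]
            for i in range(n)]


# Toy 3x3 coefficient matrix (hypothetical values, zero diagonal).
Z = [[0.0, 0.8, -0.1],
     [0.5, 0.0, 0.0],
     [0.2, -0.3, 0.0]]
A = affinity_from_coefficients(Z)
L = unnormalized_laplacian(A)
```

Taking absolute values before symmetrizing keeps negative coefficients from cancelling positive ones, so every edge weight is non-negative.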

Hypergraph
The relationship between the pixels we are interested in is not just a pairwise relationship between two pixels, but a multi-way or even more complex one. Simply compressing such a multivariate relationship into pairwise relationships inevitably loses a lot of useful information, which affects the accuracy of feature learning to a certain extent [19].
Let $G = (V, E, W)$ denote a hypergraph, where $V = \{v_1, v_2, \ldots, v_n\}$ and $E = \{e_1, e_2, \ldots, e_n\}$ are the sets of vertices and hyperedges, respectively. The dataset $X$ makes up the vertex set $V$, and $W = \operatorname{diag}(w(e_1), w(e_2), \ldots, w(e_n))$ denotes the weight matrix of the hyperedges. For simplicity, we only consider the case where every hyperedge contains the same number of vertices. For a given hyperedge $e_i \in E$, the weight $w(e_i)$ is constructed from the kernel similarities between $v_i$ and the vertices in $N(v_i)$, where $N(v_i)$ is the set of nearest neighbors of $v_i$ and $\sigma$ is the kernel parameter. The incidence matrix $H$ denotes the relationship between vertices and hyperedges, with entries defined as:

$$h(v_i, e_j) = \begin{cases} 1, & v_i \in e_j, \\ 0, & \text{otherwise.} \end{cases}$$

The degree of each vertex $v_i$ is defined as $d(v_i) = \sum_{j=1}^{n} w(e_j)\, h(v_i, e_j)$, and the degree of hyperedge $e_i$ is defined as $d(e_i) = \sum_{j=1}^{n} h(v_j, e_i)$. $D_v = \operatorname{diag}(d(v_1), d(v_2), \ldots, d(v_n))$ and $D_e = \operatorname{diag}(d(e_1), d(e_2), \ldots, d(e_n))$ are the vertex-degree matrix and the hyperedge-degree matrix, respectively. Finally, the normalized hypergraph Laplacian matrix is

$$L = I - D_v^{-1/2} H W D_e^{-1} H^{\top} D_v^{-1/2}.$$

Thus, the hypergraph can represent the local structure information and the complex relationships between pixels well. It is worth noting that the quality of $L$ depends on $H$ and $W$; we therefore write $L_{(H,W)}$ for the hypergraph Laplacian hereafter.
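The definitions above can be sketched in code as follows. This is a hedged illustration, not the authors' code: the hyperedge of each vertex is taken to be the vertex plus its k nearest neighbors, and the weight uses a Gaussian kernel exp(−‖x_i − x_j‖² / σ²) summed over the neighbors, which is an assumed kernel form. The names `build_hypergraph` and `normalized_laplacian` and the toy points are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def build_hypergraph(points, k, sigma):
    """k-NN hypergraph: hyperedge e_i contains v_i and its k nearest neighbours."""
    n = len(points)
    H = [[0] * n for _ in range(n)]          # H[v][e] = 1 if vertex v lies in hyperedge e
    w = [0.0] * n
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: sq_dist(points[i], points[j]))
        neighbours = order[:k]
        for v in [i] + neighbours:
            H[v][i] = 1
        # Assumed Gaussian-kernel weight; the exact kernel form is illustrative.
        w[i] = sum(math.exp(-sq_dist(points[i], points[j]) / sigma ** 2)
                   for j in neighbours)
    d_v = [sum(w[e] * H[v][e] for e in range(n)) for v in range(n)]
    d_e = [sum(H[v][e] for v in range(n)) for e in range(n)]
    return H, w, d_v, d_e


def normalized_laplacian(H, w, d_v, d_e):
    """L = I - D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}."""
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(w[e] * H[u][e] * H[v][e] / d_e[e] for e in range(n))
            L[u][v] = (1.0 if u == v else 0.0) - s / math.sqrt(d_v[u] * d_v[v])
    return L


# Two well-separated groups of 1-D "pixels" (toy data).
points = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
H, w, d_v, d_e = build_hypergraph(points, k=2, sigma=1.0)
L = normalized_laplacian(H, w, d_v, d_e)
```

With k = 2 every hyperedge has degree 3, and the vector with entries √d(v_i) lies in the null space of L, which is a quick sanity check on the normalization.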

Materials and Methods
Conventional hypergraph construction is based only on the original features and is independent of the features learned in the low-rank subspace. There is no guarantee that such a hypergraph is optimal for modeling the pixelwise relationships among the subspace features, and a suboptimal hypergraph structure in turn leads to a suboptimal solution when learning the incidence matrix. To address these problems, we propose to learn a dynamic hypergraph that explores the intrinsic complex local structure of pixels in their low-dimensional feature space. In addition, hypergraph-based manifold regularization helps the low-rank representation coefficients capture the global structure information of the hyperspectral data well. Finally, the proposed model learns a rotation matrix to obtain continuous labels and discrete cluster labels simultaneously in one step.

Dynamic Hypergraph-Based Low-Rank Subspace Clustering
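Based on Section 2.2, a hypergraph structure can be used to maintain the local relationships of the original data [17,28]. First, we propose to preserve the complex local structure of the low-dimensional data through hypergraph regularization. To do this, we design the following objective function:

$$\min_{Z,N} \ \|Z\|_{*} + \frac{\lambda_1}{2} \sum_{e \in E} \sum_{v_i, v_j \in V} \frac{w(e)\, h(v_i, e)\, h(v_j, e)}{d(e)} \left\| \frac{z_i}{\sqrt{d(v_i)}} - \frac{z_j}{\sqrt{d(v_j)}} \right\|_2^2 + \lambda_2 \|N\|_{2,1} \quad \text{s.t.} \quad X = XZ + N, \qquad (5)$$

where $z_i$ is the $i$-th column of $Z$. Equation (5) is equivalent to:

$$\min_{Z,N} \ \|Z\|_{*} + \lambda_1 \operatorname{Tr}\!\left(Z L_{(H,W)} Z^{\top}\right) + \lambda_2 \|N\|_{2,1} \quad \text{s.t.} \quad X = XZ + N, \qquad (6)$$

where $L_{(H,W)}$ is the hypergraph Laplacian, and $\lambda_1$ and $\lambda_2$ are two tuning parameters. However, $H$ is pre-constructed from the original data and is usually not learned dynamically.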
In this paper, we propose to update the hypergraph $H$ based on the low-dimensional subspace information and to couple this update with the learning of $Z$ in a unified framework. To achieve this, we design the final objective function as follows:

$$\min_{Z,N,H,W} \ \|Z\|_{*} + \lambda_1 \operatorname{Tr}\!\left(Z L_{(H,W)} Z^{\top}\right) + \lambda_2 \|N\|_{2,1} + \lambda_3 \|w\|_2^2 \quad \text{s.t.} \quad X = XZ + N, \ \sum_{i=1}^{n} w_i = 1, \ w_i \ge 0, \qquad (7)$$

where $W = \operatorname{diag}(w)$, and the two-norm regularization on the weights is used to avoid overfitting. On the one hand, $Z$ preserves the global structure via the low-rank constraint to conduct subspace learning. On the other hand, it also preserves the local structure via the second term of Equation (7) to select the informative features. The proposed dynamic hypergraph low-rank subspace clustering method is referred to as DHLR.

Optimization Algorithm for Solving Problem (7)
To solve problem (7), an auxiliary variable J is introduced to make the objective separable, which gives problem (8). Problem (8) can be solved with the ADMM algorithm by minimizing its augmented Lagrangian formulation (9), where C1, C2 and η are Lagrange multipliers and µ is a positive penalty parameter. The variables J, Z, N, H, W and the Lagrange multipliers are obtained by alternately solving (9) for each variable with the other variables fixed. The detailed solution steps are as follows.
Update J: Fixing the variables Z, N, H, W, the solution of J is obtained from the corresponding nuclear-norm subproblem. By introducing the singular value thresholding (SVT) operator [38], the solution of J is given in closed form, where Θ denotes the SVT operator.
Update Z: Fixing the variables J, N, H, W, we obtain the objective function with respect to Z as problem (12). Problem (12) is a quadratic minimization problem and therefore has a closed-form solution.
Update N: Fixing the variables J, Z, H, W, the solution of N is obtained from the corresponding subproblem. The objective function with respect to N can be rewritten column by column, where P i and N i are the i-th columns of the matrices P and N, respectively.
Update H and D e: According to the definition of the hypergraph in Section 2.2, the hyperedges are constructed from the original data, so noise in the data may degrade the accuracy of the hypergraph. To tackle this problem, we use the noise-free low-dimensional subspace feature to learn the hyperedges. The set of hyperedges is then constructed as in [30], in which N(·) denotes the nearest neighbor pixels. In this work, the hyperedge of pixel i contains the pixels j for which Xz t+1 j is among the top K most similar neighbors of Xz t+1 i, excluding itself. After producing the incidence matrix H t+1, it is easy to work out D e t+1.
Update W: Fixing the variables J, Z, N, H, we obtain the objective function with respect to W as problem (19), which can be rewritten as Equation (20) and then as Equation (21). According to the Karush-Kuhn-Tucker conditions, a closed-form solution for w i is obtained. We then further obtain W t+1 = diag(w t+1) and D v t+1 via Equation (23).
Update the Lagrange multipliers C1, C2 and the penalty parameter µ by Equation (24). The entire procedure for solving the DHLR method is summarized in Algorithm 1.
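The column-wise structure of the N update can be illustrated with the standard proximal operator of the ℓ2,1 norm. The sketch below assumes the N-subproblem has the usual form min_N τ‖N‖2,1 + ½‖N − P‖F², with τ playing the role of λ2/µ; that form, the function name `column_shrink`, and the toy matrix are assumptions made for illustration, not taken from the source.

```python
import math


def column_shrink(P, tau):
    """Proximal operator of tau * ||N||_{2,1}: shrink each column of P toward zero.

    A column whose Euclidean norm is at most tau is set to zero; otherwise it is
    rescaled by (norm - tau) / norm.
    """
    rows, cols = len(P), len(P[0])
    N = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        norm = math.sqrt(sum(P[i][j] ** 2 for i in range(rows)))
        if norm > tau:
            scale = (norm - tau) / norm
            for i in range(rows):
                N[i][j] = scale * P[i][j]
    return N


# First column has norm 5 and is kept (scaled by 0.8); the second is zeroed.
P = [[3.0, 0.1],
     [4.0, 0.1]]
N = column_shrink(P, 1.0)
```

Because each column is either kept (shrunk) or removed as a whole, the operator produces column-sparse outlier matrices, which matches the role of N as a per-pixel noise term.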

Algorithm 1 The DHLR algorithm for HSI clustering
Input: A 2-D matrix of the HSI X ∈ R d×n , the number of desired clusters c, and the regularization parameters λ 1 , λ 2 , λ 3 .

Unified Dynamic Hypergraph-Based Low-Rank Subspace Clustering
Most existing hypergraph-based clustering methods contain two independent processes: hypergraph construction and clustering. The hypergraph is first used to construct a similarity matrix, and spectral clustering or k-means is then used to produce the final clustering labels [39]. Although this approach is very popular in clustering applications, it may produce very unstable performance, since the initialization of the cluster centers has a strong impact on the performance of k-means [8]. To address this problem, we propose a unified framework that exploits the correlation between the similarity hypergraph and the discrete cluster labels for the clustering task. It updates the dynamic hypergraph with an optimal low-rank subspace feature and then directly generates the discrete cluster labels by introducing a rotation matrix. In this way, the proposed model can not only make use of the optimal dynamic hypergraph and the global low-dimensional feature information, but also obtain the discrete clustering labels. To this end, the objective function is formulated as

$$\min_{Z,N,H,W,F,Q,Y} \ \|Z\|_{*} + \lambda_1 \operatorname{Tr}\!\left(Z L_{(H,W)} Z^{\top}\right) + \lambda_2 \|N\|_{2,1} + \lambda_3 \|w\|_2^2 + \lambda_4 \operatorname{Tr}\!\left(F^{\top} L_{(H,W)} F\right) + \lambda_5 \|Y - FQ\|_F^2$$
$$\text{s.t.} \quad X = XZ + N, \ \sum_{i=1}^{n} w_i = 1, \ w_i \ge 0, \ F^{\top} F = I, \ Q^{\top} Q = I, \ Y \in \mathrm{Idx}, \qquad (25)$$

where $\lambda_4$ and $\lambda_5$ are penalty parameters. In general, $F = [f_1, f_2, \ldots, f_n]^{\top} \in \mathbb{R}^{n \times c}$ (s.t. $F \in \mathrm{Idx}$) is the cluster indicator matrix in spectral clustering. To avoid the NP-hard problem caused by the discrete constraint on $F$, $F \in \mathbb{R}^{n \times c}$ is relaxed into the continuous domain, and the orthogonal constraint is adopted to make the problem computationally tractable. To achieve an ideal clustering structure, [40] proposed to impose a rank constraint on the hypergraph Laplacian matrix $L$ induced by the representation matrix $Z$, namely $\operatorname{rank}(L) = n - c$. Under this constraint, the data can be partitioned directly into $c$ clusters. The rank constraint is equivalent to minimizing $\sum_{i=1}^{c} \sigma_i(L)$ [41], where $\sigma_i(L)$ is the $i$-th smallest eigenvalue of $L$. According to Ky Fan's theorem [42], $\sum_{i=1}^{c} \sigma_i(L) = \min_{F^{\top} F = I} \operatorname{Tr}(F^{\top} L F)$. To generate the discrete clustering labels, we introduce a rotation matrix $Q \in \mathbb{R}^{c \times c}$. According to the spectral solution invariance property [43], the last term finds a proper orthonormal $Q$ such that $FQ$ approximates the real discrete clustering labels, where $Y \in \mathbb{R}^{n \times c}$ is the discrete label matrix. Equation (25) is not a simple combination of separate terms; it exploits the relationship between the dynamic hypergraph matrix and the clustering labels. Ideally, we have $z_{ij} \neq 0$ if and only if pixels $i$ and $j$ are in the same cluster, or equivalently $y_i = y_j$. Therefore, the inferred labels and the similarity hypergraph matrix provide feedback to each other. From this point of view, our clustering framework has the self-taught property.

Optimization Algorithm for Solving Problem (25)
To solve problem (25), the auxiliary variable J is again introduced to make the objective separable, giving problem (26), which is then rewritten as the augmented Lagrangian formulation (27). The steps to update J, Z, N and H are similar to those of DHLR; only the updates of W, F, Q and Y differ.
Update W: Fixing the variables J, Z, N, H, F, Q, Y, the solution of W is obtained by solving problem (28), which can be rewritten in the same form as problem (20), so the closed-form solution for w i follows in the same way. We further obtain W t+1 = diag(w t+1), and D v t+1 via the same formulation as Equation (23). The iterative optimization of H and W in the dynamic hypergraph is illustrated in Figure 2. Update F: With the other variables fixed, the F-subproblem is a minimization under the orthogonality constraint on F, whose solution can be obtained efficiently via the algorithm proposed in [44].
Update Q: With the other variables fixed, the Q-subproblem is an orthogonal Procrustes problem [45] and has a closed-form solution Q = UV T , where U and V are the left and right singular vectors of the SVD of F T Y. Update Y: With the other variables fixed, and noting that Tr(Y T Y) = n, the Y-subproblem can be rewritten as maximizing Tr(Y T FQ) over the discrete label matrices, so the optimal Y assigns each pixel to the cluster with the largest entry in the corresponding row of FQ. Update the Lagrange multipliers and the penalty parameter as in DHLR, using Equation (24). The details of the UDHLR optimization are summarized in Algorithm 2.
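The discrete label update can be sketched as a row-wise argmax. This is a minimal illustration under the assumption that the coupling term is ‖Y − FQ‖F² with one nonzero entry per row of Y, so that minimizing it is the same as maximizing Tr(YᵀFQ); the function name `update_labels` and the toy matrices are hypothetical.

```python
def update_labels(F, Q):
    """Return a one-hot label matrix Y with Y[i][j] = 1 for j = argmax_k (FQ)[i][k]."""
    n, c = len(F), len(Q)
    FQ = [[sum(F[i][k] * Q[k][j] for k in range(c)) for j in range(c)]
          for i in range(n)]
    Y = [[0] * c for _ in range(n)]
    for i, row in enumerate(FQ):
        best = max(range(c), key=lambda j: row[j])   # first index on ties
        Y[i][best] = 1
    return Y


# Toy continuous labels for three pixels and two clusters.
F = [[0.9, 0.1],
     [0.8, 0.2],
     [0.1, 0.7]]
# An orthonormal Q that swaps the two columns (a permutation is a valid rotation here).
Q = [[0.0, 1.0],
     [1.0, 0.0]]
Y = update_labels(F, Q)
```

Here FQ swaps the columns of F, so the first two pixels receive the second label and the third pixel receives the first.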

Algorithm 2 The UDHLR algorithm for HSI clustering
Input: A 2-D matrix of the HSI X ∈ R d×n , the number of desired clusters c, and the regularization parameters λ 1 , λ 2 , λ 3 , λ 4 , λ 5 .

end while
Output: the cluster label Y for data X.

Experimental Datasets
To validate the effectiveness of the proposed methods, we conduct experiments on three real-world hyperspectral datasets, namely Indian Pines, Salinas-A, and Jasper Ridge. Table 2 summarizes the detailed information of these three datasets.

Indian Pines
The Indian Pines dataset was collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over Northwestern Indiana in 1992. The image has a spatial resolution of 20 m and 220 spectral bands ranging from 0.4 to 2.5 µm. In our tests, 20 spectral bands (104-108, 150-163, and 220) are removed due to water absorption and noise [46]. The size of this image is 145 × 145. There are originally 16 classes in total. Following, e.g., [47], nine main classes were used in our experiment: corn-no-till, corn-minimum-till, grass-pasture, grass-trees, hay-windrowed, soybean-no-till, soybean-minimum-till, soybean-clean, and woods.

Salinas-A
The original dataset is the Salinas Valley scene, acquired by the AVIRIS sensor over Salinas Valley, California, in 1998. The size of this image is 512 × 217, and it contains 224 spectral bands with a spatial resolution of 3.7 m per pixel. There are originally 16 classes in the Salinas Valley scene. Following, e.g., [48], a subset of the scene, denoted as Salinas-A hereinafter, is adopted; it consists of 86 × 83 pixels with 6 classes, and 204 bands remain after removing the noisy bands. The subset is located at [591-676] × [158-240] in the Salinas Valley scene.

Jasper Ridge
The Jasper Ridge dataset contains 512 × 614 pixels and 224 spectral bands. After removing the spectral bands 1-3, 108-112, 154-166 and 220-224, which are affected by water vapor and atmospheric effects, we obtain 198 spectral bands. Since the ground truth of the whole image is too complex to obtain, we consider a sub-image of 100 × 100 pixels with four classes, whose first pixel corresponds to the (105,269)-th pixel in the original image.
In the experiments, the normalized mutual information (NMI) is employed to gauge the clustering performance quantitatively; it measures the overlap between the labels obtained experimentally and the ground-truth labels. Given two variables A and B, NMI is defined as [49]:

$$\mathrm{NMI}(A, B) = \frac{I(A, B)}{\sqrt{H(A)\, H(B)}},$$
where I(A, B) is the mutual information between A and B, and H(A) and H(B) denote the entropies of A and B, respectively. If A is identical to B, NMI(A, B) equals 1; if A is independent of B, NMI(A, B) becomes 0.
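A small sketch of the NMI computation on label vectors is given below. It assumes the geometric-mean normalization I(A, B) / √(H(A)H(B)), which is one common convention and may differ from the exact variant in [49]; returning 0 when one labeling has zero entropy is a choice made only for this illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label sequences of equal length."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in count_ab.items())


def nmi(a, b):
    """NMI with the assumed geometric-mean normalization."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        return 0.0   # convention chosen for this sketch only
    return mutual_information(a, b) / math.sqrt(h_a * h_b)


same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])   # same grouping, renamed labels
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])      # statistically independent
```

NMI does not depend on how the clusters are numbered, which is why the first call compares two labelings that differ only by a renaming.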
In addition, we evaluate the clustering performance by measuring the user's accuracy, producer's accuracy, overall accuracy (OA), average accuracy (AA), and kappa coefficient (κ). For a dataset with n pixels, y i is the clustering label of pixel x i obtained by the clustering method, and g i is the ground-truth label of x i . The OA is obtained by

$$\mathrm{OA} = \frac{1}{n} \sum_{i=1}^{n} \delta\!\left(\operatorname{map}(y_i),\, g_i\right),$$

where δ(x, y) = 1 if x = y and δ(x, y) = 0 otherwise, and map(·) is the optimal mapping function that permutes the clustering labels to match the ground-truth labels. The best mapping can be found by using the Kuhn-Munkres algorithm [50]. The AA is the mean over classes of the per-class accuracy, i.e., the ratio between the number of correctly labeled pixels of a class and the total number of pixels of that class. For clustering tasks, the clustering labels obtained in the experiment must be aligned with the class labels of the ground truth. To this end, a simple exhaustive search over all permutations of the cluster labels is used to maximize the resulting OA, as was done in [51]. We note that this alignment is the most favorable one for the OA measurement; there may be alternative alignments that are better for maximizing AA or κ [52].
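The exhaustive label alignment described above can be sketched as follows. This is an illustrative implementation, not the authors' code; it assumes the number of clusters does not exceed the number of ground-truth classes, and the function name and toy labels are hypothetical. For many classes, the Kuhn-Munkres algorithm is the practical alternative to enumerating permutations.

```python
from itertools import permutations


def overall_accuracy(pred, truth):
    """Best OA over all one-to-one mappings of cluster labels to class labels.

    Assumes the number of distinct clusters is at most the number of classes.
    """
    pred_ids = sorted(set(pred))
    true_ids = sorted(set(truth))
    best = 0.0
    for perm in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(mapping[p] == g for p, g in zip(pred, truth))
        best = max(best, hits / len(truth))
    return best


# Cluster ids 0 and 1 are swapped relative to the ground truth; one pixel is wrong.
pred = [1, 1, 0, 0, 2, 2]
truth = [0, 0, 1, 1, 2, 1]
oa = overall_accuracy(pred, truth)
```

In this toy case the best mapping sends cluster 1 to class 0, cluster 0 to class 1, and cluster 2 to class 2, which matches five of the six pixels.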

Compared Methods
To evaluate the clustering performance of the proposed DHLR and UDHLR algorithms, eight clustering methods are selected for comparison. The first category comprises two centroid-based methods, k-means [6] and fuzzy c-means (FCM) [7]. The number of k-means iterations is 200 in our experiments, and the fuzziness exponent in FCM is set to 2. For the second category, we compare against classical spectral-based clustering approaches using both a globally connected graph (GSC [8]) and a locally connected graph (LSC [9]). The graph weights are constructed with a Gaussian kernel. The third category comprises four subspace-based spectral clustering methods: SSC [10,11], LRSC [12,13], GLRSC [15], and the hypergraph-regularized LRSC (HGLRSC) described in [52].
(1) Parameter analysis in DHLR: In DHLR, λ1 is the manifold regularization parameter, λ2 is the noise regularization parameter, and λ3 is the penalty parameter of the hyperedge weight W. Figure 3 shows the OA of DHLR with respect to λ1. For the Indian Pines dataset, the OA peaks at λ1 = 0.1. For the Salinas-A dataset, the best result is obtained with λ1 = 1. For the Jasper Ridge dataset, the clustering results are better with λ1 = 1. Figure 4 shows the OA of DHLR with respect to λ2. For the Indian Pines dataset, the OA peaks at λ2 = 1000. For the Salinas-A dataset, the best result is obtained with λ2 = 1. For the Jasper Ridge dataset, the clustering results are better with λ2 = 0.01. Figure 5 shows the OA of DHLR with respect to λ3. According to Figure 5, the proposed method achieves better performance with λ3 set to 1000, 0.01, and 0.001 for the Indian Pines, Salinas-A, and Jasper Ridge datasets, respectively.
(2) Parameter analysis in UDHLR: In addition to the three parameters λ1, λ2, and λ3 shared with DHLR, λ4 is the parameter of the label feature manifold regularization, and λ5 is conducive to discrete label learning. Figure 6 shows the OA of UDHLR with respect to λ1. For the Indian Pines dataset, the best results are achieved with λ1 = 10. For the Salinas-A dataset, the clustering results are better with λ1 = 1000. For the Jasper Ridge dataset, the best result is obtained with λ1 = 0.001. Figure 7 shows the OA of UDHLR with respect to λ2. For the Indian Pines dataset, the best results are achieved with λ2 = 0.01. For the Salinas-A dataset, the clustering results are better with λ2 = 1. For the Jasper Ridge dataset, the best result is obtained with λ2 = 100. Figure 8 shows the OA of UDHLR with respect to λ3. UDHLR performs well with λ3 set to 1, 0.01, and 100 for the Indian Pines, Salinas-A, and Jasper Ridge datasets, respectively. In UDHLR, λ4 and λ5 play a vital role in the clustering performance. Figure 9 shows the OA values on the three datasets when tuning λ4 while keeping the other parameters fixed. As can be seen, the best result is achieved with λ4 = 10 for the Indian Pines dataset. For the Salinas-A dataset, we set λ4 = 1000 in our experiments. The results in Figure 9c show that UDHLR performs well with λ4 = 1000 for the Jasper Ridge dataset. Figure 10 shows the OA of UDHLR with respect to λ5. For the Indian Pines dataset, the OA peaks at λ5 = 0.01. For the Salinas-A dataset, the best result is obtained with λ5 = 0.001. For the Jasper Ridge dataset, the clustering results are better with λ5 = 1.

Investigation of Clustering Performance
Both the clustering maps and quantitative evaluation results are given in this section. The presented results clearly demonstrate that DHLR and UDHLR outperform the other methods on the three datasets. We run all the methods 100 times independently, and show the mean results of the clustering result in the corresponding Tables of the three datasets. In addition, the corresponding variance values of the methods generated in three datasets are recorded in Figure 11. (1) Indian Pines: Figure 12 shows the clustering maps of the Indian Pines dataset. Table 3 gives the quantitative the clustering results. In general, the graph-based methods get better performance than the methods with no graph. Specifically, the K-means and FCM methods perform poorly with many misclassifications in the cluster map because of without exploring the local geometrical structure of the data. Compared with K-means and FCM, the GSC and LSC methods improves the clustering results by applying k-means on the eigenspace of the Laplacian matrices. In contrast, the subspace clustering methods can obtain a much better performance by using subspace learning to model the complex inherent structure of HSI data. Compared with K-means, SSC and LRSC perform much better in this dataset, obtaining the increments in OA of 3.82% and 5.58%, respectively. However, the learned representation coefficient matrix cannot capture the essential geometric structure information. As a result, the clustering results are not very high. GLRSC and HGLRSC improve the clustering performance of LRSC by optimizing the low-rank representation coefficient with the graph and hypergraph regularization, which shows the advantage of incorporating the latent geometric structure information. Unfortunately, the hypergraph is usually fixed, which is constructed by the original data, which is not optimized adaptively. The proposed DHLR algorithm improves 4.49% compared with the classical LRSC, and more than 4.07% compared with HGLRSC. 
Furthermore, the proposed UDHLR method obtains the best results with the 2.07% improvements than DHLR.  (2) Salinas-A: Figure 13 illustrates the visualization performance of the Salinas-A dataset. Table 4 gives the corresponding quantitative clustering results. Among these comparison algorithms, GLRSC and HGLRSC combine the graph theory and representation learning into the HSI data clustering. Meanwhile, SSC and LRSC only use the representation learning to obtain the new feature, and GSC and LSC only use the graph theory into the clustering. It can be seen from Table 4 that clustering accuracy of GSC, LSC, SSC and LRSC is lower than GLRSC and HGLRSC. This indicates that learning with the local geometry structure information can improve the HSI clustering observably. In addition, K-means and FCM methods perform poorer than the spectral-based methods. Compared with the aforementioned methods, the proposed DHLR and UDHLR effectively improve the clustering performance by optimal the hypergraph adaptively. As shown in Table 4, UDHLR achieves the highest OA than other methods. We can see that the proposed DHLR and UDHLR algorithms can effectively preserve the detailed structure information, and show an obvious advantage compared with the other clustering methods.  (3) Jasper Ridge: Figure 14 and Table 5 show the visual and quantitative clustering results of Jasper Ridge dataset, respectively. From Figure 14 and Table 5, we can see that the centroid-based and spectral-based clustering methods-K-means, FCM, GSC, LSC, SSC, and LRSC-achieve poorer clustering performance when compared with the graph and hypergraph combined clustering results. On the contrary, GLRSC obtains a much higher clustering accuracy than LRSC. HGLRSC also obtains higher clustering precision than LRSC and GLRSC. The proposed DHLR and UDHLR algorithm outperform the other state-of-the-art clustering methods significantly. 
Among them, the UDHLR method achieves the best clustering results, with the highest OA of 92.56%, which again demonstrates the advantage of the proposed algorithm.
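To make the reported quantities concrete, the following is a minimal sketch of how an OA value and its mean and variance over repeated runs could be computed for a clustering result. It rests on several assumptions that this section does not state: that cluster indices are mapped to ground-truth classes by the best one-to-one assignment before counting correct pixels, that the variance is the population variance, and that the function name `overall_accuracy` and the toy labels are purely illustrative.

```python
from itertools import permutations
from statistics import fmean, pvariance


def overall_accuracy(true_labels, cluster_ids):
    """OA (%) after the best one-to-one mapping of cluster ids to classes.

    Assumption: the mapping is chosen to maximise the number of matching
    pixels. Brute force is used here, so this only suits a few classes.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes")
    # counts[(k, c)] = number of pixels in cluster k whose true class is c
    counts = {}
    for t, k in zip(true_labels, cluster_ids):
        counts[(k, t)] = counts.get((k, t), 0) + 1
    best = 0
    for perm in permutations(classes, len(clusters)):
        hits = sum(counts.get((k, c), 0) for k, c in zip(clusters, perm))
        best = max(best, hits)
    return 100.0 * best / len(true_labels)


# Toy ground truth and three toy "runs" (illustrative values only).
truth = [0, 0, 0, 1, 1, 2, 2, 2]
runs = [
    [2, 2, 1, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 2, 2, 2],
    [0, 0, 1, 1, 1, 2, 2, 0],
]
oas = [overall_accuracy(truth, run) for run in runs]
mean_oa = fmean(oas)
var_oa = pvariance(oas)
print(f"per-run OA: {oas}")
print(f"mean OA = {mean_oa:.2f}%, variance = {var_oa:.2f}")
```

For realistic numbers of classes, the brute-force search would normally be replaced by the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment` applied to the negated contingency table.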

Discussion
In this section, we discuss the computational complexity of the proposed DHLR and UDHLR methods. The main computational cost of the DHLR algorithm lies in updating $J^{t+1}$, $Z^{t+1}$, and $N^{t+1}$, each of which has a complexity of about $O(n^2 r)$. As stated in [13], $r$ is the rank of the dictionary with the orthogonal basis of the dictionary data. Updating $H^{t+1}$ requires constructing an $n \times n$ matrix, whose time complexity is $O(nd^2)$. The complexity of updating $W^{t+1}$ is $O(nd^2)$. In addition, updating the Lagrange multipliers has a complexity of $O(nd)$, which is small enough to be neglected. Besides the variables $J^{t+1}$, $Z^{t+1}$, $N^{t+1}$, $H^{t+1}$, and $W^{t+1}$, which are updated as in DHLR, the UDHLR algorithm also updates $F^{t+1}$, $Q^{t+1}$, and $Y^{t+1}$. The complexity of updating $F^{t+1}$ is $O(nc^2 + c^3)$. Solving for $Q^{t+1}$ involves an SVD, and its complexity is $O(nc^2 + c^3)$. Updating $Y^{t+1}$ requires $O(nc^2)$. Therefore, the total complexity of UDHLR is $O(n^2 r + nd^2 + nc^2 + c^3)$. Although the number of clusters $c$ is small, the computational complexity of the proposed methods is considerably higher than that of the original LRSC algorithm because they involve matrix inversion and SVD. In the future, we will consider parallel computing to increase the running speed.
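As a rough illustration of how these terms compare, the sketch below evaluates each term of $O(n^2 r + nd^2 + nc^2 + c^3)$ for one set of problem sizes. The sizes are hypothetical and chosen only for illustration, the readings of $n$, $d$, and $c$ as the number of pixels, the feature dimension, and the number of clusters are assumptions, and the counts ignore the constant factors that big-O notation hides.

```python
def per_iteration_terms(n, d, r, c):
    """Constant-free proxies for the terms in O(n^2 r + n d^2 + n c^2 + c^3)."""
    return {
        "n^2 r": n * n * r,  # J, Z, N updates
        "n d^2": n * d * d,  # H and W updates
        "n c^2": n * c * c,  # F, Q, Y updates
        "c^3": c ** 3,       # F and Q updates
    }


# Hypothetical sizes, not taken from the experiments.
terms = per_iteration_terms(n=10_000, d=200, r=200, c=16)
total = sum(terms.values())
dominant = max(terms, key=terms.get)
for name, value in terms.items():
    print(f"{name:>6}: {value:.3e} ({100 * value / total:.4f}% of total)")
print("dominant term:", dominant)
```

With these assumed sizes, the $n^2 r$ term is about 50 times larger than the $nd^2$ term, and the terms that depend on $c$ are several orders of magnitude smaller again. Under these assumptions, the extra label-learning steps of UDHLR add little to the per-iteration cost relative to the low-rank updates.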

Conclusions
In this paper, we propose a novel unified adaptive hypergraph-regularized low-rank subspace learning method for hyperspectral image clustering. In the proposed framework, the low-rank and hypergraph terms are used to explore the global and local structure information of the data, respectively, and the last two terms are used to learn the continuous label and the discrete label. Specifically, the hypergraph is adaptively learned from the low-rank subspace feature rather than from a fixed incidence matrix, which is theoretically optimal for clustering. In addition, the proposed model learns a rotation matrix to learn the continuous labels and the discrete cluster labels simultaneously, which avoids the relaxation information loss incurred by many existing spectral clustering methods. It jointly learns the similarity hypergraph from the learned low-rank subspace data and the discrete clustering labels by solving an optimization problem, in which the subspace feature is adaptively learned by considering the clustering performance, and the continuous clustering labels serve only as intermediate products. The experimental results demonstrate that the proposed DHLR and UDHLR outperform the existing clustering methods. However, the computational complexity of each iteration is very high in the proposed methods and should be reduced to improve the running time. In the future, we will reduce the complexity of the proposed methods and extend the hypergraph learning to large-scale hyperspectral image clustering.