Spatial–Spectral Constrained Adaptive Graph for Hyperspectral Image Clustering

Hyperspectral image (HSI) clustering is a challenging task, whose purpose is to assign each pixel to a corresponding cluster. The high-dimensionality and noise corruption are two main problems that limit the performance of HSI clustering. To address those problems, this paper proposes a projected clustering with a spatial–spectral constrained adaptive graph (PCSSCAG) method for HSI clustering. PCSSCAG first constructs an adaptive adjacency graph to capture the accurate local geometric structure of HSI data adaptively. Then, a spatial–spectral constraint is employed to simultaneously explore the spatial and spectral information for reducing the negative influence on graph construction caused by noise. Finally, projection learning is integrated into the spatial–spectral constrained adaptive graph construction for reducing the redundancy and alleviating the computational cost. In addition, an alternating iteration algorithm is designed to solve the proposed model, and its computational complexity is theoretically analyzed. Experiments on two different scales of HSI datasets are conducted to evaluate the performance of PCSSCAG. The associated experimental results demonstrate the superiority of the proposed method for HSI clustering.


Introduction
Hyperspectral remote sensing combines imaging and spectral technologies together to detect objects remotely. The resulting hyperspectral images (HSIs) contain rich spatial and spectral information, which are able to distinguish objects with small dissimilarity. Therefore, HSIs have been widely used in various fields [1,2], such as agriculture [3], urban planning [4], environment monitoring [5], etc. In these applications, HSIs play an important role in the classification of ground objects, which is often achieved by hyperspectral image (HSI) segmentation. Currently, the HSI segmentation methods can be roughly divided into two categories: supervised and unsupervised ones. Supervised HSI segmentation is generally known as HSI classification, whose representative methods include support vector machine (SVM) [6], sparse representation-based classifier (SRC) [7], extreme learning machine (ELM) [8], and so on. Nevertheless, HSI classification requires a lot of well-labeled samples to train the model [9], which limits its application. For unsupervised HSI segmentation, the HSI labeling is unnecessary, which can greatly simplify the data processing. As the most important part of the unsupervised HSI segmentation, HSI clustering attracts extensive attention due to its simplicity and efficiency. To this end, this paper mainly focuses on HSI clustering.
In the past decades, numerous clustering methods have been proposed and applied for HSI segmentation, among which K-means clustering is one of the most classical clustering algorithms [10]. It converges quickly and generally performs well on small-scale data. However, K-means clustering is sensitive to noise and prone to falling into the local optimum. Furthermore, it cannot handle non-convex data. To overcome those shortcomings, the basic iterative self-organizing data analysis algorithm (ISODATA) was proposed to improve the K-means clustering [11], where the parameter K is updated in each iteration.
The non-negative matrix factorization (NMF) was developed by factorizing data matrix into two low-dimensional non-negative factor matrices to achieve less computational cost on HSI clustering [12]. Fuzzy c-means clustering (FCM) was proposed to avoid the hardclustering deficiency by utilizing the fuzzy membership [13]. It achieves clustering by calculating the membership of each sample to all classes, where the value of membership is between 0 and 1. However, FCM often produces clustering maps with salt and pepper noise. Considering the spatial continuity of objects, the spatial constraint was applied to exploit the spatial information of image data for enhancing the robustness of FCM, resulting in the fuzzy clustering with spatial constrains (FCM_S) [14]. In order to reduce the complexity of FCM_S, FCM_S1 was proposed [15]. Additionally, by integrating the idea of weighted mean into the FCM, Li. et al. [16] proposed the fuzzy weighted c-means (FWCM) to improve the performance of clustering. Subsequently, a new weighted fuzzy C-means algorithm (NW-FCM) was proposed for solving similar high-dimensional multiclass pattern recognition problems [17]. However, these extended FCM algorithms need a parameter to control the balance between robustness to noise and the effectiveness of preserving details; the selection of these parameters is difficult in practice. To overcome the above shortages, Krinidis and Chatzis [18] presented a fuzzy local information c-means (FLICM). In FLICM, the center pixel is greatly affected by its neighboring pixels. Thus, to trade off the center pixel's own features and the influence of neighboring pixels, a novel adaptive FLICM (ADFLICM) clustering approach was proposed to modify FLICM [19].
The subspace clustering methods generally model the same-class pixels that have various spectral signatures with a subspace and approximate the complex internal structure of HSIs by a union of subspaces, which may relieve the large spectral variability and improve the modeling accuracy [20]. The most representative subspace clustering model is sparse subspace clustering (SSC), which was proposed to group data points into different subspaces by finding the sparsest representation for each data point [21,22]. Combined with the spatial information and the nonlinearity of HSIs, many modified SSC methods have been proposed for HSI clustering. For example, a novel spectral-spatial sparse subspace clustering (S 4 C) was developed to explore the spectral similarity of local neighborhoods for improving the SSC model by incorporating the wealthy spatial information of HSI [23]. In addition, in [24], a spectral-spatial SSC based on 3D edge-preserving filtering (SSC-3DEPF) algorithm was put forward. It utilizes 3D edge-preserving filtering for the sparse coefficient matrix obtained by SSC to extract the spectral-spatial information to generate a more accurate coefficient matrix, which is favorable for clustering. A joint SSC (JSSC) method [25] was proposed to make use of the spatial information through joint sparse representation. It forces the pixels in a spatial neighborhood to share the same sparse basis. As the advantages of deep structures have been extensively verified, SSC also was extended to deep vision. To make full use of spatial information, a novel spectral-spatial Laplacian regularized deep subspace clustering (LRDSC) algorithm is proposed for HSI analysis [26]. Furthermore, a novel deep spatial-spectral subspace clustering network (DS 3 C-Net) is proposed to learn the similarity relationship among the pixels for improving HSI clustering [27].
Another important kind of clustering technique is the recent graph-based approaches. The main idea of these methods is to make the similarities within the sub-graphs as high as possible while making the edge weights connecting different sub-graphs as low as possible. The typical ones include clustering with adaptive neighbors (CAN) [28], projected CAN (PCAN) [28], fast spectral clustering with anchor graph (FSCAG) [29], and a scalable graphbased clustering with non-negative relaxation (SGCNR) [30]. CAN can adaptively acquire the local geometry of the data via constructing an adaptive adjacency graph and greatly improve the accuracy of clustering. The PCAN model attempts to learn a low-dimensional projection for reducing the dimensionality of data while clustering with adaptive neighbors. FSCAG and SGCNR greatly reduce the computational complexity by constructing the anchor graph such that they can be applied for large-scale HSI processing.
Although the previous HSI clustering methods have achieved great success, there are still two questions that need further study in HSI clustering. On the one hand, most of those methods are sensitive to noise corruption and cannot capture the intrinsic structure of data with noise accurately. On the other hand, the high-dimensionality of HSI data not only leads to a huge increase in clustering cost, but also limits the performance of HSI clustering. To tackle the above problems, this paper proposes a projected clustering with a spatialspectral adaptive graph (PCSSCAG) for HSI segmentation. PCSSCAG first constructs a spatial-spectral constrained adaptive graph with the locality structure adaptive acquisition technique, which can precisely capture the local geometrical structure information of HSI data. Meanwhile, a spatial-spectral constraint is utilized to simultaneously exploit the spatial and spectral information of HSI, which can further suppress the negative impacts of noise to improve the quality of the adaptive adjacency graph. Then, projection learning is integrated into the construction of spatial-spectral constrained adaptive graph to solve the problems that arise from high-dimensional features. Finally, this paper designs an alternating iteration algorithm to optimal the proposed model and theoretically analyze the computational complexity of the optimization algorithm. In summary, the proposed PCSSCAG method can simultaneously exploit spatial-spectral information and adaptively capture the locality geometrical structure to enhance the robustness against noise. Moreover, PCSSCAG can preserve the information reflected by the adaptive adjacency graph in the low-dimensional space to improve the performance of clustering. In addition, the lowdimensional projection, the captured locality structure, and the clustering results will finetune each other to obtain better solution at every iteration of the optimization algorithm. Extensive experiments on some benchmark HSI datasets demonstrate the effectiveness of the proposed method.

Methodology
This section first introduces, in detail, the formulation of the proposed PCSSCAG model. Then, an alternating iteration algorithm is designed to optimize the proposed model. At last, the complexity of the optimization algorithm is theoretically analyzed, and a parallel computation strategy of PCSSCAG is proposed for large-scale HSI clustering.

Formulation of PCSSCAG
In general, due to the influences of imaging environments and the characteristics of the imaging system, the obtained HSI data are inevitably disturbed by noise, which seriously degrades the quality of the data and limits the performance of HSI clustering. For the graph-based clustering methods, the quality of the adjacency graph plays an important role in the clustering. The more accurate the local geometrical structure captured by the adjacency graph, the better the clustering performance yielded. Thus, how to capture the precise intrinsic structure from the noisy data is critical for improving the accuracy of HSI clustering. Recently, the locality neighbors adaptive acquisition technique has provided an effective choice for characterizing noisy data, which can reveal the intrinsic structure of data adaptively. Inspired by CAN, an adaptive adjacency graph is first constructed to capture the accurate local geometrical structure of HSI data for improving the performance of HSI clustering. In detail, supposing a data matrix X ∈ {x 1 , x 2 , . . . , x i , . . . , x n }, n is the number of samples. Then, we can deal with the following problem to obtain the adjacency graph for the HSI data where x i ∈ R m×1 , s ij represents the similarity between x i and x j . s i ∈ R n×1 , and the j-th element is s ij . γ is the regularization parameter. It can be known from problem (1) that the smaller distance of x i − x j 2 2 corresponds to the higher probability s ij . Since the local neighbor pixels have a high probability of belonging to the same cluster, spatial information plays a critical role for the segmentation of HSI [31], which can effectively suppress the impact of noise to yield a more smooth segmentation map.
However, the problem (1) fails to consider the spatial information. Therefore, in order to construct a more accurate adaptive adjacency graph for HSI clustering, a spatial-spectral constraint term is added to (1) for exploiting the spatial-spectral information to enhance the robustness. Mathematically, the objective function with spatial-spectral constraint is expressed as where β is the impact factor of the spatial-spectral constraint, x k i is the k-th neighbor pixel of x i , and N r is the number of neighbor pixels.
The ideal adjacency graph with a clear clustering structure can be achieved by adding an additional constraint rank(Ls) = n − c into the problem (2). Thus, the new clustering model is to solve where S ∈ R n×n , and the i-th element is s i . Moreover, Ls is the Laplacian matrix, where According to [18], the problem (3) can be transformed into where 2λTr( , the stronger the similarity. It means that the probability (s ij ) that two samples belong to the same cluster is greater.
Furthermore, as mentioned in the Introduction, HSI data have the characteristics of high dimensionality, which often contain an amount of redundancy and lead to high computational cost. To address these problems, we integrate projection learning into the above model and develop a method named projected clustering with a spatial-spectral constrained adaptive graph (PCSSCAG) for HSI clustering. The corresponding objective function is formulated as where P ∈ R d×m (d << m), and S t = X T HX, where H = I − 1 n 11 T . Remarkably, while allowing the projection P = I ∈ R m×m , the PCSSCAG model will degenerate into (4), which is a special case of PCSSCAG with clustering on the raw data with a spatial-spectral constrained adaptive graph. Thus, to extensively verify the effectiveness of PCSSCAG, we denote the degenerated model as clustering with spatialspectral constrained adaptive graph (CSSCAG) for comparison in the experimental part.

Optimizayion of PCSSCAG
In the constructed PCSSCAG model, there are three variables (S, P, and F) that need to be solved. It is difficult to obtain the optimal solution for all variables at the same time. Therefore, an alternating iteration algorithm is designed to solve the three variables. Firstly, we initialize S by solving problem (1). Then, the iterative algorithm consists of the following three steps: (1) Update F If S and P are fixed, the optimal F can be computed by The optimal F is formed by the c eigenvectors of Ls = Ds − S T −S 2 corresponding to the c smallest eigenvalues.
(2) Update P Assuming F and S is given, the optimization problem becomes It can be written as The optimal solution is formed by the m eigenvectors of St −1 X T LsX corresponding to the m smallest eigenvalues. Let According to [18], the solution of s ij (the ith element of s i ) to the above problem is

Computational Complexity Analysis for PCSSCAG
This subsection briefly analyzes the computational complexity of Algorithm 1. The computational cost of optimal PCSSCAG mainly comes from updating the variables S, P, and F. Without loss of generality, we suppose that the raw data contain n samples with m features, and the projection P reduces the raw into a low-dimensional space with d features, where d m, n. The complexity of updating S in each iteration is O(nd 2 ). Updating P and F require solving two eigenvalue problems, whose complexities are at most O(n 3 ), respectively. Thus, the total computational complexity of solving PCSSCAG is at most O(t(nd 2 + 2n 3 )), where t is the number of iterations for Algorithm 1. Since d n, the complexity of PCSSCAG is O(tn 3 ), which is only highly related to the size of the samples. This implies that the PCSSCAG method can process high-dimensional data effectively.

Algorithm 1 Optimization Algorithm for Solving PCSSCAG
Input: Dataset X ∈ R n×d , cluster number c, reduced dimension m, parameter γ, λ, and β. Initialization: Initialize S by computing the problem (1). while not converged do 1: Update F by computing problem (6). 2: Update P by computing problem (8). 3: For each i, update the i-th row of S by computing problem (10). Output: S, P 2.4. Large-Scale HSI Clustering Strategy with PCSSCAG As discussed in Section 2.3, the computational complexity of PCSSCAG is highly related to the number of samples. Therefore, PCSSCAG requires more computational cost while clustering large-scale HSIs. To tackle this problem, a parallel computation strategy as shown in Figure 1 is designed to deal with the large-scale HSIs. In this strategy, HSIs are first divided into several small non-overlapping parts. Then, PCSSCAG are parallelly adopted to do clustering on each parts. Finally, the overall clustering result is obtained by combining the clustering results of all parts together. To validate the efficiency for large-scale HSI clustering, experiments are conducted on some benchmark large-scale HSI datasets in the next section.

Experiments
In this part, the performance of the proposed PCSSCAG is systematically evaluated with several state-of-the-art methods, such as NMF, FCM, FCM_S1, FSCAG, CAN, and PCAN. Specifically, to verify the usefulness of the integrated low-dimensional projection, the PCSSCAG degenerated model (i.e., CSSCAG) is used as a comparative method for the ablation study. To quantitatively evaluate the methods, the used evaluation metrics are the clustering accuracy of each category, overall accuracy (OA), Kappa coefficient (κ), and normalized mutual information (NMI). The following is the calculation formula of NMI.
where H(X), H(Y) are the respective information entropies of X and Y, and I(X, Y) is the mutual information of X and Y. In order to ensure the fairness of the experiment, the parameter values of the comparative methods are adjusted to the optimum. Furthermore, each method is rerun 100 times to eliminate the effect of random initialization, and the average result is reported as the performance evaluation.

Data Description
In the experiments, two different scales of HSIs are utilized to thoroughly validate the effectiveness of the proposed PCSSCAG method. Specifically, the Indian Pines and Salinas-A are two small-scale HSI datasets firstly used for testing the PCSSCAG method. Then, the whole Salinas and University of Pavia datasets are employed for verifying the clustering performance on large-scale HSIs similar to [32,33]. The more detailed descriptions of the datasets are presented in the following.
The Indian Pine dataset was gathered by the Airborne Visible Infrared Imaging (AVIRIS) sensor. The number of bands of the Indian Pines dataset used in our experiment was reduced to 200 by removing bands covering the region of water absorption. In particular, a typical part of the Indian Pines dataset, with the size of 85 × 68 was selected for experiments, which includes four classes: corn-notill, grass-trees, soybean-nottill, and soybean-mintill. The Salinas dataset was acquired by AVIRIS sensor over the Salinas Valley. It includes 224 spectral bands, with the size of 512 × 217. Similar to the Indian Pines scene, 24 water absorption bands were discarded. Salinas-A is a small subscene of Salinas image, with the size of 83 × 86. Salinas-A includes six classes: broccoli-green-weeds-1, corn-senesced-green-weeds, lettuce-romaine-4wk, lettuce-romaine-5wk, lettuce-romaine-6wk and lettuce-romaine-7wk. The University of Pavia dataset was obtained by German airborne reflection optical spectral imager. The size of the dataset is 610 × 340 and with 103 spectral bands. The University of Pavia dataset includes nine classes.

Parameter Analysis
In the proposed methods, there are two parameters (i.e., the size of the sliding window and β) that need to be pre-determined. In our experiments, the size of the sliding window is empirically set as 3 × 3. For the value of β, we select the value from 0 to 1 at an interval of 0.1 and finally determine the reasonable β value according to the best performance of clustering.
(1) Parameter analysis for the small-scale HSIs Figures 2 and 3 show the changes of clustering OA and NMI for small-scale HSI datasets yielded by CSSCAG and PCSSCAG, respectively. From Figure 3, it is easy to see that the optimal value of β in CSSCAG is 0.1 for the Indian Pine dataset and 0.2 for Salinas-A. Likewise, the suitable value of β for Indian Pine and Salinas-A datasets can be respectively set as 0.2 and 0.7 in PCSSCAG. (2) Parameter analysis for the large-scale HSIs The changes of clustering OA and NMI for the large-scale HSI datasets are exhibited in Figures 4 and 5. It can be learned that the most appropriate value of β for the University of Pavia dataset in CSSCAG and PCSSCAG are 0.7 and 0.6, respectively. For the Salinas datasets, the suitable β is same as the Salinas-A dataset.

Experimental Results and Analysis
(1) Clustering for small-scale HSIs The first experiment is conducted on the two small-scale HSI datasets. The clustering accuracy of each category, OA, NMI, and κ of the Indian Pines and Salinas-A datasets yielded by different clustering algorithms are listed in Tables 1 and 2. From Tables 1 and 2, it can be seen that the OA, NMI, and κ obtained by the classical methods (i.e., NMF, FCM, and FCM_S1) are relatively low for the two small-scale HSIs. It is obvious that the graphbased methods (i.e., CAN, PCAN, FCAG, CSSCAG, and PCSSCAG) perform better than the classical clustering methods. Among the graph-based methods, the CSSCAG and PCSSCAG methods that with spatial-spectral constraint obtain better clustering performance than CAN and PCAN, respectively. More importantly, the proposed PCSSCAG achieves the highest clustering accuracy. Comparing with the other methods, PCSSCAG yields increases of more than 4 percent in OA and 3 percent in κ on the Indian Pines dataset, respectively. PCSSCAG achieves great improvement on Salinas-A in clustering accuracy and presents the highest clustering accuracy with OA of 0.99, NMI of 0.97, and κ of 0.99.  Figures 6 and 7 show the corresponding cluster maps of different clustering methods for Indian Pines and Salinas-A, respectively. From the figures, a consistent conclusion can be learned from the cluster maps. It can be found that there are almost only two clusters in the cluster maps of Indian Pines yielded by FCM and FCM_S1. They failed to divide some similar classes, such as soybean-mintill and soybean-nottill. FCM and FCM_S1 also obtain similar results on Salinas-A. The graph-based clustering methods perform better cluster maps for both small-scale datasets. In particular, PCSSCAG products a more smooth clustering map than all comparative methods. (2) Clustering for large-scale HSIs The second experiment will validate the performance of the proposed PCSSCAG with two large-scale HSIs, i.e., Salinas and University of Pavia. Those two large-scale HSI datasets include quite a lot classes. Particularly, they contain more complex land-cover classes and the spectral signatures of some classes are very similar, which results in a more challenging clustering task. The quantitative evaluation and visual clustering results of the Salinas dataset are reported in Table 3 and Figure 8. From Table 3, it is obvious that the conventional clustering methods, such as NMF and FCM, achieve competitive clustering performance. However, the accuracy of some categories is still unsatisfactory. Comparing to NMF, FCM, and FCM_S1, the graph-based methods (i.e., CAN, PCAN, FSCAG, CSSCAG, and PCSSCAG) yield better clustering results. Specifically, compared to FCM and NMF, PCAN achieves an improvement of over 3 percent in OA and κ, and almost 2 percent in NMI. CSSCAG and PCSSCAG obtain higher clustering accuracy than CAN and PCAN, respectively. That implies the spatial-spectral constraint can effectively improve the cluster performance of HSIs. PCSSCAG improves the clustering performance by integrating a projection to reduce the redundancy while comparing to CSSCAG. More importantly, the proposed PCSSCAG method achieves the best clustering result. It is not hard to find that the clustering maps in Figure 8 show a consistent results with the accuracy in Table 3.   The experimental results of the University of Pavia dataset are shown in Table 4 and Figure 9. It can be seen from the Table 4 that the clustering results of the classical clustering methods (such as NMF, FCM, and FCM_S1) are very poor. For instance, the OA of FCM and NMF is less than 0.5 and samples from several categories failed to be divided by FCM_S1. Comparing with the classical clustering methods, the graph-based methods achieve better clustering performance. In particular, PCSSCAG achieves the highest clustering accuracy, which outperforms the comparative methods more than 4 percent in OA, 6 percent in NMI, and 4 percent in κ, respectively. A similar conclusion can be obtained from the cluster maps in Figure 9 for the University of Pavia dataset. From the above experiments, it can be found that the proposed methods greatly improve the performance of clustering through the incorporation of spatial-spectral constraint and projection learning. Those experimental results on both large-scale datasets validate that the clustering performance of PCSSCAG is still acceptable for large-scale HSI analysis via the designed parallel computing strategy.  Overall, from the experimental results two different scales of HSI datasets, it can obtain that the graph-based methods perform better clustering results than the other comparative methods. Among the graph-based methods, the proposed PCSSCAG method achieves the best performance in both quantitative and visual results. Specifically, comparing with CAN and PCAN, the spatial-spectral constraint imposed on PCSSCAG helps to improve the clustering accuracy by simultaneously making full use of the spatial and spectral information of HSI data. The ablation study (i.e., comparing PCSSCAG with CSSCAG) demonstrates that the low-dimensional projection integrated in PCSSCAG can reduce the redundancy to improve the effectiveness and avoid the curse of dimensionality problem.

Conclusions
In this paper, a clustering method with a spatial-spectral constrained adaptive graph is proposed for HSI clustering. The proposed method first utilizes both spectral and spatial information of HSI data to construct a more precise adjacency graph, which helps to enhance the robustness against noise. Then, projection learning is employed to alleviate the negative influences caused by the high dimensionality, which further improved the accuracy of clustering. At last, extensive experiments are conducted on several real hyperspectral datasets to verify the proposed method, and the experimental results show that the proposed PCSSCAG method performs better than all of the involved comparative methods.
However, the weight matrix associated to the adjacency graph in the proposed method is highly related to the size of samples, which requires more memory while clustering large-scale HSIs. The designed parallel strategy for large-scale HSIs fails to consider the correlation among patches, which may result in some loss of valuable global information.
In further studies, we will make an effort to develop multiple graphs based clustering method for large-scale HSI processing.

Informed Consent Statement: Not applicable
Data Availability Statement: The hyperspectral datasets used in this study are openly available at https://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes.