Article

Large-Scale Hyperspectral Image-Projected Clustering via Doubly Stochastic Graph Learning

Xi’an Research Institute of High-Tech, Xi’an 710025, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(9), 1526; https://doi.org/10.3390/rs17091526
Submission received: 18 March 2025 / Revised: 21 April 2025 / Accepted: 21 April 2025 / Published: 25 April 2025
(This article belongs to the Special Issue Remote Sensing Image Classification: Theory and Application)

Abstract

Hyperspectral image (HSI) clustering has drawn increasing attention in recent years, as it frees us from labor-intensive manual annotation. However, current works cannot fully exploit the rich spatial and spectral information due to redundant spectral signatures and fixed anchor learning. Moreover, the learned graph is often suboptimal because affinity estimation and graph symmetrization are performed separately. To address these challenges, in this paper, we propose large-scale hyperspectral image-projected clustering via doubly stochastic graph learning (HPCDL). HPCDL is a unified framework that learns a projected space to capture useful spectral information while simultaneously learning a pixel–anchor graph and an anchor–anchor graph. Doubly stochastic constraints are imposed so that the anchor–anchor graph carries strict probabilistic affinities and directly provides anchor cluster indicators via its connectivity. Pixel-level clustering results are then obtained via label propagation. An efficient optimization strategy is proposed to solve the HPCDL model, requiring only linear complexity with respect to the number of pixels. Therefore, HPCDL can handle large-scale HSI datasets. Experiments on three datasets demonstrate the superiority of HPCDL in both clustering performance and time cost.

Graphical Abstract

1. Introduction

Hyperspectral images (HSIs) carry rich information across hundreds of spectral bands and have demonstrated a strong ability to discriminate land covers, particularly those with extremely similar signatures in color space [1]. As a result, HSIs have been widely used in high-level Earth observation tasks such as mineral analysis, mineral exploration, precision agriculture, and military monitoring. However, in practical applications, annotating a large number of pixels is cumbersome and intractable [2], which forces us to classify land covers in an unsupervised way, i.e., by clustering. Clustering is a classic topic in data mining; it partitions pixels into different groups by a specific distance (or similarity) criterion such that similar pixels are assigned to the same group. HSI clustering is a pixel-level clustering task whose methods can be roughly divided into centroid-based, density-based, subspace-based, deep-learning-based, and graph-based methods. Centroid-based methods pull pixels toward the nearest centroid during iterations. Classic methods include K-means [3], fuzzy C-means [4], and kernel K-means [5]. It is well known that centroid-based methods are extremely sensitive to centroid initialization, resulting in unstable performance [6]. Density-based methods achieve clustering via density differences among pixels in a particular space. A basic assumption is that densely packed pixels belong to the same cluster, while sparse pixels are outliers or boundary points. For this branch, the representative method DBSCAN [7] has demonstrated a strong clustering ability for arbitrarily shaped data, but it may fail when the data distribution is relatively balanced in density. Subspace-based methods assume that all the data can be represented in a self-expressive way. Classic algorithms include diffusion-subspace-based clustering [8], sparse-subspace-based clustering [9], and dictionary-learning-based subspace clustering [10]. These methods often suffer from a heavy computational burden, limiting their scalability to large-scale data. Inspired by hierarchical cluster building, HESSC [11] introduced a sparse subspace clustering method with a lower computational burden. Moreover, thanks to the rapid development of deep learning, many deep-learning-based methods have been proposed for HSI clustering. Recent works have studied graph neural networks (GNNs) [12] and the Transformer architecture [13] for HSI clustering. More recently, a lightweight contrastive learning model [14] was proposed that requires fewer parameters than most backbones. However, current deep-learning-based methods still suffer from poor theoretical explainability.
The graph-based method, as an important branch of HSI clustering, has been widely studied in recent years. It learns a graph representation to model the pairwise relations between pixels and then partitions the clusters based on the learned graph [15]. Some classic methods [16,17] learn a symmetrical pixel–pixel graph, which requires $O(n^2)$ memory and, in general, $O(n^3)$ computation, where $n$ denotes the number of pixels. The high storage and calculation burden means that these methods can only handle small-scale HSI datasets and become much slower as $n$ increases. To solve this problem, large-scale HSI clustering has become a key research direction, which can be roughly divided into superpixel-based and anchor-based methods. Superpixel-based methods use classic superpixel segmentation models such as entropy rate superpixel (ERS) [18] or simple linear iterative clustering (SLIC) [19] to group similar pixels located in local neighborhoods. Pixels in the same group are generally linearly weighted into a representative pixel termed a superpixel, and a symmetrical superpixel–superpixel graph is built over these superpixels to achieve clustering. The superpixel-based method thus drops memory complexity to $O(M^2)$ and computational complexity to $O(M^3)$, where $M$ is the number of superpixels. As $M \ll n$, it can be extended to large-scale HSI datasets. Along this line, some representative works have been proposed. SGLSC [20] proposed a superpixel-level graph-learning method that works from both global and local aspects: the global graph is built in a self-expressive way, while the local graph is built by using the Manhattan distance to retain the nearest four superpixels; after fusing the global and local graphs, spectral clustering produces the final results. More recently, EGFSC [21] simultaneously learned a superpixel-level graph and a spectral-level graph for fusion. As EGFSC is a multistage model free of iterations, it is much faster than SGLSC. However, superpixel-based models have an internal limitation: they regard all the pixels in one superpixel as belonging to the same cluster. Therefore, these models cannot obtain high-quality pixel-level clustering results, and their performance is highly sensitive to the number of superpixels.
The anchor-based method is more popular for large-scale HSI clustering. It selects $m$ representative anchors ($m \ll n$) to learn a pixel–anchor graph for clustering, so both memory and computational complexity become linear in $n$. Some works [22,23] have used random sampling or K-means to generate anchors for pixel–anchor graph learning and then partitioned the clusters by conducting singular value decomposition (SVD) together with an extra discretization operation. To generate more reasonable anchors, a recent model, SAGC [24], used ERS to obtain superpixels and regarded the center of each superpixel as an anchor. In this way, the selected anchors reflect the spatial context, unveiling exact pixel–anchor relations. However, the multistage process and the instability of postprocessing still limit the clustering performance. More effort has been spent on designing one-step paradigms. For example, SGCNR [25] proposed a non-negative and orthogonal relaxation method to obtain the cluster indicator directly from low-dimensional embeddings. GNMF [26] proposed a unified framework that conducts non-negative matrix factorization to yield a soft cluster indicator. SSAFC [27] proposed a joint learning framework of self-supervised spectral clustering and fuzzy clustering to directly generate a soft cluster indicator. By learning an anchor–anchor graph to provide cluster results, SAPC [28] achieves large-scale HSI clustering via label propagation. Moreover, a structured bipartite-graph-based HSI clustering model termed BGPC [29] imposes a low-rank constraint on the bipartite graph, directly providing cluster results via connectivity. Despite these efforts, current works confront two problems. First, performing SVD on an $n \times m$ pixel–anchor graph leads to $O(m^2 n + m^3)$ complexity, which is still somewhat high. Second, many current works are based on the graph learning mode of CAN [30]. However, CAN ignores the symmetry condition of the graph during affinity estimation, which often yields a suboptimal graph with poor clustering results. Moreover, although some doubly stochastic graph learning methods (like DSN [31]) have been proposed to solve this problem, they rely on predefined graph inputs due to the difficulty of optimization design.
To address the issues, this paper introduces a large-scale hyperspectral image projected clustering model (abbreviated as HPCDL) via doubly stochastic graph learning. The main contributions of this paper are as follows:
  • We introduce HPCDL (code published at https://github.com/NianWang-HJJGCDX/HPCDL.git, accessed on 18 February 2025), a unified framework that learns a projected feature space to simultaneously build a pixel–anchor graph and an anchor–anchor graph. The doubly stochastic constraints (symmetric, non-negative, rows summing to 1) are applied to the anchor–anchor graph, combining affinity estimation and graph symmetrization into one step, which ultimately produces a better graph with strict probabilistic affinities. The learned anchor–anchor graph directly yields anchor cluster indicators via its connectivity and propagates labels to pixel-level clusters through the nearest-neighbor relationships in the pixel–anchor graph.
  • We analyze the relationship between the proposed HPCDL and existing hierarchical clustering models from a theoretical perspective while also providing in-depth insights.
  • We design an effective optimization scheme. Specifically, we first deduce a key equivalence relation and propose a novel method for optimizing the subproblem with doubly stochastic constraints. Unlike the widely used von Neumann successive projection (VNSP) lemma, our method does not require decomposition of the doubly stochastic constraints for alternating projections. Instead, it optimizes all constraints simultaneously, thereby guaranteeing convergence to a globally optimal solution.
  • The experiments on three widely used HSI datasets demonstrate that our method achieves state-of-the-art (SOTA) clustering performance while maintaining low computational costs, making it easily extensible to large-scale datasets.
Organization: Section 2 introduces related works. Section 3 illustrates the proposed HPCDL and then designs an effective optimization scheme. Section 4 conducts experiments to verify the merits of our proposed model. Section 5 discusses the motivation, parameters, and limitations. Section 6 concludes the paper with future research directions.

2. Related Work

2.1. Clustering with Adaptive Neighbor (CAN)

CAN [30] is a classical graph learning method that constructs a graph with probabilistic affinities, where the neighborhood relationships of each sample are adaptively determined. Formally, the model of CAN is defined as
$$\min_{\mathbf{W}} \sum_{i,j=1}^{n} \left( \|\mathbf{x}_i - \mathbf{x}_j\|_2^2\, w_{ij} + \Gamma w_{ij}^2 \right) \quad \mathrm{s.t.}\ \mathbf{1}^T\mathbf{w}_i = 1,\ w_{ij} \geq 0, \tag{1}$$
where $\mathbf{W}$ denotes the affinity matrix of the graph. The term $\|\mathbf{x}_i - \mathbf{x}_j\|_2^2\, w_{ij}$ decides the affinities, implying that any pixel $\mathbf{x}_j$ can become a neighbor of the $i$-th pixel $\mathbf{x}_i$ with probabilistic affinity $w_{ij}$. This follows the assumption that pixels with a smaller Euclidean distance $\|\mathbf{x}_i - \mathbf{x}_j\|_2^2$ have a higher connection probability $w_{ij}$. The second term $w_{ij}^2$, weighted by a hyperparameter $\Gamma$, is used to avoid the trivial solution in which only the nearest pixel becomes the neighbor with probability 1. $\mathbf{w}_i$ denotes the $i$-th row of $\mathbf{W}$, and $\mathbf{1}$ is a column vector of all ones. Therefore, $\mathbf{1}^T\mathbf{w}_i = 1$ is the row-sum constraint and $w_{ij} \geq 0$ is the non-negative constraint, which together bind all affinities to $[0, 1]$. Let $d_{ij}^{x} = \|\mathbf{x}_i - \mathbf{x}_j\|_2^2$. To avoid parameter tuning, following [30], $\Gamma$ is set as
$$\Gamma = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{c}{2}\, d_{i,c+1}^{x} - \frac{1}{2} \sum_{j=1}^{c} d_{ij}^{x} \right), \tag{2}$$
where c denotes the number of neighbors. Consequently, a closed-form solution for W can be derived as
$$w_{ij} = \frac{d_{i,c+1}^{x} - d_{ij}^{x}}{c\, d_{i,c+1}^{x} - \sum_{j'=1}^{c} d_{ij'}^{x}}. \tag{3}$$
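For concreteness, the following NumPy sketch implements the closed-form affinities of Equations (2) and (3); the function name `can_affinities` and the rows-as-samples layout of `X` are our own assumptions, not part of CAN itself.

```python
import numpy as np

def can_affinities(X, c):
    """Minimal sketch of the CAN closed-form affinities (Eqs. (2)-(3)).

    X : (n, d) data matrix, one sample per row (hypothetical layout).
    c : number of adaptive neighbors.
    Returns an (n, n) affinity matrix W whose rows lie on the simplex.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances d_ij^x.
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D, np.inf)          # a sample is not its own neighbor
    idx = np.argsort(D, axis=1)          # neighbors ordered by distance
    W = np.zeros((n, n))
    for i in range(n):
        d_sorted = D[i, idx[i]]
        d_c1 = d_sorted[c]               # the (c+1)-th smallest distance
        num = d_c1 - d_sorted[:c]        # numerator of Eq. (3)
        den = c * d_c1 - d_sorted[:c].sum()
        W[i, idx[i, :c]] = num / max(den, 1e-12)
    return W
```

By construction, each row of the returned matrix sums to 1, so no extra normalization is needed.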
Furthermore, [30] proposed adding a rank constraint $\mathrm{rank}(\mathbf{L}_W) = n - k$ to Problem (1), where $n$ denotes the number of pixels and $k$ denotes the number of clusters. Therefore, we obtain
$$\min_{\mathbf{W}} \sum_{i,j=1}^{n} \left( \|\mathbf{x}_i - \mathbf{x}_j\|_2^2\, w_{ij} + \Gamma w_{ij}^2 \right) \quad \mathrm{s.t.}\ \mathbf{1}^T\mathbf{w}_i = 1,\ w_{ij} \geq 0,\ \mathrm{rank}(\mathbf{L}_W) = n - k, \tag{4}$$
where $\mathrm{rank}(\mathbf{L}_W) = n - k$ makes the learned $\mathbf{W}$ structured, directly providing cluster indicators via connectivity. As with Problem (1), a similar optimization scheme can be used for Problem (4). Although Problem (4) succeeds in avoiding a discretization postprocessor, it still requires the manmade symmetrization $\mathbf{W} = \frac{\mathbf{W} + \mathbf{W}^T}{2}$, as the symmetry condition of $\mathbf{W}$ is ignored during affinity estimation. Such a separate process changes the affinities and often generates a suboptimal graph with fluctuating degrees.

2.2. Doubly Stochastic Graph Learning

A doubly stochastic graph is characterized by an affinity matrix $\mathbf{W}$ satisfying $\mathbf{w}_{(i)} = \mathbf{w}^{(i)}$, $\mathbf{1}^T\mathbf{w}_{(i)} = 1$, and $w_{ij} \geq 0$ for all $i$, where $\mathbf{w}_{(i)} = \mathbf{w}^{(i)}$ is the symmetry condition, meaning that the $i$-th row $\mathbf{w}_{(i)}$ and the $i$-th column $\mathbf{w}^{(i)}$ of $\mathbf{W}$ are equal. As recent works have learned probabilistic graphs with the constraints $\mathbf{1}^T\mathbf{w}_{(i)} = 1$, $w_{ij} \geq 0$, doubly stochastic learning in essence combines graph symmetrization and affinity estimation into one step. Doubly stochastic graph learning has been studied for a long time. An early work, DSN [31], proposed a doubly stochastic approximation problem as
$$\min_{\mathbf{W}} \|\mathbf{W} - \mathbf{S}\|_F^2 \quad \mathrm{s.t.}\ \mathbf{w}_{(i)} = \mathbf{w}^{(i)},\ \mathbf{1}^T\mathbf{w}_{(i)} = 1,\ w_{ij} \geq 0, \tag{5}$$
where $\mathbf{S}$ is the affinity matrix of the input graph, $\mathbf{W}$ is the learned doubly stochastic matrix, and $\|\cdot\|_F$ denotes the Frobenius norm.
The optimization of Problem (5) is designed by using the VNSP lemma, which separates the doubly stochastic constraints and yields two subproblems:
$$\min_{\mathbf{W}} \|\mathbf{W} - \mathbf{S}\|_F^2 \quad \mathrm{s.t.}\ \mathbf{w}_{(i)} = \mathbf{w}^{(i)},\ \mathbf{1}^T\mathbf{w}_{(i)} = 1, \tag{6}$$
$$\min_{\mathbf{W}} \|\mathbf{W} - \mathbf{S}\|_F^2 \quad \mathrm{s.t.}\ w_{ij} \geq 0. \tag{7}$$
The solutions of Problems (6) and (7) are, respectively,
$$\mathbf{W}_1 = \mathbf{S} + \left( \frac{1}{n}\mathbf{I} + \frac{\mathbf{1}^T\mathbf{S}\mathbf{1}}{n^2}\mathbf{I} - \frac{1}{n}\mathbf{S} \right)\mathbf{1}\mathbf{1}^T - \frac{1}{n}\mathbf{1}\mathbf{1}^T\mathbf{S}, \tag{8}$$
$$\mathbf{W}_2 = \mathbf{S}_+, \tag{9}$$
where $\mathbf{S}_+$ denotes the element-wise operation $s_{ij} = \max(s_{ij}, 0)$.
In each iteration, we first solve Problem (6) to obtain $\mathbf{W}_1$ and substitute $\mathbf{S}$ in Problem (7) with $\mathbf{W}_1$. Next, we solve Problem (7) and use its solution $\mathbf{W}_2$ to replace $\mathbf{S}$ in Problem (6). This process repeats until $\mathbf{W}$ converges under the constraints of both Problems (6) and (7), at which point $\mathbf{W}$ becomes a doubly stochastic matrix. Such graphs have been widely verified to clarify the data structure by maintaining must-links while alleviating cannot-links (see Figure 1) [32,33,34]. Consequently, many works have learned a doubly stochastic graph to improve clustering [35,36,37,38,39,40,41,42]. However, all of them follow the paradigm of DSN, i.e., a doubly stochastic graph approximation problem with VNSP-based optimization. Directly building a doubly stochastic graph from a data matrix remains an open area of investigation.
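The alternating projections of Problems (6) and (7) can be sketched as follows; `dsn_projection`, the iteration cap, and the tolerance are our own choices, and DSN's reference implementation may differ in details.

```python
import numpy as np

def dsn_projection(S, n_iter=200, tol=1e-8):
    """Sketch of DSN-style alternating projections (Problems (6)-(7)).

    Alternates the closed-form projection onto {W : W = W^T, W 1 = 1}
    (Eq. (8)) and the non-negativity clamp (Eq. (9)) until W stabilizes.
    """
    n = S.shape[0]
    one = np.ones((n, 1))
    W = S.copy()
    for _ in range(n_iter):
        W_prev = W.copy()
        # Projection enforcing symmetry and unit row sums (Eq. (8)).
        coef = (np.eye(n) / n
                + (one.T @ W @ one)[0, 0] / n**2 * np.eye(n)
                - W / n)
        W = W + coef @ (one @ one.T) - (one @ one.T @ W) / n
        # Projection enforcing non-negativity (Eq. (9)).
        W = np.maximum(W, 0.0)
        if np.linalg.norm(W - W_prev) < tol:   # Frobenius norm by default
            break
    return W
```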

3. Materials and Methods

3.1. Model Definition

Despite significant progress, current graph-based HSI clustering approaches still face three critical challenges: (1) Spectral–spatial signature redundancy: The high-dimensional spectral features coupled with spatial correlations often lead to mixed and redundant representations, compromising feature discriminability. (2) Computational complexity limitations: While anchor-based methods have achieved linear complexity for large-scale HSI clustering, the subsequent cluster assignment through matrix decomposition or structured graph learning still incurs substantial computational overhead. (3) Graph construction suboptimality: Most approaches suffer from performance degradation due to the disjoint optimization of affinity estimation and graph symmetry. To solve the problems simultaneously, inspired by [28], we propose a unified framework termed HPCDL as follows:
$$\begin{aligned} \min_{\mathbf{P},\mathbf{Z},\mathbf{A},\mathbf{W}}\ & \sum_{i=1}^{n}\sum_{j=1}^{m} \left( \|\mathbf{P}^T\mathbf{x}_i - \mathbf{P}^T\mathbf{a}_j\|_2^2\, z_{ij} + \gamma z_{ij}^2 \right) + \alpha \sum_{i,j=1}^{m} \left( \|\mathbf{P}^T\mathbf{a}_i - \mathbf{P}^T\mathbf{a}_j\|_2^2\, w_{ij} + \beta w_{ij}^2 \right) \\ \mathrm{s.t.}\ & \mathbf{P}^T\mathbf{X}\mathbf{X}^T\mathbf{P} = \mathbf{I};\ \mathbf{1}^T\mathbf{z}_i = 1,\ z_{ij} \geq 0;\ \mathbf{w}_{(i)} = \mathbf{w}^{(i)},\ \mathbf{1}^T\mathbf{w}_{(i)} = 1,\ w_{ij} \geq 0,\ \mathrm{rank}(\mathbf{L}_W) = m - k, \end{aligned} \tag{10}$$
where $\mathbf{P}$ is the projection matrix and $\mathbf{A}$ is the anchor matrix. Problem (10) simultaneously learns a pixel–anchor graph $\mathbf{Z}$ and an anchor–anchor graph $\mathbf{W}$ in a projected space based on the adaptive neighbor theory. The hyperparameter $\alpha$ balances the importance of these two graphs. Moreover, $\gamma$ and $\beta$ are two hyperparameters used for building $\mathbf{Z}$ and $\mathbf{W}$, respectively, which can be set by Equation (2) to avoid parameter tuning according to the theory of CAN [30]. Therefore, only $\alpha$ needs to be tuned, and it has an important influence on the performance of our model. As the learned pixel–anchor graph $\mathbf{Z}$ does not provide clustering results, we only use $\mathbf{1}^T\mathbf{z}_i = 1$, $z_{ij} \geq 0$ to obtain probabilistic affinities. Moreover, the rank constraint $\mathrm{rank}(\mathbf{L}_W) = m - k$ enforces an ideal block-diagonal structure on the learned affinity matrix $\mathbf{W}$, where connected components correspond to distinct clusters. Therefore, the symmetry of $\mathbf{W}$ is necessary, and we directly learn a symmetrical probabilistic graph with the doubly stochastic constraints $\mathbf{w}_{(i)} = \mathbf{w}^{(i)}$, $\mathbf{1}^T\mathbf{w}_{(i)} = 1$, $w_{ij} \geq 0$. We detail the internal mechanism and merits of Problem (10) as follows:
(1) Effective spatial–spectral feature extraction. The constraint $\mathbf{P}^T\mathbf{X}\mathbf{X}^T\mathbf{P} = \mathbf{I}$ reduces the original $d$ spectral bands to $r$ orthogonal ones, eliminating redundancy and enhancing spectral feature learning. Inspired by [24], we use the ERS method to obtain local segmentations and initialize each anchor as the mean of the pixels within a segmentation, i.e., $\mathbf{a}_i = \frac{1}{|\mathcal{S}_i|}\sum_{j \in \mathcal{S}_i} \mathbf{x}_j$, where $i$ denotes the index of the segmentation $\mathcal{S}_i$ (see Figure 2) and $j$ indexes the pixels within it (a code sketch of this initialization is given after this list). This approach ensures that the anchors effectively encode spatial information. During optimization, the projection matrix $\mathbf{P}$ and anchor matrix $\mathbf{A}$ are collaboratively updated to refine both spectral band selection and anchor placement, enabling effective spatial–spectral feature extraction.
(2) Low calculation burden. Instead of applying eigen decomposition to an $n \times n$ pixel–pixel graph or an $n \times m$ pixel–anchor graph, we perform cluster assignment on an $m \times m$ anchor–anchor graph, reducing the computational complexity to $O(m^3)$, independent of the number of pixels $n$. Moreover, we use the strategy in [43] to adaptively decide the number of anchors $m$, which stably keeps $m$ at a small but reasonable value in $[100, 200]$ across datasets of varying scales. For HSIs with hundreds of thousands of pixels, such a compact anchor set dramatically accelerates the processing speed of our HPCDL framework.
(3) Doubly stochastic graph learning. Problem (10) is a unified framework that cooperatively optimizes the projection matrix $\mathbf{P}$, anchor matrix $\mathbf{A}$, pixel–anchor graph $\mathbf{Z}$, and anchor–anchor graph $\mathbf{W}$. While exact cluster assignment on a reduced graph $\mathbf{W}$ remains challenging, our approach learns a doubly stochastic anchor–anchor graph that simultaneously addresses both affinity estimation and graph symmetry. This formulation explicitly enhances the block-diagonal structure of $\mathbf{W}$, thereby improving clustering accuracy. By searching for the connectivity of $\mathbf{W}$, we obtain the anchor-level clusters. Subsequently, we directly obtain the pixel-level clusters via the nearest pixel–anchor relationships encoded in $\mathbf{Z}$.
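As promised in point (1), the following sketch shows the superpixel-mean anchor initialization; it assumes a $d \times n$ pixel matrix (one pixel per column) and a precomputed ERS label array, both hypothetical layouts.

```python
import numpy as np

def init_anchors(X, seg_labels):
    """Sketch of the anchor initialization in point (1).

    X          : (d, n) flattened HSI, one pixel per column (assumed layout).
    seg_labels : (n,) superpixel index per pixel, e.g., from a precomputed
                 ERS segmentation.
    Each anchor a_i is the mean of the pixels inside segment i.
    """
    m = int(seg_labels.max()) + 1
    A = np.zeros((X.shape[0], m))
    for i in range(m):
        A[:, i] = X[:, seg_labels == i].mean(axis=1)
    return A  # (d, m) anchor matrix
```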
For convenience, we introduce $\mathbf{U} = \mathbf{P}^T\mathbf{A}$ and obtain the final objective function of our HPCDL as
$$\begin{aligned} \min_{\mathbf{P},\mathbf{Z},\mathbf{U},\mathbf{W}}\ & \sum_{i=1}^{n}\sum_{j=1}^{m} \left( \|\mathbf{P}^T\mathbf{x}_i - \mathbf{u}_j\|_2^2\, z_{ij} + \gamma z_{ij}^2 \right) + \alpha \sum_{i,j=1}^{m} \left( \|\mathbf{u}_i - \mathbf{u}_j\|_2^2\, w_{ij} + \beta w_{ij}^2 \right) \\ \mathrm{s.t.}\ & \mathbf{P}^T\mathbf{X}\mathbf{X}^T\mathbf{P} = \mathbf{I};\ \mathbf{1}^T\mathbf{z}_i = 1,\ z_{ij} \geq 0;\ \mathbf{w}_{(i)} = \mathbf{w}^{(i)},\ \mathbf{1}^T\mathbf{w}_{(i)} = 1,\ w_{ij} \geq 0,\ \mathrm{rank}(\mathbf{L}_W) = m - k. \end{aligned} \tag{11}$$
From a macro-level perspective, our proposed HPCDL is related to recent hierarchical clustering models, particularly FINCH [44]. As shown in Figure 3, FINCH builds the cluster hierarchy $\Upsilon = \{\Gamma_1, \Gamma_2, \ldots, \Gamma_L\}$ by constantly merging the nearest clusters. Each partition $\Gamma_l = \{C_1, C_2, \ldots, C_{|\Gamma_l|}\}$ is a valid cluster assignment only when the condition $|\Gamma_l| > |\Gamma_{l+1}|$ is always satisfied. This means that the number of clusters is gradually reduced until it converges (in some cases, all the pixels converge to one cluster). Therefore, hierarchical clustering models are fundamentally unconstrained clustering methods that do not require predefined cluster numbers. If a specific cluster number $k$ is required, hierarchical clustering models must first identify the hierarchy level $\Gamma_l$ whose cluster count is closest to but exceeds $k$; a refinement strategy then iteratively merges the most similar cluster pairs until $|\Gamma_l| = k$ is achieved during hierarchy reorganization. In contrast, our proposed HPCDL operates as a two-hierarchy framework with pixel-level and anchor-level hierarchies (illustrated in the red dotted box). Specifically, by imposing a rank constraint on the anchor–anchor graph, our HPCDL framework directly enforces the formation of exactly $k$ anchor-level clusters during the hierarchy construction phase. Consequently, pixel-level clusters are simultaneously obtained through nearest-neighbor propagation based on the pixel–anchor associations encoded in $\mathbf{Z}$, eliminating the need for additional hierarchy search or iterative refinement procedures.

3.2. Optimization

As Problem (11) contains four variables ($\mathbf{P}$, $\mathbf{Z}$, $\mathbf{U}$, and $\mathbf{W}$), we propose an effective alternating optimization approach that iteratively updates each variable until convergence.

3.2.1. Update P with Others Fixed

Problem (11) becomes
$$\min_{\mathbf{P}} \sum_{i=1}^{n}\sum_{j=1}^{m} \|\mathbf{P}^T\mathbf{x}_i - \mathbf{u}_j\|_2^2\, z_{ij} \quad \mathrm{s.t.}\ \mathbf{P}^T\mathbf{X}\mathbf{X}^T\mathbf{P} = \mathbf{I}. \tag{12}$$
Considering $\mathbf{U} = \mathbf{P}^T\mathbf{A}$, we have
$$\sum_{i=1}^{n}\sum_{j=1}^{m} \|\mathbf{P}^T\mathbf{x}_i - \mathbf{P}^T\mathbf{a}_j\|_2^2\, z_{ij} = \mathrm{Tr}\left( \begin{bmatrix}\mathbf{P}^T\mathbf{X} & \mathbf{P}^T\mathbf{A}\end{bmatrix} \begin{bmatrix}\mathbf{I}_n & -\mathbf{Z} \\ -\mathbf{Z}^T & \mathbf{D}_m\end{bmatrix} \begin{bmatrix}\mathbf{X}^T\mathbf{P} \\ \mathbf{A}^T\mathbf{P}\end{bmatrix} \right) = \mathrm{Tr}\left( \mathbf{P}^T \left( \mathbf{X}\mathbf{X}^T - 2\mathbf{X}\mathbf{Z}\mathbf{A}^T + \mathbf{A}\mathbf{D}_m\mathbf{A}^T \right) \mathbf{P} \right), \tag{13}$$
where $\mathbf{D}_m = \mathrm{diag}(\mathbf{Z}^T\mathbf{1})$ is the diagonal anchor degree matrix and the identity block $\mathbf{I}_n = \mathrm{diag}(\mathbf{Z}\mathbf{1})$ follows from the row-sum constraint on $\mathbf{Z}$. Let $\boldsymbol{\Phi} = \mathbf{X}\mathbf{X}^T - 2\mathbf{X}\mathbf{Z}\mathbf{A}^T + \mathbf{A}\mathbf{D}_m\mathbf{A}^T$. Problem (12) is thus equivalent to
$$\min_{\mathbf{P}^T\mathbf{X}\mathbf{X}^T\mathbf{P} = \mathbf{I}} \mathrm{Tr}\left( \mathbf{P}^T \boldsymbol{\Phi} \mathbf{P} \right). \tag{14}$$
According to the generalized Rayleigh quotient, Problem (14) is equivalent to an eigen decomposition of $(\mathbf{X}\mathbf{X}^T)^{-1}\boldsymbol{\Phi}$. Therefore, the optimal solution is constructed from the eigenvectors corresponding to the $r$ smallest eigenvalues, where $r$ is the number of reduced spectral dimensions.
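A possible implementation of this update, under the assumption that $\mathbf{X}$ is stored as a $d \times n$ matrix, uses SciPy's symmetric-definite generalized eigensolver; the helper name `update_P` and the small ridge `reg` are our own additions for numerical stability.

```python
import numpy as np
from scipy.linalg import eigh

def update_P(X, A, Z, r, reg=1e-6):
    """Sketch of the P-update (Problems (12)-(14)).

    Assumes X is (d, n), A is (d, m), Z is (n, m).
    Solves min tr(P^T Phi P) s.t. P^T X X^T P = I as the generalized
    eigenproblem Phi p = sigma (X X^T) p, keeping the r smallest pairs.
    """
    Dm = np.diag(Z.sum(axis=0))                  # anchor degrees diag(Z^T 1)
    Phi = X @ X.T - 2.0 * (X @ Z @ A.T) + A @ Dm @ A.T
    Phi = 0.5 * (Phi + Phi.T)                    # symmetrize the cross term
    B = X @ X.T + reg * np.eye(X.shape[0])       # constraint matrix X X^T
    # eigh returns ascending eigenvalues with B-orthonormal eigenvectors,
    # so the first r columns already satisfy P^T (X X^T) P = I.
    _, vecs = eigh(Phi, B)
    return vecs[:, :r]                           # (d, r) projection matrix
```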

3.2.2. Update Z with Others Fixed

Problem (11) becomes
$$\min_{\mathbf{Z}} \sum_{i=1}^{n}\sum_{j=1}^{m} \left( \|\mathbf{P}^T\mathbf{x}_i - \mathbf{P}^T\mathbf{a}_j\|_2^2\, z_{ij} + \gamma z_{ij}^2 \right) \quad \mathrm{s.t.}\ \mathbf{1}^T\mathbf{z}_i = 1,\ z_{ij} \geq 0. \tag{15}$$
Problem (15) has the same form as Problem (1). Setting $d_{ij}^{x} = \|\mathbf{P}^T\mathbf{x}_i - \mathbf{P}^T\mathbf{a}_j\|_2^2$, the closed-form solution of $\mathbf{Z}$ is obtained by Equation (3).

3.2.3. Update W with Others Fixed

Problem (11) becomes
$$\min_{\mathbf{W}} \sum_{i,j=1}^{m} \left( \|\mathbf{u}_i - \mathbf{u}_j\|_2^2\, w_{ij} + \beta w_{ij}^2 \right) \quad \mathrm{s.t.}\ \mathbf{w}_{(i)} = \mathbf{w}^{(i)},\ \mathbf{1}^T\mathbf{w}_{(i)} = 1,\ w_{ij} \geq 0,\ \mathrm{rank}(\mathbf{L}_W) = m - k, \tag{16}$$
which contains both the doubly stochastic constraints and the rank constraint on the variable $\mathbf{W}$. As the rank constraint $\mathrm{rank}(\mathbf{L}_W) = m - k$ is not easy to handle directly, we introduce Theorem 1.
Theorem 1.
Let $\sigma_i(\mathbf{L}_W)$ be the $i$-th smallest eigenvalue of $\mathbf{L}_W \in \mathbb{R}^{m \times m}$. We know that $\sigma_i(\mathbf{L}_W) \geq 0$ for all $i \in \{1, 2, \ldots, m\}$ due to the positive semi-definiteness of the Laplacian matrix $\mathbf{L}_W$. Therefore, minimizing $\sum_{i=1}^{k} \sigma_i(\mathbf{L}_W)$ drives the $k$ smallest eigenvalues to zero, and thus the constraint $\mathrm{rank}(\mathbf{L}_W) = m - k$ is satisfied spontaneously.
By using Theorem 1, if η is large enough, we obtain an equivalent transformation of Problem (16) as
$$\min_{\mathbf{W}} \sum_{i,j=1}^{m} \left( \|\mathbf{u}_i - \mathbf{u}_j\|_2^2\, w_{ij} + \beta w_{ij}^2 \right) + 2\eta \sum_{i=1}^{k} \sigma_i(\mathbf{L}_W) \quad \mathrm{s.t.}\ \mathbf{w}_{(i)} = \mathbf{w}^{(i)},\ \mathbf{1}^T\mathbf{w}_{(i)} = 1,\ w_{ij} \geq 0. \tag{17}$$
In practice, following the recommendation of [30], we set $\eta = \beta$ to avoid parameter tuning. Moreover, according to Ky Fan's theorem [45], we know that
$$\sum_{i=1}^{k} \sigma_i(\mathbf{L}_W) = \min_{\mathbf{F}^T\mathbf{F} = \mathbf{I}} \mathrm{Tr}\left( \mathbf{F}^T\mathbf{L}_W\mathbf{F} \right). \tag{18}$$
Denote by $\mathbf{d}_{(i)} \in \mathbb{R}^{m \times 1}$ a vector whose $j$-th element is $d_{ij} = \|\mathbf{u}_i - \mathbf{u}_j\|_2^2$. We obtain the matrix form of Problem (17) as
$$\min_{\mathbf{W},\mathbf{F}} \mathrm{Tr}(\mathbf{D}^T\mathbf{W}) + \beta \|\mathbf{W}\|_F^2 + 2\eta\, \mathrm{Tr}(\mathbf{F}^T\mathbf{L}_W\mathbf{F}) \quad \mathrm{s.t.}\ \mathbf{W} = \mathbf{W}^T,\ \mathbf{W}\mathbf{1} = \mathbf{1},\ \mathbf{W} \geq 0,\ \mathbf{F}^T\mathbf{F} = \mathbf{I}, \tag{19}$$
where $\mathbf{D} \in \mathbb{R}^{m \times m}$ is the distance matrix consisting of the vectors $\mathbf{d}_{(i)}$.
As Problem (19) contains two variables, $\mathbf{W}$ and $\mathbf{F}$, an alternating optimization approach is used.
(1) Update W when F is fixed. Problem (19) becomes
$$\min_{\mathbf{W}} \mathrm{Tr}(\mathbf{D}^T\mathbf{W}) + \beta \|\mathbf{W}\|_F^2 + 2\eta\, \mathrm{Tr}(\mathbf{F}^T\mathbf{L}_W\mathbf{F}) \quad \mathrm{s.t.}\ \mathbf{W} = \mathbf{W}^T,\ \mathbf{W}\mathbf{1} = \mathbf{1},\ \mathbf{W} \geq 0. \tag{20}$$
This problem involves doubly stochastic constraints, which are challenging to solve due to the symmetry condition. Previous approaches, such as those based on the VNSP lemma, struggle to optimize this problem effectively because the objective is polynomial with respect to $\mathbf{W}$. To address this issue, we propose a novel approach. First, by using the augmented Lagrangian method (ALM) [46], we obtain
$$\min_{\mathbf{W}} \mathrm{Tr}(\mathbf{D}^T\mathbf{W}) + \beta \|\mathbf{W}\|_F^2 + 2\eta\, \mathrm{Tr}(\mathbf{F}^T\mathbf{L}_W\mathbf{F}) + \frac{\mu}{2} \left\| \mathbf{W} - \mathbf{W}^T + \frac{1}{\mu}\boldsymbol{\Lambda} \right\|_F^2 \quad \mathrm{s.t.}\ \mathbf{W}\mathbf{1} = \mathbf{1},\ \mathbf{W} \geq 0, \tag{21}$$
where $\boldsymbol{\Lambda} \in \mathbb{R}^{m \times m}$ is the Lagrange multiplier and $\mu$ is a hyperparameter of the ALM, named the penalty factor, which should be set as a positive constant. Problem (21) embeds the symmetry condition of $\mathbf{W}$ into the objective function. During the iterations, $\mu$ is magnified by the expansion coefficient $\rho$ (another ALM hyperparameter), which accelerates satisfaction of the symmetry condition $\mathbf{W} - \mathbf{W}^T = \mathbf{0}$. Therefore, owing to convexity, Problem (21) optimizes all doubly stochastic constraints simultaneously toward the globally optimal solution. The remaining challenge lies in efficiently solving Problem (21), which we address via Theorem 2.
Theorem 2.
Assume a variable $\mathbf{H}$ in a problem of the matrix form
$$\min_{\mathbf{H}} \|\mathbf{H} - \mathbf{H}^T + \boldsymbol{\Theta}\|_F^2 \quad \mathrm{s.t.}\ \mathbf{H}\mathbf{1} = \mathbf{1},\ \mathbf{H} \geq 0, \tag{22}$$
whose objective function depends on both the variable $\mathbf{H}$ and its transpose $\mathbf{H}^T$, where $\boldsymbol{\Theta}$ is a matrix polynomial independent of $\mathbf{H}$. As the constraints on $\mathbf{H}$ are row-separable (i.e., $\mathbf{H}\mathbf{1} = \mathbf{1}$, $\mathbf{H} \geq 0$), it can be solved row by row, yielding a set of vector-form problems as follows:
$$\min_{\mathbf{h}_{(i)}} \left\| \mathbf{h}_{(i)} - \mathbf{h}^{(i)} + \boldsymbol{\theta}_{(i)} \right\|_2^2 + \left\| \mathbf{h}^{(i)} - \mathbf{h}_{(i)} + \boldsymbol{\theta}^{(i)} \right\|_2^2 \quad \mathrm{s.t.}\ \forall i,\ \mathbf{1}^T\mathbf{h}_{(i)} = 1,\ \mathbf{h}_{(i)} \geq 0, \tag{23}$$
where $\mathbf{h}_{(i)}$ and $\mathbf{h}^{(i)}$ denote the $i$-th row and column of $\mathbf{H}$, and $\boldsymbol{\theta}_{(i)}$ and $\boldsymbol{\theta}^{(i)}$ denote the $i$-th row and column of $\boldsymbol{\Theta}$. Therefore, $\mathbf{h}_{(i)} - \mathbf{h}^{(i)} + \boldsymbol{\theta}_{(i)}$ and $\mathbf{h}^{(i)} - \mathbf{h}_{(i)} + \boldsymbol{\theta}^{(i)}$ denote the $i$-th row and column of the matrix $\mathbf{H} - \mathbf{H}^T + \boldsymbol{\Theta}$, respectively.
An illustration is provided in Figure 4. When $i = 1$, the row constraint $\mathbf{1}^T\mathbf{h}_{(1)} = 1$, $\mathbf{h}_{(1)} \geq 0$ means that all the elements $h_{11}, h_{12}, \ldots, h_{1n}$ in the first row of $\mathbf{H}$ should be restricted. However, as Problem (22) contains both the variable $\mathbf{H}$ and its transpose $\mathbf{H}^T$, some elements (blue in Figure 4) of the first row of $\mathbf{H}$ would not be restricted if we only updated the first row of the matrix $\mathbf{H} - \mathbf{H}^T + \boldsymbol{\Theta}$ (red in Figure 4). Therefore, the two terms of Problem (23) simultaneously restrict the $i$-th row and column of $\mathbf{H} - \mathbf{H}^T + \boldsymbol{\Theta}$, ensuring that, for any $i$, all the elements involved in the constraints $\mathbf{1}^T\mathbf{h}_{(i)} = 1$, $h_{ij} \geq 0$ participate in the calculation.
By using Theorem 2, we can transform the last term of Problem (21) into vector form. Moreover, according to the property of the Laplacian matrix, we know that $2\,\mathrm{Tr}(\mathbf{F}^T\mathbf{L}_W\mathbf{F}) = \sum_{i,j=1}^{m} \|\mathbf{f}_i - \mathbf{f}_j\|_2^2\, w_{ij} = \sum_{i=1}^{m} \boldsymbol{\tau}_{(i)}^T \mathbf{w}_{(i)}$, where $\boldsymbol{\tau}_{(i)} \in \mathbb{R}^{m \times 1}$ is a vector whose $j$-th element is $\tau_{ij} = \|\mathbf{f}_i - \mathbf{f}_j\|_2^2$. Therefore, we arrive at
$$\min_{\mathbf{w}_{(i)}} \mathbf{d}_{(i)}^T \mathbf{w}_{(i)} + \beta \|\mathbf{w}_{(i)}\|_2^2 + \eta\, \boldsymbol{\tau}_{(i)}^T \mathbf{w}_{(i)} + \frac{\mu}{2} \left\| \mathbf{w}_{(i)} - \mathbf{w}^{(i)} + \frac{1}{\mu}\boldsymbol{\lambda}_{(i)} \right\|_2^2 + \frac{\mu}{2} \left\| \mathbf{w}^{(i)} - \mathbf{w}_{(i)} + \frac{1}{\mu}\boldsymbol{\lambda}^{(i)} \right\|_2^2 \quad \mathrm{s.t.}\ \forall i,\ \mathbf{1}^T\mathbf{w}_{(i)} = 1,\ w_{ij} \geq 0, \tag{24}$$
where $\mathbf{w}_{(i)} - \mathbf{w}^{(i)} + \frac{1}{\mu}\boldsymbol{\lambda}_{(i)}$ and $\mathbf{w}^{(i)} - \mathbf{w}_{(i)} + \frac{1}{\mu}\boldsymbol{\lambda}^{(i)}$ denote the $i$-th row and column of the matrix $\mathbf{W} - \mathbf{W}^T + \frac{1}{\mu}\boldsymbol{\Lambda}$, respectively. By simple algebraic manipulation, we further obtain
$$\min_{\mathbf{1}^T\mathbf{w}_{(i)} = 1,\ w_{ij} \geq 0} \left\| \mathbf{w}_{(i)} - \boldsymbol{\vartheta}_{(i)} \right\|_2^2, \tag{25}$$
where $\boldsymbol{\vartheta}_{(i)} = \frac{2\mu\,\mathbf{w}^{(i)} - \boldsymbol{\lambda}_{(i)} + \boldsymbol{\lambda}^{(i)} - \mathbf{d}_{(i)} - \eta\,\boldsymbol{\tau}_{(i)}}{2(\beta + \mu)}$. Problem (25) is a standard Euclidean projection onto the probability simplex and has a closed-form solution.
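Concretely, Problem (25) can be solved with the standard sort-based simplex-projection routine; the sketch below is a generic implementation of that well-known algorithm, not code taken from the paper.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, 1^T w = 1}, i.e., the
    closed-form solver for Problem (25). Standard sort-based algorithm."""
    u = np.sort(v)[::-1]                    # sort entries in descending order
    css = np.cumsum(u)
    # Largest index rho with u[rho] * (rho + 1) > css[rho] - 1 (0-based).
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)  # shift that enforces the sum
    return np.maximum(v - theta, 0.0)
```

Applying `project_simplex` to each $\boldsymbol{\vartheta}_{(i)}$ yields the row-wise update of $\mathbf{W}$.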
(2) Update F when W is fixed. Problem (19) becomes
$$\min_{\mathbf{F}^T\mathbf{F} = \mathbf{I}} \mathrm{Tr}\left( \mathbf{F}^T\mathbf{L}_W\mathbf{F} \right). \tag{26}$$
Based on the Rayleigh quotient, the solution of Problem (26) consists of the eigenvectors corresponding to the $k$ smallest eigenvalues of $\mathbf{L}_W$.
Acceleration: The optimal doubly stochastic matrix $\mathbf{W}^*$ can be obtained through an alternating optimization procedure in which we iteratively update $\mathbf{W}$ and $\mathbf{F}$ by solving Problems (25) and (26), respectively. In practice, we can accelerate the process. Suppose the number of connected components of the current $\mathbf{W}$ is $\varpi$. After each iteration, we decrease $\eta$ to $\frac{1}{2}\eta$ if $\varpi > k$ or increase $\eta$ to $2\eta$ if $\varpi < k$, where $k$ is the desired number of clusters. When $\varpi = k$ (i.e., $\mathbf{W}$ is a structured matrix) and $\mathbf{D}_W = \mathbf{I}$ (i.e., $\mathbf{W}$ is a doubly stochastic matrix), where $\mathbf{D}_W$ is the degree matrix of $\mathbf{W}$, we output the current $\mathbf{W}$ in advance. To clarify the process, the optimization of Problem (19) is summarized in Algorithm 1.
Algorithm 1: Algorithm to solve Problem (19).
  • Input: cluster number k, data matrix X;
  • Output: structured doubly stochastic matrix W;
  • Set 1 < ρ < 2, μ > 0;
  • Initialize Λ as a zero matrix, initialize η, and initialize W randomly;
  • While not converged do
  •   (1) Update W by solving Problem (25);
  •   (2) Update F by solving Problem (26);
  •   (3) Update Λ by Λ = Λ + μ(W − W^T);
  •   (4) Update μ by μ = ρμ;
  •   (5) Calculate ϖ;
  •   (6) If ϖ = k and D_W = I: output W in advance;
  • end While
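Steps (5) and (6) of Algorithm 1 can be realized with an off-the-shelf connected-components routine, as in the following sketch; the edge threshold `thr`, which decides which affinities count as edges, is our own assumption.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def anchor_clusters(W, thr=1e-8):
    """Sketch of steps (5)-(6) of Algorithm 1: count the connected
    components of W; when the count equals k, the component labels are
    exactly the anchor-level cluster indicators."""
    G = csr_matrix((np.abs(W) + np.abs(W.T)) > thr)   # symmetric support
    n_comp, labels = connected_components(G, directed=False)
    return n_comp, labels   # n_comp == k -> output W in advance
```

The same labels are later propagated to pixels through the nearest pixel–anchor relations in $\mathbf{Z}$ (Algorithm 2).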

3.2.4. Update U with Others Fixed

$$\min_{\mathbf{U}} \sum_{i=1}^{n}\sum_{j=1}^{m} \|\mathbf{P}^T\mathbf{x}_i - \mathbf{u}_j\|_2^2\, z_{ij} + \alpha \sum_{i,j=1}^{m} \|\mathbf{u}_i - \mathbf{u}_j\|_2^2\, w_{ij}. \tag{27}$$
According to the derivation in Equation (13), the first term equals $-2\,\mathrm{Tr}(\mathbf{P}^T\mathbf{X}\mathbf{Z}\mathbf{U}^T) + \mathrm{Tr}(\mathbf{U}\mathbf{D}_m\mathbf{U}^T)$ after removing the terms independent of $\mathbf{U}$. Moreover, we know that $\sum_{i,j=1}^{m} \|\mathbf{u}_i - \mathbf{u}_j\|_2^2\, w_{ij} = 2\,\mathrm{Tr}(\mathbf{U}\mathbf{L}_W\mathbf{U}^T)$, where $\mathbf{L}_W$ denotes the Laplacian matrix of the anchor–anchor graph $\mathbf{W}$. Combining them, we transform Problem (27) into
$$\min_{\mathbf{U}} \mathrm{Tr}\left( \mathbf{U}\left( 2\alpha\mathbf{L}_W + \mathbf{D}_m \right)\mathbf{U}^T - 2\mathbf{P}^T\mathbf{X}\mathbf{Z}\mathbf{U}^T \right). \tag{28}$$
By taking the derivative with respect to $\mathbf{U}$ and setting it to zero, we obtain the optimal solution of Problem (28) as
$$\mathbf{U} = \mathbf{P}^T\mathbf{X}\mathbf{Z}\left( \mathbf{D}_m + 2\alpha\mathbf{L}_W \right)^{-1}. \tag{29}$$
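Equation (29) amounts to a single $m \times m$ linear solve, as in the sketch below; we solve the system instead of forming the explicit inverse, which keeps the $O(m^3)$ cost but is numerically safer. The helper name `update_U` is ours.

```python
import numpy as np

def update_U(P, X, Z, L_W, alpha):
    """Sketch of Equation (29): U = P^T X Z (D_m + 2*alpha*L_W)^{-1}.
    Assumes P is (d, r), X is (d, n), Z is (n, m), L_W is (m, m)."""
    Dm = np.diag(Z.sum(axis=0))            # anchor degree matrix diag(Z^T 1)
    M = Dm + 2.0 * alpha * L_W             # (m, m), symmetric system matrix
    rhs = P.T @ X @ Z                      # (r, m) right-hand side
    # U M = rhs  <=>  M U^T = rhs^T, since M is symmetric.
    return np.linalg.solve(M, rhs.T).T     # (r, m) projected anchor matrix
```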
At this point, we have completed the updates of all variables. The whole optimization process is summarized in Algorithm 2.
Algorithm 2: Algorithm to solve Problem (11).
  • Input: data matrix X, cluster number k;
  • Output: pixel-level clustering results;
  • Initialize anchor matrix A and pixel–anchor graph Z;
  • While not converged do
  •   (1) Update P by solving Problem (14);
  •   (2) Update Z by solving Problem (15);
  •   (3) Update W by Algorithm 1;
  •   (4) Update U by Equation (29);
  • end While
  • Obtain anchor-level clusters via the connectivity of W and propagate them to pixel-level results.

3.3. Computational Complexity Analysis

In the preprocessing stage, superpixel segmentation of the HSI data using ERS requires $O(n \log n)$ complexity, where $n$ is the number of pixels. Notably, this step is shared by all large-scale HSI clustering models that use superpixel segmentation to reduce complexity. For our model, four variables are updated during the iterations. First, updating $\mathbf{Z}$ requires $O(nmd)$, where $m$ and $d$ denote the number of anchors and spectral bands, respectively. Updating the projection matrix $\mathbf{P}$ requires $O(d^3)$. To update the structured doubly stochastic matrix $\mathbf{W}$, we impose eigen decomposition on an $m \times m$ anchor–anchor graph, which requires $O(m^3)$ complexity. Finally, updating the projected anchor matrix $\mathbf{U}$ needs $O(m^3)$ due to the matrix inversion. As $n \gg d$ and $n \gg m$, the overall complexity of our model is dominated by $O(nmd)$, which is less than that of recent works that impose eigen decomposition on an $n \times n$ graph or an $n \times m$ graph, incurring $O(n^3 + nmd)$ or $O(nm^2 + nmd)$ complexity in the iterations. After the iterative process, label propagation from the anchor level to the pixel level incurs a computational complexity of $O(nm)$, which is negligible compared to the iteration process itself. Therefore, our proposed model requires less computation than recent works.

4. Experimental Results

4.1. Comparative Methods and Experimental Setting

Comparative methods include K-means [3], SGCNR [25], HESSC [11], SGLSC [20], SAGC [24], SAPC [28], and EGFSC [21]. K-means is a classic centroid-based method, and HESSC is a subspace-based method. All other methods are recent graph-based models designed for large-scale HSI data. For these graph-based methods, we construct the graph using five nearest neighbors. Moreover, SGCNR, SGLSC, SAGC, SAPC, and our proposed HPCDL require initialization of the anchor matrix. As aforementioned, we use the center of each superpixel as an anchor; thus, the anchor number is strictly equal to the number of superpixels. For all methods, the hyperparameters are set to the optimal values stated in the original papers. The experiments are conducted on a Windows 10 computer with a 2.3 GHz Intel Xeon Gold 5218 CPU, 128 GB RAM, and MATLAB 2020b.

4.2. Data Description

Three publicly available HSI datasets are used to verify the clustering ability of our proposed method. These datasets vary in scale, allowing us to simultaneously evaluate the method’s scalability to large-scale data.
The Xuzhou dataset, published in 2014, captures a peri-urban region of Xuzhou City, Jiangsu Province, China, with 130,000 pixels (500 × 260) and 436 spectral bands. The image was captured by an airborne hyperspectral camera, and there are nine diverse categories in total. For details, see Table 1.
The Longkou dataset, published in 2018, is a scene of Longkou Town, Hubei Province, China. The image was captured by a UAV-borne system and has 220,000 pixels (550 × 400) and 270 bands describing nine land-cover materials. For details, see Table 2.
The Pavia Center (PaviaC for short) dataset, published in 2003, captures a scene of the city of Pavia, Italy, with 783,640 pixels (1096 × 715) and 102 spectral bands. The image was captured by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor, and there are nine diverse categories in total. For details, see Table 3.

4.3. Metrics

We utilize user accuracy to evaluate the performance on each land-cover category. Moreover, to clearly define all the overall metrics used, we first introduce the concept of the confusion matrix. Assuming $k$ total classes, the confusion matrix is a $k$-dimensional square matrix defined as follows:
$$\mathbf{M} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & \cdots & m_{1k} \\ m_{21} & m_{22} & m_{23} & \cdots & m_{2k} \\ m_{31} & m_{32} & m_{33} & \cdots & m_{3k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ m_{k1} & m_{k2} & m_{k3} & \cdots & m_{kk} \end{bmatrix}, \tag{30}$$
where $m_{ij}$ denotes the number of pixels that belong to cluster $i$ but are assigned to cluster $j$. Thus, the diagonal elements $m_{ii}$ ($i \in \{1, 2, \ldots, k\}$) represent the numbers of correctly clustered pixels in the $i$-th land-cover category. Furthermore, the sum of all elements $\sum_{i,j=1}^{k} m_{ij}$ must equal the total number of pixels $n$.
Overall accuracy (OA) represents the proportion of correctly clustered pixels to the total number of pixels, which can be defined as
$$\mathrm{OA} = \frac{\sum_{i=1}^{k} m_{ii}}{n}. \tag{31}$$
Kappa is another commonly used metric, which considers the influence of random factors when evaluating the accuracy of classification results. The mathematical definition is
$$\mathrm{Kappa} = \frac{n\sum_{i=1}^{k} m_{ii} - \sum_{i=1}^{k}\left( m_{i,:} \times m_{:,i} \right)}{n^2 - \sum_{i=1}^{k}\left( m_{i,:} \times m_{:,i} \right)}, \tag{32}$$
where $m_{i,:}$ and $m_{:,i}$ denote the sums of the $i$-th row and the $i$-th column of the confusion matrix $\mathbf{M}$, respectively.
Normalized mutual information (NMI) reveals the degree of shared mutual information between a pair of clusterings. The mathematical definition is
$$\mathrm{NMI} = \frac{\sum_{i=1}^{k}\sum_{j=1}^{k} m_{ij} \log \frac{n\, m_{ij}}{m_{i,:}\, m_{:,j}}}{\sqrt{\left( \sum_{i=1}^{k} m_{i,:} \log \frac{m_{i,:}}{n} \right)\left( \sum_{j=1}^{k} m_{:,j} \log \frac{m_{:,j}}{n} \right)}}. \tag{33}$$
Purity aims to estimate the degree to which every cluster covers the pixels from one class. The mathematical definition is
$$\mathrm{Purity} = \frac{1}{n} \sum_{j=1}^{k} \max_{i}\, m_{ij}. \tag{34}$$
Adjusted Rand Index (ARI) is a metric for evaluating the similarity between clustering results, which corrects for bias introduced by random label assignments to enhance comparability. The mathematical definition is
$$\mathrm{ARI} = \frac{\sum_{i,j=1}^{k} \binom{m_{ij}}{2} - \left[ \sum_{i=1}^{k} \binom{m_{i,:}}{2} \sum_{j=1}^{k} \binom{m_{:,j}}{2} \right] \Big/ \binom{n}{2}}{\frac{1}{2}\left[ \sum_{i=1}^{k} \binom{m_{i,:}}{2} + \sum_{j=1}^{k} \binom{m_{:,j}}{2} \right] - \left[ \sum_{i=1}^{k} \binom{m_{i,:}}{2} \sum_{j=1}^{k} \binom{m_{:,j}}{2} \right] \Big/ \binom{n}{2}}. \tag{35}$$
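Given the confusion matrix of Equation (30), OA, Kappa, and Purity follow directly from Equations (31), (32), and (34), as in this minimal sketch (`overall_metrics` is a hypothetical helper of ours; NMI and ARI follow Equations (33) and (35) analogously).

```python
import numpy as np

def overall_metrics(M):
    """Compute OA, Kappa, and Purity from a k x k confusion matrix M."""
    M = np.asarray(M, dtype=float)
    n = M.sum()                        # total number of pixels
    row = M.sum(axis=1)                # m_{i,:}, row sums
    col = M.sum(axis=0)                # m_{:,j}, column sums
    oa = np.trace(M) / n               # Eq. (31)
    pe = (row * col).sum()             # sum_i m_{i,:} * m_{:,i}
    kappa = (n * np.trace(M) - pe) / (n ** 2 - pe)   # Eq. (32)
    purity = M.max(axis=0).sum() / n   # Eq. (34): best class per cluster
    return oa, kappa, purity
```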

4.4. Experiments on Benchmarks

To comprehensively evaluate clustering performance and time cost across datasets of different scales, we selected Xuzhou, Longkou, and PaviaC as small-, medium-, and large-scale datasets, respectively, based on their total pixel counts. We compare both quantitative metrics and visual maps. As the comparative methods include both iteration-based and non-iteration-based models, we report the average time cost per iteration for a fair comparison.
(1) Results on the Small-Scale Dataset Xuzhou: We first report the performance on Xuzhou, a small-scale dataset containing 130,000 pixels. For our method, we set $r$ and $\alpha$ to $2d/5$ and 1 to ensure the optimal performance, where $d$ is the total number of spectral dimensions. Table 4 reports the quantitative results. As can be seen, none of the models ensures high accuracy for every land cover. Our HPCDL fails to recognize Coals and Trees, which is mainly caused by the anchor strategy. As mentioned earlier, we use the mean value of all pixels within a segmentation as an anchor, which inevitably leads to information loss and complicates precise classification. More importantly, as all the pixels within a segmentation are assigned to the same cluster, a misclassified anchor can result in large-scale pixel misclassification, severely degrading accuracy (even to zero) for certain land-cover types. Nevertheless, our HPCDL obtains high values for OA, kappa, NMI, purity, and ARI, demonstrating excellent clustering performance. Moreover, thanks to its lower computational complexity, our HPCDL needs a lower time cost than most other methods, except for K-means. The visual maps are provided in Figure 5. As can be seen, K-means, SGCNR, HESSC, and SAGC obtain very non-smooth clustering maps, which shows that these methods cannot effectively retain the consistency of local regions. Moreover, SGLSC, SAPC, and EGFSC obtain smoother maps and show strong local capabilities for some land covers like Bareland2 and Red-tile, demonstrating their excellent classification performance for these categories. However, for certain land covers (e.g., Cement), our proposed model performs better and recognizes the related regions well, which allows our model to achieve the best values for the five overall metrics. The results show that the doubly stochastic property of the anchor–anchor graph improves clustering to some extent.
(2) Results on the Medium-Scale Dataset Longkou: We then evaluated the performance on Longkou, a medium-scale dataset containing 220,000 pixels. For our method, we set $r$ and $\alpha$ to $d/5$ and 100, respectively, to ensure the optimal performance. Examining the results in Table 5, our method obtains much better results on all five metrics: OA, kappa, NMI, purity, and ARI. Regarding per-land-cover accuracy, our method achieves high accuracy for most categories and the best values for five out of nine. Moreover, our method achieves the second-lowest time cost, outperforming the other graph-based methods while remaining only slightly slower than K-means. Figure 6 presents the visual maps. As shown, while K-means achieves accurate classification for most Water regions, it produces significant misclassification for other land-cover types. SGCNR, HESSC, and SAGC exhibit patchy misclassification in Water areas along with scattered noise in other regions. In comparison, SAPC and EGFSC generate visually smoother clustering results, though they completely misclassify certain Narrow-leaf soybean and Rice areas. Our proposed model demonstrates superior performance by correctly classifying most of these challenging regions.
(3) Results on the Large-Scale Dataset PaviaC: We finally evaluate the performance on PaviaC, a large-scale dataset with 783,640 pixels. For our method, $r$ and $\alpha$ are set to $2d/5$ and 10 to ensure optimal performance. Table 6 and Figure 7 present the quantitative results and visual clustering maps, respectively. The results demonstrate that our method achieves the highest accuracy for Water, Bitumen, and Meadows among all land-cover categories. While showing competitive performance on the other land-cover types, it exhibits relatively lower accuracy only for Asphalt and Self-blocking bricks. Notably, our method outperforms all comparative approaches across the five metrics (OA, kappa, NMI, purity, and ARI), with a marginal improvement over the recent SAPC method and substantial gains over the other baselines, demonstrating its robust clustering capability.

5. Discussion

5.1. Motivation Verification

This subsection verifies whether learning a doubly stochastic graph leads to a better graph with improved connectivity and clustering performance. To this end, we construct a Variant A that removes the symmetry condition from our HPCDL model and manually symmetrizes $\mathbf{W}$ after each iteration. Therefore, like many other graph-based models, Variant A conducts affinity estimation and graph symmetrization separately. Figure 8 reports the results. As can be seen, Variant A learns a graph with fluctuating degrees, producing some "cannot-links" that impair the connectivity of the graph. Such a graph cannot provide exact anchor clusters and thus yields suboptimal pixel clustering results with lower overall accuracy (OA). By learning a doubly stochastic graph, our HPCDL combines affinity estimation and graph symmetrization into one step, generating strict probabilistic affinities with all degrees equal to 1. Such a doubly stochastic graph provides clearer block-diagonal structures and more exact cluster relations. By searching for its connectivity, better clustering results are obtained.
We further show that our doubly stochastic graph learning method can effectively improve recent works. Two baselines, termed SAPC and EGFSC, are used. SAPC learns the graph via the adaptive neighbor theory of CAN. Therefore, after affinity estimation, a manmade symmetry operation is needed. EGFSC uses the heat kernel to build the graph and then constrains the neighbors by sorting the Euclidean distance. Our doubly stochastic graph learning method has obvious theoretical merit as it combines affinity estimation, neighbor decision, and graph symmetry into one step. To verify the effectiveness, we use our method to replace the graph learning method in SAPC and EGFSC, and then test it on the Longkou dataset. Table 7 reports the results. Our proposed method significantly improves the performance of SAPC and EGFSC as it solves the limitation of artificial intervention and multistage design. Moreover, by examining the results of SAPC and EGFSC, we observe that if the learned graph directly provides clustering results based on its connectivity (as in SAPC), the improvement achieved by our method is more significant, as it eliminates the need for K-means postprocessing. The results demonstrate that our method has the potential to enhance the performance of related works.

5.2. Parameter Study

First, following [30], the hyperparameters $\gamma$, $\beta$, and $\eta$ are automatically determined by the number of neighbors and thus do not need to be tuned. Moreover, we use the ALM method to optimize Problem (20), which introduces three more hyperparameters: $\rho$, $\mu$, and $\boldsymbol{\Lambda}$. $\boldsymbol{\Lambda}$ only needs to be initialized as an all-zero matrix. The penalty factor $\mu$ and expansion coefficient $\rho$ do not need to be tuned, as $\mu = 1$ and $\rho = 1.1$ are suggested to ensure the stability of the ALM method [46]. Therefore, for the proposed HPCDL model, only $\alpha$ and $r$ need to be studied. $\alpha$ weighs the importance of the pixel–anchor and anchor–anchor graphs; $r$ denotes the reduced spectral dimension, which avoids redundant information. We study their influence by setting $\alpha \in \{0.01, 0.1, 1, 10, 100\}$ and $r \in \{d/5, 2d/5, 3d/5, 4d/5, d\}$, where $d$ is the total spectral dimension. The overall accuracy (OA) results are reported in Figure 9. As can be seen, our HPCDL obtains better results when $\alpha \in [1, 100]$. This is theoretically explainable: as the learned anchor–anchor graph directly provides cluster indicators, our model must emphasize its importance to ensure accuracy. Moreover, we notice that both a small and a large spectral dimension $r$ degrade the performance of our proposed HPCDL, which shows that a too-large $r$ retains redundant and mixed spectral signatures, while a too-small $r$ limits the richness of the feature dimensions.

5.3. Limitations

Our HPCDL simultaneously learns both the pixel–anchor graph and the anchor–anchor graph, which makes it sensitive to the value of $\alpha$. Experimental results demonstrate that $\alpha$ values within $[1, 100]$ typically yield better performance. Moreover, deciding the specific number of reduced spectral dimensions remains a non-trivial problem. Therefore, although we have proposed some strategies to reduce the number of manually set hyperparameters, our proposed model still requires slight parameter tuning and is sensitive to these parameters.

6. Conclusions

This paper proposes a large-scale HSI clustering model, termed HPCDL, as a unified framework for learning a pixel–anchor graph and an anchor–anchor graph in a projected space. A novel and effective optimization method for the subproblem with doubly stochastic constraints is proposed, which can be widely used in related graph-based clustering works. Experiments on three HSI datasets showed that our method achieves SOTA clustering performance and needs a lower time cost than most of the other comparative methods. In the future, we will extend the current framework to deal with multitemporal data and explore its potential adaptation to multimodal (e.g., HSI and LiDAR) fusion-based clustering.

Author Contributions

Conceptualization, N.W. and Y.X.; methodology, N.W. and A.L.; validation, N.W. and Y.L.; formal analysis, N.W. and Y.L.; writing—original draft, N.W. and C.Z.; writing—review and editing, Y.X. and Y.S.; supervision, Z.C. and A.L.; project administration, A.L.; funding acquisition, Z.C. and Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Natural Science Foundation of Shaanxi Province under grants 2020JQ-298 and 2023-JC-YB-501.

Data Availability Statement

The original data presented in the study are openly available and we also provide them with our code at https://github.com/NianWang-HJJGCDX/HPCDL.git (accessed on 18 February 2025).

Conflicts of Interest

The authors declare no conflicts of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; nor in the decision to publish the results.

References

  1. Zhang, Z.; Cai, Y.; Gong, W.; Ghamisi, P.; Liu, X.; Gloaguen, R. Hypergraph Convolutional Subspace Clustering With Multihop Aggregation for Hyperspectral Image. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 676–686. [Google Scholar] [CrossRef]
  2. Wang, N.; Yang, A.; Cui, Z.; Ding, Y.; Xue, Y.; Su, Y. Capsule Attention Network for Hyperspectral Image Classification. Remote Sens. 2024, 16, 4001. [Google Scholar] [CrossRef]
  3. Ranjan, S.; Nayak, D.R.; Kumar, K.S.; Dash, R.; Majhi, B. Hyperspectral image classification: A k-means clustering based approach. In Proceedings of the 2017 4th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 6–7 January 2017; pp. 1–7. [Google Scholar] [CrossRef]
  4. Yang, X.; Zhu, M.; Sun, B.; Wang, Z.; Nie, F. Fuzzy C-Multiple-Means Clustering for Hyperspectral Image. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar] [CrossRef]
  5. Zhang, F.; Yan, H.; Zhao, J.; Hu, H. Euler Kernel Mapping for Hyperspectral Image Clustering via Self-Paced Learning. Remote Sens. 2024, 16, 4097. [Google Scholar] [CrossRef]
  6. Pei, S.; Chen, H.; Nie, F.; Wang, R.; Li, X. Centerless Clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 167–181. [Google Scholar] [CrossRef]
  7. Cariou, C.; Chehdi, K.; Le Moan, S. Improved Nearest Neighbor Density-Based Clustering Techniques with Application to Hyperspectral Images. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 4127–4131. [Google Scholar] [CrossRef]
  8. Nie, F.; Liu, C.; Wang, R.; Li, X. A Novel and Effective Method to Directly Solve Spectral Clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10863–10875. [Google Scholar] [CrossRef]
  9. Zhang, H.; Zhai, H.; Zhang, L.; Li, P. Spectral–Spatial Sparse Subspace Clustering for Hyperspectral Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 3672–3684. [Google Scholar] [CrossRef]
  10. Huang, S.; Zhang, H.; Pižurica, A. Subspace Clustering for Hyperspectral Images via Dictionary Learning with Adaptive Regularization. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–17. [Google Scholar] [CrossRef]
  11. Rafiezadeh Shahi, K.; Khodadadzadeh, M.; Tusa, L.; Ghamisi, P.; Tolosana-Delgado, R.; Gloaguen, R. Hierarchical sparse subspace clustering (HESSC): An automatic approach for hyperspectral image analysis. Remote Sens. 2020, 12, 2421. [Google Scholar] [CrossRef]
  12. Chen, J.; Liu, S.; Wang, H. Dual smooth graph convolutional clustering for large-scale hyperspectral images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 6825–6840. [Google Scholar] [CrossRef]
  13. Luo, F.; Liu, Y.; Duan, Y.; Guo, T.; Zhang, L.; Du, B. SDST: Self-supervised double-structure transformer for hyperspectral images clustering. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–14. [Google Scholar] [CrossRef]
  14. Yang, A.; Li, M.; Ding, Y.; Xiao, X.; He, Y. An Efficient and Lightweight Spectral-Spatial Feature Graph Contrastive Learning Framework for Hyperspectral Image Clustering. IEEE Trans. Geosci. Remote Sens. 2024. [Google Scholar] [CrossRef]
  15. Cai, Y.; Zhang, Z.; Cai, Z.; Liu, X.; Jiang, X.; Yan, Q. Graph convolutional subspace clustering: A robust subspace clustering framework for hyperspectral image. IEEE Trans. Geosci. Remote Sens. 2020, 59, 4191–4202. [Google Scholar] [CrossRef]
  16. Liu, S.; Wang, H. Graph convolutional optimal transport for hyperspectral image spectral clustering. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13. [Google Scholar] [CrossRef]
  17. Cai, Y.; Zhang, Z.; Ghamisi, P.; Ding, Y.; Liu, X.; Cai, Z.; Gloaguen, R. Superpixel contracted neighborhood contrastive subspace clustering network for hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13. [Google Scholar] [CrossRef]
  18. Liu, M.Y.; Tuzel, O.; Ramalingam, S.; Chellappa, R. Entropy rate superpixel segmentation. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; pp. 2097–2104. [Google Scholar]
  19. Achanta, R.; Shaji, A.; Smith, K.; Lucchi, A.; Fua, P.; Süsstrunk, S. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2274–2282. [Google Scholar] [CrossRef]
  20. Zhao, H.; Zhou, F.; Bruzzone, L.; Guan, R.; Yang, C. Superpixel-level global and local similarity graph-based clustering for large hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–16. [Google Scholar] [CrossRef]
  21. Zhang, Y.; Wang, X.; Jiang, X.; Zhang, L.; Du, B. Elastic Graph Fusion Subspace Clustering for Large Hyperspectral Image. IEEE Trans. Circuits Syst. Video Technol. 2025. [Google Scholar] [CrossRef]
  22. Wang, R.; Nie, F.; Yu, W. Fast spectral clustering with anchor graph for large hyperspectral images. IEEE Geosci. Remote Sens. Lett. 2017, 14, 2003–2007. [Google Scholar] [CrossRef]
  23. Wang, Q.; Miao, Y.; Chen, M.; Yuan, Y. Spatial-Spectral Clustering With Anchor Graph for Hyperspectral Image. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13. [Google Scholar] [CrossRef]
  24. Chen, X.; Zhang, Y.; Feng, X.; Jiang, X.; Cai, Z. Spectral-spatial superpixel anchor graph-based clustering for hyperspectral imagery. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar] [CrossRef]
  25. Wang, R.; Nie, F.; Wang, Z.; He, F.; Li, X. Scalable graph-based clustering with nonnegative relaxation for large hyperspectral image. IEEE Trans. Geosci. Remote Sens. 2019, 57, 7352–7364. [Google Scholar] [CrossRef]
  26. Huang, N.; Xiao, L.; Xu, Y.; Chanussot, J. A bipartite graph partition-based coclustering approach with graph nonnegative matrix factorization for large hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–18. [Google Scholar] [CrossRef]
  27. Wu, C.; Zhang, J. One-Step Joint Learning of Self-Supervised Spectral Clustering With Anchor Graph and Fuzzy Clustering for Land Cover Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 16, 11178–11193. [Google Scholar] [CrossRef]
  28. Jiang, G.; Zhang, Y.; Wang, X.; Jiang, X.; Zhang, L. Structured Anchor Learning for Large-Scale Hyperspectral Image Projected Clustering. IEEE Trans. Circuits Syst. Video Technol. 2024, 35, 2328–2340. [Google Scholar] [CrossRef]
  29. Zhang, Y.; Jiang, G.; Cai, Z.; Zhou, Y. Bipartite graph-based projected clustering with local region guidance for hyperspectral imagery. IEEE Trans. Multimed. 2024, 26, 9551–9563. [Google Scholar] [CrossRef]
  30. Nie, F.; Wang, X.; Huang, H. Clustering and Projected Clustering with Adaptive Neighbors. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 977–986. [Google Scholar]
  31. Zass, R.; Shashua, A. Doubly Stochastic Normalization for Spectral Clustering. In Proceedings of the Neural Information Processing Systems, Vancouver, BC, Canada, 3–6 December 2007; pp. 1569–1576. [Google Scholar]
  32. Wang, N.; Cui, Z.; Li, A.; Lu, Y.; Wang, R.; Nie, F. Structured doubly stochastic graph-based clustering. IEEE Trans. Neural Networks Learn. Syst. 2025. [Google Scholar] [CrossRef]
  33. Ding, T.; Lim, D.; Vidal, R.; Haeffele, B.D. Understanding Doubly Stochastic Clustering. In Proceedings of the 2022 14th International Conference on Machine Learning and Computing, Guangzhou, China, 18–21 February 2022; pp. 5153–5165. [Google Scholar]
  34. Yuan, J.; Zeng, C.; Xie, F.; Cao, Z.; Wang, R.; Nie, F.; Li, X. Doubly Stochastic Adaptive Neighbors Clustering via the Marcus Mapping. arXiv 2024, arXiv:2408.02932. [Google Scholar]
  35. Wang, X.; Nie, F.; Huang, H. Structured Doubly Stochastic Matrix for Graph Based Clustering. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1245–1254. [Google Scholar]
  36. Lim, D.; Vidal, R.; Haeffele, B.D. Doubly Stochastic Subspace Clustering. arXiv 2021, arXiv:2011.14859. [Google Scholar]
  37. Chen, M.; Gong, M.; Li, X. Robust Doubly Stochastic Graph Clustering. Neurocomputing 2022, 475, 15–25. [Google Scholar] [CrossRef]
  38. Yang, Z.; Oja, E. Clustering by Low-Rank Doubly Stochastic Matrix Decomposition. In Proceedings of the 29th International Conference on Machine Learning (ICML 2012), Edinburgh, Scotland, 26 June 26–1 July 2012; pp. 707–714. [Google Scholar]
  39. Wang, F.; Li, P.; König, A.C.; Wan, M. Improving Clustering by Learning a Bi-Stochastic Data Similarity Matrix. Knowl. Inf. Syst. 2012, 32, 351–382. [Google Scholar] [CrossRef]
  40. Park, J.; Kim, T. Learning Doubly Stochastic Affinity Matrix via Davis-Kahan Theorem. In Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), New Orleans, LA, USA, 18–21 November 2017; pp. 377–384. [Google Scholar]
  41. He, L.; Zhang, H. Doubly stochastic distance clustering. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 6721–6732. [Google Scholar] [CrossRef]
  42. Wang, Q.; He, X.; Jiang, X.; Li, X. Robust Bi-stochastic Graph Regularized Matrix Factorization for Data Clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 390–403. [Google Scholar] [CrossRef] [PubMed]
  43. Jiang, J.; Ma, J.; Liu, X. Multilayer spectral–spatial graphs for label noisy robust hyperspectral image classification. IEEE Trans. Neural Netw. Learn. Syst. 2020, 33, 839–852. [Google Scholar] [CrossRef]
  44. Sarfraz, S.; Sharma, V.; Stiefelhagen, R. Efficient Parameter-Free Clustering Using First Neighbor Relations. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 8926–8935. [Google Scholar]
45. Fan, K. On a theorem of Weyl concerning eigenvalues of linear transformations. Proc. Natl. Acad. Sci. USA 1949, 35, 652–655. [Google Scholar] [CrossRef]
  46. Bertsekas, D.P. Constrained Optimization and Lagrange Multiplier Methods; Academic Press: Cambridge, MA, USA, 1982. [Google Scholar]
Figure 1. An example illustrating the merit of doubly stochastic graph learning. (a) An ideal affinity matrix with 4 blocks, where the connections within the main diagonal blocks are “must-links” that provide the necessary connectivity, while the connections outside the main diagonal blocks are “cannot-links” that denote false similarities between different clusters. (b) A perturbed affinity matrix obtained by adding noise to (a). (c) The doubly stochastic approximation of (b).
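To make the intuition behind Figure 1 concrete, the following minimal NumPy sketch (an illustration of the general doubly stochastic normalization idea in the spirit of [31,39], not the HPCDL optimizer itself) builds an ideal 4-block affinity matrix, perturbs it with noise, and recovers a doubly stochastic approximation with classical Sinkhorn–Knopp iterations. All sizes and noise levels are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# (a) Ideal affinity: 4 diagonal blocks of 10 nodes each, uniform weights.
n = 40
A_ideal = np.zeros((n, n))
for c in range(4):
    A_ideal[10 * c:10 * (c + 1), 10 * c:10 * (c + 1)] = 0.1

# (b) Perturbation: non-negative noise introduces false "cannot-links".
A_noisy = A_ideal + 0.02 * rng.random((n, n))
A_noisy = (A_noisy + A_noisy.T) / 2            # keep the graph symmetric

# (c) Sinkhorn-Knopp: alternately normalize rows and columns until the
# matrix is (numerically) doubly stochastic.
S = A_noisy.copy()
for _ in range(500):
    S /= S.sum(axis=1, keepdims=True)          # rows sum to 1
    S /= S.sum(axis=0, keepdims=True)          # columns sum to 1

print(np.allclose(S.sum(axis=0), 1), np.allclose(S.sum(axis=1), 1))  # True True
# Within-block affinities remain dominant, so the 4-block structure of (a)
# is largely recovered from the noisy graph (b).
print(S[:10, :10].mean() / S[:10, 10:].mean())  # ratio well above 1
```

As in panel (c) of the figure, the normalization suppresses the spurious off-block affinities relative to the must-link blocks, which is exactly the property the doubly stochastic constraints exploit.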
Figure 2. The segmentation results obtained using ERS on the Xuzhou dataset. The yellow lines mark the partition boundaries.
Figure 3. An illustration of the relation between our proposed HPCDL and recent hierarchical clustering models.
Figure 4. An illustration of Theorem 2.
Figure 5. The visual maps of the clustering results on the Xuzhou dataset.
Figure 6. The visual maps of the clustering results on the Longkou dataset.
Figure 7. The visual maps of the clustering results on the PaviaC dataset.
Figure 8. The learned anchor–anchor graphs on different HSI datasets. The first and second rows report the results of Variant A and HPCDL, respectively, with the overall accuracy shown below each graph. The third row visualizes the degrees of the learned anchor–anchor graphs; our HPCDL generates strict probabilistic affinities, with all the degrees being 1.
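The degree property visualized in the third row of Figure 8 is straightforward to verify numerically. The snippet below is a hypothetical check (the function name `anchor_indicators` and the dense input matrix `S` are assumptions for illustration, not code from the paper): it confirms that all row and column degrees equal 1 and then reads anchor cluster indicators directly from the connected components of the graph, mirroring the connectivity-based assignment described for the anchor–anchor graph.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def anchor_indicators(S, tol=1e-6):
    """Check double stochasticity, then read clusters off connectivity."""
    S = np.asarray(S)
    assert np.allclose(S.sum(axis=0), 1.0, atol=tol)  # column degrees are 1
    assert np.allclose(S.sum(axis=1), 1.0, atol=tol)  # row degrees are 1
    # Treat the (sparse, block-structured) affinities as an undirected graph;
    # each connected component then corresponds to one anchor cluster.
    n_clusters, labels = connected_components(csr_matrix(S), directed=False)
    return n_clusters, labels  # labels[i] is the cluster of anchor i
```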
Figure 9. Parameter sensitivity experiments for the reduced spectral dimension r and the balancing hyperparameter α.
Table 1. Description of the Xuzhou dataset.

| No. | Class Name | Samples |
|-----|------------|---------|
| 1 | Bareland1 | 26,396 |
| 2 | Lakes | 4027 |
| 3 | Coals | 2783 |
| 4 | Crops-1 | 5214 |
| 5 | Cement | 13,184 |
| 6 | Trees | 2436 |
| 7 | Bareland2 | 6990 |
| 8 | Crops | 4777 |
| 9 | Red-tiles | 3070 |
| | Total | 68,877 |
Table 2. Description of the Longkou dataset.

| No. | Class Name | Samples |
|-----|------------|---------|
| 1 | Corn | 34,511 |
| 2 | Cotton | 8374 |
| 3 | Sesame | 3031 |
| 4 | Broad-leaf soybean | 63,212 |
| 5 | Narrow-leaf soybean | 4151 |
| 6 | Rice | 11,854 |
| 7 | Water | 67,056 |
| 8 | Roads and houses | 7124 |
| 9 | Mixed weed | 5229 |
| | Total | 204,542 |
Table 3. Description of the PaviaC dataset.

| No. | Class Name | Samples |
|-----|------------|---------|
| 1 | Water | 65,971 |
| 2 | Trees | 7598 |
| 3 | Asphalt | 3090 |
| 4 | Self-blocking bricks | 2685 |
| 5 | Bitumen | 6584 |
| 6 | Tiles | 9248 |
| 7 | Shadows | 7287 |
| 8 | Meadows | 42,826 |
| 9 | Bare Soil | 2863 |
| | Total | 148,152 |
Table 4. Quantitative comparison on the small-scale Xuzhou dataset. Numbers in bold denote the best results.

| No. | Type | K-means | SGCNR | HESSC | SGLSC | SAGC | SAPC | EGFSC | Ours |
|-----|------|---------|-------|-------|-------|------|------|-------|------|
| 1 | Bareland1 | 0.5919 | 0.4448 | **0.9983** | 0.8975 | 0.8009 | 0.8733 | 0.7550 | 0.8788 |
| 2 | Lakes | 0.9650 | 0.9647 | **0.9939** | 0.9637 | 0.9434 | 0.9516 | 0.9933 | 0.9816 |
| 3 | Coals | 0.0000 | 0.0000 | 0.0000 | 0.0000 | **0.7923** | 0.6493 | 0.0000 | 0.0000 |
| 4 | Cement | 0.5084 | 0.5198 | 0.7948 | 0.4208 | 0.7236 | 0.4503 | 0.1437 | **0.9933** |
| 5 | Crops-1 | 0.2855 | 0.4279 | 0.0022 | 0.9980 | 0.3108 | 0.9980 | 0.8824 | **0.9992** |
| 6 | Trees | 0.0000 | 0.4754 | **0.8761** | 0.0000 | 0.0000 | 0.0000 | 0.4823 | 0.0000 |
| 7 | Bareland2 | 0.7136 | 0.5326 | 0.7194 | 0.0000 | 0.8169 | 0.9850 | **1.0000** | 0.9854 |
| 8 | Crops | 0.9525 | 0.9510 | 0.5278 | **0.9979** | 0.9860 | 0.9937 | 0.7279 | 0.9849 |
| 9 | Red-tiles | 0.4326 | 0.0007 | 0.2777 | 0.5384 | 0.6111 | 0.9225 | **1.0000** | 0.9228 |
| | OA | 0.5342 | 0.4850 | 0.5803 | 0.7164 | 0.6869 | 0.8517 | 0.7408 | **0.8701** |
| | Kappa | 0.4464 | 0.4042 | 0.4767 | 0.6282 | 0.6197 | 0.8135 | 0.6832 | **0.8368** |
| | NMI | 0.5176 | 0.5268 | 0.5347 | 0.7138 | 0.6246 | 0.8251 | 0.6809 | **0.8369** |
| | Purity | 0.7247 | 0.7267 | 0.6643 | 0.7376 | 0.7422 | 0.8806 | 0.7723 | **0.8968** |
| | ARI | 0.4064 | 0.3962 | 0.4111 | 0.6158 | 0.5992 | 0.7884 | 0.6362 | **0.8187** |
| | Time cost | **1.9759** | 5.5613 | 106.3162 | 43.3164 | 9.4736 | 2.4362 | 2.8475 | 2.2453 |
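For context on how the entries in Tables 4–6 are typically obtained, the sketch below shows one common evaluation recipe for clustering results; the authors' exact evaluation code is not included on this page, so the helper names `best_map` and `clustering_scores` are illustrative assumptions rather than their implementation. Predicted cluster indices are first aligned to ground-truth classes with the Hungarian algorithm, after which OA, Kappa, and Purity become ordinary supervised scores, while NMI and ARI are invariant to label permutations and need no alignment.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import (accuracy_score, adjusted_rand_score,
                             cohen_kappa_score, confusion_matrix,
                             normalized_mutual_info_score)

def best_map(y_true, y_pred):
    """Relabel predicted clusters to best match the ground-truth classes."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    C = confusion_matrix(y_true, y_pred, labels=labels)
    row, col = linear_sum_assignment(-C)        # maximize matched pixels
    mapping = {labels[c]: labels[r] for r, c in zip(row, col)}
    return np.array([mapping[c] for c in y_pred])

def clustering_scores(y_true, y_pred):
    y_map = best_map(y_true, y_pred)
    return {
        "OA": accuracy_score(y_true, y_map),
        "Kappa": cohen_kappa_score(y_true, y_map),
        "NMI": normalized_mutual_info_score(y_true, y_pred),
        # Purity: each cluster votes for its most common true class.
        "Purity": confusion_matrix(y_true, y_pred).max(axis=0).sum() / len(y_true),
        "ARI": adjusted_rand_score(y_true, y_pred),
    }
```

For example, `clustering_scores(gt_labels, pred_labels)` on the labeled pixels of one dataset would reproduce the five metric rows for a single method, up to implementation details of the original evaluation.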
Table 5. Quantitative comparison on the medium-scale Longkou dataset. Numbers in bold denote the best results.

| No. | Type | K-means | SGCNR | HESSC | SGLSC | SAGC | SAPC | EGFSC | Ours |
|-----|------|---------|-------|-------|-------|------|------|-------|------|
| 1 | Corn | 0.6433 | 0.5184 | 0.9119 | 0.9983 | 0.5947 | 0.9973 | 0.9387 | **0.9993** |
| 2 | Cotton | 0.5515 | 0.8360 | 0.1222 | **0.9939** | 0.5252 | 0.0000 | 0.9694 | 0.9904 |
| 3 | Sesame | 0.3425 | 0.1363 | 0.0000 | 0.0000 | **0.6674** | 0.0000 | 0.0000 | 0.0000 |
| 4 | Broad-leaf soybean | 0.2788 | 0.3546 | 0.7226 | 0.7948 | 0.3792 | 0.8713 | 0.7531 | **0.9940** |
| 5 | Narrow-leaf soybean | 0.0007 | 0.0000 | 0.0000 | 0.0022 | 0.0000 | 0.0000 | 0.0000 | **0.6018** |
| 6 | Rice | 0.3382 | 0.6435 | 0.0862 | 0.8761 | 0.7760 | 0.8827 | 0.6022 | **0.9963** |
| 7 | Water | 0.9978 | 0.3436 | 0.7682 | 0.7194 | 0.6065 | 0.9579 | 0.9995 | **0.9997** |
| 8 | Roads and houses | 0.3984 | 0.0000 | 0.4910 | 0.5278 | **0.9214** | 0.5268 | 0.1165 | 0.5284 |
| 9 | Mixed weed | 0.4343 | **0.4808** | 0.0522 | 0.2777 | 0.0029 | 0.3033 | 0.1320 | 0.2855 |
| | OA | 0.5941 | 0.3955 | 0.6574 | 0.7669 | 0.5249 | 0.8288 | 0.8008 | **0.9400** |
| | Kappa | 0.5074 | 0.3169 | 0.5545 | 0.7079 | 0.4480 | 0.7713 | 0.7442 | **0.9204** |
| | NMI | 0.6235 | 0.5916 | 0.6148 | 0.7730 | 0.6302 | 0.7999 | 0.8085 | **0.9121** |
| | Purity | 0.8023 | 0.7988 | 0.8157 | 0.8884 | 0.8171 | 0.8920 | 0.8882 | **0.9518** |
| | ARI | 0.6317 | 0.3605 | 0.5125 | 0.6911 | 0.4643 | 0.7547 | 0.8220 | **0.9521** |
| | Time cost | **3.4823** | 16.6532 | 206.4234 | 122.5439 | 74.4623 | 5.6463 | 6.9392 | 4.9642 |
Table 6. Quantitative comparison on the large-scale PaviaC dataset. Numbers in bold denote the best results.

| No. | Type | K-means | SGCNR | HESSC | SGLSC | SAGC | SAPC | EGFSC | Ours |
|-----|------|---------|-------|-------|-------|------|------|-------|------|
| 1 | Water | 0.9936 | 0.9719 | 0.9705 | 0.9449 | 0.9942 | **1.0000** | 0.9895 | **1.0000** |
| 2 | Trees | 0.6490 | 0.8094 | **1.0000** | 0.4435 | 0.9156 | 0.9763 | 0.1640 | 0.9902 |
| 3 | Asphalt | 0.1748 | 0.0000 | 0.0000 | 0.0000 | **0.3201** | 0.0000 | 0.0243 | 0.0000 |
| 4 | Self-blocking bricks | 0.0369 | 0.7520 | 0.0000 | 0.1192 | 0.0108 | 0.1210 | **0.8104** | 0.2534 |
| 5 | Bitumen | 0.3171 | 0.2482 | 0.5556 | 0.0002 | 0.4550 | 0.7167 | 0.7075 | **0.8408** |
| 6 | Tiles | 0.8915 | 0.4531 | 0.1851 | **0.9976** | 0.6377 | 0.9670 | 0.3248 | 0.9524 |
| 7 | Shadows | 0.1772 | 0.5108 | 0.7596 | 0.7830 | 0.0008 | 0.7834 | **0.8666** | 0.8124 |
| 8 | Meadows | 0.5004 | 0.5357 | 0.9530 | 0.9841 | 0.9519 | 0.9979 | 0.6596 | **0.9982** |
| 9 | Bare Soil | 0.0010 | **1.0000** | 0.4013 | 0.0000 | 0.9511 | 0.8631 | 0.3929 | 0.8557 |
| | OA | 0.7032 | 0.7265 | 0.8403 | 0.8309 | 0.8501 | 0.9334 | 0.7568 | **0.9378** |
| | Kappa | 0.5917 | 0.6312 | 0.7630 | 0.7632 | 0.7845 | 0.9055 | 0.6667 | **0.9163** |
| | NMI | 0.6906 | 0.7278 | 0.7012 | 0.8082 | 0.7762 | 0.9097 | 0.6567 | **0.9190** |
| | Purity | 0.8314 | 0.8611 | 0.8440 | 0.8595 | 0.8624 | 0.9351 | 0.8276 | **0.9399** |
| | ARI | 0.7653 | 0.7999 | 0.7291 | 0.8644 | 0.8693 | 0.9743 | 0.8161 | **0.9778** |
| | Time cost | **9.2943** | 19.8293 | 489.6434 | 256.5431 | 34.4723 | 20.6463 | 22.2648 | 17.4592 |
Table 7. The results on the Longkou dataset when using our doubly stochastic graph (DSG) learning method compared to recent graph-based clustering works.

| Method | OA | Kappa | NMI | Purity | ARI |
|--------|----|-------|-----|--------|-----|
| SAPC | 0.8288 | 0.7713 | 0.7999 | 0.8920 | 0.7547 |
| SAPC + DSG | 0.8672 | 0.8231 | 0.8349 | 0.9212 | 0.7986 |
| EGFSC | 0.8008 | 0.7442 | 0.8085 | 0.8882 | 0.8220 |
| EGFSC + DSG | 0.8213 | 0.7634 | 0.8164 | 0.9047 | 0.8274 |