Efficient Anchor-Guided Multi-View Clustering via Diversity–Consistency Learning and Low-Rank Tensor Recovery

Fan, Rong; Kang, Kehan; Zhang, Qian; Liu, Chundan; Hu, Yunhong; Peng, Chong

doi:10.3390/electronics15051136

Open Access Article


by Rong Fan ^{1,†}, Kehan Kang ^{2,†}, Qian Zhang ^{3}, Chundan Liu ^{2}, Yunhong Hu ^{4} and Chong Peng ^{2,*}

^{1} College of Computer Science and Technology, Qingdao University, Qingdao 266071, China

^{2} Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266100, China

^{3} College of Information Engineering, Binzhou Polytechnic, Binzhou 256619, China

^{4} College of Information Science and Engineering, Yuncheng University, Yuncheng 044000, China

^{*} Author to whom correspondence should be addressed.

^{†} These authors contributed equally to this work.

Electronics 2026, 15(5), 1136; https://doi.org/10.3390/electronics15051136

Submission received: 16 January 2026 / Revised: 26 February 2026 / Accepted: 8 March 2026 / Published: 9 March 2026

(This article belongs to the Collection Graph Machine Learning)


Abstract

Multi-view clustering (MVC) is a fundamental unsupervised learning task for exploring latent structures from heterogeneous multi-view data. Existing MVC methods face critical challenges including the high computational cost of full-graph tensor models, neglect of high-order interactions between diversity and consistency information, and anchor misalignment across different views. In this paper, we propose an efficient anchor-guided MVC framework (EAG-DCT) via diversity–consistency learning and low-rank tensor recovery. The proposed method jointly learns consensus anchors, view-specific diversity graphs, and a global consistency graph in a unified model that integrates all graphs into a high-order tensor to capture rich cross-view correlations. By imposing a nonconvex low-rank constraint on the tensor, we effectively enhance the synergy between diversity and consistency learning. Our framework achieves high computational efficiency and scalability for large-scale data. Comprehensive experimental results on benchmark datasets validate that EAG-DCT outperforms state-of-the-art MVC methods in both clustering effectiveness and efficiency.

Keywords:

multi-view clustering; anchor-guided learning; diversity–consistency learning; low-rank tensor recovery; scalability

1. Introduction

Clustering is a fundamental unsupervised learning task in machine learning which aims to partition unlabeled data into subgroups based on latent structural similarities. It has found widespread applications in real-world scenarios such as image segmentation [1,2], community detection [3], and text mining [4]. Over the past decades, clustering algorithms have been extensively developed [5,6,7]. Among these methods, subspace clustering methods based on standard spectral clustering (SPC) are among the most popular [8,9,10]. For example, low-rank representation (LRR) [8] and sparse subspace clustering (SSC) [9] seek low-dimensional representations of the data under low-rank or sparse structural constraints, respectively. LRR and SSC have both drawn significant attention thanks to their elegant theory and promising performance, and a number of extensions have been developed [2,11]. However, they share a crucial limitation. In real-world scenarios, data are often collected from multiple heterogeneous sources, forming multi-view data that carry complementary and diverse information [12]. Traditional single-view learning methods cannot fully exploit such information [13]; as a result, valuable information is wasted and clustering results on multi-view data are frequently suboptimal.

To harness the rich complementary information in multi-view data, multi-view clustering (MVC) has emerged as a core research direction [14,15,16]. MVC methods overcome the natural limitation of single-view methods by jointly analyzing diverse feature spaces and mining inter-view correlations, thereby revealing the underlying structure of multi-view data more accurately [17,18,19]. Among existing MVC methods, a widely adopted pipeline follows the single-view subspace clustering technique [16] and can be described as a two-step strategy: first, a proper affinity matrix is constructed from the data; then, a standard clustering technique such as SPC or K-means is applied to obtain the sample clusters.

There exist diverse approaches for constructing affinity matrices in MVC [16,18,20], among which representation learning has emerged as a widely studied paradigm [20]. This approach essentially follows the single-view subspace clustering paradigm and focuses on learning the intrinsic subspace structures of multi-view data [21]. It typically relies on the self-expressive assumption, i.e., that each sample can be represented as a combination of other samples [18]. Under this assumption, together with appropriate structural constraints, a discriminative affinity matrix can be reliably constructed to capture the inherent similarity relationships between samples. However, this approach generally incurs high computational cost due to matrix and tensor operations such as matrix multiplication and inversion, which becomes a bottleneck for real-world applications that must handle large-scale data [22,23]. In parallel, deep multi-view clustering methods leverage neural networks to learn nonlinear representations that better capture complex data structures. Representative works include LMSC [12], which learns a comprehensive latent representation from multiple views; SIB-MSC [24], which extends the information bottleneck principle; DMAC [25], which integrates learnable anchors for linear-time clustering; and MSCNLG [26], which incorporates graph information to guide subspace learning. Despite their effectiveness, deep MVC methods typically require substantial training data, lack interpretability, and offer weaker theoretical guarantees than tensor-based approaches, which motivates our choice of the tensor paradigm in this work.

To address the above-mentioned challenges, anchor-based MVC strategies have gained significant attention in recent years [27,28]. Specifically, the anchor-based methods adopt a small set of representative anchors to construct a sample-anchor affinity matrix, which drastically reduces computational complexity [29,30]. Their efficient affinity construction makes anchor-based methods suitable for large-scale data. Early approaches usually select anchors independently in each view [27,31]. For example, the LMVSC selects anchors within each view using K-means [27], while MVC-LBG selects anchors in each view independently via random sampling [22]. However, this strategy leads to misaligned anchor points across views, which undermines the cross-view structural consistency of multi-view data and leads to distortion of the affinity matrix [28]. To address this issue, a number of methods attempt to select a shared set of anchor points [32,33,34,35]. For example, the EOMSE-CA constructs view-specific bipartite graphs over a shared anchor set and fuses them to obtain a consensus clustering structure [34]. These consensus anchor-based learning methods demonstrate promising performance and efficiency in multi-view clustering. However, their view fusion mechanisms remain matrix-based and are unable to adequately capture high-order interactions among multiple views. To capture rich cross-view interactions, recent tensor-based approaches [30,36] model and extract comprehensive cross-view information by elevating multi-view consistency into a high-order tensor space. For example, the CHOC-MVC stacks view-specific affinity graphs into a tensor and enforces low-rank structure to preserve high-order consistency [36]. Similarly, graph learning methods such as AGFS-OMVC construct a tensor consisting of view-specific graphs in order to capture rich cross-view interactions of multi-view data [30]. 
However, these methods only focus on cross-view high-order relationships, omitting the high-order interactions between the consistency part and diversity part. The TDCLM addresses this issue by combining the diversity graphs and the consistency graph into a tensor with low-rank constraint, while the Laplacian manifold constraints help to strengthen the relationship between diversity and consistency [37]. However, this method has high computational cost and is not applicable to large-scale datasets.

Despite these advances, existing anchor-based tensor methods still face a critical dilemma: they either sacrifice the modeling of high-order cross-view interactions for efficiency, or attempt to capture such interactions at the expense of scalability. Moreover, the intricate high-order interplay between view consistency and diversity, which is essential for fully exploiting complementary multi-view information, remains largely underexplored in a large-scale setting. This gap motivates us to develop a novel method that can simultaneously achieve high efficiency, preserve high-order cross-view correlations, and explicitly model the consistency–diversity relationship.

To address these issues, we propose a novel MVC method named EAG-DCT. The key contributions of this paper are summarized as follows: (1) the proposed method learns consensus anchors shared across views and achieves scalability through efficient optimization; (2) the consistency and diversity anchor graphs are stacked into an anchor graph tensor, which exploits cross-view correlations of multi-view data while ensuring efficiency; (3) a nonconvex tensor rank approximation is imposed on the anchor graph tensor, which admits more accurate recovery of structural information from multi-view data; (4) extensive experimental results confirm the effectiveness of the proposed method.

The rest of this paper is organized as follows: Section 2 introduces key notation and necessary preliminaries; Section 3 reviews related work; Section 4 presents the proposed method; Section 5 develops an efficient optimization algorithm; Section 6 presents the final clustering step; Section 7 analyzes the computational complexity; Section 8 reports detailed experimental results; and Section 9 concludes the paper.

2. Notation and Preliminary

For clarity of presentation, we first introduce the key notation used in this work. The multi-view data set is denoted by $X$, consisting of $n$ samples partitioned into $c$ clusters, with each sample characterized by $V$ sets of view-specific features. Specifically, the $v^{th}$ view of $X$ has feature dimension $d_v$, and for ease of notation we write $X \in \mathbb{R}^{\{d_1, \dots, d_V\} \times n}$ for the multi-view data, i.e., $X^{(v)} \in \mathbb{R}^{d_v \times n}$. Apart from the multi-view data, calligraphic characters denote third-order tensors, while the superscript $(\cdot)^{(v)}$ indicates either the $v^{th}$ view of the multi-view data or the $v^{th}$ frontal slice of a third-order tensor. The Frobenius norm of a matrix is denoted by $\|\cdot\|_F$; for third-order tensors, it is extended as the square root of the sum of the squares of all elements. The transpose of a matrix is denoted by $(\cdot)^T$ following the standard convention. For a third-order tensor, $(\cdot)^T$ transposes each frontal slice and then reverses the order of the transposed frontal slices 2 through $n_3$. The trace of a matrix is denoted by $\mathrm{Tr}(\cdot)$. Key definitions related to tensors are given below as preliminary knowledge.
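The tensor transpose described above can be sketched in NumPy as follows. This is an illustrative sketch, assuming a third-order tensor is stored as an array of shape (n1, n2, n3) whose frontal slices are `Q[:, :, k]`; the function name `t_transpose` is ours and hypothetical.

```python
import numpy as np


def t_transpose(Q: np.ndarray) -> np.ndarray:
    """Tensor transpose of an (n1, n2, n3) array with frontal slices Q[:, :, k]."""
    # Transpose every frontal slice, giving an (n2, n1, n3) array.
    Qt = np.transpose(Q, (1, 0, 2))
    # Keep the first slice in place and reverse the order of slices 2..n3.
    n3 = Q.shape[2]
    order = [0] + list(range(n3 - 1, 0, -1))
    return Qt[:, :, order]


Q = np.arange(24, dtype=float).reshape(2, 3, 4)
Qt = t_transpose(Q)
print(Qt.shape)  # (3, 2, 4)
```

Applying the operation twice returns the original tensor, which is a quick sanity check of the slice-reversal rule.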

Definition 1 (t-Product). For any $L \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and $R \in \mathbb{R}^{n_2 \times n_4 \times n_3}$, the t-product, denoted by $*$, is defined as

L * R = ifft (fft (L, [], 3) \otimes fft (R, [], 3), [], 3),

(1)

where $\mathrm{fft}(\cdot, [], 3)$ denotes the fast Fourier transform (FFT) along the third dimension of the input tensor, $\mathrm{ifft}(\cdot, [], 3)$ denotes the inverse fast Fourier transform (IFFT) along the third dimension, and $\otimes$ denotes multiplication of the corresponding frontal slices of two tensors.
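A minimal NumPy sketch of Definition 1 follows, assuming the (n1, n2, n3) array layout with frontal slices along the last axis; `fft(·, [], 3)` maps to `np.fft.fft(·, axis=2)`. For comparison it also computes the same product directly as a circular convolution of frontal slices, which is the equivalent block-circulant form of the t-product. Both function names are hypothetical.

```python
import numpy as np


def t_product(L: np.ndarray, R: np.ndarray) -> np.ndarray:
    """t-product of L (n1, n2, n3) and R (n2, n4, n3), following Definition 1."""
    Lf = np.fft.fft(L, axis=2)  # fft(L, [], 3)
    Rf = np.fft.fft(R, axis=2)  # fft(R, [], 3)
    # Multiply corresponding frontal slices in the Fourier domain.
    Pf = np.einsum("ijk,jlk->ilk", Lf, Rf)
    # For real inputs the result is real up to round-off.
    return np.real(np.fft.ifft(Pf, axis=2))


def t_product_direct(L: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Same product computed as a circular convolution of frontal slices."""
    n3 = L.shape[2]
    out = np.zeros((L.shape[0], R.shape[1], n3))
    for k in range(n3):
        for j in range(n3):
            out[:, :, k] += L[:, :, j] @ R[:, :, (k - j) % n3]
    return out


rng = np.random.default_rng(0)
L = rng.standard_normal((3, 4, 5))
R = rng.standard_normal((4, 2, 5))
print(np.allclose(t_product(L, R), t_product_direct(L, R)))  # True
```

The FFT route costs one slice-wise matrix product per frequency instead of n3 products per output slice, which is why t-product based models are computed in the Fourier domain.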

Definition 2 (Orthogonal Tensor). A tensor $U \in \mathbb{R}^{n_1 \times n_1 \times n_3}$ is orthogonal if

U * U^{T} = U^{T} * U = I,

(2)

where $I$ is the identity tensor of proper size, whose first frontal slice is an identity matrix and whose other frontal slices are zero.
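The identity tensor acts as a neutral element for the t-product because its FFT along the third mode has an identity matrix in every frontal slice. The sketch below checks this numerically; it assumes the same NumPy layout as before, and `identity_tensor` is a hypothetical helper name.

```python
import numpy as np


def t_product(L, R):
    """t-product (Definition 1) via slice-wise products in the Fourier domain."""
    Pf = np.einsum("ijk,jlk->ilk", np.fft.fft(L, axis=2), np.fft.fft(R, axis=2))
    return np.real(np.fft.ifft(Pf, axis=2))


def identity_tensor(n: int, n3: int) -> np.ndarray:
    """n x n x n3 identity tensor: first frontal slice I_n, all other slices zero."""
    I = np.zeros((n, n, n3))
    I[:, :, 0] = np.eye(n)
    return I


A = np.random.default_rng(1).standard_normal((3, 4, 5))
print(np.allclose(t_product(identity_tensor(3, 5), A), A))  # True
# Every Fourier-domain slice of the identity tensor is the identity matrix.
print(np.allclose(np.fft.fft(identity_tensor(3, 5), axis=2)[:, :, 2], np.eye(3)))  # True
```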

Definition 3 (t-SVD). Given $Q \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, its t-SVD is

Q = U * S * V^{T},

(3)

where $U \in \mathbb{R}^{n_1 \times n_1 \times n_3}$ and $V \in \mathbb{R}^{n_2 \times n_2 \times n_3}$ are orthogonal tensors and $S \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is f-diagonal, i.e., each frontal slice $S^{(v)}$ is a diagonal matrix.
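The t-SVD is computed slice-wise in the Fourier domain. The sketch below is one way to do it in NumPy, under the stated layout assumptions. It takes a full SVD of the first half of the Fourier-domain slices and fills the remaining slices by conjugate symmetry, so that `U`, `S`, and `V` come back real after the inverse FFT. The helper names are hypothetical.

```python
import numpy as np


def t_product(L, R):
    """t-product (Definition 1) via slice-wise products in the Fourier domain."""
    Pf = np.einsum("ijk,jlk->ilk", np.fft.fft(L, axis=2), np.fft.fft(R, axis=2))
    return np.real(np.fft.ifft(Pf, axis=2))


def t_transpose(Q):
    """Transpose each frontal slice, then reverse the order of slices 2..n3."""
    order = [0] + list(range(Q.shape[2] - 1, 0, -1))
    return np.transpose(Q, (1, 0, 2))[:, :, order]


def t_svd(Q):
    """t-SVD Q = U * S * V^T of a real (n1, n2, n3) tensor (Definition 3)."""
    n1, n2, n3 = Q.shape
    Qf = np.fft.fft(Q, axis=2)
    Uf = np.zeros((n1, n1, n3), dtype=complex)
    Sf = np.zeros((n1, n2, n3), dtype=complex)
    Vf = np.zeros((n2, n2, n3), dtype=complex)
    k = min(n1, n2)
    half = n3 // 2 + 1
    for i in range(half):
        M = Qf[:, :, i]
        if i == 0 or 2 * i == n3:
            M = M.real  # these Fourier slices are real when Q is real
        u, s, vh = np.linalg.svd(M)  # full SVD of one slice
        Uf[:, :, i] = u
        Sf[np.arange(k), np.arange(k), i] = s
        Vf[:, :, i] = vh.conj().T
    for i in range(half, n3):
        # Conjugate symmetry of the FFT of a real tensor keeps U, S, V real.
        Uf[:, :, i] = Uf[:, :, n3 - i].conj()
        Sf[:, :, i] = Sf[:, :, n3 - i]
        Vf[:, :, i] = Vf[:, :, n3 - i].conj()
    to_real = lambda T: np.real(np.fft.ifft(T, axis=2))
    return to_real(Uf), to_real(Sf), to_real(Vf)


Q = np.random.default_rng(2).standard_normal((4, 3, 5))
U, S, V = t_svd(Q)
print(np.allclose(t_product(t_product(U, S), t_transpose(V)), Q))  # True
```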

Definition 4 (Tensor Nuclear Norm (TNN)). Given $Q \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, the TNN, denoted by $\|\cdot\|_⊛$, is defined as

{∥ Q ∥}_{⊛} = \sum_{i = 1}^{n_{3}} \sum_{j = 1}^{\min\{n_{1}, n_{2}\}} σ_{i, j} (fft (Q, [], 3)),

(4)

where $σ_{i,j}(\cdot)$ denotes the $j^{th}$ largest singular value of the $i^{th}$ frontal slice of the input tensor.
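Definition 4 translates directly into a few lines of NumPy. Note that some works scale the TNN by $1/n_3$; Definition 4 does not, so this sketch omits that factor. The function name `tnn` is hypothetical. Two easy checks are shown: with a single frontal slice the TNN reduces to the matrix nuclear norm, and with $n_3$ identical slices only the zero-frequency slice is nonzero.

```python
import numpy as np


def tnn(Q: np.ndarray) -> float:
    """Tensor nuclear norm of Definition 4 (no 1/n3 normalization)."""
    Qf = np.fft.fft(Q, axis=2)
    return float(sum(np.linalg.svd(Qf[:, :, i], compute_uv=False).sum()
                     for i in range(Q.shape[2])))


M = np.random.default_rng(3).standard_normal((4, 3))
# With a single frontal slice, the TNN reduces to the matrix nuclear norm.
print(np.isclose(tnn(M[:, :, None]), np.linalg.norm(M, "nuc")))  # True
# Three identical slices: only the zero-frequency slice (3M) is nonzero.
print(np.isclose(tnn(np.repeat(M[:, :, None], 3, axis=2)), 3 * np.linalg.norm(M, "nuc")))  # True
```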

3. Related Work

Subspace-based clustering methods have been widely adopted in multi-view clustering tasks [32,38]. The core idea of subspace clustering is to learn view-specific low-dimensional representations of multi-view data. These can be formulated as

\begin{matrix} min_{G_{1}} \sum_{v = 1}^{V} {∥ X^{(v)} - X^{(v)} {(G_{1}^{(v)})}^{T} ∥}_{F}^{2} + λ f (G_{1}), \end{matrix}

(5)

where $G_1 \in \mathbb{R}^{n \times n \times V}$ consists of the full affinity matrices of different views, $λ \geq 0$ denotes a balancing parameter, and $f(\cdot)$ represents a structural constraint, such as a low-rank constraint, used to enhance the cluster structure. However, since $G_1$ contains $n^2$ entries per view, such methods incur high computational complexity, which limits their applicability to large-scale datasets.

To address this inefficiency, anchor-based methods inherit the core idea of subspace clustering while introducing anchor points to reduce the model scale. Specifically, low-dimensional representations are no longer learned via linear combinations of all samples, but instead by a small set of anchor points. This approach drastically reduces the dimensionality of the optimization variable, which can be expressed as follows [27]:

\begin{matrix} min_{G_{2}} \sum_{v = 1}^{V} {∥ X^{(v)} - B^{(v)} {(G_{2}^{(v)})}^{T} ∥}_{F}^{2} + λ {∥ G_{2} ∥}_{F}^{2}, \\ s . t . G_{2}^{(v)} \geq 0, G_{2}^{(v)} 1_{m} = 1_{n}, \end{matrix}

(6)

where $B \in \mathbb{R}^{\{d_1, \dots, d_V\} \times m}$ consists of view-specific anchors, $m$ denotes the number of anchors with $m \ll n$, $1_m$ and $1_n$ are column vectors of ones whose lengths are given by their subscripts, and $G_2 \in \mathbb{R}^{n \times m \times V}$ denotes the sample–anchor affinity matrices. Notably, $G_2$ is drastically smaller than $G_1$ ($nmV$ versus $n^2 V$ entries), which significantly improves the computational efficiency of the model for large-scale data.
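To make the size reduction concrete, the short calculation below compares the number of entries in $G_1$ (Equation (5)) and $G_2$ (Equation (6)). The values $n = 100{,}000$, $m = 500$, and $V = 3$ are hypothetical and chosen only for illustration; the ratio is always $n/m$.

```python
def affinity_entries(n: int, m: int, V: int) -> tuple:
    """Entries in G1 (Eq. (5), n x n x V) and G2 (Eq. (6), n x m x V)."""
    return n * n * V, n * m * V


# Hypothetical sizes: 100,000 samples, 500 anchors, 3 views.
full, anchor = affinity_entries(100_000, 500, 3)
print(full, anchor, full // anchor)      # 30000000000 150000000 200
print(full * 8 / 1e9, anchor * 8 / 1e9)  # gigabytes in float64: 240.0 1.2
```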

4. The Proposed Method

Anchor-based multi-view clustering methods have gained widespread attention because they substantially reduce computational complexity by leveraging bipartite graph structures [32]. However, existing anchor-based approaches often fail to adequately capture the intrinsic complementary relationship between view-specific diversity and cross-view consistency [27]. This deficiency often leads to suboptimal clustering performance, as either view-specific discriminative information is overlooked or cross-view shared structures are not fully exploited [27]. Although Laplacian manifold regularization has been proposed to enhance the interaction between diversity and consistency [39], it incurs high computational cost when applied to full graphs, making it impractical for large-scale applications [39]. To address these issues, we propose a novel framework that combines the efficiency of anchor-based methods with effective modeling of diversity–consistency interactions while mitigating their respective drawbacks. The detailed derivation of the proposed framework is presented as follows.

For anchor-based MVC methods, a key step is often learning view-specific anchors and their corresponding anchor affinity matrices, which encode the similarity between data samples and anchors. This intuitive formulation leads to the following optimization problem:

\begin{matrix} min_{A, Z} \sum_{v = 1}^{V} {∥ X^{(v)} - A^{(v)} {(Z^{(v)})}^{T} ∥}_{F}^{2}, \\ s . t . {(A^{(v)})}^{T} A^{(v)} = I_{m}, Z^{(v)} \geq 0, Z^{(v)} 1_{m} = 1_{n}, \end{matrix}

(7)

where $A \in \mathbb{R}^{d_v \times m \times V}$ denotes the anchors to be learned in different views, $Z \in \mathbb{R}^{n \times m \times V}$ consists of the corresponding anchor affinity matrices, and $I_m$ denotes the $m \times m$ identity matrix. The learned graphs of different views are then fused for the final clustering step. However, in Equation (7) the anchors are learned independently for each view, which is a critical flaw that results in anchor misalignment across views. Since the anchor spaces of different views are incompatible, this misalignment makes it impossible to exploit cross-view consistency from the anchor affinity matrices.

To address this limitation, it is desirable to enforce cross-view consistency by learning a set of shared anchors that are applicable to all views. To explicitly exploit cross-view shared anchors, we further develop Equation (7) into the following model:

\begin{matrix} min_{W, A, Z} \sum_{v = 1}^{V} {∥ X^{(v)} - W^{(v)} A {(Z^{(v)})}^{T} ∥}_{F}^{2}, \\ s . t . {(W^{(v)})}^{T} W^{(v)} = I_{m}, A^{T} A = I_{m}, Z \geq 0, Z^{(v)} 1_{m} = 1_{n}, \end{matrix}

(8)

where $A \in \mathbb{R}^{d \times m}$ denotes the shared anchors of dimension $d$ to be exploited across different views and $W \in \mathbb{R}^{\{d_1, \dots, d_V\} \times d}$ consists of view-specific mapping matrices that project view-specific features into the shared anchor space for cross-view alignment. In this paper, we set the dimension of the shared anchor space to $d = m$ in order to achieve three-way consistency between the number of anchors, the dimension of the shared orthogonal space, and the dimension of the sample–anchor affinity matrix. This choice provides effective orthogonality constraints for cross-view alignment and compatibility with low-rank tensor modeling of multi-view interactions, avoids parameter redundancy while retaining sufficient expressive capacity, and maintains the computational efficiency inherent to anchor-based methods.

However, Equation (8) only ensures alignment of the anchor space; it does not explicitly model the consistency among the view-specific anchor affinity matrices during learning. To further strengthen cross-view consistency, we fuse the anchor affinity matrices of different views into a shared consistency graph, which leads to

\begin{matrix} min_{W, A, Z, S} \sum_{v = 1}^{V} {∥ X^{(v)} - W^{(v)} A {(Z^{(v)})}^{T} ∥}_{F}^{2} + λ \sum_{v = 1}^{V} {∥ Z^{(v)} - S ∥}_{F}^{2}, \\ s . t . {(W^{(v)})}^{T} W^{(v)} = I_{m}, A^{T} A = I_{m}, Z \geq 0, Z^{(v)} 1_{m} = 1_{n}, \end{matrix}

(9)

where $S \in \mathbb{R}^{n \times m}$ represents the consistency graph obtained by fusing information from the view-specific affinity matrices.

While Equation (9) explicitly enforces consistency, fusing the view-specific affinity matrices through a simple distance minimization only captures the pairwise relationships between each $Z^{(v)}$ and $S$, ignoring the complex high-order interactions among them. Such high-order interactions are crucial for preserving complementary information across views and refining the quality of both the diversity and consistency graphs. To address this gap and comprehensively exploit multi-view information, we integrate all view-specific diversity graphs $\{Z^{(v)}\}_{v=1}^{V}$ and the consistency graph $S$ into a unified third-order tensor $H$, where the tensorization enables the modeling of high-order relationships among these graphs. Formally, $H$ is defined as

H = stack (Z^{(1)}, Z^{(2)}, \dots, Z^{(V)}, S) \in \mathbb{R}^{n \times m \times (V + 1)},

where $\mathrm{stack}(\cdot)$ concatenates the input matrices along a new (third) mode to form a third-order tensor. To exploit the high-order relationships embedded in $H$ and recover the intrinsic structural properties of the graphs, such as the low-rankness induced by shared clusters, we impose a low-rank constraint on $H$. Using the TNN as a convex surrogate for the tensor rank, our model becomes

\begin{matrix} min_{W, A, Z, S, α} \sum_{v = 1}^{V} {∥ X^{(v)} - W^{(v)} A {(Z^{(v)})}^{T} ∥}_{F}^{2} \\ + λ \sum_{v = 1}^{V} α_{v}^{2} {∥ Z^{(v)} - S ∥}_{F}^{2} + γ {∥ H ∥}_{⊛} \\ s . t . α^{T} 1_{V} = 1, {(W^{(v)})}^{T} W^{(v)} = I_{m}, A^{T} A = I_{m}, Z \geq 0, \\ Z^{(v)} 1_{m} = 1_{n}, H = stack (Z^{(1)}, \dots, Z^{(V)}, S), \end{matrix}

(10)

where $γ$ is a balancing parameter, $\|\cdot\|_⊛$ denotes the tensor nuclear norm, and $α \in \mathbb{R}^V$ denotes adaptive weights for different views. The low-rank structure of $H$ serves three critical purposes: (1) it captures the intrinsic low-rank property shared by all view-specific and consistency graphs, which is closely related to cross-view consistency; (2) it facilitates high-order interactions between the diversity graphs and the consistency graph, enabling mutual refinement; and (3) it enhances model robustness by suppressing noise and outliers in individual graphs.

Despite the advantages of the TNN, recent studies [16,40] have pointed out that it often approximates the tensor rank inaccurately, leading to suboptimal low-rank recovery and thereby compromising the quality of the learned graphs. Specifically, as a convex relaxation, the TNN penalizes all singular values equally and therefore over-shrinks the large singular values that carry the dominant structure, so it may deviate from true rank minimization. Nonconvex surrogates such as the log-determinant tensor rank approximation (LTRA) approximate the rank function more accurately and have demonstrated superior performance in capturing the true low-rank structure of tensors [16,40]. Moreover, the LTRA subproblem admits closed-form solutions via singular value shrinkage, which is more efficient than some other nonconvex approaches such as the tensor Schatten norm [41]. To further improve the accuracy of structural recovery for the unified tensor $H$, we adopt the LTRA [16] in our model. Formally, the LTRA is defined as follows:

Definition 5 (LTRA). For $Q \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, the LTRA is defined as

{∥ Q ∥}_{ld} = \sum_{i = 1}^{n_{3}} \sum_{j = 1}^{\min\{n_{1}, n_{2}\}} \log (1 + σ_{i, j} (fft (Q, [], 3))) ,

(11)

where $σ_{i,j}(\cdot)$ is defined as in Definition 4.
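A small NumPy sketch of Equation (11) makes the contrast with the TNN visible: since $\log(1 + x) \leq x$, the LTRA never exceeds the TNN, and scaling a tensor up increases the LTRA only sublinearly, so large singular values are penalized less than under the TNN. The function names are hypothetical and the layout assumption is the same as before.

```python
import numpy as np


def singular_values_fft(Q):
    """Singular values of every frontal slice of fft(Q, [], 3), one array per slice."""
    Qf = np.fft.fft(Q, axis=2)
    return [np.linalg.svd(Qf[:, :, i], compute_uv=False) for i in range(Q.shape[2])]


def tnn(Q):
    """TNN of Definition 4."""
    return float(sum(s.sum() for s in singular_values_fft(Q)))


def ltra(Q):
    """Log-determinant tensor rank approximation, Eq. (11)."""
    return float(sum(np.log1p(s).sum() for s in singular_values_fft(Q)))


Q = np.random.default_rng(4).standard_normal((5, 4, 3))
print(ltra(Q) <= tnn(Q))            # True, since log(1 + x) <= x
print(ltra(10 * Q) < 10 * ltra(Q))  # True: the penalty grows sublinearly
```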

By substituting the TNN with the LTRA, we obtain a more accurate estimation of intrinsic low-rank structure for tensors, which in turn improves the recovery of structural properties for the diversity and consistency graphs. The objective function of our proposed framework is finally developed as

\begin{matrix} min_{W, A, Z, S, α} \sum_{v = 1}^{V} {∥ X^{(v)} - W^{(v)} A {(Z^{(v)})}^{T} ∥}_{F}^{2} \\ + λ \sum_{v = 1}^{V} α_{v}^{2} {∥ Z^{(v)} - S ∥}_{F}^{2} + γ {∥ H ∥}_{ld} \\ s . t . α^{T} 1_{V} = 1, {(W^{(v)})}^{T} W^{(v)} = I_{m}, A^{T} A = I_{m}, Z \geq 0, \\ Z^{(v)} 1_{m} = 1_{n}, H = stack (Z^{(1)}, \dots, Z^{(V)}, S) . \end{matrix}

(12)
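For monitoring the alternating updates, the value of Equation (12) can be computed with a short NumPy sketch. Assumptions: `X` and `W` are Python lists over views, `Z` is an (n, m, V) array, `S` is (n, m), and $H$ is stacked exactly as written in (12) (the rotation of Remark 1 is not applied here). The function names are hypothetical. The example builds noise-free data in which every view agrees with $S$, so the first two terms vanish.

```python
import numpy as np


def ltra(Q):
    """LTRA of Eq. (11): sum of log(1 + sigma) over all Fourier-domain slices."""
    Qf = np.fft.fft(Q, axis=2)
    return float(sum(np.log1p(np.linalg.svd(Qf[:, :, i], compute_uv=False)).sum()
                     for i in range(Q.shape[2])))


def eag_dct_objective(X, W, A, Z, S, alpha, lam, gamma):
    """Value of Eq. (12). X, W: lists over views; Z: (n, m, V); S: (n, m)."""
    V = Z.shape[2]
    recon = sum(np.linalg.norm(X[v] - W[v] @ A @ Z[:, :, v].T) ** 2 for v in range(V))
    consist = lam * sum(alpha[v] ** 2 * np.linalg.norm(Z[:, :, v] - S) ** 2 for v in range(V))
    H = np.concatenate([Z, S[:, :, None]], axis=2)  # stack(Z^(1), ..., Z^(V), S)
    return recon + consist + gamma * ltra(H)


rng = np.random.default_rng(11)
n, m = 6, 3
S = rng.random((n, m))
S /= S.sum(axis=1, keepdims=True)  # rows on the probability simplex
Z = np.stack([S, S], axis=2)       # two views that agree with S
A = np.eye(m)
W = [np.linalg.qr(rng.standard_normal((d, m)))[0] for d in (5, 4)]
X = [Wv @ A @ S.T for Wv in W]     # noise-free data
print(eag_dct_objective(X, W, A, Z, S, np.array([0.5, 0.5]), lam=1.0, gamma=0.0))  # approximately 0
```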

We refer to this framework as Efficient Anchor-Guided Multi-View Clustering via Diversity–Consistency Learning and Low-Rank Tensor Recovery (EAG-DCT). An efficient optimization algorithm for solving the proposed objective function will be introduced in the next section.

5. Optimization

In this section, we develop an efficient optimization algorithm based on the augmented Lagrange multiplier (ALM) method. To facilitate the optimization, we first introduce an auxiliary tensor variable $Y \in \mathbb{R}^{n \times m \times (V+1)}$ subject to the constraint $Y = H$. We then move this constraint into the objective and construct the augmented Lagrangian function as follows:

\begin{matrix} L (W, A, Z, S, α, Y, K, ρ) \\ = & \sum_{v = 1}^{V} {∥ X^{(v)} - W^{(v)} A {(Z^{(v)})}^{T} ∥}_{F}^{2} + λ \sum_{v = 1}^{V} α_{v}^{2} {∥ Z^{(v)} - S ∥}_{F}^{2} \\ + γ {∥ Y ∥}_{ld} + \frac{ρ}{2} {∥ Y - H + K / ρ ∥}_{F}^{2}, \\ s . t . & α^{T} 1_{V} = 1, {(W^{(v)})}^{T} W^{(v)} = I_{m}, A^{T} A = I_{m}, Z \geq 0, \\ Z^{(v)} 1_{m} = 1_{n}, H = stack (Z^{(1)}, \dots, Z^{(V)}, S), \end{matrix}

(13)

where $K \in \mathbb{R}^{n \times m \times (V+1)}$ denotes the Lagrange multiplier and $ρ$ is the penalty parameter. We then employ an alternating minimization strategy to update the variables as described below.

5.1. Updating $W$

The subproblems for different views of $W$ are independent; the subproblem for each $W^{(v)}$ can be formulated as

\begin{matrix} min_{W^{(v)}} {∥ X^{(v)} - W^{(v)} A {(Z^{(v)})}^{T} ∥}_{F}^{2}, \\ s . t . {(W^{(v)})}^{T} W^{(v)} = I_{m} . \end{matrix}

(14)

Since $(W^{(v)})^T W^{(v)} = I_m$ and $A^T A = I_m$, the term $\|W^{(v)} A (Z^{(v)})^T\|_F^2 = \|Z^{(v)}\|_F^2$ does not depend on $W^{(v)}$; expanding the Frobenius norm therefore shows that the above problem is equivalent to

\begin{matrix} max_{W^{(v)}} Tr ({(W^{(v)})}^{T} {\bar{W}}^{(v)}), s . t . {(W^{(v)})}^{T} W^{(v)} = I_{m}, \end{matrix}

(15)

where $\bar{W}^{(v)} = X^{(v)} Z^{(v)} A^T$. According to [42], Equation (15) admits a closed-form solution. Denote the thin SVD of $\bar{W}^{(v)}$ by $\bar{W}^{(v)} = U(\bar{W}^{(v)}) S(\bar{W}^{(v)}) (V(\bar{W}^{(v)}))^T$, where $U(\cdot)$ and $V(\cdot)$ return the left and right singular vectors of the input matrix and $S(\cdot)$ returns a diagonal matrix of its singular values. Then, each slice of the optimal $W$ is obtained as

W^{(v)} = U ({\bar{W}}^{(v)}) {(V ({\bar{W}}^{(v)}))}^{T} .

(16)
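The $W^{(v)}$ update is an orthogonal Procrustes problem. A minimal NumPy sketch of Equations (15) and (16) follows, assuming $d_v \geq m$ so that $W^{(v)}$ can have orthonormal columns; `update_W` is a hypothetical name. At the optimum, $\mathrm{Tr}((W^{(v)})^T \bar{W}^{(v)})$ equals the sum of the singular values of $\bar{W}^{(v)}$, which is a convenient correctness check.

```python
import numpy as np


def update_W(Xv, Zv, A):
    """Closed-form W^(v) of Eq. (16): orthogonal Procrustes on Wbar = X Z A^T."""
    Wbar = Xv @ Zv @ A.T
    U, _, Vt = np.linalg.svd(Wbar, full_matrices=False)  # thin SVD
    return U @ Vt, Wbar


rng = np.random.default_rng(5)
d, n, m = 8, 10, 3
Xv, Zv = rng.standard_normal((d, n)), rng.random((n, m))
A = np.linalg.qr(rng.standard_normal((m, m)))[0]  # any orthogonal m x m matrix
W, Wbar = update_W(Xv, Zv, A)
print(np.allclose(W.T @ W, np.eye(m)))  # True
```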

5.2. Updating $A$

The subproblem associated with $A$ is

\begin{matrix} min_{A} \sum_{v = 1}^{V} {∥ X^{(v)} - W^{(v)} A {(Z^{(v)})}^{T} ∥}_{F}^{2}, s . t . A^{T} A = I_{m} . \end{matrix}

(17)

Under the orthogonality constraints $(W^{(v)})^T W^{(v)} = I_m$ and $A^T A = I_m$, the objective function can be simplified. Expanding the Frobenius norm for each view yields

\begin{matrix} {∥ X^{(v)} - W^{(v)} A {(Z^{(v)})}^{T} ∥}_{F}^{2} = {∥ X^{(v)} ∥}_{F}^{2} + {∥ W^{(v)} A {(Z^{(v)})}^{T} ∥}_{F}^{2} \\ - 2 Tr ({(X^{(v)})}^{T} W^{(v)} A {(Z^{(v)})}^{T}) . \end{matrix}

(18)

Using the orthogonality of $W^{(v)}$, the second term reduces to

\begin{matrix} {∥ W^{(v)} A {(Z^{(v)})}^{T} ∥}_{F}^{2} & = Tr (Z^{(v)} A^{T} {(W^{(v)})}^{T} W^{(v)} A {(Z^{(v)})}^{T}) \\ = Tr (Z^{(v)} A^{T} A {(Z^{(v)})}^{T}) = {∥ Z^{(v)} ∥}_{F}^{2}, \end{matrix}

(19)

where the orthogonality of $A$ is also applied. Both $\|X^{(v)}\|_F^2$ and $\|Z^{(v)}\|_F^2$ are constant with respect to $A$. Consequently, minimizing (17) is equivalent to maximizing the sum of the cross-terms:

\begin{matrix} \sum_{v = 1}^{V} Tr ({(X^{(v)})}^{T} W^{(v)} A {(Z^{(v)})}^{T}) & = \sum_{v = 1}^{V} Tr (A^{T} {(W^{(v)})}^{T} X^{(v)} Z^{(v)}) \\ = Tr (A^{T} \sum_{v = 1}^{V} {(W^{(v)})}^{T} X^{(v)} Z^{(v)}) . \end{matrix}

(20)

Define $\bar{A} = \sum_{v=1}^{V} (W^{(v)})^T X^{(v)} Z^{(v)}$. The optimization problem then reduces to the classical orthogonal Procrustes form:

\begin{matrix} max_{A} Tr (A^{T} \bar{A}), s . t . A^{T} A = I_{m} . \end{matrix}

(21)

According to [42] and following the same reasoning as in the update of $W$, the optimal solution is obtained via the SVD of $\bar{A}$:

A = U (\bar{A}) {(V (\bar{A}))}^{T},

(22)

where $U(\cdot)$ and $V(\cdot)$ denote the left and right singular vectors, respectively.
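The $A$ update can be sketched the same way. The NumPy code below forms $\bar{A}$ by summing over views and solves (21) by SVD. It also lets one check the expansion in Equations (18) and (19) numerically for an arbitrary orthogonal $A$. As before, `X`, `W`, and `Z` are assumed to be lists over views, and the function names are hypothetical.

```python
import numpy as np


def update_A(X, W, Z):
    """Closed-form A of Eq. (22); X, W, Z are lists over views."""
    Abar = sum(Wv.T @ Xv @ Zv for Xv, Wv, Zv in zip(X, W, Z))
    U, _, Vt = np.linalg.svd(Abar)
    return U @ Vt


def objective(X, W, A, Z):
    """Reconstruction term of Eq. (17)."""
    return sum(np.linalg.norm(Xv - Wv @ A @ Zv.T) ** 2 for Xv, Wv, Zv in zip(X, W, Z))


rng = np.random.default_rng(6)
n, m, dims = 12, 3, (6, 5)
X = [rng.standard_normal((d, n)) for d in dims]
W = [np.linalg.qr(rng.standard_normal((d, m)))[0] for d in dims]
Z = [rng.random((n, m)) for _ in dims]
A = update_A(X, W, Z)
print(np.allclose(A.T @ A, np.eye(m)))  # True
```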

5.3. Updating $Z$

The subproblem of $Z$ is

\begin{matrix} min_{Z \geq 0, Z^{(v)} 1_{m} = 1_{n}} \sum_{v = 1}^{V} \{ {∥ X^{(v)} - W^{(v)} A {(Z^{(v)})}^{T} ∥}_{F}^{2} \\ + λ α_{v}^{2} {∥ Z^{(v)} - S ∥}_{F}^{2} + \frac{ρ}{2} {∥ Y^{(v)} - Z^{(v)} + K^{(v)} / ρ ∥}_{F}^{2} \} . \end{matrix}

(23)

The frontal slices of $Z$ can be solved independently. Using $\|W^{(v)} A (Z^{(v)})^T\|_F^2 = \|Z^{(v)}\|_F^2$ from Equation (19) and completing the square, the subproblem of each slice is

\begin{matrix} min_{Z^{(v)}} {∥ Z^{(v)} - {\bar{Z}}^{(v)} ∥}_{F}^{2}, s . t . Z^{(v)} \geq 0, Z^{(v)} 1_{m} = 1_{n}, \end{matrix}

(24)

where each frontal slice of $\bar{Z} \in \mathbb{R}^{n \times m \times V}$ is defined as

{\bar{Z}}^{(v)} = \frac{2 {(X^{(v)})}^{T} W^{(v)} A + 2 λ α_{v}^{2} S + K^{(v)} + ρ Y^{(v)}}{2 λ α_{v}^{2} + ρ + 2} .

Equation (24) is a strongly convex quadratic problem over a closed convex set and therefore has a unique solution. In fact, it is the standard $\ell_2$ projection onto the probability simplex, applied row-wise [43]. Equation (24) can be solved efficiently by a number of algorithms, including projection-based methods [43,44,45] and active set methods [46]. In this paper, we adopt the projection-based method and solve $Z^{(v)}$ row by row. Denote an arbitrary row of $\bar{Z}^{(v)}$ by $\bar{z}^T$ with $\bar{z} \in \mathbb{R}^m$; then, the corresponding row of $Z^{(v)}$, denoted by $z^T$, is given by

\begin{matrix} z^{T} = {({\bar{z}}^{T} + \frac{1}{j^{*}} (1 - \sum_{i = 1}^{j^{*}} {\bar{z}}_{(i)}) \cdot 1_{m}^{T})}_{+}, \end{matrix}

(25)

where $(\cdot)_+$ denotes the element-wise non-negative projection, $\bar{z}_{(i)}$ denotes the sorted elements of $\bar{z}$ such that $\bar{z}_{(1)} \geq \dots \geq \bar{z}_{(m)}$, and $j^* = \max\{ j \in \{1, \dots, m\} : \bar{z}_{(j)} + \frac{1}{j}(1 - \sum_{i=1}^{j} \bar{z}_{(i)}) > 0 \}$ is the largest index for which this quantity is positive. Then, we update the overall $Z$ accordingly. We summarize the above procedure as the following operator:

Z = ProbSimplex (\bar{Z}) .

(26)
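The row-wise projection in Equations (25) and (26) can be vectorized over all rows at once. The NumPy sketch below follows the standard sort-based simplex projection and takes $j^*$ as the largest index with a positive value. Index 1 always qualifies, so $j^*$ is well defined. The function name `prob_simplex_rows` is hypothetical.

```python
import numpy as np


def prob_simplex_rows(Zbar: np.ndarray) -> np.ndarray:
    """Project each row of Zbar (n x m) onto the probability simplex, Eqs. (25)-(26)."""
    n, m = Zbar.shape
    u = -np.sort(-Zbar, axis=1)         # rows sorted in descending order
    css = np.cumsum(u, axis=1)          # running sums of the sorted entries
    j = np.arange(1, m + 1)
    positive = u + (1.0 - css) / j > 0  # condition defining j*
    # j* is the LARGEST index with a positive value (index 1 always qualifies).
    jstar = m - np.argmax(positive[:, ::-1], axis=1)
    theta = (1.0 - css[np.arange(n), jstar - 1]) / jstar
    return np.maximum(Zbar + theta[:, None], 0.0)


P = prob_simplex_rows(np.array([[2.0, 0.0], [0.5, 0.5]]))
print(P)  # rows should be [1, 0] and [0.5, 0.5]
```

Projecting a row that already lies on the simplex leaves it unchanged, and every output row is non-negative and sums to one.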

5.4. Updating $S$

To update $S$, we solve the following problem, obtained by collecting the terms of Equation (13) that involve $S$ (recall that $S$ is the $(V+1)^{th}$ frontal slice of $H$):

\begin{matrix} min_{S} λ \sum_{v = 1}^{V} α_{v}^{2} {∥ Z^{(v)} - S ∥}_{F}^{2} + \frac{ρ}{2} {∥ Y^{(V + 1)} - S + K^{(V + 1)} / ρ ∥}_{F}^{2} . \end{matrix}

(27)

The above problem is an unconstrained quadratic and admits a closed-form solution. Setting the gradient with respect to $S$ to zero gives

\begin{matrix} S = \frac{2 λ (\sum_{v = 1}^{V} α_{v}^{2} Z^{(v)}) + K^{(V + 1)} + ρ Y^{(V + 1)}}{ρ + 2 λ \sum_{v = 1}^{V} α_{v}^{2}} . \end{matrix}

(28)
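As a check on the closed form, the NumPy sketch below minimizes the $S$-terms of the augmented Lagrangian in Equation (13), with the penalty written as $\frac{ρ}{2}\|Y^{(V+1)} - S + K^{(V+1)}/ρ\|_F^2$. It then verifies that the gradient vanishes at the returned $S$. Shapes follow the paper's notation. `update_S` and all numeric values are hypothetical.

```python
import numpy as np


def update_S(Z, alpha, Y_last, K_last, lam, rho):
    """Minimizer over S of the S-terms of Eq. (13):
    lam * sum_v alpha_v^2 ||Z^(v) - S||^2 + rho/2 ||Y^(V+1) - S + K^(V+1)/rho||^2."""
    a2 = alpha ** 2
    weighted = np.einsum("v,ijv->ij", a2, Z)  # sum_v alpha_v^2 Z^(v)
    return (2 * lam * weighted + K_last + rho * Y_last) / (rho + 2 * lam * a2.sum())


rng = np.random.default_rng(8)
n, m, V = 7, 4, 3
Z = rng.random((n, m, V))
alpha = np.array([0.5, 0.3, 0.2])
Y_last, K_last = rng.standard_normal((n, m)), rng.standard_normal((n, m))
lam, rho = 0.7, 1.3
S = update_S(Z, alpha, Y_last, K_last, lam, rho)
# Gradient of the S-subproblem; it should vanish at the minimizer.
grad = 2 * lam * np.einsum("v,ijv->ij", alpha ** 2, S[:, :, None] - Z) - rho * (Y_last - S + K_last / rho)
print(np.abs(grad).max() < 1e-10)  # True
```

Note that the denominator contains $\sum_v α_v^2$ rather than 1: the weights satisfy $\sum_v α_v = 1$, but their squares generally do not sum to one.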

5.5. Updating $α_{v}$

The subproblem associated with $α$ is (the positive factor $λ$ does not affect the minimizer and is omitted)

\begin{matrix} min_{α} \sum_{v = 1}^{V} α_{v}^{2} {∥ Z^{(v)} - S ∥}_{F}^{2}, s . t . \sum_{v = 1}^{V} α_{v} = 1, α_{v} \geq 0 . \end{matrix}

(29)

Define $ψ \in \mathbb{R}^V$ with $ψ_v := \|Z^{(v)} - S\|_F^2$, and assume $ψ_v > 0$ for all $v$. Then, by the Cauchy–Schwarz inequality together with the constraint $\sum_v α_v = 1$, we have

(\sum_{v = 1}^{V} α_{v}^{2} ψ_{v}) (\sum_{v = 1}^{V} \frac{1}{ψ_{v}}) \geq {(\sum_{v = 1}^{V} α_{v} \sqrt{ψ_{v}} \cdot \frac{1}{\sqrt{ψ_{v}}})}^{2} = 1,

(30)

where equality holds if and only if $α_v \sqrt{ψ_v} \propto 1/\sqrt{ψ_v}$, i.e., $α_v \propto 1/ψ_v$, for $v = 1, \dots, V$. Hence, the objective is bounded below by $1 / \sum_{j=1}^{V} (1/ψ_j)$, and this bound is attained by the non-negative weights

\begin{matrix} α_{v} = \frac{1 / ψ_{v}}{\sum_{j = 1}^{V} 1 / ψ_{j}} . \end{matrix}

(31)
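A worked example of Equation (31): with $ψ = (4, 16)$ the weights are $(0.8, 0.2)$, and the attained objective $0.64 \cdot 4 + 0.04 \cdot 16 = 3.2$ equals the Cauchy–Schwarz bound $1/(1/4 + 1/16)$. The NumPy sketch below reproduces this. The `eps` guard against $ψ_v = 0$ is our own assumption and is not part of the paper. `update_alpha` is a hypothetical name.

```python
import numpy as np


def update_alpha(Z, S, eps=1e-12):
    """Eq. (31): alpha_v proportional to 1 / ||Z^(v) - S||_F^2.
    eps guards against psi_v = 0 (our assumption, not part of the paper)."""
    psi = np.array([np.linalg.norm(Z[:, :, v] - S) ** 2 for v in range(Z.shape[2])])
    inv = 1.0 / np.maximum(psi, eps)
    return inv / inv.sum(), psi


S = np.zeros((2, 2))
Z = np.stack([np.ones((2, 2)), 2 * np.ones((2, 2))], axis=2)  # psi = [4, 16]
alpha, psi = update_alpha(Z, S)
print(alpha)  # [0.8 0.2]
```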

5.6. Optimization of $Y$

Denote

\bar{Y} = H + K / ρ \in R^{n \times m \times (V + 1)}

; then, the subproblem of

Y

becomes

min_{Y} \frac{1}{2} {∥ \bar{Y} - Y ∥}_{F}^{2} + \frac{γ}{ρ} {∥ Y ∥}_{ld} .

(32)

Since the LTRA is defined in the Fourier domain, we may convert Equation (32) into the Fourier domain in a way similar to [40,47,48]:

min_{{\hat{Y}}^{(v)}} \sum_{v = 1}^{V + 1} \{\frac{1}{2} {∥ {\hat{\bar{Y}}}^{(v)} - {\hat{Y}}^{(v)} ∥}_{F}^{2} + \frac{γ}{ρ} \sum_{i = 1}^{c} log (1 + σ_{i} ({\hat{Y}}^{(v)})\},

(33)

where $\hat{\bar{Y}} = fft (\bar{Y}, [], 3)$ and $\hat{Y} = fft (Y, [], 3)$ denote the fast Fourier transforms of $\bar{Y}$ and $Y$ along the third mode, respectively. Then, Equation (33) can be divided into $V + 1$ independent subproblems of ${\hat{Y}}^{(v)}$. These subproblems are the log-determinant regularized shrinkage problems described in [49], which admit closed-form solutions. For ease of notation, we denote by $D (\cdot)$ an operator that returns a diagonal matrix with the input elements on its diagonal, by $U (\cdot)$ and $V (\cdot)$ the matrices of left and right singular vectors of the input, and by ${(\cdot)}^{H}$ the conjugate transpose, since the Fourier-domain slices are complex in general. Then, according to [10,49], ${\hat{Y}}^{(v)}$ is obtained by

{\hat{Y}}^{(v)} = U ({\hat{\bar{Y}}}^{(v)}) D ({\hat{δ}}_{1, v}, \dots, {\hat{δ}}_{c, v}) {(V ({\hat{\bar{Y}}}^{(v)}))}^{H},

(34)

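where ${\hat{δ}}_{i, v} := \arg\min_{x \geq 0} f_{i, v} (x)$ and $f_{i, v} (x) := \frac{1}{2} {(x - σ_{v, i} (\hat{\bar{Y}}))}^{2} + \frac{γ}{ρ} \log (1 + x)$ for $i = 1, \dots, c$, with $σ_{v, i} (\hat{\bar{Y}})$ denoting the $i$-th singular value of ${\hat{\bar{Y}}}^{(v)}$. Setting $f_{i, v}' (x) = 0$ and multiplying by $1 + x$ yields a quadratic equation in $x$, whose larger root is

ξ_{i, v} = \frac{σ_{v, i} (\hat{\bar{Y}}) - 1 + \sqrt{{(1 + σ_{v, i} (\hat{\bar{Y}}))}^{2} - \frac{4 γ}{ρ}}}{2};

then, ${\hat{δ}}_{i, v} = ξ_{i, v}$ if the discriminant ${(1 + σ_{v, i} (\hat{\bar{Y}}))}^{2} - 4 γ / ρ$ is nonnegative and the conditions $ξ_{i, v} > 0$ and $f_{i, v} (ξ_{i, v}) \leq f_{i, v} (0)$ hold; otherwise, ${\hat{δ}}_{i, v} = 0$. Now, we may obtain

Y = ifft (\hat{Y}, [], 3) .

(35)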
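The Fourier-domain procedure of Equations (33)–(35) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: `logdet_shrink` and `logdet_tensor_prox` are hypothetical names, `tau` plays the role of $γ/ρ$, every singular value of each frontal slice is shrunk, and, as in Equation (33), no FFT normalization constant is introduced. `np.fft.fft(..., axis=2)` mirrors MATLAB's `fft(·, [], 3)`, and `np.linalg.svd` returns the conjugate-transposed right factor `Vh` directly, which is what complex slices require.

```python
import numpy as np

def logdet_shrink(sigma, tau):
    """Elementwise minimizer of f(x) = 0.5*(x - sigma)^2 + tau*log(1 + x) over x >= 0."""
    sigma = np.asarray(sigma, dtype=float)
    disc = (1.0 + sigma) ** 2 - 4.0 * tau               # discriminant of f'(x)(1 + x) = 0
    root = np.sqrt(np.maximum(disc, 0.0))
    xi = np.where(disc >= 0.0, (sigma - 1.0 + root) / 2.0, 0.0)  # larger stationary point
    xi_pos = np.maximum(xi, 0.0)                         # keeps log1p well defined
    f_xi = 0.5 * (xi_pos - sigma) ** 2 + tau * np.log1p(xi_pos)
    f_0 = 0.5 * sigma ** 2                               # f(0), since log(1) = 0
    keep = (disc >= 0.0) & (xi > 0.0) & (f_xi <= f_0)
    return np.where(keep, xi, 0.0)

def logdet_tensor_prox(Ybar, tau):
    """FFT along mode 3, shrink the singular values of every frontal slice, inverse FFT."""
    Yf = np.fft.fft(Ybar, axis=2)                        # like MATLAB fft(Ybar, [], 3)
    out = np.empty_like(Yf)
    for k in range(Yf.shape[2]):
        U, s, Vh = np.linalg.svd(Yf[:, :, k], full_matrices=False)
        out[:, :, k] = (U * logdet_shrink(s, tau)) @ Vh  # U diag(delta) V^H
    # slices of a real tensor come in conjugate pairs, so the result is real up to round-off
    return np.real(np.fft.ifft(out, axis=2))
```

To follow the rotation in Remark 1, one could call `logdet_tensor_prox(np.transpose(Ybar, (0, 2, 1)), tau)` on an $n \times m \times (V + 1)$ array and apply the same transpose to the result, since swapping the last two axes is its own inverse.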

Remark 1.

In practice, we follow [16,48] and rotate the $n \times m \times (V + 1)$ tensors into $n \times (V + 1) \times m$ tensors before Equation (32), and apply the inverse rotation to these tensors after Equation (35). This approach has two key advantages. First, it helps to preserve cross-view relationships, since the FFT is then performed along the third dimension of length $m$, i.e., the anchor dimension. Second, the SVD is then computed on $n \times (V + 1)$ frontal slices that carry sample-view information, which is more meaningful for preserving essential cross-view information.

5.7. Updating $K$ and $ρ$

These variables are updated in the standard form as follows:

\begin{matrix} K = K + ρ (Y - H), ρ = ρ κ, \end{matrix}

(36)

where $κ > 1$ is a parameter that keeps $ρ$ increasing.

6. The Clustering Step

After we solve Equation (12) and obtain the optimal $H$, a common strategy is to average all frontal slices of $H$ to obtain the fused graph, on which standard spectral clustering (SPC) is then performed to obtain the final clustering results [16]. However, the fused graph here has size $n \times m$, so SPC is not directly applicable. Thus, we follow [50] to construct graphs of size $n \times n$ by $M^{(v)} = H^{(v)} {(S^{(v)})}^{- 1} {(H^{(v)})}^{T}$, where $M \in R^{n \times n \times (V + 1)}$ consists of these graphs as frontal slices and $S \in R^{m \times m \times (V + 1)}$ is an f-diagonal tensor with $S_{i i}^{(v)} = \sum_{j = 1}^{n} H_{j i}^{(v)}$ (in this section, $S$ denotes this degree tensor rather than the consistency graph). Then, we obtain the fused graph as $M = \frac{1}{V + 1} \sum_{v = 1}^{V + 1} M^{(v)}$, on which we perform SPC for the final clustering results. This step essentially obtains the spectral embedding $Q$ by solving the following problem:

\max_{Q \in R^{n \times c}, Q^{T} Q = I_{c}} Tr (Q^{T} M Q),

(37)

and then performs K-means on $Q$ to obtain the clusters. However, Equation (37) generally requires $O (n^{2} c)$ complexity for eigenvalue decomposition, which is a potential bottleneck for large-scale applications. Inspired by [51], we adopt the following strategy to solve Equation (37) more efficiently.

First, we construct a normalized concatenated matrix

\begin{matrix} H_{con} = \frac{[H^{(1)} {(S^{(1)})}^{- \frac{1}{2}}, \dots, H^{(V + 1)} {(S^{(V + 1)})}^{- \frac{1}{2}}]}{\sqrt{V + 1}} . \end{matrix}

(38)

Then, it is clear that

\begin{matrix} H_{con} H_{con}^{T} & = \frac{1}{V + 1} \sum_{v = 1}^{V + 1} H^{(v)} {(S^{(v)})}^{- \frac{1}{2}} {(S^{(v)})}^{- \frac{1}{2}} {(H^{(v)})}^{T} \\ & = \frac{1}{V + 1} \sum_{v = 1}^{V + 1} H^{(v)} {(S^{(v)})}^{- 1} {(H^{(v)})}^{T} \\ & = M . \end{matrix}

(39)

Let $H_{con} = U_{con} S_{con} V_{con}^{T}$ be the full SVD of $H_{con}$; then, we have

\begin{matrix} M & = H_{con} H_{con}^{T} = U_{con} S_{con} V_{con}^{T} V_{con} S_{con}^{T} U_{con}^{T} \\ & = U_{con} (S_{con} S_{con}^{T}) U_{con}^{T} \\ & = U_{con} S_{con}^{2} U_{con}^{T} . \end{matrix}

(40)

It is clear that $U_{con}$ is orthogonal and $S_{con}^{2}$ is diagonal, so the above equation is the eigenvalue decomposition of $M$. Thus, solving Equation (37) is equivalent to computing the left singular vectors of $H_{con}$ that correspond to its $c$ largest singular values, which reduces the time complexity from $O (n^{2} c)$ to $O (n m V c)$. Finally, K-means is performed on $Q$ to obtain the clustering result.
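The equivalence in Equations (38)–(40) can be checked with the NumPy sketch below. It assumes the $V + 1$ anchor graphs are given as a list of dense nonnegative $n \times m$ arrays with positive column sums; `anchor_spectral_embedding` is a hypothetical name. The sketch scales each slice by the inverse square root of its column degrees, stacks the slices horizontally, and takes a thin SVD, so the $n \times n$ matrix $M$ is never formed.

```python
import numpy as np

def anchor_spectral_embedding(H_slices, c):
    """Top-c left singular vectors of H_con (Eq. (38)) and the matching eigenvalues of M.

    H_slices: list of V+1 nonnegative (n, m) anchor graphs with positive column sums.
    """
    blocks = []
    for H in H_slices:
        deg = H.sum(axis=0)                  # diagonal of S^{(v)}: S_ii = sum_j H_ji
        blocks.append(H / np.sqrt(deg))      # H^{(v)} (S^{(v)})^{-1/2} as column scaling
    H_con = np.hstack(blocks) / np.sqrt(len(H_slices))
    U, s, _ = np.linalg.svd(H_con, full_matrices=False)  # thin SVD; M is never built
    return U[:, :c], s[:c] ** 2              # Q and the top-c eigenvalues of M
```

The rows of the returned `Q` can then be passed to any K-means routine (for instance `scipy.cluster.vq.kmeans2`) to obtain the final labels.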

For clarity of presentation, we summarize the overall procedure of optimization and clustering steps in Algorithm 1.

Algorithm 1 EAG-DCT

Input: Multi-view data $X$, balancing parameters $λ$ and $γ$, parameters $ρ$ and $κ$, number of clusters $c$, and number of anchors $m$.
Initialize: $W$, $A$, $Z$, $S$, $α$, $H$, $Y$, $K$, and $ρ$.
1: while the termination criteria are not met do
2: Step 1. Update $W$ by Equation (16);
3: Step 2. Update $A$ by Equation (22);
4: Step 3. Update $Z$ by Equation (26);
5: Step 4. Update $S$ by Equation (28);
6: Step 5. Update $α$ by Equation (31);
7: Step 6. Update $H$ by stacking $Z^{(1)}, \dots, Z^{(V)}$ and $S$;
8: Step 7. Update $Y$ by Equation (35);
9: Step 8. Update $K$ and $ρ$ by Equation (36).
10: end while
11: Construct $H_{con}$ by Equation (38) and compute $Q$ as its left singular vectors corresponding to the $c$ largest singular values.
12: Output: Run K-means on $Q$ to obtain the final clustering.

7. Complexity Analysis

We analyze the complexity of the EAG-DCT as follows. For ease of notation, we denote by $d_{sum} = \sum_{v = 1}^{V} d_{v}$ the total dimension of the concatenated multi-view features. The optimization step requires a complexity of $O (d_{sum} m n + d_{sum} m^{2})$, $O (d_{sum} m n + m^{2} n V + m^{3})$, $O (d_{sum} m n + m^{2} n V + m \log (m) n V)$, $O (m n V)$, $O (m n V)$, $O (m \log (m) n V + m n V^{2})$, and $O (m n V)$ per iteration to update $W$, $A$, $Z$, $S$, $α$, $Y$, and $K$, respectively. The clustering step requires a complexity of $O (n m^{2} V)$ to obtain the spectral embedding of the fused graph and $O (c m n V)$ to perform K-means. Since $c \leq m ≪ n$, the overall complexity of the EAG-DCT simplifies to $O (m^{2} n V + m n V^{2} + d_{sum} m n)$. This complexity is linear in $n$, so EAG-DCT is scalable and suitable for large-scale learning tasks.

8. Experiments

This section presents a comprehensive experimental study to validate the effectiveness and efficiency of the EAG-DCT. We compare the EAG-DCT with eight state-of-the-art MVC methods that cover both subspace clustering-based and graph learning-based approaches, and evaluate the methods on six benchmark datasets: MSRC, ORL, Caltech101-20, Caltech101-all, VGGface50, and MNIST. Key information on these datasets is summarized in Table 1. Five metrics are adopted for clustering performance evaluation: accuracy (ACC), normalized mutual information (NMI), purity, F-score, and precision; more details of these metrics can be found in [50]. All metrics range from 0 to 1, with higher values indicating better performance. The detailed experimental settings and results are elaborated in the following subsections.
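For reference, two of the listed metrics can be computed as in the sketch below. It assumes integer-coded label vectors, the function names are hypothetical, and ACC uses the common convention of an optimal one-to-one matching between clusters and classes, solved here with SciPy's `linear_sum_assignment`. NMI, F-score, and precision are omitted; their definitions follow [50].

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def contingency(y_true, y_pred):
    """C[k, t] = number of samples in predicted cluster k that belong to true class t."""
    _, t = np.unique(y_true, return_inverse=True)
    _, k = np.unique(y_pred, return_inverse=True)
    C = np.zeros((k.max() + 1, t.max() + 1), dtype=int)
    np.add.at(C, (k, t), 1)
    return C

def clustering_acc(y_true, y_pred):
    """ACC: fraction of samples labeled correctly under the best one-to-one cluster-class map."""
    C = contingency(y_true, y_pred)
    rows, cols = linear_sum_assignment(C, maximize=True)
    return C[rows, cols].sum() / C.sum()

def purity(y_true, y_pred):
    """Purity: each cluster is credited with its most frequent true class."""
    C = contingency(y_true, y_pred)
    return C.max(axis=1).sum() / C.sum()
```

Both scores are invariant to relabeling of clusters. Purity never falls below ACC because several clusters may be credited with the same class, which the one-to-one matching in ACC forbids.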

8.1. Baseline Methods

We compare the EAG-DCT with eight state-of-the-art methods, which are briefly introduced below. (1) AMGL [52] introduces a parameter-free weighting mechanism for efficient multi-view learning; (2) MSC-IAS [53] applies intact space learning to construct an informative similarity matrix that captures data intactness; (3) LMVSC [27] uses anchor points to build view-wise affinity matrices, providing a scalable approach that is suitable for large-scale datasets; (4) FMCNOF [23] jointly leverages anchor graphs and non-negative orthogonal factorization to efficiently obtain direct cluster labels; (5) ERMC-AGR [54] augments correntropy-based matrix factorization with anchor graph regularization, boosting both robustness and efficiency; (6) MSGL [55] presents a scalable graph learning framework that learns sample–anchor relationships via bipartite graphs; (7) OMVCDR [56] adopts a self-supervised auto-weighting scheme in latent space to unify multi-view learning and K-means; finally, (8) DMAC [25] introduces a learnable anchor mechanism with positive-incentive noise perturbation and anchor graph convolution to achieve linear-time deep multi-view clustering with clustering-oriented anchor optimization. Among these methods, the last five are anchor-based.

8.2. Experimental Settings

We describe the settings of the methods as follows. For the baseline methods, we tune the parameters according to the original papers. For the EAG-DCT, we fix $ρ$ and $κ$ to 10 and 1.5, respectively, throughout the experiments. For the other parameters, we follow a commonly adopted strategy in the MVC literature and tune them by grid search within the following ranges:

$m \in \{c, 2 c, 3 c, \dots, 10 c\}$;
$λ \in \{10^{- 3}, 10^{- 2}, 10^{- 1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}\}$;
$γ \in \{10^{- 3}, 10^{- 2}, 10^{- 1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}\}$.

Note that the orthonormal constraint ${(W^{(v)})}^{T} W^{(v)} = I_{m}$ requires the number of anchors $m$ to satisfy $m \leq d_{v}$ for each view, where $d_{v}$ is the feature dimension of the $v$-th view. Therefore, in practice, the upper bound of $m$ is further restricted by the minimum feature dimension across all views. Let $d_{min} = \min_{v} d_{v}$. The actual range of $m$ used in our experiments consists of the multiples of $c$ that do not exceed $\min (10 c, d_{min})$, i.e., $m \in \{k c : k = 1, \dots, 10, \ k c \leq d_{min}\}$, which guarantees that all candidate values are mathematically feasible.
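The candidate grid described above can be generated as in the following Python sketch. `anchor_grid` and `LOG_GRID` are hypothetical names, and the function returns an empty list when $d_{min} < c$, a case the text does not address.

```python
def anchor_grid(c, view_dims, max_mult=10):
    """Multiples of c not exceeding min(max_mult * c, d_min), where d_min = min_v d_v."""
    upper = min(max_mult * c, min(view_dims))
    return [k * c for k in range(1, max_mult + 1) if k * c <= upper]

# lambda and gamma share the logarithmic grid {1e-3, 1e-2, ..., 1e3}
LOG_GRID = [10.0 ** p for p in range(-3, 4)]
```

For example, with $c = 10$ and hypothetical view dimensions $(512, 254, 40)$, only $m \in \{10, 20, 30, 40\}$ survive, so the full search would cover $4 \times 7 \times 7 = 196$ combinations of $(m, λ, γ)$, which `itertools.product` can enumerate.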

For each combination of parameter values, we conduct ten independent trials of the final clustering step and record the average results. For each method, we select the parameter combination that yields the best average clustering results and report these results for comparison, with details provided in the next subsection. It is important to note that our hyperparameter selection process used ground-truth labels to identify the optimal configurations during grid search. This practice was applied uniformly to all compared methods to ensure a fair comparison.

8.3. Clustering Performance

To validate the effectiveness of the proposed method, we conduct extensive experiments with the settings described in Section 8.2 and report the clustering performance of all compared methods in Table 2, Table 3 and Table 4. Notably, the EAG-DCT achieves the highest values in all evaluation cases, which demonstrates its clear superiority over the state-of-the-art baseline methods. To provide a more detailed analysis, we divide the datasets into two groups based on their sample sizes: the first group contains the small- to medium-scale datasets MSRC, ORL, and Caltech101-20, while the second group contains the large-scale datasets Caltech101-all, VGGface50, and MNIST. The results for these two groups are discussed separately below.

For small- to medium-scale datasets, the EAG-DCT achieves the best results in all fifteen evaluation cases (five metrics on each of the three datasets), with substantial improvements over the second-ranked methods. For example, on the ORL dataset, EAG-DCT improves on the second-best results, obtained by MSC-IAS, by about 18.9%, 14.5%, 19.1%, 30.8%, and 27.8% across the five metrics, respectively. Several baseline methods exhibit competitive performance on specific datasets; for example, ERMC-AGR achieves the second-best results across all metrics on the MSRC dataset, while MSC-IAS ranks second on the ORL dataset. However, no baseline method performs consistently well across all small- to medium-scale datasets. In contrast, EAG-DCT sustains superior clustering performance across all of them, verifying its effectiveness and stability.

For large-scale datasets, EAG-DCT again shows superior performance, achieving the highest results in all fifteen evaluation cases.

These large-scale datasets are particularly challenging for clustering for the following reasons. First, on the Caltech101-all and VGGface50 datasets, all baseline methods fail to obtain high clustering performance; in particular, a majority of metric values remain below 0.3 on Caltech101-all and below 0.1 on VGGface50. This poor performance can be attributed to the inherent complexity of these datasets. Caltech101-all contains 102 categories with significant intra-class variation in object appearance, scale, and pose, making it difficult to discover coherent clusters. Similarly, VGGface50 comprises face images of 50 celebrities captured under diverse conditions of illumination, expression, and occlusion, resulting in high intra-class diversity and inter-class similarity. Such characteristics obscure the underlying cluster structure and pose substantial challenges for all clustering methods. Second, several methods fail on the MNIST dataset, which reflects the difficulty posed by large sample sizes. Third, MNIST also presents a different challenge: most baseline methods already achieve very high clustering performance on it, leaving little room for further improvement. Despite these challenges, the proposed EAG-DCT achieves promising performance on all large-scale datasets. For example, on the Caltech101-all dataset, EAG-DCT improves the results by about 10.4%, 10.4%, 12.8%, 8.8%, and 17.2% across the five metrics, respectively. On the MNIST dataset, EAG-DCT improves clustering performance by about 0.2–0.5% across the metrics, which is a non-trivial gain given the high baseline. As with the small- to medium-scale datasets, no single baseline method consistently obtains the most competitive results across all large-scale datasets.

8.4. t-SNE Visualization

To intuitively demonstrate the representation learning capability of EAG-DCT, we employ t-SNE [57] to visualize the sample distributions in both the original feature space and the fused low-dimensional space learned by our method. Since some datasets contain a large number of clusters, such as Caltech101-all with 102 classes and VGGface50 with 50 classes, displaying all clusters simultaneously would make the plots overcrowded. Therefore, for each dataset, we randomly select seven classes (the smallest number of clusters among all datasets, corresponding to MSRC) and visualize only the samples belonging to these classes. The random selection is performed after clustering and is applied consistently to both the original and fused representations.

The t-SNE visualizations for all six benchmark datasets are presented in Figure 1, Figure 2 and Figure 3. For each dataset, the left subfigure shows the distribution in the original feature space, while the right subfigure shows the distribution in the fused space produced by EAG-DCT. It can be observed that samples from different classes heavily overlap in the original space, reflecting the inherent difficulty of these clustering tasks. In contrast, the fused low-dimensional representations exhibit well-separated cluster structures across all datasets, including the challenging large-scale ones such as Caltech101-all and VGGface50. These visualizations confirm that EAG-DCT effectively captures the underlying multi-view data structure and yields discriminative embeddings for both small- to medium-scale and large-scale clustering tasks.

8.5. Convergence Study

Next, we empirically validate the convergence behavior of the EAG-DCT. Without loss of generality, we fix $λ = 0.001$ and $γ = 0.001$ in this experiment and present the convergence curves on the Caltech101-20 and VGGface50 datasets. The convergence analysis is conducted from two complementary perspectives.

First, we investigate how the value of the objective function changes with the iteration number. Second, we analyze how the difference between consecutive updates of the variables changes with the iteration number, where, for clarity of illustration, we define this difference as the maximum change across all variables. We show the results in Figure 4 and Figure 5. It can be observed that EAG-DCT converges within about 30 iterations in terms of both the objective function value and the variable updates. These observations empirically demonstrate the fast convergence of EAG-DCT and support its practical applicability in real-world scenarios.
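The text does not specify which norm underlies the "maximum change across all variables". The sketch below assumes it is the largest absolute entry-wise difference between consecutive iterates, taken over all variables; `max_change` and the variable names in the docstring are hypothetical.

```python
import numpy as np

def max_change(prev, curr):
    """Largest absolute entry-wise change over all variables between two iterates.

    prev, curr: dicts mapping variable names (e.g. "S", "Y") to NumPy arrays
    of matching shapes. This reading of "maximum change" is an assumption.
    """
    return max(float(np.max(np.abs(curr[name] - prev[name]))) for name in curr)
```

An iteration loop could record this value after each sweep of Algorithm 1 and stop once it falls below a chosen tolerance such as `1e-6` (a hypothetical choice, not taken from the text).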

8.6. Time Cost and Scalability

In this test, we study how the time cost of the EAG-DCT changes with the sample size. Without loss of generality, we fix $λ = 0.001$ and $γ = 0.001$ and use the MNIST and Caltech101-all datasets for illustration. All experiments in this study are conducted in MATLAB R2020a on an Intel Core i9-11900 CPU (2.50 GHz) with eight cores and 64 GB of memory.

In particular, we use subsets of the MNIST and Caltech101-all datasets sampled with a ratio of $r \%$, where $r$ takes values in $\{10, 20, \dots, 100\}$. Throughout this study, we fix the number of anchors to $m = c$ for each dataset and set the number of iterations to 50, which is sufficient for convergence according to Section 8.5. For each subset, we conduct ten independent trials and report the average time cost in Figure 6. As observed from the results, the time cost of EAG-DCT grows linearly with the data size, which empirically validates its scalability and suggests its practical applicability to large-scale clustering scenarios.

8.7. Parameter Sensitivity

In this study, we empirically show how the parameter values affect the learning performance. Without loss of generality, we adopt the MNIST dataset for illustration and report ACC and NMI, with the experimental settings following Section 8.2. We evaluate all combinations of $λ$ and $γ$ in the search grid and report the clustering performance in ACC and NMI in Figure 7.

From the results presented in Figure 7, several key observations can be drawn. First, the proposed EAG-DCT achieves its best performance when $γ = 10$ and $λ = 0.01$ (or $λ = 10$), yielding an ACC of 99.1% and an NMI of 97.3%. Second, when $λ$ and $γ$ lie within the range $[0.01, 10]$, ACC consistently exceeds 97% and NMI remains above 95%, indicating that the method is stable and insensitive to parameter variations within this region. Third, extreme parameter values (e.g., $λ = 0.001$, $γ = 0.001$) lead to degraded performance, with ACC dropping to 94.81% and NMI to 93.63%, which supports the choice of the search range.

8.8. Ablation Study

To verify the effectiveness of individual modules in the proposed method, we perform ablation experiments on the ORL and MNIST datasets. The following three variants are designed to isolate the contribution of each key component. The first variant removes the consistency graph $S$ together with the associated consistency term, resulting in the following formulation:

\begin{matrix} min_{W, A, Z} \sum_{v = 1}^{V} {∥ X^{(v)} - W^{(v)} A {(Z^{(v)})}^{T} ∥}_{F}^{2} + γ {∥ H ∥}_{ld} \\ s . t . {(W^{(v)})}^{T} W^{(v)} = I_{m}, A^{T} A = I_{m}, Z \geq 0, \\ Z^{(v)} 1_{m} = 1_{n}, H = stack (Z^{(1)}, \dots, Z^{(V)}) . \end{matrix}

(41)

Compared to the complete EAG-DCT, the first variant keeps the low-rank tensor constraint for implicit view alignment but lacks the explicit consistency mechanism.

The second variant is constructed by removing the low-rank tensor learning module, resulting in the following formulation:

\begin{matrix} min_{W, A, Z, S, α} \sum_{v = 1}^{V} {∥ X^{(v)} - W^{(v)} A {(Z^{(v)})}^{T} ∥}_{F}^{2} + λ \sum_{v = 1}^{V} α_{v}^{2} {∥ Z^{(v)} - S ∥}_{F}^{2} \\ s . t . α^{T} 1_{V} = 1, {(W^{(v)})}^{T} W^{(v)} = I_{m}, A^{T} A = I_{m}, Z \geq 0, Z^{(v)} 1_{m} = 1_{n} . \end{matrix}

(42)

Although this variant integrates both consistency and diversity across views, it lacks the global low-rank tensor regularization.

The third variant is constructed by eliminating the adaptive weighting mechanism, expressed as

\begin{matrix} min_{W, A, Z, S} \sum_{v = 1}^{V} {∥ X^{(v)} - W^{(v)} A {(Z^{(v)})}^{T} ∥}_{F}^{2} + λ \sum_{v = 1}^{V} {∥ Z^{(v)} - S ∥}_{F}^{2} + γ {∥ H ∥}_{ld} \\ s . t . {(W^{(v)})}^{T} W^{(v)} = I_{m}, A^{T} A = I_{m}, Z \geq 0, \\ Z^{(v)} 1_{m} = 1_{n}, H = stack (Z^{(1)}, \dots, Z^{(V)}, S) . \end{matrix}

(43)

While this variant inherits the main architecture of the full model, it disregards the varying contributions of different views when constructing the consensus representation.

The results are presented in Figure 8 and Figure 9. It can be observed that the complete EAG-DCT model consistently outperforms all three variants on both datasets. On the ORL dataset, EAG-DCT achieves substantial improvements of approximately 14.7–17.3%, 9.4–11.6%, 14.1–15.7%, 22.8–28.58%, and 23.3–32.2% across the five evaluation metrics, respectively. On the MNIST dataset, where performance is already close to the ceiling, EAG-DCT still improves over the variants by 0.6–0.9%, 1.3–1.9%, 0.56–0.9%, 1.1–1.7%, and 1.13–1.84%, respectively. These consistent gains, particularly the notable improvements on the more challenging ORL dataset and the further gains over near-ceiling performance on MNIST, validate the effectiveness and necessity of the key components in the proposed EAG-DCT framework.

8.9. Discussion

Our experimental results demonstrate that EAG-DCT achieves state-of-the-art performance across all datasets with significant margins, for example up to 30.8% on ORL, and with consistent superiority on large-scale benchmarks. Compared to existing methods, our approach advances the field by jointly modeling consistency and diversity within an anchor graph tensor. EAG-DCT captures high-order cross-view interactions while maintaining linear time complexity, a balance that prior methods fail to achieve. However, this work does have two main limitations: first, three parameters require tuning, though performance is stable within a wide range; second, on extremely challenging datasets such as VGGface50 (where most baselines yield very low metrics, with a majority below 0.1), the absolute performance gain remains limited, though still the highest. Future work includes automatic parameter selection, integration with deep feature learning, and extensions to incomplete multi-view data.

9. Conclusions

In this paper, we propose an innovative unified MVC framework named EAG-DCT. This new method integrates consensus anchor learning, diversity–consistency modeling, and nonconvex low-rank tensor recovery into a unified optimization paradigm. In particular, the shared consensus anchors eliminate cross-view misalignment, while integrating view-specific diversity graphs and a global consistency graph into a high-order tensor captures rich high-order correlations for mutual refinement. Adopting the LTRA ensures accurate structural recovery and the anchor-based mechanism guarantees linear time complexity with respect to sample size, enabling large-scale applicability. Comprehensive experiments on six benchmark datasets demonstrate the effectiveness and efficiency of the proposed EAG-DCT.

Author Contributions

Conceptualization, R.F., K.K. and C.P.; methodology, R.F., K.K. and C.P.; software, R.F. and K.K.; validation, Q.Z., Y.H., C.L. and C.P.; formal analysis, R.F., K.K. and Y.H.; investigation, Q.Z., C.L. and Y.H.; resources, R.F. and K.K.; data curation, R.F., K.K., C.L. and Q.Z.; writing—original draft preparation, R.F. and K.K.; writing—review and editing, C.P.; visualization, R.F. and K.K.; supervision, Y.H. and C.P.; project administration, C.P.; funding acquisition, Y.H. and C.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the National Natural Science Foundation of China (NSFC) under Grant 62276147; in part by the Shandong Province Colleges and Universities Youth Innovation Technology Plan Innovation Team Project under Grant 2022KJ149; in part by the Fund Program for the Scientific Activities of Selected Returned Overseas Professionals in Shanxi Province under Grant 20240035; in part by the Research start-up funds of the OUC Youth Talents Project under Grant 862501013147; and in part by Sub-project of the University of Chinese Academy of Sciences under Grant 202422A014.

Data Availability Statement

The data used in this study are publicly accessible, while the associated code is available from the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Ng, T.C.; Choy, S.K.; Lam, S.Y.; Yu, K.W. Fuzzy superpixel-based image segmentation. Pattern Recognit. 2023, 134, 109045.
Chen, Y.; Wang, Z.; Bai, X. Fuzzy sparse subspace clustering for infrared image segmentation. IEEE Trans. Image Process. 2023, 32, 2132–2146.
Park, N.; Rossi, R.; Koh, E.; Burhanuddin, I.A.; Kim, S.; Du, F.; Ahmed, N.; Faloutsos, C. CGC: Contrastive Graph Clustering for Community Detection and Tracking. In Proceedings of the Web Conference (WWW); Association for Computing Machinery: New York, NY, USA, 2022; pp. 1115–1126.
Guan, R.; Zhang, H.; Liang, Y.; Giunchiglia, F.; Huang, L.; Feng, X. Deep Feature-Based Text Clustering and Its Explanation. In Proceedings of the 2023 IEEE 39th International Conference on Data Engineering (ICDE); IEEE: New York, NY, USA, 2023; pp. 3871–3872.
Ikotun, A.M.; Ezugwu, A.E.; Abualigah, L.; Abuhaija, B.; Heming, J. K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Inf. Sci. 2023, 622, 178–210.
Peng, C.; Zhang, Y.; Chen, Y.; Kang, Z.; Chen, C.; Cheng, Q. Log-based sparse nonnegative matrix factorization for data representation. Knowl.-Based Syst. 2022, 251, 109127.
Peng, C.; Zhang, P.; Chen, Y.; Kang, Z.; Chen, C.; Cheng, Q. Fine-grained bipartite concept factorization for clustering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2024; pp. 26264–26274.
Liu, G.; Lin, Z.; Yu, Y. Robust subspace segmentation by low-rank representation. In Proceedings of the 27th International Conference on Machine Learning (ICML-10); Omnipress: Madison, WI, USA, 2010; pp. 663–670.
Elhamifar, E.; Vidal, R. Sparse subspace clustering. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2009; Volume 6, pp. 2790–2797.
Peng, C.; Kang, Z.; Li, H.; Cheng, Q. Subspace clustering using log-determinant rank approximation. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2015; pp. 925–934.
Peng, C.; Zhang, Q.; Kang, Z.; Chen, C.; Cheng, Q. Kernel two-dimensional ridge regression for subspace clustering. Pattern Recognit. 2021, 113, 107749.
Zhang, C.; Fu, H.; Hu, Q.; Cao, X.; Xie, Y.; Tao, D.; Xu, D. Generalized Latent Multi-View Subspace Clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 86–99.
Wang, F.; Chen, C.; Peng, C. Essential Low-Rank Sample Learning for Group-Aware Subspace Clustering. IEEE Signal Process. Lett. 2023, 30, 1537–1541.
Fang, U.; Li, M.; Li, J.; Gao, L.; Jia, T.; Zhang, Y. A comprehensive survey on multi-view clustering. IEEE Trans. Knowl. Data Eng. 2023, 35, 12350–12368.
Sun, Y.; Qin, Y.; Li, Y.; Peng, D.; Peng, X.; Hu, P. Robust Multi-View Clustering With Noisy Correspondence. IEEE Trans. Knowl. Data Eng. 2024, 36, 9150–9162.
Peng, C.; Kang, K.; Chen, Y.; Kang, Z.; Chen, C.; Cheng, Q. Fine-Grained Essential Tensor Learning for Robust Multi-View Spectral Clustering. IEEE Trans. Image Process. 2024, 33, 3145–3160.
Lu, Z.; Nie, F.; Wang, R.; Li, X. A Differentiable Perspective for Multi-View Spectral Clustering With Flexible Extension. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 7087–7098.
Chen, Y.; Wang, S.; Peng, C.; Hua, Z.; Zhou, Y. Generalized nonconvex low-rank tensor approximation for multi-view subspace clustering. IEEE Trans. Image Process. 2021, 30, 4022–4035.
Huang, S.; Zhang, Y.; Fu, L.; Wang, S. Learnable Multi-View Matrix Factorization With Graph Embedding and Flexible Loss. IEEE Trans. Multimed. 2023, 25, 3259–3272.
Zhang, C.; Fu, H.; Wang, J.; Li, W.; Cao, X.; Hu, Q. Tensorized multi-view subspace representation learning. Int. J. Comput. Vis. 2020, 128, 2344–2361.
Peng, X.; Huang, Z.; Lv, J.; Zhu, H.; Zhou, J.T. COMIC: Multi-view Clustering Without Parameter Selection. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 5092–5101.
Cai, D.; Chen, X. Large Scale Spectral Clustering Via Landmark-Based Sparse Representation. IEEE Trans. Cybern. 2015, 45, 1669–1680.
Yang, B.; Zhang, X.; Nie, F.; Wang, F.; Yu, W.; Wang, R. Fast multi-view clustering via nonnegative and orthogonal factorization. IEEE Trans. Image Process. 2020, 30, 2575–2586.
Wang, S.; Li, C.; Li, Y.; Yuan, Y.; Wang, G. Self-Supervised Information Bottleneck for Deep Multi-View Subspace Clustering. IEEE Trans. Image Process. 2023, 32, 1555–1567.
Wang, B.; Zeng, C.; Chen, M.; Li, X. Towards Learnable Anchor for Deep Multi-View Clustering. arXiv 2025, arXiv:2503.12427.
Duan, Y.Q.; Yuan, H.L.; Lai, L.L.; He, B. Multi-View Subspace Clustering with Local and Global Information. In Proceedings of the 2021 International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR); IEEE: New York, NY, USA, 2021; pp. 1–6.
Kang, Z.; Zhou, W.; Zhao, Z.; Shao, J.; Han, M.; Xu, Z. Large-scale multi-view subspace clustering in linear time. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Washington, DC, USA, 2020; Volume 34, pp. 4412–4419.
Li, X.; Zhang, H.; Wang, R.; Nie, F. Multiview Clustering: A Scalable and Parameter-Free Bipartite Graph Fusion Method. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 330–344.
Li, J.; Wang, Q.; Yang, M.; Gao, Q.; Gao, X. Efficient Anchor Graph Factorization for Multi-View Clustering. IEEE Trans. Multimed. 2024, 26, 5834–5845.
Zhao, W.; Li, Q.; Xu, H.; Gao, Q.; Wang, Q.; Gao, X. Anchor Graph-Based Feature Selection for One-Step Multi-View Clustering. IEEE Trans. Multimed. 2024, 26, 7413–7425.
Li, M.; Liang, W.; Liu, X. Multi-View Clustering With Learned Bipartite Graph. IEEE Access 2021, 9, 87952–87961.
Wang, S.; Liu, X.; Zhu, X.; Zhang, P.; Zhang, Y.; Gao, F.; Zhu, E. Fast parameter-free multi-view subspace clustering with consensus anchor guidance. IEEE Trans. Image Process. 2021, 31, 556–568.
Zhang, X.; Ren, Z.; Yang, C. Center consistency guided multi-view embedding anchor learning for large-scale graph clustering. Knowl.-Based Syst. 2023, 260, 110162.
Liu, S.; Wang, S.; Zhang, P.; Xu, K.; Liu, X.; Zhang, C.; Gao, F. Efficient one-pass multi-view subspace clustering with consensus anchors. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Washington, DC, USA, 2022; Volume 36, pp. 7576–7584.
Sun, M.; Zhang, P.; Wang, S.; Zhou, S.; Tu, W.; Liu, X.; Zhu, E.; Wang, C. Scalable Multi-view Subspace Clustering with Unified Anchors. In Proceedings of the 29th ACM International Conference on Multimedia; MM ’21; Association for Computing Machinery: New York, NY, USA, 2021; pp. 3528–3536.
You, X.; Li, H.; You, J.; Ren, Z. Consider high-order consistency for multi-view clustering. Neural Comput. Appl. 2023, 36, 717–729.
Wu, T.; Lu, G.F. Tensorized diversity and consistency with Laplacian manifold for multi-view clustering. Inf. Sci. 2025, 690, 121575.
Sun, X.; Wang, Y.; Zhang, X. Multi-view subspace clustering via non-convex tensor rank minimization. In Proceedings of the 2020 IEEE International Conference on Multimedia and Expo (ICME); IEEE Computer Society: New York, NY, USA, 2020; pp. 1–6.
Pan, B.; Li, C.; Che, H. Error-robust multi-view subspace clustering with nonconvex low-rank tensor approximation and hyper-Laplacian graph embedding. Eng. Appl. Artif. Intell. 2024, 133, 108274.
Gao, Q.; Zhang, P.; Xia, W.; Xie, D.; Gao, X.; Tao, D. Enhanced Tensor RPCA and Its Application. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 2133–2140.
Tomioka, R.; Suzuki, T. Convex tensor decomposition via structured Schatten norm regularization. In Proceedings of the 27th International Conference on Neural Information Processing Systems–Volume 1; NIPS’13; Curran Associates Inc.: Red Hook, NY, USA, 2013; pp. 1331–1339.
Nie, F.; Zhang, R.; Li, X. A generalized power iteration method for solving quadratic problem on the Stiefel manifold. Sci. China Inf. Sci. 2017, 60, 112101.
Duchi, J.; Shalev-Shwartz, S.; Singer, Y.; Chandra, T. Efficient projections onto the ℓ₁-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning; Association for Computing Machinery: New York, NY, USA, 2008; pp. 272–279.
Michelot, C. A finite algorithm for finding the projection of a point onto the canonical simplex of Rⁿ. J. Optim. Theory Appl. 1986, 50, 195–200. [Google Scholar] [CrossRef]
Condat, L. Fast projection onto the simplex and the ℓ₁ ball. Math. Program. 2016, 158, 575–585. [Google Scholar] [CrossRef]
Elhamifar, E.; Sapiro, G.; Sastry, S.S. Dissimilarity-based sparse subset selection. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 2182–2197. [Google Scholar] [CrossRef]
Pan, B.; Li, C.; Che, H. Nonconvex low-rank tensor approximation with graph and consistent regularizations for multi-view subspace learning. Neural Netw. 2023, 161, 638–658. [Google Scholar] [CrossRef]
Wu, J.; Lin, Z.; Zha, H. Essential tensor learning for multi-view spectral clustering. IEEE Trans. Image Process. 2019, 28, 5910–5922. [Google Scholar] [CrossRef]
Peng, C.; Liu, Y.; Kang, K.; Chen, Y.; Wu, X.; Cheng, A.; Kang, Z.; Chen, C.; Cheng, Q. Hyperspectral image denoising using nonconvex local low-rank and sparse separation with spatial–spectral total variation regularization. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5538617. [Google Scholar] [CrossRef]
Chen, X.; Cai, D. Large scale spectral clustering with landmark-based representation. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Washington, DC, USA, 2011; Volume 25, pp. 313–318. [Google Scholar]
Affeldt, S.; Labiod, L.; Nadif, M. Spectral clustering via ensemble deep autoencoder learning (SC-EDAE). Pattern Recognit. 2020, 108, 107522. [Google Scholar] [CrossRef]
Nie, F.; Li, J.; Li, X. Parameter-free auto-weighted multiple graph learning: A framework for multiview clustering and semi-supervised classification. In Proceedings of the IJCAI; AAAI Press: Washington, DC, USA, 2016; Volume 9, pp. 1881–1887. [Google Scholar]
Wang, X.; Lei, Z.; Guo, X.; Zhang, C.; Shi, H.; Li, S.Z. Multi-view subspace clustering with intactness-aware similarity. Pattern Recognit. 2019, 88, 50–63. [Google Scholar] [CrossRef]
Yang, B.; Zhang, X.; Lin, Z.; Nie, F.; Chen, B.; Wang, F. Efficient and robust multiview clustering with anchor graph regularization. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 6200–6213. [Google Scholar] [CrossRef]
Kang, Z.; Lin, Z.; Zhu, X.; Xu, W. Structured graph learning for scalable subspace clustering: From single view to multiview. IEEE Trans. Cybern. 2021, 52, 8976–8986. [Google Scholar] [CrossRef] [PubMed]
Wan, X.; Liu, J.; Gan, X.; Liu, X.; Wang, S.; Wen, Y.; Wan, T.; Zhu, E. One-Step Multi-View Clustering with Diverse Representation. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 5774–5786. [Google Scholar] [CrossRef] [PubMed]
Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]

Figure 1. The t-SNE visualization of the MSRC (a–c) and ORL (d–f) datasets.

Figure 2. The t-SNE visualization of the Caltech101-20 (a–c) and Caltech101-all (d–f) datasets.

Figure 3. The t-SNE visualization of the VGGface50 (a–c) and MNIST (d–f) datasets.

Figure 4. Objective function value versus the number of iterations on the Caltech101-20 and VGGface50 datasets.

Figure 5. Difference between two consecutive variable updates versus the number of iterations on the Caltech101-20 and VGGface50 datasets.

Figure 6. Time cost of EAG-DCT with respect to sample size on the MNIST and Caltech101-all datasets.

Figure 7. Clustering performance of EAG-DCT under different parameter settings on the MNIST dataset.

Figure 8. Comparison of clustering results between EAG-DCT and its baseline variants on the ORL dataset.

Figure 9. Comparison of clustering results between EAG-DCT and its baseline variants on the MNIST dataset.

Table 1. Summary of benchmark datasets.

Data Set	# of Samples	# of Clusters	# of Views	# of Features per View
MSRC	210	7	6	1302/48/512/100/256/210
ORL	400	40	3	4096/3304/6750
Caltech101-20	2386	20	6	48/40/254/1984/512/928
Caltech101-all	9144	102	5	48/40/254/512/928
VGGface50	16,936	50	4	944/576/512/640
MNIST	60,000	10	3	342/1024/64

All datasets are widely used in the multi-view clustering literature.

Table 2. Clustering results on MSRC and ORL datasets in different metrics (%, reported in mean ± standard deviation).

Data Sets	MSRC					ORL
Methods	ACC	NMI	PUR	F-Score	Precision	ACC	NMI	PUR	F-Score	Precision
AMGL	31.9 ± 1.3	25.8 ± 1.6	38.5 ± 1.3	13.1 ± 1.2	22.5 ± 1.3	58.5 ± 3.0	77.5 ± 1.9	67.0 ± 2.4	42.7 ± 4.4	32.3 ± 4.4
MSC_IAS	42.8 ± 0.0	34.7 ± 0.0	44.7 ± 0.0	31.2 ± 0.0	29.6 ± 0.0	68.7 ± 0.0	81.7 ± 0.0	72.0 ± 0.0	56.6 ± 0.0	53.5 ± 0.0
LMVSC	34.1 ± 0.2	24.5 ± 0.0	40.0 ± 0.0	24.6 ± 0.0	24.5 ± 0.0	54.9 ± 1.4	72.0 ± 1.0	61.2 ± 1.2	32.4 ± 2.2	22.9 ± 2.1
FMCNOF	52.2 ± 2.2	38.9 ± 4.0	53.3 ± 3.3	39.2 ± 3.2	34.2 ± 3.1	59.4 ± 2.7	76.7 ± 0.7	64.7 ± 2.2	46.5 ± 1.1	39.9 ± 0.9
ERMC-AGR	69.3 ± 3.8	58.5 ± 3.8	71.8 ± 3.8	56.4 ± 4.9	53.8 ± 5.1	63.5 ± 3.9	78.2 ± 2.1	67.3 ± 3.5	49.4 ± 4.2	43.6 ± 4.5
MSGL	67.1 ± 0.0	54.6 ± 0.0	70.3 ± 0.0	54.4 ± 0.0	53.2 ± 0.0	19.7 ± 0.0	41.1 ± 0.0	24.0 ± 0.0	05.1 ± 0.0	06.3 ± 0.0
OMVCDR	66.1 ± 0.0	57.9 ± 0.0	67.1 ± 0.0	53.5 ± 0.0	52.4 ± 0.0	62.7 ± 0.0	78.4 ± 0.0	65.2 ± 0.0	52.0 ± 0.0	51.5 ± 0.0
DMAC	67.1 ± 0.0	54.8 ± 0.0	67.1 ± 0.0	66.6 ± 0.0	20.7 ± 0.0	53.5 ± 0.0	73.7 ± 0.0	58.7 ± 0.0	52.0 ± 0.0	02.8 ± 0.0
EAG-DCT	85.0 ± 0.3	73.3 ± 0.4	85.0 ± 0.3	72.1 ± 0.4	70.6 ± 0.5	87.6 ± 1.6	96.2 ± 0.4	91.1 ± 1.2	87.4 ± 2.0	81.3 ± 3.2

The best results are shown in bold; the second-best results are underlined.

Table 3. Clustering results on Caltech101-20 and Caltech101-all datasets in different metrics (%, reported in mean ± standard deviation).

Data Sets	Caltech101-20					Caltech101-all
Methods	ACC	NMI	PUR	F-Score	Precision	ACC	NMI	PUR	F-Score	Precision
AMGL	18.7 ± 1.0	19.7 ± 0.8	20.8 ± 0.4	08.6 ± 1.2	11.7 ± 0.7	08.7 ± 0.1	21.3 ± 0.1	10.4 ± 0.1	03.0 ± 0.1	02.1 ± 0.3
MSC_IAS	29.6 ± 0.0	32.8 ± 0.0	54.4 ± 0.0	21.6 ± 0.0	38.5 ± 0.0	12.0 ± 0.0	30.3 ± 0.0	27.5 ± 0.0	07.2 ± 0.0	12.8 ± 0.0
LMVSC	28.1 ± 1.3	28.6 ± 0.5	52.7 ± 0.5	19.9 ± 1.2	31.5 ± 1.4	12.0 ± 0.5	25.4 ± 0.1	23.5 ± 0.3	06.8 ± 0.6	07.4 ± 0.8
FMCNOF	36.9 ± 3.5	11.3 ± 4.5	40.1 ± 1.7	30.3 ± 5.0	21.4 ± 4.0	14.0 ± 1.1	12.3 ± 0.9	16.3 ± 0.9	08.3 ± 1.0	05.0 ± 0.7
ERMC-AGR	38.5 ± 2.1	23.7 ± 2.3	47.2 ± 2.8	43.9 ± 0.0	33.1 ± 0.3	12.4 ± 1.0	18.2 ± 0.3	18.3 ± 1.0	04.6 ± 0.0	02.4 ± 0.0
MSGL	42.2 ± 0.0	38.4 ± 0.0	53.4 ± 0.0	35.1 ± 0.0	31.7 ± 0.0	16.8 ± 0.0	31.6 ± 0.0	20.7 ± 0.0	13.9 ± 0.0	11.9 ± 0.0
OMVCDR	42.2 ± 0.0	42.3 ± 0.0	59.1 ± 0.0	41.5 ± 0.0	53.7 ± 0.0	16.7 ± 0.0	36.3 ± 0.0	30.6 ± 0.0	13.5 ± 0.0	16.6 ± 0.0
DMAC	42.2 ± 0.0	53.8 ± 0.0	72.7 ± 0.0	35.1 ± 0.0	06.1 ± 0.0	16.7 ± 0.0	35.7 ± 0.0	33.3 ± 0.0	13.2 ± 0.0	01.8 ± 0.0
EAG-DCT	50.9 ± 1.5	54.9 ± 1.1	75.8 ± 0.9	45.4 ± 2.5	66.6 ± 2.7	27.2 ± 0.4	46.7 ± 0.4	46.1 ± 0.4	22.7 ± 1.0	33.8 ± 1.5

The best results are shown in bold; the second-best results are underlined.

Table 4. Clustering results on VGGface50 and MNIST datasets in different metrics (%, reported in mean ± standard deviation).

Data Sets	VGGface50					MNIST
Methods	ACC	NMI	PUR	F-Score	Precision	ACC	NMI	PUR	F-Score	Precision
AMGL	02.7 ± 0.0	00.8 ± 0.0	02.9 ± 0.0	03.9 ± 0.0	01.9 ± 0.0	———	———	———	———	———
MSC_IAS	07.1 ± 0.0	08.0 ± 0.0	08.0 ± 0.0	03.6 ± 0.0	03.8 ± 0.0	———	———	———	———	———
LMVSC	10.7 ± 0.1	12.1 ± 0.1	11.4 ± 0.2	04.7 ± 0.0	03.5 ± 0.2	98.9 ± 0.0	96.7 ± 0.0	98.9 ± 0.0	97.9 ± 0.0	97.8 ± 0.0
FMCNOF	06.2 ± 0.1	05.5 ± 0.1	06.7 ± 0.1	03.2 ± 0.7	00.7 ± 0.0	88.5 ± 0.2	80.4 ± 0.2	88.5 ± 0.2	80.1 ± 0.2	79.5 ± 0.2
ERMC-AGR	05.0 ± 0.2	03.3 ± 0.1	05.2 ± 0.2	03.8 ± 0.1	02.3 ± 0.0	94.7 ± 5.7	94.0 ± 4.0	95.1 ± 5.2	93.7 ± 6.0	91.3 ± 9.1
MSGL	08.3 ± 0.0	09.5 ± 0.0	13.3 ± 0.3	05.3 ± 0.0	04.1 ± 0.0	98.7 ± 0.0	96.1 ± 0.0	98.7 ± 0.0	97.3 ± 0.0	97.4 ± 0.0
OMVCDR	08.3 ± 0.0	11.3 ± 0.3	09.8 ± 0.1	04.3 ± 0.0	03.4 ± 0.0	98.5 ± 0.0	95.7 ± 0.0	98.5 ± 0.0	96.9 ± 0.0	96.9 ± 0.0
DMAC	08.5 ± 0.0	10.8 ± 0.0	09.8 ± 0.0	08.4 ± 0.0	02.3 ± 0.0	97.2 ± 0.0	93.2 ± 0.0	97.2 ± 0.0	97.2 ± 0.0	94.5 ± 0.0
EAG-DCT	13.1 ± 0.5	15.9 ± 0.4	13.9 ± 0.5	06.4 ± 0.1	06.4 ± 0.1	99.1 ± 0.0	97.3 ± 0.0	99.1 ± 4.3	98.3 ± 0.0	98.3 ± 0.0

The best results are shown in bold; the second-best results are underlined.
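Tables 2–4 report ACC, NMI, PUR, F-Score, and Precision in percent. The exact evaluation code behind these tables is not given in this section, so the following is only a minimal sketch of the commonly used definitions of two of these metrics: clustering accuracy (ACC), the fraction of samples labeled correctly under the best one-to-one mapping from predicted clusters to ground-truth classes, and purity (PUR), the fraction of samples that fall in the majority class of their cluster. The function names are hypothetical, and the brute-force search over permutations is assumed here purely for readability; it is only practical for a handful of clusters, and for datasets such as Caltech101-all (102 clusters) the same matching would normally be solved with the Hungarian algorithm (for example, SciPy's `scipy.optimize.linear_sum_assignment`).

```python
from collections import Counter
from itertools import permutations


def contingency(y_true, y_pred):
    """Count co-occurrences of (predicted cluster, true class) pairs."""
    return Counter(zip(y_pred, y_true))


def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one cluster-to-class mapping, as a fraction in [0, 1].

    Illustrative brute force over all mappings (hypothetical helper); only
    feasible for a small number of clusters.
    """
    counts = contingency(y_true, y_pred)
    clusters = sorted(set(y_pred))
    classes = sorted(set(y_true))
    # If there are more clusters than classes, pad with None so that the
    # surplus clusters map to "no class" and contribute zero matches.
    pad = [None] * max(0, len(clusters) - len(classes))
    best = 0
    for mapping in permutations(classes + pad, len(clusters)):
        hits = sum(counts[(c, k)] for c, k in zip(clusters, mapping))
        best = max(best, hits)
    return best / len(y_true)


def purity(y_true, y_pred):
    """PUR: each cluster is credited with its majority true class."""
    counts = contingency(y_true, y_pred)
    majority = Counter()
    for (cluster, _cls), n in counts.items():
        majority[cluster] = max(majority[cluster], n)
    return sum(majority.values()) / len(y_true)


if __name__ == "__main__":
    # Toy labels: class 0 is split across predicted clusters 0 and 1.
    y_true = [0, 0, 0, 0, 1, 1]
    y_pred = [0, 0, 1, 1, 2, 2]
    print("ACC:", round(100 * clustering_accuracy(y_true, y_pred), 1))
    print("PUR:", round(100 * purity(y_true, y_pred), 1))
```

In the toy example, splitting one class across two clusters leaves purity at 100% while ACC drops to about 66.7%, because the one-to-one mapping can credit only one of the two clusters. More generally, under these definitions a cluster's matched count can never exceed its majority count, so PUR is never lower than ACC, which matches the ordering of the ACC and PUR columns in Tables 2–4.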

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Fan, R.; Kang, K.; Zhang, Q.; Liu, C.; Hu, Y.; Peng, C. Efficient Anchor-Guided Multi-View Clustering via Diversity–Consistency Learning and Low-Rank Tensor Recovery. Electronics 2026, 15, 1136. https://doi.org/10.3390/electronics15051136

AMA Style

Fan R, Kang K, Zhang Q, Liu C, Hu Y, Peng C. Efficient Anchor-Guided Multi-View Clustering via Diversity–Consistency Learning and Low-Rank Tensor Recovery. Electronics. 2026; 15(5):1136. https://doi.org/10.3390/electronics15051136

Chicago/Turabian Style

Fan, Rong, Kehan Kang, Qian Zhang, Chundan Liu, Yunhong Hu, and Chong Peng. 2026. "Efficient Anchor-Guided Multi-View Clustering via Diversity–Consistency Learning and Low-Rank Tensor Recovery" Electronics 15, no. 5: 1136. https://doi.org/10.3390/electronics15051136

APA Style

Fan, R., Kang, K., Zhang, Q., Liu, C., Hu, Y., & Peng, C. (2026). Efficient Anchor-Guided Multi-View Clustering via Diversity–Consistency Learning and Low-Rank Tensor Recovery. Electronics, 15(5), 1136. https://doi.org/10.3390/electronics15051136

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.

Article Menu

Efficient Anchor-Guided Multi-View Clustering via Diversity–Consistency Learning and Low-Rank Tensor Recovery

Abstract

1. Introduction

2. Notation and Preliminary

3. Related Work

4. The Proposed Method

5. Optimization

5.1. Updating $W$

5.2. Updating $A$

5.3. Updating $Z$

5.4. Updating $S$

5.5. Updating $α_{v}$

5.6. Optimization of $Y$

5.7. Updating $K$ , and $ρ$

6. The Clustering Step

7. Complexity Analysis

8. Experiments

8.1. Baseline Methods

8.2. Experimental Settings

8.3. Clustering Performance

8.4. T-SNE Visualization

8.5. Convergence Study

8.6. Time Cost and Scalability

8.7. Parameter Sensitivity

8.8. Ablation Study

8.9. Discussion

9. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite
