1. Introduction
Driven by advances in artificial intelligence and data-driven optimization, heterogeneous multi-source data have become increasingly common in real-world applications [
1,
2]. In such scenarios, the same set of objects can often be described by several different types of features. For example, an image can be represented by its color, texture, and shape at the same time. A web document can be described by both its text content and its link structure. A clinical patient may have genomic data, pathological images, and electronic health records. Each type of feature is called a
view, and each view captures only part of the information about the data. No single view is likely to reveal the complete cluster structure on its own. Multi-view clustering aims to combine these different views to find meaningful groups of objects without any label information. This task has wide applications in areas such as social network analysis, medical informatics, remote sensing, and visual object categorization [
3,
4].
Among the various approaches to this task, graph-based methods have attracted considerable attention. The basic idea is to build a graph for each view, where nodes represent data samples and edge weights reflect pairwise similarities. The cluster structure is then extracted from the spectral properties of the graph. In classical spectral clustering [
5], the eigenvectors of the graph Laplacian provide a continuous approximation of the discrete cluster indicators, and K-means is applied as a post-processing step to produce hard assignments. Multi-view extensions build one graph per view and try to learn a unified representation that combines information across views. Nie et al. [
6] proposed a graph learning framework that assigns view weights automatically through self-optimization. Wang et al. [
7] introduced a mutual reinforcement mechanism between individual view graphs and the consensus graph. Liang et al. [
8] modeled both consistency and inconsistency across views, and Huang et al. [
9] separated multi-view graphs into shared and diverse components. These methods have significantly improved clustering quality, but they all rely on full $N \times N$ similarity matrices. This requires at least $O(N^2)$ storage and $O(N^3)$ eigendecomposition cost, which becomes impractical when N exceeds a few thousand.
Dense pairwise graphs carry a heavy computational burden. To resolve this, researchers select a small set of r representative anchors ($r \ll N$). By learning an affinity matrix between the N samples and these r anchors, a sparse bipartite structure is formed, and graph construction and spectral analysis both scale linearly with N. Li et al. [
10] initially adapted this concept for multi-view data. They constructed a separate graph per view and concatenated them prior to spectral analysis. A related pipeline by Kang et al. [
11] accelerates subspace clustering. Addressing a different issue, Nie et al. [
12] designed a structured objective based on graph connectivity. This yields cluster labels directly, avoiding the K-means step altogether. Meanwhile, other studies [
13,
14,
15] proposed fusion strategies without additional hyperparameters. These methods aggregate the individual bipartite graphs under a strict Laplacian rank constraint, which guarantees the consensus graph forms exactly
c connected components.
A recent line of work solves graph construction and clustering in a single step. Fang et al. [
16] reconstructed subspaces to learn bipartite graphs for each view alongside a consensus graph. They also added label learning directly into this loop. Liu et al. [
17] relied on decomposition. By splitting each anchor graph into a shared core and a separate residual, they penalized the residuals to reduce discrepancies across views. Yan et al. [
18] pushed this idea further. They isolated consistency from diversity, applying sparsity rules so that only the consistent components shaped the final graph. However, these methods all build graphs in the original feature space without modifying the data representation beforehand. Li et al. [
19] took a different route by fusing similarities during the spectral embedding phase. By utilizing entropy weighting and spectral rotation, their model produces discrete labels without extra clustering steps. The orthogonal rotation mechanism that replaces K-means was originally designed by Luo et al. [
20] for multigraph clustering. Several recent bipartite graph models have since adopted this idea [
16,
19].
Tensors provide a parallel approach to capture high-order correlations across views. Xia et al. [
21] stacked view-specific bipartite graphs into a third-order tensor and imposed the tensor Schatten
p-norm to encourage low-rank consistency. Gu et al. [
22] avoided explicit tensor decomposition by learning a compact essential representation. More recently, Long et al. [
23] developed a scalable multi-view tensor clustering method, and Liu et al. [
24] further studied large-scale multi-view tensor clustering with implicit linear kernels. Zhao et al. [
25] captured indirect relationships between samples and introduced a truncation mechanism to filter out low-quality graphs before fusion. Jiang et al. [
26] extended the bipartite graph framework to the unaligned setting, where sample correspondences across views are unknown. Ji et al. [
27] further combined tensorized modeling with multi-scale representation learning for unaligned multi-view clustering.
Despite this progress, two limitations remain in existing methods. The first limitation concerns the feature space in which bipartite graphs are constructed. Virtually all methods cited above learn anchor-sample affinities in the original feature space. When a view contains thousands of features, many of which are redundant or uninformative, pairwise distances suffer from the concentration phenomenon in high-dimensional spaces and become nearly indistinguishable. The resulting affinity values lose discriminative power, and the distorted affinities propagate through consensus fusion into the final partition. Although dimensionality reduction techniques such as PCA have long been available, integrating them into the bipartite graph learning objective while preserving the variance structure and maintaining closed-form updates remains unexplored.
The second limitation lies in the transition from continuous spectral embeddings to discrete cluster assignments. Three competing strategies coexist. The dominant approach applies K-means to the eigenvectors of the consensus Laplacian, which introduces nondeterminism through random seed selection and can yield different partitions across runs on the same graph. The constrained Laplacian rank approach [
28] enforces exactly
c connected components, yielding deterministic labels from the connectivity pattern, but it imposes a rigid structural requirement that may not suit data whose clusters overlap or vary in density. Spectral rotation [
29] aligns the eigenvector matrix to a discrete indicator through orthogonal rotation, removing K-means entirely, yet it does not by itself ensure that the underlying graph possesses a clear connected-component structure. Each strategy sacrifices one desirable property to gain another, and combining their strengths within a unified objective remains an open question.
We propose projection-enhanced bipartite graph learning (PEBGL) to address these limitations. The method first projects each view onto a compact PCA subspace and then jointly performs bipartite graph construction, entropy-regularized consensus fusion, spectral embedding, and discrete rotation within an alternating optimization framework. In the projected space, anchor reconstruction under simplex constraints forms the bipartite graph. By applying an entropy penalty, the method learns adaptive weights for the view-specific graphs and fuses them into one consensus structure. A spectral embedding is computed from the symmetric normalized Laplacian of the consensus graph, and an orthogonal rotation aligns this embedding with discrete cluster indicators as a structural regularizer. Final labels are obtained from the connected components of the converged consensus graph, reducing the dependence on K-means post-processing. Experiments on six benchmark datasets demonstrate the competitive performance and computational efficiency of PEBGL.
The main contributions of this work are summarized as follows:
- (i) A projection-enhanced bipartite graph learning framework is proposed, which combines fixed PCA-based projection preprocessing with scalable sample-anchor graph learning.
- (ii) An entropy-regularized consensus fusion strategy is developed to adaptively integrate view-specific bipartite graphs into a unified consensus graph.
- (iii) A K-means-free final decoding scheme is introduced by detecting connected components on the learned consensus graph, reducing the dependence on K-means post-processing.
The overall framework of PEBGL is depicted in
Figure 1.
Table 1 summarizes the key symbols used throughout this paper. The set of $N \times c$ binary matrices in which each row contains exactly one unit entry is the set of cluster indicator matrices. Let $\{X^{(v)}\}_{v=1}^{V}$ denote a multi-view dataset with V views, where $X^{(v)}$ contains N samples described by $d_v$-dimensional features in the v-th view. The goal is to partition the N samples into c clusters by jointly exploiting all views.
The remainder of this paper is organized as follows.
Section 2 establishes the mathematical preliminaries.
Section 3 develops the proposed method with complete derivations.
Section 4 reports experimental comparisons and analyses on six benchmark datasets.
Section 5 concludes the paper.
3. Proposed Method
This section presents the PEBGL framework, develops each model component, and derives the alternating optimization procedure.
3.1. Projected Bipartite Graph Learning
Existing anchor-based methods construct bipartite graphs directly in the original feature space. When the feature dimensionality is high, pairwise distances between samples and anchors tend to concentrate, degrading the quality of the resulting affinity graphs. To mitigate this effect, a fixed PCA-based projection preprocessing step is introduced for each view, mapping the original features into a compact subspace before graph construction.
For each view
v, a matrix
with orthonormal columns, satisfying
, projects both the data matrix
and the anchor matrix
into a common low-dimensional space,
The projected dimension
is determined individually for each view by retaining the leading principal components that account for at least 95% of the cumulative variance. A lower bound of
c is imposed to preserve sufficient discriminative capacity. For high-dimensional small-sample datasets with
, the projected dimension is capped at
c to avoid an underdetermined linear system.
The projection matrix is computed once by PCA before the main alternating optimization and then kept fixed. Thus, the projection is treated as a preprocessing step rather than as a jointly optimized variable in the main loop. The fixed PCA projection maps each view onto a compact subspace that preserves the dominant variance structure, thereby reducing redundant high-dimensional information before sample-anchor affinity construction while keeping the subsequent optimization efficient and numerically stable.
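The dimension rule described above (retain at least 95% of the cumulative variance, with a lower bound of c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `choose_projected_dim` and its arguments are hypothetical, the input is assumed to be the PCA eigenvalues (per-component variances) of one view, and the small-sample cap mentioned in the text is left out because it depends on a dataset-specific condition.

```python
def choose_projected_dim(eigvals, c, ratio=0.95):
    """Pick the projected dimension for one view (illustrative helper).

    eigvals: PCA eigenvalues (variances) of the view, in any order.
    c:       number of clusters, used as a lower bound on the dimension.
    ratio:   fraction of cumulative variance to retain (0.95 in the text).
    """
    # Sort variances from largest to smallest; clip tiny negatives to zero.
    ordered = sorted((max(float(e), 0.0) for e in eigvals), reverse=True)
    total = sum(ordered)
    if total <= 0.0:
        return min(c, len(ordered))

    # Smallest k whose leading components reach the requested variance share.
    running = 0.0
    k = len(ordered)
    for i, value in enumerate(ordered, start=1):
        running += value
        if running >= ratio * total:
            k = i
            break

    # Enforce the lower bound c without exceeding the available components.
    return min(max(k, c), len(ordered))
```

For example, with eigenvalues `[6, 3, 0.6, 0.4]` the first two components explain 90% of the variance and the first three explain 96%, so the rule selects three components unless c is larger.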
A prerequisite for bipartite graph methods is the construction of semantically aligned anchors across views. Independent per-view anchor generation risks semantic misalignment, where the k-th anchor in one view corresponds to a different data region than the k-th anchor in another view. To ensure consistent anchor semantics, all views are concatenated along the feature dimension into a single data matrix whose dimensionality is the sum of the per-view dimensions, and K-means is performed on the concatenated data to obtain r global centroids. These centroids are then split along the feature dimension to recover the view-specific anchor matrices, guaranteeing that corresponding anchors share the same global origin across all views.
Given the projected data
and projected anchors
, a view-specific bipartite graph
is constructed to encode the affinity between each sample and the
r anchors. Following the anchor-based subspace learning paradigm [
16,
17], each projected sample is modeled as a non-negative linear combination of the projected anchors, leading to the reconstruction objective
where
is a regularization parameter that prevents trivial solutions and promotes well-conditioned graphs. The non-negativity constraint
ensures that the affinity values are interpretable as membership weights, and the column-sum constraint
normalizes each sample’s affinities to lie on the probability simplex. The entry
represents the reconstruction weight of the
k-th sample with respect to the
j-th anchor.
The reconstruction is performed in the projected space
rather than in the original space
. This distinction is essential when
is large, because the Frobenius norm in (
10) measures distances through
-dimensional inner products, yielding more reliable affinity estimates by concentrating on the variance-preserving directions.
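For reference, a plausible form of the per-view reconstruction problem described in this subsection is written out below. The notation is assumed for illustration rather than taken from the paper: $\tilde{X}^{(v)}$ and $\tilde{A}^{(v)}$ denote the projected data and projected anchors (one column per sample or anchor), $Z^{(v)} \in \mathbb{R}^{r \times N}$ is the view-specific bipartite graph, and $\lambda > 0$ is the regularization parameter.

```latex
\min_{Z^{(v)}} \;
  \bigl\| \tilde{X}^{(v)} - \tilde{A}^{(v)} Z^{(v)} \bigr\|_F^2
  + \lambda \bigl\| Z^{(v)} \bigr\|_F^2
\quad \text{s.t.} \quad
  Z^{(v)} \ge 0, \qquad \mathbf{1}_r^{\top} Z^{(v)} = \mathbf{1}_N^{\top}.
```

Under this reading, each column of $Z^{(v)}$ lies on the probability simplex, and its $j$-th entry is the reconstruction weight of that sample with respect to the $j$-th anchor.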
3.2. Consensus Graph Fusion
Each view provides a different perspective of the underlying cluster structure, and the view-specific bipartite graphs
may exhibit both complementary patterns and view-specific artifacts. To aggregate cross-view information, a consensus bipartite graph
P is introduced through the weighted discrepancy minimization
where
denotes the adaptive weight for the
v-th view. This formulation encourages
P to serve as a centroid of the view-specific graphs in the Frobenius norm sense, with views of higher quality exerting greater influence through larger weights. The fusion objective in (
11) treats all views with permutation symmetry, meaning that relabeling the view indices does not alter the optimization landscape. Unlike methods that fuse graphs at the spectral embedding level [
19] or through direct concatenation [
11], operating in the bipartite graph space preserves the
complexity throughout the fusion process.
Different views contribute unequally to the clustering task due to varying feature quality and relevance to the underlying structure. To adaptively balance view contributions, each view is assigned a non-negative weight, with the weights constrained to sum to one, and an entropy regularization term is introduced as
where the temperature parameter controls the smoothness of the weight distribution. A large temperature drives the weights toward the uniform distribution $1/V$, while a small temperature concentrates weight on the views with the lowest reconstruction cost. The entropy term prevents degenerate solutions where all weight collapses onto a single view, and it admits a closed-form softmax solution as derived in
Section 3.4. The temperature parameter is set adaptively as
, where
denotes the mean per-view cost, thereby scaling the entropy regularization to the magnitude of the objective without manual tuning.
To extract the cluster structure encoded in the consensus bipartite graph
P, the augmented bipartite graph is formed as
The symmetric normalized Laplacian of
S is
, where
denotes the diagonal degree matrix. By Lemma 1, the number of connected components in
S equals the multiplicity of the zero eigenvalue of
. Let
denote the sample-node normalized Laplacian induced by the consensus bipartite graph
P. It is used to compute the sample-level spectral embedding, while the augmented graph
S is defined on both samples and anchors. A spectral embedding
is extracted from the
c eigenvectors of
associated with the smallest eigenvalues, subject to the orthonormality constraint
. The spectral embedding term
encourages
P to possess a clear block-diagonal structure conducive to clustering.
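The augmented graph and its normalized Laplacian take the standard bipartite form, sketched below in assumed notation. Here $P$ is taken to be $N \times r$ (samples by anchors); if the paper stores $P$ as $r \times N$, the two off-diagonal blocks swap. The symbols $L_S$ and $D_S$ are illustrative names.

```latex
S = \begin{bmatrix} 0 & P \\ P^{\top} & 0 \end{bmatrix}
    \in \mathbb{R}^{(N+r) \times (N+r)},
\qquad
D_S = \operatorname{diag}(S \mathbf{1}),
\qquad
L_S = I - D_S^{-1/2} \, S \, D_S^{-1/2}.
```

With this construction, a sample and an anchor belong to the same connected component of $S$ exactly when a chain of nonzero entries of $P$ links them.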
To bridge the gap between the continuous embedding
and discrete cluster assignments, the spectral rotation framework [
29] is adopted. An orthogonal matrix
R is sought such that the rotated embedding
aligns closely with a discrete indicator matrix
. The alignment is measured by
The orthogonal rotation R absorbs the inherent rotational ambiguity of the spectral embedding, and the indicator matrix Y provides a discrete cluster assignment through the row-wise argmax of the rotated embedding. This step avoids K-means post-processing during label refinement. In the experiments, the final labels are obtained by connected-component detection on the converged consensus graph P, which reduces the dependence on K-means in the final assignment stage and makes the decoding reproducible under fixed anchor initialization and parameter settings.
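The row-wise argmax rule can be illustrated with a minimal Python sketch. The names `assign_labels`, `to_indicator`, and the symbol `F` for the continuous embedding are assumptions made here for illustration; matrices are plain nested lists so that the example has no dependencies.

```python
def assign_labels(F, R):
    """Label each sample by the largest entry in its row of F @ R.

    F: N x c continuous embedding (list of rows), assumed name.
    R: c x c orthogonal rotation (list of rows).
    """
    n_clusters = len(R[0])
    labels = []
    for row in F:
        # Row of the rotated embedding, computed as a plain matrix product.
        rotated = [sum(row[k] * R[k][j] for k in range(len(row)))
                   for j in range(n_clusters)]
        # Index of the largest rotated coordinate (first index on ties).
        labels.append(max(range(n_clusters), key=lambda j: rotated[j]))
    return labels


def to_indicator(labels, c):
    """Build the N x c indicator matrix with exactly one unit entry per row."""
    return [[1 if j == lab else 0 for j in range(c)] for lab in labels]
```

Because each row is handled independently, the update costs one pass over the rotated embedding.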
3.3. Objective Function
Given the fixed projected data and anchors, the complete optimization problem is
where
,
,
are trade-off parameters. The six terms in (
16) serve complementary roles. The reconstruction term and the Frobenius regularization together measure anchor-based approximation quality in the projected subspace while preventing trivial solutions for
. The consensus fusion term couples all views through the shared graph
P, and the spectral embedding term extracts the block-diagonal structure of
P via its symmetric normalized Laplacian
. The discrete rotation term aligns the continuous embedding to cluster indicators, and the entropy term adaptively distributes view weights.
3.4. Optimization
Problem (
16) is solved by alternating minimization. Each variable is updated in turn while all others are held fixed, and every subproblem admits either a closed-form solution or an efficient iterative procedure. The update rules are presented below as steps (A) through (E).
(A) Update the view-specific graphs. Expanding each term using the identity $\|M\|_F^2 = \operatorname{tr}(M^{\top} M)$ yields
Collecting all terms that depend on
and discarding constants, the objective reduces to
Taking the matrix derivative with respect to
and applying the identities
where
A is symmetric, the stationarity condition becomes
Dividing by 2 and rearranging,
The coefficient matrix
is symmetric positive definite, since
is positive semidefinite and
. The unconstrained minimizer is therefore unique and given by
The coefficient matrix is of size $r \times r$ with $r \ll N$, so its Cholesky factorization costs $O(r^3)$ and the subsequent back-substitution costs $O(r^2 N)$
. The dominant cost is the product
, which requires
operations. The regularization term
in the right-hand side of (
26) draws each view-specific graph toward the shared consensus graph, with the strength of this effect governed by
.
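The unconstrained solution generally violates the non-negativity and column-sum constraints. Each of its columns $\hat{z}$ is therefore projected onto the probability simplex, yielding $z = \max(\hat{z} - \tau \mathbf{1}, 0)$, where the threshold $\tau$ is determined in $O(r \log r)$ time [31].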
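A minimal sketch of the column-wise simplex projection is given below, using the common sort-based procedure. It is an illustration rather than the authors' implementation: the function name `project_to_simplex` is hypothetical, and whether reference [31] uses exactly this variant is an assumption.

```python
def project_to_simplex(z):
    """Euclidean projection of z onto {x : x >= 0, sum(x) = 1}.

    Sort-based variant: find the threshold tau such that
    max(z - tau, 0) sums to one, then shift and clip.
    """
    u = sorted(z, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        # The condition holds for a prefix of the sorted entries;
        # the last index where it holds defines the threshold.
        if u_j - candidate > 0:
            tau = candidate
    return [max(v - tau, 0.0) for v in z]
```

Applied to every column of the unconstrained solution, this restores both constraints at once; the sort dominates the cost per column.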
(B) Update P. A two-stage approach is adopted. The weighted-average step gives the exact minimizer of the quadratic consensus fitting term, while the subsequent coclustering refinement enforces the desired bipartite graph structure. Therefore, the overall
P-update is interpreted as an approximate block update for the full
P-subproblem rather than a simultaneous closed-form solution to all terms involving
P. The fusion term is minimized by expanding
and differentiating with respect to
P,
which yields the weighted average
followed by projection onto the non-negative orthant. The spectral term is then incorporated through coclustering refinement [
12,
13]. The augmented graph
S is formed as in (
13), and its symmetric normalized Laplacian is
. The refinement is applied to the bipartite block rather than to the whole augmented adjacency matrix. Specifically, it solves
where
denotes the augmented bipartite adjacency matrix constructed from the current bipartite block
P as in Equation (
13),
is its symmetric normalized Laplacian, and
is the spectral embedding associated with the augmented graph. Following the adaptive strategy of [
13,
16],
is doubled when the number of near-zero eigenvalues of
is less than
c and halved when it exceeds
c, until the augmented graph possesses exactly
c connected components. This mechanism eliminates a hyperparameter while imposing the desired cluster count.
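The doubling and halving rule can be sketched as a small helper. The function name `adapt_tradeoff`, the numerical threshold `tol` for deciding that an eigenvalue is "near zero", and the returned flag are assumptions introduced for illustration; the eigenvalues are assumed to come from the current augmented Laplacian.

```python
def adapt_tradeoff(value, eigenvalues, c, tol=1e-10):
    """Adapt the refinement trade-off from the current Laplacian spectrum.

    value:       current trade-off value.
    eigenvalues: eigenvalues of the augmented normalized Laplacian.
    c:           target number of connected components (clusters).
    tol:         hypothetical threshold below which an eigenvalue counts as zero.
    Returns (new_value, done), where done is True once exactly c
    near-zero eigenvalues are present.
    """
    near_zero = sum(1 for e in eigenvalues if abs(e) < tol)
    if near_zero < c:
        # Too few components: strengthen the structural term.
        return value * 2.0, False
    if near_zero > c:
        # Too many components: relax the structural term.
        return value / 2.0, False
    return value, True
```

The loop around this helper would recompute the spectrum after each refinement step and stop once the flag is true.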
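(C) Update the spectral embedding. With all other variables fixed, the embedding subproblem combines the spectral term and the rotation term under the orthonormality constraint, which is equivalently written as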
Following Luo et al. [
20], this is solved via iterative SVD. At the
k-th inner step, form
compute its thin SVD
, and update
By the von Neumann trace inequality, the update
maximizes
over all matrices with orthonormal columns. This provides an efficient orthogonality-preserving update for the inner SVD step. The embedding
is initialized from the spectral decomposition of
P and evolves smoothly across outer iterations without re-initialization; two to three inner iterations suffice in practice.
(D) Update R and Y. For the rotation R, let
be the thin SVD. By the von Neumann trace inequality
and the upper bound is attained at
For the indicator matrix
Y, the subproblem with
R and
fixed is
Let
. Since
Y is an indicator matrix, the trace decomposes row-wise as
, where
indexes the unit entry in the
i-th row of
Y. Each row is independent, and the global maximum is achieved by
(E) Update the view weights. Collecting all terms involving the view weights from (16), the subproblem is
where the per-view cost aggregating the reconstruction, regularization, and consensus terms of (
16) is
Introducing a Lagrange multiplier
for the constraint
, the Lagrangian is
Setting
gives
from which
Substituting into
to eliminate
yields
Substituting back into (
44) gives the softmax solution
The objective (
40) is strictly convex in
for any
because its Hessian is
, which is positive definite. Therefore (
46) is the unique global minimizer.
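The closed-form weight update can be sketched in Python as a numerically stable softmax over negative per-view costs. The function name `entropy_weights` and the argument names are hypothetical, and the temperature is passed in directly rather than computed by the adaptive rule, whose exact formula is not reproduced here.

```python
import math


def entropy_weights(costs, temperature):
    """Softmax view weights for the entropy-regularized subproblem.

    Minimizes sum_v a_v * h_v + temperature * sum_v a_v * ln(a_v)
    over the probability simplex, where h_v = costs[v].
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the smallest cost before exponentiating; this leaves the
    # normalized weights unchanged and avoids overflow or underflow.
    lowest = min(costs)
    unnormalized = [math.exp(-(h - lowest) / temperature) for h in costs]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

Consistent with the discussion of the temperature, a large value gives nearly uniform weights, while a small value concentrates the weight on the view with the lowest cost.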
3.5. Algorithm Summary
The complete procedure is summarized in Algorithm 1.
Algorithm 1 PEBGL: projection-enhanced bipartite graph learning.
Require: Multi-view data, number of clusters c, anchor count r, trade-off parameters.
1: Generate joint anchors via concatenation and K-means with 30 restarts.
2: Compute PCA projections once and keep them fixed; form the projected data and the projected anchors.
3: Initialize the view-specific graphs via distance-based simplex projection.
4: Initialize the consensus graph P.
5: Initialize the spectral embedding from P.
6: Initialize Y via K-means on the initial spectral embedding; initialize R and the view weights.
7: while not converged do
8:   Update the view-specific graph for each view via Equation (26) and simplex projection.
9:   Update P via Equation (29) and coclustering refinement (30).
10:   Update the spectral embedding via Equations (33) and (34) with 2–3 inner SVD iterations.
11:   Update R via Equation (37).
12:   Update Y via Equation (39).
13:   Update the view weights via Equation (46) with the adaptive temperature.
14: end while
Ensure: Cluster labels from connected-component detection on P.
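The final decoding step, connected-component detection on the consensus bipartite graph, can be sketched with a union-find structure. This is a minimal illustration under assumptions: `P` is taken as a list of N rows with r sample-to-anchor weights each, the function name `bipartite_component_labels` and the edge threshold `tol` are hypothetical, and a sample with no edge above the threshold ends up in its own component.

```python
def bipartite_component_labels(P, tol=0.0):
    """Label samples by connected component of the sample-anchor graph.

    P:   N x r list of rows; P[i][j] is the weight between sample i
         and anchor j (orientation assumed for this sketch).
    tol: entries above this value are treated as edges.
    """
    n, r = len(P), len(P[0])
    parent = list(range(n + r))  # nodes 0..n-1 are samples, n..n+r-1 anchors

    def find(x):
        # Find the root with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge each sample with every anchor it is connected to.
    for i, row in enumerate(P):
        for j, weight in enumerate(row):
            if weight > tol:
                root_s, root_a = find(i), find(n + j)
                if root_s != root_a:
                    parent[root_a] = root_s

    # Relabel component roots as consecutive integers in order of appearance.
    labels, seen = [], {}
    for i in range(n):
        root = find(i)
        labels.append(seen.setdefault(root, len(seen)))
    return labels
```

When the converged graph has exactly c components, the returned labels take c distinct values and no K-means call is involved.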
The stopping criterion is defined by the relative change in the objective function. Specifically, the iteration terminates when the relative change between the objective values at iterations $t-1$ and $t$ falls below a preset tolerance, or when the number of iterations reaches the maximum of 50.
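The stopping rule can be written as a short check on the recorded objective values. In this sketch the function name `should_stop` and the tolerance argument `tol` are hypothetical (no tolerance value is assumed), while the cap of 50 iterations follows the text.

```python
def should_stop(history, tol, max_iter=50):
    """Decide whether the alternating optimization should terminate.

    history:  objective values recorded so far, one per completed iteration.
    tol:      relative-change tolerance (value chosen by the user).
    max_iter: iteration cap; 50 follows the text.
    """
    if len(history) >= max_iter:
        return True
    if len(history) < 2:
        return False
    previous, current = history[-2], history[-1]
    # Relative change, guarded against division by zero.
    return abs(current - previous) / max(abs(previous), 1e-12) < tol
```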
Remark 1. The indicator matrix Y is initialized via K-means applied to the initial spectral embedding. Within the main loop, Y is updated exclusively by the row-wise argmax rule (39), and no further K-means invocation occurs for label refinement. Although Y provides a discrete indicator during optimization, the experiments in Section 4 report results from connected-component detection on the converged consensus graph P. Therefore, PEBGL reduces the dependence on K-means post-processing in the final label assignment stage, and the final decoding step is reproducible with fixed anchor initialization, random seed, and parameter settings.
3.6. Complexity
The per-iteration cost of Algorithm 1 is analyzed as follows. Updating for a single view requires operations, dominated by the matrix product . The Cholesky factorization of the matrix contributes , and the column-wise simplex projection contributes . Summing over V views gives .
The consensus graph update via the weighted average and coclustering refinement costs , where is the number of inner refinement steps. The spectral embedding, rotation, discrete label update, and weight update collectively cost , where the additional terms are usually small because in typical clustering tasks. Therefore, when the numbers of views, anchors, projected dimensions, clusters, and inner refinement steps are fixed, the dominant iterative graph-learning cost scales linearly with the number of samples N.
The PCA projection is performed once before the main alternating optimization. Its one-time preprocessing cost is
when using a standard PCA implementation. This cost is outside the main iterative loop and may become non-negligible for high-dimensional small-sample datasets, which is consistent with the runtime behavior observed in
Section 4.7.
4. Experiments
All experiments were executed in MATLAB R2024a on a machine equipped with an AMD Ryzen 9 7945HX processor and 32 GB of RAM.
4.1. Datasets and Evaluation Metrics
Six publicly available multi-view datasets spanning image recognition, handwritten digit classification, and plant species identification were adopted.
Table 2 summarizes their statistics.
MSRCV1 [
34] contains 210 images from 7 semantic categories, each described by 5 feature types, namely, color moments, HOG descriptors, GIST features, LBP textures, and CENTRIST features.
Yale [
35] consists of 165 face images of 15 individuals captured with varying illumination and expressions, represented by 3 high-dimensional views.
ORL [
36] includes 400 face images from 40 subjects with variations in lighting, expression, and facial details, also described by 3 views with the same feature types as Yale.
Handwritten [
37] contains 2000 images of handwritten digits 0 through 9, characterized by 6 feature types, namely, Fourier coefficients, profile correlations, Karhunen–Loève coefficients, pixel averages, Zernike moments, and morphological features.
100Leaves [
38] comprises 1600 plant leaf samples from 100 species described by 3 views of shape, fine-scale margin, and texture descriptors, all with 64-dimensional features.
Caltech101-7 [
39] contains 1474 images from 7 object categories, described by 6 views, namely, Gabor features, wavelet moments, CENTRIST features, HOG descriptors, GIST features, and LBP textures.
Three widely used external evaluation metrics were adopted. Accuracy [
40] measures the proportion of correctly assigned samples after optimal label permutation via the Hungarian algorithm. Normalized Mutual Information [
41] quantifies the mutual dependence between predicted and true labels, normalized to the range $[0, 1]$. Purity [
42] computes the fraction of samples in each cluster that belong to the dominant class. Higher values indicate better clustering quality for all three metrics.
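As a concrete illustration of one of these metrics, Purity can be computed with the standard library alone. The function name `purity` is chosen here for illustration; ACC is omitted because it additionally requires an optimal assignment solver for the label permutation.

```python
from collections import Counter


def purity(true_labels, predicted_labels):
    """Fraction of samples that belong to the dominant class of their cluster."""
    clusters = {}
    for truth, cluster in zip(true_labels, predicted_labels):
        # Count class occurrences inside each predicted cluster.
        clusters.setdefault(cluster, Counter())[truth] += 1
    dominant = sum(max(counts.values()) for counts in clusters.values())
    return dominant / len(true_labels)
```

For instance, with true labels `[0, 0, 1, 1, 2, 2]` and predicted clusters `[0, 0, 0, 1, 1, 1]`, each cluster has a dominant class covering two of its three samples, so the purity is 4/6.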
4.2. Experimental Setup
PEBGL was compared against seven recent multi-view clustering methods based on anchor graphs or bipartite graph learning. GFSC [
43] performs multi-graph fusion for multi-view spectral clustering. LMVSC [
11] constructs anchor graphs per view independently and fuses them through concatenation. CDMGC [
9] decomposes view graphs into consistent and diverse components for fusion. MSGL [
15] develops structured graph learning from single-view to multi-view settings. SFMC [
13] learns a parameter-free consensus bipartite graph with a Laplacian rank constraint. FPMVS-CAG [
14] integrates anchor selection and graph construction without explicit hyperparameter tuning. DiBGF-MGC [
18] separates bipartite graphs into consistency and diversity components through intra-view and inter-view constraints.
Although SFMC, FPMVS-CAG, DiBGF-MGC, and PEBGL all belong to scalable graph-based or bipartite graph-based multi-view clustering methods, their technical emphases are different. SFMC learns a parameter-free consensus bipartite graph under a Laplacian rank constraint, but sample-anchor affinities are still constructed in the original feature space. FPMVS-CAG integrates anchor selection and graph construction through consensus anchor guidance, while the learned affinities also rely on the original feature representation. DiBGF-MGC improves graph fusion by decomposing bipartite graphs into consistency and diversity components. In contrast, PEBGL applies fixed PCA-based projection preprocessing before sample-anchor affinity learning, constructs projection-enhanced bipartite graphs, performs entropy-regularized adaptive consensus fusion, and obtains final labels by connected-component detection without K-means post-processing. Therefore, the main difference of PEBGL lies in projection-enhanced graph construction, adaptive consensus fusion, and K-means-free final decoding within a scalable bipartite graph framework.
For the compared methods, publicly available implementations were used whenever available, and the same unsupervised evaluation protocol was followed on the benchmark datasets. When complete implementations or identical preprocessing settings were not fully available, the results reported in the corresponding papers were used as references. Available implementations were also tested in the same local MATLAB environment for the execution-time comparison in
Section 4.7. For PEBGL, anchors were initialized by K-means with 30 restarts with a fixed random seed, while the final labels were obtained by connected-component detection rather than K-means post-processing.
For PEBGL, the hyperparameters were tuned via a two-stage grid search. In the first stage, and were swept over logarithmic grids and respectively at each candidate r to identify the promising region. In the second stage, a finer search was performed around the identified optimum. The anchor count r was selected from a dataset-dependent candidate set ranging from c to . The rotation trade-off was fixed at . The entropy temperature was set adaptively as , where denotes the mean reconstruction cost across views. For high-dimensional small-sample datasets where , the PCA projection dimension was capped at c to avoid having fewer samples than features after projection; this applies to ORL in the present experiments.
4.3. Clustering Results
Table 3,
Table 4 and
Table 5 present the clustering results measured by ACC, NMI, and Purity on six benchmark datasets. The best result in each row is highlighted in
bold, and the second best is
underlined.
Table 3,
Table 4 and
Table 5 show that PEBGL attains the highest ACC and NMI on 100Leaves and Caltech101-7, two datasets that differ substantially in structure. The 100Leaves dataset includes 100 fine-grained plant species with low-dimensional views, while Caltech101-7 combines six feature types with dimensionality spanning from 40 to 1984. On 100Leaves, PEBGL obtains slightly higher results than DiBGF-MGC by 1.3, 0.3, and 0.7 percentage points in ACC, NMI, and Purity, respectively. On Caltech101-7, PEBGL improves ACC and NMI by 4.8 and 2.7 percentage points, respectively.
On MSRCV1, PEBGL ranks second across all three metrics, trailing DiBGF-MGC by only 1.4 points in ACC while outperforming the remaining baselines by clear margins. On Handwritten, all top methods achieve above 95% ACC; PEBGL reaches 96.9%, close to DiBGF-MGC and CDMGC. The small gap reflects the well-separated digit classes and balanced six-view configuration. On Yale and ORL, PEBGL achieves competitive but not leading results compared with DiBGF-MGC. Both datasets are high-dimensional small-sample face recognition benchmarks, where the largest view dimension reaches 6750, while the sample sizes are only 165 and 400, respectively. With this setting, the fixed PCA projection preserves dominant variance directions but may weaken some low-variance facial cues related to subtle illumination, expression, and local texture variations. In contrast, DiBGF-MGC constructs bipartite graphs in the original feature space and separates consistency and diversity components, which may better preserve view-specific facial structures.
In summary, across the 18 metric–dataset combinations, PEBGL achieves the best result five times and the second-best result five times. These results indicate that the proposed framework achieves competitive clustering accuracy on datasets with diverse scales and feature configurations.
4.4. Ablation Study
To evaluate the contribution of each module, four ablation variants were constructed. PEBGL-1 removes the PCA projection and constructs bipartite graphs directly in the original feature space. PEBGL-2 sets the consensus trade-off parameter to zero so that each view-specific bipartite graph is learned independently without consensus feedback. PEBGL-3 replaces the connected-component decoding with K-means applied to the spectral embedding of the converged P. PEBGL-4 fixes the view weights at the uniform value $1/V$ throughout the optimization, disabling adaptive view weighting.
Table 6 reports the ACC of all variants on six datasets. Bold indicates the best result.
PEBGL-1, which builds the bipartite graphs directly in the original feature space, suffers the largest drop on high-dimensional datasets: ACC falls from 71.5% to 16.4% on Yale and from 84.9% to 42.9% on Caltech101-7. This indicates that the projection step effectively reduces redundant high-dimensional information before sample-anchor graph construction. On 100Leaves, all three views are 64-dimensional, so the projection has no measurable effect. Note, however, that the PCA projection is fixed and unsupervised, so it may not preserve all cluster-discriminative components on high-dimensional small-sample face datasets. PEBGL-2 disables consensus fusion and yields the largest average degradation across the six datasets. PEBGL-3 replaces connected-component decoding with K-means, which reduces ACC on every dataset. PEBGL-4 fixes uniform view weights and suffers the most on Caltech101-7, where the six views differ greatly in quality. Overall, consensus fusion contributes the most, followed by the decoding strategy and adaptive weighting, while the projection is decisive only for high-dimensional views.
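To make the difference between the full model and PEBGL-3 concrete, the sketch below shows one way cluster labels can be read directly from the connected components of a sample-anchor bipartite graph, with no K-means step. It is an illustration under stated assumptions, not the authors' implementation: the function name, the threshold `tol`, and the union-find strategy are all hypothetical choices.

```python
def decode_by_components(P, tol=0.0):
    """Label samples by the connected components of a bipartite graph.

    P is an n x r matrix (nested lists or array) of nonnegative
    sample-anchor weights. An edge exists where P[i][j] > tol.
    Returns one integer label per sample. The threshold `tol` is an
    assumption of this sketch.
    """
    n, r = len(P), len(P[0])
    parent = list(range(n + r))  # nodes 0..n-1 are samples, n..n+r-1 anchors

    def find(i):
        # Find the component root with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(r):
            if P[i][j] > tol:
                ri, rj = find(i), find(n + j)
                if ri != rj:
                    parent[rj] = ri  # merge sample and anchor components

    # Relabel component roots as consecutive integers in order of appearance.
    root_to_label, labels = {}, []
    for i in range(n):
        labels.append(root_to_label.setdefault(find(i), len(root_to_label)))
    return labels


# Toy graph: samples 0-1 attach to anchors 0-1, samples 2-3 to anchors 2-3.
P_toy = [
    [0.6, 0.4, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
labels_toy = decode_by_components(P_toy)
```

This decoding is deterministic when the graph already has the desired number of components, whereas K-means on a spectral embedding depends on its initialization, which is one plausible reason for the gap observed for PEBGL-3.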
4.5. Parameter Analysis
The sensitivity of PEBGL to its two primary hyperparameters
and
is investigated by varying each over
and
respectively while fixing
r at its optimal value.
Figure 2 visualizes ACC as 3D bar charts on six datasets.
The effect of the anchor count
r on clustering accuracy is examined separately. With
and
fixed at their optimal values,
r is varied over
.
Figure 3 reports the results.
As shown in
Figure 2, PEBGL maintains stable ACC over a wide range of
and
on most datasets. On MSRCV1 and Handwritten, ACC remains above 60% and 80%, respectively, across several orders of magnitude, with degradation confined to the extreme corners of the grid. Caltech101-7 shows a narrower optimal region concentrated at small
and
, and performance drops rapidly outside this region. Yale and ORL exhibit moderate sensitivity, with higher ACC values concentrated along intermediate
.
Figure 3 reveals that the optimal anchor count varies across datasets and does not follow a uniform trend. On Handwritten, ACC increases steadily with
r and stabilizes near
r = 80. On 100Leaves, performance peaks near
r = 900, a value substantially larger than the class count
c = 100, reflecting the need for a large anchor set to represent 100 leaf species. On Yale, ORL, and Caltech101-7, ACC peaks at a specific
r and declines when
r grows further, suggesting that excessive anchors introduce redundancy rather than additional discriminative information. On MSRCV1 the pattern is similar, with the best ACC achieved at
r = 45.
4.6. Behavior of the Consensus Graph
To examine how the consensus bipartite graph evolves during optimization,
Figure 4 tracks the relative change
across iterations on all six datasets.
As shown in
Figure 4, the six datasets exhibit two distinct patterns. Yale, ORL, and Caltech101-7 start from relatively large initial changes above 0.3 and drop sharply within the first 10 iterations, after which the update magnitude remains close to zero. MSRCV1, Handwritten, and 100Leaves begin with much smaller initial changes below 0.03 and decrease gradually over the 100-iteration horizon. This difference is mainly related to the initialization quality of the consensus graph
P, which already approximates a stable structure on the latter three datasets. In all cases, the relative change decreases to a negligible level, indicating stable empirical convergence behavior of the proposed alternating optimization procedure.
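A curve of this kind can be produced with a few lines of code. The sketch below assumes the relative change is measured in the Frobenius norm, normalized by the norm of the previous iterate; this is a common choice but an assumption here, and the function name is hypothetical.

```python
import numpy as np


def relative_change(P_new, P_old, eps=1e-12):
    """Relative Frobenius-norm change between two consecutive iterates.

    Assumed form: ||P_new - P_old||_F / ||P_old||_F. The small `eps`
    guards against division by zero and is an addition of this sketch.
    """
    num = np.linalg.norm(P_new - P_old)  # Frobenius norm for 2-D input
    den = max(np.linalg.norm(P_old), eps)
    return num / den
```

In an alternating optimization loop, this value would be recorded after each update of the consensus graph and could also serve as a stopping criterion once it falls below a chosen tolerance.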
4.7. Computational Cost
Figure 5 reports the execution-time comparison of PEBGL and seven representative multi-view clustering methods on six benchmark datasets. The results show that PEBGL achieves competitive computational efficiency on several datasets. Specifically, PEBGL runs efficiently on MSRCV1, Handwritten, and Caltech101-7, and its running time is comparable to or lower than several recent graph-based and bipartite graph-based methods.
On Yale and ORL, PEBGL takes more time than most compared methods. This is mainly because these two face datasets contain very high-dimensional views, with the largest feature dimension reaching 6750, so the PCA preprocessing stage introduces additional computational cost before bipartite graph learning. On 100Leaves, the relatively high running time is mainly related to the large number of categories, where a larger anchor number is required to represent fine-grained leaf structures. Nevertheless, after the projection stage, the main iterative bipartite graph learning procedure remains efficient, which is consistent with the complexity analysis in
Section 3.6.
4.8. View Weight Analysis
To illustrate the behavior of the entropy-regularized adaptive weighting mechanism,
Figure 6 traces the evolution of the view weights
across iterations on MSRCV1 and Handwritten, the two datasets with the largest number of views.
On MSRCV1, Color moment, LBP, and CENTRIST collectively receive over 96% of the total weight at the final iteration, while HOG and GIST are assigned weights below 0.02. This allocation reflects the feature characteristics of the dataset: HOG and GIST retain only 10 and 19 PCA dimensions after projection, respectively, which suggests limited discriminative capacity in these two views. The weight distribution stabilizes within approximately 15 iterations and shifts only slightly thereafter.
On Handwritten, five of the six views share comparable weights between 0.12 and 0.30, whereas the Zernike view is effectively suppressed. The 47-dimensional Zernike moment feature provides negligible complementary information beyond what the remaining five feature types already capture. In effect, the adaptive weighting acts as a soft view-selection mechanism that concentrates the fusion on informative views without manual tuning. The entropy regularization prevents the weights from collapsing to a single-view solution, preserving meaningful contributions from multiple views.
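The soft-selection behavior can be illustrated with the generic closed form for entropy-regularized weights. The sketch assumes the weights minimize a weighted sum of per-view losses plus gamma times the negative entropy, subject to the weights lying on the probability simplex; under that assumption the solution is a softmax of the negated, scaled losses. The exact objective used by PEBGL may differ, and the names `losses` and `gamma` are hypothetical.

```python
import math


def entropy_regularized_weights(losses, gamma):
    """Closed-form minimizer of sum_v w_v * l_v + gamma * sum_v w_v * log(w_v)
    subject to w_v >= 0 and sum_v w_v = 1 (assumed objective).

    Setting the Lagrangian gradient to zero gives w_v proportional to
    exp(-l_v / gamma).
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    base = min(losses)  # shift for numerical stability; does not change w
    scores = [math.exp(-(l - base) / gamma) for l in losses]
    total = sum(scores)
    return [s / total for s in scores]


# Views with larger loss receive smaller weight.
w_sharp = entropy_regularized_weights([1.0, 2.0, 3.0], gamma=0.1)
w_smooth = entropy_regularized_weights([1.0, 2.0, 3.0], gamma=10.0)
```

With a small `gamma` the weights concentrate on the lowest-loss view, while a large `gamma` pushes them toward uniform. Every weight stays strictly positive, which matches the observation that a weak view is suppressed without the solution collapsing onto a single view.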