1. Introduction
Feature learning aims to automatically discover informative representations from raw data, thereby reducing the reliance on handcrafted features and improving downstream task performance [1,2,3]. As one of the fundamental techniques in feature learning, matrix factorization has been extensively studied and successfully applied in various domains, including data mining, pattern recognition, and information retrieval. In general, matrix factorization seeks to approximate a data matrix by the product of several low-rank matrices, enabling effective dimensionality reduction and compact representation learning. A variety of classical techniques have been developed under the matrix factorization framework, such as NMF [4,5], SVD [6], QR decomposition, LDA [7], PCA [8], ICA [9], and CF [10].
Among these approaches, NMF has attracted particular attention due to its part-based and interpretable representations. By enforcing non-negativity constraints on both basis and coefficient matrices, NMF produces additive components that are often consistent with human intuition. However, the strict non-negativity requirement limits its applicability to non-negative data and may lead to suboptimal approximations in certain low-rank scenarios. In comparison, truncated SVD attains the smallest Frobenius-norm reconstruction error among all approximations of a given rank (the Eckart–Young theorem), so it never incurs a larger error than NMF under the same rank constraint. To mitigate this issue, Zhang et al. [11] proposed a low-rank matrix factorization model with orthogonality constraints to improve decomposition accuracy.
While conventional matrix factorization methods primarily focus on capturing global structures, they often overlook the intrinsic local manifold information embedded in data. To address this limitation, graph-regularized matrix factorization techniques have been widely investigated. For instance, GNMF and LCCF [12,13] incorporate graph Laplacian regularization to preserve local geometric structures. Yi et al. [14] proposed NMF-LCAG by jointly considering reconstruction fidelity and locality preservation. To enhance robustness against noise and outliers, Peng et al. [15] introduced GCCF based on the maximum correntropy criterion (MCC) [16]. Moreover, hypergraph-based extensions, such as CHNMF [17], further exploit high-order relationships among samples.
Most of the aforementioned approaches are formulated in an unsupervised manner and do not exploit available label information. In many practical scenarios, even limited supervision can substantially improve clustering performance. To this end, several constrained and semi-supervised matrix factorization models have been proposed. Liu et al. [18,19] introduced CNMF and CCF by enforcing samples from the same class to share similar representations. Zhang et al. [20] proposed NMFCC, which integrates must-link and cannot-link constraints into the factorization process. More generally, graph-based matrix factorization models can be extended to semi-supervised settings by incorporating pairwise constraints. Following this paradigm, Peng et al. [21,22] developed CSNMF and CSCF using adaptive neighbor assignment to construct informative adjacency graphs, while Zhou et al. [23] proposed CLMF by learning graph structures through sparsity-induced similarity (SIS).
Symmetric matrix factorization (SMF) can be viewed as a particular formulation of matrix factorization, in which an affinity matrix is decomposed into two identical low-rank factors. As affinity matrices primarily reflect neighborhood relations among samples, SMF-based models are inherently oriented toward capturing local structural characteristics. In recent years, a number of semi-supervised extensions of SMF have been proposed. For instance, SNMFCC [20] incorporates pairwise supervision by embedding must-link and cannot-link constraints into the affinity matrix, while PCPSNMF [24] simultaneously propagates pairwise constraints and updates the graph structure. To further enhance supervision, Chavoshinejad et al. introduced S4NMF by integrating self-supervised information into symmetric factorization. In a more recent study, Yin et al. [25] developed HSSNMF, which utilizes hypergraph-driven constraint propagation to encode complex higher-order interactions. Despite their promising performance, most existing SMF models remain non-convex, as their underlying objective can be simplified to minimizing the Frobenius-norm discrepancy between the affinity matrix and the product of a low-rank factor with its own transpose, which leads to considerable optimization difficulties.
In addition to optimization difficulties, graph-based and symmetric matrix factorization methods also face scalability issues due to the requirement of storing and processing an $n \times n$ affinity matrix, where $n$ is the number of samples. To improve scalability, bipartite graph-based methods have been proposed. Zhou et al. [26] introduced a bipartite graph-regularized robust low-rank matrix factorization (BLMF) framework for semi-supervised image clustering. Liu et al. [27] proposed the local anchor embedding (LAE) method to reduce computational cost, while Wang et al. [28] developed EAGR using an improved anchor graph construction strategy. Nie et al. [29] further applied bipartite graph modeling to graph-based semi-supervised learning (BGSSL). Nevertheless, directly extending bipartite graphs to symmetric matrix factorization remains non-trivial, since it ultimately requires recovering a fully connected $n \times n$ affinity matrix.
This paper presents a novel fast unconstrained convex symmetric matrix factorization (FUCSMF) framework with the following advantages:
The introduction of label information reformulates the symmetric matrix factorization objective into a convex optimization problem, which guarantees global optimality and leads to reliable and efficient convergence.
By integrating a bipartite graph mechanism into the symmetric factorization framework, the computational burden is substantially reduced. In particular, the resulting time complexity grows only linearly with the number of anchors, whereas most existing bipartite graph-based methods incur higher costs that include a term cubic in the number of anchors.
A novel adaptive graph learning framework is developed to alleviate the sensitivity of existing models to parameter selection, thereby improving robustness and practical applicability.
It is worth noting that the term “convex symmetric matrix factorization” refers to a reformulated optimization framework in which convexity is achieved through problem transformation, rather than being inherent to the original formulation.
3. Methodology
3.1. Bipartite Graph-Based Unconstrained Symmetric Factorization
Because traditional graphs require large amounts of memory and quadratic or even cubic computational complexity in the number of samples, this section utilizes the concept of a hypergraph to construct a low-memory graph. Then, an unconstrained convex symmetric matrix factorization is proposed.
According to [32], in a non-weighted hypergraph, the high-order relationships among vertices are encoded by an incidence matrix $\mathbf{H}$, where each entry indicates the membership between a vertex and a hyperedge. Based on this representation, the hypergraph adjacency matrix induced by the incidence structure can be formulated as
$$\mathbf{S}_h=\mathbf{H}\mathbf{D}_e^{-1}\mathbf{H}^{\top}.$$
Here, $\mathbf{D}_e$ denotes a diagonal matrix, with its diagonal elements representing the degrees of the corresponding hyperedges.
In its basic form, the incidence matrix $\mathbf{H}$ is binary-valued, i.e., each of its entries is either 0 or 1. Nevertheless, as pointed out in [33], $\mathbf{H}$ can also be generalized to a continuous-valued matrix with entries in the interval $[0,1]$, which provides greater flexibility in characterizing vertex–hyperedge relationships.
Given the resulting adjacency matrix, the hypergraph Laplacian can be constructed as
$$\mathbf{L}_h=\mathbf{D}_v-\mathbf{S}_h,$$
where $\mathbf{D}_v$ denotes a diagonal matrix whose diagonal elements correspond to the degrees of the vertices. The normalized Laplacian matrix is $\mathbf{I}-\mathbf{D}_v^{-1/2}\mathbf{S}_h\mathbf{D}_v^{-1/2}$.
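The hypergraph construction above can be illustrated with a small sketch. It assumes the non-weighted case (every hyperedge weight equal to one) and uses a hypothetical toy incidence matrix with seven vertices and four hyperedges, in which each vertex belongs to two hyperedges; the function name `hypergraph_matrices` is illustrative only. Matrices are plain Python lists so the example runs without third-party packages.

```python
import math


def hypergraph_matrices(H):
    """Return (S_h, d_v, L_h, L_norm) for an n x e incidence matrix H given as
    a list of rows. Entries may be binary or continuous in [0, 1]; every
    hyperedge and every vertex is assumed to have a positive degree."""
    n, e = len(H), len(H[0])
    # Hyperedge degrees: column sums of H, i.e. the diagonal of D_e.
    d_e = [sum(H[i][k] for i in range(n)) for k in range(e)]
    # Adjacency induced by the incidence structure: S_h = H D_e^{-1} H^T.
    S = [[sum(H[i][k] * H[j][k] / d_e[k] for k in range(e)) for j in range(n)]
         for i in range(n)]
    # Vertex degrees: row sums of S_h (these equal the row sums of H).
    d_v = [sum(row) for row in S]
    # Unnormalized Laplacian L_h = D_v - S_h.
    L = [[(d_v[i] if i == j else 0.0) - S[i][j] for j in range(n)]
         for i in range(n)]
    # Normalized Laplacian I - D_v^{-1/2} S_h D_v^{-1/2}.
    L_norm = [[(1.0 if i == j else 0.0) - S[i][j] / math.sqrt(d_v[i] * d_v[j])
               for j in range(n)] for i in range(n)]
    return S, d_v, L, L_norm


# Hypothetical incidence pattern: 7 vertices, 4 hyperedges, 2 hyperedges per vertex.
H = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0],
     [0, 1, 0, 1], [0, 0, 1, 1], [1, 1, 0, 0]]
S_h, d_v, L_h, L_norm = hypergraph_matrices(H)
```

Because every vertex in this toy pattern lies in exactly two hyperedges, all vertex degrees equal two, and each row of the unnormalized Laplacian sums to zero.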
Anchors can also be seen as a type of hyperedge, as each anchor connects to multiple samples.
Figure 1 illustrates the use of anchors as hyperedges, with a total of seven samples connected to four anchors (hyperedges). Each sample is linked to two anchors, and each hyperedge connects three to four samples, which allows higher-order information among the samples to be extracted. Therefore, treating the sample–anchor similarity matrix as a continuous-valued incidence matrix, the proposed graph takes the following form:
$$\mathbf{S}=\mathbf{Z}\boldsymbol{\Lambda}^{-1}\mathbf{Z}^{\top}.$$
Here, $\mathbf{Z}\in\mathbb{R}^{n\times m}$ represents the sample–anchor similarity matrix, where $n$ and $m$ are the numbers of samples and anchors, while $\boldsymbol{\Lambda}$ is a diagonal matrix whose diagonal elements are given by $\Lambda_{kk}=\sum_{i=1}^{n}z_{ik}$. In practice, $\mathbf{Z}$ is commonly normalized along its rows. Consequently, according to Theorem 1, the resulting graph $\mathbf{S}$ satisfies the self-normalization property.
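Theorem 1. Let $\mathbf{Z}\in\mathbb{R}^{n\times m}$ be a non-negative matrix satisfying $\sum_{k=1}^{m}z_{ik}=1$ for all $i$, and let $\boldsymbol{\Lambda}$ be a diagonal matrix whose diagonal entries $\Lambda_{kk}=\sum_{j=1}^{n}z_{jk}$ are the (positive) column sums of $\mathbf{Z}$. Then, the matrix $\mathbf{S}=\mathbf{Z}\boldsymbol{\Lambda}^{-1}\mathbf{Z}^{\top}$ is symmetric and each of its rows sums to one.
Proof of Theorem 1. It is straightforward to verify that $\mathbf{S}^{\top}=\mathbf{Z}\boldsymbol{\Lambda}^{-1}\mathbf{Z}^{\top}=\mathbf{S}$, so $\mathbf{S}$ is symmetric. Moreover, the row-wise summation is given by
$$\sum_{j=1}^{n}s_{ij}=\sum_{j=1}^{n}\sum_{k=1}^{m}\frac{z_{ik}z_{jk}}{\Lambda_{kk}}=\sum_{k=1}^{m}\frac{z_{ik}}{\Lambda_{kk}}\sum_{j=1}^{n}z_{jk}=\sum_{k=1}^{m}z_{ik}=1.$$
In this context, $z_{ik}$ refers to the $(i,k)$-th entry of $\mathbf{Z}$, and $\Lambda_{kk}$ is the column-wise sum of $\mathbf{Z}$ associated with the $k$-th column. □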
In addition, the matrix $\mathbf{S}$ is positive semidefinite. This follows from the fact that $\boldsymbol{\Lambda}^{-1}$ is a diagonal matrix with positive entries, so $\mathbf{S}$ can be written as $(\mathbf{Z}\boldsymbol{\Lambda}^{-1/2})(\mathbf{Z}\boldsymbol{\Lambda}^{-1/2})^{\top}$, which is of the form $\mathbf{B}\mathbf{B}^{\top}$; hence, it is always positive semidefinite.
By Theorem 1, the matrix $\mathbf{S}$ is symmetric and satisfies the row-normalization property. Therefore, its corresponding degree matrix is simply the identity matrix $\mathbf{I}$.
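Theorem 1 and the positive semidefiniteness argument can be checked numerically. The sketch below draws a random non-negative, row-normalized sample–anchor matrix `Z` (the sizes and seed are arbitrary choices for illustration), forms `S = Z Lambda^{-1} Z^T` with `Lambda` holding the column sums of `Z`, and evaluates `x^T S x` through the factored form `sum_k (Z^T x)_k^2 / Lambda_kk`, which is a sum of non-negative terms.

```python
import random


def anchor_graph(Z):
    """Return S = Z Lambda^{-1} Z^T together with the diagonal of Lambda."""
    n, m = len(Z), len(Z[0])
    lam = [sum(Z[i][k] for i in range(n)) for k in range(m)]  # column sums of Z
    S = [[sum(Z[i][k] * Z[j][k] / lam[k] for k in range(m)) for j in range(n)]
         for i in range(n)]
    return S, lam


def quadratic_form(Z, lam, x):
    """x^T S x evaluated as sum_k (Z^T x)_k^2 / Lambda_kk (never negative)."""
    ztx = [sum(Z[i][k] * x[i] for i in range(len(x))) for k in range(len(lam))]
    return sum(v * v / l for v, l in zip(ztx, lam))


random.seed(0)
n, m = 6, 3
Z = []
for _ in range(n):
    row = [random.random() for _ in range(m)]
    total = sum(row)
    Z.append([v / total for v in row])  # non-negative and row-normalized
S, lam = anchor_graph(Z)
```

Each row of `S` sums to one up to rounding, and the factored quadratic form agrees with the direct evaluation of `x^T S x`.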
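Motivated by the above analysis, we formulate a symmetric matrix factorization framework based on the bipartite graph as follows:
$$\min_{\mathbf{F}\in\mathbb{R}^{n\times c}}\ \left\|\mathbf{S}-\mathbf{F}\mathbf{F}^{\top}\right\|_F^2,$$
where $\mathbf{F}$ is the representation (clustering indicator) matrix and $c$ is the number of clusters.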
It is important to emphasize that the preceding formulation is purely unsupervised and does not utilize supervision from labeled data. To incorporate the available supervision in a principled manner, we introduce a semi-supervised formulation in which the representation matrix is expressed through a fixed matrix $\mathbf{A}$ and an auxiliary matrix that encodes the information of the unlabeled samples. The matrix $\mathbf{A}$ is constructed to incorporate both label supervision and positional information. Specifically, if the $j$-th sample is labeled as belonging to the $i$-th class, then the $(i,j)$-th entry of the label block of $\mathbf{A}$ equals one, and zero otherwise; for unlabeled samples, the corresponding columns of this label block are zero. In the positional block, the entry associated with the $k$-th unlabeled instance equals one if the $j$-th sample corresponds to the $k$-th unlabeled instance, and zero otherwise. Both $\mathbf{A}$ and the bipartite graph matrix are fixed throughout the optimization process. Since neither depends on the optimization variable, introducing $\mathbf{A}$ does not alter the convexity properties of the objective function with respect to the auxiliary matrix, and the structural properties of the optimization problem are preserved.
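One possible reading of this construction, in which the representation is assembled as `F = A^T [I_c; G]` with `G` the auxiliary matrix for the `u` unlabeled samples, can be sketched as follows. The stacking order, the symbols `G` and `u`, the function names, and the labels in the usage lines are assumptions for illustration only; the exact formulation in the paper may differ. The sketch shows the key property: labeled rows of the representation are fixed one-hot vectors, while unlabeled rows come from the free matrix `G`.

```python
def build_label_position_matrix(labels, c):
    """Hypothetical construction of A from per-sample labels.
    labels[j] is a class index in 0..c-1 for labeled samples and None otherwise.
    Rows 0..c-1 form the label block; row c + k marks the k-th unlabeled sample."""
    n = len(labels)
    unlabeled = [j for j, y in enumerate(labels) if y is None]
    A = [[0] * n for _ in range(c + len(unlabeled))]
    for j, y in enumerate(labels):
        if y is not None:
            A[y][j] = 1  # label block: sample j belongs to class y
    for k, j in enumerate(unlabeled):
        A[c + k][j] = 1  # positional block: sample j is the k-th unlabeled one
    return A


def assemble_representation(A, G, c):
    """F = A^T P with P = [I_c; G]: labeled rows of F are one-hot vectors and
    unlabeled rows are copied from the auxiliary matrix G (u x c)."""
    P = [[1.0 if i == j else 0.0 for j in range(c)] for i in range(c)]
    P += [list(r) for r in G]
    n = len(A[0])
    return [[sum(A[r][j] * P[r][col] for r in range(len(A))) for col in range(c)]
            for j in range(n)]


# Hypothetical example: eight samples, three classes, the first four labeled.
labels = [0, 1, 1, 2, None, None, None, None]
A = build_label_position_matrix(labels, c=3)
G = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3], [0.0, 0.4, 0.6], [0.3, 0.3, 0.4]]
F = assemble_representation(A, G, c=3)
```

Every column of `A` contains exactly one nonzero entry, so `A` acts as a fixed selection (linear) map, and only `G` would be optimized.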
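To clarify the formulation, suppose a dataset consists of eight samples belonging to three categories. Among them, one sample is assigned to class I, two samples belong to class II, one sample corresponds to class III, and the remaining four samples are unlabeled. Under this setting, the associated matrix $\mathbf{A}$ contains one row per class followed by one row per unlabeled sample, with exactly one nonzero entry in each column.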
Given that the labeled samples are already determined and do not need to be optimized, it is reasonable to fix them in a one-hot form. Consequently, the semi-supervised symmetric matrix factorization model under the bipartite graph structure can be written as
To maintain the intrinsic neighborhood structure of the data during subspace learning, we introduce a Laplacian-based regularization term $\operatorname{tr}(\mathbf{F}^{\top}\mathbf{L}\mathbf{F})$ into the objective. This term encourages nearby samples in the original space to remain close after projection into the latent representation. The Laplacian matrix is constructed as $\mathbf{L}=\mathbf{D}-\mathbf{W}$, where $\mathbf{W}$ encodes pairwise similarities between samples and $\mathbf{D}$ is the diagonal degree matrix with entries $D_{ii}=\sum_{j}W_{ij}$. The above regularization can also be expressed in an equivalent form as
$$\operatorname{tr}(\mathbf{F}^{\top}\mathbf{L}\mathbf{F})=\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}W_{ij}\left\|\mathbf{f}_i-\mathbf{f}_j\right\|_2^2,$$
where $\mathbf{f}_i$ denotes the $i$-th row of $\mathbf{F}$. This term penalizes large differences between the feature representations of similar samples. By introducing it, the model not only maintains the convex property of the optimization problem but also enhances the smoothness of the learned manifold structure and the discriminative power of the representation.
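The equivalence between the trace form and the pairwise form of the Laplacian regularizer can be verified on random data. This sketch assumes a symmetric, non-negative similarity matrix `W` with a zero diagonal and an arbitrary real-valued representation `F` whose rows play the role of the `f_i`; both are randomly generated for illustration.

```python
import random


def laplacian(W):
    """L = D - W, where D is the diagonal matrix of row sums of W."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [[(deg[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def trace_form(F, L):
    """tr(F^T L F), accumulated column by column of F."""
    n, c = len(F), len(F[0])
    return sum(F[i][a] * L[i][j] * F[j][a]
               for a in range(c) for i in range(n) for j in range(n))


def pairwise_form(F, W):
    """(1/2) * sum_{i,j} W_ij * ||f_i - f_j||^2 with f_i the i-th row of F."""
    n = len(F)
    return 0.5 * sum(W[i][j] * sum((x - y) ** 2 for x, y in zip(F[i], F[j]))
                     for i in range(n) for j in range(n))


random.seed(1)
n, c = 5, 2
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        W[i][j] = W[j][i] = random.random()  # symmetric, non-negative similarities
F = [[random.uniform(-1.0, 1.0) for _ in range(c)] for _ in range(n)]
L = laplacian(W)
```

The two forms agree up to rounding, and the pairwise form makes it evident that the regularizer is non-negative for non-negative similarities.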
Therefore, after incorporating the graph regularization term, the overall objective function can be reformulated as follows:
It is worth noting that the matrix $\mathbf{A}$ enters the objective only through a linear transformation of the optimization variable; since composing a convex function with an affine map preserves convexity, including $\mathbf{A}$ preserves the convexity structure of the objective function.
3.2. Analysis of Convexity
Firstly, we give the following theorem to show that Equation (22) is convex. Since the matrix $\mathbf{A}$ is fixed, the convexity analysis focuses on the optimization variable, and the presence of $\mathbf{A}$ does not affect the convexity of each term. For readability, we provide proof sketches in the main text and defer detailed derivations to the appendix.
Theorem 2. The objective function in Equation (22) is convex.
Proof sketch. The key idea is to analyze the second-order behavior of the objective function along an arbitrary direction. By introducing a univariate function along a line in the parameter space, convexity reduces to showing that its second derivative is non-negative.
The objective consists of a quartic term, a quadratic regularization term, and a coupling term involving the bipartite graph structure. The quartic and quadratic terms naturally contribute positive semidefinite components to the Hessian. For the coupling term, we leverage the positive semidefiniteness of the bipartite graph matrix and the structure of $\mathbf{A}$ to show that it also yields a non-negative contribution.
Combining these results, the overall Hessian is positive semidefinite, which implies convexity of the objective function. The detailed derivation is provided in Appendix A.
Having established the convexity of Equation (22), any local minimizer is also a global minimizer; however, convexity alone does not guarantee strict convexity or the uniqueness of the solution. Since the objective is both convex and differentiable, the key remaining question concerns the uniqueness of the stationary solution. For the subsequent analysis, Equation (22) can be rewritten in the following expanded form:
where the auxiliary matrix associated with the unlabeled samples and the normalized bipartite graph matrix are as defined previously. By discarding all terms unrelated to the auxiliary matrix, we obtain
Theorem 3. The objective function in Equation (24) has only one stationary point.
Proof sketch. The proof is based on analyzing the monotonicity of the gradient mapping. Specifically, we examine the inner product between the gradient difference and the variable difference for two arbitrary points.
The gradient difference can be decomposed into several terms corresponding to the cubic, quadratic, and graph-related components of the objective function. Each term is shown to contribute non-negatively: the quadratic and regularization terms are straightforward, while the graph-related term is bounded using the positive semidefinite property established in Lemma A2. The cubic term is handled using standard matrix inequalities.
As a result, the gradient mapping is monotone, which characterizes the structure of the stationary points. The detailed proof is given in Appendix B.
3.3. Correntropy-Based Adaptive Graph Learning
To address the issues raised in Equation (3), correntropy is introduced into the bipartite graph learning framework. Because correntropy takes values between zero and one, it transforms a convex function into a quasi-concave one; thus, we replace the L2 norm with the correntropy-induced distance (CID).
Figure 2 compares the function curves before and after the CID substitution. It can be observed that the two curves almost coincide near zero, which can also be proven mathematically.
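The near-zero behavior can be illustrated numerically. The exact pair of functions plotted in Figure 2 is not reproduced here; as an assumption, the sketch compares a scaled squared distance `t` with its Gaussian-kernel correntropy-induced counterpart `1 - exp(-t)`. The ratio of the two tends to one as `t` approaches zero, while the transformed value stays below one for every `t`.

```python
import math


def cid(t):
    """Assumed correntropy-induced counterpart of a scaled squared distance
    t >= 0, namely 1 - exp(-t) from a Gaussian kernel; it lies in [0, 1)."""
    return -math.expm1(-t)  # numerically stable evaluation of 1 - exp(-t)


for t in [1.0, 1e-1, 1e-2, 1e-3, 1e-6]:
    print(f"t = {t:g}:  cid = {cid(t):.6g},  cid / t = {cid(t) / t:.6f}")
```

The Taylor expansion `1 - exp(-t) = t - t^2/2 + ...` explains the printed ratios: the two functions differ only at second order near zero.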
The correntropy-based formulation introduces a nonlinear similarity measure, which enhances robustness to noise. However, the global convexity of the objective function is not strictly preserved under this substitution.
Theorem 4. Near zero, the CID approaches zero at a rate similar to that of the original distance.
This substitution keeps every distance within the range from zero to one, so the difference between any two distances cannot exceed one, which effectively prevents the nearest neighbor from receiving a weight of one. Then, the bipartite graph learning framework becomes
By introducing bipartite graph learning, the proposed method is as follows:
It is worth noting that the correntropy function behaves similarly to a quadratic function in a local neighborhood around zero, which provides a local approximation to convexity. This property helps maintain stable optimization behavior in practice.
Although strict global convexity is not guaranteed, empirical results demonstrate that the proposed optimization scheme converges reliably, indicating that the objective remains well-behaved in practice.
3.4. Optimization
This section presents the optimization procedure for the proposed model.
3.4.1. Fixing the Representation and Updating the Bipartite Graph
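As the representation matrix is fixed, ignoring the terms unrelated to the sample–anchor similarity matrix yields the following row-wise subproblem:
To simplify the model and reduce the parameter count, a single regularization parameter is shared by all rows. Defining the corresponding distance vector for each row, the above problem can be reformulated as
This reformulated problem is a Euclidean projection onto the probability simplex for each row, which can be solved using an iterative scheme [34].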
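The row update amounts to a Euclidean projection onto the probability simplex (the simplex projection step referred to in Section 4.9). The sketch below uses the standard sort-based projection rather than the iterative scheme of [34], and it assumes that each row subproblem has the common adaptive-neighbor form `min_z sum_k d_k z_k + gamma * ||z||^2` over the simplex, which is equivalent to projecting `-d / (2 * gamma)`. The distances `d`, the values of `gamma`, and the function names are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {z : z >= 0, sum(z) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumulative += uk
        t = (cumulative - 1.0) / k
        if uk - t > 0:
            theta = t  # keep the threshold of the largest feasible k
    return [max(x - theta, 0.0) for x in v]


def update_row(d, gamma):
    """Hypothetical row update: argmin over the simplex of
    sum_k d_k z_k + gamma * ||z||^2, i.e. the projection of -d / (2 gamma)."""
    return project_to_simplex([-dk / (2.0 * gamma) for dk in d])


d = [0.1, 0.2, 0.5, 0.9]  # hypothetical sample-to-anchor distances for one row
z_sparse = update_row(d, gamma=0.05)
z_dense = update_row(d, gamma=1.0)
```

With the small `gamma`, all weight collapses onto the nearest anchor (the degenerate case discussed in Section 3.3); the larger `gamma` spreads weight over all four anchors, consistent with the sparsity behavior reported in Section 4.6.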
3.4.2. Fixing the Bipartite Graph and Updating the Representation
The gradient of Equation (19) is
Since the objective function is convex and free of explicit constraints, it can be efficiently optimized using standard unconstrained optimization techniques. In this work, we adopt CG_DESCENT 6.8 [35,36,37], a nonlinear conjugate gradient solver designed for large-scale smooth optimization problems. This algorithm has been extensively studied in the literature and is known to possess global convergence guarantees under standard conditions such as Lipschitz continuity of the gradient and appropriate line search strategies. CG_DESCENT operates in an iterative manner and requires only basic vector operations, such as inner products and vector additions, along with the evaluation of gradients and objective values at each iteration. Owing to its computational efficiency and scalability, it is well suited for the proposed framework.
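CG_DESCENT itself is an external solver, so the sketch below does not reproduce it. Instead, it shows a simplified nonlinear conjugate gradient method (Polak–Ribière+ directions with Armijo backtracking, rather than the Hager–Zhang directions and Wolfe-type line search used by CG_DESCENT) applied to a toy strongly convex quadratic. It illustrates the iteration pattern described above: objective and gradient evaluations plus inner products and vector updates. The stopping rule mirrors the infinity-norm gradient test used in the experiments; all names and the toy problem are illustrative.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def axpy(x, alpha, d):
    """Return x + alpha * d."""
    return [xi + alpha * di for xi, di in zip(x, d)]


def nonlinear_cg(f, grad, x0, tol=1e-6, max_iter=1000):
    """Polak-Ribiere+ nonlinear CG with Armijo backtracking. Stops when the
    infinity norm of the gradient is at most tol; returns (x, iterations)."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]
    for it in range(max_iter):
        if max(abs(gi) for gi in g) <= tol:
            return x, it
        slope = dot(g, d)
        if slope >= 0.0:  # not a descent direction: restart with -g
            d = [-gi for gi in g]
            slope = dot(g, d)
        alpha, fx = 1.0, f(x)
        # Armijo backtracking: halve the step until sufficient decrease holds.
        while f(axpy(x, alpha, d)) > fx + 1e-4 * alpha * slope:
            alpha *= 0.5
            if alpha < 1e-16:  # line search failed; give up
                return x, it
        x_new = axpy(x, alpha, d)
        g_new = grad(x_new)
        y = [a - b for a, b in zip(g_new, g)]
        beta = max(0.0, dot(g_new, y) / dot(g, g))  # PR+ coefficient
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        x, g = x_new, g_new
    return x, max_iter


# Toy strongly convex problem: f(x) = 0.5 x^T Q x - b^T x, minimized at Q^{-1} b.
Q = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]


def f(x):
    return 0.5 * dot(x, [dot(row, x) for row in Q]) - dot(b, x)


def grad(x):
    return [dot(row, x) - bi for row, bi in zip(Q, b)]


x_star, iters = nonlinear_cg(f, grad, [0.0, 0.0])
print(x_star, iters)  # the exact minimizer is [1/11, 7/11]
```

A production solver adds safeguards such as Wolfe line searches and restarts tuned for large problems; this sketch only conveys the structure of the iteration.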
Algorithm 1 describes the detailed optimization steps of our proposed method.
Algorithm 1 FUCSMF
- 1: Input: Data matrix, constraint matrix, and parameters.
- 2: Output: Clustering indicator matrix (representation matrix).
- 3: Initialize the anchor point set by k-means.
- 4: while not converged do
- 5: Calculate each row of the sample–anchor similarity matrix by solving Equation (28).
- 6: Calculate the bipartite graph matrix from the updated similarity matrix.
- 7: Update the representation matrix by CG_DESCENT 6.8.
- 8: end while
In this work, we adopt CG_DESCENT as an off-the-shelf solver and do not re-establish its convergence properties. Instead, we focus on formulating a suitable objective function that can be efficiently optimized using this method.
3.5. Computational Complexity
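The overall computational cost of FUCSMF can be decomposed into several major components:
- 1. Computing the anchor points by k-means requires a number of operations proportional to the product of the numbers of samples, anchors, feature dimensions, and k-means iterations.
- 2. Evaluating the distances between data samples and anchors incurs a cost proportional to the product of the numbers of samples, anchors, and feature dimensions.
- 3. Constructing the bipartite graph incurs a cost that depends on k, the number of nearest neighbors.
- 4. Evaluating the objective function with respect to the representation matrix has a cost linear in the number of anchors.
- 5. Computing the gradient of the objective has the same computational complexity as evaluating the objective.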
To clarify the computational complexity, we analyze the cost of evaluating the objective function and its gradient in detail. The main computational burden arises from the coupling term between the representation matrix and the bipartite graph, whose dominant cost lies in the associated matrix multiplications. Because the bipartite graph matrix is built from the sample–anchor similarity matrix and the diagonal anchor degree matrix, multiplications involving it incur a cost governed by the interactions between samples and anchor points rather than between all pairs of samples. The gradient computation involves matrix products with the same structure, so the objective function evaluation and the gradient computation share the same time complexity.
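The saving from the bipartite structure can be illustrated by how products with `S = Z Lambda^{-1} Z^T` are evaluated. Assuming the coupling term only needs products of the form `S F` (as in a trace term `tr(F^T S F)`), `S` never has to be formed: multiplying by `Z^T`, scaling by `Lambda^{-1}`, and multiplying by `Z` costs on the order of `n * m * c` operations instead of `n^2 * c`, with `n` samples, `m` anchors, and `c` columns of `F`. The sizes, seed, and function names are illustrative, and the dense version is included only as a reference for checking.

```python
import random


def matmul(A, B):
    """Plain product of list-of-rows matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def apply_graph_factored(Z, lam, F):
    """S F with S = Z Lambda^{-1} Z^T, never forming the n x n matrix S.
    Z^T F costs O(n m c), the diagonal scaling O(m c), and Z (.) O(n m c)."""
    ZtF = matmul(transpose(Z), F)  # m x c
    scaled = [[v / lam[k] for v in row] for k, row in enumerate(ZtF)]
    return matmul(Z, scaled)  # n x c


def apply_graph_dense(Z, lam, F):
    """Reference: build the full n x n S first (O(n^2 m)), then S F (O(n^2 c))."""
    Z_scaled = [[z / lam[k] for k, z in enumerate(row)] for row in Z]  # Z Lambda^{-1}
    S = matmul(Z_scaled, transpose(Z))
    return matmul(S, F)


random.seed(2)
n, m, c = 40, 6, 3
Z = []
for _ in range(n):
    row = [random.random() for _ in range(m)]
    total = sum(row)
    Z.append([v / total for v in row])  # row-normalized sample-anchor weights
lam = [sum(col) for col in zip(*Z)]  # anchor degrees (column sums of Z)
F = [[random.uniform(-1.0, 1.0) for _ in range(c)] for _ in range(n)]
fast = apply_graph_factored(Z, lam, F)
slow = apply_graph_dense(Z, lam, F)
```

Both routes give the same `n x c` result; only the factored one keeps memory and time linear in the number of samples for a fixed number of anchors.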
Given that the numbers of anchors and clusters are much smaller than the number of samples, the computational complexity of FUCSMF is dominated by a single leading term that is linear in the number of anchors. By comparison, representative bipartite graph-based methods such as BGSSL and EAGR incur additional higher-order terms, including one that is cubic in the number of anchors. In addition, computing the correntropy-induced metric requires evaluating a kernel function for each sample–anchor pair, which introduces an additional cost proportional to the number of such pairs. However, this cost is linear with respect to the data size and is dominated by the matrix multiplication terms in the overall complexity. Therefore, the inclusion of correntropy does not change the overall computational complexity.
4. Experiment
In this section, we evaluate the proposed method on several real-world datasets to assess its effectiveness and computational efficiency.
4.1. Comparison Methods
To evaluate the effectiveness and computational efficiency of the proposed FUCSMF, several representative methods are selected for comparison.
(1) SemiGNMF: Semi-supervised Graph-regularized NMF [12].
(2) CSCF: Correntropy-based Semi-supervised Concept Factorization [22].
(3) CLMF: Correntropy-based Low-rank MF [23].
(4) EDDNMF: Element Difference Discriminate NMF [38].
(5) PCPSNMF: Pairwise Constraint Propagation-induced SNMF [24].
(6) S4NMF: Self-supervised Semi-supervised NMF [39].
(7) HSSNMF: Hypergraph-based Semi-supervised SNMF [25].
(8) OCSNMF: One-hot Constrained SNMF [40].
(9) SDSGC: Structured Doubly Stochastic Graph-Based Clustering [41].
(10) EAGR: Efficient Anchor Graph Regularization [28].
(11) BCAN: Semi-supervised Learning via Bipartite graph Construction with Adaptive Neighbors [42].
In detail, methods (1)–(4) are conventional matrix factorization algorithms applied to sample representations, methods (5)–(9) are symmetric matrix factorization models defined on the complete adjacency matrix, and methods (10)–(11) adopt a bipartite graph framework.
4.2. Datasets
The performance of FUCSMF is examined on six benchmark real-world datasets, whose statistics are listed in Table 1. In particular, COIL20 and COIL100 are object datasets, YaleB is a face database, USPS and MNIST are handwritten digit collections, and Letters contains handwritten English alphabet samples.
All experiments were implemented on a desktop platform equipped with an Intel i7-6800K processor and 16 GB memory to maintain a unified computational environment. For fairness, the hyperparameters of competing algorithms were configured following the settings suggested in their original papers. In graph-based methods, the neighborhood size was uniformly set to five. For FUCSMF, the regularization coefficient was fixed at 100. The number of anchors was selected as 500 for COIL20, 1500 for YaleB, and 2000 for the remaining datasets. In addition, the latent dimensionality was chosen to match the number of true classes in each dataset.
For GNMF, CSNMF, and CSCF, the final cluster assignments were obtained by applying k-means to the learned representations. In contrast, the remaining approaches determined cluster labels directly from the learned clustering indicator matrix. The CG_DESCENT procedure was terminated when the infinity norm of the gradient fell below a prescribed tolerance. In our experiments, 30% of the data were randomly selected as labeled samples for all semi-supervised methods to ensure a fair comparison under the same setting. Moreover, as observed in Section 4.4, the clustering performance (e.g., ACC) tends to saturate once the amount of labeled data exceeds a certain threshold, and further increasing the labeled data brings only marginal improvement. Therefore, considering both performance and labeling cost, we adopted 30% labeled data as a reasonable trade-off between effectiveness and efficiency. We acknowledge that evaluating lower labeled ratios is also important and will investigate this aspect in future work.
To measure clustering quality, we employed four widely used metrics: ACC, NMI [43], ARI [44], and F-score [45]. All methods were initialized randomly and run ten independent times, and the reported values are the mean performance across the repeated experiments.
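For reference, the two most commonly reported metrics can be sketched as follows. ACC is computed here by brute force over one-to-one cluster-to-class assignments, which is only practical for a small number of clusters (the Hungarian algorithm is the usual choice at scale), and NMI uses the square-root normalization, which is an assumption since several normalizations exist in the literature. ARI and F-score are omitted, and the function names are illustrative.

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(y_true, y_pred):
    """ACC: best fraction of matches over one-to-one cluster-to-class mappings.
    Brute force over permutations, so only suitable for a handful of clusters."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    counts = Counter(zip(y_pred, y_true))  # (cluster, class) co-occurrences
    # Pad with None so every cluster can be mapped even if clusters outnumber classes.
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = max(sum(counts[(k, t)] for k, t in zip(clusters, perm))
               for perm in permutations(targets, len(clusters)))
    return best / len(y_true)


def nmi(y_true, y_pred):
    """Normalized mutual information with sqrt(H(true) * H(pred)) normalization."""
    n = len(y_true)
    p_true, p_pred = Counter(y_true), Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))
    mi = sum(nij / n * math.log(n * nij / (p_true[a] * p_pred[b]))
             for (a, b), nij in joint.items())
    h_true = -sum(v / n * math.log(v / n) for v in p_true.values())
    h_pred = -sum(v / n * math.log(v / n) for v in p_pred.values())
    if h_true == 0.0 or h_pred == 0.0:  # a single class or a single cluster
        return 1.0 if h_true == h_pred else 0.0
    return mi / math.sqrt(h_true * h_pred)
```

Both metrics are invariant to a relabeling of the clusters: for example, `clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])` is `1.0`, because the predicted partition matches the true one up to cluster names.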
4.3. Experimental Performance
Table 2, Table 3, Table 4 and Table 5 report the mean clustering results and corresponding standard deviations of FUCSMF and the competing methods. The highest and second-highest values are indicated in boldface and underlined, respectively. Since the USPS dataset contains negative entries, it is incompatible with SemiGNMF and is therefore excluded from its comparisons. Overall, FUCSMF achieves consistently superior performance across most datasets, and the strong results obtained by anchor-based approaches demonstrate that modeling the relationships between samples and a compact set of anchor points effectively captures the underlying similarity structure. In contrast, many traditional matrix factorization models exhibit large fluctuations in performance, as reflected by their high standard deviations, largely because their objectives are non-convex and highly sensitive to initialization. Moreover, except for CLMF, most MF-based competitors rely on multiplicative update rules, which guarantee only a monotonic decrease in the objective value without guaranteeing convergence to a stationary point. FUCSMF, however, benefits from a convex and unconstrained formulation, which allows CG_DESCENT to converge reliably to a stationary point (equivalent to the global optimum under convexity), thereby ensuring robustness to initialization and stable performance.
A notable exception arises on the YaleB dataset, where non-anchor-based methods outperform anchor-based ones. This is likely due to the complexity of face images captured under extreme illumination conditions, which renders 1000 anchor points insufficient for accurately representing the manifold structure. As further shown in Section 4.7, increasing the number of anchors significantly improves the accuracy of FUCSMF on YaleB, suggesting that it may surpass CLMF as the number of anchors grows.
4.4. Influence of Labeled Information
To examine the sensitivity of FUCSMF to the amount of supervision, we vary the number of labeled samples per class from 1 to 10. For every dataset, the specified number of samples is randomly chosen from each category as labeled data. All experiments are conducted over ten independent runs, and the averaged accuracy is reported in Figure 3, Figure 4 and Figure 5. Because CLMF, EDDNMF, OCSNMF, SDSGC, CSCF, and S4NMF are computationally expensive, they are excluded from this comparison. As the number of labeled samples increases, all methods show an upward trend in clustering accuracy. This demonstrates the crucial importance of labeled information for semi-supervised methods and highlights their ability to achieve good results with only a small number of labels. In addition, the relationship between the quantity of labeled data and clustering accuracy is non-linear: beyond a certain number of labeled samples, further gains in accuracy become marginal. It should be noted that some methods, such as PCPSNMF, show an oscillating increase in accuracy as the number of labels increases, possibly because their non-convex models make it difficult to obtain optimal results in every run. FUCSMF outperforms the competing methods on all datasets except YaleB, highlighting the robustness of the proposed approach.
4.5. Time Consumption
Table 6 presents the average computational time required by all methods. For BGSSL, EAGR, and FUCSMF, which are all built on k-means, the reported values represent the computational time in addition to the baseline k-means running time, so that these timing results share k-means as a common reference. Among all compared approaches, FUCSMF achieves the lowest overall time consumption. Its runtime advantage stems from both its favorable computational complexity, which is significantly lower than that of alternative MF-based methods, and the substantially smaller number of iterations needed during optimization. Traditional MF methods commonly rely on multiplicative update rules that merely guarantee a monotonic decrease of the objective but lack convergence guarantees, forcing them to adopt large iteration counts to stabilize performance. In contrast, FUCSMF formulates an unconstrained convex objective, enabling the use of the convergence-guaranteed CG_DESCENT algorithm, which dramatically shortens the optimization process. As further demonstrated in Section 4.9, FUCSMF consistently converges within approximately ten iterations, which explains its superior empirical running speed.
4.6. Sensitivity of Parameters
FUCSMF involves two key parameters. In this section, each parameter is varied over a predefined search range. Figure 6, Figure 7 and Figure 8 illustrate the clustering performance under different parameter settings. The results show that FUCSMF remains stable over a broad range of parameter values, reflecting its insensitivity to the selection of both parameters.
Across different datasets, the effect of the sparsity-controlling parameter exhibits distinct characteristics. Specifically, on the COIL20, YaleB, and COIL100 datasets, smaller values generally lead to improved clustering performance, whereas on the USPS, MNIST, and Letters datasets, larger values tend to produce better results. This behavior can be attributed to the role of this parameter in controlling the sparsity of the constructed adjacency matrix: smaller values encourage sparser connections, and larger values yield denser graph structures.
On COIL20 and YaleB, setting this parameter to very small values tends to deteriorate the clustering results. This can be attributed to the smaller dataset size and anchor set, which magnify variations in the sample–anchor distances. When the parameter is too small, the resulting graph becomes excessively sparse, leading to insufficient neighborhood information for reliable clustering. In contrast, for the COIL100 dataset, which has a large number of categories, smaller values tend to yield better performance. This can be explained by the fact that anchor points generated by k-means may be shared across multiple categories, and restricting connections to fewer anchors helps reduce the influence of such ambiguous anchors.
Overall, a single default setting of this parameter provides consistently favorable results across most datasets, demonstrating the robustness and practical effectiveness of the proposed FUCSMF framework.
4.7. Influence of Anchors
To analyze the effect of the number of anchors on the performance of FUCSMF, experiments are conducted on all datasets by varying only the anchor size, with the results reported in Figure 9, Figure 10 and Figure 11. Clustering performance tends to improve as the number of anchors grows on all datasets. Among them, YaleB exhibits the highest sensitivity to the anchor size, whereas USPS and MNIST show relatively stable performance under different anchor settings.
The underlying reason may lie in the datasets’ properties. USPS and MNIST contain handwritten digits with comparatively uniform structures, enabling effective representation with fewer anchors. Conversely, the YaleB dataset includes face images with pronounced variations in illumination and expression, which demand a greater number of anchors to adequately characterize the data manifold.
In addition, the runtime of FUCSMF increases approximately linearly with the number of anchor points. This empirical observation is consistent with the theoretical analysis in Section 3.5, in which the cost of FUCSMF is linear in the number of anchors. Consequently, FUCSMF can efficiently accommodate a large number of anchor points, whereas many existing anchor-based methods become computationally prohibitive due to a complexity term that is cubic in the number of anchors.
4.8. Generated Graph
A comparison of the graph representations derived from FUCSMF and the traditional method in [12] on COIL20 is provided in Figure 12.
Figure 12a illustrates the bipartite sample–anchor graph, which solely reflects the relationships between samples and anchors. As a result, this representation appears sparse and does not exhibit explicit block structures, since no direct sample–sample connections are introduced at this stage.
As shown in Figure 12b, the adjacency matrix induced by the bipartite graph forms a pronounced block-diagonal pattern, which reflects the underlying cluster structure of the dataset. Compared with the normalized full adjacency matrix constructed by [12] in Figure 12c, the proposed method produces more compact and discriminative blocks, indicating that the hypergraph-based construction better preserves the intrinsic data structure.
4.9. Convergence Study
To examine the convergence behavior of FUCSMF, the objective function values across iterations are illustrated in Figure 13, Figure 14 and Figure 15. Each figure visualizes the convergence process over five external iterations. In these plots, colored dashed lines correspond to the inner optimization procedure using CG_DESCENT, while the gaps between dots of different colors indicate the simplex projection step for solving Equation (28). When only a single dot appears for a given color, the desired optimization accuracy has already been reached, and the CG_DESCENT procedure terminates accordingly. Note that five external iterations are shown solely for visualization purposes. In practice, FUCSMF terminates once the number of CG_DESCENT iterations becomes zero, as neither the representation matrix nor the bipartite graph will be further updated. One complete iteration spans from the dot of one color to that of the next.
As observed from the figures, FUCSMF converges within at most three external iterations on all datasets. Moreover, the inner optimization loop based on CG_DESCENT requires no more than four iterations to solve Equation (24). Consequently, the entire optimization process is completed within ten iterations across all datasets. This fast convergence behavior can be attributed to the convex nature of the proposed SMF objective function, which also explains the high computational efficiency reported in Section 4.5. Additionally, since the objective function value approaches its optimum during the first external iteration, fewer iterations are needed in subsequent inner loops.
Because the objective value declines sharply at the beginning of the optimization process, later iterations exhibit comparatively smaller changes. To enhance clarity, a detailed view of the objective trajectory between the first and second external iterations is provided in a separate subplot. The results demonstrate that updating the sample–anchor similarity matrix substantially lowers the objective function, further supporting the validity of the update rule in Equation (28).