Article

Fast Unconstraint Convex Symmetric Matrix for Semi-Supervised Learning

1
School of Electronic Information and Electrical Engineering, Chengdu University, Chengdu 610106, China
2
International Joint Research Center for Perception and Control of Intelligent Rehabilitation Systems of Sichuan Province, Chengdu 610106, China
3
Tianfu Jiangxi Laboratory, Chengdu 610041, China
4
School of Mathematical Sciences, East China Normal University, Shanghai 200241, China
5
State Key Laboratory of Estuarine and Coastal Research, East China Normal University, Shanghai 200241, China
*
Author to whom correspondence should be addressed.
Symmetry 2026, 18(4), 698; https://doi.org/10.3390/sym18040698
Submission received: 25 March 2026 / Revised: 14 April 2026 / Accepted: 16 April 2026 / Published: 21 April 2026
(This article belongs to the Section Computer)

Abstract

Symmetric matrix factorization (SMF) plays an important role in clustering and representation learning. Nevertheless, most existing SMF-based approaches are formulated as non-convex optimization problems, which often leads to unstable convergence and high computational costs. In this paper, we develop a fast unconstrained convex symmetric matrix factorization framework, termed FUCSMF, for semi-supervised learning. By incorporating label information into the symmetric factorization formulation, the proposed model is transformed into a convex objective, which guarantees global optimality and enables efficient optimization using standard unconstrained solvers. To further improve scalability, a bipartite graph structure is introduced into SMF from a hypergraph-inspired perspective, significantly reducing the computational burden. The resulting computational complexity is reduced to $O(nmd)$, which is substantially lower than the $O(nmd + m^2 n + m^3)$ complexity required by existing bipartite graph-based methods, where $n$, $m$, and $d$ denote the numbers of samples, anchor points, and feature dimensions, respectively. In addition, we propose a correntropy-based graph construction strategy to alleviate the sensitivity of conventional adaptive neighbor bipartite graph methods. Extensive experiments on six benchmark datasets, involving comparisons with eleven state-of-the-art methods, demonstrate that FUCSMF achieves superior clustering performance while requiring significantly less computational time. Empirical results further show that the proposed method converges rapidly, typically within ten iterations.

1. Introduction

Feature learning aims to automatically discover informative representations from raw data, thereby reducing the reliance on handcrafted features and improving downstream task performance [1,2,3]. As one of the fundamental techniques in feature learning, matrix factorization has been extensively studied and successfully applied in various domains, including data mining, pattern recognition, and information retrieval. In general, matrix factorization seeks to approximate a data matrix by the product of several low-rank matrices, enabling effective dimensionality reduction and compact representation learning. A variety of classical techniques have been developed under the matrix factorization framework, such as NMF [4,5], SVD [6], QR decomposition, LDA [7], PCA [8], ICA [9], and CF [10].
Among these approaches, NMF has attracted particular attention due to its part-based and interpretable representations. By enforcing non-negativity constraints on both basis and coefficient matrices, NMF produces additive components that are often consistent with human intuition. However, the strict non-negativity requirement limits its applicability to non-negative data and may lead to suboptimal approximations in certain low-rank scenarios. In comparison, classical methods such as SVD often achieve lower reconstruction errors under the same rank constraint. To mitigate this issue, Zhang et al. [11] proposed a low-rank matrix factorization model with orthogonality constraints to improve decomposition accuracy.
While conventional matrix factorization methods primarily focus on capturing global structures, they often overlook the intrinsic local manifold information embedded in data. To address this limitation, graph-regularized matrix factorization techniques have been widely investigated. For instance, GNMF and LCCF [12,13] incorporate graph Laplacian regularization to preserve local geometric structures. Yi et al. [14] proposed NMF-LCAG by jointly considering reconstruction fidelity and locality preservation. To enhance robustness against noise and outliers, Peng et al. [15] introduced GCCF based on the maximum correntropy criterion (MCC) [16]. Moreover, hypergraph-based extensions, such as CHNMF [17], further exploit high-order relationships among samples.
Most of the aforementioned approaches are formulated in an unsupervised manner and do not exploit available label information. In many practical scenarios, even limited supervision can substantially improve clustering performance. To this end, several constrained and semi-supervised matrix factorization models have been proposed. Liu et al. [18,19] introduced CNMF and CCF by enforcing samples from the same class to share similar representations. Zhang et al. [20] proposed NMFCC, which integrates must-link and cannot-link constraints into the factorization process. More generally, graph-based matrix factorization models can be extended to semi-supervised settings by incorporating pairwise constraints. Following this paradigm, Peng et al. [21,22] developed CSNMF and CSCF using adaptive neighbor assignment to construct informative adjacency graphs, while Zhou et al. [23] proposed CLMF by learning graph structures through sparsity-induced similarity (SIS).
Symmetric matrix factorization (SMF) can be viewed as a particular formulation of matrix factorization, in which an affinity matrix is decomposed into two identical low-rank factors. As affinity matrices primarily reflect neighborhood relations among samples, SMF-based models are inherently oriented toward capturing local structural characteristics. In recent years, a number of semi-supervised extensions of SMF have been proposed. For instance, SNMFCC [20] incorporates pairwise supervision by embedding must-link and cannot-link constraints into the affinity matrix, while PCPSNMF [24] simultaneously propagates pairwise constraints and updates the graph structure. To further enhance supervision, Chavoshinejad et al. introduced S4NMF by integrating self-supervised information into symmetric factorization. In a more recent study, Yin et al. [25] developed HSSNMF, which utilizes hypergraph-driven constraint propagation to encode complex higher-order interactions. Despite their promising performance, most existing SMF models remain non-convex, as their underlying objective can be simplified to minimizing $(1 - x^2)^2$, leading to considerable optimization difficulties.
In addition to optimization difficulties, graph-based and symmetric matrix factorization methods also face scalability issues due to the requirement of storing and processing an n × n affinity matrix. To improve scalability, bipartite graph-based methods have been proposed. Zhou et al. [26] introduced a bipartite graph-regularized robust low-rank matrix factorization (BLMF) framework for semi-supervised image clustering. Liu et al. [27] proposed the local anchor embedding (LAE) method to reduce computational cost, while Wang et al. [28] developed EAGR using an improved anchor graph construction strategy. Nie et al. [29] further applied bipartite graph modeling to graph-based semi-supervised learning (BGSSL). Nevertheless, directly extending bipartite graphs to symmetric matrix factorization remains non-trivial, since it ultimately requires recovering a fully connected n × n affinity matrix.
This paper presents a novel fast unconstrained convex symmetric matrix factorization (FUCSMF) framework with the following advantages:
  • The introduction of label information reformulates the symmetric matrix factorization objective into a convex optimization problem, which guarantees global optimality and leads to reliable and efficient convergence.
  • By integrating a bipartite graph mechanism into the symmetric factorization framework, the computational burden is substantially reduced. In particular, the resulting time complexity is $O(nmd)$, while most existing bipartite graph-based methods incur a higher cost of $O(nmd + m^2 n + m^3)$.
  • A novel adaptive graph learning framework is developed to alleviate the sensitivity of existing models to parameter selection, thereby improving robustness and practical applicability.
It is worth noting that the term “convex symmetric matrix factorization” refers to a reformulated optimization framework in which convexity is achieved through problem transformation, rather than being inherent to the original formulation.

2. Preliminaries

2.1. Bipartite Graph

Let $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$ be a $d$-dimensional data set with $n$ samples. A graph can be denoted by $\mathcal{G} = (X, E, G)$, where $E$ is the set of edges and $G \in \mathbb{R}^{n \times n}$ is a symmetric full adjacency matrix. If the edge $e = (i, j)$ exists, $g_{ij}$ is its weight; otherwise, $g_{ij} = 0$. Let $U = [u_1, u_2, \ldots, u_m] \in \mathbb{R}^{d \times m}$ be an $m$-sample anchor set, where $m < n$. A bipartite graph can be denoted by $\mathcal{B} = [X, S, U]$, where $S \in \mathbb{R}^{n \times m}$ is the adjacency matrix. If $x_i$ and $u_j$ are connected, $s_{ij}$ is the weight of the edge; otherwise, $s_{ij} = 0$. The full adjacency matrix of the bipartite graph is
$$G = \begin{bmatrix} 0 & S \\ S^{\top} & 0 \end{bmatrix}. \tag{1}$$
Accordingly, the Laplacian matrix of $G$ is defined as
$$L = D - G, \tag{2}$$
where $D \in \mathbb{R}^{(n+m) \times (n+m)}$ is the degree matrix of $G$, a diagonal matrix with $d_{ii} = \sum_{j=1}^{n+m} g_{ij}$.
The traditional approach to constructing a bipartite graph uses the k-nearest neighbors (kNN) [30] algorithm to connect each sample to $k$ anchor points. However, this construction lacks flexibility. Recently, a new bipartite graph construction method has been proposed:
$$\min_{s_i^{\top}\mathbf{1} = 1,\; s_i \ge 0} \; \sum_{j=1}^{m} \|x_i - u_j\|_2^2 \, s_{ij} + \frac{\gamma_i}{n} \sum_{j=1}^{m} s_{ij}^2. \tag{3}$$
With this method, samples are no longer forced to connect to the same number of anchor points, which improves performance. Let $y_{ij} = -n \|x_i - u_j\|_2^2 / (2\gamma_i)$. The above problem can be transformed into
$$\min_{s_i^{\top}\mathbf{1} = 1,\; s_i \ge 0} \; \|s_i - y_i\|_2^2. \tag{4}$$
The problem above can be solved by projecting $y_i$ onto the simplex. However, the solution of Equation (4) depends heavily on the parameter $\gamma$. Specifically, projecting onto the simplex amounts to finding a scalar $p$ such that $\sum_j \max(y_{ij} - p, 0) = 1$. Let $[y_{ij_1}, y_{ij_2}, \ldots, y_{ij_m}]$ be the entries of $y_i$ sorted in descending order and let ‘proj’ denote the projection onto the simplex. If $y_{ij_1} - y_{ij_2} \ge 1$, then the only nonzero entry is $\mathrm{proj}(y_i)_{j_1} = 1$. Here is a simple example:
$$y_i = [0, \; -1, \; -100, \; -2, \; -5], \qquad \mathrm{proj}(y_i) = [1, \; 0, \; 0, \; 0, \; 0]. \tag{5}$$
In this situation, the above method degenerates into a 1-nearest-neighbor graph, which empirically does not perform well. This problem can be addressed by choosing a larger $\gamma$; however, an excessively large $\gamma$ also leads to unsatisfactory results. For instance, if we increase $\gamma$ by a factor of 100 in the example above, the outcome is
$$y_i = [0, \; -0.01, \; -1, \; -0.02, \; -0.05], \qquad \mathrm{proj}(y_i) = [0.27, \; 0.26, \; 0, \; 0.25, \; 0.22]. \tag{6}$$
It can be observed that a large $\gamma$ connects more nodes, and the edge weights tend to become similar.
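The two projections above can be reproduced with a few lines of NumPy. The following is a minimal sketch, assuming the standard sort-based simplex projection; the function name `project_simplex` is illustrative and is not taken from the paper or from reference [34].

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection of y onto {s : s >= 0, sum(s) = 1} (sort-based)."""
    y = np.asarray(y, dtype=float)
    u = np.sort(y)[::-1]              # entries in descending order
    css = np.cumsum(u) - 1.0          # cumulative sums minus the target mass
    k = np.arange(1, y.size + 1)
    rho = k[u - css / k > 0][-1]      # number of nonzero entries in the projection
    p = css[rho - 1] / rho            # scalar p with sum(max(y - p, 0)) = 1
    return np.maximum(y - p, 0.0)

# Small gamma: the gap between the two largest entries reaches one.
small_gamma = project_simplex([0, -1, -100, -2, -5])
# Gamma increased by a factor of 100: the entries of y shrink by 100.
large_gamma = project_simplex([0, -0.01, -1, -0.02, -0.05])

print(small_gamma)
print(np.round(large_gamma, 2))
```

The first call places all weight on the nearest anchor, while the second spreads nearly equal weight over four anchors, which is the sensitivity to $\gamma$ described above.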

2.2. Matrix Factorization

Matrix factorization seeks to represent a given data matrix through the product of multiple low-rank matrices, which can be formulated as
$$X \approx WF, \tag{7}$$
where $W = [w_1, w_2, \ldots, w_c] \in \mathbb{R}^{d \times c}$, $F = [f_1, f_2, \ldots, f_n] \in \mathbb{R}^{c \times n}$ and $c \ll n$. From a modeling perspective, MF approximates each sample $x_i$ as a linear reconstruction based on $W$ and its corresponding coefficient vector $f_i$. Under this formulation, $W$ serves as a set of basis components, while $F$ collects the associated representation coefficients. When the latent dimension satisfies $c \ll n$, the resulting factorization provides a compact representation of the original data. Typically, the reconstruction error between $W f_i$ and $x_i$ is quantified using the Frobenius norm:
$$\min_{W, F} \; \|X - WF\|_F^2. \tag{8}$$
The above problem is non-convex due to the bilinear coupling between $W$ and $F$, which generally leads to multiple equivalent factorizations. To illustrate this property, consider the extreme case where $d = n = 1$ and $X = 1$, in which the problem reduces to minimizing $(1 - xy)^2$, a function of the bilinear term $xy$ that admits infinitely many minimizers (every pair with $xy = 1$). To obtain meaningful and interpretable factorizations, additional constraints are commonly imposed on $W$ and $F$, among which non-negativity constraints give rise to the well-known non-negative matrix factorization (NMF).
Symmetric MF is a special case of matrix factorization that directly decomposes a symmetric graph matrix, and it can be formulated as
$$\min_{F} \; \|G - F^{\top}F\|_F^2. \tag{9}$$
Similar to the traditional MF method, it is non-convex.
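The non-convexity can be checked directly in the scalar case, where Equation (9) reduces to $(g - x^2)^2$. The snippet below is a small illustrative sketch (the helper name `smf_scalar` and the choice $g = 1$ are assumptions made for the example): a convex function can never lie above the chord joining two of its points, yet here the midpoint value exceeds the chord value.

```python
def smf_scalar(x, g=1.0):
    """Scalar symmetric factorization objective (g - x^2)^2."""
    return (g - x * x) ** 2

left, right = -1.0, 1.0                      # the two global minimizers for g = 1
midpoint_value = smf_scalar(0.5 * (left + right))
chord_value = 0.5 * (smf_scalar(left) + smf_scalar(right))

# Convexity would require midpoint_value <= chord_value.
print(midpoint_value, chord_value)
```

The two symmetric minimizers $x = \pm 1$ are separated by a local maximum at $x = 0$, which is the scalar counterpart of the sign ambiguity of $F$ in Equation (9).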

2.3. Correntropy

Correntropy is a similarity measure derived from information theoretic learning (ITL) [16]. As a localized statistical metric, it is robust to large deviations and outliers. Due to its capability to capture localized similarity information, correntropy has been extensively employed in tasks such as shape matching, face analysis, clustering, and subspace representation learning [26,31]. The function $V_\sigma(x, y)$ is intrinsically associated with Rényi’s entropy of order two. Given a pair of random variables $x$ and $y$, correntropy can be formulated as
$$V_\sigma(x, y) = \mathbb{E}\left[ k_\sigma(x - y) \right], \tag{10}$$
where $k_\sigma(\cdot)$ denotes a translation-invariant Mercer kernel and $\sigma$ controls the kernel bandwidth. Throughout this paper, the Gaussian kernel is employed in the correntropy formulation due to its smooth behavior, strict positive definiteness, and computational efficiency. The kernel function can be written as
$$k_\sigma(a - b) = \exp\left( -\frac{(a - b)^2}{2\sigma^2} \right). \tag{11}$$
In practical scenarios, where data are available in discrete form, correntropy can be approximated using a finite set of samples as
$$\hat{V}_\sigma(x, y) = \frac{1}{n} \sum_{i=1}^{n} k_\sigma(x_i - y_i), \tag{12}$$
where $\{(x_i, y_i)\}_{i=1}^{n}$ is the sample set.
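As an illustration of the robustness claim, the sample estimator in Equation (12) can be sketched as follows. The function names and the toy data are assumptions made for this example only; a single gross outlier barely changes the correntropy estimate, whereas it dominates the mean squared error.

```python
import numpy as np

def gaussian_kernel(e, sigma):
    """Gaussian kernel of Equation (11) applied elementwise to the error e."""
    return np.exp(-np.square(e) / (2.0 * sigma ** 2))

def empirical_correntropy(x, y, sigma=1.0):
    """Sample estimator of correntropy, Equation (12)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(gaussian_kernel(x - y, sigma)))

x = np.zeros(10)
y_clean = np.zeros(10)
y_outlier = y_clean.copy()
y_outlier[0] = 1e3                      # one grossly corrupted sample

v_clean = empirical_correntropy(x, y_clean)
v_outlier = empirical_correntropy(x, y_outlier)
mse_outlier = float(np.mean((x - y_outlier) ** 2))

print(v_clean, v_outlier, mse_outlier)
```

Each sample contributes at most $1/n$ to the estimate, so the outlier can lower the correntropy by no more than 0.1 here, while the mean squared error grows with the square of the deviation.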

3. Methodology

3.1. Bipartite Graph-Based Unconstrained Symmetric Factorization

Because traditional graphs have high memory requirements and $O(n^2)$ or even $O(n^3)$ computational complexity, this section uses the concept of a hypergraph to construct a low-memory graph. An unconstrained convex symmetric matrix factorization is then proposed.
According to [32], in an unweighted hypergraph, the high-order relationships among vertices are encoded by an incidence matrix $H$, where each entry indicates the membership between a vertex and a hyperedge. Based on this representation, the hypergraph adjacency matrix induced by the incidence structure can be formulated as
$$H \Lambda^{-1} H^{\top}. \tag{13}$$
Here, $\Lambda$ denotes a diagonal matrix, with its diagonal elements representing the degrees of the corresponding hyperedges.
In its basic form, the incidence matrix $H$ is binary-valued, i.e., its entries lie in $\{0, 1\}$. Nevertheless, as pointed out in [33], $H$ can also be generalized to a continuous-valued matrix with entries in the interval $[0, 1]$, which provides greater flexibility in characterizing vertex–hyperedge relationships.
Given the resulting adjacency matrix, the hypergraph Laplacian can be constructed as
$$L = D_v - H \Lambda^{-1} H^{\top}, \tag{14}$$
where $D_v$ denotes a diagonal matrix whose diagonal elements correspond to the degrees of the vertices. The normalized Laplacian matrix is $L = I - D_v^{-1/2} H \Lambda^{-1} H^{\top} D_v^{-1/2}$.
Anchors can also be seen as a type of hyperedge, as they connect to multiple samples. Figure 1 illustrates the use of anchors as hyperedges, with a total of seven samples connected to four anchors (hyperedges). Each sample is linked to two anchors. It can be observed that each hyperedge connects three to four samples, allowing for the extraction of higher-order information between the samples. Therefore, the proposed graph takes the following form:
$$G = S \Lambda^{-1} S^{\top}, \tag{15}$$
where $S$ represents the sample–anchor similarity matrix, while $\Lambda$ is a diagonal matrix whose diagonal elements are given by $\Lambda_{ii} = \sum_{j=1}^{n} s_{ji}$. In practice, $S$ is commonly normalized along its rows. Consequently, according to Theorem 1, the resulting graph $G$ satisfies the self-normalization property.
Theorem 1.
Let $S$ be a matrix satisfying $\sum_j S_{ij} = 1$ for all $i$, and let $\Lambda$ be a diagonal matrix whose diagonal entries are defined as the column sums of $S$. Then, the matrix $S \Lambda^{-1} S^{\top}$ is symmetric and each of its rows sums to one.
Proof of Theorem 1.
It is straightforward to verify that $S \Lambda^{-1} S^{\top}$ is symmetric. Moreover, the row-wise summation is given by
$$\sum_{j=1}^{n} \sum_{k=1}^{m} \frac{s_{ik} s_{jk}}{d_k} = \sum_{k=1}^{m} \frac{\sum_{j=1}^{n} s_{ik} s_{jk}}{\sum_{j=1}^{n} s_{jk}} = \sum_{k=1}^{m} s_{ik} = 1, \tag{16}$$
where $s_{ij}$ refers to the $(i, j)$-th entry of $S$, and $d_k$ is defined as the sum of the $k$-th column of $S$.
In addition, the matrix $S \Lambda^{-1} S^{\top}$ is positive semidefinite. This follows from the fact that $\Lambda^{-1}$ is a diagonal matrix with non-negative entries, and the product can be written as $S \Lambda^{-1} S^{\top} = (S \Lambda^{-1/2})(S \Lambda^{-1/2})^{\top}$, which is of the form $B B^{\top}$; hence, it is always positive semidefinite.
By Theorem 1, the matrix $S \Lambda^{-1} S^{\top}$ possesses symmetry and satisfies the row-normalization property. Therefore, its corresponding degree matrix is simply the identity matrix $I$.
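The three properties stated above (symmetry, unit row sums, positive semidefiniteness) are easy to confirm numerically. The sketch below uses a random row-normalized $S$ with seven samples and four anchors, matching the sizes in Figure 1; the random data and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 7, 4                               # seven samples, four anchors
S = rng.random((n, m))
S /= S.sum(axis=1, keepdims=True)         # normalize rows so each sums to one

col_sums = S.sum(axis=0)                  # diagonal of Lambda
G = (S / col_sums) @ S.T                  # G = S Lambda^{-1} S^T

is_symmetric = np.allclose(G, G.T)
row_sums = G.sum(axis=1)
min_eig = np.linalg.eigvalsh(G).min()     # smallest eigenvalue of the symmetric G

print(is_symmetric, row_sums, min_eig)
```

Because the rows of $G$ sum to one, the Laplacian of this graph is simply $I - G$, which is what allows the regularizer to be evaluated without forming a separate degree matrix.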
Motivated by the above analysis, we formulate a symmetric matrix factorization framework based on a bipartite graph as follows:
$$\min_{F} \; \|S \Lambda^{-1} S^{\top} - F^{\top}F\|_F^2. \tag{17}$$
It is important to emphasize that the preceding formulation is purely unsupervised and does not utilize supervision from labeled data. To incorporate available supervision in a principled manner, we introduce a semi-supervised formulation by expressing $F$ as $F = ZA$, where $Z = [\, I \;\; \tilde{Z} \,] \in \mathbb{R}^{c \times (n - l + c)}$ and $l$ denotes the number of labeled samples. Here, $\tilde{Z} \in \mathbb{R}^{c \times (n - l)}$ serves as an auxiliary matrix encoding the information of unlabeled samples. The matrix $A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} \in \mathbb{R}^{(n - l + c) \times n}$ is constructed to incorporate both label supervision and positional information. Specifically, if the $j$-th sample is labeled as belonging to the $i$-th class, then $(A_1)_{ij} = 1$, and $(A_1)_{ij} = 0$ otherwise. For unlabeled samples, the corresponding columns of $A_1$ are zero. In this case, $(A_2)_{kj} = 1$ if the $j$-th sample corresponds to the $k$-th unlabeled instance, and $(A_2)_{kj} = 0$ otherwise. The matrix $A$ is determined by the label information and is treated as a fixed transformation throughout the optimization process. Since $A$ does not depend on the optimization variable, it does not alter the convexity properties of the objective function with respect to the variable $Z$. Therefore, the introduction of $A$ preserves the structural properties of the optimization problem.
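To clarify the formulation, suppose a dataset consists of eight samples belonging to three categories. Among them, $x_3$ is assigned to class I, $x_1$ and $x_6$ belong to class II, and $x_7$ corresponds to class III. Under this setting, the two blocks of the associated matrix $A$ can be written as
$$A_1 = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \qquad A_2 = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}. \tag{18}$$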
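The construction of $A_1$ and $A_2$ for this eight-sample example can be sketched as follows. The helper `build_constraint_matrices` and the convention of marking unlabeled samples with `-1` are assumptions introduced for illustration; they are not part of the paper.

```python
import numpy as np

def build_constraint_matrices(labels, c):
    """Build A1 (c x n) and A2 ((n - l) x n) from a label vector.

    labels[j] is a class index in {0, ..., c-1} for labeled samples and -1
    for unlabeled samples (an encoding chosen for this sketch).
    """
    labels = np.asarray(labels)
    n = labels.size
    labeled = np.flatnonzero(labels >= 0)
    unlabeled = np.flatnonzero(labels < 0)
    A1 = np.zeros((c, n))
    A1[labels[labeled], labeled] = 1.0                   # one-hot columns for labeled samples
    A2 = np.zeros((unlabeled.size, n))
    A2[np.arange(unlabeled.size), unlabeled] = 1.0       # position of each unlabeled sample
    return A1, A2

# x3 -> class I, x1 and x6 -> class II, x7 -> class III (classes 0, 1, 2 here).
labels = [1, -1, 0, -1, -1, 1, 2, -1]
A1, A2 = build_constraint_matrices(labels, c=3)

# F = Z A with Z = [I, Z_tilde]: labeled columns are fixed one-hot vectors,
# unlabeled columns are copied from Z_tilde.
Z_tilde = np.random.default_rng(0).random((3, A2.shape[0]))
Z = np.hstack([np.eye(3), Z_tilde])
A = np.vstack([A1, A2])
F = Z @ A

print(A1)
print(A2)
```

Two identities that follow from this construction, $A_1 A_2^{\top} = 0$ and $A_2 A_2^{\top} = I$, are what make the later expansion of the objective in terms of $\tilde{Z}$ possible.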
Given that the labeled samples are already determined and do not need to be optimized, it is reasonable to fix them in a one-hot form. Consequently, the semi-supervised symmetric matrix factorization model under the bipartite graph structure can be written as
$$\min_{\tilde{Z}} \; \|S \Lambda^{-1} S^{\top} - A^{\top} Z^{\top} Z A\|_F^2. \tag{19}$$
To maintain the intrinsic neighborhood structure of the data during subspace learning, we introduce a Laplacian-based regularization term $\mathrm{Tr}(F L F^{\top})$ into the objective. This term encourages nearby samples in the original space to remain close after projection into the latent representation. The Laplacian matrix is constructed as $L = D - W$, where $W$ encodes pairwise similarities between samples and $D$ is the diagonal degree matrix with entries $D_{ii} = \sum_j W_{ij}$. The above regularization can also be expressed in an equivalent form as
$$\sum_{i=1}^{n} \sum_{j=1}^{n} \|f_i - f_j\|_2^2 \, w_{ij} = 2\, \mathrm{Tr}(F L F^{\top}). \tag{20}$$
Since the degree matrix of $S \Lambda^{-1} S^{\top}$ is the identity, the Laplacian is $L = I - S \Lambda^{-1} S^{\top}$, and the regularization term takes the following form:
$$\mathrm{Tr}\left( F (I - S \Lambda^{-1} S^{\top}) F^{\top} \right) = \|F\|_F^2 - \|F \tilde{S}\|_F^2, \tag{21}$$
where $\tilde{S} = S \Lambda^{-1/2}$. This term penalizes large differences between the feature representations of similar samples. By introducing this term, the model not only maintains the convex property of the optimization problem but also enhances the smoothness of the learned manifold structure and the discriminative power of the representation.
Therefore, after incorporating the graph regularization term, the overall objective function can be reformulated as follows:
$$\min_{\tilde{Z}} \; \|S \Lambda^{-1} S^{\top} - A^{\top} Z^{\top} Z A\|_F^2 + \lambda\, \mathrm{Tr}(Z A L A^{\top} Z^{\top}). \tag{22}$$
It is worth noting that the inclusion of the matrix A corresponds to a linear transformation applied to the optimization variable, which preserves the convexity structure of the objective function. □

3.2. Analysis of Convexity

Firstly, we give the following theorem to show that Equation (22) is convex. Since the matrix A is fixed, the convexity analysis focuses on the optimization variable, and the presence of A does not affect the convexity of each term. For readability, we provide proof sketches in the main text and defer detailed derivations to the appendix.
Theorem 2.
The objective function in Equation (22) is convex.
The key idea is to analyze the second-order behavior of the objective function along an arbitrary direction. By introducing a univariate function along a line in the parameter space, the convexity reduces to showing the non-negativity of the second derivative.
The objective consists of a quartic term, a quadratic regularization term, and a coupling term involving the bipartite graph structure. The quartic and quadratic terms naturally contribute positive semidefinite components to the Hessian. For the coupling term, we leverage the positive semidefiniteness of $\tilde{S} \tilde{S}^{\top}$ and the structure of $A_2$ to show that it also yields a non-negative contribution.
Combining these results, the overall Hessian is positive semidefinite, which implies convexity of the objective function. The detailed derivation is provided in Appendix A.
Having established the convexity of Equation (22), any local minimizer is also a global minimizer; however, strict convexity and uniqueness of the solution are not explicitly guaranteed under the current formulation. Since the objective is both convex and differentiable, the key remaining question concerns the uniqueness of the stationary solution. For subsequent analysis, Equation (22) can be rewritten in the following expanded form:
$$\begin{aligned} &\min_{\tilde{Z}} \; \|\tilde{S} \tilde{S}^{\top}\|_F^2 + \lambda \|ZA\|_F^2 - (2 + \lambda)\, \mathrm{Tr}\left( Z A \tilde{S} \tilde{S}^{\top} A^{\top} Z^{\top} \right) + \|Z A A^{\top} Z^{\top}\|_F^2 \\ &= \|\tilde{S} \tilde{S}^{\top}\|_F^2 + \lambda \left( l + \|\tilde{Z}\|_F^2 \right) - (2 + \lambda) \|Z A \tilde{S}\|_F^2 + \|Z A A^{\top} Z^{\top}\|_F^2 \\ &= \lambda \|\tilde{Z}\|_F^2 - (2 + \lambda) \|A_1 \tilde{S} + \tilde{Z} A_2 \tilde{S}\|_F^2 + \|\Upsilon + \tilde{Z} \tilde{Z}^{\top}\|_F^2 + \|\tilde{S} \tilde{S}^{\top}\|_F^2 + \lambda l, \end{aligned} \tag{23}$$
where $\Upsilon = A_1 A_1^{\top}$, $\tilde{Z} \in \mathbb{R}^{c \times (n - l)}$ denotes the auxiliary matrix associated with unlabeled samples, as defined previously, and $\tilde{S} = S \Lambda^{-1/2}$ denotes the normalized bipartite graph matrix defined earlier. Discarding all terms unrelated to $\tilde{Z}$, we obtain
$$\min_{\tilde{Z}} \; \lambda \|\tilde{Z}\|_F^2 - (2 + \lambda) \|A_1 \tilde{S} + \tilde{Z} A_2 \tilde{S}\|_F^2 + \|\Upsilon + \tilde{Z} \tilde{Z}^{\top}\|_F^2. \tag{24}$$
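The algebra leading from Equation (22) to Equation (24) can be sanity-checked numerically. The sketch below evaluates both forms on random data and confirms that they differ only by the constant $\|\tilde{S}\tilde{S}^{\top}\|_F^2 + \lambda l$; the random inputs, the label encoding, and all variable names are assumptions made for this check.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, c, lam = 8, 3, 3, 100.0
labels = np.array([1, -1, 0, -1, -1, 1, 2, -1])      # -1 marks unlabeled samples
lab = np.flatnonzero(labels >= 0)
unl = np.flatnonzero(labels < 0)
l = lab.size

A1 = np.zeros((c, n)); A1[labels[lab], lab] = 1.0
A2 = np.zeros((unl.size, n)); A2[np.arange(unl.size), unl] = 1.0

S = rng.random((n, m)); S /= S.sum(axis=1, keepdims=True)
S_t = S / np.sqrt(S.sum(axis=0))                     # S_tilde = S Lambda^{-1/2}
G = S_t @ S_t.T                                      # S Lambda^{-1} S^T

Z_tilde = rng.standard_normal((c, unl.size))
F = A1 + Z_tilde @ A2                                # F = Z A with Z = [I, Z_tilde]

def fro2(M):
    """Squared Frobenius norm."""
    return float(np.sum(M * M))

# Equation (22): fit term plus Laplacian regularizer with L = I - G.
eq22 = fro2(G - F.T @ F) + lam * float(np.trace(F @ (np.eye(n) - G) @ F.T))

# Equation (24): the part that depends on Z_tilde.
upsilon = A1 @ A1.T
eq24 = (lam * fro2(Z_tilde)
        - (2.0 + lam) * fro2(A1 @ S_t + Z_tilde @ A2 @ S_t)
        + fro2(upsilon + Z_tilde @ Z_tilde.T))

constant = fro2(G) + lam * l                         # terms discarded in Equation (24)
print(eq22, eq24 + constant)
```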
Theorem 3.
The objective function in Equation (24) has only one stationary point.
The proof is based on analyzing the monotonicity of the gradient mapping. Specifically, we examine the inner product between the gradient difference and the variable difference for two arbitrary points.
The gradient difference can be decomposed into several terms corresponding to the cubic, quadratic, and graph-related components of the objective function. Each term is shown to contribute non-negatively: the quadratic and regularization terms are straightforward, while the graph-related term is bounded using the positive semidefinite property established in Lemma A2. The cubic term is handled using standard matrix inequalities.
As a result, the gradient mapping exhibits monotonic behavior, which provides insights into the structure of stationary points. The detailed proof is given in Appendix B.

3.3. Correntropy-Based Adaptive Graph Learning

To address the issues raised in Equation (3), correntropy is introduced into the bipartite learning framework. Correntropy takes values between zero and one and maps the convex function $x^2$ to the quasi-concave function $k_\sigma(x)$; thus, we replace the squared $\ell_2$ distance with the Correntropy-Induced Distance (CID) $1 - k_\sigma(\cdot)$. Figure 2 illustrates the function curves of $1 - k_\sigma(x)$ and $x^2$. It can be observed that the two curves almost coincide near zero. This can also be mathematically proven.
The correntropy-based formulation introduces a nonlinear similarity measure, which enhances robustness to noise. However, the global convexity of the objective function is not strictly preserved under this substitution.
Theorem 4.
The rate at which $1 - k_\sigma(x)$ approaches zero as $x \to 0$ is of the same order as the rate at which $x^2$ approaches zero.
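One way to see why this holds is a Taylor expansion of the Gaussian kernel around zero. The following is a short sketch of the argument, not necessarily the proof given by the authors:

```latex
\begin{aligned}
1 - k_\sigma(x) &= 1 - \exp\!\left(-\frac{x^2}{2\sigma^2}\right)
 = \frac{x^2}{2\sigma^2} - \frac{x^4}{8\sigma^4} + O(x^6), \\
\lim_{x \to 0} \frac{1 - k_\sigma(x)}{x^2} &= \frac{1}{2\sigma^2}.
\end{aligned}
```

The limit is a finite, nonzero constant, so the two functions vanish at the same quadratic rate. For the particular bandwidth $\sigma^2 = 1/2$ the constant equals one, and the two curves agree up to fourth-order terms near zero.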
Because the CID is bounded between zero and one, the difference between the two largest entries is also bounded, which effectively avoids the degenerate case in which the nearest anchor receives a weight of one. Then, the bipartite graph learning framework becomes
$$\min_{s_i^{\top}\mathbf{1} = 1,\; s_i \ge 0} \; \sum_{j=1}^{m} \left( 1 - k_\sigma(x_i - u_j) \right) s_{ij} + \gamma_i \sum_{j=1}^{m} s_{ij}^2. \tag{25}$$
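For a single sample, the problem above is again a simplex projection: completing the square gives $s_i = \mathrm{proj}(-c_i / (2\gamma_i))$ with costs $c_{ij} = 1 - k_\sigma(x_i - u_j)$. The sketch below illustrates this under two assumptions that are not stated in the text: the kernel is applied to the Euclidean norm of $x_i - u_j$, and anchors are stored as rows. The function names are hypothetical.

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection onto {s : s >= 0, sum(s) = 1} (sort-based)."""
    y = np.asarray(y, dtype=float)
    u = np.sort(y)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, y.size + 1)
    rho = k[u - css / k > 0][-1]
    return np.maximum(y - css[rho - 1] / rho, 0.0)

def correntropy_row(x_i, anchors, sigma=1.0, gamma=1.0):
    """One row of S: minimize sum_j c_j s_j + gamma * sum_j s_j^2 on the simplex."""
    d2 = np.sum((anchors - x_i) ** 2, axis=1)           # squared distances to anchors
    cost = 1.0 - np.exp(-d2 / (2.0 * sigma ** 2))       # CID, bounded between 0 and 1
    return project_simplex(-cost / (2.0 * gamma))

# One-dimensional toy data whose squared distances are 0, 1, 100, 2, 5.
x_i = np.array([0.0])
anchors = np.array([[0.0], [1.0], [10.0], [np.sqrt(2.0)], [np.sqrt(5.0)]])
s = correntropy_row(x_i, anchors, sigma=1.0, gamma=1.0)

print(np.round(s, 3))
```

With these toy distances the sample connects to four of the five anchors, and only the far-away anchor receives zero weight, because the bounded costs keep the entries of $-c_i / (2\gamma_i)$ close together.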
By introducing bipartite graph learning, the proposed method is as follows:
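$$\min_{\tilde{Z},\; s_i^{\top}\mathbf{1} = 1,\; s_i \ge 0} \; \lambda \|\tilde{Z}\|_F^2 + \|\Upsilon + \tilde{Z} \tilde{Z}^{\top}\|_F^2 + (2 + \lambda) \left( -\sum_{j=1}^{m} k_\sigma(x_i - u_j)\, s_{ij} + \gamma_i \sum_{j=1}^{m} s_{ij}^2 - \|A_1 \tilde{S} + \tilde{Z} A_2 \tilde{S}\|_F^2 \right). \tag{26}$$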
It is worth noting that the correntropy function behaves similarly to a quadratic function in a local neighborhood around zero, which provides a local approximation to convexity. This property helps maintain stable optimization behavior in practice.
Although strict global convexity is not guaranteed, empirical results demonstrate that the proposed optimization scheme converges reliably, indicating that the objective remains well-behaved in practice.

3.4. Optimization

In this section, the proposed model is optimized.

3.4.1. Fix $\tilde{Z}$, Update $S$

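With $\tilde{Z}$ fixed and the terms unrelated to $S$ ignored, the problem becomes
$$\min_{s_i^{\top}\mathbf{1} = 1,\; s_i \ge 0} \; -\sum_{j=1}^{m} k_\sigma(x_i - u_j)\, s_{ij} + \gamma_i \sum_{j=1}^{m} s_{ij}^2 - \frac{1}{\Lambda_{ii}} \left( \sum_{j=1}^{m} s_{ij}^2 \sum_{k=1}^{c} s_{ij} + \sum_{j=1}^{m} s_{ij} \sum_{k=1}^{c} \sum_{q=1, q \neq i}^{n} h_{kj} h_{kq} s_{qj} \right). \tag{27}$$
To simplify the model and reduce the parameter count, we set $\gamma_i = \bar{\gamma} + \sum_{k=1}^{c} s_{ij} / \Lambda_{ii}$. By setting $d_{ij} = \frac{1}{\Lambda_{ii}} \sum_{k=1}^{c} \sum_{q=1, q \neq i}^{n} h_{kj} h_{kq} s_{qj} + k_\sigma(x_i - u_j)$, the above problem can be reformulated as
$$\min_{s_i^{\top}\mathbf{1} = 1,\; s_i \ge 0} \; \left\| s_i - \frac{d_i}{2\bar{\gamma}} \right\|_2^2. \tag{28}$$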
The optimization problem can be addressed using an iterative scheme [34].

3.4.2. Fix $S$, Update $\tilde{Z}$

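With $S$ fixed, the subproblem in $\tilde{Z}$ is Equation (24), whose gradient with respect to $\tilde{Z}$ is
$$2 \left( 2\Upsilon + 2 \tilde{Z} \tilde{Z}^{\top} + \lambda I \right) \tilde{Z} - 2 (2 + \lambda) \left( A_1 \tilde{S} + \tilde{Z} A_2 \tilde{S} \right) \tilde{S}^{\top} A_2^{\top}. \tag{29}$$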
Since the objective function is convex and free of explicit constraints, it can be efficiently optimized using standard unconstrained optimization techniques. In this work, we adopt CG_DESCENT 6.8 [35,36,37], a nonlinear conjugate gradient solver designed for large-scale smooth optimization problems. This algorithm has been extensively studied in the literature and is known to possess global convergence guarantees under standard conditions such as Lipschitz continuity of the gradient and appropriate line search strategies. CG_DESCENT operates in an iterative manner and requires only basic vector operations, such as inner products and vector additions, along with the evaluation of gradients and objective values at each iteration. Owing to its computational efficiency and scalability, it is well suited for the proposed framework.
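Before handing an objective and gradient to any unconstrained solver, it is common practice to compare the analytic gradient with a finite-difference estimate. The sketch below does this for the $\tilde{Z}$ subproblem on random data; it does not call CG_DESCENT, and the test data and function names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, c, lam = 8, 3, 3, 100.0
labels = np.array([1, -1, 0, -1, -1, 1, 2, -1])      # -1 marks unlabeled samples
lab = np.flatnonzero(labels >= 0)
unl = np.flatnonzero(labels < 0)
A1 = np.zeros((c, n)); A1[labels[lab], lab] = 1.0
A2 = np.zeros((unl.size, n)); A2[np.arange(unl.size), unl] = 1.0
S = rng.random((n, m)); S /= S.sum(axis=1, keepdims=True)
S_t = S / np.sqrt(S.sum(axis=0))                     # S_tilde = S Lambda^{-1/2}
upsilon = A1 @ A1.T

def objective(Zt):
    """Z_tilde-dependent part of the objective (fit, coupling, quartic terms)."""
    R = A1 @ S_t + Zt @ A2 @ S_t
    Q = upsilon + Zt @ Zt.T
    return lam * np.sum(Zt * Zt) - (2.0 + lam) * np.sum(R * R) + np.sum(Q * Q)

def gradient(Zt):
    """Analytic gradient matching the expression given in the text."""
    R = A1 @ S_t + Zt @ A2 @ S_t
    return (2.0 * (2.0 * upsilon + 2.0 * Zt @ Zt.T + lam * np.eye(c)) @ Zt
            - 2.0 * (2.0 + lam) * R @ S_t.T @ A2.T)

Zt = rng.standard_normal((c, unl.size))
numeric = np.zeros_like(Zt)
eps = 1e-5
for idx in np.ndindex(*Zt.shape):                    # central differences, entry by entry
    step = np.zeros_like(Zt)
    step[idx] = eps
    numeric[idx] = (objective(Zt + step) - objective(Zt - step)) / (2.0 * eps)

analytic = gradient(Zt)
rel_err = np.linalg.norm(numeric - analytic) / np.linalg.norm(analytic)
print(rel_err)
```

A small relative error indicates that the objective and gradient routines are mutually consistent, which is the only interface a conjugate gradient solver needs.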
Algorithm 1 describes the detailed optimization steps of our proposed method.
Algorithm 1 FUCSMF
  1: Input: Data matrix $X \in \mathbb{R}^{d \times n}$, constraint matrix $A$, parameter $\lambda$.
  2: Output: Clustering indicator matrix (representation matrix) $ZA$ with $Z = [\, I \;\; \tilde{Z} \,]$.
  3: Initialize the anchor point set $U$ by k-means.
  4: while not converged do
  5:     Calculate each row of $S$ by solving Equation (28).
  6:     Calculate $\Lambda$ with $\Lambda_{ii} = \sum_{j=1}^{n} s_{ji}$.
  7:     Calculate $\tilde{Z}$ by CG_DESCENT 6.8.
  8: end while
In this work, we adopt CG_DESCENT as an off-the-shelf solver and do not re-establish its convergence properties. Instead, we focus on formulating a suitable objective function that can be efficiently optimized using this method.

3.5. Computational Complexity

The overall computational cost of FUCSMF can be decomposed into several major components:
1. Computing the anchor points requires $O(mnd)$ operations.
2. Evaluating the distances between data samples and anchors incurs a complexity of $O(mnd)$.
3. Constructing the bipartite graph costs $O(kmn)$, where $k$ denotes the number of nearest neighbors.
4. Evaluating the objective function with respect to $\tilde{Z}$ requires $O(c^2 n + cmn)$ operations.
5. Computing the gradient with respect to $\tilde{Z}$ has the same computational complexity of $O(c^2 n + cmn)$.
To clarify the computational complexity, we analyze the evaluation cost of the objective function and its gradient in detail. The main computational burden arises from the term $\|S \Lambda^{-1} S^{\top} - A^{\top} Z^{\top} Z A\|_F^2$, where the dominant cost lies in computing $Z Z^{\top}$ and the subsequent matrix multiplications. Specifically, computing $Z Z^{\top}$ requires $O(c^2 n)$ operations, since $Z \in \mathbb{R}^{c \times n}$. In addition, the multiplication involving the bipartite graph matrix $S \in \mathbb{R}^{n \times m}$ introduces a cost of $O(cmn)$ due to the interactions between samples and anchor points. Therefore, the overall complexity for evaluating the objective function is $O(c^2 n + cmn)$. Similarly, the gradient computation involves matrix products with the same structure, leading to an equivalent computational complexity. As a result, both the objective function evaluation and gradient computation share the same time complexity of $O(c^2 n + cmn)$.
Given that $c \ll \min(m, n, d)$ and $k \ll \min(m, n, d)$, the leading term in the computational complexity of FUCSMF reduces to $O(mnd)$. By comparison, representative bipartite graph-based methods such as BGSSL and EAGR incur additional higher-order terms, resulting in a complexity of $O(ndm + nm^2 + m^3)$. In addition, the computation of the correntropy-induced metric involves evaluating kernel functions for each element, which introduces an additional cost of $O(nm)$. However, this cost is linear with respect to the data size and is dominated by the matrix multiplication terms $O(cmn)$ in the overall complexity. Therefore, the inclusion of correntropy does not change the overall computational complexity.
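The complexity argument rests on never forming the $n \times n$ graph explicitly. The following sketch compares a factored evaluation of the fit and regularization terms, which only touches $c \times m$, $c \times c$, and $m \times m$ matrices, with a dense reference evaluation. The problem sizes and random data are assumptions chosen so that the dense version still fits in memory.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, c = 400, 30, 5
S = rng.random((n, m)); S /= S.sum(axis=1, keepdims=True)
S_t = S / np.sqrt(S.sum(axis=0))              # S_tilde = S Lambda^{-1/2}, n x m
F = rng.standard_normal((c, n))               # any c x n representation

def fro2(M):
    """Squared Frobenius norm."""
    return float(np.sum(M * M))

# Factored evaluation: O(c m n + c^2 n + m^2 n), no n x n matrix is formed.
FS = F @ S_t                                  # c x m
cheap_fit = fro2(S_t.T @ S_t) - 2.0 * fro2(FS) + fro2(F @ F.T)
cheap_reg = fro2(F) - fro2(FS)

# Dense reference: builds the full n x n graph.
G = S_t @ S_t.T
dense_fit = fro2(G - F.T @ F)
dense_reg = float(np.trace(F @ (np.eye(n) - G) @ F.T))

print(cheap_fit, dense_fit)
print(cheap_reg, dense_reg)
```

In this sketch the $m \times m$ product $\tilde{S}^{\top}\tilde{S}$ only contributes a constant that is independent of $\tilde{Z}$, so inside an optimization loop it can be dropped, leaving the $O(c^2 n + cmn)$ cost stated above.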

4. Experiment

In this section, we evaluate the proposed method on several real-world datasets to assess its effectiveness and computational efficiency.

4.1. Comparison Methods

To evaluate the effectiveness and computational efficiency of the proposed FUCSMF, several representative methods are selected for comparison.
  1. SemiGNMF: Semi-supervised Graph-regularized NMF [12].
  2. CSCF: Correntropy-based Semi-supervised Concept Factorization [22].
  3. CLMF: Correntropy-based Low-rank MF [23].
  4. EDDNMF: Element Difference Discriminative NMF [38].
  5. PCPSNMF: Pairwise Constraint Propagation-induced SNMF [24].
  6. S4NMF: Self-supervised Semi-supervised NMF [39].
  7. HSSNMF: Hypergraph-based Semi-supervised SNMF [25].
  8. OCSNMF: One-hot Constrained SNMF [40].
  9. SDSGC: Structured Doubly Stochastic Graph-Based Clustering [41].
  10. EAGR: Efficient Anchor Graph Regularization [28].
  11. BCAN: Semi-supervised Learning via Bipartite graph Construction with Adaptive Neighbors [42].
In detail, methods (1)–(4) are conventional matrix factorization algorithms applied to sample representations, methods (5)–(9) are symmetric matrix factorization models defined on the complete adjacency matrix, and methods (10)–(11) adopt a bipartite graph framework.

4.2. Dataset

The performance of FUCSMF is examined on six benchmark real-world datasets, and their corresponding statistics are listed in Table 1. In particular, COIL20 and COIL100 are object datasets, YaleB is a face database, USPS and MNIST represent handwritten digit collections, while Letters includes handwritten English alphabet samples.
All experiments were implemented on a desktop platform equipped with an Intel i7-6800K processor and 16 GB memory to maintain a unified computational environment. For fairness, the hyperparameters of competing algorithms were configured following the settings suggested in their original papers. In graph-based methods, the neighborhood size was uniformly set to five. For FUCSMF, the regularization coefficient λ was fixed at 100. The number of anchors was selected as 500 for COIL20, 1500 for YaleB, and 2000 for the remaining datasets. In addition, the latent dimensionality was chosen to match the number of true classes in each dataset.
For GNMF, CSNMF, and CSCF, the final cluster assignments were obtained by applying k-means to the learned representations. In contrast, the remaining approaches determined cluster labels directly through $\arg\max_i H_{ij}$. The CG_DESCENT procedure was terminated when the infinity norm of the gradient satisfied $\lVert \nabla f \rVert_\infty < 10^{-6}$. In our experiments, 30% of the data were randomly selected as labeled samples for all semi-supervised methods to ensure a fair comparison under the same setting. Moreover, as observed in Section 4.4, the clustering performance (e.g., ACC) tends to saturate when the proportion of labeled data exceeds a certain threshold, and further increasing the labeled data brings only marginal improvement. Therefore, considering both performance and labeling cost, we adopted 30% labeled data as a reasonable trade-off between effectiveness and efficiency. We acknowledge that evaluating lower labeled ratios is also important and will investigate this aspect in future work.
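The two rules described above, label assignment by the largest entry and the gradient-based stopping test, can be sketched in plain Python. The layout of `H` (rows as clusters, columns as samples), the function names, and the toy values are assumptions made for illustration only.

```python
def assign_clusters(H):
    """Return, for each column j of H, the row index i that maximizes H[i][j]."""
    n_rows, n_cols = len(H), len(H[0])
    return [max(range(n_rows), key=lambda i: H[i][j]) for j in range(n_cols)]

def gradient_small_enough(grad, tol=1e-6):
    """True when the infinity norm (largest absolute entry) of grad is below tol."""
    return max(abs(g) for row in grad for g in row) < tol

# Toy indicator matrix: 3 clusters (rows) by 3 samples (columns).
H = [[0.9, 0.2, 0.1],
     [0.1, 0.7, 0.3],
     [0.0, 0.1, 0.6]]
labels = assign_clusters(H)
```

Here `tol=1e-6` mirrors the threshold quoted in the text; an actual run would apply the test to the gradient returned by the solver.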
To measure clustering quality, we employed four widely used metrics: ACC, NMI [43], ARI [44], and F-score [45]. All methods were initialized randomly and run ten independent times, with the reported values representing the mean performance across the repeated experiments.
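Clustering accuracy (ACC) is commonly computed after matching predicted cluster indices to ground-truth classes in the most favorable way. Assuming that convention applies here, a minimal brute-force sketch follows. The function name and the toy labels are hypothetical; for more than a handful of clusters, an assignment solver such as `scipy.optimize.linear_sum_assignment` would replace the permutation loop.

```python
from itertools import permutations

def clustering_accuracy(true_labels, pred_labels):
    """Best-match accuracy: maximize agreement over relabelings of the clusters.

    Brute force over permutations, so only suitable for a few clusters.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(pred_labels))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for t, p in zip(true_labels, pred_labels))
        best = max(best, hits)
    return best / len(true_labels)

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 0]   # clusters 0 and 1 are swapped, one sample is wrong
acc = clustering_accuracy(y_true, y_pred)
```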

4.3. Experimental Performance

Tables 2–5 report the mean clustering results and corresponding standard deviations of FUCSMF and the compared methods. The highest and second-highest values are indicated in boldface and underlined, respectively. Since the USPS dataset contains negative entries, it is incompatible with SemiGNMF and is therefore excluded from that method's comparisons. Overall, FUCSMF achieves superior performance on most datasets, and the strong results obtained by anchor-based approaches indicate that modeling relationships between samples and a compact set of anchor points effectively captures the underlying similarity structure. In contrast, many traditional matrix factorization models exhibit large fluctuations in performance, as reflected by their high standard deviations, largely because their objectives are non-convex and highly sensitive to initialization. Moreover, except for CLMF, most MF-based competitors rely on multiplicative update rules, which guarantee only a monotonic decrease of the objective value rather than convergence to a stationary point. FUCSMF, by comparison, has a fully convex and unconstrained formulation, so CG_DESCENT converges reliably to a stationary point, which under convexity is a global optimum; this makes the method robust to initialization and stable across runs. A notable exception arises on the YaleB dataset, where non-anchor-based methods outperform anchor-based ones. A likely cause is the complexity of face images captured under extreme illumination conditions, which renders 1000 anchor points insufficient for accurately representing the manifold structure. As further shown in Section 4.7, increasing the number of anchors significantly boosts FUCSMF's accuracy on YaleB, with clear potential to surpass CLMF as the anchor size grows.

4.4. Influence of Labeled Information

To examine the sensitivity of FUCSMF to the amount of supervision, we vary the number of labeled samples per class from 1 to 10. For every dataset, $\{1, 2, \ldots, 10\}$ samples are randomly chosen from each category as labeled data. All experiments are conducted over ten independent runs, and the averaged accuracy is reported in Figures 3–5. CLMF, EDDNMF, OCSNMF, SDSGC, CSCF, and S4NMF are omitted from this comparison because of their long running times. As the number of labeled samples increases, all methods show an upward trend in clustering accuracy. This demonstrates the importance of labeled information for semi-supervised methods and shows that good results can be achieved with only a small number of labels. In addition, the relationship between the quantity of labeled data and clustering accuracy is non-linear: once the number of labeled samples is large enough, further labels yield only limited improvement. Some methods, such as PCPSNMF, show an oscillating increase in accuracy as the number of labels grows, possibly because their non-convex models make it difficult to reach an optimal solution in every run. FUCSMF outperforms the competing methods on all datasets except YaleB, highlighting the robustness of the proposed approach.
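The per-class sampling protocol described above can be sketched as follows. The function name, the seeding scheme, and the toy label vector are hypothetical; the sketch only illustrates drawing the same number of labeled samples from every class.

```python
import random

def sample_labeled_indices(labels, per_class, seed=0):
    """Pick `per_class` random sample indices from every class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    chosen = []
    for y in sorted(by_class):
        members = by_class[y]
        if len(members) < per_class:
            raise ValueError(f"class {y} has fewer than {per_class} samples")
        chosen.extend(rng.sample(members, per_class))
    return sorted(chosen)

# Toy data with three classes; one split for each label budget 1, ..., 10.
labels = [0] * 12 + [1] * 15 + [2] * 11
splits = {k: sample_labeled_indices(labels, k, seed=k) for k in range(1, 11)}
```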

4.5. Time Consumption

Table 6 presents the average computational time required by all methods. BGSSL, EAGR, and FUCSMF all build on k-means, so the values reported for these three methods are the incremental time added on top of the baseline k-means running time; k-means thus serves as the common timing reference. Among all compared approaches, FUCSMF achieves the lowest overall time consumption. This advantage stems from two factors. First, its complexity of $O(nmd)$ is significantly lower than the $O(nmd + m^2 n + m^3)$ or $O(n^2 d)$ complexity of alternative MF-based methods. Second, it needs substantially fewer iterations during optimization. Traditional MF methods commonly rely on multiplicative update rules that guarantee monotonicity of the objective but not convergence, forcing them to adopt large iteration counts to stabilize performance. In contrast, FUCSMF formulates a fully unconstrained and convex objective, enabling the use of the convergence-guaranteed CG_DESCENT algorithm, which dramatically shortens the optimization process. As further demonstrated in Section 4.9, FUCSMF consistently converges within approximately ten iterations, which explains its fast empirical running speed.

4.6. Sensitivity of Parameters

FUCSMF involves two key parameters, namely $\lambda$ and $\gamma$. In this section, the search range of $\lambda$ is set to $\{10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}\}$, while $\gamma$ is varied within $\{0.5, 0.8, 1, 1.2, 1.5\}$. Figures 6–8 illustrate the clustering performance under different parameter settings. The results show that FUCSMF remains stable over a broad range of parameter values, reflecting its insensitivity to the selection of $\lambda$ and $\gamma$.
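The search grid above contains 7 × 5 = 35 parameter pairs. A minimal sketch of how it could be enumerated is given below; the variable names and the `best_setting` helper are hypothetical, and the scores passed to it would come from an actual evaluation run.

```python
from itertools import product

lambdas = [10.0 ** e for e in range(-3, 4)]   # 1e-3, 1e-2, ..., 1e3
gammas = [0.5, 0.8, 1.0, 1.2, 1.5]
grid = list(product(lambdas, gammas))         # all (lambda, gamma) pairs

def best_setting(scores):
    """Return the (lambda, gamma) pair with the highest score.

    `scores` maps (lambda, gamma) to a clustering metric such as ACC.
    """
    return max(scores, key=scores.get)
```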
Across different datasets, the effect of γ exhibits distinct characteristics. Specifically, on the COIL20, YaleB, and COIL100 datasets, smaller values of γ generally lead to improved clustering performance, whereas on the USPS, MNIST, and Letters datasets, larger values of γ tend to produce better results. This behavior can be attributed to the role of γ in controlling the sparsity of the constructed adjacency matrix, where smaller values encourage sparser connections and larger values yield denser graph structures.
On COIL20 and YaleB, setting $\gamma$ to very small values tends to deteriorate clustering results. This can be attributed to the smaller dataset size and anchor set, which magnify variations in the sample–anchor distance $\lVert x_i - u_j \rVert_2^2$. When $\gamma$ is too small, the resulting graph becomes excessively sparse, leading to insufficient neighborhood information for reliable clustering. In contrast, for the COIL100 dataset with a large number of categories, smaller values of $\gamma$ tend to yield better performance. This can be explained by the fact that anchor points generated by k-means may be shared across multiple categories, and restricting connections to fewer anchors helps reduce the influence of ambiguous anchors.
Overall, setting γ = 1 provides consistently favorable results across most datasets, demonstrating the robustness and practical effectiveness of the proposed FUCSMF framework.

4.7. Influence of Anchors

To analyze the effect of anchor quantity on the performance of FUCSMF, experiments are conducted on all datasets by varying the anchor size only, with the results reported in Figure 9, Figure 10 and Figure 11. Clustering performance tends to improve with the growth in anchor quantity on all datasets. Among them, YaleB exhibits the highest sensitivity to the anchor size, whereas USPS and MNIST show a relatively stable performance under different anchor settings.
The underlying reason may lie in the datasets’ properties. USPS and MNIST contain handwritten digits with comparatively uniform structures, enabling effective representation with fewer anchors. Conversely, the YaleB dataset includes face images with pronounced variations in illumination and expression, which demand a greater number of anchors to adequately characterize the data manifold.
In addition, the runtime of FUCSMF increases approximately linearly with respect to the number of anchor points. This empirical observation is consistent with the theoretical computational complexity of $O(mdn)$. Consequently, FUCSMF can efficiently accommodate a large number of anchor points, whereas many existing anchor-based methods become computationally prohibitive due to their cubic complexity term $O(m^3)$.
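To give a feel for how the two complexity expressions scale with the anchor count, the sketch below compares their leading-order operation counts for the MNIST sizes in Table 1. The function names are hypothetical, and the counts ignore constant factors and iteration numbers, so the ratios are indicative only and are not measured runtimes.

```python
def leading_cost_fucsmf(n, m, d):
    """Leading-order operation count O(n m d) stated for FUCSMF."""
    return n * m * d

def leading_cost_anchor_baseline(n, m, d):
    """Leading-order count O(n m d + m^2 n + m^3) stated for BGSSL and EAGR."""
    return n * m * d + m * m * n + m ** 3

n, d = 70_000, 784   # MNIST sizes from Table 1
for m in (500, 1000, 2000, 4000):
    ratio = leading_cost_anchor_baseline(n, m, d) / leading_cost_fucsmf(n, m, d)
    print(f"m={m:5d}  baseline/FUCSMF operation ratio = {ratio:.2f}")
```

The ratio equals $1 + m/d + m^2/(nd)$, so it grows with $m$, while the FUCSMF count itself grows only linearly in $m$.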

4.8. Generated Graph

A comparison of graph representations derived from FUCSMF and the traditional method in [12] on COIL20 is provided in Figure 12. Figure 12a illustrates the bipartite sample–anchor graph, which solely reflects the relationships between samples and anchors. As a result, this representation appears sparse and does not exhibit explicit block structures, since no direct sample–sample connections are introduced at this stage.
As shown in Figure 12b, the adjacency matrix induced by the bipartite graph forms a pronounced block-diagonal pattern, which reflects the underlying cluster structure of the dataset. Compared with the normalized full adjacency matrix constructed by [12] in Figure 12c, the proposed method produces more compact and discriminative blocks, indicating that the hypergraph-based construction better preserves the intrinsic data structure.
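The way a sample–anchor graph induces a block-structured sample–sample adjacency can be sketched with NumPy. The sketch uses the form $W = S \Lambda^{-1} S^{\top}$ and assumes $\Lambda$ is the diagonal matrix of column sums of $S$, as in common anchor-graph constructions; Equation (15) is not reproduced in this section, so that choice, the function name, and the toy matrix are assumptions.

```python
import numpy as np

def induced_adjacency(S):
    """Build W = S diag(colsum(S))^{-1} S^T from a sample-anchor matrix S (n x m)."""
    col_sums = S.sum(axis=0)
    if np.any(col_sums <= 0):
        raise ValueError("every anchor needs at least one connected sample")
    return (S / col_sums) @ S.T   # broadcasting divides column j by col_sums[j]

# Toy graph: samples 0-1 attach to anchors 0-1, samples 2-3 to anchors 2-3.
S = np.array([[0.7, 0.3, 0.0, 0.0],
              [0.4, 0.6, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.2, 0.8]])
W = induced_adjacency(S)
```

In this toy case `W` is symmetric, its rows sum to one because the rows of `S` do, and the entries linking the two groups of samples are zero, which is the block-diagonal pattern discussed above.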

4.9. Convergence Study

To examine the convergence behavior of FUCSMF, the objective function values across iterations are illustrated in Figure 13, Figure 14 and Figure 15. Each figure visualizes the convergence process over five external iterations. In these plots, colored dashed lines correspond to the inner optimization procedure using CG_DESCENT, while the gaps between dots of different colors indicate the simplex projection step for solving Equation (28). When only a single dot appears for a given color, it implies that the desired optimization accuracy has already been reached, and the CG_DESCENT procedure terminates accordingly. Note that five external iterations are shown solely for visualization purposes. In practice, FUCSMF terminates once the number of CG_DESCENT iterations becomes zero, as neither S nor Z will be further updated. One complete iteration spans from the dot of one color to that of the next.
As observed from the figures, FUCSMF converges within at most three external iterations on all datasets. Moreover, the inner optimization loop based on CG_DESCENT requires no more than four iterations to solve Equation (24). Consequently, the entire optimization process is completed within ten iterations across all datasets. This fast convergence behavior can be attributed to the convex nature of the proposed SMF objective function, which also explains the high computational efficiency reported in Section 4.5. Additionally, since the objective function value approaches its optimum during the first external iteration, fewer iterations are needed in subsequent inner loops.
Because the objective value declines sharply at the beginning of the optimization process, later iterations exhibit comparatively smaller changes. To enhance clarity, a detailed view of the objective trajectory between the first and second external iterations is provided in a separate subplot. The results demonstrate that updating S substantially lowers the objective function, further supporting the validity of the update rule in Equation (28).
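The simplex projection step mentioned above can be illustrated with the generic sort-based Euclidean projection onto the probability simplex. Equation (28) is not reproduced in this section, so the sketch below is an assumption about the kind of operation involved rather than the authors' exact update rule; the function name is hypothetical.

```python
def project_onto_simplex(v):
    """Euclidean projection of a list of floats onto {w : w >= 0, sum(w) = 1}."""
    if not v:
        raise ValueError("input must be non-empty")
    sorted_desc = sorted(v, reverse=True)
    running_sum = 0.0
    theta = 0.0
    for k, value in enumerate(sorted_desc, start=1):
        running_sum += value
        candidate = (running_sum - 1.0) / k
        if value - candidate > 0:
            theta = candidate   # the last k that passes gives the final shift
    return [max(x - theta, 0.0) for x in v]

row = project_onto_simplex([0.9, 0.6, -0.2])
```

Applied row by row, such a projection returns non-negative weights that sum to one, which is the usual form of a row of a sample–anchor similarity matrix.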

5. Conclusions

A novel unconstrained convex symmetric matrix factorization for semi-supervised learning is presented. By introducing a bipartite graph, the computational complexity is significantly reduced. Moreover, because the model is unconstrained and convex, it can be solved efficiently by a state-of-the-art unconstrained optimization method. Additionally, an adaptive neighbor bipartite graph construction approach based on correntropy is proposed, which alleviates the parameter sensitivity of traditional methods. Empirical evaluations show that the proposed approach generally achieves competitive or best results with minimal runtime.

Author Contributions

Conceptualization, W.W.; methodology, W.W., W.L. and N.Z.; software, W.W., K.C. and W.L.; formal analysis, W.L.; investigation, Y.C.; resources, Y.C.; data curation, Y.C.; writing—original draft preparation, K.C. and Y.C.; writing—review and editing, W.W. and K.C.; visualization, N.Z. and Y.C.; supervision, N.Z.; project administration, W.W. and N.Z.; funding acquisition, W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Sichuan Provincial Department of Science and Technology (Grant No. 2024NSFSC2056).

Data Availability Statement

The datasets used in this study are publicly available from their original repositories. Specifically, COIL20 is available at https://cave.cs.columbia.edu/repository/COIL-20 (accessed on 30 March 2026); COIL100 is available at https://cave.cs.columbia.edu/repository/COIL-100 (accessed on 30 March 2026); Extended Yale B is available at https://vision.ucsd.edu/datasets/extended-yale-face-database-b-b (accessed on 30 March 2026); USPS is available at https://www.openml.org/search?type=data&sort=runs&id=41070 (accessed on 30 March 2026); MNIST is available at https://archive.ics.uci.edu/dataset/683/mnist+database+of+handwritten+digits (accessed on 30 March 2026); and EMNIST Letters is available at https://www.nist.gov/itl/products-and-services/emnist-dataset (accessed on 30 March 2026).

Conflicts of Interest

The authors declare there is no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
FUCSMF: Fast Unconstrained Convex Symmetric Matrix Factorization
SMF: Symmetric Matrix Factorization
MF: Matrix Factorization
NMF: Non-negative Matrix Factorization
SNMF: Symmetric Non-negative Matrix Factorization
CG: Conjugate Gradient
MCC: Maximum Correntropy Criterion
CID: Correntropy Induced Distance
ITL: Information Theoretic Learning
kNN: k-Nearest Neighbors
LAE: Local Anchor Embedding
ACC: Accuracy
NMI: Normalized Mutual Information
ARI: Adjusted Rand Index
OM: Out-of-Memory
GNMF: Graph Regularized Non-negative Matrix Factorization
LCCF: Locally Consistent Concept Factorization
GCCF: Graph Correntropy-based Concept Factorization
CHNMF: Correntropy-based Hypergraph Regularized NMF
CNMF: Constrained Non-negative Matrix Factorization
CCF: Constrained Concept Factorization
NMFCC: Non-negative Matrix Factorization with Constrained Clustering
CSNMF: Constrained Semi-supervised Non-negative Matrix Factorization
CSCF: Correntropy-based Semi-supervised Concept Factorization
CLMF: Correntropy-based Low-rank Matrix Factorization
SIS: Sparsity-Induced Similarity
SNMFCC: Symmetric Non-negative Matrix Factorization with Constrained Clustering
PCPSNMF: Pairwise Constraint Propagation-induced Symmetric Non-negative Matrix Factorization
HSSNMF: Hypergraph-based Semi-supervised Symmetric Non-negative Matrix Factorization
BLMF: Bipartite Graph-regularized Low-rank Matrix Factorization
BGSSL: Bipartite Graph-based Semi-supervised Learning
EAGR: Efficient Anchor Graph Regularization
SemiGNMF: Semi-supervised Graph Regularized Non-negative Matrix Factorization
EDDNMF: Element Difference Discriminative Non-negative Matrix Factorization
S4NMF: Self-supervised Semi-supervised Non-negative Matrix Factorization
OCSNMF: One-hot Constrained Symmetric Non-negative Matrix Factorization
SDSGC: Structured Doubly Stochastic Graph-Based Clustering
BCAN: Bipartite Graph Construction with Adaptive Neighbors
PCA: Principal Component Analysis
LDA: Linear Discriminant Analysis
ICA: Independent Component Analysis
SVD: Singular Value Decomposition
CF: Concept Factorization

Appendix A. Proof of Theorem 2

Lemma A1.
A function $f$ defined on $\operatorname{dom}(f)$ is convex if, and only if, for any point $x \in \operatorname{dom}(f)$ and any direction $v$, the univariate function
$$\phi(t) = f(x + t v)$$
is convex with respect to $t$ on the set $\{ t \mid x + t v \in \operatorname{dom}(f) \}$.
Lemma A2.
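For any matrix $V \in \mathbb{R}^{c \times (n-p)}$,
$$\lVert V \rVert_F^2 - \lVert V A_2 \tilde{S} \rVert_F^2 \ge 0.$$
Proof. 
Let $W = \tilde{S} \tilde{S}^{\top}$. Using the identity $\lVert X \rVert_F^2 = \operatorname{Tr}(X X^{\top})$, we have
$$\lVert V \rVert_F^2 - \lVert V A_2 \tilde{S} \rVert_F^2 = \operatorname{Tr}\bigl( V ( I - A_2 W A_2^{\top} ) V^{\top} \bigr).$$
Thus, it suffices to verify that
$$I - A_2 W A_2^{\top} \succeq 0.$$
Since $W = \tilde{S} \tilde{S}^{\top}$ is positive semidefinite, the matrix $A_2 W A_2^{\top}$ is also positive semidefinite for any fixed $A_2$. This follows from the fact that for any matrix $B$ and positive semidefinite matrix $W$, the product $B W B^{\top}$ remains positive semidefinite.
For any vector $x$,
$$x^{\top} ( I - A_2 W A_2^{\top} ) x = \lVert x \rVert_2^2 - (A_2^{\top} x)^{\top} W (A_2^{\top} x) \ge 0.$$
Hence, $I - A_2 W A_2^{\top}$ is positive semidefinite, and the lemma follows. □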
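The trace identity used in the proof of Lemma A2 can be expanded in two lines. The placement of the transposes below is inferred from dimensional consistency with $V \in \mathbb{R}^{c \times (n-p)}$ and should be read as an assumption about the original notation.

```latex
\begin{aligned}
\lVert V \rVert_F^2 - \lVert V A_2 \tilde{S} \rVert_F^2
  &= \operatorname{Tr}(V V^{\top})
   - \operatorname{Tr}\bigl(V A_2 \tilde{S} \tilde{S}^{\top} A_2^{\top} V^{\top}\bigr) \\
  &= \operatorname{Tr}\bigl(V (I - A_2 W A_2^{\top}) V^{\top}\bigr),
  \qquad W = \tilde{S} \tilde{S}^{\top}.
\end{aligned}
```

Each row $v$ of $V$ contributes $v (I - A_2 W A_2^{\top}) v^{\top}$ to this trace, so the bound holds for every $V$ exactly when the largest eigenvalue of $A_2 W A_2^{\top}$ does not exceed 1.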
Proof of Theorem 2. 
Let $\tilde{Z} \in \mathbb{R}^{c \times (n-p)}$ and consider an arbitrary direction $V \in \mathbb{R}^{c \times (n-p)}$. Define the one-dimensional function
$$\psi(t) = F(\tilde{Z} + t V).$$
To establish convexity, it suffices to show that $\psi''(t) \ge 0$ for all $t$.
By direct differentiation, the second derivative can be expressed as
$$\psi''(t) = \bigl\langle V, \nabla^2 F(\tilde{Z} + t V)\, V \bigr\rangle.$$
The quartic term in the objective produces a positive semidefinite contribution since it involves squared Frobenius norms. The quadratic regularization term $\lambda \lVert \tilde{Z} \rVert_F^2$ clearly yields a non-negative second derivative. Furthermore, from Lemma A2, the coupling term involving $A_2 \tilde{S}$ is also non-negative.
Since each component of the Hessian contributes a positive semidefinite term, their summation remains positive semidefinite. Therefore, $\psi''(t) \ge 0$ for all $t$, which implies that $F(\tilde{Z})$ is convex. □
Corollary A1.
The function
$$\lVert \tilde{S} \tilde{S}^{\top} - A^{\top} Z^{\top} Z A \rVert_F^2$$
is convex.
Proof. 
The above expression is a composition of convex quadratic and Frobenius norm terms, and thus preserves convexity.
Therefore, the quadratic form involved in the objective function remains non-negative under the above assumptions, which supports the convexity analysis. □

Appendix B. Proof of Theorem 3

Proof. 
To analyze the properties of stationary points, we study the monotonicity of the gradient mapping and prove that the gradient of the objective function is strictly monotone.
Let $Z_1, Z_2 \in \mathbb{R}^{c \times (n-p)}$. Consider the inner product
$$\bigl\langle \nabla F(Z_1) - \nabla F(Z_2),\, Z_1 - Z_2 \bigr\rangle.$$
If this quantity is strictly positive for all $Z_1 \neq Z_2$, then the gradient mapping is strictly monotone, which implies that $F$ has at most one stationary point.
From the expression of the gradient, the difference can be written as
$$\nabla F(Z_1) - \nabla F(Z_2) = 4 \bigl( Z_1 Z_1^{\top} Z_1 - Z_2 Z_2^{\top} Z_2 \bigr) + 2 \lambda (Z_1 - Z_2) + 4 \Upsilon (Z_1 - Z_2) - (4 + 2 \lambda) (Z_1 - Z_2) A_2 \tilde{S} \tilde{S}^{\top} A_2^{\top}.$$
Denote $D = Z_1 - Z_2$. Taking inner products yields
$$\bigl\langle \nabla F(Z_1) - \nabla F(Z_2),\, D \bigr\rangle = T_1 + T_2 + T_3 + T_4,$$
where $T_2 = 2 \lambda \lVert D \rVert_F^2$, $T_3 = 4 \langle \Upsilon D, D \rangle$, and $T_4 = -(4 + 2 \lambda) \langle D A_2 \tilde{S} \tilde{S}^{\top} A_2^{\top}, D \rangle$.
Since $\Upsilon$ is diagonal with non-negative entries, $T_3 \ge 0$. From Lemma A2, $I - A_2 \tilde{S} \tilde{S}^{\top} A_2^{\top}$ is positive semidefinite, which implies
$$\langle D A_2 \tilde{S} \tilde{S}^{\top} A_2^{\top}, D \rangle \le \lVert D \rVert_F^2.$$
Therefore,
$$T_2 + T_3 + T_4 \ge 2 \lambda \lVert D \rVert_F^2 + 4 \langle \Upsilon D, D \rangle - (4 + 2 \lambda) \lVert D \rVert_F^2.$$
For $\lambda > 0$, this expression remains non-negative.
Next, consider the cubic term
$$T_1 = 4 \bigl\langle Z_1 Z_1^{\top} Z_1 - Z_2 Z_2^{\top} Z_2,\, D \bigr\rangle.$$
Using standard matrix inequalities and the identity
$$\lVert Z_1^{\top} Z_1 - Z_2^{\top} Z_2 \rVert_F^2 = \bigl\langle Z_1 Z_1^{\top} Z_1 - Z_2 Z_2^{\top} Z_2,\, D \bigr\rangle,$$
it follows that $T_1 \ge 0$, with equality only when $Z_1 = Z_2$. Hence,
$$\bigl\langle \nabla F(Z_1) - \nabla F(Z_2),\, Z_1 - Z_2 \bigr\rangle > 0 \quad \text{for all } Z_1 \neq Z_2.$$
Thus, the gradient is strictly monotone, which implies that the stationary point is unique.
Since Theorem 2 has established convexity, this unique stationary point is necessarily the global minimizer. □
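The step from strict monotonicity to uniqueness can be spelled out as a short argument by contradiction; it relies only on the strict inequality established in the proof above.

```latex
\text{Suppose } Z_1 \neq Z_2 \text{ with } \nabla F(Z_1) = \nabla F(Z_2) = 0. \text{ Then}
\quad
\bigl\langle \nabla F(Z_1) - \nabla F(Z_2),\, Z_1 - Z_2 \bigr\rangle
  = \langle 0,\, Z_1 - Z_2 \rangle = 0,
\quad
\text{which contradicts}
\quad
\bigl\langle \nabla F(Z_1) - \nabla F(Z_2),\, Z_1 - Z_2 \bigr\rangle > 0 .
```

Hence two distinct stationary points cannot coexist, and any stationary point of the convex function $F$ is its unique global minimizer.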

References

  1. Xie, Y.; Ou, J.; Wen, B.; Yu, Z.; Tian, W. A joint learning method for low-light facial expression recognition. Complex Intell. Syst. 2025, 11, 139.
  2. Zhou, N.; Jika, L.; Wang, W.; Xie, Y.; Du, Y.; Soh, Y.C. BERN: A Novel Framework for Enhanced Emotion Recognition Through the Integration of EEG and Eye Movement Features. In IEEE Transactions on Cognitive and Developmental Systems; IEEE: Piscataway, NJ, USA, 2025.
  3. Yu, Y.; She, K.; Shi, K.; Cai, X.; Kwon, O.M.; Soh, Y. Analysis of medical images super-resolution via a wavelet pyramid recursive neural network constrained by wavelet energy entropy. Neural Netw. 2024, 178, 106460.
  4. Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788–791.
  5. Lee, D.; Seung, H.S. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems 13; MIT Press: Cambridge, MA, USA, 2000; Volume 13.
  6. Duda, R.O.; Hart, P.E.; Stork, D.G. Pattern Classification; John Wiley & Sons: Hoboken, NJ, USA, 2006.
  7. Belhumeur, P.N.; Hespanha, J.P.; Kriegman, D.J. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19, 711–720.
  8. Wold, S.; Esbensen, K.; Geladi, P. Principal component analysis. Chemom. Intell. Lab. Syst. 1987, 2, 37–52.
  9. Comon, P. Independent component analysis, a new concept? Signal Process. 1994, 36, 287–314.
  10. Xu, W.; Gong, Y. Document clustering by concept factorization. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Sheffield, UK, 25–29 July 2004; pp. 202–209.
  11. Zhang, Z.; Zhao, K. Low-rank matrix approximation with manifold regularization. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 1717–1729.
  12. Cai, D.; He, X.; Han, J.; Huang, T.S. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 1548–1560.
  13. Cai, D.; He, X.; Han, J. Locally consistent concept factorization for document clustering. IEEE Trans. Knowl. Data Eng. 2010, 23, 902–913.
  14. Yi, Y.; Wang, J.; Zhou, W.; Zheng, C.; Kong, J.; Qiao, S. Non-negative matrix factorization with locality constrained adaptive graph. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 427–441.
  15. Peng, S.; Ser, W.; Chen, B.; Sun, L.; Lin, Z. Correntropy based graph regularized concept factorization for clustering. Neurocomputing 2018, 316, 34–48.
  16. Principe, J. Information Theoretic Learning: Renyi’s Entropy and Kernel Perspectives; Springer Science & Business Media: New York, NY, USA; Dordrecht, The Netherlands; Berlin/Heidelberg, Germany; London, UK, 2010.
  17. Yu, N.; Wu, M.J.; Liu, J.X.; Zheng, C.H.; Xu, Y. Correntropy-based hypergraph regularized NMF for clustering and feature selection on multi-cancer integrated data. IEEE Trans. Cybern. 2020, 51, 3952–3963.
  18. Liu, H.; Wu, Z.; Li, X.; Cai, D.; Huang, T.S. Constrained nonnegative matrix factorization for image representation. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 1299–1311.
  19. Liu, H.; Yang, G.; Wu, Z.; Cai, D. Constrained concept factorization for image representation. IEEE Trans. Cybern. 2013, 44, 1214–1224.
  20. Zhang, X.; Zong, L.; Liu, X.; Luo, J. Constrained clustering with nonnegative matrix factorization. IEEE Trans. Neural Netw. Learn. Syst. 2015, 27, 1514–1526.
  21. Peng, S.; Ser, W.; Chen, B.; Lin, Z. Robust semi-supervised nonnegative matrix factorization for image clustering. Pattern Recognit. 2021, 111, 107683.
  22. Peng, S.; Yang, Z.; Nie, F.; Chen, B.; Lin, Z. Correntropy based semi-supervised concept factorization with adaptive neighbors for clustering. Neural Netw. 2022, 154, 203–217.
  23. Zhou, N.; Choi, K.S.; Chen, B.; Du, Y.; Liu, J.; Xu, Y. Correntropy-based low-rank matrix factorization with constraint graph learning for image clustering. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 10433–10446.
  24. Wu, W.; Jia, Y.; Kwong, S.; Hou, J. Pairwise constraint propagation-induced symmetric nonnegative matrix factorization. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 6348–6361.
  25. Yin, J.; Peng, S.; Yang, Z.; Chen, B.; Lin, Z. Hypergraph based semi-supervised symmetric nonnegative matrix factorization for image clustering. Pattern Recognit. 2023, 137, 109274.
  26. Zhou, N.; Luo, W.; Wu, Z.; Du, Y.; Shi, K.; Chen, B. Bipartite graph regularized robust low-rank matrix factorization for fast semi-supervised image clustering. Appl. Intell. 2026, 56, 22.
  27. Liu, W.; He, J.; Chang, S.F. Large graph construction for scalable semi-supervised learning. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 679–686.
  28. Wang, M.; Fu, W.; Hao, S.; Tao, D.; Wu, X. Scalable semi-supervised learning by efficient anchor graph regularization. IEEE Trans. Knowl. Data Eng. 2016, 28, 1864–1877.
  29. He, F.; Nie, F.; Wang, R.; Li, X.; Jia, W. Fast semisupervised learning with bipartite graph for large-scale data. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 626–638.
  30. Zhang, M.L.; Zhou, Z.H. ML-KNN: A Lazy Learning Approach to Multi-Label Learning. Pattern Recognit. 2007, 40, 2038–2048.
  31. Zhou, N.; Deng, Q.; Luo, W.; Huang, X.; Du, Y.; Chen, B.; Pedrycz, W. Correntropy meets cross-entropy: A robust loss against noisy labels. Eng. Appl. Artif. Intell. 2026, 167, 113830.
  32. Antelmi, A.; Cordasco, G.; Polato, M.; Scarano, V.; Spagnuolo, C.; Yang, D. A survey on hypergraph representation learning. ACM Comput. Surv. 2023, 56, 24.
  33. Gao, Y.; Zhang, Z.; Lin, H.; Zhao, X.; Du, S.; Zou, C. Hypergraph learning: Methods and practices. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 2548–2566.
  34. Huang, J.; Nie, F.; Huang, H. A new simplex sparse learning model to measure data similarity for clustering. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015.
  35. Hager, W.W.; Zhang, H. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 2005, 16, 170–192.
  36. Hager, W.W.; Zhang, H. Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent. ACM Trans. Math. Softw. (TOMS) 2006, 32, 113–137.
  37. Hager, W.W.; Zhang, H. The limited memory conjugate gradient method. SIAM J. Optim. 2013, 23, 2150–2168.
  38. Li, J.; Shen, X.; Li, C.; Li, Y. Elements Discriminative Non-Negative Matrix Factorization for Data Clustering. Eng. Appl. Artif. Intell. 2025, 156, 111210.
  39. Chavoshinejad, J.; Seyedi, S.A.; Tab, F.A.; Salahian, N. Self-supervised semi-supervised nonnegative matrix factorization for data clustering. Pattern Recognit. 2023, 137, 109282.
  40. Li, J.; Li, C. One-hot constrained symmetric nonnegative matrix factorization for image clustering. Pattern Recognit. 2025, 162, 111427.
  41. Wang, N.; Cui, Z.; Li, A.; Lu, Y.; Wang, R.; Nie, F. Structured Doubly Stochastic Graph-Based Clustering. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 11064–11077.
  42. Wang, Z.; Zhang, L.; Wang, R.; Nie, F.; Li, X. Semi-supervised Learning Via Bipartite Graph Construction with Adaptive Neighbors. IEEE Trans. Knowl. Data Eng. 2023, 35, 5257–5268.
  43. Strehl, A.; Ghosh, J. Cluster Ensembles: A Knowledge Reuse Framework for Combining Partitionings. In Proceedings of the AAAI Conference on Artificial Intelligence, Edmonton, AB, Canada, 28 July–1 August 2002.
  44. Santos, J.M.; Embrechts, M. On the Use of the Adjusted Rand Index As a Metric for Evaluating Supervised Classification. In Artificial Neural Networks – ICANN 2009, Proceedings of the 19th International Conference, Limassol, Cyprus, 14–17 September 2009; Springer: Berlin/Heidelberg, Germany, 2009; pp. 175–184.
  45. Powers, D.M.W. Evaluation: From Precision, Recall and F-measure to ROC, Informedness, Markedness and Correlation. arXiv 2020, arXiv:2010.16061.
Figure 1. Connections between samples and anchor points.
Figure 2. Function curve of $1 - k_\sigma(x)$ and $x^2$.
Figure 3. Effect of labeled sample quantity on clustering performance: (a) COIL20, (b) YaleB.
Figure 4. Effect of labeled sample quantity on clustering performance: (a) COIL100, (b) USPS.
Figure 5. Effect of labeled sample quantity on clustering performance: (a) MNIST, (b) Letters.
Figure 6. Sensitivity of FUCSMF (a) on COIL20 dataset, (b) on YaleB dataset.
Figure 7. Sensitivity of FUCSMF (a) on COIL100 dataset, (b) on USPS dataset.
Figure 8. Sensitivity of FUCSMF (a) on MNIST dataset, (b) on Letters dataset.
Figure 9. Impact of anchor number on clustering accuracy and runtime: (a) COIL20, (b) YaleB.
Figure 10. Impact of anchor number on clustering accuracy and runtime: (a) COIL100, (b) USPS.
Figure 11. Impact of anchor number on clustering accuracy and runtime: (a) MNIST, (b) Letters.
Figure 12. Graph representations for COIL20: (a) bipartite sample–anchor graph computed using Equation (28); (b) corresponding full adjacency matrix inferred through Equation (15); (c) normalized adjacency matrix based on the method in [12].
Figure 13. Convergence curve (a) on COIL20 dataset, (b) on USPS dataset.
Figure 14. Convergence curve (a) on COIL100 dataset, (b) on USPS dataset.
Figure 15. Convergence curve (a) on MNIST dataset, (b) on Letters dataset.
Table 1. Datasets.
| Dataset | # Instances (n) | # Features (d) | # Classes (C) |
|---|---|---|---|
| COIL20 | 1440 | 1024 | 20 |
| YaleB | 2414 | 1024 | 38 |
| COIL100 | 7200 | 1024 | 100 |
| USPS | 9298 | 256 | 10 |
| MNIST | 70,000 | 784 | 10 |
| Letters | 145,600 | 784 | 26 |
Table 2. Performance comparison in terms of accuracy and standard deviation (%) on six datasets (OM: Out-of-Memory).
| Dataset | SemiGNMF | CSCF | CLMF | EDDNMF | PCPSNMF | S4NMF | HSSNMF | OCSNMF | SDSGC | EAGR | BCAN | FUCSMF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| COIL20 | 86.90 ± 00.75 | 89.53 ± 02.32 | 89.65 ± 01.21 | 77.53 ± 01.62 | 87.41 ± 04.20 | 83.73 ± 02.74 | 87.53 ± 03.62 | 94.63 ± 01.20 | 86.25 ± 01.50 | 94.74 ± 00.91 | 95.95 ± 00.86 | 98.56 ± 00.47 |
| YaleB | 41.70 ± 01.42 | 55.42 ± 01.47 | 67.74 ± 01.54 | 42.98 ± 01.35 | 56.64 ± 04.77 | 31.61 ± 02.15 | 67.12 ± 01.77 | 74.57 ± 00.87 | 53.85 ± 00.77 | 52.29 ± 01.39 | 54.07 ± 01.10 | 61.99 ± 01.02 |
| COIL100 | 75.20 ± 00.98 | 81.31 ± 01.32 | 67.60 ± 00.75 | 65.62 ± 00.60 | 76.25 ± 01.55 | 81.71 ± 01.56 | 84.11 ± 01.36 | 84.91 ± 00.39 | 68.36 ± 01.10 | 81.56 ± 00.49 | 81.60 ± 00.69 | 85.46 ± 00.46 |
| USPS | - | 79.16 ± 05.41 | 80.03 ± 00.98 | 73.44 ± 09.15 | 89.08 ± 22.70 | 84.98 ± 05.17 | 82.96 ± 02.19 | 94.03 ± 00.20 | OM | 95.10 ± 00.45 | 94.50 ± 00.67 | 95.55 ± 00.38 |
| MNIST | OM | OM | OM | 68.83 ± 00.68 | OM | OM | OM | OM | OM | 92.73 ± 00.92 | 92.47 ± 00.65 | 93.76 ± 00.92 |
| Letters | OM | OM | OM | 49.87 ± 00.14 | OM | OM | OM | OM | OM | 67.01 ± 00.96 | 67.40 ± 00.79 | 68.29 ± 00.91 |
Note: Bold values indicate the best results, and underlined values indicate the second-best results.
Table 3. Comparative NMI results and corresponding standard deviations (%) on six datasets (OM: Out-of-Memory).
| Dataset | SemiGNMF | CSCF | CLMF | EDDNMF | PCPSNMF | S4NMF | HSSNMF | OCSNMF | SDSGC | EAGR | BCAN | FUCSMF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| COIL20 | 94.09 ± 00.57 | 96.06 ± 00.42 | 88.87 ± 01.40 | 80.11 ± 00.89 | 91.72 ± 02.76 | 83.47 ± 02.74 | 92.48 ± 01.32 | 94.33 ± 00.84 | 94.21 ± 03.20 | 95.53 ± 00.81 | 96.19 ± 00.75 | 98.43 ± 00.43 |
| YaleB | 62.52 ± 01.32 | 71.49 ± 01.04 | 66.21 ± 01.27 | 43.92 ± 01.33 | 59.21 ± 03.38 | 45.23 ± 01.57 | 66.82 ± 00.91 | 73.31 ± 01.07 | 52.14 ± 01.72 | 54.78 ± 01.05 | 56.85 ± 01.00 | 62.20 ± 00.97 |
| COIL100 | 90.98 ± 00.31 | 94.08 ± 00.25 | 79.48 ± 00.53 | 75.26 ± 00.17 | 86.63 ± 00.86 | 89.15 ± 00.81 | 91.21 ± 00.50 | 90.61 ± 00.39 | 88.60 ± 00.55 | 89.33 ± 00.31 | 89.81 ± 00.37 | 91.96 ± 00.32 |
| USPS | - | 84.65 ± 02.11 | 65.98 ± 01.24 | 75.86 ± 07.06 | 78.84 ± 22.04 | 73.03 ± 05.03 | 77.24 ± 03.51 | 86.10 ± 00.47 | OM | 88.87 ± 00.54 | 87.76 ± 01.04 | 89.68 ± 00.61 |
| MNIST | OM | OM | OM | 54.75 ± 00.67 | OM | OM | OM | OM | OM | 85.51 ± 00.96 | 84.70 ± 00.70 | 87.13 ± 01.01 |
| Letters | OM | OM | OM | 41.15 ± 00.36 | OM | OM | OM | OM | OM | 61.98 ± 00.55 | 62.11 ± 00.63 | 64.41 ± 00.55 |
Note: Bold values indicate the best results, and underlined values indicate the second-best results.
Table 4. Comparative ARI performance and standard deviation (%) on six datasets (OM: Out-of-Memory).
| Dataset | SemiGNMF | CSCF | CLMF | EDDNMF | PCPSNMF | S4NMF | HSSNMF | OCSNMF | SDSGC | EAGR | BCAN | FUCSMF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| COIL20 | 82.90 ± 00.98 | 86.49 ± 02.73 | 81.94 ± 02.04 | 65.44 ± 01.66 | 83.07 ± 06.29 | 80.06 ± 03.04 | 83.17 ± 03.30 | 90.26 ± 01.55 | 83.40 ± 04.10 | 90.69 ± 01.29 | 92.68 ± 01.32 | 97.08 ± 00.91 |
| YaleB | 26.39 ± 01.40 | 41.89 ± 02.51 | 46.00 ± 02.00 | 43.92 ± 01.33 | 31.54 ± 09.65 | 18.26 ± 01.59 | 47.21 ± 01.69 | 56.26 ± 02.08 | 44.05 ± 00.36 | 31.08 ± 01.36 | 33.52 ± 01.32 | 42.87 ± 01.21 |
| COIL100 | 64.10 ± 02.29 | 73.85 ± 03.20 | 53.88 ± 00.88 | 75.26 ± 00.17 | 65.95 ± 02.59 | 73.25 ± 01.13 | 77.26 ± 01.63 | 76.52 ± 00.59 | 55.93 ± 03.24 | 73.58 ± 00.83 | 74.21 ± 00.87 | 79.23 ± 00.69 |
| USPS | - | 76.38 ± 05.82 | 63.54 ± 01.65 | 81.22 ± 17.20 | 79.89 ± 31.67 | 73.11 ± 05.12 | 74.07 ± 04.38 | 88.33 ± 00.57 | OM | 90.82 ± 00.71 | 89.66 ± 01.30 | 91.60 ± 00.76 |
| MNIST | OM | OM | OM | 40.45 ± 01.04 | OM | OM | OM | OM | OM | 85.17 ± 01.62 | 84.57 ± 01.13 | 87.15 ± 01.65 |
| Letters | OM | OM | OM | 23.52 ± 00.18 | OM | OM | OM | OM | OM | 48.58 ± 00.90 | 48.79 ± 00.89 | 50.92 ± 00.92 |
Note: Bold values indicate the best results, and underlined values indicate the second-best results.
Table 5. Comparative F-score performance and standard deviation (%) on six datasets (OM: Out-of-Memory).
| Dataset | SemiGNMF | CSCF | CLMF | EDDNMF | PCPSNMF | S4NMF | HSSNMF | OCSNMF | SDSGC | EAGR | BCAN | FUCSMF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| COIL20 | 85.94 ± 00.86 | 88.60 ± 02.59 | 89.48 ± 01.22 | 77.32 ± 01.55 | 85.48 ± 04.10 | 82.36 ± 02.89 | 85.56 ± 03.96 | 94.97 ± 00.86 | 84.32 ± 02.51 | 94.78 ± 00.93 | 95.98 ± 00.88 | 98.58 ± 00.46 |
| YaleB | 42.86 ± 01.29 | 53.11 ± 01.97 | 68.00 ± 01.50 | 47.37 ± 01.92 | 58.27 ± 03.27 | 31.05 ± 02.57 | 66.41 ± 01.89 | 74.77 ± 01.38 | 48.25 ± 00.32 | 52.76 ± 01.42 | 54.14 ± 01.17 | 62.38 ± 00.99 |
| COIL100 | 74.23 ± 01.00 | 79.41 ± 01.35 | 65.45 ± 00.85 | 63.67 ± 00.93 | 76.29 ± 01.37 | 81.40 ± 01.58 | 83.66 ± 01.28 | 82.92 ± 00.40 | 56.51 ± 03.18 | 81.46 ± 00.55 | 81.43 ± 00.69 | 85.53 ± 00.48 |
| USPS | - | 76.61 ± 06.08 | 78.63 ± 01.03 | 81.89 ± 09.16 | 88.08 ± 22.66 | 83.65 ± 05.37 | 78.19 ± 01.96 | 93.55 ± 00.17 | OM | 94.56 ± 00.54 | 93.94 ± 00.70 | 95.08 ± 00.39 |
| MNIST | OM | OM | OM | 68.76 ± 00.75 | OM | OM | OM | OM | OM | 92.61 ± 00.96 | 92.38 ± 00.67 | 93.67 ± 00.94 |
| Letters | OM | OM | OM | 49.34 ± 00.92 | OM | OM | OM | OM | OM | 66.14 ± 00.98 | 66.87 ± 00.78 | 67.30 ± 00.91 |
Note: Bold values indicate the best results, and underlined values indicate the second-best results.
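Table 6. Running time comparison of all methods (in seconds).
Column groups: Other Learning (GNMF through SDSGC); Anchor-Based Learning (k-Means through FUCSMF).
| Dataset | GNMF | CSCF | CLMF | EDDNMF | PCPSNMF | HSSNMF | S4NMF | OCSNMF | SDSGC | k-Means | EAGR | BGSSL | FUCSMF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| COIL20 | 0.91 | 9.86 | 24.36 | 2.76 | 3.9 | 5.61 | 53.49 | 0.93 | 5.32 | 0.65 | +0.11 | +0.24 | +0.05 |
| YaleB | 2.23 | 8.68 | 53.71 | 7.47 | 11.35 | 16.73 | 200.31 | 1.07 | 22.31 | 0.73 | +0.2 | +0.90 | +0.15 |
| COIL100 | 11.05 | 547.59 | 421.90 | 37.58 | 152.07 | 154.58 | 2345.22 | 9.38 | 656.73 | 2.1 | +0.89 | +2.15 | +0.72 |
| USPS | 2.51 | 1129.17 | 328.54 | 3.99 | 213.75 | 234.21 | 2612.98 | 13.16 | OM | 4.11 | +0.62 | +2.15 | +0.31 |
| MNIST | OM | OM | OM | 127.56 | OM | OM | OM | OM | OM | 61.5 | +4.54 | +5.05 | +2.87 |
| Letters | OM | OM | OM | 200.54 | OM | OM | OM | OM | OM | 129.77 | +14.68 | +13.43 | +7.66 |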
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, W.; Chen, K.; Luo, W.; Zhou, N.; Cao, Y. Fast Unconstraint Convex Symmetric Matrix for Semi-Supervised Learning. Symmetry 2026, 18, 698. https://doi.org/10.3390/sym18040698
