Article

A Fast Proximal Alternating Method for Robust Matrix Factorization of Matrix Recovery with Outliers

1 School of Mathematics, Foshan University, Foshan 528011, China
2 College of Information Science and Technology, Jinan University, Guangzhou 510632, China
* Authors to whom correspondence should be addressed.
Mathematics 2025, 13(9), 1466; https://doi.org/10.3390/math13091466
Submission received: 20 March 2025 / Revised: 21 April 2025 / Accepted: 26 April 2025 / Published: 29 April 2025

Abstract

This paper concerns a class of robust factorization models of low-rank matrix recovery, which have been widely applied in fields such as machine learning and imaging sciences. An $\ell_1$-loss robust factorized model incorporating the $\ell_{2,0}$-norm regularization term is proposed to address the presence of outliers. Since the resulting problem is nonconvex, nonsmooth, and discontinuous, an approximation problem that shares the same set of stationary points as the original formulation is constructed. A proximal alternating minimization method is then proposed to solve the approximation problem, and the global convergence of its iterate sequence is established. Numerical experiments on matrix completion with outliers and on image restoration tasks demonstrate that the proposed algorithm achieves low relative errors in shorter computational time, especially on large-scale datasets.

1. Introduction

The low-rank matrix recovery problem seeks to recover a true yet unknown low-rank matrix $M \in \mathbb{R}^{n_1\times n_2}$ of rank $r$ from as few observations as feasible. It encompasses applications in numerous fields, including signal and image processing, quantum state tomography, control and system identification, statistics, and machine learning (see [1,2,3,4,5]). This paper considers a robust factorized model for the low-rank matrix recovery problem in which the observations contain outliers. The observation model is expressed as
$$ b = \mathcal{A}(M) + \omega, $$
where $\mathcal{A}: \mathbb{R}^{n_1\times n_2} \to \mathbb{R}^m$ is a sampling operator, $b \in \mathbb{R}^m$ is the observation vector, and $\omega$ is a sparse noise vector whose nonzero entries may be arbitrarily large while the remaining entries are zero. When outliers are present, the traditional smooth least-squares loss function is too sensitive to them and is probably biased. In view of this, studies [6,7] have proposed a nonsmooth $\ell_p$-loss model with a balance regularization term:
$$ \min_{U \in \mathbb{R}^{n_1\times\kappa},\, V \in \mathbb{R}^{n_2\times\kappa}} \Psi(U,V) := \|\mathcal{A}(UV^{\top}) - b\|_p + \gamma\,\|U^{\top}U - V^{\top}V\|_F, \qquad (1) $$
where $\gamma \ge 0$. When $\gamma > 0$, for $p = 1$ and $\kappa = r$, Li et al. [7] established exact matrix recovery based on the $\ell_1/\ell_2$-restricted isometry property of the sampling operator $\mathcal{A}$ and proved that the proposed subgradient method with geometrically diminishing step sizes converges linearly to the ground-truth matrix. For $p = 1$ or $2$ and $\kappa = r$, Charisopoulos et al. [6] proved that the subgradient method and the prox-linear method converge at a fast dimension-independent rate under the same assumptions as in [7]. When $\gamma = 0$, for $p = 1$, Ma and Fattahi [8] showed that the subgradient method converges to the true matrix under the Sign-RIP condition [8, Definition 7]. Additionally, Wang et al. [9] developed a robust and fast rank-one matrix completion algorithm by minimizing the Welsch cost function.
Although the subgradient method exhibits favorable convergence properties when solving problem (1), it relies on some strict assumptions. In practice, the rank of the ground-truth matrix is generally unknown, and consequently the regularization term in (1) fails to induce low-rank structures. Inspired by [10], this paper utilizes the $\ell_{2,0}$-norm in the regularization term and studies the following $\ell_1$-loss factorized model of low-rank matrix recovery:
$$ \min_{U \in \mathbb{R}^{n_1\times\kappa},\, V \in \mathbb{R}^{n_2\times\kappa}} \Phi(U,V) := \|\mathcal{A}(UV^{\top}) - b\|_1 + \lambda\big(\|U\|_{2,0} + \|V\|_{2,0}\big), \qquad (2) $$
where $\kappa$ is an upper estimate of $r$, and $\|\cdot\|_{2,0}$ denotes the column $\ell_{2,0}$-norm of a matrix, i.e., the number of its nonzero columns. The regularization term $\lambda(\|U\|_{2,0} + \|V\|_{2,0})$ reduces the rank through column sparsity. The $\ell_1$-loss function, on the other hand, is more robust against outliers and has been widely utilized in outlier detection studies [11,12,13]. Both terms of the objective function in (2) are nonsmooth. On top of that, the objective is nonconvex and discontinuous due to the $\ell_{2,0}$-norm regularization term. The discontinuity causes the subgradient algorithm to fail when applied to problem (2). While the alternating direction method of multipliers is commonly applied to problems of this form, it still lacks a theoretical convergence guarantee for such nonconvex and nonsmooth optimization problems. Most methods with convergence guarantees assume that at least one term of the objective is smooth, or that the objective is continuous. To the best of our knowledge, no state-of-the-art algorithm is capable of handling this type of nonconvex and discontinuous problem.
The major contributions of this work are threefold. Firstly, to handle the nonconvex and discontinuous problem (2), a novel potential function is constructed:
$$ \min_{U \in \mathbb{R}^{n_1\times\kappa},\, V \in \mathbb{R}^{n_2\times\kappa},\, z \in \mathbb{R}^m} \Theta(U,V,z) := \frac{\alpha}{2}\|\mathcal{A}(UV^{\top}) - b - z\|^2 + \|z\|_1 + \lambda\big(\|U\|_{2,0} + \|V\|_{2,0}\big), \qquad (3) $$
where $\alpha > 0$ is a given constant. It is proven that, under mild assumptions, the sets of stationary points of (2) and (3) coincide. Moreover, any global optimal solution to problem (3) is an approximate optimal solution to problem (2). This equivalence enables us to find solutions to problem (2) by solving problem (3), which is more computationally tractable. Secondly, a proximal alternating linearized minimization (PALM) method is proposed to solve (3), and the global convergence of its iterate sequence is established by exploiting the Kurdyka–Łojasiewicz (KL) property of the objective function. Finally, numerical experiments on both synthetic and real-world datasets validate that the PALM method for problem (3) outperforms SubGM [6,7] applied to problem (1) with $p = 1$.
The remainder of this paper is organized as follows. Section 2 introduces notation and foundational concepts. Section 3 establishes the relationship between problems (2) and (3). Section 4 presents the PALM method for solving problem (3) and establishes its global convergence. Section 5 evaluates the numerical performance of the PALM method and compares it with SubGM applied to problem (1) on various datasets. Section 6 concludes the paper.

2. Preliminaries

This section introduces the notation employed and then presents the definitions of stationary points and the KL property.

2.1. Notations

Throughout this paper, $\mathbb{R}^{n_1\times n_2}$ denotes the vector space of all $n_1\times n_2$ real matrices, equipped with the trace inner product $\langle X, Y\rangle = \mathrm{trace}(X^{\top}Y)$ for $X, Y \in \mathbb{R}^{n_1\times n_2}$ and its induced Frobenius norm. For a matrix $Z \in \mathbb{R}^{n_1\times n_2}$, $Z_j$ denotes the $j$-th column of $Z$, $J_Z$ denotes the index set of its nonzero columns, and $\|Z\|_F$ and $\|Z\|_{2,0}$ denote the Frobenius norm and the column $\ell_{2,0}$-norm of $Z$ (the number of nonzero columns), respectively. For a matrix $X \in \mathbb{R}^{n_1\times n_2}$, let $\sigma(X) := (\sigma_1(X), \ldots, \sigma_n(X))$ with $\sigma_1(X) \ge \cdots \ge \sigma_n(X)$, where $\sigma_i(X)$ denotes the $i$-th largest singular value of $X$, and write $\Sigma_\kappa(X) := \mathrm{Diag}(\sigma_1(X), \ldots, \sigma_\kappa(X))$. For a vector $x \in \mathbb{R}^n$, $\|x\|_1$ denotes the $\ell_1$-norm and $\|x\|$ the $\ell_2$-norm. Let $\nabla_U F(U,V)$ and $\nabla_V F(U,V)$ denote the partial gradients of a function $F$ at $(U,V)$ with respect to the variables $U$ and $V$, respectively. For any $x \in \mathbb{R}^m$, $\mathrm{sign}(x) = [\mathrm{sign}(x_1), \ldots, \mathrm{sign}(x_m)]^{\top}$ with
$$ \mathrm{sign}(t) := \begin{cases} t/|t|, & t \ne 0;\\ 0, & t = 0, \end{cases} \qquad \text{for any } t \in \mathbb{R}. $$
For convenience, define
$$ G(U,V,z) := \frac{\alpha}{2}\|\mathcal{A}(UV^{\top}) - b - z\|^2, \qquad F(U,V) := \|\mathcal{A}(UV^{\top}) - b\|_1, \qquad g(U) := \|U\|_{2,0}. $$
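For readers who prefer code, the following is a minimal NumPy sketch of how the objectives $\Phi$ of (2) and $\Theta$ of (3) can be evaluated when $\mathcal{A}$ is the entrywise sampling operator used later in Section 5. The helper names (make_sampling_op, col_l20, Phi, Theta) are illustrative assumptions, not part of the paper's implementation.

```python
import numpy as np

def make_sampling_op(rows, cols):
    """Entrywise sampling A(X) = (X[i_t, j_t])_{t=1..m} and its adjoint A*(.)."""
    def A(X):
        return X[rows, cols]
    def At(xi, shape):
        Y = np.zeros(shape)
        np.add.at(Y, (rows, cols), xi)   # scatter-add, in case an index repeats
        return Y
    return A, At

def col_l20(X):
    """Column ell_{2,0}-norm: the number of nonzero columns of X."""
    return int(np.count_nonzero(np.linalg.norm(X, axis=0)))

def Phi(U, V, b, A, lam):
    """Objective of problem (2)."""
    return np.abs(A(U @ V.T) - b).sum() + lam * (col_l20(U) + col_l20(V))

def Theta(U, V, z, b, A, lam, alpha):
    """Potential function of problem (3)."""
    res = A(U @ V.T) - b - z
    return 0.5 * alpha * res @ res + np.abs(z).sum() + lam * (col_l20(U) + col_l20(V))
```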

2.2. Stationary Points and ϵ-Global Optimal Solutions

From [14], the generalized subdifferentials of an extended real-valued function $h: \mathbb{R}^{n\times m} \to \overline{\mathbb{R}} := (-\infty, \infty]$ at a point where $h$ attains a finite value are recalled.
Definition 1.
Let $h: \mathbb{R}^{n\times m} \to \overline{\mathbb{R}}$ be given and let $x$ be a point at which $h(x)$ is finite. The regular subdifferential of $h$ at $x$ is defined as
$$ \widehat{\partial} h(x) := \Big\{ v \in \mathbb{R}^{n\times m} \ \Big|\ \liminf_{x' \to x,\, x' \ne x} \frac{h(x') - h(x) - \langle v, x' - x\rangle}{\|x' - x\|_F} \ge 0 \Big\}, $$
and the basic (also called limiting or Mordukhovich) subdifferential of $h$ at $x$ is defined as
$$ \partial h(x) := \Big\{ v \in \mathbb{R}^{n\times m} \ \Big|\ \exists\, x^k \to x \text{ with } h(x^k) \to h(x) \text{ and } v^k \to v \text{ with } v^k \in \widehat{\partial} h(x^k) \Big\}. $$
Consider a proper lower semicontinuous (lsc) function $h: \mathbb{R}^{n\times m} \to \overline{\mathbb{R}}$. The stationary points and $\epsilon$-global optimal solutions of the problem $\min_{x \in \mathbb{R}^{n\times m}} h(x)$ are defined as follows.
Definition 2.
A point $x \in \mathbb{R}^{n\times m}$ with $0 \in \partial h(x)$ is called a stationary point of $\min_{x \in \mathbb{R}^{n\times m}} h(x)$.
Definition 3.
A point $x \in \mathbb{R}^{n\times m}$ satisfying
$$ h(x) - \epsilon \le h(x') \quad \forall\, x' \in \mathbb{R}^{n\times m} $$
is called an $\epsilon$-global optimal solution of $\min_{x \in \mathbb{R}^{n\times m}} h(x)$.

2.3. Kurdyka–Łojasiewicz Property

From [15], the KL property of an extended real-valued function is stated as follows.
Definition 4.
Let $h: \mathbb{X} \to \overline{\mathbb{R}}$ be a proper lsc function. The function $h$ is said to have the Kurdyka–Łojasiewicz (KL) property at $\bar{x} \in \mathrm{dom}\,\partial h$ if there exist $\eta \in (0, \infty]$, a continuous concave function $\varphi: [0, \eta) \to \mathbb{R}_+$ satisfying
(i) $\varphi(0) = 0$ and $\varphi$ is continuously differentiable on $(0, \eta)$;
(ii) $\varphi'(s) > 0$ for all $s \in (0, \eta)$;
and a neighborhood $\mathcal{U}$ of $\bar{x}$ such that, for all $x \in \mathcal{U}$ with $h(\bar{x}) < h(x) < h(\bar{x}) + \eta$,
$$ \varphi'\big(h(x) - h(\bar{x})\big)\, \mathrm{dist}\big(0, \partial h(x)\big) \ge 1. $$
If $h$ satisfies the KL property at every point of $\mathrm{dom}\,\partial h$, then $h$ is called a KL function.
Remark 1.
According to Lemma 2.1 of [15], a proper lsc function has the KL property at any noncritical point. Thus, to prove that a proper lsc function h : X R ¯ is a KL function, it suffices to check whether h has the KL property at any critical point.

3. Relationship Between Problems (2) and (3)

This section demonstrates the relationship between the stationary points and global optimal solutions of (2) and (3). On top of that, the stationary point sets of (2) and (3) are characterized. The subdifferential of $\Phi$ at an arbitrary point $(U,V) \in \mathbb{R}^{n_1\times\kappa}\times\mathbb{R}^{n_2\times\kappa}$ is given in the following lemma.
Lemma 1.
Fix any $\lambda > 0$ and consider any $(U,V) \in \mathbb{R}^{n_1\times\kappa}\times\mathbb{R}^{n_2\times\kappa}$. Then, it holds that
$$ \widehat{\partial}\Phi(U,V) = \partial\Phi(U,V) = \left\{ \begin{pmatrix} \mathcal{A}^{*}(\xi)V + \lambda\,\partial g(U)\\[2pt] [\mathcal{A}^{*}(\xi)]^{\top}U + \lambda\,\partial g(V) \end{pmatrix} \;\middle|\; \xi \in \partial\|\cdot\|_1\big(\mathcal{A}(UV^{\top}) - b\big) \right\}, $$
where, for any $X \in \mathbb{R}^{n\times m}$,
$$ \widehat{\partial} g(X) = \partial g(X) = S_1 \times \cdots \times S_m \quad \text{with } S_j = \begin{cases} \{0\}^{n} & \text{if } j \in J_X;\\ \mathbb{R}^{n} & \text{if } j \notin J_X. \end{cases} \qquad (4) $$
Proof. 
Recall that $\Phi(U,V) = F(U,V) + \lambda\big(g(U) + g(V)\big)$. From [14, Corollary 10.9 and Exercise 10.10], it immediately follows that
$$ \widehat{\partial}F(U,V) + \lambda\,\widehat{\partial}g(U)\times\lambda\,\widehat{\partial}g(V) \subseteq \widehat{\partial}\Phi(U,V) \subseteq \partial\Phi(U,V) \subseteq \partial F(U,V) + \lambda\,\partial g(U)\times\lambda\,\partial g(V). $$
Next, $\partial F(U,V)$ and $\widehat{\partial}F(U,V)$ are calculated. According to the chain rule [14, Corollary 8.11 and Theorem 10.6], we obtain
$$ \partial F(U,V) = \widehat{\partial}F(U,V) = \left\{ \begin{pmatrix} \mathcal{A}^{*}(\xi)V\\[2pt] [\mathcal{A}^{*}(\xi)]^{\top}U \end{pmatrix} \;\middle|\; \xi \in \partial\|\cdot\|_1\big(\mathcal{A}(UV^{\top}) - b\big) \right\}. $$
For any $U \in \mathbb{R}^{n_1\times\kappa}$, by invoking [14, Proposition 10.5], it immediately follows that
$$ \widehat{\partial} g(U) = \partial g(U) = S_1 \times \cdots \times S_\kappa \quad \text{with } S_j = \begin{cases} \{0\}^{n_1} & \text{if } j \in J_U;\\ \mathbb{R}^{n_1} & \text{if } j \notin J_U. \end{cases} $$
This, together with the inclusions above, implies the desired result. □
The subdifferential of Θ is characterized by the following lemma.
Lemma 2.
Fix any $\lambda > 0$ and consider any $(U,V,z) \in \mathbb{R}^{n_1\times\kappa}\times\mathbb{R}^{n_2\times\kappa}\times\mathbb{R}^m$. Then, it holds that
$$ \widehat{\partial}\Theta(U,V,z) = \partial\Theta(U,V,z) = \begin{pmatrix} \alpha\,\mathcal{A}^{*}\big(\mathcal{A}(UV^{\top}) - b - z\big)V + \lambda\,\partial g(U)\\[2pt] \alpha\,\big[\mathcal{A}^{*}\big(\mathcal{A}(UV^{\top}) - b - z\big)\big]^{\top}U + \lambda\,\partial g(V)\\[2pt] \alpha\big(z - \mathcal{A}(UV^{\top}) + b\big) + \partial\|\cdot\|_1(z) \end{pmatrix}, $$
where $\partial g$ is defined as in Lemma 1.
Proof. 
Recall that $\Theta(U,V,z) = G(U,V,z) + \|z\|_1 + \lambda\big(g(U) + g(V)\big)$. From [14, Exercise 8.8(c) and Proposition 10.5], it immediately follows that
$$ \nabla G(U,V,z) + \lambda\,\widehat{\partial}g(U)\times\lambda\,\widehat{\partial}g(V)\times\partial\|\cdot\|_1(z) \subseteq \widehat{\partial}\Theta(U,V,z) \subseteq \partial\Theta(U,V,z) \subseteq \nabla G(U,V,z) + \lambda\,\partial g(U)\times\lambda\,\partial g(V)\times\partial\|\cdot\|_1(z). $$
By the smoothness of $G$ and its expression, one has
$$ \nabla G(U,V,z) = \begin{pmatrix} \alpha\,\mathcal{A}^{*}\big(\mathcal{A}(UV^{\top}) - b - z\big)V\\[2pt] \alpha\,\big[\mathcal{A}^{*}\big(\mathcal{A}(UV^{\top}) - b - z\big)\big]^{\top}U\\[2pt] \alpha\big(z - \mathcal{A}(UV^{\top}) + b\big) \end{pmatrix}. $$
Along with the above inclusions and (4), the desired result is obtained. □
The following proposition states the equivalence between the stationary points of problems (2) and (3) under some mild conditions.
Proposition 1.
If $(U, V, z) \in \mathbb{R}^{n_1\times\kappa}\times\mathbb{R}^{n_2\times\kappa}\times\mathbb{R}^m$ is a stationary point of (3) such that every nonzero entry of $|\mathcal{A}(UV^{\top}) - b|$ is not smaller than $\frac{1}{\alpha}$, then $(U, V)$ is a stationary point of (2). Conversely, if $(U, V) \in \mathbb{R}^{n_1\times\kappa}\times\mathbb{R}^{n_2\times\kappa}$ is a stationary point of (2) such that every entry of $|\mathcal{A}(UV^{\top}) - b|$ is not smaller than $\frac{1}{\alpha}$, then $(U, V, z)$ with $z := \mathcal{A}(UV^{\top}) - b - \frac{1}{\alpha}\,\mathrm{sign}\big(\mathcal{A}(UV^{\top}) - b\big)$ is a stationary point of (3).
Proof. 
Pick any stationary point $(U, V, z)$ of (3). From Lemma 2, it holds that
$$ \begin{aligned} 0 &\in \alpha\,\mathcal{A}^{*}\big(\mathcal{A}(UV^{\top}) - b - z\big)V + \lambda\,\partial g(U); &\text{(5a)}\\ 0 &\in \alpha\,\big[\mathcal{A}^{*}\big(\mathcal{A}(UV^{\top}) - b - z\big)\big]^{\top}U + \lambda\,\partial g(V); &\text{(5b)}\\ 0 &\in \alpha\big(z - \mathcal{A}(UV^{\top}) + b\big) + \partial\|\cdot\|_1(z). &\text{(5c)} \end{aligned} $$
Write $y := \mathcal{A}(UV^{\top}) - b$. From (5c), it follows that $z = \mathrm{sign}(y)\circ\max\big(0, |y| - \frac{1}{\alpha}\big)$. Then,
$$ y - z = \mathrm{sign}(y)\circ\min\Big(|y|, \frac{1}{\alpha}\Big). \qquad (6) $$
Along with $|y_i| \ge \frac{1}{\alpha}$ or $y_i = 0$ for all $i = 1, \ldots, m$, one has $y - z = \frac{1}{\alpha}\,\mathrm{sign}(y)$. Thus, substituting $y - z = \frac{1}{\alpha}\,\mathrm{sign}(y)$ into (5a) and (5b), one obtains
$$ 0 \in \mathcal{A}^{*}\big(\mathrm{sign}(\mathcal{A}(UV^{\top}) - b)\big)V + \lambda\,\partial g(U) \quad \text{and} \quad 0 \in \big[\mathcal{A}^{*}\big(\mathrm{sign}(\mathcal{A}(UV^{\top}) - b)\big)\big]^{\top}U + \lambda\,\partial g(V). $$
This, together with the definition of $\mathrm{sign}(\cdot)$, implies that $(U, V)$ is a stationary point of (2).
Conversely, pick any stationary point $(U, V)$ of (2). According to Lemma 1, there exists $\xi \in \partial\|\cdot\|_1\big(\mathcal{A}(UV^{\top}) - b\big)$ such that
$$ 0 \in \mathcal{A}^{*}(\xi)V + \lambda\,\partial g(U) \quad \text{and} \quad 0 \in [\mathcal{A}^{*}(\xi)]^{\top}U + \lambda\,\partial g(V). $$
Write $y := \mathcal{A}(UV^{\top}) - b$ and $z := y - \frac{1}{\alpha}\xi$. Then, $(U, V, z)$ satisfies (5a) and (5b). By noting that $|\xi_i| \le 1$ and $|y_i| \ge \frac{1}{\alpha}$ for all $i = 1, \ldots, m$, one has $\alpha(y - z) = \xi \in \partial\|\cdot\|_1\big(y - \tfrac{1}{\alpha}\xi\big) = \partial\|\cdot\|_1(z)$, which means that (5c) holds. This implies that $(U, V, z)$ is a stationary point of (3). The proof is completed. □
Remark 2.
The assumption required for the converse direction generally does not hold, because the residual vector $y$ is expected to be sparse, so that many entries of $|y|$ are zero rather than at least $\frac{1}{\alpha}$. This indicates that the set of stationary points of problem (3) is typically smaller than that of problem (2). Together with the relationship between the global optimal solutions of (2) and (3) established in the next proposition, this shows that the stationary points of (3) exclude certain undesirable points contained in the stationary point set of (2).
Proposition 2.
If $(U^*, V^*, z^*)$ is a global optimal solution of (3), then $(U^*, V^*)$ is an $\frac{m}{2\alpha}$-global optimal solution of (2).
Proof. 
If $(U^*, V^*, z^*)$ is a global optimal solution of (3), then $(U^*, V^*, z^*)$ is a stationary point of (3). Write $y^* := \mathcal{A}(U^*(V^*)^{\top}) - b$. From (6), it holds that
$$ y^* - z^* = \mathrm{sign}(y^*)\circ\min\Big(|y^*|, \frac{1}{\alpha}\Big). $$
Let $I := \{ i \in [m] \mid |y_i^*| < \frac{1}{\alpha} \}$ and $\bar{I} := \{ i \in [m] \mid |y_i^*| \ge \frac{1}{\alpha} \}$. Then, one has
$$ \begin{aligned} G(U^*, V^*, z^*) + \|z^*\|_1 &= \frac{\alpha}{2}\|y^* - z^*\|^2 + \|z^*\|_1 = \frac{\alpha}{2}\Big\|\mathrm{sign}(y^*)\circ\min\Big(|y^*|, \frac{1}{\alpha}\Big)\Big\|^2 + \Big\|\mathrm{sign}(y^*)\circ\max\Big(0, |y^*| - \frac{1}{\alpha}\Big)\Big\|_1\\ &= \frac{\alpha}{2}\sum_{i \in I}|y_i^*|^2 + \sum_{i \in \bar{I}}\Big(|y_i^*| - \frac{1}{2\alpha}\Big) = \|y^*\|_1 - \frac{m}{2\alpha} + \sum_{i \in I}\Big(\frac{\alpha}{2}|y_i^*|^2 - |y_i^*| + \frac{1}{2\alpha}\Big) \ \ge\ \|y^*\|_1 - \frac{m}{2\alpha}, \end{aligned} $$
where the inequality holds since $\frac{\alpha}{2}|y_i^*|^2 - |y_i^*| + \frac{1}{2\alpha} = \frac{\alpha}{2}\big(|y_i^*| - \frac{1}{\alpha}\big)^2 \ge 0$ for each $i \in I$. Since $(U^*, V^*, z^*)$ is a global optimal solution of (3), for any $(U, V)$ and $z = \mathcal{A}(UV^{\top}) - b$,
$$ \Phi(U^*, V^*) - \frac{m}{2\alpha} \le \Theta(U^*, V^*, z^*) \le \Theta(U, V, z) = \Phi(U, V). $$
This means that $(U^*, V^*)$ is an $\frac{m}{2\alpha}$-global optimal solution of (2). □
Remark 3.
Proposition 2 demonstrates that a global optimal solution $(U^*, V^*, z^*)$ of problem (3) yields an approximate optimal solution $(U^*, V^*)$ of problem (2), and that as the parameter $\alpha$ increases, the value $\Phi(U^*, V^*)$ progressively approaches the optimal value of problem (2).
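As a small sanity check on the bound behind Proposition 2, the snippet below verifies numerically that, for the minimizing $z = \mathrm{sign}(y)\circ\max(0, |y| - 1/\alpha)$, the quantity $\frac{\alpha}{2}\|y - z\|^2 + \|z\|_1$ never falls below $\|y\|_1 - \frac{m}{2\alpha}$. The random residual used here is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, m = 5.0, 10000
# an illustrative residual mixing many small entries with a few large (outlier-like) ones
y = rng.standard_normal(m) * rng.choice([0.01, 10.0], size=m, p=[0.7, 0.3])
z = np.sign(y) * np.maximum(0.0, np.abs(y) - 1.0 / alpha)      # minimizer of (3) over z
lhs = 0.5 * alpha * np.sum((y - z) ** 2) + np.abs(z).sum()
rhs = np.abs(y).sum() - m / (2.0 * alpha)
# the entrywise gap is (alpha/2)*(|y_i| - 1/alpha)^2 on I and zero on its complement
print(lhs >= rhs)   # True
```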

4. A PALM Method for Solving Problem (3)

Recall that $G(U,V,z) = \frac{\alpha}{2}\|\mathcal{A}(UV^{\top}) - b - z\|^2$ is a smooth function whose partial gradients $\nabla_U G(\cdot, V, z)$ and $\nabla_V G(U, \cdot, z)$ are Lipschitz continuous with moduli $\tau_V$ and $\tau_U$, respectively. Fix any $(U, V, z) \in \mathbb{R}^{n_1\times\kappa}\times\mathbb{R}^{n_2\times\kappa}\times\mathbb{R}^m$. According to the descent lemma [16, Proposition A.24], it holds that
$$ G(U', V, z) \le G(U, V, z) + \langle \nabla_U G(U, V, z), U' - U\rangle + \frac{\tau_V}{2}\|U' - U\|_F^2 \quad \forall\, U' \in \mathbb{R}^{n_1\times\kappa}, $$
$$ G(U, V', z) \le G(U, V, z) + \langle \nabla_V G(U, V, z), V' - V\rangle + \frac{\tau_U}{2}\|V' - V\|_F^2 \quad \forall\, V' \in \mathbb{R}^{n_2\times\kappa}. $$
Let $(U^k, V^k, z^k)$ be the current iterate. From the expression of $\Theta$ and the above two inequalities, one obtains
$$ \begin{aligned} \Theta(U, V^k, z^k) \le \widetilde{\Theta}_U(U, V^k, z^k) &:= G(U^k, V^k, z^k) + \langle \nabla_U G(U^k, V^k, z^k), U - U^k\rangle + \lambda\|U\|_{2,0} + \|z^k\|_1 + \lambda\|V^k\|_{2,0} + \frac{\tau_{V^k}}{2}\|U - U^k\|_F^2,\\ \Theta(U^k, V, z^k) \le \widetilde{\Theta}_V(U^k, V, z^k) &:= G(U^k, V^k, z^k) + \langle \nabla_V G(U^k, V^k, z^k), V - V^k\rangle + \lambda\|V\|_{2,0} + \|z^k\|_1 + \lambda\|U^k\|_{2,0} + \frac{\tau_{U^k}}{2}\|V - V^k\|_F^2, \end{aligned} $$
which become equalities when $U = U^k$ and $V = V^k$, respectively. Consequently, $\widetilde{\Theta}_U(\cdot, V^k, z^k)$ and $\widetilde{\Theta}_V(U^k, \cdot, z^k)$ serve as majorizations of $\Theta(\cdot, V^k, z^k)$ at $U^k$ and of $\Theta(U^k, \cdot, z^k)$ at $V^k$, respectively. An algorithm for problem (3) is developed by alternately minimizing these majorizations. The iteration steps are outlined in Algorithm 1 below, and the resulting subproblems admit the closed-form solutions given in Remark 4.
Remark 4.
(i) For any $\alpha > 0$, the optimal solution of subproblem (7) has the closed form $z^{k+1} = \mathrm{sign}(y^k)\circ\max\big(0, |y^k| - \frac{1}{\alpha}\big)$, where "$\circ$" denotes the Hadamard (entrywise) product.
(ii) Let $H^k = U^k - \frac{1}{\gamma_{1,k}}\nabla_U G(U^k, V^k, z^{k+1})$ and $S^k = V^k - \frac{1}{\gamma_{2,k}}\nabla_V G(U^{k+1}, V^k, z^{k+1})$. Then, the columns of $U^{k+1}$ and $V^{k+1}$ have the closed forms
$$ U_i^{k+1} = \mathrm{sign}\Big(\max\big(0, \|H_i^k\|^2 - 2\lambda\gamma_{1,k}^{-1}\big)\Big)\, H_i^k \quad \text{and} \quad V_i^{k+1} = \mathrm{sign}\Big(\max\big(0, \|S_i^k\|^2 - 2\lambda\gamma_{2,k}^{-1}\big)\Big)\, S_i^k \quad \text{for } i = 1, \ldots, \kappa. $$
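The two closed-form updates in Remark 4 amount to a soft-thresholding step for $z$ and a column-wise hard-thresholding step for $U$ and $V$. A minimal NumPy sketch (with illustrative function names) is given below.

```python
import numpy as np

def prox_l1(y, alpha):
    """z-update of Remark 4(i): soft-thresholding, i.e., the proximal map of (1/alpha)*||.||_1."""
    return np.sign(y) * np.maximum(0.0, np.abs(y) - 1.0 / alpha)

def prox_col_l20(H, lam, gamma):
    """U/V-update of Remark 4(ii): column-wise hard-thresholding for (lam/gamma)*||.||_{2,0}.
    A column H_i is kept iff ||H_i||^2 > 2*lam/gamma and is zeroed out otherwise."""
    keep = np.linalg.norm(H, axis=0) ** 2 > 2.0 * lam / gamma
    return H * keep          # broadcasting zeroes the dropped columns
```

With $H^k = U^k - \gamma_{1,k}^{-1}\nabla_U G(U^k, V^k, z^{k+1})$, the call prox_col_l20(H, lam, gamma1) realizes the $U$-update of Algorithm 1 below; the $V$-update is analogous.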
Algorithm 1 (PALM method for solving (3))
1: Input: sampling operator $\mathcal{A}$ and observation vector $b \in \mathbb{R}^m$.
2: Initialization: choose an initial point $(U^0, V^0) \in \mathbb{R}^{n_1\times\kappa}\times\mathbb{R}^{n_2\times\kappa}$ and set $\lambda, \mu > 0$.
3: while the stopping conditions are not satisfied do
4:    Compute $y^k = \mathcal{A}(U^k(V^k)^{\top}) - b$.
5:    Solve for $z^{k+1}$:
$$ z^{k+1} = \arg\min_{z \in \mathbb{R}^m} \Big\{ \frac{\alpha}{2}\|y^k - z\|^2 + \|z\|_1 \Big\}. \qquad (7) $$
6:    Set $\gamma_{1,k} \ge \tau_{V^k} + \mu$ and solve for $U^{k+1}$:
$$ U^{k+1} \in \arg\min_{U \in \mathbb{R}^{n_1\times\kappa}} \Big\{ \langle \nabla_U G(U^k, V^k, z^{k+1}), U \rangle + \lambda\|U\|_{2,0} + \frac{\gamma_{1,k}}{2}\|U - U^k\|_F^2 \Big\}. $$
7:    Set $\gamma_{2,k} \ge \tau_{U^{k+1}} + \mu$ and solve for $V^{k+1}$:
$$ V^{k+1} \in \arg\min_{V \in \mathbb{R}^{n_2\times\kappa}} \Big\{ \langle \nabla_V G(U^{k+1}, V^k, z^{k+1}), V \rangle + \lambda\|V\|_{2,0} + \frac{\gamma_{2,k}}{2}\|V - V^k\|_F^2 \Big\}. $$
8: end while
9: Output: $X = U^{k+1}(V^{k+1})^{\top}$.

For each $k \in \mathbb{N}$, write $w^k := (U^k, V^k, z^k)$. To establish the convergence of the sequence $\{w^k\}_{k\in\mathbb{N}}$, the following proposition is required.
Proposition 3.
Let $\{w^k\}_{k\in\mathbb{N}}$ be the sequence generated by Algorithm 1. Then, the following statements hold.
(i) For each $k \in \mathbb{N}$, it holds that $\Theta(w^{k+1}) \le \Theta(w^k) - \frac{\min\{\alpha, \mu\}}{2}\|w^k - w^{k+1}\|_F^2$. Hence, the sequence $\{\Theta(w^k)\}_{k\in\mathbb{N}}$ is convergent.
(ii) For each $k \in \mathbb{N}$, $(A_1^{k+1}, A_2^{k+1}, A_3^{k+1}) \in \partial\Theta(w^{k+1})$ with
$$ \begin{aligned} A_1^{k+1} &:= \nabla_U G(w^{k+1}) - \nabla_U G(U^k, V^k, z^{k+1}) - \gamma_{1,k}(U^{k+1} - U^k);\\ A_2^{k+1} &:= \nabla_V G(w^{k+1}) - \nabla_V G(U^{k+1}, V^k, z^{k+1}) - \gamma_{2,k}(V^{k+1} - V^k);\\ A_3^{k+1} &:= \alpha(y^k - y^{k+1}). \end{aligned} $$
Assume, in addition, that $\{w^k\}_{k\in\mathbb{N}}$ is bounded. Then, there exists a constant $c > 0$ such that
$$ \mathrm{dist}\big(0, \partial\Theta(w^{k+1})\big) \le c\,\|w^k - w^{k+1}\|_F. $$
Proof. 
(i) By the optimality of $z^{k+1}$ and the strong convexity of subproblem (7), one obtains
$$ \|z^{k+1}\|_1 + \frac{\alpha}{2}\|y^k - z^{k+1}\|^2 + \frac{\alpha}{2}\|z^k - z^{k+1}\|^2 \le \|z^k\|_1 + \frac{\alpha}{2}\|y^k - z^k\|^2. \qquad (8) $$
Similarly, by the optimality of $U^{k+1}$ and $V^{k+1}$, one has
$$ \begin{aligned} \langle \nabla_U G(U^k, V^k, z^{k+1}), U^{k+1}\rangle + \lambda\|U^{k+1}\|_{2,0} + \frac{\gamma_{1,k}}{2}\|U^{k+1} - U^k\|_F^2 &\le \langle \nabla_U G(U^k, V^k, z^{k+1}), U^k\rangle + \lambda\|U^k\|_{2,0};\\ \langle \nabla_V G(U^{k+1}, V^k, z^{k+1}), V^{k+1}\rangle + \lambda\|V^{k+1}\|_{2,0} + \frac{\gamma_{2,k}}{2}\|V^{k+1} - V^k\|_F^2 &\le \langle \nabla_V G(U^{k+1}, V^k, z^{k+1}), V^k\rangle + \lambda\|V^k\|_{2,0}. \end{aligned} $$
The above two inequalities imply that
$$ \begin{aligned} &\lambda\|U^k\|_{2,0} + \lambda\|V^k\|_{2,0} - \frac{\gamma_{1,k}}{2}\|U^{k+1} - U^k\|_F^2 - \frac{\gamma_{2,k}}{2}\|V^{k+1} - V^k\|_F^2\\ &\quad \ge \langle \nabla_U G(U^k, V^k, z^{k+1}), U^{k+1} - U^k\rangle + \langle \nabla_V G(U^{k+1}, V^k, z^{k+1}), V^{k+1} - V^k\rangle + \lambda\|U^{k+1}\|_{2,0} + \lambda\|V^{k+1}\|_{2,0}. \end{aligned} \qquad (9) $$
By the Lipschitz continuity of $\nabla_U G(\cdot, V, z)$ and $\nabla_V G(U, \cdot, z)$, one has
$$ \begin{aligned} G(U^{k+1}, V^k, z^{k+1}) &\le G(U^k, V^k, z^{k+1}) + \langle \nabla_U G(U^k, V^k, z^{k+1}), U^{k+1} - U^k\rangle + \frac{\tau_{V^k}}{2}\|U^{k+1} - U^k\|_F^2;\\ G(U^{k+1}, V^{k+1}, z^{k+1}) &\le G(U^{k+1}, V^k, z^{k+1}) + \langle \nabla_V G(U^{k+1}, V^k, z^{k+1}), V^{k+1} - V^k\rangle + \frac{\tau_{U^{k+1}}}{2}\|V^{k+1} - V^k\|_F^2. \end{aligned} $$
Adding these two inequalities and invoking inequality (9) together with $\gamma_{1,k} \ge \tau_{V^k} + \mu$ and $\gamma_{2,k} \ge \tau_{U^{k+1}} + \mu$, it immediately holds that
$$ G(w^{k+1}) + \lambda\|U^{k+1}\|_{2,0} + \lambda\|V^{k+1}\|_{2,0} \le G(U^k, V^k, z^{k+1}) + \lambda\|U^k\|_{2,0} + \lambda\|V^k\|_{2,0} - \frac{\mu}{2}\|U^{k+1} - U^k\|_F^2 - \frac{\mu}{2}\|V^{k+1} - V^k\|_F^2. $$
Combining the last inequality with the definition of $\Theta$, inequality (8), and the lower boundedness of $\Theta$, part (i) follows.
(ii) From the optimality conditions of $z^{k+1}$, $U^{k+1}$, and $V^{k+1}$, one has
$$ \begin{cases} 0 \in \alpha(z^{k+1} - y^k) + \partial\|\cdot\|_1(z^{k+1});\\ 0 \in \nabla_U G(U^k, V^k, z^{k+1}) + \gamma_{1,k}(U^{k+1} - U^k) + \lambda\,\partial\|\cdot\|_{2,0}(U^{k+1});\\ 0 \in \nabla_V G(U^{k+1}, V^k, z^{k+1}) + \gamma_{2,k}(V^{k+1} - V^k) + \lambda\,\partial\|\cdot\|_{2,0}(V^{k+1}). \end{cases} $$
Then, by Lemma 2, it is not hard to obtain $(A_1^{k+1}, A_2^{k+1}, A_3^{k+1}) \in \partial\Theta(w^{k+1})$. Hence, $\mathrm{dist}\big(0, \partial\Theta(w^{k+1})\big) \le \|(A_1^{k+1}, A_2^{k+1}, A_3^{k+1})\|_F$. By the boundedness of $\{w^k\}_{k\in\mathbb{N}}$ and the Lipschitz continuity of $\nabla_U G(\cdot, V, z)$ and $\nabla_V G(U, \cdot, z)$, the desired result is obtained. □
From [15, Section 4], $\Theta$ is a KL function. By Proposition 3 and the same reasoning as in [17, Theorem 1], the main convergence result is established as follows.
Theorem 1.
Suppose that the sequence $\{w^k\}_{k\in\mathbb{N}}$ generated by Algorithm 1 is bounded. Then, $\{w^k\}_{k\in\mathbb{N}}$ converges, and its limit $(\widetilde{U}, \widetilde{V}, \widetilde{z})$ is a critical point of (3). If, in addition, the nonzero entries of $|\mathcal{A}(\widetilde{U}\widetilde{V}^{\top}) - b|$ are not smaller than $\frac{1}{\alpha}$, then $(\widetilde{U}, \widetilde{V})$ is a critical point of $\Phi$.
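To make the overall procedure concrete, the following is a compact sketch of Algorithm 1 for the entrywise sampling operator of Section 5. It is a simplified illustration under stated assumptions, not the authors' implementation: each observed index is assumed to be sampled at most once, so that $\alpha\|V^k\|_2^2$ and $\alpha\|U^{k+1}\|_2^2$ upper-bound the Lipschitz moduli $\tau_{V^k}$ and $\tau_{U^{k+1}}$, and the stopping tests of Section 5.1 are replaced by a simple successive-change criterion.

```python
import numpy as np

def palm(A, At, b, shape, kappa, lam, alpha=5.0, mu=1e-8, max_iter=1000, tol=1e-5):
    """Simplified sketch of Algorithm 1 (PALM) for problem (3).

    A, At : sampling operator and its adjoint (e.g., entrywise sampling).
    shape : (n1, n2) of the matrix to recover; kappa : column budget of U and V.
    """
    # spectral initialization from M_Omega = At(b), as in Section 5.1
    P, s, Qt = np.linalg.svd(At(b, shape), full_matrices=False)
    d = np.sqrt(s[:kappa])
    U, V = P[:, :kappa] * d, Qt[:kappa, :].T * d
    z = np.zeros_like(b)
    for _ in range(max_iter):
        y = A(U @ V.T) - b
        z = np.sign(y) * np.maximum(0.0, np.abs(y) - 1.0 / alpha)       # z-update (soft-threshold)
        # U-update: one proximal gradient step on G(., V, z) + lam*||.||_{2,0}
        gamma1 = alpha * np.linalg.norm(V, 2) ** 2 + mu                 # upper bound on tau_{V^k} + mu
        H = U - alpha * At(y - z, shape) @ V / gamma1
        U_new = H * (np.linalg.norm(H, axis=0) ** 2 > 2.0 * lam / gamma1)
        # V-update, with the residual refreshed at U_new
        y = A(U_new @ V.T) - b
        gamma2 = alpha * np.linalg.norm(U_new, 2) ** 2 + mu
        S = V - alpha * At(y - z, shape).T @ U_new / gamma2
        V_new = S * (np.linalg.norm(S, axis=0) ** 2 > 2.0 * lam / gamma2)
        done = max(np.linalg.norm(U_new - U), np.linalg.norm(V_new - V)) <= tol * (1.0 + np.linalg.norm(b))
        U, V = U_new, V_new
        if done:
            break
    return U, V, z
```

Under these assumptions, the product U @ V.T of the returned factors plays the role of the output $X = U^{k+1}(V^{k+1})^{\top}$ in Algorithm 1.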

5. Numerical Experiments

The efficiency of Algorithm 1 is validated by solving matrix completion problems with outliers under uniform sampling, and its performance is compared with that of SubGM from [7] for solving problem (1). All numerical tests are conducted in MATLAB 2024b on a laptop running 64-bit Windows with an Intel(R) Core(TM) i9-13905H CPU at 2.60 GHz and 32 GB RAM (Intel, Santa Clara, CA, USA).

5.1. Implementation Details of Algorithms

Matrix completion problems involving outliers are considered. To generate the sampling operator $\mathcal{A}$, a random index set $\Omega = \{(i_t, j_t)\}_{t=1,\ldots,m}$ is drawn, with indices sampled independently from the uniform distribution. The mapping $\mathcal{A}$ is then defined by $\mathcal{A}(X) := (X_{i_1,j_1}, X_{i_2,j_2}, \ldots, X_{i_m,j_m})^{\top}$ for $X \in \mathbb{R}^{n_1\times n_2}$, and $b = \mathcal{A}(M^\Omega)$, where $M^\Omega \in \mathbb{R}^{n_1\times n_2}$ with
$$ [M^\Omega]_{ij} = \begin{cases} M_{i_t j_t} + \varpi_t & \text{if } (i,j) = (i_t, j_t) \in \Omega \text{ for some } t \in \{1, \ldots, m\},\\ 0 & \text{if } (i,j) \notin \Omega. \end{cases} $$
The true matrix $M \in \mathbb{R}^{n_1\times n_2}$ of rank $r$ is generated as $M = M_L(M_R)^{\top}$, where the entries of $M_L \in \mathbb{R}^{n_1\times r}$ and $M_R \in \mathbb{R}^{n_2\times r}$ are sampled independently from the standard normal distribution $N(0,1)$. The nonzero entries of the sparse noise vector $\varpi = (\varpi_1, \ldots, \varpi_m)^{\top}$ follow one of two distributions: (i) Student's t-distribution with four degrees of freedom, scaled by 2 (Case I), or (ii) the Laplace distribution with density $d(u) = 0.5\exp(-|u|)$ (Case II). Unless otherwise specified, the number of nonzero entries in $\varpi$ is set to $0.3m$ in all tests. The parameters are set as $\mu = 10^{-8}$ and $\lambda = c_\lambda\|b\|$, where $c_\lambda > 0$ is specified in the experiments. Unless otherwise stated, $\kappa = \min(n_1, n_2, 150)$ is taken. The initial point $(U^0, V^0)$ of Algorithm 1 is set to $(P^1\Sigma_\kappa(M^\Omega)^{1/2}, Q^1\Sigma_\kappa(M^\Omega)^{1/2})$, where $P^1$ and $Q^1$ denote the matrices consisting of the first $\kappa$ left and right singular vectors of $M^\Omega$, respectively. For SubGM, as in [7], the Polyak step size rule [18] is employed. Since the optimal value of (1) is typically unknown in practice, the step size $\frac{\Psi(U^k, V^k) - \min_{(U,V)}\Psi(U,V)}{\|\zeta^k\|_F^2}$ is replaced by $\frac{0.05\,\Psi(U^k, V^k)}{\|\zeta^k\|_F^2}$, where $\zeta^k \in \partial\Psi(U^k, V^k)$ for each $k \in \mathbb{N}$.
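A sketch of this data-generation procedure (with illustrative names, and with $\Omega$ drawn uniformly with replacement for simplicity) is as follows.

```python
import numpy as np

def make_instance(n1, n2, r, m, nzr=0.3, case="I", seed=0):
    """Synthetic matrix-completion instance with outliers, following Section 5.1 (a sketch)."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n1, r)) @ rng.standard_normal((n2, r)).T    # true rank-r matrix M_L M_R^T
    rows = rng.integers(0, n1, size=m)                                   # uniformly sampled indices
    cols = rng.integers(0, n2, size=m)
    w = np.zeros(m)
    corrupted = rng.choice(m, size=int(nzr * m), replace=False)          # 0.3*m outliers by default
    if case == "I":
        w[corrupted] = 2.0 * rng.standard_t(df=4, size=corrupted.size)   # Student's t (4 dof), scaled by 2
    else:
        w[corrupted] = rng.laplace(scale=1.0, size=corrupted.size)       # Laplace, density 0.5*exp(-|u|)
    b = M[rows, cols] + w                                                # b = A(M^Omega)
    return rows, cols, b, M
```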
Algorithm 1 is terminated at the iterate $w^k = (U^k, V^k, z^k)$ when $k \ge k_{\max}$ or one of the following conditions is satisfied:
$$ \max_{j \in \{1,\ldots,19\}} \frac{|\Theta(w^k) - \Theta(w^{k-j})|}{\max\{1, \Theta(w^k)\}} \le \epsilon_1 \quad \text{or} \quad \frac{R_1^k + R_2^k + R_3^k}{1 + \|b\|} \le \epsilon_2 \ \text{ for } k \ge 30, $$
where
$$ \begin{aligned} R_1^k &= \alpha\|y^{k-1} - y^k\|,\\ R_2^k &= \big\|\nabla_U G(w^k) - \nabla_U G(U^{k-1}, V^{k-1}, z^k) - \gamma_{1,k-1}(U^k - U^{k-1})\big\|_F,\\ R_3^k &= \big\|\nabla_V G(w^k) - \nabla_V G(U^k, V^{k-1}, z^k) - \gamma_{2,k-1}(V^k - V^{k-1})\big\|_F. \end{aligned} $$
For a fair comparison, the initial point of SubGM is set identical to that of Algorithm 1. SubGM is terminated at the iterate $(U^k, V^k)$ whenever $k > k_{\max}$ or one of the following conditions is satisfied:
$$ \max_{j \in \{1,\ldots,9\}} \frac{|\Psi(U^k, V^k) - \Psi(U^{k-j}, V^{k-j})|}{\max\{1, \Psi(U^k, V^k)\}} \le \epsilon_3 \quad \text{or} \quad \frac{\|U^{k+1}(V^{k+1})^{\top} - U^k(V^k)^{\top}\|_F}{1 + \|U^{k+1}(V^{k+1})^{\top}\|_F} \le \epsilon_4 \ \text{ with } k \ge 200. $$
Unless otherwise stated, $\epsilon_1 = 10^{-4}$, $\epsilon_2 = 10^{-5}$, and $k_{\max} = 1000$ are used for Algorithm 1, while $\epsilon_3 = 5\times 10^{-4}$, $\epsilon_4 = 5\times 10^{-4}$, and $k_{\max} = 500$ are used for SubGM.
The matrix recovery performance is evaluated using the relative error (RE), defined by
$$ \mathrm{RE} := \frac{\|X^{\mathrm{out}} - M\|_F}{\|M\|_F}, $$
where $X^{\mathrm{out}} = U^{\mathrm{out}}(V^{\mathrm{out}})^{\top}$ denotes the output of a solver. All reported results are averaged over five instances of each experiment.

5.2. Parameter Sensitivity Analysis

In this section, the effect of the parameters on the performance of Algorithm 1 is evaluated. Figure 1 illustrates the curves of RE, recovered rank, and running time (in seconds) of Algorithm 1 for solving problem (3) with $n_1 = n_2 = 1000$ and $r = 5$, under Case I and Case II. As shown in Figure 1, for $2 \le \alpha \le 8.5$, Algorithm 1 yields both exact rank recovery and the lowest relative error in both cases. Values of $\alpha$ exceeding 8.5 degrade the algorithm's performance: the RE increases significantly, accompanied by an increase in computational time. In view of this, $\alpha$ is set to 5 in all subsequent experiments.
Under the settings $n_1 = n_2 = 1000$, $r = 5$, $\kappa = 10r$, and SR = 0.2, Figure 2 illustrates the RE, recovered rank, and running time curves of Algorithm 1 for varying $\lambda$. The results show that a range of $\lambda$ values (e.g., $8 \le c_\lambda \le 24$) achieves low relative errors while accurately recovering the true rank of $M$.
For $n := n_1 = n_2$ and $r = 10$, Figure 3 shows the variation in running time and RE as $n$ increases. These results indicate that the computational time remains below 160 s for $n \le 7000$ and does not exceed 400 s even for $n = 10^4$. As $n$ increases, the RE decreases rapidly, remaining below 0.015 for $n \ge 3000$.
For Figure 4 and Figure 5, the settings are $n_1 = n_2 = 1000$ and $r = 5$. Figure 4 displays the average RE over five repetitions for sampling ratios (SRs) in $\{0.1, 0.15, 0.2, \ldots, 0.5\}$. Under uniform sampling, the RE decreases monotonically with increasing SR, while the running time consistently stays within 5 s.
Figure 5 examines the variation in RE with the ratio of nonzero entries in the noise vector (NZR), where NZR is set to $\{0.1, 0.15, 0.18, \ldots, 0.5\}$. As shown in Figure 5, the RE increases with NZR, while the running time remains below 4 s. Although the problem becomes harder as NZR increases, the RE remains close to 0.02 even when NZR reaches 0.5.

5.3. Numerical Comparisons with SubGM

The performance of Algorithm 1 in terms of solution quality and running time is compared with that of SubGM. Since model (1) does not promote low-rank structures, $\kappa = 3r$ and $n_1 = n_2 = n$ are adopted in the following tests. The numerical results of the two methods are obtained under the stopping criteria described in Section 5.1, with $\epsilon_1 = \epsilon_2 = \epsilon_3 = \epsilon_4 = 5\times 10^{-4}$ and $k_{\max} = 500$. Table 1 and Table 2 report the average results over five randomly generated instances for each setting. Compared with SubGM, Algorithm 1 consistently achieves a lower RE in significantly less computational time.
Reconstructing images from limited and noisy measurements is a key application of low-rank matrix completion. To further evaluate the effectiveness of the proposed method, experiments were conducted on images from the ZJU dataset [19]. Each image has dimensions of 300 × 300 . The initial points for both algorithms were obtained using the same procedure described in Section 5.1, with κ = 40 .
The results are presented in Table 3 and Figure 6. The images were undersampled using various sampling rates and masks and then restored using a similar process reported in [9]. Table 3 lists the relative error and peak signal-to-noise ratio (PSNR) for each image restoration task. The results show that the proposed algorithm outperforms SubGM in terms of both computational time and restoration quality.

6. Conclusions

This paper has focused on robust factorization models for low-rank matrix recovery, which are of great significance in multiple fields such as machine learning and imaging sciences. To deal with the challenge of outliers, an $\ell_1$-loss robust factorized model with $\ell_{2,0}$-norm regularization is proposed. Given the nonconvex and discontinuous nature of the problem, an approximation problem (3) is constructed. Under mild assumptions, the equivalence of stationary points between the original and the approximation problems is verified, and it is proven that a global optimum of the approximation problem serves as an approximate optimal solution of the original problem in terms of objective value. On top of that, a fast PALM method is proposed to solve the approximation problem, and the global convergence of its iterate sequence is established. Numerical experiments on synthetic datasets with outliers and on image restoration tasks demonstrate that the PALM method achieves low relative errors within a significantly shorter computational time, particularly on large-scale datasets. Despite the promising results, the proposed approach has some limitations: its performance relies on the proper selection of parameters such as $\lambda$ and $\alpha$. To address these limitations, future work will include the development of adaptive parameter-tuning strategies.

Author Contributions

Methodology, T.T. and J.Z.; Validation, L.X.; Formal analysis, T.T. and L.X.; Investigation, J.Z.; Writing—original draft, T.T., L.X. and J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

T.T. was supported by the Guangdong Basic and Applied Basic Research Foundation No. 2023A1515111167. L.X. was supported by the Guangdong Basic and Applied Basic Research Foundation No. 2022A1515110959. J.Z. was funded by the National Natural Science Foundation of China No. 12401630, the Educational Commission of Guangdong Province No. 2023KQNCX073, and the Natural Science Foundation of Guangdong Province No. 2023A1515110558.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Candès, E.J.; Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math. 2009, 9, 717–772. [Google Scholar] [CrossRef]
  2. Davenport, M.A.; Romberg, J. An overview of low-rank matrix recovery from incomplete observations. IEEE J. Sel. Top. Signal Process. 2016, 10, 608–622. [Google Scholar] [CrossRef]
  3. Fazel, M.; Hindi, H.; Boyd, S. Rank Minimization and Applications in System Theory. Proc. 2004 Am. Control Conf. 2004, 4, 3273–3278. [Google Scholar]
  4. Gross, D.; Liu, Y.K.; Flammia, S.T.; Becker, S.; Eisert, J. Quantum state tomography via compressed sensing. Phys. Rev. Lett. 2010, 105, 150401. [Google Scholar] [CrossRef] [PubMed]
  5. Negahban, S.; Wainwright, M.J. Estimation of (near) low-rank matrices with noise and high-dimensional scaling. Ann. Stat. 2011, 39, 1069–1097. [Google Scholar] [CrossRef]
  6. Charisopoulos, V.; Chen, Y.; Davis, D.; Díaz, M.; Ding, L.; Drusvyatskiy, D. Low-Rank Matrix Recovery with Composite Optimization: Good Conditioning and Rapid Convergence. Found. Comput. Math. 2021, 21, 1505–1593. [Google Scholar] [CrossRef]
  7. Li, X.; Zhu, Z.H.; So, A.M.; Vidal, R. Nonconvex Robust Low-Rank Matrix Recovery. SIAM J. Optim. 2020, 30, 660–686. [Google Scholar] [CrossRef]
  8. Ma, J.H.; Fattahi, S. Global Convergence of Sub-gradient Method for Robust Matrix Recovery: Small Initialization, Noisy Measurements, and Over-parameterization. J. Mach. Learn. Res. 2023, 24, 1–84. [Google Scholar]
  9. Wang, Z.Y.; So, H.C.; Liu, Z.F. Fast and robust rank-one matrix completion via maximum correntropy criterion and half-quadratic optimization. Signal Process. 2022, 198, 108580. [Google Scholar] [CrossRef]
  10. Tao, T.; Qian, Y.T.; Pan, S.H. Column 2,0-norm regularized factorization model of low-rank matrix recovery and its computation. SIAM J. Optim. 2022, 32, 959–988. [Google Scholar] [CrossRef]
  11. Candès, E.J.; Li, X.; Ma, Y.; Wright, J. Robust principal component analysis. J. ACM 2011, 11, 1–37. [Google Scholar] [CrossRef]
  12. Josz, C.; Ouyang, Y.; Zhang, R.; Lavaei, J.; Sojoudi, S. A theory on the absence of spurious solutions for nonconvex and nonsmooth optimization. Adv. Neural Inf. Process. Syst. 2018, 31, 2441–2449. [Google Scholar]
  13. Li, Y.; Sun, Y.; Chi, Y. Low-rank positive semidefinite matrix recovery from corrupted rank-one measurements. IEEE Trans. Signal Process. 2017, 65, 397–408. [Google Scholar] [CrossRef]
  14. Rockafellar, R.T.; Wets, R.J.-B. Variational Analysis; Springer: Berlin/Heidelberg, Germany, 1998. [Google Scholar]
  15. Attouch, H.; Bolte, J.; Redont, P.; Soubeyran, A. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 2010, 35, 438–457. [Google Scholar] [CrossRef]
  16. Nocedal, J.; Wright, S.J. Numerical Optimization; Springer: Berlin/Heidelberg, Germany, 2000. [Google Scholar]
  17. Bolte, J.; Sabach, S.; Teboulle, M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 2014, 146, 459–494. [Google Scholar] [CrossRef]
  18. Polyak, B.T. Minimization of unsmooth functions. USSR Comput. Math. Math. Phys. 1969, 9, 14–29. [Google Scholar] [CrossRef]
  19. Hu, Y.; Zhang, D.; Ye, J.; Li, X.; He, X. Fast and accurate matrix completion via truncated nuclear norm regularization. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 2117–2130. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Curves of RE, rank, and running time of Algorithm 1 with different α values.
Figure 2. Curves of relative error, rank, and running time of Algorithm 1 in Case I.
Figure 3. RE and running time curves of Algorithm 1 in Cases I–II with different values of n.
Figure 4. RE and running time curves of Algorithm 1 in Case I with different SRs.
Figure 5. RE and running time curves of Algorithm 1 in Case I with different NZRs.
Figure 6. Experimental results of image recovery.
Table 1. The results of the algorithms for synthetic data in Case I.

n    | (r*, SR)   | c_λ | Algorithm 1: RE / Rank / Time (s)   | SubGM: RE / Rank / Time (s)
1000 | (5, 0.10)  | 10  | 2.61 × 10^-2 / 5 / 1.50             | 1.26 × 10^-1 / 15 / 4.04
     | (5, 0.15)  | 10  | 1.94 × 10^-2 / 5 / 1.48             | 7.71 × 10^-2 / 15 / 4.71
     | (5, 0.20)  | 10  | 1.59 × 10^-2 / 5 / 1.60             | 6.13 × 10^-2 / 15 / 5.28
     | (5, 0.20)  | 10  | 1.40 × 10^-2 / 5 / 1.73             | 4.93 × 10^-2 / 15 / 5.56
     | (10, 0.10) | 6   | 3.34 × 10^-2 / 10 / 1.88            | 2.58 × 10^-1 / 30 / 4.35
     | (10, 0.15) | 6   | 2.26 × 10^-2 / 10 / 2.88            | 1.38 × 10^-1 / 30 / 5.26
     | (10, 0.20) | 6   | 2.15 × 10^-2 / 10 / 2.34            | 8.20 × 10^-2 / 30 / 5.92
     | (10, 0.20) | 6   | 1.54 × 10^-2 / 10 / 2.33            | 5.88 × 10^-2 / 30 / 6.26
3000 | (10, 0.10) | 6   | 1.36 × 10^-2 / 10 / 15.0            | 3.61 × 10^-2 / 30 / 40.6
     | (10, 0.15) | 6   | 1.05 × 10^-2 / 10 / 17.1            | 2.91 × 10^-2 / 30 / 50.6
     | (10, 0.20) | 6   | 8.95 × 10^-3 / 10 / 18.3            | 2.23 × 10^-2 / 30 / 51.4
     | (10, 0.20) | 6   | 1.24 × 10^-2 / 10 / 20.5            | 2.64 × 10^-2 / 30 / 56.3
     | (20, 0.10) | 6   | 1.57 × 10^-2 / 20 / 20.3            | 7.73 × 10^-2 / 60 / 44.1
     | (20, 0.15) | 6   | 1.15 × 10^-2 / 20 / 20.0            | 3.41 × 10^-2 / 60 / 49.6
     | (20, 0.20) | 6   | 9.52 × 10^-3 / 20 / 22.4            | 2.42 × 10^-2 / 60 / 55.5
     | (20, 0.20) | 6   | 8.33 × 10^-3 / 20 / 23.2            | 2.07 × 10^-2 / 60 / 61.6
5000 | (10, 0.10) | 10  | 9.92 × 10^-3 / 10 / 45.8            | 2.29 × 10^-2 / 30 / 140.9
     | (10, 0.15) | 10  | 7.87 × 10^-3 / 10 / 50.0            | 1.93 × 10^-2 / 30 / 138.8
     | (10, 0.20) | 10  | 6.73 × 10^-3 / 10 / 46.9            | 1.87 × 10^-2 / 30 / 141.4
     | (10, 0.20) | 10  | 6.01 × 10^-3 / 10 / 50.1            | 1.61 × 10^-2 / 30 / 157.7
8000 | (10, 0.10) | 10  | 7.63 × 10^-3 / 10 / 121.3           | 1.53 × 10^-2 / 30 / 312.1
     | (10, 0.15) | 10  | 8.21 × 10^-3 / 10 / 140.2           | 1.62 × 10^-2 / 30 / 387.8
     | (10, 0.20) | 10  | 5.21 × 10^-3 / 10 / 144.9           | 1.40 × 10^-2 / 30 / 432.9
     | (10, 0.20) | 10  | 4.71 × 10^-3 / 10 / 154.7           | 1.49 × 10^-2 / 30 / 475.7
Table 2. The results of the algorithms for synthetic data in Case II.

n    | (r*, SR)   | c_λ | Algorithm 1: RE / Rank / Time (s)   | SubGM: RE / Rank / Time (s)
1000 | (5, 0.10)  | 10  | 2.42 × 10^-2 / 5 / 1.37             | 8.76 × 10^-2 / 15 / 3.85
     | (5, 0.15)  | 10  | 1.82 × 10^-2 / 5 / 1.43             | 6.14 × 10^-2 / 15 / 4.81
     | (5, 0.20)  | 10  | 1.49 × 10^-2 / 5 / 1.39             | 5.03 × 10^-2 / 15 / 5.49
     | (5, 0.20)  | 10  | 1.31 × 10^-2 / 5 / 1.28             | 4.61 × 10^-2 / 15 / 5.70
1000 | (10, 0.10) | 6   | 3.00 × 10^-2 / 10 / 1.86            | 1.85 × 10^-1 / 30 / 4.31
     | (10, 0.15) | 6   | 2.07 × 10^-2 / 10 / 1.88            | 9.23 × 10^-2 / 30 / 4.26
     | (10, 0.20) | 6   | 1.65 × 10^-2 / 10 / 2.24            | 6.02 × 10^-2 / 30 / 5.92
     | (10, 0.20) | 6   | 1.42 × 10^-2 / 10 / 2.28            | 4.71 × 10^-2 / 30 / 6.26
3000 | (10, 0.10) | 6   | 1.27 × 10^-2 / 10 / 15.5            | 3.11 × 10^-1 / 30 / 40.6
     | (10, 0.15) | 6   | 9.85 × 10^-3 / 10 / 16.1            | 2.71 × 10^-2 / 30 / 45.6
     | (10, 0.20) | 6   | 8.33 × 10^-3 / 10 / 17.3            | 2.27 × 10^-2 / 30 / 50.4
     | (10, 0.20) | 6   | 7.40 × 10^-3 / 10 / 18.5            | 2.11 × 10^-2 / 30 / 55.3
3000 | (20, 0.10) | 6   | 1.44 × 10^-2 / 20 / 20.3            | 5.30 × 10^-2 / 60 / 44.1
     | (20, 0.15) | 6   | 1.07 × 10^-2 / 20 / 22.0            | 2.79 × 10^-2 / 60 / 49.6
     | (20, 0.20) | 6   | 8.82 × 10^-3 / 20 / 22.4            | 2.11 × 10^-2 / 60 / 54.5
     | (20, 0.20) | 6   | 7.73 × 10^-3 / 20 / 23.2            | 1.85 × 10^-2 / 60 / 59.6
5000 | (10, 0.10) | 10  | 9.29 × 10^-3 / 10 / 40.8            | 2.07 × 10^-2 / 30 / 110.9
     | (10, 0.15) | 10  | 7.37 × 10^-3 / 10 / 48.0            | 1.84 × 10^-2 / 30 / 126.8
     | (10, 0.20) | 10  | 6.29 × 10^-3 / 10 / 46.9            | 1.75 × 10^-2 / 30 / 141.4
     | (10, 0.20) | 10  | 5.62 × 10^-3 / 10 / 50.1            | 1.61 × 10^-2 / 30 / 156.7
8000 | (10, 0.10) | 10  | 7.13 × 10^-3 / 10 / 99.3            | 1.51 × 10^-2 / 30 / 283.1
     | (10, 0.15) | 10  | 5.71 × 10^-3 / 10 / 114.2           | 1.56 × 10^-2 / 30 / 337.8
     | (10, 0.20) | 10  | 4.91 × 10^-3 / 10 / 114.9           | 1.50 × 10^-2 / 30 / 353.9
     | (10, 0.20) | 10  | 4.38 × 10^-3 / 10 / 124.7           | 1.48 × 10^-2 / 30 / 395.7
Table 3. Summary of experimental results of image recovery.

Setting               | Algorithm   | RE     | PSNR  | Time (s)
Sampling ratio = 50%  | Algorithm 1 | 0.0708 | 30.57 | 1.72
                      | SubGM       | 0.0991 | 27.82 | 2.60
Sampling ratio = 20%  | Algorithm 1 | 0.1178 | 24.22 | 0.76
                      | SubGM       | 0.1361 | 23.07 | 1.84
Sampling ratio = 10%  | Algorithm 1 | 0.0369 | 33.61 | 0.59
                      | SubGM       | 0.1113 | 24.06 | 1.49
Image with text mask  | Algorithm 1 | 0.1639 | 21.10 | 0.48
                      | SubGM       | 0.1700 | 20.82 | 4.84
Image with cross mask | Algorithm 1 | 0.0934 | 16.71 | 1.34
                      | SubGM       | 0.1811 | 7.40  | 5.67
