Article

A Fast Proximal Alternating Method for Robust Matrix Factorization of Matrix Recovery with Outliers

1 School of Mathematics, Foshan University, Foshan 528011, China
2 College of Information Science and Technology, Jinan University, Guangzhou 510632, China
* Authors to whom correspondence should be addressed.
Mathematics 2025, 13(9), 1466; https://doi.org/10.3390/math13091466
Submission received: 20 March 2025 / Revised: 21 April 2025 / Accepted: 26 April 2025 / Published: 29 April 2025

Abstract

This paper concerns a class of robust factorization models of low-rank matrix recovery, which have been widely applied in fields such as machine learning and imaging sciences. An $\ell_1$-loss robust factorized model incorporating the $\ell_{2,0}$-norm regularization term is proposed to address the presence of outliers. Since the resulting problem is nonconvex, nonsmooth, and discontinuous, an approximation problem that shares the same set of stationary points as the original formulation is constructed. A proximal alternating minimization method is then proposed to solve the approximation problem, and the global convergence of its iterate sequence is established. Numerical experiments on matrix completion with outliers and on image restoration tasks demonstrate that the proposed algorithm achieves low relative errors in shorter computational time, especially on large-scale datasets.

1. Introduction

The low-rank matrix recovery problem seeks to recover a true yet unknown low-rank matrix $M \in \mathbb{R}^{n_1\times n_2}$ of rank $r$ from as few observations as feasible. It encompasses applications in numerous fields, including signal and image processing, quantum state tomography, control and system identification, statistics, and machine learning (see [1,2,3,4,5]). This paper considers a robust factorized model for the low-rank matrix recovery problem in which the observations contain outliers. The observation model is expressed as
$$ b = \mathcal{A}(M) + \omega, $$
where $\mathcal{A}: \mathbb{R}^{n_1\times n_2} \to \mathbb{R}^m$ is a sampling operator, $b \in \mathbb{R}^m$ is the observation vector, and $\omega$ is a sparse noise vector whose nonzero entries may be arbitrarily large while the remaining entries are zero. When outliers are present, the traditional smooth least-squares loss function is too sensitive to them and is probably biased. In view of this, studies [6,7] have proposed a nonsmooth $\ell_p$-loss model with a balance regularization term:
$$ \min_{U \in \mathbb{R}^{n_1\times\kappa},\, V \in \mathbb{R}^{n_2\times\kappa}} \Psi(U,V) := \|\mathcal{A}(UV^{\top}) - b\|_p + \gamma\,\|U^{\top}U - V^{\top}V\|_F, \qquad (1) $$
where $\gamma \ge 0$. When $\gamma > 0$, for $p = 1$ and $\kappa = r$, Li et al. [7] established exact matrix recovery based on the $\ell_1/\ell_2$-restricted isometry property of the sampling operator $\mathcal{A}$ and proved that the proposed subgradient method with geometrically diminishing step sizes converges linearly to the ground-truth matrix. For $p = 1$ or $2$ and $\kappa = r$, Charisopoulos et al. [6] proved that the subgradient method and the prox-linear method converge at a fast dimension-independent rate under the same assumptions as in [7]. When $\gamma = 0$, for $p = 1$, Ma and Fattahi [8] showed that the subgradient method converges to the true matrix under the Sign-RIP condition [8, Definition 7]. Additionally, Wang et al. [9] developed a robust and fast rank-one matrix completion algorithm by minimizing the Welsch cost function.
Although the subgradient method exhibits favorable convergence properties when solving problem (1), it relies on some strict assumptions. In practice, the rank of the ground-truth matrix is generally unknown, and consequently the regularization term in (1) fails to induce low-rank structures. Inspired by [10], this paper utilizes the $\ell_{2,0}$-norm in the regularization term and studies the following $\ell_1$-loss factorized model of low-rank matrix recovery:
$$ \min_{U \in \mathbb{R}^{n_1\times\kappa},\, V \in \mathbb{R}^{n_2\times\kappa}} \Phi(U,V) := \|\mathcal{A}(UV^{\top}) - b\|_1 + \lambda\big(\|U\|_{2,0} + \|V\|_{2,0}\big), \qquad (2) $$
where $\kappa$ is an upper estimate of $r$, and $\|\cdot\|_{2,0}$ denotes the column $\ell_{2,0}$-norm of a matrix, i.e., the number of its nonzero columns. The regularization term $\lambda(\|U\|_{2,0} + \|V\|_{2,0})$ reduces the rank through column sparsity. The $\ell_1$-loss function, on the other hand, is more robust against outliers and has been widely utilized in outlier detection studies [11,12,13]. Both terms of the objective function in (2) are nonsmooth. On top of that, the objective is nonconvex and discontinuous due to the $\ell_{2,0}$-norm regularization term. The discontinuity causes the subgradient algorithm to fail when applied to problem (2). While the alternating direction method of multipliers is commonly applied to problems of this form, it still lacks a theoretical convergence guarantee for such nonconvex and nonsmooth optimization problems. Most methods with convergence guarantees assume that at least one term of the objective is smooth, or that the objective is continuous. To the best of our knowledge, no state-of-the-art algorithm is capable of handling this type of nonconvex and discontinuous problem.
The major contributions of this work are threefold. Firstly, to handle the nonconvex and discontinuous problem (2), a novel potential function is constructed:
$$ \min_{U \in \mathbb{R}^{n_1\times\kappa},\, V \in \mathbb{R}^{n_2\times\kappa},\, z \in \mathbb{R}^m} \Theta(U,V,z) := \frac{\alpha}{2}\|\mathcal{A}(UV^{\top}) - b - z\|^2 + \|z\|_1 + \lambda\big(\|U\|_{2,0} + \|V\|_{2,0}\big), \qquad (3) $$
where $\alpha > 0$ is a given constant. It is proven that, under mild assumptions, the sets of stationary points of (2) and (3) coincide. Moreover, any global optimal solution to problem (3) is an approximate optimal solution to problem (2). This equivalence enables us to find solutions to problem (2) by solving problem (3), which is more computationally tractable. Secondly, a proximal alternating linearized minimization (PALM) method is proposed to solve (3), and the global convergence of its iterate sequence is established by exploiting the Kurdyka–Łojasiewicz (KL) property of the objective function. Finally, numerical experiments on both synthetic and real-world datasets validate that the PALM method for problem (3) outperforms SubGM [6,7] applied to problem (1) with $p = 1$.
The remainder of this paper is organized as follows. Section 2 introduces notation and foundational concepts. Section 3 establishes the relationship between problems (2) and (3). Section 4 presents the PALM method for solving problem (3) and establishes its global convergence. Section 5 evaluates the numerical performance of the PALM method and compares it with SubGM applied to problem (1) on various datasets. Section 6 concludes the paper.

2. Preliminaries

This section introduces the notation employed and then presents the definitions of stationary points and the KL property.

2.1. Notations

Throughout this paper, $\mathbb{R}^{n_1\times n_2}$ denotes the vector space of all $n_1\times n_2$ real matrices, equipped with the trace inner product $\langle X, Y\rangle = \mathrm{trace}(X^{\top}Y)$ for $X, Y \in \mathbb{R}^{n_1\times n_2}$ and its induced Frobenius norm. For a matrix $Z \in \mathbb{R}^{n_1\times n_2}$, $Z_j$ denotes the $j$-th column of $Z$, $J_Z$ denotes the index set of its nonzero columns, and $\|Z\|_F$ and $\|Z\|_{2,0}$ denote the Frobenius norm and the column $\ell_{2,0}$-norm of $Z$ (the number of nonzero columns), respectively. For a matrix $X \in \mathbb{R}^{n_1\times n_2}$, let $\sigma(X) := (\sigma_1(X), \ldots, \sigma_n(X))$ with $\sigma_1(X) \ge \cdots \ge \sigma_n(X)$, where $\sigma_i(X)$ denotes the $i$-th largest singular value of $X$, and write $\Sigma_\kappa(X) := \mathrm{Diag}(\sigma_1(X), \ldots, \sigma_\kappa(X))$. For a vector $x \in \mathbb{R}^n$, $\|x\|_1$ denotes the $\ell_1$-norm and $\|x\|$ the $\ell_2$-norm. Let $\nabla_U F(U,V)$ and $\nabla_V F(U,V)$ denote the partial gradients of a function $F$ at $(U,V)$ with respect to the variables $U$ and $V$, respectively. For any $x \in \mathbb{R}^m$, $\mathrm{sign}(x) = [\mathrm{sign}(x_1), \ldots, \mathrm{sign}(x_m)]^{\top}$ with
$$ \mathrm{sign}(t) := \begin{cases} t/|t|, & t \ne 0;\\ 0, & t = 0, \end{cases} \qquad \text{for any } t \in \mathbb{R}. $$
For convenience, define
$$ G(U,V,z) := \frac{\alpha}{2}\|\mathcal{A}(UV^{\top}) - b - z\|^2, \qquad F(U,V) := \|\mathcal{A}(UV^{\top}) - b\|_1, \qquad g(U) := \|U\|_{2,0}. $$
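For readers who prefer code, the following is a minimal NumPy sketch of how the objectives $\Phi$ of (2) and $\Theta$ of (3) can be evaluated when $\mathcal{A}$ is the entrywise sampling operator used later in Section 5. The helper names (make_sampling_op, col_l20, Phi, Theta) are illustrative assumptions, not part of the paper's implementation.

```python
import numpy as np

def make_sampling_op(rows, cols):
    """Entrywise sampling A(X) = (X[i_t, j_t])_{t=1..m} and its adjoint A*(.)."""
    def A(X):
        return X[rows, cols]
    def At(xi, shape):
        Y = np.zeros(shape)
        np.add.at(Y, (rows, cols), xi)   # scatter-add, in case an index repeats
        return Y
    return A, At

def col_l20(X):
    """Column ell_{2,0}-norm: the number of nonzero columns of X."""
    return int(np.count_nonzero(np.linalg.norm(X, axis=0)))

def Phi(U, V, b, A, lam):
    """Objective of problem (2)."""
    return np.abs(A(U @ V.T) - b).sum() + lam * (col_l20(U) + col_l20(V))

def Theta(U, V, z, b, A, lam, alpha):
    """Potential function of problem (3)."""
    res = A(U @ V.T) - b - z
    return 0.5 * alpha * res @ res + np.abs(z).sum() + lam * (col_l20(U) + col_l20(V))
```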

2.2. Stationary Points and ϵ-Global Optimal Solutions

From [14], the generalized subdifferentials of an extended real-valued function $h: \mathbb{R}^{n\times m} \to \overline{\mathbb{R}} := (-\infty, \infty]$ at a point where $h$ attains a finite value are recalled.
Definition 1.
Let $h: \mathbb{R}^{n\times m} \to \overline{\mathbb{R}}$ be given and let $x$ be a point at which $h(x)$ is finite. The regular subdifferential of $h$ at $x$ is defined as
$$ \widehat{\partial} h(x) := \Big\{ v \in \mathbb{R}^{n\times m} \ \Big|\ \liminf_{x' \to x,\, x' \ne x} \frac{h(x') - h(x) - \langle v, x' - x\rangle}{\|x' - x\|_F} \ge 0 \Big\}, $$
and the basic (also called limiting or Mordukhovich) subdifferential of $h$ at $x$ is defined as
$$ \partial h(x) := \Big\{ v \in \mathbb{R}^{n\times m} \ \Big|\ \exists\, x^k \to x \text{ with } h(x^k) \to h(x) \text{ and } v^k \to v \text{ with } v^k \in \widehat{\partial} h(x^k) \Big\}. $$
Consider a proper lower semicontinuous (lsc) function $h: \mathbb{R}^{n\times m} \to \overline{\mathbb{R}}$. The stationary points and $\epsilon$-global optimal solutions of the problem $\min_{x \in \mathbb{R}^{n\times m}} h(x)$ are defined as follows.
Definition 2.
A point $x \in \mathbb{R}^{n\times m}$ with $0 \in \partial h(x)$ is called a stationary point of $\min_{x \in \mathbb{R}^{n\times m}} h(x)$.
Definition 3.
A point $x \in \mathbb{R}^{n\times m}$ satisfying
$$ h(x) - \epsilon \le h(x') \quad \forall\, x' \in \mathbb{R}^{n\times m} $$
is called an $\epsilon$-global optimal solution of $\min_{x \in \mathbb{R}^{n\times m}} h(x)$.

2.3. Kurdyka–Łojasiewicz Property

From [15], the KL property of an extended real-valued function is stated as follows.
Definition 4.
Let $h: \mathbb{X} \to \overline{\mathbb{R}}$ be a proper lsc function. The function $h$ is said to have the Kurdyka–Łojasiewicz (KL) property at $\bar{x} \in \mathrm{dom}\,\partial h$ if there exist $\eta \in (0, \infty]$, a continuous concave function $\varphi: [0, \eta) \to \mathbb{R}_+$ satisfying
(i) $\varphi(0) = 0$ and $\varphi$ is continuously differentiable on $(0, \eta)$;
(ii) $\varphi'(s) > 0$ for all $s \in (0, \eta)$;
and a neighborhood $\mathcal{U}$ of $\bar{x}$ such that, for all $x \in \mathcal{U}$ with $h(\bar{x}) < h(x) < h(\bar{x}) + \eta$,
$$ \varphi'\big(h(x) - h(\bar{x})\big)\, \mathrm{dist}\big(0, \partial h(x)\big) \ge 1. $$
If $h$ satisfies the KL property at every point of $\mathrm{dom}\,\partial h$, then $h$ is called a KL function.
Remark 1.
According to Lemma 2.1 of [15], a proper lsc function has the KL property at any noncritical point. Thus, to prove that a proper lsc function h : X R ¯ is a KL function, it suffices to check whether h has the KL property at any critical point.

3. Relationship Between Problems (2) and (3)

This section demonstrates the relationship between the stationary points and global optimal solutions of (2) and (3). On top of that, the stationary point sets of (2) and (3) are characterized. The subdifferential of $\Phi$ at an arbitrary point $(U,V) \in \mathbb{R}^{n_1\times\kappa}\times\mathbb{R}^{n_2\times\kappa}$ is given in the following lemma.
Lemma 1.
Fix any $\lambda > 0$ and consider any $(U,V) \in \mathbb{R}^{n_1\times\kappa}\times\mathbb{R}^{n_2\times\kappa}$. Then, it holds that
$$ \widehat{\partial}\Phi(U,V) = \partial\Phi(U,V) = \left\{ \begin{pmatrix} \mathcal{A}^{*}(\xi)V + \lambda\,\partial g(U)\\[2pt] [\mathcal{A}^{*}(\xi)]^{\top}U + \lambda\,\partial g(V) \end{pmatrix} \;\middle|\; \xi \in \partial\|\cdot\|_1\big(\mathcal{A}(UV^{\top}) - b\big) \right\}, $$
where, for any $X \in \mathbb{R}^{n\times m}$,
$$ \widehat{\partial} g(X) = \partial g(X) = S_1 \times \cdots \times S_m \quad \text{with } S_j = \begin{cases} \{0\}^{n} & \text{if } j \in J_X;\\ \mathbb{R}^{n} & \text{if } j \notin J_X. \end{cases} \qquad (4) $$
Proof. 
Recall that $\Phi(U,V) = F(U,V) + \lambda\big(g(U) + g(V)\big)$. From [14, Corollary 10.9 and Exercise 10.10], it immediately follows that
$$ \widehat{\partial}F(U,V) + \lambda\,\widehat{\partial}g(U)\times\lambda\,\widehat{\partial}g(V) \subseteq \widehat{\partial}\Phi(U,V) \subseteq \partial\Phi(U,V) \subseteq \partial F(U,V) + \lambda\,\partial g(U)\times\lambda\,\partial g(V). $$
Next, $\partial F(U,V)$ and $\widehat{\partial}F(U,V)$ are calculated. According to the chain rule [14, Corollary 8.11 and Theorem 10.6], we obtain
$$ \partial F(U,V) = \widehat{\partial}F(U,V) = \left\{ \begin{pmatrix} \mathcal{A}^{*}(\xi)V\\[2pt] [\mathcal{A}^{*}(\xi)]^{\top}U \end{pmatrix} \;\middle|\; \xi \in \partial\|\cdot\|_1\big(\mathcal{A}(UV^{\top}) - b\big) \right\}. $$
For any $U \in \mathbb{R}^{n_1\times\kappa}$, by invoking [14, Proposition 10.5], it immediately follows that
$$ \widehat{\partial} g(U) = \partial g(U) = S_1 \times \cdots \times S_\kappa \quad \text{with } S_j = \begin{cases} \{0\}^{n_1} & \text{if } j \in J_U;\\ \mathbb{R}^{n_1} & \text{if } j \notin J_U. \end{cases} $$
This, together with the inclusions above, implies the desired result. □
The subdifferential of Θ is characterized by the following lemma.
Lemma 2.
Fix any $\lambda > 0$ and consider any $(U,V,z) \in \mathbb{R}^{n_1\times\kappa}\times\mathbb{R}^{n_2\times\kappa}\times\mathbb{R}^m$. Then, it holds that
$$ \widehat{\partial}\Theta(U,V,z) = \partial\Theta(U,V,z) = \begin{pmatrix} \alpha\,\mathcal{A}^{*}\big(\mathcal{A}(UV^{\top}) - b - z\big)V + \lambda\,\partial g(U)\\[2pt] \alpha\,\big[\mathcal{A}^{*}\big(\mathcal{A}(UV^{\top}) - b - z\big)\big]^{\top}U + \lambda\,\partial g(V)\\[2pt] \alpha\big(z - \mathcal{A}(UV^{\top}) + b\big) + \partial\|\cdot\|_1(z) \end{pmatrix}, $$
where $\partial g$ is defined as in Lemma 1.
Proof. 
Recall that $\Theta(U,V,z) = G(U,V,z) + \|z\|_1 + \lambda\big(g(U) + g(V)\big)$. From [14, Exercise 8.8(c) and Proposition 10.5], it immediately follows that
$$ \nabla G(U,V,z) + \lambda\,\widehat{\partial}g(U)\times\lambda\,\widehat{\partial}g(V)\times\partial\|\cdot\|_1(z) \subseteq \widehat{\partial}\Theta(U,V,z) \subseteq \partial\Theta(U,V,z) \subseteq \nabla G(U,V,z) + \lambda\,\partial g(U)\times\lambda\,\partial g(V)\times\partial\|\cdot\|_1(z). $$
By the smoothness of $G$ and its expression, one has
$$ \nabla G(U,V,z) = \begin{pmatrix} \alpha\,\mathcal{A}^{*}\big(\mathcal{A}(UV^{\top}) - b - z\big)V\\[2pt] \alpha\,\big[\mathcal{A}^{*}\big(\mathcal{A}(UV^{\top}) - b - z\big)\big]^{\top}U\\[2pt] \alpha\big(z - \mathcal{A}(UV^{\top}) + b\big) \end{pmatrix}. $$
Along with the above inclusions and (4), the desired result is obtained. □
The following proposition states the equivalence between the stationary points of problems (2) and (3) under some mild conditions.
Proposition 1.
If $(U, V, z) \in \mathbb{R}^{n_1\times\kappa}\times\mathbb{R}^{n_2\times\kappa}\times\mathbb{R}^m$ is a stationary point of (3) such that every nonzero entry of $|\mathcal{A}(UV^{\top}) - b|$ is not smaller than $\frac{1}{\alpha}$, then $(U, V)$ is a stationary point of (2). Conversely, if $(U, V) \in \mathbb{R}^{n_1\times\kappa}\times\mathbb{R}^{n_2\times\kappa}$ is a stationary point of (2) such that every entry of $|\mathcal{A}(UV^{\top}) - b|$ is not smaller than $\frac{1}{\alpha}$, then $(U, V, z)$ with $z := \mathcal{A}(UV^{\top}) - b - \frac{1}{\alpha}\,\mathrm{sign}\big(\mathcal{A}(UV^{\top}) - b\big)$ is a stationary point of (3).
Proof. 
Pick any stationary point $(U, V, z)$ of (3). From Lemma 2, it holds that
$$ \begin{aligned} 0 &\in \alpha\,\mathcal{A}^{*}\big(\mathcal{A}(UV^{\top}) - b - z\big)V + \lambda\,\partial g(U); &\text{(5a)}\\ 0 &\in \alpha\,\big[\mathcal{A}^{*}\big(\mathcal{A}(UV^{\top}) - b - z\big)\big]^{\top}U + \lambda\,\partial g(V); &\text{(5b)}\\ 0 &\in \alpha\big(z - \mathcal{A}(UV^{\top}) + b\big) + \partial\|\cdot\|_1(z). &\text{(5c)} \end{aligned} $$
Write $y := \mathcal{A}(UV^{\top}) - b$. From (5c), it follows that $z = \mathrm{sign}(y)\circ\max\big(0, |y| - \frac{1}{\alpha}\big)$. Then,
$$ y - z = \mathrm{sign}(y)\circ\min\Big(|y|, \frac{1}{\alpha}\Big). \qquad (6) $$
Along with $|y_i| \ge \frac{1}{\alpha}$ or $y_i = 0$ for all $i = 1, \ldots, m$, one has $y - z = \frac{1}{\alpha}\,\mathrm{sign}(y)$. Thus, substituting $y - z = \frac{1}{\alpha}\,\mathrm{sign}(y)$ into (5a) and (5b), one obtains
$$ 0 \in \mathcal{A}^{*}\big(\mathrm{sign}(\mathcal{A}(UV^{\top}) - b)\big)V + \lambda\,\partial g(U) \quad \text{and} \quad 0 \in \big[\mathcal{A}^{*}\big(\mathrm{sign}(\mathcal{A}(UV^{\top}) - b)\big)\big]^{\top}U + \lambda\,\partial g(V). $$
This, together with the definition of $\mathrm{sign}(\cdot)$, implies that $(U, V)$ is a stationary point of (2).
Conversely, pick any stationary point $(U, V)$ of (2). According to Lemma 1, there exists $\xi \in \partial\|\cdot\|_1\big(\mathcal{A}(UV^{\top}) - b\big)$ such that
$$ 0 \in \mathcal{A}^{*}(\xi)V + \lambda\,\partial g(U) \quad \text{and} \quad 0 \in [\mathcal{A}^{*}(\xi)]^{\top}U + \lambda\,\partial g(V). $$
Write $y := \mathcal{A}(UV^{\top}) - b$ and $z := y - \frac{1}{\alpha}\xi$. Then, $(U, V, z)$ satisfies (5a) and (5b). By noting that $|\xi_i| \le 1$ and $|y_i| \ge \frac{1}{\alpha}$ for all $i = 1, \ldots, m$, one has $\alpha(y - z) = \xi \in \partial\|\cdot\|_1\big(y - \tfrac{1}{\alpha}\xi\big) = \partial\|\cdot\|_1(z)$, which means that (5c) holds. This implies that $(U, V, z)$ is a stationary point of (3). The proof is completed. □
Remark 2.
The assumption required for the converse direction generally does not hold, because the residual vector $y$ is expected to be sparse, so that many entries of $|y|$ are zero rather than at least $\frac{1}{\alpha}$. This indicates that the set of stationary points of problem (3) is typically smaller than that of problem (2). Together with the relationship between the global optimal solutions of (2) and (3) established in the next proposition, this shows that the stationary points of (3) exclude certain undesirable points contained in the stationary point set of (2).
Proposition 2.
If $(U^*, V^*, z^*)$ is a global optimal solution of (3), then $(U^*, V^*)$ is an $\frac{m}{2\alpha}$-global optimal solution of (2).
Proof. 
If $(U^*, V^*, z^*)$ is a global optimal solution of (3), then $(U^*, V^*, z^*)$ is a stationary point of (3). Write $y^* := \mathcal{A}(U^*(V^*)^{\top}) - b$. From (6), it holds that
$$ y^* - z^* = \mathrm{sign}(y^*)\circ\min\Big(|y^*|, \frac{1}{\alpha}\Big). $$
Let $I := \{ i \in [m] \mid |y_i^*| < \frac{1}{\alpha} \}$ and $\bar{I} := \{ i \in [m] \mid |y_i^*| \ge \frac{1}{\alpha} \}$. Then, one has
$$ \begin{aligned} G(U^*, V^*, z^*) + \|z^*\|_1 &= \frac{\alpha}{2}\|y^* - z^*\|^2 + \|z^*\|_1 = \frac{\alpha}{2}\Big\|\mathrm{sign}(y^*)\circ\min\Big(|y^*|, \frac{1}{\alpha}\Big)\Big\|^2 + \Big\|\mathrm{sign}(y^*)\circ\max\Big(0, |y^*| - \frac{1}{\alpha}\Big)\Big\|_1\\ &= \frac{\alpha}{2}\sum_{i \in I}|y_i^*|^2 + \sum_{i \in \bar{I}}\Big(|y_i^*| - \frac{1}{2\alpha}\Big) = \|y^*\|_1 - \frac{m}{2\alpha} + \sum_{i \in I}\Big(\frac{\alpha}{2}|y_i^*|^2 - |y_i^*| + \frac{1}{2\alpha}\Big) \ \ge\ \|y^*\|_1 - \frac{m}{2\alpha}, \end{aligned} $$
where the inequality holds since $\frac{\alpha}{2}|y_i^*|^2 - |y_i^*| + \frac{1}{2\alpha} = \frac{\alpha}{2}\big(|y_i^*| - \frac{1}{\alpha}\big)^2 \ge 0$ for each $i \in I$. Since $(U^*, V^*, z^*)$ is a global optimal solution of (3), for any $(U, V)$ and $z = \mathcal{A}(UV^{\top}) - b$,
$$ \Phi(U^*, V^*) - \frac{m}{2\alpha} \le \Theta(U^*, V^*, z^*) \le \Theta(U, V, z) = \Phi(U, V). $$
This means that $(U^*, V^*)$ is an $\frac{m}{2\alpha}$-global optimal solution of (2). □
Remark 3.
Proposition 2 demonstrates that a global optimal solution $(U^*, V^*, z^*)$ of problem (3) yields an approximate optimal solution $(U^*, V^*)$ of problem (2), and that as the parameter $\alpha$ increases, the value $\Phi(U^*, V^*)$ progressively approaches the optimal value of problem (2).
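As a small sanity check on the bound behind Proposition 2, the snippet below verifies numerically that, for the minimizing $z = \mathrm{sign}(y)\circ\max(0, |y| - 1/\alpha)$, the quantity $\frac{\alpha}{2}\|y - z\|^2 + \|z\|_1$ never falls below $\|y\|_1 - \frac{m}{2\alpha}$. The random residual used here is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, m = 5.0, 10000
# an illustrative residual mixing many small entries with a few large (outlier-like) ones
y = rng.standard_normal(m) * rng.choice([0.01, 10.0], size=m, p=[0.7, 0.3])
z = np.sign(y) * np.maximum(0.0, np.abs(y) - 1.0 / alpha)      # minimizer of (3) over z
lhs = 0.5 * alpha * np.sum((y - z) ** 2) + np.abs(z).sum()
rhs = np.abs(y).sum() - m / (2.0 * alpha)
# the entrywise gap is (alpha/2)*(|y_i| - 1/alpha)^2 on I and zero on its complement
print(lhs >= rhs)   # True
```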

4. A PALM Method for Solving Problem (3)

Recall that $G(U,V,z) = \frac{\alpha}{2}\|\mathcal{A}(UV^{\top}) - b - z\|^2$ is a smooth function whose partial gradients $\nabla_U G(\cdot, V, z)$ and $\nabla_V G(U, \cdot, z)$ are Lipschitz continuous with moduli $\tau_V$ and $\tau_U$, respectively. Fix any $(U, V, z) \in \mathbb{R}^{n_1\times\kappa}\times\mathbb{R}^{n_2\times\kappa}\times\mathbb{R}^m$. According to the descent lemma [16, Proposition A.24], it holds that
$$ G(U', V, z) \le G(U, V, z) + \langle \nabla_U G(U, V, z), U' - U\rangle + \frac{\tau_V}{2}\|U' - U\|_F^2 \quad \forall\, U' \in \mathbb{R}^{n_1\times\kappa}, $$
$$ G(U, V', z) \le G(U, V, z) + \langle \nabla_V G(U, V, z), V' - V\rangle + \frac{\tau_U}{2}\|V' - V\|_F^2 \quad \forall\, V' \in \mathbb{R}^{n_2\times\kappa}. $$
Let $(U^k, V^k, z^k)$ be the current iterate. From the expression of $\Theta$ and the above two inequalities, one obtains
$$ \begin{aligned} \Theta(U, V^k, z^k) \le \widetilde{\Theta}_U(U, V^k, z^k) &:= G(U^k, V^k, z^k) + \langle \nabla_U G(U^k, V^k, z^k), U - U^k\rangle + \lambda\|U\|_{2,0} + \|z^k\|_1 + \lambda\|V^k\|_{2,0} + \frac{\tau_{V^k}}{2}\|U - U^k\|_F^2,\\ \Theta(U^k, V, z^k) \le \widetilde{\Theta}_V(U^k, V, z^k) &:= G(U^k, V^k, z^k) + \langle \nabla_V G(U^k, V^k, z^k), V - V^k\rangle + \lambda\|V\|_{2,0} + \|z^k\|_1 + \lambda\|U^k\|_{2,0} + \frac{\tau_{U^k}}{2}\|V - V^k\|_F^2, \end{aligned} $$
which become equalities when $U = U^k$ and $V = V^k$, respectively. Consequently, $\widetilde{\Theta}_U(\cdot, V^k, z^k)$ and $\widetilde{\Theta}_V(U^k, \cdot, z^k)$ serve as majorizations of $\Theta(\cdot, V^k, z^k)$ at $U^k$ and of $\Theta(U^k, \cdot, z^k)$ at $V^k$, respectively. An algorithm for problem (3) is developed by alternately minimizing these majorizations. The iteration steps are outlined in Algorithm 1 below, and the resulting subproblems admit the closed-form solutions given in Remark 4.
Remark 4.
(i) For any $\alpha > 0$, the optimal solution of subproblem (7) has the closed form $z^{k+1} = \mathrm{sign}(y^k)\circ\max\big(0, |y^k| - \frac{1}{\alpha}\big)$, where "$\circ$" denotes the Hadamard (entrywise) product.
(ii) Let $H^k = U^k - \frac{1}{\gamma_{1,k}}\nabla_U G(U^k, V^k, z^{k+1})$ and $S^k = V^k - \frac{1}{\gamma_{2,k}}\nabla_V G(U^{k+1}, V^k, z^{k+1})$. Then, the columns of $U^{k+1}$ and $V^{k+1}$ have the closed forms
$$ U_i^{k+1} = \mathrm{sign}\Big(\max\big(0, \|H_i^k\|^2 - 2\lambda\gamma_{1,k}^{-1}\big)\Big)\, H_i^k \quad \text{and} \quad V_i^{k+1} = \mathrm{sign}\Big(\max\big(0, \|S_i^k\|^2 - 2\lambda\gamma_{2,k}^{-1}\big)\Big)\, S_i^k \quad \text{for } i = 1, \ldots, \kappa. $$
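The two closed-form updates in Remark 4 amount to a soft-thresholding step for $z$ and a column-wise hard-thresholding step for $U$ and $V$. A minimal NumPy sketch (with illustrative function names) is given below.

```python
import numpy as np

def prox_l1(y, alpha):
    """z-update of Remark 4(i): soft-thresholding, i.e., the proximal map of (1/alpha)*||.||_1."""
    return np.sign(y) * np.maximum(0.0, np.abs(y) - 1.0 / alpha)

def prox_col_l20(H, lam, gamma):
    """U/V-update of Remark 4(ii): column-wise hard-thresholding for (lam/gamma)*||.||_{2,0}.
    A column H_i is kept iff ||H_i||^2 > 2*lam/gamma and is zeroed out otherwise."""
    keep = np.linalg.norm(H, axis=0) ** 2 > 2.0 * lam / gamma
    return H * keep          # broadcasting zeroes the dropped columns
```

With $H^k = U^k - \gamma_{1,k}^{-1}\nabla_U G(U^k, V^k, z^{k+1})$, the call prox_col_l20(H, lam, gamma1) realizes the $U$-update of Algorithm 1 below; the $V$-update is analogous.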
Algorithm 1 (PALM method for solving (3))
1: Input: sampling operator $\mathcal{A}$ and observation vector $b \in \mathbb{R}^m$.
2: Initialization: choose an initial point $(U^0, V^0) \in \mathbb{R}^{n_1\times\kappa}\times\mathbb{R}^{n_2\times\kappa}$ and set $\lambda, \mu > 0$.
3: while the stopping conditions are not satisfied do
4:    Compute $y^k = \mathcal{A}(U^k(V^k)^{\top}) - b$.
5:    Solve for $z^{k+1}$:
$$ z^{k+1} = \arg\min_{z \in \mathbb{R}^m} \Big\{ \frac{\alpha}{2}\|y^k - z\|^2 + \|z\|_1 \Big\}. \qquad (7) $$
6:    Set $\gamma_{1,k} \ge \tau_{V^k} + \mu$ and solve for $U^{k+1}$:
$$ U^{k+1} \in \arg\min_{U \in \mathbb{R}^{n_1\times\kappa}} \Big\{ \langle \nabla_U G(U^k, V^k, z^{k+1}), U \rangle + \lambda\|U\|_{2,0} + \frac{\gamma_{1,k}}{2}\|U - U^k\|_F^2 \Big\}. $$
7:    Set $\gamma_{2,k} \ge \tau_{U^{k+1}} + \mu$ and solve for $V^{k+1}$:
$$ V^{k+1} \in \arg\min_{V \in \mathbb{R}^{n_2\times\kappa}} \Big\{ \langle \nabla_V G(U^{k+1}, V^k, z^{k+1}), V \rangle + \lambda\|V\|_{2,0} + \frac{\gamma_{2,k}}{2}\|V - V^k\|_F^2 \Big\}. $$
8: end while
9: Output: $X = U^{k+1}(V^{k+1})^{\top}$.

For each $k \in \mathbb{N}$, write $w^k := (U^k, V^k, z^k)$. To establish the convergence of the sequence $\{w^k\}_{k\in\mathbb{N}}$, the following proposition is required.
Proposition 3.
Let $\{w^k\}_{k\in\mathbb{N}}$ be the sequence generated by Algorithm 1. Then, the following statements hold.
(i) For each $k \in \mathbb{N}$, it holds that $\Theta(w^{k+1}) \le \Theta(w^k) - \frac{\min\{\alpha, \mu\}}{2}\|w^k - w^{k+1}\|_F^2$. Hence, the sequence $\{\Theta(w^k)\}_{k\in\mathbb{N}}$ is convergent.
(ii) For each $k \in \mathbb{N}$, $(A_1^{k+1}, A_2^{k+1}, A_3^{k+1}) \in \partial\Theta(w^{k+1})$ with
$$ \begin{aligned} A_1^{k+1} &:= \nabla_U G(w^{k+1}) - \nabla_U G(U^k, V^k, z^{k+1}) - \gamma_{1,k}(U^{k+1} - U^k);\\ A_2^{k+1} &:= \nabla_V G(w^{k+1}) - \nabla_V G(U^{k+1}, V^k, z^{k+1}) - \gamma_{2,k}(V^{k+1} - V^k);\\ A_3^{k+1} &:= \alpha(y^k - y^{k+1}). \end{aligned} $$
Assume, in addition, that $\{w^k\}_{k\in\mathbb{N}}$ is bounded. Then, there exists a constant $c > 0$ such that
$$ \mathrm{dist}\big(0, \partial\Theta(w^{k+1})\big) \le c\,\|w^k - w^{k+1}\|_F. $$
Proof. 
(i) By the optimality of $z^{k+1}$ and the strong convexity of subproblem (7), one obtains
$$ \|z^{k+1}\|_1 + \frac{\alpha}{2}\|y^k - z^{k+1}\|^2 + \frac{\alpha}{2}\|z^k - z^{k+1}\|^2 \le \|z^k\|_1 + \frac{\alpha}{2}\|y^k - z^k\|^2. \qquad (8) $$
Similarly, by the optimality of $U^{k+1}$ and $V^{k+1}$, one has
$$ \begin{aligned} \langle \nabla_U G(U^k, V^k, z^{k+1}), U^{k+1}\rangle + \lambda\|U^{k+1}\|_{2,0} + \frac{\gamma_{1,k}}{2}\|U^{k+1} - U^k\|_F^2 &\le \langle \nabla_U G(U^k, V^k, z^{k+1}), U^k\rangle + \lambda\|U^k\|_{2,0};\\ \langle \nabla_V G(U^{k+1}, V^k, z^{k+1}), V^{k+1}\rangle + \lambda\|V^{k+1}\|_{2,0} + \frac{\gamma_{2,k}}{2}\|V^{k+1} - V^k\|_F^2 &\le \langle \nabla_V G(U^{k+1}, V^k, z^{k+1}), V^k\rangle + \lambda\|V^k\|_{2,0}. \end{aligned} $$
The above two inequalities imply that
$$ \begin{aligned} &\lambda\|U^k\|_{2,0} + \lambda\|V^k\|_{2,0} - \frac{\gamma_{1,k}}{2}\|U^{k+1} - U^k\|_F^2 - \frac{\gamma_{2,k}}{2}\|V^{k+1} - V^k\|_F^2\\ &\quad \ge \langle \nabla_U G(U^k, V^k, z^{k+1}), U^{k+1} - U^k\rangle + \langle \nabla_V G(U^{k+1}, V^k, z^{k+1}), V^{k+1} - V^k\rangle + \lambda\|U^{k+1}\|_{2,0} + \lambda\|V^{k+1}\|_{2,0}. \end{aligned} \qquad (9) $$
By the Lipschitz continuity of $\nabla_U G(\cdot, V, z)$ and $\nabla_V G(U, \cdot, z)$, one has
$$ \begin{aligned} G(U^{k+1}, V^k, z^{k+1}) &\le G(U^k, V^k, z^{k+1}) + \langle \nabla_U G(U^k, V^k, z^{k+1}), U^{k+1} - U^k\rangle + \frac{\tau_{V^k}}{2}\|U^{k+1} - U^k\|_F^2;\\ G(U^{k+1}, V^{k+1}, z^{k+1}) &\le G(U^{k+1}, V^k, z^{k+1}) + \langle \nabla_V G(U^{k+1}, V^k, z^{k+1}), V^{k+1} - V^k\rangle + \frac{\tau_{U^{k+1}}}{2}\|V^{k+1} - V^k\|_F^2. \end{aligned} $$
Adding these two inequalities and invoking inequality (9) together with $\gamma_{1,k} \ge \tau_{V^k} + \mu$ and $\gamma_{2,k} \ge \tau_{U^{k+1}} + \mu$, it immediately holds that
$$ G(w^{k+1}) + \lambda\|U^{k+1}\|_{2,0} + \lambda\|V^{k+1}\|_{2,0} \le G(U^k, V^k, z^{k+1}) + \lambda\|U^k\|_{2,0} + \lambda\|V^k\|_{2,0} - \frac{\mu}{2}\|U^{k+1} - U^k\|_F^2 - \frac{\mu}{2}\|V^{k+1} - V^k\|_F^2. $$
Combining the last inequality with the definition of $\Theta$, inequality (8), and the lower boundedness of $\Theta$, part (i) follows.
(ii) From the optimality conditions of $z^{k+1}$, $U^{k+1}$, and $V^{k+1}$, one has
$$ \begin{cases} 0 \in \alpha(z^{k+1} - y^k) + \partial\|\cdot\|_1(z^{k+1});\\ 0 \in \nabla_U G(U^k, V^k, z^{k+1}) + \gamma_{1,k}(U^{k+1} - U^k) + \lambda\,\partial\|\cdot\|_{2,0}(U^{k+1});\\ 0 \in \nabla_V G(U^{k+1}, V^k, z^{k+1}) + \gamma_{2,k}(V^{k+1} - V^k) + \lambda\,\partial\|\cdot\|_{2,0}(V^{k+1}). \end{cases} $$
Then, by Lemma 2, it is not hard to obtain $(A_1^{k+1}, A_2^{k+1}, A_3^{k+1}) \in \partial\Theta(w^{k+1})$. Hence, $\mathrm{dist}\big(0, \partial\Theta(w^{k+1})\big) \le \|(A_1^{k+1}, A_2^{k+1}, A_3^{k+1})\|_F$. By the boundedness of $\{w^k\}_{k\in\mathbb{N}}$ and the Lipschitz continuity of $\nabla_U G(\cdot, V, z)$ and $\nabla_V G(U, \cdot, z)$, the desired result is obtained. □
From [15, Section 4], $\Theta$ is a KL function. By Proposition 3 and the same reasoning as in [17, Theorem 1], the main convergence result is established as follows.
Theorem 1.
Suppose that the sequence $\{w^k\}_{k\in\mathbb{N}}$ generated by Algorithm 1 is bounded. Then, $\{w^k\}_{k\in\mathbb{N}}$ converges, and its limit $(\widetilde{U}, \widetilde{V}, \widetilde{z})$ is a critical point of (3). If, in addition, the nonzero entries of $|\mathcal{A}(\widetilde{U}\widetilde{V}^{\top}) - b|$ are not smaller than $\frac{1}{\alpha}$, then $(\widetilde{U}, \widetilde{V})$ is a critical point of $\Phi$.
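To make the overall procedure concrete, the following is a compact sketch of Algorithm 1 for the entrywise sampling operator of Section 5. It is a simplified illustration under stated assumptions, not the authors' implementation: each observed index is assumed to be sampled at most once, so that $\alpha\|V^k\|_2^2$ and $\alpha\|U^{k+1}\|_2^2$ upper-bound the Lipschitz moduli $\tau_{V^k}$ and $\tau_{U^{k+1}}$, and the stopping tests of Section 5.1 are replaced by a simple successive-change criterion.

```python
import numpy as np

def palm(A, At, b, shape, kappa, lam, alpha=5.0, mu=1e-8, max_iter=1000, tol=1e-5):
    """Simplified sketch of Algorithm 1 (PALM) for problem (3).

    A, At : sampling operator and its adjoint (e.g., entrywise sampling).
    shape : (n1, n2) of the matrix to recover; kappa : column budget of U and V.
    """
    # spectral initialization from M_Omega = At(b), as in Section 5.1
    P, s, Qt = np.linalg.svd(At(b, shape), full_matrices=False)
    d = np.sqrt(s[:kappa])
    U, V = P[:, :kappa] * d, Qt[:kappa, :].T * d
    z = np.zeros_like(b)
    for _ in range(max_iter):
        y = A(U @ V.T) - b
        z = np.sign(y) * np.maximum(0.0, np.abs(y) - 1.0 / alpha)       # z-update (soft-threshold)
        # U-update: one proximal gradient step on G(., V, z) + lam*||.||_{2,0}
        gamma1 = alpha * np.linalg.norm(V, 2) ** 2 + mu                 # upper bound on tau_{V^k} + mu
        H = U - alpha * At(y - z, shape) @ V / gamma1
        U_new = H * (np.linalg.norm(H, axis=0) ** 2 > 2.0 * lam / gamma1)
        # V-update, with the residual refreshed at U_new
        y = A(U_new @ V.T) - b
        gamma2 = alpha * np.linalg.norm(U_new, 2) ** 2 + mu
        S = V - alpha * At(y - z, shape).T @ U_new / gamma2
        V_new = S * (np.linalg.norm(S, axis=0) ** 2 > 2.0 * lam / gamma2)
        done = max(np.linalg.norm(U_new - U), np.linalg.norm(V_new - V)) <= tol * (1.0 + np.linalg.norm(b))
        U, V = U_new, V_new
        if done:
            break
    return U, V, z
```

Under these assumptions, the product U @ V.T of the returned factors plays the role of the output $X = U^{k+1}(V^{k+1})^{\top}$ in Algorithm 1.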

5. Numerical Experiments

The efficiency of Algorithm 1 is validated by solving matrix completion problems with outliers under uniform sampling, and its performance is compared with that of SubGM from [7] for solving problem (1). All numerical tests are conducted in MATLAB 2024b on a laptop running 64-bit Windows with an Intel(R) Core(TM) i9-13905H CPU at 2.60 GHz and 32 GB RAM (Intel, Santa Clara, CA, USA).

5.1. Implementation Details of Algorithms

Matrix completion problems involving outliers are considered. To generate the sampling operator $\mathcal{A}$, a random index set $\Omega = \{(i_t, j_t)\}_{t=1,\ldots,m}$ is drawn, with indices sampled independently from the uniform distribution. The mapping $\mathcal{A}$ is then defined by $\mathcal{A}(X) := (X_{i_1,j_1}, X_{i_2,j_2}, \ldots, X_{i_m,j_m})^{\top}$ for $X \in \mathbb{R}^{n_1\times n_2}$, and $b = \mathcal{A}(M^\Omega)$, where $M^\Omega \in \mathbb{R}^{n_1\times n_2}$ with
$$ [M^\Omega]_{ij} = \begin{cases} M_{i_t j_t} + \varpi_t & \text{if } (i,j) = (i_t, j_t) \in \Omega \text{ for some } t \in \{1, \ldots, m\},\\ 0 & \text{if } (i,j) \notin \Omega. \end{cases} $$
The true matrix $M \in \mathbb{R}^{n_1\times n_2}$ of rank $r$ is generated as $M = M_L(M_R)^{\top}$, where the entries of $M_L \in \mathbb{R}^{n_1\times r}$ and $M_R \in \mathbb{R}^{n_2\times r}$ are sampled independently from the standard normal distribution $N(0,1)$. The nonzero entries of the sparse noise vector $\varpi = (\varpi_1, \ldots, \varpi_m)^{\top}$ follow one of two distributions: (i) Student's t-distribution with four degrees of freedom, scaled by 2 (Case I), or (ii) the Laplace distribution with density $d(u) = 0.5\exp(-|u|)$ (Case II). Unless otherwise specified, the number of nonzero entries in $\varpi$ is set to $0.3m$ in all tests. The parameters are set as $\mu = 10^{-8}$ and $\lambda = c_\lambda\|b\|$, where $c_\lambda > 0$ is specified in the experiments. Unless otherwise stated, $\kappa = \min(n_1, n_2, 150)$ is taken. The initial point $(U^0, V^0)$ of Algorithm 1 is set to $(P^1\Sigma_\kappa(M^\Omega)^{1/2}, Q^1\Sigma_\kappa(M^\Omega)^{1/2})$, where $P^1$ and $Q^1$ denote the matrices consisting of the first $\kappa$ left and right singular vectors of $M^\Omega$, respectively. For SubGM, as in [7], the Polyak step size rule [18] is employed. Since the optimal value of (1) is typically unknown in practice, the step size $\frac{\Psi(U^k, V^k) - \min_{(U,V)}\Psi(U,V)}{\|\zeta^k\|_F^2}$ is replaced by $\frac{0.05\,\Psi(U^k, V^k)}{\|\zeta^k\|_F^2}$, where $\zeta^k \in \partial\Psi(U^k, V^k)$ for each $k \in \mathbb{N}$.
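A sketch of this data-generation procedure (with illustrative names, and with $\Omega$ drawn uniformly with replacement for simplicity) is as follows.

```python
import numpy as np

def make_instance(n1, n2, r, m, nzr=0.3, case="I", seed=0):
    """Synthetic matrix-completion instance with outliers, following Section 5.1 (a sketch)."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n1, r)) @ rng.standard_normal((n2, r)).T    # true rank-r matrix M_L M_R^T
    rows = rng.integers(0, n1, size=m)                                   # uniformly sampled indices
    cols = rng.integers(0, n2, size=m)
    w = np.zeros(m)
    corrupted = rng.choice(m, size=int(nzr * m), replace=False)          # 0.3*m outliers by default
    if case == "I":
        w[corrupted] = 2.0 * rng.standard_t(df=4, size=corrupted.size)   # Student's t (4 dof), scaled by 2
    else:
        w[corrupted] = rng.laplace(scale=1.0, size=corrupted.size)       # Laplace, density 0.5*exp(-|u|)
    b = M[rows, cols] + w                                                # b = A(M^Omega)
    return rows, cols, b, M
```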
Algorithm 1 is terminated at the iterate $w^k = (U^k, V^k, z^k)$ when $k \ge k_{\max}$ or one of the following conditions is satisfied:
$$ \max_{j \in \{1,\ldots,19\}} \frac{|\Theta(w^k) - \Theta(w^{k-j})|}{\max\{1, \Theta(w^k)\}} \le \epsilon_1 \quad \text{or} \quad \frac{R_1^k + R_2^k + R_3^k}{1 + \|b\|} \le \epsilon_2 \ \text{ for } k \ge 30, $$
where
$$ \begin{aligned} R_1^k &= \alpha\|y^{k-1} - y^k\|,\\ R_2^k &= \big\|\nabla_U G(w^k) - \nabla_U G(U^{k-1}, V^{k-1}, z^k) - \gamma_{1,k-1}(U^k - U^{k-1})\big\|_F,\\ R_3^k &= \big\|\nabla_V G(w^k) - \nabla_V G(U^k, V^{k-1}, z^k) - \gamma_{2,k-1}(V^k - V^{k-1})\big\|_F. \end{aligned} $$
For a fair comparison, the initial point of SubGM is set identical to that of Algorithm 1. SubGM is terminated at the iterate $(U^k, V^k)$ whenever $k > k_{\max}$ or one of the following conditions is satisfied:
$$ \max_{j \in \{1,\ldots,9\}} \frac{|\Psi(U^k, V^k) - \Psi(U^{k-j}, V^{k-j})|}{\max\{1, \Psi(U^k, V^k)\}} \le \epsilon_3 \quad \text{or} \quad \frac{\|U^{k+1}(V^{k+1})^{\top} - U^k(V^k)^{\top}\|_F}{1 + \|U^{k+1}(V^{k+1})^{\top}\|_F} \le \epsilon_4 \ \text{ with } k \ge 200. $$
Unless otherwise stated, $\epsilon_1 = 10^{-4}$, $\epsilon_2 = 10^{-5}$, and $k_{\max} = 1000$ are used for Algorithm 1, while $\epsilon_3 = 5\times 10^{-4}$, $\epsilon_4 = 5\times 10^{-4}$, and $k_{\max} = 500$ are used for SubGM.
The matrix recovery performance is evaluated using the relative error (RE), defined by
$$ \mathrm{RE} := \frac{\|X^{\mathrm{out}} - M\|_F}{\|M\|_F}, $$
where $X^{\mathrm{out}} = U^{\mathrm{out}}(V^{\mathrm{out}})^{\top}$ denotes the output of a solver. All reported results are averaged over five instances of each experiment.

5.2. Parameter Sensitivity Analysis

In this section, the effect of the parameters on the performance of Algorithm 1 is evaluated. Figure 1 illustrates the curves of RE, recovered rank, and running time (in seconds) of Algorithm 1 for solving problem (3) with $n_1 = n_2 = 1000$ and $r = 5$, under Case I and Case II. As shown in Figure 1, for $2 \le \alpha \le 8.5$, Algorithm 1 yields both exact rank recovery and the lowest relative error in both cases. Values of $\alpha$ exceeding 8.5 degrade the algorithm's performance: the RE increases significantly, accompanied by an increase in computational time. In view of this, $\alpha$ is set to 5 in all subsequent experiments.
Under the settings $n_1 = n_2 = 1000$, $r = 5$, $\kappa = 10r$, and SR = 0.2, Figure 2 illustrates the RE, recovered rank, and running time curves of Algorithm 1 for varying $\lambda$. The results show that a range of $\lambda$ values (e.g., $8 \le c_\lambda \le 24$) achieves low relative errors while accurately recovering the true rank of $M$.
For $n := n_1 = n_2$ and $r = 10$, Figure 3 shows the variation in running time and RE as $n$ increases. These results indicate that the computational time remains below 160 s for $n \le 7000$ and does not exceed 400 s even for $n = 10^4$. As $n$ increases, the RE decreases rapidly, remaining below 0.015 for $n \ge 3000$.
For Figure 4 and Figure 5, the settings are $n_1 = n_2 = 1000$ and $r = 5$. Figure 4 displays the average RE over five repetitions for sampling ratios (SRs) in $\{0.1, 0.15, 0.2, \ldots, 0.5\}$. Under uniform sampling, the RE decreases monotonically with increasing SR, while the running time consistently stays within 5 s.
Figure 5 examines the variation in RE with the ratio of nonzero entries in the noise vector (NZR), where NZR is set to $\{0.1, 0.15, 0.18, \ldots, 0.5\}$. As shown in Figure 5, the RE increases with NZR, while the running time remains below 4 s. Although the problem becomes harder as NZR increases, the RE remains close to 0.02 even when NZR reaches 0.5.

5.3. Numerical Comparisons with SubGM

The performance of Algorithm 1 in terms of solution quality and running time is compared with that of SubGM. Since model (1) does not promote low-rank structures, $\kappa = 3r$ and $n_1 = n_2 = n$ are adopted in the following tests. The numerical results of the two methods are obtained under the stopping criteria described in Section 5.1, with $\epsilon_1 = \epsilon_2 = \epsilon_3 = \epsilon_4 = 5\times 10^{-4}$ and $k_{\max} = 500$. Table 1 and Table 2 report the average results over five randomly generated instances for each setting. Compared with SubGM, Algorithm 1 consistently achieves a lower RE in significantly less computational time.
Reconstructing images from limited and noisy measurements is a key application of low-rank matrix completion. To further evaluate the effectiveness of the proposed method, experiments were conducted on images from the ZJU dataset [19]. Each image has dimensions of 300 × 300 . The initial points for both algorithms were obtained using the same procedure described in Section 5.1, with κ = 40 .
The results are presented in Table 3 and Figure 6. The images were undersampled using various sampling rates and masks and then restored using a similar process reported in [9]. Table 3 lists the relative error and peak signal-to-noise ratio (PSNR) for each image restoration task. The results show that the proposed algorithm outperforms SubGM in terms of both computational time and restoration quality.

6. Conclusions

This paper has focused on robust factorization models for low-rank matrix recovery, which are of great significance in multiple fields such as machine learning and imaging sciences. To deal with the challenge of outliers, an $\ell_1$-loss robust factorized model with $\ell_{2,0}$-norm regularization is proposed. Given the nonconvex and discontinuous nature of the problem, an approximation problem (3) is constructed. Under mild assumptions, the equivalence of stationary points between the original and the approximation problems is verified, and it is proven that a global optimum of the approximation problem serves as an approximate optimal solution of the original problem in terms of objective value. On top of that, a fast PALM method is proposed to solve the approximation problem, and the global convergence of its iterate sequence is established. Numerical experiments on synthetic datasets with outliers and on image restoration tasks demonstrate that the PALM method achieves low relative errors within a significantly shorter computational time, particularly on large-scale datasets. Despite the promising results, the proposed approach has some limitations: its performance relies on the proper selection of parameters such as $\lambda$ and $\alpha$. To address these limitations, future work will include the development of adaptive parameter-tuning strategies.

Author Contributions

Methodology, T.T. and J.Z.; Validation, L.X.; Formal analysis, T.T. and L.X.; Investigation, J.Z.; Writing—original draft, T.T., L.X. and J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

T.T. was supported by the Guangdong Basic and Applied Basic Research Foundation No. 2023A1515111167. L.X. was supported by the Guangdong Basic and Applied Basic Research Foundation No. 2022A1515110959. J.Z. was funded by the National Natural Science Foundation of China No. 12401630, the Educational Commission of Guangdong Province No. 2023KQNCX073, and the Natural Science Foundation of Guangdong Province No. 2023A1515110558.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Candès, E.J.; Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math. 2009, 9, 717–772. [Google Scholar] [CrossRef]
  2. Davenport, M.A.; Romberg, J. An overview of low-rank matrix recovery from incomplete observations. IEEE J. Sel. Top. Signal Process. 2016, 10, 608–622. [Google Scholar] [CrossRef]
  3. Fazel, M.; Hindi, H.; Boyd, S. Rank Minimization and Applications in System Theory. Proc. 2004 Am. Control Conf. 2004, 4, 3273–3278. [Google Scholar]
  4. Gross, D.; Liu, Y.K.; Flammia, S.T.; Becker, S.; Eisert, J. Quantum state tomography via compressed sensing. Phys. Rev. Lett. 2010, 105, 150401. [Google Scholar] [CrossRef] [PubMed]
  5. Negahban, S.; Wainwright, M.J. Estimation of (near) low-rank matrices with noise and high-dimensional scaling. Ann. Stat. 2011, 39, 1069–1097. [Google Scholar] [CrossRef]
  6. Charisopoulos, V.; Chen, Y.; Davis, D.; Díaz, M.; Ding, L.; Drusvyatskiy, D. Low-Rank Matrix Recovery with Composite Optimization: Good Conditioning and Rapid Convergence. Found. Comput. Math. 2021, 21, 1505–1593. [Google Scholar] [CrossRef]
  7. Li, X.; Zhu, Z.H.; So, A.M.; Vidal, R. Nonconvex Robust Low-Rank Matrix Recovery. SIAM J. Optim. 2020, 30, 660–686. [Google Scholar] [CrossRef]
  8. Ma, J.H.; Fattahi, S. Global Convergence of Sub-gradient Method for Robust Matrix Recovery: Small Initialization, Noisy Measurements, and Over-parameterization. J. Mach. Learn. Res. 2023, 24, 1–84. [Google Scholar]
  9. Wang, Z.Y.; So, H.C.; Liu, Z.F. Fast and robust rank-one matrix completion via maximum correntropy criterion and half-quadratic optimization. Signal Process. 2022, 198, 108580. [Google Scholar] [CrossRef]
  10. Tao, T.; Qian, Y.T.; Pan, S.H. Column 2,0-norm regularized factorization model of low-rank matrix recovery and its computation. SIAM J. Optim. 2022, 32, 959–988. [Google Scholar] [CrossRef]
  11. Candès, E.J.; Li, X.; Ma, Y.; Wright, J. Robust principal component analysis. J. ACM 2011, 11, 1–37. [Google Scholar] [CrossRef]
  12. Josz, C.; Ouyang, Y.; Zhang, R.; Lavaei, J.; Sojoudi, S. A theory on the absence of spurious solutions for nonconvex and nonsmooth optimization. Adv. Neural Inf. Process. Syst. 2018, 31, 2441–2449. [Google Scholar]
  13. Li, Y.; Sun, Y.; Chi, Y. Low-rank positive semidefinite matrix recovery from corrupted rank-one measurements. IEEE Trans. Signal Process. 2017, 65, 397–408. [Google Scholar] [CrossRef]
  14. Rockafellar, R.T.; Wets, R.J.-B. Variational Analysis; Springer: Berlin/Heidelberg, Germany, 1998. [Google Scholar]
  15. Attouch, H.; Bolte, J.; Redont, P.; Soubeyran, A. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 2010, 35, 438–457. [Google Scholar] [CrossRef]
  16. Nocedal, J.; Wright, S.J. Numerical Optimization; Springer: Berlin/Heidelberg, Germany, 2000. [Google Scholar]
  17. Bolte, J.; Sabach, S.; Teboulle, M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 2014, 146, 459–494. [Google Scholar] [CrossRef]
  18. Polyak, B.T. Minimization of unsmooth functions. USSR Comput. Math. Math. Phys. 1969, 9, 14–29. [Google Scholar] [CrossRef]
  19. Hu, Y.; Zhang, D.; Ye, J.; Li, X.; He, X. Fast and accurate matrix completion via truncated nuclear norm regularization. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 2117–2130. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Curves of RE, rank, and running time of Algorithm 1 with different α values.
Figure 2. Curves of relative error, rank, and running time of Algorithm 1 in Case I.
Figure 3. RE and running time curves of Algorithm 1 in Cases I–II with different values of n.
Figure 4. RE and running time curves of Algorithm 1 in Case I with different SRs.
Figure 5. RE and running time curves of Algorithm 1 in Case I with different NZRs.
Figure 6. Experimental results of image recovery.
Table 1. The results of the algorithms for synthetic data in Case I.

n    | (r*, SR)   | c_λ | Algorithm 1: RE / Rank / Time (s)   | SubGM: RE / Rank / Time (s)
1000 | (5, 0.10)  | 10  | 2.61 × 10^-2 / 5 / 1.50             | 1.26 × 10^-1 / 15 / 4.04
     | (5, 0.15)  | 10  | 1.94 × 10^-2 / 5 / 1.48             | 7.71 × 10^-2 / 15 / 4.71
     | (5, 0.20)  | 10  | 1.59 × 10^-2 / 5 / 1.60             | 6.13 × 10^-2 / 15 / 5.28
     | (5, 0.20)  | 10  | 1.40 × 10^-2 / 5 / 1.73             | 4.93 × 10^-2 / 15 / 5.56
     | (10, 0.10) | 6   | 3.34 × 10^-2 / 10 / 1.88            | 2.58 × 10^-1 / 30 / 4.35
     | (10, 0.15) | 6   | 2.26 × 10^-2 / 10 / 2.88            | 1.38 × 10^-1 / 30 / 5.26
     | (10, 0.20) | 6   | 2.15 × 10^-2 / 10 / 2.34            | 8.20 × 10^-2 / 30 / 5.92
     | (10, 0.20) | 6   | 1.54 × 10^-2 / 10 / 2.33            | 5.88 × 10^-2 / 30 / 6.26
3000 | (10, 0.10) | 6   | 1.36 × 10^-2 / 10 / 15.0            | 3.61 × 10^-2 / 30 / 40.6
     | (10, 0.15) | 6   | 1.05 × 10^-2 / 10 / 17.1            | 2.91 × 10^-2 / 30 / 50.6
     | (10, 0.20) | 6   | 8.95 × 10^-3 / 10 / 18.3            | 2.23 × 10^-2 / 30 / 51.4
     | (10, 0.20) | 6   | 1.24 × 10^-2 / 10 / 20.5            | 2.64 × 10^-2 / 30 / 56.3
     | (20, 0.10) | 6   | 1.57 × 10^-2 / 20 / 20.3            | 7.73 × 10^-2 / 60 / 44.1
     | (20, 0.15) | 6   | 1.15 × 10^-2 / 20 / 20.0            | 3.41 × 10^-2 / 60 / 49.6
     | (20, 0.20) | 6   | 9.52 × 10^-3 / 20 / 22.4            | 2.42 × 10^-2 / 60 / 55.5
     | (20, 0.20) | 6   | 8.33 × 10^-3 / 20 / 23.2            | 2.07 × 10^-2 / 60 / 61.6
5000 | (10, 0.10) | 10  | 9.92 × 10^-3 / 10 / 45.8            | 2.29 × 10^-2 / 30 / 140.9
     | (10, 0.15) | 10  | 7.87 × 10^-3 / 10 / 50.0            | 1.93 × 10^-2 / 30 / 138.8
     | (10, 0.20) | 10  | 6.73 × 10^-3 / 10 / 46.9            | 1.87 × 10^-2 / 30 / 141.4
     | (10, 0.20) | 10  | 6.01 × 10^-3 / 10 / 50.1            | 1.61 × 10^-2 / 30 / 157.7
8000 | (10, 0.10) | 10  | 7.63 × 10^-3 / 10 / 121.3           | 1.53 × 10^-2 / 30 / 312.1
     | (10, 0.15) | 10  | 8.21 × 10^-3 / 10 / 140.2           | 1.62 × 10^-2 / 30 / 387.8
     | (10, 0.20) | 10  | 5.21 × 10^-3 / 10 / 144.9           | 1.40 × 10^-2 / 30 / 432.9
     | (10, 0.20) | 10  | 4.71 × 10^-3 / 10 / 154.7           | 1.49 × 10^-2 / 30 / 475.7
Table 2. The results of the algorithms for synthetic data in Case II.

n    | (r*, SR)   | c_λ | Algorithm 1: RE / Rank / Time (s)   | SubGM: RE / Rank / Time (s)
1000 | (5, 0.10)  | 10  | 2.42 × 10^-2 / 5 / 1.37             | 8.76 × 10^-2 / 15 / 3.85
     | (5, 0.15)  | 10  | 1.82 × 10^-2 / 5 / 1.43             | 6.14 × 10^-2 / 15 / 4.81
     | (5, 0.20)  | 10  | 1.49 × 10^-2 / 5 / 1.39             | 5.03 × 10^-2 / 15 / 5.49
     | (5, 0.20)  | 10  | 1.31 × 10^-2 / 5 / 1.28             | 4.61 × 10^-2 / 15 / 5.70
1000 | (10, 0.10) | 6   | 3.00 × 10^-2 / 10 / 1.86            | 1.85 × 10^-1 / 30 / 4.31
     | (10, 0.15) | 6   | 2.07 × 10^-2 / 10 / 1.88            | 9.23 × 10^-2 / 30 / 4.26
     | (10, 0.20) | 6   | 1.65 × 10^-2 / 10 / 2.24            | 6.02 × 10^-2 / 30 / 5.92
     | (10, 0.20) | 6   | 1.42 × 10^-2 / 10 / 2.28            | 4.71 × 10^-2 / 30 / 6.26
3000 | (10, 0.10) | 6   | 1.27 × 10^-2 / 10 / 15.5            | 3.11 × 10^-1 / 30 / 40.6
     | (10, 0.15) | 6   | 9.85 × 10^-3 / 10 / 16.1            | 2.71 × 10^-2 / 30 / 45.6
     | (10, 0.20) | 6   | 8.33 × 10^-3 / 10 / 17.3            | 2.27 × 10^-2 / 30 / 50.4
     | (10, 0.20) | 6   | 7.40 × 10^-3 / 10 / 18.5            | 2.11 × 10^-2 / 30 / 55.3
3000 | (20, 0.10) | 6   | 1.44 × 10^-2 / 20 / 20.3            | 5.30 × 10^-2 / 60 / 44.1
     | (20, 0.15) | 6   | 1.07 × 10^-2 / 20 / 22.0            | 2.79 × 10^-2 / 60 / 49.6
     | (20, 0.20) | 6   | 8.82 × 10^-3 / 20 / 22.4            | 2.11 × 10^-2 / 60 / 54.5
     | (20, 0.20) | 6   | 7.73 × 10^-3 / 20 / 23.2            | 1.85 × 10^-2 / 60 / 59.6
5000 | (10, 0.10) | 10  | 9.29 × 10^-3 / 10 / 40.8            | 2.07 × 10^-2 / 30 / 110.9
     | (10, 0.15) | 10  | 7.37 × 10^-3 / 10 / 48.0            | 1.84 × 10^-2 / 30 / 126.8
     | (10, 0.20) | 10  | 6.29 × 10^-3 / 10 / 46.9            | 1.75 × 10^-2 / 30 / 141.4
     | (10, 0.20) | 10  | 5.62 × 10^-3 / 10 / 50.1            | 1.61 × 10^-2 / 30 / 156.7
8000 | (10, 0.10) | 10  | 7.13 × 10^-3 / 10 / 99.3            | 1.51 × 10^-2 / 30 / 283.1
     | (10, 0.15) | 10  | 5.71 × 10^-3 / 10 / 114.2           | 1.56 × 10^-2 / 30 / 337.8
     | (10, 0.20) | 10  | 4.91 × 10^-3 / 10 / 114.9           | 1.50 × 10^-2 / 30 / 353.9
     | (10, 0.20) | 10  | 4.38 × 10^-3 / 10 / 124.7           | 1.48 × 10^-2 / 30 / 395.7
Table 3. Summary of experimental results of image recovery.

Setting               | Algorithm   | RE     | PSNR  | Time (s)
Sampling ratio = 50%  | Algorithm 1 | 0.0708 | 30.57 | 1.72
                      | SubGM       | 0.0991 | 27.82 | 2.60
Sampling ratio = 20%  | Algorithm 1 | 0.1178 | 24.22 | 0.76
                      | SubGM       | 0.1361 | 23.07 | 1.84
Sampling ratio = 10%  | Algorithm 1 | 0.0369 | 33.61 | 0.59
                      | SubGM       | 0.1113 | 24.06 | 1.49
Image with text mask  | Algorithm 1 | 0.1639 | 21.10 | 0.48
                      | SubGM       | 0.1700 | 20.82 | 4.84
Image with cross mask | Algorithm 1 | 0.0934 | 16.71 | 1.34
                      | SubGM       | 0.1811 | 7.40  | 5.67
