Article

A New Proximal Iteratively Reweighted Nuclear Norm Method for Nonconvex Nonsmooth Optimization Problems

1 School of Mathematical Sciences, Nanjing Normal University of Special Education, Nanjing 210038, China
2 School of Microelectronics and Data Science, Anhui University of Technology, Ma’anshan 243032, China
3 School of Mathematics and Physics, Suqian University, Suqian 223800, China
4 Key Laboratory of Numerical Simulation for Large Scale Complex Systems, Ministry of Education, Nanjing 210023, China
5 School of Artificial Intelligence, Nanjing Normal University of Special Education, Nanjing 210038, China
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(16), 2630; https://doi.org/10.3390/math13162630
Submission received: 9 July 2025 / Revised: 13 August 2025 / Accepted: 13 August 2025 / Published: 16 August 2025
(This article belongs to the Special Issue Decision Making and Optimization Under Uncertainty)

Abstract

This paper proposes a new proximal iteratively reweighted nuclear norm method for a class of nonconvex and nonsmooth optimization problems. The primary contribution of this work is the incorporation of a line search technique built on dimensionality reduction and extrapolation. This strategy overcomes parameter constraints by enabling adaptive dynamic adjustment of the extrapolation and proximal parameters $(\alpha_k, \beta_k, \mu_k)$. Under the Kurdyka–Łojasiewicz framework for nonconvex and nonsmooth optimization, we prove the global convergence and linear convergence rate of the proposed algorithm. Additionally, through numerical experiments on matrix completion problems with synthetic and real data, we validate the superior performance of the proposed method over well-known methods.

1. Introduction

1.1. Problem Description

This work addresses a nonconvex and nonsmooth optimization problem over the real matrix space $\mathbb{R}^{m \times n}$ ($m \le n$):
$$\min_{X} \ \Psi(X) := f(X) + \sum_{i=1}^{m} g(\sigma_i(X)), \qquad (1)$$
where $\sigma_i(X)$ denotes the $i$-th singular value of $X$, $f$ is differentiable with Lipschitz continuous gradient (constant $L_f$), and $g$ is differentiable and concave with Lipschitz continuous gradient (constant $L_g$) and $g'(t) > 0$ for any $t \in [0, +\infty)$.
It is easy to see that $\sum_{i=1}^m g(\sigma_i(X))$ is nonconvex and nonsmooth due to the nonsmoothness of $\sigma_i(X)$ and the concavity of $g$. Thus, the overall function $\Psi$ is nonconvex and nonsmooth even though $f$ is differentiable (and possibly nonconvex). Owing to the generality of problem (1), it has a wide range of applications, such as image processing [1], machine learning [2] and multiple category classification [3]. To illustrate this point, consider the well-known image recovery problem. In that setting, $f(X) = \frac{1}{2}\|\mathcal{A}(X) - b\|^2$ ($\mathcal{A}$ is a linear operator and $b$ is a vector or matrix) is the quadratic loss function used to measure recovery performance; consequently, $f$ is always differentiable. On the other hand, $\sum_{i=1}^m g(\sigma_i(X))$ is a nonconvex regularized term employed to obtain a low-rank solution. Some common nonconvex regularized terms, including $L_p$, Log, ETP, Geman and Laplace, can be found in [1,4]. The validity of the assumptions on $g$ can be verified through the properties of its second-order derivative and the mean value theorem.
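For concreteness, the sketch below (in Python, our own illustration rather than the paper's MATLAB code) writes down two of the penalties mentioned above, Log and ETP, together with their first derivatives; the derivatives are exactly the quantities that later serve as the reweighting weights. The forms follow the usual definitions in [1,4] up to scaling, and the parameter names `lam` and `gamma` are illustrative.

```python
import numpy as np

def log_penalty(t, lam=1.0, gamma=1.0):
    """Log penalty g(t) = lam * log(gamma*t + 1): concave and increasing on [0, inf)."""
    return lam * np.log(gamma * t + 1.0)

def log_penalty_grad(t, lam=1.0, gamma=1.0):
    """g'(t) = lam*gamma / (gamma*t + 1) > 0 and nonincreasing."""
    return lam * gamma / (gamma * t + 1.0)

def etp_penalty(t, lam=1.0, gamma=1.0):
    """ETP penalty g(t) = lam * (1 - exp(-gamma*t)) / (1 - exp(-gamma))."""
    return lam * (1.0 - np.exp(-gamma * t)) / (1.0 - np.exp(-gamma))

def etp_penalty_grad(t, lam=1.0, gamma=1.0):
    """g'(t) = lam*gamma*exp(-gamma*t) / (1 - exp(-gamma)) > 0 and nonincreasing."""
    return lam * gamma * np.exp(-gamma * t) / (1.0 - np.exp(-gamma))
```

Both examples satisfy the assumptions above: $g$ is concave with $g'(t) > 0$, and $g'$ is Lipschitz continuous on $[0, +\infty)$ because its second derivative is bounded there.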

1.2. Related Work

It is precisely because of the popularity and scope of problem (1) that there is a large body of related work; see [5,6,7,8,9,10,11,12,13]. One of the more competitive methods is the well-known General Iterative Shrinkage and Thresholding (GIST) algorithm [14,15]. Applying the GIST algorithm to solve (1) requires computing the proximal operator of the DC function $\sum_{i=1}^m g(\sigma_i(X))$. Unfortunately, this requirement is unlikely to be met, since the DC decomposition of $\sum_{i=1}^m g(\sigma_i(X))$ is not known in general. Thus, based on the key fact that $g'$ is nonnegative and monotonically decreasing, Lu et al. [4] proposed the Proximal Iteratively Reweighted Nuclear Norm algorithm (PIRNN). Sun et al. [16] refined the related convergence conclusions. Later, Ge et al. [1] gave the PIRNN with a more general Extrapolation (PIRNNE) and proved its convergence under the same assumptions. The concrete iterative scheme reads as
$$Y^k := X^k + \alpha_k (X^k - X^{k-1}), \qquad (2)$$
$$Z^k := X^k + \beta_k (X^k - X^{k-1}), \qquad (3)$$
$$X^{k+1} := \operatorname{prox}_{\sum_{i=1}^m \mu_k w_i^k \sigma_i(\cdot)}\big(Y^k - \mu_k \nabla f(Z^k)\big), \qquad (4)$$
where $\alpha_k \in [0,1)$, $\beta_k \in [0,1]$ are the extrapolation stepsizes, $\{\mu_k\}$ is a nondecreasing parameter sequence, $w_i^k := g'(\sigma_i(X^k))$, and for any $Y \in \mathbb{R}^{m \times n}$,
$$\operatorname{prox}_{\sum_{i=1}^m \mu_k w_i^k \sigma_i(\cdot)}(Y) := \arg\min_X \ \sum_{i=1}^m w_i^k \sigma_i(X) + \frac{1}{2\mu_k}\|X - Y\|_F^2. \qquad (5)$$
Subproblem (4) has a closed-form solution whenever $0 \le w_1^k \le w_2^k \le \cdots \le w_m^k$. In other words, for any $Y \in \mathbb{R}^{m \times n}$, one has $U S(\Lambda) V^\top \in \operatorname{prox}_{\sum_{i=1}^m \mu_k w_i^k \sigma_i(\cdot)}(Y)$, where $U \Lambda V^\top$ is the SVD of $Y$, $S(\Lambda) = \operatorname{diag}\{(\Lambda_{i,i} - \mu_k w_i^k)_+\}_{1 \le i \le m}$ and $(a)_+ = \max\{a, 0\}$ for any $a \in \mathbb{R}$.
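A minimal sketch of this closed form (assuming the weights are sorted consistently with the decreasing singular values, i.e., $0 \le w_1 \le \cdots \le w_m$; the function and variable names are ours):

```python
import numpy as np

def weighted_svt(Y, weights, mu):
    """Closed-form solution of (5): U * diag((sigma_i - mu*w_i)_+) * V^T,
    i.e. weighted singular value thresholding of Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)          # singular values in decreasing order
    s_shrunk = np.maximum(s - mu * np.asarray(weights), 0.0)  # (sigma_i - mu*w_i)_+
    return (U * s_shrunk) @ Vt
```

Because $g$ is concave, $g'$ is nonincreasing, so evaluating $g'$ at the sorted singular values of $X^k$ automatically produces nondecreasing weights, which is exactly the ordering required for this closed form.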
Meanwhile, Phan et al. [17] devised an acceleration framework that uses the partial singular value decomposition of reduced-dimensional matrices rather than full matrices, under the parameter choices $\alpha_k = 0$, $\beta_k = \frac{k}{k+3}$ and $\mu_k = \mu$. Xu et al. [18] integrated rank estimation via enhanced Gerschgorin disk analysis with learnable submatrix recovery, demonstrating state-of-the-art performance. Separately, Wen et al. [19] formulated an alternative accelerated matrix completion method employing continuation strategies and randomized truncated SVD, parameterized by $\alpha_k = \beta_k = 0$ and $\mu_{k+1} = \max\{\eta \mu_k, \mu_{\min}\}$ with $\eta < 1$. A generalized framework [20] leveraged ADMM for nonconvex nonsmooth low-rank recovery with rigorous convergence guarantees. Some recent methods also combine other regularization techniques: an image reconstruction factorization model with a total variation regularizer [21], a truncated error model using the difference between the nuclear norm and the Frobenius norm for impulse noise processing [22], and an accelerated iteratively reweighted nuclear norm method combined with active manifold identification [23]. This paper focuses on the efficient computation of (5) based on PIRNNE. To the best of our knowledge, it is suitable parameter selection that gives these algorithms their good numerical performance. The most famous optimal parameter choice is Nesterov's acceleration, as in FISTA [24] and APG [25]. However, the optimal choice of the inertial and proximal parameters of PIRNNE for the nonconvex nonsmooth problems considered in this paper is not explicit, which prevents the algorithm from maintaining its advantage. Whether an adaptive parameter selection exists is our concern.

1.3. Our Contribution

Fortunately, the line search strategy is widely used in nonconvex vector optimization to overcome restrictions on the involved parameters [26,27,28,29,30,31]. This strategy allows the parameters to be initialized with aggressive values that are not below a specific threshold, and then updates them adaptively at each iteration according to the line search criterion, which improves numerical performance in implementation. A natural approach is to incorporate the line search strategy into PIRNNE by updating the parameters $\alpha_k$, $\beta_k$ and $\mu_k$ adaptively. Therefore, the main contributions of this paper are as follows:
  • We propose a Proximal Iteratively Reweighted Nuclear Norm algorithm with Extrapolation and Line Search, denoted by PIRNNE-LS. This framework integrates line search with extrapolation and dimensionality reduction, circumventing parametric limitations. Parameters within the proposed method are initialized aggressively above defined thresholds and then undergo criterion-driven adaptive recalibration at each iteration.
  • We prove the subsequential convergence that each generated sequence converges to a stationary point of the considered problem. Especially, when the line search is monotone, we further establish its global convergence and linear convergence rate under the Kurdyka–Łojasiewicz framework.
  • We conduct experiments to evaluate the performance of the proposed method for solving the matrix completion problem. The reported numerical results demonstrate the effectiveness and superiority of our proposed method.
The remainder of this paper is organized as follows. Section 2 provides the preliminaries needed for the theoretical analysis in the subsequent sections. Section 3 details PIRNNE-LS for the specified problem. Section 4 analyzes the subsequential convergence; in particular, we discuss the global convergence and linear convergence rate under the Kurdyka–Łojasiewicz framework in the case of monotone line search. Section 5 reports numerical results on synthetic and real datasets. Conclusions appear in Section 6.

2. Preliminaries

In this section, we recall some definitions and properties which will be used in the analysis.

2.1. Basic Concepts in Variational and Convex Analysis

For an extended-real-valued function $J: \mathbb{R}^n \to (-\infty, +\infty]$, its domain is defined by $\operatorname{dom}(J) := \{x \in \mathbb{R}^n : J(x) < +\infty\}$. If $\operatorname{dom}(J) \neq \emptyset$ and $J(x) > -\infty$ for any $x \in \operatorname{dom}(J)$, we say the function $J$ is proper. If it is lower semicontinuous, we say it is closed. For any subset $T \subseteq \mathbb{R}^n$ and any point $x \in \mathbb{R}^n$, the distance from $x$ to $T$ is defined by $\operatorname{dist}(x, T) := \inf\{\|y - x\| \,|\, y \in T\}$, and we set $\operatorname{dist}(x, T) = +\infty$ for all $x$ when $T = \emptyset$.
Next, we give the definition of subdifferential which plays a central role in nonconvex optimization.
Definition 1
([32,33]). (Subdifferentials) Let $J: \mathbb{R}^n \to (-\infty, +\infty]$ be a proper and lower semicontinuous function.
(i) 
For a given $x \in \operatorname{dom}(J)$, the Fréchet subdifferential of $J$ at $x$, written $\hat{\partial} J(x)$, is the set of all vectors $u \in \mathbb{R}^n$ that satisfy
$$\liminf_{y \to x,\, y \neq x} \frac{J(y) - J(x) - \langle u, y - x \rangle}{\|y - x\|} \ge 0.$$
When $x \notin \operatorname{dom}(J)$, we set $\hat{\partial} J(x) = \emptyset$.
(ii) 
The limiting subdifferential, or simply the subdifferential, of $J$ at $x$, written $\partial J(x)$, is defined by
$$\partial J(x) := \{u \in \mathbb{R}^n \,|\, \exists\, x^k \to x \ \text{s.t.}\ J(x^k) \to J(x) \ \text{and}\ u^k \in \hat{\partial} J(x^k) \to u \ \text{as}\ k \to \infty\}.$$
(iii) 
A point $x$ is called a (limiting) critical point or stationary point of $J$ if it satisfies $0 \in \partial J(x)$, and the set of critical points of $J$ is denoted by $\operatorname{crit} J$.
Assumption 1.
$\Psi(X) \to +\infty$ whenever $\|X\|_F \to \infty$.

2.2. Kurdyka–Łojasiewicz Property

Now we recall the Kurdyka–Łojasiewicz (KL) property [33,34,35], which will help us establish the global convergence. Many functions have the KL property, such as semi-algebraic functions definable in an o-minimal structure and others discussed in [32].
Definition 2.
(KL property and KL function) Let $J: \mathbb{R}^n \to (-\infty, +\infty]$ be a proper and lower semicontinuous function.
(i) 
The function $J$ is said to have the KL property at $x^\ast \in \operatorname{dom}(\partial J)$ if there exist $\eta \in (0, +\infty]$, a neighborhood $U$ of $x^\ast$, and a continuous and concave function $\varphi: [0, \eta) \to \mathbb{R}_+$ such that
(a) 
$\varphi(0) = 0$ and $\varphi$ is continuously differentiable on $(0, \eta)$ with $\varphi' > 0$;
(b) 
for all $x \in U \cap \{z \in \mathbb{R}^n \,|\, J(x^\ast) < J(z) < J(x^\ast) + \eta\}$, the following KL inequality holds:
$$\varphi'\big(J(x) - J(x^\ast)\big) \operatorname{dist}\big(0, \partial J(x)\big) \ge 1.$$
(ii) 
If $J$ satisfies the KL property at each point of $\operatorname{dom}(\partial J)$, then $J$ is called a KL function.
Let $\Phi_\eta$ denote the set of functions $\varphi$ satisfying the conditions in Definition 2 (i). In the following, we give a uniformized KL property, which was established in [33].
Lemma 1
([33], Lemma 6). (Uniformized KL property) Let $\Omega$ be a compact set and $J: \mathbb{R}^n \to (-\infty, +\infty]$ be a proper and lower semicontinuous function. Assume that $J$ is constant on $\Omega$ and satisfies the KL property at each point of $\Omega$. Then, there exist $\zeta, \eta > 0$ and $\varphi \in \Phi_\eta$ such that for all $\bar{x} \in \Omega$ and all $x$ in the intersection
$$\{z \in \mathbb{R}^n \,|\, \operatorname{dist}(z, \Omega) < \zeta\} \cap \{z \in \mathbb{R}^n \,|\, J(\bar{x}) < J(z) < J(\bar{x}) + \eta\},$$
one has
$$\varphi'\big(J(x) - J(\bar{x})\big) \operatorname{dist}\big(0, \partial J(x)\big) \ge 1.$$

3. The Proposed Method

This section presents a new method for the nonconvex and nonsmooth optimization problem (1). For the function $f$ in (1), we assume that there exist convex functions $f_1$ and $f_2$ with Lipschitz continuous gradients such that $f := f_1 - f_2$. Consistent with the literature [1,36,37,38,39], the Lipschitz constant $L_f$ of $\nabla f$ satisfies $L_f \le L$, where $\nabla f_1$ and $\nabla f_2$ have Lipschitz moduli $L > 0$ and $l \ge 0$, respectively, with $L \ge l$. The formal method is stated in Algorithm 1.
Algorithm 1 PIRNNE-LS for solving (1)
Choose $\eta_1, \eta_2, \tau, p_{\min} \in (0, 1)$, $\alpha_{\max}, \beta_{\max}, d > 0$, $\delta \in [0, 1)$, and $0 < \mu_{\min} \le \frac{1 - \delta}{L + 2d} \le \mu_{\max}$.
For given $X^0 \in \mathbb{R}^{m \times n}$, set $X^{-1} = X^0$, let $\tilde{E}_0 := \Psi(X^0)$ and set $k := 0$.
while stopping criterion is not satisfied, do
    Step 1. Choose $\alpha_k^0 \in [0, \alpha_{\max}]$, $\beta_k^0 \in [0, \beta_{\max}]$ and $\mu_k^0 \in [\mu_{\min}, \mu_{\max}]$, set $\alpha_k := \alpha_k^0$, $\beta_k := \beta_k^0$, $\mu_k := \mu_k^0$, then
    (1a) Compute $Y^k$, $Z^k$ by (2) and (3), respectively.
    (1b) Compute the SVD of $Y^k - \mu_k \nabla f(Z^k)$, i.e., $Y^k - \mu_k \nabla f(Z^k) = \tilde{U}^k \tilde{\Lambda}^k (\tilde{V}^k)^\top$;
    compute the singular values of $X^k$, and let $w_i^k := g'(\sigma_i(X^k))$ for $i = 1, \ldots, m$.
    (1c) Compute
$$X^{k+1} := \tilde{U}^k S(\tilde{\Lambda}^k) (\tilde{V}^k)^\top,$$
    where $S(\tilde{\Lambda}^k) := \operatorname{diag}\{(\tilde{\Lambda}^k_{i,i} - \mu_k w_i^k)_+\}_{1 \le i \le m}$.
    (1d) If
$$E_\delta(X^{k+1}, X^k, \mu_k) \le \tilde{E}_k - \frac{d}{2} \|X^{k+1} - X^k\|_F^2 \qquad (8)$$
    is satisfied, go to Step 2, where $E_\delta$ is defined in (9). Otherwise, set $\alpha_k = \eta_1 \alpha_k$, $\beta_k = \eta_2 \beta_k$, $\mu_k = \max\{\tau \mu_k, \mu_{\min}\}$ and go to Step (1a).
    Step 2. Set $\tilde{E}_{k+1} = p_k E_\delta(X^{k+1}, X^k, \mu_k) + (1 - p_k) \tilde{E}_k$ for some $p_k \in [p_{\min}, 1]$, then let $k = k + 1$ and go to Step 1.
end while
Within Algorithm 1, analogous to references [28,31], we employ the potential function
$$E_\delta(U, V, \mu) := \Psi(U) + \frac{\delta}{4\mu} \|U - V\|_F^2, \qquad (9)$$
where $E_\delta: \mathbb{R}^{m \times n} \times \mathbb{R}^{m \times n} \times \mathbb{R}_+ \to (-\infty, +\infty]$ and $\delta \in [0, 1)$ is an assigned nonnegative constant. Moreover, Algorithm 1 permits the selection of arbitrary initial values $\alpha_k^0 \in [0, \alpha_{\max}]$, $\beta_k^0 \in [0, \beta_{\max}]$ and $\mu_k^0 \in [\mu_{\min}, \mu_{\max}]$ at each iteration; these values are then refined adaptively according to the line search criterion (8). This markedly enhances the method's flexibility and numerical efficiency. Furthermore, users may first select $\mu_{\min}$ and $\mu_{\max}$ intuitively and then determine $d$ from the condition $\mu_{\min} \le \frac{1 - \delta}{L + 2d}$.
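To connect the pieces, the following Python sketch instantiates Algorithm 1 for the matrix completion problem of Section 5, where $f(X) = \frac{1}{2}\|P_\Omega(X) - Y\|_F^2$, $\nabla f(Z) = P_\Omega(Z) - Y$ and $L = 1$, with the iterate initialized at the observed matrix. It is a simplified illustration under these assumptions, not the authors' MATLAB implementation; `g` and `g_grad` are vectorized penalty functions such as those sketched in Section 1.1, and all names and default values are ours.

```python
import numpy as np

def pirnne_ls(Y_obs, mask, g, g_grad, max_iter=500, tol=1e-3,
              alpha0=0.1, beta0=0.1, mu0=1.0, mu_min=1e-4,
              eta1=0.4, eta2=0.35, tau=0.45, d=0.1, delta=0.1, p=0.7):
    """Sketch of Algorithm 1 for min_X sum_i g(sigma_i(X)) + 0.5*||P_Omega(X) - Y||_F^2,
    where grad f(Z) = mask*(Z - Y_obs) and L = 1."""
    sv = lambda Z: np.linalg.svd(Z, compute_uv=False)
    psi = lambda Z: np.sum(g(sv(Z))) + 0.5 * np.linalg.norm(mask * Z - Y_obs, 'fro') ** 2
    X_prev, X = Y_obs.copy(), Y_obs.copy()
    E_tilde = psi(X)                                   # E~_0 = Psi(X^0)
    for _ in range(max_iter):
        alpha, beta, mu = alpha0, beta0, mu0
        w = g_grad(sv(X))                              # weights w_i^k = g'(sigma_i(X^k))
        while True:                                    # inner loop: line search, Step (1d)
            Yk = X + alpha * (X - X_prev)              # extrapolation point (2)
            Zk = X + beta * (X - X_prev)               # extrapolation point (3)
            G = Yk - mu * (mask * Zk - Y_obs)          # gradient step at Z^k
            U, s, Vt = np.linalg.svd(G, full_matrices=False)
            X_new = (U * np.maximum(s - mu * w, 0.0)) @ Vt     # weighted SVT, i.e. (4)/(5)
            diff2 = np.linalg.norm(X_new - X, 'fro') ** 2
            E_new = psi(X_new) + delta / (4 * mu) * diff2      # potential function (9)
            if E_new <= E_tilde - 0.5 * d * diff2:             # criterion (8)
                break
            alpha, beta, mu = eta1 * alpha, eta2 * beta, max(tau * mu, mu_min)
        E_tilde = p * E_new + (1 - p) * E_tilde        # Step 2 (p = 1 gives the monotone variant)
        X_prev, X = X, X_new
        if np.linalg.norm(mask * X - Y_obs, 'fro') <= tol:
            break
    return X
```

By Lemma 4 below, the inner loop terminates after finitely many backtracking steps as long as $\mu_{\min} \le \frac{1-\delta}{L+2d}$, which the illustrative defaults satisfy.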
Remark 1. 
Observe that PIRNNE-LS still requires computing the singular value decomposition of the large-scale matrix $Y^k - \mu_k \nabla f(Z^k)$, which can be prohibitively expensive. The lemma below ensures that $X^{k+1}$ can instead be obtained from the SVD of a reduced matrix. Suppose $Y^k - \mu_k \nabla f(Z^k)$ has $\hat{q}$ singular values $\tilde{\Lambda}^k_{i_1, i_1} \ge \cdots \ge \tilde{\Lambda}^k_{i_{\hat{q}}, i_{\hat{q}}}$ such that $\tilde{\Lambda}^k_{i_j, i_j} > \mu_k w_{i_j}^k$. We denote by $\tilde{U}_{\hat{q}}^k \tilde{\Lambda}_{\hat{q}}^k (\tilde{V}_{\hat{q}}^k)^\top := (\tilde{u}_{i_1}^k, \ldots, \tilde{u}_{i_{\hat{q}}}^k) \operatorname{diag}(\tilde{\Lambda}^k_{i_1, i_1}, \ldots, \tilde{\Lambda}^k_{i_{\hat{q}}, i_{\hat{q}}}) (\tilde{v}_{i_1}^k, \ldots, \tilde{v}_{i_{\hat{q}}}^k)^\top$ the rank-$\hat{q}$ SVD of $Y^k - \mu_k \nabla f(Z^k)$, where $\tilde{u}_{i_j}^k$ and $\tilde{v}_{i_j}^k$ denote the left and right singular vectors associated with $\tilde{\Lambda}^k_{i_j, i_j}$, respectively.
Lemma 2 
([17]). Let $Q \in \mathbb{R}^{m \times q}$ be a matrix with orthogonal columns, $\tilde{U}_{\hat{q}}^k \tilde{\Lambda}_{\hat{q}}^k (\tilde{V}_{\hat{q}}^k)^\top$ be the rank-$\hat{q}$ SVD of $Y^k - \mu_k \nabla f(Z^k)$, $\tilde{U}_Q^k \tilde{\Lambda}_Q^k (\tilde{V}_Q^k)^\top$ be the SVD of $Q^\top (Y^k - \mu_k \nabla f(Z^k))$, and $\operatorname{span}(\tilde{U}_{\hat{q}}^k) \subseteq \operatorname{span}(Q)$, where $q \ge \hat{q}$. Then $X^{k+1} := Q \tilde{U}_Q^k S(\tilde{\Lambda}_Q^k) (\tilde{V}_Q^k)^\top$ is a solution to (5), where $S(\tilde{\Lambda}_Q^k) = \operatorname{diag}\big([(\tilde{\Lambda}_Q^k)_{i_j, i_j} - \mu_k w_{i_j}^k]_+\big)_{i_1 \le i_j \le i_{\hat{q}}}$.
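A rough sketch of the shortcut described by Lemma 2: when a matrix $Q$ with orthonormal columns spanning the leading left singular vectors is available (its construction, e.g., from the previous iterate's singular vectors as in [17], is not shown here), the proximal step only needs the SVD of the small matrix $Q^\top(Y^k - \mu_k \nabla f(Z^k))$. All names are ours, and the snippet assumes the weights are aligned with the decreasing singular values.

```python
import numpy as np

def prox_step_reduced(G, Q, weights, mu):
    """Lemma 2 shortcut: Q (m x q, orthonormal columns) is assumed to span the leading
    left singular vectors of G = Y^k - mu*grad f(Z^k). The proximal step is obtained
    from the SVD of the small q x n matrix Q^T G instead of G itself; singular values
    beyond the leading ones are assumed to fall below their thresholds and shrink to 0."""
    UQ, s, Vt = np.linalg.svd(Q.T @ G, full_matrices=False)   # SVD of the reduced matrix
    s_shrunk = np.maximum(s - mu * np.asarray(weights)[:s.size], 0.0)
    return Q @ ((UQ * s_shrunk) @ Vt)                         # X^{k+1} = Q U_Q S(Lambda_Q) V_Q^T
```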

4. Convergence Analysis

This section establishes the subsequential convergence, global convergence and linear convergence rate of the proposed algorithm. We start by analyzing subsequential convergence.

4.1. Subsequential Convergence of the Nonmonotone Line Search

We first establish the nonincreasing property of the sequence $\{E_\delta(X^{k+1}, X^k, \mu_k)\}$.
Lemma 3. 
Let $\{X^k\}$ be the sequence generated by Algorithm 1. If for any $k \ge 0$, the parameters $\alpha_k$, $\beta_k$ and $\mu_k$ satisfy
$$\mu_k < \frac{1}{L}, \qquad \alpha_k \le \sqrt{\frac{\mu_k \delta (1 - \mu_k L)}{16 \mu_{k-1}}}, \qquad \text{and} \qquad \beta_k \le \sqrt{\frac{\delta (1 - \mu_k L)}{4 \mu_{k-1} (4 \mu_k L^2 + L + l)}}, \qquad (10)$$
then we have
$$E_\delta(X^{k+1}, X^k, \mu_k) - E_\delta(X^k, X^{k-1}, \mu_{k-1}) \le \frac{\mu_k L - 1 + \delta}{4 \mu_k} \|X^{k+1} - X^k\|_F^2. \qquad (11)$$
Proof. 
Since $X^{k+1}$ is a minimizer of the optimization problem in (4), we get
$$\sum_{i=1}^m w_i^k \sigma_i(X^{k+1}) \le \sum_{i=1}^m w_i^k \sigma_i(X^k) + \langle \nabla f(Z^k), X^k - X^{k+1} \rangle + \frac{1}{2\mu_k} \|X^k - Y^k\|_F^2 - \frac{1}{2\mu_k} \|X^{k+1} - Y^k\|_F^2. \qquad (12)$$
From the Lipschitz continuity of $\nabla f$ with modulus $L_f$ and $L_f \le L$, it follows from ([40], Lemma 1.2.3) that
$$f(X^{k+1}) \le f(Z^k) + \langle \nabla f(Z^k), X^{k+1} - Z^k \rangle + \frac{L}{2} \|X^{k+1} - Z^k\|_F^2. \qquad (13)$$
Similarly to the technique of ([1], Lemma 4), we get
$$f(X^{k+1}) + \sum_{i=1}^m w_i^k \sigma_i(X^{k+1}) - f(X^k) - \sum_{i=1}^m w_i^k \sigma_i(X^k) \le \frac{1}{2\mu_k} \|X^k - Y^k\|_F^2 - \frac{1}{2\mu_k} \|X^{k+1} - Y^k\|_F^2 + \frac{L}{2} \|X^{k+1} - Z^k\|_F^2 + \frac{l}{2} \|X^k - Z^k\|_F^2. \qquad (14)$$
Next, it follows from (2) and (3) that
$$X^k - Y^k = -\alpha_k (X^k - X^{k-1}), \quad X^{k+1} - Y^k = X^{k+1} - X^k - \alpha_k (X^k - X^{k-1}), \quad X^k - Z^k = -\beta_k (X^k - X^{k-1}), \quad X^{k+1} - Z^k = X^{k+1} - X^k - \beta_k (X^k - X^{k-1}). \qquad (15)$$
Merging (14), (15), the concavity of $g$ and the definition of $E_\delta$ in (9), we have
$$\begin{aligned}
& E_\delta(X^{k+1}, X^k, \mu_k) - E_\delta(X^k, X^{k-1}, \mu_{k-1}) \\
&\quad = f(X^{k+1}) + \sum_{i=1}^m g(\sigma_i(X^{k+1})) + \frac{\delta}{4\mu_k} \|X^{k+1} - X^k\|_F^2 - f(X^k) - \sum_{i=1}^m g(\sigma_i(X^k)) - \frac{\delta}{4\mu_{k-1}} \|X^k - X^{k-1}\|_F^2 \\
&\quad \le f(X^{k+1}) - f(X^k) + \sum_{i=1}^m w_i^k \big(\sigma_i(X^{k+1}) - \sigma_i(X^k)\big) + \frac{\delta}{4\mu_k} \|X^{k+1} - X^k\|_F^2 - \frac{\delta}{4\mu_{k-1}} \|X^k - X^{k-1}\|_F^2 \\
&\quad \le \frac{\alpha_k^2}{2\mu_k} \|X^k - X^{k-1}\|_F^2 - \frac{1}{2\mu_k} \|(X^{k+1} - X^k) - \alpha_k (X^k - X^{k-1})\|_F^2 + \frac{L}{2} \|(X^{k+1} - X^k) - \beta_k (X^k - X^{k-1})\|_F^2 \\
&\qquad + \frac{l}{2} \beta_k^2 \|X^k - X^{k-1}\|_F^2 + \frac{\delta}{4\mu_k} \|X^{k+1} - X^k\|_F^2 - \frac{\delta}{4\mu_{k-1}} \|X^k - X^{k-1}\|_F^2 \\
&\quad = -\frac{1}{2\mu_k} \|X^{k+1} - X^k\|_F^2 + \frac{\alpha_k}{\mu_k} \langle X^{k+1} - X^k, X^k - X^{k-1} \rangle + \frac{L}{2} \|X^{k+1} - X^k\|_F^2 + \frac{L}{2} \beta_k^2 \|X^k - X^{k-1}\|_F^2 \\
&\qquad - L \beta_k \langle X^{k+1} - X^k, X^k - X^{k-1} \rangle + \frac{l}{2} \beta_k^2 \|X^k - X^{k-1}\|_F^2 + \frac{\delta}{4\mu_k} \|X^{k+1} - X^k\|_F^2 - \frac{\delta}{4\mu_{k-1}} \|X^k - X^{k-1}\|_F^2. \qquad (16)
\end{aligned}$$
By using the Young inequality, we obtain
$$\frac{\alpha_k}{\mu_k} \|X^{k+1} - X^k\|_F \cdot \|X^k - X^{k-1}\|_F \le \frac{1 - \mu_k L}{8\mu_k} \|X^{k+1} - X^k\|_F^2 + \frac{2\alpha_k^2}{\mu_k (1 - \mu_k L)} \|X^k - X^{k-1}\|_F^2,$$
and
$$L \beta_k \|X^{k+1} - X^k\|_F \cdot \|X^k - X^{k-1}\|_F \le \frac{1 - \mu_k L}{8\mu_k} \|X^{k+1} - X^k\|_F^2 + \frac{2\mu_k \beta_k^2 L^2}{1 - \mu_k L} \|X^k - X^{k-1}\|_F^2.$$
Substituting the above two inequalities into (16), we have
$$E_\delta(X^{k+1}, X^k, \mu_k) - E_\delta(X^k, X^{k-1}, \mu_{k-1}) \le \frac{\mu_k L - 1 + \delta}{4\mu_k} \|X^{k+1} - X^k\|_F^2 + \Big( \frac{2\alpha_k^2}{\mu_k (1 - \mu_k L)} + \frac{L}{2} \beta_k^2 + \frac{l}{2} \beta_k^2 + \frac{2\mu_k \beta_k^2 L^2}{1 - \mu_k L} - \frac{\delta}{4\mu_{k-1}} \Big) \|X^k - X^{k-1}\|_F^2.$$
Furthermore, it follows from (10) that
$$\frac{L + l}{2} \beta_k^2 + \frac{2\mu_k \beta_k^2 L^2}{1 - \mu_k L} \le \frac{\delta}{8\mu_{k-1}} \qquad \text{and} \qquad \frac{2\alpha_k^2}{\mu_k (1 - \mu_k L)} \le \frac{\delta}{8\mu_{k-1}}.$$
Hence, the assertion (11) follows immediately. The proof is completed. □
Lemma 4. 
(Well-definedness of the line search criterion) Let $\{X^k\}$ be the sequence generated by Algorithm 1. Then, for any $k \ge 0$, criterion (8) is satisfied within finitely many inner iterations.
Proof. 
We argue by contradiction. Consider first $k = 0$. Observe that $\mu_0 = \tilde{\mu}_0$ with $\mu_{\min} \le \tilde{\mu}_0 \le \frac{1 - \delta}{L + 2d}$ holds after finitely many inner iterations. Here, $Y^0 = X^0$ and $Z^0 = X^0$. Then, from (4), we have
$$\sum_{i=1}^m w_i^0 \sigma_i(X^1) \le \sum_{i=1}^m w_i^0 \sigma_i(X^0) + \langle \nabla f(X^0), X^0 - X^1 \rangle - \frac{1}{2\tilde{\mu}_0} \|X^1 - Y^0\|_F^2,$$
where $w_i^0 = g'(\sigma_i(X^0))$. From ([40], Lemma 1.2.3), we obtain
$$f(X^1) - f(X^0) \le \langle \nabla f(X^0), X^1 - X^0 \rangle + \frac{L}{2} \|X^1 - X^0\|_F^2.$$
Together with the concavity of $g$, we have
$$\Psi(X^1) - \Psi(X^0) = f(X^1) + \sum_{i=1}^m g(\sigma_i(X^1)) - f(X^0) - \sum_{i=1}^m g(\sigma_i(X^0)) \le f(X^1) - f(X^0) + \sum_{i=1}^m w_i^0 \big(\sigma_i(X^1) - \sigma_i(X^0)\big) \le \frac{L}{2} \|X^1 - X^0\|_F^2 - \frac{1}{2\tilde{\mu}_0} \|X^1 - X^0\|_F^2.$$
This inequality implies that (11) holds for $k = 0$. Since $\tilde{\mu}_0 \le \frac{1 - \delta}{L + 2d} \le \frac{2 - \delta}{2(L + d)}$, the criterion (8) always holds at $k = 0$. Now suppose that there exists a smallest $k > 0$ such that the criterion (8) cannot be satisfied. This means that the line search criterion (8) is satisfied for the first $k - 1$ iterations. At the $(k-1)$-th iteration, there exists $\mu_{k-1}$ such that
$$E_\delta(X^k, X^{k-1}, \mu_{k-1}) \le \tilde{E}_{k-1} - \frac{d}{2} \|X^k - X^{k-1}\|_F^2. \qquad (17)$$
Thus, we have $\tilde{E}_{k-1} \ge E_\delta(X^k, X^{k-1}, \mu_{k-1})$. Further, Step 2 of Algorithm 1 defines $\tilde{E}_k$, and we obtain
$$\tilde{E}_k = p_{k-1} E_\delta(X^k, X^{k-1}, \mu_{k-1}) + (1 - p_{k-1}) \tilde{E}_{k-1} \ge p_{k-1} E_\delta(X^k, X^{k-1}, \mu_{k-1}) + (1 - p_{k-1}) E_\delta(X^k, X^{k-1}, \mu_{k-1}) = E_\delta(X^k, X^{k-1}, \mu_{k-1}).$$
Since Step (1d) of Algorithm 1 enforces $\mu_k \ge \mu_{\min}$, the value $\mu_k = \mu_{\min}$ becomes admissible after finitely many inner iterations. Similarly, from Step (1d) in Algorithm 1, we know that
$$\alpha_k \le \sqrt{\frac{\mu_k \delta (1 - \mu_k L)}{16 \mu_{k-1}}} \qquad \text{and} \qquad \beta_k \le \sqrt{\frac{\delta (1 - \mu_k L)}{4 \mu_{k-1} (4 \mu_k L^2 + L + l)}}$$
must be satisfied. Consequently, $\mu_k = \mu_{\min}$ and (10) are obtained. In addition, since $d > 0$ and $\delta \in [0, 1)$, it holds that $0 < \mu_{\min} \le \frac{1 - \delta}{L + 2d}$ and, thus, $\frac{\mu_{\min} L - 1 + \delta}{4 \mu_{\min}} \le -\frac{d}{2}$. Together with Lemma 3 and (17), we have
$$E_\delta(X^{k+1}, X^k, \mu_{\min}) - \tilde{E}_k \le E_\delta(X^{k+1}, X^k, \mu_{\min}) - E_\delta(X^k, X^{k-1}, \mu_{k-1}) \le \frac{\mu_{\min} L - 1 + \delta}{4 \mu_{\min}} \|X^{k+1} - X^k\|_F^2 \le -\frac{d}{2} \|X^{k+1} - X^k\|_F^2.$$
This means that the line search criterion (8) is satisfied at the $k$-th iteration, a contradiction. The proof is completed. □
We obtain the subsequential convergence of Algorithm 1 in the following Theorem.
Theorem 1. 
Let $\{X^k\}$ be the sequence generated by Algorithm 1. Then, we have
(i) 
the sequence $\{\tilde{E}_k\}$ is nonincreasing;
(ii) 
$\lim_{k \to \infty} \|X^{k+1} - X^k\|_F = 0$;
(iii) 
$\{X^k\}$ is bounded and any cluster point $\tilde{X} = \lim_{i \to \infty} X^{k_i}$ of $\{X^k\}$ is a critical point of $\Psi$.
Proof. 
(i)
Invoking the line search criterion (8) and the definition of $\tilde{E}_{k+1}$ in Algorithm 1, we have
$$\tilde{E}_{k+1} = p_k E_\delta(X^{k+1}, X^k, \mu_k) + (1 - p_k) \tilde{E}_k \le p_k \Big( \tilde{E}_k - \frac{d}{2} \|X^{k+1} - X^k\|_F^2 \Big) + (1 - p_k) \tilde{E}_k \le \tilde{E}_k - \frac{p_{\min} d}{2} \|X^{k+1} - X^k\|_F^2. \qquad (19)$$
This shows that the sequence $\{\tilde{E}_k\}$ is nonincreasing.
(ii)
Summing up (19) from $k = 1, \ldots, N$, we obtain
$$\frac{p_{\min} d}{2} \sum_{k=1}^{N} \|X^{k+1} - X^k\|_F^2 \le \sum_{k=1}^{N} (\tilde{E}_k - \tilde{E}_{k+1}) = \tilde{E}_1 - \tilde{E}_{N+1} = E_\delta(X^1, X^0, \mu_0) - E_\delta(X^{N+1}, X^N, \mu_N) \le \Psi(X^1) + \frac{\delta}{4\mu_0} \|X^1 - X^0\|_F^2 - \Psi(X^N) \le \Psi(X^1) + \frac{\delta}{4\mu_0} \|X^1 - X^0\|_F^2 - \underline{\Psi} < \infty,$$
where the second inequality is deducible from (17), the third stems directly from the definition of $E_\delta$ in (9), and the last inequality follows from $X^1 \in \operatorname{dom}\Psi$, $\mu_0 > 0$ and $\delta \ge 0$. Since $p_{\min}, d > 0$ and $N \to \infty$, assertion (ii) follows.
(iii)
The sequence $\{\tilde{E}_k\}$ is nonincreasing by (i). We have
$$\tilde{E}_k \le \tilde{E}_{k-1} \le \cdots \le \tilde{E}_1 = \Psi(X^1) + \frac{\delta}{4\mu_0} \|X^1 - X^0\|_F^2 < \infty.$$
Again, from the definition of $\tilde{E}_k$ and (8), we have
$$\Psi(X^1) + \frac{\delta}{4\mu_0} \|X^1 - X^0\|_F^2 \ge \tilde{E}_k = p_{k-1} E_\delta(X^k, X^{k-1}, \mu_{k-1}) + (1 - p_{k-1}) \tilde{E}_{k-1} \ge E_\delta(X^k, X^{k-1}, \mu_{k-1}) \ge \Psi(X^k).$$
Consequently, $\{\Psi(X^k)\}$ is bounded above. By Assumption 1, the sequence $\{X^k\}$ is bounded and has at least one cluster point. Let $\tilde{X}$ be such a cluster point. Then, there exists a subsequence $\{X^{k_j}\}$ of $\{X^k\}$ such that $\lim_{j \to \infty} X^{k_j} = \tilde{X}$. The rest of the proof is similar to ([1], proof of Theorem 1 (iii)); it is easy to derive that $0 \in \partial\Psi(\tilde{X})$, which implies $\tilde{X} \in \operatorname{crit}\Psi$. This completes the proof. □

4.2. Global Convergence and Linear Convergence Rate of Monotone Line Search

This subsection discusses the global convergence and linear convergence rate of the proposed algorithm with monotone line search ($p_k \equiv 1$) under the KL framework. First, we introduce the following two frequently used lemmas, whose proofs are similar to ([1], Lemma 5) and ([1], Lemma 6), so we do not repeat them here. Throughout this subsection we use the notation $\Delta_k := X^k - X^{k-1}$.
Lemma 5. 
Let $\{X^k\}$ be the sequence generated by Algorithm 1. Then, there exist some $K \in \mathbb{N}$ and $b > 0$ such that for all $k \ge K$, there exists $\omega^{k+1} \in \partial E_\delta(X^{k+1}, X^k, \mu_k)$ such that
$$\|\omega^{k+1}\|_F \le b \big( \|\Delta_{k+1}\|_F + \|\Delta_k\|_F \big).$$
Denote the cluster point set of $\{(X^{k+1}, X^k, \mu_k)\}$ by $\Xi$. Then, we summarize some properties of the cluster point set $\Xi$.
Lemma 6. 
Let $\{X^k\}$ be the sequence generated by Algorithm 1 with $p_k = 1$. Then, we have
(i) 
$\Xi$ is nonempty and $\Xi \subseteq \operatorname{crit} E_\delta$;
(ii) 
$\lim_{k \to \infty} \operatorname{dist}\big((X^{k+1}, X^k, \mu_k), \Xi\big) = 0$;
(iii) 
$E_\delta$ and $\Psi$ are equal and constant on $\Xi$, i.e., there exists a constant $\kappa$ such that for any $(\tilde{X}, \tilde{X}, \tilde{\mu}) \in \Xi$, $E_\delta(\tilde{X}, \tilde{X}, \tilde{\mu}) = \Psi(\tilde{X}) = \kappa$.
Theorem 2. 
Let $\{X^k\}$ be the sequence generated by Algorithm 1 with $p_k = 1$ for all $k$ large enough, and suppose $E_\delta$ is a KL function.
(i) 
The whole sequence $\{X^k\}$ has finite length, i.e., $\sum_{k=0}^{\infty} \|\Delta_{k+1}\|_F < +\infty$, and $\{X^k\}$ globally converges to a point $\tilde{X}$ in $\operatorname{crit}\Psi$.
(ii) 
Moreover, if the KL function $\varphi$ can be taken of the form $\varphi(s) = \rho s^{1-t}$ for some $t \in (0, 1/2]$, the whole sequences $\{X^k\}$ and $\{E_\delta(X^{k+1}, X^k, \mu_k)\}$ are linearly convergent.
Proof. 
(i)
Assume that $(\tilde{X}, \tilde{X}, \tilde{\mu}) \in \Xi \subseteq \operatorname{crit} E_\delta$. Then, there exists a subsequence $\{(X^{k_i+1}, X^{k_i}, \mu_{k_i})\}$ of $\{(X^{k+1}, X^k, \mu_k)\}$ converging to $(\tilde{X}, \tilde{X}, \tilde{\mu})$. Let $k_1 \in \mathbb{N}$ be such that $p_k = 1$ for all $k \ge k_1$, so that $\tilde{E}_{k+1} = E_\delta(X^{k+1}, X^k, \mu_k)$. It follows from Theorem 1 (iii) and the continuity of $\Psi$ that $\lim_{i \to \infty} \Psi(X^{k_i}) = \Psi(\tilde{X})$. Again from Theorem 1 (i) and (ii), we have $\lim_{k \to \infty} \|\Delta_k\|_F = 0$, and $\{E_\delta(X^{k+1}, X^k, \mu_k)\}$ is nonincreasing for all $k \ge k_1$. Thus, we get $\lim_{k \to \infty} E_\delta(X^{k+1}, X^k, \mu_k) = \kappa$, and $E_\delta(X^{k+1}, X^k, \mu_k) \ge \kappa$ for all $k \ge k_1$.
If there exists an integer $\bar{k}$ such that $E_\delta(X^{\bar{k}}, X^{\bar{k}-1}, \mu_{\bar{k}-1}) = \kappa$, then from (8), for any $k \ge \bar{k}$, we have
$$\frac{d}{2} \|\Delta_{k+1}\|_F^2 \le E_\delta(X^k, X^{k-1}, \mu_{k-1}) - E_\delta(X^{k+1}, X^k, \mu_k) \le E_\delta(X^{\bar{k}}, X^{\bar{k}-1}, \mu_{\bar{k}-1}) - E_\delta(X^{k+1}, X^k, \mu_k) \le E_\delta(X^{\bar{k}}, X^{\bar{k}-1}, \mu_{\bar{k}-1}) - \kappa = 0.$$
Thus, we have $X^{k+1} = X^k$ for any $k > \bar{k}$, and the assertion $\sum_{k=0}^{\infty} \|\Delta_{k+1}\|_F < +\infty$ holds directly. Otherwise, since $\{E_\delta(X^{k+1}, X^k, \mu_k)\}$ is nonincreasing for all $k \ge k_1$, we have $E_\delta(X^{k+1}, X^k, \mu_k) > \kappa$ for all $k \ge k_1$. Now, consider the sequence $\{(X^{k+1}, X^k, \mu_k)\}_{k=0}^{\infty}$. It follows from Lemma 6 that its cluster point set $\Xi$ is nonempty and compact, and for any $(\tilde{X}, \tilde{X}, \tilde{\mu}) \in \Xi$, we have
$$E_\delta(\tilde{X}, \tilde{X}, \tilde{\mu}) = \Psi(\tilde{X}) = \kappa.$$
Thus, for any $\eta > 0$, there exists a nonnegative integer $k_2 \ge k_1$ such that $E_\delta(X^{k+1}, X^k, \mu_k) < \kappa + \eta$ for any $k > k_2$. In addition, for any $\zeta > 0$, there exists a positive integer $k_3 \ge k_1$ such that $\operatorname{dist}\big((X^{k+1}, X^k, \mu_k), \Xi\big) < \zeta$ for all $k > k_3$. Consequently, for any $\eta, \zeta > 0$ and $k > k_4 := \max\{k_2, k_3, K\}$, where $K$ is given by Lemma 5, we have
$$\operatorname{dist}\big((X^{k+1}, X^k, \mu_k), \Xi\big) < \zeta, \qquad \text{and} \qquad \kappa < E_\delta(X^{k+1}, X^k, \mu_k) < \kappa + \eta.$$
By using Lemma 1 with $\Omega := \Xi$, for any $k > k_4$, we have
$$\varphi'\big(E_\delta(X^{k+1}, X^k, \mu_k) - \kappa\big) \operatorname{dist}\big(0, \partial E_\delta(X^{k+1}, X^k, \mu_k)\big) \ge 1. \qquad (20)$$
The remaining global convergence arguments are similar to ([1], Theorem 2); $\{X^k\}$ is a Cauchy sequence and, hence, it is convergent. By using Lemma 6 (i), there exists $(\tilde{X}, \tilde{X}, \tilde{\mu}) \in \operatorname{crit} E_\delta$ with $\tilde{X} \in \operatorname{crit}\Psi$ such that $\lim_{k \to \infty} X^k = \tilde{X}$.
(ii)
Denote $\Theta_k := E_\delta(X^k, X^{k-1}, \mu_{k-1}) - \kappa$. It follows from (20) that
$$1 \le \varphi'(\Theta_{k+1}) \operatorname{dist}\big(0, \partial E_\delta(X^{k+1}, X^k, \mu_k)\big) \le (1 - t) \rho b\, \Theta_{k+1}^{-t} \big( \|\Delta_{k+1}\|_F + \|\Delta_k\|_F \big) \le (1 - t) \rho b\, \Theta_{k+1}^{-t} \sqrt{2 \|\Delta_{k+1}\|_F^2 + 2 \|\Delta_k\|_F^2} \le (1 - t) \rho b\, \Theta_{k+1}^{-t} \sqrt{\tfrac{4}{d} \big( E_\delta(X^{k-1}, X^{k-2}, \mu_{k-2}) - E_\delta(X^{k+1}, X^k, \mu_k) \big)} = c\, \Theta_{k+1}^{-t} \sqrt{\Theta_{k-1} - \Theta_{k+1}},$$
where $c = 2(1 - t) \rho b / \sqrt{d}$; the second inequality follows from Lemma 5 and the fourth one follows from (8) together with $p_k = 1$. Since $\Theta_k \to 0$, there exists $k_5$ such that $\Theta_k \le 1$. Then, for all $k \ge k_6 := \max\{k_4, k_5\}$, it follows from (20) that for any $k \ge k_6 + 1$,
$$\Theta_k \le \Theta_k^{2t} \le c^2 (\Theta_{k-2} - \Theta_k),$$
which means that
$$\Theta_k \le \frac{c^2}{1 + c^2} \Theta_{k-2}.$$
So, the sequences $\{E_\delta(X^{2k+1}, X^{2k}, \mu_{2k})\}$ and $\{E_\delta(X^{2k}, X^{2k-1}, \mu_{2k-1})\}$ are both Q-linearly convergent. This indicates that the entire sequence $\{E_\delta(X^{k+1}, X^k, \mu_k)\}$ is R-linearly convergent. By combining this with (8), we can infer that there exist $N > 0$, $k_0 \ge 0$ and $q \in (0, 1)$ such that for each $k \ge k_0$, $\|\Delta_{k+1}\|_F \le N q^k$. Consequently,
$$\|X^k - \tilde{X}\|_F \le \sum_{i=k}^{\infty} \|\Delta_{i+1}\|_F \le \frac{N}{1 - q}\, q^k,$$
which means that $\{X^k\}$ is R-linearly convergent. This completes the proof. □

5. Numerical Results

This section evaluates the efficacy of the algorithm by solving the matrix completion problem
$$\min_X \ \sum_{i=1}^m g(\sigma_i(X)) + \frac{1}{2} \|P_\Omega(X) - Y\|_F^2,$$
where $\Omega$ is the sample index set and $P_\Omega: \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ is the linear operator that keeps the entries indexed by $\Omega$ intact and sets the others to zero. Define $f := f_1 - f_2$ with $f_1 := \frac{1}{2} \|P_\Omega(X) - Y\|_F^2$ and $f_2 := 0$; then $f_1$ and $f_2$ have Lipschitz constants $L = 1$ and $l = 0$. The performance of the algorithm is assessed on both synthetic and real datasets. The implementation uses MATLAB 2020a on a Windows 10 platform with an Intel(R) Core(TM) i7-1165G7 processor (2.80 GHz) and 16 GB RAM. The tests focus on the ETP and Log penalty functions, replicating the parameter selections from [1,4], which are predominantly optimal.
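Under this splitting ($f_1 = \frac{1}{2}\|P_\Omega(X) - Y\|_F^2$, $f_2 = 0$), the sampling operator and the gradient of $f_1$ take the following simple forms; a small sketch with a 0/1 mask standing in for $\Omega$ (our own notation):

```python
import numpy as np

def P_Omega(X, mask):
    """Keep the entries indexed by Omega (mask == 1) and set the others to zero."""
    return mask * X

def grad_f1(X, Y_obs, mask):
    """Gradient of f1(X) = 0.5*||P_Omega(X) - Y||_F^2, with Lipschitz constant L = 1."""
    return mask * X - Y_obs
```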

5.1. Synthetic Data

In this synthetic trial, we generate a rank-$r$ matrix $X$ as $M_L M_R$, where $M_L \in \mathbb{R}^{m \times r}$ and $M_R \in \mathbb{R}^{r \times n}$ are produced by MATLAB's rand command. Half of the entries of $X$ are missing uniformly at random. The observed matrix is $Y = P_\Omega(X)$, with $\lambda_0 = \|Y\|$ and $\lambda := \lambda_t = 10^{-3} \lambda_0$ in the model. The algorithm terminates when $\|P_\Omega(X) - Y\| \le 10^{-3}$.
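The construction of this synthetic instance can be summarized as follows (a NumPy transcription of the description above; the random seed, variable names and the reading of $\lambda_0 = \|Y\|$ are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 500, 500, 50

# Rank-r ground truth X = M_L @ M_R with uniform random factors (analogue of MATLAB's rand).
M_L, M_R = rng.random((m, r)), rng.random((r, n))
X_true = M_L @ M_R

# Half of the entries are missing uniformly at random; the observation is Y = P_Omega(X).
mask = (rng.random((m, n)) < 0.5).astype(float)
Y_obs = mask * X_true

lam = 1e-3 * np.linalg.norm(Y_obs)   # lambda = 1e-3 * lambda_0, with lambda_0 read as ||Y||
```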
PIRNNE-LS integrates a line search strategy to eliminate parameter restrictions. To validate this approach, we examine the ETP and Log nonconvex penalties under four $p_k$ scenarios: $p_k = p \in \{0.1, 0.3, 0.7, 1\}$, noting that the line search is monotone when $p = 1$ and nonmonotone for $p < 1$. The tests use $m = n = 500$, $r = 50$, $\alpha_k^0 = 0.1$, $\beta_k^0 = 0.1$, $\mu_k^0 = 1$ for each $k \in \mathbb{N}$, with Algorithm 1 (Step 1) parameters $\eta_1 = 0.4$, $\eta_2 = 0.35$, $\tau = 0.45$, $d = 0.1$ and $\delta = 0.1$. The maximum number of iterations is capped at 1000. Figure 1 plots the evolution of the error metric against CPU time and indicates that PIRNNE-LS with $p = 0.7$ and $p = 1$ markedly outperforms the alternatives.

5.2. Real Images

In this subsection, we compare our proposed algorithm with APIRNN [17] and PIRNNE [1]. For the APIRNN and PIRNNE algorithms, the parameters $\alpha_k$, $\beta_k$ are chosen as in [17] and [1], respectively. To better demonstrate the algorithmic enhancements, we implement (i) the monotone line search ($p_k \equiv 1$), designated PIRNNE-mLS, and (ii) the nonmonotone line search ($p_k \equiv 0.7$), designated PIRNNE-nLS.
To demonstrate the efficacy of the algorithms more comprehensively, we test four 2D images, “Boat ($512 \times 512$)”, “Man ($1024 \times 1024$)”, “City Wall ($512 \times 512$)” and “Spillikins ($512 \times 512$)”, alongside four 3D (color) images, “Bottles ($512 \times 512$)”, “Texture ($512 \times 512$)”, “House ($256 \times 256$)” and “Clock ($512 \times 512$)”, shown in Figure 2 and Figure 3. Although not all natural images are low-rank, their essential information is primarily determined by the largest singular values, so corrupted images can be recovered through low-rank approximation. For the 3D images with three separate channels, matrix completion is carried out independently on each channel. The parameters are selected as in the synthetic example, and the termination criterion is $\|P_\Omega(X) - Y\| \le 10^{-3}$.
Due to space constraints, we concentrate on the ETP penalty function to demonstrate the recovery efficacy. The recovery capability of each algorithm is quantified by the Signal-to-Noise Ratio (SNR), defined as
$$\mathrm{SNR}(u, u^\ast) = 10 \log_{10} \frac{\|u - \bar{u}\|^2}{\|u^\ast - u\|^2},$$
where $u$ is the original image, $\bar{u}$ the mean of the original image, and $u^\ast$ the reconstructed image.
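For reference, the SNR metric can be computed as below (a small helper implementing the definition above; the function name is ours):

```python
import numpy as np

def snr(u, u_rec):
    """SNR(u, u_rec) = 10*log10(||u - mean(u)||^2 / ||u_rec - u||^2)."""
    u = np.asarray(u, dtype=float)
    num = np.sum((u - u.mean()) ** 2)
    den = np.sum((np.asarray(u_rec, dtype=float) - u) ** 2)
    return 10.0 * np.log10(num / den)
```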
For this evaluation, 50% of the image pixels are corrupted by random values, with $\Omega$ denoting the index set of the randomly retained entries. The corrupted images appear in the second row of Figure 2 and Figure 3. The SNR values achieved by the four methods are plotted against CPU time in Figure 4 and Figure 5. The results in Figure 4 and Figure 5 show that PIRNNE-LS (including PIRNNE-mLS and PIRNNE-nLS) outperforms the traditional APIRNN and PIRNNE.
Furthermore, we report the number of iterations, the CPU time in seconds and the SNR values in Table 1, where “Iter.”, “Time” and “SNR” denote the number of iterations, the CPU time in seconds and the SNR value, respectively. For color images, Iter., Time and SNR represent the mean values over the three channels. From Table 1, we observe that our proposed PIRNNE-mLS and PIRNNE-nLS achieve better recovery performance.

5.3. Movie Recommendation System

In order to further evaluate the performance of the proposed algorithm, we test it on the MovieLens dataset [41]. The MovieLens dataset contains anonymous ratings of movies by users. Three subsets of the dataset are employed: 100 K, 1 M and 10 M, with varying numbers of users, movies and ratings, as described in Table 2.
These experiments were conducted on a workstation equipped with an Intel Xeon Gold 5218R processor (20 cores/40 threads), 64 GB of RAM and dual NVIDIA GeForce RTX 4090 GPUs. The software environment is Ubuntu 22.04.4 LTS and MATLAB R2020a. The key metrics are computational efficiency measured in GPU seconds (Time), recovery accuracy measured by RMSE, and the objective value. The comparative results of the different algorithms on the MovieLens subsets are presented in Table 3. It should be emphasized that our advantage is not apparent on the small dataset; however, our algorithms show a marked advantage on the larger ones.

6. Conclusions

This paper addresses a class of nonconvex and nonsmooth optimization problems that are commonly encountered in various applications. Based on existing dimension reduction and extrapolation techniques, we propose a more general proximal iteratively reweighted nuclear norm method. This method uses a line search mechanism to avoid parameter constraints, thereby providing greater flexibility in parameter selection, which makes it feasible to widen the applications of the method in the future. In theory, we prove the subsequential convergence of the algorithm; furthermore, for the case of monotone line search, we prove its global convergence and linear convergence rate under the KL framework. Finally, we validate the effectiveness of the algorithm through numerical results on synthetic and real data. In future work, we will construct a new nonconvex optimization model with distributed characteristics based on the low-rank structure of matrices and design corresponding algorithms [42,43].

Author Contributions

Conceptualization, Z.G.; methodology, Z.G.; formal analysis, Z.G.; writing—original draft preparation, Z.G.; software, X.Z.; validation, X.Z. and S.Z.; writing—review and editing, Z.G., S.Z., X.Z. and Y.C.; visualization, X.Z. and S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (12471290), Natural Science Research of the Jiangsu Higher Education Institutions of China (20KJA520003), Six Talent Peaks Project of Jiangsu Province (JY-051), Open Fund of the Key Laboratory of NSLSCS, Ministry of Education, Suqian Sci&Tech Program (M202206) and Qing Lan Project.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ge, Z.L.; Zhang, X.; Wu, Z.M. A fast proximal iteratively reweighted nuclear norm algorithm for nonconvex low-rank matrix minimization problems. Appl. Numer. Math. 2022, 179, 66–86. [Google Scholar] [CrossRef]
  2. Argyriou, A.; Evgeniou, T.; Pontil, M. Convex multi-task feature learning. Mach. Learn. 2008, 73, 243–272. [Google Scholar] [CrossRef]
  3. Amit, Y.; Fink, M.; Srebro, N.; Ullman, S. Uncovering shared structures in multiclass classification. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, 20–24 June 2007. [Google Scholar] [CrossRef]
  4. Lu, C.Y.; Tang, J.H.; Yan, S.C.; Lin, Z.C. Generalized nonconvex nonsmooth low-rank minimization. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; IEEE Computer Society: Washington, DC, USA, 2014; pp. 4130–4137. [Google Scholar]
  5. Dong, W.S.; Shi, G.M.; Li, X.; Ma, Y.; Huang, F. Compressive sensing via nonlocal low-rank regularization. IEEE Trans. Image Process. 2014, 23, 3618–3632. [Google Scholar] [CrossRef]
  6. Fazel, M.; Hindi, H.; Boyd, S.P. Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. In Proceedings of the 2003 American Control Conference, Denver, CO, USA, 4–6 June 2003; IEEE: New York, NY, USA, 2003; pp. 2156–2162. [Google Scholar]
  7. Hu, Y.; Zhang, D.B.; Ye, J.P.; Li, X.L.; He, X.F. Fast and accurate matrix completion via truncated nuclear norm regularization. IEEE Trans. Pattern Anal. 2013, 35, 2117–2130. [Google Scholar] [CrossRef]
  8. Lu, C.Y.; Zhu, C.B.; Xu, C.Y.; Yan, S.C.; Lin, Z.C. Generalized singular value thresholding. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 1215–1221. [Google Scholar] [CrossRef]
  9. Todeschini, A.; Caron, F.; Chavent, M. Probabilistic low-rank matrix completion with adaptive spectral regularization algorithms. Adv. Neural Inf. Process. Syst. 2013, 26, 845–853. [Google Scholar]
  10. Toh, K.C.; Yun, S.W. An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pac. J. Optim. 2010, 6, 615–640. [Google Scholar]
  11. Zhang, X.; Peng, D.T.; Su, Y.Y. A singular value shrinkage thresholding algorithm for folded concave penalized low-rank matrix optimization problems. J. Glob. Optim. 2024, 88, 485–508. [Google Scholar] [CrossRef]
  12. Tao, T.; Xiao, L.H.; Zhong, J.Y. A Fast Proximal Alternating Method for Robust Matrix Factorization of Matrix Recovery with Outliers. Mathematics 2025, 13, 1466. [Google Scholar] [CrossRef]
  13. Cui, A.G.; He, H.Z.; Yuan, H. A Designed Thresholding Operator for Low-Rank Matrix Completion. Mathematics 2024, 12, 1065. [Google Scholar] [CrossRef]
  14. Gong, P.H.; Zhang, C.S.; Lu, Z.S.; Huang, J.H.Z.; Ye, J.P. A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 37–45. [Google Scholar]
  15. Nakayama, S.; Narushima, Y.; Yabe, H. Inexact proximal DC Newton-type method for nonconvex composite functions. Comput. Optim. Appl. 2024, 87, 611–640. [Google Scholar] [CrossRef]
  16. Sun, T.; Jiang, H.; Cheng, L.Z. Convergence of proximal iteratively reweighted nuclear norm algorithm for image processing. IEEE Trans. Image Process. 2017, 26, 5632–5644. [Google Scholar] [CrossRef]
  17. Phan, D.N.; Nguyen, T.N. An accelerated IRNN-Iteratively Reweighted Nuclear Norm algorithm for nonconvex nonsmooth low-rank minimization problems. J. Comput. Appl. Math. 2021, 396, 113602. [Google Scholar] [CrossRef]
  18. Xu, Z.Q.; Zhang, Y.L.; Ma, C.; Yan, Y.C.; Peng, Z.L.; Xie, S.L.; Wu, S.Q.; Yang, X.K. LERE: Learning-Based Low-Rank Matrix Recovery with Rank Estimation. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; pp. 16228–16236. [Google Scholar] [CrossRef]
  19. Wen, Y.W.; Li, K.X.; Chen, H.F. Accelerated matrix completion algorithm using continuation strategy and randomized SVD. J. Comput. Appl. Math. 2023, 429, 115215. [Google Scholar] [CrossRef]
  20. Zhang, H.M.; Qian, F.; Shi, P.; Du, W.L.; Tang, Y.; Qian, J.J.; Gong, C.; Yang, J. Generalized Nonconvex Nonsmooth Low-Rank Matrix Recovery Framework With Feasible Algorithm Designs and Convergence Analysis. IEEE Trans. Neur. Net. Lear. 2023, 34, 5342–5353. [Google Scholar] [CrossRef] [PubMed]
  21. Li, B.J.; Pan, S.H.; Qian, Y.T. Factorization model with total variation regularizer for image reconstruction and subgradient algorithm. Pattern Recogn. 2026, 170, 112038. [Google Scholar] [CrossRef]
  22. Guo, H.Y.; Huang, Z.H.; Zhang, X.Z. Low rank matrix recovery with impulsive noise. Appl. Math. Lett. 2022, 134, 108364. [Google Scholar] [CrossRef]
  23. Wang, H.; Wang, Y.; Yang, X.Y. Efficient Active Manifold Identification via Accelerated Iteratively Reweighted Nuclear Norm Minimization. J. Mach. Learn. Res. 2024, 25, 1–44. [Google Scholar]
  24. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar] [CrossRef]
  25. Li, H.; Lin, Z.C. Accelerated proximal gradient methods for nonconvex programming. Adv. Neural Inf. Process. Syst. 2015, 28, 377–387. [Google Scholar]
  26. Grippo, L.; Lampariello, F.; Lucidi, S. A nonmonotone line search technique for Newton’s method. SIAM J. Numer. Anal. 1986, 23, 707–716. [Google Scholar] [CrossRef]
  27. Wright, S.J.; Nowak, R.; Figueiredo, M.A.T. Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 2008, 57, 2479–2493. [Google Scholar] [CrossRef]
  28. Wu, Z.M.; Li, C.S.; Li, M.; Andrew, L. Inertial proximal gradient methods with Bregman regularization for a class of nonconvex optimization problems. J. Glob. Optim. 2021, 79, 617–644. [Google Scholar] [CrossRef]
  29. Liu, J.Y.; Cui, Y.; Pang, J.S.; Sen, S. Two-stage stochastic programming with linearly bi-parameterized quadratic recourse. SIAM J. Optimiz. 2020, 30, 2530–2558. [Google Scholar] [CrossRef]
  30. Wang, J.Y.; Petra, C.G. A sequential quadratic programming algorithm for nonsmooth problems with upper-C2 Objective. SIAM J. Optimiz. 2023, 33, 2379–2405. [Google Scholar] [CrossRef]
  31. Yang, L. Proximal gradient method with extrapolation and line search for a class of nonconvex and nonsmooth problems. J. Optimiz. Theory App. 2024, 200, 68–103. [Google Scholar] [CrossRef]
  32. Attouch, H.; Bolte, J.; Svaiter, B.F. Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 2013, 137, 91–129. [Google Scholar] [CrossRef]
  33. Bolte, J.; Sabach, S.; Teboulle, M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 2014, 146, 459–494. [Google Scholar] [CrossRef]
  34. Kurdyka, K. On gradients of functions definable in o-minimal structures. Ann. I. Fourier 1998, 48, 769–783. [Google Scholar] [CrossRef]
  35. Attouch, H.; Bolte, J.; Redont, P.; Soubeyran, A. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Lojasiewicz inequality. Math. Oper. Res. 2010, 35, 438–457. [Google Scholar] [CrossRef]
  36. Ge, Z.L.; Wu, Z.M.; Zhang, X. An extrapolated proximal iteratively reweighted method for nonconvex composite optimization problems. J. Glob. Optim. 2023, 86, 821–844. [Google Scholar] [CrossRef]
  37. Guo, K.; Han, D.R. A note on the Douglas-Rachford splitting method for optimization problems involving hypoconvex functions. J. Glob. Optim. 2018, 72, 431–441. [Google Scholar] [CrossRef]
  38. Wen, B.; Chen, X.J.; Pong, T.K. Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optimiz. 2017, 27, 124–145. [Google Scholar] [CrossRef]
  39. Wu, Z.M.; Li, M. General inertial proximal gradient method for a class of nonconvex nonsmooth optimization problems. Comput. Optim. Appl. 2019, 73, 129–158. [Google Scholar] [CrossRef]
  40. Nesterov, Y. Introductory Lectures on Convex Optimization: A Basic Course; Springer: New York, NY, USA, 2013. [Google Scholar] [CrossRef]
  41. Harper, F.M.; Konstan, J.A. The MovieLens Datasets: History and Context. ACM TiiS 2015, 5, 1–19. [Google Scholar] [CrossRef]
  42. Li, S.; Li, Q.W.; Zhu, Z.H.; Tang, G.G.; Wakin, M.B. The global geometry of centralized and distributed low-rank matrix recovery without regularization. IEEE Signal Proc. Let. 2020, 27, 1400–1404. [Google Scholar] [CrossRef]
  43. Doostmohammadian, M.; Gabidullina, Z.R.; Rabiee, H.R. Nonlinear perturbation-based non-convex optimization over time-varying networks. IEEE Trans. Netw. Sci. Eng. 2024, 11, 6461–6469. [Google Scholar] [CrossRef]
Figure 1. Evolution of the error value with respect to the CPU time.
Figure 2. The list of pictures in order: Boat, Man, City Wall and Spillikins. First row: original images, second row: noisy images.
Figure 3. The list of pictures in order: Bottles, Texture, House and Clock. First row: original images, second row: noisy images.
Figure 4. Evolution of SNR values of Boat, Man, City Wall and Spillikins with respect to the CPU time.
Figure 5. Evolution of SNR values of Bottles, Texture, House and Clock with respect to the CPU time.
Table 1. Numerical results of tested algorithms with Boat, Man, City Wall, Spillikins, Bottles, Texture, House and Clock.

| Image | APIRNN (Iter./Time/SNR) | PIRNNE (Iter./Time/SNR) | PIRNNE-mLS (Iter./Time/SNR) | PIRNNE-nLS (Iter./Time/SNR) |
|---|---|---|---|---|
| Boat | 121 / 3.94 / 19.49 | 51 / 2.87 / 23.96 | 43 / 1.90 / 26.23 | 43 / 1.90 / 26.69 |
| Man | 117 / 30.02 / 23.29 | 56 / 23.90 / 26.64 | 45 / 15.61 / 26.35 | 44 / 15.41 / 26.79 |
| City Wall | 56 / 1.43 / 17.88 | 40 / 1.61 / 19.08 | 38 / 1.05 / 19.74 | 35 / 0.96 / 20.05 |
| Spillikins | 68 / 1.57 / 20.05 | 59 / 1.96 / 22.39 | 34 / 1.03 / 23.08 | 33 / 1.01 / 23.09 |
| Bottles | 66 / 4.76 / 21.92 | 56 / 6.34 / 22.26 | 46 / 3.10 / 22.50 | 38 / 2.98 / 22.74 |
| Texture | 57 / 4.72 / 19.81 | 52 / 6.09 / 19.55 | 42 / 3.13 / 20.94 | 35 / 2.80 / 21.95 |
| House | 109 / 1.17 / 21.96 | 114 / 2.74 / 21.37 | 39 / 0.58 / 23.94 | 38 / 0.55 / 24.76 |
| Clock | 229 / 12.10 / 20.43 | 155 / 17.36 / 24.92 | 55 / 5.01 / 24.28 | 47 / 3.30 / 27.99 |
Table 2. Dataset descriptions. The number of users, items and ratings used in each dataset.

| Dataset | Users | Movies | Ratings |
|---|---|---|---|
| MovieLens 100 K | 943 | 1682 | 100,000 |
| MovieLens 1 M | 6040 | 3449 | 999,714 |
| MovieLens 10 M | 69,878 | 10,677 | 10,000,054 |
Table 3. Comparative results of tested algorithms on the MovieLens dataset subsets.

| Dataset | Method | Time | RMSE | Objective Value |
|---|---|---|---|---|
| MovieLens 100 K | APIRNN | 2.06 | 1.0410 | 6.4081 × 10² |
| MovieLens 100 K | PIRNNE | 1.11 | 1.0216 | 7.8860 × 10² |
| MovieLens 100 K | PIRNNE-mLS | 2.00 | 1.0468 | 5.9634 × 10² |
| MovieLens 100 K | PIRNNE-nLS | 1.98 | 1.0450 | 6.1770 × 10² |
| MovieLens 1 M | APIRNN | 6.24 | 0.8855 | 1.1575 × 10⁵ |
| MovieLens 1 M | PIRNNE | 7.88 | 1.0343 | 1.2641 × 10⁵ |
| MovieLens 1 M | PIRNNE-mLS | 5.23 | 0.8844 | 1.0311 × 10⁵ |
| MovieLens 1 M | PIRNNE-nLS | 5.22 | 0.8844 | 1.0311 × 10⁵ |
| MovieLens 10 M | APIRNN | 28.23 | 0.9483 | 1.9444 × 10⁶ |
| MovieLens 10 M | PIRNNE | 238.86 | 1.0063 | 2.1513 × 10⁶ |
| MovieLens 10 M | PIRNNE-mLS | 14.49 | 0.9483 | 1.9435 × 10⁶ |
| MovieLens 10 M | PIRNNE-nLS | 13.78 | 0.9483 | 1.9435 × 10⁶ |
