Article

The Extended Second APG Method for Constrained DC Problems

Department of Mathematics, Jinan University, Guangzhou 510632, China
* Author to whom correspondence should be addressed.
Submission received: 29 October 2025 / Revised: 6 December 2025 / Accepted: 22 December 2025 / Published: 24 December 2025
(This article belongs to the Special Issue The Numerical Analysis and Its Application, 2nd Edition)

Abstract

In this paper, we develop the extended proximal gradient algorithm with Nesterov's second acceleration ($\mathrm{EAPG}_s$) for constrained difference-of-convex (DC) optimization problems. $\mathrm{EAPG}_s$ has two key links to existing methods: it extends $\mathrm{APG}_s$ (for unconstrained DC problems) by adopting the constraint-handling idea from Auslender's ESQM, and it serves as a variant of $\mathrm{ESQM}_e$ with extrapolation replaced by Nesterov's second acceleration. Under basic assumptions, we establish the subsequential convergence of $\mathrm{EAPG}_s$. By introducing a restart technique and leveraging the Kurdyka–Łojasiewicz (KL) property of a suitable potential function, we further prove its global convergence and analyze its convergence rate, under weaker conditions than those required for $\mathrm{APG}_s$. Additionally, we propose $\mathrm{EAPG}_s^r$ by adding practical restart criteria to $\mathrm{EAPG}_s$. Numerical experiments verify the efficiency of these criteria and show that $\mathrm{EAPG}_s^r$ performs well against state-of-the-art methods for constrained and unconstrained DC problems.

1. Introduction

In this paper, we discuss the following difference-of-convex (DC) optimization problem with smooth inequality constraints and simple geometric constraints:
$$\min_{x \in \mathbb{R}^n} \; F(x) := f(x) + P_1(x) - P_2(x) \quad \text{s.t.} \quad g_i(x) \le 0, \; i = 1, \ldots, m, \quad x \in C, \tag{1}$$
where all functions are real-valued and defined on $\mathbb{R}^n$. Specifically, $f$ and $g_i$ ($i = 1, \ldots, m$) are differentiable; $P_1$ and $P_2$ are convex, with $P_1$ admitting an efficiently computable proximal operator; $C \subseteq \mathbb{R}^n$ is a nonempty closed convex set; and the feasible set $C \cap \mathcal{F}$ is nonempty, where $\mathcal{F} := \{x \in \mathbb{R}^n : g_i(x) \le 0,\ i = 1, \ldots, m\}$.
Problem (1) is one of the three types of DC problems summarized in [1]; it is widely applied in fields such as joint chance constrained programs [2], multicast network design problems [3], sparsity constrained optimization problems [4], and bilevel optimization problems [5]. As a result, it has garnered significant attention from researchers (see [1,6,7] and the references therein). Problem (1) encompasses two commonly used and extensively studied special cases:
  • Unconstrained DC problems, discussed in [1,8,9,10,11,12,13,14,15] and related references:
    $$\min_{x \in \mathbb{R}^n} \; F(x) := f(x) + P_1(x) - P_2(x). \tag{2}$$
    This corresponds to setting $g_i \equiv 0$ (for all $i$) and $C = \mathbb{R}^n$, while retaining the original assumptions on $f$, $P_1$, and $P_2$.
  • DC problems with f 0 , analyzed in [16,17,18,19] and references therein:
    $$\min_{x \in \mathbb{R}^n} \; P_1(x) - P_2(x) \quad \text{s.t.} \quad g_i(x) \le 0, \; i = 1, \ldots, m, \quad x \in C. \tag{3}$$
Both Problems (2) and (3) arise naturally in numerous applications. For instance, Yin et al. [17] show that compressed sensing problems can be formulated as Problem (3): there, $P_1 - P_2$ acts as a sparsity-inducing regularizer (e.g., the difference of the $\ell_1$- and $\ell_2$-norms), and the single constraint bounds the estimation error. Alternatively, the same compressed sensing problem can be cast as Problem (2), where $P_1 - P_2$ remains unchanged and $f$ serves as a penalty term for the original constraint.
Notably, while Problems (2) and (3) often model the same applications, they differ in key trade-offs: Problem (3) enforces (approximate) satisfaction of constraints, but developing constraint-aware algorithms is more challenging; Problem (2) benefits from a richer set of unconstrained solvers, yet its solutions may violate the original constraints—leading to larger deviations from the true optimal solution. Given these trade-offs, extending successful methods for Problem (2) to Problems (1) and (3) is of significant value.
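To make the DC structure above concrete, the two building blocks that proximal-type methods for these problems rely on are a cheap proximal operator for $P_1$ and a closed-form subgradient of $P_2$. A minimal NumPy sketch for the $\ell_1$-$\ell_2$ pair mentioned above (function names are ours, not from the cited works):

```python
import numpy as np

def prox_l1(v, t):
    # Proximal operator of t*||.||_1: componentwise soft-thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def subgrad_l2(x):
    # A subgradient of ||.||_2: x/||x|| when x != 0, and 0 at the origin.
    n = np.linalg.norm(x)
    return x / n if n > 0 else np.zeros_like(x)
```

With $P_1 = \lambda\|\cdot\|_1$ and $P_2 = \|\cdot\|_2$, each DC iteration only needs these two primitives, which is what makes the formulations above computationally attractive.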
DC programming and DC algorithms were first introduced by Pham Dinh [15] (for a comprehensive survey of results before 2015, see [1] and references therein). The core idea of DC algorithms (DCA) is to approximate the concave component of the DC function with an affine function in each iteration, yielding a convex subproblem that can be solved via existing convex optimization techniques. Subsequent challenges in DCA research include improving algorithm efficiency, conducting convergence analyses, and deriving convergence rate guarantees.
Among DC algorithms, line search methods are fundamental (see [20] for a categorized overview). These methods typically do not require strict conditions on f, g 1 , , g m (e.g., differentiability) or P 1 (e.g., efficient proximal computation), making them applicable to a broad range of DC problems (including those with fewer restrictions than Problem (1)). However, this generality comes at a cost: line search methods cannot leverage favorable properties of functions that arise in practical applications, limiting their potential efficiency.
For Problem (2) in the case where $f$ has a Lipschitz-continuous gradient, integrating proximal mappings and Nesterov's acceleration techniques yields notable performance improvements. Proximal gradient methods and acceleration strategies were originally developed for convex optimization (see [21] for a detailed review). Examples include FISTA [22] (a proximal gradient method with Nesterov's first acceleration) and IGA [23] (a proximal gradient method with Nesterov's second acceleration). Subsequent extensions to DC problems, such as $\mathrm{pDCA}_e$ [24] (extending FISTA) and $\mathrm{APG}_s$ [25,26] (extending IGA), have demonstrated the effectiveness of these convex optimization ideas in the DC setting. The main iteration of IGA generates
$$
\begin{aligned}
y^k &= \theta_k z^k + (1 - \theta_k) x^k,\\
z^{k+1} &= \operatorname*{argmin}_{z \in \mathbb{R}^n} \Big\{ P_1(z) + \langle \nabla f(y^k), z \rangle + \frac{\theta_k L_f}{2} \|z - z^k\|^2 \Big\},\\
x^{k+1} &= \theta_k z^{k+1} + (1 - \theta_k) x^k
\end{aligned} \tag{4}
$$
for the convex version of Problem (2) (i.e., when $f$ is convex and $P_2 = 0$), where $\{\theta_k\}$ is the sequence of acceleration parameters and $L_f$ denotes the Lipschitz constant of $\nabla f$. For the general case of Problem (2), $\mathrm{APG}_s$ reformulates the above subproblem by subtracting a subgradient $\xi^k \in \partial P_2(x^k)$ from $\nabla f(y^k)$.
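A runnable sketch of iteration (4), specialized to $P_1 = \lambda\|\cdot\|_1$ so that the subproblem reduces to soft-thresholding (the helper names and the test problem are our own illustration, not code from [23]):

```python
import numpy as np

def soft(v, t):
    # Proximal operator of t*||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def iga(grad_f, L_f, lam, x0, iters=1000):
    # Sketch of iteration (4) for min f(x) + lam*||x||_1 with convex f.
    # The z-update is z^{k+1} = prox_{P1/(theta_k L_f)}(z^k - grad f(y^k)/(theta_k L_f)).
    x = x0.copy()
    z = x0.copy()
    theta = 1.0
    for _ in range(iters):
        y = theta * z + (1 - theta) * x
        step = 1.0 / (theta * L_f)
        z = soft(z - step * grad_f(y), lam * step)
        x = theta * z + (1 - theta) * x
        # Nesterov's classical parameter update (see Remark 1 below).
        theta = (np.sqrt(theta**4 + 4 * theta**2) - theta**2) / 2
    return x
```

For $f(x) = \frac{1}{2}\|x - c\|^2$ the minimizer is the soft-thresholding of $c$, which gives a quick sanity check of the implementation.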
For Problem (1), Lu proposed a sequential convex programming (SCP) method in [27], where each iteration is obtained by solving a constrained convex programming problem. It was shown that any accumulation point of the sequence generated by SCP is a stationary point under Slater's condition. However, the convergence and convergence rate of the entire generated sequence remain unknown. Recently, Yu et al. further studied the SCP method with monotone line search (denoted $\mathrm{SCP}_{ls}$) in [28], successfully establishing global convergence guarantees for the proposed algorithm and quantitatively estimating its convergence rate.
Furthermore, for nonlinear programming problems where $C$ is the entire space, Sequential Quadratic Programming (SQP) (see [16,29]) is one of the most successful methods. The SQP algorithm solves a subproblem involving linear inequalities at each iteration. Later, attention turned to modifying SQP algorithms by constructing Sequential Quadratically Constrained Quadratic Programming (SQCQP) methods (see [30,31]), where each iteration solves a subproblem with convex quadratic inequalities. However, Solodov pointed out in [32] that a major drawback of SQP is that its global convergence statements rely on the boundedness of the primal sequence generated by the algorithm, an assumption that is not easily justified. To address this, he introduced a safeguard into the line search procedure, developing an SQP method whose primal sequence is provably bounded when the feasible set is bounded and each $g_i$ is convex. This drawback also persists in SQCQP methods.
In [18], Auslender proposed the Extended Sequential Quadratic Method (ESQM) to overcome this critical limitation of existing SQP and SQCQP methods. ESQM achieves global convergence without such boundedness assumptions, enhancing its versatility for a wider range of optimization problems. To improve the convergence rate, Zhang et al. developed a variant of ESQM (denoted $\mathrm{ESQM}_e$ [19]) that incorporates Nesterov's extrapolation technique, achieving empirical acceleration for Problem (3).
Building on the proven effectiveness of $\mathrm{ESQM}_e$ for constrained DC optimization and the well-established advantages of $\mathrm{APG}_s$ for solving Problem (2), this paper extends $\mathrm{APG}_s$ to solve (1) by adopting the constraint handling strategy from Auslender's ESQM. The resulting algorithm is termed the extended proximal gradient algorithm with Nesterov's second acceleration technique ($\mathrm{EAPG}_s$). This algorithm also serves as a variant of $\mathrm{ESQM}_e$, where the extrapolation step is replaced with Nesterov's second acceleration.
For $\mathrm{EAPG}_s$, we can prove subsequential convergence under basic assumptions. However, analyzing its global convergence and convergence rate faces a key obstacle: the lack of information about the subdifferential properties of $F$ along the sequence $\{x^k\}$ generated by $\mathrm{EAPG}_s$. For $\mathrm{APG}_s$ applied to Problem (2), this obstacle is circumvented by assuming $P_1$ is Lipschitz differentiable, though this condition is rarely satisfied in practical applications.
For Problem (1), the presence of constraints introduces non-differentiable components in the subproblem, making this obstacle insurmountable and posing a significant challenge. To overcome this, inspired by the restart technique introduced by O'Donoghue and Candès [33], we integrate this technique into $\mathrm{EAPG}_s$ and find it effective for both theoretical analysis and practical computation. Theoretically, we construct a suitable potential function and assume it satisfies the Kurdyka–Łojasiewicz (KL) property, along with additional differentiability conditions for each $g_i$ in (1), thereby establishing the convergence of the entire sequence and its convergence rate. Practically, we introduce efficient restart criteria to develop a practical variant ($\mathrm{EAPG}_s^r$), which is validated through numerical experiments.
The remainder of this paper is structured as follows: Section 2 presents preliminary concepts and mathematical foundations essential to our analysis. Section 3 formally introduces the $\mathrm{EAPG}_s$ algorithm. Section 4 establishes its subsequential convergence properties. Section 5 introduces the theoretical variant of $\mathrm{EAPG}_s$ with a restart technique, proves its global convergence, estimates its convergence rate, and presents the practical variant $\mathrm{EAPG}_s^r$. Finally, Section 6 demonstrates the practical performance of $\mathrm{EAPG}_s^r$ through comprehensive numerical experiments.

2. Notation and Preliminaries

In this paper, we use the following standard notation:
  • $\mathbb{R}$ and $\mathbb{R}_+$ denote the sets of real numbers and nonnegative real numbers, respectively.
  • $\mathbb{R}^n$ and $\mathbb{R}^n_+$ denote the $n$-dimensional Euclidean space and its nonnegative orthant, respectively.
  • $\mathbb{N}$ denotes the set of positive integers.
  • For $x \in \mathbb{R}$, $x_+ := \max\{x, 0\}$.
  • For $p \ge 1$, $\|\cdot\|_p$ denotes the $\ell_p$-norm on $\mathbb{R}^n$; in particular, $\|\cdot\|$ is used exclusively to represent the $\ell_2$-norm $\|\cdot\|_2$.
  • For $x, y \in \mathbb{R}^n$, $\langle x, y \rangle$ denotes their inner product.
  • Given a nonempty set $D \subseteq \mathbb{R}^n$, the distance from $x \in \mathbb{R}^n$ to $D$ is defined as $\operatorname{dist}(x, D) := \inf\{\|x - z\| \mid z \in D\}$.
For extended real-valued functions f : R n ( , + ] , we adopt the following definitions:
1. A function $f$ is proper if its domain $\operatorname{dom} f := \{x \mid f(x) < +\infty\}$ is nonempty.
2. A proper function $f$ is closed if it is lower semicontinuous at every $x \in \mathbb{R}^n$, i.e., $f(x) \le \liminf_{z \to x} f(z)$.
3. A proper closed function $f$ is level bounded if its lower level sets $\{x \in \mathbb{R}^n \mid f(x) \le a\}$ are bounded for every $a \in \mathbb{R}$.
4. For a sequence $\{x^k\} \subseteq \mathbb{R}^n$, $x^k \xrightarrow{f} x$ (as $k \to \infty$) means $x^k \to x$ (in $\mathbb{R}^n$) and $f(x^k) \to f(x)$.
Definition 1
([34] Definition 8.3). For a proper closed function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$, the regular subdifferential of $f$ at $x \in \operatorname{dom} f$ is defined by
$$\hat\partial f(x) := \Big\{ \hat x \in \mathbb{R}^n : \liminf_{z \to x,\, z \ne x} \frac{f(z) - f(x) - \langle \hat x, z - x \rangle}{\|z - x\|} \ge 0 \Big\}. \tag{5}$$
The (general) subdifferential of $f$ at $x \in \operatorname{dom} f$ is defined by
$$\partial f(x) := \Big\{ \hat x : \exists\, x^k \xrightarrow{f} x,\ \hat x^k \to \hat x \text{ with } \hat x^k \in \hat\partial f(x^k) \text{ for each } k \Big\}, \tag{6}$$
and we write $\operatorname{dom} \partial f := \{x : \partial f(x) \ne \emptyset\}$.
Note that if $f$ is convex, then the general subdifferential and regular subdifferential of $f$ at $x \in \operatorname{dom} f$ reduce to the (classical) subdifferential ([34] Proposition 8.12), which is given by
$$\partial f(x) = \{\hat x : f(y) \ge f(x) + \langle \hat x, y - x \rangle \ \ \forall y \in \mathbb{R}^n\}. \tag{7}$$
For a nonempty closed set $D \subseteq \mathbb{R}^n$, the indicator function $\delta_D$ is defined by
$$\delta_D(x) = \begin{cases} 0 & x \in D,\\ +\infty & x \notin D. \end{cases} \tag{8}$$
The normal cone of $D$ at $x \in D$ is defined by $N_D(x) := \partial \delta_D(x)$.
Next, we recall several key definitions that will be used in the subsequent analysis. First, we introduce the constraint qualification for problem (1)—which was also adopted in [18,19]—followed by the (associated) first-order optimality conditions for (1).
Definition 2
(RCQ). We say that the Robinson constraint qualification holds at $x \in \mathbb{R}^n$ for (1) if the following statement holds:
$$\mathrm{RCQ}(x): \quad \exists\, y \in C \ \text{such that} \ g_i(x) + \langle \nabla g_i(x), y - x \rangle < 0 \quad \forall i = 1, \ldots, m. \tag{9}$$
Definition 3
(Critical point). For (1), we say that $x$ is a critical point of (1) if $x \in C$ and there exists $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_m) \in \mathbb{R}^m_+$ such that $(x, \lambda)$ satisfies the following conditions:
(i) 
$g_i(x) \le 0 \ \ \forall i = 1, \ldots, m$,
(ii) 
$\lambda_i g_i(x) = 0 \ \ \forall i = 1, \ldots, m$,
(iii) 
$0 \in \nabla f(x) + \partial P_1(x) - \partial P_2(x) + \sum_{i=1}^m \lambda_i \nabla g_i(x) + N_C(x)$.
Using arguments analogous to those in ([28] Section 2), one can show the following: if the RCQ(x) holds for all x C F , then every local minimizer of (1) is a critical point of (1)—provided that Assumption 1 (presented in the next section) holds and P 1 is continuous at x.
It is further straightforward to verify that if g 1 , , g m are convex and the Slater condition is satisfied (i.e., there exists x ˜ C such that g i ( x ˜ ) < 0 for all i = 1 , 2 , , m ), then RCQ(x) holds for all x C .
Numerous functions are known to satisfy the Kurdyka–Łojasiewicz (KL) property. For example, proper closed semi-algebraic functions satisfy the KL property with some exponent β [ 0 , 1 ) (see [35]). The KL property plays a crucial role in the convergence analysis of many first-order methods, and its exponent is particularly significant for establishing convergence rates (for further details, refer to [25,26,36,37,38,39,40,41,42] and the references therein).
First, for $\eta > 0$, we define $\Theta_\eta$ as the class of all continuous concave functions $\varphi : [0, \eta) \to [0, +\infty)$ satisfying $\varphi(0) = 0$, where $\varphi$ is continuously differentiable on $(0, \eta)$ with $\varphi' > 0$ (see ([43] Section 2)).
Definition 4
((KL property and KL function) ([43] Section 2)). Let $h : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper closed function.
(i) 
For $\tilde x \in \operatorname{dom} \partial h := \{x \in \mathbb{R}^n : \partial h(x) \ne \emptyset\}$, if there exist a neighborhood $O$ of $\tilde x$, $\eta \in (0, +\infty]$ and a function $\varphi \in \Theta_\eta$ such that for all $x \in O \cap \{x \in \mathbb{R}^n : h(\tilde x) < h(x) < h(\tilde x) + \eta\}$, it holds that
$$\varphi'\big(h(x) - h(\tilde x)\big) \operatorname{dist}\big(0, \partial h(x)\big) \ge 1, \tag{10}$$
then $h$ is said to have the Kurdyka–Łojasiewicz (KL) property at $\tilde x$.
(ii) 
If h satisfies the KL property at each point of dom h , then h is called a KL function.
Definition 5
((KL exponent) ([43] Section 2)). Suppose that $h : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a proper closed function satisfying the KL property at $\tilde x \in \operatorname{dom} \partial h$ with $\varphi(s) = \rho s^{1-\beta}$ for some $\rho > 0$ and $\beta \in [0, 1)$. Then $h$ is said to have the KL property at $\tilde x$ with exponent $\beta$. If $h$ is a KL function and has the same exponent $\beta$ at every $\tilde x \in \operatorname{dom} \partial h$, then $h$ is said to be a KL function with exponent $\beta$.
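As a standard illustration of Definition 5 (included here for intuition, not taken from the cited references), consider $h(x) = |x|^p$ on $\mathbb{R}$ with $p > 1$ at $\tilde x = 0$. With $\varphi(s) = \rho s^{1-\beta}$,

$$
\operatorname{dist}\big(0, \partial h(x)\big) = p|x|^{p-1} \ \ (x \ne 0),
\qquad
\varphi'\big(h(x) - h(0)\big)\operatorname{dist}\big(0, \partial h(x)\big) = \rho(1-\beta)\, p\, |x|^{\,p - 1 - p\beta},
$$

which stays bounded below by 1 near $0$ precisely when $p - 1 - p\beta \le 0$, i.e., $\beta \ge 1 - 1/p$. Hence $h$ has the KL property at $0$ with exponent $\beta = 1 - 1/p$ (in particular, $\beta = 1/2$ for $p = 2$).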
Lemma 1
((Uniformized KL property) ([39] Lemma 6)). Suppose that $h : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a proper closed function and $\Gamma$ is a compact set. If $h \equiv \zeta$ on $\Gamma$ for some constant $\zeta$ and $h$ satisfies the KL property at each point of $\Gamma$, then there exist $\varepsilon > 0$, $\eta > 0$ and $\varphi \in \Theta_\eta$ such that
$$\varphi'\big(h(x) - \zeta\big) \operatorname{dist}\big(0, \partial h(x)\big) \ge 1$$
for all $x \in \{x \in \mathbb{R}^n : \operatorname{dist}(x, \Gamma) < \varepsilon\} \cap \{x \in \mathbb{R}^n : \zeta < h(x) < \zeta + \eta\}$.

3. Algorithmic Framework

From this section to Section 5, we always suppose that the following conditions are fulfilled.
Assumption 1.
(i) 
$f$ is a differentiable (possibly nonconvex) function whose gradient $\nabla f$ is Lipschitz continuous with Lipschitz constant $L_f \ge 0$, and $l_f \in [0, L_f]$ is such that $f(\cdot) + \frac{l_f}{2}\|\cdot\|^2$ is convex.
(ii) 
All $g_i$ ($i = 1, 2, \ldots, m$) are differentiable functions with Lipschitz continuous gradients. We use $L_g$ to denote the common Lipschitz continuity modulus of $\nabla g_1, \ldots, \nabla g_m$, and let $l_g \in [0, L_g]$ be such that $g_i(\cdot) + \frac{l_g}{2}\|\cdot\|^2$ is convex for all $i = 1, 2, \ldots, m$.
(iii) 
At least one of $L_f$ and $L_g$ is positive.
(iv) 
The function $P_1$ is proper, convex and lower semicontinuous; the function $P_2$ is continuous and convex.
(v) 
Either (a) $C$ is compact, or (b) all $g_i \equiv 0$ and $F$ is level bounded.
The above assumptions all constitute basic conditions for DC problems. The Lipschitz continuity of gradients is a requirement of nearly all proximal-gradient-type algorithms (see, e.g., [19,21,23,25,26,42]). Since we can always select a larger Lipschitz modulus, condition (iii) is easily satisfied. Finally, level boundedness frequently arises in the context of problem (2) (see, e.g., [25,26,36,42]), whereas the compactness of $C$ is required in [19]; either condition in (v) guarantees the existence of an optimal solution to problem (1).
The algorithm we study in this paper is presented as Algorithm 1 below; here and throughout, for notational simplicity, for each $u, w \in \mathbb{R}^n$, we define
$$\operatorname{lin} g_i(u, w) := g_i(w) + \langle \nabla g_i(w), u - w \rangle, \quad i = 1, \ldots, m, \tag{11}$$
$$g_0 :\equiv 0 \quad (\text{which implies that } \operatorname{lin} g_0(u, w) \equiv 0), \tag{12}$$
$$\Psi(u, w) := \max_{i = 1, \ldots, m} \big[\operatorname{lin} g_i(u, w)\big]_+ = \max_{i = 0, 1, \ldots, m} \big\{\operatorname{lin} g_i(u, w)\big\}. \tag{13}$$
Algorithm 1 $\mathrm{EAPG}_s$ for solving Problem (1)
Initialization: Choose $\{\theta_k\} \subseteq (0, 1]$, $d > 0$, $\alpha_0 > 0$, $x^0, z^0 \in C$.
For $k = 0, 1, \ldots$, take $\xi^k \in \partial P_2(x^k)$, and compute
$$y^k = \theta_k z^k + (1 - \theta_k) x^k, \tag{14}$$
$$z^{k+1} = \operatorname*{argmin}_{z \in C} \Big\{ E(z) := P_1(z) + \langle \nabla f(y^k) - \xi^k, z \rangle + \alpha_k \Psi(z, y^k) + \frac{\theta_k(\alpha_k L_g + L_f)}{2} \|z - z^k\|^2 \Big\}, \tag{15}$$
$$x^{k+1} = \theta_k z^{k+1} + (1 - \theta_k) x^k, \tag{16}$$
$$\alpha_{k+1} = \begin{cases} \alpha_k, & \text{if } \Psi(z^{k+1}, y^k) \le 0,\\ \alpha_k + d, & \text{otherwise}. \end{cases} \tag{17}$$
End for
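For intuition, Algorithm 1 can be exercised on a toy instance where every step is available in closed form. The sketch below (our own illustration, not the authors' code) takes $P_1 = P_2 = 0$, $C = \mathbb{R}^n$, and a single linear constraint $g(x) = a^\top x - b \le 0$, so that $L_g = 0$, $\xi^k = 0$, $\operatorname{lin} g(z, y) = a^\top z - b$ for every $y$, and the subproblem reduces to a proximal step on the hinge term that admits a three-case analysis:

```python
import numpy as np

def eapg_s(grad_f, L_f, a, b, x0, alpha0=1.0, d=1.0, theta=0.5, iters=100):
    # Sketch of Algorithm 1 for P1 = P2 = 0, C = R^n, one linear constraint
    # a.x - b <= 0.  The subproblem becomes
    #   min_z <grad f(y^k), z> + alpha_k*[a.z - b]_+ + (c/2)||z - z^k||^2,
    # with c = theta*(alpha_k*L_g + L_f) = theta*L_f, solved in closed form.
    x = x0.copy()
    z = x0.copy()
    alpha = alpha0
    aa = a @ a
    for _ in range(iters):
        y = theta * z + (1 - theta) * x          # update (14)
        c = theta * L_f
        v = z - grad_f(y) / c                    # unconstrained prox point
        r = a @ v - b
        if r <= 0:                               # hinge inactive
            z_new = v
        elif r >= (alpha / c) * aa:              # hinge fully active
            z_new = v - (alpha / c) * a
        else:                                    # kink: land on a.z = b
            z_new = v - (r / aa) * a
        x = theta * z_new + (1 - theta) * x      # update (16)
        if max(a @ z_new - b, 0.0) > 0:          # update (17): Psi > 0
            alpha += d
        z = z_new
    return x
```

On the instance $f(x) = \frac{1}{2}\|x - (2, 2)\|^2$ with constraint $x_1 \le 1$, the iterates settle on the constrained minimizer $(1, 2)$ after only a few steps, and $\alpha_k$ stops increasing once the linearized constraint is satisfied.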
We refer to our algorithm as the extended proximal gradient algorithm with Nesterov's second acceleration technique ($\mathrm{EAPG}_s$), where "Nesterov's second acceleration technique" corresponds to Equations (14) and (16). To guarantee the convergence properties of Algorithm 1 in Section 4, we impose the following assumption on the acceleration parameters $\{\theta_k\}$:
Assumption 2.
With $\inf_k \theta_k > 0$, there exists a constant $\delta \in (0, 1)$ satisfying
$$(L_g + l_g)\gamma_k^2 \le L_g(1 - \delta), \tag{18}$$
$$(L_f + l_f)\gamma_k^2 \le L_f(1 - \delta), \tag{19}$$
where $\gamma_k := \dfrac{\theta_k(1 - \theta_{k-1})}{\theta_{k-1}}$ for $k \ge 1$.
Remark 1.
We can provide concrete examples for selecting the acceleration parameters $\{\theta_k\}$ and the constant $\delta$. Define
$$\tau_f := \begin{cases} \dfrac{L_f + l_f}{L_f}, & \text{if } L_f > 0,\\ 1, & \text{if } L_f = 0, \end{cases} \qquad \tau_g := \begin{cases} \dfrac{L_g + l_g}{L_g}, & \text{if } L_g > 0,\\ 1, & \text{if } L_g = 0, \end{cases}$$
and let $\tau := \max\{\tau_f, \tau_g\}$.
(i) 
Constant Acceleration Parameters: Set $\theta_k \equiv \theta$ for some constant $\theta \in (0, 1]$ satisfying $(1 - \theta)^2 < \tau^{-1}$, and choose the constant $\delta = 1 - \tau(1 - \theta)^2$.
(ii) 
Variable Acceleration Parameters: For a preselected positive integer $K$, let $\theta_k = \vartheta_k$ for all $k < K$, and $\theta_k = \vartheta_K$ for all $k \ge K$. Here, $\{\vartheta_k\}$ is the classical parameter sequence introduced by Nesterov (see [21,44]), where
$$\vartheta_0 = 1, \qquad \vartheta_{k+1} = \frac{\sqrt{\vartheta_k^4 + 4\vartheta_k^2} - \vartheta_k^2}{2}.$$
The integer $K$ is chosen such that $(1 - \vartheta_K)^2 < \tau^{-1}$, and we let $\delta := 1 - \tau(1 - \vartheta_K)^2$. Noticing that the sequence $\{\vartheta_k\}$ is decreasing (as shown in [21]), it follows that $\{\theta_k\}$ is also decreasing. Additionally, we have
$$1 - \tau\gamma_k^2 \ge 1 - \tau(1 - \theta_{k-1})^2 \ge \delta,$$
which establishes the desired inequalities (18) and (19).
The subproblem in (15) admits a unique solution, as it involves a strongly convex objective function over a nonempty closed convex feasible set. An iterative solver is generally required for this subproblem; we refer readers to ([45] Appendix A) for an efficient routine that solves the subproblem in (15) in the special case where $m = 1$ and $P_1$ takes certain forms.
The following lemmas are useful for proving results in subsequent sections. First, the conclusions of Lemma 2 are identical to their counterparts in ([19] Lemma 3.1), with only minor differences in parameter specifications.
Lemma 2.
Suppose that the sequence $\{z^k\} \subseteq C$ is generated by Algorithm 1. Then the following statements hold:
(i) 
Problem (15) has a unique solution.
(ii) 
$z^{k+1}$ is the minimizer of the subproblem in (15) if and only if there exist $\lambda_i^k \ge 0$ for all $i \in I_k(z^{k+1})$ such that $\sum_{i \in I_k(z^{k+1})} \lambda_i^k = 1$ and
$$0 \in \partial P_1(z^{k+1}) + \nabla f(y^k) - \xi^k + \alpha_k \sum_{i \in I_k(z^{k+1})} \lambda_i^k \nabla g_i(y^k) + \theta_k(\alpha_k L_g + L_f)(z^{k+1} - z^k) + N_C(z^{k+1}),$$
where
$$I_k(z) := \Big\{ \iota \in \{0, 1, \ldots, m\} : \operatorname{lin} g_\iota(z, y^k) = \Psi(z, y^k) \Big\}. \tag{20}$$
The next lemma is based on Equation (4.2) in [19].
Lemma 3.
For any $x, y, y' \in \mathbb{R}^n$,
$$\Psi(x, y) \le \Psi(x, y') + \frac{L_g}{2}\|x - y'\|^2 + \frac{l_g}{2}\|x - y\|^2.$$
The inequality specified in Lemma 4 has appeared in the proofs of several previous works (e.g., (26) in [26]); here, we provide a concise proof.
Lemma 4.
For any $x, x', y \in \mathbb{R}^n$,
$$f(x) - f(x') \le \langle \nabla f(y), x - x' \rangle + \frac{L_f}{2}\|x - y\|^2 + \frac{l_f}{2}\|x' - y\|^2. \tag{21}$$
Proof. 
By the Lipschitz continuity of $\nabla f$ and ([44] Lemma 1.2.3), we have
$$f(x) \le f(y) + \langle \nabla f(y), x - y \rangle + \frac{L_f}{2}\|x - y\|^2. \tag{22}$$
On the other hand, since $f(\cdot) + \frac{l_f}{2}\|\cdot - y\|^2$ is a convex function with gradient $\nabla f(y)$ at $y$, the following inequality holds:
$$f(x') + \frac{l_f}{2}\|x' - y\|^2 \ge f(y) + \langle \nabla f(y), x' - y \rangle. \tag{23}$$
Then (21) follows from (22) and (23).    □
To simplify our discussion in Section 4, Section 5 and Section 6, we denote
$$\Delta_k := x^k - x^{k-1}, \quad k = 1, 2, \ldots \tag{24}$$
Based on (14) and (16), we have
$$z^{k+1} - x^k = \frac{1}{\theta_k}\Delta_{k+1}, \tag{25}$$
$$z^k - x^k = \frac{\gamma_k}{\theta_k}\Delta_k, \tag{26}$$
$$z^{k+1} - z^k = \frac{1}{\theta_k}(x^{k+1} - y^k) = \frac{1}{\theta_k}\Delta_{k+1} - \frac{\gamma_k}{\theta_k}\Delta_k, \tag{27}$$
$$y^k - x^k = \theta_k(z^k - x^k) = \gamma_k\Delta_k, \tag{28}$$
$$z^{k+1} - y^k = \frac{1}{\theta_k}\Delta_{k+1} - \gamma_k\Delta_k. \tag{29}$$
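The identities above follow by direct substitution of the update rules (14) and (16). A quick numerical check with random data (our own verification script, using $x^k = \theta_{k-1} z^k + (1 - \theta_{k-1}) x^{k-1}$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
theta_prev, theta = 0.7, 0.4
x_prev, z_k, z_next = rng.standard_normal((3, n))

# Build x^k, y^k, x^{k+1} from the update rules (14) and (16).
x_k = theta_prev * z_k + (1 - theta_prev) * x_prev
y_k = theta * z_k + (1 - theta) * x_k
x_next = theta * z_next + (1 - theta) * x_k

gamma = theta * (1 - theta_prev) / theta_prev
d_k, d_next = x_k - x_prev, x_next - x_k       # Delta_k, Delta_{k+1}

# Identities (25)-(29):
assert np.allclose(z_next - x_k, d_next / theta)
assert np.allclose(z_k - x_k, (gamma / theta) * d_k)
assert np.allclose(z_next - z_k, d_next / theta - (gamma / theta) * d_k)
assert np.allclose(y_k - x_k, gamma * d_k)
assert np.allclose(z_next - y_k, d_next / theta - gamma * d_k)
```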

4. Convergence Properties

In this section, we analyze the convergence properties of Algorithm 1. A central element of our analysis is the following auxiliary function:
$$Q(x, x', y, \alpha) = \alpha^{-1}\big[F(x) - \bar m\big] + \Psi(x, y) + \frac{L_g}{2}\|x - y\|^2 + \frac{1}{2}\big(L_g + \alpha^{-1}L_f\big)\|x - x'\|^2, \tag{30}$$
where
$$\bar m := \inf\{F(x) : x \in C\}. \tag{31}$$
Theorem 1
(Vanishing successive changes). Consider Problem (1) under Assumptions 1 and 2. Let $\{(x^k, y^k, z^k, \alpha_k)\}$ be generated by Algorithm 1. Then the following statements hold:
(i) 
For $k \ge 1$, it holds that
$$H_k - H_{k+1} \ge \frac{\delta}{2}\big(L_g + \alpha_k^{-1}L_f\big)\|\Delta_k\|^2, \tag{32}$$
where
$$H_k := Q(x^k, x^{k-1}, y^{k-1}, \alpha_k). \tag{33}$$
(ii) 
$\sum_{k=1}^{\infty} \|\Delta_k\|^2 < \infty$, $\lim_{k \to \infty} \Delta_k = 0$, and $\lim_{k \to \infty} \|z^{k+1} - x^k\| = \lim_{k \to \infty} \|x^k - y^k\| = \lim_{k \to \infty} \|z^{k+1} - z^k\| = \lim_{k \to \infty} \|z^{k+1} - y^k\| = 0$.
(iii) 
The sequence $\{x^k\} \subseteq C$ and is bounded.
Proof. 
(i)
Since $E(z)$ is a strongly convex function with modulus $\theta_k(\alpha_k L_g + L_f)$ and $z^{k+1}$ is its minimizer over the set $C$, the 3-Point Property ([46] Lemma 3.2) yields
$$E(z^{k+1}) \le E(x^k) - \frac{\theta_k(\alpha_k L_g + L_f)}{2}\|z^{k+1} - x^k\|^2, \tag{34}$$
which is equivalent to
$$P_1(z^{k+1}) \le P_1(x^k) + \langle \xi^k - \nabla f(y^k), z^{k+1} - x^k \rangle + \alpha_k\big[\Psi(x^k, y^k) - \Psi(z^{k+1}, y^k)\big] + \frac{\theta_k(\alpha_k L_g + L_f)}{2}\Big(\|x^k - z^k\|^2 - \|z^{k+1} - z^k\|^2 - \|z^{k+1} - x^k\|^2\Big). \tag{35}$$
Substituting the equalities (25)–(27) into the above inequality, we obtain
$$P_1(z^{k+1}) \le P_1(x^k) + \frac{1}{\theta_k}\langle \xi^k - \nabla f(y^k), \Delta_{k+1} \rangle + \alpha_k\big[\Psi(x^k, y^k) - \Psi(z^{k+1}, y^k)\big] + \frac{\theta_k(\alpha_k L_g + L_f)}{2}\Big(\frac{\gamma_k^2}{\theta_k^2}\|\Delta_k\|^2 - \frac{1}{\theta_k^2}\|x^{k+1} - y^k\|^2 - \frac{1}{\theta_k^2}\|\Delta_{k+1}\|^2\Big). \tag{36}$$
By virtue of (16) and the convexity of $P_1$, it follows that
$$P_1(x^{k+1}) \le P_1(x^k) + \theta_k\big[P_1(z^{k+1}) - P_1(x^k)\big] \le P_1(x^k) + \langle \xi^k, \Delta_{k+1} \rangle - \langle \nabla f(y^k), \Delta_{k+1} \rangle + \theta_k\alpha_k\big[\Psi(x^k, y^k) - \Psi(z^{k+1}, y^k)\big] + \frac{\alpha_k L_g + L_f}{2}\Big(\gamma_k^2\|\Delta_k\|^2 - \|x^{k+1} - y^k\|^2 - \|\Delta_{k+1}\|^2\Big). \tag{37}$$
Combining this result with two key inequalities:
$$-P_2(x^{k+1}) + P_2(x^k) \le -\langle \xi^k, \Delta_{k+1} \rangle, \tag{38}$$
which holds due to the convexity of $P_2$ and the fact that $\xi^k \in \partial P_2(x^k)$; and
$$f(x^{k+1}) - f(x^k) \le \langle \nabla f(y^k), x^{k+1} - x^k \rangle + \frac{L_f}{2}\|x^{k+1} - y^k\|^2 + \frac{l_f}{2}\|x^k - y^k\|^2, \tag{39}$$
which is derived from Lemma 4 by substituting $x, x', y$ with $x^{k+1}, x^k, y^k$, respectively, we arrive at
$$F(x^{k+1}) \le F(x^k) + \frac{L_f}{2}\|x^{k+1} - y^k\|^2 + \frac{l_f}{2}\|x^k - y^k\|^2 + \theta_k\alpha_k\big[\Psi(x^k, y^k) - \Psi(z^{k+1}, y^k)\big] + \frac{\alpha_k L_g + L_f}{2}\Big(\gamma_k^2\|\Delta_k\|^2 - \|x^{k+1} - y^k\|^2 - \|\Delta_{k+1}\|^2\Big). \tag{40}$$
Thus,
$$\begin{aligned}
&\frac{F(x^{k+1}) - \bar m}{\alpha_{k+1}} + \Psi(x^{k+1}, y^k) - \frac{F(x^k) - \bar m}{\alpha_k} - \Psi(x^k, y^{k-1})\\
&\quad\le \alpha_k^{-1}\big[F(x^{k+1}) - F(x^k)\big] + \Psi(x^{k+1}, y^k) - \Psi(x^k, y^{k-1})\\
&\quad\le \Psi(x^{k+1}, y^k) - \Psi(x^k, y^{k-1}) + \theta_k\Psi(x^k, y^k) - \theta_k\Psi(z^{k+1}, y^k)\\
&\qquad + \frac{L_f}{2\alpha_k}\|x^{k+1} - y^k\|^2 + \frac{l_f}{2\alpha_k}\|x^k - y^k\|^2 + \frac{1}{2}\Big(L_g + \frac{L_f}{\alpha_k}\Big)\Big(\gamma_k^2\|\Delta_k\|^2 - \|x^{k+1} - y^k\|^2 - \|\Delta_{k+1}\|^2\Big),
\end{aligned} \tag{41}$$
where the first inequality follows from the setting of $\alpha_{k+1}$ in Algorithm 1 (which ensures $\alpha_{k+1} \ge \alpha_k$), and the second inequality is a consequence of (40).
Next, leveraging the convexity of $\Psi(\cdot, y^k)$, we have
$$\Psi(x^{k+1}, y^k) \le \theta_k\Psi(z^{k+1}, y^k) + (1 - \theta_k)\Psi(x^k, y^k). \tag{42}$$
Additionally, by Lemma 3 (with $x, y, y'$ replaced by $x^k, y^k, y^{k-1}$, respectively),
$$\Psi(x^k, y^k) \le \Psi(x^k, y^{k-1}) + \frac{L_g}{2}\|x^k - y^{k-1}\|^2 + \frac{l_g}{2}\|x^k - y^k\|^2. \tag{43}$$
Combining these two inequalities with (41) gives
$$\begin{aligned}
&\frac{F(x^{k+1}) - \bar m}{\alpha_{k+1}} + \Psi(x^{k+1}, y^k) - \frac{F(x^k) - \bar m}{\alpha_k} - \Psi(x^k, y^{k-1})\\
&\quad\le \frac{L_g}{2}\|x^k - y^{k-1}\|^2 + \frac{l_g}{2}\|x^k - y^k\|^2 + \frac{L_f}{2\alpha_k}\|x^{k+1} - y^k\|^2 + \frac{l_f}{2\alpha_k}\|x^k - y^k\|^2\\
&\qquad + \frac{1}{2}\big(L_g + \alpha_k^{-1}L_f\big)\Big(\gamma_k^2\|\Delta_k\|^2 - \|x^{k+1} - y^k\|^2 - \|\Delta_{k+1}\|^2\Big).
\end{aligned} \tag{44}$$
Together with the definition of $H_k$ in (33), this implies
$$\begin{aligned}
H_{k+1} - H_k &\le \frac{L_g}{2}\|x^{k+1} - y^k\|^2 + \frac{1}{2}\big(L_g + \alpha_{k+1}^{-1}L_f\big)\|\Delta_{k+1}\|^2 - \frac{L_g}{2}\|x^k - y^{k-1}\|^2 - \frac{1}{2}\big(L_g + \alpha_k^{-1}L_f\big)\|\Delta_k\|^2\\
&\quad + \frac{L_g}{2}\|x^k - y^{k-1}\|^2 + \frac{l_g}{2}\|x^k - y^k\|^2 + \frac{L_f}{2\alpha_k}\|x^{k+1} - y^k\|^2 + \frac{l_f}{2\alpha_k}\|x^k - y^k\|^2\\
&\quad + \frac{1}{2}\big(L_g + \alpha_k^{-1}L_f\big)\Big(\gamma_k^2\|\Delta_k\|^2 - \|x^{k+1} - y^k\|^2 - \|\Delta_{k+1}\|^2\Big)\\
&= \frac{1}{2}\big(\alpha_{k+1}^{-1} - \alpha_k^{-1}\big)L_f\|\Delta_{k+1}\|^2 + \frac{1}{2}\Big[(L_g + l_g)\gamma_k^2 - L_g + \alpha_k^{-1}\big((L_f + l_f)\gamma_k^2 - L_f\big)\Big]\|\Delta_k\|^2\\
&\le -\frac{\delta}{2}\big(L_g + \alpha_k^{-1}L_f\big)\|\Delta_k\|^2,
\end{aligned} \tag{45}$$
where the equality relies on (28), and the final inequality holds due to $\alpha_{k+1} \ge \alpha_k$ and Assumption 2.
(ii)
From (45), we deduce
$$\sum_{k=1}^{t} \frac{\delta}{2}\big(L_g + \alpha_k^{-1}L_f\big)\|\Delta_k\|^2 \le \sum_{k=1}^{t} (H_k - H_{k+1}) = H_1 - H_{t+1} \le H_1 - \liminf_{k \to \infty} H_k < +\infty.$$
We directly conclude that $\sum_{k=1}^{\infty} \|\Delta_k\|^2 < \infty$ and $\lim_{k \to \infty} \Delta_k = 0$. Since Assumption 2 guarantees $\inf_k \theta_k > 0$, combining this with $\lim_{k \to \infty} \Delta_k = 0$ and Equations (25) and (27)–(29), we further obtain the remaining limit conclusions.
(iii)
According to (15) and (16) and the convexity of $C$, $x^k \in C$ for each $k$. If $C$ is compact, i.e., part (a) of Assumption 1(v) holds, the sequence $\{x^k\}$ is obviously bounded. Otherwise, all $g_i \equiv 0$ and $F$ is level bounded, so $\alpha_k = \alpha_0$ for all $k$ in our algorithm. From the definition of $H_k$ in (33), we observe that $F(x^k) \le \alpha_0 H_k + \bar m \le \alpha_0 H_1 + \bar m < +\infty$. So $\{x^k\}$ is bounded by the level boundedness of $F$.
   □
In Algorithm 1, if the penalty parameters { α k } are unbounded, the influence of the objective function on subproblem (15) will diminish. Consequently, we cannot guarantee the critical point property for any cluster point of { x k } .
To establish the boundedness of $\{\alpha_k\}$, the following assumption is critical. This assumption was first introduced in ([18] Assumption (A1)) for analyzing ESQM, and was also adopted in [19] for $\mathrm{ESQM}_e$.
Assumption 3.
For (1), $\mathrm{RCQ}(x)$ holds at every $x \in C \cap \mathcal{F}$, and for every $x \in C \setminus \mathcal{F}$, there cannot exist $u_i$, $i \in I(x)$, such that
$$u_i \ge 0 \ \ \forall i \in I(x), \qquad \sum_{i \in I(x)} u_i = 1, \qquad \Big\langle \sum_{i \in I(x)} u_i \nabla g_i(x), z - x \Big\rangle \ge 0 \ \ \forall z \in C, \tag{46}$$
where $I(x) := \big\{ l \in \{1, \ldots, m\} : g_l(x) = \max_{i = 1, \ldots, m} [g_i(x)]_+ \big\}$.
Remark 2.
(i) 
As shown in ([18] Remark 2.1), if Assumption 3 holds, then for any $x \in C$, there exist no $u_i$ (for $i \in I(x)$) that satisfy (46).
(ii) 
As shown in ([18] Remark 2.2), if $\mathrm{RCQ}(x)$ holds for all $x \in C$, then Assumption 3 is satisfied.
Theorem 2
(Boundedness of the penalty parameters $\{\alpha_k\}$). Consider (1) and suppose that Assumptions 1–3 hold. Let $\{(x^k, y^k, z^k, \alpha_k)\}$ be generated by Algorithm 1. Then the sequence $\{\alpha_k\}$ is bounded above; in fact, there exists $K_0 \in \mathbb{N}$ such that $\alpha_k = \alpha_{K_0}$ whenever $k \ge K_0$.
Proof. 
Suppose on the contrary that $\{\alpha_k\}$ is unbounded above. By the definition of $\alpha_k$ in Algorithm 1, there exists a subsequence of positive integers $\{k_j\}$ such that
$$\Psi(z^{k_j+1}, y^{k_j}) > 0. \tag{47}$$
Moreover, we have $\lim_{k \to \infty} \alpha_k = +\infty$ and $\lim_{k \to \infty} \alpha_k^{-1} = 0$.
Recalling the definitions of $I_k(\cdot)$ in (20) and $g_0$ in (12), we have $0 \notin I_{k_j}(z^{k_j+1})$ and
$$\operatorname{lin} g_i(z^{k_j+1}, y^{k_j}) > 0 \quad \forall i \in I_{k_j}(z^{k_j+1}), \ \forall j. \tag{48}$$
Now, since $I_{k_j}(z^{k_j+1}) \subseteq \{1, \ldots, m\}$ for all $j$, the collection $\{I_{k_j}(z^{k_j+1})\}$ contains only finitely many distinct sets; by passing to a further subsequence if necessary, we deduce that there exists a nonempty subset $I_0 \subseteq \{1, \ldots, m\}$ such that $I_{k_j}(z^{k_j+1}) \equiv I_0$ for all $j$. That is, for all $i \in I_0$,
$$\operatorname{lin} g_i(z^{k_j+1}, y^{k_j}) = \Psi(z^{k_j+1}, y^{k_j}) > 0 \quad \forall j.$$
In addition, from Lemma 2(ii), we have that for each $k_j$ there exist $\lambda_i^{k_j} \ge 0$, $i \in I_{k_j}(z^{k_j+1}) \equiv I_0$, such that $\sum_{i \in I_0} \lambda_i^{k_j} = 1$ and
$$0 \in \alpha_{k_j}^{-1}\big[\partial P_1(z^{k_j+1}) + \nabla f(y^{k_j}) - \xi^{k_j}\big] + \theta_{k_j}\big(L_g + \alpha_{k_j}^{-1}L_f\big)(z^{k_j+1} - z^{k_j}) + \sum_{i \in I_0} \lambda_i^{k_j} \nabla g_i(y^{k_j}) + N_C(z^{k_j+1}). \tag{49}$$
Since the sequences $\{x^{k_j}\} \subseteq C$ and $\{\lambda_i^{k_j}\}$ (for each $i \in I_0$) are bounded, by passing to a further subsequence if necessary, we may assume that $\lim_j x^{k_j} = x^*$ for some $x^*$ and that, for each $i \in I_0$, $\lim_j \lambda_i^{k_j} = \bar\lambda_i$ for some $\bar\lambda_i$. Then $x^* \in C$, $\bar\lambda_i \ge 0$ (for each $i \in I_0$), $\sum_{i \in I_0} \bar\lambda_i = 1$ and $I_0 \subseteq \{\iota \in \{0, 1, \ldots, m\} : g_\iota(x^*) = \max_{i = 0, 1, \ldots, m} g_i(x^*)\}$. Since $0 \notin I_0$, we see that
$$I_0 \subseteq I(x^*),$$
where $I(x)$ was defined in Assumption 3. Passing to the limit in (49), and noting that $\lim_j \alpha_{k_j}^{-1} = 0$, $\lim_k \|z^{k+1} - z^k\| = 0$ (thanks to Theorem 1(ii)), and the fact that the sets $\{\partial P_1(z^{k_j+1})\}$ and the sequences $\{\nabla f(y^{k_j})\}$ and $\{\xi^{k_j}\}$ are uniformly bounded (thanks to the boundedness of $\{x^k\}$ and $\{y^k\}$, the convexity of $P_1$, $P_2$ together with ([47] Theorem 24.7), and the Lipschitz continuity of $\nabla f$), we have, upon invoking the closedness of $x \mapsto N_C(x)$, that
$$0 \in \sum_{i \in I_0} \bar\lambda_i \nabla g_i(x^*) + N_C(x^*),$$
which implies that
$$\Big\langle \sum_{i \in I_0} \bar\lambda_i \nabla g_i(x^*), x - x^* \Big\rangle \ge 0 \quad \forall x \in C.$$
Since $I_0 \subseteq I(x^*)$, this contradicts Assumption 3 in view of Remark 2(i). This completes the proof.    □
Theorem 3
(Subsequential convergence). Consider (1) and suppose that Assumptions 1–3 hold. Let $\{x^k\}$ be generated by Algorithm 1. Then, for any accumulation point $\bar x$ of $\{x^k\}$, there exist $\bar\lambda_i \ge 0$ for each $i \in \tilde I(\bar x)$ such that $\sum_{i \in \tilde I(\bar x)} \bar\lambda_i = 1$ and
$$0 \in \partial P_1(\bar x) + \nabla f(\bar x) - \partial P_2(\bar x) + \alpha_{K_0} \sum_{i \in \tilde I(\bar x)} \bar\lambda_i \nabla g_i(\bar x) + N_C(\bar x), \tag{50}$$
where $\tilde I(\bar x) := \{\iota \in \{0, 1, \ldots, m\} : g_\iota(\bar x) = \max_{i = 0, 1, \ldots, m}\{g_i(\bar x)\}\}$ and $\alpha_{K_0}$ is defined in Theorem 2; moreover, $\bar x$ is a critical point of Problem (1).
Proof. 
Suppose that $\bar x$ is an accumulation point of $\{x^k\}$, so there exists a convergent subsequence $\{x^{k_j}\}$ such that $\lim_j x^{k_j} = \bar x$. Let $\{\xi^k\}$ be the sequence generated in Algorithm 1, and let $\{\lambda_i^k\}$ (for $i \in I_{k_j}(z^{k_j+1})$) be the sequence specified in Lemma 2(ii). Note first that for all $j$, $I_{k_j}(z^{k_j+1}) \subseteq \{0, 1, \ldots, m\}$, so the collection $\{I_{k_j}(z^{k_j+1})\}$ contains only finitely many distinct sets. By passing to a further subsequence if necessary, we may assume there exists a nonempty subset $I_0 \subseteq \{0, 1, \ldots, m\}$ such that $I_{k_j}(z^{k_j+1}) \equiv I_0$. From Lemma 2(ii), it follows that
$$0 \in \partial P_1(z^{k_j+1}) + \nabla f(y^{k_j}) - \xi^{k_j} + \theta_{k_j}(\alpha_{k_j}L_g + L_f)(z^{k_j+1} - z^{k_j}) + \alpha_{k_j}\sum_{i \in I_0} \lambda_i^{k_j} \nabla g_i(y^{k_j}) + N_C(z^{k_j+1}), \quad \text{with } \sum_{i \in I_0} \lambda_i^{k_j} = 1, \ \lambda_i^{k_j} \ge 0 \ \forall i \in I_0. \tag{51}$$
Moreover, for each $i \in I_0$, the sequence $\{\lambda_i^{k_j}\}$ consists of nonnegative numbers bounded above by 1, hence is bounded. As for $\{\xi^k\}$, its boundedness is guaranteed by the fact that $P_2$ is convex, together with ([47] Theorem 24.7). By passing to another further subsequence if needed, we may assume without loss of generality that $\lim_j \lambda_i^{k_j} = \bar\lambda_i \ge 0$ for each $i \in I_0$, and $\lim_j \xi^{k_j} = \bar\xi$. Additionally, Theorem 2 implies that for all $k_j \ge K_0$, $\alpha_{k_j} = \alpha_{K_0}$ and
$$\Psi(z^{k_j+1}, y^{k_j}) = 0. \tag{52}$$
Taking the limit as j in both (51) and (52)—and recalling Theorem 1(ii) along with the closedness of P 1 , P 2 and N C —we obtain
0 f ( x ¯ ) + P 1 ( x ¯ ) P 2 ( x ¯ ) + α K 0 i I 0 λ ¯ i g i ( x ¯ ) + N C ( x ¯ ) , i I 0 λ ¯ i = 1 , λ ¯ i 0 i I 0 ,
and
Ψ ( x ¯ , x ¯ ) = 0 ,
where the above equality is equivalent to (noting that Ψ ( x ¯ , x ¯ ) = max i = 0 , 1 , , m { g i ( x ¯ ) } by (11) and (13))
g i ( x ¯ ) ≤ 0 for all i = 1 , , m .
Furthermore, from the definition of I k j ( z k j + 1 ) in (20) (and since I k j ( z k j + 1 ) I 0 ) and Theorem 1(ii), we have
I 0 I ˜ ( x ¯ ) : = ι { 0 , 1 , , m } : g ι ( x ¯ ) = max i = 0 , 1 , , m { g i ( x ¯ ) } .
In view of the above inclusion, we define λ ¯ i = 0 for i ∈ I ˜ ( x ¯ ) ∖ I 0 . With this definition, the inclusion (50) follows directly from (53).
Finally, define λ ^ i : = α K 0 λ ¯ i ≥ 0 for all i ∈ I 0 ∩ { 1 , , m } , and λ ^ i = 0 for all i ∈ { 1 , , m } ∖ I 0 . By (55) and the fact that I 0 ⊆ I ˜ ( x ¯ ) (see (56)), we have
λ ^ i g i ( x ¯ ) = 0 i = 1 , , m .
To verify this, observe that for each i ∈ I 0 , we have g i ( x ¯ ) = 0 , and for each i ∉ I 0 , we have λ ^ i = 0 . Note also that g 0 ( x ¯ ) = 0 (since g 0 ≡ 0 ). Using the definition of λ ^ i and (53), we find
0 P 1 ( x ¯ ) + f ( x ¯ ) P 2 ( x ¯ ) + i = 1 m λ ^ i g i ( x ¯ ) + N C ( x ¯ ) .
Combining (55), (57), (58) and the definition of λ ^ above, we conclude that x ¯ is a critical point of (1).    □

5. EAPG s with the Restart Technique

To discuss the global convergence and convergence rate of the sequence { x k } generated by proximal-gradient-type algorithms (e.g., the algorithms proposed in [19,25,26,42,48] and numerous other related methods), the following conditions are typically indispensable: (1) An inequality analogous to Theorem 1(i), which involves an auxiliary sequence (e.g., { H k } ) that depends on { x k } , { F ( x k ) } and potentially other sequences; (2) Certain boundedness properties of the subgradients of the objective function within the subproblem (e.g., with respect to { x k } for the function E ( · ) ); (3) The Kurdyka–Łojasiewicz (KL) property of an auxiliary function H associated with { H k } . However, for proximal gradient algorithms incorporating the second acceleration technique, analyzing condition (2) poses a significant challenge. The core reason lies in the structural design of these methods: in such algorithms, z k (rather than x k ) serves as the minimizer of subproblem (15), while x k is merely a linear combination of z k and x k 1 . Consequently, although we can derive bounds for the gradient of E ( · ) at z k , we lack sufficient information to characterize the subdifferential E at x k .
The solution approach adopted in [25,26,36] relies on imposing Lipschitz continuity assumptions on the gradients of all involved functions. Nevertheless, this assumption is overly restrictive: in practical scenarios, the function P 1 is often nondifferentiable (e.g., P 1 = · 1 , the l 1 -norm), and it is even inapplicable to Algorithm 1—since the function Ψ in this algorithm fails to be differentiable everywhere.
To address the aforementioned difficulty, it is essential to revisit the working mechanism of Nesterov’s second acceleration method for proximal gradient algorithms, which was first investigated in [23]. Specifically, ([23] Theorem 5.1) demonstrates that if the objective function is strongly convex and differentiable with a Lipschitz-continuous gradient, then x k always yields a better performance (in terms of objective function value) than z k . For nonconvex problems, however, this conclusion no longer holds—we cannot guarantee that x k outperforms z k .
Given the possibility that z k may be superior to x k in nonconvex settings, continuing the acceleration operation (which relies on x k as the update base) may not be optimal. Thus, by drawing inspiration from the restart strategy used in Nesterov’s first acceleration method [33], we propose the following algorithm (Algorithm 2):
Algorithm 2   EAPG s for solving (1) with the restart technique (Theoretical version).
Initialization:    x ( 1 , 0 ) = x 0 C , z ( 1 , 0 ) = z 0 C , α ( 1 , 0 ) = α 0 > 0 , d > 0 .
for s = 1 , 2 ,  do
carry out Algorithm 1 with initial values x ( s , 0 ) , z ( s , 0 ) , α ( s , 0 ) and preselected parameters { θ ( s , k ) } satisfying Assumption 4 to generate the sequence { ( x ( s , k ) , y ( s , k ) , z ( s , k ) , α ( s , k ) ) } until
Q ( x ( s , k ) , x ( s , k 1 ) , y ( s , k 1 ) , α ( s , k ) ) > Q ( z ( s , k ) , x ( s , k 1 ) , y ( s , k 1 ) , α ( s , k ) )
where Q is defined in (30). Denote the above k by N s .
Set: x ( s + 1 , 0 ) = z ( s + 1 , 0 ) = z ( s , N s ) , α ( s + 1 , 0 ) = α ( s , N s ) .
end for
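To make the restart mechanism above concrete, the following is a minimal sketch of Algorithm 2's outer loop on a toy smooth unconstrained problem, where the monitored comparison (59) reduces to a plain objective comparison between the x- and z-iterates. The names `eapg_restart`, `grad`, and `f` are ours, and the simplified test `f(x_new) > f(z_new)` stands in for the full potential comparison Q(x, ...) > Q(z, ...); this is an illustration of the control flow, not the paper's implementation.

```python
import numpy as np

def eapg_restart(grad, f, x0, L, max_iter=500, tol=1e-10):
    """Sketch of Algorithm 2's outer loop: run Nesterov's second
    acceleration and restart from z whenever the monitored point x
    does worse than z (a simplified stand-in for criterion (59))."""
    x, z = x0.copy(), x0.copy()
    k_inner = 0
    for _ in range(max_iter):
        theta = 2.0 / (k_inner + 2)          # second-acceleration weights
        y = (1 - theta) * x + theta * z
        z_new = z - grad(y) / (theta * L)    # gradient step on the z-sequence
        x_new = (1 - theta) * x + theta * z_new
        if f(x_new) > f(z_new):              # restart test (simplified)
            x = z = z_new                    # x^(s+1,0) = z^(s+1,0) = z^(s,N_s)
            k_inner = 0
        else:
            x, z = x_new, z_new
            k_inner += 1
        if np.linalg.norm(grad(x)) < tol:
            break
    return x
```

On a strongly convex quadratic the restart never fires after the first step, consistent with the discussion of ([23] Theorem 5.1) below.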
Assumption 4.
With inf ( s , k ) θ ( s , k ) > 0 , there exists a constant δ ( 0 , 1 ) such that
( L g + l g ) γ ( s , k ) 2 L g ( 1 δ ) , ( L f + l f ) γ ( s , k ) 2 L f ( 1 δ ) ,
where γ ( s , k ) : = θ ( s , k ) ( 1 θ ( s , k 1 ) ) θ ( s , k 1 ) 1 for k 1 .
In the subsequent discussion of this section, for simplicity, we use { x k } to denote the sequence
{ x ( s , k ) } : x ( 1 , 0 ) , , x ( 1 , N 1 1 ) , x ( 2 , 0 ) , , x ( 2 , N 2 1 ) , x ( 3 , 0 )
generated by Algorithm 2. Similarly, we use { y k } , { z k } , { α k } , { θ k } , and { γ k } to represent { y ( s , k ) } , { z ( s , k ) } , { α ( s , k ) } , { θ ( s , k ) } , and { γ ( s , k ) } , respectively. We also use k = k ( s , k ) to denote the correspondence mapping ( s , k ) to k. As will be shown below, these sequences satisfy the same results as those established in Section 4.
Remark 3.
For the sequences generated by Algorithm 2, we have the following relations:
(i) 
When k = k ( s , k ) with 1 ≤ k ≤ N s − 2 , Equations (25)–(29) remain valid.
(ii) 
When k = k ( s , N s 1 ) , we have x k + 1 = y k + 1 = z k + 1 , Equations (26) and (28) still hold, and
z k + 1 x k = Δ k + 1 , z k + 1 z k = Δ k + 1 γ k θ k Δ k , z k + 1 y k = Δ k + 1 γ k Δ k .
(iii) 
When k = k ( s + 1 , 0 ) , we have x k = y k = z k , Equation (25) holds, and we have
x k z k = x k y k = 0 , z k + 1 y k = z k + 1 z k = z k + 1 x k = 1 θ k Δ k + 1 .
Lemma 5.
The results established in Theorems 1–3 remain valid for the sequence { x k , y k , z k , α k } generated by Algorithm 2, provided the same conditions are satisfied with Assumption 2 replaced by Assumption 4.
Proof. 
First, we prove that (32) remains valid for Algorithm 2. From Theorem 1(i), we know that for s 1 and 1 k N s 1 , the following holds:
Q x ( s , k ) , x ( s , k 1 ) , y ( s , k 1 ) , α ( s , k ) Q x ( s , k + 1 ) , x ( s , k ) , y ( s , k ) , α ( s , k + 1 ) δ 2 L g + α ( s , k ) 1 L f x ( s , k ) x ( s , k 1 ) 2 .
This inequality leads to the conclusion that (32) is valid in two cases:
(i) When k = k ( s , k ) with 1 k N s 2 .
(ii) When k = k ( s , N s 1 ) . Here, we note that x k + 1 = x ( s + 1 , 0 ) = z ( s , N s ) , α k + 1 = α ( s + 1 , 0 ) = α ( s , N s ) , and that (59) holds for k = N s due to the setup of Algorithm 2.
Now, we need to address the case where k = k ( s + 1 , 0 ) . In this situation, y k = z k = x k , and (35) still holds for z k + 1 , being equivalent to:
P 1 ( z k + 1 ) P 1 ( x k ) + ξ f ( x k ) , z k + 1 x k α k Ψ ( z k + 1 , y k ) θ k ( α k L g + L f ) z k + 1 x k 2 .
By Remark 3(iii), the equality (25) is still satisfied. Thus, the above inequality implies the following:
P 1 ( z k + 1 ) P 1 ( x k ) + 1 θ k ξ f ( x k ) , Δ k + 1 α k Ψ ( z k + 1 , y k ) 1 θ k ( α k L g + L f ) Δ k + 1 2 .
Since P 1 ( x k + 1 ) P 1 ( x k ) + θ k [ P 1 ( z k + 1 ) P 1 ( x k ) ] , we can obtain the following:
P 1 ( x k + 1 ) P 1 ( x k ) + ξ f ( x k ) , Δ k + 1 θ k α k Ψ ( z k + 1 , y k ) ( α k L g + L f ) Δ k + 1 2 .
Combining this with (38) and (39) (noting that x k y k = 0 ), the above inequality gives
F ( x k + 1 ) F ( x k ) θ k α k Ψ ( z k + 1 , y k ) 1 2 ( 2 α k L g + L f ) Δ k + 1 2 .
Consequently,
F ( x k + 1 ) m ¯ α k + 1 + Ψ ( x k + 1 , y k ) F ( x k ) m ¯ α k + Ψ ( x k , y k 1 ) α k 1 [ F ( x k + 1 ) F ( x k ) ] + Ψ ( x k + 1 , y k ) Ψ ( x k , y k 1 ) Ψ ( x k + 1 , y k ) θ k Ψ ( z k + 1 , y k ) Ψ ( x k , y k 1 ) 1 2 ( 2 L g + α k 1 L f ) Δ k + 1 2 1 2 ( 2 L g + α k 1 L f ) Δ k + 1 2 ,
where the first inequality arises from the fact that α k ≤ α k + 1 , the second one follows from (61), and the last one is derived from (42) along with the facts that Ψ ( x k , y k ) = 0 and Ψ ( x k , y k 1 ) ≥ 0 . From (62), we can deduce that   
H k + 1 H k 1 2 ( 2 L g + α k 1 L f ) Δ k + 1 2 + L g 2 x k + 1 y k 2 + 1 2 ( L g + α k + 1 1 L f ) Δ k + 1 2 L g 2 x k y k 1 2 1 2 ( L g + α k 1 L f ) Δ k 2 = 1 2 ( α k + 1 1 α k 1 ) L f Δ k + 1 2 L g 2 x k y k 1 2 1 2 ( L g + α k 1 L f ) Δ k 2 1 2 ( L g + α k 1 L f ) Δ k 2 δ 2 ( L g + α k 1 L f ) Δ k 2 ,
where the equality holds because x k + 1 y k = Δ k + 1 . This completes the proof of the result in Theorem 1(i).
The remaining parts of the proof are straightforward. Specifically, the other results in Theorems 1–3 follow directly from (32), and their original proofs remain unchanged.    □
Applying Theorem 3 to the sequences generated by Algorithm 2, there exist an integer K 0 and a positive number α ^ such that
α k = α ^ for all k ≥ K 0 .
Our discussion regarding the remaining convergence properties of Algorithm 2 will rely on the following auxiliary function:
H ( z , x , y ) : = Q ( z , x , y , α ^ ) + δ C ( z ) = α ^ 1 F ( z ) m ¯ + Ψ ( z , y ) + L g 2 z y 2 + L 2 z x 2 + δ C ( z ) ,
where L : = L g + α ^ 1 L f . It is worth noting that for all k K 0 ,
H k = H ( x k , x k 1 , y k 1 ) ,
and in light of the setup of Algorithm 2, we have
H k H ^ k ,
where H ^ k : = H ( z k , x k 1 , y k 1 ) .
Lemma 6.
Suppose that Assumptions 1, 3, and 4 hold, let { x k } , { y k } , and { z k } be generated by Algorithm 2. Let Λ and Ω denote the sets of accumulation points of { x k } and { ( z k + 1 , y k , x k ) } , respectively. Then the following assertions hold
(i) 
Λ is a nonempty compact set.
(ii) 
Ω = { ( x ¯ , x ¯ , x ¯ ) : x ¯ Λ } is also nonempty and compact.
(iii) 
The limit ω : = lim k H k exists.
(iv) 
If P 1 is continuous on Ω , we have H ω on Ω , and lim k H ^ k = ω .
Proof. 
(i)
The nonemptiness and compactness of Λ follow directly from the boundedness of { x k } , as stated in Theorem 1(iii).
(ii)
The representation Ω = { ( x ¯ , x ¯ , x ¯ ) : x ¯ Λ } is a consequence of Theorem 1(ii). Consequently, the properties of nonemptiness and compactness are inherited from Λ to Ω .
(iii)
By Theorem 1(i), the sequence { H k } is nonincreasing. Furthermore, it follows from the definition of H k in (33) that H k is always non-negative. Then ω : = lim k H k exists.
(iv)
Finally, we assume that P 1 is continuous on Ω , which implies the continuity of H on Ω . For any x ¯ Λ , let { x k j } be a subsequence converging to x ¯ . By Theorem 1(ii), both { y k j } and { x k j + 1 } also converge to x ¯ . Thus,
H ( x ¯ , x ¯ , x ¯ ) = lim j H ( x k j + 1 , x k j , y k j ) = lim j H k j = ω .
Now, suppose for contradiction that { H ^ k } does not converge to ω . By (65) and (66), there exist a subsequence { H ( z k j + 1 , x k j , y k j ) } and a positive number ϵ such that
H ( z k j + 1 , x k j , y k j ) ω + ϵ .
By passing to a subsequence if necessary, we may assume that { x k j } converges to some x ¯ Λ . Consequently, both { y k j } and { z k j + 1 } also converge to x ¯ . By the continuity of H, we have
lim j H ( z k j + 1 , x k j , y k j ) = H ( x ¯ , x ¯ , x ¯ ) = ω ,
which contradicts (67). Therefore, lim k H ( z k + 1 , x k , y k ) = ω .
   □
Next, we introduce an assumption to help derive an upper bound for dist ( ( 0 , 0 , 0 ) , H ( z k + 1 , x k , y k ) ) . This assumption is widely adopted in proximal-gradient-type algorithms and is generally satisfied in numerous applications (e.g., the algorithms and problems presented in [19,28,42] and the references therein).
Assumption 5.
Each g i in (1) is twice continuously differentiable. The function P 2 is differentiable with locally Lipschitz continuous gradient on an open set U 0 containing X , where X is the set of critical points of (1).
Lemma 7.
Suppose that Assumptions 1 and 3–5 hold, P 1 is continuous, and let { ( z k + 1 , x k , y k ) } be generated by Algorithm 2. Then there exist a positive constant A 1 and a positive integer K 1 such that for all k K 1 , we have
dist ( ( 0 , 0 , 0 ) , H ( z k + 1 , x k , y k ) ) A 1 ( Δ k + 1 + Δ k ) .
Proof. 
By Lemma 5 and Theorem 3, Λ X ( U 0 ) . Noticing Lemma 6(i), the local Lipschitz continuity of P 2 ensures the existence of a bounded open neighborhood U 1 of Λ such that P 2 is Lipschitz continuous on U 1 , with the corresponding Lipschitz constant denoted by L 2 .
Leveraging the boundedness of { x k } and the limit results lim k z k + 1 x k = 0 and lim k x k y k = 0 (stated in Theorem 1(ii, iii)), there exists a positive integer K such that z k + 1 , x k , y k U 1 for all k K . Let K 1 : = max { K 0 , K } . For any k K 1 , it follows that α k = α ^ ; substituting this into Lemma 2(ii) yields the following:
0 α ^ 1 f ( y k ) P 2 ( x k ) + α ^ 1 P 1 ( z k + 1 ) + i I k ( z k + 1 ) λ i k g i ( y k ) + θ k L ( z k + 1 z k ) + N C ( z k + 1 ) ,
which can be rearranged to the following:
v 1 k V k ,
where
v 1 k : = α ^ 1 f ( z k + 1 ) f ( y k ) α ^ 1 P 2 ( z k + 1 ) P 2 ( x k ) + L g ( z k + 1 y k ) + L ( z k + 1 x k ) θ k L ( z k + 1 z k )
and
V k : = α ^ 1 f ( z k + 1 ) + P 1 ( z k + 1 ) P 2 ( z k + 1 ) + N C ( z k + 1 ) + i I k ( z k + 1 ) λ i k g i ( y k ) + L g ( z k + 1 y k ) + L ( z k + 1 x k ) .
Next, we analyze the subdifferential H ( z k + 1 , x k , y k ) . For simplicity, we decompose the function H into three components:
H a ( z , x , y ) : = α ^ 1 [ F ( z ) m ¯ ] + δ C ( z ) , H b ( z , x , y ) : = Ψ ( z , y ) , H c ( z , x , y ) : = L g 2 z y 2 + L 2 z x 2 .
By ([49] Theorem 8.6) and ([49] Corollary 10.9) (respectively), we have
H ( z k + 1 , x k , y k ) ^ H ( z k + 1 , x k , y k ) ^ H a ( z k + 1 , x k , y k ) + ^ H b ( z k + 1 , x k , y k ) + ^ H c ( z k + 1 , x k , y k ) .
We now compute the regular subdifferentials of these three components:
1. For H a :
^ H a ( z k + 1 , x k , y k ) = ^ [ α ^ 1 F ( · ) + δ C ( · ) ] ( z k + 1 ) 0 0 α ^ 1 ^ F ( z k + 1 ) + N C ( z k + 1 ) 0 0 ,
where the equality follows from ([49] Proposition 10.5), and the inclusion is derived from ([49] Corollary 10.9, Equation 10(6), Proposition 8.12, Exercise 8.14). Furthermore, by ([49] Exercise 8.8(c), Proposition 8.12), we obtain:
^ F ( z k + 1 ) = f ( z k + 1 ) + ^ P 1 ( z k + 1 ) P 2 ( z k + 1 ) = f ( z k + 1 ) + P 1 ( z k + 1 ) P 2 ( z k + 1 ) .
2. For H b : We have
^ H b ( z k + 1 , x k , y k ) = H b ( z k + 1 , x k , y k ) i I k ( z k + 1 ) λ i k g i ( y k ) 0 i I k ( z k + 1 ) λ i k 2 g i ( y k ) ( z k + 1 y k ) ,
where the equality follows from ([49] Example 7.28, Corollary 8.11), and the inclusion is deduced from ([49] Exercise 8.31).
3. For H c : By ([49] Exercise 8.8(a)), we have
^ H c ( z k + 1 , x k , y k ) = L g ( z k + 1 y k ) + L ( z k + 1 x k ) L ( z k + 1 x k ) L g ( z k + 1 y k ) .
Combining the relations (75)–(79) gives the following:
v 1 k v 2 k v 3 k V k v 2 k v 3 k H ( z k + 1 , x k , y k ) ,
where
v 2 k : = L ( z k + 1 x k ) , v 3 k : = i I k ( z k + 1 ) λ i k 2 g i ( y k ) ( z k + 1 y k ) L g ( z k + 1 y k ) .
Thus, from (80), we have
dist ( 0 , 0 , 0 ) , H ( z k + 1 , x k , y k ) v 1 k + v 2 k + v 3 k .
Furthermore, in light of
v 1 k α ^ 1 L f + L g z k + 1 y k + ( α ^ 1 L 2 + L ) z k + 1 x k + θ k L z k + 1 z k , v 2 k = L z k + 1 x k , v 3 k 2 L g z k + 1 y k
(where the last inequality follows from the Lipschitz continuity of g i and ([49] Theorem 9.7)), and by virtue of Remark 3, we arrive at Equation (69).
   □
Now we present our global convergence analysis under the Kurdyka-Lojasiewicz (KL) assumption of H.
Theorem 4.
Under the same conditions as in Lemma 7, and assuming that H is a KL function, we have k = 1 Δ k < + and that { x k } converges globally to a critical point of Problem (1).
Proof. 
Our proof is divided into two cases.
Case 1. There exists an integer K ^ > 0 such that H K ^ = ω . Since { H k } converges non-increasingly to ω , it follows that H k = ω for all k K ^ . Substituting this into Equation (32) further yields Δ k = 0 for all such k, which implies the finite convergence of { x k } .
Case 2. H k > ω for all k. From Lemma 6, we recall two key properties: Ω is a compact set, and H ω on Ω . Given that H is a KL function, Lemma 1 guarantees the existence of φ Θ η , ε > 0 , and η > 0 such that
φ ′ ( H ( z , x , y ) − ω ) · dist ( ( 0 , 0 , 0 ) , ∂ H ( z , x , y ) ) ≥ 1
holds for all ( z , x , y ) satisfying
dist ( ( z , x , y ) , Ω ) < ε and ω < H ( z , x , y ) < ω + η .
By Lemma 6(iv), we know that lim k H ^ k = ω . Additionally, since { ( z k , x k 1 , y k 1 ) } is a bounded sequence and Ω is its accumulation set, there exists an integer K 2 ≥ K 1 such that for all k ≥ K 2 : dist ( ( z k , x k 1 , y k 1 ) , Ω ) < ε , ω < H ^ k < ω + η , and therefore,
φ ′ ( H ^ k − ω ) · dist ( ( 0 , 0 , 0 ) , ∂ H ( z k , x k 1 , y k 1 ) ) ≥ 1 .
For the remainder of this proof, we assume k K 2 . Combining inequality (86) with Lemma 7 leads to
1 ≤ φ ′ ( H ^ k − ω ) A 1 ( Δ k 1 + Δ k ) for all k ≥ K 2 .
On the other hand, leveraging the mean value theorem, the decreasing property of φ (a direct consequence of φ being concave), and the relations H k + 1 H k H ^ k (from Equations (32) and (65)), we further derive that
φ ( H k − ω ) − φ ( H k + 1 − ω ) ≥ φ ′ ( H ^ k − ω ) ( H k − H k + 1 ) .
Define ν k , t : = φ ( H k − ω ) − φ ( H t − ω ) . Using Equations (32), (88), and (87), respectively, we get
( δ L / 2 ) Δ k 2 ≤ H k − H k + 1 ≤ ν k , k + 1 / φ ′ ( H ^ k − ω ) ≤ A 1 ( Δ k 1 + Δ k ) ν k , k + 1 .
Since φ ′ > 0 and H k ≥ H k + 1 , it follows that ν k , k + 1 ≥ 0 . Applying the AM–GM inequality √(a b) ≤ ( a + b ) / 2 to the result above, we obtain
Δ k ≤ √( ( 1 / 2 ) ( Δ k 1 + Δ k ) · ( 4 A 1 / ( δ L ) ) ν k , k + 1 ) ≤ ( 1 / 4 ) ( Δ k 1 + Δ k ) + ( 2 A 1 / ( δ L ) ) ν k , k + 1 ,
which yields
Δ k ≤ ( 1 / 2 ) ( Δ k 1 − Δ k ) + ( 4 A 1 / ( δ L ) ) ν k , k + 1 .
Summing inequality (91) over k = K 2 , K 2 + 1 , , t (for any t K 2 ) gives
∑ k = K 2 t Δ k ≤ ( 1 / 2 ) ( Δ K 2 − 1 − Δ t ) + ( 4 A 1 / ( δ L ) ) ν K 2 , t + 1 ≤ ( 1 / 2 ) Δ K 2 − 1 + ( 4 A 1 / ( δ L ) ) φ ( H K 2 − ω ) ,
where the final inequality holds because φ 0 . This result implies k = 0 Δ k < + , which in turn shows that { x k } is a Cauchy sequence and hence converges to some x ¯ . By Theorem 3, x ¯ is a critical point of Problem (1).    □
Lemma 8
([48] Lemma 10). Let { Λ k } k ∈ N be a nonincreasing sequence in R + converging to 0, and suppose there exist k ¯ ≥ l ¯ ≥ 0 such that Λ_k^{2a} ≤ m ( Λ_{k−l̄} − Λ_k ) for all k ≥ k ¯ , where m is a nonnegative constant and a ∈ [ 0 , 1 ) . Then, the following statements hold.
(i) 
If a = 0 , then { Λ k } converges in finite time.
(ii) 
If a ∈ ( 0 , 1 / 2 ] , there exist μ 1 > 0 and τ ∈ [ 0 , 1 ) such that Λ k ≤ μ 1 τ^k for all k ≥ k ¯ .
(iii) 
If a ∈ ( 1 / 2 , 1 ) , there exists μ 2 > 0 such that Λ k ≤ μ 2 ( k − l ¯ + 1 )^{ − 1 / ( 2 a − 1 ) } for all k ≥ k ¯ + l ¯ .
Theorem 5.
Under the same conditions as in Lemma 7, suppose further that H is a KL function, where the function φ in the KL inequality takes the form φ ( s ) = ρ s^{ 1 − β } for some constants β ∈ [ 0 , 1 ) and ρ > 0 . Let { x k } be the sequence generated by Algorithm 2, and let x ¯ be its limit. Then the following assertions hold:
(i) 
If β = 0 , then Algorithm 2 terminates after finitely many iterations.
(ii) 
If β ∈ ( 0 , 1 / 2 ] , there exist constants c 1 > 0 and τ ∈ [ 0 , 1 ) such that ‖ x k − x ¯ ‖ ≤ c 1 τ^k .
(iii) 
If β ∈ ( 1 / 2 , 1 ) , there exists a constant c 2 > 0 such that ‖ x k − x ¯ ‖ ≤ c 2 k^{ − ( 1 − β ) / ( 2 β − 1 ) } .
Proof. 
For the case β = 0 , we have φ ( s ) = ρ s and φ ′ ( s ) = ρ . We aim to show that H k = ω for sufficiently large k, and consequently, (i) holds by virtue of Case 1 in the proof of Theorem 4. Suppose, for contradiction, that H k > ω for all k. Then, by (65), H ^ k > ω . Inequality (87) yields
1 ≤ ρ A 1 ( Δ k 1 + Δ k ) ,
which contradicts the fact that lim k Δ k = 0 from Theorem 1(ii).
Next, consider the case where β ∈ ( 0 , 1 ) . For the remainder of this proof, we assume k ≥ K 2 , where K 2 is defined in the proof of Theorem 4. First, utilizing the inequality (87) and the expression φ ′ ( s ) = ρ ( 1 − β ) s^{ − β } , we derive
( H k − ω )^β ≤ ( H ^ k − ω )^β ≤ ρ ( 1 − β ) A 1 ( Δ k 1 + Δ k ) .
Raising the above inequality to the power of ( 1 − β ) / β > 0 and noting that φ ( s ) = ρ s^{ 1 − β } , we obtain
φ ( H k − ω ) = ρ ( H k − ω )^{ 1 − β } ≤ ρ [ ρ ( 1 − β ) A 1 ]^{ ( 1 − β ) / β } ( Δ k 1 + Δ k )^{ ( 1 − β ) / β } .
Define the nonincreasing sequence R k : = ∑ i = k + 1 ∞ Δ i . By Theorem 4, each R k is finite. It follows that
R k + 1 ≤ R k = ∑ i = k + 1 ∞ Δ i ≤ ( a ) ( 1 / 2 ) Δ k + ( 4 A 1 / ( δ L ) ) φ ( H k + 1 − ω ) ≤ ( b ) ( 1 / 2 ) Δ k + A 2 ( Δ k + Δ k + 1 )^{ ( 1 − β ) / β } = ( 1 / 2 ) ( R k 1 − R k ) + A 2 ( R k 1 − R k + 1 )^{ ( 1 − β ) / β } ,
where A 2 = ( 4 A 1 / ( δ L ) ) ρ [ ρ ( 1 − β ) A 1 ]^{ ( 1 − β ) / β } , the inequality ( a ) is derived analogously to (92), and inequality ( b ) follows from (95). With (96) established, we now prove (ii) and (iii) separately.
(ii) For β ∈ ( 0 , 1 / 2 ] , we have ( 1 − β ) / β ≥ 1 . Recalling that lim k Δ k = 0 , we observe that R k 1 − R k + 1 → 0 as k → ∞ . Thus, there exists K 3 > K 2 such that R k 1 − R k + 1 < 1 for all k ≥ K 3 . Hence
R k + 1 ≤ ( A 2 + 1 / 2 ) ( R k 1 − R k + 1 ) for all k ≥ K 3 .
Applying Lemma 8(ii) to { R k } with a = 1 / 2 and l ¯ = 2 , we conclude that (ii) holds.
(iii) For β ∈ ( 1 / 2 , 1 ) , we have ( 1 − β ) / β < 1 . Similarly, there exists K 4 > K 2 such that
R k + 1 ≤ ( A 2 + 1 / 2 ) ( R k 1 − R k + 1 )^{ ( 1 − β ) / β } for all k ≥ K 4 .
Applying Lemma 8(iii) to { R k } with a = β / ( 2 ( 1 − β ) ) and l ¯ = 2 , we establish (iii) and complete the proof.    □
From the preceding results (Lemma 5 to Theorem 4), we observe the efficacy of the restart technique in the convergence analysis of the EAPG s method. However, the restart criterion given by (59) merely serves as a sufficient condition for z k + 1 to outperform x k + 1 , and this condition is overly restrictive, making it difficult to satisfy. In fact, when implementing Algorithm 2 on most problems presented in Section 6, at most one restart occurred. In experiments where no restart was triggered, Algorithm 2 was entirely identical to Algorithm 1, with the sole exception that we explicitly ensured condition (65)—along with the subsequent theoretical results—held for Algorithm 2. Thus, Algorithm 2 should be regarded more as a theoretical construct than a practical implementation.
Practically speaking, however, the restart technique can indeed enhance the efficiency of our algorithms, as demonstrated in Section 6. This indicates the necessity of developing a practical criterion to replace (59), thereby maximizing opportunities for effective restarts. Drawing inspiration from the fixed restarting algorithm ([33] Algorithm 3), the d k criterion (discussed following Equation (75) in [26]), and the inner product criterion in [19], we define
d k = α k 1 F ( x k ) F ( x k + 1 ) + G k G k + 1 Δ k 2 ,
where
G k = Ψ ( x k , y k 1 ) + L g 2 x k y k 1 2 + L g + α k 1 L f 2 Δ k 2 ,
and propose the following algorithm (Algorithm 3):
Algorithm 3 ( EAPG s r ) EAPG s with Restart Technique (Practical Version)
Initialization: Given x 0 , z 0 C , α 0 > 0 , d > 0 , a positive integer N 0 , and { θ k } as defined in Remark A1(ii).
Determining the restart interval: Execute Algorithm 1 with initial values x 0 , z 0 , α 0 and parameters { θ 0 , θ 1 , } for N steps, where N denotes the first step k satisfying k N 0 and d k > d k 1 . Set x 0 = z 0 = z N and α 0 = α N .
for    s = 1 , 2 ,  do
Execute Algorithm 1 with initial values x 0 , z 0 , α 0 and parameters { θ 0 , θ 1 , } until either y k 1 z k , z k z k 1 > 0 or k = N . Set x 0 = z 0 = z k and α 0 = α k .
end for
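The two practical tests driving Algorithm 3 can be sketched as small helpers: the phase-1 test that fixes the restart interval N from the d_k sequence, and the phase-2 inner product test that triggers each subsequent restart. The helper names are ours and the snippet only illustrates the decision logic, not the full algorithm.

```python
import numpy as np

def dk_increased(d_prev, d_curr, k, N0):
    """Phase-1 test of Algorithm 3: the calibration run stops at the
    first step k with k >= N0 and d_k > d_{k-1}; that k becomes the
    restart interval N (d_k itself is the ratio defined in the text)."""
    return k >= N0 and d_curr > d_prev

def inner_product_restart(y_prev, z_curr, z_prev):
    """Phase-2 test: restart when <y^{k-1} - z^k, z^k - z^{k-1}> > 0,
    i.e. the latest z-step moves against the extrapolation direction."""
    return float(np.dot(y_prev - z_curr, z_curr - z_prev)) > 0.0
```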

6. Numerical Experiments

In this section, we conduct numerical experiments to assess the performance of Algorithm 3. The design of these experiments is intended to demonstrate three key aspects, as elaborated below:
1.
In Section 6.1 and Section 6.2, we verify the computational efficiency of Algorithm 3 against the IPOPT solver [50,51] and three state-of-the-art methods—namely SCP ls [28], ESQM b (a basic variant of ESQM derived by fixing β k 0 in ESQM e ), and ESQM e  [19]—when solving the optimization problem formulated in (3).
2.
In Section 6.3, we evaluate three key metrics: the effectiveness of Algorithm 3’s d k -criterion for optimal restart interval identification, Algorithm 3’s efficiency versus Algorithms 1 and 2, and its overall performance relative to multiple modified variants.
3.
In Section 6.4, we validate the efficacy of Algorithm 3 on the unconstrained DC problem specified in (2), with comparisons drawn to the IPOPT solver and three established algorithms for unconstrained DC problems: GIST [52], pDCAe [53], and APG s [26] (the foundational prototype of Algorithm 3).
In Section 6.1 and Section 6.2, the numerical experiments focus on the following compressed sensing optimization problem:
min x ∈ R n ‖ x ‖ 1 − μ ‖ x ‖ s . t . h ( A x − b ) ≤ σ , ‖ x ‖ ≤ M ,
where
μ [ 0 , 1 ) ;
A R q × n has full row rank;
b R q ;
M = ( 1 − μ )^{ − 1 } ( ‖ A † b ‖ 1 − μ ‖ A † b ‖ ) , where A † denotes the Moore–Penrose pseudoinverse of A ;
h : R q R + is an analytic function with Lipschitz-continuous gradient (modulus L h ), h ( 0 ) = 0 , and σ ( 0 , h ( b ) ) .
This problem is equivalent to the following model:
min x ∈ R n ‖ x ‖ 1 − μ ‖ x ‖ 2 s . t . h ( A x − b ) ≤ σ ,
initially introduced in [54] and further explored in [17,55] for sparse signal recovery.
Problem (101) is a special case of (3) with P 1 ( x ) = ‖ x ‖ 1 , P 2 ( x ) = μ ‖ x ‖ , m = 1 , g 1 ( x ) = h ( A x − b ) − σ , and C = { x : ‖ x ‖ ≤ M } . Additionally, since A has full row rank and h ( 0 ) = 0 < σ , we have A † b ∈ C ∩ { x : g 1 ( x ) < 0 } . It is straightforward to verify that Assumption 1 holds for this problem.
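The strict feasibility of A † b can be checked numerically. The sketch below uses arbitrary dimensions and the h = ½ ‖ · ‖ ² instance of Section 6.1; the QR-based computation of A † b mirrors how x 0 is obtained in the experiments, but the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
q, n, sigma = 40, 100, 0.5
A = rng.standard_normal((q, n))       # full row rank with probability 1
b = rng.standard_normal(q)

# x0 = A^+ b via the QR factorization of A^T:  A^T = QR  =>  A^+ b = Q (R^T)^{-1} b
Q, R = np.linalg.qr(A.T)
x0 = Q @ np.linalg.solve(R.T, b)

assert np.allclose(A @ x0, b)         # full row rank, so A x0 = b exactly
g1 = 0.5 * np.linalg.norm(A @ x0 - b) ** 2 - sigma
assert g1 < 0                         # hence g1(x0) = h(0) - sigma = -sigma < 0
```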
In the following subsections, we conduct experiments on Problem (101) with different selections of h. All numerical experiments were performed on a computer with an Intel(R) Core(TM) i5-8265U processor and 8.00 GB of memory, running the Windows 10 operating system. The experiments were implemented using MATLAB R2021a.

6.1. Compressed Sensing with h ( · ) = ( 1 / 2 ) ‖ · ‖ 2

We first consider Problem (101) with h ( · ) = ( 1 / 2 ) ‖ · ‖ 2 , which transforms (101) into:
min x ∈ R n ‖ x ‖ 1 − μ ‖ x ‖ s . t . ( 1 / 2 ) ‖ A x − b ‖ 2 ≤ σ , ‖ x ‖ ≤ M .
Note that h is convex, so g 1 is also convex. Thus, we can set L g = ‖ A ‖ 2 and l g = 0 . Furthermore, since A † b ∈ C ∩ { x : g 1 ( x ) < 0 } , the Slater condition holds for the above problem. Based on the discussion following Definition 3 and Remark 2, Assumption 3 is satisfied.
Since the function H in (64) (corresponding to Problem (103)) is clearly semi-algebraic—and therefore a KL function—we can apply Theorem 5 to deduce the convergence of the entire sequence { x k } generated by Algorithm 3 for solving (103).
  • Details of the Five Algorithms
We detail the setup of the five algorithms below:
(i)
Initialization and Stopping Criteria: For SCP ls , ESQM b , and ESQM e , we adopt the same initial points as specified in [19]: specifically, x 0 = A † b for SCP ls , and x 0 = 0 for both ESQM b and ESQM e . For EAPG sr , the initial point is set to x 0 = z 0 = 0 . For IPOPT, we introduce slack variables u and v to reformulate Problem (103) as follows:
min u , v ∈ R n ⟨ 1 , u ⟩ + ⟨ 1 , v ⟩ − μ ‖ u − v ‖ s . t . ( 1 / 2 ) ‖ A ( u − v ) − b ‖ 2 ≤ σ , u , v ≥ 0 ,
where 1 denotes the all-ones vector (i.e., a vector with each component equal to 1). The corresponding initial points are set to u 0 = [ A † b ] + + 0.001 · 1 and v 0 = [ − A † b ] + + 0.001 · 1 .
All algorithms except IPOPT terminate when either the relative iterate difference satisfies ‖ x k + 1 − x k ‖ / max { 1 , ‖ x k + 1 ‖ } ≤ ϵ (with ϵ > 0 to be specified in subsequent sections) or the maximum number of iterations (3000) is reached. For IPOPT, the convergence tolerance is configured to the same value of ϵ and the maximum number of iterations is set to 1000.
(ii)
Parameter Settings: The parameters for SCP ls follow [28], while those for ESQM b and ESQM e follow [19]. For EAPG sr , we set α 0 = 1 and d = 1 (consistent with [19]) and N 0 = 20 . Since g 1 is convex (implying l g = 0 ), any positive integer K is valid for the acceleration parameters { θ k } defined in Remark 1(ii); here, we set K = 150 . Notably, in practice, the restart period N observed in experiments was consistently less than 100. Thus, the experimental performance would remain unchanged if we select any K 100 .
The subproblems of these algorithms are solved following the procedures outlined in the appendices of [28,45]. In each subproblem, the computational complexity of evaluating g 1 and ζ k is O ( q n ) and O ( n ) , respectively. Additionally, solving the optimization problem (15) incurs a complexity of O ( n ) . Consequently, the overall computational complexity of each subproblem is O ( q n ) .
All settings of IPOPT are set to their default values except for the tolerance parameter and maximum number of iterations.
  • Experimental Setup for Random Instances
We tested Algorithm 3 on random instances of Problem (103), generated as follows:
1.
Generate A R q × n with independent and identically distributed (i.i.d.) standard Gaussian entries, then normalize A such that each column has unit norm.
2.
Randomly select a subset T { 1 , 2 , , n } of size p, and generate a p-sparse vector x orig with i.i.d. standard Gaussian entries on T.
3.
Set b = A x orig + 0.01 · n ^ , where n ^ is a random vector with i.i.d. standard Gaussian entries; set σ = 0.5 σ 1 2 , where σ 1 = 1.1 · ‖ 0.01 · n ^ ‖ .
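The three generation steps above can be sketched as a minimal instance generator. The function name and seed handling are ours, and σ 1 is read as 1.1 times the norm of the noise vector; this is an illustration under those assumptions, not the paper's script.

```python
import numpy as np

def gen_instance(q, n, p, seed=0):
    """Random instance of the Section 6.1 problem, following steps 1-3."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((q, n))
    A /= np.linalg.norm(A, axis=0)            # normalize columns to unit norm
    T = rng.choice(n, size=p, replace=False)  # random support of size p
    x_orig = np.zeros(n)
    x_orig[T] = rng.standard_normal(p)        # p-sparse ground truth
    noise = 0.01 * rng.standard_normal(q)
    b = A @ x_orig + noise
    sigma1 = 1.1 * np.linalg.norm(noise)      # sigma_1 = 1.1 * ||0.01 n_hat||
    sigma = 0.5 * sigma1 ** 2
    return A, b, x_orig, sigma
```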
  • Experimental Parameters and Result Metrics
In our numerical tests:
We set μ = 0.99 in Problem (105).
We considered parameter triples ( q , n , p ) = ( 720 i , 2560 i , 160 i ) for i { 2 , 4 , 6 , 8 , 10 } .
For each i, 20 random instances were generated (as above), and results were averaged over these 20 instances.
Computational results for ϵ = 10 4 and ϵ = 10 6 are presented in Table 1 and Table 2, respectively. The metrics reported include:
t QR : Time to compute the QR decomposition of A T .
t A : Time to compute ‖ A ‖ 2 .
t A b : Time to compute x 0 = A † b using the QR factorization of A T .
CPU time of each algorithm.
Iter: Number of iterations.
RecErr : = ‖ x * − x orig ‖ / max { 1 , ‖ x orig ‖ } : Recovery error (where x * is the approximate solution from the algorithm).
Residual : = ( ‖ A x * − b ‖ 2 − σ 1 2 ) / σ 1 2 : Residual of the constraint violation.
  • Key Observations from Results
From Table 1 and Table 2, we observe two main results:
1.
EAPG sr achieves the fastest computation speed among the five algorithms.
2.
The recovery errors (RecErr) and residuals of all five methods are comparable.

6.2. Compressed Sensing with Lorentzian Norm

Next, we consider Problem (101) with the Lorentzian norm, which transforms (101) into
min x ∈ R n ‖ x ‖ 1 − μ ‖ x ‖ s . t . ‖ A x − b ‖ L L 2 , γ ≤ σ , ‖ x ‖ ≤ M .
The Lorentzian norm is defined as follows [56]:
‖ y ‖ L L 2 , γ : = ∑ i = 1 q log ( 1 + y i 2 / γ 2 ) ,
where γ > 0 .
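For concreteness, the Lorentzian norm can be evaluated as below (`lorentzian_norm` is an illustrative helper of ours). Unlike ( 1 / 2 ) ‖ · ‖ 2 , it penalizes large residuals only logarithmically, which is why it is paired with heavy-tailed (Cauchy) noise in Section 6.2.

```python
import numpy as np

def lorentzian_norm(y, gamma):
    """||y||_{LL_2,gamma} = sum_i log(1 + y_i^2 / gamma^2)."""
    return float(np.sum(np.log1p((y / gamma) ** 2)))
```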
As proven in ([19] Subsection 6.2), Assumption 3 holds.
  • Details of the Five Algorithms
The setup of the five algorithms is detailed below:
(i)
Initialization and Stopping Criteria: All algorithms use the same initialization and stopping criteria as in Section 6.1.
(ii)
Parameter Settings:
For SCP ls , parameters follow the settings in [28].
For the other three algorithms, we set L g = 2 ‖ A ‖ 2 / γ 2 , l g = ‖ A ‖ 2 / ( 4 γ 2 ) , α 0 = 1.1 γ , d = γ 2 / ( 150 ‖ A ‖ 2 ) —consistent with [19].
For the acceleration parameters { θ k } of EAPG sr , since L g / ( L g + l g ) = 8 / 9 , it is straightforward to verify that K ≤ 31 . However, in practice, because all iterates { x k } lie within a bounded local region, the theoretical results may remain valid for larger values of K. As the next subsection will demonstrate, we can consistently use a large K and adaptively determine the restart period N (where N < K ) to enhance the performance of Algorithm 3. In fact, the restart period N observed in experiments was consistently less than 100. Thus, selecting any K ≥ 100 would ensure both consistent and improved experimental performance. For the lower bound of N, we also set N 0 = 20 , consistent with the setting in Section 6.1.
The subproblems of these algorithms are solved following the procedures outlined in the appendices of [28,45]. Consistent with the experimental results presented in Section 6.1, the overall computational complexity of each subproblem is also O ( q n ) .
  • Experimental Setup for Random Instances
Random instances are generated as follows:
1.
Generate A, subset T, size p, and sparse vector x orig using the same method as in Section 6.1.
2.
Set $b=Ax_{\mathrm{orig}}+0.01\cdot\tilde n$, where $\tilde n_i\sim\mathrm{Cauchy}(0,1)$. Specifically, $\tilde n_i$ is generated as $\tan\!\left(\pi\left(\hat n_i-\frac{1}{2}\right)\right)$, where $\hat n$ is a random vector with i.i.d. entries uniformly sampled from $[0,1]$.
3.
Set $\sigma=1.05\cdot\|0.01\cdot\tilde n\|_{LL_2,\gamma}$ with $\gamma=0.055$.
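Steps 2 and 3 can be sketched as follows in pure Python (fixed seed for reproducibility; the helper names are ours):

```python
import math
import random

def lorentzian_norm(y, gamma):
    return sum(math.log1p((yi / gamma) ** 2) for yi in y)

def cauchy_noise(q, seed=0):
    # n_i ~ Cauchy(0, 1) via the inverse CDF: tan(pi * (u - 1/2)), u ~ U[0, 1]
    rng = random.Random(seed)
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(q)]

q = 720
noise = [0.01 * v for v in cauchy_noise(q)]  # the term 0.01 * n~ added to A x_orig
sigma = 1.05 * lorentzian_norm(noise, gamma=0.055)
```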
  • Experimental Parameters and Result Metrics
In the numerical tests:
We set μ = 0.99 in Problem (110).
We considered parameter triples ( q , n , p ) = ( 720 i , 2560 i , 80 i ) for i { 2 , 4 , 6 , 8 , 10 } .
For each i, 20 random instances were generated, and results were averaged over these instances (consistent with Section 6.1).
Computational results for ϵ = 10 4 and ϵ = 10 6 are presented in Table 3 and Table 4, respectively. The reported metrics are identical to those in Section 6.1:
$t_{QR}$: Time to compute the QR decomposition of $A^{\mathsf T}$.
$t_{\|A\|}$: Time to compute $\|A\|^2$.
$t_{Ab}$: Time to compute $x^0=A^{\dagger}b$ using the QR factorization of $A^{\mathsf T}$.
CPU time of each algorithm.
Iter: Number of iterations.
RecErr $:=\frac{\|x^*-x_{\mathrm{orig}}\|}{\max\{1,\|x_{\mathrm{orig}}\|\}}$: Recovery error (where $x^*$ is the approximate solution returned by the algorithm).
Residual $:=\frac{\|Ax^*-b\|_{LL_2,\gamma}-\sigma}{\sigma}$: Residual of the Lorentzian norm constraint violation.
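The two accuracy metrics can be computed directly from their definitions; the sketch below is a pure-Python transcription (the function names are ours):

```python
import math

def lorentzian_norm(y, gamma):
    return sum(math.log1p((yi / gamma) ** 2) for yi in y)

def rec_err(x_star, x_orig):
    # ||x* - x_orig|| / max{1, ||x_orig||}
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_star, x_orig)))
    return diff / max(1.0, math.sqrt(sum(b * b for b in x_orig)))

def residual(Ax_star_minus_b, sigma, gamma):
    # (||A x* - b||_{LL_2,gamma} - sigma) / sigma
    return (lorentzian_norm(Ax_star_minus_b, gamma) - sigma) / sigma
```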
  • Key Observations from Results  
From Table 3 and Table 4, we observe a pattern consistent with that in Table 1 and Table 2:
1.
EAPG sr frequently demonstrates the fastest convergence speed among the five algorithms.
2.
The recovery errors (RecErr) and residuals of all five methods are comparable.

6.3. Analysis on the Settings of Algorithm 3

The preceding experiments have demonstrated the efficiency of Algorithm 3 relative to the other algorithms. In this subsection, we justify the settings in Algorithm 3 by illustrating the following conclusions with experimental results:
(i)
The restart period N determined by Algorithm 3 is a good approximation of the optimal fixed restart period for Algorithm 4.
(ii)
The restart scheme of Algorithm 3 outperforms the following alternative schemes:
  • Algorithms 1 and 2, each with $\{\theta_k\}$ defined in Remark 1(ii), with $K$ set to 30 and to 100.
  • Variant (a) of Algorithm 3: Restarts only based on the d k -criterion (without determining N or using the inner product criterion).
  • Variant (b) of Algorithm 3: Algorithm 3 with the inner product criterion removed.
  • Variant (c) of Algorithm 3: Determines the restart interval using both the d k -criterion and the inner product criterion (consistent with Algorithm 3).
  • Variant (d) of Algorithm 3: Algorithm 3 with the d k -criterion replaced by the inner product criterion.
  • Variant (e) of Algorithm 3 (Algorithm 3 incorporating the Armijo step size rule): Replace Equation (16) with the step size selection strategy outlined below:
    Let $\tilde x^{k+1}=\theta_k z^{k+1}+(1-\theta_k)x^k$. If $\theta_k=1$ or $F(\tilde x^{k+1})\le F(x^k)$, set $x^{k+1}=\tilde x^{k+1}$ directly. Otherwise, compute
    $$x^{k+1}=\tilde x^{k+1}+\beta^{p}\big(z^{k+1}-\tilde x^{k+1}\big),$$
    where $p$ denotes the smallest non-negative integer satisfying the inequality
    $$F\big(\tilde x^{k+1}+\beta^{p}(z^{k+1}-\tilde x^{k+1})\big)\le F(\tilde x^{k+1})-\beta^{p}\,\frac{c\,(1-\theta_k)\big(F(\tilde x^{k+1})-F(x^k)\big)}{\theta_k}.$$
    In the above criterion, the parameters are fixed as $c=0.1$ and $\beta=0.5$.
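A sketch of this backtracking rule for a generic objective $F$ on lists of floats (the helper name and the cap on backtracking steps are ours; the inequality is transcribed as reconstructed above):

```python
import math

def armijo_step(F, x_k, z_next, theta_k, c=0.1, beta=0.5, max_backtracks=50):
    # x_tilde = theta_k * z^{k+1} + (1 - theta_k) * x^k
    x_t = [theta_k * z + (1 - theta_k) * x for z, x in zip(z_next, x_k)]
    if theta_k == 1.0 or F(x_t) <= F(x_k):
        return x_t
    # Required decrease per unit step, proportional to F(x_tilde) - F(x^k) > 0.
    gap = c * (1 - theta_k) * (F(x_t) - F(x_k)) / theta_k
    for p in range(max_backtracks):
        step = beta ** p
        x_new = [xt + step * (z - xt) for xt, z in zip(x_t, z_next)]
        if F(x_new) <= F(x_t) - step * gap:
            return x_new
    return x_t  # fall back if no step size qualifies
```

Note that the rule only backtracks when the accelerated point $\tilde x^{k+1}$ increases the objective, which can happen for nonconvex $F$.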
To verify conclusion (i), we tested the fixed restarting version of Algorithm 1 on Problem (103) using a single dataset (where i = 2 ) for each N { 1 , 2 , , 50 } (with ϵ = 10 4 ):
Algorithm 4 Fixed restarting with period N
Initialization: Given x 0 , z 0 C , α 0 > 0 , d > 0 , a positive integer N, and { θ k } as defined in Remark A1(ii).
for   s = 1 , 2 ,  do
Execute Algorithm 1 for $N$ steps with initial values $x^0, z^0, \alpha_0$ and parameters $\{\theta_0,\theta_1,\ldots\}$; then set $x^0=z^0=z^N$ and $\alpha_0=\alpha_N$.
end for
Figure 1 presents the number of iterations for $N\in\{10,11,\ldots,50\}$; iterations for $N\in\{1,2,\ldots,9\}$ are excessively large (decreasing from 1776 at $N=1$ to 315 at $N=9$) and are omitted. From Figure 1, the optimal fixed restart period is identified as 22. By contrast, when applying Algorithm 3 to the same problem and dataset, the $d_k$-criterion yields a restart period $N=24$ (see Figure 2).
Similarly, for Problem (105), we conducted the same comparison: the optimal fixed restart period is 42 (see Figure 3); the d k -criterion yields a restart period N = 61 (see Figure 4), which is also among the approximately optimal fixed restart periods. Consistent findings were observed across other datasets, confirming that the N determined by Algorithm 3 approximates the optimal fixed restart period well.
We also note that the lower bound N 0 is necessary. Without N 0 , a small percentage of datasets may result in an extremely small restart period N, which in turn leads to a significantly higher number of iterations.
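The $d_k$-criterion with the lower bound $N_0$ can be sketched as follows (a minimal sketch under our reading of the criterion: restart at the first $k$ with $d_k > d_{k-1}$, floored at $N_0$; the function name is ours):

```python
def restart_period(d, n0=20):
    """Return the restart period N: the first index k with d[k] > d[k-1],
    floored at n0. Assumes d[k] holds the value of d_k at iteration k."""
    for k in range(1, len(d)):
        if d[k] > d[k - 1]:
            return max(k, n0)
    return max(len(d), n0)  # no increase observed yet

# Example: d_k decreases for 23 iterations, then increases at k = 24.
d = [1.0 / (k + 1) for k in range(24)] + [1.0]
N = restart_period(d)  # 24
```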
To verify conclusion (ii), we generated 20 random instances (following the same procedure as in Section 6.1 and Section 6.2) with i = 2 and ϵ = 10 4 . The averaged computational results are presented in Table 5 and Table 6, respectively. In both tables, Algorithm 3 consistently requires the fewest iterations, while the recovery errors of all algorithms are comparable. This confirms that Algorithm 3 is more effective than the other variants.
We note here that Algorithm 2 constitutes a genuine improvement over Algorithm 1; however, in the majority of test cases, only a single restart is triggered when implementing Algorithm 2. Thus, in practical scenarios, Algorithm 3 is able to identify more appropriate restart points and achieve more substantial performance gains.
On the other hand, Variant (e) delivers moderate performance among the variants of Algorithm 3. This observation indicates that line search techniques possess considerable potential for boosting the performance of algorithms such as EAPG$_s$. Further research into the conditions governing acceleration parameters (analogous to the momentum conditions for generalized acceleration parameters in proximal gradient-type algorithms with Nesterov's first acceleration [57]) could enable line search to yield more efficient performance improvements.

6.4. Experiments on Unconstrained DC Problems

In this subsection, we demonstrate that the EAPG s r algorithm yields notable performance improvements over IPOPT, GIST, pDCAe, and APG s when applied to unconstrained DC problems.
We consider the support vector machine (SVM) model with training data $\{(x_i,y_i)\}_{i=1}^m\subseteq\mathbb{R}^n\times\{-1,1\}$, as proposed in [58]:
$$\min_{b\in\mathbb{R},\,w\in\mathbb{R}^n}\ F(b,w)=f_1(b,w)-f_2(b,w)-f_3(b,w)+P_1(b,w),$$
where $P_1(b,w)=\lambda_1\|w\|_1+\frac{\|w\|_2^2}{2}+\frac{b^2}{2}$ (with $\lambda_1$ denoting the regularization parameter). For $j=1,2,3$, $f_j(b,w)=\frac{1}{m}\sum_{i=1}^m \ell_j\big(y_i(b+w^{\mathsf T}x_i)\big)$, and the smooth convex loss functions $\ell_j$ are defined as
$$\ell_1(t)=\begin{cases}\frac{4}{5}-t, & \text{if } t<\frac{3}{5},\\[2pt] \frac{5}{4}(1-t)^2, & \text{if } \frac{3}{5}\le t<1,\\[2pt] \frac{5}{8}(1-t)^2, & \text{if } 1\le t<\frac{7}{5},\\[2pt] \frac{1}{2}\left(t-\frac{6}{5}\right), & \text{if } t\ge\frac{7}{5},\end{cases}$$
$$\ell_2(t)=\begin{cases}-t-\frac{1}{5}, & \text{if } t\le-\frac{2}{5},\\[2pt] \frac{5}{4}t^2, & \text{if } -\frac{2}{5}<t\le 0,\\[2pt] 0, & \text{if } t>0,\end{cases}$$
$$\ell_3(t)=\begin{cases}0, & \text{if } t\le\frac{8}{5},\\[2pt] \frac{5}{8}\left(t-\frac{8}{5}\right)^2, & \text{if } \frac{8}{5}<t<2,\\[2pt] \frac{1}{2}\left(t-\frac{9}{5}\right), & \text{if } t\ge 2.\end{cases}$$
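The three piecewise losses transcribe directly to code; the sketch below (fractions written as decimals, function names ours) lets one verify that each piece matches its neighbor at the breakpoints, i.e., that the losses are continuous:

```python
def ell1(t):
    if t < 0.6:
        return 0.8 - t
    if t < 1.0:
        return 1.25 * (1.0 - t) ** 2
    if t < 1.4:
        return 0.625 * (1.0 - t) ** 2
    return 0.5 * (t - 1.2)

def ell2(t):
    if t <= -0.4:
        return -t - 0.2
    if t <= 0.0:
        return 1.25 * t * t
    return 0.0

def ell3(t):
    if t <= 1.6:
        return 0.0
    if t < 2.0:
        return 0.625 * (t - 1.6) ** 2
    return 0.5 * (t - 1.8)
```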
Implementation Details of the Five Algorithms
We specify the setup of the five competing algorithms in detail below:
(i)
Objective Function Formulation:
  • For GIST, the objective is cast as $F=f+P_1$, where $f=f_1-f_2-f_3$;
  • For pDCAe, $F$ is decomposed into $F=f+P_1-P_2$, with $f=f_1$ and $P_2=f_2+f_3$;
  • For APG$_s$ and EAPG$_{sr}$, the decomposition takes the form $F=f+P_1-P_2$, where $f=f_1-f_2$ and $P_2=f_3$;
  • For IPOPT, to accommodate its requirement for differentiable objectives, we introduce non-negative slack variables u , v such that w = u v , thereby reformulating F ( b , w ) into a differentiable form.
(ii)
Initialization and Termination Criteria: For each dataset, a total of 21 initial points $(b^0,w^0)$ are used uniformly across all algorithms except IPOPT: one zero vector, plus 5 vectors independently sampled from the normal distribution $N(0,\sigma^2 I)$ for each $\sigma\in\{1,2,4,8\}$. All algorithms terminate if either the relative iterate difference satisfies
$$\frac{\|(b^{k+1};w^{k+1})-(b^k;w^k)\|}{\max\{1,\|(b^k;w^k)\|\}}<10^{-6}$$
or the iteration count reaches the upper limit of 3000.
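This termination test can be sketched as follows (function names ours):

```python
import math

def rel_change(new, old):
    # ||new - old|| / max{1, ||old||}
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(new, old)))
    den = max(1.0, math.sqrt(sum(b * b for b in old)))
    return num / den

def should_stop(new, old, k, tol=1e-6, max_iter=3000):
    return rel_change(new, old) < tol or k >= max_iter
```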
For IPOPT, corresponding to each of the aforementioned initial points $(b^0,w^0)$, we initialize the slack variables as $u^0=[w^0]_+ +0.001\cdot\mathbf{1}$ and $v^0=[-w^0]_+ +0.001\cdot\mathbf{1}$. The convergence tolerance is set to $10^{-6}$, with the maximum number of iterations configured to 1000.
(iii)
Parameter Configurations: The parameters for GIST, pDCAe, and APG s are set in accordance with their respective original studies [26,52,53]. For EAPG s r , we fix the restart parameter N 0 = 20 . All IPOPT settings are retained at their default values, with the only exception being the convergence tolerance (adjusted as specified above) and the maximum number of iterations.
A total of 8 real-world datasets are selected from the UCI Machine Learning Repository [59] for testing.
  • Experimental Parameters and Evaluation Metrics
In our numerical experiments, the following parameter specifications and performance metrics are adopted:
The regularization parameter in Problem (107) is set to $\lambda_1=1\times 10^{-3}$;
The Lipschitz constant $L$ for all algorithms is taken as
$$L=\frac{5}{2}\cdot\frac{1}{m}\sum_{i=1}^{m}y_i^2\big(1+\|x_i\|^2\big),$$
which is derived from ([58] Proposition 1).
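Computing $L$ from the training data is a one-liner; the sketch below is a pure-Python transcription (function name ours):

```python
def lipschitz_L(xs, ys):
    # L = (5/2) * (1/m) * sum_i y_i^2 * (1 + ||x_i||^2)
    m = len(xs)
    return 2.5 * sum(y * y * (1.0 + sum(c * c for c in x))
                     for x, y in zip(xs, ys)) / m
```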
Computational results are summarized in Table 7, with the following metrics reported for each algorithm:
Iter: Number of iterations required to reach convergence;
Fval: Final value of the objective function at termination;
Time: Total CPU time (in seconds) consumed during the optimization process.
The experimental results in Table 7 consistently verify three core advantages of the EAPG s r algorithm:
1.
It attains the second-lowest iteration count in most test cases.
2.
It achieves optimal objective function values across most datasets.
3.
It incurs the shortest computational time among all competing methods.
These findings confirm that EAPG s r also exhibits superior performance in solving unconstrained DC problems.

7. Conclusions

In this paper, we extend the proximal gradient method with Nesterov’s second acceleration technique ( APG s ; see [23,25,26,44])—originally designed for unconstrained DC problems—to an extended version ( EAPG s ) for constrained DC problems. This extension draws on the constraint handling idea from the extended sequential quadratic method (ESQM) introduced in [18].
We establish the subsequential convergence of the entire sequence under appropriate assumptions. Additionally, by incorporating a restart technique, we further derive a global convergence result. Notably, this global convergence requires weaker assumptions on the function P 1 compared to those in [25,26].
Guided by our theoretical analysis, we further propose a practical variant of EAPG$_s$ (dubbed EAPG$_{sr}$, detailed in Algorithm 3) with efficient restart criteria. Numerical experiments demonstrate that, in most cases, EAPG$_{sr}$ achieves high-quality solutions to the test problems with fewer iterations and lower CPU time.
Our core theoretical contributions are summarized as follows:
(1)
Extending the APG s method to the setting of constrained DC problems, filling the gap between unconstrained DC optimization and constrained DC problem solving.
(2)
Deriving a global convergence result for the restart-augmented EAPG s framework.
(3)
Weakening the original regularity requirements on the function P 1 that underpin the convergence of the baseline APG s method.
From a numerical perspective, EAPG s r demonstrates two key practical merits:
(a)
Providing a competitive new approach for solving constrained DC problems, with performance comparable to or exceeding state-of-the-art methods.
(b)
Delivering significant performance gains (attributed to the embedded restart technique) over the baseline APG s method when applied to unconstrained DC problems.
Building on the advances of this work, several promising avenues for future research are identified:
(i)
Identifying the conditions under which Inequality (65) holds for Algorithm 1.
(ii)
Conducting a theoretical analysis of the restart criteria for Algorithm 3.
(iii)
Developing efficient solvers for subproblems involving more generalized forms of P 1 and cases with m > 1 .
(iv)
Exploring techniques (e.g., adaptive line search rules) to identify optimal acceleration parameters, further enhancing the algorithm’s computational efficiency.

Author Contributions

Conceptualization, C.L.; Methodology, C.L.; Validation, Z.L. and H.K.; Data curation, Z.L.; Writing—original draft, Z.L. and H.K.; Writing—review & editing, Z.L., H.K. and C.L.; Supervision, C.L.; Funding acquisition, C.L. All authors have read and agreed to the published version of the manuscript.

Funding

The third author is supported by the National Natural Science Foundation of China (Project No. 12571102).

Data Availability Statement

The codes for generating the random data and implementing the algorithms in the numerical section are available from the corresponding author upon request.

Acknowledgments

The authors wish to express their sincere gratitude to the reviewers for their insightful comments and constructive suggestions, which have significantly contributed to the improvement of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Le Thi, H.A.; Pham Dinh, T. DC programming and DCA: Thirty years of developments. Math. Program. 2018, 169, 5–68.
  2. Hong, L.J.; Yang, Y.; Zhang, L.W. Sequential Convex Approximations to Joint Chance Constrained Programs: A Monte Carlo Approach. Oper. Res. 2011, 59, 617–630.
  3. Geremew, W.; Nam, N.M.; Semenov, A.; Boginski, V.; Pasiliao, E. A DC programming approach for solving multicast network design problems via the Nesterov smoothing technique. J. Glob. Optim. 2018, 72, 705–729.
  4. Shen, C.G.; Liu, X. Solving nonnegative sparsity-constrained optimization via DC quadratic-piecewise-linear approximations. J. Glob. Optim. 2021, 81, 1019–1055.
  5. van Ackooij, W.; de Oliveira, W. Non-smooth DC-constrained optimization: Constraint qualification and minimizing methodologies. Optim. Methods Softw. 2019, 34, 890–920.
  6. Lu, Z.S.; Sun, Z.; Zhou, Z.R. Penalty and Augmented Lagrangian Methods for Constrained DC Programming. Math. Oper. Res. 2022, 47, 1707–2545.
  7. Pang, J.-S.; Razaviyayn, M.; Alvarado, A. Computing B-Stationary Points of Nonsmooth DC Programs. Math. Oper. Res. 2017, 42, 95–118.
  8. Alvarado, A.; Scutari, G.; Pang, J.-S. A new decomposition method for multiuser DC programming and its applications. IEEE Trans. Signal Process. 2014, 62, 2984–2998.
  9. Zhang, S.; Xin, J. Minimization of transformed L1 penalty: Theory, difference of convex function algorithm, and robust application in compressed sensing. Math. Program. 2018, 169, 307–336.
  10. Sanjabi, M.; Razaviyayn, M.; Luo, Z.-Q. Optimal joint base station assignment and beamforming for heterogeneous networks. IEEE Trans. Signal Process. 2014, 62, 1950–1961.
  11. Candès, E.; Recht, B. Exact matrix completion via convex optimization. Commun. ACM 2012, 55, 111–119.
  12. Nakayama, S.; Gotoh, J.Y. On the superiority of PGMs to PDCAs in nonsmooth nonconvex sparse regression. Optim. Lett. 2021, 15, 2831–2860.
  13. Le, V.L.; Lauer, F.; Bloch, G. Selective ℓ1 minimization for sparse recovery. IEEE Trans. Autom. Control 2014, 59, 3008–3013.
  14. Wang, W.; Chen, Y. An accelerated smoothing gradient method for nonconvex nonsmooth minimization in image processing. J. Sci. Comput. 2022, 90, 31.
  15. Pham Dinh, T.; Souad, E.B. Algorithms for solving a class of nonconvex optimization problems. Methods of subgradients. In Fermat Days 85: Mathematics for Optimization; Hiriart-Urruty, J.-B., Ed.; North-Holland: Amsterdam, The Netherlands, 1986; pp. 249–271.
  16. Gill, P.E.; Wong, E. Sequential quadratic programming methods. In Mixed Integer Nonlinear Programming; Lee, J., Leyffer, S., Eds.; Springer: New York, NY, USA, 2012; pp. 147–224.
  17. Yin, P.; Lou, Y.; He, Q.; Xin, J. Minimization of ℓ1−2 for compressed sensing. SIAM J. Sci. Comput. 2015, 37, A536–A563.
  18. Auslender, A. An extended sequential quadratically constrained quadratic programming algorithm for nonlinear, semidefinite, and second-order cone programming. J. Optim. Theory Appl. 2013, 156, 183–212.
  19. Zhang, Y.; Pong, T.K.; Xu, S. An extended sequential quadratic method with extrapolation. Comput. Optim. Appl. 2025, 91, 1185–1225.
  20. Gaudioso, M.; Taheri, S.; Bagirov, A.M.; Karmitsa, N. Bundle Enrichment Method for Nonsmooth Difference of Convex Programming Problems. Algorithms 2023, 16, 394.
  21. Tseng, P. Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. 2010, 125, 263–295.
  22. Beck, A.; Teboulle, M. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J. Imaging Sci. 2009, 2, 183–202.
  23. Auslender, A.; Teboulle, M. Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. 2006, 16, 697–725.
  24. Wen, B.; Chen, X.; Pong, T.K. Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optim. 2017, 27, 124–145.
  25. Lin, D.; Liu, C. The modified second APG method for DC optimization problems. Optim. Lett. 2019, 13, 805–824.
  26. Ren, K.; Liu, C.; Wang, L. The modified second APG method for a class of nonconvex nonsmooth problems. Optim. Lett. 2025, 19, 747–770.
  27. Lu, Z. Sequential convex programming methods for a class of structured nonlinear programming. arXiv 2012, arXiv:1210.3039.
  28. Yu, P.; Pong, T.K.; Lu, Z. Convergence rate analysis of a sequential convex programming method with line search for a class of constrained difference-of-convex optimization problems. SIAM J. Optim. 2021, 31, 2024–2054.
  29. Wilson, R.B. A Simplicial Method for Convex Programming. Ph.D. Thesis, Harvard University, Cambridge, MA, USA, 1963.
  30. Solodov, M.V. On the sequential quadratically constrained quadratic programming methods. Math. Oper. Res. 2004, 29, 64–79.
  31. Fukushima, M.; Luo, Z.-Q.; Tseng, P. A sequential quadratically constrained quadratic programming method for differentiable convex minimization. SIAM J. Optim. 2003, 13, 1098–1119.
  32. Solodov, M.V. Global convergence of an SQP method without boundedness assumptions on any of the iterative sequences. Math. Program. 2009, 118, 1–12.
  33. O’Donoghue, B.; Candès, E. Adaptive Restart for Accelerated Gradient Schemes. Found. Comput. Math. 2015, 15, 715–732.
  34. Rockafellar, R.T.; Wets, R.J.B. Variational Analysis, 3rd ed.; Springer Science & Business Media: Berlin, Germany, 2009; pp. 298–472.
  35. Bolte, J.; Daniilidis, A.; Lewis, A.; Shiota, M. Clarke subgradients of stratifiable functions. SIAM J. Optim. 2007, 18, 556–572.
  36. Wang, L.; Liu, Z.; Liu, C. The Bregman Modified Second APG Method for DC Optimization Problems. IEEE Access 2025, 13, 126070–126083.
  37. Attouch, H.; Bolte, J.; Redont, P.; Soubeyran, A. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 2010, 35, 438–457.
  38. Attouch, H.; Bolte, J.; Svaiter, B.F. Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 2013, 137, 91–129.
  39. Bolte, J.; Sabach, S.; Teboulle, M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 2014, 146, 459–494.
  40. Li, G.; Pong, T.K. Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 2018, 18, 1199–1232.
  41. Qian, Y.; Pan, S. Convergence of a class of nonmonotone descent methods for Kurdyka-Łojasiewicz optimization problems. SIAM J. Optim. 2023, 33, 638–651.
  42. Wen, B.; Chen, X.; Pong, T.K. A proximal difference-of-convex algorithm with extrapolation. Comput. Optim. Appl. 2018, 69, 297–324.
  43. Bot, R.I.; Dao, M.N.; Li, G. Extrapolated proximal subgradient algorithms for nonconvex and nonsmooth fractional programs. Math. Oper. Res. 2022, 47, 2415–2443.
  44. Nesterov, Y. Introductory Lectures on Convex Optimization: A Basic Course; Kluwer Academic Publishers: Boston, MA, USA, 2004; pp. 20–105.
  45. Zhang, Y.; Li, G.; Pong, T.K.; Xu, S. Retraction-based first-order feasible methods for difference-of-convex programs with smooth inequality and simple geometric constraints. Adv. Comput. Math. 2023, 49, 1–40.
  46. Chen, G.; Teboulle, M. Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 1993, 3, 538–543.
  47. Rockafellar, R.T. Convex Analysis; Princeton University Press: Princeton, NJ, USA, 1970; pp. 227–240.
  48. Bot, R.I.; Nguyen, D.K. The proximal alternating direction method of multipliers in the nonconvex setting: Convergence analysis and rates. Math. Oper. Res. 2020, 45, 682–712.
  49. Rockafellar, R.T.; Wets, R.J.-B. Variational Analysis; Springer: Berlin, Germany, 1998; pp. 298–472.
  50. Wächter, A.; Biegler, L.T. On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Math. Program. 2006, 106, 25–57.
  51. COIN-OR. Ipopt (Interior Point Optimizer). Computer Software. 2006. Available online: https://projects.coin-or.org/Ipopt (accessed on 28 October 2025).
  52. Chen, X.; Lu, Z.; Pong, T.K. Penalty methods for a class of non-Lipschitz optimization problems. SIAM J. Optim. 2016, 26, 1465–1492.
  53. Liu, T.; Pong, T.K.; Takeda, A. A refined convergence analysis of pDCAe with applications to simultaneous sparse recovery and outlier detection. Comput. Optim. Appl. 2019, 73, 69–100.
  54. Esser, E.; Lou, Y.; Xin, J. A method for finding structured sparse solutions to non-negative least squares problems with applications. SIAM J. Imaging Sci. 2013, 6, 2010–2046.
  55. Lou, Y.; Yin, P.; He, Q.; Xin, J. Computing sparse representation in a highly coherent dictionary based on difference of L1 and L2. J. Sci. Comput. 2015, 64, 178–196.
  56. Carrillo, R.E.; Barner, K.E.; Aysal, T.C. Robust sampling and reconstruction methods for sparse signals in the presence of impulsive noise. IEEE J. Sel. Top. Signal Process. 2010, 4, 392–408.
  57. Lin, Y.Z.; Li, S.; Zhang, Y.Z. Convergence Rate Analysis of Accelerated Forward-Backward Algorithm with Generalized Nesterov Momentum Scheme. Int. J. Numer. Anal. Model. 2023, 20, 518–537.
  58. Zhu, W.; Song, Y.; Xiao, Y. Robust support vector machine classifier with truncated loss function by gradient algorithm. Comput. Ind. Eng. 2022, 172, 108630.
  59. Dua, D.; Graff, C. UCI Machine Learning Repository. 2017. Available online: http://archive.ics.uci.edu/ml (accessed on 28 October 2025).
Figure 1. Total number of iterations of Algorithm 4 versus the fixed restart period N. The minimum number of iterations is achieved at N = 22 (marked by the red circle). For comparison, the value of N determined by Algorithm 3 is 24 (marked by the red star).
Figure 2. The value of d k at each k-th iteration of Algorithm 3. The first iteration k for which d k > d k 1 is 24 (marked by the red star), and thus the restart period N is set to 24.
Figure 3. Total number of iterations of Algorithm 4 versus the fixed restart period N. The minimum number of iterations is achieved at N = 42 (marked by the red circle). For comparison, the value of N determined by Algorithm 3 is 61 (marked by the red star).
Figure 4. The value of d k at each k-th iteration of Algorithm 3. The first iteration k for which d k > d k 1 is 61 (marked by the red star), and thus the restart period N is set to 61.
Table 1. Computational results for Problem (103) with $\epsilon=10^{-4}$.

| | Method | i = 2 | i = 4 | i = 6 | i = 8 | i = 10 |
|---|---|---|---|---|---|---|
| Time (sec) | t_QR | 0.818 | 5.449 | 18.567 | 43.191 | 91.172 |
| | t_Ab | 0.008 | 0.034 | 0.081 | 0.149 | 0.264 |
| | t_‖A‖ | 1.321 | 2.062 | 6.486 | 13.789 | 27.855 |
| | IPOPT | 4.141 | 12.439 | 30.015 | 43.383 | 88.684 |
| | SCP_ls | 3.804 | 15.112 | 31.714 | 56.368 | 92.386 |
| | ESQM_b | 15.321 | 65.436 | 143.914 | 262.850 | 427.229 |
| | ESQM_e | 0.976 | 4.133 | 9.150 | 16.533 | 26.966 |
| | EAPG_sr | 0.986 | 3.963 | 8.812 | 16.132 | 25.874 |
| Iter | IPOPT | 108 | 134 | 146 | 162 | 180 |
| | SCP_ls | 212 | 222 | 217 | 218 | 225 |
| | ESQM_b | 1705 | 1777 | 1786 | 1793 | 1816 |
| | ESQM_e | 108 | 112 | 113 | 112 | 113 |
| | EAPG_sr | 101 | 102 | 104 | 103 | 104 |
| RecErr | IPOPT | 0.048687 | 0.051378 | 0.050837 | 0.051987 | 0.052050 |
| | SCP_ls | 0.049113 | 0.051792 | 0.051330 | 0.052505 | 0.052568 |
| | ESQM_b | 0.066395 | 0.071510 | 0.071093 | 0.072476 | 0.072915 |
| | ESQM_e | 0.048744 | 0.051471 | 0.050940 | 0.052131 | 0.052169 |
| | EAPG_sr | 0.049920 | 0.053142 | 0.052955 | 0.053912 | 0.053758 |
| Residual | IPOPT | 1.99 × 10⁻⁶ | 1.91 × 10⁻⁶ | 1.97 × 10⁻⁶ | 1.98 × 10⁻⁶ | 2.10 × 10⁻⁶ |
| | SCP_ls | 5.69 × 10⁻⁷ | 5.62 × 10⁻⁷ | 6.09 × 10⁻⁷ | 5.63 × 10⁻⁷ | 6.93 × 10⁻⁷ |
| | ESQM_b | 6.79 × 10⁻⁷ | 5.73 × 10⁻⁷ | 5.79 × 10⁻⁷ | 5.63 × 10⁻⁷ | 5.49 × 10⁻⁷ |
| | ESQM_e | 1.22 × 10⁻⁷ | 1.17 × 10⁻⁷ | 1.03 × 10⁻⁷ | 1.08 × 10⁻⁷ | 1.03 × 10⁻⁷ |
| | EAPG_sr | 4.82 × 10⁻⁷ | 6.22 × 10⁻⁷ | 5.90 × 10⁻⁷ | 6.08 × 10⁻⁷ | 6.12 × 10⁻⁷ |
Table 2. Computational results for Problem (103) with $\epsilon=10^{-6}$.

| | Method | i = 2 | i = 4 | i = 6 | i = 8 | i = 10 |
|---|---|---|---|---|---|---|
| Time (sec) | t_QR | 0.883 | 5.840 | 18.999 | 43.567 | 90.874 |
| | t_Ab | 0.009 | 0.035 | 0.078 | 0.149 | 0.264 |
| | t_‖A‖ | 1.365 | 2.199 | 6.517 | 13.888 | 27.822 |
| | IPOPT | 4.781 | 13.733 | 29.939 | 52.517 | 100.201 |
| | SCP_ls | 4.388 | 17.595 | 36.548 | 65.808 | 107.272 |
| | ESQM_b | 23.894 | 103.784 | 232.600 | 423.241 | 690.270 |
| | ESQM_e | 1.679 | 7.989 | 18.182 | 34.184 | 57.380 |
| | EAPG_sr | 1.584 | 7.392 | 14.545 | 27.413 | 46.253 |
| Iter | IPOPT | 127 | 152 | 165 | 181 | 200 |
| | SCP_ls | 242 | 256 | 251 | 254 | 260 |
| | ESQM_b | 2683 | 2792 | 2867 | 2866 | 2902 |
| | ESQM_e | 187 | 214 | 223 | 231 | 240 |
| | EAPG_sr | 161 | 168 | 168 | 171 | 169 |
| RecErr | IPOPT | 0.048687 | 0.051378 | 0.050836 | 0.051986 | 0.052049 |
| | SCP_ls | 0.048695 | 0.051389 | 0.050848 | 0.051997 | 0.052060 |
| | ESQM_b | 0.048820 | 0.051618 | 0.050987 | 0.052167 | 0.052220 |
| | ESQM_e | 0.048689 | 0.051379 | 0.050840 | 0.051985 | 0.052049 |
| | EAPG_sr | 0.048708 | 0.051406 | 0.050868 | 0.052010 | 0.052074 |
| Residual | IPOPT | 1.77 × 10⁻⁶ | 1.74 × 10⁻⁶ | 1.73 × 10⁻⁶ | 1.73 × 10⁻⁶ | 1.75 × 10⁻⁶ |
| | SCP_ls | 5.69 × 10⁻¹⁰ | 5.61 × 10⁻¹⁰ | 6.08 × 10⁻¹⁰ | 5.52 × 10⁻¹⁰ | 6.78 × 10⁻¹⁰ |
| | ESQM_b | 1.01 × 10⁻¹⁰ | 3.07 × 10⁻¹⁰ | 1.05 × 10⁻¹⁰ | 1.36 × 10⁻¹⁰ | 1.30 × 10⁻¹⁰ |
| | ESQM_e | 5.83 × 10⁻¹¹ | 2.06 × 10⁻¹¹ | 3.19 × 10⁻¹¹ | 2.43 × 10⁻¹¹ | 1.96 × 10⁻¹¹ |
| | EAPG_sr | 3.63 × 10⁻¹¹ | 2.59 × 10⁻¹² | 9.27 × 10⁻¹¹ | 6.51 × 10⁻¹¹ | 3.74 × 10⁻¹¹ |
Table 3. Computational results for Problem (105) with $\epsilon=10^{-4}$.

| | Method | i = 2 | i = 4 | i = 6 | i = 8 | i = 10 |
|---|---|---|---|---|---|---|
| Time (sec) | t_QR | 0.763 | 5.485 | 18.964 | 46.317 | 90.568 |
| | t_Ab | 0.008 | 0.038 | 0.092 | 0.167 | 0.269 |
| | t_‖A‖ | 1.280 | 2.201 | 6.614 | 14.669 | 27.917 |
| | IPOPT | 13.380 | 25.354 | 31.578 | 50.119 | 122.681 |
| | SCP_ls | 2.861 | 11.071 | 15.250 | 51.427 | 67.749 |
| | ESQM_b | 9.879 | 41.459 | 89.737 | 180.831 | 279.228 |
| | ESQM_e | 1.803 | 7.540 | 16.394 | 32.972 | 50.844 |
| | EAPG_sr | 1.541 | 6.269 | 13.451 | 26.205 | 42.090 |
| Iter | IPOPT | 334 | 268 | 162 | 151 | 191 |
| | SCP_ls | 183 | 181 | 117 | 207 | 180 |
| | ESQM_b | 1136 | 1149 | 1146 | 1195 | 1163 |
| | ESQM_e | 204 | 207 | 208 | 217 | 209 |
| | EAPG_sr | 170 | 169 | 165 | 167 | 170 |
| RecErr | IPOPT | 0.762756 | 3051.699141 | 0.084689 | 0.084819 | 15.288138 |
| | SCP_ls | 0.081013 | 0.081612 | 0.084733 | 0.083314 | 0.086727 |
| | ESQM_b | 0.086207 | 0.087180 | 0.090687 | 0.089185 | 0.092889 |
| | ESQM_e | 0.080836 | 0.081385 | 0.084550 | 0.083104 | 0.086500 |
| | EAPG_sr | 0.081517 | 0.081882 | 0.085460 | 0.083959 | 0.087379 |
| Residual | IPOPT | 1.21 × 10⁰ | 2.03 × 10⁰ | 2.22 × 10⁻⁷ | 2.03 × 10⁻⁷ | 1.96 × 10⁰ |
| | SCP_ls | 4.88 × 10⁻⁸ | 4.03 × 10⁻⁸ | 6.11 × 10⁻⁸ | 3.70 × 10⁻⁸ | 4.19 × 10⁻⁸ |
| | ESQM_b | 9.40 × 10⁻⁸ | 9.44 × 10⁻⁸ | 9.19 × 10⁻⁸ | 9.43 × 10⁻⁸ | 9.11 × 10⁻⁸ |
| | ESQM_e | 3.21 × 10⁻⁸ | 2.71 × 10⁻⁸ | 1.87 × 10⁻⁸ | 2.41 × 10⁻⁸ | 3.15 × 10⁻⁸ |
| | EAPG_sr | 2.98 × 10⁻⁸ | 2.41 × 10⁻⁸ | 3.26 × 10⁻⁸ | 3.20 × 10⁻⁸ | 2.31 × 10⁻⁸ |
Table 4. Computational results for Problem (105) with $\epsilon=10^{-6}$.

| | Method | i = 2 | i = 4 | i = 6 | i = 8 | i = 10 |
|---|---|---|---|---|---|---|
| Time (sec) | t_QR | 0.910 | 5.989 | 19.148 | 44.312 | 91.606 |
| | t_Ab | 0.009 | 0.034 | 0.081 | 0.146 | 0.263 |
| | t_‖A‖ | 1.440 | 2.323 | 6.580 | 14.279 | 28.077 |
| | IPOPT | 16.414 | 27.392 | 35.748 | 56.783 | 135.875 |
| | SCP_ls | 3.357 | 12.381 | 18.140 | 52.182 | 72.401 |
| | ESQM_b | 14.465 | 59.228 | 130.443 | 243.864 | 385.851 |
| | ESQM_e | 2.457 | 9.712 | 21.284 | 40.449 | 65.420 |
| | EAPG_sr | 2.503 | 9.822 | 20.052 | 37.168 | 59.899 |
| Iter | IPOPT | 343 | 277 | 172 | 161 | 201 |
| | SCP_ls | 198 | 196 | 134 | 225 | 196 |
| | ESQM_b | 1555 | 1580 | 1604 | 1650 | 1632 |
| | ESQM_e | 259 | 258 | 260 | 272 | 276 |
| | EAPG_sr | 261 | 259 | 243 | 249 | 250 |
| RecErr | IPOPT | 0.762755 | 3051.699141 | 0.084688 | 0.084818 | 15.288138 |
| | SCP_ls | 0.080891 | 0.081482 | 0.084545 | 0.083153 | 0.086576 |
| | ESQM_b | 0.080938 | 0.081531 | 0.084597 | 0.083204 | 0.086629 |
| | ESQM_e | 0.080887 | 0.081479 | 0.084540 | 0.083149 | 0.086571 |
| | EAPG_sr | 0.080896 | 0.081489 | 0.084538 | 0.083148 | 0.086569 |
| Residual | IPOPT | 1.21 × 10⁰ | 2.03 × 10⁰ | 1.75 × 10⁻⁷ | 1.71 × 10⁻⁷ | 1.96 × 10⁰ |
| | SCP_ls | 4.60 × 10⁻¹¹ | 2.96 × 10⁻¹¹ | 2.97 × 10⁻¹¹ | 4.15 × 10⁻¹¹ | 3.98 × 10⁻¹¹ |
| | ESQM_b | 1.07 × 10⁻¹¹ | 1.03 × 10⁻¹¹ | 9.55 × 10⁻¹² | 9.66 × 10⁻¹² | 9.25 × 10⁻¹² |
| | ESQM_e | 3.40 × 10⁻¹² | 6.13 × 10⁻¹² | 4.77 × 10⁻¹² | 3.88 × 10⁻¹² | 1.69 × 10⁻¹² |
| | EAPG_sr | 6.00 × 10⁻¹² | 4.99 × 10⁻¹² | 5.67 × 10⁻¹² | 5.66 × 10⁻¹² | 9.82 × 10⁻¹² |
Table 5. Computational results for Problem (103) with $i=2$, $\epsilon=10^{-4}$.

| Method | Time | Iter | RecErr | Residual |
|---|---|---|---|---|
| Algorithm 1 (K = 30) | 3.901 | 302 | 0.048687 | 9.72 × 10⁻⁷ |
| Algorithm 2 (K = 30) | 2.179 | 184 | 0.048684 | 4.36 × 10⁻¹⁰ |
| Algorithm 1 (K = 100) | 10.828 | 825 | 0.048687 | 9.91 × 10⁻⁷ |
| Algorithm 2 (K = 100) | 4.339 | 360 | 0.048689 | 2.09 × 10⁻⁸ |
| Algorithm 3 | 1.029 | 101 | 0.049920 | 4.82 × 10⁻⁷ |
| Variant (a) | 1.827 | 173 | 0.048779 | 7.19 × 10⁻⁷ |
| Variant (b) | 1.009 | 101 | 0.049920 | 4.82 × 10⁻⁷ |
| Variant (c) | 1.010 | 101 | 0.049920 | 4.82 × 10⁻⁷ |
| Variant (d) | 1.376 | 135 | 0.049629 | 2.74 × 10⁻⁷ |
| Variant (e) | 1.298 | 129 | 0.049143 | 5.64 × 10⁻⁷ |
Table 6. Computational results for Problem (105) with $i=2$, $\epsilon=10^{-4}$.

| Method | Time | Iter | RecErr | Residual |
|---|---|---|---|---|
| Algorithm 1 (K = 30) | 3.983 | 349 | 0.080888 | 9.78 × 10⁻⁸ |
| Algorithm 2 (K = 30) | 1.998 | 205 | 0.080925 | 3.81 × 10⁻⁹ |
| Algorithm 1 (K = 100) | 12.048 | 935 | 0.080888 | 9.89 × 10⁻⁸ |
| Algorithm 2 (K = 100) | 3.821 | 361 | 0.080886 | 6.69 × 10⁻⁹ |
| Algorithm 3 | 1.697 | 170 | 0.081517 | 2.98 × 10⁻⁸ |
| Variant (a) | 7.611 | 762 | 0.081102 | 7.24 × 10⁻⁸ |
| Variant (b) | 2.280 | 204 | 0.080623 | 1.56 × 10⁻⁸ |
| Variant (c) | 2.261 | 204 | 0.080623 | 1.56 × 10⁻⁸ |
| Variant (d) | 2.124 | 201 | 0.080910 | 1.96 × 10⁻⁸ |
| Variant (e) | 1.791 | 187 | 0.080645 | 6.61 × 10⁻⁸ |
Table 7. Comparison of five methods on 8 datasets.

Iter
Dataset | IPOPT | GIST | pDCAe | APG_s | EAPG_s^r
Australian | 138 | 678 | 343 | 179 | 148
Banknote | 46 | 2859 | 3000 | 78 | 76
Blood | 45 | 290 | 157 | 79 | 74
German | 110 | 2636 | 215 | 356 | 288
Glass | 101 | 1401 | 206 | 174 | 126
Hepatitis | 116 | 1269 | 230 | 227 | 187
Landmines | 37 | 490 | 3000 | 69 | 70
Tic | 66 | 67 | 147 | 120 | 107

Fval
Dataset | IPOPT | GIST | pDCAe | APG_s | EAPG_s^r
Australian | 0.417258 | 4.222325 | 0.753193 | 0.417256 | 0.417256
Banknote | 0.524224 | 0.524248 | 0.749316 | 0.524223 | 0.524223
Blood | 0.559554 | 0.559553 | 0.565832 | 0.559553 | 0.559553
German | 0.590471 | 8.832912 | 0.618164 | 0.590463 | 0.590463
Glass | 0.373864 | 0.373862 | 0.566074 | 0.374673 | 0.374403
Hepatitis | 0.415421 | 0.415417 | 0.518707 | 0.415417 | 0.415417
Landmines | 0.521345 | 0.521344 | 0.774479 | 0.521344 | 0.521344
Tic | 0.653866 | 0.653864 | 0.656695 | 0.653864 | 0.653864

Time (s)
Dataset | IPOPT | GIST | pDCAe | APG_s | EAPG_s^r
Australian | 0.345 | 0.973 | 0.155 | 0.064 | 0.053
Banknote | 0.132 | 5.630 | 2.422 | 0.053 | 0.051
Blood | 0.100 | 0.210 | 0.067 | 0.027 | 0.026
German | 0.331 | 3.020 | 0.167 | 0.207 | 0.160
Glass | 0.169 | 1.487 | 0.064 | 0.057 | 0.046
Hepatitis | 0.194 | 1.011 | 0.048 | 0.050 | 0.044
Landmines | 0.068 | 0.271 | 0.695 | 0.032 | 0.027
Tic | 0.142 | 0.084 | 0.159 | 0.084 | 0.076
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, Z.; Ke, H.; Liu, C. The Extended Second APG Method for Constrained DC Problems. Axioms 2026, 15, 7. https://doi.org/10.3390/axioms15010007

