Article

The Extended Second APG Method for Constrained DC Problems

Department of Mathematics, Jinan University, Guangzhou 510632, China
* Author to whom correspondence should be addressed.
Submission received: 29 October 2025 / Revised: 6 December 2025 / Accepted: 22 December 2025 / Published: 24 December 2025
(This article belongs to the Special Issue The Numerical Analysis and Its Application, 2nd Edition)

Abstract

In this paper, we develop the extended proximal gradient algorithm with Nesterov's second acceleration ($\mathrm{EAPG}_s$) for constrained difference-of-convex (DC) optimization problems. $\mathrm{EAPG}_s$ has two key links to existing methods: it extends $\mathrm{APG}_s$ (for unconstrained DC problems) by adopting the constraint-handling idea from Auslender's ESQM, and it serves as a variant of $\mathrm{ESQM}_e$ with extrapolation replaced by Nesterov's second acceleration. Under basic assumptions, we establish the subsequential convergence of $\mathrm{EAPG}_s$. By introducing a restart technique and leveraging the Kurdyka–Łojasiewicz (KL) property of a suitable potential function, we further prove its global convergence and analyze its convergence rate, under weaker conditions than those required for $\mathrm{APG}_s$. Additionally, we propose $\mathrm{EAPG}_s^r$ by adding practical restart criteria to $\mathrm{EAPG}_s$. Numerical experiments verify the efficiency of these criteria and show that $\mathrm{EAPG}_s^r$ performs well against state-of-the-art methods for constrained and unconstrained DC problems.

1. Introduction

In this paper, we discuss the following difference-of-convex (DC) optimization problem with smooth inequality constraints and simple geometric constraints:
$$\min_{x \in \mathbb{R}^n} \; F(x) := f(x) + P_1(x) - P_2(x) \quad \text{s.t.} \quad g_i(x) \le 0, \; i = 1, \ldots, m, \quad x \in C, \tag{1}$$
where all functions are real-valued and defined on $\mathbb{R}^n$. Specifically, $f$ and $g_i$ ($i = 1, \ldots, m$) are differentiable; $P_1$ and $P_2$ are convex, with $P_1$ admitting an efficiently computable proximal operator; $C \subseteq \mathbb{R}^n$ is a nonempty closed convex set; and the feasible set $C \cap \mathcal{F}$ is nonempty, where $\mathcal{F} := \{x \in \mathbb{R}^n : g_i(x) \le 0,\ i = 1, \ldots, m\}$.
Problem (1) is one of the three types of DC problems summarized in [1]; it is widely applied in fields such as joint chance constrained programs [2], multicast network design problems [3], sparsity constrained optimization problems [4], and bilevel optimization problems [5]. As a result, it has garnered significant attention from researchers (see [1,6,7] and the references therein). Problem (1) encompasses two commonly used and extensively studied special cases:
  • Unconstrained DC problems, discussed in [1,8,9,10,11,12,13,14,15] and related references:
    $$\min_{x \in \mathbb{R}^n} \; F(x) := f(x) + P_1(x) - P_2(x). \tag{2}$$
    This corresponds to setting $g_i \equiv 0$ (for all $i$) and $C = \mathbb{R}^n$, while retaining the original assumptions on $f$, $P_1$, and $P_2$.
  • DC problems with f 0 , analyzed in [16,17,18,19] and references therein:
    $$\min_{x \in \mathbb{R}^n} \; P_1(x) - P_2(x) \quad \text{s.t.} \quad g_i(x) \le 0, \; i = 1, \ldots, m, \quad x \in C. \tag{3}$$
Both Problems (2) and (3) arise naturally in numerous applications. For instance, Yin et al. [17] show that compressed sensing problems can be formulated as Problem (3): there, $P_1 - P_2$ acts as a sparsity-inducing regularizer (e.g., the difference of the $\ell_1$- and $\ell_2$-norms), and the single constraint bounds the estimation error. Alternatively, the same compressed sensing problem can be cast as Problem (2), where $P_1 - P_2$ remains unchanged and $f$ serves as a penalty term for the original constraint.
Notably, while Problems (2) and (3) often model the same applications, they differ in key trade-offs: Problem (3) enforces (approximate) satisfaction of constraints, but developing constraint-aware algorithms is more challenging; Problem (2) benefits from a richer set of unconstrained solvers, yet its solutions may violate the original constraints—leading to larger deviations from the true optimal solution. Given these trade-offs, extending successful methods for Problem (2) to Problems (1) and (3) is of significant value.
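To make the DC structure above concrete, the two building blocks that proximal-type methods for these problems rely on are a cheap proximal operator for $P_1$ and a closed-form subgradient of $P_2$. A minimal NumPy sketch for the $\ell_1$-$\ell_2$ pair mentioned above (function names are ours, not from the cited works):

```python
import numpy as np

def prox_l1(v, t):
    # Proximal operator of t*||.||_1: componentwise soft-thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def subgrad_l2(x):
    # A subgradient of ||.||_2: x/||x|| when x != 0, and 0 at the origin.
    n = np.linalg.norm(x)
    return x / n if n > 0 else np.zeros_like(x)
```

With $P_1 = \lambda\|\cdot\|_1$ and $P_2 = \|\cdot\|_2$, each DC iteration only needs these two primitives, which is what makes the formulations above computationally attractive.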
DC programming and DC algorithms were first introduced by Pham Dinh [15] (for a comprehensive survey of results before 2015, see [1] and references therein). The core idea of DC algorithms (DCA) is to approximate the concave component of the DC function with an affine function in each iteration, yielding a convex subproblem that can be solved via existing convex optimization techniques. Subsequent challenges in DCA research include improving algorithm efficiency, conducting convergence analyses, and deriving convergence rate guarantees.
Among DC algorithms, line search methods are fundamental (see [20] for a categorized overview). These methods typically do not require strict conditions on f, g 1 , , g m (e.g., differentiability) or P 1 (e.g., efficient proximal computation), making them applicable to a broad range of DC problems (including those with fewer restrictions than Problem (1)). However, this generality comes at a cost: line search methods cannot leverage favorable properties of functions that arise in practical applications, limiting their potential efficiency.
For Problem (2) in the case where $f$ has a Lipschitz-continuous gradient, integrating proximal mappings and Nesterov's acceleration techniques yields notable performance improvements. Proximal gradient methods and acceleration strategies were originally developed for convex optimization (see [21] for a detailed review). Examples include FISTA [22] (a proximal gradient method with Nesterov's first acceleration) and IGA [23] (a proximal gradient method with Nesterov's second acceleration). Subsequent extensions to DC problems, such as $\mathrm{pDCA}_e$ [24] (extending FISTA) and $\mathrm{APG}_s$ [25,26] (extending IGA), have demonstrated the effectiveness of these convex optimization ideas in the DC setting. The main iteration of IGA generates
$$
\begin{aligned}
y^k &= \theta_k z^k + (1 - \theta_k) x^k,\\
z^{k+1} &= \operatorname*{argmin}_{z \in \mathbb{R}^n} \Big\{ P_1(z) + \langle \nabla f(y^k), z \rangle + \frac{\theta_k L_f}{2} \|z - z^k\|^2 \Big\},\\
x^{k+1} &= \theta_k z^{k+1} + (1 - \theta_k) x^k
\end{aligned} \tag{4}
$$
for the convex version of Problem (2) (i.e., when $f$ is convex and $P_2 = 0$), where $\{\theta_k\}$ is the sequence of acceleration parameters and $L_f$ denotes the Lipschitz constant of $\nabla f$. For the general case of Problem (2), $\mathrm{APG}_s$ reformulates the above subproblem by subtracting a subgradient $\xi^k \in \partial P_2(x^k)$ from $\nabla f(y^k)$.
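A runnable sketch of iteration (4), specialized to $P_1 = \lambda\|\cdot\|_1$ so that the subproblem reduces to soft-thresholding (the helper names and the test problem are our own illustration, not code from [23]):

```python
import numpy as np

def soft(v, t):
    # Proximal operator of t*||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def iga(grad_f, L_f, lam, x0, iters=1000):
    # Sketch of iteration (4) for min f(x) + lam*||x||_1 with convex f.
    # The z-update is z^{k+1} = prox_{P1/(theta_k L_f)}(z^k - grad f(y^k)/(theta_k L_f)).
    x = x0.copy()
    z = x0.copy()
    theta = 1.0
    for _ in range(iters):
        y = theta * z + (1 - theta) * x
        step = 1.0 / (theta * L_f)
        z = soft(z - step * grad_f(y), lam * step)
        x = theta * z + (1 - theta) * x
        # Nesterov's classical parameter update (see Remark 1 below).
        theta = (np.sqrt(theta**4 + 4 * theta**2) - theta**2) / 2
    return x
```

For $f(x) = \frac{1}{2}\|x - c\|^2$ the minimizer is the soft-thresholding of $c$, which gives a quick sanity check of the implementation.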
For Problem (1), Lu proposed a sequential convex programming (SCP) method in [27], where each iteration is obtained by solving a constrained convex programming problem. It was shown that any accumulation point of the sequence generated by SCP is a stationary point under Slater's condition. However, the convergence and convergence rate of the entire generated sequence remain unknown. Recently, Yu et al. further studied the SCP method with monotone line search (denoted $\mathrm{SCP}_{ls}$) in [28], successfully establishing global convergence guarantees for the proposed algorithm and quantitatively estimating its convergence rate.
Furthermore, for nonlinear programming problems where $C$ is the entire space, Sequential Quadratic Programming (SQP) (see [16,29]) is one of the most successful methods. The SQP algorithm solves a subproblem involving linear inequalities at each iteration. Later, attention turned to modifying SQP algorithms by constructing Sequential Quadratically Constrained Quadratic Programming (SQCQP) methods (see [30,31]), where each iteration solves a subproblem with convex quadratic inequalities. However, Solodov pointed out in [32] that a major drawback of SQP is that its global convergence statements rely on the boundedness of the primal sequence generated by the algorithm, an assumption that is not easily justified. To address this, he introduced a safeguard into the line search procedure, developing an SQP method whose primal sequence is provably bounded when the feasible set is bounded and each $g_i$ is convex. This drawback also persists in SQCQP methods.
In [18], Auslender proposed the Extended Sequential Quadratic Method (ESQM) to overcome this critical limitation of existing SQP and SQCQP methods. ESQM achieves global convergence without such boundedness assumptions, enhancing its versatility for a wider range of optimization problems. To improve the convergence rate, Zhang et al. developed a variant of ESQM (denoted $\mathrm{ESQM}_e$ [19]) that incorporates Nesterov's extrapolation technique, achieving empirical acceleration for Problem (3).
Building on the proven effectiveness of $\mathrm{ESQM}_e$ for constrained DC optimization and the well-established advantages of $\mathrm{APG}_s$ for solving Problem (2), this paper extends $\mathrm{APG}_s$ to solve (1) by adopting the constraint handling strategy from Auslender's ESQM. The resulting algorithm is termed the extended proximal gradient algorithm with Nesterov's second acceleration technique ($\mathrm{EAPG}_s$). This algorithm also serves as a variant of $\mathrm{ESQM}_e$, where the extrapolation step is replaced with Nesterov's second acceleration.
For $\mathrm{EAPG}_s$, we can prove subsequential convergence under basic assumptions. However, analyzing its global convergence and convergence rate faces a key obstacle: the lack of information about the subdifferential properties of $F$ along the sequence $\{x^k\}$ generated by $\mathrm{EAPG}_s$. For $\mathrm{APG}_s$ applied to Problem (2), this obstacle is circumvented by assuming $P_1$ is Lipschitz differentiable, though this condition is rarely satisfied in practical applications.
For Problem (1), the presence of constraints introduces non-differentiable components in the subproblem, making this obstacle insurmountable and posing a significant challenge. To overcome this, inspired by the restart technique introduced by O'Donoghue and Candès [33], we integrate this technique into $\mathrm{EAPG}_s$ and find it effective for both theoretical analysis and practical computation. Theoretically, we construct a suitable potential function and assume it satisfies the Kurdyka–Łojasiewicz (KL) property, along with additional differentiability conditions for each $g_i$ in (1), thereby establishing the convergence of the entire sequence and its convergence rate. Practically, we introduce efficient restart criteria to develop a practical variant ($\mathrm{EAPG}_s^r$), which is validated through numerical experiments.
The remainder of this paper is structured as follows: Section 2 presents preliminary concepts and mathematical foundations essential to our analysis. Section 3 formally introduces the $\mathrm{EAPG}_s$ algorithm. Section 4 establishes its subsequential convergence properties. Section 5 introduces the theoretical variant of $\mathrm{EAPG}_s$ with a restart technique, proves its global convergence, estimates its convergence rate, and presents the practical variant $\mathrm{EAPG}_s^r$. Finally, Section 6 demonstrates the practical performance of $\mathrm{EAPG}_s^r$ through comprehensive numerical experiments.

2. Notation and Preliminaries

In this paper, we use the following standard notation:
  • $\mathbb{R}$ and $\mathbb{R}_+$ denote the sets of real numbers and nonnegative real numbers, respectively.
  • $\mathbb{R}^n$ and $\mathbb{R}^n_+$ denote the $n$-dimensional Euclidean space and its nonnegative orthant, respectively.
  • $\mathbb{N}$ denotes the set of positive integers.
  • For $x \in \mathbb{R}$, $x_+ := \max\{x, 0\}$.
  • For $p \ge 1$, $\|\cdot\|_p$ denotes the $\ell_p$-norm on $\mathbb{R}^n$; in particular, $\|\cdot\|$ is used exclusively to represent the $\ell_2$-norm $\|\cdot\|_2$.
  • For $x, y \in \mathbb{R}^n$, $\langle x, y \rangle$ denotes their inner product.
  • Given a nonempty set $D \subseteq \mathbb{R}^n$, the distance from $x \in \mathbb{R}^n$ to $D$ is defined as $\operatorname{dist}(x, D) := \inf\{\|x - z\| \mid z \in D\}$.
For extended real-valued functions f : R n ( , + ] , we adopt the following definitions:
1. A function $f$ is proper if its domain $\operatorname{dom} f := \{x \mid f(x) < +\infty\}$ is nonempty.
2. A proper function $f$ is closed if it is lower semicontinuous at every $x \in \mathbb{R}^n$, i.e., $f(x) \le \liminf_{z \to x} f(z)$.
3. A proper closed function $f$ is level bounded if its lower level sets $\{x \in \mathbb{R}^n \mid f(x) \le a\}$ are bounded for every $a \in \mathbb{R}$.
4. For a sequence $\{x^k\} \subseteq \mathbb{R}^n$, $x^k \xrightarrow{f} x$ (as $k \to \infty$) means $x^k \to x$ (in $\mathbb{R}^n$) and $f(x^k) \to f(x)$.
Definition 1
([34] Definition 8.3). For a proper closed function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$, the regular subdifferential of $f$ at $x \in \operatorname{dom} f$ is defined by
$$\hat\partial f(x) := \Big\{ \hat x \in \mathbb{R}^n : \liminf_{z \to x,\, z \ne x} \frac{f(z) - f(x) - \langle \hat x, z - x \rangle}{\|z - x\|} \ge 0 \Big\}. \tag{5}$$
The (general) subdifferential of $f$ at $x \in \operatorname{dom} f$ is defined by
$$\partial f(x) := \Big\{ \hat x : \exists\, x^k \xrightarrow{f} x,\ \hat x^k \to \hat x \text{ with } \hat x^k \in \hat\partial f(x^k) \text{ for each } k \Big\}, \tag{6}$$
and we write $\operatorname{dom} \partial f := \{x : \partial f(x) \ne \emptyset\}$.
Note that if $f$ is convex, then the general subdifferential and regular subdifferential of $f$ at $x \in \operatorname{dom} f$ reduce to the (classical) subdifferential ([34] Proposition 8.12), which is given by
$$\partial f(x) = \{\hat x : f(y) \ge f(x) + \langle \hat x, y - x \rangle \ \ \forall y \in \mathbb{R}^n\}. \tag{7}$$
For a nonempty closed set $D \subseteq \mathbb{R}^n$, the indicator function $\delta_D$ is defined by
$$\delta_D(x) = \begin{cases} 0 & x \in D,\\ +\infty & x \notin D. \end{cases} \tag{8}$$
The normal cone of $D$ at $x \in D$ is defined by $N_D(x) := \partial \delta_D(x)$.
Next, we recall several key definitions that will be used in the subsequent analysis. First, we introduce the constraint qualification for problem (1)—which was also adopted in [18,19]—followed by the (associated) first-order optimality conditions for (1).
Definition 2
(RCQ). We say that the Robinson constraint qualification holds at $x \in \mathbb{R}^n$ for (1) if the following statement holds:
$$\mathrm{RCQ}(x): \quad \exists\, y \in C \ \text{such that} \ g_i(x) + \langle \nabla g_i(x), y - x \rangle < 0 \quad \forall i = 1, \ldots, m. \tag{9}$$
Definition 3
(Critical point). For (1), we say that $x$ is a critical point of (1) if $x \in C$ and there exists $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_m) \in \mathbb{R}^m_+$ such that $(x, \lambda)$ satisfies the following conditions:
(i) 
$g_i(x) \le 0 \ \ \forall i = 1, \ldots, m$,
(ii) 
$\lambda_i g_i(x) = 0 \ \ \forall i = 1, \ldots, m$,
(iii) 
$0 \in \nabla f(x) + \partial P_1(x) - \partial P_2(x) + \sum_{i=1}^m \lambda_i \nabla g_i(x) + N_C(x)$.
Using arguments analogous to those in ([28] Section 2), one can show the following: if the RCQ(x) holds for all x C F , then every local minimizer of (1) is a critical point of (1)—provided that Assumption 1 (presented in the next section) holds and P 1 is continuous at x.
It is further straightforward to verify that if g 1 , , g m are convex and the Slater condition is satisfied (i.e., there exists x ˜ C such that g i ( x ˜ ) < 0 for all i = 1 , 2 , , m ), then RCQ(x) holds for all x C .
Numerous functions are known to satisfy the Kurdyka–Łojasiewicz (KL) property. For example, proper closed semi-algebraic functions satisfy the KL property with some exponent β [ 0 , 1 ) (see [35]). The KL property plays a crucial role in the convergence analysis of many first-order methods, and its exponent is particularly significant for establishing convergence rates (for further details, refer to [25,26,36,37,38,39,40,41,42] and the references therein).
First, for $\eta > 0$, we define $\Theta_\eta$ as the class of all continuous concave functions $\varphi : [0, \eta) \to [0, +\infty)$ satisfying $\varphi(0) = 0$, where $\varphi$ is continuously differentiable on $(0, \eta)$ with $\varphi' > 0$ (see ([43] Section 2)).
Definition 4
((KL property and KL function) ([43] Section 2)). Let $h : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper closed function.
(i) 
For $\tilde x \in \operatorname{dom} \partial h := \{x \in \mathbb{R}^n : \partial h(x) \ne \emptyset\}$, if there exist a neighborhood $O$ of $\tilde x$, $\eta \in (0, +\infty]$ and a function $\varphi \in \Theta_\eta$ such that for all $x \in O \cap \{x \in \mathbb{R}^n : h(\tilde x) < h(x) < h(\tilde x) + \eta\}$, it holds that
$$\varphi'\big(h(x) - h(\tilde x)\big) \operatorname{dist}\big(0, \partial h(x)\big) \ge 1, \tag{10}$$
then $h$ is said to have the Kurdyka–Łojasiewicz (KL) property at $\tilde x$.
(ii) 
If h satisfies the KL property at each point of dom h , then h is called a KL function.
Definition 5
((KL exponent) ([43] Section 2)). Suppose that $h : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a proper closed function satisfying the KL property at $\tilde x \in \operatorname{dom} \partial h$ with $\varphi(s) = \rho s^{1-\beta}$ for some $\rho > 0$ and $\beta \in [0, 1)$. Then $h$ is said to have the KL property at $\tilde x$ with exponent $\beta$. If $h$ is a KL function and has the same exponent $\beta$ at every $\tilde x \in \operatorname{dom} \partial h$, then $h$ is said to be a KL function with exponent $\beta$.
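As a standard illustration of Definition 5 (included here for intuition, not taken from the cited references), consider $h(x) = |x|^p$ on $\mathbb{R}$ with $p > 1$ at $\tilde x = 0$. With $\varphi(s) = \rho s^{1-\beta}$,

$$
\operatorname{dist}\big(0, \partial h(x)\big) = p|x|^{p-1} \ \ (x \ne 0),
\qquad
\varphi'\big(h(x) - h(0)\big)\operatorname{dist}\big(0, \partial h(x)\big) = \rho(1-\beta)\, p\, |x|^{\,p - 1 - p\beta},
$$

which stays bounded below by 1 near $0$ precisely when $p - 1 - p\beta \le 0$, i.e., $\beta \ge 1 - 1/p$. Hence $h$ has the KL property at $0$ with exponent $\beta = 1 - 1/p$ (in particular, $\beta = 1/2$ for $p = 2$).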
Lemma 1
((Uniformized KL property) ([39] Lemma 6)). Suppose that $h : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a proper closed function and $\Gamma$ is a compact set. If $h \equiv \zeta$ on $\Gamma$ for some constant $\zeta$ and $h$ satisfies the KL property at each point of $\Gamma$, then there exist $\varepsilon > 0$, $\eta > 0$ and $\varphi \in \Theta_\eta$ such that
$$\varphi'\big(h(x) - \zeta\big) \operatorname{dist}\big(0, \partial h(x)\big) \ge 1$$
for all $x \in \{x \in \mathbb{R}^n : \operatorname{dist}(x, \Gamma) < \varepsilon\} \cap \{x \in \mathbb{R}^n : \zeta < h(x) < \zeta + \eta\}$.

3. Algorithmic Framework

From this section to Section 5, we always suppose that the following conditions are fulfilled.
Assumption 1.
(i) 
$f$ is a differentiable (possibly nonconvex) function whose gradient $\nabla f$ is Lipschitz continuous with Lipschitz constant $L_f \ge 0$, and $l_f \in [0, L_f]$ is such that $f(\cdot) + \frac{l_f}{2}\|\cdot\|^2$ is convex.
(ii) 
All $g_i$ ($i = 1, 2, \ldots, m$) are differentiable functions with Lipschitz continuous gradients. We use $L_g$ to denote the common Lipschitz continuity modulus of $\nabla g_1, \ldots, \nabla g_m$, and let $l_g \in [0, L_g]$ be such that $g_i(\cdot) + \frac{l_g}{2}\|\cdot\|^2$ is convex for all $i = 1, 2, \ldots, m$.
(iii) 
At least one of $L_f$ and $L_g$ is positive.
(iv) 
The function $P_1$ is proper, convex and lower semicontinuous; the function $P_2$ is continuous and convex.
(v) 
Either (a) $C$ is compact, or (b) all $g_i \equiv 0$ and $F$ is level bounded.
The above assumptions all constitute basic conditions for DC problems. The Lipschitz continuity of gradients is a requirement of nearly all proximal-gradient-type algorithms (see, e.g., [19,21,23,25,26,42]). Since we can always select a larger Lipschitz modulus, condition (iii) is easily satisfied. Finally, level boundedness frequently arises in the context of problem (2) (see, e.g., [25,26,36,42]), whereas the compactness of $C$ is required in [19]; either condition in (v) guarantees the existence of an optimal solution to problem (1).
The algorithm we study in this paper is presented as Algorithm 1 below; here and throughout, for notational simplicity, for each $u, w \in \mathbb{R}^n$, we define
$$\operatorname{lin} g_i(u, w) := g_i(w) + \langle \nabla g_i(w), u - w \rangle, \quad i = 1, \ldots, m, \tag{11}$$
$$g_0 :\equiv 0 \quad (\text{which implies that } \operatorname{lin} g_0(u, w) \equiv 0), \tag{12}$$
$$\Psi(u, w) := \max_{i = 1, \ldots, m} \big[\operatorname{lin} g_i(u, w)\big]_+ = \max_{i = 0, 1, \ldots, m} \big\{\operatorname{lin} g_i(u, w)\big\}. \tag{13}$$
Algorithm 1 $\mathrm{EAPG}_s$ for solving Problem (1)
Initialization: Choose $\{\theta_k\} \subseteq (0, 1]$, $d > 0$, $\alpha_0 > 0$, $x^0, z^0 \in C$.
For $k = 0, 1, \ldots$, take $\xi^k \in \partial P_2(x^k)$, and compute
$$y^k = \theta_k z^k + (1 - \theta_k) x^k, \tag{14}$$
$$z^{k+1} = \operatorname*{argmin}_{z \in C} \Big\{ E(z) := P_1(z) + \langle \nabla f(y^k) - \xi^k, z \rangle + \alpha_k \Psi(z, y^k) + \frac{\theta_k(\alpha_k L_g + L_f)}{2} \|z - z^k\|^2 \Big\}, \tag{15}$$
$$x^{k+1} = \theta_k z^{k+1} + (1 - \theta_k) x^k, \tag{16}$$
$$\alpha_{k+1} = \begin{cases} \alpha_k, & \text{if } \Psi(z^{k+1}, y^k) \le 0,\\ \alpha_k + d, & \text{otherwise}. \end{cases} \tag{17}$$
End for
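For intuition, Algorithm 1 can be exercised on a toy instance where every step is available in closed form. The sketch below (our own illustration, not the authors' code) takes $P_1 = P_2 = 0$, $C = \mathbb{R}^n$, and a single linear constraint $g(x) = a^\top x - b \le 0$, so that $L_g = 0$, $\xi^k = 0$, $\operatorname{lin} g(z, y) = a^\top z - b$ for every $y$, and the subproblem reduces to a proximal step on the hinge term that admits a three-case analysis:

```python
import numpy as np

def eapg_s(grad_f, L_f, a, b, x0, alpha0=1.0, d=1.0, theta=0.5, iters=100):
    # Sketch of Algorithm 1 for P1 = P2 = 0, C = R^n, one linear constraint
    # a.x - b <= 0.  The subproblem becomes
    #   min_z <grad f(y^k), z> + alpha_k*[a.z - b]_+ + (c/2)||z - z^k||^2,
    # with c = theta*(alpha_k*L_g + L_f) = theta*L_f, solved in closed form.
    x = x0.copy()
    z = x0.copy()
    alpha = alpha0
    aa = a @ a
    for _ in range(iters):
        y = theta * z + (1 - theta) * x          # update (14)
        c = theta * L_f
        v = z - grad_f(y) / c                    # unconstrained prox point
        r = a @ v - b
        if r <= 0:                               # hinge inactive
            z_new = v
        elif r >= (alpha / c) * aa:              # hinge fully active
            z_new = v - (alpha / c) * a
        else:                                    # kink: land on a.z = b
            z_new = v - (r / aa) * a
        x = theta * z_new + (1 - theta) * x      # update (16)
        if max(a @ z_new - b, 0.0) > 0:          # update (17): Psi > 0
            alpha += d
        z = z_new
    return x
```

On the instance $f(x) = \frac{1}{2}\|x - (2, 2)\|^2$ with constraint $x_1 \le 1$, the iterates settle on the constrained minimizer $(1, 2)$ after only a few steps, and $\alpha_k$ stops increasing once the linearized constraint is satisfied.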
We refer to our algorithm as the extended proximal gradient algorithm with Nesterov's second acceleration technique ($\mathrm{EAPG}_s$), where "Nesterov's second acceleration technique" corresponds to Equations (14) and (16). To guarantee the convergence properties of Algorithm 1 in Section 4, we impose the following assumption on the acceleration parameters $\{\theta_k\}$:
Assumption 2.
With $\inf_k \theta_k > 0$, there exists a constant $\delta \in (0, 1)$ satisfying
$$(L_g + l_g)\gamma_k^2 \le L_g(1 - \delta), \tag{18}$$
$$(L_f + l_f)\gamma_k^2 \le L_f(1 - \delta), \tag{19}$$
where $\gamma_k := \dfrac{\theta_k(1 - \theta_{k-1})}{\theta_{k-1}}$ for $k \ge 1$.
Remark 1.
We can provide concrete examples for selecting the acceleration parameters $\{\theta_k\}$ and the constant $\delta$. Define
$$\tau_f := \begin{cases} \dfrac{L_f + l_f}{L_f}, & \text{if } L_f > 0,\\ 1, & \text{if } L_f = 0, \end{cases} \qquad \tau_g := \begin{cases} \dfrac{L_g + l_g}{L_g}, & \text{if } L_g > 0,\\ 1, & \text{if } L_g = 0, \end{cases}$$
and let $\tau := \max\{\tau_f, \tau_g\}$.
(i) 
Constant Acceleration Parameters: Set $\theta_k \equiv \theta$ for some constant $\theta \in (0, 1]$ satisfying $(1 - \theta)^2 < \tau^{-1}$, and choose the constant $\delta = 1 - \tau(1 - \theta)^2$.
(ii) 
Variable Acceleration Parameters: For a preselected positive integer $K$, let $\theta_k = \vartheta_k$ for all $k < K$, and $\theta_k = \vartheta_K$ for all $k \ge K$. Here, $\{\vartheta_k\}$ is the classical parameter sequence introduced by Nesterov (see [21,44]), where
$$\vartheta_0 = 1, \qquad \vartheta_{k+1} = \frac{\sqrt{\vartheta_k^4 + 4\vartheta_k^2} - \vartheta_k^2}{2}.$$
The integer $K$ is chosen such that $(1 - \vartheta_K)^2 < \tau^{-1}$, and we let $\delta := 1 - \tau(1 - \vartheta_K)^2$. Noticing that the sequence $\{\vartheta_k\}$ is decreasing (as shown in [21]), it follows that $\{\theta_k\}$ is also decreasing. Additionally, we have
$$1 - \tau\gamma_k^2 \ge 1 - \tau(1 - \theta_{k-1})^2 \ge \delta,$$
which establishes the desired inequalities (18) and (19).
The subproblem in (15) admits a unique solution, as it involves a strongly convex objective function over a nonempty closed convex feasible set. An iterative solver is generally required for this subproblem; we refer readers to ([45] Appendix A) for an efficient routine that solves the subproblem in (15) in the special case where $m = 1$ and $P_1$ takes certain forms.
The following lemmas are useful for proving results in subsequent sections. First, the conclusions of Lemma 2 are identical to their counterparts in ([19] Lemma 3.1), with only minor differences in parameter specifications.
Lemma 2.
Suppose that the sequence $\{z^k\} \subseteq C$ is generated by Algorithm 1. Then the following statements hold:
(i) 
Problem (15) has a unique solution.
(ii) 
$z^{k+1}$ is the minimizer of the subproblem in (15) if and only if there exist $\lambda_i^k \ge 0$ for all $i \in I_k(z^{k+1})$ such that $\sum_{i \in I_k(z^{k+1})} \lambda_i^k = 1$ and
$$0 \in \partial P_1(z^{k+1}) + \nabla f(y^k) - \xi^k + \alpha_k \sum_{i \in I_k(z^{k+1})} \lambda_i^k \nabla g_i(y^k) + \theta_k(\alpha_k L_g + L_f)(z^{k+1} - z^k) + N_C(z^{k+1}),$$
where
$$I_k(z) := \Big\{ \iota \in \{0, 1, \ldots, m\} : \operatorname{lin} g_\iota(z, y^k) = \Psi(z, y^k) \Big\}. \tag{20}$$
The next lemma is based on Equation (4.2) in [19].
Lemma 3.
For any $x, y, y' \in \mathbb{R}^n$,
$$\Psi(x, y) \le \Psi(x, y') + \frac{L_g}{2}\|x - y'\|^2 + \frac{l_g}{2}\|x - y\|^2.$$
The inequality specified in Lemma 4 has appeared in the proofs of several previous works (e.g., (26) in [26]); here, we provide a concise proof.
Lemma 4.
For any $x, x', y \in \mathbb{R}^n$,
$$f(x) - f(x') \le \langle \nabla f(y), x - x' \rangle + \frac{L_f}{2}\|x - y\|^2 + \frac{l_f}{2}\|x' - y\|^2. \tag{21}$$
Proof. 
By the Lipschitz continuity of $\nabla f$ and ([44] Lemma 1.2.3), we have
$$f(x) \le f(y) + \langle \nabla f(y), x - y \rangle + \frac{L_f}{2}\|x - y\|^2. \tag{22}$$
On the other hand, since $f(\cdot) + \frac{l_f}{2}\|\cdot - y\|^2$ is a convex function with gradient $\nabla f(y)$ at $y$, the following inequality holds:
$$f(x') + \frac{l_f}{2}\|x' - y\|^2 \ge f(y) + \langle \nabla f(y), x' - y \rangle. \tag{23}$$
Then (21) follows from (22) and (23).    □
To simplify our discussion in Section 4, Section 5 and Section 6, we denote
$$\Delta_k := x^k - x^{k-1}, \quad k = 1, 2, \ldots \tag{24}$$
Based on (14) and (16), we have
$$z^{k+1} - x^k = \frac{1}{\theta_k}\Delta_{k+1}, \tag{25}$$
$$z^k - x^k = \frac{\gamma_k}{\theta_k}\Delta_k, \tag{26}$$
$$z^{k+1} - z^k = \frac{1}{\theta_k}(x^{k+1} - y^k) = \frac{1}{\theta_k}\Delta_{k+1} - \frac{\gamma_k}{\theta_k}\Delta_k, \tag{27}$$
$$y^k - x^k = \theta_k(z^k - x^k) = \gamma_k\Delta_k, \tag{28}$$
$$z^{k+1} - y^k = \frac{1}{\theta_k}\Delta_{k+1} - \gamma_k\Delta_k. \tag{29}$$
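The identities above follow by direct substitution of the update rules (14) and (16). A quick numerical check with random data (our own verification script, using $x^k = \theta_{k-1} z^k + (1 - \theta_{k-1}) x^{k-1}$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
theta_prev, theta = 0.7, 0.4
x_prev, z_k, z_next = rng.standard_normal((3, n))

# Build x^k, y^k, x^{k+1} from the update rules (14) and (16).
x_k = theta_prev * z_k + (1 - theta_prev) * x_prev
y_k = theta * z_k + (1 - theta) * x_k
x_next = theta * z_next + (1 - theta) * x_k

gamma = theta * (1 - theta_prev) / theta_prev
d_k, d_next = x_k - x_prev, x_next - x_k       # Delta_k, Delta_{k+1}

# Identities (25)-(29):
assert np.allclose(z_next - x_k, d_next / theta)
assert np.allclose(z_k - x_k, (gamma / theta) * d_k)
assert np.allclose(z_next - z_k, d_next / theta - (gamma / theta) * d_k)
assert np.allclose(y_k - x_k, gamma * d_k)
assert np.allclose(z_next - y_k, d_next / theta - gamma * d_k)
```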

4. Convergence Properties

In this section, we analyze the convergence properties of Algorithm 1. A central element of our analysis is the following auxiliary function:
$$Q(x, x', y, \alpha) = \alpha^{-1}\big[F(x) - \bar m\big] + \Psi(x, y) + \frac{L_g}{2}\|x - y\|^2 + \frac{1}{2}\big(L_g + \alpha^{-1}L_f\big)\|x - x'\|^2, \tag{30}$$
where
$$\bar m := \inf\{F(x) : x \in C\}. \tag{31}$$
Theorem 1
(Vanishing successive changes). Consider Problem (1) under Assumptions 1 and 2. Let $\{(x^k, y^k, z^k, \alpha_k)\}$ be generated by Algorithm 1. Then the following statements hold:
(i) 
For $k \ge 1$, it holds that
$$H_k - H_{k+1} \ge \frac{\delta}{2}\big(L_g + \alpha_k^{-1}L_f\big)\|\Delta_k\|^2, \tag{32}$$
where
$$H_k := Q(x^k, x^{k-1}, y^{k-1}, \alpha_k). \tag{33}$$
(ii) 
$\sum_{k=1}^{\infty} \|\Delta_k\|^2 < \infty$, $\lim_{k \to \infty} \Delta_k = 0$, and $\lim_{k \to \infty} \|z^{k+1} - x^k\| = \lim_{k \to \infty} \|x^k - y^k\| = \lim_{k \to \infty} \|z^{k+1} - z^k\| = \lim_{k \to \infty} \|z^{k+1} - y^k\| = 0$.
(iii) 
The sequence $\{x^k\} \subseteq C$ and is bounded.
Proof. 
(i)
Since $E(z)$ is a strongly convex function with modulus $\theta_k(\alpha_k L_g + L_f)$ and $z^{k+1}$ is its minimizer over the set $C$, the 3-Point Property ([46] Lemma 3.2) yields
$$E(z^{k+1}) \le E(x^k) - \frac{\theta_k(\alpha_k L_g + L_f)}{2}\|z^{k+1} - x^k\|^2, \tag{34}$$
which is equivalent to
$$P_1(z^{k+1}) \le P_1(x^k) + \langle \xi^k - \nabla f(y^k), z^{k+1} - x^k \rangle + \alpha_k\big[\Psi(x^k, y^k) - \Psi(z^{k+1}, y^k)\big] + \frac{\theta_k(\alpha_k L_g + L_f)}{2}\Big(\|x^k - z^k\|^2 - \|z^{k+1} - z^k\|^2 - \|z^{k+1} - x^k\|^2\Big). \tag{35}$$
Substituting the equalities (25)–(27) into the above inequality, we obtain
$$P_1(z^{k+1}) \le P_1(x^k) + \frac{1}{\theta_k}\langle \xi^k - \nabla f(y^k), \Delta_{k+1} \rangle + \alpha_k\big[\Psi(x^k, y^k) - \Psi(z^{k+1}, y^k)\big] + \frac{\theta_k(\alpha_k L_g + L_f)}{2}\Big(\frac{\gamma_k^2}{\theta_k^2}\|\Delta_k\|^2 - \frac{1}{\theta_k^2}\|x^{k+1} - y^k\|^2 - \frac{1}{\theta_k^2}\|\Delta_{k+1}\|^2\Big). \tag{36}$$
By virtue of (16) and the convexity of $P_1$, it follows that
$$P_1(x^{k+1}) \le P_1(x^k) + \theta_k\big[P_1(z^{k+1}) - P_1(x^k)\big] \le P_1(x^k) + \langle \xi^k, \Delta_{k+1} \rangle - \langle \nabla f(y^k), \Delta_{k+1} \rangle + \theta_k\alpha_k\big[\Psi(x^k, y^k) - \Psi(z^{k+1}, y^k)\big] + \frac{\alpha_k L_g + L_f}{2}\Big(\gamma_k^2\|\Delta_k\|^2 - \|x^{k+1} - y^k\|^2 - \|\Delta_{k+1}\|^2\Big). \tag{37}$$
Combining this result with two key inequalities:
$$-P_2(x^{k+1}) + P_2(x^k) \le -\langle \xi^k, \Delta_{k+1} \rangle, \tag{38}$$
which holds due to the convexity of $P_2$ and the fact that $\xi^k \in \partial P_2(x^k)$; and
$$f(x^{k+1}) - f(x^k) \le \langle \nabla f(y^k), x^{k+1} - x^k \rangle + \frac{L_f}{2}\|x^{k+1} - y^k\|^2 + \frac{l_f}{2}\|x^k - y^k\|^2, \tag{39}$$
which is derived from Lemma 4 by substituting $x, x', y$ with $x^{k+1}, x^k, y^k$, respectively, we arrive at
$$F(x^{k+1}) \le F(x^k) + \frac{L_f}{2}\|x^{k+1} - y^k\|^2 + \frac{l_f}{2}\|x^k - y^k\|^2 + \theta_k\alpha_k\big[\Psi(x^k, y^k) - \Psi(z^{k+1}, y^k)\big] + \frac{\alpha_k L_g + L_f}{2}\Big(\gamma_k^2\|\Delta_k\|^2 - \|x^{k+1} - y^k\|^2 - \|\Delta_{k+1}\|^2\Big). \tag{40}$$
Thus,
$$\begin{aligned}
&\frac{F(x^{k+1}) - \bar m}{\alpha_{k+1}} + \Psi(x^{k+1}, y^k) - \frac{F(x^k) - \bar m}{\alpha_k} - \Psi(x^k, y^{k-1})\\
&\quad\le \alpha_k^{-1}\big[F(x^{k+1}) - F(x^k)\big] + \Psi(x^{k+1}, y^k) - \Psi(x^k, y^{k-1})\\
&\quad\le \Psi(x^{k+1}, y^k) - \Psi(x^k, y^{k-1}) + \theta_k\Psi(x^k, y^k) - \theta_k\Psi(z^{k+1}, y^k)\\
&\qquad + \frac{L_f}{2\alpha_k}\|x^{k+1} - y^k\|^2 + \frac{l_f}{2\alpha_k}\|x^k - y^k\|^2 + \frac{1}{2}\Big(L_g + \frac{L_f}{\alpha_k}\Big)\Big(\gamma_k^2\|\Delta_k\|^2 - \|x^{k+1} - y^k\|^2 - \|\Delta_{k+1}\|^2\Big),
\end{aligned} \tag{41}$$
where the first inequality follows from the setting of $\alpha_{k+1}$ in Algorithm 1 (which ensures $\alpha_{k+1} \ge \alpha_k$), and the second inequality is a consequence of (40).
Next, leveraging the convexity of $\Psi(\cdot, y^k)$, we have
$$\Psi(x^{k+1}, y^k) \le \theta_k\Psi(z^{k+1}, y^k) + (1 - \theta_k)\Psi(x^k, y^k). \tag{42}$$
Additionally, by Lemma 3 (with $x, y, y'$ replaced by $x^k, y^k, y^{k-1}$, respectively),
$$\Psi(x^k, y^k) \le \Psi(x^k, y^{k-1}) + \frac{L_g}{2}\|x^k - y^{k-1}\|^2 + \frac{l_g}{2}\|x^k - y^k\|^2. \tag{43}$$
Combining these two inequalities with (41) gives
$$\begin{aligned}
&\frac{F(x^{k+1}) - \bar m}{\alpha_{k+1}} + \Psi(x^{k+1}, y^k) - \frac{F(x^k) - \bar m}{\alpha_k} - \Psi(x^k, y^{k-1})\\
&\quad\le \frac{L_g}{2}\|x^k - y^{k-1}\|^2 + \frac{l_g}{2}\|x^k - y^k\|^2 + \frac{L_f}{2\alpha_k}\|x^{k+1} - y^k\|^2 + \frac{l_f}{2\alpha_k}\|x^k - y^k\|^2\\
&\qquad + \frac{1}{2}\big(L_g + \alpha_k^{-1}L_f\big)\Big(\gamma_k^2\|\Delta_k\|^2 - \|x^{k+1} - y^k\|^2 - \|\Delta_{k+1}\|^2\Big).
\end{aligned} \tag{44}$$
Together with the definition of $H_k$ in (33), this implies
$$\begin{aligned}
H_{k+1} - H_k &\le \frac{L_g}{2}\|x^{k+1} - y^k\|^2 + \frac{1}{2}\big(L_g + \alpha_{k+1}^{-1}L_f\big)\|\Delta_{k+1}\|^2 - \frac{L_g}{2}\|x^k - y^{k-1}\|^2 - \frac{1}{2}\big(L_g + \alpha_k^{-1}L_f\big)\|\Delta_k\|^2\\
&\quad + \frac{L_g}{2}\|x^k - y^{k-1}\|^2 + \frac{l_g}{2}\|x^k - y^k\|^2 + \frac{L_f}{2\alpha_k}\|x^{k+1} - y^k\|^2 + \frac{l_f}{2\alpha_k}\|x^k - y^k\|^2\\
&\quad + \frac{1}{2}\big(L_g + \alpha_k^{-1}L_f\big)\Big(\gamma_k^2\|\Delta_k\|^2 - \|x^{k+1} - y^k\|^2 - \|\Delta_{k+1}\|^2\Big)\\
&= \frac{1}{2}\big(\alpha_{k+1}^{-1} - \alpha_k^{-1}\big)L_f\|\Delta_{k+1}\|^2 + \frac{1}{2}\Big[(L_g + l_g)\gamma_k^2 - L_g + \alpha_k^{-1}\big((L_f + l_f)\gamma_k^2 - L_f\big)\Big]\|\Delta_k\|^2\\
&\le -\frac{\delta}{2}\big(L_g + \alpha_k^{-1}L_f\big)\|\Delta_k\|^2,
\end{aligned} \tag{45}$$
where the equality relies on (28), and the final inequality holds due to $\alpha_{k+1} \ge \alpha_k$ and Assumption 2.
(ii)
From (45), we deduce
$$\sum_{k=1}^{t} \frac{\delta}{2}\big(L_g + \alpha_k^{-1}L_f\big)\|\Delta_k\|^2 \le \sum_{k=1}^{t} (H_k - H_{k+1}) = H_1 - H_{t+1} \le H_1 - \liminf_{k \to \infty} H_k < +\infty.$$
We directly conclude that $\sum_{k=1}^{\infty} \|\Delta_k\|^2 < \infty$ and $\lim_{k \to \infty} \Delta_k = 0$. Since Assumption 2 guarantees $\inf_k \theta_k > 0$, combining this with $\lim_{k \to \infty} \Delta_k = 0$ and Equations (25) and (27)–(29), we further obtain the remaining limit conclusions.
(iii)
According to (15) and (16) and the convexity of $C$, $x^k \in C$ for each $k$. If $C$ is compact, i.e., part (a) of Assumption 1(v) holds, the sequence $\{x^k\}$ is obviously bounded. Otherwise, all $g_i \equiv 0$ and $F$ is level bounded, so $\alpha_k = \alpha_0$ for all $k$ in our algorithm. From the definition of $H_k$ in (33), we observe that $F(x^k) \le \alpha_0 H_k + \bar m \le \alpha_0 H_1 + \bar m < +\infty$. So $\{x^k\}$ is bounded by the level boundedness of $F$.
   □
In Algorithm 1, if the penalty parameters { α k } are unbounded, the influence of the objective function on subproblem (15) will diminish. Consequently, we cannot guarantee the critical point property for any cluster point of { x k } .
To establish the boundedness of $\{\alpha_k\}$, the following assumption is critical. This assumption was first introduced in ([18] Assumption (A1)) for analyzing ESQM, and was also adopted in [19] for $\mathrm{ESQM}_e$.
Assumption 3.
For (1), $\mathrm{RCQ}(x)$ holds at every $x \in C \cap \mathcal{F}$, and for every $x \in C \setminus \mathcal{F}$, there cannot exist $u_i$, $i \in I(x)$, such that
$$u_i \ge 0 \ \ \forall i \in I(x), \qquad \sum_{i \in I(x)} u_i = 1, \qquad \Big\langle \sum_{i \in I(x)} u_i \nabla g_i(x), z - x \Big\rangle \ge 0 \ \ \forall z \in C, \tag{46}$$
where $I(x) := \big\{ l \in \{1, \ldots, m\} : g_l(x) = \max_{i = 1, \ldots, m} [g_i(x)]_+ \big\}$.
Remark 2.
(i) 
As shown in ([18] Remark 2.1), if Assumption 3 holds, then for any $x \in C$, there exist no $u_i$ (for $i \in I(x)$) that satisfy (46).
(ii) 
As shown in ([18] Remark 2.2), if $\mathrm{RCQ}(x)$ holds for all $x \in C$, then Assumption 3 is satisfied.
Theorem 2
(Boundedness of the penalty parameters $\{\alpha_k\}$). Consider (1) and suppose that Assumptions 1–3 hold. Let $\{(x^k, y^k, z^k, \alpha_k)\}$ be generated by Algorithm 1. Then the sequence $\{\alpha_k\}$ is bounded above; in fact, there exists $K_0 \in \mathbb{N}$ such that $\alpha_k = \alpha_{K_0}$ whenever $k \ge K_0$.
Proof. 
Suppose on the contrary that $\{\alpha_k\}$ is unbounded above. By the definition of $\alpha_k$ in Algorithm 1, there exists a subsequence of positive integers $\{k_j\}$ such that
$$\Psi(z^{k_j+1}, y^{k_j}) > 0. \tag{47}$$
Moreover, we have $\lim_{k \to \infty} \alpha_k = +\infty$ and $\lim_{k \to \infty} \alpha_k^{-1} = 0$.
Recalling the definitions of $I_k(\cdot)$ in (20) and $g_0$ in (12), we have $0 \notin I_{k_j}(z^{k_j+1})$ and
$$\operatorname{lin} g_i(z^{k_j+1}, y^{k_j}) > 0 \quad \forall i \in I_{k_j}(z^{k_j+1}), \ \forall j. \tag{48}$$
Now, since $I_{k_j}(z^{k_j+1}) \subseteq \{1, \ldots, m\}$ for all $j$, the collection $\{I_{k_j}(z^{k_j+1})\}$ contains only finitely many distinct sets; by passing to a further subsequence if necessary, we deduce that there exists a nonempty subset $I_0 \subseteq \{1, \ldots, m\}$ such that $I_{k_j}(z^{k_j+1}) \equiv I_0$ for all $j$. That is, for all $i \in I_0$,
$$\operatorname{lin} g_i(z^{k_j+1}, y^{k_j}) = \Psi(z^{k_j+1}, y^{k_j}) > 0 \quad \forall j.$$
In addition, from Lemma 2(ii), we have that for each $k_j$ there exist $\lambda_i^{k_j} \ge 0$, $i \in I_{k_j}(z^{k_j+1}) \equiv I_0$, such that $\sum_{i \in I_0} \lambda_i^{k_j} = 1$ and
$$0 \in \alpha_{k_j}^{-1}\big[\partial P_1(z^{k_j+1}) + \nabla f(y^{k_j}) - \xi^{k_j}\big] + \theta_{k_j}\big(L_g + \alpha_{k_j}^{-1}L_f\big)(z^{k_j+1} - z^{k_j}) + \sum_{i \in I_0} \lambda_i^{k_j} \nabla g_i(y^{k_j}) + N_C(z^{k_j+1}). \tag{49}$$
Since the sequences $\{x^{k_j}\} \subseteq C$ and $\{\lambda_i^{k_j}\}$ (for each $i \in I_0$) are bounded, by passing to a further subsequence if necessary, we may assume that $\lim_j x^{k_j} = x^*$ for some $x^*$ and that, for each $i \in I_0$, $\lim_j \lambda_i^{k_j} = \bar\lambda_i$ for some $\bar\lambda_i$. Then $x^* \in C$, $\bar\lambda_i \ge 0$ (for each $i \in I_0$), $\sum_{i \in I_0} \bar\lambda_i = 1$ and $I_0 \subseteq \{\iota \in \{0, 1, \ldots, m\} : g_\iota(x^*) = \max_{i = 0, 1, \ldots, m} g_i(x^*)\}$. Since $0 \notin I_0$, we see that
$$I_0 \subseteq I(x^*),$$
where $I(x)$ was defined in Assumption 3. Passing to the limit in (49), and noting that $\lim_j \alpha_{k_j}^{-1} = 0$, $\lim_k \|z^{k+1} - z^k\| = 0$ (thanks to Theorem 1(ii)), and the fact that the sets $\{\partial P_1(z^{k_j+1})\}$ and the sequences $\{\nabla f(y^{k_j})\}$ and $\{\xi^{k_j}\}$ are uniformly bounded (thanks to the boundedness of $\{x^k\}$ and $\{y^k\}$, the convexity of $P_1$, $P_2$ together with ([47] Theorem 24.7), and the Lipschitz continuity of $\nabla f$), we have, upon invoking the closedness of $x \mapsto N_C(x)$, that
$$0 \in \sum_{i \in I_0} \bar\lambda_i \nabla g_i(x^*) + N_C(x^*),$$
which implies that
$$\Big\langle \sum_{i \in I_0} \bar\lambda_i \nabla g_i(x^*), x - x^* \Big\rangle \ge 0 \quad \forall x \in C.$$
Since $I_0 \subseteq I(x^*)$, this contradicts Assumption 3 in view of Remark 2(i). This completes the proof.    □
Theorem 3
(Subsequential convergence). Consider (1) and suppose that Assumptions 1–3 hold. Let $\{x^k\}$ be generated by Algorithm 1. Then, for any accumulation point $\bar x$ of $\{x^k\}$, there exist $\bar\lambda_i \ge 0$ for each $i \in \tilde I(\bar x)$ such that $\sum_{i \in \tilde I(\bar x)} \bar\lambda_i = 1$ and
$$0 \in \partial P_1(\bar x) + \nabla f(\bar x) - \partial P_2(\bar x) + \alpha_{K_0} \sum_{i \in \tilde I(\bar x)} \bar\lambda_i \nabla g_i(\bar x) + N_C(\bar x), \tag{50}$$
where $\tilde I(\bar x) := \{\iota \in \{0, 1, \ldots, m\} : g_\iota(\bar x) = \max_{i = 0, 1, \ldots, m}\{g_i(\bar x)\}\}$ and $\alpha_{K_0}$ is defined in Theorem 2; moreover, $\bar x$ is a critical point of Problem (1).
Proof. 
Suppose that $\bar x$ is an accumulation point of $\{x^k\}$, so there exists a convergent subsequence $\{x^{k_j}\}$ such that $\lim_j x^{k_j} = \bar x$. Let $\{\xi^k\}$ be the sequence generated in Algorithm 1, and let $\{\lambda_i^k\}$ (for $i \in I_{k_j}(z^{k_j+1})$) be the sequence specified in Lemma 2(ii). Note first that for all $j$, $I_{k_j}(z^{k_j+1}) \subseteq \{0, 1, \ldots, m\}$, so the collection $\{I_{k_j}(z^{k_j+1})\}$ contains only finitely many distinct sets. By passing to a further subsequence if necessary, we may assume there exists a nonempty subset $I_0 \subseteq \{0, 1, \ldots, m\}$ such that $I_{k_j}(z^{k_j+1}) \equiv I_0$. From Lemma 2(ii), it follows that
$$0 \in \partial P_1(z^{k_j+1}) + \nabla f(y^{k_j}) - \xi^{k_j} + \theta_{k_j}(\alpha_{k_j}L_g + L_f)(z^{k_j+1} - z^{k_j}) + \alpha_{k_j}\sum_{i \in I_0} \lambda_i^{k_j} \nabla g_i(y^{k_j}) + N_C(z^{k_j+1}), \quad \text{with } \sum_{i \in I_0} \lambda_i^{k_j} = 1, \ \lambda_i^{k_j} \ge 0 \ \forall i \in I_0. \tag{51}$$
Moreover, for each $i \in I_0$, the sequence $\{\lambda_i^{k_j}\}$ consists of nonnegative numbers bounded above by 1, hence is bounded. As for $\{\xi^k\}$, its boundedness is guaranteed by the fact that $P_2$ is convex, together with ([47] Theorem 24.7). By passing to another further subsequence if needed, we may assume without loss of generality that $\lim_j \lambda_i^{k_j} = \bar\lambda_i \ge 0$ for each $i \in I_0$, and $\lim_j \xi^{k_j} = \bar\xi$. Additionally, Theorem 2 implies that for all $k_j \ge K_0$, $\alpha_{k_j} = \alpha_{K_0}$ and
$$\Psi(z^{k_j+1}, y^{k_j}) = 0. \tag{52}$$
Taking the limit as j in both (51) and (52)—and recalling Theorem 1(ii) along with the closedness of P 1 , P 2 and N C —we obtain
0 f ( x ¯ ) + P 1 ( x ¯ ) P 2 ( x ¯ ) + α K 0 i I 0 λ ¯ i g i ( x ¯ ) + N C ( x ¯ ) , i I 0 λ ¯ i = 1 , λ ¯ i 0 i I 0 ,
and
Ψ ( x ¯ , x ¯ ) = 0 ,
where the above equality is equivalent to (noting that Ψ ( x ¯ , x ¯ ) = max i = 0 , 1 , , m { g i ( x ¯ ) } by (11) and (13))
g i ( x ¯ ) ≤ 0 for all i = 1 , , m .
Furthermore, from the definition of I k j ( z k j + 1 ) in (20) (and since I k j ( z k j + 1 ) I 0 ) and Theorem 1(ii), we have
I 0 I ˜ ( x ¯ ) : = ι { 0 , 1 , , m } : g ι ( x ¯ ) = max i = 0 , 1 , , m { g i ( x ¯ ) } .
In view of the above inclusion, we define λ ¯ i = 0 for i ∈ I ˜ ( x ¯ ) ∖ I 0 . With this definition, the inclusion (50) follows directly from (53).
Finally, define λ ^ i : = α K 0 λ ¯ i ≥ 0 for all i ∈ I 0 ∩ { 1 , , m } , and λ ^ i = 0 for all i ∈ { 1 , , m } ∖ I 0 . By (55) and the fact that I 0 ⊆ I ˜ ( x ¯ ) (see (56)), we have
λ ^ i g i ( x ¯ ) = 0 i = 1 , , m .
To verify this, observe that for each i ∈ I 0 , we have g i ( x ¯ ) = 0 , and for each i ∉ I 0 , we have λ ^ i = 0 . Note also that g 0 ( x ¯ ) = 0 (since g 0 ≡ 0 ). Using the definition of λ ^ i and (53), we find
0 P 1 ( x ¯ ) + f ( x ¯ ) P 2 ( x ¯ ) + i = 1 m λ ^ i g i ( x ¯ ) + N C ( x ¯ ) .
Combining (55), (57), (58) and the definition of λ ^ above, we conclude that x ¯ is a critical point of (1).    □

5. EAPG s with the Restart Technique

To discuss the global convergence and convergence rate of the sequence { x k } generated by proximal-gradient-type algorithms (e.g., the algorithms proposed in [19,25,26,42,48] and numerous other related methods), the following conditions are typically indispensable: (1) An inequality analogous to Theorem 1(i), which involves an auxiliary sequence (e.g., { H k } ) that depends on { x k } , { F ( x k ) } and potentially other sequences; (2) Certain boundedness properties of the subgradients of the objective function within the subproblem (e.g., with respect to { x k } for the function E ( · ) ); (3) The Kurdyka–Łojasiewicz (KL) property of an auxiliary function H associated with { H k } . However, for proximal gradient algorithms incorporating the second acceleration technique, analyzing condition (2) poses a significant challenge. The core reason lies in the structural design of these methods: in such algorithms, z k (rather than x k ) serves as the minimizer of subproblem (15), while x k is merely a linear combination of z k and x k 1 . Consequently, although we can derive bounds for the gradient of E ( · ) at z k , we lack sufficient information to characterize the subdifferential E at x k .
The solution approach adopted in [25,26,36] relies on imposing Lipschitz continuity assumptions on the gradients of all involved functions. Nevertheless, this assumption is overly restrictive: in practical scenarios, the function P 1 is often nondifferentiable (e.g., P 1 = · 1 , the l 1 -norm), and it is even inapplicable to Algorithm 1—since the function Ψ in this algorithm fails to be differentiable everywhere.
To address the aforementioned difficulty, it is essential to revisit the working mechanism of Nesterov’s second acceleration method for proximal gradient algorithms, which was first investigated in [23]. Specifically, ([23] Theorem 5.1) demonstrates that if the objective function is strongly convex and differentiable with a Lipschitz-continuous gradient, then x k always yields a better performance (in terms of objective function value) than z k . For nonconvex problems, however, this conclusion no longer holds—we cannot guarantee that x k outperforms z k .
Given the possibility that z k may be superior to x k in nonconvex settings, continuing the acceleration operation (which relies on x k as the update base) may not be optimal. Thus, by drawing inspiration from the restart strategy used in Nesterov’s first acceleration method [33], we propose the following algorithm (Algorithm 2):
Algorithm 2   EAPG s for solving (1) with the restart technique (Theoretical version).
Initialization:    x ( 1 , 0 ) = x 0 C , z ( 1 , 0 ) = z 0 C , α ( 1 , 0 ) = α 0 > 0 , d > 0 .
for s = 1 , 2 ,  do
carry out Algorithm 1 with initial values x ( s , 0 ) , z ( s , 0 ) , α ( s , 0 ) and preselected parameters { θ ( s , k ) } satisfying Assumption 4 to generate the sequence { ( x ( s , k ) , y ( s , k ) , z ( s , k ) , α ( s , k ) ) } until
Q ( x ( s , k ) , x ( s , k 1 ) , y ( s , k 1 ) , α ( s , k ) ) > Q ( z ( s , k ) , x ( s , k 1 ) , y ( s , k 1 ) , α ( s , k ) )
where Q is defined in (30). Denote the above k by N s .
Set: x ( s + 1 , 0 ) = z ( s + 1 , 0 ) = z ( s , N s ) , α ( s + 1 , 0 ) = α ( s , N s ) .
end for
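To make the restart mechanism above concrete, the following is a minimal sketch of Algorithm 2's outer loop on a toy smooth unconstrained problem, where the monitored comparison (59) reduces to a plain objective comparison between the x- and z-iterates. The names `eapg_restart`, `grad`, and `f` are ours, and the simplified test `f(x_new) > f(z_new)` stands in for the full potential comparison Q(x, ...) > Q(z, ...); this is an illustration of the control flow, not the paper's implementation.

```python
import numpy as np

def eapg_restart(grad, f, x0, L, max_iter=500, tol=1e-10):
    """Sketch of Algorithm 2's outer loop: run Nesterov's second
    acceleration and restart from z whenever the monitored point x
    does worse than z (a simplified stand-in for criterion (59))."""
    x, z = x0.copy(), x0.copy()
    k_inner = 0
    for _ in range(max_iter):
        theta = 2.0 / (k_inner + 2)          # second-acceleration weights
        y = (1 - theta) * x + theta * z
        z_new = z - grad(y) / (theta * L)    # gradient step on the z-sequence
        x_new = (1 - theta) * x + theta * z_new
        if f(x_new) > f(z_new):              # restart test (simplified)
            x = z = z_new                    # x^(s+1,0) = z^(s+1,0) = z^(s,N_s)
            k_inner = 0
        else:
            x, z = x_new, z_new
            k_inner += 1
        if np.linalg.norm(grad(x)) < tol:
            break
    return x
```

On a strongly convex quadratic the restart never fires after the first step, consistent with the discussion of ([23] Theorem 5.1) below.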
Assumption 4.
With inf ( s , k ) θ ( s , k ) > 0 , there exists a constant δ ( 0 , 1 ) such that
( L g + l g ) γ ( s , k ) 2 L g ( 1 δ ) , ( L f + l f ) γ ( s , k ) 2 L f ( 1 δ ) ,
where γ ( s , k ) : = θ ( s , k ) ( 1 θ ( s , k 1 ) ) θ ( s , k 1 ) 1 for k 1 .
In the subsequent discussion of this section, for simplicity, we use { x k } to denote the sequence
{ x ( s , k ) } : x ( 1 , 0 ) , , x ( 1 , N 1 1 ) , x ( 2 , 0 ) , , x ( 2 , N 2 1 ) , x ( 3 , 0 )
generated by Algorithm 2. Similarly, we use { y k } , { z k } , { α k } , { θ k } , and { γ k } to represent { y ( s , k ) } , { z ( s , k ) } , { α ( s , k ) } , { θ ( s , k ) } , and { γ ( s , k ) } , respectively. We also use k = k ( s , k ) to denote the correspondence mapping ( s , k ) to k. As will be shown below, these sequences satisfy the same results as those established in Section 4.
Remark 3.
For the sequences generated by Algorithm 2, we have the following relations:
(i) 
When k = k ( s , k ) with 1 ≤ k ≤ N s − 2 , Equations (25)–(29) remain valid.
(ii) 
When k = k ( s , N s 1 ) , we have x k + 1 = y k + 1 = z k + 1 , Equations (26) and (28) still hold, and
z k + 1 x k = Δ k + 1 , z k + 1 z k = Δ k + 1 γ k θ k Δ k , z k + 1 y k = Δ k + 1 γ k Δ k .
(iii) 
When k = k ( s + 1 , 0 ) , we have x k = y k = z k , Equation (25) holds, and we have
x k z k = x k y k = 0 , z k + 1 y k = z k + 1 z k = z k + 1 x k = 1 θ k Δ k + 1 .
Lemma 5.
The results established in Theorems 1–3 remain valid for the sequence { x k , y k , z k , α k } generated by Algorithm 2, provided the same conditions are satisfied with Assumption 2 replaced by Assumption 4.
Proof. 
First, we prove that (32) remains valid for Algorithm 2. From Theorem 1(i), we know that for s 1 and 1 k N s 1 , the following holds:
Q x ( s , k ) , x ( s , k 1 ) , y ( s , k 1 ) , α ( s , k ) Q x ( s , k + 1 ) , x ( s , k ) , y ( s , k ) , α ( s , k + 1 ) δ 2 L g + α ( s , k ) 1 L f x ( s , k ) x ( s , k 1 ) 2 .
This inequality leads to the conclusion that (32) is valid in two cases:
(i) When k = k ( s , k ) with 1 k N s 2 .
(ii) When k = k ( s , N s 1 ) . Here, we note that x k + 1 = x ( s + 1 , 0 ) = z ( s , N s ) , α k + 1 = α ( s + 1 , 0 ) = α ( s , N s ) , and that (59) holds for k = N s due to the setup of Algorithm 2.
Now, we need to address the case where k = k ( s + 1 , 0 ) . In this situation, y k = z k = x k , and (35) still holds for z k + 1 , being equivalent to:
P 1 ( z k + 1 ) P 1 ( x k ) + ξ f ( x k ) , z k + 1 x k α k Ψ ( z k + 1 , y k ) θ k ( α k L g + L f ) z k + 1 x k 2 .
By Remark 3(iii), the equality (25) is still satisfied. Thus, the above inequality implies the following:
P 1 ( z k + 1 ) P 1 ( x k ) + 1 θ k ξ f ( x k ) , Δ k + 1 α k Ψ ( z k + 1 , y k ) 1 θ k ( α k L g + L f ) Δ k + 1 2 .
Since P 1 ( x k + 1 ) P 1 ( x k ) + θ k [ P 1 ( z k + 1 ) P 1 ( x k ) ] , we can obtain the following:
P 1 ( x k + 1 ) P 1 ( x k ) + ξ f ( x k ) , Δ k + 1 θ k α k Ψ ( z k + 1 , y k ) ( α k L g + L f ) Δ k + 1 2 .
Combining this with (38) and (39) (noting that x k y k = 0 ), the above inequality gives
F ( x k + 1 ) F ( x k ) θ k α k Ψ ( z k + 1 , y k ) 1 2 ( 2 α k L g + L f ) Δ k + 1 2 .
Consequently,
F ( x k + 1 ) m ¯ α k + 1 + Ψ ( x k + 1 , y k ) F ( x k ) m ¯ α k + Ψ ( x k , y k 1 ) α k 1 [ F ( x k + 1 ) F ( x k ) ] + Ψ ( x k + 1 , y k ) Ψ ( x k , y k 1 ) Ψ ( x k + 1 , y k ) θ k Ψ ( z k + 1 , y k ) Ψ ( x k , y k 1 ) 1 2 ( 2 L g + α k 1 L f ) Δ k + 1 2 1 2 ( 2 L g + α k 1 L f ) Δ k + 1 2 ,
where the first inequality arises from the fact that α k ≤ α k + 1 , the second one follows from (61), and the last one is derived from (42) along with the facts that Ψ ( x k , y k ) = 0 and Ψ ( x k , y k 1 ) ≥ 0 . From (62), we can deduce that   
H k + 1 H k 1 2 ( 2 L g + α k 1 L f ) Δ k + 1 2 + L g 2 x k + 1 y k 2 + 1 2 ( L g + α k + 1 1 L f ) Δ k + 1 2 L g 2 x k y k 1 2 1 2 ( L g + α k 1 L f ) Δ k 2 = 1 2 ( α k + 1 1 α k 1 ) L f Δ k + 1 2 L g 2 x k y k 1 2 1 2 ( L g + α k 1 L f ) Δ k 2 1 2 ( L g + α k 1 L f ) Δ k 2 δ 2 ( L g + α k 1 L f ) Δ k 2 ,
where the equality holds because x k + 1 y k = Δ k + 1 . This completes the proof of the result in Theorem 1(i).
The remaining parts of the proof are straightforward. Specifically, the other results in Theorems 1–3 follow directly from (32), and their original proofs remain unchanged.    □
Applying Theorem 3 to the sequences generated by Algorithm 2, there exist an integer K 0 and a positive number α ^ such that
α k = α ^ for all k ≥ K 0 .
Our discussion regarding the remaining convergence properties of Algorithm 2 will rely on the following auxiliary function:
H ( z , x , y ) : = Q ( z , x , y , α ^ ) + δ C ( z ) = α ^ 1 F ( z ) m ¯ + Ψ ( z , y ) + L g 2 z y 2 + L 2 z x 2 + δ C ( z ) ,
where L : = L g + α ^ 1 L f . It is worth noting that for all k K 0 ,
H k = H ( x k , x k 1 , y k 1 ) ,
and in light of the setup of Algorithm 2, we have
H k H ^ k ,
where H ^ k : = H ( z k , x k 1 , y k 1 ) .
Lemma 6.
Suppose that Assumptions 1, 3, and 4 hold, let { x k } , { y k } , and { z k } be generated by Algorithm 2. Let Λ and Ω denote the sets of accumulation points of { x k } and { ( z k + 1 , y k , x k ) } , respectively. Then the following assertions hold
(i) 
Λ is a nonempty compact set.
(ii) 
Ω = { ( x ¯ , x ¯ , x ¯ ) : x ¯ Λ } is also nonempty and compact.
(iii) 
The limit ω : = lim k H k exists.
(iv) 
If P 1 is continuous on Ω , we have H ω on Ω , and lim k H ^ k = ω .
Proof. 
(i)
The nonemptiness and compactness of Λ follow directly from the boundedness of { x k } , as stated in Theorem 1(iii).
(ii)
The representation Ω = { ( x ¯ , x ¯ , x ¯ ) : x ¯ Λ } is a consequence of Theorem 1(ii). Consequently, the properties of nonemptiness and compactness are inherited from Λ to Ω .
(iii)
By Theorem 1(i), the sequence { H k } is nonincreasing. Furthermore, it follows from the definition of H k in (33) that H k is always non-negative. Then ω : = lim k H k exists.
(iv)
Finally, we assume that P 1 is continuous on Ω , which implies the continuity of H on Ω . For any x ¯ Λ , let { x k j } be a subsequence converging to x ¯ . By Theorem 1(ii), both { y k j } and { x k j + 1 } also converge to x ¯ . Thus,
H ( x ¯ , x ¯ , x ¯ ) = lim j H ( x k j + 1 , x k j , y k j ) = lim j H k j = ω .
Now, suppose for contradiction that { H ^ k } does not converge to ω . By (65) and (66), there exist a subsequence { H ( z k j + 1 , x k j , y k j ) } and a positive number ϵ such that
H ( z k j + 1 , x k j , y k j ) ω + ϵ .
By passing to a subsequence if necessary, we may assume that { x k j } converges to some x ¯ Λ . Consequently, both { y k j } and { z k j + 1 } also converge to x ¯ . By the continuity of H, we have
lim j H ( z k j + 1 , x k j , y k j ) = H ( x ¯ , x ¯ , x ¯ ) = ω ,
which contradicts (67). Therefore, lim k H ( z k + 1 , x k , y k ) = ω .
   □
Next, we introduce an assumption to help derive an upper bound for dist ( ( 0 , 0 , 0 ) , H ( z k + 1 , x k , y k ) ) . This assumption is widely adopted in proximal-gradient-type algorithms and is generally satisfied in numerous applications (e.g., the algorithms and problems presented in [19,28,42] and the references therein).
Assumption 5.
Each g i in (1) is twice continuously differentiable. The function P 2 is differentiable with locally Lipschitz continuous gradient on an open set U 0 containing X , where X is the set of critical points of (1).
Lemma 7.
Suppose that Assumptions 1 and 3–5 hold, P 1 is continuous, and let { ( z k + 1 , x k , y k ) } be generated by Algorithm 2. Then there exist a positive constant A 1 and a positive integer K 1 such that for all k K 1 , we have
dist ( ( 0 , 0 , 0 ) , H ( z k + 1 , x k , y k ) ) A 1 ( Δ k + 1 + Δ k ) .
Proof. 
By Lemma 5 and Theorem 3, Λ X ( U 0 ) . Noticing Lemma 6(i), the local Lipschitz continuity of P 2 ensures the existence of a bounded open neighborhood U 1 of Λ such that P 2 is Lipschitz continuous on U 1 , with the corresponding Lipschitz constant denoted by L 2 .
Leveraging the boundedness of { x k } and the limit results lim k z k + 1 x k = 0 and lim k x k y k = 0 (stated in Theorem 1(ii, iii)), there exists a positive integer K such that z k + 1 , x k , y k U 1 for all k K . Let K 1 : = max { K 0 , K } . For any k K 1 , it follows that α k = α ^ ; substituting this into Lemma 2(ii) yields the following:
0 α ^ 1 f ( y k ) P 2 ( x k ) + α ^ 1 P 1 ( z k + 1 ) + i I k ( z k + 1 ) λ i k g i ( y k ) + θ k L ( z k + 1 z k ) + N C ( z k + 1 ) ,
which can be rearranged to the following:
v 1 k V k ,
where
v 1 k : = α ^ 1 f ( z k + 1 ) f ( y k ) α ^ 1 P 2 ( z k + 1 ) P 2 ( x k ) + L g ( z k + 1 y k ) + L ( z k + 1 x k ) θ k L ( z k + 1 z k )
and
V k : = α ^ 1 f ( z k + 1 ) + P 1 ( z k + 1 ) P 2 ( z k + 1 ) + N C ( z k + 1 ) + i I k ( z k + 1 ) λ i k g i ( y k ) + L g ( z k + 1 y k ) + L ( z k + 1 x k ) .
Next, we analyze the subdifferential H ( z k + 1 , x k , y k ) . For simplicity, we decompose the function H into three components:
H a ( z , x , y ) : = α ^ 1 [ F ( z ) m ¯ ] + δ C ( z ) , H b ( z , x , y ) : = Ψ ( z , y ) , H c ( z , x , y ) : = L g 2 z y 2 + L 2 z x 2 .
By ([49] Theorem 8.6) and ([49] Corollary 10.9) (respectively), we have
H ( z k + 1 , x k , y k ) ^ H ( z k + 1 , x k , y k ) ^ H a ( z k + 1 , x k , y k ) + ^ H b ( z k + 1 , x k , y k ) + ^ H c ( z k + 1 , x k , y k ) .
We now compute the regular subdifferentials of these three components:
1. For H a :
^ H a ( z k + 1 , x k , y k ) = ^ [ α ^ 1 F ( · ) + δ C ( · ) ] ( z k + 1 ) 0 0 α ^ 1 ^ F ( z k + 1 ) + N C ( z k + 1 ) 0 0 ,
where the equality follows from ([49] Proposition 10.5), and the inclusion is derived from ([49] Corollary 10.9, Equation 10(6), Proposition 8.12, Exercise 8.14). Furthermore, by ([49] Exercise 8.8(c), Proposition 8.12), we obtain:
^ F ( z k + 1 ) = f ( z k + 1 ) + ^ P 1 ( z k + 1 ) P 2 ( z k + 1 ) = f ( z k + 1 ) + P 1 ( z k + 1 ) P 2 ( z k + 1 ) .
2. For H b : We have
^ H b ( z k + 1 , x k , y k ) = H b ( z k + 1 , x k , y k ) i I k ( z k + 1 ) λ i k g i ( y k ) 0 i I k ( z k + 1 ) λ i k 2 g i ( y k ) ( z k + 1 y k ) ,
where the equality follows from ([49] Example 7.28, Corollary 8.11), and the inclusion is deduced from ([49] Exercise 8.31).
3. For H c : By ([49] Exercise 8.8(a)), we have
^ H c ( z k + 1 , x k , y k ) = L g ( z k + 1 y k ) + L ( z k + 1 x k ) L ( z k + 1 x k ) L g ( z k + 1 y k ) .
Combining the relations (75)–(79) gives the following:
v 1 k v 2 k v 3 k V k v 2 k v 3 k H ( z k + 1 , x k , y k ) ,
where
v 2 k : = L ( z k + 1 x k ) , v 3 k : = i I k ( z k + 1 ) λ i k 2 g i ( y k ) ( z k + 1 y k ) L g ( z k + 1 y k ) .
Thus, from (80), we have
dist ( 0 , 0 , 0 ) , H ( z k + 1 , x k , y k ) v 1 k + v 2 k + v 3 k .
Furthermore, in light of
v 1 k α ^ 1 L f + L g z k + 1 y k + ( α ^ 1 L 2 + L ) z k + 1 x k + θ k L z k + 1 z k , v 2 k = L z k + 1 x k , v 3 k 2 L g z k + 1 y k
(where the last inequality follows from the Lipschitz continuity of g i and ([49] Theorem 9.7)), and by virtue of Remark 3, we arrive at Equation (69).
   □
Now we present our global convergence analysis under the Kurdyka-Lojasiewicz (KL) assumption of H.
Theorem 4.
Under the same conditions as in Lemma 7, and assuming that H is a KL function, we have k = 1 Δ k < + and that { x k } converges globally to a critical point of Problem (1).
Proof. 
Our proof is divided into two cases.
Case 1. There exists an integer K ^ > 0 such that H K ^ = ω . Since { H k } converges non-increasingly to ω , it follows that H k = ω for all k K ^ . Substituting this into Equation (32) further yields Δ k = 0 for all such k, which implies the finite convergence of { x k } .
Case 2. H k > ω for all k. From Lemma 6, we recall two key properties: Ω is a compact set, and H ω on Ω . Given that H is a KL function, Lemma 1 guarantees the existence of φ Θ η , ε > 0 , and η > 0 such that
φ ′ ( H ( z , x , y ) − ω ) · dist ( ( 0 , 0 , 0 ) , ∂ H ( z , x , y ) ) ≥ 1
holds for all ( z , x , y ) satisfying
dist ( ( z , x , y ) , Ω ) < ε and ω < H ( z , x , y ) < ω + η .
By Lemma 6(iv), we know that lim k H ^ k = ω . Additionally, since { ( z k , x k 1 , y k 1 ) } is a bounded sequence and Ω is its accumulation set, there exists an integer K 2 ≥ K 1 such that for all k ≥ K 2 : dist ( ( z k , x k 1 , y k 1 ) , Ω ) < ε , ω < H ^ k < ω + η , and therefore,
φ ′ ( H ^ k − ω ) · dist ( ( 0 , 0 , 0 ) , ∂ H ( z k , x k 1 , y k 1 ) ) ≥ 1 .
For the remainder of this proof, we assume k K 2 . Combining inequality (86) with Lemma 7 leads to
1 ≤ φ ′ ( H ^ k − ω ) A 1 ( Δ k 1 + Δ k ) for all k ≥ K 2 .
On the other hand, leveraging the mean value theorem, the decreasing property of φ (a direct consequence of φ being concave), and the relations H k + 1 H k H ^ k (from Equations (32) and (65)), we further derive that
φ ( H k − ω ) − φ ( H k + 1 − ω ) ≥ φ ′ ( H ^ k − ω ) ( H k − H k + 1 ) .
Define ν k , t : = φ ( H k − ω ) − φ ( H t − ω ) . Using Equations (32), (88), and (87), respectively, we get
( δ L / 2 ) Δ k 2 ≤ H k − H k + 1 ≤ ν k , k + 1 / φ ′ ( H ^ k − ω ) ≤ A 1 ( Δ k 1 + Δ k ) ν k , k + 1 .
Since φ ′ > 0 and H k ≥ H k + 1 , it follows that ν k , k + 1 ≥ 0 . Applying the AM–GM inequality √(a b) ≤ ( a + b ) / 2 to the result above, we obtain
Δ k ≤ √( ( 1 / 2 ) ( Δ k 1 + Δ k ) · ( 4 A 1 / ( δ L ) ) ν k , k + 1 ) ≤ ( 1 / 4 ) ( Δ k 1 + Δ k ) + ( 2 A 1 / ( δ L ) ) ν k , k + 1 ,
which yields
Δ k ≤ ( 1 / 2 ) ( Δ k 1 − Δ k ) + ( 4 A 1 / ( δ L ) ) ν k , k + 1 .
Summing inequality (91) over k = K 2 , K 2 + 1 , , t (for any t K 2 ) gives
∑ k = K 2 t Δ k ≤ ( 1 / 2 ) ( Δ K 2 − 1 − Δ t ) + ( 4 A 1 / ( δ L ) ) ν K 2 , t + 1 ≤ ( 1 / 2 ) Δ K 2 − 1 + ( 4 A 1 / ( δ L ) ) φ ( H K 2 − ω ) ,
where the final inequality holds because φ 0 . This result implies k = 0 Δ k < + , which in turn shows that { x k } is a Cauchy sequence and hence converges to some x ¯ . By Theorem 3, x ¯ is a critical point of Problem (1).    □
Lemma 8
([48] Lemma 10). Let { Λ k } k ∈ N be a nonincreasing sequence in R + converging to 0, and suppose there exist k ¯ ≥ l ¯ ≥ 0 such that Λ_k^{2a} ≤ m ( Λ_{k−l̄} − Λ_k ) for all k ≥ k ¯ , where m is a nonnegative constant and a ∈ [ 0 , 1 ) . Then, the following statements hold.
(i) 
If a = 0 , then { Λ k } converges in finite time.
(ii) 
If a ∈ ( 0 , 1 / 2 ] , there exist μ 1 > 0 and τ ∈ [ 0 , 1 ) such that Λ k ≤ μ 1 τ^k for all k ≥ k ¯ .
(iii) 
If a ∈ ( 1 / 2 , 1 ) , there exists μ 2 > 0 such that Λ k ≤ μ 2 ( k − l ¯ + 1 )^{ − 1 / ( 2 a − 1 ) } for all k ≥ k ¯ + l ¯ .
Theorem 5.
Under the same conditions as in Lemma 7, suppose further that H is a KL function, where the function φ in the KL inequality takes the form φ ( s ) = ρ s^{ 1 − β } for some constants β ∈ [ 0 , 1 ) and ρ > 0 . Let { x k } be the sequence generated by Algorithm 2, and let x ¯ be its limit. Then the following assertions hold:
(i) 
If β = 0 , then Algorithm 2 terminates after finitely many iterations.
(ii) 
If β ∈ ( 0 , 1 / 2 ] , there exist constants c 1 > 0 and τ ∈ [ 0 , 1 ) such that ‖ x k − x ¯ ‖ ≤ c 1 τ^k .
(iii) 
If β ∈ ( 1 / 2 , 1 ) , there exists a constant c 2 > 0 such that ‖ x k − x ¯ ‖ ≤ c 2 k^{ − ( 1 − β ) / ( 2 β − 1 ) } .
Proof. 
For the case β = 0 , we have φ ( s ) = ρ s and φ ′ ( s ) = ρ . We aim to show that H k = ω for sufficiently large k, and consequently, (i) holds by virtue of Case 1 in the proof of Theorem 4. Suppose, for contradiction, that H k > ω for all k. Then, by (65), H ^ k > ω . Inequality (87) yields
1 ≤ ρ A 1 ( Δ k 1 + Δ k ) ,
which contradicts the fact that lim k Δ k = 0 from Theorem 1(ii).
Next, consider the case where β ∈ ( 0 , 1 ) . For the remainder of this proof, we assume k ≥ K 2 , where K 2 is defined in the proof of Theorem 4. First, utilizing the inequality (87) and the expression φ ′ ( s ) = ρ ( 1 − β ) s^{ − β } , we derive
( H k − ω )^β ≤ ( H ^ k − ω )^β ≤ ρ ( 1 − β ) A 1 ( Δ k 1 + Δ k ) .
Raising the above inequality to the power of ( 1 − β ) / β > 0 and noting that φ ( s ) = ρ s^{ 1 − β } , we obtain
φ ( H k − ω ) = ρ ( H k − ω )^{ 1 − β } ≤ ρ [ ρ ( 1 − β ) A 1 ]^{ ( 1 − β ) / β } ( Δ k 1 + Δ k )^{ ( 1 − β ) / β } .
Define the nonincreasing sequence R k : = ∑ i = k + 1 ∞ Δ i . By Theorem 4, each R k is finite. It follows that
R k + 1 ≤ R k = ∑ i = k + 1 ∞ Δ i ≤ ( a ) ( 1 / 2 ) Δ k + ( 4 A 1 / ( δ L ) ) φ ( H k + 1 − ω ) ≤ ( b ) ( 1 / 2 ) Δ k + A 2 ( Δ k + Δ k + 1 )^{ ( 1 − β ) / β } = ( 1 / 2 ) ( R k 1 − R k ) + A 2 ( R k 1 − R k + 1 )^{ ( 1 − β ) / β } ,
where A 2 = ( 4 A 1 / ( δ L ) ) ρ [ ρ ( 1 − β ) A 1 ]^{ ( 1 − β ) / β } , the inequality ( a ) is derived analogously to (92), and inequality ( b ) follows from (95). With (96) established, we now prove (ii) and (iii) separately.
(ii) For β ∈ ( 0 , 1 / 2 ] , we have ( 1 − β ) / β ≥ 1 . Recalling that lim k Δ k = 0 , we observe that R k 1 − R k + 1 → 0 as k → ∞ . Thus, there exists K 3 > K 2 such that R k 1 − R k + 1 < 1 for all k ≥ K 3 . Hence
R k + 1 ≤ ( A 2 + 1 / 2 ) ( R k 1 − R k + 1 ) for all k ≥ K 3 .
Applying Lemma 8(ii) to { R k } with a = 1 / 2 and l ¯ = 2 , we conclude that (ii) holds.
(iii) For β ∈ ( 1 / 2 , 1 ) , we have ( 1 − β ) / β < 1 . Similarly, there exists K 4 > K 2 such that
R k + 1 ≤ ( A 2 + 1 / 2 ) ( R k 1 − R k + 1 )^{ ( 1 − β ) / β } for all k ≥ K 4 .
Applying Lemma 8(iii) to { R k } with a = β / ( 2 ( 1 − β ) ) and l ¯ = 2 , we establish (iii) and complete the proof.    □
From the preceding results (Lemma 5 to Theorem 4), we observe the efficacy of the restart technique in the convergence analysis of the EAPG s method. However, the restart criterion given by (59) merely serves as a sufficient condition for z k + 1 to outperform x k + 1 , and this condition is overly restrictive, making it difficult to satisfy. In fact, when implementing Algorithm 2 on most problems presented in Section 6, at most one restart occurred. In experiments where no restart was triggered, Algorithm 2 was entirely identical to Algorithm 1, with the sole exception that we explicitly ensured condition (65)—along with the subsequent theoretical results—held for Algorithm 2. Thus, Algorithm 2 should be regarded more as a theoretical construct than a practical implementation.
Practically speaking, however, the restart technique can indeed enhance the efficiency of our algorithms, as demonstrated in Section 6. This indicates the necessity of developing a practical criterion to replace (59), thereby maximizing opportunities for effective restarts. Drawing inspiration from the fixed restarting algorithm ([33] Algorithm 3), the d k criterion (discussed following Equation (75) in [26]), and the inner product criterion in [19], we define
d k = α k 1 F ( x k ) F ( x k + 1 ) + G k G k + 1 Δ k 2 ,
where
G k = Ψ ( x k , y k 1 ) + L g 2 x k y k 1 2 + L g + α k 1 L f 2 Δ k 2 ,
and propose the following algorithm (Algorithm 3):
Algorithm 3 ( EAPG s r ) EAPG s with Restart Technique (Practical Version)
Initialization: Given x 0 , z 0 C , α 0 > 0 , d > 0 , a positive integer N 0 , and { θ k } as defined in Remark A1(ii).
Determining the restart interval: Execute Algorithm 1 with initial values x 0 , z 0 , α 0 and parameters { θ 0 , θ 1 , } for N steps, where N denotes the first step k satisfying k N 0 and d k > d k 1 . Set x 0 = z 0 = z N and α 0 = α N .
for    s = 1 , 2 ,  do
Execute Algorithm 1 with initial values x 0 , z 0 , α 0 and parameters { θ 0 , θ 1 , } until either y k 1 z k , z k z k 1 > 0 or k = N . Set x 0 = z 0 = z k and α 0 = α k .
end for
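The two practical tests driving Algorithm 3 can be sketched as small helpers: the phase-1 test that fixes the restart interval N from the d_k sequence, and the phase-2 inner product test that triggers each subsequent restart. The helper names are ours and the snippet only illustrates the decision logic, not the full algorithm.

```python
import numpy as np

def dk_increased(d_prev, d_curr, k, N0):
    """Phase-1 test of Algorithm 3: the calibration run stops at the
    first step k with k >= N0 and d_k > d_{k-1}; that k becomes the
    restart interval N (d_k itself is the ratio defined in the text)."""
    return k >= N0 and d_curr > d_prev

def inner_product_restart(y_prev, z_curr, z_prev):
    """Phase-2 test: restart when <y^{k-1} - z^k, z^k - z^{k-1}> > 0,
    i.e. the latest z-step moves against the extrapolation direction."""
    return float(np.dot(y_prev - z_curr, z_curr - z_prev)) > 0.0
```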

6. Numerical Experiments

In this section, we conduct numerical experiments to assess the performance of Algorithm 3. The design of these experiments is intended to demonstrate three key aspects, as elaborated below:
1.
In Section 6.1 and Section 6.2, we verify the computational efficiency of Algorithm 3 against the IPOPT solver [50,51] and three state-of-the-art methods—namely SCP ls [28], ESQM b (a basic variant of ESQM derived by fixing β k 0 in ESQM e ), and ESQM e  [19]—when solving the optimization problem formulated in (3).
2.
In Section 6.3, we evaluate three key metrics: the effectiveness of Algorithm 3’s d k -criterion for optimal restart interval identification, Algorithm 3’s efficiency versus Algorithms 1 and 2, and its overall performance relative to multiple modified variants.
3.
In Section 6.4, we validate the efficacy of Algorithm 3 on the unconstrained DC problem specified in (2), with comparisons drawn to the IPOPT solver and three established algorithms for unconstrained DC problems: GIST [52], pDCAe [53], and APG s [26] (the foundational prototype of Algorithm 3).
In Section 6.1 and Section 6.2, the numerical experiments focus on the following compressed sensing optimization problem:
min x ∈ R n ‖ x ‖ 1 − μ ‖ x ‖ s . t . h ( A x − b ) ≤ σ , ‖ x ‖ ≤ M ,
where
μ [ 0 , 1 ) ;
A R q × n has full row rank;
b R q ;
M = ( 1 − μ )^{ − 1 } ( ‖ A † b ‖ 1 − μ ‖ A † b ‖ ) , where A † denotes the Moore–Penrose pseudoinverse of A ;
h : R q R + is an analytic function with Lipschitz-continuous gradient (modulus L h ), h ( 0 ) = 0 , and σ ( 0 , h ( b ) ) .
This problem is equivalent to the following model:
min x ∈ R n ‖ x ‖ 1 − μ ‖ x ‖ 2 s . t . h ( A x − b ) ≤ σ ,
initially introduced in [54] and further explored in [17,55] for sparse signal recovery.
Problem (101) is a special case of (3) with P 1 ( x ) = ‖ x ‖ 1 , P 2 ( x ) = μ ‖ x ‖ , m = 1 , g 1 ( x ) = h ( A x − b ) − σ , and C = { x : ‖ x ‖ ≤ M } . Additionally, since A has full row rank and h ( 0 ) = 0 < σ , we have A † b ∈ C ∩ { x : g 1 ( x ) < 0 } . It is straightforward to verify that Assumption 1 holds for this problem.
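The strict feasibility of A † b can be checked numerically. The sketch below uses arbitrary dimensions and the h = ½ ‖ · ‖ ² instance of Section 6.1; the QR-based computation of A † b mirrors how x 0 is obtained in the experiments, but the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
q, n, sigma = 40, 100, 0.5
A = rng.standard_normal((q, n))       # full row rank with probability 1
b = rng.standard_normal(q)

# x0 = A^+ b via the QR factorization of A^T:  A^T = QR  =>  A^+ b = Q (R^T)^{-1} b
Q, R = np.linalg.qr(A.T)
x0 = Q @ np.linalg.solve(R.T, b)

assert np.allclose(A @ x0, b)         # full row rank, so A x0 = b exactly
g1 = 0.5 * np.linalg.norm(A @ x0 - b) ** 2 - sigma
assert g1 < 0                         # hence g1(x0) = h(0) - sigma = -sigma < 0
```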
In the following subsections, we conduct experiments on Problem (101) with different selections of h. All numerical experiments were performed on a computer with an Intel(R) Core(TM) i5-8265U processor and 8.00 GB of memory, running the Windows 10 operating system. The experiments were implemented using MATLAB R2021a.

6.1. Compressed Sensing with h ( · ) = ( 1 / 2 ) ‖ · ‖ 2

We first consider Problem (101) with h ( · ) = ( 1 / 2 ) ‖ · ‖ 2 , which transforms (101) into:
min x ∈ R n ‖ x ‖ 1 − μ ‖ x ‖ s . t . ( 1 / 2 ) ‖ A x − b ‖ 2 ≤ σ , ‖ x ‖ ≤ M .
Note that h is convex, so g 1 is also convex. Thus, we can set L g = ‖ A ‖ 2 and l g = 0 . Furthermore, since A † b ∈ C ∩ { x : g 1 ( x ) < 0 } , the Slater condition holds for the above problem. Based on the discussion following Definition 3 and Remark 2, Assumption 3 is satisfied.
Since the function H in (64) (corresponding to Problem (103)) is clearly semi-algebraic—and therefore a KL function—we can apply Theorem 5 to deduce the convergence of the entire sequence { x k } generated by Algorithm 3 for solving (103).
  • Details of the Five Algorithms
We detail the setup of the five algorithms below:
(i)
Initialization and Stopping Criteria: For SCP ls , ESQM b , and ESQM e , we adopt the same initial points as specified in [19]: specifically, x 0 = A † b for SCP ls , and x 0 = 0 for both ESQM b and ESQM e . For EAPG sr , the initial point is set to x 0 = z 0 = 0 . For IPOPT, we introduce slack variables u and v to reformulate Problem (103) as follows:
min u , v ∈ R n ⟨ 1 , u ⟩ + ⟨ 1 , v ⟩ − μ ‖ u − v ‖ s . t . ( 1 / 2 ) ‖ A ( u − v ) − b ‖ 2 ≤ σ , u , v ≥ 0 ,
where 1 denotes the all-ones vector (i.e., a vector with each component equal to 1). The corresponding initial points are set to u 0 = [ A † b ] + + 0.001 · 1 and v 0 = [ − A † b ] + + 0.001 · 1 .
All algorithms except IPOPT terminate when either the relative iterate difference satisfies ‖ x k + 1 − x k ‖ / max { 1 , ‖ x k + 1 ‖ } ≤ ϵ (with ϵ > 0 to be specified in subsequent sections) or the maximum number of iterations (3000) is reached. For IPOPT, the convergence tolerance is configured to the same value of ϵ and the maximum number of iterations is set to 1000.
(ii)
Parameter Settings: The parameters for SCP ls follow [28], while those for ESQM b and ESQM e follow [19]. For EAPG sr , we set α 0 = 1 and d = 1 (consistent with [19]) and N 0 = 20 . Since g 1 is convex (implying l g = 0 ), any positive integer K is valid for the acceleration parameters { θ k } defined in Remark 1(ii); here, we set K = 150 . Notably, in practice, the restart period N observed in experiments was consistently less than 100. Thus, the experimental performance would remain unchanged if we select any K 100 .
The subproblems of these algorithms are solved following the procedures outlined in the appendices of [28,45]. In each subproblem, the computational complexity of evaluating g 1 and ζ k is O ( q n ) and O ( n ) , respectively. Additionally, solving the optimization problem (15) incurs a complexity of O ( n ) . Consequently, the overall computational complexity of each subproblem is O ( q n ) .
All settings of IPOPT are set to their default values except for the tolerance parameter and maximum number of iterations.
  • Experimental Setup for Random Instances
We tested Algorithm 3 on random instances of Problem (103), generated as follows:
1.
Generate A R q × n with independent and identically distributed (i.i.d.) standard Gaussian entries, then normalize A such that each column has unit norm.
2.
Randomly select a subset T { 1 , 2 , , n } of size p, and generate a p-sparse vector x orig with i.i.d. standard Gaussian entries on T.
3.
Set b = A x orig + 0.01 · n ^ , where n ^ is a random vector with i.i.d. standard Gaussian entries; set σ = 0.5 σ 1 2 , where σ 1 = 1.1 · ‖ 0.01 · n ^ ‖ .
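The three generation steps above can be sketched as a minimal instance generator. The function name and seed handling are ours, and σ 1 is read as 1.1 times the norm of the noise vector; this is an illustration under those assumptions, not the paper's script.

```python
import numpy as np

def gen_instance(q, n, p, seed=0):
    """Random instance of the Section 6.1 problem, following steps 1-3."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((q, n))
    A /= np.linalg.norm(A, axis=0)            # normalize columns to unit norm
    T = rng.choice(n, size=p, replace=False)  # random support of size p
    x_orig = np.zeros(n)
    x_orig[T] = rng.standard_normal(p)        # p-sparse ground truth
    noise = 0.01 * rng.standard_normal(q)
    b = A @ x_orig + noise
    sigma1 = 1.1 * np.linalg.norm(noise)      # sigma_1 = 1.1 * ||0.01 n_hat||
    sigma = 0.5 * sigma1 ** 2
    return A, b, x_orig, sigma
```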
  • Experimental Parameters and Result Metrics
In our numerical tests:
We set μ = 0.99 in Problem (105).
We considered parameter triples ( q , n , p ) = ( 720 i , 2560 i , 160 i ) for i { 2 , 4 , 6 , 8 , 10 } .
For each i, 20 random instances were generated (as above), and results were averaged over these 20 instances.
Computational results for ϵ = 10 4 and ϵ = 10 6 are presented in Table 1 and Table 2, respectively. The metrics reported include:
t QR : Time to compute the QR decomposition of A T .
t A : Time to compute ‖ A ‖ 2 .
t A b : Time to compute x 0 = A † b using the QR factorization of A T .
CPU time of each algorithm.
Iter: Number of iterations.
RecErr : = ‖ x * − x orig ‖ / max { 1 , ‖ x orig ‖ } : Recovery error (where x * is the approximate solution from the algorithm).
Residual : = ( ‖ A x * − b ‖ 2 − σ 1 2 ) / σ 1 2 : Residual of the constraint violation.
  • Key Observations from Results
From Table 1 and Table 2, we observe two main results:
1.
EAPG sr achieves the fastest computation speed among the five algorithms.
2.
The recovery errors (RecErr) and residuals of all five methods are comparable.

6.2. Compressed Sensing with Lorentzian Norm

Next, we consider Problem (101) with the Lorentzian norm, which transforms (101) into
min x ∈ R n ‖ x ‖ 1 − μ ‖ x ‖ s . t . ‖ A x − b ‖ L L 2 , γ ≤ σ , ‖ x ‖ ≤ M .
The Lorentzian norm is defined as follows [56]:
‖ y ‖ L L 2 , γ : = ∑ i = 1 q log ( 1 + y i 2 / γ 2 ) ,
where γ > 0 .
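For concreteness, the Lorentzian norm can be evaluated as below (`lorentzian_norm` is an illustrative helper of ours). Unlike ( 1 / 2 ) ‖ · ‖ 2 , it penalizes large residuals only logarithmically, which is why it is paired with heavy-tailed (Cauchy) noise in Section 6.2.

```python
import numpy as np

def lorentzian_norm(y, gamma):
    """||y||_{LL_2,gamma} = sum_i log(1 + y_i^2 / gamma^2)."""
    return float(np.sum(np.log1p((y / gamma) ** 2)))
```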
As proven in ([19] Subsection 6.2), Assumption 3 holds.
  • Details of the Five Algorithms
The setup of the five algorithms is detailed below:
(i)
Initialization and Stopping Criteria: All algorithms use the same initialization and stopping criteria as in Section 6.1.
(ii)
Parameter Settings:
For SCP ls , parameters follow the settings in [28].
For the other three algorithms, we set L g = 2 ‖ A ‖ 2 / γ 2 , l g = ‖ A ‖ 2 / ( 4 γ 2 ) , α 0 = 1.1 γ , d = γ 2 / ( 150 ‖ A ‖ 2 ) —consistent with [19].
For the acceleration parameters { θ k } of EAPG sr , since L g / ( L g + l g ) = 8 / 9 , it is straightforward to verify that K ≤ 31 . However, in practice, because all iterates { x k } lie within a bounded local region, the theoretical results may remain valid for larger values of K. As the next subsection will demonstrate, we can consistently use a large K and adaptively determine the restart period N (where N < K ) to enhance the performance of Algorithm 3. In fact, the restart period N observed in experiments was consistently less than 100. Thus, selecting any K ≥ 100 would ensure both consistent and improved experimental performance. For the lower bound of N, we also set N 0 = 20 , consistent with the setting in Section 6.1.
The subproblems of these algorithms are solved following the procedures outlined in the appendices of [28,45]. Consistent with the experimental results presented in Section 6.1, the overall computational complexity of each subproblem is also O ( q n ) .
  • Experimental Setup for Random Instances
Random instances are generated as follows:
1.
Generate A, subset T, size p, and sparse vector x orig using the same method as in Section 6.1.
2.
Set $b=Ax_{\mathrm{orig}}+0.01\cdot\tilde n$, where $\tilde n_i\sim\mathrm{Cauchy}(0,1)$. Specifically, $\tilde n_i$ is generated as $\tan\!\left(\pi\left(\hat n_i-\frac{1}{2}\right)\right)$, where $\hat n$ is a random vector with i.i.d. entries uniformly sampled from $[0,1]$.
3.
Set $\sigma=1.05\cdot\|0.01\cdot\tilde n\|_{LL_2,\gamma}$ with $\gamma=0.055$.
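Steps 2 and 3 can be sketched as follows in pure Python (fixed seed for reproducibility; the helper names are ours):

```python
import math
import random

def lorentzian_norm(y, gamma):
    return sum(math.log1p((yi / gamma) ** 2) for yi in y)

def cauchy_noise(q, seed=0):
    # n_i ~ Cauchy(0, 1) via the inverse CDF: tan(pi * (u - 1/2)), u ~ U[0, 1]
    rng = random.Random(seed)
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(q)]

q = 720
noise = [0.01 * v for v in cauchy_noise(q)]  # the term 0.01 * n~ added to A x_orig
sigma = 1.05 * lorentzian_norm(noise, gamma=0.055)
```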
  • Experimental Parameters and Result Metrics
In the numerical tests:
We set μ = 0.99 in Problem (110).
We considered parameter triples ( q , n , p ) = ( 720 i , 2560 i , 80 i ) for i { 2 , 4 , 6 , 8 , 10 } .
For each i, 20 random instances were generated, and results were averaged over these instances (consistent with Section 6.1).
Computational results for ϵ = 10 4 and ϵ = 10 6 are presented in Table 3 and Table 4, respectively. The reported metrics are identical to those in Section 6.1:
$t_{QR}$: Time to compute the QR decomposition of $A^{\mathsf T}$.
$t_{\|A\|}$: Time to compute $\|A\|^2$.
$t_{Ab}$: Time to compute $x^0=A^{\dagger}b$ using the QR factorization of $A^{\mathsf T}$.
CPU time of each algorithm.
Iter: Number of iterations.
RecErr $:=\frac{\|x^*-x_{\mathrm{orig}}\|}{\max\{1,\|x_{\mathrm{orig}}\|\}}$: Recovery error (where $x^*$ is the approximate solution returned by the algorithm).
Residual $:=\frac{\|Ax^*-b\|_{LL_2,\gamma}-\sigma}{\sigma}$: Residual of the Lorentzian norm constraint violation.
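The two accuracy metrics can be computed directly from their definitions; the sketch below is a pure-Python transcription (the function names are ours):

```python
import math

def lorentzian_norm(y, gamma):
    return sum(math.log1p((yi / gamma) ** 2) for yi in y)

def rec_err(x_star, x_orig):
    # ||x* - x_orig|| / max{1, ||x_orig||}
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_star, x_orig)))
    return diff / max(1.0, math.sqrt(sum(b * b for b in x_orig)))

def residual(Ax_star_minus_b, sigma, gamma):
    # (||A x* - b||_{LL_2,gamma} - sigma) / sigma
    return (lorentzian_norm(Ax_star_minus_b, gamma) - sigma) / sigma
```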
  • Key Observations from Results  
From Table 3 and Table 4, we observe a pattern consistent with that in Table 1 and Table 2:
1.
EAPG sr frequently demonstrates the fastest convergence speed among the five algorithms.
2.
The recovery errors (RecErr) and residuals of all five methods are comparable.

6.3. Analysis on the Settings of Algorithm 3

The preceding experiments have demonstrated the efficiency of Algorithm 3 relative to the other algorithms. In this subsection, we justify the settings in Algorithm 3 by illustrating the following conclusions with experimental results:
(i)
The restart period N determined by Algorithm 3 is a good approximation of the optimal fixed restart period for Algorithm 4.
(ii)
The restart scheme of Algorithm 3 outperforms the following alternative schemes:
  • Algorithms 1 and 2, each with $\{\theta_k\}$ defined in Remark 1(ii), with $K$ set to 30 and to 100.
  • Variant (a) of Algorithm 3: Restarts only based on the d k -criterion (without determining N or using the inner product criterion).
  • Variant (b) of Algorithm 3: Algorithm 3 with the inner product criterion removed.
  • Variant (c) of Algorithm 3: Determines the restart interval using both the d k -criterion and the inner product criterion (consistent with Algorithm 3).
  • Variant (d) of Algorithm 3: Algorithm 3 with the d k -criterion replaced by the inner product criterion.
  • Variant (e) of Algorithm 3 (Algorithm 3 incorporating the Armijo step size rule): Replace Equation (16) with the step size selection strategy outlined below:
    Let $\tilde x^{k+1}=\theta_k z^{k+1}+(1-\theta_k)x^k$. If $\theta_k=1$ or $F(\tilde x^{k+1})\le F(x^k)$, set $x^{k+1}=\tilde x^{k+1}$ directly. Otherwise, compute
    $$x^{k+1}=\tilde x^{k+1}+\beta^{p}\big(z^{k+1}-\tilde x^{k+1}\big),$$
    where $p$ denotes the smallest non-negative integer satisfying the inequality
    $$F\big(\tilde x^{k+1}+\beta^{p}(z^{k+1}-\tilde x^{k+1})\big)\le F(\tilde x^{k+1})-\beta^{p}\,\frac{c\,(1-\theta_k)\big(F(\tilde x^{k+1})-F(x^k)\big)}{\theta_k}.$$
    In the above criterion, the parameters are fixed as $c=0.1$ and $\beta=0.5$.
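A sketch of this backtracking rule for a generic objective $F$ on lists of floats (the helper name and the cap on backtracking steps are ours; the inequality is transcribed as reconstructed above):

```python
import math

def armijo_step(F, x_k, z_next, theta_k, c=0.1, beta=0.5, max_backtracks=50):
    # x_tilde = theta_k * z^{k+1} + (1 - theta_k) * x^k
    x_t = [theta_k * z + (1 - theta_k) * x for z, x in zip(z_next, x_k)]
    if theta_k == 1.0 or F(x_t) <= F(x_k):
        return x_t
    # Required decrease per unit step, proportional to F(x_tilde) - F(x^k) > 0.
    gap = c * (1 - theta_k) * (F(x_t) - F(x_k)) / theta_k
    for p in range(max_backtracks):
        step = beta ** p
        x_new = [xt + step * (z - xt) for xt, z in zip(x_t, z_next)]
        if F(x_new) <= F(x_t) - step * gap:
            return x_new
    return x_t  # fall back if no step size qualifies
```

Note that the rule only backtracks when the accelerated point $\tilde x^{k+1}$ increases the objective, which can happen for nonconvex $F$.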
To verify conclusion (i), we tested the fixed restarting version of Algorithm 1 on Problem (103) using a single dataset (where i = 2 ) for each N { 1 , 2 , , 50 } (with ϵ = 10 4 ):
Algorithm 4 Fixed restarting with period N
Initialization: Given x 0 , z 0 C , α 0 > 0 , d > 0 , a positive integer N, and { θ k } as defined in Remark A1(ii).
for   s = 1 , 2 ,  do
Execute Algorithm 1 for $N$ steps with initial values $x^0, z^0, \alpha_0$ and parameters $\{\theta_0,\theta_1,\ldots\}$; then set $x^0=z^0=z^N$ and $\alpha_0=\alpha_N$.
end for
Figure 1 presents the number of iterations for $N\in\{10,11,\ldots,50\}$; iterations for $N\in\{1,2,\ldots,9\}$ are excessively large (decreasing from 1776 at $N=1$ to 315 at $N=9$) and are omitted. From Figure 1, the optimal fixed restart period is identified as 22. By contrast, when applying Algorithm 3 to the same problem and dataset, the $d_k$-criterion yields a restart period $N=24$ (see Figure 2).
Similarly, for Problem (105), we conducted the same comparison: the optimal fixed restart period is 42 (see Figure 3); the d k -criterion yields a restart period N = 61 (see Figure 4), which is also among the approximately optimal fixed restart periods. Consistent findings were observed across other datasets, confirming that the N determined by Algorithm 3 approximates the optimal fixed restart period well.
We also note that the lower bound N 0 is necessary. Without N 0 , a small percentage of datasets may result in an extremely small restart period N, which in turn leads to a significantly higher number of iterations.
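The $d_k$-criterion with the lower bound $N_0$ can be sketched as follows (a minimal sketch under our reading of the criterion: restart at the first $k$ with $d_k > d_{k-1}$, floored at $N_0$; the function name is ours):

```python
def restart_period(d, n0=20):
    """Return the restart period N: the first index k with d[k] > d[k-1],
    floored at n0. Assumes d[k] holds the value of d_k at iteration k."""
    for k in range(1, len(d)):
        if d[k] > d[k - 1]:
            return max(k, n0)
    return max(len(d), n0)  # no increase observed yet

# Example: d_k decreases for 23 iterations, then increases at k = 24.
d = [1.0 / (k + 1) for k in range(24)] + [1.0]
N = restart_period(d)  # 24
```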
To verify conclusion (ii), we generated 20 random instances (following the same procedure as in Section 6.1 and Section 6.2) with i = 2 and ϵ = 10 4 . The averaged computational results are presented in Table 5 and Table 6, respectively. In both tables, Algorithm 3 consistently requires the fewest iterations, while the recovery errors of all algorithms are comparable. This confirms that Algorithm 3 is more effective than the other variants.
We note here that Algorithm 2 constitutes a genuine improvement over Algorithm 1; however, in the majority of test cases, only a single restart is triggered when implementing Algorithm 2. Thus, in practical scenarios, Algorithm 3 is able to identify more appropriate restart points and achieve more substantial performance gains.
On the other hand, Variant (e) delivers moderate performance among the variants of Algorithm 3. This observation indicates that line search techniques possess considerable potential for boosting the performance of algorithms such as EAPG$_s$. Further research into the conditions governing acceleration parameters (analogous to the momentum conditions for generalized acceleration parameters in proximal gradient-type algorithms with Nesterov's first acceleration [57]) could enable line search to yield more efficient performance improvements.

6.4. Experiments on Unconstrained DC Problems

In this subsection, we demonstrate that the EAPG s r algorithm yields notable performance improvements over IPOPT, GIST, pDCAe, and APG s when applied to unconstrained DC problems.
We consider the support vector machine (SVM) model with training data $\{(x_i,y_i)\}_{i=1}^m\subseteq\mathbb{R}^n\times\{-1,1\}$, as proposed in [58]:
$$\min_{b\in\mathbb{R},\,w\in\mathbb{R}^n}\ F(b,w)=f_1(b,w)-f_2(b,w)-f_3(b,w)+P_1(b,w),$$
where $P_1(b,w)=\lambda_1\|w\|_1+\frac{\|w\|_2^2}{2}+\frac{b^2}{2}$ (with $\lambda_1$ denoting the regularization parameter). For $j=1,2,3$, $f_j(b,w)=\frac{1}{m}\sum_{i=1}^m \ell_j\big(y_i(b+w^{\mathsf T}x_i)\big)$, and the smooth convex loss functions $\ell_j$ are defined as
$$\ell_1(t)=\begin{cases}\frac{4}{5}-t, & \text{if } t<\frac{3}{5},\\[2pt] \frac{5}{4}(1-t)^2, & \text{if } \frac{3}{5}\le t<1,\\[2pt] \frac{5}{8}(1-t)^2, & \text{if } 1\le t<\frac{7}{5},\\[2pt] \frac{1}{2}\left(t-\frac{6}{5}\right), & \text{if } t\ge\frac{7}{5},\end{cases}$$
$$\ell_2(t)=\begin{cases}-t-\frac{1}{5}, & \text{if } t\le-\frac{2}{5},\\[2pt] \frac{5}{4}t^2, & \text{if } -\frac{2}{5}<t\le 0,\\[2pt] 0, & \text{if } t>0,\end{cases}$$
$$\ell_3(t)=\begin{cases}0, & \text{if } t\le\frac{8}{5},\\[2pt] \frac{5}{8}\left(t-\frac{8}{5}\right)^2, & \text{if } \frac{8}{5}<t<2,\\[2pt] \frac{1}{2}\left(t-\frac{9}{5}\right), & \text{if } t\ge 2.\end{cases}$$
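The three piecewise losses transcribe directly to code; the sketch below (fractions written as decimals, function names ours) lets one verify that each piece matches its neighbor at the breakpoints, i.e., that the losses are continuous:

```python
def ell1(t):
    if t < 0.6:
        return 0.8 - t
    if t < 1.0:
        return 1.25 * (1.0 - t) ** 2
    if t < 1.4:
        return 0.625 * (1.0 - t) ** 2
    return 0.5 * (t - 1.2)

def ell2(t):
    if t <= -0.4:
        return -t - 0.2
    if t <= 0.0:
        return 1.25 * t * t
    return 0.0

def ell3(t):
    if t <= 1.6:
        return 0.0
    if t < 2.0:
        return 0.625 * (t - 1.6) ** 2
    return 0.5 * (t - 1.8)
```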
Implementation Details of the Five Algorithms
We specify the setup of the five competing algorithms in detail below:
(i)
Objective Function Formulation:
  • For GIST, the objective is cast as $F=f+P_1$, where $f=f_1-f_2-f_3$;
  • For pDCAe, $F$ is decomposed into $F=f+P_1-P_2$, with $f=f_1$ and $P_2=f_2+f_3$;
  • For APG$_s$ and EAPG$_{sr}$, the decomposition takes the form $F=f+P_1-P_2$, where $f=f_1-f_2$ and $P_2=f_3$;
  • For IPOPT, to accommodate its requirement for differentiable objectives, we introduce non-negative slack variables u , v such that w = u v , thereby reformulating F ( b , w ) into a differentiable form.
(ii)
Initialization and Termination Criteria: For each dataset, a total of 21 initial points $(b^0,w^0)$ are used uniformly across all algorithms except IPOPT: one zero vector, plus 5 vectors independently sampled from the normal distribution $N(0,\sigma^2 I)$ for each $\sigma\in\{1,2,4,8\}$. All algorithms terminate if either the relative iterate difference satisfies
$$\frac{\|(b^{k+1};w^{k+1})-(b^k;w^k)\|}{\max\{1,\|(b^k;w^k)\|\}}<10^{-6}$$
or the iteration count reaches the upper limit of 3000.
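This termination test can be sketched as follows (function names ours):

```python
import math

def rel_change(new, old):
    # ||new - old|| / max{1, ||old||}
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(new, old)))
    den = max(1.0, math.sqrt(sum(b * b for b in old)))
    return num / den

def should_stop(new, old, k, tol=1e-6, max_iter=3000):
    return rel_change(new, old) < tol or k >= max_iter
```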
For IPOPT, corresponding to each of the aforementioned initial points $(b^0,w^0)$, we initialize the slack variables as $u^0=[w^0]_+ +0.001\cdot\mathbf{1}$ and $v^0=[-w^0]_+ +0.001\cdot\mathbf{1}$. The convergence tolerance is set to $10^{-6}$, with the maximum number of iterations configured to 1000.
(iii)
Parameter Configurations: The parameters for GIST, pDCAe, and APG s are set in accordance with their respective original studies [26,52,53]. For EAPG s r , we fix the restart parameter N 0 = 20 . All IPOPT settings are retained at their default values, with the only exception being the convergence tolerance (adjusted as specified above) and the maximum number of iterations.
A total of 8 real-world datasets are selected from the UCI Machine Learning Repository [59] for testing.
  • Experimental Parameters and Evaluation Metrics
In our numerical experiments, the following parameter specifications and performance metrics are adopted:
The regularization parameter in Problem (107) is set to $\lambda_1=1\times 10^{-3}$;
The Lipschitz constant $L$ for all algorithms is taken as
$$L=\frac{5}{2}\cdot\frac{1}{m}\sum_{i=1}^{m}y_i^2\big(1+\|x_i\|^2\big),$$
which is derived from ([58] Proposition 1).
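Computing $L$ from the training data is a one-liner; the sketch below is a pure-Python transcription (function name ours):

```python
def lipschitz_L(xs, ys):
    # L = (5/2) * (1/m) * sum_i y_i^2 * (1 + ||x_i||^2)
    m = len(xs)
    return 2.5 * sum(y * y * (1.0 + sum(c * c for c in x))
                     for x, y in zip(xs, ys)) / m
```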
Computational results are summarized in Table 7, with the following metrics reported for each algorithm:
Iter: Number of iterations required to reach convergence;
Fval: Final value of the objective function at termination;
Time: Total CPU time (in seconds) consumed during the optimization process.
The experimental results in Table 7 consistently verify three core advantages of the EAPG s r algorithm:
1.
It attains the second-lowest iteration count in most test cases.
2.
It achieves optimal objective function values across most datasets.
3.
It incurs the shortest computational time among all competing methods.
These findings confirm that EAPG s r also exhibits superior performance in solving unconstrained DC problems.

7. Conclusions

In this paper, we extend the proximal gradient method with Nesterov’s second acceleration technique ( APG s ; see [23,25,26,44])—originally designed for unconstrained DC problems—to an extended version ( EAPG s ) for constrained DC problems. This extension draws on the constraint handling idea from the extended sequential quadratic method (ESQM) introduced in [18].
We establish the subsequential convergence of the entire sequence under appropriate assumptions. Additionally, by incorporating a restart technique, we further derive a global convergence result. Notably, this global convergence requires weaker assumptions on the function P 1 compared to those in [25,26].
Guided by our theoretical analysis, we further propose a practical variant of EAPG$_s$ (dubbed EAPG$_{sr}$, detailed in Algorithm 3) with efficient restart criteria. Numerical experiments demonstrate that, in most cases, EAPG$_{sr}$ achieves high-quality solutions to the test problems with fewer iterations and lower CPU time.
Our core theoretical contributions are summarized as follows:
(1)
Extending the APG s method to the setting of constrained DC problems, filling the gap between unconstrained DC optimization and constrained DC problem solving.
(2)
Deriving a global convergence result for the restart-augmented EAPG s framework.
(3)
Weakening the original regularity requirements on the function P 1 that underpin the convergence of the baseline APG s method.
From a numerical perspective, EAPG s r demonstrates two key practical merits:
(a)
Providing a competitive new approach for solving constrained DC problems, with performance comparable to or exceeding state-of-the-art methods.
(b)
Delivering significant performance gains (attributed to the embedded restart technique) over the baseline APG s method when applied to unconstrained DC problems.
Building on the advances of this work, several promising avenues for future research are identified:
(i)
Identifying the conditions under which Inequality (65) holds for Algorithm 1.
(ii)
Conducting a theoretical analysis of the restart criteria for Algorithm 3.
(iii)
Developing efficient solvers for subproblems involving more generalized forms of P 1 and cases with m > 1 .
(iv)
Exploring techniques (e.g., adaptive line search rules) to identify optimal acceleration parameters, further enhancing the algorithm’s computational efficiency.

Author Contributions

Conceptualization, C.L.; Methodology, C.L.; Validation, Z.L. and H.K.; Data curation, Z.L.; Writing—original draft, Z.L. and H.K.; Writing—review & editing, Z.L., H.K. and C.L.; Supervision, C.L.; Funding acquisition, C.L. All authors have read and agreed to the published version of the manuscript.

Funding

The third author is supported by the National Natural Science Foundation of China (Project No. 12571102).

Data Availability Statement

The codes for generating the random data and implementing the algorithms in the numerical section are available from the corresponding author upon request.

Acknowledgments

The authors wish to express their sincere gratitude to the reviewers for their insightful comments and constructive suggestions, which have significantly contributed to the improvement of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Le Thi, H.A.; Pham Dinh, T. DC programming and DCA: Thirty years of developments. Math. Program. 2018, 169, 5–68.
  2. Hong, L.J.; Yang, Y.; Zhang, L.W. Sequential Convex Approximations to Joint Chance Constrained Programs: A Monte Carlo Approach. Oper. Res. 2011, 59, 617–630.
  3. Geremew, W.; Nam, N.M.; Semenov, A.; Boginski, V.; Pasiliao, E. A DC programming approach for solving multicast network design problems via the Nesterov smoothing technique. J. Glob. Optim. 2018, 72, 705–729.
  4. Shen, C.G.; Liu, X. Solving nonnegative sparsity-constrained optimization via DC quadratic-piecewise-linear approximations. J. Glob. Optim. 2021, 81, 1019–1055.
  5. van Ackooij, W.; de Oliveira, W. Non-smooth DC-constrained optimization: Constraint qualification and minimizing methodologies. Optim. Methods Softw. 2019, 34, 890–920.
  6. Lu, Z.S.; Sun, Z.; Zhou, Z.R. Penalty and Augmented Lagrangian Methods for Constrained DC Programming. Math. Oper. Res. 2022, 47, 1707–2545.
  7. Pang, J.-S.; Razaviyayn, M.; Alvarado, A. Computing B-Stationary Points of Nonsmooth DC Programs. Math. Oper. Res. 2017, 42, 95–118.
  8. Alvarado, A.; Scutari, G.; Pang, J.-S. A new decomposition method for multiuser DC programming and its applications. IEEE Trans. Signal Process. 2014, 62, 2984–2998.
  9. Zhang, S.; Xin, J. Minimization of transformed L1 penalty: Theory, difference of convex function algorithm, and robust application in compressed sensing. Math. Program. 2018, 169, 307–336.
  10. Sanjabi, M.; Razaviyayn, M.; Luo, Z.-Q. Optimal joint base station assignment and beamforming for heterogeneous networks. IEEE Trans. Signal Process. 2014, 62, 1950–1961.
  11. Candès, E.; Recht, B. Exact matrix completion via convex optimization. Commun. ACM 2012, 55, 111–119.
  12. Nakayama, S.; Gotoh, J.Y. On the superiority of PGMs to PDCAs in nonsmooth nonconvex sparse regression. Optim. Lett. 2021, 15, 2831–2860.
  13. Le, V.L.; Lauer, F.; Bloch, G. Selective ℓ1 minimization for sparse recovery. IEEE Trans. Autom. Control 2014, 59, 3008–3013.
  14. Wang, W.; Chen, Y. An accelerated smoothing gradient method for nonconvex nonsmooth minimization in image processing. J. Sci. Comput. 2022, 90, 31.
  15. Pham Dinh, T.; Souad, E.B. Algorithms for solving a class of nonconvex optimization problems. Methods of subgradients. In Fermat Days 85: Mathematics for Optimization; Hiriart-Urruty, J.-B., Ed.; North-Holland: Amsterdam, The Netherlands, 1986; pp. 249–271.
  16. Gill, P.E.; Wong, E. Sequential quadratic programming methods. In Mixed Integer Nonlinear Programming; Lee, J., Leyffer, S., Eds.; Springer: New York, NY, USA, 2012; pp. 147–224.
  17. Yin, P.; Lou, Y.; He, Q.; Xin, J. Minimization of ℓ1−2 for compressed sensing. SIAM J. Sci. Comput. 2015, 37, A536–A563.
  18. Auslender, A. An extended sequential quadratically constrained quadratic programming algorithm for nonlinear, semidefinite, and second-order cone programming. J. Optim. Theory Appl. 2013, 156, 183–212.
  19. Zhang, Y.; Pong, T.K.; Xu, S. An extended sequential quadratic method with extrapolation. Comput. Optim. Appl. 2025, 91, 1185–1225.
  20. Gaudioso, M.; Taheri, S.; Bagirov, A.M.; Karmitsa, N. Bundle Enrichment Method for Nonsmooth Difference of Convex Programming Problems. Algorithms 2023, 16, 394.
  21. Tseng, P. Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. 2010, 125, 263–295.
  22. Beck, A.; Teboulle, M. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J. Imaging Sci. 2009, 2, 183–202.
  23. Auslender, A.; Teboulle, M. Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. 2006, 16, 697–725.
  24. Wen, B.; Chen, X.; Pong, T.K. Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optim. 2017, 27, 124–145.
  25. Lin, D.; Liu, C. The modified second APG method for DC optimization problems. Optim. Lett. 2019, 13, 805–824.
  26. Ren, K.; Liu, C.; Wang, L. The modified second APG method for a class of nonconvex nonsmooth problems. Optim. Lett. 2025, 19, 747–770.
  27. Lu, Z. Sequential convex programming methods for a class of structured nonlinear programming. arXiv 2012, arXiv:1210.3039.
  28. Yu, P.; Pong, T.K.; Lu, Z. Convergence rate analysis of a sequential convex programming method with line search for a class of constrained difference-of-convex optimization problems. SIAM J. Optim. 2021, 31, 2024–2054.
  29. Wilson, R.B. A Simplicial Method for Convex Programming. Ph.D. Thesis, Harvard University, Cambridge, MA, USA, 1963.
  30. Solodov, M.V. On the sequential quadratically constrained quadratic programming methods. Math. Oper. Res. 2004, 29, 64–79.
  31. Fukushima, M.; Luo, Z.-Q.; Tseng, P. A sequential quadratically constrained quadratic programming method for differentiable convex minimization. SIAM J. Optim. 2003, 13, 1098–1119.
  32. Solodov, M.V. Global convergence of an SQP method without boundedness assumptions on any of the iterative sequences. Math. Program. 2009, 118, 1–12.
  33. O’Donoghue, B.; Candès, E. Adaptive Restart for Accelerated Gradient Schemes. Found. Comput. Math. 2015, 15, 715–732.
  34. Rockafellar, R.T.; Wets, R.J.B. Variational Analysis, 3rd ed.; Springer Science & Business Media: Berlin, Germany, 2009; pp. 298–472.
  35. Bolte, J.; Daniilidis, A.; Lewis, A.; Shiota, M. Clarke subgradients of stratifiable functions. SIAM J. Optim. 2007, 18, 556–572.
  36. Wang, L.; Liu, Z.; Liu, C. The Bregman Modified Second APG Method for DC Optimization Problems. IEEE Access 2025, 13, 126070–126083.
  37. Attouch, H.; Bolte, J.; Redont, P.; Soubeyran, A. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 2010, 35, 438–457.
  38. Attouch, H.; Bolte, J.; Svaiter, B.F. Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 2013, 137, 91–129.
  39. Bolte, J.; Sabach, S.; Teboulle, M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 2014, 146, 459–494.
  40. Li, G.; Pong, T.K. Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 2018, 18, 1199–1232.
  41. Qian, Y.; Pan, S. Convergence of a class of nonmonotone descent methods for Kurdyka-Łojasiewicz optimization problems. SIAM J. Optim. 2023, 33, 638–651.
  42. Wen, B.; Chen, X.; Pong, T.K. A proximal difference-of-convex algorithm with extrapolation. Comput. Optim. Appl. 2018, 69, 297–324.
  43. Bot, R.I.; Dao, M.N.; Li, G. Extrapolated proximal subgradient algorithms for nonconvex and nonsmooth fractional programs. Math. Oper. Res. 2022, 47, 2415–2443.
  44. Nesterov, Y. Introductory Lectures on Convex Optimization: A Basic Course; Kluwer Academic Publishers: Boston, MA, USA, 2004; pp. 20–105.
  45. Zhang, Y.; Li, G.; Pong, T.K.; Xu, S. Retraction-based first-order feasible methods for difference-of-convex programs with smooth inequality and simple geometric constraints. Adv. Comput. Math. 2023, 49, 1–40.
  46. Chen, G.; Teboulle, M. Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 1993, 3, 538–543.
  47. Rockafellar, R.T. Convex Analysis; Princeton University Press: Princeton, NJ, USA, 1970; pp. 227–240.
  48. Bot, R.I.; Nguyen, D.K. The proximal alternating direction method of multipliers in the nonconvex setting: Convergence analysis and rates. Math. Oper. Res. 2020, 45, 682–712.
  49. Rockafellar, R.T.; Wets, R.J.-B. Variational Analysis; Springer: Berlin, Germany, 1998; pp. 298–472.
  50. Wächter, A.; Biegler, L.T. On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Math. Program. 2006, 106, 25–57.
  51. COIN-OR. Ipopt (Interior Point Optimizer). Computer Software. 2006. Available online: https://projects.coin-or.org/Ipopt (accessed on 28 October 2025).
  52. Chen, X.; Lu, Z.; Pong, T.K. Penalty methods for a class of non-Lipschitz optimization problems. SIAM J. Optim. 2016, 26, 1465–1492.
  53. Liu, T.; Pong, T.K.; Takeda, A. A refined convergence analysis of pDCAe with applications to simultaneous sparse recovery and outlier detection. Comput. Optim. Appl. 2019, 73, 69–100.
  54. Esser, E.; Lou, Y.; Xin, J. A method for finding structured sparse solutions to non-negative least squares problems with applications. SIAM J. Imaging Sci. 2013, 6, 2010–2046.
  55. Lou, Y.; Yin, P.; He, Q.; Xin, J. Computing sparse representation in a highly coherent dictionary based on difference of L1 and L2. J. Sci. Comput. 2015, 64, 178–196.
  56. Carrillo, R.E.; Barner, K.E.; Aysal, T.C. Robust sampling and reconstruction methods for sparse signals in the presence of impulsive noise. IEEE J. Sel. Top. Signal Process. 2010, 4, 392–408.
  57. Lin, Y.Z.; Li, S.; Zhang, Y.Z. Convergence Rate Analysis of Accelerated Forward-Backward Algorithm with Generalized Nesterov Momentum Scheme. Int. J. Numer. Anal. Model. 2023, 20, 518–537.
  58. Zhu, W.; Song, Y.; Xiao, Y. Robust support vector machine classifier with truncated loss function by gradient algorithm. Comput. Ind. Eng. 2022, 172, 108630.
  59. Dua, D.; Graff, C. UCI Machine Learning Repository. 2017. Available online: http://archive.ics.uci.edu/ml (accessed on 28 October 2025).
Figure 1. Total number of iterations of Algorithm 4 versus the fixed restart period N. The minimum number of iterations is achieved at N = 22 (marked by the red circle). For comparison, the value of N determined by Algorithm 3 is 24 (marked by the red star).
Figure 2. The value of d k at each k-th iteration of Algorithm 3. The first iteration k for which d k > d k 1 is 24 (marked by the red star), and thus the restart period N is set to 24.
Figure 3. Total number of iterations of Algorithm 4 versus the fixed restart period N. The minimum number of iterations is achieved at N = 42 (marked by the red circle). For comparison, the value of N determined by Algorithm 3 is 61 (marked by the red star).
Figure 4. The value of d k at each k-th iteration of Algorithm 3. The first iteration k for which d k > d k 1 is 61 (marked by the red star), and thus the restart period N is set to 61.
Table 1. Computational results for Problem (103) with $\epsilon=10^{-4}$.

| | Method | i = 2 | i = 4 | i = 6 | i = 8 | i = 10 |
|---|---|---|---|---|---|---|
| Time (sec) | t_QR | 0.818 | 5.449 | 18.567 | 43.191 | 91.172 |
| | t_Ab | 0.008 | 0.034 | 0.081 | 0.149 | 0.264 |
| | t_‖A‖ | 1.321 | 2.062 | 6.486 | 13.789 | 27.855 |
| | IPOPT | 4.141 | 12.439 | 30.015 | 43.383 | 88.684 |
| | SCP_ls | 3.804 | 15.112 | 31.714 | 56.368 | 92.386 |
| | ESQM_b | 15.321 | 65.436 | 143.914 | 262.850 | 427.229 |
| | ESQM_e | 0.976 | 4.133 | 9.150 | 16.533 | 26.966 |
| | EAPG_sr | 0.986 | 3.963 | 8.812 | 16.132 | 25.874 |
| Iter | IPOPT | 108 | 134 | 146 | 162 | 180 |
| | SCP_ls | 212 | 222 | 217 | 218 | 225 |
| | ESQM_b | 1705 | 1777 | 1786 | 1793 | 1816 |
| | ESQM_e | 108 | 112 | 113 | 112 | 113 |
| | EAPG_sr | 101 | 102 | 104 | 103 | 104 |
| RecErr | IPOPT | 0.048687 | 0.051378 | 0.050837 | 0.051987 | 0.052050 |
| | SCP_ls | 0.049113 | 0.051792 | 0.051330 | 0.052505 | 0.052568 |
| | ESQM_b | 0.066395 | 0.071510 | 0.071093 | 0.072476 | 0.072915 |
| | ESQM_e | 0.048744 | 0.051471 | 0.050940 | 0.052131 | 0.052169 |
| | EAPG_sr | 0.049920 | 0.053142 | 0.052955 | 0.053912 | 0.053758 |
| Residual | IPOPT | 1.99 × 10⁻⁶ | 1.91 × 10⁻⁶ | 1.97 × 10⁻⁶ | 1.98 × 10⁻⁶ | 2.10 × 10⁻⁶ |
| | SCP_ls | 5.69 × 10⁻⁷ | 5.62 × 10⁻⁷ | 6.09 × 10⁻⁷ | 5.63 × 10⁻⁷ | 6.93 × 10⁻⁷ |
| | ESQM_b | 6.79 × 10⁻⁷ | 5.73 × 10⁻⁷ | 5.79 × 10⁻⁷ | 5.63 × 10⁻⁷ | 5.49 × 10⁻⁷ |
| | ESQM_e | 1.22 × 10⁻⁷ | 1.17 × 10⁻⁷ | 1.03 × 10⁻⁷ | 1.08 × 10⁻⁷ | 1.03 × 10⁻⁷ |
| | EAPG_sr | 4.82 × 10⁻⁷ | 6.22 × 10⁻⁷ | 5.90 × 10⁻⁷ | 6.08 × 10⁻⁷ | 6.12 × 10⁻⁷ |
Table 2. Computational results for Problem (103) with $\epsilon=10^{-6}$.

| | Method | i = 2 | i = 4 | i = 6 | i = 8 | i = 10 |
|---|---|---|---|---|---|---|
| Time (sec) | t_QR | 0.883 | 5.840 | 18.999 | 43.567 | 90.874 |
| | t_Ab | 0.009 | 0.035 | 0.078 | 0.149 | 0.264 |
| | t_‖A‖ | 1.365 | 2.199 | 6.517 | 13.888 | 27.822 |
| | IPOPT | 4.781 | 13.733 | 29.939 | 52.517 | 100.201 |
| | SCP_ls | 4.388 | 17.595 | 36.548 | 65.808 | 107.272 |
| | ESQM_b | 23.894 | 103.784 | 232.600 | 423.241 | 690.270 |
| | ESQM_e | 1.679 | 7.989 | 18.182 | 34.184 | 57.380 |
| | EAPG_sr | 1.584 | 7.392 | 14.545 | 27.413 | 46.253 |
| Iter | IPOPT | 127 | 152 | 165 | 181 | 200 |
| | SCP_ls | 242 | 256 | 251 | 254 | 260 |
| | ESQM_b | 2683 | 2792 | 2867 | 2866 | 2902 |
| | ESQM_e | 187 | 214 | 223 | 231 | 240 |
| | EAPG_sr | 161 | 168 | 168 | 171 | 169 |
| RecErr | IPOPT | 0.048687 | 0.051378 | 0.050836 | 0.051986 | 0.052049 |
| | SCP_ls | 0.048695 | 0.051389 | 0.050848 | 0.051997 | 0.052060 |
| | ESQM_b | 0.048820 | 0.051618 | 0.050987 | 0.052167 | 0.052220 |
| | ESQM_e | 0.048689 | 0.051379 | 0.050840 | 0.051985 | 0.052049 |
| | EAPG_sr | 0.048708 | 0.051406 | 0.050868 | 0.052010 | 0.052074 |
| Residual | IPOPT | 1.77 × 10⁻⁶ | 1.74 × 10⁻⁶ | 1.73 × 10⁻⁶ | 1.73 × 10⁻⁶ | 1.75 × 10⁻⁶ |
| | SCP_ls | 5.69 × 10⁻¹⁰ | 5.61 × 10⁻¹⁰ | 6.08 × 10⁻¹⁰ | 5.52 × 10⁻¹⁰ | 6.78 × 10⁻¹⁰ |
| | ESQM_b | 1.01 × 10⁻¹⁰ | 3.07 × 10⁻¹⁰ | 1.05 × 10⁻¹⁰ | 1.36 × 10⁻¹⁰ | 1.30 × 10⁻¹⁰ |
| | ESQM_e | 5.83 × 10⁻¹¹ | 2.06 × 10⁻¹¹ | 3.19 × 10⁻¹¹ | 2.43 × 10⁻¹¹ | 1.96 × 10⁻¹¹ |
| | EAPG_sr | 3.63 × 10⁻¹¹ | 2.59 × 10⁻¹² | 9.27 × 10⁻¹¹ | 6.51 × 10⁻¹¹ | 3.74 × 10⁻¹¹ |
Table 3. Computational results for Problem (105) with $\epsilon=10^{-4}$.

| | Method | i = 2 | i = 4 | i = 6 | i = 8 | i = 10 |
|---|---|---|---|---|---|---|
| Time (sec) | t_QR | 0.763 | 5.485 | 18.964 | 46.317 | 90.568 |
| | t_Ab | 0.008 | 0.038 | 0.092 | 0.167 | 0.269 |
| | t_‖A‖ | 1.280 | 2.201 | 6.614 | 14.669 | 27.917 |
| | IPOPT | 13.380 | 25.354 | 31.578 | 50.119 | 122.681 |
| | SCP_ls | 2.861 | 11.071 | 15.250 | 51.427 | 67.749 |
| | ESQM_b | 9.879 | 41.459 | 89.737 | 180.831 | 279.228 |
| | ESQM_e | 1.803 | 7.540 | 16.394 | 32.972 | 50.844 |
| | EAPG_sr | 1.541 | 6.269 | 13.451 | 26.205 | 42.090 |
| Iter | IPOPT | 334 | 268 | 162 | 151 | 191 |
| | SCP_ls | 183 | 181 | 117 | 207 | 180 |
| | ESQM_b | 1136 | 1149 | 1146 | 1195 | 1163 |
| | ESQM_e | 204 | 207 | 208 | 217 | 209 |
| | EAPG_sr | 170 | 169 | 165 | 167 | 170 |
| RecErr | IPOPT | 0.762756 | 3051.699141 | 0.084689 | 0.084819 | 15.288138 |
| | SCP_ls | 0.081013 | 0.081612 | 0.084733 | 0.083314 | 0.086727 |
| | ESQM_b | 0.086207 | 0.087180 | 0.090687 | 0.089185 | 0.092889 |
| | ESQM_e | 0.080836 | 0.081385 | 0.084550 | 0.083104 | 0.086500 |
| | EAPG_sr | 0.081517 | 0.081882 | 0.085460 | 0.083959 | 0.087379 |
| Residual | IPOPT | 1.21 × 10⁰ | 2.03 × 10⁰ | 2.22 × 10⁻⁷ | 2.03 × 10⁻⁷ | 1.96 × 10⁰ |
| | SCP_ls | 4.88 × 10⁻⁸ | 4.03 × 10⁻⁸ | 6.11 × 10⁻⁸ | 3.70 × 10⁻⁸ | 4.19 × 10⁻⁸ |
| | ESQM_b | 9.40 × 10⁻⁸ | 9.44 × 10⁻⁸ | 9.19 × 10⁻⁸ | 9.43 × 10⁻⁸ | 9.11 × 10⁻⁸ |
| | ESQM_e | 3.21 × 10⁻⁸ | 2.71 × 10⁻⁸ | 1.87 × 10⁻⁸ | 2.41 × 10⁻⁸ | 3.15 × 10⁻⁸ |
| | EAPG_sr | 2.98 × 10⁻⁸ | 2.41 × 10⁻⁸ | 3.26 × 10⁻⁸ | 3.20 × 10⁻⁸ | 2.31 × 10⁻⁸ |
Table 4. Computational results for Problem (105) with $\epsilon=10^{-6}$.

| | Method | i = 2 | i = 4 | i = 6 | i = 8 | i = 10 |
|---|---|---|---|---|---|---|
| Time (sec) | t_QR | 0.910 | 5.989 | 19.148 | 44.312 | 91.606 |
| | t_Ab | 0.009 | 0.034 | 0.081 | 0.146 | 0.263 |
| | t_‖A‖ | 1.440 | 2.323 | 6.580 | 14.279 | 28.077 |
| | IPOPT | 16.414 | 27.392 | 35.748 | 56.783 | 135.875 |
| | SCP_ls | 3.357 | 12.381 | 18.140 | 52.182 | 72.401 |
| | ESQM_b | 14.465 | 59.228 | 130.443 | 243.864 | 385.851 |
| | ESQM_e | 2.457 | 9.712 | 21.284 | 40.449 | 65.420 |
| | EAPG_sr | 2.503 | 9.822 | 20.052 | 37.168 | 59.899 |
| Iter | IPOPT | 343 | 277 | 172 | 161 | 201 |
| | SCP_ls | 198 | 196 | 134 | 225 | 196 |
| | ESQM_b | 1555 | 1580 | 1604 | 1650 | 1632 |
| | ESQM_e | 259 | 258 | 260 | 272 | 276 |
| | EAPG_sr | 261 | 259 | 243 | 249 | 250 |
| RecErr | IPOPT | 0.762755 | 3051.699141 | 0.084688 | 0.084818 | 15.288138 |
| | SCP_ls | 0.080891 | 0.081482 | 0.084545 | 0.083153 | 0.086576 |
| | ESQM_b | 0.080938 | 0.081531 | 0.084597 | 0.083204 | 0.086629 |
| | ESQM_e | 0.080887 | 0.081479 | 0.084540 | 0.083149 | 0.086571 |
| | EAPG_sr | 0.080896 | 0.081489 | 0.084538 | 0.083148 | 0.086569 |
| Residual | IPOPT | 1.21 × 10⁰ | 2.03 × 10⁰ | 1.75 × 10⁻⁷ | 1.71 × 10⁻⁷ | 1.96 × 10⁰ |
| | SCP_ls | 4.60 × 10⁻¹¹ | 2.96 × 10⁻¹¹ | 2.97 × 10⁻¹¹ | 4.15 × 10⁻¹¹ | 3.98 × 10⁻¹¹ |
| | ESQM_b | 1.07 × 10⁻¹¹ | 1.03 × 10⁻¹¹ | 9.55 × 10⁻¹² | 9.66 × 10⁻¹² | 9.25 × 10⁻¹² |
| | ESQM_e | 3.40 × 10⁻¹² | 6.13 × 10⁻¹² | 4.77 × 10⁻¹² | 3.88 × 10⁻¹² | 1.69 × 10⁻¹² |
| | EAPG_sr | 6.00 × 10⁻¹² | 4.99 × 10⁻¹² | 5.67 × 10⁻¹² | 5.66 × 10⁻¹² | 9.82 × 10⁻¹² |
Table 5. Computational results for Problem (103) with $i=2$, $\epsilon=10^{-4}$.

| Method | Time | Iter | RecErr | Residual |
|---|---|---|---|---|
| Algorithm 1 (K = 30) | 3.901 | 302 | 0.048687 | 9.72 × 10⁻⁷ |
| Algorithm 2 (K = 30) | 2.179 | 184 | 0.048684 | 4.36 × 10⁻¹⁰ |
| Algorithm 1 (K = 100) | 10.828 | 825 | 0.048687 | 9.91 × 10⁻⁷ |
| Algorithm 2 (K = 100) | 4.339 | 360 | 0.048689 | 2.09 × 10⁻⁸ |
| Algorithm 3 | 1.029 | 101 | 0.049920 | 4.82 × 10⁻⁷ |
| Variant (a) | 1.827 | 173 | 0.048779 | 7.19 × 10⁻⁷ |
| Variant (b) | 1.009 | 101 | 0.049920 | 4.82 × 10⁻⁷ |
| Variant (c) | 1.010 | 101 | 0.049920 | 4.82 × 10⁻⁷ |
| Variant (d) | 1.376 | 135 | 0.049629 | 2.74 × 10⁻⁷ |
| Variant (e) | 1.298 | 129 | 0.049143 | 5.64 × 10⁻⁷ |
Table 6. Computational results for Problem (105) with $i=2$, $\epsilon=10^{-4}$.

| Method | Time | Iter | RecErr | Residual |
|---|---|---|---|---|
| Algorithm 1 (K = 30) | 3.983 | 349 | 0.080888 | 9.78 × 10⁻⁸ |
| Algorithm 2 (K = 30) | 1.998 | 205 | 0.080925 | 3.81 × 10⁻⁹ |
| Algorithm 1 (K = 100) | 12.048 | 935 | 0.080888 | 9.89 × 10⁻⁸ |
| Algorithm 2 (K = 100) | 3.821 | 361 | 0.080886 | 6.69 × 10⁻⁹ |
| Algorithm 3 | 1.697 | 170 | 0.081517 | 2.98 × 10⁻⁸ |
| Variant (a) | 7.611 | 762 | 0.081102 | 7.24 × 10⁻⁸ |
| Variant (b) | 2.280 | 204 | 0.080623 | 1.56 × 10⁻⁸ |
| Variant (c) | 2.261 | 204 | 0.080623 | 1.56 × 10⁻⁸ |
| Variant (d) | 2.124 | 201 | 0.080910 | 1.96 × 10⁻⁸ |
| Variant (e) | 1.791 | 187 | 0.080645 | 6.61 × 10⁻⁸ |
Table 7. Comparison of five methods on 8 datasets.

Iter
Dataset | IPOPT | GIST | pDCAe | APG_s | EAPG_s^r
Australian | 138 | 678 | 343 | 179 | 148
Banknote | 46 | 2859 | 3000 | 78 | 76
Blood | 45 | 290 | 157 | 79 | 74
German | 110 | 2636 | 215 | 356 | 288
Glass | 101 | 1401 | 206 | 174 | 126
Hepatitis | 116 | 1269 | 230 | 227 | 187
Landmines | 37 | 490 | 3000 | 69 | 70
Tic | 66 | 67 | 147 | 120 | 107

Fval
Dataset | IPOPT | GIST | pDCAe | APG_s | EAPG_s^r
Australian | 0.417258 | 4.222325 | 0.753193 | 0.417256 | 0.417256
Banknote | 0.524224 | 0.524248 | 0.749316 | 0.524223 | 0.524223
Blood | 0.559554 | 0.559553 | 0.565832 | 0.559553 | 0.559553
German | 0.590471 | 8.832912 | 0.618164 | 0.590463 | 0.590463
Glass | 0.373864 | 0.373862 | 0.566074 | 0.374673 | 0.374403
Hepatitis | 0.415421 | 0.415417 | 0.518707 | 0.415417 | 0.415417
Landmines | 0.521345 | 0.521344 | 0.774479 | 0.521344 | 0.521344
Tic | 0.653866 | 0.653864 | 0.656695 | 0.653864 | 0.653864

Time (s)
Dataset | IPOPT | GIST | pDCAe | APG_s | EAPG_s^r
Australian | 0.345 | 0.973 | 0.155 | 0.064 | 0.053
Banknote | 0.132 | 5.630 | 2.422 | 0.053 | 0.051
Blood | 0.100 | 0.210 | 0.067 | 0.027 | 0.026
German | 0.331 | 3.020 | 0.167 | 0.207 | 0.160
Glass | 0.169 | 1.487 | 0.064 | 0.057 | 0.046
Hepatitis | 0.194 | 1.011 | 0.048 | 0.050 | 0.044
Landmines | 0.068 | 0.271 | 0.695 | 0.032 | 0.027
Tic | 0.142 | 0.084 | 0.159 | 0.084 | 0.076
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, Z.; Ke, H.; Liu, C. The Extended Second APG Method for Constrained DC Problems. Axioms 2026, 15, 7. https://doi.org/10.3390/axioms15010007

