Article

A Novel Symmetrical Inertial Alternating Direction Method of Multipliers with Proximal Term for Nonconvex Optimization with Applications

1 College of Mathematics and Statistics, Sichuan University of Science and Engineering, Zigong 643000, China
2 Sichuan Province University Key Laboratory of Bridge Non-Destruction Detecting and Engineering Computing, Zigong 643000, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(6), 887; https://doi.org/10.3390/sym17060887
Submission received: 13 April 2025 / Revised: 28 May 2025 / Accepted: 1 June 2025 / Published: 5 June 2025
(This article belongs to the Special Issue Symmetry in Mathematical Optimization Algorithm and Its Applications)

Abstract

In this paper, we propose a novel alternating direction method of multipliers based on an acceleration technique involving two symmetrical inertial terms for a class of nonconvex optimization problems with a two-block structure. To address the nonconvex subproblem, we introduce a proximal term that reduces the difficulty of solving this subproblem. For the smooth subproblem, we employ a gradient descent step on the augmented Lagrangian function, which significantly reduces the computational complexity. Under appropriate assumptions, we prove subsequential convergence of the algorithm. Moreover, when the generated sequence is bounded and the auxiliary function satisfies the Kurdyka–Łojasiewicz property, we establish global convergence of the algorithm. Finally, the effectiveness and superior performance of the proposed algorithm are validated through numerical experiments on signal processing and smoothly clipped absolute deviation penalty problems.

1. Introduction

It is well known that recovering sparse signals from incomplete observations is an important research topic in practical applications. The core objective is to find the sparsest solution to a system of linear equations, which can be formulated as the following model [1]:
min_x c‖x‖_0 + (1/2)‖Ax − b‖²,  (1)
where A ∈ ℝ^{m×n} is the measurement matrix, b ∈ ℝ^m is the observed data, x ∈ ℝ^n is a sparse signal, c > 0 is a regularization parameter, and ‖·‖_0 denotes the ℓ_0-norm, i.e., the number of nonzero entries. However, Chartrand and Staneva [2] pointed out that (1) represents a class of problems that are fundamentally difficult to solve. To overcome this challenge, Zeng et al. [3] proposed a relaxed objective function: by replacing the ℓ_0 regularization with the ℓ_{1/2} regularization, the model (1) is transformed into a more tractable nonconvex optimization problem. This modification is therefore more reasonable in signal recovery problems, and it leads to the following two-block nonconvex optimization problem:
min_x c‖x‖_{1/2}^{1/2} + (1/2)‖Ax − b‖²,  (2)
where ‖x‖_{1/2}^{1/2} = ∑_{i=1}^n |x_i|^{1/2}. Further, to mitigate feature compression and information loss in motor imagery decoding, Doostmohammadian et al. [4] proposed a Cauchy-based nonconvex sparse regularization model, which enhances feature extraction and noise reduction across datasets. In general, introducing an auxiliary variable z ∈ ℝ^m such that z = Ax − b in (2), we reformulate the problem (2) as follows:
min c‖x‖_{1/2}^{1/2} + (1/2)‖z‖²  s.t.  Ax − z = b.  (3)
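As a quick numerical aside (our own illustration, not from the paper), the ℓ_{1/2} quasi-norm appearing in (2) and (3) is straightforward to evaluate; note that, unlike the ℓ₁ norm, it is nonconvex:

```python
import numpy as np

def half_quasi_norm(x):
    """Evaluate ||x||_{1/2}^{1/2} = sum_i |x_i|^{1/2}."""
    return float(np.sum(np.sqrt(np.abs(x))))

# Example: for x = (4, 0, 1), the value is 2 + 0 + 1 = 3.
value = half_quasi_norm(np.array([4.0, 0.0, 1.0]))
```

Raising each |x_i| to the power 1/2 penalizes small nonzero entries relatively more than the ℓ₁ norm does, which is why this regularizer promotes sparser solutions.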
We notice that many scholars have investigated constrained nonconvex optimization problems of the form (3). In particular, Zeng et al. [3] pointed out that the iterative soft-thresholding algorithm can be used to solve the regularization problem, which was validated in the context of problem (3). Meanwhile, Chen and Selesnick [5] validated the performance of model (3) using an improved overlapping shrinkage algorithm. Further related works can be found in [6,7] and the references therein.
In order to solve (3), we consider a general model of (3) in this paper. That is, extending the two terms of the objective in (3), respectively, to a lower semicontinuous function f: ℝ^n → ℝ ∪ {+∞} and a differentiable function g: ℝ^m → ℝ whose gradient is L-Lipschitz continuous with L > 0, and setting y = −z ∈ ℝ^m, the problem (3) is generalized as the following nonconvex optimization problem with linear constraints:
min f(x) + g(y)  s.t.  Ax + y = b,  (4)
where A is the same as in (1). In recent years, nonconvex optimization problems of the form (4) have found widespread applications in science and engineering. For instance, based on a gradient tracking algorithm, Zhang et al. [8] investigated optimization of local nonconvex objective functions in time-varying networks. In addition, to ensure the effectiveness of image reconstruction, Tiddeman and Ghahremani [9] combined wavelet transforms with principal component analysis to propose a class of principal component waveform networks for solving linear inverse problems, and fully utilized the symmetry in wavelet transforms during the wavelet decomposition. For more related works, one can see [10,11,12] and the references therein.
In fact, variants of (4) have found applications in various fields, such as statistical learning [13,14,15], penalized zero-variance discriminant analysis [16], and image reconstruction [17,18]. Furthermore, if f(x) = ∑_{i=1}^n h_κ(|x_i|) for x = (x_1, x_2, …, x_n)^T ∈ ℝ^n, g(y) = (1/2)‖y‖², and y = −p in (4), then the nonconvex optimization degenerates to the following smoothly clipped absolute deviation (SCAD) penalty problem:
min ∑_{i=1}^n h_κ(|x_i|) + (1/2)‖p‖²  s.t.  Ax − p = b,  (5)
with the penalty function h_κ in the objective; we refer readers to (26) later for its definition. It should be noted that, in statistical optimization, certain penalty methods exhibit limitations, such as vulnerability to data circumvention and biased estimation of significant variables [19]. To address these issues, Fan and Li [20] proposed the SCAD penalty function, developed optimization algorithms to solve nonconcave penalized likelihood problems, and demonstrated that this method possesses asymptotic oracle properties. Remarkably, with appropriate regularization parameter selection, the results can achieve nearly identical performance to the known true model.
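Since the paper's precise definition of h_κ only appears later in (26), the following sketch of the standard SCAD penalty of Fan and Li is included purely for orientation; the shape parameter a > 2 (commonly a = 3.7) and all names are our assumptions, not taken from the text:

```python
import numpy as np

def scad_penalty(t, kappa=1.0, a=3.7):
    """Standard SCAD penalty h_kappa(|t|), evaluated elementwise.

    Linear near zero (like the l1 norm), quadratic in the middle band,
    and constant for |t| > a*kappa, so large coefficients are not shrunk.
    """
    t = np.abs(np.asarray(t, dtype=float))
    mid = (t > kappa) & (t <= a * kappa)
    return np.where(
        t <= kappa, kappa * t,
        np.where(mid, (2 * a * kappa * t - t ** 2 - kappa ** 2) / (2 * (a - 1)),
                 kappa ** 2 * (a + 1) / 2),
    )
```

The constant tail for |t| > aκ is what yields the unbiased estimation of large coefficients mentioned above: beyond that band the penalty no longer grows, so significant variables are left unshrunk.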
It is well known that the alternating direction method of multipliers (ADMM) has gained widespread attention due to its balance between performance and efficiency. When the subproblems are independent, ADMM exhibits a unique symmetry: with appropriately designed update steps, this symmetry ensures that the convergence of ADMM is independent of the order in which the subproblems are updated [21]. In recent years, as nonconvex optimization problems have gained increasing attention, convergence analysis of ADMM in nonconvex settings has become a research hotspot. Addressing the lack of theoretical guarantees for ADMM in nonconvex problems, Hong et al. [22] not only established a convergence theory for ADMM in the nonconvex setting but also overcame the limitation on the number of variable blocks. Wang et al. [23] demonstrated that incorporating the Bregman distance into ADMM can effectively simplify the computation of subproblems, emphasizing the feasibility of ADMM in nonconvex settings. Ding et al. [24] proposed a class of semi-proximal ADMM for solving low-rank matrix recovery problems; furthermore, in the presence of noisy matrix data, by minimizing the nuclear norm, they effectively addressed Gaussian noise and related mixed noise. Guo et al. [25] provided insights into solving large-scale nonconvex optimization problems using ADMM. For more related work, readers may refer to [26,27,28,29] and the references therein.
The inertial acceleration technique, which is derived from the heavy-ball method, utilizes information from previous iterations to construct affine combinations [30]. Additionally, the inertial technique can employ different extrapolation strategies during the optimization process to enhance the convergence speed. Using a general inertial proximal gradient method, Wu and Li [31] proposed two distinct extrapolation strategies to flexibly adjust the convergence rate of the method. Chao et al. [32] investigated an inertial proximal ADMM and established global convergence of the iterates under appropriate assumptions. Moreover, Wang et al. [33] considered a different inertial update scheme, which not only preserves the acceleration effect of inertia but also reduces the computational error caused by the inertial term.
Unfortunately, Wang et al. [33] only considered an inertial update step for x. We note that Chen et al. [34] discovered that embedding an inertial term into the y-subproblem can also significantly improve the convergence speed of the resulting algorithm. Inspired by the work in [33,34], we adopt the inertial update step for x proposed by Wang et al. [33] and additionally consider an inertial update step for y to further accelerate the convergence of the algorithm. Based on this, we propose a novel symmetrical inertial alternating direction method of multipliers with proximal term (NIP-ADMM) to solve (4) and the two application problems (3) and (5). Our main contributions can be summarized as follows:
(i)
Building upon the inertial update step of [33], we introduce an additional inertial update for y and incorporate ȳ^k into the x-subproblem update. This form of inertial update ensures that the primal variables are treated symmetrically, thereby achieving faster acceleration. In addition, we introduce two distinct inertial parameters to avoid the differentiated feedback that a single inertial parameter may impose on different inertial terms.
(ii)
To simplify the computation of the subproblems, we introduce a proximal term in the x-subproblem so that, under appropriate conditions, a closed-form solution can be obtained in practical applications.
(iii)
Under reasonable assumptions, we prove that any cluster point of the sequence generated by NIP-ADMM belongs to the set of critical points of the augmented Lagrangian function. Furthermore, under the condition that the auxiliary function satisfies the Kurdyka–Łojasiewicz property (KLP), we establish that the sequence generated by NIP-ADMM converges to a stationary point of the augmented Lagrangian function.
(iv)
Since the function g in (4) is continuously differentiable, its gradient ∇g is well-defined, which enables us to abandon the traditional ADMM update scheme for the y-subproblem and instead adopt a gradient descent approach. This method requires only the computation of gradients at each iteration and significantly reduces computational complexity; consequently, it offers substantial advantages when handling high-dimensional or large-scale datasets.
The structure of this paper is as follows. In Section 2, we review essential results required for further analysis. We present NIP-ADMM and analyze its convergence in Section 3. Numerical experiments and applications to the signal recovery model and the SCAD penalty problem in Section 4 highlight the benefits of the proximal and inertial techniques. Lastly, in Section 5, we provide a conclusion.

2. Preliminaries

In this section, we introduce key notations and definitions that are essential for the results to be developed and are utilized in the subsequent sections.
Let ⟨x, y⟩ = x^T y and ‖x‖ = √⟨x, x⟩ for x, y ∈ ℝ^n. If a matrix S is positive definite (positive semi-definite), we write S ≻ 0 (S ⪰ 0). Given an n × n matrix S ⪰ 0 and a vector x ∈ ℝ^n, let ‖x‖_S := √(x^T S x) be the S-norm of x. For a matrix D ∈ ℝ^{n×m}, we define λ_min(D) and λ_max(D) as the smallest and largest eigenvalues of D^T D, respectively. For f: ℝ^n → (−∞, +∞], the domain of f is defined as dom f = {x ∈ ℝ^n : f(x) < +∞}.
Definition 1.
Let Q ⊆ ℝ^n. Then the distance from a point x ∈ ℝ^n to Q is defined as d(x, Q) = inf_{y∈Q} ‖y − x‖. In particular, if Q = ∅, then d(x, Q) = +∞.
Definition 2.
For a differentiable convex function F: ℝ^n → ℝ, the Bregman distance is defined by
D_F(p, q) = F(p) − F(q) − ⟨∇F(q), p − q⟩,  ∀p, q ∈ ℝ^n.
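Definition 2 can be checked numerically. In the sketch below (our example), choosing F(x) = ½‖x‖² recovers half the squared Euclidean distance:

```python
import numpy as np

def bregman_distance(F, gradF, p, q):
    """Bregman distance D_F(p, q) = F(p) - F(q) - <gradF(q), p - q>."""
    return F(p) - F(q) - np.dot(gradF(q), p - q)

# For F(x) = 0.5 * ||x||^2, the Bregman distance is 0.5 * ||p - q||^2.
F = lambda x: 0.5 * np.dot(x, x)
gradF = lambda x: x
p, q = np.array([1.0, 2.0]), np.array([3.0, -1.0])
d = bregman_distance(F, gradF, p, q)  # equals 0.5 * ||p - q||^2 = 6.5
```

For other convex choices of F the Bregman distance is generally asymmetric in p and q, which is precisely what makes it a flexible proximity measure in Bregman-type ADMM variants such as [23].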
Definition 3.
Assume  χ : R n R { + }  is a proper lower semicontinuous function.
(i)
The Fréchet sub-differential of χ at x ∈ dom χ is denoted by ∂̂χ(x) and defined as:
∂̂χ(x) = { x̄ ∈ ℝ^n : liminf_{y→x, y≠x} [χ(y) − χ(x) − ⟨x̄, y − x⟩] / ‖y − x‖ ≥ 0 }.
Moreover, we set ∂̂χ(x) = ∅ when x ∉ dom χ.
(ii)
The limiting sub-differential of χ at x ∈ dom χ is written as ∂χ(x) and defined by
∂χ(x) = { x̄ ∈ ℝ^n : ∃ x^k → x, χ(x^k) → χ(x), x̂^k ∈ ∂̂χ(x^k), x̂^k → x̄ }.
Definition 4.
We say the point (x, y, λ) is a critical point of the following augmented Lagrangian function associated with problem (4):
L_β(x, y, λ) = f(x) + g(y) − ⟨λ, Ax + y − b⟩ + (β/2)‖Ax + y − b‖²,  (6)
where λ denotes the augmented Lagrange multiplier and β > 0 is a penalty parameter, if it satisfies the following conditions:
A^Tλ ∈ ∂f(x),  λ = ∇g(y),  Ax + y = b.
Definition 5
([35]). (KLP) Let χ: ℝ^n → ℝ ∪ {+∞} be a proper lower semicontinuous function. We say that χ has the KLP at p̂ ∈ dom ∂χ if there exist ς ∈ (0, +∞], a neighborhood U of p̂, and a function φ ∈ V_φ, where V_φ denotes the set of concave functions φ: [0, ς) → [0, +∞) that are continuous at 0, continuously differentiable on (0, ς), and satisfy φ(0) = 0 and φ′ > 0 on (0, ς), such that for any p ∈ U ∩ {p : χ(p̂) < χ(p) < χ(p̂) + ς}, the following inequality holds:
φ′(χ(p) − χ(p̂)) · d(0, ∂χ(p)) ≥ 1.
The function φ is called the associated function of χ with the KLP.
Lemma 1
([36]). The sub-differential of a lower semicontinuous function χ : R n R { + } possesses several fundamental and significant properties as follows:
(i)
From Definition 3, it follows that ∂̂χ(x) ⊆ ∂χ(x) holds for all x ∈ ℝ^n; moreover, ∂̂χ(x) is closed and convex, and ∂χ(x) is a closed set.
(ii)
Suppose that (x^k, y^k) is a sequence that converges to (x, y), that χ(x^k) converges to χ(x), and that y^k ∈ ∂χ(x^k). Then, by the definition of the sub-differential, we have y ∈ ∂χ(x).
(iii)
If x is a local minimizer of χ, then it follows that 0 ∈ ∂χ(x).
(iv)
Assuming that F: ℝ^n → ℝ is a continuously differentiable function, we can derive:
∂(χ + F)(x) = ∂χ(x) + ∇F(x).
Lemma 2
([35]). Assume Y(x, y) = f(x) + g(y), where f: ℝ^n → ℝ ∪ {+∞} and g: ℝ^m → ℝ ∪ {+∞} are both proper lower semicontinuous functions. Then, for any (x, y) ∈ dom Y = dom f × dom g, we can obtain
∂Y(x, y) = ∂f(x) × ∂g(y).
Lemma 3
([37]). (Uniformized KLP) Let Ω be a compact set and let V_φ be as in Definition 5. Suppose the proper lower semicontinuous function χ: ℝ^n → ℝ ∪ {+∞} is constant on Ω and satisfies the KLP at every point of Ω. Then there exist ϱ > 0, ς > 0, and φ ∈ V_φ such that for any x̂ ∈ Ω and any x ∈ {x ∈ ℝ^n : d(x, Ω) < ϱ} ∩ {x : χ(x̂) < χ(x) < χ(x̂) + ς}, the following inequality is satisfied:
φ′(χ(x) − χ(x̂)) · d(0, ∂χ(x)) ≥ 1.
Lemma 4
([38]). If the function c: ℝ^n → ℝ is continuously differentiable, and ∇c is Lipschitz continuous with constant L ≥ 0, then for any x, y ∈ ℝ^n, the following result holds:
|c(y) − c(x) − ⟨∇c(x), y − x⟩| ≤ (L/2)‖y − x‖².
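Lemma 4 (the descent lemma) is easy to verify for a quadratic function, whose gradient is Lipschitz with constant L = λ_max(A^T A); the matrix and test points below are arbitrary data of our own:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
L = float(np.linalg.eigvalsh(A.T @ A).max())  # Lipschitz constant of the gradient

c = lambda x: 0.5 * np.linalg.norm(A @ x) ** 2
grad_c = lambda x: A.T @ (A @ x)

x, y = rng.standard_normal(3), rng.standard_normal(3)
lhs = abs(c(y) - c(x) - grad_c(x) @ (y - x))
rhs = 0.5 * L * np.linalg.norm(y - x) ** 2
assert lhs <= rhs + 1e-10  # |c(y) - c(x) - <grad c(x), y - x>| <= (L/2)||y - x||^2
```

For this quadratic c the left-hand side equals ½‖A(y − x)‖² exactly, so the bound holds with equality when y − x is a top eigenvector of A^T A.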

3. Novel Algorithm and Convergence Analysis

In this section, based on (6), we propose NIP-ADMM for solving the problem (4), which is outlined below:
Algorithm 1 NIP-ADMM
  • Initialization: Input x^1, y^1, and λ^1, and let x̄^0 = x^1 and ȳ^0 = y^1. Given constants η, θ ∈ (0, 1], γ, β ∈ (0, +∞), and S ∈ ℝ^{n×n}. Set an appropriate stopping parameter ϖ ∈ ℝ, choose a sufficiently small h, and finally set k = 1.
  • for k = 1, 2, … do
    1° Compute (x̄^k, ȳ^k) = (x^k, y^k) + θ(x^k − x̄^{k−1}, 0) + η(0, y^k − ȳ^{k−1}).
    2° Compute x^{k+1} ∈ arg min_x { L_β(x, ȳ^k, λ^k) + (1/2)‖x − x̄^k‖²_S }.
    3° Calculate y^{k+1} = y^k − γ∇_y L_β(x^{k+1}, y^k, λ^k).
    4° Update the dual variable λ^{k+1} = λ^k − β(Ax^{k+1} + y^{k+1} − b).
    5° If ϖ ≤ h, then break; else set k = k + 1 and continue.
  • end for
  Output: (x^{k+1}, y^{k+1}, λ^{k+1}) as an approximate solution of the problem (4).
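To make the steps of Algorithm 1 concrete, the following is a minimal sketch of the NIP-ADMM loop on a small instance of (4), with f(x) = c‖x‖₁ substituted for a nonconvex term so that step 2° reduces to soft-thresholding (a closed form we can state safely), g(y) = ½‖y‖², and S = e₀I − βA^TA as in Section 4; all parameter values are illustrative assumptions rather than the paper's tuned choices:

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: proximal operator of t * ||.||_1 at v."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def nip_admm_l1(A, b, c=0.1, beta=1.0, gamma=0.3, theta=0.5, eta=0.5, iters=300):
    m, n = A.shape
    # e0 >= beta * lambda_max(A^T A) makes S = e0*I - beta*A^T A positive semi-definite
    e0 = beta * np.linalg.eigvalsh(A.T @ A).max() + 1.0
    x = xbar = np.zeros(n)
    y = ybar = np.zeros(m)
    lam = np.zeros(m)
    for _ in range(iters):
        # step 1: symmetrical inertial extrapolation with parameters theta and eta
        xbar, ybar = x + theta * (x - xbar), y + eta * (y - ybar)
        # step 2: with this S the x-subproblem collapses to a single prox step
        v = A.T @ lam - beta * A.T @ (ybar - b) + e0 * xbar - beta * A.T @ (A @ xbar)
        x = soft(v / e0, c / e0)
        # step 3: gradient step on y, since grad_y L_beta = y - lam + beta*(A x + y - b)
        y = y - gamma * (y - lam + beta * (A @ x + y - b))
        # step 4: dual update
        lam = lam - beta * (A @ x + y - b)
    return x, y, lam
```

The key design point mirrored here is step 2: because the quadratic part of L_β plus the proximal term ½‖x − x̄^k‖²_S has Hessian e₀I, the x-update costs one proximal evaluation per iteration instead of a linear solve.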
Remark 1.
(i) In Algorithm 1, S is a positive semi-definite matrix. Moreover, the stopping parameter ϖ appearing in step 5° can be related to ‖x^{k+1} − x^k‖ and ‖y^{k+1} − y^k‖.
(ii)
The update scheme in step 3° of Algorithm 1 for the y-subproblem adopts the gradient descent method, where ∇_y L_β is the gradient of the function L_β with respect to y, and γ is called the learning rate.
(iii)
The inertial structure adopted in Algorithm 1 employs a structurally balanced acceleration strategy. This update strategy is mathematically symmetric in the two blocks, with the only distinction being the values of the parameters η and θ.
According to Algorithm 1, the optimality conditions for NIP-ADMM are obtained as
0 ∈ ∂f(x^{k+1}) − A^Tλ^k + βA^T(Ax^{k+1} + ȳ^k − b) + S(x^{k+1} − x̄^k),
0 = ∇g(y^k) − λ^k + β(Ax^{k+1} + y^k − b) + (1/γ)(y^{k+1} − y^k).  (7)
Before concluding this section, we present the following fundamental assumptions, which are essential for convergence analysis.
Assumption 1. (i) f: ℝ^n → ℝ ∪ {+∞} is a proper lower semicontinuous function; g: ℝ^m → ℝ is continuously differentiable, and ∇g is Lipschitz continuous with Lipschitz constant L > 0.
(ii)
S is a positive semidefinite matrix.
(iii)
For convenience, we introduce the following symbols:
ζ = (x, y, λ), ζ^k = (x^k, y^k, λ^k), ζ* = (x*, y*, λ*), ξ = 1/γ − β,
ζ̂^k = (x^k, y^k, λ^k, x̄^{k−1}, x^{k−1}, y^{k−1}),
σ₀ = 1/γ − (L + β)/2 − 2ξ²/β − 2(ξ + L)²/β,
L̂_β(ζ̂^k) = L_β(ζ^k) + (2(ξ + L)²/β)‖y^k − y^{k−1}‖² + (1/2)‖x^k − x̄^{k−1}‖²_S.
(iv)
To guarantee the monotonicity of {L̂_β(ζ̂^k)}, we assume σ₀ > 0.
Lemma 5.
If conditions (i)–(iv) in Assumption 1 hold, then for any k ≥ 1,
L̂_β(x^{k+1}, y^{k+1}, λ^{k+1}, x̄^k, x^k, y^k) ≤ L̂_β(x^k, y^k, λ^k, x̄^{k−1}, x^{k−1}, y^{k−1}) − ((1 − θ²)/2)‖x^k − x̄^{k−1}‖²_S − σ₀‖y^{k+1} − y^k‖²,  (8)
where θ ( 0 , 1 ] is the inertial parameter in Algorithm 1.
Proof. 
According to the definition of the augmented Lagrangian function (6) and the update of λ in Algorithm 1, one gets
L_β(x^{k+1}, y^{k+1}, λ^{k+1}) − L_β(x^{k+1}, y^{k+1}, λ^k) = −⟨λ^{k+1} − λ^k, Ax^{k+1} + y^{k+1} − b⟩ = (1/β)‖λ^{k+1} − λ^k‖²,  (9)
and we also know that
L_β(x^{k+1}, y^{k+1}, λ^k) − L_β(x^{k+1}, ȳ^k, λ^k) = g(y^{k+1}) − g(ȳ^k) − ⟨λ^k, y^{k+1} − ȳ^k⟩ + (β/2)‖Ax^{k+1} + y^{k+1} − b‖² − (β/2)‖Ax^{k+1} + ȳ^k − b‖².  (10)
Since x^{k+1} is the optimal solution to the subproblem with respect to x in 2° of Algorithm 1, and x^k − x̄^k = −θ(x^k − x̄^{k−1}) by step 1°, one knows that
L_β(x^{k+1}, ȳ^k, λ^k) − L_β(x^k, ȳ^k, λ^k) ≤ (1/2)‖x^k − x̄^k‖²_S − (1/2)‖x^{k+1} − x̄^k‖²_S = (1/2)‖x^k − x̄^{k−1}‖²_S − (1/2)‖x^{k+1} − x̄^k‖²_S − ((1 − θ²)/2)‖x^k − x̄^{k−1}‖²_S.  (11)
Furthermore, we know
L_β(x^{k+1}, ȳ^k, λ^k) − L_β(x^{k+1}, y^k, λ^k) = g(ȳ^k) − g(y^k) − ⟨λ^k, ȳ^k − y^k⟩ + (β/2)‖Ax^{k+1} + ȳ^k − b‖² − (β/2)‖Ax^{k+1} + y^k − b‖².  (12)
Summing up (10)–(12), we have
L_β(x^{k+1}, y^{k+1}, λ^k) − L_β(x^k, y^k, λ^k) ≤ g(y^{k+1}) − g(y^k) − ⟨λ^k, y^{k+1} − y^k⟩ + (β/2)‖Ax^{k+1} + y^{k+1} − b‖² − (β/2)‖Ax^{k+1} + y^k − b‖² + (1/2)‖x^k − x̄^{k−1}‖²_S − (1/2)‖x^{k+1} − x̄^k‖²_S − ((1 − θ²)/2)‖x^k − x̄^{k−1}‖²_S.  (13)
It follows from (7), the update of λ, and Lemma 4 that
g(y^{k+1}) − g(y^k) − ⟨λ^k, y^{k+1} − y^k⟩
≤ ⟨∇g(y^k), y^{k+1} − y^k⟩ + (L/2)‖y^{k+1} − y^k‖² − ⟨λ^k, y^{k+1} − y^k⟩
= ⟨λ^{k+1} + (1/γ − β)(y^k − y^{k+1}), y^{k+1} − y^k⟩ + (L/2)‖y^{k+1} − y^k‖² − ⟨λ^k, y^{k+1} − y^k⟩
= (L/2 + β − 1/γ)‖y^{k+1} − y^k‖² + ⟨λ^{k+1} − λ^k, y^{k+1} − y^k⟩,  (14)
and we get
(β/2)‖Ax^{k+1} + y^{k+1} − b‖² − (β/2)‖Ax^{k+1} + y^k − b‖²
= (β/2)‖Ax^{k+1} + y^{k+1} − b‖² − (β/2)‖(Ax^{k+1} + y^{k+1} − b) + (y^k − y^{k+1})‖²
= −(β/2)‖y^k − y^{k+1}‖² − β⟨Ax^{k+1} + y^{k+1} − b, y^k − y^{k+1}⟩
= −(β/2)‖y^k − y^{k+1}‖² + ⟨λ^k − λ^{k+1}, y^{k+1} − y^k⟩.
Substituting the above two formulas into (13), and noting that the two inner-product terms cancel, one can declare
L_β(x^{k+1}, y^{k+1}, λ^k) − L_β(x^k, y^k, λ^k) ≤ ((L + β)/2 − 1/γ)‖y^{k+1} − y^k‖² + (1/2)‖x^k − x̄^{k−1}‖²_S − (1/2)‖x^{k+1} − x̄^k‖²_S − ((1 − θ²)/2)‖x^k − x̄^{k−1}‖²_S.
Noticing Algorithm 1 and (7), one can see
(1/γ − β)(y^k − y^{k+1}) = ∇g(y^k) − λ^{k+1}.
Thus, it is natural to derive the following:
‖λ^{k+1} − λ^k‖² = ‖∇g(y^k) − ∇g(y^{k−1}) + (1/γ − β)(y^{k+1} − y^k) − (1/γ − β)(y^k − y^{k−1})‖² ≤ 2(L + 1/γ − β)²‖y^k − y^{k−1}‖² + 2(1/γ − β)²‖y^{k+1} − y^k‖².  (15)
Combining (9), (11), and Equation (15), one can draw the following result:
L_β(x^{k+1}, y^{k+1}, λ^{k+1}) + (2(ξ + L)²/β)‖y^{k+1} − y^k‖² + (1/2)‖x^{k+1} − x̄^k‖²_S
≤ L_β(x^k, y^k, λ^k) + (2(ξ + L)²/β)‖y^k − y^{k−1}‖² + (1/2)‖x^k − x̄^{k−1}‖²_S − ((1 − θ²)/2)‖x^k − x̄^{k−1}‖²_S − σ₀‖y^{k+1} − y^k‖²,  (16)
where ξ = 1/γ − β and σ₀ = 1/γ − (L + β)/2 − 2ξ²/β − 2(ξ + L)²/β, and we obtain the desired conclusion. □
According to Assumption 1 with σ 0 > 0 and θ ( 0 , 1 ] , the monotonic non-increasing property of the sequence { L ^ β ( ζ ^ k ) } is guaranteed.
Lemma 6.
If the sequence ζ^k := (x^k, y^k, λ^k) generated by Algorithm 1 is bounded, then we have
∑_{k=0}^∞ ‖ζ^{k+1} − ζ^k‖² < +∞.
Proof. 
Since {ζ^k} is bounded, it is evident that {ζ̂^k} is also bounded. Moreover, there exists an accumulation point, say ζ̂*, and a subsequence {ζ̂^{k_j}} of {ζ̂^k} converging to it, such that
liminf_{j→∞} L̂_β(ζ̂^{k_j}) ≥ L̂_β(ζ̂*) > −∞,
which implies that {L̂_β(ζ̂^k)} is bounded from below. From the conclusion of Lemma 5, summing over k ≥ 2, it follows that
∑_{k=2}^n [σ₀‖y^{k+1} − y^k‖² + ((1 − θ²)/2)‖x^k − x̄^{k−1}‖²_S] ≤ L̂_β(ζ̂^2) − L̂_β(ζ̂*).
Given σ₀ > 0, θ ∈ (0, 1), and that S is positive semi-definite, letting n → ∞, one can derive that
∑_{k=0}^∞ σ₀‖y^{k+1} − y^k‖² < +∞,  ∑_{k=0}^∞ ((1 − θ²)/2)‖x^k − x̄^{k−1}‖²_S < +∞.  (17)
By the inertial relationship in step 1° of Algorithm 1, the following conclusion can be obtained:
‖x^{k+1} − x^k‖² = ‖x^{k+1} − x̄^k + x̄^k − x^k‖² = ‖x^{k+1} − x̄^k + θ(x^k − x̄^{k−1})‖² ≤ 2‖x^{k+1} − x̄^k‖² + 2θ²‖x^k − x̄^{k−1}‖²,
‖y^{k+1} − y^k‖² = ‖y^{k+1} − ȳ^k + ȳ^k − y^k‖² = ‖y^{k+1} − ȳ^k + η(y^k − ȳ^{k−1})‖² ≤ 2‖y^{k+1} − ȳ^k‖² + 2η²‖y^k − ȳ^{k−1}‖².  (18)
Combining (15), (17), and (18), we have
∑_{k=0}^∞ ‖x^{k+1} − x^k‖² < +∞,  ∑_{k=0}^∞ ‖y^{k+1} − y^k‖² < +∞,  ∑_{k=0}^∞ ‖λ^{k+1} − λ^k‖² < +∞,  (19)
and thus ∑_{k=0}^∞ ‖ζ^{k+1} − ζ^k‖² < +∞. □
Now we give subsequential convergence analysis of NIP-ADMM.
Theorem 1.
(Subsequential Convergence) Suppose the sequence {ζ^k} generated by NIP-ADMM is bounded, and let M and M̂ be the sets of cluster points of {ζ^k} and {ζ̂^k}, respectively. Under the assumptions and conditions of Lemma 5, we have the following conclusions:
(i)
M and M̂ are two non-empty compact sets, and d(ζ^k, M) → 0 and d(ζ̂^k, M̂) → 0 as k → ∞.
(ii)
If ζ* = (x*, y*, λ*) ∈ M, then ζ̂* = (x*, y*, λ*, x*, x*, y*) ∈ M̂.
(iii)
M ⊆ crit L_β, the set of critical points of L_β.
(iv)
The sequence {L̂_β(ζ̂^k)} converges, and L̂_β(ζ̂*) = inf_{k∈ℕ} L̂_β(ζ̂^k) = lim_{k→∞} L̂_β(ζ̂^k) for every ζ̂* ∈ M̂.
Proof. 
Now let us draw conclusions (i)-(iv) in turn.
(i) Based on the definitions of M and M ^ , the conclusion can be satisfied.
(ii) Combining Lemma 5 with the definitions of ζ and ζ ^ , we obtain the desired conclusion.
(iii) Letting ζ* ∈ M, one can find a subsequence {ζ^{k_j}} of {ζ^k} converging to ζ*. By Lemmas 5 and 6, ‖ζ^{k+1} − ζ^k‖ → 0 as k → +∞, which implies lim_{j→+∞} ζ^{k_j+1} = ζ*. On the one hand, noting that x^{k+1} is the optimal solution to the x-subproblem in 2° of Algorithm 1, we have
f(x^{k+1}) − ⟨λ^k, Ax^{k+1}⟩ + (β/2)‖Ax^{k+1} + ȳ^k − b‖² + (1/2)‖x^{k+1} − x̄^k‖²_S ≤ f(x*) − ⟨λ^k, Ax*⟩ + (β/2)‖Ax* + ȳ^k − b‖² + (1/2)‖x* − x̄^k‖²_S.
From Lemma 6, we know that lim_{k→+∞} ‖x^{k+1} − x̄^k‖ = 0. Combining this with lim_{j→+∞} ζ^{k_j} = lim_{j→+∞} ζ^{k_j+1} = ζ*, we conclude that limsup_{j→+∞} f(x^{k_j+1}) ≤ f(x*) holds. On the other hand, since f is a lower semicontinuous function, we deduce that liminf_{j→+∞} f(x^{k_j+1}) ≥ f(x*), and one gets
lim_{j→+∞} f(x^{k_j+1}) = f(x*).
Moreover, given the closedness of ∂f and the continuity of ∇g, letting k = k_j → +∞ in the optimality conditions (7) of NIP-ADMM, we assert that
A^Tλ* ∈ ∂f(x*),  λ* = ∇g(y*),  Ax* + y* = b,
i.e., ζ* is a critical point of L_β.
(iv) Let ζ̂* ∈ M̂, and assume that there exists a subsequence {ζ̂^{k_j}} of {ζ̂^k} that converges to ζ̂*. Combining the relations (16) and (19) and the continuity of g, we have
lim_{j→+∞} L̂_β(ζ̂^{k_j}) = L̂_β(ζ̂*).
Considering that {L̂_β(ζ̂^k)} is monotonically non-increasing, it follows that {L̂_β(ζ̂^k)} is convergent. Consequently, for any ζ̂* ∈ M̂, the relationship can be established as
L̂_β(ζ̂*) = inf_{k∈ℕ} L̂_β(ζ̂^k) = lim_{k→∞} L̂_β(ζ̂^k). □
By (6), the definition of L̂_β, and the positive semidefiniteness of the matrix S, the following can be defined with ζ^k = (x^k, y^k, λ^k):
d_1^{k+1} = A^T(λ^k − λ^{k+1}) + βA^T(y^{k+1} − ȳ^k),
d_2^{k+1} = ∇g(y^{k+1}) − ∇g(y^k) + (λ^k − λ^{k+1}) + (β − 1/γ + 4(ξ + L)²/β)(y^{k+1} − y^k),
d_3^{k+1} = −(Ax^{k+1} + y^{k+1} − b),
d_4^{k+1} = S(x̄^k − x^{k+1}),
d_5^{k+1} = 0,
d_6^{k+1} = −(4(ξ + L)²/β)(y^{k+1} − y^k).  (20)
Then, the following result can be obtained.
Lemma 7.
Let (d_1^{k+1}, d_2^{k+1}, d_3^{k+1}, d_4^{k+1}, d_5^{k+1}, d_6^{k+1}) be defined by (20); then it is contained in ∂L̂_β(ζ̂^{k+1}). Moreover, there exists ψ > 0 such that for all k > 1,
d(0, ∂L̂_β(ζ̂^{k+1})) ≤ ψ(‖y^{k+1} − y^k‖ + ‖x^{k+1} − x̄^k‖ + ‖y^k − y^{k−1}‖ + ‖y^{k+1} − ȳ^k‖).
Proof. 
By the definition of L̂_β and ζ̂^k = (x^k, y^k, λ^k, x̄^{k−1}, x^{k−1}, y^{k−1}), we can derive from Lemma 1 that
∂_x L̂_β(ζ̂^{k+1}) = ∂f(x^{k+1}) − A^Tλ^{k+1} + βA^T(Ax^{k+1} + y^{k+1} − b) + S(x^{k+1} − x̄^k),
∇_y L̂_β(ζ̂^{k+1}) = ∇g(y^{k+1}) − λ^{k+1} + β(Ax^{k+1} + y^{k+1} − b) + (4(ξ + L)²/β)(y^{k+1} − y^k),
∇_λ L̂_β(ζ̂^{k+1}) = −(Ax^{k+1} + y^{k+1} − b),
∇_{x̄^k} L̂_β(ζ̂^{k+1}) = −S(x^{k+1} − x̄^k),
∇_{x^k} L̂_β(ζ̂^{k+1}) = 0,
∇_{y^k} L̂_β(ζ̂^{k+1}) = −(4(ξ + L)²/β)(y^{k+1} − y^k).
Combining the above expressions with the optimality conditions (7) of NIP-ADMM, we recover exactly the components (d_1^{k+1}, …, d_6^{k+1}) defined in (20).
It is easy to see from Lemma 2 that (d_1^{k+1}, …, d_6^{k+1}) ∈ ∂L̂_β(ζ̂^{k+1}). Moreover, since ∇g is Lipschitz continuous with constant L, we get
‖∇g(y^{k+1}) − ∇g(y^k)‖ ≤ L‖y^{k+1} − y^k‖.
Thus, according to (20), there exists a positive real number ψ₁ such that
‖(d_1^{k+1}, …, d_6^{k+1})‖ ≤ ψ₁(‖λ^{k+1} − λ^k‖ + ‖y^{k+1} − ȳ^k‖ + ‖x^{k+1} − x̄^k‖ + ‖y^{k+1} − y^k‖).
Furthermore, combining this with (15), we know that there exists ψ₂ > 0 such that for all k > 1,
‖λ^{k+1} − λ^k‖ ≤ ψ₂(‖y^k − y^{k−1}‖ + ‖y^{k+1} − y^k‖).
Hence, by selecting ψ = ψ₁(1 + ψ₂), we can further conclude that for all k > 1,
d(0, ∂L̂_β(ζ̂^{k+1})) ≤ ‖(d_1^{k+1}, …, d_6^{k+1})‖ ≤ ψ(‖y^{k+1} − y^k‖ + ‖x^{k+1} − x̄^k‖ + ‖y^k − y^{k−1}‖ + ‖y^{k+1} − ȳ^k‖).
This concludes the proof. □
Theorem 2.
(Global convergence) Suppose the sequence {ζ^k} generated by NIP-ADMM is bounded, and the hypotheses (i)–(iv) in Assumption 1 hold. If L̂_β is a KL function, then
∑_{k=0}^∞ ‖ζ^{k+1} − ζ^k‖ < +∞.
Moreover, the sequence { ζ k } converges to a critical point of L β .
Proof. 
From Theorem 1, we know that lim_{k→∞} L̂_β(ζ̂^k) = L̂_β(ζ̂*) for any ζ̂* ∈ M̂. The proof needs to consider the following two cases:
(Case I) Suppose there exists k₀ > 1 such that L̂_β(ζ̂^{k₀}) = L̂_β(ζ̂*). It follows from Lemma 5 and the monotonicity of {L̂_β(ζ̂^k)} that for all k ≥ k₀,
K(‖x^k − x̄^{k−1}‖² + ‖y^{k+1} − y^k‖²) ≤ L̂_β(ζ̂^k) − L̂_β(ζ̂^{k+1}) ≤ L̂_β(ζ̂^{k₀}) − L̂_β(ζ̂*) = 0,
where K = min{σ₀, ((1 − θ²)/2)λ_min(S)}. As a result, y^{k+1} = y^k and x^k = x̄^{k−1} for all k ≥ k₀. Combining (15) and (18), it follows that x^{k+1} = x^k and λ^{k+1} = λ^k for all k > k₀. Finally, we conclude that ∑_{k=0}^∞ ‖ζ^{k+1} − ζ^k‖ < +∞, and the result holds.
(Case II) Assume that the inequality L̂_β(ζ̂^k) > L̂_β(ζ̂*) holds for all k > 0. Since
lim_{k→∞} d(ζ̂^k, M̂) = 0,
it follows that for any ε₁ > 0, there exists k₁ > 0 such that for all k ≥ k₁, we have
d(ζ̂^k, M̂) < ε₁.
Moreover, noting that
lim_{k→∞} L̂_β(ζ̂^k) = L̂_β(ζ̂*),
it implies that for any ε₂ > 0, there exists k₂ > 0 such that for all k > k₂, the following inequality holds:
L̂_β(ζ̂^k) < L̂_β(ζ̂*) + ε₂.
Hence, given ε₁ and ε₂, when k ≥ k̄ := max{k₁, k₂}, we have
d(ζ̂^k, M̂) < ε₁,  L̂_β(ζ̂*) < L̂_β(ζ̂^k) < L̂_β(ζ̂*) + ε₂.
Then, based on Lemma 3, it can be deduced that for all k > k̄,
φ′(L̂_β(ζ̂^k) − L̂_β(ζ̂*)) · d(0, ∂L̂_β(ζ̂^k)) ≥ 1.
Furthermore, using the concavity of φ, we derive the following:
φ(L̂_β(ζ̂^k) − L̂_β(ζ̂*)) − φ(L̂_β(ζ̂^{k+1}) − L̂_β(ζ̂*)) ≥ φ′(L̂_β(ζ̂^k) − L̂_β(ζ̂*)) · (L̂_β(ζ̂^k) − L̂_β(ζ̂^{k+1})).
Noting the fact that φ′(L̂_β(ζ̂^k) − L̂_β(ζ̂*)) > 0, together with the conclusion obtained in Lemma 7, one can infer that
L̂_β(ζ̂^k) − L̂_β(ζ̂^{k+1}) ≤ [φ(L̂_β(ζ̂^k) − L̂_β(ζ̂*)) − φ(L̂_β(ζ̂^{k+1}) − L̂_β(ζ̂*))] / φ′(L̂_β(ζ̂^k) − L̂_β(ζ̂*)) ≤ Π_{φ,[k,k+1]} · ψ · T_{[k,k+1]},  (21)
where Π_{φ,[k,k+1]} represents φ(L̂_β(ζ̂^k) − L̂_β(ζ̂*)) − φ(L̂_β(ζ̂^{k+1}) − L̂_β(ζ̂*)), and T_{[k,k+1]} represents ‖y^{k+1} − y^k‖ + ‖x^{k+1} − x̄^k‖ + ‖y^k − y^{k−1}‖ + ‖y^{k+1} − ȳ^k‖. Combining Lemma 5, we can rewrite (21) as follows:
K(‖x^k − x̄^{k−1}‖² + ‖y^{k+1} − ȳ^k‖²) ≤ Π_{φ,[k,k+1]} · ψ · T_{[k,k+1]},  (22)
where K = min{σ₀, ((1 − θ²)/2)λ_min(S)} is the constant of Case I,
which can be equivalently expressed as
‖x^k − x̄^{k−1}‖ + ‖y^{k+1} − ȳ^k‖ ≤ √2 · (‖x^k − x̄^{k−1}‖² + ‖y^{k+1} − ȳ^k‖²)^{1/2} ≤ (2ψ/K)^{1/2} · T_{[k,k+1]}^{1/2} · Π_{φ,[k,k+1]}^{1/2}.
By applying the Cauchy–Schwarz inequality and multiplying both sides by 6, we obtain
6(‖x^k − x̄^{k−1}‖ + ‖y^{k+1} − ȳ^k‖) ≤ 2 · T_{[k,k+1]}^{1/2} · ((18ψ/K) · Π_{φ,[k,k+1]})^{1/2}.
Then, by further applying the fundamental inequality 2ab ≤ a² + b² for all a, b ∈ ℝ, together with the triangle inequality for the inertial steps (recalling θ, η ∈ (0, 1]), we can deduce that
6(‖x^k − x̄^{k−1}‖ + ‖y^{k+1} − ȳ^k‖)
≤ T_{[k,k+1]} + (18ψ/K) · Π_{φ,[k,k+1]}
= ‖y^{k+1} − y^k‖ + ‖x^{k+1} − x̄^k‖ + ‖y^k − y^{k−1}‖ + ‖y^{k+1} − ȳ^k‖ + (18ψ/K) · Π_{φ,[k,k+1]}
≤ 2‖y^{k+1} − ȳ^k‖ + 2‖y^k − ȳ^{k−1}‖ + ‖y^{k−1} − ȳ^{k−2}‖ + ‖x^{k+1} − x̄^k‖ + (18ψ/K) · Π_{φ,[k,k+1]}.  (23)
Next, summing up (23) from k = i + 3 to k = τ and rearranging the terms, one gets
5∑_{k=i+3}^τ ‖x^k − x̄^{k−1}‖ + ∑_{k=i+3}^τ ‖y^{k+1} − ȳ^k‖ ≤ 3‖y^{i+1} − ȳ^i‖ + ‖y^{i+2} − ȳ^{i+1}‖ + ‖x^{i+1} − x̄^i‖ − 3‖y^{τ+1} − ȳ^τ‖ − ‖y^{τ+2} − ȳ^{τ+1}‖ − ‖x^{τ+1} − x̄^τ‖ + (18ψ/K) · Π_{φ,[i+1,τ+1]}.
Furthermore, since 0 ≤ φ(L̂_β(ζ̂^k) − L̂_β(ζ̂*)), letting τ → +∞, we can conclude that
∑_{k=i+1}^∞ (5‖x^k − x̄^{k−1}‖ + ‖y^{k+1} − ȳ^k‖) < +∞,
which implies
∑_{k=0}^∞ ‖x^k − x̄^{k−1}‖ < +∞,  ∑_{k=0}^∞ ‖y^k − ȳ^{k−1}‖ < +∞.
Based on the relationship between (15) and (18), we can assert that
∑_{k=0}^∞ ‖x^k − x^{k−1}‖ < +∞,  ∑_{k=0}^∞ ‖y^k − y^{k−1}‖ < +∞,  ∑_{k=0}^∞ ‖λ^k − λ^{k−1}‖ < +∞.
This demonstrates that {ζ^k} forms a Cauchy sequence, which ensures its convergence. By applying Theorem 1, it follows that {ζ^k} converges to a critical point of L_β. □

4. Numerical Simulations

In this section, we demonstrate the application of NIP-ADMM to the signal recovery model (3) and the SCAD penalty problem (5). To verify the effectiveness of Algorithm 1 (i.e., NIP-ADMM), we compare it with the Bregman modification of ADMM (BADMM) proposed by Wang et al. [23] and the inertial proximal ADMM (IPADMM) proposed by Chen et al. [34]. All codes were implemented in MATLAB R2024b and executed on a Windows 11 system equipped with an AMD Ryzen 9 9900X CPU.

4.1. Signal Recovery

In this subsection on signal recovery, we consider the previously mentioned model (3):
min c‖x‖_{1/2}^{1/2} + (1/2)‖z‖²  s.t.  Ax − z = b,  (24)
where ‖x‖_{1/2}^{1/2} = ∑_{i=1}^n |x_i|^{1/2}, A ∈ ℝ^{m×n}, b ∈ ℝ^m, z ∈ ℝ^m, and x ∈ ℝ^n. In the context of problem (24), Wang et al. [23] experimentally validated the effectiveness of BADMM. To further evaluate the efficiency of NIP-ADMM, we conduct numerical comparisons not only with BADMM but also with IPADMM [34], in order to verify the rationality of the proposed inertial symmetric update scheme.
We construct the following framework to solve the problem (24), where S = e_0 I − βA^T A and I denotes the identity matrix:
(x̄^k, z̄^k) = (x^k, z^k) + θ(x^k − x̄^{k−1}, 0) + η(0, z^k − z̄^{k−1}),
x^{k+1} = H((1/e_0)[A^T λ^k + βA^T(z̄^k + b) + (e_0 I − βA^T A)x̄^k], 2c/e_0),
z^{k+1} = z^k − γ(z^k + λ^k − β(Ax^{k+1} − z^k − b)),
λ^{k+1} = λ^k + β(Ax^{k+1} − z^{k+1} − b).
Here, H represents the half-shrinkage operator proposed by Xu et al. [39], which is defined as
H(x, e) = (h_e(x_1), h_e(x_2), …, h_e(x_n))^T,
where the function h e ( x i ) for i = 1 , 2 , , n is defined by
h_e(x_i) = (2x_i/3)[1 + cos((2π/3) − (2/3)arccos((e/8)(|x_i|/3)^{−3/2}))],  |x_i| > (54^{1/3}/4)e^{2/3},
h_e(x_i) = 0,  otherwise.
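Under our reading of the formula above, the half-shrinkage operator can be sketched in NumPy as follows; this is an illustrative, vectorized sketch (the function name is ours), not the authors' MATLAB implementation:

```python
import numpy as np

def half_shrinkage(x, e):
    """Elementwise half-thresholding operator H(x, e) of Xu et al. [39].

    Entries with |x_i| <= (54^(1/3)/4) * e^(2/3) are set to zero; the
    surviving entries are shrunk by the closed-form cosine formula.
    """
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    t = (54.0 ** (1.0 / 3.0) / 4.0) * e ** (2.0 / 3.0)  # threshold
    mask = np.abs(x) > t
    xm = x[mask]
    # phi = arccos((e/8) * (|x_i|/3)^(-3/2)); the argument stays in [0, 1]
    # for all |x_i| above the threshold, so arccos is well defined
    phi = np.arccos((e / 8.0) * (np.abs(xm) / 3.0) ** (-1.5))
    out[mask] = (2.0 / 3.0) * xm * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return out
```

For large |x_i| the shrinkage vanishes (h_e(x_i) ≈ x_i), while sub-threshold entries are annihilated, mirroring the hard/soft-thresholding behavior of the L_{1/2} penalty.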
In this setup, the entries of the matrix A are drawn from a standard normal distribution, with each column normalized. The starting vector x_0 is initialized as a sparse vector containing at least 100 non-zero components. The values x̄^0, z̄^0, x^1, z^1 and λ^0 are initially set to zero. To simulate the observation vector with added noise, we generate b via b = Ax_0 + v, where v ∼ N(0, 10^{−3}I). For the regularization parameter c, we compute
c = 0.1‖A^T b‖.
Based on Assumption 1, the parameters are chosen as β = 3 and e_0 = 10. The error trends are defined as
Error1 = log‖x^k − x^∗‖,  Error2 = log‖z^k − z^∗‖,
where log denotes the base-10 logarithm. At the (k+1)-th iteration, the primal residual is r^{k+1} = Ax^{k+1} − z^{k+1} − b, while the dual residual is s^{k+1} = βA^T(x^{k+1} − x^k). Termination occurs when both conditions are met:
‖r^{k+1}‖_2 ≤ √n × 10^{−4} + 10^{−3} · max{‖Ax^k‖_2, ‖z^k‖_2},  ‖s^{k+1}‖_2 ≤ √n × 10^{−4} + 10^{−3} · ‖A^T λ^k‖_2.
During the experiments, to satisfy Assumption 1, we set γ = 0.3 . Table 1 shows that when m = n = 1000 , selecting the inertial parameter values θ = 0.8 and η = 0.75 produces satisfactory results. Therefore, in subsequent experiments, we also adopt θ = 0.8 and η = 0.75 . The metrics include the number of iterations (Iter), CPU running time (CPUT) and the objective function value (Obj). To better present the experimental results, we retain two decimal places for Obj and four decimal places for CPUT.
The numerical results consistently demonstrate the superior performance of NIP-ADMM compared to BADMM and IPADMM (see Table 2). Due to the introduction of inertial and proximal terms, the proposed NIP-ADMM demonstrates accelerated convergence with respect to both objective value reduction and error diminution. For m = n = 1000 , NIP-ADMM shows faster convergence in terms of both objective value and error reduction (see Figure 1). For m = 3000 and n = 4000 , in comparison to BADMM and IPADMM, NIP-ADMM achieves a more significant reduction in the number of iterations and computational time (see Figure 2). For larger-scale models with m = n = 6000 , NIP-ADMM consistently outperforms both BADMM and IPADMM, further indicating its superior suitability for solving large-scale problems (see Figure 3). It is worth noting that the above results highlight the high efficiency of NIP-ADMM, which can be attributed to the proposed inertial update strategy.

4.2. SCAD Penalty Problem

We note that the SCAD penalty problem in statistics can be formulated as the following model [20,31]:
min_{x,p} Σ_{i=1}^n h_κ(|x_i|) + (1/2)‖p‖²  s.t.  Ax − p = b,   (25)
with A ∈ R^{m×n}, b ∈ R^m, p ∈ R^m, x ∈ R^n, and the penalty function h_κ in the objective defined as
h_κ(θ) = κθ,  θ ≤ κ,
h_κ(θ) = (−θ² + 2cκθ − κ²)/(2(c − 1)),  κ < θ ≤ cκ,
h_κ(θ) = (c + 1)κ²/2,  θ > cκ,
where c > 2 and κ > 0 are the knots of the quadratic spline function. As in the signal recovery subsection, we set ψ(x) = (1/2)‖x‖_S² and ϕ(y) = 0, where S = μI − βA^T A. For the problem (25), the x-subproblem corresponding to Step 2 of Algorithm 1 can be expressed as
x^{k+1} = argmin_x Σ_{i=1}^n h_κ(|x_i|) − ⟨λ^k, Ax⟩ + (β/2)‖Ax − p̄^k − b‖² + (1/2)‖x − x̄^k‖²_{μI−βA^T A}
        = argmin_x Σ_{i=1}^n h_κ(|x_i|) + (μ/2)‖x − (1/μ)[A^T λ^k + βA^T(p̄^k + b) + (μI − βA^T A)x̄^k]‖².
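As a concrete check of the penalty definition above, h_κ can be sketched elementwise in NumPy; the denominator 2(c − 1) in the middle branch is exactly what makes the three branches meet at θ = κ and θ = cκ. The defaults mirror the experimental choice κ = 0.1, c = 5, and the function name is ours:

```python
import numpy as np

def scad_penalty(theta, kappa=0.1, c=5.0):
    """SCAD penalty h_kappa(theta) for theta >= 0 (Fan and Li [20]):
    linear up to kappa, concave quadratic up to c*kappa, constant beyond."""
    theta = np.asarray(theta, dtype=float)
    return np.where(
        theta <= kappa,
        kappa * theta,                                           # linear zone
        np.where(
            theta <= c * kappa,
            (-theta**2 + 2 * c * kappa * theta - kappa**2)
            / (2 * (c - 1)),                                     # quadratic zone
            (c + 1) * kappa**2 / 2,                              # flat zone
        ),
    )
```

Evaluating at the knots confirms continuity: both the linear and quadratic branches give κ² at θ = κ, and both the quadratic and flat branches give (c + 1)κ²/2 at θ = cκ.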
On the one hand, the x-subproblem can be equivalently formulated as
min_{x∈R^n} Σ_{i=1}^n h_κ(|x_i|) + (1/(2ν))‖x − q‖².
On the other hand, under the condition that 1 + ν ≤ c, we can update x using the following rule [31]:
x_i := sign(q_i)(|q_i| − κν)_+,  |q_i| ≤ (1 + ν)κ,
x_i := [(c − 1)q_i − sign(q_i)cκν]/(c − 1 − ν),  (1 + ν)κ < |q_i| ≤ cκ,
x_i := q_i,  |q_i| > cκ,
where (·)_+ denotes the positive part operator, defined by (x)_+ = max(0, x). Applying NIP-ADMM to solve the problem (25), we obtain
x̄^k = x^k + θ(x^k − x̄^{k−1}),  p̄^k = p^k + η(p^k − p̄^{k−1}),
x^{k+1} = argmin_x Σ_{i=1}^n h_κ(|x_i|) + (μ/2)‖x − (1/μ)[A^T λ^k + βA^T(p̄^k + b) + (μI − βA^T A)x̄^k]‖²,
p^{k+1} = p^k − γ(p^k + λ^k − β(Ax^{k+1} − p^k − b)),
λ^{k+1} = λ^k + β(Ax^{k+1} − p^{k+1} − b).
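The closed-form x-update rule from [31] can be sketched elementwise in NumPy. This is a sketch under our reconstruction of the middle branch, whose numerator (c − 1)q_i − sign(q_i)cκν is the choice that makes the rule continuous at |q_i| = (1 + ν)κ and |q_i| = cκ; defaults again mirror κ = 0.1, c = 5:

```python
import numpy as np

def scad_prox(q, kappa=0.1, c=5.0, nu=1.0):
    """Closed-form minimizer of sum_i h_kappa(|x_i|) + (1/(2 nu))||x - q||^2,
    applied elementwise, assuming 1 + nu <= c."""
    q = np.asarray(q, dtype=float)
    aq = np.abs(q)
    # soft-thresholding zone: |q_i| <= (1 + nu) * kappa
    soft = np.sign(q) * np.maximum(aq - kappa * nu, 0.0)
    # interpolation zone: (1 + nu) * kappa < |q_i| <= c * kappa
    mid = ((c - 1) * q - np.sign(q) * c * kappa * nu) / (c - 1 - nu)
    # identity zone: |q_i| > c * kappa (no shrinkage)
    return np.where(aq <= (1 + nu) * kappa, soft,
                    np.where(aq <= c * kappa, mid, q))
```

At |q_i| = (1 + ν)κ both the soft and interpolation branches return sign(q_i)κ, and at |q_i| = cκ the interpolation branch returns q_i itself, so the map is continuous, a useful built-in sanity check.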
Similarly, the update scheme of BADMM can be represented by the following procedure:
x^{k+1} = argmin_x Σ_{i=1}^n h_κ(|x_i|) + (μ/2)‖x − (1/μ)[βA^T(p^k + b + λ^k/β − Ax^k) + μx^k]‖²,
p^{k+1} = (1/(1 + β))[β(Ax^{k+1} − b) − λ^k],
λ^{k+1} = λ^k − β(Ax^{k+1} − p^{k+1} − b).
Utilizing IPADMM to address model (25), we derive the following iterative scheme:
(x̄^k, p̄^k, λ̄^k) = (x^k, p^k, λ^k) + θ(x^k − x^{k−1}, p^k − p^{k−1}, λ^k − λ^{k−1}),
x^{k+1} = argmin_x Σ_{i=1}^n h_κ(|x_i|) + (μ/2)‖x − (1/μ)[A^T λ̄^k + βA^T(p̄^k + b) + (μI − βA^T A)x̄^k]‖²,
λ^{k+1} = λ̄^k − β(Ax^{k+1} − p̄^k − b),
p^{k+1} = (1/(1 + β))[β(Ax^{k+1} − b) − λ^k].
In this experiment, we generate a random m × n matrix A and normalize its rows and columns. We then generate a vector z of dimension n with a sparsity ratio of 100/m. The vector b is the sum of Az and a Gaussian noise vector with zero mean and variance 0.001. The initial variables x^1, x^0, p^1 and p^0 are set as zero vectors, serving as the starting point for the optimization. To improve numerical efficiency, we set c = 5 and κ = 0.1 in this experiment. Under the condition that Assumption 1 is satisfied, we configure γ = 0.1, β = 12, and μ = 100 for NIP-ADMM and the other algorithms. The error trends are defined as
Error1 = log‖x^k − x^∗‖,  Error2 = log‖p^k − p^∗‖,
and the stopping criterion for the updates is set as
max{‖x^{k+1} − x^k‖, ‖p^{k+1} − p^k‖} ≤ 10^{−2}.
In Table 3, we set m = n = 1000. The results support our choice of the inertial parameters θ = η = 0.9, under which NIP-ADMM requires the fewest iterations and the least running time; we therefore use these values in the subsequent experiments.
To further evaluate the performance of NIP-ADMM, IPADMM, and BADMM in solving the SCAD penalized problem, we conducted numerical experiments under three different dimensional settings and compared the convergence behavior of the three algorithms. Figure 4, Figure 5 and Figure 6 illustrate the evolution of the objective function values (left) and iteration errors (right) with respect to the number of iterations for NIP-ADMM, IPADMM, and BADMM. As observed from the figures, although all three algorithms eventually converge, NIP-ADMM demonstrates superior performance in terms of convergence speed and error control. In addition, Table 4 presents the numerical performance of the three algorithms under different dimensional settings. Although there are slight differences in the objective function values among the three methods, NIP-ADMM exhibits a clear advantage in terms of iteration count (Iter) and computational time (CPUT). These results clearly demonstrate the acceleration effect brought by the incorporation of inertial and proximal terms in NIP-ADMM.

5. Conclusions

In this paper, we studied a class of nonconvex optimization problems subject to linear constraints. By integrating a class of symmetrically structured inertial terms with a proximal ADMM framework, we proposed a novel symmetrical inertial alternating direction method of multipliers with a proximal term. Under some mild assumptions, we analyzed the subsequential convergence of the proposed NIP-ADMM. Furthermore, assuming that the associated auxiliary function satisfied the Kurdyka–Łojasiewicz property, we established the global convergence of the algorithm, which provided theoretical support for the stability of the algorithm. Meanwhile, numerical simulations also validated the theoretical contributions. In both the signal recovery problem and the SCAD-penalized problem, NIP-ADMM demonstrated faster convergence compared to IPADMM and BADMM. These results not only confirmed the theoretical advantages of the symmetric inertial term in accelerating convergence and of the proximal term in simplifying subproblem computations, but also indicated that NIP-ADMM outperformed both non-inertial and traditional inertial methods in practical applications.
Furthermore, we believe that future work could explore whether the convergence of NIP-ADMM can be guaranteed when the objective function is non-separable. Additionally, it would be worthwhile to investigate whether introducing inertial terms into the y-subproblem and the multiplier λ could further accelerate the convergence speed of NIP-ADMM.

Author Contributions

Conceptualization, J.-H.L. and H.-Y.L.; methodology, J.-H.L.; software, J.-H.L. and S.-Y.L.; validation, J.-H.L. and H.-Y.L.; writing—original draft preparation, J.-H.L. and H.-Y.L.; writing—review and editing, J.-H.L. and H.-Y.L.; visualization, J.-H.L., H.-Y.L. and S.-Y.L.; supervision, H.-Y.L.; project administration, H.-Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Innovation Fund of Postgraduate, Sichuan University of Science & Engineering (Y2024340), the Scientific Research and Innovation Team Program of Sichuan University of Science and Engineering (SUSE652B002) and the Opening Project of Sichuan Province University Key Laboratory of Bridge Non-destruction Detecting and Engineering Computing (2023QZJ01).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors would appreciate the anonymous reviewers for their useful comments and advice.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Kim, S.J.; Koh, K.; Lustig, M.; Boyd, S.; Gorinevsky, D. An interior-point method for large-scale 1-regularized least squares. IEEE J. Sel. Top. Signal Process. 2007, 1, 606–617. [Google Scholar] [CrossRef]
  2. Chartrand, R.; Staneva, V. Restricted isometry properties and nonconvex compressive sensing. Inverse Probl. 2008, 24, 035020. [Google Scholar] [CrossRef]
  3. Zeng, J.S.; Lin, S.B.; Wang, Y.; Xu, Z.B. L1/2 regularization: Convergence of iterative half thresholding algorithm. IEEE Trans. Signal Process. 2014, 62, 2317–2329. [Google Scholar] [CrossRef]
  4. Doostmohammadian, M.; Gabidullina, Z.R.; Rabiee, H.R. Nonlinear perturbation-based non-convex optimization over time-varying networks. IEEE Trans. Netw. Sci. Eng. 2024, 11, 6461–6469. [Google Scholar] [CrossRef]
  5. Chen, P.Y.; Selesnick, I.W. Group-sparse signal denoising: Non-convex regularization, convex optimization. IEEE Trans. Signal Process. 2014, 62, 3464–3478. [Google Scholar] [CrossRef]
  6. Bai, Z.L. Sparse Bayesian learning for sparse signal recovery using 1/2-norm. Appl. Acoust. 2023, 207, 109340. [Google Scholar] [CrossRef]
  7. Wang, C.; Yan, M.; Rahimi, Y.; Lou, Y.F. Accelerated schemes for the L1/L2 minimization. IEEE Trans. Signal Process. 2020, 68, 2660–2669. [Google Scholar] [CrossRef]
  8. Zhang, S.R.; Wang, Q.H.; Zhang, B.X.; Liang, Z.; Zhang, L.; Li, L.L.; Huang, G.; Zhang, Z.G.; Feng, B.; Yu, T.Y. Cauchy non-convex sparse feature selection method for the high-dimensional small-sample problem in motor imagery EEG decoding. Front. Neurosci. 2023, 17, 1292724. [Google Scholar] [CrossRef]
  9. Tiddeman, B.; Ghahremani, M. Principal component wavelet networks for solving linear inverse problems. Symmetry 2021, 13, 1083. [Google Scholar] [CrossRef]
  10. Xia, Z.C.; Liu, Y.; Hu, C.; Jiang, H.J. Distributed nonconvex optimization subject to globally coupled constraints via collaborative neurodynamic optimization. Neural Netw. 2025, 184, 107027. [Google Scholar] [CrossRef]
  11. Yu, G.; Fu, H.; Liu, Y.F. High-dimensional cost-constrained regression via nonconvex optimization. Technometrics 2021, 64, 52–64. [Google Scholar] [CrossRef] [PubMed]
  12. Merzbacher, C.; Mac Aodha, O.; Oyarzun, D.A. Bayesian optimization for design of multiscale biological circuits. ACS Synth. Biol. 2023, 12, 2073–2082. [Google Scholar] [CrossRef] [PubMed]
  13. Bai, J.C.; Zhang, H.C.; Li, J.C. A parameterized proximal point algorithm for separable convex optimization. Optim. Lett. 2018, 12, 1589–1608. [Google Scholar] [CrossRef]
  14. Wen, F.; Liu, P.L.; Liu, Y.P.; Qiu, R.C.; Yu, W.X. Robust sparse recovery in impulsive noise via ℓp-ℓ1 optimization. IEEE Trans. Signal Process. 2017, 65, 105–118. [Google Scholar] [CrossRef]
  15. Zhang, H.M.; Gao, J.B.; Qian, J.J.; Yang, J.; Xu, C.Y.; Zhang, B. Linear regression problem relaxations solved by nonconvex ADMM with convergence analysis. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 828–838. [Google Scholar] [CrossRef]
  16. Ames, B.P.W.; Hong, M.Y. Alternating direction method of multipliers for penalized zero-variance discriminant analysis. Comput. Optim. Appl. 2016, 64, 725–754. [Google Scholar] [CrossRef]
  17. Zietlow, C.; Lindner, J.K.N. ADMM-TGV image restoration for scientific applications with unbiased parameter choice. Numer. Algorithms 2024, 97, 1481–1512. [Google Scholar] [CrossRef]
  18. Bian, F.M.; Liang, J.W.; Zhang, X.Q. A stochastic alternating direction method of multipliers for non-smooth and non-convex optimization. Inverse Probl. 2021, 37, 075009. [Google Scholar] [CrossRef]
  19. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar] [CrossRef]
  20. Fan, J.Q.; Li, R.Z. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Statist. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
  21. Parikh, N.; Boyd, S. Proximal Algorithms; Now Publishers: Braintree, MA, USA, 2014. [Google Scholar]
  22. Hong, M.Y.; Luo, Z.Q.; Razaviyayn, M. Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 2016, 26, 337–364. [Google Scholar] [CrossRef]
  23. Wang, F.H.; Xu, Z.B.; Xu, H.K. Convergence of Bregman alternating direction method with multipliers for nonconvex composite problems. arXiv 2014, arXiv:1410.8625. [Google Scholar]
  24. Ding, W.; Shang, Y.; Jin, Z.; Fan, Y. Semi-proximal ADMM for primal and dual robust Low-Rank matrix restoration from corrupted observations. Symmetry 2024, 16, 303. [Google Scholar] [CrossRef]
  25. Guo, K.; Han, D.R.; Wu, T.T. Convergence of alternating direction method for minimizing sum of two nonconvex functions with linear constraints. Int. J. Comput. Math. 2016, 94, 1653–1669. [Google Scholar] [CrossRef]
  26. Wang, Y.; Yin, W.T.; Zeng, J.S. Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 2019, 78, 29–63. [Google Scholar] [CrossRef]
  27. Wang, F.H.; Cao, W.F.; Xu, Z.B. Convergence of multi-block Bregman ADMM for nonconvex composite problems. Sci. China Inf. Sci. 2018, 61, 122101. [Google Scholar] [CrossRef]
  28. Barber, R.F.; Sidky, E.Y. Convergence for nonconvex ADMM, with applications to CT imaging. J. Mach. Learn. Res. 2024, 25, 1–46. [Google Scholar]
  29. Wang, X.F.; Yan, J.C.; Jin, B.; Li, W.H. Distributed and parallel ADMM for structured nonconvex optimization problem. IEEE Trans. Cybern. 2021, 51, 4540–4552. [Google Scholar] [CrossRef]
  30. Alvarez, F.; Attouch, H. An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 2001, 9, 3–11. [Google Scholar] [CrossRef]
  31. Wu, Z.M.; Li, M. General inertial proximal gradient method for a class of nonconvex nonsmooth optimization problems. Comput. Optim. Appl. 2019, 73, 129–158. [Google Scholar] [CrossRef]
  32. Chao, M.T.; Zhang, Y.; Jian, J.B. An inertial proximal alternating direction method of multipliers for nonconvex optimization. Int. J. Comput. Math. 2020, 98, 1199–1217. [Google Scholar] [CrossRef]
  33. Wang, X.Q.; Shao, H.; Liu, P.J.; Wu, T. An inertial proximal partially symmetric ADMM-based algorithm for linearly constrained multi-block nonconvex optimization problems with applications. J. Comput. Appl. Math. 2023, 420, 114821. [Google Scholar] [CrossRef]
  34. Chen, C.H.; Chan, R.H.; Ma, S.Q.; Yang, J.F. Inertial proximal ADMM for linearly constrained separable convex optimization. SIAM J. Imaging Sci. 2015, 8, 2239–2267. [Google Scholar] [CrossRef]
  35. Attouch, H.; Bolte, J.; Redont, P.; Soubeyran, A. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 2010, 35, 438–457. [Google Scholar] [CrossRef]
  36. Rockafellar, R.T.; Wets, R.J.B. Variational Analysis; Springer: Berlin, Germany, 1998. [Google Scholar]
  37. Bolte, J.; Sabach, S.; Teboulle, M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 2014, 146, 459–494. [Google Scholar] [CrossRef]
  38. Nesterov, Y. Introductory Lectures on Convex Optimization: A Basic Course; Springer: New York, NY, USA, 2004. [Google Scholar]
  39. Xu, Z.B.; Chang, X.Y.; Xu, F.M.; Zhang, H. L1/2 regularization: A thresholding representation theory and a fast solver. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 1013–1027. [Google Scholar] [CrossRef]
Figure 1. Comparison of convergence when m = n = 1000 : (a) The objective value. (b) The error trends of Error1 and Error2.
Figure 2. Comparison of convergence when m = 3000 and n = 4000 : (a) The objective value. (b) The error trends of Error1 and Error2.
Figure 3. Comparison of convergence when m = n = 6000: (a) The objective value. (b) The error trends of Error1 and Error2.
Figure 4. Comparison of convergence when m = n = 1500 : (a) The objective value. (b) The error trends of Error1 and Error2.
Figure 5. Comparison of convergence when m = 2000 and n = 2500 : (a) The objective value. (b) The error trends of Error1 and Error2.
Figure 6. Comparison of convergence when m = n = 3500 : (a) The objective value. (b) The error trends of Error1 and Error2.
Table 1. Numerical results of NIP-ADMM with different θ and η .
θ     η     Iter  CPUT(s)        θ     η     Iter  CPUT(s)
0.2   0.2   75    2.2392         0.6   0.7   54    1.6014
0.3   0.2   78    2.3039         0.8   0.8   49    1.4583
0.3   0.3   69    1.9476         0.8   0.75  49    1.4309
0.5   0.5   60    1.7622         0.85  0.85  56    1.6445
0.6   0.6   56    1.6516         0.9   0.9   84    2.4614
Table 2. Comparison of iteration effect between NIP-ADMM, IPADMM and BADMM.
              NIP-ADMM                   IPADMM                    BADMM
m      n      Iter  CPUT(s)   Obj       Iter  CPUT(s)   Obj       Iter  CPUT(s)    Obj
1000   1000   49    1.2698    19.36     78    2.1407    18.46     90    2.3407     20.14
1500   2000   44    4.9152    23.17     72    8.4978    22.12     76    8.5387     23.69
3000   3000   40    15.0464   21.02     57    22.3823   20.56     73    27.4115    21.18
3000   4000   55    34.0601   23.21     98    62.8206   23.11     76    48.3825    23.22
4000   5000   36    40.6110   24.02     53    61.7431   23.09     65    74.4521    24.03
4500   5500   40    61.7638   24.05     45    71.7627   23.79     67    102.6028   24.06
6000   6000   40    88.7702   24.99     48    108.3045  24.56     63    135.8133   25.00
Table 3. Numerical results of NIP-ADMM with different θ and η .
θ     η     Iter  CPUT(s)        θ     η     Iter  CPUT(s)
0.2   0.2   196   2.0092         0.6   0.7   149   1.5017
0.3   0.2   187   1.9122         0.8   0.7   134   1.3693
0.3   0.3   181   1.8690         0.8   0.9   133   1.3503
0.4   0.5   170   1.7650         0.9   0.8   127   1.3467
0.5   0.5   159   1.6438         0.9   0.9   126   1.3100
Table 4. Comparison of iteration effect between NIP-ADMM, IPADMM and BADMM.
              NIP-ADMM                   IPADMM                    BADMM
m      n      Iter  CPUT(s)   Obj       Iter  CPUT(s)   Obj       Iter  CPUT(s)   Obj
1000   1000   121   1.3366    10.91     213   2.3065    10.55     182   1.9632    10.55
1000   1300   115   1.9246    12.96     211   3.5724    12.48     174   2.9611    13.13
1500   1000   130   1.9270    8.92      228   3.2943    8.59      172   2.4709    7.49
1500   1300   140   3.0474    13.38     259   5.7832    12.90     215   4.7147    11.88
1500   1500   125   3.6104    13.43     230   6.6865    12.81     196   5.4584    12.71
1800   1500   146   4.6432    13.47     257   8.0396    12.94     209   6.1925    11.83
1800   2000   115   5.8341    15.00     210   10.7513   14.29     182   9.1033    14.69
2500   2000   142   8.9043    14.95     250   15.6397   14.29     201   12.2370   13.07
2900   2700   134   15.3647   17.70     245   28.4289   16.71     203   22.9945   16.50
3000   3000   125   17.1686   17.20     217   34.1575   16.34     188   25.0864   17.22
3500   3000   128   20.2808   16.87     234   37.1725   15.84     194   30.6876   15.94
3500   3500   123   24.5455   19.80     223   44.4163   18.69     200   39.2771   19.43