Parallel Primal-Dual Method with Linearization for Structured Convex Optimization

Zhang, Xiayang; Tang, Weiye; Wang, Jiayue; Zhang, Shiyu; Zhang, Kangqun

doi:10.3390/axioms14020104

Open AccessArticle

Parallel Primal-Dual Method with Linearization for Structured Convex Optimization

by

Xiayang Zhang

^*,

Weiye Tang

,

Jiayue Wang

,

Shiyu Zhang

and

Kangqun Zhang

Nanjing Institute of Technology, Nanjing 211167, China

^*

Author to whom correspondence should be addressed.

Axioms 2025, 14(2), 104; https://doi.org/10.3390/axioms14020104

Submission received: 7 January 2025 / Revised: 26 January 2025 / Accepted: 27 January 2025 / Published: 29 January 2025

Download

Browse Figures

Versions Notes

Abstract

This paper presents the Parallel Primal-Dual (PPD3) algorithm, an innovative approach to solving optimization problems characterized by the minimization of the sum of three convex functions, including a Lipschitz continuous term. The proposed algorithm operates in a parallel framework, simultaneously updating primal and dual variables, and offers potential computational advantages. This parallelization can greatly accelerate computation, particularly when run on parallel computing platforms. By departing from traditional primal-dual methods that necessitate strict parameter constraints, the PPD3 algorithm removes reliance on the spectral norm of the linear operator, significantly reducing the computational burden associated with its evaluation. As the problem size grows, calculating the spectral norm, which is essential for many primal-dual methods, becomes progressively more expensive. In addition, adaptive step sizes are computed to accelerate the convergence process. In contrast to most primal-dual approaches that employ a fixed step size constrained by a global upper limit throughout all iterations, the adaptive step size is typically greater and may result in faster convergence. An O(1/k) ergodic convergence rate is proved theoretically. Applications in Fused LASSO and image inpainting demonstrate the method’s efficiency in computation time and convergence rate compared to state-of-the-art algorithms.

Keywords:

parallel; primal-dual; three operators; linearization; fused LASSO; image inpainting

MSC:

65K10; 90C25; 65K05

1. Introduction

In this paper, we focus on the structured nonlinear optimization problem below:

\begin{matrix} min_{x \in X} f (x) + g (x) + h (A x) . \end{matrix}

(1)

Let

A \in R^{m \times n}

, and let

X \subset R^{n}

and

Y \subset R^{m}

be closed convex sets. Consider three proper lower semi-continuous convex functions f, g, and h; the function f is differentiable, and its gradient is

β

-Lipschitz continuous, meaning that for all

x, y \in X

, we have

∥ \nabla f (x) - \nabla f (y) ∥ \leq {β ∥ x - y ∥}^{2} .

This condition implies that the difference in function values of f at any two points in X is bounded by a constant

β

times the Euclidean distance between those points.

It is crucial to understand that this original primal optimization problem embodies a primal-dual representation as follows:

\begin{matrix} min_{x \in X} max_{y \in Y} f (x) + g (x) + y^{T} A x - h^{*} (y), \end{matrix}

(2)

where

h^{*}

denotes the conjugate of a convex function h, defined as

h^{*} (y) = sup_{x \in dom (h)} (y^{T} x - h (x)) .

The formulation in (1) encompasses a broad spectrum of applications, such as image restoration, machine and statistical learning, compressive sensing, and portfolio optimization. Several illustrative examples are provided below:

Elastic net [1]: The elastic net method incorporates both

l_{1}

and

l_{2}

regularization terms, leveraging the advantages of each. The elastic net estimator is defined as the solution to the following optimization problem:

\begin{matrix} min_{x \in X} μ_{1} {∥ x ∥}_{1} + μ_{2} {∥ x ∥}_{2}^{2} + ℓ (A x - b), \end{matrix}

(3)

where

x \in R^{p}

,

A \in R^{n \times p}

,

b \in R^{n}

, and ℓ is a loss function. The second term possesses a gradient that is Lipschitz continuous. When

μ_{2} = 0

, the elastic net reduces to LASSO regression. Conversely, when

μ_{1} = 0

, the elastic net simplifies to ridge regression.

Matrix completion [2]: Consider a matrix

Y \in R^{m \times n}

with entries confined to the interval

[l, u]

, where

l < u

are positive real constants. The task is to recover a low-rank matrix, X, from the observed elements of Y. Let

Ω

be the set of

Y^{'} s

non-zero elements, and let

P_{Ω}

be a linear operator that extracts the available entries of Y, leaving the missing entries as zeros. The problem, as defined in [2], is formulated as the following optimization problem:

\begin{matrix} min_{X \in ℜ^{m \times n}} & {μ ∥ X ∥}_{*} + \frac{1}{2} {∥ P_{Ω} (X - Y) ∥}_{2}^{2} \\ s . t . & l \leq X \leq u \end{matrix}

where

μ > 0

is a regularization constant, and

{∥ \cdot ∥}_{*}

denotes the nuclear norm. By applying the Lagrange multiplier method, the aforementioned optimization task can be restated in the form presented in (2).

Support vector machine classification [3]: The constrained quadratic optimization problem in

R^{d}

can likewise be reformulated as presented in (2).

\begin{matrix} min_{x \in ℜ^{d}} & \frac{1}{2} {∥ x ∥}_{Q}^{2} + c^{T} x \\ s . t . & x \in C_{1} \cap C_{2} \end{matrix}

In the dual representation of the soft-margin kernelized support vector machine classifier, we encounter a situation involving a symmetric positive semi-definite matrix

Q \in R^{d \times d}

, a vector

c \in R^{d}

, and two sets of constraints,

C_{1}

and

C_{2}

, within

R^{d}

. In particular,

C_{1}

represents a bounding constraint, and

C_{2}

refers to a linear constraint.

Before introducing algorithms to solve (2), we begin by reviewing several established techniques for general saddle-point problems, particularly the case of (2), where f is not present. Three key splitting methods are typically used:

The forward-backward-forward splitting method [4];
The forward-backward splitting method [5];
The Douglas-Rachford splitting method [6].

In recent years, several primal-dual splitting algorithms have emerged, grounded in three primary foundational techniques. The primal-dual hybrid gradient (PDHG) method, first introduced in [7], demonstrates significant numerical efficiency when applied to total variation (TV) minimization tasks. However, its convergence is highly sensitive to the choice of parameters. To mitigate this issue, Chambolle and Pock introduced the CP method in [8], which enhances the PDHG approach by adjusting the update rule for the dual variables. This adjustment improves both convergence rates and numerical efficiency. In [9], He and Yuan performed an extensive investigation of the CP approach, offering a comprehensive analysis via the Proximal Point Algorithm (PPA) and introducing a relaxation parameter within the PPA framework to expedite convergence (the HeYuan method). More recently, Goldstein et al. introduced the Adaptive Primal-Dual Splitting (APD) method in [10], which dynamically modifies the step-size parameters of the CP approach. In [11], O’Connor and Vandenberghe presented a primal-dual decomposition strategy based on Douglas-Rachford splitting, which is utilized for various primal-dual optimality condition decompositions. Additionally, the adaptive parallel primal-dual (APPD) method presented in [12] is an enhanced version of the primal-dual approach, designed with parallel computing capabilities to optimize performance.

A comprehensive framework outlining various well-known primal-dual methods tackles the saddle-point problem by following these steps:

\begin{matrix} \{\begin{matrix} x^{k + 1} = \arg min_{x \in X} {g (x) + x^{T} A^{T} y^{k} + \frac{1}{2 τ} ∥ x - x^{k} ∥^{2}}, \\ {\bar{x}}^{k} = x^{k + 1} + θ (x^{k + 1} - x^{k}), \\ y^{k + 1} = \arg min_{y \in Y} {h^{*} (y) - y^{T} A {\bar{x}}^{k} + \frac{1}{2 σ} ∥ y - y^{k} ∥^{2}}, \end{matrix} \end{matrix}

(4)

As summarized in Table 1, different values of

θ

correspond to distinct methods under the framework described in Section 4. It is noteworthy that a straightforward correction step is necessary for both the HeYuan method and the APPD method.

In the context of (2), Condat and Vu proposed a primal-dual splitting framework, the CV scheme, in their studies [13,14]. This approach, based on forward-backward iteration, extends the Chambolle-Pock algorithm (CP) by linearizing the function f, though it operates with a narrower parameter range than CP. Later, the Asymmetric Forward-Backward-Adjoint (AFBA) splitting method, introduced in [15], generalizes the CV scheme. Additionally, the Primal-Dual Fixed-Point (PDFP) algorithm, proposed in [16], incorporates two proximal operations of g in each iteration. Other notable methods for solving (2) include the Primal-Dual Three-Operator splitting method (PD3O) from [17] and DavisYin’s three-operator splitting technique, known as PDDY, introduced in [18]. More recently, a modified primal-dual algorithm (MPD3O) was studied in [19], which combines the idea of the balanced augmented Lagrangian method (BALM) in [20] and PD3O. The parameter conditions are relaxed in [21] for the AFBA/PDDY method.

A similar overarching framework exists for algorithms addressing (2), analogous to the one in (4):

\begin{matrix} \{\begin{matrix} x^{k + 1} = \arg min_{x \in X} {g (x) + x^{T} A^{T} y^{k} + \frac{1}{2 τ} ∥ x - x^{k} ∥^{2} + x^{T} \nabla f (x^{k})}, \\ {\bar{x}}^{k} = {\bar{x}}_{(x^{k}, x^{k + 1})}^{k}, \\ y^{k + 1} = \arg min_{y \in Y} {h^{*} (y) - y^{T} A {\bar{x}}^{k} + \frac{1}{2 σ} ∥ y - y^{k} ∥^{2}} . \end{matrix} \end{matrix}

(5)

The corresponding parameter restrictions for these methods are provided in Table 2. The relationship between the aforementioned methods and the framework described in (5) is detailed in Table 3.

In the Discussion section of Chambolle’s work [8], a prominent challenge for existing primal-dual methods is emphasized: the handling of a linear operator, A, with a potentially large or unknown norm. Numerous methods, including the PDHG method described in [7], the CP algorithm presented in [8], the HeYuan method from [9], the CV scheme introduced in [13,14], the PDFP method proposed in [16], and the PD3O method outlined in [17], all necessitate the satisfaction of the condition

τ σ \leq 1 / ∥ A^{T} A ∥

for convergence. Consequently, the norm of A must be computed prior to initializing the parameters. However, as the dimensionality of A escalates, computing this norm becomes increasingly costly, thereby introducing a substantial computational burden.

To tackle the aforementioned challenge, we introduce the PPD3 algorithm, aimed at minimizing the sum of three functions, one of which includes a Lipschitz continuous term. The method we propose consists of two separate phases. In the first phase, a parallel version of (5), where

\bar{x}

is replaced by

x^{k}

, functions as a prediction step to produce an intermediate estimate. Subsequently, in the subsequent phase, the step length

α_{k}

is determined, after which a simple adjustment is applied to the predictor, ensuring the convergence of the PPD3 algorithm.

The key contributions presented in this paper are outlined as follows:

In contrast to other primal-dual methods, which usually update the primal and dual variables alternately, the proposed approach computes both variables simultaneously and in parallel. This parallelization can greatly accelerate computation, particularly when run on parallel computing platforms.
The proposed algorithm is independent of any prior knowledge about the linear operator A. The constraint $τ σ \leq 1 / ∥ A^{T} A ∥$ on the parameters is no longer required. In fact, as the problem size grows, calculating the spectral norm of A, which is essential for many primal-dual methods, becomes progressively more expensive.
The step size $α_{k}$ is adjusted dynamically at each iteration. This method tailors the step size for each specific iteration, in contrast to most primal-dual approaches that employ a fixed step size constrained by a global upper limit throughout all iterations. As a result, the adaptive step size, which is typically greater, may result in faster convergence.

The organization of this paper is structured as follows: In Section 2, we introduce the proposed algorithm. Section 3 presents a comprehensive convergence analysis of the algorithm, while Section 4 delves into an examination of its computational complexity. Section 5 reports the performance of the algorithm through diverse applications, accompanied by corresponding numerical results. Lastly, Section 6 summarizes the key conclusions and outlines potential directions for future research endeavors.

2. The Parallel Primal-Dual Algorithm

To facilitate understanding, we combine the primal and dual parameters and introduce them as follows:

u = (\begin{matrix} x \\ y \end{matrix}), θ (u) = g (x) + h^{*} (y), F (u) = (\begin{matrix} A^{T} y + \nabla f (x) \\ - A x \end{matrix}), Ω = X \times Y .

The subsequent symbols are employed as well:

Q (x^{k}, {\tilde{x}}^{k}) = (\begin{matrix} τ^{- 1} (x^{k} - {\tilde{x}}^{k}) - A^{T} (y^{k} - {\tilde{y}}^{k}) - (\nabla f (x^{k}) - \nabla f ({\tilde{x}}^{k})) \\ A (x^{k} - {\tilde{x}}^{k}) + σ^{- 1} (y^{k} - {\tilde{y}}^{k}) \end{matrix}),

H = (\begin{matrix} τ^{- 1} I_{n} & 0 \\ 0 & σ^{- 1} I_{m} \end{matrix}), G = (\begin{matrix} (τ^{- 1} - β) I_{n} & 0 \\ 0 & σ^{- 1} I_{m} \end{matrix}),

and

M (x^{k}, {\tilde{x}}^{k}) = (\begin{matrix} (x^{k} - {\tilde{x}}^{k}) - τ A^{T} (y^{k} - {\tilde{y}}^{k}) - τ (\nabla f (x^{k}) - \nabla f ({\tilde{x}}^{k})) \\ σ A (x^{k} - {\tilde{x}}^{k}) + (y^{k} - {\tilde{y}}^{k}) \end{matrix}),

where

τ > β

and

σ > 0

represent positive parameters. It is clear that

M (x^{k}, {\tilde{x}}^{k}) = H^{- 1} Q (x^{k}, {\tilde{x}}^{k}) .

The concurrent primal-dual approach is described in Algorithm 1.

The loop in Algorithm 1 follows a pattern similar to the CV approach, with one key difference: the prediction step (6) utilizes

x^{k}

rather than

\bar{x}

, as seen in (5). The incorporation of

x^{k}

provides a notable benefit: it eliminates the requirement for any updated data related to x in the y-subproblem, thus enabling the parallel processing of both the x- and y-subproblems.

Subsequently, the residuals

p_{k}

and

d_{k}

are calculated as termination conditions. Once both residuals tend toward zero, we will show (in the subsequent section) that

∥ Q (x^{k}, {\tilde{x}}^{k}) ∥

also approaches zero. Consequently, (1) converges to

{\tilde{u}}^{k} \in Ω, θ (u) - θ ({\tilde{u}}^{k}) + {(u - {\tilde{u}}^{k})}^{T} F ({\tilde{u}}^{k}) \geq 0, \forall u \in Ω .

By contrasting this variational inequality with (3), it is evident that

{\tilde{u}}^{k}

meets the criteria of the initial problem (2). Consequently, the convergence of the primal-dual algorithm can be assessed by analyzing the residuals’ norms (see also [10,22,23]).

At each iteration, the step size

α_{k}

is computed, which addresses the challenge of setting a global upper bound for the step size across all iterations. In traditional primal-dual algorithms, this upper bound is usually constrained by the condition

τ σ \leq ∥ A^{T} A ∥

. In contrast, our approach bypasses this limitation on the parameters

τ

and

σ

by adaptively computing

α_{k}

. Experimental observations indicate that, in certain cases, the step size in our method exceeds the upper bound

∥ A^{T} A ∥

, which has been found to accelerate the convergence process.

Ultimately, a straightforward adjustment procedure guarantees the algorithm’s convergence.

Algorithm 1: Parallel Primal-Dual Algorithm

Initialize $τ < 1 / β$ , $σ > 0$ , $x^{0} \in X$ , $y^{0} \in Y$
While $p_{k}$ , $d_{k}$ > tolerance do
Prediction

$\begin{matrix} \{\begin{matrix} {\tilde{x}}^{k} = arg min_{x \in X} \{g (x) + x^{T} A^{T} y^{k} + \frac{1}{2 τ} {∥ x - x^{k} ∥}^{2} + x^{T} \nabla f (x^{k})\}, \\ {\tilde{y}}^{k} = arg min_{y \in Y} \{h^{*} (y) - y^{T} A x^{k} + \frac{1}{2 σ} {∥ y - y^{k} ∥}^{2}\} . \end{matrix} \end{matrix}$

(6)
Compute residual norms

$\begin{matrix} \{\begin{matrix} p_{k} = ∥ τ^{- 1} (x^{k} - {\tilde{x}}^{k}) - A^{T} (y^{k} - {\tilde{y}}^{k}) - (\nabla f (x^{k}) - \nabla f ({\tilde{x}}^{k})) ∥, \\ d_{k} = ∥ σ^{- 1} (y^{k} - {\tilde{y}}^{k}) + A (x^{k} - {\tilde{x}}^{k}) ∥ . \end{matrix} \end{matrix}$

(7)
Calculate step-size

$α_{k} = \frac{τ^{- 1} ∥ x^{k} - {\tilde{x}}^{k} ∥^{2} + σ^{- 1} {∥ y^{k} - {\tilde{y}}^{k} ∥}^{2} - {(x^{k} - {\tilde{x}}^{k})}^{T} (\nabla f (x^{k}) - \nabla f ({\tilde{x}}^{k}))}{τ p_{k}^{2} + σ d_{k}^{2}} .$
Correction

$\begin{matrix} \{\begin{matrix} x^{k + 1} = x^{k} - α_{k} (x^{k} - {\tilde{x}}^{k} - τ A^{T} (y^{k} - {\tilde{y}}^{k}) - τ (\nabla f (x^{k}) - \nabla f ({\tilde{x}}^{k}))), \\ y^{k + 1} = y^{k} - α_{k} (y^{k} - {\tilde{y}}^{k} + σ A (x^{k} - {\tilde{x}}^{k})) . \end{matrix} \end{matrix}$

(8)
End while

The prediction stage in (6) is a special case of (5), where

{\bar{x}}^{k} = x^{k}

. Unlike other algorithms, such as the CV method, where

{\bar{x}}^{k} = 2 x^{k + 1} - x^{k}

, our method uses the previous iterate

x^{k}

as

{\bar{x}}^{k}

. When

f = 0

, the prediction stage in (6) coincides with (4) when

θ = - 1

.

Note that our algorithm does not require the condition

τ σ \leq ∥ A^{T} A ∥

for theoretical analysis, and thus, no such constraint is necessary.

3. Convergence Analysis

We begin by providing the following reformulation of the variational inequality (VI) corresponding to (2):

Find

u^{*} = (x^{*}, y^{*})

such that

VI (Ω, F, θ) : u^{*} \in Ω, θ (u) - θ (u^{*}) + {(u - u^{*})}^{T} F (u^{*}) \geq 0, \forall u \in Ω = X \times Y,

The mapping

F (u)

is composed of the sum of an affine skew-symmetric matrix and a monotone operator, which ensures that

F (u)

itself is monotone. Let

Ω^{*}

represent the solution set to the variational inequality VI(

Ω

, F,

θ

); we assume that this solution set is non-empty.

Lemma 1.

For a given

u^{k} = (\begin{matrix} x^{k} \\ y^{k} \end{matrix})

, let

{\tilde{u}}^{k}

be given by the expression in (6). Then,

{\tilde{u}}^{k} \in Ω, θ (u) - θ ({\tilde{u}}^{k}) + {(u - {\tilde{u}}^{k})}^{T} F ({\tilde{u}}^{k}) \geq {(u - {\tilde{u}}^{k})}^{T} Q (u^{k}, {\tilde{u}}^{k}) for all u \in Ω,

Proof of Lemma 1.

By applying the optimality condition from (6), we derive

g (x) - g ({\tilde{x}}^{k}) + {(x - {\tilde{x}}^{k})}^{T} {A^{T} y^{k} + τ^{- 1} ({\tilde{x}}^{k} - x^{k}) + \nabla f (x^{k})} \geq 0, \forall x \in X,

and

h^{*} (y) - h^{*} ({\tilde{y}}^{k}) + {(y - {\tilde{y}}^{k})}^{T} {- A x^{k} + σ^{- 1} ({\tilde{y}}^{k} - y^{k})} \geq 0, \forall y \in Y .

By combining the two inequalities above, it follows that

{\tilde{u}}^{k} = ({\tilde{x}}^{k}, {\tilde{y}}^{k}) \in Ω

, and

\begin{matrix} [g (x) - g ({\tilde{x}}^{k}) + h^{*} (y) - h^{*} ({\tilde{y}}^{k})] + {(\begin{matrix} x - {\tilde{x}}^{k} \\ y - {\tilde{y}}^{k} \end{matrix})}^{T} \\ \{(\begin{matrix} A^{T} {\tilde{y}}^{k} \\ - A {\tilde{x}}^{k} \end{matrix}) + (\begin{matrix} τ^{- 1} I & A^{T} \\ - A & σ^{- 1} I \end{matrix}) (\begin{matrix} {\tilde{x}}^{k} - x^{k} \\ {\tilde{y}}^{k} - y^{k} \end{matrix}) + (\begin{matrix} \nabla f (x^{k}) - \nabla f ({\tilde{x}}^{k}) \\ 0 \end{matrix})\} \geq 0, \end{matrix}

(9)

For every

u \in Ω

, the assertion is established through the notation introduced in the preceding section. □

Lemma 2.

For a given

u^{k} = (\begin{matrix} x^{k} \\ y^{k} \end{matrix})

, let

{\tilde{u}}^{k}

be defined by (6), then

{\tilde{u}}^{k} \in Ω,

\begin{matrix} α_{k} (θ (u) - θ ({\tilde{u}}^{k}) + {(u - {\tilde{u}}^{k})}^{T} F ({\tilde{u}}^{k})) \\ \geq \frac{1}{2} ∥ u - u^{k + 1} ∥_{H}^{2} - \frac{1}{2} ∥ u - u^{k} ∥_{H}^{2} + \frac{α_{k}}{2} {∥ u - {\tilde{u}}^{k} ∥}_{G}^{2} for all u \in Ω, \end{matrix}

where

{∥ \cdot ∥}_{H}

,

{∥ \cdot ∥}_{G}

are semi-norm, defined as

{∥ x ∥}_{H} = x^{T} H x

.

Proof of Lemma 2.

Step 5 and 6 in Algorithm 1 are equivalent to

α_{k} = {(u^{k} - {\tilde{u}}^{k})}^{T} Q (u^{k}, {\tilde{u}}^{k}) / {∥ M (u^{k}, {\tilde{u}}^{k}) ∥}_{H}^{2} and

(10)

\begin{matrix} u^{k + 1} = u^{k} - α_{k} M (u^{k}, {\tilde{u}}^{k}) . \end{matrix}

(11)

By utilizing

Q (u^{k}, {\tilde{u}}^{k}) = H M (u^{k}, {\tilde{u}}^{k})

and the relation in (11), the right-hand side of (1) simplifies to

{(u - {\tilde{u}}^{k})}^{T} H (u^{k} - u^{k + 1})

, and therefore

\begin{matrix} α_{k} (θ (u) - θ ({\tilde{u}}^{k}) + {(u - {\tilde{u}}^{k})}^{T} F ({\tilde{u}}^{k})) \\ \geq {(u - {\tilde{u}}^{k})}^{T} H (u^{k} - u^{k + 1}) \\ = \frac{1}{2} ∥ u - u^{k + 1} ∥_{H}^{2} - \frac{1}{2} ∥ u - u^{k} ∥_{H}^{2} + \frac{1}{2} ∥ u - {\tilde{u}}^{k} ∥_{H}^{2} - \frac{1}{2} {∥ u - {\tilde{u}}^{k + 1} ∥}_{H}^{2} \end{matrix}

For the last two terms, we have

\begin{matrix} \frac{1}{2} ∥ u - {\tilde{u}}^{k} ∥_{H}^{2} - \frac{1}{2} {∥ u - {\tilde{u}}^{k + 1} ∥}_{H}^{2} \\ = \frac{1}{2} [2 α_{k} {(u^{k} - {\tilde{u}}^{k})}^{T} H M (u^{k}, {\tilde{u}}^{k}) - α_{k}^{2} ∥ M (u^{k}, {\tilde{u}}^{k}) ∥_{H}^{2}] \\ = \frac{α_{k}}{2} {(u^{k} - {\tilde{u}}^{k})}^{T} H M (u^{k}, {\tilde{u}}^{k}) \\ = \frac{α_{k}}{2} {(\begin{matrix} x - {\tilde{x}}^{k} \\ y - {\tilde{y}}^{k} \end{matrix})}^{T} [(\begin{matrix} τ^{- 1} I & A^{T} \\ - A & σ^{- 1} I \end{matrix}) (\begin{matrix} x^{k} - {\tilde{x}}^{k} \\ y^{k} - {\tilde{y}}^{k} \end{matrix}) + (\begin{matrix} \nabla f (x^{k}) - \nabla f ({\tilde{x}}^{k}) \\ 0 \end{matrix})] \\ \geq \frac{α_{k}}{2} (τ^{- 1} ∥ x^{k} - {\tilde{x}}^{k} ∥^{2} + σ^{- 1} ∥ y^{k} - {\tilde{y}}^{k} ∥^{2} - β {∥ x^{k} - {\tilde{x}}^{k} ∥}^{2}) \\ = \frac{α_{k}}{2} {∥ u - {\tilde{u}}^{k} ∥}_{G}^{2}, \end{matrix}

The proof of the assertion is, thus, complete. □

Lemma 3.

For a given

u^{k} = (\begin{matrix} x^{k} \\ y^{k} \end{matrix})

, let

{\tilde{u}}^{k}

and

u^{k + 1}

be defined by (6) and (8), respectively. If

τ^{- 1} σ^{- 1} \geq c {∥ A ∥}_{2}^{2}

, then there exists a positive constant

α^{+}

such that

α^{+} \leq α_{k}

for all

k > 0

, and the following inequality is satisfied:

\begin{matrix} ∥ u^{k + 1} - u^{*} ∥_{H}^{2} \leq ∥ u^{k} - u^{*} ∥_{H}^{2} - α^{+} {∥ u^{k} - {\tilde{u}}^{k} ∥}_{G}^{2}, \forall u^{*} \in Ω^{*} . \end{matrix}

(12)

Proof of Lemma 3.

By substituting

u = u^{*}

into (10), we derive

\begin{matrix} α_{k} (θ (u^{*}) - θ ({\tilde{u}}^{k}) + {(u^{*} - {\tilde{u}}^{k})}^{T} F ({\tilde{u}}^{k})) \\ \geq \frac{1}{2} ∥ u^{*} - u^{k + 1} ∥_{H}^{2} - \frac{1}{2} ∥ u^{*} - u^{k} ∥_{H}^{2} + \frac{α_{k}}{2} {∥ u - {\tilde{u}}^{k} ∥}_{G}^{2}, \forall u \in Ω . \end{matrix}

(13)

Since

{({\tilde{u}}^{k} - u^{*})}^{T} F ({\tilde{u}}^{k}) \geq {({\tilde{u}}^{k} - u^{*})}^{T} F (u^{*})

and

θ ({\tilde{u}}^{k}) - θ (u^{*}) + {({\tilde{u}}^{k} - u^{*})}^{T} F (u^{*}) \geq 0,

these inequalities can be combined to yield:

∥ u^{k + 1} - u^{*} ∥_{H}^{2} \leq ∥ u^{k} - u^{*} ∥_{H}^{2} - α_{k} {∥ u^{k} - {\tilde{u}}^{k} ∥}_{G}^{2}, \forall u^{*} \in Ω^{*} .

Next, let us recall the definition of

α_{k}

:

α_{k} = \frac{{(u^{k} - {\tilde{u}}^{k})}^{T} Q (u^{k}, {\tilde{u}}^{k})}{∥ M (u^{k}, {\tilde{u}}^{k}) ∥_{H}^{2}} .

(14)

We proceed to compute the term

{(u^{k} - {\tilde{u}}^{k})}^{T} Q (u^{k}, {\tilde{u}}^{k})

:

\begin{matrix} {(u^{k} - {\tilde{u}}^{k})}^{T} Q (u^{k}, {\tilde{u}}^{k}) & = {(\begin{matrix} x - {\tilde{x}}^{k} \\ y - {\tilde{y}}^{k} \end{matrix})}^{T} [(\begin{matrix} τ^{- 1} I & A^{T} \\ - A & σ^{- 1} I \end{matrix}) (\begin{matrix} x^{k} - {\tilde{x}}^{k} \\ y^{k} - {\tilde{y}}^{k} \end{matrix}) \\ + (\begin{matrix} \nabla f (x^{k}) - \nabla f ({\tilde{x}}^{k}) \\ 0 \end{matrix})] \\ \geq (τ^{- 1} ∥ x^{k} - {\tilde{x}}^{k} ∥^{2} + σ^{- 1} ∥ y^{k} - {\tilde{y}}^{k} ∥^{2} - β {∥ x^{k} - {\tilde{x}}^{k} ∥}^{2}) \\ = (1 - τ β) ∥ u - {\tilde{u}}^{k} ∥_{H}^{2} . \end{matrix}

Next, we examine the term

∥ M (u^{k}, {\tilde{u}}^{k}) ∥_{H}^{2}

:

\begin{matrix} ∥ M (u^{k}, {\tilde{u}}^{k}) ∥_{H}^{2} & = {∥(\begin{matrix} I & τ A^{T} \\ - σ A & I \end{matrix}) (u^{k} - {\tilde{u}}^{k}) + τ (\begin{matrix} \nabla f (x^{k}) - \nabla f ({\tilde{x}}^{k}) \\ 0 \end{matrix})∥}_{H}^{2} \\ \leq 2 {∥(\begin{matrix} I & τ A^{T} \\ - σ A & I \end{matrix}) (u^{k} - {\tilde{u}}^{k})∥}_{H}^{2} + 2 τ^{2} {∥(\begin{matrix} \nabla f (x^{k}) - \nabla f ({\tilde{x}}^{k}) \\ 0 \end{matrix})∥}_{H}^{2} \\ \leq 2 ∥(\begin{matrix} I & τ A^{T} \\ - σ A & I \end{matrix})∥ ∥ u^{k} - {\tilde{u}}^{k} ∥_{H}^{2} + 2 τ^{2} β^{2} {∥ u^{k} - {\tilde{u}}^{k} ∥}_{H}^{2} . \end{matrix}

(15)

Therefore,

α_{k}

is bounded below by a positive constant

α^{+}

, and the conclusion follows directly. □

Theorem 1.

Given the sequence

u^{k} = (x^{k}, y^{k})

, let

{\tilde{u}}^{k}

and

u^{k + 1}

be defined by Equations (6) and (8), respectively. It follows that the sequence

{u^{k}}

converges to a limit point

u^{\infty} \in Ω^{*}

.

Proof of Theorem 1.

From Equation (12), it follows that the sequence

{u^{k}}

is bounded, and consequently, we obtain

lim_{k \to \infty} ∥ u^{k} - {\tilde{u}}^{k} ∥ = 0 .

(16)

Thus, the sequence

{{\tilde{u}}^{k}}

is also bounded. Let

u^{\infty}

denote a cluster point of

{{\tilde{u}}^{k}}

, and assume that

{{\tilde{u}}^{k_{j}}}

is a subsequence of

{{\tilde{u}}^{k}}

that converges to

u^{\infty}

. Furthermore, let

{u^{k}}

and

{u^{k_{j}}}

represent the corresponding subsequences induced by

{{\tilde{u}}^{k}}

and

{{\tilde{u}}^{k_{j}}}

, respectively. From Equation (1), it follows that

{\tilde{u}}^{k_{j}} \in Ω, θ (u) - θ ({\tilde{u}}^{k_{j}}) + {(u - {\tilde{u}}^{k_{j}})}^{T} F ({\tilde{u}}^{k_{j}}) \geq {(u - u^{k_{j}})}^{T} Q (u^{k_{j}} - {\tilde{u}}^{k_{j}}), \forall u \in Ω .

By utilizing the continuity of

θ (u)

and

F (u)

, we derive

w^{\infty} \in Ω, θ (u) - θ (u^{\infty}) + {(u - w^{\infty})}^{T} F (u^{\infty}) \geq 0, \forall w \in Ω .

This variational inequality implies that

u^{\infty}

satisfies the variational inequality VI(

Ω, F

).

By applying Equation (16) and using the fact that

{lim}_{j \to \infty} {\tilde{u}}^{k_{j}} = u^{\infty}

, it follows that the subsequence

{u^{k_{j}}}

converges to

u^{\infty}

. Moreover, from Equation (12), we obtain

∥ u^{k + 1} - u^{\infty} ∥_{H} \leq {∥ u^{k} - u^{\infty} ∥}_{H},

This result demonstrates that the sequence

{u^{k}}

approaches

u^{\infty}

, thus finalizing the proof. □

4. Computational Complexity

Lemma 4.

The solution set of the variational inequality VI

(Ω, F, θ)

is convex and can be described as

Ω^{*} = ⋂_{u \in Ω} \{\tilde{u} \in Ω : (θ (u) - θ (\tilde{u})) + {(u - \tilde{u})}^{T} F (u) \geq 0\} .

Proof of Lemma 4.

The proof can be derived as a natural extension of Theorem 2.3.5 in [24], or alternatively, one may consult the proof of Theorem 2.1 in [25] for a related approach. □

Lemma 4 demonstrates that

\tilde{u} \in Ω

constitutes an approximate solution to the variational inequality VI

(Ω, F, θ)

, with an accuracy of

ϵ > 0

, provided that it satisfies

\begin{matrix} θ (u) - θ (\tilde{u}) + {(u - \tilde{u})}^{T} F (u) \geq - ϵ, \forall u \in Ω \cap D (\tilde{u}), \end{matrix}

(17)

where

D (\tilde{u}) = {u ∣ ∥ u - \tilde{u} ∥ \leq 1}

represents the neighborhood of

\tilde{u}

.

Theorem 2.

Algorithm 1 demonstrates convergence with an ergodic rate of

O (1 / t)

. Specifically, let

{u^{k}}

represent the sequence generated by Algorithm 1, starting from an arbitrary initial point

u^{0}

. For the sequence

{{\bar{u}}^{t}}

defined by

\begin{matrix} {\bar{u}}^{t} = \frac{1}{\sum_{k = 0}^{t - 1} α_{k}} \sum_{k = 0}^{t - 1} α_{k} {\tilde{u}}^{k}, \end{matrix}

(18)

the subsequent convergence bound holds:

\begin{matrix} θ (u) - θ ({\bar{u}}^{t}) + {(u - {\bar{u}}^{t})}^{T} F (u) \geq - \frac{1}{2 t α^{+}} {∥ u - u^{0} ∥}_{H}^{2} . \end{matrix}

(19)

Proof of Theorem 2.

By leveraging the monotonicity of F, we obtain

\begin{matrix} θ (u) - θ ({\tilde{u}}^{k}) + {(u - {\tilde{u}}^{k})}^{T} F (u) \geq θ (u) - θ ({\tilde{u}}^{k}) + (u - {\tilde{u}}^{k}) F (\tilde{u}) . \end{matrix}

(20)

By substituting this into (12), we derive

α_{k} [θ (u) - θ ({\tilde{u}}^{k}) + {(u - {\tilde{u}}^{k})}^{T} F (u)] \geq \frac{1}{2} ∥ u - u^{k + 1} ∥_{H}^{2} - \frac{1}{2} {∥ u - u^{k} ∥}_{H}^{2} \forall u \in Ω .

(21)

It is important to highlight that the above assertion holds when

G > 0

.

By summing the inequality in (21) over

k = 0, 1, \dots, t - 1

, we obtain

\begin{matrix} \sum_{k = 0}^{t - 1} \{α_{k} [θ (u) - θ ({\tilde{u}}^{k}) + {(u - {\tilde{u}}^{k})}^{T} F (u)]\} \geq - \frac{1}{2} {∥ u - u^{0} ∥}_{H}^{2} . \end{matrix}

(22)

By applying the definition of

{\bar{u}}^{t}

, this can be expressed as

\begin{matrix} (\sum_{k = 0}^{t - 1} α_{k}) θ (u) - \sum_{k = 0}^{t - 1} α_{k} θ ({\tilde{u}}^{k}) + (\sum_{k = 0}^{t - 1} α_{k}) {(u - {\bar{u}}^{t})}^{T} F (u) \geq - \frac{1}{2} {∥ u - u^{0} ∥}_{H}^{2} . \end{matrix}

(23)

Given that

θ (u)

is convex, it follows that

(\sum_{k = 0}^{t - 1} α_{k}) θ ({\tilde{u}}^{t}) \leq \sum_{k = 0}^{t - 1} [α_{k} θ ({\tilde{u}}^{k})] .

Substituting this into the preceding inequality yields

\begin{matrix} θ (u) - θ ({\bar{u}}^{t}) + {(u - {\bar{u}}^{t})}^{T} F (u) \geq - \frac{1}{2 \sum_{k = 0}^{t - 1} α_{k}} ∥ u - u^{0} ∥_{H}^{2} \geq - \frac{1}{2 t α^{+}} {∥ u - u^{0} ∥}_{H}^{2} . \end{matrix}

(24)

Therefore, the result has been established. □

5. Numerical Experiments

Algorithm 1 was applied to the Fused LASSO and image inpainting problem to show its numerical performance. We compare our method with Condat and Vu’s primal-dual splitting scheme (CV), Yan’s primal-dual three-operator splitting scheme (PD3O), and the Asymmetric Forward-Backward-Adjoint method (AFBA)/Davis and Yin’s three-operator splitting method (PDDY). All codes were written and implemented in Matlab

2024 a

, and all experiments were carried out on a computer with a

1.20

GHz Intel Core Ultra 5 125H processor and 32 GB of memory.

5.1. Fused LASSO

The formulation of Fused LASSO involves the following optimization problem:

\hat{x} = arg min \{\sum_{i} {(b_{i} - \sum_{j} a_{i j} x_{j})}^{2}\}

s . t . \sum_{j = 1}^{p} | x_{j} | \leq μ_{1} and \sum_{j = 2}^{p} | x_{j} - x_{j - 1} | \leq μ_{2},

where

μ_{1}

and

μ_{2}

are tuning parameters that control the sparsity of the coefficients and the smoothness of their differences, respectively. This formulation captures both sparsity and smoothness, making it especially suited for high-dimensional settings where feature ordering is important. Examples of such problems include gene expression analysis and protein mass spectroscopy.

The optimization problem can be formulated as follows:

x^{*} = \underset{x}{arg min} (\frac{1}{2} {∥ A x - b ∥}_{2}^{2} + μ_{1} {∥ x ∥}_{1} + μ_{2} {∥ D x ∥}_{1}),

where

x \in R^{p}

,

A \in R^{n \times p}

, and

b \in R^{n}

. The matrix D captures the differences between consecutive coefficients, and the

ℓ_{1}

-penalty on

D x

promotes smoothness. We let

f (x) = \frac{1}{2} {∥ A x - b ∥}^{2}

,

g (x) = μ_{2} {∥ x ∥}_{1}

, and

h (D x) = μ_{1} {∥ D x ∥}_{1}

. The problem can also be reformulated in a primal-dual form for efficient computation:

min_{x \in R^{p}} max_{y \in R^{p - 1}} \frac{1}{2} {∥ A x - b ∥}^{2} + μ_{2} {∥ x ∥}_{1} + 〈 D x, y 〉 - μ_{1} I_{B^{\infty}} (\frac{y}{μ_{1}})

where

B^{\infty}

is the closed unit ball in the

l \infty

norm and

I_{B^{\infty}}

is its indicator function. We used the same problem setting as in [21].

We set the initial parameter values

τ_{0} = 0.95 / β = 0.95 / ∥ A^{T} A ∥

and

σ_{0} = 1 / (τ ∥ D^{T} D ∥)

,

∥ D^{T} D ∥ \to 2 (n \to \infty)

. Figure 1 shows that a larger primal step size produces a better convergence rate for PPD3; thus, we chose

τ

close to its upper bound for the PPD3 method in this subsection. Since PPD3 has no restriction of

τ σ \leq 1 / ∥ D^{T} D ∥

, we fixed the maximal primal step size at

τ = 1 / β

and chose a different dual step size

σ

so as to find the optimal dual step size (Figure 2). We see that removing the restriction on the dual step size, indeed, provides more choice regarding the step size parameters and has the potential to speed up algorithmic convergence. We chose

σ = 1.1 / (τ ∥ D^{T} D ∥)

for PPD3 to solve Fused LASSO, and the rest of the parameters are presented in Table 4. All

τ

values take their upper bound as the same as the experiment in [21], drawing the conclusion that a larger

τ

leads to faster convergence in Fused LASSO.

θ

was 1.5 for improved parameter conditions in AFBA/PDDY, as in [21] (Figure 3, Figure 4 and Figure 5).

We compared PPD3 with other methods for Fused LASSO problems of different dimensions. The results are consistent with the dimension of the problem, showing that PPD3 converges faster among all methods. PPD3 and AFBA/PDDY have close performance and are significantly better than CV or PD3O.

5.2. Image Inpainting

The classical Rudin-Osher-Fatemi (ROF) model, which minimizes total variation (TV), serves as a foundational approach to image restoration. A generic model for image restoration can be derived from the classical ROF model (in the discrete setting) as

min_{x \in X} {∥ Φ x ∥}_{1} + \frac{λ}{2} {∥ x - f ∥}_{2}^{2},

(25)

where

Φ

is a linear transformation (e.g., wavelets and curvelets), and x represents the restored image. The first term enforces sparsity in the transformed domain, while the second term ensures the restored image is close to the observed image f, which is partially corrupted.

λ

controls the trade-off between sparsity and fidelity.

Image inpainting is the process of reconstructing missing or corrupted parts of an image. We consider a simple modification of (25). Let

D = {(i, j) ∣ 1 \leq i \leq M, 1 \leq j \leq N}

denote the set of indices of the image domain, and let

I \subset D

denote the set of indices of the inpainting domain. The inpainting model can then be defined as

min_{x \in X} {∥ Φ x ∥}_{1} + \frac{λ}{2} \sum_{i, j \in D ∖ I} {(x_{i, j} - f_{i, j})}^{2},

(26)

This formulation minimizes the squared error between the reconstructed and observed image in non-missing regions. A large

λ

focuses solely on inpainting, while a small

λ

denoises the image. Here, we used a more compact form to present (26):

\begin{matrix} min_{l \leq X \leq u} {μ ∥ \nabla x ∥}_{1} + \frac{1}{2} {∥ S x - f ∥}_{2}^{2}, \end{matrix}

(27)

where ∇ is the discrete gradient operator,

S \in ℜ^{N \times N}

is the mask operator (i.e., a diagonal matrix where zero entries denote missed information and identity entries indicate observed information), and f is the observed image. We let

f (x) = \frac{1}{2} {∥ S x - f ∥}^{2}

,

g (x) = I_{[l, u]} (x)

, and

h (\nabla x) = {μ ∥ \nabla x ∥}_{1}

. By rewriting (27), the primal-dual form

\begin{matrix} min_{x} max_{y} \frac{1}{2} {∥ S x - f ∥}_{2}^{2} + I_{[l, u]} (x) + 〈 \nabla x, y 〉 - μ I_{B \infty} (\frac{y}{μ}) \end{matrix}

(28)

was obtained, which fits the form (2). This formulation balances sparsity, fidelity, and structure for effective image reconstruction.

The tested images are

h o u s e . p n g (256 \times 256)

,

l e n a . p n g (512 \times 512)

, and

P e p p e r s . p n g (512 \times 512)

. The image of

h o u s e . p n g (256 \times 256)

is degraded by a character mask, where about 15% of the pixels are missing. The degraded

l e n a . p n g (512 \times 512)

retains one row of pixels in every eight rows, whereas the degraded Pepper.png (512 × 512) keeps one column of pixels in every eight columns, which means about 87% of the pixels are missing. For all images, a zero-mean Gaussian noise with a standard deviation of 0.02 was added. The original and degraded images are shown in Figure 6, Figure 7 and Figure 8.

Signal-to-Noise Ratio (SNR) is a measure used in science and engineering to quantify the level of a desired signal relative to the level of background noise. The formula for SNR in decibels (dB) is typically

SNR (dB) = 10 {log}_{10} (\frac{P_{signal}}{P_{noise}}),

where

P_{signal}

is the power of the signal, and

P_{noise}

is the power of the noise. In the realm of image processing, it measures the clarity of images, especially in applications like medical imaging or low-light photography. Higher SNR values indicate a clearer, stronger signal relative to noise.

We set the initial parameter values as

τ_{0} = 0.95 / β = 0.95

,

σ_{0} = 1 / (τ ∥ \nabla^{T} \nabla ∥)

, and

∥ \nabla^{T} \nabla ∥ \to 8 (n \to \infty)

. SNR was plotted against iteration number for the masked house image inpainting problems in Figure 9 and Figure 10. As shown below in Figure 9, the step size parameter is less sensitive in image inpainting than in Fused LASSO. Figure 10 indicates that a larger primal step size always leads to better performance; hence, we chose

τ = 0.95 / {∥ S ∥}^{2}, σ = 1.5 / τ {∥ \nabla ∥}^{2}

for PPD3, and we used the same settings listed in Table 5 for the rest of the parameters.

The recovered images are shown in Figure 11, Figure 12 and Figure 13, and the SNR was plotted against iteration number for all image inpainting in Figure 14.

The result in Figure 14 shows that in all three image inpainting missions, PPD3 is the best among the compared methods. The PPD3 and AFBA/PDDY methods are significantly better than the CV or PD3O methods.

5.3. Other Computational Advantages of PPD3

Beyond its rapid convergence in terms of iterations, the PPD3 method also demonstrates considerable computational efficiency in terms of time. By taking the image inpainting tasks of house.png (

256 \times 256

) and pepper.png (

512 \times 512

) as examples, the primary CPU time costs are detailed in Table 6, Table 7, Table 8 and Table 9. In particular, the parallel implementation of PPD3 achieves theoretical time savings of up to

46.6 %

and

47.8 %

, respectively, when communication overhead is ignored.

Furthermore, alternative methods typically require the computation of

∥ A^{T} A ∥

, an operation for which the time cost is non-trivial. Although the linear operators in both experiments happen to have theoretical values, this is not guaranteed in all situations. We report the computational times of various MATLAB commands for

∥ A^{T} A ∥

under scenarios where A is either dense or sparse. If

∥ A^{T} A ∥

is unknown, we observe that even the fastest estimation of

∥ A^{T} A ∥

accounts for approximately

10 %

of the total computation time. These results underscore that PPD3 is particularly advantageous as a solver, especially when the linear operator A is dense and of significant size. In practice, PPD3 can take

τ

to its upper bound and choose a relatively large

σ

or a random one to avoid the calculation of

∥ A^{T} A ∥

.

6. Discussion

This paper introduces the Parallel Primal-Dual (PPD3) algorithm, a novel framework for solving optimization problems involving the minimization of the sum of three convex functions, including a Lipschitz continuous term. By leveraging parallel updates of primal and dual variables and removing stringent parameter dependencies, PPD3 addresses the computational inefficiencies inherent in existing methods. The algorithm’s adaptability to various problem settings and its ability to perform without reliance on the spectral norm of the linear operator mark notable progressions in the field of convex optimization.

In our algorithm, parallelization has the potential to accelerate computation, particularly when running on parallel computing platforms. The constraint

τ σ \leq \frac{1}{∥ A^{T} A ∥}

on the parameters is removed, eliminating the need to compute the spectral norm of A, which is essential for most primal-dual methods. This also broadens the choice of parameters. Additionally, the step size is adjusted dynamically at each iteration rather than using a fixed global upper limit. As a result, our step size is typically larger, leading to faster convergence.

Through applications in image inpainting and Fused LASSO, PPD3 has demonstrated satisfactory performance in terms of computational efficiency and convergence speed. These results underscore its potential to handle complex, large-scale optimization challenges across a variety of domains.

For future work, we could explore extending the PPD3 framework to non-convex settings. Additionally, we may develop a stochastic version of PPD3. Furthermore, the parameters could be further investigated to determine whether they can be relaxed.

Author Contributions

Conceptualization, X.Z.; methodology, X.Z.; software, S.Z. and J.W.; validation, X.Z. and K.Z.; formal analysis, X.Z. and W.T.; writing—original draft preparation, X.Z. and W.T.; writing—review and editing, X.Z. and K.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data generated or analyzed during this study are available from the corresponding author.

Acknowledgments

The authors would like to thank Yuan Shen and Wenxing Zhang for their valuable advice, especially in the numerical experiment.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ROF	Rudin-Osher-Fatemi
CV	Condat and Vu
PDFP	Primal-Dual Fixed-Point
PD3O	Primal-Dual Three-Operator splitting
AFBA	Asymmetric Forward-Backward-Adjoint
PDDY	DavisYin’s 82 three-operator splitting
PDHG	Primal-dual hybrid gradient
CP	Chambolle and Pock
APPD	Adaptive parallel primal-dual
PPD3	Parallel Primal-Dual

References

Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320. [Google Scholar] [CrossRef]
Candes, E.J.; Plan, Y. Matrix Completion With Noise. Proc. IEEE 2010, 98, 925–936. [Google Scholar] [CrossRef]
Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
Tseng, P. A modified forward-backward splitting method for maximal monotone mappings. SIAM J. Control Optim. 2000, 38, 431–446. [Google Scholar] [CrossRef]
Passty, G.B. Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 1979, 72, 383–390. [Google Scholar] [CrossRef]
Lions, P.-L.; Mercier, B. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 1979, 16, 964–979. [Google Scholar] [CrossRef]
Zhu, M.; Chan, T.F. An Efficient Primal-Dual Hybrid Gradient Algorithm for Total Variation Image Restoration; CAM Report 08-34; UCLA: Los Angeles, CA, USA, 2008. [Google Scholar]
Chambolle, A.; Pock, T. A first-order primal-dual algorithm for convex problems with applications to imaging. Math. Imaging Vis. 2011, 40, 120–145. [Google Scholar] [CrossRef]
He, B.; Yuan, X. Convergence analysis of primal-dual algorithms for a saddle-point problem: From contraction perspective. SIAM J. Imaging Sci. 2012, 5, 119–149. [Google Scholar] [CrossRef]
Goldstein, T.; Li, M.; Yuan, X.; Esser, E.; Baraniuk, R. Adaptive primal-dual hybrid gradient methods for saddle-point problems. arXiv 2015, arXiv:1305.0546v2. [Google Scholar]
O’Connor, D.; Vandenberghe, L. Primal-dual decomposition by operator splitting and applications to image deblurring. SIAM J. Imaging Sci. 2014, 7, 1724–1754. [Google Scholar] [CrossRef]
Zhang, X. Adaptive parallel primal-dual method for saddle point problems. Numer. Math. Theory, Method Appl. 2018, 11, 187–210. [Google Scholar]
Condat, L. A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. J. Optim. Theory Appl. 2013, 158, 460–479. [Google Scholar] [CrossRef]
Vũ, B.C. A splitting algorithm for dual monotone inclusions involving cocoercive operators. Adv. Comput. Math. 2013, 38, 667–681. [Google Scholar] [CrossRef]
Latafat, P.; Patrinos, P. Asymmetric forward-backward-adjoint splitting for solving monotone inclusions involving three operators. arXiv 2016, arXiv:1602.08729. [Google Scholar] [CrossRef]
Chen, P.; Huang, J.; Zhang, X. A primal-dual fixed point algorithm for minimization of the sum of three convex separable functions. Fixed Point Theory Appl. 2016, 54, 1–18. [Google Scholar] [CrossRef]
Yan, M. A Primal-Dual Three-Operator Splitting Scheme. 2017. Available online: https://optimization-online.org/?p=14319 (accessed on 26 January 2025).
Davis, D.; Yin, W. Convergence rate analysis of several splitting schemes. arXiv 2014, arXiv:1406.4834. [Google Scholar]
Xu, C.Y.H.; Yang, J. A Modified Primal-Dual algorithm for Structured Convex Optimization with a Lipschitzian Term. J. Oper. Res. China 2024, 1–21. [Google Scholar]
He, B.; Yuan, X. Balanced augmented Lagrangian method for convex programming. arXiv 2021, arXiv:2108.08554. [Google Scholar]
Yan, M.; Li, Y. On the improved Conditions for Some Primal-Dual Algorithms. J. Sci. Comput. 2024, 99, 74. [Google Scholar] [CrossRef]
Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via admm. Found. Trends Machine Learn. 2010, 3, 1–122. [Google Scholar] [CrossRef]
He, B.; Yuan, X.; Zhang, J.J.Z. Comparison of two kinds of prediction-correction methods for monotone variational inequalities. Comput. Optim. Appl. 2004, 27, 247–267. [Google Scholar] [CrossRef]
Facchinei, F.; Pang, J.-S. Finite-Dimensional Variational Inequalities and Complementarity Problems; Springer Series in Operations Research; Springer: New York, NY, USA, 2003; Volume I. [Google Scholar]
He, B.; Yuan, X. On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method. SIAM J. Numer. Anal. 2012, 50, 700–709. [Google Scholar] [CrossRef]

Figure 1. Fused LASSO with various primal step sizes. (a) relative objective error vs. iteration number. (b) relative variable error vs. iteration number.

Figure 2. Fused LASSO with various dual step sizes. (a) relative objective error vs. iteration number. (b) relative variable error vs. iteration number.