Distributed Optimization Algorithm for Composite Optimization Problems with Non-Smooth Function

Shi, Yawei; Ran, Liang; Tang, Jialong; Wu, Xiangzhao

doi:10.3390/math10173135


Yawei Shi, Liang Ran *, Jialong Tang and Xiangzhao Wu

Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China

* Author to whom correspondence should be addressed.

Mathematics 2022, 10(17), 3135; https://doi.org/10.3390/math10173135

Submission received: 22 July 2022 / Revised: 22 August 2022 / Accepted: 24 August 2022 / Published: 1 September 2022

(This article belongs to the Topic Distributed Optimization for Control)


Abstract

This paper studies distributed optimization problems over a class of undirected networks. The local objective function of each agent consists of a smooth convex function and a non-smooth convex function, and the agents cooperate to minimize the sum of these composite objectives. For this kind of problem, this paper uses the operator splitting method and the proximal operator to handle the non-smooth term, and designs a distributed algorithm that allows uncoordinated step-sizes. In addition, by introducing a random-block coordinate mechanism, this paper develops an asynchronous version of the synchronous algorithm. Finally, the convergence of both algorithms is proven, and their effectiveness is verified through numerical simulations.

Keywords:

distributed optimization; non-smooth convex function; proximal operator; random-block coordinate

MSC:

68W15

1. Introduction

In this paper, we study a class of distributed multi-agent optimization problems over networks. Each agent i in the network holds the following private objective function:

F_{i}(\bar{x}) = f_{i}(\bar{x}) + g_{i}(\bar{x}),

(1)

where $\bar{x} \in \mathbb{R}^{n}$ is the decision variable, $f_{i}$ is a convex function with a Lipschitz-continuous gradient, and $g_{i}$ is a non-smooth convex function. Examples of $f_{i}$ include quadratic and logistic functions [1], and examples of $g_{i}$ include the elastic-net norm, the L1-norm, and indicator functions [2].
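As a concrete illustration of (1), the sketch below evaluates one agent's composite objective for a least-squares term $f_{i}(x) = \frac{1}{2}\|A_{i}x - b_{i}\|^{2}$ and an L1 term $g_{i}(x) = \mu\|x\|_{1}$. The data $A_{i}$, $b_{i}$ and the weight $\mu$ are hypothetical choices made only for this example; only $f_{i}$ is differentiated, while $g_{i}$ is merely evaluated.

```python
import numpy as np

# Illustrative local terms for one agent i (A, b, mu are hypothetical data):
#   f_i(x) = 0.5 * ||A x - b||^2   (smooth, convex, Lipschitz-continuous gradient)
#   g_i(x) = mu * ||x||_1          (convex, non-smooth where a coordinate is 0)


def f_i(x, A, b):
    r = A @ x - b
    return 0.5 * float(r @ r)


def grad_f_i(x, A, b):
    """Gradient of the smooth part: A^T (A x - b)."""
    return A.T @ (A @ x - b)


def g_i(x, mu):
    return mu * float(np.sum(np.abs(x)))


def F_i(x, A, b, mu):
    """Local composite objective F_i = f_i + g_i of (1)."""
    return f_i(x, A, b) + g_i(x, mu)


A = np.eye(2)
b = np.array([1.0, -2.0])
value = F_i(np.array([1.0, 1.0]), A, b, mu=0.5)   # f_i = 4.5, g_i = 1.0
```

Keeping $f_{i}$ and $g_{i}$ as separate functions mirrors how the algorithms below treat them: $f_{i}$ only through its gradient and $g_{i}$ only through its proximal operator.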

We consider the setting in which each agent may interact only with its neighbors and no central agent processes the data. The problem can then be written as

\min_{x_{1}, \dots, x_{m}} \sum_{i=1}^{m} F_{i}(x_{i}) \quad \text{s.t.} \quad x_{i} = x_{j}, \ (i, j) \in E,

(2)

where $x_{i} \in \mathbb{R}^{n}$ is the local estimate of $\bar{x}$ held by agent i, and $E$ is the set of edges in the network. This distributed computing architecture covers areas such as distributed information processing and decision making, networked multi-vehicle coordination, and distributed estimation. Typical applications include power systems control [3], model predictive control [4], statistical inference and learning [5], and distributed average consensus [6].

In recent years, most of the literature has focused on the case in which the objective function contains only a single smooth convex function. Meanwhile, many centralized algorithms with good performance, such as proximal gradient descent, the sub-gradient method, and Newton's method, have been extended to distributed form to solve these problems. The sub-gradient method is the most commonly used. In [7], Nedić and Ozdaglar apply it to distributed optimization over time-varying networks and propose the distributed sub-gradient method (DGD). Shi et al. [8] propose an exact first-order algorithm (EXTRA), which exploits the difference between consecutive DGD iterations, and prove its linear convergence. Ref. [9] then designs a distributed first-order algorithm by combining DGD with the gradient tracking method. To further accelerate convergence, distributed ADMM algorithms were subsequently proposed in [10,11,12,13]. However, these algorithms can only solve problems whose objective consists of a single function.

For the composite distributed optimization problem (2) with a non-smooth term, many results have emerged. The authors of [14] design a proximal gradient method that incorporates Nesterov's acceleration mechanism. However, each iteration consumes more computing resources because additional inner iterations are required. For undirected networks, Shi et al. [15] design the proximal gradient exact first-order algorithm (PG-EXTRA) for composite optimization problems, building on the classical first-order distributed algorithm EXTRA [8]. PG-EXTRA converges exactly to the optimal solution with a fixed step-size, unlike most algorithms, which must use diminishing step-sizes. The authors of [16] propose a communication-efficient random-walk method named Walkman based on a Markov chain. By analyzing the relationship between optimization accuracy and network connectivity, they obtain explicit expressions for the communication complexity and the communication efficiency of the system. Further, since in many practical scenarios agents transmit data over directed links, ref. [17] uses the push-sum mechanism to eliminate the information imbalance caused by directed networks and proposes the PG-ExtraPush algorithm on the basis of [8], which retains the same convergence property.

Operator splitting has recently become a mainstream approach to this kind of complex optimization problem. Since Combettes and Pesquet first applied operator splitting to composite optimization by designing a fully splitting algorithm, refs. [18,19,20,21] and others have proposed various algorithms for composite optimization. However, operator splitting has rarely been applied to distributed composite optimization. Motivated by this, this paper designs a distributed algorithm based on the operator splitting method and the theory of monotone operators.

Contributions: Compared with most existing distributed optimization algorithms, the main contributions of this paper are summarized as follows:

1. To solve problem (2), this paper develops a novel, fully distributed algorithm based on the operator splitting method, which offers greater flexibility and efficiency than its centralized counterparts [18,19,20,21].
2. Based on a class of randomized block-coordinate methods, an asynchronous version of the proposed algorithm is also derived, in which only a subset of independently activated agents participates in each update. Such an activation scheme is more flexible than single-coordinate activation [22].
3. Both proposed algorithms require only local information exchange among neighboring agents and allow uncoordinated step-sizes, rather than the coordinated or dynamic step-sizes required in [7,8,9,14,23]. In addition, the convergence of both algorithms is guaranteed under the same mild assumptions. In particular, using local Lipschitz constants avoids the conservative step-size selection that results from the global Lipschitz constant assumed in [8,14,15,17].

Organization: The remainder of the paper is organized as follows. Section 2 provides the notation, lemmas, definitions, and assumptions used in the paper. Section 3 presents the derivation of the algorithms. In Section 4, we give the convergence analysis of the proposed algorithms. Section 5 presents simulation experiments that verify the algorithms. Finally, Section 6 concludes the paper.

2. Preliminaries

In this section, we introduce the notation, definitions, and lemmas used in the paper, and then state two important assumptions.

First, we introduce some concepts from graph theory. Let $G = (V, E)$ represent an undirected network composed of m agents, where $V$ denotes the set of agents and $E$ denotes the set of edges. The neighborhood of the i-th agent is denoted by $N_{i} = \{j \mid (i, j) \in E\}$. An undirected network $G$ is connected if there is at least one path between any two agents.
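The neighborhood sets $N_{i}$ and the connectivity required later in Assumption 1 can be checked mechanically. The sketch below builds $N_{i}$ from an undirected edge list and tests connectivity with a breadth-first search. Agents are indexed 0, ..., m−1, a zero-based convention assumed for the code, whereas the paper counts from 1.

```python
from collections import deque


def neighborhoods(m, edges):
    """Return N_i = {j | (i, j) in E} for an undirected edge list over agents 0..m-1."""
    N = {i: set() for i in range(m)}
    for i, j in edges:
        N[i].add(j)  # undirected: (i, j) makes j a neighbor of i ...
        N[j].add(i)  # ... and i a neighbor of j
    return N


def is_connected(m, edges):
    """Breadth-first search from agent 0; G is connected iff every agent is reached."""
    N = neighborhoods(m, edges)
    seen, queue = {0}, deque([0])
    while queue:
        i = queue.popleft()
        for j in N[i]:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return len(seen) == m


path = [(0, 1), (1, 2), (2, 3)]        # path graph on four agents
print(neighborhoods(4, path))          # {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(is_connected(4, path))           # True
```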

Let $\mathbb{R}^{n}$ denote the n-dimensional Euclidean space and $\|\cdot\|$ the Euclidean norm of a vector $x \in \mathbb{R}^{n}$. The notation $\rho_{\max}(\cdot)$ denotes the spectral radius of a matrix, and $\mathbb{N}$ denotes the set of positive integers. Let $X_{0}(\mathbb{R}^{n})$ denote the collection of all proper lower semi-continuous convex functions from $\mathbb{R}^{n}$ to $(-\infty, +\infty]$. For positive definite matrices $W_{i}$, $\mathrm{blkdiag}\{W_{i}\}_{i \in V}$ denotes the positive definite block-diagonal matrix with the $W_{i}$ as its diagonal blocks. Let $\mathrm{ri}(\cdot)$ denote the relative interior of a convex set and $\mathrm{dom}\, f$ the effective domain of f. The subdifferential of a function f is

\partial f(x_{1}) = \{v \in \mathbb{R}^{n} \mid (x_{2} - x_{1})^{T} v \leq f(x_{2}) - f(x_{1}), \ \forall x_{2} \in \mathbb{R}^{n}\}.

The proximity operator of a function $f \in X_{0}(\mathbb{R}^{n})$ with respect to the norm $\|\cdot\|_{P}$ is defined by

\mathrm{prox}_{P^{-1} f}(x) = \arg\min_{y \in \mathbb{R}^{n}} \Big\{f(y) + \frac{1}{2}\|x - y\|_{P}^{2}\Big\}.

The convex conjugate of f is written as $f^{\otimes}$.
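For a concrete instance of the proximity operator defined above, take $f(y) = \mu\|y\|_{1}$ and a diagonal weight $P = \mathrm{diag}(p)$ with $p > 0$ (both are assumptions made for illustration). The minimization then separates across coordinates, and coordinate k is soft-thresholded at level $\mu/p_{k}$. The sketch below implements this closed form and compares it with a brute-force grid search of the defining objective.

```python
import numpy as np


def prox_weighted_l1(x, mu, p):
    """prox_{P^{-1} f}(x) for f(y) = mu*||y||_1 and diagonal P = diag(p), p > 0.

    Coordinate k minimizes mu*|y| + 0.5*p_k*(x_k - y)^2, whose solution is
    soft-thresholding of x_k at level mu / p_k.
    """
    x = np.asarray(x, dtype=float)
    t = mu / np.asarray(p, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def prox_by_grid(x, mu, p, grid):
    """Brute-force argmin of mu*|y| + 0.5*p_k*(x_k - y)^2 over a 1-D grid, per coordinate."""
    return np.array([grid[np.argmin(mu * np.abs(grid) + 0.5 * pk * (xk - grid) ** 2)]
                     for xk, pk in zip(x, p)])


x = np.array([1.3, -0.2, -2.0])
p = np.array([1.0, 2.0, 0.5])                        # thresholds mu/p = 0.5, 0.25, 1.0
closed = prox_weighted_l1(x, 0.5, p)                 # [0.8, 0.0, -1.0]
brute = prox_by_grid(x, 0.5, p, np.linspace(-3.0, 3.0, 60001))
```

A larger weight $p_{k}$ penalizes moving away from $x_{k}$ more strongly, which is why it shrinks the threshold.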

We also use the following lemmas and definitions.

Lemma 1

([24]). Let $f \in X_{0}(\mathbb{R}^{n})$. Then, for vectors $x_{1}, x_{2} \in \mathbb{R}^{n}$, the following relation holds:

x_{2} \in \partial f(x_{1}) \Leftrightarrow x_{1} = \mathrm{prox}_{f}(x_{1} + x_{2}) \Leftrightarrow x_{2} = (I - \mathrm{prox}_{f})(x_{1} + x_{2}).

(3)
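Lemma 1 can be checked by hand in one dimension. For $f(x) = |x|$ (our illustrative choice), $\mathrm{prox}_{f}$ is soft-thresholding at 1, and $\partial f(x_{1})$ is $\{\mathrm{sign}(x_{1})\}$ for $x_{1} \neq 0$ and $[-1, 1]$ at $x_{1} = 0$. The sketch below evaluates the three statements of (3) for a few pairs $(x_{1}, x_{2})$; they should always agree.

```python
import numpy as np


def prox_abs(v):
    """prox_f for f(x) = |x|: soft-thresholding at level 1."""
    return float(np.sign(v) * max(abs(v) - 1.0, 0.0))


def in_subdiff_abs(x1, x2, tol=1e-12):
    """True if x2 is a subgradient of |.| at x1."""
    if abs(x1) > tol:
        return abs(x2 - np.sign(x1)) <= tol
    return -1.0 - tol <= x2 <= 1.0 + tol


def lemma1_statements(x1, x2, tol=1e-12):
    """The three equivalent statements of (3), evaluated numerically."""
    p = prox_abs(x1 + x2)
    return (bool(in_subdiff_abs(x1, x2, tol)),
            abs(p - x1) <= tol,
            abs((x1 + x2 - p) - x2) <= tol)


for x1, x2 in [(0.3, 1.0), (0.0, 0.4), (-2.0, -1.0), (0.3, 0.5), (0.0, 1.5)]:
    print((x1, x2), lemma1_statements(x1, x2))
```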

Lemma 2

([25]). Let $f \in X_{0}(\mathbb{R}^{n})$. Then both $\mathrm{prox}_{f}$ and $I - \mathrm{prox}_{f}$ are firmly nonexpansive.

Lemma 3

([19]). Consider the fixed-point iteration $u_{k+1} = T(u_{k})$, and let $u^{*}$ denote a fixed point of T. The sequence $\{u_{k}\}$ converges to a fixed point of T if the following conditions hold:

1. T is continuous;
2. $\{\|u_{k} - u^{*}\|^{2}\}$ is non-increasing;
3. $\lim_{k \to \infty}\|u_{k+1} - u_{k}\|^{2} = 0$.

Definition 1.

For all $x_{1}, x_{2} \in \mathbb{R}^{n}$, if an operator T satisfies $\|Tx_{1} - Tx_{2}\| \leq \|x_{1} - x_{2}\|$, then T is a nonexpansive operator. Further, if T satisfies $\|Tx_{1} - Tx_{2}\|^{2} \leq (Tx_{1} - Tx_{2})^{T}(x_{1} - x_{2})$, then T is a firmly nonexpansive operator.
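Definition 1 and Lemma 2 can be probed numerically. The sketch below tests the firm-nonexpansiveness inequality on random pairs for soft-thresholding (the prox of $0.7\|\cdot\|_{1}$, an illustrative choice), for its complement $I - \mathrm{prox}$, and, as a contrast, for the map $x \mapsto -x$, which is nonexpansive but not firmly nonexpansive. A random test of this kind can only falsify the property, not prove it.

```python
import numpy as np


def soft(v, t):
    """Soft-thresholding, the prox of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def firmly_nonexpansive_on(T, x1, x2, tol=1e-12):
    """Check ||T x1 - T x2||^2 <= (T x1 - T x2)^T (x1 - x2) for one pair (Definition 1)."""
    d = T(x1) - T(x2)
    return float(d @ d) <= float(d @ (x1 - x2)) + tol


rng = np.random.default_rng(0)
pairs = [(rng.normal(size=5), rng.normal(size=5)) for _ in range(1000)]
prox = lambda v: soft(v, 0.7)        # prox_f with f = 0.7*||.||_1
resid = lambda v: v - soft(v, 0.7)   # I - prox_f (componentwise clipping to [-0.7, 0.7])
flip = lambda v: -v                  # nonexpansive, but not firmly nonexpansive

ok_prox = all(firmly_nonexpansive_on(prox, a, b) for a, b in pairs)
ok_resid = all(firmly_nonexpansive_on(resid, a, b) for a, b in pairs)
ok_flip = all(firmly_nonexpansive_on(flip, a, b) for a, b in pairs)
```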

Definition 2.

An operator T is $\sigma_{T}$-strongly monotone if there exists a constant $\sigma_{T} > 0$ such that $(Tx_{1} - Tx_{2})^{T}(x_{1} - x_{2}) \geq \sigma_{T}\|x_{1} - x_{2}\|^{2}$ for all $x_{1}, x_{2} \in \mathbb{R}^{n}$.

The following assumptions will also be used.

Assumption 1.

The graph $G$ is undirected and connected.

Assumption 2.

The following three conditions are satisfied:

1. $f_{i} : \mathbb{R}^{n} \to \mathbb{R}$ is a smooth convex function whose gradient is Lipschitz continuous with constant $1/\beta_{i}$, i.e.,
$\beta_{i}\|\nabla f_{i}(x_{1}) - \nabla f_{i}(x_{2})\| \leq \|x_{1} - x_{2}\|, \ \forall x_{1}, x_{2} \in \mathbb{R}^{n};$
2. $g_{i} : \mathbb{R}^{n} \to \mathbb{R}$ is a convex non-smooth function;
3. Problem (2) has at least one solution.
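For a least-squares term $f_{i}(x) = \frac{1}{2}\|A_{i}x - b_{i}\|^{2}$ (an illustrative assumption, not the paper's test problem), $\nabla f_{i}(x) = A_{i}^{T}(A_{i}x - b_{i})$ is Lipschitz with constant $\lambda_{\max}(A_{i}^{T}A_{i})$, so one may take $\beta_{i} = 1/\lambda_{\max}(A_{i}^{T}A_{i})$. The sketch computes this $\beta_{i}$ and checks the inequality of item 1; a larger value violates it along the top eigenvector.

```python
import numpy as np


def beta_least_squares(A):
    """beta_i = 1 / L_i with L_i = lambda_max(A^T A), the Lipschitz constant of grad f_i."""
    return 1.0 / np.linalg.eigvalsh(A.T @ A).max()


def lipschitz_inequality(A, b, x1, x2, beta, tol=1e-12):
    """Check beta * ||grad f(x1) - grad f(x2)|| <= ||x1 - x2|| for f = 0.5*||Ax - b||^2."""
    grad = lambda x: A.T @ (A @ x - b)
    return beta * np.linalg.norm(grad(x1) - grad(x2)) <= np.linalg.norm(x1 - x2) + tol


A = np.diag([1.0, 2.0])      # A^T A = diag(1, 4), so beta = 0.25
b = np.zeros(2)
beta = beta_least_squares(A)
e2 = np.array([0.0, 1.0])    # top eigenvector of A^T A: the inequality is tight here
```

Under Assumption 4, such a $\beta_{i}$ directly gives the admissible local range $0 < \gamma_{i} < 2\beta_{i}$ for agent i's step-size.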

3. Algorithm Development

In this section, we derive the synchronous and asynchronous algorithms.

We next transform problem (2) into an equivalent form that facilitates the subsequent algorithm design. The constraint $x_{i} = x_{j}$ in (2) can be written in the edge-based form

E_{ij}x_{i} + E_{ji}x_{j} = 0,

(4)

where $E_{ij} = I \in \mathbb{R}^{n \times n}$ if $i < j$, and $E_{ij} = -I \in \mathbb{R}^{n \times n}$ otherwise. Then, define the following linear operator:

M_{(i,j)} : x \mapsto (E_{ij}x_{i}, E_{ji}x_{j}), \quad M_{(i,j)} \in \mathbb{R}^{2n \times mn},

(5)

with the stacked variable $x = [x_{1}^{T}, \dots, x_{m}^{T}]^{T} \in \mathbb{R}^{mn}$. Stacking all $M_{(i,j)}$ gives the operator

M : x \mapsto (M_{(i,j)}x)_{(i,j) \in E},

(6)

of dimension $2n|E| \times mn$, where $|E|$ is the number of edges in the network. Consider the set

C_{(i,j)} = \{(e_{1}, e_{2}) \in \mathbb{R}^{n} \times \mathbb{R}^{n} \mid e_{1} + e_{2} = 0\}.

Then, constraint (4) can be further reformulated as

M_{(i,j)}x \in C_{(i,j)}.

(7)
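To make (4)–(7) concrete, the sketch below assembles M as a dense matrix from an edge list, with one $2n \times mn$ block per edge and the sign convention $E_{ij} = I$ for $i < j$. It then checks that $Mx$ lies in $C = \prod C_{(i,j)}$ exactly when the agents agree, and that $M^{T}M$ is block-diagonal with the agent degrees $|N_{i}|$ on the diagonal. Zero-based agent indices and a dense matrix are conveniences assumed for the sketch; a distributed implementation never forms M.

```python
import numpy as np


def build_M(m, n, edges):
    """Dense matrix of the stacked operator M in (6), one 2n x mn block M_(i,j) per edge (5).

    Edge (i, j) maps x to (E_ij x_i, E_ji x_j), with E_ij = I if i < j and -I otherwise.
    """
    I = np.eye(n)
    M = np.zeros((2 * n * len(edges), m * n))
    for e, (i, j) in enumerate(edges):
        E_ij = I if i < j else -I
        E_ji = I if j < i else -I
        r = 2 * n * e                                   # first row of this edge's block
        M[r:r + n, i * n:(i + 1) * n] = E_ij
        M[r + n:r + 2 * n, j * n:(j + 1) * n] = E_ji
    return M


def in_C(Mx, n, tol=1e-12):
    """True if every edge block (e1, e2) of Mx satisfies e1 + e2 = 0, i.e. Mx lies in C."""
    blocks = Mx.reshape(-1, 2, n)
    return bool(np.all(np.abs(blocks[:, 0, :] + blocks[:, 1, :]) <= tol))


edges = [(0, 1), (1, 2)]            # path graph on m = 3 agents
M = build_M(3, 2, edges)            # n = 2, so M has shape (2*n*|E|, m*n) = (8, 6)
x_consensus = np.tile([1.5, -0.5], 3)
x_disagree = np.arange(6.0)
```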

Based on the above analysis, problem (2) can be transformed into

\min_{x_{i} \in \mathbb{R}^{n}} \sum_{i=1}^{m}\big(f_{i}(x_{i}) + g_{i}(x_{i})\big) + \sum_{i=1}^{m}\sum_{(i,j) \in E}\delta_{C_{(i,j)}}(M_{(i,j)}x),

(8)

where $\delta_{C}$ denotes the indicator function, i.e.,

\delta_{C_{(i,j)}}(M_{(i,j)}x) = \begin{cases} 0, & M_{(i,j)}x \in C_{(i,j)}, \\ +\infty, & M_{(i,j)}x \notin C_{(i,j)}. \end{cases}

Then, let

f(x) = \sum_{i=1}^{m} f_{i}(x_{i}), \quad g(x) = \sum_{i=1}^{m} g_{i}(x_{i}), \quad \delta_{C}(Mx) = \sum_{i=1}^{m}\sum_{j \in N_{i}}\delta_{C_{(i,j)}}(M_{(i,j)}x),

and $C = \prod_{(i,j) \in E} C_{(i,j)}$, where $\prod$ denotes the Cartesian product. Hence, problem (8) can be expressed in the compact form

\min_{x \in \mathbb{R}^{mn}} f(x) + g(x) + \delta_{C}(Mx).

(9)

3.1. Synchronous Algorithm 1

Based on fixed-point theory, we design a distributed algorithm for problem (2) starting from (9). We define the step-size matrices $\Gamma = \mathrm{blkdiag}\{\gamma_{i}I_{n}\}_{i \in V}$, $\tilde{\Lambda} = \mathrm{blkdiag}\{\tilde{\lambda}_{(i,j)}\tilde{\gamma}_{(i,j)}^{-1}\}_{(i,j) \in E}$, and $\tilde{H} = \mathrm{blkdiag}\{\lambda_{(i,j)}I_{2n}\}_{(i,j) \in E}$, where $\tilde{\gamma}_{(i,j)} = \mathrm{blkdiag}\{\gamma_{i}I_{n}, \gamma_{j}I_{n}\}$, and introduce the following operators:

T_{0}(s^{*}, x^{*}) = \mathrm{prox}_{\Gamma g}\big(x^{*} - \Gamma\nabla f(x^{*}) - (\tilde{H}M)^{T}s^{*}\big),

(10)

\tilde{T}_{1}(s^{*}, x^{*}) = (I - \mathrm{prox}_{\tilde{\Lambda}^{-1}\delta_{C}})\big(MT_{0}(s^{*}, x^{*}) + s^{*}\big),

(11)

T_{2}(s^{*}, x^{*}) = \mathrm{prox}_{\Gamma g}\big(x^{*} - \Gamma\nabla f(x^{*}) - (\tilde{H}M)^{T}\tilde{T}_{1}(s^{*}, x^{*})\big),

(12)

T(s^{*}, x^{*}) = \big(\tilde{T}_{1}(s^{*}, x^{*}), T_{2}(s^{*}, x^{*})\big),

(13)

where $x^{*} = \mathrm{col}\{x_{i}^{*}\}_{i=1}^{m}$ and $s^{*} = \mathrm{col}\{s_{(i,j)}^{*}\}_{(i,j) \in E}$, with $s_{(i,j)}^{*} = \mathrm{col}\{s_{(i,j),i}^{*}, s_{(i,j),j}^{*}\}$, together form a fixed point of T. In particular, $s_{(i,j),i}^{*}$ and $s_{(i,j),j}^{*}$ are maintained by agents i and j, respectively. Consider the update variables $y_{k+1} = \mathrm{col}\{y_{i}^{k+1}\}_{i=1}^{m}$, $x_{k+1} = \mathrm{col}\{x_{i}^{k+1}\}_{i=1}^{m}$, and $s_{k+1} = \mathrm{col}\{s_{(i,j)}^{k+1}\}_{(i,j) \in E}$ with the edge-based variable $s_{(i,j)}^{k+1} = \mathrm{col}\{s_{(i,j),i}^{k+1}, s_{(i,j),j}^{k+1}\}$. Applying the Picard iteration of T yields the following update rules:

\begin{aligned} y_{k+1} &= \mathrm{prox}_{\Gamma g}\big(x_{k} - \Gamma\nabla f(x_{k}) - (\tilde{H}M)^{T}s_{k}\big), \\ s_{k+1} &= (I - \mathrm{prox}_{\tilde{\Lambda}^{-1}\delta_{C}})(My_{k+1} + s_{k}), \\ x_{k+1} &= \mathrm{prox}_{\Gamma g}\big(x_{k} - \Gamma\nabla f(x_{k}) - (\tilde{H}M)^{T}s_{k+1}\big). \end{aligned}

(14)

Let $\tilde{w}_{k+1} = \mathrm{col}\{\tilde{w}_{(i,j)}^{k+1}\}_{(i,j) \in E} = \mathrm{col}\{\tilde{\lambda}_{(i,j)}\tilde{\gamma}_{(i,j)}^{-1}s_{(i,j)}^{k+1}\}_{(i,j) \in E}$. Using Lemma 2, (14) can be rewritten as

y_{k+1} = \mathrm{prox}_{\Gamma g}\big(x_{k} - \Gamma\nabla f(x_{k}) - \Gamma M^{T}\tilde{w}_{k}\big),

(15a)

s_{k+1} = (I - \mathrm{prox}_{\tilde{\Lambda}^{-1}\delta_{C}})(My_{k+1} + s_{k}),

(15b)

x_{k+1} = \mathrm{prox}_{\Gamma g}\big(x_{k} - \Gamma\nabla f(x_{k}) - \Gamma M^{T}\tilde{w}_{k+1}\big).

(15c)

Next, we split (15a)–(15c) in a distributed manner. It follows from (5) and (6) that the i-th component of $M^{T}\tilde{w}_{k+1}$ is $\sum_{j \in N_{i}} E_{ij}\tilde{w}_{(i,j),i}^{k+1}$. Hence, (15a) can be decomposed into

\begin{pmatrix} y_{1}^{k+1} \\ \vdots \\ y_{m}^{k+1} \end{pmatrix} = \begin{pmatrix} \mathrm{prox}_{\gamma_{1}g_{1}}\big(x_{1}^{k} - \gamma_{1}\nabla f_{1}(x_{1}^{k}) - \gamma_{1}\sum_{j \in N_{1}}E_{1j}\tilde{w}_{(1,j),1}^{k}\big) \\ \vdots \\ \mathrm{prox}_{\gamma_{m}g_{m}}\big(x_{m}^{k} - \gamma_{m}\nabla f_{m}(x_{m}^{k}) - \gamma_{m}\sum_{j \in N_{m}}E_{mj}\tilde{w}_{(m,j),m}^{k}\big) \end{pmatrix}.

(16)

For (15b), multiply both sides of the equality by $\tilde{\Lambda}$. Splitting (15b) and (15c) as in (16) and using $\mathrm{prox}_{\delta_{C_{(i,j)}}} = \mathrm{proj}_{C_{(i,j)}}$, we obtain the semi-distributed form

y_{i}^{k+1} = \mathrm{prox}_{\gamma_{i}g_{i}}\big(x_{i}^{k} - \gamma_{i}\nabla f_{i}(x_{i}^{k}) - \gamma_{i}\sum_{j \in N_{i}}E_{ij}\tilde{w}_{(i,j),i}^{k}\big),

(17a)

\tilde{w}_{(i,j)}^{k+1} = \tilde{w}_{(i,j)}^{k} + \tilde{\lambda}_{(i,j)}\tilde{\gamma}_{(i,j)}^{-1}\Big(M_{(i,j)}y^{k+1} - \mathrm{proj}_{C_{(i,j)}}\Big(\frac{\tilde{\gamma}_{(i,j)}}{\tilde{\lambda}_{(i,j)}}\tilde{w}_{(i,j)}^{k} + M_{(i,j)}y^{k+1}\Big)\Big),

(17b)

x_{i}^{k+1} = \mathrm{prox}_{\gamma_{i}g_{i}}\big(x_{i}^{k} - \gamma_{i}\nabla f_{i}(x_{i}^{k}) - \gamma_{i}\sum_{j \in N_{i}}E_{ij}\tilde{w}_{(i,j),i}^{k+1}\big).

(17c)

Note that (17b) is not fully distributed because of the structure $\tilde{w}_{(i,j)}^{k+1} = \mathrm{col}\{\tilde{w}_{(i,j),i}^{k+1}, \tilde{w}_{(i,j),j}^{k+1}\}$. Using (4) and (5), the projection of vectors $e_{1}, e_{2} \in \mathbb{R}^{n}$ onto $C_{(i,j)}$ is given by

\mathrm{proj}_{C_{(i,j)}}(e_{1}, e_{2}) = \frac{1}{2}(e_{1} - e_{2}, e_{2} - e_{1}),

which yields the local form of (17b) for an edge with $i < j$ (so that $E_{ij} = I$ and $E_{ji} = -I$), i.e.,

\tilde{w}_{(i,j),i}^{k+1} = \tilde{w}_{(i,j),i}^{k} + \frac{\tilde{\lambda}_{(i,j)}}{\gamma_{i}}\Big(y_{i}^{k+1} - \frac{1}{2}\Big[\Big(\frac{\gamma_{i}}{\tilde{\lambda}_{(i,j)}}\tilde{w}_{(i,j),i}^{k} + y_{i}^{k+1}\Big) - \Big(\frac{\gamma_{i}}{\tilde{\lambda}_{(i,j)}}\tilde{w}_{(i,j),j}^{k} - y_{j}^{k+1}\Big)\Big]\Big),

\tilde{w}_{(i,j),j}^{k+1} = \tilde{w}_{(i,j),j}^{k} + \frac{\tilde{\lambda}_{(i,j)}}{\gamma_{j}}\Big(-y_{j}^{k+1} - \frac{1}{2}\Big[\Big(\frac{\gamma_{j}}{\tilde{\lambda}_{(i,j)}}\tilde{w}_{(i,j),j}^{k} - y_{j}^{k+1}\Big) - \Big(\frac{\gamma_{j}}{\tilde{\lambda}_{(i,j)}}\tilde{w}_{(i,j),i}^{k} + y_{i}^{k+1}\Big)\Big]\Big).
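The simplification from these component updates to step 2 of Algorithm 1 can be confirmed numerically. The sketch below implements the projection onto $C_{(i,j)}$, checks that its output lies in $C_{(i,j)}$ and that it is idempotent, and compares the first displayed update (agent i's component, $i < j$) with the closed form $\frac{1}{2}\frac{\tilde{\lambda}_{(i,j)}}{\gamma_{i}}(y_{i}^{k+1} - y_{j}^{k+1}) + \frac{1}{2}(\tilde{w}_{(i,j),i}^{k} + \tilde{w}_{(i,j),j}^{k})$ on random data. The function names and numerical values are hypothetical.

```python
import numpy as np


def proj_C(e1, e2):
    """Euclidean projection of (e1, e2) onto C_(i,j) = {(e1, e2) : e1 + e2 = 0}."""
    return 0.5 * (e1 - e2), 0.5 * (e2 - e1)


def w_i_update_raw(w_i, w_j, y_i, y_j, lam, gam_i):
    """Agent i's component of (17b) for an edge with i < j, as in the first display above."""
    first, _ = proj_C(gam_i / lam * w_i + y_i, gam_i / lam * w_j - y_j)
    return w_i + lam / gam_i * (y_i - first)


def w_i_update_closed(w_i, w_j, y_i, y_j, lam, gam_i):
    """Simplified form: averaged edge variables plus a scaled disagreement y_i - y_j."""
    return 0.5 * lam / gam_i * (y_i - y_j) + 0.5 * (w_i + w_j)


rng = np.random.default_rng(3)
w_i, w_j, y_i, y_j = (rng.normal(size=4) for _ in range(4))
raw = w_i_update_raw(w_i, w_j, y_i, y_j, lam=0.4, gam_i=0.7)
closed = w_i_update_closed(w_i, w_j, y_i, y_j, lam=0.4, gam_i=0.7)
p1, p2 = proj_C(w_i, w_j)
```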

Therefore, combining (17a), (17c), and the update of $\tilde{w}_{(i,j),i}^{k+1}$, we summarize the synchronous distributed algorithm in Algorithm 1.

Remark 1.

Notice that Algorithm 1 is completely distributed and does not involve any global parameters. Each agent i individually maintains its private primal variable $x_{i}^{k}$, auxiliary variable $y_{i}^{k}$, and edge-based variables $\tilde{w}_{(i,j),i}^{k}$. For each edge $(i,j) \in E$ in the network, the auxiliary profile $\tilde{w}_{(i,j)}^{k} = \mathrm{col}\{\tilde{w}_{(i,j),i}^{k}, \tilde{w}_{(i,j),j}^{k}\}$ contains two components, $\tilde{w}_{(i,j),i}^{k}$ and $\tilde{w}_{(i,j),j}^{k}$, which are kept by agents i and j, respectively. Meanwhile, information is exchanged only locally; that is, agent i shares its updated data $y_{i}^{k+1}$ and $\tilde{w}_{(i,j),i}^{k+1}$ with all its neighbors $j \in N_{i}$. In addition, the proposed algorithm uses uncoordinated constant positive step-sizes $\gamma_{i}$, which differ essentially from the global and dynamic step-sizes in [7,8,9,14,23]. It is also worth noting that the edge-based step-size $\tilde{\lambda}_{(i,j)}$, held by the agents i and j linked by the edge $(i,j) \in E$, can be seen as an inherent parameter of the communication network that reflects the quality of communication.

Algorithm 1 Distributed algorithm based on proximal operators

Input: For all agents $i \in V$, initialize $x_{i}^{0} \in \mathbb{R}^{n}$ and $\tilde{w}_{(i,j),i}^{0} \in \mathbb{R}^{n}$ for $j \in N_{i}$, and select proper positive step-sizes (parameters) $\gamma_{i}$ and $\tilde{\lambda}_{(i,j)}$.
For $k = 0, 1, \dots$ do:
1. $y_{i}^{k+1} = \mathrm{prox}_{\gamma_{i}g_{i}}\big(x_{i}^{k} - \gamma_{i}\nabla f_{i}(x_{i}^{k}) - \gamma_{i}\sum_{j \in N_{i}}E_{ij}\tilde{w}_{(i,j),i}^{k}\big)$;
2. $\tilde{w}_{(i,j),i}^{k+1} = \frac{1}{2}\frac{\tilde{\lambda}_{(i,j)}}{\gamma_{i}}(y_{i}^{k+1} - y_{j}^{k+1}) + \frac{1}{2}(\tilde{w}_{(i,j),i}^{k} + \tilde{w}_{(i,j),j}^{k}), \ \forall j \in N_{i}$;
3. $x_{i}^{k+1} = \mathrm{prox}_{\gamma_{i}g_{i}}\big(x_{i}^{k} - \gamma_{i}\nabla f_{i}(x_{i}^{k}) - \gamma_{i}\sum_{j \in N_{i}}E_{ij}\tilde{w}_{(i,j),i}^{k+1}\big)$;
4. Send $y_{i}^{k+1}$ and $\tilde{w}_{(i,j),i}^{k+1}$ to each $j \in N_{i}$;
5. Until $\|x_{i}^{k+1} - x_{i}^{k}\|$ approaches zero.
End
Output: The primal variable $x_{i}^{k+1}$ as the optimal solution $x_{i}^{*}$.
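The following single-process simulation of Algorithm 1 is a minimal sketch, not the authors' implementation. It assumes the hypothetical local functions $f_{i}(x) = \frac{1}{2}\|x - c_{i}\|^{2}$ (so $\beta_{i} = 1$) and $g_{i}(x) = \mu\|x\|_{1}$, whose prox is soft-thresholding. For each edge $(i, j)$ with $i < j$, both edge components are updated with the disagreement $y_{i}^{k+1} - y_{j}^{k+1}$, following the two component updates derived before Algorithm 1, and they enter the primal steps with the signs $E_{ij} = I$ and $E_{ji} = -I$. Messages are replaced by direct array access, and a fixed iteration count replaces the stopping test of step 5.

```python
import numpy as np


def soft(v, t):
    """prox of t*||.||_1 (soft-thresholding); t may broadcast per agent."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def algorithm1(c, mu, edges, gamma, lam, iters=3000):
    """Simulate Algorithm 1 for f_i(x) = 0.5*||x - c_i||^2 and g_i(x) = mu*||x||_1.

    c: (m, n) local data; edges: list of (i, j) with i < j; gamma: (m,) step-sizes;
    lam: dict mapping each edge to its step-size lambda_(i,j).
    w[(i, j)] = [w_(i,j),i, w_(i,j),j], the two halves kept by agents i and j.
    """
    m, n = c.shape
    g = gamma[:, None]                  # gamma_i broadcast over the n coordinates
    x = np.zeros((m, n))
    w = {e: [np.zeros(n), np.zeros(n)] for e in edges}

    def coupling(w):
        # Row i holds sum_{j in N_i} E_ij w_(i,j),i, with E_ij = +I if i < j, -I otherwise.
        d = np.zeros((m, n))
        for (i, j), (wi, wj) in w.items():
            d[i] += wi
            d[j] -= wj
        return d

    for _ in range(iters):
        grad = x - c                    # row i: grad f_i(x_i^k)
        # Step 1: y_i^{k+1} = prox_{gamma_i g_i}(x_i - gamma_i grad - gamma_i sum E_ij w_i)
        y = soft(x - g * (grad + coupling(w)), g * mu)
        # Step 2: edge variables, averaged halves plus scaled disagreement y_i - y_j
        new_w = {}
        for (i, j), (wi, wj) in w.items():
            avg, diff = 0.5 * (wi + wj), y[i] - y[j]
            new_w[(i, j)] = [avg + 0.5 * lam[(i, j)] / gamma[i] * diff,
                             avg + 0.5 * lam[(i, j)] / gamma[j] * diff]
        w = new_w
        # Step 3: x_i^{k+1}, reusing grad f_i(x_i^k) as in the algorithm
        x = soft(x - g * (grad + coupling(w)), g * mu)
    return x


c = np.array([[3.0, -1.0], [1.0, 0.5], [-1.0, -2.0], [5.0, -1.5]])   # hypothetical data
edges = [(0, 1), (1, 2), (2, 3)]                                     # path graph, m = 4
x = algorithm1(c, mu=0.5, edges=edges, gamma=np.full(4, 0.5),
               lam={e: 0.3 for e in edges})
```

With this data, the minimizer of $\sum_{i}F_{i}$ is the soft-thresholded mean of the $c_{i}$, since $\sum_{i}\frac{1}{2}\|x - c_{i}\|^{2} = \frac{m}{2}\|x - \bar{c}\|^{2} + \text{const}$; here $\bar{c} = (2, -1)$ and $\mu = 0.5$ give $x^{*} = (1.5, -0.5)$, which every row of the returned array should approach. The values $\gamma_{i} = 0.5 < 2\beta_{i}$ and $\tilde{\lambda}_{(i,j)} = 0.3 < 1$ respect Assumption 4; `gamma` is a per-agent array, so uncoordinated values can also be passed.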

3.2. Asynchronous Algorithm 2

Here, we extend the synchronous Algorithm 1 to an asynchronous version based on the random-block coordinate mechanism in [2]. Following this mechanism, we define the diagonal coordinate matrices $P_{i} \in \mathbb{R}^{(2|E| + m)n \times (2|E| + m)n}$, whose diagonal elements are 0 or 1 (here $|E|$ denotes the number of edges of the graph), and divide the vector $(s, x)$ into m blocks. We also define the activation vector $\xi^{k} \in \mathbb{R}^{m}$, which is $\phi$-valued, where $\phi = \{0, 1\}^{m}$ is the set of binary strings of length m. When $\xi_{i}^{k} = 1$, agent i is activated at the k-th iteration; otherwise, it is not activated.

In order to describe the activation state of different coordinate blocks and ensure random activation, we give the following assumption.

Assumption 3.

The following two conditions are satisfied:

1. The matrices $P_{i}$ satisfy $\sum_{i=1}^{m} P_{i} = I$;
2. $(\xi^{k})_{k \geq 0}$ is a sequence of independent and identically distributed $\phi$-valued random vectors with $p_{i} = P(\xi_{i}^{k} = 1) > 0$ for all $k \geq 0$.

Based on this assumption, we develop the asynchronous algorithm summarized in Algorithm 2.

Algorithm 2 allows each agent to wake up with an independent probability, which means that a randomly activated subset of agents participates in each update while inactive agents keep their previous states. Such a scheme is more flexible than the single waking-up scheme [22] or schemes in which the activated block coordinates are selected uniformly [26]. In addition, each activation probability is independent of the others, so restrictive conditions such as $\sum_{i=1}^{m} p_{i} = 1$ are not required.
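Assumption 3 only asks for independent, identically distributed activations with $p_{i} > 0$. The sketch below draws activation vectors $\xi^{k} \in \{0, 1\}^{m}$ with independent Bernoulli($p_{i}$) entries and checks the empirical activation frequencies; the probabilities are hypothetical and deliberately do not sum to 1.

```python
import numpy as np


def sample_activation(p, rng):
    """Draw xi^k in {0,1}^m with independent entries and P(xi_i^k = 1) = p_i."""
    p = np.asarray(p, dtype=float)
    return (rng.random(p.shape) < p).astype(int)


rng = np.random.default_rng(0)
p = np.array([0.9, 0.5, 0.3, 0.7])   # hypothetical probabilities; they sum to 2.4, not 1
xis = np.array([sample_activation(p, rng) for _ in range(20000)])
freq = xis.mean(axis=0)              # empirical activation frequency of each agent
```

In an asynchronous run, agents with $\xi_{i}^{k} = 0$ simply keep their previous values at iteration k.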

Algorithm 2 Asynchronous distributed version

Input: For all agents $i \in V$, initialize $x_{i}^{0} \in \mathbb{R}^{n}$ and $\tilde{w}_{(i,j),i}^{0} \in \mathbb{R}^{n}$ for $j \in N_{i}$, and select proper positive step-sizes (parameters) $\gamma_{i}$ and $\tilde{\lambda}_{(i,j)}$.
For $k = 0, 1, \dots$ do:
Each agent i is activated independently with probability $p_{i}$ and, if activated, performs steps 1–5 of Algorithm 1 for all $j \in N_{i}$. Agents that are not activated keep their previous values unchanged.
End
Output: The primal variable $x_{i}^{k+1}$ as the optimal solution $x_{i}^{*}$.

To facilitate the subsequent convergence analysis, we give a compact form of Algorithm 2. Letting $u = (s, x)$, we get

u_{k+1} = u_{k} + E_{k+1}(Tu_{k} - u_{k}),

(18)

where $E_{k+1} = \sum_{i=1}^{m}\xi_{i}^{k+1}P_{i}$ and the operator T is defined in (13).
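Update (18) can be read as: compute $Tu_{k}$, then overwrite only the coordinate blocks of activated agents. The sketch below rests on two assumptions: the matrices $P_{i}$ are represented by index arrays that partition the coordinates of u (which is what $\sum_{i}P_{i} = I$ in Assumption 3 requires of 0/1 diagonal matrices), and T is any callable; the toy affine T is purely illustrative. For clarity the full $Tu_{k}$ is computed, whereas in a distributed run only activated agents would evaluate their own blocks.

```python
import numpy as np


def block_coordinate_step(u, T, blocks, xi):
    """One step of (18): u_{k+1} = u_k + E_{k+1}(T u_k - u_k), E_{k+1} = sum_i xi_i P_i.

    blocks[i] is an index array for the coordinates selected by P_i; only blocks of
    agents with xi_i = 1 are replaced by the corresponding entries of T u_k.
    """
    Tu = T(u)
    u_next = u.copy()
    for i, idx in enumerate(blocks):
        if xi[i]:
            u_next[idx] = Tu[idx]
    return u_next


T = lambda u: 0.5 * u + 1.0          # toy operator with fixed point u* = 2
blocks = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
u0 = np.arange(6.0)
u1 = block_coordinate_step(u0, T, blocks, [1, 0, 1])   # agent 1 stays inactive
```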

4. Convergence Analysis

This section provides the convergence proofs of the algorithms. The following assumption states the conditions required for their convergence.

Assumption 4.

Recall the constant $\beta_{i}$ in Assumption 2, i.e., the reciprocal of the local Lipschitz constant of $\nabla f_{i}$. The step-sizes are assumed to satisfy

0 < \gamma_{i} < 2\beta_{i}, \quad 0 < \tilde{\lambda}_{(i,j)} < 1.
Lemma 4.

Let $x^{*}$ be a solution to (9). Then there exists $s^{*}$ such that

s^{*} = \tilde{T}_{1}(s^{*}, x^{*}), \quad x^{*} = T_{2}(s^{*}, x^{*}),

which means that $u^{*} = (s^{*}, x^{*})$ is a fixed point of T. Conversely, $x^{*}$ is a solution to (9) when $u^{*}$ is a fixed point of T.

Proof.

Using the first-order optimality condition of (9), we obtain $0 \in \Gamma\nabla f(x^{*}) + \Gamma\partial g(x^{*}) + \Gamma M^{T}\partial\delta_{C}(Mx^{*})$, where $x^{*}$ is the optimal solution. According to the definition of the step-size matrices, we further obtain

0 \in \Gamma\nabla f(x^{*}) + \Gamma\partial g(x^{*}) + (\tilde{H}M)^{T}\tilde{\Lambda}^{-1}\partial\delta_{C}(Mx^{*}).

Using Lemma 1 and letting $s^{*} \in \tilde{\Lambda}^{-1}\partial\delta_{C}(Mx^{*})$, we get

s^{*} = (I - \mathrm{prox}_{\tilde{\Lambda}^{-1}\delta_{C}})(Mx^{*} + s^{*}),

(19)

x^{*} = \mathrm{prox}_{\Gamma g}\big(x^{*} - \Gamma\nabla f(x^{*}) - (\tilde{H}M)^{T}s^{*}\big).

(20)

Then, according to (19) and (20), we get

s^{*} = (I - \mathrm{prox}_{\tilde{\Lambda}^{-1}\delta_{C}})\Big(M\,\mathrm{prox}_{\Gamma g}\big(x^{*} - \Gamma\nabla f(x^{*}) - (\tilde{H}M)^{T}s^{*}\big) + s^{*}\Big).

Therefore, we have $x^{*} = T_{2}(s^{*}, x^{*})$ and $s^{*} = \tilde{T}_{1}(s^{*}, x^{*})$, that is, $u^{*} = Tu^{*}$ with $u^{*} = (s^{*}, x^{*})$. Conversely, if $u^{*} = Tu^{*}$, it can be deduced that $x^{*}$ satisfies the first-order optimality condition of problem (9). Thus, $x^{*}$ is an optimal solution of problem (9). □

Lemma 5.

Let Assumptions 1 and 2 hold. Then

\|s_{k+1} - s^{*}\|_{\tilde{\Lambda}}^{2} \leq \|s_{k} - s^{*}\|_{\tilde{\Lambda}}^{2} - \|s_{k+1} - s_{k}\|_{\tilde{\Lambda}}^{2} + 2(s_{k+1} - s^{*})^{T}\tilde{\Lambda}M(y_{k+1} - x^{*}),

(21)

\begin{aligned} \|x_{k+1} - x^{*}\|_{\Gamma^{-1}}^{2} \leq {} & \|x_{k} - x^{*}\|_{\Gamma^{-1}}^{2} - \|x_{k+1} - y_{k+1}\|_{\Gamma^{-1}}^{2} - \|x_{k} - y_{k+1}\|_{\Gamma^{-1}}^{2} \\ & + 2(\Gamma^{-1}x_{k+1} - \Gamma^{-1}y_{k+1})^{T}\big(\Gamma\nabla f(x_{k}) + (\tilde{H}M)^{T}s_{k}\big) \\ & + 2(\Gamma^{-1}x^{*} - \Gamma^{-1}x_{k+1})^{T}\big(\Gamma\nabla f(x_{k}) + (\tilde{H}M)^{T}s_{k+1}\big) \\ & + 2\big((g \circ \Gamma)(\Gamma^{-1}x^{*}) - (g \circ \Gamma)(\Gamma^{-1}y_{k+1})\big). \end{aligned}

(22)

Proof.

Combining (14), (19), and Lemma 2, we get

\begin{aligned} \|s_{k+1} - s^{*}\|_{\tilde{\Lambda}}^{2} &= \big\|(I - \mathrm{prox}_{\tilde{\Lambda}^{-1}\delta_{C}})(My_{k+1} + s_{k}) - (I - \mathrm{prox}_{\tilde{\Lambda}^{-1}\delta_{C}})(Mx^{*} + s^{*})\big\|_{\tilde{\Lambda}}^{2} \\ &\leq (s_{k+1} - s^{*})^{T}\tilde{\Lambda}\big((My_{k+1} + s_{k}) - (Mx^{*} + s^{*})\big). \end{aligned}

It is further concluded that

(s_{k+1} - s^{*})^{T}\tilde{\Lambda}(s_{k+1} - s_{k} + s_{k} - s^{*}) \leq (s_{k+1} - s^{*})^{T}\tilde{\Lambda}M(y_{k+1} - x^{*}) + (s_{k+1} - s^{*})^{T}\tilde{\Lambda}(s_{k} - s^{*}).

Here we introduce an equality. For a positive definite matrix $K$ and $x_{1}, x_{2}, x_{3} \in \mathbb{R}^{n}$, we have

2(x_{1} - x_{2})^{T}K(x_{3} - x_{2}) = \|x_{3} - x_{2}\|_{K}^{2} + \|x_{1} - x_{2}\|_{K}^{2} - \|x_{1} - x_{3}\|_{K}^{2}.

(23)
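Identity (23) holds for any symmetric positive definite K; symmetry is what lets the two cross terms combine into $2(x_{1} - x_{2})^{T}K(x_{3} - x_{2})$. A quick numerical check with a randomly generated K (an illustrative choice) is sketched below.

```python
import numpy as np


def identity_23_gap(x1, x2, x3, K):
    """LHS minus RHS of (23); zero up to rounding when K is symmetric."""
    sq = lambda v: float(v @ K @ v)          # ||v||_K^2
    lhs = 2.0 * float((x1 - x2) @ K @ (x3 - x2))
    rhs = sq(x3 - x2) + sq(x1 - x2) - sq(x1 - x3)
    return lhs - rhs


rng = np.random.default_rng(4)
B = rng.normal(size=(5, 5))
K = B @ B.T + 5.0 * np.eye(5)                # symmetric positive definite
gaps = [identity_23_gap(*rng.normal(size=(3, 5)), K) for _ in range(100)]
```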

Combining the above two results, we derive

\begin{aligned} \|s_{k+1} - s^{*}\|_{\tilde{\Lambda}}^{2} &= \|s_{k} - s^{*}\|_{\tilde{\Lambda}}^{2} - \|s_{k+1} - s_{k}\|_{\tilde{\Lambda}}^{2} + 2(s_{k+1} - s^{*})^{T}\tilde{\Lambda}(s_{k+1} - s_{k}) \\ &\leq \|s_{k} - s^{*}\|_{\tilde{\Lambda}}^{2} - \|s_{k+1} - s_{k}\|_{\tilde{\Lambda}}^{2} + 2(s_{k+1} - s^{*})^{T}\tilde{\Lambda}M(y_{k+1} - x^{*}). \end{aligned}

(24)

To prove (22), we apply (3) to the x-update in (14) and obtain

\Gamma^{-1}\big(x_{k} - \Gamma\nabla f(x_{k}) - (\tilde{H}M)^{T}s_{k+1} - x_{k+1}\big) \in \partial g(x_{k+1}).

Using the subdifferential inequality, we obtain

(x^{*} - x_{k+1})^{T}\Gamma^{-1}\big(x_{k} - \Gamma\nabla f(x_{k}) - (\tilde{H}M)^{T}s_{k+1} - x_{k+1}\big) \leq g(x^{*}) - g(x_{k+1}),

or equivalently,

\begin{aligned} (\Gamma^{-1}x_{k+1} - \Gamma^{-1}x^{*})^{T}(x_{k+1} - x_{k}) \leq {} & -(\Gamma^{-1}x_{k+1} - \Gamma^{-1}x^{*})^{T}\Gamma^{-1}\big(\Gamma\nabla f(x_{k}) + (\tilde{H}M)^{T}s_{k+1}\big) \\ & + (g \circ \Gamma)(\Gamma^{-1}x^{*}) - (g \circ \Gamma)(\Gamma^{-1}x_{k+1}). \end{aligned}

(25)

Moreover, we have

\|x_{k+1} - x^{*}\|_{\Gamma^{-1}}^{2} = \|x_{k} - x^{*}\|_{\Gamma^{-1}}^{2} - \|x_{k+1} - x_{k}\|_{\Gamma^{-1}}^{2} + 2(x_{k+1} - x^{*})^{T}\Gamma^{-1}(x_{k+1} - x_{k}).

(26)

A derivation similar to (25), applied to the y-update in (14), gives

\begin{aligned} & (\Gamma^{-1}x_{k+1} - \Gamma^{-1}y_{k+1})^{T}\big(x_{k} - \Gamma\nabla f(x_{k}) - (\tilde{H}M)^{T}s_{k} - y_{k+1}\big) \leq (g \circ \Gamma)(\Gamma^{-1}x_{k+1}) - (g \circ \Gamma)(\Gamma^{-1}y_{k+1}) \\ \Leftrightarrow {} & (\Gamma^{-1}x_{k+1} - \Gamma^{-1}y_{k+1})^{T}(x_{k} - y_{k+1}) \leq (\Gamma^{-1}x_{k+1} - \Gamma^{-1}y_{k+1})^{T}\big(\Gamma\nabla f(x_{k}) + (\tilde{H}M)^{T}s_{k}\big) \\ & \qquad + (g \circ \Gamma)(\Gamma^{-1}x_{k+1}) - (g \circ \Gamma)(\Gamma^{-1}y_{k+1}). \end{aligned}

Therefore, we deduce

\begin{aligned} -\|x_{k+1} - x_{k}\|_{\Gamma^{-1}}^{2} &= -\|x_{k} - y_{k+1}\|_{\Gamma^{-1}}^{2} - \|x_{k+1} - y_{k+1}\|_{\Gamma^{-1}}^{2} + 2(x_{k} - y_{k+1})^{T}\Gamma^{-1}(x_{k+1} - y_{k+1}) \\ &\leq -\|x_{k} - y_{k+1}\|_{\Gamma^{-1}}^{2} - \|x_{k+1} - y_{k+1}\|_{\Gamma^{-1}}^{2} + 2(\Gamma^{-1}x_{k+1} - \Gamma^{-1}y_{k+1})^{T}\big(\Gamma\nabla f(x_{k}) + (\tilde{H}M)^{T}s_{k}\big) \\ &\quad + 2\big((g \circ \Gamma)(\Gamma^{-1}x_{k+1}) - (g \circ \Gamma)(\Gamma^{-1}y_{k+1})\big). \end{aligned}

Combining the above two results and (26), we obtain (22). □

Lemma 6.

Let Assumptions 1 and 2 hold, and set $\beta = \mathrm{blkdiag}\{\beta_{i}I_{n}\}_{i \in V}$. For the matrix $P = \mathrm{blkdiag}\{\tilde{\Lambda}, \Gamma^{-1}\}$ and $u = (s, x)$, we have

\begin{aligned} \|u_{k+1} - u^{*}\|_{P}^{2} \leq {} & \|u_{k} - u^{*}\|_{P}^{2} - \|s_{k+1} - s_{k}\|_{\tilde{\Lambda}(I - M\Gamma M^{T}\tilde{\Lambda})}^{2} - \big\|y_{k+1} - x_{k+1} + (\tilde{H}M)^{T}(s_{k+1} - s_{k})\big\|_{\Gamma^{-1}}^{2} \\ & - \big\|x_{k} - y_{k+1} - (\Gamma\nabla f(x_{k}) - \Gamma\nabla f(x^{*}))\big\|_{\Gamma^{-1}}^{2} - \|\nabla f(x_{k}) - \nabla f(x^{*})\|_{2\beta - \Gamma}^{2}. \end{aligned}

(27)

Proof.

Adding (21) and (22), then rearranging to get

\begin{matrix} {∥x_{k + 1} - x^{*}∥}_{Γ^{- 1}}^{2} + {∥s_{k + 1} - s^{*}∥}_{\tilde{Λ}}^{2} \\ \leq {∥x_{k} - x^{*}∥}_{Γ^{- 1}}^{2} + {∥s_{k} - s^{*}∥}_{\tilde{Λ}}^{2} - {∥x_{k} - y_{k + 1}∥}_{Γ^{- 1}}^{2} \\ - {∥s_{k + 1} - s_{k}∥}_{\tilde{Λ}}^{2} - {∥x_{k + 1} - y_{k + 1}∥}_{Γ^{- 1}}^{2} \\ + 2 {(\begin{matrix} {(\tilde{H} M)}^{T} (s_{k + 1} - s_{k}) \end{matrix})}^{T} Γ^{- 1} (y_{k + 1} - x_{k + 1}) \\ + 2 {(Γ \nabla f (x_{k}) - Γ \nabla f (x^{*}))}^{T} Γ^{- 1} (x_{k} - y_{k + 1}) \\ - 2 {(Γ \nabla f (x_{k}) - Γ \nabla f (x^{*}))}^{T} Γ^{- 1} (x_{k} - x^{*}) \\ + 2 (\begin{matrix} {(\begin{matrix} - Γ \nabla f (x^{*}) - {(\tilde{H} M)}^{T} s^{*} \end{matrix})}^{T} Γ^{- 1} (y_{k + 1} - x^{*}) \\ + (g \circ Γ) (Γ^{- 1} x^{*}) - (g \circ Γ) (Γ^{- 1} y_{k + 1}) \end{matrix}) . \end{matrix}

Further, we have

\begin{matrix} {∥x_{k + 1} - x^{*}∥}_{Γ^{- 1}}^{2} + {∥s_{k + 1} - s^{*}∥}_{\tilde{Λ}}^{2} \\ \leq {∥x_{k} - x^{*}∥}_{Γ^{- 1}}^{2} + {∥s_{k} - s^{*}∥}_{\tilde{Λ}}^{2} - {∥x_{k} - y_{k + 1}∥}_{Γ^{- 1}}^{2} \\ - {∥s_{k + 1} - s_{k}∥}_{\tilde{Λ} (I - M Γ M^{T} \tilde{Λ})}^{2} - {∥x_{k + 1} - y_{k + 1}∥}_{Γ^{- 1}}^{2} \\ + {∥\begin{matrix} {(\tilde{H} M)}^{T} (s_{k + 1} - s_{k}) \end{matrix}∥}_{Γ^{- 1}}^{2} + {∥y_{k + 1} - x_{k + 1}∥}_{Γ^{- 1}}^{2} \\ - {∥y_{k + 1} - x_{k + 1} + {(\tilde{H} M)}^{T} (s_{k + 1} - s_{k})∥}_{Γ^{- 1}}^{2} \\ + {∥Γ \nabla f (x_{k}) - Γ \nabla f (x^{*})∥}_{Γ^{- 1}}^{2} + {∥x_{k} - y_{k + 1}∥}_{Γ^{- 1}}^{2} \\ - {∥x_{k} - y_{k + 1} - (Γ \nabla f (x_{k}) - Γ \nabla f (x^{*}))∥}_{Γ^{- 1}}^{2} \\ + 2 (\begin{matrix} {(\begin{matrix} - Γ \nabla f (x^{*}) - {(\tilde{H} M)}^{T} s^{*} \end{matrix})}^{T} (y_{k + 1} - x^{*}) \\ + (g \circ Γ) (Γ^{- 1} x^{*}) - (g \circ Γ) (Γ^{- 1} y_{k + 1}) \end{matrix}) . \end{matrix}

(28)
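The step from the first inequality to (28) expands the cross terms with the identity 2 a^{T} Γ^{- 1} b = {∥a∥}_{Γ^{- 1}}^{2} + {∥b∥}_{Γ^{- 1}}^{2} - {∥a - b∥}_{Γ^{- 1}}^{2}; for instance, a = Γ \nabla f (x_{k}) - Γ \nabla f (x^{*}) and b = x_{k} - y_{k + 1} give the three gradient-related terms in (28). The Python sketch below checks the identity numerically for a diagonal Γ. The step-sizes and vectors are hypothetical random values, not data from the paper.

```python
import random

def wnorm_sq(v, w):
    """||v||_W^2 = sum_j w_j * v_j^2 for a diagonal weight W = diag(w)."""
    return sum(wj * vj * vj for wj, vj in zip(w, v))

def winner(a, b, w):
    """a^T W b for a diagonal weight W = diag(w)."""
    return sum(wj * aj * bj for wj, aj, bj in zip(w, a, b))

def polarization_gap(a, b, w):
    """2 a^T W b - (||a||_W^2 + ||b||_W^2 - ||a - b||_W^2); zero up to rounding."""
    diff = [aj - bj for aj, bj in zip(a, b)]
    return 2 * winner(a, b, w) - (wnorm_sq(a, w) + wnorm_sq(b, w) - wnorm_sq(diff, w))

rng = random.Random(0)
n = 6
gamma = [rng.uniform(0.1, 1.0) for _ in range(n)]  # hypothetical diagonal step-sizes Γ
w = [1.0 / g for g in gamma]                        # weight Γ^{-1}
a = [rng.gauss(0.0, 1.0) for _ in range(n)]         # stands in for Γ(∇f(x_k) - ∇f(x*))
b = [rng.gauss(0.0, 1.0) for _ in range(n)]         # stands in for x_k - y_{k+1}
print(abs(polarization_gap(a, b, w)) < 1e-9)
```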

Next, we bound the remaining terms in (28). Combining (20) with Lemma 1, we deduce

\begin{matrix} - Γ \nabla f (x^{*}) - {(\tilde{H} M)}^{T} s^{*} \in Γ \partial g (x^{*}) = \partial (g \circ Γ) (Γ^{- 1} x^{*}) . \end{matrix}

By the definition of the subdifferential, we then have

\begin{matrix} {(Γ^{- 1} y_{k + 1} - Γ^{- 1} x^{*})}^{T} (\begin{matrix} - Γ \nabla f (x^{*}) - {(\tilde{H} M)}^{T} s^{*} \end{matrix}) \\ + (g \circ Γ) (Γ^{- 1} x^{*}) - (g \circ Γ) (Γ^{- 1} y_{k + 1}) \leq 0 . \end{matrix}

Meanwhile, because \nabla f_{i} is 1 / β_{i}-Lipschitz continuous and f_{i} is convex, \nabla f_{i} is β_{i}-cocoercive by the Baillon–Haddad theorem, so

\begin{matrix} - {(\nabla f (x_{k}) - \nabla f (x^{*}))}^{T} (x_{k} - x^{*}) \leq - {∥\nabla f (x_{k}) - \nabla f (x^{*})∥}_{β}^{2} . \end{matrix}

(29)

Substituting these results into (28) yields (27). □
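Inequality (29) is the cocoercivity of \nabla f. As a hedged illustration that is not part of the paper, consider a hypothetical agent with f_{i} (x) = \frac{1}{2} x^{T} A x for a positive definite 2 × 2 matrix A: its gradient is λ_{\max}(A)-Lipschitz and hence (1 / λ_{\max}(A))-cocoercive, so the per-agent counterpart of (29) holds with β_{i} = 1 / λ_{\max}(A). The sketch below checks this on random instances; all matrices and vectors are made up for the check.

```python
import math
import random

def lam_max_2x2(p, q, r):
    """Largest eigenvalue of the symmetric 2x2 matrix [[p, q], [q, r]]."""
    return 0.5 * (p + r) + math.sqrt(0.25 * (p - r) ** 2 + q * q)

def cocoercivity_slack(p, q, r, d):
    """For f(x) = 0.5 x^T A x with A = [[p, q], [q, r]] positive definite and d = x - y,
    return (∇f(x) - ∇f(y))^T d - beta * ||∇f(x) - ∇f(y)||^2 with beta = 1 / lambda_max(A).
    The value is nonnegative because ∇f is (1 / lambda_max)-cocoercive."""
    g0, g1 = p * d[0] + q * d[1], q * d[0] + r * d[1]  # ∇f(x) - ∇f(y) = A d
    beta = 1.0 / lam_max_2x2(p, q, r)
    return (g0 * d[0] + g1 * d[1]) - beta * (g0 * g0 + g1 * g1)

def random_spd(rng):
    """Hypothetical local Hessian A = B^T B + 0.1 I for a random 2x2 matrix B."""
    b11, b12, b21, b22 = (rng.gauss(0.0, 1.0) for _ in range(4))
    return b11 * b11 + b21 * b21 + 0.1, b11 * b12 + b21 * b22, b12 * b12 + b22 * b22 + 0.1

rng = random.Random(0)
worst = min(
    cocoercivity_slack(*random_spd(rng), (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)))
    for _ in range(1000)
)
print(worst >= -1e-9)
```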

Lemma 7.

Under Assumptions 1–4, the sequence \{{∥u_{k} - u^{*}∥}_{P}^{2}\} is non-increasing and {lim}_{k \to \infty} {∥u_{k + 1} - u_{k}∥}_{P}^{2} = 0.

Proof.

If Assumption 4 holds, the weighted norms subtracted on the right-hand side of (27) are nonnegative, so (27) implies that the sequence \{{∥u_{k} - u^{*}∥}_{P}^{2}\} is non-increasing.

Summing (27) over k from 0 to N, we obtain

\begin{matrix} {∥u_{N + 1} - u^{*}∥}_{P}^{2} \\ \leq {∥u_{0} - u^{*}∥}_{P}^{2} - \sum_{k = 0}^{N} {∥s_{k + 1} - s_{k}∥}_{\tilde{Λ} (I - M Γ M^{T} \tilde{Λ})}^{2} \\ - \sum_{k = 0}^{N} {∥(x_{k + 1} - y_{k + 1}) + {(\tilde{H} M Γ)}^{T} (s_{k + 1} - s_{k})∥}_{Γ^{- 1}}^{2} \\ - \sum_{k = 0}^{N} {∥x_{k} - y_{k + 1} - (Γ \nabla f (x_{k}) - Γ \nabla f (x^{*}))∥}_{Γ^{- 1}}^{2} \\ - \sum_{k = 0}^{N} {∥\nabla f (x_{k}) - \nabla f (x^{*})∥}_{2 β - Γ}^{2} . \end{matrix}

Letting N tend to infinity and noting that the left-hand side is nonnegative, we get

\begin{matrix} \sum_{k = 0}^{\infty} {∥s_{k + 1} - s_{k}∥}_{\tilde{Λ} (I - M Γ M^{T} \tilde{Λ})}^{2} < \infty, \\ \sum_{k = 0}^{\infty} {∥(x_{k + 1} - y_{k + 1}) + {(\tilde{H} M Γ)}^{T} (s_{k + 1} - s_{k})∥}^{2} < \infty, \\ \sum_{k = 0}^{\infty} {∥x_{k} - y_{k + 1} - (Γ \nabla f (x_{k}) - Γ \nabla f (x^{*}))∥}^{2} < \infty, \\ \sum_{k = 0}^{\infty} {∥\nabla f (x_{k}) - \nabla f (x^{*})∥}^{2} < \infty . \end{matrix}

This means

lim_{k \to \infty} {∥s_{k + 1} - s_{k}∥}_{\tilde{Λ} (I - M Γ M^{T} \tilde{Λ})} = 0,

(30)

lim_{k \to \infty} ∥\begin{matrix} (x_{k + 1} - y_{k + 1}) + {(\tilde{H} M Γ)}^{T} (s_{k + 1} - s_{k}) \end{matrix}∥ = 0,

(31)

lim_{k \to \infty} ∥x_{k} - y_{k + 1} - (Γ \nabla f (x_{k}) - Γ \nabla f (x^{*}))∥ = 0,

(32)

lim_{k \to \infty} ∥\nabla f (x^{*}) - \nabla f (x_{k})∥ = 0 .

(33)

Next, combining (32) and (33) with the triangle inequality, we obtain

\begin{matrix} lim_{k \to \infty} ∥x_{k} - y_{k + 1}∥ = 0 . \end{matrix}

(34)

Meanwhile, if Assumption 4 holds, I - M Γ M^{T} \tilde{Λ} is symmetric positive definite. Therefore, (30) yields

\begin{matrix} lim_{k \to \infty} ∥s_{k + 1} - s_{k}∥ = 0 . \end{matrix}

(35)

According to (31) and (35), we obtain

{lim}_{k \to \infty} ∥x_{k + 1} - y_{k + 1}∥ = 0 .

Combining this with (34), we get

\begin{matrix} lim_{k \to \infty} {∥x_{k + 1} - x_{k}∥}^{2} = 0 . \end{matrix}

(36)

Then according to (35) and (36), we get

{lim}_{k \to \infty} {∥u_{k + 1} - u_{k}∥}^{2} = 0 .

□

The following theorem establishes the convergence of Algorithm 1.

Theorem 1.

Under Assumptions 1–4, the sequences \{x_{k}\} and \{u_{k}\} converge to an optimal solution of (2) and to a fixed point of T, respectively.

Proof.

Because \mathrm{prox}_{f} and I - \mathrm{prox}_{f} are firmly nonexpansive, T is continuous. By Lemma 7, {lim}_{k \to \infty} {∥u_{k + 1} - u_{k}∥}_{P}^{2} = 0 and the sequence \{{∥u_{k} - u^{*}∥}_{P}^{2}\} is non-increasing. Based on Lemma 3, the sequence \{u_{k}\} therefore converges to a fixed point of T, and by Lemma 4, \{x_{k}\} converges to a solution of (2). □

Similarly, the following theorem establishes the convergence of Algorithm 2.

Theorem 2.

Under Assumptions 1–4, relative to the solution set S, the sequence {\{u_{k}\}}_{k \geq k_{0}}, k_{0} \in N, satisfies stochastic Fejér monotonicity [27] in the Π^{- 1} P metric:

\begin{matrix} E [{∥u_{k + 1} - u^{*}∥}_{Π^{- 1} P}^{2}] \\ \leq {∥u_{k} - u^{*}∥}_{Π^{- 1} P}^{2} - {∥s_{k + 1} - s_{k}∥}_{\tilde{Λ} (I - M Γ M^{T} \tilde{Λ})}^{2} - {∥\nabla f (x_{k}) - \nabla f (x^{*})∥}_{2 β - Γ}^{2} . \end{matrix}

(37)

Furthermore, the sequence {\{u_{k}\}}_{k \geq k_{0}} converges almost surely to some u^{*} \in S.

Proof.

Before the proof, we introduce some notation. Here, Π = \sum_{i = 1}^{m} p_{i} P_{i} denotes the probability matrix, E [\cdot | F_{k}] is the conditional expectation, abbreviated as E_{k} [\cdot], and F_{k} is the filtration generated by (ξ^{1}, \dots, ξ^{k}). We use E_{k} = \sum_{i = 1}^{m} ξ_{i}^{k} P_{i} to map the components of (R^{(2 |E| + m) n}, F_{k - 1}) to (R^{(2 |E| + m) n}, F_{k}).

By the definition of ξ^{k}, we have E [E_{k + 1}] = Π. Using this together with the idempotence of E_{k}, we have

\begin{matrix} E [{∥u_{k + 1} - u^{*}∥}_{Π^{- 1} P}^{2}] \\ = E [{∥u_{k} + E_{k + 1} (T u_{k} - u_{k}) - u^{*}∥}_{Π^{- 1} P}^{2}] \\ = E [{∥u_{k} - u^{*}∥}_{Π^{- 1} P}^{2} + {∥E_{k + 1} (T u_{k} - u_{k})∥}_{Π^{- 1} P}^{2} + 2 {(u_{k} - u^{*})}^{T} Π^{- 1} P E_{k + 1} (T u_{k} - u_{k})] \\ = {∥u_{k} - u^{*}∥}_{Π^{- 1} P}^{2} + {∥T u_{k} - u_{k}∥}_{P}^{2} + 2 {(u_{k} - u^{*})}^{T} P (T u_{k} - u_{k}) . \end{matrix}

Then according to Lemma 6 and (23), we get

\begin{matrix} E [{∥u_{k + 1} - u^{*}∥}_{Π^{- 1} P}^{2}] \\ = {∥u_{k} - u^{*}∥}_{Π^{- 1} P}^{2} + {∥T u_{k} - u^{*}∥}_{P}^{2} - {∥u_{k} - u^{*}∥}_{P}^{2} \\ \leq {∥u_{k} - u^{*}∥}_{Π^{- 1} P}^{2} - {∥s_{k + 1} - s_{k}∥}_{\tilde{Λ} (I - M Γ M^{T} \tilde{Λ})}^{2} - {∥\nabla f (x_{k}) - \nabla f (x^{*})∥}_{2 β - Γ}^{2} . \end{matrix}

Therefore, (37) holds, and under Assumption 4 the almost sure convergence of {\{u_{k}\}}_{k \geq k_{0}} to some u^{*} \in S follows from [28] (Theorem 3), [27] (Proposition 2.3), and the Robbins–Siegmund lemma [29]. □
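The expectation step in the proof of Theorem 2 can be checked on a toy model. The sketch below assumes that every block is a scalar, that P and Π are diagonal, and that each block i is activated independently with probability p_{i} (so that E [E_{k + 1}] = Π). The vectors u_{k}, T u_{k}, and u^{*} are random placeholders, since the operator T itself is not reproduced here. The code enumerates all activation patterns to compute E [{∥u_{k} + E_{k + 1} (T u_{k} - u_{k}) - u^{*}∥}_{Π^{- 1} P}^{2}] exactly and compares it with the closed form in the first line of the display above.

```python
import itertools
import random

def wsq(v, w):
    """Diagonal weighted squared norm: sum_j w_j * v_j^2."""
    return sum(wj * vj * vj for wj, vj in zip(w, v))

def lhs_expectation(u, Tu, ustar, pw, probs):
    """Exact E[||u + E(Tu - u) - u*||^2_{Pi^{-1}P}], enumerating every activation pattern xi
    with E = diag(xi), xi_j ~ Bernoulli(probs[j]) independent, P = diag(pw), Pi = diag(probs)."""
    w = [pj / qj for pj, qj in zip(pw, probs)]  # diagonal of Pi^{-1} P
    total = 0.0
    for xi in itertools.product((0, 1), repeat=len(u)):
        prob = 1.0
        for x, q in zip(xi, probs):
            prob *= q if x else 1.0 - q
        nxt = [uj + x * (tj - uj) for uj, tj, x in zip(u, Tu, xi)]  # block-coordinate update
        total += prob * wsq([a - b for a, b in zip(nxt, ustar)], w)
    return total

def rhs_closed_form(u, Tu, ustar, pw, probs):
    """||u - u*||^2_{Pi^{-1}P} + ||Tu - u*||^2_P - ||u - u*||^2_P."""
    w = [pj / qj for pj, qj in zip(pw, probs)]
    d = [a - b for a, b in zip(u, ustar)]
    e = [a - b for a, b in zip(Tu, ustar)]
    return wsq(d, w) + wsq(e, pw) - wsq(d, pw)

rng = random.Random(1)
m = 4
u, Tu, ustar = ([rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(3))
pw = [rng.uniform(0.5, 2.0) for _ in range(m)]  # hypothetical diagonal P
probs = [0.2, 0.5, 0.7, 0.9]                     # hypothetical activation probabilities
gap = lhs_expectation(u, Tu, ustar, pw, probs) - rhs_closed_form(u, Tu, ustar, pw, probs)
print(abs(gap) < 1e-9)
```

Coordinate-wise, the identity reduces to (1 - p_{i}) d_{i}^{2} P_{i} / p_{i} + e_{i}^{2} P_{i} = d_{i}^{2} P_{i} / p_{i} - d_{i}^{2} P_{i} + e_{i}^{2} P_{i}, with d = u_{k} - u^{*} and e = T u_{k} - u^{*}, which is why the Π^{- 1} scaling exactly cancels the activation probabilities.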

5. Numerical Experiments

5.1. Case Study I: Performance Examination

In this section, we demonstrate the effectiveness of the algorithms by solving a class of quadratic programming problems over undirected networks. The network topology is shown in Figure 1.

The quadratic programming problem model is as follows:

\begin{matrix} \min_{x_{1}, \dots, x_{m}} \sum_{i = 1}^{m} f_{i} (x_{i}), \quad f_{i} (x_{i}) = x_{i}^{T} V_{i} x_{i} + b_{i}^{T} x_{i}, \\ s.t. \; x_{i}^{\min} \leq x_{i} \leq x_{i}^{\max}, \; i = 1, \dots, m, \\ x_{i} = x_{j}, \; (i, j) \in E, \end{matrix}

(38)

where x_{i} is the decision variable of agent i. The matrix V_{i} in the objective function is diagonal, with its diagonal elements randomly selected in [−8, 8], and the elements of the vector b_{i} are randomly selected in [−10, −5]. For the box constraint on x_{i}, the elements of x_{i}^{\min} are selected in [−10, −5], and those of x_{i}^{\max} in [5, 10].
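An instance of problem (38) with the stated ranges can be generated as follows. This is a minimal sketch under assumptions not stated in the text: entries are drawn uniformly at random, each agent has n = 2 variables (consistent with X_{i} \subset R^{2}), and the names `make_instance` and `local_cost` are hypothetical.

```python
import random

def make_instance(m, n=2, seed=0):
    """Hypothetical generator for problem (38): one dict per agent holding diag(V_i), b_i,
    x_i^min and x_i^max. Uniform sampling over the stated ranges is an assumption."""
    rng = random.Random(seed)
    return [
        {
            "V_diag": [rng.uniform(-8.0, 8.0) for _ in range(n)],
            "b": [rng.uniform(-10.0, -5.0) for _ in range(n)],
            "x_min": [rng.uniform(-10.0, -5.0) for _ in range(n)],
            "x_max": [rng.uniform(5.0, 10.0) for _ in range(n)],
        }
        for _ in range(m)
    ]

def local_cost(agent, x):
    """f_i(x_i) = x_i^T V_i x_i + b_i^T x_i for a diagonal V_i."""
    quad = sum(v * xj * xj for v, xj in zip(agent["V_diag"], x))
    lin = sum(bj * xj for bj, xj in zip(agent["b"], x))
    return quad + lin

agents = make_instance(m=5)  # m = 5 is illustrative; the agent count is set by Figure 1
print(local_cost(agents[0], [0.0, 0.0]))
```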

To solve problem (38), we first convert it into the form of problem (8). Defining the set X_{i} = \{e \in R^{2} | x_{i}^{\min} \leq e \leq x_{i}^{\max}\} and its indicator function δ_{X_{i}} (x_{i}), we obtain the following problem:

\begin{matrix} \min_{x_{1}, \dots, x_{m} \in R^{2}} \sum_{i = 1}^{m} (f_{i} (x_{i}) + δ_{X_{i}} (x_{i})) + \sum_{i = 1}^{m} \sum_{(i, j) \in E} δ_{C_{(i, j)}} (M_{(i, j)} x) . \end{matrix}
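With the non-smooth term δ_{X_{i}}, the proximal operator reduces to the Euclidean projection onto the box X_{i}, that is, componentwise clipping. The sketch below shows this projection and one local forward-backward step for a single agent. It deliberately ignores the consensus terms δ_{C_{(i, j)}}, so it only illustrates how a proximal operator handles the box constraint and is not Algorithm 1 itself; the function names are hypothetical.

```python
def grad_local(V_diag, b, x):
    """∇f_i(x_i) = 2 V_i x_i + b_i for f_i(x_i) = x_i^T V_i x_i + b_i^T x_i with diagonal V_i."""
    return [2.0 * v * xj + bj for v, xj, bj in zip(V_diag, x, b)]

def prox_box(z, lo, hi):
    """prox of δ_{X_i}: Euclidean projection onto {lo <= e <= hi}, i.e. componentwise clipping.
    The result does not depend on the step-size, since gamma * δ_X = δ_X for gamma > 0."""
    return [min(max(zj, l), h) for zj, l, h in zip(z, lo, hi)]

def prox_grad_step(x, V_diag, b, gamma, lo, hi):
    """One local forward-backward step prox_{gamma δ_X}(x - gamma ∇f_i(x)).
    Illustration only: consensus terms are ignored, so this is not Algorithm 1."""
    g = grad_local(V_diag, b, x)
    return prox_box([xj - gamma * gj for xj, gj in zip(x, g)], lo, hi)

print(prox_grad_step([0.0, 0.0], [1.0, 1.0], [-10.0, -10.0], 1.0, [-5.0, -5.0], [5.0, 5.0]))
```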

Figure 2a shows that, under the synchronous Algorithm 1, the states of the agents eventually reach consensus. Figure 2b shows the states of the agents under the asynchronous Algorithm 2 with activation probability p_{i} = 0.2 and the same parameter settings.
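The random-block mechanism behind Algorithm 2 updates only the activated blocks, u_{k + 1} = u_{k} + E_{k + 1} (T u_{k} - u_{k}). The toy sketch below mimics this with activation probability p = 0.2 per block. Because the operator T of Algorithm 1 is not reproduced in this section, it substitutes a hypothetical averaged operator (a forward-backward map for a separable, box-constrained convex quadratic) whose fixed point is known in closed form.

```python
import random

def toy_T(u, a, c, gamma, lo, hi):
    """Hypothetical averaged operator standing in for T: the forward-backward map
    u -> clip(u - gamma * (a*u + c), lo, hi) for min sum_j 0.5*a_j*u_j^2 + c_j*u_j
    over the box [lo, hi] (requires a_j > 0 and 0 < gamma < 2 / max_j a_j)."""
    return [min(max(uj - gamma * (aj * uj + cj), l), h)
            for uj, aj, cj, l, h in zip(u, a, c, lo, hi)]

def async_iterate(u0, a, c, gamma, lo, hi, p=0.2, iters=2000, seed=0):
    """Random-block-coordinate iteration u_{k+1} = u_k + E_{k+1}(T u_k - u_k), where each
    block (a scalar here) is activated independently with probability p."""
    rng = random.Random(seed)
    u = list(u0)
    for _ in range(iters):
        Tu = toy_T(u, a, c, gamma, lo, hi)  # computed in full here only for brevity
        # xi_j ~ Bernoulli(p): active blocks take T u_k, inactive blocks keep u_k
        u = [tj if rng.random() < p else uj for uj, tj in zip(u, Tu)]
    return u

u_final = async_iterate([0.0, 0.0, 0.0], a=[1.0, 2.0, 4.0], c=[-3.0, 1.0, -20.0],
                        gamma=0.4, lo=[-1.0] * 3, hi=[1.0] * 3)
print(u_final)
```

With these illustrative values, the fixed point is the clipped minimizer [1, −0.5, 1], and the iteration approaches it even though each block is updated only about 20% of the time.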

Figure 3 compares the two proposed algorithms with existing algorithms, namely an ADMM-based method [30], TriPD-Dist, and its asynchronous version [2], in terms of the logarithm of 1 / m \cdot \sum_{i = 1}^{m} ∥ x_{i}^{k} - {\tilde{x}}^{*} ∥. Algorithm 1 outperforms both the ADMM-based method and TriPD-Dist, and the asynchronous Algorithm 2 also converges faster than asynchronous TriPD-Dist.
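The error measure plotted in Figure 3 can be computed as below. The text does not state the base of the logarithm, so base 10 is an assumption, and `log_avg_error` is a hypothetical name. The function raises an error once every agent has reached {\tilde{x}}^{*} exactly, because the average error is then zero.

```python
import math

def log_avg_error(xs, x_star):
    """log10 of (1/m) * sum_i ||x_i^k - x~*||_2 over the m agents' current iterates xs."""
    m = len(xs)
    avg = sum(math.dist(x, x_star) for x in xs) / m
    return math.log10(avg)  # ValueError if avg == 0 (all agents exactly at x~*)

print(log_avg_error([[3.0, 4.0], [0.0, 0.0]], [0.0, 0.0]))
```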

5.2. Case Study II: First-Order Dynamics System

In this subsection, we apply the proposed synchronous algorithm to a first-order dynamics system problem in 2-D space [31], where each agent i has its own cost function f_{i} (\tilde{p}) = {∥\tilde{p} - {\bar{p}}_{x, i}∥}^{2} + {∥\tilde{p} - {\bar{p}}_{y, i}∥}^{2}, with the action response \tilde{p} = {[{\tilde{p}}_{x}, {\tilde{p}}_{y}]}^{T} and the private reference positions {\bar{p}}_{x, i} = {[i - 3.5, 0]}^{T} and {\bar{p}}_{y, i} = {[0, i - 3.5]}^{T}. The goal is for all agents to cooperatively find the optimal position \tilde{p} under the local constraints Ω_{i} = \{\tilde{p} \in R^{2} | {∥\tilde{p} - {\bar{p}}_{i}^{0}∥}^{2} \leq 64\}, where {\bar{p}}_{i}^{0} is the initial position of agent i \in \{1, 2, 3, 4, 5, 6, 7\}. Let

{\bar{p}}_{1}^{0} = {[- 4, 5.5]}^{T}, {\bar{p}}_{2}^{0} = {[0, 7]}^{T}, {\bar{p}}_{3}^{0} = {[6, 5]}^{T}, {\bar{p}}_{4}^{0} = {[5, - 3.5]}^{T}, {\bar{p}}_{5}^{0} = {[0, - 7]}^{T}, {\bar{p}}_{6}^{0} = {[- 5, - 5]}^{T}, {\bar{p}}_{7}^{0} = {[7, 7]}^{T};

then the distributed problem can be formulated as

\begin{matrix} \min_{p_{1}, \dots, p_{m}} \sum_{i = 1}^{m} ({∥p_{i} - {\bar{p}}_{x, i}∥}^{2} + {∥p_{i} - {\bar{p}}_{y, i}∥}^{2} + δ_{Ω_{i}} (p_{i})), \\ s.t. \; p_{i} = p_{j}, \; (i, j) \in E, \end{matrix}

where p_{i} \in R^{2} is agent i's local estimate of \tilde{p}. In light of (1), we set g_{i} (p_{i}) = δ_{Ω_{i}} (p_{i}). The step-sizes are selected as in Case Study I.
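In Case Study II, the proximal operator of g_{i} = δ_{Ω_{i}} is the projection onto the disk of radius 8 (since {∥p - {\bar{p}}_{i}^{0}∥}^{2} \leq 64) centered at {\bar{p}}_{i}^{0}, and the gradient of the local cost is \nabla f_{i} (p) = 2 (p - {\bar{p}}_{x, i}) + 2 (p - {\bar{p}}_{y, i}). The following minimal sketch uses the initial positions listed above; the function names are hypothetical.

```python
import math

# Initial positions p̄_i^0 from the text (agents 1..7)
INITIAL = {1: (-4.0, 5.5), 2: (0.0, 7.0), 3: (6.0, 5.0), 4: (5.0, -3.5),
           5: (0.0, -7.0), 6: (-5.0, -5.0), 7: (7.0, 7.0)}

def proj_disk(p, center, radius=8.0):
    """Euclidean projection onto Ω_i = {p : ||p - center||^2 <= radius^2}.
    This is the prox of g_i = δ_{Ω_i} for any positive step-size."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    dist = math.hypot(dx, dy)
    if dist <= radius:
        return (p[0], p[1])
    s = radius / dist  # shrink the offset back onto the boundary circle
    return (center[0] + s * dx, center[1] + s * dy)

def local_grad(p, i):
    """∇f_i(p) = 2(p - p̄_{x,i}) + 2(p - p̄_{y,i}), p̄_{x,i} = [i-3.5, 0], p̄_{y,i} = [0, i-3.5]."""
    r = i - 3.5
    return (2.0 * (p[0] - r) + 2.0 * p[0], 2.0 * p[1] + 2.0 * (p[1] - r))

print(proj_disk((23.0, 7.0), INITIAL[7]))
```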

The results are shown in Figures 4 and 5. Specifically, Figure 4a,b show the trajectories of p_{i} = {[p_{x, i}, p_{y, i}]}^{T}. Figure 5 depicts the motion of the entire system over the iterations: the optimal position {\tilde{p}}^{*} = {[0.6743, 0.2711]}^{T} is marked by a cross at the intersection of two star lines, the dotted circles are the motion areas of the corresponding agents, and the solid circles mark their initial positions.

6. Conclusions

This paper studied a class of distributed composite optimization problems involving non-smooth convex functions and proposed two fully distributed algorithms, a synchronous algorithm that allows uncoordinated step-sizes and its asynchronous random-block-coordinate counterpart. Their convergence was established theoretically and verified through numerical simulations. Several aspects remain open for improvement. For example, the network model could be extended from undirected to directed graphs, and the algorithms could be applied to more practical scenarios such as resource allocation.

Author Contributions

Y.S.: Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Supervision, Writing—original draft. L.R.: Data curation, Formal analysis, Software. J.T.: Formal analysis, Software, Writing—original draft. X.W.: Methodology, Software, Formal analysis. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Li, Z.; Shi, W.; Yan, M. A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates. IEEE Trans. Signal Process. 2019, 67, 4494–4506.
2. Latafat, P.; Freris, N.M.; Patrinos, P. A new randomized block-coordinate primal-dual proximal algorithm for distributed optimization. IEEE Trans. Autom. Control 2019, 64, 4050–4065.
3. Bai, L.; Ye, M.; Sun, C.; Hu, G. Distributed economic dispatch control via saddle point dynamics and consensus algorithms. IEEE Trans. Control Syst. Technol. 2019, 27, 898–905.
4. Jin, B.; Li, H.; Yan, W.; Cao, M. Distributed model predictive control and optimization for linear systems with global constraints and time-varying communication. IEEE Trans. Autom. Control 2020, 66, 3393–3400.
5. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122.
6. Olshevsky, A. Linear time average consensus and distributed optimization on fixed graphs. SIAM J. Control Optim. 2017, 55, 3990–4014.
7. Nedić, A.; Ozdaglar, A. Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 2009, 54, 48–61.
8. Shi, W.; Ling, Q.; Wu, G.; Yin, W. EXTRA: An exact first-order algorithm for decentralized consensus optimization. SIAM J. Optim. 2015, 25, 944–966.
9. Qu, G.; Li, N. Harnessing smoothness to accelerate distributed optimization. IEEE Trans. Control Netw. Syst. 2018, 5, 1245–1260.
10. Chang, T.H.; Hong, M.Y.; Wang, X.F. Multi-agent distributed optimization via inexact consensus ADMM. IEEE Trans. Signal Process. 2015, 63, 482–497.
11. Iutzeler, F.; Bianchi, P.; Ciblat, P.; Hachem, W. Explicit convergence rate of a distributed alternating direction method of multipliers. IEEE Trans. Autom. Control 2016, 61, 892–904.
12. Shi, W.; Ling, Q.; Yuan, K.; Wu, G.; Yin, W.T. On the linear convergence of the ADMM in decentralized consensus optimization. IEEE Trans. Signal Process. 2014, 62, 1750–1761.
13. Wei, E.; Ozdaglar, A. Distributed alternating direction method of multipliers. In The 51st IEEE Conference on Decision and Control; IEEE: Maui, HI, USA, 2012; pp. 5445–5450.
14. Chen, A.I.; Ozdaglar, A. A fast distributed proximal-gradient method. In Proceedings of the Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 1–5 October 2012; pp. 601–608.
15. Shi, W.; Ling, Q.; Wu, G.; Yin, W. A proximal gradient algorithm for decentralized composite optimization. IEEE Trans. Signal Process. 2015, 63, 6013–6023.
16. Mao, X.; Yuan, K.; Hu, Y.; Gu, Y.; Sayed, A.; Yin, W. Walkman: A communication-efficient random-walk algorithm for decentralized optimization. IEEE Trans. Signal Process. 2020, 68, 2513–2528.
17. Zeng, J.; He, T.; Wang, M. A fast proximal gradient algorithm for decentralized composite optimization over directed networks. Syst. Control Lett. 2017, 107, 36–43.
18. Condat, L. A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. J. Optim. Theory Appl. 2013, 158, 460–479.
19. Chen, P.; Huang, J.; Zhang, X. A primal-dual fixed point algorithm for minimization of the sum of three convex separable functions. Fixed Point Theory Appl. 2016.
20. Latafat, P.; Patrinos, P. Asymmetric forward-backward-adjoint splitting for solving monotone inclusions involving three operators. Comput. Optim. Appl. 2017, 68, 57–93.
21. Yan, M. A new primal-dual algorithm for minimizing the sum of three functions with a linear operator. J. Sci. Comput. 2018, 76, 1698–1717.
22. Boyd, S.; Ghosh, A.; Prabhakar, B.; Shah, D. Randomized gossip algorithms. IEEE Trans. Inf. Theory 2006, 52, 2508–2530.
23. Ren, X.; Li, D.; Xi, Y.; Shao, H. Distributed subgradient algorithm for multi-agent optimization with dynamic stepsize. IEEE/CAA J. Autom. Sin. 2021, 8, 1451–1464.
24. Micchelli, C.A.; Shen, L.; Xu, Y. Proximity algorithms for image models: Denoising. Inverse Probl. 2011, 27, 45009.
25. Bauschke, H.H.; Combettes, P.L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces; Springer: New York, NY, USA, 2011.
26. Hong, M.; Chang, T.-H. Stochastic proximal gradient consensus over random networks. IEEE Trans. Signal Process. 2017, 65, 2933–2948.
27. Combettes, P.L.; Pesquet, J.C. Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping. SIAM J. Optim. 2015, 25, 1221–1248.
28. Bianchi, P.; Hachem, W.; Iutzeler, F. A coordinate descent primal-dual algorithm and application to distributed asynchronous optimization. IEEE Trans. Autom. Control 2016, 61, 2947–2957.
29. Robbins, H.; Siegmund, D. A convergence theorem for non negative almost supermartingales and some applications. In Herbert Robbins Selected Papers; Lai, T.L., Siegmund, D., Eds.; Springer: New York, NY, USA, 1985; pp. 111–135.
30. Aybat, N.S.; Wang, Z.; Lin, T.; Ma, S. Distributed linearized alternating direction method of multipliers for composite convex consensus optimization. IEEE Trans. Autom. Control 2018, 63, 5–20.
31. Li, H.; Su, E.; Wang, C.; Liu, J.; Xia, D. A primal-dual forward-backward splitting algorithm for distributed convex optimization. IEEE Trans. Emerg. Top. Comput. Intell. 2021.

Figure 1. Graph topology.

Figure 2. Convergence results of the two algorithms. (a) Algorithm 1. (b) Algorithm 2.

Figure 3. Performance comparison.

Figure 4. Evolution of positions. (a) Evolution of p_{x, i}^{k}. (b) Evolution of p_{y, i}^{k}.


Figure 5. Motions of all agents in the 2-D space.


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

