Article

Distributed Optimization Algorithm for Composite Optimization Problems with Non-Smooth Function

Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(17), 3135; https://doi.org/10.3390/math10173135
Submission received: 22 July 2022 / Revised: 22 August 2022 / Accepted: 24 August 2022 / Published: 1 September 2022
(This article belongs to the Topic Distributed Optimization for Control)

Abstract

This paper mainly studies the distributed optimization problems in a class of undirected networks. The objective function of the problem consists of a smooth convex function and a non-smooth convex function. Each agent in the network needs to optimize the sum of the two objective functions. For this kind of problem, based on the operator splitting method, this paper uses the proximal operator to deal with the non-smooth term and further designs a distributed algorithm that allows the use of uncoordinated step-sizes. At the same time, by introducing the random-block coordinate mechanism, this paper develops an asynchronous iterative version of the synchronous algorithm. Finally, the convergence of the algorithms is proven, and the effectiveness is verified through numerical simulations.

1. Introduction

In this paper, we study a class of distributed multi-agent problems on networks. Each agent in the network system has the following private objective function to be solved
$F_i(\bar{x}) = f_i(\bar{x}) + g_i(\bar{x}),$
where $\bar{x} \in \mathbb{R}^n$ is the decision variable, $f_i$ is a Lipschitz-differentiable convex function, and $g_i$ is a non-smooth convex function. Examples of $f_i$ include quadratic functions and logistic functions [1], and examples of $g_i$ include the elastic-net norm, the $\ell_1$-norm, and indicator functions [2].
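As a concrete illustration (not taken from the paper), the proximal operator of the $\ell_1$-norm $g_i(x) = \tau\|x\|_1$, which the algorithm in Section 3 evaluates in each iteration, has a closed-form soft-thresholding solution. A minimal NumPy sketch:

```python
import numpy as np

def prox_l1(x, tau):
    """Proximal operator of tau*||.||_1 (soft thresholding):
    argmin_y tau*||y||_1 + 0.5*||x - y||^2, computed elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

v = np.array([3.0, -2.0, 0.2])
print(prox_l1(v, 1.0))  # shrinks each entry toward zero by the threshold 1
```

The cheapness of this evaluation is what makes proximal-based distributed methods attractive for non-smooth terms.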
For the network system, we consider that each agent in the system is only allowed to interact with neighbor agents, and there is no central agent to process data; then we can obtain
$\min_{x_1,\dots,x_m} \sum_{i=1}^m F_i(x_i) \quad \text{s.t.} \quad x_i = x_j, \ (i,j) \in E,$
where $x_i \in \mathbb{R}^n$ is the local estimate of $\bar{x}$ and $E$ represents the collection of edges in the network. This distributed computing architecture arises in various areas, including distributed information processing and decision making, networked multi-vehicle coordination, and distributed estimation. Typical applications include power systems control [3], model predictive control [4], statistical inference and learning [5], and distributed average consensus [6].
In recent years, most of the literature has focused on the case in which the optimization objective contains only one smooth convex function. Many centralized algorithms with excellent performance, such as proximal gradient descent, the sub-gradient algorithm, and Newton's method, solve these problems once extended to a distributed form. The sub-gradient algorithm is the most commonly used method. In [7], Nedić and Ozdaglar apply this method to distributed optimization on time-varying networks and propose the distributed sub-gradient method (DGD). Shi et al. [8] propose an exact first-order algorithm (EXTRA), which exploits the error between adjacent iterations of DGD, and prove its linear convergence. Then, [9] designs a distributed first-order algorithm by combining DGD with the gradient tracking method. To further accelerate convergence, researchers successively propose distributed ADMM algorithms in [10,11,12,13]. However, these algorithms can only handle an objective consisting of a single smooth function.
For composite distributed optimization problems of the form (2), which contain a non-smooth term, many research results have emerged. The authors of [14] design a proximal gradient method by combining Nesterov acceleration mechanisms; however, each iteration consumes more computing resources because additional inner iteration steps are required. In undirected networks, Shi et al. [15] design a proximal gradient exact first-order algorithm (PG-EXTRA) for composite optimization problems based on the classical first-order distributed optimization algorithm EXTRA [8]. The algorithm converges exactly to the optimal solution using a fixed step-size, distinguishing it from most algorithms that must use diminishing step-sizes. The authors of [16] propose a communication-efficient random-walk method named Walkman by using a Markov chain; by analyzing the relationship between optimization accuracy and network connectivity, they obtain an explicit expression for the communication complexity and the communication efficiency of the system. Further, considering that in realistic scenarios most agents transmit data in a directed way, ref. [17] uses the push-sum mechanism to eliminate the information imbalance caused by directed networks and proposes the PG-ExtraPush algorithm on the basis of [8], maintaining the same convergence property.
Recently, operator splitting has become the mainstream technique for this kind of composite optimization problem. Since Combettes and Pesquet designed the first fully splitting algorithm for composite optimization, refs. [18,19,20,21] and others have successively proposed various algorithms in this direction. However, operator splitting is rarely applied to distributed composite optimization. Motivated by this, this paper aims to design a distributed algorithm with excellent performance using the operator splitting method and the theory of monotone operators.
Contributions: Compared with most existing distributed optimization algorithms, the main contributions of this paper are summarized as follows:
1.
To solve problem (2), this paper develops a novel, fully distributed algorithm based on the operator splitting method, which offers greater flexibility and efficiency than its (relatively) centralized counterparts [18,19,20,21].
2.
Based on a class of randomized block-coordinate methods, an asynchronous iterative version of the proposed algorithm is also derived, wherein only a subset of agents that are independently activated participate in the updates. Note that such an activation scheme is more flexible compared with the single coordinate activation [22].
3.
Both proposed algorithms allow not only local information interaction among neighboring agents but also the use of uncoordinated step-sizes, without any requirement of coordinated or dynamic ones considered in [7,8,9,14,23]. Additionally, the convergence of both algorithms is ensured under the same mild assumptions. In particular, the consideration of the local Lipschitz assumption avoids the conservative selections of step-sizes, unlike the global one assumed in [8,14,15,17].
Organization: The contents of the remaining sections of the paper are as follows. Section 2 provides the symbols, lemmas, definitions, and assumptions that will be used in the paper. We give the specific process of algorithm derivation in Section 3. In Section 4, we show the convergence analysis of the proposed algorithms. Section 5 presents the simulation experiment to verify the algorithms. Finally, Section 6 gives the conclusion of the paper.

2. Preliminaries

In this section, we give the notations and display the definitions and lemmas that will be used in the paper. Then, we give two important assumptions.
Above all, we introduce some knowledge of graph theory. Let $G = (V, E)$ represent an undirected network composed of $m$ agents, where $V$ denotes the set of agents and $E$ denotes the set of edges. The neighborhood of the $i$-th agent is denoted $N_i = \{j \mid (i,j) \in E\}$. Specifically, the undirected network $G$ is connected when there is at least one path between any two agents.
Let $\mathbb{R}^n$ denote the $n$-dimensional Euclidean space and $\|\cdot\|$ the Euclidean norm of a vector $x \in \mathbb{R}^n$. The notation $\rho_{\max}(\cdot)$ is the spectral radius of a matrix, and $\mathbb{N}$ represents the set of positive integers. Let $X_0(\mathbb{R}^n)$ denote the collection of all proper lower semi-continuous convex functions from $\mathbb{R}^n$ to $(-\infty, +\infty]$. When each $W_i$ is a positive definite matrix, $\mathrm{blkdiag}(W_i)_{i \in V}$ is the positive definite block-diagonal matrix with diagonal blocks $W_i$. Let $\mathrm{ri}(\cdot)$ denote the relative interior of a convex subset and $\mathrm{dom}\, f$ the effective domain of $f$. The subdifferential of a function $f$ is $\partial f(x_1) = \{v \in \mathbb{R}^n \mid (x_2 - x_1)^T v \le f(x_2) - f(x_1), \ \forall x_2 \in \mathbb{R}^n\}$. The proximity operator of a function $f \in X_0(\mathbb{R}^n)$ relative to $\|\cdot\|_P$ is defined by $\mathrm{prox}_{P^{-1} f}(x) = \arg\min_{y \in \mathbb{R}^n} \{ f(y) + (1/2)\|x - y\|_P^2 \}$. The convex conjugate of $f$ is written $f^*$.
At the same time, we give the following lemmas and assumptions.
Lemma 1
([24]). Let $f \in X_0(\mathbb{R}^n)$; then, for vectors $x_1, x_2 \in \mathbb{R}^n$, the following relations are equivalent:
$x_2 \in \partial f(x_1) \iff x_1 = \mathrm{prox}_f(x_1 + x_2) \iff x_2 = (I - \mathrm{prox}_f)(x_1 + x_2).$
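Lemma 1 can be sanity-checked numerically for the simple case $f = |\cdot|$ on $\mathbb{R}$, whose prox is soft thresholding with threshold 1 (an illustrative check, not part of the paper):

```python
import numpy as np

def prox_abs(v):
    # prox of f(x) = |x|: soft thresholding with threshold 1
    return np.sign(v) * max(abs(v) - 1.0, 0.0)

# take x1 = 0 and any subgradient x2 in the subdifferential of |.| at 0, i.e. [-1, 1]
x1, x2 = 0.0, 0.6
assert prox_abs(x1 + x2) == x1               # x1 = prox_f(x1 + x2)
assert (x1 + x2) - prox_abs(x1 + x2) == x2   # x2 = (I - prox_f)(x1 + x2)
print("Lemma 1 identities hold for f = |.| at x1 = 0, x2 = 0.6")
```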
Lemma 2
([25]). Let $f \in X_0(\mathbb{R}^n)$; then both $\mathrm{prox}_f$ and $I - \mathrm{prox}_f$ are firmly nonexpansive.
Lemma 3
([19]). For the fixed-point iteration $u^{k+1} = T(u^k)$, the sequence $u^k$ converges to a fixed point $u^*$ of $T$ when the following conditions are satisfied:
1.
$T$ is continuous,
2.
$\|u^k - u^*\|^2$ is non-increasing,
3.
$\lim_{k \to \infty} \|u^{k+1} - u^k\|^2 = 0$.
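For intuition (an illustrative example, not from the paper), the behavior described in Lemma 3 is easy to observe on a scalar fixed-point iteration such as the Babylonian square-root map $T(u) = (u + 2/u)/2$, whose positive fixed point is $\sqrt{2}$:

```python
import math

def T(u):
    # Babylonian map; the positive fixed point u* satisfies u* = (u* + 2/u*)/2, i.e. u* = sqrt(2)
    return 0.5 * (u + 2.0 / u)

u = 1.0
for _ in range(20):
    u = T(u)

print(u)  # converges to sqrt(2) = 1.4142...
```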
Definition 1.
For all $x_1, x_2 \in \mathbb{R}^n$, if an operator $T$ satisfies $\|Tx_1 - Tx_2\| \le \|x_1 - x_2\|$, then $T$ is a nonexpansive operator. Further, if $T$ satisfies $\|Tx_1 - Tx_2\|^2 \le (Tx_1 - Tx_2)^T (x_1 - x_2)$, then $T$ is a firmly nonexpansive operator.
Definition 2.
If $(Tx_1 - Tx_2)^T (x_1 - x_2) \ge \sigma_T \|x_1 - x_2\|^2$ holds for all $x_1, x_2 \in \mathbb{R}^n$ and some constant $\sigma_T > 0$, then the operator $T$ is $\sigma_T$-strongly monotone.
The following assumptions will also be used.
Assumption 1.
Graph $G$ is undirected and connected.
Assumption 2.
The following three points are satisfied:
1. 
$f_i : \mathbb{R}^n \to \mathbb{R}$ is a smooth convex function whose gradient is $(1/\beta_i)$-Lipschitz; that is, $f_i$ satisfies
$\beta_i \|\nabla f_i(x_1) - \nabla f_i(x_2)\| \le \|x_1 - x_2\|, \quad \forall x_1, x_2 \in \mathbb{R}^n,$
2. 
g i : R n R is a convex non-smooth function,
3. 
Problem (2) has at least one solution.

3. Algorithm Development

In this section, we design and derive the synchronous algorithm and asynchronous algorithm.
We next carry out an equivalent transformation of problem (2) to facilitate the subsequent algorithm design. The constraint $x_i = x_j$ in (2) can be written in the edge-based form
$E_{ij} x_i + E_{ji} x_j = 0,$
where $E_{ij} = I \in \mathbb{R}^{n \times n}$ for $i < j$ and $E_{ij} = -I \in \mathbb{R}^{n \times n}$ otherwise. Then, define the following linear operator:
$M_{(i,j)} : x \mapsto (E_{ij} x_i, E_{ji} x_j),$ with matrix representation in $\mathbb{R}^{2n \times mn},$
with the compact variable $x = [x_1^T, \dots, x_m^T]^T \in \mathbb{R}^{mn}$. We stack all $M_{(i,j)}$ to get the following operator:
$M : x \mapsto (M_{(i,j)} x)_{(i,j) \in E},$
of dimension $2n|E| \times mn$, where $|E|$ is the number of edges of the network. Consider the set
$C_{(i,j)} = \{ (e_1, e_2) \in \mathbb{R}^n \times \mathbb{R}^n \mid e_1 + e_2 = 0 \}.$
Then, constraint (4) can be further reformulated in the following form:
$M_{(i,j)} x \in C_{(i,j)}.$
Based on the above analysis, problem (2) can be transformed into
$\min_{x_i \in \mathbb{R}^n} \sum_{i=1}^m \big( f_i(x_i) + g_i(x_i) \big) + \sum_{(i,j) \in E} \delta_{C_{(i,j)}}(M_{(i,j)} x),$
where $\delta_C$ represents the indicator function, i.e.,
$\delta_{C_{(i,j)}}(M_{(i,j)} x) = 0$ if $M_{(i,j)} x \in C_{(i,j)}$, and $+\infty$ otherwise.
Then, let
$f(x) = \sum_{i=1}^m f_i(x_i), \quad g(x) = \sum_{i=1}^m g_i(x_i), \quad \delta_C(Mx) = \sum_{i=1}^m \sum_{j \in N_i} \delta_{C_{(i,j)}}(M_{(i,j)} x),$
and $C = \prod_{(i,j) \in E} C_{(i,j)}$ ($\prod$ denotes the Cartesian product). Hence, the compact form of problem (8) can be expressed as
$\min_{x \in \mathbb{R}^{mn}} f(x) + g(x) + \delta_C(Mx).$

3.1. Synchronous Algorithm 1

According to fixed-point theory, we design the distributed optimization algorithm for problem (2) from (9). We define the step-size matrices $\Gamma = \mathrm{blkdiag}(\gamma_i I_n)_{i \in V}$, $\tilde{\Lambda} = \mathrm{blkdiag}(\tilde{\lambda}_{i,j} \tilde{\gamma}_{i,j}^{-1})_{(i,j) \in E}$, and $\tilde{H} = \mathrm{blkdiag}(\tilde{\lambda}_{i,j} I_{2n})_{(i,j) \in E}$, where $\tilde{\gamma}_{i,j} = \mathrm{blkdiag}(\gamma_i I_n, \gamma_j I_n)$, and then introduce the following operators:
$T_0(s^*, x^*) = \mathrm{prox}_{\Gamma g}\big( x^* - \Gamma \nabla f(x^*) - (\tilde{H} M)^T s^* \big),$
$\tilde{T}_1(s^*, x^*) = (I - \mathrm{prox}_{\tilde{\Lambda}^{-1} \delta_C})\big( M T_0(s^*, x^*) + s^* \big),$
$T_2(s^*, x^*) = \mathrm{prox}_{\Gamma g}\big( x^* - \Gamma \nabla f(x^*) - (\tilde{H} M)^T \tilde{T}_1(s^*, x^*) \big),$
$T(s^*, x^*) = \big( \tilde{T}_1(s^*, x^*), T_2(s^*, x^*) \big),$
where $x^* = \mathrm{col}(x_i^*)_{i=1}^m$ and $s^* = \mathrm{col}(s_{i,j}^*)_{(i,j) \in E}$ with $s_{i,j}^* = \mathrm{col}(s_{i,j,i}^*, s_{i,j,j}^*)$ are the fixed points of $T$. In particular, $s_{i,j,i}^*$ and $s_{i,j,j}^*$ are maintained by agents $i$ and $j$, respectively. Considering the update variables $y^{k+1} = \mathrm{col}(y_i^{k+1})_{i=1}^m$, $x^{k+1} = \mathrm{col}(x_i^{k+1})_{i=1}^m$, and $s^{k+1} = \mathrm{col}(s_{i,j}^{k+1})_{(i,j) \in E}$ with the edge-based variable $s_{i,j}^{k+1} = \mathrm{col}(s_{i,j,i}^{k+1}, s_{i,j,j}^{k+1})$, we form the Picard sequence of $T$ and obtain the following update rules:
$y^{k+1} = \mathrm{prox}_{\Gamma g}\big( x^k - \Gamma \nabla f(x^k) - (\tilde{H} M)^T s^k \big),$
$s^{k+1} = (I - \mathrm{prox}_{\tilde{\Lambda}^{-1} \delta_C})\big( M y^{k+1} + s^k \big),$
$x^{k+1} = \mathrm{prox}_{\Gamma g}\big( x^k - \Gamma \nabla f(x^k) - (\tilde{H} M)^T s^{k+1} \big).$
Let $\tilde{w}^{k+1} = \mathrm{col}(\tilde{w}_{i,j}^{k+1})_{(i,j) \in E} = \mathrm{col}(\tilde{\lambda}_{i,j} \tilde{\gamma}_{i,j}^{-1} s_{i,j}^{k+1})_{(i,j) \in E}$. Using Lemma 2, (14) can be rewritten as
$y^{k+1} = \mathrm{prox}_{\Gamma g}\big( x^k - \Gamma \nabla f(x^k) - \Gamma M^T \tilde{w}^k \big),$
$s^{k+1} = (I - \mathrm{prox}_{\tilde{\Lambda}^{-1} \delta_C})\big( M y^{k+1} + s^k \big),$
$x^{k+1} = \mathrm{prox}_{\Gamma g}\big( x^k - \Gamma \nabla f(x^k) - \Gamma M^T \tilde{w}^{k+1} \big).$
Next, we split (15a)–(15c) in a distributed manner. It follows from (5) and (6) that the $i$-th component of $M^T \tilde{w}^{k+1}$ is $\sum_{j \in N_i} E_{ij} \tilde{w}_{i,j,i}^{k+1}$. Note that (15a) can be decomposed into
$\begin{bmatrix} y_1^{k+1} \\ \vdots \\ y_m^{k+1} \end{bmatrix} = \begin{bmatrix} \mathrm{prox}_{\gamma_1 g_1}\big( x_1^k - \gamma_1 \nabla f_1(x_1^k) - \gamma_1 \sum_{j \in N_1} E_{1j} \tilde{w}_{1,j,1}^k \big) \\ \vdots \\ \mathrm{prox}_{\gamma_m g_m}\big( x_m^k - \gamma_m \nabla f_m(x_m^k) - \gamma_m \sum_{j \in N_m} E_{mj} \tilde{w}_{m,j,m}^k \big) \end{bmatrix}.$
For (15b), multiply both sides of the equality by $\tilde{\Lambda}$. As done in (16), we also split (15b) and (15c) and use the identity $\mathrm{prox}_{\delta_{C_{(i,j)}}} = \mathrm{proj}_{C_{(i,j)}}$ to get the semi-distributed form:
$y_i^{k+1} = \mathrm{prox}_{\gamma_i g_i}\big( x_i^k - \gamma_i \nabla f_i(x_i^k) - \gamma_i \sum_{j \in N_i} E_{ij} \tilde{w}_{i,j,i}^k \big),$
$\tilde{w}_{i,j}^{k+1} = \tilde{w}_{i,j}^k + \tilde{\lambda}_{i,j} \tilde{\gamma}_{i,j}^{-1} \Big( M_{(i,j)} y^{k+1} - \mathrm{proj}_{C_{(i,j)}}\big( \tilde{\gamma}_{i,j} \tilde{\lambda}_{i,j}^{-1} \tilde{w}_{i,j}^k + M_{(i,j)} y^{k+1} \big) \Big),$
$x_i^{k+1} = \mathrm{prox}_{\gamma_i g_i}\big( x_i^k - \gamma_i \nabla f_i(x_i^k) - \gamma_i \sum_{j \in N_i} E_{ij} \tilde{w}_{i,j,i}^{k+1} \big).$
Note that (17b) is not fully distributed due to the structure $\tilde{w}_{i,j}^{k+1} = \mathrm{col}(\tilde{w}_{i,j,i}^{k+1}, \tilde{w}_{i,j,j}^{k+1})$. By using (4) and (5), we can derive that the projection of vectors $e_1, e_2 \in \mathbb{R}^n$ onto $C_{(i,j)}$ is expressed as
$\mathrm{proj}_{C_{(i,j)}}(e_1, e_2) = \Big( \tfrac{1}{2}(e_1 - e_2), \ \tfrac{1}{2}(e_2 - e_1) \Big),$
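The projection formula above can be verified numerically: among all pairs $(c, -c)$ summing to zero, $((e_1 - e_2)/2, (e_2 - e_1)/2)$ is the closest to $(e_1, e_2)$. A small NumPy check (illustrative, not part of the paper):

```python
import numpy as np

def proj_C(e1, e2):
    """Projection of (e1, e2) onto C = {(a, b) : a + b = 0}."""
    c = 0.5 * (e1 - e2)
    return c, -c

rng = np.random.default_rng(0)
e1, e2 = rng.normal(size=3), rng.normal(size=3)
p1, p2 = proj_C(e1, e2)

assert np.allclose(p1 + p2, 0.0)  # feasibility: the projection lies in C
# optimality: any feasible perturbation (d, -d) cannot reduce the distance
for _ in range(100):
    d = rng.normal(size=3)
    dist = np.sum((e1 - p1) ** 2 + (e2 - p2) ** 2)
    dist_pert = np.sum((e1 - (p1 + d)) ** 2 + (e2 - (p2 - d)) ** 2)
    assert dist <= dist_pert + 1e-12
print("projection onto C verified")
```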
which contributes to the local update of (17b), i.e.,
$\tilde{w}_{i,j,i}^{k+1} = \tilde{w}_{i,j,i}^k + \frac{\tilde{\lambda}_{i,j}}{\gamma_i} \Big( y_i^{k+1} - \frac{1}{2} \big( \tfrac{\gamma_i}{\tilde{\lambda}_{i,j}} \tilde{w}_{i,j,i}^k + y_i^{k+1} - \tfrac{\gamma_i}{\tilde{\lambda}_{i,j}} \tilde{w}_{i,j,j}^k + y_j^{k+1} \big) \Big),$
$\tilde{w}_{i,j,j}^{k+1} = \tilde{w}_{i,j,j}^k + \frac{\tilde{\lambda}_{i,j}}{\gamma_j} \Big( -y_j^{k+1} - \frac{1}{2} \big( \tfrac{\gamma_j}{\tilde{\lambda}_{i,j}} \tilde{w}_{i,j,j}^k - y_j^{k+1} - \tfrac{\gamma_j}{\tilde{\lambda}_{i,j}} \tilde{w}_{i,j,i}^k - y_i^{k+1} \big) \Big).$
Therefore, according to (17a), (17c), and the update of w i , j , i k + 1 , we can summarize the synchronous distributed algorithm as follows:
Remark 1.
Notice that Algorithm 1 is completely distributed and involves no global parameters. Each agent individually maintains the private primal variable $x_i^k$, the auxiliary variable $y_i^k$, and the edge-based variables $\tilde{w}_{i,j,i}^{k+1}$. For each edge $(i,j) \in E$ in the network, $\tilde{w}_{i,j}^k = \mathrm{col}(\tilde{w}_{i,j,i}^k, \tilde{w}_{i,j,j}^k)$ serves as an auxiliary profile containing two components, $\tilde{w}_{i,j,i}^k$ and $\tilde{w}_{i,j,j}^k$, which are kept by agents $i$ and $j$, respectively. Meanwhile, the information exchange is conducted locally; that is, agent $i$ shares its updated data $y_i^{k+1}$ and $\tilde{w}_{i,j,i}^{k+1}$ with all its neighbors $j \in N_i$. On the other hand, the proposed algorithm takes uncoordinated constant positive step-sizes $\gamma_i$, essentially distinguishing it from the global and dynamic ones in [7,8,9,14,23]. It is also worth noting that the edge-based step-size $\tilde{\lambda}_{i,j}$, held by the agents $i$ and $j$ linked by the edge $(i,j) \in E$, can be seen as an inherent parameter of the communication network, revealing the quality of the communication.
Algorithm 1 Distributed algorithm based on proximal operators
Input: For all agents $i \in V$, $x_i^0 \in \mathbb{R}^n$ and $\tilde{w}_{i,j,i}^0 \in \mathbb{R}^n$, where $j \in N_i$. Select proper positive step-sizes (parameters) $\gamma_i$ and $\tilde{\lambda}_{i,j}$.
For $k = 0, 1, \dots$, do:
 1. $y_i^{k+1} = \mathrm{prox}_{\gamma_i g_i}\big( x_i^k - \gamma_i \nabla f_i(x_i^k) - \gamma_i \sum_{j \in N_i} E_{ij} \tilde{w}_{i,j,i}^k \big)$,
 2. $\tilde{w}_{i,j,i}^{k+1} = \frac{\tilde{\lambda}_{i,j}}{2\gamma_i} \big( y_i^{k+1} - y_j^{k+1} \big) + \frac{1}{2} \big( \tilde{w}_{i,j,i}^k + \tilde{w}_{i,j,j}^k \big)$, $j \in N_i$,
 3. $x_i^{k+1} = \mathrm{prox}_{\gamma_i g_i}\big( x_i^k - \gamma_i \nabla f_i(x_i^k) - \gamma_i \sum_{j \in N_i} E_{ij} \tilde{w}_{i,j,i}^{k+1} \big)$,
 4. Send $y_i^{k+1}$ and $\tilde{w}_{i,j,i}^{k+1}$ to $j$ for $j \in N_i$,
 5. Until $\|x_i^{k+1} - x_i^k\|$ approaches zero.
End
Output: The primal variable $x_i^{k+1}$ as the optimal solution $x_i^*$.
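To make the iteration concrete, the following is a minimal NumPy sketch of Algorithm 1 on a hypothetical two-agent instance (one edge, scalar variables, $f_i(x) = \tfrac{1}{2}(x - a_i)^2$, $g_i \equiv 0$ so the prox is the identity). The edge-ordering convention $E_{12} = I$, $E_{21} = -I$ from Section 3 is used, and both copies of the edge variable use $y_1 - y_2$ in step 2; all numbers below are illustrative choices, not from the paper:

```python
import numpy as np

# Toy instance: two agents on one edge, f_i(x) = 0.5*(x - a_i)^2, g_i = 0.
# Assumption 4 requires gamma < 2*beta_i (the gradient is 1-Lipschitz, so
# beta_i = 1) and 0 < lambda < 1.
a = np.array([1.0, 3.0])      # private data; the consensus optimum is mean(a) = 2
gamma, lam = 0.5, 0.5         # step-sizes (taken equal here only for brevity)
E = np.array([1.0, -1.0])     # E_12 = 1, E_21 = -1 for the single edge (1, 2)

x = np.zeros(2)               # primal variables x_i
w = np.zeros(2)               # edge variables w_{12,1}, w_{12,2}

for _ in range(200):
    grad = x - a
    # step 1: forward step producing y_i (prox of g_i = 0 is the identity)
    y = x - gamma * grad - gamma * E * w
    # step 2: edge-based update; both copies average w and use y_1 - y_2
    w_new = 0.5 * (w[0] + w[1]) + (lam / (2.0 * gamma)) * (y[0] - y[1])
    w = np.array([w_new, w_new])
    # step 3: corrected primal step using the fresh edge variable
    x = x - gamma * grad - gamma * E * w

print(x)  # both agents approach the consensus optimum 2.0
```

At the fixed point, the edge variables settle at $w = -1$, exactly offsetting each agent's private gradient, which is how the dual variables enforce consensus.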

3.2. Asynchronous Algorithm 2

Here, we extend the synchronous Algorithm 1 to an asynchronous iterative version based on the random-block coordinate mechanism in [2]. Following the principle of this mechanism, we define the coordinate matrices $P_i \in \mathbb{R}^{(2|E|+m)n \times (2|E|+m)n}$ (where $|E|$ denotes the number of edges of the graph) as diagonal matrices with diagonal elements 0 or 1, and then divide the vector $(s, x)$ into $m$ blocks. At the same time, we define the $\phi$-valued activation vector $\xi^k$, where $\phi = \{0, 1\}^m$ is the set of binary strings of length $m$. When $\xi_i^k = 1$, agent $i$ is activated at the $k$-th iteration; otherwise, it is not activated.
In order to describe the activation state of different coordinate blocks and ensure random activation, we give the following assumption.
Assumption 3.
The following two points are satisfied:
1.
The coordinate matrices satisfy $\sum_{i=1}^m P_i = I$,
2.
$(\xi^k)_{k \ge 0}$ is a sequence of independent, identically distributed $\phi$-valued random vectors whose activation probabilities satisfy $p_i = \mathbb{P}(\xi_i^k = 1) > 0$ for all $k \ge 0$.
Then, based on the given assumption, we can develop the asynchronous algorithm as follows:
It can be seen that Algorithm 2 allows each agent to awaken with an independent probability, meaning that a subset of randomly activated agents participates in the updates while inactivated ones stay in their previous states. Such a scheme is more flexible than the single waking-up scheme [22] or schemes whose activated coordinate blocks are uniformly selected [26]. In addition, each probability is completely independent of the others and need not satisfy restrictive conditions such as $\sum_{i=1}^m p_i = 1$.
Algorithm 2 Asynchronous distributed version
Input: For all agents $i \in V$, $x_i^0 \in \mathbb{R}^n$ and $\tilde{w}_{i,j,i}^0 \in \mathbb{R}^n$, where $j \in N_i$. Select proper positive step-sizes (parameters) $\gamma_i$ and $\tilde{\lambda}_{i,j}$.
For $k = 0, 1, \dots$, do:
  • Each agent $i$ is activated independently with probability $p_i$ and performs update steps 1–5 of Algorithm 1 for $j \in N_i$; agents that are not activated keep their previous values unchanged.
End
Output: The primal variable $x_i^{k+1}$ as the optimal solution $x_i^*$.
In order to facilitate the subsequent derivation of convergence, we give a compact form of Algorithm 2. Letting $u = (s, x)$, we get
$u^{k+1} = u^k + E^{k+1} \big( T u^k - u^k \big),$
where $E^{k+1} = \sum_{i=1}^m \xi_i^{k+1} P_i$ and the operator $T$ can be seen in Equation (11).
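The compact recursion above can be illustrated with a simple averaged map standing in for the paper's operator $T$ (an illustrative stand-in, not the actual operator): coordinates are updated only when their Bernoulli activation fires, yet the iterates still reach the fixed point because every $p_i > 0$:

```python
import numpy as np

rng = np.random.default_rng(1)

b = np.array([1.0, -2.0, 0.5])
T = lambda u: 0.5 * u + b      # averaged map; fixed point solves u = 0.5u + b, i.e. u* = 2b
p = np.array([0.2, 0.5, 0.9])  # independent activation probabilities p_i > 0

u = np.zeros(3)
for _ in range(500):
    xi = rng.random(3) < p     # xi_i^{k+1} ~ Bernoulli(p_i)
    # u^{k+1} = u^k + E^{k+1}(T(u^k) - u^k), with E^{k+1} = diag(xi)
    u = u + xi * (T(u) - u)

print(u)  # approaches the fixed point 2*b = [2., -4., 1.]
```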

4. Convergence Analysis

The convergence proof of the algorithms is provided in this section. The following assumption is the condition to be met for the convergence of the algorithms.
Assumption 4.
Recall the local Lipschitz constant $\beta_i$ in Assumption 2. It is assumed that the step-sizes satisfy the following conditions:
$0 < \gamma_i < 2\beta_i, \quad 0 < \tilde{\lambda}_{i,j} < 1.$
Lemma 4.
Let $x^*$ be a solution to (9); then
$s^* = \tilde{T}_1(s^*, x^*), \quad x^* = T_2(s^*, x^*),$
which means $u^* = (s^*, x^*)$ is a fixed point of $T$. Conversely, $x^*$ is a solution to (9) when $u^*$ is a fixed point of $T$.
Proof. 
Use the first-order optimality condition of (9) to obtain $0 \in \Gamma \nabla f(x^*) + \Gamma \partial g(x^*) + \Gamma M^T \partial \delta_C(M x^*)$, where $x^*$ is the optimal solution. According to the definition of the matrix step-sizes, we further obtain
$0 \in \Gamma \nabla f(x^*) + \Gamma \partial g(x^*) + (\tilde{H} M)^T \partial (\tilde{\Lambda}^{-1} \delta_C)(M x^*).$
Use Lemma 1 and let $s^* \in \partial (\tilde{\Lambda}^{-1} \delta_C)(M x^*)$ to get
$s^* = (I - \mathrm{prox}_{\tilde{\Lambda}^{-1} \delta_C})\big( M x^* + s^* \big),$
$x^* = \mathrm{prox}_{\Gamma g}\big( x^* - \Gamma \nabla f(x^*) - (\tilde{H} M)^T s^* \big).$
Then, according to (19) and (20), we can get
$s^* = (I - \mathrm{prox}_{\tilde{\Lambda}^{-1} \delta_C})\Big( M \, \mathrm{prox}_{\Gamma g}\big( x^* - \Gamma \nabla f(x^*) - (\tilde{H} M)^T s^* \big) + s^* \Big).$
Therefore, we have $x^* = T_2(s^*, x^*)$ and $s^* = \tilde{T}_1(s^*, x^*)$; meanwhile, $u^* = T(u^*)$, where $u^* = (s^*, x^*)$. Conversely, if $u^* = T(u^*)$, it can be deduced that $x^*$ satisfies the first-order optimality condition of problem (9). Thus $x^*$ is an optimal solution of problem (9). □
Lemma 5.
Let Assumptions 1 and 2 hold; then
$\|s^{k+1} - s^*\|_{\tilde{\Lambda}}^2 \le \|s^k - s^*\|_{\tilde{\Lambda}}^2 - \|s^{k+1} - s^k\|_{\tilde{\Lambda}}^2 + 2 (s^{k+1} - s^*)^T \tilde{\Lambda} M (y^{k+1} - x^*),$
$\|x^{k+1} - x^*\|_{\Gamma^{-1}}^2 \le \|x^k - x^*\|_{\Gamma^{-1}}^2 - \|x^{k+1} - y^{k+1}\|_{\Gamma^{-1}}^2 - \|x^k - y^{k+1}\|_{\Gamma^{-1}}^2 + 2 (\Gamma^{-1} x^{k+1} - \Gamma^{-1} y^{k+1})^T \big( \Gamma \nabla f(x^k) + (\tilde{H} M)^T s^k \big) + 2 (\Gamma^{-1} x^* - \Gamma^{-1} x^{k+1})^T \big( \Gamma \nabla f(x^k) + (\tilde{H} M)^T s^{k+1} \big) + 2 \big[ (g \circ \Gamma)(\Gamma^{-1} x^*) - (g \circ \Gamma)(\Gamma^{-1} y^{k+1}) \big].$
Proof. 
Combining (14), (19), and Lemma 2, we get
$\|s^{k+1} - s^*\|_{\tilde{\Lambda}}^2 = \big\| (I - \mathrm{prox}_{\tilde{\Lambda}^{-1} \delta_C})(M y^{k+1} + s^k) - (I - \mathrm{prox}_{\tilde{\Lambda}^{-1} \delta_C})(M x^* + s^*) \big\|_{\tilde{\Lambda}}^2$
$\le (s^{k+1} - s^*)^T \tilde{\Lambda} \big( M y^{k+1} + s^k - M x^* - s^* \big).$
It is further concluded that
$(s^{k+1} - s^*)^T \tilde{\Lambda} \big( s^{k+1} - s^k + s^k - s^* \big) \le (s^{k+1} - s^*)^T \tilde{\Lambda} M (y^{k+1} - x^*) + (s^{k+1} - s^*)^T \tilde{\Lambda} (s^k - s^*).$
Here we introduce an identity. For a positive definite matrix $K$ and $x_1, x_2, x_3 \in \mathbb{R}^n$, we have
$2 (x_1 - x_2)^T K (x_3 - x_2) = \|x_3 - x_2\|_K^2 + \|x_1 - x_2\|_K^2 - \|x_1 - x_3\|_K^2.$
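This three-point identity (a standard completion-of-squares fact) can be checked numerically for a random positive definite $K$ (illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 4
A = rng.normal(size=(n, n))
K = A @ A.T + n * np.eye(n)     # random symmetric positive definite matrix

def norm_K(v):
    return v @ K @ v            # squared K-norm ||v||_K^2

x1, x2, x3 = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)

lhs = 2.0 * (x1 - x2) @ K @ (x3 - x2)
rhs = norm_K(x3 - x2) + norm_K(x1 - x2) - norm_K(x1 - x3)
assert np.isclose(lhs, rhs)
print("three-point identity verified")
```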
Combining the above two results, we derive
$\|s^{k+1} - s^*\|_{\tilde{\Lambda}}^2 = \|s^k - s^*\|_{\tilde{\Lambda}}^2 - \|s^{k+1} - s^k\|_{\tilde{\Lambda}}^2 + 2 (s^{k+1} - s^*)^T \tilde{\Lambda} (s^{k+1} - s^k)$
$\le \|s^k - s^*\|_{\tilde{\Lambda}}^2 - \|s^{k+1} - s^k\|_{\tilde{\Lambda}}^2 + 2 (s^{k+1} - s^*)^T \tilde{\Lambda} M (y^{k+1} - x^*).$
In order to prove the validity of (22), (3) is applied to (14), giving
$\Gamma^{-1} \big( x^k - \Gamma \nabla f(x^k) - (\tilde{H} M)^T s^{k+1} - x^{k+1} \big) \in \partial g(x^{k+1}).$
Using subdifferential properties, we obtain
$(x^* - x^{k+1})^T \Gamma^{-1} \big( x^k - \Gamma \nabla f(x^k) - (\tilde{H} M)^T s^{k+1} - x^{k+1} \big) \le g(x^*) - g(x^{k+1}),$
or equivalently,
$(\Gamma^{-1} x^{k+1} - \Gamma^{-1} x^*)^T (x^{k+1} - x^k) \le - (\Gamma^{-1} x^{k+1} - \Gamma^{-1} x^*)^T \big( \Gamma \nabla f(x^k) + (\tilde{H} M)^T s^{k+1} \big) + (g \circ \Gamma)(\Gamma^{-1} x^*) - (g \circ \Gamma)(\Gamma^{-1} x^{k+1}).$
Moreover, there is
$\|x^{k+1} - x^*\|_{\Gamma^{-1}}^2 = \|x^k - x^*\|_{\Gamma^{-1}}^2 - \|x^{k+1} - x^k\|_{\Gamma^{-1}}^2 + 2 (x^{k+1} - x^*)^T \Gamma^{-1} (x^{k+1} - x^k).$
A derivation similar to (25) applied to (14) gives
$(\Gamma^{-1} x^{k+1} - \Gamma^{-1} y^{k+1})^T \big( x^k - \Gamma \nabla f(x^k) - (\tilde{H} M)^T s^k - y^{k+1} \big) \le (g \circ \Gamma)(\Gamma^{-1} x^{k+1}) - (g \circ \Gamma)(\Gamma^{-1} y^{k+1}),$
that is,
$(\Gamma^{-1} x^{k+1} - \Gamma^{-1} y^{k+1})^T (x^k - y^{k+1}) \le (\Gamma^{-1} x^{k+1} - \Gamma^{-1} y^{k+1})^T \big( \Gamma \nabla f(x^k) + (\tilde{H} M)^T s^k \big) + (g \circ \Gamma)(\Gamma^{-1} x^{k+1}) - (g \circ \Gamma)(\Gamma^{-1} y^{k+1}).$
Therefore, we deduce
$- \|x^{k+1} - x^k\|_{\Gamma^{-1}}^2 = - \|x^k - y^{k+1}\|_{\Gamma^{-1}}^2 - \|x^{k+1} - y^{k+1}\|_{\Gamma^{-1}}^2 + 2 (x^k - y^{k+1})^T \Gamma^{-1} (x^{k+1} - y^{k+1})$
$\le - \|x^k - y^{k+1}\|_{\Gamma^{-1}}^2 - \|x^{k+1} - y^{k+1}\|_{\Gamma^{-1}}^2 + 2 (\Gamma^{-1} x^{k+1} - \Gamma^{-1} y^{k+1})^T \big( \Gamma \nabla f(x^k) + (\tilde{H} M)^T s^k \big) + 2 \big[ (g \circ \Gamma)(\Gamma^{-1} x^{k+1}) - (g \circ \Gamma)(\Gamma^{-1} y^{k+1}) \big].$
Combining the above two results with (26), we can get (22). □
Lemma 6.
Let Assumptions 1 and 2 hold. Set $\beta = \mathrm{blkdiag}(\beta_i I_n)_{i \in V}$. For the matrix $P = \mathrm{blkdiag}\{\tilde{\Lambda}, \Gamma^{-1}\}$ and $u = (s, x)$, there holds
$\|u^{k+1} - u^*\|_P^2 \le \|u^k - u^*\|_P^2 - \|s^{k+1} - s^k\|_{\tilde{\Lambda}(I - M \Gamma M^T \tilde{\Lambda})}^2 - \|y^{k+1} - x^{k+1} + (\tilde{H} M)^T (s^{k+1} - s^k)\|_{\Gamma^{-1}}^2 - \|x^k - y^{k+1} - \Gamma \nabla f(x^k) + \Gamma \nabla f(x^*)\|_{\Gamma^{-1}}^2 - \|\nabla f(x^k) - \nabla f(x^*)\|_{2\beta - \Gamma}^2.$
Proof. 
Adding (21) and (22) and rearranging yields
$\|x^{k+1} - x^*\|_{\Gamma^{-1}}^2 + \|s^{k+1} - s^*\|_{\tilde{\Lambda}}^2 \le \|x^k - x^*\|_{\Gamma^{-1}}^2 + \|s^k - s^*\|_{\tilde{\Lambda}}^2 - \|x^k - y^{k+1}\|_{\Gamma^{-1}}^2 - \|s^{k+1} - s^k\|_{\tilde{\Lambda}}^2 - \|x^{k+1} - y^{k+1}\|_{\Gamma^{-1}}^2 + 2 \big( (\tilde{H} M)^T (s^{k+1} - s^k) \big)^T \Gamma^{-1} (y^{k+1} - x^{k+1}) + 2 \big( \Gamma \nabla f(x^k) - \Gamma \nabla f(x^*) \big)^T \Gamma^{-1} (x^k - y^{k+1}) - 2 \big( \Gamma \nabla f(x^k) - \Gamma \nabla f(x^*) \big)^T \Gamma^{-1} (x^k - x^*) + 2 \big( -\Gamma \nabla f(x^*) - (\tilde{H} M)^T s^* \big)^T \Gamma^{-1} (y^{k+1} - x^*) + 2 \big[ (g \circ \Gamma)(\Gamma^{-1} x^*) - (g \circ \Gamma)(\Gamma^{-1} y^{k+1}) \big].$
Further, completing the squares, we have
$\|x^{k+1} - x^*\|_{\Gamma^{-1}}^2 + \|s^{k+1} - s^*\|_{\tilde{\Lambda}}^2 \le \|x^k - x^*\|_{\Gamma^{-1}}^2 + \|s^k - s^*\|_{\tilde{\Lambda}}^2 - \|s^{k+1} - s^k\|_{\tilde{\Lambda}(I - M \Gamma M^T \tilde{\Lambda})}^2 - \|y^{k+1} - x^{k+1} + (\tilde{H} M)^T (s^{k+1} - s^k)\|_{\Gamma^{-1}}^2 + \|\Gamma \nabla f(x^k) - \Gamma \nabla f(x^*)\|_{\Gamma^{-1}}^2 - \|x^k - y^{k+1} - \Gamma \nabla f(x^k) + \Gamma \nabla f(x^*)\|_{\Gamma^{-1}}^2 - 2 \big( \nabla f(x^k) - \nabla f(x^*) \big)^T (x^k - x^*) + 2 \big( -\Gamma \nabla f(x^*) - (\tilde{H} M)^T s^* \big)^T \Gamma^{-1} (y^{k+1} - x^*) + 2 \big[ (g \circ \Gamma)(\Gamma^{-1} x^*) - (g \circ \Gamma)(\Gamma^{-1} y^{k+1}) \big].$
Then we deal with some of the terms in the above inequality. Combining (20) with Lemma 1, we can deduce
$-\Gamma \nabla f(x^*) - (\tilde{H} M)^T s^* \in \Gamma \partial g(x^*) = \partial (g \circ \Gamma)(\Gamma^{-1} x^*).$
Further using subdifferential properties, we have
$(\Gamma^{-1} y^{k+1} - \Gamma^{-1} x^*)^T \big( -\Gamma \nabla f(x^*) - (\tilde{H} M)^T s^* \big) + (g \circ \Gamma)(\Gamma^{-1} x^*) - (g \circ \Gamma)(\Gamma^{-1} y^{k+1}) \le 0.$
Meanwhile, because each $\nabla f_i$ is $(1/\beta_i)$-Lipschitz, it is $\beta_i$-cocoercive, so
$\big( \nabla f(x^k) - \nabla f(x^*) \big)^T (x^k - x^*) \ge \|\nabla f(x^k) - \nabla f(x^*)\|_{\beta}^2.$
Substituting the above results back into (28) yields (27). □
Lemma 7.
Under Assumptions 1–4, $\|u^k - u^*\|_P^2$ is non-increasing and $\lim_{k \to \infty} \|u^{k+1} - u^k\|_P^2 = 0$.
Proof. 
If Assumption 4 holds, we can deduce from (27) that $\|u^k - u^*\|_P^2$ is non-increasing.
Summing (27) over $k$ from $0$ to $N$, we obtain
$\|u^{N+1} - u^*\|_P^2 \le \|u^0 - u^*\|_P^2 - \sum_{k=0}^{N} \|s^{k+1} - s^k\|_{\tilde{\Lambda}(I - M \Gamma M^T \tilde{\Lambda})}^2 - \sum_{k=0}^{N} \|x^{k+1} - y^{k+1} + (\tilde{H} M)^T (s^{k+1} - s^k)\|_{\Gamma^{-1}}^2 - \sum_{k=0}^{N} \|x^k - y^{k+1} - \Gamma \nabla f(x^k) + \Gamma \nabla f(x^*)\|_{\Gamma^{-1}}^2 - \sum_{k=0}^{N} \|\nabla f(x^k) - \nabla f(x^*)\|_{2\beta - \Gamma}^2.$
Letting $N$ tend to infinity, we can get
$\sum_{k=0}^{\infty} \|s^{k+1} - s^k\|_{\tilde{\Lambda}(I - M \Gamma M^T \tilde{\Lambda})}^2 < \infty, \quad \sum_{k=0}^{\infty} \|x^{k+1} - y^{k+1} + (\tilde{H} M)^T (s^{k+1} - s^k)\|_{\Gamma^{-1}}^2 < \infty, \quad \sum_{k=0}^{\infty} \|x^k - y^{k+1} - \Gamma \nabla f(x^k) + \Gamma \nabla f(x^*)\|_{\Gamma^{-1}}^2 < \infty, \quad \sum_{k=0}^{\infty} \|\nabla f(x^k) - \nabla f(x^*)\|_{2\beta - \Gamma}^2 < \infty.$
This means
$\lim_{k \to \infty} \|s^{k+1} - s^k\|_{\tilde{\Lambda}(I - M \Gamma M^T \tilde{\Lambda})} = 0,$
$\lim_{k \to \infty} \|x^{k+1} - y^{k+1} + (\tilde{H} M)^T (s^{k+1} - s^k)\| = 0,$
$\lim_{k \to \infty} \|x^k - y^{k+1} - \Gamma \nabla f(x^k) + \Gamma \nabla f(x^*)\| = 0,$
$\lim_{k \to \infty} \|\nabla f(x^k) - \nabla f(x^*)\| = 0.$
Next, according to (32) and (33), we obtain
$\lim_{k \to \infty} \|x^k - y^{k+1}\| = 0.$
Meanwhile, if Assumption 4 holds, $\tilde{\Lambda}(I - M \Gamma M^T \tilde{\Lambda})$ is symmetric positive definite. Therefore, we can get
$\lim_{k \to \infty} \|s^{k+1} - s^k\| = 0.$
According to (31) and (35), we obtain $\lim_{k \to \infty} \|x^{k+1} - y^{k+1}\| = 0$. Combining this with (34), we get
$\lim_{k \to \infty} \|x^{k+1} - x^k\|^2 = 0.$
Then, according to (35) and (36), we get $\lim_{k \to \infty} \|u^{k+1} - u^k\|^2 = 0$. □
Next, we give the following theorem to prove the convergence of Algorithm 1.
Theorem 1.
Under Assumptions 1–4, $x^k$ and $u^k$ converge to an optimal solution of (2) and a fixed point of $T$, respectively.
Proof. 
Because $\mathrm{prox}_f$ and $I - \mathrm{prox}_f$ are firmly nonexpansive, $T$ is continuous. Then, $\lim_{k \to \infty} \|u^{k+1} - u^k\|_P^2 = 0$ and the non-increase of the sequence $\|u^k - u^*\|_P^2$ follow from Lemma 7. Based on Lemma 3, the sequence $u^k$ converges to a fixed point of $T$. According to Lemma 4, it can be concluded that $x^k$ converges to a solution of (2). □
At the same time, we also give the following theorem to prove the convergence of Algorithm 2.
Theorem 2.
Under Assumptions 1–4, relative to the solution set $S$, the sequence $(u^k)_{k \ge k_0}$, $k_0 \in \mathbb{N}$, is stochastically Fejér monotone with respect to $\|\cdot\|_{\Pi^{-1} P}$ [27]:
$\mathbb{E}\big[ \|u^{k+1} - u^*\|_{\Pi^{-1} P}^2 \mid \mathcal{F}_k \big] \le \|u^k - u^*\|_{\Pi^{-1} P}^2 - \|s^{k+1} - s^k\|_{\tilde{\Lambda}(I - M \Gamma M^T \tilde{\Lambda})}^2 - \|\nabla f(x^k) - \nabla f(x^*)\|_{2\beta - \Gamma}^2.$
Further, the sequence u k k k 0 converges almost surely to some u * S .
Proof. 
Before proving, we give some definitions. Here $\Pi = \sum_{i=1}^m p_i P_i$ denotes the probability matrix, and $\mathbb{E}[\cdot \mid \mathcal{F}_k]$ is the conditional expectation, abbreviated $\mathbb{E}_k[\cdot]$, where $\mathcal{F}_k$ represents the filtration generated by $\xi^1, \dots, \xi^k$. The operator $E^k = \sum_{i=1}^m \xi_i^k P_i$ maps $(\mathbb{R}^{(2|E|+m)n}, \mathcal{F}_{k-1})$-measurable variables to $(\mathbb{R}^{(2|E|+m)n}, \mathcal{F}_k)$-measurable ones.
Based on the definition of $\xi^k$, we have $\mathbb{E}[E^{k+1}] = \Pi$.
Using the idempotent property of $E^{k+1}$, we have
$\mathbb{E}_k \|u^{k+1} - u^*\|_{\Pi^{-1} P}^2 = \mathbb{E}_k \big\| u^k + E^{k+1} (T u^k - u^k) - u^* \big\|_{\Pi^{-1} P}^2 = \|u^k - u^*\|_{\Pi^{-1} P}^2 + \mathbb{E}_k \|E^{k+1} (T u^k - u^k)\|_{\Pi^{-1} P}^2 + 2 \, \mathbb{E}_k \big[ (u^k - u^*)^T \Pi^{-1} P E^{k+1} (T u^k - u^k) \big] = \|u^k - u^*\|_{\Pi^{-1} P}^2 + \|T u^k - u^k\|_P^2 + 2 (u^k - u^*)^T P (T u^k - u^k).$
Then, according to Lemma 6 and (23), we get
$\mathbb{E}_k \|u^{k+1} - u^*\|_{\Pi^{-1} P}^2 = \|u^k - u^*\|_{\Pi^{-1} P}^2 + \|T u^k - u^*\|_P^2 - \|u^k - u^*\|_P^2 \le \|u^k - u^*\|_{\Pi^{-1} P}^2 - \|s^{k+1} - s^k\|_{\tilde{\Lambda}(I - M \Gamma M^T \tilde{\Lambda})}^2 - \|\nabla f(x^k) - \nabla f(x^*)\|_{2\beta - \Gamma}^2.$
Therefore, if Assumption 4 holds, we obtain the convergence of (37) according to [28] (Th. 3), [27] (Prop. 2.3), and the Robbins–Siegmund lemma in [29]. □

5. Numerical Experiments

5.1. Case Study I: Performance Examination

We demonstrate the effectiveness of the algorithms in this section by solving a class of quadratic programming problems on undirected networks. The network topology is shown in Figure 1.
The quadratic programming problem model is as follows:
$\min_{x_1, \dots, x_m} \sum_{i=1}^m f_i(x_i) = \sum_{i=1}^m \big( x_i^T V_i x_i + b_i^T x_i \big) \quad \text{s.t.} \quad x_i^{\min} \le x_i \le x_i^{\max}, \ i = 1, \dots, m; \quad x_i = x_j, \ (i,j) \in E,$
where $x_i$ is the decision variable of each agent. The matrix $V_i$ in the objective function is diagonal, with elements randomly selected from $[-8, 8]$, and the elements of the vector $b_i$ are randomly selected from $[-10, -5]$. For the box constraint on $x_i$, $x_i^{\min}$ ranges over $[-10, -5]$ and $x_i^{\max}$ over $[5, 10]$.
To solve problem (38), we need to convert it into the form of problem (8). Defining the set $X_i = \{ e \in \mathbb{R}^2 \mid x_i^{\min} \le e \le x_i^{\max} \}$ and the indicator function $\delta_{X_i}(x_i)$, we get the following problem:
$\min_{x_i \in \mathbb{R}^2} \sum_{i=1}^m \big( f_i(x_i) + \delta_{X_i}(x_i) \big) + \sum_{(i,j) \in E} \delta_{C_{(i,j)}}(M_{(i,j)} x).$
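For this reformulation, the prox of the box indicator $\delta_{X_i}$ is coordinate-wise clipping, which is what each agent applies in steps 1 and 3 of Algorithm 1. A short sketch of drawing one agent's data under the stated ranges (the sampling code itself is an illustrative reading of the setup, not the authors' script):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2                                     # decision dimension per agent

# one agent's data drawn from the ranges stated above
V_diag = rng.uniform(-8.0, 8.0, size=n)   # diagonal of V_i
b = rng.uniform(-10.0, -5.0, size=n)      # linear term b_i
x_min = rng.uniform(-10.0, -5.0, size=n)  # lower box bound
x_max = rng.uniform(5.0, 10.0, size=n)    # upper box bound

def prox_box(x, lo, hi):
    """prox of the indicator of [lo, hi] = Euclidean projection = clipping."""
    return np.clip(x, lo, hi)

x = np.array([-20.0, 20.0])
print(prox_box(x, x_min, x_max))  # clipped into [x_min, x_max] componentwise
```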
Figure 2a shows that the agent finally converges to a consistent state through synchronous Algorithm 1. In Figure 2b, we use asynchronous Algorithm 2 with activation probability p i = 0.2 to describe the state of the agent under the same parameter conditions.
In Figure 3, the performance of both proposed algorithms is depicted through a comparison with existing algorithms, i.e., an ADMM-based method [30] and TriPD-Dist with its asynchronous version [2]; the comparison plots the logarithm of the average error $(1/m) \sum_{i=1}^m \|x_i^k - \tilde{x}^*\|$. It can be seen that Algorithm 1 outperforms the ADMM-based method and TriPD-Dist, and the proposed asynchronous algorithm (Algorithm 2) also converges faster than asynchronous TriPD-Dist.

5.2. Case Study II: First-Order Dynamics System

In this subsection, we apply the proposed synchronous algorithm to solve a first-order dynamics system problem in a 2-D space [31], where each agent has its own cost function $f_i(\tilde{p}) = \|\tilde{p} - \bar{p}_{x,i}\|^2 + \|\tilde{p} - \bar{p}_{y,i}\|^2$, with the action response $\tilde{p} = [\tilde{p}_x, \tilde{p}_y]^T$ and the private reference positions $\bar{p}_{x,i} = [i - 3.5, 0]^T$ and $\bar{p}_{y,i} = [0, i - 3.5]^T$. The goal of the considered problem is for all agents to cooperatively find the optimal position $\tilde{p}$ under the local constraints $\Omega_i = \{ \tilde{p} \in \mathbb{R}^2 \mid \|\tilde{p} - \bar{p}_i^0\|^2 \le 64 \}$, where $\bar{p}_i^0$ is the initial position of agent $i \in \{1, 2, 3, 4, 5, 6, 7\}$. Let $\bar{p}_1^0 = [4, 5.5]^T$, $\bar{p}_2^0 = [0, 7]^T$, $\bar{p}_3^0 = [6, 5]^T$, $\bar{p}_4^0 = [5, 3.5]^T$, $\bar{p}_5^0 = [0, 7]^T$, $\bar{p}_6^0 = [5, 5]^T$, $\bar{p}_7^0 = [7, 7]^T$; then the distributed problem can be formulated as
$\min_{p_1, \dots, p_m} \sum_{i=1}^m \big( \|p_i - \bar{p}_{x,i}\|^2 + \|p_i - \bar{p}_{y,i}\|^2 + \delta_{\Omega_i}(p_i) \big), \quad \text{s.t.} \quad p_i = p_j, \ (i,j) \in E,$
where $p_i \in \mathbb{R}^2$ is the local estimate of the action $\tilde{p}$. In light of (1), we can set $g_i(p_i) = \delta_{\Omega_i}(p_i)$. The step-sizes are selected as in Case Study I.
The results are described in Figure 4 and Figure 5. To be specific, Figure 4a,b reflect the trajectories of $p_i = [p_{x,i}, p_{y,i}]^T$. Figure 5 depicts the motions of the entire system over the iterations, where the optimal position $\tilde{p}^* = [0.6743, 0.2711]^T$ is marked by a cross at the intersection of two star lines, the circles with dotted lines are the corresponding motion areas of the agents, and the solid circles are the initial positions.

6. Conclusions

This paper studies a class of distributed composite optimization problems with non-smooth convex functions. To solve this kind of problem, it proposes two completely distributed algorithms, which are validated both theoretically and through simulations. Some aspects remain open for improvement. For example, the network structure could be extended from undirected graphs to directed graphs, and the algorithms could be combined with further practical application scenarios, such as resource allocation.

Author Contributions

Y.S.: Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Supervision, Writing—original draft. L.R.: Data curation, Formal analysis, Software. J.T.: Formal analysis, Software, Writing—original draft. X.W.: Methodology, Software, Formal analysis. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, Z.; Shi, W.; Yan, M. A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates. IEEE Trans. Signal Process. 2019, 67, 4494–4506. [Google Scholar] [CrossRef]
  2. Latafat, P.; Freris, N.M.; Patrinos, P. A new randomized block-coordinate primal-dual proximal algorithm for distributed optimization. IEEE Trans. Autom. Control 2019, 64, 4050–4065. [Google Scholar] [CrossRef]
  3. Bai, L.; Ye, M.; Sun, C.; Hu, G. Distributed economic dispatch control via saddle point dynamics and consensus algorithms. IEEE Trans. Control Syst. Technol. 2019, 27, 898–905. [Google Scholar] [CrossRef]
  4. Jin, B.; Li, H.; Yan, W.; Cao, M. Distributed model predictive control and optimization for linear systems with global constraints and time-varying communication. IEEE Trans. Autom. Control 2020, 66, 3393–3400. [Google Scholar] [CrossRef]
  5. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122. [Google Scholar] [CrossRef]
  6. Olshevsky, A. Linear time average consensus and distributed optimization on fixed graphs. SIAM J. Control. Optim. 2017, 55, 3990–4014. [Google Scholar] [CrossRef]
  7. Nedić, A.; Ozdaglar, A. Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 2009, 54, 48–61. [Google Scholar] [CrossRef]
  8. Shi, W.; Ling, Q.; Wu, G.; Yin, W. Extra: An exact first-order algorithm for decentralized consensus optimization. SIAM J. Optim. 2015, 25, 944–966. [Google Scholar] [CrossRef]
  9. Qu, G.; Li, N. Harnessing smoothness to accelerate distributed optimization. IEEE Trans. Control. Netw. Syst. 2018, 5, 1245–1260. [Google Scholar] [CrossRef]
  10. Chang, T.H.; Hong, M.Y.; Wang, X.F. Multi-agent distributed optimization via inexact consensus ADMM. IEEE Trans. Signal Process. 2015, 63, 482–497. [Google Scholar] [CrossRef] [Green Version]
  11. Iutzeler, F.; Bianchi, P.; Ciblat, P.; Hachem, W. Explicit convergence rate of a distributed alternating direction method of multipliers. IEEE Trans. Autom. Control 2016, 61, 892–904. [Google Scholar] [CrossRef]
  12. Shi, W.; Ling, Q.; Yuan, K.; Wu, G.; Yin, W.T. On the linear convergence of the ADMM in decentralized consensus optimization. IEEE Trans. Signal Process. 2014, 62, 1750–1761. [Google Scholar] [CrossRef]
  13. Wei, E.; Ozdaglar, A. Distributed alternating direction method of multipliers. In The 51st IEEE Conference on Decision and Control; IEEE: Maui, HI, USA, 2012; pp. 5445–5450. [Google Scholar]
  14. Chen, A.I.; Ozdaglar, A. A fast distributed proximal-gradient method. In Proceedings of the Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 1–5 October 2012; pp. 601–608. [Google Scholar]
  15. Shi, W.; Ling, Q.; Wu, G.; Yin, W. A proximal gradient algorithm for decentralized composite optimization. IEEE Trans. Signal Process. 2015, 63, 6013–6023. [Google Scholar] [CrossRef]
  16. Mao, X.; Yuan, K.; Hu, Y.; Gu, Y.; Sayed, A.H.; Yin, W. Walkman: A communication-efficient random-walk algorithm for decentralized optimization. IEEE Trans. Signal Process. 2020, 68, 2513–2528. [Google Scholar] [CrossRef]
  17. Zeng, J.; He, T.; Wang, M. A fast proximal gradient algorithm for decentralized composite optimization over directed networks. Syst. Control. Lett. 2017, 107, 36–43. [Google Scholar] [CrossRef]
  18. Condat, L. A primal-dual splitting method for convex optimization involving lipschitzian, proximable and linear composite terms. J. Optim. Theory Appl. 2013, 158, 460–479. [Google Scholar] [CrossRef]
  19. Chen, P.; Huang, J.; Zhang, X. A primal-dual fixed point algorithm for minimization of the sum of three convex separable functions. Fixed Point Theory Appl. 2016. [Google Scholar] [CrossRef]
  20. Latafat, P.; Patrinos, P. Asymmetric forward-backward-adjoint splitting for solving monotone inclusions involving three operators. Comput. Optim. Appl. 2017, 68, 57–93. [Google Scholar] [CrossRef]
  21. Yan, M. A new primal-dual algorithm for minimizing the sum of three functions with a linear operator. J. Sci. Comput. 2018, 76, 1698–1717. [Google Scholar] [CrossRef]
  22. Boyd, S.; Ghosh, A.; Prabhakar, B.; Shah, D. Randomized gossip algorithms. IEEE Trans. Inf. Theory 2006, 52, 2508–2530. [Google Scholar] [CrossRef] [Green Version]
  23. Ren, X.; Li, D.; Xi, Y.; Shao, H. Distributed subgradient algorithm for multi-agent optimization with dynamic stepsize. IEEE/CAA J. Autom. Sin. 2021, 8, 1451–1464. [Google Scholar] [CrossRef]
  24. Micchelli, C.A.; Shen, L.; Xu, Y. Proximity algorithms for image models: Denoising. Inverse Probl. 2011, 27, 45009. [Google Scholar] [CrossRef]
  25. Bauschke, H.H.; Combettes, P.L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces; Springer: New York, NY, USA, 2011. [Google Scholar]
  26. Hong, M.; Chang, T.-H. Stochastic proximal gradient consensus over random networks. IEEE Trans. Signal Process. 2017, 65, 2933–2948. [Google Scholar] [CrossRef]
  27. Combettes, P.L.; Pesquet, J.C. Stochastic quasi-Fejér block- coordinate fixed point iterations with random sweeping. SIAM J. Optim. 2015, 25, 1221–1248. [Google Scholar] [CrossRef]
  28. Bianchi, P.; Hachem, W.; Iutzeler, F. A coordinate descent primal-dual algorithm and application to distributed asynchronous optimization. IEEE Trans. Autom. Control 2016, 61, 2947–2957. [Google Scholar] [CrossRef]
  29. Robbins, H.; Siegmund, D. A convergence theorem for nonnegative almost supermartingales and some applications. In Herbert Robbins Selected Papers; Lai, T.L., Siegmund, D., Eds.; Springer: New York, NY, USA, 1985; pp. 111–135. [Google Scholar]
  30. Aybat, N.S.; Wang, Z.; Lin, T.; Ma, S. Distributed linearized alternating direction method of multipliers for composite convex consensus optimization. IEEE Trans. Autom. Control 2018, 63, 5–20. [Google Scholar] [CrossRef]
  31. Li, H.; Su, E.; Wang, C.; Liu, J.; Xia, D. A primal-dual forward-backward splitting algorithm for distributed convex optimization. IEEE Trans. Emerg. Top. Comput. Intell. 2021. [Google Scholar] [CrossRef]
Figure 1. Graph topology.
Figure 2. Convergence results of the two algorithms. (a) Algorithm 1. (b) Algorithm 2.
Figure 3. Performance comparison.
Figure 4. Evaluations of positions. (a) Evaluations of $p_{x,i}^k$. (b) Evaluations of $p_{y,i}^k$.
Figure 5. Motions of all agents in the 2-D space.
