Decentralized Primal-Dual Proximal Operator Algorithm for Constrained Nonsmooth Composite Optimization Problems over Networks

In this paper, we focus on nonsmooth composite optimization problems over networks, whose objectives consist of a smooth term and a nonsmooth term. Both equality constraints and box constraints on the decision variables are also considered. Over a multi-agent network, the objective problem is split among the agents so that it can be solved in a decentralized manner. By establishing the Lagrange function of the problem, the first-order optimality condition is obtained in the primal-dual domain. Then, we propose a decentralized algorithm based on proximal operators. The proposed algorithm has uncoordinated stepsizes with respect to agents and edges, and no global parameters are involved. By constructing a compact operator form of the algorithm, we complete the convergence analysis via fixed-point theory. Simulations on a constrained quadratic programming problem verify the effectiveness of the proposed algorithm.


Introduction
Recently, distributed data processing methods based on multi-agent networks have received much attention. Traditional methods gather all the data on one machine and perform the computation centrally. However, as the size of data continues to grow, this centralized strategy is limited by the computing power of the hardware. In contrast, distributed methods spread computing tasks across agents over decentralized networks [1,2]. Each agent keeps an arithmetic unit and a memory unit. The agents interact with each other through communication links, and this communication occurs only among neighboring agents. Under these conditions, distributed methods can effectively solve the optimization problems arising in sensor networks [3], economic dispatch [4][5][6], machine learning [7,8] and dynamic control [9].
The existing decentralized algorithms include a number of successful results [10][11][12][13][14][15]. Previous works considered problem models composed of a single function. With a fixed stepsize, Shi et al. designed EXTRA [10], which converges exactly to the optimal solution. Lei et al. studied problems with bound constraints and proposed a primal-dual algorithm [11]. In addition, recent works [16][17][18] investigated general distributed optimization by designing decentralized subgradient-based algorithms, but diminishing or non-summable stepsizes are utilized, which may cause slow convergence rates [19].
In order to make full use of these special structures, some scholars have studied nonsmooth composite optimization problems, which possess both smooth and nonsmooth terms. By extending EXTRA to nonsmooth composite optimization, Shi et al. proposed a proximal variant of EXTRA.

1.
This paper focuses on an optimization problem with partially smooth and partially nonsmooth objective functions, where the decision variable satisfies local equality and feasibility constraints, unlike the works [10,16,[18][19][20][21], which do not consider any constraints. To solve this problem, we propose a novel decentralized algorithm by combining the primal-dual framework with proximal operators, which avoids the estimation of subgradients for the nonsmooth terms.

2.
Different from existing node-based methods [16][17][18][19][20][21], the proposed algorithm adopts an edge-based communication pattern that explicitly highlights the process of information exchange among neighboring agents and further removes the dependence on Laplacians [13]. This consideration also makes it possible to use uncoordinated stepsizes instead of the global or dynamic ones commonly used [10,12,16,18,19,21].

3.
By employing the first-order optimality conditions and the fixed-point theory of operators, convergence is proved, and a sublinear rate O(1/k) (where k is the iteration number) is established; i.e., at most O(1/ε) iterations are needed to reach an accuracy of ε.
Organization: The rest of this paper is organized as follows. In Section 2, the necessary notations and basic knowledge are first provided, and then we describe the optimization problem over the networks and necessary assumptions. Section 3 supplies the development of the proposed decentralized algorithm. In Section 4, the convergence analysis for the proposed algorithm is provided. In Section 5, we use the simulation experiments to verify the theoretical analysis. Finally, conclusions are given in Section 6.

Preliminaries
In this section, we introduce the notations involved in this paper. Meanwhile, the objective problem and its explanation are also supplied.

Graph Theory and Notations
The knowledge of graph theory is used to construct the mathematical model of the communication network. Let G = (V, E) describe the network as a graph, where V is the set of vertices and E ⊂ V × V is the set of edges. For an agent i ∈ V, N_i denotes the set of its neighbors. Let the unordered pair (i, j) ∈ E represent the edge between agent i and agent j. Note, however, that (i, j) and (j, i) are still treated as ordered in what follows; i.e., the variables associated with them are different.
Next, we explain the notations that appear in this paper. Let R represent the set of real numbers. Accordingly, R^n denotes the n-dimensional vector space, and R^{n×m} denotes the set of all n-row and m-column real matrices. We define I_n as the n-dimensional identity operator, 0_n as the n-dimensional null vector, and 0_{n×n} as the null matrix. If their dimensions are clear from the context, we omit the subscripts. Then, blkdiag{P, Q} is the block diagonal matrix grouped from matrices P and Q. For a matrix P, let P^⊤ be its transpose. We denote ‖x‖_P = √(x^⊤ P x) as the induced norm with a positive definite matrix P. The subdifferential of a function f is denoted by ∂f, and the proximal operator of f under the metric M is defined as prox_f^M(y) = arg min_x f(x) + (1/2)‖x − y‖²_M. Moreover, let S represent the optimal solution set of a solvable optimization problem over networks.
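As a concrete illustration of the proximal operator defined above, the following minimal sketch (in Python with NumPy, not the paper's own code) computes the proximal operator of the ℓ1 norm, whose closed form is componentwise soft-thresholding, together with the induced norm ‖x‖_P; the function names are our own.

```python
import numpy as np

def prox_l1(y, t):
    """Proximal operator of t*||.||_1: argmin_x t*||x||_1 + 0.5*||x - y||^2.
    Closed form: componentwise soft-thresholding."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def weighted_norm(x, P):
    """Induced norm ||x||_P = sqrt(x' P x) for positive definite P."""
    return float(np.sqrt(x @ P @ x))

y = np.array([2.0, -0.5, 0.1])
print(prox_l1(y, 1.0))  # each component shrunk toward zero by 1
```

For M = I, prox_f^M reduces to the classical proximal operator used here; a general diagonal positive definite M simply rescales the quadratic term componentwise.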

Decentralized Optimization Problem
The constrained composite optimization problem over networks studied in this paper is based on the network G = {V, E} with m agents. Specifically, the formulation of the problem is established as follows:

min_{x̃ ∈ R^n}  Σ_{i=1}^m [ f_i(x̃) + g_i(x̃) ],  s.t.  A_i x̃ = b_i,  x̃ ∈ Ω_i,  i ∈ V.   (1)

In problem (1), x̃ ∈ R^n is the decision variable; f_i : R^n → R ∪ {+∞} and g_i : R^n → R ∪ {+∞} are two private cost functions of agent i, where the former has a Lipschitz continuous gradient while the latter may be nonsmooth; b_i ∈ R^r is a vector and A_i : R^n → R^r is a linear operator. The convex set Ω_i imposes the box constraints on the decision variable of agent i.
To clarify the properties of problem (1), the following necessary assumption is given.

Assumption 1.
For any agent i ∈ V:
(i) The cost function f_i is convex and its gradient ∇f_i is Lipschitz continuous; i.e., there exists a positive Lipschitz constant β_i such that ‖∇f_i(x) − ∇f_i(y)‖ ≤ β_i ‖x − y‖ for all x, y ∈ R^n.
(ii) The local cost function g_i is a nonsmooth convex function.
(iii) The optimal solution x̃* to the objective problem (1) exists, which satisfies both the equality constraints and the box constraints.
(iv) The graph G is undirected and connected.
Note that the cost functions f_i and g_i are separable. Hence, we introduce local copies x_i ∈ R^n of the decision variable and the consensus constraints to transform problem (1) into a structure that can be computed in a decentralized manner:

min_{x_1,...,x_m}  Σ_{i∈V} [ f_i(x_i) + g_i(x_i) ],  s.t.  A_i x_i = b_i,  x_i ∈ Ω_i,  x_i = x_j,  i ∈ V, j ∈ N_i.   (3)

Problem (3) can be processed by the penalty function method. For i ∈ V and j ∈ N_i, let C_ij = I if i < j and C_ij = −I otherwise, so that the consensus constraint x_i = x_j can be written as C_ij x_i + C_ji x_j = 0. Thus, problem (3) is equivalent to the following problem:

min_{x_1,...,x_m}  Σ_{i∈V} [ f_i(x_i) + g_i(x_i) ],  s.t.  A_i x_i = b_i,  x_i ∈ Ω_i,  C_ij x_i + C_ji x_j = 0,  i ∈ V, j ∈ N_i.   (4)

Then, let x = col(x_1, ..., x_m) be the global variable. For i ∈ V and j ∈ N_i, we introduce a linear operator N_(i,j) : x ↦ ((C_ij x_i)^⊤, (C_ji x_j)^⊤)^⊤, which generates the edge-based variable from x. With the set C_(i,j) = {(z_1, z_2) | z_1 + z_2 = 0}, the consensus constraint in problem (4) can be transformed into another penalty (indicator) function. Therefore, with δ_S denoting the indicator function of a set S, problem (1) is finally equivalent to the following unconstrained problem:

min_x  Σ_{i∈V} [ f_i(x_i) + g_i(x_i) + δ_{Ω_i}(x_i) + δ_{{b_i}}(A_i x_i) ] + Σ_{i∈V} Σ_{j∈N_i} δ_{C_(i,j)}(N_(i,j) x).   (5)

Based on problem (5), we design a novel decentralized algorithm to solve the constrained composite optimization problem over networks in the next section.
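The sign convention C_ij = I for i < j (and −I otherwise) and the edge-based operator N_(i,j) can be sketched as follows (Python, illustrative only; the function names are our own). The key point is that the consensus x_i = x_j holds exactly when the stacked edge variable lies in C_(i,j).

```python
import numpy as np

def C(i, j, x_i):
    """Apply C_ij: identity if i < j, negation otherwise."""
    return x_i if i < j else -x_i

def edge_variable(i, j, x):
    """N_(i,j): stack (C_ij x_i, C_ji x_j) into one edge-based vector."""
    return np.concatenate([C(i, j, x[i]), C(j, i, x[j])])

# Consensus x_i = x_j holds exactly when the two halves sum to zero,
# i.e. the edge variable lies in C_(i,j) = {(z1, z2) : z1 + z2 = 0}.
x = {0: np.array([1.0, 2.0]), 1: np.array([1.0, 2.0])}
z = edge_variable(0, 1, x)
n = len(z) // 2
print(np.allclose(z[:n] + z[n:], 0))  # True: agents 0 and 1 agree
```

The opposite signs on the two endpoints of each edge are what allow the pairwise constraint x_i = x_j to be expressed without any Laplacian weight matrix.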

Algorithm Development
This section presents the design process of the proposed algorithm.
Notice that problem (5) is an unconstrained problem. According to [32] (Proposition 19.20), we obtain the following Lagrangian function: where v_i ∈ R^n, u_i ∈ R^n and w_(i,j) ∈ R^2n are the dual variables associated with the indicator functions δ in problem (5) through their convex conjugates δ*. Notice that w_(i,j) = ((w_ij)^⊤, (w_ji)^⊤)^⊤ ∈ R^2n is an edge-based variable, where w_ij ∈ R^n is the local variable of agent i and w_ji ∈ R^n is that of agent j. Then, the last term of the Lagrangian function (6) satisfies: Thus, the Lagrangian function (6) can also be written as Taking the partial derivatives of the Lagrangian function (7) and combining the operator splitting method [29], we propose a new update flow, (8), in which the barred quantities are auxiliary variables, and γ_i, σ_i, and µ_i are positive stepsizes. Notice that the stepsizes are uncoordinated: they can be selected independently by different agents, each within its own acceptable range. Additionally, the edge-based parameters ω_(i,j) can be seen as inherent parameters of the communication network, reflecting the quality of the communication.
The steps related to the edge-based variables in update flow (8) cannot be conducted directly, so we next replace them with agent-based variables. We apply the Moreau decomposition to the first step in update flow (8), such that for the second term on the right side we obtain (9), which is the projection P_{C_(i,j)}(ω_(i,j)^{−1} w^k_(i,j) + N_(i,j) x^k). Then, according to the definition of the set C_(i,j), the projection has the following explicit expression: Thus, for i ∈ V, j ∈ N_i, the update step for w̄_(i,j) can be decomposed into (10). Moreover, the update step for w_(i,j) can be replaced by (11). Combining the update flow (8) with (10) and (11), we finally propose the decentralized algorithm for problem (1) in Algorithm 1.
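The projection onto C_(i,j) = {(z_1, z_2) | z_1 + z_2 = 0} admits a simple closed form, consistent with the set definition: subtract the average of the two halves from each. A short sketch (Python, illustrative only; the function name is our own):

```python
import numpy as np

def proj_C(z1, z2):
    """Euclidean projection of (z1, z2) onto {(z1, z2) : z1 + z2 = 0}:
    subtract the average of the two halves from each."""
    avg = 0.5 * (z1 + z2)
    return z1 - avg, z2 - avg

z1, z2 = np.array([3.0, 1.0]), np.array([1.0, -1.0])
p1, p2 = proj_C(z1, z2)
print(p1 + p2)  # zero vector: the projection lands in the set

# Moreau decomposition: for f = indicator of C, prox_f is the projection
# and prox_{f*} is the residual, so z = proj_C(z) + (z - proj_C(z)).
```

Because each half of the projection depends only on the two endpoints of the edge, agent i can evaluate its half after one exchange with neighbor j, which is what makes the replacement by agent-based variables possible.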
Here, we directly give the stepsize condition of Algorithm 1 in the following assumption. The specific theoretical origin of this condition can be found in the convergence analysis section.

Assumption 2. (Stepsize conditions)
For any agent i ∈ V and j ∈ N i , the stepsizes γ i , µ i , σ i and ω (i,j) are positive. Let the following condition hold: where β i is the Lipschitz constant for the gradient ∇ f i .

Algorithm 1 The Decentralized Algorithm
Initialization: For each agent i ∈ V and all j ∈ N_i, let w̄_ij^0 ∈ R^n, ū_i^0 ∈ R^n, v̄_i^0 ∈ R^n, x_i^0 ∈ R^n, w_ij^0 ∈ R^n, u_i^0 ∈ R^n and v_i^0 ∈ R^n.
For k = 0, 1, 2, . . . do: each agent i repeats the updates, for all j ∈ N_i, to estimate the optimal solution.

Convergence Analysis
In this section, we first establish the compact form with operators of the proposed algorithm. Then, the results of the theoretical analysis are provided.
Define two variables U = col(w, u, v, x) and Ū = col(w̄, ū, v̄, x̄). Based on the equalities in (12), Algorithm 1 is equivalent to the following compact form described by operators: where the operators are given as follows: Consider one iteration of the proposed algorithm as an operator T. Then we let U* = col(w*, u*, v*, x*) be the fixed point of the operator T, such that U* = TU*. Next, we conduct the convergence analysis.
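The fixed-point viewpoint above can be illustrated with a generic iteration loop (Python, illustrative only; the operator T below is a toy contraction, not the paper's composite operator) that monitors the fixed-point residual ‖U^{k+1} − U^k‖:

```python
import numpy as np

def fixed_point_iterate(T, U0, tol=1e-10, max_iter=10000):
    """Iterate U^{k+1} = T(U^k) until the fixed-point residual is small."""
    U = U0
    for k in range(max_iter):
        U_next = T(U)
        if np.linalg.norm(U_next - U) <= tol:
            return U_next, k
        U = U_next
    return U, max_iter

# Toy contraction: T(U) = 0.5*U + 1 has the unique fixed point U* = 2.
U_star, iters = fixed_point_iterate(lambda U: 0.5 * U + 1.0, np.array([0.0]))
print(U_star)  # close to [2.]
```

For the algorithm at hand, T is not a strict contraction, which is why the analysis below works with monotonicity and cocoercivity of its components rather than a contraction factor.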

Lemma 1 (Optimality analysis). Let Assumption 1 be satisfied. Then, the fixed point U* of the operator T meets the first-order optimality conditions of the objective problem, and x* ∈ S is an optimal solution.
Proof. Substituting the fixed point into (12), we have the following set of equalities: which is also the KKT condition of the Lagrangian function (6). Therefore, x * is an optimal solution to problem (1).
The relationship between the fixed point and the optimal solution is ensured by Lemma 1. Split the operator T H as T H = T P + T K , where we let and further define another linear operator With these definitions above, the following lemma provides the property of the operator T for convergence analysis.

Lemma 2.
Under Assumption 1, there exists the following inequality for U * : where B = blkdiag{β i I n } for i ∈ V is the Lipschitz parameter matrix.

Proof.
With the definition of the operator T_D, we have the equality According to [32] (Theorem 18.16), for i ∈ V, ∇f_i is cocoercive; i.e., it holds Note that for any vectors a and b of the same dimension and any diagonal positive definite matrix V, there exists the inequality a^⊤ b ≤ ‖a‖²_V + (1/4)‖b‖²_{V⁻¹}. Hence, we have Combining (14)–(16), we can obtain the objective inequality and end the proof.
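The weighted inequality invoked in this proof is a form of Young's inequality; a one-line derivation (ours, not reproduced from the paper) follows from expanding a square:

```latex
0 \le \left\| V^{1/2} a - \tfrac{1}{2} V^{-1/2} b \right\|^2
  = \|a\|_V^2 - a^\top b + \tfrac{1}{4}\,\|b\|_{V^{-1}}^2
\quad\Longrightarrow\quad
a^\top b \le \|a\|_V^2 + \tfrac{1}{4}\,\|b\|_{V^{-1}}^2 .
```

The diagonal positive definite V here plays the role of the uncoordinated stepsize matrices, which is what allows each agent to absorb its own cross term independently.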

Lemma 3.
Under Assumption 1, the following inequality holds for U*: where T_P is defined before Lemma 2.
Proof. Considering the change of the optimal residual before and after one iteration, we have From the second step of the update flow (13), there exists such that the equality (18) leads to From the first step of the update flow (13), it holds that Thus, we further have Then, we discuss the right side of (19). Note that Lemma 1 proves the equivalence between the fixed point and the optimal solution. Substituting the property of fixed points into the update flow (13), we obtain U* = Ū* and Hence, the third term on the right side of (19) satisfies where the inequality is based on Lemma 2. Notice that the operator T_A is monotone [32] (Theorem 21.2 and Proposition 20.23); i.e., it holds Since the linear operator T_Q is a skew-symmetric matrix, it is monotone [29]. Combining (19)–(21), we obtain From the second step of the update flow (13), it holds where T_H, T_Q and T_S are linear operators. Considering that T_P is also a linear operator, the second term on the right side of (22) has an equivalent form: Substituting (23) into (22), we complete the proof.
Summarizing the above lemmas, the following theorem supplies the convergence results.
Theorem 1. When Assumptions 1 and 2 are satisfied, for the sequence {U^k}_{k≥0} generated by the operator T, we have where B̄ = blkdiag{0_{n×n}, 0_{n×n}, 0_{n×n}, (1/4)B}. Then, the sequence {U^k}_{k≥0} converges with the sublinear rate O(1/k), and the sequence {x^k}_{k≥0} converges to an optimal solution x* ∈ S.
Proof. With the definition of B̄, we have the following equality: Substituting (25) into (17), we obtain the inequality (24). In Theorem 1, positive definiteness is required of the induced metric matrices, which leads to the stepsize conditions in Assumption 2.

Numerical Simulation
In this section, the theoretical analysis is verified through numerical simulations on a constrained optimization problem over networks.
The constrained quadratic programming problem [33] is considered in the experiments, which has the following formulation:

min_{x̃}  Σ_{i∈V} [ ‖x̃‖²_{E_i} + e_i^⊤ x̃ + ρ_i ‖x̃‖₁ ],  s.t.  A_i x̃ = b_i,  x_min ≤ x̃ ≤ x_max,  i ∈ V,   (26)

where the matrix E_i ∈ R^{n×n} is diagonal and positive definite, e_i ∈ R^n is a vector, and ρ_i is the penalty factor. Both x_min and x_max are constant vectors, which give the bounds of the decision variable x̃. In light of (1), we can set f_i(x̃) = ‖x̃‖²_{E_i} + e_i^⊤ x̃ and g_i(x̃) = ρ_i ‖x̃‖₁. In this case, the dimension of the decision variable is set as n = 4, and we let r = 1. For i ∈ V, the data of problem (26) are selected randomly: the elements of the matrix E_i are in [1, 2], the elements of the linear operator A_i are in [1, 15], and both vectors e_i and b_i take values in [−5, 5]. The box constraints are set as [−2.5, 2.5]. Then, we set the uncoordinated stepsizes randomly as γ_i ∈ [0.005, 0.006], while σ_i, µ_i and ω_(i,j) are in [5, 6]. The numerical experiments are performed over the generated network with eight agents, which is displayed in Figure 1. The simulations are carried out by running the distributed algorithms on a laptop with an Intel(R) Core i5-5500U CPU @ 2.40 GHz, 8.0 GB of RAM, and Matlab R2016a on the Windows 10 operating system. The convergence of the decision variables is shown in Figure 2, in which a node-based consensus algorithm [34] is introduced as a comparative profile. Note that the optimal solution obtained by the proposed algorithm is in line with that of the node-based consensus one, i.e., x̃* = [0.6900, 0.6270, 0.8046, 0.4400]^⊤, but the latter achieves stable consensus only after 15,000 iterations. Figure 3 shows that our proposed algorithm outperforms the node-based and subgradient algorithms [35] in terms of convergence performance, as measured by the relative error (Σ_{i=1}^m ‖x_i^k − x̃*‖)/(m‖x̃*‖).
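As a rough cross-check of this experiment (in Python, not the paper's Matlab code), the following sketch solves a relaxed single-agent version of problem (26) by proximal gradient descent, dropping the equality constraint for brevity; the function name and the random data are our own, with ranges mirroring those described above.

```python
import numpy as np

def prox_grad_qp_l1(E, e, rho, lo, hi, x0, step, iters=5000):
    """Centralized proximal-gradient sketch for
    min_x  x' E x + e' x + rho * ||x||_1  s.t. lo <= x <= hi
    (equality constraints omitted for brevity). Assumes lo <= 0 <= hi,
    so the prox of the l1 term plus box is soft-threshold then clip."""
    x = x0.copy()
    for _ in range(iters):
        grad = 2.0 * E @ x + e                      # gradient of smooth part
        y = x - step * grad                         # forward (gradient) step
        y = np.sign(y) * np.maximum(np.abs(y) - step * rho, 0.0)  # soft-threshold
        x = np.clip(y, lo, hi)                      # box projection
    return x

# Hypothetical data in the ranges described above (E diagonal in [1, 2], etc.).
rng = np.random.default_rng(0)
E = np.diag(rng.uniform(1.0, 2.0, 4))
e = rng.uniform(-5.0, 5.0, 4)
x = prox_grad_qp_l1(E, e, rho=1.0, lo=-2.5, hi=2.5,
                    x0=np.zeros(4), step=0.2)
print(x)  # a stationary point of the relaxed problem
```

The step size 0.2 is safe here because the gradient of the smooth part is Lipschitz with constant 2·max_i E_ii ≤ 4, so any step below 1/4 suffices.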

Conclusions
In this paper, a distributed algorithm based on proximal operators has been designed to solve a class of distributed composite optimization problems, in which each local function has a smooth plus nonsmooth structure and the decision variable is subject to both affine equality and box constraints. Distinguishing attributes of the proposed algorithm include the use of uncoordinated stepsizes and the edge-based communication pattern that avoids dependency on Laplacian weight matrices. Meanwhile, the algorithm has been verified in theory and simulation. However, there are still some aspects worthy of improvement. For example, it is worth adopting efficient acceleration protocols (such as Nesterov-based and heavy-ball methods) to improve the convergence rate, and developing asynchronous distributed algorithms to deal with communication latency. In addition, more general optimization models and more efficient algorithms should be investigated in order to address potential applications, e.g., [36][37][38] with nonconvex objectives and coupled or nonlinear constraints.