Distributed Newton Methods for Strictly Convex Consensus Optimization Problems in Multi-Agent Networks

Various distributed optimization methods have been developed for consensus optimization problems in multi-agent networks. Most of these methods only use gradient or subgradient information of the objective functions, which suffer from slow convergence rate. Recently, a distributed Newton method whose appeal stems from the use of second-order information and its fast convergence rate has been devised for the network utility maximization (NUM) problem. This paper contributes to this method by adjusting it to a special kind of consensus optimization problem in two different multi-agent networks. For networks with Hamilton path, the distributed Newton method is modified by exploiting a novel matrix splitting techniques. For general connected multi-agent networks, the algorithm is trimmed by combining the matrix splitting technique and the spanning tree for this consensus optimization problems. The convergence analyses show that both modified distributed Newton methods enable the nodes across the network to achieve a global optimal solution in a distributed manner. Finally, the distributed Newton method is applied to solve a problem which is motivated by the Kuramoto model of coupled nonlinear oscillators and the numerical results illustrate the performance of the proposed algorithm.


Introduction
A number of problems that arise in the context of wired and wireless networks can be posed as the minimization of a sum of functions, when each component function is available only to a specific agent [1].Decentralized consensus optimization problems are an important class of these problems [2].To solve these problems, distributed methods-which only require the agents to locally exchange information between each other-gain a growing interest with every passing day.Nedic and Ozdaglar [3,4] proposed distributed subgradient methods and provided convergence results and convergence rate estimates for this method.Some extensions [5,6] of this method were subsequently proposed.Ram et al. [5] adjusted a distributed subgradient method to address the problem of vertically and horizontally distributed regression in large peer-to-peer systems.Lobel and Ozdaglar [6] studied the consensus optimization problem over a time-varying network topolopy and proposed a distributed subgradient method that uses averaging algorithms for locally sharing information among the agents.Moreover, Ram et al. [1] proposed a distributed stochastic subgradient projection algorithm and explored the effects of stochastic subgradient errors on the convergence of the algorithm.These methods only used gradient or subgradient information of the objective functions, which suffered from slow convergence rate.Apart from these gradient or subgradient methods, Mota et al. [7] combined the centering alternating direction method of multiplier (ADMM) [8] and node-coloring technique and proposed a distributed ADMM (D-ADMM) algorithm for the consensus optimization problem.This method makes some improvements in convergence rate over distributed subgradient methods.Compared with conventional centralized methods, the distributed methods have faster computing efficiency and have been widely used in many fields, such as image processing [9,10], computer vision [11], intelligent power grids [12,13], machine learning [14,15], unrelated parallel machine scheduling problems [16], model predictive control (MPC) problems [17], and resource allocation problems in multi-agent communication networks [18,19].
These distributed algorithms mentioned above are all first-order methods, since they only use gradient or subgradient information of the objective function.To substitute for the distributed gradient method for solving the unconstrained minimization problem mentioned by Nedic and Ozdaglar [3], Mokhtari et al. [20] proposed a network Newton (NN)-K method based on the second-order information, where K is the number of Taylor series terms of the Newton step.NN-K can be implemented through the aggregation of information in K-hop neighborhoods in every iteration.Consequently, the communication between the adjacent nodes will increase exponentially with the augment of the number of iterations.To ensure the iterative results closer to the optimal value, a larger K should be selected and it is time-consuming-especially for large-scale networks.
Another second-order method-the distributed Newton method-was proposed by Wei et al. [21] to solve the network utility maximization (NUM) problem in a distributed manner.NUM can be formulated as a convex optimization problem with equality constraints by introducing some slack variables and the coefficient matrix of the equality constraints having full row rank.This distributed Newton-type second-order algorithm achieves superlinear convergence rate in terms of primal iterations, but it cannot solve consensus optimization problems in multi-agent networks.Tracing its root, the coefficient matrix of the constraint does not have full row rank, and predetermined routes cannot be given in the general optimization problem.
The distributed Newton method addressed in this study aims to solve the problem of minimizing a sum of strictly convex objective functions where the components of the objective are available at different nodes of a network.This paper adds to the growing body of knowledge regarding distributed second order methods.The contributions made by this paper are three-fold.

•
Adjusting the distributed Newton algorithm for the NUM problem to a special kind of consensus optimization problem in multi-agent networks with a Hamilton path.To overcome the obstacle, computation of the dual step involves the global information of the Hessian matrix, and an iterative scheme based on a novel matrix splitting technique is devised.Further, the convergence of the distributed Newton algorithm is proved theoretically.

•
A modified distributed Newton algorithm is proposed for consensus optimization problems in connected multi-agent networks.The coefficient matrix has full row rank by constructing a spanning tree of the connected network.Combined with the matrix splitting technique for NUM, the distributed Newton method for multi-agent convex optimization is proposed and a theory is presented to show the global convergence of the method.

•
The effectiveness of the modified distributed Newton methods is demonstrated by a numerical experiment.The experiment is based on the Kuramoto model of coupled nonlinear oscillators.
The proposed distributed Newton method can be applied to solve this model more efficiently compared with two first-order methods The rest of the paper is organized as follows: Section 2 provides some necessary preliminaries.Section 3 formulates the general multi-agent strictly convex consensus optimization problem in connected networks.Section 4 presents a distributed inexact Newton method in networks with a Hamilton path.A solution algorithm to solve the problem in general connected networks is proposed in Section 5. Section 6 presents the simulation results to demonstrate convergence properties of the algorithms.Finally, conclusions and recommendations for future work are provided in Section 7.

Preliminaries
Consider a connected network with P nodes and E edges modeled by a undirected graph G = (V, E ), where V = {1, 2, • • • , P} is the set of nodes and E ⊂ V × V is the set of edges.
Referring to Wei et al. [21], the NUM problem can be written as follows: where f i (x i ) is a strictly convex function, matrix A has full row rank, and c is a constant vector.This problem can be solved by an exact Newton method, where x k is the Newton direction given as the solution of the following system of linear equations where x k is the primal vector, ω k is the dual vector, ∇ f (x k ) is the gradient vector, and ∇ 2 f (x k ) is the Hessian matrix.Moreover, ∇ 2 f (x k ) is abbreviated as H k for notational convenience.Solving x k and ω k in the preceding system yields Since f (x) is a separable, strictly convex function, its Hessian matrix is a positive definite diagonal matrix, and hence Equation (4) can be easily computed by a distributed iterative scheme.Wei et al. [21] proposed a distributed Newton method for the NUM problem (1) by using a matrix splitting scheme to compute the dual vector ω k in Equation ( 5) in a distributed manner.Let C k be a diagonal matrix, with diagonal entries Matrix B k is given by Let matrix Bk be a diagonal matrix with diagonal entries By splitting the matrix AH −1 k A as the sum of C k + Bk and B k − Bk , the following theorem [21] can be obtained.
Theorem 1.For a given k > 0, let C k , B k , Bk be the matrices defined in Equations ( 6)- (8).Let ω 0 be an arbitrary initial vector.We can obtain the sequence {ω t } by the following iteration Then, the spectral radius of the matrix ) is strictly bounded above by 1 and the sequence {ω t } converges to the solution of Equation ( 5) as t → ∞.
Note that the predetermined routes and full row rank coefficient matrix are necessary when running the distributed Newton method for the NUM problem according to Reference [21].Unfortunately, this property is usually not met in the general multi-agents consensus optimization problems.

Problem Formulation
For the multi-agents consensus optimization problems proposed in this paper, only agent p has access to its private cost function f p and can communicate with its neighbors using the network infrastructure.This situation can be illustrated in Figure 1; i.e., node 2 can communicate with its adjacent nodes 1, 3, 6, and 7. Node i has its own objective function f i , and all nodes cooperate in minimizing the aggregate cost function f (x) where x ∈ R n is the global optimization variable.This problem is also known as the consensus optimization problem and its optimal solution is donated as x * .A common technique to decouple problem ( 10) is to assign copies of the global variable x to each node and then constrain all copies to be equal.Denoting the copy held by node p with x p ∈ R n , problem (10) is written equivalently as Problem ( 11) is no longer coupled by the common variable in all f p , but instead by the new equations x i = x j , for all pairs of edges in the network {i, j} ∈ E .These equations enforce all copies to be equal while the network is connected.Note that they can be written more compactly as (B T I n )x = 0, where B ∈ R P×E is the node arc-incidence matrix of the graph, I n is the identity matrix in R n , and is the Kronecker product, x = (x 1 , x 2 , • • • , x P ) ∈ (R n ) P is the optimization variable.Each column of B is associated with an edge (i, j) ∈ E and has 1 and -1 in the ith and jth entry, respectively; the remaining entries are zeros.Problem (11) can be rewritten as where A is the coefficient matrix taking values A = B T I n .In this paper we assume that the local costs f p are twice differentiable and strongly convex.

Distributed Newton Method For Multi-Agent Consensus Optimization Problems in Networks with a Hamilton Path
For some networks with particular topology structures (e.g., a Hamilton path), we can use special techniques to solve the proposed consensus optimization problem.In this section, a novel matrix splitting technique is devised for multi-agent consensus optimization problems in networks with a Hamilton path, which travels every node in the network just once.For simplicity, we renumber these nodes from 1 to P (P = 11) along this path as depicted in Figure 2. We know that every dual variable corresponds to one link, so ω i (i = 1, 2, • • • , P − 1) can be used to denote the dual variable ω i,i+1 , which is stored in node i.In Figure 2, node 0 is the copy of node 1 and it is actually non-existent.We add the definition of ω 0 = 0 and ω P = 0 for the sake of analysis.From Figure 2, the coefficient matrix A in problem ( 12) is a dual-diagonal matrix given by where I is an identity matrix of dimension n.Let M k be a diagonal matrix with diagonal entries where is the diagonal block of the Hessian matrix.Matrix N k is given by By splitting matrix AH −1 k A as the sum of M k and N k , a Jacobian iteration can be used to compute the dual vector ω k in (5) in a distributed manner.
Theorem 2. For a given k > 0, let M k and N k be the matrices defined in ( 14) and (15) and ω 0 be an arbitrary initial vector.The sequence {ω t } can be obtained by the iteration Then, the spectral radius of the matrix M −1 k N k is strictly bounded above by 1 and the sequence {ω t } converges to the solution of (5) as t → ∞.
Proof of Theorem 2. The proof is described in Appendix A.
There are many ways to split the matrix AH −1 k A , Jacobian iteration is selected in our method due to two reasons.Firstly, considering the special structure of the matrices A and H −1 k , the spectral radius of Jacobian matrix can be guaranteed strictly bounded above by 1 and thus the sequence {ω k } converges as k → ∞.Secondly, the matrix M k is diagonal, which guarantees that the dual variable is updated without global information.
Next, a distributed computation procedure to calculate the dual vector will be developed by rewriting the iteration (16).
Theorem 3. The dual iteration ( 16) can be written as where Proof of Theorem 3. The proof is described in Appendix B.
From this theorem, each link variable ω i is updated using its private result, Q i , ∇ i f (x k ), and the information from its neighbors; i.e., . The adjacent nodes' information is obtained directly through the information exchange.Therefore, the dual variable can be obtained in a distributed manner.
Once the dual variables are computed, the primal Newton direction can be obtained according to (4) as: From this equation system, the primal Newton direction is computed only using the local information Q i , ∇ i f (x k ), ω k i , and ω k i−1 ; hence, the calculation of Newton direction is decentralized.For the consensus optimization problem (10), we convert it to a separable optimization problem with equality constraints (11) and introduce Equations ( 4) and ( 5) to solve it.However, the computation of the dual variable ω k at a given primal solution x k cannot be implemented in a distributed manner, since the evaluation of the matrix inverse (AH −1 k A ) −1 requires global information.We provide a decentralized computation of ω k using Jacobian iteration.Then, the primal Newton direction is expressed in (18).Now we present the details of the algorithm.Algorithm 1 is distributed and local.Node i receives x k j from its neighbors and computes the values of a i , b i .Step 2 and Step 3 are dual iterations.Node i generates ω t+1 i by using ω t i−1 and ω t i+1 from its neighbors and sends the estimates to them.We find that the values of a i , b i are not changed at a given primal solution x k .Hence, they are calculated only once before the iteration of dual variable.Lastly, Algorithm 1 computes the Newton direction x k p and updates the primal variable x k+1 p based on the previous result x k p and sends them to their neighbors.If some stopping criterion is met, the algorithm stops and produces the result within the desired accuracy.
Algorithm 1 is proposed based on networks with a Hamilton path.In the next section, a distributed inexact Newton algorithm is proposed for multi-agent consensus optimization problems in any connected network.

Algorithm 1. Distributed Inexact Newton Method in Networks with a Hamilton Path
Step 0: Initialization: Initialize primal variables x 0 and dual variables ω 0 , set the number of iterations k = 0.
Step 1: For each node i, Send ω t+1 i to N i .End for.
Step 3: If some stopping criterion is met for ω t , continue; otherwise, set t = t + 1 and go back to Step 2.
Step 4: For each node i Calculate Newton direction x k i Update the primal variable x k+1 i = x k i + x k i and send it to N i .End for.
Step 5: If some stopping criterion is met, stop; otherwise, set t = 0, k = k + 1 and go back to Step 1.

Distributed Newton Method for Multi-agent Problems in General Connected Networks
This distributed Newton method is proposed for multi-agent consensus optimization problems in general connected networks.Before giving this method, a theorem is firstly introduced.Theorem 4. In reference [22], each connected graph has at least one spanning tree.
Thus, we can find at least one spanning tree in a connected graph.Now we select an arbitrary node as the root of the tree.We call the nodes connecting to root the first-level nodes, and the nodes which connected to ith-level nodes is called (i + 1)th-level nodes.All nodes are renumbered according to these levels.The dual variable ω i,j corresponds to a link between node i and node j.In order to ensure that the coefficient matrix A has full row rank, we eliminate the links between nodes belonging to the same level.Without loss of generality, we choose node 6 to be the root of the tree, as shown in Figure 3.In order to facilitate the analysis, all nodes in Figure 3 are renumbered according to the rule of top-to-bottom and left-to-right as depicted in Figure 4. Figure 5 is the dual graph of the spanning tree.From this figure, we have the observation that the dual graph is no longer a tree and there are many circuits.
The coefficient matrix A of Figure 4 is a lower triangular matrix given by where A l,l+1 = −I and A l,i = I if the link {l + 1, i} ∈ E .Iteration Equation ( 9) in Theorem 1 can be used to compute the dual sequence {ω t }, since the matrix A has full row rank and the objective function is strictly convex.The matrix A in this problem has different forms with it in NUM, and therefore predetermined routes are not needed when we rewrite Equation ( 9).Theorem 5.For each primal iteration k, the dual iteration ( 9) can be written as where N i /j is defined as the set of nodes connected to node i, excluding node j; sign (r, s) = 1 if r, s belongs to the same level and sign (r, Proof of Theorem 5.The proof is described in Appendix C.
From this theorem, each dual component ω t+1 i,j is updated using its private result ω t i,j and the adjacent nodes' information; i.e., ω t i,p , ω t j,p , D i , D j , Therefore, the dual variable can be computed in a distributed manner.Next, we obtain the primal Newton direction in a distributed way.
Recall the definition of matrix A; i.e., A li = I and A lj = −I if {i, j} ∈ E , i < j, and A l p = 0 otherwise.Therefore, we have Thus, the Newton direction can be given by From this equation, the primal Newton direction is computed using only the local information Q l , ∇ l f (x k ) and the dual information ω k l,p which is connected with node l.Hence, the calculation of Newton direction is decentralized.
For the consensus optimization problem (10), we have proposed a distributed inexact Newton method in the previous subsection.In order to get rid of the dependence on the network topology, we propose the following distributed Newton algorithm using a novel matrix splitting technique (Algorithm 2).

Algorithm 2. Distributed Inexact Newton Method for Arbitrary Connected Network
Step 0: Initialization: Initialize primal variables x 0 and dual variables ω 0 , set the number of iterations k = 0.
Step 1: For each link {i, j} ∈ E , Calculate a ij , b ij , and c ij .End for.
Step 2: For each link {i, j} ∈ E , Calculate ω t+1 i,j by Equation ( 20) and send the result to nodes i and j.End for.
Step 3: If some stopping criterion is met for ω t , continue; otherwise, set t = t + 1 and go back to Step 2.
Step 4: For each node i, Calculate Newton direction x k i by Equation ( 22); Update the primal variable Step 5: If some stopping criterion is met, stop; otherwise, set t = 0, k = k + 1, and go back to Step 1.
Algorithm 2 is also distributed and local.Step 2 and 3 are dual iterations.An immediate consequence of Theorem 6 is that the dual iteration is distributed.We find that the values of a i,j , b i,j , c i,j are not changed at a given primal solution x k .Hence, they are calculated only once before the dual iteration.Lastly, Algorithm 2 computes the Newton direction x k i and updates the primal variable x k+1 i based only on the previous result x k i and the dual components of the nodes connected node i. Algorithm 1 and Algorithm 2 are distributed, and they are second-order methods.We will demonstrate the effectiveness of the proposed distributed inexact Newton methods by applying them to the convex programming.

Numerical Experiments
In this section, we demonstrate the effectiveness of the proposed distributed Newton methods by applying them to solve a problem which is motivated by the Kuramoto model of coupled nonlinear oscillators [23].This problem is selected in numerical experiments for two reasons.On one hand, the objective function of this problem is strict convex and separable, which are consistent with the requirement of the special consensus optimization problem.On the other hand, compared with least square problem, the Kuramoto model is more universal and representative.Our simulations were based on random network topology.The codes were written in MATLAB.All numerical experiments were run in MATLAB 7.10.0 on a laptop with Pentium(R) Dual-Core E5500 2.80GHz CPU and 2GB of RAM.
The problem can be reformulated on the form The problem instances of this problem were generated in the following manner.The number of nodes was 100.We terminated all algorithms whenever reaching an absolute error |x k − x k−1 | = 10 −4 or the iteration number exceeded 10 3 .In addition to the decentralized incremental algorithm, we also compared the proposed distributed Newton method with the distributed subgradient algorithm.
Figures 6 and 7 show the convergence curves of the three methods under test with P = 100.The curves shown in these figures are the corresponding absolute error and objective function value of the running average iterates of the three methods.The step size of the decentralized incremental algorithm is set to α k = 0.01, and that of the distributed subgradient method is α k = 1 k+1 .From Figure 6, we observe that the proposed distributed Newton method and the distributed subgradient method exhibit comparable convergence behavior; both methods converge to a reasonable value within 10 iterations and outperform the decentralized incremental algorithm.For the decentralized incremental method, the convergence speed slows down when the iteration number becomes large.From Figure 7, we decrease the absolute value and see that the proposed Newton method performs better than the distributed subgradient algorithm.One should note that the distributed subgradient method is more computationally expensive than the proposed distributed Newton method, since in each iteration the former requires the computation of the projection of the iteration value.

Conclusions
This paper adjusted the distributed Newton methods for the NUM problem to solve the consensus optimization problem in different multi-agent networks.Firstly, a distributed inexact Newton method for consensus optimization problem in networks with a Hamilton path was devised.This method achieves the decomposition of a Hessian matrix by exploiting matrix splitting techniques.The convergence analysis of this method followed.Secondly, a distributed Newton algorithm for consensus optimization problems in general multi-agent networks was proposed by combining the matrix splitting technique for NUM and the spanning tree of the network.Meanwhile, the convergence analysis showed that the proposed algorithms enable the nodes across the network to achieve a global optimal solution in a distributed manner.Lastly, the proposed distributed inexact Newton method was applied to solving a problem which is motivated by the Kuramoto model of coupled nonlinear oscillators.The numerical experiment showed that the proposed algorithm converged with less iterations compared with the distributed projected subgradient method and the decentralized incremental approach.Moreover, the number of iterations of the proposed algorithm has a small change with the increase of the nodes' number.
When constructing the spanning tree of an arbitrary connected network, the links between nodes belonging to the same level are eliminated in order to ensure that the coefficient matrix A has full row rank.In other words, a large number of network resources have not been effectively utilized.Consequently, the efficiency of the distributed inexact Newton algorithm can be further improved.In addition, the number of primal iterations is only considered in numerical experiments and compared with other two algorithms.We will take the number of dual iterations into consideration in future work.
From the analysis given above, the matrices M k + N k and M k − N k are positive definite.Then, the spectral radius ρ(M −1 k N k ) < 1 [24].Hence, Jacobian iteration guarantees the convergence of sequence {ω t } generated by iteration (16) to the solution of (5).

Appendix B
Proof of Theorem 3. It can be obtained by the definition of matrix N k in (15) . .
The following equation can be obtained by the definition of A in ( 13) From the preceding relation and the definition of matrix C k in (6), we have Finally, we obtain the desired distributed iteration (20) when substituting Equations (A4)-(A6) into (9).

Figure 2 .
Figure 2. A Hamilton path of network with P = 11 nodes.

Figure 3 .
Figure 3. Spanning tree of network with 11 nodes in Figure 1.

Figure 5 .
Figure 5. Dual graph for the network in Figure 3.

)
Substituting these two equations and the definition of M k in (14) into the original Jacobian iteration (