1. Introduction
A number of problems that arise in the context of wired and wireless networks can be posed as the minimization of a sum of functions, where each component function is available only to a specific agent [1]. Decentralized consensus optimization problems are an important class of these problems [2]. To solve them, distributed methods, which only require the agents to exchange information locally, have attracted growing interest. Nedic and Ozdaglar [3,4] proposed distributed subgradient methods and provided convergence results and convergence rate estimates for this class of methods. Several extensions [5,6] of this method were subsequently proposed. Ram et al. [5] adjusted a distributed subgradient method to address the problem of vertically and horizontally distributed regression in large peer-to-peer systems. Lobel and Ozdaglar [6] studied the consensus optimization problem over a time-varying network topology and proposed a distributed subgradient method that uses averaging algorithms for locally sharing information among the agents. Moreover, Ram et al. [1] proposed a distributed stochastic subgradient projection algorithm and explored the effects of stochastic subgradient errors on its convergence. These methods use only gradient or subgradient information of the objective functions and therefore suffer from slow convergence rates. Apart from these gradient and subgradient methods, Mota et al. [7] combined the alternating direction method of multipliers (ADMM) [8] with a node-coloring technique and proposed a distributed ADMM (D-ADMM) algorithm for the consensus optimization problem, which improves on the convergence rate of distributed subgradient methods. Compared with conventional centralized methods, distributed methods are computationally more efficient and have been widely used in many fields, such as image processing [9,10], computer vision [11], intelligent power grids [12,13], machine learning [14,15], unrelated parallel machine scheduling [16], model predictive control (MPC) [17], and resource allocation in multi-agent communication networks [18,19].
The distributed algorithms mentioned above are all first-order methods, since they only use gradient or subgradient information of the objective function. As an alternative to the distributed gradient method for the unconstrained minimization problem considered by Nedic and Ozdaglar [3], Mokhtari et al. [20] proposed the network Newton (NN-K) method based on second-order information, where K is the number of Taylor series terms used to approximate the Newton step. NN-K can be implemented by aggregating information over K-hop neighborhoods in every iteration. Consequently, the communication between adjacent nodes grows rapidly as K increases. To bring the iterates closer to the optimal value, a larger K must be selected, which is time-consuming, especially for large-scale networks.
Another second-order method, the distributed Newton method, was proposed by Wei et al. [21] to solve the network utility maximization (NUM) problem in a distributed manner. By introducing slack variables, NUM can be formulated as a convex optimization problem with equality constraints whose coefficient matrix has full row rank. This distributed Newton-type second-order algorithm achieves a superlinear convergence rate in terms of primal iterations, but it cannot solve consensus optimization problems in multi-agent networks. The root cause is that, in the general consensus optimization problem, the coefficient matrix of the constraints does not have full row rank and predetermined routes are not available.
The distributed Newton method addressed in this study aims to solve the problem of minimizing a sum of strictly convex objective functions whose components are available at different nodes of a network. This paper adds to the growing body of knowledge on distributed second-order methods. The contributions of this paper are threefold.
First, the distributed Newton algorithm for the NUM problem is adjusted to a special kind of consensus optimization problem in multi-agent networks with a Hamilton path. The obstacle is that the computation of the dual step involves global information through the Hessian matrix; to overcome it, an iterative scheme based on a novel matrix splitting technique is devised. Further, the convergence of the distributed Newton algorithm is proved theoretically.
Second, a modified distributed Newton algorithm is proposed for consensus optimization problems in connected multi-agent networks. The coefficient matrix is given full row rank by constructing a spanning tree of the connected network. Combined with the matrix splitting technique for NUM, the distributed Newton method for multi-agent convex optimization is derived, and its global convergence is established.
Third, the effectiveness of the modified distributed Newton methods is demonstrated by a numerical experiment based on the Kuramoto model of coupled nonlinear oscillators. The proposed distributed Newton method solves this model more efficiently than two first-order methods.
The rest of the paper is organized as follows: Section 2 provides some necessary preliminaries. Section 3 formulates the general multi-agent strictly convex consensus optimization problem in connected networks. Section 4 presents a distributed inexact Newton method in networks with a Hamilton path. A solution algorithm for general connected networks is proposed in Section 5. Section 6 presents simulation results that demonstrate the convergence properties of the algorithms. Finally, conclusions and recommendations for future work are provided in Section 7.
2. Preliminaries
Consider a connected network with $P$ nodes and $E$ edges, modeled by an undirected graph $\mathcal{G} = (\mathcal{N}, \mathcal{E})$, where $\mathcal{N} = \{1, \ldots, P\}$ is the set of nodes and $\mathcal{E}$ is the set of edges.
Referring to Wei et al. [21], the NUM problem can be written as follows:
$$\min_{x} \ f(x) \quad \text{s.t.} \quad Ax = c, \tag{1}$$
where $f$ is a strictly convex function, matrix $A$ has full row rank, and $c$ is a constant vector. This problem can be solved by an exact Newton method,
$$x^{k+1} = x^k + \varepsilon^k \Delta x^k, \tag{2}$$
where $\Delta x^k$ is the Newton direction, given as the solution of the following system of linear equations:
$$\begin{pmatrix} H_k & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} \Delta x^k \\ w^k \end{pmatrix} = -\begin{pmatrix} \nabla f(x^k) \\ 0 \end{pmatrix}, \tag{3}$$
where $x^k$ is the primal vector, $w^k$ is the dual vector, $\nabla f(x^k)$ is the gradient vector, and $H_k = \nabla^2 f(x^k)$ is the Hessian matrix. Moreover, $\nabla f(x^k)$ is abbreviated as $g_k$ for notational convenience.
Solving for $\Delta x^k$ and $w^k$ in the preceding system yields
$$\Delta x^k = -H_k^{-1}\left(g_k + A^T w^k\right), \tag{4}$$
$$\left(A H_k^{-1} A^T\right) w^k = -A H_k^{-1} g_k. \tag{5}$$
Since $f$ is a separable, strictly convex function, its Hessian matrix is a positive definite diagonal matrix, and hence Equation (4) can be easily computed by a distributed iterative scheme. Wei et al. [21] proposed a distributed Newton method for the NUM problem (1) by using a matrix splitting scheme to compute the dual vector $w^k$ in Equation (5) in a distributed manner. Let $D_k$ be a diagonal matrix with diagonal entries
$$\left(D_k\right)_{ii} = \left(A H_k^{-1} A^T\right)_{ii}, \tag{6}$$
and let $B_k$ contain the off-diagonal entries,
$$B_k = A H_k^{-1} A^T - D_k. \tag{7}$$
Let matrix $\bar{B}_k$ be a diagonal matrix with diagonal entries
$$\left(\bar{B}_k\right)_{ii} = \sum_{j} \left(B_k\right)_{ij}. \tag{8}$$
By splitting the matrix $A H_k^{-1} A^T$ as the sum of $\left(D_k + \bar{B}_k\right)$ and $\left(B_k - \bar{B}_k\right)$, the following theorem [21] can be obtained.
Theorem 1. For a given $k$, let $D_k$, $B_k$, $\bar{B}_k$ be the matrices defined in Equations (6)–(8). Let $w(0)$ be an arbitrary initial vector. We can obtain the sequence $\{w(t)\}$ by the following iteration:
$$w(t+1) = \left(D_k + \bar{B}_k\right)^{-1}\left(\left(\bar{B}_k - B_k\right) w(t) - A H_k^{-1} g_k\right). \tag{9}$$
Then, the spectral radius of the matrix $\left(D_k + \bar{B}_k\right)^{-1}\left(\bar{B}_k - B_k\right)$ is strictly bounded above by 1, and the sequence $\{w(t)\}$ converges to the solution of Equation (5) as $t \to \infty$.

Note that predetermined routes and a full-row-rank coefficient matrix are necessary when running the distributed Newton method for the NUM problem according to Reference [21]. Unfortunately, this property is usually not met in general multi-agent consensus optimization problems.
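To make the splitting concrete, the following Python sketch (a minimal illustration with NumPy; the 0/1 routing-style matrix $A$, the diagonal Hessian, and the iteration count are our own stand-ins for a real NUM instance, and the forms of Equations (6)–(9) follow our reading of Reference [21]) iterates Equation (9) and compares the limit against a direct solve of Equation (5):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in NUM data: a 0/1 routing-style A with full row rank and a
# positive definite diagonal Hessian of a separable strictly convex f.
A = np.array([[1, 1, 0, 0, 1, 0],
              [0, 1, 1, 0, 0, 1],
              [0, 0, 1, 1, 1, 0]], dtype=float)
H = np.diag(rng.uniform(1.0, 2.0, 6))
g = rng.standard_normal(6)              # gradient at the current primal iterate

Hinv = np.diag(1.0 / np.diag(H))
M = A @ Hinv @ A.T                      # coefficient matrix of Equation (5)

D = np.diag(np.diag(M))                 # Equation (6): diagonal part of M
B = M - D                               # Equation (7): off-diagonal part of M
Bbar = np.diag(B.sum(axis=1))           # Equation (8): row sums of B on the diagonal

# Dual iteration (9): w(t+1) = (D + Bbar)^{-1} ((Bbar - B) w(t) - A H^{-1} g).
lhs, rhs = D + Bbar, -A @ Hinv @ g
w = np.zeros(3)
for t in range(2000):
    w = np.linalg.solve(lhs, (Bbar - B) @ w + rhs)

J = np.linalg.solve(lhs, Bbar - B)      # iteration matrix of Equation (9)
print(max(abs(np.linalg.eigvals(J))) < 1)       # spectral radius below 1 (Theorem 1)
print(np.allclose(w, np.linalg.solve(M, rhs)))  # w solves Equation (5)
```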
3. Problem Formulation
For the multi-agent consensus optimization problems considered in this paper, only agent $p$ has access to its private cost function $f_p$ and can communicate with its neighbors using the network infrastructure. This situation is illustrated in Figure 1; e.g., node 2 can communicate with its adjacent nodes 1, 3, 6, and 7. Node $i$ has its own objective function $f_i$, and all nodes cooperate in minimizing the aggregate cost function
$$\min_{x} \ \sum_{p=1}^{P} f_p(x), \tag{10}$$
where $x \in \mathbb{R}^n$ is the global optimization variable. This problem is also known as the consensus optimization problem, and its optimal solution is denoted as $x^*$.
A common technique to decouple problem (10) is to assign a copy of the global variable $x$ to each node and then constrain all copies to be equal. Denoting the copy held by node $p$ by $x_p$, problem (10) is written equivalently as
$$\min_{x_1, \ldots, x_P} \ \sum_{p=1}^{P} f_p(x_p) \quad \text{s.t.} \quad x_i = x_j, \ \forall (i,j) \in \mathcal{E}. \tag{11}$$
Problem (11) is no longer coupled by a common variable in all $f_p$, but instead by the new equations $x_i = x_j$ for all edges $(i,j) \in \mathcal{E}$. These equations enforce all copies to be equal whenever the network is connected. Note that they can be written more compactly as $\left(B^T \otimes I_n\right)\bar{x} = 0$, where $B$ is the node-arc incidence matrix of the graph, $I_n$ is the identity matrix in $\mathbb{R}^{n \times n}$, ⊗ is the Kronecker product, and $\bar{x} = \left(x_1^T, \ldots, x_P^T\right)^T$ is the optimization variable. Each column of $B$ is associated with an edge $(i,j)$ and has $1$ and $-1$ in the $i$th and $j$th entries, respectively; the remaining entries are zeros. Problem (11) can be rewritten as
$$\min_{\bar{x}} \ \sum_{p=1}^{P} f_p(x_p) \quad \text{s.t.} \quad A\bar{x} = 0, \tag{12}$$
where $A = B^T \otimes I_n$ is the coefficient matrix, taking values in $\{-1, 0, 1\}$. In this paper, we assume that the local costs $f_p$ are twice differentiable and strongly convex.
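As a quick illustration (a minimal sketch with a hypothetical 4-node path graph; the graph and dimensions are our own choices), the following Python snippet builds $B$ and $A = B^T \otimes I_n$ and verifies that $A\bar{x} = 0$ exactly when all copies coincide:

```python
import numpy as np

n = 2                                    # dimension of each local copy x_p
P = 4
edges = [(0, 1), (1, 2), (2, 3)]         # hypothetical 4-node path graph (0-indexed)

# Node-arc incidence matrix: column e has +1 at node i and -1 at node j.
B = np.zeros((P, len(edges)))
for e, (i, j) in enumerate(edges):
    B[i, e], B[j, e] = 1.0, -1.0

A = np.kron(B.T, np.eye(n))              # coefficient matrix of problem (12)

x_equal = np.tile([1.5, -0.5], P)        # all copies identical
x_mixed = np.arange(n * P, dtype=float)  # copies differ
print(np.allclose(A @ x_equal, 0))       # True: consensus constraint satisfied
print(np.allclose(A @ x_mixed, 0))       # False: constraint violated
```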
4. Distributed Newton Method for Multi-Agent Consensus Optimization Problems in Networks with a Hamilton Path
For some networks with particular topological structures (e.g., a Hamilton path), special techniques can be used to solve the proposed consensus optimization problem. In this section, a novel matrix splitting technique is devised for multi-agent consensus optimization problems in networks with a Hamilton path, i.e., a path that visits every node in the network exactly once. For simplicity, we renumber the nodes from 1 to $P$ along this path, as depicted in Figure 2. Every dual variable corresponds to one link, so $w_i$ ($i = 1, \ldots, P-1$) can be used to denote the dual variable of link $(i, i+1)$, which is stored in node $i$. In Figure 2, node 0 is a copy of node 1 and does not actually exist. We add the definitions $w_0 = 0$ and $w_P = 0$ for the sake of analysis.
From Figure 2, the coefficient matrix $A$ in problem (12) is a block bidiagonal matrix given by
$$A = \begin{pmatrix} I & -I & & & \\ & I & -I & & \\ & & \ddots & \ddots & \\ & & & I & -I \end{pmatrix}, \tag{13}$$
where $I$ is an identity matrix of dimension $n$.
Let $D_k$ be the block diagonal matrix with diagonal blocks
$$\left(D_k\right)_{ii} = H_i^{-1} + H_{i+1}^{-1}, \quad i = 1, \ldots, P-1, \tag{14}$$
where $H_i$ is the $i$th diagonal block of the Hessian matrix $H_k$. Matrix $B_k$ is given by
$$B_k = A H_k^{-1} A^T - D_k. \tag{15}$$
By splitting the matrix $A H_k^{-1} A^T$ as the sum of $D_k$ and $B_k$, a Jacobi iteration can be used to compute the dual vector $w^k$ in (5) in a distributed manner.
Theorem 2. For a given $k$, let $D_k$ and $B_k$ be the matrices defined in (14) and (15), and let $w(0)$ be an arbitrary initial vector. The sequence $\{w(t)\}$ can be obtained by the iteration
$$w(t+1) = -D_k^{-1}\left(B_k w(t) + A H_k^{-1} g_k\right). \tag{16}$$
Then, the spectral radius of the matrix $D_k^{-1} B_k$ is strictly bounded above by 1, and the sequence $\{w(t)\}$ converges to the solution of (5) as $t \to \infty$.

There are many ways to split the matrix $A H_k^{-1} A^T$; the Jacobi iteration is selected in our method for two reasons. Firstly, owing to the special structure of the matrices $A$ and $H_k$, the spectral radius of the Jacobi iteration matrix $D_k^{-1} B_k$ is guaranteed to be strictly bounded above by 1, and thus the sequence $\{w(t)\}$ converges as $t \to \infty$. Secondly, the matrix $D_k$ is block diagonal, which guarantees that the dual variable is updated without global information.
Next, a distributed computation procedure for the dual vector is developed by rewriting the iteration (16).
Theorem 3. The dual iteration (16) can be written componentwise as
$$w_i(t+1) = \left(H_i^{-1} + H_{i+1}^{-1}\right)^{-1}\left(H_i^{-1} w_{i-1}(t) + H_{i+1}^{-1} w_{i+1}(t) + H_{i+1}^{-1} g_{i+1} - H_i^{-1} g_i\right), \tag{17}$$
where $g_i$ is the $i$th block of the gradient and $w_0(t) = w_P(t) = 0$.

From this theorem, each link variable $w_i$ is updated using its private result and the information from its neighbors, i.e., $w_{i-1}(t)$ and $w_{i+1}(t)$. The adjacent nodes' information is obtained directly through information exchange. Therefore, the dual variable can be obtained in a distributed manner.
Once the dual variables are computed, the primal Newton direction can be obtained according to (4) as
$$\Delta x_i^k = -H_i^{-1}\left(g_i + w_i^k - w_{i-1}^k\right). \tag{18}$$
From this equation, the primal Newton direction is computed using only the local information $H_i^{-1}$, $g_i$, $w_{i-1}^k$, and $w_i^k$; hence, the calculation of the Newton direction is decentralized.
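The following Python sketch puts (17) and (18) together on a toy path network (a minimal illustration under our reconstruction of these equations; the scalar quadratic costs, network size, and sweep count are our own choices). It runs the Jacobi dual iteration and checks the resulting Newton direction against a direct solve of the KKT system (3):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy Hamilton-path instance: P scalar nodes (n = 1), f_i(x) = a_i/2 * (x - b_i)^2.
P = 6
a = rng.uniform(1.0, 3.0, P)           # diagonal Hessian blocks H_i
b = rng.standard_normal(P)
x = rng.standard_normal(P)             # current primal iterate (one copy per node)
g = a * (x - b)                        # local gradients g_i
Hinv = 1.0 / a

# Dual Jacobi iteration (17), with the boundary convention w_0 = w_P = 0.
w = np.zeros(P - 1)
for t in range(300):
    w_pad = np.concatenate(([0.0], w, [0.0]))
    w = np.array([(Hinv[i] * w_pad[i] + Hinv[i + 1] * w_pad[i + 2]
                   + Hinv[i + 1] * g[i + 1] - Hinv[i] * g[i])
                  / (Hinv[i] + Hinv[i + 1]) for i in range(P - 1)])

# Primal Newton direction (18): dx_i = -H_i^{-1} (g_i + w_i - w_{i-1}).
w_pad = np.concatenate(([0.0], w, [0.0]))
dx = -Hinv * (g + w_pad[1:] - w_pad[:-1])

# Check against the exact Newton direction from the KKT system (3).
A = np.zeros((P - 1, P))
for i in range(P - 1):
    A[i, i], A[i, i + 1] = 1.0, -1.0
KKT = np.block([[np.diag(a), A.T], [A, np.zeros((P - 1, P - 1))]])
sol = np.linalg.solve(KKT, -np.concatenate((g, np.zeros(P - 1))))
print(np.allclose(dx, sol[:P]))        # True: the distributed direction matches
```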
For the consensus optimization problem (10), we convert it to a separable optimization problem with equality constraints (11) and introduce Equations (4) and (5) to solve it. However, the computation of the dual variable $w^k$ at a given primal solution $x^k$ cannot be implemented in a distributed manner, since the evaluation of the matrix inverse $\left(A H_k^{-1} A^T\right)^{-1}$ requires global information. We therefore provide a decentralized computation of $w^k$ using the Jacobi iteration. The primal Newton direction is then expressed as in (18). Now we present the details of the algorithm.
Algorithm 1 is distributed and local. Node $i$ receives the primal iterates of its neighbors and computes the values of $H_i^{-1}$ and $g_i$. Steps 2 and 3 are dual iterations: node $i$ generates $w_i(t+1)$ by using $w_{i-1}(t)$ and $w_{i+1}(t)$ from its neighbors and sends the estimate back to them. Note that the values of $H_i^{-1}$ and $g_i$ do not change at a given primal solution $x^k$; hence, they are calculated only once, before the iteration of the dual variable. Lastly, Algorithm 1 computes the Newton direction, updates the primal variable based on the previous result, and sends it to the neighbors. If some stopping criterion is met, the algorithm stops and produces a result within the desired accuracy.
Algorithm 1 is designed for networks with a Hamilton path. In the next section, a distributed inexact Newton algorithm is proposed for multi-agent consensus optimization problems in arbitrary connected networks.
Algorithm 1. Distributed Inexact Newton Method in Networks with a Hamilton Path
Step 0: Initialization: initialize the primal variables $x_i^0$ and dual variables $w_i(0)$; set the number of iterations $k = 0$.
Step 1: For each node $i$: calculate $H_i^{-1}$ and $g_i$. End for.
Step 2: Set $t = 0$. For each node $i$: if $i = P$, continue; otherwise, calculate $w_i(t+1)$ by Equation (17) and send $w_i(t+1)$ to nodes $i-1$ and $i+1$. End for.
Step 3: If some stopping criterion is met for $w(t+1)$, continue; otherwise, set $t = t + 1$ and go back to Step 2.
Step 4: For each node $i$: calculate the Newton direction $\Delta x_i^k = -H_i^{-1}\left(g_i + w_i^k - w_{i-1}^k\right)$; update the primal variable $x_i^{k+1} = x_i^k + \varepsilon^k \Delta x_i^k$ and send it to nodes $i-1$ and $i+1$. End for.
Step 5: If some stopping criterion is met, stop; otherwise, set $k = k + 1$ and go back to Step 1.
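To connect the steps, the sketch below runs Algorithm 1 end to end on the same toy quadratic path instance (the step size $\varepsilon^k = 1$, the fixed sweep counts in place of the stopping criteria, and the feasible all-equal initialization are our own choices):

```python
import numpy as np

rng = np.random.default_rng(2)
P = 6
a = rng.uniform(1.0, 3.0, P)                 # f_i(x) = a_i/2 * (x - b_i)^2
b = rng.standard_normal(P)
Hinv = 1.0 / a                               # Hessian blocks are constant here

x = np.zeros(P)        # Step 0: feasible start (all copies equal, so A x = 0)
for k in range(5):                           # outer loop (Steps 1-5)
    g = a * (x - b)                          # Step 1: local gradients
    w = np.zeros(P - 1)
    for t in range(300):                     # Steps 2-3: dual Jacobi sweeps (17)
        w_pad = np.concatenate(([0.0], w, [0.0]))
        w = np.array([(Hinv[i] * w_pad[i] + Hinv[i + 1] * w_pad[i + 2]
                       + Hinv[i + 1] * g[i + 1] - Hinv[i] * g[i])
                      / (Hinv[i] + Hinv[i + 1]) for i in range(P - 1)])
    w_pad = np.concatenate(([0.0], w, [0.0]))
    x += -Hinv * (g + w_pad[1:] - w_pad[:-1])  # Step 4: full Newton step (18)

x_star = np.sum(a * b) / np.sum(a)           # consensus minimizer of sum_i f_i
print(np.max(np.abs(x - x_star)) < 1e-6)     # True: all copies reach x*
```

For quadratic local costs the outer loop converges in a single Newton step; the repeated outer iterations merely illustrate the structure of Steps 1–5.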
5. Distributed Newton Method for Multi-Agent Problems in General Connected Networks
This section proposes the distributed Newton method for multi-agent consensus optimization problems in general connected networks. Before giving this method, a theorem is first introduced.

Theorem 4 ([22]). Every connected graph has at least one spanning tree.

Thus, we can find at least one spanning tree in a connected graph. We select an arbitrary node as the root of the tree. We call the nodes connected to the root the first-level nodes, and the nodes connected to the $(l-1)$-level nodes are called $l$-level nodes. All nodes are renumbered according to these levels. The dual variable $w_{ij}$ corresponds to the link between node $i$ and node $j$. In order to ensure that the coefficient matrix $A$ has full row rank, we eliminate the links between nodes belonging to the same level. Without loss of generality, we choose node 6 as the root of the tree, as shown in Figure 3. To facilitate the analysis, all nodes in Figure 3 are renumbered top-to-bottom and left-to-right, as depicted in Figure 4. Figure 5 shows the dual graph of the spanning tree. From this figure, we observe that the dual graph is no longer a tree and contains many circuits.
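The level-based renumbering can be sketched as follows (a minimal Python illustration on a hypothetical 7-node graph, not the network of Figure 3; breadth-first traversal from the chosen root assigns levels, and links within a level are then discarded):

```python
from collections import deque

def bfs_levels(adjacency, root):
    """Assign each node its BFS level; the traversal visits nodes level by level."""
    level, order, queue = {root: 0}, [root], deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(adjacency[u]):   # left-to-right within a level
            if v not in level:
                level[v] = level[u] + 1
                order.append(v)
                queue.append(v)
    return level, order

# Hypothetical 7-node connected network (0-indexed).
adjacency = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 4}, 3: {1, 4, 5},
             4: {2, 3, 6}, 5: {3, 6}, 6: {4, 5}}
level, order = bfs_levels(adjacency, root=0)
renumber = {old: new for new, old in enumerate(order)}  # top-to-bottom numbering

# Keep only links between different levels, so the coefficient matrix A
# built from these links has full row rank.
kept = sorted((renumber[i], renumber[j]) for i in adjacency for j in adjacency[i]
              if i < j and level[i] != level[j])
print(level)   # {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3}
print(kept)    # [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]
```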
From Figure 4, the coefficient matrix $A$ of problem (12) is a block lower triangular matrix,
$$A_{l,i} = I, \quad A_{l,j} = -I \quad \text{if link } l = (i,j),\ i < j, \tag{19}$$
with all remaining blocks equal to zero.
Iteration (9) in Theorem 1 can be used to compute the dual sequence $\{w(t)\}$, since the matrix $A$ has full row rank and the objective function is strictly convex. The matrix $A$ in this problem has a different form from that in NUM, and therefore predetermined routes are not needed when we rewrite Equation (9).
Theorem 5. For each primal iteration $k$, the dual iteration (9) can be written in a componentwise form (20), in which each link variable $w_{ij}(t+1)$ is computed from its own previous value, the neighboring dual variables $w_{li}(t)$, $l \in N(i) \setminus \{j\}$, and $w_{lj}(t)$, $l \in N(j) \setminus \{i\}$, and the local quantities $H_i^{-1}$, $H_j^{-1}$, $g_i$, and $g_j$. Here, $N(i) \setminus \{j\}$ is defined as the set of nodes connected to node $i$, excluding node $j$, and the sign of each neighboring term is positive if the two links enter their shared node from the same level and negative otherwise.

From this theorem, each dual component $w_{ij}$ is updated using its private result and the adjacent nodes' information, i.e., $w_{li}(t)$, $l \in N(i) \setminus \{j\}$, and $w_{lj}(t)$, $l \in N(j) \setminus \{i\}$. Therefore, the dual variable can be computed in a distributed manner. Next, we obtain the primal Newton direction in a distributed way.
Recall the definition of matrix $A$, i.e., $A_{l,i} = I$ and $A_{l,j} = -I$ if link $l = (i,j)$, and $A_{l,m} = 0$ otherwise. Therefore, we have
$$\left(A^T w^k\right)_i = \sum_{j:\,(i,j) \in \mathcal{E},\ i<j} w_{ij}^k \;-\; \sum_{j:\,(j,i) \in \mathcal{E},\ j<i} w_{ji}^k. \tag{21}$$
Thus, the Newton direction can be given by
$$\Delta x_i^k = -H_i^{-1}\left(g_i + \left(A^T w^k\right)_i\right). \tag{22}$$
From this equation, the primal Newton direction at node $i$ is computed using only the local information $H_i^{-1}$ and $g_i$ and the dual variables of the links connected to node $i$. Hence, the calculation of the Newton direction is decentralized.
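A small Python sketch illustrates this local accumulation (a hypothetical 5-node spanning tree with scalar variables; the dual vector is obtained here by a direct solve of Equation (5) rather than by the iteration (20), since only the decentralized primal step is being demonstrated):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical spanning tree (root 0); links oriented from lower to higher index.
links = [(0, 1), (0, 2), (1, 3), (1, 4)]
P = 5
h = rng.uniform(1.0, 2.0, P)            # diagonal blocks of the Hessian (n = 1)
g = rng.standard_normal(P)              # local gradients

A = np.zeros((len(links), P))
for l, (i, j) in enumerate(links):
    A[l, i], A[l, j] = 1.0, -1.0        # A_{l,i} = 1, A_{l,j} = -1

Hinv = np.diag(1.0 / h)
w = np.linalg.solve(A @ Hinv @ A.T, -A @ Hinv @ g)   # dual vector from Equation (5)

# Equation (22): each node accumulates only the dual variables of its own links.
accum = np.zeros(P)
for l, (i, j) in enumerate(links):
    accum[i] += w[l]                    # node i holds the +1 entry of link l
    accum[j] -= w[l]                    # node j holds the -1 entry of link l
dx = -(g + accum) / h                   # local Newton direction per node

print(np.allclose(dx, -Hinv @ (g + A.T @ w)))        # matches the centralized formula
```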
For the consensus optimization problem (10), we proposed a distributed inexact Newton method in the previous section. In order to remove the dependence on the network topology, we propose the following distributed Newton algorithm using a novel matrix splitting technique (Algorithm 2).
Algorithm 2. Distributed Inexact Newton Method for an Arbitrary Connected Network
Step 0: Initialization: initialize the primal variables $x_i^0$ and dual variables $w_{ij}(0)$; set the numbers of iterations $k = 0$ and $t = 0$.
Step 1: For each link $(i,j)$: calculate $H_i^{-1}$, $H_j^{-1}$, $g_i$, and $g_j$. End for.
Step 2: For each link $(i,j)$: calculate $w_{ij}(t+1)$ by Equation (20) and send the result to nodes $i$ and $j$. End for.
Step 3: If some stopping criterion is met for $w(t+1)$, continue; otherwise, set $t = t + 1$ and go back to Step 2.
Step 4: For each node $i$: calculate the Newton direction $\Delta x_i^k$ by Equation (22); update the primal variable $x_i^{k+1} = x_i^k + \varepsilon^k \Delta x_i^k$; send $x_i^{k+1}$ to the neighbors of node $i$. End for.
Step 5: If some stopping criterion is met, stop; otherwise, set $k = k + 1$, reset $t = 0$, and go back to Step 1.
Algorithm 2 is also distributed and local. Steps 2 and 3 are dual iterations; an immediate consequence of Theorem 5 is that the dual iteration is distributed. Note that the values of $H_i^{-1}$ and $g_i$ do not change at a given primal solution $x^k$; hence, they are calculated only once, before the dual iteration. Lastly, Algorithm 2 computes the Newton direction and updates the primal variable based only on the previous result and the dual components of the links connected to node $i$.
Algorithm 1 and Algorithm 2 are distributed second-order methods. We will demonstrate the effectiveness of the proposed distributed inexact Newton methods by applying them to a convex program.
6. Numerical Experiments
In this section, we demonstrate the effectiveness of the proposed distributed Newton methods by applying them to a problem motivated by the Kuramoto model of coupled nonlinear oscillators [23]. This problem was selected for the numerical experiments for two reasons. On the one hand, its objective function is strictly convex and separable, consistent with the requirements of the special consensus optimization problem. On the other hand, compared with the least squares problem, the Kuramoto model is more universal and representative. Our simulations were based on random network topologies. The code was written in MATLAB. All numerical experiments were run in MATLAB 7.10.0 on a laptop with a Pentium(R) Dual-Core E5500 2.80 GHz CPU and 2 GB of RAM.
The problem can be reformulated in the form of the consensus optimization problem (10). The problem instances were generated in the following manner. The number of nodes was 100. We terminated all algorithms whenever the absolute error fell below a prescribed tolerance or the iteration number exceeded a prescribed maximum. In addition to the decentralized incremental algorithm, we also compared the proposed distributed Newton method with the distributed subgradient algorithm.
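For reproducibility, the topology generation and stopping rule can be sketched as follows (a Python stand-in for the MATLAB setup; the edge density, tolerance, and iteration cap below are placeholders, since the paper's values are not stated here):

```python
import numpy as np

def random_connected_graph(P, extra_edges, rng):
    """Random spanning tree plus extra random edges; the tree guarantees connectivity."""
    perm = rng.permutation(P)
    edges = {tuple(sorted((perm[t - 1], perm[t]))) for t in range(1, P)}
    while len(edges) < P - 1 + extra_edges:
        i, j = rng.choice(P, size=2, replace=False)
        edges.add(tuple(sorted((i, j))))
    return sorted(edges)

rng = np.random.default_rng(42)
edges = random_connected_graph(P=100, extra_edges=50, rng=rng)
print(len(edges))                       # 149 links among 100 nodes

# Placeholder stopping rule applied to every algorithm under test.
tol, max_iter = 1e-6, 10_000
```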
Figure 6 and Figure 7 show the convergence curves of the three methods under test. The curves shown in these figures are the corresponding absolute error and objective function value of the running-average iterates of the three methods. The step sizes of the decentralized incremental algorithm and of the distributed subgradient method were fixed in advance. From Figure 6, we observe that the proposed distributed Newton method and the distributed subgradient method exhibit comparable convergence behavior; both methods converge to a reasonable value within 10 iterations and outperform the decentralized incremental algorithm. For the decentralized incremental method, the convergence speed slows down as the iteration number becomes large. In Figure 7, where a smaller absolute error is required, the proposed Newton method performs better than the distributed subgradient algorithm. One should note that the distributed subgradient method is more computationally expensive than the proposed distributed Newton method, since in each iteration the former requires the computation of a projection of the iterate.
7. Conclusions
This paper adjusted the distributed Newton method for the NUM problem to solve the consensus optimization problem in different multi-agent networks. Firstly, a distributed inexact Newton method for the consensus optimization problem in networks with a Hamilton path was devised. This method achieves the decomposition of the Hessian matrix by exploiting matrix splitting techniques, and its convergence analysis followed. Secondly, a distributed Newton algorithm for consensus optimization problems in general multi-agent networks was proposed by combining the matrix splitting technique for NUM with a spanning tree of the network. The convergence analysis showed that the proposed algorithms enable the nodes across the network to reach a globally optimal solution in a distributed manner. Lastly, the proposed distributed inexact Newton method was applied to a problem motivated by the Kuramoto model of coupled nonlinear oscillators. The numerical experiment showed that the proposed algorithm converged in fewer iterations than the distributed projected subgradient method and the decentralized incremental approach. Moreover, the number of iterations of the proposed algorithm changes only slightly as the number of nodes increases.
When constructing the spanning tree of an arbitrary connected network, the links between nodes belonging to the same level are eliminated in order to ensure that the coefficient matrix A has full row rank. In other words, a large number of network resources are not effectively utilized, so the efficiency of the distributed inexact Newton algorithm can be further improved. In addition, only the number of primal iterations was considered in the numerical experiments and compared with the other two algorithms. We will take the number of dual iterations into consideration in future work.