In-Network Computation of the Optimal Weighting Matrix for Distributed Consensus on Wireless Sensor Networks

In a network, a distributed consensus algorithm is fully characterized by its weighting matrix. Although there exist numerical methods for obtaining the optimal weighting matrix, we have not found an in-network implementation of any of these methods that works for all network topologies. In this paper, we propose an in-network algorithm for finding such an optimal weighting matrix.


Introduction
A sensor is a device capable of measuring a certain physical property. Normally, in a wireless sensor network (WSN), each sensor or node can transmit and receive data wirelessly, and it is able to perform multiple tasks, which are usually based on simple mathematical operations such as additions and multiplications. Moreover, the sensors within a WSN are usually battery-powered, which leads to very limited energy resources.
For most tasks, it is required that each sensor computes a target value that depends on the values measured by other sensors of the WSN. Commonly, a WSN has a central entity, known as the central node, which collects the values measured by all the sensors, computes the target values, and sends each target value to the corresponding sensor. This strategy is known as centralized computation.
The main disadvantage of the centralized computation strategy is that it is extremely energy inefficient from the transmission point of view: a sensor that is far away from the central node has to consume a disproportionate amount of energy, relative to the energy provided by its battery, in order to transmit its measured value to the central node. An alternative strategy that overcomes the energy inefficiency of centralized computation is the distributed or in-network computation strategy. In distributed computation, which is a cooperative strategy, each sensor computes its target value by interchanging information with its neighbouring sensors.
In many recent signal processing applications of distributed computations (e.g., [1][2][3][4]), the average needs to be computed (i.e., each sensor seeks the arithmetic mean of the values measured by all the sensors of the WSN). The problem of obtaining that average in all the sensors of the WSN by using the distributed computation strategy is known as the distributed averaging problem, or the distributed average consensus problem. Moreover, the problem of obtaining the same value in all the sensors of the WSN by using the distributed computation strategy is known as the distributed consensus problem (see, for example, [5] for a review on this subject).
A common approach for solving the distributed averaging problem is to use a synchronous linear iterative algorithm that is characterized by a matrix, which is called the weighting matrix.
A well-known problem related to this topic is that of finding a symmetric weighting matrix that achieves consensus as fast as possible. This is the problem of finding the fastest symmetric distributed linear averaging (FSDLA) algorithm.
The FSDLA problem was solved in [6]. Specifically, in [6], the authors proved that solving the FSDLA problem is equivalent to solving a semidefinite program, and they used the subgradient method for efficiently solving such a problem to obtain the corresponding weighting matrix. Unfortunately, solving the FSDLA problem this way requires a central entity with full knowledge of the entire network. This central entity has to solve the FSDLA problem and then communicate the solution to each node of the network. This process has to be repeated each time the network topology changes due to, for example, a node failing, a node being added or removed (plug-and-play networks), or a node changing its location.
Moreover, WSNs may not have a central entity to compute the optimal weighting matrix. This paper proposes, for those networks without a central entity, an in-network algorithm for finding the optimal weighting matrix.
It is worth mentioning that in the literature, one can find other in-network algorithms that solve the FSDLA problem in a distributed way. In particular, in [7], the authors present an in-network algorithm that computes the fastest symmetric weighting matrix, but only with positive weights. As will be made more explicit in the next section, this matrix is not a solution of the FSDLA problem in general, as the latter might contain negative weights.
In [8], the FSDLA problem is solved in a centralized way when the communication among nodes is noisy. Closed-form expressions for the optimal weights for certain network topologies (paths, cycles, grids, stars, and hypercubes) are also provided. However, unless the considered network topology is one of these five, an in-network solution to the FSDLA is not provided.
Finally, in [9], an in-network algorithm for solving the FSDLA problem is provided. However, as the authors themselves point out, the algorithm breaks down when the second- and third-largest eigenvalues of the weighting matrix become similar or equal.
Unlike the approaches found in the literature, the in-network algorithm presented in this paper is proved to always converge to the solution of the FSDLA problem, irrespective of the considered network topology.

The Distributed Average Consensus Problem
We consider a network composed of n nodes. The network can be viewed as an undirected graph G = (V, E), where V = {1, 2, ..., n} is the set of nodes and E is the set of edges. An edge e = {i, j} ∈ E means that nodes i, j ∈ V are connected and can therefore interchange information. Conversely, if {i, j} ∉ E, nodes i and j are not connected and cannot interchange information. We let K be the cardinality of E, i.e., K is the number of edges in the graph G. For simplicity, we enumerate the edges of G as E = {e_1, e_2, ..., e_K}, where e_k = {i_k, j_k} for all k ∈ {1, 2, ..., K}.
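As an aside for implementers, the graph model above can be sketched in a few lines of code (the 4-node example and all names below are illustrative, not taken from the paper):

```python
# Minimal sketch of the graph model G = (V, E): nodes V = {1, ..., n} and
# undirected edges e_k = {i_k, j_k}, stored as frozensets so that {i, j} = {j, i}.
n = 4
V = set(range(1, n + 1))
E = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4}), frozenset({2, 4})]
K = len(E)  # number of edges

def connected(i, j):
    """Nodes i and j can interchange information iff {i, j} is in E."""
    return frozenset({i, j}) in E
```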
We assume that each node i ∈ V has an initial value x_i(0) ∈ R, where R denotes the set of (finite) real numbers. Accordingly, in this paper, R^{m×n} denotes the set of m × n real matrices. We consider that all the nodes are interested in obtaining the arithmetic mean (average) x_ave = (1/n) ∑_{i=1}^{n} x_i(0) of the initial values of the nodes using a distributed algorithm. This problem is commonly known as the distributed averaging problem, or the distributed average consensus problem.
The approach that will be considered here for solving the distributed averaging problem is to use a linear iterative algorithm of the form

x_i(t+1) = w_{i,i} x_i(t) + ∑_{j : {i,j} ∈ E} w_{i,j} x_j(t),    (1)

where time t is assumed to be discrete (namely, t ∈ {0, 1, 2, ...}) and w_{i,j} ∈ R are the weights that need to be set so that

lim_{t→∞} x_i(t) = x_ave    (2)

for all i ∈ V and for all x_1(0), x_2(0), ..., x_n(0) ∈ R. From the point of view of communication protocols, there exist efficient ways of implementing synchronous consensus algorithms of the form of Equation (1) (e.g., [10]). We observe that Equation (1) can be written in matrix form as

x(t+1) = W x(t),    (3)

where x(t) := (x_1(t), x_2(t), ..., x_n(t))^T, and W ∈ R^{n×n} is called the weighting matrix, which is such that its entry at the ith row and jth column, [W]_{i,j}, is given by

[W]_{i,j} = w_{i,j} if {i,j} ∈ E or i = j, and [W]_{i,j} = 0 otherwise.    (4)

Therefore, Equation (2) can be rewritten as

lim_{t→∞} x(t) = P_n x(0),    (5)

where P_n := (1/n) 1_n 1_n^T, and 1_n is the n × 1 matrix of ones. We only consider algorithms of the form of Equation (3) for which the weighting matrix W is symmetric. If W is symmetric, it is shown in [6] (Theorem 1) that Equation (5) holds if and only if W 1_n = 1_n and ‖W − P_n‖_2 < 1, where ‖·‖_2 denotes the spectral norm. For the reader's convenience, we here recall that if A ∈ R^{n×n} is symmetric, then ‖A‖_2 = |λ_1(A)|, where λ_l(A), l ∈ {1, 2, ..., n}, denote the eigenvalues of A, which, in this paper, are arranged such that |λ_1(A)| ≥ |λ_2(A)| ≥ ... ≥ |λ_n(A)| (e.g., [11] (pp. 350, 603)).
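As a minimal illustration of the synchronous iteration x(t+1) = W x(t), the following sketch uses an illustrative symmetric weighting matrix on a 3-node path satisfying W 1 = 1 and ‖W − P_n‖_2 < 1 (the weights are ours, not an output of the paper's algorithm):

```python
import numpy as np

# Synchronous linear consensus iteration on a hypothetical 3-node path graph.
# W is symmetric with rows summing to 1, so the iterates converge to the average.
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])

x = np.array([3.0, 0.0, 6.0])   # initial measurements x(0)
x_ave = x.mean()                # target consensus value (here 3.0)

for _ in range(100):            # t = 0, 1, 2, ...
    x = W @ x                   # each node mixes its own and its neighbours' values

# after enough iterations, every entry of x is (numerically) x_ave
```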

Considered Minimization Problem: FSDLA Problem
We denote with W(G) the set of all the n × n real symmetric matrices that satisfy Equation (4) and W 1_n = 1_n simultaneously, that is,

W(G) := { W ∈ R^{n×n} : W = W^T, [W]_{i,j} = 0 if {i,j} ∉ E and i ≠ j, and W 1_n = 1_n }.    (6)

In [6], the convergence time of an algorithm of the form of Equation (3) with symmetric weighting matrix W is defined as

τ(W) := 1 / log(1 / ‖W − P_n‖_2).

This convergence time is a mathematical measure of the convergence speed of the algorithm.
Accordingly, we define the FSDLA problem as that of finding a weighting matrix W_opt ∈ W(G) such that

‖W_opt − P_n‖_2 ≤ ‖W − P_n‖_2 for all W ∈ W(G).    (7)

We observe that, in this definition, the meaning of fastest is in terms of convergence time.
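A convergence-time computation of this kind can be sketched as follows, assuming the definition τ(W) = 1 / log(1 / ‖W − P_n‖_2) from [6] (the 3-node weighting matrix is illustrative, not optimal):

```python
import numpy as np

# Convergence time tau(W) = 1 / log(1 / ||W - P_n||_2), valid when
# ||W - P_n||_2 < 1. Smaller tau means faster consensus.
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])
n = W.shape[0]
P_n = np.ones((n, n)) / n            # P_n = (1/n) 1_n 1_n^T

rho = np.linalg.norm(W - P_n, 2)     # spectral norm; here rho = |lambda_2(W)| = 2/3
tau = 1.0 / np.log(1.0 / rho)
```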
It is shown in [6] that the FSDLA problem of Equation (7) is a constrained convex minimization problem that can be efficiently solved. In fact, in [6], it is shown that the FSDLA problem of Equation (7) can be expressed as a semidefinite program, and semidefinite programs can be efficiently solved [12]. However, to the best of our knowledge, there are as yet no approaches for solving this FSDLA problem in a distributed (in-network) way. The contribution of this paper is to solve the FSDLA problem of Equation (7) in a distributed way. To do so, we develop a distributed subgradient method.
Finally, it should be mentioned that in [7], the authors solved, in a distributed way, a related problem: finding the fastest mixing Markov chain (FMMC). The FMMC problem is devoted to finding the fastest symmetric weighting matrix whose weights are all nonnegative. Since every such matrix belongs to W(G), the solution of the FSDLA problem is faster than, or is at least as fast as, the solution of the FMMC problem.

FSDLA as an Unconstrained Convex Minimization Problem
In order to use a distributed subgradient method (the classical reference on subgradient methods is [13]), we first need to convert the FSDLA problem into an unconstrained convex minimization problem. We observe that if W ∈ W(G), then W depends only on w_{e_k} := w_{i_k, j_k} for all k ∈ {1, 2, ..., K}. We notice that w_{e_k} is well defined because W is symmetric. In fact, as stated in [6], given the vector w = (w_{e_1}, w_{e_2}, ..., w_{e_K})^T ∈ R^{K×1}, the corresponding weighting matrix is

W(w) = I_n − ∑_{k=1}^{K} w_{e_k} A_k,    (8)

where I_n is the n × n identity matrix and A_k ∈ R^{n×n} is the matrix whose entries are all zero except [A_k]_{i_k,i_k} = [A_k]_{j_k,j_k} = 1 and [A_k]_{i_k,j_k} = [A_k]_{j_k,i_k} = −1. In other words, the function W : R^{K×1} → W(G) defined in Equation (8) is a bijection. We define the function f : R^{K×1} → R as f(w) := ‖W(w) − P_n‖_2. We observe that the FSDLA problem of Equation (7) can now be expressed as the unconstrained minimization of the function f.
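The parameterization of Equation (8) and the cost function f can be sketched as follows (the function names and the 3-node example are our own, illustrative choices):

```python
import numpy as np

# W(w) = I_n - sum_k w_{e_k} A_k, where A_k has +1 at (i_k, i_k) and (j_k, j_k)
# and -1 at (i_k, j_k) and (j_k, i_k). Edges use 0-based node indices here.
def weight_matrix(n, edges, w):
    W = np.eye(n)
    for k, (i, j) in enumerate(edges):
        W[i, i] -= w[k]
        W[j, j] -= w[k]
        W[i, j] += w[k]
        W[j, i] += w[k]
    return W

def f(W):
    """Cost function f(w) = ||W(w) - P_n||_2 of the FSDLA problem."""
    n = W.shape[0]
    return np.linalg.norm(W - np.ones((n, n)) / n, 2)

# 3-node path, both edge weights 1/3: W is symmetric and W 1 = 1 by construction.
W = weight_matrix(3, [(0, 1), (1, 2)], [1/3, 1/3])
```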
In the sequel, we denote with w̄ a solution of the FSDLA problem, that is, f(w̄) = min_{w ∈ R^{K×1}} f(w). It is easy to show that f has a bounded set of minimum points. In the sequel, we will refer to the function f as the cost function of the FSDLA problem. We finish the section with Lemma 1, which will be useful in the derivation of the algorithm.

Algorithm for the In-Network Solution of the FSDLA Problem
We here derive the algorithm that solves the FSDLA problem in a distributed way (Algorithm 1). To this end, we assume that n is known by all the nodes of the network. The task of counting nodes can be performed in a distributed way (see [14]). The algorithm is a distributed implementation of a subgradient method. More specifically, each pair of nodes {i_k, j_k} will update their weight w_{i_k, j_k} according to the following iterative formula:

w_{p+1} = w_p − η_p ∇f(w_p),    (9)

where w_p ∈ R^{K×1} is the vector of weights at the pth step, η_p ∈ R is the stepsize, and ∇f(w) is a subgradient of f at w. We recall here that a vector g ∈ R^{K×1} is a subgradient of f at w if f(v) ≥ f(w) + g^T (v − w) for all v ∈ R^{K×1}.

Theorem 1. If w ∈ R^{K×1} is such that 0 < f(w) < 1, and y = (y_1, y_2, ..., y_n)^T ∈ R^{n×1} is such that ‖y‖ = 1 and W(w) y = (−1)^s |λ_2(W(w))| y for some s ∈ {1, 2}, then a subgradient of f at w is given componentwise by

[∇f(w)]_k = (−1)^{s−1} (y_{i_k} − y_{j_k})^2, k ∈ {1, 2, ..., K}.    (10)

We observe that Equation (10) can be computed in a distributed way if each node i ∈ V is able to know y_i. The following result provides a means of computing such a unit eigenvector y of W(w) in a distributed way.
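A centralized sketch of the subgradient of Theorem 1 follows (names are illustrative; a real deployment would obtain the eigenvector y in-network, as discussed next, rather than via a full eigendecomposition):

```python
import numpy as np

# Subgradient of f at w, computed from a unit eigenvector y of W(w) associated
# with the eigenvalue of second-largest modulus, lambda = (-1)^s |lambda_2|.
def subgradient(W, edges):
    vals, vecs = np.linalg.eigh(W)        # W is symmetric
    order = np.argsort(-np.abs(vals))     # sort eigenvalues by decreasing modulus
    lam2 = vals[order[1]]                 # eigenvalue of second-largest modulus
    y = vecs[:, order[1]]                 # associated unit eigenvector
    s = 2 if lam2 > 0 else 1              # sign convention lam2 = (-1)^s |lam2|
    return np.array([(-1.0) ** (s - 1) * (y[i] - y[j]) ** 2
                     for i, j in edges])

# 3-node path, both weights 1/3: lambda_2 = 2/3, y = (1, 0, -1)/sqrt(2),
# so both subgradient components equal -1/2.
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])
g = subgradient(W, [(0, 1), (1, 2)])
```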
The rest of the section is devoted to proving that Equation (9) can be computed in a distributed way (Theorems 1-3), and to proving that Equation (9) actually converges to w̄ (Theorem 4).

Proof. Let W = W(w) = U diag(1, λ_2(W), ..., λ_n(W)) U^T be as in the proof of Lemma 1, with U = (u_1 | u_2 | ... | u_n). Observe that λ_2(W) ≠ 0, as |λ_2(W)| = f(w) ≠ 0. If (−1)^{s−1} |λ_2(W)| is an eigenvalue of W for some s ∈ {1, 2}, then we denote by L_s its algebraic multiplicity. Otherwise, we set L_s = 0. From Lemma 1, f(w) = |λ_2(W)|, and consequently L_1 and L_2 cannot be simultaneously zero. Moreover, without loss of generality, we can assume that the columns of U are ordered so that the eigenvalues of modulus |λ_2(W)| occupy positions 2, ..., L_1 + L_2 + 1. Then, we have that

x(t) = (W(w))^t x(0) = ∑_{l=1}^{n} α_l (λ_l(W))^t u_l,

where α_l = u_l^T x(0) for all l ∈ {1, 2, ..., n}. On the one hand, from Equation (13), we obtain the contribution of the eigenvalues of modulus |λ_2(W)|. On the other hand, as |λ_l(W)| < |λ_2(W)| for all l ∈ {L_1 + L_2 + 2, ..., n}, the remaining terms vanish at a faster rate. Consequently, combining Equations (14) and (15), we obtain Equation (11). From Equation (11), (a) implies (b) for all x(0) ∈ R^{n×1}. As f(w) < 1, from Lemma 1 and Equation (15), we have y_s = 0_{n×1} if and only if a_s = 0_{n×1}. Consequently, if (b) holds, the set of x(0) such that a_s = 0_{n×1} is a vector space whose dimension is less than n; thus, it has Lebesgue measure 0. Therefore, (a) and (b) are equivalent for almost every x(0) ∈ R^{n×1}.

Theorem 2 implies that y_1 and y_2 cannot be zero simultaneously. Therefore, either y_1/‖y_1‖ or y_2/‖y_2‖ is the unit eigenvector required for computing Equation (10). We notice that the norm of a vector can be computed in a distributed way because it is the square root of n times the average of the squares of its entries. Consequently, we only need to know how to compute Equation (12) in a distributed way or, equivalently, how to compute the cost function f in a distributed way:

f(w) = lim_{t→∞} ‖x(t) − x_ave 1_n‖_2^{1/t}

for almost every x(0) ∈ R^{n×1}, where x(t) = (W(w))^t x(0) for all t ∈ {0, 1, 2, ...}.
At this point, we have shown that the iterative Equation (9) can be computed in a distributed way. It only remains to be shown that Equation (9) actually converges to w̄:

Theorem 4. Consider w_0 ∈ R^{K×1} such that 0 < f(w_0) < 1. Let {η_p} be a sequence of positive real numbers satisfying lim_{p→∞} η_p = 0 and ∑_{p=0}^{∞} η_p = ∞. We also assume that 0 < f(w_p) < 1 for all p, where w_p is defined in Equation (9). Then, f(w̄) = ‖W_opt − P_n‖_2 = lim_{p→∞} f(w_p).
Proof. Theorem 1 yields that the subgradients ∇f(w_p) are uniformly bounded. Consequently, as f has a bounded set of minimum points, the result now follows from [13] (Theorem 2.4).
We observe that the initial point w_0 in Theorem 4 can be taken, for instance, as that given by the Metropolis-Hastings algorithm (e.g., [8]). That is, if w_0 is that given by the Metropolis-Hastings algorithm, then [w_0]_{k,1} = 1 / max(d_{i_k}, d_{j_k}) for all k ∈ {1, 2, ..., K}, where d_i is the degree of node i ∈ V (i.e., the number of nodes to which node i is connected). Therefore, w_0 can be computed in a distributed way. Table 1 relates Algorithm 1 with the theoretical aspects shown in this section.
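The Metropolis-Hastings initialization can be sketched as follows (the function name and edge list are our own, illustrative choices; note that each edge only needs the degrees of its two endpoints, which is why w_0 is computable in-network):

```python
import numpy as np

# Metropolis-Hastings initial weights: [w0]_k = 1 / max(d_{i_k}, d_{j_k}),
# where d_i is the degree of node i. Edges use 0-based node indices.
def metropolis_hastings_weights(n, edges):
    deg = np.zeros(n, dtype=int)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return np.array([1.0 / max(deg[i], deg[j]) for i, j in edges])

w0 = metropolis_hastings_weights(3, [(0, 1), (1, 2)])  # path on 3 nodes
```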
We finish the section by describing Algorithm 1. For ease of notation, we define ave_w(x, t) := (W(w))^t x, which is the tth iterate of Equation (1) and can clearly be computed in a distributed way. As for Algorithm 1, we fix t_0 to be the number of iterations of Equation (1) required for a desired precision ε. We observe that, because the worst possible network topology is a path, if we set t_0 ≥ log ε / log cos(π/n), then ‖ave_w(x, t_0) − x_ave 1_n‖_2 ≤ ε ‖x‖_2 (see [15]), and therefore t_0 can also be obtained in a distributed way.
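Since t_0 depends only on n and the desired precision ε, each node can compute it locally. A sketch, assuming the bound takes the form t_0 ≥ log ε / log cos(π/n), which is consistent with the value t_0 = 250 ≈ log 10^{−2} / log cos(π/16) used in the numerical section (the function name is ours):

```python
import math

# Number of consensus iterations needed for precision eps on the worst-case
# (path) topology: the smallest integer t0 with t0 >= log(eps) / log(cos(pi/n)).
def iterations_for_precision(n, eps):
    return math.ceil(math.log(eps) / math.log(math.cos(math.pi / n)))

t0 = iterations_for_precision(16, 1e-2)   # n = 16, eps = 10^-2 as in the paper
```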

Numerical Results
We here present the numerical results obtained using Algorithm 1 for two networks with n = 16 nodes. The chosen starting point w_0 was that given by the Metropolis-Hastings algorithm [8], and the chosen sequence of stepsizes was η_p = 1/√p for all p ∈ {1, 2, ...}. Moreover, we took t_0 = 250 ≈ log 10^{−2} / log cos(π/16). Figure 1 shows the convergence time τ(W(w_p)) for the network presented in Figure 2 (solid line). Figure 1 also shows τ(W_opt) = 10.03, which was obtained by using CVX, a package for specifying and solving convex programs in a centralized way [16,17] (dashed line). Finally, Figure 1 also shows the minimum value of τ(W(w_p)) obtained up to step p (dotted line). For comparison purposes, we observe that the convergence time yielded by the Metropolis-Hastings algorithm was τ(W(w_0)) = 20.81, while the minimum convergence time obtained after 150 iterations of our algorithm was 10.31.

Figure 3 is of the same type as Figure 1, but in this case, the considered network was a 4 × 4 grid. In this case, solving the problem optimally in a centralized way yields τ(W_opt) = 2.89.

We finish the section with a note on the number of exchanged messages (number of transmissions). For every iteration p of Algorithm 1, the number of exchanged messages per node was at most 5t_0, divided as follows: t_0 message exchanges were required for lines 8 and 9, another 2t_0 message exchanges were needed in line 10 (lines 14 and 15 did not require new message exchanges), and line 16 required another t_0 message exchanges. Finally, depending on the if-clause, another t_0 message exchanges were required in line 21. Therefore, the overall number of required transmissions per node was between 4 p_max t_0 and 5 p_max t_0.
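For readers who wish to reproduce the qualitative behaviour at small scale, the following centralized simulation sketches the whole subgradient loop of Equation (9) with η_p = 1/√p, started from the Metropolis-Hastings weights, on an illustrative 4-node path (it is not the 16-node network used above, and all names are ours):

```python
import numpy as np

# Centralized simulation of w_{p+1} = w_p - eta_p * grad_f(w_p) on a 4-node path.
n = 4
edges = [(0, 1), (1, 2), (2, 3)]

def weight_matrix(w):
    W = np.eye(n)
    for k, (i, j) in enumerate(edges):
        W[i, i] -= w[k]; W[j, j] -= w[k]
        W[i, j] += w[k]; W[j, i] += w[k]
    return W

def f(w):
    """Cost f(w) = ||W(w) - P_n||_2."""
    return np.linalg.norm(weight_matrix(w) - np.ones((n, n)) / n, 2)

def subgrad(w):
    """Subgradient built from a unit eigenvector of the 2nd-largest-modulus eigenvalue."""
    vals, vecs = np.linalg.eigh(weight_matrix(w))
    order = np.argsort(-np.abs(vals))
    lam2, y = vals[order[1]], vecs[:, order[1]]
    sign = -1.0 if lam2 > 0 else 1.0
    return np.array([sign * (y[i] - y[j]) ** 2 for i, j in edges])

w = np.array([0.5, 0.5, 0.5])      # Metropolis-Hastings start (degrees 1, 2, 2, 1)
best = f(w)                        # best cost found so far; f(w0) = sqrt(2)/2 here
for p in range(1, 151):
    w = w - (1.0 / np.sqrt(p)) * subgrad(w)
    best = min(best, f(w))
# best never exceeds the Metropolis-Hastings cost f(w0)
```

By Theorem 4, the best cost found so far approaches the optimum as the number of steps grows; the snippet only tracks it.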

Conclusions
In this paper, we have provided an algorithm for the in-network computation of the optimal weighting matrix for distributed consensus. The algorithm can be viewed as an iterative repetition of, at most, five distributed consensus operations. Our algorithm is especially useful for networks that do not have a central entity and that change with time. In fact, if a network never changes with time (and its topology is known a priori), it is easier to solve the FSDLA problem offline (in a centralized rather than a distributed way) using [6], and then pre-configure the nodes with the obtained weights. However, if the network topology changes randomly with time (e.g., if sensors are added or removed) and there is no central entity, our algorithm is, to date, the only way of obtaining the optimal solution to the FSDLA problem.