Analysis of Known Linear Distributed Average Consensus Algorithms on Cycles and Paths

In this paper, we compare six known linear distributed average consensus algorithms on a sensor network in terms of convergence time (and therefore, in terms of the number of transmissions required). The selected network topologies for the analysis (comparison) are the cycle and the path. Specifically, in the present paper, we compute closed-form expressions for the convergence time of four known deterministic algorithms and closed-form bounds for the convergence time of two known randomized algorithms on cycles and paths. Moreover, we also compute a closed-form expression for the convergence time of the fastest deterministic algorithm considered on grids.


Introduction
A distributed averaging (or average consensus) algorithm obtains in each sensor the average (arithmetic mean) of the values measured by all the sensors of a sensor network in a distributed way.
The most common distributed averaging algorithms are linear and iterative:
$$x(t+1) = W(t)\,x(t), \qquad t \in \{0, 1, 2, \ldots\}, \tag{1}$$
where $x(t) = (x_1(t), \ldots, x_n(t))^\top$ is a real vector, $n$ is the number of sensors of the network, which we label $v_j$ with $j \in \{1, \ldots, n\}$, $x_j(0)$ is the value measured by the sensor $v_j$, $x_j(t)$ is the value computed by the sensor $v_j$ at time $t \neq 0$, and the weighting matrix $W(t)$ is an $n \times n$ real sparse matrix satisfying that if two sensors $v_j$ and $v_k$ are not connected (i.e., if $v_j$ and $v_k$ cannot interchange information), then $[W(t)]_{j,k} = 0$. From the point of view of communication protocols, there exist efficient ways of implementing synchronous algorithms of the form of (1) (see, e.g., [1]). The linear distributed averaging algorithms can be classified as deterministic or randomized depending on the nature of the weighting matrices $W(t)$.
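As an illustration of the iteration in (1), the following minimal sketch runs a time-invariant version on a cycle with five sensors. The helper names (`cycle_weights`, `step`, `iterate`), the weight value 1/3 and the measurements are assumptions chosen only for this example; they are not taken from the references.

```python
def cycle_weights(n, gamma):
    """Hypothetical helper: symmetric n x n weights for a cycle (n >= 3).

    Each sensor keeps 1 - 2*gamma of its own value and takes gamma from
    each of its two neighbours; all other entries are zero (sparsity).
    """
    W = [[0.0] * n for _ in range(n)]
    for j in range(n):
        W[j][j] = 1.0 - 2.0 * gamma
        W[j][(j + 1) % n] = gamma
        W[j][(j - 1) % n] = gamma
    return W


def step(W, x):
    """One iteration x(t+1) = W x(t)."""
    return [sum(w * xk for w, xk in zip(row, x)) for row in W]


def iterate(W, x0, iterations):
    """Apply the same weighting matrix a fixed number of times."""
    x = list(x0)
    for _ in range(iterations):
        x = step(W, x)
    return x


x0 = [3.0, 1.0, 4.0, 1.0, 5.0]          # made-up measurements
W = cycle_weights(len(x0), 1.0 / 3.0)   # assumed weight value
x = iterate(W, x0, 200)
average = sum(x0) / len(x0)
```

After enough iterations every entry of `x` is numerically indistinguishable from `average`, which is the behavior a distributed averaging algorithm is meant to have.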

Deterministic Linear Distributed Averaging Algorithms
Several well-known deterministic linear distributed averaging algorithms can be found in [2] and [3]. Those algorithms are time-invariant and have symmetric weights, that is, the deterministic weighting matrix $W(t)$ is symmetric and does not depend on $t$ (and consequently, $x(t) = W^t x(0)$).
In [2], the authors search among all the symmetric weighting matrices W the one that makes (1) the fastest possible and show that such a matrix can be obtained by numerically solving a convex optimization problem. This algorithm is called the fastest linear time-invariant (LTI) distributed averaging algorithm for symmetric weights. It should be mentioned that in [4], the authors proposed an in-network algorithm for finding such an optimal weighting matrix.
In [2], the authors also give a slower algorithm: the fastest constant edge weights algorithm. In this other algorithm, they consider a particular structure of symmetric weighting matrices that depends on a single parameter and find the value of that parameter that makes (1) the fastest possible.
In [3], another two algorithms can be found: the maximum-degree weights algorithm and the Metropolis-Hastings algorithm.
For other deterministic linear distributed averaging algorithms, we refer the reader to [5] and the references therein.

Randomized Linear Distributed Averaging Algorithms
For the randomized case, a well-known linear distributed averaging algorithm was given in [6]. That algorithm is called the pairwise gossip algorithm because only two randomly-selected sensors interchange information at each time instant t.
Another well-known randomized algorithm can be found in [7]. That algorithm is called the broadcast gossip algorithm because a single sensor is randomly selected at each time instant t and broadcasts its value to all its neighboring sensors. The broadcast gossip algorithm is a linear distributed consensus algorithm rather than a linear distributed averaging algorithm. However, the broadcast gossip algorithm converges to a random consensus value, which is, in expectation, the average of the values measured by all the sensors of the network. If one uses the directed version of the broadcast gossip algorithm [8] in a symmetric graph, one would converge to the true average.
For other randomized linear distributed averaging algorithms, we refer the reader to [9] and the references therein. The linear distributed averaging algorithms reviewed in Sections 1.1 and 1.2 are the most cited algorithms in the literature on the topic.

Our Contribution
A key feature of a distributed averaging algorithm is its convergence time, because it allows one to establish the stopping criterion for the iterative algorithm. The convergence time is defined as the number of iterations $t$ required in (1) until the value computed by the sensors, $x(t)$, is sufficiently close to the steady state (within a threshold $\varepsilon$). In the literature, we have not found closed-form expressions for the convergence time of the six linear distributed averaging algorithms mentioned in Sections 1.1 and 1.2. A mathematical expression is said to be a closed-form expression if it is written in terms of a finite number of elementary functions (i.e., in terms of a finite number of constants, arithmetic operations, roots, exponentials, natural logarithms and trigonometric functions).
In the present paper, we compute closed-form expressions for the convergence time of the deterministic algorithms and closed-form upper bounds for the convergence time of the randomized algorithms on two common network topologies: the cycle and the path. Observe that these closed-form formulas give us upper bounds for the convergence time of the considered algorithms (stopping criteria) on any network that contains as a subgraph a cycle or a path with the same number of sensors. Specifically, in this paper, we compute closed-form expressions for the convergence time of the four deterministic algorithms on cycles and paths, closed-form bounds for the convergence time of the two randomized algorithms on cycles and paths, and a closed-form expression for the convergence time of the fastest deterministic algorithm on grids.
From these closed-form formulas, we study the asymptotic behavior of the convergence time of the considered algorithms as the number of sensors of the network grows. The obtained asymptotic and non-asymptotic results allow us to compare the considered algorithms in terms of convergence time and, consequently, in terms of the number of transmissions required, as well (see Sections 4 and 5). Knowing the number of transmissions required tells us the energy consumption of the distributed technique.
The knowledge of the energy consumption is a key factor in the design of a new wireless sensor network (WSN), where one has to decide the number of nodes and the network topology. It should be mentioned that when designing new WSNs, cycles, paths and grids are topologies that are considered frequently.

Convergence Time of Deterministic Linear Distributed Averaging Algorithms
Different definitions of convergence time are used in the literature. We have found three different definitions for the convergence time of a deterministic linear distributed averaging algorithm (see [2,10,11]). In this paper, we consider the definition of $\varepsilon$-convergence time given in [11], stated in (3), where $\varepsilon \in (0, 1)$, $\|\cdot\|_2$ is the spectral norm and $P_n := \frac{1}{n} 1_n 1_n^\top$, with $1_n$ being the $n \times 1$ matrix of ones and $\top$ denoting the transpose. If we replace the spectral norm by the infinity norm in that definition, we obtain the definition of $\varepsilon$-convergence time given in [10]. If the deterministic matrix $W(t)$ in (1) does not depend on $t$, we denote the $\varepsilon$-convergence time by $\tau(\varepsilon, W)$.
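For a time-invariant symmetric weighting matrix $W$ with $W P_n = P_n$, one has $W^t - P_n = (W - P_n)^t$, so the worst-case relative error after $t$ iterations is $\rho^t$ with $\rho = \|W - P_n\|_2$. The sketch below uses that fact to compute the smallest $t$ with $\rho^t \le \varepsilon$. The function name `convergence_time` and the example value of $\rho$ are assumptions made for illustration; the example value corresponds to a five-sensor cycle with neighbour weight 1/3.

```python
import math


def convergence_time(slem, eps):
    """Smallest integer t with slem**t <= eps, for 0 < slem < 1, 0 < eps < 1.

    slem stands for rho = ||W - P_n||_2 of a symmetric, time-invariant W.
    """
    t = math.ceil(math.log(eps) / math.log(slem))
    # Guard against floating-point round-off at the boundary.
    while t > 1 and slem ** (t - 1) <= eps:
        t -= 1
    while slem ** t > eps:
        t += 1
    return t


# Assumed example: five-sensor cycle, weight 1/3 on each neighbour,
# whose second largest eigenvalue modulus is (1 + 2 cos(2 pi / 5)) / 3.
rho_cycle5 = (1.0 + 2.0 * math.cos(2.0 * math.pi / 5.0)) / 3.0
tau_cycle5 = convergence_time(rho_cycle5, 0.01)
```

The two `while` loops only correct possible rounding of the logarithm ratio; mathematically the result is $\lceil \log\varepsilon / \log\rho \rceil$.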

Convergence Time of the Fastest LTI Distributed Averaging Algorithm for Symmetric Weights
In this section, we give a closed-form expression for the $\varepsilon$-convergence time of the fastest LTI distributed averaging algorithm for symmetric weights, and we study its asymptotic behavior as the number of sensors of the network grows. We consider three common network topologies: the cycle, the grid and the path (see Figure 1).

The Cycle
Let $\mathring{W}_n(\gamma)$ be the $n \times n$ matrix defined in (4). Using (4), Theorem 1 gives the expression of the weighting matrix of the fastest LTI distributed averaging algorithm for symmetric weights on a cycle with $n$ sensors.
Theorem 1. Let $n \in \mathbb{N}$, with $n > 3$. Then, $\mathring{W}_n(\gamma_0)$ is the weighting matrix of the fastest LTI distributed averaging algorithm for symmetric weights on a cycle with $n$ sensors, where the value of $\gamma_0$ depends on whether $n$ is even or odd.
Proof. See Appendix B.
We now give a closed-form expression for the $\varepsilon$-convergence time of the fastest LTI distributed averaging algorithm for symmetric weights on a cycle. We also study the asymptotic behavior of this convergence time as the number of sensors of the cycle grows. We first introduce some notation. Two sequences of numbers $\{a_n\}$ and $\{b_n\}$ are said to be asymptotically equal, written $a_n \sim b_n$, if and only if $\lim_{n\to\infty} \frac{a_n}{b_n} = 1$ (see, e.g., [12] (p. 396)). Let $f, g : \mathbb{N} \to \mathbb{R}$ be two non-negative functions. We write $f(n) = O(g(n))$ (respectively, $f(n) = \Omega(g(n))$) if there exist $K \in (0, \infty)$ and $n_0 \in \mathbb{N}$ such that $f(n) \le K g(n)$ (respectively, $f(n) \ge K g(n)$) for all $n \ge n_0$. If $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$, then we write $f(n) = \Theta(g(n))$.

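Theorem 2. Consider $\varepsilon \in (0, 1)$ and $n \in \mathbb{N}$, with $n > 3$. Then, $\tau(\varepsilon, \mathring{W}_n(\gamma_0))$ is given by a closed-form expression in which $\log$ is the natural logarithm and $\lceil x \rceil$ denotes the smallest integer not less than $x$. Moreover, $\tau(\varepsilon, \mathring{W}_n(\gamma_0)) = \Theta(n^2 \log \varepsilon^{-1})$.
Proof. See Appendix C.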

The Grid
Let $\widetilde{W}_n(\alpha)$ be the $n \times n$ matrix given in (11) for $n \ge 2$, and $\widetilde{W}_1(\alpha) := 1$. We define $W_{r,c}(\alpha)$ as in (12), where $\otimes$ is the Kronecker product. Using (12), Theorem 3 gives the expression of the weighting matrix of the fastest LTI distributed averaging algorithm for symmetric weights on a grid of $r$ rows and $c$ columns.
Theorem 3. Let $r, c \in \mathbb{N}$, with $rc > 2$. Then, the $rc \times rc$ matrix $W_{r,c}\left(\frac{1}{2}\right)$ is the weighting matrix of the fastest LTI distributed averaging algorithm for symmetric weights on a grid of $r$ rows and $c$ columns.

Proof. See Appendix D.
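We now give a closed-form expression for the $\varepsilon$-convergence time of the fastest LTI distributed averaging algorithm for symmetric weights on a grid of $r$ rows and $c$ columns. We also study the asymptotic behavior of this convergence time as the number of rows of the grid grows.
Theorem 4. Consider $\varepsilon \in (0, 1)$ and $r, c \in \mathbb{N}$, with $rc > 2$. Without loss of generality, we assume $r \ge c$. Then, $\tau\left(\varepsilon, W_{r,c}\left(\frac{1}{2}\right)\right)$ is given by the closed-form expression (13). Moreover, its asymptotic behavior as $r$ grows is given in (14), and consequently,
$$\tau\left(\varepsilon, W_{r,c}\left(\tfrac{1}{2}\right)\right) = \Theta(r^2 \log \varepsilon^{-1}). \tag{15}$$
Proof. From [2] (Theorem 1), Theorem A1 and (A64), we obtain (13). The rest of the proof runs as the proof of Theorem 2.
Since the number of transmissions per iteration on a grid of $r$ rows and $c$ columns is $rc$ for the fastest LTI distributed averaging algorithm for symmetric weights, the total number of transmissions required for $\tau\left(\varepsilon, W_{r,c}\left(\frac{1}{2}\right)\right)$ iterations is:
$$T\left(\varepsilon, W_{r,c}\left(\tfrac{1}{2}\right)\right) := rc\,\tau\left(\varepsilon, W_{r,c}\left(\tfrac{1}{2}\right)\right). \tag{16}$$
If $r = c = \sqrt{n}$, from Theorem 4, we obtain the asymptotic behavior of this quantity, and hence, $T\left(\varepsilon, W_{r,c}\left(\frac{1}{2}\right)\right) = \Theta(n^2 \log \varepsilon^{-1})$. Observe that from (13), the optimal configuration for a grid with $n$ sensors is obtained when $r = c = \sqrt{n}$.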
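The claim that the square grid is the best configuration for a fixed number of sensors can be explored numerically. The sketch below is only illustrative: it assumes that the second largest eigenvalue modulus of $W_{r,c}\left(\frac{1}{2}\right)$ is $\cos(\pi / \max\{r, c\})$, an assumption that is consistent with the $\Theta(r^2 \log \varepsilon^{-1})$ behavior in (15), and the function name `grid_transmissions` is hypothetical.

```python
import math


def grid_transmissions(r, c, eps):
    """Transmissions r*c*tau under the ASSUMED second largest eigenvalue
    modulus cos(pi / max(r, c)) for the grid weighting matrix."""
    slem = math.cos(math.pi / max(r, c))
    tau = math.ceil(math.log(eps) / math.log(slem))
    return r * c * tau


n = 36
eps = 0.01
# All r x c grids with r*c = n and at least two rows and two columns.
shapes = [(r, n // r) for r in range(2, n) if n % r == 0]
costs = {shape: grid_transmissions(shape[0], shape[1], eps) for shape in shapes}
best_shape = min(costs, key=costs.get)
```

Under this assumption, the cost depends on the shape only through the longer side, so among all factorizations of $n$ the square one gives the fewest transmissions.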

The Path
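Since the path with $n$ sensors can be seen as a grid of $n$ rows and one column, from Theorem 3, we conclude that $\widetilde{W}_n\left(\frac{1}{2}\right)$ is the weighting matrix of the fastest LTI distributed averaging algorithm for symmetric weights on a path of $n$ sensors, and from Theorem 4, we obtain a closed-form expression for $\tau\left(\varepsilon, \widetilde{W}_n\left(\frac{1}{2}\right)\right)$ together with its asymptotic behavior; consequently, $\tau\left(\varepsilon, \widetilde{W}_n\left(\frac{1}{2}\right)\right) = \Theta(n^2 \log \varepsilon^{-1})$. Finally, from (16), we obtain the corresponding number of transmissions, $T\left(\varepsilon, \widetilde{W}_n\left(\frac{1}{2}\right)\right) = n\,\tau\left(\varepsilon, \widetilde{W}_n\left(\frac{1}{2}\right)\right)$, and hence, $T\left(\varepsilon, \widetilde{W}_n\left(\frac{1}{2}\right)\right) = \Theta(n^3 \log \varepsilon^{-1})$.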
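The quadratic growth of the convergence time on a path can be checked with a short computation. The sketch below assumes that the path weighting matrix has $\alpha$ on the two off-diagonals, $1 - \alpha$ at the two end sensors and $1 - 2\alpha$ elsewhere (an assumed form for illustration; the helper names are hypothetical). For $\alpha = \frac{1}{2}$ it verifies numerically that $\cos(k\pi/n)$, $k = 0, \ldots, n-1$, are eigenvalues, and then compares $\lceil \log\varepsilon / \log\cos(\pi/n) \rceil$ with the leading term $2 n^2 \log\varepsilon^{-1} / \pi^2$.

```python
import math


def path_weights(n, alpha):
    """ASSUMED path matrix: alpha between neighbours, diagonal entries
    1 - alpha * (number of neighbours)."""
    W = [[0.0] * n for _ in range(n)]
    for j in range(n):
        neighbours = 0
        if j > 0:
            W[j][j - 1] = alpha
            neighbours += 1
        if j < n - 1:
            W[j][j + 1] = alpha
            neighbours += 1
        W[j][j] = 1.0 - alpha * neighbours
    return W


def matvec(W, v):
    return [sum(w * vk for w, vk in zip(row, v)) for row in W]


# Check W v = cos(k pi / n) v for v_j = cos((j + 1/2) k pi / n).
n_small = 8
W = path_weights(n_small, 0.5)
residuals = []
for k in range(n_small):
    lam = math.cos(k * math.pi / n_small)
    v = [math.cos((j + 0.5) * k * math.pi / n_small) for j in range(n_small)]
    Wv = matvec(W, v)
    residuals.append(max(abs(a - lam * b) for a, b in zip(Wv, v)))

# Convergence time from the second largest eigenvalue modulus cos(pi/n).
n_large, eps = 200, 0.01
tau = math.ceil(math.log(eps) / math.log(math.cos(math.pi / n_large)))
leading_term = 2.0 * n_large ** 2 * math.log(1.0 / eps) / math.pi ** 2
transmissions = n_large * tau
```

The ratio `tau / leading_term` is close to one already for a few hundred sensors, which illustrates the $\Theta(n^2 \log \varepsilon^{-1})$ growth of the convergence time and the $\Theta(n^3 \log \varepsilon^{-1})$ growth of the number of transmissions.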

Convergence Time of the Fastest Constant Edge Weights Algorithm
In [2], the authors consider the real symmetric weighting matrices $W_n(\rho)$ given by (22), where $d_j$ denotes the degree of the sensor $v_j$ (i.e., the number of sensors different from $v_j$ connected to $v_j$).
Observe that the weighting matrices of the fastest LTI distributed averaging algorithms for symmetric weights given in Section 2.1 for a cycle and a path, namely $\mathring{W}_n(\gamma_0)$ and $\widetilde{W}_n\left(\frac{1}{2}\right)$, can be regarded as $W_n(\rho)$ in (22) taking $\rho = \gamma_0$ and $\rho = \frac{1}{2}$, respectively. Therefore, the closed-form expression for the $\varepsilon$-convergence time of the fastest constant edge weights algorithm is given by Theorem 2 on a cycle and by Theorem 4 on a path. That is, the $\varepsilon$-convergence time of the fastest constant edge weights algorithm and the $\varepsilon$-convergence time of the fastest LTI distributed averaging algorithm for symmetric weights are the same on a cycle and on a path.
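The value $\rho = \frac{1}{2}$ on a path can be recovered from the known result in [2] that the best constant edge weight is $2 / (\lambda_{\min}^{+} + \lambda_{\max})$, where $\lambda_{\min}^{+}$ and $\lambda_{\max}$ are the smallest non-zero and the largest eigenvalues of the graph Laplacian. The sketch below applies this to the standard Laplacian spectra of cycles and paths; the function names are hypothetical, and the weighting matrix is assumed to have the form $I_n - \rho L$.

```python
import math


def cycle_laplacian_eigs(n):
    """Laplacian eigenvalues of a cycle with n sensors."""
    return [2.0 - 2.0 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def path_laplacian_eigs(n):
    """Laplacian eigenvalues of a path with n sensors."""
    return [2.0 - 2.0 * math.cos(math.pi * k / n) for k in range(n)]


def best_constant_weight(lap_eigs):
    """Best constant edge weight 2 / (smallest non-zero + largest)."""
    nonzero = sorted(lap_eigs)[1:]   # drop the single zero eigenvalue
    return 2.0 / (nonzero[0] + nonzero[-1])


def slem(lap_eigs, rho):
    """Second largest eigenvalue modulus of I - rho * L (ASSUMED form)."""
    nonzero = sorted(lap_eigs)[1:]
    return max(abs(1.0 - rho * mu) for mu in nonzero)


rho_path = best_constant_weight(path_laplacian_eigs(7))
rho_cycle = best_constant_weight(cycle_laplacian_eigs(6))
```

For the path, the smallest non-zero and the largest Laplacian eigenvalues always add up to four, so the best constant weight is $\frac{1}{2}$ for every $n$; for the six-sensor cycle the same formula gives 0.4.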

Convergence Time of the Maximum-Degree Weights Algorithm and of the Metropolis-Hastings Algorithm
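For the maximum-degree weights algorithm [3], the weighting matrix considered is the real symmetric matrix $W_n(\rho)$ in (22) with a value of $\rho$ determined by the maximum degree of the network. On the other hand, for the Metropolis-Hastings algorithm [3], the entries of the weighting matrix $W_n$ are defined in terms of the degrees $d_j$ and of $A$, where $A$ is the adjacency matrix of the network, that is, $A$ is the $n \times n$ real symmetric matrix whose entry $[A]_{j,k}$ is 1 if $v_j$ and $v_k$ are connected and 0 otherwise.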
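A small sketch may help to see how both weighting matrices are built from the adjacency matrix. The formulas used here are assumptions for illustration: $\rho = 1/(1 + \max_j d_j)$ for the maximum-degree weights and $1/(1 + \max\{d_j, d_k\})$ on each edge for the Metropolis-Hastings weights, with the diagonal chosen so that every row sums to one. These choices are consistent with the statements in the following subsections that both matrices use the value $\frac{1}{3}$ on cycles and on paths.

```python
def cycle_adjacency(n):
    """Adjacency matrix of a cycle with n >= 3 sensors."""
    return [[1 if (j - k) % n in (1, n - 1) else 0 for k in range(n)]
            for j in range(n)]


def path_adjacency(n):
    """Adjacency matrix of a path with n sensors."""
    return [[1 if abs(j - k) == 1 else 0 for k in range(n)] for j in range(n)]


def degrees(A):
    return [sum(row) for row in A]


def max_degree_weights(A):
    """ASSUMED rule: rho = 1 / (1 + max degree) on every edge."""
    d = degrees(A)
    rho = 1.0 / (1 + max(d))
    n = len(A)
    return [[rho if A[j][k] else (1.0 - d[j] * rho if j == k else 0.0)
             for k in range(n)] for j in range(n)]


def metropolis_hastings_weights(A):
    """ASSUMED rule: 1 / (1 + max(d_j, d_k)) on the edge {v_j, v_k}."""
    d = degrees(A)
    n = len(A)
    W = [[1.0 / (1 + max(d[j], d[k])) if A[j][k] else 0.0
          for k in range(n)] for j in range(n)]
    for j in range(n):
        W[j][j] = 1.0 - sum(W[j])   # diagonal makes each row sum to one
    return W


W_md_path = max_degree_weights(path_adjacency(5))
W_mh_path = metropolis_hastings_weights(path_adjacency(5))
W_md_cycle = max_degree_weights(cycle_adjacency(5))
W_mh_cycle = metropolis_hastings_weights(cycle_adjacency(5))
```

On these two topologies the two constructions coincide; on networks with more varied degrees they generally differ.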

The Cycle
Observe that the weighting matrices of the maximum-degree weights algorithm and the Metropolis-Hastings algorithm for a cycle with $n$ sensors can be regarded as (4) taking $\gamma = \frac{1}{3}$. We now give a closed-form expression for the $\varepsilon$-convergence time of the maximum-degree weights algorithm and of the Metropolis-Hastings algorithm on a cycle (Theorem 5). We also study the asymptotic behavior of this convergence time as the number of sensors of the cycle grows.
Since the number of transmissions per iteration on a cycle with $n$ sensors is $n$ for both algorithms, the total number of transmissions required for $\tau\left(\varepsilon, \mathring{W}_n\left(\frac{1}{3}\right)\right)$ iterations is $n$ times that number of iterations. From Theorem 5, we obtain the asymptotic behavior of this quantity, and thus, the number of transmissions required is $\Theta(n^3 \log \varepsilon^{-1})$.

The Path
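Observe that the weighting matrices of the maximum-degree weights algorithm and of the Metropolis-Hastings algorithm for a path with $n$ sensors can be regarded as (11) taking $\alpha = \frac{1}{3}$. We now give a closed-form expression for the $\varepsilon$-convergence time of the maximum-degree weights algorithm and of the Metropolis-Hastings algorithm on a path. We also study the asymptotic behavior of this convergence time as the number of sensors of the path grows.
Theorem 6. Consider $\varepsilon \in (0, 1)$ and $n \in \mathbb{N}$, with $n > 3$. Then, $\tau\left(\varepsilon, \widetilde{W}_n\left(\frac{1}{3}\right)\right)$ is given by the closed-form expression (31), and therefore, its asymptotic behavior as $n$ grows follows.
Proof. Combining (A63) and [4] (Lemma 1), we obtain the eigenvalues of the weighting matrix, and applying [2] (Theorem 1) and Theorem A1, (31) holds. The rest of the proof runs as the proof of Theorem 2.
Since the number of transmissions per iteration on a path with $n$ sensors is $n$ for both algorithms, the total number of transmissions required for $\tau\left(\varepsilon, \widetilde{W}_n\left(\frac{1}{3}\right)\right)$ iterations is $n$ times that number of iterations. From Theorem 6, we obtain the asymptotic behavior of this quantity, and thus, the number of transmissions required is $\Theta(n^3 \log \varepsilon^{-1})$.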

Lower and Upper Bounds for the Convergence Time of the Pairwise Gossip Algorithm
In the literature, we have found two different definitions for the convergence time of a randomized linear distributed averaging algorithm (see [6,7]). In this subsection, we consider the definition of $\varepsilon$-convergence time for a randomized linear distributed averaging algorithm given in [6], stated in (36), where $\varepsilon \in (0, 1)$ and $\Pr$ denotes probability. We prove in Theorem A1 (Appendix A) that the definitions of $\varepsilon$-convergence time in (3) and (36) coincide when applied to deterministic LTI distributed averaging algorithms with symmetric weights (in particular, the four algorithms considered in Section 2). For those algorithms, we also obtain from Theorem A1 a relation between the $\varepsilon$-convergence time and $\tau(W)$, where $\tau(W)$ denotes the definition of convergence time given in [2].
We recall here that in the pairwise gossip algorithm [6], only two sensors interchange information at each time instant $t$. These two sensors $v_{j_t}$ and $v_{k_t}$ are randomly selected at each time instant $t$, and the weighting matrix $W(t)$, which we denote by $W_P(t)$, is the symmetric matrix given entrywise in (38) for all $j, k \in \{1, \ldots, n\}$.
In [6], a lower and an upper bound for the $\varepsilon$-convergence time of the pairwise gossip algorithm were introduced. We now give a closed-form expression for those bounds on a cycle and on a path, and we study their asymptotic behavior as the number of sensors of the network grows.
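The pairwise gossip iteration can be simulated in a few lines. In the sketch below, the two selected sensors replace their values by the average of both, which is the usual pairwise gossip update of [6]; the function names, the seed and the measurements are assumptions made for illustration, and the edge of the cycle is drawn uniformly, that is, with probability $\frac{1}{n}$.

```python
import random


def pairwise_gossip_step(x, j, k):
    """Sensors j and k both move to the average of their two values."""
    y = list(x)
    y[j] = y[k] = 0.5 * (x[j] + x[k])
    return y


def run_pairwise_gossip_on_cycle(x0, iterations, seed=0):
    """Draw one edge {v_j, v_(j+1)} of the cycle uniformly per iteration."""
    rng = random.Random(seed)
    n = len(x0)
    x = list(x0)
    for _ in range(iterations):
        j = rng.randrange(n)
        x = pairwise_gossip_step(x, j, (j + 1) % n)
    return x


x0 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]   # made-up measurements
x = run_pairwise_gossip_on_cycle(x0, 2000)
```

Each step preserves the sum of the values, so the only possible consensus value is the true average; every iteration costs two transmissions.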

The Cycle
Theorem 7. Consider $\varepsilon \in (0, 1)$ and $n \in \mathbb{N}$, with $n > 3$. Suppose that $\mathring{W}_P(t)$ is the weighting matrix of the pairwise gossip algorithm given in (38) on a cycle with $n$ sensors, where the edge $\{v_{j_t}, v_{k_t}\}$ is randomly selected at each time instant $t \in \mathbb{N} \cup \{0\}$ with probability $\frac{1}{n}$. Then, the closed-form lower and upper bounds in (39) hold for the $\varepsilon$-convergence time. Moreover, the asymptotic behavior of both bounds as $n$ grows is obtained.
Proof. The entries of the expectation of $\mathring{W}_P(0)$ can be computed for all $j, k \in \{1, \ldots, n\}$. Thus, $E(\mathring{W}_P(0)) = \mathring{W}_n\left(\frac{1}{2n}\right)$. Therefore, combining (A29) and [6] (Theorem 3), we obtain (39). The rest of the proof runs as the proof of Theorem 2.
Since the number of transmissions per iteration on a cycle with $n$ sensors is two for the pairwise gossip algorithm, the total number of transmissions required is twice the number of iterations.

The Path
Theorem 8. Consider $\varepsilon \in (0, 1)$ and $n \in \mathbb{N}$, with $n > 3$. Suppose that $\widetilde{W}_P(t)$ is the weighting matrix of the pairwise gossip algorithm given in (38) on a path with $n$ sensors, where the edge $\{v_{j_t}, v_{k_t}\}$ is randomly selected at each time instant $t \in \mathbb{N} \cup \{0\}$ with probability $\frac{1}{n-1}$. Then, the closed-form lower and upper bounds in (44) hold for the $\varepsilon$-convergence time. Moreover, the asymptotic behavior of both bounds as $n$ grows is obtained.
Proof. The entries of the expectation of $\widetilde{W}_P(0)$ can be computed for all $j, k \in \{1, \ldots, n\}$. Thus, $E(\widetilde{W}_P(0)) = \widetilde{W}_n\left(\frac{1}{2n-2}\right)$. Therefore, combining (A63) and [6] (Theorem 3), we obtain (44). The rest of the proof runs as the proof of Theorem 2.
Since the number of transmissions per iteration on a path with $n$ sensors is two for the pairwise gossip algorithm, the total number of transmissions required is twice the number of iterations.

Lower and Upper Bounds for the Convergence Time of the Broadcast Gossip Algorithm
We begin this subsection with the definition of $\varepsilon$-convergence time for a randomized linear distributed averaging algorithm given in [7] (Equation (42)), stated in (49), where $\varepsilon \in (0, 1)$. It can be proven that the definitions of $\varepsilon$-convergence time in (36) and (49) coincide when applied to algorithms in which the matrix $W(t)$ satisfies $W(t) P_n = P_n W(t) = P_n$ for all $t \in \mathbb{N} \cup \{0\}$ (in particular, the pairwise gossip algorithm and deterministic LTI distributed averaging algorithms with symmetric weights).
Observe that (49) is actually a definition for the convergence time of linear distributed consensus algorithms, not only of linear distributed averaging algorithms.
We recall here that in the broadcast gossip algorithm, a single sensor broadcasts at each time instant $t$. This sensor $v_{j_t}$ is randomly selected at each time instant $t$ with probability $\frac{1}{n}$, and the weighting matrix $W(t)$ is given entrywise in (50) for all $j, k \in \{1, \ldots, n\}$, where $\varphi \in (0, 1)$ and $A$ is the adjacency matrix of the network. We denote by $W_B(t)$ the weighting matrix in (50) when $\varphi$ is the optimal parameter $\varphi_0$ (see [7] (Section V)). In [7], a lower and an upper bound for the $\varepsilon$-convergence time of the broadcast gossip algorithm were introduced. We now give a closed-form expression for $\varphi_0$ and for those bounds on a cycle and on a path. We also study the asymptotic behavior of the bounds as the number of sensors of the network grows.
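The following sketch simulates the broadcast step on a cycle. It assumes the convention that each neighbour $v_k$ of the broadcasting sensor $v_{j_t}$ replaces its value by $\varphi x_k + (1 - \varphi) x_{j_t}$ while all other sensors, including the broadcaster, keep theirs; this convention, the function names, the value $\varphi = \frac{1}{2}$ and the data are assumptions for illustration and not the optimal parameter $\varphi_0$.

```python
import random


def cycle_neighbours(n):
    """Neighbour lists of a cycle with n >= 3 sensors."""
    return {j: [(j - 1) % n, (j + 1) % n] for j in range(n)}


def broadcast_gossip_step(x, neighbours, j, phi):
    """ASSUMED update: neighbours of the broadcaster j mix their value
    with the broadcast value; everyone else is unchanged."""
    y = list(x)
    for k in neighbours[j]:
        y[k] = phi * x[k] + (1.0 - phi) * x[j]
    return y


def run_broadcast_gossip(x0, neighbours, phi, iterations, seed=1):
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iterations):
        x = broadcast_gossip_step(x, neighbours, rng.randrange(len(x)), phi)
    return x


one_step = broadcast_gossip_step([0.0, 6.0, 0.0, 0.0], cycle_neighbours(4), 1, 0.5)

x0 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]   # made-up measurements
x = run_broadcast_gossip(x0, cycle_neighbours(len(x0)), 0.5, 3000)
```

The single step changes the sum of the values, which shows why the consensus value reached by `run_broadcast_gossip` is random and need not equal the average of `x0`, even though all sensors end up agreeing; every iteration costs one transmission.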

The Cycle
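Theorem 9. Consider $\varepsilon \in (0, 1)$ and $n \in \mathbb{N}$, with $n > 3$. Suppose that $\mathring{W}_B(t)$ is the weighting matrix in (50) when the network is a cycle with $n$ sensors and $\varphi$ is the optimal parameter $\mathring{\varphi}_0$. Then, closed-form expressions hold for $\mathring{\varphi}_0$ and for the lower and upper bounds on the $\varepsilon$-convergence time. Moreover, the asymptotic behavior of these bounds as $n$ grows is obtained.
Proof. See Appendix E.
Since the number of transmissions per iteration on a cycle with $n$ sensors is one for the broadcast gossip algorithm, the total number of transmissions required equals the number of iterations.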

The Path
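Theorem 10. Consider $\varepsilon \in (0, 1)$ and $n \in \mathbb{N}$, with $n > 3$. Suppose that $\widetilde{W}_B(t)$ is the weighting matrix in (50) when the network is a path with $n$ sensors and $\varphi$ is the optimal parameter $\widetilde{\varphi}_0$. Then, closed-form expressions hold for $\widetilde{\varphi}_0$ and for the lower and upper bounds on the $\varepsilon$-convergence time, together with their asymptotic behavior as $n$ grows.
Proof. See Appendix F.
Since the number of transmissions per iteration on a path with $n$ sensors is one for the broadcast gossip algorithm, the total number of transmissions required equals the number of iterations.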

Discussion
Since in this paper we have used the same definition of convergence time for both deterministic and randomized linear distributed averaging algorithms (namely, the one in (49)), the results given in Sections 2 and 3 allow us to compare the considered algorithms on a cycle and on a path in terms of convergence time and, consequently, in terms of the number of transmissions required, as well. In particular, these results show the following:

• The behavior of the considered deterministic linear distributed averaging algorithms is as good as the behavior of the considered randomized ones in terms of the number of transmissions required on a cycle and on a path with $n$ sensors: $\Theta(n^3 \log \varepsilon^{-1})$.
• For a large enough number of sensors and regardless of the considered distributed averaging algorithm, the number of transmissions required on a path is four times larger than the number of transmissions required on a cycle.
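The factor of four between paths and cycles can be illustrated for the fastest constant edge weights algorithm. The sketch below is built on two assumptions: the best constant weight on the cycle is $2 / (\lambda_{\min}^{+} + \lambda_{\max})$ in terms of the Laplacian eigenvalues, as in [2], and the second largest eigenvalue modulus on the path with weight $\frac{1}{2}$ is $\cos(\pi/n)$. The function names are hypothetical.

```python
import math


def tau(slem, eps):
    """Iterations needed so that slem**t <= eps (up to rounding)."""
    return math.ceil(math.log(eps) / math.log(slem))


def slem_cycle(n):
    """ASSUMED optimum on a cycle: constant weight 2 / (mu_min + mu_max)."""
    mus = [2.0 - 2.0 * math.cos(2.0 * math.pi * k / n) for k in range(1, n)]
    rho = 2.0 / (min(mus) + max(mus))
    return max(abs(1.0 - rho * mu) for mu in mus)


def slem_path(n):
    """ASSUMED optimum on a path: weight 1/2, modulus cos(pi / n)."""
    return math.cos(math.pi / n)


def transmissions_ratio(n, eps):
    """Path transmissions divided by cycle transmissions (n per iteration)."""
    return (n * tau(slem_path(n), eps)) / (n * tau(slem_cycle(n), eps))


ratio = transmissions_ratio(1000, 0.01)
```

For a thousand sensors the ratio is already very close to four, in line with the second item above.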

Numerical Examples
For the numerical examples, we first consider a cycle and a path with 5 and 10 sensors. For each network topology, we present a figure: Figure 2 for the cycle and Figure 3 for the path. In the figures, it can be observed that the Metropolis-Hastings algorithm behaves on average better than the pairwise gossip algorithm in terms of the number of transmissions required on the considered networks. It can also be observed that the broadcast gossip algorithm performs, on average, approximately as well as the fastest LTI distributed averaging algorithm for symmetric weights in terms of the number of transmissions required on those networks. However, we recall here that the broadcast gossip algorithm converges to a random consensus value instead of to the average consensus value, and it should be executed several times in order to get that average value in every sensor. The figures also bear evidence of the asymptotic equalities given in (61) and in (62).

Conclusions
In this paper, we have studied the convergence time of six known linear distributed averaging algorithms. We have considered both deterministic (the fastest LTI distributed averaging algorithm for symmetric weights, the fastest constant edge weights algorithm, the maximum-degree weights algorithm and the Metropolis-Hastings algorithm) and randomized (the pairwise gossip algorithm and the broadcast gossip algorithm) linear distributed averaging algorithms. In the literature, we have not found closed-form expressions for the convergence time of the considered algorithms. We have computed closed-form expressions for the convergence time of the deterministic algorithms and closed-form upper bounds for the convergence time of the randomized algorithms on two common network topologies: the cycle and the path. Moreover, we have also computed a closed-form expression for the convergence time of the fastest LTI algorithm on a grid. From the computed closed-form formulas, we have studied the asymptotic behavior of the convergence time of the considered algorithms as the number of sensors of the considered networks grows.
Although there exist different definitions of convergence time in the literature, in this paper, we have proven that one of them (namely, the one in (49)) encompasses all the others for the algorithms here considered. As we have used the definition of convergence time in (49) for both deterministic and randomized linear distributed averaging algorithms, the obtained closed-form formulas and asymptotic results allow us to compare the considered algorithms on cycles and paths in terms of convergence time and, consequently, in terms of the number of transmissions required, as well.
We now summarize the most remarkable conclusions:
• The best algorithm among the considered deterministic distributed averaging algorithms is not worse than the best algorithm among the considered randomized distributed averaging algorithms for cycles and paths.
• The weighting matrix of the fastest LTI distributed averaging algorithm for symmetric weights and the weighting matrix of the fastest constant edge weights algorithm are the same on cycles and on paths.
• The number of transmissions required on a path with $n$ sensors is asymptotically four times larger than the number of transmissions required on a cycle with the same number of sensors.
• The number of transmissions required grows as $n^3$ on cycles and on paths for the six algorithms considered.
• For the fastest LTI algorithm, the number of transmissions required grows as $n^2$ on a square grid of $n$ sensors (i.e., $r = c = \sqrt{n}$).
A future research direction of this work would be to generalize the analysis presented in the paper to other network topologies. In particular, networks that can be decomposed into cycles and paths could be studied.

1. $B^t P = P$ for all $t \in \mathbb{N}$.
We recall that an $n \times n$ matrix $A$ is idempotent if and only if $A^2 = A$. An example of an idempotent matrix is $P_n$ with $n \in \mathbb{N}$, since $P_n^2 = P_n$. The following result gives an eigenvalue decomposition for the matrix $P_n$ for all $n \in \mathbb{N}$.
We finish this subsection with a result regarding the $\varepsilon$-convergence time.
Theorem A1. Let $B$ be an $n \times n$ real symmetric matrix with $B \neq P_n$ and $B^t \to P_n$. If $\varepsilon \in (0, 1)$, then (A3) holds.
Proof. Let $t \in \mathbb{N}$. We first prove that the following statements are equivalent:
1. $\frac{\|B^t x - P_n x\|_2}{\|x - P_n x\|_2} \le \varepsilon$ for all $x \neq P_n x$.
2. $\frac{\|B^t x - P_n x\|_2}{\|x\|_2} \le \varepsilon$ for all $x \neq 0_{n \times 1}$.
1⇒2 Fix $x \neq 0_{n \times 1}$. If $x \neq P_n x$, applying Lemma A2 yields the required inequality, where $I_n$ is the $n \times n$ identity matrix. If $x = P_n x$, from Lemma A1 and [2] (Theorem 1), we obtain the same conclusion.
2⇒1 If $x \neq P_n x$, then the inequality in Statement 1 follows from Statement 2. Consequently, (A10)-(A14) hold. To prove (A10), we have used the equivalence 1⇔2. To show (A12) and (A13), we have applied the definition of the spectral norm (see, e.g., [15] (pp. 603, 609)) and Assertion 3 of Lemma A1, respectively. To prove (A14), we have used [2] (Theorem 1) ($\|B - P_n\|_2 < 1$).
We only need to show that $T_1 = T_2$ to finish the proof, where $T_1$ and $T_2$ are the two convergence times being compared. We have $t_x \le T_1$ for all $x \neq 0_{n \times 1}$ and, consequently, if the sequence considered is decreasing for all $x \neq 0_{n \times 1}$, then $T_1 \le T_2$. Thus, if we prove that these sequences are decreasing, the proof is complete. Given $x \neq 0_{n \times 1}$, from Lemma A1 and [2] (Theorem 1), we conclude that (A26) holds for all $t \in \mathbb{N}$. To prove the two equalities in (A26), we have used Assertion 2 of Lemma A1. To show the first inequality in (A26), we have applied a well-known inequality on the spectral norm (see, e.g., [15] (p. 611)), and to prove the second inequality in (A26), we have used [2] (Theorem 1) ($\|B - P_n\|_2 < 1$).

Appendix D. Proof of Theorem 3
We denote by $\mathcal{W}_{r,c}$ the set of all the $rc \times rc$ real symmetric matrices whose sparsity pattern is compatible with $A$, where $A$ is the adjacency matrix of a grid of $r$ rows and $c$ columns. Consider the bijection $B : \mathbb{R}^q \to \mathcal{W}_{r,c}$ defined in [4] (Equation (8)), where $q = 4rc - 3c - 3r + 2$ (i.e., $q$ is the number of edges when the network is viewed as an undirected graph).

Appendix F. Proof of Theorem 10
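We begin by proving (56). The Laplacian matrix of a path with $n$ sensors is the $n \times n$ matrix whose $(j, k)$ entry is:
$$\begin{cases} 2 & \text{if } j = k,\ j \neq 1 \text{ and } j \neq n, \\ 1 & \text{if } j = k,\ j \in \{1, n\}, \\ -1 & \text{if } |j - k| = 1, \\ 0 & \text{otherwise.} \end{cases}$$
From [20], the eigenvalues of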