On Consensus-Based Distributed Blind Calibration of Sensor Networks

This paper deals with recently proposed algorithms for real-time distributed blind macro-calibration of sensor networks based on consensus (synchronization). The algorithms are completely decentralized and do not require a fusion center. The goal is to consolidate all of the existing results on the subject, present them in a unified way, and provide additional important analysis of theoretical and practical issues that one can encounter when designing and applying the methodology. We first present the basic algorithm which estimates local calibration parameters by enforcing asymptotic consensus, in the mean-square sense and with probability one (w.p.1), on calibrated sensor gains and calibrated sensor offsets. For the more realistic case in which additive measurement noise, communication dropouts and additive communication noise are present, two algorithm modifications are discussed: one that uses a simple compensation term, and a more robust one based on an instrumental variable. The modified algorithms also achieve asymptotic agreement for calibrated sensor gains and offsets, in the mean-square sense and w.p.1. The convergence rate can be determined in terms of an upper bound on the mean-square error. The case when the communications between nodes is completely asynchronous, which is of substantial importance for real-world applications, is also presented. Suggestions for design of a priori adjustable weights are given. We also present the results for the case in which the underlying sensor network has a subset of (precalibrated) reference sensors with fixed calibration parameters. Wide applicability and efficacy of these algorithms are illustrated on several simulation examples. Finally, important open questions and future research directions are discussed.


Introduction
Recently emerged technologies dealing with networked systems, such as the Internet of Things (IoT), Networked Cyber-Physical Systems (CPS), and Sensor Networks (SN), still have many conceptual and practical challenges intriguing to both researchers and practitioners [1][2][3][4][5][6][7][8][9]. New classes of problems in this area continuously arise, driven by many new real-world applications. Particularly in the case of only one reference sensor, the corrected gains and offsets of the rest of the sensors converge to the same point imposed by the reference sensor.
Finally, an analysis is given which clarifies the influence of initially selected weights corresponding to particular nodes in the presented calibration parameters estimation recursions. Guidelines are formulated on how these weights should be chosen so that given requirements are satisfied. General discussion of the described results is provided from both theoretical and practical points of view, based on which several future research directions are proposed.
The outline of the rest of the paper is as follows. The following section briefly discusses related work. In Section 3 we introduce the distributed blind macro-calibration problem and derive the basic algorithm for the noiseless case. Section 4 is devoted to the presentation of the convergence properties of the base algorithm. In Section 5 certain assumptions about the measured signals, communication errors, and communication protocol are relaxed, and the appropriate algorithm modifications are introduced, together with their convergence properties. In Section 6 a discussion on the convergence rate, the case of presence of reference sensors with fixed characteristics, and some design guidelines are presented. In Section 7 we present illustrative simulation results. Finally, Section 8 presents some conclusions and future research directions.

Related Work
Macro-calibration is based on the idea of calibrating the whole SN based on the responses of all the nodes. The most frequent approaches to this problem are based on parameter estimation techniques (e.g., [11]). If controlled stimuli are not available the problem is usually referred to as blind calibration of SNs. In general, it is a difficult problem, which has certain similarities with more general problems of blind estimation, equalization, and deconvolution (e.g., [22][23][24] and references therein).
Most of the proposed appraches to blind calibration in the existing literature are centralized and non-recursive [12,[25][26][27][28][29][30][31][32][33][34][35][36][37]. Within this class of methods, in refs. [12,25] a blind calibration algorithm based on signal subspace projection was analyzed assuming restrictive signal and sensor properties. In ref. [26] the method was improved from the point of view of robustness to subspace uncertainties. In ref. [27] the authors proposed to use sparsity and convex optimization for blind estimation of calibration gains. In ref. [28] an approach to blind sensor calibration is adopted based on centralized consistency maximization at the network level assuming very dense deployment and only pairwise inter-node communications. In ref. [29] a moments-based centralized blind calibration is proposed for mobile SNs, exploiting multiple measurements of the same signal of mobile nodes, assuming that the measured signal does not change in time. In ref. [30], the authors proposed a method which can manage situations in which density requirements are not met. Interesting centralized approaches to blind drift calibration proposed in refs. [31][32][33], which also work when the density requirement is not met, are based on non-restrictive modeling of the assumed underlying signal subspace, with drift estimation using Kalman filter [31], sparse Bayesian learning [32], or deep learning [33]. The approach in ref. [34] also does not rely on stringent assumptions about signal subspace, but assume first-order auto-regressive signal process model. The authors of [35] introduce linear algebraic model of calibration relationships in a SN with centralized architecture to improve the simple mean calibration scheme, assuming sufficiently dense deployment. Another centralized approach to mobile sensors calibration is proposed in ref. [36] and is based on using a nonnegative matrix factorization. Some of the density assumptions introduced in this work were relaxed in ref. [37]. In ref. [38] the blind calibration problem was treated in a context of sparse sensing, using a message passing algorithm, assuming constant measured signal. The method proposed in ref. [39], based on geospatial estimation and Kalman filter, works if the sensors are calibrated at the beginning of the operation after deployment, and then may start to drift.
The problem of distributed blind macro-calibration may have certain similarities with the clock synchronization approaches based on local data processing and communications with neighbors [40][41][42][43][44][45][46][47]. However, these approaches cannot be directly mapped to the calibration problem treated in this paper.
Finally, certain extended consensus algorithms have been applied to SN calibration problems, but in different settings than the one treated in this paper [48][49][50][51]. An approach to blind calibration of sensor gains only, based on distributed gossip-based Expectation-Maximization iterations was proposed in ref. [52], assuming that the measured signal is constant. Another distributed approach was proposed in ref. [53], which explicitly uses a state-space model of the underlying process, and a message exchange protocol for offset compensation. The proposed scheme was formulated without proof of convergence. This paper is focused on the algorithms proposed recently in refs. [13][14][15][16][17] representing completely distributed and decentralized blind macro-calibration algorithms with rigorous proofs of convergences for both corrected sensor gains and offsets, with satisfactory performance under diverse deteriorating conditions which may typically appear in practical applications.

Problem Definition and the Basic Algorithm
Assume that the SN to be calibrated consists of n nodes/sensors. In the base setup, it is assumed that each sensor is measuring the same signal x(t) in discrete-time instants t = . . . , −1, 0, 1, . . .; this signal can be considered as a realization of a stochastic process {x(t)}. Note that we have implicitly assumed that the sensor nodes are functioning synchronously, since all the sensor nodes perform measurements in the same time instances t. We will relax this assumption in Section 5.3. The output (measurement) of the i-th node can be written as where α i is the unknown gain, and β i the unknown offset of sensor i. Note that, in this problem setup, it is assumed that α i and β i are unknown constants and not the random variables. Calibration of a sensor is performed by applying an affine calibration function to the raw readings (1) which results in the following calibrated sensor output where a i and b i are the calibration parameters to be obtained, g i = a i α i is the corrected gain and f i = a i β i + b i the corrected offset. The calibration objective is, ideally, to find parameters a i and b i for which g i is equal to one and f i equal to zero. In general, if we assume that there are no sensors which give perfect readings z i (t) = x(t) and that the signal x(t) is unknown and cannot be obtained or measured by some other means, this objective is impossible to achieve. Hence, in our decentralized real-time blind macro-calibration problem setup, this ideal objective must be alleviated: we require that the calibration process asymptotically achieves equal calibrated outputs z i (t) for all the nodes i = 1, ..., n. To approach as close as possible to the ideal goal of achieving g i = 1 and f i = 0, we could use certain a priori knowledge about the underlying SN, and try to adjust the algorithm, such that, loosely speaking, the "good" sensors (e.g., precalibrated or higher-quality sensors) correct, using the consensus strategy, the response of the rest of the sensors. For example, if, in a given SN, there is an apriori given perfectly calibrated reference sensor, the ideal asymptotic calibration (g i = 1 and f i = 0) of the rest of the sensor nodes will be achieved if the consensus goal is achieved. It is assumed that the underlying SN have a predefined communication topology, defining possible inter-sensor communications, represented by a directed graph G = (N , E ), where N is the set of nodes (sensors) and E the set of communication links (arcs). Define the adjacency matrix A = [a ij ], i, j = 1, . . . , n, where a ij = 1 if the j-th node is able to send messages to the i-th sensor, and a ij = 0 otherwise. Let N i be the set of in-neighboring nodes (or just neighbors) of the i-th node, i.e., the nodes j for which a ij = 1. Similarly, let N out i be the set of out-neighboring nodes of the i-th node, i.e., the nodes j for which a ji = 1.
Let us now derive the basic calibration algorithm. The idea is to start with local criteria for each node, whose local minimization would lead to a network-level consensus on the corrected sensor outputs: i = 1, ..., n, where γ ij are nonnegative scalar weights whose influence on the properties of the algorithm will be discussed later. Denoting θ i = [a i b i ] T , the following expression is obtained for the gradient of (3): From (4) we obtain the following stochastic gradient recursion for estimating θ * i minimizing (3): , and δ i (t) > 0 is a time-varying gain whose influence on the convergence properties of the algorithm will be discussed later. The initial conditions are assumed to beθ i (0) = [1 0] T , i = 1, . . . , n. We expect that the set of recursions (5) asymptotically achieve that all the local estimates of corrected gainsĝ i (t) =â i (t)α i and corrected offsetsf i (t) =â i (t)β i +b i (t) converge to the same valuesḡ andf , respectively; this implies that the corrected sensor outputs of all the nodes are also equalẑ j (t) =ẑ i (t), i, j = 1, . . . , n.
In Figure 1 an illustrating smart-city example sensor network is depicted. Completely decentralized network architecture is assumed, i.e., the nodes communicate according to the directed communication graph which is represented in the figure using arcs. The communication graph will typically depend on the mutual node distances, transmission power of individual nodes, channel conditions, presence of obstacles, etc. Each node in the network is equipped with the same type of sensor which measures certain physical quantity (e.g., certain atmospheric condition or air quality indicator). At each time instant t, a node i performs local reading of the raw sensor output y i (t), calculation of the corrected sensor outputẑ i (t) according to (2) using current local estimates of the calibration parametersâ i (t) andb i (t), transmission of the corrected valueẑ i (t) to the out-neighbors N out i , reception of the valuesẑ j (t) from the in-neighbors j ∈ N i , and calculation of the updated estimates of the local calibration parametersâ i (t + 1) andb i (t + 1) using (5). In the initial presentation we will assume that, at each iteration of the algorithm (5), local sensor measurement y i (t) and the current messages of the neighboring nodes' corrected outputsẑ j (t) are available at node i. Possible communication dropouts and/or faulty/noisy sensor readings will be treated later. Local computational cost for each agent is minor since only two parameters are being estimated. Communication complexity depends on the number of neighboring agents, which is small in typical SNs with decentralized architecture. For the sake of compact notations, suitable for convergence analysis of the derived algorithm, let us introduceφ and so that (5) becomesφ where , with the initial conditionsρ i (0) = [α i β i ] T , i = 1, . . . , n. Therefore, the following compact form for the recursions (8) is obtainedφ where ⊗ is the Kronecker product, I is the identity matrix of dimension 2n, I 2 is the dimension 2 identity matrix,φ(t) = [φ 1 (t) T · · ·φ n (t) T ] T , ∆(t) = diag{δ 1 (t), . . . , δ n (t)}, diag{. . .} denotes the corresponding block diagonal matrix, , where γ ij = 0 when j / ∈ N i , and the initial condition isφ(0) = [φ 1 (0) T · · ·φ n (0) T ] T , according to (8). From the way in which we have constructed the vectorφ(t) we conclude that the asymptotic value of φ(t) should be such that all of its odd components are equal, and all of its even components are equal.
In the next section, it will be shown that, under certain general assumptions, for any choice of the weights γ ij ≥ 0 for j ∈ N i (and γ ij = 0 when j / ∈ N i ) the algorithm achieves convergence to consensus. However, if the underlying calibration objective is to achieve absolute calibration of the sensors (i.e.,ḡ close to one andf close to zero), this can be done by trying to exploit sensors that are a priori known to have good characteristics. In a large SN, this can be achieved in two ways: (1) if the large number of sensors are "good" sensors, then γ ij -s in all neighborhoods N i should be approximately the same; or (2) if there is a set of a priori chosen good sensors j ∈ N f ⊂ N the goal is to enforce their dominant influence to the rest of the nodes. There are two possibilities to achieve this: (a) to set high values of γ ij for all j ∈ N f and i ∈ N out j ; or (b) to set small values of γ jk for all j ∈ N f , k ∈ N j , k = j (which prevents large changes ofφ j (t)). Section 6.3 deals with the guidelines on weights tuning, while Section 6.4 treats the case in which a set of reference sensors is kept with fixed calibration parameters.

Convergence Analysis
In this section we discuss the convergence properties of the calibration scheme presented in the previous section, where it has been assumed that both local sensor measurements and inter-node communications are perfect, i.e., possible communication errors and/or measurement errors are not present. We first analyze this basic scheme in order to focus on structural characteristics of the algorithm; the case of lossy SNs will be treated in the subsequent sections. In the basic setup, without presence of any unreliability, it is sufficient to assume that the step sizes δ i (t) are constant: (A1) δ i (t) = δ = const, for all i = 1, . . . , n.
In practice, when the SNs are used to measure certain physical quantities, the assumption that {x(t)} is i.i.d. is almost never satisfied; hence it will be relaxed later.
Based on (A1) and (A2), the expectation of the parameter estimatesφ(t) = E{φ(t)} satisfies the following recursionφ (t The following assumption, typical for consensus-based algorithms, is introduced: (A3) Graph G has a spanning tree. It implies that the matrix Γ has one zero eigenvalue and the rest eigenvalues with negative real parts, e.g., [54]. Hence, from the structure of matrixB, we directly conclude that it has at least two zero eigenvalues. Its remaining eigenvalues can be characterized starting from the following assumption: This assumption guarantees that the estimation recursions are sufficiently excited by the signal x(t). Its important consequence is that −Ω i defined by (12) is Hurwitz, for all i = 1, . . . , n. Indeed, using some simple algebra it can be derived that −Ω i is Hurwitz if and only if (iff) Both inequalities hold iff (A4) holds. This greatly simplifies further derivations which depend on somewhat complicate expression (12) for the 2 × 2 diagonal blocks of the matrixΩ.
Because of the block structure of matricesΩ andB, the properties of the main recursion (11) cannot be analyzed using standard linear consensus methodologies (see, e.g., [18,54] and references therein). To cope with this problem, a methodology based on the concept of diagonal quasi-dominance of matrices decomposed into blocks has been used [13,17,[19][20][21] to obtain the following important result characterizing all the eigenvalues of the matrixB.

Observe that vectors
is the set of real numbers, are the right eigenvectors ofB corresponding to the eigenvalue at the origin. Let ρ 1 and ρ 2 be the corresponding normalized left eigenvectors, satisfying The following lemma deals with a similarity transformation important for all the remaining derivations throughout the paper.

span{A} denotes a linear space spanned by the columns of matrix A). Then, T is nonsingular and
whereB * is Hurwitz, and 0 i×j denotes a i × j zero matrix.
Notice that where S (2n−2)×2n can be determined from the definition of T. From the structure of the matrices in (11), it can be concluded that the transformation T from Lemma 2, when applied to the original matrix B(t), will produce a matrix which has the same structure as the transformed matrix given in Equation (14). 13,17]). For the matrix B(t) in (10) it holds that, for all t, where B(t) * is an (2n − 2) × (2n − 2) matrix and T is given in Lemma 2.
The following convergence theorem can now be formulated.
Note here that the limit vector in (17) (i 1 ρ 1 + i 2 ρ 2 )φ(0) have all the odd elements equal, and all the even elements equal, which means that the corrected gains of all the nodes converge to the same value, and the corrected offsets of all the nodes converge to the same value. It can be shown [13] that this value only depends on the unknown sensor parameters α i and β i , and the weights γ ij in J i , i, j = 1, . . . n. For given initial conditions in (5), ρ 1φ (0) and ρ 2φ (0) are in the form of weighted sums of α i and β i , 1, . . . , n, respectively. Assuming that the weights γ ij are the same for all the nodes, and that α i have a distribution centered around one, and β i around zero, these weighted sums will be close to one and zero, respectively.
The value of δ > 0 in Theorem 1, which ensures convergence, may be restrictive. In practice, the choice of step size δ in (A1) should be based on the actual properties of the underlying SN; its value needs to be small enough to achieve convergence, but it should also be sufficiently large to achieve acceptable rate of convergence (as in the standard parameter estimation recursions [55]).
Hence, (A2') requires stationarity, boundedness, and imposes a mixing condition on the signal {x(t)}. The explicitly used time shift parameter d will be used later for introducing a new algorithm based on an instrumental variable, capable of dealing with possible measurement noise.

Extensions of the Basic Algorithm
In this section, we introduce several modifications and generalizations of the basic algorithm (5), so that it is possible to achieve distributed calibration under more challenging conditions, typically present in real-life SNs: communication dropouts, additive communication noise, measurement noise, and asynchronous communication. Convergence properties of the introduced modifications are presented in detail.

Communication Errors
In this subsection, we assume that inter-node communication errors can be manifested in two ways: (1) communication dropouts (outages) and (2) additive communication noise. Communication dropouts typically occur in SNs using digital communication; additive noise can, in this case, model quantization effects. For example, in the case of smart city sensor networks, depicted in Figure 1, the dropouts will happen relatively often because of the dynamic environment, where both physical obstacles and electronic interference can be persistent. In certain, less frequent practical situations, SNs can use analog communication (e.g., when certain types of energy harvesting are used [56]), when additive communication noise is dominant, and dropouts appear less frequently.
The communication errors are formally introduced using the following assumptions: (A5) The weights γ ij in the algorithm (5) are now randomly time-varying, according to stochastic processes given by Based on the above assumptions, the communication dropout at any iteration t, when node j is sending to node i, will happen with probability 1 − p ij , independently of the additive communication noise process {ξ ij (t)} and the measured signal {x(t)}. Denoting and ν(t) = ν 1 (t) . . . ν n (t) , one obtains from (10) that where , and Γ(t) is obtained from Γ by applying (A5). Convergence properties of the recursion (20), under the additional assumptions (A5)-(A7), can be derived starting from the results of the previous subsection. Due to the mutual independence of the random variables in B (t), it can be concluded that has the same spectrum asB in (11): it has two zero eigenvalues and the rest eigenvalues are with negative real part.
Since the additive noise is now present in the recursions (20), (A1) needs to be replaced with the following assumption, typical in the stochastic approximation literature (e.g., [57]): . . , n. Intuitively, (A1') introduces diminishing gains δ i (t) which converge to zero slowly enough, so that the additive noise can be averaged out while asymptotic convergence to a consensus point is achieved (despite the presence of noise). Therefore, we haveφ Similarly as in the noiseless case, let as introduce the similarity transformation where ρ 1 and ρ 2 are the left eigenvectors ofB corresponding to the eigenvalue at the origin. By applying transformation T to (21), and using stochastic Lyapunov stability arguments, along with the arguments typically used in analyzing stochastic approximation algorithms [13,17,58,59], the following theorem can be proved: 13,17]). Let Assumptions (A1'), (A2)-(A7) be satisfied. Then,φ(t) generated by (21) converges to i 1 w 1 + i 2 w 2 in the mean square sense and w.p.1, where w 1 and w 2 are scalar random variables satisfying E{w 1 } = ρ 1φ (0) and E{w 2 } = ρ 2φ (0).
The theorem essentially states that, again, all the corrected drifts converge to the same point, and all the corrected offsets converge to the same point; however, because of the additive communication noise, these points are random and depend on the noise realization. The mean values of these possible convergence points depend on the sensor parameters α i and β i , the design parameters γ ij , as well as on the dropout probabilities p ij , i, j = 1, . . . n.

Measurement Noise
In this subsection we, in addition to communication errors, assume that the signal x(t) is measured with additive measurement noise. This situation is of essential importance for practical applications since practically all the existing sensors contain certain measurement errors which are typically modeled using stochastic processes [3]. Formally, we model the additive noise stochastic process using the following assumption: (A8) Instead of y i (t) given by (1), the sensor measurements are now contaminated by noise, and given by y By replacing y η i (t) instead of y i (t) in the base algorithm (5), one obtains the following "noisy" version of (8): where It is important to observe here that Assuming again that the step sizes δ i (t), i = 1, . . . , n, satisfy (A1'), one can obtain the following equation analog to (10): where In an analogous way as in the previous section, instead of (11), the following equation is obtained for the mean of the corrected calibration parameters whereB is as in (11) and α n ∑ j γ nj , 0}. Because of the additional term Σ η , the sums of the rows of the matrixB + Σ η are not equal to zero anymore, so that the convergence to consensus (as in Theorem 1) cannot be achieved in this case.
However, it can be seen from the structure of the recursion (24) that, if we assume that the measurement noise variances (σ η i ) 2 are a priori known, we can use them to modify the basic algorithm (5) in the following way, ensuring again the asymptotic convergence to consensus: . . , n. The following theorem deals with the convergence of the above modification of the basic algorithm, when the measurement noise is present together with the communication errors. The convergence points will again depend on the measurement and communication noise realizations, in a similar way as in Theorem 3.
Notice that the above theorem was based on assumption (A2): indeed, when both {x(t)} and {η i (t)} are i.i.d. sequences, it is not surprising that the asymptotic consensus is achievable only provided σ η i , i = 1, . . . , n, are known. However, we can replace the unrealistic assumption (A2) with (A2') (introduced in Section 4 in the noiseless case) allowing correlated sequences {x(t)} which is almost always the case in practice. In such a way, the correlatedness problem present in the algorithm (24) can be overcame, without requiring any a priori information about the measurement noise process. The idea is to introduce instrumental variables in the basic algorithm in the way analogous to the one often used in the field system identification, e.g., [60,61]. Instrumental variables have the basic property of being correlated with the measured signal, and uncorrelated with noise. If {ζ i (t)} is the instrumental variable sequence of the i-th agent, one has to ensure that ζ i (t) is correlated with x(t) and uncorrelated with η j (t), j = 1, . . . , n. Under A2') a logical choice is to take the delayed sample of the measured signal as an instrumental variable, i.e., to take ζ i (t) = y η i (t − d), where d ≥ 1. Consequently, we present the following general calibration algorithm based on instrumental variables able to cope with measurement noise:θ where d ≥ 1 and . . , n. Following the derivations from Section 3, one obtains from (26) the following relations involving explicitly x(t) and the noise terms:φ where In the same way as in (23), we havê where To formulate a convergence theorem for (28), the following modification of (A4) is needed: This assumption implies that the correlation m(d 0 ) should be large enough. Similarly as in the case of (A4), it can be concluded that (A4') implies that −Ω(d) = −E{Ω i (t, d)} is Hurwitz. Similarly as in the above cases, let as introduce the similarity transformation where T 2n×(2n−2) is an 2n × (2n − 2) matrix, such that span{T 2n×(2n−2) } = span{B(d) }. Then, where ρ 1 and ρ 2 are the left eigenvectors ofB(d) = E{Ω(t, d)(Γ(t) ⊗ I 2 )} =Ω(d)(Γ ⊗ I 2 ) corresponding to the zero eigenvalue. The following theorem deals with the convergence of the instrumental variable algorithm (26). The convergence point, again, depends on the noise realization. Theorem 5 ([16]). Assume that the assumptions (A1'), (A2'), (A3), (A4'), (A5)-(A8) hold. Thenφ(t), given by (28) with d = d 0 , converges to i 1 w 1 + i 2 w 2 in the mean square sense and w.p.1, where w 1 and w 2 are scalar random variables satisfying E{w 1 } = ρ 1φ (0) and E{w 2 } = ρ 2φ (0).

Asynchronous Broadcast Gossip Communication
So far we have shown how to deal with most of the practical challenges which emerge when dealing with real life SNs, such as communication dropouts, communication additive noise and measurement noise. However, in all of the above discussed algorithms we have implicitly assumed that all the nodes in the network share a common clock, based on which the recursions in (5), (25) or (26) can be implemented synchronously. Indeed, when introducing the basic algorithm we have assumed that the signal x(t) is being measured in discrete-time instances t by all the nodes. These instances are also used as time indexes of synchronous recursions of the above algorithms. Yet, there are many practical cases of SNs for which it is impossible or impractical to function synchronously. A typical example is the case when the nodes follow certain sleeping policies in order to minimize power consumption (e.g., [3]). For example, the nodes in SN shown in Figure 1, measuring air pollution or atmospheric conditions, may be programmed to make measurements less often during periods in which there is less traffic in the city. These types of situations are rigorously treated in the rest of this subsection.
Instead of the problem setup introduced in Section 3, assume now that the sensors are measuring a continuous-time signal x(t) at discrete points t k , t k ∈ R + , k = 1, 2, . . ., t k+1 > t k , producing the sensor outputs where the α i and β i are the same unknown parameters as in the previous subsections, and we also assume that the measurement noise η i (t k ), i = 1, . . . , n, is present in the sensor readings. Furthermore, since the goal is to remove dependence on a common global clock, it is now assumed that every node j ∈ N has its own local clock. For the sake of compact notation and simpler derivations, a single clock, called global virtual clock, is introduced, which ticks when any of the local clocks ticks. Hence, t k in (29) can be considered as the time in which the k-th tick of the virtual clock happend.
To have a well defined situation, it is formally assumed that the ticks of the local clocks are independent, and that the intervals between any two consecutive ticks are finite w.p.1. It is also assumed, for the sake of simpler derivations, that the unconditional probability that the j-th clock ticked at an instance t k is q j > 0, independently of k. It is easy to verify that these conditions are satisfied for a typical model used in SNs, where it is assumed that the local clocks tick according to independent Poisson processes with rates µ j (as in, e.g., [62,63]). This case will be adopted throughout this subsection. It directly follows that, in this case, the virtual global clock ticks according to a Poisson process with the rate ∑ n j=1 µ j . According to the above assumptions, let us denote with t j l the ticks of the local clock j, l = 1, 2, . . .. The communication protocol can then be defined in the following way. At each local clock tick, a node j makes the local sensor measurement, calculates the corrected sensor output z j (t j l ) (based on the current estimates of calibration parameters a j and b j ), and broadcasts it to its out-neighbors i ∈ N out j . We assume also that communication dropouts can happen, i.e., each node i ∈ N out j receives the transmitted message with probability p ij > 0. For the sake of clarity of presentation, we do not treat additive communication noise in this subsection. It is also assumed that the communication delay is negligible, so that, practically at the same time instant all the nodes which have received the broadcast, perform the local sensor reading, calculate their corrected outputs z i (t j l ), and update the local estimates of their calibration parameters a i and b i . This procedure is repeated for any local clock tick. The index of the node whose clock has ticked at instant t k is denoted by j(k), and let J(k) be the subset of the out-neighbors i ∈ N out j(k) which have received the broadcast message. Also, let The measurement noise is treated as in the previous subsection, by using the delayed measurement y i (d i (k)) as the instrumental variable where d i (k) is the global iteration number that corresponds to the closest past measurement of the node i. By using the same local criteria as in (3) and gradients as in (4), the following new recursion for updating the calibration parameters at node i is formulated: where: • δ i (k) is the step size given by δ i (k) = ν i (k) −c , where ν i (k) = ∑ k m=1 I{i ∈ J(m)} is the number of parameter updates of node i up to the iteration k, with 1/2 < c ≤ 1 (I{·} denotes the indicator function), are the corrected outputs of node j(k) and node i.
The initial conditions are adopted to beθ i (0) = [1 0] T . Note that, according to the problem setup, at a given iteration k only the nodes i ∈ J(k) perform the above parameters update; for the rest of the nodes it holds thatθ i (k) =θ i (k − 1).
Computationally, the algorithm is as simple as the basic one, requiring only a few additions and multiplications in one iteration. Information needed at node i are: the local sensor measurement, the local instrumental variable, and the current output sent by an in-neighbor j. Knowledge of the global iteration index k (or d i (k)) is not needed.
From the above definition of the step size δ i (k) it can be concluded that it depends only on the number of local clock ticks, which makes the algorithm completely decentralized.
It should also be noticed that the instrumental variables in (31) can be selected in several ways. For example, instead of choosing (30), it can be practical to choose ζ i (k) = y i (t j,i l ), wheret j,i l is the time instant of a supplementary measurement of node i, just after the last step of the recursion (31) has been locally performed. This scheme is not assumed in the sequel, because of much more complicated notation; all the results can be easily transferred to this case.
Similarly as in the synchronous case, we introduce: Consequently, we havê where (36) for i = 1, ..., n, can be written compactly aŝ where: Since we have formulated a slightly different problem setup than in Section 3, we introduce a new set of assumptions, and denote them using letter B: The φ-mixing condition (B1) represents one of the strong mixing conditions, usually satisfied for sensory signals [64][65][66].
Assumption (B2) represents an extension of the assumption (A4), adapted to the presence of the instrumental variable y i (d i (k)) in (31). It guarantees the persistence of excitation in the sense that the variance of x(k) must be greater than zero (for all k, because of stationarity) so that constant signals are not allowed [13,55]. However, it also ensures sufficent correlation between the instrumental variable and the current measurement, so that e.g., white noise signals are also not allowed. It can be easily derived [14] that (B2) is satisfied if the autocovariance function of x(t) is positive in a sufficiently large interval around zero. Also, if the rates µ j are adjustable we can choose µ min = min j∈N µ j large enough, such that (B2) is always satisfied. Therefore, (B2) is, in general, not restrictive for processes having dominant low frequency spectrum, which is typical in practical applications of SNs.
Based on the above modified problem definition, the following result was proved in ref. [14], stating that both corrected gains and corrected offsets will converge to consensus points (which depend on the realizations of the stochastic processes) for all the nodes.

Rate of Convergence
In the above subsections we did not discuss on how quickly the presented algorithms can achieve convergence to the specified points. Since the basic Equation (5) have constant step size, it can be concluded that the asymptotic convergence rate in the noiseless case is exponential. In the cases of measurement and additive communication noise, the convergence rate of the algorithms can be obtained following general methodology applied to the analysis of standard stochastic approximation algorithms. The following result gives an upper bound on the mean-square error with respect to the consensus point: 17]). Under the assumptions of any of the Theorems 3, 4 or 5, together with lim t→∞ (δ(t + 1) −1 − δ(t) −1 ) = d ≥ 0, there exists such a positive number σ < 1 that for all 0 < σ < σ the asymptotic consensus is achieved by the presented algorithms with the convergence rate o(δ(t) σ ).
It might be problematic to obtain the precise value of σ in concrete applications. However, it can be shown that it directly depends on the sensor and network properties (encoded by matrix B(t) or B(t, d)) and on the connectivity of the underlying communication graph [67]. On one hand, if the number of nodes is increased without increasing network connectivity, the rate of convergence will decrease; however, if the graph connectivity is increased, the convergence rate will also increase. For example, if the graph is fully connected, the convergence rate will be high at the expense of very large number of communication links. In practice, a compromise between the rate of convergence and the network complexity needs to be found.
Another compromise to be found is between algorithm's noise immunity and convergence rate. Indeed, according to [68], assuming that δ(t) is given as δ(t) = m 1 /(m 2 + t µ ), m 1 , m 2 > 0, 1 2 < µ ≤ 1, the values of µ closer to 1 2 give larger rate of convergence but higher sensitivity to noise; the values of µ closer to 1 give the opposite effect.

Stationarity of the Measured Signal
In the previous section, we made comments about all the introduced assumptions, explaining their practical applicability. Let us make some additional comments on the stationarity assumption for the random process {x(t)}, introduced in (A2), (A2') and (B1). From the point of view of applications, it cannot be considered restrictive, since it encompasses a large variety of quickly and slowly varying real signals. This assumption is not essential for proving convergence of the presented algorithms: it has been introduced primarily for the sake of focusing on the essential structural aspects of the algorithm and avoiding complex notation [13,14,17]. Notice, according to Lemmas 2 and 3, that the similarity transformation T can be applied even in the case of time varying matrixB(t), owing to its specific structure; namely, we have T −1B (t)T = 0 2×2 0 2×(2n−2) 0 (2n−2)×2B (t) * , whereB(t) * is Hurwitz and T is obtained fromB(t) for any selected t = t . Moreover, notice that the conclusions of Theorem 1 hold, in general, provided the following unrestrictive condition holds: lim t ∏ τ (I −B(t − τ) * ) = 0. Also, it is possible to conclude directly that the results of the above theorems hold for changes ofB(t) * sufficiently slow. Moreover, it is not difficult to prove that the above convergence results exactly hold when the signal is asymptotically stationary.

Network Weights Design
As already discussed in the previous subsections, the implicit goal of the presented calibration scheme is to exploit the sensors with a priori good calibration properties by enforcing their dominating effect in the final consensus value to which all the nodes converge. This can be done in two ways, by adjusting the design weights γ ij : (1) if the majority of sensors are "good", we can set all γ ij for the neighborhood of any node i to the same value; or (2) if there is a smaller subset N f ⊂ N of a priori "good" sensors in the network we should appropriately tune the values of γ ij . For this scenario, in this subsection, we give a more detailed analysis of the weights adjusting problem for the case of asynchronous communications treated in Section 5.3.
According to the theoretical results presented in detail in ref. [14], the dominant component of the random variables [χ 1 χ 2 ] in Theorem 6 is given by a weighted sum of the unknown sensor parameters α i and β i . The positive weights are determined by the left eigenvectors w 1 and w 2 ofB corresponding to the zero eigenvalue. In turn, these weights are functions of the design parameters γ ij , the wake-up probabilities q j , and the dropout probabilities p ji , i, j = 1, . . . , n. Therefore, it is clear that the initial characteristics of a selected sensor i will have larger influence on the asymptotic consensus value if the appropriate elements of w 1 and w 2 are increased. This can be achieved in two ways: 1. By reducing the values of all the elements in the i-th row ofΓ, or 2. By increasing the values γ ji , j = i, from the i-th column (keeping in mind thatΓ must be row stochastic).
Probabilities q j can in certain situations also be adjusted since they depend on the rate of the local clock of node j. By increasing the clock rate of the node j, the influence of that node on the asymptotic calibration parameter values achieved at consensus, is also increased. Adjusting dropout probabilities 1 − p ji might also be possible in certain situations: by decreasing the probability of a node i of receiving a message from a neighbor, we increase its influence on the asymptotic consensus. Hence, there are several design variables which can be adjusted so that the desired convergence point is achieved.

Macro Calibration for Networks with Reference Nodes
As discussed in the previous subsection, the selection of the weights in the matrixΓ is important for attaining the calibration goal of emphasizing a priori selected "good" sensors ("leaders"). Besides the described methods, this can be ultimately done by leaving the nodes from a set N f ⊂ N with unchanged calibration parameters (reference nodes), and only apply the recursions (31) (or (5), or (26)) to the rest of the nodes i ∈ N − N f . An example of a SN with such topology corresponding to the smart city example in Figure 1 is shown in Figure 2. In practice, this situation emerges, for example, when a SN needs to be expanded, i.e., when several uncalibrated sensors needs to be added to an already calibrated SN. In this subsection, the convergence results for this case are presented assuming asynchronous calibration algorithm [14].
First, we treat the special case in which |N f | = 1, i.e., there is only one reference sensor, and we want to calibrate the rest of the SN so that their calibrated outputs converge to the output of the reference sensor. For this case, all of the above results still hold, since the resulting communication graph will again have a spanning tree (with the reference node as a center node), which implies that (B3) (and (A3)) holds. Therefore, by applying the above convergence theorems, one concludes that the corrected gains and offsetsφ i (k), i = 1, . . . , n, will converge to the same value, dictated by the "leader".
In the general case, assume, without loss of generality, that N f = {1, 2, . . . , n f }, is the set of reference senors which have fixed parameters: The calibration algorithms above are now applied in the same way, except that the reference nodes do not change the calibration parameters:θ i (k) =θ i (k − 1) for all i ∈ N f . Let N − N f = {n f + 1, . . . , n} and letφ v (k) = [φ n f +1 (k) T . . .φ n (k) T ] T be the vector of all the calibration parameters to be tuned. In this case, all of the above theorems do not hold anymore, since, when n f > 1, the communication graph does not necessarily satisfy (B3) (the graph does not have a center node anymore, since two or more reference nodes are not mutually reachable). Hence, a separate convergence theorem is needed which treats this case: 14]). Let Assumptions (B1)-(B4) be satisfied and let all the nodes from N − N f be reachable from all the nodes in N f . Then the algorithm (31), in which γ ij = 0 for all i ∈ N f , provides convergence of φ v (k) in the mean square sense and w.p.1 to the limit defined bŷ where matricesΓ v andΓ f ,v are (n − n f ) × (n − n f ) and (n − n f ) × n f submatrices of matrix P −cΓ , with indices i, j = n f + 1, . . . , n and i = n f + 1, . . . , n, j = 1, . . . , n f , respectively; P = diag{p 1 , . . . , p n } and c is defined in (31).
It can be easily concluded that if the calibration parameters of all the reference nodes are the same, equal to some φ f , then the calibration parameters of the rest of the nodes will also converge to φ f . If this is not the case, it can be seen from (38) that the calibration parameters of different nodes will converge to different values. These values will be typically dictated by the reference sensors which are the closest to a given node.

Autonomous Gain Correction and Relationship with Time Synchronization
After presenting the most important aspects of the presented blind calibration methodology encompassing both noiseless and noisy environments, we will now make a few comments on the problem of its relationship with the algorithms for time synchronization in sensor networks, which has attracted a lot of attention (e.g., [40][41][42][43][44][45][46][47][69][70][71] and the references therein). Indeed, after coming back to the main measurement model, one easily realizes that in the case of time synchronization one has the form (1) with the absolute time t replacing the signal value x(t). The estimation schemes used in the mentioned analogous time synchronizaion algorithms are, indeed, related to the estimation of the parameters of the calibration functions (2) based on the use of local time measurements; however, they consist of one separate recursion for the relative drift estimation and one separate recursion for the estimation of offsets, relying on the obtained relative drifts. In ref. [72] this scheme was reformulated in the light of the calibration problem and the above described methodology. One starts from the difference model ∆y i (t) = y i (t + 1) − y i (t) = α i ∆x(t), where ∆x(t) = x(t + 1) − x(t), and construct a gradient recursion forâ i following the above methodology, having in mind that ∆y i (t) does not depend on β i . On the other hand, the estimation of b i has to start from (1); it has the form of the recursion forb i in (5), but should useâ i generated by first recursion. Such a combined gradient algorithm based on use of ∆y i (t) resembles typical time synchronization algorithms. In general, it is important to observe that the introduction of t instead of x(t) in the basic relation (1) suffers from the very basic problem that unboundedness of the linear function contradicts the requirements for boundedness of the second order moments of x(t) (typical for stochastic approximation algorithms) and that it is not possible to guarantee convergence of the obtained recursions. This indicates that formal transfers of methodologies from one domain to the other should be done with extreme caution.

Simulation Results
In this section we present the results of extensive simulations, illustrating that the algorithms are applicable to real-world problems involving sensor networks with decentralized architectures. The results are presented using several figures which demonstrate all of the important properties theoretically addressed in the previous sections. In all of the simulations, a SN with ten nodes, with randomly generated communication graph satisfying assumption (A3), has been chosen. To show that the algorithms are applicable to a large variety of sensor characteristics, the sensor parameters α i and β i have been generated randomly from uniform distribution with means one and zero, respectively, and with standard deviation 0.3.
In Figure 3 the corrected gainsĝ i (t) and offsetsf i (t) obtained by the presented algorithm (5) in the noiseless case are presented, for a preselected step size, equal for all the nodes, δ = 0.01. It is clear from the figure that the convergence to consensus, and, hence, implicit asymptotic calibration is achieved with the asymptotic values close the desired (one for corrected gains, and zero for corrected offsets). In Figure 4 the simulation results are presented for the situation in which the first node is set to be a reference node ("leader") with perfect parameters α 1 = 1 and β 1 = 0. Obviously, all of the rest of the nodes converge to this ideal characteristic.
In Figure 5 the corrected gainsĝ i (t) and offsetsf i (t) are depicted for the case in which all the theoretically discussed unreliabilities are included: communication dropouts with p = 0.2 for all the links in the network, normally distributed communication additive noise with variance 0.1, and normally distributed measurement noises with variances different for all the nodes, uniformly generated in the range (0, 0.1). In this case, in order to achieve convergence in the presence of noise, the time varying diminishing step-size sequences are chosen δ(t) = 0.01/t 0.6 , equal for all the nodes. The algorithm which works in this case is the one based on the instrumental variables (26), so that the measured signal x(t) is assumed to be generated by a second order linear system with white noise at the input, resulting in a correlated sequence with zero mean and standard deviation one. It is obvious from the figure that the convergence is achieved despite the presence of all the introduced unreliabilities. Next we simulate the asynchronous algorithm (31) which includes instrumental variables. It is assumed that the local clocks of all the nodes are driven by Poisson processes with the same rates.
In Figure 6, the corrected gainsĝ i (k) and offsetsf i (k) generated by (31) are depicted assuming the steps δ(k) = 0.01/k 0.6 and the presence of communication dropouts with probabilities p ij = 0.2 for each link. The measurement noises and the signal x(k) are assumed to be the same as in Figure 5. Gains Offsets Figure 6. The asynchronous algorithm based on instrumental variables without reference sensors: convergence to consensus is achieved for corrected gains and corrected offsets.
In Figure 7 the necessity of introducing the instrumental variables is demonstrated when the noise is present. The basic algorithm (5), without instrumental variables, has been simulated and the convergence is not achieved in this case. It can be seen that all the corrected gainsĝ i (k) slowly converge to zero which is very undesirable.  Figure 8 illustrates the case discussed in Section 6.4 in which there are two reference sensors in the given SN. It can be seen from the figure that, as predicted by Theorem 8, the consensus is not achieved, and that all the calibration parameters converge to different values, determined by Equation (38). Offsets Gains Figure 8. The asynchronous algorithm with two reference sensors with different characteristics: both the corrected gains and the corrected offsets converge to different values determined by (38).

Conclusions
In this paper, we consolidate the existing results on distributed recursive blind macro-calibration based on consensus, present the algorithms in a unified way, and provide additional analysis of several important theoretical and practical issues. The studied algorithms are completely decentralized, not requiring any fusion center. It was shown that the algorithms successfully perform on lossy sensor networks, characterized by unreliable communications, limited only to the local neighbors. Convergence properties have been presented under both noiseless conditions, and the conditions typical for noisy and unreliable environments. The practically important case of asynchronous communication based on a broadcast gossip scheme has also been treated. Extensive discussions have been provided, explaining in detail several important practical issues: convergence rate, design of tunable network weights, and calibration of sensor networks with multiple reference nodes. Extensive simulation results illustrating the behavior of the algorithms have been presented.

Future Work
The presented results can be extended in several directions. The first future direction, which is imposed naturally, is to extend the presented algorithms, or develop new ones, for the case when the nodes measure spatially varying but correlated signals. Treating this situation would drastically increase the practical applicability of the described methodology. The performance of such a scheme would highly depend on a priori knowledge about the interrelatedness of the measurements of different nodes.
Another direction of possible future work is to extend the results to include the possibility that individual nodes measure vector values (instead of scalars treated in this paper). This case would arise in practically frequent situations when there are multiple diverse sensors on each node, and when each sensor possibly measures different (overlapping) subsets of all the available sensed values.
One possible idea for treating these situations is to use the wide existing literature in overlapping decentralized estimation (e.g., [20,[73][74][75][76]). One typical example where this situation arises are networks of cameras with different view angles and scene coverage.
Third future direction emerges if one interprets the above results not in the context of calibration, but as pure synchronization (consensus) results for certain special types of linear time-varying systems. Indeed, the derivations and results of the presented convergence theorems open up a possibility of extending them to the case of synchronization of general higher order linear parameter-varying systems, which is a topic of high theoretical importance (e.g., [77] and references therein).
Finally, sensor networks often operate in hostile environments where battery usage and power consumption of the sensor nodes are of crucial importance. In this context it would be of high importance to analyze, for a given particular sensor network mission, how to formulate optimization problem and achieve optimal compromise between sensor network calibration and the nominal operation dictated by the given mission.