Robust Distributed Kalman Filtering: On the Choice of the Local Tolerance

We propose a distributed Kalman filter for a sensor network under model uncertainty. The distributed scheme is characterized by two communication stages at each time step: in the first stage, the local units exchange their observations and compute their local estimates; in the second stage, they exchange these local estimates and compute the final estimate through a diffusion scheme. Each local estimate is designed to be optimal with respect to the least favorable model in a prescribed local ambiguity set. The latter is a ball, in the Kullback–Leibler topology, about the corresponding nominal local model. We propose a strategy to compute the radius of each local ambiguity set, called the local tolerance, rather than keeping it constant across the network. Finally, some numerical examples show the effectiveness of the proposed scheme.


Background
In this section, we review the robust Kalman filter proposed in [29], which represents the "building block" used throughout the paper. Consider the nominal state-space model where A ∈ R^(n×n), Γ_B ∈ R^(n×(n+pN)), C ∈ R^(pN×n), Γ_D ∈ R^(pN×(n+pN)), x_t is the state process, y_t is the observation process, u_t is normalized white Gaussian noise (WGN), and r_t is a deterministic signal. It is assumed that u_t is independent of the initial state x_0 ∼ N(x̄_0, V_0). We also assume that the noise entering the state process and the one entering the observation process are independent, i.e., Γ_B Γ_D^T = 0. Finally, the state-space model in Equation (1) is assumed to be reachable and observable. Let φ_t(z_t|x_t) denote the nominal transition probability density of z_t := [x_{t+1}^T y_t^T]^T given x_t.
Notice that φ_t(z_t|x_t) is Gaussian by construction and is straightforwardly obtained from Equation (1). We assume that the (unknown) actual transition probability density φ̃_t(z_t|x_t) belongs to the ambiguity set B_t, a closed ball centered at φ_t(z_t|x_t) in the KL topology; f̃_t(x_t|Y_{t−1}) is defined as the actual conditional probability density of x_t given Y_{t−1}. The modeling mismatch budget allowed at each time step is represented by the parameter c > 0, which is called the tolerance. The robust estimator of x_{t+1} given Y_t for the nominal model in Equation (1) is obtained by solving the following minimax problem, where G_t is the set of all estimators g_t whose variance is finite under any model in the ambiguity set B_t, and the argument is the estimation error under the transition density φ̃_t(z_t|x_t). In [29], it is proved that the estimator solving the problem in Equation (3) has the following Kalman-like structure, where γ(P, θ) := log det(I − θP) + tr((I − θP)^(−1) − I).
The parameter θ_t > 0 is called the risk sensitivity parameter. It is worth noting that, given P > 0 and c > 0, the equation γ(P, θ) = c always admits a unique solution θ such that θ > 0 and P^(−1) − θI > 0.
In the special case c = 0, i.e., when the nominal model coincides with the actual model, we have θ_t = 0 and Equation (4) degenerates to the usual Kalman filter.
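Since γ(P, θ) vanishes at θ = 0 and increases monotonically to +∞ as θ approaches 1/λ_max(P), the equation γ(P, θ) = c can be solved numerically by bisection. A minimal sketch in Python (function names are illustrative, not from the paper):

```python
import numpy as np

def gamma(P, theta):
    """gamma(P, theta) = log det(I - theta*P) + tr((I - theta*P)^{-1} - I)."""
    n = P.shape[0]
    M = np.eye(n) - theta * P
    _, logdet = np.linalg.slogdet(M)
    return logdet + np.trace(np.linalg.inv(M)) - n

def solve_theta(P, c, iters=200):
    """Find theta > 0 with gamma(P, theta) = c by bisection.

    gamma(P, .) is increasing on (0, 1/lambda_max(P)), from 0 to +infinity,
    so the root is unique and automatically satisfies P^{-1} - theta*I > 0.
    """
    lam_max = np.max(np.linalg.eigvalsh(P))
    lo, hi = 0.0, (1.0 - 1e-9) / lam_max   # keep I - theta*P > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gamma(P, mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A Newton iteration would converge faster, but bisection makes the monotonicity argument behind uniqueness explicit.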

Remark 1.
It is worth noting that the robust Kalman filter is also well defined when the ambiguity set in Equation (2) has a time-varying tolerance, i.e., c_t instead of c. However, we keep c constant in Equation (3) because, in what follows, we assume that the actual (global) model is the solution to Equation (3) with constant tolerance c, which simplifies the setup.

Distributed Robust Kalman Filtering with Uniform Local Tolerance
In this section, we review the distributed robust Kalman filter presented in [39]. Consider a network of N sensors. Two sensors are connected if they can communicate with each other. Accordingly, every sensor k has a set of neighbors, denoted by N_k. In particular, k ∈ N_k, that is, each node is connected to itself. The number of neighbors of node k is denoted by n_k. The corresponding N × N adjacency matrix J = [j_lk] is defined accordingly. We assume that every node collects a measurement y_{k,t} ∈ R^p at time t, with the corresponding nominal state-space model in Equation (5), where w_t and v_{k,t}, k = 1 . . . N, are independent normalized WGNs. It is worth noting that the actual state-space model of each node is unknown. By stacking Equation (5) over k, the sensor network can be rewritten in the form of Equation (1) via the parameterization in Equations (6) and (7). Accordingly, Equation (4) represents the centralized robust Kalman filter. Defining R := DD^T and R_l := D_l D_l^T with l = 1 . . . N, the Kalman gain for Equation (5) follows from the matrix inversion lemma. Since the nominal model in Equation (5) does not coincide with the actual one and each node k can only exploit information shared by its neighbors l ∈ N_k, the aim of distributed robust Kalman filtering is to compute a prediction x̂_{k,t} of the state x_t at every node k using only local information, while taking the model uncertainty into account. If node k had access to the measurements of all nodes in the network, then x̂_{k,t} would coincide with Equation (4), which can be written, using the parameterization in Equations (6) and (7), as Equation (8), where x̂_{k,t} = x̂_t, P_{k,t} = P_t, V_{k,t} = V_t, and θ_{k,t} = θ_t. When not all measurements in the network are accessible to node k, the target is to compute a state prediction x̂_{k,t} of x_t which is as close as possible to the global state prediction. Assume that node k can collect the measurements from its neighbors N_k.
Then, the corresponding local nominal state-space model is obtained by stacking the measurement equations of the neighbors. The latter can be rewritten in the compact form of Equation (10), where u_t^loc is the input noise and y_{k,t}^loc is the output; v_{k,t}^loc and y_{k,t}^loc are obtained by stacking v_{l,t} and y_{l,t}, with l ∈ N_k, respectively. Moreover, C_k^loc is obtained by stacking C_l with l ∈ N_k, Γ_{D,k}^loc = [0  D_k^loc], and D_k^loc is a block-diagonal matrix whose main blocks are D_l with l ∈ N_k. In addition, we define R_k^loc := D_k^loc (D_k^loc)^T and S_k accordingly. We conclude that the one-step-ahead predictor of x_t at node k is similar to the one in Equation (8), but the terms with l ∉ N_k must be discarded. The latter represents an intermediate local prediction of x_{t+1} at node k, denoted by ψ_{k,t+1}. Since connected nodes can exchange their intermediate estimates, each node can update its prediction in terms of both ψ_{k,t+1} and ψ_{l,t+1} with l ∈ N_k. More precisely, consider a matrix W = [w_lk] ∈ R^(N×N) satisfying Equation (11). Then, the final predicted state at node k is given by means of the so-called diffusion step [14]. To sum up, in the diffusion scheme, each local unit uses the measurements and the intermediate local predictions from its neighbors. The resulting scheme is summarized in Algorithm 1.

Algorithm 1 Distributed robust Kalman filter with uniform local tolerance at time t.
Input: x̂_{k,t}, V_{k,t}, y_{k,t}, W = [w_lk] with k = 1 . . . N. Output: x̂_{k,t+1}, V_{k,t+1} with k = 1 . . . N. Incremental step. Compute at every node k: Diffusion step. Compute at every node k: It is worth noting that ψ_{k,t} is computed by applying the robust Kalman scheme in Equation (4) to the local model in Equation (10). In addition, c is the same at every node, that is, c takes a uniform value over the sensor network. In particular, the tolerance c is the same for both the centralized and the distributed Kalman filter. This strategy for selecting the tolerance does not ensure that the least favorable model computed at node k is compatible with the one of the centralized filter. However, in the case of large deviations of the least favorable model corresponding to the centralized problem, the predictor at node k using Algorithm 1 is very likely better than one assuming that the nominal and actual models coincide. Finally, when c = 0, i.e., when the nominal model coincides with the actual one, Algorithm 1 boils down to the distributed Kalman filter with diffusion step in [14].
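The diffusion step combines the intermediate predictions of the neighbors. Assuming the standard combination rule x̂_{k,t+1} = Σ_{l∈N_k} w_lk ψ_{l,t+1} (the displayed equation is not reproduced above), a minimal sketch is:

```python
import numpy as np

def diffusion_step(psi, W):
    """Diffusion step: each node k combines its neighbors' intermediate
    predictions, x_hat_k = sum_l w_lk * psi_l.

    psi : (N, n) array, row l is the intermediate prediction psi_l of node l.
    W   : (N, N) weight matrix with columns summing to one and w_lk = 0
          whenever l is not a neighbor of k.
    Returns an (N, n) array whose row k is the final prediction at node k.
    """
    return W.T @ psi
```

Note that row k of `W.T @ psi` is exactly Σ_l w_lk ψ_l, so a single matrix product performs the diffusion step at every node simultaneously.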

Distributed Robust Kalman Filtering with Non-Uniform Local Tolerance
We investigate the possibility of assigning a different local tolerance to each node, that is, a local tolerance which is not uniform across the sensor network. Recall that the least favorable model is given by the minimax problem in Equation (3), with constant tolerance c, and the corresponding optimal estimator is the centralized robust Kalman filter in Equation (4).
Consider the centralized problem in Equation (3). Let f̄_t(z_t|Y_{t−1}) and f̃_t(z_t|Y_{t−1}) denote the pseudo-nominal and the least favorable conditional probability densities of z_t given the past observations Y_{t−1}, respectively. Recall that φ_t(z_t|x_t) is the nominal transition density of the state-space model in Equation (1). Since f̃_t(x_t|Y_{t−1}) ∼ N(x̂_t, V_t), and in view of Equations (14) and (16), we obtain Equation (17). In [29], it has been shown that the optimal solution φ̃_t^0(z_t|x_t) to Equation (3) is Gaussian. Accordingly, in view of Equation (15), the corresponding least favorable density f̃_t^0(z_t|Y_{t−1}) of z_t given Y_{t−1} is Gaussian. It is then clear that the minimax problem in Equation (3) can be rewritten by replacing φ_t(z_t|x_t) and φ̃_t(z_t|x_t) with f̄_t(z_t|Y_{t−1}) and f̃_t(z_t|Y_{t−1}), respectively. The equivalent minimax problem is Equation (18), where the ambiguity set is a ball about the pseudo-nominal density f̄_t(z_t|Y_{t−1}). It is well known that D_KL(f̃_t‖f̄_t) also represents the negative log-likelihood of the model f̄_t under the actual model f̃_t [40–42]. Accordingly, c represents an upper bound on this negative log-likelihood and can be found as follows. Fix the nominal state-space model (A, B, C, D), collect the data (y^N, u^N, x^N), and let ℓ(A, B, C, D; y^N, u^N, x^N) be the negative log-likelihood of this nominal model given the collected data. Then, fix c = ℓ(A, B, C, D; y^N, u^N, x^N). Clearly, we need to assume that the state is accessible to observation (or that its estimate is reasonably good) in order to compute c.

Theorem 1 (Levy & Nikoukhah [30]).
Let f̄_t(z_t|Y_{t−1}) be the nominal density with mean m_{z_t} and covariance matrix K_{z_t}, partitioned according to the dimensions of x_{t+1} and y_t, respectively. The least favorable density f̃_t^0(z_t|Y_{t−1}) solving Equation (18) has mean and covariance matrix as given below. Let P_t and P̃_t denote the nominal and least favorable error covariance matrices of x_{t+1} given Y_t. Then, Equation (20) holds and θ_t > 0 is the unique value for which γ(P_t, θ_t) = c.

The above result provides a way to compute f̃_t^0(z_t|Y_{t−1}). Indeed, once the centralized robust Kalman filter in Equation (4) has been computed, the mean and the covariance matrix of f̃_t^0(z_t|Y_{t−1}) are given, in view of Equation (17), by Equation (22). In the same way, we can compute the nominal and least favorable densities for each node. Consider the state-space model in Equation (10). Then, in view of Equation (10), the nominal transition probability density at node k follows, and f̄_{k,t}(z_{k,t}|Y_{t−1}) denotes the pseudo-nominal conditional probability density of z_{k,t} given the past observations Y_{t−1}. In view of Equation (23), we obtain the corresponding mean and covariance expressions. Such a result is not surprising: indeed, f̄_{k,t}(z_{k,t}|Y_{t−1}) is obtained by marginalizing f̄_t(z_t|Y_{t−1}) with respect to y_{l,t} with l ∉ N_k. Roughly speaking, this means that m_{z_{k,t}}, K_{x_{t+1},y_{k,t}}, and K_{y_{k,t}} are obtained from m_{z_t}, K_{x_{t+1},y_t}, and K_{y_t} as follows:
• m_{z_{k,t}} is the vector obtained from m_{z_t} by deleting the elements from pl − p + 1 to pl for any l ∉ N_k.
• K_{x_{t+1},y_{k,t}} is the matrix obtained from K_{x_{t+1},y_t} by deleting the columns from pl − p + 1 to pl for any l ∉ N_k.
• K_{y_{k,t}} is the matrix obtained from K_{y_t} by deleting the rows and the columns from pl − p + 1 to pl for any l ∉ N_k.
Accordingly, we can compute the least favorable density at node k, say f̃_{k,t}^0(z_{k,t}|Y_{t−1}), by marginalizing f̃_t^0(z_t|Y_{t−1}) with respect to y_{l,t} with l ∉ N_k; the resulting expression exploits Equation (22) in its last equality. It remains to design the robust filter that computes the intermediate prediction ψ_{k,t+1}.
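The marginalization rules above amount to selecting index blocks of the global mean and covariances. A minimal sketch (0-based indices; the function name and signature are illustrative, not from the paper):

```python
import numpy as np

def marginalize_node(m_z, K_xy, K_y, neighbors, p):
    """Marginalize the global density onto node k: keep the state block
    plus the p-dimensional measurement blocks of the neighbors l in N_k.

    m_z       : global mean of z_t = [x_{t+1}; y_t], length n + p*N.
    K_xy      : (n, p*N) cross-covariance of x_{t+1} and y_t.
    K_y       : (p*N, p*N) covariance of y_t.
    neighbors : set of (0-based) neighbor indices N_k.
    Returns the node-level mean, cross-covariance, and measurement covariance.
    """
    idx = np.concatenate([np.arange(p * l, p * (l + 1)) for l in sorted(neighbors)])
    n = K_xy.shape[0]
    # state part of the mean is kept whole; measurement part is subselected
    m_zk = np.concatenate([m_z[:n], m_z[n:][idx]])
    return m_zk, K_xy[:, idx], K_y[np.ix_(idx, idx)]
```

`np.ix_` selects the same rows and columns of K_y, matching the "delete rows and columns for l ∉ N_k" rule.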

Remark 2.
At this point, it is worth making a digression about Algorithm 1. The intermediate prediction at node k is the solution to the minimax problem in Equation (26), where B̄_{k,t} is the local ambiguity set and G_{k,t} is the set of all estimators g_{k,t} whose variance is finite under any model in B̄_{k,t}. Moreover, in view of Theorem 1, the least favorable density f̃_{k,t}(z_{k,t}|Y_{t−1}) solving Equation (26) is such that D_KL(f̃_{k,t}‖f̄_{k,t}) = c. It is worth noting that the best estimator at node k would be the one constructed from f̃_{k,t}^0; the problem in Equation (26), however, is formulated with respect to the pseudo-nominal density f̄_{k,t}. Clearly, one would design the intermediate estimator at node k using f̃_{k,t}^0. However, the latter is not available at node k: it is only known by a "central unit", i.e., a unit knowing the global model but neither collecting measurements nor computing predictions. Moreover, transmitting the mean and covariance matrix of f̃_{k,t}^0 would be expensive in terms of transmission costs. As an alternative, we can consider a minimax problem whose least favorable model f̃_{k,t} satisfies the condition in Equation (30), where p_k coincides with the number of rows of C_k^loc. Under this scheme, the central unit only transmits the local tolerance to each node in the network. The procedure implementing this optimized strategy of distributed robust Kalman filtering is outlined in Algorithm 2.
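The local tolerance involves a KL divergence between Gaussian densities (the exact normalization by p_k is in Equation (30), not reproduced above). For reference, the closed-form KL divergence between two Gaussians, which the central unit would evaluate, can be sketched as a generic helper:

```python
import numpy as np

def gauss_kl(m0, K0, m1, K1):
    """D_KL( N(m0, K0) || N(m1, K1) ) in closed form.

    Generic helper, not the paper's notation: the local tolerance c_{k,t}
    would be such a divergence between the marginalized least favorable
    and pseudo-nominal densities at node k.
    """
    d = m0.size
    K1_inv = np.linalg.inv(K1)
    diff = m1 - m0
    _, logdet0 = np.linalg.slogdet(K0)
    _, logdet1 = np.linalg.slogdet(K1)
    return 0.5 * (np.trace(K1_inv @ K0) + diff @ K1_inv @ diff
                  - d + logdet1 - logdet0)
```

The divergence vanishes only when the two densities coincide, consistent with c = 0 recovering the standard Kalman filter.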

Algorithm 2 Distributed robust Kalman filter with non-uniform local tolerance at time t.
Input:x k,t , V k,t , y k,t , W = [w lk ] lk with k = 1 . . . N Output:x k,t+1 , V k,t+1 with k = 1 . . . N Tolerance update. Using the nominal global model, the central unit computes for every node k: Incremental step. Compute at every node k: Diffusion step. Compute at every node k:

Least Favorable Performance
We show how to evaluate the performance of the previously introduced distributed algorithm with non-uniform local tolerance and diffusion step with respect to the least favorable model solving the centralized problem in Equation (3). More precisely, we show how to compute the mean and the variance of the prediction error at each node k in the network. In [29,34], it is shown that the least favorable model can be characterized through a state-space model over a finite interval [0, T] as follows. Let ξ_t = [x_t^T e_t^T]^T, where x_t is the least favorable state process. Then, the least favorable model takes the form given below, where ε_t is normalized WGN, independent of x_0, and ř_t := [r_t^T 0]^T. Moreover, the matrix Ω_{t+1}^(−1) is computed from a backward recursion. Let x̃_{k,t} = x_t − x̂_{k,t} denote the least favorable state prediction error of node k at time t using Algorithm 2 or Algorithm 1, and define the vector χ̃_t containing all the errors across the network. Using the same reasoning as in [39], it is not difficult to prove that χ̃_t obeys the dynamics in Equation (35), where 1 denotes the vector of ones. Then, we combine Equation (35) with the model for e_t in Equation (34). Taking the expectation of Equation (36), and in view of the fact that x_0 has mean x̄_0 and x̂_{k,0} = x̄_0 for k = 1 . . . N, it is not difficult to see that Ẽ[η_0] = 0. This implies that η_t is a zero-mean stochastic process or, equivalently, that all the predictors are unbiased. Next, we show how to derive the variance of the prediction errors.
In view of the fact that ε_t is normalized WGN, by Equation (36) we have that Q_t is given by solving the Lyapunov equation in (39). We partition Q_t into blocks P_t ∈ R^(Nn×Nn), H_t ∈ R^(Nn×n), and R_t ∈ R^(n×n). Notice that P_t contains on its main block diagonal the covariance matrices of the estimation error at each node. Accordingly, the least favorable mean square deviation is given by summing MSD_{k,t}, the variance of the prediction error at node k, over the network. Finally, we have the following convergence result for the proposed distributed algorithm.

Proposition 1. Let (A, B) be a reachable pair and (A, C_k^loc) be an observable pair for any k. Let W be an arbitrary diffusion matrix satisfying Equation (11). Then, there exists c > 0 sufficiently small such that, for any arbitrary initial conditions V_0 > 0 and V_{k,0} > 0, the sequence Q_t, t ≥ 0, generated by Equation (39) converges to Q̄ > 0 over [αT, βT] as T → ∞. Moreover, we have F_t → F̄, G_t → Ḡ, and c_{k,t} → c̄_k. In particular, Q̄ corresponds to the unique solution of the algebraic Lyapunov equation.
Proof. First, notice that the observability condition on the pairs (A, C_k^loc) implies the observability of (A, C). Since the global model is reachable and observable, the robust centralized Kalman filter converges provided that c is sufficiently small (see [43,44]). As a consequence, V_t → V̄ > 0 as t → ∞. Accordingly, in view of Equation (17), K_{z_t} → K_z and thus, in view of Equation (22), K̃_{z_t} → K̃_z. Since K_{z_{k,t}} and K̃_{z_{k,t}} are submatrices of K_{z_t} and K̃_{z_t}, respectively, we have K_{z_{k,t}} → K_{z_k} and K̃_{z_{k,t}} → K̃_{z_k}. Accordingly, in view of Equation (30), c_{k,t} converges to a limit c̄_k. In [30], it has been shown that V_t → P_t as c → 0, and thus K̃_{z_t} → K_{z_t}. Since K_{z_{k,t}} and K̃_{z_{k,t}} are submatrices of K_{z_t} and K̃_{z_t}, respectively, we have K̃_{z_{k,t}} → K_{z_{k,t}}. Accordingly, in view of Equation (30), c_{k,t} → 0 as c → 0.
In view of ([43], Proposition 5.3), we conclude that the robust local Kalman filter at node k converges because the local state-space model is reachable and observable, and c̄_k is sufficiently small provided that c is sufficiently small as well.
Finally, the remaining part of the proof follows the one in ([39], Section IV-A) (see also [45]).
It is worth noting that Proposition 1 guarantees that Q̄ is bounded because F̄ is Schur stable. This means that the prediction errors over the network have finite variance, i.e., the Kalman gains of the local filters are stabilizing. The proof above also shows that, when c = 0, i.e., when the nominal model coincides with the actual one, Algorithm 2 boils down to the distributed Kalman filter with diffusion step proposed in [14].
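The error-variance bookkeeping above can be sketched in code, assuming Equation (39) has the standard time-update form Q_{t+1} = F_t Q_t F_t^T + G_t G_t^T (a time-invariant sketch with illustrative names, not the paper's exact recursion):

```python
import numpy as np

def error_covariance_recursion(F, G, Q0, steps):
    """Propagate the joint error covariance via the Lyapunov-type
    recursion Q_{t+1} = F Q_t F^T + G G^T (time-invariant sketch).
    If F is Schur stable, Q_t converges to the solution of the
    algebraic Lyapunov equation Q = F Q F^T + G G^T."""
    Q = Q0
    for _ in range(steps):
        Q = F @ Q @ F.T + G @ G.T
    return Q

def msd_per_node(Q, N, n):
    """MSD_k is the trace of the k-th n-by-n diagonal block of the
    top-left Nn-by-Nn partition P_t of Q_t."""
    return [np.trace(Q[k * n:(k + 1) * n, k * n:(k + 1) * n]) for k in range(N)]
```

For a scalar example with F = 0.5 and G = 1, the recursion converges to the fixed point Q̄ = 1/(1 − 0.25) = 4/3, illustrating why Schur stability of F̄ bounds the steady-state variances.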

Numerical Examples
In this section, we test the performance of the distributed Kalman filters with uniform versus non-uniform local tolerance. More precisely, we consider the problem in [39] of tracking the position of a projectile from position observations corrupted by noise and coming from a network of N = 20 sensors, shown in Figure 1.
The diffusion weights are defined in terms of n_l, the total number of neighbors of node l, while α_k is chosen such that Equation (11) holds.
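The exact α_k-based weight formula is not reproduced above; as an illustration only, one simple column-stochastic construction satisfying the normalization in Equation (11) can be built from the adjacency structure (uniform column normalization, not necessarily the paper's rule):

```python
import numpy as np

def diffusion_matrix(adjacency):
    """Build a column-stochastic diffusion matrix W from a 0/1 adjacency
    matrix: w_lk > 0 only if l is a neighbor of k, and each column sums
    to one. A simple uniform-weight sketch, not the paper's alpha_k rule."""
    J = np.asarray(adjacency, dtype=float)
    np.fill_diagonal(J, 1.0)                    # each node is its own neighbor
    return J / J.sum(axis=0, keepdims=True)     # normalize columns
```

Any W with this sparsity pattern and column sums equal to one preserves unbiasedness of the diffused predictions.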

First Example
We assume that the actual model is contained in the ambiguity set of Equation (2) with c = 0.02. Figure 2 shows the least favorable mean squared deviation across the network. We notice that MSD_t converges to a steady state for all the distributed versions of the Kalman filter. RKFDNU performs slightly better than RKFDU, and both perform consistently better than KFD. All of them perform worse than the centralized versions, among which RKFC is the best. However, the picture is clearer if we consider the steady-state least favorable MSD_{k,t} for each node (see Figure 3a): RKFDNU performs slightly better than RKFDU for the majority of the nodes, and there is a clear difference at nodes 18 and 19, which are more susceptible to model uncertainty, where RKFDNU performs better than RKFDU. Figure 3b shows the behavior of the local tolerances c_{k,t} over time for RKFDNU. As expected, every c_{k,t} converges to a constant value; the latter, however, differs from the tolerance c of the centralized minimax problem.
Finally, Figure 4a,b shows the risk sensitivity parameters θ_{k,t} at every node for RKFDU and RKFDNU. We observe that the risk sensitivity parameters of RKFDU take larger values than those of RKFDNU. Accordingly, the inferior performance of RKFDU is due to the fact that its robust local filters are too conservative.

Second Example
In the second experiment, we consider a larger deviation between the actual model and the nominal one, i.e., we choose c = 0.06. Figures 5 and 6a show the least favorable mean square deviation across the network and for each node at steady state. The situation is similar to the previous one, but the difference among KFD, RKFDU, and RKFDNU is more evident. In particular, the steady-state value of MSD_{k,t} for k = 18, 19 using RKFDNU is clearly better than the ones corresponding to KFD and RKFDU.
In addition, Figure 6b shows the tolerances c_{k,t} at every node over time. As expected, they are higher than those obtained with c = 0.02: the uncertainty is now greater, and thus the robust local filters must be more conservative than before.
Finally, we study how the least favorable MSD of each node correlates with the topology of the sensor network. Figure 7a,b shows two additional sensor networks obtained from the original network of Figure 1 by adding connections to some of the nodes. More precisely, the density of the original network, i.e., the number of connections over all possible connections, is d_1 = 0.39, while the networks in Figure 7a,b are denser. Figure 8a,b shows the results obtained by RKFDNU with the three different sensor networks. As expected, increasing the degrees of the nodes, and consequently the number of connections in the network, reduces both the least favorable MSD of those nodes at steady state and the total least favorable MSD across the network. In conclusion, by adding edges, the performance of RKFDNU tends to the one obtained in the centralized case (RKFC), where all nodes are connected to each other.

Efficient Algorithm
Proposition 1 suggests a simplified version of Algorithm 2. Indeed, if c is sufficiently small, then c_{k,t} converges to c̄_k at steady state for every node of the network. Accordingly, the central unit can compute c̄_k and transmit it to each node once, reducing the transmission costs. The resulting procedure is outlined in Algorithm 3.

Algorithm 3 Efficient distributed robust Kalman filter with non-uniform local tolerance at time t.
Input: x̂_{k,t}, V_{k,t}, y_{k,t}, W = [w_lk] with k = 1 . . . N. Output: x̂_{k,t+1}, V_{k,t+1} with k = 1 . . . N. Incremental step. Compute at every node k: Diffusion step. Compute at every node k: We compared this algorithm, hereafter called RKFDNU2, with RKFDNU: their performance in practice is the same. Figure 9 shows their least favorable mean square deviation across the network in the scenario of Section 5.2 over the first 50 time steps. Finally, Figure 10a,b shows the risk sensitivity parameters for RKFDNU and RKFDNU2, respectively: there is a slight difference, but it disappears after 20 time steps. We conclude that the efficient scheme RKFDNU2 represents a good approximation of RKFDNU. Finally, Table 1 summarizes the performance of RKFC, RKFDU, RKFDNU, and RKFDNU2 obtained with tolerance c = 0.02. The reported values are the least favorable MSD across the network at steady state, the average over all nodes of the tolerances at steady state, the average over all nodes of the risk sensitivity parameters at steady state, and the number of communications between the central unit and the local nodes over the whole time span. In particular, concerning the communication:
• in RKFDU, the central unit transmits the uniform tolerance to each node once (at the beginning);
• in RKFDNU, the central unit transmits the local tolerances to each node at every time step;
• in RKFDNU2, the central unit transmits the steady-state local tolerances to each node once (at the beginning).

Conclusions
In this article, the problem of distributed robust Kalman filtering for a sensor network is considered. More precisely, we consider a distributed scheme with a diffusion step in which the intermediate estimate is designed to be optimal with respect to the least favorable model in a prescribed local ambiguity set. The latter is a ball about the local nominal model, whose radius is the local tolerance. We propose an algorithm in which the local tolerance of each node is different and suitably computed by the central unit. We also consider a more efficient implementation in which the central unit computes and transmits the steady-state local tolerances to every node once, reducing the communication between the central unit and the local nodes. Through some numerical examples, we show that the proposed algorithm performs better than the one with a uniform local tolerance across the network.

Conflicts of Interest:
The authors declare no conflict of interest.