Optimally Distributed Kalman Filtering with Data-Driven Communication

For multisensor data fusion, distributed state estimation techniques that enable a local processing of sensor data are the means of choice in order to minimize storage and communication costs. In particular, a distributed implementation of the optimal Kalman filter has recently been developed. A significant disadvantage of this algorithm is that the fusion center needs access to each node so as to compute a consistent state estimate, which requires full communication each time an estimate is requested. In this article, different extensions of the optimally distributed Kalman filter are proposed that employ data-driven transmission schemes in order to reduce communication expenses. As a first relaxation of the full-rate communication scheme, it can be shown that each node only has to transmit every second time step without endangering consistency of the fusion result. Also, two data-driven algorithms are introduced that even allow for lower transmission rates, and bounds are derived to guarantee consistent fusion results. Simulations demonstrate that the data-driven distributed filtering schemes can outperform a centralized Kalman filter that requires each measurement to be sent to the center node.


Introduction
The efficient processing of sensor data is a central topic in a wide variety of research areas, which is underlined by advances in sensor technology and capabilities, e.g., for odor [1] and taste recognition [2], by advances in visual information processing [3], and by applications in robotics [4] and sensor networks [5,6]. In particular for processing data from multiple sensors, the well-known Kalman filter [7] has evolved into a key component of data fusion algorithms. Multisensor data can be transmitted directly to a data sink that employs a centralized Kalman filter to process all the accrued sensor readings. Such a simple filter design stands in stark contrast to the communication costs expended to transmit the data. The idea behind distributed Kalman filter implementations is to use local processing power to combine and condense sensor data locally so that the processing results can be transmitted more efficiently and less frequently. As the local processing results comprise the information from all past observations, the results can be sent to the data sink after arbitrarily many time steps without losing information from the past measurements. As compared to a centralized Kalman filter, distributed implementations
1. All nodes have to send their data at the same time, and
2. the central node cannot infer any information about the state between the sending times.
Hence, the standard implementation of the optimally distributed Kalman filter (ODKF) implies transmissions of either all or none of the nodes. As a further consequence, full-rate communication of the nodes is required if the data sink needs an estimate at every time step. In this article, extensions of the ODKF are proposed that can operate under lower communication rates. This is achieved by introducing data-driven transmission strategies [29,30]. In particular, the local estimates can be transmitted asynchronously to the data sink. In order to guarantee consistency, bounds on the estimation errors are provided. These bounds are only required in situations when not every local estimate is available at the data sink; the optimal estimate as provided by a central Kalman filter is still obtained each time all local estimates have been sent to the data sink.
As compared to the standard formulation of the ODKF, the advantage of the proposed extension is that the data sink can now compute an estimate of the state based on a subset of local estimates. This article continues the work in [31] by introducing an additional criterion for the data-driven transmission strategy, providing more details, and extending the evaluation and discussions.
The paper is structured as follows. Section 2 provides a description of the centralized and the optimally distributed Kalman filter as well as a problem formulation. In Section 3, the first extension is introduced which enables the data sink to treat missing estimates. In Section 4 and in Section 5, we describe the second and the third new distributed algorithm, respectively, which implement data-driven transmission schemes and allow for omitted estimates at the fusion center over multiple time steps. The results of an experimental evaluation are discussed in Section 6. Finally, the article is concluded in Section 7.

Centralized and Optimally Distributed Kalman Filtering
We consider a sensor network with $N$ local sensor nodes and a central node, which serves as a data sink and computes an estimate of the state. The true state of the system at time step $k$ is denoted by $x_k$, which evolves according to the discrete-time linear dynamic system
$$x_{k+1} = A_k x_k + w_k ,$$
where $A_k$ is the system matrix and $w_k$ denotes the process noise, which is assumed to be zero-mean Gaussian noise, $w_k \sim \mathcal{N}(0, C^w_k)$, with covariance matrix $C^w_k$. At each time step $k$, each sensor $i \in \{1, \ldots, N\}$ observes the state through the model
$$z^i_k = H^i_k x_k + v^i_k ,$$
where $H^i_k$ is the measurement matrix and $v^i_k$ the measurement noise, which is assumed to be Gaussian noise with zero mean, $v^i_k \sim \mathcal{N}(0, C^{z,i}_k)$, with covariance matrix $C^{z,i}_k$. The measurement noise terms of different local sensors are assumed to be mutually uncorrelated. Also, the process and measurement noise terms for different time steps are uncorrelated.
For the centralized Kalman filter (CKF), each measurement is sent to the data sink and fused by means of the formulas
$$\hat{x}^{e,c}_k = C^{e,c}_k \Big( (C^{p,c}_k)^{-1} \hat{x}^{p,c}_k + \sum_{i=1}^{N} (H^i_k)^T (C^{z,i}_k)^{-1} z^i_k \Big), \quad (1)$$
$$C^{e,c}_k = \Big( (C^{p,c}_k)^{-1} + \sum_{i=1}^{N} (H^i_k)^T (C^{z,i}_k)^{-1} H^i_k \Big)^{-1}. \quad (2)$$
These equations correspond to the information form [32,33] of the measurement update of the standard Kalman filter. $\hat{x}^{e,c}_k$ and $C^{e,c}_k$ denote the state estimate and the corresponding error covariance matrix after the fusion step, respectively. $\hat{x}^{p,c}_k$ and $C^{p,c}_k$ denote the state estimate and the corresponding error covariance matrix computed by the prediction step of the Kalman filter. The prediction step is carried out at the center node by
$$\hat{x}^{p,c}_{k+1} = A_k \hat{x}^{e,c}_k , \qquad C^{p,c}_{k+1} = A_k C^{e,c}_k A_k^T + C^w_k ,$$
which can also be stated in the information form [26,32]. Since these equations correspond to the standard Kalman filter, the CKF is unbiased and optimal with respect to the minimum mean squared error. In particular, the computed error covariance matrix is equal to the actual estimation error covariance, i.e.,
$$\mathrm{E}\big[ (\hat{x}^{e,c}_k - x_k)(\hat{x}^{e,c}_k - x_k)^T \big] = C^{e,c}_k . \quad (3)$$
In [20-23], a distributed implementation of the Kalman filter algorithm has been derived that is algebraically equivalent to the centralized scheme, i.e., it is also unbiased, minimizes the mean squared error, and fulfills (3). This is achieved by defining a local filtering scheme such that the fusion result is equal to results (1) and (2) of the CKF. We will describe this algorithm, the optimally distributed Kalman filter (ODKF), in the following.
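The information-form update and the prediction step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `ckf_update` and `ckf_predict`, the symbol `Q` for the process noise covariance, and all numerical values are assumptions made for the example.

```python
import numpy as np

def ckf_update(x_pred, C_pred, measurements):
    """Information-form measurement update of a centralized Kalman filter.

    measurements: iterable of (z, H, R) tuples, one per sensor whose
    measurement reached the data sink at this time step.
    """
    info_mat = np.linalg.inv(C_pred)      # inverse of predicted covariance
    info_vec = info_mat @ x_pred          # predicted information vector
    for z, H, R in measurements:
        R_inv = np.linalg.inv(R)
        info_mat = info_mat + H.T @ R_inv @ H   # add sensor information
        info_vec = info_vec + H.T @ R_inv @ z
    C_est = np.linalg.inv(info_mat)
    return C_est @ info_vec, C_est

def ckf_predict(x_est, C_est, A, Q):
    """Standard Kalman prediction step (Q: process noise covariance)."""
    return A @ x_est, A @ C_est @ A.T + Q

# Hypothetical two-dimensional example with one position sensor
x_pred = np.array([0.0, 1.0])
C_pred = np.diag([2.0, 1.0])
H = np.array([[1.0, 0.0]])
R = np.array([[0.5]])
z = np.array([1.2])
x_est, C_est = ckf_update(x_pred, C_pred, [(z, H, R)])
```

Passing only a subset of the sensors in `measurements` corresponds to the case in which some measurements never reach the data sink.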
The local sensor nodes run modified versions of the Kalman filtering algorithm. They use so-called globalized local state estimates and error covariance matrices. (Although, strictly speaking, the local estimates and covariance matrices do not represent consistent estimates of the state, we refer to them as local estimates.) To initialize the ODKF, the local initial estimates and covariance matrices $(\hat{x}^{e,i}_0, C^{e,i}_0)$ at the sensor nodes $i \in \{1, \ldots, N\}$, which are usually identical, have to be replaced by the globalized estimates $\bar{x}^{e,i}_0$ and the globalized covariance matrix $\bar{C}^{e}_0$. Since the globalized error covariance matrix is equal for each sensor, the sensor index $i$ is omitted. This equality also applies to all future time steps. The local prediction step is replaced by the globalized prediction equations (4), and the local filtering steps are globalized by (5). The processing steps (4) and (5) are computed locally on each sensor node. Hence, measurements are not directly transmitted to the central node; instead, the local estimates are sent to the central node. As the globalized covariance matrix is equal for each node, it can also be computed in the central node.
In order to compute an estimate at an arbitrary time step $k$, the central node receives the globalized estimates $(\bar{x}^{e,i}_k, \bar{C}^e_k)$ from each sensor $i$ and fuses them according to Equations (6) and (7), where $\hat{x}^{e,d}_k$ denotes the state estimate after the fusion step at the data sink with corresponding error covariance matrix $C^{e,d}_k$. The same equations apply to the fusion of the predicted estimates and error covariance matrices in (4). In [20], it has been shown that the results are optimal, i.e.,
$$\hat{x}^{e,d}_k = \hat{x}^{e,c}_k , \quad (8)$$
$$C^{e,d}_k = C^{e,c}_k , \quad (9)$$
where $\hat{x}^{e,c}_k$ and $C^{e,c}_k$ are computed by the CKF, i.e., by (1) and (2). The disadvantage of the centralized Kalman filter is that each sensor node has to transmit the measurements of each time step to the central node. For the ODKF, we observe that communication in past time steps does not influence $\hat{x}^{e,d}_k$ and $C^{e,d}_k$, i.e., the equalities hold independently of the past communication pattern in the distributed network. As a consequence of (8) and (9), we can see that
$$\mathrm{E}\big[ (\hat{x}^{e,d}_k - x_k)(\hat{x}^{e,d}_k - x_k)^T \big] = C^{e,d}_k \quad (10)$$
is equal to (3), i.e., the ODKF is optimal.

A significant drawback of the ODKF implementation becomes apparent in the following situation. If only $m < N$ sensors transmit their estimates to the fusion center at time step $k$, Equations (6) and (7) reduce to sums over the $m$ received estimates, which is denoted by (11). The resulting ODKF estimate $\hat{x}^{e,d}_k$ and error covariance matrix $C^{e,d}_k$ are different from the centralized estimate $\hat{x}^{e,c}_k$ and the error covariance matrix $C^{e,c}_k$, which are
$$\hat{x}^{e,c}_k = C^{e,c}_k \Big( (C^{p,c}_k)^{-1} \hat{x}^{p,c}_k + \sum_{i=1}^{m} (H^i_k)^T (C^{z,i}_k)^{-1} z^i_k \Big), \quad (12)$$
$$C^{e,c}_k = \Big( (C^{p,c}_k)^{-1} + \sum_{i=1}^{m} (H^i_k)^T (C^{z,i}_k)^{-1} H^i_k \Big)^{-1}. \quad (13)$$
In particular, we notice that the covariance matrix (13) differs from the ODKF covariance matrix in (11). A consequence of this mismatch is a possible bias in the fused estimate as discussed, e.g., in [24]. Hence, the ODKF may provide inconsistent estimates in case of missing transmissions. This issue will be addressed in the following sections. Although the CKF does not suffer from inconsistency, (12) and (13) unveil a critical downside of the CKF: Missing measurements at time step $k$ are lost for all future time steps. By contrast, the local estimates of the ODKF incorporate past measurements, which is why the ODKF may outperform the CKF once the inconsistency problem of the ODKF is solved.
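The equivalence between the distributed and the centralized scheme can be checked numerically. The sketch below assumes one common globalization from the distributed Kalman filtering literature (initial covariance and process noise inflated by the number of nodes $N$, measurement information divided by $N$, fusion by averaging); it may differ in form from Equations (4) to (7), and the model matrices, the symbols `Q`, `R`, `S`, and all numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Example model; every value here is an illustrative assumption
N = 3
A = np.array([[1.0, 0.25], [0.0, 1.0]])
Q = 0.1 * np.eye(2)
H = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, 1.0]])]
R = [np.array([[1.0]]), np.array([[0.5]]), np.array([[2.0]])]
x0, C0 = np.zeros(2), np.eye(2)

def info(Hi, Ri):
    """Return H^T R^-1, to be applied to H or to a measurement z."""
    return Hi.T @ np.linalg.inv(Ri)

# Centralized Kalman filter state
x_c, C_c = x0.copy(), C0.copy()
# Globalized local states: common covariance, inflated by N
x_loc = [x0.copy() for _ in range(N)]
C_glob = N * C0
# Total measurement information; depends on the models only, not on data
S = sum(info(Hi, Ri) @ Hi for Hi, Ri in zip(H, R))

x_true = rng.multivariate_normal(x0, C0)
for k in range(20):
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    z = [Hi @ x_true + rng.multivariate_normal(np.zeros(1), Ri)
         for Hi, Ri in zip(H, R)]

    # Centralized filter: predict, then information-form update
    x_c, C_c = A @ x_c, A @ C_c @ A.T + Q
    C_new = np.linalg.inv(np.linalg.inv(C_c) + S)
    x_c = C_new @ (np.linalg.inv(C_c) @ x_c
                   + sum(info(Hi, Ri) @ zi for Hi, Ri, zi in zip(H, R, z)))
    C_c = C_new

    # Globalized local filters: no communication during the run
    C_glob_pred = A @ C_glob @ A.T + N * Q
    C_glob = np.linalg.inv(np.linalg.inv(C_glob_pred) + S / N)
    x_loc = [C_glob @ (np.linalg.inv(C_glob_pred) @ (A @ xi) + info(Hi, Ri) @ zi)
             for xi, Hi, Ri, zi in zip(x_loc, H, R, z)]

# Fusion at the data sink, performed once after 20 silent time steps
x_d = sum(x_loc) / N
C_d = C_glob / N
```

Although the nodes never communicate during the 20 time steps, the single fusion at the end reproduces the centralized result, which illustrates why past communication patterns do not matter as long as all nodes transmit at the fusion time.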
In this section, the ODKF has been revisited; it provides the same results as the CKF but offers the advantage that transmissions can take place at arbitrary instants of time. However, the ODKF still requires that all nodes send their local estimates at the transmission times to compute (6) and (7). As a consequence, the data sink typically operates at a lower rate than the local nodes as it is idle between transmission times. In the following sections, extensions are provided that enable the nodes to transmit their local estimates asynchronously. The data sink can then operate at the same rate as the sensor nodes, i.e., it is able to provide an estimate at every time step k. By employing data-driven strategies, the communication rate of each node can be significantly lower than 1, where the value 1 corresponds to transmissions at every time step k.
In Section 3, we develop a consistent ODKF extension that can cope with situations where sensor nodes send their estimates only at every second time step. This algorithm still provides results equal to the CKF and allows us to reduce the communication rate by half. Sections 4 and 5 introduce a second and a third algorithm that can reach even lower communication rates by applying bounds on the missing pieces of information.

Distributed Kalman Filtering with Omitted Estimates
We consider the ODKF algorithm as described in the previous section. At time step $k$, only sensor nodes $1, \ldots, m$ send their estimates to the fusion center, whereas the estimates of sensor nodes $m+1, \ldots, N$ are not transmitted. In this section, we assume that the data from the nodes $m+1, \ldots, N$ were available at the fusion center at time step $k-1$. Thus, the fusion center can compute the predicted estimates $\bar{x}^{p,m+1}_k, \ldots, \bar{x}^{p,N}_k$ for time step $k$ by using (4). In place of the ODKF fusion Equations (6) and (7), the fusion result is now computed by Equations (14) and (15), which combine the received filtered estimates with the predicted estimates of the silent nodes. The resulting estimate $\hat{x}^{e,d}_k$ and the error covariance matrix $C^{e,d}_k$ are equal to the estimate $\hat{x}^{e,c}_k$ and the error covariance matrix $C^{e,c}_k$ computed by a centralized Kalman filter according to (12) and (13). A proof of the equality is provided in Appendix A.
Since (14) and (15) are equivalent to the CKF, unbiasedness, optimality, and (10) are accordingly inherited from the CKF. We have generalized the original ODKF algorithm such that full-rate communication is not required anymore. The novel fusion algorithm merely requires that if a particular sensor does not communicate with the center at time k, it has sent its data at time k − 1, i.e., each sensor has to communicate with the center at least every second time step. Hence, the required communication rate can be reduced by half to 0.5.
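The idea of replacing missing filtered estimates by predictions can be illustrated with a small numerical check. The sketch assumes the same globalization as a common textbook variant of the ODKF (covariances inflated by $N$, measurement information divided by $N$), which may differ in form from Equations (14) and (15); the function name `fuse_with_omissions` and all values are hypothetical.

```python
import numpy as np

inv = np.linalg.inv

def fuse_with_omissions(x_upd, x_pred, sent, C_upd, C_pred, H, R):
    """Fuse filtered globalized estimates of the nodes in `sent` with
    predicted globalized estimates of all other nodes."""
    N = len(x_pred)
    info_mat = N * inv(C_pred)                # prior information of all nodes
    info_vec = np.zeros_like(x_pred[0])
    for i in range(N):
        if i in sent:
            info_mat = info_mat + H[i].T @ inv(R[i]) @ H[i]
            info_vec = info_vec + inv(C_upd) @ x_upd[i]
        else:
            info_vec = info_vec + inv(C_pred) @ x_pred[i]
    C_fused = inv(info_mat)
    return C_fused @ info_vec, C_fused

# Illustrative model and data (assumptions, not the paper's scenario)
N = 3
A = np.array([[1.0, 0.25], [0.0, 1.0]])
Q = 0.1 * np.eye(2)
H = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, 1.0]])]
R = [np.array([[1.0]]), np.array([[0.5]]), np.array([[2.0]])]
x_prev, C_prev = np.array([1.0, -1.0]), np.eye(2)   # fused result at k-1
z = [np.array([0.7]), np.array([-0.4]), np.array([0.1])]
sent = {0, 2}                                        # node 1 stays silent

# Globalized local processing at time k (carried out by every node)
C_pred = A @ (N * C_prev) @ A.T + N * Q
S = sum(Hi.T @ inv(Ri) @ Hi for Hi, Ri in zip(H, R))
C_upd = inv(inv(C_pred) + S / N)
x_pred = [A @ x_prev for _ in range(N)]
x_upd = [C_upd @ (inv(C_pred) @ xp + Hi.T @ inv(Ri) @ zi)
         for xp, Hi, Ri, zi in zip(x_pred, H, R, z)]

x_d, C_d = fuse_with_omissions(x_upd, x_pred, sent, C_upd, C_pred, H, R)

# Reference: centralized filter that only receives the measurements in `sent`
Cp = A @ C_prev @ A.T + Q
C_ref = inv(inv(Cp) + sum(H[i].T @ inv(R[i]) @ H[i] for i in sent))
x_ref = C_ref @ (inv(Cp) @ (A @ x_prev)
                 + sum(H[i].T @ inv(R[i]) @ z[i] for i in sent))
```

In this example the fused pair `(x_d, C_d)` coincides with the reference pair `(x_ref, C_ref)`, in line with the equivalence stated for the fusion with omitted estimates.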
However, a higher communication rate, and thus the incorporation of the information contained in additional measurements, will always result in a lower mean squared error (MSE). Thus, we have to deal with the trade-off between a low communication rate and a low MSE. Nevertheless, it is possible to achieve a smaller MSE while keeping the same communication rate by using a data-driven communication strategy, i.e., by scheduling the transmissions according to the information contained in the data. Valuable results have already been achieved using data-driven communication in distributed sensor networks [34-42]. The idea is that each local sensor evaluates the distance between the predicted estimate $\bar{x}^{p,i}_k$ and the filtered estimate $\bar{x}^{e,i}_k$. If the distance is large, the measurement $z^i_k$ adds much new information to the prediction. Only in this case should the sensor send its current estimate $\bar{x}^{e,i}_k$ to the center node.
It is important to emphasize that the globalized parameters $\bar{x}^{p,i}_k$ and $\bar{x}^{e,i}_k$ are not unbiased estimates of the actual state. It can be shown [24] that, in contrast to the difference between the standard Kalman filter estimates, the difference $\bar{x}^{e,i}_k - \bar{x}^{p,i}_k$ between the globalized estimates is not zero on average, but may even diverge. Therefore, in order to evaluate the influence of a measurement $z^i_k$, we study the difference between the predicted and updated estimates of the standard Kalman filter, which is related to the weighted difference between the measurement and the prediction, i.e.,
$$\hat{x}^{e,i}_k - \hat{x}^{p,i}_k = K^i_k \big( z^i_k - H^i_k \hat{x}^{p,i}_k \big),$$
where $K^i_k$ denotes the standard Kalman gain. For this purpose, the standard Kalman filtering algorithm has to run in parallel to the globalized version of the Kalman filter at each sensor node. The following data-driven communication strategy (16) can be applied: a node sends its estimate to the fusion center only if the Euclidean distance between the predicted and the filtered standard Kalman filter estimate exceeds a user-defined threshold $\alpha$; otherwise, it does not transmit. Experiments (see Section 6) will show that, for a fixed communication rate, using the data-driven communication strategy instead of random communication improves the MSE of the fused estimate $\hat{x}^{e,d}_k$. However, a drawback of the algorithm is the assumption that if a particular sensor does not communicate with the center at time $k$, it has communicated at time $k-1$, i.e., each sensor communicates with the center at least every other time step. Thus, communication rates lower than 0.5 cannot be achieved. This will be addressed by the following extensions.
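A node-side transmission test of this kind could look as follows. This is a minimal sketch under the assumption that the test compares the Euclidean norm of the standard Kalman update step against a threshold `alpha`; the function name `should_transmit` and the numerical values are hypothetical.

```python
import numpy as np

def should_transmit(x_pred, C_pred, z, H, R, alpha):
    """Decide whether a node sends its estimate.

    Returns (send, x_est), where send is True if the standard Kalman
    update moves the local estimate by more than alpha.
    """
    # Standard Kalman gain of the local (non-globalized) filter
    K = C_pred @ H.T @ np.linalg.inv(H @ C_pred @ H.T + R)
    step = K @ (z - H @ x_pred)          # filtered minus predicted estimate
    return bool(np.linalg.norm(step) > alpha), x_pred + step

# Hypothetical example: same prediction, two different measurements
x_pred = np.array([0.0, 1.0])
C_pred = np.diag([2.0, 1.0])
H = np.array([[1.0, 0.0]])
R = np.array([[0.5]])

send_large, _ = should_transmit(x_pred, C_pred, np.array([1.2]), H, R, alpha=0.5)
send_small, _ = should_transmit(x_pred, C_pred, np.array([0.1]), H, R, alpha=0.5)
```

A larger `alpha` suppresses more transmissions, so sweeping it is one way to obtain different average communication rates.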

Data-Driven Distributed Kalman Filtering with Omitted Estimates over Multiple Time Steps-Version 1
If we want to achieve communication rates lower than 0.5 in the sensor network, we have to allow that a particular sensor does not send its estimates to the fusion center over multiple time steps. In this case, the fusion center has to perform multiple consecutive predictions. Let us assume that the last communication of sensor $i$ with the center occurred at time $k - l$. The predicted estimate for time step $k$ is then computed by applying the prediction step $l$ times in succession.
Prediction refers to the application of Equation (4). Note that the predicted estimates are now marked with "pp" instead of "p" to emphasize that possibly multiple prediction steps were applied consecutively. If prediction has been performed only once, we have $\bar{x}^{pp,i}_k = \bar{x}^{p,i}_k$.

The new estimate can be expressed in terms of the estimate $\hat{x}^{e,d}_k$ from (15). For the yet to be defined triggering criterion, we consider the distance $d^i_k$. The expected estimation error covariance matrix then depends on $d^i_k$. Obviously, the expected estimation error cannot be computed exactly at the fusion center, since the difference $d^i_k$ is not available. Nevertheless, it is possible to obtain an upper bound on the estimation error if we alter the communication test (16). To this end, the test additionally compares the matrix $d^i_k (d^i_k)^T$ against a user-defined symmetric positive definite matrix $B$, so that a bound is available whenever an estimate is withheld.

An alternative possibility to avoid the divergence of the difference is to debias the local globalized estimates. A strategy to debias the estimates using debiasing matrices has been proposed in [24,25]. In each prediction and filtering step, each local node computes a new debiasing matrix. This matrix is initialized by $\Delta^{p,i}_0 = I$ and is updated in every filtering step and in every prediction step, the latter by Equation (21). $\Delta^{pp,i}_k$ is computed by applying Equation (21) multiple times until the next communication with the center node occurs. By multiplying the inverse of the debiasing matrix with the globalized estimates, we can debias the estimates [24,25]. It can easily be shown that the same applies to the predicted estimate over multiple time steps.

We now define $\bar{x}^{pp,i}_k$

Thus, in general the difference

We can now define the new fusion equations, where $m$ is the number of sensors which communicate with the center at time $k$ and $l$ is the number of sensors which do not communicate with the center at time $k$, but for which $\bar{x}^{pp,i}_k - \bar{x}^{p,i}_k = 0$ holds. Note that the fusion formulas are equal to (14) and (15) for $N = m + l$, i.e., for the case that each sensor sends its estimate to the center at least every other time step. The resulting estimate is consistent, i.e., the consistency condition (22) holds. A proof of the consistency condition (22) is provided in Appendix C.
The drawback of the presented algorithm is that it needs two parameters B and α to perform the communication test. Both parameters influence the communication rate. Thus, it is difficult to find the parameters which ensure the desired balance between a small communication rate and a small estimation error. Experiments using the particular dynamic system are needed to find the best combination of both parameters. Thus, we will now present another algorithm which only uses one parameter B for the communication strategy.

Data-Driven Distributed Kalman Filtering with Omitted Estimates over Multiple Time Steps-Version 2
For the second data-driven algorithm, fusion Equation (15) is generalized in a different way. The difference to the previous Equation (17) is the covariance matrix $(\bar{C}^e_k)^{-1}$ in the second sum. The new estimate can again be expressed in terms of the estimate $\hat{x}^{e,d}_k$ from (15). In order to define a communication strategy, we consider the difference $d^i_k$, which is compared against the matrix $B$: if the difference stays within the bound given by $B$, the node does not send its estimate to the fusion center; otherwise, it sends its estimate to the fusion center.
$B$ denotes a user-defined symmetric positive definite matrix. This time we do not need the Euclidean distance test $\|\bar{x}^{pp,i}_k - \bar{x}^{e,i}_k\| \le \alpha$ since the distance between the predicted and the filtered estimate is included in $d^i_k$. We can now define the new fusion equations, where $m$ is the number of sensors which communicate with the center at time $k$. With the same arguments as in Section 4, it can be shown that the resulting estimate is consistent. In the experimental evaluation of the algorithms, we will see that although this version of fusing the estimates has the advantage that only one parameter has to be chosen, the estimate of the fused error covariance matrix is not as good as in the previous version.

Simulations and Evaluation
We apply the CKF algorithm as well as the three data-driven ODKF algorithms to a single-target tracking problem. The system state $x_k$ is a six-dimensional vector with two dimensions for the position, two dimensions for the velocity, and two dimensions for the acceleration. A near-constant acceleration model is used with the sampling interval $\Delta = 0.25$ s, which determines the system matrix $A_k$; the process noise is characterized by the covariance matrix $C^w_k$. We have a sensor network consisting of six sensor nodes and one fusion node. Two sensors measure the position, two sensors measure the velocity, and two sensors measure the acceleration. The measurement noise covariance matrices are given by
$$C^{z,i}_k = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{for } i \in \{1, \ldots, 6\}.$$
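One way to set up such a scenario is sketched below. The state ordering `[px, py, vx, vy, ax, ay]`, the textbook form of the constant-acceleration transition matrix, and the variable names are assumptions; the process noise covariance is left out because its values are not given here.

```python
import numpy as np

dt = 0.25                                   # sampling interval in seconds

# One-dimensional constant-acceleration transition (position, velocity, acceleration)
F1 = np.array([[1.0, dt, 0.5 * dt**2],
               [0.0, 1.0, dt],
               [0.0, 0.0, 1.0]])

# Two spatial dimensions; assumed state ordering [px, py, vx, vy, ax, ay]
A = np.kron(F1, np.eye(2))

def selector(block):
    """2x6 measurement matrix picking position (0), velocity (1), or acceleration (2)."""
    e = np.zeros((1, 3))
    e[0, block] = 1.0
    return np.kron(e, np.eye(2))

# Six sensors: two for position, two for velocity, two for acceleration
H = [selector(0), selector(0), selector(1), selector(1), selector(2), selector(2)]
R = [np.eye(2) for _ in range(6)]           # unit measurement noise covariance

x = np.array([0.0, 0.0, 1.0, 2.0, 0.0, 0.0])   # moving with velocity (1, 2)
x_next = A @ x                                  # noise-free propagation
```

With this ordering, one propagation step moves the position by the velocity times `dt`, which is a quick sanity check of the transition matrix.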
Monte Carlo simulations with 500 independent runs over 100 time steps are performed. Since $(A_k, C^w_k)$ is stabilizable and $(A_k, H_k)$ is detectable, the error covariance matrix and the MSE converge to unique values [43]. Based on the simulation, actual error covariance matrices and MSE values are computed for each algorithm. Monte Carlo simulations are performed for different average communication rates for each of the three algorithms. For the CKF, communication is performed randomly, but with different average rates. Note that only current measurements are communicated. If the measurement $z^i_k$ is not sent to the fusion center at time $k$, the information will not be available at the center at any future time.
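The comparison between the empirical MSE and the covariance reported by a filter can be sketched with a scalar toy system. This is not the tracking scenario of this section: the model parameters, the number of runs, and the variable names are assumptions chosen only to show how the two quantities are obtained.

```python
import numpy as np

rng = np.random.default_rng(1)
runs, steps = 2000, 50

# Scalar toy model: x_{k+1} = a x_k + w_k, z_k = h x_k + v_k
a, q, h, r = 0.95, 0.2, 1.0, 1.0

x = rng.normal(0.0, 1.0, size=runs)     # true initial states, variance 1
x_hat = np.zeros(runs)                  # initial estimates
c = 1.0                                 # initial error variance (same for all runs)

for _ in range(steps):
    # Simulate all runs in parallel
    x = a * x + rng.normal(0.0, np.sqrt(q), size=runs)
    z = h * x + rng.normal(0.0, np.sqrt(r), size=runs)
    # Kalman prediction and update
    x_hat, c = a * x_hat, a * a * c + q
    k_gain = c * h / (h * h * c + r)
    x_hat = x_hat + k_gain * (z - h * x_hat)
    c = (1.0 - k_gain * h) * c

mse = np.mean((x_hat - x) ** 2)         # empirical MSE over all runs
reported = c                            # variance reported by the filter
```

For an optimal filter, `mse` and `reported` agree up to Monte Carlo error; for a filter that uses conservative bounds, the reported value is expected to be at least as large as the empirical MSE.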
The first ODKF algorithm (Algorithm 1) from Section 3 is performed with random communication as well as with data-driven communication. In the latter case, the parameter $\alpha$ is varied to achieve different rates. The second algorithm (Algorithm 2) from Section 4 and the third algorithm (Algorithm 3) from Section 5 are performed with data-driven communication. For Algorithm 2, both parameters $\alpha$ and $B$ are varied; for Algorithm 3, the parameter $B$ is varied. The compared methods are:
Algorithm 1: ODKF algorithm from Section 3 with random and data-driven communication.
Algorithm 2: ODKF algorithm from Section 4 with data-driven communication and parameters $\alpha$, $B$.
Algorithm 3: ODKF algorithm from Section 5 with data-driven communication and parameter $B$.
The simulation results are shown in Figure 1. The MSEs and the traces of the error covariance matrices are depicted relative to the average communication rate in the network. Since for Algorithm 2 different parameter combinations lead to different results, we have only included the results with the smallest error covariance matrices in the plot.
Communication rates lower than 0.5 are given only for the centralized Kalman filter and for Algorithms 2 and 3. We can observe that for Algorithm 1, data-driven communication leads to an improved estimate compared to random communication. However, it also leads to a larger trace of the error covariance matrix and thus to a larger uncertainty reported with the estimate.
We can also observe that for communication rates in the range [0.5, 1], the results of Algorithms 2 and 3 with data-driven communication are almost equal. This can be explained by the fact that Algorithms 2 and 3 share a common triggering criterion, and that the fusion formulas for both algorithms are equal if each sensor communicates with the center at least every other time step.

Figure 1 shows that for each of the algorithms the MSE is always smaller than or equal to the trace of the error covariance matrix. This illustrates that the estimators provide consistent results. The traces are good estimates of the MSEs except for low communication rates in Algorithm 3 and very low communication rates in Algorithm 2. Thus, the trace of the error covariance matrices, i.e., the uncertainty reported by the estimators, is not significantly larger than the actual uncertainty in most of the cases. Each of the distributed fusion algorithms performs better in terms of the MSE than the centralized algorithm. This can be explained by the fact that in the distributed network the fused estimates contain the information of all past measurements, while in the centralized network only the current measurements are fused.

Conclusions
In this article, the optimally distributed Kalman filter (ODKF) has been extended by data-driven communication strategies in order to bypass the need for full communication that is usually required by the ODKF to compute an estimate. Since the ODKF may provide inconsistent results if data transmissions are omitted, the missing estimates are replaced by predictions from previous time steps, and consistent bounds on the error covariance matrix are computed. The first proposed technique allows for communication rates in the range [0.5, 1], while the second and the third algorithm allow for any communication rate in the range [0, 1]. In a centralized Kalman filter (CKF), where measurements are directly sent to the central node, missing or lost transmissions to the center node need to be repeated in order to avoid loss of measurement data. In this regard, the proposed extensions of the ODKF can significantly outperform the CKF: The local estimates of each sensor node comprise the entire history of local measurements, and hence suspended transmissions do not lead to a loss of information in the network.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
We will now show that the resulting estimate $\hat{x}^{e,d}_k$ and the error covariance matrix $C^{e,d}_k$ are equal to the estimate $\hat{x}^{e,c}_k$ and the error covariance matrix $C^{e,c}_k$ computed by the CKF according to (12) and (13).
Proof. $C^{e,d}_k = C^{e,c}_k$ holds due to
Accordingly, $\hat{x}^{e,d}_k = \hat{x}^{e,c}_k$ holds due to

Appendix B
In the following, relationship (20) is proven.
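Proof. As a symmetric positive definite matrix, $B$ can be written as a product $B = A A^T$, where $A$ is a lower triangular matrix with positive diagonal entries, using the Cholesky decomposition. $A^T$ is then an upper triangular matrix with positive diagonal entries. As triangular matrices with positive diagonal entries, the matrices $A$ and $A^T$ are invertible. We define $b := A^{-1} d^i_k$. We then have
$$d^i_k (d^i_k)^T \le B \;\Leftrightarrow\; A b b^T A^T \le A A^T \;\Leftrightarrow\; I - b b^T \ge 0 .$$
We also have
$$(d^i_k)^T B^{-1} d^i_k = b^T A^T (A A^T)^{-1} A b = b^T b .$$
We still have to show that
$$I - b b^T \ge 0 \;\Leftrightarrow\; b^T b \le 1 .$$
First, we will show "$\Rightarrow$". From $I - b b^T \ge 0$ we have $a^T a - a^T b b^T a = a^T (I - b b^T) a \ge 0 \;\forall a \ne 0$. With $a = b$ we have
$$b^T b - (b^T b)^2 = b^T b \, (1 - b^T b) \ge 0 ,$$
and hence $b^T b \le 1$ (the case $b = 0$ is trivial). We will now show "$\Leftarrow$". From $b^T b \le 1$ and the Cauchy-Schwarz inequality we have $|a^T b| \le \|a\| \|b\| \le \|a\|$ and $|b^T a| \le \|a\| \|b\| \le \|a\| \;\forall a$. We then have $a^T b b^T a \le a^T a \;\forall a$. It follows that $a^T (I - b b^T) a = a^T a - a^T b b^T a \ge 0 \;\forall a \ne 0$ and thus $I - b b^T \ge 0$.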
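The equivalence used in this proof can be checked numerically. The sketch below assumes that the statement in question is that $d d^T \le B$ holds exactly when $d^T B^{-1} d \le 1$; the function names and the example values are hypothetical.

```python
import numpy as np

def within_bound_cholesky(d, B):
    """Scalar test: b^T b <= 1 with b = A^-1 d and B = A A^T (Cholesky)."""
    A = np.linalg.cholesky(B)        # lower triangular factor of B
    b = np.linalg.solve(A, d)        # b = A^-1 d without forming the inverse
    return bool(b @ b <= 1.0)

def within_bound_eig(d, B, tol=1e-12):
    """Matrix test: B - d d^T is positive semidefinite."""
    return bool(np.linalg.eigvalsh(B - np.outer(d, d)).min() >= -tol)

B = np.array([[4.0, 1.0],
              [1.0, 2.0]])           # user-defined symmetric positive definite bound
d_small = np.array([1.0, 0.5])
d_large = np.array([3.0, 0.0])

results = [(within_bound_cholesky(d, B), within_bound_eig(d, B))
           for d in (d_small, d_large)]
```

The scalar form only needs one triangular solve per test, which may be preferable on a sensor node to an eigenvalue computation.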

Appendix C
We will now show that the resulting estimate is consistent, i.e., relationship (22) holds. Due to the orthogonality principle [44] we have We can now write (A1) as

Proof. We havex
To complete the proof we still have to show that