Variational Bayesian Algorithms for Maneuvering Target Tracking with Nonlinear Measurements in Sensor Networks

The variational Bayesian method solves nonlinear estimation problems by iteratively computing the integral of the marginal density. Many researchers have demonstrated the fact its performance depends on the linear approximation in the computation of the variational density in the iteration and the degree of nonlinearity of the underlying scenario. In this paper, two methods for computing the variational density, namely, the natural gradient method and the simultaneous perturbation stochastic method, are used to implement a variational Bayesian Kalman filter for maneuvering target tracking using Doppler measurements. The latter are collected from a set of sensors subject to single-hop network constraints. We propose a distributed fusion variational Bayesian Kalman filter for a networked maneuvering target tracking scenario and both of the evidence lower bound and the posterior Cramér–Rao lower bound of the proposed methods are presented. The simulation results are compared with centralized fusion in terms of posterior Cramér–Rao lower bounds, root-mean-squared errors and the 3σ bound.


Introduction
There has been much interest in recent years in the use of sensor networks as opposed to single sensors in the fields of target tracking [1], intelligent transportation [2], environmental monitoring [3], spacecraft navigation [4], etc. Multi-sensor nodes can provide greater spatial coverage and, by cooperation, effectively complement the limitations of a single sensor, potentially resulting in improving estimation and fusion performance. Along with the rapid development of sensor network technologies, the two aspects of fusion architecture and state estimation optimization have been studied in the past few years [5][6][7].
Sensor network fusion can be classified into three categories, i.e., centralized, decentralized, and distributed architectures [6,8]. Schematically, examples of different fusion architectures are shown in Figure 1. The topology of sensor networks is, typically, described as an undirected graph where nodes can exchange measurements (or estimation) from neighbors via a bidirectional edge or a directed graph where nodes only can deliver measurements (or estimation) in a fixed direction. Broadcast communications can be single-hop or multi-hop [9]. Here, we only consider the sensor network with undirected topology and single-hop communications. The yellow circles and blue circles describe fusion centers and sensor nodes, respectively. In the centralized architecture, information fusion takes place after the local sensor measurements are delivered into the fusion center. This architecture can make full use of measurements if the communication bandwidth is high enough to accommodate transmission of all sensor measurements to the fusion center, leading to theoretically optimal fusion [10]. As a result, it is generally utilized as a benchmark for performance comparison and evaluation of the other fusion architectures that will be discussed here. However, there are several inherent problems that need to be taken into account in considering a centralized fusion architecture. One problem is the measurement delay resulting from communication or sensor sampling rate [11][12][13]. Another is that the centralized architecture will, typically, result in a trade-off between timely fusion accuracy performance, and the requirements of communication bandwidth and computational costs because of the broadcast and processing of measurements from all sensors [6,14]. In addition, this architecture is also generally more sensitive to outliers than other fusion architectures [15], such as the following decentralized one.
In decentralized architecture, sensors are partitioned into several clusters, each of them with a fusion center [16,17] where measurements from neighboring nodes are collected to yield a local fused estimation. The decentralized architecture is distinguished from the centralized one by the multiple fusion centers, which can lead to a spread of computational costs across multiple devices and better robustness [6].
In the distributed fusion architecture, each node individually provides a local estimation by using measurements collected from itself and its neighbors. It could be considered as a special form of the decentralized one. Compared with the other two forms, it has the following advantages: (1) Enhanced scalability and feasibility; it allows relatively easy scaling of the network up or down by adding or subtracting nodes depending on practical applications [18]. (2) Increased robustness and fault tolerance to node failures, especially in harsh situations, such as an underwater environment. (3) Reduced bandwidth requirement; it mitigates communication bottlenecks that might arise, for example, in target detection and tracking systems. (4) Reduced computational cost of the three architectures because some processing such as inverse operations generally needed in estimation are performed in their own individual fusion centers. A significant amount of literature on distributed fusion architectures has been published. Ref. [10] is an early paper describing the basic principles of distributed fusion architecture in target tracking systems. A comprehensive review of the characteristics, advantages and estimation solutions for distributed low-cost sensor networks is presented in [6]. For distributed robust filtering, a variational Bayesian (VB) algorithm with a conjugate-exponential model is proposed in [19]. As mentioned in [20], the problem of distributed detection and tracking over a Doppler-shift sensor network is studied. Consensus methods are used in [21,22] for the problem of uncertain noise statistics of distributed sensor networks. However, distributed fusion architecture also has some negative issues, for example, lack of awareness of the sensor network as a whole.
In the context of fusion architectures, state estimation is also essential for target tracking in sensor networks, especially for a nonlinear system. Exact solutions for the posterior probability density function (PDF) for nonlinear systems are mostly unavailable, resulting in a considerable amount of work on approximations of the posterior. A great number of nonlinear filters, such as the extended Kalman filter (EKF) [23], the unscented Kalman filter (UKF) [24], the Gauss-Hermite Kalman filter (GHKF) [25], the central difference Kalman filter (CDKF) [25], the cubature Kalman filter (CKF) [26] and their variants [27,28], have been proposed under a Gaussian noise assumption both on measurements and system. For non-Gaussian noise models, stochastic estimation methods, such as Markov Chain Monte Carlo [29], and sequential Monte Carlo (also named particle filters (PF)) [30] and variants [31,32] have been the focus of much attention over the last two decades. Since these methods need a large number of particles to ensure nonlinear estimation accuracy, Doucet and his colleagues introduced the Rao-Blackwellized particle filter (RBPF) [33] to marginalize out the linear variables to be solved by an optimal filter, and focused the sampled particles on the remaining variables (nonlinear part). For more information about nonlinear estimation problems, we refer the reader to the review articles [34][35][36] where more comprehensive interpretations have been provided.
In terms of optimization techniques, gradient-based optimization is often used in nonlinear filters to improve the performance of nonlinear estimation. As shown in [37], gradient descent is adopted for smartphone orientation estimation, yielding a quaternionbased Kalman filter algorithm to estimate exercise motion. In [38], a gradient descent iterative nonlinear Kalman is proposed for the problem of random missing outputs. For problems where the objective function, such as the variational evidence lower bound (ELBO) and the Kullback-Leibler Divergence (KLD), need to be maximized or minimized, the gradient method generally requires linearization of the objective function. For nonlinear estimation optimization problems, the gradient can be written as a mean of the samples of interest, yielding a stochastic gradient optimization method [39]. In [40], the authors present a modified particle filter where stochastic gradient is used to minimize the criterion function. Compared with the ordinary gradient, natural gradient (NG) has the advantage of theoretical connections from information geometry. According to Amari's works [41,42], it has a steepest direction in Riemannian space. As shown in [41], NG comes with a theoretical guarantee of asymptotic optimality and can be used to produce a Fisher-efficient iterative estimator on a statistical manifold. In [43], the NG method for the nonlinear estimation problem is proved to be asymptotically optimal in the sense of the Cramér-Rao bound. The earliest interest in NG can be found in [44,45]. Simultaneous perturbation stochastic approximation (SPSA) [46] is an alternative optimization method which can be use for stochastic search. It is a gradient approximation statistic optimization method, which does not require the linearization of the objective function [46]. In this method, a group of samples of objective functions is sampled to obtain a two-side differential function, so that it can be used to approximate the otherwise intractable gradient of the objective function to be estimated. An application of SPSA for detection of the center of a thermal updraft is presented in [47], resulting in an adaptive autonomous soaring algorithm for multiple unmanned aerial vehicles.
In terms of the iterative optimization methods, among the most common approaches are expectation-maximization (EM) [48] and VB [49,50]. EM realizes estimation optimization by establishing a feedback loop, including an expectation step (E-step) and a maximization step (M-step). In the E-step, the conditional expectation of the likelihood function is calculated according to a given prior and measurement. In the M-step, the conditional expectation is maximized to obtain the estimation of the variables. While the state and parameters are estimated and optimized alternately in the iteration cycle, EM still has problems for large scale models and data. VB differs from EM in that the parameters are stochastic [50], resulting in VB being capable of making a joint distribution of state and parameters, and then being especially suitable for dealing with high-dimensional and large-scale problems via mean field theory. Recent advances in the variational iterative framework can be found in [19,51], in which a unified VB approach is provided for the joint estimation of system state and parameters in a target tracking system. This paper considers the problems of estimation optimization and fusion in networked target tracking systems, and aims to derive a nonlinear estimation, optimization and fusion approach by utilizing VB with the optimization methods of NG and SPSA, achieving a high performance in accuracy. The key contributions of this paper are as follows: 1.
Development of a distributed variational fusion framework by utilizing variational mean field theory to approximately partition the joint posterior distribution into several solvable variational distributions under the assumption of independent measurements. 2.
Presentation of a novel deterministic nonlinear estimation optimization method to maximize the distributed ELBO using NG, based on linearization to approximate the posterior distributions closely, producing the DVBKF-NG algorithm which yields closed form nonlinear state estimation with the associated covariance for the sensor network.

3.
Presentation of a novel stochastic estimation optimization for the VB framework by using SPSA and deriving the stochastic gradient estimation of ELBO, thus producing an iterative filter, i.e., DVBKF-SPSA.

4.
Demonstrate performance metrics: distributed ELBO and the posterior Cramér-Rao lower bound (PCRLB) for these algorithms over different iterations.
The rest of the paper is organized as follows. In Section 2, we introduce the general nonlinear estimation problem in target tracking over a sensor network. A distributed variational Bayesian estimation optimization framework is proposed in Section 3, involving partitioning of the joint measurement likelihood. Under the optimization framework, two distributed iterative nonlinear variational Bayesian Kalman filtering algorithms (DVBKF-NG and DVBKF-SPSA) are presented in Section 4, using NG and SPSA, respectively. In Section 5, we consider two kinds of metrics, i.e., ELBO and PCRLB, to evaluate the performance of the proposed methods. In Section 6, we give an example to verify the proposed methods for maneuvering target tracking over a sensor network. Finally, conclusions are drawn and future work is discussed in Section 7.

Problem Formulation
Consider a maneuvering target travels in the area covered by a fully connected sensor network with N nodes; the measurements Z k = [Z 1,k , · · · , Z n,k , · · · , Z N,k ] T with noise covariance R k at time index k are collected from the set of sensors, where Z n,k = z n,k , z 1 n,k , · · · , z l n,k , · · · , z L n,k T denotes the measurements from sensor n and its neighbors, where z n,k ∈ R d is the measurement of the nth sensor and z l n,k is the measurement from the lth neighbor of the nth sensor, the sensor index n ∈ {1, 2, · · · , N}, l ∈ {1, 2, · · · , L} and satisfies L ≤ N − 1, d denotes the dimension of z n,k . We assume that only the single-hop communication between two sensors is considered in this paper. The system model and the measurement model are given as follows, respectively, where x k ∈ R m is the system state which may include position, velocity and other related quantities, and m is the dimension of x k . Both the system process ω k and the measurement noise υ n,k of sensor n are assumed to be mutually independent and zero-mean Gaussian noises with covariances of Q k−1 and R n,k , respectively. And f k|k−1 (·) and h n,k (·) denote the state transfer function and measurement function, respectively. What we want is to estimate the system state based on the collection of sensor measurements, which is the process of inferring the system state of interest using measurements Z k , usually, under the Minimum Mean Square Error (MMSE) criterion, i.e., where x k|k and P k|k are the state estimation and the associated error covariance, and where p(x k |Z k−1 ) is the a priori distribution and p(Z k |x k ) is the likelihood of measurement Z k with given x k . Under Gaussian assumptions, the predicted density p(x k |Z k−1 ) is given by For nonlinear systems, computing the integral p(x k , z k )dx k in the denominator of (5) is in general difficult and the optimal approximation is needed. In this paper, we are interested to use the variational Bayesian method to approximate the a posteriori distribution and thus solve the nonlinear estimation problem of maneuvering target tracking in the described sensor network.

Distributed Variational Bayesian Estimation Optimization Framework
Given measurement Z k collected from sensors, the principle of VB is given as and where q(x k |ψ k ) is a variational distribution with parameter ψ k , L(ψ k ), and D KL [q(x k |ψ k )||p(x k |Z k )] are the ELBO and the KLD between q(x k |ψ k ) and p(x k |Z k ), respectively. What we wish to do is to approximate the p(x k |Z k ) as closely as possible by minimizing the KLD. Actually, the approximation can be considered as a moving variational distribution along a chosen search direction iteratively to the position of the posterior distribution in a statistical manifold. However, it is difficult to seek out an appropriate method to achieve the approximation, because the variational distribution is usually underparameterized and not sufficiently flexible to capture the true posterior [52]. As a result, the ELBO is considered to be maximized since maximizing ELBO is equivalent to minimizing KLD from (7), i.e., We assume that measurements from all sensors are mutually independent; the term E q [log p(Z k |x k )] in (10) can be partitioned in distributed fusion architecture as follows where the measurements Z n,k with associated noise variance R n,k are collected from sensor n and its neighbors, where where z n,k denotes the measurement of sensor n, and z l n,k denotes the measurement from the lth neighbor of sensor n, V n, is the measurement noise of sensor n and its neighbors, and R n,k and H n,k (x k ) are the associated noise covariance and measurement matrix, respectively, where R l n,k = Cov v l n,k , (v l n,k ) T denotes the variance of the noise v l n,k from the lth neighbor of sensor n. The math symbol blkdiag(·) denotes a block diagonal matrix created by aligning input matrices.
For the (i + 1)th variational iteration, we can take the ith variational distribution q(x k |ψ i k ) as the prior; the optimized ELBO can be given as We note the definition ψ k (x k|k , P k|k ) in this paper, and wish to update state estimation x k|k and the associated error covariance P k|k in each iteration by maximizing the above ELBO. In the following section, we present two distributed iterative variational Bayesian Kalman filters for maneuvering target tracking by using NG and SPSA, respectively.

Distributed Iterative Variational Bayesian Kalman Filters over Sensor Network
In this section, with the assumption that the measurements are mutually independent, we present alternative distribution variational Bayesian Kalman filtering algorithms (DVBKF) via NG and SPSA.

NG-Based DVBKF
NG calculated by using information geometry (generally KLD linearization) has a steepest direction in Riemannian space. According to Amari's works [41,42], we present the NG of objective function L(ψ k ) with respect to parameter ψ k as follows. Set ∆ψ k ψ k − ψ i k → 0, and (13) can be rewritten as where F ψ i k is Fisher information and presented as The proof can be seen in [53]. After computing the partial derivative of the right side of (14) and setting it equal to 0, the NG of the ELBO at the ith iteration in distributed architecture is expressed as The NG ∇ ψ i k is the direction in which the increase of ELBO is greatest [54]; that means it has the greatest descent or ascent at each iteration in statistical manifold space, and can move the variational distribution to approximate the posterior fastest. At this point, the optimal ψ k is presented as It is observed that (17) is a general expression of the iterative update of parameter ψ k . In terms of the update of x i+1 k|k and P i+1 k|k , the special forms of Fisher information matrices (F −1 ) with respect to x k|k and P k|k need to be analyzed. With the Gaussian system assumption, the KLD between two Gaussian distributions q 1 ∼ N(ξ 1 ; µ 1 , C 1 ) and q 2 ∼ N(ξ 2 ; µ 2 , C 2 ) with the same dimension d is given as Combining with (15), the Fisher information matrices with respect to x i k|k and P i k|k are presented as The gradient of log-likelihood expectation with respect to x i k|k and P i k|k of sensor n are presented as denotes the Jacobian matrix of the measurement matrix of the nth sensor.
Recalling the iterative optimization forms in (16) and (17), the distributed iterative state estimation x i+1 k|k and the associated covariance P i+1 k|k in DVBKF-NG are, respectively, given by The iterative optimization process of DVBKF-NG is summarized in Algorithm 1. We make the following remarks: 1.
The update of x i k|k is preconditioned by P i k|k , which produces an adaptive movement to the posterior PDF along the direction of NG.

2.
Clearly, Equation (24) shows that P i+1 n,k|k ≤ P i n,k|k , which means that estimation error covariance decreases gradually at each iteration. Additionally, since H T n,x i k|k R −1 n,k H n,x i k|k is the expectation of the Hessian, P i n,k|k has a quadratic convergence.

3.
To make the algorithm adaptive, the relative estimation error e r can be used for judging the iteration termination.
where is a small positive number which can be chosen according to practical scenarios.

SPSA-Based DVBKF
SPSA is a statistical optimization method for gradient approximation, which does not require the full knowledge of the objective function being minimized (or maximized) and parameters being optimized [46]. In this method, a group of samples of objective function L(ψ k ) is sampled as Y (ψ k ) = L(ψ k ) + ζ to obtain a two-side differential function, where ζ is a random infinitesimal perturbation. Then, the two-side infinitesimal function is computed by where ∆ ψ i k is random perturbation vector with a Gaussian distribution form and c ψ i k is a small positive number that decreases with i. As a result, the estimation of the gradient of objective function L(ψ k ) can be obtained by the two-side differential. The mth component of the gradient estimator at the ith iteration is given as follows, where m ∈ {1, 2, , · · · , M}, M is the dimension of ψ k . The gradient estimation of L(ψ k ) at the ith iteration is presented aŝ where At this point, the ψ i+1 k can be updated by where a ψ i k is a weighted factor.
Taking the distributed variational ELBO L(ψ k ) in (13) as the objective function, the two-side infinitesimal perturbations of L(ψ k ) with respect to x i k|k and P i k|k are given, respectively, as Randomly choose a sample x s k ; the random perturbation of state ∆ x i k|k is given by The associated random perturbation of covariance is given as Recall (26), the two-side differential dY ψ i k with respect to x i k|k and P i k|k is written as where ζ x and ζ P are random infinitesimal perturbations. It follows that the estimation of the gradient of L(ψ k ) with respect to x i k|k and P i k|k at the ith iteration is given byĜ where

M,M
and (∆ P i k|k ) m,m denotes the element in the mth row and mth column of ∆ P i k|k . Therefore, state estimation and the associated covariance are updated by where the parameters of a x i k|k and a P i k|k are weighted factors. Set the same iteration termination as (25); the DVBKF-SPSA algorithm is summarized in Algorithm 2.

8:
Compute the two-side infinitesimal perturbations of L(ψ k ) with respect to x i k|k and P i k|k by (30) and (31), respectively.

Performance Evaluation
It is often of interest to know how closely the posterior distribution is approximated and how accurately a variable can be estimated. In this section, we present two metrics for the proposed algorithm performance evaluation: One is variational ELBO which is monotonically increasing in iteration index to measure the convergence of variational iteration. The other is PCRLB which provides a lower bound on the mean square error of system state estimation [27]. In this section, we present the general forms of ELBO and PCRLB both of DVBKF-NG and DVBKF-SPSA.

Performance in ELBO
From (10) and (11), the iterative ELBO of distributed architecture can be rewritten as After computing the log-likelihood expectations in (40) under the Gaussian assumption, we present the iterative ELBO of DVBKF (see the derivation in Appendix A) over the sensor network, as follows in which D n,z and D x are the dimensions of Z n,k and x k , and H n,

Performance in PCRLB
The PCRLB provides a theoretical lower bound for the estimation problem under a distributed Bayesian framework. It has the following defined form [55] where J k+1 is the posterior Fisher information matrix, recursively computed by The terms in (43) can be expressed by From (44), we can know that the terms of D 11 k , D 12 k and D 21 k are related only to the system model and irrelated to the fusion architecture of the sensor network. With the assumption of independent measurements, log p(Z k+1 |x k+1 ) = ∑ N n=1 log p(Z n,k+1 |x k+1 ).
After computing the gradients in (44) and (45) by linearizing h k (x k ) and f k|k−1 x k−1|k−1 , the PCRLB (see the derivation in Appendix B) of DVBKF has the following form

1.
From (40) and (41), it is observed that the ELBO values both of DVBKF-NG and DVBKF-SPSA are related to measurements, prior, variational distribution and the number of iterations. Generally, it is assumed that the prior of a given objective system is known. Therefore, choosing an appropriate form of parameterized variational distribution, increasing the number of iterations and providing more measurement are of the essence to maximize the ELBO and lead a close approximation of the posterior. However, the balance between computation cost and accuracy should be considered according to practical applications.

2.
The PCRLB is an important metric to evaluate the accuracy of estimation algorithms. On the one hand, it is determined by models of dynamic systems and measurement systems. On the other hand, similarly to the ELBO, the PCRLB values both of DVBKF-NG and DVBKF-SPSA are still related to the number of iterations and the amount of measurement. 3.
The two metrics are inextricably linked with each other: The former is the means and the latter is the goal. ELBO maximization means approximating the posterior distribution closely by an iterative variational distribution, which can lead the PCRLB to a lower trend.

Numerical Simulation
In this section, we present a scenario of 2-D maneuvering target tracking over a sensor network with Doppler-only measurement to illustrate the performance of DVBKF-NG and DVBKF-SPSA. We also present the NG-based and SPSA-based optimizations for centralized fusion architecture, named CVBKF-NG and CVBKF-SPSA, which can be utilized, respectively, as benchmarks corresponding to DVBKF-NG and DVBKF-SPSA for performance comparison.
From the system observability theory, we can know that target state with Doppler-only measurement only can be observable after collecting measurements from at least three Doppler sensors with different fixed locations. Thus, the measurements from neighbors along with the measurements of local sensors are used to estimate variables in distribution fusion architecture.

Performance Metrics
The estimation performance of the proposed algorithms is measured by root-meansquared error (RMSE), 3σ rule and the mean running overhead with 1000 Monte Carlo simulations. The RMSE in range R RMSE k , RMSE in radical velocity V RMSE k , 3σ in range R 3σ k , 3σ in radical velocity V 3σ k and mean running time T mean are given, respectively, as follows T denote the true position vector and velocity vector of the target, where x p k|k and x v k|k denote the associated estimation, respectively, · denotes the Euclidean norm and t m,k is the running time at the kth estimation in the mth Monte Carlo simulation. The values of the parameters in the proposed algorithms are given in Table 1.

. Simulation Setup
In this scenario, we consider a sensor network which consists of 20 Doppler-only sensors located in a square of 120 m × 120 m randomly, as shown in Figure 2. The communication capability of each sensor node is 50 m. The target is maneuvering with dynamic multiple models F j k , j ∈ {1, 2} given as follows. 14), [18,30), [34,49), [53,71), [75, 90)}; j = 2, k ∈ { [14,18), [30,34), [49,53), [71, 75)}. (48) where the turn rates are θ 1 = −9.8N 1 /v k and θ 2 = −θ 1 , N 1 = 0.2 is the overloads of target maneuvering and the scan period is T = 0.2 s. In this manuscript, the target state is represented as √ 0.03 m/s. System noise covariance is given as Q k = diag q 2 1 q 2 1 q 2 2 q 2 2 , where q 1 = 0.01 m and q 2 = 0.01 m/s. Sensor measurements are described by a nonlinear equation of Doppler shift between the target and each of the sensors, and measurement noise covariances are time-varying because of target motion. The measurement function and measurement noise covariances are given by [56,57] h (j) Equations (49) and (50) represent the measurement model of the sensors, where (49) is the mapping from state to measurement and represents the Doppler shift between the target and each of the sensors. Since the radial velocity of the target is related to Doppler shift, we update it by using the velocity in the coordinates. Equation (50) represents the associated time-varying measurement variance because of target motion, where T d = 1 µs is the pulse Doppler waveform width, R SNR is the signal-to-noise ratio (SNR) and R SNR = P e P n . For a given transmitted waveform with unit energy, the energy of the received signal P e = P t GA e δ (4π) 2 r 4 , where r is radar radius, G and δ are the radar antenna and the cross-section area, respectively, and P t and A e are the echo power and the effective receiving area of radar antenna, respectively. The relationship between measurement noise covariance and the range from radar to target is presented in Appendix C for a Gaussian noise P n = kT s 2 , where T s = 290 K and k = 1.3806 × 10 −23 J/K are the temperature in degrees Kelvin and the Boltzmann constant, respectively. Figure 3a shows the number of neighbors of each sensor node. It is clearly the 4th sensor has the most neighbors, numbering 12, then the 1st, 6th, 12th and 14th sensors, and the 17th sensor has the least number of neighbors. Besides the proposed iterative algorithms, we use the interacting multiple model (IMM) method to achieve the model transfer in simulation. In distributed architecture, the sensor with best estimation accuracy is chosen for output. From Figure 3b, we can observe that sensor 4 has the largest number of outputs, then sensors 6, 14, 12 and 1. To a certain degree, this reflects that multi-sensor measurements have the advantage of improving accuracy.   Figure 4a,b shows the PCRLB and RMSE curves in range and radical velocity, respectively. From the view of fusion architecture, centralized fusion architecture has lower RMSE curves and PCRLB curves than the distributed one. Namely, CEKF, CVBKF-NG and CVBKF-SPSA are, respectively, better than the associated distributed DEKF, DVBKF-NG and DVBKF-SPSA, since the state estimation is updated by using the measurements from the sensor and its neighbor in distributed architecture. In contrast, the measurements collected from all sensors are used to update state estimation in centralized fusion architecture.

Simulation Results and Analysis
From the view of optimization methods, we can observe that the RMSE curves of DVBKF-NG and DVBKF-SPSA are better than DEKF which does not use any optimization methods. Besides iterative linearization, random perturbation sampling which can capture more information from nonlinear measurement is used in DVBKF-SPSA. As a result, the measurement is utilized effectively to improve the accuracy performance. As shown in Figure 4, the RMSE curves obtained by using SPSA are lower than those obtained by NG optimization both in range and radical velocity. However, from Figure 4b, it is also found that the proposed algorithms and comparison algorithms in distributed architecture are more sensitive than those in the centralized one when the dynamic model transfers. The comparisons of the PCRLB and RMSE of estimated position and velocity in coordinates are given in Figures 5 and 6, respectively.  More quantitative RMSE comparison in centralized architecture and distributed architecture can be seen in Tables 2 and 3, from which it can be seen that the RMSE curves of the proposed algorithms DVBKF-NG and DVBKF-SPSA are close to those in the centralized architecture. It is also clear that the PCRLB in distributed architecture is slightly bigger than that in centralized architecture, because only a part of the sensors are used in distributed architecture. However the computational cost in distributed architecture is much smaller than that in the centralized one. Tables 4 and 5 give the computational cost comparison of the algorithms in centralized architecture and distributed architecture mentioned above. The 3σ bound is another evaluation of target tracking accuracy. A solid line indicates the estimation error of associated algorithm. A dashed line indicates the 3σ error of the algorithm which presented as a solid line with the same color. Figure 7 presents the comparison of the 3σ bound. It is seen that the algorithms with NG or SPSA have smaller 3σ bound in range than CEKF and DEKF. The radical velocities 3σ of the proposed algorithms with NG and SPSA are slightly bigger than that of CEKF at some scans in Figure 7b. But we can observe that the radical velocities 3σ of the proposed algorithms are robust for the maneuvering target tracking both in centralized fusion architecture and distributed fusion architecture. The comparisons of 3σ error of estimated position and velocity in coordinates are given in Figures 8 and 9, respectively. A solid line indicates the estimation error. A dashed line indicates the 3σ error of the algorithm which presented as a solid line with the same color. The quantitative 3σ comparison is given in Tables 6 and 7.
As mentioned above, the minimization of KLD between variational distribution and posterior distribution is equivalent to maximization variational ELBO. To observe the changes in ELBO and KLD with iteration clearly, we present Figures 10 and 11 to illustrate the normalized ELBO and KLD of the proposed DVBKF-NG and DVBKF-SPSA, respectively. In Figure 10, each line denotes the ELBO in one scan with 200 iterations, and each line in Figure 11 denotes KLD in one scan with 200 iterations. The results in Figures 10 and 11 verify our standpoints by the following fact: the ELBO curves increase with the number of iterations, corresponding to a decrease in the KLD.

Discussion
In this paper, we address the problem of improving the accuracy for maneuvering target tracking in sensor networks. Two kinds of optimization methods, NG and SPSA, are introduced to maximize the distributed ELBO where the joint likelihood is partitioned approximately into several simple marginal likelihoods by variational mean field, formulating the algorithms of DVBKF-NG and DVBKF-SPSA. Moreover, the performance metrics, both of ELBO and PCRLB, are presented over different iterative indexes. In addition, a maneuvering target tracking scenario over a sensor network is given to verify the performance of the proposed algorithms. From the view of fusion architecture, centralized fusion architecture has lower RMSE curves and PCRLB curves than the distributed one. From the view of optimization methods, the simulation results show that the RMSE curves of the proposed algorithms are better than those which do not use any optimization methods.
For future work, we plan to adopt VB for the robust estimation of sensor networks. For example, outliers always lead to heavy-tailed and asymmetric distributions which are apt to lead to large estimation errors. It is expected for novel methods to mitigate the adverse influence and VB in which an unsolvable distribution can be approximated by a parameterized distribution is an appropriate method at this point. We also plan to develop VB for multiple passive sensor placement. For example, in a bearings-only target tracking system, tracking accuracy is highly dependent on the locations of the bearings-only sensors. Therefore, it is desirable to schedule the sensors' moving trajectories in a way to achieve a minimized tracking error at a future time.  Institutional Review Board Statement: Not Applicable.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Derivation of ELBO of DVBKF
With Gaussian assumption, the log-likelihood expectations in (40) are computed as (A1), in which H n, The expectation of log-a priori distribution and the expectation of log-variational distribution are given as (A2) and (A3), respectively. As a result, the distributed ELBO L ψ i k in (40) is calculated as (A4).  (A4)

Appendix B. Derivation of the PCRLB of DVBKF
After the linearization of h k (x k ) and f k x k−1|k−1 , the above equations can be written as According to the matrix inversion lemma, the Fisher information in Equation (43) can be rewritten as Therefore it yields the distributed PRLB, (A7)

Appendix C. The Relationship between Measurement Noise Covariance and the Range from Radar to Target
The standard deviation of the time-varying measurement noise of radar is σ ≥ √ 3∆ f π √ R SNR , which is given as Equation (18.46) in reference [56]; the ∆ f is the frequency domain resolution and ∆ f = 1 T d . The time-varying measurement noise can be rewritten as follows Assume a radar transmit power is P t ; the power density is written as S 1 = P t 4πr 4 . Define the radar antenna gain G; the power density can be rewritten as Set the cross-section area as δ; the received power density S r is calculated by Assume the effective receiving area of radar antenna A e ; the echo power is given as P e = A e S r = P t GA e δ (4π) 2 r 4 ∝ 1 r 4 .
since signal-to-noise ratio is defined as R SNR = 10 log 10 P e P n , where P n is noise power. For Gaussian noise P n = kT s 2 , where k and T s are Boltzman constant and Kelvin temperature, respectively, it yields R SNR , R SNR = 10 log 10 P t GA e δ 8π 2 kT s r 4 . (A12)