An Iterative Nonlinear Filter Using Variational Bayesian Optimization

We propose an iterative nonlinear estimator based on the technique of variational Bayesian optimization. The posterior distribution of the underlying system state is approximated by a solvable variational distribution approached iteratively using evidence lower bound optimization subject to a minimal weighted Kullback-Leibler divergence, where a penalty factor is considered to adjust the step size of the iteration. Based on linearization, the iterative nonlinear filter is derived in a closed-form. The performance of the proposed algorithm is compared with several nonlinear filters in the literature using simulated target tracking examples.


Introduction
Bayesian estimation is widely applied across many areas of engineering such as target tracking, aerial surveillance, intelligent vehicles, and machine learning [1]. In linear Gaussian systems, the state estimation can be optimally achieved using a Kalman filter as a closed form solution. However, many real-world estimation problems are nonlinear, resulting in analytically intractable posterior probability density function (PDF) for the state. In consequence, suboptimal approximation methods are sought to solve the nonlinear estimation problems [2].
Many suboptimal techniques have been developed to solve nonlinear estimation problems. These techniques may be divided into the following three categories. The first category, includes the extended Kalman filter (EKF) [3], the iterated extended Kalman filter (IEKF) [4,5], and their variants [6,7], solves the state estimation problem through replacing the nonlinear functions by their linear approximations via the Taylor expansion. The second category involves stochastic sampling methods. In the filtering process, a set of randomly sampled points with weights are adopted to approximate the PDF of the underlying state. For example, the particle filter (PF) [8][9][10] is a sequential Monte Carlo (SMC) stochastic sampling method, which approximates the PDF by the sampled particles from a proposal distribution. PF can be applied to nonlinear non-Gaussian systems. Markov Chain Monte Carlo (MCMC) is another popular stochastic method since it is able to achieve arbitrarily high accuracy using a large number of particles, sometimes resulting in prohibitive computational expenditure. The techniques in the third category use deterministic sampling methods. Nonlinear state PDFs are approximated by a set of fixed points and weights that represent the location and spread of the distribution. This category includes the unscented Kalman filter (UKF) [9,11,12], the cubature Kalman filter (CKF) [13,14], and the central difference Kalman filter (CDKF) [15]. The UKF and the

Problem Formulation
Consider a general dynamic system with measurement as follows.
where f k (·) denotes the state transition function, and h k (·) denotes the mapping from the system state to the measurement; ω k and v k are the process noise and the measurement noise, respectively. We assume that ω k and v k are Gaussian and mutually independent, ω k ∼ N (0, Q k ) and v k ∼ N (0, R k ).
Assuming that the posterior PDF p (x k−1 |z k−1 ) at time k − 1 is available, the PDF of the predicted state is obtained by the Chapman-Kolmogorov equation.
p (x k |z k−1 ) = p (x k |x k−1 ) p (x k−1 |z k−1 ) dx k−1 . (3) Then, at time k, the posterior PDF is obtained using the measurement z k by an application of the Bayes rule: For linear Gaussian systems, it is well known that the optimal state estimation x k|k and the corresponding error covariance P k|k under the criterion of minimum variance estimation are obtained by: For nonlinear systems, the integral in Equation (4) is often intractable. Suboptimal approximations for the posterior PDF are needed. Most existing suboptimal algorithms adopt linearization or sampling techniques to approximate the posterior PDF p (x k |z k ). We consider an iterative VB approach, in which the true PDF is approximated by a variational distribution and is approached by iterative optimization of the ELBO. The proposed algorithm converts the nontrivial integration to a closed-form optimization and therefore improves estimation accuracy.

Evidence Lower Bound Maximization
The above nonlinear estimation problem can be solved using a VB framework. Express the marginal PDF p (z k ) using a variational distribution q (x k |ψ k ) as follows: where D KL [p(x k |z k ) q(x k |ψ k )] is the KL divergence between the true posterior PDF p(x k |z k ) and the variational distribution q(x k |ψ k ); that is, L (ψ k ) is the variational ELBO: The variational distribution q (x k |ψ k ) is assumed Gaussian with unknown parameter ψ k = x k|k , P k|k (to be estimated), where x k|k is the mean and P k|k is the covariance.
Please note that the poterior PDF p(x k |z k ) needs to be closely approximated by a known distribution in nonlinear filtering. From Equation (7), evidently, the variational distribution q(x k |φ k ) would be equal to the true posterior PDF p(x k |z k ) if the KL divergence were zero. Minimizing the KL divergence, and thereby approximating the posterior PDF by a variational distribution, is equivalent to maximizing the ELBO, i.e., According to Equations (9) and (10), the nontrivial integration of Equation (7) is converted to the problem of maximizing ELBO, which can be solved by the VB method.

Proximal Iterative Nonlinear Filter
In this section, we derive an iterative procedure in a closed-form to iteratively maximize the ELBO so as to minimize the KL divergence between the true posterior PDF p(x k |z k ) and the variational distribution q(x k |ψ k ).

Penalty Function Based on KL Divergence
Notice that the KL divergence D KL q (x k |ψ k ) q x k |ψ i k is nonnegative for all q (x k |ψ k ). Following [24], we adopt the proximal point algorithm to generate a sequence ψ i+1 k via the following iterative scheme, where i denotes the iteration index and the penalty factor β i is used to adjust the optimization step length. Roughly speaking, the ELBO is maximized when the KL divergence between the two variational distributions q (x k |ψ k ) and q x k |ψ i k approaches zero. Equation (11) can be rewritten as Please note that one iteration of this proximal method is equivalent to moving a step in the direction of the natural gradient [18]. The influence of the KL divergence on ψ i+1 k can be adjusted by β i . The larger the β i , the weaker the influence of the KL divergence on ψ i+1 k , and vice versa. In [18], it is assumed that β i = 1.

The Proximal Iterative Nonlinear Filter
The proximal iterative method is implemented via an iterative minimization of the KL divergence, where the initial state is assigned with an estimation from a core-filter, e.g., Bayesian filter. Here, EKF is adopted as the core-filter to predict and update the system state before the iterative optimization process. We propose a proximal iterative nonlinear filter combined with VB, called PEKF-VB, which is described and derived in the following.
Firstly, by substituting Equation (10) into Equation (12), we can rewrite the iterative optimization as Under the Gaussian assumptions for the process noise and the measurement noise, the variational distribution is of the form q (x k |ψ k ) ∼ N (x k ; x k|k , P k|k ). Given the prior of the state x k−1|k−1 at time For the first term in Equation (14), the expectation E q [log p (z k |x k )] related to x k and P k can be approximated linearly using the gradients with respect to (w. r. t.) x k and P k . Defining g x k|k , P k|k E q [log p (z k |x k )], the gradients of g w. r. t. x k|k and P k|k are The expectation E q [log p (z k |x k )] at time k is then maximized by gradient ascent in the variables x k|k and P k|k ; that is, where α i k and γ i k at iteration i are given by and H i k is the Jacobian matrix: In other words, the coefficients α i k and γ i k are updated by H i k with x i k|k in each iterative step. The detailed calculations of α i k and γ i k are given in Appendix A.
where operators tr(·) and |·| denote the trace and the determinant of a matrix, respectively. By Equations (17)-(21), Equation (13) can be rewritten as, Then ψ i+1 k is maximized at a point (x, P) which can be explicitly calculated. To find them, set the partial derivatives of ψ i+1 Accordingly, x i+1 k|k and p i+1 k|k are seen to be (24) and (25) show that the state estimate and the associated covariance in the iteration are updated by α i k and γ i k , respectively. As shown in Figure 1, the complete iteration procedure consists of Equations (18), (19), (24) and (25).
Update state and associated covariance by Eqs. (5) and (6) Initialize iteration and Calculate Jaccobian matrix by Eq. (20) Calculate parameters and by Eqs. (18) and (19) Update iterative estimation and associated covariance by Eqs. (26) and (27) Termination Figure 1. The flow diagram of the proposed PEKF-VB algorithm.
We note that, in principle, the computational cost in Equations (24) and (25) can be slightly reduced by using the Matrix Inversion Lemma [27]. As a result, Equations (24) and (25) are derived as Equations (26) and (27), respectively.
where (24) and (25) can be rewritten as where The flow diagram of the proposed PEKF-VB algorithm is shown in Figure 1 and the detailed implementation of PEKF-VB is given in Algorithm 1.

Algorithm 1
The implementation of the PEKF-VB algorithm 1: Initialization (k = 0): state estimation x 0 and associated error covariance P 0 , the number of iterations. 2: Compute the predicted state x k|k−1 and the associated error covariance P k|k−1 where H k = ∂h k (x) ∂x | x=x k|k−1 . 4: Update the state estimation x k|k and the associated error covariance P k|k 5: Let x 1 k|k = x * k|k , P 1 k|k = P * k|k , and i = 1. 6: while not converge do 7: Compute parameters α i+1 k and γ i+1 k by Equations (18) and (19). 8: Compute iterated state estimation x i+1 k|k and its error covariance P i+1 k|k by Equations (26) and (27). 9: Let i = i + 1. 10: end while 11: Let k = k + 1, go back to Step 2.

1.
The VB method approximates the true posterior PDF by choosing from a parameterized variational distribution. In each iteration of the PEKF-VB, the ELBO (9) increases. It follows that the ELBO is a proper criterion for measuring the performance of variational optimization. The ELBO of the proposed nonlinear filter is where D x and D z denote the dimension of the state and the dimension of the measurement, respectively. The derivation of the ELBO is given in Appendix B.

2.
Apart from the KL divergence, we can use Calvo and Oller's distance (COD) as the penalty function in Equation (13); the corresponding filter is denoted by CODEKF. The COD of two where n is the dimension of T which is diagonal, and ∆µ = µ 2 − µ 1 , TT T = I. We replace the KLD in Equation (13) with Equation (31) as follows. 3.
Since both PEKF-VB and CODEKF involve iterations within the VB framework to minimize the divergence between the posterior PDF and variational distribution, their complexity is increased by the calculation of the Jacobian in each iteration.

4.
In PEKF-VB, we use KL divergence to measure the similarity between two distributions. Under Gaussian assumptions for the distributions, a closed-form solution of the variational distribution has been derived. However, the VB framework with the KL divergence can also apply to non-Gaussian distributions. If no closed-form exists, a Monte Carlo method can be used to approximate the divergence. Other measures of dissimilarity between probability distributions, such as the alpha-divergence, the Rényi-divergence and the alpha-beta divergence, can also be used in the VB framework. See [29] and references therein. Unfortunately, in general, no computationally tractable form of the variational distribution can be derived and a Monte Carlo method has to be employed.

Numerical Simulations
In this section, we present two nonlinear estimation examples of 2D target tracking and a benchmark nonlinear filtering problem to illustrate the performance of PEKF-VB and CODEKF. We compare them with EKF and UKF. The performance is measured by root-mean-squared error (RMSE) of the estimates and the computational overhead.
Example 1: Range-bearing tracking. In this scenario, the underlying target motion is described by a constant turn (CT) model, with the state vector consists of 2D position and velocity components. As shown in Figure 2a, the target moves with initial state x 0 = (565.7, 29.99, 1166, −0.62) T along a circular trajectory, and is observed by a range-bearing sensor. The state transition matrix F k in Equation (1) and the measurement function h k (x k ) in Equation (2) are: where the turn rateθ = −0.0333 rad/s, the covariances of the zero mean Gaussian white noises ω k and υ k are Q k = 1.5 GqG T and R k = diag r 2 , φ 2 , respectively, where q = I 2×2 , and G = T 2 /2 T 0 0 0 0 T 2 /2 T T , r = 35, and φ = 0.5π/180. At each run, the track is initialised using the two-point method [1] with initial error covariance P 0 = diag([600, 100, 600, 100]). We let β i = 1, and T = 1 s. The target moves for 80 scans (periods). 1000 Monte Carlo runs are carried out. The RMSE plots of EKF, UKF, CODEKF and PEKF-VB are showed in Figure 2b. The black, red, green and blue curves are obtained by the PEKF-VB CODEKF, UKF and EKF, respectively. It is seen that in terms of RMSE performance, PEKF-VB is slightly better than CODEF; both PEKF-VB and CODEF are better than EKF and UKF. Table 1 provides the quantitative comparison of RMSE and execution time of EKE, UKF and CODEKF, PEKF-VB. Figure 3b gives the relationship between iteration index and the running time of PEKF-VB.   In Figure 3a we compare the RMSE performance of PEKF-VB by varying the numbers of iterations. Notice that the RMSE decreases with the increasing of the number of iterations. Figure 3b gives the computational overhead of PEKF-VB w.r.t the number of iterations. Both results agree with our intuition.
To illustrate the convergence of PEKF-VB, we present the ELBO for different numbers of iterations in Figure 4. Figure 4a illustrates the ELBO at the second scan, and Figure 4b shows the ELBO for all scans. Clearly, the ELBO increases with the number of iterations increases, showing that the iteration procedure in PEKF-VB converges.  Example 2: Bearing-only tracking. In this scenario, a single target tracking using measurements from a single bearing-only sensor is considered. While the sensor (ownership) measurement model is nonlinear to the target state, the sensor has to maneuver relative to the target in order to observe it [30,31]. Let x k = [x k , y k ,ẋ k ,ẏ k ] be the state of the target at time k, where (x k , y k ) and (ẋ k ,ẏ k ) are the position and velocity, respectively. The target, with an initial range of 5 km (relative to the sensor) and initial course of 220 • in a clockwise direction (Set the positive axis of y is 0 • ), is modeled by a constant velocity model in Equation (35).
where T = 1min. The speed of the target is 0.1235 km/min. The sensor starts moving with a fixed speed of 0.1543 km/min and an initial course of 140 • in a clockwise direction (Set the positive axis of y is 0 • ). Please note that for the bearing-only tracking problem, to be able to estimate the range of the target, the sensor has to maneuver. Here we assume the sensor maneuvers from scan 14 to scan 17, and then moves with constant velocity from scan 18 to scan 40. The motion model of the sensor is given by Equation (36), where the turn rateθ = 30 • /min. Both the target and the sensor move for 40 scans and their trajectories are shown in Figure 5a.
The measurement function h k (x k ) of the bearing-only sensor is, where x k and y k are the position of the target in Cartesian coordinate system, x s and y s are the position of the sensor. The standard deviation of measurement noise is 1 • . The initial position of the target is randomly sampled at range r = 13 km with covariance P 0 where σ m = π/180 rad, ∆r = 2 km and ∆v = 61.7 m/min. We let β i = 1. Figure 5b shows the estimated target trajectories obtained by the EKF, UKF, CODEKF and PEKF-VB for a single run.
x (km) Based on 1000 Monte Carlo simulations, the RMSE comparison of EKF, UKF, CODEKF and PEKF-VB is illustrated in Figure 6, where the number of iterations for PEKF-VB is 5.

•
As we expected, with a fixed sensor trajectory, PEKF-VB has the best target observability from sensor measurements and leads to a better RMSE performance than EKF and UKF. Under the VB framework, the variational distribution approaches the real posterior PDF through the iteration of the proximal filter.

•
The RMSE performance of CODEKF is also better than those of EKF and UKF because, for CODEKF, the Jacobian matrix of Equation (37) in each iteration is updated to minimize the COD. However, the RMSE performance of CODEKF is slightly worse than that of PEKF-VB.

•
In the first few scans, the performance of the four filters are comparable. This is because, in this bearing-only tracking problem, the accumulative measurements in these scans do not provide enough information to the four filters. The performance of CODEKF and of PEKF-VB suffers when measurement data is very limited. As more measurements accumulate both CODEKF and PEFK-VB extract more information via the iteration process, resulting in superior performance.  Figure 7 shows the metric values versus the number of iterations for PEKF-VB and CODEKF. The KLD curve is close to zero after the fifth iteration. The enlarged plots marked from the sixth to the tenth iteration shows that PEKF-VB converge faster than CODEKF. Example 3: A Strongly nonlinear filtering problem. For further verifying our proposed method, we run EKF, UKF, CODEDK and PEKF-VB on the benchmark nonlinear problem in [32][33][34] where ν k−1 and ω k are zero mean Gaussian noise with variances Q k−1 and R k , respectively. We let Q k = 0.0001, R k = 1 and scan period T = 1. The simulation results are given in Figure 8, from which we can see that CODEKF and PEKF-VB have very similar results and outperform EKF and UKF. This is because that local linearization adopted by EKF-based filter are not a sufficient description of the nonlinear nature of this example [34], while the VB iteration can make use of measurement information as much as possible.

Conclusions
We have developed a proximal iterative nonlinear filter, in which the expectation of the posterior PDF is approximated by a parameterised variational distribution that is iteratively optimized in the VB framework. A weighted KL divergence is adopted as a proximal term in the iteration to ensure the convergence can be achieved with a tight bound. The simulation results show that the proposed algorithm is better than several existing algorithms in terms of estimation accuracy at the cost of increased computational burden. Since the system is Gaussian, the likelihood function p(z k |x i k|k ) is The parameters α k in Equation (18) and γ i k in Equation (19) are derived according to Equation (A2) and Equation (A3), respectively.
By Bonnet's theorem [35], the gradient of the expectation of f (ξ) under a Gaussian distribution N (ξ|µ, C) w.r.t. the mean µ is the expectation of the gradient of f (ξ) It follows that α i k can be written where the Jacobian matrix According to Price's theorem [36], the gradient of the expectation of f (ξ) under a Gaussian distribution N (ξ|µ, C) w.r.t. the covariance C is the expectation of the Hessian of f (ξ): Similarly, (A7)

Appendix B. Derivation of ELBO
From Equations (9) and (8), the ELBO is Since both system noise and measurement noise are Gaussian, the likelihood function p (y k |x k ), the PDF p (x k ) and the PDF q (x k |ψ k ) below are Gaussian Assume that the state transition matrix and the measurement matrix are obtained by linearization, p(z k |x k ) and p(x k ) can be written as, where F k−1 = ∂ f (x) ∂x | x=x k−1|k−1 and H k = ∂h(x) ∂x | x=x k|k .
Assume that the state estimation is unbiased; that is, E q [x k−1 ] = x k−1|k−1 , Equation (A10) can be written approximately as Since estimation is unbiased, the ground truth x k at time k can be expressed as, where E q [x k|k ] = 0. The first term E q [log p(z k |x k )] in Equation (A8) becomes The second term E q [log p(x k )] in Equation (A8) becomes The third term E q [log q (x k |ψ k )] in Equation (A8) is Combining Equations (A16)-(A18), we obtain the ELBO as (A19)