A New Variational Bayesian-Based Kalman Filter with Unknown Time-Varying Measurement Loss Probability and Non-Stationary Heavy-Tailed Measurement Noise

In this paper, a new variational Bayesian-based Kalman filter (KF) is presented to solve the filtering problem for a linear system with unknown time-varying measurement loss probability (UTVMLP) and non-stationary heavy-tailed measurement noise (NSHTMN). Firstly, the NSHTMN was modelled as a Gaussian-Student’s t-mixture distribution via employing a Bernoulli random variable (BM). Secondly, by utilizing another Bernoulli random variable (BL), the form of the likelihood function consisting of two mixture distributions was converted from a weight sum to an exponential product and a new hierarchical Gaussian state-space model was therefore established. Finally, the system state vector, BM, BL, the intermediate random variables, the mixing probability, and the UTVMLP were jointly inferred by employing the variational Bayesian technique. Simulation results revealed that in the scenario of NSHTMN, the proposed filter had a better performance than current algorithms and further improved the estimation accuracy of UTVMLP.


Introduction
Under the minimal mean square error criteria, the KF is the optimal estimator for the linear Gaussian state-space model [1,2]. KF has been widely employed in a variety of applications [3][4][5]. Unfortunately, in many practical applications, when the sensor produces intermittent faults, the actual measurement of the sensors may not be accurately represented by the KF measurement model [6,7]. If the random measurement loss occurs, the measurement of the sensors contains only pure noise. In this situation, the estimation accuracy of a typical KF will drop significantly or even diverge. Various filtering methods have been developed to address the measurement loss filtering issue, such as the intermittent KF (IKF) [8,9]. However, IKF has an important assumption: the measurement loss probability is known. In practical applications, the measurement loss probability is usually unknown and the IKF is no longer applicable in this case [7].
In order to address the filtering issues of the unknown measurement loss probability of the linear system, the first Bayesian Kalman filter and the second Bayesian Kalman filter were designed by estimating the process and posterior distribution of the measurement loss, respectively [10]. The above two filters, however, are no longer valid if the unknown measurement loss probability is time-varying. Recently, the variational Bayesianbased adaptive KF (VBAKF) was derived for a linear system with unknown time-varying measurement loss probability (UTVMLP) and both the system vector and UTVMLP are jointly estimated by introducing the variational Bayesian technique [7]. Additionally, VBAKF shows excellent performance in the context of white Gaussian measurement noise with known statistical characteristics. Unfortunately, in realistic engineering applications, measurement outliers may occur at various periods due to environmental changes and unreliable sensors, resulting in NSHTMN, i.e., when the system runs healthily, the measurement noise is the Gaussian-distributed, and when the time-varying measurement outliers erode the system, the measurement noise is heavy-tail-distributed [11,12]. In the scenario of NSHTMN, the estimation accuracy of VBAKF will drop sharply.
Recently, some mixture distribution-based algorithms have been presented to address NSHTMN, such as the Gaussian-Student's t-mixture distribution-based KF (GSTKF) [13,14]. However, the filtering problem with UTVMLP and NSHTMN cannot directly solved by employing a mixture distribution, that is, under the scenario of UTVMLP and NSHTMN, the current likelihood function is a weighted sum of double-mixture distributions, which is an unclosed and unconjugated distribution that makes the Bayesian inference difficult to employ directly.
In this paper, a new variational Bayesian-based KF is presented to settle the filtering issue for linear discrete-time systems with UTVMLP and NSHTMN. Firstly, the Gaussian-Student's t-mixture distribution with BM is employed to model the NSHTMN. Secondly, the form of the likelihood function is converted to an exponential product and constructs a new hierarchical Gaussian state-space model by utilizing BL. Thirdly, the variational Bayesian method is introduced to simultaneously estimate the system state vector, BM, BL, the intermediate random variables, the mixing probability, and the UTVMLP. Finally, a numerical simulation experiment reveals that the proposed filter has better estimation accuracy but is more time-consuming than existing filtering algorithms in the scenarios of NSHTMN and UTVMLP.
The contributions of this paper are as follows: (a) By employing a Bernoulli-distributed variable, the NSHTMN is modelled as a Gaussian-Student's t-mixture distribution; (b) The measurement likelihood function is converted from the weight sum of two mixture distributions to an exponential product and a new hierarchical Gaussian state-space model is therefore derived; (c) The system state vector, UTVMLP, and the unknown variables are simultaneously estimated by utilizing the variational Bayesian technique; (d) Numerical simulation results indicate that the proposed filter has better performance than that of existing algorithms in the scenarios of NSHTMN and UTVMLP

Problem Formulation
Consider the linear stochastic system with the following state and measurement equations: where x t ∈ R m denotes the system state vector; F t−1 ∈ R m×m denotes the state transition matrix; e t ∈ R m represents the Gaussian-distributed white process noise vector with a zero mean value and covariance matrix Q t ; y t ∈ R n represents the measurement vector; H t ∈ R n×m is the measurement matrix; g t ∈ R n is the white NSHTMN vector; and t represents the index of discrete time. The phenomenon of measurement loss is described by introducing the identically distributed and mutually uncorrelated measurement loss defined as the Bernoulli random variable (BL) b t , which is expressed by the following equations.
where γ t ∈ [0, 1] denotes the time-varying measurement loss probability. Note that the value of γ t is unknown in this paper. The initial Gaussian-distributed system state vector x 0 is the random vector with meanx 0|0 = 0 and covariance matrix P 0|0 . Additionally, it is assumed that the initial system state vector x 0 , the noise vectors e t−1 and g t , and the Bernoulli random variable b t are mutually independent. It can be seen from Equations (1)-(4) that the ideal measurement was received by the sensor when b t = 0 and the measurement loss with UTVMLP occurred when b t = 1. Meanwhile, the measurement noise is NSHTMN due to measurement outliers, that is, when the system runs healthily, the measurement noise is Gaussian-distributed, and when measurement outliers erode the system, the measurement noise is heavy-tail-distributed. The NSHTMN and UTVMLP can result in estimation errors or even in filtering divergence. Therefore, a new variational Bayesian-based Kalman filter with NSHTMN and UTVMLP will be proposed.

Proposed Variational Bayesian-Based Kalman Filter
In this section, a new variational Bayesian-based Kalman filter is proposed to address the filtering issue for a linear system with NSHTMN and UTVMLP. Firstly, the Gaussian-Student's t-mixture distribution is utilized to model the NSHTMN and the hierarchical form is derived. Secondly, by converting the measurement likelihood function into an exponential multiplication, a new hierarchical Gaussian state-space model is established. Thirdly, by using the variational Bayesian method, the system state and unknown variables are simultaneously estimated. Finally, the required mathematical expectations are given.

Gaussian-Student's t-Mixture Distribution
The NSHTMN vector can be modeled as the Gaussian-Student's t-mixture distribution by employing another mixing-defined Bernoulli random variable (BM), ζ t , and the probability density function (PDF), p(g t ) is given as 1} (5) where N(x; 0, Σ) represents the Gaussian PDF with a zero mean vector and covariance matrix Σ, and ST(x; 0, Y, ω) represents the student's t-PDF with a zero mean vector, covariance matrix Y, and degree of freedom (dof) parameter ω. R t represents the covariance matrix of the nominal measurement noise. The PDF of the mixing probability ϕ t and the probability mass function (PMF) of ζ t are defined as follows, respectively.
where Be(x; σ, κ) represents the Beta PDF with shape parameters σ and κ. Due to the hierarchical properties of the student's t-distribution, Equation (5) can be rewritten as such: where G(x, a, b) represents the Gamma PDF with shape parameter a and rate parameter b, and β t represents the intermediate random variable.

New Hierarchical Gaussian State-Space Model (HGSSM)
According to Equations (2)-(4), the measurement likelihood PDF is derived as Based on Equation (2), the following equation can be obtained.
where p g t (·) represents the measurement noise PD. Substituting Equations (11) and (12) in Equation (10) results in (13) is an unclosed and unconjugated weighted sum form, and it is impossible to infer the system state vector and unknown parameters directly by utilizing the variational Bayesian. The weighted sum will then be converted into an exponential multiplication form to address this problem. The PMF of BL b t is given as

Remark 1. The measurement likelihood PDF in Equation
Exploiting Equations (13) and (14), the measurement likelihood PDF is reformulated as According Equation (15), the exponential multiplication-formed likelihood PDF p(y t |x t , b t ) is given as follows.
Remark 2. The variational Bayesian method must select the suitable conjugate-prior distributions for unknown variables. Therefore, the appropriate prior PDFs to construct a new HGSSM are selected. The one-step predicted PDF p(x t |y 1:t−1 ) of system state vector x t is assumed as being Gaussian distributed as follows.
wherex t|t−1 represents the mean vector and P t|t−1 represents the covariance matrix. Botĥ x t|t−1 and P t|t−1 can be updated by the typical Kalman filter, which is given aŝ In employing Equations (8), (9) and (16), the conditional likelihood PDF p(y t |x t , b t , β, ζ t ) is derived as It can be seen from Equations (6)-(9), (13) and (20) that the measurement vector y t depends on system state vector x t , intermediate random variable β t , BM ζ t , BL b t , mixing probability ϕ t , and measurement loss probability γ t . The following joint-prior PDF must be calculated, i.e., where the definitions of p(γ t ), p(ζ t |ϕ), p(β), p(b t |γ t ), and p(x t |y 1:t−1 ) are given in Equations (6), (7), (9), (14) and (17), respectively. Additionally, p(γ t |y 1:t−1 ) denotes the prior PDF of the time-varying measurement loss probability, which can be assumed as the following Beta distribution.
where the shape parametersη t|t−1 andδ t|t−1 can be calculated by introducing the forgetting factor ρ ∈ (0, 1] as follows.η whereη t−1|t−1 andδ t−1|t−1 represent posterior shape parameters. The proposed new HGSSM is comprised of Equations (14) and (17)-(24). System state vector x t , intermediate random variable β t , BM ζ t , BL b t , mixing probability ϕ t , and measurement loss probability γ t will be simultaneously estimated by utilizing the variational Bayesian method.

Variational Bayesian Approximation of the Joint Posterior PDFs
Aiming at the estimation of the unknown variables of the new HGSSM, the joint posterior PDF p(Ξ|y t ) with Ξ {x t , β t , b t , ζ t , ϕ t , γ t } is required to be solved. However, the analytical solution of p(Ξ|y t ) is not accessible. The variational Bayesian approach is therefore employed to determine an approximate PDF for p(Ξ|y t ) and to compute an approximate solution [15][16][17], i.e., where θ represents an arbitrary element of Ξ and q a (θ) denotes the approximate PDF or PMF. By minimizing the Kullback-Leibler divergence (KLD) between p(Ξ|y 1:t ) and where KLD(q a (A)||p(A)) represents the KLD between q a (A) and p(A), and the optimal solution of Equation (26) can be calculated via the following formula [15,17].
where E(θ) denotes the mathematical expectation operation, Ξ −θ signifies a grouping of all the components in Ξ apart from θ, and the constant with regard to θ is denoted by c θ . Additionally, the fixed-point iteration technique is utilized to derive the approximate formation of q a (θ) due to the fact that estimated parameters are mutually coupled. The joint PDF p(Ξ, y 1:t ) in Equation (26) can be derived as Proposition 1. Let θ = x t and by using Equation (29) in (28) where q (s+1) a (·) represents the approximate PDF in the (s + 1)th iteration, while the mean vectorx (s+1) t|t and the covariance matrix P (s+1) t|t are assumed to be updated in accordance with the traditional Kalman filter as follows.
represents the Kalman gain matrix. The modified measurement noise covari- where E (s) [·] represents the mathematical expectation of variables in the sth iteration.
Proof: see Appendix A.

Proposition 2.
Let θ = β t and by using Equation where the shape parameter π (s+1) t and rate parameter ν where n represents the dimension of the measurement vector, tr(·) represents the trace operation, and G (s+1) t is defined as Proof: see Appendix B.
Proposition 3. Let θ = b t and by using Equation (29) in Equation (28), q (s+1) a (b t ) can be updated as the Bernoulli distribution. The posterior probabilities p(b t = 1) and p(b t = 0) of BL b t are given as where ∆ (s+1) represents the normalizing constant and the parameters C are, respectively, defined as Proof: see Appendix C.

Proposition 4.
Let θ = ζ t and by using Equation (29) in (28), q (s+1) a (ζ t ) can be also updated as the Bernoulli distribution. The posterior probabilities p(ζ t = 1) and p(ζ t = 0) of BM ζ t are given as where ∇ (s+1) also represents the normalizing constant and the definitions of parameters V (s+1) t and W (s+1) t are, respectively, given as Proof: see Appendix D.
Proposition 5. Let θ = ϕ t and by using Equation (29) in (28), q (s+1) a (ϕ t ) can be updated as the Beta distribution, i.e., where the shape parameters h Proof: see Appendix E.
Proposition 6. Let θ = γ t and by using Equation (29) in Equation (28), q (s+1) a (γ t ) can be also updated as the Beta distribution, i.e., where the definitions of shape parametersη Proof: see Appendix F.

Update the Bernoulli
, and D

Simulations
Aimed at demonstrating the superiority of the presented filter in the scenario with UTVMLP and NSHTMN, a numerical example is simulated. The process and measurement equations of the stochastic system are, respectively, given as [7] x t = 0.6 0.4 0.1 0.9 where the Gaussian process noise e t−1 and the NSHTMN g t are given as [12] e t−1 ∼ N(0, Q t ) (65) where w.p. represents "with probability". The true process noise covariance matrix Q t with parameter M = 1 and the nominal measurement noise covariance matrix R t with parameter N = 150 m 2 are set as The real UTVMLP is set as From Equations (66)-(69), it can be seen that the measurement noise and UTVMLP are divided into four stages, as shown in Table 2. The remaining system parameters are as follows: the sampling interval ∆k = 0.01 s and the total simulation time T = 400 s. The proposed filter is compared with the typical Kalman filter (KF) [2]; the existing variational Bayesian-based adaptive KF with UTVMLP (VBAKF) [7]; the existing Gaussian-Student's t-mixture distribution-based KF (GSTKF) with Gaussian process noise [14]; and the existing IKF with known real measurement loss probability [8]. The parameters of VBAKF are selected as p 0 = 0.5, α 0 = 5, β 0 = 5, ρ = 1 − exp(−5), and N m = 10. The parameters of GSTKF are selected as v t = 5 and e 0 = 0.85. The parameters of the proposed filter are given as ρ = 0.99, µ = 5, h 0 = 0.85, N I = 10, η 0 = 5, and δ t = 5, ς = 10 −16 . All filters are programmed with MATLAB R2018a and run on a computer with Intel Core i5-6300HQ CPU at 2.30 GHz and 8 GB of RAM.  Aimed at evaluating the performances in the estimation of the system state vector of all the algorithms, the root-mean square error (RMSE) and the averaged root-mean square error (AGRMSE) are utilized as performance indicators. The definitions of RMSE and AGRMSE of the system state are given as where (x r t , y r t ) and (x r t ,ŷ r t ) denote the actual and estimated system state at the jth Monte Carlo run and discrete-time t, respectively. M c = 500 represents the total Monte Carlo run time.
Different from the proposed algorithm and VBAKF, the KF, IKF and GSTKF do not estimate UTVMLP. Although IKF can also address the filtering problem with measurement loss, IKF is based on the assumption that the measurement loss probability is known. Therefore, only VBAKF and the proposed algorithm participate in the comparison of the UTVMLP estimation performance. Figure 1 shows the RMSE x s of the proposed filter and the existing filters over 500 times of the Monte Carlo run. Additionally, the AGRMSE x s and SSRTs of different filters are listed in Table 2. It can be seen from Figure 1 and Table 3 that in the contexts of UTVMLP and NSHTMN, when the measurement is the Gaussian measurement noise and there is slight loss probability, as shown in stages 1 and 4, the estimation accuracy of the proposed filter is close to the IKF with true loss probability and the performance of the proposed algorithm is better than the other algorithms. We can also find that the proposed algorithm still has better performance than the existing algorithms when the measurement has heavytailed measurement noise and larger measurement loss probability, as shown in stages 2 and 4. In addition, the proposed algorithm has longer SSRT and higher computational complexity than the existing filters, which can be observed from Table 3.   Figure 2 shows the curves of the true and estimated UTVMLPs of VBAKF and the proposed filter over 500 times of the Monte Carlo run. Obviously, the NSHTMN has a great influence on the filtering performance of VBAKF and the proposed filter has better UTVMLP estimation accuracy than VBAKF in the scenario of NSHTMN.    Figure 2 shows the curves of the true and estimated UTVMLPs of VBAKF and the proposed filter over 500 times of the Monte Carlo run. Obviously, the NSHTMN has a great influence on the filtering performance of VBAKF and the proposed filter has better UTVMLP estimation accuracy than VBAKF in the scenario of NSHTMN. It can be seen that the proposed filter with different shape parameters has better performance than current algorithms in the system state and UTVMLP estimations. Moreover, the degree of freedom parameter has little influence on the estimation accuracy and time complexity of the proposed algorithm, and the recommended value of is therefore set as 5.   It can be seen that the proposed filter with different shape parameters has better performance than current algorithms in the system state and UTVMLP estimations. Moreover, the degree of freedom parameter µ has little influence on the estimation accuracy and time complexity of the proposed algorithm, and the recommended value of µ is therefore set as 5. It can be seen that the proposed filter with different shape parameters has better performance than current algorithms in the system state and UTVMLP estimations. Moreover, the degree of freedom parameter has little influence on the estimation accuracy and time complexity of the proposed algorithm, and the recommended value of is therefore set as 5.       The corresponding SSRT of the proposed filter with = 0.93,0.95,0.97,0.99 is approximately equal to 0.2990. We can find that the proposed filter with = 0.99 has the best performance in the system state and UTVMLP estimations, and the value of has little effect on calculation complexity. Therefore, the recommended value of is 0.99.   Figure 7 shows the AGRMSE s of the proposed filter and the current algorithms with the iteration number = 1,2, ⋯ 10. We can see from Figure 7 that the proposed filter has a smaller AGRMSE than the existing filters when 3 and the proposed filter converges faster than the existing filters. However, Table 4 shows that the setting of has a huge impact on the time consumption of the proposed filter and the SSRT increases with the increase of . Therefore, considering time consumption and estimation accuracy, the recommended value of ranges from 4 to 10.   Figure 7 shows the AGRMSE x s of the proposed filter and the current algorithms with the iteration number N I = 1, 2, · · · 10. We can see from Figure 7 that the proposed filter has a smaller AGRMSE x than the existing filters when N I ≥ 3 and the proposed filter converges faster than the existing filters. However, Table 4 shows that the setting of N I has a huge impact on the time consumption of the proposed filter and the SSRT increases with the increase of N I . Therefore, considering time consumption and estimation accuracy, the recommended value of N I ranges from 4 to 10.  Figure 7 shows the AGRMSE s of the proposed filter and the current algorithms with the iteration number = 1,2, ⋯ 10. We can see from Figure 7 that the proposed filter has a smaller AGRMSE than the existing filters when 3 and the proposed filter converges faster than the existing filters. However, Table 4 shows that the setting of has a huge impact on the time consumption of the proposed filter and the SSRT increases with the increase of . Therefore, considering time consumption and estimation accuracy, the recommended value of ranges from 4 to 10.

Conclusions
In this paper, a new VB-based KF is presented to address the filtering issue with UTVMLP and NSHTMN. The system state vector, BM, BL, the intermediate random variables, the mixing probability, and the UTVMLP are simultaneously inferred by introducing the variational Bayesian technique. Simulation results illustrated that the proposed filter has a better performance than existing filters in the estimations of the system state vector and UTVMLP.

Future Work
The environmental factors in practical applications may be more complicated than this paper illustrated. Apart from non-stationary heavy-tailed measurement noise and unknown loss probability, the system process noise may also present non-stationary heavytailed distribution. In terms of measurement, random delays in measurement will also appear. Therefore, in our future research, we will further consider the factors of nonstationary heavy-tailed process noise and random measurement delay based on the theoretical content of this paper. Additionally, we will design a non-linear filtering method with lower computational complexity to verify the effectiveness in the real world.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest:
The authors declare no conflict of interest.