Variational Bayesian-Based Improved Maximum Mixture Correntropy Kalman Filter for Non-Gaussian Noise

The maximum correntropy Kalman filter (MCKF) is an effective algorithm proposed to solve the non-Gaussian filtering problem for linear systems. Compared with the original Kalman filter (KF), the MCKF is a sub-optimal filter with a Gaussian correntropy objective function, which has been demonstrated to have excellent robustness to non-Gaussian noise. However, the performance of the MCKF is affected by its kernel bandwidth parameter, and a constant kernel bandwidth may lead to severe accuracy degradation under non-stationary noises. To solve this problem, the mixture correntropy method is further explored in this work, and an improved maximum mixture correntropy KF (IMMCKF) is proposed. By derivation, random variables obeying Beta-Bernoulli distributions are taken as intermediate parameters, and a new hierarchical Gaussian state-space model is established. Finally, the unknown mixing probability and the state estimate at each moment are inferred via a variational Bayesian approach, which provides an effective way to improve the applicability of MCKFs under non-stationary noises. Performance evaluations demonstrate that the proposed filter significantly improves on existing MCKFs in non-stationary noises.


Introduction
The state estimation problem in dynamic systems is an important research topic in engineering applications and scientific research. As an excellent optimal state-space estimator, the Kalman filter (KF) is commonly applied in various fields like control systems and signal processing. Unfortunately, the optimality of KF requires exact system models and ideal noise conditions as summarized in [1].
The widely used KF, which usually refers to the Kalman filter based on hidden Markov models (HMM), has rigorous requirements on the noise models: both process and measurement noise are assumed to be ideal independent Gaussian sequences. However, in practical applications, ideal noise conditions rarely hold, and model uncertainties such as system structure changes and environmental disturbances are generally inevitable. Moreover, unexpected noise interference, such as colored and non-Gaussian noise, widely exists, and the performance of the KF is likely to worsen in such situations. For example, when the independent-noise assumption no longer holds and colored noise needs to be considered in the system model, the pairwise Markov model (PMM), which can be deemed a general form of the HMM, can be taken as an efficient improvement scheme for the KF. In [2], the framework of the KF based on the PMM is derived, which allows cross-dependence between observations conditionally on the hidden variables. As an extension, the KF based on triplet Markov chains is provided in [3]. These methods provide efficient solutions for colored-noise filtering from the perspective of the state-space model. On this basis, attention has turned to the uncertain-parameter problems of the model in practical applications. The robust parameter estimation problem for a linear system corrupted by impulsive noise can be solved effectively by the MCKF. However, just like other robust filters, the filtering performance of the MCKF is closely related to its initial parameters, which are usually obtained by experience or by trial and error. As with many existing robust filtering algorithms with fixed parameters, this may result in performance degradation under non-stationary noises. Several studies have therefore focused on the parameter problem of the correntropy function.
In prior works, heuristic solutions were proposed to adjust the kernel bandwidth during filtering [25,26]. As it is difficult to directly find an optimal kernel bandwidth during filtering, the mixture correntropy concept was proposed in [27,28]: the method takes a mixture of Gaussian correntropies with different kernel parameters instead of a single Gaussian correntropy, and a new maximum mixture correntropy criterion (MMCC) is derived to replace the MCC. However, the mixing probability of the mixture correntropy needs to be configured manually and remains fixed, which results in performance degradation similar to that of the MCKF.
To improve the filtering performance under non-stationary noise conditions, considering model-switching scenarios may be an efficient solution. For instance, in [29], the optimal recursive filtering method is studied for non-Gaussian Markov switching models, in which a semi-supervised parameter estimation method is used. In addition, for the non-stationary noise conditions considered in this paper, the variational Bayesian (VB) approach is an implementation scheme worthy of consideration. Based on the triplet Markov chain model, the general form of variational Bayesian inference is deduced in [30], and a structured variational Bayesian inference framework with regime switching is obtained. For typical linear filtering applications, the RSTKF proposed in [12] adopts a similar scheme, where the VB approximation is operated by rescaling the covariance and inferring parameters to deal with non-stationary non-Gaussian noises. In [31], a similar problem is further studied, and a VB-based robust Student's t KF is applied to linear PMM systems, which extends the filter's applicability to more general conditions, the independent-noise assumption no longer being needed. Inspired by these references, the model-switching concept is adopted in this work, and the variational Bayesian approach is used to infer the estimation results. In view of the accuracy degradation of the MCKF under non-stationary non-Gaussian noise, a series of studies are carried out.
In this work, an improved maximum mixture correntropy Kalman filter (IMMCKF) is therefore proposed, in which intermediate random variables represent the mixing probability of the mixture correntropy, and the state variables and mixing probability are approximated by the derived variational Bayesian approach. Compared with existing MCKF algorithms, the numerical test results demonstrate that the proposed filter handles the filtering problem well in non-stationary non-Gaussian noise environments. The contributions of this paper are as follows:
1. The accuracy degradation problem of existing fixed-parameter robust filtering algorithms in non-stationary noise environments is considered. By analysis, it is inferred that the mixture correntropy function can be taken as the breakthrough point and applied to the filtering problem in such noise conditions. On this basis, a novel improved robust filtering algorithm is derived.
2. Through analyses and derivations, the necessary selection strategy for the initial parameters is derived, and the numerical test results show that the filtering performance is enhanced markedly after several iterations. On the one hand, the proposed algorithm achieves the desired robust filtering performance under non-stationary noise conditions; on the other hand, it effectively avoids the possible filtering divergence of the MCKF in practical applications.
This paper is organized as follows: In Section 2, we review the concept of correntropy and existing MCKF. In Section 3, a variational Bayesian-based improved maximum mixture correntropy KF is derived to solve the filtering problem in non-stationary non-Gaussian noises, in which the variational Bayesian approach is applied in the proposed filter. Section 4 provides performance evaluations and analysis, demonstrating the advantages of the proposed filter in different noise conditions. Conclusions are given in Section 5.

Definition of Correntropy and Properties
Correntropy is a useful local similarity measure for state estimation in heavy-tailed noise environments. Given two random variables X, Y ∈ R with joint distribution function F_XY(x, y), correntropy is defined as

V(X, Y) = E[κ_σ(X, Y)] = ∫ κ_σ(x, y) dF_XY(x, y),   (1)

where κ_σ(·) is a shift-invariant Mercer kernel. In this work, the Gaussian kernel function is used:

κ_σ(x, y) = G_σ(e) = exp(−e²/(2σ²)),   (2)

where e = X − Y, and σ > 0 denotes the kernel bandwidth. To make correntropy applicable to complex noise conditions, the default Gaussian kernel function can be extended into a mixture correntropy form as follows:

M(X, Y) = ρ E[G_σ1(e)] + (1 − ρ) E[G_σ2(e)],   (3)

where G_σ1, G_σ2 represent the Gaussian correntropies with two different kernel bandwidth parameters, and ρ ∈ [0, 1] represents the mixing probability. For convenience of application, (3) can be approximately expressed from samples as

M̂(X, Y) = (1/N) Σ_{i=1}^{N} [ρ G_σ1(e_i) + (1 − ρ) G_σ2(e_i)],   (4)

where N represents the number of samples. The mixture correntropy can be taken as a generalization of the original correntropy: if ρ = 1 or ρ = 0, it reduces to the Gaussian correntropy with a single kernel parameter, and M(X, Y) = 1 if X = Y.
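As a concrete illustration, the sample estimate of the mixture correntropy can be computed as follows (a minimal sketch; the function names are our own):

```python
import numpy as np

def gaussian_correntropy(e, sigma):
    """Gaussian kernel G_sigma(e) = exp(-e^2 / (2 sigma^2))."""
    return np.exp(-np.asarray(e, dtype=float) ** 2 / (2.0 * sigma ** 2))

def mixture_correntropy(x, y, sigma1, sigma2, rho):
    """Sample estimate of the mixture correntropy: the rho-weighted convex
    combination of two Gaussian correntropies, averaged over the samples."""
    e = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.mean(rho * gaussian_correntropy(e, sigma1)
                         + (1.0 - rho) * gaussian_correntropy(e, sigma2)))
```

With rho = 1 or rho = 0 the estimate reduces to the single-kernel sample correntropy, and identical inputs give the maximum value 1, matching the properties stated above.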

Robust Kalman Filter Based on Maximum Correntropy Criterion
Consider the linear state-space system based on the HMM:

x_k = F_k x_{k−1} + w_k,   (5)
y_k = H_k x_k + v_k,   (6)

where k is a discrete time index, x_k ∈ R^n is the system state vector at time k, F_k is the state transition matrix, y_k ∈ R^m is the measurement vector, and H_k is the measurement matrix; w_k and v_k are zero-mean process and measurement noise vectors with nominal covariances Q_k and R_k, respectively. Both process and measurement noise are assumed statistically independent and time-uncorrelated. When the process and measurement noises follow ideal Gaussian distributions and the initial state x_0 is a Gaussian random variable, the state can be inferred by the KF, which is the optimal filter under the minimum mean square error (MMSE) criterion. The quadratic objective function can be formulated as follows:

J(x_k) = ||x_k − x̂_{k|k−1}||²_{P_{k|k−1}⁻¹} + ||y_k − H_k x_k||²_{R_k⁻¹},   (7)

where ||x||²_A = xᵀAx. To minimize Equation (7), the KF is implemented in two steps. The prior estimation is

x̂_{k|k−1} = F_k x̂_{k−1|k−1},   (8)
P_{k|k−1} = F_k P_{k−1|k−1} F_kᵀ + Q_k,   (9)

and the posterior measurement update is

K_k = P_{k|k−1} H_kᵀ (H_k P_{k|k−1} H_kᵀ + R_k)⁻¹,   (10)
x̂_{k|k} = x̂_{k|k−1} + K_k (y_k − H_k x̂_{k|k−1}),   (11)
P_{k|k} = (I − K_k H_k) P_{k|k−1},   (12)

where P_{k|k−1} and P_{k|k} represent the prior and posterior covariance matrices, respectively. The KF is an optimal estimator in an ideal white Gaussian noise environment. However, only second-order statistics are used in the state update of Equation (7), so the KF is susceptible to non-Gaussian noise interference.
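The prediction and measurement-update steps above can be sketched as a single recursion (a minimal illustration; function name and argument layout are our own):

```python
import numpy as np

def kf_step(x, P, y, F, H, Q, R):
    """One predict/update cycle of the standard Kalman filter."""
    # Time update (prior estimation)
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Measurement update (posterior estimation)
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_post = x_pred + K @ (y - H @ x_pred)
    P_post = (np.eye(len(x)) - K @ H) @ P_pred
    return x_post, P_post
```

This is the Gaussian-optimal baseline that the correntropy-based filters below modify; only the gain computation and covariance weighting change in the robust variants.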
To solve the filtering problem in non-Gaussian noise conditions, the MCC has recently been considered an efficient potential solution. As described in Section 2, correntropy has several properties that make it capable of dealing with the non-Gaussian estimation problem. Different from the global similarity measure, the mean square error (MSE), which only contains second-order statistics, Gaussian correntropy incorporates all even-order moments [17,18]. In geometric terms, the MSE gives the L2-norm distance, while correntropy offers a hybrid norm distance that behaves like the L2 norm transitioning toward the L0 norm as the difference between the two points increases.
As proved by the research in [17], maximizing the correntropy of two random variables can serve as a criterion for dealing with the non-Gaussian noise problem, especially the filtering problem in heavy-tailed noises, which leads to the MCC. Therefore, to enhance the robustness of the Kalman filter, an objective function based on the MCC is introduced to replace the original quadratic cost function. Writing S_P S_Pᵀ = P_{k|k−1} and S_R S_Rᵀ = R_k for Cholesky factors, and defining the whitened residuals e_p = S_P⁻¹(x_k − x̂_{k|k−1}) and e_m = S_R⁻¹(y_k − H_k x_k), the new objective function can be formulated as

J_MCC(x_k) = Σ_{i=1}^{n} a_i G_σp(e_{p,i}) + Σ_{j=1}^{m} b_j G_σm(e_{m,j}),   (13)

where the subscripts i and j represent the ith and jth elements of the vectors; a and b are tuning parameter vectors; and σ_m, σ_p denote the kernel bandwidths corresponding to R_k and P_{k|k−1}, respectively. For simplicity, it is assumed that σ_m = σ_p = σ; to ensure the filter converges to the KF when the kernel bandwidth goes to infinity, a_i = σ and b_j = σ are assumed [21]. The prior estimation steps of the MCKF are the same as in Equations (8) and (9), and the posterior state estimate of x_k is obtained by KF-like equations with an updated filter gain:

x̂_{k|k} = x̂_{k|k−1} + K̃_k (y_k − H_k x̂_{k|k−1}),   (14)
K̃_k = P̃_{k|k−1} H_kᵀ (H_k P̃_{k|k−1} H_kᵀ + R̃_k)⁻¹,   (15)
P̃_{k|k−1} = S_P C_p⁻¹ S_Pᵀ,  R̃_k = S_R C_r⁻¹ S_Rᵀ,   (16)

with C_p = diag(G_σ(e_{p,1}), …, G_σ(e_{p,n})) and C_r = diag(G_σ(e_{m,1}), …, G_σ(e_{m,m})). It is worth noting that the MCC solution cannot be obtained in closed form; it is usually computed by an iterative update such as the fixed-point algorithm, which involves no step size and may converge fast. The condition that guarantees convergence of the fixed-point MCC iteration was given in [32].
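A hedged sketch of the fixed-point MCC measurement update described above (our own function names; the weight floor and convergence test are simplifications, not part of the original derivation):

```python
import numpy as np

def mckf_update(x_pred, P_pred, y, H, R, sigma, n_iter=10, tol=1e-6):
    """Fixed-point MCC measurement update (sketch).

    Residuals are whitened with the Cholesky factors of P_pred and R,
    each component is weighted by G_sigma(e_i), the covariances are
    reweighted accordingly, and a KF-like update is repeated until the
    state estimate stops changing."""
    n = len(x_pred)
    Sp = np.linalg.cholesky(P_pred)
    Sr = np.linalg.cholesky(R)
    x_est = x_pred.copy()
    for _ in range(n_iter):
        ep = np.linalg.solve(Sp, x_pred - x_est)     # whitened prior residual
        er = np.linalg.solve(Sr, y - H @ x_est)      # whitened measurement residual
        wp = np.maximum(np.exp(-ep**2 / (2 * sigma**2)), 1e-12)
        wr = np.maximum(np.exp(-er**2 / (2 * sigma**2)), 1e-12)
        P_t = Sp @ np.diag(1.0 / wp) @ Sp.T          # reweighted prior covariance
        R_t = Sr @ np.diag(1.0 / wr) @ Sr.T          # reweighted measurement covariance
        K = P_t @ H.T @ np.linalg.inv(H @ P_t @ H.T + R_t)
        x_new = x_pred + K @ (y - H @ x_pred)
        done = np.linalg.norm(x_new - x_est) <= tol * max(np.linalg.norm(x_est), 1.0)
        x_est = x_new
        if done:
            break
    P_post = (np.eye(n) - K @ H) @ P_t @ (np.eye(n) - K @ H).T + K @ R_t @ K.T
    return x_est, P_post
```

As the kernel bandwidth grows, all weights approach one and the update coincides with the standard KF, consistent with the normalization assumption a_i = b_j = σ.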

Robust Kalman Filter Based on Mixture Correntropy Criterion
The kernel parameter of the Gaussian correntropy determines the filtering performance of the MCKF, and an improper kernel bandwidth might lead to performance degradation or even divergence. For this reason, the mixture correntropy is an alternative that mitigates this problem by reducing the filter's sensitivity to the kernel parameters.
The maximum mixture correntropy Kalman filter (MMCKF) is derived in this section. According to the mixture correntropy function (4), the objective function of the maximum mixture correntropy criterion can be formulated as

J_MMCC(x_k) = Σ_{i=1}^{n} a_i M(e_{p,i}) + Σ_{j=1}^{m} b_j M(e_{m,j}),   (17)

where M(e_i) represents the mixture correntropy function as in (3). Similar to Equation (13), it is assumed that a_i = b_j = λ, and the mixture correntropy is the convex combination of the two Gaussian correntropy functions. To maximize this objective function, the solution is obtained by solving

∂J_MMCC(x_k)/∂x_k = 0,   (18)

which involves the derivative of the mixture correntropy function,

dM(e)/de = −e [ρ G_σ1(e)/σ1² + (1 − ρ) G_σ2(e)/σ2²].   (19)
To maintain consistency with the KF, the tuning factor λ should be properly assigned as

λ = [µ/σ1² + (1 − µ)/σ2²]⁻¹,   (20)

so that the MMCKF converges to the optimal KF when the process and measurement noises obey ideal Gaussian distributions. A modified mixture correntropy function C(e_i) is then formulated as

C(e_i) = λ [µ G_σ1(e_i)/σ1² + (1 − µ) G_σ2(e_i)/σ2²],   (21)

with C_k = diag(C(e_1), …, C(e_m)) and µ ∈ [0, 1]. Here, C(e_i) can be regarded as a linear transformation of M(e_i); the two are positively correlated, and C(e_i) = 1 when e_i = 0, so the weighting leaves the ideal Gaussian case unaffected.
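The role of the normalizing factor λ can be illustrated with a short sketch of the per-component weight implied by the mixture correntropy derivative (a hedged reconstruction; the exact closed forms in the original may differ):

```python
import numpy as np

def mmcc_weight(e, sigma1, sigma2, rho):
    """Per-component weight suggested by the mixture correntropy derivative:
    rho*G_s1(e)/s1^2 + (1-rho)*G_s2(e)/s2^2, scaled by lambda so that the
    weight equals 1 at e = 0 and the filter reduces to the KF under ideal
    Gaussian noise."""
    g1 = np.exp(-e**2 / (2 * sigma1**2)) / sigma1**2
    g2 = np.exp(-e**2 / (2 * sigma2**2)) / sigma2**2
    lam = 1.0 / (rho / sigma1**2 + (1.0 - rho) / sigma2**2)
    return lam * (rho * g1 + (1.0 - rho) * g2)
```

The weight decreases monotonically with the residual magnitude, so large outliers are down-weighted while small, Gaussian-like residuals are passed through almost unchanged.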

Improved MMCKF via Variational Bayesian Approximation
Similar to the Gaussian correntropy in the MCKF, the mixing probability parameter in Equation (21) is constant, which inevitably results in performance degradation under non-stationary noises. To improve the filtering performance in such complex noise environments, an improved algorithm is derived in this section: the mixing probability of the mixture correntropy is reassigned as an unknown random variable that is further approximated during filtering.
According to the Bayesian theorem, the posterior probability density function (PDF) p(x_k|y_1:k) is formulated as

p(x_k|y_1:k) ∝ p(y_k|x_k) p(x_k|y_1:k−1),   (22)

in which the prior PDF p(x_k|y_1:k−1) is determined using the Chapman-Kolmogorov equation:

p(x_k|y_1:k−1) = ∫ p(x_k|x_k−1) p(x_k−1|y_1:k−1) dx_k−1,   (23)

where p(x_k|x_k−1) is the state transition density and p(x_k−1|y_1:k−1) is the posterior density at time k − 1.
In this work, µ is treated as an unknown random variable with µ ∈ [0, 1]. To approximate the unknown state and a reasonable mixing probability via the VB approach, we first need to choose a distribution for this unknown variable, i.e., to introduce a prior probability distribution p(µ). The likelihood of the mixing probability µ can be formulated as a Bernoulli distribution by introducing a Bernoulli random variable ξ,

p(ξ|µ) = µ^ξ (1 − µ)^{1−ξ},  ξ ∈ {0, 1},   (24)

and by the Bayesian formula,

p(µ|ξ) ∝ p(ξ|µ) · p(µ).   (25)

The posterior distribution is proportional to the product of the prior and the likelihood, and choosing a conjugate prior keeps the posterior in the same family as the prior. The conjugate prior of the Bernoulli likelihood is the Beta distribution, so we set p(µ) = Be(µ; a, b), for which E(µ) = a/(a + b). With the distributions of the required intermediate variables determined, the hierarchical model can be built on the system model: two random variables s_k and t_k obeying Bernoulli distributions generate the likelihoods, and Beta distributions are taken as their conjugate priors so that the prior and posterior retain the same form. The conditional probabilities p(t_k|α_k) and p(s_k|β_k) are formulated as

p(t_k|α_k) = α_k^{t_k} (1 − α_k)^{1−t_k},  p(s_k|β_k) = β_k^{s_k} (1 − β_k)^{1−s_k},   (26)

in which α_k and β_k obey Beta distributions,

p(α_k) = Be(α_k; a_0, b_0),   (27)
p(β_k) = Be(β_k; c_0, d_0),   (28)

where a_0, b_0, c_0 and d_0 represent the prior Beta shape parameters for the mixing probabilities.
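The Beta-Bernoulli conjugacy used above can be demonstrated with a few lines (an illustration only; the function name is our own):

```python
import numpy as np

def beta_bernoulli_update(a, b, xi):
    """Conjugate update: with a Beta(a, b) prior on mu and Bernoulli
    observations xi, the posterior is Beta(a + sum(xi), b + n - sum(xi)),
    i.e., the posterior stays in the Beta family."""
    xi = np.asarray(xi)
    return float(a + xi.sum()), float(b + len(xi) - xi.sum())

# Prior mean E[mu] = a/(a+b); the posterior mean moves toward the
# empirical frequency of ones as evidence accumulates.
a0, b0 = 0.9, 0.1
a1, b1 = beta_bernoulli_update(a0, b0, [1, 1, 0, 1])
```

This closed-form update is exactly why the Beta prior is chosen: the VB iterations below only need to adjust the two shape parameters rather than track an arbitrary density.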
Then, conditioned on the Bernoulli variables, the PDFs p(y_k|x_k, t_k) and p(x_k|y_1:k−1, s_k) can be rewritten in the following hierarchical Gaussian form:

p(y_k|x_k, t_k) = N(y_k; H_k x_k, R̃_k)^{t_k} N(y_k; H_k x_k, R̄_k)^{1−t_k},   (29)
p(x_k|y_1:k−1, s_k) = N(x_k; x̂_{k|k−1}, P̃_{k|k−1})^{s_k} N(x_k; x̂_{k|k−1}, P̄_{k|k−1})^{1−s_k},   (30)

where the pairs (R̃_k, P̃_{k|k−1}) and (R̄_k, P̄_{k|k−1}) represent the covariance matrices reweighted by the Gaussian correntropy functions G_σ1 and G_σ2, respectively; that is, the modified prior error covariance is formed as S_P C⁻¹ S_Pᵀ and the modified measurement noise covariance as S_R C⁻¹ S_Rᵀ, with C the diagonal correntropy weight matrix of the corresponding kernel. The conditional densities p(y_k|x_k, α_k, t_k) and p(x_k|y_1:k−1, β_k, s_k) then follow the same form, with the Bernoulli variables governed by (26)-(28). According to the likelihood PDFs derived above and the Bayesian theorem, the joint PDF can be given as

p(Θ_k, y_1:k) = p(y_k|x_k, t_k) p(x_k|y_1:k−1, s_k) p(t_k|α_k) p(s_k|β_k) p(α_k) p(β_k) p(y_1:k−1),   (33)

where Θ_k ≜ {x_k, s_k, t_k, α_k, β_k} contains the state variable x_k that needs to be estimated, and the Beta-Bernoulli variables {α_k, β_k}, {s_k, t_k} are used for the inference of the mixing probability parameters. To estimate Θ_k by VB inference, following prior work [12,33], the approximate posterior PDF of each element of Θ_k needs to satisfy

log q(θ) = E_{Θ_k^{−θ}}[log p(Θ_k, y_1:k)] + C_θ,   (34)

where θ is an element of Θ_k, Θ_k^{−θ} denotes the remaining elements of Θ_k except θ, and C_θ is a constant with respect to θ. As (34) cannot be solved analytically, fixed-point iteration is required to obtain an approximate solution. Expanding (34) for each factor yields the update equations below. The parameters a_0, b_0, c_0, d_0 are initialized, and the required expectations E^{(i+1)}[·] are computed, using the digamma function ψ where Beta-distributed variables are involved. By exploiting these expectations, the estimation of the unknown parameters in Θ_k is implemented by the fixed-point iteration loop (a)-(c). (a) Let θ = x_k; substituting into (34), log q^{(i+1)}(x_k) is obtained, where i represents the ith fixed-point iteration.
Replacing the mixing probability in (21) by E^{(i+1)}[s_k] and E^{(i+1)}[t_k], the conditional PDFs p(x_k|y_1:k−1) and p(y_k|x_k) are evaluated with the proposed mixture correntropy, using the modified prediction error covariance matrix P̃^{(i+1)}_{k|k−1} and the modified measurement noise covariance matrix R̃^{(i+1)}_k. After rearrangement, the posterior PDF q^{(i+1)}(x_k) takes a nominal Gaussian form, updated by KF-like equations:

K^{(i+1)}_k = P̃^{(i+1)}_{k|k−1} H_kᵀ (H_k P̃^{(i+1)}_{k|k−1} H_kᵀ + R̃^{(i+1)}_k)⁻¹,   (38)
x̂^{(i+1)}_{k|k} = x̂_{k|k−1} + K^{(i+1)}_k (y_k − H_k x̂_{k|k−1}),   (39)
P^{(i+1)}_{k|k} = (I − K^{(i+1)}_k H_k) P̃^{(i+1)}_{k|k−1}.   (40)

(b) By setting θ = s_k and θ = t_k, q^{(i+1)}(s_k) and q^{(i+1)}(t_k) are updated as Bernoulli distributions; the probabilities of t_k and s_k being 1 are

q^{(i+1)}(t_k = 1) = ∆_t⁻¹ exp{A + E^{(i+1)}[log α_k]},   (41)
q^{(i+1)}(s_k = 1) = ∆_s⁻¹ exp{B + E^{(i+1)}[log β_k]},   (42)

where ∆_t, ∆_s are normalizing constants making each pair of probabilities sum to one, and the two auxiliary variables A and B evaluate the expected log-likelihoods under the prior-probability covariance and measurement covariance, respectively:

A = E^{(i+1)}[log N(y_k; H_k x_k, R̃_k)],  B = E^{(i+1)}[log N(x_k; x̂_{k|k−1}, P̃_{k|k−1})].   (43, 44)

To implement the fixed-point iteration, (43) and (44) are approximated with the current iterates x̂^{(i+1)}_{k|k} and P^{(i+1)}_{k|k}. (c) Next, q^{(i+1)}(α_k) and q^{(i+1)}(β_k) are updated as Beta distributions of the form (27) and (28). Substituting the Beta PDFs into (34) with θ = α_k and θ = β_k shows that the shape parameters are updated as

a^{(i+1)}_k = a_0 + E^{(i+1)}[t_k],  b^{(i+1)}_k = b_0 + 1 − E^{(i+1)}[t_k],   (48)
c^{(i+1)}_k = c_0 + E^{(i+1)}[s_k],  d^{(i+1)}_k = d_0 + 1 − E^{(i+1)}[s_k],   (49)

with the required expectations E[log α_k] = ψ(a_k) − ψ(a_k + b_k) and E[log(1 − α_k)] = ψ(b_k) − ψ(a_k + b_k), and similarly for β_k. After completing (c), the convergence of the iteration is checked against a preset error threshold: if ||x̂^{(i+1)}_{k|k} − x̂^{(i)}_{k|k}|| / ||x̂^{(i)}_{k|k}|| < ε, the iteration stops and the result is output; otherwise, the loop returns to (a) for the next iteration. In addition to the previous state and measurement vector, the proposed filter only needs the two kernel parameters of the Gaussian correntropy mixture and the initial prior Beta shape parameters. Among them, the kernel parameters of the Gaussian correntropy sub-functions can readily be chosen by experience; in most cases, they have minimal impact on the filtering performance. This conclusion will be confirmed in the subsequent simulations.
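Steps (b) and (c) above can be sketched numerically as follows (a hedged illustration; the digamma routine, the function names, and the assumption that the two log-likelihood inputs are precomputed are all our own):

```python
import math

def digamma(x):
    """Numerical digamma: recurrence to push x above 6, then an
    asymptotic expansion (accurate to roughly 1e-8 for x > 0)."""
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0/12 - f * (1.0/120 - f/252))

def expected_log_beta(a, b):
    """E[log alpha] and E[log(1 - alpha)] for alpha ~ Beta(a, b)."""
    return digamma(a) - digamma(a + b), digamma(b) - digamma(a + b)

def bernoulli_responsibility(log_lik1, log_lik2, a, b):
    """q(t_k = 1) from the two hierarchical Gaussian log-likelihoods and
    the current Beta factor Be(a, b) (step (b)), normalized stably."""
    e1, e0 = expected_log_beta(a, b)
    m = max(log_lik1 + e1, log_lik2 + e0)
    p1 = math.exp(log_lik1 + e1 - m)
    p0 = math.exp(log_lik2 + e0 - m)
    return p1 / (p1 + p0)

def beta_shape_update(a0, b0, Et):
    """Step (c): shape update a_k = a0 + E[t_k], b_k = b0 + 1 - E[t_k]."""
    return a0 + Et, b0 + 1.0 - Et
```

Chaining these two updates inside the loop with the KF-like state update of step (a) reproduces the fixed-point structure described above.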
On the other hand, the choice of the prior Beta distribution parameters merits a brief discussion. By the properties of the Beta distribution, the initial mixing probability ρ is determined by its prior shape parameters. If a_0 = c_0 = 0 or b_0 = d_0 = 0, the proposed filter reduces to the original MCKF. To reduce the number of input parameters and simplify the algorithm, it is generally assumed that a_0 = c_0 and b_0 = d_0.
In this work, two Gaussian correntropies with different kernel parameters are used to cope with the stable filtering process and with processes corrupted by dynamic abnormal errors, such as impulsive noise disturbances. To ensure filtering accuracy and stability, the prior parameters are not set arbitrarily but follow a certain regularity. For example, in an ideal Gaussian noise environment we expect E^{(i+1)}[t_k] → 1, so that the mixture correntropy converges to the Gaussian correntropy with the larger kernel parameter; the filter then keeps basic robustness while retaining convergence to the KF as far as possible. Expanding (41), we find that E^{(i+1)}[t_k] depends on a term C_k that is independent of a_0 and b_0, and according to (41) and (42), a_0 > b_0 is required if the distributions of measurement and state noise nearly meet the ideal Gaussian assumption. In contrast, for a filtering process corrupted by severe impulsive noise, the mixing probability should be redistributed appropriately to enhance robustness to abnormal noise. By the definition of Gaussian correntropy in Equation (2), if σ_1 > σ_2, then for a residual term e_i with significant abnormality there is a great difference between G_σ1(e_i) and G_σ2(e_i), with G_σ1(e_i) ≫ G_σ2(e_i). It is therefore inferred that C_k increases significantly and can be taken as a constant term regulating the transition of E^{(i+1)}[t_k], so a_0 > b_0 still applies in this case. Consequently, in this work the parameters can be chosen within the range 0.85 ≤ a_0 ≤ 0.95 with b_0 = 1 − a_0, which can be used as a typical parameter configuration. On this basis, it may sometimes be useful to make minor adjustments to the initial parameter configuration according to the specific application; this contributes to further improving the algorithm's performance in complex environments. The related results are shown in the later simulations.
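A quick numerical check of this reasoning, with values matching the default configuration σ_1 = 9, σ_2 = 3, a_0 = 0.9 used later (an illustration only):

```python
import math

def G(e, sigma):
    """Gaussian correntropy kernel from Eq. (2)."""
    return math.exp(-e**2 / (2 * sigma**2))

# For a large (abnormal) residual, the small-kernel correntropy collapses
# while the large-kernel one stays appreciable, so G_s1/G_s2 grows sharply.
s1, s2, e = 9.0, 3.0, 10.0
ratio = G(e, s1) / G(e, s2)          # much greater than 1 for outliers

# Typical prior configuration: a0 in [0.85, 0.95], b0 = 1 - a0, giving a
# prior mixing-probability mean E[mu] = a0 / (a0 + b0) = a0.
a0 = 0.9
prior_mean = a0 / (a0 + (1 - a0))
```

For e = 10 the ratio exceeds two orders of magnitude, which is the mechanism that pushes E^{(i+1)}[t_k] toward the large-kernel branch only when residuals are small.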

Example I: 2-D Moving Target Tracking Model
Consider a two-dimensional (2D) moving target tracking system of the form (5)-(6), in which Γ represents the process noise gain matrix, the sampling period T = 0.2, w = 0.2, and the total number of simulation steps is N = 200/T. It is assumed that Q_k = 0.1 I_2 and R_k = 10 I_2. The true initial state is x_0 = [1 1 1 1]ᵀ, the initial state estimate is x̂_0 = [0 0 0 0]ᵀ, and the initial error covariance matrix is P_0|0 = I_4. In addition to the KF, the HKF with loss function parameter r = 1.345 and the MCKF with typical kernel parameters σ = 2, σ = 3, σ = 5 and σ = 9 are used for comparison. For the algorithm proposed in this paper, we chose σ_1 = 9, σ_2 = 3 and a_0 = 0.9 as the default initial parameters, and the maximum number of iterations N_m = 10 for the robust filters. The numerical test was coded in MATLAB and executed on a computer with an Intel Core i7-9700 CPU @ 3.0 GHz. To evaluate the filtering performance, the root mean square error (RMSE) and averaged RMSE (ARMSE) of the position and velocity estimates are chosen as evaluation indicators, defined as

RMSE_pos(k) = sqrt( (1/M) Σ_{s=1}^{M} [ (x_k^s − x̂_k^s)² + (y_k^s − ŷ_k^s)² ] ),
ARMSE_pos = (1/N) Σ_{k=1}^{N} RMSE_pos(k),

where (x_k^s, y_k^s) and (x̂_k^s, ŷ_k^s) are the true and estimated positions (or velocities) at the kth step of the sth Monte Carlo run, M is the number of Monte Carlo runs, and RMSE_pos(k) is the RMSE of position at step k. The ARMSEs of position and velocity are denoted as ARMSE_pos and ARMSE_vel, respectively [12,21].
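The two evaluation indicators can be computed directly from the Monte Carlo results (a minimal sketch; function names and the array layout are our own):

```python
import numpy as np

def rmse_position(true_pos, est_pos):
    """Per-step position RMSE over M Monte Carlo runs.

    true_pos, est_pos: arrays of shape (M, N, 2) holding (x, y) per
    run and step. Returns an array of length N with
    RMSE(k) = sqrt( (1/M) * sum_s [ (x - xhat)^2 + (y - yhat)^2 ] )."""
    sq = np.sum((np.asarray(true_pos) - np.asarray(est_pos)) ** 2, axis=2)
    return np.sqrt(np.mean(sq, axis=0))

def armse(rmse_per_step):
    """Averaged RMSE over all N steps."""
    return float(np.mean(rmse_per_step))
```

The velocity indicators follow by passing the velocity components instead of the positions.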
To verify whether the proposed filtering algorithm solves the accuracy degradation problem of the original MCKF in non-stationary non-Gaussian noise conditions, the simulation was divided into two stages, and noises with different distributions were used to test the performance of the proposed filter. To simulate non-Gaussian noise sequences contaminated by impulsive interference, Gaussian mixture distribution models with specific parameters were used to generate the noise.
The number of Monte Carlo runs was M = 1000, and noises with different specific parameters were set for the two stages. The RMSEs of the position and velocity from the different filters are shown in Figures 1 and 2, and the ARMSEs of all filters at each stage are listed in Table 1 for comparison. In the intervals (0, N/2] and (N/2, N], the process and measurement vectors were contaminated by non-Gaussian heavy-tailed noise with different distributions. The plots show obvious differences between the RMSEs of the filters, and the best kernel size for the MCKF varied from stage to stage. For example, MCKF2 (σ = 3) and MCKF3 (σ = 5) achieved the higher accuracy at the first and second stages, respectively; nevertheless, a kernel size that was too large or too small caused the filtering performance to degrade or even diverge, as shown by MCKF1 (σ = 2) and MCKF4 (σ = 9). In comparison, the proposed filter had the lowest estimation error at each stage, especially when the noise distribution changed. The results show that the proposed VB inference method played a positive role in the filtering process, taking into account both stability and robustness to non-Gaussian noise. By comparing the filtering performance over all stages, the superiority of the proposed algorithm is preliminarily verified.

For existing robust filters with fixed parameters, such as the classic MCKF or HKF, the filtering parameters can be obtained by experience or by trial and error, which is more applicable to stationary noise conditions. However, as mentioned before, the classic MCKF does not always achieve satisfactory estimation accuracy in non-stationary noise environments. To show this change more concretely, the process and measurement noise distributions can be expressed as w_k ∼ (1 − p_1)N(0, Q_k) + p_1 N(0, 100Q_k) and v_k ∼ (1 − p_2)N(0, R_k) + p_2 N(0, 100R_k), where p_1 and p_2 represent the outlier percentages of the noises.
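A noise sequence of this Gaussian-mixture form can be generated as follows (a small sketch; function name and defaults are our own, with the covariance inflation factor of 100 taken from the model above):

```python
import numpy as np

def gm_noise(cov, p, scale=100.0, size=1, rng=None):
    """Draw from (1-p)*N(0, cov) + p*N(0, scale*cov): nominal Gaussian
    noise with probability 1-p, an inflated-covariance outlier with
    probability p."""
    rng = np.random.default_rng() if rng is None else rng
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    d = cov.shape[0]
    outlier = rng.random(size) < p               # which samples are outliers
    L = np.linalg.cholesky(cov)
    z = rng.standard_normal((size, d)) @ L.T     # nominal N(0, cov) draws
    samples = np.empty((size, d))
    samples[~outlier] = z[~outlier]
    samples[outlier] = np.sqrt(scale) * z[outlier]
    return samples
```

Sweeping p from 0 to 0.15 reproduces the contamination levels used in Figures 3 and 4.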
In Figures 3 and 4, the outlier percentage of the process noise was fixed at p_1 = 0.05, and the ARMSEs of the different filters are shown as p_2 varies from 0 to 0.15. The ARMSEs showed an overall increasing trend: as the proportion of impulsive noise increased, the estimation accuracy of the filters decreased. In this test, the MCKFs with different kernel parameters exhibited significant differences in filtering accuracy. This means that, under the interference of different non-Gaussian noises, an MCKF with a fixed kernel parameter could not ensure reliable estimation results; the MCKFs lack sufficient self-adaptive ability.
In addition, the fixed-parameter MMCKF without variational Bayesian iteration was also included for comparison. The comparison shows that the MMCKF always converges to the MCKF with some specific fixed parameter. Therefore, although it achieves better accuracy than the MCKF in some cases, it does not avoid the same accuracy degradation problem. Since the filtering results of the MMCKF are similar to those of a typical MCKF with specific parameters, the comparison and discussion of the simulation mainly focus on the classic MCKF.
Compared with other filtering results, it can be concluded that the proposed filter has better performance than other existing algorithms. On the one hand, the algorithm does not lose much optimality of estimation caused by changing the objective function of KF, and on the other hand, it retains robustness to the increased impulsive noise probability. Therefore, the proposed filter further demonstrates its performance advantage in various non-Gaussian noise environments.
In addition, simulation tests were performed to compare the filtering performance under different initial parameter configurations. For the proposed IMMCKF, the Gaussian correntropies G_σ1(e) and G_σ2(e) are mixed to generate the mixture correntropy, where σ_1 and σ_2 represent the specific kernel parameters and σ_1 > σ_2. It has been proved that Gaussian correntropy with a smaller kernel bandwidth is more sensitive to impulsive noise; however, a smaller bandwidth also degrades the filtering stability and might lead to divergence. Table 2 lists the RMSEs of the proposed filter with different σ_1 and σ_2. In general, the differences between the filtering results with different σ_1 and σ_2 were not significant. Within a certain range, the filtering accuracy improved with increasing σ_1, as this provides better compatibility with Gaussian noise. As too large a kernel bandwidth may reduce the Gaussian correntropy's robustness to impulsive noise, parameter sets with σ_1 much greater than σ_2 were not considered. Therefore, generally speaking, it is easy to choose appropriate σ_1 and σ_2 for the proposed algorithm. Several potential shape parameter options were also compared: Table 3 lists the ARMSEs of position and velocity from the proposed filter with different initial shape parameters. As demonstrated, the estimation accuracy can be improved somewhat by fine-tuning these parameters within the given range, but in general they do not significantly affect the overall filtering performance. In summary, for the state estimation problem in non-stationary noises, the proposed filter has good compatibility with the initial parameter sets. To balance computational efficiency and filtering performance, it is also necessary to choose a reasonable number of iterations for the proposed filter.
Figure 5 shows the ARMSEs of the filters with different numbers of iterations. For comparison, several filtering results with similar performance in the simulation are also taken into account. It can be concluded that the accuracy of the proposed algorithm improves greatly after several iterations, and the filtering result gradually converges. In practical applications, increasing the number of iterations brings more computational burden; as shown in Figure 5, the filtering algorithm already obtains satisfactory estimation accuracy when the number of iterations N_m is 3~5. Therefore, considering both accuracy and computational burden, setting N_m to 3~5 is reasonable. The implementation times of the proposed and existing filters for a single-step run with N_m = 1 were also evaluated: the KF (0.0054 ms) is the fastest; the MCKF (0.0253 ms), HKF (0.0251 ms) and MMCKF (0.0212 ms) have similar computational burdens; and the proposed IMMCKF takes 0.0490 ms. Compared with the MCKF and MMCKF, the computational complexity of the proposed filter increases due to the additional variational Bayesian iterations. In view of this, the additional computation cost of the IMMCKF can be limited by adjusting the number of iterations, so it remains feasible to apply the algorithm in real-time applications.

Example II: INS/GPS Integrated Navigation System
To validate the effectiveness and superiority of the proposed algorithm, experimental data collected in a vehicle-mounted INS/GPS integrated navigation experiment were used for the test. The experiment was carried out on the campus of Harbin Engineering University, and the test trajectory is shown in Figure 6. A low-cost MEMS-IMU-based INS/GPS integrated navigation system was used to provide the navigation data.
For reference, an INS/GNSS integrated navigation system comprising a self-made navigation-grade fiber-optic strapdown INS and a double-antenna GPS receiver was used. The initial velocity and position of the INS/GPS were obtained directly from the GPS measurements, and the initial level attitude was acquired from the alignment results of the high-accuracy SINS. During the experimental test, the car moved along a bumpy road, and the GPS might work abnormally due to occlusion by trees and buildings.
The sampling frequencies of the low-cost IMU and GPS were 100 Hz and 1 Hz, respectively. A loosely coupled configuration was used in the integration, with a linear closed-loop feedback scheme as in [15,23]; the state vector is defined by the error states of this loosely coupled model. The filter only performs the time update when there is no GPS output. To compare the robustness of the different filters to non-Gaussian measurement noise, inspired by the schemes in [34,35], gross errors were added artificially to the measurement data. In the integrated navigation system, the velocity and position state variables are the most susceptible to external interference, so the velocity and position errors are used for comparison. Several typical filtering results are shown in Figures 7 and 8. Most filters obtain similar results in stable periods, as the measurement output is reliable there. For the existing fixed-parameter robust filters, however, uncertain interference factors in practical applications can be decisive: in this example, both MCKF1 (σ = 2) and MCKF2 (σ = 3) show a divergence trend during the filtering process, whereas MCKFs with larger kernel parameters and the proposed filter behave more stably.
In this example, the measurement sequence is disturbed by impulsive noise with a certain probability. It is evident from the figures that the estimation accuracy of the KF is seriously corrupted during filtering. Generally, a small kernel size is more effective at attenuating measurement outliers, but it inevitably decreases the filter's stability; e.g., MCKF1 (σ = 2) failed to obtain reliable filtering results due to divergence. Comparing the robustness of MCKF3 (σ = 5), MCKF4 (σ = 9) and the other algorithms to non-Gaussian noise, it can be concluded that it is difficult for existing MCKFs to obtain estimation results that are both stable and robust. The results listed in Table 4, where NaN denotes an invalid output due to filter divergence, confirm that the filter proposed in this work solves this problem well. Compared with the other robust filters, the proposed filter achieves better filtering results by taking into account both robustness and stability, which corroborates the conclusions of the previous simulation example.

Conclusions
In this work, the performance degradation problem of the existing MCKF in non-stationary noises is explored, and a new improved mixture correntropy filtering algorithm is proposed as an effective solution. To cope with dynamic processes in which both Gaussian and non-Gaussian noises may occur, intermediate random variables are used to construct the mixture correntropy. By derivation, the state variables and intermediate parameters are approximated via a variational Bayesian approach. The theoretical derivation and numerical test results show that the proposed method significantly improves upon existing MCKF algorithms in different conditions, offering a promising solution to the robust filtering problem in complex noise environments.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.