Student’s t-Kernel-Based Maximum Correntropy Kalman Filter

The state estimation problem is ubiquitous in many fields, and the most common state estimation method is the Kalman filter. However, the Kalman filter is based on the minimum mean square error criterion, which captures only the second-order statistics of the noise and is sensitive to large outliers. In many areas of engineering, the noise may be non-Gaussian and outliers may arise naturally. Therefore, the performance of the Kalman filter may deteriorate significantly in non-Gaussian noise environments. To improve the accuracy of state estimation in this case, a novel filter named the Student's t kernel-based maximum correntropy Kalman filter is proposed in this paper. In addition, since the fixed-point iteration method is used to solve for the optimal state estimate in the filtering algorithm, the convergence of the algorithm is also analyzed. Finally, comparative simulations are conducted, and the results demonstrate that with proper kernel function parameters, the proposed filter outperforms conventional filters such as the Kalman filter, the Huber-based filter, and the maximum correntropy Kalman filter.


Introduction
The state estimation problem is ubiquitous in various applications, such as navigation [1], target tracking [2], and so on [3,4]. The most common state estimation method is the Kalman filter (KF), which has been successfully used in many fields [5][6][7]. For a linear system with additive Gaussian noise, the KF achieves optimal state estimation. However, most real-world systems are nonlinear, which limits the application of the KF. To solve the state estimation problem for nonlinear systems, many novel filters, such as the extended Kalman filter (EKF) [8], unscented Kalman filter (UKF) [9], quadrature Kalman filter (QKF) [10], cubature Kalman filter (CKF) [11], and so forth, have been proposed in the last few decades. The basic idea of the EKF is to linearize the nonlinear system by the Taylor expansion technique, truncating the Taylor series at the first-order term. As a result, the EKF requires the system model to be differentiable, and the Jacobian matrices must be calculated, resulting in high computational complexity. Moreover, for strongly nonlinear systems, the first-order linearization inevitably introduces non-negligible linearization errors, which may degrade the state estimation performance. The UKF utilizes the unscented transformation (UT) technique to propagate the mean and covariance of the system state through the nonlinearity, which avoids the derivation of the Jacobian matrices. Compared with the EKF, it has better performance, especially for strongly nonlinear systems. To further improve the estimation accuracy, various numerical integration methods have been introduced into the filter, such as cubature rule-based filters and quadrature rule-based filters. These methods improve the numerical approximation accuracy of intractable integrals, which leads to a more accurate characterization of the original probability density functions and, correspondingly, enhanced estimation accuracy [12].
The main contributions of this paper are summarized as follows:
• A novel maximum correntropy Kalman filter, named the Student's t kernel-based maximum correntropy Kalman filter (STKKF), is developed, in which the Student's t kernel function replaces the conventional Gaussian kernel function.
• Since the fixed-point iteration method is used to update the posterior estimate of the state in the STKKF, a convergence analysis under a certain condition is given.
• Comparative simulations with other filters are conducted to demonstrate the superiority of the STKKF.
The rest of the paper is organized as follows. In Section 2, basic knowledge about correntropy and the Kalman filter is introduced briefly. In Section 3, the STKKF, based on the Student's t kernel maximum correntropy criterion, is derived. The convergence of the filter is analyzed in Section 4. To evaluate the performance of the STKKF, comparative simulations are conducted in Section 5. Finally, a discussion is given in Section 6.

Correntropy
The correntropy was originally proposed to measure similarity across lags, as a generalized autocorrelation of random processes [34], and was later extended to measure the localized similarity between two arbitrary random variables [35]. Let X and Y represent two random variables; then, the correntropy between them can be defined as

$$V(X, Y) = E\left[\kappa(X, Y)\right] = \iint \kappa(x, y)\, p_{X,Y}(x, y)\,\mathrm{d}x\,\mathrm{d}y, \tag{1}$$

where E(·) is the expectation operator, κ(·, ·) is the kernel function, and p_{X,Y}(x, y) represents the joint probability density function (PDF) of X and Y.
The most widely used kernel is the Gaussian kernel function, defined as

$$\kappa(x, y) = G_\sigma(e) = \exp\!\left(-\frac{e^2}{2\sigma^2}\right), \qquad e = x - y, \tag{2}$$

where σ represents the Gaussian kernel bandwidth.
To better capture the heavy-tailed features of the noise, Student's t kernel function [33] is used in this paper in place of the Gaussian kernel function. It is defined as

$$\kappa(x, y) = S_{v,\sigma}(e) = \left(1 + \frac{e^2}{v\sigma^2}\right)^{-\frac{v+1}{2}}, \qquad e = x - y, \tag{3}$$

where v is used to control the shape of Student's t kernel function, and σ is the kernel bandwidth.
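To make the contrast between the two kernels concrete, the following minimal NumPy sketch (illustrative only; the paper's experiments use MATLAB) evaluates both and shows that Student's t kernel decays polynomially rather than exponentially, so large errors retain non-negligible kernel values:

```python
import numpy as np

def gaussian_kernel(e, sigma):
    # G_sigma(e) = exp(-e^2 / (2 sigma^2))
    return np.exp(-e**2 / (2.0 * sigma**2))

def student_t_kernel(e, v, sigma):
    # S_{v,sigma}(e) = (1 + e^2 / (v sigma^2))^(-(v+1)/2)
    return (1.0 + e**2 / (v * sigma**2)) ** (-(v + 1) / 2.0)

e = np.linspace(-10.0, 10.0, 5)
print(gaussian_kernel(e, sigma=2.0))
print(student_t_kernel(e, v=3.0, sigma=2.0))
```

At e = 10 with σ = 2, the Gaussian kernel is of order 10⁻⁶ while the Student's t kernel with v = 3 is of order 10⁻², which is precisely the heavy-tail behavior exploited by the proposed filter.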
In real applications, the joint PDF of the random variables is difficult to obtain. Therefore, the sample mean estimator of the correntropy is often used, as shown in the following equation:

$$\hat{V}(X, Y) = \frac{1}{N}\sum_{i=1}^{N} \kappa(x_i, y_i) = \frac{1}{N}\sum_{i=1}^{N} \kappa(e_i), \tag{4}$$

where e_i = x_i − y_i, and {(x_i, y_i)}, i = 1, …, N, are N samples drawn from p_{X,Y}(x, y).
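A direct sketch of this sample estimator follows (the kernel parameters and synthetic data are illustrative assumptions; `kernel` is any callable on the error e_i):

```python
import numpy as np

def sample_correntropy(x, y, kernel):
    # hat{V}(X, Y) = (1/N) * sum_i kernel(e_i), with e_i = x_i - y_i
    e = np.asarray(x) - np.asarray(y)
    return float(np.mean(kernel(e)))

# Student's t kernel with illustrative parameters v = 3, sigma = 1
kernel = lambda e: (1.0 + e**2 / (3.0 * 1.0**2)) ** (-(3.0 + 1.0) / 2.0)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
similar = sample_correntropy(x, x + rng.normal(scale=0.1, size=1000), kernel)
dissimilar = sample_correntropy(x, -x, kernel)
print(similar, dissimilar)  # the more similar pair scores higher
```

Because the kernel is bounded by 1 and peaks at zero error, correntropy acts as a localized similarity measure: nearly identical signals score close to 1, while dissimilar ones score lower.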

Kalman Filter
Consider the following linear stochastic system represented by the state-space model

$$x_k = F_k x_{k-1} + q_{k-1}, \qquad y_k = H_k x_k + r_k, \tag{5}$$

where k is the discrete time index, x_k ∈ R^n is the state vector, y_k ∈ R^m is the measurement vector, and F_k and H_k are the known state transition matrix and measurement matrix, respectively. q_{k−1} ∈ R^n and r_k ∈ R^m are the mutually independent process and measurement noises, respectively, which satisfy

$$E(q_{k-1}) = 0, \quad E(r_k) = 0, \quad E\!\left(q_{k-1} q_{k-1}^T\right) = Q_{k-1}, \quad E\!\left(r_k r_k^T\right) = R_k. \tag{6}$$

In general, the KF includes the following two steps:

1. One-step state prediction: The a priori state estimate x̂_{k|k−1} and the corresponding error covariance matrix P_{k|k−1} can be given by

$$\hat{x}_{k|k-1} = F_k \hat{x}_{k-1|k-1}, \qquad P_{k|k-1} = F_k P_{k-1|k-1} F_k^T + Q_{k-1}. \tag{7}$$

2. Measurement update: The a posteriori state estimate x̂_{k|k} and the corresponding error covariance matrix P_{k|k} can be given by

$$K_k = P_{k|k-1} H_k^T \left(H_k P_{k|k-1} H_k^T + R_k\right)^{-1}, \qquad \hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\left(y_k - H_k \hat{x}_{k|k-1}\right), \qquad P_{k|k} = \left(I - K_k H_k\right) P_{k|k-1}, \tag{8}$$

where K_k is the KF gain matrix.
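The two steps above can be sketched in a few lines of NumPy (an illustration of the standard KF recursion on a toy system, not the paper's code):

```python
import numpy as np

def kf_step(x_est, P, y, F, H, Q, R):
    """One predict/update cycle of the standard Kalman filter."""
    # 1. One-step state prediction
    x_pred = F @ x_est
    P_pred = F @ P @ F.T + Q
    # 2. Measurement update
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(len(x_est)) - K @ H) @ P_pred
    return x_new, P_new

# toy example: static 2-state system observed directly
x, P = np.zeros(2), np.eye(2)
x, P = kf_step(x, P, np.array([1.0, 1.0]),
               F=np.eye(2), H=np.eye(2), Q=0.01 * np.eye(2), R=np.eye(2))
print(x)  # the estimate moves roughly halfway toward the measurement
```

With equal prior and measurement uncertainty, the gain is close to 1/2, so the update splits the difference between prediction and measurement, as expected from the MMSE interpretation.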

Student's t Kernel-Based Maximum Correntropy Kalman Filter
The traditional KF is based on the minimum mean square error (MMSE) criterion and performs well under Gaussian noises. However, when the noises are non-Gaussian or large outliers arise in the measurements, the performance of the KF may degrade significantly. Correntropy contains second- and higher-order moments of the error and is inherently insensitive to outliers. Therefore, filters based on the maximum correntropy criterion outperform traditional filters in non-Gaussian noise environments, and they are more robust to abnormal measurements. However, the performance of these filters is mainly determined by the kernel function and its parameters. The common Gaussian kernel function may overlook the heavy-tailed properties of heavy-tailed noises, which results in a decrease in estimation accuracy. To better utilize the heavy-tailed features and improve the estimation accuracy of the system state, Student's t kernel function is used in place of the Gaussian kernel function to model and process the heavy-tailed noise.
For the linear system represented by Equation (5), the following augmented model can be obtained:

$$\begin{bmatrix} \hat{x}_{k|k-1} \\ y_k \end{bmatrix} = \begin{bmatrix} I \\ H_k \end{bmatrix} x_k + v_k, \tag{9}$$

where $v_k = \left[-\left(x_k - \hat{x}_{k|k-1}\right)^T,\; r_k^T\right]^T$, and the corresponding covariance matrix can be given as

$$E\!\left(v_k v_k^T\right) = \begin{bmatrix} P_{k|k-1} & 0 \\ 0 & R_k \end{bmatrix} = \begin{bmatrix} B_p B_p^T & 0 \\ 0 & B_r B_r^T \end{bmatrix}, \tag{10}$$

where B_p and B_r can be obtained from the Cholesky decomposition of P_{k|k−1} and R_k, respectively. When Student's t kernel function is employed, the cost function based on the maximum correntropy criterion can be given as

$$J(x_k) = \sum_{i=1}^{n} S_{v,\sigma}\!\left(e_{x,i}\right) + \sum_{j=1}^{m} S_{v,\sigma}\!\left(e_{y,j}\right), \tag{11}$$

where S_{v,σ}(·) is Student's t kernel function, and e_{x,i} and e_{y,j} represent the ith and jth elements of e_x and e_y, respectively. The e_x and e_y are given by

$$e_x = B_p^{-1}\left(\hat{x}_{k|k-1} - x_k\right), \qquad e_y = B_r^{-1}\left(y_k - H_k x_k\right). \tag{12}$$

According to the maximum correntropy criterion, to obtain the optimal state estimate x̂_{k|k}, the following equation should be solved:

$$\hat{x}_{k|k} = \arg\max_{x_k} J(x_k) \;\Longleftrightarrow\; \frac{\partial J(x_k)}{\partial x_k} = 0. \tag{13}$$

Substituting Equations (11) and (12) into Equation (13), the following equation can be obtained:

$$\sum_{i=1}^{n} \lambda\!\left(e_{x,i}\right) e_{x,i}\, B_{p,i}^T + \sum_{j=1}^{m} \lambda\!\left(e_{y,j}\right) e_{y,j} \left(B_{r,j} H_k\right)^T = 0, \qquad \lambda(e) = \left(1 + \frac{e^2}{v\sigma^2}\right)^{-\frac{v+3}{2}}, \tag{14}$$

where B_{p,i} and B_{r,j} represent the ith and the jth rows of B_p^{−1} and B_r^{−1}, respectively, and the common factor (v + 1)/(vσ²) arising from the derivative of the kernel has been canceled from both sides. The matrix form of Equation (14) can be expressed as

$$B_p^{-T}\Lambda_x B_p^{-1}\left(\hat{x}_{k|k-1} - x_k\right) + H_k^T B_r^{-T}\Lambda_y B_r^{-1}\left(y_k - H_k x_k\right) = 0, \tag{15}$$

where

$$\Lambda_x = \mathrm{diag}\!\left(\lambda(e_{x,1}), \ldots, \lambda(e_{x,n})\right), \qquad \Lambda_y = \mathrm{diag}\!\left(\lambda(e_{y,1}), \ldots, \lambda(e_{y,m})\right). \tag{16}$$

Let

$$\tilde{P}_{k|k-1} = B_p \Lambda_x^{-1} B_p^T, \qquad \tilde{R}_k = B_r \Lambda_y^{-1} B_r^T. \tag{17}$$

Then, Equation (15) can be rewritten as

$$\tilde{P}_{k|k-1}^{-1}\left(\hat{x}_{k|k-1} - x_k\right) + H_k^T \tilde{R}_k^{-1}\left(y_k - H_k x_k\right) = 0. \tag{18}$$

Adding and subtracting $H_k^T \tilde{R}_k^{-1} H_k \hat{x}_{k|k-1}$ on the right side of Equation (18) and rearranging, the following equation can be obtained:

$$\left(\tilde{P}_{k|k-1}^{-1} + H_k^T \tilde{R}_k^{-1} H_k\right) x_k = \left(\tilde{P}_{k|k-1}^{-1} + H_k^T \tilde{R}_k^{-1} H_k\right)\hat{x}_{k|k-1} + H_k^T \tilde{R}_k^{-1}\left(y_k - H_k \hat{x}_{k|k-1}\right). \tag{19}$$

Then, multiplying both sides of Equation (19) by $\left(\tilde{P}_{k|k-1}^{-1} + H_k^T \tilde{R}_k^{-1} H_k\right)^{-1}$, we have

$$x_k = \hat{x}_{k|k-1} + \tilde{K}_k\left(y_k - H_k \hat{x}_{k|k-1}\right), \tag{20}$$

where

$$\tilde{K}_k = \left(\tilde{P}_{k|k-1}^{-1} + H_k^T \tilde{R}_k^{-1} H_k\right)^{-1} H_k^T \tilde{R}_k^{-1} = \tilde{P}_{k|k-1} H_k^T\left(H_k \tilde{P}_{k|k-1} H_k^T + \tilde{R}_k\right)^{-1}. \tag{21}$$

Accordingly, the a posteriori error covariance matrix P_{k|k} can be given by

$$P_{k|k} = \left(I - \tilde{K}_k H_k\right) P_{k|k-1}\left(I - \tilde{K}_k H_k\right)^T + \tilde{K}_k R_k \tilde{K}_k^T. \tag{22}$$

It can be seen from Equations (16) and (17) that K̃_k is nonlinear with respect to x_k, through Λ_x and Λ_y. The fixed-point iterative method is therefore used to solve Equation (20).
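The role of the kernel in this fixed point can be read off from the per-residual weights: assuming the Student's t kernel form (1 + e²/(vσ²))^(−(v+1)/2), differentiating it yields a weight proportional to (1 + e²/(vσ²))^(−(v+3)/2) on each whitened residual component, so outliers are down-weighted smoothly rather than cut off. A small NumPy sketch:

```python
import numpy as np

def t_weight(e, v, sigma):
    # weight from the derivative of Student's t kernel (constant factored out):
    # lambda(e) = (1 + e^2 / (v * sigma^2))^(-(v + 3) / 2)
    return (1.0 + e**2 / (v * sigma**2)) ** (-(v + 3) / 2.0)

# small residuals keep weights near 1; outliers are strongly down-weighted
w = t_weight(np.array([0.0, 0.1, 1.0, 10.0]), v=3.0, sigma=2.0)
print(w)
```

These weights are exactly the diagonal entries of Λ_x and Λ_y: a residual near zero is trusted almost fully, while a gross outlier contributes almost nothing to the modified covariances.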
In general, the STKKF algorithm can be given as follows:

1. Initialization: The parameters v and σ of Student's t kernel function are chosen appropriately, and a small number ε ∈ R^+, used as the iteration termination threshold, is given. The initial state x̂_0 and error covariance matrix P_0 are set.
2. State prediction: The one-step state prediction x̂_{k|k−1} and the corresponding error covariance matrix P_{k|k−1} are the same as those in the KF, and can be obtained by Equation (7).
3. Posterior state estimation:
(a) Calculate the matrices B_p and B_r by the Cholesky decomposition of P_{k|k−1} and R_k, respectively.
(b) Set the initial iterate x̂_k(0) = x̂_{k|k−1} and the iteration counter l = 0.
(c) Calculate the state estimate at the (l + 1)th iteration by the following equation:

$$\hat{x}_k(l+1) = \hat{x}_{k|k-1} + \tilde{K}_k(l)\left(y_k - H_k \hat{x}_{k|k-1}\right), \tag{23}$$

where K̃_k(l) is computed from Equations (12), (16), (17), and (21) with x_k = x̂_k(l).
(d) Check whether the state estimate in this iteration meets the termination condition

$$\frac{\left\|\hat{x}_k(l+1) - \hat{x}_k(l)\right\|}{\left\|\hat{x}_k(l)\right\|} \le \varepsilon. \tag{24}$$

If the termination condition is not met, set l = l + 1, return to step (c), and continue the next iteration. Otherwise, set the final state estimate x̂_{k|k} = x̂_k(l + 1) and go to step 4.
4. Posterior error covariance update: Calculate the corresponding a posteriori error covariance matrix P_{k|k} by Equation (22). Set k = k + 1 and return to step 2.
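A minimal NumPy sketch of the fixed-point measurement update in steps 3 and 4 follows. It assumes the per-residual weight induced by Student's t kernel, a relative-change termination test, and a Joseph-form covariance update; variable names and the toy example are illustrative, not the paper's implementation:

```python
import numpy as np

def t_weight(e, v, sigma):
    # diagonal weight induced by Student's t kernel on each whitened residual
    return (1.0 + e**2 / (v * sigma**2)) ** (-(v + 3) / 2.0)

def stkkf_update(x_pred, P_pred, y, H, R, v, sigma, eps=1e-4, max_iter=50):
    """Fixed-point measurement update of the STKKF (illustrative sketch)."""
    n = len(x_pred)
    Bp = np.linalg.cholesky(P_pred)
    Br = np.linalg.cholesky(R)
    x = x_pred.copy()
    for _ in range(max_iter):
        # residuals in the whitened coordinates
        e_x = np.linalg.solve(Bp, x_pred - x)
        e_y = np.linalg.solve(Br, y - H @ x)
        # modified prior and measurement covariances
        P_mod = Bp @ np.diag(1.0 / t_weight(e_x, v, sigma)) @ Bp.T
        R_mod = Br @ np.diag(1.0 / t_weight(e_y, v, sigma)) @ Br.T
        # modified gain and next iterate
        K = P_mod @ H.T @ np.linalg.inv(H @ P_mod @ H.T + R_mod)
        x_new = x_pred + K @ (y - H @ x_pred)
        done = np.linalg.norm(x_new - x) <= eps * max(np.linalg.norm(x), 1.0)
        x = x_new
        if done:
            break
    I = np.eye(n)
    P_new = (I - K @ H) @ P_pred @ (I - K @ H).T + K @ R @ K.T  # Joseph form
    return x, P_new

# toy example: static 2-state system observed directly
x_upd, P_upd = stkkf_update(np.zeros(2), np.eye(2), np.array([1.0, 0.0]),
                            np.eye(2), np.eye(2), v=3.0, sigma=2.0)
print(x_upd)
```

The loop recomputes the weights, the modified covariances, and the gain from the current iterate, exactly mirroring steps (b)–(d); with no gross outliers the result stays close to the ordinary KF update.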

Theorem 1.
If v is fixed, then as σ → ∞, the STKKF tends to the KF.
Proof. As σ → ∞, the matrices Λ_x and Λ_y in Equation (16) tend to the identity matrix I. Accordingly, P̃_{k|k−1} → P_{k|k−1} and R̃_k → R_k, which means that the STKKF reduces to the KF.

Theorem 2.
If σ is fixed, then as v → ∞, the STKKF tends to the MCKF with kernel bandwidth σ.

Proof.
As v → ∞, the following equation holds:

$$\lim_{v\to\infty} S_{v,\sigma}(e) = \lim_{v\to\infty}\left(1 + \frac{e^2}{v\sigma^2}\right)^{-\frac{v+1}{2}} = \exp\!\left(-\frac{e^2}{2\sigma^2}\right) = G_\sigma(e), \tag{25}$$

where the identity $\lim_{v\to\infty}\left(1 + a/v\right)^{v} = e^{a}$ is used. Student's t kernel function therefore reduces to the Gaussian kernel function, which means that the STKKF tends to the MCKF.
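The limit in the proof can be checked numerically: for a fixed error e and bandwidth σ, the gap between Student's t kernel and the Gaussian kernel shrinks as v grows (NumPy sketch with illustrative values):

```python
import numpy as np

def student_t_kernel(e, v, sigma):
    return (1.0 + e**2 / (v * sigma**2)) ** (-(v + 1) / 2.0)

def gaussian_kernel(e, sigma):
    return np.exp(-e**2 / (2.0 * sigma**2))

e, sigma = 1.5, 2.0
gaps = [abs(student_t_kernel(e, v, sigma) - gaussian_kernel(e, sigma))
        for v in (5.0, 50.0, 5000.0)]
print(gaps)  # the gap shrinks monotonically as v grows
```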

Convergence Analysis of STKKF
The fixed-point iteration method is used in the STKKF to update the posterior state estimate. To ensure that the iterations converge, the convergence of the STKKF is analyzed in this section. The method used is similar to that of Ref. [36], and only a sufficient condition is given.
Equation (15) can be rewritten in the following form:

$$B_p^{-T}\Lambda_x B_p^{-1}\hat{x}_{k|k-1} + H_k^T B_r^{-T}\Lambda_y B_r^{-1} y_k = \left(B_p^{-T}\Lambda_x B_p^{-1} + H_k^T B_r^{-T}\Lambda_y B_r^{-1} H_k\right) x_k. \tag{26}$$

Let

$$W = \begin{bmatrix} B_p^{-1} \\ B_r^{-1} H_k \end{bmatrix}, \qquad D = \begin{bmatrix} B_p^{-1}\hat{x}_{k|k-1} \\ B_r^{-1} y_k \end{bmatrix}, \qquad \Lambda = \begin{bmatrix} \Lambda_x & 0 \\ 0 & \Lambda_y \end{bmatrix}. \tag{27}$$

Then, Equation (26) can be rewritten in the following form:

$$W^T \Lambda W x_k = W^T \Lambda D. \tag{28}$$

Then, x_k can be given as

$$x_k = \left(W^T \Lambda W\right)^{-1} W^T \Lambda D. \tag{29}$$

Firstly, the function f(x_k) = x_k is constructed, and by substituting Equation (16) into Equation (29), f(x_k) can be expressed as

$$f(x_k) = \left(\sum_{i=1}^{n+m} \lambda\!\left(D_i - W_i x_k\right) W_i^T W_i\right)^{-1} \left(\sum_{i=1}^{n+m} \lambda\!\left(D_i - W_i x_k\right) W_i^T D_i\right), \tag{30}$$

where W_i represents the ith row of the matrix W, D_i is the ith component of the vector D, and λ(·) is the diagonal weighting function in Equation (16). The Jacobian matrix of f(x_k) with respect to x_k, ∇_{x_k} f(x_k), can be expressed as in Equation (33), whose jth column is ∂f(x_k)/∂x_{k,j}. Then, the following theorem holds.

Theorem 3. Suppose the parameter v is fixed, β > ξ, and x_k ∈ {R^n : ||x_k||_1 ≤ β}. If the kernel bandwidth σ is larger than a certain value, then ||f(x_k)||_1 ≤ β and ||∇_{x_k} f(x_k)||_1 ≤ α < 1; that is, Equation (34) holds. The expressions of ξ, ϕ(v, σ), and ψ(v, σ) are shown in Equations (35), (36), and (37), respectively.
where ||·||_p denotes the l_p-norm of a vector or the induced matrix norm defined by $\|A\|_p = \sup_{x \neq 0} \|Ax\|_p / \|x\|_p$, and (a) follows from the compatibility of the matrix norm with the vector norm.
According to matrix theory, the following bound holds, where λ_max(·) represents the maximum eigenvalue of a matrix; a similar bound holds for the second term, where step (c) uses the fact that 1 + e²/(vσ²) ≥ 1. If the parameter v is fixed, then ϕ(v, σ) is a monotonically decreasing function of σ, and lim_{σ→∞} ϕ(v, σ) = ξ. Therefore, for every β > ξ there exists a unique σ* ∈ (0, ∞) such that ϕ(v, σ*) = β, and when σ > σ*, ||f(x_k)||_1 ≤ β.

According to matrix theory, to prove ||∇_{x_k} f(x_k)||_1 ≤ α, it suffices to prove ||∂f(x_k)/∂x_{k,j}||_1 ≤ α for all j. From Equation (33), the following bound can be derived, where step (d) uses the fact that ||f(x_k)||_1 ≤ β when σ > σ*, and step (e) follows from the convexity of the l_1-norm. According to Equations (40), (46), and (47), the bound ||∂f(x_k)/∂x_{k,j}||_1 ≤ ψ(v, σ) is obtained. In addition, if the parameter v is fixed, then ψ(v, σ) is a monotonically decreasing function of σ with lim_{σ→∞} ψ(v, σ) = 0. Therefore, for every α ∈ (0, 1) there exists a unique σ+ ∈ (0, ∞) such that ψ(v, σ+) = α, and when σ > σ+, ||∇_{x_k} f(x_k)||_1 ≤ α.

Based on the above derivation, we conclude that when the parameter v is fixed, σ > max(σ*, σ+), and x_k ∈ {R^n : ||x_k||_1 ≤ β}, both ||f(x_k)||_1 ≤ β and ||∇_{x_k} f(x_k)||_1 ≤ α < 1 hold. The theorem is proved completely.
By Theorem 3 and the Banach fixed-point theorem [37], if the l_1-norm of the initial iterate satisfies ||x̂_k(0)||_1 ≤ β, then the STKKF will converge to the unique fixed point in the region {x_k ∈ R^n : ||x_k||_1 ≤ β}, provided that the kernel bandwidth σ is larger than a certain value.
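The role of the Banach fixed-point theorem can be illustrated with a generic contraction (g(x) = cos x, whose derivative has magnitude below 1 near its fixed point; this is a standalone illustration, not the STKKF map itself). The termination test mirrors the relative-change condition of Equation (24):

```python
import math

def fixed_point(g, x0, eps=1e-10, max_iter=200):
    # iterate a contraction g until the relative change falls below eps;
    # for a contraction, successive differences shrink geometrically
    x = x0
    for l in range(max_iter):
        x_next = g(x)
        if abs(x_next - x) <= eps * max(abs(x), 1.0):
            return x_next, l + 1
        x = x_next
    return x, max_iter

x_star, iters = fixed_point(math.cos, 1.0)
print(x_star, iters)  # converges to the unique fixed point of cos, ~0.739085
```

The contraction constant plays the role of α in Theorem 3: the smaller it is, the faster the iterates converge, which matches the observation in Section 5 that larger kernel bandwidths need fewer iterations.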
The implementation pseudocode of the STKKF is shown in Algorithm 1.

Algorithm 1:
The implementation pseudocode for one time step of the STKKF.
Inputs: x̂_{k−1|k−1}, P_{k−1|k−1}, y_k, v, σ, ε.
Step 1: Compute x̂_{k|k−1} and P_{k|k−1} by Equation (7); compute B_p and B_r by the Cholesky decomposition of P_{k|k−1} and R_k; set x̂_k(0) = x̂_{k|k−1} and l = 0.
Step 2: Compute e_x(l) and e_y(l) by Equation (12) with x_k = x̂_k(l).
Step 3: Compute Λ_x(l) and Λ_y(l) by Equation (16).
Step 4: Compute P̃_{k|k−1}(l) and R̃_k(l) by Equation (17).
Step 5: Compute K̃_k(l) by Equation (21).
Step 6: Compute x̂_k(l + 1) by Equation (20).
Step 7: Check the termination condition ||x̂_k(l + 1) − x̂_k(l)|| / ||x̂_k(l)|| ≤ ε; if the termination condition is met, then set x̂_{k|k} = x̂_k(l + 1) and go to Step 8; otherwise, set l = l + 1, return to Step 2, and continue the next iteration.
Step 8: Compute P_{k|k} by Equation (22).
Outputs: x̂_{k|k}, P_{k|k}.

Simulations and Results
In this section, simulations are conducted to demonstrate the performance of the STKKF. The results of the KF, the Huber-based Kalman filter (HKF), the maximum correntropy Kalman filter (MCKF), and the STKKF are compared under different kinds of noise distributions.
The benchmark navigation problem is considered [38]. The dynamics and measurement models are given as follows:

$$x_k = \begin{bmatrix} 1 & 0 & \Delta t & 0 \\ 0 & 1 & 0 & \Delta t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} x_{k-1} + q_{k-1}, \qquad y_k = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} x_k + r_k,$$

where ∆t is the sample period, and q_k and r_k represent the process noise and measurement noise, respectively. The first two components of the state vector x_k ∈ R^4 represent the north and east positions of a land vehicle, and the last two components are the corresponding north and east velocities. The position of the vehicle is measured directly by a device.
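The constant-velocity structure described above can be written out as follows (a sketch: the F and H matrices follow the verbal description, while the initial state used here is illustrative, not the paper's value):

```python
import numpy as np

dt = 1.0
# constant-velocity model: state = [north pos, east pos, north vel, east vel]
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
# the position components are measured directly
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)

x = np.array([0.0, 0.0, 1.0, 1.0])  # illustrative start: unit velocity
for _ in range(3):
    x = F @ x  # noise-free propagation
print(x)  # after 3 steps: [3. 3. 1. 1.]
```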
In the simulation, the sample period ∆t is 1 s, and the initial values of the true state x_0, the estimated state x̂_{0|0}, and the error covariance matrix P_{0|0} are assigned appropriate values. Two cases with different kinds of process and measurement noises are considered in this simulation, as follows:

1. Both the process noise and the measurement noise are Gaussian distributions.
2. The process noise is a Gaussian distribution and the measurement noise is a Gaussian mixture noise.
The proposed STKKF and the other filters are coded in MATLAB, and the simulations are run on a computer with an Intel Core i7-3540M CPU at 3.0 GHz. The number of time steps in each simulation run is 200. For each case, 100 Monte Carlo runs are implemented to quantify the estimation performance. The performance of the filters is evaluated by the root mean square error (RMSE) and the average RMSE (ARMSE). The RMSE and ARMSE in position are defined as

$$\mathrm{RMSE}_{\mathrm{pos}}(k) = \sqrt{\frac{1}{M}\sum_{i=1}^{M}\left[\left(x_{k,i}^{1}-\hat{x}_{k,i}^{1}\right)^2+\left(x_{k,i}^{2}-\hat{x}_{k,i}^{2}\right)^2\right]}, \qquad \mathrm{ARMSE}_{\mathrm{pos}} = \frac{1}{T}\sum_{k=1}^{T}\mathrm{RMSE}_{\mathrm{pos}}(k),$$

where M = 100 is the number of Monte Carlo runs, T = 200 is the number of time steps, and x_{k,i}^{j} and x̂_{k,i}^{j} denote the jth component of the true and estimated state, respectively, at time k in the ith run. The RMSE and ARMSE in velocity are defined analogously.

In case 1, the process and measurement noises are both Gaussian. Theoretically, the KF should be the best estimator. To demonstrate the relationship between the KF and the STKKF, these two filters were studied in this case. The ARMSEs of the position and velocity estimates are listed in Table 1. Meanwhile, the average number of iterations required for the STKKF to converge and the average implementation times of the filters for one time step are also listed. The iteration termination condition is given by Equation (24); here, ε is set to 10⁻⁴. In addition, the RMSEs of the position and velocity of the KF and the STKKF with different kernel parameters are plotted in Figure 1a and Figure 1b, respectively.

In case 2, the process noise is still Gaussian, but the measurement noise is a heavy-tailed (impulsive) non-Gaussian noise with a mixed-Gaussian distribution. The ARMSEs of the position and velocity estimates of the different filters, the average iteration numbers of the MCKF and STKKF, and the average implementation times of the filters for one time step are listed in Table 2. In the MCKF, the iteration termination parameter is set to the same value as that of the STKKF, that is, 10⁻⁴. The RMSEs of the position and velocity of the different algorithms are plotted in Figure 2a and Figure 2b, respectively. It should be noted that, of the MCKF and STKKF configurations with different kernel parameters listed in Table 2, only a subset is plotted in the figures to maintain the clarity of the plots.
It should be pointed out that the noise parameters of the KF in Table 2 are set using the true noise covariance of the mixture distribution in Equation (55); hence, the KF is the best linear estimator in the MSE sense. The parameter of the HKF is tuned to achieve its best estimation accuracy.
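For reference, a Gaussian mixture of the kind used in case 2 and the RMSE metric can be sketched as follows (the mixture weights and variances below are illustrative assumptions, not the paper's values):

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_mixture(n, p=0.9, sigma1=1.0, sigma2=10.0):
    # heavy-tailed measurement noise: draw from N(0, sigma1^2) with
    # probability p, and from the outlier component N(0, sigma2^2) otherwise
    nominal = rng.normal(0.0, sigma1, n)
    outlier = rng.normal(0.0, sigma2, n)
    return np.where(rng.random(n) < p, nominal, outlier)

def rmse_position(truth, est):
    # RMSE over Monte Carlo runs at one time step; rows are runs and
    # columns are the two position components
    return float(np.sqrt(np.mean(np.sum((truth - est) ** 2, axis=-1))))

noise = gaussian_mixture(10000)
print(np.std(noise))  # well above sigma1 because of the outlier component
```

Even a 10% outlier component inflates the overall standard deviation far beyond the nominal one, which is exactly the regime in which the MMSE-optimal KF loses accuracy and correntropy-based filters gain their advantage.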

Discussion
In this paper, Student's t kernel function is employed in place of the traditional Gaussian kernel function in the definition of correntropy, in order to better utilize the heavy-tailed features of the noise when the underlying system is disturbed by heavy-tailed non-Gaussian noise. The maximum correntropy criterion based on Student's t kernel function is then applied to the Kalman filter as the optimality criterion, and based on this criterion, a novel Kalman-type filtering algorithm, named the STKKF, is derived. Meanwhile, since the fixed-point iteration method is used to update the posterior state estimate, the convergence of the STKKF is also analyzed.
The performance of the proposed filter is verified through comparative simulations with the KF, HKF, and MCKF. When both the process noise and measurement noise are Gaussian, Table 1 and Figure 1 show that the KF performs best, as expected. When the kernel bandwidth is too small, the STKKF may perform worse. However, as the kernel bandwidth increases, the performance of the STKKF improves and approaches that of the KF. This phenomenon is explained by Theorem 1: when the kernel bandwidth σ → ∞, the STKKF tends to the KF. In general, with appropriate parameters, the performance of the STKKF is at least as good as that of the KF. In addition, the average numbers of fixed-point iterations required for the STKKF to converge and the average implementation times of the filters for one time step are also reported in Table 1. The average iteration number clearly decreases as the kernel bandwidth σ increases, that is, the convergence speed becomes faster; correspondingly, the average implementation time of the STKKF also decreases. In practical real-time applications, the kernel bandwidth should therefore be set appropriately to ensure that the algorithm can run in real time.
When the measurement noise is a heavy-tailed (impulsive) non-Gaussian noise, the results in Table 2 and Figure 2 demonstrate that, with appropriate parameters, the state estimation accuracy of the filters based on the maximum correntropy criterion (MCKF and STKKF) exceeds that of the other algorithms. However, the implementation times of these maximum correntropy filters are much longer than those of the KF and HKF because of their higher computational complexity. Furthermore, the implementation times of the STKKF are consistently longer than those of the MCKF with the same kernel bandwidth. This is because Student's t kernel function has a heavier tail than the Gaussian kernel function. In addition, when the kernel bandwidth is the same, the performance of the STKKF with different v is consistently better than that of the MCKF. Moreover, as v increases, the performance of the STKKF approaches that of the MCKF. This phenomenon is explained by Theorem 2: when v → ∞, the STKKF tends to the MCKF with bandwidth σ.
In summary, with proper parameters, the STKKF can outperform the other filtering algorithms, especially under heavy-tailed non-Gaussian noises. However, as with other filters based on the maximum correntropy criterion, the choice of the kernel function parameters is critical: when the parameters are inappropriate, the filter's performance may degrade, which deserves careful attention in practical applications. At present, the Student's t kernel function-based maximum correntropy criterion is applied only to linear systems. In the future, an extension to nonlinear system models can be investigated.