Maximum Correntropy Unscented Kalman Filter for Spacecraft Relative State Estimation

A new algorithm, the maximum correntropy unscented Kalman filter (MCUKF), is proposed and applied to relative state estimation in space communication networks. As is well known, the unscented Kalman filter (UKF) provides an efficient tool for nonlinear state estimation. However, the UKF performs well mainly under Gaussian noise; its performance may deteriorate substantially in the presence of non-Gaussian noise, especially when the measurements are disturbed by heavy-tailed impulsive noise. By making use of the maximum correntropy criterion (MCC), the proposed algorithm enhances the robustness of the UKF against impulsive noise. In the MCUKF, the unscented transformation (UT) is first applied to obtain the predicted state estimate and covariance matrix, a nonlinear regression method with the MCC cost is then used to reformulate the measurement information, and finally the UT is applied to the measurement equation to obtain the filtered state and covariance matrix. Illustrative examples demonstrate the superior performance of the new algorithm.


Introduction
The relative state estimation problem is important in space communication networks, including inter-satellite link (ISL) establishment, communication support, multiple-spacecraft formation flying, and spacecraft rendezvous and docking [1][2][3][4]. Many relative motion scenarios make use of radar and/or lidar to obtain the relative measurement information between spacecraft, which is necessary to determine the relative motion.
As for the state estimation problem, the Kalman filter (KF) is often used for a linear dynamic system; it is, in essence, a recursive minimum l2-norm linear filter [5][6][7]. However, from [8], we know that relative state estimation in a circular reference orbit involves a linear Clohessy-Wiltshire (CW) state equation and a nonlinear measurement equation. In particular, when the reference orbit is elliptic, the dynamic state equation is also nonlinear [9]. To handle such nonlinear problems, the widely used methods are the extended Kalman filter (EKF) [10][11][12] and the unscented Kalman filter (UKF) [11][12][13]. The EKF is a popular nonlinear extension of the KF, which uses first-order Taylor series expansions to approximate the nonlinear system and applies the KF to this approximation. Nevertheless, the approximation is crude, and it may lead to filter divergence if the functions are highly nonlinear. Furthermore, the derivation of the Jacobian matrices is cumbersome and often results in implementation difficulties. In the UKF, the probability distribution of the state is approximated by a set of deterministically selected sigma points, which are propagated through the true nonlinear functions; this typically yields a more accurate approximation than the EKF and requires no Jacobian matrices.

Maximum Correntropy Criterion
Given two random variables X, Y ∈ R with joint distribution function F_XY(x, y), the correntropy is a generalized similarity measure between X and Y defined by

    V(X, Y) = E[κ(X, Y)] = ∫ κ(x, y) dF_XY(x, y),        (1)

where E is the expectation operator, and κ(·, ·) denotes a shift-invariant kernel satisfying the Mercer condition. In this paper, the kernel function is chosen as the Gaussian kernel

    κ(x, y) = G_σ(e) = exp(−e² / (2σ²)),        (2)

where e = x − y, and σ > 0 is the kernel bandwidth. In many practical applications, only a limited amount of data is available and the joint distribution F_XY is unknown. In these cases, the correntropy can be estimated by the sample mean estimator

    V̂(X, Y) = (1/N) Σ_{i=1}^{N} G_σ(e(i)),        (3)

where e(i) = x(i) − y(i), with {x(i), y(i)}_{i=1}^{N} being N samples drawn from F_XY.
Taking the Taylor series expansion of the Gaussian kernel yields

    V(X, Y) = Σ_{n=0}^{∞} ((−1)^n / (2^n σ^{2n} n!)) E[(X − Y)^{2n}].        (4)

It can be seen that the correntropy is a weighted sum of all even-order moments of the error variable X − Y. In addition, the kernel bandwidth appears as a parameter weighting the second-order and higher-order moments. It is noted that the correntropy will be dominated by the second-order moment when the kernel bandwidth is very large compared to the dynamic range of the data.
Given a sequence of error data {e(i)}_{i=1}^{N}, the cost function of the MCC is given by

    J_MCC = (1/N) Σ_{i=1}^{N} G_σ(e(i)).        (5)

Assume W is a parameter vector of an adaptive system to learn, and let x(i) and y(i) denote the model output and the desired response, respectively. MCC-based learning can then be formulated as the following optimization problem:

    Ŵ = argmax_{W ∈ Ω} (1/N) Σ_{i=1}^{N} G_σ(e(i)),        (6)

where Ŵ denotes the optimal solution, and Ω denotes the feasible set of parameters.
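The robustness of the sample estimator in Equation (3) is easy to verify numerically. The following sketch (illustrative variable names only; not from the paper) shows that a single impulsive outlier barely changes the correntropy estimate, unlike a mean-squared-error criterion would:

```python
import numpy as np

def correntropy(x, y, sigma):
    """Sample-mean correntropy estimator with a Gaussian kernel (Equation (3))."""
    e = np.asarray(x) - np.asarray(y)               # error samples e(i) = x(i) - y(i)
    return np.mean(np.exp(-e**2 / (2.0 * sigma**2)))

rng = np.random.default_rng(0)
e_clean = rng.normal(0.0, 0.1, 1000)                # small Gaussian errors
e_dirty = e_clean.copy()
e_dirty[0] = 100.0                                  # one heavy-tailed impulsive outlier

v_clean = correntropy(e_clean, 0.0, sigma=1.0)
v_dirty = correntropy(e_dirty, 0.0, sigma=1.0)
```

Because the Gaussian kernel saturates for large errors, the outlier contributes almost nothing to the sample mean, so `v_dirty` stays close to `v_clean`.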

Unscented Kalman Filter
The UKF provides a suitable alternative to the EKF for nonlinear systems. In this paper, we consider a discrete-time nonlinear system described by

    x(k) = f(x(k − 1)) + q(k − 1),        (7)
    y(k) = h(x(k)) + r(k),        (8)

where x(k) ∈ R^n denotes the n-dimensional state vector of the system at time step k, and y(k) ∈ R^m denotes the m-dimensional measurement vector. f is the nonlinear system function and h is the nonlinear measurement function; both are assumed to be continuously differentiable. The process noise q(k − 1) and measurement noise r(k) are mutually uncorrelated and zero-mean, with covariance matrices

    E[q(k − 1)q^T(k − 1)] = Q(k − 1),  E[r(k)r^T(k)] = R(k).        (9)

In general, similar to the KF, the UKF consists of two steps, namely prediction and update:

Predict
First, a set of 2n + 1 sigma points is computed from the estimated state and covariance matrix of the last time step:

    χ_0(k − 1|k − 1) = x̂(k − 1|k − 1),
    χ_i(k − 1|k − 1) = x̂(k − 1|k − 1) + (√((n + λ)P(k − 1|k − 1)))_i,  i = 1, …, n,        (10)
    χ_i(k − 1|k − 1) = x̂(k − 1|k − 1) − (√((n + λ)P(k − 1|k − 1)))_{i−n},  i = n + 1, …, 2n,

where (√((n + λ)P(k − 1|k − 1)))_i is the i-th column of the matrix square root of (n + λ)P(k − 1|k − 1), n denotes the dimension of the state vector, and λ is a compound scaling factor defined by

    λ = α²(n + ε) − n,        (11)

where α determines the extent of the spread of the sigma points and is usually chosen as 0 < α ≤ 1, and ε is a secondary scaling factor, usually set to 3 − n.
The transformed set is obtained by propagating the sigma points through the process model:

    χ_i(k|k − 1) = f(χ_i(k − 1|k − 1)),  i = 0, 1, …, 2n.        (12)

Then, the predicted state mean and associated covariance matrix are calculated by

    x̂(k|k − 1) = Σ_{i=0}^{2n} W_i^m χ_i(k|k − 1),        (13)
    P(k|k − 1) = Σ_{i=0}^{2n} W_i^c (χ_i(k|k − 1) − x̂(k|k − 1))(χ_i(k|k − 1) − x̂(k|k − 1))^T + Q(k − 1),        (14)

where the corresponding weights for the state and covariance matrix are defined as

    W_0^m = λ/(n + λ),  W_0^c = λ/(n + λ) + 1 − α² + β,  W_i^m = W_i^c = 1/(2(n + λ)),  i = 1, …, 2n,        (15)

with β being related to prior knowledge of the distribution of x(k) (for Gaussian noise, β = 2 is optimal).

Update
Similarly, a set of 2n + 1 sigma points is computed from the predicted state mean and covariance matrix:

    χ_0(k|k − 1) = x̂(k|k − 1),
    χ_i(k|k − 1) = x̂(k|k − 1) ± (√((n + λ)P(k|k − 1)))_i.        (16)

The transformed set is obtained by propagating the sigma points through the measurement model:

    γ_i(k|k − 1) = h(χ_i(k|k − 1)),  i = 0, 1, …, 2n.        (17)

Then, the predicted measurement mean and covariance matrix are calculated by

    ŷ(k|k − 1) = Σ_{i=0}^{2n} W_i^m γ_i(k|k − 1),        (18)
    P_yy(k) = Σ_{i=0}^{2n} W_i^c (γ_i(k|k − 1) − ŷ(k|k − 1))(γ_i(k|k − 1) − ŷ(k|k − 1))^T + R(k).        (19)

In addition, the state-measurement cross-covariance matrix is computed by

    P_xy(k) = Σ_{i=0}^{2n} W_i^c (χ_i(k|k − 1) − x̂(k|k − 1))(γ_i(k|k − 1) − ŷ(k|k − 1))^T.        (20)

Next, the Kalman gain is given as

    K(k) = P_xy(k) P_yy^{−1}(k).        (21)

Finally, the filtered state is obtained by

    x̂(k|k) = x̂(k|k − 1) + K(k)(y(k) − ŷ(k|k − 1)),        (22)

and the filter covariance is recursively updated as

    P(k|k) = P(k|k − 1) − K(k) P_yy(k) K^T(k).        (23)
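The predict and update steps above can be sketched compactly in code. This is a minimal illustration of one standard UKF cycle (function and variable names are our own; α = 1, β = 2, ε = 3 − n as in the text), not the authors' implementation:

```python
import numpy as np

def sigma_points(x, P, alpha=1.0, beta=2.0):
    """2n+1 sigma points and weights, following the text's Equations (10), (11), (15)."""
    n = len(x)
    lam = alpha**2 * (n + (3.0 - n)) - n            # epsilon = 3 - n, so n + lam = 3*alpha^2
    S = np.linalg.cholesky((n + lam) * P)           # matrix square root, columns are offsets
    pts = np.vstack([x, x + S.T, x - S.T])          # rows are the sigma points
    Wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    Wc = Wm.copy()
    Wm[0] = lam / (n + lam)
    Wc[0] = lam / (n + lam) + 1.0 - alpha**2 + beta
    return pts, Wm, Wc

def ukf_step(x, P, y, f, h, Q, R):
    """One predict/update cycle of a standard UKF."""
    # predict: propagate sigma points through f
    pts, Wm, Wc = sigma_points(x, P)
    Xp = np.array([f(p) for p in pts])
    x_pred = Wm @ Xp
    P_pred = (Xp - x_pred).T @ np.diag(Wc) @ (Xp - x_pred) + Q
    # update: propagate new sigma points through h
    pts, Wm, Wc = sigma_points(x_pred, P_pred)
    Yp = np.array([h(p) for p in pts])
    y_pred = Wm @ Yp
    Pyy = (Yp - y_pred).T @ np.diag(Wc) @ (Yp - y_pred) + R
    Pxy = (pts - x_pred).T @ np.diag(Wc) @ (Yp - y_pred)
    K = Pxy @ np.linalg.inv(Pyy)                    # Kalman gain
    return x_pred + K @ (y - y_pred), P_pred - K @ Pyy @ K.T

# usage: a toy 2-state constant-velocity model with a position measurement
A = np.array([[1.0, 0.1], [0.0, 1.0]])
x_new, P_new = ukf_step(np.zeros(2), np.eye(2), np.array([1.0]),
                        lambda p: A @ p, lambda p: p[:1],
                        0.01 * np.eye(2), 0.1 * np.eye(1))
```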

Unscented Kalman Filter under MCC
To improve the robustness of the UKF, we present the idea of combining the MCC cost function with the UKF framework to derive a novel UKF, which may perform much better in non-Gaussian noise environments, since correntropy captures second- and higher-order moments of the error.
First, we consider the nonlinear model given by Equations (7) and (8). Combining Equations (13) and (14), we recast the measurement update as the nonlinear regression model

    [x̂(k|k − 1); y(k)] = [x(k); h(x(k))] + ψ(k),        (24)

where ψ(k) is denoted by

    ψ(k) = [η(x(k)); r(k)],  η(x(k)) = x̂(k|k − 1) − x(k),        (25)

with

    Ψ(k) = E[ψ(k)ψ^T(k)] = blkdiag(P(k|k − 1), R(k)) = T(k)T^T(k),        (26)

where T(k) can be obtained by the Cholesky decomposition of Ψ(k). Left-multiplying both sides of Equation (24) by T^{−1}(k), we have the following equation:

    d(k) = g(k, x(k)) + e(k),        (27)

where d(k) = T^{−1}(k)[x̂(k|k − 1); y(k)], g(k, x(k)) = T^{−1}(k)[x(k); h(x(k))], e(k) = T^{−1}(k)ψ(k), and the covariance of e(k) is the identity matrix. Then, we define the following cost function J_L(x(k)) based on the MCC:

    J_L(x(k)) = (1/L) Σ_{i=1}^{L} G_σ(e_i(k)),        (28)

where

    e_i(k) = d_i(k) − g_i(k, x(k)),        (29)

L = n + m, and e_i(k) is the i-th element of e(k). Under the MCC, the optimal estimate of x(k) can be found by maximizing the cost function:

    x̂(k) = argmax_{x(k)} Σ_{i=1}^{L} G_σ(e_i(k)).        (30)

Hence, the optimal solution for x(k) can be found from the stationarity condition

    ∂J_L(x(k))/∂x_j(k) = 0,  j = 1, …, n,        (31)

which can also be expressed as

    Σ_{i=1}^{L} ϕ(e_i(k)) (∂e_i(k)/∂x_j(k)) = 0,  j = 1, …, n,        (32)

where x_j(k) is the j-th element of x(k), and ϕ(e_i(k)) = G_σ(e_i(k)) · e_i(k). Then, we define C(k, i) = ϕ(e_i(k))/e_i(k) and have C(k, i) = G_σ(e_i(k)).
Equation (32) can further be written in matrix form as

    [∂g(k, x(k))/∂x(k)]^T C(k) [d(k) − g(k, x(k))] = 0,        (33)

where

    C(k) = diag(G_σ(e_1(k)), …, G_σ(e_L(k))).        (34)

Based on Equation (27), with only one fixed-point iteration as in [32], similar results can be obtained in the UKF framework by using C(k) to reformulate the measurement information. Similar to [16], there are two ways to do this: one is to re-weight the residual error covariance depending on the value of e_i(k) = d_i(k) − g_i(k, x(k)); the other is to reconstruct a 'pseudo-observation'. In general, the results derived in the two ways are the same. In this paper, we describe only the algorithm based on the first way, for simplicity.
The modified covariance Ψ̃(k) is given as

    Ψ̃(k) = T(k) C^{−1}(k) T^T(k).        (35)

For the following discussion, we split Ψ̃(k) into two portions, so that

    C(k) = blkdiag(C_x(k), C_y(k)),  Ψ̃(k) = blkdiag(T_p(k|k − 1) C_x^{−1}(k) T_p^T(k|k − 1), T_r(k) C_y^{−1}(k) T_r^T(k)),        (36)

where T_p(k|k − 1) and T_r(k) are the Cholesky factors of P(k|k − 1) and R(k), respectively. In fact, since the true state x(k) is unknown, we set η(x(k)) = x̂(k|k − 1) − x(k) = 0. In this case, we can easily see that

    C_x(k) = I,  P̃(k|k − 1) = P(k|k − 1).        (37)

The modified measurement covariance is obtained as

    R̃(k) = T_r(k) C_y^{−1}(k) T_r^T(k).        (38)

With the above derivation, the MCUKF algorithm can be summarized as follows:
1. Choose a proper kernel bandwidth σ; set an initial estimate x̂(0|0) and corresponding covariance matrix P(0|0); let k = 1;
2. Use Equations (10) and (12) to calculate the sigma points and the propagated sigma points with respect to the function f;
3. Compute the predicted estimate x̂(k|k − 1) and covariance matrix P(k|k − 1) by Equations (13) and (14), and use the Cholesky decomposition to get T_p(k|k − 1);
4. Use Equations (16) and (17) to calculate the sigma points and the propagated sigma points with respect to the function h;
5. Use Equations (24)-(38) to derive the modified R̃(k); compute the predicted measurement mean by Equation (18); and replace R(k) with R̃(k) in Equation (19) to obtain the covariance matrix

    P̃_yy(k) = Σ_{i=0}^{2n} W_i^c (γ_i(k|k − 1) − ŷ(k|k − 1))(γ_i(k|k − 1) − ŷ(k|k − 1))^T + R̃(k);        (39)

6. Use Equations (20)-(23) to compute the filtered state mean and covariance matrix; set k = k + 1 and go back to Step 2.
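The core of Step 5, re-weighting the measurement covariance with the kernel weights, can be sketched as follows. This is a simplified illustration under the η = 0 assumption above (so only the measurement block of C(k) matters); function and variable names are our own:

```python
import numpy as np

def mcc_modified_R(y, y_pred, R, sigma):
    """Sketch of the MCC re-weighting of the measurement covariance,
    following the structure of C_y(k) and R~(k) = T_r C_y^{-1} T_r^T,
    with one fixed-point iteration."""
    Tr = np.linalg.cholesky(R)                      # T_r(k), Cholesky factor of R(k)
    e = np.linalg.solve(Tr, y - y_pred)             # standardized residual
    Cy = np.diag(np.exp(-e**2 / (2.0 * sigma**2)))  # diagonal Gaussian-kernel weights
    return Tr @ np.linalg.inv(Cy) @ Tr.T            # modified covariance R~(k)

# An impulsive residual in one channel inflates the corresponding
# entry of R~, so the filter trusts that measurement channel less.
R = np.diag([1.0, 1.0])
R_mod = mcc_modified_R(np.array([0.1, 20.0]), np.zeros(2), R, sigma=2.0)
```

Note how the clean channel keeps a covariance near its nominal value, while the outlier channel's covariance is blown up, which shrinks its effective Kalman gain.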
Remark 1. As one can see, different from the UKF algorithm, the MCUKF uses a nonlinear regression model combined with the MCC to update the measurement information. Note that the kernel bandwidth σ is a key parameter in the MCUKF. In general, a smaller kernel bandwidth makes the algorithm more robust (with respect to outliers). However, when the kernel bandwidth is too small, it may result in filter divergence or accuracy deterioration; the reason for this can be seen in [34]. On the other hand, as σ becomes larger and larger, the MCUKF behaves more and more like the UKF algorithm. In particular, the following theorem holds.

Theorem 1.
If the kernel bandwidth σ approaches infinity, the MCUKF will reduce to the traditional UKF.
Proof of Theorem 1. The proof is simple, so we give only a short explanation. Note that as σ → ∞, G_σ(e_i(k)) → 1 for every i, so C(k) → I and consequently R̃(k) → R(k). In this case, the MCUKF coincides with the traditional UKF.
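The limiting behavior in Theorem 1 is easy to check numerically: for fixed residuals, the diagonal kernel weights of C(k) approach 1 as σ grows (a quick illustration, with our own variable names):

```python
import numpy as np

e = np.array([0.5, -3.0, 10.0])                 # some fixed standardized residuals

def C_diag(sigma):
    """Diagonal of C(k): Gaussian-kernel weights of the residuals."""
    return np.exp(-e**2 / (2.0 * sigma**2))

# maximum deviation of C(k) from the identity
dev_small = np.max(np.abs(C_diag(1.0) - 1.0))   # small bandwidth: far from identity
dev_large = np.max(np.abs(C_diag(1e4) - 1.0))   # large bandwidth: essentially identity
```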
Thus, it is seen that the kernel bandwidth has a significant influence on the behavior of the MCUKF. In practical applications, the kernel bandwidth can be set manually or tuned by trial and error.

Illustrative Examples
This section demonstrates the performance of the proposed algorithm through two illustrative examples. In this paper, we mainly compare the estimation performance using the following benchmarks:

    MSE_x(k) = (1/M) Σ_{m=1}^{M} (x_m(k) − x̂_m(k))²,        (40)
    MSE_x = (1/K) Σ_{k=1}^{K} MSE_x(k),        (41)
    MSD_p(k) = (1/M) Σ_{m=1}^{M} ||p_m(k) − p̂_m(k)||²,        (42)
    MSD_v(k) = (1/M) Σ_{m=1}^{M} ||v_m(k) − v̂_m(k)||²,        (43)
    TAMSD_p = (1/(k_2 − k_1 + 1)) Σ_{k=k_1}^{k_2} MSD_p(k),        (44)
    TAMSD_v = (1/(k_2 − k_1 + 1)) Σ_{k=k_1}^{k_2} MSD_v(k),        (45)

where K is the total number of time steps in every Monte Carlo run, M is the total number of Monte Carlo runs, p and v denote the relative position and velocity vectors, and [k_1, k_2] is the time-averaging window.
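These benchmarks are straightforward to evaluate over stored Monte Carlo trajectories. A small sketch, assuming arrays laid out as (runs, steps) for a scalar state and (runs, steps, components) for the error vectors (array layouts and names are our own):

```python
import numpy as np

def mse_scalar(x_true, x_est):
    """MSE of a scalar state over M runs and K steps; arrays of shape (M, K)."""
    return np.mean((x_true - x_est) ** 2)

def tamsd(err, k1, k2):
    """Time-averaged mean-square deviation over the window [k1, k2].
    err has shape (M, K, d); k1, k2 are 1-based step indices."""
    msd = np.mean(np.sum(err ** 2, axis=2), axis=0)   # MSD(k), shape (K,)
    return np.mean(msd[k1 - 1:k2])

# usage: constant unit error in every component gives MSD(k) = d exactly
mse_val = mse_scalar(np.zeros((2, 4)), np.ones((2, 4)))
tamsd_val = tamsd(np.ones((2, 5, 3)), 1, 5)
```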

Example 1
Now, a univariate nonstationary growth model (UNGM), often used as a benchmark for nonlinear filtering, is considered, whose state and measurement equations are given by

    x(k) = α_1 x(k − 1) + α_2 x(k − 1)/(1 + x²(k − 1)) + α_3 cos(1.2(k − 1)) + q(k − 1),        (46)
    y(k) = x²(k)/20 + r(k).        (47)

The parameters are set to α_1 = 0.5, α_2 = 25, α_3 = 8. First, we consider the case in which all the noises are Gaussian: q(k − 1) ∼ N(0, 1), r(k) ∼ N(0, 1).
In this example, the parameters are chosen as K = 500, M = 100. Table 1 shows the MSEs of x, defined in Equation (41), under Gaussian noise. Since all the noises are Gaussian, the UKF gives the smallest MSE among all filters. Moreover, it is noted that when the kernel bandwidth is small, the MCUKF may give a worse estimate; in contrast, when the kernel bandwidth becomes larger, its performance approaches that of the UKF. Indeed, as σ → ∞, the MCUKF reduces to the traditional UKF (see Theorem 1). Therefore, one should choose a larger kernel bandwidth under Gaussian noise.
Second, we change the observation noise into a heavy-tailed non-Gaussian noise with the mixed-Gaussian distribution 0.8N(0, 1) + 0.2N(0, 500). Table 2 shows the MSEs of x under this non-Gaussian measurement noise. As one can see, when the kernel bandwidth is either too small or too large, the MCUKF does not perform well. However, with a proper kernel bandwidth (say σ = 2.0), the MCUKF can outperform the UKF, achieving the smallest MSE. Again, when σ is very large, the MCUKF achieves almost the same performance as the UKF.
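The UNGM dynamics and the heavy-tailed measurement noise used in this example can be sketched as follows (function names are our own; the mixture weights follow the 0.8/0.2 split in the text):

```python
import numpy as np

def ungm_f(x, k, a1=0.5, a2=25.0, a3=8.0):
    """UNGM state transition, evaluated at step k."""
    return a1 * x + a2 * x / (1.0 + x**2) + a3 * np.cos(1.2 * (k - 1))

def ungm_h(x):
    """UNGM measurement function."""
    return x**2 / 20.0

def mixed_gaussian(rng, n, p=0.2, s_nominal=1.0, s_impulse=np.sqrt(500.0)):
    """Heavy-tailed measurement noise drawn from 0.8*N(0,1) + 0.2*N(0,500)."""
    impulse = rng.random(n) < p                       # which samples are impulsive
    return np.where(impulse,
                    rng.normal(0.0, s_impulse, n),
                    rng.normal(0.0, s_nominal, n))

rng = np.random.default_rng(1)
r = mixed_gaussian(rng, 100000)
```

The sample variance of `r` should be close to 0.8·1 + 0.2·500 = 100.8, dominated by the rare impulsive component.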

Example 2
Finally, we consider a practical example concerning the relative motion of two spacecraft, illustrated in Figure 1. One spacecraft, called the chief, moves on the reference orbit; the other is the deputy spacecraft. Both revolve around the Earth, and thus the inertial orbital equations of the two spacecraft are given as

    r̈_c = −μ r_c / r_c³,        (48)
    r̈_d = −μ r_d / r_d³,        (49)

where r_c and r_d are the position vectors of the chief and deputy spacecraft in the ECI coordinate frame, r_c and r_d are the norms of r_c and r_d, respectively, and μ is the gravitational parameter of the Earth. The position vector of the deputy spacecraft relative to the chief spacecraft is

    ρ = r_d − r_c.        (50)

Here, we use the Hill coordinate frame, which is a rectangular, Cartesian, dextral rotating frame centered on the chief spacecraft, also referred to as the local-vertical-local-horizontal (LVLH) frame. The motion of the deputy spacecraft relative to the chief spacecraft in the Hill frame can then be described, following [35], as

    ẍ − 2ωẏ − ω̇y − ω²x = −μ(r_c + x)/r_d³ + μ/r_c²,
    ÿ + 2ωẋ + ω̇x − ω²y = −μy/r_d³,        (51)
    z̈ = −μz/r_d³,

where x, y and z are the radial, in-track and cross-track coordinates in the Hill frame, ω denotes the orbital angular velocity of the chief spacecraft, and r_d = √((r_c + x)² + y² + z²). The radar is mounted on the chief spacecraft to obtain the measurements, and the measurement coordinate system is shown in Figure 2. The measurement equations are given as

    ρ = √(x² + y² + z²),  θ = arctan(y/x),  φ = arctan(z/√(x² + y²)),        (52)

where ρ is the relative range between the chief and deputy spacecraft, θ is the azimuth angle, and φ is the elevation angle. The two spacecraft are on elliptic low Earth orbits. The initial six orbital elements of the chief spacecraft are shown in Table 3, and its trajectory in the ECI coordinate frame is propagated by the fixed-step fourth-order Runge-Kutta algorithm. The filter prediction is performed every 0.1 s and the radar measurements are recorded every 1 s. The state vector x(k) = [x(k) y(k) z(k) ẋ(k) ẏ(k) ż(k)]^T contains the relative position and velocity components in the Hill frame.
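The radar measurement model (range, azimuth, elevation from the Hill-frame relative position) can be sketched as below. The angle conventions are assumed here, since they are fixed by the paper's Figure 2; function and variable names are our own:

```python
import numpy as np

def radar_measurement(state):
    """Range, azimuth and elevation of the deputy as seen from the chief.
    `state` holds the Hill-frame components [x, y, z, xdot, ydot, zdot];
    azimuth is measured in the x-y plane, elevation from that plane
    (assumed conventions)."""
    x, y, z = state[:3]
    rho = np.sqrt(x**2 + y**2 + z**2)             # relative range
    theta = np.arctan2(y, x)                      # azimuth angle
    phi = np.arctan2(z, np.sqrt(x**2 + y**2))     # elevation angle
    return np.array([rho, theta, phi])

# usage: a deputy at (3, 4, 0) km is at range 5 km in the chief's orbital plane
m = radar_measurement(np.array([3.0, 4.0, 0.0, 0.0, 0.0, 0.0]))
```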
The initial true state is

Table 3. Initial orbital elements of the chief spacecraft.

    Orbital Element                          Chief Spacecraft
    Semi-major axis                          8000 km
    Eccentricity                             0.150
    Orbit inclination                        π/6 rad
    Argument of perigee                      π/6 rad
    Right ascension of the ascending node    π/18 rad
    True anomaly                             0 rad

In this example, 100 independent Monte Carlo runs have been conducted, and in each case an elapsed time of 7200 s is considered, that is, K = 7200, M = 100. Since correntropy is a local similarity measure, the MCUKF may converge to the optimal solution slowly, especially when the initial values deviate greatly from the true values. Thus, the UKF is used during the first 100 s to let the process settle down, after which the filter switches to the MCUKF. The performances of the EKF, the Huber-EKF (HEKF) [36], the UKF and the novel robust UKF (NRUKF) [16] are shown for comparison with that of the proposed algorithm.
First, we consider the case in which all the noises are Gaussian. Table 4 summarizes the corresponding TAMSD_p and TAMSD_v, defined in Equations (44) and (45) (the parameters are set to k_1 = 1000, k_2 = 7200). These results illustrate that the UKF performs best among all filters in this case, and the UKF-type filters perform better than their EKF-type counterparts. One can also observe that the robust filters do not perform as well as their non-robust counterparts under Gaussian noise. Moreover, it is worth noting that when the kernel bandwidth is small, the MCUKF may achieve worse performance, whereas when the kernel bandwidth becomes larger, it gives results similar to the UKF. Second, we consider the case in which the measurement noises are heavy-tailed, with mixed-Gaussian distributions. Table 5 lists the corresponding TAMSD_p and TAMSD_v. As one can observe again, the UKF-type filters give smaller TAMSD_p and TAMSD_v than the EKF-type counterparts. All the robust filters are superior to their non-robust counterparts under impulsive noise. It is noted that when the kernel bandwidth is very large, the MCUKF achieves almost the same performance as the UKF. However, with a proper kernel bandwidth, the MCUKF can outperform the UKF significantly. In particular, when σ = 2.0, the MCUKF exhibits the smallest TAMSD_p and TAMSD_v.
Moreover, to compare the computational cost, the computation times of the different filters in this example are shown in Table 6. The computation time of the MCUKF is moderate compared with the UKF, and lower than that of the HEKF and NRUKF.

Conclusions
In this paper, we propose a new variant of the unscented Kalman filter (UKF), namely the maximum correntropy unscented Kalman filter (MCUKF), which shows strong robustness against heavy-tailed non-Gaussian noise. The proposed algorithm recasts the measurement update as a nonlinear regression model and makes use of the MCC to obtain the filter estimates. The MCUKF is then applied to spacecraft relative state estimation and compared with the EKF, HEKF, UKF and NRUKF. Simulation results confirm that, with a large kernel bandwidth, the MCUKF behaves like the original UKF, while with a proper kernel bandwidth, the new filter can achieve better performance than the other filters, especially when the measurement system is disturbed by impulsive non-Gaussian noise.