Regularized Optimal Transport Based on an Adaptive Adjustment Method for Selecting the Scaling Parameters of Unscented Kalman Filters

In this paper, an adaptation method for adjusting the scaling parameters of an unscented Kalman filter (UKF) is proposed to improve the estimation performance of the filter in dynamic conditions. The proposed adaptation method is based on a sequential algorithm that selects the scaling parameter using the user-defined distribution of discrete sets to more effectively deal with the changing measurement distribution over time and avoid the additional process for training a filter model. The adaptation method employs regularized optimal transport (ROT), which compensates for the error of the predicted measurement with the current measurement values to select the proper scaling parameter. In addition, the Sinkhorn–Knopp algorithm is used to minimize the cost function of ROT due to its fast convergence rate, and the convergence of the proposed ROT-based adaptive adjustment method is also analyzed. According to the analysis results of Monte Carlo simulations, it is confirmed that the proposed algorithm shows better performance than the conventional algorithms in terms of the scaling parameter selection in the UKF.


Introduction
In recent years, analysis of the state variables estimation of a dynamic system has played a central role in a variety of research areas such as navigation, target tracking, etc. [1][2][3][4][5][6][7]. Thus, filtering approaches for estimating the state variables were researched to improve the estimation performance in the case of a nonlinear system and non-Gaussian noise [8][9][10]. These methods can be classified into two categories: derivative approximations and derivative-free approximations [11]. The filtering based on derivative approximation replaces the nonlinear functions in the system description by derivative-based expansions such as the Taylor or the Fourier-Hermite series expansions [6]. The extended Kalman filter (EKF) is one of the most widely used derivative approximation methods based on first order linearization. However, the EKF can produce unstable filters in cases when local linearity is violated. Thus, derivative-free approximations based on differential polynomial interpolations, such as an unscented transform (UT) and various numerical integration rules, were recently studied for nonlinear systems [12]. The unscented Kalman filter (UKF) is a representative filtering method of derivative-free approximations that employs selected sigma-points which are propagated through a known nonlinear system model and measurement model. Some recent works on the UKF have used statistical regression instead of Taylor series linearization to increase the filter performance [13][14][15].
In the UKF, the proper positioning of the sigma points is an essential process for reducing the UT approximation error in dynamic situations. Various algorithms of the UKF have The main algorithm of the UKF is based on the structure of the Kalman filter, where means and covariance matrices of filter state variables are recursively updated using moment approximation methods [16,27]. However, in the UKF, predicted moments are calculated by the UT. In the UT, a fixed number of sigma points that capture the desired moments (means and covariance) of the distribution of the states is deterministically selected [5]. The sigma points are then propagated with the nonlinear function, and the moments of the transformed variable are estimated [5,9]. In this section, the general filter structure of the UKF is briefly introduced.

Structure UT and General Filter Structure
The discrete-time nonlinear system is considered as: where x k represents the state variable of the nonlinear system and z k refers to the measurement at time k, respectively. Further, f k , h k are the system and measurement model, respectively. w k , v k are the independent nonlinear system and measurement white noises, respectively, of which the probability density functions (pdfs) are assumed to be the Gaussian pdf with zero means and pre-designed covariance matrices (Q k and R k , respectively). In order to estimate the desired moments of the distribution of filter states, a matrix χ of 2n + 1 sigma vector χ i is defined by the UT and written as follows [16]: where x is the mean vector of x k , P x is the covariance matrix of x k , and n is the dimension of x k [16]. The corresponding weights are: Sigma points χ i are used in the nonlinear transformation h k , yielding: Finally, the approximated moments z k are expressed by: Using the definition of the UT, the time update and measurement update processes of the UKF are obtained [16]. According to the process of the UKF, the position of the sigma points is determined by the estimated moments and the scaling parameter, which influences the accuracy of the UT approximation [27]. Thus, the appropriate scaling parameter should be selected to improve the estimation performance.

Selection of Scaling Parameter
The scaling parameter, which determines the positions of the sigma points, needs to be appropriately selected to improve the estimation performance of the UKF [27,28]. In previous works, the recommended setting of the scaling parameter has been proposed as: To maintain the positive semi-definition of the filter parameter κ should be selected as zero for n > 3 (following the cubature Kalman filter (CKF) [8]). When the recommended scaling parameter is selected, the fourth-order term of the UT approximation error, which is only defined when the measurement model is nonlinear [27], is eliminated.
According to previous works [25][26][27][28], in some cases, the remaining terms may cause greater error than when the fourth-order term of the UT approximation error is removed. Thus, the scaling parameter can be defined as a design parameter depending on the functions in the filter state and measurement equations, characteristics of the noise, and operating point on the estimated moments.
Several studies have investigated the selection of the proper scaling parameter of the UKF, taking both an online approach [25][26][27][28] and an offline approach [23,24]. However, the offline approach for the selection of the scaling parameter causes performance degradation when it is applied to the dynamic system. Any change of the operating point in the dynamic situation requires a new training procedure for selecting the proper scaling parameter because the UT approximation error and selection of the scaling parameter depend on the estimated state statistics [27]. Thus, for selecting the scaling parameter in this paper, we focus on the online approach. In the next section, the conventional scaling parameter adjustment methods based on the online approach and the proposed scaling parameter adjustment method are introduced.

Scaling Parameter Adjustment Method
In general, the adaptive method for adjusting the scaling parameter is based on selecting the parameter that achieves the highest value of the criterion within a candidate set of scaling parameters [28]. Besides, the adaptive method is initiated between the time update process and the measurement update process of the UKF. This section briefly introduces pdf-based criteria, which use likelihood or posterior probability with predicted states moments [28]. However, approaches employing likelihood or posterior probability may cause a large UT approximation error when the measurement model changes over time because only the predicted values of filter states are used, with reduced emphasis on the current measurement values. Thus, a moment-based criterion based on the innovation of the filter can serve as an alternative criterion for a time-varying measurement model [28]. In the case of moment-based criterion, measurement noise has a significant effect on the criterion. Consequently, the moment-based criterion may not be suitable for adjusting the scaling parameter when the measurement noise increases. Thus, in this paper, ROT [35][36][37][38] is used to compensate for the error of predicted measurement with the current measurement values to select the appropriate scaling parameter.

Conventional Adjustment
The criteria for adjusting the scaling parameter used in the previous work [28] can be classified into two categories: pdf-based approach and moment-based approach. The pdf-based criteria require the pdf of the state and measurement noise [28]. The pdf of the predicted states is needed as well. Using likelihood function or posterior probability is a representative method based on the pdf-based approach. In the first method, the optimal scaling parameter is determined using the likelihood function as follows [27,28]: If the predictive pdf and measurement pdf is assumed to be Gaussian, then the likelihood is obtained as: where N{} refers to Gaussian distribution (normal distribution) with mean z − k (κ) and covariance matrix P − z,k (κ). z − k (κ) is the weighted sum of the predicted measurement points which is obtained by: In (12), ω i is the weight of the sigma points and h k χ − k,i is the transformation of each sigma point, χ − k,i , through the nonlinear function, while the predicted covariance of h k χ − k,i is expressed by: Equation (10) show the selected scaling parameter, which achieves the maximum value of the likelihood function within a candidate set of scaling parameters. Besides, the measurement information is only used in the criterion, and the information on the state dynamics is ignored. Thus, the criterion depends on the quality of the measurement information in a real situation. To extract the information from the known state dynamics, a posterior probability based criterion is used, presented as follows [28]: where z k = {z 1 , z 2 , . . . , z k } and the posterior probability is decomposed into likelihood function and prior probability according to Bayes rule. The prior probability denotes the information of the state dynamics. If the likelihood function and prior probability are assumed to be Gaussian, then the posterior probability can be obtained as follows: where T + Q k , Q k is the system noise covariance. Neither adaptive selection using likelihood nor posterior probability is suitable for selecting the correct scaling parameter when the measurement model changes over time or when the measurement is obtained using a dynamic situation because aside from the current measurement values, the adaptive selection methods only use predicted states moments. Thus, in previous work [28], an alternative selection of the scaling parameter using a moment-based criterion was proposed. The moment-based criterion is based on the innovation of the filter, which refers to the difference (z k ) between the measurement and predicted measurement and can be expressed as [28]: In the case of different variances of individual measurement, the covariance of the measurement prediction error can be used to normalize the innovation square as expressed in (16). However, a moment-based criterion is sensitive to changes in measurement noise. Consequently, an incorrect scaling parameter may be selected when the measurement noise increases.

Regularized Optimal Transport Based Adjustment
Before explaining the proposed adaptive method, a brief review of ROT is presented. Optimal transport is proposed to minimize the cost function for transport between a source probability distribution µ and the desired probability distribution ν using a transport  [35][36][37]. In the case of discrete measures through a finite number of samples, two distributions can be written as: where δ x i , δ z j are the Dirac at locations x i , z j , respectively. Further, p i , q j are the probability masses associated with the i and jth sample, respectively, belonging to the condition: The source and desired samples are expressed as x = [x 1 , . . . , x N x ] T and z = [z 1 , . . . , z N z ] T , respectively. The set of probabilistic couplings between these two distributions can then be considered as the set of doubly stochastic matrices, S defined as: where 1 is an N dimensional vector of ones, and p, q are data set of the probability masses explained in (17) The Kantorovitch formulation is used to obtain the optimal transport π * as follows: where c ij is the cost function matrix related to the energy needed to move a probability mass from the source to the desired samples. In general, the cost is chosen as the Euclidian distance between the two samples sets as follows: transport has a high computational load if the data dimension increases. In order to address the problem and improve the relaxation matching between the two sample sets, the optimal transportation problem can be regularized by an entropic term [37]. Finally, the regularized cost function can be timeously minimized using the Sinkhorn-Knopp algorithm [38], and ROT π * γ can then be calculated as: where v ∈ R N x ×1 , u ∈ R N z ×1 are vectors, diag(•) refers to the diagonal matrix, and M ij = exp −c ij /γ . In addition, γ is the regularization parameter, which is the predesigned parameter for weighting the cost. The vectors can be computed by Sinkhorn s fixed-point iteration (N r th iteration number) as follows: In this paper, the error between the N z of current measurements (z N z k ) and the predicted measurements points (z − k ) is compensated using ROT. The whole process of ROT is expressed in Table 1. The complexity of the ROT is O(N log N) which is already analyzed in [37] (N is the dimension of the input vector).
In this process, when multiple measurements (z N z k ) are taken at each time sequence, the value q should be updated every epoch in order to reflect the time-varying distribution of the measurements. In this way, it is possible to easily solve by clustering a measurement dataset using Fuzzy C-means clustering [39,40] and giving a weight for setting the q based on the distance from the clustering center s value. The weights are set using the user-defined exponential function and can be set to an appropriate value reflecting the characteristics of the system to which the algorithm is applied.

Input:
Measurements : Compensated point set of predicted measurement,z In addition, the calculation of the vectors should be performed by Sinkhorn s fixedpoint iteration to obtain the correct vectors. Using the output values of ROT, the scaling parameter is adjusted as: where the expected value and covariance value of the compensated predicted measurement can be obtained as: var If the measurement pdf is assumed to be Gaussian, the compensated pdf can be rewritten as: p In general, it is not possible to identify a closed-form of an optimal solution to the criteria in (10), (14), and (22). Thus, to find a solution to the criteria, the numerical method needs to be used. While many numerical methods can be used to find the solution to the maximization of the likelihood and posterior probability, the grid method [28] is applied to identify the optimal solution to the criteria in this paper because the search area of the scaling parameter is narrow. In the grid method, the criterion used to find the solution is evaluated at the grid of the points in a feasible interval. The point with the maximal value is then selected and considered to be the recommended value of the scaling parameter. Although the simple structure of the grid method allows for its application to any algorithm, it increases the computational load on the filter. To reduce the computational costs, the adaptive selection of the scaling parameter based on the grid method is only performed when the measurement dramatically changes. The optimal scaling parameter selection of the UKF does not affect reducing the UT approximated error when the measurement model is close to the linear model.
Thus, simple logic is implemented to control the activation of adaptive adjustment logic for the scaling parameter. The logic is based on detecting whether the measurement model is close to the linear model within and around the predictive mean [28]. If such a characteristic is detected, the adaptation is skipped, and the recommended scaling parameter is used. While several methods can be used to measure the nonlinearity of the function, the simple technique based on the residual value of the weighted least squares method [28] is used in this paper to detect the nonlinearity of the measurement. The whole process of the proposed algorithm is summarized in Table 2, and its computational complexity is O N 3 (N is the dimension of the state variables). Table 2. Pseudocode of the proposed algorithm. Input: x + k−1 , P + k−1 (k−1 step estimates) Output: Prediction values calculation 8 Scaling parameter adjustment: Table 1 11 UKF update : Sigma points recalculation with κ ROT k (the process in 1~4) 13 Weight recalculation with κ ROT k (the process in 6) 14 Prediction values recalculation with κ ROT k (the process in 8~9) 15 Measurement prediction Update values calculation

Convergence of UKF with Adaptive Scaling Parameter
In this section, an analysis of the stability of the UKF with an adaptive scaling parameter is performed based on the results of previous studies [17,41]. The error of state variables and measurements are defined, respectively. where the estimation error is defined as x + k = x k −x + k (after measurement update of state variable estimation), and the prediction error is written by are Jacobian matrices.
In the case of the UKF with an adaptive scaling parameter, if the scaling parameter changes during the process of the adjustment algorithm (that is, the location of the sigma point in the UKF changes), the prediction error of state variable and measurement are also changed simultaneously. To deal with the changes in the errors induced by the relocation of the sigma points, an instrumental diagonal matrix α k = diag{α 1 , α 2 , · · · , α n } for the error of state variables and β k = diag{β 1 , β 2 , · · · , β N z } for the error of measurements are added to (26) and (27) as follows, which reflects the effect of the relocated sigma points on the predicted errors in state variables and the residual of measurements [41].
where α k refers to the compensation values of the difference between true state variables and approximated state variables using the UKF with adaptive scaling parameter, particularly including state estimation error according to changes in the scaling parameter (κ). In addition, above, (28) and (29) are valid only when there are additional errors in the filter models, respectively. In (26), the predicted error is defined, and it is written in detail as follows: Let (n + κ)P x i = ∆x i and take the Taylor series expansion of the nonlinear transformation of f (x), then the above equation can be derived as: In (31), the Taylor series expansion of f k Finally, in (32), k−1 and the rest of the terms are modeled by a function of the scaling parameter.
Then, the predicted error has the relationship between α k and scaling parameter κ, as follows: where I is the N x × N x identity matrix. Likewise, the relationship between β k and κ can be obtained like the above process (from (27) to (35)). For convenient analysis of the stability of the UKF with the adaptive scaling parameter, approaches used in previous works [17,41] are employed to simplify the error expression. In addition, some assumptions are held for verifying the boundedness of the estimation errors in the UKF. There exist real constants which are related to the system model and measurement model written in (1) and (2): such that the following bounds on matrices of filter models are satisfied for every time index, k as follows: where is matrix norm. According to (33), the value f k (κ) is related to the power series of n + κ and error covariance, P x as well as the partial differential of the system model, D m ∆x i , using the predicted state variable,x + k−1 , at each time step. Among these values, the influence of the scale parameter, κ on the amount of f k (κ) increases with the order of the power series (m). Thus, the boundary of the f k (κ) s value is approximately set to f k (κ) ≤ T κ , and finally, the boundary of α k and β k is: The relationship between T κ and κ, while it will depend on the system model and the estimated value, is generally proportional.
The predicted error covariance matrix of state variables, P − k , is defined as follows: where Q * k = Q k + α k F k K k R k (α k F k K k ) T + δP k and δP k refer to the error between the ideal error covariance matrix and the unscented transformed error covariance matrix [41].
The weighted error square of state variables is defined as: Using the above two equations, the expectation of the weighted error square is obtained by: According to the results of previous works [17,41], the expectation of the weighted error square is bounded as: If the coefficient of e k x − k is replaced with 1 − λ and the last term is substituted with an arbitrary positive value µ, (42) can be rewritten as: Finally, applying Lemma 2.1 of the previous work [42] to (49), the stochastic process of x − k is bounded. Therefore, it is shown that the estimation error of the UKF with the adaptive scaling parameter remains bounded if the filter system satisfies some assumptions listed in (36) to (40).

Simulation
The problem of bearings-only tracking arises in a variety of important practical applications, and it is often used for evaluating the performance of nonlinear filters [43]. The simple 2-D example describing the bearings-only tracking [27] is used to compare the performance of the proposed UKF algorithm with that of the CKF [8], the iterated Kalman filter (IKF) based on the UT [13,15], which has a method similar to that of the proposed algorithm with linearization considered in the update step. In addition, the UKF-AA algorithm [28][29][30][31], using the conventional scaling adjustment method mentioned in Section 3.1, is also used to compare the performance of the proposed algorithm. The system and measurement model are defined as: where the filter state variables are the 2-D position x k = [x 1,k , x 2,k ] T , k = 1, . . . , N s , 0.1 0.05 0.05 0.1 , R k = 0.025, ∀k, and i is the number of measurements of which the distribution p z i k is assumed to be Gaussian with a mean of tan −1 x 2,k −sin(k) x 1,k −cos(k) and variance of R k . The initial values of the filter are set as follows: x 0 = [20, 5] T , P 0 = 0.1 0 0 0.1 , i = 300. The candidate scaling parameter set is assumed to be {0, 1, 2, 3, 4}. The performance of the filter is analyzed in terms of mean squared error (MSE) using Monte Carlo simulations with 500 independent runs. Further, the averaged normalized estimation error squared (ANEES) is also used to analyze the performance of the proposed algorithm and is defined as: ANEES provides an evaluation of a relative estimation error at the j-th Monte Carlo simulations (N s is the number of total runs and is set to 500), and it evaluates a self-assessment provided by each filter in the form of the covariance matrix of the estimation error. Figure 1 show one target and observer trajectory of the Monte Carlo simulation. The target moves from the right side to the left side during the first time and holds a static position with a randomly varying vertical position during the remaining simulation time. The trajectory of the target is plotted with asterisks, while the path of the observer is plotted with dots. The dynamic model of the target, which is sensitive to changes in the scaling parameter of the UKF, is based on the original paper of UKF-AA [28]. Besides, the circular trajectory of the observer is designed to give a significant change in the measurement values. The information set of direction between target and observer is obtained according to the change of the observer s position. Figure 2 refer to the estimation results of the two-state variables and the expected value of measurement. These results are obtained in the simulation case, as shown in Figure 1. The proposed algorithm accurately tracks the target, although the measurement changes in the 0~50 (related to x 1 ) and the 50-450 time indexes (especially 100~150, 250~450 time related to x 2 ). During these periods, the scaling parameter adjustment logic is activated to reduce the UT approximated error, as shown in Figure 3. In other periods, the detection algorithm of measurement nonlinearity suggests that the behavior of the measurement is linear. Consequently, the scaling parameter is set to the recommended scaling parameter, which is 1 in the case of two-dimensional states. Figures 4 and 5 show the result of the MSE and the matrix norm of the error covariance matrix, respectively. These figures are to facilitate the assessment of estimation consistency. For verifying the performance, the proposed method is compared with various existing methods. In the CKF [8], the scaling parameter is always set to zero. In the UKF-AA [28], the conventional scaling parameter adjustment method is used in three types, which are the maximization of likelihood (ML) [30,31], posterior probability (MAP) [29], and predicted error square (MPES). Besides, the IKF is based on the UT [13,15], with linearization considered in the update step. Although the primary concept of this algorithm was implemented for performance analysis, the detailed settings for an application to this system differed from the original setting in the previous work [15]. The results of these figures show that the proposed method has the smallest estimation error and that all the filters maintained the estimate consistency.
is activated to reduce the UT approximated error, as shown in Figure 3. In oth detection algorithm of measurement nonlinearity suggests that the behavio urement is linear. Consequently, the scaling parameter is set to the recomm parameter, which is 1 in the case of two-dimensional states.   x ). During these periods, the scaling parameter adjustm is activated to reduce the UT approximated error, as shown in Figure 3. In other per detection algorithm of measurement nonlinearity suggests that the behavior of t urement is linear. Consequently, the scaling parameter is set to the recommende parameter, which is 1 in the case of two-dimensional states.    For verifying the performance, the proposed method is compare existing methods. In the CKF [8], the scaling parameter is always set to ze AA [28], the conventional scaling parameter adjustment method is used which are the maximization of likelihood (ML) [30,31], posterior probabil and predicted error square (MPES). Besides, the IKF is based on the UT [13 arization considered in the update step. Although the primary concept of was implemented for performance analysis, the detailed settings for an app system differed from the original setting in the previous work [15]. The r figures show that the proposed method has the smallest estimation error filters maintained the estimate consistency.   Figures 4 and 5 show the result of the MSE and the matrix norm of the error covariance matrix, respectively. These figures are to facilitate the assessment of estimation consistency. For verifying the performance, the proposed method is compared with various existing methods. In the CKF [8], the scaling parameter is always set to zero. In the UKF-AA [28], the conventional scaling parameter adjustment method is used in three types, which are the maximization of likelihood (ML) [30,31], posterior probability (MAP) [29], and predicted error square (MPES). Besides, the IKF is based on the UT [13,15], with linearization considered in the update step. Although the primary concept of this algorithm was implemented for performance analysis, the detailed settings for an application to this system differed from the original setting in the previous work [15]. The results of these figures show that the proposed method has the smallest estimation error and that all the filters maintained the estimate consistency.  Table 3 show the MSE and ANEES of the Monte Carlo simulation for efficient performance comparison. In Table 3, ROT refers to the ROT based proposed method, and SP refers to the scaling parameter of the UKF. It is confirmed that the proposed algorithm shows better performance compared with CKF, UKF-AA, and IKF because the proposed scaling parameter selection algorithm based on ROT uses the predicted values of filter states with the current measurement values so that the UT approximated error is reduced when the measurement values change rapidly.   Table 3 show the MSE and ANEES of the Monte Carlo simulation for efficient performance comparison. In Table 3, ROT refers to the ROT based proposed method, and SP refers to the scaling parameter of the UKF. It is confirmed that the proposed algorithm shows better performance compared with CKF, UKF-AA, and IKF because the proposed scaling parameter selection algorithm based on ROT uses the predicted values of filter states with the current measurement values so that the UT approximated error is reduced when the measurement values change rapidly. Considering that the proposed algorithm can be applied to other systems, relative operating time and related parameters were summarized in Table 4 using a function provided by the simulation program (MATLAB function, tic/toc used). In Table 4, total operating-time refers to the time taken to process a sequence of a simulation, and scaling parameter adjustment-time means the time it takes to adjust parameters during a sequence of a simulation. The rate indicates the ratio of the scaling parameter adjustment-time to total operating-time. As a result of comparing the algorithm processing time, the parameter adjustment step of the proposed method has an extended processing time. Thus, it is necessary to improve the processing speed of the algorithm to apply it to a real system even if the proposed method has the best estimation performance.   4.193 Considering that the proposed algorithm can be applied to other systems, relative operating time and related parameters were summarized in Table 4 using a function provided by the simulation program (MATLAB function, tic/toc used). In Table 4, total operating-time refers to the time taken to process a sequence of a simulation, and scaling parameter adjustment-time means the time it takes to adjust parameters during a sequence of a simulation. The rate indicates the ratio of the scaling parameter adjustment-time to total operating-time. As a result of comparing the algorithm processing time, the parameter adjustment step of the proposed method has an extended processing time. Thus, it is necessary to improve the processing speed of the algorithm to apply it to a real system even if the proposed method has the best estimation performance.

Conclusions
The UKF with a novel scaling parameter adaptation method using ROT is proposed to improve the estimation performance of the UKF when the measurement changes rapidly. In addition, the Sinkhorn-Knopp algorithm is used to minimize the cost function of ROT due to its fast convergence rate and the relaxation matching between predicted measurement sets and received measurement sets. Monte Carlo simulations are performed to evaluate the estimation performance of the proposed algorithm, and the results confirm that the proposed algorithm performs better than the UKF with conventional scaling parameter adaption methods. However, in order to apply the proposed algorithm to the actual tracking system, it is necessary to study additional algorithms that can reduce the operation time of the scaling parameter adaptation method, which occupies 69% of the operation time.