Optimization-Based Sensor Fusion of GNSS and IMU Using a Moving Horizon Approach

The rise of autonomous systems operating close to humans imposes new challenges in terms of robustness and precision on the estimation and control algorithms. Approaches based on nonlinear optimization, such as moving horizon estimation, have been shown to improve the accuracy of the estimated solution compared to traditional filter techniques. This paper introduces an optimization-based framework for multi-sensor fusion following a moving horizon scheme. The framework is applied to the often occurring estimation problem of motion tracking by fusing measurements of a global navigation satellite system receiver and an inertial measurement unit. The resulting algorithm is used to estimate position, velocity, and orientation of a maneuvering airplane and is evaluated against an accurate reference trajectory. A detailed study of the influence of the horizon length on the quality of the solution is presented and evaluated against filter-like and batch solutions of the problem. The versatile configuration possibilities of the framework are finally used to analyze the estimated solutions at different evaluation times exposing a nearly linear behavior of the sensor fusion problem.


Introduction
During the last few years, fully autonomous systems have been a highly active research field, which pushed product development towards the commercialization of such systems. For applications such as autonomous driving and unmanned aerial vehicles (UAVs) more semi-autonomous features become available every day. Besides the future of self-driving cars, autonomous drones will soon take over tasks in transportation, agriculture, maintenance, surveillance or energy generation. The first prototypes have already proven to be feasible and have successfully delivered small sized goods, simplified inspection processes in rough terrain or harvested wind energy at previously unreachable altitudes. These recent developments allow the prediction of an increasing number of UAVs applications (see Figure 1), which will result in a more crowded airspace, representing a paradigm shift in comparison to traditional airborne applications where the airspace is heavily secured and supervised. UAVs operating in a more occupied airspace close to humans require increased robustness to avoid fatal incidents, which translates to a strict set of requirements imposing new challenges for the estimation and control algorithms. Moving horizon estimation (MHE) and model predictive control (MPC) are promising strategies that use numerical optimization methods over a window of data to increase stability and accuracy of the system's motion. MPC has been successfully applied to challenging control problems with nonlinear dynamics and difficult environment conditions [1][2][3]. The counterpart to the optimal control theory for state estimation is represented by MHE which has already been used for decades to estimate the state of nonlinear chemical processes [4] or more recently, to increase fault tolerance of relative navigation problems [5]. The computational burden of solving an optimization problem has become less restrictive due to advances in computer technology and the development of tailored optimization algorithms [6,7] that nowadays allo the usage of efficient optimization-based methods on embedded systems. These recent developments make these strategies attractive for a broad range of applications including autonomous systems.
To estimate the motion of a system exposing a high degree of freedom with high accuracy, it often becomes necessary to fuse information from different sensors. Sensor fusion is a well known strategy to reduce the impact of measurement errors on the state estimate [8] and estimate not directly observable system states. One traditional way to solve multi-sensor fusion problems for time critical applications is the famous Kalman filter (KF) algorithm [9] and its derivatives for nonlinear systems extended Kalman filter (EKF) and unscented Kalman filter (UKF). To achieve high estimation accuracy, KF-based algorithms require, in practice, a procedure to tune the filter parameters, which is, for complex systems, non-trivial yet crucial. To overcome this problem, adaptive methods have been proposed [10,11]; however, filter stability can become an issue under certain conditions. Optimization-based approaches allow for a more elegant formulation respecting system constraints [12]. Typical Newton-type optimization algorithms use an iterative approach and are therefore known to better capture the nonlinearity of the problem yielding a more accurate and robust solution.
In this paper, we present an optimization-based sensor fusion framework for state estimation following an MHE approach, extending [13]. The framework is applied to the well-known sensor fusion problem for inertial navigation of a global navigation satellite system (GNSS) receiver measuring position and an inertial measurement unit (IMU) measuring linear acceleration and angular velocity. The measurement data is used to estimate the 3D-pose and velocity of a maneuvering object, such as an aircraft or satellite. There are several methods to integrate the inertial data and GNSS, such as loosely coupled and tightly coupled integration. The integration is typically achieved by adapting a nonlinear KF [14][15][16]. By preprocessing the nonlinearities of the problem, linear KFs represent a further option for the sensor fusion problem [17].
The sensor fusion problem discussed in this paper arises from the use of consumer-grade IMUs for which sensor errors introduce a drift that needs to be compensated for by using different measurement updates such as those given by vision based sensors [18] or GNSS receivers. The resulting sensor fusion problem for inertial navigation has been addressed using traditional and advanced approaches for Kalman filtering [17,19]. The observability of sensor parameters, such as biases of the IMU, is strongly coupled to the motions of the system. Therefore, a unique solution to the estimation problem cannot be guaranteed [20]. By considering a time window of measurement data, the estimated solution is suggested to achieve improved accuracy and robustness [21]. Several MHE formulations for sensor fusion in the context of inertial navigation have been published in the recent past and have been shown to outperform traditional EKF approaches for the integration of GNSS and IMU [22,23] and online-identification of IMU parameters [24]. Despite the different sets of sensors, models and optimization methods, all authors present an accuracy improvement for a specific horizon length compared to Kalman filtering. This paper contributes to the existing research by presenting a general sensor fusion framework capable of analyzing moving horizon estimators over the spectrum of filter-like configurations to batch processing. Furthermore, the estimated results are evaluated against an accurate reference of a maneuvering aircraft that was obtained using high-grade sensors. A detailed study of the influence of the horizon length on the quality of the solution is given and critically analyzed.
This paper is organized as follows. First, in Section 2, we describe the relevant system and sensor models to address the sensor fusion problem by the definition of an equality-constrained optimization problem, which components are explained in detail. In Section 3, the presented approach is used to estimate the trajectory of an aircraft, and the results are compared against an accurate reference trajectory. Finally, the conclusions of this work are presented in Section 4.

Methods
Section 2 introduces the necessary coordinate frames to define the system and sensor models, which will finally be used to derive the optimization problem of the MHE state estimator.

Coordinate Frames
The sensor fusion problem contains measured and estimated quantities expressed in several coordinate frames (see Figure 2). The position measurements are obtained by the GNSS sensor in the earth-centered, earth-fixed (ECEF) frame and often expressed in latitude, longitude, altitude (LLA). The measurements are transformed to a locally-fixed and non-moving frame L following the east, north, up (ENU) convention with its origin located at a reference location. Since the transformation between global and local frame is constant over time, the measurements are converted to the local frame in a preprocessing step. The measurements of the IMU are obtained in the sensor coordinate frame S, which is moving with respect to the local frame L. Throughout this document, the notation · L or · S is adopted to indicate measured or estimated variables according to the local or sensor frame, respectively.

System Model
A generic piecewise constant linear acceleration and angular velocity model are used to model translational and rotational dynamics of the moving system. These models are widely used in the target tracking community and can be reviewed in detail in, e.g., the survey of [25]. The ordinary differential equations (ODEs) of the navigational states are defined bẏ where the position p L (t) ∈ R 3 and velocity v L (t) ∈ R 3 are obtained by integration of the acceleration a L (t) ∈ R 3 . The angular velocity ω S LS ∈ R 3 from the S-frame to the L-frame expressed in the S-frame drives the ODE of the orientation, which is parametrized by a unit quaternion q LS (t) ∈ Q 1 = {R 4 : q 2 = 1} describing the orientation between the S and L-frame. The operator is introduced for the product of a vector r ∈ R 3 and a quaternion q = (q 0 , q v ) ∈ Q 1 with q 0 ∈ R and q v ∈ R 3 being the scalar and vector part, respectively. The ODE Equations (1) of the system model lead to the definition of the concatenated vector which contains the navigation states of the model and is therefore denoted by the index M.
The navigational states are driven by the piecewise constant control inputs

Sensor Models
The sensors used for this estimation problem are a single GNSS receiver and an IMU which are both rigidly attached to the moving system. The described sensor models relate the measured quantities to the state and control variables defined accordingly in Equations (3) and (4). Section 2.2 defines a continuous-time model; however, the sensors acquire the measurements at discrete times t k with a sensor-specific measurement frequency f S . We define output functions for each sensor measurement, which evaluate the continuous variables at the sampling times t k and establish the relation between discrete measurements and estimated continuous state and control variables.

Inertial Measurement Unit
An IMU combines a three-axis accelerometer and a three-axis gyroscope in a single package. The sensors can be based on different principles that define implicitly the accuracy of the sensor. Due to the advances in micro-electro-mechanical system (MEMS) technology, the sensor modules can be produced very cost efficiently on a single silicon chip. The MEMS accelerometer and gyroscope are modeled in this paper using an additive bias term δ S to compensate the time-varying offsets in the average signal output of the sensor [26]. These biases δ S a ∈ R 3 and δ S ω ∈ R 3 are modeled as a random walk using the device specific Allan variance [27]. To include the estimation of the bias terms, we extend the state vector x M (t) as defined in Equation (3) to The bias terms are modeled as constants between consecutive sampling times t k and t k+1 of the sensor exposing the lowest sampling rate f S leading tȯ and the actual random walk is embedded inside the optimization problem by allowing discontinuities according to the device specific random walk process noise. The inertial quantities acceleration y S a ∈ R 3 and angular velocity y S ω ∈ R 3 are acquired by the IMU at sampling rate f IMU 100 Hz. The sampled IMU data is integrated between two consecutive measurements of the slowest sensor, which is, in this specific problem, the GNSS receiver, and expressed as motion increments [28]. The increments are converted to an average inertial measurement over the interval [t k , t k+1 ], which directly translates to the piecewise constant control inputs defined in Equation (4). The following output functions define the IMU measurements in terms of state and control variables: where g L stands for the constant gravity vector in the L frame and R(q, r) denotes the rotation of a vector r ∈ R 3 by a unit quaternion q ∈ Q 1 .

GNSS Receiver
A GNSS receiver is used to retrieve position measurements based on pseudo ranges estimated from satellite signals. The interested reader is referred to [29] for more information. A typical GNSS receiver has an output frequency between 1 Hz and 10 Hz, at which the position measurements are made available with respect to an earth fixed coordinate frame and converted to the local L frame as described before in Section 2.1. The output function for position measurements at t k is defined by which relates the GNSS measurements directly to the state x(t). The position accuracy is estimated and reported by the GNSS receiver and strongly depends on the signal and ambient conditions.

Optimization Problem
To define the optimization problem for the nonlinear sensor fusion of GNSS and IMU, we first define the components of the cost function, the implicit integration method using direct collocation, and the necessary equality constraints.

Measurement Residuals
The cost function of the optimal estimation problem is defined by the squared weighted sum of residuals between the estimated output variables y k and the measurementsȳ k at sampling times t k . The cost terms for the position and inertial measurements are defined by and evaluate the continuous output functions of IMU (7) and GNSS (8) at time t k . The residuals are weighted according to the measurement noise variances using the diagonal matrices Q (·) .

Direct Collocation
We use a direct collocation approach to embed the integration of the nonlinear dynamics inside the optimization problem. Direct collocation is a strategy known from the field of direct optimal control [30] that is based on the discretization of control and state trajectory, therefore lifting the problem to higher dimensional space in comparison to traditional single shooting methods [31]. By embedding the numerical integration inside the optimization problem using collocation variables, the need of an explicit call to an integrator vanishes. We decided to adopt direct collocation for the presented sensor fusion problem because of the potential to decrease the number of iterations by initializing the collocation variables well. The discretization of x(t) and u(t) at times t 0:N leads to the discrete sets of states X = {x 0 . . . x N } and controls U = {u 0 . . . u N−1 }. While the control trajectory is defined to be piecewise constant, the state trajectory is approximated using Lagrange polynomials P 0:D of degree D at Gauss-Radau collocation points τ ∈ R D [32]. Hence, the state trajectory is represented by the sum of the weighted polynomials where the polynomial is scaled by the collocation variables c k ∈ R N x ×D . The additional collocation variables enter the optimization problem over the collocation states C = {c 0 . . . c N−1 } and increase the number of decision variables. To retrieve a physically meaningful state trajectory, the collocation variables require being constrained using the system dynamics. This is enforced by constraining the derivative of the time-scaled polynomial at the collocation points τ 1:D to the ODE of the systeṁ x(t) = f (x(t), u(t)) evaluated at the same point in time using equality constraints. We obtain D equality constraints for each time interval [t k , t k+1 ] of duration T = t k+1 − t k defined by The discretization of the state trajectory X = {x 0 . . . x N } requires further equality constraints to obtain a closed state trajectory. The continuity constraints, constrain the next state to correspond to the propagated previous state. Z M ∈ R N x ×N x is defined as a selection matrix for the system model states x M (t). The continuity constraint is exclusively applied to the system states x M (t) as defined in Equation (3).

Random Walk
As described in Section 2.3, the biases are modeled by a random walk process. This behavior is introduced by relaxing the continuity constraint and defining additional cost terms where the difference between previous and current state is weighted according to the sensor's random walk process noise and minimized accordingly. The selection matrix Z δ ∈ R 3×N x is used to extract the bias states from the state vector x(t).

MHE Estimator
The components of the optimization problem introduced in Section 2.4 are leading to the definition of the following minimization problem for the MHE estimator subject tȯ where N ∈ N and T = t k+1 − t k define the horizon length and the sampling time. Since the quaternion is an over-parametrization of a rotation, we have to guarantee that the estimated quaternions satisfy the unit norm condition. The quaternion ODE of an orientation as defined in Equation (1c) preserves the unit norm, allowing for adding a single unit norm constraint at the end of an estimation horizon. The constraint (14c) is expressed using the selection matrix Z q for the quaternion entries of the state vector x j+N . The additional cost term in the objective function (14a) defines the arrival cost c A (x j ), which summarizes the history of measurements by penalizing deviations to previous estimates of x(t j ). The arrival cost is defined as a quadratic approximation of the Schur complement correction and is calculated while marginalizing the system before shifting the horizon [33]. Depending on the nonlinearity of the system, the approximation of the past by the arrival cost might be more or less accurate. In an MHE approach, longer horizons can increase the information about the past by considering more measurement data. The benefit of having a more information about the recent past becomes negligible in cases where the approximation of the past by the arrival cost is already accurate, and vice versa.

Dataset
The MHE estimator described in Section 2 is used in the following to study the influence of the horizon length N on the accuracy of the solution. The dataset used for evaluation has been recorded during a flight with a small Socata single propulsion aircraft (see Figure 3a) flying different maneuvers in hazy weather conditions. In the 100 s long data selection, the airplane completes two sharp turns (see Figure 4), while reaching roll angles up to 60 deg.

Sensor Setup
The complete sensor setup used for data collection contains two independent sets of sensors: one for the estimation results and one to obtain the reference trajectory. Both sets use a GNSS receiver and an IMU, which differ in their specifications. While the estimation setup includes exclusively consumer-grade sensors, the reference setup uses higher-grade sensors.
To collect the data used for estimation, we used a Xsens MTi-G-700 [34] motion tracker which combines an IMU and GNSS receiver in a single package. By sharing the same signal pipeline, the risk of time synchronization errors between the independent sensors is reduced. The Allan variance curves of the contained MEMS accelerometers and MEMS gyroscopes are plotted in Figure 5 and define the noise density and bias stability of the sensor. The noise density is used to fill the weighting matrices Q −1 ω and Q −1 a used in the cost functions (9b) and (9c). The GNSS receiver measures the position and its accuracy at time t k . The latter is further used to fill the weighting matrix Q −1 p,k in Equation (9a). The GNSS dataȳ L p,k was acquired at the maximum rate of 4 Hz and the IMU was configured to stream motion increments, as described in Section 2.3 at an output frequency of 4 Hz. The further available velocity measurements acquired by the GNSS receiver are not considered in this document.   Table 1 allows a direct comparison between the noise parameters of the sensors used for estimation and reference. The iMAR-FSAS IMU [35] uses a tactical-grade fibre optic gyro (FOG) with lower measurement noise compared to the consumer-grade MEMS gyroscopes used for estimation. Furthermore accurate GNSS receivers were used in a differential GPS (DGPS) configuration allowing for very precise positioning with errors below 0.1 m. To improve the accuracy of the ground truth trajectory further, the acquired data of the reference sensors was fused using a batch processing approach [36]. All sensor devices (Figure 3b) were rigidly attached to the airplane (Figure 3a). The sensor frame S of the IMU was aligned with the body frame of the aircraft, which implies that no further transformations were required to retrieve physically meaningful output for the navigation states defined in Equation (3).

Algorithm Configuration
Crucial for every kind of nonlinear optimization is the initialization of the decision variables. To initialize the states X and controls U , we first compute the initial orientation q LS 0 using the velocity vector estimated from consecutive GNSS measurements, assuming the system obeys holonomic constraints and has sufficient speed. The measurements of GNSS receiver and IMU are used to initialize position p L 0 and controls u(t 0 ), respectively. The accelerometer and gyroscope biases are initialized to zero. The resulting initial guess x(t 0 ) (as defined in Equation (3)) and its corresponding standard deviations are summarized in Table 2.

Horizon Length Evaluation
To analyze the estimation accuracy depending on the horizon length N, the estimation is repeated for different values N = {1, 2, 4, 8, 16, 20}. The horizon length N directly defines the number of measurements N M = N and the number of decision variables (15) contained in each MHE optimization problem, where N x , N u and N x M represent the dimensions of the state and control vectors defined in Equations (3) and (4). The degree of the collocation polynomials D defines implicitly the number of collocation variables used for the integration of the navigation states and is set to D = 3 for this evaluation. Figure 6 shows the estimated values for velocity and orientation as well as the calculated reference values. For all configurations, the MHE estimator output follows the reference trajectory and no major differences are observable. The plots reveal, however, that the estimation results recover faster from an incorrect initialization with longer horizons and that, in general, larger horizons show an improved tracking behavior of the reference trajectory. The evolution of the standard deviations of position, velocity, and orientation for the evaluated horizons is plotted in Figure 7. The standard deviations converge after the first turn to a steady state, which is determined by the measurement noise. The turn maneuver results in sufficient excitation of the system to observe the orientation, which is the only navigational state that is not directly related to the measurements of the sensors. In Figure 8, we observe that the bias estimates are converging towards the same steady-state for all evaluated horizons. Furthermore, the evolution of the bias standard deviations confirms the necessity of motion changes to identify sensor parameters. The estimated biases converge only after finishing the first turn at around 40 s to their corresponding steady state.  Before taking a closer look at the results for different horizons, we define two types of solutions: the real-time solution represents the estimate of the most recent state in the horizon, whereas the lagged solution represents the oldest state in the current window. In the latter case, the value of the state estimate gets updated until it is not contained anymore in the window. Considering a dataset with current data index j = 1, . . . , J of total length J, the lagged solution can be compared to fixed-lag smoothing, maximizing the probability of p(x j−N |y 1:j ), whereas the real-time solution corresponds to filtering approach estimating p(x j |y 1:j ). For completeness, we remind readers that a typical batch processing approach estimates p(x 0:J |y 1:J ). Even though both real-time and lagged solutions contain more measurement data with increasing horizons, there are differences in their interpretation. Both estimators contain an approximation of the past that is defined by the arrival cost c A (x j ) in Equation (14a). While the lagged version evaluates the first state in the window, the real-time version estimates the last state in the window, which time is defined by the latest measurement at time t j . From a time perspective, this means that the lagged version contains information about its past and future, whereas the real-time problem only contains information about its past.
A comparison between real-time and lagged solutions for different horizon lengths is given in Figure 9. A noticeable decrease in terms root mean square error (RMSE) with increasing horizons can only be observed for the lagged solution. The accuracy of the lagged solution converges towards the solution of the batch estimator (dotted lines), which can be interpreted as a locally optimal solution since all measurement data is contained in the optimization problem, allowing the removal of start-up effects due to the initialization of the algorithm. In terms of error quantities, we observe relatively large RMSEs for the position estimate, ranging up to 3.5 m in the z-direction. The batch solution confirms this large error of 3.1 m, which allows the conclusion that the error is not coupled to start-up effects or initialization errors but rather a systematic offset between the sensor setups. Table 1 shows that the GNSS receiver, used in the reference setup, operates on the L1/L2-band, which improves position accuracy and especially the altitude estimate compared to a single L1-band GNSS receiver as used for estimation.
The orientation estimate reveals error reductions for roll φ and pitch θ angles with increasing horizon length. The yaw angle ψ reveals the largest RMSE, which is an expected behavior due to limited system dynamics contained in the dataset. The 100 s long flight is not subject to larger changes in pitch or strong accelerations in the xy-plane, which contributes to the challenging conditions for the estimation of the yaw angle in the presence of sensor biases. The consequence of a lack of excitation can be further observed in the bias estimates and their standard deviations in Figure 8. The initial jumps of the bias estimates reveal that the biases are not observable under constant motion. After completion of the first turn, the biases in xand y-directions become observable and the values converge to their corresponding steady state.
Coming back to Figure 9, we notice that only the lagged MHE solution (solid lines) achieves major RMSE improvements in position, velocity, and orientation with an increasing horizon length N. The real-time solution does hardly improve by considering a bigger horizon in comparison to the filter solution, where N = 1. The evaluated sensor fusion problem exposes a close to linear behavior, which can be approximated well by an arrival cost. Therefore, only small improvements can be achieved by considering a longer window and, therefore, a more detailed representation of the recent past for the real-time solution.
The nearly linear behavior of the regarded problem is further proven by restricting the optimization algorithm to a maximum number of one iteration [37]. This configuration achieves a KF-like algorithm where only a single linearization of the system is calculated. Analyzing the solution in Figure 10 for the RMSEs in orientation, we notice no visible difference between the single-iteration and the iterated solution that satisfies the exit condition, even though the converged solutions will naturally preserve constraints more accurately.

Discussion
The presented MHE approach for multi-sensor fusion allows a mathematically elegant formulation in a single optimization problem where cost functions and constraints are recursively adapted with the availability of new measurement data. We introduced the modules of the framework by applying it to the commonly known sensor fusion problem of GNSS receiver and IMU and evaluated the accuracy of the solution on a real flight dataset in comparison to an accurate reference trajectory for different horizon lengths N = {1, 2, 4, 8, 16, 20}. Major improvements for the navigation states position, velocity, and orientation in terms of RMSEs can be achieved when considering either the time-lagged or the batch estimates. With an increasing horizon, the RMSEs decrease and errors approach the results of the post-processed batch solution. For the real-time solution, which estimates the state at the most recent measurement, the performance is only slightly better than the filter solution, corresponding to N = 1.
These findings coincide with relevant research [21], where MHE is suggested to improve consistently the accuracy of the estimates at the price of a higher computational cost. In general, it can be said that if applications allow for a time-lagged estimate, the results will be more accurate considering larger horizons. This makes the time-lagged implementation of a MHE-based estimator a versatile approach for systems with slow dynamics and represents an alternative for memory-restrictive offline processing. For real-time critical applications which are governed by nearly linear equations, the accuracy of the state estimates does hardly benefit from the nonlinear optimization approach resulting in a comparable accuracy to filter solutions. The nearly linear behavior of the specific sensor fusion problem was confirmed by evaluating an EKF-like configuration of the MHE estimator using a real-time iteration scheme [37] with a single iteration and a horizon of N = 1 yielding similar results in terms of RMSEs. Notice, the considered sensor fusion problem of IMU and GNSS receiver can be considered a linear estimation problem when conditioned on orientation, i.e., the optimization problem reduces to linear system if the orientation trajectory is known a priori.
The presented optimization-based framework for sensor fusion allows a seamless and detailed evaluation of filter, MHE, and batch solutions by adapting the horizon length N. The modular structure contains several key features which we aim to exploit in further research. By encoding the system dynamics in a single set of ODEs, the framework can be adjusted to consider more complex system kinematics in a straightforward manner. Additional measurement updates from further sensors can be included by manipulation of the cost function of the optimization problem. When increasing the number of sensors, the calculation of a common time-grid becomes nontrivial. The presented framework is prepared to account for asynchronous measurements by estimating time-continuous state trajectories using a direct collocation approach. The adaptivity of a MHE-based approach for sensor fusion is one major advantage compared to traditional filter schemes and combined with efficient numerical solvers MHE is likely to be beneficial for many challenging applications.
The problem of motion tracking does not classify as nearly-linear problem in the general case. By considering measurements of relative distance sensors, such as ultra-wideband (UWB), time of flight (TOF) or camera sensors, the sensor models become strongly nonlinear. Therefore an MHE approach with the appropriate optimization strategy is expected to improve the real-time estimates as well. For the future, we plan to extend the presented sensor fusion approach of GNSS and IMU to include the full calibration of the sensors, which will increase the nonlinearity of the optimization problem and MHE is expected to outperform conventional filter techniques. The identification of parameters like scale factors, misalignment or time delays will improve the accuracy when using uncalibrated consumer-grade sensors.