Next Article in Journal
A Hybrid SAR/ISAR Approach for Refocusing Maritime Moving Targets with the GF-3 SAR Satellite
Next Article in Special Issue
Latency Compensated Visual-Inertial Odometry for Agile Autonomous Flight
Previous Article in Journal
Investigating the Use of Pretrained Convolutional Neural Network on Cross-Subject and Cross-Dataset EEG Emotion Recognition
Previous Article in Special Issue
Weak Knock Characteristic Extraction of a Two-Stroke Spark Ignition UAV Engine Burning RP-3 Kerosene Fuel Based on Intrinsic Modal Functions Energy Method
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:

Robust Outlier-Adaptive Filtering for Vision-Aided Inertial Navigation

School of Aerospace Engineering, Georgia Institute of Technology, 270 Ferst Drive, Atlanta, GA 30313, USA
Faculty of Aerospace Engineering, The Pennsylvania State University, 229 Hammond Building, University Park, PA 16802, USA
Author to whom correspondence should be addressed.
Current address: 55 River Oaks Pl, San Jose, CA 95134, USA.
Sensors 2020, 20(7), 2036;
Submission received: 7 March 2020 / Revised: 28 March 2020 / Accepted: 1 April 2020 / Published: 4 April 2020
(This article belongs to the Special Issue UAV-Based Smart Sensor Systems and Applications)


With the advent of unmanned aerial vehicles (UAVs), a major area of interest in the research field of UAVs has been vision-aided inertial navigation systems (V-INS). In the front-end of V-INS, image processing extracts information about the surrounding environment and determines features or points of interest. With the extracted vision data and inertial measurement unit (IMU) dead reckoning, the most widely used algorithm for estimating vehicle and feature states in the back-end of V-INS is an extended Kalman filter (EKF). An important assumption of the EKF is Gaussian white noise. In fact, measurement outliers that arise in various realistic conditions are often non-Gaussian. A lack of compensation for unknown noise parameters often leads to a serious impact on the reliability and robustness of these navigation systems. To compensate for uncertainties of the outliers, we require modified versions of the estimator or the incorporation of other techniques into the filter. The main purpose of this paper is to develop accurate and robust V-INS for UAVs, in particular, those for situations pertaining to such unknown outliers. Feature correspondence in image processing front-end rejects vision outliers, and then a statistic test in filtering back-end detects the remaining outliers of the vision data. For frequent outliers occurrence, variational approximation for Bayesian inference derives a way to compute the optimal noise precision matrices of the measurement outliers. The overall process of outlier removal and adaptation is referred to here as “outlier-adaptive filtering”. Even though almost all approaches of V-INS remove outliers by some method, few researchers have treated outlier adaptation in V-INS in much detail. Here, results from flight datasets validate the improved accuracy of V-INS employing the proposed outlier-adaptive filtering framework.

1. Introduction

The most widely used algorithms for estimating the states of a dynamic system are a Kalman Filter [1,2] and its nonlinear versions such as an extended Kalman filter (EKF) [3,4]. After the NASA Ames Research Center implemented the Kalman filter into navigation computers to estimate the trajectory of the Apollo program, engineers have developed a myriad of applications of the Kalman filter in navigation system research areas [5]. For example, Magree and Johnson [6] developed a simultaneous localization and mapping (SLAM) architecture with improved numerical stability based on the UD factored EKF, and Song et al. [7] proposed a new EKF system that loosely fuses both absolute state measurements and relative state measurements. Furthermore, Mostafa et al. [8] integrated radar odometry and visual odometry via EKF to help overcome their limitations in navigation. Despite the development of numerous applications of the Kalman filter in various fields, it suffers from inaccurate estimation when required assumptions fail.
Estimation using a Kalman filter is optimal when process and measurement noise are Gaussian. However, sensor measurements are often corrupted by unmodeled non-Gaussian or heavy-tailed noise. An abnormal value relative to an overall pattern of the nominal Gaussian noise distribution is called an outlier. In other words, in statistics, an outlier is an observation that deviates so much from other observations as to arouse suspicion that it is generated by a different mechanism [9]. Such outliers have many anomalous causes. They arise due to unanticipated changes in system behavior (e.g., temporary sensor failure or transient environmental disturbance) or unmodeled factors (e.g., human errors or unknown characteristics of intrinsic noise). As an example of measurement outliers in many navigation systems, either computer vision data contaminated by outliers or sonar data corrupted by phase noise lead to erroneous measurements. Process outliers also occur by chance. Inertial measurement unit (IMU) dead reckoning and wheel odometry as a proxy often generate inaccurate dynamic models in visual-inertial odometry (VIO) and SLAM algorithms, respectively. Without accounting for outliers, the accuracy of the estimator significantly degrades, and control systems that rely on high-quality estimation can diverge.

1.1. Related Work

1.1.1. State Estimation for Measurements with Outliers

As the performance of the Kalman filter degrades at the presence of measurement outliers, many researchers have investigated other approaches to mitigate the impact of the outliers. Mehra [10] created adaptive filtering with the identification of noise covariance matrices and showed the asymptotic convergence of the estimates towards their true values. Maybeck [11] and Stengel [12] found other noise-adaptive filtering such as covariance matching. However, all of these filters performed only offline and required filter tuning. To estimate parameter values in unknown covariances without the need for manual parameter tuning, Ting et al. [13] used a variational expectation–maximization (EM) framework. That is, they introduced a scalar weight for each observed data sample and modeled the weights to be Gamma distributed random variables. However, it assumed that noise characteristics are homogeneous across all measurements even though sensors have distinct properties. Särkkä and Nummenmaa [14] provided the online learning of the parameters of the measurement noise variance, but to simultaneously track the system states and the levels of sensor noise, they additionally defined a heuristic transition model for the noise parameters. Piché et al. [15] developed Gaussian assumed density filtering and smoothing framework for nonlinear systems using the multivariate Student t-distribution, and Roth et al. [16] included an approximation step for heavy-tailed process noise, but these filters are not applicable in high dimensions. Next, Solin and Särkkä [17] found that the added flexibility of Student-t processes over Gaussian processes robustifies inference in outlier-contaminated noisy data, but they treated only analytic solutions enabled by the noise entanglement.
Recently, Agamennoni et al. developed the outlier-robust Kalman filter (ORKF) [18,19] to obtain the optimal precision matrices of measurement outliers by variational approximation for Bayesian inference [20]. However, this method requires iterations at every time, even when observed data contain no outliers. Graham et al. also established the l 1 -norm filter [21] for both types of sparse outliers. However, the filter might not work for nonlinear systems since they derived the constraint of l 1 -norm optimization based on only linear system equations. Similar to the ORKF, the l 1 -norm filter needs the constrained optimization at all times, even when no additional noise present as outliers. Therefore, these two approaches demand some extensive computational complexity for either iterations or optimization. As outliers do not always arise (i.e., are rare), we reduce such computation cost if a test detects the time when outliers occur. All of the above methods were not validated for complicated systems such as unmanned aerial vehicles or vision-aided inertial navigation [6,22,23] or with sequential measurement updates [24,25].

1.1.2. Outlier Rejection Techniques

One of the primary problems in vision-aided inertial navigation systems (V-INS) is incorrect vision data correspondence or association. Matched features between two different camera views are corrupted by outliers because of image noise, occlusions, and illumination changes that are not modeled by the feature matching techniques. To provide cleaned measurement data to the filter, outlier removal in image processing front-end is essential. One of standard outlier rejection techniques is RANdom SAmple Consensus (RANSAC) [26]. RANSAC is an iterative approach to estimate the parameters of a mathematical model from a set of observed data contaminated by outliers. An underlying assumption is that the data consist of inliers whose distribution is described by some set of the parameters of the model and outliers that do not fit the model. The generated parameters are then verified on the remaining subset of the data, and the model with the highest consensus is a selected solution. In particular, 2-point RANSAC [27,28] is an extended RANSAC-based method for two consecutive views of a camera rigidly mounted on a vehicle platform. Given gyroscopic data from IMU measurements, randomly selected two-feature correspondences hypothesize an ego-motion of the vehicle. This motion constraint discards wrong data associations in the feature matching processes.
For detecting remaining outliers that are not rejected in the image processing front-end, outlier detection tests are required in filtering back-end. Most of the statistical tests [29] that require access to the entire set of data samples for detecting outliers might not be a viable option in real-time applications. For example, the typicality and eccentricity data analysis [30,31] used in [32] is an inadequate measure in V-INS, as computing the means and the variances of each residual of sequential measurements is challenging. In V-INS, the tracking of some measured features is possibly lost due to out of sight, and new feature measurements are coming for initialization.
For the real-time outlier detection of sequential measurements in V-INS, the Mahalanobis gating test [33] is a useful measure based on the analysis of residual and covariance signals at each feature measurement. The approach builds upon each Mahalanobis distance [34] of residuals and compares each value against a threshold given by the quantile of the chi-squared distribution with a certain degree of freedom. The confidence level of the threshold is designated prior to examining the data. Most commonly, the 95% confidence level is used. This hypothesis testing, called goodness of fit, is a commonly used outlier detection method in practice. Because of such suitability of the Mahalanobis gating test to real-time detection in V-INS, this paper combines the test with the ORKF [18,32] to detect and handle measurement outliers in vision-aided estimation problems. Similar to the derivation of update steps for handling measurement outliers in the ORKF, for computing the optimal precision matrices of unmodeled outliers in V-INS, Section 4 will derive feasible update procedures by variational inference. In other words, whenever unexpected outliers appear, the outlier-adaptive filtering in this paper updates and marginalizes measurement outliers to improve the robustness of the navigation systems.

1.2. Summary of Contributions

This paper presents improving the use of outlier removal techniques in image processing front-end and the development of a robust and adaptive state estimation framework for V-INS when frequent outliers occur. For outlier removal in the image processing front-end of V-INS, feature correspondence constitutes the following three steps: tracking, stereo matching, and 2-point RANSAC. To estimate the states of V-INS in which vision measurements still contain remaining outliers, we propose a novel approach that combines a real-time outlier detection technique with an extended version of the ORKF in the filtering back-end of V-INS. Therefore, our approach does not restrict noise at either a constant or Gaussian level in filtering. The testing results of benchmark flight datasets show that our approach leads to greater improvement in accuracy and robustness under severe illumination environments.
Starting from the architecture of the existing vision-aided inertial navigation system, this paper more focuses on contributing to the development of red boxes in Figure 3.

1.3. A Guide to This Document

The remainder of this document contains the following sections. Section 2 introduces background for all of this study. To estimate the states of V-INS in which frequent outliers arise, Section 3 examines outlier rejection techniques in image processing front-end, and Section 4 and Section 5 formulate a novel implementation of robust outlier-adaptive filtering. Section 6 shows testing results of this study on benchmark flight datasets. The last section concludes and plans future work.

2. Preliminaries

2.1. The Extended Kalman Filter

The system equations with continuous-time dynamics and a discrete-time sensor are as follows,
x ˙ ( t ) = f x ( t ) , η ( t )
y ( t k ) = h ( x ( t k ) ) + ζ ( t k ) ,
where x R n is the state and y R m a measurement. f ( · ) and h ( · ) are the nonlinear dynamic and measurement functions, respectively. Let us assume that these functions are known based on each equation of motion and modeling. To clarify, t denotes continuous time, subscript k represents the k-th time step, and initial condition x ( 0 ) = x 0 is given. Moreover, let us assume that both propagation and measurements are corrupted by additive zero-mean white Gaussian noise; that is, η ( t ) N ( 0 , Q ) and ζ ( t k ) N ( 0 , R ) .

2.1.1. Time Update

To estimate the state variables of the system, we design a hybrid EKF in the following steps. In the propagation step, state estimate x ^ : = E [ x ] and its error-covariance P : = E [ ( x x ^ ) ( x x ^ ) T ] are integrated from time ( k 1 ) + to time k with respect to variable τ
x ^ k = x ^ k 1 + + t k 1 t k f x ^ ( τ ) d τ ,
where let x ^ k = x ^ ( t k ) . Hat “ ^ ” denotes an estimate, and superscripts − and + represent a priori and a posteriori estimates, respectively. Here, for one numerical solution of the ordinary differential equation, we use the Heun’s method [35] that refers to the improved Euler’s method or a similar two-stage Runge–Kutta method. Jacobian A, B, and state transition matrix Φ are defined by
A k 1 = f ( x ) x | x ^ k 1 + , B k 1 = f ( x ) η | x ^ k 1 + Φ k 1 = exp ( A k 1 Δ t k 1 ) I + A k 1 Δ t k 1 ,
where Δ t k 1 = t k t k 1 . Letting P k = P ( t k ) , the time update of error covariance is
P k = Φ k 1 P k 1 + Φ k 1 T + B k 1 Q B k 1 T Δ t k 1 .

2.1.2. Measurement Update

Using actual sensor measurements, the measurement update step of the EKF corrects state estimates and its corresponding error covariance after propagation. Letting y k = y ( t k ) , at every time k when y k is measured,
K k = P k C k T ( C k P k C k T + R ) 1
x ^ k + = x ^ k + K k ( y k h ( x ^ k ) )
P k + = P k K k C k P k ,
where K is called the Kalman gain and Jacobian C is defined as
C k = h ( x ) x | x ^ k .
Equation (7) is the Joseph’s form [36] of the covariance measurement update, so this form preserves its symmetry and positive definite. For more details such as optimality and derivation, see References [24,37].

2.1.3. Sequential Measurement Update

When myriad measurements are observed at one time, sequential Kalman filtering is useful. In fact, we obtain N measurements, y 1 , y 2 , , y N , at time k; that is, we first measure y 1 , then y 2 , ⋯, and finally y N , shown in Figure 1.
We first initialize a posteriori estimate and covariance after zero measurement is processed; that is, they are equal to the a priori estimate and covariance. For i = 1 , , N , perform the general measurement update using the i-th measurement. We lastly assign the a posteriori estimate and covariance as x ^ k + x ^ k N and P k + P k N . Based on Simon [24]’s proof that the sequential Kalman filtering is equivalent formulation of the standard EKF, the order of updates does not affect overall performance of estimation.

2.2. Models of Vision-Aided Inertial Navigation

2.2.1. Vehicle Model

The nonlinear dynamics of a vehicle is driven by raw micro-electromechanical system (MEMS) IMU sensor data including specific force and angular velocity inputs. The estimated vehicle state is given by
x ^ V = i p ^ b / i T i v ^ b / i T δ θ ^ T b ^ a T b ^ ω T T ,
where p b / i , v b / i are the position and velocity of the vehicle with respect to the inertial frame, respectively. δ θ is the error quaternion of the attitude of the vehicle, and its more details are explained in [38,39,40]. b a and b ω are the acceleration and gyroscope biases of the IMU, respectively. Left superscript i denotes a vector expressed in the inertial frame. The EKF propagates the vehicle state vector by dead reckoning with data from the IMU. Raw MEMS IMU sensor measurements a raw and ω raw are corrupted by noise and bias as follows,
a raw = a true T b / i i g + b a + η a ω raw = ω true + b ω + η ω
b ˙ a = η b a
b ˙ ω = η b ω ,
where a true and ω true are the true acceleration and angular rate, respectively, and g is the gravitational acceleration in the inertial frame. η a and η ω are zero-mean, white, Gaussian noise of the accelerometer and gyroscope measurement, and T b / i = T i / b T denotes the rotation matrix from the inertial frame to the body frame.
η b a and η b ω in Equations (9) and (10) are the random walk rate of the acceleration and gyroscope biases. From the works in [40,41,42], the MEMS accelerometer and gyroscope are subject to flicker noise in the electronics and other components susceptible to random flickering. The flicker noise causes their biases to wander over time. Such bias fluctuations are usually modeled as a random walk. In other words, slow variations in the bias of the IMU sensor are modeled with a “Brownian motion” process, also termed a random walk in discrete-time. In practice, the biases are constrained to be within some range, and thus the random walk model is only a good approximation to the true process for short periods of time.
The vehicle dynamics is given by
i p ^ ˙ b / i = i v ^ b / i
i v ^ ˙ b / i = T ^ i / b ( a raw b ^ a ) + i g
q ^ ˙ i / b = 1 2 Q ( ω raw b ^ ω ) q ^ i / b
δ θ ^ ˙ = ( ω raw b ^ ω ) × δ θ ^
b ^ ˙ a = 0
b ^ ˙ ω = 0 ,
where α × is a skew symmetric matrix, and function Q ( · ) maps a 3 by 1 vector of the angular velocity into a 4 by 4 matrix [25]. The use of the 4 by 1 quaternion representation in state estimation causes the covariance matrix to become singular, so it requires considerable accounting for the quaternion constraints. To avoid these difficulties, engineers developed the error-state Kalman filter in which 3 by 1 infinitesimal error quaternion δ θ is used instead of 4 by 1 quaternion q in the state vector. In other words, we use attitude error quaternion to express the incremental difference between tracked reference body frame b and actual body frame b for the vehicle. Jacobian matrix A = x ˙ x | x ^ and B = x ˙ η , where η = [ η a T , η ω T , η b a T , η b ω T ] T , are computed in Appendix A.

2.2.2. Camera Model

An intrinsically calibrated pinhole camera model [27,43] is given by
u j v j = y j = h j x + ζ j = f u c X j c Z j + ζ u j f v c Y j c Z j + ζ v j
c X j , c Y j , c Z j T = c p f j / c = T c / i i p f j / i i p c / i = T c / b T q i / b i p f j / i i p b / i T c / b b p c / b ,
where x is the state vector including the vehicle state and the feature state, and measurement y j is the j-th feature 2D location on the image plane. f u and f v are the horizontal and vertical focal lengths, respectively, and ζ u and ζ v are additive, zero-mean, white, Gaussian noise of the measurement. Vectors p f j / c , p f j / i are the j-th feature 3D position with respect to the camera frame and the inertial frame, respectively. Extrinsic parameter T c / b and b p c / b are known and constant. Jacobian matrix C j = y j x | x ^ is computed in Appendix A. In addition, from Equation (19), if j-th measurement y j on an image is a new feature, then i p f j / i is unknown so need to be initialized. Details of feature initialization are explained in Appendix B.

3. Outlier Rejection in Image Processing Front-End

3.1. Feature Correspondence

In this study, a feature detector using the Features from Accelerated Segment Test (FAST) algorithm [44,45] maintains a minimum number of features in each image. For each new image, a feature extractor using the Kanade–Lucas–Tomasi (KLT) sparse optical flow algorithm [46] tracks the existing features. Even though Paul et al. [47] proved that descriptor-based methods for temporal feature tracking are more accurate than KLT-based methods, as Sun et al. [48] found that descriptor-based methods require much more computing resource with a small gain in accuracy, we employ the KLT optical flow algorithm in the image processing front-end of this study. Next, our stereo matching using a fixed baseline stereo configuration also applies to the KLT optical flow algorithm for saving computational loads compared to other stereo matching approaches. With the matched features, a 2-point RANSAC [26] is applied to remove remaining outliers by utilizing the RANSAC step in the fundamental matrix test [27]. In the scope of this study, we implement the 2-point RANSAC algorithm by simply running one of open source codes.
Similar to [48,49], our outlier rejection is composed of three steps, shown in Figure 2. We assume that features from previous c 1 and c 2 images are outlier-rejected points, where c 1 and c 2 are left and right camera frames of a stereo camera, respectively. The three steps form a closed loop of previous and current frames of left and right cameras. The first step is the stereo matching of tracked features on current c 1 image to c 2 image. The next steps are applying 2-point RANSAC between previous and current images of the left camera and another 2-point RANSAC between previous and current images of the right camera. For steps 2 and 3, stereo-matched features are directly used in each RANSAC.

3.2. Algorithm of Feature Correspondence

Algorithm 1 summarizes the feature correspondence for outlier rejection. For the scope of this paper, the OpenCV library [50] and open source codes of RANSAC are extremely useful and directly applied.
Algorithm 1 Feature Correspondence for Outlier Rejection
Require: Pyramids and outlier-rejected points of previous c 1 , c 2 images
Feature Tracking:
functionbuildOpticalFlowPyramid(current c 1 or c 2 image)                           ▹ OpenCV [50]
return pyramid of current c 1 or c 2
end function
functionpredictFeatures(outlier-rejected points of previous c 1 , T ^ curr prev of c 1 , Intrinsic c 1 )
return predicted features of current c 1
end function
functioncalcOpticalFlowPyrLK(pyramids of previous and current c 1 , outlier-rejected points of previous c 1 , predicted features of current c 1 )                                            ▹ OpenCV [50]
return tracked points of previous c 1 and c 2 , tracked features of current c 1
end function
Stereo Matching:
functionstereoMatching(tracked points of previous c 1 and c 2 , tracked features of current c 1 )
 Initialize c 2 points by projecting the tracked features of current c 1 to c 2 using the rotation from stereo extrinsic
function calcOpticalFlowPyrLK(pyramid of current c 1 and c 2 , tracked features of current c 1 , initialized c 2 points)                                                    ▹ OpenCV [50]
end function
 Further remove outliers based on the essential matrix
return matched points of previous c 1 and c 2 , matched features of current c 1 and c 2
end function
2-Point RANSAC:
functiontwoPointRansac(matched points of previous c 1 or c 2 , matched features of current c 1 or c 2 , T ^ curr prev of c 1 or c 2 , Intrinsic of c 1 or c 2 )
return outlier-rejected points of current c 1 or c 2
end function
Addition of Newly Detected Features:
Create a mask to avoid re-detecting existing features
functionfastFeatureDetector(current c 1 image, mask)
return new features on current c 1
end function
functionstereoMatching(new features on current c 1 )
return matched new features on current c 2
end function
Group all of outlier-rejected features
In Algorithm 1, Pyramid is a type of multi-scale signal representation in which an image is subject to repeated smoothing and sub-sampling.

4. Outlier Adaptation in Filtering Back-End

Even though image processing front-end removes outliers by tracking, stereo matching, and 2-point RANSAC, some outlier features still survive and enter the filter as inputs. This section explains the outlier rejection procedure in filtering back-end.

4.1. Outlier Removal in Feature Initialization

If a measurement is a new feature, our system initializes its 3D position with respect to the inertial frame. In feature initialization, Gaussian–Newton least-squares minimization first estimates the depth of left c 1 camera. If either the estimated depth of the left or right camera is negative, then the solution of the minimization is invalid since features are always in front of both camera frames observing them. This process of removing features that has the invalid depth is referred to as outlier removal in feature initialization.

4.2. Outlier Detection by Chi-Squared Statistical Test

Before operating navigation systems, we initialize the chi-squared test table with a 95% confidence level. While the systems estimate the state variable, if j-th measurement y j at time k is the existing feature, then its residual r j and Jacobian C j are computed. Next, we proceed a Mahalanobis gating test [33] for residual r j to detect remaining outliers. In fact, Mahalanobis distance [34] γ j is a measure of the distance between residual r j and covariance matrix S j = C j ( P k ) j 1 C j T + R
γ j = r j T C j ( P k ) j 1 C j T + R 1 r j .
In the statistic test, we compare γ j value against a threshold given by the 95-th percentile of the χ 2 distribution with ν j degree of freedom. Here, ν j is the number of observations of the j-th feature minus one. If the feature passes the test, the EKF uses residual r j to process the measurement update.

4.3. Outlier-Adaptive Filtering

Unlike the extended ORKF (EORKF) [32], for a practical estimation approach in V-INS, this study investigates only measurement outliers due to the following reasons. As the measurement update is not the process performed at every time step, the outlier detection by each residual value cannot directly detect the outliers of IMU measurements. Furthermore, in the sequential measurement update, multiple residuals are computed to update at one IMU time stamp. In other words, as only rare observations among feature measurements from one image are corrupted by the remaining outliers, hypothesizing that the outliers come from the IMU may be faulty. Therefore, in the scope of this paper, we handle only measurement outliers.

4.3.1. Student’s t-Distribution

Despite the true system with outliers, the classical EKF assumes that each model in the filter is corrupted with additive white Gaussian noise. The levels of the noise are assumed to be constant and encoded by sensor covariance matrices Q and R (i.e., η k N ( 0 , Q ) , ( ζ j ) k N ( 0 , R ) ). However, as outliers arise in the realistic system, now we do not restrict noise at either a constant or Gaussian level. Instead, their levels vary over time, or noise have heavier tails than the normal distribution as follows,
ζ j | k ST ( 0 , R ˜ j , ν j ) , where R ˜ j W 1 ν j Λ j , ν j ,
where ST ( · ) denotes a Student’s t-distribution, and ν k > m 1 is degrees of freedom. Covariance matrix R ˜ j follows the inverse-Wishart distribution, denoted as W 1 ( · ) . Λ j 0 is m × m precision matrix.
In Bayesian statistics, the inverse-Wishart distribution is used as the conjugate prior for the covariance matrix of a multivariate normal distribution [20]. The probability density function (pdf) of the inverse-Wishart is
p ( R ˜ j | ν j , Λ j ) | R ˜ j | ν j + m + 1 2 exp ν j 2 tr ( Λ j R ˜ j 1 ) ,
where tr ( · ) denotes the trace of a square matrix in linear algebra. Moreover, in probability and statistics, a Student’s t-distribution is any member of a family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the standard deviation of the population is unknown [51]. Whereas a normal distribution describes a full population, a t-distribution describes samples drawn from a full population; thus, the larger the sample, the more the distribution resembles a normal distribution. Indeed, as the degree of freedom goes to infinity, the t-distribution approaches the standard normal distribution. In other words, when the variance of a normally distributed random variable is unknown and a conjugate prior placed over it that follows an inverse-Wishart distribution, the resulting marginal distribution of the variable follows a Student’s t-distribution [52]. Then, the Student-t, a sub-exponential distribution with much heavier tails than the Gaussian, is more prone to producing outlying values that fall far from its mean.

4.3.2. Variational Inference

The purpose of filtering is generally to find the approximations of posterior distributions p ( x k | y 1 : k ) , where y 1 : k = [ y 1 , y 2 , , y k ] is the histories of sensor measurements obtained up to time k. For systems with the heavy-tailed noise, we also wish to produce another inference about covariance matrices whose priors follow the inverse-Wishart distribution. Hence, our goal in this section is to find both approximations for posterior distribution p ( x 1 : k , R ˜ 1 : k | y 1 : k ) and model evidence p ( y 1 : k ) . Compared to sampling methods, the variational Bayesian method performs approximate posterior inference at low computational cost for a wide range of models [20,52]. In the method, we decompose log marginal probability
ln p ( y 1 : k ) = KL q p + L [ q ] ,
KL q p = q ( x 1 : k , R ˜ 1 : k ) ln q ( x 1 : k , R ˜ 1 : k ) p ( x 1 : k , R ˜ 1 : k | y 1 : k )
L [ q ] = q ( x 1 : k , R ˜ 1 : k ) ln p ( x 1 : k , R ˜ 1 : k , y 1 : k ) q ( x 1 : k , R ˜ 1 : k ) .
p is the true distribution that is intractable for non-Gaussian noise models, and q is a tractable approximate distribution.
In probability theory, a measure of the difference between two probability distributions p and q is the Kullback–Leibler divergence, denoted as KL [ · ] . If we allow any possible choice for q such as the Gaussian distribution, then lower bound L [ q ] is maximum when the KL divergence vanishes; that is, q ( x 1 : k , R ˜ 1 : k ) = p ( x 1 : k , R ˜ 1 : k | y 1 : k ) . To minimize the KL divergence, we seek the member of a restricted family of q ( x 1 : k , R ˜ 1 : k ) . Indeed, maximizing L [ q ] is equivalent to minimizing another new KL divergence [52], and thus the minimum occurs when factorized distributions q ( x 1 : k , R ˜ 1 : k ) = q ( x 1 : k ) q ( R ˜ 1 : k ) and the following Equations (25) and (26) hold,
ln q ( x 1 : k ) = ln p ( x 1 ) + t = 2 k E q ( R ˜ 1 : t ) [ ln p ( x t | x t 1 ) ] + t = 1 k E q ( R ˜ 1 : t ) [ ln p ( y t | x t , R ˜ t ) ] +
ln q ( R ˜ k ) = E q ( x 1 : k ) [ ln p ( y k | x k , R ˜ k ) ] + ln p ( R ˜ k ) + ,
where E q ( · ) represents the expectation with respect to q ( · ) . With assuming that initial state x 1 is Gaussian, the measurement update with varying noise covariance E [ R ˜ t 1 ] = Λ t 1 , which closely resemble the EKF updates, solve Equation (25). Algorithm 2 describes the details of the updates.
Now let us assume that the true priors are IID noise models as the case in this study; that is, p ( R ˜ k ) follows W 1 ( ν R , ν ) distribution. Then, second term ln p ( R ˜ k ) in the right-hand side of Equation (26) is computed using the pdf of the inverse-Wishart distribution in Equation (21) with its prior noise model.
ln p ( R ˜ k ) = ν + m + 1 2 ln | R ˜ k | ν 2 tr ( R R ˜ k 1 ) .
As the term is conjugate prior for Equation (20), the approximations of q ( R ˜ k ) have same mathematical forms as priors; that is, q ( R ˜ k ) also follows W 1 ( ν ˜ k Λ k , ν ˜ k ) distribution.
ln q ( R ˜ k ) = ν ˜ k + m + 1 2 ln | R ˜ k | ν ˜ k 2 tr ( Λ k R ˜ k 1 ) .
As y t | { x t , R ˜ t } N ( h ( x t ) , R ˜ t ) ,
E ln p ( y k | x k , R ˜ k ) = 1 2 ln | R ˜ k | 1 2 tr E [ ζ k ζ k T ] R ˜ k 1 .
From Equations (26)–(29), to handle measurement outliers, similar to Agamennoni et al. [18,19]’s derivation, we derive how to compute precision matrix Λ k of approximate distribution q ( R ˜ k ) of R ˜ k as follows,
ν ˜ k = 1 + ν , ν ˜ k Λ k = E [ ζ k ζ k T ] + ν R Λ k = ν R + E [ ζ k ζ k T ] ν + 1 ,
where each feature from one image is independent and
( ζ j ) k = ( y j ) k h ( x ( t img ) ) = ( y j ) k h ( x ^ k ) j + e j ( y j ) k h ( x ^ k ) j C j e j .
Next, in Equation (30),
E ( ζ j ) k ( ζ j ) k T = ( y j ) k h ( x ^ k ) j ( y j ) k h ( x ^ k ) j T + C j E [ e j e j T ] C j T = r j r j T + C j ( P k ) j C j T ,
where estimation error e j = x k ( x ^ k ) j and the Jacobian C j = h j x | ( x ^ k ) j . In the sequential measurement update, ( x ^ k ) j and ( P k ) j are corrected by Kalman gain K j that is a function of ( Λ j ) k , so these update steps are coupled. Hence no a closed-form solution exists, and we can only solve iteratively. The purpose of the iteration seems to be similar to that of the online learning of unknown variances of each noise [10]. In addition, similar to Agamennoni et al.’s interpretation [19], the convergence and optimality of the derived update steps for outliers are guaranteed since the variational lower bound is convex with respect to ( x ^ k ) j , ( P k ) j , and ( Λ j ) k . In particular, as the j-th feature is observed countless times (i.e., ν j ), Λ j converges to R in the limit of an infinitely precise noise distribution, so the iterative update steps reduce to the standard sequential measurement update of the EKF.
If true state x ( t img ) is significantly different from its estimate ( x ^ k ) j , then statistics E ( ζ j ) k ( ζ j ) k T dominates, and ( Λ j ) k becomes much larger than R. This ζ j is regarded as a measurement outlier at time k. As Kalman gain K j is a function of the inverse of precision matrix ( Λ j ) k , the larger ( Λ j ) k values, the smaller the Kalman gain. Therefore, to deal with situations where measurement outliers occur, the iteration for Equations (30) and (31) corrects the state estimates and its covariance with low weights.

5. Implementation

5.1. Marginalization of Feature States

If measurement outliers often occur, a few numbers of sequential updates in the EKF are proceeded to correct the state estimates. Without a sufficient number of measurement updates, the EKF is not robust and even diverges. Hence, outlier-adaptive filtering introduced in Section 4.3 performs the modified measurement update even when a residual is detected as an outlier. Indeed, to save computation resources, this study operates the outlier-adaptive filtering for only features detected frequently outliers. For implementation, we count how many numbers of features augmented in state variables are detected as outliers. Once updating feature outliers by the outlier-adaptive filtering approach, we prune the used feature states from the state vector (Figure 3). In addition, similar to that mentioned in Appendix B, to maintain a certain size of the state vector, after the feature initialization, we marginalize the features with the least number of observations among tracked features.

5.2. Summarized Algorithm

This section summarizes and describes an implementation of the proposed method. Figure 4 and Algorithm 2 illustrate a flow chart and the pseudocode of the overall process of the outlier-adaptive filtering approach for V-INS, respectively. For the robust outlier-adaptive filter presented in this paper, the blue boxes in Figure 4 are extended from the figure in [25].
Algorithm 2 The Outlier-Adaptive Filtering
Require: x ^ 0 + ( = x ^ V 0 + ) , P 0 + , Q , R , χ 2
for k = 1 : T do
                ▹ Image processing front-end in different thread
if new image capture then
Image Processing:                       ▹ Algorithm 1
  Stereo matching between current images of left camera c 1 and right camera c 2
  RANSAC between previous and current images of camera c 1
  RANSAC between previous and current images of camera c 2
end if
                   ▹ Filtering back-end in different thread
if new IMU packet arrival then
Time Update:
  State prediction
end if
if new vision data packet arrival then
  for j = 1 : of observed features N do
   if new feature then
Feature Initialization:
    If any depth of c 1 or c 2 is negative, j-th feature is outlier
   else                        ▹ tracked feature
Outlier Gating Test:
     r j = y j h j x ^ k j 1
     S j = ( C j ) k ( P k ) j 1 ( C j ) k T + R         ▹ ( C j ) k = h j x | x ^ k j 1
     γ = r j T S j 1 r j < ? χ j 2
Measurement Updates:
    if outlier detected then
      x ˜ j x ^ k j 1 , P ˜ j ( P k ) j 1    ▹ x ^ k 0 = x ^ k , ( P k ) 0 = P k
     while until converged do
      Update measurement noise given the state
       r ˜ j = y j h j ( x ˜ j )
       C ˜ j = h j x | x ˜ j
       W j = r ˜ j r ˜ j T + C ˜ j P ˜ j C ˜ j T
       Λ j ν j R + W j ν j + 1
      Update the posteriori state given noise
       S ˜ j = ( C j ) k ( P k ) j 1 ( C j ) k T + Λ j
       K ˜ j = ( P k ) j 1 ( C j ) k T S ˜ j 1
       x ˜ j x ^ k j 1 + K ˜ j r j
       P ˜ j ( P k ) j 1 K ˜ j ( C j ) k ( P k ) j 1
     end while
      x ^ k j = x ˜ j , ( P k ) j = P ˜ j
    else             ▹ standard sequential EKF in Section 2
      K j = ( P k ) j 1 ( C j ) k T S j 1
      x ^ k j = x ^ k j 1 + K j r j
      ( P k ) j = ( P k ) j 1 K j ( C j ) k ( P k ) j 1
    end if
   end if
  end for                 ▹ x ^ k + = x ^ k N , P k + = ( P k ) N
end if
end for

6. Flight Datasets Test Results

To examine the influence of outliers in V-INS and validate the reliability and robustness of the proposed outlier-adaptive approach for navigation systems with outliers, we test one of benchmark flight datasets, the so-called “EuRoC MAV datasets” [53]. The visual-inertial sequences of the datasets were recorded onboard a micro aerial vehicle (MAV) while a pilot manually flew around the indoor motion capture environment. For more details, see Appendix C. To articulate the significance of outliers, we select two datasets of the bright scene, called “EuRoC V1 Easy”, and motion blur, called “EuRoC V1 Difficult.” As the images in the difficult dataset are dark scene or motion blur, we hypothesize that outliers occur more frequently in the difficult dataset.
The EKF estimates the relative location and orientation from a starting point. As we do not know the exact absolute location of the origin of given datasets, to compare with ground-truth data given in the datasets, we require certain evaluation error metrics such as so-call “absolute trajectory error [54]”. For more details, see Appendix D. The absolute trajectory error as an evaluation error metric yields the following various comparison plots. Figure 5 illustrates the top-down view of the estimated flight trajectory of the difficult dataset.
Figure 6 depict the estimated x, y, z position and their respective estimation errors. All the estimation errors are bounded within each standard deviation σ envelope, so the proposed approach is reliable vision-aided inertial navigation under even poor illumination environments. Conceptually, the position error gets locked in, and it tends to increase with the length of the trajectory until new features are being mapped.
Similar to the analysis presented in [25], the adaptive filter is a well-tuned estimator since the performance of doing runs with ×3 or ×10 ( /3 or /10) multiplier on the R matrix used in the filter is worse for all of those, shown in Table 1. That is, the fact that using the multipliers reveals larger root mean square (RMS) estimation errors indicates that our filter is well-tuned.
Figure 7 shows the advantages of the addition of the outlier adaptation proposed in this paper by comparing it with a baseline, the standard EKF in V-INS.
As Lee [25] and other researchers showed that the standard EKF is a basic filter in V-INS, we choose the method as a baseline here. The baseline only rejects outliers whenever the chi-squared test fails, whereas the outlier-adaptive filtering follows all proposed adaptive approaches in Algorithm 2. Although the iteration in the outlier-adaptive filtering might increase computational resources, it significantly improves the accuracy of estimation. Fortunately, the “while” loop iteration in Algorithm 2 rapidly converges to the optimal noise covariance by twice or three times iterations. For sensitivity analysis, RMS position errors resulting from the baseline and the outlier-adaptive filtering are compiled in Table 2.
Motion blur datasets are more sensitive to outliers as the improvement is larger when applied to those datasets. Although the outlier-adaptive filtering is the best choice for motion blur datasets, we can select an adequate mode depending on computation margin and cost.
Although a number of researchers have investigated V-INS of the EuRoC datasets [55], only a few of them thoroughly has focused on vision measurements with outliers. Table 3 reveals that the proposed estimator, the outlier-adaptive filter, outperforms other state-of-the-art V-INS techniques, called “SVO+MSF [56,57]” and “S-MSCKF [48,58] ” in which stereo is available. As SVO+MSF is loosely coupled, its algorithm actually gets diverged.

7. Discussion

This paper has presented practical outlier-adaptive filtering for a vision-aided inertial navigation systems (V-INS) and evaluated its performance with flight datasets testing. In other words, this study develops a robust and adaptive state estimation framework for V-INS under frequent outliers occurrence. In the image processing front-end of the framework, we propose the improved utilization of outlier removal techniques. In filtering back-end, for estimating the states of V-INS with measurement outliers, we implement a novel approach of the outlier-robust extended Kalman filter (EKF) to V-INS, for which we derive iterative update steps for computing the precision noise matrices of vision outliers when the Mahalanobis gating test detects remaining outliers.
To validate the accuracy of the proposed approach and compare it with other state-of-the-art V-INS algorithms, we test the performance of V-INS employing the outlier-adaptive filtering algorithm in the realistic benchmark flight datasets. In particular, to show more improvements of our method over the others’ approaches, we use the fast motion and motion blur flight datasets. Results from the flight datasets testing show that the novel navigation approach in this study improves the accuracy and reliability of state estimation in V-INS with frequent outliers. Using the outlier-adaptive filtering reduces the root mean square (RMS) error of the estimates and accelerates the robustness of the estimates, especially for the motion blur datasets.
The primary goals of future work are listed as follows. Since an inertial measurement unit (IMU) is also a sensor, it could generate outliers in V-INS. With accounting for the process outliers, the accuracy and robustness of the estimator would be improved. If we distinguish process outliers from IMU sensors with measurement outliers from vision data, the extended outlier-robust EKF [32] may be an impressive and innovate approach for this case. Furthermore, the investigation of color noise in V-INS is another possible future work. One of the required assumptions of the Kalman filter is the whiteness of measurement noise. As an illustration, during sampling and transmission in image processing, color noise that may be originated from a multiplicity of sources could degrade the quality of images [59]. The vibrational effects of camera sensors might also produce colored measurement noise [60]. That is, if the residuals of vision data are correlated with themselves at different timestamp, then colored measurement noise occurs in V-INS. Therefore, the images with color noise would be filtered for ensuring the accuracy of locating landmarks. As modeling noise without additional prior knowledge of the noise statistics is typically difficult, the machine learning techniques-based state estimator for colored noise [61,62] may handle the unknown correlations in V-INS.
This study showcases the approaches using stereo cameras but is also suitable for monocular V-INS and employable to other filter-based V-INS frameworks. We test benchmark flight datasets to validate the reliability and robustness of this study, but additional validating with other flight datasets or real-time flight tests would help to prove its more robustness. In addition, we can operate unmanned aerial vehicles (UAVs) stacked the navigation algorithms in this study and a controller-in-the-loop. The use of the controller-in-the-loop could be a more important validation criteria due to the potential for navigation-controller coupling. The research objectives and contributions presented here will remarkably advance the state-of-the-art techniques of vision-aided inertial navigation for UAVs.

Author Contributions

Conceptualization, K.L. and E.N.J.; Data curation, Formal analysis, Investigation, Methodology, Resources, and Software, K.L.; Supervision, E.N.J.; Validation, Visualization, and Writing-original draft, K.L.; Writing-review & editing, K.L. and E.N.J. All authors have read and agreed to the published version of the manuscript.


This research received no external funding.

Conflicts of Interest

The authors declare no conflicts of interest.


The following abbreviations are used in this manuscript:
V-INSVision-aided Inertial Navigation Systems
VIOVisual-Inertial Odometry
IMUInertial Measurement Unit
MEMSMicro-ElectroMechanical System
EKFExtended Kalman Filter
UAVUnmanned Aerial Vehicle
RANSACRANdom SAmple Consensus
FASTFeatures from Accelerated Segment Test
STStudent’s t-distribution
ROSRobot Operating System


kdiscrete timejindex of measurements
Qprocess noise covarianceRmeasurement noise covariance
Perror state covariancerresidual

Appendix A. Jacobians of Models

From Section 2.2.1 and Section 2.2.2, Jacobian A,B, and C are
A = 0 i p ^ ˙ b / i i v ^ b / i 0 0 0 0 0 i v ^ ˙ b / i δ θ ^ i v ^ ˙ b / i b ^ a 0 0 0 δ θ ^ ˙ δ θ ^ 0 δ θ ^ ˙ b ^ ω 0 0 0 0 0 0 0 0 0 0 , B = 0 0 0 0 i v ^ ˙ b / i η a 0 0 0 0 δ θ ^ ˙ η ω 0 0 0 0 b ˙ a η b a 0 0 0 0 b ˙ ω η b ω
i p ^ ˙ b / i i v ^ b / i = I 3 × 3 , i v ^ ˙ b / i δ θ ^ = T ^ i / b ( a raw b ^ a ) × , i v ^ ˙ b / i b ^ a = T ^ i / b , δ θ ^ ˙ δ θ ^ = ( ω raw b ^ ω ) × , δ θ ^ ˙ b ^ ω = I 3 × 3 , i v ^ ˙ b / i η a = T ^ i / b , δ θ ^ ˙ η ω = I 3 × 3 , b ˙ a η b a = I 3 × 3 , b ˙ ω η b ω = I 3 × 3 ,
C j = y j i p ^ b / i 0 y j δ θ ^ 0 0 | 0 y j i p ^ f j / i 0
y j i p ^ b / i = y j c p ^ f j / c ( T ^ c / i ) , y j i p ^ f j / i = y j i p ^ b / i , y j δ θ ^ = y j c p ^ f j / c T c / b b p ^ f j / b ×
y j c p ^ f j / c = 1 c Z ^ j f u 0 f u c X ^ j c Z ^ j 0 f v f v c Y ^ j c Z ^ j ,
where for more detailed derivations, see reference [25,63].

Appendix B. Feature Initialization

In the first step of the measurement update, we employ Gauss–Newton least-squares minimization [58,64] to estimate feature 3D position i p ^ f j / i . To avoid local minima, we apply the inverse depth parameterization of the feature position [65] that is numerically more stable than the Cartesian parameterization.
We assume that the intrinsic and extrinsic parameters of a stereo camera are known and constant values. c 1 , c 2 frames are the left and right camera frame of the stereo, respectively. Since the baseline of the stereo is fixed, rotation T c 2 / c 1 and translation c 2 p c 1 / c 2 between both cameras are constant and known values. Feature coordinates c [ X , Y , Z ] T with respect to both cameras are
c 2 p f j / c 2 = T c 2 / c 1 c 1 p f j / c 1 + c 2 p c 1 / c 2
c 2 X j c 2 Y j c 2 Z j T = T c 2 / c 1 c 1 X j c 1 Y j c 1 Z j T + c 2 p c 1 / c 2
= c 1 Z j T c 2 / c 1 c 1 X j c 1 Z j c 1 Y j c 1 Z j 1 + 1 c 1 Z j c 2 p c 1 / c 2
= c 1 Z ^ j T c 2 / c 1 u ^ j , 1 / f u 1 v ^ j , 1 / f v 1 1 + 1 c 1 Z ^ j c 2 p c 1 / c 2 .
Simplifying Equation (A8),
c 2 X j c 2 Y j c 2 Z j = c 1 Z ^ j m x m y m z + 1 c 1 Z ^ j t r x t r y t r z ,
where m x , m y , and m z are scalar functions of given j-th measurement and constant extrinsic rotation matrix. Based on Equation (17), as the measurement equations from the stereo camera are
y j = u j , 1 v j , 1 u j , 2 v j , 2 = f u 1 c 1 X j c 1 Z j f v 1 c 1 Y j c 1 Z j f u 2 c 2 X j c 2 Z j f v 2 c 2 Y j c 2 Z j + ζ j ,
right c 2 camera measurements are expressed in A x = b form.
u ^ j , 2 / f u 2 v ^ j , 2 / f v 2 = m x + ( t r x / c 1 Z ^ j ) m z + ( t r z / c 1 Z ^ j ) m y + ( t r y / c 1 Z ^ j ) m z + ( t r z / c 1 Z ^ j )
m x ( u ^ j , 2 / f u 2 ) m z m y ( v ^ j , 2 / f v 2 ) m z c 1 Z ^ = ( u ^ j , 2 / f u 2 ) t r z t r x ( v ^ j , 2 / f v 2 ) t r z t r y ,
where let
x = c 1 Z ^ , A = m x ( u ^ j , 2 / f u 2 ) m z m y ( v ^ j , 2 / f v 2 ) m z , b = ( u ^ j , 2 / f u 2 ) t r z t r x ( v ^ j , 2 / f v 2 ) t r z t r y .
Therefore, the Gauss–Newton least-squares minimization estimates depth c 1 Z of left c 1 camera using the pseudo-inverse of A is
A x = b ( A T A ) x = A T b x ^ = ( A T A ) 1 A T b .
If either estimated depth c 1 Z ^ or c 2 Z ^ is negative, the solution of the minimization is invalid since the feature is always in front of both camera frames observing it. By substituting estimated c 1 Z ^ into Equation (A8),
c 1 p ^ f j / c 1 = ( u ^ j , 1 / f u 1 ) c 1 Z ^ ( ( v ^ j , 1 / f v 1 ) c 1 Z ^ c 1 Z ^ T ,
where p ^ f j / c is not related to the pose of the vehicle. Likewise, if a monocular camera is used instead, c 1 is the camera frame in which the feature was observed at the first time, and c 2 is the camera frame at a different time instance.
The j-th feature 3D position with respect to the inertial frame is
i p ^ f j / i = T ^ i / c 1 c 1 p ^ f j / c 1 + i p ^ c 1 / i = T ^ i / b T b / c 1 c 1 p ^ f j / c 1 + i p ^ b / i + T ^ i / b b p c 1 / b = T ^ i / b T ^ b / b T b / c 1 c 1 p ^ f j / c 1 + b p c 1 / b + i p ^ b / i .
The new feature is initialized using only one image in which the feature is first observed. Although the new feature is initialized, as it still entails uncertainty, the EKF recursively estimates and updates its 3D position by augmenting into the state vector.
x ^ = x ^ V T i p ^ f j / i T T ,
where x ^ V is the estimated vehicle state vector defined in Equation (8). The overall initialization includes the initial value of the feature state and its error covariance assignment. The error covariance of the new feature are initialized using state augmentation with Jacobian J.
P = I J P I J T = P P J T J P J P J T + P f new ,
where Jacobian J = p f / i x | x ^ is computed as follows,
J = I 3 × 3 0 3 × 3 T ^ i / b ( T b / c 1 c 1 p ^ f j / c 1 + b p c 1 / b ) × 0 3 × 6 0 3 × 3 N f .
N f is the number of all features and P f new is the initial uncertainty of the initialized new feature. The error pertains to measurement noise and the error of the least-squares minimization and so on. In fact, since Montiel et al. [65] validate that the initial uncertainty is coded as Gaussian, the EKF including the feature initialization still holds optimality.
Once initialized, the EKF processes the feature state in the prediction-update loop. In the time update of the EKF, we propagate P by Equation (4).
Φ 0 0 I P v v P v f P f v P f f Φ T 0 0 I + Q v 0 0 Q f = Φ P v v Φ T + Q v Φ P v f P f v Φ T P f f + Q f ,
where state transition matrix Φ I + A Δ t . P V V : = E [ ( x V x ^ V ) ( x V x ^ V ) T ] , P f f is the error covariance of all features, and P V f = P f V T represents vehicle-feature correlations. In addition, we assume that surrounding is static, so the dynamics of features p ^ ˙ f j / i = 0 . In the measurement update of the EKF, only tracked features are used for the update. For the efficient management of the map database, if the size of the state vector exceeds than the maximum limit, then the feature with the least number of observations is pruned and marginalized.

Appendix C. Experimental Equipment and Environments

Burri et al. [53] provide the benchmark datasets of UAV flying, and Table A1 illustrates the specifications of the sensors used in the datasets.
Table A1. Sensors of EuRoC Datasets.
Table A1. Sensors of EuRoC Datasets.
Stereo Images (Aptina MT9V034)2 × 20 FPSGlobal Shutter, WVGA Monochrome
MEMS IMU (ADIS16448)200 HzInstrumentally Calibrated
They obtain the noise model parameters from the IMU at rest and provide them; that is, σ a , σ ω , σ b a , and σ b ω are known. The intrinsic and extrinsic parameters of both cameras are also given; that is, f u , f v , T c / b , and b p c / b are known. The visual-inertial sensor unit is calibrated with Kalibr [66] prior to dataset collection. Furthermore, IMU and cameras are hardware time-synchronized such that the middle of the exposure aligned with the IMU measurements.
Visual and inertial data is logged and timestamped onboard the MAV while ground truth is logged in the Vicon 6D motion capture system on the base station. The accuracy of the synchronization between the ground truth and the sensor data is limited by the fact that both sources are recorded on different machines and that the timestamps of the devices are unavailable for the ground-truth system. A maximum likelihood (ML) estimator [11] aligns the data temporally and calibrates the position of the ground-truth coordinate with respect to the body sensor unit. In fact, the ML estimator synchronizes the time-varying temporal offset between the ground truth and the body sensor system. Moreover, it determines the unknown transform between the ground-truth reference frame and the body frame. To obtain the full ML solution, they employ a batch estimator in an offline procedure. Finally, as ground truth, they provide the ML solutions instead of raw data.

Appendix D. Evaluation Error Metric

Sturm et al. [54] provide a set of tools used to preprocess the datasets and evaluate the tracking results. To validate estimation results, we need to evaluate the errors in the estimated trajectory by comparing it with the ground truth. Among various error metrics, two prominent methods are the absolute trajectory error (ATE) and the relative pose error (RPE). In this paper, to evaluate the overall performance of V-INS employing the outlier-adaptive filtering, the ATE measure is selected.

Appendix Absolute Trajectory Error (ATE)

The absolute trajectory error directly measures the difference between points of the true and the estimated trajectory. As a preprocessing step, we associate the estimated poses with ground-truth poses using the recorded timestamps. Based on this association, we align the true and the estimated trajectory using the Horn et al. [67]’s closed-form method based on singular value decomposition (SVD). Finally, we compute the differences between each pair of poses and output the mean, median, and standard deviation of these differences.


  1. Kalman, R.E. A New Approach to Linear Filtering and Prediction Problems. J. Basic Eng. 1960, 82, 35–45. [Google Scholar] [CrossRef] [Green Version]
  2. Kalman, R.E.; Bucy, R.S. New Results in Linear Filtering and Prediction Theory. J. Basic Eng. 1961, 83, 95–108. [Google Scholar] [CrossRef]
  3. Anderson, B.D.; Moore, J.B. Optimal Filtering; Prentice Hall: Englewood Cliffs, NJ, USA, 1979; Chapters 2–4. [Google Scholar]
  4. Bar-Shalom, Y.; Li, X.R.; Kirubarajan, T. Estimation with Applications to Tracking and Navigation: Theory Algorithms and Software; John Wiley & Sons: Hoboken, NJ, USA, 2004; Chapters 2–9. [Google Scholar]
  5. Schmidt, S.F. The Kalman Filter: Its Recognition and Development for Aerospace Applications. J. Guid. Control Dyn. 1981, 4, 4–7. [Google Scholar] [CrossRef]
  6. Magree, D.; Johnson, E.N. Factored Extended Kalman Filter for Monocular Vision-Aided Inertial Navigation. J. Aerosp. Inf. Syst. 2016, 13, 475–490. [Google Scholar] [CrossRef]
  7. Song, Y.; Nuske, S.; Scherer, S. A Multi-Sensor Fusion MAV State Estimation from Long-Range Stereo, IMU, GPS and Barometric Sensors. Sensors 2017, 17, 11. [Google Scholar] [CrossRef]
  8. Mostafa, M.; Zahran, S.; Moussa, A.; El-Sheimy, N.; Sesay, A. Radar and Visual Odometry Integrated System Aided Navigation for UAVS in GNSS Denied Environment. Sensors 2018, 18, 2776. [Google Scholar] [CrossRef] [Green Version]
  9. Hawkins, D.M. Identification of Outliers; Springer: Berlin/Heidelberg, Germany, 1980; Chapters 2–5. [Google Scholar]
  10. Mehra, R.K. On the Identification of Variances and Adaptive Kalman Filtering. IEEE Trans. Autom. Control 1970, 15, 175–184. [Google Scholar] [CrossRef]
  11. Maybeck, P.S. Stochastic Models, Estimation, and Control; Academic Press: Cambridge, MA, USA, 1982; Chapter 10; pp. 68–158. [Google Scholar]
  12. Stengel, R.F. Optimal Control and Estimation; Courier Corporation: Chelmsford, MA, USA, 2012; Chapter 4. [Google Scholar]
  13. Ting, J.A.; Theodorou, E.; Schaal, S. A Kalman Filter for Robust Outlier Detection. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), San Diego, CA, USA, 29 October–2 November 2007; pp. 1514–1519. [Google Scholar] [CrossRef]
  14. Särkkä, S.; Nummenmaa, A. Recursive Noise Adaptive Kalman Filtering by Variational Bayesian Approximations. IEEE Trans. Autom. Control 2009, 54, 596–600. [Google Scholar] [CrossRef]
  15. Piché, R.; Särkkä, S.; Hartikainen, J. Recursive Outlier-Robust Filtering and Smoothing for Nonlinear Systems Using the Multivariate Student-t Distribution. In Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Santander, Spain, 23–26 September 2012; pp. 1–6. [Google Scholar] [CrossRef]
  16. Roth, M.; Özkan, E.; Gustafsson, F. A Student’s t Filter for Heavy Tailed Process and Measurement Noise. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada, 26–31 May 2013; pp. 5770–5774. [Google Scholar] [CrossRef] [Green Version]
  17. Solin, A.; Särkkä, S. State Space Methods for Efficient Inference in Student-t Process Regression. In Proceedings of the International Conference on Artificial Intelligence and Statistics, San Diego, CA, USA, 9–12 May 2015; pp. 885–893. [Google Scholar]
  18. Agamennoni, G.; Nieto, J.I.; Nebot, E.M. An Outlier-Robust Kalman Filter. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, 9–13 May 2011; pp. 1551–1558. [Google Scholar] [CrossRef]
  19. Agamennoni, G.; Nieto, J.I.; Nebot, E.M. Approximate Inference in State-Space Models with Heavy-Tailed Noise. IEEE Trans. Signal Process. 2012, 60, 5024–5037. [Google Scholar] [CrossRef]
  20. Beal, M.J. Variational Algorithms for Approximate Bayesian Inference. Ph.D. Thesis, University of London, London, UK, 2003. [Google Scholar]
  21. Graham, M.C.; How, J.P.; Gustafson, D.E. Robust State Estimation with Sparse Outliers. J. Guid. Control Dyn. 2015, 38, 1229–1240. [Google Scholar] [CrossRef] [Green Version]
  22. Wu, A.D.; Johnson, E.N.; Kaess, M.; Dellaert, F.; Chowdhary, G. Autonomous Flight in GPS-Denied Environments Using Monocular Vision and Inertial Sensors. J. Aerosp. Inf. Syst. 2013, 10, 172–186. [Google Scholar] [CrossRef]
  23. Nakamura, T.; Haviland, S.T.; Bershadsky, D.; Johnson, E.N. Vision Sensor Fusion for Autonomous Landing. In Proceedings of the AIAA Information Systems-AIAA Infotech@ Aerospace, Grapevine, TX, USA, 9–13 January 2017; p. 0674. [Google Scholar] [CrossRef]
  24. Simon, D. Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches; John Wiley & Sons: Hoboken, NJ, USA, 2006; Chapters 2–13. [Google Scholar]
  25. Lee, K. Adaptive Filtering for Vision-Aided Inertial Navigation. Ph.D. Thesis, Georgia Institute of Technology, Atlanta, GA, USA, 2018. [Google Scholar]
  26. Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
  27. Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2003; pp. 22–236. [Google Scholar]
  28. Troiani, C.; Martinelli, A.; Laugier, C.; Scaramuzza, D. 2-Point-Based Outlier Rejection for Camera-IMU Systems with Applications to Micro Aerial Vehicles. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 5530–5536. [Google Scholar] [CrossRef] [Green Version]
  29. Kriegel, H.P.; Kröger, P.; Zimek, A. Outlier Detection Techniques. Tutor. KDD 2010, 10, 1–76. [Google Scholar]
  30. Angelov, P. Anomaly Detection Based on Eccentricity Analysis. In Proceedings of the IEEE Symposium on Evolving and Autonomous Learning Systems (EALS), Orlando, FL, USA, 9–12 December 2014; pp. 1–8. [Google Scholar] [CrossRef]
  31. Costa, B.S.J.; Bezerra, C.G.; Guedes, L.A.; Angelov, P.P. Online Fault Detection Based on Typicality and Eccentricity Data Analytics. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland, 12–17 July 2015; pp. 1–6. [Google Scholar] [CrossRef] [Green Version]
  32. Lee, K.; Johnson, E.N. Robust State Estimation and Online Outlier Detection Using Eccentricity Analysis. In Proceedings of the IEEE Conference on Control Technology and Applications (CCTA), Mauna Lani, HI, USA, 27–30 August 2017; pp. 1350–1355. [Google Scholar] [CrossRef]
  33. Chang, G. Robust Kalman Filtering Based on Mahalanobis Distance as Outlier Judging Criterion. J. Geod. 2014, 88, 391–401. [Google Scholar] [CrossRef]
  34. Mahalanobis, P.C. On the Generalized Distance in Statistics; National Institute of Science of India: Brahmapur, India, 1936. [Google Scholar]
  35. Atkinson, K.E. An Introduction to Numerical Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2008; Chapters 5–6. [Google Scholar]
  36. Gelb, A. Applied Optimal Estimation; MIT Press: Cambridge, MA, USA, 1974; Chapters 4, 6; pp. 102–155, 180–228. [Google Scholar]
  37. Crassidis, J.L.; Junkins, J.L. Optimal Estimation of Dynamic Systems; CRC Press: Boca Raton, FL, USA, 2011; Chapters 3–4. [Google Scholar]
  38. Stevens, B.L.; Lewis, F.L.; Johnson, E.N. Aircraft Control and Simulation: Dynamics, Controls Design, and Autonomous Systems; John Wiley & Sons: Hoboken, NJ, USA, 2015; Chapters 1–2. [Google Scholar]
  39. Markley, F.L. Attitude Error Representations for Kalman Filtering. J. Guid. Control Dyn. 2003, 26, 311–317. [Google Scholar] [CrossRef]
  40. Sola, J. Quaternion Kinematics for the Error-State Kalman Filter. arXiv 2017, arXiv:1711.02508. [Google Scholar]
  41. Woodman, O.J. An Introduction to Inertial Navigation; Technical Report; University of Cambridge, Computer Laboratory: Cambridge, UK, 2007. [Google Scholar]
  42. Titterton, D.; Weston, J.L.; Weston, J. Strapdown Inertial Navigation Technology; The Institution of Engineering and Technology (IET): London, UK, 2004; Volume 17, Chapters 4–7. [Google Scholar]
  43. Forsyth, D.A.; Ponce, J. Computer Vision: A Modern Approach; Prentice Hall: Upper Saddle River, NJ, USA, 2002; Chapters 4–5. [Google Scholar]
  44. Trajković, M.; Hedley, M. Fast Corner Detection. Image Vis. Comput. 1998, 16, 75–87. [Google Scholar] [CrossRef]
  45. Rosten, E.; Drummond, T. Machine Learning for High-Speed Corner Detection. In Proceedings of the European Conference on Computer Vision (ECCV); Springer: Berlin/Heidelberg, Germany, 2006; pp. 430–443. [Google Scholar] [CrossRef]
  46. Lucas, B.D.; Kanade, T. An Iterative Image Registration Technique with an Application to Stereo Vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence, Vancouver, BC, Canada, 24–28 August 1981; Morgan Kaufmann Publishers Inc.: Bullington, MA, USA, 1981; Volume 2, pp. 674–679. [Google Scholar]
  47. Paul, M.K.; Wu, K.; Hesch, J.A.; Nerurkar, E.D.; Roumeliotis, S.I. A Comparative Analysis of Tightly-Coupled Monocular, Binocular, and Stereo Vins. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 165–172. [Google Scholar] [CrossRef]
  48. Sun, K.; Mohta, K.; Pfrommer, B.; Watterson, M.; Liu, S.; Mulgaonkar, Y.; Taylor, C.J.; Kumar, V. Robust Stereo Visual Inertial Odometry for Fast Autonomous Flight. IEEE Robot. Autom. Lett. 2018, 3, 965–972. [Google Scholar] [CrossRef] [Green Version]
  49. Kitt, B.; Geiger, A.; Lategahn, H. Visual Odometry Based on Stereo Image Sequences with Ransac-Based Outlier Rejection Scheme. In Proceedings of the IEEE Intelligent Vehicles Symposium (IV), San Diego, CA, USA, 21–24 June 2010; pp. 486–492. [Google Scholar] [CrossRef] [Green Version]
  50. Open CV. 2019. Available online: (accessed on 8 November 2019).
  51. Devore, J.L. Probability and Statistics for Engineering and the Sciences; Cengage Learning: Boston, MA, USA, 2015; Chapter 7; pp. 267–299. [Google Scholar]
  52. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006; Chapter 10; pp. 461–522. [Google Scholar]
  53. Burri, M.; Nikolic, J.; Gohl, P.; Schneider, T.; Rehder, J.; Omari, S.; Achtelik, M.W.; Siegwart, R. The EuRoC Micro Aerial Vehicle Datasets. Int. J. Robot. Res. 2016, 35, 1157–1163. [Google Scholar] [CrossRef]
  54. Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A Benchmark for the Evaluation of RGB-D SLAM Systems. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vilamoura, Portugal, 7–12 October 2012; pp. 573–580. [Google Scholar] [CrossRef] [Green Version]
  55. Delmerico, J.; Scaramuzza, D. A Benchmark Comparison of Monocular Visual-Inertial Odometry Algorithms for Flying Robots. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018. number CONF. [Google Scholar] [CrossRef]
  56. Faessler, M.; Fontana, F.; Forster, C.; Mueggler, E.; Pizzoli, M.; Scaramuzza, D. Autonomous, Vision-Based Flight and Live Dense 3D Mapping with a Quadrotor Micro Aerial Vehicle. J. Field Robot. 2016, 33, 431–450. [Google Scholar] [CrossRef]
  57. Forster, C.; Zhang, Z.; Gassner, M.; Werlberger, M.; Scaramuzza, D. SVO: Semidirect Visual Odometry for Monocular and Multicamera Systems. IEEE Trans. Robot. 2017, 33, 249–265. [Google Scholar] [CrossRef] [Green Version]
  58. Mourikis, A.I.; Roumeliotis, S.I. A Multi-State Constraint Kalman Filter for Vision-Aided Inertial Navigation. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Roma, Italy, 10–14 April 2007; pp. 3565–3572. [Google Scholar] [CrossRef]
  59. Xiong, S.; Zhou, Z.; Zhong, L.; Xu, C.; Zhang, W. Adaptive Filtering of Color Noise Using the Kalman Filter Algorithm. In Proceedings of the 17th IEEE Instrumentation and Measurement Technology Conference (IMTC), Baltimore, MD, USA, 1–4 May 2000; Volume 2, pp. 1009–1012. [Google Scholar] [CrossRef]
  60. Kumar, A.; Crassidis, J.L. Colored-Noise Kalman Filter for Vibration Mitigation of Position/Attitude Estimation Systems. In Proceedings of the AIAA Guidance, Navigation and Control Conference and Exhibit, Hilton Head, SC, USA, 20–23 August 2007; p. 6516. [Google Scholar] [CrossRef] [Green Version]
  61. Lee, K.; Johnson, E.N. State Estimation Using Gaussian Process Regression for Colored Noise Systems. In Proceedings of the IEEE Aerospace Conference, Big Sky, MT, USA, 4–11 March 2017; pp. 1–8. [Google Scholar] [CrossRef]
  62. Lee, K.; Choi, Y.; Johnson, E.N. Kernel Embedding-Based State Estimation for Colored Noise Systems. In Proceedings of the IEEE/AIAA 36th Digital Avionics Systems Conference (DASC), St. Petersburg, FL, USA, 17–21 September 2017; pp. 1–8. [Google Scholar] [CrossRef]
  63. Wu, A.D. Vision-Based Navigation and Mapping for Flight in GPS-Denied Environments. Ph.D. Thesis, Georgia Institute of Technology, Atlanta, GA, USA, 2010. [Google Scholar]
  64. Bjorck, A. Numerical Methods for Least Squares Problems; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 1996; Volume 51, Chapters 2–9. [Google Scholar]
  65. Montiel, J.M.; Civera, J.; Davison, A.J. Unified Inverse Depth Parametrization for Monocular SLAM. In Proceedings of the Robotics: Science and Systems, Philadelphia, PA, USA, 16–19 August 2006. [Google Scholar] [CrossRef]
  66. Furgale, P.; Rehder, J.; Siegwart, R. Unified Temporal and Spatial Calibration for Multi-Sensor Systems. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Tokyo, Japan, 3–7 November 2013; pp. 1280–1286. [Google Scholar] [CrossRef]
  67. Horn, B.K.; Hilden, H.M.; Negahdaripour, S. Closed-Form Solution of Absolute Orientation Using Orthonormal Matrices. J. Opt. Soc. Am. A 1988, 5, 1127–1135. [Google Scholar] [CrossRef] [Green Version]
Figure 1. A schematic of the sequential measurement update.
Figure 1. A schematic of the sequential measurement update.
Sensors 20 02036 g001
Figure 2. Close loop steps of outlier rejection in image processing front-end.
Figure 2. Close loop steps of outlier rejection in image processing front-end.
Sensors 20 02036 g002
Figure 3. A block diagram of the vision-aided inertial navigation system employing the outlier-adaptive filtering.
Figure 3. A block diagram of the vision-aided inertial navigation system employing the outlier-adaptive filtering.
Sensors 20 02036 g003
Figure 4. A flow chart of the overall process of the outlier-adaptive filtering.
Figure 4. A flow chart of the overall process of the outlier-adaptive filtering.
Sensors 20 02036 g004
Figure 5. Top-down view of flight trajectory of the EuRoC V1 difficult dataset by the outlier-adaptive filter.
Figure 5. Top-down view of flight trajectory of the EuRoC V1 difficult dataset by the outlier-adaptive filter.
Sensors 20 02036 g005
Figure 6. Position and estimation error of the EuRoC V1 difficult dataset by the outlier-adaptive filter.
Figure 6. Position and estimation error of the EuRoC V1 difficult dataset by the outlier-adaptive filter.
Sensors 20 02036 g006
Figure 7. Box plot of absolute estimation error of the position of the EuRoC V1 difficult dataset by the outlier-adaptive filter.
Figure 7. Box plot of absolute estimation error of the position of the EuRoC V1 difficult dataset by the outlier-adaptive filter.
Sensors 20 02036 g007
Table 1. Indication that the outlier-adaptive filter is well-tuned for the EuRoC V1 difficult dataset.
Table 1. Indication that the outlier-adaptive filter is well-tuned for the EuRoC V1 difficult dataset.
Multiplier on R /10/31×3×10
RMS error [m] 0.92400.38010.17000.51530.5610
Table 2. Sensitivity analysis in RMS position error [m] of the outlier-adaptive filtering.
Table 2. Sensitivity analysis in RMS position error [m] of the outlier-adaptive filtering.
Dataset EuRoC V1 Easy EuRoC V1 Difficult
Slow Motion 0.41 m/s, 16.0 deg/s Fast Motion 0.75 m/s, 35.5 deg/s
Method Bright Scene Motion Blur
Baseline 0.2558 0.3656
Outlier-Adaptive 0.2237 0.2264
Table 3. Comparison with other methods in RMS position error [m] of the outlier-adaptive filtering.
Table 3. Comparison with other methods in RMS position error [m] of the outlier-adaptive filtering.
Dataset EuRoC V1 EasyEuRoC V1 Difficult
Method Bright SceneMotion Blur
Outlier-Adaptive Filter 0.22370.2264
SVO+MSF [56,57] (loosely coupled) 0.40×
S-MSCKF [48,58] (stereo-filter) 0.340.67

Share and Cite

MDPI and ACS Style

Lee, K.; Johnson, E.N. Robust Outlier-Adaptive Filtering for Vision-Aided Inertial Navigation. Sensors 2020, 20, 2036.

AMA Style

Lee K, Johnson EN. Robust Outlier-Adaptive Filtering for Vision-Aided Inertial Navigation. Sensors. 2020; 20(7):2036.

Chicago/Turabian Style

Lee, Kyuman, and Eric N. Johnson. 2020. "Robust Outlier-Adaptive Filtering for Vision-Aided Inertial Navigation" Sensors 20, no. 7: 2036.

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop