Article

Latency Compensated Visual-Inertial Odometry for Agile Autonomous Flight

1. School of Aerospace Engineering, Georgia Institute of Technology, 270 Ferst Drive, Atlanta, GA 30313, USA
2. Faculty of Aerospace Engineering, The Pennsylvania State University, 229 Hammond Building, University Park, PA 16802, USA
* Author to whom correspondence should be addressed.
Current address: 55 River Oaks Pl, San Jose, CA 95134, USA.
Sensors 2020, 20(8), 2209; https://doi.org/10.3390/s20082209
Submission received: 28 February 2020 / Revised: 10 April 2020 / Accepted: 12 April 2020 / Published: 14 April 2020
(This article belongs to the Special Issue UAV-Based Smart Sensor Systems and Applications)

Abstract

In visual-inertial odometry (VIO), inertial measurement unit (IMU) dead reckoning acts as the dynamic model for flight vehicles, while camera vision extracts information about the surrounding environment and determines features or points of interest. With these sensors, the most widely used algorithm for estimating vehicle and feature states in VIO is the extended Kalman filter (EKF). The design of the standard EKF does not inherently allow for time offsets between the timestamps of the IMU and vision data. In fact, the sensor-related delays that arise in various realistic conditions are at least partially unknown parameters. Left uncompensated, these unknown parameters often seriously degrade the accuracy of VIO and similar systems. To compensate for the uncertainties of the unknown time delays, this study incorporates parameter estimation into feature initialization and state estimation. Moreover, online temporal calibration, which computes the cross-covariance and estimates the delays, corrects the residual, Jacobian, and covariance. Results from flight dataset testing validate the improved accuracy of VIO employing the latency compensated filtering framework. The insights and methods proposed here are ultimately useful in any estimation problem (e.g., multi-sensor fusion scenarios) where compensation for partially unknown time delays can enhance performance.

1. Introduction

The most widely used algorithms for estimating the states of a dynamic system are the Kalman filter [1,2] and its nonlinear versions (e.g., the extended Kalman filter (EKF) [3,4] and the unscented Kalman filter (UKF) [5]). The design of the standard Kalman filter does not inherently account for significant sensor-related delays. As Figure 2 shows, the delay is the time difference between the instant when a measurement is taken by a sensor and the instant when the measurement is available in the filter. As an example of key delay sources, some complex sensors such as vision processors for navigation often require extensive computation to obtain higher-level information from raw sensor data. Furthermore, a closed-loop system including control logic may place an overall computational burden on a single processor. Delays resulting from heavy computation may distort the quality of state estimation since a current measurement is compared to past states of a system model. In other words, unless delays are compensated in Kalman filtering, large estimation errors may accumulate over time or even cause the filter to diverge.
The delay value is typically at least partially unknown and at least partially variable in many real applications. As an example of delay uncertainty contributors, even though a local clock is initially forced to synchronize with a centralized clock, deviations between clocks occur because of clock drift, skew, or bias. In sensor fusion systems, where the timestamps of each sensor are typically recorded by triggered signals, non-deterministic or non-quantized transmission delays lead to unknown time offsets on the sensor streams. Moreover, if low-cost sensors such as rolling shutter cameras or software-triggered devices are mounted on a vehicle, the variance of the timestamp uncertainty may be larger. In particular, in visual-inertial odometry (VIO), we do not know the exact time instant when a camera opens and captures an image for any particular pixel location. Often, exposure time depends on surrounding illumination conditions. The timestamp of the latest image from some cameras corresponds to some event such as when the shutter was triggered to start or when the entire image was available in memory. In practice, these uncertainties may be small compared to traditional sensors used for feedback in aerospace applications, but they can be a major contributor to errors in emerging estimation problems such as VIO. Indeed, when estimating faster motions, such as those of a highly agile unmanned aerial vehicle (UAV), or when using progressive scan cameras, the unknown time delays may be a major driver of navigation quality and achievable controller bandwidth. We have experimentally observed that time delay compensation must be more accurate than typically demanded, specifically for a UAV flying closed-loop on a vision-based navigation solution. With poor time delay compensation, we have observed oscillations and even divergence of estimates and closed-loop tracking error, as expected; but even with fixed/known time delay compensation, we find time delay is still a limiting factor in accuracy and achievable control bandwidth. To illustrate, consider a UAV with a body-fixed camera that maneuvers with more rapid rotation than the reference design anticipates. Any time error produces a larger potential position estimation error at this faster rotation rate. Thus, even after the best practical effort has eliminated as much deterministic time delay error as possible, we find that adapting to the non-deterministic error can enhance performance. In particular, we find it beneficial to deal with unknown time delays in VIO systems used in closed-loop flight control and other systems like them.

1.1. Related Work

1.1.1. Visual-Inertial Odometry

In recent years, an increasing demand for UAV research has prompted substantial interest in VIO systems [6,7,8,9]. Delmerico and Scaramuzza [10] provide a benchmark comparison of monocular VIO algorithms for flying robots. Similar to their comparison, Table 1 summarizes state-of-the-art VIO techniques, including stereo VIO. Let us explain some relevant terms for clarity. Tightly-coupled VIO jointly optimizes over all sensor measurements (i.e., visual and inertial cost terms) within a single process, which results in higher accuracy; the opposite is referred to as loosely-coupled. Indeed, loosely-coupled VIO does not handle the correlation of visual and inertial motion constraints, resulting in a loss of information. Moreover, at the back-end of VIO, optimization-based VIO solves a nonlinear least-squares problem (e.g., pose-graph optimization or bundle adjustment [11]) to update a window of states, which allows errors to be reduced by re-linearization [12] but incurs a high computational cost and may become stuck in local minima. In contrast, filtering-based VIO updates only the most recent state within a Kalman filter or EKF framework, which is computationally faster and more efficient, but the one-time linearization may introduce linearization errors into the estimator. For more details on the terminology, see references [13,14].
VINS-MONO [16,21] is an optimization-based visual simultaneous localization and mapping (SLAM) system including loop closure. Some processes in this approach are inefficient for the following reasons. VINS-MONO duplicates integration of the same IMU data at different timestamps for prediction and optimization purposes. That is, to publish odometry at the IMU rate, it integrates whenever IMU data arrive, while the IMU data are also accumulated in a buffer for batch integration at the image measurement update steps. Mourikis and Roumeliotis first introduced the multi-state constraint Kalman filter (MSCKF) [22,23], and Sun et al. [19] recently provided its stereo version. Although real-time high-frequency VIO outputs might be crucial for UAV attitude control, MSCKF publishes odometry not at the IMU rate but at the image rate. Furthermore, batch processing of the IMU data integration in MSCKF may add redundant time delays to the filter when vision measurements become available. VINS-MONO and MSCKF are applicable to IMU and vision fusion; however, if we fuse other sensors such as the global positioning system (GPS) and altimeters into navigation systems, those approaches may not be operable since measurements from the other sensors become available for updates between images. Another limitation is that the assumptions for IMU pre-integration between keyframes and backward propagation with loop closure in those approaches do not always hold. Hence, EKF-based VIO frameworks cover a greater scope of sensor fusion problems.
Faessler et al. [17] combined semi-direct visual odometry (SVO) [24,25] with modular multi-sensor fusion (MSF) [26]. Even though this approach uses IMU data for fusion, since it is loosely-coupled, its results are sub-optimal. Paul et al. [18,27] recently proposed an alternating stereo VINS that requires computation comparable to monocular VIO yet provides scale information from the visual observations. However, this method may not be sufficient for tracking fast motion in low-latency-demanding applications; since the implementation is not open-source, it is not used for comparison in this paper. Leutenegger et al. [20] introduced OKVIS, a consistent keyframe-based stereo SLAM algorithm that performs nonlinear optimization over both visual and inertial cost terms. To maintain the sparsity of the system, their approach employs an approximation that renders it sub-optimal. Since it requires considerable computational resources or a specific grade of sensors, such as industrial-grade IMUs, operating OKVIS in real time is more challenging. Among the six algorithms in Table 1, only S-MSCKF and SVO+MSF handle an unknown time delay, so we will use their estimation results for comparison in this study.

1.1.2. State Estimation Using Time-Delayed Measurements

In a number of applications, a vital problem in combining data from various sensors is the fusion of delayed observations, and if the computational delay is significant, fusing the data in a Kalman filter is challenging. During the last 20 years, the sensor time-delay problem has been addressed by a number of methods, most of which modify the Kalman filter so that it handles delay in the sensor update step. Alexander [28] derived a method of calculating a correction term that is added to the filter estimates when lagged measurements arrive. However, because the uncertainty of measurements is often unknown until the data are processed, applying the method to time-varying systems is not addressed. To overcome this shortcoming, Larsen et al. [29] extrapolated a measurement to the current time using the past and present estimates of the Kalman filter and calculated an optimal gain for this extrapolated measurement. However, Larsen's approach is exact only for linear systems; if the system dynamics and measurement equations are significantly nonlinear, it can be highly inaccurate. For optimally fusing lagged sensor data in a general nonlinear system, Van Der Merwe et al. [30,31] introduced a technique called "sample-state augmentation," based on the Schmidt–Kalman filter [32] or stochastic cloning [33]. Appendix C provides detailed background information about this technique. Lastly, Gopalakrishnan et al. [34] provided a survey of all previously noted methods.
All of the above methods assume that the amount of time delay is known; in effect, they only work with synchronized sensors. However, hardware synchronization of most low-cost or customized sensors is not always available. Moreover, situations in which a current, accurate time delay is not known can arise in real applications. To deal with unknown time delays, Julier and Uhlmann [35] introduced the covariance union algorithm, and Sinopoli et al. [36] modeled the arrival of intermittent observations as a random variable with a probability. In addition, Choi et al. [37] and Yoon et al. [38] augmented the state vector with as many past states as the maximum number of delayed steps. The size of this augmented state vector is extremely large, and calculations with it might require extensive additional computational effort. Recently, for the uncertainty of time delays in state estimation, Lee and Johnson [39] suggested an approach combined with multiple-model adaptive estimation. However, because it relies on imperfect information about a certain range of delay values, this method might not be suitable if the uncertainty of the time delay is high.
Instead, we directly estimate the time delay as an additional state since augmentation is a straightforward means of handling the unknown delay. Nilsson et al. [40] investigated this idea using a Taylor series expansion for small delays. However, delay values are typically larger than a time step, and the linearization in their approach does not hold for large delays. Li and Mourikis [41] also examined state augmentation for estimating an unknown time offset between the timestamps of two sensors. However, their approach is not optimal since it performs the measurement update of delayed sensor data without the covariance correction that uses the cross-covariance term computed during the delay period. Furthermore, in the recent optimization-based method proposed by Qin and Shen [42], if the camera moves at non-constant speed during a short time period, as with progressive scan cameras, their assumption does not hold: despite the short time period, the camera coordinate frame is still changing and moving. Their assumption of a constant time offset is also not general since the unknown delay may vary. To overcome all previously noted limitations, this paper proposes a novel approach, "latency compensated filtering," based on the combined parameter-state estimator [43,44].

1.2. Summary of Contributions

To fuse visual measurements with unknown time delays in VIO systems and systems like them (e.g., multi-sensor fusion), the approach in this paper incorporates three correction techniques into state estimation. First, we directly estimate the unknown part of the actual delays in an online fashion by augmenting the vehicle-feature states. With the estimated unknown part and the approximately known part of the delays, we find the most precise measurement times based on the definition of total delays introduced in this paper. Next, at the calibrated measurement time, we evaluate the Jacobian and the residual for the EKF using interpolated states. At the measurement update of the EKF, the third correction is to formulate a modified Kalman gain using the cross-covariance term computed during the delay period. The testing results of this study on flight datasets show that the proposed latency compensated VIO is a more reliable and accurate navigation solution than existing VIO systems.

1.3. A Guide to This Document

The remainder of this document contains the following sections. Section 2 introduces the background for this study. To estimate the unknown time delays and states of VIO, Section 3 and Section 4 present the theory and implementation, respectively, for a novel combination of the parameter estimation technique with a modified EKF that compensates for delayed measurements. Section 5 shows the testing results of this study on benchmark flight datasets. The last section concludes and outlines future work.

2. Preliminaries

2.1. Sequential Measurement Update

When multiple measurements are observed at one discrete time, sequential Kalman filtering, shown in Figure 1, is useful [45]. Here, we obtain $N$ measurements, $y_1, y_2, \ldots, y_N$, at time $k$; that is, we first measure $y_1$, then $y_2$, $\ldots$, and finally $y_N$.
We first initialize the a posteriori estimate and covariance after zero measurements have been processed; that is, they are equal to the a priori estimate and covariance. For $i = 1, \ldots, N$, we perform the general measurement update using the $i$-th measurement. We lastly assign the a posteriori estimate and covariance as $\hat{x}_k^+ \leftarrow \hat{x}_k^N$ and $P_k^+ \leftarrow P_k^N$. To clarify, the hat "$\hat{\ }$" denotes an estimate, and superscripts $-$ and $+$ denote a priori and a posteriori estimates, respectively. Based on Simon's proof [45] that sequential Kalman filtering is an equivalent formulation of the standard EKF, the order of updates does not affect the overall performance of estimation.
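To make the equivalence concrete, the following is a minimal Python sketch of the sequential update, assuming a linearized measurement model $y_i = C_i x + \text{noise}$; the function and variable names are ours, not from the paper's implementation.

```python
import numpy as np

def sequential_update(x_prior, P_prior, measurements):
    """Process N measurements one at a time; equivalent to one batch update."""
    x, P = x_prior.copy(), P_prior.copy()    # x^0 = x^-, P^0 = P^-
    for y_i, C_i, R_i in measurements:       # measurements: (y, C, R) triples
        S = C_i @ P @ C_i.T + R_i            # innovation covariance
        K = P @ C_i.T @ np.linalg.inv(S)     # gain for the i-th measurement
        x = x + K @ (y_i - C_i @ x)          # partial a posteriori estimate
        P = P - K @ C_i @ P                  # partial a posteriori covariance
    return x, P                              # x^+ = x^N, P^+ = P^N
```

Because each partial update conditions on one measurement at a time, reordering the triples leaves the final $x$ and $P$ unchanged (up to numerical round-off).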

2.2. Vehicle Model

The nonlinear dynamics of the vehicle is driven by IMU sensor data, namely specific force and angular velocity. The estimated vehicle state is given by
$$\hat{x}_V = \begin{bmatrix} {}^i\hat{p}_{b/i}^T & {}^i\hat{v}_{b/i}^T & \delta\hat{\theta}^T & \hat{b}_a^T & \hat{b}_\omega^T \end{bmatrix}^T,$$
where $p_{b/i}$, $v_{b/i}$ are the position and velocity of the vehicle with respect to the inertial frame, respectively. $\delta\theta$ is the error quaternion of the attitude of the vehicle, explained in more detail in references [46,47,48]. $b_a$, $b_\omega$ are the accelerometer and gyroscope biases of the IMU, respectively. The left superscript $i$ denotes a vector expressed in the inertial frame. The EKF propagates the vehicle state vector by dead reckoning with data from the IMU. Raw IMU sensor measurements $a_{\mathrm{raw}}$ and $\omega_{\mathrm{raw}}$ are corrupted by noise and bias as follows:
$$a_{\mathrm{raw}} = a_{\mathrm{true}} - T_{b/i}\,{}^i g + b_a + \eta_a, \qquad \dot{b}_a = \eta_{b_a}$$
$$\omega_{\mathrm{raw}} = \omega_{\mathrm{true}} + b_\omega + \eta_\omega, \qquad \dot{b}_\omega = \eta_{b_\omega},$$
where $a_{\mathrm{true}}$, $\omega_{\mathrm{true}}$ are the true acceleration and angular rate, respectively, and $g$ is the gravitational acceleration in the inertial frame. $\eta_a$, $\eta_\omega$ are the zero-mean, white, Gaussian noise of the accelerometer and gyroscope measurements, and $\eta_{b_a}$, $\eta_{b_\omega}$ are the random walk rates of the accelerometer and gyroscope biases. $T_{b/i} = T_{i/b}^T$ denotes the rotation matrix from the inertial frame to the body frame.
The vehicle dynamics is given by
$$\begin{aligned}
{}^i\dot{\hat{p}}_{b/i} &= {}^i\hat{v}_{b/i}\\
{}^i\dot{\hat{v}}_{b/i} &= \hat{T}_{i/b}\,(a_{\mathrm{raw}} - \hat{b}_a) + {}^i g\\
\dot{\hat{q}}_{i/b} &= \tfrac{1}{2}\, Q(\omega_{\mathrm{raw}} - \hat{b}_\omega)\, \hat{q}_{i/b}\\
\dot{\delta\hat{\theta}} &= -(\omega_{\mathrm{raw}} - \hat{b}_\omega) \times \delta\hat{\theta}\\
\dot{\hat{b}}_a &= 0\\
\dot{\hat{b}}_\omega &= 0,
\end{aligned}$$
where $[\alpha\times]$ is a skew-symmetric matrix, and the function $Q(\cdot)$ maps a 3-by-1 angular velocity vector into a 4-by-4 matrix [44]. The use of the 4-by-1 quaternion representation in state estimation causes the covariance matrix to become singular, so it requires considerable accounting for the quaternion constraints. To avoid these difficulties, engineers developed the error-state Kalman filter, in which the 3-by-1 infinitesimal error quaternion $\delta\theta$ is used instead of the 4-by-1 quaternion $q$ in the state vector. In other words, we use the attitude error quaternion $\delta q_{b/\hat{b}}$ to express the incremental difference between the tracked reference body frame $\hat{b}$ and the actual body frame $b$ of the vehicle:
$$q_{i/b} = \hat{q}_{i/\hat{b}} \otimes \delta q_{b/\hat{b}}$$
$$\delta q_{b/\hat{b}} = \hat{q}_{i/\hat{b}}^{-1} \otimes q_{i/b} \approx \begin{bmatrix} 1 \\ \tfrac{1}{2}\,\delta\theta \end{bmatrix},$$
where $\otimes$ is the quaternion product defined in reference [47]. The resulting rotation matrices with the error quaternion and with respect to the nominal reference body frame are
$$T(q_{i/b}) = T_{b/i} = T_{b/\hat{b}}\, \hat{T}_{\hat{b}/i}$$
$$\hat{T}_{i/\hat{b}} = \hat{T}_{\hat{b}/i}^T = T(\hat{q}_{i/\hat{b}})^T$$
$$T_{b/\hat{b}} = T_{\hat{b}/b}^T \approx \left(I + [\delta\hat{\theta}\,\times]\right)^T.$$
Jacobian matrices $A = \frac{\partial \dot{x}}{\partial x}\big|_{\hat{x}}$ and $B = \frac{\partial \dot{x}}{\partial \eta}$, where $\eta = [\eta_a^T,\, \eta_\omega^T,\, \eta_{b_a}^T,\, \eta_{b_\omega}^T]^T$, are computed in Appendix A.
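For concreteness, the following is a minimal Python sketch of this dead-reckoning propagation, assuming a Hamilton quaternion $q = [w, x, y, z]$ that rotates body-frame vectors into the inertial frame and simple Euler integration; the function names are ours, not from the paper's implementation.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v x], so that skew(v) @ u = np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def quat_to_rot(q):
    """Rotation matrix T_{i/b} for a unit quaternion q = [w, x, y, z]."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

def propagate(p, v, q, b_a, b_w, a_raw, w_raw, dt,
              g=np.array([0.0, 0.0, -9.81])):
    """One Euler step of IMU dead reckoning; biases stay constant."""
    a = a_raw - b_a                       # bias-corrected specific force
    w = w_raw - b_w                       # bias-corrected angular rate
    T_ib = quat_to_rot(q)                 # body -> inertial
    p_new = p + v * dt                    # position kinematics
    v_new = v + (T_ib @ a + g) * dt       # velocity dynamics
    Omega = np.block([[np.zeros((1, 1)), -w[None, :]],
                      [w[:, None],       -skew(w)]])
    q_new = q + 0.5 * (Omega @ q) * dt    # quaternion kinematics
    return p_new, v_new, q_new / np.linalg.norm(q_new)
```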

2.3. Camera Model

An intrinsically calibrated pinhole camera model [49,50] is given by
$$\begin{bmatrix} u_j \\ v_j \end{bmatrix} = y_j = h_j(x) + \zeta_j = \begin{bmatrix} f_u\, \dfrac{{}^cX_j}{{}^cZ_j} + \zeta_{u_j} \\[6pt] f_v\, \dfrac{{}^cY_j}{{}^cZ_j} + \zeta_{v_j} \end{bmatrix}$$
$$\begin{bmatrix} {}^cX_j \\ {}^cY_j \\ {}^cZ_j \end{bmatrix} = {}^c p_{f_j/c} = T_{c/i}\left({}^i p_{f_j/i} - {}^i p_{c/i}\right) = T_{c/b}\, T(q_{i/b})\left({}^i p_{f_j/i} - {}^i p_{b/i}\right) - T_{c/b}\,{}^b p_{c/b},$$
where $x$ is the state vector including the vehicle and feature states, and measurement $y_j$ is the $j$-th feature's 2D location on the image plane. $f_u$, $f_v$ are the horizontal and vertical focal lengths, respectively, and $\zeta_u$, $\zeta_v$ are additive, zero-mean, white, Gaussian measurement noise. Vectors $p_{f_j/c}$, $p_{f_j/i}$ are the $j$-th feature's 3D position with respect to the camera frame and the inertial frame, respectively. The extrinsic parameters $T_{c/b}$ and ${}^b p_{c/b}$ are known and constant, and the rotation matrix $\hat{T}_{c/i} = T_{c/b}\, T_{b/\hat{b}}\, \hat{T}_{\hat{b}/i}$. Jacobian matrix $C_j = \frac{\partial y_j}{\partial x}\big|_{\hat{x}}$ is computed in Appendix A.
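As a small illustration of the model, the following sketch computes the noise-free projection under assumed known intrinsics and extrinsics (names are ours):

```python
import numpy as np

def feature_in_camera(T_c_b, p_c_b, T_b_i, p_b_i, p_f_i):
    """Transform an inertial-frame feature position into the camera frame."""
    p_f_b = T_b_i @ (p_f_i - p_b_i)        # feature relative to the body
    return T_c_b @ p_f_b - T_c_b @ p_c_b   # then into the camera frame

def project(p_f_c, f_u, f_v):
    """Noise-free pinhole projection h_j of a camera-frame point [X, Y, Z]."""
    X, Y, Z = p_f_c
    return np.array([f_u * X / Z, f_v * Y / Z])
```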

2.4. Feature Initialization

From Equation (16), if the $j$-th measurement $y_j$ on an image is a new feature, then ${}^i p_{f_j/i}$ is unknown, so it needs to be initialized. In the first step of the measurement update, we employ Gauss–Newton least-squares minimization [22,51] to estimate the feature 3D position ${}^i\hat{p}_{f_j/i}$. To avoid local minima, we apply the inverse depth parameterization of the feature position [52], which is numerically more stable than the Cartesian parameterization. In other words, by the derivation explained in Appendix B, we obtain the $j$-th feature's 3D position ${}^{c_1}\hat{p}_{f_j/c_1}$ with respect to the left camera frame $c_1$ of a stereo camera.
The $j$-th feature's 3D position with respect to the inertial frame is
$${}^i\hat{p}_{f_j/i} = \hat{T}_{i/c_1}\,{}^{c_1}\hat{p}_{f_j/c_1} + {}^i\hat{p}_{c_1/i} = \hat{T}_{i/b}\, T_{b/c_1}\,{}^{c_1}\hat{p}_{f_j/c_1} + {}^i\hat{p}_{b/i} + \hat{T}_{i/b}\,{}^b p_{c_1/b} = \hat{T}_{i/\hat{b}}\,\hat{T}_{\hat{b}/b}\left( T_{b/c_1}\,{}^{c_1}\hat{p}_{f_j/c_1} + {}^b p_{c_1/b} \right) + {}^i\hat{p}_{b/i}.$$
The new feature is initialized using only the one image in which the feature is first observed. Although the new feature is initialized, since it still entails uncertainty, the EKF recursively estimates and updates its 3D position by augmenting it into the state vector:
$$\hat{x} = \begin{bmatrix} \hat{x}_V^T & {}^i\hat{p}_{f_j/i}^T \end{bmatrix}^T,$$
where $\hat{x}_V$ is the estimated vehicle state vector defined in Equation (1). The overall initialization includes the assignment of the initial value of the feature state and its error covariance. The error covariance of the new feature is initialized using state augmentation with Jacobian $J$:
$$P \leftarrow \begin{bmatrix} I \\ J \end{bmatrix} P \begin{bmatrix} I \\ J \end{bmatrix}^T + \begin{bmatrix} 0 & 0 \\ 0 & P_f^{\mathrm{new}} \end{bmatrix} = \begin{bmatrix} P & P J^T \\ J P & J P J^T + P_f^{\mathrm{new}} \end{bmatrix},$$
where Jacobian $J = \frac{\partial p_{f/i}}{\partial x}\big|_{\hat{x}}$ is computed as follows:
$$J = \begin{bmatrix} I_{3\times3} & 0_{3\times3} & \hat{T}_{i/b}\left[\left(T_{b/c_1}\,{}^{c_1}\hat{p}_{f_j/c_1} + {}^b p_{c_1/b}\right)\times\right] & 0_{3\times6} & 0_{3\times3N_f} \end{bmatrix}.$$
$N_f$ is the number of all features, and $P_f^{\mathrm{new}}$ is the initial uncertainty of the newly initialized feature; this uncertainty pertains to the measurement noise and the error of the least-squares minimization. In fact, since Montiel et al. [52] validate the initial uncertainty as a Gaussian distribution, the EKF including the feature initialization still holds optimality. Equations (18)–(20) arise based on "consider covariance analysis" [53,54].
Once initialized, the EKF processes the feature state in the prediction-update loop. In the time update of the EKF, we propagate $P$ by
$$\begin{bmatrix} \Phi & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} P_{VV} & P_{Vf} \\ P_{fV} & P_{ff} \end{bmatrix} \begin{bmatrix} \Phi^T & 0 \\ 0 & I \end{bmatrix} + \begin{bmatrix} Q_V & 0 \\ 0 & Q_f \end{bmatrix} = \begin{bmatrix} \Phi P_{VV} \Phi^T + Q_V & \Phi P_{Vf} \\ P_{fV} \Phi^T & P_{ff} + Q_f \end{bmatrix},$$
where the state transition matrix $\Phi \approx I + A\,\Delta t$, $P_{VV} := E[(x_V - \hat{x}_V)(x_V - \hat{x}_V)^T]$, $P_{ff}$ is the error covariance of all features, and $P_{Vf} = P_{fV}^T$ represents the vehicle-feature correlations. In addition, we assume that the surroundings are static, so the dynamics of the features is $\dot{\hat{p}}_{f_j/i} = 0$. In the measurement update of the EKF, only tracked features are used for the update. For efficient management of the map database, if the size of the state vector exceeds the maximum limit, then the feature with the fewest observations is pruned and marginalized.

3. Theory

3.1. Definition of Time Delays

Based on dead reckoning, the EKF propagates state $x$ and its error covariance $P$ at time $t$ when IMU sensor data $a_{\mathrm{raw}}$ and $\omega_{\mathrm{raw}}$ are measured. Since an IMU is a discrete-time sensor, the time update of the EKF is processed at discrete time step $k = \mathrm{int}(t/\Delta t_{\mathrm{IMU}})$, where $\mathrm{int}(\cdot)$ denotes conversion to an integer, continuous time $t \in [0, t_{\mathrm{final}}]$, and $\Delta t_{\mathrm{IMU}}$ is the sampling period of the IMU. $\Delta t_{\mathrm{IMU}}$ is generally almost constant since a microcontroller such as an Arduino or Pixhawk calculates precise millisecond timestamps for each IMU measurement. Next, whenever new vision data from an image arrive at the filter, the EKF performs the measurement update to correct the state estimate and its error covariance. As introduced in Section 1, various causes such as image processing produce the time delays that the timestamps of vision data contain. For clarity, this section defines the time delay in detail.
Latency is the time difference between when an image was grabbed and when the vision data from that image are updated in the filter, as shown in Figure 2.
That is, the true delay $\Delta t_d$ is written as
$$t = t_{\mathrm{img}} + \Delta t_d,$$
where $t$ is the current IMU time and $t_{\mathrm{img}}$ is the time when the current image was captured. In essence, we treat IMU time as our common time reference, and we do not necessarily know the exact time when images are grabbed. The timestamp of each image is encoded by indirect means such as triggers. In other words, the true image time $t_{\mathrm{img}}$ comprises the readable timestamp $t_{\mathrm{img,raw}}$ and the unknown $\delta t_d$. Let us define the time difference $\Delta\bar{t}_d$ between the time readouts of the sensors as follows:
$$t_{\mathrm{img,raw}} = t_{\mathrm{img}} + \delta t_d$$
$$\Delta\bar{t}_d := t - t_{\mathrm{img,raw}} = \Delta t_d - \delta t_d$$
$$\Delta t_d = \Delta\bar{t}_d + \delta t_d,$$
where $\Delta\bar{t}_d$ and $\delta t_d$ are the approximately known and the unknown parts of the true delay $\Delta t_d$, respectively.

3.2. Approximately Known Part of Time Delays

$\Delta\bar{t}_d$ is either a fixed value determined by offline tuning beforehand or the readable difference between the timestamps of the image and the IMU data. Indeed, whether a constant value or a readable varying delay, the approximate delay $\Delta\bar{t}_d$ is a known value. Let the discrete steps of the approximately known part be $d = \mathrm{int}(\Delta\bar{t}_d / \Delta t_{\mathrm{IMU}})$, where $\mathrm{int}(\cdot)$ means type conversion to an integer; that is, $d$ is the quotient of the division $\Delta\bar{t}_d / \Delta t_{\mathrm{IMU}}$.
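As a tiny numeric illustration with assumed values (an IMU period of 5 ms and an approximately known delay of 42 ms):

```python
dt_imu = 0.005               # IMU sampling period [s]
dt_bar = 0.042               # approximately known delay [s]
d = int(dt_bar / dt_imu)     # whole delayed IMU steps -> 8
frac = dt_bar / dt_imu - d   # fractional remainder (about 0.4), handled by
print(d, frac)               # the interpolation of Section 3.2.1
```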

3.2.1. Jacobian and Residual—“Baseline Correction”

Since $\delta t_d$ is unknown, we first consider only the $\Delta\bar{t}_d$ term as the delay of the system. From the system models given in Section 2.2 and Section 2.3, only measurements from the camera model depend on the time delays. To correct the Jacobian and the residual with approximately known delays, interpolation and quaternion slerp are required. Since $k - d$ is generally not equal to $(t - \Delta\bar{t}_d)/\Delta t_{\mathrm{IMU}}$, we define the new time notation $[k - \bar{d}]$ as
$$[k - \bar{d}] := \frac{t_{\mathrm{img,raw}}}{\Delta t_{\mathrm{IMU}}} = \frac{t - \Delta\bar{t}_d}{\Delta t_{\mathrm{IMU}}}.$$
When time $[k - \bar{d}]$ appears as a subscript (e.g., $x_{[k-\bar{d}]}$, $P_{[k-\bar{d}]}$), we will use the shorthand notation without brackets (e.g., $x_{k-\bar{d}}$, $P_{k-\bar{d}}$).
Although the delay $d$ in discrete-time systems is an integer number of delayed samples, the time $[k - \bar{d}]$ obtained by reading the timestamps of each sensor is not required to be an integer. Since $[k - \bar{d}]$ is not an integer, we cannot directly access the values of either $\hat{x}_{k-\bar{d}}$ or its corresponding error covariance $P_{k-\bar{d}}$, so a suitable interpolation is required instead.
Mathematically, linear interpolation constructs a new data point within the range of two known adjacent data points using the same slope between them [55]. Let us take the nearest integer time step $k - d$, which is greater than or equal to $[k - \bar{d}]$, as shown in Figure 3a.
With two data points, either $(k-d-1, \hat{x}_{k-d-1})$ and $(k-d, \hat{x}_{k-d})$ or $(k-d-1, P_{k-d-1})$ and $(k-d, P_{k-d})$, the interpolants at time $[k - \bar{d}]$ are given by
$$\frac{\hat{x}_{k-d} - \hat{x}_{k-d-1}}{1} = \frac{\hat{x}_{k-d} - \hat{x}_{k-\bar{d}}}{(k-d) - [k-\bar{d}]}$$
$$\hat{x}_{k-d} - \hat{x}_{k-\bar{d}} \approx \left(\frac{t}{\Delta t_{\mathrm{IMU}}} - d - \frac{t - \Delta\bar{t}_d}{\Delta t_{\mathrm{IMU}}}\right)\left(\hat{x}_{k-d} - \hat{x}_{k-d-1}\right)$$
$$\hat{x}_{k-\bar{d}} = \left(1 - \frac{\Delta\bar{t}_d}{\Delta t_{\mathrm{IMU}}} + d\right)\hat{x}_{k-d} + \left(\frac{\Delta\bar{t}_d}{\Delta t_{\mathrm{IMU}}} - d\right)\hat{x}_{k-d-1},$$
where $k = \mathrm{int}(t/\Delta t_{\mathrm{IMU}}) \approx t/\Delta t_{\mathrm{IMU}}$ in Equation (26). Likewise,
$$P_{k-\bar{d}} = \left(1 - \frac{\Delta\bar{t}_d}{\Delta t_{\mathrm{IMU}}} + d\right)P_{k-d} + \left(\frac{\Delta\bar{t}_d}{\Delta t_{\mathrm{IMU}}} - d\right)P_{k-d-1}.$$
Although we compute the interpolants at time $[k - \bar{d}]$ by linear interpolation, because of the constraints and special structure of quaternions, another interpolation is required. Slerp is shorthand for spherical linear interpolation, introduced by Ken Shoemake [56] in the context of quaternion interpolation for the purpose of animating 3D rotation. Interpolants follow a constant-speed motion along a unit-radius circular arc, as shown in Figure 3b. Based on the fact that any point on the arc is a linear combination of the given endpoints, the geometric formula [56,57] is
$$\Theta = \cos^{-1}\left(\hat{q}_{k-d} \cdot \hat{q}_{k-d-1}\right)$$
$$\hat{q}_{k-\bar{d}} = \frac{\sin\left(\left(1 - \frac{\Delta\bar{t}_d}{\Delta t_{\mathrm{IMU}}} + d\right)\Theta\right)}{\sin\Theta}\,\hat{q}_{k-d} + \frac{\sin\left(\left(\frac{\Delta\bar{t}_d}{\Delta t_{\mathrm{IMU}}} - d\right)\Theta\right)}{\sin\Theta}\,\hat{q}_{k-d-1},$$
where, since only unit quaternions are valid rotations, normalizing each quaternion before applying slerp is a prerequisite.
$\Theta$ is the smaller angle between the two end quaternions, so we ensure that $-90\,\mathrm{deg} \le \Theta \le 90\,\mathrm{deg}$. If the dot product in Equation (28) is negative, slerp does not represent the shortest path. To prevent long paths, we negate one of the end quaternions, since $q$ and $-q$ are equivalent when the negation is applied to all four components. If the two input quaternions $\hat{q}_{k-d}$, $\hat{q}_{k-d-1}$ are too close, then interpolants by linear interpolation are acceptable. Otherwise, $\cos^{-1}(\cdot)$ in Equation (28) is a safe computation because the dot product is within the threshold range.
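A minimal sketch of this quaternion interpolation, assuming quaternions as length-4 NumPy arrays; the shortest-path negation and the near-parallel fallback follow the provisions above:

```python
import numpy as np

def slerp(q0, q1, u, eps=1e-8):
    """Constant-speed interpolation from q0 (u = 0) to q1 (u = 1)."""
    q0 = q0 / np.linalg.norm(q0)          # only unit quaternions are valid
    q1 = q1 / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:                         # q and -q are the same rotation:
        q1, dot = -q1, -dot               # negate to take the shorter arc
    if dot > 1.0 - eps:                   # nearly parallel: plain lerp is fine
        q = (1.0 - u) * q0 + u * q1
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)                # angle between the end quaternions
    return (np.sin((1.0 - u) * theta) * q0
            + np.sin(u * theta) * q1) / np.sin(theta)
```

In the notation above, the interpolant at $[k-\bar{d}]$ corresponds to `slerp(q_km1, q_k, u)` with `q_km1` $= \hat{q}_{k-d-1}$, `q_k` $= \hat{q}_{k-d}$, and $u = 1 - \Delta\bar{t}_d/\Delta t_{\mathrm{IMU}} + d$.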
With suitable interpolants at time $[k - \bar{d}]$, a baseline approach modifies the feature initialization in Appendix B and the measurement update. At time $k$, the vision data of an image grabbed at time $(t - \Delta t_d)$ arrive at the filter for either the feature initializations or the sequential measurement updates. If the $j$-th measurement $y_j$ on the latest image is a new feature, then, from Equations (18) and (19), state $\hat{x}$ and covariance $P$ at current time $k$ are augmented as follows:
$$\hat{x}_k^{\mathrm{aug}} \leftarrow \begin{bmatrix} \hat{x}_k^T & {}^i\hat{p}_{f_j/i}^T \end{bmatrix}^T$$
$$P_k^{\mathrm{aug}} \leftarrow \begin{bmatrix} P_k & P_k (J_j)_{k-\bar{d}}^T \\ (J_j)_{k-\bar{d}} P_k & (J_j)_{k-\bar{d}} P_k (J_j)_{k-\bar{d}}^T + P_{f_j}^{\mathrm{new}} \end{bmatrix},$$
where ${}^i\hat{p}_{f_j/i} = T_{i/b}|_{k-\bar{d}}\,{}^b\hat{p}_{f_j/b} + {}^i p_{b/i}|_{k-\bar{d}}$ and $(J_j)_{k-\bar{d}} = \frac{\partial p_{f_j/i}}{\partial x}\big|_{\hat{x}_{k-\bar{d}}}$. ${}^b\hat{p}_{f_j/b}$ is initialized by the Gauss–Newton least-squares minimization derived in Appendix B. Although we assume static features, since the feature initialization is related to the estimated camera pose at the time when the delays begin, the corrected Jacobian $J_j$ is required in the initialization steps.
If the $j$-th measurement $y_j$ on the image is a tracked feature, then we correct only the residual $r$ and Jacobian $C$ in the following measurement update:
$$K_j = P_k^{j-1} (C_j)_{k-\bar{d}}^T \left[ (C_j)_{k-\bar{d}} P_k^{j-1} (C_j)_{k-\bar{d}}^T + R \right]^{-1}$$
$$\hat{x}_k^j = \hat{x}_k^{j-1} + K_j \left[\, y_j|_{t-\Delta t_d} - h_j(\hat{x}_{k-\bar{d}}) \,\right]$$
$$P_k^j = P_k^{j-1} - K_j (C_j)_{k-\bar{d}} P_k^{j-1},$$
where the corrected residual is $(r_j)_{k-\bar{d}} = y_j|_{t-\Delta t_d} - h_j(\hat{x}_{k-\bar{d}})$ and the Jacobian is $(C_j)_{k-\bar{d}} = \frac{\partial h_j(x)}{\partial x}\big|_{\hat{x}_{k-\bar{d}}}$. $R$ is the measurement noise covariance of $y_j|_{t-\Delta t_d}$, and $K_j$ is a sub-optimal Kalman gain computed with the current covariance. As in the sequential Kalman filtering introduced in Section 2.1, if $j$ is the first feature on the current image (i.e., $j = 0$), then we assign $\hat{x}_k^0 \leftarrow \hat{x}_k^-$, $P_k^0 \leftarrow P_k^-$, and if $j$ is the last feature on the current image (i.e., $j = N_k$), then we assign $\hat{x}_k^+ \leftarrow \hat{x}_k^{N_k}$, $P_k^+ \leftarrow P_k^{N_k}$. Before the measurement updates (32)–(34), a chi-squared gating test rejects outliers of each measurement. For this test only, in the case of the baseline correction, we add uncertainty due to the time delay. The procedures in Equations (30)–(34) are referred to as the "baseline correction."
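A compact sketch of one baseline-corrected update, assuming callables `h` and `C_fn` for the measurement model (our names): the residual and Jacobian are evaluated at the interpolated delayed state, while the gain still uses the current covariance.

```python
import numpy as np

def baseline_update(x_k, P_k, y_j, x_dly, h, C_fn, R):
    """One sequential update with Jacobian/residual corrected to [k - d_bar]."""
    C = C_fn(x_dly)                        # Jacobian at the delayed state
    r = y_j - h(x_dly)                     # residual against the delayed state
    S = C @ P_k @ C.T + R
    K = P_k @ C.T @ np.linalg.inv(S)       # sub-optimal gain (current P_k)
    return x_k + K @ r, P_k - K @ C @ P_k
```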

3.2.2. Cross-Covariance—“Covariance Correction”

During the delay period, even though an image was already captured in the past, since the vision data from the image have not yet arrived at the filter, the EKF is unable to perform the measurement update; the filter processes only the time update. When a vision data packet from the image finally arrives and is ready to be applied in the filter, we can simply execute the Jacobian and residual correction in Equations (32)–(34) using the delayed measurements. However, unlike the baseline correction, if the filter could update as if the measurements arrived immediately without delays (the red lines in Figure 4), then it would achieve a more accurate estimate. In fact, the covariance correction presented in this section (the blue lines in Figure 4) acts as if the filter had accomplished the general measurement update at the time instant when the image was captured. In other words, the red lines in Figure 4 are ideal but unrealistic, and the blue lines in the figure are practical. The red lines process the measurement update first and then the time update; the blue lines process them in the opposite order. Only the order of the processes has changed.
Among the variety of fusing techniques for time-delayed observations discussed in Section 1.1.2, the stochastic cloning [33]-based method (i.e., the Schmidt EKF [30,31]) is applicable to varying delays and nonlinear functions such as the vehicle and camera models described in Section 2.2 and Section 2.3, respectively. Thus, this study modifies that method to find the optimal navigation solution for vision-aided inertial navigation systems.
Let us introduce the new notation $P_{\mathrm{dly}}$: the covariance matrix $P$ at the time when the true delays begin. In the scope of this section, $P_{\mathrm{dly}} \equiv P_{k-\bar{d}}$. In addition, when this section uses the corrected residual $(r_j)_{k-\bar{d}}$ and Jacobians $(J_j)_{k-\bar{d}}$, $(C_j)_{k-\bar{d}}$, we will use the shorthand notations $r_j$ and $J_j$, $C_j$, respectively. That is, each residual and Jacobian is corrected based on Section 3.2.1. In addition to the baseline correction, we correct the error covariance in both the feature initialization and the measurement update when delayed vision data become available in the filter.
If the $j$-th feature measurement $y_j$ on the recent image is a new feature, the augmentation of $P_{\mathrm{dly}}$ in the feature initialization is similar to Equation (31). On the other hand, since Jacobian $J_j$ is computed at the time when the delays begin, the covariance matrix $P_k$ at the current time is augmented in a different way:
$$P_k^{\mathrm{aug}} \leftarrow \begin{bmatrix} P_k & 0 \\ 0 & d\, Q_f \end{bmatrix}$$
$$P_{\mathrm{dly}}^{\mathrm{aug}} \leftarrow \begin{bmatrix} P_{\mathrm{dly}} & P_{\mathrm{dly}} J_j^T \\ J_j P_{\mathrm{dly}} & J_j P_{\mathrm{dly}} J_j^T + P_{f_j}^{\mathrm{new}} \end{bmatrix},$$
where $d = \mathrm{int}(\Delta\bar{t}_d / \Delta t_{\mathrm{IMU}})$ and $J_j = \frac{\partial p_{f_j/i}}{\partial x}\big|_{\hat{x}_{k-\bar{d}}}$. The state estimate $\hat{x}_k$ is augmented by Equation (30).
When the $j$-th delayed vision datum $y_j$ is ready to be applied at time $k$, we modify the measurement update steps of the sequential Kalman filtering as follows:
$$S_j = C_j P_{\mathrm{dly}}^{j-1} C_j^T + R$$
$$K_j^{\mathrm{crs}} = P_{\mathrm{crs}}^{j-1} C_j^T S_j^{-1}$$
$$\hat{x}_k^j = \hat{x}_k^{j-1} + K_j^{\mathrm{crs}}\, r_j$$
$$P_k^j = P_k^{j-1} - K_j^{\mathrm{crs}} C_j \left(P_{\mathrm{crs}}^{j-1}\right)^T,$$
where $r_j = y_j|_{t-\Delta t_d} - h_j(\hat{x}_{k-\bar{d}})$ and $C_j = \frac{\partial h_j(x)}{\partial x}\big|_{\hat{x}_{k-\bar{d}}}$. $P_{\mathrm{crs}}$ is the relevant cross-covariance term accumulated during the delay period. This term, which fuses a current prediction of the state with an observation related to the lagged state of the system, is used to formulate the modified Kalman gain matrix $K^{\mathrm{crs}}$. Equation (39) still holds in Joseph form [58], which preserves the symmetry of the updated covariance and ensures its positive definiteness. By the sequential update provisions, the state estimate and covariance at time $[k - \bar{d}]$ are also updated as follows:
$$K_j^{\mathrm{dly}} = P_{\mathrm{dly}}^{j-1} C_j^T S_j^{-1}$$
$$\hat{x}_{k-\bar{d}}^j = \hat{x}_{k-\bar{d}}^{j-1} + K_j^{\mathrm{dly}}\, r_j$$
$$P_{\mathrm{dly}}^j = P_{\mathrm{dly}}^{j-1} - K_j^{\mathrm{dly}} C_j P_{\mathrm{dly}}^{j-1}.$$
At time $t_{\mathrm{img}}$, when the camera opens to capture the image, the cross-covariance matrix is initialized with the covariance at that time; that is, $P_{\mathrm{crs}} \leftarrow P_{\mathrm{dly}} \equiv P_{k-\bar{d}}$. During the delay period, from time $[k - \bar{d}]$ to current time $k$, if no other measurements are fused into the filter, the cross-covariance is propagated only by the following computation based on the Schmidt–Kalman filter [30,32,33]:
$$\Phi_{\mathrm{crs}} = \prod_{i=k-1}^{k-d} \begin{bmatrix} \Phi_i & 0 \\ 0 & I \end{bmatrix}$$
$$P_{\mathrm{crs}} = \Phi_{\mathrm{crs}}\, P_{\mathrm{dly}} = \begin{bmatrix} \left(\prod_{i=k-1}^{k-d} \Phi_i\right) P_{vv}^{\mathrm{dly}} & \left(\prod_{i=k-1}^{k-d} \Phi_i\right) P_{vf}^{\mathrm{dly}} \\ P_{fv}^{\mathrm{dly}} & P_{ff}^{\mathrm{dly}} \end{bmatrix},$$
where $\Phi$ is the state transition matrix. In the sequential measurement update, based on the updated $P_{\mathrm{dly}}^j$ in Equation (42), updating $(P_{\mathrm{crs}})^{j-1}$ is straightforward:
$$P_{\mathrm{crs}}^j = \Phi_{\mathrm{crs}}\, P_{\mathrm{dly}}^j.$$
If other measurements from other sensors such as an altimeter or GPS are fused during the delay period, then $P_{\mathrm{dly}}$ and the cross-covariance $P_{\mathrm{crs}}$ are also recursively updated using the Kalman gains of those measurements; in that case, Equations (43)–(46) no longer hold. For more details, see Appendix C. All of the modifications in this section are referred to as the "covariance correction." Furthermore, the optimality of this covariance correction is guaranteed: Appendix C proves that the covariance correction is identical to the standard EKF, which is an optimal filter, so the proposed correction retains optimality. Section 4 will describe its practical implementation.
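A sketch of one covariance-corrected sequential update under the notation above, assuming $\Phi_{\mathrm{crs}}$ has already been accumulated over the delay period (callables and names are illustrative, not the paper's implementation):

```python
import numpy as np

def covariance_corrected_update(x_k, P_k, x_dly, P_dly, Phi_crs, y, h, C_fn, R):
    """Update current and delayed states with the cross-covariance gain."""
    C = C_fn(x_dly)                           # Jacobian at the delayed state
    r = y - h(x_dly)                          # residual at the delayed state
    S_inv = np.linalg.inv(C @ P_dly @ C.T + R)
    P_crs = Phi_crs @ P_dly                   # cross-covariance over the delay
    K_crs = P_crs @ C.T @ S_inv               # modified gain, current state
    K_dly = P_dly @ C.T @ S_inv               # gain for the delayed state
    x_k_new = x_k + K_crs @ r
    P_k_new = P_k - K_crs @ C @ P_crs.T
    x_dly_new = x_dly + K_dly @ r             # keep the delayed buffer entry
    P_dly_new = P_dly - K_dly @ C @ P_dly     # consistent for the next feature
    return x_k_new, P_k_new, x_dly_new, P_dly_new
```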

3.3. Unknown Part of Time Delays—“Online Calibration”

Although the residual, Jacobians, and covariance are corrected for measurements with time delays, if $\Delta\bar{t}_d$ is an uncertain readout or $\delta t_d$ is the larger portion of the true delays, we cannot guarantee the reliability of the correction algorithm (Figure 5). For the robustness of vision-aided navigation systems, we must additionally investigate the unknown part of the true delays.
Figure 5 shows the three corrections in the latency compensated VIO presented in this study. In the standard Kalman filter, if one does not account for the time delay, the propagation and measurement updates look like the grey lines in Figure 5. For the last correction, we estimate the unknown part of the time delays to obtain a more precise time instant when the delays begin. As discussed in Section 1, unknown phenomena such as clock bias, drift, skew, and asynchronization cause $\delta t_d$, so $\delta t_d$ may be a positive or negative value.
State estimation theory can be used to estimate not only the states but also the unknown parameters of a system [59]. Numerous researchers [60,61,62] have shown that state augmentation is easy to use with state observers, so we design a state observer via state augmentation to estimate the unknown part of the time delays. To estimate the unknown delay value $\delta t_d$, we first augment the vehicle state estimate $\hat{x}_V$ and covariance $P_{vv}$ as follows:
$$x_V^{\mathrm{aug}} \leftarrow \begin{bmatrix} x_V^T & \delta t_d \end{bmatrix}^T, \qquad P_{vv}^{\mathrm{aug}} \leftarrow \begin{bmatrix} P_{vv} & P_{v\,\delta t_d} \\ P_{\delta t_d\, v} & P_{\delta t_d} \end{bmatrix}.$$
Like the modeling of the IMU biases in Equations (2) and (3), we model the dynamics of $\delta t_d$ using a small artificial noise term:
$$\dot{\delta t}_d = \eta_d, \qquad \dot{\widehat{\delta t}}_d = 0,$$
where $\eta_d$ is a random walk rate that allows the EKF to change its estimate of $\delta t_d$; that is, the power spectral density of $\eta_d$ represents the variability of $\delta t_d$. In fact, this is a conventional random walk model for an unknown parameter that may be varying, commonly used for quantities like gyro bias, as done here. If additional modeling information about the way the time delay is expected to vary were known, it could be captured here with a more complex model.
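As a minimal sketch, this random-walk model amounts to the following time-update behavior for the delay state, with an assumed (illustrative) spectral density `Q_d`:

```python
Q_d = 1.0e-10    # assumed power spectral density of eta_d (illustrative)

def propagate_delay(dt_hat, var_dt, dt_imu):
    """Estimate stays constant; its variance grows, letting the EKF adapt."""
    return dt_hat, var_dt + Q_d * dt_imu
```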
Let us rewrite the definition of time delays:
$$t - t_{\mathrm{img}} = \Delta t_d, \qquad \Delta t_d = \Delta\bar{t}_d + \delta t_d = t - t_{\mathrm{img,raw}} + \delta t_d.$$
For clarity, we define the new time notation $[k - \hat{d}]$ as
$$[k - \hat{d}] := \frac{t_{\mathrm{img,raw}} - \widehat{\delta t}_d}{\Delta t_{\mathrm{IMU}}} \approx \frac{t_{\mathrm{img}}}{\Delta t_{\mathrm{IMU}}} = \frac{t - (\Delta\bar{t}_d + \widehat{\delta t}_d)}{\Delta t_{\mathrm{IMU}}},$$
where the time $[k - \hat{d}]$ is now the most precise available estimate of the time instant when the image was captured. To apply the relevant interpolation techniques to the state estimates and covariance at time $[k - \hat{d}]$, we access their values at the nearest integer time step $k - s$, where $s = \mathrm{int}\left((\Delta\bar{t}_d + \widehat{\delta t}_d)/\Delta t_{\mathrm{IMU}}\right)$. In other words, $k - s$, the discrete delayed sample index including the estimated latency, is greater than or equal to $[k - \hat{d}]$, as shown in Figure 2.
To operate the augmented system, we match its dimensions by augmenting the other matrices. In the time update, since $\dot{\widehat{\delta t}}_d = 0$, the state transition matrix and the process noise covariance matrix are augmented as
$$\Phi^{\mathrm{aug}} \leftarrow \begin{bmatrix} \Phi & 0 \\ 0 & I \end{bmatrix}, \qquad Q_v^{\mathrm{aug}} \leftarrow \begin{bmatrix} Q_v & 0 \\ 0 & Q_d \end{bmatrix},$$
where the $I$ is due to $\dot{\widehat{\delta t}}_d = 0$ and the Gaussian white noise $\eta_d \sim N(0, Q_d)$. Under the assumption of static features, since the estimated latency $\delta t_d$ pertains only to vision measurements, we compute the augmented elements of Jacobian matrices $J$ and $C$ [41,43]. In fact, from Equation (20), the Jacobian $J_j$ in the feature initialization is augmented as follows:
$$(J_j)_{k-\hat{d}}^{\mathrm{aug}} \leftarrow \begin{bmatrix} I_{3\times3} & 0_{3\times3} & \dfrac{\partial \hat{p}_{f_j/i}}{\partial \delta\hat{\theta}}\bigg|_{[k-\hat{d}]} & 0_{3\times6} & J_j^{\delta t_d} \;\Bigg|\; 0 \end{bmatrix},$$
where
$$J_j^{\delta t_d} = \frac{\partial \hat{p}_{f_j/i}}{\partial \widehat{\delta t}_d}\bigg|_{[k-\hat{d}]} \approx \frac{\partial \hat{p}_{f_j/i}}{\partial x}\bigg|_{\hat{x}_{k-\hat{d}}} \cdot \frac{\partial x}{\partial t}\bigg|_{[k-\hat{d}]} \cdot \frac{\partial t}{\partial \widehat{\delta t}_d}\bigg|_{[k-\hat{d}]} = -(J_j)_{k-\hat{d}}\, \dot{\hat{x}}_{k-\hat{d}} = -\left( {}^i\hat{v}_{b/i}\big|_{k-\hat{d}} + \frac{\partial \hat{p}_{f_j/i}}{\partial \delta\hat{\theta}}\bigg|_{[k-\hat{d}]}\, \dot{\delta\hat{\theta}}_{k-\hat{d}} \right).$$
Furthermore, from Equation (A2), the augmented Jacobian $C_j$ in the measurement update is
$$(C_j)_{k-\hat{d}}^{\mathrm{aug}} \leftarrow \begin{bmatrix} \dfrac{\partial y_j}{\partial\, {}^i\hat{p}_{b/i}} & 0 & \dfrac{\partial y_j}{\partial \delta\hat{\theta}} & 0 & 0 & \dfrac{\partial y_j}{\partial \widehat{\delta t}_d} \;\Bigg|\; 0 \;\cdots\; \dfrac{\partial y_j}{\partial\, {}^i\hat{p}_{f_j/i}} \;\cdots\; 0 \end{bmatrix}_{[k-\hat{d}]},$$
where
$$\frac{\partial y_j}{\partial \widehat{\delta t}_d}\bigg|_{[k-\hat{d}]} \approx \frac{\partial y_j}{\partial x}\bigg|_{\hat{x}_{k-\hat{d}}} \cdot \frac{\partial x}{\partial t}\bigg|_{[k-\hat{d}]} \cdot \frac{\partial t}{\partial \widehat{\delta t}_d}\bigg|_{[k-\hat{d}]} = -(C_j)_{k-\hat{d}}\, \dot{\hat{x}}_{k-\hat{d}} = -\left( \frac{\partial y_j}{\partial\, {}^i\hat{p}_{b/i}}\bigg|_{[k-\hat{d}]}\, {}^i\hat{v}_{b/i}\big|_{k-\hat{d}} + \frac{\partial y_j}{\partial \delta\hat{\theta}}\bigg|_{[k-\hat{d}]}\, \dot{\delta\hat{\theta}}_{k-\hat{d}} \right).$$
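In code, the extra Jacobian column is simply the negative time derivative of the predicted measurement at the delayed state. A sketch with assumed array shapes ($C$ is $m \times n$, $\dot{\hat{x}}$ is length $n$); names are ours:

```python
import numpy as np

def augment_C_with_delay(C, x_dot):
    """Append dy/d(delta t_d) = -C @ x_dot as the last column of C."""
    dy_ddt = -(C @ x_dot)        # chain rule with dt/d(delta t_d) = -1
    return np.hstack([C, dy_ddt[:, None]])
```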
Here, let us call the combination of the estimation of the unknown latency in this section with the baseline correction "online calibration." Therefore, to reliably estimate the state variables and effectively compensate for the total delays, we incorporate all three corrections, together called "latency compensated filtering."

4. Implementation

This section summarizes and describes an implementation of the proposed method. Figure 6 illustrates a flow chart of the overall process.

4.1. Forward Computation of Cross-Covariance

Even though the delay begins $\Delta t_d$ prior, the estimated delay value $\Delta\hat{t}_d$ is only accessible once the delay has elapsed. That is, during the delay period from $t_{\mathrm{img}}$ to $t$, $\Delta\hat{t}_d$ is not yet known; it is estimated at current time $k$. Since the estimated delay value $\Delta\hat{t}_d$ is unknown until time $k$, we are not sure when the covariance correction should begin computing the cross-covariance $P_{\mathrm{crs}}$. Theoretically, when $\Delta\hat{t}_d$ is estimated at time $k$, we could compute $P_{\mathrm{dly}}$ and $P_{\mathrm{crs}}$ backward from time $k$ to time $[k - \hat{d}]$ with Jacobians and covariance saved during the delay period. This is an ideal computation, but not a realistic one. The backward computation used in [43] is impossible for real-time operation since storing large matrices such as sequences of Jacobian and covariance matrices requires huge memory allocations. Furthermore, backward computing is inefficient because it iterates backward at time $k$ like batch processing.
Instead, for a real-time framework, an approximate forward computation of the cross-covariance is introduced. Since $\dot{\widehat{\delta t}}_d = 0$, we assume that the time delay does not change during state propagation over the delay period, so the a posteriori estimate of the time delay at the last measurement update is taken as the a priori estimate of the delay at the current time. Under this assumption, we predict when the time delay of the next image begins. At the predicted time instant, we store the covariance matrix once as $P_{\mathrm{dly}}$ and recursively calculate $\Phi_{\mathrm{crs}}$ for $P_{\mathrm{crs}}$.
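A sketch of this forward bookkeeping, mirroring lines 11–16 of Algorithm 1 below (class and member names are ours):

```python
import numpy as np

class CrossCovTracker:
    """Accumulate Phi_crs forward instead of recomputing backward at time k."""
    def __init__(self, n):
        self.Phi_crs = np.eye(n)          # product of transition matrices
        self.P_dly = None                 # covariance frozen at delay start

    def on_imu_step(self, Phi_k, P_k, in_delay_period):
        if in_delay_period:
            self.Phi_crs = Phi_k @ self.Phi_crs     # recursive accumulation
        else:
            self.P_dly = P_k.copy()       # (re)anchor at predicted delay start
            self.Phi_crs = np.eye(Phi_k.shape[0])

    def cross_cov(self):
        return self.Phi_crs @ self.P_dly  # P_crs = Phi_crs @ P_dly
```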

4.2. Summarized Algorithm

When the size of the state after augmentation in the feature initialization steps exceeds a maximum threshold, we prune the number of features in the database. The system in this study finds an index for the best place to insert a new point in the database; the feature with the fewest observations or the most frequent outliers is marginalized. Unlike Lee et al. [43], this study does not estimate the total time delay, so the latency compensated filter does not entail specific constraints. That is, this study estimates only the unknown part $\delta t_d$, which may be a positive or negative value, and constrained Kalman filtering is not necessary, saving computation. Instead, the interpolation and quaternion slerp explained in Section 3.2.1 remain tractable.
From the definition of time delays presented in Section 3.1, the total time delay is never estimated as negative. If the estimated total delay were negative (i.e., an out-of-range index), estimation would be impossible since this case amounts to forecasting states or obtaining measurements from the future, so the total delay has to be bounded below by zero. Moreover, in the sequential measurement update, if the estimated time delay $\widehat{\delta t}_d$ is larger than the IMU sampling period $\Delta t_{\mathrm{IMU}}$, then we point to another slot in the delay buffer. Algorithm 1 summarizes the overall process of the latency compensated VIO.
Algorithm 1 The Latency Compensated VIO
Require: $\hat{x}_0^+ (= \hat{x}_{V_0}^+)$, $P_0^+$, $Q$, $R$, $P_{\mathrm{dly}} (= P_0^+)$, $\Phi_{\mathrm{crs}} (= I)$, $\chi^2$
1: for $k = 1 : T$ do
2:   if new IMU packet arrival then
3:     Time Update:
4:       $\dot{\hat{x}}_V = f(\hat{x}_{V_{k-1}}^+, a_{\mathrm{raw}}, \omega_{\mathrm{raw}})$   ▹ static features
5:       Numerically integrate with $\Delta t_{\mathrm{IMU}} (= t_k - t_{k-1})$
6:       $\hat{x}_k^- = \hat{x}_{k-1}^+ + \int_{t_{k-1}}^{t_k} \dot{\hat{x}}(\tau)\, d\tau$
7:       $\Phi_{k-1} = \exp\left(\int_{t_{k-1}}^{t_k} A(\tau)\, d\tau\right)$   ▹ $A = \partial f/\partial x\,|_{\hat{x}_V}$
8:       $(P_{vv})_k^- = \Phi_{k-1} (P_{vv})_{k-1}^+ \Phi_{k-1}^T + B_{k-1} Q_v B_{k-1}^T$   ▹ $B = \partial f/\partial \eta\,|_{\hat{x}_V}$
9:       $P_k^- = \begin{bmatrix} (P_{vv})_k^- & \Phi_{k-1} (P_{vf})_{k-1}^+ \\ (P_{fv})_{k-1}^+ \Phi_{k-1}^T & (P_{ff})_{k-1}^+ + Q_f \end{bmatrix}$
10:      Store the state estimates into the delay buffer
11:      if during delay period then
12:        $\Phi_{\mathrm{crs}} \leftarrow \Phi_{k-1} \Phi_{\mathrm{crs}}$   ▹ recursive
13:      else
14:        $P_{\mathrm{dly}} \leftarrow P_k^-$
15:        $\Phi_{\mathrm{crs}} \leftarrow I$
16:      end if
17:   end if
18:   if new vision data packet arrival then
19:     Compute index $\hat{d}$ of delay
20:     Interpolate using the state estimates from the buffer
21:     for $j = 1$ : (number of observed features) do
22:       if new feature then
23:         Feature Initialization:
24:           $\hat{p}_{f_j/i} = g_j(\hat{x}_{k-\hat{d}}, y_j)$   ▹ least-squares minimization
25:           Augment state if feature is valid   ▹ if positive depth
26:           $\hat{x}_k^{\mathrm{aug}} \leftarrow [\hat{x}_k^T \;\; \hat{p}_{f_j/i}^T]^T$
27:           $P_{\mathrm{dly}}^{\mathrm{aug}} \leftarrow \begin{bmatrix} P_{\mathrm{dly}} & P_{\mathrm{dly}} J_j^T \\ J_j P_{\mathrm{dly}} & J_j P_{\mathrm{dly}} J_j^T + P_{f_j}^{\mathrm{new}} \end{bmatrix}$   ▹ $J_j = \partial p_{f_j/i}/\partial x\,|_{\hat{x}_{k-\hat{d}}}$
28:           $P_k^{\mathrm{aug}} \leftarrow \begin{bmatrix} P_k & 0 \\ 0 & \hat{d}\, Q_f \end{bmatrix}$
29:           Prune state vector if exceeding maximum
30:       else   ▹ tracked feature
31:         Measurement Update:
32:           Update if gating test is passed   ▹ $r_j^T S_j^{-1} r_j \overset{?}{<} \chi_j^2$
33:           $r_j = y_j - h_j(\hat{x}_{k-\hat{d}})$
34:           $S_j = (C_j)_{k-\hat{d}} P_{\mathrm{dly}} (C_j)_{k-\hat{d}}^T + R$   ▹ $(C_j)_{k-\hat{d}} = \partial h_j/\partial x\,|_{\hat{x}_{k-\hat{d}}}$
35:           $P_{\mathrm{crs}} \leftarrow \begin{bmatrix} \Phi_{\mathrm{crs}} P_{vv}^{\mathrm{dly}} & \Phi_{\mathrm{crs}} P_{vf}^{\mathrm{dly}} \\ P_{fv}^{\mathrm{dly}} & P_{ff}^{\mathrm{dly}} \end{bmatrix}$
36:           $K_j^{\mathrm{crs}} = P_{\mathrm{crs}} (C_j)_{k-\hat{d}}^T S_j^{-1}$
37:           $\Delta \hat{x}_k = +K_j^{\mathrm{crs}}\, r_j$   ▹ $\Delta\hat{t}_d \overset{?}{<} \Delta t_{\mathrm{IMU}}$
38:           $\Delta P_k = -K_j^{\mathrm{crs}} (C_j)_{k-\hat{d}} (P_{\mathrm{crs}})^T$
39:           Sequentially update the buffer
40:           $K_j^{\mathrm{dly}} = P_{\mathrm{dly}} (C_j)_{k-\hat{d}}^T S_j^{-1}$
41:           $\Delta \hat{x}_{k-\hat{d}} = +K_j^{\mathrm{dly}}\, r_j$
42:           $\Delta P_{\mathrm{dly}} = -K_j^{\mathrm{dly}} (C_j)_{k-\hat{d}} P_{\mathrm{dly}}$
43:       end if
44:     end for
45:     Store index $\hat{d}^+$ of the posterior estimated delay
46:     $P_{\mathrm{dly}} \leftarrow P_k^+$, $\Phi_{\mathrm{crs}} \leftarrow I$
47:     Erase used slots in the delay buffer
48:   end if
49: end for

5. Results

This section provides results from Monte Carlo trials and flight dataset testing. First, in a simulation environment, since we can set true time-delay values for the measurements, we test whether the proposed framework estimates the actual time delays accurately. The next subsection presents the performance of the proposed approach on benchmark flight datasets to validate whether it solves real-world problems.

5.1. Monte Carlo Simulations of a Simple Example

To show that actual time delays are estimated accurately, this section simulates a simple example problem with 100 Monte Carlo trials. The vehicle and measurement models of this simulation come directly from Lee and Johnson's previous work [43]. The models are a second-order dynamic system with a non-delayed speed measurement and two delayed bearing angles measured from the locations of two stations. From Equation (49), the variance $Q_d$ of this simulation is $0.25\ \mathrm{s}^2$. The actual time delay of the delayed measurements in this simulation is $0.9\ \mathrm{s}$, which is identical to 18 delayed samples since the propagation period of the simulation is $0.05\ \mathrm{s}$. The Monte Carlo simulations estimate the values of the time delays, shown in Figure 7.
Figure 7a shows that the estimated delay rapidly converges to the true delay value; that is, the estimation error of the delayed samples gradually decreases toward zero. Moreover, we may wonder whether the latency compensated filtering algorithm works when the delay is not static. Figure 7b illustrates the response to a change in the time delay: although the values of the unknown delays vary over time, the estimates resulting from the online calibration method converge to the true delay values.

5.2. Flight Datasets’ Test Results

To validate the reliability of the proposed approach for estimating states and unknown delay values, we test one of the benchmark datasets, the so-called "EuRoC MAV datasets" [63]. The visual-inertial sequences of the datasets were recorded onboard a micro aerial vehicle while a pilot manually flew around indoor Vicon environments; for more details, see reference [63]. Although the datasets include noise model parameters from the IMU at rest, we need to tune each variance of the process noise covariance $Q$ for the best performance. Likewise, to estimate the unknown part of the time delays, we set the standard deviation of the random walk $\eta_d$ in Equation (48) to $1.0 \times 10^{-5}$ since the order of this value matches the order of the smallest value among the provided noise parameters.
Given that the datasets provide sequences with various levels of challenge, such as faster motion and poor illumination, in each environment, to articulate the significance of the time delays defined in Section 3.1 we select two datasets: one of slow motion, called "EuRoC V1 Easy," and one of fast motion, called "EuRoC V1 Medium." Since the vehicle in the medium dataset maneuvers twice as fast, we hypothesize that the time delays have a greater impact on the navigation solution of the medium dataset. The image processing and filtering algorithms are developed under the robot operating system (ROS) [64], and the IMU data and images from the stereo camera are also subscribed to under the ROS, as shown in Figure 8.
The simplest solution to the estimation problem of the given datasets is to run the baseline of Section 3.2.1, which corrects only the Jacobians and residual. In contrast, the novel latency compensated filter described in Algorithm 1 compensates for delayed measurements at the time when the vision data are fused into the filter and estimates the refined states and delay values. This compensated filtering follows the processes of all three corrections, shown in Figure 6.
The EKF estimates the relative location from a starting point. Since we do not know the exact absolute location of the origin of the given datasets, to compare with the ground truth data provided with the datasets, an evaluation error metric such as the so-called "absolute trajectory error" [65] is required.
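As a minimal sketch of such a metric, assuming the estimated and ground-truth trajectories are already time-associated and rigidly aligned (a full absolute-trajectory-error evaluation also estimates that alignment, e.g., by the Umeyama method, omitted here):

```python
import numpy as np

def rms_position_error(est_xyz, gt_xyz):
    """RMS of per-sample 3D position errors for two aligned (N, 3) arrays."""
    err = est_xyz - gt_xyz
    return np.sqrt(np.mean(np.sum(err ** 2, axis=1)))
```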
Using the absolute trajectory error, Figure 9 illustrates the top-down view of the estimated flight trajectory for the medium dataset, and Figure 10 exhibits the estimated x, y, z positions and their estimation errors.
All estimation errors are bounded within their standard deviation ($\sigma$) bounds. We should expect significant time correlation in the error plots and a generally growing error covariance for vision-aided inertial navigation problems like this one. Conceptually, position error gets "locked in," and to the extent that new features are being mapped, the position error tends to grow with the length of the trajectory. Starting from the noise model parameters reported for the datasets, the compensated filter is a well-tuned estimator: runs with a x3 or x10 (or /3 or /10) multiplier on the $R$ term used in the filter all perform worse, as shown in Table 2. In other words, the fact that those multipliers yield larger root mean square (RMS) estimation errors indicates that our filter is well tuned.
Figure 11 shows the advantages of each correction in the latency compensated VIO by comparing it with the baseline and the covariance correction. The baseline discards the cross-covariance and the unknown part of the delays; although the latency compensated filter might increase the computational effort of the entire system, it significantly improves the accuracy of estimation.
Unlike either the baseline or the covariance correction, the latency compensated filter calibrates the unknown part of time delays. Figure 12 shows that estimation resulting from the compensated filter converges to a certain, final delay value, and its variance rapidly decreases although initial uncertainty is high.
As shown in Figure 13, the average of the estimated total delays is around 45 ms, which could generate about 4 cm of drift and offset during the delay period when the vehicle flies at a 0.91 m/s average speed. When readable delay values are negative, the timestamps of the images might indicate wrong pairs or packets.
Table 3 lists the RMS position errors of the cases used for sensitivity analysis. The approximately known part of the time delays introduced in Section 3.2 is either a fixed $\Delta\bar{t}_d$ obtained by tuning or a readout $\Delta\bar{t}_d^{\mathrm{raw}}$, which is the difference between the readable timestamps of the current IMU data and the image. In addition, we can directly estimate the entire time delay without information on the approximately known part. For another case, using the final value of the estimated unknown part of the time delays, we add a fixed $\delta\bar{t}_d$ to the total delays at every time step. However, this case might not work when the delay is varying, and we can know the final value only after running the proposed filter; in other words, before applying the compensation filtering, the fixed $\delta\bar{t}_d$ is still unknown. The estimation results from the latency compensated VIO approach depict the influence of the delays and the effectiveness of the corrections in the sensor fusion of lagged measurements.
Fast-motion datasets are more sensitive to time delays, as evidenced by the larger improvement when the compensation is applied to them.
Although numerous researchers have explored VIO on the EuRoC datasets, few of them have thoroughly considered measurements with unknown time delays. Table 4 reveals that the proposed estimator, the latency compensated VIO, outperforms the existing state-of-the-art stereo-capable methods, "S-MSCKF" and "SVO+MSF."

6. Discussion

This study develops a practical extended Kalman filter (EKF)-based visual-inertial odometry (VIO) accounting for vehicle-feature correlations; that is, we develop a tightly-coupled VIO for the autonomous flight of unmanned aerial vehicles (UAVs). In particular, this paper has presented the development of a reliable and accurate filtering scheme for measurements with unknown time delays. We define the time delays of vision data measurements in VIO. To compensate for delayed measurements and estimate unknown delay values, this paper presents latency compensated filtering, which includes state augmentation, interpolation, and residual, Jacobian, and covariance corrections. The optimality of the three corrections and the observability of the state augmentation are validated; in other words, the appendix shows that the proposed latency compensated filter results in optimal estimates as if there were no delays in the data.
We test the performance of VIO employing the latency compensated filtering algorithms on benchmark flight datasets for comparison to other state-of-the-art VIO algorithms. Results from the flight dataset testing show that the novel navigation approach in this paper improves the accuracy and reliability of state estimation with unknown time delays in VIO. With the latency compensated filtering, the root mean square (RMS) errors of estimation are decreased. In particular, we show improved accuracy of our method over previous approaches for state estimation on the fast-motion datasets.
The overall approach in this document can easily be employed in other filter-based, sensor-aided inertial navigation frameworks and is suitable for monocular VIO, although this study uses a stereo camera to showcase the methods. Although the reliability and robustness of this study are validated by testing benchmark flight datasets, validation with other datasets is of interest.

Author Contributions

Conceptualization, K.L. and E.N.J.; Data curation, Formal analysis, Investigation, Methodology, Resources, and Software, K.L.; Supervision, E.N.J.; Validation, Visualization, and Writing—original draft, K.L.; Writing—review and editing, K.L. and E.N.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
VIO   Visual-Inertial Odometry
IMU   Inertial Measurement Unit
EKF   Extended Kalman Filter
UAV   Unmanned Aerial Vehicle
ROS   Robot Operating System

Nomenclature

x   state
y   measurement
t   continuous time
k   discrete time
j   index of measurements
P   error state covariance
Q   process noise covariance
R   measurement noise covariance
r   residual

Appendix A. Jacobians of Models

$$
A = \begin{bmatrix}
0 & \dfrac{\partial\,{}^{i}\dot{\hat{p}}_{b/i}}{\partial\,{}^{i}\hat{v}_{b/i}} & 0 & 0 & 0 \\
0 & 0 & \dfrac{\partial\,{}^{i}\dot{\hat{v}}_{b/i}}{\partial\,\delta\hat{\theta}} & \dfrac{\partial\,{}^{i}\dot{\hat{v}}_{b/i}}{\partial\,\hat{b}_a} & 0 \\
0 & 0 & \dfrac{\partial\,\delta\dot{\hat{\theta}}}{\partial\,\delta\hat{\theta}} & 0 & \dfrac{\partial\,\delta\dot{\hat{\theta}}}{\partial\,\hat{b}_\omega} \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix},
\qquad
B = \begin{bmatrix}
0 & 0 & 0 & 0 \\
\dfrac{\partial\,{}^{i}\dot{\hat{v}}_{b/i}}{\partial\,\eta_a} & 0 & 0 & 0 \\
0 & \dfrac{\partial\,\delta\dot{\hat{\theta}}}{\partial\,\eta_\omega} & 0 & 0 \\
0 & 0 & \dfrac{\partial\,\dot{b}_a}{\partial\,\eta_{b_a}} & 0 \\
0 & 0 & 0 & \dfrac{\partial\,\dot{b}_\omega}{\partial\,\eta_{b_\omega}}
\end{bmatrix},
$$
with the nonzero blocks
$$
\frac{\partial\,{}^{i}\dot{\hat{p}}_{b/i}}{\partial\,{}^{i}\hat{v}_{b/i}} = I_{3\times3},\qquad
\frac{\partial\,{}^{i}\dot{\hat{v}}_{b/i}}{\partial\,\delta\hat{\theta}} = -\hat{T}_{i/b}\,\lfloor (a_{\mathrm{raw}} - \hat{b}_a)\times \rfloor,\qquad
\frac{\partial\,{}^{i}\dot{\hat{v}}_{b/i}}{\partial\,\hat{b}_a} = -\hat{T}_{i/b},
$$
$$
\frac{\partial\,\delta\dot{\hat{\theta}}}{\partial\,\delta\hat{\theta}} = -\lfloor (\omega_{\mathrm{raw}} - \hat{b}_\omega)\times \rfloor,\qquad
\frac{\partial\,\delta\dot{\hat{\theta}}}{\partial\,\hat{b}_\omega} = -I_{3\times3},
$$
$$
\frac{\partial\,{}^{i}\dot{\hat{v}}_{b/i}}{\partial\,\eta_a} = -\hat{T}_{i/b},\qquad
\frac{\partial\,\delta\dot{\hat{\theta}}}{\partial\,\eta_\omega} = -I_{3\times3},\qquad
\frac{\partial\,\dot{b}_a}{\partial\,\eta_{b_a}} = I_{3\times3},\qquad
\frac{\partial\,\dot{b}_\omega}{\partial\,\eta_{b_\omega}} = I_{3\times3}.
$$
The measurement Jacobian for the $j$-th feature is
$$
C_j = \begin{bmatrix}
\dfrac{\partial y_j}{\partial\,{}^{i}\hat{p}_{b/i}} & 0 & \dfrac{\partial y_j}{\partial\,\delta\hat{\theta}} & 0 & 0 & \Big|\; 0 & \dfrac{\partial y_j}{\partial\,{}^{i}\hat{p}_{f_j/i}} & 0
\end{bmatrix},
$$
$$
\frac{\partial y_j}{\partial\,{}^{i}\hat{p}_{b/i}} = \frac{\partial y_j}{\partial\,{}^{c}\hat{p}_{f_j/c}}\left(-\hat{T}_{c/i}\right),\qquad
\frac{\partial y_j}{\partial\,{}^{i}\hat{p}_{f_j/i}} = -\frac{\partial y_j}{\partial\,{}^{i}\hat{p}_{b/i}},\qquad
\frac{\partial y_j}{\partial\,\delta\hat{\theta}} = \frac{\partial y_j}{\partial\,{}^{c}\hat{p}_{f_j/c}}\,T_{c/b}\,\lfloor {}^{b}\hat{p}_{f_j/b} \times \rfloor,
$$
$$
\frac{\partial y_j}{\partial\,{}^{c}\hat{p}_{f_j/c}} = \frac{1}{{}^{c}\hat{Z}_j}
\begin{bmatrix}
f_u & 0 & -f_u\,\dfrac{{}^{c}\hat{X}_j}{{}^{c}\hat{Z}_j} \\[1ex]
0 & f_v & -f_v\,\dfrac{{}^{c}\hat{Y}_j}{{}^{c}\hat{Z}_j}
\end{bmatrix},
$$
For more detailed derivations, see reference [66].
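For concreteness, the sparse block structure of these Jacobians can be assembled directly in code. The following numpy sketch follows the sign conventions above; the function names and the 15-state ordering [position, velocity, attitude error, accelerometer bias, gyroscope bias] are illustrative assumptions, not taken from the authors' implementation.

```python
import numpy as np

def skew(v):
    """Cross-product (skew-symmetric) matrix: skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def error_state_jacobians(T_i_b, a_raw, b_a_hat, w_raw, b_w_hat):
    """Continuous-time error-state Jacobians A (15x15) and B (15x12) for the
    state [position, velocity, attitude error, accel bias, gyro bias]."""
    I3 = np.eye(3)
    A = np.zeros((15, 15))
    A[0:3, 3:6] = I3                                # d(p_dot)/d(v)
    A[3:6, 6:9] = -T_i_b @ skew(a_raw - b_a_hat)    # d(v_dot)/d(dtheta)
    A[3:6, 9:12] = -T_i_b                           # d(v_dot)/d(b_a)
    A[6:9, 6:9] = -skew(w_raw - b_w_hat)            # d(dtheta_dot)/d(dtheta)
    A[6:9, 12:15] = -I3                             # d(dtheta_dot)/d(b_w)
    B = np.zeros((15, 12))
    B[3:6, 0:3] = -T_i_b                            # accelerometer noise eta_a
    B[6:9, 3:6] = -I3                               # gyroscope noise eta_w
    B[9:12, 6:9] = I3                               # accel bias random walk
    B[12:15, 9:12] = I3                             # gyro bias random walk
    return A, B

def projection_jacobian(p_c, fu, fv):
    """Jacobian of the pinhole measurement w.r.t. the feature position
    p_c = [X, Y, Z] expressed in the camera frame."""
    X, Y, Z = p_c
    return (1.0 / Z) * np.array([[fu, 0.0, -fu * X / Z],
                                 [0.0, fv, -fv * Y / Z]])
```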

Appendix B. Feature Initialization

We assume that the intrinsic and extrinsic parameters of the stereo camera are known and constant. The $c_1$ and $c_2$ frames are the left and right camera frames of the stereo pair, respectively. Since the baseline of the stereo pair is fixed, the rotation $T_{c_2/c_1}$ and translation ${}^{c_2}p_{c_1/c_2}$ between the two cameras are constant and known. The feature coordinates ${}^{c}[X, Y, Z]^T$ with respect to both cameras satisfy
$$
{}^{c_2}p_{f_j/c_2} = T_{c_2/c_1}\,{}^{c_1}p_{f_j/c_1} + {}^{c_2}p_{c_1/c_2},
$$
$$
\begin{bmatrix} {}^{c_2}X_j & {}^{c_2}Y_j & {}^{c_2}Z_j \end{bmatrix}^T
= T_{c_2/c_1}\begin{bmatrix} {}^{c_1}X_j & {}^{c_1}Y_j & {}^{c_1}Z_j \end{bmatrix}^T + {}^{c_2}p_{c_1/c_2}
$$
$$
= {}^{c_1}Z_j \left( T_{c_2/c_1} \begin{bmatrix} {}^{c_1}X_j / {}^{c_1}Z_j \\ {}^{c_1}Y_j / {}^{c_1}Z_j \\ 1 \end{bmatrix} + \frac{1}{{}^{c_1}Z_j}\,{}^{c_2}p_{c_1/c_2} \right)
$$
$$
= {}^{c_1}\hat{Z}_j \left( T_{c_2/c_1} \begin{bmatrix} \hat{u}_{j,1}/f_{u1} \\ \hat{v}_{j,1}/f_{v1} \\ 1 \end{bmatrix} + \frac{1}{{}^{c_1}\hat{Z}_j}\,{}^{c_2}p_{c_1/c_2} \right). \tag{A8}
$$
Simplifying Equation (A8),
$$
\begin{bmatrix} {}^{c_2}X_j \\ {}^{c_2}Y_j \\ {}^{c_2}Z_j \end{bmatrix}
= {}^{c_1}\hat{Z}_j \left( \begin{bmatrix} m_x \\ m_y \\ m_z \end{bmatrix} + \frac{1}{{}^{c_1}\hat{Z}_j} \begin{bmatrix} t_{rx} \\ t_{ry} \\ t_{rz} \end{bmatrix} \right),
$$
where $m_x$, $m_y$, and $m_z$ are scalar functions of the given $j$-th measurement and the constant extrinsic rotation matrix.
Based on Equation (15), the measurement equations of the stereo camera are
$$
y_j = \begin{bmatrix} u_{j,1} \\ v_{j,1} \\ u_{j,2} \\ v_{j,2} \end{bmatrix}
= \begin{bmatrix}
f_{u1}\, {}^{c_1}X_j / {}^{c_1}Z_j \\
f_{v1}\, {}^{c_1}Y_j / {}^{c_1}Z_j \\
f_{u2}\, {}^{c_2}X_j / {}^{c_2}Z_j \\
f_{v2}\, {}^{c_2}Y_j / {}^{c_2}Z_j
\end{bmatrix} + \zeta_j,
$$
so the right-camera ($c_2$) measurements can be expressed in $A x = b$ form:
$$
\begin{bmatrix} \hat{u}_{j,2}/f_{u2} \\ \hat{v}_{j,2}/f_{v2} \end{bmatrix}
= \begin{bmatrix}
\dfrac{m_x + t_{rx}/{}^{c_1}\hat{Z}_j}{m_z + t_{rz}/{}^{c_1}\hat{Z}_j} \\[2ex]
\dfrac{m_y + t_{ry}/{}^{c_1}\hat{Z}_j}{m_z + t_{rz}/{}^{c_1}\hat{Z}_j}
\end{bmatrix}
\;\Longrightarrow\;
\begin{bmatrix}
m_x - (\hat{u}_{j,2}/f_{u2})\, m_z \\
m_y - (\hat{v}_{j,2}/f_{v2})\, m_z
\end{bmatrix} {}^{c_1}\hat{Z}
=
\begin{bmatrix}
(\hat{u}_{j,2}/f_{u2})\, t_{rz} - t_{rx} \\
(\hat{v}_{j,2}/f_{v2})\, t_{rz} - t_{ry}
\end{bmatrix},
$$
where
$$
x = {}^{c_1}\hat{Z},\qquad
A = \begin{bmatrix}
m_x - (\hat{u}_{j,2}/f_{u2})\, m_z \\
m_y - (\hat{v}_{j,2}/f_{v2})\, m_z
\end{bmatrix},\qquad
b = \begin{bmatrix}
(\hat{u}_{j,2}/f_{u2})\, t_{rz} - t_{rx} \\
(\hat{v}_{j,2}/f_{v2})\, t_{rz} - t_{ry}
\end{bmatrix}.
$$
Hence, Gauss–Newton least-squares minimization estimates the depth ${}^{c_1}Z$ of the left ($c_1$) camera using the pseudo-inverse of $A$:
$$
A x = b \;\Rightarrow\; (A^T A)\, x = A^T b \;\Rightarrow\; \hat{x} = (A^T A)^{-1} A^T b.
$$
If either estimated depth ${}^{c_1}\hat{Z}$ or ${}^{c_2}\hat{Z}$ is negative, the solution of the minimization is invalid since the feature must lie in front of both camera frames observing it. By substituting the estimated ${}^{c_1}\hat{Z}$ into Equation (A8),
$$
{}^{c_1}\hat{p}_{f_j/c_1}
= \begin{bmatrix}
(\hat{u}_{j,1}/f_{u1})\, {}^{c_1}\hat{Z} &
(\hat{v}_{j,1}/f_{v1})\, {}^{c_1}\hat{Z} &
{}^{c_1}\hat{Z}
\end{bmatrix}^T,
$$
where ${}^{c}\hat{p}_{f_j/c}$ does not depend on the pose of the vehicle.
Likewise, if a monocular camera is used instead, $c_1$ is the camera frame in which the feature was observed for the first time, and $c_2$ is the camera frame at a different time instant.
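A compact implementation of this stereo initialization is only a handful of lines. The numpy sketch below solves the scalar least-squares problem above and applies the positive-depth check; the function name and interface are hypothetical, and the intrinsics and extrinsics are assumed known, as stated at the start of this appendix.

```python
import numpy as np

def initialize_feature_depth(u1, v1, u2, v2, fu1, fv1, fu2, fv2, T_c2_c1, t_c2):
    """Estimate the left-camera depth of one feature from a stereo pair and
    return its position in the c1 frame, or None if the depth is invalid."""
    ray1 = np.array([u1 / fu1, v1 / fv1, 1.0])   # normalized left-camera ray
    m = T_c2_c1 @ ray1                           # [m_x, m_y, m_z]
    trx, try_, trz = t_c2                        # translation c2_p_{c1/c2}
    # A x = b with x = c1_Z (a scalar), solved via the pseudo-inverse.
    A = np.array([[m[0] - (u2 / fu2) * m[2]],
                  [m[1] - (v2 / fv2) * m[2]]])
    b = np.array([(u2 / fu2) * trz - trx,
                  (v2 / fv2) * trz - try_])
    z1 = float(np.linalg.lstsq(A, b, rcond=None)[0][0])  # (A^T A)^-1 A^T b
    p_c1 = ray1 * z1                             # feature in the left frame
    z2 = float((T_c2_c1 @ p_c1 + t_c2)[2])       # depth seen by the right camera
    if z1 <= 0.0 or z2 <= 0.0:
        return None   # the feature must lie in front of both cameras
    return p_c1
```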

Appendix C. Stochastic Cloning (or the Schmidt–Kalman filter)

For shorthand in this Appendix, we denote by x and P the state and covariance without the state augmentation from the feature initialization.
First, we prove Equations (43)–(46). During the delay period, the cross-covariance term $P^{\mathrm{crs}}$ is propagated from time $(k-\hat{d})$ to time $(k-s)$:
$$
\begin{bmatrix}
P_{k-\hat{d}} & P^{\mathrm{crs}\,T}_{(k-s)|(k-\hat{d})} \\[1ex]
P^{\mathrm{crs}}_{(k-s)|(k-\hat{d})} & P_{k-s}
\end{bmatrix}
=
\begin{bmatrix} I & 0 \\ 0 & \Phi_{k-\hat{d}} \end{bmatrix}
\begin{bmatrix} P_{k-\hat{d}} & P_{k-\hat{d}} \\ P_{k-\hat{d}} & P_{k-\hat{d}} \end{bmatrix}
\begin{bmatrix} I & 0 \\ 0 & \Phi_{k-\hat{d}} \end{bmatrix}^{T}
+
\begin{bmatrix} 0 & 0 \\ 0 & Q\,\Delta t_{\mathrm{IMU}} \end{bmatrix}
=
\begin{bmatrix}
P_{k-\hat{d}} & P_{k-\hat{d}}\,\Phi_{k-\hat{d}}^{T} \\[1ex]
\Phi_{k-\hat{d}}\,P_{k-\hat{d}} & \Phi_{k-\hat{d}}\,P_{k-\hat{d}}\,\Phi_{k-\hat{d}}^{T} + Q\,\Delta t_{\mathrm{IMU}}
\end{bmatrix},
$$
where $P_{k-\hat{d}} \triangleq P^{\mathrm{dly}}$ and $\Phi$ denotes the state transition matrix.
After $s$ time steps, at time $k$, the final cross-covariance term computed during the delay period is
$$
\begin{bmatrix}
P_{k-\hat{d}} & P^{\mathrm{crs}\,T}_{k|(k-\hat{d})} \\[1ex]
P^{\mathrm{crs}}_{k|(k-\hat{d})} & P_{k}
\end{bmatrix}
=
\begin{bmatrix}
P_{k-\hat{d}} & P_{k-\hat{d}}\displaystyle\prod_{i=k-\hat{d}}^{k-1}\Phi_i^{T} \\[1ex]
\left(\displaystyle\prod_{i=k-1}^{k-\hat{d}}\Phi_i\right) P_{k-\hat{d}} & P_{k}
\end{bmatrix}.
$$
Next, we prove Equations (37)–(42). The modified Kalman gain is computed as follows:
$$
\begin{bmatrix} K_s \\ K^{\mathrm{crs}} \end{bmatrix}
=
\begin{bmatrix}
P_{k-\hat{d}} & P^{\mathrm{crs}\,T}_{k|(k-\hat{d})} \\[1ex]
P^{\mathrm{crs}}_{k|(k-\hat{d})} & P_{k}
\end{bmatrix}
\begin{bmatrix} C_{k-\hat{d}} & 0 \end{bmatrix}^{T}
\big( C_{k-\hat{d}}\, P_{k-\hat{d}}\, C_{k-\hat{d}}^{T} + R \big)^{-1},
\qquad
K^{\mathrm{crs}} = P^{\mathrm{crs}}_{k|(k-\hat{d})}\, C_{k-\hat{d}}^{T} \big( C_{k-\hat{d}}\, P_{k-\hat{d}}\, C_{k-\hat{d}}^{T} + R \big)^{-1},
$$
where $K_s$ denotes the stationary Kalman gain and
$$
C_{k-\hat{d}}\, P_{k-\hat{d}}\, C_{k-\hat{d}}^{T}
=
\begin{bmatrix} C_{k-\hat{d}} & 0 \end{bmatrix}
\begin{bmatrix}
P_{k-\hat{d}} & P^{\mathrm{crs}\,T}_{k|(k-\hat{d})} \\[1ex]
P^{\mathrm{crs}}_{k|(k-\hat{d})} & P_{k}
\end{bmatrix}
\begin{bmatrix} C_{k-\hat{d}} & 0 \end{bmatrix}^{T}.
$$
The update of the covariance matrix using the cross-covariance term is
$$
\begin{bmatrix}
P_{k-\hat{d}} & P^{\mathrm{crs}\,T}_{k|(k-\hat{d})} \\[1ex]
P^{\mathrm{crs}}_{k|(k-\hat{d})} & P_{k}
\end{bmatrix}^{+}
=
\begin{bmatrix}
P_{k-\hat{d}} & P^{\mathrm{crs}\,T}_{k|(k-\hat{d})} \\[1ex]
P^{\mathrm{crs}}_{k|(k-\hat{d})} & P_{k}
\end{bmatrix}
-
\begin{bmatrix} K_s \\ K^{\mathrm{crs}} \end{bmatrix}
\begin{bmatrix} C_{k-\hat{d}} & 0 \end{bmatrix}
\begin{bmatrix}
P_{k-\hat{d}} & P^{\mathrm{crs}\,T}_{k|(k-\hat{d})} \\[1ex]
P^{\mathrm{crs}}_{k|(k-\hat{d})} & P_{k}
\end{bmatrix},
$$
whose lower-right block yields
$$
P_k^{+} = P_k - K^{\mathrm{crs}}\, C_{k-\hat{d}}\, P^{\mathrm{crs}\,T}_{k|(k-\hat{d})}.
$$
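The corresponding delayed measurement update is equally compact: the innovation covariance is evaluated with the cloned covariance, while the gain maps the old residual onto the current state through the cross-covariance. Again, this is a numpy sketch with illustrative names, not the authors' implementation:

```python
import numpy as np

def delayed_measurement_update(x, P, P_dly, P_crs, C_d, R, r_d):
    """Apply a residual r_d, linearized at the capture time (Jacobian C_d),
    to the current state x using the modified (cross-covariance) gain."""
    S = C_d @ P_dly @ C_d.T + R               # innovation covariance at k - d_hat
    K_crs = P_crs @ C_d.T @ np.linalg.inv(S)  # modified Kalman gain
    x_new = x + K_crs @ r_d                   # correct the *current* state
    P_new = P - K_crs @ C_d @ P_crs.T         # covariance correction
    return x_new, P_new
```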
Finally, we prove the optimality of the latency compensated filtering in this paper. Since the standard EKF is an optimal estimator, if we prove that the latency compensated filter is identical to the standard EKF, then the latency compensated filtering approach is also optimal.
Let us recall Equations (37)–(39):
$$
K^{\mathrm{crs}} = P^{\mathrm{crs}}_{k|(k-\hat{d})}\, C_{k-\hat{d}}^{T} \big( C_{k-\hat{d}}\, P_{k-\hat{d}}\, C_{k-\hat{d}}^{T} + R \big)^{-1}
= \Phi_{k-1}\,\Phi_{k-2}\cdots\Phi_{k-\hat{d}}\, P_{k-\hat{d}}\, C_{k-\hat{d}}^{T} \big( C_{k-\hat{d}}\, P_{k-\hat{d}}\, C_{k-\hat{d}}^{T} + R \big)^{-1},
$$
$$
\hat{x}_k^{+} = \hat{x}_k + K^{\mathrm{crs}}\, r_{k-\hat{d}}, \tag{A18}
$$
$$
P_k^{+} = P_k - K^{\mathrm{crs}}\, C_{k-\hat{d}}\, P^{\mathrm{crs}\,T}_{k|(k-\hat{d})}
= P_k - K^{\mathrm{crs}}\, C_{k-\hat{d}}\, P_{k-\hat{d}}\, \Phi_{k-\hat{d}}^{T}\,\Phi_{k-s}^{T}\cdots\Phi_{k-1}^{T}. \tag{A19}
$$
Next, we assume that the delayed measurement y is available immediately, without delay. In other words, in this assumed case, the measurement update is performed first and the propagation steps are processed afterwards. At time $(k-\hat{d})$, given $\hat{x}_{k-\hat{d}}$ and $P_{k-\hat{d}}$, the standard measurement update is performed as follows:
$$
K_{k-\hat{d}} = P_{k-\hat{d}}\, C_{k-\hat{d}}^{T} \big( C_{k-\hat{d}}\, P_{k-\hat{d}}\, C_{k-\hat{d}}^{T} + R \big)^{-1},\qquad
\hat{x}_{k-\hat{d}}^{+} = \hat{x}_{k-\hat{d}} + K_{k-\hat{d}}\, r_{k-\hat{d}},\qquad
P_{k-\hat{d}}^{+} = P_{k-\hat{d}} - K_{k-\hat{d}}\, C_{k-\hat{d}}\, P_{k-\hat{d}},
$$
where $\hat{x}_{k-\hat{d}}^{+} \neq \hat{x}_{k-\hat{d}}$ and $P_{k-\hat{d}}^{+} \neq P_{k-\hat{d}}$ since $\hat{x}_{k-\hat{d}}^{+}$ and $P_{k-\hat{d}}^{+}$ are the values after the corrections by the measurement update, as shown in Figure A1. That is, the red lines illustrate the original processes of the latency compensated filtering, and the blue lines present the processes of the assumed case.
Figure A1. Optimality of the latency compensated VIO.
From time $(k-s)$ to time $k$ during the delay period, the assumed case propagates the state estimate and covariance recursively. At time $(k-s)$, i.e., one step after the capture time,
$$
\hat{x}_{k-s} = \Phi_{k-\hat{d}}\,\hat{x}_{k-\hat{d}}^{+} = \Phi_{k-\hat{d}}\,\hat{x}_{k-\hat{d}} + \Phi_{k-\hat{d}}\,K_{k-\hat{d}}\,r_{k-\hat{d}},
$$
$$
P_{k-s} = \Phi_{k-\hat{d}}\,P_{k-\hat{d}}^{+}\,\Phi_{k-\hat{d}}^{T} + Q\,\Delta t_{\mathrm{IMU}}
= \Phi_{k-\hat{d}}\,P_{k-\hat{d}}\,\Phi_{k-\hat{d}}^{T} + Q\,\Delta t_{\mathrm{IMU}} - \Phi_{k-\hat{d}}\,K_{k-\hat{d}}\,C_{k-\hat{d}}\,P_{k-\hat{d}}\,\Phi_{k-\hat{d}}^{T}.
$$
Likewise, at time $k$, only the time update is processed since the measurement has already been used for the update:
$$
\hat{x}_k^{+} = \Phi_{k-1}\,\hat{x}_{k-1}^{+} = \Phi_{k-1}\,\Phi_{k-2}\cdots\Phi_{k-\hat{d}}\,\hat{x}_{k-\hat{d}} + \Phi_{k-1}\,\Phi_{k-2}\cdots\Phi_{k-\hat{d}}\,K_{k-\hat{d}}\,r_{k-\hat{d}} = \hat{x}_k + K^{\mathrm{crs}}\,r_{k-\hat{d}}, \tag{A20}
$$
$$
P_k^{+} = \Phi_{k-1}\cdots\Phi_{k-\hat{d}}\,P_{k-\hat{d}}\,\Phi_{k-\hat{d}}^{T}\cdots\Phi_{k-1}^{T} + Q\,\Delta t_{\mathrm{IMU}} + \Phi_{k-1}\,Q\,\Delta t_{\mathrm{IMU}}\,\Phi_{k-1}^{T} + \cdots - \Phi_{k-1}\cdots\Phi_{k-\hat{d}}\,K_{k-\hat{d}}\,C_{k-\hat{d}}\,P_{k-\hat{d}}\,\Phi_{k-\hat{d}}^{T}\cdots\Phi_{k-1}^{T}
= P_k - K^{\mathrm{crs}}\,C_{k-\hat{d}}\,P_{k-\hat{d}}\,\Phi_{k-\hat{d}}^{T}\cdots\Phi_{k-1}^{T}, \tag{A21}
$$
where, in the last equalities, $\hat{x}_k$ and $P_k$ denote the propagated (prior) estimate and covariance of the latency compensated filter before the delayed update.
Since Equations (A18) and (A19) are identical to Equations (A20) and (A21), respectively, the hypothesis is proved. In other words, the latency compensated filter for VIO acts as if the delayed vision data were available at the exact time when the image was captured.
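This equivalence is easy to check numerically. The toy script below (illustrative only; a randomly generated linear system stands in for the VIO models) applies a measurement d steps late through the cross-covariance gain and asserts that the result matches a standard Kalman update at the capture time followed by propagation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 4, 2, 5
Phi = np.eye(n) + 0.01 * rng.standard_normal((n, n))   # transition matrix
Q = 0.01 * np.eye(n)                                   # process noise
C = rng.standard_normal((m, n))                        # measurement Jacobian
R = 0.1 * np.eye(m)                                    # measurement noise
P0 = np.eye(n)                                         # covariance at capture time
x0 = rng.standard_normal(n)                            # state at capture time
r = rng.standard_normal(m)                             # residual at capture time

# (a) Latency compensated path: propagate first, then the delayed update.
x, P, P_crs = x0.copy(), P0.copy(), P0.copy()
for _ in range(d):
    x, P, P_crs = Phi @ x, Phi @ P @ Phi.T + Q, Phi @ P_crs
K_crs = P_crs @ C.T @ np.linalg.inv(C @ P0 @ C.T + R)
xa, Pa = x + K_crs @ r, P - K_crs @ C @ P_crs.T

# (b) Assumed case: standard update at the capture time, then propagate.
K = P0 @ C.T @ np.linalg.inv(C @ P0 @ C.T + R)
xb, Pb = x0 + K @ r, P0 - K @ C @ P0
for _ in range(d):
    xb, Pb = Phi @ xb, Phi @ Pb @ Phi.T + Q

assert np.allclose(xa, xb) and np.allclose(Pa, Pb)  # identical estimates
```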

References

  1. Kalman, R.E. A New Approach to Linear Filtering and Prediction Problems. J. Basic Eng. 1960, 82, 35–45.
  2. Kalman, R.E.; Bucy, R.S. New Results in Linear Filtering and Prediction Theory. J. Basic Eng. 1961, 83, 95–108.
  3. Anderson, B.D.; Moore, J.B. Optimal Filtering; Prentice Hall: Englewood Cliffs, NJ, USA, 1979; pp. 193–222.
  4. Bar-Shalom, Y.; Li, X.R.; Kirubarajan, T. Estimation with Applications to Tracking and Navigation: Theory Algorithms and Software; John Wiley & Sons: Hoboken, NJ, USA, 2004.
  5. Julier, S.J.; Uhlmann, J.K. A New Extension of the Kalman Filter to Nonlinear Systems. In Proceedings of the International Symposium on Aerospace/Defense Sensing, Simulation and Controls, Orlando, FL, USA, 20–25 April 1997; Volume 3, pp. 182–193.
  6. Wu, A.D.; Johnson, E.N.; Kaess, M.; Dellaert, F.; Chowdhary, G. Autonomous Flight in GPS-Denied Environments Using Monocular Vision and Inertial Sensors. J. Aerosp. Inform. Syst. 2013, 10, 172–186.
  7. Chowdhary, G.; Johnson, E.N.; Magree, D.; Wu, A.; Shein, A. GPS-Denied Indoor and Outdoor Monocular Vision Aided Navigation and Control of Unmanned Aircraft. J. Field Robot. 2013, 30, 415–438.
  8. Song, Y.; Nuske, S.; Scherer, S. A Multi-Sensor Fusion MAV State Estimation from Long-Range Stereo, IMU, GPS and Barometric Sensors. Sensors 2017, 17, 11.
  9. Kong, W.; Hu, T.; Zhang, D.; Shen, L.; Zhang, J. Localization Framework for Real-Time UAV Autonomous Landing: An On-Ground Deployed Visual Approach. Sensors 2017, 17, 1437.
  10. Delmerico, J.; Scaramuzza, D. A Benchmark Comparison of Monocular Visual-Inertial Odometry Algorithms for Flying Robots. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018.
  11. Triggs, B.; McLauchlan, P.F.; Hartley, R.I.; Fitzgibbon, A.W. Bundle Adjustment—A Modern Synthesis. In Proceedings of the International Workshop on Vision Algorithms, Corfu, Greece, 21–22 September 1999; pp. 298–372.
  12. Dellaert, F.; Kaess, M. Square Root SAM: Simultaneous Localization and Mapping via Square Root Information Smoothing. Int. J. Rob. Res. 2006, 25, 1181–1203.
  13. Huang, G. Visual-Inertial Navigation: A Concise Review. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019.
  14. Cadena, C.; Carlone, L.; Carrillo, H.; Latif, Y.; Scaramuzza, D.; Neira, J.; Reid, I.; Leonard, J.J. Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age. IEEE Trans. Robot. 2016, 32, 1309–1332.
  15. Bloesch, M.; Omari, S.; Hutter, M.; Siegwart, R. Robust Visual Inertial Odometry Using a Direct EKF-Based Approach. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; pp. 298–304.
  16. Qin, T.; Li, P.; Shen, S. VINS-MONO: A Robust and Versatile Monocular Visual-Inertial State Estimator. IEEE Trans. Robot. 2018, 34, 1004–1020.
  17. Faessler, M.; Fontana, F.; Forster, C.; Mueggler, E.; Pizzoli, M.; Scaramuzza, D. Autonomous, Vision-Based Flight and Live Dense 3D Mapping with a Quadrotor Micro Aerial Vehicle. J. Field Robot. 2016, 33, 431–450.
  18. Paul, M.K.; Roumeliotis, S.I. Alternating-Stereo VINS: Observability Analysis and Performance Evaluation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4729–4737.
  19. Sun, K.; Mohta, K.; Pfrommer, B.; Watterson, M.; Liu, S.; Mulgaonkar, Y.; Taylor, C.J.; Kumar, V. Robust Stereo Visual Inertial Odometry for Fast Autonomous Flight. IEEE Robot. Autom. Lett. 2018, 3, 965–972.
  20. Leutenegger, S.; Lynen, S.; Bosse, M.; Siegwart, R.; Furgale, P. Keyframe-Based Visual–Inertial Odometry Using Nonlinear Optimization. Int. J. Rob. Res. 2015, 34, 314–334.
  21. Lin, Y.; Gao, F.; Qin, T.; Gao, W.; Liu, T.; Wu, W.; Yang, Z.; Shen, S. Autonomous Aerial Navigation Using Monocular Visual-Inertial Fusion. J. Field Robot. 2018, 35, 23–51.
  22. Mourikis, A.I.; Roumeliotis, S.I. A Multi-State Constraint Kalman Filter for Vision-Aided Inertial Navigation. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Roma, Italy, 10–14 April 2007; pp. 3565–3572.
  23. Li, M.; Mourikis, A.I. High-Precision, Consistent EKF-Based Visual-Inertial Odometry. Int. J. Rob. Res. 2013, 32, 690–711.
  24. Forster, C.; Pizzoli, M.; Scaramuzza, D. SVO: Fast Semi-Direct Monocular Visual Odometry. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 15–22.
  25. Forster, C.; Zhang, Z.; Gassner, M.; Werlberger, M.; Scaramuzza, D. SVO: Semidirect Visual Odometry for Monocular and Multicamera Systems. IEEE Trans. Robot. 2017, 33, 249–265.
  26. Lynen, S.; Achtelik, M.W.; Weiss, S.; Chli, M.; Siegwart, R. A Robust and Modular Multi-Sensor Fusion Approach Applied to MAV Navigation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Tokyo, Japan, 3–7 November 2013; pp. 3923–3929.
  27. Paul, M.K.; Wu, K.; Hesch, J.A.; Nerurkar, E.D.; Roumeliotis, S.I. A Comparative Analysis of Tightly-Coupled Monocular, Binocular, and Stereo VINS. In Proceedings of the International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 165–172.
  28. Alexander, H.L. State Estimation for Distributed Systems with Sensing Delay. In Proceedings of Data Structures and Target Classification, Orlando, FL, USA, 1–2 April 1991; Volume 1470, pp. 103–111.
  29. Larsen, T.D.; Andersen, N.A.; Ravn, O.; Poulsen, N.K. Incorporation of Time Delayed Measurements in a Discrete-Time Kalman Filter. In Proceedings of the 37th IEEE Conference on Decision and Control (CDC), Tampa, FL, USA, 16–18 December 1998; Volume 4, pp. 3972–3977.
  30. Van Der Merwe, R.; Wan, E.A.; Julier, S. Sigma-Point Kalman Filters for Nonlinear Estimation and Sensor-Fusion: Applications to Integrated Navigation. In Proceedings of the AIAA Guidance, Navigation, and Control Conference, Providence, RI, USA, 16–19 August 2004; pp. 16–19.
  31. Van Der Merwe, R. Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models. Ph.D. Thesis, Oregon Health & Science University, Portland, OR, USA, 2004.
  32. Schmidt, S.F. Applications of State Space Methods to Navigation Problems. Adv. Control Syst. 1966, 3, 293–340.
  33. Roumeliotis, S.I.; Burdick, J.W. Stochastic Cloning: A Generalized Framework for Processing Relative State Measurements. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Washington, DC, USA, 11–15 May 2002; Volume 2, pp. 1788–1795.
  34. Gopalakrishnan, A.; Kaisare, N.S.; Narasimhan, S. Incorporating Delayed and Infrequent Measurements in Extended Kalman Filter Based Nonlinear State Estimation. J. Process Contr. 2011, 21, 119–129.
  35. Julier, S.J.; Uhlmann, J.K. Fusion of Time Delayed Measurements with Uncertain Time Delays. In Proceedings of the American Control Conference (ACC), Portland, OR, USA, 8–10 June 2005; pp. 4028–4033.
  36. Sinopoli, B.; Schenato, L.; Franceschetti, M.; Poolla, K.; Jordan, M.I.; Sastry, S.S. Kalman Filtering with Intermittent Observations. IEEE Trans. Automat. Contr. 2004, 49, 1453–1464.
  37. Choi, M.; Choi, J.; Park, J.; Chung, W.K. State Estimation with Delayed Measurements Considering Uncertainty of Time Delay. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Kobe, Japan, 12–17 May 2009; pp. 3987–3992.
  38. Yoon, H.; Sternberg, D.C.; Cahoy, K. Interpolation Method for Update with Out-of-Sequence Measurements: The Augmented Fixed-Lag Smoother. J. Guid. Control Dyn. 2016, 39, 2546–2553.
  39. Lee, K.; Johnson, E.N. Multiple-Model Adaptive Estimation for Measurements with Unknown Time Delay. In Proceedings of the AIAA Guidance, Navigation, and Control Conference, Grapevine, TX, USA, 9–13 January 2017.
  40. Nilsson, J.O.; Skog, I.; Händel, P. Joint State and Measurement Time-Delay Estimation of Nonlinear State Space Systems. In Proceedings of the 10th IEEE International Conference on Information Sciences, Signal Processing and their Applications (ISSPA), Kuala Lumpur, Malaysia, 10–13 May 2010; pp. 324–328.
  41. Li, M.; Mourikis, A.I. Online Temporal Calibration for Camera–IMU Systems: Theory and Algorithms. Int. J. Rob. Res. 2014, 33, 947–964.
  42. Qin, T.; Shen, S. Online Temporal Calibration for Monocular Visual-Inertial Systems. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 3662–3669.
  43. Lee, K.; Johnson, E.N. State and Parameter Estimation Using Measurements with Unknown Time Delay. In Proceedings of the IEEE Conference on Control Technology and Applications (CCTA), Mauna Lani, HI, USA, 27–30 August 2017; pp. 1402–1407.
  44. Lee, K. Adaptive Filtering for Vision-Aided Inertial Navigation. Ph.D. Thesis, Georgia Institute of Technology, Atlanta, GA, USA, 2018.
  45. Simon, D. Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches; John Wiley & Sons: Hoboken, NJ, USA, 2006.
  46. Stevens, B.L.; Lewis, F.L.; Johnson, E.N. Aircraft Control and Simulation: Dynamics, Controls Design, and Autonomous Systems; John Wiley & Sons: Hoboken, NJ, USA, 2015.
  47. Sola, J. Quaternion Kinematics for the Error-State Kalman Filter. arXiv 2017, arXiv:1711.02508.
  48. Markley, F.L. Attitude Error Representations for Kalman Filtering. J. Guid. Control Dyn. 2003, 26, 311–317.
  49. Forsyth, D.A.; Ponce, J. Computer Vision: A Modern Approach; Prentice Hall: Upper Saddle River, NJ, USA, 2002; pp. 96–147.
  50. Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2003; pp. 22–236.
  51. Bjorck, A. Numerical Methods for Least Squares Problems; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 1996; Volume 51, Chapters 2–9.
  52. Montiel, J.M.; Civera, J.; Davison, A.J. Unified Inverse Depth Parametrization for Monocular SLAM. In Proceedings of Robotics: Science and Systems (RSS), Philadelphia, PA, USA, 16–19 August 2006.
  53. Schutz, B.; Tapley, B.; Born, G.H. Statistical Orbit Determination; Elsevier Academic Press: Burlington, MA, USA, 2004; pp. 387–438.
  54. Hinks, J.; Psiaki, M. A Multipurpose Consider Covariance Analysis for Square-Root Information Filters. In Proceedings of the AIAA Guidance, Navigation, and Control Conference, Minneapolis, MN, USA, 13–16 August 2012; p. 4599.
  55. Meijering, E. A Chronology of Interpolation: From Ancient Astronomy to Modern Signal and Image Processing. Proc. IEEE 2002, 90, 319–342.
  56. Shoemake, K. Animating Rotation with Quaternion Curves. In Proceedings of ACM SIGGRAPH Computer Graphics, San Francisco, CA, USA, 20–28 July 1985; Volume 19, pp. 245–254.
  57. Dam, E.B.; Koch, M.; Lillholm, M. Quaternions, Interpolation and Animation; Citeseer: Copenhagen, Denmark, 1998; Volume 2, pp. 34–76.
  58. Gelb, A. Applied Optimal Estimation; MIT Press: Cambridge, MA, USA, 1974; pp. 102–155, 180–228.
  59. Kopp, R.E.; Orford, R.J. Linear Regression Applied to System Identification for Adaptive Control Systems. AIAA J. 1963, 1, 2300–2306.
  60. Viegas, D.; Batista, P.; Oliveira, P.; Silvestre, C. Nonlinear Observability and Observer Design through State Augmentation. In Proceedings of the IEEE 53rd Annual Conference on Decision and Control (CDC), Los Angeles, CA, USA, 15–17 December 2014; pp. 133–138.
  61. Del Vecchio, D.; Murray, R. Observability and Local Observer Construction for Unknown Parameters in Linearly and Nonlinearly Parameterized Systems. In Proceedings of the American Control Conference (ACC), Denver, CO, USA, 4–6 June 2003; Volume 6, pp. 4748–4753.
  62. Carrassi, A.; Vannitsem, S. State and Parameter Estimation with the Extended Kalman Filter: An Alternative Formulation of the Model Error Dynamics. Q. J. R. Meteorol. Soc. 2011, 137, 435–451.
  63. Burri, M.; Nikolic, J.; Gohl, P.; Schneider, T.; Rehder, J.; Omari, S.; Achtelik, M.W.; Siegwart, R. The EuRoC Micro Aerial Vehicle Datasets. Int. J. Rob. Res. 2016, 35, 1157–1163.
  64. Quigley, M.; Conley, K.; Gerkey, B.; Faust, J.; Foote, T.; Leibs, J.; Wheeler, R.; Ng, A.Y. ROS: An Open-Source Robot Operating System. In Proceedings of the ICRA Workshop on Open Source Software, Kobe, Japan, 12–17 May 2009; Volume 3, p. 5.
  65. Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A Benchmark for the Evaluation of RGB-D SLAM Systems. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vilamoura, Portugal, 7–12 October 2012; pp. 573–580.
  66. Wu, A.D. Vision-Based Navigation and Mapping for Flight in GPS-Denied Environments. Ph.D. Thesis, Georgia Institute of Technology, Atlanta, GA, USA, 2010.
Figure 1. A schematic of the sequential measurement update.
Figure 2. Data streams of the IMU and the delayed vision data.
Figure 3. Examples of interpolation and slerp. (a) Linear interpolation, (b) Quaternion slerp.
Figure 4. A schematic of a modified measurement update using covariance correction.
Figure 5. Three corrections in the latency compensated VIO.
Figure 6. A flow chart of the overall process of the latency compensated VIO.
Figure 7. Estimation of total delays in simulation. (a) A static delay, (b) Varying delays.
Figure 8. ROS rqt graph of the latency compensated VIO linked to the EuRoC dataset.
Figure 9. Top-down view of flight trajectory of the EuRoC V1 medium dataset by the latency compensated VIO.
Figure 10. Position and estimation error of the EuRoC V1 medium dataset by the latency compensated VIO. (a) Position x, (b) Position y, (c) Position z.
Figure 11. Box plot of absolute estimation error of position of the EuRoC V1 medium dataset by the latency compensated VIO.
Figure 12. Estimation of unknown part of time delays of the EuRoC V1 medium dataset.
Figure 13. Estimation of time delays of the EuRoC V1 medium dataset. (a) Readable delays, (b) Estimated total delays.
Sensors 20 02209 g013
Table 1. State-of-the-art Visual-Inertial Odometry ("×" marks the properties of each method).

| | ROVIO [15] | VINS-MONO [16] | SVO+MSF [17] | Alternating Stereo VINS [18] | S-MSCKF [19] | OKVIS [20] |
|---|:---:|:---:|:---:|:---:|:---:|:---:|
| Monocular | × | × | × | | | |
| Stereo | | | | × | × | × |
| Indirect | | × | | × | × | × |
| Semi-direct | | | × | | | |
| Direct | × | | | | | |
| Loosely-Coupled | | | × | | | |
| Tightly-Coupled | × | × | | × | × | × |
| Optimization-based | | × | | | | × |
| Filtering-based | × | | × | × | × | |
| Open-source | × | × | × | | × | × |
Table 2. Indication that the latency compensated VIO is well-tuned for the EuRoC V1 medium dataset.

| Multiplier on R | ×1/10 | ×1/3 | ×1 | ×3 | ×10 |
|---|:---:|:---:|:---:|:---:|:---:|
| RMS error [m] | 1.5096 | 0.1969 | 0.1619 | 0.2636 | 0.2850 |
Table 3. Sensitivity analysis in RMS position error [m] of the latency compensated VIO. EuRoC V1 Easy involves slow motion (0.41 m/s, 16.0 deg/s); EuRoC V1 Medium involves fast motion (0.91 m/s, 32.1 deg/s).

| Method | V1 Easy, Cross-Cov OFF | V1 Easy, Cross-Cov ON | V1 Medium, Cross-Cov OFF | V1 Medium, Cross-Cov ON |
|---|:---:|:---:|:---:|:---:|
| Fixed $\bar{t}_d$ (tuned constant) | 0.3376 | 0.2677 | 0.4644 | 0.3135 |
| Entirely estimated $\hat{t}_d$ | 0.2282 | 0.2406 | 0.4734 | 0.3538 |
| Readouts $\bar{t}_d^{\,\mathrm{raw}}$ alone | 0.2558 | 0.2032 | 0.4163 | 0.3121 |
| Readouts + fixed $\delta\bar{t}_d$ | 0.2869 | 0.2285 | 0.3281 | 0.2218 |
| Readouts + estimated $\delta\hat{t}_d$ | 0.2019 | 0.1461 | 0.3353 | 0.1619 |
Table 4. Comparison with other methods in RMS position error [m] of the latency compensated VIO.

| Method | EuRoC V1 Easy (Slow Motion: 0.41 m/s, 0.28 rad/s) | EuRoC V1 Medium (Fast Motion: 0.91 m/s, 0.56 rad/s) |
|---|:---:|:---:|
| Latency Compensated VIO | 0.1461 | 0.1619 |
| S-MSCKF (stereo, filter) | 0.34 | 0.20 |
| SVO+MSF (loosely-coupled) | 0.40 | 0.63 |
