
Monocular Visual-Inertial Odometry with an Unbiased Linear System Model and Robust Feature Tracking Front-End

1 School of Automation Science and Electrical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing 100191, China
2 Science and Technology on Aircraft Control Laboratory, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing 100191, China
3 Science and Technology on Complex System Control and Intelligent Agent Cooperation Laboratory, No. 40 Yungangbeili, Fengtai District, Beijing 100074, China
4 Sino-French Engineer School, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing 100191, China
* Author to whom correspondence should be addressed.
Sensors 2019, 19(8), 1941; https://doi.org/10.3390/s19081941
Submission received: 23 March 2019 / Revised: 17 April 2019 / Accepted: 22 April 2019 / Published: 25 April 2019
(This article belongs to the Section Intelligent Sensors)

Abstract

The research field of visual-inertial odometry has entered a mature stage in recent years. However, non-negligible problems remain: users still have to trade off high accuracy against low computation, and notation confusion exists in quaternion descriptions of rotation, which, although not fatal, may result in unnecessary difficulties in understanding for researchers. In this paper, we develop a visual-inertial odometry that gives consideration to both precision and computation. The proposed algorithm is a filter-based solution that utilizes the framework of the well-known multi-state constraint Kalman filter. To dispel notation confusion, we deduce the error state transition equation from scratch, using the more intuitive Hamilton notation of quaternion. We further derive a fully linear closed-form formulation that is readily implemented. As the filter-based back-end is vulnerable to feature matching outliers, a descriptor-assisted optical flow tracking front-end is developed to cope with this issue; the modification requires only negligible additional computation. In addition, an initialization procedure is implemented, which automatically selects static data to initialize the filter state. Evaluations of the proposed methods were performed on a public, real-world dataset, and comparisons were made with state-of-the-art solutions. The experimental results show that the proposed solution is comparable in precision and demonstrates higher computational efficiency compared to the state-of-the-art.

1. Introduction

The fusion of monocular cameras and inertial measurement units (IMUs) has become very popular in recent years, thanks to great improvements in the computational capacity of computers with low energy cost and low weight, and to the increasing demand for accurate motion tracking and positioning in unmanned aerial vehicles (UAVs), augmented reality, and driverless cars. This fusion problem has been studied by brilliant scientists for years [1,2,3,4], and as a result, one can now build one's own visual-inertial odometry (VIO) module simply with cheap sensors and open-source software [2,5,6,7,8,9].
Fusion frameworks are divided into two main branches: filter-based and optimization-based [10,11]. Optimization-based methods are widely recognized as performing better in terms of precision [12] due to their iterating mechanism, which essentially solves the well-known bundle adjustment (BA) problem [13]. The BA problem was considered computationally costly in earlier years, until the literature recognized and revealed its sparse structure [13,14], enabling real-time algorithms. Kümmerle et al. [7] modeled BA as a graph optimization problem. Kaess et al. [8] introduced a factor graph model to further illustrate the Bayesian nature of BA, and also found that the incremental nature of the Hessian matrix in the normal equation can be exploited when solving BA, speeding up the calculation further. With these profound insights, researchers also made efforts to overcome the inconsistency of fixed-lag fusion algorithms, which have the advantages of bounded computation with less information loss and of maintaining the sparse structure [15]. Several open-source libraries are available for building back-ends of this branch, based on the different mathematical descriptions listed above and providing convenient application program interfaces (APIs) [7,8,9,16]. Although they reduce computation by leveraging sparse matrix factorization, optimization-based VIO systems still sometimes need to be tailored in order to be deployed on computation-limited platforms, which can lead to downgraded performance [17].
Before the flowering of optimization-based methods, the solution of fusion problems was dominated by filtering [18]. The ordinary procedure is to include the IMU pose and map point positions in the filter state and to recursively propagate and update as IMU and camera measurements arrive [10]. Accurate estimation of the map point positions is the key to unbiased IMU pose updates. In traditional filter-based solutions, the filter state invariably has a very large dimension, since it always preserves many map points, resulting in heavy computation requirements [1]. The multi-state constraint Kalman filter (MSCKF) was proposed as an effective and optimal filter-based solution that does not maintain map points in the filter state [19]. By properly handling the camera measurements, the MSCKF can achieve performance as competitive as optimization-based algorithms while demanding far less computation [17].
By correcting observability properties [19,20,21] and incorporating camera–IMU extrinsic parameters into the filter state [22], the performance of MSCKF was further improved. Many follow-up works emerged, including an open-source monocular implementation [23], expansion to stereo camera rig [24], and schemes using direct visual front-ends [25] or adding line features [26].
It should be emphasized that all members of the MSCKF family so far have been developed based on Shuster's notation of quaternion [27], whereas most of the community uses the traditional Hamilton notation, which causes unnecessary difficulty of understanding for researchers [28].
Visual front-ends clearly play an important role in VIOs. They typically fall into two categories. Feature-based methods use descriptors to match features between consecutive images [6], while direct methods minimize photometric residuals to accomplish data association [5,25]. Sparse optical flow tracking is an efficient direct method that is widely used [2,23,24]. It provides sub-pixel accuracy but yields more outliers than feature descriptor matching [29]. An optimization-based back-end can eliminate outliers during iteration [30]; filter-based back-ends, meanwhile, are vulnerable to outliers if only a one-off update is applied [23]. Using an iterated update scheme would mitigate this situation at the cost of additional computation [31].
To recap, in order to make VIO algorithms more practical, it is desirable to develop algorithms that require less computation while maintaining high precision.
In this paper, we developed a filter-based monocular visual-inertial odometry that can be regarded as a member of the MSCKF family, giving consideration to both high precision and computational efficiency. The main contributions of this paper are as follows:
  • We deduced a closed-form IMU error state transition equation based on the more intuitive Hamilton notation of quaternion. By solving the integration terms analytically, a novel fully linear formulation was further obtained, which is also closed-form and, furthermore, readily implemented.
  • By analyzing the statistical properties of ORB descriptor [32] distances of matched and unmatched feature points, we introduced a novel descriptor-assisted sparse optical flow tracking technique, which enhances the feature tracking robustness while barely adding any computational complexity.
  • Further improvements are made to enhance the usability and performance of the filter. An initialization procedure is developed that automatically detects stationary scenes by analyzing tracked features and initializes the filter state based on static IMU data. The feature triangulation mechanism is carefully refined to provide efficient measurement updates.
  • A filter-based monocular VIO using the proposed state transition equation, visual front-end, and initialization procedure is implemented under Sun et al.'s [24] framework. The performance of our VIO is compared with that of MSCKF-MONO [23], an open-source monocular implementation of MSCKF, with parameters set up as similarly as possible. Ours is also compared with other state-of-the-art open-source VIOs, including ROVIO [5], OKVIS [6], and VINS-MONO [2]. In addition, we analyze the processing time of our algorithm. All of the evaluations above are done on the EuRoC dataset [33]. Detailed evaluations are reported.
The rest of this paper is organized as follows. The problem of quaternion notation confusion is illustrated in Section 2. Section 3 deduces the error state differential equation based on Hamilton’s notation. Section 4 gives a closed-form error state transition formulation and then solves the integration terms in it, obtaining a fully linear closed-form formulation. Section 5 presents the descriptor-assisted sparse optical flow tracking front-end. Other implementation details and improvements are presented in Section 6, including the overall filter model, automatic initialization procedure, and refined feature triangulation mechanism. Section 7 presents the experimental results in detail. Finally, conclusions are made in Section 8.

2. Quaternion Notation Confusion

The quaternion is one of the most widely used representations of rotation in numerical calculations [34]. In the related literature, there are mainly two different notations: Hamilton's notation and Shuster's notation [35]. The difference between them lies in the flipped rule for the multiplication of the imaginary units $i$, $j$, and $k$: Hamilton utilizes $ij = k$, while Shuster advocates $ij = -k$ to maintain the order of the chain rule when transferring to direction cosine matrices (DCMs). Sommer et al. [28] surveyed this notation confusion problem in detail and argue for entirely abandoning Shuster's notation. In this section, we present the original problem that Shuster's notation was designed to solve, and a solution that maintains the chain-rule order while still using Hamilton's notation.
A quaternion of rotation $\mathbf{q}$ is basically a unit quaternion; it can be defined as

$\mathbf{q} = \cos\frac{\theta}{2} + \mathbf{u}^A \sin\frac{\theta}{2}$ (1)

where $\mathbf{u}^A$ is the unit vector of the rotation axis in frame A, and $\theta$ is the angle of rotation. In the rest of this article, the term "quaternion" will be used to refer to a quaternion of rotation, for the sake of simplicity.
Equation (1) shows how to construct a quaternion $\mathbf{q}$ from an axis-angle $\theta\mathbf{u}^A$, which describes the anticlockwise rotation by an angle $\theta$ about the axis $\mathbf{u}$. If the original frame A is rotated to a new frame B by this rotation, as illustrated in Figure 1, then we can use a quaternion $\mathbf{q}_A^B$ or a DCM $\mathbf{R}_A^B$ to describe it.
Note that $\mathbf{R}_A^B$ can be used to compute the coordinates of a vector $\mathbf{v}$ in frame B given its coordinates in frame A, that is, $\mathbf{v}^B = \mathbf{R}_A^B \mathbf{v}^A$. $\mathbf{R}_A^B$ can be written as a function of $\mathbf{q}_A^B$:

$\mathbf{R}_A^B = C_S\!\left(\mathbf{q}_A^B\right)$ (2)

where $C_S$ is an operator mapping $\mathbf{q}_A^B$ to $\mathbf{R}_A^B$.
Let $\mathbf{q}_B^C$ be the quaternion describing the rotation from frame B to frame C, and $\mathbf{q}_A^C$ the rotation from frame A to frame C. Then, according to Equation (2), we have

$\mathbf{R}_B^C = C_S\!\left(\mathbf{q}_B^C\right), \quad \mathbf{R}_A^C = C_S\!\left(\mathbf{q}_A^C\right)$ (3)
Coordinate transformation of vectors can also be done by applying the triple product of quaternions:

$\mathbf{v}^B = \left(\mathbf{q}_A^B\right)^{-1} \otimes \mathbf{v}^A \otimes \mathbf{q}_A^B, \quad \mathbf{v}^C = \left(\mathbf{q}_B^C\right)^{-1} \otimes \mathbf{v}^B \otimes \mathbf{q}_B^C$ (4)

Here we abuse the notation of $\mathbf{v}^A$, $\mathbf{v}^B$, and $\mathbf{v}^C$ to describe quaternions with zero real part, such that $\mathbf{v}^A \triangleq \begin{bmatrix} 0 & (\mathbf{v}^A)^T \end{bmatrix}^T$. Combining the two equations above yields:

$\mathbf{v}^C = \left(\mathbf{q}_B^C\right)^{-1} \otimes \left(\mathbf{q}_A^B\right)^{-1} \otimes \mathbf{v}^A \otimes \mathbf{q}_A^B \otimes \mathbf{q}_B^C = \left(\mathbf{q}_A^B \otimes \mathbf{q}_B^C\right)^{-1} \otimes \mathbf{v}^A \otimes \mathbf{q}_A^B \otimes \mathbf{q}_B^C$ (5)
Referring to Equation (5), there is

$\mathbf{q}_A^C = \mathbf{q}_A^B \otimes \mathbf{q}_B^C$ (6)
At the same time, applying the chain rule of DCM products gives

$\mathbf{R}_A^C = \mathbf{R}_B^C \mathbf{R}_A^B$ (7)
Now we can conclude that $C_S(\mathbf{q}_A^B \otimes \mathbf{q}_B^C) = C_S(\mathbf{q}_B^C)\,C_S(\mathbf{q}_A^B)$, which means the mapping $C_S$ is not a homomorphism. One would prefer a homomorphic mapping between DCMs and quaternions that maintains the chain-rule order, which is convenient to manipulate. Shuster utilized a flipped multiplication rule to avoid this problem. This notation was adopted by the Jet Propulsion Laboratory (JPL) and thus introduced to the spacecraft literature, while other research fields kept using the traditional Hamilton notation. As researchers exchanged ideas between fields, Shuster's notation was also adopted for rotation representation in robotics [28]. So far, all of the theory about the MSCKF has been deduced based on this notation [1].
As Sommer et al. [28] claimed, a homomorphic mapping can be obtained even under Hamilton's notation. Let $C_H$ be an operator that satisfies $C_H(\mathbf{q}) = C_S(\mathbf{q})^T$. Thus, we have

$\mathbf{R}_B^A = C_H\!\left(\mathbf{q}_A^B\right), \quad \mathbf{R}_C^B = C_H\!\left(\mathbf{q}_B^C\right), \quad \mathbf{R}_C^A = C_H\!\left(\mathbf{q}_A^C\right)$ (8)

According to Equations (6) and (7), we now have $C_H(\mathbf{q}_A^B \otimes \mathbf{q}_B^C) = C_H(\mathbf{q}_A^B)\,C_H(\mathbf{q}_B^C)$, which proves $C_H$ to be a homomorphism.
Given a quaternion

$\mathbf{q}_A^B = \cos\frac{\theta}{2} + \mathbf{u}^A \sin\frac{\theta}{2} = q_0 + q_1 i + q_2 j + q_3 k$ (9)

the operator $C_H$ is defined as the function mapping the quaternion $\mathbf{q}_A^B$ to the DCM $\mathbf{R}_B^A$:

$C_H\!\left(\mathbf{q}_A^B\right) = \mathbf{R}_B^A = \begin{bmatrix} q_0^2+q_1^2-q_2^2-q_3^2 & 2(q_1 q_2 - q_0 q_3) & 2(q_1 q_3 + q_0 q_2) \\ 2(q_1 q_2 + q_0 q_3) & q_0^2-q_1^2+q_2^2-q_3^2 & 2(q_2 q_3 - q_0 q_1) \\ 2(q_1 q_3 - q_0 q_2) & 2(q_2 q_3 + q_0 q_1) & q_0^2-q_1^2-q_2^2+q_3^2 \end{bmatrix}$ (10)

This is, in fact, the classical Rodrigues rotation formula. A more thorough discussion of this mapping can be found in [36].
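As a quick sanity check of the relations above, the following Python sketch builds quaternions from axis-angles (Equation (1)), implements the $C_H$ mapping of Equation (10), and verifies numerically that $C_H(\mathbf{q}_A^B \otimes \mathbf{q}_B^C) = C_H(\mathbf{q}_A^B)\,C_H(\mathbf{q}_B^C)$. All function names are illustrative, not taken from any released code.

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product p ⊗ q, scalar-first storage [q0, q1, q2, q3]."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw*qw - px*qx - py*qy - pz*qz,
        pw*qx + px*qw + py*qz - pz*qy,
        pw*qy - px*qz + py*qw + pz*qx,
        pw*qz + px*qy - py*qx + pz*qw])

def C_H(q):
    """The mapping of Equation (10): unit quaternion q_A^B -> DCM R_B^A."""
    q0, q1, q2, q3 = q
    return np.array([
        [q0**2+q1**2-q2**2-q3**2, 2*(q1*q2-q0*q3),         2*(q1*q3+q0*q2)],
        [2*(q1*q2+q0*q3),         q0**2-q1**2+q2**2-q3**2, 2*(q2*q3-q0*q1)],
        [2*(q1*q3-q0*q2),         2*(q2*q3+q0*q1),         q0**2-q1**2-q2**2+q3**2]])

def from_axis_angle(axis, theta):
    """Equation (1): quaternion from an axis-angle."""
    u = np.asarray(axis, dtype=float)
    u /= np.linalg.norm(u)
    return np.concatenate([[np.cos(theta / 2)], np.sin(theta / 2) * u])

q_ab = from_axis_angle([0, 0, 1], 0.3)
q_bc = from_axis_angle([1, 0, 0], -0.7)
# Homomorphism check: C_H(q_A^B ⊗ q_B^C) == C_H(q_A^B) @ C_H(q_B^C)
assert np.allclose(C_H(quat_mul(q_ab, q_bc)), C_H(q_ab) @ C_H(q_bc))
```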

3. IMU Error State Differential Equation

In this section, we deduce the IMU error state differential equation based on Hamilton's quaternion notation. The Earth's rotation is ignored, as low-cost gyros cannot measure it. The static world assumption is employed, which means that gravity has a fixed direction. This is acceptable when a VIO is working in a limited region.

3.1. Notation

The east-north-up geographic coordinate system at the initial position is selected as the reference world frame w. As the Earth's rotation is omitted, w can be regarded as an inertial frame. The quaternion $\mathbf{q}_w^b$ is used to represent the rotation from frame w to body frame b. According to Equation (8), we obtain

$C_H\!\left(\mathbf{q}_w^b\right) = \mathbf{R}_b^w$ (11)
The error quaternion is defined as

$\delta\mathbf{q}_w^b = \mathbf{q}_w^b \otimes \left(\hat{\mathbf{q}}_w^b\right)^{-1}$ (12)

where $\hat{\mathbf{q}}_w^b$ is the estimate of $\mathbf{q}_w^b$.
According to Equation (10), applying the $C_H$ mapping to Equation (12) leads to

$\mathbf{R}_{w'}^{w} = \mathbf{R}_b^w \left(\hat{\mathbf{R}}_b^{w}\right)^{-1}$ (13)

where w′ is the estimated world frame, and $\delta\mathbf{q}_w^b$ corresponds to the rotation between w′ and w.
$\delta\mathbf{q}_w^b$ can be expressed in axis-angle formulation as

$\delta\mathbf{q}_w^b = \cos\frac{\left\|\delta\boldsymbol{\theta}^w\right\|}{2} + \frac{\delta\boldsymbol{\theta}^w}{\left\|\delta\boldsymbol{\theta}^w\right\|} \sin\frac{\left\|\delta\boldsymbol{\theta}^w\right\|}{2}$ (14)

where $\delta\boldsymbol{\theta}^w = \begin{bmatrix} \delta\theta_x^w & \delta\theta_y^w & \delta\theta_z^w \end{bmatrix}^T$ is an axis-angle in frame w that rotates frame w′ to frame w. As $\delta\boldsymbol{\theta}^w$ is a small angle, an approximate expression of Equation (14) is

$\delta\mathbf{q}_w^b \approx 1 + \frac{1}{2}\delta\boldsymbol{\theta}^w$ (15)
Based on Equations (10) and (14), an approximate expression of $\mathbf{R}_{w'}^{w}$ is formulated as

$\mathbf{R}_{w'}^{w} \approx \begin{bmatrix} 1 & -\delta\theta_z^w & \delta\theta_y^w \\ \delta\theta_z^w & 1 & -\delta\theta_x^w \\ -\delta\theta_y^w & \delta\theta_x^w & 1 \end{bmatrix} = \mathbf{I} + \left[\delta\boldsymbol{\theta}^w\right]_\times$ (16)

where the operator $[\cdot]_\times$ denotes the skew-symmetric matrix. For a given three-dimensional (3D) vector $\mathbf{v} = \begin{bmatrix} v_x & v_y & v_z \end{bmatrix}^T$, its skew matrix is

$\left[\mathbf{v}\right]_\times = \begin{bmatrix} 0 & -v_z & v_y \\ v_z & 0 & -v_x \\ -v_y & v_x & 0 \end{bmatrix}$ (17)

3.2. IMU Measurement Model

An IMU includes a 3-axis gyroscope and a 3-axis accelerometer whose axes are aligned with the body frame. The output of the gyroscope is modeled as

$\boldsymbol{\omega}_m^b = \boldsymbol{\omega}_{wb}^b + \mathbf{b}_g + \mathbf{n}_g$ (18)

where $\boldsymbol{\omega}_{wb}^b$ is the true angular velocity, $\mathbf{b}_g$ denotes the gyroscope bias in the body frame, and $\mathbf{n}_g$ is Gaussian white noise.
The accelerometer measures the specific force along the body-fixed axes, which includes the opposite of gravity and is affected by bias and noise as well:

$\mathbf{f}_m^b = \mathbf{a}^b - \mathbf{R}_w^b \mathbf{g}^w + \mathbf{b}_a + \mathbf{n}_a = \mathbf{R}_w^b \left(\mathbf{a}^w - \mathbf{g}^w\right) + \mathbf{b}_a + \mathbf{n}_a$ (19)

where $\mathbf{a}^b$ is the true acceleration, and $\mathbf{g}^w = \begin{bmatrix} 0 & 0 & -g \end{bmatrix}^T$ denotes gravity in the world frame. $\mathbf{b}_a$ and $\mathbf{n}_a$ denote the bias and the Gaussian white noise in the body frame, respectively.
The biases $\mathbf{b}_g$ and $\mathbf{b}_a$ are modeled as random walk processes

$\dot{\mathbf{b}}_g = \mathbf{n}_{wg}, \quad \dot{\mathbf{b}}_a = \mathbf{n}_{wa}$ (20)

where $\mathbf{n}_{wg}$ and $\mathbf{n}_{wa}$ are Gaussian white noises.
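For intuition, a minimal discrete-time simulation of the gyroscope model of Equations (18) and (20) might look as follows; the sampling rate and noise densities are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
dt = 0.005                  # assumed 200 Hz IMU
sigma_g = 1.7e-4            # gyro white-noise density (rad/s/sqrt(Hz)), assumed
sigma_wg = 2.0e-5           # bias random-walk density (rad/s^2/sqrt(Hz)), assumed

b_g = np.zeros(3)                               # gyro bias of Equation (20)
omega_true = np.array([0.0, 0.0, 0.1])          # arbitrary test motion (rad/s)
for _ in range(1000):
    n_g = sigma_g / np.sqrt(dt) * rng.standard_normal(3)    # white noise n_g
    omega_m = omega_true + b_g + n_g                        # Equation (18)
    b_g += sigma_wg * np.sqrt(dt) * rng.standard_normal(3)  # random-walk step
```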

3.3. IMU Error State Definition

The IMU state includes the quaternion $\mathbf{q}_w^b$, the velocity $\mathbf{v}_b^w$ and position $\mathbf{p}_b^w$ of the body frame origin in the world frame, and the IMU biases $\mathbf{b}_g$ and $\mathbf{b}_a$. The IMU state can be defined as

$\mathbf{x}_{IMU} = \begin{bmatrix} (\mathbf{q}_w^b)^T & (\mathbf{v}_b^w)^T & (\mathbf{p}_b^w)^T & \mathbf{b}_g^T & \mathbf{b}_a^T \end{bmatrix}^T$ (21)
The filter is designed on the error state because the error state is convenient to process with an extended Kalman filter (EKF). The three-dimensional angular error $\delta\boldsymbol{\theta}^w$, rather than the four-dimensional quaternion error $\delta\mathbf{q}_w^b$, is utilized, since it accords with the degrees of freedom (DOF) of rotation and is thus a minimal parameterization.
Other error state components are simply defined as the differences between the true states and the estimated states, which leads to

$\delta\mathbf{v}_b^w = \mathbf{v}_b^w - \hat{\mathbf{v}}_b^w$ (22)

$\delta\mathbf{p}_b^w = \mathbf{p}_b^w - \hat{\mathbf{p}}_b^w$ (23)

$\delta\mathbf{b}_g = \mathbf{b}_g - \hat{\mathbf{b}}_g$ (24)

$\delta\mathbf{b}_a = \mathbf{b}_a - \hat{\mathbf{b}}_a$ (25)
The overall IMU error state can now be concluded as

$\delta\mathbf{x}_{IMU} = \begin{bmatrix} (\delta\boldsymbol{\theta}^w)^T & (\delta\mathbf{v}_b^w)^T & (\delta\mathbf{p}_b^w)^T & \delta\mathbf{b}_g^T & \delta\mathbf{b}_a^T \end{bmatrix}^T$ (26)

3.4. Differential Equation

The matrix form of the differential equation of the overall IMU error state is as follows:

$\delta\dot{\mathbf{x}}_{IMU} = \mathbf{F}\,\delta\mathbf{x}_{IMU} + \mathbf{G}\,\mathbf{n}_{IMU}$ (27)

where $\mathbf{n}_{IMU}$ denotes the IMU noise, given by

$\delta\mathbf{x}_{IMU} = \begin{bmatrix} (\delta\boldsymbol{\theta}^w)^T & (\delta\mathbf{v}_b^w)^T & (\delta\mathbf{p}_b^w)^T & \delta\mathbf{b}_g^T & \delta\mathbf{b}_a^T \end{bmatrix}^T$ (28)

$\mathbf{n}_{IMU} = \begin{bmatrix} \mathbf{n}_g^T & \mathbf{n}_a^T & \mathbf{n}_{wg}^T & \mathbf{n}_{wa}^T \end{bmatrix}^T$ (29)

and the matrices F and G are as follows:

$\mathbf{F} = \begin{bmatrix} \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & -\hat{\mathbf{R}}_b^w & \mathbf{0}_{3\times3} \\ -\left[\hat{\mathbf{R}}_b^w \hat{\mathbf{a}}^b\right]_\times & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & -\hat{\mathbf{R}}_b^w \\ \mathbf{0}_{3\times3} & \mathbf{I}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} \\ \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} \\ \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} \end{bmatrix}$ (30)

$\mathbf{G} = \begin{bmatrix} -\hat{\mathbf{R}}_b^w & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} \\ \mathbf{0}_{3\times3} & -\hat{\mathbf{R}}_b^w & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} \\ \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} \\ \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{I}_{3\times3} & \mathbf{0}_{3\times3} \\ \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{I}_{3\times3} \end{bmatrix}$ (31)
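A minimal sketch assembling F and G as written above (with the sign conventions reconstructed in Equations (30) and (31)); `R_hat` stands for the current estimate of $\mathbf{R}_b^w$ and `a_hat` for the bias-corrected accelerometer reading, both assumed available.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix of Equation (17)."""
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

def error_state_jacobians(R_hat, a_hat):
    """F and G of Equations (30)-(31) for the 15-dimensional error state."""
    Z, I = np.zeros((3, 3)), np.eye(3)
    F = np.block([
        [Z,                    Z, Z, -R_hat, Z     ],   # d(δθ)/dt
        [-skew(R_hat @ a_hat), Z, Z,  Z,    -R_hat ],   # d(δv)/dt
        [Z,                    I, Z,  Z,     Z     ],   # d(δp)/dt = δv
        [Z,                    Z, Z,  Z,     Z     ],   # d(δb_g)/dt
        [Z,                    Z, Z,  Z,     Z     ]])  # d(δb_a)/dt
    G = np.block([
        [-R_hat, Z,      Z, Z],
        [Z,     -R_hat,  Z, Z],
        [Z,      Z,      Z, Z],
        [Z,      Z,      I, Z],
        [Z,      Z,      Z, I]])
    return F, G
```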

4. Fully Linear State Transition Equation Formulation

A state transition equation is needed for the extended Kalman filter (EKF) to propagate the state and covariance. One commonly used method is to make a first-order approximation based on a continuous differential equation [37]. Li and Mourikis [19] proposed a closed-form error state transition equation that effectuated a system model with no information loss. However, there are still some tricky integration terms left behind. In this section, we first present the closed-form IMU error state transition equation based on the results of Section 3. Then, we solve the integration terms by two-sample fitting of the rotation matrix, resulting in a closed-form formulation that is fully linear.

4.1. Original Closed-Form Equation

Following the methodology of Li and Mourikis [19], the closed-form transition equation is deduced and presented in what follows. Noting that $k$ and $k+1$ are consecutive discrete sampling instants of the IMU, and $\Delta t$ is the sampling period,

$\delta\boldsymbol{\theta}_{k+1|k}^w = \Phi_{\theta\theta}(k+1,k)\,\delta\boldsymbol{\theta}_{k|k}^w + \Phi_{\theta b_g}(k+1,k)\,\delta\mathbf{b}_{g,k|k} + \mathbf{n}_{\theta k}$ (32)

$\delta\mathbf{v}_{k+1|k}^w = \Phi_{v\theta}(k+1,k)\,\delta\boldsymbol{\theta}_{k|k}^w + \Phi_{vv}(k+1,k)\,\delta\mathbf{v}_{k|k}^w + \Phi_{v b_g}(k+1,k)\,\delta\mathbf{b}_{g,k|k} + \Phi_{v b_a}(k+1,k)\,\delta\mathbf{b}_{a,k|k} + \mathbf{n}_{vk}$ (33)

$\delta\mathbf{p}_{k+1|k}^w = \Phi_{p\theta}(k+1,k)\,\delta\boldsymbol{\theta}_{k|k}^w + \Phi_{pv}(k+1,k)\,\delta\mathbf{v}_{k|k}^w + \Phi_{pp}(k+1,k)\,\delta\mathbf{p}_{k|k}^w + \Phi_{p b_g}(k+1,k)\,\delta\mathbf{b}_{g,k|k} + \Phi_{p b_a}(k+1,k)\,\delta\mathbf{b}_{a,k|k} + \mathbf{n}_{pk}$ (34)

$\delta\mathbf{b}_{g,k+1|k} = \Phi_{b_g b_g}(k+1,k)\,\delta\mathbf{b}_{g,k|k} + \mathbf{n}_{b_g k}$ (35)

$\delta\mathbf{b}_{a,k+1|k} = \Phi_{b_a b_a}(k+1,k)\,\delta\mathbf{b}_{a,k|k} + \mathbf{n}_{b_a k}$ (36)

where $\Phi_{x_1 x_2}$ represents the transition matrix of the error state of $x_1$ with respect to the error state of $x_2$, and the $\mathbf{n}_*$ terms represent noise. All of the $\Phi_*$ and $\mathbf{n}_*$ terms are listed as follows:
$\Phi_{\theta\theta}(k+1,k) = \mathbf{I}_{3\times3}$ (37)

$\Phi_{\theta b_g}(k+1,k) = -\hat{\mathbf{R}}_{b_k}^w \int_{t_k}^{t_{k+1}} \hat{\mathbf{R}}_{b_\tau}^{b_k}\, d\tau$ (38)

$\Phi_{v\theta}(k+1,k) = -\left[\hat{\mathbf{v}}_{k+1|k}^w - \hat{\mathbf{v}}_{k|k}^w - \mathbf{g}^w \Delta t\right]_\times$ (39)

$\Phi_{vv}(k+1,k) = \mathbf{I}_{3\times3}$ (40)

$\Phi_{v b_g}(k+1,k) = \int_{t_k}^{t_{k+1}} \left[\dot{\hat{\mathbf{v}}}_\tau^w - \mathbf{g}^w\right]_\times \hat{\mathbf{R}}_{b_k}^w \int_{t_k}^{\tau} \hat{\mathbf{R}}_{b_m}^{b_k}\, dm\, d\tau$ (41)

$\Phi_{v b_a}(k+1,k) = -\hat{\mathbf{R}}_{b_k}^w \int_{t_k}^{t_{k+1}} \hat{\mathbf{R}}_{b_\tau}^{b_k}\, d\tau$ (42)

$\Phi_{p\theta}(k+1,k) = -\left[\hat{\mathbf{p}}_{k+1|k}^w - \hat{\mathbf{p}}_{k|k}^w - \hat{\mathbf{v}}_{k|k}^w \Delta t - \tfrac{1}{2}\mathbf{g}^w \Delta t^2\right]_\times$ (43)

$\Phi_{pv}(k+1,k) = \mathbf{I}_{3\times3}\,\Delta t$ (44)

$\Phi_{pp}(k+1,k) = \mathbf{I}_{3\times3}$ (45)

$\Phi_{p b_g}(k+1,k) = \int_{t_k}^{t_{k+1}} \int_{t_k}^{\tau} \left[\dot{\hat{\mathbf{v}}}_s^w - \mathbf{g}^w\right]_\times \hat{\mathbf{R}}_{b_k}^w \int_{t_k}^{s} \hat{\mathbf{R}}_{b_m}^{b_k}\, dm\, ds\, d\tau$ (46)

$\Phi_{p b_a}(k+1,k) = -\hat{\mathbf{R}}_{b_k}^w \int_{t_k}^{t_{k+1}} \int_{t_k}^{\tau} \hat{\mathbf{R}}_{b_s}^{b_k}\, ds\, d\tau$ (47)

$\Phi_{b_g b_g}(k+1,k) = \mathbf{I}_{3\times3}$ (48)

$\Phi_{b_a b_a}(k+1,k) = \mathbf{I}_{3\times3}$ (49)

$\mathbf{n}_{\theta k} = -\hat{\mathbf{R}}_{b_k}^w \int_{t_k}^{t_{k+1}} \hat{\mathbf{R}}_{b_\tau}^{b_k} \left( \int_{t_k}^{\tau} \mathbf{n}_{wg}(s)\, ds + \mathbf{n}_g(\tau) \right) d\tau$ (50)

$\mathbf{n}_{vk} = -\int_{t_k}^{t_{k+1}} \hat{\mathbf{R}}_{b_\tau}^w \left( \int_{t_k}^{\tau} \mathbf{n}_{wa}(s)\, ds + \mathbf{n}_a(\tau) \right) d\tau + \int_{t_k}^{t_{k+1}} \left[\dot{\hat{\mathbf{v}}}_\tau^w - \mathbf{g}^w\right]_\times \hat{\mathbf{R}}_{b_k}^w \int_{t_k}^{\tau} \hat{\mathbf{R}}_{b_m}^{b_k} \left( \int_{t_k}^{m} \mathbf{n}_{wg}(s)\, ds + \mathbf{n}_g(m) \right) dm\, d\tau$ (51)

$\mathbf{n}_{pk} = \int_{t_k}^{t_{k+1}} \mathbf{n}_v(\tau)\, d\tau$ (52)

$\mathbf{n}_{b_g k} = \int_{t_k}^{t_{k+1}} \mathbf{n}_{wg}(\tau)\, d\tau$ (53)

$\mathbf{n}_{b_a k} = \int_{t_k}^{t_{k+1}} \mathbf{n}_{wa}(\tau)\, d\tau$ (54)

4.2. Fully Linear Closed-Form Formulation

Notice that Equations (38), (41), (42), (46), and (47), although closed-form expressions, still contain some tricky integration terms that are not straightforward to implement. One can solve these terms with numerical integration, but here we present a fully linear analytical expression that is readily implemented. The key is to solve $\int_{t_k}^{t_{k+1}} \hat{\mathbf{R}}_{b_\tau}^{b_k}\, d\tau$. We first apply a two-sample fitting method to approximate the axis-angle representing $\hat{\mathbf{R}}_{b_\tau}^{b_k}$; then Rodrigues' rotation formula is applied to express the DCM as a linear function of $\tau$, making the integration easy to solve.

4.2.1. Two-Sample Fitting of Axis-Angle

Any DCM can be regarded as a single rotation about a fixed axis, and thus can be represented by an axis-angle. Let the axis-angle of $\hat{\mathbf{R}}_{b_{k+1}}^{b_k}$ be $\boldsymbol{\phi} = \alpha \mathbf{u}^{b_k}$, where $\alpha$ is the angle of rotation and $\mathbf{u}^{b_k}$ is the rotation axis. Let $\tau$ be a time instant between $t_k$ and $t_{k+1}$ and $\varepsilon = \tau - t_k$; then a linear model can be used to represent the axis-angle of $\hat{\mathbf{R}}_{b_\tau}^{b_k}$:

$\boldsymbol{\phi}(\tau) = \boldsymbol{\phi}(\varepsilon) = \frac{\varepsilon}{\Delta t}\,\alpha \mathbf{u}^{b_k} = \frac{\varepsilon}{\Delta t}\,\boldsymbol{\phi}$ (55)
Angular velocity measurements at $t_k$ and $t_{k+1}$ are available when calculating the transition matrix $\Phi(k+1,k)$, so a two-sample fitting method can be used to approximate the axis-angle $\boldsymbol{\phi}$. We start from the differential equation of $\boldsymbol{\phi}(\varepsilon)$ [36]:

$\dot{\boldsymbol{\phi}}(\varepsilon) = \bar{\boldsymbol{\omega}} + \frac{1}{2}\,\boldsymbol{\phi}(\varepsilon) \times \bar{\boldsymbol{\omega}} + \frac{1}{12}\,\boldsymbol{\phi}(\varepsilon) \times \left( \boldsymbol{\phi}(\varepsilon) \times \bar{\boldsymbol{\omega}} \right)$ (56)
where $\bar{\boldsymbol{\omega}}$ is the average angular velocity between $t_k$ and $t_{k+1}$. As two gyro measurements are available, we use a straight-line model to fit $\bar{\boldsymbol{\omega}}$ as

$\bar{\boldsymbol{\omega}}(t_k + \varepsilon) = \mathbf{a} + 2\mathbf{b}\varepsilon, \quad 0 \le \varepsilon \le \Delta t$ (57)
Considering $\bar{\boldsymbol{\omega}}(t_k) = \hat{\boldsymbol{\omega}}_{wb}^b(t_k)$ and $\bar{\boldsymbol{\omega}}(t_k + \Delta t) = \hat{\boldsymbol{\omega}}_{wb}^b(t_{k+1})$ leads to

$\mathbf{a} = \hat{\boldsymbol{\omega}}_{wb}^b(t_k), \quad \mathbf{b} = \frac{\hat{\boldsymbol{\omega}}_{wb}^b(t_{k+1}) - \hat{\boldsymbol{\omega}}_{wb}^b(t_k)}{2\Delta t}$ (58)
According to Equation (55), $\boldsymbol{\phi}$ is equal to $\boldsymbol{\phi}(t_{k+1})$; expanding $\boldsymbol{\phi}$ in a Taylor series at the linearization point $t_k$ yields

$\boldsymbol{\phi} = \boldsymbol{\phi}(t_{k+1}) = \boldsymbol{\phi}(t_k) + \Delta t\,\dot{\boldsymbol{\phi}}(t_k) + \frac{\Delta t^2}{2!}\,\ddot{\boldsymbol{\phi}}(t_k) + \cdots = \boldsymbol{\phi}(0) + \Delta t\,\dot{\boldsymbol{\phi}}(0) + \frac{\Delta t^2}{2!}\,\ddot{\boldsymbol{\phi}}(0) + \cdots$ (59)
Now define a new function of $\varepsilon$ as

$\Delta\boldsymbol{\theta}(\varepsilon) = \int_0^\varepsilon \hat{\boldsymbol{\omega}}_{wb}^b(t_k + s)\, ds$ (60)
It can be pointed out that $\boldsymbol{\phi}(\varepsilon) \approx \Delta\boldsymbol{\theta}(\varepsilon)$. The derivatives of $\Delta\boldsymbol{\theta}$ at 0 are

$\Delta\boldsymbol{\theta}(0) = \mathbf{0}, \quad \dot{\Delta\boldsymbol{\theta}}(0) = \hat{\boldsymbol{\omega}}_{wb}^b(t_k) = \mathbf{a}, \quad \ddot{\Delta\boldsymbol{\theta}}(0) = \dot{\hat{\boldsymbol{\omega}}}_{wb}^b(t_k) = 2\mathbf{b}, \quad \Delta\boldsymbol{\theta}^{(i)}(0) = \hat{\boldsymbol{\omega}}_{wb}^{b\,(i-1)}(t_k) = \mathbf{0},\ \ i = 3, 4, 5, \ldots$ (61)
The third term in Equation (56) is a high-order small quantity that can be omitted. Substituting $\Delta\boldsymbol{\theta}(\varepsilon)$ for $\boldsymbol{\phi}(\varepsilon)$, Equation (56) turns into

$\dot{\boldsymbol{\phi}}(\varepsilon) = \bar{\boldsymbol{\omega}}(t_k + \varepsilon) + \frac{1}{2}\,\Delta\boldsymbol{\theta}(\varepsilon) \times \bar{\boldsymbol{\omega}}(t_k + \varepsilon)$ (62)
Now the high-order derivatives of $\boldsymbol{\phi}(\varepsilon)$ can be obtained:

$\ddot{\boldsymbol{\phi}}(\varepsilon) = \dot{\bar{\boldsymbol{\omega}}}(t_k+\varepsilon) + \frac{1}{2}\,\dot{\Delta\boldsymbol{\theta}}(\varepsilon) \times \bar{\boldsymbol{\omega}}(t_k+\varepsilon) + \frac{1}{2}\,\Delta\boldsymbol{\theta}(\varepsilon) \times \dot{\bar{\boldsymbol{\omega}}}(t_k+\varepsilon)$
$\boldsymbol{\phi}^{(3)}(\varepsilon) = \frac{1}{2}\,\ddot{\Delta\boldsymbol{\theta}}(\varepsilon) \times \bar{\boldsymbol{\omega}}(t_k+\varepsilon) + \dot{\Delta\boldsymbol{\theta}}(\varepsilon) \times \dot{\bar{\boldsymbol{\omega}}}(t_k+\varepsilon)$
$\boldsymbol{\phi}^{(4)}(\varepsilon) = \frac{3}{2}\,\ddot{\Delta\boldsymbol{\theta}}(\varepsilon) \times \dot{\bar{\boldsymbol{\omega}}}(t_k+\varepsilon)$
$\boldsymbol{\phi}^{(i)}(\varepsilon) = \mathbf{0}, \quad i = 5, 6, 7, \ldots$ (63)
Letting $\varepsilon = 0$ and considering Equation (61), we have

$\boldsymbol{\phi}(0) = \mathbf{0}, \quad \dot{\boldsymbol{\phi}}(0) = \mathbf{a}, \quad \ddot{\boldsymbol{\phi}}(0) = 2\mathbf{b}, \quad \boldsymbol{\phi}^{(3)}(0) = \mathbf{a} \times \mathbf{b}, \quad \boldsymbol{\phi}^{(i)}(0) = \mathbf{0},\ \ i = 4, 5, 6, \ldots$ (64)
Substituting the equations above into Equation (59) yields

$\boldsymbol{\phi} = \mathbf{a}\Delta t + \mathbf{b}\Delta t^2 + \frac{1}{6}\left(\mathbf{a} \times \mathbf{b}\right)\Delta t^3 = \frac{1}{2}\left(\hat{\boldsymbol{\omega}}_{wb}^b(t_k) + \hat{\boldsymbol{\omega}}_{wb}^b(t_{k+1})\right)\Delta t + \frac{1}{12}\left(\hat{\boldsymbol{\omega}}_{wb}^b(t_k) \times \hat{\boldsymbol{\omega}}_{wb}^b(t_{k+1})\right)\Delta t^2$ (65)

This is how the axis-angle between two consecutive sampling instants $t_k$ and $t_{k+1}$ is computed.
According to Rodrigues' rotation formula,

$\hat{\mathbf{R}}_{b_{k+1}}^{b_k} = \mathbf{I} + \sin\alpha \left[\mathbf{u}^{b_k}\right]_\times + \left(1 - \cos\alpha\right)\left[\mathbf{u}^{b_k}\right]_\times^2$ (66)

As $\alpha$ is a small angle (since $\Delta t$ is small), Equation (66) has the approximation

$\hat{\mathbf{R}}_{b_{k+1}}^{b_k} \approx \mathbf{I} + \alpha\left[\mathbf{u}^{b_k}\right]_\times = \mathbf{I} + \left[\boldsymbol{\phi}\right]_\times$ (67)

Now, substituting Equation (55) into Equation (67) leads to

$\hat{\mathbf{R}}_{b_\tau}^{b_k} \approx \mathbf{I} + \frac{\tau - t_k}{\Delta t}\left[\boldsymbol{\phi}\right]_\times$ (68)
Finally, the general procedure to solve the integration term $\int_{t_k}^{t_{k+1}} \hat{\mathbf{R}}_{b_\tau}^{b_k}\, d\tau$ can be summarized as follows (a code sketch is given after this list):
  • Compute the axis-angle between $t_k$ and $t_{k+1}$ according to Equation (65).
  • Express $\hat{\mathbf{R}}_{b_\tau}^{b_k}$ as in Equation (68).
  • Solve the $\int_{t_k}^{t_{k+1}} \hat{\mathbf{R}}_{b_\tau}^{b_k}\, d\tau$ term, which has become an integral of a linear analytic expression.
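A sketch of this procedure under the equations above; `omega_k` and `omega_k1` stand for the bias-corrected gyro samples $\hat{\boldsymbol{\omega}}_{wb}^b(t_k)$ and $\hat{\boldsymbol{\omega}}_{wb}^b(t_{k+1})$, and the function names are illustrative.

```python
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

def two_sample_phi(omega_k, omega_k1, dt):
    """Equation (65): axis-angle of R_{b_{k+1}}^{b_k} from two gyro samples."""
    return 0.5 * (omega_k + omega_k1) * dt \
        + (1.0 / 12.0) * np.cross(omega_k, omega_k1) * dt**2

def int_R_b_tau(omega_k, omega_k1, dt):
    """Integral of R_{b_tau}^{b_k} over [t_k, t_{k+1}] via Equation (68):
    the integrand I + (tau - t_k)/dt [phi]x integrates to dt*I + (dt/2)[phi]x."""
    phi = two_sample_phi(omega_k, omega_k1, dt)
    return dt * np.eye(3) + 0.5 * dt * skew(phi)
```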

4.2.2. Solving the Integration Terms in $\Phi_*$

The fully linear closed-form transition matrices of Equations (38), (41), (42), (46), and (47) can now be obtained by simply solving the integration terms. The results are listed below.

$\Phi_{\theta b_g}(k+1,k) = -\hat{\mathbf{R}}_{b_k}^w \Delta t \left(\mathbf{I} + \frac{1}{2}\left[\boldsymbol{\phi}\right]_\times\right)$
$\Phi_{v b_g}(k+1,k) = \left[-\hat{\mathbf{p}}_{k+1|k}^w + \hat{\mathbf{p}}_{k|k}^w + \hat{\mathbf{v}}_{k+1|k}^w \Delta t - \frac{1}{2}\mathbf{g}^w \Delta t^2\right]_\times \hat{\mathbf{R}}_{b_k}^w + \left[-\frac{1}{2}\hat{\mathbf{p}}_{k+1|k}^w + \frac{1}{2}\hat{\mathbf{p}}_{k|k}^w + \frac{1}{2}\hat{\mathbf{v}}_{k+1|k}^w \Delta t - \frac{1}{6}\mathbf{g}^w \Delta t^2\right]_\times \hat{\mathbf{R}}_{b_k}^w \left[\boldsymbol{\phi}\right]_\times$
$\Phi_{v b_a}(k+1,k) = -\hat{\mathbf{R}}_{b_k}^w \Delta t \left(\mathbf{I} + \frac{1}{2}\left[\boldsymbol{\phi}\right]_\times\right)$
$\Phi_{p b_g}(k+1,k) = -\left[\frac{1}{6}\mathbf{g}^w \Delta t^3\right]_\times \hat{\mathbf{R}}_{b_k}^w + \left[\frac{1}{4}\hat{\mathbf{p}}_{k+1|k}^w \Delta t - \frac{1}{4}\hat{\mathbf{p}}_{k|k}^w \Delta t - \frac{1}{24}\mathbf{g}^w \Delta t^3\right]_\times \hat{\mathbf{R}}_{b_k}^w \left[\boldsymbol{\phi}\right]_\times$
$\Phi_{p b_a}(k+1,k) = -\frac{1}{6}\hat{\mathbf{R}}_{b_k}^w \Delta t^2 \left(3\mathbf{I} + \left[\boldsymbol{\phi}\right]_\times\right)$ (69)
Notice that all of the variables needed are available at the time of calculating the $\Phi_*$ terms above. This model is unbiased up to the information loss of the two-sample fitting of the DCM, which is small thanks to the utilization of all related measurement data.

4.2.3. Process Noise Terms

The properties of the noise terms in Equations (50)–(54) are needed to compute the process noise covariance matrix in a Kalman filter. The process noise covariance at $t_k$ can be computed as [37]:

$\mathbf{Q}(t_k) = \int_{t_k}^{t_{k+1}} \Phi(t_{k+1}, \tau)\,\mathbf{G}(\tau)\,\mathbf{q}\,\mathbf{G}^T(\tau)\,\Phi^T(t_{k+1}, \tau)\, d\tau$ (70)

We temporarily abuse the symbol $\mathbf{q}$ here to represent the noise intensity matrix; $\Phi$ is the overall IMU error state transition matrix. As $\Delta t$ is a small quantity, an approximate expression of Equation (70) is formulated as

$\mathbf{Q}(t_k) \approx \Phi(t_{k+1}, t_k)\,\mathbf{G}(t_k)\,\mathbf{q}\,\mathbf{G}^T(t_k)\,\Phi^T(t_{k+1}, t_k)\,\Delta t$ (71)

The discrete form, which is preferable for a discrete filter implementation, is

$\mathbf{Q}_k \approx \Phi(k+1, k)\,\mathbf{G}_k\,\mathbf{q}\,\mathbf{G}_k^T\,\Phi^T(k+1, k)\,\Delta t$ (72)
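One discrete propagation step can then be sketched as follows; `q_c` is the noise intensity matrix (the diagonal stacking of four per-axis intensities, matching the noise ordering of Equation (29)), and the numeric values are illustrative assumptions.

```python
import numpy as np

def propagate_covariance(P, Phi, G, q_c, dt):
    """One EKF propagation step with the discrete noise of Equation (72)."""
    Q_k = Phi @ G @ q_c @ G.T @ Phi.T * dt    # Equation (72)
    return Phi @ P @ Phi.T + Q_k              # standard covariance propagation

# Illustrative continuous noise intensities (gyro, accel, and their random
# walks), stacked to match [n_g, n_a, n_wg, n_wa] of Equation (29):
sigma = np.array([1.7e-4, 2.0e-3, 2.0e-5, 3.0e-3])
q_c = np.diag(np.repeat(sigma**2, 3))         # 12 x 12 intensity matrix
```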

4.3. Summary

According to the derivation above, the proposed fully linear closed-form IMU error state transition equation is as follows:

$\delta\mathbf{x}_{IMU}(k+1) = \Phi(k+1,k)\,\delta\mathbf{x}_{IMU}(k) + \mathbf{n}_{IMU}(k)$ (73)

where

$\Phi(k+1,k) = \begin{bmatrix} \mathbf{I}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \Phi_{\theta b_g} & \mathbf{0}_{3\times3} \\ \Phi_{v\theta} & \mathbf{I}_{3\times3} & \mathbf{0}_{3\times3} & \Phi_{v b_g} & \Phi_{v b_a} \\ \Phi_{p\theta} & \Delta t\,\mathbf{I}_{3\times3} & \mathbf{I}_{3\times3} & \Phi_{p b_g} & \Phi_{p b_a} \\ \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{I}_{3\times3} & \mathbf{0}_{3\times3} \\ \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{0}_{3\times3} & \mathbf{I}_{3\times3} \end{bmatrix}$ (74)

$\mathbf{n}_{IMU}(k) = \begin{bmatrix} \mathbf{n}_{\theta k}^T & \mathbf{n}_{vk}^T & \mathbf{n}_{pk}^T & \mathbf{n}_{b_g k}^T & \mathbf{n}_{b_a k}^T \end{bmatrix}^T$ (75)

and the covariance matrix of $\mathbf{n}_{IMU}(k)$ is $E\!\left[\mathbf{n}_{IMU}(k)\,\mathbf{n}_{IMU}^T(k)\right] = \mathbf{Q}_k$.
The integration terms are solved using a fitting rule for DCMs that utilizes all of the related measurements, so we claim that the obtained formulation is an unbiased model up to the numerical integration resolution.

5. ORB Descriptor-Assisted Optical Flow Front-End

In this section, we propose a sparse visual front-end using descriptor-assisted optical flow feature tracking.
Different kinds of feature descriptors are used in several VIOs to accomplish feature extraction and matching [1,6,30]. In contrast, other solutions choose optical flow feature tracking as their front-end, since it is less time-consuming than descriptor-based methods [2,23,24]. However, optical flow tracking produces more wrong matches than descriptor-based methods, and these wrong matches remain even after applying elimination algorithms such as random sample consensus (RANSAC). Filter-based VIOs are very sensitive to feature outliers, since they do not eliminate outliers over iterations as optimization-based methods do. Wrong matches that remain will participate in measurement updates, which may deteriorate the estimates or even cause failure. In conclusion, a robust front-end is needed to achieve stable performance in filter-based VIOs, while a real-time solution also calls for fast data association.
Yang et al. [29] refined ORB-SLAM [38] using a sparse optical flow algorithm. Their key idea was to correct the image coordinates of ORB features with the optical flow tracking results to achieve sub-pixel precision. The method proposed here is a bit different: we use optical flow to first conduct fast tracking, then compute the descriptor distance between the members of each matched feature pair to judge whether they are a good match.
There exist plenty of feature descriptor algorithms. We chose the ORB descriptor in our proposed method for two reasons:
  • The ORB descriptor is a binary string, so the distance between two descriptors can be expressed as a Hamming distance, which can be computed efficiently.
  • The rotation between consecutive images in a real-time application is usually very gentle, so invariance to rotation is not very important for a descriptor.

Descriptor Distance Analysis for General Corner Features

The basic visual front-end is based on Shi-Tomasi corner detection [39] and optical flow tracking [40]. It is important to figure out whether the ORB descriptor is meaningful for general Shi-Tomasi corner features, and an experiment proved that it is indeed meaningful statistically. We calculated the feature angle for each Shi-Tomasi feature and then used it to compute the ORB descriptor [32]. Several tests were conducted in the experiment. For each test, feature pairs from every two adjacent images of a continuous image stream were stored separately in two sequences. These tests analyzed the statistical properties of the ORB descriptor distances of feature pairs, including:
  • Coarsely matched feature pairs based on Shi-Tomasi corner detection and optical flow tracking.
  • Relatively strictly matched feature pairs based on ORB descriptor matching and RANSAC.
  • Randomly constructed feature pairs.
  • Unmatched feature pairs generated by reversing the order of one of the strictly matched feature sequences.
The experimental results are shown in Figure 2.
We strongly suspect that the very long tail in Figure 2a is due to wrong matches, because no further outlier rejection was applied after optical flow tracking in this test. In Figure 2b, besides the massive Gaussian-like distribution, a small bump centered at about 17 appears, framed by a red rectangular border. This is because the random pairs were constructed from two adjacent images, so two matched features have a considerable probability of coincidentally forming a pair. These two experiments prove that ORB descriptors and descriptor distances are meaningful for general Shi-Tomasi corners from a statistical standpoint.
In order to clearly analyze the statistical properties of matched and unmatched pairs, two further tests were conducted. First, descriptor-based matching and RANSAC were applied to obtain relatively strictly matched feature pairs. Then, the order of one of the feature sequences was reversed, a simple yet effective way to make the two sequences unmatched. Descriptor distances before and after the order reversal were computed, and the statistical results are shown in Figure 2c,d. It can be seen from the figures that the long tail and the small bump disappear because of the relatively strict pairing rule. They are plotted together in Figure 3 for a clear comparison.
The experimental results show that the descriptor distances of unmatched and matched feature pairs possess significantly different statistical properties. As shown in Figure 3, descriptor distances of unmatched features approximately follow the Gaussian distribution with a mean, or we can say peak, at about 124.7 and with a standard deviation of 21.8 . For matched pairs, the distribution shows a sharper peak at about 18.5 . There is still a tail in the matched distribution, but it is much smaller than the one in Figure 2a. The difference between matched and unmatched pairs is significant enough to design a strategy to filter out wrong matches.
We use a simple heuristic to complete this task (a code sketch follows the list):
  • Feature pairs with distances lower than the smaller peak value are classified as inliers.
  • Feature pairs with distances higher than the bigger peak value are classified as outliers.
  • For feature pairs whose distances lie between the two peaks, the Mahalanobis distances to both peaks are calculated and compared to decide the classification.
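A minimal sketch of this gating heuristic, treating each mode in Figure 3 as a one-dimensional Gaussian so the Mahalanobis distance reduces to |d − μ|/σ. The matched-mode standard deviation is not reported in the text and is an assumption here.

```python
MU_MATCHED = 18.5       # matched-pair peak (Figure 3)
SIG_MATCHED = 10.0      # assumed spread of the matched mode (not reported)
MU_UNMATCHED = 124.7    # unmatched-pair peak (Figure 3)
SIG_UNMATCHED = 21.8    # unmatched-pair standard deviation (Figure 3)

def hamming(d1: bytes, d2: bytes) -> int:
    """Hamming distance between two 256-bit binary ORB descriptors."""
    return sum(bin(a ^ b).count("1") for a, b in zip(d1, d2))

def is_inlier(dist: float) -> bool:
    if dist <= MU_MATCHED:
        return True                    # below the smaller peak: inlier
    if dist >= MU_UNMATCHED:
        return False                   # above the bigger peak: outlier
    # Between the peaks: compare 1-D Mahalanobis distances to both modes.
    return (abs(dist - MU_MATCHED) / SIG_MATCHED
            < abs(dist - MU_UNMATCHED) / SIG_UNMATCHED)
```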

6. EKF-Based VIO Implementation Details and Improvements

In this section, implementation details and improvements of the proposed EKF-based VIO are presented, including filtering scheme, automatic initialization procedure, and refined feature triangulation mechanism. An overall flow chart of the implemented VIO algorithm is shown in Figure 4. Red sections highlight novelties proposed in this paper.

6.1. Filter State and Measurement Model

A VIO following the scheme of Mourikis and Roumeliotis [1] is implemented. The system state includes a sliding window of N historical IMU poses and the camera-IMU extrinsics, as proposed by Li and Mourikis [22]. The overall system state is formulated as

$\mathbf{x}_{all} = \begin{bmatrix} \mathbf{x}_{IMU}^T & (\mathbf{q}_b^c)^T & (\mathbf{p}_c^b)^T & (\mathbf{q}_w^{b_1})^T & (\mathbf{p}_{b_1}^w)^T & \cdots & (\mathbf{q}_w^{b_N})^T & (\mathbf{p}_{b_N}^w)^T \end{bmatrix}^T$ (76)

Therefore, the overall error state of the filter is

$\delta\mathbf{x}_{all} = \begin{bmatrix} \delta\mathbf{x}_{IMU}^T & (\delta\boldsymbol{\theta}^b)^T & (\delta\mathbf{p}_c^b)^T & (\delta\boldsymbol{\theta}_1^w)^T & (\delta\mathbf{p}_{b_1}^w)^T & \cdots & (\delta\boldsymbol{\theta}_N^w)^T & (\delta\mathbf{p}_{b_N}^w)^T \end{bmatrix}^T$ (77)
The measurement residual is linearized about the historical IMU pose errors and the camera-IMU extrinsic errors. The original reprojection error is manipulated first by left-nullspace multiplication to marginalize out the feature position, and second by QR decomposition to decrease the residual dimension without information loss [1]. Furthermore, only residuals passing the Mahalanobis gating test are used in the measurement update. A sketch of these two compressions is given below.
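The sketch assumes stacked Jacobians `H_x` (with respect to the error state) and `H_f` (with respect to the 3D feature position) and the stacked residual `r` from the linearized reprojection model of M observations; the names are illustrative.

```python
import numpy as np

def compress_residual(H_x, H_f, r):
    """Left-nullspace projection then QR compression of the linearized
    reprojection residual r = H_x @ dx + H_f @ dp_f + noise."""
    # 1) Columns of U beyond rank(H_f) = 3 span the left nullspace of H_f;
    #    projecting onto them marginalizes out the feature position error.
    U, _, _ = np.linalg.svd(H_f)          # H_f is 2M x 3
    A = U[:, 3:]                          # 2M x (2M - 3) nullspace basis
    H_o, r_o = A.T @ H_x, A.T @ r
    # 2) Thin QR reduces the residual dimension to the number of state
    #    columns without information loss.
    Q1, R1 = np.linalg.qr(H_o)
    return R1, Q1.T @ r_o                 # feed these into the EKF update
```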

6.2. Automatic Initialization Procedure

An automatic initialization procedure was developed. First, a stationary scene is automatically detected using only the image stream. Second, stationary IMU data are used to initialize the system state. The detailed procedure is described in Algorithm 1.
The algorithm identifies a stationary scene by continuously detecting almost no motion of the tracked features. Then, static gyro data are used to initialize the gyro bias. The rotation matrix $\hat{\mathbf{R}}_w^b$ is computed by aligning gravity in frame b, taken as the mean of the static accelerometer data, with gravity in frame w. This initialization is rough, since the accelerometer bias has not been eliminated, but its uncertainty can be modeled in the initial covariance matrix of the filter state. A sketch of the initialization step is given after Algorithm 1.
Algorithm 1 Automatic initialization procedure

1. Detect stationary scene:
       Counter = 0
       for each image do
           pix_curr = TrackFeatures()
           if Counter == 0 then
               Counter++
               pix_prev = pix_curr
               continue
           end if
           diff = pix_curr − pix_prev
           if max(diff) is small enough then
               Counter++
           else
               Counter = 0
               pix_prev = pix_curr
               continue
           end if
           if Counter is big enough then
               break
           end if
       end for
2. Initialize system state:
       Save stationary accelerometer data in array_acc
       Save stationary gyro data in array_gyro
       g^b = mean(array_acc)
       g^w = [0, 0, 9.8]^T
       R̂_w^b = FromTwoVectors(g^b, g^w)
       b̂_g = mean(array_gyro)
       v̂^w = 0; p̂^w = 0; b̂_a = 0
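A sketch of the state initialization step of Algorithm 1. The `from_two_vectors` helper is assumed to return the rotation taking its first argument onto its second (the argument-order convention may differ from the pseudocode above); it is built here from Rodrigues' formula, and the sample readings are invented for illustration.

```python
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

def from_two_vectors(a, b):
    """Rotation matrix R such that R @ a/|a| = b/|b| (valid unless a ≈ -b)."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v, c = np.cross(a, b), float(np.dot(a, b))
    return np.eye(3) + skew(v) + skew(v) @ skew(v) / (1.0 + c)

acc_mean = np.array([0.31, -0.12, 9.79])    # mean static accelerometer data
gyro_mean = np.array([2e-3, -1e-3, 5e-4])   # mean static gyro data
g_w = np.array([0.0, 0.0, 9.8])             # g^w as used in Algorithm 1

R_wb_hat = from_two_vectors(g_w, acc_mean)  # maps g^w onto the measured g^b
b_g_hat = gyro_mean                         # static gyro mean -> gyro bias
v_hat, p_hat, b_a_hat = np.zeros(3), np.zeros(3), np.zeros(3)
```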

6.3. Refined Feature Triangulation Mechanism

In Mourikis and Roumeliotis's work [1], features are triangulated only when they are no longer being tracked; however, we found that this mechanism does not perform well, especially with cheap IMUs. To conduct frequent and effective measurement updates, which are crucial for correcting biased IMU propagation, a maximum feature tracking length is set. This means each feature is triangulated once it has been tracked for a certain number of frames, even if it is still being tracked. In the latter situation, the current observation is not used in the triangulation.
Generally, features that fail triangulation are discarded directly. In the proposed mechanism, however, if a feature fails triangulation while it is still being tracked, it gets another chance to triangulate when the next image arrives. This mechanism improves performance when the camera is moving slowly, where features in adjacent images exhibit a small parallax that easily causes triangulation failure. A sketch of this policy follows.
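A sketch of the triangulation policy; the track length, the bookkeeping, and the `triangulate` helper (any routine returning the 3D point or None on failure) are assumptions for illustration, not the paper's code.

```python
MAX_TRACK_LEN = 8   # illustrative maximum feature tracking length

def process_feature(observations, still_tracked, triangulate):
    """Decide what to do with one feature track."""
    # When the feature is still tracked, the current (latest) observation
    # is excluded from the triangulation, as described above.
    obs = observations[:-1] if still_tracked else observations
    if still_tracked and len(observations) < MAX_TRACK_LEN:
        return "wait"                        # track not long enough yet
    if triangulate(obs) is not None:
        return "use_in_update"               # feeds the EKF measurement update
    # Triangulation failed: a feature that is still tracked gets another
    # chance on the next image instead of being discarded outright.
    return "retry_next_image" if still_tracked else "discard"
```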

7. Experimental Results

The public dataset EuRoC [33] was used to evaluate the performance of the proposed VIO. It includes 11 sequences collected by a UAV in three different scenes: one is a machine hall, and the other two are rooms equipped with motion capture systems and different manual layouts. The extrinsic and intrinsic parameters of the sensors are carefully calibrated, and ground-truth UAV poses are provided. It is a widely used benchmark for evaluating algorithms of different configurations, including monocular-visual, stereo-visual, and monocular/stereo visual-inertial setups. All of the experiments below were performed on an Ubuntu 16.04 virtual machine hosted on a mid-2015 MacBook Pro, assigned two cores and 8 GB RAM. Our implementation is a real-time algorithm based on ROS nodelets [41].
The estimated trajectories and the corresponding ground truths are shown in Figure 5. Estimated trajectories are aligned with the ground truth by a 6-DOF Sim(3) transformation without adjusting the scale [42]; a sketch of this alignment is given below.
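The alignment can be sketched with the Umeyama method [42] with the scale fixed to 1, i.e., a least-squares rigid-body fit between corresponding positions; function and variable names are illustrative.

```python
import numpy as np

def align_se3(est, gt):
    """Least-squares R, t with gt_i ≈ R @ est_i + t (est, gt: N x 3 arrays)."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)          # 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T                        # proper rotation, det(R) = +1
    t = mu_g - R @ mu_e
    return R, t
```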

7.1. Front-End Improvement

For the implementations of the front-end with and without ORB descriptor assistance, we ran each EuRoC sequence 50 times. A boxplot summary is shown in Figure 6. The corresponding means and standard deviations are listed in Table 1.
With ORB descriptor assistance, the estimator performs better on most sequences: the boxes in Figure 6 become narrower and sit lower. The statistics in Table 1 quantify these results. An obvious improvement can be observed in seven sequences; in the other four, performance is similar with or without ORB descriptor assistance, possibly because optical flow tracking produces few outliers in these sequences.
We also analyzed the processing time of the proposed ORB descriptor-assisted outlier elimination procedure. The maximum feature number is set to 150. The results are listed in Table 2.
As shown in Table 2, the proposed ORB descriptor-assisted outlier elimination procedure introduces little computation. The processing time varies between sequences, mostly due to motion speed: sequences with aggressive motion tend to take less processing time than those with slow motion, since fewer features are tracked and fewer ORB descriptor distances need to be calculated.

7.2. Comparison with MSCKF-MONO

We compare our proposed monocular MSCKF with the open-source monocular MSCKF implementation MSCKF-MONO [23]. MSCKF-MONO has a visual front-end based on optical flow and utilizes a first-order approximation of the state transition equations in filtering. It also applies the observability-constrained Kalman filter (OC-KF) [21], which fixes the wrong observability properties and improves filter performance. Note that our method does not apply any similar technique.
In our experiment, we removed the coarse initialization and disabled the reset module in MSCKF-MONO, because MSCKF-MONO did not work properly on nearly half of the sequences under the original coarse initialization, and the reset does not help if there is no stop during a run. The initial state was assigned from noisy ground truth for both our algorithm and MSCKF-MONO in this experiment. To make a fair comparison, we tried to use the same setup for common parameters in both algorithms, such as sensor measurement noise densities, sliding window size, and maximum/minimum feature track lengths. However, MSCKF-MONO barely worked on any sequence under a setup similar to ours, mainly due to the different state transition model and visual front-end implementations. As further exploration did not yield a setup that generally performed better than the original one for MSCKF-MONO, we left its original parameters unaltered. The comparison results are listed in Table 3.
The results show that the proposed monocular MSCKF is far more accurate than MSCKF-MONO. We attribute this to the more accurate state transition model and the robust visual front-end.

7.3. Comparison with the State-Of-The-Art

The results of the proposed VIO algorithm are compared with several state-of-the-art open-source monocular VIOs on the EuRoC dataset, including OKVIS [6], ROVIO [5], and VINS-MONO [2]. To make a fair comparison between pure VIOs, we turned off loop-closure detection in VINS-MONO. The proposed VIO automatically selects stationary IMU data to initialize the rotation and gyro bias at the beginning of every sequence, while the other states are initialized to zero. In addition, a single parameter configuration is used for all sequences. Results are listed in Table 4.
As shown above, the proposed VIO algorithm is comparable in accuracy to the state-of-the-art. Notice that VINS-MONO generally performs best of all four algorithms, and the proposed algorithm performs similarly in the Vicon rooms, which is due to good feature triangulation results in a limited area. In addition, the proposed algorithm and ROVIO perform better than the others on V1_03 and V2_03. These two sequences contain aggressive motions that can cause tracking failures in the front-end; the proposed algorithm and ROVIO are filter-based methods that can rely on IMU propagation for a short period in this situation, while VINS-MONO and OKVIS sometimes fail and have to fall back on relocalization. Notice that the machine hall is a relatively large-scale scenario [33], where triangulation in the proposed method mostly deals with points of large depth; this results in a relatively downgraded performance of the proposed method in the machine hall, even on sequences with mild motion.

7.4. Processing Time

As mentioned by Delmerico and Scaramuzza [17], the better performance of VINS-MONO is a trade-off that requires more computing resources than the others. In contrast, the proposed method has a similar architecture to MSCKF-MONO, which is a light-weight solution. The average processing times of the visual front-end and the EKF/optimization back-end of our implementation and the state-of-the-art are listed in Table 5.
The results show that the proposed method has a higher processing speed than the listed optimization-based methods. ROVIO is the fastest of all the listed solutions, but as shown in Table 4, its precision is generally the worst. In the proposed method, the visual front-end processes images at about 60 Hz. Notice that V2_03 is a little slower than the others, because its aggressive motions result in short feature tracking lengths, so the front-end spends more time extracting new features. The EKF-based back-end runs at more than 160 Hz; the differences between sequences are due to the different numbers of features used in the measurement update. As can be concluded from Table 4 and Table 5, the proposed method is a VIO solution with comparable precision that generally requires fewer computational resources than the state-of-the-art.

8. Conclusions

In this paper, we first deduced a closed-form IMU error state transition equation from scratch, using Hamilton's notation of quaternion to eliminate notation ambiguity. We then solved the integration terms left in the transition equation by introducing a two-sample fitting method to approximate the axis-angle, resulting in a fully linear closed-form formulation that is unbiased up to the fitting resolution. This formulation also has the potential to incorporate IMU intrinsics into the filter state, since it is a linear function of the IMU measurements. An automatic initialization procedure was developed, and the feature triangulation mechanism was carefully refined. The ORB descriptor distances between Shi-Tomasi corner pairs were analyzed, and we found a statistical difference in descriptor distance between matched and unmatched feature pairs. As outliers are sometimes fatal for filter-based VIOs, this inspired us to propose a visual front-end based on optical flow tracking that additionally uses ORB descriptors to eliminate outliers. We implemented a monocular VIO under the MSCKF framework with the proposed novelties.
Through a comparison between estimation results with and without the proposed outlier elimination method, we demonstrated its effectiveness. Furthermore, an experiment was done to compare the proposed method with several state-of-the-art VIOs, in terms of both precision and computation. Results show that the proposed VIO is a visual-inertial fusion solution with precision comparable to the state-of-the-art but which demands fewer computational resources.
Future works include adding a robust initialization procedure adapting to versatile scenes and analyzing the point selection mechanism in detail.

Author Contributions

X.Q. and H.Z. designed the algorithms. X.Q. deduced all the formulas, analyzed the experimental results and drafted the paper. W.F., C.Z., and Y.J. revised the draft.

Funding

This research work is supported by the National Key Research and Development Program of China (Grant No. 2016YFB0502004 and No. 2017YFC0821102).

Acknowledgments

We would like to thank Sun et al. [24] for their released code. X.Q. would also like to thank Xingwei Qu, Huakun Cui, and Shuhang Liao for their inspiring talks.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Mourikis, A.I.; Roumeliotis, S.I. A multi-state constraint Kalman filter for vision-aided inertial navigation. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Roma, Italy, 10–14 April 2007; pp. 3565–3572. [Google Scholar]
  2. Qin, T.; Li, P.; Shen, S. Vins-mono: A robust and versatile monocular visual-inertial state estimator. IEEE Trans. Robot. 2018, 34, 1004–1020. [Google Scholar] [CrossRef]
  3. Von Stumberg, L.; Usenko, V.; Cremers, D. Direct Sparse Visual-Inertial Odometry using Dynamic Marginalization. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 2510–2517. [Google Scholar]
  4. He, Y.; Zhao, J.; Guo, Y.; He, W.; Yuan, K. PL-VIO: Tightly-Coupled Monocular Visual-Inertial Odometry Using Point and Line Features. Sensors 2018, 18, 1159. [Google Scholar] [CrossRef] [PubMed]
  5. Bloesch, M.; Omari, S.; Hutter, M.; Siegwart, R. Robust visual inertial odometry using a direct EKF-based approach. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; pp. 298–304. [Google Scholar]
  6. Leutenegger, S.; Lynen, S.; Bosse, M.; Siegwart, R.; Furgale, P. Keyframe-based visual-inertial odometry using nonlinear optimization. Int. J. Robot. Res. 2015, 34, 314–334. [Google Scholar] [CrossRef]
  7. Kümmerle, R.; Grisetti, G.; Strasdat, H.; Konolige, K.; Burgard, W. g2o: A general framework for graph optimization. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, 9–13 May 2011; pp. 3607–3613. [Google Scholar]
  8. Kaess, M.; Johannsson, H.; Roberts, R.; Ila, V.; Leonard, J.J.; Dellaert, F. iSAM2: Incremental smoothing and mapping using the Bayes tree. Int. J. Robot. Res. 2012, 31, 216–235. [Google Scholar] [CrossRef]
  9. Liu, H.; Chen, M.; Zhang, G.; Bao, H.; Bao, Y. ICE-BA: Incremental, Consistent and Efficient Bundle Adjustment for Visual-Inertial SLAM. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 1974–1982. [Google Scholar]
  10. Gui, J.; Gu, D.; Wang, S.; Hu, H. A review of visual inertial odometry from filtering and optimisation perspectives. Adv. Robot. 2015, 29, 1289–1301. [Google Scholar] [CrossRef]
  11. Aqel, M.O.; Marhaban, M.H.; Saripan, M.I.; Ismail, N.B. Review of visual odometry: Types, approaches, challenges, and applications. SpringerPlus 2016, 5, 1897. [Google Scholar] [CrossRef] [PubMed]
  12. Strasdat, H.; Montiel, J.; Davison, A.J. Real-time monocular SLAM: Why filter? In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Anchorage, AK, USA, 3–7 May 2010; pp. 2657–2664. [Google Scholar]
  13. Triggs, B.; McLauchlan, P.F.; Hartley, R.I.; Fitzgibbon, A.W. Bundle adjustment—A modern synthesis. In Proceedings of the 1999 International Workshop on Vision Algorithms, Corfu, Greece, 20–25 September 1999; Springer: Berlin, Germany, 1999; pp. 298–372. [Google Scholar]
  14. Lourakis, M.I.; Argyros, A.A. SBA: A software package for generic sparse bundle adjustment. ACM Trans. Math. Softw. (TOMS) 2009, 36, 2. [Google Scholar] [CrossRef]
  15. Hsiung, J.; Hsiao, M.; Westman, E.; Valencia, R.; Kaess, M. Information Sparsification in Visual-Inertial Odometry. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018. [Google Scholar]
  16. Agarwal, S.; Mierle, K. Ceres Solver. Available online: http://ceres-solver.org (accessed on 16 August 2018).
  17. Delmerico, J.; Scaramuzza, D. A Benchmark Comparison of Monocular Visual-Inertial Odometry Algorithms for Flying Robots. Memory 2018, 10, 20. [Google Scholar]
  18. Eade, E.; Drummond, T. Scalable monocular SLAM. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), New York, NY, USA, 17–22 June 2006; Volume 1, pp. 469–476. [Google Scholar]
  19. Li, M.; Mourikis, A.I. Improving the accuracy of EKF-based visual-inertial odometry. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation (ICRA), St Paul, MN, USA, 14–18 May 2012; pp. 828–835. [Google Scholar]
  20. Hesch, J.A.; Kottas, D.G.; Bowman, S.L.; Roumeliotis, S.I. Observability-Constrained Vision-Aided Inertial Navigation; Technical Report; University of Minnesota, Department of Computer Science & Engineering: Minneapolis, MN, USA, 2012; Volume 1, p. 6. [Google Scholar]
  21. Huang, G.P.; Mourikis, A.I.; Roumeliotis, S.I. Observability-based rules for designing consistent EKF SLAM estimators. Int. J. Robot. Res. 2010, 29, 502–528. [Google Scholar] [CrossRef]
  22. Li, M.; Mourikis, A.I. High-precision, consistent EKF-based visual-inertial odometry. Int. J. Robot. Res. 2013, 32, 690–711. [Google Scholar] [CrossRef]
  23. Daniilidis Group. MSCKF-MONO. Available online: https://github.com/daniilidis-group/msckf_mono (accessed on 16 August 2018).
  24. Sun, K.; Mohta, K.; Pfrommer, B.; Watterson, M.; Liu, S.; Mulgaonkar, Y.; Taylor, C.J.; Kumar, V. Robust stereo visual inertial odometry for fast autonomous flight. IEEE Robot. Autom. Lett. 2018, 3, 965–972. [Google Scholar] [CrossRef]
  25. Zheng, X.; Moratto, Z.; Li, M.; Mourikis, A.I. Photometric patch-based visual-inertial odometry. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 3264–3271. [Google Scholar]
  26. Zheng, F.; Tsai, G.; Zhang, Z.; Liu, S.; Chu, C.C.; Hu, H. Trifo-VIO: Robust and Efficient Stereo Visual Inertial Odometry using Points and Lines. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018. [Google Scholar]
  27. Trawny, N.; Roumeliotis, S.I. Indirect Kalman Filter for 3D Attitude Estimation; Technical Report; University of Minnesota, Department of Computer Science & Engineering: Minneapolis, MN, USA, 2005; Volume 2. [Google Scholar]
  28. Sommer, H.; Gilitschenski, I.; Bloesch, M.; Weiss, S.M.; Siegwart, R.; Nieto, J. Why and How to Avoid the Flipped Quaternion Multiplication. Aerospace 2018, 5, 72. [Google Scholar] [CrossRef]
  29. Yang, N.; Wang, R.; Gao, X.; Cremers, D. Challenges in monocular visual odometry: Photometric calibration, motion bias, and rolling shutter effect. IEEE Robot. Autom. Lett. 2018, 3, 2878–2885. [Google Scholar] [CrossRef]
  30. Mur-Artal, R.; Tardós, J.D. Visual-inertial monocular SLAM with map reuse. IEEE Robot. Autom. Lett. 2017, 2, 796–803. [Google Scholar] [CrossRef]
  31. Bloesch, M.; Burri, M.; Omari, S.; Hutter, M.; Siegwart, R. Iterated extended Kalman filter based visual-inertial odometry using direct photometric feedback. Int. J. Robot. Res. 2017, 36, 1053–1072. [Google Scholar] [CrossRef]
  32. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 2564–2571. [Google Scholar]
  33. Burri, M.; Nikolic, J.; Gohl, P.; Schneider, T.; Rehder, J.; Omari, S.; Achtelik, M.W.; Siegwart, R. The EuRoC micro aerial vehicle datasets. Int. J. Robot. Res. 2016, 35, 1157–1163. [Google Scholar] [CrossRef]
  34. Titterton, D.; Weston, J.L.; Weston, J. Strapdown Inertial Navigation Technology; IET: Stevenage, UK, 2004; Volume 17. [Google Scholar]
  35. Solà, J. Quaternion Kinematics for the Error-State Kalman Filter; Technical Report; Laboratoire d'Analyse et d'Architecture des Systèmes-Centre National de la Recherche Scientifique (LAAS-CNRS): Toulouse, France, 2017. [Google Scholar]
  36. Qin, Y. Inertial Navigation; Science Press: Berlin, Germany, 2006. (In Chinese) [Google Scholar]
  37. Qin, Y.; Zhang, H.; Wang, S. Kalman Filtering and Integrated Navigation Principles, 3rd ed.; Northwestern Polytechnical University Press: Xi’an, China, 2015. (In Chinese) [Google Scholar]
  38. Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. IEEE Trans. Robot. 2017, 33, 1255–1262. [Google Scholar] [CrossRef]
  39. Shi, J.; Tomasi, C. Good Features to Track; Technical Report; Cornell University: Ithaca, NY, USA, 1993. [Google Scholar]
  40. Bouguet, J.Y. Pyramidal implementation of the affine lucas kanade feature tracker description of the algorithm. Intel Corp. 2001, 5, 4. [Google Scholar]
  41. Quigley, M.; Conley, K.; Gerkey, B.; Faust, J.; Foote, T.; Leibs, J.; Wheeler, R.; Ng, A.Y. ROS: An open-source Robot Operating System. In Proceedings of the ICRA Workshop on Open Source Software, Kobe, Japan, 17 May 2009; Volume 3, p. 5. [Google Scholar]
  42. Umeyama, S. Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 4, 376–380. [Google Scholar] [CrossRef]
Figure 1. Rotation of frame A into frame B.
Figure 2. Statistical distribution of ORB descriptor [32] distances for coarsely matched, strictly matched, randomly constructed, and unmatched Shi-Tomasi feature pairs. The X axis represents descriptor distances and ranges from 0 to 255. The range of the Y axis is determined by the number of feature pairs in each experiment. (a) Coarsely matched features. (b) Randomly constructed features. (c) Strictly matched features. (d) Unmatched features.
Figure 3. Descriptor distances of unmatched and matched feature pairs. It can be clearly seen that the difference is statistically significant, so a heuristic algorithm can be used to pick out outliers.
Figure 4. Flow chart of the extended Kalman filter (EKF)-based visual-inertial odometry (VIO) implementation. Red sections highlight novelties proposed in this paper. The term "IMU" stands for inertial measurement unit, and "MSCKF" stands for multi-state constraint Kalman filter.
Figure 5. Results of 4 EuRoC sequences classified as "difficult". Estimated trajectories are aligned with ground truths by a 6-DOF Sim(3) transformation (without scale). (a) MH_04_difficult. (b) MH_05_difficult. (c) V1_03_difficult. (d) V2_03_difficult.
Figure 6. Boxplot summary of experimental results in terms of translation root-mean-square errors (RMSEs) of estimated trajectories. As can be seen, with ORB descriptor assistance the estimation is generally of higher precision, reflected in the lower position and narrower height of the corresponding box's range for most sequences.
Table 1. Mean and standard deviation of the RMSEs (m) in Figure 6. For each sequence, the configuration with an obviously better performance was highlighted in the original table.

| Sequence | Pure Optical Flow, mean (std) | ORB Assisted, mean (std) |
|---|---|---|
| MH_01 | 0.309 (0.076) | 0.294 (0.055) |
| MH_02 | 0.297 (0.065) | 0.273 (0.056) |
| MH_03 | 0.381 (0.050) | 0.330 (0.048) |
| MH_04 | 0.435 (0.071) | 0.366 (0.058) |
| MH_05 | 0.393 (0.051) | 0.391 (0.046) |
| V1_01 | 0.108 (0.026) | 0.104 (0.018) |
| V1_02 | 0.082 (0.012) | 0.082 (0.010) |
| V1_03 | 0.130 (0.018) | 0.131 (0.017) |
| V2_01 | 0.162 (0.057) | 0.127 (0.030) |
| V2_02 | 0.137 (0.019) | 0.134 (0.019) |
| V2_03 | 0.248 (0.047) | 0.231 (0.039) |
Table 2. Mean processing time (ms) per image of the proposed ORB descriptor-assisted outlier elimination procedure.

| Sequence | Processing Time (ms) |
|---|---|
| MH_01 | 1.3942 |
| MH_02 | 1.6480 |
| MH_03 | 1.3373 |
| MH_04 | 1.3983 |
| MH_05 | 1.0870 |
| V1_01 | 1.3297 |
| V1_02 | 1.0410 |
| V1_03 | 0.9574 |
| V2_01 | 1.2506 |
| V2_02 | 1.0525 |
| V2_03 | 0.7465 |
Table 3. Comparison of the proposed algorithm and MSCKF-MONO on the EuRoC dataset. Means of positioning RMSEs (m) over 10 runs for both algorithms; × denotes failure.

| Sequence | MSCKF-MONO | Proposed |
|---|---|---|
| MH_01 | 1.015 | 0.299 |
| MH_02 | 0.534 | 0.280 |
| MH_03 | 0.427 | 0.342 |
| MH_04 | 2.102 | 0.350 |
| MH_05 | 0.968 | 0.384 |
| V1_01 | 0.169 | 0.096 |
| V1_02 | 0.275 | 0.078 |
| V1_03 | 1.551 | 0.132 |
| V2_01 | 0.281 | 0.121 |
| V2_02 | 0.341 | 0.137 |
| V2_03 | × | 0.224 |
Table 4. Results of the proposed and state-of-the-art VIOs on the EuRoC dataset: means of positioning RMSEs (m) over ten runs on each sequence.

| Sequence | VINS-MONO | ROVIO | OKVIS | Proposed |
|---|---|---|---|---|
| MH_01 | 0.159 | 0.250 | 0.376 | 0.289 |
| MH_02 | 0.182 | 0.653 | 0.378 | 0.258 |
| MH_03 | 0.199 | 0.449 | 0.277 | 0.331 |
| MH_04 | 0.350 | 1.007 | 0.323 | 0.394 |
| MH_05 | 0.313 | 1.448 | 0.451 | 0.423 |
| V1_01 | 0.090 | 0.159 | 0.087 | 0.117 |
| V1_02 | 0.110 | 0.198 | 0.157 | 0.089 |
| V1_03 | 0.188 | 0.172 | 0.224 | 0.134 |
| V2_01 | 0.089 | 0.299 | 0.132 | 0.097 |
| V2_02 | 0.163 | 0.642 | 0.185 | 0.140 |
| V2_03 | 0.305 | 0.190 | 0.305 | 0.211 |
Table 5. Average processing time (ms) and rate (Hz), written as time / rate, of the visual front-end and the EKF/optimization back-end of our implementation and the state-of-the-art on the EuRoC dataset.

| Sequence | VINS-MONO front | VINS-MONO back | ROVIO front | ROVIO back | OKVIS front | OKVIS back | Proposed front | Proposed back |
|---|---|---|---|---|---|---|---|---|
| MH_01 | 18.0 / 55 | 50.2 / 20 | 2.0 / 505 | 15.9 / 63 | 46.7 / 21 | 39.8 / 25 | 16.2 / 62 | 5.5 / 182 |
| MH_02 | 18.3 / 55 | 50.9 / 20 | 1.9 / 526 | 15.9 / 63 | 45.3 / 22 | 39.4 / 25 | 16.5 / 61 | 5.9 / 169 |
| MH_03 | 18.6 / 54 | 50.1 / 20 | 2.0 / 497 | 15.9 / 63 | 47.4 / 21 | 39.9 / 25 | 15.9 / 63 | 6.1 / 164 |
| MH_04 | 19.3 / 52 | 50.1 / 20 | 2.1 / 476 | 15.9 / 63 | 40.9 / 24 | 32.1 / 31 | 16.1 / 62 | 5.5 / 181 |
| MH_05 | 21.3 / 47 | 53.0 / 19 | 2.0 / 490 | 15.7 / 63 | 41.4 / 24 | 33.1 / 30 | 15.7 / 64 | 6.0 / 166 |
| V1_01 | 20.2 / 49 | 53.1 / 19 | 1.9 / 538 | 15.9 / 63 | 38.5 / 26 | 30.6 / 33 | 15.7 / 64 | 5.7 / 174 |
| V1_02 | 21.4 / 47 | 45.9 / 22 | 2.0 / 508 | 15.9 / 63 | 38.8 / 26 | 25.5 / 39 | 15.3 / 65 | 5.4 / 185 |
| V1_03 | 23.2 / 43 | 37.9 / 26 | 2.1 / 481 | 15.9 / 63 | 31.3 / 32 | 19.2 / 52 | 16.4 / 61 | 4.9 / 203 |
| V2_01 | 22.3 / 45 | 54.4 / 18 | 2.0 / 503 | 15.9 / 63 | 38.8 / 26 | 29.6 / 34 | 15.8 / 63 | 5.7 / 176 |
| V2_02 | 23.8 / 42 | 48.3 / 21 | 2.0 / 510 | 15.9 / 63 | 37.3 / 27 | 27.9 / 36 | 15.9 / 63 | 5.6 / 178 |
| V2_03 | 30.6 / 33 | 33.4 / 30 | 2.0 / 478 | 15.9 / 63 | 31.4 / 32 | 18.0 / 56 | 17.3 / 58 | 4.6 / 218 |
