Online IMU Self-Calibration for Visual-Inertial Systems

Low-cost microelectro mechanical systems (MEMS)-based inertial measurement unit (IMU) measurements are usually affected by inaccurate scale factors, axis misalignments, and g-sensitivity errors. These errors may significantly influence the performance of visual-inertial methods. In this paper, we propose an online IMU self-calibration method for visual-inertial systems equipped with a low-cost inertial sensor. The goal of our method is to concurrently perform 3D pose estimation and online IMU calibration based on optimization methods in unknown environments without any external equipment. To achieve this goal, we firstly develop a novel preintegration method that can handle the IMU intrinsic parameters error propagation. Then, we frame IMU calibration problem into general factors so that we can easily integrate the factors into the current graph-based visual-inertial frameworks and jointly optimize the IMU intrinsic parameters as well as the system states in a big bundle. We evaluate the proposed method with a publicly available dataset. Experimental results verify that the proposed approach is able to accurately calibrate all the considered parameters in real time, leading to significant improvement of estimation precision of visual-inertial system (VINS) compared with the estimation results with offline precalibrated IMU measurements.


Introduction
Vision-based motion estimation and 3D reconstruction is a very active field of research in robotics and computer vision communities over the last decades due to its potential applications, such as robot navigation, autonomous driving, augmented reality (AR) and virtual reality (VR). Different sensor setups can be used for vision-based motion estimation: monocular [1,2], stereo [3,4], RGB-D [5,6] and visual-inertial [7,8]. Among these, the combination of cameras and inertial sensors, also known as visual-inertial system (VINS), has attracted more and more attention in recent years [7][8][9][10][11][12][13][14][15][16][17][18][19][20][21]. On the one hand, camera and inertial measurements offer complementary nature, and the combination of these two sensors can lead to significant improvements in accuracy of the estimation results and robustness of the system. On the other hand, with the development of microelectro mechanical systems (MEMS) inertial sensors, the cost of inertial measurement unit (IMU) steadily decline and an increasing number of robot platforms, drones and consumer electronics, especially mobile devices, are equipped with low-cost MEMS-based IMUs.
Due to the imperfections of the manufacturing and physical characteristics of the sensors, real IMU measurements are usually affected by systematic errors as well as random noise [22,23]. The process of identifying the intrinsic parameters of sensors for compensating these errors is known as IMU calibration. Although VINS have seen tremendous improvements in accuracy, robustness and efficiency over recent years, most visual-inertial methods either treat these measurement errors as sensor noise or assume the IMU that used is precalibrated offline. For high performance inertial sensors, which generally are manufactured precisely or factory calibrated carefully with each sensor is sold with its own calibration parameters stored into the firmware, providing accurate measurements off the shelf [23], this assumption is reasonable. However, for low-cost MEMS-based inertial sensors, which are usually used for consumer mobile robots, neglecting these errors may significantly influence the performance of VINS or even breakdown the whole system. Moreover, offline IMU calibration have certain shortcomings. Traditional high precision calibration process heavily rely on special external equipment such as robot manipulator [24][25][26], high accuracy turntable [27], motion tracked system [28]. Such equipment is usually very expensive. As a result, the cost of one calibration process often exceeds the cost of MEMS-based IMU. Secondly, the IMU measurement models used for calibration are usually simplified linear models, the intrinsic parameters of which may vary with mechanical shocks, temperature and other factors. Treating these parameters as constant would lead to unmodeled errors and degenerate the performance of the VINS. Therefore, even high quality offline-calibration is performed, this process still needs to be repeated periodically. Moreover, performing offline calibration often needs special operators and is time consuming, which hinders the rapid deployment of VINS to consumer devices.
To this end, we propose an online IMU self-calibration method for VINS. Specifically, we are interested in concurrently performing 3D pose estimation and online IMU calibration in unknown environment, using only camera and inertial measurements, and without any knowledge about the structure of the scene or any external equipment.
We implement the method with the extension of an open source monocular VINS pipeline, VINS-Mono [8], and demonstrate that, in this setting, it is possible to concurrently localize and perform online calibration of all of the following quantities: the axis misalignments and scale factors of the IMU sensors, the g-sensitivity (also called linear acceleration dependence) of the gyroscopes. What we have to claim is that although our implementation is based on VINS-Mono, the proposed online IMU self-calibration method can be easily integrated into other graph-based visual/visual-inertial frameworks as well as the multi-camera cases. To the best of our knowledge, this is the first paper to present a VINS pipeline with online IMU self-calibration based on optimization methods. We highlight our main contributions as follows: 1.
An online IMU self-calibration method for visual-inertial system that can perform in real time in unknown environment without any external equipment.

2.
We frame the IMU calibration problem into general factors so that the proposed method can be easily integrated into other graph-based visual/visual-inertial frameworks.

3.
An extensive evaluation on publicly available dataset showing that the online calibration can obtain accurate IMU intrinsics estimation and significantly improve the estimation precision of the VINS method compared with the estimation results with offline precalibrated IMU measurements.
The rest of this paper is organized as follows. In Section 2, we discuss the relevant literature. The algorithm as well as the implementation details are introduced in Section 3. The experimental results are shown in Section 4. Finally, the paper is concluded with the discussion and possible future research directions in Section 5.

Related Work
Over the last decade, there has been tremendous progress in VINS. Existing approaches can be classified into filtering-based methods [9][10][11][12][13][14][15][16] and graph-based methods [7,8,[17][18][19][20][21]. Filtering-based methods usually process the fusion with the extended Kalman filter (EKF) and its variants, in which, the measurements from IMU are used for state propagation and observations from vision are used for update. Filtering-based methods have the advantage of fast processing since it contentiously marginalizes past states. However, their performance are usually sub-optimization due to early fix of linearization points. Compared to filter-based methods, graph-based methods have attracted more attention in the recent years, as they obtain more accurate estimation results by maintaining measurements of long term history and perform batch optimization to obtain the optimal estimate. To achieve real time operation, graph-based methods usually apply keyframe-based techniques and optimize over a bounded-size sliding window of recent states by marginalizing past states and measurements. Nevertheless, all of these methods above either treat the IMU measurement errors as sensor noise or assume the IMU that used is precalibrated offline except the work in [13], which, to the best of our knowledge, is the only VINS method that implement IMU and camera online self-calibration in a filter-based framework. However, only simulation results were shown in this paper. Besides, their method cannot be applied for graph-based VINS methods.
For offline IMU calibration, numerous methods have been proposed over the last several decades, with the development of strapdown inertial navigation system or AHRS [22]. As we are concerned with online self-calibration, here we only give a brief review of these methods. Traditionally IMU calibration methods are usually done by using special external equipment that could provide known reference acceleration or rotational velocity of the inertial sensor. The measurements of the sensor are compared directly with the known reference value to determine the intrinsic parameters. Apparently, the accuracy of these methods strongly relies on the accuracy of the kinematic reference. To obtain highly accurate results, expensive mechanical platforms, such as robot manipulator or high accuracy turntable [24][25][26][27] are usually used, resulting in a calibration cost that often exceeds the cost of the IMU's hardware. In [28] the authors exploited using a marker-based optical tracking system to provide the reference value, while in [29], the GPS readings are used to calibrate initial biases and misalignments. In order to achieve in-situ calibration, the multi-position method was firstly introduced in [30], using the fact that the magnitude of the static acceleration must equal to the gravity's magnitude. However, this method can only calibrate the bias and scale factors of the accelerometer. This technique has been further extended by [31], in which, a method with two calibration schemes was presented. The accelerometer is firstly calibrated by exploiting the high local stability of the gravity vector's magnitude, and then the gyroscope is calibrated by comparing the gravity vector sensed by the calibrated accelerometer with the gravity vector obtained by integrating the angular velocities. The main advantage of this method is that it do not require any external mechanical equipment. A similarly approach also can be found in [23]. A self-calibration method based on an iterative matrix factorization was proposed by Hwangbo et al. [32]. this method use gravity as accelerometers reference, and a camera as gyroscopes reference. A noteworthy work in the VINS domain is the kalibr toolbox by Rehder et al. [33], in which, multiple camera-IMU extrinsic as well as the IMU intrinsics are jointly optimized in a single estimator with the continues-time batch optimization framework [33,34]. Impressive results were demonstrated in this work. However, their method cannot be integrated directly in the presented discrete-time graph-based VINS methods.
Motivated by IMU preintegration theories developed by [8,[35][36][37][38], we firstly develop a novel preintegration method to handle the IMU intrinsic parameters error propagation. Then we frame the IMU preintegration measurements and IMU intrinsic parameters into general factors. In such a way, we can implement an optimization method to jointly estimate the IMU intrinsic parameters as well as the system state in a big bundle, which makes our method totally distinguished from the method in [13]. Compared to the the offline IMU calibration methods, our method integrates into the current VINS method and performs online IMU self-calibration, without any special equipment.

Visual-Inertial System
Consider a sensing system (e.g., a mobile robot or a hand-held device) equipped with a monocular camera and a low-cost uncalibrated IMU. Our goal is to concurrently estimate the states of the sensing system, as well as the IMU intrinsics in a tightly-coupled graph-based framework. A general visual-inertial system is illustrated in Figure 1, in which, several keyframes and IMU measurements are maintained in a sliding window. When a new keyframe is inserted into the sliding window, a local bundle adjustment (BA) would be implemented to jointly optimize the camera and IMU states, as well as the feature location. After optimization, one of the old keyframe as well as its corresponding states and measurements would be marginalized from the sliding windows, limiting the sliding windows size and reducing computation complexity.  Figure 1. Illustration of visual-inertial system. Several keyframes and inertial measurement unit (IMU) measurements are maintained in a sliding window. When a new keyframe is inserted into the sliding window, a local bundle adjustment (BA) would be implemented to jointly optimize the camera and IMU states, as well as the feature location.
The notations and frame definitions used in this paper are briefly described as follows. We denote the world frame and the body frame (BF) as (·) w and (·) b , respectively. The z-axis of the world frame is aligned with the direction of the gravity. The BF is defined to be the same as the accelerometer orthogonal frame (AOF). We primarily use quaternions q with Hamilton convention to represent rotation, but rotation matrices R are also used for the convenient representation of 3D vectors rotation. q w b represents the orientation of body frame with respect to the word frame. R w b is the corresponding rotation matrix of q w b . The parameter p w b is the translation from the world frame to the body frame, being expressed in word frame, b k is the body frame while taking the k-th image, g w and g b are the gravity vectors in the world frame and body frame, respectively.
The full state vector in the sliding window is defined as where x k is the IMU state at the k-th image. It contains position, orientation and velocity of the IMU as well as the accelerometer bias b a and gyroscope bias b ω . The parameter x I is the IMU intrinsic parameters, f l is the feature location, which can be parameterized by either the 3D position or inverse depth of the feature, n is the number of keyframes in the sliding window, m is the number of features which have been observed at least by two frames in the sliding window.
After the VINS was successfully initialized with the method proposed in [8,39], we were able to use a sliding window based framework to jointly optimize both the full system states as described in Equation (1). We minimize the sum of prior and the Mahalanobis norm of all measurement residuals to obtain a maximum posteriori estimation as where {r p , H p } are the prior information from marginalization. r B (ẑ b k b k+1 X ) and r C (ẑ c j l , X ) are residuals for the IMU and visual measurements respectively, where B is the set of all IMU measurements and C is the set of features. The parameter r I (x) is the residuals for IMU intrinsic parameters.
As our goal is to estimate the IMU intrinsic parameters online, in this paper, we assume the translation and orientation between camera and IMU are fixed and known from prior calibration. Furthermore, we assume there is a vision front-end for feature detection and tracking, providing the pixel measurements of the features. Therefore, we will neglect the residuals of vision measurements r C (ẑ c j l , X ) in this paper and only detail the residuals of IMU measurements r B (ẑ b k b k+1 X ) as well as the IMU intrinsics r I (x) in the following.

IMU Model
A six-degree-of-freedom IMU composes by a triple axis accelerometer and a triple axis gyroscope. Ideally, the two triple-axis define a single, shared, orthogonal 3D frame. The accelerometer senses the acceleration of the sensor along of each input axis, while gyroscope measures the angular velocity around each input axis. The scale factors convert analog or digital output of the sensor to real physical quantity measurements. Unfortunately, due to assembly imperfections, each axis of the accelerometer and gyroscope deviates by a small angle from its designed mounting direction, as a result, introducing axis misalignment and cross-axis sensitivity errors to the measurements. Also, real scale factor of each axis is unique and different from the nominal value provided by the manufacturer. In additional, the angular velocities measurements are affected by the accelerations that the sensor is subjected. Moreover, all of the measurements are almost always affected by variable bias. In order to model and compensate these errors, we employ the IMU measurement model as described in [33].
where a m and ω m are the measurement of accelerometer and gyroscope, respectively, S a is a 3 × 3 diagonal matrix constructed by the scale factors of the triple axis, M a is a 3 × 3 lower triangular matrix representing the axis misalignments and the cross-coupling terms, and S ω and M ω are defined analogously to S a and M a . Each row of M a and M ω is a unit vector. The parameter A ω is a full populated matrix representing the g-sensitivity coefficients of gyroscope, a b wb and ω b wb are the body frame acceleration and angular velocity with respect to the word frame, being expressed in the body frame, b a and b ω is the measurement bias of accelerometer and gyroscope respectively, n a , n ω is the measurement noise. We assume that n a , n ω are Gaussian white noise, n a ∼ N (0, σ 2 a ), n ω ∼ N (0, σ 2 ω ). For the derivations of Equations (3) and (4), please refer to [33].
The accelerometer and gyroscope bias are modeled as random walks, the derivatives of which are Gaussian white noise, It is inconvenient to optimize the M a and M ω as each row of these two matrices is constrained to a unit vector. Besides, when integrating the angular velocity and acceleration to obtain the motion of the system, we need to compute the inverse matrix of S a , M a , S ω , M ω at every step to compensate the measurement errors, which is a waste of computing resources. If defining T a = (S a M a ) −1 , T ω = (S ω M ω ) −1 , we could get the compensated acceleration and angular velocity from the measurements more effectively: where, T a is still a lower triangular matrix, T ω is a full populated matrix without constraint.
For an uncalibrated IMU, T a , T ω is roughly close to identify matrix with some uncertainty. A ω is close to 0.

IMU Preintegration
The IMU preintegration method was firstly introduced in [35], which parameterize the rotation error using Euler angles. This method was further improved in [8] with the continuous-time quaternion-based derivation and including the handling of IMU biases. The intuitive concept of IMU preintegration is illustrated in Figure 2. The proposed an IMU preintegration method that is motivated by [8], but is different from the previous work as we introduce the IMU intrinsic parameters in the measurement model. Therefore, we need to re-derive all of the equations from scratch.

Images
Illustration of the concept of IMU preintegration. The IMU measures the acceleration and angular velocity at discrete time. Generally, The IMU measurement frequency is much larger than the frame rate of camera. Giving two time consecutive frames b k and b k+1 , there exists several measurements in time interval [t k , t k+1 ]. The IMU measurements between image frames k and k + 1 are pre-integrated into a single compound measurement. From an optimization perspective, it can be seen as the observations or measurements with which we can construct the error terms and adjust the corresponding state values to minimize the errors.
Theoretically, giving two time instants at images frames k and k + 1, we can compute the states at t k+1 as a function of IMU measurements between two frames: where ∆t k is the duration of time interval [t k , t k+1 ]. While Equations (8)-(10) provide an estimation of motion between time t k and t k+1 , it can be seen that the IMU state propagation requires the rotation, position and velocity of frame k as the starting states. In the optimization-based algorithm, we adjust the full system states at every iteration step, which means these starting states may change. In such cases, we need to re-integrate the IMU measurements, which is computation consuming. To avoid this, we adopt the preintegration method.
Premultiply q b k w or R b k w to both sides of Equations (8)- (10) to change the reference frame for the IMU propagation to the local frame b k : where The preintegration parts α can be obtained solely with IMU measurements and its intrinsic parameters. These preintegrated measurements construct the constraints between the states at time t k and the states at time t k+1 . Therefore, we can construct the error terms with these preintegrated IMU measurements, and adjust the corresponding state values to minimize the errors base on optimization method.
It can be seen that the preintegration measurements in Equations (14)- (16) are also the function of the IMU intrinsic parameters, T a , T ω , A ω as well as the bias b a , b ω . As a result, the imperfection of these parameters would introduce errors into the measurements. This also means that we can concurrently optimize the IMU intrinsic parameters to minimize the errors constrained by the preintegration measurements. To achieve this goal, two problems need to be dealt with:

1.
Compute the Jacobian matrix of α with respect to the IMU intrinsic parameters. The Jacobian matrix will be used in two ways: (1) for the optimization method, the Jacobian matrix would be used to evaluate the increment of the input arguments; (2) at the end of each iteration step, we use Jacobian as well as the new adjusted states will be used to compute the first-order approximation of preintegration measurement so that we can avoid re-integrating the IMU measurements.

2.
Estimate the uncertainty of the preintegration measurements. All of the IMU measurements contain random noises. These noises, as well as the the inaccuracy of the IMU intrinsic parameters, introduce the uncertainty to the preintegration results. In order to get more precise optimization results, we need to estimate the uncertainty of the preintegration measurements in the form of a covariance matrix or information matrix and use these uncertainty information to weight the error terms during optimization.
These two problems will be tackled in the next section.

Jacobian and Noise Propagation
In order to compute the Jacobian matrix and estimate the uncertainty of preintegration results, we divide the true state value of preintegration result x into two components, the nominal statex and the error state δx. The nominal state is estimated by integrating the IMU measurements directly, without taking into account the noise terms and imperfect intrinsic parameters or unmodeled error. As a results, it will accumulate errors. These errors are collected into the error state. In this way, the true state can be expressed as a composition of the nominal state and error state, that is, where ⊕ is the composition operator. Since the quaternion q is over-parameterized, we define its error term with the minimal representation δθ. As a result, for quaternion, ⊕ is defined as where [·] xyz extracts the vector part of a quaternion q. For other states,x ⊕ δx x + δx.
The concept of nominal-state and error-state is similar to the error-state Kalman filter (ESKF) [40,41]. The main difference is that we need to tackle the imperfection of the IMU intrinsic parameters. This is a nontrivial extension as it introduces 24 more parameters into the state space, which makes the equations more complex compared to the general ESKF and preintegration methods showed in [8,37,38].
The nominal-state is computed as follows: As the real IMU measures the acceleration and angular velocity at discrete time, Equations (19)-(21) can be computed by numberical integraion methods such as Euler, mid-point or Runge-Kutta methods. The mid-point integration is used in our implementation code, which is detailed in the Appendix A.2. At the beginning of integration, is set to identity quaternion. With the computation above, all the large-signal dynamics have been integrated in the nominal-state. As a result, the error-state is always small. We can solve its kinematics and neglect all second-order infinitesimals to get the continuous-time linearized dynamics.
Details about how to get the error-state kinematic representation above is given in the Appendix A.1.
In order to compute the the Jacobian matrix and covariance recursively, we need to convert Equations (22)-(24), (26) and (27) to matrix representation. This takes a little trick. where With T a ij , T ω ij , A ω ij are the corresponding elements of the matrix T a , T ω , A ω , respectively. Assume r 3×1 = [r 1 , r 2 , r 3 ] T . Now we define the [·] T operator as With the definitions above, Equations (22)- (24), (26) and (27) can be converted to matrix representation form where With the Equation (29), it is convenient to compute the the covariance matrix P b k b k+1 recursively by the first order discrete-time covariance update where δt is the time between two IMU measurements, and Q is the diagonal covariance matrix of noise (σ 2 a , σ 2 g , σ 2 b a , σ 2 b g ). The initial covariance value corresponding to α, β, γ, b a , b ω is set to 0. For T a , T ω , A ω , the initial covariance can be set to 0 or a constant uncertainty, which will be discussed in Section 3.6. Now we deal with the Jacobian matrix. Neglecting the noise term, Equation (29) can be converted to a discrete-time representation as following: As a result, the Jacobian of δX pre (t + δt) with respect to δX pre (t) can be obtained conveniently: We can compute J k+1 k recursively: where i and j represent the i-th and j-th IMU measurement obtained at the time instant of frame k and k + 1, respectively, as illustrated in Figure 2.
The first order approximation of α b k+1 , β b k+1 , γ b k+1 with respect to the IMU intrinsic parameters can be computed as: where J α b a is the sub-block matrix of J b k+1 whose location is corresponding to The same meaning is also used for other J in the above. At each step of optimization, we can use Equations (34)- (36) to update the value of α, β, γ with the new estimated IMU intrinsic parameters and recompute the corresponding residuals.

IMU Measurement Factor
Considering two consecutive frames k and k + 1 in the window, the residual of preintegrated IMU measurement can be constructed from Equations (11)-(13): Please note that the accelerometer and gyroscope bias are also included in the residual terms for online correction. Equation (37) seems the same as the IMU measurements residual in the paper [8]. However, as our preintegrated IMU measurements α, β, γ handle the imperfection of the IMU intrinsic parameters, which made it different from the one of [8]. Besides, the Jacobian of r B (ẑ b k b k+1 , X ) with respect to the input arguments is completely different, too.

IMU Intrinsic Factor
There are several possible but different strategies for defining the IMU intrinsic parameters residual:

1.
The parameters T a , T ω , A ω are modeled as random-walk processes, but are constant between two frames in the sliding window, and are handled with the same strategy as the IMU bias.

2.
The parameters T a , T ω , A ω are assumed to be uncertain but constant between two frames in the sliding window. With this strategy, we can set the covariance value of P b k b k in Equation (30) corresponding to T a , T ω , A ω with a constant value and optimize them at every frame. 3.
The parameters T a , T ω , A ω are assumed to be uncertain, but constant in the timespan of the sliding window.
From the perspective of optimization method, the strategies 1 and 2 are nearly same, except that the covariance value is different, with strategy 1 computing the covariance during preintegration. However, for every frame these two strategies would add additional 24 parameters to the estimator. Assuming the size of sliding window is 10, this means that we will need to optimize an additional 240 parameters at each step, leading to a computational problem for realtime implementation. Besides, compared to the accelerometer and gyroscope bias which vary with time and temperature, the scale factor and misalignment parameters seem more stable, which means that we do not need to optimize them at every frame. Moreover, the scale factors and axis misalignments may couple with the measurement bias under degenerate motions such as rectilinear trajectory, motion with zero acceleration or rotation. In such a case, it is impossible to distinguish all these parameters in one or two frames. With these reasons, in our algorithm, we would choose the strategy 3.
The residual of T a , T ω , A ω in our implementation is defined as follows: wheret a,k ,t ω,k andâ ω,k are the current estimation of IMU intrinsic parameters. Please note that the residual is weighted by a constant covariance matrix P I and add to the cost function, Equation (2).

Discussion and Implementation Details
The considered IMU intrinsic parameters in our method include axis misalignments, scale factors and g-sensitivity. In practice, however, it should be noted that online g-sensitivity calibration is a difficult task. This my be caused by various reasons. First, the errors caused by g-sensitivity are generally smaller than other errors, especially in cases that the IMU suffers from motion with low acceleration or constant speed, or remains static. Second, the g-sensitivity coefficients of a real IMU may vary with the frequency of vibration. As a result, the errors caused by g-sensitivity are more prone to couple with the gyroscope bias or manifest as random noise. In some worse cases, online g-sensitivity calibration may lead to less accuracy of the motion estimation of VINS. Moreover, g-sensitivity calibration would introduce an extra nine variables to the state space, which would consume more computational resource. Therefore, in practice application, whether or not calibrating the g-sensitivity online depends on the IMU used and the motion that the sensor system are subjected. Fortunately, it is very easy to calibrate without g-sensitivity by setting the g-sensitivity coefficients to zeros and keep them constant. The implementation details would be discussed in the following.
We implemented the proposed online IMU self-calibration method with the extension of VINS-Mono [8], as depicted in Figure 3. The system started with measurement preprocessing. For each new image, features were detected by the Shi-Tomasi corner detector [42] and tracked by a KLT tracker [43]. Meanwhile, IMU measurements were locally pre-integrated. The initialization procedure provided all necessary values for bootstrapping the subsequent nonlinear optimization. After the system has been successfully initialized, the tightly-coupled nonlinear optimization would be implemented when every frame is added to the sliding window. After that, keyframe management module marginalized one of the frames from the sliding window in order to bound the computation complexity. We add the IMU intrinsic parameters into the state vector and replace the IMU preintegration block with our method, which are detailed in Sections 3.3 and 3.4. The proposed IMU measurement factor (Section 3.5) and IMU intrinsic factor (Section 3.6) were used in the nonlinear optimization block. The optimization computation is based on the Ceres Solver [44]. In the implementation, we separate the T a , T ω , A ω into individual parameter blocks so that we can conveniently set any blocks to constant when optimization, depending the real application. In such way, we can implement our method with and without g-sensitivity calibration, as discussed above.  Figure 3. Block diagram illustrating of a monocular visual-inertial system (VINS-Mono) [8], based on which, we implement the proposed online IMU self-calibration method. We add the IMU intrinsic parameters into the state vector and replace the IMU Preintegration block with our method, which are detailed in Sections 3.3 and 3.4. The proposed IMU measurement factor (Section 3.5) and IMU intrinsic factor (Section 3.6) are used in the nonlinear optimization block.

Experimental Results
We performed dataset experiments to evaluate the proposed online IMU self-calibration method. At present there are several publicly available datasets for VINS evaluation, such as UMich NCLT [45], EuRoc [46], PennCOSYVIO [47], Zurich Urban MAV [48], TUM VI [49], etc., in which, EuRoc is a very popular dataset and used by many papers [8,20,38,39,[50][51][52]. For a completely comparison of these datasets, please refer to [49]. In this paper we adopt the TUM VI dataset [49] for two main reasons: (1) the IMU used in EuRoc dataset is an ADIS16448, a high-end industrial grade IMU with precisely factory calibrated. In contrast, the TUM VI uses a low-cost consumer grade IMU, Bosch BMI160, which makes it more suitable for evaluating our algorithm; (2) The dataset provides both the raw IMU measurements and the precalibrated IMU measurements that compensated for proper IMU scaling and axis alignment. The value of scale factor and axis alignment parameters used for compensation are also provided with the dataset. As a result, we can easily run our algorithm with the raw IMU measurements and compare the estimated intrinsic parameters with the provided reference value.
The TUM VI dataset provides several categories of sequences for the evaluation of VINS method, including three corridor sequences, six magistrale sequences, eight outdoor sequences, six room sequences, and three slider sequence. The outdoor sequences are neglected in our experiments as we focus on the IMU self-calibration but not long time drift evaluation in challenging environments. To perform the comparison experiments, we firstly packed the the quarter resolution images (512 × 512 pixels) of the left camera, precalibred IMU measurements as well as the raw IMU measurements with different topic names into one single ROS bag file for each sequence so that we can easily evaluate the algorithm with different type of IMU measurements by just modifying the topic name.
Please note that we deduct the large IMU bias, which is coarsely reproducible between sensor restarts, from the raw measurements. This is reasonably explained in [49]. We also compensate the timestamp of IMU data so that the timestamp is consistent for camera, IMU and ground truth pose.

IMU Intrinsics Estimation Results
As discussed in Section 3.7, it is difficult to calibrate the g-sensitivity online. In the experiments, we ran the algorithm two times on each sequence of dataset, with and without A ω estimation, respectively. Then we compared the IMU intrinsics estimation results with the the reference values. In all experiments, the initial value of T a and T ω were set to the identity matrix, with the covariance as 2.0 × 10 −4 , and the initial value A ω was set as a zeros matrix with covariance as 5.0 × 10 −6 . Sensivity   Figures 4 and 5 show the estimation results on the first sequence of each categories, corridor 1, magistrale 1, room 1, slides 1, respectively, in detail. We can see that all parameters converge to a certain range close to the ref value, but slightly vary with time after that. For each sequence, the convergence speed is different. Generally, the parameters of the gyroscope converge in about 15-20 s and the estimation results is more stable. In contrast, it takes 50-100 s for the parameters of accelerometer to converge. In Figures 4b and 5b, the estimation results of T a 21  (b) It is not surprising for the results mentioned above, as many factors would affect the accuracy of the estimation results. Firstly, the VINS problem has the characteristics of high nonlinearity and local weak observability. To achieve full observability of the IMU intrinsic parameters as well as the measurement bias, motion with sufficient excitation was required. It is certain that this condition would not be always satisfied at any time in real applications. Therefore, the result shows fluctuation under insufficient excitation motion. Besides, the convergence speeds were affected by different motions, which shows a different convergence speed for each sequence. Secondly, as the accelerometer measures the acceleration and the gyroscope can measure the angular velocity directly, this also means that different conditions are required for the accelerometer intrinsics and gyroscope intrinsics to achieve observability. For the accelerometer, sufficient acceleration on each axis is required, and for the gyroscope, only sufficient rotation of the system is required. For a moving sensing system, keeping sufficient acceleration all the time is infeasible in real applications. In contrast, keeping sufficient rotation is a more easily satisfied condition. As a result, the convergence speed of gyroscope shows faster and more stable than that of accelerometer. Thirdly, the measurements of vision sensor and IMU are affected by random noise, which is coupled with the erros caused by inaccuracy intrinsics and bias, and further affects the accuracy of the estimation results. Generally, the measurement noise of gyroscopes is much smaller than that of accelerometers, which means the estimated results of gyroscopes are more accurate than accelerometers.

IMU Intrinsics Estimation without G-
From Figures 4 and 5, it seems that the estimation results of intrinsic parameters located at the diagonal of matrix T a and T ω are more stable and accurate than other parameters located at the lower or upper triangle. This is because parameters of the lower or upper triangle themselves are much smaller than those of diagonal, and any small challenges would show large fluctuations in the figures. To further evaluate the estimate results, we take the estimation values at the end of each sequence as the final estimation results and calculate their mean value, standard deviation as well as the differences compared with the corresponding reference values. The statistics are shown in the Tables 1 and 2. We can see that: (1) the standard deviation of diagonal parameters and that of off-diagonal parameters do not show obvious differences; (2) The standard deviation of gyroscope intrinsics is much smaller than that of accelerometer intrinsics. In Table 1, the standard deviation of results was about 0.002. In contrast, five of nine parameters' standard deviations in Table 2 were less than 0.001. This means that the results of the gyroscope were more accurate, which concludes the analysis above. Generally, the axis misalignments of low-cost MEMS-based IMU was 0.5-1%, and the standard deviations of our estimation results was in the order of magnitude 10 −3 . Therefore, we can conclude that the proposed method can works well with the low-cost MEMS-based IMU.  Table 2. Statistics of the estimation results of gyroscope intrinsics.

IMU Intrinsics Estimation with G-Ensitivity
As the g-sensitivity coefficients are unknown, here we only show the results of sequence of corridor 1 and magistrale 1 in detail, as in Figure 6a,b. Comparing these two figures with the estimation results without g-sensitivity showed in Figure 4a,b, we can find that the reuslts of T a and T ω are nearly same, regardless of with or without g-sensivity estimation. The results of g-sensivity A ω do not show that they are very stable. However, this does not mean that we obtained the wrong estimations, or that the calibration of g-sensitivity is totally failed. We would further discuss this in the next section with the quantitative results of VINS to indirectly evaluation to g-sensitivity estimation.

Quantitative Evaluation
The goal of the IMU calibration is to estimate the IMU intrinsics and compensate the measurements of IMU, and to improve the performance of VINS. Therefore, the accuracy of the IMU intrinsics calibration would directly affects the results of motion estimation. In this section, we quantify the accuracy of the estimation result of system pose with absolute trajectory error (ATE) and relative pose error (RPE) [53]. The RPE is only available for the sequences of rooms 1-6, as for other sequences only accurate pose ground truth at the start and end are available. For comparison, we also ran the VINS-Mono with the same camera image sequences and configuration (IMU noise, camera intrinsics, etc.), but with the precalibrated IMU measurements provided by the dataset. We did not run VINS-Mono directly with the raw IMU measurements as it fails after a few seconds on all sequences. For each sequence, we run the proposed algorithm and the VINS-Mono five times. In each run of the proposed algorithm, the IMU intrinsic parameters estimated online are saved after each nonlinear optimization step and used as the initial values of the next run. The median values obtained by the five times running were used as the final result. The trajectory estimation results of sequence rooms 3 and 5 are showed in Figure 7. To simplify the notation, in the following figures and tables, we use VINS to denote the results of VINS-Mono, our results and our results with A ω to denote the results of our proposed method without and with g-sensitivity estimation, respectively. The root-mean-square-error (RMSE) ATE of all sequences is shown in Table 3. Compared the results of our method without g-sensitivity estimation (column 3) against those of VINS-Mono (column 2), 13/20 of the our results method show improvement performance, with the rest of results offering near performance to those of VINS-Mono except the sequence magistrales 3 and 6. What interested us is that, although the estimation results of T a 21 and T a 32 in sequence magistrale 1 and slider 1 show larger differences compared to the reference values, the ATE of these two sequences show obvious improvement. This means there exist some errors in the precalibrated reference values or the intrinsic parameter values vary with time, as we mentioned in Section 1. Therefore, our VINS implementaion with online IMU self-calibration shows superior performance compared against the result of VINS-Mono with offline precalibrated IMU measurements. Now we further focus on the ATE results of our method with g-sensitivity estimation (column 4). 9/20 of the results show best performance between the three algorithms, most of which show more than 50% improvements. However, some of the results show worse performance compared to the estimation results without g-sensitivity calibration. This also concludes our discussion in Section 3.7 that online g-sensitivity calibration is a difficult task and we would not always get a better performance with g-sensitivity estimation. Table 3. Root-mean-square-error (RMSE) absolute trajectory error (ATE) comparison between the proposed method with raw inertial measurement unit (IMU) measurements and monocular visual-inertial system (VINS-Mono) with precalibrated IMU measurements in all of the sequences. Downarrow means better performance compared to VINS-Mono. The bold values indicate the best results.  Figure 8 shows the overall relative pose error (RTE) in room1-room6, caculated by the toolbox [54]. In Figure 8a, we can find obvious improvements compared to VINS-Mono. In contrast, nearly same relative orientation errors are showed in Figure 8b.

Runtime Evaluation
The proposed online IMU self-calibration method adds additional states to the system state vector, and estimates the optimal value of these states based on optimization method. As a result, online calibration would consume more computation resource. Figure 9 shows the average processing time per frame on each sequence. All experiments were performed using a computer equipped with an Intel Core(TM) i7-7700 CPU at 3.60 GHz and 8 GB RAM.
It is obvious that the proposed method consumes an additional 3-5 ms to estimate the IMU intrinsics. We further compute the average processing time for all sequences and the results are 27.7 ms, 31.8 ms, 33.5 ms, respectively. The processing time of our method without and with g-sensitivity estimation increase 14.9% and 20.9%, respectively, compared with VINS-Mono. Nevertheless, we can conclude that our method can run in real time.
Calibration process with g-sensitivity would introduce extra nine variables to the state space and consumes about additional 2 ms times the cost. Therefore, in practice applications, it is necessary to decide whether or not calibrating the g-sensitivity online, according to the IMU used and the trade-off between estimation accuracy and real time performance. Beside, the motion of the sensing system that are subjected should be considered, too. For example, if the general motion of the system is in low acceleration most of the time, g-sensitivity calibration may not significantly improve the performance of VINS.

Conclusions
In this paper, we propose an online IMU self-calibration method for visual-inertial system with low-cost inertial sensors. The main advantage of our method is that it performs online IMU self-calibration based on optimization method in unknown environment, using only camera and inertial measurements, and without any knowledge about the structure of scene or any external equipment. Specifically, the estimated parameters include the axis misalignments and scale factors of the IMU, as well as the g-sensitivity of the gyroscope. We perform extensive evaluation on public TUM VI dataset, with and without g-sensitivity estimation, respectively. Experimental results verify that the proposed method is able to accurately calibrate all the considered parameters in real time, and show superior performance compared with the results of VINS with offline precalibrated IMU measurements. It should be noted that the calibration process with g-sensitivity would introduce extra nine variables to the state space and consume more computation resource. Therefore, in practical application, it is necessary to decide whether or not calibrating the g-sensitivity online, according to the IMU used and the trade-off between estimation accuracy and real time performance.
The proposed method is dedicated to low-cost IMUs. As a result, it is very suitable for the consumer grade mobile robots, drones and cellphones which are broadly equipped with low-cost MEMS-based IMUs. Moreover, our method would promote the rapid deployment of VINS in various applications base on these platforms, such as robot navigation, autonomous driving, AR and VR.
In this paper we adopt quaternion as global parameterization for rotations. In the future we will extend our method to work with the special orthogonal group SO(3), which would make it more easily be employed in existing SO(3)-based visual/visual-inertial frameworks.
From Equations (19)- (21), the nominal-states kinematic is obtained as follows:α A. The Translation Error With Equations (A3) and (A6), it is easy to obtain: B. The Linear Velocity Error We start with the following relations: (A10)

Appendix A.2. Mid-Point Integration for the IMU Preintegration
We perform the Mid-point numerical integration in our implementation code. This section shows the computation details.
As showed in the Figure 2, we assume that the i-th IMU measurement is obtained at the time instant of frame k, and j-th measurement at frame k + 1. For a real sensor suite, there may be no IMU measurement at the time instant of a image frame is captured. In this case, linear interpolation method can be used to estimate a IMU measurement at the corresponding time instant. As showed in Section 3.6, the T a , T ω , A ω is also assumed constant in the timespan of sliding window, so we will not explicitly distinguish the T a,i , T ω,i , A ω,i and T a,i+1 , T ω,i+1 , A ω,i+1 in the following equations.

A. Nominal-State Preintegration
Integrating Equations (19)- (21) is to get the nominal-state estimation: where δt is the time interval between t i and t i+1 . At the beginning of integration, is 0, is identity quaternion. The prorogation starts with i = i and repeats step by step until i + 1 = j.
In the following we will detail how to obtain the error-state Jacobian and covariance using the Mid-point integration.