Mixed-Degree Cubature H ∞ Information Filter-Based Visual-Inertial Odometry

: Visual–inertial odometry is an effective system for mobile robot navigation. This article presents an egomotion estimation method for a dual-sensor system consisting of a camera and an inertial measurement unit (IMU) based on the cubature information ﬁlter and H ∞ ﬁlter. The intensity of the image was used as the measurement directly. The measurements from the two sensors were fused with a hybrid information ﬁlter in a tightly coupled way. The hybrid ﬁlter used the third-degree spherical-radial cubature rule in the time-update phase and the ﬁfth-degree spherical simplex-radial cubature rule in the measurement-update phase for numerical stability. The robust H ∞ ﬁlter was combined into the measurement-update phase of the cubature information ﬁlter framework for robustness toward non-Gaussian noises in the intensity measurements. The algorithm was evaluated on a common public dataset and compared to other visual navigation systems in terms of absolute and relative accuracy.


Introduction
Visual odometry (VO) is the process of estimating the egomotion of an agent with a single or multiple visual sensors in mobile robot navigation. The aim of VO is to estimate the pose incrementally. Compared with the simultaneous localization and mapping (SLAM) method seeking globally consistent estimation, VO is concerned with the local consistency of the pose estimation [1]. As an inexpensive method independent of external reference systems such as global positioning systems (GPS) and motion capture systems, VO is widely applied in domains including robotics, automotive, and wearable computing [2]. In robotic applications, VO is the basis of studies on visual control [3], obstacle avoidance in robot navigation [4,5], and unmanned aerial vehicle navigation [6].
In practical applications, VO is commonly combined with the inertial measurement units (IMUs) with the aid of Kalman filters. As the visual-inertial integrated navigation system serves as an egomotion estimator and is similar to VO functionally, we used the term of visual-inertial odometry (VIO) in this paper. The IMU generally operates independently of the environmental conditions at a higher rate than cameras. Therefore, the robustness and accuracy of the visual odometry can be enhanced by fusing the IMU measurements.
The VO methods are grouped into indirect methods and direct methods, depending on the usage of the images. The indirect methods preprocess the images of the camera to extract features that are used for motion estimation with a matching process. Direct methods use the raw measurements of the visual sensors without feature extraction and matching processes.
Indirect methods have dominated the research field for a long time, and many effective descriptors of features have been developed as shown in [7]. Guang [8] applied a line-feature extractor in the pre-processing of images and estimated the attitude error with line features. The IMU measurements and is suitable for the camera-IMU system consisting of an IMU and an RGBD camera. The main contributions of this paper are as follows.

1.
A mixed-degree nonlinear filter framework was proposed for fusing the IMU measurements and the intensity measurements of the camera. The proposed filter applied two cubature rules with different degrees to guarantee numerical stability with high accuracy. 2.
The H ∞ filter was used in the measurement update phase for robustness toward the non-Gaussian noises in intensity errors.
The remainder of this paper is organized as follows. The model of the visual-inertial odometry is presented in Section 2. The VIO system based on a mixed-order cubature H ∞ information filter is detailed in Section 3. The test results with a public dataset are presented in Section 4. The conclusions are presented in Section 5.

Model of Visual-Inertial Odometry
In this section, we present the model of a keyframe-based VIO system consisting of the motion model, the keyframe management, and the measurement model. The motion model was formulated in form of classical rigid-body kinematics with the modified Rodrigues parameter representing the attitude. The keyframe management was mainly inspired by Engle's DSO system with some simplifications. The measurement model was formulated in the form of the simple frame-to-frame alignment with the intensity as the measurement.

Coordinate Systems
Two moving coordinate systems connected with the sensors were used in this paper: (1) The IMU coordinate system (ICS) with its origin at the center of the IMU, where the linear accelerations and angular rates are measured; and (2) The camera coordinate system (CCS) with its origin at the optical center of the camera and the z-axis aligned with the optical axis of the lens.
Another two reference coordinate systems were defined correspondingly: The static IMU coordinate system (SICS) with the same origin and axes as the ICS at the time of constructing the current keyframe; and the static camera coordinate system (SCCS) with the same origin and axes as the CCS at the time of constructing the current keyframe. The SICS and SCCS were static relative to the inertial space in the egomotion estimation.
In this paper, the superscript at the top-left corner of a vector represents the coordinate system the vector is expressed in. Specifically, I p represents the vector p in the ICS, C p represents the vector p in the CCS, SI p represents the vector p in the SICS, and SC p represents the vector p in the SCCS.

Motion Model of Camera-IMU System
A 9-dimension system was built to represent the motion of the camera-IMU system. The state vector is defined as follows: where I SI ρ is the modified Rodrigues parameter (MRP) representing the rotation of the ICS relative to the SICS; SI p is the position of the origin of the ICS in the SICS; and SI v is the velocity of the origin of the ICS relative to the SICS. A discrete motion model was constructed according to the six degrees of the freedom kinematic model of a rigid body as follows: where is the input vector of the navigation system; I ω k−1 and I a k−1 are the measurements of the gyroscope and the accelerator at time k − 1, respectively; C = 0 1×6 ∆t SI g T T is a constant vector; ∆t is the sampling period; SI g = SI E R E g is the gravity acceleration vector in the SICS; SI E R is the transformation matrix from the earth-fixed coordinate system to the SICS, of which initial value determined by a simple attitude and heading reference system (AHRS) [32]; and F and G(x k−1 ) are the state-transition matrix and input matrix, respectively, formulated as follows: where I 3 is the three-dimensional identity matrix; and 0 3 is the three-dimensional zero matrix. A I SI ρ k−1 and I SI R I SI ρ k−1 are formulated as follows. The right subscript k − 1, left subscript 'SI' and left prescript 'I' in the right side of the equations are neglected to simplify the description In fact, I SI R I SI ρ k−1 is the direct cosine matrix representing the rotation of the IMU coordinate system relative to the static IMU coordinate system.

Keyframe Management
A keyframe consisting of a RGB image and a depth image represents the reference for the egomotion estimation. The very first RGB image and depth image are used to set up the first keyframe. A point set is sampled from the images of the keyframe together with the intensity. Unlike the indirect VO, no extraction or matching process of the distinct features is needed in the direct VO. As a result, the direct VO is generally faster than the indirect VO. However, the accuracy and robustness are hard to ensure without carefully designed features. To ensure the accurate estimation, the point sampling in the direct VO should follow two basic criteria: (1) The sampled points are distributed as evenly as possible in the image; and (2) The absolute intensity gradient at the sampled point is significantly higher than its neighbors. The absolute intensity gradient g at point j with the pixel coordinate of u p j , v p j can be evaluated with the intensity differences as follows: where I(·) is the intensity at the corresponding pixel coordinate. For even sampling, the RGB image in the keyframe is divided into patches with the size of b × b. The pixel with the highest absolute intensity gradient among the pixels with valid depth measurements in each patch is chosen as a candidate. If the absolute intensity gradient of a candidate is no less than a predesigned threshold g thres , the 3D Appl. Sci. 2019, 9, 56 5 of 19 coordinate of the candidate pixel is constructed. We used a simple pin-hole camera model for the 3D construction.
where d u p j , v p j is the depth measurement of corresponding point; and f x , f y , c x , and c y are the intrinsic parameters of the RGB camera. To simplify the measurement function, the constructed points were transformed into static IMU coordinate system. The transformed 3-dimensional coordinate was added into the point set of the keyframe.
where C I R and C p represent the transformation between the ICS and the CCS. We assumed that the two sensors were mounted on a rigid body so that the transformation did not change over time. The camera-IMU transformation and the intrinsic parameters were calibrated in advance and the calibration methods are not included in this paper.
The absolute intensity gradient threshold g thres was designed according to the intensity characteristic of each patch and evaluated adaptively online. We treated g thres as the sum of two parts: where g is the average intensity of the image patch and g b is a positive value to ensure that the absolute intensity gradient of the candidate is sufficiently high. It is easy to understand that the absolute intensity gradient of the candidate should exceed more pixels in a larger patch, which means that g thres is higher with a larger patch and g b has a positive correlation with the patch size b. We used a simple linear function to represent the positive correlation and formulated the lower-bound constraint of the absolute intensity gradient as follows: where λ is a positive constant. Particularly in the case of b = 1, i.e., only one pixel exits in each patch, the above lower bound constraint degrades into g ≥ g, which means that all points with valid depth measurements are constructed. Figure 1 shows an example of point sampling on a 640 × 480 image. In the example, the above method was able to sample points not only from the edges of the objects, but also from regions with weak intensity variations such as the computer screen and the floor. The points were distributed more densely with a smaller patch. The numbers of sampled points with different patch sizes in this example are shown in Table 1.  The above point sampling method adjusts the number of the points by tuning b and guarantees evenly distributed points. The usage of images approximated the dense visual odometry with a small b. The usage of images approximated the sparse visual odometry with a large b. The keyframe may not sufficient to estimate the egomotion as the camera moves, so we used two judging criteria to change the keyframe: 1. A new keyframe is needed if there are not enough points in the current view. This is measured via the ratio of points in the current view. 2. Observing the same area from different locations far from each other is likely to lead to large differences in occlusion and illumination. In such cases, new keyframes are needed even if enough points are in the current view. This criterion is quantified with the mean square optical flow ignoring rotational motion  is the pixel coordinate transformed only by the translational motion. A new keyframe is constructed with the current RGBD images if any of the two criteria are satisfied. After a new keyframe is constructed, the previous keyframe will be replaced by the new one, and the reference coordinate systems are changed in turn. The estimations of SI v and SI E R are transformed with I SI ρ . Then, the estimations of I SI ρ and SI p are set to 0. The above point sampling method adjusts the number of the points by tuning b and guarantees evenly distributed points. The usage of images approximated the dense visual odometry with a small b. The usage of images approximated the sparse visual odometry with a large b. The keyframe may not sufficient to estimate the egomotion as the camera moves, so we used two judging criteria to change the keyframe:

1.
A new keyframe is needed if there are not enough points in the current view. This is measured via the ratio of points in the current view.

2.
Observing the same area from different locations far from each other is likely to lead to large differences in occlusion and illumination. In such cases, new keyframes are needed even if enough points are in the current view. This criterion is quantified with the mean square optical is the pixel coordinate transformed only by the translational motion.
A new keyframe is constructed with the current RGBD images if any of the two criteria are satisfied. After a new keyframe is constructed, the previous keyframe will be replaced by the new one, and the reference coordinate systems are changed in turn. The estimations of SI v and SI E R are transformed with I SI ρ. Then, the estimations of I SI ρ and SI p are set to 0. The above keyframe and point sampling method mainly refers to Engel's DSO method, except for several simplifications as follows: • Only one keyframe is maintained for the sake of complexity. • The depth measurements are generated with an RGBD camera instead of being estimated in the optimization scheme.

•
The gradient constraint in the sampling points is evaluated according to the patch size.

Measurement Model
A direct VO was used as the measurement model where the intensity was used as the measurement directly without feature extraction and matching processes. The position of each point j was transformed from the SICS into the CCS by the motion state as follows: The pixel coordinate of point j in the current view was evaluated according to the pin-hole camera model.
The intensity I k u p j , v p j of point j in the RGB image at time k was used as the measurement where n j,k is the measurement noise and assumed to obey a Gaussian distribution with covariance R j,k . Combining Equations (11)-(13), one can obtain the following measurement equation.

Mixed-Degree Cubature H ∞ Information Filter-Based VIO
The above keyframe management method and measurement model of intensity led to a simple frame-to-frame alignment-based egomotion estimator. However, the simple estimator with linearization-based solvers such as the Gaussian-Newton optimization and EKF performed poorly as the models were highly nonlinear and the measurement noises vary dramatically. The linear approximations and Gaussian assumptions used in the solvers were not valid practically. For a robust estimator, the previous studies on VO generally maintained a slide window of multiple keyframes or a local map. In this section, we present a nonlinear filter-based VIO method for the frame-to-frame alignment problem. The designed hybrid cubature H ∞ information filter used two cubature rules with different degrees to reduce the linearization error in a numerically stable way and the H ∞ filter to estimate the states in the presence of non-Gaussian noises.

Bayes Filter for Nonlinear Visual-Inertial Navigation System
Consider the following nonlinear system where f(·) and h(·) are arbitrary nonlinear functions and w k−1 and n k are the process noise and the measurement noise, respectively. Under the Gaussian assumption of noises, the Bayes filter for nonlinear system described by the information vector and information matrix was formulated as an iterated process of a time-update phase and a measurement-update phase.
(1) Time-update phasê (2) Measurement-update phasê The state estimation could be recovered from the information state According to the measurement model in Equation (14), only a part of the state vector appeared explicitly in the measurement equation. We defined the state vector explicitly shown in the measurement model as follows: where n 1 = 6. The rest state vector is Correspondingly, the covariance matrix P of the state vector is composed of four blocks as follows: Applying the above partition of the state vector to the evaluation of the predicted measurements (the first equation in Equation (17)), one can obtain.
As shown by Equation (22), the original 9-dimensional integral is simplified to a 6-dimensional integral. The evaluation of the cross-covariance matrix (the second equation in Equation (17)) can also be simplified in a similar way. where p(x 1 , x 2 ) is the joint probability density function of two random vectors x 1 and x 2 i.e., the probability density function of the Gaussian distribution. Based on the formula of conditional probability, one can obtain p(x 1 , Substituting Equation (24) into Equation (23) yields With the Gaussian assumption of the state vector, the distribution of x 2 conditional on x 1 is a multivariate Gaussian with the mean value ofx 2 + P 21 P −1 11 (x 1 −x 1 ) and the covariance matrix of P 22 − P 21 P −1 11 P 12 . Thus, the inner integral in Equation (25) is solved as The cross-covariance matrix is evaluated as follows: In this way, the integrals in 9-dimensional space are simplified into integrals in 6-dimensional space. This is practically useful when applying high-degree cubature rules and will be discussed later.

Mixed-Degree Cubature Information Filter
In general, the four integrals in Equations (16) and (17) cannot be solved directly. A group of numerical integration-based approximation methods named by cubature rules are used for the filter process and the resulting filters are called cubature filters. Using a cubature rule, a Gaussian weighted integral of a nonlinear function g(x) is approximated with a weighted sum as follows: where X j represents the cubature points; w j is the corresponding weight that satisfies N p ∑ j=1 w j = 1; and N p is the number of the cubature points. The formulations of the cubature points and weights are different in different cubature rules. The spherical-radial (SR) cubature rules use the spherical-radial coordinates to transform the original integral into a double integral and select the cubature points on the spherical coordinate and the radial coordinate respectively. Based on the SR rules, the spherical simplex-radial (SSR) cubature rules use the n-simplex to evaluate the cubature points on the spherical coordinate. The degree of a cubature rule is defined based on the degree of used monomials in the Tayler polynomial of the nonlinear function. A p-degree cubature rule integrates all of the monomials in the Tayler polynomial of g(x) up to degree p exactly but not exactly for some monomials of degree p + 1. In general, the cubature rule with a higher degree achieves higher accuracy, but suffers from more computation with more cubature points and worse numerical stability with potential negative weights. Practically, third-degree cubature rules and fifth-degree cubature rules are commonly used in the Bayes filter. The numbers of cubature points of the four cubature rules are listed in Table 2. 3rd-degree spherical-radial cubature rule (3-SR) 2n 5th-degree spherical-radial cubature rule (5-SR) 2n 2 + 1 3rd-degree spherical simplex-radial cubature rule (3-SSR) 2n + 2 5th-degree spherical simplex-radial cubature rule (5-SSR) n 2 + 3n + 3 A cubature rule with all weights being positive is more desirable than one with both positive and negative weights for the sake of numerical stability. A stability parameter was defined in [33] as the sum of absolute values of the weights: When Θ is larger than 1, the truncation error will decrease the accuracy of the numerical integration. The curves of Θ with respect to n for different cubature rules are shown in Figure 2. The stability parameters of 3-SR and 3-SSR were maintained at 1, while the stability parameter of 5-SR increased since n > 4, and the stability parameter of 5-SSR increased since n > 7.
rules, the spherical simplex-radial (SSR) cubature rules use the n-simplex to evaluate the cubature points on the spherical coordinate.
The degree of a cubature rule is defined based on the degree of used monomials in the Tayler polynomial of the nonlinear function. A p-degree cubature rule integrates all of the monomials in the Tayler polynomial of ( ) g x up to degree p exactly but not exactly for some monomials of degree p + 1. In general, the cubature rule with a higher degree achieves higher accuracy, but suffers from more computation with more cubature points and worse numerical stability with potential negative weights. Practically, third-degree cubature rules and fifth-degree cubature rules are commonly used in the Bayes filter. The numbers of cubature points of the four cubature rules are listed in Table 2.

Cubature rules
Np 3rd-degree spherical-radial cubature rule (3-SR) 2n 5th-degree spherical-radial cubature rule (5-SR) 2n 2 + 1 3rd-degree spherical simplex-radial cubature rule (3-SSR) 2n + 2 5th-degree spherical simplex-radial cubature rule (5-SSR) n 2 + 3n + 3 A cubature rule with all weights being positive is more desirable than one with both positive and negative weights for the sake of numerical stability. A stability parameter was defined in [33] as the sum of absolute values of the weights: When Θ is larger than 1, the truncation error will decrease the accuracy of the numerical integration. The curves of Θ with respect to n for different cubature rules are shown in Figure 2. The stability parameters of 3-SR and 3-SSR were maintained at 1, while the stability parameter of 5-SR increased since n > 4, and the stability parameter of 5-SSR increased since n > 7. According to the previous motion and measurement models, the time-update phase involves Gaussian-weighted integrals in 9-dimensional space while the measurement-update phase involves Gaussian-weighted integrals in 6-dimensional space. To obtain high precision and numerical stability with as few cubature points as possible, we applied the 3-SR rule in the time-update phase and the 5-SSR rule in the measurement-update phase. Substituting the chosen cubature rules, we formulated the VIO process based on the mixed-degree cubature information filter as follows.
(1) Time-update phase Evaluate the cubature points based on the 3-SR rule: According to the previous motion and measurement models, the time-update phase involves Gaussian-weighted integrals in 9-dimensional space while the measurement-update phase involves Gaussian-weighted integrals in 6-dimensional space. To obtain high precision and numerical stability with as few cubature points as possible, we applied the 3-SR rule in the time-update phase and the 5-SSR rule in the measurement-update phase. Substituting the chosen cubature rules, we formulated the VIO process based on the mixed-degree cubature information filter as follows.
The weights for the 5-SSR rule are as follows: , j = 0 (7−n 1 )n 2 1 2(n 1 +1) 2 (n 1 +2) 2 , j = 1, · · · , 2(n 1 + 1) 2(n 1 −1) 2 (n 1 +1) 2 (n 1 +2) 2 , j = 2n 1 + 3, · · · , n 2 1 + 3n 1 + 2 Evaluate the predicted measurementẑ k|k−1 and the cross-covariance matrix P xz : As the dimension of the measurement vector is generally large, the inversion of the matrix in Equation (17) is very time-consuming. To omit the inversion, we assumed that the m measurements were independent of each other, and the covariance matrix was a diagonal matrix with diagonal elements of R j,k (j = 1, . . . , m). Further assuming that R j,k was invariant with time and equal to the other diagonal elements, R j,k was treated as a constant R. Thus, the inverse of the m × m matrix R was replaced by the division of the scalar R, and the measurement update of the information vector and matrix was obtained as follows.
Finally, the estimated state vector and covariance matrix were recovered based on Equation (18).

Robust H ∞ Filter
The aim of the H ∞ filter is to obtain the state estimation minimizing the following cost function [34].
Based on game theory, the design of the H ∞ filter is a game between the filter designer and nature. Nature's goal is to maximize the estimation error by properly choosing x 0 , w k , and n k . The cost function is defined in a fractional form to create a fair game by preventing nature using extremely large noises.
The above optimization problem is difficult to solve directly. An alternative strategy is to choose an upper bound and estimate the state satisfying the following constraint.
where γ is the performance bound to be designed. According to the above equation, a new cost function is defined as follows: Therefore, the original optimization problem is transformed into the following minimax problem.
Yang [29] solved the minimax problem and proposed an extended H ∞ filter (EH ∞ F) for the nonlinear system, which was similar to EKF in form.
where I n is the n-dimension identity matrix. The EH ∞ F is a Jacobian-based filter, and can be extended with cubature rules for higher precision in nonlinear systems as Chandra did in [30]. The time-update phase of the cubature H ∞ filter (CH ∞ F) is the same as the CKF and expressed as the equation. Therefore, we only provided the measurement-update equations as follows: where the cross-covariance matrix P xz and the measurement matrix H k are evaluated using a cubature rule. It can be seen from Equations (47) and (48) that the main difference between the EH ∞ F and the Kalman filter was the formula to update the covariance matrix P k|k . In this way, the EH ∞ F acts to reduce the confidence of the state estimation in case the unexpected noises negatively impact the performance. Applying the information-description of the Kalman filter on the above CH ∞ F, a cubature H ∞ information filter (CH ∞ IF) can be formulated as follows: where the covariance matrix of measurement error vector P zz is an m-dimension matrix. As previously stated, m is generally large in direct VO, which means that the inversion of P zz in the above equation is not practical. We used the expressions of the gain matrix in the Kalman filter to simplify the inversion. In this way, the measurement-update phase in CH ∞ IF was divided into two steps as follows: (1) Evaluate the measurement matrix H k with a cubature rule and update the information matrix using the information filter.
(2) Evaluate the gain matrix K ∞ and execute the measurement update of the H ∞ filter.
It is easy to prove that the above two-step measurement-update process is equal to Equation (49). The inversion of the m-dimension matrix is replaced by the inversion of a 9-dimensional matrix Y k .
It is worth noting that the measurement-update phase of the proposed CH ∞ IF was different to the one with the same abbreviation that Chandra derived in [31]. Compared with Equation (49), Chandra's method omitted the second term of i k when using the technique to analyze the estimation error in [35,36]. The error transition function of the measurement-update phase of our method is as follows: x k|k = x k|k−1 − K kzk (52) wherez k is the predicted error of the measurement z k , and formulated as follows: While the error transition function of the measurement-update phase of Chandra's method is related to the predicted mean value of the state vectorx k|k−1 as follows: This goes against the convergence of the filter. In contrast, the error transition function of our method is in the form of a weighted error function as used in [35,36], which makes a convergent filter.

Summary of MCH ∞ IF-Based VIO
Combining the above methods, a mixed-order cubature H ∞ information filter (MOCH ∞ IF) was designed for the nonlinear non-Gaussian VIO system. The time-update phase was the same as Equations (30)- (34). The measurement-update phase was performed in the following steps.
• Evaluate the measurement matrix H k with the third equation in Equation (17) and update the information matrix using the information filter with the time complexity of O(m) • Evaluate the gain matrix K ∞ and execute the measurement update of the H ∞ filter with the time complexity of O(m) In conclusion, the time complexity of the measurement-update phase of MOCH ∞ IF is O mN p . In the case of n 1 = 6, we have N p = 57. Generally, the number of measurements is much larger than that i.e., m N p . Thus, the above measurement-update phase is a linear algorithm. In addition, the process of the measurement-update phase is highly parallelizable except for the Cholesky decomposition and the inverse of Y k as it mainly consists of matrix additions and multiplications.

Evaluation Results
As analyzed in the previous section, the measurement-update phase with operations of large matrixes is the bottleneck for real-time applications. However, the measurement-update phase of an information-based filter is parallelizable and can be implemented on a parallel processor. We used a laptop with the Geforce 940M GPU (NVIDIA, Santa Clara, CA, USA) and the i7-5600U CPU (Intel, Santa Clara, CA, USA) to test the proposed MOCH ∞ IF-VIO system. We implemented our method with CUDA (in Supplementary Materials) (Version 8.0, NVIDIA, Santa Clara, CA, USA, 2016) language. The time-update phase was executed on the CPU and the measurement-update phase was executed on the GPU. We used the Technical University of Munich (TUM) RGBD dataset [37] for evaluation. The TUM RGBD dataset contains multiple sequences of videos collected using an RGBD camera under different environmental conditions. The RGB images and depth images are collected at the rate of 30 Hz with a resolution of 640 × 480. The TUM dataset does not contain IMU measurements, we simply used the differential and quadratic differential of the ground truth with the addition of random noises as the simulated IMU measurements.
We compared the proposed MCH ∞ IF-VIO method with RGBDSLAM v2.0 in [38] and the lightweight visual tracking (LVT) in [12]. The results of RGBDSLAM were calculated with the corresponding ROS package, and the results of the LVT were directly obtained from the original paper. We conducted the evaluation for MCH ∞ IF-VIO with five different patch sizes: 2, 4, 8, 16, and 32. The numbers were selected as powers of two for the convenience of CUDA implementation. The absolute trajectory (ATE) and relative pose error (RPE) defined in [37] were used as error metrics. The ATE measures the global consistency of the estimated trajectory and is suitable for visual SLAM methods. On the other hand, the RPE measures the local accuracy of the trajectory over a fixed time interval and is suited for visual odometry methods. We used five sequences in the TUM dataset for testing the three methods. The basic parameters of the five sequences are shown in Table 3. The test results are shown in  In these figures, MCH ∞ IF-VIO(n) represents the MCH ∞ IF-VIO with a patch size of n. The RPE of MCH ∞ IF-VIO increased along with the patch size in all of the tests. The change in the ATE of MCH ∞ IF-VIO was similar to RPE with some exceptions. This is in line with the common knowledge that more measurements lead to better accuracy in visual navigation. The MCH ∞ IF-VIO in this paper and LVT are dead-reckoning algorithms, while RGBDSLAM is a compete SLAM algorithm with joint optimization and loop-closure. Therefore, RGBDSLAM is more consistent globally than MCH ∞ IF-VIO and LVT according to the ATE results. In our tests, MCH ∞ IF-VIO achieved a smaller ATE than LVT with only one exception, where the ATE of MCH ∞ IF-VIO(32) was larger than LVT in the test on fr1_room. MCH ∞ IF-VIO(2) had the smallest RPE in all of the tests. The estimation errors caused by the IMU measurement noises increased over time. Moreover, the simple frame-to-frame alignment used in the measurement-update phase was less robust than the optimization with the global and local map used in RGBDSLAM and LVT. Therefore, the performance of MCH ∞ IF-VIO in the test with long duration such as fr3_office was less distinctive.          The RPE of MCH∞IF-VIO increased along with the patch size in all of the tests. The change in the ATE of MCH∞IF-VIO was similar to RPE with some exceptions. This is in line with the common knowledge that more measurements lead to better accuracy in visual navigation. The MCH∞IF-VIO in this paper and LVT are dead-reckoning algorithms, while RGBDSLAM is a compete SLAM  The RPE of MCH∞IF-VIO increased along with the patch size in all of the tests. The change in the ATE of MCH∞IF-VIO was similar to RPE with some exceptions. This is in line with the common knowledge that more measurements lead to better accuracy in visual navigation. The MCH∞IF-VIO in this paper and LVT are dead-reckoning algorithms, while RGBDSLAM is a compete SLAM

Conclusions
In this paper, we presented our visual-inertial navigation system called MCH ∞ IF-VIO, which uses a raw intensity-based measurement model. By analyzing the integrated navigation system, we constructed a numerically stable mixed-degree cubature information filter scheme for the state estimation problem. The H ∞ filter was combined for the non-Gaussian noises in the intensity measurements. The proposed VIO system was suitable for the RGBD camera-IMU system. The system was evaluated on the TUM RGBD dataset and compared with the RGBDSLAM and LVT systems.
Given the raw visual measurement model with intensities directly used as measurements in frame-to-frame alignment, the proposed MCH ∞ IF-VIO system with consideration of the system nonlinearity and non-Gaussian measurement noises was shown to achieve good performance in our tests. Though the method prefers short durations, the implementation with a patch size of two outperformed the RGBDSLAM and LVT in the long-duration test. With the aid of adjustable patch size, the proposed method could be tuned from an accurate-dense algorithm to a fast-sparse algorithm.
This paper aimed to build a robust state-estimation framework according to the fundamental characteristics of the visual-inertial navigation system. The camera measurements were used in a raw and direct way without heuristics in selecting the feature descriptors or tracking a sliding window. Other well-designed intensity-based and feature-based measurement models can be applied in this hybrid filter-based framework.
Supplementary Materials: The CUDA code used in the evaluations is available online at https://1drv.ms/f/s! ApzqIuEnpaPHiDMOyneS8UV2KO6N.