Fast Attitude Estimation System for Unmanned Ground Vehicle Based on Vision/Inertial Fusion

: The attitude estimation system based on vision/inertial fusion is of vital importance and great urgency for unmanned ground vehicles (UGVs) in GNSS-challenged/denied environments. This paper aims to develop a fast vision/inertial fusion system to estimate attitude; which can provide attitude estimation for UGVs during long endurance. The core idea in this paper is to integrate the attitude estimated by continuous vision with the inertial pre-integration results based on optimization. Considering that the time-consuming nature of the classical methods comes from the optimization and maintenance of 3D feature points in the back-end optimization thread, the continuous vision section calculates the attitude by image matching without reconstructing the environment. To tackle the cumulative error of the continuous vision and inertial pre-integration, the prior attitude information is introduced for correction, which is measured and labeled by an off-line fusion of multi-sensors. Experiments with the open-source datasets and in road environments have been carried out, and the results show that the average attitude errors are 1.11 ◦ and 1.96 ◦ , respectively. The road test results demonstrate that the processing time per frame is 24 ms, which shows that the proposed system improves the computational efﬁciency. Author Contributions: Conceptualization, Z.F. and C.M.; methodology, P.Y.; software, Z.F.; valida-tion, Q.Z. and P.Y.; formal analysis, Z.F. and C.M.; investigation, Z.F.; data curation, C.M. and X.L.; writing—original draft preparation, Z.F.; writing—review and editing, C.M. and P.Y.; visualization, X.L.; supervision, Q.Z. and P.Y.; project administration,


Introduction
The attitude estimation of unmanned ground vehicles (UGVs) plays an important role in autonomous navigation, rollover warning, and ride experience optimization. As a part of navigation information, attitude can be calculated by many navigation methods of unmanned vehicles, such as the strapdown inertial navigation system (SINS), vision navigation system, LiDAR navigation system, polarized light navigation system, or multisensor fusion system.
At present, UGVs rely more on the fusion of the global navigation satellite system (GNSS) and inertial navigation system (INS), which can estimate the attitude all-weather, all-day [1]. SINS can independently calculate the attitude independently, and its error accumulates over time. As GNSS can independently provide high-precision measurement information, using filtering methods such as the Kalman filter (KF), particle filter (PF), extended Kalman filter (EKF), etc. to combine them can achieve high-precision attitude estimation [2,3]. In general, GNSS/SINS integrated navigation systems work well when the GPS signal has not interfered. However, the GNSS signals might be affected by attenuation and multipath effects. In this case, how to design the attitude estimating system for UGVs is vital [4]. To tackle this, LiDAR is an ideal choice. Generally, LiDAR can provide for UGVs with incremental odometer information by accurate bearing and range measurements at high frequency [5,6]. Zhou et al. designed a computationally efficient LiDAR-odometry framework based on truncated least squares with a novel feature extraction module [7]. Nevertheless, the high cost of high-precision LiDAR prohibits its mass deployment in UGVs. Bionic polarized light orientation can be used to achieve the low-cost orientation for 1.
A fast attitude estimate system is proposed. Based on the optimization method, the MEMS pre-integration results and the continuous visual attitude calculation results are fused. In order to eliminate the accumulated error, the pre-measured offline attitude library is introduced to provide a high-precision value.

2.
The experimental comparison results demonstrate the computational efficiency of the proposed method, and the attitude error will not accumulate with the endurance.
The rest of this paper is organized as follows. In Section 2, the system model, fusion algorithm, and attitude library construction methods are introduced. In Section 3, we Machines 2021, 9,241 3 of 15 conduct experiments; the experimental results are discussed in Section 4. In Section 5, we summarize this paper and give the conclusion.

System Model
In this paper, a fast attitude estimate system based on the fusion of vision/inertial is proposed. Firstly, based on the optimization method, the MEMS pre-integration results and the continuous visual attitude estimation results are fused. As shown in Figure 1, based on consecutive image frames, the residual between two frames was constructed by utilizing the attitude estimation by the continuous vision and the inertial raw data. Then, the attitude between the two frames was calculated through optimization. Considering that inertial noise and calculation error of vision will inevitably cause the attitude errors accumulation over time, the pre-measured attitude libraries are introduced to ensure high accuracy in continuous endurance.
The rest of this paper is organized as follows. In Section 2, the system model, fusion algorithm, and attitude library construction methods are introduced. In Section 3, we conduct experiments; the experimental results are discussed in Section 4. In Section 5, we summarize this paper and give the conclusion.

System Model
In this paper, a fast attitude estimate system based on the fusion of vision/inertial is proposed. Firstly, based on the optimization method, the MEMS pre-integration results and the continuous visual attitude estimation results are fused. As shown in Figure 1, based on consecutive image frames, the residual between two frames was constructed by utilizing the attitude estimation by the continuous vision and the inertial raw data. Then, the attitude between the two frames was calculated through optimization. Considering that inertial noise and calculation error of vision will inevitably cause the attitude errors accumulation over time, the pre-measured attitude libraries are introduced to ensure high accuracy in continuous endurance.
The system can be divided into three parts: the attitude estimation based on the vision/inertial fusion method is given in Section 2.2. The off-line attitude library construction method is given in Section 2.3. Finally, the platform of road test experiments designed in this paper is given in Section 2.4.

Attitude Estimate Based on the Vision/Inertial Fusion
Since the purpose of this paper is to estimate the attitude, there is no need to reconstruct the environment and the 3D position of feature points does not need to be calculated. Therefore, we only use the camera geometric constraints to form residual equations, thereby reducing the computational cost of the method [14,20]. So far, the attitude estimate theory of continuous visual has matured and the epipolar geometric constraint for attitude calculation can be expressed as: where the matching points of 2 1 , p p satisfy Equation (1) and the fundamental matrix c F satisfies: where c K is the camera intrinsics matrix, c R is the rotation matrix and c t is the translation matrix between two frames. According to Equation (2), the rotation matrix between two frames can be solved by constructing the parameters matrix through multiple matching points and performing SVD decomposition. Thus, it can be predicted that the accuracy of matching determines the accuracy of attitude, and the accuracy of matching is influenced by many factors such as light intensity, rotation, blur, scale change, and weak tex- The system can be divided into three parts: the attitude estimation based on the vision/inertial fusion method is given in Section 2.2. The off-line attitude library construction method is given in Section 2.3. Finally, the platform of road test experiments designed in this paper is given in Section 2.4.

Attitude Estimate Based on the Vision/Inertial Fusion
Since the purpose of this paper is to estimate the attitude, there is no need to reconstruct the environment and the 3D position of feature points does not need to be calculated. Therefore, we only use the camera geometric constraints to form residual equations, thereby reducing the computational cost of the method [14,20]. So far, the attitude estimate theory of continuous visual has matured and the epipolar geometric constraint for attitude calculation can be expressed as: where the matching points of p 2 , p 1 satisfy Equation (1) and the fundamental matrix F c satisfies: where K c is the camera intrinsics matrix, R c is the rotation matrix and t c is the translation matrix between two frames. According to Equation (2), the rotation matrix between two frames can be solved by constructing the parameters matrix through multiple matching points and performing SVD decomposition. Thus, it can be predicted that the accuracy of matching determines the accuracy of attitude, and the accuracy of matching is influenced by many factors such as light intensity, rotation, blur, scale change, and weak texture.
Considering that ORB is a fast and robust feature among the current feature extraction algorithms and many research achievements have emerged up to now, this paper utilizes it as the extraction method. After applying the RSNSAC as the matching method, the attitude can be solved. Thus, given the attitude at i frame R c i , we can obtain the attitude at j frame R c j according to the solved rotation matrix R c between i and j frames. Furthermore, the attitude results can be used with the pre-integration of the MEMS to form the residual equation.
The pre-integration of MEMS from i to j can be expressed as: where ω k is the raw data of the MEMS and b g k is the bias of the gyro at k frame. Then, the residual equation can be given as: where R i and R j is the rotation matrix at i and j frame, respectively. δb g i is the deviation value of the gyro bias at i frame. According to Equation (4), the attitude information to be solved is R i , R j , δb g i . For the convenience of solving the residual equation, the optimization variables are set as δφ i , δφ j , δδb g i . Thus, the partial derivative equation of r ∆R ij with respect to them can be given as: where J r is the right Jacobian matrix of the three-dimensional special orthogonal group. J −1 r is the inverse mapping of the J r . Thus, given the raw data of the gyro ω k and the attitude estimation results of the consecutive images R i , R j , the attitude residual equation can be formed. According to the partial derivative Equations (5)-(7), the optimization variables δφ i , δφ j , δδb g i can be solved by the Gauss-Newton method; thus, the attitude can be updated.
It is obvious that if the attitude estimation results of consecutive vision work well, the attitude estimation results based on the vision/inertial fusion method are close to it. In this way, the error of the attitude estimation results will cumulative. To tackle this, this paper designs an off-line attitude library to calibrate the real-time image, as shown in Figure 1. The offline attitude library contains the calibrated image and the camera intrinsics matrix and attitude in the navigation frame. After the solution of Equation (2), high-precision attitude results can be obtained which are not related to time.

Off-Line Attitude Library Construction Method
The off-line attitude libraries are constructed by multi-sensor fusion. As shown in Figure 2, a tightly fused SINS/GNSS method is designed to provide the library with accurate attitude in the navigation frame(E-N-U) firstly. Then, the vision/MEMS fused system is utilized to collect image data. The liner system of SINS/GNSS can be expressed as: where X(t) is the error-state vector of SINS, F(t) is the state transition matrix, r is the vector of process noise, q is the vector of measurement noise, Z(t) is the vector of measurement, and H(t) is the observation matrix constructed based on pseudo-range and pseudo-range rate. As expressed in (2), the state vector X has 17 dimensions, which are composed of 3dim attitude errors, 3-dim velocity errors, 3-dim position errors, 3-dim gyroscope drifts, 3-dim accelerometer drifts, 1-dim pseudo-range error, and 1-dim pseudo-range rate error: , , , , , , , , , , , , , , , , Based on the state vector, the state transition matrix F can be written as: The measurement model can be calculated as: where the Z ( ) r t is the pseudo-range measurement model and the Z ( ) v t is the pseudorange rate measurement model. The  I  I  I  I  I  e  n  I  I  I  I  I  I  I   I  I where [ ] , , λ is the position of the SINS mechanization in navigation frame. In Equation (11), P is the retraction of: x y z is the position of GNSS in the ECEF frame. The ρ denotes the pseudo-range which can be given by GNSS receiver.
In Equation (11), the Q is the retraction of: The liner system of SINS/GNSS can be expressed as: .
where X(t) is the error-state vector of SINS, F(t) is the state transition matrix, r is the vector of process noise, q is the vector of measurement noise, Z(t) is the vector of measurement, and H(t) is the observation matrix constructed based on pseudo-range and pseudo-range rate. As expressed in (2), the state vector X has 17 dimensions, which are composed of 3-dim attitude errors, 3-dim velocity errors, 3-dim position errors, 3-dim gyroscope drifts, 3-dim accelerometer drifts, 1-dim pseudo-range error, and 1-dim pseudo-range rate error: Based on the state vector, the state transition matrix F can be written as: n 0 3 * 2 0 pv F pφ F pp 0 3 * 3 0 3 * 3 0 3 * 2 0 6 * 3 0 6 * 3 0 6 * 3 0 6 * 3 0 6 * 3 0 3 * 2 0 The measurement model can be calculated as: where the Z r (t) is the pseudo-range measurement model and the Z v (t) is the pseudo-range rate measurement model. The T e n is the transformation of velocity from the navigation frame system to the ECEF (Earth-Centered Earth-Fixed) frame: where [L I , λ I , h I ] is the position of the SINS mechanization in navigation frame. In Equation (11), P is the retraction of: where the vector [x I , y I , z I ] is the position of the SINS mechanization in the ECEF frame which transformed from [L I , λ I , h I ]. The vector [x G , y G , z G ] is the position of GNSS in the ECEF frame. The ρ denotes the pseudo-range which can be given by GNSS receiver.
In Equation (11), the Q is the retraction of: Given the true value of the SINS position (x, y, z) and the pseudo-range ρ, the pseudorange rate . ρ can be calculated as: In Equation (11), the δT e n is the retraction of: The Above is the calculation process of the system equation. Then, the time update step of the Kalman filter with feedback can be written as: where, for the convenience of algorithmic calculation, the discrete form of error-state is expressed as: F k = I 17 * 17 + F(t)T. The measurement update step of Kalman filter with feedback can be written as: where Equations (16) and (17) are the standard Kalman filter process [21]. Note that when the multiple pseudo range and pseudo range rate data are received, multiple iterations are performed according to (19). Followed by the Kalman filter, the mechanization results are corrected by: The integrated attitude results of SINS/GNSS only provides accurate attitude for the attitude library. The purpose of a tightly integrated method is to maintain a high-precision attitude, even when the GNSS fails for a short time.
Then, we can use the attitude results to build the attitude library. With the attitude library in the navigation frame, we can use it to correct the accumulated error of the proposed method by calculating the fundamental matrix K −T 1 R c t c × K −1 2 .

Platform of Road Test Experiments
As shown in Figure 3, the platform of road test experiments consists of the Vehicle, Camera, SINS, MEMS, and GNSS, whose position and the coordinates are also given in Figure 3. The robot operating system (ROS) is the core operating system of the platform which completes the data alignment of the multi-sensors and the program operation of the multi sensors fusion. The CPU of the intel core i5-8600K is the core processor and all road test data are processed off-line.

Platform of Road Test Experiments
As shown in Figure 3, the platform of road test experiments consists of the Vehicle, Camera, SINS, MEMS, and GNSS, whose position and the coordinates are also given in Figure 3. The robot operating system (ROS) is the core operating system of the platform which completes the data alignment of the multi-sensors and the program operation of the multi sensors fusion. The CPU of the intel core i5-8600K is the core processor and all road test data are processed off-line.

Results
For verifying the accuracy and computational cost of the attitude estimation system, we conducted experiments using the KITTI dataset and urban road test data. The experimental results are given in Sections 3.2 and 3.3, respectively. Furthermore, the parameters of sensors used for the road test are given in Section 3.1. Considering the similar performance of classical algorithms, for comparing the results of the experiments with the classical method conveniently, we chose the VI-ORBSLAM [22] as the classical method after the analysis in Section 1.

Parameters of Sensors
The proposed attitude estimation system consists of MEMS, sensor of vision, SINS, and GNSS, whose parameters are shown below.
• Parameters of MEMS.
The parameters of the MEMS are given in Table 1. The sensor of vision is triggered by hardware and can generate an image with a resolution of 1250 * 500 per 0.05 s.
Considering that the changes of the angular velocity are very slow in the road test environment, the attitudes error calculated by SINS/GNSS with a high-precision IMU is within 0.2 • .

Experiments with KITTI Dataset
The experiments in the highway area were conducted using the KITTI dataset 2011_10_ 03_drive_0042 from raw sequence 000000 to sequence 001100. The trajectory was about 1384 m and average velocity was about 75.9 km/h [23]. For comparing the experiments results with the ground truth conveniently, we used the camera frame of the first frame as the navigation frame. Since there was no prior library with attitude labels, we took raw images every 100 frames to build a library and take the ground truth as the attitude value.
The vision/inertial fusion attitude results of the proposed method and the classical method is shown in Figure 4, which demonstrates the effectiveness of the proposed method in this experiment. The error of the classical method and proposed method are given in Figure 5, which shows the error comparison of the proposed method with the classical method.
The attitude results in Figure 4 demonstrate that the effect of the proposed method is close to the classical method in the KITTI dataset. The attitude error results in Figure 5 show the details of the proposed method, which demonstrate the effectiveness of the proposed vision/inertial integrated method and the corrected method based on the off-line attitude library.
The computational efficiency and average error of the proposed method and classical method is summarized in Table 2. Note that the average attitude error is obtained by taking the average of all three attitude angles and the processing time per frame is obtained by taking the average of all frames consuming time. As we can see from two quantitative metrics illustrated in Table 2, the average attitude error of the classical method is better than the proposed method in this case. Compared with the classical method, the processing time per frame of the proposed method is reduced by 79%.

Experiments with Urban Road Test Data
The road test experiment was conducted in Jinhu Street, Xi'an, China, which is shown in Figure 6. The vehicle velocity is about 30−40 km/h, and the trajectory of the urban road is about 1692 m.
The computational efficiency and average error of the proposed method and classical method is summarized in Table 2. Note that the average attitude error is obtained by taking the average of all three attitude angles and the processing time per frame is obtained by taking the average of all frames consuming time. As we can see from two quantitative metrics illustrated in Table 2, the average attitude error of the classical method is better than the proposed method in this case. Compared with the classical method, the processing time per frame of the proposed method is reduced by 79%.

Experiments with Urban Road Test Data
The road test experiment was conducted in Jinhu Street, Xi'an, China, which is shown in Figure 6. The vehicle velocity is about 30−40 km/h, and the trajectory of the urban road is about 1692 m.
The off-line attitude library is also constructed based on the SINS/GNSS before this road test. After collecting the library data, seven locations were selected to build the library before the test, which are shown in Figure 7. The SINS used in this paper is a highprecision inertial navigation system and the attitude estimation error of the SINS/GNSS can be maintained within 0.2° during long endurance. Taking into account the attitude calibration error of SINS ant MEMS, it can be considered that the accuracy of the attitude in the attitude library is within 0.3°.
In [23], the navigation results of SINS/GNSS are used to design the ground truth of navigation because of its high precision. Thus, for verifying the accuracy of the proposed method, the tightly integrated SINS/GNSS navigation method, the same method proposed in Section 2.3, is introduced and the attitude results are used as the ground truth. The attitude estimation error by the SINS is better 0.2° per hour. Even if the GNSS fails in a short time, the attitude estimation accuracy of SINS/GNSS will not be affected. Since the The off-line attitude library is also constructed based on the SINS/GNSS before this road test. After collecting the library data, seven locations were selected to build the library before the test, which are shown in Figure 7. The SINS used in this paper is a high-precision inertial navigation system and the attitude estimation error of the SINS/GNSS can be maintained within 0.2 • during long endurance. Taking into account the attitude calibration error of SINS ant MEMS, it can be considered that the accuracy of the attitude in the attitude library is within 0.3 • .  The attitude results in Figure 8 demonstrate that the effect of the proposed method is close to the classical method in the road test experiment. The attitude error results in Figure 9 show the details of the proposed method, which demonstrate the effectiveness of the proposed vision/inertial integrated method and the corrected method based on the off-line attitude library.
The computational efficiency and average error of the proposed method in the road test is summarized in Table 3. As we can see from two quantitative metrics illustrated in Table 3, the average attitude error of the proposed method is 1.97° in the road test and the processing time per frame of the proposed method is 24 ms.  In [23], the navigation results of SINS/GNSS are used to design the ground truth of navigation because of its high precision. Thus, for verifying the accuracy of the proposed method, the tightly integrated SINS/GNSS navigation method, the same method proposed in Section 2.3, is introduced and the attitude results are used as the ground truth. The attitude estimation error by the SINS is better 0.2 • per hour. Even if the GNSS fails in a short time, the attitude estimation accuracy of SINS/GNSS will not be affected. Since the attitude estimation error of the SINS/GNSS is better than 0.2 • , the attitude estimation results of SINS/GNSS can be used as the ground truth.
The vision/inertial fusion attitude results of the proposed method are shown in Figure 8, which demonstrates the effectiveness of the proposed method in this road test. The attitude results in Figure 8 demonstrate that the effect of the proposed method is close to the classical method in the road test experiment. The attitude error results in Figure 9 show the details of the proposed method, which demonstrate the effectiveness of the proposed vision/inertial integrated method and the corrected method based on the off-line attitude library. The computational efficiency and average error of the proposed method in the road test is summarized in Table 3. As we can see from two quantitative metrics illustrated in Table 3, the average attitude error of the proposed method is 1.97 • in the road test and the processing time per frame of the proposed method is 24 ms.

Discussion
Based on the analysis of the experiment results in Section 3, the discussion of the experiment results, the proposed system, and the future work are given in Section 4.1, Section 4.2, and Section 4.3, respectively.

Discussion of the Experiment Results
As shown in Table 2, the attitude estimate accuracy of the classical method is better than the proposed method in that case. Considering that, with the endurance of UGVs, the error will inevitably accumulate with time without the loop closure. On the contrary, the attitude error of the proposed method will not accumulate. Therefore, the proposed method is more suitable for the UGVs. Most importantly, the computational efficiency of the proposed method is far superior to the classical method, which makes it possible to deploy it on industrial processors.
Furthermore, as shown in Tables 2 and 3, the average error of the proposed method in the urban road test is greater than with the KITTI dataset, which may be caused by sensor calibration errors and fewer environmental characteristics.

Discussion of the Proposed System
The purpose of this paper is to design a fast attitude estimation system for UGVs. Compared with the proposed system, using a pure vision attitude estimation module and the attitude library can also complete the task. Meanwhile, the calculation efficiency will be higher, and the accuracy will also be higher without the attitude jumping of the MEMS on bumpy roads. However, the pure visual attitude estimation module is unstable, which may track failure caused by many factors such as bumps, fast rotation, and occlusion, etc. Without the relocation, the attitude estimation will fail. In this case, fusing the vision and inertia can provide continuous attitudes for UGVs by pre-integration. That is why the MEMS/camera integration system is utilized in this paper.

Discussion of Futher Work
Fast attitude estimation makes it possible for deployment on industrial processors. In the future, we will design an attitude estimation system based on FPGA to provide high-precision and low-cost attitude estimation for UGVs.
With the development of servers and the Internet of Vehicles, all UGVs can interact with the servers in real time in the future. In this case, all UGVs will be able to upload data to the server or download data, which makes it possible for the construction and utilizing of the attitude library. In the future, we will use multi-vehicle collaboration to take a trial.

Conclusions
Compared with the high computational cost of the current classical vision/inertial fusion algorithm, this paper proposes a fast vision/inertial fusion system estimate attitude with high precision, which only relies on MEMS and a camera. By integrating the attitude estimated by continuous vision with the inertial pre-integration results based on optimization, the proposed system can provide high-precision attitude estimation for UGVs during long endurance. Considering that the errors of continuous vision and inertial pre-integration are cumulative, the prior information with attitude information is introduced, which are measured and labeled by off-line fusion of multi-sensors. Experimental results demonstrate the effectiveness of the proposed method. The contributions of the paper can be summarized as: