Abstract
Visual–inertial odometry (VIO) algorithms that fuse multiple feature types, such as points and lines, can improve their performance in challenging scenes, but at the cost of a severe increase in running time. In this paper, we propose a novel lightweight point–line visual–inertial odometry algorithm, called LRPL-VIO, to solve this problem. Firstly, a fast line matching method is proposed based on the assumption that the photometric values of endpoints and midpoints are invariant between consecutive frames, which greatly reduces the time consumption of the front end. Then, an efficient filter-based state estimation framework is designed to fuse point, line, and inertial information. Fresh measurements of line features with good tracking quality are selected for state estimation using a unique feature selection scheme, which improves the efficiency of the proposed algorithm. Finally, validation experiments are conducted on public datasets and in real-world tests to evaluate the performance of LRPL-VIO, and the results show that it outperforms other state-of-the-art algorithms, especially in terms of speed and robustness.
1. Introduction
State estimation is crucial for unmanned mobile platforms, especially when operating in GPS-denied areas. Simultaneous localization and mapping (SLAM) algorithms can provide real-time pose estimation and build consistent maps, which makes SLAM a crucial technique for robots, self-driving cars, and augmented reality (AR) devices [1]. Pure visual SLAM algorithms [2,3,4], which use cameras as the sole sensor, are lightweight and low-cost, and have gained popularity over the past decade. However, they lack robustness because they are sensitive to illumination changes and motion blur.
Many researchers have found that combining a camera with an inertial measurement unit (IMU) offers complementary advantages [5]. IMUs output high-frequency but biased inertial measurements, while cameras produce images with rich information. Based on this, numerous visual–inertial odometry and SLAM systems have been designed to obtain accurate and robust pose estimation. According to the estimation strategy, they can be divided into two categories: optimization-based methods and filter-based methods. The former, exemplified by OKVIS [6] and VINS-Mono [7], constructs a factor graph with visual re-projection errors and IMU pre-integration errors to optimize poses and feature landmarks; the computational load is managed using a sliding window and marginalization to achieve real-time performance. The latter, exemplified by MSCKF [8] and HybVIO [9], maintains a state vector consisting of the body states (position, velocity, orientation, and inertial biases) and a fixed number of past poses; state propagation is performed with the IMU kinematic model, and the visual update provides multi-frame constraints to produce an accurate trajectory. However, the aforementioned algorithms rely solely on points for visual constraints, which can lead to divergence or failure in low-texture environments.
As line features are abundant in human-made environments, more and more VIO frameworks fuse both points and lines to improve their performance. PL-VIO [10] is the first optimization-based point–line visual–inertial odometry framework. Points, lines, and IMU pre-integration terms are integrated into the optimization window to recover trajectories and scene structure. Hence, it outperforms its predecessor VINS-Mono in some large, difficult environments, but at a severe cost in running time. To speed up the processing of line features, the effect of the hidden parameters of the LSD algorithm [11] was studied in PL-VINS [12]. The authors tuned these parameters to balance the speed and quality of line feature extraction in the original LSD for pose estimation tasks. In this way, PL-VINS is capable of outputting estimated poses in real time. FPL-VIO [13] applies two methods to make the front end lightweight: it uses the fast line detection algorithm FLD [14] instead of LSD to extract line features and BRIEF descriptors [15] of midpoints to perform line matching, which greatly reduces the running time of the front end. The authors in [16] presented a similar solution, choosing EDLines [17] with gamma correction for rapid detection of long line features. They track a fixed number of points on each line, instead of the entire segment, using the sparse KLT algorithm for line matching. As a result, the time spent on line features in the front end is reduced. However, the back end of these optimization-based methods remains a heavy module because of the repeated linearization of visual and inertial error terms, which becomes worse after fusing both point and line features [10].
Since filter-based methods avoid repeated re-linearization, they are considered more efficient [5]. Trifo-VIO [18] is a stereo point–line VIO algorithm based on MSCKF. After state propagation, both point and line features are used for the visual update. However, line features are parameterized using a 3D point and a normal vector in this system, which is an over-parameterized representation because a spatial line has only four degrees of freedom. Another MSCKF-with-lines framework is proposed in [19]. This system adopts the closest-point method to represent line features and shows good performance in real-world experiments. However, its front end uses LBD [20] to match line features; thus, its real-time performance is severely limited. A hybrid point–line MSCKF algorithm is proposed in [21]. Based on the sparse KLT algorithm, it tracks sampled points on the line across three consecutive frames in a predicting–matching manner; thus, a new line can be recovered if the original one is lost. However, extra memory and operations are required in the hybrid framework since line feature landmarks are preserved in the state vector.
Most SLAM and odometry algorithms run on small devices with limited resources. Providing accurate, high-frequency pose estimation with low computational cost in multi-feature frameworks is still an open problem. To solve this, we propose a novel lightweight point–line visual–inertial odometry algorithm which can robustly track the poses of moving platforms. The main contributions of this paper are as follows:
- A novel filter-based point–line VIO framework with a unique feature selection scheme is proposed to produce high-frequency and accurate pose estimation results. The whole system is fast, robust, and accurate enough to work in complex environments with weak texture and motion blur.
- A fast line matching method is proposed to reduce the running time of the front end. Lines are matched by tracking their endpoints and midpoints within a complete prediction–tracking–rejection scheme, which ensures matching quality at high speed.
- Validation experiments on public datasets and in real-world tests are conducted to evaluate the proposed LRPL-VIO. The results show that LRPL-VIO outperforms other state-of-the-art systems (HybVIO [9], VINS-Mono [7], PL-VIO [10], and PL-VINS [12]), especially in terms of speed and robustness.
The rest of this paper is organized as follows. Section 2 describes our filter-based point–line VIO system. The proposed fast line matching method is detailed in Section 3. The experimental results are presented and analyzed in Section 4. Finally, conclusions and future work are discussed in Section 5.
2. Filter-Based Point–Line Visual–Inertial Odometry
While point-only visual–inertial odometry algorithms can produce accurate pose estimates in environments with constant illumination and rich texture, they tend to diverge or fail in more challenging scenes. Fusing multiple feature types is a good remedy, but it makes the whole system heavier. In this paper, we design a lightweight and efficient point–line VIO system based on HybVIO [9] to tackle this issue. The working flowchart of LRPL-VIO is shown in Figure 1.
Figure 1.
The working flowchart of LRPL-VIO.
2.1. State Definition
Similar to most filters derived from MSCKF [8], the state vector in our system consists of the body states and a window of past poses. At timestamp k, the state vector is constructed as:

$$\mathbf{x}_k = \begin{bmatrix} \mathbf{p}_k^\top & \mathbf{q}_k^\top & \mathbf{v}_k^\top & \mathbf{b}_k^\top & t_d & \boldsymbol{\pi}_k^\top \end{bmatrix}^\top,$$

where $\mathbf{p}_k$ and $\mathbf{q}_k$ denote the current pose (position and orientation quaternion) of the body and $\mathbf{v}_k$ is the velocity. And

$$\mathbf{b}_k = \begin{bmatrix} \mathbf{b}_{\omega,k}^\top & \mathbf{b}_{a,k}^\top & \operatorname{diag}(\mathbf{T}_{a,k})^\top \end{bmatrix}^\top$$

is a vector related to the inertial biases. Only the diagonal elements of $\mathbf{T}_{a,k}$ are used for the multiplicative correction of the accelerometer. $t_d$ represents the IMU–camera time shift. A fixed-length window

$$\boldsymbol{\pi}_k = \begin{bmatrix} \mathbf{p}_{k,1}^\top & \mathbf{q}_{k,1}^\top & \cdots & \mathbf{p}_{k,n}^\top & \mathbf{q}_{k,n}^\top \end{bmatrix}^\top$$

holds the poses of past moments.
2.2. Filter Propagation
The state is initialized as $\mathbf{x}_0 = \mathbf{0}$, except for the orientation, which is obtained from the first inertial measurement. The initial covariance matrix $\mathbf{P}_0$ is a diagonal matrix. The state is then propagated using each subsequent inertial measurement as the prediction step of the core filter:

$$\mathbf{x}_k = f_k(\mathbf{x}_{k-1}, \boldsymbol{\varepsilon}_k),$$

where $\boldsymbol{\varepsilon}_k$ is the Gaussian process noise. This propagation is performed in discrete time by a mechanization equation [22]:

$$\begin{aligned} \mathbf{p}_k &= \mathbf{p}_{k-1} + \mathbf{v}_{k-1}\,\Delta t_k, \\ \mathbf{v}_k &= \mathbf{v}_{k-1} + \left(\mathbf{q}_{k-1} \star \tilde{\mathbf{a}}_k + \mathbf{g}\right)\Delta t_k, \\ \mathbf{q}_k &= \boldsymbol{\Omega}(\tilde{\boldsymbol{\omega}}_k \Delta t_k)\,\mathbf{q}_{k-1}, \end{aligned}$$

where $\Delta t_k$ is the current time increment. The biased inputs of the gyroscope and accelerometer are calculated as $\tilde{\boldsymbol{\omega}}_k = \boldsymbol{\omega}_k - \mathbf{b}_{\omega,k-1} + \mathbf{n}_\omega$ and $\tilde{\mathbf{a}}_k = \mathbf{T}_{a,k-1}(\mathbf{a}_k - \mathbf{b}_{a,k-1}) + \mathbf{n}_a$, where $\mathbf{n}_\omega$ and $\mathbf{n}_a$ are i.i.d. Gaussian noises. $\mathbf{g}$ is the gravity vector. The rotation of a vector by the quaternion $\mathbf{q}_{k-1}$ is denoted $\mathbf{q}_{k-1} \star (\cdot)$, and the quaternion is updated by the function $\boldsymbol{\Omega}(\cdot)$ [23]. The bias vector is propagated by

$$\mathbf{b}_k = \exp(-\Delta t_k / \tau)\,\mathbf{b}_{k-1} + \boldsymbol{\varepsilon}_{b,k},$$

where each bias component is modeled as an Ornstein–Uhlenbeck random walk [24] to better match the characteristics of the IMU sensor.
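To make the prediction step more concrete, the following is a minimal sketch of one discrete-time propagation step under the model above. The quaternion convention ([w, x, y, z], Hamilton product), the gravity constant, and the Ornstein–Uhlenbeck time constant `tau_b` are illustrative assumptions, not the exact LRPL-VIO implementation.

```python
import numpy as np

def quat_mult(q, r):
    """Hamilton product of two quaternions [w, x, y, z]."""
    w0, x0, y0, z0 = q
    w1, x1, y1, z1 = r
    return np.array([
        w0*w1 - x0*x1 - y0*y1 - z0*z1,
        w0*x1 + x0*w1 + y0*z1 - z0*y1,
        w0*y1 - x0*z1 + y0*w1 + z0*x1,
        w0*z1 + x0*y1 - y0*x1 + z0*w1,
    ])

def quat_to_rot(q):
    """Rotation matrix of a unit quaternion [w, x, y, z]."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def propagate(p, v, q, b_w, b_a, T_a, gyro, acc, dt,
              g=np.array([0.0, 0.0, -9.81]), tau_b=1e4):
    """One mechanization step: bias-corrected inputs, quaternion update,
    velocity/position integration, and OU-style bias decay (illustrative)."""
    w_hat = gyro - b_w                        # bias-corrected angular rate
    a_hat = T_a @ (acc - b_a)                 # scale- and bias-corrected acceleration
    # incremental rotation quaternion over the time step dt
    angle = np.linalg.norm(w_hat) * dt
    axis = w_hat / (np.linalg.norm(w_hat) + 1e-12)
    dq = np.concatenate(([np.cos(angle / 2)], np.sin(angle / 2) * axis))
    q_new = quat_mult(q, dq)
    q_new /= np.linalg.norm(q_new)
    a_world = quat_to_rot(q) @ a_hat + g      # specific force in world frame plus gravity
    v_new = v + a_world * dt
    p_new = p + v * dt
    decay = np.exp(-dt / tau_b)               # Ornstein-Uhlenbeck mean reversion of biases
    return p_new, v_new, q_new, decay * b_w, decay * b_a
```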
2.3. Image Processing
For points, we use the Good Features to Track (GFTT) algorithm [25] to extract new features and the sparse KLT optical flow algorithm [26] to perform feature tracking. The inertial measurements between consecutive frames are integrated to obtain the instantaneous rotation, from which initial values for the feature tracker are obtained based on two-view geometry (see Equation (28)); this enhances tracking quality during rapid camera motion. After all this, a hybrid 2-point [27] and 5-point [28] RANSAC scheme is applied to reject outliers.
For lines, we use the modified LSD algorithm [11,12] to detect new line segments and set a fixed threshold to abandon short lines. The line matching is finished using the proposed fast line matching method (See Section 3), which can greatly decrease the execution time of the front end and provide higher accuracy for our VIO system than the traditional descriptor-based method LBD [20].
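As a rough illustration of this front end, the sketch below detects GFTT corners, tracks existing corners with pyramidal KLT, rejects outliers with a fundamental-matrix RANSAC (a stand-in for the hybrid 2-point/5-point scheme), and detects LSD segments filtered by a length threshold. The threshold values and the availability of `cv2.createLineSegmentDetector` in a given OpenCV build are assumptions.

```python
import cv2
import numpy as np

MIN_LINE_LEN = 30.0  # assumed pixel threshold for discarding short segments

def process_frame(prev_img, cur_img, prev_pts):
    """Track existing corners, reject outliers, detect new corners and lines.
    prev_pts: (N, 1, 2) float32 array of corners detected in prev_img (N >= 1)."""
    # KLT tracking of existing point features with image pyramids
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_img, cur_img, prev_pts, None, winSize=(21, 21), maxLevel=3)
    good_prev = prev_pts[status.ravel() == 1]
    good_cur = cur_pts[status.ravel() == 1]

    # Geometric outlier rejection (fundamental-matrix RANSAC as a stand-in
    # for the hybrid 2-point/5-point scheme used in the paper)
    if len(good_cur) >= 8:
        _, inliers = cv2.findFundamentalMat(good_prev, good_cur,
                                            cv2.FM_RANSAC, 1.0, 0.99)
        if inliers is not None:
            good_prev = good_prev[inliers.ravel() == 1]
            good_cur = good_cur[inliers.ravel() == 1]

    # Replenish point features with GFTT corners
    new_pts = cv2.goodFeaturesToTrack(cur_img, maxCorners=150,
                                      qualityLevel=0.01, minDistance=20)

    # LSD line detection, keeping only sufficiently long segments
    lsd = cv2.createLineSegmentDetector()  # availability depends on the OpenCV build
    lines = lsd.detect(cur_img)[0]
    if lines is not None:
        lengths = np.linalg.norm(lines[:, 0, 2:] - lines[:, 0, :2], axis=1)
        lines = lines[lengths > MIN_LINE_LEN]

    return good_prev, good_cur, new_pts, lines
```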
2.4. Feature Selection
In addition to feature detection and matching, visual update in filter-based VIO methods is another time-consuming module. Paying more attention to the most informative features is an efficient way of decreasing computational load. Another novelty of the proposed LRPL-VIO is that we do not use all the tracked features (both points and lines) but a subset of them to perform visual updates.
For a visual feature $j$, its whole track is a set of pose indices $\mathcal{T}_j = \{f_j, f_j + 1, \ldots, l_j\}$, where $f_j$ denotes its first detection frame and $l_j$ denotes its last tracked frame. As the system moves, old poses are abandoned; thus, the oldest pose in the window, denoted as $a$, may not be $f_j$ anymore. We use $o_j = \max(f_j, a)$ to represent the oldest tracked frame in the window. Not all the measurements but a subset of them are used for triangulation and linearization:

$$\mathcal{S}_j = \{\max(o_j, u_j + 1), \ldots, l_j\},$$

where $u_j$ is the newest frame used in the last update. In a word, we always choose the freshest information for efficiency.

For a newly received frame, we also select a subset of all available visual feature tracks (denoted as $\mathcal{C}$) to perform the visual update, chosen at random from the tracks scoring above the median:

$$\mathcal{C} \subset \{j \mid s(j) \geq \operatorname{median}_i\, s(i)\},$$

where the implementation of $s(\cdot)$ is different for points and lines in LRPL-VIO. For points, tracks are evaluated by the tracking length:

$$s(j) = \sum_{i \in \mathcal{T}_j} \left\| \mathbf{z}_{j,i+1} - \mathbf{z}_{j,i} \right\|,$$

where $\mathbf{z}_{j,i}$ is the pixel coordinate. Lines are less sensitive to changes in tracking length than points; thus, we use the number of tracked frames as the scoring policy:

$$s(j) = |\mathcal{T}_j|,$$

which ensures the update accuracy even when using a small number of line features.
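A simplified sketch of this selection logic is given below. The track data structure (a list of per-frame measurements with "frame" and "pixel" fields), the scoring functions, and the random sampling details are reconstructions under the stated assumptions (tracking length in pixels for points, number of tracked frames for lines), not the exact LRPL-VIO implementation.

```python
import random
import numpy as np

def point_score(track):
    """Score a point track by its total pixel motion (tracking length)."""
    pix = np.asarray([m["pixel"] for m in track])          # (n, 2) pixel coordinates
    return float(np.linalg.norm(np.diff(pix, axis=0), axis=1).sum())

def line_score(track):
    """Score a line track simply by the number of frames it was tracked in."""
    return float(len(track))

def select_tracks(tracks, score_fn, n_select):
    """Randomly pick n_select tracks among those scoring at or above the median."""
    scores = {tid: score_fn(t) for tid, t in tracks.items()}
    med = np.median(list(scores.values()))
    candidates = [tid for tid, s in scores.items() if s >= med]
    random.shuffle(candidates)
    return candidates[:n_select]

def fresh_measurements(track, oldest_in_window, last_used):
    """Keep only measurements newer than the last update and still in the window."""
    start = max(oldest_in_window, last_used + 1)
    return [m for m in track if m["frame"] >= start]
```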
2.5. Feature Triangulation and Update
The visual update is triggered track by track until the target number is reached:

$$\mathbf{x}_k \leftarrow \mathbf{x}_k + \mathbf{K}_j \mathbf{y}_j, \qquad \mathbf{P}_k \leftarrow \left(\mathbf{I} - \mathbf{K}_j \mathbf{H}_j\right)\mathbf{P}_k,$$

with

$$\mathbf{K}_j = \mathbf{P}_k \mathbf{H}_j^\top \left(\mathbf{H}_j \mathbf{P}_k \mathbf{H}_j^\top + \mathbf{R}_j\right)^{-1}, \qquad \mathbf{H}_j = \frac{\partial \mathbf{y}_j}{\partial \mathbf{x}_k},$$

where

$$\mathbf{y}_j = r\big(\mathbf{z}_j, \pi(\mathbf{x}_k, \mathbf{p}_j^{*})\big)$$

and $\mathbf{p}_j^{*}$ denotes the triangulated landmark computed from its tracked feature measurements $\mathbf{z}_j$, $\pi(\cdot)$ is the re-projection process, and $r(\cdot)$ is the error calculation.
2.5.1. Point Feature
The point error is the difference between the re-projected landmark and the tracked measurements:

$$\mathbf{y}_j = \mathbf{z}_j - \pi(\mathbf{x}_k, \mathbf{p}_j^{*}), \tag{14}$$

where the point triangulation is the minimization of the re-projection error

$$\mathbf{p}_j^{*} = \arg\min_{\mathbf{p}} \sum_{i} \left\| \mathbf{z}_{j,i} - \pi(\mathbf{x}_k, \mathbf{p}) \right\|^2 \tag{15}$$

using the GN method. Since the Jacobian of $\mathbf{p}_j^{*}$ with respect to $\mathbf{x}_k$ is available once the initial value is provided by a two-frame triangulation, the whole optimization process of Equation (15) is differentiated to render the direct linearization of Equation (14) with respect to $\mathbf{x}_k$:

$$\mathbf{H}_j = \frac{\partial \mathbf{y}_j}{\partial \mathbf{x}_k} = -\frac{\partial \pi}{\partial \mathbf{p}}\,\frac{\partial \mathbf{p}_j^{*}}{\partial \mathbf{x}_k} - \frac{\partial \pi}{\partial \mathbf{x}_k},$$

which avoids the null-space projection operation and can be used for the visual update.
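The following is a minimal Gauss–Newton refinement of a point landmark over its track, assuming known camera poses, a pinhole intrinsic matrix `K`, and an initial value from a linear two-view triangulation. It illustrates Equation (15) only (with a numerical Jacobian for brevity) and omits the differentiation of the optimizer with respect to the state that the linearization above requires.

```python
import numpy as np

def project(K, R, t, p_w):
    """Project a world point into a camera with pose (R, t): p_c = R p_w + t."""
    p_c = R @ p_w + t
    uvw = K @ p_c
    return uvw[:2] / uvw[2]

def triangulate_gn(K, poses, obs, p_init, iters=10):
    """Refine a 3D point by minimizing the stacked re-projection error
    over all observations with Gauss-Newton. poses: list of (R, t); obs: 2D pixels."""
    p = p_init.astype(float).copy()
    eps = 1e-6
    for _ in range(iters):
        r = np.concatenate([project(K, R, t, p) - z for (R, t), z in zip(poses, obs)])
        J = np.zeros((len(r), 3))
        for j in range(3):                       # numerical Jacobian, one axis at a time
            dp = np.zeros(3); dp[j] = eps
            r_eps = np.concatenate(
                [project(K, R, t, p + dp) - z for (R, t), z in zip(poses, obs)])
            J[:, j] = (r_eps - r) / eps
        delta = np.linalg.lstsq(J, -r, rcond=None)[0]
        p = p + delta
        if np.linalg.norm(delta) < 1e-8:
            break
    return p
```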
2.5.2. Line Feature
The line error is defined as the distance between the endpoints of the tracked measurements and the re-projected line:

$$\mathbf{y}_l = \begin{bmatrix} d(\mathbf{z}_s, \mathbf{l}') \\ d(\mathbf{z}_e, \mathbf{l}') \end{bmatrix} \tag{17}$$

with

$$d(\mathbf{z}, \mathbf{l}') = \frac{\mathbf{z}^\top \mathbf{l}'}{\sqrt{l_1'^2 + l_2'^2}},$$

where $\mathbf{l}' = [l_1'\ l_2'\ l_3']^\top$ is the re-projected line and $\mathbf{z}_s$, $\mathbf{z}_e$ are the homogeneous endpoints of the tracked measurement. For the spatial line representation, the Plücker coordinate [29] is used in our system. On the basis of two camera poses and their corresponding measurements, we can obtain the dual Plücker matrix of a line feature [30] as

$$\mathbf{L}^{*} = \boldsymbol{\pi}_1 \boldsymbol{\pi}_2^\top - \boldsymbol{\pi}_2 \boldsymbol{\pi}_1^\top, \tag{19}$$

where

$$\boldsymbol{\pi}_i^{c} = \begin{bmatrix} \hat{\mathbf{z}}_{s,i} \times \hat{\mathbf{z}}_{e,i} \\ 0 \end{bmatrix}$$

are the measurement planes determined by the two back-projected endpoints and the camera optical center, expressed in the camera frame and then transformed to a common reference frame using the corresponding camera pose. Triangulation depending on just two frames is not reliable enough; thus, we introduce an n-view method proposed in [31]. Specifically, for $m$ measurements of a line, we stack all relevant planes:

$$\mathbf{W} = \begin{bmatrix} \boldsymbol{\pi}_1 & \boldsymbol{\pi}_2 & \cdots & \boldsymbol{\pi}_m \end{bmatrix}^\top \tag{21}$$

and perform the singular value decomposition of Equation (21) as $\mathbf{W} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$. We can obtain two main planes $\boldsymbol{\pi}_1'$ and $\boldsymbol{\pi}_2'$ from the columns of $\mathbf{V}$ corresponding to the two largest singular values. We use Equation (19) to obtain the initial value of the line if the singular values are reasonable, and then perform a nonlinear optimization to further improve the accuracy of this triangulation. Based on the above methods, the linearization of Equation (17) is performed as

$$\mathbf{y}_l \approx \mathbf{H}_l\,\delta\mathbf{x}_k + \mathbf{H}_{\mathcal{L}}\,\delta\boldsymbol{\mathcal{L}},$$

and the null-space projection operation [19] is unavoidable for the visual update because the feature positions are not maintained in the state vector.
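Below is a rough sketch of the plane-stacking triangulation described above: each observation contributes the plane through the camera center and the back-projected endpoints, the stacked planes are decomposed by SVD, and the dual Plücker matrix is formed from the two dominant planes. The plane construction, the world-frame convention (pose maps camera coordinates to world), and the degeneracy threshold are assumptions for illustration.

```python
import numpy as np

def measurement_plane(K, R_wc, t_wc, s_px, e_px):
    """Plane (in world coordinates) through the camera center and the
    back-projected line endpoints s_px, e_px given in pixel coordinates."""
    s_c = np.linalg.inv(K) @ np.array([s_px[0], s_px[1], 1.0])
    e_c = np.linalg.inv(K) @ np.array([e_px[0], e_px[1], 1.0])
    n_c = np.cross(s_c, e_c)                 # plane normal in the camera frame
    n_w = R_wc @ n_c                         # rotate the normal to the world frame
    d = -n_w @ t_wc                          # plane passes through the camera center
    return np.append(n_w, d)

def triangulate_line(planes, sv_tol=1e-6):
    """Stack measurement planes, take the two dominant right-singular vectors as
    the main planes, and build the dual Pluecker matrix L* = p1 p2^T - p2 p1^T."""
    W = np.vstack(planes)                    # (m, 4) stacked plane coefficients
    _, sv, Vt = np.linalg.svd(W)
    if sv[1] < sv_tol:                       # degenerate: all planes nearly identical
        return None
    pi1, pi2 = Vt[0], Vt[1]
    return np.outer(pi1, pi2) - np.outer(pi2, pi1)
```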
2.6. Pose Augmentation and Stationary Detection
Every time a new camera frame is received, its predicted pose is inserted into the window and an old pose is removed. This process is performed as an EKF prediction step:

$$\mathbf{x}_k \leftarrow \mathbf{A}(d)\,\mathbf{x}_k \tag{23}$$

with

$$\mathbf{P}_k \leftarrow \mathbf{A}(d)\,\mathbf{P}_k\,\mathbf{A}(d)^\top,$$

where $\mathbf{A}(d)$ is a selection matrix that copies the current pose into the window and discards the pose stored at slot $d$. The adjustment of $d$ can be treated as an efficient pose-retention strategy, and we follow [9] in combining a fixed-size recent block with a Towers-of-Hanoi scheme:

$$d(i) = n_r + z(i),$$

where $z(i)$ is the least-significant zero bit index of the frame counter $i$ and $n_r$ is the size of the recent block. In this way, the maximum stride between stored poses increases exponentially, and old and new poses are retained at appropriately different frequencies.
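A small sketch of a Towers-of-Hanoi style discard rule is shown below. The combination with a fixed-size recent block (`n_recent`) follows HybVIO in spirit, but the exact indexing and the example parameters are illustrative assumptions.

```python
def least_significant_zero_bit(i: int) -> int:
    """Index of the lowest zero bit of i, e.g. 0b0111 -> 3."""
    z = 0
    while i & 1:
        i >>= 1
        z += 1
    return z

def discard_index(frame_counter: int, n_recent: int, window_size: int) -> int:
    """Pose slot to drop when augmenting the window: the most recent n_recent
    slots are kept at full frame rate, older slots are thinned with strides that
    grow exponentially (Towers-of-Hanoi scheme)."""
    z = least_significant_zero_bit(frame_counter)
    return min(n_recent + z, window_size - 1)

# Example: with 4 recent slots and a window of 12 poses, the dropped slot cycles
# through 4, 5, 4, 6, 4, 5, 4, 7, ... so old poses are kept at ever coarser strides.
if __name__ == "__main__":
    print([discard_index(i, 4, 12) for i in range(16)])
```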
When the moving platform stays still, the poses in the window quickly become identical due to Equation (23), which makes the VIO unstable. Thus, an unaugmentation step is performed if a stationary signal is received:

$$\mathbf{x}_k \leftarrow \mathbf{A}'\,\mathbf{x}_k, \qquad \mathbf{P}_k \leftarrow \mathbf{A}'\,\mathbf{P}_k\,\mathbf{A}'^\top,$$

which pops the newly inserted frame and holds most of the old poses. We judge the stationary condition by the maximum pixel change of the tracked point features:

$$\max_{j} \left\| \mathbf{z}_{j,k} - \mathbf{z}_{j,k-1} \right\| < \delta_s,$$

where $\delta_s$ is a fixed threshold. A zero-velocity update (ZUPT) of the velocity [32] is also performed to correct the pose estimation results.
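A minimal sketch of the stationarity test and a zero-velocity pseudo-measurement update is given below. The pixel threshold, the measurement noise, and the assumption that the velocity occupies a known slice of the state vector are all illustrative.

```python
import numpy as np

PIX_THRESHOLD = 0.5  # assumed maximum pixel motion (px) for declaring the platform still

def is_stationary(prev_pts, cur_pts, threshold=PIX_THRESHOLD):
    """Stationary if the largest displacement of any tracked point is below threshold."""
    disp = np.linalg.norm(cur_pts - prev_pts, axis=1)
    return disp.size > 0 and disp.max() < threshold

def zupt(x, P, vel_slice, noise=1e-4):
    """Zero-velocity pseudo-measurement: observe the velocity block of the state as 0."""
    H = np.zeros((3, x.size))
    H[:, vel_slice] = np.eye(3)
    r = -x[vel_slice]                                 # residual: 0 - v
    S = H @ P @ H.T + noise * np.eye(3)
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ r
    P = (np.eye(x.size) - K @ H) @ P
    return x, P
```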
3. Fast Line Matching
The complex pixel distribution of line features makes their matching more challenging and time-consuming compared to point features. In this section, we propose a novel fast line matching method to break this bottleneck. An overview of our method is shown in Algorithm 1 and details are explained below.
Algorithm 1.
Fast Line Matching.
Extraction: For each line feature, tracking is focused on its two endpoints and its midpoint, rather than the entire line or other sampled points. In other words, for n line features, we track 3n points in total.
Prediction: To counteract aggressive motions, the inertial measurements between two camera frames are used to determine the initial positions of the points for tracking. Specifically, for two consecutive frames $F_k$ and $F_{k+1}$, a point transformation between them is:

$$\lambda_{k+1}\,\mathbf{x}_{k+1} = \mathbf{K}\left(\mathbf{R}_{k+1,k}\,\lambda_k\,\mathbf{K}^{-1}\mathbf{x}_k + \mathbf{t}_{k+1,k}\right), \tag{28}$$

where $\mathbf{x}_k$ and $\mathbf{x}_{k+1}$ are the homogeneous pixel coordinates of the same point in these frames, $\lambda_k$ and $\lambda_{k+1}$ are the corresponding depths, and $\mathbf{K}$ is the intrinsic matrix, which is considered a static variable. The relative pose between $F_k$ and $F_{k+1}$ is represented by $\mathbf{R}_{k+1,k}$ and $\mathbf{t}_{k+1,k}$. Assuming that the translation between two consecutive frames is small enough to be ignored, $\mathbf{t}_{k+1,k}$ and the depths can be removed from Equation (28). Thus, a simplified version is:

$$\mathbf{x}_{k+1} \simeq \mathbf{K}\,\mathbf{R}_{k+1,k}\,\mathbf{K}^{-1}\mathbf{x}_k. \tag{29}$$

We obtain the rotation $\mathbf{R}_{k+1,k}$ by integrating the gyroscope measurements and then compute the predicted positions of the points using Equation (29).
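To illustrate Equation (29), the sketch below integrates gyroscope samples into a small rotation and warps the endpoint/midpoint pixels through $\mathbf{K}\mathbf{R}\mathbf{K}^{-1}$. The simple Euler integration of the gyroscope, the camera-IMU extrinsic rotation `R_ci`, and the frame conventions are illustrative assumptions.

```python
import numpy as np

def so3_exp(phi):
    """Rodrigues formula: rotation matrix for a small rotation vector phi."""
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3)
    a = phi / theta
    A = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(theta) * A + (1 - np.cos(theta)) * (A @ A)

def predict_points(points_px, gyro_samples, dts, K, R_ci=np.eye(3)):
    """Warp pixels from frame k to k+1 using the gyro-integrated rotation only
    (translation between consecutive frames is assumed negligible)."""
    # integrate the body rotation over the inter-frame interval
    R_b = np.eye(3)
    for w, dt in zip(gyro_samples, dts):
        R_b = R_b @ so3_exp(w * dt)
    # rotation of frame k+1 with respect to frame k, expressed in the camera frame
    R_21 = R_ci @ R_b.T @ R_ci.T
    H = K @ R_21 @ np.linalg.inv(K)          # infinite-homography approximation
    pts_h = np.hstack([points_px, np.ones((len(points_px), 1))])
    warped = (H @ pts_h.T).T
    return warped[:, :2] / warped[:, 2:3]
```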
Tracking: After the above stages, the line matching task becomes the tracking of these points, which is finished based on the photometric invariance assumption in LRPL-VIO. Take a single line endpoint as an example. With its original pixel coordinate $\mathbf{x}$ in $F_k$, our idea is to find the displacement $\mathbf{d}$ such that the target pixel coordinate $\mathbf{x} + \mathbf{d}$ in $F_{k+1}$ satisfies Equation (30):

$$I_k(\mathbf{x}) = I_{k+1}(\mathbf{x} + \mathbf{d}), \tag{30}$$

where $I_k(\mathbf{x})$ is the photometric value of the pixel $\mathbf{x}$ in $F_k$. Apparently, we cannot determine $\mathbf{d}$ from a single equation; thus, another assumption is applied: the movements of all pixels in a local window are the same. That is, we have

$$I_k(\mathbf{x}_i) = I_{k+1}(\mathbf{x}_i + \mathbf{d}), \quad i = 1, \ldots, w, \tag{31}$$

for all w pixels in the window. To solve Equation (31), a nonlinear optimization problem is constructed:

$$\mathbf{d}^{*} = \arg\min_{\mathbf{d}} \sum_{i=1}^{w} \left\| I_k(\mathbf{x}_i) - I_{k+1}(\mathbf{x}_i + \mathbf{d}) \right\|^2, \tag{32}$$

where $\mathbf{d}$ is the pixel displacement to be estimated. Equation (32) is a typical least-squares problem and can be solved in an iterative way with the initial values provided by Equation (29). In addition, image pyramids are introduced to improve the tracking quality.
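The endpoint/midpoint tracking above is essentially pyramidal Lucas-Kanade seeded with a good initial guess; a sketch using OpenCV is shown below. The window size, pyramid depth, and the use of the predicted positions as initial flow are the only assumptions here.

```python
import cv2
import numpy as np

def track_line_points(prev_img, cur_img, prev_pts, predicted_pts):
    """Track endpoints/midpoints with pyramidal KLT, seeded by the gyro prediction."""
    prev_pts = prev_pts.astype(np.float32).reshape(-1, 1, 2)
    init = predicted_pts.astype(np.float32).reshape(-1, 1, 2).copy()
    cur_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev_img, cur_img, prev_pts, init,
        winSize=(15, 15), maxLevel=3,
        flags=cv2.OPTFLOW_USE_INITIAL_FLOW)   # start the iteration from the prediction
    return cur_pts.reshape(-1, 2), status.ravel().astype(bool), err.ravel()
```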
Outlier Rejection: Once the points of a line feature have been tracked, we first check the average photometric values at the two endpoints. In other words, an endpoint track is considered an inlier if

$$\left| \bar{I}_k(\mathbf{x}) - \bar{I}_{k+1}(\mathbf{x} + \mathbf{d}^{*}) \right| < \delta_p, \tag{34}$$

where $\bar{I}(\cdot)$ denotes the average photometric value over the local window and $\delta_p$ is the threshold. However, Equation (34) is not enough to reject outliers when there is a large repeated-texture area in the image. For this reason, an angle variation check is also performed if both endpoints pass Equation (34). Namely, if a line matching pair meets

$$\left| \theta_{k+1} - \theta_k \right| < \delta_\theta, \tag{35}$$

where $\theta_k$ and $\theta_{k+1}$ are the angles of the line in the consecutive frames, it is regarded as a candidate line.

Generally, endpoints may move out of view or fail to be tracked. Hence, after obtaining the first batch of candidate lines by checking endpoints, we take the tracked midpoints as new endpoints of the line features that failed the above tests. For example, if an endpoint is not an acceptable tracking result, it is replaced by the tracked midpoint, so the line is rebuilt from the midpoint and the remaining endpoint. Certainly, the replaced line pairs still have to satisfy both Equations (34) and (35). This scheme improves the tracking length of line features with no additional sampled points. Finally, an 8-point RANSAC is performed to further reject outliers among these candidates.
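A compact sketch of this rejection pipeline (photometric check, midpoint fallback, angle check, and a fundamental-matrix RANSAC over the surviving endpoints) follows. The two thresholds, the patch size, and the per-line dictionary format are assumed values for illustration, not the exact LRPL-VIO parameters.

```python
import numpy as np
import cv2

PHOTO_THRESH = 10.0   # assumed max mean intensity difference of a local patch
ANGLE_THRESH = 0.1    # assumed max line-angle change between frames (radians)

def patch_mean(img, pt, half=3):
    x, y = int(round(pt[0])), int(round(pt[1]))
    return float(img[max(y - half, 0):y + half + 1, max(x - half, 0):x + half + 1].mean())

def photometric_ok(img0, img1, p0, p1):
    return abs(patch_mean(img0, p0) - patch_mean(img1, p1)) < PHOTO_THRESH

def angle_ok(s0, e0, s1, e1):
    a0 = np.arctan2(e0[1] - s0[1], e0[0] - s0[0])
    a1 = np.arctan2(e1[1] - s1[1], e1[0] - s1[0])
    d = abs(a1 - a0) % np.pi
    return min(d, np.pi - d) < ANGLE_THRESH

def reject_outliers(img0, img1, lines):
    """Each entry of `lines`: dict with previous/current endpoints and midpoint
    s0,e0,m0 / s1,e1,m1 plus per-point KLT status flags ok_s, ok_e, ok_m."""
    kept, pts0, pts1 = [], [], []
    for ln in lines:
        p0s, p1s = ln["s0"], ln["s1"]
        p0e, p1e = ln["e0"], ln["e1"]
        # midpoint fallback: replace a lost or photometrically bad endpoint
        if not (ln["ok_s"] and photometric_ok(img0, img1, p0s, p1s)):
            p0s, p1s = (ln["m0"], ln["m1"]) if ln["ok_m"] else (None, None)
        if not (ln["ok_e"] and photometric_ok(img0, img1, p0e, p1e)):
            p0e, p1e = (ln["m0"], ln["m1"]) if ln["ok_m"] else (None, None)
        if p1s is None or p1e is None or np.allclose(p1s, p1e):
            continue
        if not angle_ok(p0s, p0e, p1s, p1e):
            continue
        kept.append((p1s, p1e)); pts0 += [p0s, p0e]; pts1 += [p1s, p1e]
    # 8-point RANSAC over the surviving endpoint correspondences
    if len(pts0) >= 8:
        _, mask = cv2.findFundamentalMat(np.float32(pts0), np.float32(pts1),
                                         cv2.FM_RANSAC, 1.0, 0.99)
        if mask is not None:
            mask = mask.ravel().astype(bool)
            kept = [kp for i, kp in enumerate(kept) if mask[2 * i] and mask[2 * i + 1]]
    return kept
```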
Matching: After all this, we build matched line features by connecting the reserved endpoints and remove short ones, which are useless for pose estimation.
4. Experiments
4.1. Dataset and Evaluation
To validate the necessity of fusing point–line features and the performance of our LRPL-VIO in different scenes, we conduct various experiments on three public academic datasets (EuRoC [33], UMA-VI [34], and VIODE [35]) and a collected real-world dataset. Four state-of-the-art algorithms (point-based VINS-Mono [7] and HybVIO [9], point–line-based PL-VIO [10] and PL-VINS [12]) are selected for comparison.
For the evaluation criterion, we choose the root mean square error (RMSE) of the absolute trajectory error (ATE) to test the estimation accuracy of the different algorithms. For EuRoC, VIODE, and our collected dataset, which provide ground-truth poses for the whole run, we use the evo [36] toolbox to compute the RMSE ATE between the whole estimated trajectory and the ground-truth poses. For the UMA-VI dataset, whose ground-truth poses are only available at the start and end segments of each run, we use its Python tool to compute the RMSE ATE between these segments of the estimated trajectory and the ground-truth poses (the alignment error [34,37]). We report the average value over five runs.
A desktop computer with an Intel Core i7-9750H processor @2.60GHz and 15.5 GB RAM is used as the main experiment platform running Ubuntu 18.04 with ROS melodic.
4.2. Accuracy
In this subsection, we conduct an accuracy experiment on the EuRoC [33] dataset. It was recorded by a micro aerial vehicle (MAV) in three different indoor scenes. The sequences in each scene are divided into three modes (easy, medium, and difficult) according to the image quality and the MAV motion speed. The results are shown as follows.
4.2.1. Ablation Experiment
In order to validate the effectiveness of LRPL-VIO with point–line fusion, a fast front end, and feature track selection, we first conduct an ablation experiment on five sequences of the EuRoC dataset: MH_02_easy, MH_03_medium, MH_05_difficult, V1_03_difficult, and V2_02_medium. For the matching comparison, we replace the fast line matching method with the LBD matching module of PL-VINS in our system (denoted as LRPL-VIO (LBD)). In addition, the line feature selection module is disabled (denoted as LRPL-VIO (All Line Track)) to prove its necessity. The results are shown in Table 1.
Table 1.
The results of the ablation experiment, evaluated using the RMSE ATE in meters.
First, it can be seen from Table 1 that the point–line fusion strategy brings more visual constraints to the VIO system; thus, LRPL-VIO produces more accurate trajectories than the point-only HybVIO (an 11% improvement on average). Second, the proposed fast line matching method finishes line matching more efficiently than LBD, with higher matching quality (LRPL-VIO obtains a lower RMSE ATE than LRPL-VIO (LBD) on all five sequences) and less running time (see Table 6). Finally, the feature track selection scheme avoids using all tracked line features and their measurements; thus, the pose estimation accuracy is guaranteed (a 2% improvement on average) even when using a small number of features (at most 5 successful line updates per frame in our implementation).
4.2.2. Accuracy Experiment
We use all 11 sequences of the EuRoC dataset to test the pose estimation accuracy of LRPL-VIO and compare it with four state-of-the-art open-source algorithms. The results are shown in Table 2.
Table 2.
The results of the pose estimation accuracy test, evaluated using the RMSE ATE in meters.
Compared with the two point-only methods VINS-Mono and HybVIO, LRPL-VIO outperforms them on most sequences because of its successful point–line fusion. Using visual constraints from multiple feature types, visual–inertial navigation systems can perform pose estimation more accurately; the average RMSE of LRPL-VIO is more than 10% lower than theirs. With the improved line matching quality of the proposed method and the feature selection scheme, line features are used in LRPL-VIO in a more efficient way. Thus, compared with the LBD-based PL-VIO and PL-VINS, LRPL-VIO achieves a more than 7% lower average RMSE with less computational resource consumption (see Table 6).
4.3. Robustness
To further validate the robustness of the proposed LRPL-VIO, we select some challenging sequences from the following two datasets:
The UMA-VI dataset [34] is recorded by a custom handheld visual–inertial sensor suite. The images recorded in different scenes are severely affected by many challenging factors including low texture, illumination change, sun overexposure, and motion blur, which makes it a difficult dataset for VIO algorithms.
The VIODE dataset [35] is recorded by a simulated unmanned aerial vehicle (UAV) in dynamic environments. The novelty of this dataset is that the UAV navigates the same path in four sub-sequences (none, low, mid, high) of each scene, and the only difference between them is the number of dynamic objects.
Table 3.
The features of the selected challenging sequences.
Table 4.
The results of the robustness experiment. For evaluation, the alignment error in meters is calculated on the UMA-VI dataset, and the RMSE ATE in meters is calculated on the VIODE dataset.
An effective point–line fusion strategy improves the robustness of visual–inertial odometry algorithms. From Table 4, we can see that PL-VINS and LRPL-VIO perform successful pose estimation on all these challenging sequences. However, LRPL-VIO shows a better performance, with a lower error on each sequence, which validates its better robustness. We also provide the alignment error figures and the heat maps of the estimated trajectories of PL-VINS and LRPL-VIO in Figure 2. For the alignment error figures, the smaller the translational error is, the better the accuracy the VIO provides. For the heat maps, the difference between the estimated trajectory and the ground-truth poses is marked in different colors. Figure 2 thus further confirms the better robustness of LRPL-VIO compared with PL-VINS.
Figure 2.
The pose estimation error of PL-VINS and LRPL-VIO on the UMA-VI and VIODE dataset. (a) The alignment error of PL-VINS in class_csc2. (b) The alignment error of PL-VINS in parking_csc2. (c) The RMSE ATE of PL-VINS in cd3_high. (d) The alignment error of LRPL-VIO in class_csc2. (e) The alignment error of LRPL-VIO in parking_csc2. (f) The RMSE ATE of LRPL-VIO in cd3_high.
4.4. Real-World Performance
To test the performance of LRPL-VIO in real-world applications, we collected a custom dataset in a challenging indoor scene. A sensor suite with an Intel RealSense D455 camera (gray images, 30 Hz) and an Xsens MTi-680G IMU (inertial measurements, 200 Hz) is used as the collection platform. Two motion modes (normal and fast rotation) are applied to produce different evaluation sequences, which are shown in Figure 3a,b. The results are shown in Table 5.
Figure 3.
The figures of real-world experiments. (a) An example image of sequence Lab_Normal. (b) An example image of sequence Lab_FastRotation. (c) The 3D error map of HybVIO in Lab_Normal. (d) The X-Y plane of 3D error map of HybVIO in Lab_Normal. (e) The 3D error map of HybVIO in Lab_FastRotation. (f) The X-Y plane of 3D error map of HybVIO in Lab_FastRotation. (g) The 3D error map of LRPL-VIO in Lab_Normal. (h) The X-Y plane of 3D error map of LRPL-VIO in Lab_Normal. (i) The 3D error map of LRPL-VIO in Lab_FastRotation. (j) The X-Y plane of 3D error map of LRPL-VIO in Lab_FastRotation.
Table 5.
The results of the real-world experiments, evaluated using the RMSE ATE in meters.
From Table 5, it can be seen that LRPL-VIO performs pose estimation more accurately than HybVIO in these experiments. The RMSE ATE of LRPL-VIO is 35.4% lower in Lab_Normal and 26.5% lower in Lab_FastRotation. Fusing multiple feature types brings more constraints; thus, the estimated trajectories of LRPL-VIO are closer to the ground-truth poses, as Figure 3c–j shows more intuitively.
4.5. Runtime
To evaluate the real-time performance of LRPL-VIO, we divide it into three main modules: point processing (front end), line processing (front end), and VIO (back end), for convenience of comparison with PL-VIO and PL-VINS. The MH_04_difficult sequence of the EuRoC dataset is used for this test. The results are shown in Table 6.
Table 6.
The results of the runtime analysis, evaluated in milliseconds.
As shown in Table 6, LBD matching and the heavy optimization back end are the most time-consuming modules of PL-VIO and PL-VINS. In contrast, the proposed fast line matching method in Section 3 gives our system high efficiency; the execution time of the line detection and tracking process of LRPL-VIO is much lower than theirs. In addition, our core pose estimation scheme is an efficient EKF with a unique feature selection scheme, which makes our total processing speed for a single frame nearly three times faster than that of PL-VINS.
5. Conclusions and Future Work
In this paper, a novel point–line visual–inertial odometry algorithm is proposed to address positioning issues in complex environments such as those with weak texture and dynamic features. The short runtime of feature correspondence is maintained by a fast line matching method; thus, the whole system can work at a high frequency. A line feature selection scheme is utilized to further improve the efficiency of the core filter. Validation experiments on the EuRoC, UMA-VI, and VIODE datasets have shown the better performance and efficiency of our system compared with other state-of-the-art open-source algorithms (HybVIO [9], VINS-Mono [7], PL-VIO [10], and PL-VINS [12]). In the future, we will try to introduce the structural constraints of 3D line features and plane features to further improve the accuracy.
Author Contributions
Conceptualization, F.Z. and L.S.; investigation, F.Z.; methodology, F.Z., L.Z. and L.S.; software, F.Z., W.L. and J.L.; writing—original draft preparation, F.Z.; writing—review and editing, F.Z., L.Z., W.L., J.L. and L.S. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Natural Science Foundation under Grant 62173192 and Shenzhen Natural Science Foundation under Grant JCYJ20220530162202005.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data sharing is not applicable.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Tourani, A.; Bavle, H.; Sanchez-Lopez, J.L.; Voos, H. Visual SLAM: What Are the Current Trends and What to Expect? Sensors 2022, 22, 9297. [Google Scholar] [CrossRef]
- Forster, C.; Pizzoli, M.; Scaramuzza, D. SVO: Fast Semi-Direct Monocular Visual Odometry. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 15–22. [Google Scholar]
- Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Trans. Robot. 2015, 31, 1147–1163. [Google Scholar] [CrossRef]
- Engel, J.; Koltun, V.; Cremers, D. Direct sparse odometry. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 611–625. [Google Scholar] [CrossRef] [PubMed]
- Huang, G. Visual-Inertial Navigation: A Concise Review. In Proceedings of the 2019 IEEE International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 9572–9582. [Google Scholar]
- Leutenegger, S.; Lynen, S.; Bosse, M.; Siegwart, R.; Furgale, P. Keyframe-based Visual-Inertial Odometry Using Nonlinear Optimization. Int. J. Robot. Res. 2015, 34, 314–334. [Google Scholar] [CrossRef]
- Qin, T.; Li, P.; Shen, S. VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator. IEEE Trans. Robot. 2018, 34, 1004–1020. [Google Scholar] [CrossRef]
- Mourikis, A.I.; Roumeliotis, S.I. A Multi-State Constraint Kalman Filter for Vision-Aided Inertial Navigation. In Proceedings of the 2007 IEEE International Conference on Robotics and Automation (ICRA), Rome, Italy, 10–14 April 2007; pp. 3565–3572. [Google Scholar]
- Seiskari, O.; Rantalankila, P.; Kannala, J.; Ylilammi, J.; Rahtu, E.; Solin, A. HybVIO: Pushing the Limits of Real-time Visual-Inertial Odometry. In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 701–710. [Google Scholar]
- He, Y.; Zhao, J.; Guo, Y.; He, W.; Yuan, K. PL-VIO: Tightly-Coupled Monocular Visual-Inertial Odometry using Point and Line Features. Sensors 2018, 18, 1159. [Google Scholar] [CrossRef]
- Von Gioi, R.G.; Jakubowicz, J.; Morel, J.M.; Randall, G. LSD: A Fast Line Segment Detector with A False Detection Control. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 32, 722–732. [Google Scholar] [CrossRef]
- Fu, Q.; Wang, J.; Yu, H.; Ali, I.; Guo, F.; He, Y.; Zhang, H. PL-VINS: Real-Time Monocular Visual-Inertial SLAM with Point and Line Features. arXiv 2020, arXiv:2009.07462. [Google Scholar]
- Li, W.; Cai, H.; Zhao, S.; Liu, Y.; Liu, C. A Fast Visual-Inertial Odometry Based on Line Midpoint Descriptor. Int. J. Autom. Comput. 2021, 18, 667–679. [Google Scholar] [CrossRef]
- Li, J.H.; Li, S.; Zhang, G.; Lim, J.; Chung, W.K.; Suh, I.H. Outdoor Place Recognition in Urban Environments Using Straight Lines. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 5550–5557. [Google Scholar]
- Calonder, M.; Lepetit, V.; Strecha, C.; Fua, P. BRIEF: Binary Robust Independent Elementary Features. In Proceedings of the 2010 European Conference on Computer Vision (ECCV), Heraklion, Crete, Greece, 5–11 September 2010; pp. 778–792. [Google Scholar]
- Kuang, Z.; Wei, W.; Yan, Y.; Li, J.; Lu, G.; Peng, Y.; Li, J.; Shang, W. A Real-time and Robust Monocular Visual Inertial SLAM System Based on Point and Line Features for Mobile Robots of Smart Cities Toward 6G. IEEE Open J. Commun. Soc. 2022, 3, 1950–1962. [Google Scholar] [CrossRef]
- Akinlar, C.; Topal, C. EDLines: A Real-time Line Segment Detector with A False Detection Control. Pattern Recognit. Lett. 2011, 32, 1633–1642. [Google Scholar] [CrossRef]
- Zheng, F.; Tsai, G.; Zhang, Z.; Liu, S.; Chu, C.C.; Hu, H. Trifo-VIO: Robust and Efficient Stereo Visual Inertial Odometry using Points and Lines. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 3686–3693. [Google Scholar]
- Yang, Y.; Geneva, P.; Eckenhoff, K.; Huang, G. Visual-Inertial Navigation with Point and Line Features. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 4–8 November 2019; p. 3. [Google Scholar]
- Zhang, L.; Koch, R. An Efficient and Robust Line Segment Matching Approach Based on LBD Descriptor and Pairwise Geometric Consistency. J. Vis. Commun. Image Represent. 2013, 24, 794–805. [Google Scholar] [CrossRef]
- Wei, H.; Tang, F.; Xu, Z.; Zhang, C.; Wu, Y. A Point-Line VIO System With Novel Feature Hybrids and With Novel Line Predicting-Matching. IEEE Robot. Automat. Lett. 2021, 6, 8681–8688. [Google Scholar] [CrossRef]
- Solin, A.; Cortes, S.; Rahtu, E.; Kannala, J. PIVO: Probabilistic Inertial-Visual Odometry for Occlusion-Robust Navigation. In Proceedings of the 2018 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 616–625. [Google Scholar]
- Titterton, D.; Weston, J.L. Strapdown Inertial Navigation Technology; The Institution of Electrical Engineers: Stevenage, UK, 2004; pp. 42–45. [Google Scholar]
- Uhlenbeck, G.E.; Ornstein, L.S. On The Theory of The Brownian Motion. Phys. Rev. 1930, 36, 823. [Google Scholar] [CrossRef]
- Shi, J.; Tomasi, C. Good Features To Track. In Proceedings of the 1994 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 21–23 June 1994; pp. 593–600. [Google Scholar]
- Lucas, B.D.; Kanade, T. An Iterative Image Registration Technique with An Application to Stereo Vision. In Proceedings of the 1981 International Joint Conference on Artificial Intelligence (IJCAI), Vancouver, BC, Canada, 24–28 August 1981; pp. 674–679. [Google Scholar]
- Kanatani, K.i. Analysis of 3-D Rotation Fitting. IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 543–549. [Google Scholar] [CrossRef]
- Nistér, D. An Efficient Solution to The Five-Point Relative Pose Problem. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 756–770. [Google Scholar] [CrossRef]
- Zhang, G.; Lee, J.H.; Lim, J.; Suh, I.H. Building a 3-D Line-Based Map Using Stereo SLAM. IEEE Trans. Robot. 2015, 31, 1364–1377. [Google Scholar] [CrossRef]
- Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2003; pp. 322–323. [Google Scholar]
- Lee, S.; Hwang, S. Elaborate Monocular Point and Line SLAM with Robust Initialization. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1121–1129. [Google Scholar]
- Solin, A.; Cortes, S.; Rahtu, E.; Kannala, J. Inertial Odometry on Handheld Smartphones. In Proceedings of the 2018 International Conference on Information Fusion (FUSION), Cambridge, UK, 10–13 July 2018; pp. 1–5. [Google Scholar]
- Burri, M.; Nikolic, J.; Gohl, P.; Schneider, T.; Rehder, J.; Omari, S.; Achtelik, M.W.; Siegwart, R. The EuRoC Micro Aerial Vehicle Datasets. Int. J. Robot. Res. 2016, 35, 1157–1163. [Google Scholar] [CrossRef]
- Zuñiga-Noël, D.; Jaenal, A.; Gomez-Ojeda, R.; Gonzalez-Jimenez, J. The UMA-VI Dataset: Visual-Inertial Odometry in Low-textured and Dynamic Illumination Environments. Int. J. Robot. Res. 2020, 39, 1052–1060. [Google Scholar] [CrossRef]
- Minoda, K.; Schilling, F.; Wüest, V.; Floreano, D.; Yairi, T. VIODE: A Simulated Dataset to Address The Challenges of Visual-Inertial Odometry in Dynamic Environments. IEEE Robot. Automat. Lett. 2021, 6, 1343–1350. [Google Scholar] [CrossRef]
- Grupp, M. EVO: Python Package for The Evaluation of Odometry and SLAM. 2017. Available online: https://github.com/MichaelGrupp/evo (accessed on 3 January 2024).
- Engel, J.; Usenko, V.; Cremers, D. A Photometrically Calibrated Benchmark For Monocular Visual Odometry. arXiv 2016, arXiv:1607.02555. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).