D-VINS: Dynamic Adaptive Visual–Inertial SLAM with IMU Prior and Semantic Constraints in Dynamic Scenes
Abstract
1. Introduction
- A feature classification method based on the YOLOv5 [12] object detection algorithm is proposed for the front end. It divides feature points into three categories: absolute static points, absolute dynamic points, and temporary static points. Dynamic factors of the temporary static features are then calculated from the IMU pre-integration prior constraint and the epipolar constraint, and the temporary static features are reclassified according to these dynamic factors.
- A robust BA optimization method based on the dynamic factor is proposed for the back end: the more dynamic an object is, the more the weights of its features are decreased, and vice versa (a code sketch follows this list).
- Extensive experiments are carried out on the public TUM, KITTI, and VIODE datasets, as well as on our own dataset. These datasets contain many occlusion scenes, so the results obtained on them are representative. The experimental results demonstrate the accuracy and robustness of the proposed D-VINS.
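As a concrete illustration of the back-end weighting in the second contribution (detailed in Section 3.3.2), the following C++ sketch scales each feature's reprojection residual in a Ceres-based bundle adjustment by a weight derived from its dynamic factor. The exponential mapping `DynamicWeight`, the toy translation-only residual, and the camera intrinsics are illustrative assumptions, not the paper's exact cost function.

```cpp
// Hedged sketch of dynamic-adaptive BA weighting: features judged more
// dynamic contribute less to the optimization. Assumed names/mappings only.
#include <ceres/ceres.h>
#include <cmath>

// Toy pinhole reprojection residual with translation-only camera state;
// a real VIO back end would also optimize rotation and use calibrated intrinsics.
struct ReprojResidual {
  ReprojResidual(double u, double v) : u_(u), v_(v) {}
  template <typename T>
  bool operator()(const T* const cam_t, const T* const point, T* residual) const {
    const T p[3] = {point[0] - cam_t[0], point[1] - cam_t[1], point[2] - cam_t[2]};
    residual[0] = T(kFx) * p[0] / p[2] + T(kCx) - T(u_);
    residual[1] = T(kFy) * p[1] / p[2] + T(kCy) - T(v_);
    return true;
  }
  static constexpr double kFx = 460.0, kFy = 460.0, kCx = 320.0, kCy = 240.0;
  double u_, v_;
};

// Assumed mapping from dynamic factor d >= 0 to a weight in (0, 1]:
// larger d (more dynamic) -> smaller weight.
double DynamicWeight(double d, double lambda = 2.0) {
  return std::exp(-lambda * d);
}

void AddWeightedFeature(ceres::Problem& problem, double* cam_t, double* point,
                        double u, double v, double dynamic_factor) {
  auto* cost = new ceres::AutoDiffCostFunction<ReprojResidual, 2, 3, 3>(
      new ReprojResidual(u, v));
  // ScaledLoss multiplies the robust Huber cost by the per-feature weight.
  auto* loss = new ceres::ScaledLoss(new ceres::HuberLoss(1.0),
                                     DynamicWeight(dynamic_factor),
                                     ceres::TAKE_OWNERSHIP);
  problem.AddResidualBlock(cost, loss, cam_t, point);
}
```

Here `ceres::ScaledLoss` down-weights the whole robust cost per feature, which matches the stated intent of the contribution; the paper derives its weights from the dynamic factors of Section 3.2, which this sketch only approximates.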
2. Related Work
2.1. Geometry-Based Dynamic SLAM
2.2. Semantic-Based Dynamic SLAM
3. Methods
3.1. Dynamic Object Classification
3.1.1. Semantic Label Incremental Updating with Bayes’ Rule
Algorithm 1: Semantic label updating with Bayes' rule

Input: the current frame's bounding boxes; the current frame's feature points; the previous frame's dynamic labels; the current frame's dynamic labels before updating; the dynamic-label threshold; the number of times each feature point has been observed.
Output: the current frame's dynamic labels.

1: for each feature point p in this frame do:
2:   for each bounding box b in this frame do:
3:     if (InThisBoundingBox(p, b)) && (b belongs to a dynamic class) then
4:       p.observed_count++;
5:       update p's dynamic probability from the previous frame's label with Bayes' rule;
6:       record the updated probability as p's non-thresholded dynamic label;
7:     end if
8:   end for
9:   compare the updated probability with the threshold to assign p's dynamic label;
10: end for
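A minimal C++ sketch of the update loop above, assuming a binary Bayes filter over each feature's probability of being dynamic; the likelihoods `p_hit`/`p_false`, the clamping bounds, and the threshold are assumed values rather than the paper's.

```cpp
// Incremental Bayesian label update in the spirit of Algorithm 1: each
// feature keeps P(dynamic), refreshed whenever it is observed inside (or
// outside) a dynamic-class bounding box. All constants are assumptions.
#include <algorithm>

struct Feature {
  double p_dynamic = 0.5;  // prior: motion state unknown
  int observed_count = 0;  // how often this feature has been observed
  bool dynamic_label = false;
};

// Binary Bayes update with assumed detector likelihoods:
// p_hit  = P(inside dynamic box | feature is dynamic)
// p_false = P(inside dynamic box | feature is static)
void UpdateLabel(Feature& f, bool in_dynamic_box,
                 double p_hit = 0.8, double p_false = 0.2,
                 double threshold = 0.7) {
  const double lz_dyn = in_dynamic_box ? p_hit : 1.0 - p_hit;
  const double lz_sta = in_dynamic_box ? p_false : 1.0 - p_false;
  const double num = lz_dyn * f.p_dynamic;
  f.p_dynamic = num / (num + lz_sta * (1.0 - f.p_dynamic));
  f.p_dynamic = std::clamp(f.p_dynamic, 0.01, 0.99);  // avoid saturation
  f.observed_count++;
  f.dynamic_label = f.p_dynamic > threshold;  // thresholded label (step 9)
}
```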
3.1.2. Feature Point Motion State Classification
3.2. Feature Dynamic Check with IMU Prior and Epipolar Constraints
3.2.1. Dynamic Factor of Reprojection Error Based on IMU Prior Constraint
3.2.2. Dynamic Factor of Epipolar Constraints
Algorithm 2: Dynamic feature rejection algorithm

Input: the previous frame; the current frame; the previous frame's feature points; the current frame's feature points; the threshold of the reprojection dynamic factor; the threshold of the epipolar dynamic factor.
Output: the current frame's feature points' dynamic factors A and B; the current frame's feature points' dynamic labels.

1: for each feature point p in this frame do:
2:   if (p.dynamics_label == Temporary Static Point) then
3:     F_Matrix = cv::findFundamentalMat(prev_pts, curr_pts, CV_FM_RANSAC);
4:     p.A = CalIMUProjectDis(p_prev, p_curr);
5:     p.B = CalEpipolarDis(p_prev, p_curr, F_Matrix);
6:     if (p.A > threshold_A) && (p.B > threshold_B) then
7:       p.dynamics_label = Absolute Dynamic Point;
8:     end if
9:   end if
10: end for
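To make the epipolar branch of Algorithm 2 concrete, here is a hedged OpenCV sketch: it estimates the fundamental matrix with RANSAC and flags temporary static points whose point-to-epipolar-line distance exceeds a pixel threshold. The IMU-prior reprojection check (`CalIMUProjectDis`) is omitted, and `epi_thresh_px` is an assumed value.

```cpp
// Epipolar consistency check for matched feature pairs (prev -> curr).
// Assumed threshold; only the epipolar half of Algorithm 2 is shown.
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <cmath>
#include <vector>

// Distance from the current-frame point to the epipolar line l = F * x_prev.
double EpipolarDistance(const cv::Point2f& p_prev, const cv::Point2f& p_curr,
                        const cv::Mat& F) {
  const cv::Mat x_prev = (cv::Mat_<double>(3, 1) << p_prev.x, p_prev.y, 1.0);
  const cv::Mat l = F * x_prev;
  const double a = l.at<double>(0), b = l.at<double>(1), c = l.at<double>(2);
  return std::fabs(a * p_curr.x + b * p_curr.y + c) / std::sqrt(a * a + b * b);
}

std::vector<bool> FlagDynamicByEpipolar(const std::vector<cv::Point2f>& prev_pts,
                                        const std::vector<cv::Point2f>& curr_pts,
                                        double epi_thresh_px = 1.5 /* assumed */) {
  std::vector<bool> is_dynamic(curr_pts.size(), false);
  const cv::Mat F =
      cv::findFundamentalMat(prev_pts, curr_pts, cv::FM_RANSAC, 1.0, 0.99);
  if (F.empty()) return is_dynamic;  // degenerate geometry: keep labels as-is
  for (size_t i = 0; i < curr_pts.size(); ++i) {
    is_dynamic[i] = EpipolarDistance(prev_pts[i], curr_pts[i], F) > epi_thresh_px;
  }
  return is_dynamic;
}
```

In the full algorithm, a temporary static point is relabeled dynamic only when both the reprojection factor A and the epipolar factor B exceed their thresholds.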
3.3. Dynamic Adaptive Bundle Adjustment
3.3.1. Conventional Bundle Adjustment Optimization
3.3.2. Dynamic Adaptive Cost Function with Dynamic Factors
4. Experimental Results
4.1. TUM RGB-D, VIODE, and KITTI Dataset Evaluation
4.1.1. TUM RGB-D Dataset
4.1.2. KITTI Dataset
4.1.3. VIODE Dataset
4.2. Data Collection Equipment and Real-Environment Dataset Experiments
4.2.1. Data Collection Devices and Real Datasets
- The 5_SLAM_country_dynamic_loop_1 sequence was collected in a village in Xiangyin County, Yueyang City, Hunan Province, in a relatively open environment, where a pedestrian and a child were always present in the image, moving in synchronization with the camera. The start and end points of the sequence are close to each other, but there is no loop closure to correct the drift.
- The 14_SLAM_car_road_1 sequence shows a street in Xiangyin County, Yueyang City, Hunan Province. It was recorded in an open environment that is challenging for stereo visual localization and causes severe drift. The rural roads are narrow and busy with vehicles, and villagers gather in the middle of the road. Pedestrians and vehicles move in intricate patterns and occupy a large portion of the field of view, making localization particularly challenging.
- The 18_SLAM_car_road_2 sequence covers an urban environment with wider roads, more vehicles, and more pedestrians than the rural streets of sequence 14, making it well suited for evaluating dynamic-feature rejection algorithms. The main data types include GNSS raw data, IMU data, LiDAR point clouds, and stereo color images. The ground-truth trajectory is obtained with GNSS RTK.
4.2.2. Feature Classification Results on the Real Dataset
4.2.3. Trajectory Results on the Real Dataset
4.2.4. Time Analysis and Ablation Experiment
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Kazerouni, I.A.; Fitzgerald, L.; Dooly, G.; Toal, D. A survey of state-of-the-art on visual SLAM. Expert Syst. Appl. 2022, 205, 117734.
- Covolan, J.P.M.; Sementille, A.C.; Sanches, S.R.R. A Mapping of Visual SLAM Algorithms and Their Applications in Augmented Reality. In Proceedings of the 2020 22nd Symposium on Virtual and Augmented Reality (SVR), Porto de Galinhas, Brazil, 7–10 November 2020; pp. 20–29.
- Tourani, A.; Bavle, H.; Sanchez-Lopez, J.L.; Voos, H. Visual SLAM: What Are the Current Trends and What to Expect? Sensors 2022, 22, 9297.
- Chen, C.; Zhu, H.; Li, M.; You, S. A Review of Visual-Inertial Simultaneous Localization and Mapping from Filtering-Based and Optimization-Based Perspectives. Robotics 2018, 7, 45.
- Cvisic, I.; Markovic, I.; Petrovic, I. SOFT2: Stereo Visual Odometry for Road Vehicles Based on a Point-to-Epipolar-Line Metric. IEEE Trans. Robot. 2022, 39, 273–288.
- Qin, T.; Li, P.; Shen, S. VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator. IEEE Trans. Robot. 2018, 34, 1004–1020.
- Campos, C.; Elvira, R.; Rodriguez, J.J.G.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890.
- von Stumberg, L.; Cremers, D. DM-VIO: Delayed Marginalization Visual-Inertial Odometry. IEEE Robot. Autom. Lett. 2022, 7, 1408–1415.
- Qin, T.; Cao, S.; Pan, J.; Shen, S. A General Optimization-Based Framework for Global Pose Estimation with Multiple Sensors. arXiv 2019, arXiv:1901.03642.
- Mur-Artal, R.; Tardos, J.D. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
- Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. In Readings in Computer Vision; Fischler, M.A., Firschein, O., Eds.; Morgan Kaufmann: San Francisco, CA, USA, 1987; pp. 726–740. ISBN 978-0-08-051581-6.
- Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Michael, K.; TaoXie; Fang, J.; NanoCode012; Imyhxy; et al. Ultralytics/YOLOv5: v7.0—YOLOv5 SOTA Realtime Instance Segmentation; Zenodo, 2022. Available online: https://zenodo.org/record/7347926 (accessed on 1 May 2023).
- Yan, L.; Hu, X.; Zhao, L.; Chen, Y.; Wei, P.; Xie, H. DGS-SLAM: A Fast and Robust RGBD SLAM in Dynamic Environments Combined by Geometric and Semantic Information. Remote Sens. 2022, 14, 795.
- Song, S.; Lim, H.; Lee, A.J.; Myung, H. DynaVINS: A Visual-Inertial SLAM for Dynamic Environments. IEEE Robot. Autom. Lett. 2022, 7, 11523–11530.
- Zhang, C.; Zhang, R.; Jin, S.; Yi, X. PFD-SLAM: A New RGB-D SLAM for Dynamic Indoor Environments Based on Non-Prior Semantic Segmentation. Remote Sens. 2022, 14, 2445.
- Bian, J.; Lin, W.-Y.; Matsushita, Y.; Yeung, S.-K.; Nguyen, T.-D.; Cheng, M.-M. GMS: Grid-Based Motion Statistics for Fast, Ultra-Robust Feature Correspondence. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2828–2837.
- Huang, J.; Yang, S.; Zhao, Z.; Lai, Y.-K.; Hu, S. ClusterSLAM: A SLAM Backend for Simultaneous Rigid Body Clustering and Motion Estimation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5874–5883.
- Bescos, B.; Facil, J.M.; Civera, J.; Neira, J. DynaSLAM: Tracking, Mapping, and Inpainting in Dynamic Scenes. IEEE Robot. Autom. Lett. 2018, 3, 4076–4083.
- Xiao, L.; Wang, J.; Qiu, X.; Rong, Z.; Zou, X. Dynamic-SLAM: Semantic monocular visual localization and mapping based on deep learning in dynamic environment. Robot. Auton. Syst. 2019, 117, 1–16.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37.
- Yu, C.; Liu, Z.; Liu, X.-J.; Xie, F.; Yang, Y.; Wei, Q.; Fei, Q. DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1168–1174.
- Lucas, B.D.; Kanade, T. An Iterative Image Registration Technique with an Application to Stereo Vision. In Proceedings of the DARPA Image Understanding Workshop, Washington, DC, USA, 21–23 April 1981; pp. 674–679.
- Ran, T.; Yuan, L.; Zhang, J.; Tang, D.; He, L. RS-SLAM: A Robust Semantic SLAM in Dynamic Environments Based on RGB-D Sensor. IEEE Sens. J. 2021, 21, 20657–20664.
- Liu, J.; Li, X.; Liu, Y.; Chen, H. Dynamic-VINS: RGB-D Inertial Odometry for a Resource-Restricted Robot in Dynamic Environments. IEEE Robot. Autom. Lett. 2022, 7, 9573–9580.
- Wu, W.; Guo, L.; Gao, H.; You, Z.; Liu, Y.; Chen, Z. YOLO-SLAM: A semantic SLAM system towards dynamic environment with geometric constraint. Neural Comput. Appl. 2022, 34, 6011–6026.
- Cheng, S.; Sun, C.; Zhang, S.; Zhang, D. SG-SLAM: A Real-Time RGB-D Visual SLAM Toward Dynamic Scenes with Semantic and Geometric Information. IEEE Trans. Instrum. Meas. 2023, 72, 7501012.
- Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision—ECCV 2014, Zurich, Switzerland, 6–12 September 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 740–755.
- Shi, J.; Tomasi, C. Good Features to Track. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR-94), Seattle, WA, USA, 21–23 June 1994; pp. 593–600.
- Shafi, O.; Rai, C.; Sen, R.; Ananthanarayanan, G. Demystifying TensorRT: Characterizing Neural Network Inference Engine on Nvidia Edge Devices. In Proceedings of the 2021 IEEE International Symposium on Workload Characterization (IISWC), Storrs, CT, USA, 7–9 November 2021; pp. 226–237.
- Wang, Q.; Yan, C.; Tan, R.; Feng, Y.; Sun, Y.; Liu, Y. 3D-CALI: Automatic Calibration for Camera and LiDAR Using 3D Checkerboard. Measurement 2022, 203, 111971.
- Rehder, J.; Nikolic, J.; Schneider, T.; Hinzmann, T.; Siegwart, R. Extending Kalibr: Calibrating the Extrinsics of Multiple IMUs and of Individual Axes. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 4304–4311.
- Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A Benchmark for the Evaluation of RGB-D SLAM Systems. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; pp. 573–580.
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
- Minoda, K.; Schilling, F.; Wuest, V.; Floreano, D.; Yairi, T. VIODE: A Simulated Dataset to Address the Challenges of Visual-Inertial Odometry in Dynamic Environments. IEEE Robot. Autom. Lett. 2021, 6, 1343–1350.
- Zhang, Z.; Scaramuzza, D. A Tutorial on Quantitative Trajectory Evaluation for Visual(-Inertial) Odometry. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 7244–7251.
Sequences | ORB-SLAM2 ATE | ORB-SLAM2 RPE | ORB-SLAM3 ATE | ORB-SLAM3 RPE | D-VINS * (Ours) ATE | D-VINS * (Ours) RPE | Improvement ATE | Improvement RPE
---|---|---|---|---|---|---|---|---
fr3_sitting_static | 0.0116 | 0.0152 | 0.0097 | 0.0060 | 0.0080 | 0.0114 | 17.53% | -
fr3_sitting_xyz | 0.0133 | 0.0199 | 0.0098 | 0.0086 | 0.0153 | 0.0179 | - | -
fr3_sitting_halfsphere | 0.0336 | 0.0124 | 0.0208 | 0.0080 | 0.0252 | 0.0122 | - | -
fr3_walking_static | 0.4121 | 0.0299 | 0.2450 | 0.0163 | 0.0069 | 0.0101 | 97.18% | 38.04%
fr3_walking_xyz | 0.8856 | 0.1255 | 0.5617 | 0.0267 | 0.0155 | 0.0182 | 97.24% | 31.84%
fr3_walking_rpy | 0.5987 | 0.0528 | 0.6841 | 0.0289 | 0.0422 | 0.0432 | 92.95% | -
fr3_walking_half | 0.4227 | 0.0338 | 0.3212 | 0.0202 | 0.0216 | 0.0234 | 93.27% | -
Sequences | DS-SLAM ATE | DS-SLAM RPE | RS-SLAM ATE | RS-SLAM RPE | Dynamic-VINS ATE | Dynamic-VINS RPE | D-VINS * (Ours) ATE | D-VINS * (Ours) RPE
---|---|---|---|---|---|---|---|---
fr3_walking_static | 0.0081 | 0.0102 | 0.0067 | 0.0099 | 0.0077 | 0.0095 | 0.0069 | 0.0101
fr3_walking_xyz | 0.0247 | 0.0333 | 0.0146 | 0.0210 | 0.0486 | 0.0578 | 0.0155 | 0.0182
fr3_walking_rpy | 0.4442 | 0.1503 | 0.1869 | 0.2640 | 0.0629 | 0.0595 | 0.0422 | 0.0432
fr3_walking_half | 0.0303 | 0.0297 | 0.0425 | 0.0609 | 0.0608 | 0.0665 | 0.0216 | 0.0234
Sequences | VINS-Fusion | DynaVINS | D-VINS
---|---|---|---
KITTI 05 | 1.913 | 12.4668 | 1.7631
KITTI 07 | 2.1927 | 3.8006 | 2.1100
Scenes | Sequences | VINS-Fusion | DynaVINS | D-VINS
---|---|---|---|---
Parking_lot | 0_none | 0.0774 | 0.0595 | 0.0538
 | 1_low | 0.1126 | 0.0826 | 0.0472
 | 2_mid | 0.1174 | 0.0630 | 0.0396
 | 3_high | 0.1998 | 0.0982 | 0.0664
City_day | 0_none | 0.1041 | 0.1391 | 0.0882
 | 1_low | 0.2043 | 0.0748 | 0.0912
 | 2_mid | 0.2319 | 0.0520 | 0.0864
 | 3_high | 0.3135 | 0.0743 | 0.0835
City_night | 0_none | 0.2624 | 0.1801 | 0.1561
 | 1_low | 0.5665 | 0.1413 | 0.1221
 | 2_mid | 0.3862 | 0.1192 | 0.1395
 | 3_high | 0.7611 | 0.1519 | 0.1566
Sequences | VINS-Fusion | DynaVINS | D-VINS
---|---|---|---
5_SLAM_dynamic_loop_1 | 0.657039 | 2.145493 | 0.654882
14_SLAM_car_road_1 | 37.31964 | - | 27.60877
18_SLAM_car_road_2 | 299.7889 | - | 151.2075
Scenes | Sequences | D-VINS (G) | D-VINS (S) | D-VINS (G + S)
---|---|---|---|---
Parking_lot | 0_none | 0.1000 | 0.0568 | 0.0538
 | 1_low | 0.1196 | 0.0517 | 0.0472
 | 2_mid | 0.1126 | 0.0709 | 0.0396
 | 3_high | 0.1387 | 0.0702 | 0.0664