OTE-SLAM: An Object Tracking Enhanced Visual SLAM System for Dynamic Environments
Abstract
1. Introduction
- A YOLOv5-seg-based dynamic object detection method, which detects dynamic objects in the image and provides instance segmentation masks for them.
- A ByteTrack-based object tracking method, which tracks the dynamic objects and provides data association between segmented instances in consecutive frames.
- An object position estimation method, which computes the initial positions of the dynamic objects from the camera pose and the object tracks.
- A novel object-aware, tightly coupled bundle adjustment (BA) framework, which jointly optimizes the camera poses, the dynamic object poses, and the positions of the 3D map points.
- A visual SLAM system that operates robustly in dynamic environments while tracking the motion of dynamic objects; a high-level sketch of how these components fit together follows this list.
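The sketch below shows how these components could fit together per frame. Every object and method name here (detector.detect_and_segment, tracker.update, and so on) is an illustrative placeholder under assumed interfaces, not the authors' implementation or any library's API:

```python
def process_frame(frame, slam, detector, tracker, optimizer):
    """Hypothetical per-frame pipeline of an OTE-SLAM-style system.
    All parameter objects are illustrative placeholders."""
    # 1. YOLOv5-seg: detect dynamic-class objects and obtain
    #    per-instance segmentation masks.
    detections = detector.detect_and_segment(frame)

    # 2. ByteTrack: associate the segmented instances with the
    #    object tracks carried over from previous frames.
    tracks = tracker.update(detections)

    # 3. Split extracted ORB features into static and dynamic sets
    #    by testing keypoints against the instance masks.
    static_feats, dynamic_feats = slam.classify_features(frame, tracks)

    # 4. Estimate the camera pose from static features only, so
    #    moving objects do not corrupt ego-motion estimation.
    camera_pose = slam.track_camera(static_feats)

    # 5. Initialize or update each object's 3D position from the
    #    camera pose and its associated dynamic features.
    for track in tracks:
        track.update_position(camera_pose, dynamic_feats)

    # 6. Object-aware BA: jointly refine camera poses, object
    #    positions, and 3D map points.
    optimizer.joint_bundle_adjustment(slam.keyframes, slam.map_points, tracks)
    return camera_pose
```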
2. Related Work
2.1. Dynamic Features Removal
2.2. SLAM with Moving Object Tracking
3. System Overview
3.1. ORB-SLAM2 Framework
3.2. OTE-SLAM Overview
4. Methodology
4.1. Notations
- $\{W\}$, $\{C\}$, and $\{O\}$ represent the world, camera, and object coordinate systems, respectively.
- $i$, $j$, $k$ are the indices of the static map points, objects, and frames, respectively.
- $\mathbf{T}^{k}_{cw} \in SE(3)$ represents the camera pose at the $k$-th frame, a rigid-body transformation that maps points from the world coordinate system to the camera coordinate system.
- $\mathbf{p}^{k}_{j}$ represents the position of the $j$-th object at the $k$-th frame, expressed in the world or camera coordinate system.
- $\mathbf{x}_{i}$ represents a 3D static point in the world or camera coordinate system.
- $\mathbf{v}^{k}_{j}$ represents the linear velocity of the $j$-th object at the $k$-th frame in the world coordinate system.
4.2. YOLOv5-Seg Network
4.3. ByteTrack Multi-Object Tracking
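The key idea of ByteTrack is to associate every detection box rather than discarding low-confidence ones: tracks are first matched to high-score detections, and the leftover tracks are then matched to low-score detections, recovering objects that are momentarily occluded or blurred. The following is a simplified, self-contained sketch of that two-stage IoU association, not the official ByteTrack implementation; the score and IoU thresholds are illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(a, b):
    """Pairwise IoU between two lists of boxes in (x1, y1, x2, y2) form."""
    ious = np.zeros((len(a), len(b)))
    for i, p in enumerate(a):
        for j, q in enumerate(b):
            x1, y1 = max(p[0], q[0]), max(p[1], q[1])
            x2, y2 = min(p[2], q[2]), min(p[3], q[3])
            inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
            union = ((p[2] - p[0]) * (p[3] - p[1])
                     + (q[2] - q[0]) * (q[3] - q[1]) - inter)
            ious[i, j] = inter / (union + 1e-9)
    return ious

def match(tracks, dets, iou_thresh):
    """Hungarian matching on IoU; returns matches and unmatched tracks."""
    if not tracks or not dets:
        return [], list(range(len(tracks)))
    ious = iou_matrix(tracks, dets)
    rows, cols = linear_sum_assignment(-ious)  # maximize total IoU
    matches = [(r, c) for r, c in zip(rows, cols) if ious[r, c] >= iou_thresh]
    matched_tracks = {r for r, _ in matches}
    return matches, [t for t in range(len(tracks)) if t not in matched_tracks]

def byte_associate(track_boxes, det_boxes, det_scores, high=0.6, low=0.1):
    """Two-stage BYTE association. Returns (track_idx, det_idx) pairs."""
    high_idx = [i for i, s in enumerate(det_scores) if s >= high]
    low_idx = [i for i, s in enumerate(det_scores) if low <= s < high]

    # Stage 1: all tracks vs. high-confidence detections.
    m1, leftover = match(track_boxes, [det_boxes[i] for i in high_idx], 0.3)

    # Stage 2: leftover tracks vs. low-confidence detections, which
    # recovers occluded or blurred objects instead of dropping them.
    m2, _ = match([track_boxes[t] for t in leftover],
                  [det_boxes[i] for i in low_idx], 0.5)

    return ([(t, high_idx[d]) for t, d in m1]
            + [(leftover[t], low_idx[d]) for t, d in m2])
```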
4.4. Feature Classification
Algorithm 1: Feature Classification
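The pseudocode of Algorithm 1 is not reproduced in this extract. As a stand-in, here is a minimal sketch of the underlying idea, assuming a feature is labeled dynamic when its keypoint falls inside the (slightly dilated) segmentation mask of a tracked movable object; the dilation margin and the mask-union rule are assumptions, not the paper's exact procedure:

```python
import numpy as np
import cv2

def classify_features(keypoints, object_masks, dilate_px=5):
    """Split keypoints into static/dynamic sets by testing each one
    against the union of dynamic-object instance masks.

    keypoints:    iterable of (u, v) pixel coordinates
    object_masks: list of HxW uint8 masks (1 = tracked dynamic object)
    """
    if not object_masks:
        return list(keypoints), []

    # Union of all dynamic-object masks; a small dilation guards
    # against dynamic pixels leaking past imprecise mask borders.
    union = np.clip(np.sum(object_masks, axis=0), 0, 1).astype(np.uint8)
    kernel = np.ones((2 * dilate_px + 1,) * 2, np.uint8)
    union = cv2.dilate(union, kernel)

    static_kps, dynamic_kps = [], []
    for (u, v) in keypoints:
        ui, vi = int(round(u)), int(round(v))
        inside = 0 <= vi < union.shape[0] and 0 <= ui < union.shape[1]
        if inside and union[vi, ui]:
            dynamic_kps.append((u, v))  # lies on a tracked object
        else:
            static_kps.append((u, v))   # safe for camera tracking
    return static_kps, dynamic_kps
```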
4.5. Object Position Estimation
Algorithm 2: Object Position Initialization and Tracking Strategy
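Algorithm 2's pseudocode is likewise absent from this extract. A hedged sketch of one strategy consistent with the section's description: back-project the object's dynamic keypoints using depth, initialize the object position as their centroid in the world frame, and fall back to a constant-velocity prediction when a track gets no new association. All of this is an assumption about the strategy, not the authors' exact algorithm:

```python
import numpy as np

def backproject(u, v, depth, K):
    """Pixel (u, v) with metric depth -> 3D point in the camera frame."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])

def init_object_position(dyn_kps, depths, K, T_wc):
    """Initialize an object's world-frame position as the centroid of
    its back-projected dynamic keypoints. T_wc (4x4) maps camera
    coordinates to world coordinates."""
    pts_c = np.array([backproject(u, v, d, K)
                      for (u, v), d in zip(dyn_kps, depths) if d > 0])
    centroid_h = np.append(pts_c.mean(axis=0), 1.0)  # homogeneous
    return (T_wc @ centroid_h)[:3]

def predict_position(p_prev, v_prev, dt):
    """Constant-velocity fallback when a track has no new association
    this frame (e.g., brief occlusion)."""
    return p_prev + v_prev * dt
```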
4.6. Joint Optimization
4.6.1. Static Points Re-Projection Error
4.6.2. Dynamic Objects Re-Projection Error
4.6.3. Object Motion Smoothness Regularization
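The paper's derivations are omitted from this extract. Written with the notation of Section 4.1, a plausible form of the three cost terms named above is sketched below; this is an assumption consistent with standard object-aware bundle adjustment, not necessarily the authors' exact formulation. Here $\pi(\cdot)$ denotes the camera projection function, $\rho$ a robust kernel (e.g., Huber), and $\lambda$ a weighting factor.

```latex
% Static point re-projection error: keypoint observation z_{i,k}
% vs. the projection of map point x_i under camera pose T^k_{cw}.
e^{\mathrm{stat}}_{i,k} = \mathbf{z}_{i,k} - \pi\!\left(\mathbf{T}^{k}_{cw}\,\mathbf{x}_{i}\right)

% Dynamic object re-projection error: the tracked 2D observation of
% object j vs. the projection of its estimated position p^k_j.
e^{\mathrm{dyn}}_{j,k} = \mathbf{z}_{j,k} - \pi\!\left(\mathbf{T}^{k}_{cw}\,\mathbf{p}^{k}_{j}\right)

% Motion smoothness regularization: a constant-velocity prior that
% penalizes abrupt changes in an object's trajectory between frames.
e^{\mathrm{smo}}_{j,k} = \mathbf{p}^{k+1}_{j} - \left(\mathbf{p}^{k}_{j} + \mathbf{v}^{k}_{j}\,\Delta t\right)

% Joint objective over camera poses, object positions, and map points.
E = \sum_{i,k} \rho\!\left(\big\lVert e^{\mathrm{stat}}_{i,k}\big\rVert^{2}\right)
  + \sum_{j,k} \rho\!\left(\big\lVert e^{\mathrm{dyn}}_{j,k}\big\rVert^{2}\right)
  + \lambda \sum_{j,k} \big\lVert e^{\mathrm{smo}}_{j,k}\big\rVert^{2}
```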
5. Experiments
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Full Term |
| --- | --- |
| SLAM | Simultaneous localization and mapping |
| MOT | Multi-object tracking |
| RANSAC | Random sample consensus |
| ORB | Oriented FAST and Rotated BRIEF |
| YOLO | You Only Look Once |
| MSE | Mean squared error |
| MAE | Mean absolute error |
| APE | Absolute pose error |
| RPE | Relative pose error |
| RMSE | Root mean square error |
| STD | Standard deviation |
References
- Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
- Engel, J.; Koltun, V.; Cremers, D. Direct sparse odometry. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 611–625.
- Forster, C.; Pizzoli, M.; Scaramuzza, D. SVO: Fast semi-direct monocular visual odometry. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 15–22.
- Pumarola, A.; Vakhitov, A.; Agudo, A.; Sanfeliu, A.; Moreno-Noguer, F. PL-SLAM: Real-time monocular visual SLAM with points and lines. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 4503–4508.
- Tateno, K.; Tombari, F.; Laina, I.; Navab, N. CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6243–6252.
- Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395.
- Yu, C.; Liu, Z.; Liu, X.J.; Xie, F.; Yang, Y.; Wei, Q.; Fei, Q. DS-SLAM: A semantic visual SLAM towards dynamic environments. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1168–1174.
- Bescos, B.; Fácil, J.M.; Civera, J.; Neira, J. DynaSLAM: Tracking, mapping, and inpainting in dynamic scenes. IEEE Robot. Autom. Lett. 2018, 3, 4076–4083.
- Zhong, F.; Wang, S.; Zhang, Z.; Wang, Y. Detect-SLAM: Making object detection and SLAM mutually beneficial. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 1001–1010.
- Xiao, L.; Wang, J.; Qiu, X.; Rong, Z.; Zou, X. Dynamic-SLAM: Semantic monocular visual localization and mapping based on deep learning in dynamic environment. Robot. Auton. Syst. 2019, 117, 1–16.
- Fan, Y.; Zhang, Q.; Tang, Y.; Liu, S.; Han, H. Blitz-SLAM: A semantic SLAM in dynamic environments. Pattern Recognit. 2022, 121, 108225.
- Bescos, B.; Campos, C.; Tardós, J.D.; Neira, J. DynaSLAM II: Tightly-coupled multi-object tracking and SLAM. IEEE Robot. Autom. Lett. 2021, 6, 5191–5198.
- Zhang, J.; Henein, M.; Mahony, R.; Ila, V. VDO-SLAM: A visual dynamic object-aware SLAM system. arXiv 2020, arXiv:2005.11052.
- Ballester, I.; Fontán, A.; Civera, J.; Strobl, K.H.; Triebel, R. DOT: Dynamic object tracking for visual SLAM. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi'an, China, 30 May–5 June 2021; pp. 11705–11711.
- Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-object tracking by associating every detection box. In Proceedings of the Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Proceedings, Part XXII; Springer: Berlin, Germany, 2022; pp. 1–21.
- Wu, Q.; Shi, S.; Wan, Z.; Fan, Q.; Fan, P.; Zhang, C. Towards V2I age-aware fairness access: A DQN-based intelligent vehicular node training and test method. arXiv 2022, arXiv:2208.01283.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Chang, Z.; Wu, H.; Li, C. YOLOv4-tiny-based robust RGB-D SLAM approach with point and surface feature fusion in complex indoor environments. J. Field Robot. 2022, 40, 521–534.
- Zang, Q.; Zhang, K.; Wang, L.; Wu, L. An adaptive ORB-SLAM3 system for outdoor dynamic environments. Sensors 2023, 23, 1359.
- Yuan, C.; Xu, Y.; Zhou, Q. PLDS-SLAM: Point and line features SLAM in dynamic environment. Remote Sens. 2023, 15, 1893.
- Yang, S.; Scherer, S. CubeSLAM: Monocular 3-D object SLAM. IEEE Trans. Robot. 2019, 35, 925–938.
- Qiu, Y.; Wang, C.; Wang, W.; Henein, M.; Scherer, S. AirDOS: Dynamic SLAM benefits from articulated objects. In Proceedings of the 2022 IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 8047–8053.
- Liu, Y.; Liu, J.; Hao, Y.; Deng, B.; Meng, Z. A switching-coupled backend for simultaneous localization and dynamic object tracking. IEEE Robot. Autom. Lett. 2021, 6, 1296–1303.
- Sun, Y.; Hu, J.; Yun, J.; Liu, Y.; Bai, D.; Liu, X.; Zhao, G.; Jiang, G.; Kong, J.; Chen, B. Multi-objective location and mapping based on deep learning and visual SLAM. Sensors 2022, 22, 7576.
- Veeramani, B.; Raymond, J.W.; Chanda, P. DeepSort: Deep convolutional networks for sorting haploid maize seeds. BMC Bioinform. 2018, 19, 289.
- Mur-Artal, R.; Montiel, J.M.M.; Tardós, J.D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 2015, 31, 1147–1163.
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
| Sequence | ORB-SLAM2 (RMSE / Mean / Std) | ORB-SLAM2 (Masked) (RMSE / Mean / Std) | DynaSLAM (RMSE / Mean / Std) | OTE-SLAM (RMSE / Mean / Std) |
| --- | --- | --- | --- | --- |
| Sequence 01 | 10.78 / 10.23 / 3.39 | 9.85 / 9.34 / 3.13 | 11.79 / 10.68 / 4.98 | 10.44 / 9.80 / 3.61 |
| Sequence 03 | 0.73 / 0.61 / 0.39 | 0.79 / 0.67 / 0.41 | 0.75 / 0.64 / 0.39 | 0.62 / 0.52 / 0.34 |
| Sequence 04 | 0.22 / 0.20 / 0.09 | 0.19 / 0.19 / 0.09 | 0.20 / 0.18 / 0.09 | 0.17 / 0.16 / 0.07 |
| Sequence 06 | 0.84 / 0.80 / 0.26 | 0.82 / 0.77 / 0.31 | 0.84 / 0.82 / 0.18 | 1.04 / 1.01 / 0.24 |
| Sequence 07 | 0.57 / 0.54 / 0.21 | 0.55 / 0.52 / 0.18 | 0.54 / 0.50 / 0.20 | 0.52 / 0.48 / 0.20 |
| Sequence 08 | 3.46 / 3.14 / 1.45 | 3.87 / 3.59 / 1.37 | 3.41 / 3.13 / 1.35 | 4.01 / 3.78 / 1.32 |
| Sequence 10 | 1.15 / 1.04 / 0.50 | 1.25 / 1.15 / 0.51 | 1.24 / 1.11 / 0.55 | 1.22 / 1.08 / 0.54 |
| Sequence | ORB-SLAM2 (RPE [%] / [deg/100 m]) | ORB-SLAM2 (Masked) (RPE [%] / [deg/100 m]) | DynaSLAM (RPE [%] / [deg/100 m]) | OTE-SLAM (RPE [%] / [deg/100 m]) |
| --- | --- | --- | --- | --- |
| Sequence 01 | 1.43 / 0.22 | 1.40 / 0.19 | 1.87 / 0.17 | 1.45 / 0.20 |
| Sequence 03 | 0.69 / 0.20 | 0.72 / 0.16 | 0.73 / 0.16 | 0.65 / 0.20 |
| Sequence 04 | 0.50 / 0.12 | 0.47 / 0.18 | 0.43 / 0.11 | 0.43 / 0.08 |
| Sequence 06 | 0.54 / 0.22 | 0.56 / 0.18 | 0.54 / 0.19 | 0.64 / 0.21 |
| Sequence 07 | 0.55 / 0.29 | 0.50 / 0.27 | 0.50 / 0.31 | 0.53 / 0.30 |
| Sequence 08 | 1.07 / 0.33 | 1.07 / 0.32 | 1.08 / 0.33 | 1.07 / 0.32 |
| Sequence 10 | 0.64 / 0.29 | 0.59 / 0.31 | 0.68 / 0.35 | 0.67 / 0.30 |
Share and Cite
Chang, Y.; Hu, J.; Xu, S. OTE-SLAM: An Object Tracking Enhanced Visual SLAM System for Dynamic Environments. Sensors 2023, 23, 7921. https://doi.org/10.3390/s23187921