Shaped-Based Tightly Coupled IMU/Camera Object-Level SLAM
Abstract
1. Introduction
- A shape-based solution to the SLAM problem is developed that achieves a position error of 4.1 to 13.1 cm in an indoor environment.
- An object-level tightly coupled IMU/camera fusion is developed. The particle weight update does not require point-to-point data correspondences and relies only on the contour of the segmented object.
- An undelayed object initialization is developed. This method mitigates error accumulation due to IMU mechanization. The undelayed initialization is achieved using a novel coarse-to-fine pose estimation.
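The coarse-to-fine pose estimation mentioned above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in, not the paper's implementation: the search is reduced to a single rotation angle, and `score` is a toy surrogate for a contour-overlap objective (e.g., overlap between the projected and segmented contours), peaked at an assumed true angle of 0.7 rad.

```python
# Illustrative coarse-to-fine search over one pose parameter.
# The score function is a toy stand-in for a contour-overlap objective.
import numpy as np

def score(theta: float) -> float:
    # Surrogate contour-overlap score; maximal at the (assumed) true angle.
    return -abs(theta - 0.7)

def coarse_to_fine(score_fn, lo=-np.pi, hi=np.pi, levels=4, n=16):
    """Evaluate a coarse grid over [lo, hi], then repeatedly shrink the
    bracket around the best candidate and re-grid (coarse-to-fine)."""
    best = lo
    for _ in range(levels):
        grid = np.linspace(lo, hi, n)
        best = grid[int(np.argmax([score_fn(t) for t in grid]))]
        half = (hi - lo) / n  # shrink the bracket around the current best
        lo, hi = best - half, best + half
    return float(best)
```

With four refinement levels of 16 samples each, the bracket shrinks by a factor of 16 per level, so the final estimate lands well within 0.01 rad of the toy optimum.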
2. Literature Review
2.1. Object-Level Mapping and Localization Frameworks
2.2. Object Representation
3. Methodology
3.1. Overview
- The proposal distribution corresponds to the predicted state of the robot in a SLAM problem. The proposal distribution can be obtained using the motion model of a robot or a device. In this research, such a motion model is provided using IMU mechanization.
- Particle weighting corresponds to the update step of the dynamic Bayesian network (DBN). The particles are weighted using the observation likelihood. In this research, the actual and the predicted observations are obtained from the semantic segmentation of the object and from the predicted projection of the object onto the camera, respectively. This is explained in more detail in Section 3.2 and Section 3.3.
- Particle resampling is an important step in any PF-based solution. In resampling, particles with higher weights are duplicated, while particles with lower weights are discarded. In this research, classical sequential importance resampling (SIR) [55] is used.
- Beyond the processes above, landmark initialization must be addressed in any solution to the SLAM problem; it is explained in Section 3.4.
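The weight–resample portion of the loop described above can be sketched as follows. This is a hedged illustration, not the authors' exact formulation: the exponential likelihood in (1 − IoU) with bandwidth `sigma`, and all function names, are assumptions introduced here; only the IoU-based weighting and classical SIR resampling come from the text.

```python
# Sketch of IoU-based particle weighting and SIR resampling.
# The exponential likelihood form and its bandwidth are assumptions.
import numpy as np

def mask_iou(pred_mask: np.ndarray, seg_mask: np.ndarray) -> float:
    """Intersection-over-union between a particle's predicted object
    projection and the segmented object mask (both boolean arrays)."""
    inter = np.logical_and(pred_mask, seg_mask).sum()
    union = np.logical_or(pred_mask, seg_mask).sum()
    return float(inter) / float(union) if union > 0 else 0.0

def weight_particles(ious, sigma: float = 0.2) -> np.ndarray:
    """Turn per-particle IoU scores into normalized importance weights.
    Higher IoU -> higher weight; the exponential form is illustrative."""
    w = np.exp(-(1.0 - np.asarray(ious, dtype=float)) / sigma)
    s = w.sum()
    return w / s if s > 0 else np.full(len(w), 1.0 / len(w))

def sir_resample(particles, weights, rng=None):
    """Classical SIR: duplicate high-weight particles, drop low-weight ones."""
    rng = rng or np.random.default_rng(0)
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return [particles[i] for i in idx]
```

Note that no point-to-point correspondences appear anywhere: the likelihood depends only on the overlap between the two masks, which is what makes the contour-only weight update possible.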
3.2. Tightly Coupled IMU/Camera Fusion
3.3. Measuring IoU
3.4. Landmark Initialization
3.5. Challenges Associated with Evaluation of Observation Likelihood
4. Results and Discussion
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Davison, A.J.; Reid, I.D.; Molton, N.D.; Stasse, O. MonoSLAM: Real-time single camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 1052–1067.
- Smith, P.; Reid, I.D.; Davison, A.J. Real-time monocular SLAM with straight lines. In Proceedings of the British Machine Vision Conference, Edinburgh, UK, 4–7 September 2006.
- Kaess, M. Simultaneous localization and mapping with infinite planes. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 25–30 May 2015; pp. 4605–4611.
- Chu, T.; Guo, N.; Backén, S.; Akos, D. Monocular camera/IMU/GNSS integration for ground vehicle navigation in challenging GNSS environments. Sensors 2012, 12, 3162–3185.
- Liu, W.; Xia, X.; Xiong, L.; Lu, Y.; Gao, L.; Yu, Z. Automated vehicle sideslip angle estimation considering signal measurement characteristic. IEEE Sens. J. 2021, 21, 21675–21687.
- Henrik, F. Fusion of IMU and Monocular-SLAM in a Loosely Coupled EKF. Master’s Thesis, Linköping University, Linköping, Sweden, 2017.
- Spaenlehauer, A.; Frémont, V.; Şekercioğlu, Y.A.; Fantoni, I. A loosely-coupled approach for metric scale estimation in monocular vision-inertial systems. In Proceedings of the 2017 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), Daegu, Republic of Korea, 16–18 November 2017; pp. 137–143.
- Peng, G.; Lu, Z.; Peng, J.; He, D.; Li, X.; Hu, B. Robust tightly coupled pose measurement based on multi-sensor fusion in mobile robot system. Sensors 2021, 21, 5522.
- He, Y.; Zhao, J.; Guo, Y.; He, W.; Yuan, K. PL-VIO: Tightly-coupled monocular visual–inertial odometry using point and line features. Sensors 2018, 18, 1159.
- Pillai, S.; Leonard, J. Monocular SLAM supported object recognition. arXiv 2015, arXiv:1506.01732.
- Dharmasiri, T.; Lui, V.; Drummond, T. MO-SLAM: Multi object SLAM with runtime object discovery through duplicates. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 9–14 October 2016; pp. 1214–1221.
- Gálvez-López, D.; Salas, M.; Tardós, J.D.; Montiel, J.M.M. Real-time monocular object SLAM. Robot. Auton. Syst. 2016, 75, 435–449.
- Zhang, L.; Wei, L.; Shen, P.; Wei, W.; Zhu, G.; Song, J. Semantic SLAM based on object detection and improved octomap. IEEE Access 2018, 6, 75545–75559.
- Lin, S.; Wang, J.; Xu, M.; Zhao, H.; Chen, Z. Contour-SLAM: A robust object-level SLAM based on contour alignment. IEEE Trans. Instrum. Meas. 2023, 72, 5006812.
- Yang, S.; Zhang, Z.; Wu, J.; Wang, Y.; Zhao, L.; Huang, S. A right invariant extended Kalman filter for object based SLAM. IEEE Robot. Autom. Lett. 2021, 7, 1316–1323.
- Guan, P.; Cao, Z.; Chen, E.; Liang, S.; Tan, M.; Yu, J. A real-time semantic visual SLAM approach with points and objects. Int. J. Adv. Robot. Syst. 2020, 17, 1729881420905443.
- Li, D.; Shi, X.; Long, Q.; Liu, S.; Yang, W.; Wang, F.; Wei, Q.; Qiao, F. DXSLAM: A robust and efficient visual SLAM system with deep features. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 4958–4965.
- Bowman, S.L.; Daniilidis, K.; Pappas, G.J. Robust object-level semantic visual SLAM using semantic keypoints. Field Robot. 2022, 2, 513–524.
- Deng, X.; Mousavian, A.; Xiang, Y.; Xia, F.; Bretl, T.; Fox, D. PoseRBPF: A Rao–Blackwellized particle filter for 6-D object pose tracking. IEEE Trans. Robot. 2021, 37, 1328–1342.
- Ahn, S.; Choi, M.; Choi, J.; Chung, W.K. Data association using visual object recognition for EKF-SLAM in home environment. In Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, 9–15 October 2006; pp. 2588–2594.
- Castle, R.O.; Klein, G.; Murray, D.W. Combining monoSLAM with object recognition for scene augmentation using a wearable camera. Image Vis. Comput. 2010, 28, 1548–1556.
- Civera, J.; Gálvez-López, D.; Riazuelo, L.; Tardós, J.D.; Montiel, J.M.M. Towards semantic SLAM using a monocular camera. In Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, San Francisco, CA, USA, 25–30 September 2011; pp. 1277–1284.
- Klein, G.; Murray, D. Parallel tracking and mapping for small AR workspaces. In Proceedings of the 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, Nara, Japan, 13–16 November 2007; pp. 225–234.
- Dellaert, F.; Kaess, M. Square Root SAM: Simultaneous localization and mapping via square root information smoothing. Int. J. Robot. Res. 2006, 25, 1181–1203.
- Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571.
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Bowman, S.L.; Atanasov, N.; Daniilidis, K.; Pappas, G.J. Probabilistic data association for semantic SLAM. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 1722–1729.
- Nicholson, L.; Milford, M.; Sünderhauf, N. QuadricSLAM: Dual quadrics from object detections as landmarks in object-oriented SLAM. IEEE Robot. Autom. Lett. 2018, 4, 1–8.
- Ok, K.; Liu, K.; Frey, K.; How, J.P.; Roy, N. Robust object-based SLAM for high-speed autonomous navigation. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 669–675.
- Qian, Z.; Patath, K.; Fu, J.; Xiao, J. Semantic SLAM with autonomous object-level data association. arXiv 2020, arXiv:2011.10625.
- Yang, S.; Scherer, S. CubeSLAM: Monocular 3-D object SLAM. IEEE Trans. Robot. 2019, 35, 925–938.
- Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
- Montemerlo, M.; Thrun, S. Simultaneous localization and mapping with unknown data association using FastSLAM. In Proceedings of the 2003 IEEE International Conference on Robotics and Automation, Taipei, Taiwan, 14–19 September 2003; Volume 2, pp. 1985–1991.
- Eade, E.; Drummond, T. Scalable monocular SLAM. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; Volume 1, pp. 469–476.
- Eade, E.; Drummond, T. Edge landmarks in monocular SLAM. Image Vis. Comput. 2009, 27, 588–596.
- Bailey, T. Constrained initialisation for bearing-only SLAM. In Proceedings of the 2003 IEEE International Conference on Robotics and Automation, Taipei, Taiwan, 14–19 September 2003; Volume 2, pp. 1966–1971.
- Munguía, R.; Grau, A. Monocular SLAM for visual odometry: A full approach to the delayed inverse-depth feature initialization method. Math. Probl. Eng. 2012.
- Solà, J.; Monin, A.; Devy, M.; Lemaire, T. Undelayed initialization in bearing only SLAM. In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, AB, Canada, 2–6 August 2005; pp. 2499–2504.
- Munguía, R.; Castillo-Toledo, B.; Grau, A. A robust approach for a filter-based monocular simultaneous localization and mapping (SLAM) system. Sensors 2013, 13, 8501–8522.
- Prisacariu, V.A.; Kähler, O.; Murray, D.W.; Reid, I.D. Simultaneous 3D tracking and reconstruction on a mobile phone. In Proceedings of the 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Adelaide, Australia, 1–4 October 2013; pp. 89–98.
- Choudhary, S.; Trevor, A.J.B.; Christensen, H.I.; Dellaert, F. SLAM with object discovery, modeling and mapping. In Proceedings of the 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Chicago, IL, USA, 14–18 September 2014; pp. 1018–1025.
- Caccamo, S.; Ataer-Cansizoglu, E.; Taguchi, Y. Joint 3D reconstruction of a static scene and moving objects. In Proceedings of the 2017 International Conference on 3D Vision (3DV), Qingdao, China, 10–12 October 2017; pp. 677–685.
- Salas-Moreno, R.F.; Newcombe, R.A.; Strasdat, H.; Kelly, P.H.J.; Davison, A.J. SLAM++: Simultaneous localisation and mapping at the level of objects. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 1352–1359.
- Parkhiya, P.; Khawad, R.; Murthy, J.K.; Bhowmick, B.; Krishna, K.M. Constructing category-specific models for monocular object-SLAM. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–26 May 2018; pp. 4517–4524.
- Joshi, N.; Sharma, Y.; Parkhiya, P.; Khawad, R.; Krishna, K.M.; Bhowmick, B. Integrating objects into monocular SLAM: Line based category specific models. In Proceedings of the 11th Indian Conference on Computer Vision, Graphics and Image Processing, Hyderabad, India, 18–22 December 2018; pp. 1–9.
- Zou, Z.-X.; Huang, S.-S.; Mu, T.-J.; Wang, Y.-P. ObjectFusion: Accurate object-level SLAM with neural object priors. Graph. Models 2022, 123, 101165.
- Wang, J.; Rünz, M.; Agapito, L. DSP-SLAM: Object oriented SLAM with deep shape priors. In Proceedings of the 2021 International Conference on 3D Vision (3DV), London, UK, 1–3 December 2021; pp. 1362–1371.
- Hinterstoisser, S.; Lepetit, V.; Ilic, S.; Holzer, S.; Bradski, G.; Konolige, K.; Navab, N. Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In Proceedings of the Computer Vision–ACCV 2012: 11th Asian Conference on Computer Vision, Daejeon, Republic of Korea, 5–9 November 2012; pp. 548–562.
- Hokmabadi, I.A.S.; Ai, M.; Minaretzis, C.; Sideris, M.; El-Sheimy, N. Accurate and scalable contour-based camera pose estimation using deep learning with synthetic data. In Proceedings of the 2023 IEEE/ION Position, Location and Navigation Symposium (PLANS), Monterey, CA, USA, 24–27 April 2023; pp. 1385–1393.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Noureldin, A.; Karamat, T.B.; Georgy, J. Fundamentals of Inertial Navigation, Satellite-Based Positioning and Their Integration; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012.
- Daum, F.; Huang, J. Curse of dimensionality and particle filters. In Proceedings of the 2003 IEEE Aerospace Conference, Big Sky, MT, USA, 8–15 March 2003; Volume 4, pp. 1979–1993.
- Asl Sabbaghian Hokmabadi, I. Localization on Smartphones Using Visual Fingerprinting. Master’s Thesis, University of Calgary, Calgary, AB, Canada, 2018.
- Montemerlo, M. FastSLAM: A Factored Solution to the Simultaneous Localization and Mapping Problem with Unknown Data Association. Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 2003.
- Haug, A.J. Bayesian Estimation and Tracking: A Practical Guide; John Wiley & Sons: Hoboken, NJ, USA, 2012.
- Edelsbrunner, H.; Mücke, E.P. Three-dimensional alpha shapes. ACM Trans. Graph. 1994, 13, 43–72.
- Gonzalez, R.C.; Woods, R.E. Digital Image Processing; Pearson Education Ltd.: Upper Saddle River, NJ, USA, 2008.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; pp. 234–241.
- Bentley, J.L. Multidimensional binary search trees used for associative searching. Commun. ACM 1975, 18, 509–517.
- Gao, X.-S.; Hou, X.-R.; Tang, J.; Cheng, H.-F. Complete solution classification for the perspective-three-point problem. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 930–943.
- Torr, P.H.S.; Zisserman, A. MLESAC: A new robust estimator with application to estimating image geometry. Comput. Vis. Image Underst. 2000, 78, 138–156.
- Furgale, P.; Rehder, J.; Siegwart, R. Unified temporal and spatial calibration for multi-sensor systems. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 3–7 November 2013; pp. 1280–1286.
- Chermak, L.; Aouf, N.; Richardson, M.; Visentin, G. Real-time smart and standalone vision/IMU navigation sensor. J. Real-Time Image Process. 2019, 16, 1189–1205.
Reference (Date) | Object Detection | Object Representation | Frontend/Backend | Relies on ORB SLAM 2 (or 1)? |
---|---|---|---|---|
[22] (2011) | Detector/Descriptor | Offline (feature points) | EKF/no backend | No |
[10] (2015) | Detector/Descriptor | Online (feature points) | LBA/FG | Yes |
[12] (2016) | Detector/Descriptor | Offline (feature points) | LBA/graph optimization | No |
[27] (2017) | Detector/Descriptor + DNN | No 6DoF object models used | LBA/FG | No |
[44] (2018) | DNN | Offline (3D shape priors) | Decoupled estimation/FG | No |
[28] (2018) | DNN | Ellipsoids | Decoupled estimation/FG | No |
[29] (2019) | DNN | Ellipsoids | LBA/GBA | No |
[31] (2019) | Detector/Descriptor + DNN | Cuboids | LBA/GBA | Yes |
[30] (2020) | Detector/Descriptor + DNN | Ellipsoids | LBA/FG | Yes |
[47] (2021) | DNN | Offline (3D shape priors) | Decoupled estimation/FG | Yes |
[19] (2021) | DNN | 3D models with RGB values | RBPF | No |
[14] (2023) | DNN | Online (3D shape priors) | LBA/GBA | Yes |
Ours | DNN | Offline (2D shape priors) | RBPF | No |
Sensor | Intrinsic Calibration | Extrinsic Calibration | Additional Information |
---|---|---|---|
Monocular camera | MATLAB’s calibration toolbox | Provided by 3D CAD model | Arducam with 8MP IMX219 |
IMU | Six-position static calibration | Provided by 3D CAD model | Xsens MTi-G-710 |
Power supply | N/A | N/A | Krisdonia Laptop Power Bank |
Mini Desktop PC | N/A | N/A | Beelink Mini S |
Test | EV (cm) | MAP (cm) | Distance (Camera to Object) | Experiment Details
---|---|---|---|---
Test 1 | 7.8 | 22.4 | short | lower clutter, shorter trajectory
Test 2 | 4.1 | 18.9 | long | lower clutter, shorter trajectory
Test 3 | 12.3 | 35.7 | medium | higher clutter, longer trajectory
Test 4 | 13.1 | 32.7 | medium | higher clutter, longer trajectory
Test 5 | 12.3 | 25.5 | medium | higher clutter, average trajectory
Test | EV/TTP | MAP/TTP | Distance (Camera to Object) | Experiment Details
---|---|---|---|---
Test 1 | 0.015 | 0.043 | short | lower clutter, shorter trajectory
Test 2 | 0.005 | 0.023 | long | lower clutter, shorter trajectory
Test 3 | 0.018 | 0.052 | medium | higher clutter, longer trajectory
Test 4 | 0.021 | 0.052 | medium | higher clutter, longer trajectory
Test 5 | 0.014 | 0.029 | medium | higher clutter, average trajectory
Test | IoU | Fail Rate | Distance (Camera to Object) | Experiment Details
---|---|---|---|---
Test 1 | 0.805 | 0.00 | short | lower clutter, shorter trajectory
Test 2 | 0.805 | 0.00 | long | lower clutter, shorter trajectory
Test 3 | 0.819 | 0.00 | medium | higher clutter, longer trajectory
Test 4 | 0.821 | 0.00 | medium | higher clutter, longer trajectory
Test 5 | 0.747 | 0.01 | medium | higher clutter, average trajectory
Number of Particles | EV (cm) | EV/TTP | IoU | Time (s) | Fail Rate
---|---|---|---|---|---
5,000 | 12.4 | 0.018 | 0.813 | 6149.2 | 0.00
7,000 | 11.4 | 0.020 | 0.821 | 8577.4 | 0.00
9,000 | 10.9 | 0.019 | 0.824 | 9291.8 | 0.00
13,000 | 10.8 | 0.018 | 0.830 | 11015.2 | 0.00
17,000 | 13.4 | 0.022 | 0.835 | 12488.7 | 0.00
19,000 | 11.0 | 0.018 | 0.837 | 13390.3 | 0.00
Test | EV (cm) | MAP (cm) | EV/TTP | MAP/TTP | IoU | Fail Rate
---|---|---|---|---|---|---
Test 6 | 9.1 | 17.3 | 0.012 | 0.023 | 0.815 | 0.03
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Asl Sabbaghian Hokmabadi, I.; Ai, M.; El-Sheimy, N. Shaped-Based Tightly Coupled IMU/Camera Object-Level SLAM. Sensors 2023, 23, 7958. https://doi.org/10.3390/s23187958