IVU-AutoNav: Integrated Visual and UWB Framework for Autonomous Navigation
Abstract
1. Introduction
- The first method calibrates the camera against the ground-plane height or stores the geometric dimensions of typical landmarks in the working environment [8]. The scale of the SLAM system is then restored from the ratio between the ground-plane height or landmark size estimated by the SLAM system and its ground-truth value (a minimal sketch of this ratio-based recovery is given after this list). However, this approach depends on a specific camera placement and on distinctive environmental features, which makes it difficult to adapt to more complex settings.
- The second method obtains an accurate scale by adding sensors: stereo or multi-camera systems with a fixed baseline measure environmental depth directly through triangulation, while devices such as an Inertial Measurement Unit (IMU) [9,10] or a LiDAR rangefinder [11,12,13] provide precise scale information. However, these methods require the installation of additional sensors.
- The third method uses machine learning. By detecting common landmarks such as pedestrians or vehicles in the camera images and combining them with prior knowledge of their average geometric dimensions, the scale of the SLAM system can be determined. However, this method requires a high-performance computing platform, placing heavy demands on the computational power and energy budget of the onboard computer [14,15].
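The sketch below illustrates the first, ratio-based approach only in outline: it assumes the camera's true mounting height above the ground is known and that the SLAM system provides an unscaled estimate of the same height. The function names and numbers are illustrative assumptions, not the specific procedure of [8].

```python
import numpy as np

def recover_scale_from_ground_plane(h_slam: float, h_true_m: float) -> float:
    """Scale factor mapping SLAM units to meters, given the camera-to-ground
    height estimated by the SLAM system (h_slam, arbitrary SLAM units) and the
    known true mounting height of the camera above the ground (h_true_m, m)."""
    if h_slam <= 0.0:
        raise ValueError("estimated ground-plane height must be positive")
    return h_true_m / h_slam

def rescale_trajectory(positions_slam: np.ndarray, scale: float) -> np.ndarray:
    """Apply the recovered scale to an (N, 3) array of estimated positions."""
    return scale * positions_slam

# Made-up numbers: the SLAM map places the camera 0.8 units above the detected
# ground plane, while the camera is actually mounted 1.2 m above the ground.
scale = recover_scale_from_ground_plane(h_slam=0.8, h_true_m=1.2)      # 1.5
trajectory_m = rescale_trajectory(np.array([[0.0, 0.0, 0.8],
                                            [0.4, 0.1, 0.8]]), scale)
```

The same ratio idea applies to landmarks of known size: the scale factor is the known metric dimension divided by the dimension reconstructed in SLAM units.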
2. Related Work
2.1. Visual SLAM Localization
2.2. UWB Localization
2.3. UWB-Integrated Visual SLAM Localization
3. Methodology
3.1. Problem Statement and Method Overview
3.2. Visual Initialization
3.3. UWB Candidate Frame Collection
3.3.1. UWB Distance Measurement Model
3.3.2. Time Alignment
3.3.3. Anchor Position Estimation
3.4. System Initialization
3.5. Tightly Coupled UWB and Visual SLAM
3.5.1. Distance and Visual Fusion Tracking
3.5.2. UWB and Visual Joint Optimization
4. Experiment and Analysis
4.1. Simulation Experiment
4.2. Flight Tests
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Gupta, A.; Fernando, X. Simultaneous localization and mapping (slam) and data fusion in unmanned aerial vehicles: Recent advances and challenges. Drones 2022, 6, 85. [Google Scholar] [CrossRef]
- Chen, C.; Tian, Y.; Lin, L.; Chen, S.; Li, H.; Wang, Y.; Su, K. Obtaining world coordinate information of UAV in GNSS denied environments. Sensors 2020, 20, 2241. [Google Scholar] [CrossRef] [PubMed]
- Liu, X.; Wen, W.; Hsu, L.T. GLIO: Tightly-coupled GNSS/LiDAR/IMU integration for continuous and drift-free state estimation of intelligent vehicles in urban areas. IEEE Trans. Intell. Veh. 2023, 9, 1412–1422. [Google Scholar] [CrossRef]
- Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.; Tardós, J.D. Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam. IEEE Trans. Robot. 2021, 37, 1874–1890. [Google Scholar] [CrossRef]
- Mo, J.; Islam, M.J.; Sattar, J. Fast direct stereo visual SLAM. IEEE Robot. Autom. Lett. 2021, 7, 778–785. [Google Scholar] [CrossRef]
- Cai, Y.; Ou, Y.; Qin, T. Improving SLAM techniques with integrated multi-sensor fusion for 3D reconstruction. Sensors 2024, 24, 2033. [Google Scholar] [CrossRef]
- Steenbeek, A.; Nex, F. CNN-based dense monocular visual SLAM for real-time UAV exploration in emergency conditions. Drones 2022, 6, 79. [Google Scholar] [CrossRef]
- Tian, R.; Zhang, Y.; Zhu, D.; Liang, S.; Coleman, S.; Kerr, D. Accurate and robust scale recovery for monocular visual odometry based on plane geometry. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 5296–5302. [Google Scholar]
- Lee, J.; Park, S.Y. PLF-VINS: Real-time monocular visual-inertial SLAM with point-line fusion and parallel-line fusion. IEEE Robot. Autom. Lett. 2021, 6, 7033–7040. [Google Scholar] [CrossRef]
- Xia, L.; Meng, D.; Zhang, J.; Zhang, D.; Hu, Z. Visual-inertial simultaneous localization and mapping: Dynamically fused point-line feature extraction and engineered robotic applications. IEEE Trans. Instrum. Meas. 2022, 71, 5019211. [Google Scholar] [CrossRef]
- Xu, X.; Zhang, L.; Yang, J.; Cao, C.; Wang, W.; Ran, Y.; Tan, Z.; Luo, M. A review of multi-sensor fusion slam systems based on 3D LIDAR. Remote Sens. 2022, 14, 2835. [Google Scholar] [CrossRef]
- Zheng, C.; Xu, W.; Zou, Z.; Hua, T.; Yuan, C.; He, D.; Zhou, B.; Liu, Z.; Lin, J.; Zhu, F.; et al. Fast-livo2: Fast, direct lidar-inertial-visual odometry. IEEE Trans. Robot. 2024, 41, 326–346. [Google Scholar] [CrossRef]
- Wang, W.; Wang, C.; Liu, J.; Su, X.; Luo, B.; Zhang, C. HVL-SLAM: Hybrid Vision and LiDAR Fusion for SLAM. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5706514. [Google Scholar] [CrossRef]
- Guizilini, V.; Ambruș, R.; Chen, D.; Zakharov, S.; Gaidon, A. Multi-frame self-supervised depth with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 160–170. [Google Scholar]
- Mishima, N.; Seki, A.; Hiura, S. Absolute Scale from Varifocal Monocular Camera through SfM and Defocus Combined. In Proceedings of the BMVC, Online, 22–25 November 2021; p. 28. [Google Scholar]
- Zhang, X.; Wang, L.; Su, Y. Visual place recognition: A survey from deep learning perspective. Pattern Recognit. 2021, 113, 107760. [Google Scholar] [CrossRef]
- Lin, H.Y.; Yeh, M.C. Drift-free visual slam for mobile robot localization by integrating uwb technology. IEEE Access 2022, 10, 93636–93645. [Google Scholar] [CrossRef]
- Gong, Z.; Ying, R.; Wen, F.; Qian, J.; Liu, P. Tightly coupled integration of GNSS and vision SLAM using 10-DoF optimization on manifold. IEEE Sens. J. 2019, 19, 12105–12117. [Google Scholar] [CrossRef]
- Qiao, Z.; Xu, A.; Sui, X.; Hao, Y. An integrated indoor positioning method using ORB-SLAM/UWB. J. Navig. Position 2018, 6, 29–34. [Google Scholar]
- Obeidat, H.; Shuaieb, W.; Obeidat, O.; Abd-Alhameed, R. A review of indoor localization techniques and wireless technologies. Wirel. Pers. Commun. 2021, 119, 289–327. [Google Scholar] [CrossRef]
- Elsanhoury, M.; Mäkelä, P.; Koljonen, J.; Välisuo, P.; Shamsuzzoha, A.; Mantere, T.; Elmusrati, M.; Kuusniemi, H. Precision positioning for smart logistics using ultra-wideband technology-based indoor navigation: A review. IEEE Access 2022, 10, 44413–44445. [Google Scholar] [CrossRef]
- Kim Geok, T.; Zar Aung, K.; Sandar Aung, M.; Thu Soe, M.; Abdaziz, A.; Pao Liew, C.; Hossain, F.; Tso, C.P.; Yong, W.H. Review of indoor positioning: Radio wave technology. Appl. Sci. 2020, 11, 279. [Google Scholar] [CrossRef]
- Nguyen, T.H.; Nguyen, T.M.; Xie, L. Tightly-coupled single-anchor ultra-wideband-aided monocular visual odometry system. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 665–671. [Google Scholar]
- Cao, Y.; Beltrame, G. VIR-SLAM: Visual, inertial, and ranging SLAM for single and multi-robot systems. Auton. Robot. 2021, 45, 905–917. [Google Scholar] [CrossRef]
- Mur-Artal, R.; Tardós, J.D. Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras. IEEE Trans. Robot. 2017, 33, 1255–1262. [Google Scholar] [CrossRef]
- Cadena, C.; Carlone, L.; Carrillo, H.; Latif, Y.; Scaramuzza, D.; Neira, J.; Reid, I.; Leonard, J.J. Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Trans. Robot. 2016, 32, 1309–1332. [Google Scholar] [CrossRef]
- Qin, T.; Li, P.; Shen, S. Vins-mono: A robust and versatile monocular visual-inertial state estimator. IEEE Trans. Robot. 2018, 34, 1004–1020. [Google Scholar] [CrossRef]
- Saputra, M.R.U.; Markham, A.; Trigoni, N. Visual SLAM and structure from motion in dynamic environments: A survey. ACM Comput. Surv. CSUR 2018, 51, 1–36. [Google Scholar] [CrossRef]
- Tateno, K.; Tombari, F.; Laina, I.; Navab, N. Cnn-slam: Real-time dense monocular slam with learned depth prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6243–6252. [Google Scholar]
- Subedi, S.; Pyun, J.Y. A survey of smartphone-based indoor positioning system using RF-based wireless technologies. Sensors 2020, 20, 7230. [Google Scholar] [CrossRef] [PubMed]
- Nguyen, T.H.; Nguyen, T.M.; Xie, L. Range-Focused Fusion of Camera-IMU-UWB for Accurate and Drift-Reduced Localization. IEEE Robot. Autom. Lett. 2021, 6, 1678–1685. [Google Scholar] [CrossRef]
- Li, J.; Wang, S.; Hao, J.; Ma, B.; Chu, H.K. UVIO: Adaptive Kalman Filtering UWB-Aided Visual-Inertial SLAM System for Complex Indoor Environments. Remote Sens. 2024, 16, 3245. [Google Scholar] [CrossRef]
- Burri, M.; Nikolic, J.; Gohl, P.; Schneider, T.; Rehder, J.; Omari, S.; Achtelik, M.W.; Siegwart, R. The EuRoC micro aerial vehicle datasets. Int. J. Robot. Res. 2016, 35, 1157–1163. [Google Scholar] [CrossRef]
- Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Algarve, Portugal, 7–12 October 2012; pp. 573–580. [Google Scholar]
- Leutenegger, S.; Lynen, S.; Bosse, M.; Siegwart, R.; Furgale, P. Keyframe-based visual–inertial odometry using nonlinear optimization. Int. J. Robot. Res. 2015, 34, 314–334. [Google Scholar] [CrossRef]
Symbol | Description |
---|---|
– | Optimization parameter |
– | Lie algebra corresponding to the pose of the frame |
– | Position of the frame |
– | Pixel positions of the feature points |
– | Focal lengths of the camera |
– | Principal point offsets of the camera |
– | Position of the map point in 3D space |
– | Components of the translation vector |
– | Rotation matrix of the frame |
– | Components of the Lie algebra |
– | Camera projection matrix |
– | Position of the map point |
– | Pixel locations of the projected map point |
– | Pose matrix of the image frame |
– | Projection equation corresponding to the image frame |
– | Pixel positions of the feature point pairs |
– | Distances measured by UWB for frames …, …, and k |
– | Change in distance between consecutive frames |
f | Threshold for the increase ratio of the distance change |
c | Speed of light |
– | Timestamp at which anchor a transmits the request signal |
– | Timestamp at which anchor a receives the response signal |
– | Signal processing delay |
– | Measurement noise in the distance estimate |
– | Covariance matrix of the Gaussian noise |
– | UWB ranging values at the n-th and (n+1)-th measurements |
– | Timestamps of the n-th and (n+1)-th UWB measurements |
– | Timestamp of the k-th image acquisition |
– | Euclidean norm of a parameter vector |
– | Position of the UWB receiver on the UAV in the navigation coordinate system at time k |
– | Position of the UWB anchor in the navigation coordinate system at time k |
– | Installation pose of the UWB receiver on the UAV |
– | Transformation from the UAV body coordinate system to the navigation coordinate system at time k |
– | Position of the UWB anchor in the SLAM coordinate system |
s | Scale factor between the SLAM system scale and the metric scale |
N | Number of keyframes used for system initialization |
– | Error in the UWB distance measurements |
– | Measured and computed distances in the UWB system |
– | Reprojection error of the j-th feature point in the i-th image frame |
– | Information matrix for the reprojection error |
– | Information matrix for the UWB error |
– | Reciprocal of the number of scale levels at which the feature point is detected in the image pyramid |
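The notation above covers the UWB ranging model, the time alignment of ranges to image frames, and the UWB residual used alongside the reprojection error. As a rough illustration only, the sketch below assumes a standard two-way time-of-flight ranging scheme and linear interpolation of ranges between the n-th and (n+1)-th UWB measurements at the image timestamp; the function and variable names are hypothetical and not the paper's exact formulation.

```python
import numpy as np

C = 299_792_458.0  # speed of light c [m/s]

def twr_distance(t_tx: float, t_rx: float, t_delay: float) -> float:
    """Two-way-ranging distance: anchor a transmits the request at t_tx,
    receives the response at t_rx, and the responder needs t_delay to reply;
    half of the remaining round-trip time, times c, gives the range."""
    return 0.5 * C * (t_rx - t_tx - t_delay)

def align_range_to_image(t_img: float, t_n: float, t_n1: float,
                         d_n: float, d_n1: float) -> float:
    """Linearly interpolate the UWB range at the image timestamp t_img, which
    is assumed to lie between the n-th and (n+1)-th UWB measurements."""
    w = (t_img - t_n) / (t_n1 - t_n)
    return (1.0 - w) * d_n + w * d_n1

def uwb_range_residual(d_meas: float,
                       p_body_nav: np.ndarray, R_body_nav: np.ndarray,
                       t_receiver_body: np.ndarray,
                       p_anchor_nav: np.ndarray) -> float:
    """Difference between the measured range and the range predicted from the
    UAV pose, the receiver's installation offset on the body, and the anchor
    position, all expressed in the navigation coordinate system."""
    p_receiver_nav = p_body_nav + R_body_nav @ t_receiver_body
    return d_meas - float(np.linalg.norm(p_anchor_nav - p_receiver_nav))

# Made-up numbers: a 20 ns net round trip corresponds to roughly 3 m of range.
d = twr_distance(t_tx=0.0, t_rx=120e-9, t_delay=100e-9)   # ~= 3.0 m
```

In the joint optimization of Section 3.5.2, residuals of this kind would presumably be stacked with the reprojection errors and weighted by the information matrices listed above.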
Method | MH-01 | MH-02 | MH-03 | MH-04 | MH-05 |
---|---|---|---|---|---|
IVU-AutoNav (Our Method) | 0.12 | 0.15 | 0.13 | 0.14 | 0.18 |
VINS-Mono (2018) [27] | 0.27 | 0.15 | 0.14 | 0.25 | 0.35 |
OKVIS (2015) [35] | 0.16 | 0.22 | 0.24 | 0.34 | 0.47 |
ORB-SLAM3 (2021) [4] | 0.13 | 0.10 | 0.12 | 0.20 | 0.22 |
Nguyen et al. (2020) [23] | 0.16 | 0.11 | 0.15 | 0.25 | 0.24 |
VIR-SLAM (2021) [24] | 0.18 | 0.19 | 0.26 | 0.37 | 0.29 |
Method | MH-01 | MH-02 | MH-03 | MH-04 | MH-05 |
---|---|---|---|---|---|
IVU-AutoNav (Our Method) | 99.6% | 97.5% | 96.8% | 101.2% | 99.8% |
VINS-Mono (2018) [27] | 91.7% | 85.2% | 127.2% | 96.4% | 110.4% |
OKVIS (2015) [35] | 121.4% | 112.0% | 92.8% | 91.5% | 121.5% |
ORB-SLAM3 (2021) [4] | 98.2% | 95.1% | 111.4% | 102.7% | 105.1% |