Pose Estimation Utilizing a Gated Recurrent Unit Network for Visual Localization
Abstract
1. Introduction
- A VO method based on a GRU network is proposed to predict the future yaw angle from a stack of yaw angles obtained from classical VO; a minimal sketch of such a predictor follows this list.
- The proposed method extracts the rotational tendency constrained by the shape or type of the robot or vehicle, and is particularly effective in cornering sections.
- A modified VO framework is developed that improves VO performance by applying the GRU network to classical VO without changing the original VO pipeline.
- Fusion of the yaw angles by Normalized Cross-Correlation (NCC) and subsequent reconstruction of the rotation matrix from the fused yaw angle are presented.
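A minimal sketch of the kind of GRU-based yaw predictor described in the first contribution is given below, written in PyTorch. The layer sizes, window length, and input shaping are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class YawGRU(nn.Module):
    """Predict the next yaw angle from a window of past yaw angles (illustrative sizes)."""
    def __init__(self, hidden_size=64, num_layers=2):
        super().__init__()
        # Input is a sequence of scalar yaw angles, shape (batch, seq_len, 1).
        self.gru = nn.GRU(input_size=1, hidden_size=hidden_size,
                          num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)  # regress a single future yaw angle

    def forward(self, yaw_seq):
        out, _ = self.gru(yaw_seq)        # (batch, seq_len, hidden_size)
        return self.head(out[:, -1, :])   # use only the last hidden state

# Example: predict the yaw one step ahead from the previous 10 yaw angles.
model = YawGRU()
past_yaw = torch.randn(4, 10, 1)          # batch of 4 stacked yaw-angle windows
pred_yaw = model(past_yaw)                # shape (4, 1)
```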
2. System Overview
2.1. Visual Odometry
Algorithm 1. Pseudo-code of the monocular visual odometry algorithm [2,36]: 2D-to-2D monocular visual odometry.
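As a rough illustration of the 2D-to-2D pipeline that Algorithm 1 refers to, the sketch below performs one VO step with OpenCV. The choice of KLT tracking, the RANSAC parameters, and the external scale input are assumptions for illustration rather than the authors' exact settings.

```python
import cv2
import numpy as np

def vo_step(prev_gray, curr_gray, focal, pp, prev_R, prev_t, scale=1.0):
    """One 2D-to-2D monocular VO step: track features, estimate the essential
    matrix, recover the relative pose, and accumulate it (illustrative parameters)."""
    # Detect corners in the previous frame and track them into the current frame (KLT).
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=2000,
                                 qualityLevel=0.01, minDistance=7)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, p0, None)
    good0 = p0[status.ravel() == 1]
    good1 = p1[status.ravel() == 1]

    # Essential matrix with RANSAC, then relative rotation/translation (up to scale).
    E, _ = cv2.findEssentialMat(good1, good0, focal=focal, pp=pp,
                                method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, good1, good0, focal=focal, pp=pp)

    # Concatenate the relative motion; the monocular scale must come from elsewhere.
    curr_t = prev_t + scale * prev_R @ t
    curr_R = prev_R @ R
    return curr_R, curr_t
```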
2.2. Effect of Cornering on VO
2.3. Framework of Pose Estimation Utilizing GRU Network
2.3.1. Network Architecture
2.3.2. Framework with Classical VO
Algorithm 2. Pseudo-code of the monocular visual odometry algorithm based on the proposed method: 2D-to-2D monocular visual odometry with pose correction using a GRU network.
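A rough sketch of the pose-correction step that Algorithm 2 describes: the yaw angle from classical VO is blended with the GRU prediction using an NCC score, and the rotation matrix is rebuilt from the fused yaw. The specific weighting rule and the Z-Y-X Euler convention below are illustrative assumptions, not necessarily the authors' exact formulation.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equal-length yaw sequences."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float(np.mean(a * b))

def fuse_yaw(yaw_vo, yaw_gru):
    """Blend the latest VO yaw with the GRU prediction, weighting by the NCC score
    of the two recent yaw sequences (hypothetical fusion rule for illustration)."""
    w = max(0.0, ncc(yaw_vo, yaw_gru))   # correlation in [-1, 1], clipped to [0, 1]
    return (1.0 - w) * yaw_vo[-1] + w * yaw_gru[-1]

def rotation_from_euler(roll, pitch, yaw):
    """Rebuild a rotation matrix (Z-Y-X convention assumed), so only the yaw
    needs to be replaced by the fused value while roll/pitch come from VO."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return Rz @ Ry @ Rx
```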
3. Simulation
3.1. Training and Testing
3.2. Evaluation
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Nistér, D.; Naroditsky, O.; Bergen, J. Visual odometry. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 27 June–2 July 2004. [Google Scholar]
- Scaramuzza, D.; Fraundorfer, F. Visual odometry [tutorial]. IEEE Robot. Autom. Mag. 2011, 18, 80–92. [Google Scholar] [CrossRef]
- Li, R.; Wang, S.; Gu, D. Ongoing evolution of visual slam from geometry to deep learning: Challenges and opportunities. Cognit. Comput. 2018, 10, 875–889. [Google Scholar] [CrossRef]
- Yang, N.; Wang, R.; Gao, X.; Cremers, D. Challenges in monocular visual odometry: Photometric calibration, motion bias, and rolling shutter effect. IEEE Robot. Autom. Lett. 2018, 3, 2878–2885. [Google Scholar] [CrossRef] [Green Version]
- Sun, R.; Giuseppe, B.A. 3D Reconstruction of Real Environment from Images Taken from UAV (SLAM Approach). Ph.D. Thesis, Politecnico di Torino, Turin, Italy, 2018. [Google Scholar]
- Cvišić, I.; Petrović, I. Stereo odometry based on careful feature selection and tracking. In Proceedings of the 2015 European Conference on Mobile Robots (ECMR), Paris, France, 2–4 September 2015; pp. 1–6. [Google Scholar]
- More, R.; Kottath, R.; Jegadeeshwaran, R.; Kumar, V.; Karar, V.; Poddar, S. Improved pose estimation by inlier refinement for visual odometry. In Proceedings of the 2017 Third International Conference on Sensing, Signal Processing and Security (ICSSS), Chennai, India, 4–5 May 2017; pp. 224–228. [Google Scholar]
- Liu, Y.; Gu, Y.; Li, J.; Zhang, X. Robust stereo visual odometry using improved RANSAC-based methods for mobile robot localization. Sensors 2017, 17, 2339. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Patruno, C.; Colella, R.; Nitti, M.; Renò, V.; Mosca, N.; Stella, E. A Vision-Based Odometer for Localization of Omnidirectional Indoor Robots. Sensors 2020, 20, 875. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Yi, K.M.; Trulls, E.; Lepetit, V.; Fua, P. Lift: Learned invariant feature transform. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 467–483. [Google Scholar]
- DeTone, D.; Malisiewicz, T.; Rabinovich, A. Superpoint: Self-supervised interest point detection and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 224–236. [Google Scholar]
- Revaud, J.; Weinzaepfel, P.; De Souza, C.; Pion, N.; Csurka, G.; Cabon, Y.; Humenberger, M. R2d2: Repeatable and reliable detector and descriptor. arXiv 2019, arXiv:1906.06195. [Google Scholar]
- Sarlin, P.-E.; Cadena, C.; Siegwart, R.; Dymczyk, M. From coarse to fine: Robust hierarchical localization at large scale. In Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–19 June 2019; pp. 12716–12725. [Google Scholar]
- Newcombe, R.A.; Lovegrove, S.J.; Davison, A.J. DTAM: Dense tracking and mapping in real-time. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 7–13 November 2011; pp. 2320–2327. [Google Scholar]
- Engel, J.; Schöps, T.; Cremers, D. LSD-SLAM: Large-scale direct monocular SLAM. In Proceedings of the 2014 European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 834–849. [Google Scholar]
- Caruso, D.; Engel, J.; Cremers, D. Large-scale direct SLAM for omnidirectional cameras. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; pp. 141–148. [Google Scholar]
- Engel, J.; Stückler, J.; Cremers, D. Large-scale direct SLAM with stereo cameras. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; pp. 1935–1942. [Google Scholar]
- Usenko, V.; Engel, J.; Stückler, J.; Cremers, D. Direct visual-inertial odometry with stereo cameras. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 1885–1892. [Google Scholar]
- Wang, R.; Schworer, M.; Cremers, D. Stereo DSO: Large-scale direct sparse visual odometry with stereo cameras. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3903–3911. [Google Scholar]
- Zhao, X.; Liu, L.; Zheng, R.; Ye, W.; Liu, Y. A robust stereo feature-aided semi-direct SLAM system. Robot. Auton. Syst. 2020, 132, 103597. [Google Scholar] [CrossRef]
- Wang, F.; Lü, E.; Wang, Y.; Qiu, G.; Lu, H. Efficient Stereo Visual Simultaneous Localization and Mapping for an Autonomous Unmanned Forklift in an Unstructured Warehouse. Appl. Sci. 2020, 10, 698. [Google Scholar] [CrossRef] [Green Version]
- Kendall, A.; Grimes, M.; Cipolla, R. Posenet: A convolutional network for real-time 6-dof camera relocalization. In Proceedings of the 2015 IEEE International Conference on Computer Vision, Las Condes, Chile, 11–18 December 2015; pp. 2938–2946. [Google Scholar]
- Wang, S.; Clark, R.; Wen, H.; Trigoni, N. Deepvo: Towards end-to-end visual odometry with deep recurrent convolutional neural networks. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 2043–2050. [Google Scholar]
- Liu, Q.; Zhang, H.; Xu, Y.; Wang, L. Unsupervised Deep Learning-Based RGB-D Visual Odometry. Appl. Sci. 2020, 10, 5426. [Google Scholar] [CrossRef]
- Liu, Q.; Li, R.; Hu, H.; Gu, D. Using unsupervised deep learning technique for monocular visual odometry. IEEE Access 2019, 7, 18076–18088. [Google Scholar] [CrossRef]
- Zhao, C.; Sun, L.; Yan, Z.; Neumann, G.; Duckett, T.; Stolkin, R. Learning Kalman Network: A deep monocular visual odometry for on-road driving. Robot. Auton. Syst. 2019, 121, 103234. [Google Scholar] [CrossRef]
- Peretroukhin, V.; Kelly, J. Dpc-net: Deep pose correction for visual localization. IEEE Robot. Autom. Lett. 2017, 3, 2424–2431. [Google Scholar] [CrossRef] [Green Version]
- Peretroukhin, V.; Wagstaff, B.; Giamou, M.; Kelly, J. Probabilistic regression of rotations using quaternion averaging and a deep multi-headed network. arXiv 2019, arXiv:1904.03182. [Google Scholar]
- Comport, A.I.; Malis, E.; Rives, P. Real-time quadrifocal visual odometry. Int. J. Robot. Res. 2010, 29, 245–266. [Google Scholar] [CrossRef] [Green Version]
- Gutierrez, D.; Rituerto, A.; Montiel, J.; Guerrero, J.J. Adapting a real-time monocular visual slam from conventional to omnidirectional cameras. In Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, 6–13 November 2011; pp. 343–350. [Google Scholar]
- Wang, S.; Clark, R.; Wen, H.; Trigoni, N. End-to-end, sequence-to-sequence probabilistic visual odometry through deep neural networks. Int. J. Robot. Res. 2018, 37, 513–542. [Google Scholar] [CrossRef]
- Jiao, J.; Jiao, J.; Mo, Y.; Liu, W.; Deng, Z. MagicVO: An End-to-End hybrid CNN and bi-LSTM method for monocular visual odometry. IEEE Access 2019, 7, 94118–94127. [Google Scholar] [CrossRef]
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The kitti dataset. Int. J. Robot. Res. 2013, 32, 1231–1237. [Google Scholar] [CrossRef] [Green Version]
- Zhu, J.; Yang, Z.; Guo, Y.; Zhang, J.; Yang, H. Short-term load forecasting for electric vehicle charging stations based on deep learning approaches. Appl. Sci. 2019, 9, 1723. [Google Scholar] [CrossRef] [Green Version]
- Yang, S.; Yu, X.; Zhou, Y. LSTM and GRU Neural Network Performance Comparison Study: Taking Yelp Review Dataset as an Example. In Proceedings of the 2020 International Workshop on Electronic Communication and Artificial Intelligence (IWECAI), Shanghai, China, 12–14 June 2020; pp. 98–101. [Google Scholar]
- Singh, A.; Venkatesh, K. Monocular Visual Odometry. Undergraduate Project Report 2, 2015. Available online: http://avisingh599.github.io/assets/ugp2-report.pdf (accessed on 11 December 2020).
- Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
- Grupp, M. Python Package for the Evaluation of Odometry and SLAM. Available online: https://libraries.io/pypi/evo (accessed on 2 November 2020).
- Ouyang, H.; Zeng, J.; Li, Y.; Luo, S. Fault Detection and Identification of Blast Furnace Ironmaking Process Using the Gated Recurrent Unit Network. Processes 2020, 8, 391. [Google Scholar] [CrossRef] [Green Version]
- Siegwart, R.; Nourbakhsh, I.R.; Scaramuzza, D. Introduction to Autonomous Mobile Robots; MIT Press: Cambridge, MA, USA, 2011. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Zhan, H. kitti-Odom-Eval. Available online: https://github.com/Huangying-Zhan/kitti-odom-eval (accessed on 15 September 2020).
- Prokhorov, D.; Zhukov, D.; Barinova, O.; Anton, K.; Vorontsova, A. Measuring robustness of Visual SLAM. In Proceedings of the 2019 16th International Conference on Machine Vision Applications (MVA), Tokyo, Japan, 27–31 May 2019; pp. 1–6. [Google Scholar]
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361. [Google Scholar]
- ChiWeiHsiao; Daiyk; Alexander. DeepVO-Pytorch. Available online: https://github.com/ChiWeiHsiao/DeepVO-pytorch (accessed on 15 September 2020).
Evaluation results on KITTI sequences 00–10 (Err = error for each metric; Diff. = error of MVO + proposed method minus error of MVO).

| Method | Metric | | 00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MVO | Tr [%] | Err | 11.307 | 6.174 | 11.819 | 5.165 | 0.641 | 16.745 | 4.384 | 8.032 | 11.427 | 8.554 | 6.199 | 8.222 |
| MVO | Rot [deg/100 m] | Err | 3.946 | 2.171 | 4.688 | 2.597 | 0.373 | 10.444 | 2.118 | 5.427 | 3.770 | 2.885 | 5.17 | 3.963 |
| MVO | ATE [m] | Err | 143.731 | 38.289 | 90.335 | 1.545 | 0.232 | 128.369 | 26.280 | 15.427 | 88.199 | 28.789 | 8.642 | 51.803 |
| MVO | RPE [m] | Err | 0.022 | 0.176 | 0.026 | 0.022 | 0.026 | 0.018 | 0.025 | 0.021 | 0.028 | 0.024 | 0.018 | 0.037 |
| MVO | RPE [deg] | Err | 0.311 | 0.275 | 0.284 | 0.222 | 0.063 | 0.346 | 0.153 | 0.221 | 0.253 | 0.260 | 0.321 | 0.246 |
| MVO + Proposed method | Tr [%] | Err | 9.769 | 6.201 | 10.767 | 5.135 | 0.738 | 5.524 | 4.396 | 8.106 | 9.678 | 8.564 | 5.886 | 6.797 |
| MVO + Proposed method | Tr [%] | Diff. | −1.538 | 0.027 | −1.052 | −0.030 | 0.097 | −11.221 | 0.012 | 0.074 | −1.749 | 0.010 | −0.313 | −1.426 |
| MVO + Proposed method | Rot [deg/100 m] | Err | 3.554 | 2.268 | 4.384 | 2.597 | 0.406 | 2.794 | 1.960 | 5.496 | 3.422 | 2.889 | 4.968 | 3.158 |
| MVO + Proposed method | Rot [deg/100 m] | Diff. | −0.392 | 0.097 | −0.304 | 0.000 | 0.033 | −7.650 | −0.158 | 0.069 | −0.348 | 0.004 | −0.202 | −0.805 |
| MVO + Proposed method | ATE [m] | Err | 121.600 | 38.096 | 79.475 | 1.545 | 0.177 | 39.750 | 25.882 | 14.665 | 72.799 | 28.937 | 6.643 | 39.052 |
| MVO + Proposed method | ATE [m] | Diff. | −22.131 | −0.193 | −10.860 | 0.000 | −0.055 | −88.619 | −0.398 | −0.762 | −15.400 | 0.148 | −1.999 | −12.752 |
| MVO + Proposed method | RPE [m] | Err | 0.023 | 0.179 | 0.026 | 0.022 | 0.026 | 0.018 | 0.025 | 0.020 | 0.027 | 0.024 | 0.019 | 0.037 |
| MVO + Proposed method | RPE [m] | Diff. | 0.001 | 0.003 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | −0.001 | −0.001 | 0.000 | 0.001 | 0.000 |
| MVO + Proposed method | RPE [deg] | Err | 0.306 | 0.282 | 0.279 | 0.222 | 0.063 | 0.206 | 0.152 | 0.221 | 0.247 | 0.259 | 0.318 | 0.232 |
| MVO + Proposed method | RPE [deg] | Diff. | −0.005 | 0.007 | −0.005 | 0.000 | 0.000 | −0.140 | −0.001 | 0.000 | −0.006 | −0.001 | −0.003 | −0.014 |
| DeepVO [45] | Tr [%] | Err | 65.749 | 138.875 | 11.819 | 81.133 | 20.053 | 47.975 | 70.602 | 62.13 | 72.711 | 90.195 | 110.799 | 70.186 |
| DeepVO [45] | Rot [deg/100 m] | Err | 28.745 | 11.612 | 4.688 | 30.119 | 7.19 | 28.675 | 29.824 | 50.996 | 30.599 | 25.633 | 25.814 | 24.900 |
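The ATE and RPE columns above follow the usual trajectory-evaluation conventions implemented by tools such as evo and kitti-odom-eval cited in the references. The numpy sketch below shows the basic definitions only, without the trajectory alignment and sub-sequence averaging those tools perform; the input arrays are hypothetical.

```python
import numpy as np

def ate_rmse(gt_xyz, est_xyz):
    """Absolute trajectory error: RMSE of position differences, given (N, 3) arrays.
    (No alignment applied here; evaluation tools typically align trajectories first.)"""
    err = np.linalg.norm(gt_xyz - est_xyz, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

def rpe_translation(gt_poses, est_poses, delta=1):
    """Relative pose error (translational part) between frame pairs separated by
    `delta`, given lists of 4x4 homogeneous pose matrices."""
    errs = []
    for i in range(len(gt_poses) - delta):
        gt_rel = np.linalg.inv(gt_poses[i]) @ gt_poses[i + delta]
        est_rel = np.linalg.inv(est_poses[i]) @ est_poses[i + delta]
        diff = np.linalg.inv(gt_rel) @ est_rel
        errs.append(np.linalg.norm(diff[:3, 3]))
    return float(np.sqrt(np.mean(np.square(errs))))
```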
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Kim, S.; Kim, I.; Vecchietti, L.F.; Har, D. Pose Estimation Utilizing a Gated Recurrent Unit Network for Visual Localization. Appl. Sci. 2020, 10, 8876. https://doi.org/10.3390/app10248876