Visual Robot Relocalization Based on Multi-Task CNN and Image-Similarity Strategy
Abstract
1. Introduction
- We present an end-to-end multi-task CNN that simultaneously performs 6DoF pose regression and scene recognition from a single hand-held RGB visual sensor. Compared with state-of-the-art networks, our CNN not only stabilizes pose estimation by using the confidence of scene recognition, which can offset the influence of images from an incorrect scene on the CNN, but also improves the accuracy of 6D relocalization.
- Besides the multi-task CNN, a further contribution to relocalization accuracy is a block selection algorithm for each new input image. It is based on particle swarm optimization (PSO) and searches for the image block that is most similar to the images in the training set.
- To reduce the computational cost of searching the whole training set for the most similar image, we adopt k-means, an unsupervised clustering method, to partition the training feature space into clustering feature vectors. The similarity to the testing image is then calculated against these vectors, which allows our model to operate in real time.
- Building on the second and third contributions, we present a preprocessing system for testing images, namely the dual-level image-similarity strategy (DLISS), which obtains, in an end-to-end manner, the image block most visually similar to the training set.
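
The clustering step in the third contribution can be illustrated with a small sketch. It is not the authors' implementation: the feature vectors are random stand-ins for CNN features, squared Euclidean distance is assumed as the similarity measure, and the function names `kmeans` and `most_similar_training_image` are hypothetical. The sketch only shows the general idea of comparing a query against cluster centers first and then against the members of one cluster.

```python
import numpy as np


def pairwise_sq_dist(a, b):
    """Squared Euclidean distance between every row of a and every row of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)


def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct training features.
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = pairwise_sq_dist(features, centers).argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    # Final assignment, consistent with the final centers.
    labels = pairwise_sq_dist(features, centers).argmin(axis=1)
    return centers, labels


def most_similar_training_image(query, features, centers, labels):
    """Coarse-to-fine lookup: nearest center, then nearest member of that cluster."""
    counts = np.bincount(labels, minlength=len(centers))
    center_dist = ((centers - query) ** 2).sum(axis=1)
    center_dist[counts == 0] = np.inf  # never pick a cluster with no members
    cluster = int(center_dist.argmin())
    members = np.flatnonzero(labels == cluster)
    member_dist = ((features[members] - query) ** 2).sum(axis=1)
    return cluster, int(members[member_dist.argmin()])


# Stand-in data (assumption): 400 training features of dimension 16.
rng = np.random.default_rng(1)
train_features = rng.normal(size=(400, 16))
centers, labels = kmeans(train_features, k=10)

# A query that lies close to one of the training features.
query = train_features[123] + 0.01 * rng.normal(size=16)
cluster, match = most_similar_training_image(query, train_features, centers, labels)
print(f"query -> cluster {cluster}, closest training index {match}")
```

With K centers, the fine search touches only the members of one cluster instead of every training feature, which is where the saving in computation comes from.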
2. Proposed Approaches and Datasets
2.1. The Specific Methods and Measures
2.1.1. Backbone Network
2.1.2. Multi-Task Learning for Pose Regression and Scene Recognition
2.1.3. Dual-Level Image-Similarity Strategy
Initial Level
Iteration-Level: PSO-Based Image-Block Selection Algorithm
Algorithm 1: PSO-based image-block selection.
2.2. Datasets
3. Experiments
3.1. Implementation Details
3.1.1. Training Details for 6D Relocalization Network
3.1.2. Initialization of PSO-Based Image-Block Selection
3.2. Feature Representation in Pose Regression and Scene Recognition
3.3. Results of Dual-Level Image-Similarity Strategy
3.3.1. Feature Vector Clustering
3.3.2. The Robustness of the PSO-Based Image-Block Selection Algorithm
3.3.3. The Reliability of the Dual-Level Image-Similarity Strategy
3.4. Experimental Results and Discussion
3.5. Efficiency of Our Network
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
| Operating Parameters | Descriptions | Values |
|---|---|---|
| M | Scale of particle swarm | M = 30 |
| | Initial inertia weight | 0.9 |
| | Inertia weight after maximum iteration | 0.4 |
| | Accelerating constants | 2, 2 |
| | Maximum number of iterations | 200 |
| | Distance threshold | 50 |
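
The values in the parameter table above match a standard PSO with an inertia weight that decreases over the iterations. The following is a minimal sketch under stated assumptions, not the authors' Algorithm 1: the names `w_start`, `w_end`, `c1`, and `c2` are chosen here for illustration, the linear decay schedule and the velocity clamp are assumptions, and the cost function `block_cost` is a hypothetical stand-in for an image-block similarity score. The distance threshold from the table is not used, because its role is not described in this section.

```python
import numpy as np


def pso_minimize(objective, lower, upper, swarm_size=30, w_start=0.9,
                 w_end=0.4, c1=2.0, c2=2.0, max_iter=200, seed=0):
    """Minimise `objective` inside the box [lower, upper] with a basic PSO."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    pos = rng.uniform(lower, upper, size=(swarm_size, dim))
    vel = np.zeros((swarm_size, dim))
    v_max = 0.2 * (upper - lower)  # velocity clamp (assumption, not in the table)

    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    g = int(pbest_val.argmin())
    gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])

    for t in range(max_iter):
        # Inertia weight goes linearly from w_start to w_end (assumed schedule).
        w = w_start - (w_start - w_end) * t / max(max_iter - 1, 1)
        r1 = rng.random((swarm_size, dim))
        r2 = rng.random((swarm_size, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -v_max, v_max)
        pos = np.clip(pos + vel, lower, upper)

        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]

        g = int(pbest_val.argmin())
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    return gbest, gbest_val


# Hypothetical cost: squared distance of a block's top-left corner from the
# corner whose block would be most similar to the training set.
target_corner = np.array([120.0, 80.0])


def block_cost(corner):
    return float(((corner - target_corner) ** 2).sum())


lower_bound, upper_bound = [0.0, 0.0], [400.0, 200.0]
best_corner, best_cost = pso_minimize(block_cost, lower_bound, upper_bound)
print("best corner:", best_corner, "cost:", best_cost)
```

In a real pipeline, `block_cost` would crop the block at the candidate position, extract its feature vector, and return its distance to the training features; only the objective would need to change.
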
| Dataset | Train Frames | Test Frames | Clustering Centers K | PoseNet | Bayesian PoseNet | Our Methods without Using Scene Recognition | Our Methods without Using DLISS | Our Methods |
|---|---|---|---|---|---|---|---|---|
| Chess | 4000 | 2000 | 40 | 0.32 m, 8.12° | 0.37 m, 7.24° | 0.32 m, 5.68° | 0.32 m, 4.71° | 0.27 m, 4.1° |
| Fire | 2000 | 2000 | 20 | 0.47 m, 14.4° | 0.43 m, 13.7° | 0.41 m, 11.0° | 0.42 m, 11.23° | 0.36 m, 10.2° |
| Heads | 1000 | 1000 | 10 | 0.29 m, 12.0° | 0.31 m, 12.0° | 0.28 m, 10.6° | 0.25 m, 10.16° | 0.21 m, 9.78° |
| Office | 6000 | 4000 | 50 | 0.48 m, 7.68° | 0.48 m, 8.04° | 0.43 m, 7.18° | 0.39 m, 7.14° | 0.38 m, 6.82° |
| Pumpkin | 4000 | 2000 | 40 | 0.47 m, 8.42° | 0.61 m, 7.08° | 0.42 m, 7.32° | 0.37 m, 6.92° | 0.35 m, 6.64° |
| Red Kitchen | 7000 | 5000 | 50 | 0.59 m, 8.64° | 0.58 m, 7.54° | 0.53 m, 7.46° | 0.48 m, 6.96° | 0.45 m, 6.76° |
| Stairs | 2000 | 1000 | 20 | 0.47 m, 13.8° | 0.48 m, 13.1° | 0.44 m, 10.2° | 0.34 m, 8.71° | 0.31 m, 8.28° |
| Average | | | | 0.44 m, 10.4° | 0.47 m, 9.81° | 0.40 m, 8.49° | 0.37 m, 7.98° | 0.33 m, 7.51° |
| King’s College | 1220 | 343 | 15 | 1.92 m, 5.40° | 1.74 m, 4.06° | 1.64 m, 3.25° | 1.43 m, 3.13° | 1.38 m, 2.94° |
| Street | 3015 | 2923 | 30 | 3.67 m, 6.50° | 2.14 m, 4.96° | 1.91 m, 4.35° | 1.72 m, 4.15° | 1.53 m, 3.97° |
| Old Hospital | 895 | 182 | 10 | 2.31 m, 5.38° | 2.57 m, 5.14° | 1.87 m, 4.49° | 1.66 m, 4.10° | 1.59 m, 3.81° |
| Shop Facade | 231 | 103 | 10 | 1.46 m, 8.08° | 1.25 m, 7.54° | 1.31 m, 7.89° | 1.13 m, 7.71° | 1.04 m, 6.61° |
| St Mary’s Church | 1487 | 530 | 15 | 2.65 m, 8.48° | 2.11 m, 8.38° | 1.81 m, 7.49° | 1.78 m, 7.84° | 1.66 m, 6.83° |
| Average | | | | 2.40 m, 6.76° | 1.96 m, 6.02° | 1.71 m, 5.49° | 1.54 m, 5.38° | 1.44 m, 4.83° |
| Dataset | The Number of Images in Other Scenes | Scene Recognition Accuracy |
|---|---|---|
| Chess | 2000 | 96.63% |
| Fire | 1000 | 96.31% |
| Heads | 500 | 91.69% |
| Office | 3000 | 93.81% |
| Pumpkin | 2000 | 86.41% |
| Red Kitchen | 3500 | 87.53% |
| Stairs | 1000 | 98.41% |
| King’s College | 600 | 84.23% |
| Street | 1200 | 87.56% |
| Old Hospital | 400 | 86.68% |
| Shop Facade | 100 | 84.71% |
| St Mary’s Church | 700 | 85.68% |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Xie, T.; Wang, K.; Li, R.; Tang, X. Visual Robot Relocalization Based on Multi-Task CNN and Image-Similarity Strategy. Sensors 2020, 20, 6943. https://doi.org/10.3390/s20236943