You are currently viewing a new version of our website. To view the old version click .
Symmetry
  • Article
  • Open Access

20 April 2022

GMS-RANSAC: A Fast Algorithm for Removing Mismatches Based on ORB-SLAM2

,
,
,
and
1
School of Mechanical Engineering, Hubei University of Technology, Wuhan 430062, China
2
Collaborative Innovation Center of Intelligent Green Manufacturing Technology and Equipment, Qingdao 266000, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
This article belongs to the Section Computer

Abstract

This paper presents a new method of removing mismatches of redundant points based on oriented fast and rotated brief (ORB) in vision simultaneous localization and mapping (SLAM) systems. On the one hand, the grid-based motion statistics (GMS) algorithm reduces the processing time of key frames with more feature points and greatly increases the robustness of the original algorithm in a complex environment. On the other hand, aiming at the situation that the GMS algorithm is prone to false matching when there are few symmetry feature point pairs, the random sample consensus (RANSAC) algorithm is used to optimize and correct it. Experiments show that the method we propose has an average error correction rate of 28.81% for individual GMS while the time consumed at the same accuracy threshold is reduced by 72.18% on average. At the same time, we compared it to locality preserving matching (LPM) and progressive sample consensus (PROSAC), and it performed the best. Finally, we integrated GMS-RANSAC into the ORB-SLAM2 system for monocular initialization, which results in a significant improvement.

1. Introduction

With the proposal and popularization of intelligence in robotics, this also accelerates the development of simultaneous localization and mapping (SLAM), which plays an important role in applications such as service robots, VR, 3D reconstruction and autonomous navigation platforms. Depending on the kinds of sensors, we usually divide SLAM into two main categories. One is based on a laser. The distance and angle of markings in adjacent keyframes is used to estimated position and motion with Rao-Blackwellized Particle Filter in the robot coordinate system [1]. The other category is vision SLAM, which can be carried out by using just a monocular camera. With good robustness and accurate recognition in position, these cameras which provide rich environmental information are also cheap.
MonoSLAM [2] is the first real-time monocular vision SLAM system, in which the Extended Kalman Filter (EKF) [3] is used as the back-end to track the sparse feature points at the front-end. In 2007, Klein et al. proposed Parallel Tracking and Mapping (PTAM) [4] that realized the parallelization of the tracking and drawing process and used nonlinear optimization instead of conventional filters at the back-end. ORB-SLAM [5] was put forward in 2015. It has good versatility for supporting monocular, stereo and RGB-D. In the traditional feature point method of the SLAM process, feature matching is a critical step in visual SLAM [6], which fixes an issue with data association in SLAM. This means the correspondence was determined between a road sign that is currently seen and before it was seen. By accurately matching the description subs between the image and the image or the map [7], we can reduce a lot of burden for subsequent attitude estimation, optimization and other operations. However, due to the local characteristics of image features, mismatches are widespread, and this has not been effectively solved for a long time [8]. At present, it has become a bottleneck in visual SLAM that restricts performance improvement [9].
As Figure 1 shows, researchers usually estimate the Fundamental matrix and Homography matrix [10], and then choose the one which has smaller reprojection error to perform the exercise estimation. We prefer to use random sample consensus (RANSAC) [11] techniques when there may be mismatches. To meet the corresponding accuracy requirements when processing large amounts of data, the number of iterations must be increased, which greatly increases the time it takes to match images [12]. The response time has a great impact on real-time performance [13], and its reduction will sacrifice the accuracy of the SLAM system. Therefore, it is difficult to improve the response time and accuracy at the same time when the amount of data is large.
Figure 1. Partial flow of feature point method in SLAM.
This paper presents a new method in which grid-based motion statistics (GMS) [14] and RANSAC techniques are introduced to eliminate the incorrect matches. The GMS algorithm reduces the processing time of key frames with more feature points and greatly increases the robustness of the original algorithm in a complex environment. On the other hand, aiming at the situation that the GMS algorithm is prone to false matching when there are few feature points, the proposed algorithm can correct it. By experimental comparison, GMS-RANSAC improved the accuracy of GMS by about 28.81% and saved RANSAC about 72.18% of the time by setting the RANSAC threshold instead of setting the number of iterations.
Finally, we projected the method to create a new initializer based on the ORB-SLAM2 [15] system for monocular initialization, in which the number of the RANSAC model is decreased significantly. Figure 2 presents the initialization process for new design. The number of models is rapidly controlled within a corresponding range by GMS and canceling the number of iterations makes optimization global to achieve a better model, which results in a significant improvement.
Figure 2. Initialization process for new design.
The remainder of the paper is organized as follows: Section 2 describes related work. In Section 3, we present our algorithm. The results are evaluated in Section 4. Section 5 concludes the paper. The source code of the demo is at https://github.com/fycie/demo.git, accessed on 10 March 2022.

3. Methodology

3.1. Workflow of the Method

Figure 4 shows the workflow of the proposed method. Specifically, the main workflow is as follows:
Figure 4. The workflow of the proposed method.
(1)
Firstly, the ORB feature points of two images are extracted, respectively;
(2)
Then, the ORB features are matched by Hamming distance of descriptor;
(3)
Thirdly, the results of the previous step are roughly screened by GMS algorithm, which makes the number of matches greatly reduced;
(4)
Finally, the outliers are further removed by setting the random sampling consistency of the threshold.
As shown in Figure 4, first, the same number of ORB feature points was created to match two images I = { I 1 , I 2 } . We recorded their horizontal and vertical coordinates as k p i , so that K P i = { k p 1 , k p 2 , , k p n } . Secondly, by calculating the Hamming distance of ORB features, we found the corresponding matching points through I 2 in the order in which the feature points were stored in K P 1 and then recorded their serial numbers in the K P 2 . m i is the number of K P 2 , which is matched to k p i in K P 1 . If there were no points matched, m i = 1 . After that, we would obtain M 1 = m 1 , m 2 , , m n and then
F g m s ( i , m i ) = 1 , m i 0 f g m s ( i , m i ) , m i > 0
We used Equation (2) that is GMS to set the value of m i which did not meet the requirements to −1.
According to the idea of GMS algorithm, the image was divided to be matched into G × G grids, and we set G to 20 in this paper. M 2 = m 1 , m 2 , , m n was obtained through GMS. Then, we sorted out the data. When m i 1 , we set
p n = f ( i , m i )
As a result, P g = { p 1 , p 2 , , p n } , which was prepared for RANSAC. After normalizing the points, we used RANSAC algorithm to calculate the Homography matrix, and set the threshold of the algorithm to 5. According to the same form before, the outer points were eliminated by re-projection error. Finally, we obtained P g + r = { p 1 , p 2 , , p n } , which are excellent matches.

3.2. GMS Mismatch Correction

Figure 5 presents false matches of GMS in left picture when 2000 features are created, and the matches are corrected in the right image after RANSAC. RANSAC takes only 0.68 ms and improves 42% percent of accuracy at the criterion of the RANSAC threshold of 5.
Figure 5. Comparison of two methods under 2000 features: (a) ORB-BF-GMS, (b) ORB-BF-GMS-RANSAC.
Then, GMS was compared with GMS-RANSAC at different numbers of ORB feature points. Figure 6 shows that GMS-RANSAC becomes more effective with the number of points increasing, and the great gap from 1000 to 2000 is the weaknesses of the GMS. The trend of optimization also reflects changes in GMS accuracy. We processed the data according to the threshold value of 5. The highest point occurs when the number of ORB feature points is 5000, and the accuracy is 59.76%. Each of feature points were matched for record, and then set as Q i = { X 1 , X 2 , , X n } . The maximum and minimum values in Q ¯ i were removed for analysis.
Q ¯ i = i = 1 n X ¯ i n
Figure 6. Number of matches after filtering.
The average improvement efficiency A is about 54.24%.
A = i = 1 n Q ¯ i n

3.3. The Acceleration of RANSAC

GMS for continuity-based neighborhood search matching can quickly reduce a large number of coarse matches. Time of different matching methods was measured by taking same number of ORB feature points from the same two pictures, and numbers of fine matches were recorded.
As Figure 7 shows, RANSAC takes more time with ORB feature points increasing at the same RANSAC threshold. For example, after rough matching, outliers are removed by the RANSAC algorithm. It takes 64.23 ms to reach the standard algorithm with a threshold of 5 based on 1000 ORB features, but the method proposed only takes 1.86 ms in total. GMS consumed 1.18 ms and other times are RANSAC In addition, through the calculation from 1000 to 10,000 feature points at 1000 intervals, the average time consumed was reduced by 94% in parallel to RANSAC. Because GMS rapidly reduces the number of matches, the number of RANSAC reaching the 5 threshold is greatly reduced. This is the reason why the time required by RANSAC algorithm was reduced.
Figure 7. The cost time of two methods.

4. Experiments

The experiments were conducted on a 4-threaded, 4-RAM virtual machine ubuntu16.04 installed by VMware Workstation Pro under intel i5-8400 CPU, 16 G RAM, 256 G SSD desktops and C++ code.

4.1. GMS-RANSAC for Correspondence Selection

In order to evaluate GMS-RANSAC performance comprehensively, we experimented with different local features and different number of points. Two frames were selected from a range of datasets for experimental comparison to simulate the operation of scenarios.

4.1.1. Datasets and Metrics

It consists of 2 datasets, including TUM [19] and KITTI [20], which provide indoor scenes and street views. We calculated the ratio by counting the time spent by the algorithm and the number of good matches obtained in Section 3. Figure 8 presents the comparison of RANSAC and GMS-RANSAC in different datasets at 3000 feature points, and a.1, b.1 and c.1 represent the results of RANSAC under TUM and KITTI. The other results are GMS-RANSAC’s. We evaluate the two methods by comparing the number of matches and the time it takes.
Figure 8. Comparison of RANSAC and GMS-RANSAC in different datasets at 3000 ORB feature points: (a) on TUM desk, (b) on KITTI, (c) on TUM doll.

4.1.2. Experimental Results

It can be seen from Figure 9a that the matching numbers of RANSAC are more than GMS and GMS-RANSAC. As the number of ORB points increases, all three methods gradually increase, and the gap between RANSAC and the other two methods increases. GMS-RANSAC is slightly higher than GMS around 8000 points of ORB features on KITTI in Figure 9b. Figure 9c shows the results on TUM dataset in which the total number filtered by the method is lower than that of the other two datasets.
Figure 9. Matches of the three algorithms in different datasets: (a) on TUM desk, (b) on KITTI, (c) on TUM doll.
Figure 9a–c shows that the number of GMS-RANSAC results is similar to the number of high-quality matches obtained by the original RANSAC method. Using Equations (4) and (5), we summarize the correction rate of the above GMS-RANSAC compared to the GMS. The data show that the number of matches obtained after GMS processing is greater than RANSAC. However, when further filtered with the same threshold RANSAC, the resulting number of fine matches is not much different from the original.
As can be seen in Figure 10a, the cost time of three methods increases with the increasing number of ORB features on TUM desk. All three algorithms reach their maximum values at 10,000 ORB features. It can be seen in Figure 10b that there are several inflection points in RANSAC on KITTI. The extremum of RANSAC is 32.735 ms at 3000, and the time consumed by the three algorithms is relatively close when ORB feature points are 1000–2000 (Figure 10c). The gap becomes increasing gradually with the number of ORB features rising.
Figure 10. Cost time of the three algorithms in different datasets: (a) on TUM desk, (b) on KITTI, (c) on TUM doll.
Figure 10a–c present that as the ORB feature points increase, the time consumed by RANSAC increases linearly. In contrast, the time required for GMS stabilized at 1–2ms. GMS-RANSAC, where the growth of GMS-RANSAC is stabilizing.
As shown in Table 1, the method we propose has an average error correction rate of 28.81% for individual GMS. On the other hand, there is a reduction of about 72.18% in time for individual RANSAC. Overall, it is a significant improvement in feature matching. The effect of GMS-RANSAC is most pronounced on the TUM dataset, and the maximum of correction rate between GMS and GMS-RANSAC is 30.94%. The time reduction maximum between GMS and GMS-RANSAC is 84.54%. It plays an important role in both ways.
Table 1. Data obtained from three sets of experiments.

4.1.3. Comparisons

Table 2 reports the experimental results for PROSAC [21], LPM and GMS-RASAC. Here, we mainly compare algorithms in terms of %Precision, %Recall and time cost after RANSAC-based outlier removal.
Table 2. Experimental results of three algorithms under 2000 ORB features.
All the above results demonstrate that GMS-RANSAC can show better performance with LPM using the same feature correspondences as input. Although PROSAC is outstanding, it is less stable at runtime with different number of points and consumes more time.

4.2. The New Initializer Based on ORB-SLAM2

To save computing resources, we inserted GMS-RANSAC in S e a r c h F o r I n i t i a l i z a t i o n . According to its original practice, v m a t c h e d 12 [ i ] at the outer point was set to −1 to mark it. Furthermore, the number of iterations of the original group was replaced by the threshold method, which was integrated into the initialization of ORB-SLAM2. The number of iterations and the parallax angle of the initialization decision were changed to keep system stability. We set the number of iterations of the original system from 200 to 300 and adjusted the parallax angle limit from 0.36 to 0.2 to test the initializer.

4.2.1. Datasets and Metrics

Because the monocular initialization in KITTI datasets was easy, we chose TUM dataset freiburg1_xyz and freiburg1_desk as the datasets for this experiment, and this was beneficial for us to identify the ability of initializers. In each test, we measured (1) how many 3D points were triangulated; (2) how much time the algorithm cost. In addition, we calculated the ratio of the number of points to the initialization time to evaluate the efficiency of the initializers.

4.2.2. Experimental Results

Table 3 shows the results, in which RANSAC is in parallel to GMS-RANSAC based on the original initializer. The results show that the proposed initializer can significantly improve the initialization efficiency and make the 3D map denser. This is of great significance for the ORB-SLAM2 system, especially when they meet the complex texture environment. The proposed method can greatly increase the robustness of the system.
Table 3. Modified ORB-SLAM2 initialization results on the TUM dataset freiburg1_xyz (R:RANSAC G-R:GMS-RANSAC #3D Points: points by triangulation).
Table 4 presents that the original initializer tends to be stable in the number of triangulated map points while our proposed method increases steadily. Although the time will increase significantly with the increase of 3D points after 3000 ORB points, the rate is always higher than the original. Figure 11 presents the experimental scenarios under different datasets.
Table 4. Modified ORB-SLAM2 initialization results on the TUM dataset freiburg1_desk (R:RANSAC G-R:GMS-RANSAC #3D Points: points by triangulation).
Figure 11. The scenes of ORB-SLAM2 changed: (a) on TUM dataset freiburg1_xyz, (b) on TUM dataset freiburg1_desk.

5. Conclusions

This paper proposes a fast correspondence selection algorithm which combines GMS and RANSAC. Consequently, it can make up for the shortcomings of the original two algorithms used alone and make them more robust, especially when the ORB features are too many or too little. Moreover, experiments show that the method we propose has an average error correction rate of 28.81% for individual GMS while the time consumed at the same accuracy threshold is reduced by 72.18% on average. This was also verified in the ORB-SLAM2 system, in which it shows its great potential in real-time applications.
We think that the RANSAC in ORB-SLAM2 is quantitative from the method. As experiments show, it can handle a peak of 3000, which does not fully utilize the acquired image information. This paper proposes a new method of global optimization, which can greatly improve the robustness of the system in a complex environment. We are simply using the Homography matrix for experiments at present, and a new contrast structure will be designed on the Homography matrix and the Fundamental matrix to improve the monocular initialization in the future. In addition, we are considering replacing bag of words (BOW) with GMS-RANSAC to make great progress in real time of the SLAM system. Furthermore, mapping needs to be further developed, and building more reliable and dense maps to support navigation is also the core pursuit of the SLAM system.

Author Contributions

D.Z. and J.Z. contributed equally to this manuscript. Conceptualization, D.Z., J.Z. and X.H.; Data curation, J.Z.; Formal analysis, J.Z., F.W. and X.H.; Investigation, X.Y.; Methodology, J.Z., F.W. and X.H.; Project administration, D.Z.; Resources, J.Z. and X.Y.; Software, J.Z.; Supervision, D.Z. and X.Y.; Validation, D.Z., J.Z. and F.W.; Visualization, J.Z. and F.W.; Writing—original draft, J.Z. and X.Y.; Writing—review and editing, J.Z. and F.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (No. 52075152) and the Collaborative Innovation Center of Intelligent Green Manufacturing Technology and Equipment, Shandong (IGSD-2020-006).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The datasets used or analyzed during the current study are available from the corresponding author upon reasonable request.

Acknowledgments

We are grateful to the Hubei University of Technology for creating good experimental conditions for us. We are grateful to the National Natural Science Foundation of China for the grant support (No. 52075152) and the Collaborative Innovation Center of Intelligent Green Manufacturing Technology and Equipment, Shandong (IGSD-2020-006). Co-first author Jinlun Zhu thanks Gao Xiang’s sharing at the technical level, for which he laid the foundation for his visual SLAM study. He also gives thanks for support and encouragement of his parents and teachers. At the same time, he would like to thank the people of Wuhan Relaxation, especially Xu, who accompanied him when he was depressed.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Grisetti, G.; Stachniss, C.; Burgard, W. Improving Grid-based SLAM with Rao-Blackwellized Particle Filters by Adaptive Proposals and Selective Resampling. In Proceedings of the IEEE International Conference on Robotics & Automation, Barcelona, Spain, 18–22 April 2005. [Google Scholar]
  2. Davison, A.J.; Reid, I.D.; Molton, N.D.; Stasse, O. MonoSLAM: Real-time single camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 1052–1067. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  3. Julier, S.J.; Uhlmann, J.K. New extension of the Kalman filter to nonlinear systems. In Signal Processing, Sensor Fusion, and Target Recognition VI; International Society for Optics and Photonics: Bellingham, WA, USA, 1997; Volume 3068, pp. 182–193. [Google Scholar]
  4. Klein, G.; Murray, D. Parallel Tracking and Mapping for Small AR Workspaces. In Proceedings of the IEEE & Acm International Symposium on Mixed & Augmented Reality, Cambridge, UK, 15–18 September 2008. [Google Scholar]
  5. Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Trans. Robot. 2015, 31, 1147–1163. [Google Scholar] [CrossRef] [Green Version]
  6. Opdenbosch, D.V.; Steinbach, E. Collaborative Visual SLAM Using Compressed Feature Exchange. In Proceedings of the International Conference on Robotics and Automation, Pasadena, CA, USA, 19–23 May 2018; pp. 57–64. [Google Scholar]
  7. Bescós, B.; Fácil, J.; Civera, J.; Neira, J. DynSLAM: Tracking, Mapping and Inpainting in Dynamic Scenes. IEEE Robot. Autom. Lett. 2018, 3, 4076–4083. [Google Scholar] [CrossRef] [Green Version]
  8. Kim, T. Feature Detection Based on Significancy of Local Features for Image Matching. IEICE Trans. Inf. Syst. 2021, 104, 1510–1513. [Google Scholar] [CrossRef]
  9. Gao, X.; Zhang, T. Robust RGB-D simultaneous localization and mapping using planar point features. Robot. Auton. Syst. 2015, 72, 1–14. [Google Scholar] [CrossRef]
  10. Bian, J.W.; Wu, Y.H.; Zhao, J.; Liu, Y.; Zhang, L.; Cheng, M.M.; Reid, I. An Evaluation of Feature Matchers for Fundamental Matrix Estimation. In Proceedings of the British Machine Vision Conference, Cardiff, UK, 9–12 September 2019. [Google Scholar]
  11. Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications To Image Analysis and Automated Cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
  12. Bian, J.-W.; Lin, W.-Y.; Liu, Y.; Zhang, L.; Yeung, S.-K.; Cheng, M.-M.; Reid, I. GMS: Grid-Based Motion Statistics for Fast, Ultra-robust Feature Correspondence. Int. J. Comput. Vis. 2019, 128, 1580–1593. [Google Scholar] [CrossRef] [Green Version]
  13. Tang, J.X.; Ericson, L.; Folkesson, J.; Jensfelt, P. GCNv2: Efficient Correspondence Prediction for Real-Time SLAM. IEEE Robot. Autom. Lett. 2019, 4, 3505–3512. [Google Scholar] [CrossRef] [Green Version]
  14. Bian, J.; Lin, W.Y.; Matsushita, Y.; Yeung, S.K.; Cheng, M.M. GMS: Grid-Based Motion Statistics for Fast, Ultra-Robust Feature Correspondence. In Proceedings of the IEEE Conference on Computer Vision & Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  15. Mur-Artal, R.; Tardos, J.D. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. IEEE Trans. Robot. 2017, 33, 1255–1262. [Google Scholar] [CrossRef] [Green Version]
  16. Ma, J.; Zhao, J.; Jiang, J.; Zhou, H.; Guo, X. Locality Preserving Matching. Int. J. Comput. Vis. 2019, 127, 512–531. [Google Scholar] [CrossRef]
  17. Ma, J.; Jiang, J.; Zhou, H.; Zhao, J.; Guo, X. Guided Locality Preserving Feature Matching for Remote Sensing Image Registration. IEEE Trans. Geosci. Remote Sens. 2018, 56, 4435–4447. [Google Scholar] [CrossRef]
  18. Ma, J.; Jiang, X.; Fan, A.; Jiang, J.; Yan, J. Image Matching from Handcrafted to Deep Features: A Survey. Int. J. Comput. Vis. 2020, 129, 23–79. [Google Scholar] [CrossRef]
  19. Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the (IROS) 2012: IEEE/RSJ International Conference on Intelligent Robots and Systems, Algarve, Portugal, 7–12 October 2012. [Google Scholar]
  20. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237. [Google Scholar] [CrossRef] [Green Version]
  21. Chum, O.; Matas, J. Matching with PROSAC—Progressive sample consensus. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Article Metrics

Citations

Article Access Statistics

Multiple requests from the same IP address are counted as one view.