GMS-RANSAC: A Fast Algorithm for Removing Mismatches Based on ORB-SLAM2

: This paper presents a new method of removing mismatches of redundant points based on oriented fast and rotated brief (ORB) in vision simultaneous localization and mapping (SLAM) systems. On the one hand, the grid-based motion statistics (GMS) algorithm reduces the processing time of key frames with more feature points and greatly increases the robustness of the original algorithm in a complex environment. On the other hand, aiming at the situation that the GMS algorithm is prone to false matching when there are few symmetry feature point pairs, the random sample consensus (RANSAC) algorithm is used to optimize and correct it. Experiments show that the method we propose has an average error correction rate of 28.81% for individual GMS while the time consumed at the same accuracy threshold is reduced by 72.18% on average. At the same time, we compared it to locality preserving matching (LPM) and progressive sample consensus (PROSAC), and it performed the best. Finally, we integrated GMS-RANSAC into the ORB-SLAM2 system for monocular initialization, which results in a signiﬁcant improvement.


Introduction
With the proposal and popularization of intelligence in robotics, this also accelerates the development of simultaneous localization and mapping (SLAM), which plays an important role in applications such as service robots, VR, 3D reconstruction and autonomous navigation platforms. Depending on the kinds of sensors, we usually divide SLAM into two main categories. One is based on a laser. The distance and angle of markings in adjacent keyframes is used to estimated position and motion with Rao-Blackwellized Particle Filter in the robot coordinate system [1]. The other category is vision SLAM, which can be carried out by using just a monocular camera. With good robustness and accurate recognition in position, these cameras which provide rich environmental information are also cheap.
MonoSLAM [2] is the first real-time monocular vision SLAM system, in which the Extended Kalman Filter (EKF) [3] is used as the back-end to track the sparse feature points at the front-end. In 2007, Klein et al. proposed Parallel Tracking and Mapping (PTAM) [4] that realized the parallelization of the tracking and drawing process and used nonlinear optimization instead of conventional filters at the back-end. ORB-SLAM [5] was put forward in 2015. It has good versatility for supporting monocular, stereo and RGB-D. In the traditional feature point method of the SLAM process, feature matching is a critical step in visual SLAM [6], which fixes an issue with data association in SLAM. This means the correspondence was determined between a road sign that is currently seen and before it was seen. By accurately matching the description subs between the image and the image or the map [7], we can reduce a lot of burden for subsequent attitude estimation, optimization and other operations. However, due to the local characteristics of image features, mismatches are widespread, and this has not been effectively solved for a long time [8]. At present, it has become a bottleneck in visual SLAM that restricts performance improvement [9].
As Figure 1 shows, researchers usually estimate the Fundamental matrix and Homography matrix [10], and then choose the one which has smaller reprojection error to perform the exercise estimation. We prefer to use random sample consensus (RANSAC) [11] techniques when there may be mismatches. To meet the corresponding accuracy requirements when processing large amounts of data, the number of iterations must be increased, which greatly increases the time it takes to match images [12]. The response time has a great impact on real-time performance [13], and its reduction will sacrifice the accuracy of the SLAM system. Therefore, it is difficult to improve the response time and accuracy at the same time when the amount of data is large.
Symmetry 2022, 14, x FOR PEER REVIEW 2 of 13 it was seen. By accurately matching the description subs between the image and the image or the map [7], we can reduce a lot of burden for subsequent attitude estimation, optimization and other operations. However, due to the local characteristics of image features, mismatches are widespread, and this has not been effectively solved for a long time [8]. At present, it has become a bottleneck in visual SLAM that restricts performance improvement [9]. As Figure 1 shows, researchers usually estimate the Fundamental matrix and Homography matrix [10], and then choose the one which has smaller reprojection error to perform the exercise estimation. We prefer to use random sample consensus (RANSAC) [11] techniques when there may be mismatches. To meet the corresponding accuracy requirements when processing large amounts of data, the number of iterations must be increased, which greatly increases the time it takes to match images [12]. The response time has a great impact on real-time performance [13], and its reduction will sacrifice the accuracy of the SLAM system. Therefore, it is difficult to improve the response time and accuracy at the same time when the amount of data is large. This paper presents a new method in which grid-based motion statistics (GMS) [14] and RANSAC techniques are introduced to eliminate the incorrect matches. The GMS algorithm reduces the processing time of key frames with more feature points and greatly increases the robustness of the original algorithm in a complex environment. On the other hand, aiming at the situation that the GMS algorithm is prone to false matching when there are few feature points, the proposed algorithm can correct it. By experimental comparison, GMS-RANSAC improved the accuracy of GMS by about 28.81% and saved RANSAC about 72.18% of the time by setting the RANSAC threshold instead of setting the number of iterations.  This paper presents a new method in which grid-based motion statistics (GMS) [14] and RANSAC techniques are introduced to eliminate the incorrect matches. The GMS algorithm reduces the processing time of key frames with more feature points and greatly increases the robustness of the original algorithm in a complex environment. On the other hand, aiming at the situation that the GMS algorithm is prone to false matching when there are few feature points, the proposed algorithm can correct it. By experimental comparison, GMS-RANSAC improved the accuracy of GMS by about 28.81% and saved RANSAC about 72.18% of the time by setting the RANSAC threshold instead of setting the number of iterations.
Finally, we projected the method to create a new initializer based on the ORB-SLAM2 [15] system for monocular initialization, in which the number of the RANSAC model is decreased significantly. Figure 2 presents the initialization process for new design. The number of models is rapidly controlled within a corresponding range by GMS and canceling the number of iterations makes optimization global to achieve a better model, which results in a significant improvement. it was seen. By accurately matching the description subs between the image and the image or the map [7], we can reduce a lot of burden for subsequent attitude estimation, optimization and other operations. However, due to the local characteristics of image features, mismatches are widespread, and this has not been effectively solved for a long time [8]. At present, it has become a bottleneck in visual SLAM that restricts performance improvement [9]. As Figure 1 shows, researchers usually estimate the Fundamental matrix and Homography matrix [10], and then choose the one which has smaller reprojection error to perform the exercise estimation. We prefer to use random sample consensus (RANSAC) [11] techniques when there may be mismatches. To meet the corresponding accuracy requirements when processing large amounts of data, the number of iterations must be increased, which greatly increases the time it takes to match images [12]. The response time has a great impact on real-time performance [13], and its reduction will sacrifice the accuracy of the SLAM system. Therefore, it is difficult to improve the response time and accuracy at the same time when the amount of data is large. This paper presents a new method in which grid-based motion statistics (GMS) [14] and RANSAC techniques are introduced to eliminate the incorrect matches. The GMS algorithm reduces the processing time of key frames with more feature points and greatly increases the robustness of the original algorithm in a complex environment. On the other hand, aiming at the situation that the GMS algorithm is prone to false matching when there are few feature points, the proposed algorithm can correct it. By experimental comparison, GMS-RANSAC improved the accuracy of GMS by about 28.81% and saved RANSAC about 72.18% of the time by setting the RANSAC threshold instead of setting the number of iterations.   The remainder of the paper is organized as follows: Section 2 describes related work. In Section 3, we present our algorithm. The results are evaluated in Section 4. Section 5 concludes the paper. The source code of the demo is at https://github.com/fycie/demo.git, accessed on 10 March 2022.

RANSAC
RANSAC is used to randomly find a few points from all observations to fit the model and calculate the residuals of the model and all observation data in turn. All points will be judged as intrinsic points and be counted when they are less than a given threshold. On the contrary, a point could be judged as an outer point. Then, a few points are randomly selected to fit the model iteration again. If the number of fit inner points is larger than the previous model, the old is iterated into the new one.
The traditional RANSAC algorithm needs to set an upper bound of iteration value and error threshold, calculate the transformation matrix H 1 through four symmetry pairs of sample data, and record it as a data detection model M 1 which is used to measure all of the points and calculate the reprojection error. When the error is less than the set threshold, this point is classified as an inner point and added to the set of inner points. If the current set of inner points is more than the optimal, the current set is updated to the optimal set of inner points. Then, we update the number of iterations n. The process ends when the current is greater than the set number of iterations, otherwise the above steps are repeated and iterations are set to n + 1. It is expressed as: If a maximum overlap is not set in advance generation value, the iterations will continue in this process. The optimal solution obtained at the end is determined by the limitation of the number of iterations, and the upper limit of the number of iterations is strongly consistent with the probability of obtaining the best model, which means the greater the upper limit of the number of iterations, the greater the probability of obtaining the best model. Although the iterations are increased at the same time, it slows down the algorithm. In the ORB-SLAM2 system, RANSAC rudely iterates a smaller number of times to ensure that the response time meets the real-time requirements. The proposed method is not to limit the number of iterations, but to set the threshold to ensure the accuracy of the results. No matter how large the amount of data is, in our method the GMS algorithm can quickly filter data to prepare for the RANSAC algorithm. Figure 3 shows the example of RANSAC algorithm.

GMS
Combining smoothness constraints into feature matching is known to enable super robust matching. However, it was complex and time-consuming for us to succeed in this process in the past. GMS proposes a simple method to illustrate motion smoothness as a number of matching statistical possibilities in an area, which makes high-quantity

GMS
Combining smoothness constraints into feature matching is known to enable super robust matching. However, it was complex and time-consuming for us to succeed in this process in the past. GMS proposes a simple method to illustrate motion smoothness as a number of matching statistical possibilities in an area, which makes high-quantity matching with displacement have a high match quality. This provides a real-time, robust matching system. Evaluation of a video with low texture, blur and wide baselines (a pair of images that can be thought of as rotating displacements) shows that GMS exceeds other real-time matching methods and achieves the same effect as more complex, slower matching methods.
The key idea of GMS technology is to combine motion smoothing constraints (adjacent pixels of images from different viewpoints share similar motion) [12] into a statistical framework to reject wrong matching. The motion smoothing assumption means that the real matching neighborhood view is the same 3D region. Formally, the number of similar neighbors is used to identify good correspondences. The flaw in the model is that when there are few points within each grid, confidence of neighborhoods is low, which may lead to a large number of error results. By fusing RANSAC, the proposed approach is more robust than GMS when faced with a dataset containing errors.

LPM
Locality preserving matching (LPM) [16] is to maintain the local neighborhood structures of those potential true matches. It derives a closed-form solution with linearithmic time and linear space complexities in which the problem was formulated into a mathematical model [17]. It utilizes a more relaxed geometric constraint, yet it achieves surprisingly excellent performance and becomes the most popular performer considering the time cost [18]. Compared with LPM, our method obtains better performance in accuracy and runtime. Figure 4 shows the workflow of the proposed method. Specifically, the main workflow is as follows:

Workflow of the Method
(1) Firstly, the ORB feature points of two images are extracted, respectively; (2) Then, the ORB features are matched by Hamming distance of descriptor; (3) Thirdly, the results of the previous step are roughly screened by GMS algorithm, which makes the number of matches greatly reduced; (4) Finally, the outliers are further removed by setting the random sampling consistency of the threshold.
As shown in Figure 4, first, the same number of ORB feature points was created to match two images I = {I 1 , I 2 }. We recorded their horizontal and vertical coordinates as kp i , so that KP i = {kp 1 , kp 2 , . . . , kp n }. Secondly, by calculating the Hamming distance of ORB features, we found the corresponding matching points through I 2 in the order in which the feature points were stored in KP 1 and then recorded their serial numbers in the KP 2 . m i is the number of KP 2 , which is matched to kp i in KP 1 . If there were no points matched, m i = −1. After that, we would obtain M 1 = {m 1 , m 2 , . . . , m n } and then We used Equation (2) that is GMS to set the value of m i which did not meet the requirements to −1.
According to the idea of GMS algorithm, the image was divided to be matched into G × G grids, and we set G to 20 in this paper. M 2 = {m 1 , m 2 , . . . , m n } was obtained through GMS. Then, we sorted out the data. When m i = −1, we set As a result, P g = {p 1 , p 2 , . . . , p n }, which was prepared for RANSAC. After normalizing the points, we used RANSAC algorithm to calculate the Homography matrix, and set the threshold of the algorithm to 5. According to the same form before, the outer points were eliminated by re-projection error. Finally, we obtained P g+r = {p 1 , p 2 , . . . , p n }, which are excellent matches. (3) Thirdly, the results of the previous step are roughly screened by GMS algorithm, which makes the number of matches greatly reduced; (4) Finally, the outliers are further removed by setting the random sampling consistency of the threshold. As shown in Figure 4, first, the same number of ORB feature points was created to match two images We used Equation (2) that is GMS to set the value of i m which did not meet the requirements to −1.
According to the idea of GMS algorithm, the image was divided to be matched into G G  grids, and we set G to 20 in this paper.   Then, GMS was compared with GMS-RANSAC at different numbers of ORB feature points. Figure 6 shows that GMS-RANSAC becomes more effective with the number of points increasing, and the great gap from 1000 to 2000 is the weaknesses of the GMS. The trend of optimization also reflects changes in GMS accuracy. We processed the data according to the threshold value of 5. The highest point occurs when the number of ORB Then, GMS was compared with GMS-RANSAC at different numbers of ORB feature points. Figure 6 shows that GMS-RANSAC becomes more effective with the number of points increasing, and the great gap from 1000 to 2000 is the weaknesses of the GMS. The trend of optimization also reflects changes in GMS accuracy. We processed the data according to the threshold value of 5. The highest point occurs when the number of ORB feature points is 5000, and the accuracy is 59.76%. Each of feature points were matched for record, and then set as Q i = {X 1 , X 2 , . . . , X n }. The maximum and minimum values in Q i were removed for analysis.

GMS Mismatch Correction
The average improvement efficiency A is about 54.24%. Then, GMS was compared with GMS-RANSAC at different numbers of ORB feature points. Figure 6 shows that GMS-RANSAC becomes more effective with the number of points increasing, and the great gap from 1000 to 2000 is the weaknesses of the GMS. The trend of optimization also reflects changes in GMS accuracy. We processed the data according to the threshold value of 5. The highest point occurs when the number of ORB feature points is 5000, and the accuracy is 59.76%. Each of feature points were matched for record, and then set as

The Acceleration of RANSAC
GMS for continuity-based neighborhood search matching can quickly reduce a large number of coarse matches. Time of different matching methods was measured by taking same number of ORB feature points from the same two pictures, and numbers of fine matches were recorded.
As Figure 7 shows, RANSAC takes more time with ORB feature points increasing at the same RANSAC threshold. For example, after rough matching, outliers are removed by the RANSAC algorithm. It takes 64.23 ms to reach the standard algorithm with a threshold of 5 based on 1000 ORB features, but the method proposed only takes 1.86 ms in total. GMS consumed 1.18 ms and other times are RANSAC In addition, through the calculation from 1000 to 10,000 feature points at 1000 intervals, the average time consumed was reduced by 94% in parallel to RANSAC. Because GMS rapidly reduces the number of matches, the number of RANSAC reaching the 5 threshold is greatly reduced. This is the reason why the time required by RANSAC algorithm was reduced.
by the RANSAC algorithm. It takes 64.23 ms to reach the standard algorithm with a threshold of 5 based on 1000 ORB features, but the method proposed only takes 1.86 ms in total. GMS consumed 1.18 ms and other times are RANSAC In addition, through the calculation from 1000 to 10,000 feature points at 1000 intervals, the average time consumed was reduced by 94% in parallel to RANSAC. Because GMS rapidly reduces the number of matches, the number of RANSAC reaching the 5 threshold is greatly reduced. This is the reason why the time required by RANSAC algorithm was reduced.

Experiments
The experiments were conducted on a 4-threaded, 4-RAM virtual machine ubuntu16.04 installed by VMware Workstation Pro under intel i5-8400 CPU, 16G RAM, 256G SSD desktops and C++ code.

GMS-RANSAC for Correspondence Selection
In order to evaluate GMS-RANSAC performance comprehensively, we experimented with different local features and different number of points. Two frames

Experiments
The experiments were conducted on a 4-threaded, 4-RAM virtual machine ubuntu16.04 installed by VMware Workstation Pro under intel i5-8400 CPU, 16 G RAM, 256 G SSD desktops and C++ code.

GMS-RANSAC for Correspondence Selection
In order to evaluate GMS-RANSAC performance comprehensively, we experimented with different local features and different number of points. Two frames were selected from a range of datasets for experimental comparison to simulate the operation of scenarios.

Datasets and Metrics
It consists of 2 datasets, including TUM [19] and KITTI [20], which provide indoor scenes and street views. We calculated the ratio by counting the time spent by the algorithm and the number of good matches obtained in Section 3. Figure  were selected from a range of datasets for experimental comparison to simulate the operation of scenarios.

Datasets and Metrics
It consists of 2 datasets, including TUM [19] and KITTI [20], which provide indoor scenes and street views. We calculated the ratio by counting the time spent by the algorithm and the number of good matches obtained in Section 3. Figure 8

Experimental Results
It can be seen from Figure 9a that the matching numbers of RANSAC are more than GMS and GMS-RANSAC. As the number of ORB points increases, all three methods gradually increase, and the gap between RANSAC and the other two methods increases. GMS-RANSAC is slightly higher than GMS around 8000 points of ORB features on KITTI

Experimental Results
It can be seen from Figure 9a that the matching numbers of RANSAC are more than GMS and GMS-RANSAC. As the number of ORB points increases, all three methods gradually increase, and the gap between RANSAC and the other two methods increases. GMS-RANSAC is slightly higher than GMS around 8000 points of ORB features on KITTI in Figure 9b. Figure 9c shows the results on TUM dataset in which the total number filtered by the method is lower than that of the other two datasets.

Experimental Results
It can be seen from Figure 9a that the matching numbers of RANSAC are more than GMS and GMS-RANSAC. As the number of ORB points increases, all three methods gradually increase, and the gap between RANSAC and the other two methods increases. GMS-RANSAC is slightly higher than GMS around 8000 points of ORB features on KITTI in Figure 9b. Figure 9c shows the results on TUM dataset in which the total number filtered by the method is lower than that of the other two datasets.   (4) and (5), we summarize the correction rate of the above GMS-RANSAC compared to the GMS. The data show that the number of matches obtained after GMS processing is greater   (4) and (5), we summarize the correction rate of the above GMS-RANSAC compared to the GMS. The data show that the number of matches obtained after GMS processing is greater than RANSAC. However, when further filtered with the same threshold RANSAC, the resulting number of fine matches is not much different from the original.
As can be seen in Figure 10a, the cost time of three methods increases with the increasing number of ORB features on TUM desk. All three algorithms reach their maximum values at 10,000 ORB features. It can be seen in Figure 10b that there are several inflection points in RANSAC on KITTI. The extremum of RANSAC is 32.735 ms at 3000, and the time consumed by the three algorithms is relatively close when ORB feature points are 1000-2000 (Figure 10c). The gap becomes increasing gradually with the number of ORB features rising.
Symmetry 2022, 14, x FOR PEER REVIEW 9 of 13 than RANSAC. However, when further filtered with the same threshold RANSAC, the resulting number of fine matches is not much different from the original. As can be seen in Figure 10a, the cost time of three methods increases with the increasing number of ORB features on TUM desk. All three algorithms reach their maximum values at 10,000 ORB features. It can be seen in Figure 10b that there are several inflection points in RANSAC on KITTI. The extremum of RANSAC is 32.735 ms at 3000, and the time consumed by the three algorithms is relatively close when ORB feature points are 1000-2000 (Figure 10c). The gap becomes increasing gradually with the number of ORB features rising. Figure 10a-c present that as the ORB feature points increase, the time consumed by RANSAC increases linearly. In contrast, the time required for GMS stabilized at 1ms-2ms.GMS-RANSAC, where the growth of GMS-RANSAC is stabilizing.
As shown in Table 1, the method we propose has an average error correction rate of 28.81% for individual GMS. On the other hand, there is a reduction of about 72.18% in Figure 10a-c present that as the ORB feature points increase, the time consumed by RANSAC increases linearly. In contrast, the time required for GMS stabilized at 1-2ms. GMS-RANSAC, where the growth of GMS-RANSAC is stabilizing.
As shown in Table 1, the method we propose has an average error correction rate of 28.81% for individual GMS. On the other hand, there is a reduction of about 72.18% in time for individual RANSAC. Overall, it is a significant improvement in feature matching. The effect of GMS-RANSAC is most pronounced on the TUM dataset, and the maximum of correction rate between GMS and GMS-RANSAC is 30.94%. The time reduction maximum between GMS and GMS-RANSAC is 84.54%. It plays an important role in both ways.  Table 2 reports the experimental results for PROSAC [21], LPM and GMS-RASAC. Here, we mainly compare algorithms in terms of %Precision, %Recall and time cost after RANSAC-based outlier removal. All the above results demonstrate that GMS-RANSAC can show better performance with LPM using the same feature correspondences as input. Although PROSAC is outstanding, it is less stable at runtime with different number of points and consumes more time.

The New Initializer Based on ORB-SLAM2
To save computing resources, we inserted GMS-RANSAC in SearchForInitialization. According to its original practice,vmatched12[i] at the outer point was set to −1 to mark it. Furthermore, the number of iterations of the original group was replaced by the threshold method, which was integrated into the initialization of ORB-SLAM2. The number of iterations and the parallax angle of the initialization decision were changed to keep system stability. We set the number of iterations of the original system from 200 to 300 and adjusted the parallax angle limit from 0.36 to 0.2 to test the initializer.

Datasets and Metrics
Because the monocular initialization in KITTI datasets was easy, we chose TUM dataset freiburg1_xyz and freiburg1_desk as the datasets for this experiment, and this was beneficial for us to identify the ability of initializers. In each test, we measured (1) how Symmetry 2022, 14, 849 10 of 12 many 3D points were triangulated; (2) how much time the algorithm cost. In addition, we calculated the ratio of the number of points to the initialization time to evaluate the efficiency of the initializers. Table 3 shows the results, in which RANSAC is in parallel to GMS-RANSAC based on the original initializer. The results show that the proposed initializer can significantly improve the initialization efficiency and make the 3D map denser. This is of great significance for the ORB-SLAM2 system, especially when they meet the complex texture environment. The proposed method can greatly increase the robustness of the system.  Table 4 presents that the original initializer tends to be stable in the number of triangulated map points while our proposed method increases steadily. Although the time will increase significantly with the increase of 3D points after 3000 ORB points, the rate is always higher than the original. Figure 11 presents the experimental scenarios under different datasets.  Table 4 presents that the original initializer tends to be stable in the number of triangulated map points while our proposed method increases steadily. Although the time will increase significantly with the increase of 3D points after 3000 ORB points, the rate is always higher than the original. Figure 11 presents the experimental scenarios under different datasets.

Conclusions
This paper proposes a fast correspondence selection algorithm which combines GMS and RANSAC. Consequently, it can make up for the shortcomings of the original two algorithms used alone and make them more robust, especially when the ORB features are too many or too little. Moreover, experiments show that the method we propose has an average error correction rate of 28.81% for individual GMS while the time consumed at the same accuracy threshold is reduced by 72.18% on average. This was also verified in the ORB-SLAM2 system, in which it shows its great potential in real-time applications.
We think that the RANSAC in ORB-SLAM2 is quantitative from the method. As experiments show, it can handle a peak of 3000, which does not fully utilize the acquired image information. This paper proposes a new method of global optimization, which can greatly improve the robustness of the system in a complex environment. We are simply using the Homography matrix for experiments at present, and a new contrast structure

Conclusions
This paper proposes a fast correspondence selection algorithm which combines GMS and RANSAC. Consequently, it can make up for the shortcomings of the original two algorithms used alone and make them more robust, especially when the ORB features are too many or too little. Moreover, experiments show that the method we propose has an average error correction rate of 28.81% for individual GMS while the time consumed at the same accuracy threshold is reduced by 72.18% on average. This was also verified in the ORB-SLAM2 system, in which it shows its great potential in real-time applications.
We think that the RANSAC in ORB-SLAM2 is quantitative from the method. As experiments show, it can handle a peak of 3000, which does not fully utilize the acquired image information. This paper proposes a new method of global optimization, which can greatly improve the robustness of the system in a complex environment. We are simply using the Homography matrix for experiments at present, and a new contrast structure will be designed on the Homography matrix and the Fundamental matrix to improve the monocular initialization in the future. In addition, we are considering replacing bag of words (BOW) with GMS-RANSAC to make great progress in real time of the SLAM system. Furthermore, mapping needs to be further developed, and building more reliable and dense maps to support navigation is also the core pursuit of the SLAM system. Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: The datasets used or analyzed during the current study are available from the corresponding author upon reasonable request.