A Termination Criterion for Probabilistic Point Clouds Registration

Probabilistic Point Clouds Registration (PPCR) is an algorithm that, in its multi-iteration version, outperformed state-of-the-art algorithms for local point clouds registration. However, its performance was tested using a fixed, high number of iterations. To be of practical use, we believe the algorithm should decide by itself when to stop, to avoid an excessive number of iterations and, therefore, wasted computational time. In this work, we compare different termination criteria on several datasets and show that the chosen one produces results comparable to those obtained using a very high number of iterations, while saving computational time.


Introduction
Point clouds registration is the problem of finding the transformation (usually a rigid one) that best aligns two point clouds, commonly called the source and target point clouds.
One of the first approaches to this problem, and still one of the most used, is Iterative Closest Point (ICP) [1][2][3], which aligns two point clouds by minimising the sum of distances between corresponding points, where corresponding points are nearest neighbours. Probabilistic Point Clouds Registration (PPCR) [4] is a variant of ICP that uses a probabilistic model to improve robustness against noise and outliers, one of the most relevant problems of local registration algorithms. Much like ICP, it is an iterative algorithm that repeatedly tries to improve a solution until a convergence criterion is satisfied.
The experiments show that it outperformed most state-of-the-art local registration algorithms. However, those experiments were performed using a large, fixed number of iterations as stopping criterion. Instead, we think that, to be of practical use, an iterative algorithm should autonomously decide when to stop. Indeed, using a fixed number of iterations, on one hand, does not guarantee that the best solution has been found; on the other, it could waste computational time, because the solution may have been found earlier.
For these reasons, we propose an improvement of PPCR, analysing different termination criteria and selecting the best one. Moreover, we demonstrate that the chosen criterion is as effective as using a very high number of iterations but, at the same time, requires fewer iterations and, therefore, less computational time.

Related Work
Point clouds registration algorithms can be divided into two categories: local and global. Local registration algorithms (also known as fine registration) aim at finding the rototranslation that best aligns two point clouds that are already roughly aligned. Therefore, they refine a pre-existing alignment, which can be obtained in different ways: with another algorithm, with an inertial system, or manually.
One of the most important algorithms in this category is Iterative Closest Point (ICP). ICP was developed independently by Besl and McKay [1], Chen and Medioni [2], and Zhang [3] and is still one of the most used techniques. The most critical problem a registration algorithm has to solve is data association, that is, associating one point in a point cloud with one or more points in the other. ICP solves this issue by associating each point in the source point cloud with the closest point in the target. The best transformation resulting from this data association is found, and the process is repeated until convergence.
Many different variants of ICP have been proposed. Usually, they aim at speeding up the algorithm or at improving its accuracy [5]. One of the most important of these variants is Generalized ICP (G-ICP) [6], which greatly improves the quality of the results by using a probabilistic framework with a point-to-plane data association.
Probabilistic Point Clouds Registration (PPCR) [4] uses the same closest-point based data association of ICP, in conjunction with a probabilistic model, to improve both the accuracy and, most importantly, the robustness against noise and outliers. While it was originally developed to deal with the problem of aligning a sparse point cloud with a dense one, it has been shown to also perform very well on traditional registration problems.
Another important technique used for local point clouds registration is the Normal Distribution Transform (NDT) [7]. This technique was originally developed to register 2D laser scans but has been successfully applied to 3D point clouds as well [8]. Differently from ICP, it does not establish any explicit correspondence between points. Instead, the source point cloud or laser scan is subdivided into cells, and a normal distribution is assigned to each cell, so that the points are represented by a probability distribution. The matching problem is then solved as a maximization problem, using Newton's algorithm.
The second category of point clouds registration algorithms aims at global registration, that is, aligning two point clouds without any prior assumption on their misplacement. Traditionally, this problem has been solved using feature-based techniques, such as PFH [9] and its faster variant FPFH [10], or angular-invariant features [11]. Usually, the matches found are used to estimate the rototranslation between the two point clouds with algorithms such as RANSAC [12]. As an alternative to hand-crafted descriptors, solutions based on neural networks, which aim at enhancing the discriminative capacity of the features, have been proposed; examples are 3DMatch [13] and 3DSmoothNet [14]. Networks that combine the feature matching and transformation estimation steps, such as PointNetLK [15] and PCRNet [16], have been proposed too.
The drawback of global registration approaches is that they usually cannot provide an accurate alignment, mainly because of the high number of spurious matches; therefore, they are rather used to obtain a coarse registration that is later refined with a fine registration algorithm [17]. For this reason, techniques aimed at estimating a rototranslation from matches with a high number of outliers have been proposed. Notable examples are Fast Global Registration [18] and TEASER++ [19], which can even work without any feature, using an all-to-all association strategy. While these approaches are a great improvement over traditional feature-based techniques, they have not yet been shown to outperform the best local registration algorithms.

Probabilistic Point Clouds Registration
We already presented PPCR in a previous work [4]; however, since we present an extension of the original version, we briefly summarize how it works.
PPCR is a closest-point based algorithm for local point clouds registration. This means that it is aimed at fine-aligning two point clouds that are already roughly aligned. It does not use any feature to estimate correspondences between the two point clouds; instead, similarly to ICP, it approximates the true, unknown, correspondences using a data association policy based on the closest distance. However, our data association policy differs from that of ICP (and of many of its variants): in ICP, each point in the source point cloud is associated with a single point in the target point cloud, while PPCR associates a point in the source point cloud with a set of points in the target cloud. Moreover, the associations are weighted: the weights represent the probability of an association being the right one for a particular point.
The two data association methods are depicted in fig. 1. For each point x_j in the source point cloud, we look for the n nearest points y_0, ..., y_n in the target cloud. For each of these points y_k, with 0 ≤ k ≤ n, we define an error term given by

‖y_k − (R x_j + T)‖²    (1)

Equation 1 represents the squared error between the point y_k in the target point cloud and the associated point x_j from the source point cloud, transformed using the current estimate of the rototranslation.
The set of error terms, calculated according to Equation 1, forms an optimization problem which is solved with a suitable method (such as Levenberg-Marquardt). However, given the set of points associated with x_j, not all the corresponding error terms should have the same weight. Intuitively, we want to give more importance to the associations that are in accordance with the current estimate of the transformation and less importance to the others. Thus, the weight of the error term ‖y_k − (R x_j + T)‖² is given by Equation 2, where the proportionality implies a normalization among all the error terms associated with x_j, so that their weights represent a probability distribution. Equation 2 is derived from the EM algorithm, with an additive Gaussian noise model [4]. The Gaussian in Equation 2 is appropriate assuming that there are no outliers and that all points in the source point cloud have a corresponding point in the target point cloud. However, a t-distribution is a better choice in the presence of outliers, especially when there is a lot of distortion in one of the point clouds, which thus cannot be aligned perfectly. Consequently, we decided to use a more robust formulation of the weights, based on the t-distribution. A t-distribution is very similar to a Gaussian, but its tails carry more probability mass; therefore, the probability of observing outliers is higher than with a Gaussian.
where ν is the number of degrees of freedom of the t-distribution and d is the dimension of the error terms (in our case 3, since we operate with points in 3D space). In eq. (3) we need an estimate of the rotation and translation; however, these are estimated by solving the optimization problem whose error terms are weighted with the very weights we want to calculate. Hence, our problem cannot be formulated as a simple least-squares problem but has to be reformulated as an Expectation Maximization problem. During the Expectation phase, the latent variables, in our case the weights, are estimated using the previous iteration's estimate of the target variables (the rotation and translation); during the Maximization phase, the problem becomes a least-squares optimization problem, with the latent variables fixed to the values estimated during the Expectation phase.
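As an illustration of the Expectation step, the weighting can be sketched in a few lines. This is a minimal sketch, not the reference implementation: it does not reproduce the exact form of eq. (3) but assumes the standard EM weight of a multivariate t-distribution, w_k ∝ (ν + d)/(ν + r_k²), with r_k the residual norm of eq. (1); the function and variable names are ours.

```python
import numpy as np

def t_weights(x_j, neighbors, R, T, nu=5.0, d=3):
    """Weights of the error terms associated with one source point x_j,
    assuming the standard EM weight of a multivariate t-distribution:
    w_k proportional to (nu + d) / (nu + r_k^2), normalized over the set."""
    # Residuals of eq. (1): y_k - (R x_j + T), one row per neighbor
    residuals = neighbors - (R @ x_j + T)
    sq_norms = np.sum(residuals ** 2, axis=1)
    w = (nu + d) / (nu + sq_norms)
    return w / w.sum()  # normalization: the weights form a distribution
```

Note how an association with a large residual still receives a non-zero weight, but a small one: this is the soft outlier filtering the heavier tails of the t-distribution provide.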
The proposed approach, in its multi-iteration version, is composed of two nested loops. The inner one finds the best rototranslation that minimizes the sum of weighted squared errors (as in eq. (1)), very similarly to ICP. However, differently from ICP, our problem cannot be solved in closed form; thus, we use an iterative algorithm such as Levenberg-Marquardt. Notice that at each iteration of Levenberg-Marquardt, the associations are not estimated again, but their weights are recalculated. Thus, we solve an iteratively reweighted least-squares problem. In the outer loop, we move the source cloud with the result of the optimization, re-estimate the associations, and build a new optimization problem. This structure has already been briefly described in our previous work. However, here we present a novel way to decide when the outer loop should stop, instead of using a predefined number of iterations.

Termination Criteria
If the source and the target point clouds are very close, a single iteration of the proposed probabilistic point clouds registration algorithm may be enough. However, in a typical real scenario, more iterations are necessary. For the algorithm to converge, most of the correspondences used to form the optimization problem need to be right. Since we use a data association policy based on Euclidean distance, this happens only if the two point clouds are close enough. In our algorithm, two parameters control which and how many points in the target point cloud are associated with a particular point in the source point cloud: the maximum distance between neighbors and the maximum number of neighbors. Setting these parameters to very high values could help the algorithm converge to a good solution even when the starting poses of the two point clouds are not very close. However, this would allow more outliers, i.e., wrong data associations, into the optimization step. Even though the probabilistic approach can soft-filter outliers, thanks to the probabilistic weighting technique, using too many points leads to a huge optimization problem that is very slow to solve. Usually, a much more practical and faster solution is to use lower values for the maximum distance and the maximum number of neighbors and to run multiple iterations of the probabilistic approach, re-estimating the data associations each time, as is done, for example, in ICP and G-ICP.
With this technique, our approach is composed of two nested loops. The inner one solves an optimization problem using the Levenberg-Marquardt algorithm. The outer one moves the point cloud using the solution found in the inner loop, re-estimates the point correspondences, and builds the corresponding optimization problem. This process is repeated until some convergence criterion is met.
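The two nested loops can be sketched as follows. This is a toy illustration under strong simplifications, not our implementation: it estimates a translation only (the actual algorithm estimates a full rototranslation and reweights the residuals at every inner step), and all names are ours.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial import cKDTree

def register(source, target, n_neighbors=1, max_outer_iters=50, tol=1e-9):
    """Sketch of the nested loops: associate, optimize, move, repeat.
    Translation-only toy version of the outer/inner loop structure."""
    tree = cKDTree(target)
    t = np.zeros(3)  # current translation estimate
    for _ in range(max_outer_iters):
        # Outer loop: re-estimate the closest-point associations.
        _, idx = tree.query(source + t, k=n_neighbors)
        matched = target[idx].reshape(len(source), -1, 3)

        def residuals(p):
            # Error terms as in eq. (1), restricted to a translation.
            return (matched - (source + p)[:, None, :]).ravel()

        # Inner loop: Levenberg-Marquardt on the current associations.
        sol = least_squares(residuals, t, method="lm")
        if np.linalg.norm(sol.x - t) < tol:
            break  # the estimate stopped moving: stop the outer loop
        t = sol.x
    return t
```

The convergence check here is deliberately naive; the criteria discussed in the next section replace it with something more principled.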
The multi-iteration version of our algorithm provides good results compared to other state-of-the-art algorithms [4]. Of course, to be of practical use, such an algorithm would greatly benefit from some kind of automatic termination criterion, meaning that the algorithm could decide by itself when to stop.
The simplest termination criterion is a fixed, predefined number of iterations. This is the technique we used in our previous work. However, this solution is far from optimal, since the number of iterations becomes a parameter of the algorithm. Most importantly, there is no automatic way of estimating this parameter a priori, so this solution is impractical and has to be discarded. Moreover, a fixed value would probably mean using too many iterations in some cases and too few in others, while a very large value would greatly increase the execution time, in many cases without improving the quality of the result.
For these reasons, we evaluated different automatic termination criteria, to find which one works best with PPCR.
Our first choice was to evaluate the Mean Squared Error (MSE) with respect to the previous iteration. We take the source point cloud and apply, separately, the rototranslations estimated during the current iteration of the algorithm and during the previous one. Therefore, we have the same point cloud in two different poses. Since applying a rototranslation, that is, a matrix multiplication, maintains the order of the points, we know that point x_i^t in X^t (the source point cloud aligned with the current estimate) corresponds to point x_i^{t-1} in X^{t-1} (the source point cloud aligned with the previous estimate). Hence, the point correspondences are known and exact. We used eq. (5), where N is the number of points in the point cloud, to calculate the MSE between two iterations.
We stop the algorithm when the MSE drops below a certain relative threshold. By relative, we mean that we are not using a fixed absolute threshold; instead, we stop when, for example, the MSE becomes smaller than a certain fraction of that at the previous iteration, as expressed by eq. (6). This means that we stop the algorithm when it is no longer able to move the source point cloud (or it is moving it by a negligible amount); thus, it has converged. We use a relative threshold, instead of an absolute one, because it is much more flexible and does not have to be tuned for each set of point clouds. However, instead of checking eq. (6) just once, we ensure that the condition holds for several consecutive iterations. In this way, we avoid stopping too early because of a single iteration during which the alignment was not improved, but which could be followed by other successful iterations.
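This criterion can be sketched in a few lines. The sketch is ours, not the reference implementation; the default fraction and patience values below are illustrative placeholders, not the thresholds used in our experiments.

```python
import numpy as np

def iteration_mse(source, pose_prev, pose_curr):
    """MSE of eq. (5) between the source cloud under two successive pose
    estimates; correspondences are exact because the point order is kept."""
    (R0, t0), (R1, t1) = pose_prev, pose_curr
    diff = (source @ R0.T + t0) - (source @ R1.T + t1)
    return float(np.mean(np.sum(diff ** 2, axis=1)))

class RelativeMSECriterion:
    """Stop when the relative condition of eq. (6) holds for `patience`
    consecutive iterations (default values are illustrative only)."""
    def __init__(self, fraction=0.01, patience=10):
        self.fraction, self.patience = fraction, patience
        self.prev_mse, self.streak = None, 0

    def should_stop(self, mse):
        small = self.prev_mse is not None and mse < self.fraction * self.prev_mse
        self.streak = self.streak + 1 if small else 0  # reset on a large move
        self.prev_mse = mse
        return self.streak >= self.patience
```

The `streak` counter implements the requirement that the condition hold for several consecutive iterations before the algorithm is stopped.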
Another option we evaluated is the so-called cost drop. During each outer iteration of the multi-iteration version of PPCR, an optimization problem is solved. Initially, the solution of the problem has a certain cost; the optimizer will, hopefully, reduce this cost to a lower value. The difference between the initial and the final cost is called the cost drop. We used this value by stopping the outer loop when the cost drop of the inner loop falls under a threshold. We want to avoid absolute thresholds, since they need to be specifically tuned for each application. Instead, we express the threshold with respect to the initial cost of the problem: for example, we could stop when the cost drop is less than 1% of the initial cost. This is what we used in our experiments, and it proved to be a good threshold for obtaining accurate registrations. This condition is expressed by eq. (7).
Similarly to the MSE, and for the same reasons, this condition should hold for several consecutive iterations, not just once.
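A minimal sketch of the cost drop criterion, with names of our own choosing, could look as follows; the 1% threshold and 10-iteration patience are the values used in our experiments.

```python
class CostDropCriterion:
    """Stop the outer loop when the relative cost drop of the inner
    optimization stays under `threshold` (eq. (7)) for `patience`
    consecutive iterations."""
    def __init__(self, threshold=0.01, patience=10):
        self.threshold, self.patience = threshold, patience
        self.streak = 0

    def should_stop(self, initial_cost, final_cost):
        # The optimizer already computes the absolute cost drop, so the
        # relative version comes practically for free.
        drop = initial_cost - final_cost
        small = drop < self.threshold * initial_cost
        self.streak = self.streak + 1 if small else 0  # reset on a large drop
        return self.streak >= self.patience
```

Unlike the MSE criterion, no pass over the source point cloud is needed: the two costs are by-products of the inner optimization.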
The third criterion we evaluated is the number of successful iterations of the inner optimization problem. Solving an optimization problem with Levenberg-Marquardt is an iterative process. Each step of this process can be successful, if it managed to reduce the cost of the problem, or unsuccessful otherwise. We wanted to test whether this value could somehow be used as a termination criterion.
To evaluate the effectiveness of a termination criterion, we used the following idea. Suppose we have the ground truth for the source point cloud, i.e., we know the true rototranslation between the reference frames of the source and target point clouds. At the end of each iteration, we obtain an estimate of this rototranslation. Therefore, we can calculate the Mean Squared Error (as in eq. (5)) between our estimate and the ground truth, since they are the same point cloud in different poses. Theoretically, if the algorithm is working properly, this error should decrease across the steps of the outer loop; therefore, the more the iterations, the smaller the difference becomes. In practice, at some point this difference will cease to decrease or, more precisely, will start decreasing by a negligible amount. This is the iteration at which we should stop, since it means that the algorithm has converged to a solution. Note that this does not mean that it has converged to the right solution, but, nevertheless, that is the best solution we can get with the algorithm and the set of parameters we are using.
Ideally, a good termination criterion should behave similarly to the difference w.r.t. the ground truth: it should stop the algorithm at more or less the same iteration at which we would stop if we were using the difference w.r.t. the ground truth (which, of course, in a real problem is unknown).
We evaluated the selected termination criteria on two datasets, to find which one works best. Eventually, we evaluated the best one on other datasets, to ensure that the results could be generalized and were not specific to the data used for the comparison, and that they were as good as those obtained with a fixed, high number of iterations.

Results
In fig. 2 we plot the three termination criteria while aligning two point clouds from the Stanford Bunny dataset [20]. The starting transformation between the two clouds is a rotation of 45° around the vertical axis. On the x-axis we have the iteration number, while on the y-axis we plot: the number of successful steps of the "inner" optimization problem, the initial and final cost of the "inner" optimization problem, the cost drop (i.e., the difference between the two previous values), the MSE w.r.t. the previous iteration, the MSE w.r.t. the ground truth, and the discrete derivatives of the last three quantities. We plot the discrete derivatives because they clearly show when a variable stops changing: when the derivative becomes zero, the value of the variable has stabilized.
We can see that both the cost drop and the MSE w.r.t. the previous iteration have a trend very similar to that of the MSE w.r.t. the ground truth. Most importantly, the three values stabilize at more or less the same iteration. This is particularly obvious if we compare the discrete derivatives: they become almost zero at more or less the same time. Although the MSE w.r.t. the ground truth keeps decreasing for a few iterations after the other two values stabilize, the effect on the quality of the result is negligible. This becomes obvious looking at fig. 3, where we show two point clouds, one aligned using a predefined, very large number of iterations, the other using the cost drop as stopping criterion. We can see that they overlap practically perfectly. The difference between the errors w.r.t. the ground truth of the two alignments is less than one tenth of the resolution of the point clouds and thus can be considered definitely negligible. Other experiments on the same dataset yielded similar results.
Instead, the number of successful steps oscillates a lot and appears not to be correlated with the MSE w.r.t. the ground truth. For this reason, it was discarded.
In figs. 5a and 5b we show the results obtained on the Bremen Dataset [21], to which we applied, respectively, a small rotation (fig. 4b) and a small translation (fig. 4a). In these plots, and in the following ones, we do not show the derivatives for space reasons. We can see that the cost drop stabilizes more or less when the MSE w.r.t. the previous iteration also stabilizes, that is, when the cloud has already been moved to the right solution (further adjustments are negligible compared to the resolution of the point cloud). Considering the results, there seems to be no strong reason to choose the MSE w.r.t. the previous iteration over the cost drop as termination criterion. However, it has to be considered that the MSE has to be calculated specifically after each iteration and is relatively expensive, since the whole source point cloud has to be traversed. This is not a prohibitive operation per se but, on the other hand, the relative cost drop is very fast to compute. Indeed, while solving an optimization problem we already calculate the absolute cost drop, since it is used by the optimization algorithm as the termination criterion of the inner loop. Thus, calculating the relative cost drop requires only a few more operations: it comes practically for free. For this reason, we have chosen the cost drop as termination criterion: it is very fast to compute and is as good as the Mean Squared Error.
We also performed experiments with clouds that the PPCR algorithm was not able to align properly, to discover whether the termination criteria could stop the algorithm early enough to avoid wasting computational time.
As an example, we show the results on two point clouds from the Stanford Bunny dataset, whose initial misalignments are a rotation of 90° around the vertical axis (fig. 6a) and a rotation of 180° around the vertical axis (fig. 6b). In these cases, the cost drop stabilizes much earlier than the MSE w.r.t. the ground truth. This behaviour is actually good, since it appeared only in unsuccessful alignments, during which stopping earlier is an advantage (going further would only waste computational time).
We tested the chosen termination criterion on the same datasets we presented in our previous work [4] and on pairs of point clouds taken from the Stanford Bunny and Bremen datasets. Our goal is to show that the criterion is effective at stopping the algorithm at the right iteration: too late wastes computational time, too early leads to sub-optimal results. For this reason, we did not fine-tune other parameters, since the performance of the algorithm was already shown in our previous work.
To show the effectiveness of our termination criterion, we executed the algorithm twice on each dataset: the first time using a predefined, very large number of iterations, the second using the cost drop to stop the algorithm. As a measure of the quality of the results we used the MSE w.r.t. the ground truth. The results are shown in tables 1 and 2. As can be seen, the results using our criterion are usually comparable to, and sometimes better than, those obtained using many more iterations. This means that it succeeds at stopping the algorithm at the right iteration. In some cases, such as for the Corridor dataset, the results in table 1 are much better than those in table 2. This happens because, sometimes, an excessive number of iterations is not only a waste of time but can also bring the algorithm to converge to a wrong solution, even after the right solution had been reached. This could happen with every algorithm that uses closest-point based associations as a greedy approximation of the (unknown) correspondences.
We also performed experiments on the Comprehensive Benchmark for Point Clouds Registration algorithms [22]: it is composed of several point clouds, produced with different kinds of sensors and in different environments. Moreover, it includes several registration problems, with different initial misalignments and different overlaps between the clouds to align. For these reasons, we think it is particularly suitable for proving that the chosen criterion is at least as good as using a high number of iterations, but more efficient. Since the benchmark is composed of several datasets, we show statistics both for the single datasets and for the whole benchmark. The result is expressed in terms of the median and the 0.75 and 0.95 quantiles of the scaled mean squared error, as described in [22]. In table 3 we compare the medians of the results on the various datasets of the benchmark, using PPCR with a high, fixed number of iterations (100) and using the cost drop as termination criterion (stopping when the cost drop is less than 1% of the initial cost for more than 10 iterations). For the cost drop, in the column named Number of iterations, we show the mean number of iterations required to solve the registration problems. The same results are shown in fig. 7 as histograms. For most sequences, the differences between the results obtained with the two methods are negligible. Indeed, the medians over all the registration problems of all the sequences (that is, the row named total in table 3) are very close.
However, there are notable exceptions. For the box_met and urban05 sequences, the cost drop leads to much better results (a lower median means a better alignment). This is the same behaviour we observed for the Corridor dataset in the previous set of experiments. On the other hand, on the p2at_met and plain sequences the high number of iterations leads to better results. Nevertheless, it has to be considered that, even when the cost drop is not the best termination criterion, its results are still very good. At the same time, the average number of iterations required using the cost drop is 18.55 and, considering the sequences individually, never greater than 31; therefore, there is a great reduction in computational time w.r.t. using 100 iterations. The proposed termination criterion requires two parameters: the percentage of the drop and the number of iterations during which the condition described by eq. (7) should hold. However, the experiments show that using 1% and 10 iterations as thresholds leads to good results on a very large and varied set of registration problems. Therefore, these values should be adequate for most cases and should not require any further fine-tuning.
In table 5 we show the results using 1% and 20 iterations as thresholds. The median result is very close to that obtained using 100 iterations, although the mean number of iterations used is less than 30; therefore, there is a great saving in computational time. However, in our opinion, the difference w.r.t. using 10 iterations as threshold is so small that it is not worth the extra computational time. Nevertheless, it remains an option if a very accurate result is desired.
PPCR using the proposed termination criterion, along with instructions on how to use it, is released on GitHub: https://github.com/iralabdisco/probabilistic_point_clouds_registration.

Conclusions
We introduced the use of the relative cost drop as a termination criterion for the Probabilistic Point Clouds Registration algorithm. We tested this criterion on different datasets and on a comprehensive benchmark for point clouds registration algorithms [22], which is composed of several registration problems with different degrees of overlap and initial misalignment. The experiments prove that the cost drop is effective at stopping the algorithm at the right iteration, that is, when the algorithm has converged to a good solution that cannot be improved substantially any more. Moreover, it stops the algorithm very early on problems that are not going to converge even with more iterations, which is a very desirable behaviour. While it requires two parameters, we propose values that proved effective on a wide range of registration problems.

Funding:
This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.