SGRTmreg: A Learning-Based Optimization Framework for Multiple Pairwise Registrations

Point cloud registration is a fundamental task in computer vision and graphics, widely used in 3D reconstruction, object tracking, and atlas reconstruction. Learning-based optimization and deep learning methods have both been widely developed for pairwise registration, each with distinctive advantages. Deep learning methods offer greater flexibility and can register unseen point clouds that were not part of the training data. Learning-based optimization methods exhibit enhanced robustness and stability when handling registration under various perturbations, such as noise, outliers, and occlusions. To leverage the strengths of both approaches and achieve a less time-consuming, robust, and stable registration for multiple instances, we propose a novel computational framework called SGRTmreg for multiple pairwise registrations. The SGRTmreg framework utilizes three components to achieve multi-instance point cloud registration: a Searching scheme, a learning-based optimization method called Graph-based Reweighted Discriminative Optimization (GRDO), and a Transfer module. Given a collection of instances to be matched, a template as a target point cloud, and an instance as a source point cloud, the searching scheme selects one point cloud from the collection that closely resembles the source. GRDO then learns a sequence of regressors by aligning the source to the target, while the transfer module stores and applies the learned regressors to align the selected point cloud to the target and estimate the transformation of the selected point cloud. In short, SGRTmreg harnesses a shared sequence of regressors to register multiple point clouds to a target point cloud. We conduct extensive registration experiments on various datasets to evaluate the proposed framework. The experimental results demonstrate that SGRTmreg achieves multiple pairwise registrations with higher accuracy, robustness, and stability than state-of-the-art deep learning and traditional registration methods.


Introduction
Point cloud registration has been actively studied in computer vision and graphics [1][2][3][4][5][6], and most studies focus on pairwise registration [7]. The primary objective of pairwise registration is to estimate the transformation parameters that align a source point cloud to a target point cloud. However, there is also a multi-instance point cloud registration scenario, in which multiple instances are aligned to a fixed template via multiple pairwise registrations. Multiple pairwise registrations make existing registration methods more time-consuming, especially traditional methods that estimate the Hessian or inverse Hessian matrix. Applying them to point clouds obtained from LiDAR, which vary in perturbations and point density, demands high computational capacity and processing time [8].
Learning-based optimization methods [9][10][11] avoid this cost: they adopt a model- or feature-driven approach to learn regressors from data that mimic gradients, resulting in heightened stability and robustness in registration under various perturbations. However, the current approach restricts the learned regressors to training and testing on an individual model, lacking flexibility and efficiency for multiple pairwise registrations.
Deep learning methods [12][13][14][15][16][17][18][19][20][21] significantly enhance point cloud registration by automatically extracting features and estimating transformations with learned regressors based on point correspondences. Their data-driven nature gives them flexibility, enabling the registration of unseen point clouds. However, this reliance on data can degrade registration performance, particularly under diverse perturbations such as noise, outliers, and occlusions.
To enhance the efficiency, stability, and robustness of multiple pairwise registrations, we introduce SGRTmreg, a new computational framework. Given a collection of point clouds, a source point cloud, and a target point cloud, SGRTmreg achieves registration in three steps: (1) Selecting a point cloud similar to the source from the collection based on graph structure, coordinates, node importance, and normal vectors via a searching scheme. (2) Learning regressors from the source using the Graph-based Reweighted Discriminative Optimization (GRDO) method by registering the source to the target. GRDO encodes features and learns regressors from key points in graph structures, reducing memory storage and computational costs. (3) Using the learned regressors to estimate the transformation from the selected point cloud to the target via a transfer module. Notably, the learned regressors can also be employed to register any other point cloud resembling the selected one.
We demonstrate the potential of SGRTmreg in multiple pairwise registrations on the ModelNet40 dataset and showcase the high performance of GRDO in registration under various perturbations on synthetic datasets, the WHU-TLS dataset [22], and the UWA dataset [23]. Our experimental results exhibit the accuracy and stability of SGRTmreg in multiple pairwise registrations, with GRDO surpassing advanced registration methods in robustness, accuracy, and stability. The contributions of this paper are the following:
• SGRTmreg achieves higher accuracy and robustness in multiple pairwise registrations.
• GRDO outperforms advanced learning-based optimization methods in robustness, stability, and computational/storage efficiency.
• The proposed key point selection method retains more detailed information than common downsampling approaches [24].

Point Cloud Registration
Point cloud registration aligns two point clouds into a common coordinate system. The Iterative Closest Point (ICP) method [25] is widely used to find the optimal rigid transformation by iteratively minimizing the difference between point clouds. Coherent Point Drift (CPD) [26] casts point cloud registration as the matching of Gaussian mixture models, moving the Gaussian mixture model centroids coherently to preserve the topological structure of point clouds. Bayesian Coherent Point Drift (BCPD) [27] replaces the motion coherence theory in CPD with Bayesian inference. Both CPD and BCPD focus on point-to-point distance without considering local surface geometry. LSGCPD [28] incorporates varying levels of point-to-plane penalization alongside point-to-point penalization. TEASER++ [29] leverages estimation theory, geometry, graph theory, and optimization to register point clouds in the presence of large amounts of outlier correspondences. A scale-adaptive ICP method is introduced in [30] for aligning objects differing by rigid transformations (translations, rotations) and uniform scaling. QGORE [31] employs "rotation correspondence" to establish a one-point RANSAC for lower bound estimation and proposes geometric consistency voting for tight upper bound seeking; it is the first quadratic-time guaranteed outlier removal method for point cloud registration. These traditional methods approach point cloud registration as an optimization problem that involves designing objective functions and solving them. Objective functions are typically tailored to registration under specific perturbations, such as noise, outliers, and occlusions. Gradient-based methods are widely employed as solvers; they often require approximations of the Hessian or inverse Hessian matrices, which makes it challenging to solve objective functions with a large number of parameters or high storage requirements.
To avoid calculating gradients, learning-based optimization methods utilize supervised sequential update methods to learn regressors that emulate gradient directions. Ref. [9] uses regressors to update shape parameters based on image features. The Discriminative Optimization method (DO) [10] adopts the least-squares method to learn regressors that map the features of point clouds to updates of the transformation parameters. The Reweighted Discriminative Optimization method (RDO) [11] designs an asymmetrical parameter treatment scheme to learn regressors. While learning-based optimization methods demonstrate robustness and stability in handling registrations with various perturbations, they are unable to register multiple point cloud pairs using the regressors learned from individual point clouds.
The success of deep learning techniques in image processing has been extended to point cloud registration. PointNetLK [16] utilizes the Lucas-Kanade algorithm [32] to estimate the transformation in a global feature space. DCP [17] replaces the Lucas-Kanade algorithm with differentiable singular value decomposition. RPMNet [20] takes point clouds and normals as input to extract features and then estimates point correspondences. RGM [21] transforms point clouds into graphs and calculates correspondences via a graph feature extractor. FMR [18] estimates the transformation by minimizing a feature-metric projection error without seeking correspondences. DeepGMR [19] formulates registration as KL divergence minimization between mixtures of Gaussians. SACF-Net [14] incorporates a novel feature interaction mechanism to enhance pointwise matching by leveraging both low-level geometric and high-level context-aware information. GeoTransformer [33] encodes pair-wise distances and triplet-wise angles to learn geometric features for registration, which ensures invariance to rigid transformations and enhances robustness in low-overlap scenarios. PAnet [34] proposes a point-attention-based multi-scale feature fusion network for partially overlapping point cloud registration. RoReg [35] utilizes oriented descriptors and estimated local rotations throughout the registration pipeline. It introduces a novel oriented descriptor, RoReg-Desc, which is employed for estimating the local rotations. GM-CNet [36] employs a novel transformation-robust point transformer module to adaptively aggregate local features with respect to the structural relations, taking advantage of both handcrafted rotation-invariant features and noise-resilient spatial coordinates to estimate correspondences for full-range partial-to-partial point cloud registration. RIGA [37] develops rotation-invariant and globally-aware descriptors to extract robust correspondences for registration. PointTr [38] employs a learnable geometric position update module and a deeper cross-attention module to automatically learn and capture the geometric structure and features among partial point clouds. The limitations of these methods are twofold: (1) performance drops significantly when they are applied to unseen point clouds with structural differences from the training data; (2) they are vulnerable to perturbations due to their high reliance on data. Nevertheless, deep learning methods provide greater flexibility, enabling training on large amounts of data and testing with any relevant data, which learning-based optimization methods cannot do.
In summary, learning-based optimization methods offer advantages over traditional registration methods by learning regressors directly from data without the need for designing objective functions or calculating gradient matrices. They also exhibit greater robustness than deep learning methods and are less dependent on data size. However, they lack the flexibility of deep learning methods, as the learned regressors serve only to register an individual point cloud pair. Given this, we develop a framework named SGRTmreg for multiple pairwise registrations, building on the core technique of learning-based optimization methods: supervised sequential update methods.

Supervised Sequential Update Methods
Learning-based optimization methods use supervised sequential update methods to learn regressors that mimic gradient directions, avoiding explicit gradient calculations. This is done by learning a sequence of regressors that maps a feature vector to an update vector pointing toward the desired parameters. Here, we provide a brief review of supervised sequential update methods. Dollár et al. [39] propose a cascaded pose regression to compute 2D object poses in images. Cao et al. [40] develop an explicit shape regression method for face alignment by learning a vectorial regression function. Tuzel et al. [41] present a learning-based tracking method combined with object detection, where a linear regression function represents the descent direction. Xiong et al. [9] learn a sequence of regressors to update shape parameters based on image features per iteration. Most supervised sequential update methods focus on image-based tracking and pose estimation. Vongkulbhisal et al. [10,42] propose DO as an extension of the supervised sequential update methods and apply DO to 3D registration. Inspired by DO, Zhao et al. [11] introduce an asymmetrical parameter treatment scheme in the least squares method, and Deng et al. [43] develop a generative optimization method for non-rigid registration.
While these methods offer the advantage of not requiring gradient calculation, their feature extraction time grows with the number of points, making the registration of dense point clouds infeasible. Additionally, they are commonly used for identical point cloud registration, wherein the test point cloud is generated by introducing a specific perturbation to a training point cloud. This restriction follows from the updating criteria of the regressors:
$$x_{t+1} = x_t - D_{t+1} f(x_t).$$
Here, $f : \mathbb{R}^p \to \mathbb{R}^f$ is a function that encodes a feature of a point cloud, and $D_{t+1} \in \mathbb{R}^{p \times f}$ is a regressor that maps the feature $f(x_t)$ to an update vector. $x_{t+1}$ is the updated parameter vector for transformation estimation. For the regressors $D_{t+1}$ learned in the training stage to be usable for estimating the parameter vector $x_{t+1}$ of a test point cloud, the features of the training and test point clouds must be similar or, at the very least, have the same dimensions. Accordingly, we devise a searching scheme to select a point cloud similar to the source, ensuring that the learned regressors can be applied successfully to its registration to the target model.

Methodology
In this section, we denote a collection of point clouds as $P$, a source point cloud as $Q$, and a target point cloud as $M$. SGRTmreg aims to utilize one sequence of regressors $D_{t+1}$ to register two point cloud pairs ($\langle Q, M \rangle$ and $\langle S, M \rangle$), where $S$ is the point cloud selected from $P$ that is most similar to $Q$. Note that if there is another point cloud $S'$ similar to $S$, SGRTmreg can utilize $D_{t+1}$ to register $\langle S', M \rangle$ as well.
The critical steps for SGRTmreg to achieve the registration of multiple point cloud pairs are: (1) Utilizing a searching scheme to select the point cloud $S$ closely resembling the source point cloud $Q$ from the collection $P$. (2) Learning the sequence of regressors $D_{t+1}$ by registering $Q$ and $M$ via the Graph-based Reweighted Discriminative Optimization (GRDO) method. (3) Applying $D_{t+1}$ in a transfer module to estimate the transformation parameters aligning $S$ to $M$, as shown in Figure 1. Specifically, the searching scheme first identifies the similar point cloud $S$ by successively comparing the key points in the source point cloud $Q$ with those in each point cloud in the collection $P$ across four screening stages, considering graph structure, coordinate distribution, node importance, and normal vector information. Then, GRDO learns the sequence of regressors $D_{t+1}$ by aligning $Q$ to $M$ via the feature $f_Q$ extracted from the key points in $Q$. Last, the transfer module estimates the transformation from $S$ to $M$ by applying the learned regressors $D_{t+1}$ to the feature $f_S$ of the key points in $S$.

Key Point Extraction
To reduce the storage requirement for designing features and learning regressors $D_{t+1}$ while cutting computational costs for GRDO, we design a key point extraction approach for downsampling point clouds. Figure 2 shows the process of key point extraction. Given a point cloud, Delaunay triangulation is applied to the top view (xy-view) of the point cloud to form a graph [44], where nodes represent vertices and edges represent connections between nodes. Then, the degrees of all nodes in the graph are counted. The degree of a node is the number of connections that it has to other nodes in the graph. Nodes with higher degrees have more connections, signifying their greater importance. Nodes connected by an edge that is not shared between two triangles are extracted as boundary points. The boundary points, together with the nodes whose degree value is the most or second most frequent, are selected as key points. Figure 3 shows that the proposed key point extraction approach reduces the number of points while preserving detailed model information, in contrast to the random and uniform downsampling methods [24].
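The degree counting and boundary detection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the Delaunay triangulation of the xy-view has already been computed and is supplied as a list of vertex-index triples, and the function name `extract_key_points` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def extract_key_points(triangles):
    """Return the sorted indices of key points for a triangulated graph.

    triangles: iterable of (i, j, k) vertex-index triples, assumed to come
    from a Delaunay triangulation of the xy-view of the point cloud.
    """
    # Count how many triangles use each undirected edge.
    edge_count = Counter()
    for tri in triangles:
        for a, b in combinations(sorted(tri), 2):
            edge_count[(a, b)] += 1

    # Node degree = number of distinct edges incident to the node.
    degree = Counter()
    for a, b in edge_count:
        degree[a] += 1
        degree[b] += 1

    # An edge used by exactly one triangle is not shared, so its end
    # points lie on the boundary.
    boundary = {v for edge, n in edge_count.items() if n == 1 for v in edge}

    # Keep nodes whose degree value is the most or second most frequent.
    frequent = {d for d, _ in Counter(degree.values()).most_common(2)}
    by_degree = {v for v, d in degree.items() if d in frequent}

    return sorted(boundary | by_degree)
```

On a hexagonal fan with one extra triangle attached to the rim, the hub has a rare degree and no boundary edge, so it is the only node that is dropped.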

Searching Scheme
The searching scheme aims to identify the most similar point cloud $S$ from the set $P$ by comparing each point cloud $P_i \in \mathbb{R}^{N_{P_i} \times 3}$ with the source $Q$ through four screening stages:
(1) Measure the graph structure similarity between point cloud pairs $\langle P_i, Q \rangle$ by employing the Hamming distance on their degree lists $\langle Deg_{P_i}, Deg_Q \rangle$.
(2) Measure the similarity of coordinate distribution $\langle Co_{P_i}, Co_Q \rangle$ by clustering the mix of key points in $\langle P_i, Q \rangle$ via the Dirichlet Process Gaussian Mixture Model (DPGMM) [45].
(3) Measure the similarity in the importance of graph nodes $\langle Node_{P_i}, Node_Q \rangle$ using the eigenvector centrality method [46].
(4) Measure the similarity in normal vectors $\langle NorV_{P_i}, NorV_Q \rangle$ in Euclidean space.
The point cloud $P_i$ passing these four screening stages is chosen as the similar point cloud $S$, as shown in Figure 4.

Similarity in Graph Structure
After a point cloud is converted into a graph via the Delaunay triangulation in Section 3.1, the degrees of its nodes are used as the first filter for candidate point clouds.
$Deg_{P_i}$ and $Deg_Q$ are the degree lists of $P_i$ and $Q$, respectively, where each element represents the degree of a node. We sort degrees based on their occurrences and ensure that the length of $Deg_{P_i}$ matches that of $Deg_Q$. If $Deg_{P_i}$ is longer, the degrees with fewer occurrences are removed. If it is shorter, $Deg_{P_i}$ is padded with 0.
The similarity $P_i^{De}$ is computed from $d_H$, the Hamming distance between $Deg_{P_i}$ and $Deg_Q$, that is, the count of differing elements at corresponding positions, and from $L$, the length of $Deg_Q$. $P_i$ enters the second stage as a candidate if the similarity $P_i^{De}$ is larger than $\beta$. The threshold $\beta \in (0.5, 1)$ is set manually.
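The first screening stage can be illustrated with a short sketch. The normalisation $1 - d_H / L$ used below is an assumption made for illustration, since only the ingredients $d_H$, $L$, and $\beta$ are named in the text; the function names, the tie-breaking rule in the sort, and the default `beta=0.8` are likewise hypothetical.

```python
from collections import Counter


def sort_by_occurrence(degrees):
    """Order node degrees so that frequently occurring values come first."""
    freq = Counter(degrees)
    # Ties are broken by the degree value itself (an assumption).
    return sorted(degrees, key=lambda d: (-freq[d], d))


def degree_similarity(deg_p, deg_q):
    """Assumed similarity 1 - d_H / L between two degree lists."""
    p = sort_by_occurrence(deg_p)
    q = sort_by_occurrence(deg_q)
    length = len(q)
    # Trim the least frequent degrees or pad with zeros to match len(q).
    p = p[:length] + [0] * (length - len(p))
    hamming = sum(a != b for a, b in zip(p, q))
    return 1.0 - hamming / length


def passes_first_stage(deg_p, deg_q, beta=0.8):
    """Candidate test with a manually chosen threshold beta in (0.5, 1)."""
    return degree_similarity(deg_p, deg_q) > beta
```

Sorting by occurrence before trimming means that a longer list loses its rarest degree values first, as described above.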

Similarity in Coordinate Distribution
The coordinate distribution reflects the rough shape of a point cloud. The similarity in coordinate distributions $\langle Co_{P_i}, Co_Q \rangle$ is measured by applying DPGMM to cluster the mixture of key points in $\langle P_i, Q \rangle$. Suppose the mixture has been divided into $K$ clusters. The elements of $R_{P_i}$ and $R_Q$ give the proportion of $C^k_{P_i}$ in $Co_{P_i}$ and that of $C^k_Q$ in $Co_Q$, respectively. $N_{P_i}$ and $N_Q$ are the numbers of key points in $P_i$ and $Q$, respectively.
Here, $\delta$ is the Dirac delta function [47]. Equations (5) and (6) illustrate that $C^{\tau}_{P_i}$ and $C^{\tau}_Q$ cluster most of the points in $P_i$ and $Q$. If $\tau_{P_i} = \tau_Q$, it implies that $P_i$ and $Q$ have similar coordinate distributions (as shown in the cluster circled in Figure 4), and $P_i$ moves on to the next round. Please note that if $Co_Q$ is equally divided, $P_i$ also enters the next round as a candidate.
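As a rough illustration of this stage, the sketch below compares the dominant clusters of $P_i$ and $Q$. It assumes that cluster labels for the pooled key points have already been produced by some DPGMM implementation; the function names are hypothetical, and reading "equally divided" as "all clusters hold the same share of $Q$" is an interpretation rather than a statement from the paper.

```python
from collections import Counter


def cluster_proportions(labels, num_clusters):
    """Fraction of the points in `labels` that falls into each cluster."""
    counts = Counter(labels)
    return [counts[k] / len(labels) for k in range(num_clusters)]


def similar_distribution(labels_p, labels_q, num_clusters):
    """Second-stage test: do P_i and Q share their dominant cluster?"""
    r_p = cluster_proportions(labels_p, num_clusters)
    r_q = cluster_proportions(labels_q, num_clusters)
    # If Q is split evenly over the clusters there is no dominant one,
    # so the candidate is kept.
    if max(r_q) == min(r_q):
        return True
    tau_p = r_p.index(max(r_p))
    tau_q = r_q.index(max(r_q))
    return tau_p == tau_q
```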

Similarity in the Importance of Nodes
After sifting out point clouds with shapes similar to the source $Q$, the similarity in internal structure is considered for further screening. The internal structure is revealed through node importance, quantified using the eigenvector centrality method [46]. The eigenvector centrality method evaluates the importance of a node based on how important its neighboring nodes are: the more important the neighbors, the more important the node. Assuming the key points in the source $Q$ have been converted to the graph $G_Q$ with an adjacency matrix $A$, the absolute value of its principal eigenvector serves as the score for all nodes, revealing the eigenvector centrality of the graph $G_Q$ [46]. The eigenvector centrality of $P_i$ can be obtained in the same way. If the average score of all nodes in $P_i$ is closest to that of $Q$, $P_i$ becomes a candidate for the next screening stage. To prevent eliminating the most similar point cloud during this screening, we relax the number of candidates entering the next stage to $\beta'$.
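A compact way to obtain these scores is power iteration on the adjacency matrix. The sketch below is illustrative only: the shift by the identity matrix, the iteration limits, and the function names are assumptions, and `keep` plays the role of $\beta'$.

```python
def eigenvector_centrality(adjacency, iterations=200, tol=1e-10):
    """Approximate the principal eigenvector of a symmetric adjacency matrix.

    Uses power iteration on (A + I), a standard shift that avoids
    oscillation on bipartite graphs; this choice is an assumption of the
    sketch. Scores are normalised to unit Euclidean length.
    """
    n = len(adjacency)
    score = [1.0 / n ** 0.5] * n
    for _ in range(iterations):
        new = [score[i] + sum(adjacency[i][j] * score[j] for j in range(n))
               for i in range(n)]
        norm = sum(v * v for v in new) ** 0.5
        new = [v / norm for v in new]
        if max(abs(a - b) for a, b in zip(new, score)) < tol:
            return new
        score = new
    return score


def mean_centrality(adjacency):
    """Average node score, the quantity compared between P_i and Q."""
    scores = eigenvector_centrality(adjacency)
    return sum(scores) / len(scores)


def closest_candidates(candidates, adjacency_q, keep):
    """Indices of the `keep` candidates whose mean score is closest to Q's."""
    target = mean_centrality(adjacency_q)
    gaps = [abs(mean_centrality(a) - target) for a in candidates]
    return sorted(range(len(candidates)), key=gaps.__getitem__)[:keep]
```

Because the start vector is positive and the graph is connected, the iteration converges to a non-negative vector, which matches taking the absolute value of the principal eigenvector.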

Similarity in Normal Vectors
The similarity in normal vectors is the final criterion for selecting the similar point cloud $S$. $NorV_P = \{n_{P_1}, n_{P_2}, \cdots, n_{P_N}\}$ denotes the normal vectors of the candidate collection, where $N$ is the number of candidates in this round. The Euclidean distance between each normal of $Q$ and $NorV_P$ is calculated, generating a distance matrix $E$ of size $N_Q \times \sum_{i=1}^{N} N_{P_i}$, whose entry $E_{m,n}$ is the Euclidean distance $d_E$ between the corresponding normal vectors. Here, $m$ and $n$ are the indices of the normal vectors of $Q$ and $NorV_P$, respectively.
Matrix $E^c$, of size $N_Q \times \sum_{i=1}^{N} N_{P_i}$, locates the points with the highest similarity of normal vectors. $E_{m,:}$ represents the $m$-th row of $E$.
$E^c_{:,n}$ represents the $n$-th column of $E^c$, and $N^j_P$ counts the number of points with the highest similarity in $P_j$. The $j$-th point cloud with the maximal value of $N^j_P$ is the final selected similar point cloud $S$.
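The voting idea behind $E$, $E^c$, and $N^j_P$ can be sketched as follows. This is a simplified reading under stated assumptions: "highest similarity" is taken to mean the smallest Euclidean distance within each row of $E$, ties go to the first minimum, and the function name is hypothetical.

```python
import math


def select_by_normals(normals_q, candidate_normals):
    """Index of the candidate receiving the most nearest-normal votes.

    normals_q: list of 3-vectors, one per key point of Q.
    candidate_normals: list (one entry per candidate P_j) of lists of
    3-vectors.
    """
    # Pool all candidate normals, remembering which candidate owns each.
    pooled, owner = [], []
    for j, normals in enumerate(candidate_normals):
        pooled.extend(normals)
        owner.extend([j] * len(normals))

    votes = [0] * len(candidate_normals)
    for nq in normals_q:
        # One row of the distance matrix E.
        row = [math.dist(nq, npj) for npj in pooled]
        # The column with the smallest distance gets the vote (row of E^c).
        votes[owner[row.index(min(row))]] += 1

    return votes.index(max(votes)), votes
```

The returned vote list corresponds to the counts $N^j_P$, and the returned index identifies the selected point cloud.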

Sequence of Regressors
Let $f_Q$ be the feature of $Q$ and $D_{t+1} \in \mathbb{R}^{p \times f}$ be a matrix mapping the feature to an update vector. Given an initial parameter vector $x_0 \in \mathbb{R}^p$, the updating process is as follows:
$$x_{t+1} = x_t - D_{t+1} f_Q \quad (11)$$
The update process ends when $x_{t+1}$ converges to a stationary point, and the sequence of regressors $D_{t+1}$, $t = 0, 1, \cdots$, is learned via (12) by driving the estimated parameter vector $x^i_{t+1}$ toward the ground truth $x^i_*$.
In (12), $N$ is the number of point clouds that participate in the training process, $x^i_t$ is the parameter vector of the $i$-th point cloud at the $t$-th iteration, and $W_t \in \mathbb{R}^{p \times p}$ is a diagonal weighting matrix. A detailed explanation of (12) is provided in [11]. For simplicity, we denote $x^i_t$ as $x_t$ for any point cloud.
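To make the train-then-apply pattern concrete, the following toy sketch learns a sequence of scalar regressors on one-dimensional problems. It is not GRDO: it assumes $p = f = 1$, uses an unweighted least-squares fit (that is, $W_t$ is the identity and there is no regularisation), adopts the DO-style update $x_{t+1} = x_t - D_{t+1} f(x_t)$, and replaces the point cloud feature with a hypothetical stand-in `tanh(x - target)`.

```python
import math


def train_update_maps(targets, x0=0.0, steps=5, feature=math.tanh):
    """Learn scalar regressors D_1..D_T on toy problems.

    Each problem i has a ground-truth parameter targets[i]; its feature at
    estimate x is feature(x - targets[i]) (a stand-in for f(x_t)).
    Returns the learned maps and the mean squared error after each step.
    """
    xs = [x0] * len(targets)
    maps, errors = [], []
    errors.append(sum((x - t) ** 2 for x, t in zip(xs, targets)) / len(xs))
    for _ in range(steps):
        feats = [feature(x - t) for x, t in zip(xs, targets)]
        # Least squares: minimise sum_i (t_i - x_i + D * f_i)^2 over D.
        num = sum(f * (x - t) for f, x, t in zip(feats, xs, targets))
        den = sum(f * f for f in feats)
        d = num / den if den > 0 else 0.0
        maps.append(d)
        # Update rule x_{t+1} = x_t - D_{t+1} f(x_t).
        xs = [x - d * f for x, f in zip(xs, feats)]
        errors.append(sum((x - t) ** 2 for x, t in zip(xs, targets)) / len(xs))
    return maps, errors


def apply_update_maps(maps, target, x0=0.0, feature=math.tanh):
    """Run the learned sequence on one problem, as in the test stage."""
    x = x0
    for d in maps:
        x = x - d * feature(x - target)
    return x
```

Since $D = 0$ is always a feasible choice in the least-squares fit, the training error cannot increase from one step to the next, which is the property that makes a learned sequence behave like a descent method.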

Design the Feature f Q
Good registration occurs when the surfaces of two shapes are aligned [10]. To achieve such registration, we design a feature function $h_Q$ to encode the relative position information of key points, making GRDO learn $D_{t+1}$ in the direction that aligns surfaces, as shown in Figure 5. We quantize the space around $M$ into a uniform grid $G$ spanning $[-2, 2]$ in each dimension and denote a grid as $g_j$. Let $n_i$ be the normal vector of the key point $m_i$ in $M$, computed from the local plane fitted to its six neighboring points; let $g^+ = \{g_j : n_i^T (g_j - m_i) > 0\}$ be the set of grids on the 'front' of $m_i$; and let $g^- = \{g_j : n_i^T (g_j - m_i) < 0\}$ contain the remaining grids. We design a sparse matrix $S_p$ (15) to store the relative position information between the uniform grid $G$ and $M$, where $\sigma$ controls the width of the exp function, and $d_M$ is the number of key points in $M$. We introduce a function $F$ that applies a rigid transformation with parameter $x$ to the source point cloud $Q$. $F(Q; x)$ records the transformation of $Q$ per iteration. Then, we count the number of key points in the transformation $F(Q; x)$ that fall into each grid to form a counted vector $c_Q$. The feature $f_Q$ is then calculated from $c_Q$ and $S_p$ via (16). Feature $f_Q$ is employed to learn the sequence of regressors $D_{t+1}$ via (12). The learned regressors $D_{t+1}$ are then employed to estimate the transformation for the pair $\langle S, M \rangle$ in the transfer module.
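The counted vector $c_Q$ amounts to a three-dimensional histogram of the transformed key points over the grid $G$. A minimal sketch is given below; the grid resolution `cells_per_axis`, the flat indexing order, and the decision to ignore points outside the grid are assumptions of this illustration.

```python
def count_points_in_grid(points, cells_per_axis=4, low=-2.0, high=2.0):
    """Histogram of 3-D points over a uniform grid spanning [low, high]^3.

    Returns a flat list of length cells_per_axis**3 (the counted vector);
    points outside the grid are ignored (an assumption of this sketch).
    """
    size = (high - low) / cells_per_axis
    counts = [0] * cells_per_axis ** 3
    for point in points:
        index = 0
        for coord in point:
            if not low <= coord <= high:
                break
            # Clamp so that coord == high falls into the last cell.
            cell = min(int((coord - low) / size), cells_per_axis - 1)
            index = index * cells_per_axis + cell
        else:
            # Reached only when no coordinate was out of range.
            counts[index] += 1
    return counts
```

Because only key points are counted, the cost of recomputing this vector at every iteration depends on the number of key points rather than on the size of the original point cloud.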

Transfer Module
The transfer module shares the learned regressors $D_{t+1}$ with $S$ to estimate the transformation parameter $x_{t+1}$ aligning the pair $\langle S, M \rangle$ via the following formula:
$$x_{t+1} = x_t - D_{t+1} f_S \quad (17)$$
The number of key points in the transformation $F(S; x)$ that fall into each grid forms the vector $c_S$, from which the feature $f_S$ of the selected point cloud is calculated via (18). For clarity, we provide the pseudocode for training GRDO and for parameter estimation in Algorithms 1 and 2. We start by training $D_1$ using the initial data, $W_t$, and $f_Q$ with (12), followed by updating $x_1$ with $D_1$ using (11). At each step, a new parameter vector is created by recursively applying the update rule in (11). The learning process is repeated until a termination criterion is met, for example, until the error no longer decreases appreciably or the maximum number of iterations $T$ is reached. Then, we count the number of key points in the transformation of $S$ falling into each grid to form the vector $c_S$ and utilize the sparse matrix $S_p$ via (15) to obtain the feature $f_S$ according to (18). Finally, the learned sequence of regressors $\{D_t\}_{t=1}^{T}$ and the feature $f_S$ are applied in (17) to estimate the transformation parameter from the selected model $S$ to the target model $M$.

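Algorithm 1 Training a sequence of update maps
Require: training samples with initial parameters $x^i_0$ and ground truth $x^i_*$ ($i = 1, \cdots, N$), $T$
Ensure: $\{D_t\}_{t=1}^{T}$
for $t = 0$ to $T - 1$ do
    Compute $W_t$ according to [11]
    Compute $f_Q$ with (16)
    Compute $D_{t+1}$ with (12)
    for $i = 1$ to $N$ do
        Update $x^i_{t+1}$ with (11)
    end for
end for

Algorithm 2 Parameter estimation
Require: $x_0$, $\{D_t\}_{t=1}^{T}$, $\delta$, $S$
Ensure: $x_T$
Count the number of key points in $S$ falling into each grid to form $c_S$
for $t = 0$ to $T - 1$ do
    Compute $f_S$ with (18)
    Update $x_{t+1}$ with (17)
end for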

Experimentation
This section describes the application of the proposed framework SGRTmreg to the registration of multiple point cloud pairs. Four registration experiments are conducted: (1) A comparison with traditional registration methods (DO [42], RDO [11], BCPD [27], LSGCPD [28], and TEASER++ [29]) on synthetic datasets (http://visionair.ge.imati.cnr/ (accessed on 25 October 2020)) [48] (in Figure 6a,b) to show the accuracy and robustness of GRDO. (2) A comparison with deep learning registration methods (FMR [18], DeepGMR [19], RPMNet [20], and RGM [21]) on the ModelNet40 dataset [49] (in Figure 6c,d), which involves the selection of a similar point cloud and parameter transfer and aims to showcase the efficacy of SGRTmreg in the registration of multiple point cloud pairs. (3) A comparison with traditional and deep learning registration methods on the WHU-TLS (Terrestrial Laser Scanner) dataset [22] (in Figure 6e,f). (4) A comparison with traditional and deep learning registration methods on the range-scan UWA dataset [23] (in Figure 6g,h) to demonstrate the registration capability of GRDO on real-world datasets.

Experimental Design
We normalize each point cloud $P_i$, the target point cloud $M$, and the source point cloud $Q$ to $[-1, 1]^3$. The normalized $Q$ and the normalized $P_i$ are compared to select the similar point cloud $S$ via the searching scheme (Section 3.2). We register $Q$ and $M$ to learn the regressors $D_{t+1}$ in the training process of GRDO. Then, the learned regressors $D_{t+1}$ are utilized to register $S$ and $M$.
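One simple way to carry out such a normalisation is sketched below. The paper does not state how the scaling is performed; centring on the bounding box and scaling uniformly by its largest half-extent is an assumption made here so that the shape is not distorted, and the function name is hypothetical.

```python
def normalize_to_unit_cube(points):
    """Centre a point cloud and scale it uniformly into [-1, 1]^3."""
    dims = range(len(points[0]))
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    centre = [(a + b) / 2 for a, b in zip(lo, hi)]
    # Largest half-extent; fall back to 1.0 for a degenerate cloud.
    half = max((b - a) / 2 for a, b in zip(lo, hi)) or 1.0
    return [[(p[d] - centre[d]) / half for d in dims] for p in points]
```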

GRDO Training
The parameters in the training process are similar to those in DO [42]. Given the source model $Q$ and the target model $M$, we first normalize them to lie in $[-1, 1]$. Then, we apply the following perturbations to the source model $Q$ to generate the training samples: (i) Rotation and Translation: The rotation is within 45° and the translation is in $[-0.3, 0.3]^3$, which represents the ground truth ($x_*$ in (12)). (ii) Noise and Outliers: Gaussian noise with a standard deviation of 0.05 is added to $Q$; 0 to 300 points within $[-1.5, 1.5]^3$ are added as sparse outliers. A Gaussian ball of 0 to 200 points with a standard deviation of 0.1 to 0.25 simulates structured outliers. (iii) Occlusion: We remove 40% to 90% of the points from $Q$ to simulate occlusions [42]. We generate 30,000 training samples and set $x_0$ to $\mathbf{0}_6$ ($N$ = 30,000 and $x_0 = \mathbf{0}_6$ in Equation (12)). Please note that the rotation range in the above settings covers the relative position of the target model $M$ and the source model $Q$.
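The perturbation settings above can be sketched as a small sample generator. This is an illustrative reading rather than the authors' code: the axis-angle sampling of the rotation, the order in which the perturbations are applied, and the function names are assumptions, and the structured (Gaussian ball) outliers are omitted for brevity.

```python
import math
import random


def random_rotation(max_angle_deg, rng):
    """Rotation matrix about a random axis by an angle up to max_angle_deg."""
    axis = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(a * a for a in axis))
    ux, uy, uz = (a / norm for a in axis)
    theta = math.radians(rng.uniform(0.0, max_angle_deg))
    c, s, v = math.cos(theta), math.sin(theta), 1.0 - math.cos(theta)
    # Rodrigues' rotation formula.
    return [
        [c + ux * ux * v, ux * uy * v - uz * s, ux * uz * v + uy * s],
        [uy * ux * v + uz * s, c + uy * uy * v, uy * uz * v - ux * s],
        [uz * ux * v - uy * s, uz * uy * v + ux * s, c + uz * uz * v],
    ]


def make_training_sample(points, rng, noise_std=0.05, max_outliers=300,
                         keep_range=(0.1, 0.6)):
    """Perturb a normalised source model; returns (sample, rotation, shift)."""
    rot = random_rotation(45.0, rng)
    shift = [rng.uniform(-0.3, 0.3) for _ in range(3)]
    # Occlusion: removing 40% to 90% of the points keeps 10% to 60%.
    keep = max(1, round(len(points) * rng.uniform(*keep_range)))
    sample = []
    for p in rng.sample(points, keep):
        q = [sum(rot[r][k] * p[k] for k in range(3)) + shift[r]
             for r in range(3)]
        sample.append([c + rng.gauss(0.0, noise_std) for c in q])
    # Sparse outliers drawn uniformly from [-1.5, 1.5]^3.
    for _ in range(rng.randint(0, max_outliers)):
        sample.append([rng.uniform(-1.5, 1.5) for _ in range(3)])
    return sample, rot, shift
```

The returned rotation and translation serve as the ground truth for the sample, so a training set is obtained by calling `make_training_sample` repeatedly with a seeded `random.Random` instance.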

Evaluation Metrics
The Mean Square Error (MSE), which measures the average squared difference between the coordinates of the registered point cloud and the target point cloud, is used to evaluate the performance of the registration methods. Since DO, RDO, BCPD, LSGCPD, GRDO, and TEASER++ are all implemented in MATLAB 2022b, the computation time in seconds serves as an additional metric for assessing these registration methods.
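For reference, the metric can be written in a few lines. The sketch assumes that the two point clouds are in one-to-one correspondence by index and that the average is taken over all three coordinates of all points; neither detail is stated explicitly above.

```python
def mean_square_error(registered, target):
    """Average squared coordinate difference between corresponding points."""
    if len(registered) != len(target):
        raise ValueError("point clouds must have the same number of points")
    total = sum((a - b) ** 2
                for p, q in zip(registered, target)
                for a, b in zip(p, q))
    # Three coordinates per point.
    return total / (3 * len(registered))
```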

Parameter Settings
For DO and RDO, we set $\sigma^2$ to 0.03. The tolerance on the absolute difference between the current estimate and the ground truth in iterations is $1 \times 10^{-4}$. For BCPD, the expected percentage of outliers is 0.1, the parameter in the Gaussian kernel is 2.0, and the expected length of the displacement vector is 400. For LSGCPD, the expected percentage of outliers is 0.1, and the maximum number of iterations is 30. For TEASER++, Graduated Non-Convexity (GNC) [50] is used to estimate rotation, and the factor for increasing/decreasing the GNC function control parameter is set to 1.4. All deep learning networks are trained on an Nvidia GeForce 2080Ti GPU with 12 GB of memory. The parameter settings for FMR, RGM, DeepGMR, and RPMNet are shown in Table 1. When one parameter is changed, the other parameters are fixed to their default values. We test 750 samples in each variable setting.
Registration on the ModelNet40 dataset. The ModelNet40 dataset contains prealigned shapes from 40 categories, split into 9843 for training and 2468 for testing. We randomly select one instance from the testing sets of two categories (Airplane and Car) as the given source models $Q$. Similar models $S$ are selected from the training sets of these two categories via the proposed searching scheme. Figure 7 shows the selected similar point clouds (green) for the given point clouds (red). The perturbation settings on the ModelNet40 dataset are similar to those on synthetic datasets.
Registration on the WHU-TLS and UWA datasets. The WHU-TLS dataset comprises 115 scans and over 1740 million 3D points collected from 11 different environments with variations in point density, clutter, and occlusion. The perturbation settings on the WHU-TLS dataset are similar to those on synthetic datasets. We uniformly sample almost 8000 points with replacement from the original model to generate the model $Q$. The UWA dataset contains 50 cluttered scenes with five objects taken with the Minolta Vivid 910 scanner in various configurations. All objects are heavily occluded (60% to 90%). From the original model of the object (chef), ∼400 points are sampled using pcdownsample to generate the model $Q$. We also downsample the scene to ∼1000 points to generate the model $M$. We initialize $M$ from 0 to 45 degrees from the ground truth orientation with a random translation within $[-0.3, 0.3]^3$.
Nevertheless, GRDO exhibits shorter computation time compared to DO and RDO. This is because GRDO extracts features from a limited number of key points, so less time is needed to recount the number of key points falling into each grid. In contrast, BCPD needs more computing time. Meanwhile, the TEASER++ algorithm stands out as the most time-efficient method, even when dealing with large rotations. The time advantage of TEASER++ stems from its adoption of GNC for rotation estimation without solving the large-scale semidefinite programming problem. (Second and Third) show that GRDO still takes less computation time than DO and RDO to achieve registration under various noise levels and outliers. (Right) illustrates that all methods require less computation time as the occlusion ratio increases. However, the decline in computation time is particularly noticeable for GRDO, BCPD, and TEASER++.
Tables 2 and 3 present the MSE of the registration results on the Skeleton Hand and Dancing Children models under various perturbations, respectively. We analyze the MSE distribution via two box-plot factors (Maximum and IQR, the Interquartile Range). A smaller maximum value indicates higher registration accuracy, while a smaller IQR signifies greater performance stability. The tables show the minimal maximum registration error in bold and the minimal IQR value in italics. The results highlight that BCPD and GRDO exhibit superior stability compared to other methods. Also, the registration accuracy of GRDO is the highest, especially when handling registration with various noise levels and outliers.

Registration on the ModelNet40 Dataset
Figure 9 shows the comparison with deep learning methods on the ModelNet40 dataset. The top and bottom rows show the registration results on the airplane and car models, respectively. Because RGM requires the point clouds being matched to have the same size, it is unsuitable for registrations involving outliers or occlusions. Therefore, the performances of GRDO, FMR, RPMNet, and DeepGMR are compared. RPMNet and RGM show lower registration accuracy under various rotations. GRDO struggles with accuracy and stability for larger rotations (90° and above), while DeepGMR excels in these scenarios. Additionally, GRDO demonstrates robustness to noise and outliers, outperforming FMR. When dealing with different degrees of occlusion, RPMNet is the least accurate, while GRDO maintains high accuracy and stability.

Registration on the WHU-TLS Dataset
Figure 10 displays the registration results on Campus and Heritage Building under the following perturbations: a rotation of 90°, noise with std = 0.08, 400 outliers, and a missing ratio of 0.60. DeepGMR and GRDO demonstrate higher registration accuracy when the rotation angle is 90°. When the standard deviation of the Gaussian noise is 0.08, DeepGMR, GRDO, RDO, and DO perform better. For registration with outliers, LSGCPD, GRDO, RDO, and DO show superior performance. GRDO consistently maintains high accuracy even when the occlusion ratio reaches 60%.

Registration on the UWA Dataset
Figure 12 shows the registration results on the UWA dataset. Except for DO, RDO, and GRDO, the methods perform unsatisfactorily in registering the model and scene. RDO stands out for its accuracy. In contrast, GRDO performs poorly. GRDO is trained solely on the chef model and has no exposure to the other objects within the scene. It achieves registration using key points from both the chef and scene models. Because the body of the chef model is missing in the scene, the key points extracted from the scene graph differ significantly from those of the chef model, resulting in the poor performance of GRDO.

Key Points Extraction
We conduct experiments on the Campus model to explore the influence of point cloud density on key point extraction. Figure 13 illustrates the point clouds with varying densities obtained through random, uniform, and nonuniform downsampling, along with the extracted key points. As seen in Figure 13, the key points effectively capture the model shape and details (highlighted by red rectangles), except for those extracted after uniform downsampling. This method merges points within the same box, averaging their locations, colors, and normals, which leads to a loss of detailed information.

Additionally, we extract key points from point clouds with varying rotations, noise levels, and sampling rates to explore the robustness and effectiveness of Delaunay triangulation under different perturbations. Figure 14 displays key points extracted via Delaunay triangulation from the Campus model (WHU-TLS dataset) and the Chair model (ModelNet40 dataset) rotated by 0°, 30°, 60°, and 90° along the X, Y, and Z axes. The bold black number indicates the number of key points extracted from a single point cloud, while the black points illustrate differences among key points extracted from rotated and non-rotated point clouds. It can be seen that the number of key points extracted from the point cloud rotated by 90° is nearly half that of the non-rotated point cloud. For symmetric shapes like the Chair model, rotation has less impact on the performance of Delaunay triangulation, and the extracted key points adequately cover both the shape and its details across the various rotation angles. However, for intricate shapes like the Campus model, the extracted key points generally outline the shape but overlook detailed information. As the rotation angle increases, the disparity between key points extracted from rotated and non-rotated point clouds widens, as is evident in the black area in the third and fourth columns.

Figure 15 depicts key points extracted via Delaunay triangulation from the Campus model (WHU-TLS dataset) and the Chair model (ModelNet40 dataset) under various noise levels and sampling rates. The first row displays the extracted key points under Gaussian noise with standard deviations of 0, 0.02, 0.04, and 0.06. The second and third rows show the key points extracted after random sampling and nonuniform sampling, respectively. The sampling rates are 100%, 80%, 60%, and 40%. The bold black number signifies the number of key points extracted from a single point cloud. The preservation of shape and detail highlights the robustness of the key point extraction to variations in noise and sampling.

To further explore the influence of key points extracted by Delaunay triangulation on the final registration, we rotate the Dancing Children model by 30°, 60°, 90°, and 120° along the X, Y, and Z axes, extract the key points, and compare the number of key points and the registration error. The number of key points and their registration errors are shown in Table 4. Please note that the number of key points in this table represents the size of the intersection of the key points of the rotated model and the key points of the original model. The number of key points of the original model is 3269. It can be seen that, regardless of the rotation angle, the number of key points is about 2000, which is greater than half of the number of key points of the original model. To discuss the influence of the number of key points on registration error, we also compare the MSE of registration under varying rotations between the key points and the original model. It can be seen that although the registration accuracy of the original model is higher than that of the rotated model, the gap is small. Additionally, as the rotation angle increases, the registration accuracy decreases.
In summary, combining Figure 14 with Table 4, we can find that although the key points extracted by Delaunay triangulation are rotation-dependent, the shape and most details are maintained, making the gap between the registration error of the key points and that of the original model slight.
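The intersection count reported in Table 4 can be illustrated with a short sketch. It assumes that key points are identified by their indices in the original point cloud, so that the same physical point carries the same index before and after rotation. The paper does not state how this correspondence is established, and the index sets below are made up for illustration.

```python
import numpy as np

def keypoint_overlap(original_idx, rotated_idx):
    """Count key points shared by two index sets and the retained fraction."""
    original = np.unique(original_idx)
    shared = np.intersect1d(original, rotated_idx)
    return int(shared.size), shared.size / original.size

# Hypothetical key point indices (not data from the paper).
original_keys = [0, 2, 3, 5, 8, 9]
rotated_keys = [2, 3, 4, 8, 10]
n_shared, fraction = keypoint_overlap(original_keys, rotated_keys)
print(n_shared, fraction)
```

With the figures quoted in the text, about 2000 shared key points out of 3269 corresponds to a retained fraction of roughly 0.61.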

Searching Scheme
The proposed searching scheme comprises four screening criteria: (1) R_1, similarity in graph structure; (2) R_2, similarity in coordinates; (3) R_3, similarity in the importance of graph nodes; (4) R_4, similarity in normal vectors. To demonstrate that each screening criterion is indispensable, we conduct an ablation study on the MPI Dynamic FAUST [51] and ModelNet40 datasets.
The MPI Dynamic FAUST dataset includes 10 subjects with 14 poses, and each pose contains hundreds of sequences, from which we randomly select one subject and its 14 poses as searching instances, as shown in Figure 16. We select instance 6 (rectangle) as the reference instance Q and try to find its similar instance S among the remaining 13 instances. The ellipses show the differences between these 13 instances and instance 6. Instance 1 and instance 12 are regarded as the target instances because they differ only slightly from Q. Table 5 shows the result of the ablation study on the MPI Dynamic FAUST dataset. The number of candidates represents the number of instances entering the next screening round. The value after "/" represents the candidate number when β′ = 1 and β = 1. The value before "/" shows the candidate number when β′ = 3 and β = 0.80. With all four screening criteria working together, it takes 5.538106 s to find the target instance S (instance 12). Also, we find that if the parameters are set loosely, S will not be easily eliminated.

To further explore the robustness of the searching scheme, we conduct experiments on the selected subject with its 14 poses under varying sampling and noise levels, and the Shape Distributions [52] method is used for comparison. The shape distribution quantitatively describes and compares 3D geometry using geometric characteristics evaluated by a shape function. The D2 shape distribution is renowned for its suitability in model classification and comparison. The Bhattacharyya coefficient is utilized to measure the similarity between shape distributions [53]. Given the reference instance Q and the remaining 13 candidates, we test the robustness of the proposed searching scheme using three cases: (1) Searching for a similar instance S under varying noise levels. The standard deviation of the Gaussian noise is set to 13 random numbers within the range of 0 to 0.3. (2) Searching for a similar instance S under varying sampling rates. The sampling rate is set to 13 random numbers within the range of 0.7 to 1. (3) Searching for a similar instance S under varying sampling rates and noise levels, using the settings of sampling rate and standard deviation mentioned above. The search results are shown in Table 6. The value after "/" represents the index of the selected similar S, and the value before "/" shows the candidates with higher similarity to the given reference instance Q (instance 6). It can be seen that the proposed searching scheme is feasible for handling search tasks under various perturbations and has higher robustness than the Shape Distributions method.

To confirm this conclusion, we randomly select one subject with a single pose (chicken wings) comprising 216 sequences with varying levels of occlusions and outliers from the MPI Dynamic FAUST dataset, as shown in Figure 17. We select instance 01 as the reference instance Q and try to find its similar instance S among the remaining 215 instances. Table 7 shows the result of the ablation study on the MPI Dynamic FAUST dataset with the pose of chicken wings. The candidates entering the second screening round are shown in Figure 18. Black represents the reference instance (instance 01). Instance 56 is the similar instance S selected by the Shape Distributions method. It can be seen that the proposed searching scheme performs better than the Shape Distributions method.

The ModelNet40 dataset contains 40 categories of CAD models, among which we select the "Car" category as the study object. The training set includes 190 instances, and the test set contains 95 instances. We randomly select instance 102 as Q and try to find its target instance S among the remaining 284 instances. Table 8 shows the ablation study results. "\" is used to replace the index value when the number of candidates is large. We find that R_1 can eliminate almost one-half of the instances whose graph structure is far different from that of Q, and R_2 can achieve a similar effect in reducing the number of candidates.
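For readers unfamiliar with the Shape Distributions baseline, the following is a minimal sketch of a D2 descriptor compared with the Bhattacharyya coefficient. It assumes NumPy, samples random point pairs from the point cloud itself rather than from a mesh surface, and normalizes distances by the largest sampled distance. The bin count, pair count, and toy shapes are arbitrary choices for illustration and are not the settings used in the experiments.

```python
import numpy as np

def d2_histogram(points, n_pairs=20000, bins=64, seed=None):
    """Normalized histogram of distances between random point pairs (D2)."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    i = rng.integers(0, len(pts), size=n_pairs)
    j = rng.integers(0, len(pts), size=n_pairs)
    d = np.linalg.norm(pts[i] - pts[j], axis=1)
    d = d / d.max()                      # assumption: scale by max distance
    hist, _ = np.histogram(d, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()

def bhattacharyya_coefficient(p, q):
    """Similarity of two discrete distributions; 1 means identical."""
    return float(np.sum(np.sqrt(p * q)))

# Toy shapes: two samplings of a unit cube and one straight line segment.
rng = np.random.default_rng(0)
cube_a = rng.uniform(size=(2000, 3))
cube_b = rng.uniform(size=(2000, 3))
line = np.column_stack([np.linspace(0.0, 1.0, 2000),
                        np.zeros(2000), np.zeros(2000)])

same = bhattacharyya_coefficient(d2_histogram(cube_a, seed=1),
                                 d2_histogram(cube_b, seed=2))
diff = bhattacharyya_coefficient(d2_histogram(cube_a, seed=1),
                                 d2_histogram(line, seed=3))
print(same > diff)
```

Because D2 summarizes a shape by a single distance histogram, it discards where on the shape each distance occurs, which is a plausible reason for it to be less discriminative than a multi-criteria scheme on closely related poses.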
Figure 18. The candidates entering the second screening round and the final selected similar point clouds (dashed rectangles).

In addition, we test the proposed searching scheme on objects with unseen categories using a mixed dataset comprising the MPI Dynamic FAUST dataset, the ModelNet40 dataset, and the SHREC'20 dataset. The SHREC'20 dataset [54] includes an elastic-stuffed toy rabbit with 11 partial scans and one full scan. For this experiment, we focus on the "Car" category from the ModelNet40 dataset with instance 102 as the reference instance Q, resulting in a total of 310 candidate instances. We set β′ = 5 and β = 0.99. Table 9 shows the ablation study results. Because β′ = 5, five candidates enter R_4, as shown in Figure 19. It can be seen that the proposed searching scheme can locate targets among objects with unseen categories.

These four screening criteria play distinct roles in the searching scheme. R_1 filters out instances whose structure is far different from that of Q. R_2 eliminates instances whose point distributions are dissimilar to that of Q. R_3 screens for instances whose node proximity is similar to that of Q. R_4 selects the most similar instance based on the normal vectors. In summary, the proposed searching scheme follows a coarse-to-fine approach to efficiently search S for Q. Each of the four screening criteria is essential, and they complement one another. Please note that R_2 will discard a similar point cloud S if the rotation angle between S and Q exceeds 75°, because DPGMM clusters point clouds by coordinates in R_2.

Partial Point Cloud Registration
We conduct registration experiments on the MVP dataset [36] under various rotations to evaluate the performance of GRDO on partial point cloud registration. The MVP dataset is a large-scale multi-view partial point cloud dataset comprising over 100,000 high-quality scans, and it provides a training set with 62,400 partial-complete point cloud pairs and a test set with 41,800 pairs. We randomly select six pairs for registration. Notably, GRDO is trained solely on complete models, with the following training parameters: a rotation of 90°, no noise, no outliers, and a missing ratio of 0.4 to 0.9. In the test stage, we use the learned regressors to register the partial point cloud pairs directly. Figure 20 shows that GRDO can register partial point cloud pairs and performs well under varying occlusions, yet struggles with larger rotations.

Different Density Distribution
Given that GRDO extracts key points based on graph structures, we suspect that it is sensitive when matching point clouds with varying densities, especially when the source and target point clouds come from different or noisy density distributions. To investigate this, we conduct registration experiments on the MVP dataset with varying densities. We first rotate the original model by 45°, 90°, and 120° to obtain the ground truth, then add Gaussian noise with a standard deviation of 0.02 to the ground truth, which is downsampled via the random, uniform, and nonuniform sampling methods to create the target models. The training parameters are a rotation of 150°, noise with std = 0.05, no outliers, and a missing ratio of 0. The registration results are shown in Figure 21. As the number of points increases, the registration accuracy improves significantly. Surprisingly, even when one point cloud has four times as many points as the other (2048 vs. 512), GRDO registers them successfully, indicating its resilience to density distribution variations while maintaining high accuracy.

To explore the influence of the number of points on computation time, we downsample the MVP dataset (motorcycle) to 10,000, 5000, 2500, 1000, and 500 points, respectively, and compare the computation time. The computation time is shown in Figure 22. Please note that the computation time is the time for registering the model with 10,000, 5000, 2500, 1000, and 500 points to the model with 10,000 points, respectively. It can be seen that as the number of points decreases, the computation time becomes shorter.
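The target-generation procedure described in this section (rotate, add Gaussian noise, then downsample) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: it uses NumPy, rotates about the Z axis only, covers only the random downsampling variant, and stands in a random point set for an MVP model. All function and variable names are hypothetical.

```python
import numpy as np

def rotation_z(deg):
    """3x3 rotation matrix for a rotation of `deg` degrees about the Z axis."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def make_target(model, deg, noise_std=0.02, n_keep=512, seed=0):
    """Rotate the model, add Gaussian noise, and randomly keep n_keep points."""
    rng = np.random.default_rng(seed)
    ground_truth = model @ rotation_z(deg).T          # rotated, noise-free
    noisy = ground_truth + rng.normal(0.0, noise_std, size=ground_truth.shape)
    keep = rng.choice(len(noisy), size=n_keep, replace=False)
    return ground_truth, noisy[keep]

# Stand-in for a 2048-point model (random points, not MVP data).
model = np.random.default_rng(1).uniform(-1.0, 1.0, size=(2048, 3))
ground_truth, target = make_target(model, deg=90)
print(ground_truth.shape, target.shape)
```

Keeping the noise-free rotated copy separate from the noisy, downsampled target makes it possible to measure registration error against points that the perturbation did not alter.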

Transfer Module
To validate the transfer module in transformation estimation, we conduct a comparative experiment on the ModelNet40 dataset between registration using GRDO_TF (with the module) and GRDO_NTF (without the module). Figure 23 shows the registration results on the Airplane and Car models. GRDO_TF is represented by the solid line with squares, while GRDO_NTF is shown by the solid line with circles. The top row displays the comparison of computation time, and the bottom row shows the log10 MSE. GRDO_NTF generally has shorter computation time, better registration accuracy, and similar robustness and stability compared to GRDO_TF. Despite having lower accuracy than GRDO_NTF, GRDO_TF exhibits high robustness and stability, surpassing most of the comparison methods. Thus, the transfer module is essential and highly effective for learning-based optimization in the registration of multiple point cloud pairs.
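The idea of reusing a stored regressor sequence can be sketched with the standard discriminative optimization update x_{t+1} = x_t − D_{t+1} h(x_t). The sketch below assumes NumPy and uses a toy feature function (the residual to a known optimum) in place of GRDO's graph-based features, so it illustrates only the control flow of applying one learned sequence to several inputs. The names and the toy regressors are hypothetical.

```python
import numpy as np

def apply_regressors(x0, regressors, feature_fn):
    """Apply the update x <- x - D @ h(x) once for each stored regressor D."""
    x = np.asarray(x0, dtype=float).copy()
    for D in regressors:
        x = x - D @ feature_fn(x)
    return x

# Toy setup: the feature is the residual to a known optimum, so a scaled
# identity regressor halves the remaining error at every step.
x_star = np.array([1.0, -2.0, 0.5])

def toy_feature(x):
    return x - x_star

# The sequence is built once and stored, as the transfer module would do.
shared_regressors = [0.5 * np.eye(3) for _ in range(20)]

# The same stored sequence is then reused for different starting points.
for start in (np.zeros(3), np.array([4.0, 4.0, 4.0])):
    estimate = apply_regressors(start, shared_regressors, toy_feature)
    print(np.round(estimate, 4))
```

In this toy case, both runs converge to the same optimum because the features are exact. With real point clouds, how well a transferred sequence works depends on how closely the features of the new pair resemble those seen during training, which is why the searching scheme selects a similar point cloud first.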

Comparison with Learning-Based Methods
The memory requirement is O N (c_1 + c_2) N_M + c_2 N_S + c_3 N_M × N_M for learning D_{t+1} in DO, which largely depends on the number of points [11]. GRDO extracts features from key points, substantially reducing the storage requirement for learning D_{t+1}. Compared to deep learning methods, learning-based optimization approaches (DO, RDO, and GRDO) achieve more stable and robust registration under various perturbations. Owing to their data-driven nature, deep learning methods face challenges in converging to optimal solutions when dealing with perturbations such as noise and outliers. In contrast, model/feature-driven learning-based optimization methods excel in handling such perturbations. Although learning-based optimization methods are not as flexible as deep learning methods, SGRTmreg provides a new perspective for achieving such flexibility. A breakthrough in developing a more general feature could enable learning-based optimization methods to efficiently register point clouds across multiple instances and multiple categories.

Limitations
GRDO outperforms DO and RDO in terms of computation time and storage, but it has limitations in model-to-scene registration. These limitations arise from the key point extraction process, which relies on the graph structure. When matching point clouds with significantly different graph structures, the performance of GRDO diminishes, making it challenging to register partial point cloud pairs with vastly different graph structures, such as large outdoor and indoor scenes. While the proposed SGRTmreg framework can achieve multiple pairwise registrations, it is limited to similar point cloud pairs because the feature extraction method generalizes poorly, which restricts the applicability of the learned regressors.

Conclusions
This paper presents SGRTmreg, a framework for the registration of multiple point cloud pairs, featuring a proficient searching scheme to find similar point clouds, the learning-based optimization algorithm GRDO for registering point cloud pairs, and a transfer module for additional registrations. The searching scheme selects a similar point cloud for a given one from a collection by using four similarity measurements: graph structure, shape, inner structure, and surface direction. Experimental results demonstrate that the searching scheme can select similar point clouds under various perturbations and from mixed datasets. GRDO learns shared regressors from key points of point clouds, enabling faster and more efficient registration. Experimental results show its high robustness and

Figure 1. The framework of SGRTmreg includes three steps: (1) a searching scheme for selecting a similar point cloud S from a collection P for the source model Q by comparing the similarity of P and Q in four screening stages; (2) considering graph structure, coordinate distribution, and the importance of nodes and normal vector information, a learning-based optimization method called GRDO for learning a single sequence of regressors D_{t+1} via the alignment of the source model Q and the target model M; (3) a transfer module for estimating the transformation between ⟨S, M⟩ by transferring the learned regressors D_{t+1} to the selected model S. To reduce storage requirements and computation costs, each step works on the extracted key points.

Figure 2. The process of key point extraction. The pink and orange points are the extracted key points.

Figure 3. The comparison of our key point extraction approach with the random and uniform downsampling methods. This figure shows the zoomed-in model from the XZ and XY perspectives. The dark circles show the differences. The red circles show the main differences in detail.

Figure 4. The structure of the searching scheme, including four screening stages that consider graph structure, coordinate distribution, the importance of nodes, and normal vector information. The dotted rectangle displays the candidate point clouds moving to the next screening stage. The points circled in blue and black are clustered in the same group. Red shows the nodes with the highest importance and pink those with the lowest.

Figure 5. The process of feature extraction.

Figure 7. The search results for the given source models. Red shows the given source point clouds. Green shows the selected similar point clouds.

Figure 8 presents the computation time of traditional methods on synthetic datasets. (Top) and (Bottom) display the log10 computation time on the Skeleton Hand model and the Dancing Children model, respectively. (Left) shows that the computation time of learning-based methods (DO, RDO, and GRDO) increases with the rotation angle.

Figure 8. The log10 computation time of traditional registration methods on synthetic datasets (Top: Skeleton Hand; Bottom: Dancing Children). Each X-axis represents varying degrees of perturbation: rotation angles, standard deviation of noise, number of outliers, and occlusion rate.

Figure 10. The registration results on the WHU-TLS dataset (Top: Campus; Bottom: Heritage Building). Each column shows the registration results under a specific perturbation, while each row displays the registration results of different methods.

Figure 11 displays registration results on the WHU-TLS dataset under different perturbations, with the top for Campus and the bottom for Heritage Building. Red indicates the log MSE of deep learning methods, and blue represents that of DO, RDO, and GRDO. Green shows that of BCPD, LSGCPD, and TEASER++. DeepGMR performs well for registration under larger rotations (over 90°). DeepGMR, GRDO, and FMR demonstrate higher accuracy in achieving registration under varying degrees of noise. Traditional methods, notably DO, RDO, and GRDO, outperform deep learning methods in handling registration under outliers and occlusions.

Figure 11. The log10 MSE of registration results on the WHU-TLS dataset (Top: Campus; Bottom: Heritage Building). Red signifies deep learning methods, blue represents learning-based optimization methods, and green indicates traditional registration methods. Each X-axis represents varying degrees of perturbation: rotation angles, standard deviation of noise, number of outliers, and occlusion rate. For ease of comparison, key comparisons are marked by arrows and numeric values.

Figure 12. The registration results on the UWA dataset.

Figure 13. Campus model with different densities and the extracted key points. The value in the bottom-right corner represents the number of points.

Figure 15. The key points extracted via Delaunay triangulation from the Campus model (WHU-TLS dataset) and the Chair model (ModelNet40 dataset) under various noise levels and sampling rates.

Figure 16. The searching instances in the MPI Dynamic FAUST dataset.

Figure 17. The sequence of the poses of chicken wings.

Figure 19. The instances from the mixed dataset entering the final screening.

Figure 20. Registration on the MVP dataset with various rotation angles.

Figure 21. Registration on the MVP dataset with various density distributions.

Figure 22. The computation time on the motorcycle model with different sizes.

Figure 23. The comparison of GRDO with the transfer module (GRDO_TF) and GRDO without the transfer module (GRDO_NTF) under different perturbation settings on the ModelNet40 dataset (A: Airplane; C: Car). The top row displays the comparison of computation time. The bottom row shows the log10 MSE. Each X-axis represents varying degrees of perturbation, namely rotation angles, standard deviation of noise, number of outliers, and occlusion rate.

Table 1. Parameter settings for deep learning methods.

Table 3. The registration results on the Dancing Children model (10^−4).

Table 4. The number of key points (NP) and their registration error (MSE).

Table 5. Result of ablation study of searching scheme on the MPI Dynamic FAUST dataset.

Table 6. The search results for the MPI Dynamic FAUST dataset under varying perturbations.

Table 7. Result of ablation study on the MPI Dynamic FAUST dataset with the pose of chicken wings.

Table 8. Result of ablation study of searching scheme on the ModelNet40 (car) dataset.

Table 9. Result of ablation study of searching scheme on the mixture of datasets.