Partial-to-Partial Point Cloud Registration by Rotation Invariant Features and Spatial Geometric Consistency

Abstract: Point cloud registration is a critical problem in 3D vision tasks, and numerous learning-based point cloud registration methods have been proposed in recent years. However, a common issue with most of these methods is that their feature descriptors are rotation-sensitive, which makes it difficult for them to converge under large rotations. In this paper, we propose a new learning-based pipeline to address this issue, which can also handle partially overlapping 3D point clouds. Specifically, we employ rotation-invariant local features to guide the point matching task, and utilize a cross-attention mechanism to exchange feature information between the two point clouds in order to predict the key points in the overlapping regions. Subsequently, we construct a feature matrix based on the features of the key points to solve for the soft correspondences. Finally, we construct a non-learning correspondence constraint module that exploits the spatial geometric invariance of point clouds under rotation and translation, as well as the compatibility between point pairs, to reject wrong correspondences. To validate our approach, we conduct extensive experiments on ModelNet40. Our approach achieves better performance than other methods, especially in the presence of large rotations.


Introduction
Three-dimensional point cloud registration, which seeks a rigid transformation that aligns a pair of point clouds with unknown point correspondences, is of great significance in robotics and computer vision. It has many important applications in scene reconstruction [1][2][3], localization [4], autonomous driving [5] and so on. The most widely utilized traditional registration method is the iterative closest point (ICP) algorithm [6], which alternates between two steps: solving for the point correspondences and solving for the rigid transformation. However, ICP is sensitive to initialization and often converges to wrong local minima. Some global registration algorithms, i.e., Go-ICP [7] and fast global registration (FGR) [8], have been proposed to overcome the limitations of ICP, but they can easily fail in the case of noise or partially overlapping point clouds.
In recent years, deep learning models have come to dominate the field of computer vision. Point cloud registration algorithms based on deep learning [9][10][11][12][13][14][15] are faster and more robust than traditional algorithms. Roughly, they can be divided into two categories: correspondences-free methods and correspondences-learning methods. Correspondences-free methods [9][10][11] regress the rigid motion parameters by minimizing the difference between the feature maps of the two input point clouds. Although they are robust to noise, most of them can hardly deal with partially overlapping 3D point clouds. The main idea of correspondences-learning methods is to establish correspondences through the high-dimensional features of each point. Examples range from deep closest point (DCP) [12], PRNet [13] and RPMNet [14] to IDAM [15]. However, most of these networks do not explicitly deal with erroneous correspondences, and they often fail at large rotations.
Based on the above discussion, in this paper, we propose a learning-based pipeline for partially overlapping 3D point cloud registration with large rotations. We address the rotation sensitivity of feature descriptors by utilizing rotation-invariant features based on 4D point pair features (PPF) [16]. However, relying solely on high-dimensional rotation-invariant features can lead to overfitting during network training, and the lack of information about the position of each point can result in similar features for points in smooth or symmetric regions, leading to mismatched key points. To make the features contain position information while remaining robust to rotation, we use a two-branch feature extraction strategy for the point clouds, and let the rotation-invariant features guide the global features after positional encoding. However, feature matching always produces a large number of wrong correspondences. While weighting the correspondences is common practice, such weights are closely tied to the matching features and may fail to eliminate incorrect point pairs. To solve this problem, we propose a non-learning correspondence constraint module, which does not rely on point cloud features, but only utilizes geometric invariance under rotation and translation. We leverage the pairwise distance consistency between inlier point pairs to reject wrong correspondences. Finally, the transformation matrix is estimated using a differentiable singular value decomposition (SVD) layer. Extensive experiments demonstrate that the proposed method can effectively eliminate errors on noise-free data, and achieves better performance on noisy point clouds with large rotations than many traditional and deep-learning-based methods.
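The differentiable SVD layer computes the classical closed-form (Kabsch) solution to the least-squares alignment problem. A minimal unweighted NumPy sketch of that final step (the function name is ours, and the network additionally weights correspondences before this solve):

```python
import numpy as np

def estimate_rigid_transform(src, tgt):
    """Closed-form least-squares (R, t) with R @ src_i + t ≈ tgt_i, via SVD."""
    c_src, c_tgt = src.mean(axis=0), tgt.mean(axis=0)
    H = (src - c_src).T @ (tgt - c_tgt)            # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = c_tgt - R @ c_src
    return R, t
```

Given perfect correspondences, this recovers the ground-truth transformation exactly; with soft or noisy correspondences, it returns the least-squares optimum.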

Related Work

Traditional Point Cloud Registration Methods
The most widely utilized traditional local registration method is ICP [6]. It alternates between finding point correspondences between the source and target point clouds and solving a least-squares problem [17]. Although many algorithms [18][19][20] use different strategies to improve the running time and convergence of ICP, ICP and its variants remain sensitive to initialization and easily converge to local minima.
The global registration algorithm random sample consensus (RANSAC) [21] is another important registration algorithm. It usually utilizes the fast point feature histogram (FPFH) [22] or the signature of histograms of orientations (SHOT) [23] to extract features from the point clouds, and randomly selects a fixed number of points in each iteration to compute a rough transformation. Although these methods can effectively remove outliers, they are very time-consuming. FGR [8] utilizes FPFH to describe the features of the point clouds and finds corresponding point pairs in the feature space. Go-ICP [7] utilizes a branch-and-bound scheme to search for the optimal solution in the pose space. Furthermore, 4PCS [24] finds a set of four corresponding points between two point clouds, and then uses the correspondences between these points to calculate the rigid transformation; its advantages are high efficiency and strong robustness. However, most of these methods are very sensitive to noise, and do not work well on partially overlapping 3D point cloud registration.

Correspondences-Free Methods
PointNetLK [9] is the first method to utilize deep learning for 3D point cloud registration. It combines PointNet [25] and the Lucas-Kanade algorithm [26] to register through feature alignment and iterative processing. PCRNet [10] is another global registration network that utilizes PointNet for feature extraction and a multi-layer perceptron (MLP) for regression of the rotation and translation parameters. OMNet [11] learns masks in a coarse-to-fine manner to reject non-overlapping regions; however, it is difficult to accurately estimate the masks without feature information interaction. Although these methods achieve good performance in their own experiments, their performance deteriorates when the point clouds only partially overlap. In contrast, our work belongs to the correspondences-learning methods, which require only a small number of matching correspondences to achieve accurate and effective point cloud registration.

Correspondences-Learning Methods
DCP [12] utilizes the dynamic graph CNN (DGCNN) network [27] to extract local features from point clouds, forming soft correspondences and solving the least-squares problem through an SVD layer. However, it assumes a one-to-one correspondence between the two point clouds. DCP has been extended to PRNet [13], which includes a key point detection module to perform partial-to-partial registration. RPMNet [14] utilizes a differentiable Sinkhorn [28] layer and annealing to obtain soft assignments of point correspondences from hybrid features learned from both spatial coordinates and local geometry. IDAM [15] combines feature and Euclidean information in the correspondence matrix, and utilizes a two-stage learnable point elimination technique for registration. However, these methods depend on the similarity of the feature descriptors of key points, and when only the coordinates are encoded through shared convolution layers, the network cannot converge if the rotation is large and the coordinates of the two clouds differ significantly. In contrast to these methods, we adopt a two-branch feature description strategy that combines position information and rotation-invariant local features to obtain the high-dimensional embedding of the point clouds.

Rotation-Invariant Descriptors
The FPFH descriptor [22] is conventionally generated from geometric properties of local surfaces such as curvature and normal deviation. On the other hand, PPF [16] utilizes Euclidean distances and angles between point vectors and normals to describe each pairwise relation. Although these hand-crafted descriptors are rotation-invariant by design, they remain sensitive to noise. To address this issue, PPFNet [29] represents unorganized point clouds as a combination of points, normals and point pair features to describe local geometric features. In subsequent work [30], FoldingNet [31] is adopted instead of multiple MLPs as the backbone network to learn 3D local descriptors. Nevertheless, all of those methods are constrained by their locality and do not take into account the absolute positions of the points, which may result in a large number of mismatched points with similar local features being utilized as key points. Therefore, in our network, we incorporate the rotation-invariant descriptor as an auxiliary branch.

Method
This section describes the proposed point cloud registration model; the entire network architecture is illustrated in Figure 1. The global features and rotation-invariant features of the two point clouds are extracted through two branches (Section 3.1). By employing the cross-attention mechanism, the features of the point clouds can perceive contextual information from each other, specifically focusing on key points within overlapping regions. Subsequently, a feature matrix is constructed on the features of the key points to solve for the soft correspondences (Section 3.2). Finally, a spatial geometric consistency constraint module (SGC) is utilized to reject the outliers (Section 3.3).

Feature Extraction Network
Global features are extracted using a simplified graph neural network (GNN) architecture. Unlike the approach described in the original paper [27], our network avoids the use of dynamically changing neighborhoods in the graph. This modification prevents the feature information from being propagated differently across different regions, which can interfere with achieving symmetrical point cloud registration [32]. The feature extraction framework is shown in Figure 2. We only construct the graph structure between coordinates, not between features. Specifically, suppose we have a point cloud X, and let N_i be the index set of the K points closest to point x_i in X, which can be obtained by the K-nearest neighbor algorithm (K-NN). Let u_i^(n) be the high-dimensional feature vector of point x_i at the nth layer of the GNN. Then the feature of point x_i in the next layer is computed as:

u_i^(n+1) = f( max_{j ∈ N_i} g( u_i^(n), u_j^(n) − u_i^(n) ) ),

where g is composed of two MLPs with normalization and ReLU activations, f is a single-layer MLP with the same input and output dimensions, which aims to further enhance the feature information, and max is the element-wise max operation over the neighbors.
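A NumPy sketch of one such layer, with the graph built once from coordinates via K-NN and the edge function applied to (center, neighbor − center) pairs as in edge convolution. The single-matrix stand-ins Wg and Wf for the MLPs g and f, and the function names, are our simplifications:

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbors of each point (excluding itself)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def graph_feature_layer(points, feats, k, Wg, Wf):
    """One GNN layer: u_i <- f(max_j g(u_i, u_j - u_i)), neighbors from coordinates."""
    idx = knn_indices(points, k)                      # (N, k), built from coordinates only
    center = np.repeat(feats[:, None, :], k, axis=1)  # (N, k, C)
    edge = feats[idx] - center                        # relative neighbor features
    h = np.maximum(np.concatenate([center, edge], -1) @ Wg, 0.0)  # g: linear + ReLU stand-in
    h = h.max(axis=1)                                 # element-wise max over neighbors
    return h @ Wf                                     # f: single linear layer stand-in
```

Because `idx` is computed from coordinates rather than features, the neighborhood structure stays fixed across layers, matching the simplification described above.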


For the point cloud registration task, it is not enough to capture only the local features of the point cloud. In order to make the features of each point contain information about the whole point cloud, we utilize the self-attention mechanism [33] to update the information of each point. We employ an inner product to assess the correlation between each point and the other points in the point cloud; when two points are more strongly correlated, their feature interaction is more pronounced. This enables us to extend the local neighborhood feature of each point to encompass the global feature of the entire point cloud, resulting in a more comprehensive and accurate feature representation. Through this method, we can also determine the importance weight of each point, which can be employed for feature fusion and selection purposes.
Specifically, as shown in Figure 3a, the input features are projected into a query vector Q_x_sa, a key vector K_x_sa and a value vector V_x_sa through three convolution layers, respectively (Equation (2)). Additionally, the attention-based feature map A_x is obtained as in Equation (3), which measures the degree of correlation between two points. In order to prevent loss of information, we utilize a residual structure to obtain the final features (Equation (4)). Encoding is performed in exactly the same way for point cloud Y.
where W_a_sa, W_b_sa and W_c_sa denote the weights; W_a_sa and W_b_sa are implemented as two-layer one-dimensional convolutional neural networks, and W_c_sa as a four-layer one-dimensional convolutional neural network. α is a learnable weight, which determines the degree of influence between points.

In order to design rotation-invariant features, we utilize PPF [16] as the initial input of the network, and utilize edge convolution [27] and max-pooling to project each local PPF signature to a c-dimensional local geometric description. For a point x_c in the point cloud X, we first define a local neighborhood N(x_c) which contains the points within a distance r ∈ R from it. Each PPF can be defined as:

F(x_c, x_i) = ( ||Δx_c,i||, ∠(n_c, Δx_c,i), ∠(n_i, Δx_c,i), ∠(n_c, n_i) ),

where x_i ∈ N(x_c), Δx_c,i represents the vector between x_c and x_i, and n_c and n_i are the normals of points x_c and x_i. ∠ computes the angle between two vectors v_1 and v_2, which can be defined as:

∠(v_1, v_2) = atan2( ||v_1 × v_2||, v_1 · v_2 ).
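The 4D PPF signature can be sketched directly (a minimal NumPy version; normals are assumed to be unit vectors, and the function names are ours):

```python
import numpy as np

def angle(v1, v2):
    """Angle between two vectors; the atan2 form is stable for near-parallel vectors."""
    return np.arctan2(np.linalg.norm(np.cross(v1, v2)), np.dot(v1, v2))

def ppf(xc, nc, xi, ni):
    """4D point pair feature: distance plus three normal/offset angles."""
    d = xi - xc
    return np.array([np.linalg.norm(d), angle(nc, d), angle(ni, d), angle(nc, ni)])
```

Since the feature is built only from a distance and three relative angles, applying the same rotation to both points and their normals leaves it unchanged, which is the invariance the pipeline relies on.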

Key Points and Soft Matching
In order to reduce computational complexity and identify a small number of highly correlated correspondences, it is necessary to extract a subset of points for matching. However, directly using an MLP to select key points may cause the network to retrieve a large number of points that are not in the overlapping regions. To address this, information exchange between the two point clouds is required prior to sampling.
By leveraging the cross-attention mechanism, the feature information from both point clouds can be exchanged and combined effectively, enabling the identification of key points that are relevant to the overlapping regions. This approach alleviates the issue of fetching unnecessary points and facilitates the selection of a smaller, more relevant set of points for matching. The module structure is illustrated in Figure 3b. In the cross-attention module, the initial embedding consists of the source point cloud features and the target point cloud features. The computation of feature interaction follows the approach outlined in Equations (2)-(4).
where W_a_ca, W_b_ca and W_c_ca denote the weights, and α is a learnable weight. The updated features obtained from the cross-attention module are passed through a fully connected layer with dimensions (64, 64, 1) to compute the matching probability s(i) for each point. This step follows the original network design of IDAM [15].
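Both attention blocks share one computation: queries from one feature set attend to keys and values from another (the same set in the self-attention case). A compact NumPy sketch, where the convolutional weight stacks are reduced to single matrices and the 1/sqrt(d) scaling inside the softmax is our assumption:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_update(fq, fk, Wq, Wk, Wv, alpha):
    """Residual attention update: self-attention when fq is fk, cross-attention otherwise."""
    Q, K, V = fq @ Wq, fk @ Wk, fk @ Wv
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # correlation between points
    return fq + alpha * (A @ V)                   # residual connection, alpha learnable
```

Calling `attention_update(fx, fx, ...)` reproduces the self-attention update of Section 3.1, while `attention_update(fx, fy, ...)` lets the source features attend to the target features as in this cross-attention module.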
To generate the matching probability matrix, we stack the updated features of the key points and include additional features such as the distance between the point clouds and the pointing unit vector between point pairs.This results in an M × M × H matrix, where M represents the number of key points selected from point clouds X and Y, and H denotes the number of stacked channels.
To ensure invariance to the input order, we apply an MLP to the feature vector of each correspondence, which outputs scores. These scores capture the similarity between the corresponding points in the source point cloud X and the target point cloud Y. By applying the Softmax function along each row of the M × M score matrix, we obtain the similarity matrix S. Each element S_ij in this matrix represents the probability that the point x_i and the point y_j are correctly matched. To construct soft correspondences, we select the point pair with the maximum probability in each row of the similarity matrix. This ensures that the most likely matches are identified, allowing for accurate correspondence estimation between the two point clouds.
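The row-wise normalization and hard selection step can be sketched as follows, where `score` stands for the M × M output of the per-correspondence MLP (the function name is ours):

```python
import numpy as np

def soft_correspondences(score):
    """Row-wise softmax of an MxM score matrix, then pick the best match per row."""
    e = np.exp(score - score.max(axis=1, keepdims=True))  # stable softmax
    S = e / e.sum(axis=1, keepdims=True)                  # similarity matrix, rows sum to 1
    j = S.argmax(axis=1)                                  # index of best target point per x_i
    return S, j
```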

Spatial Geometric Consistency Constraint Module
In the soft matching relationship, obtaining the correct corresponding point pair information is a key problem. In this paper, we address this challenge by leveraging the spatial consistency provided by Euclidean transformations to eliminate incorrect correspondences. The fundamental idea is that the spatial geometric properties of a point cloud remain unchanged under rotation and translation, as depicted in Figure 4.
For instance, consider the inlier point pairs (x_1, y_1) and (x_2, y_2) in their respective point clouds. These pairs maintain distance invariance despite the transformation. On the other hand, the point pair (x_3, y_3) is an incorrect correspondence caused by similar features, which prevents it from forming a compatible relationship with other valid inlier correspondences. To establish the correct correspondences, let x_i and y_i be a pair of corresponding points in the source point cloud X and target point cloud Y, respectively, and let x_j, y_j be another pair of corresponding points; then we can define:

d_ij = | ||x_i − x_j|| − ||y_i − y_j|| |.

If both pairs are correct correspondences, the distance between x_i and x_j in point cloud X is consistent with the distance between y_i and y_j in point cloud Y, that is, d_ij = δ, where δ is an acceptable noise error (0 without noise). If one or both of the pairs are mismatched, then d_ij is an irregular random quantity. According to the spatial geometric consistency of the point cloud after rotation and translation, and the mutual compatibility between different point pairs, we can remove the wrong correspondences.
For each point pair i, we count the number M_i of other pairs with which it is compatible, i.e., for which d_ij is within the tolerance δ. The greater M_i is, the more likely the ith point pair is a correct correspondence. Finally, we select a small number of the most consistent corresponding point pairs and input them into the SVD module for the solution.
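A minimal NumPy sketch of this compatibility voting (the function name and the hard top-k selection are our simplifications):

```python
import numpy as np

def sgc_select(x, y, delta, top_k):
    """Score each correspondence (x_i, y_i) by how many pairwise distances it preserves."""
    dx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)  # distances within X
    dy = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)  # distances within Y
    compat = np.abs(dx - dy) <= delta             # d_ij within tolerance delta
    np.fill_diagonal(compat, False)
    score = compat.sum(axis=1)                    # M_i: compatible partners per pair
    return np.argsort(-score)[:top_k]             # keep the most consistent pairs
```

Note that the check needs no features and no learned parameters: a wrong correspondence breaks distance consistency with nearly all inliers and therefore accumulates a low score.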

Loss Functions
The sampling of key points and the correctness of the matching relationship are very important to the quality of point cloud registration, so two loss functions are proposed to supervise these two procedures separately.
Key point loss: This function is utilized to supervise the selection of matching key points. It is difficult to label point pair relationships in a noisy environment, so we utilize the soft match matrix for mutual supervision.
Correspondence loss: This is a standard cross-entropy loss utilized to train the convolution module in the soft correspondence step. The target label for each source point x_i is the index of the point in the target point cloud closest to x_i under the ground-truth transformation, where R* and t* denote the ground-truth rotation and translation. r is a hyperparameter controlling the minimum radius.

Results
In this section, we verify and compare the performance of the proposed method through a large number of experiments, and analyze the experimental results. We compare our model with ICP [6], FGR [8], RANSAC [21], DCP [12], IDAM [15], RPMNet [14], PointNetLK [9] and Predator [34]. We also test the generalization of our model on real data. The parameters of the entire network are optimized using the Adam optimizer. The initial learning rate is 1 × 10^-3, reduced to 1 × 10^-4 after 150 epochs; 250 epochs are trained in total.
Most of our experiments are carried out on the ModelNet40 [35] dataset, which consists of 40 object categories. We utilize 9843 models for training and 2468 models for testing. Following the experimental settings of RPMNet, for a given shape, we randomly sample 1024 points to form a point cloud. We randomly generate three Euler angles within the range of [0°, 45°] or [0°, 90°], and translations within the range of [−0.5, 0.5], for each point cloud. The original point cloud is utilized as the source, and the transformed point cloud is utilized as the target.
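The data generation step can be sketched as follows; the Z-Y-X composition order of the three Euler angles is our assumption, as the text does not specify the convention:

```python
import numpy as np

def random_transform(rng, max_deg=45.0, max_trans=0.5):
    """Random rotation from three Euler angles in [0, max_deg] degrees,
    plus a translation with components in [-max_trans, max_trans]."""
    a, b, c = np.deg2rad(rng.uniform(0.0, max_deg, size=3))
    Rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
    Ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
    Rz = np.array([[np.cos(c), -np.sin(c), 0], [np.sin(c), np.cos(c), 0], [0, 0, 1]])
    R = Rz @ Ry @ Rx                  # Z-Y-X composition (one common convention)
    t = rng.uniform(-max_trans, max_trans, size=3)
    return R, t
```

The target cloud is then `src @ R.T + t`, with the original cloud kept as the source.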
We utilize the same metrics as [12,15] to evaluate the performance of all the methods. For the rotation matrix, we utilize the root mean square error (RMSE(R)) and mean absolute error (MAE(R)). For the translation vectors, we utilize the root mean square error (RMSE(t)) and mean absolute error (MAE(t)). If the overlapping regions of two clouds are exactly the same and the rigid transformation is perfect, all of these error metrics should be zero. All angle measurements in our results are in degrees. Since we utilize Open3D [36] to process the point cloud data, it is important to note that Open3D interprets coordinate values as meters (m) by default; therefore, the translation errors in our results are measured in meters. For ICP, FGR and RANSAC, we utilize the implementations in Intel Open3D [36], where the number of iterations for ICP is 30, and the search radius and the maximum number of neighborhood points of FPFH are 0.2 and 100, respectively. Since our data generation method is almost the same as that of RPMNet, in the [0°, 45°] experiment we directly utilize the pretrained model of RPMNet for testing; the other experimental results are obtained after retraining.
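A sketch of the rotation metrics, computed on Euler angles in degrees as in DCP/IDAM; the Z-Y-X decomposition and the absence of gimbal-lock handling are our simplifications:

```python
import numpy as np

def rotation_errors(R_pred, R_gt):
    """Per-axis Euler-angle RMSE(R) and MAE(R) in degrees."""
    def euler_zyx(R):
        # Z-Y-X Euler angles; assumes the pitch angle is away from +-90 degrees
        return np.degrees(np.array([
            np.arctan2(R[1, 0], R[0, 0]),
            np.arcsin(np.clip(-R[2, 0], -1.0, 1.0)),
            np.arctan2(R[2, 1], R[2, 2])]))
    err = euler_zyx(R_pred) - euler_zyx(R_gt)
    return np.sqrt((err ** 2).mean()), np.abs(err).mean()  # RMSE(R), MAE(R)
```

Because RMSE squares the per-axis errors before averaging, a few badly misregistered test cases inflate RMSE(R) far more than MAE(R), which is the gap analyzed in Section 4.2.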
In this experiment, the source point cloud and the target point cloud are identical and have a one-to-one correspondence; in theory, the two clouds can completely overlap after rotation and translation. The experimental results are shown in Tables 1 and 2. The traditional methods are seriously influenced by the initialization, and for large rotation angles they tend to converge to local minima. The deep-learning-based methods show excellent performance when the rotation angle is within the range of [0°, 45°]. However, many learning-based methods also fail when the rotation angle is too large: the high-dimensional features of matched points produced by shared convolutional layers can differ greatly, which seriously affects the subsequent matching of key points. In contrast, our proposed method leverages rotation-invariant features to guide the matching task, enabling accurate selection of matching points even under large rotations. By enforcing spatial geometric consistency, we achieve an error of less than 10^-4. Qualitative comparisons of the registration results can be found in Figure 5a.

Gaussian Noise
In this experiment, we add Gaussian noise sampled from N(0, 0.01^2) and clipped to [−0.05, 0.05] to both the source and target point clouds. Since there is no longer a one-to-one correspondence between the two point clouds, it is difficult for the network to approximate the ground truth. The experimental results are shown in Tables 3 and 4. As FPFH [22] is sensitive to noise, the errors of traditional methods such as FGR and RANSAC become large. Compared with the correspondences-free method (PointNetLK), the methods based on correspondence matching (DCP, IDAM, RPMNet and Predator) degrade more because of noise. This is because the methods based on global features focus on the features of the whole point cloud, not on the local features of individual points. Compared with the other methods, the proposed method achieves the best performance under larger rotations. Qualitative examples of registration on noisy data can be found in Figure 5b,d.
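The noise model used here is easy to reproduce (function name is ours):

```python
import numpy as np

def add_clipped_noise(points, rng, sigma=0.01, clip=0.05):
    """Add N(0, sigma^2) noise per coordinate, clipped to [-clip, clip]."""
    noise = np.clip(rng.normal(0.0, sigma, size=points.shape), -clip, clip)
    return points + noise
```

Applying independent noise to both clouds destroys exact one-to-one correspondences, which is what makes this setting harder than the noise-free case above.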

Partial Visibility
In order to generate partially overlapping point clouds, we sample a halfspace with a random direction and shift it so that approximately 70% of the points are retained for each point cloud [14]. The experimental results are shown in Tables 5 and 6. It can be seen that the errors of almost all methods become larger, and the learning-based methods hardly converge under large rotations. Although we achieve the best results among these methods, our RMSE(R) is more than four times our MAE(R); we analyze the reasons for this in Section 4.2. Example results on partially visible data are shown in Figure 5c,e.
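The halfspace cropping can be sketched as follows; the quantile-based shift that retains a fixed fraction of points is our reading of the procedure in [14]:

```python
import numpy as np

def crop_halfspace(points, rng, keep=0.7):
    """Keep ~`keep` fraction of points on one side of a randomly oriented plane."""
    n = rng.normal(size=3)
    n /= np.linalg.norm(n)                        # random plane normal
    proj = points @ n                             # signed distance along the normal
    thresh = np.quantile(proj, 1.0 - keep)        # shift plane to retain `keep` fraction
    return points[proj >= thresh]
```

Applying this independently to source and target (with different random normals) yields two clouds whose overlap is only partial, matching the evaluation setting of Tables 5 and 6.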


Key points and Correspondences
In this experiment, in order to verify the validity of the rotation-invariant features, we visualize the point cloud feature maps generated by PointNet, DGCNN and the method utilized in this paper, respectively. We utilize t-SNE [37] to reduce the dimensionality of the high-dimensional features. As shown in Figure 6, we align the rotated target point cloud for better visualization. It can be seen that the feature descriptor we designed is invariant to rotation, whereas DGCNN and PointNet are highly correlated with the input positions and thus very sensitive to rotation. Unlike these methods, we do not rely on the position information of individual points, but utilize the relative geometric information of neighboring points to weaken the interference of the rotation angle.

In order to further observe the roles of feature matching and the spatial consistency constraint, we also visualize the soft correspondences and hard correspondences. As shown in Figure 7, we show the matching of points in three scenarios (clean, jitter and crop). Since all points in X have exact correspondences in Y, the corresponding points match best in the clean scenario, while the crop scenario has the most incorrect correspondences due to partial overlap and noise. Although there are a large number of outliers in the soft correspondences, the SGC module can effectively extract the correct correspondences from them. We also conduct a comparative experiment between the RANSAC method and the spatial geometric consistency constraint module, using the inlier ratio of the correspondences before and after each module as the performance metric. RANSAC is a rejection algorithm based on random sampling that estimates model parameters and rejects incorrect matches, whereas the proposed spatial geometric consistency constraint module imposes spatial and geometric consistency constraints on the input match relationships to improve the inlier ratio. As shown in Table 7, our proposed module outperforms the RANSAC method in most cases and performs better at rejecting incorrect matches.
In Figure 8, we also show cases where the network mismatches the key points due to symmetry interference. The large gap between RMSE(R) and MAE(R) indicates that there are a large number of outliers in the test data. As shown in Figure 8 and Table 8, we visualize the registration results and corresponding relationships of these bad cases. Due to the similar distribution of points in symmetric regions, a large number of mismatched points have very similar features, so there are many abnormal cases in the test data.

Real Data
In this section, we conduct experiments on the Stanford 3D Scan datasets [38] and the odometry KITTI dataset [39] to further evaluate generalizability. For the Stanford 3D Scan datasets, we sample 768 points from each 3D mesh to generate the point clouds. We also voxel-downsample the original KITTI scans to 2000-2500 points. The network parameters in this section are the weights trained on the ModelNet40 dataset without fine-tuning. The partially overlapping point clouds are generated in the manner of PRNet [13]. Some qualitative examples are shown in Figure 9.
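The voxel downsampling applied to the KITTI scans can be approximated with a simple centroid-per-voxel scheme. This is a minimal sketch, not the exact preprocessing used in our experiments; the `voxel_size` value is an assumption to be tuned until roughly 2000-2500 points remain.

```python
import numpy as np

def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Keep one centroid per occupied voxel of edge length voxel_size."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    # Group points by voxel index and average each group.
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    n_voxels = inverse.max() + 1
    sums = np.zeros((n_voxels, 3))
    counts = np.zeros(n_voxels)
    np.add.at(sums, inverse, points)   # accumulate points per voxel
    np.add.at(counts, inverse, 1)      # count points per voxel
    return sums / counts[:, None]

# Usage: shrink a dense scan; enlarge voxel_size to keep fewer points.
rng = np.random.default_rng(0)
cloud = rng.random((100000, 3)) * 50.0
down = voxel_downsample(cloud, voxel_size=2.0)
```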

Ablation Study
In order to demonstrate how each component affects the performance of the network, we conduct an ablation study in which we gradually add and remove different modules to evaluate their contributions to the final matching performance. The experiments are carried out on partially visible point clouds with noise. Tables 9 and 10 report the ablation results under [0, 45°] and [0, 90°], respectively, where SA, CA, PPF and SGC denote self-attention, cross-attention, deep high-dimensional features based on PPF, and the spatial geometric consistency constraint module. A check mark indicates that the module is added to the network. The results show that cross-attention, which combines the information of the two point clouds, is well suited to partially overlapping point clouds, and that the rotation-invariant feature based on PPF is effective under large rotations. In addition, the proposed correspondence module weakens the effect of wrong correspondences and further improves the accuracy of the network.

Discussion
Compared with other methods, our proposed approach performs better in both [0, 45°] and [0, 90°], especially in the presence of large rotations, demonstrating its robustness. We further analyze the experimental results to discuss the advantages and limitations of our method.
In the noise-free experiment, our method achieves results that may seem unrealistically good compared with other methods. This is because the target point cloud is generated by rotating and translating the source point cloud, so in the absence of noise the two point clouds are identical and in one-to-one correspondence. Moreover, only four matched points are needed to recover the correct rigid transformation. In reality, data are rarely in one-to-one correspondence; nevertheless, this experiment reflects the ability of our spatial geometric consistency constraint module to constrain outliers.
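The closed-form recovery of a rigid transform from a handful of exact correspondences can be sketched with the standard SVD-based (Kabsch) solution; this is an illustrative stand-in, not our matching pipeline itself, and the toy rotation and translation below are arbitrary.

```python
import numpy as np

def rigid_from_correspondences(X, Y):
    """Least-squares rigid transform (Kabsch/SVD) with Y ≈ R @ X + t,
    given matched point pairs X (N x 3) and Y (N x 3)."""
    cx, cy = X.mean(axis=0), Y.mean(axis=0)
    H = (X - cx).T @ (Y - cy)                 # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])                # guard against reflections
    R = Vt.T @ D @ U.T
    t = cy - R @ cx
    return R, t

# Noise-free check: four non-degenerate points suffice to recover
# an exact rotation and translation.
rng = np.random.default_rng(0)
X = rng.random((4, 3))
a = np.pi / 3
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
Y = X @ R_true.T + t_true
R_est, t_est = rigid_from_correspondences(X, Y)
ok = np.allclose(R_est, R_true) and np.allclose(t_est, t_true)
```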
In the noisy experiment, although our method is slightly inferior to the RPM network, it performs excellently at large rotation angles thanks to the supplementary rotation-invariant features. To further verify the effectiveness of this module, we conducted ablation experiments. Comparing Tables 9 and 10 shows that introducing the rotation-invariant module yields limited improvement in [0, 45°], but within [0, 90°] it significantly improves the results, reducing the error from 12.07 to 8.34.
To validate the effectiveness of our constraint module, we compared it with RANSAC. The experimental results show that our module improves the inlier ratio of the correspondences significantly more than RANSAC does. We also demonstrated the importance of the constraint module within the network through ablation experiments.
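The inlier-ratio metric used in these comparisons can be computed as sketched below (a minimal illustration; the distance threshold `tau` is an assumed parameter, not a value from our experiments):

```python
import numpy as np

def inlier_ratio(src, tgt, corr, R, t, tau=0.1):
    """Fraction of putative correspondences (i, j) whose source point,
    after applying the ground-truth transform (R, t), lies within tau
    of its matched target point."""
    src_t = src @ R.T + t
    d = np.linalg.norm(src_t[corr[:, 0]] - tgt[corr[:, 1]], axis=1)
    return float(np.mean(d < tau))

# Toy check: identical clouds under the identity transform, with two
# of ten matches deliberately swapped.
src = np.arange(30, dtype=float).reshape(10, 3)
tgt = src.copy()
corr = np.stack([np.arange(10), np.arange(10)], axis=1)
corr[:2, 1] = [1, 0]                   # corrupt two correspondences
ratio = inlier_ratio(src, tgt, corr, np.eye(3), np.zeros(3))
```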
However, some experimental results expose limitations of this method. In the partial-visibility experiments in [0, 90°], RMSE(R) is about 4.5 times MAE(R), which indicates that there are many outliers in the test data. We have visualized the registration results and predicted correspondences of the bad cases. Due to the large amount of symmetric data in the ModelNet40 dataset, there are many non-matching points with similar features in symmetric regions, which seriously affects the final registration results. Table 7 shows that when the point clouds are partially visible, the soft correspondences contain many outliers, resulting in a low inlier ratio of the input correspondences. How to select matching points with distinctive features on indistinguishable surfaces remains a difficult problem.
Secondly, our method cannot converge when the point clouds have low overlap; supervising and training on the overlapping regions may alleviate this problem. Additionally, the proposed network cannot be directly applied to large-scale datasets, because our feature extraction operates on every point rather than extracting features while sampling, as PointNet++ [40] does. In future work, we will focus on combining this work with feature extraction methods such as KPConv [41] or FCGF [42] to process large-scene datasets in an end-to-end manner, and on using the methods proposed in this paper to guide super-point matching and precise registration.

Conclusions
In this paper, we propose a novel network to tackle partially overlapping 3D point cloud registration. In contrast to previous works, we focus on the impact of large rotations on feature matching and on the feature mismatches caused by similar regions. Since large rotations can cause significant differences between the key-point features of two point clouds, we introduce a high-dimensional rotation-invariant feature module in the feature extraction stage to reduce the gap between corresponding point features. In addition to self-attention mechanisms that enhance global point cloud features, we employ a cross-attention mechanism to identify the overlapping regions between the two point clouds. To mitigate the impact of mismatched correspondences, we not only weight each matching point pair based on point cloud features, but also propose a non-learning module that exploits the intrinsic rotation invariance of point clouds and rejects mismatches by constraining their inter-relations. Extensive experiments demonstrate that our proposed method not only achieves superior performance in the presence of large rotations but also effectively improves the proportion of correct correspondences.

Figure 1.
Figure 1. Overview of the network structure.

Figure 2.
Figure 2. Overview of the GNN structure.


Figure 3.
Figure 3. Illustrations of (a) self-attention mechanism and (b) cross-attention mechanism modules.

Figure 4.
Figure 4. Corresponding relations between points. The green lines represent correct correspondences, and the red line represents an erroneous correspondence.

We first create a Euclidean spatial distance matrix $M^d$ of dimension $M \times M$, where $M^d_{ij}$ is the distance value calculated using Equation (10). We then set a distance error limit $\sigma$ and use the relationship between $M^d_{ij}$ and $\sigma$ to binarize $M^d$: if $M^d_{ij} \leq \sigma$, then $M^d_{ij} = 1$, and otherwise $M^d_{ij} = 0$. Taking the $i$th row as an example, $\sum_{j=0}^{M-1} M^d_{ij}$ counts how many other correct correspondences the $i$th point pair is consistent with.
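The binarization step and its row sums can be sketched as follows. This is an illustrative implementation under the assumption that Equation (10) measures the difference between corresponding pairwise Euclidean distances; `sigma` is the distance error limit.

```python
import numpy as np

def compatibility_counts(src_kpts, tgt_kpts, sigma):
    """For putative correspondences (src_kpts[i] <-> tgt_kpts[i]), set
    M_d[i, j] = 1 iff the distance between points i and j is preserved
    within sigma across the two clouds; the row sum then counts how
    many other point pairs each correspondence is consistent with."""
    d_src = np.linalg.norm(src_kpts[:, None] - src_kpts[None], axis=-1)
    d_tgt = np.linalg.norm(tgt_kpts[:, None] - tgt_kpts[None], axis=-1)
    M_d = (np.abs(d_src - d_tgt) <= sigma).astype(float)
    np.fill_diagonal(M_d, 0.0)     # a pair is not scored against itself
    return M_d, M_d.sum(axis=1)

# Toy check: four pairs related by a pure translation (all mutually
# consistent) plus one corrupted pair (consistent with none).
src = np.array([[0., 0, 0], [3, 0, 0], [0, 3, 0], [0, 0, 3], [3, 3, 3]])
tgt = src + np.array([1.0, 2.0, 3.0])
tgt[4] += np.array([5.0, 0.0, 0.0])    # outlier correspondence
M_d, counts = compatibility_counts(src, tgt, sigma=0.1)
```

Correspondences with high counts reinforce each other geometrically, while outliers such as the corrupted pair above are consistent with almost nothing and can be rejected.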

ModelNet40
Unseen Shapes
In our first experiment, we divide all point clouds in the ModelNet40 dataset into training and test sets, and use different point clouds during training and testing.

Figure 5.
Figure 5. Qualitative registration examples on (a) clean data, (b) noisy data, (c) partially visible data, (d) noisy data with large rotation and (e) partially visible data with large rotation.

Figure 9.
Figure 9. Results on the real dataset. The top row shows the initial positions of the two point clouds, and the bottom row shows the results of registration. (a,b) Stanford 3D Scan data, (c-e) KITTI data.

Table 1.
Results for testing on point clouds of unseen shapes in [0, 45°].

Table 3.
Results for testing on point clouds of unseen shapes with Gaussian noise in [0, 45°].

Table 4.
Results for testing on point clouds of unseen shapes with Gaussian noise in [0, 90°].

Table 5.
Results for testing on partial visibility point clouds with Gaussian noise in [0, 45°].

Table 6.
Results for testing on partial visibility point clouds with Gaussian noise in [0, 90°].

Table 7.
Inlier ratio in correspondences under different methods.

Table 8.
Results of failure cases on ModelNet40.
