1. Introduction
In the field of computer vision, 3D point cloud registration has garnered significant attention due to its widespread applications in various domains, including 3D reconstruction [1], object recognition [2], autonomous navigation [3], and cultural heritage preservation. The core objective of point cloud registration is to precisely align point clouds originating from distinct coordinate systems into a common reference frame. This process enables the calculation of the rotation and translation transformation matrices between two target point clouds [4]. The advent of low-cost point cloud acquisition devices, such as Microsoft Kinect and Intel RealSense, has made point cloud data acquisition more accessible. Due to viewpoint limitations, it is often necessary to capture a complete 3D model from multiple different angles. As a result, point cloud registration techniques are employed to align point clouds obtained from various viewpoints in a pairwise manner.
Currently, existing methods for 3D point cloud registration fall into four broad categories: direct georeferencing, target-based registration, surface-based registration, and feature-based registration [5]. Feature-based methods can be further divided into those built on global and on local feature descriptors. Global feature descriptors encode the geometric information of the entire point cloud as a single set of global features. However, global features ignore local shape information, making it challenging to register point clouds with occlusions and clutter. Specifically, on the one hand, scenes with clutter and occlusions contain a significant number of outliers and missing regions, which can cause algorithms to become stuck in local optima. On the other hand, large-scale point clouds contain more points, so global descriptor algorithms must process more data, increasing computational complexity. In contrast, methods based on local feature descriptors encode information from the local neighborhood of each keypoint and require a minimum of only three corresponding point pairs to achieve registration. As a result, they significantly enhance the success rate of registration, especially in scenarios with occlusion and low overlap.
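With at least three non-collinear correspondences, the rigid transform between two clouds can be recovered in closed form. As a minimal illustration of this step (not the paper's pipeline; the function name and use of NumPy are our own choices), the following Kabsch/Umeyama-style least-squares estimator recovers R and t from matched point pairs:

```python
import numpy as np

def rigid_transform_from_correspondences(src, tgt):
    """Least-squares rotation R and translation t with R @ p + t ~= q
    for corresponding rows of src and tgt (Kabsch/Umeyama method);
    requires at least three non-collinear correspondences."""
    src_c = src - src.mean(axis=0)          # center both point sets
    tgt_c = tgt - tgt.mean(axis=0)
    H = src_c.T @ tgt_c                     # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = tgt.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```

This is the same closed-form solution used inside most hypothesize-and-verify registration loops: sample a few correspondences, estimate (R, t), and count inliers.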
Keypoint detection serves as a crucial step in registration methods based on local feature descriptors. Over the past decade, numerous keypoint detection algorithms have been proposed for 3D reconstruction and point cloud simplification. Existing keypoint detection algorithms can be broadly classified into two categories: fixed-size and adaptive-size detection algorithms [6]. Fixed-size detection algorithms detect keypoints at a constant scale, regardless of variations in the feature scales of different regions within the point cloud. Examples include intrinsic shape signatures (ISS) [7], local surface patches (LSP), Harris 3D [8], and keypoint quality (KPQ) [9], among others. However, each of these methods has limitations in practical applications. For instance, LSP detects keypoints uniformly but with relatively low repeatability [10], and KPQ tends to perform poorly under partial occlusion [6]. Adaptive-size detection algorithms, such as MeshDoG and KPQ-AS [9], generate multiple scales for the input point cloud and detect keypoints at each scale. Adaptive-size detection algorithms generally outperform fixed-size approaches [10], but the added complexity of scale computation increases computational overhead. Fixed-size detection algorithms, by contrast, are computationally more efficient due to their simplicity; however, they perform satisfactorily primarily on high-quality point clouds. In the presence of significant noise, occlusion, and low resolution, their keypoints often exhibit low repeatability and tend to cluster in specific areas, which significantly limits their utility.
To address the aforementioned issues, this paper introduces a keypoint detection algorithm based on the local variation of the surface (LVS) and a point cloud registration method built upon it. The core idea behind LVS is to uniformly detect keypoints that exhibit local surface variation within the point cloud. The algorithm first establishes a local coordinate system for each point and uses it to calculate a surface variation index over the point's neighborhood. Points with surface variation index values lower than the local average are then designated as initial keypoints. Finally, within the neighborhood of each initial keypoint, the point with the minimum surface variation index is selected as a final keypoint.
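The pipeline above can be sketched in code. Note this is an illustrative stand-in, not the paper's exact formulation: the actual index is built from a per-point local coordinate system, whereas the sketch below approximates surface variation with the eigenvalue ratio of the local covariance, and the helper names are our own:

```python
import numpy as np

def _radius_neighbors(points, radius):
    """Index lists of neighbours within `radius` of each point
    (brute force; a k-d tree would replace this in practice)."""
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    return [np.flatnonzero(row <= radius ** 2) for row in d2]

def surface_variation(points, radius):
    """Per-point surface variation, approximated here as
    lambda_min / (lambda1 + lambda2 + lambda3) of the local covariance:
    near zero on flat patches, larger where the surface bends."""
    sigma = np.zeros(len(points))
    for i, idx in enumerate(_radius_neighbors(points, radius)):
        if len(idx) < 4:          # too few neighbours for a stable fit
            continue
        ev = np.sort(np.linalg.eigvalsh(np.cov(points[idx].T)))
        if ev.sum() > 0:
            sigma[i] = ev[0] / ev.sum()
    return sigma

def lvs_keypoints(points, radius):
    """Two-stage selection following the paper's description: an initial
    keypoint has an index below the local average; a final keypoint also
    holds the minimum index within its neighbourhood."""
    sigma = surface_variation(points, radius)
    neighbors = _radius_neighbors(points, radius)
    return np.array([i for i, idx in enumerate(neighbors)
                     if sigma[i] < sigma[idx].mean()
                     and sigma[i] <= sigma[idx].min()], dtype=int)
```

On a perfectly flat patch the index is zero everywhere, so no point falls below the local average and no keypoints are produced, which matches the stated goal of suppressing flat regions.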
The LVS-based registration algorithm encompasses keypoint detection, feature description, transformation estimation, and fine registration. LVS is utilized for keypoint detection, while SAC-GC is employed for transformation estimation. To assess the performance of LVS and the proposed registration method, comprehensive experiments and comparisons were conducted on four datasets. The results of these experiments indicate that the LVS keypoint detection algorithm introduced in this paper achieves higher repeatability and enhanced robustness in the presence of interference. Furthermore, registration experiments on the BMR dataset demonstrate that the LVS-based registration method outperforms state-of-the-art approaches.
This article makes the following contributions:
Firstly, it introduces a novel LVS keypoint detection algorithm that exhibits exceptional repeatability and robustness and is particularly well-suited for addressing issues like noise, occlusions, and clutter in point cloud data.
Secondly, it proposes a coarse-to-fine point cloud registration algorithm combining LVS and SAC-GC that seamlessly combines high accuracy and computational efficiency, showcasing outstanding performance in pairwise point cloud registration.
The remaining sections of the paper are organized as follows: Section 2 reviews related work on keypoint detection. Section 3 elaborates on the LVS keypoint detection method and its technical details and introduces the LVS-based point cloud registration algorithm. Section 4 presents a comprehensive evaluation of the LVS keypoint detector and the proposed point cloud registration algorithm. Finally, Section 5 summarizes the findings and outlines future research directions.
2. Related Work
In the realm of point cloud registration, the current approach predominantly involves a combination of coarse registration and fine registration. Coarse registration quickly estimates the rigid transformation between the two initial point clouds, offering a good initial position for fine registration. Fine registration, in turn, uses the initial pose from coarse registration to iteratively obtain the optimal transformation matrix [11]. The main challenge in this process lies in obtaining a robust initial pose during the coarse registration phase, in which keypoint detection plays a crucial role. This study therefore focuses on keypoint detection, and existing keypoint detectors are reviewed below.
Sipiran et al. [8] introduced a three-dimensional version of the Harris operator (Harris 3D), inspired by its two-dimensional counterpart, which uses surface normals to compute the gradient of the covariance matrix. However, the performance of two-dimensional methods transplanted into three-dimensional space is somewhat limited, particularly in terms of keypoint repeatability. To address the limited number of keypoints detected by Harris 3D, Wang et al. [12] proposed the nearest neighbor search (NNS) Harris 3D algorithm, which augments the original Harris 3D keypoints by including multiple neighboring points as keypoints. Chen et al. [13] introduced the local surface patches (LSP) keypoint detection algorithm, which employs point shape indices to identify candidate keypoints and utilizes non-maximum suppression (NMS) to select the final keypoints. Despite achieving an even distribution of keypoints, LSP's repeatability is relatively poor. Building upon LSP, Zeng et al. [14] presented the double Gaussian weighted dissimilarity measure (DGWDM) keypoint detection algorithm. DGWDM detects keypoints in two stages: it first identifies initial keypoints based on local dissimilarity values and then determines final keypoints using a multi-scale detection strategy. However, this method requires additional point cloud mesh information. Zhong et al. [7] introduced the intrinsic shape signatures (ISS) algorithm, which utilizes the ratio of eigenvalues of the covariance matrix and employs NMS to filter keypoints; the number of keypoints it detects is roughly similar to that of LSP. Mian et al. [9] proposed the keypoint quality (KPQ) detection method, which initially determines keypoints based on the ratio of local coordinate axes and subsequently selects the final keypoints by thresholding a keypoint quality parameter. Compared to ISS, KPQ applies fewer constraints in the non-maximum suppression step and thus detects more keypoints, but it increases computational complexity by using keypoint quality for final point selection [6]. Building upon LSP and KPQ, Lan et al. [15] introduced a keypoint detection algorithm based on the normalized shape index. This method initially detects keypoints using the coordinate axis ratios from KPQ and the shape indices from LSP and subsequently determines final keypoints based on local dissimilarity measure values at different radii. Xiong et al. [16] presented a keypoint detection algorithm based on surface transformation and eigenvalue change indices, which combines the advantages of ISS and KPQ to improve repeatability. In the realm of keypoint detection based on normal vectors, Prakhya et al. [17] proposed a normal-based keypoint detector, and Muhammad et al. [10] introduced the fuzzy logic and normal orientation (Fuzzy-HoNo) algorithm, which enhances keypoint detection through soft decision boundaries and adaptive parameters. While it improves keypoint detection, Fuzzy-HoNo entails higher computational complexity.
4. Experimental Results and Discussion
In this section, a series of experiments is presented to evaluate the LVS keypoint detection algorithm, including assessments of relative repeatability, absolute repeatability, and computational time. Four distinct datasets were utilized to provide a comprehensive assessment of the algorithm's performance. Furthermore, the point cloud registration method based on LVS keypoints was tested for accuracy and robustness on the BMR dataset. All experiments were carried out on a computer equipped with an Intel Core i5-13600KF 3.5 GHz processor and 32 GB of RAM, sourced from Lenovo, a Chinese brand based in Changchun.
4.1. Datasets
Four datasets were used to assess the performance of the proposed LVS keypoint detection method: Stanford 3D, B3R, U3OR, and Queen. Figure 5 provides visual representations of the models and scenes within these four datasets.
The Stanford 3D dataset [22] consists of six models and their corresponding range data obtained with the Cyberware 3030 MS scanner. The point cloud data are noise-free, with a resolution of approximately 0.5 mm. The B3R dataset [23] includes six models and 18 scenes from the Stanford 3D Scanning Repository. The 18 scenes are generated by applying random transformations to the six models and adding Gaussian noise with standard deviations of 0.1, 0.3, and 0.5 mr (units of mesh resolution; see Section 4.2). The data collection equipment is the same as for the Stanford 3D dataset, and the data are primarily affected by rigid transformations and noise, with no occlusion. The U3OR dataset [24] comprises five models and 50 scenes. Each scene is composed of four or five models and captured using the Minolta Vivid 910 scanner with a resolution of around 0.6 mm. The data are free of noise but include occlusion and clutter. The Queen dataset [25] is composed of five models and 80 scenes, acquired using the Konica Minolta Vivid 3D scanner. Compared to U3OR, the Queen dataset has more scene models and lower-quality data, including noise, occlusion, clutter, and resolution variations.
4.2. Parameter Analysis
The neighborhood radius R is a crucial parameter in point cloud feature detection, and its value directly influences the recognition of point cloud features. If R is set too large, it increases computational complexity and diminishes robustness against occlusions and point cloud boundaries. Conversely, if R is set too small, it diminishes feature distinctiveness. To better analyze the impact of the neighborhood radius R on keypoint detection, a methodology similar to that in reference [6] was employed. The Stanford point cloud data described in Section 4.1 were used as test data, and Gaussian noise with standard deviations of 0.1 mr, 0.2 mr, and 0.3 mr was added to the test point clouds. The neighborhood radius was set to 6 mr, 8 mr, 10 mr, 12 mr, 14 mr, 16 mr, and 18 mr, where mr denotes the point cloud (mesh) resolution, i.e., the average distance between neighboring points in the point cloud. The relative and absolute repeatability of keypoint detection were assessed separately; relative repeatability is defined by Formula (17) and absolute repeatability by Formula (18). Computation time was also evaluated. The experimental results are presented in Figure 6.
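The mr unit used throughout can be computed directly from the data as the average nearest-neighbor distance. A minimal sketch (the function name is ours; the quadratic brute-force search is for clarity, not efficiency):

```python
import numpy as np

def mesh_resolution(points):
    """The unit 'mr': average distance from each point in the cloud
    to its nearest neighbour (O(n^2) brute force for clarity)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)     # exclude each point's zero self-distance
    return d.min(axis=1).mean()
```

A radius of 10 mr for a given cloud is then simply `10 * mesh_resolution(cloud)`.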
As shown in Figure 6a, the relative repeatability of keypoints increases as the neighborhood radius R expands at all noise levels. When R is less than 10 mr, relative repeatability rises sharply with the radius, particularly for the noisy point cloud data; once R surpasses 10 mr, it increases only gradually. In Figure 6b, the absolute repeatability of keypoints follows a consistent pattern, decreasing approximately linearly as the neighborhood radius grows across the different noise levels. This outcome is attributed to the increasing radius leading to the detection of more stable features while filtering out lower-quality keypoints. Figure 6c shows that computation time grows slowly for R values less than 10 mr and then increases rapidly. Consequently, a neighborhood radius of 10 mr is concluded to be well-suited to the keypoint detection method, balancing repeatability against computational cost.
4.3. Evaluation Criteria for Keypoints
Keypoint repeatability is the most widely accepted evaluation criterion for keypoint detection methods; evaluation methodologies can be found in references [6,11,18]. Keypoint repeatability refers to the capability of a keypoint detection algorithm to consistently locate the same keypoint in the same local region of the same model. High keypoint repeatability indicates that the detection algorithm demonstrates strong resistance to interference such as noise, occlusion, and clutter.
Keypoint repeatability is typically divided into two categories: relative repeatability and absolute repeatability. The primary calculation steps are as follows. First, keypoint sets are independently detected from the source point cloud and the target point cloud, denoted as $K_s$ and $K_t$, respectively. Next, the keypoints detected from the source point cloud are aligned to the target point cloud using the ground-truth rotation matrix $R_{gt}$ and translation vector $t_{gt}$ provided with the datasets. If an aligned keypoint $p_i \in K_s$ lies at a distance less than the threshold $\epsilon$ from its nearest neighbor keypoint $q_j \in K_t$ in the target point cloud, it is considered repeatable; that is:

$$\left\| R_{gt}\, p_i + t_{gt} - q_j \right\| < \epsilon. \tag{17}$$

Based on Equation (17), the total number of source keypoints that meet this condition is denoted as $N_{rep}$. Absolute repeatability is defined as $N_{rep}$ itself, and relative repeatability as

$$r_{rel} = \frac{N_{rep}}{\left| K_s \right|}, \tag{18}$$

where $|K_s|$ is the number of keypoints in the source point cloud.
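The two repeatability measures described above can be sketched as follows, assuming keypoint arrays for the two clouds, a ground-truth rotation and translation, and a distance threshold `eps` (symbol names are ours; the brute-force nearest-neighbor search stands in for the k-d tree used in practice):

```python
import numpy as np

def repeatability(kp_src, kp_tgt, R_gt, t_gt, eps):
    """Absolute and relative repeatability of detected keypoints.
    kp_src, kp_tgt: (N,3) and (M,3) keypoint arrays from the two clouds;
    R_gt, t_gt: ground-truth rotation and translation; eps: threshold."""
    aligned = kp_src @ R_gt.T + t_gt            # align source keypoints
    # distance from each aligned keypoint to its nearest target keypoint
    d = np.linalg.norm(aligned[:, None, :] - kp_tgt[None, :, :], axis=-1)
    n_rep = int(np.sum(d.min(axis=1) < eps))    # repeatability condition
    return n_rep, n_rep / len(kp_src)           # absolute, relative
```

With a perfect detector and exact ground truth every source keypoint finds a match, so the relative repeatability is 1.0.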
4.4. Keypoint Repeatability Experiment
In this experiment, the proposed LVS keypoint detection algorithm was compared with six advanced methods (ISS, HoNo, Harris 3D, KPQ, LSP, and NNS Harris 3D) on four public datasets. Because of sensor errors and interference, true corresponding points may exhibit a small offset; a threshold that is too small might therefore exclude correct corresponding points, while a threshold that is too large could admit erroneous points. The distance threshold $\epsilon$ was set to 2 mr, where mr is the average distance between neighboring points in the point cloud. The neighborhood radius was set to 6 mr, 8 mr, 10 mr, 12 mr, 14 mr, 16 mr, and 18 mr, and the relative and absolute repeatability of each algorithm were calculated at each radius. The results are displayed in Figure 7.
For the B3R dataset, the robustness of all methods in noisy environments was evaluated; the results are presented in Figure 7a–f. Regarding relative repeatability (Figure 7a–c), LVS and KPQ performed exceptionally well, significantly surpassing the other methods. The line graphs show that the LVS keypoint detection algorithm maintains a repeatability rate of over 80% across the different noise levels, and this rate increases with the radius. Harris 3D and NNS Harris 3D also demonstrated decent performance, with repeatability fluctuating around 80%. However, HoNo, LSP, and ISS performed less satisfactorily, possibly because HoNo's normal vectors and LSP's curvature are sensitive to noise and the ISS keypoint set may contain planar points. Regarding absolute repeatability (Figure 7d–f), under varying noise conditions, KPQ and LVS consistently yield the highest numbers of repeatable keypoints, and the numbers they detect are only mildly affected by noise. In contrast, ISS, HoNo, Harris 3D, LSP, and NNS Harris 3D detect fewer keypoints, which may be attributed to non-maximum suppression filtering out a significant portion of the keypoints.
Robustness experiments under occlusion, clutter, and boundary conditions were conducted for all keypoint detection algorithms using the Stanford 3D and U3OR datasets. The results are presented in Figure 8, where panels (a) and (b) show relative repeatability. It is evident from the graphs that LVS outperforms all other algorithms on both datasets. Figure 8d,e, depicting absolute repeatability, shows that as the radius increases, the number of repeatable points gradually decreases for all algorithms. However, LVS consistently maintains 1000 or more repeatable points at all radii, while the other methods gradually tend toward zero. This demonstrates the robustness of LVS against occlusion, clutter, and boundary conditions, which can be attributed to two key factors: firstly, LVS detects keypoints based on the average surface variation index, eliminating points in flat regions; secondly, it enhances keypoint quality by filtering out redundant points.
The Queen dataset, being of lower quality than the others, presents significant challenges due to its higher noise level, complex backgrounds, and sparse point distribution. The results of the seven methods are presented in Figure 8c,f. Regarding relative repeatability, KPQ and NNS Harris 3D exhibit the largest fluctuations, while LVS remains relatively stable. Furthermore, the absolute repeatability of LVS surpasses that of the other methods, providing a substantial number of keypoints for subsequent registration. This underscores the robustness of the LVS keypoint detection algorithm in handling various interferences.
4.5. Keypoint Computation Efficiency
In this section, the efficiency of all methods is compared. Since the cost of keypoint computation depends on the number of points in the original point cloud, the B3R dataset was selected for testing. Radii were set from 6 mr to 18 mr at 2 mr intervals, and the seven keypoint detection algorithms were run at each radius. The resulting computation times are shown in Figure 9.
We can observe that ISS is the most efficient method, since it only requires the computation of eigenvalue ratios, whereas KPQ is the most time-consuming, as it must perform surface fitting to compute keypoint quality. It is worth noting that although LVS employs a two-stage strategy similar to KPQ's, it is more efficient than both KPQ and HoNo.
4.6. Registration Algorithm Comparison Experiment
In this section, the proposed LVS-based registration algorithm is evaluated through qualitative and quantitative analyses, as well as comparisons with state-of-the-art algorithms. These experiments use the BMR dataset, which comprises point clouds obtained from various viewpoints using a Microsoft Kinect device. The point clouds in this dataset are challenging, being characterized by noise, occlusions, missing data, and variations in resolution; the dataset therefore provides a comprehensive assessment of the registration algorithm's performance. The registration parameters are set based on the analysis in Section 4.2, with the neighborhood radius R = 10 mr and the remaining parameters chosen as described there.
4.7. Qualitative Results
Figure 10 displays the registration results of the proposed method for point clouds with varying angles and overlap ratios. The original point clouds pose difficulties such as holes, missing data, and occlusions, which make registration challenging. Nevertheless, our registration method can establish partially accurate correspondences and compute initial poses, providing a good starting point for fine registration. This reflects the ability of the proposed LVS keypoint detection method to identify a significant number of repeatable keypoints and underscores the robustness of LVS features to various interferences. It is worth noting that our algorithm is fully automatic, requiring only the source and target point clouds as input to produce registration results. To further validate the effectiveness of the algorithm, 15 additional registration results are presented in Figure 11.
4.8. Quantitative Results
In addition to the visual analysis, we conducted quantitative analyses of the registration results, including the time required for registration, rotation error, and translation error.
To measure the accuracy of registration, rotation error and translation error were used: the rotation error measures the deviation between the estimated and ground-truth rotation matrices, and the translation error the deviation between the estimated and ground-truth translation vectors. Specifically, they are defined as follows:

$$e_R = \arccos\!\left(\frac{\operatorname{tr}\!\left(R_{gt}^{T} R_{est}\right) - 1}{2}\right), \qquad e_t = \left\| t_{est} - t_{gt} \right\|,$$

where $R_{gt}$ and $t_{gt}$ are the ground-truth rotation matrix and translation vector, obtained through manual initial alignment and refined using ICP registration.
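Assuming the standard angular (geodesic) and Euclidean definitions of these two errors (the paper's exact formulas may differ in normalization), they can be computed as:

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Rotation error in degrees and translation error between an
    estimated and a ground-truth rigid transform."""
    cos_theta = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    cos_theta = np.clip(cos_theta, -1.0, 1.0)   # guard numerical drift
    rot_err = np.degrees(np.arccos(cos_theta))
    trans_err = np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt))
    return rot_err, trans_err
```

The clipping step matters in practice: floating-point noise can push the cosine marginally outside [-1, 1], which would otherwise make `arccos` return NaN for near-perfect estimates.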
To further validate the superiority of the proposed registration algorithm, its registration errors were compared with those of other methods. To ensure a fair comparison, the same registration pipeline was used for all methods, with only the keypoint detection algorithm replaced: the initial pose was estimated with SAC-GC and the final pose with ICP. The performance of the different methods was then compared at 80% and 50% overlap rates, with overlap rates calculated following reference [4]. Table 1 presents all the registration results.
As indicated in Table 1, all methods exhibit notable errors in the fine registration results, primarily due to the substantial interference in the BMR dataset, which complicates initial pose estimation. Nevertheless, the LVS-based registration method consistently achieves the lowest rotation and translation errors across all model sequences, while the KPQ-based method, which detects fewer keypoints, performs the least effectively. At an 80% overlap rate, both LVS and the NNS Harris 3D-based method outperform the other methods on all model sequences, whereas the remaining methods perform best on only a few. As the overlap rate decreases to 50%, registration becomes more challenging and the errors of all methods increase. The methods based on HoNo, Harris 3D, KPQ, LSP, and NNS Harris 3D display significant rotation and translation errors across all model sequences, as they struggle to detect repeatable keypoints; the ISS-based method performs reasonably well only on the Frog sequence.
To validate the improvement of SAC-GC over RANSAC, tests were conducted in which SAC-GC was replaced by RANSAC while keeping the same number of iterations; the results are summarized in Table 2. Each method has its strengths and weaknesses in terms of rotation and translation errors. Notably, however, SAC-GC offers a 14-fold speedup over RANSAC thanks to its geometric similarity filtering, which eliminates a substantial number of incorrect point pairs.
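SAC-GC's geometric similarity filtering exploits the fact that a rigid transform preserves pairwise distances: a correspondence that disagrees with the others about inter-point distances is almost certainly wrong and can be discarded before any pose is hypothesized. The following standalone sketch illustrates that principle only; the voting rule and the threshold `tau` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def geometric_consistency_filter(src_pts, tgt_pts, corr, tau=0.9):
    """Keep correspondences whose pairwise distances are preserved.
    corr: (K,2) integer array of (source index, target index) pairs;
    tau: minimum distance-ratio similarity to count a consistent vote."""
    keep = []
    for a in range(len(corr)):
        votes = 0
        for b in range(len(corr)):
            if a == b:
                continue
            ds = np.linalg.norm(src_pts[corr[a, 0]] - src_pts[corr[b, 0]])
            dt = np.linalg.norm(tgt_pts[corr[a, 1]] - tgt_pts[corr[b, 1]])
            # rigid motion preserves distances, so ds and dt should agree
            if min(ds, dt) / max(ds, dt, 1e-12) > tau:
                votes += 1
        # keep pairs consistent with at least half of the other pairs
        if votes >= (len(corr) - 1) / 2:
            keep.append(a)
    return corr[np.array(keep, dtype=int)] if keep else corr[:0]
```

Because the sampling loop that follows only ever draws from the surviving correspondences, even a simple prefilter like this sharply reduces the number of iterations wasted on outlier-contaminated hypotheses, which is the mechanism behind the reported speedup.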
5. Conclusions
This paper introduces a keypoint detection algorithm and a registration algorithm based on LVS. The LVS keypoint detection method begins by computing a local coordinate system for each point in the point cloud. In this local coordinate system, a surface variation index is calculated based on the coordinate axis ratios. Points with surface variation indices smaller than the local average value are identified as candidate keypoints. Subsequently, the point with the smallest surface variation index in the neighborhood of each candidate keypoint is selected as the final keypoint. Based on LVS keypoints, an efficient and accurate registration algorithm is proposed.
Extensive experimental evaluations were conducted to assess the performance of the proposed keypoint detector and registration algorithm. The keypoint detector was tested on four datasets containing noise, clutter, occlusion, and resolution variations; compared to existing keypoint detectors, the LVS detector exhibited higher keypoint repeatability, and the results also demonstrated its efficiency. For registration, experiments were carried out on low-quality point cloud datasets using the combined LVS and SAC-GC registration method, and the qualitative and quantitative results demonstrated outstanding performance in terms of accuracy and efficiency. This was primarily attributed to the high repeatability of LVS keypoints and the efficient transformation estimation of the SAC-GC algorithm. Moreover, a comparison between SAC-GC and RANSAC under the same iteration budget revealed a 14-fold increase in efficiency for SAC-GC.