Sensors · Article · Open Access

6 January 2022

Matching Algorithm for 3D Point Cloud Recognition and Registration Based on Multi-Statistics Histogram Descriptors

School of Physical Science and Technology, Southwest Jiaotong University, Chengdu 610031, China
* Author to whom correspondence should be addressed.
This article belongs to the Section Physical Sensors

Abstract

Establishing an effective local feature descriptor and using an accurate key point matching algorithm are two crucial tasks in recognition and registration on 3D point clouds. The descriptor must retain enough descriptive ability against the effects of noise, occlusion, and incomplete regions in the point cloud, and a suitable key point matching algorithm can obtain more precisely matched pairs. To obtain an effective descriptor, this paper proposes a Multi-Statistics Histogram Descriptor (MSHD) that combines spatial distribution and geometric attribute features. Furthermore, based on deep learning, we developed a new key point matching algorithm that can identify more corresponding point pairs than the existing methods. Our method is evaluated on the Stanford 3D dataset and on four real component point cloud datasets from the train bottom. The experimental results demonstrate the superiority of MSHD: its descriptive ability and robustness to noise and mesh resolution exceed those of carefully selected baselines (e.g., the FPFH, SHOT, RoPS, and SpinImage descriptors). Importantly, the error of the rotation and translation matrices is much smaller with our key point matching algorithm, and precise corresponding point pairs can be captured, resulting in enhanced recognition and registration for three-dimensional surface matching.

1. Introduction

As laser 3D scanning technology has developed rapidly, the recognition and registration of three-dimensional objects have become active and challenging problems in computer vision research []. Among the various kinds of 3D data representation, the point cloud has the advantage of retaining detail in a space-efficient form, and it has been used extensively in 3D data processing []. Descriptor establishment and key point matching are two important steps of 3D surface matching: as long as the surfaces are matched well, the accuracy of recognition and registration can be improved [,]. In this paper, we focus on establishing an effective feature descriptor and improving the performance of the key point matching algorithm, ultimately producing satisfactory 3D surface matching.
In 3D surface matching, the descriptor serves as a concise representation of the point cloud and is an essential component containing extensive local features; we regard the establishment of descriptors as a feature extraction process. Due to the limitations of the scanning equipment and environment, noise, occlusion, and incomplete regions are inevitable in the collected point cloud. Geometric and semantic information is therefore lost, which severely affects the performance of descriptors []. An effective descriptor should thus have strong descriptive ability and be robust to noise, occlusion, and incomplete regions.
In the point cloud literature, some descriptors construct a Local Reference Frame (LRF) based on a key point and extract spatial distribution features (e.g., the number of points) in several partitioned bins. They perform well against noise and incomplete regions [,,], but some of them lack descriptive ability on high-quality point clouds. Other descriptors extract geometric attribute features (e.g., normals and curvatures) directly; these descriptors have strong descriptive ability but are sensitive to noise and incomplete regions in the point cloud [,].
Another important component is key point matching, which aims to build a correspondence between two 3D point clouds; the most commonly used algorithms include Nearest Neighbor (NN) and Nearest Neighbor Distance Ratio (NNDR) [], among others. However, these algorithms consider at most the top-two similar key points in the target point cloud. In fact, the correctly matched corresponding key point in the target point cloud might not be either of the top-two similar key points, which leads to errors when the transformation matrix is calculated by the Singular Value Decomposition (SVD) method []. Thus, the performance of 3D surface matching based on these algorithms can be further improved.
In this paper, we mainly study 3D surface matching problems based on local feature descriptors of the point cloud. To describe a 3D object from multiple aspects and thereby enhance descriptive ability and robustness, we propose a Multi-Statistics Histogram Descriptor (MSHD) that combines spatial distribution and geometric attribute features. Furthermore, we propose a key point matching algorithm that not only considers more similar points when matching a key point but also verifies the corresponding key points through BP networks. Our methods achieve higher accuracy when matching multi-object point clouds with noise, incomplete regions, and occlusion. The main contributions of this paper can be summarized as follows.
  • First, a descriptor with a multi-statistics feature description histogram is proposed. A Local Reference Frame is constructed, and the normals, curvatures, and distribution density of the neighboring points are extracted; the descriptor describes the features from these three aspects, so it keeps strong descriptive ability and robustness to noise and mesh resolution.
  • Second, a new key point matching algorithm based on deep learning is proposed, which detects more corresponding key point pairs than existing methods. The experimental results show that the proposed algorithm is effective for 3D surface matching.
  • Finally, the matching algorithm based on MSHD is applied to real component data from the train bottom. With this algorithm, more corresponding key point pairs between the two point clouds are obtained, resulting in highly accurate 3D surface matching.
The rest of this paper is organized as follows. Related work is discussed in Section 2, and Section 3 introduces the three-dimensional surface matching methods in detail, including the Multi-Statistics Histogram Descriptor and the key point matching algorithm. Section 4 presents experimental results that demonstrate the effectiveness and feasibility of our methods. Finally, the conclusion is given in Section 5.

3. Methodology

3.1. Multi-Statistics Histogram Descriptors

To establish an effective descriptor, we want it to combine the advantages of spatial distribution feature descriptors and geometric attribute feature descriptors: robustness to noise and incomplete regions together with strong descriptive ability. Therefore, in this paper we propose a descriptor with a multi-statistics feature description histogram, which combines spatial distribution and geometric attribute features. First, we construct an LRF based on the key point, from which three coordinate-axis planes are obtained. All the neighboring points are projected onto these axis planes, so we can calculate the density and average curvature of the points falling into each bin, as well as the normals of the points in each bin. The values are then sorted into a 1×N-dimensional array in a fixed order. This array can be regarded as a histogram, and thus a histogram descriptor is generated. Figure 3 shows the establishment process of the proposed descriptor.
Figure 3. Establishment process of descriptors.

3.1.1. Constructing the Local Reference Frame

In a point cloud, if the coordinate system changes, the coordinates of the points change as well. To eliminate the influence of coordinate system changes on the description, related studies exploit the invariance of three-dimensional rigid-body transformations to construct a Local Reference Frame. First, all the points in the target region are translated to the centroid and then rotated around the origin of the new coordinate system, constructed at the centroid, until the axes of the target region points are parallel to the three main axis directions. This is how the Local Reference Frame is constructed.
Suppose there is a point cloud $P = \{p_1, p_2, \ldots, p_i, \ldots, p_n\}$ with $n$ points; any point $p_i$ in $P$ can construct an LRF in its neighborhood. Here, the neighboring points of $p_i$ within a certain radius are denoted $\mathrm{nbhd}(p_i)$. To eliminate the influence of translation, $\mathrm{nbhd}(p_i)$ is translated into the coordinate system constructed at the centroid of the neighborhood:

$$p_c = \frac{1}{k} \sum_{p_j \in \mathrm{nbhd}(p_i)} p_j$$

$$p_j' = p_j - p_c$$

where $p_c$ is the centroid of the neighborhood of $p_i$, $p_j$ denotes the coordinates of the neighboring points of $p_i$, $p_j'$ denotes those coordinates after transformation into the centroid coordinate system, and $k$ is the number of neighboring points of $p_i$.
Besides the centroid $p_c$, the key point can also be set as the origin of the new coordinate system, in which case the formula is

$$p_j' = p_j - p_k$$

where $p_k$ is the coordinate of the key point. Then, Principal Component Analysis (PCA) is performed to eliminate the influence of rotation. A covariance matrix $\mathrm{cov}(p_i)$ is constructed for the translated $\mathrm{nbhd}(p_i)$ by the following formula:

$$\mathrm{cov}(p_i) = \frac{1}{k} \sum_{p_j \in \mathrm{nbhd}(p_i)} (p_j - p_c)^T (p_j - p_c)$$

If the key point is used as the origin, $p_c$ should be replaced with $p_k$ here. As $\mathrm{cov}(p_i)$ is a symmetric positive semi-definite matrix, it has three non-negative real eigenvalues $\lambda_1, \lambda_2, \lambda_3$ satisfying $\lambda_1 \geq \lambda_2 \geq \lambda_3$. These eigenvalues correspond to three eigenvectors $v_1, v_2, v_3$, which form a set of orthogonal basis vectors and can serve as the three coordinate axes of the LRF.
The process of selecting the coordinate axes should be consistent. First, the eigenvector $v_1$, corresponding to the largest eigenvalue $\lambda_1$, is chosen as the $x$ axis. The direction of the $z$ axis is related to the eigenvector $v_3$, which corresponds to the smallest eigenvalue $\lambda_3$; the vector component of each neighboring point along the direction of $v_3$ is calculated. If the number of points with negative components exceeds the number with positive components, the $z$ axis takes the same direction as $v_3$; otherwise, the $z$ axis is set opposite to $v_3$. The $y$ axis is then determined by the $x$ and $z$ axes.
Then, the coordinates of the neighboring points after transformation can be calculated by the following formulas:

$$R = [\,v_1,\; v_3 \times v_1,\; v_3\,]$$

$$p'' = p' \cdot R$$

where $p'$ is the coordinate of a neighboring point after translation, $R$ holds the axis directions, and $p''$ is the coordinate of $p'$ projected onto the LRF axis planes.
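To make this concrete, the following is a minimal NumPy sketch of the LRF construction, assuming `neighbors` is a $k \times 3$ array holding $\mathrm{nbhd}(p_i)$; the function name and array layout are our own choices, not the paper's.

```python
import numpy as np

def construct_lrf(neighbors):
    """Sketch of the LRF construction above: translate nbhd(p_i) to its
    centroid, eigen-decompose the covariance matrix, and disambiguate
    the z axis by the sign of the point distribution along v3."""
    centered = neighbors - neighbors.mean(axis=0)       # p_j' = p_j - p_c
    cov = centered.T @ centered / len(neighbors)        # cov(p_i)
    eigvals, eigvecs = np.linalg.eigh(cov)              # eigenvalues ascending
    v1, v3 = eigvecs[:, 2], eigvecs[:, 0]               # largest / smallest
    comp = centered @ v3
    # Keep v3 as z only if negative components outnumber positive ones
    if np.sum(comp < 0) <= np.sum(comp > 0):
        v3 = -v3
    return np.column_stack([v1, np.cross(v3, v1), v3])  # R = [v1, v3 x v1, v3]

# Usage, p'' = p' . R:
# local = (neighbors - neighbors.mean(axis=0)) @ construct_lrf(neighbors)
```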

3.1.2. Normals and Curvatures

After the LRF has been constructed, the coordinates of the neighboring points cannot be used directly as the measurement for generating a descriptor, because the accuracy would be seriously reduced once the sampling points change or noise intrudes. The normals and curvatures are therefore better choices of measurement. The normal of $p_i$ can be approximated by the normal of the tangent plane of the surface constituted by $p_i$ and its neighboring points. After the covariance matrix $\mathrm{cov}(p_i)$ of $\mathrm{nbhd}(p_i)$ is eigen-decomposed, the PCA algorithm can also be used to calculate the normals: the eigenvector $v_3$ corresponding to the smallest eigenvalue can be regarded as the direction vector of the fitted approximate plane, so $v_3$ approximately represents the normal of $p_i$, i.e., $n_i = v_3$. As the local surface may be concave or convex, the direction of the normal needs to be disambiguated, so the component of each neighboring point along the $v_3$ direction is calculated; if the points with negative components outnumber those with positive components, we set $n_i = -v_3$. So that $p_i$ and its neighboring points lie approximately on the same plane, the tangent plane can be replaced with an approximate plane, and the neighborhood radius $r$ should not be too large. In this paper, we search the neighboring points with the kNN algorithm to calculate the normals, setting the number of detected neighboring points to 50.
The curvature measures the steepness of the surface constituted by a point and its neighboring points. In short, the larger the curvature of the points, the greater the variation of the surface and the more features that can be obtained; conversely, the smaller the curvature, the smoother the surface and the fewer features that can be obtained.
The curvature formula for a curve $y(x)$ in a two-dimensional coordinate system is

$$c = \frac{|y''|}{\left(1 + y'^2\right)^{3/2}}$$

where $y$ is the ordinate of the point. The curvature $c$ is proportional to the second derivative of $y$, so it is sensitive to changes in the object surface but also susceptible to interference from noise.
Based on the three eigenvalues $\lambda_1, \lambda_2, \lambda_3$ of the covariance matrix $\mathrm{cov}(p_i)$, we can estimate the complexity of the surface. The curvature $c_i$ of $p_i$ can be defined as

$$c_i = \frac{\lambda_3}{\lambda_1 + \lambda_2 + \lambda_3}$$

where $c_i$ is the curvature at the point; it is an approximation of the curvature of the surface constituted by $\mathrm{nbhd}(p_i)$.
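Since the normal and this eigenvalue-based curvature share the same covariance decomposition, both can be computed in one pass; the sketch below assumes `neighbors` is the $k \times 3$ kNN neighborhood ($k = 50$ in the paper).

```python
import numpy as np

def normal_and_curvature(neighbors):
    """Sketch of the PCA normal and the eigenvalue curvature above."""
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered / len(neighbors)
    eigvals, eigvecs = np.linalg.eigh(cov)      # lam3 <= lam2 <= lam1
    normal = eigvecs[:, 0]                      # v3, smallest eigenvalue
    comp = centered @ normal
    # Flip the normal when negative components outnumber positive ones
    if np.sum(comp < 0) > np.sum(comp > 0):
        normal = -normal
    curvature = eigvals[0] / eigvals.sum()      # lam3 / (lam1 + lam2 + lam3)
    return normal, curvature
```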

3.1.3. Generating the Descriptors

The specific process of the descriptor generation is as follows:
(1) Preparation: Detect the key points of the point cloud $P$ and denote them $KP$ (the curvature $c$ and the normal $n$ of each point in $P$ are calculated in advance).
(2) Construction of the LRF: Search the neighborhood of the key point $p_i \in KP$ in the point cloud $P$; an LRF based on $p_i$ can then be constructed, and the coordinates of $\mathrm{nbhd}(p_i)$ are translated into the LRF.
(3) Projection along the LRF axes: $\mathrm{nbhd}(p_i)$ is projected along the three LRF coordinate axes, respectively, yielding three frames of the projected point cloud.
(4) Generation of the grid statistical map of the projected point cloud: Each projected frame is divided into $N_P \times N_P$ grids, and the points and their coordinates in each grid are collected, giving a discrete projection statistical map $\widetilde{\mathrm{nbhd}}(p_i)$.
(5) Construction of the normal histogram: Let $n_j$ denote the normals of the neighboring points and $n_i$ the normal of the center point $p_i$. The angle $\angle(n_i, n_j)$ between $n_i$ and $n_j$ is calculated, its value range $[0, \pi]$ is divided into $N_\theta$ subintervals, and the points falling into each subinterval of each grid are counted. As Figure 4a shows, each subinterval of a grid can be regarded as a bin, giving $N_\theta$ bins per grid with one measurement per bin. The grid map is expanded into a $1 \times (N_\theta \times N_P \times N_P)$-dimensional array in a fixed order, and after normalization, the histogram $H_n$ is generated.
Figure 4. Generation of the histogram. (a) Grids and bins are expanded to an array. (b) Grids are expanded to an array.
(6) Construction of the curvature histogram: The average curvature of each grid in the projected statistical map $\widetilde{\mathrm{nbhd}}(p_i)$ is calculated and assigned to that grid; the value 1 is assigned to grids containing no points. With one measurement per grid, as Figure 4b shows, the grid map is expanded into a $1 \times (N_P \times N_P)$-dimensional array in a fixed order, and after normalization, the histogram $H_c$ is generated.
(7) Construction of the average point density histogram: The average density of the points in each grid of the projected statistical map $\widetilde{\mathrm{nbhd}}(p_i)$ is calculated and assigned to that grid, and the value 1 is assigned to grids containing no points. The grid map is expanded into a $1 \times (N_P \times N_P)$-dimensional array in a fixed order, and after normalization, the histogram $H_d$ is generated.
(8) Splicing the feature histograms: The arrays $H_n$, $H_c$, and $H_d$ from the three frames are spliced together, generating the descriptor histogram as follows:

$$H = [\, k_1 H_n, \; k_2 H_c, \; k_3 H_d \,]$$

where $H$ is the final feature histogram descriptor and $k_1$, $k_2$, and $k_3$ are weights for adjusting the proportions of the normal, curvature, and density in the feature description.
To evaluate the descriptor conveniently and to set the values of $k_1$, $k_2$, and $k_3$, we temporarily give $H_n$, $H_c$, and $H_d$ the same proportions in the description.
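The pipeline of steps (3)–(8) condenses into the following sketch, assuming `local_pts` has already been transformed into the LRF and that per-point `normals` and `curvs` are available; the parameter values, the grid extent, and the use of the per-grid point fraction as the density measure are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def mshd(local_pts, normals, curvs, center_normal,
         n_p=8, n_theta=4, k=(1.0, 1.0, 1.0)):
    """Sketch of steps (3)-(8): per projection plane, build the normal-angle
    histogram H_n, average-curvature histogram H_c, and density histogram
    H_d, then splice them with weights k1, k2, k3."""
    planes = [(0, 1), (1, 2), (0, 2)]       # xy, yz, xz projection planes
    r = np.abs(local_pts).max() + 1e-9      # neighborhood extent (assumed)
    ang = np.arccos(np.clip(normals @ center_normal, -1.0, 1.0))  # in [0, pi]
    tbin = np.minimum((ang / np.pi * n_theta).astype(int), n_theta - 1)
    h_n, h_c, h_d = [], [], []
    for a, b in planes:
        idx = np.clip(((local_pts[:, [a, b]] / r + 1) / 2 * n_p).astype(int),
                      0, n_p - 1)           # N_P x N_P grid indices
        cell = idx[:, 0] * n_p + idx[:, 1]
        hn = np.bincount(cell * n_theta + tbin, minlength=n_p * n_p * n_theta)
        hc = np.ones(n_p * n_p)             # value 1 for grids with no points
        hd = np.ones(n_p * n_p)
        for g in np.unique(cell):
            in_g = cell == g
            hc[g] = curvs[in_g].mean()           # average curvature per grid
            hd[g] = in_g.sum() / len(local_pts)  # point-fraction density proxy
        h_n.append(hn / max(hn.sum(), 1))        # normalization
        h_c.append(hc / hc.sum())
        h_d.append(hd / hd.sum())
    return np.concatenate([k[0] * np.concatenate(h_n),
                           k[1] * np.concatenate(h_c),
                           k[2] * np.concatenate(h_d)])
```

Each projection plane contributes $N_\theta N_P^2$ entries to $H_n$ and $N_P^2$ entries each to $H_c$ and $H_d$, so the spliced descriptor has length $3(N_\theta N_P^2 + 2 N_P^2)$.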

3.2. Matching Algorithm

In the key point matching stage, the NN algorithm directly matches each key point in the model point cloud to the most similar one in the scene point cloud, while the NNDR algorithm considers only the top-two similar key points in the target point cloud. In fact, the correctly matched key point in the target point cloud may be neither of them, leading to errors in the calculation of the transformation matrix. Both matching algorithms thus produce wrongly matched corresponding key point pairs, especially in low-quality point clouds. Therefore, to obtain more precise corresponding key point pairs, we propose a novel key point matching algorithm that not only considers more similar points but also verifies the corresponding key points through BP networks. This algorithm is divided into two parts:
(1) In the first part, suppose there are $i$ key points in the model point cloud and $j$ key points in the scene point cloud. We use the proposed descriptor to extract features one by one from the model key points $KP_m^i$, $i = 1, 2, 3, \ldots$, and the same operation is carried out on the scene key points $KP_s^j$, $j = 1, 2, 3, \ldots$. We choose a key point $KP_m^i$ in the model at random and set the number of similar points to be found as $k$. The similar scene key points are denoted $KP_s^{i,k}$, where $i$ marks the scene key points matching $KP_m^i$: $KP_s^{i,1}$ is the most similar key point, $KP_s^{i,2}$ the second most similar, and so on. We define $d_f(P_1, P_2)$ as the feature descriptor distance between two points $P_1$ and $P_2$, computed by kNN methods. Now, consider the following criterion:

$$d_f\left(KP_m^i, KP_s^{i,1}\right) < 0.5 \cdot \frac{d_f\left(KP_m^i, KP_s^{i,2}\right) + d_f\left(KP_m^i, KP_s^{i,3}\right) + \cdots + d_f\left(KP_m^i, KP_s^{i,k}\right)}{k - 1}$$

If this inequality holds, $KP_m^i$ and $KP_s^{i,1}$ are a pair of corresponding key points. If the most similar point $KP_s^{i,1}$ does not satisfy it, we pass all $k$ similar points $KP_s^{i,1}, KP_s^{i,2}, \ldots, KP_s^{i,k}$ to the second part to decide which key point precisely matches $KP_m^i$.
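For intuition, the first-part test maps directly onto a two-line check; the KD-tree query over the scene descriptors and the helper name are our assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def passes_part_one(d_f):
    """d_f[0]: feature distance to the most similar scene key point;
    d_f[1:]: distances to the remaining k-1 candidates (inequality above)."""
    return d_f[0] < 0.5 * np.mean(d_f[1:])

# Usage with k similar scene key points for model key point i:
# d_f, ids = cKDTree(scene_desc).query(model_desc[i], k=k)
# if passes_part_one(d_f): accept ids[0]; else hand all k candidates to part 2
```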
(2) In the second part, we verify the corresponding key points with BP networks. The reason for using a BP network is that it can fit the mapping relationship between independent variables $x_1, x_2, \ldots, x_n$ and a dependent variable $y$ given enough training data. The structure of a conventional BP neural network is shown in Figure 5.
Figure 5. Structure of a conventional BP neural network.
The number of neurons in the hidden layer can be set empirically as follows:

$$n_l = \sqrt{n + m} + b$$

where $n_l$ is the number of neurons in the hidden layer, $n$ is the number of neurons in the input layer, $m$ is the number of neurons in the output layer, and $b$ is a constant within $(0, 10)$.
In general, the transfer function of the hidden layer adopts the sigmoid function, so that the BP network can approximate any function arbitrarily well, while the output layer adopts a linear function. For the choice of learning rate, a rate that is too large causes oscillations in network training and can skip over the global optimum into a local one. Many learning functions have been designed; the Levenberg–Marquardt backpropagation algorithm is commonly used for its good performance and high training speed.
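As a minimal setup sketch, BPnet1 could be configured as follows with scikit-learn, taking $b = 5$ in the empirical rule above; scikit-learn has no Levenberg–Marquardt solver, so `lbfgs` stands in for it here.

```python
from math import ceil, sqrt
from sklearn.neural_network import MLPRegressor

n, m, b = 5, 1, 5                  # BPnet1: 5 inputs, 1 output, b in (0, 10)
n_l = ceil(sqrt(n + m)) + b        # empirical hidden-layer size

# Sigmoid hidden layer; MLPRegressor's output layer is linear (identity).
bpnet1 = MLPRegressor(hidden_layer_sizes=(n_l,), activation='logistic',
                      solver='lbfgs', max_iter=2000)
# bpnet1.fit(X, y)  # X rows: [d1, d2, d3, theta1, theta2]; y: 0/1 labels
```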
We can calculate spatial features such as the distance between two key points and the angles formed by three key points. These distances and angles serve as the input independent variables $x_1, x_2, \ldots, x_n$ for training the BP networks. Here, $dist(P_1, P_2)$ is defined as the Euclidean distance between two points, and $angle(P_1, P_2, P_3)$ as the angle formed by three points with $P_2$ as the vertex between $P_1$ and $P_3$. As Figure 6 shows, suppose there is a key point $KP_m^i$ in the model point cloud with its three nearest neighboring key points $KP_{m,1}^i$, $KP_{m,2}^i$, and $KP_{m,3}^i$. Furthermore, suppose we have found $k$ similar key points to $KP_m^i$ in the scene point cloud, one of which is $KP_s^{i,k}$ with its three nearest neighboring key points $KP_{s,1}^{i,k}$, $KP_{s,2}^{i,k}$, and $KP_{s,3}^{i,k}$. Then, we can calculate the spatial distance features from $KP_m^i$ and $KP_s^{i,k}$ as follows:
$$d_1 = dist\left(KP_m^i, KP_{m,1}^i\right) - dist\left(KP_s^{i,k}, KP_{s,1}^{i,k}\right)$$

$$d_2 = dist\left(KP_m^i, KP_{m,2}^i\right) - dist\left(KP_s^{i,k}, KP_{s,2}^{i,k}\right)$$

$$d_3 = dist\left(KP_m^i, KP_{m,3}^i\right) - dist\left(KP_s^{i,k}, KP_{s,3}^{i,k}\right)$$

and the spatial angle features from $KP_m^i$ and $KP_s^{i,k}$ as follows:

$$\theta_1 = angle\left(KP_{m,1}^i, KP_m^i, KP_{m,2}^i\right) - angle\left(KP_{s,1}^{i,k}, KP_s^{i,k}, KP_{s,2}^{i,k}\right)$$

$$\theta_2 = angle\left(KP_{m,1}^i, KP_{m,3}^i, KP_{m,2}^i\right) - angle\left(KP_{s,1}^{i,k}, KP_{s,3}^{i,k}, KP_{s,2}^{i,k}\right)$$

as well as the differences of descriptors from $KP_m^i$ and $KP_s^{i,k}$:

$$df_1 = d_f\left(KP_m^i, KP_{m,1}^i\right) - d_f\left(KP_s^{i,k}, KP_{s,1}^{i,k}\right)$$

$$df_2 = d_f\left(KP_m^i, KP_{m,2}^i\right) - d_f\left(KP_s^{i,k}, KP_{s,2}^{i,k}\right)$$

$$df_3 = d_f\left(KP_m^i, KP_{m,3}^i\right) - d_f\left(KP_s^{i,k}, KP_{s,3}^{i,k}\right)$$
Figure 6. Spatial features of the corresponding key points in the model and scene point cloud.
As long as there are enough precisely matched and wrongly matched corresponding key point pairs, we can obtain enough independent variables $d_1, d_2, d_3, \theta_1, \theta_2$ and $df_1, df_2, df_3$ to use as input data for the BP networks. The label of a precise corresponding key point pair is 1, and that of a wrongly matched pair is 0. The BP networks are thus trained to predict, from the input data, whether two key points form a corresponding key point pair.
We trained BPnet1 using $d_1, d_2, d_3, \theta_1, \theta_2$ as input data, with output variable $y$, and trained BPnet2 using $df_1, df_2, df_3$, with output variable $v$. After training on a large amount of data, both BP networks performed well in validation. We combine them in the second part to judge whether two key points form a corresponding pair.
Two thresholds $\tau_1$ and $\tau_2$ are defined. Suppose we have calculated $d_1, d_2, d_3, \theta_1, \theta_2$ from $KP_m^i$, $KP_s^{i,k}$, and their neighboring key points. We input $d_1, d_2, d_3, \theta_1, \theta_2$ into BPnet1; if the output satisfies $y > \tau_1$, we consider $(KP_m^i, KP_s^{i,k})$ a corresponding key point pair. Otherwise, we calculate $df_1, df_2, df_3$ from $KP_m^i$, $KP_s^{i,k}$, and their neighboring key points and input them into BPnet2; if the output satisfies $v > \tau_2$, we also consider $(KP_m^i, KP_s^{i,k})$ a corresponding key point pair. If $v \leq \tau_2$, we set $k = k + 1$ and judge the next similar key point from the scene point cloud. If none of the $k$ similar key points $KP_s^{i,k}$ correspond to $KP_m^i$, we set $i = i + 1$ and consider whether the next model key point $KP_m^i$ has a corresponding key point in the scene point cloud. Finally, all corresponding key point pairs are obtained. Here we set $\tau_1 = 0.95$ and $\tau_2 = 0.7$, with which the BP networks performed best in validation; these thresholds can of course be adjusted for specific data.
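A sketch of this decision loop for one model key point follows; `candidates` is assumed to hold the $k$ similar scene key points in similarity order, each with its precomputed feature tuples.

```python
def match_keypoint(candidates, bpnet1, bpnet2, tau1=0.95, tau2=0.7):
    """candidates: list of (spatial, descr) per similar scene key point,
    where spatial = [d1, d2, d3, theta1, theta2] and descr = [df1, df2, df3].
    Returns the index of the corresponding key point, or None."""
    for k, (spatial, descr) in enumerate(candidates):
        if bpnet1.predict([spatial])[0] > tau1:
            return k                    # accepted by BPnet1
        if bpnet2.predict([descr])[0] > tau2:
            return k                    # accepted by BPnet2
        # otherwise k = k + 1: try the next similar key point
    return None                         # no correspondence for this key point
```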

4. Experimental Results

4.1. Multi-Statistics Histogram Descriptor

4.1.1. Data and Testing Environment

The Random Views dataset, built on the basis of the Stanford 3D dataset, contains six different models and thirty-six scenes; as shown in Figure 7 [], the scenes contain occlusion and incomplete regions. The models are generated by registering multi-view point clouds and come from the Stanford University point cloud library, including the famous Armadillo, Bunny, Happy Buddha, Asian Dragon, Thai Statue, etc. Because the mesh resolution of the laser scans of these datasets is the same, each point cloud is scaled to the same size, which makes it convenient to set the neighborhood radius $r$ for the descriptor. All experiments are performed under the Windows 10 operating system on an Intel i5-9400 with 16 GB RAM using simulation software.
Figure 7. Part of the point cloud in Random Views.
As Figure 7 shows, the quality of the model point clouds is high. In contrast, the scenes are single-view point clouds with occlusion and noise; each scene includes three to five models, and the quality of the scene point clouds is quite low.

4.1.2. Evaluation Criteria of the Descriptor

The Precision–Recall curve ($PR$ curve) is often used to evaluate the descriptive ability of local feature descriptors. The evaluation process is as follows.
First, key points are detected in the model and scene point clouds by the Intrinsic Shape Signatures (ISS) algorithm [], which is commonly used for key point detection. Then, a descriptor is generated for each key point from the model and scene point clouds, giving the feature sets $F_{model}$ and $F_{scene}$.
Next, the NNDR key point matching algorithm is used for feature matching. In brief, the most similar descriptor $f_{scene}^i$ and the second most similar descriptor $f_{scene}^{ii}$ in the scene are found for each descriptor $f_{model}^i$ in the model. The distance ratio is calculated as follows:

$$\tau = \frac{\left| f_{scene}^i - f_{model}^i \right|}{\left| f_{scene}^{ii} - f_{model}^i \right|}$$

This can also be understood as the ratio of the similarity of $f_{model}^i$ to $f_{scene}^i$ and to $f_{scene}^{ii}$.
Only if the ratio $\tau$ is less than the threshold $\tau_{th}$ are the descriptors $f_{model}^i$ and $f_{scene}^i$ matched, and the key points of these two descriptors form a corresponding key point pair. After that, the transformation matrix between the model and scene point clouds is calculated from the corresponding key point pairs by the SVD method. Finally, the model point cloud can be transformed into the scene point cloud.
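A sketch of this NNDR matching step, with model and scene descriptors stacked as rows of NumPy arrays; the KD-tree two-nearest-neighbor query implements the ratio defined above.

```python
import numpy as np
from scipy.spatial import cKDTree

def nndr_match(model_desc, scene_desc, tau_th=0.5):
    """Keep a pair only when the top-two distance ratio is below tau_th."""
    dists, ids = cKDTree(scene_desc).query(model_desc, k=2)
    ratio = dists[:, 0] / np.maximum(dists[:, 1], 1e-12)
    keep = ratio < tau_th
    return np.flatnonzero(keep), ids[keep, 0]   # model and scene indices
```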
Ideally, all corresponding key point pairs overlap exactly, point to point. Owing to the limitations of the NNDR algorithm and differences in the descriptive ability of the descriptors, some wrongly matched corresponding key point pairs occur. These wrongly matched pairs introduce errors into the calculated transformation matrix, leaving a residual distance after transformation between each model key point and its scene key point. Therefore, we can use the same NNDR key point matching algorithm to evaluate the descriptive ability of different descriptors.
After transformation, if the distance between two correctly matched corresponding key points is less than 0.5 r, the pair is regarded as a true positive correspondence; otherwise, it is regarded as a false positive correspondence. Moreover, if the distance between wrongly matched corresponding key points is more than 0.5 r, the pair is regarded as a false negative correspondence.
Many groups of precision and recall values can be obtained by varying the ratio threshold $\tau_{th}$ in NNDR, so the $PR$ curve can be generated as follows:

$$precision = \frac{TP}{TP + FP}$$

$$recall = \frac{TP}{TP + FN}$$

where $TP$ is the number of true positive correspondences, $FP$ the number of false positive correspondences, and $FN$ the number of false negative correspondences.
According to the principle of NNDR, more corresponding key point pairs are obtained when the threshold $\tau_{th}$ is raised; the precision decreases, but more true positive correspondences are obtained, so the recall increases. Conversely, fewer corresponding key point pairs are obtained when the threshold $\tau_{th}$ is reduced; the precision increases, but some correctly matched corresponding key point pairs are missed, so the recall decreases. Thus, the $PR$ curve should be a decreasing curve. In general, a descriptor is effective if its precision remains high as the recall increases.
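Each threshold value thus contributes one point on the curve; a sketch follows, in which the ground-truth transform `gt` and the matched index arrays are assumptions.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision and recall as defined above; sweep tau_th in nndr_match
    to trace out the PR curve one (precision, recall) point at a time."""
    return tp / (tp + fp), tp / (tp + fn)

# Counting one threshold's outcomes (gt, m_ids, s_ids, r are assumed given):
# d = np.linalg.norm(gt(kp_model[m_ids]) - kp_scene[s_ids], axis=1)
# tp, fp = np.sum(d < 0.5 * r), np.sum(d >= 0.5 * r)
```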

4.1.3. Robustness to Noise

To evaluate the robustness of the descriptor to noise, Gaussian noise with peak intensities of 0.05 r, 0.1 r, and 0.2 r, respectively, is added to the scene point clouds. The feature descriptors based on the key points are then calculated in the scenes, and the feature descriptors of the noise-free model point clouds are also calculated. Key point matching experiments are performed to generate the $PR$ curves, and our descriptor is compared with FPFH, RoPS, SHOT, and SpinImage. Two examples of scene point clouds with Gaussian noise of different peak intensities are shown in Figure 8.
Figure 8. Examples of the scenes with different peak intensity of Gaussian noise. (a) 0.05 r. (b) 0.1 r. (c) 0.2 r.
The steps of the experiments are as follows.
(1) First, the key points of the model and scene point clouds are detected and recorded as $KP_m$ and $KP_s$, respectively. Feature descriptors are generated from $KP_m$ and $KP_s$; the feature set $F_{model}$ is built from all the model descriptors, and $F_{scene}$ from all the scene descriptors.
(2) Based on $F_{scene}$, a KD-tree is constructed. Through the kNN search algorithm, each descriptor in $F_{model}$ retrieves several similar descriptors in $F_{scene}$.
(3) Finally, the correspondences between $KP_m$ and $KP_s$ are constructed using NNDR. As mentioned in Section 4.1.2, many groups of precision and recall values can be obtained by varying the ratio threshold $\tau_{th}$ in NNDR, so the $PR$ curve can be generated.
The $PR$ curves in Figure 9 show the performance of the different descriptors. Our descriptor is more robust to noise than the others, with SHOT second best. This is because the proposed descriptor extracts features from multiple aspects, especially the density, and generates a statistical histogram; after projection, the histogram generated from the local point density is insensitive to noise, so the robustness and descriptive ability of the descriptor are preserved. However, as Figure 9c shows, the descriptive ability of our descriptor is also reduced at the highest noise level, because the average curvature histogram used in our descriptor improves descriptive ability at the cost of robustness to Gaussian noise.
Figure 9. PR curves in scenes with different noise. (a) Gaussian noise σ = 0.05 r. (b) Gaussian noise σ = 0.1 r. (c) Gaussian noise σ = 0.2 r.

4.1.4. Robustness to Varying Mesh Resolution

To evaluate the robustness of the descriptor to varying mesh resolution, the scene point clouds are downsampled to 25%, 50%, and 75%, respectively. The feature descriptors based on the key points are then calculated in the scenes, and the feature descriptors of the noise-free model point clouds are also calculated. Key point matching experiments are performed to generate the $PR$ curves, and our descriptor is compared with FPFH, RoPS, SHOT, and SpinImage. Two examples of scene point clouds with different mesh resolutions are shown in Figure 10.
Figure 10. Examples of scenes with different mesh resolution. (a) 75%. (b) 50%. (c) 25%.
Here, the experimental steps are approximately identical to those of the previous section, except that the scenes are downsampled instead of having noise added.
The $PR$ curves in Figure 11 show the performance of the different descriptors under different mesh resolutions. Our descriptor is more robust than the others, with RoPS second best. Because the proposed descriptor extracts geometric attribute features of the points, such as normals and curvatures, its descriptive ability is maintained even with low mesh resolution, occlusion, and incomplete regions in the scene point cloud. Although our descriptor does not perform well when the point clouds are downsampled to 25%, such a coarse mesh resolution is rare in actual work, and our descriptor performs well at 75% and 50% downsampling. Therefore, our feature descriptor is robust to varying mesh resolution.
Figure 11. PR curves at different mesh resolutions. (a) Downsampling 75%. (b) Downsampling 50%. (c) Downsampling 25%.

4.1.5. Key Point Matching Based on Descriptors with Single Model

In this experiment, we use six point cloud models from the Stanford University dataset as the model point clouds. For the scene point clouds, Gaussian noise ($\sigma$ = 0.1 r) is added to each model, and these point clouds are then rotated and translated to a new position, so we regard them as the scene point clouds; the noise-free model point clouds remain in their initial positions. The experiment performs pairwise registration between the model and scene point clouds. After feature descriptor extraction and feature matching by NNDR, correspondences are constructed between the model and scene point clouds. Figure 12 shows examples of the key point matching results between the model point clouds (in green) and the scene point clouds (in blue); the red lines connect the corresponding key points.
Figure 12. Examples of matching results between models and Gaussian noise models. (a) Corresponding key point pairs in Armadillo. (b) Corresponding key point pairs in Asian Dragon. (c) Corresponding key point pairs in Happy Buddha.
In general, the more parallel red lines there are, the more correctly matched corresponding key point pairs there are; a red line that is not parallel to most of the others represents a wrongly matched corresponding key point pair.
From the resulting corresponding key point pairs, the SVD method is used to calculate the rotation matrix $R_d$ and the translation matrix $T_d$. Wrongly matched corresponding key point pairs introduce errors into the rotation and translation matrices, so an effective feature descriptor should yield more correctly matched pairs. For the scene point clouds described above, the ground-truth rotation matrix is denoted $R_{gt}$ and the translation matrix $T_{gt}$. If the errors between $R_d$ and $R_{gt}$ and between $T_d$ and $T_{gt}$ are both small, many corresponding key point pairs were matched correctly, reflecting good pairwise registration performance of the descriptor. The rotation error $\theta_r$ and the translation error $\theta_t$ are defined as follows:
$$\theta_r = \arccos\left(\frac{\mathrm{trace}\left(R_d R_{gt}^{-1}\right) - 1}{2}\right) \cdot \frac{180}{\pi}$$

$$\theta_t = \frac{\left\| T_d - T_{gt} \right\|}{d_r}$$

Here, $\mathrm{trace}$ denotes the sum of the diagonal elements of the matrix, and $d_r$ is set to 0.5 r.
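These two errors can be computed directly from the estimated and ground-truth transforms; a minimal sketch:

```python
import numpy as np

def registration_errors(R_d, T_d, R_gt, T_gt, d_r):
    """Rotation error in degrees and normalized translation error,
    following the two formulas above (d_r = 0.5 r in the paper)."""
    cos_theta = (np.trace(R_d @ np.linalg.inv(R_gt)) - 1.0) / 2.0
    theta_r = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
    theta_t = np.linalg.norm(T_d - T_gt) / d_r
    return theta_r, theta_t
```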
Based on the different descriptors, Table 1 shows the rotation and translation errors after feature matching. The errors obtained with the proposed descriptor are smaller than those of the other descriptors, which further demonstrates that the descriptive ability of MSHD is better and reflects its robustness and effectiveness.
Table 1. Errors of Stanford 3D models.

4.2. Matching Algorithm for Key Points between Model and Multi-Object Scene

As real point cloud data are usually collected by a laser scanner, occlusion, incomplete regions, and similar defects are inevitable in collected point clouds with multiple objects. As shown in Figure 13, to reflect the characteristics of such real data, three models from Random Views are selected: the Bunny, the Dragon, and the Happy Buddha (in green), together with three scene point clouds containing these models (in blue). Furthermore, two models from the Space Time dataset [] are selected: the Mario and the Rex (in green), together with two scene point clouds containing them (in blue). In total, five models and five scenes are used in the experiment. Each scene point cloud exhibits occlusion, truncation, incomplete regions, and other problems, while the model point clouds from the Space Time dataset are single-view point clouds. These selected point clouds reproduce the characteristics of real data, such as multi-object scenes and single-view captures, and help us evaluate the effectiveness of our key point matching algorithm.
Figure 13. Point clouds that were selected in the experiment.
In this experiment, based on our descriptor, the key point matching algorithm is used to obtain the corresponding key point pairs between the models and scenes, and the rotation and translation matrices are then calculated for 3D surface matching, so the models can be matched into the scene point clouds. The experimental results of our key point matching algorithm are compared with the commonly used NN and NNDR; following its principle, $\tau_{th}$ is set to 0.5 in NNDR, which gives its best performance. All experiments are performed on the five model and five scene point clouds described above.
The results of 3D surface matching are shown in Figure 14. With NN and NNDR, the model and scene do not match well, because NN directly matches the most similar key point and NNDR considers only the top-two similar key points in the scene. The limitations of these two algorithms produce many wrongly matched corresponding key point pairs, introducing errors into the transformation matrix calculated from all the correspondences between the model and scene point clouds, so the 3D surface matching results are unsatisfactory. Moreover, owing to its strict conditions and the top-two restriction, NNDR sometimes cannot obtain enough, or even any, matched corresponding key point pairs; with fewer than three pairs, the transformation matrix cannot be calculated and the model cannot be moved into position. In contrast, the 3D surface matching results of our key point matching algorithm are much better, indicating many more correctly matched corresponding key point pairs.
Figure 14. Surface matching results of three methods tested on the selected datasets. The models (in green) and the scenes (in blue) to be matched (a,e,i,m,q). NN results (b,f,j,n,r), NNDR results (c,g,k,o,s), and our method results (d,h,l,p,t).
Furthermore, Table 2 shows that the error of our method is much smaller than that of NN. The word “matched” in the table denotes the number of corresponding key point pairs. Analyzing the results together with the errors $\theta_r$ and $\theta_t$, our method obtains more correctly matched corresponding key point pairs than NN and NNDR; NNDR cannot obtain even a single matched corresponding key point pair for the Bunny, Happy Buddha, and Rex data. Therefore, our method is more robust and effective in processing data with occlusion, truncation, and incomplete regions.
Table 2. Errors of three key point matching algorithms on models and scenes.

4.3. Matching Algorithm for Real Data

In this experiment, real component point cloud data from the train bottom are used; they were collected by 3D laser scanning with a three-megapixel industrial camera and include a wheel hub, a base edge, a tie rod, and bolts (Figure 15). All the real data were preprocessed to improve their quality. The results of 3D surface matching are shown in Figure 15, and Table 3 shows that the error of our method is still much smaller than that of NN. NNDR is as effective as our method here, but our method obtains more corresponding key point pairs, which benefits the final fine registration.
Figure 15. Results of 3D surface matching on real data. The upper row shows the initial positions of these components; the lower row shows the results of 3D surface matching.
Table 3. Errors of the real data.

5. Conclusions

This paper introduces a 3D point cloud surface matching method, comprising a multi-statistics histogram descriptor that combines spatial distribution and geometric attribute features and a novel key point matching algorithm based on deep learning that identifies more corresponding point pairs than existing methods. Experimental results on the Stanford dataset show that MSHD outperforms the baselines on data with noise, occlusion, and incomplete regions; MSHD is strongly robust to noise and mesh resolution while retaining strong descriptive ability. Our key point matching algorithm is evaluated on the Stanford 3D dataset and on four real component point clouds from the train bottom. The 3D surface matching experiments show that more corresponding key point pairs are obtained, and combined with the rotation and translation matrix errors, they confirm that the error of our method is much smaller and that more precisely matched corresponding key point pairs are captured, resulting in enhanced recognition and registration.

Author Contributions

Conceptualization, J.L., B.C. and M.Y.; methodology, B.C., M.Y. and Q.Z.; validation, J.L. and B.C.; formal analysis, J.L., L.L. and X.G.; data curation, J.L. and B.C.; writing—original draft preparation, B.C. and M.Y.; writing—review and editing, B.C. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (NSFC) and the Sichuan Science and Technology Planning Project.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank the anonymous reviewers and the editorial team for their comments and contributions. We thank the Stanford 3D Scanning Repository for providing the data.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. L, Q.; Z, L.; L, J. Research progress in three-dimensional object recognition. J. Image Graph. 2000, 5, 985–993. [Google Scholar]
  2. Huang, Z.; Yu, Y.; Xu, J.; Ni, F.; Le, X. PF-Net: Point Fractal Network for 3D Point Cloud Completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  3. Li, H.; Hartley, R. The 3D-3D Registration Problem Revisited. In Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 14–21 October 2007; pp. 1–8. [Google Scholar]
  4. Guo, Y.; Sohel, F.; Bennamoun, M.; Lu, M.; Wan, J. Rotational Projection Statistics for 3D Local Surface Description and Object Recognition. Int. J. Comput. Vis. 2013, 105, 63–86. [Google Scholar] [CrossRef] [Green Version]
  5. Guo, Y.; Bennamoun, M.; Sohel, F.; Min, L.; Wan, J.; Kwok, N.M. A Comprehensive Performance Evaluation of 3D Local Feature Descriptors. Int. J. Comput. Vis. 2016, 116, 66–89. [Google Scholar] [CrossRef]
  6. Johnson, A.E. Spin-Images: A Representation for 3-D Surface Matching. Ph.D. Thesis, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA, 1997. [Google Scholar]
  7. Halma, A.; Haar, F.T.; Bovenkamp, E.; Eendebak, P.; Eekeren, A.V. Single spin image-ICP matching for efficient 3D object recognition. In Proceedings of the ACM Workshop on 3D Object Retrieval, Firenze, Italy, 25 October 2010. [Google Scholar]
  8. Rusu, R.B.; Blodow, N.; Beetz, M. Fast Point Feature Histograms (FPFH) for 3D registration. In Proceedings of the IEEE International Conference on Robotics & Automation, Kobe, Japan, 12–17 May 2009. [Google Scholar]
  9. Tombari, F.; Salti, S.; Stefano, L.D. Unique Signatures of Histograms for Local Surface Description. In Proceedings of the European Conference on Computer Vision Conference on Computer Vision, Crete, Greece, 5–11 September 2010. [Google Scholar]
  10. Yang, B.; Zang, Y. Automated registration of dense terrestrial laser-scanning point clouds using curves. ISPRS J. Photogramm. Remote Sens. 2014, 95, 109–121. [Google Scholar] [CrossRef]
  11. Oomori, S.; Nishida, T.; Kurogi, S. Point cloud matching using singular value decomposition. Artif. Life Robot. 2016, 21, 149–154. [Google Scholar] [CrossRef]
  12. Tombari, F.; Salti, S.; Stefano, L.D. Unique shape context for 3d data description. In 3DOR 2010: Proceedings of the ACM Workshop on 3D Object Retrieval; ACM: Firenze, Italy, 2011. [Google Scholar]
  13. Wang, X.L.; Liu, Y.; Zha, H. Intrinsic Spin Images: A subspace decomposition approach to understanding 3D deformable shapes. Procdpvt 2010, 10, 17–20. [Google Scholar]
  14. Rusu, R.B.; Blodow, N.; Marton, Z.C.; Beetz, M. Aligning Point Cloud Views using Persistent Feature Histograms. In Proceedings of the 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, Acropolis Convention Center, Nice, France, 22–26 September 2008. [Google Scholar]
  15. Chen, H.; Bhanu, B. 3D free-form object recognition in range images using local surface patches. Pattern Recognit. Lett. 2007, 28, 1252–1262. [Google Scholar] [CrossRef]
  16. Sun, J.; Ovsjanikov, M.; Guibas, L. A Concise and Provably Informative Multi-Scale Signature Based on Heat Diffusion. Comput. Graph. Forum 2009, 28, 1383–1392. [Google Scholar] [CrossRef]
  17. Lu, B.; Wang, Y. Matching Algorithm of 3D Point Clouds Based on Multiscale Features and Covariance Matrix Descriptors. IEEE Access 2019. [Google Scholar] [CrossRef]
  18. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  19. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Proceedings of the NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Online, 4 December 2017. [Google Scholar]
  20. Li, Y.; Bu, R.; Sun, M.; Chen, B. PointCNN. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NIPS), Montreal, Canada, 2–8 December 2018. [Google Scholar]
  21. Wu, W.; Qi, Z.; Li, F. PointConv: Deep Convolutional Networks on 3D Point Clouds. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  22. He, B.; Lin, Z.; Li, Y.F. An automatic registration algorithm for the scattered point clouds based on the curvature feature. Opt. Laser Technol. 2013, 46, 53–60. [Google Scholar] [CrossRef]
  23. Yang, J.; Li, H.; Campbell, D.; Jia, Y. Go-ICP: A Globally Optimal Solution to 3D ICP Point-Set Registration. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 2241–2254. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  24. Hong, S.; Ko, H.; Kim, J. VICP: Velocity Updating Iterative Closest Point Algorithm. In Proceedings of the IEEE International Conference on Robotics & Automation, Anchorage, AK, USA, 3–7 May 2012. [Google Scholar]
  25. Yang, J.; Li, H.; Jia, Y. Go-ICP: Solving 3D Registration Efficiently and Globally Optimally. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013. [Google Scholar]
  26. Censi, A. An ICP variant using a point-to-line metric. In Proceedings of the IEEE International Conference on Robotics & Automation, Pasadena, CA, USA, 19–23 May 2008. [Google Scholar]
  27. Magnusson, M.; Lilienthal, A.; Duckett, T. Scan registration for autonomous mining vehicles using 3D-NDT. J. Field Robot. 2010, 24, 803–827. [Google Scholar] [CrossRef] [Green Version]
  28. Chang, S.; Ahn, C.; Lee, M.; Oh, S. Graph-matching-based correspondence search for nonrigid point cloud registration. Comput. Vis. Image Underst. 2020, 192, 102899.1–102899.12. [Google Scholar] [CrossRef]
  29. Li, J.; Qian, F.; Chen, X. Point Cloud Registration Algorithm Based on Overlapping Region Extraction. J. Phys. Conf. Ser. 2020, 1634, 012012. [Google Scholar] [CrossRef]
  30. He, Y.; Lee, C.H. An Improved ICP Registration Algorithm by Combining PointNet++ and ICP Algorithm. In Proceedings of the 2020 6th International Conference on Control, Automation and Robotics (ICCAR), Singapore, 20–23 April 2020. [Google Scholar]
  31. Kamencay, P.; Sinko, M.; Hudec, R.; Benco, M.; Radil, R. Improved Feature Point Algorithm for 3D Point Cloud Registration. In Proceedings of the 2019 42nd International Conference on Telecommunications and Signal Processing (TSP), Budapest, Hungary, 1–3 July 2019. [Google Scholar]
  32. Xiong, F.; Dong, B.; Huo, W.; Pang, M.; Han, X. A Local Feature Descriptor Based on Rotational Volume for Pairwise Registration of Point Clouds. IEEE Access 2020, 8, 100120–100134. [Google Scholar]
  33. Taati, B.; Greenspan, M. Local shape descriptor selection for object recognition in range data. Comput. Vis. Image Underst. 2011, 115, 681–694. [Google Scholar] [CrossRef]
  34. Papazov, C.; Haddadin, S.; Parusel, S.; Krieger, K.; Burschka, D. Rigid 3D geometry matching for grasping of known objects in cluttered scenes. Int. J. Robot. Res. 2012, 31, 538–553. [Google Scholar] [CrossRef]
  35. Yu, Z. Intrinsic shape signatures: A shape descriptor for 3D object recognition. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Kyoto, Japan, 27 September–4 October 2010. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
