A 6D Pose Estimation for Robotic Bin-Picking Using Point-Pair Features with Curvature (Cur-PPF)

Pose estimation is a particularly important step in robotic bin-picking. Its purpose is to obtain the 6D pose (3D position and 3D orientation) of the target object. In real bin-picking scenarios, noise, overlap, and occlusion reduce the accuracy of pose estimation and lead to grasping failures. In this paper, a new point-pair feature (PPF) descriptor is proposed, in which curvature information of point-pairs is introduced to strengthen the feature description and improve the point cloud matching rate. The proposed method also introduces an effective point cloud preprocessing step, which extracts candidate targets in complex scenarios and thus improves the overall computational efficiency. By incorporating the curvature distribution, a weighted voting scheme is presented to further improve the accuracy of pose estimation. Experimental results on a public data set and in real scenarios show that the proposed method is considerably more accurate and more efficient than the existing PPF method. The proposed method can be used for robotic bin-picking in real industrial scenarios.


Introduction
Bin-picking is a common industrial task in which a robotic arm removes objects that are placed in disorder. Varying degrees of overlap and occlusion interfere with the detection and perception of objects and cause the robotic grasping task to fail [1]. Bin-picking is therefore challenging and has attracted many researchers [2][3][4].
The key to bin-picking is to calculate the pose of the best picking point of the target object [5], namely, 6D pose estimation. According to current research, pose estimation methods can be divided into correspondence-based, template-based, voting-based, and deep learning-based methods [6].
The correspondence method finds the relationship between the input data and a known point cloud model. According to the type of input data, it can be divided into 2D-3D correspondence and 3D-3D correspondence [7]. The 2D-3D correspondence method is often used for objects with rich textures. The point cloud model is projected from multiple angles, and the relationship between the template images and the RGB image of the target object from a single viewpoint is found through feature points. Then, the Perspective-n-Point (PnP) algorithm is used to recover the pose for the current viewpoint. For example, Hu et al. [8] introduced a segmentation-driven network framework for 6D pose estimation. This method predicts local poses from the 2D keypoint positions of objects in the scene, thereby generating a set of reliable 3D-to-2D correspondences, and then uses the PnP algorithm to calculate the accurate pose of each object. It remains robust when objects overlap, but it is not suitable for untextured objects. In the 3D-3D correspondence method, the acquired depth image is converted into a 3D point cloud, and then the relationship between the two point clouds is solved [...]
[...] this method, the normal vector in the original feature is changed to a tangent vector to enhance the feature representation of objects. A multi-edge appearance method of model description was proposed to improve efficiency by reducing useless point-pair matching. Vidal et al. [23] presented a method to estimate 6D poses of free-form objects in the presence of clutter and occlusions. By considering the judgment value of surface information, a new viewpoint-dependent re-scoring process and two scene consistency verification steps were proposed to reduce false-positive cases. Ruel et al. [24] proposed the 3DLASSO system, which was designed to perform real-time tracking and 6D pose estimation of a target spacecraft from sparse and noisy 3D data. Different from the PPF method, larger polygons are used instead of point-pairs in a similar setup, and a faster version of the ICP algorithm is developed for pose estimation. The algorithm is quite robust to sensor noise and deviations from the reference model, but poses that do not provide enough geometric information to the algorithm showed larger errors.
To solve the bin-picking problem in industry, we extend and improve the method of Drost et al. [17]. The main contributions of the proposed method are: (1) An effective method for extracting the point clouds of candidate targets is adopted in the preprocessing step. Specifically, the organized scene point cloud is mapped to a grayscale image, and the segmented grayscale images are mapped back to the point cloud. After thresholding, only the point clouds of unobstructed target objects in the scene are retained; (2) A new point-pair feature descriptor is proposed, which adds curvature information to the PPF to effectively enhance the description of point-pair features; (3) In the pose voting step, a new weighted voting scheme is proposed by incorporating the curvature distribution of the model, which gives more weight to high-information point-pairs, thereby further improving the accuracy of pose estimation.
The rest of this paper is organized as follows. The proposed method is presented in Section 2. Experimental results and discussions are given in Section 3. The conclusion is provided in Section 4.

The Proposed Method
Our work is based on the method proposed by Drost et al. [17]. By improving and optimizing the PPF, the 6D poses of target objects can be estimated accurately in complex industrial scenarios, enabling the robotic arm to complete bin-picking tasks. The flow chart of the proposed method is shown in Figure 1; it comprises an offline phase and an online phase.
In the offline stage, the CAD model of the object is used to generate the model point cloud, as shown in Figure 2. First, the generated model point cloud is preprocessed, which mainly includes point cloud downsampling, normal calculation, and curvature calculation. Because the model point cloud contains a large number of points, which causes redundant computation, downsampling is required to speed up processing. The normals and curvatures of the point cloud are calculated in preparation for the PPF [25]. Then, the model is divided into a high-curvature part and a low-curvature part according to the curvature distribution of the model point cloud, so that weighted pose voting can be applied to point-pairs with high information. Finally, the six-dimensional features of the model point-pairs are calculated and stored in the hash table.

In the online stage, the main task is to calculate the 6D poses of target objects through PPF matching in order to achieve precise grasping. We use a 3D sensor to obtain the organized scene point cloud, from which outliers are removed and which is then mapped to gray images. The watershed algorithm [26] is used to segment the gray images, and candidate targets are extracted. For the segmented point cloud, the same preprocessing and feature calculation are performed as in the offline stage. By finding PPFs similar to those of the target objects in the hash table, transformations between model point-pairs and scene point-pairs are derived, and weighted pose voting is completed in the two-dimensional accumulator. Finally, poses are clustered, and the average of the poses in the highest-scoring cluster is used as the output result. The ICP algorithm is used to refine the pose estimate. In the following sections, we elaborate on all aspects of the proposed method, especially the differences from the PPF.

Preprocessing
The preprocessing includes point cloud downsampling, normal calculation and curvature calculation. The point cloud downsampling and normal calculation are the same as the method by Drost et al. [17]. In the following, we focus on the point cloud curvature calculation.
Curvature reflects the degree of bending of a geometry [27]. In three-dimensional space, the curvature of the point cloud provides additional information for feature matching, which can effectively reduce matching errors [28]. Geometrically, curvature can be divided into principal curvature, Gaussian curvature, and average curvature. The principal curvatures are the normal curvatures in the principal directions at a point on the surface; they are also the maximum and minimum values of the normal curvature over all directions at that point. At any point in the point cloud, there is a surface $z = r(x, y)$ approximating the neighborhood of this point. Denoting the principal curvature at this point by $k_n$, $k_n$ satisfies:

$$(EG - F^2)k_n^2 - (LG - 2MF + NE)k_n + (LN - M^2) = 0 \tag{2}$$

The principal curvature $k_n$ is obtained by solving this quadratic equation. In the formula, $E = r_x \cdot r_x$, $F = r_x \cdot r_y$, $G = r_y \cdot r_y$, $L = r_{xx} \cdot n$, $M = r_{xy} \cdot n$, $N = r_{yy} \cdot n$, where $r_x$, $r_y$, $r_{xx}$, $r_{yy}$, $r_{xy}$ are the partial derivatives of the surface $z = r(x, y)$, and $n$ is the unit normal vector of the tangent plane of the surface at the point $(x_0, y_0)$, that is, $n = \left. \frac{r_x \times r_y}{\| r_x \times r_y \|} \right|_{(x_0, y_0)}$. $(E, F, G)$ are called the first fundamental quantities of the surface, and $(L, M, N)$ are called the second fundamental quantities of the surface. The Gaussian curvature of a point on the surface is the product of the two principal curvatures, which characterizes the overall curvature of the local area; it is denoted $K$, that is, $K = k_1 k_2$. The average curvature of a point on the surface is the average of the two principal curvatures; it is denoted $H$, that is, $H = (k_1 + k_2)/2$. Combining the principal curvature Formula (2) with Vieta's theorem, the Gaussian curvature and the average curvature are:

$$K = k_1 k_2 = \frac{LN - M^2}{EG - F^2}, \qquad H = \frac{k_1 + k_2}{2} = \frac{LG - 2MF + NE}{2(EG - F^2)}$$

To better describe the variation of the point cloud, we use the average curvature to represent the curvature characteristics.
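As an illustration of the curvature formulas above, the following Python sketch evaluates the Gaussian, average, and principal curvatures from the six fundamental quantities. The helper `monge_patch_forms` is a hypothetical convenience that assumes the local surface is written as $r(x, y) = (x, y, f(x, y))$ with known partial derivatives of $f$; how the surface is fitted to the point cloud neighborhood is not covered here.

```python
import math

def curvatures_from_forms(E, F, G, L, M, N):
    """Return (k1, k2, K, H) from the first and second fundamental quantities."""
    denom = E * G - F * F
    if denom <= 0.0:
        raise ValueError("degenerate first fundamental form")
    K = (L * N - M * M) / denom                       # Gaussian curvature
    H = (L * G - 2.0 * M * F + N * E) / (2.0 * denom)  # average curvature
    root = math.sqrt(max(H * H - K, 0.0))             # clamp round-off below zero
    return H + root, H - root, K, H

def monge_patch_forms(fx, fy, fxx, fxy, fyy):
    """Fundamental quantities of r(x, y) = (x, y, f(x, y)) (assumed surface form)."""
    w = math.sqrt(1.0 + fx * fx + fy * fy)            # |r_x x r_y|
    return (1.0 + fx * fx, fx * fy, 1.0 + fy * fy,
            fxx / w, fxy / w, fyy / w)

# Example: paraboloid z = (2 x^2 + 0.5 y^2) / 2 evaluated at the origin.
k1, k2, K, H = curvatures_from_forms(*monge_patch_forms(0.0, 0.0, 2.0, 0.0, 0.5))
print(k1, k2, K, H)
```

For the paraboloid in the example, the principal curvatures at the origin are simply the two second derivatives, which gives a quick sanity check of the formulas.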

Cur-PPF Feature Extraction and Hash
The proposed Cur-PPF is a six-dimensional feature vector built from the distance between two points, their normal vectors, and their average curvatures. Compared with the original PPF, the added curvature information enhances the feature description of point-pairs. Cur-PPF is shown in Figure 3. For any point-pair $(m_1, m_2)$, $m_1$ and $m_2$ are two points in the model point cloud, $n_1$ and $n_2$ are the normal vectors of these two points, $q_1$ and $q_2$ are the average curvatures of the two points, and $d = m_2 - m_1$. The feature $F_1$ is:

$$F_1 = F_{\text{Cur-PPF}}(m_1, m_2) = \left( \|d\|_2, \angle(n_1, d), \angle(n_2, d), \angle(n_1, n_2), q_1, q_2 \right)$$

where $\|d\|_2$ represents the Euclidean distance between the two points, and $\angle(a, b) \in [0, \pi]$ denotes the angle between two vectors. It should be noted that the feature $F_{\text{Cur-PPF}}$ is asymmetric: $F_{\text{Cur-PPF}}(m_2, m_1)$ is different from $F_1$, so it is stored in another slot of the hash table and is denoted by $F_2$.
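The feature and its hash key can be sketched as follows. This is a minimal illustration, not the authors' C++ implementation: the function names, the tuple-based key, and the use of floor quantization are assumptions, while the step names `d_dist`, `d_angle`, and `d_cur` follow the parameters reported in the experiments.

```python
import math
from collections import defaultdict

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _angle(a, b):
    """Angle between two vectors in [0, pi]; 0 for a zero-length vector."""
    na, nb = math.sqrt(_dot(a, a)), math.sqrt(_dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return math.acos(max(-1.0, min(1.0, _dot(a, b) / (na * nb))))

def cur_ppf(m1, n1, q1, m2, n2, q2):
    """Six-dimensional Cur-PPF of the ordered pair (m1, m2)."""
    d = _sub(m2, m1)
    return (math.sqrt(_dot(d, d)), _angle(n1, d), _angle(n2, d),
            _angle(n1, n2), q1, q2)

def hash_key(feature, d_dist, d_angle, d_cur):
    """Quantize a feature into an integer tuple (assumed key layout)."""
    dist, a1, a2, a3, q1, q2 = feature
    return (math.floor(dist / d_dist), math.floor(a1 / d_angle),
            math.floor(a2 / d_angle), math.floor(a3 / d_angle),
            math.floor(q1 / d_cur), math.floor(q2 / d_cur))

def build_table(points, normals, curvatures, d_dist, d_angle, d_cur):
    """Store every ordered model pair (i, j), i != j, under its quantized key."""
    table = defaultdict(list)
    for i in range(len(points)):
        for j in range(len(points)):
            if i != j:
                f = cur_ppf(points[i], normals[i], curvatures[i],
                            points[j], normals[j], curvatures[j])
                table[hash_key(f, d_dist, d_angle, d_cur)].append((i, j))
    return table

# Two-point toy model (coordinates in mm, curvature values are made up).
pts = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
nrm = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
cur = [0.02, 0.12]
f12 = cur_ppf(pts[0], nrm[0], cur[0], pts[1], nrm[1], cur[1])
f21 = cur_ppf(pts[1], nrm[1], cur[1], pts[0], nrm[0], cur[0])
key12 = hash_key(f12, 3.0, math.radians(12.0), 0.07)
table = build_table(pts, nrm, cur, 3.0, math.radians(12.0), 0.07)
```

Because the feature is asymmetric, the two orderings of the same pair produce different features and land in different slots of the table.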

Point Cloud Segmentation and Candidate Target Selection
Effectively extracting target objects in complex scenarios is very helpful for feature matching, so scene point cloud segmentation is performed. Point cloud segmentation methods can be divided into two categories [29]. The first is the direct method, in which the point cloud is segmented directly, such as the Euclidean distance segmentation algorithm [30] integrated in the PCL library [31]. Its principle is to pick a point in space, find the n points closest to it through a KdTree, and test their distances to it. If a distance is less than the threshold, the neighbor is considered to belong to the same cluster. This algorithm has to traverse all the points in space, which is complicated and time-consuming, so it is not suitable for real-time systems. The second is the indirect method. The point cloud is mapped to a two-dimensional image for segmentation, and the segmented images are then mapped back to three-dimensional space to achieve point cloud segmentation. This method is based on two-dimensional image processing, with high accuracy and low time consumption [32].
Because the point cloud is obtained by the 3D sensor in this system and the order of the point cloud is known [33], we chose the second method to achieve point cloud segmentation. First, the organized point cloud is projected onto the plane spanned by the x-axis and y-axis of the coordinate system, and the effective detection range of the depth value in the z-axis direction is mapped to the gray value. Then the watershed segmentation algorithm [26] is used to segment the gray image, so that the image is divided into several disjoint local areas. Finally, the gray images are mapped back to three-dimensional space to complete the point cloud segmentation. For a more detailed understanding of the segmentation process, we describe it using pseudocode, which is shown in Algorithm 1.
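The mapping between the organized point cloud and the gray image can be sketched as follows. This is only an illustration under stated assumptions: the organized cloud is represented as a grid of `(x, y, z)` tuples or `None`, valid depths are mapped linearly to gray values 1 to 255 with 0 reserved for invalid pixels, and a simple 4-connected component labelling is used as a stand-in for the watershed algorithm, which is what the text actually uses.

```python
from collections import deque

def depth_to_gray(cloud, z_min, z_max):
    """Map the depth of each grid cell to a gray value (0 marks invalid cells)."""
    gray = []
    for row in cloud:
        out = []
        for p in row:
            if p is None or not (z_min <= p[2] <= z_max):
                out.append(0)
            else:
                out.append(1 + round(254 * (p[2] - z_min) / (z_max - z_min)))
        gray.append(out)
    return gray

def label_regions(gray):
    """Stand-in for watershed: label 4-connected regions of non-zero pixels."""
    h, w = len(gray), len(gray[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for r in range(h):
        for c in range(w):
            if gray[r][c] == 0 or labels[r][c] != 0:
                continue
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and gray[ny][nx] != 0 and labels[ny][nx] == 0):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels

def labels_to_segments(cloud, labels):
    """Map labelled pixels back to 3D points using the shared grid indices."""
    segments = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab != 0:
                segments.setdefault(lab, []).append(cloud[r][c])
    return segments

# Tiny 2 x 4 organized cloud (depth in mm); None marks missing measurements.
cloud = [[(0, 0, 500.0), (1, 0, 505.0), None, (3, 0, 700.0)],
         [(0, 1, 502.0), None, None, (3, 1, 705.0)]]
gray = depth_to_gray(cloud, 400.0, 800.0)
segments = labels_to_segments(cloud, label_regions(gray))
```

Since the point cloud is organized, the pixel at row `r` and column `c` always corresponds to the point stored at the same grid position, which is what makes the mapping back to three-dimensional space a simple index lookup.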
Overlap and occlusion are common in picking scenarios. Objects that are not occluded, or that have a large exposed surface, are given top priority as candidates to be grasped by the robotic arm, which also conforms to the logical order of grasping. Therefore, the grayscale images are thresholded after watershed segmentation. First, the single-sided point cloud of a single object in the scene is obtained by the 3D sensor and mapped to a grayscale image to obtain its number of pixels. Then, the number of pixels of each segmented local area is compared with the number of pixels on one side of the object. If the number of surface pixels is similar to the number of surface pixels on one side of the object, and the number of contour pixels is similar to the number of contour pixels on one side of the object, we consider the area to be a candidate to be grasped by the robotic arm. Finally, each pixel is mapped back to three-dimensional space to complete the segmentation and the selection of candidate targets. The three-way tube, a category in the test data set of this paper, is used as an example, as shown in Figure 5.
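The candidate selection rule can be sketched as a comparison of pixel counts. In this illustration, a contour pixel is taken to be a region pixel with at least one 4-neighbor outside the region, and "similar" is interpreted as a relative tolerance `tol`; both are assumptions, since the text does not give the exact criteria.

```python
def region_stats(labels):
    """Return {label: [surface_pixels, contour_pixels]} for a label image."""
    h, w = len(labels), len(labels[0])
    stats = {}
    for r in range(h):
        for c in range(w):
            lab = labels[r][c]
            if lab == 0:
                continue
            entry = stats.setdefault(lab, [0, 0])
            entry[0] += 1
            neighbors = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            # A pixel is on the contour if any neighbor lies outside the region.
            if any(not (0 <= y < h and 0 <= x < w) or labels[y][x] != lab
                   for y, x in neighbors):
                entry[1] += 1
    return stats

def select_candidates(stats, ref_surface, ref_contour, tol=0.2):
    """Keep regions whose pixel counts are within tol of the reference counts."""
    return [lab for lab, (surface, contour) in sorted(stats.items())
            if abs(surface - ref_surface) <= tol * ref_surface
            and abs(contour - ref_contour) <= tol * ref_contour]

# Region 1 is a fully visible 3 x 3 object; region 2 is a small exposed patch.
labels = [[0, 0, 0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0, 2, 0],
          [0, 1, 1, 1, 0, 2, 0],
          [0, 1, 1, 1, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0]]
stats = region_stats(labels)
candidates = select_candidates(stats, ref_surface=9, ref_contour=8)
```

Here the reference counts would come from the single-sided view of one unoccluded object; heavily occluded objects produce regions with far fewer pixels and are filtered out.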

Feature Matching
Feature matching refers to successfully finding PPFs of the model in the hash table so that the transformation can be calculated. In this paper, a local coordinate system is established for this purpose. Given a point-pair $(s_r, s_i)$ in the scene, the Cur-PPF of the point-pair is calculated and used as the key to find the corresponding model point-pairs $(m_r, m_i)$ in the hash table. The two points $s_r$ and $m_r$ are moved to the origin of the local coordinate system, and the normals of these two points are aligned with the x-axis, so that the object can be rotated around the normal to align the model with the scene, as shown in Figure 6. The transformation from the model to the scene can therefore be represented by a model point and a rotation angle $\alpha$, that is, $(m_r, \alpha)$. If the model point-pair $(m_r, m_i)$ and the scene point-pair $(s_r, s_i)$ have similar Cur-PPFs, the transformation between the two point-pairs can be calculated by Formula (6):

$$s_i = T_{s \to g}^{-1} R_x(\alpha) T_{m \to g} m_i \tag{6}$$

where $T_{m \to g}$ is a transformation with rotation and translation, which translates the reference point $m_r$ of the model point-pair $(m_r, m_i)$ to the origin of the coordinate system and at the same time rotates the normal vector $n_{m_r}$ of the reference point $m_r$ to the direction of the x-axis. $T_{s \to g}$ is also a transformation with rotation and translation, which translates the reference point $s_r$ of the scene point-pair $(s_r, s_i)$ to the origin of the coordinate system and at the same time rotates the normal vector $n_{s_r}$ of the reference point $s_r$ to the direction of the x-axis. $T_{s \to g}^{-1}$ is the inverse of $T_{s \to g}$, and $R_x(\alpha)$ is the rotation around the x-axis by the angle $\alpha$.
In order to speed up the calculation of $\alpha$, it can be divided into two parts, namely $\alpha = \alpha_m - \alpha_s$. Here, $\alpha_m$ is the angle by which the model point-pair $(m_r, m_i)$ is further rotated around the x-axis after the transformation $T_{m \to g}$, so that the point $m_i$ falls on the half-plane spanned by the x-axis and the positive half of the y-axis; $\alpha_s$ is the angle by which the scene point-pair $(s_r, s_i)$ is further rotated around the x-axis after the transformation $T_{s \to g}$, so that the point $s_i$ falls on the same half-plane. The direction of rotation is the same for both. The two parts are calculated independently of each other, so we can split $R_x(\alpha) = R_x(-\alpha_s) R_x(\alpha_m)$ and obtain

$$t = R_x(\alpha_s) T_{s \to g} s_i = R_x(\alpha_m) T_{m \to g} m_i \in \mathbb{R}x + \mathbb{R}_0^{+} y$$

i.e., $t$ lies on the half-plane defined by the x-axis and the non-negative part of the y-axis.
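The alignment can be checked numerically with a short sketch. The helper names are hypothetical, normals are assumed to be unit vectors, and the rotation that aligns a normal with the x-axis is built with Rodrigues' formula, which is one possible choice. The sketch computes $\alpha_m$ and $\alpha_s$ independently and then verifies Formula (6) on a synthetic pair related by a known rigid motion.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def mat_vec(R, v):
    return tuple(sum(r * x for r, x in zip(row, v)) for row in R)

def transpose(R):
    return tuple(zip(*R))

def rot_axis_angle(axis, theta):
    """Rodrigues rotation matrix for a unit axis and an angle in radians."""
    x, y, z = axis
    c, s = math.cos(theta), math.sin(theta)
    k = 1.0 - c
    return ((c + x * x * k, x * y * k - z * s, x * z * k + y * s),
            (y * x * k + z * s, c + y * y * k, y * z * k - x * s),
            (z * x * k - y * s, z * y * k + x * s, c + z * z * k))

def rot_normal_to_x(n):
    """Rotation that takes the unit normal n onto the +x axis."""
    axis = (0.0, n[2], -n[1])               # cross product n x (1, 0, 0)
    s = math.hypot(axis[1], axis[2])        # sine of the angle between n and +x
    if s < 1e-12:                           # n is parallel or anti-parallel to x
        return rot_axis_angle((0.0, 1.0, 0.0), 0.0 if n[0] > 0 else math.pi)
    return rot_axis_angle(tuple(a / s for a in axis), math.atan2(s, n[0]))

def to_local(ref, normal, p):
    """Apply T_{->g}: move ref to the origin and rotate its normal onto +x."""
    return mat_vec(rot_normal_to_x(normal), sub(p, ref))

def plane_angle(ref, normal, p):
    """Angle about x that brings the local point onto the half-plane y >= 0, z = 0."""
    _, y, z = to_local(ref, normal, p)
    return -math.atan2(z, y)

def model_to_scene(m_r, n_m, m_i, s_r, n_s, alpha):
    """Evaluate s_i = T_{s->g}^{-1} R_x(alpha) T_{m->g} m_i."""
    rotated = mat_vec(rot_axis_angle((1.0, 0.0, 0.0), alpha), to_local(m_r, n_m, m_i))
    return add(mat_vec(transpose(rot_normal_to_x(n_s)), rotated), s_r)

# Synthetic check: the scene pair is the model pair moved by a known rigid motion.
g = math.sqrt(14.0)
R_true = rot_axis_angle((1.0 / g, 2.0 / g, 3.0 / g), 0.7)
shift = (5.0, -2.0, 1.0)
m_r, n_m, m_i = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 2.0, 0.5)
s_r, n_s = add(mat_vec(R_true, m_r), shift), mat_vec(R_true, n_m)
s_i = add(mat_vec(R_true, m_i), shift)
alpha = plane_angle(m_r, n_m, m_i) - plane_angle(s_r, n_s, s_i)   # alpha_m - alpha_s
s_rec = model_to_scene(m_r, n_m, m_i, s_r, n_s, alpha)
```

Because `plane_angle` depends on only one point-pair, the model-side angle can be computed once offline, which is the reason for splitting $\alpha$ into two parts.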
For successfully paired point-pairs, $\alpha_m$ can be calculated for the model point-pairs in the offline phase and stored in the hash table. In this way, only $\alpha_s$ needs to be calculated for scene point-pairs. The final angle $\alpha$ is the difference between the two angles.
The curvature distribution of the model point cloud is shown in Figure 7a, where different colors represent the average curvature value. Points with similar curvature values bend similarly and tend to lie in the same region of the surface, whereas points with large differences in curvature bend differently and tend to lie far apart. We believe that point-pairs whose two points differ more in curvature contain more information, and that the mapping $\alpha$ calculated from such a pairing is more accurate, so $\alpha$ should be given a higher weight when voting, as shown in Formula (8). For example, for the three-way tube model of this experiment, the high-curvature part and the low-curvature part of the model are divided according to the curvature histogram shown in Figure 7b: 0-0.035 is set as the low-curvature range, and values greater than 0.1 as the high-curvature range. Weighted voting is performed on the calculated $\alpha$ of model point-pairs whose two points differ greatly in curvature. The voting process is shown in Figure 8.
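A possible reading of the weighted voting is sketched below. The exact form of Formula (8) is not reproduced here, so the rule used in `vote_weight` is an assumption: a vote counts `w` when one point of the model pair lies in the low-curvature range and the other in the high-curvature range, and 1 otherwise. The range limits are the three-way tube values from the text, the default `w` is picked from the 2-8 range reported in the experiments, and the 30 angle bins (12 degrees each) are also assumed.

```python
import math

LOW_MAX = 0.035    # upper limit of the low-curvature range (three-way tube)
HIGH_MIN = 0.1     # lower limit of the high-curvature range (three-way tube)

def vote_weight(q1, q2, w=4.0):
    """Assumed rule: weight w if the pair spans low and high curvature, else 1."""
    low1, high1 = q1 <= LOW_MAX, q1 > HIGH_MIN
    low2, high2 = q2 <= LOW_MAX, q2 > HIGH_MIN
    return w if (low1 and high2) or (high1 and low2) else 1.0

def alpha_bin(alpha, n_bins):
    """Map an angle in radians to a bin index in [0, n_bins)."""
    two_pi = 2.0 * math.pi
    return int((alpha % two_pi) / two_pi * n_bins) % n_bins

def cast_votes(matches, n_model_points, n_bins=30, w=4.0):
    """Accumulate votes; matches holds (model_ref_index, alpha, q1, q2) tuples."""
    acc = [[0.0] * n_bins for _ in range(n_model_points)]
    for model_ref, alpha, q1, q2 in matches:
        acc[model_ref][alpha_bin(alpha, n_bins)] += vote_weight(q1, q2, w)
    return acc

def best_cell(acc):
    """Return (model_ref_index, alpha_bin, score) of the accumulator peak."""
    best = (0, 0, acc[0][0])
    for i, row in enumerate(acc):
        for j, score in enumerate(row):
            if score > best[2]:
                best = (i, j, score)
    return best

# Two low-information votes for model point 0, one high-information vote for point 1.
matches = [(0, 0.10, 0.02, 0.03), (0, 0.12, 0.01, 0.02), (1, 1.00, 0.02, 0.15)]
acc = cast_votes(matches, n_model_points=2)
peak = best_cell(acc)
```

In this toy example the single weighted vote outweighs two ordinary votes, which is the intended effect of giving high-information point-pairs more influence on the accumulator peak.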

Pose Clustering
When reference points are located on the object surface, multiple effective point-pairs are generated. Each point-pair yields a pose after feature matching, so each object has a set of candidate poses. The poses are clustered so that the translation and rotation differences of all poses in each cluster are within the set thresholds. The score of each pose is the cumulative sum of the votes obtained by that pose during the voting phase. The cluster with the highest score is selected, and the poses it contains are averaged to obtain the final pose. This operation not only removes poses with large errors through the thresholds, but also improves the accuracy of the final pose through averaging. Since there are multiple objects in the scene, multiple high-scoring clusters are generated, and the cluster with the highest number of votes is selected as the preferred pose.
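The clustering step can be sketched as follows. Several details are assumptions made for the illustration: poses are represented as a translation plus a unit quaternion, clustering is greedy (each pose is compared with the highest-voted pose of a cluster rather than with every member), and rotations are averaged by normalizing the sum of sign-aligned quaternions, which is only appropriate for rotations that are close together.

```python
import math

def rotation_angle(qa, qb):
    """Angle in radians between two rotations given as unit quaternions."""
    d = abs(sum(a * b for a, b in zip(qa, qb)))
    return 2.0 * math.acos(min(1.0, d))

def cluster_poses(poses, t_thresh, r_thresh):
    """Greedy clustering of (translation, quaternion, votes) tuples."""
    clusters = []
    for pose in sorted(poses, key=lambda p: p[2], reverse=True):
        t, q, _ = pose
        for cluster in clusters:
            t0, q0, _ = cluster[0]          # highest-voted pose of the cluster
            if math.dist(t, t0) <= t_thresh and rotation_angle(q, q0) <= r_thresh:
                cluster.append(pose)
                break
        else:
            clusters.append([pose])
    return clusters

def average_cluster(cluster):
    """Average the translations and the (sign-aligned) quaternions of a cluster."""
    n = len(cluster)
    t_avg = tuple(sum(p[0][i] for p in cluster) / n for i in range(3))
    q_ref = cluster[0][1]
    acc = [0.0, 0.0, 0.0, 0.0]
    for _, q, _ in cluster:
        sign = 1.0 if sum(a * b for a, b in zip(q, q_ref)) >= 0.0 else -1.0
        for i in range(4):
            acc[i] += sign * q[i]
    norm = math.sqrt(sum(a * a for a in acc))
    return t_avg, tuple(a / norm for a in acc)

def best_pose(poses, t_thresh, r_thresh):
    """Average pose of the cluster with the highest summed vote score."""
    clusters = cluster_poses(poses, t_thresh, r_thresh)
    best = max(clusters, key=lambda cl: sum(p[2] for p in cl))
    return average_cluster(best)

# Two consistent poses (9 votes in total) beat one isolated pose with 6 votes.
identity = (1.0, 0.0, 0.0, 0.0)
poses = [((0.0, 0.0, 0.0), identity, 5.0),
         ((1.0, 0.0, 0.0), identity, 4.0),
         ((100.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0), 6.0)]
t_best, q_best = best_pose(poses, t_thresh=5.0, r_thresh=0.2)
```

The example shows why scoring clusters rather than single poses matters: the isolated pose has the most votes individually, but the two mutually consistent poses win once their votes are summed.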

ICP Optimization
In order to further improve the accuracy of the pose results, we used the ICP algorithm [13] to refine the pose obtained by pose clustering. The clustered pose is used as the initial value of the ICP algorithm, and the error is further reduced by iteratively reducing the Euclidean distance between the model points and the corresponding scene points. Whether a model point successfully matches a scene point is judged with a distance threshold: if the distance between the two points is less than the threshold, the two points are considered to match successfully. Finally, the ratio ∂ between the number of matched points and the number of object points in the scene is taken as the matching rate, as shown in Formula (9). In real experiments, the minimum acceptable matching rate is the lowest value at which the robotic arm can still grasp the target object successfully.
$$\partial = \frac{\text{Number of matching success points}}{\text{Number of object points in the scene}} \tag{9}$$
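The matching rate of Formula (9) can be sketched as below. The sketch assumes that the model points have already been transformed by the estimated pose, counts a scene object point as matched when some model point lies within the threshold, and uses a brute-force nearest-neighbor search for brevity; a KdTree would normally replace the inner loop.

```python
import math

def matching_rate(model_points, scene_points, threshold):
    """Fraction of scene object points with a posed model point within threshold."""
    if not scene_points:
        return 0.0
    matched = 0
    for s in scene_points:
        nearest = min(math.dist(s, m) for m in model_points)
        if nearest < threshold:
            matched += 1
    return matched / len(scene_points)

# Toy example with a 5 mm threshold (coordinates in mm).
model = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
scene = [(0.0, 0.0, 1.0), (10.0, 0.0, 2.0), (20.0, 0.0, 9.0), (30.0, 0.0, 0.0)]
rate = matching_rate(model, scene, threshold=5.0)
```

In the toy example, two of the four scene points have a model point closer than 5 mm, so the matching rate is 0.5.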

Experimental Results and Discussions
We used online public data sets and real scene data to verify the effectiveness of the proposed method, and used a robotic arm to perform bin-picking tasks to evaluate the performance of the method in industrial applications. Our algorithm was implemented in C++ under the Visual Studio 2019 platform and was run on the NVIDIA GeForce GTX 1060 processor. Through experimental comparison, the advantages of the proposed method over the original method are verified in terms of accuracy, efficiency, and adaptability.

Public Data Set
We used the online Retrieval [34] data set to verify the proposed method. The data set includes 6 models and 18 scenes; the models are shown in Figure 9. Each scene contains only one set of point cloud data, which prevents other factors from interfering with the experimental comparison. For all experiments, the Leaf_size for downsampling the model and scene point clouds was set to 5 mm; the hash table distance step $d_{dist}$ was set to 3 mm; the angle step $d_{angle}$ was set to 12°; and 1/5 of the points in the point cloud were used as scene reference points. The matching rate of the point cloud was calculated by Formula (9) in Section 2.2.5, with the threshold set to 5 mm.

We used the online Retrieval [34] data set to verify the advancement of the proposed method. The data set includes 6 models and 18 scenes, and the model is shown in Figure  9. Each scene has only one set of point cloud data, which prevents other factors from interfering with the experimental comparison results. For all experiments, the Leaf_size of the model point cloud and scene point cloud downsampling was set to 5 mm; the hash table distance step dist d was set to 3 mm; the angle step angle d was set to 12  ; and the We verified the enhancement effect of curvature on the PPF description in the proposed method. Each model in the Retrieval data set corresponds to multiple scenes with different levels of noise. In order to reduce the impact of noise on the matching effect, a scene with a noise coefficient of 0.1 was selected for matching. The final matching rate is the average one between each model and multiple scenes, and the average of matching time with multiple scenes is viewed as the final time. The radius of curvature of models in the data set was set to 15 mm. Due to the different curvature distributions of each model, the curvature steps d cur of Bunny, Dragon, Statuette, Chinese_Dragon, Armadillo, and Buddha were set to 0.07, 0.1, 0.13, 0.15, 0.2, and 0.11, respectively. The matching experiments of the PPF algorithm and the Cur-PPF (unweighted) algorithm were carried out respectively. A set of matching effects are shown in Figures 10 and 11. Tables 1 and 2 are the data comparison between the PPF algorithm and the Cur-PPF (unweighted) algorithm in terms of matching rate and time. The experimental results show that the introduction of curvature information can strengthen the description of the feature, and it is better than the original PPF algorithm in terms of matching rate and time. experiments of the PPF algorithm and the Cur-PPF (unweighted) algorithm were carried out respectively. A set of matching effects are shown in Figures 10 and 11. 
Tables 1 and 2 are the data comparison between the PPF algorithm and the Cur-PPF (unweighted) algorithm in terms of matching rate and time. The experimental results show that the introduction of curvature information can strengthen the description of the feature, and it is better than the original PPF algorithm in terms of matching rate and time.    We also verified that the weighted voting in the proposed method has an enhanced effect on the matching effect. According to curvature histograms of point cloud models, the high-curvature part and the low-curvature part of models are divided [35]. The curvature histograms of the models point cloud are shown in Figure 12. Through multiple experiments with different models, we think that setting the weight to 2-8 is a better range. The setting of the experimental parameters is consistent with the Cur-PPF(Unweight) parameters. The matching experiments of the Cur-PPF(unweight) algorithm and the weighted Cur-PPF algorithm were carried out respectively. The matching effect of a group of the weighted Cur-PPF algorithm are shown in Figure 13. Tables 3 and 4 are the comparison of the matching rate and time between the Cur-PPF(Unweight) algorithm and the Cur-PPF algorithm. The experimental results show that the weighted operation introduced into the pose voting link further improve the point cloud matching rate, and the time is basically similar to the unweighted Cur-PPF algorithm, which proves the role of the weighted operation. The method proposed by Drost et al. can recognize different objects in the same scene. In order to verify that the improved method proposed in this paper based on the original PPF can also effectively recognize different objects in the same scene, we choose the public dataset Laser Scanner as an experiment. 
Since the method in this paper focuses more on bin-picking scenes that contain the same type of object, this experiment serves as a supplementary experiment to verify the ability of the proposed method to recognize different objects. We compared the matching rates of Cur-PPF and Cur-PPF+ICP. The results are shown in Figure 14, and the average matching rates are listed in Table 5. The experiments show that the improved method retains the capabilities of the original PPF method: it can identify different objects in the same scene, and it also achieves a satisfactory coarse registration. After ICP optimization, the average matching rate of fine registration reaches 93%.

Real Scene Data
In the previous section, the advantages of the proposed Cur-PPF method were verified on scenes without clutter, overlap, or occlusion. In real scenes, however, the environment is cluttered and noisy, and it becomes more difficult for the robot to perform grasping tasks. To verify that the proposed method also has advantages in complex scenes, we built a robotic arm bin-picking scene; the system is shown in Figure 15. Bin-picking is one of the common scenes in industry. In this scene, the target objects overlap and occlude each other, which interferes with matching. To evaluate the algorithm, we consider both the point cloud matching effect and the grasping rate of the robotic arm.


Figure 15. Bin-picking system diagram. The system is composed of a robot, a gripper, components, and a 3D sensor.

Matching Effect of Real Scenario
In the real scenario matching experiment, we used common industrial objects as test objects. The point cloud and image data were acquired by a 3D sensor (a COBOT COMATRIX-IM camera, consisting of a gray-scale camera and a projector). We randomly placed the test objects in the box, collected 20 sets of test scenarios for each type of object, and ran matching experiments with the PPF algorithm and the algorithm proposed in this paper. The experimental parameters were set as follows: the Leaf_size for downsampling the model point cloud and the scene point cloud was set to 3 mm; the hash table distance step $d_{dist}$ was set to 0.5 mm; the angle step $d_{angle}$ was set to 12°; 1/5 of the scene points were used as scene reference points; the radius of curvature was set to 10 mm; the curvature step $d_{cur}$ was set to 0.025 for the first type of object and to 0.3 for the second type of object. For the first type of object, the low-curvature range is 0-0.015, the high-curvature range is greater than 0.06, and the voting weight was set to 3. For the second type of object, the low-curvature range is 0-0.015, the high-curvature range is greater than 0.065, and the voting weight was set to 5. When calculating the matching rate between the model point cloud and the scene point cloud, the distance threshold was set to 5 mm.
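How the curvature ranges and the voting weight might interact can be sketched as follows. This is only an illustration under stated assumptions: a vote is counted with the voting weight when the curvature of the matched model point lies in the high-curvature range and with weight 1 otherwise, and votes are accumulated over (model reference point, rotation bin) cells as in the voting scheme of Drost et al. The section does not state how the low-curvature range enters the weighting, so the sketch leaves those votes at weight 1. The names `vote_weight`, `accumulate_votes`, and `best_pose_bin` are hypothetical, and the defaults are the first-type object settings above.

```python
from collections import defaultdict


def vote_weight(curvature, high_min=0.06, weight=3.0):
    """Weight of one vote: `weight` for high-curvature points, else 1.

    The rule itself is an assumption made for this sketch.
    """
    return weight if curvature > high_min else 1.0


def accumulate_votes(votes, high_min=0.06, weight=3.0):
    """Sum weighted votes per (model reference index, rotation bin).

    votes: iterable of (model_ref_index, alpha_bin, curvature) tuples,
    where curvature belongs to the matched model point.
    """
    accumulator = defaultdict(float)
    for model_ref_index, alpha_bin, curvature in votes:
        accumulator[(model_ref_index, alpha_bin)] += vote_weight(
            curvature, high_min, weight)
    return accumulator


def best_pose_bin(accumulator):
    """Return the ((model_ref_index, alpha_bin), score) with the top score."""
    return max(accumulator.items(), key=lambda item: item[1])
```

With this rule, a single vote from a distinctive high-curvature region can outweigh two votes from flat regions, which matches the intent of emphasizing geometrically informative parts of the model during pose voting.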
We used the PPF algorithm [17] and the Cur-PPF algorithm proposed in this paper to perform point cloud matching, and the ICP algorithm was used to correct the matching results. The point cloud matching processes are shown in Figure 16. To compare the two algorithms fairly, we keep only the top five matching results in the scene for the first type of object. Objects of the second type are larger, and the top layer can hold at most five of them, so only the three results with the highest matching rates in the scene are retained. The matching results are rendered in different colors, and the average of their matching rates is taken as the final matching rate. Tables 6 and 7 compare the two algorithms in terms of matching rate and time. It can be seen that the method proposed in this paper has clear advantages over the original method in the bin-picking scenario.
Figure 16. The matching process of the PPF algorithm and the Cur-PPF algorithm for two common industrial objects. (a) is the three-way tube (the first type of object); (e) is the upright column (the second type of object); (b,f) are the scene point clouds of the two types of objects, with the outer frame of the box filtered out by setting thresholds on the x-axis and y-axis; (c,g) are the matching results of the PPF algorithm for the two types of objects; (d,h) are the matching results after ICP correction; (i,o) are grayscale images mapped from the point cloud depth information; (j,p) are the grayscale images after segmentation; (k,q) are the candidate objects screened out according to the number of pixels in the segmented image; (l,r) are the point clouds of the candidate objects; (m,s) are the point cloud matching results of the Cur-PPF algorithm; and (n,t) are the matching results after ICP correction. Matching rates from high to low are rendered in red, orange, yellow, green, and blue.
To verify the validity of the method proposed in this article, we used a six-axis robotic arm to perform bin-picking. In this system, the robotic arm is a UR5e (UNIVERSAL ROBOTS), the gripper is an AG-95 (DH ROBOTS), and the 3D sensor is a COMATRIX-IM (COBOT). The experiment was carried out indoors under incandescent lighting, with no additional light source.
We randomly placed 25 three-way tubes in the bin and used the Cur-PPF algorithm to match the model with the scene. Each three-way tube in the scene generates a set of pose results after weighted voting. The pose results after clustering were corrected using the ICP algorithm. In our experience, when the matching rate is greater than 85%, the robotic arm can successfully grasp the target object. If the matching rate is less than 85%, the robotic arm grasps empty space or grasps with a pose error, and the result is considered a wrong match. We carried out a total of 100 three-way tube grasping experiments, of which five failed, as shown in Table 8. Three of the failures occurred because the three-way tubes were close together: nearby objects were touched before grasping, which changed the pose of the target object. The remaining failures were caused by a low matching rate, which degraded the pose accuracy of the grasp points used by the robotic arm and eventually led to failure of the grasping operation.
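The 85% acceptance rule can be made concrete with a short sketch. It assumes the matching rate is the fraction of transformed model points whose nearest scene point lies within the distance threshold (5 mm in the real-scene experiments); the exact definition is given earlier in the paper and may differ. The nearest-neighbor search is brute force for clarity, where a k-d tree would normally be used, and the names `matching_rate` and `is_graspable` are hypothetical.

```python
import math


def matching_rate(model_points, scene_points, dist_threshold=5.0):
    """Fraction of model points with a scene point within dist_threshold.

    model_points are assumed to be already transformed by the estimated
    pose. Coordinates and threshold are in millimeters.
    """
    if not model_points or not scene_points:
        return 0.0
    matched = 0
    for m in model_points:
        # Brute-force nearest neighbor, O(N*M); fine for a small example.
        nearest = min(math.dist(m, s) for s in scene_points)
        if nearest <= dist_threshold:
            matched += 1
    return matched / len(model_points)


def is_graspable(rate, min_rate=0.85):
    """Accept a pose for grasping only if its matching rate exceeds min_rate."""
    return rate > min_rate
```

For example, a pose whose model points lie 1, 2, 4, and 9 mm from their nearest scene points has a matching rate of 0.75 under this definition and would be rejected by the 85% rule.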

Conclusions
We propose a 6D pose estimation method based on a new point-pair feature descriptor. The method introduces an effective point cloud preprocessing step, which accurately extracts candidate target objects and improves matching efficiency. Curvature information is added to the point-pair feature descriptor, which enhances the feature description and improves matching accuracy. In addition, a weighted voting method is proposed for the pose voting stage, which further improves the accuracy of pose estimation. We tested the proposed method and the PPF method on a public data set and in real scenarios. The experimental results show that, on the public data set, the average matching rate of our method increased by 8.55% and the average time was shortened by 467.34 ms. In real scenarios, the average matching rate of our method increased by 12.7%, the average time was shortened by 3188 ms, and the grasping success rate in the bin-picking scenarios reached 95%. The proposed method therefore offers high pose estimation accuracy and short calculation time, and can be used in actual industrial scenarios.
In the future, we will continue to study a mathematical model for partitioning high-curvature and low-curvature regions in the weighting strategy, which will make the strategy more efficient to apply to new objects. Dividing the model curvature more accurately can also improve the point cloud matching rate. In addition, some model point-pairs are useless during matching, and it is worth exploring how to avoid them, which will further improve the overall efficiency.