PPTFH: Robust Local Descriptor Based on Point-Pair Transformation Features for 3D Surface Matching

Three-dimensional feature description for a local surface is a core technology in 3D computer vision. Existing descriptors perform poorly in terms of distinctiveness and robustness owing to noise, mesh decimation, clutter, and occlusion in real scenes. In this paper, we propose a 3D local surface descriptor using point-pair transformation feature histograms (PPTFHs) to address these challenges. The generation process of the PPTFH descriptor consists of three steps. First, a simple but efficient strategy is introduced to partition the point-pair sets on the local surface into four subsets. Then, three feature histograms corresponding to each point-pair subset are generated by the point-pair transformation features, which are computed using the proposed Darboux frame. Finally, all the feature histograms of the four subsets are concatenated into a vector to generate the overall PPTFH descriptor. The performance of the PPTFH descriptor is evaluated on several popular benchmark datasets, and the results demonstrate that the PPTFH descriptor achieves superior performance in terms of descriptiveness and robustness compared with state-of-the-art algorithms. The benefits of the PPTFH descriptor for 3D surface matching are demonstrated by the results obtained from five benchmark datasets.


Introduction
Local surface description is a fundamental technology in 3D computer vision and robotics. It has been used in several applications, such as 3D point clouds registration [1][2][3][4][5][6], 3D shape retrieval [7,8], 3D object recognition [9,10], and robot manipulation [11][12][13]. A local surface descriptor is represented by a high-dimensional feature vector that encodes geometric and spatial information on a local surface. As we focus on local surface description for rigid objects, the local descriptor should be invariant to pose transformations. Moreover, owing to the existence of noise, mesh resolution variation, clutter, and occlusion in several applications, a local feature descriptor should exhibit strong robustness to resist the negative effects of these factors, and high descriptiveness to distinguish local surfaces. Thus, designing a local surface descriptor with high overall performance is a considerable challenge, and several attempts have been made to overcome the related difficulties. Depending on whether a local reference frame (LRF) is used, these descriptors are classified into two categories [14].
Regarding descriptors without an LRF, numerous highly effective methods have been proposed [15], and descriptors based on point-pair features (PPFs) are the most classical methods in 3D surface description [16]. Johnson et al. [17] presented the spin-image (SI) local descriptor. This algorithm calculates two spatial distances using the key point, its normal, and a neighbor point; the SI descriptor is then generated from the distribution of neighbor points along these two spatial features. This descriptor has very low time consumption, but its descriptive performance is weak owing to the limited surface information encoded by the two simple features [15]. Rusu et al. [18] proposed the point feature histogram (PFH) to describe a local surface for point cloud registration. The PFH is constructed from several point-pair features that are computed using Darboux frames [19] defined at a point-pair. To reduce time consumption, a modified version of the PFH was proposed in [20], namely the fast PFH (FPFH) descriptor, which is constructed by weighting the simplified PFHs associated with all neighbor points. These descriptors only make use of single geometric or spatial features to encode surface information, resulting in poor descriptiveness under various disturbances (noise, clutter, occlusion, mesh decimation, etc.). Moreover, after analyzing the descriptive power of the four features used in the classical object recognition algorithm (i.e., PPF [21]), Buch et al. [22] proposed the point-pair feature histogram (PPFH) descriptor based on the two most discriminative features in PPF. Although this descriptor is highly efficient and descriptive, it is sensitive to noise and mesh resolution variations [16,23].
Regarding LRF-based methods, these descriptors can encode both geometric and spatial information on a local surface, according to the established LRFs. The best-known examples are based on a signature of histograms of orientations (SHOT) [24], rotational projection statistics (RoPS) [25], and triple orthogonal local depth images (TOLDI) [26]. The SHOT descriptor performs covariance matrix analysis to define the LRF and divides a spherical neighborhood space into 32 bins along the radial, azimuth, and elevation directions. It is constructed by considering point distribution information in 32 bins using the deviation angles between the key point normal and the neighbor normal. Despite its high descriptiveness, SHOT is sensitive to mesh resolution variations. The LRF of the RoPS descriptor is generated by the weighted scatter matrix on triangular meshes, and the RoPS descriptive algorithm is obtained by extracting a set of statistics from several point-distribution matrices. The RoPS descriptor has been proven to have high descriptiveness [15]; however, it is highly time-consuming. With regard to the TOLDI descriptor, LRF is first constructed by calculating the normal of the key point and the weighted projection vectors of all the radius neighbors of the key point. Then, TOLDI uses three local depth images, corresponding to three coordinate planes, to further form the TOLDI descriptor. This descriptor is robust to clutter and occlusion, but it is not compact [27]. The descriptiveness and robustness of these LRF-based feature descriptors depend on the descriptive algorithms, and the performance of these methods is affected by the stability and repeatability of the constructed LRF. Unfortunately, LRFs generated on local surfaces tend to suffer from low stability and sign ambiguity, and a small error of the LRF may significantly change the generated local descriptor; this negatively affects the descriptor matching results [27].
Based on this analysis, existing descriptors either extract only single geometric and spatial information or include unstable geometric and spatial information encoded by the LRFs, resulting in low descriptiveness and weak robustness under the influence of disturbances [23]. To address these drawbacks, we propose a novel local descriptor using point-pair transformation feature histograms (PPTFHs) for discriminative and robust surface description. Specifically, the point-pair sets on a local surface are partitioned into four subsets using a simple but efficient strategy. Then, using four point-pair transformation features calculated by the proposed Darboux frame, the point-pair distribution information is encoded by PPTFHs to generate the proposed descriptor. A series of experiments on five public datasets representing different 3D application scenarios demonstrates that the proposed descriptor achieves superior performance compared with existing methods. The major contributions of this study are summarized as follows.

1. A novel local surface descriptor (PPTFH) is proposed to achieve superior performance in terms of descriptiveness and robustness under various disturbances.
2. A simple but effective point-pair set partition strategy is introduced. It exhibits high repeatability and stability, and it can be applied to other PPF descriptors (e.g., PFH) to enhance their feature matching performance (Section 3).
The remainder of this paper is organized as follows. In Section 2, we describe in detail the generation process of the PPTFH descriptor. In Section 3, we present a performance evaluation of the PPTFH descriptor and other state-of-the-art algorithms on four popular datasets. In Section 4, we apply the proposed descriptor in 3D surface matching on five different application scenarios. The paper is concluded in Section 5.

Local Surface Description Based on PPTFHs
In general, a unique high-dimensional feature vector is used to describe the local 3D surface around a 3D key point. In our study, we propose point-pair transformation feature histograms (PPTFHs) to describe a 3D local surface. The generation process of the PPTFH descriptor consists of three main steps. First, local point-pair sets are divided into four subsets through a simple but stable spatial cue. Subsequently, three 2D histograms are constructed using the computed point-pair transformation features on each point-pair subset. Finally, all feature histograms corresponding to the four subsets are concatenated into a feature vector to represent the PPTFH descriptor. Furthermore, the four key parameters of the PPTFH descriptor are quantitatively analyzed.

Point-Pair Set Partition
The approach of tackling point-pairs on the local surface is critical for encoding rich surface information and constructing a descriptive and robust descriptor. Existing methods either do nothing for the point-pair sets or divide them into several subsets based on unstable geometric features [18,28], resulting in low descriptiveness and weak robustness. Hence, we propose a novel method to accurately partition point-pair sets into four subsets.
For a key point $p_k$ and support radius $r$, its neighbors are obtained as $S_{p_k} = \{p_i : \|p_i - p_k\| \le r \wedge p_i \ne p_k,\ i \in \{0, 1, \ldots, n\}\}$, where $n$ represents the number of neighbors. It should be noted that the key point $p_k$ is not included in its neighbors. Then, the point-pair set around $p_k$ is constructed as $Q_{p_k} = \{(p_i, p_j) : p_i, p_j \in S_{p_k} \wedge p_i \ne p_j\}$, as shown in Figure 1a. For the key point $p_k$ and each point-pair $(p_i, p_j) \in Q_{p_k}$, a spatial feature $\delta$ is computed, as shown in Figure 1b. It is the Euclidean distance from the key point $p_k$ to the straight line determined by the point-pair $(p_i, p_j)$, that is,

$$\delta = \frac{\|(p_k - p_i) \times (p_k - p_j)\|}{\|p_j - p_i\|}.$$

The value of $\delta$ is in the interval $[0, r]$. This feature is used to divide the point-pair set $Q_{p_k}$ into $N_\delta$ regions. Herein, we set $N_\delta = 4$ according to the parameter analysis results in Section 2.3; that is, the range of $\delta$ is equally partitioned into four parts, and we thus obtain four sub-regions $\{Q_1, Q_2, Q_3, Q_4\}$, as shown in Figure 1c. For instance, assume that $p_1$ and $p_2$ are neighbors of the key point $p_k$ and that the feature $\delta_{12} \in [3r/4, r]$; then, the point-pair $(p_1, p_2) \in Q_4$. Accordingly, $\delta$ partitions the local point-pair sets finely and stably.
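The partition step can be illustrated with a minimal sketch. It assumes unordered neighbor pairs and equal-width bins over $[0, r]$ in which a value on a bin boundary falls into the upper bin (the text does not specify boundary handling); the function names `point_line_distance` and `partition_point_pairs` are hypothetical and are not taken from the authors' implementation.

```python
import math
from itertools import combinations

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)

def point_line_distance(pk, pi, pj):
    """Distance delta from the key point pk to the line through pi and pj."""
    return norm(cross(sub(pk, pi), sub(pk, pj))) / norm(sub(pj, pi))

def partition_point_pairs(pk, neighbors, r, n_delta=4):
    """Split all neighbor pairs into n_delta subsets by equal-width bins of delta."""
    subsets = [[] for _ in range(n_delta)]
    for i, j in combinations(range(len(neighbors)), 2):
        delta = point_line_distance(pk, neighbors[i], neighbors[j])
        # Assumption: clamp so that delta == r lands in the last subset.
        idx = min(int(delta / r * n_delta), n_delta - 1)
        subsets[idx].append((i, j))
    return subsets
```

Because $\delta$ never exceeds the distance from $p_k$ to either point of the pair, every pair of neighbors within the support radius is assigned to exactly one of the subsets.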

Transformation Feature Histogram Generation
After the point-pair sets have been divided into four subsets, the point-pair transformation is calculated using the proposed Darboux frames, and the local descriptor is generated by the computation of the PPTFHs.

Definition of a Novel Darboux Frame
An illustration of the proposed Darboux frame is shown in Figure 2a. For a key point $p_k$, a neighbor point $p_i$, and the normal $n_i$ of $p_i$, a new Darboux frame with its origin at $p_i$ can be represented as

$$L_i = \{u_i, v_i, w_i\}, \qquad u_i = \frac{p_i - p_k}{\|p_i - p_k\|}, \qquad v_i = n_i \times u_i, \qquad w_i = u_i \times v_i,$$

where $u_i$ is the normalized vector $\overrightarrow{p_k p_i}$, $v_i$ is computed by the cross-product of $n_i$ and $u_i$, $w_i$ is generated by the cross-product of $u_i$ and $v_i$, and $p_i$ is the origin point of the Darboux frame. It is worth noting that the normal and its sign are estimated using the PCA (Principal Component Analysis) method presented in [28], with a support radius of 5 mr (mr denotes mesh resolution, which is the average distance between each point and its nearest neighbor point in the 3D data). Compared with the Darboux frame mentioned in [19], the defined $u_i$ axis and the novel Darboux frame can further increase the robustness of the proposed local descriptor. This is explained in Section 2.3.
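As a minimal sketch of this frame construction, the function below builds $(u_i, v_i, w_i)$ from a key point, a neighbor, and the neighbor's normal. Normalizing $v_i$ is an assumption added here so that the three axes are orthonormal even when $n_i$ is not perpendicular to $u_i$; the name `darboux_frame` is hypothetical.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

def darboux_frame(pk, pi, ni):
    """Return the axes (u, v, w) of the frame with origin at the neighbor pi."""
    u = normalize(sub(pi, pk))    # unit vector from the key point to the neighbor
    v = normalize(cross(ni, u))   # assumption: normalized; undefined if ni is parallel to u
    w = cross(u, v)               # unit length because u and v are orthonormal
    return u, v, w
```

Note that the frame is undefined when the normal is parallel to the line joining the key point and the neighbor; a practical implementation would need to skip or specially handle such neighbors.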

Point-Pair Transformation Feature Histogram Computation
First, a local point-pair set $Q_{p_k}$ is generated using the local point cloud determined by a key point $p_k$ and the support radius $r$. The point-pair set $Q_{p_k}$ is divided into four subsets by the method presented in Section 2.1; that is, $Q_{p_k} = \{Q_1, Q_2, Q_3, Q_4\}$. Then, three PPTFHs are computed for each subset. Finally, the PPTFH descriptor is constructed using these 2D histograms.
Assume that $Q_n$ ($n = 1, 2, 3, 4$) is a subset of the point-pair set $Q_{p_k}$ corresponding to the key point $p_k$. For an arbitrary point-pair $(p_i, p_j)$ in $Q_n$, we first compute the two angles $\phi_i$ and $\phi_j$ between the normals $n_i$, $n_j$ and the line segment $p_i p_j$. Then, the point with the smaller of the two angles is defined as the source point $p_s$, and the other point is the target point $p_t$, as shown in Figure 2b. We construct two Darboux frames $L_s$ and $L_t$, where $L_s$ is obtained using the key point $p_k$ and the source point $p_s$ by the method proposed in Section 2.2.1, and $L_t$ is obtained similarly from $p_k$ and $p_t$:

$$L_s = \{u_s, v_s, w_s\}, \quad u_s = \frac{p_s - p_k}{\|p_s - p_k\|}, \quad v_s = n_s \times u_s, \quad w_s = u_s \times v_s,$$
$$L_t = \{u_t, v_t, w_t\}, \quad u_t = \frac{p_t - p_k}{\|p_t - p_k\|}, \quad v_t = n_t \times u_t, \quad w_t = u_t \times v_t.$$

For a Darboux frame, a transformation matrix can be obtained by transferring the Darboux frame system to the base coordinate system (mathematically, a 4 × 4 identity matrix) (see Figure 2c). Hence, for the Darboux frames $L_s$ and $L_t$, we obtain two 4 × 4 transformation matrices $T_s$ and $T_t$, respectively. The transformation matrix $T(R, t)$ from the $p_s$ Darboux frame to the $p_t$ Darboux frame is then calculated from $T_t$ and $T_s$, where $R$ denotes its rotation part and $t$ its translation part. Subsequently, the distance $d = \|t\|$ between $p_s$ and $p_t$ is computed from the translation part $t$ to indicate the relative position of the two points; moreover, the three Euler angles $(\alpha, \beta, \gamma)$ are calculated from the rotation part $R$ to represent the pose relationship between the two normals. Then, the four point-pair transformation attributes of a key point and a neighbor point-pair are

$$(f_1, f_2, f_3, f_4) = (d, \cos\alpha, \cos\beta, \cos\gamma),$$

where the three angle attributes $(f_2, f_3, f_4)$ are represented by the cosines of the corresponding angles. This can enhance the descriptive power of the descriptor, as demonstrated in [22,24].
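The feature computation can be sketched as follows, under several assumptions that the text above does not fix and that may differ from the authors' implementation: the frame axes form the columns of each rotation matrix, the relative transformation is taken as $T = T_s^{-1} T_t$ (so $R = R_s^{\top} R_t$ and $t = R_s^{\top}(p_t - p_s)$), the Euler angles follow the ZYX convention, $\phi_i$ is measured between $n_i$ and $p_j - p_i$ (and $\phi_j$ between $n_j$ and $p_i - p_j$), and the $v$ axis is normalized. All function names are hypothetical.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

def clamp(x):
    return max(-1.0, min(1.0, x))

def darboux_frame(pk, p, n):
    u = normalize(sub(p, pk))
    v = normalize(cross(n, u))  # assumption: normalized v axis
    return u, v, cross(u, v)

def angle(a, b):
    return math.acos(clamp(dot(normalize(a), normalize(b))))

def transformation_features(pk, pi, ni, pj, nj):
    """Return (f1, f2, f3, f4) = (d, cos(alpha), cos(beta), cos(gamma))."""
    # Source point: the one whose normal makes the smaller angle with the segment.
    phi_i = angle(ni, sub(pj, pi))
    phi_j = angle(nj, sub(pi, pj))
    if phi_i <= phi_j:
        ps, ns, pt, nt = pi, ni, pj, nj
    else:
        ps, ns, pt, nt = pj, nj, pi, ni
    fs = darboux_frame(pk, ps, ns)
    ft = darboux_frame(pk, pt, nt)
    # R = R_s^T R_t with the axes as columns, so R[a][b] = fs[a] . ft[b].
    R = [[dot(fs[a], ft[b]) for b in range(3)] for a in range(3)]
    # t = R_s^T (p_t - p_s); its norm equals the distance between the two points.
    diff = sub(pt, ps)
    t = [dot(fs[a], diff) for a in range(3)]
    d = math.sqrt(sum(c * c for c in t))
    # ZYX Euler angles (assumed convention): R = Rz(alpha) Ry(beta) Rx(gamma).
    alpha = math.atan2(R[1][0], R[0][0])
    beta = math.asin(clamp(-R[2][0]))
    gamma = math.atan2(R[2][1], R[2][2])
    return d, math.cos(alpha), math.cos(beta), math.cos(gamma)
```

Since every quantity is built from dot and cross products of differences and normals, the four attributes are unchanged when the key point, the neighbors, and the normals undergo the same rigid motion, which is the pose invariance required of the descriptor.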
After the above four transfer features are introduced, we compute the four attributes ( f 1 , f 2 , f 3 , f 4 ) for each point-pair in the partition Q n . Subsequently, we use the four attributes to describe the local surface. Certain binning policies are presented in [19,20,25], and we use two-dimensional bins to encode the four-attribute information and achieve optimal description performance [28].
The numbers of bins for dividing the ranges of the distance attribute $f_1$ and the three angle attributes $f_2$, $f_3$, $f_4$ are $N_d$ and $N_a$, respectively. The two parameters are determined by the parameter analysis experimental results in Section 2.3, and we discretize the three 2D attribute spaces (i.e., $(f_1, f_2)$, $(f_1, f_3)$, and $(f_1, f_4)$) accordingly. For each 2D attribute space, a point-pair transformation feature histogram $H$ is generated by counting the number of point-pairs in the partition $Q_n$ that fall into each 2D bin. The histogram $H$ represents the distribution information of all point-pairs in the region $Q_n$. Moreover, the performance of normal-based surface descriptors is affected by, for example, Gaussian noise and variable mesh resolution; hence, to reduce sensitivity to point density variations and noise, each histogram is interpolated bi-linearly and normalized to sum to 1, and the PPTFH descriptor is constructed by concatenating all the histograms into a one-dimensional histogram (as in [29]). The length of the PPTFH surface descriptor is therefore $N_\delta \times 3 \times N_d \times N_a$. In the following, we analyze the discriminability and robustness of the proposed descriptor. The merits of the PPTFH descriptor can be summarized in at least three aspects.
First, compared with the partition strategy presented in HoPPF [28], a simple but stable spatial cue is introduced to robustly divide the point-pair sets into four subsets without relying on the unstable normal. The division strategy is capable of improving the descriptive power and robustness of the PPTFH descriptor [23]. Second, in contrast to HoPPF, our PPTFH descriptor defines two new Darboux frames based on the point-pair $(p_s, p_t)$, their normals $(n_s, n_t)$, and the key point $p_k$; then, the point-pair transformation matrix is computed using the two Darboux frames defined in Section 2.2.1. The point-pair transformation matrix calculated by our method not only encodes the relative position of the point-pair and the relative rotation of their normals, but also implies the angle between $p_k p_s$ and $p_k p_t$. The point-pair transformation matrix therefore carries richer and more stable local point-pair information, which improves the descriptiveness of the PPTFH descriptor and its robustness to nuisances [23]. Finally, the point-pair distribution histograms are interpolated bi-linearly and normalized, ensuring resistance to point density variations and Gaussian noise.
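The histogram stage described in this section can be sketched as below. The sketch assumes that $f_1$ ranges over $[0, 2r]$ (two neighbors of the same key point are at most $2r$ apart), that the cosine attributes range over $[-1, 1]$, and that bilinear interpolation is done by splitting each vote between the two nearest bin centers in each dimension; the exact interpolation scheme and value ranges used by the authors are not given in the text, and the function names are hypothetical.

```python
def soft_bins(x, lo, hi, n):
    """Split one vote linearly between the two bins whose centers bracket x (n >= 2)."""
    c = (x - lo) / (hi - lo) * n - 0.5   # continuous coordinate in bin-center units
    c = max(0.0, min(n - 1.0, c))        # beyond the outer centers: full vote to the edge bin
    i0 = min(int(c), n - 2)
    w1 = c - i0
    return ((i0, 1.0 - w1), (i0 + 1, w1))

def subset_histograms(features, r, n_d=7, n_a=5):
    """features: list of (f1, f2, f3, f4) for one subset Q_n; returns 3 * n_d * n_a values."""
    out = []
    for k in (1, 2, 3):                  # attribute spaces (f1, f2), (f1, f3), (f1, f4)
        h = [[0.0] * n_a for _ in range(n_d)]
        for f in features:
            for bi, wi in soft_bins(f[0], 0.0, 2.0 * r, n_d):
                for bj, wj in soft_bins(f[k], -1.0, 1.0, n_a):
                    h[bi][bj] += wi * wj
        total = sum(sum(row) for row in h)
        # Normalize to sum to 1; an empty subset yields an all-zero histogram.
        out.extend(v / total if total > 0 else 0.0 for row in h for v in row)
    return out

def pptfh_descriptor(subset_features, r, n_d=7, n_a=5):
    """Concatenate the histograms of all subsets into one vector."""
    desc = []
    for feats in subset_features:
        desc.extend(subset_histograms(feats, r, n_d, n_a))
    return desc
```

With four subsets and the default bin numbers, the resulting vector has 4 × 3 × 7 × 5 = 420 entries, which matches the dimensionality reported in Section 2.3.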

Parameter Analysis for PPTFH Descriptor
In most local surface descriptors [14], the neighborhood radius is a common and critical parameter. To achieve a balance between descriptiveness, robustness, and time efficiency, we set the support radius to $r = 15$ mr in all performance evaluation experiments, following the suggestion in [26]. In addition to the support radius, four other parameters should be determined in the PPTFH method: the computing method of the point-pair transformation matrix, the point-pair set partition number $N_\delta$, the bin number $N_d$ of the Euclidean distance feature, and the bin number $N_a$ of the three angle features. To determine reasonable parameter configurations, we use the classical recall versus 1-precision curve (RPC) criterion [26] to quantitatively evaluate the performance of the PPTFH descriptor for different parameter sets on the tuning dataset. The tuning dataset in this experiment includes 12 range images with 6 models and 6 scenes. The model range images are from the Bologna Retrieval dataset [24], and the scene data are obtained by rotating the models, resampling them to 1/4 of their original mesh resolution, and adding Gaussian noise with a standard deviation of 0.5 mr. Examples of the models and scene images are shown in Figure 3. The computing method of the point-pair transformation matrix is critical for the construction of the PPTFH descriptor, and different computing methods have a strong effect on its performance. In addition to the proposed computing method in Figure 4a, there are two alternative methods to calculate the point-pair transformation matrix, depending on whether the key point is used or not, as shown in Figure 4b,c.
In one method, the transformation matrix corresponding to each point-pair is computed using two Darboux frames, each of which is defined by combining the connecting line between one neighbor point and the key point with the neighbor point normal (Figure 4b). The other makes use of two Darboux frames that are constructed by combining the line determined by the point-pair with their normals, as in [28] (Figure 4c). The two methods are referred to as Method-1 (Figure 4b) and Method-2 (Figure 4c). We used each of the two methods in place of the proposed one, while the other parameters were kept constant. The obtained RPCs are shown in Figure 5a, and it is clearly seen that the PPTFH descriptor with the proposed method achieved the best performance, followed by the PPTFH with Method-1 and Method-2. The proposed method performs better because its Euler angles not only encode the relationship between two neighbor points, but also include the angle deviation between the two lines connecting the neighbor points to the key point. Consequently, the proposed method was selected to compute the point-pair transformation matrix. The point-pair set partition number $N_\delta$ is a critical parameter that affects the descriptiveness, time efficiency, and compactness of the PPTFH descriptor. Figure 5b presents the RPC results under different partition numbers on the tuning dataset. The best performance is achieved when the partition number $N_\delta$ is equal to 4. The other two important parameters are the bin numbers $N_d$ and $N_a$. The configuration of these parameters greatly affects the robustness and discriminability of the proposed method. Hence, we test the performance of the descriptor for different bin numbers.
The RPC results are presented in Figure 5c, where it can be observed that the best performance is achieved when the bin numbers of the Euclidean distance and angle features are set to 7 and 5, respectively. Accordingly, we set $N_d = 7$ and $N_a = 5$ in this study, and the dimensionality of the PPTFH descriptor is equal to 4 × 3 × 7 × 5 = 420.

Performance Evaluation Experiments
Herein, we test the performance of the PPTFH descriptor in various application scenarios using RPC (Recall Precision Curve) and AUC pr (Area Under Curve) as evaluation metrics [26]. We first describe the implementation details of the experiments, namely the benchmark datasets, the compared methods, and evaluation metrics. Then, the proposed PPTFH descriptor is compared with state-of-the-art surface descriptors in terms of descriptiveness, robustness, compactness and time efficiency. Moreover, the generalization ability of the proposed point-pair division strategy is verified by the experimental results. All the experiments were implemented in VS2017 and PCL and conducted on a PC with an Intel Core i7-8700 3.2 GHz CPU and 16 GB of RAM.

Experiment Datasets and Methods
Four popular datasets were selected to conduct a series of experiments: the Bologna retrieval (BR) dataset [24] for 3D shape retrieval, the Stanford 3D scanning repository (SDSR) dataset [30] for partial 3D data registration, the UWA dataset [31] for 3D object recognition, and the Kinect dataset [32] for 3D object recognition with low-quality surfaces. Examples are shown in Figure 6.
More specifically, there are six 3D models and 45 synthetic scenes in the BR dataset. To evaluate robustness, we resample all scenes to 1/2, 1/4, 1/8, and 1/16 of the original mesh resolution to enhance the nuisance factors in this dataset, and add Gaussian noise with a standard deviation of 0.1, 0.3, 0.5, 0.7, and 0.9 mr separately to the 1/4-mesh-resolution scenes. The UWA dataset involves 5 models and 50 real scenes that are generated by scanning several real objects with random placement. Consequently, clutter and occlusions are the main challenges in this dataset. The SDSR dataset contains 15 scans from Happy-Buddha and 15 scans from Dragon. The nuisance factors in this dataset are missing regions, holes, and self-occlusions. The Kinect dataset consists of 6 models and 16 scenes acquired by the Microsoft Kinect sensor. In addition to the low mesh quality, moderate occlusion and clutter are also nuisance factors in this dataset. In all comparative experiments, the PPTFH descriptor was compared with the most representative methods for performance evaluation in different 3D vision applications. These descriptors were divided into two categories: PPF-based and LRF-based methods. The PPF-based descriptors included PFH [18], FPFH [20], PPFH [22], and HoPPF [28], all of which were generated using a variety of point-pair features. We also selected two well-known LRF-based descriptors (i.e., SHOT [24] and TOLDI [26]) for comparison with the proposed descriptor. Finally, to evaluate the generalization ability of the proposed point-pair set division method, we applied this strategy to the PFH descriptor. Specifically, the point-pair sets were partitioned into 4 subsets, and the PFH descriptor in each subset was computed as a sub-feature. Then, the modified PFH (MoPFH) descriptor was generated by concatenating the four sub-features into a vector, and the dimensionality of the MoPFH was 4 × 125 = 500.
The parameter information of these state-of-the-art descriptors is listed in Table 1. It should be noted that, in all experiments, using the uniform sampling method mentioned in [14], we sampled around 1000 points as the key points in each model to compute the descriptors. The key points in the scenes were obtained by transferring the model key points to the scene using the ground truth given by the four datasets. We uniformly set the support radius of all comparative descriptors to 15 mr to ensure fairness in the comparisons.

Evaluation Metrics
To quantitatively assess descriptiveness, robustness, and compactness, we adopted the popular Recall vs. 1-Precision Curve (RPC) and Area Under Curve ($\mathrm{AUC}_{pr}$) as performance evaluation metrics. Note that $\mathrm{AUC}_{pr}$ is the area between the RPC and the 1-precision axis. The RPC results can be generated through the following steps.
First, given a model, a scene, and the ground-truth pose, we randomly sampled around 1000 points as the key points in each model, and the key points in each scene were obtained by transforming the model key points to the scene using the ground-truth transformation matrix. Then, the descriptor of each model key point was matched against the descriptors of all scene key points to find the closest and the second closest descriptors. If the ratio $\varepsilon$ between the closest and the second closest descriptor distances was smaller than a given threshold $\tau$, the model key point and the scene key point with the closest descriptor distance were considered a match. A match was further defined as correct if the Euclidean distance between the transformed model key point and the scene key point was sufficiently small (i.e., smaller than 1/3 of the support radius of the descriptor in this study); otherwise, it was regarded as a false match. Consequently, for a given threshold, the recall and 1-precision are defined as

$$\text{recall} = \frac{\text{number of correct matches}}{\text{total number of corresponding descriptors}}, \qquad 1 - \text{precision} = \frac{\text{number of false matches}}{\text{total number of matches}}. \tag{9}$$

Finally, the RPC result was generated by varying the threshold. In our study, the thresholds for calculating the RPC were set to 0.3, 0.4, 0.6, 0.75, 0.85, 0.9, 0.95, and 1.0. It is worth noting that the RPC lies in the upper left area of the plot when the descriptor matching achieves both high recall and high precision.
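The matching protocol and Equation (9) can be sketched as follows. The sketch assumes Euclidean descriptor distances, a brute-force nearest-neighbor search, and that every model key point has a true counterpart in the scene (so the total number of corresponding descriptors equals the number of model key points); the function name `rpc_points` and its arguments are hypothetical.

```python
import math

def rpc_points(model_desc, scene_desc, model_kp_in_scene, scene_kp, radius, thresholds):
    """Return one (1 - precision, recall) pair per ratio threshold."""
    candidates = []
    for m, dm in enumerate(model_desc):
        # Closest and second closest scene descriptors (brute force).
        ranked = sorted((math.dist(dm, ds), s) for s, ds in enumerate(scene_desc))
        (d1, s1), (d2, _) = ranked[0], ranked[1]
        ratio = d1 / d2 if d2 > 0 else 1.0
        # Correct if the matched scene key point lies within radius / 3 of the
        # ground-truth position of the model key point.
        correct = math.dist(model_kp_in_scene[m], scene_kp[s1]) < radius / 3.0
        candidates.append((ratio, correct))
    n_corresponding = len(model_desc)  # assumption: one true counterpart per key point
    curve = []
    for tau in thresholds:
        matches = [correct for ratio, correct in candidates if ratio < tau]
        n_correct = sum(matches)
        n_false = len(matches) - n_correct
        recall = n_correct / n_corresponding
        one_minus_precision = n_false / len(matches) if matches else 0.0
        curve.append((one_minus_precision, recall))
    return curve
```

Because the ratio and the correctness of each nearest-neighbor match do not depend on the threshold, they are computed once and reused for all thresholds.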

Performance Evaluation Results and Discussion
The performance of the PPTFH descriptor is compared with that of the descriptors in Table 1 in terms of the RPC and AUC pr metrics (Figures 7 and 8) on the 4 benchmark datasets. The evaluation is in terms of local surface descriptiveness, robustness to various nuisance factors, compactness [14], and time efficiency. The details are as follows.

Descriptiveness of the PPTFH Descriptor
As shown in Table 2, our PPTFH descriptor is superior to the state-of-the-art methods in terms of descriptiveness. More specifically, according to the results in Figure 7a on the BR dataset with 0.5 mr noise and 1/4 decimation resolution, the proposed PPTFH method outperforms the other descriptors in terms of descriptiveness by a large margin (at least 0.2 in $\mathrm{AUC}_{pr}$), followed by HoPPF. Moreover, the HoPPF descriptor outperforms the remaining descriptors by 0.1 in $\mathrm{AUC}_{pr}$, which is consistent with the results in [28]. The PFH descriptor is significantly inferior to the others because it is more sensitive to noise. Clutter and occlusion are the main challenges in the UWA dataset compared with the BR dataset. To improve computational efficiency, we resample the scenes with 1/4 mesh decimation, and the key points on the scene boundary are removed. The RPC and $\mathrm{AUC}_{pr}$ results for the UWA dataset are shown in Figure 7b. The proposed PPTFH descriptor again outperforms all the others by a large margin, followed by the HoPPF and PPFH descriptors, which perform similarly. Moreover, compared with these three methods, the other descriptors exhibit a dramatic degradation in descriptive performance on the UWA dataset. The SDSR dataset contains 2.5D range images from different views, involving missing regions, holes, and self-occlusions. As shown in Figure 7c, the HoPPF descriptor achieves the best performance by a wide margin, followed by MoPFH and the proposed PPTFH method. The other methods are inferior to these three descriptors. In contrast to the above three datasets, the Kinect dataset is acquired by the low-cost Kinect 3D sensor, and the mesh quality of the scanned range images is lower than that of 3D data scanned by a laser scanner. Therefore, in terms of RPC and $\mathrm{AUC}_{pr}$ values, the results in Figure 7d are inferior to those on the above three datasets by a large margin.
Furthermore, the PPTFH descriptor is slightly superior to the SHOT method, followed by HoPPF. Remarkably, the descriptiveness of the PPFH descriptor is inferior to that of the other methods, and this observation coincides with the results in [28]. Moreover, from the results shown in Figure 7a-d, the modified PFH (MoPFH) descriptor with the proposed division strategy achieves a significant performance improvement compared with the original PFH. The results demonstrate that the proposed point-pair set division strategy can enhance the performance of point-pair descriptors.
A comprehensive analysis of the results on the four datasets indicates two interesting phenomena. First, the descriptiveness of early PPF-based methods (i.e., PFH and FPFH) is generally inferior to that of recent LRF-based descriptors (i.e., SHOT and TOLDI), because LRF-based methods can use their LRFs to encode richer surface information than early PPF-based methods. Second, the most recent PPF-based methods (i.e., the proposed PPTFH and HoPPF) outperform recent LRF-based methods in terms of descriptive power. This can be explained by the fact that these descriptors make full use of the spatial and geometric cues provided by the point-pair set partition strategy and the novel point-pair features, and they are not affected by unstable LRFs.

Robustness to Various Nuisance Factors
Herein, we use $\mathrm{AUC}_{pr}$ values to evaluate the robustness of the PPTFH descriptor to Gaussian noise, mesh resolution variations, scene clutter, and occlusion. The experiments are only conducted on the Bologna and UWA datasets. The results are shown in Figure 8 and Tables 3-5.
Gaussian Noise. Regarding the robustness to Gaussian noise, we first resampled the Bologna dataset to 1/4 mesh resolution, and then added noise with a standard deviation of 0.1, 0.3, 0.5, 0.7, and 0.9 mr to each scene separately. The $\mathrm{AUC}_{pr}$ results for different levels of noise are shown in Figure 8a and Table 3. The proposed PPTFH descriptor achieved the best performance for each noise level, followed by the HoPPF and TOLDI descriptors. It can also be seen that the performance margin between the PPTFH descriptor and the other methods increased for high noise levels (i.e., 0.5, 0.7, and 0.9 mr Gaussian noise), and the PPTFH method achieved satisfactory performance even at the highest noise levels (with an $\mathrm{AUC}_{pr}$ of at least 0.5, whereas the corresponding values for the other methods were less than 0.36).
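The noise protocol depends on the mesh resolution (mr), defined earlier as the average distance between each point and its nearest neighbor. A minimal sketch is given below; it uses a brute-force nearest-neighbor search and assumes that the noise is added independently to each coordinate, which the text does not state explicitly. The function names and the `seed` parameter are hypothetical.

```python
import math
import random

def mesh_resolution(points):
    """Average distance from each point to its nearest neighbor (brute force, O(n^2))."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def add_gaussian_noise(points, sigma_in_mr, seed=0):
    """Perturb each coordinate with zero-mean Gaussian noise of std sigma_in_mr * mr."""
    sigma = sigma_in_mr * mesh_resolution(points)
    rng = random.Random(seed)  # fixed seed so that the perturbed scene is reproducible
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]
```

For the scene sizes used in practice, the quadratic search would normally be replaced by a spatial index such as a k-d tree.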
Mesh Decimation. The proposed PPTFH descriptor performed the best at all levels of mesh decimation (Figure 8b and Table 3). More specifically, when the mesh resolution was low (i.e., 1/8 and even 1/16 of the original), the PPTFH method outperformed all the others by a large margin, and its $\mathrm{AUC}_{pr}$ values were always greater than 0.5, whereas those of the others were less than 0.25 at the lowest mesh resolution (1/16).
Clutter and Occlusions. On the UWA dataset, the effects of scene clutter and occlusion on the performance of the PPTFH descriptor and the other descriptors were measured using the AUC_pr values, where the clutter rate was recomputed as suggested in [16] and the occlusion rate was provided by the UWA dataset. The results for different levels of clutter and occlusion are shown in Figure 8c,d and Tables 4 and 5. The PPTFH method outperformed the others at all clutter and occlusion levels, followed by the HoPPF and PPFH descriptors. Moreover, as the clutter rate increased, the performance of all descriptors did not change abruptly but decreased gradually. The trend with respect to occlusion was similar to that for clutter when the occlusion rate increased from approximately 60% to 80%; however, at occlusion rates above 80%, the performance of all methods decreased rapidly because of the existence of boundary areas.
These results clearly demonstrate the strong robustness of the PPTFH descriptor to various nuisance factors (i.e., Gaussian noise, mesh decimation, scene clutter, and occlusion). Compared with the other surface descriptors, the proposed PPTFH method performs the best on both the Bologna retrieval dataset with noise and mesh resolution variation and the UWA object recognition dataset with clutter and occlusion. More importantly, the PPTFH descriptor achieves satisfactory performance even in extreme cases (0.9 mr noise and 1/16 mesh decimation).

Compactness
In this part, the compactness of the descriptors in Table 1 is evaluated on the BR, UWA, SDSR, and Kinect datasets. Compactness is also a significant attribute of a local descriptor, as it affects both the efficiency of feature matching and the memory usage. Compactness measures the performance contributed by each floating-point number in a descriptor vector [14], and is defined as:
$$\mathrm{Compactness} = \frac{\mathrm{AUC}_{pr}}{\mathrm{Dimensionality}} \qquad (10)$$
where the AUC_pr values of the compared descriptors are presented in Figure 7 and the dimensionality of these descriptors is given in Table 1. The compactness of the compared descriptors calculated by Equation (10) is shown in Figure 9a. The FPFH method achieves the best result in terms of compactness, followed by the PFH. The high compactness of these two descriptors is mainly due to their very short lengths. Our PPTFH is the third most compact descriptor owing to its high AUC_pr value, which means that it achieves a balance between descriptiveness and compactness. In contrast, the TOLDI method performs poorly in terms of compactness because of its high dimensionality (up to 1200).
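The compactness measure can be illustrated with a short sketch. The dimensionalities below are the ones quoted in the text (420 for PPTFH, 600 for HoPPF, and 1200 for TOLDI), whereas the AUC_pr values are hypothetical placeholders chosen only to make the example runnable; they are not the measured results of Figure 7.

```python
def compactness(auc_pr, dimensionality):
    """Compactness as AUC_pr divided by the descriptor length."""
    if dimensionality <= 0:
        raise ValueError("dimensionality must be positive")
    return auc_pr / dimensionality

# Descriptor lengths quoted in the text.
dimensionality = {"PPTFH": 420, "HoPPF": 600, "TOLDI": 1200}
# Hypothetical AUC_pr values (placeholders, NOT the paper's measurements).
auc_pr = {"PPTFH": 0.90, "HoPPF": 0.85, "TOLDI": 0.80}

scores = {name: compactness(auc_pr[name], dimensionality[name])
          for name in dimensionality}
# Descriptors sorted from most to least compact.
ranking = sorted(scores, key=scores.get, reverse=True)
```

With comparable AUC_pr values, the ranking is dominated by the descriptor length, which is why very short descriptors score well on this measure.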

Time Efficiency
In this section, we test the efficiency of all the compared descriptors (see Table 1). We first randomly sampled approximately 1000 key points from the tuning dataset (see Section 2.3). Then, we measured the total time required to generate each descriptor on the extracted key points with different support radii (from 10 mr to 30 mr in increments of 5 mr), and took the average time for computing one descriptor as the final result.
The evaluation results are shown in Figure 9b, from which several observations can be made. First, the PPFH achieves the best time efficiency, followed by the TOLDI and SHOT, because the time complexity of these three methods is O(k) in the number k of nearest neighbors, which is lower than that of the other four methods (PPTFH, HoPPF, FPFH, and PFH). Moreover, our PPTFH is moderate in terms of time efficiency, and it is slightly more efficient than HoPPF owing to its lower dimensionality (420 vs. 600). Finally, the PFH is the slowest descriptor because of its O(k^2) computational complexity.
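The gap between the two complexity classes can be made concrete by counting the point pairs that are processed for one key point. This sketch assumes that an O(k) method pairs the key point with each of its k neighbors once, whereas an O(k^2) method such as PFH considers every pair among the k neighbors; the function names are hypothetical.

```python
def keypoint_neighbor_pairs(k):
    """Pairs formed by the key point and each of its k neighbors: O(k)."""
    return k

def all_neighbor_pairs(k):
    """Unordered pairs among the k neighbors: k(k-1)/2, i.e., O(k^2)."""
    return k * (k - 1) // 2

for k in (50, 100, 200):
    linear = keypoint_neighbor_pairs(k)
    quadratic = all_neighbor_pairs(k)
    print(f"k={k:4d}  O(k) pairs={linear:6d}  O(k^2) pairs={quadratic:8d}")
```

Doubling k doubles the work of an O(k) method but roughly quadruples the work of an O(k^2) method, so the difference grows quickly with the support radius.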

Surface Matching on Four Benchmark Datasets
To further validate the effectiveness of the PPTFH approach in different application scenarios, we applied the PPTFH descriptor and the aforementioned six methods (Table 1) to 3D correspondence-based surface matching on the BR, UWA, SDSR, and Kinect datasets. As in [22], the F1 score was used to measure the surface matching performance of these local feature descriptors.
Specifically, we resampled the BR object retrieval dataset to 1/4 mesh decimation and added Gaussian noise with a standard deviation of 0.9 mr to the decimated scenes; the UWA and SDSR datasets were also resampled to 1/4 mesh decimation. Approximately 1000 key points were sampled from each model using the uniform sampling method in [14], and each scene was sampled to a number of key points sufficient to cover each instance in the scene. The local features at the key points were then computed with each of the compared descriptors. Subsequently, the local features of the model and the scene served as the input of the nearest neighbor distance ratio (NNDR) matching technique [33] to obtain the 3D correspondences, which were then fed to the surface-matching pipeline. Finally, to reject incorrect correspondences and obtain a coarse pose estimate, we adopted the geometric constraints (GC) in [34] to remove mismatches, and the random sample consensus (RANSAC) [35] technique to estimate the pose transformation from the model to the scene.
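The NNDR matching step can be sketched as follows. This is a brute-force illustration that assumes Euclidean distances between descriptor vectors and matching from the scene to the model; the function name nndr_match and the toy 4-dimensional descriptors are hypothetical, and the default ratio of 0.95 is the threshold reported for the experiments.

```python
import numpy as np

def nndr_match(model_desc, scene_desc, ratio=0.95):
    """Return (scene_index, model_index) pairs that pass the NNDR test."""
    # Pairwise Euclidean distances, shape (num_scene, num_model).
    dist = np.linalg.norm(scene_desc[:, None, :] - model_desc[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)[:, :2]  # two nearest model features
    rows = np.arange(len(scene_desc))
    nearest = dist[rows, order[:, 0]]
    second = dist[rows, order[:, 1]]
    # Keep a match only if the nearest neighbor is clearly closer than the second.
    keep = nearest < ratio * second
    return [(int(i), int(order[i, 0])) for i in np.flatnonzero(keep)]

# Toy descriptors: four mutually distant model features.
model = np.eye(4)
scene = np.array([
    [0.01, 0.0, 1.0, 0.0],  # close to model feature 2
    [1.0, 0.0, 0.02, 0.0],  # close to model feature 0
    [0.0, 0.5, 0.0, 0.5],   # equidistant to features 1 and 3: ambiguous
])
matches = nndr_match(model, scene)
```

In this toy case the third scene feature is rejected because its two nearest model features are equally distant, which is exactly the ambiguity that the ratio test is meant to filter out. A k-d tree would replace the brute-force distance matrix for realistic descriptor counts.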
It should be noted that, in all 3D surface matching experiments, the threshold in the NNDR method was set to 0.95, and the number of RANSAC iterations was always 10,000. Moreover, for each descriptor and each dataset, the other parameters of the matching pipeline were tuned to maximize the number of true positives and minimize the number of false positives, so that the best performance in terms of the F1 score could be achieved. With the other parameters held constant, we conducted a series of surface matching experiments under different support radii (increasing from 10 to 30 mr in increments of 5 mr).
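For reference, the F1 score can be computed from the counts of true positives, false positives, and false negatives. The sketch below uses the standard definition as the harmonic mean of precision and recall; the function name and the convention of returning 0 when no positives exist are assumptions made for this illustration.

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 = 2PR / (P + R), with P = TP / (TP + FP) and R = TP / (TP + FN)."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    if predicted == 0 or actual == 0 or true_pos == 0:
        return 0.0  # assumed convention when precision or recall is undefined or zero
    precision = true_pos / predicted
    recall = true_pos / actual
    return 2.0 * precision * recall / (precision + recall)

# Hypothetical counts: 8 correct matches, 2 false matches, 2 missed instances.
score = f1_score(true_pos=8, false_pos=2, false_neg=2)
```

Because the F1 score penalizes both false positives and false negatives, tuning the pipeline to raise true positives while lowering false positives directly increases it.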
The results are shown in Figure 10, and Table 6 lists the highest F1 score of each local descriptor. It can be observed that the proposed PPTFH descriptor achieves the best surface matching performance on all four datasets, followed by the HoPPF, TOLDI, and SHOT methods, which is consistent with the conclusions regarding the descriptive power in Section 3.3.1. Furthermore, the proposed PPTFH method achieves the best performance across the different support radii. It can be concluded that, under noise, varying mesh resolution, clutter, occlusion, and mesh quality variations, the PPTFH method consistently provides a discriminative and robust description of the local surface.
In addition, as shown in Figure 10a, the matching performance of most descriptors (with the exception of the PFH) improves as the support radius increases. This is because, in this object retrieval dataset (BR dataset), the descriptors can encode more surface information under a larger radius and are not affected by clutter, occlusion, or missing mesh regions. However, Figure 10b-d indicates that the object recognition performance of most methods (except for PPFH and TOLDI) first improves and then degrades as the support radius increases. The initial improvement can be explained by the fact that these descriptors include richer surface features under a larger radius, whereas the presence of clutter, occlusion, missing mesh regions, and boundaries in the three datasets has a seriously negative effect on the matching performance when the support radius is excessively large. Furthermore, Figure 10 shows that different local descriptors reach their optimal surface matching performance at different support radii on different datasets. This observation indicates that no single support radius optimizes the surface matching performance across different application scenarios and nuisances (e.g., noise, clutter, occlusion, and varying point density). In general, according to our study and the literature [14], we recommend selecting the support radius from 15 mr to 25 mr.
We also present some visual surface matching results based on our PPTFH descriptor, as shown in Figure 11. From top to bottom, the subfigures show surface matching samples on the BR, UWA, SDSR, and Kinect datasets. From left to right, they show the model, the scene, the correspondences obtained by matching PPTFH descriptors with the NNDR technique, the correspondences after removing the mismatches by GC, and the surface matching results obtained by RANSAC. Note that the initial model and scene exhibit an obvious pose variation. After matching the PPTFH descriptors from the scene to the model, there are always enough correspondences to align the model to the scene with the RANSAC transformation method.

Surface Matching on the WHU-TLS Dataset
Besides the surface matching on the above four benchmark datasets, we also applied our PPTFH descriptor to the registration of large-scale terrestrial laser scanner point clouds on the WHU-TLS dataset [3,4]. The WHU-TLS dataset consists of 115 scans from 11 different environments with varying point density, clutter, and occlusion. Herein, we selected 34 representative scan pairs from 5 environments (i.e., mountain, campus, residence, riverbank, and heritage building) of the WHU-TLS dataset to perform the pairwise 3D registration experiment. More specifically, for a pair of point clouds, we first calculated the transformation matrix for registering the two point clouds by using our proposed method. Then, if the error between the estimated transformation matrix and the ground truth was less than the given threshold, we considered the two point clouds to be registered correctly. The precision was the ratio of the number of correct registrations to the total number of pairs.
The registration results are shown in Table 7. Despite the complex environments and low point cloud quality of these scans, we successfully registered 32 of the 34 scan pairs, corresponding to a precision of 94.12%. Moreover, some visual 3D registration results for the five environments are presented in Figure 12 (from top to bottom: mountain, campus, residence, riverbank, and heritage building), which show that our proposed PPTFH descriptor is able to generate enough correspondences to align the two 3D point clouds.
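The success criterion and the precision described above can be sketched as follows. The sketch assumes that the error between two 4 x 4 rigid transformations is measured by a rotation angle and a translation distance; the thresholds max_rot_deg and max_trans are hypothetical, as the threshold actually used is not restated here.

```python
import numpy as np

def registration_errors(T_est, T_gt):
    """Rotation error (degrees) and translation error between two 4x4 rigid transforms."""
    R_est, t_est = T_est[:3, :3], T_est[:3, 3]
    R_gt, t_gt = T_gt[:3, :3], T_gt[:3, 3]
    # Angle of the residual rotation R_gt^T R_est, from its trace.
    cos_angle = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    rot_err_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    trans_err = np.linalg.norm(t_est - t_gt)
    return rot_err_deg, trans_err

def is_correct(T_est, T_gt, max_rot_deg=5.0, max_trans=0.5):
    """Hypothetical thresholds; the paper's threshold is not specified in this section."""
    rot_err_deg, trans_err = registration_errors(T_est, T_gt)
    return rot_err_deg < max_rot_deg and trans_err < max_trans

def precision(num_correct, num_pairs):
    """Ratio of correctly registered pairs to all pairs."""
    return num_correct / num_pairs

# Toy example: the estimate is off by 2 degrees about z and 0.1 units along x.
angle = np.radians(2.0)
T_gt = np.eye(4)
T_est = np.eye(4)
T_est[:3, :3] = [[np.cos(angle), -np.sin(angle), 0.0],
                 [np.sin(angle), np.cos(angle), 0.0],
                 [0.0, 0.0, 1.0]]
T_est[0, 3] = 0.1
rot_err, trans_err = registration_errors(T_est, T_gt)
reported = round(100.0 * precision(32, 34), 2)  # 32 of 34 pairs, as reported
```

The last line reproduces the arithmetic behind the reported precision from the counts given in the text.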

Discussions and Conclusions
In this article, we proposed a novel PPTFH descriptor for 3D surface description, together with a point-pair division strategy. The prominent advantages of our PPTFH descriptor are its high descriptiveness and strong robustness.
With regard to the point-pair division strategy, a novel spatial feature, namely the distance between the key point and the line determined by two neighbor points, was introduced to divide the point-pair sets into four subsets. Unlike the classical methods (i.e., PFH [18] and FPFH [20]), which directly encode the surface information without partitioning the point-pair sets, our technique uses a simple yet efficient spatial feature to divide the point-pair sets; thus, it improves the descriptiveness and robustness of the PPF-based descriptor.
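The spatial feature used for the division can be illustrated with a short sketch. The point-to-line distance follows directly from the cross product; however, the mapping of that distance to one of four subsets through equal-width bins over the support radius is only an assumption made for illustration, and the function names are hypothetical. The actual partition rule is the one defined in the method section.

```python
import numpy as np

def point_line_distance(p, q_i, q_j):
    """Distance from key point p to the line through neighbor points q_i and q_j."""
    direction = q_j - q_i
    length = np.linalg.norm(direction)
    if length == 0.0:
        return float(np.linalg.norm(q_i - p))  # degenerate pair: fall back to point distance
    return float(np.linalg.norm(np.cross(q_i - p, direction)) / length)

def subset_index(distance, support_radius, num_subsets=4):
    """Assumed rule: equal-width distance bins over [0, support_radius]."""
    index = int(num_subsets * distance / support_radius)
    return min(index, num_subsets - 1)  # a distance equal to the radius goes to the last bin

key_point = np.zeros(3)
pair_a = (np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0]))   # line x = 1
pair_b = (np.array([0.1, -1.0, 0.0]), np.array([0.1, 1.0, 0.0]))  # line x = 0.1

dist_a = point_line_distance(key_point, *pair_a)
dist_b = point_line_distance(key_point, *pair_b)
subset_a = subset_index(dist_a, support_radius=2.0)
subset_b = subset_index(dist_b, support_radius=2.0)
```

Because both neighbor points lie within the support radius of the key point, the distance to the line through them cannot exceed the radius, so every point pair falls into one of the bins.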
The PPTFH descriptor was then generated from four sub-features, each of which was constructed from three transformation feature histograms corresponding to one point-pair subset. Both geometric and spatial information is encoded in the PPTFH descriptor in a comprehensive manner. The main characteristics of the proposed PPTFH descriptor are summarized as follows. First, the PPTFH descriptor makes use of a stable and robust division strategy to preprocess the point-pair sets, which enhances its descriptiveness and robustness. Furthermore, the PPTFH is highly informative because it makes use of four point-pair transformation attributes computed via our defined Darboux frames to encode the local surface. Finally, the PPTFH is robust to noise and varying mesh resolutions, owing to the interpolation and normalization operations.
To evaluate our PPTFH descriptor, a series of experiments and comparisons were performed on the BR, UWA, SDSR, Kinect, and WHU-TLS datasets, which are, respectively, relevant to shape retrieval, object recognition, 3D registration, 3D object recognition with low-quality surfaces, and 3D terrestrial laser scanner point cloud registration in the domain of remote sensing. The results reveal that our proposed point-pair set division strategy can be applied to other point-pair-feature-based methods (e.g., PFH) to improve their descriptiveness and robustness, and that the proposed PPTFH descriptor outperforms state-of-the-art methods by a large margin in terms of descriptiveness and robustness. In addition, the superior feature matching performance of the PPTFH descriptor is validated by its application to 3D surface matching on five benchmark datasets.
In the future, our work will focus on improving time efficiency, as the time consumption of the PPTFH method is only moderate compared with that of the state-of-the-art methods. In addition, with the development of RGB-D sensors, integrating color cues into our PPTFH descriptor would be helpful for applications involving objects that have limited geometric features but rich texture information.