A Novel Pallet Detection Method for Automated Guided Vehicles Based on Point Cloud Data

Automated guided vehicles are widely used in warehousing environments for automated pallet handling, which is fundamental to constructing intelligent logistics systems. Pallet detection is a critical technology for automated guided vehicles that directly affects production efficiency. A novel pallet detection method for automated guided vehicles based on point cloud data is proposed, which consists of five modules: point cloud preprocessing, key point extraction, feature description, surface matching and point cloud registration. The proposed method combines the color and geometric features of the pallet point cloud and constructs a new Adaptive Color Fast Point Feature Histogram (ACFPFH) feature descriptor by selecting the optimal neighborhood adaptively. In addition, a new surface matching method called the Bidirectional Nearest Neighbor Distance Ratio-Approximate Congruent Triangle Neighborhood (BNNDR-ACTN) is proposed. The proposed method overcomes the problems of current methods, such as low efficiency, poor robustness and arbitrary parameter selection. To verify its performance, the proposed method is compared with the traditional and modified Iterative Closest Point (ICP) methods in two real-world cases. The results show that the Root Mean Square Error (RMSE) is reduced to 0.009 and the running time is reduced to 0.989 s, which demonstrates that the proposed method achieves faster registration while maintaining higher registration accuracy.


Introduction
Under the background of "Industry 4.0", the logistics industry is facing challenges including structural adjustment, industrial optimization, cost reduction and efficiency improvement, and has also ushered in development opportunities such as information technology, intelligent logistics and machine vision [1]. As an important part of an intelligent logistics system, automated guided vehicles are widely used in warehousing, production, service, aerial work and other scenarios, which can establish a human-machine friendly interactive environment and reduce the incidence of safety accidents [2]. However, in the actual storage environment, due to the influence of many factors such as excessive obstacles, uneven illumination, accumulated handling errors and manual intervention, automated guided vehicles suffer from low efficiency and repeated handling in the process of pallet handling [3]. With the help of a 3D vision sensor, automated guided vehicles can detect the pallet in the scene, which can effectively solve these problems. Pallet detection for automated guided vehicles is widely used in various scenarios including storage shelves, production workshops, drug transport and blast furnace conditions, which are shown in Figure 1. The existing vision-based object detection methods are mainly divided into two categories: the image-based method [4] and the point cloud-based method [5,6]. There has been a large amount of research on object detection based on images [7][8][9][10][11]. Specific to pallet detection, Li et al.
[12] applied the Region Growing algorithm to extract the whole target region, and the pallet was located by the Progressive Probabilistic Hough Transform (PPHT) method, which solved the problem of difficult target detection under unstable light conditions. Syu et al. [13] used a monocular vision system on the forklift and combined the Adaptive Structure Feature (ASF) and Direction Weighted Overlapping (DWO) ratio to detect the pallet, which removed most of the non-stationary background and significantly increased the processing efficiency. Li et al. [14] established a pallet dataset and applied an improved deep learning object detection algorithm to obtain detection results, which improved the efficiency and accuracy of pallet detection. The above methods of object detection based on 2D images have been intensively investigated and form a relatively mature research area. However, the imaging process of 2D images involves a mapping from 3D space to 2D space, which loses a lot of useful information. Therefore, object detection based on 2D images can no longer satisfy the needs of current industrial production.
With the rapid development of low-cost depth sensors, object detection has shifted from traditional single-point and segment measurement to dense point cloud and full-profile measurement [15][16][17]. Compared with 2D images, 3D point cloud data provide more information about color, texture, geometric features and spatial distribution [18], which makes pallet detection based on the 3D point cloud an active research topic. Early methods relied on artificial features attached to the pallets. Seelinger et al. [19] presented a vision-based approach to identify fiducials placed on each pallet, which provides automated guided vehicle systems with the capability of performing pallet detection tasks. Two reflectors were fixed left and right on the short side of the pallet in the study by Lecking [20] to realize pallet detection. Although these artificial features simplify pallet detection, it takes effort to label all of the pallets in advance, making these approaches infeasible in practice. Guo et al. [21] summarized the existing local feature detection methods and concluded that the contradiction between the descriptiveness and computational efficiency of local feature descriptors was a major challenge in feature extraction. Hence, it is essential to construct a robust and descriptive feature descriptor. The Fast Point Feature Histogram (FPFH) is a commonly used local feature descriptor which performs well in descriptiveness, robustness and efficiency [22]. Additionally, FPFH employs the geometric features of the pallet to build a descriptor without adding any artificial marks. Tao et al. [23] combined SVM classification and the FPFH descriptor to achieve object detection, which improved the robot's detection ability and perception in three-dimensional space. A new point registration algorithm that combines FPFH and greedy projection triangulation was presented by Liu et al. [24], which improved the accuracy of registration.
Li et al. [25] proposed a novel point registration method called the Four Initial Point Pairs (FIPP) algorithm based on the FPFH feature descriptor; the accuracy of FIPP could reach a good level, but its efficiency was low on large datasets. However, few studies have considered the color information or the criteria for selecting the neighborhood radius in the FPFH descriptor. Most researchers adjust the neighborhood radius manually based on prior knowledge, which involves a certain randomness, low efficiency and high complexity.
In response to the above problems, a novel pallet detection method for automated guided vehicles based on point cloud data is proposed, including point cloud preprocessing, key point extraction, feature description, surface matching and point cloud registration. The main contributions can be summarized as: (1) the proposed method considers the HSV color feature, which improves the detection accuracy; (2) an ACFPFH feature descriptor is proposed and the criteria for adaptive selection of the optimal neighborhood radius are established; (3) a new surface matching method called the Bidirectional Nearest Neighbor Distance Ratio-Approximate Congruent Triangle Neighborhood (BNNDR-ACTN) is proposed, which increases the efficiency and accuracy. The proposed method not only overcomes the drawback of randomness and inefficiency of neighborhood selection in traditional feature extraction but also improves the accuracy and efficiency of pallet detection. Moreover, the proposed method can be well adapted to a variety of complex scenes such as the ground and the shelf.
The rest of the paper is organized as follows: In Section 2, the proposed pallet detection method based on the ACFPFH feature descriptor is described. Section 3 outlines two specific case studies and further comparison analysis for verifying the proposed method in engineering applications. Finally, Section 4 concludes this paper.

Overview of the Proposed Method
This section describes an overview of the proposed method. It consists of five modules: point cloud preprocessing, key point extraction, feature description, surface matching and point cloud registration. The framework of the proposed method is shown in Figure 2. The procedure involves the following steps.
Step 1: Point cloud preprocessing. The Percipio FM851-E2 3D vision sensor is used to acquire the point cloud data that represent the whole scene, including the pallet. Outliers are eliminated, and redundant information, such as walls and grounds, is removed using the Random Sample Consensus (RANSAC) algorithm.
Step 2: Key point extraction. The key points with rich information are extracted from the scene point cloud by the Intrinsic Shape Signatures (ISS) algorithm.
Step 3: Feature description. The optimal neighborhood radius of each point is obtained based on the minimum criterion of the neighborhood feature entropy function. The color components and the geometric information based on the optimal neighborhood of each key point are encoded into a representative feature descriptor called the Adaptive Color Fast Point Feature Histogram (ACFPFH). The pallet template point cloud and its corresponding library of feature descriptors are obtained by performing the above three steps.
Step 4: Surface matching. The matching method based on the Bidirectional Nearest Neighbor Distance Ratio (BNNDR) is employed to complete feature matching between the pallet template point cloud and the scene point cloud. Considering that there are some incorrect matching point pairs which will reduce the registration accuracy, it is essential to eliminate them by the Approximate Congruent Triangle Neighborhood (ACTN).
Step 5: Point cloud registration. The RANSAC algorithm is applied for performing point cloud coarse registration, which can obtain the relationship between the template point cloud and the scene point cloud and provide an ideal initial position for fine registration. The fine registration works to obtain a final optimal transformation matrix using the Iterative Closest Point (ICP) algorithm.

Outlier Elimination
Due to the hardware design of the Percipio FM851-E2 3D vision sensor, external environmental interference and other factors, point cloud outliers are inevitable in the measurement. The pallet detection results will have errors if the outliers in the original scene point cloud Q_so are not eliminated.
The distance from an arbitrary point P_i in the point cloud to its neighborhood points P_ik (k = 1, 2, . . . , m) is approximately subject to a Gaussian distribution, and the probability density function of the average neighborhood distance is listed below:

f(d_i) = 1/(σ√(2π)) · exp(−(d_i − µ)² / (2σ²))

where i = 1, 2, . . . , n, n represents the number of points in the point cloud, d_i is the average neighborhood distance of the arbitrary point P_i, and µ and σ are the expectation and standard deviation of the average neighborhood distances, respectively. After calculating d_i for every point, the point P_i is considered an outlier and removed if d_i falls outside the interval [µ − σ, µ + σ].
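This statistical test can be sketched as follows; a minimal illustration rather than the authors' implementation — the neighbor count m and the brute-force distance computation are assumptions made for clarity:

```python
import numpy as np

def remove_outliers(points, m=8):
    """Statistical outlier removal following the mu +/- sigma test above.

    points : (n, 3) array of XYZ coordinates.
    m      : number of neighbors used for the average neighborhood distance
             (an illustrative choice; a k-d tree would be used in practice).
    """
    n = len(points)
    # Brute-force pairwise distances, excluding each point's self-distance.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    dist[np.arange(n), np.arange(n)] = np.inf
    # d_i: mean distance to the m nearest neighbors of each point.
    d = np.sort(dist, axis=1)[:, :m].mean(axis=1)
    mu, sigma = d.mean(), d.std()
    # Keep points whose average neighborhood distance lies in [mu - sigma, mu + sigma].
    keep = np.abs(d - mu) <= sigma
    return points[keep]
```

A single far-away point inflates its own d_i far beyond µ + σ and is discarded, while the dense cluster survives.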

Plane Segmentation
In warehousing environments, the scene point cloud acquired by the Percipio FM851-E2 3D vision sensor contains a lot of redundant information, such as the grounds and the walls, which decreases the calculation efficiency. Therefore, it is necessary to remove the useless planes in the scene point cloud [26]. The specific segmentation procedures are as follows:
Step 1: The plane equation in the three-dimensional point cloud is defined as:

Ax + By + Cz + D = 0

where A, B and C are the plane parameters and D is the constant term; with (A, B, C) normalized, |Ax_i + By_i + Cz_i + D| gives the distance D_i from a point P_i to the plane. Randomly select three points from the scene point cloud Q_SE after removing outliers and obtain the parameters of the initial plane P_I.
Step 2: Calculate the distance D_i from the point P_i to the initial plane P_I and the angle β_i between the normal vector of the point P_i and the normal vector of the initial plane P_I. Set the distance threshold D_ε and the angle threshold β_ε; if both D_i < D_ε and β_i < β_ε are satisfied, the point P_i belongs to the plane P_I.
Step 3: Repeat the above procedures until the number of the points in the plane reaches the threshold t, and remove the final fitted plane model to obtain the preprocessed scene point cloud Q s .
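The RANSAC-style plane fit of Steps 1-3 can be sketched as below; a minimal illustration in which the normal-angle test β_i is omitted and the thresholds, iteration count and seed are illustrative assumptions:

```python
import numpy as np

def ransac_plane(points, d_eps=0.02, iters=200, seed=0):
    """Fit a dominant plane with RANSAC; returns (inlier_mask, (A, B, C, D))."""
    rng = np.random.default_rng(seed)
    best_mask, best_plane, best_count = None, None, -1
    for _ in range(iters):
        # Step 1: three random points define a candidate plane A x + B y + C z + D = 0.
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p2 - p1, p3 - p1)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:            # degenerate (collinear) sample
            continue
        normal /= norm
        D = -normal @ p1
        # Step 2: distance of every point to the candidate plane.
        dist = np.abs(points @ normal + D)
        mask = dist < d_eps
        if mask.sum() > best_count:
            best_count, best_mask, best_plane = mask.sum(), mask, (*normal, D)
    return best_mask, best_plane

# Removing the fitted plane leaves the preprocessed scene cloud: q_s = points[~mask]
```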

Key Point Extraction
The preprocessed scene point cloud Q_s still contains a large number of points, which leads to low efficiency in feature extraction and matching. Selecting key points to simplify the point cloud can retain its features as much as possible while reducing the number of points. The Intrinsic Shape Signatures (ISS) algorithm is widely used for key point extraction, with a fast calculation speed and high repeatability [27]. The extraction procedures of the key points PF_i are summarized as follows:
Step 1: The neighborhood points P_ik (k = 1, 2, . . . , m) of P_i in the scene point cloud Q_S are searched within a certain radius d_p. d_p is the average closest point distance of the point cloud collected by the 3D vision sensor, which can be calculated as follows:

d_p = (1/N) ∑ d_m

where N is the number of points and d_m is the distance between each point and its closest point. Compute a weight parameter ω_ik for each neighborhood point, inversely related to the distance from P_ik to P_i:

ω_ik = 1 / ||P_i − P_ik||

Step 2: The covariance matrix C_i of the point P_i is generated as follows:

C_i = (1 / ∑ ω_ik) ∑_{k=1}^{m} ω_ik (P_ik − P_i)(P_ik − P_i)^T

where m is the number of the neighborhood points P_ik.
Step 3: Calculate the eigenvalues of the covariance matrix C_i and sort them from large to small as λ_i1, λ_i2, λ_i3.
Step 4: Set the thresholds k_1 and k_2; the points satisfying λ_i2/λ_i1 < k_1 and λ_i3/λ_i2 < k_2 are considered key points PF_i.
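The four steps above can be sketched as a compact ISS routine; a minimal illustration with brute-force neighborhoods, and the threshold values k_1, k_2 are placeholders, not the authors' settings:

```python
import numpy as np

def iss_keypoints(points, radius, k1=0.8, k2=0.8):
    """ISS key-point selection via the eigenvalue ratio tests
    lambda2/lambda1 < k1 and lambda3/lambda2 < k2."""
    keypoints = []
    for i, p in enumerate(points):
        diff = points - p
        d = np.linalg.norm(diff, axis=1)
        nbr = (d < radius) & (d > 0)
        if nbr.sum() < 3:
            continue
        # Weights inversely related to the distance to p.
        w = 1.0 / d[nbr]
        q = diff[nbr]
        # Weighted covariance matrix C_i of the neighborhood (relative to p).
        C = (w[:, None, None] * np.einsum('ij,ik->ijk', q, q)).sum(0) / w.sum()
        lam = np.sort(np.linalg.eigvalsh(C))[::-1]  # lambda1 >= lambda2 >= lambda3
        if lam[1] <= 0:                             # degenerate neighborhood
            continue
        if lam[1] / lam[0] < k1 and lam[2] / lam[1] < k2:
            keypoints.append(i)
    return np.array(keypoints)
```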

Feature Description
As for the traditional feature descriptors, the neighborhood radius for all the points is a fixed value. Most studies select the appropriate radius based on empirical knowledge, which has strong subjectivity and low efficiency. Besides, the color information is ignored, which makes it difficult to fully and accurately characterize the objects. Therefore, a novel feature descriptor called ACFPFH is defined, which adaptively selects the optimal neighborhood radius and considers the color and geometric features. The flowchart of the proposed feature description method is shown in Figure 3, and the detailed procedures are described as follows.

Adaptive Optimal Neighborhood Selection
The choice is preferred where the radius is more flexible and allowed to vary within a dataset. The proper neighborhood radius obtained by adaptive selection can reduce the runtime of feature extraction under the premise of ensuring precision. Therefore, a general method for obtaining the adaptive optimal neighborhood radius r opt is proposed in this paper, without the limit of prior knowledge. The detailed procedures about depriving the adaptive optimal neighborhood radius r opt are described as follows.
Step 1: Set the radius range [r_min, r_max] and the change interval ∆r of the neighborhood search. Set the value of r_min equal to the average closest point distance d_p; r_max is the maximal acceptable neighborhood radius for all the points of the scene point cloud Q_s, which can usually be set to a fixed value. Considering that the radius of interest is usually closer to r_min than to r_max, the candidate radii r_j are sampled nonuniformly, where r_mid = (r_min + r_max)/2, j = 1, 2, 3, . . . until r_j > r_max, and ∆r is the adaptive neighborhood radius step. This results in more samples near the radius of interest and fewer when reaching the maximal values.
Step 2: Calculate the covariance matrix C_j and the eigenvalues λ_1 ≥ λ_2 ≥ λ_3 for each neighborhood radius r_j, which determine the dimensionality characteristics of the local neighborhood. Table 1 shows the details of the dimensionality characteristics [28]. Construct the dimensionality features, including the 1D linearity feature L_λ, the 2D planarity feature P_λ and the 3D scattering feature S_λ:

L_λ = (λ_1 − λ_2)/λ_1, P_λ = (λ_2 − λ_3)/λ_1, S_λ = λ_3/λ_1

where L_λ + P_λ + S_λ = 1, and each of them can be regarded as the probability of the point P_i being labeled as a 1D, 2D or 3D structure. Consequently, the task of searching for an optimal neighborhood size can be converted to finding which radius favors the corresponding dimensionality.
Step 3: The entropy function of the local neighborhood, E_neighborhood, is established as a measure of unpredictability based on information entropy theory, and it is defined as [29]:

E_neighborhood = −L_λ ln(L_λ) − P_λ ln(P_λ) − S_λ ln(S_λ)

The smaller the value of the information entropy, the smaller the uncertainty of the variable, which is the core of Shannon entropy theory [30]. Accordingly, the smaller the information entropy of the local neighborhood, the less uncertain the dimensional feature of the points: the greater the probability that a point belongs to a certain dimensional feature, and the more similar the spatial distribution of the local points under that neighborhood radius, the closer the radius is to optimal. It is therefore feasible to obtain the adaptive optimal neighborhood radius r_e-opt according to the minimum criterion of the neighborhood entropy function.
However, the optimal neighborhood radius r e-opt obtained according to Equations (7) and (8) is based on the assumption that obvious dimensionality characteristics exist in the observed point cloud. When the dimensionality features of the point P i are indistinguishable, the optimality of the estimated neighborhood cannot be determined.
In order to avoid the limitation of the above assumption for the scene point cloud and improve the estimation accuracy of the optimal neighborhood, a more general solution for calculating the optimal neighborhood radius r_opt is proposed in this paper. The eigenvalues directly reflect the dimensional distribution characteristics of the neighborhood points. Consequently, the three eigenvalues are normalized by their sum ∑ λ_j to obtain an eigen entropy E_e, defined as:

E_e = −e_1 ln(e_1) − e_2 ln(e_2) − e_3 ln(e_3)    (10)

where e_j = λ_j / ∑ λ_j for j ∈ {1, 2, 3} represents the normalized eigenvalues summing to 1. The optimal neighborhood radius r_opt is obtained according to the minimum criterion of the eigen entropy E_e.
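The minimum-eigen-entropy criterion can be sketched as follows; a simplified illustration that uses an unweighted covariance, and the candidate radii list is an assumption standing in for the sampling scheme of Step 1:

```python
import numpy as np

def optimal_radius(points, p, radii):
    """Pick the neighborhood radius of point p minimizing the eigen entropy
    E_e = -sum(e_j ln e_j) of the normalized covariance eigenvalues."""
    best_r, best_E = None, np.inf
    for r in radii:
        nbr = points[np.linalg.norm(points - p, axis=1) < r]
        if len(nbr) < 3:
            continue
        C = np.cov(nbr.T)
        lam = np.clip(np.linalg.eigvalsh(C), 1e-12, None)  # guard ln(0)
        e = lam / lam.sum()                  # normalized eigenvalues, sum to 1
        E = -(e * np.log(e)).sum()           # eigen entropy E_e
        if E < best_E:
            best_E, best_r = E, r
    return best_r, best_E
```

For a locally planar neighborhood, e_3 ≈ 0 and the entropy stays well below ln 3, so radii that make the local structure most "decided" win the comparison.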

ACFPFH Feature Description
The ACFPFH feature descriptor, consisting of the 3-dimensional HSV color feature and the 33-dimensional FPFH geometric feature, is proposed in this section, as shown in Equation (11):

ACFPFH(PF_i) = [H, S, V, FPFH(PF_i)]    (11)

where PF_i is a key point of the pallet point cloud. The specific calculation procedures of the color feature and the geometric feature are as follows:
(1) Color feature calculation
The point cloud data acquired by the Percipio FM851-E2 3D vision sensor contain information such as the color and coordinates of the object. Due to the high correlation between the components of the RGB color space, color cognitive properties cannot be expressed intuitively. Therefore, the RGB color space is not suitable for feature similarity detection. Compared with the RGB color space, the HSV color space is easier to distinguish and more consistent with human visual characteristics: H represents the hue, S the saturation and V the value. The HSV color space is exploited to form the color feature descriptor of the key point PF_i, and it can be converted from the RGB color space [31].
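As a small illustration of this RGB-to-HSV conversion (using Python's standard colorsys module, not the authors' code):

```python
import colorsys

def hsv_feature(r, g, b):
    """HSV color feature of a point from its RGB values (0-255).

    colorsys returns H, S, V in [0, 1], which is convenient for concatenating
    with the 33-bin FPFH histogram into the 36-dimensional ACFPFH vector.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return [h, s, v]

# e.g. a pure red point maps to hue 0 with full saturation and value.
```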
(2) Geometric feature calculation
FPFH is an efficient local feature descriptor which reflects the normal relationship between query points and neighborhood points of point cloud data. The detailed calculation procedures are explained as follows:
Step 1: For each key point PF_i (or query point P_q), select all of the neighborhood points P_qj of the query point P_q that are enclosed in the sphere with the adaptive optimal neighborhood radius r_opt, as shown in Figure 4. The red point P_q in the middle of the figure is the query point, the colored points P_q1-P_q5 in the black circle are the neighborhood points of P_q, and the blue points P_q6-P_q15 are the neighborhood points of the colored points P_q1-P_q5.
Step 2: The point pairs (p_s, p_t) are generated based on the query point P_q and the neighborhood points P_qj. Estimate their corresponding normals n_s and n_t. The relative relationship between the point pairs (p_s, p_t) is obtained by establishing a local frame, as shown in Figure 5.
Taking the point p_s as the coordinate origin, a local coordinate frame is set up with the u, v and w axes, defined as:

u = n_s, v = u × (p_t − p_s)/||p_t − p_s||, w = u × v

Step 3: The angles α, ϕ and θ are calculated to represent the deviation between the normal vectors n_s and n_t, which forms the Simplified Point Feature Histogram (SPFH):

α = v · n_t, ϕ = u · (p_t − p_s)/||p_t − p_s||, θ = arctan2(w · n_t, u · n_t)
Step 4: For each neighborhood point P_qj of PF_i, the radius r_opt is re-determined and the neighboring SPFH values are used to weight the final histogram of PF_i, whose result is called the FPFH:

FPFH(PF_i) = SPFH(PF_i) + (1/k) ∑_{j=1}^{k} ω_j · SPFH(P_qj)

where k represents the number of the neighborhood points P_qj, and ω_j represents the weight, which is the reciprocal of the distance between p_s and p_t. Figure 6 shows an example of the ACFPFH of one point.
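The local-frame construction and the angle triplet of Steps 2-3 can be sketched as below; a minimal illustration in which degenerate configurations (e.g., p_t − p_s parallel to n_s) are not handled:

```python
import numpy as np

def spfh_angles(p_s, n_s, p_t, n_t):
    """Compute the SPFH angle features (alpha, phi, theta) for one point pair
    using the local frame u = n_s, v = u x (p_t - p_s)/d, w = u x v."""
    d_vec = p_t - p_s
    d = np.linalg.norm(d_vec)
    u = n_s / np.linalg.norm(n_s)
    v = np.cross(u, d_vec / d)
    v /= np.linalg.norm(v)
    w = np.cross(u, v)
    alpha = v @ n_t                       # deviation of n_t within the frame
    phi = u @ (d_vec / d)                 # angle between n_s and the pair direction
    theta = np.arctan2(w @ n_t, u @ n_t)  # in-plane rotation of n_t
    return alpha, phi, theta
```

Binning these three angles over all point pairs of a neighborhood yields the SPFH histogram, which Step 4 then distance-weights into the FPFH.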

Surface Matching
Accurate surface matching is an important prerequisite for point cloud registration, which directly affects the performance of pallet detection. For the traditional surface matching method, the one-way feature matching is performed and the method for eliminating incorrect matching point pairs only considers the relationship between points, leading to too many incorrect matching pairs. Therefore, a new surface matching method called BNNDR-ACTN is proposed, which includes feature matching based on the Bidirectional Nearest Neighbor Distance Ratio (BNNDR) and the incorrect matching point pairs' elimination based on the Approximate Congruent Triangle Neighborhood (ACTN). The architecture of the proposed surface matching method is shown in Figure 7, and the detailed procedures are described as follows.

Module 1: Feature matching
The purpose of point cloud feature matching is to establish the relationship between the feature descriptors of the template point cloud and the scene point cloud, thereby obtaining the initial matching point pairs.
Step 1: Forward matching. For each key point of the template point cloud, search for its nearest and second-nearest neighbors in the feature space of the scene point cloud; the nearest neighbor is accepted as a candidate match if the ratio of the two feature distances is smaller than the threshold th, where th is a constant between 0 and 1.
Step 2: Backward matching. The same ratio test is performed in the reverse direction, from the scene point cloud to the template point cloud. If the key points matched in both directions coincide, the pair is retained as an initial matching point pair.
Module 2: Elimination of wrong matching point pairs
The surface of the object is rough and noisy, which leads to some mismatching point pairs. Therefore, after obtaining the initial matching pairs, the next step is to eliminate the wrong matching point pairs. Considering the stability of the triangle, each matching point is expanded into a triangular neighborhood, and a pair is retained only when the two triangles are approximately congruent, where dist(·) represents the distance between two points, W = max(dist), and t represents the degree of approximation between the point pairs. The final correct matching point pairs set CP = {Q_i_CS, Q_i_CM} is obtained by repeating the above steps. In this way, the point-to-point matching problem is transformed into a neighborhood matching problem, which provides more feature information and improves the registration accuracy. In addition, each point in the point cloud is regarded as a vertex of a triangle, which helps maintain the geometric characteristics of the original point cloud.
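The bidirectional ratio test of Module 1 can be sketched as follows; a minimal illustration with brute-force feature distances, an illustrative threshold value, and without the ACTN triangle check of Module 2:

```python
import numpy as np

def bnndr_match(desc_m, desc_s, th=0.8):
    """Bidirectional nearest-neighbor distance-ratio matching (sketch).

    desc_m, desc_s : (n, d) descriptor arrays of template and scene clouds.
    A forward candidate (i -> j) is kept only when the nearest/second-nearest
    distance ratio is below th AND the backward nearest neighbor of j is i.
    """
    def nndr(a, b):
        # Full distance matrix between descriptor sets a and b.
        dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
        order = np.argsort(dist, axis=1)
        nn, nn2 = order[:, 0], order[:, 1]
        idx = np.arange(len(a))
        ratio = dist[idx, nn] / np.maximum(dist[idx, nn2], 1e-12)
        return nn, ratio

    fwd_nn, fwd_ratio = nndr(desc_m, desc_s)
    bwd_nn, _ = nndr(desc_s, desc_m)
    return [(i, j) for i, (j, r) in enumerate(zip(fwd_nn, fwd_ratio))
            if r < th and bwd_nn[j] == i]
```

Descriptors with no distinctive counterpart (a ratio near 1, or a one-sided match) are filtered out before the ACTN stage.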
2.6. Point Cloud Registration

2.6.1. Coarse Registration

The main task of coarse registration is to obtain the relationship between the template point cloud Q_M and the scene point cloud Q_S and provide a good initial position for fine registration. The computation is based on the correct matching point pair set CP. The main steps are as follows:

Step 1: Randomly select three correspondences to estimate the rigid transformation matrices R_0 and T_0.
Step 2: Calculate the distance D(R_0, T_0) between each point Q_i^CS and the transformed point Q_i^TCM obtained with R_0 and T_0. A point in the set Q_CS whose corresponding distance D(R_0, T_0) is less than the threshold d_0 is counted as an inlier; otherwise it is an outlier.

Step 3: Repeat the above steps, obtaining a different rigid transformation each time and counting its inliers, until the maximum iteration number I_0 is reached.

Step 4: Take the rigid transformation R_0 and T_0 with the most inliers as the final result; the template point cloud Q_M is transformed into the coordinate system of the scene point cloud Q_S to complete the coarse registration. Define the transformed template point cloud as Q_MT.
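The four steps above follow the standard RANSAC scheme and can be sketched as follows. This is an assumed minimal implementation: the SVD-based solver `rigid_from_pairs` and all names are illustrative, not taken from the paper.

```python
import numpy as np

def rigid_from_pairs(P, Q):
    """Least-squares rigid transform (Kabsch/SVD) mapping P onto Q."""
    cp, cq = P.mean(0), Q.mean(0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    # reflection guard keeps det(R) = +1
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def ransac_coarse(CM, CS, d0=0.01, I0=500, rng=None):
    """Coarse registration over matched pairs (CM[i] <-> CS[i]).

    Repeatedly fits a rigid transform to 3 random correspondences and
    keeps the hypothesis with the most inliers (distance < d0).
    """
    rng = rng or np.random.default_rng(0)
    best = (-1, None, None)
    for _ in range(I0):
        idx = rng.choice(len(CM), 3, replace=False)
        R, T = rigid_from_pairs(CM[idx], CS[idx])
        d = np.linalg.norm(CM @ R.T + T - CS, axis=1)
        n_in = int((d < d0).sum())
        if n_in > best[0]:
            best = (n_in, R, T)
    return best[1], best[2]
```

With noiseless data, any all-inlier sample recovers the exact transform, so the hypothesis with the most inliers is the correct pose even when some matched pairs are wrong.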

Fine Registration
The Iterative Closest Point (ICP) algorithm is used to achieve fine registration. It minimizes an error function to calculate the optimal rotation and translation matrices. The specific procedures are as follows:

Step 1: For each point Q_i^MT in the transformed template point cloud Q_MT, search for its nearest neighbor Q_i^S in the scene point cloud Q_S, generating the corresponding point pair set C_F = {(Q_i^MT, Q_i^S)}.
Step 2: Use the least squares method to solve for the rotation matrix R_n and translation matrix T_n that minimize the average distance e_n between the corresponding points:

e_n = (1/k) Σ_{i=1}^{k} || R_n Q_i^MT + T_n − Q_i^S ||²

where n is the number of iterations and k is the number of corresponding points.
Step 3: Repeat the above steps until e_n is smaller than the distance threshold e_f or the maximum number of iterations I_f is reached, obtaining the optimal rotation matrix R_f and translation matrix T_f. The new template point cloud Q_MF is obtained by applying R_f and T_f, and fine registration is completed.
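The three steps above can be sketched as a compact point-to-point ICP loop. This is a hedged NumPy sketch (brute-force nearest neighbors, illustrative names), not the paper's MATLAB implementation.

```python
import numpy as np

def icp_fine(QMT, QS, ef=1e-6, If=50):
    """Point-to-point ICP refinement of template QMT onto scene QS.

    Each iteration pairs every template point with its nearest scene
    point, solves the least-squares rigid transform via SVD, and stops
    when the mean pair distance drops below ef or If iterations pass.
    """
    src = QMT.copy()
    R_f, T_f = np.eye(3), np.zeros(3)
    for _ in range(If):
        # Step 1: nearest-neighbor correspondences (brute force sketch)
        d2 = ((src[:, None, :] - QS[None, :, :]) ** 2).sum(2)
        tgt = QS[d2.argmin(1)]
        # Step 2: least-squares rigid transform for these pairs
        cs, ct = src.mean(0), tgt.mean(0)
        H = (src - cs).T @ (tgt - ct)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])
        R = Vt.T @ D @ U.T
        T = ct - R @ cs
        src = src @ R.T + T
        R_f, T_f = R @ R_f, R @ T_f + T   # accumulate total transform
        # Step 3: stop when the mean residual is small enough
        if np.linalg.norm(src - tgt, axis=1).mean() < ef:
            break
    return R_f, T_f, src
```

Because ICP only converges locally, it relies on the coarse registration above to supply a good initial pose.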

Evaluation Index
In order to validate the performance of the proposed ACFPFH feature descriptor and the overall registration method, two representative indicators are used to evaluate the experimental results, briefly described as follows. The experiments were performed in MATLAB on a desktop with a 3.6 GHz Intel® Core™ i7-11700KF CPU and 16 GB memory.
(1) Precision-recall curve. The precision-recall curve (PRC) is used to evaluate the descriptiveness of a feature descriptor. The precision is the ratio of correct matching point pairs to the total number of matching point pairs:

precision = N_CP / N_MP

where N_CP represents the number of correct matching point pairs and N_MP represents the total number of matching point pairs. The recall is the ratio of correct matching point pairs to the number of key points of the template point cloud:

recall = N_CP / N_PF (22)

where N_PF represents the number of key points of the template point cloud. The threshold th used for feature matching in Section 2.5 is varied from 0 to 1 to compute the precision and recall at each threshold and obtain the PRC.
(2) Root mean square error. Root mean square error (RMSE) is a common error index in point cloud registration, representing the root of the mean squared distance between corresponding points of the two point clouds. It is defined as:

RMSE = sqrt( (1/m) Σ_{i=1}^{m} || P_i − Q_j ||² )

where P_i and Q_j are the corresponding points and m is the number of corresponding point pairs. The smaller the value of RMSE, the better the fine registration result.
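Both evaluation indices reduce to a few lines of code. The following NumPy sketch uses illustrative function names (they are not from the paper):

```python
import numpy as np

def precision_recall(n_correct, n_matches, n_keypoints):
    """Precision = N_CP / N_MP; recall = N_CP / N_PF."""
    return n_correct / n_matches, n_correct / n_keypoints

def rmse(P, Q):
    """Root mean square distance between corresponding points P[i], Q[i].

    P, Q: (m, 3) arrays of already-paired points.
    """
    return float(np.sqrt(((P - Q) ** 2).sum(axis=1).mean()))
```

Sweeping th over its threshold set and calling `precision_recall` at each value yields the points of the PRC, while `rmse` is evaluated once on the final registered pairs.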

Experiment Preparation
In order to verify the effectiveness and feasibility of the proposed pallet detection method, a widely used industrial camera, the Percipio FM851-E2 3D vision sensor, is adopted to acquire point cloud data for comparative analysis of the results. The vision sensor, shown in Figure 8, consists of an RGB camera and a depth sensor composed of an infrared camera and a structured light projector. Its length, width and height are 124.0 mm, 86.8 mm and 28.6 mm, respectively. The RGB camera captures the RGB image with a resolution of 1280 × 960, and the depth sensor captures the depth image with a resolution of 320 × 240. The sensor measures distance using active binocular vision technology, and its operative range is from 0.7 m to 6.0 m.

The Percipio FM851-E2 3D vision sensor is mounted on the top of the carriage of a real automated guided vehicle, which means the camera moves along with the forks, as shown in Figure 9. Given that the length of the fork is 1150 mm, the distance between the top of the fork and the front face of the pallet is set to 500 mm so that the automated guided vehicle is able to adjust its position. Meanwhile, it is necessary to ensure that the fork is perpendicular to the front face of the pallet and that the center of the sensor is in line with the center of the pallet. The specific placement of the pallet is shown in Figures 9 and 10. Considering the effect of illumination on the point cloud data acquired by the sensor, all experiments are carried out under normal daytime illumination.

In this case, the size of the pallet is 1200 mm × 1000 mm × 150 mm, and it is extracted from the scene point cloud and considered as the template point cloud.

Implementation Process
For the pallets on the ground, the color image of the scene is acquired by the Percipio FM851-E2 3D vision sensor at the same distance of 500 mm, as shown in Figure 11. Figure 12 shows the pallet template point cloud and the scene point cloud. First, the outliers of the scene point cloud are eliminated. The normals of the ground and the wall are [0, 1, 0] and [0, 0, 1], respectively. With the distance threshold D_ε = 0.02 m and angle threshold β_ε = 5°, plane segmentation is performed on the scene point cloud after outlier removal, and the result is shown in Figure 13. The ISS algorithm is then used to extract key points with a search radius of 0.013 m and thresholds κ_1 = 0.6 and κ_2 = 0.75, which guarantees the efficiency and accuracy of the method. The number of points in the pallet template point cloud decreased from 2661 to 492, and the number of points in the scene point cloud decreased from 41,351 to 576, as shown in Figure 14, where the red points represent the key points. The point cloud images shown below contain the RGB information of the point cloud and therefore show different colors.
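The ISS key point test with thresholds κ_1 and κ_2 can be sketched as below. This is a simplified, assumed formulation (unweighted covariance, standard eigenvalue-ratio tests); the paper's exact ISS variant may differ.

```python
import numpy as np

def iss_keypoints(pts, radius, k1=0.6, k2=0.75):
    """Intrinsic Shape Signature key points (simplified sketch).

    A point is kept when the eigenvalues l1 >= l2 >= l3 of the
    covariance of its radius-neighborhood satisfy l2/l1 < k1 and
    l3/l2 < k2, i.e. the local surface varies distinctly in each
    principal direction rather than being symmetric or flat.
    """
    keep = []
    for i, p in enumerate(pts):
        nb = pts[np.linalg.norm(pts - p, axis=1) < radius]
        if len(nb) < 4:
            continue  # not enough neighbors for a stable covariance
        l = np.sort(np.linalg.eigvalsh(np.cov(nb.T)))[::-1]  # l1>=l2>=l3
        if l[0] > 0 and l[1] > 0 and l[1] / l[0] < k1 and l[2] / l[1] < k2:
            keep.append(i)
    return np.array(keep, dtype=int)
```

On a flat symmetric patch l1 ≈ l2, so interior plane points are rejected, while points with anisotropic neighborhoods (edges, corners) pass the ratio tests.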
It is necessary to determine the adaptive neighborhood radius for each point before calculating the ACFPFH feature descriptor. Given that the interval between two sampling points of the point cloud data acquired by the Percipio FM851-E2 3D vision sensor is 0.007 m, the radius range is set to r_min = 0.007 m, r_max = 0.015 m, r_mid = 0.011 m and r_Δ = 0.0005 m. The adaptive optimal neighborhood radius of each point is obtained with the minimum criterion of the neighborhood information entropy function. The adaptive optimal neighborhood radius distribution of the pallet template point cloud and the scene point cloud is shown in Figure 15, where the horizontal axis represents the neighborhood radius and the vertical axis represents the number of points at each radius. Among the 2661 points in the pallet template point cloud, 855 points have an optimal neighborhood radius of 0.007 m; among the 41,351 points in the ground scene point cloud, 15,421 points have an optimal neighborhood radius of 0.007 m. The optimal neighborhood radii are thus concentrated at the given minimum neighborhood radius, which helps improve efficiency.
Figure 15. Adaptive optimal neighborhood radius distribution of point cloud. (a) Adaptive optimal neighborhood radius distribution of template point cloud. (b) Adaptive optimal neighborhood radius distribution of ground scene point cloud.
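The radius sweep described above can be sketched as follows. This assumes the common eigen-entropy criterion (minimize −Σ e·ln e over the normalized covariance eigenvalues); the paper's exact neighborhood information entropy function may differ, and the function name is illustrative.

```python
import numpy as np

def optimal_radius(pts, i, r_min=0.007, r_max=0.015, r_step=0.0005):
    """Adaptive neighborhood radius by minimum eigen-entropy (sketch).

    Sweeps candidate radii from r_min to r_max in steps of r_step and
    keeps the radius whose neighborhood covariance eigenvalues
    (normalized to sum to 1) minimize the entropy -sum(e * ln e).
    """
    best_r, best_h = r_min, np.inf
    for r in np.arange(r_min, r_max + 1e-9, r_step):
        nb = pts[np.linalg.norm(pts - pts[i], axis=1) < r]
        if len(nb) < 4:
            continue  # neighborhood too sparse at this radius
        ev = np.clip(np.linalg.eigvalsh(np.cov(nb.T)), 1e-12, None)
        e = ev / ev.sum()
        h = float(-(e * np.log(e)).sum())
        if h < best_h:
            best_r, best_h = r, h
    return best_r
```

Running this per point yields the histogram of optimal radii shown in Figure 15; points on smooth, well-sampled surfaces tend to settle at small radii, which matches the concentration at r_min reported above.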
The HSV color components of the key points of the pallet template point cloud and the scene point cloud are extracted, and the geometric features are calculated based on the adaptive optimal neighborhood radius. The ACFPFH feature descriptor is obtained by superimposing the color and geometric features. Feature matching is completed with a distance ratio threshold of th = 0.75. The initial matching result is shown in Figure 16a, where the green lines connect the corresponding points between the pallet template point cloud and the scene point cloud. Obviously, there are some wrong matching point pairs. These are eliminated by the wrong matching point pair elimination algorithm based on the ACTN, and the result is shown in Figure 16b. The RANSAC algorithm is used for coarse registration to calculate the rough transformation matrix, and the ICP algorithm is used to obtain the final transformation matrix and complete fine registration.

Performance Evaluation
The PRC is used to evaluate the descriptiveness of a feature descriptor. The ACFPFH feature descriptor is compared with the classical feature descriptors including FPFH, CFPFH and Signature of Histogram of Orientation (SHOT) with the fixed neighborhood radius.
The set th = {0.2, 0.4, 0.6, 0.75, 0.85, 0.925, 0.95, 0.975, 1.0} is considered as the distance ratio threshold set for the feature matching stage, and the PRC corresponding to each feature descriptor is obtained, as shown in Figure 17. Taking th = 0.75, the accuracy of the different feature descriptors is compared in Table 2. Table 3 lists the time required for feature extraction on the scene point cloud for each descriptor; the bold values are the experimental results of the proposed method.

Traditional feature descriptors such as SHOT and FPFH describe only the geometric features of the pallet and ignore the color information, so their precision is lower. The CFPFH feature descriptor considers the HSV color information, which improves the precision. However, the neighborhood radii of these three feature descriptors are obtained by complex and inefficient manual tuning and are not suitable for all points in the point cloud; a large neighborhood radius leads to too many key points in the neighborhood, which reduces the speed of feature extraction. The ACFPFH feature descriptor not only contains color information but also adaptively selects the optimal neighborhood radius for each key point, so it performs better in terms of efficiency and precision. It is well known that the closer the curve is to the upper right of the PRC graph, the better the performance of the feature descriptor. As Figure 17 shows, compared with the SHOT, FPFH and CFPFH feature descriptors with fixed radii, the ACFPFH feature descriptor has the best performance.
It can be seen from Tables 2 and 3 that when th = 0.75, compared with the SHOT feature descriptor with a neighborhood radius of 0.011 m, the precision is improved by 29.40%, and the time required for feature extraction is reduced by 14.57%. Compared with the FPFH feature descriptor with a neighborhood radius of 0.011 m, the precision is improved by 39.10%, and the time required for feature extraction is reduced by 11.03%. Compared with the CFPFH feature descriptor with a neighborhood radius of 0.011 m, the precision is improved by 16.68%, and the feature extraction time is reduced by 18.87%.
The RMSE and runtime are used to evaluate the performance of the registration algorithms. Popular algorithms including ICP, SHOT + ICP, FPFH + ICP and CFPFH + ICP are selected for comparison with the proposed method. The number of iterations, the RMSE and the runtime of these methods are detailed in Table 4. The initial positions of the pallet template point cloud and the scene point cloud and the registration results of the different methods are shown in Figure 18, where the red points are the template point cloud. The following conclusions can be drawn from Table 4 and Figure 18: The traditional ICP algorithm has a large registration error due to the large initial pose difference, and it takes 27.256 s to complete registration, which cannot meet the real-time requirements of intelligent manufacturing systems. The modified ICP registration methods, SHOT + ICP, FPFH + ICP and CFPFH + ICP, perform coarse registration first, providing a better initial position for fine registration by the ICP algorithm; compared with the traditional ICP algorithm, both the RMSE and runtime are reduced.
However, the feature descriptors used by the above methods lack a neighborhood selection criterion, which increases the overall registration runtime. The proposed method achieves the smallest registration error and the shortest runtime, demonstrating higher efficiency and a more significant improvement than the other methods. Furthermore, the precision and efficiency of the proposed method meet the production requirements of intelligent manufacturing systems.

Case Study II
Shelves are widely used in intelligent manufacturing systems, which can improve the utilization rate of warehouse space and realize the rational allocation of resources while ensuring the quality of goods. Hence, it is necessary to complete the pallet detection of the shelf scene. The color image and the point cloud of the shelf scene are acquired with the same distance from the ground scene, as shown in Figure 19. The same pallet template point cloud is used to perform pallet detection of the shelf scene, and the parameters are consistent with the ground scene in Case Study I. After extracting the key points, the number of points in the scene point cloud decreased from 30,469 to 658. The adaptive optimal neighborhood radius distribution of the scene point cloud in the shelf scene is shown in Figure 20. Figure 21 and Table 5 show the registration result.

Compared with the traditional ICP algorithm, the RMSE of the proposed method is greatly reduced, and the runtime drops from 29.523 s to 0.989 s, which proves that both efficiency and accuracy are greatly improved. Compared with the other modified ICP registration methods, the proposed method achieves the smallest RMSE and the shortest runtime. In summary, the proposed method still achieves optimal performance in the shelf scene, which shows its effectiveness in different scenes. Furthermore, the above case studies demonstrate that the proposed method can be well applied in intelligent manufacturing systems to realize accurate and efficient pallet detection. In addition, the feature descriptor often determines the final performance of point cloud registration; combining a good feature descriptor with a good matching strategy improves registration efficiency.

Conclusions
A novel pallet detection method for automated guided vehicles based on point cloud data is proposed in this paper. The contributions of this paper are summarized as follows:

1. A novel pallet detection method for automated guided vehicles based on point cloud data is proposed, which enables automated guided vehicles to perform automated and effective pallet handling, thereby promoting the transformation and upgrading of the manufacturing industry.

2. A new Adaptive Color Fast Point Feature Histogram (ACFPFH) feature descriptor is built for the description of pallet features, which overcomes shortcomings of existing feature descriptions such as low efficiency, long computation time, poor robustness, and arbitrary parameter selection.

3. A new surface matching method called the Bidirectional Nearest Neighbor Distance Ratio-Approximate Congruent Triangle Neighborhood (BNNDR-ACTN) is proposed, which transforms the point-to-point matching problem into a neighborhood matching problem, obtaining more feature information and improving detection accuracy.
Because the measurement accuracy of the 3D vision sensor is easily affected by environmental factors such as illumination and obstacles, future work will develop a more robust and efficient pallet detection method suitable for more complex scenarios.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.