A Novel Surface Descriptor for Automated 3-D Object Recognition and Localization

This paper presents a novel approach to the automated recognition and localization of 3-D objects. The proposed approach uses 3-D object segmentation to segment randomly stacked objects in an unstructured point cloud. Each segmented object is then represented by a regional area-based descriptor, which measures the distribution of surface area in the oriented bounding box (OBB) of the segmented object. By comparing the estimated descriptor with the template descriptors stored in the database, the object can be recognized. With this approach, the detected object can be matched with the model using the iterative closest point (ICP) algorithm to detect its 3-D location and orientation. Experiments were performed to verify the feasibility and effectiveness of the approach. With the measured point clouds having a spatial resolution of 1.05 mm, the proposed method can achieve both a mean deviation and standard deviation below half of the spatial resolution.


Introduction
Nowadays, both 2-D and 3-D machine vision systems are widely integrated with robot manipulators to enhance the flexibility and versatility of modern manufacturing systems. These intelligent integrated systems can accelerate manufacturing and produce efficiently customized products to enhance competitiveness. 2-D machine vision systems [1][2][3][4] are still used in most integrated systems due to their high accuracy and low cost. However, the operation accuracy in these systems is limited by the viewing angle of optical sensing and is especially sensitive to the alignment precision of the jig and fixture employed in locating the workpiece. Systems that involve 3-D data processing can overcome the existing difficulties in 2-D digital imaging by relying on both shape and color information of the objects. These systems have recently been integrated with automated machine manipulators for pick and place applications [5][6][7][8][9][10][11].
One of the most important tasks in 3-D data processing is to determine the position and orientation of the target in 3-D space. This task remains challenging in automation because the target can be of any geometric form and its positioning orientation has six degrees of freedom. Thus, it is difficult to detect the target with an unstructured data input, such as range point clouds. In recent years, several attempts have been made to solve this nontrivial problem. According to their characteristics, the proposed strategies can be divided into graph-based, feature-based, and view-based methods.
Graph-based methods extract the geometric properties of a 3-D shape using a graph, which represents the type and spatial relations between shape components [6,7,12]. In these approaches, the topology graph is built using the detected primitive shapes in the scanned scene. The query graph, which represents the structure of the CAD model and is manually defined by the user, is employed object can be recognized by comparing the computed curvature-based histogram with the template histograms in the database. View-based methods can be applied to almost all kinds of objects. However, the disadvantage is that the comparison of the database and the acquired data is time-consuming.
From the above review, it can be seen that one of the reasons why the 3-D vision system has not been widely used in robotic automation is the long computation time required for practical application. In view of this, a fast and effective method for 3-D object recognition and localization is proposed to deal with the difficulties existing in this task. The main idea of the approach is that two objects are similar if the distributions of their surface areas in their oriented bounding boxes (OBBs) are the same.
In this study, the proposed approach for 3-D object recognition and localization addresses the difficulties that arise in practical applications such as robotic automation. The developed method proposes a new index for 3-D object recognition that is efficient in operation and robust to illumination variations. In addition, it can recognize stacked objects with arbitrary orientation. The main idea of the approach is that two objects are similar if the distributions of their surface areas in their OBBs are the same. The method comprises two main stages. In the first stage, the proposed regional area-based descriptors are computed and used in the shape-matching algorithm for reliable object recognition. In the second stage, the position and orientation of the target are initially determined by aligning the OBBs and then refined by the iterative closest point (ICP) algorithm.
The rest of this paper is organized as follows. Section 2 presents the proposed method for object recognition and localization employing regional area-based descriptors. The experimental results and analysis are shown in Section 3. Section 4 discusses the characteristics and limitations of the developed method in detail. Finally, the conclusions and further work are summarized in Section 5.

Methodology
Given a point cloud that represents the scene of randomly stacked objects in an unstructured bin, the task is to recognize and localize the target object in the scene point cloud. To deal with the task, the proposed approach first separates the scene point cloud into individual object point clouds [22]. Each segmented object is represented by a feature vector, which is computed from its OBB and object surface area. The feature vector is then matched with the feature vectors kept in the database, which represent the different views of the object model. According to the matching results, the transformation matrix can be initially computed to align the segmented object with the object model. Finally, the 3-D position and orientation of the target object in the scene point cloud can be estimated using the ICP algorithm. The overview and flowchart of the proposed method are shown in Figures 1 and 2, respectively. In addition, the operation procedure of the proposed method is described in Algorithm 1, whose output is the position and orientation of the objects.

Object Segmentation
The point cloud data acquired from the 3-D scanner comprises information of one view of different objects in the scene. In order to obtain the segmented point clouds corresponding to one view of an object for a later recognition task, the scene point clouds should be further separated into smaller point clouds. In recent years, many techniques have been developed to obtain the individual parts from the point cloud data. The popular techniques are the region growing method [23,24], k-nearest neighbors clustering algorithm [25,26], and graph theoretic approach [27][28][29].
In this study, the developed 3-D object segmentation algorithm for randomly stacked objects [29,30] is employed to obtain the segmented point clouds corresponding to the object in the scene point clouds. The basic idea of the proposed algorithm is that the points located geometrically farther from the surface boundary of the projected object are more likely to belong to the same object than the other points (as shown in Figure 3b). The points far away from the surface boundary can be used as an internal seed marker for object region growth (as illustrated in Figure 3c). With a flooding algorithm, the scene point clouds can be clearly separated into smaller individual object point clouds (see Figure 3d). The general process of the proposed algorithm is presented in Figure 3.

Regional Area-based Descriptor
To find the similarity between the point cloud model M and the segmented point cloud O, which is extracted using the object segmentation algorithm, the feature descriptor of the segmented object is compared with the feature descriptors of different views of the model. A feature descriptor is defined by two essential elements [30], namely the OBB and the surface area of the object. An OBB consists of a corner and three principal vectors (shown in Figure 4). The surface area of the object defines the histogram of the surface areas within the OBB. In general, the OBB is subdivided into $k_1 \times k_2 \times k_3$ boxes, where $k_1$, $k_2$, and $k_3$ are the numbers of subdivisions along its three principal vectors (shown in Figure 5). The surface area in each subdivided box $V_{ijk}$ is denoted $S_v$, where $v = k k_1 k_2 + j k_1 + i$. Let $S$ be the total surface area of the segmented object in the OBB and $f_v = S_v / S$. The feature descriptor can be described as follows:
$$FV = \{C, CC_1, CC_2, CC_3, f_0, \ldots, f_v, \ldots, f_{n_V}\} \quad (1)$$
where $n_V = k_1 k_2 k_3 - 1$; $C$ is the corner vector; and $CC_1$, $CC_2$, and $CC_3$ are the principal vectors corresponding to the maximum, middle, and minimum dimensions of the OBB, respectively.
Finally, according to $S_v$, the regional area-based descriptor can be built as shown in Figure 6. In this example, the three parameters $k_1$, $k_2$, and $k_3$ are all set to 2, so the OBB of the object point cloud is subdivided into eight sub-boxes.
The proposed descriptor uses the distribution of the object's surface area inside the OBB of the object to represent the object in 3-D space. The OBB regional area-based descriptor is invariant to arbitrary poses of the objects, because the surface area is an intrinsic property that is completely independent of object positions and orientations in space. Furthermore, the developed approach is robust to variations in surface sampling density and to noise generated by the measurement process. To match the model point cloud and the segmented point cloud extracted using the object segmentation algorithm, the feature descriptor of the segmented object is compared with the feature descriptors of various views of the model in the database using normalized cross-correlation (NCC).
The total number (n) of subdivided boxes in the OBB is an important parameter for computation efficiency. When the OBB contains more subdivided boxes, matching takes more time but the matching accuracy generally increases. To shorten the matching procedure, fewer subdivided boxes are preferable; however, the number of subdivided boxes must remain large enough for the matching to be meaningful. Figure 7 illustrates three examples of the regional area-based descriptors of an L-shape object with three different types of segments, in which

Estimation of Oriented Bounding Box
The OBB of an object is a rectangular bounding box that covers all points of the object point cloud. The orientation of an OBB can be determined using the covariance matrix [31]. The associated algorithm is described in the following steps:
• Calculate the center p of the point cloud P;
• Compute the covariance matrix of P about p;
• Extract the eigenvectors $\{v_1, v_2, v_3\}$ from the covariance matrix;
• Determine the dimensions of the object along each eigenvector using the distances between the nearest and farthest projected points.
The corner $C(x_C, y_C, z_C)$ farthest from the center p is chosen from the eight corners of the OBB. Then, the vectors $CC_i$ (i = 1, 2, 3) can be established along the largest, medium, and smallest dimensions of the OBB, respectively. However, the distances between the corners and the center point are sometimes indistinguishable, or the dimensions of the OBB along $CC_i$ (i = 1, 2, 3) are similar. In these circumstances, all possibilities have to be taken into account to determine the best choice using the NCC mentioned in Section 2.4.2.
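The steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `estimate_obb` and the returned layout (corner plus three edge vectors as rows) are assumptions made here, and ties between equally distant corners are resolved arbitrarily rather than by the NCC test described in the text.

```python
import itertools
import numpy as np

def estimate_obb(points):
    """Return (corner, edges) of a PCA-aligned bounding box.

    corner: the box corner farthest from the point-cloud center p.
    edges:  (3, 3) array; rows are CC_1, CC_2, CC_3, i.e. the edge vectors
            leaving that corner, sorted from largest to smallest dimension.
    """
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)                    # center p of the cloud
    cov = np.cov((points - center).T, bias=True)    # 3x3 covariance matrix
    _, eigvecs = np.linalg.eigh(cov)                # columns are eigenvectors
    axes = eigvecs.T                                # rows are v_1, v_2, v_3

    proj = (points - center) @ axes.T               # coordinates along each axis
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    dims = hi - lo                                  # box extent along each axis
    order = np.argsort(dims)[::-1]                  # largest, middle, smallest
    axes, lo, hi, dims = axes[order], lo[order], hi[order], dims[order]

    best = None
    for choice in itertools.product((0, 1), repeat=3):
        pick = np.array(choice)
        coords = np.where(pick == 0, lo, hi)        # low or high face per axis
        corner = center + coords @ axes
        dist = np.linalg.norm(corner - center)
        if best is None or dist > best[0]:
            signs = np.where(pick == 0, 1.0, -1.0)  # edges point into the box
            best = (dist, corner, signs)
    _, corner, signs = best
    edges = axes * (signs * dims)[:, None]
    return corner, edges
```

For an axis-aligned 4 × 2 × 1 box, the returned edge vectors have lengths 4, 2, and 1, and every input point lies between the corner and the opposite corner along each edge.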

Simplified Regional Area-based Descriptor
The normalized surface area in each subdivided box can be calculated as follows: where $n_t$ is the number of triangles in the triangle mesh and $n_s$ is the number of triangles that are inside the subdivided box.
In order to reduce the computation time for the regional area-based descriptor, the number of triangles inside the subdivided box can be replaced by the number of points inside the subdivided box. In addition, the number of triangles in the triangle mesh can be substituted by the number of points in the point cloud, as expressed below:
$$f_v \approx \frac{n_{pv}}{n} \quad (4)$$
where $n_{pv}$ is the number of points inside the $v$-th subdivided box and $n$ is the number of points in the object point cloud.
By applying Equation (4), a new form of the regional area-based descriptor can be expressed as follows:
$$FV = \left\{C, CC_1, CC_2, CC_3, \frac{n_{p0}}{n}, \ldots, \frac{n_{pv}}{n}, \ldots, \frac{n_{pn_V}}{n}\right\} \quad (5)$$
Equation (5) is the simplified regional area-based descriptor. The simplified descriptor captures the information in the surface area-based descriptor less precisely, but it still retains most of its discriminative power. The regional area-based descriptor and its simplification are shown in Figure 8b. As can be seen, the distribution of the surface area inside the OBB of the object in this case is almost the same as the distribution of points inside the OBB. Furthermore, the efficiency of the simplification is demonstrated by six tests in Figure 9.
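As a rough illustration of the simplified descriptor, the point-count histogram can be computed by binning each point into its sub-box and using the index $v = k k_1 k_2 + j k_1 + i$ defined earlier. The function name `simplified_descriptor` and its argument layout are assumptions made for this sketch; it also assumes that the three OBB edge vectors are mutually orthogonal.

```python
import numpy as np

def simplified_descriptor(points, corner, edges, k=(2, 2, 2)):
    """Fraction of points falling in each of the k1*k2*k3 sub-boxes of an OBB.

    corner: OBB corner C; edges: rows CC_1, CC_2, CC_3 (assumed orthogonal).
    Returns an array f with f[v] = n_pv / n, v = k*k1*k2 + j*k1 + i.
    """
    points = np.asarray(points, dtype=float)
    corner = np.asarray(corner, dtype=float)
    edges = np.asarray(edges, dtype=float)
    k1, k2, k3 = k

    # Normalized coordinate of each point along each edge (0..1 inside the OBB).
    u = (points - corner) @ edges.T / np.sum(edges ** 2, axis=1)

    # Sub-box indices (i, j, k); points on the far faces go to the last box.
    idx = np.floor(u * np.array(k)).astype(int)
    idx = np.clip(idx, 0, np.array(k) - 1)

    v = idx[:, 2] * k1 * k2 + idx[:, 1] * k1 + idx[:, 0]
    counts = np.bincount(v, minlength=k1 * k2 * k3)
    return counts / len(points)
```

With $k_1 = k_2 = k_3 = 2$, a cloud with one point in the middle of each octant of its OBB yields eight equal entries of 1/8.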
Figure 8. (b) Regional area-based descriptor of the object (blue) and its simplification (red).

Figure 9. Comparison of computation time between original regional area-based descriptor and simplified descriptor inside OBBs of six point clouds.

3-D Virtual Camera
A virtual sensor is developed to extract the template point clouds corresponding to various views of the model. The virtual sensor, which has the same internal properties as the real one, is located at the origin of the world coordinate system. The central axis of the virtual camera is defined as the z-axis. The CAD model of the object is positioned on the z-axis. The working distance, $t_z$, is shown in Figure 10. The reference point clouds are generated by rotating the model around the x-axis in increments of $\theta_x$ and around the y-axis in increments of $\theta_y$. For each rotation, a point cloud is created corresponding to one view of the model. Figure 11 shows the different views of the CAD model in Figure 10.
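The sweep over model orientations can be sketched as follows. This is only a sketch under stated assumptions: the function names are hypothetical, the composition order (rotation about x applied first, then about y) is a choice made here, and the visibility computation that a virtual sensor would perform to keep only the points seen from the camera is omitted.

```python
import numpy as np

def rot_x(angle):
    """Rotation matrix about the x-axis (angle in radians)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(angle):
    """Rotation matrix about the y-axis (angle in radians)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def template_poses(step_x_deg, step_y_deg, t_z):
    """4x4 model poses: one per (theta_x, theta_y) step, placed at z = t_z."""
    poses = []
    for ax in np.arange(0.0, 360.0, step_x_deg):
        for ay in np.arange(0.0, 360.0, step_y_deg):
            T = np.eye(4)
            T[:3, :3] = rot_y(np.radians(ay)) @ rot_x(np.radians(ax))
            T[2, 3] = t_z          # model sits on the camera's z-axis
            poses.append(T)
    return poses
```

Each pose would then be applied to the CAD model before sampling the visible surface; smaller increments give more templates and therefore a finer initial pose, at the cost of a larger database.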

Feature Matching
Given a feature vector $FV_O$ that represents the object point cloud O, the matching feature vector $FV_M$ that represents a view of the model needs to be found. The feature vector $FV_M$ is built through the database generation process in the offline phase. The feature vector FV comprises the OBB parameters and the regional area-based descriptor. In the first step, the OBB parameters of the object point cloud are compared with those of the object model over a series of viewpoints to find the best match [30]. When the OBB matching result satisfies the preset condition, the regional area-based descriptor of the object point cloud is then matched with that of the object model seen from that viewpoint. The general process for matching the object point cloud with the model is shown in Figure 12.

OBB Matching
The OBB matching process determines the views of the object model that have the same dimensions as the object point cloud. The OBB of each template point cloud is represented by three vectors, $CC_{M1}(x_{Mmax}, y_{Mmax}, z_{Mmax})$, $CC_{M2}(x_{Mmid}, y_{Mmid}, z_{Mmid})$, and $CC_{M3}(x_{Mmin}, y_{Mmin}, z_{Mmin})$, while the OBB of the object point cloud is denoted by $CC_1(x_{max}, y_{max}, z_{max})$, $CC_2(x_{mid}, y_{mid}, z_{mid})$, and $CC_3(x_{min}, y_{min}, z_{min})$, respectively. These two OBBs should satisfy the following equation: where $d_{thresh}$ is a given adequate threshold.
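A dimension check of this kind might look like the sketch below. The exact form of the matching equation is not reproduced in this text, so the criterion used here (each pair of corresponding edge lengths differs by at most $d_{thresh}$) is an assumption, as is the function name.

```python
import math

def obb_dimensions_match(cc_obj, cc_model, d_thresh):
    """Compare two OBBs by the lengths of their edge vectors.

    cc_obj, cc_model: three 3-D vectors each, ordered max, mid, min dimension.
    Assumed criterion: every pair of lengths differs by at most d_thresh.
    """
    def length(vec):
        return math.sqrt(sum(c * c for c in vec))

    return all(abs(length(a) - length(b)) <= d_thresh
               for a, b in zip(cc_obj, cc_model))

# An object box of 40 x 20 x 10 against a template of 41 x 19.5 x 10.2,
# oriented differently in space.
obj = [(40.0, 0.0, 0.0), (0.0, 20.0, 0.0), (0.0, 0.0, 10.0)]
tmpl = [(0.0, 41.0, 0.0), (19.5, 0.0, 0.0), (0.0, 0.0, 10.2)]
loose = obb_dimensions_match(obj, tmpl, d_thresh=1.5)
strict = obb_dimensions_match(obj, tmpl, d_thresh=0.5)
```

Because only lengths are compared, the check is independent of how the two boxes are oriented, which is what allows it to act as a cheap filter before the descriptor comparison.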

2.4.2. Matching Criteria for Regional Area-based Descriptor
The regional area-based descriptor represents the histogram of the surface area of the object. The resemblance between the object and model descriptors is measured using NCC. Let $F_O = \{f_v, v = 0, \ldots, n_V\}$ be the regional area-based descriptor of the object point cloud and $F_M = \{f_{Mv}, v = 0, \ldots, n_V\}$ be the regional area-based descriptor of the template point cloud. The NCC between $F_O$ and $F_M$ is computed as follows:
$$C(F_O, F_M) = \frac{\sum_{v=0}^{n_V} (f_v - \bar{f})(f_{Mv} - \bar{f}_M)}{\sqrt{\sum_{v=0}^{n_V} (f_v - \bar{f})^2 \sum_{v=0}^{n_V} (f_{Mv} - \bar{f}_M)^2}} \quad (7)$$
where $\bar{f} = \frac{1}{n_V + 1} \sum_{v=0}^{n_V} f_v$ and $\bar{f}_M = \frac{1}{n_V + 1} \sum_{v=0}^{n_V} f_{Mv}$.
If the coefficient $C(F_O, F_M)$ is larger than a given threshold, the matching result is good and the two feature vectors $FV_O$ and $FV_M$ can be adopted to estimate the initial transformation matrix between the object point cloud and the model.
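The correlation between two descriptors can be sketched as below, assuming the standard zero-mean form of NCC. The function name `ncc` and the convention of returning 0.0 for a constant descriptor (where the coefficient is undefined) are choices made for this sketch.

```python
import math

def ncc(f_obj, f_model):
    """Zero-mean normalized cross-correlation of two equal-length descriptors."""
    if len(f_obj) != len(f_model):
        raise ValueError("descriptors must have the same number of sub-boxes")
    n = len(f_obj)
    mean_o = sum(f_obj) / n
    mean_m = sum(f_model) / n
    num = sum((a - mean_o) * (b - mean_m) for a, b in zip(f_obj, f_model))
    var_o = sum((a - mean_o) ** 2 for a in f_obj)
    var_m = sum((b - mean_m) ** 2 for b in f_model)
    den = math.sqrt(var_o * var_m)
    # A constant descriptor has zero variance; treat it as uncorrelated.
    return num / den if den > 0.0 else 0.0
```

Identical descriptors give a coefficient of 1, and the value decreases as the two area distributions diverge, so a threshold on this coefficient separates good matches from poor ones.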

Transformation Estimation and Refinement
Through the matching step, the corresponding feature vectors $FV_O$ and $FV_M$ are obtained. According to these feature vectors, the initial transformation matrix $T_{initial}$ between the object point cloud and the model can be estimated by aligning the frame $\{C_M, v_{M1}, v_{M2}, v_{M3}\}$ that represents the model to the frame $\{C, v_1, v_2, v_3\}$ that represents the object point cloud.
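One way to build such a frame-to-frame transformation is sketched below. It assumes that both frames are orthonormal and have the same handedness (otherwise the result contains a reflection rather than a pure rotation); the function name and the 4 × 4 homogeneous layout are choices made here.

```python
import numpy as np

def initial_transform(c_model, axes_model, c_obj, axes_obj):
    """4x4 transform taking the model frame onto the object frame.

    c_*:    frame origins (the OBB corners C_M and C).
    axes_*: (3, 3) arrays whose rows are the unit vectors v_1, v_2, v_3.
    """
    A_m = np.asarray(axes_model, dtype=float).T   # columns = model axes
    A_o = np.asarray(axes_obj, dtype=float).T     # columns = object axes
    R = A_o @ A_m.T                               # maps v_Mi onto v_i
    t = np.asarray(c_obj, dtype=float) - R @ np.asarray(c_model, dtype=float)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T
```

Applying the result to the model corner $C_M$ gives the object corner $C$, and each model axis is rotated onto the corresponding object axis.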
The accuracy of initial pose estimation is limited by the set of templates in the database. Each template describes only one view of the object and the number of templates is restricted; hence, the matched template may not be exactly the same as the detected object. For example, some small parts of the detected object are missed from the template. Hence, the estimated 3-D pose may be incorrect. Additionally, the measurement data can lose some high-frequency information (such as the edges of the object) owing to limitation of the measurement system. The measurement errors can reduce the accuracy of the initial 3-D pose estimation. Therefore, it is necessary to refine the estimated initial 3-D pose. The algorithm often performed to obtain the refined transformation is the ICP algorithm [32,33]. The ICP algorithm is a matching process that minimizes the fitting deviation between two matching point clouds, and iteratively refines the transformation through minimizing the distance between the points of the object point clouds and the model. The steps of the ICP algorithm are described as follows:

• For each point p ∈ O, find the closest point q ∈ M;
• Estimate the rotation matrix R and translation vector t that minimize the root mean squared distance;
• Transform $O_{k+1} \leftarrow Q(O_k)$ using the estimated parameters;
• Terminate the iteration when the change in error falls below the preset threshold.
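A minimal point-to-point variant of these steps is sketched below. It is not the authors' implementation: the function names are hypothetical, the closest-point search is brute force (a k-d tree would normally be used for large clouds), and the rigid transform is solved with the common SVD-based least-squares method.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares R, t such that R @ src_i + t is close to dst_i (SVD method)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T       # guard against reflections
    return R, cd - R @ cs

def icp(obj, model, max_iter=50, tol=1e-8):
    """Align the object cloud O to the model cloud M.

    Returns (R, t, err): the accumulated rotation and translation, and the RMS
    closest-point distance measured at the start of the final iteration.
    """
    obj = np.array(obj, dtype=float)              # working copy O_k
    model = np.asarray(model, dtype=float)
    R_total, t_total = np.eye(3), np.zeros(3)
    prev_err = err = np.inf
    for _ in range(max_iter):
        # Closest model point q for every object point p (O(n*m) search).
        d2 = ((obj[:, None, :] - model[None, :, :]) ** 2).sum(axis=2)
        nearest = model[d2.argmin(axis=1)]
        err = float(np.sqrt(d2.min(axis=1).mean()))
        if abs(prev_err - err) < tol:             # change in error is small
            break
        prev_err = err
        R, t = best_rigid_transform(obj, nearest)
        obj = obj @ R.T + t                       # O_{k+1} <- Q(O_k)
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total, err
```

Like any ICP variant, this converges only to a local minimum, which is why the OBB-based initial pose matters: the closer the starting alignment, the more of the initial closest-point pairs are true correspondences.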

Experimental Results and Analysis
The feasibility of the proposed methodology is verified by comparing simulated and actual data obtained from experiments on industrial objects. Figure 13 shows the experimental setup for the robot pick and place application that involves the regional area-based descriptor. The developed 3-D scanner has been integrated with the robotic arm to acquire the 3-D point clouds that represent the randomly stacked objects in the scene. The measurement volume of the 3-D scanner is approximately 147 × 110 × 80 mm³. The simulation data provided by the Industrial Technology Research Institute (ITRI) comprise the database of six different work parts. The dimensions of each object in the ITRI database are shown in Table 1. The resolution of the scene point clouds in the database is greater than 0.5 mm. Actual data are measured using the developed 3-D optical scanner according to the random speckle pattern projection principle and the triangulation theory. The depth resolution of the measured data is about 0.3 mm and the spatial resolution is 1.05 mm. In the experiment, the datasets are processed on a computer with a 3.40 GHz Core i5 processor and 4 GB of RAM.
Figure 13. System setup for pick and place application using regional area-based descriptor.

Table 1. Dimensions (mm) of the models in the ITRI database. Columns: Name, 3-D representation, Length, Width, Height. Rows include Brazo control, Cylinder, and Hammerhead.

A viewpoint is defined as a set of six parameters, with three position parameters (x, y, z) defining the spatial position of the 3-D sensor, and three orientation parameters ($R_x$, $R_y$, $R_z$) defining the direction of the sensor. The accuracy of a point cloud in the measurement using the 3-D sensor depends on the angle of incidence of the sensor on the surface. The ideal angle is π/2; that is, the closer the angle of incidence of the sensor is to the normal surface direction, the more accurate the measured points. To ensure the quality of the measured data, a quality criterion is included in the experimental system. This criterion states that acquired point clouds detected using local normal vectors should satisfy a uniform distribution condition. Therefore, the simplified regional area-based descriptor is utilized for all tested data.

Case Study on Simulated Data
A viewpoint is defined as a set of six parameters: three position parameters (x, y, z) defining the spatial position of the 3-D sensor, and three orientation parameters (Rx, Ry, Rz) defining the direction of the sensor. The accuracy of a point cloud measured with the 3-D sensor depends on the angle of incidence of the sensor on the surface. The ideal angle is π/2; that is, the closer the sensor's line of sight is to the surface normal direction, the more accurate the measured points. To ensure the quality of the measured data, a quality criterion is included in the experimental system. This criterion states that acquired point clouds detected using local normal vectors should satisfy a uniform distribution condition. Therefore, the simplified regional area-based descriptor is utilized for all tested data.
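The dependence on the angle of incidence can be illustrated with a minimal sketch. It computes the angle between the sensor's viewing direction and the local surface plane, so that π/2 corresponds to viewing along the surface normal. The function names and the 60° acceptance threshold are hypothetical and are not taken from the experimental system; the uniform-distribution criterion itself is not modeled here.

```python
import math

def incidence_angle(view_dir, normal):
    """Angle (rad) between the viewing direction and the local surface plane.

    Returns pi/2 when the sensor looks along the surface normal and 0 when
    the line of sight grazes the surface.
    """
    dot = sum(v * n for v, n in zip(view_dir, normal))
    norm_v = math.sqrt(sum(v * v for v in view_dir))
    norm_n = math.sqrt(sum(n * n for n in normal))
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    s = min(1.0, abs(dot) / (norm_v * norm_n))
    return math.asin(s)

def is_acceptable(view_dir, normal, min_angle=math.radians(60)):
    """Hypothetical acceptance test; the 60 degree threshold is an assumption."""
    return incidence_angle(view_dir, normal) >= min_angle
```

For example, a sensor looking straight down at a horizontal surface gives an angle of π/2, whereas a line of sight tilted 45° from the normal gives π/4 and would be rejected by the assumed threshold.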

Case Study on Simulated Data
In testing the simulation data, Gaussian random noise is added to the object point clouds, with the standard deviation σ increasing from 0.001 to 1.0. The object descriptor is then compared with the descriptors of the templates to determine the correlation coefficient using Equation (7). For each model, the correlation coefficient at each Gaussian noise level is the average of 30 values, each measuring the similarity between one of 30 different object point clouds and the model. The accuracy of the proposed method is evaluated by determining the root mean square (RMS), translation, and rotation errors. In addition, the computation time of the object recognition and localization process is measured to evaluate the efficiency of the proposed method. Figure 14 shows examples of input object point clouds with different Gaussian noise levels. The effect of Gaussian noise on matching between the object descriptor and model descriptor is illustrated in Figure 15. In this experiment, the correlation coefficients obtained can exceed 0.8 with a noise level of σ below 1.0.
Figure 15. Effect of Gaussian noise on matching between object descriptor and model descriptor.
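The noise experiment can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the regional area-based descriptor is replaced by a simple stand-in (the fraction of points falling in each cell of a regular grid over a fixed bounding box), and the correlation coefficient is assumed to be the Pearson coefficient, since Equation (7) is not reproduced in this section. All function names are hypothetical.

```python
import math
import random

def add_gaussian_noise(points, sigma, rng):
    """Perturb every coordinate with zero-mean Gaussian noise of std sigma."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]

def occupancy_descriptor(points, bounds, bins=4):
    """Stand-in descriptor: fraction of points in each cell of a bins^3 grid.

    This is NOT the regional area-based descriptor; it only mimics the idea
    of a spatial distribution measured inside a bounding box.
    """
    lo, hi = bounds
    hist = [0.0] * (bins ** 3)
    for p in points:
        idx = 0
        for c, l, h in zip(p, lo, hi):
            k = int((c - l) / (h - l) * bins)
            k = min(max(k, 0), bins - 1)  # clamp points pushed outside the box
            idx = idx * bins + k
        hist[idx] += 1.0
    return [h / len(points) for h in hist]

def correlation(a, b):
    """Pearson correlation coefficient (assumed form of the similarity score)."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def mean_correlation(model_points, sigma, bounds, trials=30, seed=0):
    """Average the correlation over `trials` noisy copies of the model."""
    rng = random.Random(seed)
    template = occupancy_descriptor(model_points, bounds)
    scores = []
    for _ in range(trials):
        noisy = add_gaussian_noise(model_points, sigma, rng)
        scores.append(correlation(occupancy_descriptor(noisy, bounds), template))
    return sum(scores) / trials
```

Calling `mean_correlation` for a sequence of σ values between 0.001 and 1.0 yields one averaged coefficient per noise level, which is the kind of curve summarized in Figure 15.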
The accuracy of the 3-D object recognition and localization algorithm can be evaluated by RMS, translation, and rotation errors. The RMS error is calculated using the distance between corresponding points in the CAD model ($p_i^{true}$) and the transformed point clouds ($p_i^{alg}$) under the estimated transformation matrix. Translation error ($T_{err}$) is the absolute difference between the true and computed translation vectors ($T_{true}$ and $T_{alg}$). Rotation error ($q_{err}$) is the absolute difference between the true and computed unit quaternions ($q_{true}$ and $q_{alg}$) representing the rotations of objects in 3-D space:
$$\mathrm{RMS} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left\lVert p_i^{true} - p_i^{alg}\right\rVert^{2}} \qquad (9)$$
$$T_{err} = \left\lVert T_{true} - T_{alg}\right\rVert \qquad (10)$$
$$q_{err} = \left\lVert q_{true} - q_{alg}\right\rVert \qquad (11)$$
where n is the number of corresponding points. Figures 16-18 show the RMS, translation, and rotation errors, respectively, of six different types of objects in the ITRI (Industrial Technology Research Institute) database. The maximum RMS error is smaller than 0.15 mm. The translation errors do not exceed 0.06 mm. In addition, the rotation errors are smaller than 0.005°.
To evaluate the efficiency of the proposed recognition and localization algorithm, the simplified regional area-based descriptor is employed. This descriptor captures less detailed information than the regional area-based descriptor, but it retains most of its discriminative power. The computation time of the object recognition and localization tasks is summarized in Figure 19. The average time required to recognize and localize the object is less than 0.3 s for all tested objects.
The comparison of performance between the proposed approach and existing methods is reported in Table 2. As can be seen, the developed method outperforms the existing methods in both pose estimation accuracy and computation cost. Its advantage in speed is particularly notable. Thus, the regional area-based object recognition and localization algorithm can achieve real-time and accurate pose estimation of 3-D objects in cluttered range images.
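The RMS, translation, and rotation errors described in this case study can be computed with a few lines of code. The sketch below assumes that point correspondences are already known and that the "absolute difference" between vectors or quaternions means the Euclidean norm of their difference. The sign check in `rotation_error` is an added safeguard (q and -q encode the same rotation) and is not stated in the text; all function names are hypothetical.

```python
import math

def rms_error(true_points, est_points):
    """RMS of the distances between corresponding points."""
    n = len(true_points)
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(true_points, est_points))
    return math.sqrt(sq / n)

def translation_error(t_true, t_alg):
    """Euclidean norm of the difference between two translation vectors."""
    return math.dist(t_true, t_alg)

def rotation_error(q_true, q_alg):
    """Norm of the difference between two unit quaternions (w, x, y, z).

    Assumption: because q and -q represent the same rotation, the smaller
    of the two possible differences is returned.
    """
    negated = tuple(-c for c in q_alg)
    return min(math.dist(q_true, q_alg), math.dist(q_true, negated))
```

For instance, if every estimated point is displaced by 1 mm from its true position, `rms_error` returns 1.0, and two quaternions that differ only in sign give a rotation error of zero.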

Case Study on Measured Data
In the experiments with the measured data, different types of objects were selected to test the effectiveness of the proposed algorithm. The samples were randomly stacked on the table to ensure randomness in their positions and orientations. The performance of object matching can be evaluated by the distance from each point in the measured cloud to the closest point in the model [34]. Denote $d_i$ as the distance between a point ($p_i$) in the measured cloud and its closest point ($q_i$) in the model (shown in Figure 20). Then, the mean distance, μ, and standard deviation, σ, can be computed to evaluate the matching condition as follows:
$$\mu = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(d_i - \mu\right)^{2}}$$
where n is the number of points in the measured point clouds.
As seen in Figure 21, the location and orientation of all objects can be effectively detected by using the developed method. Three experimental cases are shown, containing three, seven, and two parts, respectively, for object recognition and localization. The correlation coefficients in the matching process are higher than 0.9. For all tested data, the localization of each part can be completed within 0.5 s. In addition, the mean deviation ranges between 0.180 mm and 0.469 mm and the standard deviation ranges between 0.168 mm and 0.484 mm (shown in Table 3).
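The matching deviation described above can be sketched as follows. The closest model point is found by brute-force search, which is adequate for illustration but would normally be replaced by a spatial index such as a k-d tree. The standard deviation is computed in its population form (dividing by n), which is an assumption because the normalization is not recoverable from the text. Function names are hypothetical.

```python
import math
import statistics

def closest_point_distances(measured, model):
    """Distance d_i from each measured point to its closest model point."""
    return [min(math.dist(p, q) for q in model) for p in measured]

def matching_deviation(measured, model):
    """Return (mean distance, standard deviation) of the d_i values.

    Assumption: the population standard deviation (1/n) is used.
    """
    d = closest_point_distances(measured, model)
    return statistics.fmean(d), statistics.pstdev(d)
```

With measured points lying 0.1, 0.3, and 0.2 mm above three model points, the function returns a mean of 0.2 mm and a standard deviation of about 0.082 mm.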
Figure 21. Object recognition and localization results for different types of objects: (a) three parts randomly stacked; (b) seven parts randomly stacked; and (c) two parts randomly stacked.
Table 3. Matching performance for objects in Figure 21.
Two further cases, shown in Figure 22, were tested to verify that the method can handle more complex scenes. Case 1 has five parts with the objective of detecting a connector, while Case 2 has five parts with the objective of detecting a 3D-printed hammer. By using the developed method, two connectors were effectively detected in Case 1, with their orientation determined. In Case 2, the 3D-printed hammer, which is partly occluded by a toy model, can also be detected effectively with its orientation localized.

Discussion
The proposed methods have been implemented and tested on non-overlapping, multiple overlapping, and stacked objects in 3-D scene point clouds. In this section, the advantages and limitations of object recognition and localization are discussed. The developed method represents the object in 3-D space based on the distribution of the object's surface area inside the OBB. Surface area is an intrinsic surface property and independent of surface sampling; hence, the developed feature descriptor is invariant to arbitrary rotations and translations of the object. In addition, the feature descriptor is less sensitive to surface sampling and noise. As illustrated in Figure 15, the correlation coefficients exceed 0.8 with a noise level σ equal to 1.0.
The effectiveness and accuracy of the proposed 3-D object recognition and localization have been tested on both simulated and measured data. The accuracy of the proposed method in the experiment with simulated data is evaluated in terms of RMS, translation, and rotation errors, which were found to be below 0.15 mm, 0.06 mm, and 0.005°, respectively.
The proposed algorithm employs a point-to-point ICP algorithm to match the scene point clouds with the model; hence, the accuracy can be affected by the resolution of the scene point clouds, which is about 0.5 mm in the experiment. As a result, some important features of the object located in small surface regions are represented by only a small number of measured points, which causes errors when estimating the position and orientation of the target object. For example, scene nos. 19 and 28 of the finefood object, shown in Figure 23, are represented by a rather small number of points. For this reason, the RMS and rotation errors of scene nos. 19 and 28 of the finefood object, shown in Figure 16 and Figure 18, respectively, are larger than those of the other scenes.
Errors in the initial transformation estimation process depend considerably on the rotation increments $\theta_x$ and $\theta_y$ of the templates generated by the virtual camera. To enhance the matching accuracy, the rotation increment should be made smaller. However, a smaller increment can significantly increase the runtime. Take, for instance, an experiment with 10,000 points, such as the Wrench shown in Table 2: its computation time with $\theta_x = \theta_y = 0.157$ rad was 3 s, which is much longer than the 0.15 s required with $\theta_x = \theta_y = 0.785$ rad.
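The trade-off between rotation increment and runtime can be made concrete with a rough count of the templates to be compared. The sketch below assumes that the virtual camera sweeps a full turn (2π) about each of the two axes; this angular range is an assumption and is not stated in the text, and the function name is hypothetical.

```python
import math

def template_count(theta_x, theta_y, span_x=2 * math.pi, span_y=2 * math.pi):
    """Number of templates for rotation increments theta_x and theta_y (rad).

    Assumption: templates are generated on a regular grid of rotations
    covering span_x and span_y, which default to a full turn.
    """
    steps_x = max(1, round(span_x / theta_x))
    steps_y = max(1, round(span_y / theta_y))
    return steps_x * steps_y

fine = template_count(0.157, 0.157)    # 40 x 40 rotations
coarse = template_count(0.785, 0.785)  # 8 x 8 rotations
```

Under these assumptions the finer increment requires 25 times as many templates as the coarser one, which is of the same order as the 20-fold difference between the reported runtimes of 3 s and 0.15 s.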
In the experiment with the actual measurement data, the accuracy of the proposed method is estimated according to the matching deviation between the object point clouds and the model. The performance of object matching is evaluated using the mean distance and its standard deviation. With the measured point clouds having a depth resolution of 0.3 mm and a spatial resolution of 1.05 × 1.05 mm², the proposed method can achieve a mean deviation below 0.47 mm and a standard deviation below 0.49 mm. The most effective way to increase the accuracy of the developed algorithm is to improve the quality and resolution of the measured point clouds.
The efficiency of the 3-D object recognition and localization algorithm is a very important parameter for a 3-D vision system in practical applications. In the experiment, for a set of more than 16k points in one 3-D image map, the proposed method requires only 0.5 s for each recognized object on a common PC, and this can be reduced to 0.1 s by using the simplified descriptor. For the simplified descriptor to substitute for the regional area-based descriptor, the measured point cloud must have a certain level of density and a uniform distribution. The computation cost of the algorithm is thus low enough for in-line industrial automation.

Conclusions
In this study, a new method for automated 3-D object recognition and localization has been developed using 3-D point clouds. Experimental results indicate the effectiveness of the proposed method. In addition, the developed approach can accurately detect the position and orientation of the target objects, which are randomly stacked in an unstructured bin. The detection accuracy is affected by the spatial and depth resolution of the measured point cloud. The proposed method can achieve a localization accuracy better than half of the spatial resolution of the point cloud. The developed algorithm would be particularly useful for the automation of workpiece manipulation and handling in manufacturing sectors. Enhancement of computational efficiency of the object recognition process is achievable by further employing parallel computing techniques.