Dynamic Workpiece Modeling with Robotic Pick-Place Based on Stereo Vision Scanning Using Fast Point-Feature Histogram Algorithm

: In the era of rapid development in industry, an automatic production line is the fundamental and crucial mission for robotic pick-place. However, most production works for picking and placing workpieces are still manual operations in the stamping industry. Therefore, an intelligent system that is fully automatic with robotic pick-place instead of human labor needs to be developed. This study proposes a dynamic workpiece modeling integrated with a robotic arm based on two stereo vision scans using the fast point-feature histogram algorithm for the stamping industry. The point cloud models of workpieces are acquired by leveraging two depth cameras, type Azure Kinect Microsoft, after stereo calibration. The 6D poses of workpieces, including three translations and three rotations, can be estimated by applying algorithms for point cloud processing. After modeling the workpiece, a conveyor controlled by a microcontroller will deliver the dynamic workpiece to the robot. In order to accomplish this dynamic task, a formula related to the velocity of the conveyor and the moving speed of the robot is implemented. The average error of 6D pose information between our system and the practical measurement is lower than 7%. The performance of the proposed method and algorithm has been appraised on real experiments of a speciﬁed stamping workpiece.


Introduction
Under the trend of Industry 4.0, the concept of unmanned factories has dramatically emerged. All the governments over the world are also promoting smart manufacturing policies. It is expected that by developing unmanned smart factories to cope with the current shortage of labor, many manufacturers have introduced production lines combined with the robotic arm. Especially, in the metal forming industry, manual work in the current situation still plays an important role [1]. Hence, to improve production efficiency and reduce labor costs, as well as reduce the danger of human loss when working with the hightemperature workpiece in the hot-stamping process [2], it is required that the objects are first arranged and placed in a position, and then the robot arm is used to pick up and unload the materials to achieve the loading and unloading of the objects. The pre-work then is more troublesome. In order to fully achieve automation of new-generation smart factories, the combination of robotic arms and visual images has gradually become the current development trend. Handreg et al. [3] illustrate an idea to convert the current standard cold forming process of curved panels in the shipbuilding industry to computerized process control and quality examination using a conventional press, crane set up, and point cloud model based on a 3D vision system to monitor the production process successively. As a suggestion for our study to utilize three-dimensional images in the stamping industry, we have processed the point cloud to estimate the six-axis degree of freedom position and feature point information of hot-stamping workpieces. The latest modern Microsoft Azure Kinect camera [4], which is an application of the Time of Flight (ToF) principle [5], is the vision device used for this application to establish a 3D point cloud model in the scene. Later, the point cloud is leveraged to process the 3D point cloud data through a point-to-feature matching algorithm [6] to identify the stamping workpiece and its 6-DoF (6 degrees-of-freedom) position. The stamping workpieces in this study are placed on a conveyor to perform dynamic pick-and-place. Guo et al. [7] have a similar method of using a Kinect camera and a laser scanner to construct the point cloud and identify the position of the fruits, which were later used for harvesting. Compared to their research, a system with two depth cameras calibrated by stereo calibration in our study will generate both a target model point cloud and a reference model point cloud. Hence, the demand for a lower processing time as well as accuracy can be met. Moreover, setting up two cameras with two different viewing angles also allows the point cloud to fully model the object with just one scan. After the 6-DoF position of the workpieces is figured out, they will be transferred by the conveyor and the robot arm will pick them into the stamping die. Zheng et al. [8] stated that in a hot-stamping production line of high-strength steel, in order to evade oxidation and plasticity reduction when the temperature falls, the sheet metal has to be shifted from the furnace to the stamping die as quickly as possible. Therefore, they have proposed a method of synchronizing the motion of the feeding system. A description related to the velocity of the conveyor and the moving speed of the robot is executed for the purpose of dynamic transmission as our main goal of this study. After being placed into the machine, the system will trigger a signal for stamping to move and complete stamping. Finally, the system achieves the dynamic workpiece feeding process.

Research Methodology
The study uses ToF cameras to build a three-dimensional model and uses a pose estimation system to identify the stamping workpiece. The conveyor is used as a transmission system for delivering the workpiece to the robot gripping area. The workpiece 6D positions are automatically uploaded to the controller and execute intelligently dynamic grasping.

Experimental Devices and Setup
The hardware devices used in this study are a 6-axis robotic arm of Syntec 81R.6 Axis (Syntec, Taiwan). for gripping experiments and a depth camera of Microsoft Azure (Microsoft, Redmond, WA, USA) for object recognition. In addition, a stepper motor of model StepSyn 103H7126-0461 (Sanyo Denki, Japan) and a driver of model DRV8825 (Texas Instruments, Dallas, TX, USA) are used to control the conveyor system that utilizes a microprocessor of Arduino Mega 2560 (Arduino, Italy). Figure 1 demonstrates the overall setup of the system. Under the conditions of the experiment, the room temperature was applied (15-25 • C). The distance from both cameras to the object is in the range of 0.5 to 0.85 m, which satisfied the requirement of the manufacturer (>0.5 m). A computer will be utilized as the main controller, which is connected to the microprocessor, two depth cameras, and also the robot arm. The scanning area will contain two cameras which are integrated into a fixed aluminum frame. After the system has identified the posture of the objects, the conveyor will transmit the workpiece for the robot to perform the dynamic gripping. The robot grasps the stamping workpiece with the pneumatic gripper. Afterward, the full cycle of automation will be finished when the robot feeds the workpiece to the stamping machine.

Point Cloud Contrustion and Pre-Processing
In order to achieve the dynamic feeding system, first, the point cloud which is later used for pose estimation will be constructed [9]. Two depth devices are utilized for generating the point cloud model. Afterward, some filters will be applied to the scene point cloud with the purpose of resulting in the input object's model for the lateral program.

Stereo Calibration Principle
Two ToF depth cameras will generate two distinguished point clouds. Combining these two point clouds will create a point cloud with more details so the data for the location system will be more accurate. Calibration for combining two point clouds is performed using the stereo calibration principle [10]. Stereo calibration will help the study to find out the transformation matrix rotation, R, and translation, t, between two RGB cameras of the Azure Kinect.
Firstly, to use stereo calibration, the parameters of the transformation between an object in 3D space and the 2D image observed by the camera from visual information have to be determined. The pinhole camera model used to calibrate a camera is described in Figure 2. Considering that P(X, Y, Z) is the 3D world coordinates point of the object, the 3D coordinate of the same point in the camera frame, P cam , is: where R is the 3 × 3 rotation matrix and t is the 3 × 1 translation matrix. Let p(x, y) be the position of the 3D point in the image coordinate, this will result in the 3D to 2D mapping; here, K is the intrinsic matrix of the camera: The definition of the intrinsic matrix of the camera is as follows: where f x is the X-axis focal length and f y is the Y-axis focal length of the camera, and c x , c y is the coordinate of the principal point. The camera calibration's purpose is to find out the intrinsic parameter, K matrix, and extrinsic parameters, R and t matrices. The calculation uses linear algebra to find all the parameters. To achieve these parameters, multiple images of a checkerboard with a fixed square size will be taken and all the calibration patterns (the cross-point of the black/white consecutive squares) in each image will be found. These calibration patterns in the image correspond to some 3D points in the world. These point-to-point correspondences will be stored, and after that, the non-linear algorithm is used to solve the calibration parameters.
After the intrinsic matrices K 1 and K 2 of the two cameras are known, the next problem is figuring out the rotation, R 12 , and translation, t 12 , between camera 1 and camera 2, which will contribute to finding point correspondences in the left and right image planes. The schematic diagram of the stereo calibration of the two cameras is described in Figure 3. Let p 1 and p 2 be a point in the camera 1 and camera 2 image coordinates respectively, which is the mapping of world coordinate P in 3D space ( Figure 3). The fundamental matrix F is defined as a mapping from a point in an image plane to an epipolar line in the other image. Therefore, the following equation can be obtained: The form of the fundamental matrix in terms of the two camera projection matrices: M 1 and M 2 , may be derived algebraically. The ray that is back-projected from p 1 by M 1 is obtained by solving: The one-parameter family of solutions of Equation (5) is of the form given by: where M −1 1 is the pseudo-inverse of M 1 , i.e., M 1 M −1 1 = I, and C 1 is the null vector, namely the camera 1 center, defined by M 1 C 1 = 0. The ray is parametrized by the scalar λ. Now consider two situations: when λ = 0 and λ = ∞, Equation (6) becomes: These two points in the above situations can be imaged by camera 2, with projection matrix M 2 at M 2 M −1 1 p 1 and M 2 C 1 respectively, in the second view. The epipolar line is the line joining these two projected points: From Equations (4) and (8), we obtain: Now that the cameras are calibrated, let us assume that the world origin is at the camera 1 center: Then, Therefore, substitute M 1 , M 2 , M −1 1 , and C 1 to Equations (10) and (11), then Equation (9) becomes: Hence, the expression for F = (K 2 t 12 ) × K 2 R 12 K −1 1 is purely in terms of K 1 , K 2 , R 12 , and t 12 . The correspondence relation between the two images is defined by the fundamental matrix, F, as: Using the checkerboard to take multiple pictures and find the position in each calibration pattern, multiple sets of p 1 and p 2 will be calculated. Then, the fundamental matrix, F, can be solved according to Equation (13). Afterward, Equation (12) is used to find the rotation matrix, R 12 , and the translation matrix, t 12 , where the other parameters F, K 1 , and K 2 are known.

Object Segmentation
After the scene point cloud has been created and processed by the pass-through filter, statistical outlier removal filter, and voxel grid filter to achieve the proper view of the region of interest, the object's point cloud will be segmented from the scene point cloud by combining the random sample consensus (RANSAC) [11] and the Euclidean cluster extraction algorithms [12].
The RANSAC Algorithm 1 is a learning technique to estimate the parameters of a model by random sampling of inspected data. RANSAC utilizes the voting scheme to estimate the optimal appropriate result of a processing dataset in which data components carry both inliers and outliers. Data components in the dataset are used to determine one or multiple models. In this study, the RANSAC algorithm was utilized to determine the plane model, and the following will illustrate its pseudocode. The target workpiece is placed on a worktable. In order to extract the point cloud of the target object, a worktable plane is required to be removed from the scene. After inputting the point cloud data, the plane model is estimated by applying RANSAC; then, the plane inliers are eliminated from the input cloud data [13].
To achieve the object extraction, a method called clustering classification is applied. A simple and powerful data clustering approach in the Euclidean sense can be executed using an octree data structure. This algorithm will calculate the minimum distance between points and divide the point cloud data into clusters by setting the distance threshold. The cluster definition formula is as follows: where p i is the i point cloud data, p j is the j point cloud data, and d th is the distance threshold. The distance between points is searched through the Nearest Neighbor Search (NNS) in the K-Dimension Tree (KD-Tree) [14], and the Euclidean cluster extraction algorithm is defined as follows Algorithm 2.

Algorithm 2 Euclidean cluster extraction algorithm to extract the workpiece point cloud
Input: Point cloud data P. Output: Point cloud clusters C i C i ∅;//list of clusters Q∅;//list of checked points If (p k has not been processed) then p k → Q ; End if; End while; End while; If (all points p i ∈ Q are processed) then C i Q; Q∅; End if; End while After processing through the above steps, the original point cloud model can be clustered, and according to the threshold value, the target object from the scene can be obtained to be input in the pose estimation program. Figure 4 will demonstrate the effect as well as the steps of object segmentation in this study.

Pose Estimation System Construction
The cluster from the previous section segmentation will be the input cloud of the pose estimation system. In addition, a reference point cloud of the subject also has to be saved into the database before performing position prediction. Firstly, feature points of the object will be estimated for further evaluation. Then, the matching methods between the target point cloud and reference point cloud will be carried out by comparing the features point values.

Feature Point Descriptor
This study uses the fast point-feature histograms (FPFH) [15] to estimate the feature point. FPFH is the optimization method of its predecessor-point feature histograms (PFH) [16]. PFH has the theoretical computational complexity of a point cloud P with N points, O N·k 2 , where k is the number of neighbors for each point p in P. The FPFH overtakes the PFH in this aspect with the complexity of O(N·k), while still retaining the advantage of the PFH [7].
The FPFH algorithm proceeds as follows: for each query point p with its normal vector n, only the relation between the point itself and its neighbors, p k (the neighbors are chosen inside the sphere with the radius r), with the normal vector n k (not between all of its neighbors as in PFH), is computed. Define the uwv frame (where u = n k , v = (p − p k ) × u, w = u × v) and the computation of the angular variations α, φ, θ (which is called the simplified point feature histogram-SPFH) of n and n k as follows: Then, the final histogram of p is weighted through re-determination of its k-neighbors and the previous SPFH values: where the weight, w ki , indicates the distance between point p and the neighbor point p ki . Figure 5 shows the effect area of the FPFH algorithm. In the figure, the query point p only linked with its direct neighbors p k (inside the red-dashed circle). Each of the neighbors, p k (in the figure, they are displayed with different colors with their own circle of region), are linked with their own neighbors. The resulted histogram is weighted together with the histogram of the point p to estimate the FPFH value. The gray connections marked with number 2 between the point p and its neighbors together have a weight twice as large as other normal connections. The FPFH value of each query point p will be analyzed in the next stage of the study.

Coarse Alignment Matching
Coarse alignment matching provides an initial prediction of the change between two point clouds by applying the sample consensus initial alignment (SAC-IA) algorithm [17]. In the SAC-IA algorithm, the target object point cloud's feature descriptors are matched with the reference's feature descriptors to obtain a rough pose estimation. The series of points that have almost identical FPFH values in the object as well as reference point cloud will be figured out and then the transformation matrix between these corresponding points is determined. The completion of the prevailing registration transformation of the point cloud with n times iteration is evaluated by the Huber penalty function, H(l i ), which is calculated by the following equation: (17) where m l is the threshold which is predetermined, and l i is the distance differences between after and before the transformation of each point.

Finish Alignment Matching
After the coarse alignment matching through the SAC-IA algorithm, the iterative closest point (ICP) algorithm was applied to optimize the prediction and obtain the final 6-DoF pose [18]. The reference point cloud and target point cloud which have been transformed by the SAC-IA algorithm are considered as the input data for the ICP algorithm. Then, the nearest corresponding point, Q i , will be found inside the target point cloud, Q, for each point, P i , of the reference point cloud, P. The finishing point pairs are built up through the pair P i -Q i . The error of the ICP finish alignment is defined by the acquired error of Euclidean distance between corresponding point pairs P i -Q i through the following formula: where d k is the Euclidean distance, N is the number of iterations, and R and t are the rotation and the translation matrix of the finish alignment, respectively.

Position Error Compensation of Robot Arm
The robot arm has errors when going to the specified position using the linear movement. This error happens due to the non-linearity of the positive sensitive detector (PSD). To solve this problem, a polynomial fitting algorithm is proposed [19]; however, the method will consider only the error of the XY plane. The Z-axis error will be ignored because, in the range of the study, the height of the object is not enough for the Z-axis error to affect the accuracy of the system.
Due to the different manufacturing processes, the size of the non-linear error of the PSD of each robot arm will be different. According to the different degree of the nonlinearity of the PSD, the photosensitive surface of the PSD is usually artificially divided into Region A and Region B. Region A is the center area which has good linearity and a small measurement error; on the other hand, Region B is the margin area, where the non-linear error is large and the measurement error is also large.
According to the principle of the PSD, the non-linear error of the PSD in the x and y directions is relatively independent. There are error values in the x and y directions for the measured values of each point on the PSD photosensitive surface. Consider the error of each point to be E x x i , y j for the X-axis and E y x i , y j for the Y-axis. Equation (19) defines the error: where x i and y j are the final positions after correction for the robot arm, and x i0 and y j0 are the positions after calculation of the pose estimation system from Section 2.3. According to the non-linear characteristics, the third-degree polynomials are chosen for fitting: The polynomial fitting is mainly the calculation of its coefficients, using the measured n-set of position coordinates and calculating according to Equation (19). Substituting ∆x and ∆y into Equation (20) leads to the following sets of Equation (21): Then, the equation of A × C = B can be obtained using the least square method to solve the coefficient, as shown in Equation (22): After finding all the coefficients by using Equation (22), these coefficients are applied back into Equation (19) and the actual position after correction is obtained: where X R and Y R are the actual position for the robot, X out and Y out are the output measured values, and ∆x and ∆y are the coefficients calculated above.

Synchronization of Transmission Conveyor and Robot Arm
In order to accomplish the automation production line, we used a Syntec robot and a microcontroller-controlled conveyor to transfer the workpiece to the stamping die. Furthermore, for the dynamic transmission workpiece to take place successfully, a formula needs to be proposed to synchronize the movement of the robot and the conveyor.
First of all, the gripping position, P gripping , needs to be recognized. To achieve the gripping pose, the point cloud of the object in the reference position has to be considered. The robot arm will be set to this position and its coordinates will be saved to the database for calculating the later position change of the object. After the pose estimation system figures out the relative position differences between the target point cloud and the reference point cloud, the gripping position of the target object, P gripping , will be calculated as the sum of the reference position stored in the database, P database , and the relative position, P estimate . Equation (24) and Figure 6 describe the grab position of the target object. P gripping = P database + P estimate (24) Figure 6. Construction of the target object gripping pose. Figure 7 depicts the home position of the robot, P robot , and the scanning position of the conveyor, P scan , to the gripping position, P gripping . These positions, P robot and P scan , need to be known in advance and pre-set in the database to synchronize the operation of the robot and the conveyor.
The travel time for the robot to move from the home position to the gripping position of the object, t R→G , is calculated as the distance between the robot's initial position, P robot , and the gripping position, P gripping , divided by the feed rate of the robot in linear movement, FL. The following will demonstrate how to find t R→G : The time, t S→G , for the conveyor to move from the scan position to the gripping position is calculated by the distance of the scan position of the object, P scan , to the gripping position, P gripping , divided by the speed of the conveyor, v conveyor . t S→G will be expressed by: Travel times t R→G and t S→G have to be synchronized for the robot to successfully pick the workpiece while the conveyor is moving; therefore, t R→G = t S→G . Substitute into Equations (25) and (26), obtaining: v conveyor = FL × P scan − P gripping P robot − P gripping Equation (27) indicates that the v conveyor will be the linear function of the feed rate of the robot in linear movement, FL.

Experimental Results
This study focuses on the pose estimation and the robot pick-place system. Consequently, results that need to be found first are from the experiment on how accurately the program can predict the position of the workpiece. Then, with the information conducted from the pose estimation system, robot pick-place experiments were performed and resulted in quantity visualization. Moreover, the robot error compensation results will be demonstrated to verify the nonlinear correction method and increase the success rate of dynamic pick-placing.

Experiment on the Accuracy of the Pose Estimation System
The accuracy of the pose estimation system plays an important role for the robot arm to correctly pick up the stamping workpiece. Therefore, the experiment in Section 3.1 measures the accuracy of the pose estimation system and focuses on the error range between the estimated distance and the actual moving distance. All of the translation and rotation axes have to be involved in this experiment. The experiment has two sub-experiments, where the first one examined each individual 6-axis X, Y, Z, RX, RY, and RZ error, and the other checked the repetition of the pose estimation system. Firstly, the three translation axes X, Y, Z errors are considered. As shown in Figure 8a for the X and Y-axis, the workpiece is initially placed next to the base which is considered to be the reference position; afterward, place a block next to the base which will enable the workpiece to move a distance of 60mm (width of the block is 60mm). Then compared to the reference position, the workpiece is moved 60mm (the new position that needs to be measured); continue the experiment with other distances of 120mm, 180mm, and the other direction of −60mm, −120mm, −180mm. Figure 8b displays the experiment for Z-axis. Similar to the experiment of the X and Y-axis, this experiment also uses a block to move the workpiece but in the vertical direction (Z-axis) and the base now is the worktable. The experiment is carried out by using a block with height of 25mm. The distances of experiment are 25mm, 50mm, 75mm, 100mm, 125mm, and 150mm. Both in the reference and measured position, the workpiece point cloud is modeled for later recorded the error between the real distance and estimating distance. Now, considering the rotation axes RX, RY, and RZ, the experiment setup for these axes is shown in Figure 9. The worktable is now recognized as the base and the reference position is where the workpiece is placed on the worktable. The experiment used the BOSCH rangefinder (Bosch, Germany) to estimate the angle. For the reference position, the rangefinder shows 0 • . Thereafter, the workpiece is rotated by placing a block under its side. Now, the new position angle will be shown on the rangefinder.  The method of the pose estimation system of this study is shown in Figure 10a-f for the three translation axes of X, Y, and Z and the three rotation axes of RX, RY, and RZ. Similar to the description of the experiment setup above, the camera generates point clouds of the workpiece in both reference and measured positions to create the reference and measured cloud, respectively. In the figure, the blue point cloud is the reference point cloud, and the green point cloud is the cloud that needs to be measured for distance and angle. Each sub-experiment focuses on one axis in the total 6-DoF, and the estimated result is then compared with the real distance (or angle) to find the error. As shown in Figure 11, the translation error of the pose estimation of this study is considered to be reasonable and can be used in practical applications. Figure 11 confirms that the error of the translation axis ranges between 0.825 and 9.834 mm and 0.825~5.810%, and the average error is 5.417 mm and 4.850% for the X-axis, 4.001 mm and 3.513% for the Y-axis, and 1.570 mm and 2.214% for the Z-axis. The trend of Figure 11a points out that in the X and Y axes, the larger the distance between the reference and the measured points, the higher the error of the system. On the contrary, a smaller distance will lead to a higher error percent in the system. The same trend appears in the error of the Z-axis (Figure 11b).  Figure 12 demonstrates the rotation error of the system. For the RX and RY axes (Figure 12a), and the RZ-axis (Figure 12b), the errors are small if the real angle differences are small, but the error percent will increase when the angle differences decrease. The error and error percentage of the rotation axis range 0.13 •~2 .67 • and 0.37~16.40%, respectively. The average error and error percentage of the RX-axis are 1.36 • and 6.38%, respectively, the average error and error percentage of the RY-axis are 1.01 • and 6.01%, respectively, and the average error and error percentage for the RZ-axis are 1.21 • and 2.59%, respectively. The second sub-experiment was conducted to examine the repetition accuracy of the pose estimation system. The setup of this sub-experiment is similar to the above experiments (as shown in Figures 8 and 9). First, the workpiece is placed in the reference position, then it is moved to the new position that needs to be measured. However, in this experiment, the workpiece will be repositioned in all 6 axes. The measured position will be estimated 10 times; later, the data will be analyzed to find the repetition error of the pose estimation system.
As shown in Figure 13a, the translation error ranges 0.024~8.563 mm, which is 0.080~9.937%. The rotation error ranges 0.094 •~2 .908 • , which is 0.314~9.923% (Figure 13b). The average error percentages of each axis are 3.905%, 3.182%, 5.087%, 6.013%, 6.189%, and 5.523% for the X, Y, Z, RX, RY, and RZ axes, respectively. The error values which were measured were all less than 10% and 9 mm. For the purpose of researching and developing, the pose estimation repetition error is adequate for the robot system to successfully pick up the object. All the measurements were performed independently; hence, each time, measuring repetition errors will result in a different set of errors and do not relate to any of the previous sets of measurements.

Robot Error Compensation Results
As mentioned in Section 2.4, we utilized the non-linear correction algorithm to compensate for the robot error. Figure 14 demonstrates the robot error experiment setup. First, the checkerboard (side of each square is 20 mm) is placed on the worktable; then, the worktable center point is considered to be the centroid and does not have an error (meaning in the worktable center point, E x = E y = 0). The worktable center point is set to be the base point, where the error is 0 for both the X and Y axes ( Figure 15). The theory position of the other points (blue dots in 0) in the region of interest (red-dashed rectangle in 0) will be setup according to the base point and will be inputted into the controller. After the robot moves to the input position, the error is observed and measured, which is the displacement between the supposed input position (blue dot) and the actual moving position, by using the caliper. There are total of 198 points that were measured in the study. Using the input positions, the matrix A was calculated by recalling Equation (21). The matrix B is the error measured between the input and the actual position, recalling Equation (22) to obtain matrix C-the coefficient matrix: After applying the coefficient matrix to the calculation, the robot's positioning accuracy significantly improved. 0a shows the robot error before using the compensation function. As can be seen, the error of the robot was very large, up to 39.69 mm on the X-axis and 45.45 mm on the Y-axis. The average error without correction was 10.55 and 20.86 mm on the X-axis and the Y-axis, respectively. On the other hand, 0b demonstrates the sharp enhancement of the robot positioning. Compared to before compensation, the average translation errors can now reduce by over 20 times with the X-axis and 40 times with the Y-axis, to 0.49 and 0.37 mm on average, respectively. The maximum errors were also downgraded to 3.47 and 2.50 mm for the X and Y axes, respectively. Figure 16 visualizes the positions of all the points which were considered in the experiment. As shown in the figure, the real measurement positions (the red circle) of the input have a larger error compared to the input positions (the black square). However, after compensation using the non-linear correction algorithm, the robot positions (the blue triangle) have come closer to the input positions, and this statement is proven by the fact that the blue triangle points almost coincide with the black square points. This result indicates that the robot error can be solved by applying the coefficient function by software compensation. Figure 16. Position differences of the robot before and after compensation using the non-linear correction algorithm for robot pick-place. Figure 17 shows schematic diagram of dynamic workpiece feeding. When the signal starts to be transmitted and the pose estimation program is activated and completes the positioning of the object, the robot and the conveyor will move simultaneously so that while the conveyor is still moving, the robot will successfully pick up the object. The linear feed rate of the robot arm was set at 0.03 m/s. The distance between gripping position and scanning position was set at 0.5 m, and between gripping position and robot home position was set to 0.3 m. The dynamic task was accomplished by applying Equation (27) related to the velocity of the conveyor and the feed rate of the robot arm. Therefore, the velocity of the conveyor will be calculated as in Equation (29):

Experiment of the Dynamic Stack Workpiece Feeding System
After the camera finishes forming the point cloud, the program will predict the 6-DoF pose of the object based on that point cloud. If there is an object, the program will predict the location of the object, but if the object does not appear, the program will stop here. The position of the object will then be added with the conveyor's travel distance, then transmitted over the Ethernet connection, and the robot arm will begin to pick up the object to the stamping machine. The rest of the task will be the stamping machine's responsibility, and the pick-placing task is completed. The loop will continue if there are still objects to stamp.
There were two experiments-the first with a single object and the second with a stack of two objects. Figures 18 and 19 show cases where the robot successfully picked up the object to stamping die with single and piled objects, respectively. Figures 18a and 19a show the workpieces that were placed in random positions, Figures 18b and 19b show the scene point cloud of each case, Figures 18c and 19c show the results of the pose estimation system, and finally, Figures 18d and 19d show the images of the robot successfully picking up the object.    The results' data are presented in Table 1. Compared with static feeding (90% and 95%), dynamic feeding will have a significantly lower success rate-down to 65% in the case of 2 objects and 70% in the case of 1 object. The sudden drop in success rate is due to hardware limitations. The conveyor travel speed is not constant due to resistance and friction caused by the spindle shaft and the belt. The program's object position estimation time will not change in this experiment.

Conclusions and Future Prospects
In this study, we have developed an automatic feeding system for dynamic workpieces of the stamping industry using a robotic arm and 3D cameras. The main contents covered in the study were divided into three sections. The first part was the formation of a 3D point cloud using depth cameras and combining two point clouds from these cameras with different angles setup into a complete point cloud by stereo calibration. The accuracy of the point cloud has been found to be satisfactory to perform computational tasks, with a 4.7 mm mean distance of point-to-point cloud error. In terms of the accuracy of object information in the point cloud, it has also been shown to be steady, with less than a 7% error in positioning and a 5% error in height. Using the generated point cloud as input for the pose estimation system was the second part of this study. We used several point cloud processing algorithms, with emphasis on Euclidean cluster extraction to segment the workpiece with other objects on the worktable, FPFH to evaluate the feature points of the target object as well as the reference object, and SAC-IA and ICP respectively, to align the target and reference point cloud and find the transformation matrix. The error of the estimated system was a high-grade error, with an average of less than 6 mm for the translation error and 3 • for the rotation error. The repeatability error of the system was also kept stable, with an average of only 7% error. The last part was the transmission system with motion synchronization between the robotic arm and the conveyor to successfully pick up the objects. Before being used to pick up the object, the robotic arm will be calibrated by the nonlinear correction algorithm to find the appropriate adjustment coefficients that contribute to the accuracy of the manipulator. Compared to before compensation, the average translation errors were able to reduce by over 20 times with the X-axis and 40 times with the Y-axis, to 0.49 and 0.37 mm on average, respectively. The dynamic feeding experiment was performed with quite satisfactory success (>65%), despite the influence of limited hardware installation. This has proven to be viable in production with more invested equipment.
In the future, in addition to improving the device to meet the stability, another algorithm (potentially a Neural Network application algorithm) can be applied to calculate the position of the workpiece faster, to obtain real-time estimation. With the use of the fast refresh rate-Azure Kinect, the advantage of this study is that the scanning time for point cloud formation was within a second. However, using FPFH in combination with SAC-IA and ICP to compare and provide estimated locations of a workpiece requires a minimum of seconds. Therefore, the study can hardly be achieved in real-time. Besides, to increase the success rate of dynamic workpiece grabbing, as mentioned above, it is also necessary to have more stable devices. Since the depth sensors have a certain operating temperature (10-25 • C), the accuracy of processing point cloud data can be affected if the temperature is high. As a result, the workpiece modeling has to be carried out before undergoing the austenitization process. An experiment examining the influence of thermal radiation on modeling will be performed in future work. In the end, the study is only a simulation of a feeding system, and some other devices, which have the effect of supporting production automation, such as sensors measuring product quality and cameras playing the role of an observation and alarm alert when the system has problems, can be integrated to create a full-cycle stamping process. The robotic arm is also not the only option of stamping workpiece feeding automation, and other kinematic architectures should be applied in terms of size, payload, and preferences.