Deep Vision Servo Hand-Eye Coordination Planning Study for Sorting Robots

Within the context of large-scale symmetry, a study on deep vision servo hand-eye coordination planning for sorting robots was conducted to address the low recognition-sorting accuracy and efficiency of existing sorting robots. To maintain the symmetry of the picking robot, a small telescopic sorting robot with a RealSense depth vision servo embedded in the manipulator was developed. The workspace and posture of picking parcels were analyzed, and a coordinate transformation model of hand-eye coordination was established for the "Eye-in-Hand" mode. The hand-eye coordinated sorting test shows that the average positioning accuracy of the end in the X, Y, and Z directions is 3.49 mm, 2.76 mm, and 3.32 mm, respectively, and the average cycle time is 19.19 s. Of this, the mechanical arm takes an average of 12.02 s to pick up a package from the initial position, intermediate identification and calculation take an average of 3.79 s, and placing the package takes an average of 6.9 s; the robot arm's actions account for 79.8% of the whole cycle. The robot structure and the hand-eye coordination strategy with the RealSense depth vision servo embedded in the robot can meet picking operation requirements, and the design of the picking robot proposed in this paper can greatly improve the coordination symmetry of fruit target recognition, detection, and picking.


Introduction
In order to maintain the symmetry of the picking robot, "hand-eye" coordination is the main concern of an express parcel sorting robot and the key to completing an autonomous picking operation. The complexity of parcel types, colors, and postures, as well as the randomness and variability of stacking, poses a great challenge to the "perception-action" intelligence of the robot hand-eye system. The "Eye-to-Hand" structure was the first to be widely researched in the field of sorting robots and strongly promoted the progress of picking robot technology. However, external camera telemetry results in excessive redundant information in the image, a complicated search process for target parcels, prominent mutual occlusion between the hand-eye system and parcels, low recognition reliability, and large visual localization errors [1]. This "see first, move later" static, fragmented approach lacks the ability to adapt to unstructured, complex environments and dynamic changes in logistics; at the same time, it relies too heavily on accurate hand-eye coordinate system conversion, robot kinematic modeling, and inverse problem solving, resulting in accumulated robotic errors that reduce positioning accuracy [2]. In contrast, the "Eye-in-Hand" (seeing while moving) mode of hand-eye coordination enables continuous visual feedback and correction of target positioning at the end so as to achieve real-time visual servo control, and has thus gained importance in the field of sorting robots. In Japan, the United States, Belgium, and other countries, logistics sorting robot systems for differently colored cartons and bags have adopted the "Eye-in-Hand" configuration with palm- or wrist-mounted cameras. For the "Eye-in-Hand" hand-eye coordination control problem, LUO et al.
designed a picking robot based on the greedy algorithm using a CCD camera-in-hand mode; detection and picking success rates in the laboratory reached 97% and 79%, respectively, but the robot's pick-and-place path was too long, too much time was consumed in motion planning and action execution, and a single operation took up to 106 s [3]. Wang et al. designed a visual servo system whose end positioning accuracy was about 15 mm, which can only meet the picking and positioning requirements of medium and large parcels [4]. Meanwhile, parcel protrusions in the path affect the servo's control performance or even damage the end-effector, and measurement noise has a greater impact on feedback control based on high gain. Alipour et al. developed a humanoid dual-arm carton parcel picking robot with Xtion and PrimeSense sensors installed in the head and wrist, respectively [5]; however, the experimental accuracy was insufficient, and autonomous hand-eye coordination control was not fully realized. The logistics picking robot studied by Kumar et al. [6] required establishing a three-dimensional point cloud model by scanning the parcels vertically and horizontally with a RealSense on the hand; however, the experiment only achieved two-dimensional image pixel position approximation, not depth (distance) localization. Ning et al. [7] constructed a hand-eye system based on the RealSense RR300 sensor that was first positioned in front of the parcel to take a picture in order to determine the relative angle between the manipulator and the parcel, but the control of hand-eye coordination was not studied. The disadvantages of the former studies are shown in Table 1.
Table 1. Differences between the previous methods and the proposed method.

Method | Characteristic
Greedy algorithm [3] | path too long; too much time spent
Visual servo robot [4] | only meets the needs of medium and large parcels
Humanoid dual-arm robot [5] | insufficient experimental accuracy; hand-eye coordination control not fully realized
RealSense picking robot [6] | achieved only two-dimensional image pixel positioning
Hand-eye system based on RealSense RR300 [7] | hand-eye coordination control not studied
Proposed method | lifting picking robot with "far and near" hand-eye coordination control

In order to achieve efficient picking robot operation under replicated real-world conditions, "Eye-in-Hand" hand-eye coordination control based on RGB-D sensors has become an objective need and a research consensus [8]. In this paper, a RealSense depth servo picking robot was designed and far and near view motion planning was carried out and verified by experiments, based on the RealSense depth sensor's reliable identification over a 160-1200 mm depth of field and of packages at a 160 mm close range, combined with the structural design of the lifting picking robot, the new hand-eye configuration mode, and "far and near" hand-eye coordination control.

Picking Robot Structure
The structure of the RealSense depth servo small telescopic picking robot is shown in Figure 1; it is composed of an autonomous mobile chassis, a scissor lift mechanism, a three-turn robot arm, and a wrist-mounted RealSense SR300 depth sensor.
(1) A multi-stage picking and lifting mechanism driven by a flush DC motor was designed, with a flexible auxiliary start-stop device to improve starting performance and buffer retraction, thus effectively resolving the contradiction between chassis miniaturization for narrow-space passage and the larger telescoping needed for mobile operation.
(2) On the basis of the two-joint robot arm, a horizontal servo slewing mechanism was designed at the base of the picking and telescoping platform; the combination of the large and small arms, horizontal slewing, and spatial telescoping forms a four-degree-of-freedom PRRR robot arm. Picking and parcel placement can be completed within a specific height area by planning only the three swing actions of the robot arm, which greatly reduces path length, motion planning complexity, and cycle time, and satisfies standardized logistics picking requirements.
(3) For the "pick-and-place" cycle operation with few degrees of freedom and a short path, an "Eye-in-Hand" system was formed by the RealSense depth sensor installed on the wrist of the robot arm, using its 160-1200 mm depth of field and large 1280 × 720 depth detection field of view together with in-hand RealSense depth servo control, thus ensuring hand-eye coordination and accurate positioning combining near and far views.

Picking Workspace and Posture Analysis
The picking robot achieves a wide range of picking operations by vertical lifting and lowering and 3 degrees of freedom rotation of the robotic arm, while only 3 degrees of freedom rotation of the robotic arm is required to achieve the placement of parcels [9][10][11]. The end work space and posture with respect to lifting and turning should meet the need of accurately placing parcels in the reserved area.

Workspace Analysis
The workspace of the manipulator was solved according to the Monte Carlo method [12]. The method involves drawing a point cloud diagram of the manipulator by using the functions of Rand and Plot in MATLAB robot toolbox [13].
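The Monte Carlo workspace method described above can be sketched as follows. This is a minimal illustration on a planar two-link arm segment; the link lengths, joint limits, and sample count are placeholders, not the paper's values, and the paper itself uses the Rand and Plot functions of the MATLAB robot toolbox rather than Python.

```python
import numpy as np

# Illustrative Monte Carlo workspace sampling for a planar 2R arm segment
# (link lengths and joint limits are placeholders, not the paper's values).
def monte_carlo_workspace(l1=0.4, l2=0.35, n=10000, seed=0):
    rng = np.random.default_rng(seed)
    # Sample joint angles uniformly within their limits (radians).
    t1 = rng.uniform(-np.pi / 2, np.pi / 2, n)
    t2 = rng.uniform(-np.pi / 2, np.pi / 2, n)
    # Forward kinematics: end-effector position for each random sample.
    x = l1 * np.cos(t1) + l2 * np.cos(t1 + t2)
    y = l1 * np.sin(t1) + l2 * np.sin(t1 + t2)
    return x, y  # scatter-plot these points to visualize the reachable workspace

x, y = monte_carlo_workspace()
```

Scatter-plotting the returned points produces the point cloud diagram of the reachable workspace; the same idea extends to the full PRRR arm by sampling the lift displacement and all three joint angles.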
On the basis of the lifting mechanism, the rotary robot arm can pick parcels within a 278 mm vertical range. When the chassis is 160 mm away from the packages, the maximum picking depth in the horizontal direction is 753 mm, enabling bilateral picking of parcels within 1.5 m in the logistics center.

Posture Analysis
Combined with the robot's working space and parcel size, the posture of the picking robot was analyzed. According to the parcel location and end accessibility, robot picking presents three different postures: top picking, flat picking, and back picking. Meanwhile, according to the relative installation relationship between the parcel placement position and the robot arm, when placing a parcel the angle between the forearm and the horizontal is 70-110°.

Coordinate Transformation of Hand-Eye Coordination
For the "Eye-in-Hand" mode, parcel positioning needs to obtain the position of the parcel relative to RealSense [14], and then it obtains the position coordinates of the parcel relative to the robot coordinate system according to the relative installation position of RealSense and the robot arm and kinematic equations.
Assuming that the coordinates of the parcel position obtained by the RealSense SR300 are S0 = (x0, y0, z0), the position S1 of the target parcel relative to the robot coordinate system is

S1 = M · S0 (1)

In Equation (1), M is the transformation matrix from the wrist camera coordinate system to the robot coordinate system, determined by:
z0 — vertical linear displacement of the picking lift mechanism, mm;
θ1 — angular displacement of the horizontal rotational joint, rad;
θ2 — rotational angular displacement of the robot arm's large arm, rad;
θ3 — angular displacement of the small arm rotation, rad;
L1-L6 — the length of each bar of the robot arm and the relative mounting dimensions of RealSense and the robot arm.
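Equation (1) can be sketched as a chain of homogeneous transforms. The sketch below is hypothetical: the exact joint axes and the link lengths L1-L6 are not given numerically in the text, so placeholder values and a plausible lift-waist-arm-arm chain are assumed.

```python
import numpy as np

# Hypothetical sketch of Equation (1): a parcel position S0 measured in the
# wrist camera frame is mapped into the robot base frame as S1 = M(q) @ S0.
# The chain (lift, waist rotation, two arm links, camera offset) follows the
# joint list in the text; axes and lengths L1..L6 are assumed placeholders.

def rot_z(t):  # rotation about the vertical (waist) axis
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1.0]])

def rot_y(t):  # rotation about a horizontal (arm swing) axis
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1.0]])

def trans(x, y, z):  # pure translation
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

def hand_eye_transform(z0, th1, th2, th3, L=(0.1,) * 6):
    # z0: lift displacement; th1..th3: joint angles; L: bar lengths / offsets.
    return (trans(0, 0, z0) @ rot_z(th1) @ trans(L[0], 0, L[1])
            @ rot_y(th2) @ trans(L[2], 0, L[3])
            @ rot_y(th3) @ trans(L[4], 0, L[5]))

# Parcel at S0 in camera coordinates (homogeneous form):
S0 = np.array([0.2, 0.0, 0.5, 1.0])
S1 = hand_eye_transform(0.1, 0.0, 0.0, 0.0) @ S0
```

With all joint angles at zero, the result is simply S0 shifted by the lift displacement and the accumulated link offsets, which is a quick sanity check on the chain.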

Hand-Eye Coordination Strategy for Near and Far View Based on RealSense Servo
Compared with CCD passive vision, active depth vision has less interference information and is less affected by light [15]. However, consumer-grade RGB-D cameras are limited in resolution and accuracy and cannot obtain a clear depth image of an individual parcel; moreover, large positioning errors may occur after only one visual identification and positioning [16,17]. Using the RealSense SR300 as the vision sensor, the depth and coordinate information of packages can be obtained within a 160-1200 mm depth of field. Meanwhile, "large and clear" image acquisition and reliable identification and positioning of a limited number of package boxes can be achieved at an ultra-close range of 160 mm with a small field of view. Based on RealSense, extensive remote detection of the top of the package pile, subarea division and positioning, and close-range precise identification and positioning of the package are effectively combined to realize coarse and fine guidance and precise positioning of the robot arm, and the real-time performance and stability of the "Eye-in-Hand" image servo control can be improved through segmented path planning and a combination of key points along the path.
The RealSense servo-based far and near view hand-eye coordination action flow is shown in Figure 2. OABCD is the initial picking path; DBCD is the cycle operation path. (1) The wrist of the robot arm moves from the initial position O to the far view position A. The RealSense SR300 obtains a depth map and the corresponding coordinate information of the larger parcel pile area from position A. According to the vertical and horizontal view angles of RealSense (55° × 71°) and the target distance, the field of view at different distances can be obtained. Then, according to the optimal close-range view range of the RealSense SR300, the far view depth area is divided into several close-range subareas and the subareas are located (Figure 3); the recognition symmetry of the scanned information area plays a guiding role in the selection of subsequent design strategies. In this strategy, RealSense returns "coarse" and "fine" depth information at the far view and close view positions, respectively, and guides the robot arm to approach the target from far to near and complete the picking. The method uses a depth vision servo with a limited number of key points to achieve high-precision robot arm guidance and continuous operation with less computation and effective reliability and stability, thus ensuring a fast robot sorting operation with a high success rate.
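The field of view at a given distance follows from the sensor's angular view range. A minimal sketch under a simple pinhole-style model (width = 2d·tan(FOV/2)), using the 55° × 71° angles quoted above, roughly reproduces the detection areas cited later in the text (about 1008 mm × 729 mm at 700 mm and 288 mm × 208 mm at 200 mm):

```python
import math

# Field-of-view footprint at a given depth for angular FOV (in degrees).
# Simple pinhole model: footprint = 2 * depth * tan(FOV / 2).
def fov_footprint(depth_mm, fov_h_deg=71.0, fov_v_deg=55.0):
    w = 2 * depth_mm * math.tan(math.radians(fov_h_deg) / 2)
    h = 2 * depth_mm * math.tan(math.radians(fov_v_deg) / 2)
    return w, h

w, h = fov_footprint(700)   # roughly the 1008 mm x 729 mm area cited at 700 mm
```

The small residual difference in the horizontal figure suggests the paper's 71° value is rounded; the model is otherwise consistent with the quoted detection areas.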

Critical Path Points for Hand-Eye Coordination
RealSense vision sensors are installed on the wrist of the robotic arm and feed back target depth information at the far and near view positions, respectively; the far and near point coordination strategy uses this feedback to guide the robotic arm toward the target parcel.
According to the operating range of the robot arm, the best vision detection range of the RealSense SR300 is 500-700 mm [17], and the best close view recognition distance is 200 mm. The settings of the key positions are critical to precise positioning of the robot arm, and the hand-eye tasks at each key position are as follows:
(1) Far view position: The robot arm is placed 500-700 mm above the top of the parcel pile to scan it at a horizontal attitude, smoothly obtaining complete depth data of the top of the pile at a certain height for subregion division. The detection area is 1008 mm × 729 mm at 700 mm and 720 mm × 520 mm at 500 mm [17].
(2) Close view position: After the subregion is determined, the robot arm moves to the center of the targeted subregion, the best close view position at 200 mm from the subregion, still using the horizontal attitude as the ideal attitude. The vision sensor detection area is 288 mm × 208 mm [18].
(3) Picking position: According to the coordinates of the targeted parcel, the robot arm reaches the picking position through attitude planning and picks the parcel up in a horizontal, upward, or downward attitude.

The flowchart of the picking method is shown in Figure 3.

Trajectory Planning Method
In order to control the robot arm, the poses of the end of the robot arm at the initial and target points need to be provided before planning. The robot's poses in Cartesian space are converted to joint angles, and the trajectory is described by fitting the joint angles with a function. The commonly used trajectory interpolation methods in joint space are cubic and quintic polynomials. The cubic polynomial only constrains position and velocity along the trajectory; to ensure smoothness of the robot arm's motion, acceleration at the starting and stopping points of each joint must also be constrained, so quintic polynomial interpolation is used. The quintic polynomial interpolation function [19] is

θ(t) = a0 + a1·t + a2·t^2 + a3·t^3 + a4·t^4 + a5·t^5 (2)

Taking the first and second order derivatives of Equation (2) gives the time functions of the angular velocity and acceleration of each joint of the manipulator. After setting each key point and determining the linear trajectory, interpolation points are added to each segment of the trajectory, and the motion between these points is computed with the quintic polynomial to obtain the rate of change of the position coordinates with time; that is, the interpolation points describe how the end position changes with time from the starting point to the end point.
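Solving for the six coefficients of Equation (2) from the six boundary conditions (position, velocity, and acceleration at both ends) can be sketched as follows; the start/end values and duration here are illustrative, with zero velocity and acceleration at both ends as the text requires.

```python
import numpy as np

# Quintic (5th-order) polynomial joint trajectory, Equation (2):
# theta(t) = a0 + a1 t + a2 t^2 + a3 t^3 + a4 t^4 + a5 t^5,
# with position, velocity, and acceleration constrained at both ends
# (zero boundary velocity and acceleration here).
def quintic_coeffs(q0, qf, T):
    # Six boundary conditions -> linear system for the six coefficients.
    A = np.array([
        [1, 0, 0,      0,       0,        0],        # theta(0)   = q0
        [0, 1, 0,      0,       0,        0],        # theta'(0)  = 0
        [0, 0, 2,      0,       0,        0],        # theta''(0) = 0
        [1, T, T**2,   T**3,    T**4,     T**5],     # theta(T)   = qf
        [0, 1, 2*T,    3*T**2,  4*T**3,   5*T**4],   # theta'(T)  = 0
        [0, 0, 2,      6*T,     12*T**2,  20*T**3],  # theta''(T) = 0
    ], dtype=float)
    b = np.array([q0, 0.0, 0.0, qf, 0.0, 0.0])
    return np.linalg.solve(A, b)

a = quintic_coeffs(0.0, 1.0, 2.0)            # move from 0 to 1 rad in 2 s
theta = lambda t: sum(c * t**i for i, c in enumerate(a))
```

With symmetric zero boundary conditions, the joint passes through the midpoint of the move at half the duration, a convenient check on the solved coefficients.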

Segmented Trajectory Planning
The trajectory of robot arm operation can be divided into two parts: picking and placing. Considering operation efficiency and control difficulty, a linear trajectory is adopted as the shortest-path planning scheme in this paper for the two segments from the initial position to the far view position and from the picking position to the placing point.
The process from the far view position to the close view range and then to the parcel picking position can be understood as the operation path of the robot arm approaching the parcel from the far view position. After completing parcel picking, the robot arm moves the parcel from the picking position to the placement position and then returns to the far view position. The joint angles, angular velocities, and accelerations are generated by the inverse solution from the picking position combined with the pose of the far view position; after trajectory planning in joint space, the robot arm arrives at the given pose.

Track Articulation
From the initial position to the far view position, the velocity of the end of the manipulator is 0 at both ends of the path. The motion from the far view point to the close view point is a fast approach in which the velocity is not 0, whereas the velocity at both ends of the trajectory from the picking position to the placement position is 0. According to the picking process and the key points, three straight-line trajectories need to be planned: from the initial position to the far view position, from the far view position to the picking position, and from the picking position to the placing position. Firstly, the coordinates of the key points in Cartesian space are converted into the angles of the revolute joints and the position of the prismatic (up-down) joint in joint space, as shown in Table 2. To ensure the accuracy of the linear trajectory at each fitting point, an interpolation point is selected every 15 mm, and the key parameters of the manipulator corresponding to each interpolation point are obtained by inverse solution. The linear interpolation trajectories of the segments P0 to P1, P1 to P2, and P2 to P3 were obtained with MATLAB simulation.
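The 15 mm Cartesian interpolation described above can be sketched as follows. The segment endpoints are illustrative, and the inverse-kinematics step applied to each interpolated point is omitted here.

```python
import numpy as np

# Linear Cartesian interpolation between two key points with ~15 mm spacing,
# as used for the segment trajectories P0->P1, P1->P2, P2->P3.
# Each interpolated point would then be passed to the inverse kinematics
# solver to obtain joint-space parameters (IK omitted in this sketch).
def interpolate_segment(p_start, p_end, step_mm=15.0):
    p_start = np.asarray(p_start, dtype=float)
    p_end = np.asarray(p_end, dtype=float)
    dist = np.linalg.norm(p_end - p_start)
    n = max(int(np.ceil(dist / step_mm)), 1)       # number of sub-intervals
    s = np.linspace(0.0, 1.0, n + 1)[:, None]      # normalized path parameter
    return p_start + s * (p_end - p_start)         # (n+1) x 3 waypoints

pts = interpolate_segment([0, 0, 0], [300, 0, 0])  # a 300 mm straight segment
```

Reducing `step_mm` increases waypoint density, which is consistent with the later observation that accuracy improves as the interpolation interval is reduced at the cost of more computation.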

Prototype Design
The structure diagram of the small lifting picking robot includes four main parts: an autonomous mobile chassis, a scissor lift mechanism, a three-turn robot arm, and a wrist-mounted RealSense SR300 depth sensor. The main technical parameters and performance indicators of the prototype are shown in Table 3. As shown in Figure 4, two customized transparent scale plates of 1 m × 1 m with a 2 mm grid interval were assembled with aluminum profiles into a coordinate detection device with a right-angle coordinate system to facilitate the picking robot accuracy test, and a laser pointer was installed on the wrist of the robot arm (Section 3.1.1, Workspace Analysis).

Adjust and determine the initial position of the manipulator. The depth direction of the RealSense camera was used as the Z axis; the initial position of the manipulator in the height direction, i.e., the X direction, is 800 mm; and the center line of the waist rotating joint is the Y direction. After recording the reference point (800 mm, 0, 0) relative to the XOY plane on the grid scale plate, the X and Y coordinates of each measured end position are read relative to the reference point, and the Z coordinate is measured with a vernier caliper, thus obtaining the actual coordinates of the picking robot end.

Test Method
(1) The chassis of the picking robot is fixed, the mechanical arm is adjusted to the initial position, and several express packages are fixed on the transparent scale plate in the Z direction. The center point of each package is taken as the target point, and the actual coordinates of the package are recorded according to the scale.
(2) The RealSense servo-based far and near view hand-eye coordination strategy and the segmented trajectory planning of the parcel picking robot arm are applied to perform the hand-eye tasks at and between the key positions, completing a single-cycle parcel pick.
(3) The distance between the aluminum frame and the robot base coordinate system in the Z direction is adjusted, and the position of the parcels in different Z1, Z2, . . . , Zn planes is changed so that hand-eye coordinated parcel picking can be performed independently at different positions.
(4) The coordinates of the projection of the end in the XOY plane are read from the laser pointer's mark on the scale plate. The distance between the end and the scale plate is measured with a vernier caliper, and the Z coordinate of the end is obtained by combining this with the distance of the scale plate from the robot base coordinate system in the Z direction. At the same time, a DV camera recorded the running process, and video editing software was used to extract clips of the manipulator's movement and obtain its running time.

Analysis of the Test Results
The measured coordinates actually reached by the end of the picking robot and the corresponding elapsed times are shown in Table 4. Under the far and near view coordination strategy with RealSense information feedback, the average errors of the robot in the X, Y, and Z directions are 3.49 mm, 2.76 mm, and 3.32 mm, respectively. The time consumed by the pick-up operation increases with package distance, and the average cycle time is 19.19 s. The average time from the initial position of the manipulator to the pick-up action is 12.02 s, the average time for intermediate identification and calculation is 3.79 s, and the average time for the package placement action is 6.9 s; the robot arm's actions account for 79.8% of the whole process. The hand-eye coordination movement accuracy can meet the picking requirements and achieves a good combination of picking accuracy and efficiency. However, there is still considerable room for improvement in both. Each joint of the self-developed robot has some error; in particular, there are certain gaps between the three rotating joints. The maximum clearances of the manipulator under static test are (2.75, 2.1, 2.1, 2.65) mm, accounting for about 79% of the actual positioning error, so optimizing the joints can significantly improve the robot's hand-eye coordination accuracy. In the experiment, the path of the manipulator was calculated through interpolation at 15 mm intervals. The actual arrival accuracy of the manipulator will improve as the interpolation interval is reduced, and improvements in the robot's computing capability will further reduce the time consumed by recognition, positioning, and interpolation calculation.

Discussion
The results show that the average positioning accuracy of the end in the X, Y, and Z directions is 3.49 mm, 2.76 mm, and 3.32 mm, respectively. The average cycle time is 19.19 s, of which the robot arm takes an average of 12.02 s from the initial position to pick-up, intermediate recognition and calculation take an average of 3.79 s, and the package placement action takes an average of 6.9 s; the robot arm's actions account for 79.8% of the whole cycle. Next, the construction of the image feature vector of the fruit tree picking object will be optimized to improve the accuracy of image recognition.

Conclusions
The design of the picking robot proposed in this paper can greatly improve the coordination symmetry of fruit target recognition, detection, and picking. In order to improve the recognition-sorting accuracy and efficiency of existing picking robots, a small telescopic sorting robot with a RealSense depth visual servo embedded in the manipulator was designed. According to the parameters of RealSense and the "Eye-in-Hand" configuration, a far-to-near hand-eye coordination strategy based on the in-hand RealSense depth servo was proposed for the sorting robot, and the operation flow and motion planning based on depth vision and the far-near coordination strategy were completed. The robot structure and the in-hand RealSense depth servo hand-eye coordination can meet picking operation requirements, while improvement of joint accuracy and optimization of path interpolation calculation can further improve the accuracy and efficiency of hand-eye coordination.