Occluded Apple Fruit Detection and Localization with a Frustum-Based Point-Cloud-Processing Approach for Robotic Harvesting

Abstract: Precise localization of occluded fruits is crucial and challenging for robotic harvesting in orchards. Occlusions from leaves, branches, and other fruits make the point cloud acquired from Red Green Blue Depth (RGBD) cameras incomplete. Moreover, in the shade cast by occlusions, RGBD cameras often suffer from an insufficient filling rate and noise on the depth images, leading to distortion and fragmentation of the point cloud. These challenges complicate the position and size estimation of fruit for robotic harvesting. In this paper, a novel 3D fruit localization method is proposed based on a deep learning segmentation network and a new frustum-based point-cloud-processing method. A one-stage deep learning segmentation network is presented to locate apple fruits on RGB images. With the output masks and 2D bounding boxes, a 3D viewing frustum is constructed to estimate the depth of the fruit center. Based on the estimated centroid coordinates, a position and size estimation approach is proposed for partially occluded fruits to determine the approaching pose for robotic grippers. Experiments in orchards were performed, and the results demonstrate the effectiveness of the proposed method. On 300 testing samples, the proposed method reduced the median error and mean error of the fruits' locations by 59% and 43%, respectively, compared to the conventional method. Furthermore, the approaching direction vectors can be correctly estimated.


Introduction
In the fresh fruit industry, harvesting requires a large labor force, which is often in seasonal shortage. According to the existing literature [1], labor in the harvesting process accounts for over 50% of the total cost in apple orchards. To reduce labor costs, robotic harvesters have been widely investigated over the past few decades. For a harvesting robot, locating fruits is one of the most challenging tasks of robotic perception in the complicated orchard environment. Visual sensors offer abundant information about the environment for robots; in particular, Red Green Blue Depth (RGBD) cameras have pushed the boundaries of robot perception significantly and are viewed as a promising technique. With the advantages of low cost, light weight, and compact size, the RGBD camera has become an essential component in agricultural robots, as well as in industrial robots, and has attracted increasing attention.
One primary task of robotic harvesting is the recognition and localization of fruits. In orchards, environmental factors affect the accuracy and robustness of stereo perception with RGBD cameras. Densely arranged leaves and branches in front of fruits are very common in robotic harvesting and make it difficult to accurately locate and approach fruits. Moreover, illumination conditions make the appearance of fruits vary greatly from morning to night. Taking these working conditions of harvesting robots into account is therefore of great significance. In other words, fruit-harvesting robots are required to adapt to and understand the environment to increase the rate of grasping success. In previous literature, fruit recognition, segmentation, and pose estimation have been extensively studied, and the recognition and localization of fruits can be successfully implemented in the case of no occlusion or small occlusions. In field operation, however, a robotic harvester has to cope with many fruits in complicated occlusion situations. In some cases, the centers of fruits are occluded by leaves or branches; the detected bounding boxes and the corresponding depth measurements then tend to be error-prone, affected by the obstacles, which results in failed grasps. Therefore, further development of techniques to detect and locate fruits under complicated occlusions is demanded.
In this work, a new method of fruit localization is proposed for robotic harvesting. The contributions can be summarized as follows:
• An instance segmentation network was employed to provide the location and geometry information of fruits in four cases of occlusion: leaf-occluded, branch-occluded, fruit-occluded, and non-occluded;
• A point-cloud-processing pipeline is presented to refine the estimates of the 3D position and pose information of partially occluded fruits and to provide 3D approaching pose vectors for the robotic gripper.
The rest of the paper is organized as follows: Section 2 introduces the related works on the apple detection and localization algorithm. Section 3 compares different instance segmentation networks for fruit recognition and localization on the image-plane. Moreover, the point-cloud-processing method for the 3D localization of partially occluded fruits is introduced in this section. Section 4 gives the field experimental results and further discussions. Section 5 concludes the paper and provides future works.

Fruit Recognition
The methods of fruit recognition can be divided into three kinds: single-feature-based, multi-feature-based, and deep-convolutional-neural-network-based. Single-feature-based methods employ differences in shallow features to identify and detect fruit targets, such as the color difference method with Otsu adaptive threshold segmentation, the Hough-transform-based circle detection method, the optical flow method, etc. To improve robustness outdoors under variable lighting conditions and occlusion, multiple-feature fusion methods were proposed, which usually combine the Otsu adaptive color threshold, image morphology, and a Support Vector Machine (SVM) to extract apple target areas. In [2], fruit regions were segmented by combining red/green chromatic mapping and Otsu thresholding in the RGB color space, and a local binary pattern operator was employed to encode fruit areas into a histogram for classification by the SVM. In [3], fruits were segmented by a region-growing method, and the color and texture characteristics of the grown regions were then used for classification and segmentation. Compared with single-feature methods, the core idea of multi-feature-fusion fruit detection is to encode pixels or local pixel features into descriptors and similarity measures, to select potential fruit regions based on super-pixel and region fusion algorithms, and then to extract region features as inputs to classifiers that complete the region classification. Such methods can adapt to occlusion and changes in the external environment to a certain extent. However, the extracted features are designed by hand, and the feature extraction process is unsupervised, so the selected features lack the representativeness to distinguish the various categories.
In addition, the features contained in feature descriptors are usually shallow features such as color, edge, and distance and lack advanced features such as spatial position and pixel correlation.
Given that DCNN-based methods are easily adapted to multiple kinds of fruit and generalize well to complicated natural environments, in this paper we employed the DCNN technique for the proposed fruit recognition algorithm.

3D Fruit Localization and Approaching Direction Estimation
In the visual sensing system of an agricultural robot, 3D localization refers to the process of extracting the 3D information of the target by visual sensors. The main task of 3D localization for a robotic harvester is to obtain the spatial coordinates of the fruit to further guide the robot's gripper in approaching the targets. To accomplish this task, stereo cameras are usually employed, e.g., binocular vision systems based on optical geometry and consumer RGBD sensors based on Structured Light (SL), the Time-of-Flight (ToF) method, and Active Infrared Stereo (AIRS).
Based on the triangulation optical measurement principle [15], binocular cameras have been successfully used to identify tomatoes, sweet peppers, and apples [16][17][18][19]. Reference [20] developed a vision sensor system that used an over-the-row platform for apple crop load estimation. The platform acquires images from opposite sides of apple trees and identifies the targets on the images based on shape and color, then maps the detected apples from both sides together in a 3D space to avoid duplicate counting. The above optical-geometry-based stereo-vision system is cost-effective, but its measurement accuracy and timeliness are usually limited in real applications [21], as the sensing modalities and algorithms have become sophisticated and time-consuming.
Consumer RGBD cameras have a simple and compact structure and can be used for many local tasks, such as the three-dimensional reconstruction of targets at specific locations. RGBD cameras, e.g., the Microsoft Kinect V2 (ToF-based) [22][23][24] and Intel Realsense (AIRS-based) [12,25], have been widely used in harvesting robots owing to their low cost, high distance-measuring resolution, adaptiveness to ambient light, and quick response. Fruit localization with RGBD cameras has been widely investigated in the field of harvesting robots. Reference [26] employed the depth map of the Fotonic F80 depth camera over the detected sweet pepper regions and transformed each region to the 3D location of its mass center to obtain the locations of fruits. Reference [27] generated point clouds of apple trees with a Kinect v2 by fusing RGB and depth information, then segmented the fruit regions by applying point-cloud segmentation to the ROIs found in the RGB images, achieving a segmentation purity of 96.7% and 96.2% for red and green apples, respectively. Reference [28] addressed the 3D pose estimation of peppers using Kinect Fusion technology with the Intel Realsense F200 RGBD camera, fitting a superellipsoid to the peppers' point clouds through constrained non-linear least-squares optimization to estimate the sweet pepper pose and grasp pose. In their follow-up work [29], the estimated pose of each sweet pepper was chosen from multiple candidate poses during harvesting by a utility function. In [30], tomatoes were fitted by Random Sample Consensus (RANSAC), and the robotic manipulator grasped the tomatoes according to the fitted sphere's centroid. Likewise, Reference [31] used the RANSAC-based fitting method to model a guava fruit and an FCN network to localize the stem for robotic arm grasping.
Reference [32] used an improved 3D descriptor (Color-FPFH) with a fusion of color features and 3D geometry features of the point clouds generated from RGBD cameras. Then, by the 3D descriptors and the classifiers optimized by support vector machine and the genetic algorithm, the objects of apples, branches, and leaves were divided. In [33], the 3D sphere Hough transform method was employed to model the apple fruits to compute the grasp pose of each fruit based on the fit sphere.
All the aforementioned investigations assumed that the point clouds include an ideal surface of the targets, on which a 3D descriptor or a fitting algorithm can be applied. In practice, however, it is hard to acquire ideal point clouds of objects due to the unsatisfactory performance of depth sensors, which are sensitive to external disturbances and prone to a low filling rate of the depth maps in outdoor conditions. Yet there is little literature addressing the locating problem with such unsatisfactory point clouds, which are common in real applications.

Methods and Materials
The proposed detection and localization method for occluded apple fruits is based on deep learning and a point-cloud-processing algorithm. It is schematically shown in Figure 1.

Hardware and Software Platform
The implementation of the proposed method was based on the hardware platform shown in Figure 2a. The overall control system was implemented on the Robot Operating System (ROS) Melodic, with Linux Ubuntu 18.04 as the operating system. The ROS accounts for robotic hardware abstraction, low-level device control (e.g., cameras, robotic arm), the implementation of commonly used functionalities (e.g., motion planning), message-passing between processes, and package management. RGBD image acquisition was driven by Realsense SDK v2.40.0 with the ROS package. The training and real-time inference of the deep learning system were performed with the acceleration of CUDA and cuDNN under the PyTorch framework. Point cloud processing and computer vision were based on Open3D and OpenCV, respectively. The robotic arm control was implemented with the MoveIt motion-planning framework and franka_ros.

Data Preparation
To train the segmentation network for the proposed apple localization algorithm, two kinds of image datasets were employed in this work. One was the open-source MinneApple dataset [34], which contains 1000 images with over 41,000 labeled instances of apples. The other dataset (RGBD apple dataset) comprises over 600 images with over 4800 labeled instances of apples, acquired by the Realsense D435i RGBD camera in 2 modern standard orchards in Haidian District and Changping District, Beijing, China. Various conditions were considered in the preparation, as shown in Figure 3, including:
• Different illumination: front lighting, side lighting, back lighting, cloudy;
• Different occlusions: non-occluded, leaf-occluded, branch/wire-occluded, fruit-occluded;
• Different periods of the day: morning (08:00-11:00), noon (11:00-14:00), afternoon (14:00-18:00).
The dataset was split 4:1 into training and validation sets, respectively. In the collected training set, 50.97%, 33.49%, 7.95%, and 7.59% of the fruits were non-occluded, leaf-occluded, branch/wire-occluded, and fruit-occluded, respectively. In the validation set, the proportions of the four occlusion conditions were 42.47%, 35.67%, 14.01%, and 7.86%, respectively.
To prevent overfitting, we employed the following methods to augment the dataset: (a) photometric distortion including random contrast, random saturation, random hue, and random brightness, (b) random mirror operation, (c) random sample crop, (d) random flip, and (e) random expand.
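The augmentation steps above can be sketched as follows. This is a minimal NumPy illustration with hypothetical parameter ranges (the paper does not report them); hue/saturation jitter, which operates in HSV space, and the expand operation are omitted for brevity:

```python
import numpy as np

def augment(img, rng):
    """Apply a random subset of photometric and geometric distortions
    to an HxWx3 uint8 image (illustrative parameter ranges)."""
    out = img.astype(np.float32)
    out = out * rng.uniform(0.7, 1.3)            # random contrast gain
    out = out + rng.uniform(-30.0, 30.0)         # random brightness shift
    if rng.random() < 0.5:                       # random mirror (horizontal)
        out = out[:, ::-1]
    if rng.random() < 0.5:                       # random flip (vertical)
        out = out[::-1, :]
    if rng.random() < 0.5:                       # random sample crop (keep >= 50%)
        h, w = out.shape[:2]
        ch = rng.integers(h // 2, h + 1)
        cw = rng.integers(w // 2, w + 1)
        y0 = rng.integers(0, h - ch + 1)
        x0 = rng.integers(0, w - cw + 1)
        out = out[y0:y0 + ch, x0:x0 + cw]
    return np.clip(out, 0, 255).astype(np.uint8)
```

In practice each distortion would be applied to the image and its mask/box annotations consistently; the sketch only shows the image side.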

Image Fruit Detection and Instance Segmentation
To select the instance segmentation network for our task, four network models were compared: Mask-RCNN, Mask Scoring RCNN (MS RCNN), YOLACT, and YOLACT++. We employed pre-trained models to implement transfer learning in this work. The comparison results are given in Table 2. It can be seen that the YOLACT++ network with ResNet-101 had better FPS performance than Mask RCNN and MS RCNN due to its one-stage network structure. Besides, owing to its use of deformable convolutional networks, YOLACT++ can adapt the effective size and shape of its convolution kernels to the input through learned sampling offsets. Consequently, YOLACT++ (ResNet-101) showed better Average Precision (AP) performance than its counterparts. The comparative test results of the networks are given in Figures 4-6. From Figure 4, it can be seen that ResNet-50 output an incorrect bounding box in the dotted red circle, whereas ResNet-101 correctly detected and segmented all fruits. In Figures 5 and 6, it can be seen that ResNet-50 had missed detections compared with ResNet-101. Based on the above analysis, we chose YOLACT++ (ResNet-101) as the instance segmentation network for apples in this work; some test results are demonstrated in Figure 7.
For training, we employed the hybrid strategy that switches from Adam to Stochastic Gradient Descent (SGD) optimization [35] to achieve better generalization and to prevent overfitting. Combining Adam and SGD balances convergence speed and learning performance. Moreover, we froze the BN layers during fine-tuning to prevent large changes in the values of Gamma and Beta.
In practical applications, fruits are very commonly partially occluded by branches, leaves, and other fruit. Such situations bring difficulties to point cloud processing. Moreover, the sensitivity of RGBD cameras to external disturbances in outdoor conditions results in noise and a low fill rate of the depth maps.
This also leads to low-quality point clouds generated from the RGBD images. For instance, in our previous experimental point cloud data acquired in real applications (see Figure 8), the point clouds of partially occluded fruits were usually fragmentary and could not be used for 3D reconstruction of the targets, making the apples' locations hard to determine. From the two views in Figure 8, it can be observed that both Target A and Target B had a poor depth filling rate to different extents. Moreover, the point clouds on the surfaces of Target A and Target B were not spherical, leading to difficulties in morphological feature analysis.
To address this problem, we now propose a pipeline for fruit high-precision localization with occlusions.

Fruit Central Line and Frustum Proposal
Determining the fruit's center is a fundamental problem for robotic harvesting. In practice, however, due to the characteristics of the sensors and the inaccuracy of mask segmentation, there are deficiencies and outliers in the point clouds, so the acquired point clouds are fragmentary and distorted; see Figure 9. In this case, the point cloud inside the bounding box cannot represent the whole fruit, and centroid inference based on such point clouds may be inaccurate. Many conventional methods directly use the point cloud inside the 2D bounding box to calculate the centroid of the fruit, leading to a large location error. To address this issue, we propose a frustum-based method to determine the centroid. Before introducing the frustum, it is necessary to extract the central line of the fruit. We used the 2D bounding boxes obtained from the segmentation network to propose a complete fruit region along with a center on the RGB image. Owing to the quasi-symmetry of spherical fruits, the center of the bounding box overlaps the center of the fruit. Using this property, we can determine the center's position on the x-axis and y-axis in the frame of the RGBD camera. To further obtain the coordinate on the z-axis, the fruit's bounding box on the RGB image can be lifted to a frustum (with near and far planes specified by the depth range inside the 2D bounding box on the depth image) and a 3D central line (starting at the origin of the camera frame and ending at the center of the frustum's far plane), by using the RGB camera's intrinsic parameter matrix and the aligned depth image, as shown in Figure 10.
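The lifting step can be sketched as follows under the pinhole camera model. This is our own minimal illustration, not the paper's code: `K` is the 3×3 intrinsic matrix, `bbox` the 2D box as (u_l, v_t, u_r, v_b), and `depth_roi` the aligned depth patch (in meters, 0 = unfilled) inside the box:

```python
import numpy as np

def lift_bbox_to_frustum(bbox, depth_roi, K):
    """Lift a 2D bounding box to a viewing frustum: the near/far planes
    come from the depth range inside the box, and the central line runs
    from the camera origin to the centre of the far plane."""
    u_l, v_t, u_r, v_b = bbox
    u_c, v_c = (u_l + u_r) / 2.0, (v_t + v_b) / 2.0
    valid = depth_roi[depth_roi > 0]             # drop unfilled depth pixels
    z_near, z_far = float(valid.min()), float(valid.max())
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # back-project the bbox centre pixel onto the far plane
    far_centre = np.array([(u_c - cx) * z_far / fx,
                           (v_c - cy) * z_far / fy,
                           z_far])
    direction = far_centre / np.linalg.norm(far_centre)
    return (z_near, z_far), far_centre, direction
```

The returned ray (origin, `far_centre`, `direction`) is the 3D central line; the fruit center is later located on this line.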

Point Cloud Generation of Visible Parts of Occluded Apples
After obtaining the frustum and the central line, it is necessary to further determine the specific position of the fruit's center on the line. As shown in Figure 11, the frustum contains non-fruit points, such as the background, leaves, branches, etc. The points belonging to the fruit must therefore be filtered from the non-fruit objects. To this end, two steps are performed.

1. Generation of point clouds under the fruits' masks. According to the fruits' masks detected by the segmentation network, combined with the corresponding depth map, the point clouds inside the masks can be generated, as shown in Figure 12. Ideally, the point cloud generated from a fruit's mask is supposed to be distributed on the surface of the fruit sphere. In practice, however, due to the nonideal characteristics of the sensors and the inaccuracy of mask segmentation, there are deficiencies and outliers in the point clouds, and the acquired point clouds are fragmentary and distorted; see Figure 9. Therefore, the following step is necessary;
2. Selection of the most likely point cloud. To sort out the cluster on the target's surface, we used Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to cluster the target point cloud. After the point clouds generated from the masks are clustered by DBSCAN, the points in each cluster are counted, and the cluster holding the most points is selected as the best cluster. Taking this cluster as the point cloud most likely belonging to the target's surface, we can remove the other point clouds in the frustum. The reasons for choosing DBSCAN in this work lie in the following aspects: (1) DBSCAN does not need to know the number of clusters in the point clouds a priori, as opposed to k-means; (2) DBSCAN is robust to noise in point clouds; (3) DBSCAN needs only two parameters and is insensitive to the order of points in the database, which suits clustering point clouds; (4) in orchard sensing tasks, the parameters of DBSCAN, the minimum number of points (minPts) and the neighborhood radius (ε), can easily be determined from practical experience.
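Step 2 can be sketched as follows. For self-containment this uses a minimal NumPy DBSCAN rather than a library implementation (in practice, e.g., Open3D's `cluster_dbscan` would serve); the `eps` and `min_pts` values in the usage below are illustrative, not the paper's settings:

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: label each point with a cluster id; -1 = noise."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neigh = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    visited = np.zeros(n, bool)
    cluster = 0
    for i in range(n):
        if visited[i] or len(neigh[i]) < min_pts:
            continue                              # noise or already handled
        visited[i] = True                         # i is a fresh core point
        seeds, k = list(neigh[i]), 0
        while k < len(seeds):                     # grow the cluster
            j = seeds[k]; k += 1
            if labels[j] == -1:
                labels[j] = cluster
            if not visited[j]:
                visited[j] = True
                if len(neigh[j]) >= min_pts:      # expand through core points only
                    seeds.extend(neigh[j])
        cluster += 1
    return labels

def best_cluster(points, eps, min_pts):
    """Keep the cluster with the most points: the likeliest fruit surface."""
    labels = dbscan(points, eps, min_pts)
    ids, counts = np.unique(labels[labels >= 0], return_counts=True)
    return points[labels == ids[np.argmax(counts)]]
```

Points labeled -1 (noise) and all smaller clusters (leaves, background fragments inside the mask) are discarded; only the largest cluster is passed on as the fruit's visible surface.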

Centroid Determination and Pose Estimation
Through the above two steps, the outliers can be removed and the target point cloud on the fruit's surface obtained. The fruit's center must then be determined. Due to the distortion of the point clouds, as shown in Figure 9, fitting a sphere (e.g., by RANSAC or the 3D Hough transform) is not feasible in this case. Consequently, the center and the approaching vector of a fruit are estimated with the following steps:

1. Obtain the centroid of the filtered point cloud, denoted by p_c;
2. Calculate the radius from the 2D bounding box of the target and the camera intrinsic parameters by
r = Δu · z_c / (2 K_u),
where Δu = u_r − u_l is the width of the bounding box, u_l and u_r are the pixel positions of the left and right sides of the bounding box on the U-axis, respectively, z_c is the z-axis coordinate of the point p_c, r denotes the radius of the sphere, and K_u is the scale factor on the U-axis of the camera;
3. Construct a sphere with radius r centered at p_c, and obtain the two points of intersection between this sphere and the central line of the frustum;
4. Take the farther of these two points as the fruit's centroid p_o;
5. Take the direction vector from p_o to p_c as the approaching vector.
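Under the pinhole-model assumption, the five steps can be sketched as follows. The variable names are ours, not the paper's: `cloud` is the cluster retained by DBSCAN, `bbox` the 2D box as (u_l, v_t, u_r, v_b), and `K` the 3×3 camera intrinsic matrix:

```python
import numpy as np

def locate_fruit(cloud, bbox, K):
    """Steps 1-5: centroid p_c, radius r from the bbox width, far
    sphere/line intersection p_o, and approach vector from p_o to p_c."""
    p_c = cloud.mean(axis=0)                          # 1. centroid of filtered cloud
    u_l, v_t, u_r, v_b = bbox
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    r = (u_r - u_l) * p_c[2] / (2.0 * fx)             # 2. r = Δu·z_c/(2·K_u)
    d = np.array([((u_l + u_r) / 2 - cx) / fx,        # central line through the
                  ((v_t + v_b) / 2 - cy) / fy, 1.0])  # bbox centre pixel
    d /= np.linalg.norm(d)
    # 3. intersect the sphere (centre p_c, radius r) with the line x = t·d:
    #    t^2 - 2t(d·p_c) + |p_c|^2 - r^2 = 0
    b = d @ p_c
    disc = max(b * b - (p_c @ p_c - r * r), 0.0)
    p_o = (b + np.sqrt(disc)) * d                     # 4. far intersection = centre
    v = (p_c - p_o) / np.linalg.norm(p_c - p_o)       # 5. approach vector
    return p_o, r, v
```

Taking the far intersection reflects the geometry that the visible surface cluster lies on the camera-facing side of the fruit, so the true center sits behind it along the central line.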

Results of Localization and Approaching Vector Estimation
To verify the performance of the proposed method, experiments in an orchard were conducted. We set up a reference system with an RGBD camera and a LiDAR to provide the ground truth positions of fruits for evaluating the proposed method, as shown in Figure 2d. Such a reference system is not needed when deploying the proposed method for robotic harvesting; therefore, the reference system with the LiDAR and Realsense camera was used standalone. To extract the true values of the positions of a fruit's surface, the positional measurements of the LiDAR were employed due to its high distance resolution and robustness to sunlight.
Having obtained the point clouds from the LiDAR, we manually determined the centroid position of each fruit according to its point cloud. Besides, the true values of the fruits' sizes were calculated from the centroid positions and the image pixel positions, according to r = Δu · z_c / (2 K_u), where z_c is the true value of the center's position on the z-axis.
We extracted 50 frames of RGBD image pairs of the robotic sensing system and the corresponding point cloud generated from the LiDAR as well, where the target fruits belonged to six different apple trees.
In the experiments, the travel speed of the tracked mobile platform acquiring images was 0.3 m/s along the fruit tree line, and the approaching speed of the RGBD camera toward the fruits was 0.3 m/s. The total pipeline processing time of the proposed method was 0.1-0.12 s per frame, namely around 10 Frames Per Second (FPS) on our hardware system, which is sufficient for vision-based harvesting robot control.
We conducted three groups of tests at the following distances from the row of trees: <0.6 m (Group 1), 0.6-0.9 m (Group 2), and >0.9 m (Group 3), viewed from the left, middle, and right of the target tree, respectively, as shown in Figure 13. In these tests, the numbers of samples under the four occlusion conditions were as follows: non-occlusion: 91 (30.33%), leaf-occlusion: 171 (56.99%), branch/wire-occlusion: 24 (8%), and fruit-occlusion: 14 (4.67%). To verify the effectiveness of the proposed method, we took the bounding-box-based method (bbx mtd.) [36] as the reference scheme. Table 3 presents the errors of the center and radius estimates with the proposed method (our mtd.) and the bounding-box-based method in the different testing groups and in total, reported as the Maximum (Max.), Minimum (Min.), Median (Med.), Mean, and Standard (Std.) errors. From Table 3, it can be observed that the median, mean, and standard errors of the centers with the proposed method were considerably reduced, by 67%, 50%, and 10% in Group 1 and by 59%, 43%, and 9% in total. The median, mean, and standard errors of the radius with the proposed method were reduced by 75%, 80%, and 96% in Group 1 and by 70%, 70%, and 78% in total. This demonstrates that the proposed method can reduce the errors of both center localization and fruit size estimation. To clarify the relationship between distance and center/radius error, Figures 14 and 15 are presented, where the x-axis denotes the errors divided into five groups and the y-axis denotes the number of samples in each. Comparing the distributions of the bounding-box-based method (Figure 15a,b) and the proposed method (Figure 15c,d), the majority of testing samples fell in the 0-10 mm/0-5 mm sections with the proposed method, outperforming the bounding-box-based method.
In Figure 14c,d, one may see that the locating and estimating precisions were significantly affected by the sensing distances, denoted by the bars in different colors. This means that the density of the point clouds was one of the key factors in the performance. Intuitively speaking, the more abundant RGBD pixel information on the fruit improved the localization accuracy, which can be seen in Figure 15. A nearer sensing distance offered higher accuracy at the cost of sensing range, and this trade-off has to be made. Besides, Figures 16 and 17 demonstrate some experimental results of localization and approaching vector estimation. In Figure 17, the sphere-fitting-based methods, including RANSAC [30,31], the 3D descriptor (Color-FPFH) [32], and the 3D Hough transform [33], failed to generate accurate approach vectors because they failed to accurately extract the position of the fruits' surface. In contrast, the success rate of detachment with the proposed method increased significantly. This demonstrates that the approach vectors derived by the proposed method achieve better robustness on point clouds lacking an ideal spherical fruit surface.

Discussion
In the experiments in Figures 14 and 15, it can be observed that the proposed method showed good accuracy of localization in different distances. In the 3D-bounding-box-based method, the minimum and maximum positions on each axis of all points inside the 3D bounding box were employed to construct the edges. As shown in Figure 16, the front face of the cube had a large offset on the z-axis due to the leaf. This illustrates that the 3D-bounding-box-based method is sensitive to the outliers of the fruits' point clouds.
In our experiments, it was found that fruit location accuracy also depended on the accuracy of central line extraction, which is based on the 2D bounding boxes and 2D masks. Namely, the detection accuracy influenced the centroid positioning in the XOY-plane, while the mask affected the performance on the z-axis. Consequently, the accuracy of detection and segmentation on 2D images is still a prerequisite. Some excellent studies provide potential solutions to this issue. For instance, neural rendering [37] and the Cycle-Generative Adversarial Network (C-GAN) [38] have been employed to restore occluded or unseen plant organs for robotic phenotypic data collection and automatic fruit yield estimation. Besides, a symmetry-based 3D-shape-completion method [39] has been successfully applied to locate strawberries from the incomplete visible parts of the targets and showed good effectiveness. Fully solving the problem of occluded fruit detection is expected to require a combination of such techniques, which is also our future research interest.
Moreover, the experimental results showed that the filling rate of the depth data degraded under back lighting conditions, due to the limitations of RGBD sensor performance. With the proposed method, the localization accuracy under good illumination conditions outperformed that under back lighting. In practical applications, the outdoor filling rate of the RGBD image is still the main limitation, to be addressed in the future by the development of sensor technology. On the other hand, the improvement obtained in this paper concerns the precision of localization and the 3D approaching pose. This improvement came at the cost of processing rate, compared with conventional image-segmentation-based fruit localization algorithms, which usually run at 25 FPS. However, compared with a harvesting cycle (around 6 s in our case), such a delay is negligible, and the success rate matters far more to the overall harvesting time than this tiny delay.

Conclusions
This paper investigated the problem of apple fruit localization for harvesting robots under occluded conditions. A network for apple fruit instance segmentation and a frustum-based processing pipeline for point clouds generated from RGBD images were proposed. To segment the fruit from occluding objects under partial occlusion, YOLACT++ (ResNet-101) was used to improve the accuracy of bounding box detection and mask estimation. To address the low fill rate and noise of the depth image in outdoor environments, we constructed a frustum from the bounding box and extracted the central line of the fruit. After that, the segmented fruit masks on the RGBD images were used to generate the fruit point clouds. Based on the central line and the fruit point cloud, DBSCAN was used to select the points belonging to the target fruit in the distorted and fragmentary point clouds. The fruit's radius and central position were then estimated from the point of intersection accordingly. Moreover, the proposed method provides the approaching direction vector for robots, calculated from the center of the fruit to the mass center of the obtained cluster.
The experimental results in orchards showed that the proposed method improved the performance of fruits' localization in the case of partial occlusion. The superiority of the proposed method can be summarized as follows: (a) robustness to the low fill rate and noise of the depth map in an outdoor environment; (b) accuracy when a partial occlusion exists; (c) providing the approaching direction for the reference of the robotic gripper to pick. According to 300 testing samples, the values of the center locations' median error, mean error, and standard error could be considerably reduced by 59%, 43%, and 9% in total with the proposed method, and the approaching direction vectors could be correctly estimated.