1. Introduction
At present, logistics transportation handles large numbers of express packages of diverse types that are stacked in a disorderly manner, which forms a complex logistics sorting background. As a result, it is difficult to improve the sorting efficiency of express packages in this scene, which greatly slows the progress of logistics transportation. With the rapid development of deep learning theory, object detection and recognition technology based on machine vision has gradually been applied to the logistics industry [1,2], but there are still deficiencies in the detection and grasping of express packages under complex logistics sorting, which makes sorting one of the main weaknesses in the development of the logistics industry at the present stage.
Automatic sorting of express packages [3] has become a mainstream solution in logistics transportation: object detection technology is applied to obtain information such as position, category, segmentation mask [4] and posture [5], and the intelligent sorting robot then grabs packages accurately by locating and tracking them [6,7]. Traditional object detection technologies [8] such as key point detection, Histogram of Oriented Gradients [9] and Scale-Invariant Feature Transform [10] are not suitable for detection in complex scenes such as those with shadows [11,12] or on blurred images [13] due to poor generalization and slow execution speed. Object detection algorithms based on Convolutional Neural Networks (CNNs) achieve higher detection accuracy and are gradually being applied in practice [14,15]; they are divided into two-stage algorithms and one-stage algorithms. Two-stage algorithms such as Region-CNN (R-CNN) [16], Fast R-CNN [17], Faster R-CNN [18] and Mask R-CNN [19] have better detection accuracy but take longer inference time and are weaker in real-time detection than one-stage algorithms represented by SSD (Single Shot Multibox Detector) [20], YOLO (You Only Look Once) [21] and RetinaNet [22]. Compared with the traditional ones, object detection algorithms based on CNNs can extract the features of the target more effectively and adapt to detection tasks in specific scenes such as small target detection [23,24], obscured target detection [25] and multi-target detection [26].
In terms of logistics transportation, early studies mainly focused on the application of object detection and recognition. For example, Hwang et al. [27] applied an object detection algorithm to the recognition of goods to realize the automatic loading of trucks, and Gou et al. [28] studied a dataset image synthesis method for deep-learning-based loading and unloading of cartons, so as to improve target recognition. The objects in the above research were cartons with regular appearances, which are detected and recognized more easily than multiple types of express packages in the complex logistics sorting scene. In terms of visual sorting, Zuo et al. [29] studied the location detection of targets in a scene stacked with objects by combining machine vision and a deep learning algorithm, and then controlled the sorting robot to grasp them. Han et al. [30] proposed a visual sorting method based on multi-modal information fusion to improve the object detection and grasping accuracy of the manipulator. Both of these approaches aim to solve the problem of object grasping in complex scenes, but the targets are of a single type with regular shapes and the experimental background is simple and idealized, which differs greatly from a real complex sorting scene. In terms of determining the optimal grasping position of the target, Han et al. [31,32] proposed a robot sorting method based on a deep neural network in which the geometric center is calculated from four key points and determines the final grasping position. However, this method has limitations for objects with irregular shapes and uneven surfaces. Moreover, as the actual logistics sorting scene is quite complex and changeable, the methods above can hardly satisfy the requirements of a real express package sorting operation.
In the actual logistics transportation sorting scene, the background of express package detection and recognition is more complex and is restricted by various factors, which can be mainly divided into external environment factors and factors of the express packages themselves. Ambient light is one of the external environment factors that influence the detection of the target [33,34,35]. For example, strong light makes the package surface reflective and overexposed, while uneven lighting conditions lead to large areas of shadow. These external influences make the texture features of the package surface fade, blur, disappear or be confused with the background, which affects the detection and recognition of the target and reduces the detection accuracy and image segmentation quality. Poor lighting conditions also affect the RGB-D camera’s extraction of target depth information, and in turn the generation and transformation of 3D point cloud data [36,37,38]. In addition to external environment factors, the target to be detected also has a great impact on object detection and instance segmentation [39,40,41]. In actual logistics transfer center scenes, there are a large number of packages with different shapes, colors and materials stacked in a disorderly manner. Some of them are similar in appearance, such as shape and color, and are therefore difficult to distinguish, or are made of the same material and overlap or obscure one another. Moreover, because of their special materials, some packages tend to reflect light, image unclearly and appear seriously deformed, which makes them difficult to identify. Furthermore, these packages may be densely, unevenly or sparsely distributed. In general, under the combined influence of these two kinds of adverse factors, a complex logistics sorting background is formed, which is quite different from those of previous studies.
Although the studies above proposed effective methods for their respective problems, they ignored the influence of diverse targets and backgrounds, which leads to disadvantages in detection and sorting under complex logistics sorting. In order to improve the sorting efficiency of express packages, a multi-dimensional fusion method (MDFM) for visual sorting is proposed that is suitable for diverse types of packages in complex logistics sorting scenes. Mask R-CNN is applied to the 2D detection task, and its segmentation mask is combined with 3D point clouds to determine the sorting vector and the optimal grasping position of the express package in real time. Lastly, a robot sorting experiment is carried out to verify the improvement in sorting efficiency achieved by the proposed method. It is hoped that the MDFM can improve the efficiency of logistics sorting and promote the development of the logistics industry.
2. Method
Most express packages have uneven surfaces, especially easily deformed packages such as bags, and they are often stacked and overlapped in a disorderly manner. The traditional method, which estimates the pose and determines the grasping position of packages based on the point cloud alone, is therefore difficult to apply to the complex logistics sorting scene. To this end, the multi-dimensional fusion method is proposed, in which Mask R-CNN is adopted and 3D point cloud data is used. Its overall framework is shown in Figure 1.
In MDFM, Mask R-CNN is applied to detect and recognize different kinds of express packages, obtaining the category and instance segmentation of each package. Point cloud data filtering is designed to acquire accurate point clouds of the package grasping surface: it uses the boundary information of the 2D instance segmentation generated by Mask R-CNN to accurately filter the 3D point clouds of the grasping surface, which reduces the interference of non-grasping surfaces and other packages on the point cloud extraction. Then, the ordinary least squares method is used to quickly fit the point clouds to a virtual plane, and the normal vector of the plane is obtained to determine the sorting vector of the package. Finally, the geometric center of the original surface is mapped to the fitting plane, where the final optimal grasping position is located.
2.1. Detection on Express Packages
Mask R-CNN is applied to the 2D detection task in MDFM because of its versatility in detection and its adaptability to complex scenes: it carries out category classification, bounding box regression and instance segmentation together, which makes it practical for detection tasks in complex backgrounds.
Affected by the complexity of the actual logistics sorting scene, the accuracy of the one-stage object detection algorithm is lower than that of the two-stage object detection algorithm. Compared with other two-stage target detection algorithms, Mask R-CNN can detect and recognize targets and segment instances more precisely at the same time, dividing individual package units accurately, which is more conducive to the automatic sorting of express packages.
2.2. Method for Point Cloud Data Filtering
After using the Mask R-CNN to accurately process the express packages, information such as the type and quantity of packages to be sorted and the boundary of the segmentation mask can be obtained at the 2D level. Combined with the 3D information such as the coordinate position, pose and grasping position of the package, the sorting robot can be applied for accurate and fast automatic operation.
Data filtering refers to the targeted filtering of 3D point clouds collected by the RGB-D camera, which is generally divided into two parts: the point cloud filtration of all express packages to be sorted, and the point cloud filtration of each express package grasping surface. Due to the impact of complex logistics sorting backgrounds, point clouds of express packages collected by the RGB-D camera often contain other interference factors, such as the conveyor belt, sorting table or even irrelevant packages outside the sorting range. By setting a range threshold of filtration, the point clouds of other objects outside the detection and sorting range are eliminated, and only the point clouds of express packages to be sorted will be retained, which also prepares for the next step of combining boundary information to filter point clouds of the grasping surface.
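The range-threshold step described above can be sketched as a simple pass-through filter. The snippet below is a minimal illustration, not the implementation used in this work: the function name `filter_by_range`, the axis-aligned bounds and the sample coordinates (in metres, camera frame) are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]
Bounds = Tuple[float, float]


def filter_by_range(points: Iterable[Point],
                    x_range: Bounds,
                    y_range: Bounds,
                    z_range: Bounds) -> List[Point]:
    """Keep only the points that lie inside the axis-aligned sorting range."""
    kept = []
    for x, y, z in points:
        if (x_range[0] <= x <= x_range[1]
                and y_range[0] <= y <= y_range[1]
                and z_range[0] <= z <= z_range[1]):
            kept.append((x, y, z))
    return kept


# Hypothetical scene: two points on a package, one on a package outside the
# sorting range, and one on the sorting table behind the packages.
cloud = [
    (0.10, 0.05, 0.80),  # package surface
    (0.12, 0.07, 0.82),  # package surface
    (0.90, 0.05, 0.80),  # irrelevant package outside the range (x too large)
    (0.10, 0.05, 1.50),  # sorting table (z too large)
]

workspace = filter_by_range(cloud,
                            x_range=(-0.5, 0.5),
                            y_range=(-0.5, 0.5),
                            z_range=(0.3, 1.0))
print(workspace)
```

In practice the bounds would be chosen from the geometry of the sorting station, and a vectorized array library would normally replace the explicit loop for full-resolution point clouds.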
As shown in Figure 2, the segmentation mask of the grasping surface can be accurately generated for a single express package through the instance segmentation of Mask R-CNN, and the boundary contour of the mask can be drawn on the RGB image. Next, the RGB image is aligned with the depth image, the boundary contour is used to delimit the grasping surface of the package to be sorted on the depth map, and this part of the depth information is then converted into the corresponding point clouds. In this way, the accurate filtration of the grasping surface is realized, which also reflects the combination of visual information and 3D information. In addition, the RGB-D camera must be calibrated before the detection task to avoid the influence of camera distortion.
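The conversion from masked depth pixels to grasping-surface points can be illustrated with the standard pinhole camera model. The sketch below is only an assumption about how this step might look: it presumes the depth map is already aligned to the RGB image, that a depth of zero marks a missing measurement, and that the intrinsics `fx`, `fy`, `cx`, `cy` come from camera calibration. The function name `mask_to_points` and the tiny 3×3 example are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def mask_to_points(depth: Sequence[Sequence[float]],
                   mask: Sequence[Sequence[int]],
                   fx: float, fy: float,
                   cx: float, cy: float) -> List[Point]:
    """Back-project the depth pixels selected by the mask into 3D points.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, where (u, v)
    is the pixel column/row and Z is the depth value at that pixel.
    """
    points = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, inside) in enumerate(zip(depth_row, mask_row)):
            # Skip pixels outside the segmentation mask or without valid depth.
            if inside and z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Hypothetical 3x3 aligned depth map (metres) and binary segmentation mask.
depth = [
    [0.0, 1.0, 1.0],
    [1.0, 2.0, 2.0],
    [1.0, 2.0, 0.0],  # bottom-right pixel has no depth reading
]
mask = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
]

surface = mask_to_points(depth, mask, fx=2.0, fy=2.0, cx=1.0, cy=1.0)
print(surface)
```

Pixels with missing depth inside the mask are simply dropped here, so the resulting point set can have holes.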
2.3. Plane Fitting and Sorting Information
In order to improve the efficiency of logistics sorting, it is necessary to provide accurate sorting information such as the grasping position and sorting vector of packages. Plane fitting is therefore carried out on the point clouds filtered from the object grasping surface, and the position of the plane center and the normal vector are then calculated. Suppose the set of the grasping surface points is $P$; then the set $P$ can be represented by Formula (1).
$$P = \{\, p_i = (x_i, y_i, z_i) \mid i = 1, 2, \ldots, n \,\} \tag{1}$$
where $p_i$ represents the $i$th point in the set $P$, $(x_i, y_i, z_i)$ represents the coordinate of the $i$th point in the set $P$, and $n$ represents the number of points composing the grasping surface.
Given that the points in the set P represent n discrete points in the grasping surface, the ordinary least squares method is used to fit them into a new plane. The calculations in detail are shown as follows.
The general equation of a plane can be expressed as:
$$aX + bY + cZ + d = 0 \tag{2}$$
where $X$, $Y$, and $Z$ represent the values on the $x$, $y$, and $z$ axes, respectively, of any point on the plane, and $a$, $b$, $c$, and $d$ represent arbitrary constants.
Supposing that $c \neq 0$ and letting $A = -a/c$, $B = -b/c$, $C = -d/c$, the fitting plane can be expressed as:
$$Z = AX + BY + C \tag{3}$$
If there are $m$ ($m \le n$) points in set $P$, then according to the principle of the ordinary least squares method, the sum of the squared $z$-axis errors between the points and the corresponding points on the fitting plane is minimized, as shown in Formula (4).
$$S = \sum_{i=1}^{m} \left( A x_i + B y_i + C - z_i \right)^2 \rightarrow \min \tag{4}$$
where $m$ represents the number of points that satisfy Equation (4), and $z_i$ represents the $z$-axis value of the $i$th point.
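For reference, this minimization has a standard closed-form solution. Assuming the objective is the sum of squared $z$-axis residuals $S(A, B, C) = \sum_{i=1}^{m} (A x_i + B y_i + C - z_i)^2$ for the plane $Z = AX + BY + C$, setting its partial derivatives to zero gives the normal equations below. This is the textbook derivation and is not necessarily how the solution is computed in this work.

```latex
\frac{\partial S}{\partial A} = 2\sum_{i=1}^{m} x_i \left( A x_i + B y_i + C - z_i \right) = 0, \quad
\frac{\partial S}{\partial B} = 2\sum_{i=1}^{m} y_i \left( A x_i + B y_i + C - z_i \right) = 0, \quad
\frac{\partial S}{\partial C} = 2\sum_{i=1}^{m} \left( A x_i + B y_i + C - z_i \right) = 0

\begin{bmatrix}
\sum x_i^2   & \sum x_i y_i & \sum x_i \\
\sum x_i y_i & \sum y_i^2   & \sum y_i \\
\sum x_i     & \sum y_i     & m
\end{bmatrix}
\begin{bmatrix} A \\ B \\ C \end{bmatrix}
=
\begin{bmatrix} \sum x_i z_i \\ \sum y_i z_i \\ \sum z_i \end{bmatrix}
```

All sums run over $i = 1, \ldots, m$. The system has a unique solution when the projections of the points onto the $x$-$y$ plane are not all collinear, which requires at least three points.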
After solving the ordinary least squares problem expressed by Formula (4), the unknown quantities $A$, $B$ and $C$ can be worked out. The expression of the fitting plane can then be obtained, and the normal vector of the fitting plane is:
$$\mathbf{N} = (A, B, -1)$$
After obtaining the expression of the fitting plane, the optimal grasping position on the plane can be calculated and determined, as shown in the following formulas:
$$x_g = x_0, \quad y_g = y_0, \quad z_g = A x_0 + B y_0 + C$$
where $(x_0, y_0, z_0)$ represents the coordinate of the point which is the geometric center of the original surface, and $(x_g, y_g, z_g)$ represents the coordinate of the point on the fitting plane mapped from $(x_0, y_0, z_0)$ by Expression (3), namely the optimal grasping position of this plane.
The 3D point clouds of the grasping surface are accurately filtered through the boundary information of 2D instance segmentation, and the virtual plane is quickly fitted to determine the sorting vector and the optimal grasping position, as shown in Figure 3. Plane fitting not only improves the grasping accuracy for express packages in a disordered distribution, but also, to a certain extent, reduces the impact of adverse lighting conditions in complex scenes, which cause point clouds to be missing on the package surface, and thus improves the overall sorting efficiency.
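The plane-fitting step of this section can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the implementation used in this work: the plane is written as z = A·x + B·y + C, the least-squares normal equations are solved with Cramer's rule, the sorting vector is taken as the unit vector along (A, B, -1), and the geometric center is approximated by the centroid of the filtered points. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points: Sequence[Point]) -> Tuple[float, float, float]:
    """Least-squares fit of z = A*x + B*y + C; returns (A, B, C)."""
    m = float(len(points))
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)

    # Normal equations obtained by zeroing the partial derivatives of the
    # sum of squared z-axis residuals.
    matrix = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, m]]
    rhs = [sxz, syz, sz]
    d = det3(matrix)
    if abs(d) < 1e-12:
        raise ValueError("points do not define a unique plane")

    coeffs: List[float] = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / d)
    return coeffs[0], coeffs[1], coeffs[2]


def sorting_vector(a: float, b: float) -> Tuple[float, float, float]:
    """Unit normal of z = A*x + B*y + C, oriented along (A, B, -1)."""
    norm = math.sqrt(a * a + b * b + 1.0)
    return a / norm, b / norm, -1.0 / norm


def grasp_position(points: Sequence[Point],
                   a: float, b: float, c: float) -> Point:
    """Map the centroid's (x, y) onto the fitted plane (assumed center)."""
    n = len(points)
    x0 = sum(p[0] for p in points) / n
    y0 = sum(p[1] for p in points) / n
    return x0, y0, a * x0 + b * y0 + c


# Hypothetical points lying exactly on the plane z = 2x + 3y + 1.
samples = [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (0.0, 1.0, 4.0), (1.0, 1.0, 6.0)]
A, B, C = fit_plane(samples)
normal = sorting_vector(A, B)
grasp = grasp_position(samples, A, B, C)
print(A, B, C)
print(normal)
print(grasp)
```

Whether the normal should point along (A, B, -1) or (-A, -B, 1) depends on the camera and robot coordinate conventions, so the sign would need to be checked against the actual setup.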
4. Conclusions
In this research, a new multi-dimensional fusion method (MDFM) for visual sorting of express packages under actual complex logistics sorting is proposed, in which Mask R-CNN is adopted and 3D point cloud data is used. Firstly, express package images under the background of complex logistics sorting are collected, and a dataset is built. Secondly, Mask R-CNN is evaluated and applied to the 2D detection task. Then, the point cloud data is filtered and a virtual grasping surface is fitted, after which accurate sorting information, including the sorting vector and the optimal grasping position of express packages, is worked out. Finally, robot sorting experiments are carried out. The main conclusions are as follows:
- (1) Mask R-CNN was evaluated for detection accuracy; it achieves higher precision in object detection and has advantages in instance segmentation compared with previous classical object detection algorithms. The results show that Mask R-CNN can provide accurate detection information in MDFM.
- (2) Based on accurate detection results, combined with a precise sorting vector and optimal grasping position, the sorting success rate of the MDFM reaches 97.2%, proving the stability and applicability of the proposed sorting method.
- (3) The method is conducive to improving the sorting efficiency of express packages under complex logistics sorting, and provides technical conditions for realizing comprehensive automation and high efficiency of sorting in complex scenes, which has important application value.
Although the MDFM proposed in this research improves the sorting efficiency of express packages, the actual logistics sorting scene will become more complex with the development of the logistics industry. Future work will further improve the method’s detection precision for express packages and its adaptability to datasets from other complex logistics sorting scenes.