Article

Multiple-Object Grasping Using a Multiple-Suction-Cup Vacuum Gripper in Cluttered Scenes

1 Corporate Manufacturing Engineering Center, Toshiba Corporation, Yokohama 235-0017, Japan
2 Corporate Research & Development Center, Toshiba Corporation, Kawasaki 212-8582, Japan
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Robotics 2024, 13(6), 85; https://doi.org/10.3390/robotics13060085
Submission received: 23 April 2024 / Revised: 22 May 2024 / Accepted: 23 May 2024 / Published: 27 May 2024
(This article belongs to the Special Issue Advanced Grasping and Motion Control Solutions, Edition II)

Abstract

Multiple-suction-cup grasping can improve the efficiency of bin picking in cluttered scenes. In this paper, we propose a grasp planner for a vacuum gripper to use multiple suction cups to simultaneously grasp multiple objects or an object with a large surface. To address the challenge of determining where to grasp and which cups to activate when grasping, we used 3D convolution to convolve the affordable areas inferred by a neural network with the gripper kernel in order to find graspable positions for sampled gripper orientations. The kernel used for 3D convolution in this work encodes cup ID information, which makes it possible to directly determine which cups to activate by decoding the convolution results. Furthermore, a sorting algorithm is proposed to determine the optimal grasp among the candidates. Our planner exhibited good generality and successfully found multiple-cup grasps in previous affordance map datasets. Our planner also exhibited improved picking efficiency using multiple suction cups in physical robot-picking experiments. Compared with single-object (single-cup) grasping, multiple-cup grasping contributed to 1.45×, 1.65×, and 1.16× increases in efficiency for picking boxes, fruits, and daily necessities, respectively.

1. Introduction

With the growth of e-commerce, demand for the automation of bin picking by robots in warehouses has become high [1], particularly in Japan, where the country faces a labor shortage due to its aging society. COVID-19 made the situation worse, since picking tasks in warehouses are not amenable to telework. Most state-of-the-art robotic picking systems focus on single-object grasping. To further improve the efficiency of these systems, simultaneous grasping of multiple objects might reduce the number of pick attempts and improve the picking speed, as shown in Figure 1A. In addition, a robot can more stably grasp and hold an object with a large surface by using multiple suction cups, as demonstrated in Figure 1B.
Multiple-object grasp planning for jaw or multi-finger grippers has previously been proposed under various conditions, such as in well-organized scenes [2,3], rearranged scenes [4], and cluttered scenes [5,6,7,8]. These studies demonstrated that multiple-object grasping could improve picking speed. However, few studies have examined multiple-object grasping using a vacuum gripper with multiple suction cups. Most studies infer the single-object grasp point for a gripper with only a single suction cup using direct or indirect methods. Direct methods [9,10,11] use deep convolutional neural networks to directly infer the grasp point, while indirect methods [12,13,14] first infer the affordance map, which is a pixel-wise map indicating the graspability score for a single-cup vacuum gripper at each pixel, and then find the optimal grasp point in the map. Given that the affordance map contains all possible grasp points for a single suction cup, if all cups in a vacuum gripper have the same geometry (e.g., cup radius) and dynamics (e.g., suction force limit and friction coefficient), then we can search for a gripper pose where the center positions of at least two of the cups are located at non-zero pixels in the affordance map and satisfy the conditions described in Section 4 for grasping multiple objects or an object with a large surface area.
In this study, we propose an affordance-map-based grasp planner for a multiple-suction-cup vacuum gripper to grasp multiple objects or an object with a large surface area. We propose a 3D convolution-based method, which takes advantage of the suction affordance map inferred by our prior work, suction graspability U-Net++ (SG-U-Net++) [14], to search for a gripper pose capable of grasping multiple objects or an object with a large surface area. Furthermore, unlike a jaw gripper, in which all fingers are usually controlled to open or close simultaneously, the suction cups need to be controlled separately. Therefore, we design a kernel that includes an encoded cup ID to determine which suction cup to activate. In addition, as there might be many candidates for multiple-suction-cup grasping, we propose an evaluation metric for determining the optimal grasp among the candidates. The proposed grasp planner is validated using previous affordance datasets and physical robot experiments.
In short, the contributions of this work include the following:
  • A 3D-convolution-based grasp planner for a vacuum gripper with multiple cups to grasp multiple objects or an object with a large surface area.
  • Control of suction cup activation by incorporating a convolution kernel, including the encoded cup ID.
  • A robotic picking system with a hybrid planner that performs multiple-suction-cup grasp planning preferentially and switches to single-object grasp planning when there are no solutions.
  • A sorting algorithm for determining the optimal grasp for multiple-cup grasping.
  • Validation of the grasp planner on previous affordance datasets, including Suction FCN [12], SuctionNet-1Billion [13], and SG-U-Net++ [14].
  • Experiments for picking boxes, fruits, and daily necessities with a vacuum gripper with two cups and a comparison of multiple- and single-cup grasping results.

2. Related Works

2.1. Single-Object Grasping Based on an Affordance Map

A pixel-wise affordance map contains the grasp quality at each pixel when the robot grasps the object in the corresponding pose. Unlike end-to-end deep learning, which is used to directly predict grasp configurations such as a rotated bounding box [15,16,17] for a jaw gripper or a suction point for a vacuum gripper with only a single cup [9,10,11], affordance learning has the advantages that the neural network model can be anchor-free and there is no need to sample candidate grasps as in [18]. Zeng et al. [12] were one of the first research groups to apply pixel-wise affordance learning to bin picking, covering four motion primitives of a hybrid robotic hand with both a jaw and a single suction cup. They used a manually annotated affordance dataset to train fully convolutional networks (FCNs). The precision and generalizability of the FCNs were then further improved by [19,20]. Another representative work is by Morrison et al. [21], who generated affordance and pose maps from the rotated bounding box and designed a generative grasp CNN (GG-CNN) to directly infer the pixel-wise grasp pose and quality. To learn the grasp of a jaw gripper, many researchers [17,22,23,24,25] later used similar methods to generate affordance map datasets from grasp configuration annotations represented by a rotated rectangle (e.g., the Cornell Grasp Dataset [15] and the Jacquard Dataset [26]).
However, these studies required real images and an expert to perform pixel-wise grasp affordance annotation. To reduce dataset generation costs, later work generated datasets in a physical simulator, where affordance is evaluated by applying a designed contact model (e.g., the quasi-static spring model used in Dex-net [27]) to previously synthesized images. Recently, a similar contact model was used by Cao et al. [13] to generate a larger pixel-wise suction affordance (seal score) dataset. However, these studies required the contact model parameters to be determined properly and considered only a vacuum gripper with a single cup.
In the present study, we used our previously proposed SG-U-Net++ [14] to infer the pixel-wise suction affordance map. SG-U-Net++ was trained on a synthesized dataset annotated by an analysis-based method and was competitive with the method trained on a dataset annotated by a contact model. We propose a grasp planner for multiple-suction-cup grasping that takes advantage of the predicted affordance map.

2.2. Multiple-Object Grasping

Most studies have focused on multiple-object grasping using a multi-finger gripper. Grasp conditions have been analyzed for a multi-finger gripper to stably grasp multiple cylinders [28,29], polyhedral objects [2], planar objects [3], and spatial objects [30]. Recent studies have started to use data-driven methods to deal with the multiple-object grasping problem. Chen et al. [5] used a deep neural network to infer the number of objects a three-finger gripper would grasp when digging into a pile of objects. They later proposed a Markov-decision-based method to optimize pick-transfer routines when grasping multiple objects [6]. Sakamoto et al. [4] used Mask R-CNN to detect objects and then searched for a gripper pose to push two boxes together in order to grasp them simultaneously. A similar push–grasp task was studied by Agboh et al. for grasping multiple arbitrary convex polygonal objects under frictional and frictionless contact conditions between objects [7,8]. They proposed MOG-Net to infer the maximum number of objects that a two-finger gripper could grasp from a sampled pose. However, the sets of objects in these studies were still simple, and simultaneous grasping of objects with more complicated shapes (e.g., daily necessities) is required for more general applications (e.g., picking in warehouses). Mucchiani et al. [31] designed a novel end-effector to sequentially grasp multiple objects with complicated shapes. Similarly, Yao et al. [32] proposed a human-like grasp synthesis algorithm to achieve sequential multiple-object grasping. However, these studies grasped multiple objects sequentially rather than simultaneously.
For multiple-suction-cup grasping, most studies have focused on grasping a single object rather than multiple objects using multiple cups. Mantriota [33] analyzed the suction force and friction coefficient required to grasp and hold a large object with a four-cup vacuum gripper. Kozák et al. [34] used a deep neural network to estimate the pose of a round part and then used a six-cup vacuum gripper to grasp it. Tanaka et al. [35] designed a two-surface vacuum gripper in which each surface was equipped with multiple cups. They used the gripper to simultaneously suck two surfaces of a large box to improve the stability of grasping and holding. Leitner et al. [36] used a gripper with two differently shaped cups to grasp an object on a shelf. These studies used a multiple-cup vacuum gripper to grasp a single object more stably. Kessens et al. [37] mounted a four-cup vacuum gripper on a drone to achieve sequential multiple-object grasping in the air, but found that simultaneous grasping was challenging. Islam et al. [38] proposed a planner for an unloading task in which the robot used a multiple-suction-cup vacuum gripper to simultaneously grasp and unload multiple cardboard boxes, but it was difficult to apply the planner to objects with complicated shapes in a cluttered scene. To our knowledge, the present study is the first to propose a grasp planner that simultaneously grasps multiple objects using a multiple-suction-cup vacuum gripper. The proposed planner can also determine gripper poses to stably grasp large objects with multiple cups.

3. Problem Statement

This study focuses on a bin-picking task in cluttered scenes. The robot is required to pick multiple objects or an object with a large surface area using multiple suction cups and then place them in a tote.

3.1. Assumption

We assume a suction vacuum gripper with multiple suction cups where all cups have the same specifications (e.g., the right side of Figure 2, in which both cups have the same shape, size, and suction force limits). In addition, the gripper tool center point (TCP) and all cups are in the same plane (e.g., the left side of Figure 2, in which the cup center points and TCP are in the same blue plane).

3.2. Vacuum Gripper State

The vacuum gripper state G consists of the gripper position P, orientation O, suction cup center positions C, and cup activation mode A, as shown in Equation (1). P is the position (x_g, y_g, z_g) of the TCP in world coordinates. O is the orientation represented by a ZYZ rotation matrix (R_z(θ_g) R_y(φ_g) R_z(γ_g)), where θ_g and φ_g are the azimuthal and polar angles of the unit vector of the gripper z-axis, respectively, and γ_g is the rotation angle around the gripper z-axis. Note that sin and cos are abbreviated to s and c in the matrix. C consists of the center position (x_ci, y_ci, z_ci) of each suction cup in world coordinates. C_0 consists of the center position (x_ci^0, y_ci^0, z_ci^0) of each suction cup in local gripper coordinates (see the right side of Figure 2). A is a binary vector representing the activation status a_ci of each suction cup, where a_ci is 1 if the ith cup is activated and 0 if it is disabled.
$$
\begin{aligned}
G &= [P, O, C, A],\\
P &= [x_g, y_g, z_g],\\
O &= R_z(\theta_g)\, R_y(\phi_g)\, R_z(\gamma_g)
  = \begin{bmatrix}
      c\theta_g c\phi_g c\gamma_g - s\theta_g s\gamma_g & -c\theta_g c\phi_g s\gamma_g - s\theta_g c\gamma_g & c\theta_g s\phi_g \\
      s\theta_g c\phi_g c\gamma_g + c\theta_g s\gamma_g & -s\theta_g c\phi_g s\gamma_g + c\theta_g c\gamma_g & s\theta_g s\phi_g \\
      -s\phi_g c\gamma_g & s\phi_g s\gamma_g & c\phi_g
    \end{bmatrix},\\
C &= [[x_{c1}, y_{c1}, z_{c1}], [x_{c2}, y_{c2}, z_{c2}], \ldots, [x_{ci}, y_{ci}, z_{ci}]],\\
A &= [a_{c1}, a_{c2}, \ldots, a_{ci}].
\end{aligned}
\tag{1}
$$
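For readers who want a concrete picture of the gripper state, the following Python/NumPy sketch (our own illustration, not the authors' code; the helper names and example values are assumptions) assembles G = [P, O, C, A] for a two-cup gripper using the ZYZ convention of Equation (1).

```python
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def gripper_state(p, theta_g, phi_g, gamma_g, cup_offsets_local, activation):
    """Assemble G = [P, O, C, A] for a multi-cup vacuum gripper.

    p: TCP position in world coordinates, shape (3,)
    cup_offsets_local: C_0, cup centers in local gripper coordinates, shape (n_cups, 3)
    activation: binary activation vector A, shape (n_cups,)
    """
    O = rot_z(theta_g) @ rot_y(phi_g) @ rot_z(gamma_g)        # ZYZ orientation
    C = (O @ np.asarray(cup_offsets_local, dtype=float).T).T + np.asarray(p)  # cup centers in world coords
    return {"P": np.asarray(p), "O": O, "C": C, "A": np.asarray(activation)}

# Example: two cups spaced 5 cm either side of the TCP along local x, gripper z pointing down
G = gripper_state(p=[0.4, 0.0, 0.3], theta_g=0.0, phi_g=np.pi, gamma_g=0.0,
                  cup_offsets_local=[[0.05, 0.0, 0.0], [-0.05, 0.0, 0.0]],
                  activation=[1, 1])
```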

4. Conditions for Grasping Using Multiple Suction Cups

Since all suction cups installed in the gripper are assumed to be the same, the affordance map of each suction cup is the same. Hence, we can determine a gripper pose capable of grasping multiple objects or an object with a large surface area by multiple cups if the following conditions are satisfied (an example is shown in Figure 3).
Condition 1. 
At least two of the contact points are located in affordable areas on the object. If the contact points are located in affordable areas on different objects, the gripper can grasp multiple objects. If the contact points are located in the same affordable area, the gripper can grasp a large surface area using multiple cups.
Condition 2. 
The gripper TCP and all contact points located in affordable areas are in the same plane, which is perpendicular to the unit vector of the gripper's z-axis (n_g).
Condition 3. 
The normals of all contact points located in affordable areas (n_cpi) are in the same direction as the unit vector of the gripper's z-axis (n_g), as shown in Equation (2). Note that aff_cpi > 0 indicates that the ith contact point is located in an affordable area, where its affordance score is non-zero.

$$ \arccos(n_{cp_i} \cdot n_g) < \varepsilon_1 \quad \text{where} \quad \mathit{aff}_{cp_i} > 0. \tag{2} $$
Condition 4. 
The distance from each contact point located in an affordable area to the TCP in world coordinates (d_cpi) needs to be equal to the distance from the corresponding cup center to the TCP in local coordinates (d_ci^0 in Figure 2).

$$ \left| d_{cp_i} - d_{c_i^0} \right| < \varepsilon_2 \quad \text{where} \quad \mathit{aff}_{cp_i} > 0. \tag{3} $$
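As an illustration of Conditions 3 and 4, the following Python sketch (ours; the thresholds, array layouts, and function name are assumptions rather than the authors' implementation) checks Equations (2) and (3) for a set of candidate contact points.

```python
import numpy as np

def check_contact_conditions(n_g, tcp, contact_points, contact_normals,
                             cup_dists_local, aff_scores,
                             eps1_deg=11.5, eps2=0.01):
    """Check Conditions 3 and 4 (Eqs. (2)-(3)) for contact points in affordable areas.

    n_g: unit z-axis of the gripper, shape (3,)
    contact_points / contact_normals: (n_cups, 3) arrays in world coordinates
    cup_dists_local: d_ci^0, distance from each cup center to the TCP in gripper coords
    aff_scores: affordance score at each contact point
    Returns one boolean per cup: True if that cup's contact satisfies both conditions.
    """
    ok = np.zeros(len(contact_points), dtype=bool)
    for i, (cp, n_cp, d0, aff) in enumerate(zip(contact_points, contact_normals,
                                                cup_dists_local, aff_scores)):
        if aff <= 0:                      # only contacts in affordable areas are checked
            continue
        angle = np.degrees(np.arccos(np.clip(np.dot(n_cp, n_g), -1.0, 1.0)))
        d_cp = np.linalg.norm(np.asarray(cp) - np.asarray(tcp))
        ok[i] = (angle < eps1_deg) and (abs(d_cp - d0) < eps2)   # Eq. (2) and Eq. (3)
    return ok
```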

5. Multiple-Suction-Cup Grasp Planner

5.1. Overview of Architecture

Figure 4 and Algorithm 1 show the overall architecture and workflow of our multiple-suction-cup grasp planner. Given a depth image I_d, our previous work SG-U-Net++ is used to infer the affordance map I_aff for a single cup. The voxel grid generator then extracts the point cloud (I_pcd) affiliated with the affordable areas in the map and downsamples it to a voxel grid (V). The orientation generator uses the point normals n_pcd of the extracted points to efficiently generate the gripper orientation samples (S_O). The gripper kernel generator generates 3D encoded gripper kernels (K) that include cup ID information. The decoder decodes the result (ConvRes) of the 3D convolution (3D Conv.) of V over K and generates the gripper pose candidates (G_cand). The normal direction checker removes candidates in which n_g and the contact point normals are not in the same direction. If G_cand is successfully found, G_cand is evaluated and ranked to obtain the optimal grasp (G_opt). Otherwise, if no G_cand is found, the planner switches to our previous single-object grasp planner, where the position with the highest affordance score is set as the goal and the cup that can reach the goal by the shortest trajectory is selected to grasp the object.
Algorithm 1 Multiple-suction-cup grasp planner
     Input: I_aff: affordance map
        I_d: depth image
        I_pcd: point cloud
        l: voxel size
        C_0: local cup center positions (see the right side of Figure 2)
     Output: G_opt: optimal grasp
 1: V ← GenerateVoxelGrid(I_pcd, I_aff, l)
 2: n_pcd ← EstimateNormals(I_pcd)
 3: S_O ← SampleGripperOrientation(n_pcd, I_aff)
 4: K ← GenerateEncodedKernels(S_O, C_0, l)
 5: ConvRes ← Conv3D(V, K)
 6: G_cand ← Decode(ConvRes)
 7: G_cand ← NormalDirectionCheck(G_cand)
 8: if len(G_cand) > 0 then
 9:     G_opt ← Ranking(I_aff, G_cand)
10: else
11:     # Single-object grasp planning
12:     G_opt ← argmax(I_aff)
13: end if
14: return G_opt

5.2. Affordance Map Inference

We use SG-U-Net++ from our prior work to generate the affordance map. SG-U-Net++ has a nested U-Net structure and infers pixel-wise grasp quality and approachability from a depth image. Refer to [14] for further details. Pixels with non-zero grasp quality scores are retained to form the affordance map (the green area in the affordance map in Figure 4).

5.3. Voxel Grid Generation

We use voxel downsampling to generate the binary voxel grid V of the point cloud. Points located in the affordable areas are extracted and downsampled to a voxel grid with a defined grid size l. The voxel grid is then binarized: if a cell contains more than 10 points, its value is set to 1; otherwise, it is set to 0. The voxel grid shape is N_x × N_y × N_z, where N_x = (B_x^max − B_x^min)/l, N_y = (B_y^max − B_y^min)/l, and N_z = (B_z^max − B_z^min)/l. B^max and B^min are the maximum and minimum bounds of the point cloud.
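A minimal NumPy sketch of this binarization step might look as follows; the helper name and the handling of boundary cells are our assumptions rather than the authors' implementation.

```python
import numpy as np

def generate_binary_voxel_grid(points, voxel_size=0.005, min_points=10):
    """Binarize affordable-area points into a voxel grid V (cf. Section 5.3).

    points: (N, 3) array of points already filtered to the affordable areas.
    A cell is set to 1 if it contains more than `min_points` points, else 0.
    Returns the grid and the minimum bound (needed later to map voxel indices
    back to world coordinates).
    """
    points = np.asarray(points, dtype=float)
    b_min = points.min(axis=0)
    b_max = points.max(axis=0)
    shape = np.ceil((b_max - b_min) / voxel_size).astype(int) + 1
    counts = np.zeros(shape, dtype=int)
    idx = np.floor((points - b_min) / voxel_size).astype(int)
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)   # per-cell point counts
    return (counts > min_points).astype(np.uint8), b_min
```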

5.4. Grasp Orientation Candidate Generation

To satisfy Condition 3 in Section 4, Equation (2) must be computed for each point normal in order to sample the gripper orientations. If the input point cloud is large, online sampling results in high memory and computation costs. We propose an efficient sampling method for a vacuum gripper by generating an offline normal-to-gripper-orientation map. Since the Cartesian coordinates of a given vector vec = [vec_x, vec_y, vec_z] can be represented by the azimuthal angle θ and polar angle φ as in Equation (4), all possible normals of contact points can be sampled with an angle interval Δα, as in Equation (5). Meanwhile, as in Equation (1), n_g (the last column of O) depends only on φ_g and θ_g and has the same representation as Equation (4), so n_g can be sampled with the same angle interval, as in Equation (6).

$$ \mathbf{vec} = [vec_x, vec_y, vec_z] = [c\theta\, s\phi,\; s\theta\, s\phi,\; c\phi], \quad \theta = \arctan(vec_y, vec_x), \quad \phi = \arccos(vec_z), \tag{4} $$

where θ is the azimuthal angle of the vector in the xOy plane, and φ is the angle between the vector and the z-axis. Assuming the normal always points upward, θ ∈ (−π, π] and φ ∈ [0, π/2].

$$ S_n(ii, jj) = [\, c(ii\,\Delta\alpha - \pi)\, s(jj\,\Delta\alpha),\; s(ii\,\Delta\alpha - \pi)\, s(jj\,\Delta\alpha),\; c(jj\,\Delta\alpha)\, ], \tag{5} $$

$$ S_{n_g}(ii', jj') = [\, c(ii'\,\Delta\alpha - \pi)\, s(jj'\,\Delta\alpha),\; s(ii'\,\Delta\alpha - \pi)\, s(jj'\,\Delta\alpha),\; c(jj'\,\Delta\alpha)\, ], \tag{6} $$

where ii, ii′ = 0, 1, …, π/Δα and jj, jj′ = 0, 1, …, π/(2Δα).
For each S_ncp(ii, jj), we search for all S_ng(ii′, jj′) satisfying Equation (2) in order to create a map M: (ii, jj) → (ii′, jj′), which maps a point normal entry to all n_g in the same direction as the normal vector. This map can be generated offline and only needs to be computed once, thus reducing the computation cost.
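A possible offline construction of M is sketched below in Python; the discretization ranges, the dictionary representation, and the unoptimized nested loops are our assumptions, intended only to illustrate the idea.

```python
import numpy as np

def build_normal_to_gripper_map(delta_alpha_deg=5.0, eps1_deg=11.5):
    """Offline map M: (ii, jj) of a contact-point normal -> all (ii', jj') of n_g
    whose direction is within eps1 of that normal (cf. Eq. (2) and Section 5.4).
    Angles are discretized with interval delta_alpha; built once and reused online.
    """
    da = np.radians(delta_alpha_deg)
    n_theta = int(round(2 * np.pi / da))          # azimuthal bins over (-pi, pi]
    n_phi = int(round(np.pi / 2 / da)) + 1        # polar bins over [0, pi/2]

    def unit_vec(ii, jj):
        theta, phi = ii * da - np.pi, jj * da
        return np.array([np.cos(theta) * np.sin(phi),
                         np.sin(theta) * np.sin(phi),
                         np.cos(phi)])

    eps1 = np.radians(eps1_deg)
    M = {}
    for ii in range(n_theta):
        for jj in range(n_phi):
            n_cp = unit_vec(ii, jj)
            feasible = []
            for ii_g in range(n_theta):
                for jj_g in range(n_phi):
                    n_g = unit_vec(ii_g, jj_g)
                    if np.arccos(np.clip(np.dot(n_cp, n_g), -1.0, 1.0)) < eps1:
                        feasible.append((ii_g, jj_g))
            M[(ii, jj)] = feasible
    return M
```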
Based on M, given the point normals, the feasible candidates of n_g can be rapidly obtained so that the gripper orientation samples (S_O) can be generated. Given n_pcd, the normals of points located in affordable areas are extracted, and their azimuthal and polar angles are computed (lines 1–3 in Algorithm 2). The angles are then used to calculate the entry key (ii, jj) used to query M and obtain the feasible (ii′, jj′), from which the samples S_θg and S_φg of θ_g and φ_g are obtained (lines 4–7 in Algorithm 2). Note that only the unique (ii, jj) values with the top-10% counts are used as entries. This improves the sampling efficiency when the variation in n_pcd is small. For example, if the input point cloud is a set of points in a plane, all n_pcd and the corresponding (ii, jj) are the same; hence, only one unique (ii, jj) rather than all of them is used. As n_g depends only on θ_g and φ_g, γ_g can take any value if θ_g and φ_g are feasible. Hence, γ_g is sampled with the same interval Δα in the range (−π, π] (lines 9–10 in Algorithm 2). The final S_O is obtained by multiplying the rotation matrices of the sampled S_θg, S_φg, and S_γg.
Algorithm 2 Sample gripper orientation
     Input: n_pcd: point normals
        I_aff: affordance map
        Δα: sampling interval
     Output: S_O: gripper orientation samples
 1: n ← n_pcd[I_aff > 0]
 2: θ ← arctan(n_y, n_x)
 3: φ ← arccos(n_z)
 4: ii ← ⌊(θ + π)/Δα⌋
 5: jj ← ⌊φ/Δα⌋
 6: ii, jj ← Unique(ii, jj)
 7: ii′, jj′ ← M(ii, jj)
 8: S_θg ← ii′ · Δα − π
 9: S_φg ← jj′ · Δα
10: kk ← 0, 1, …, π/Δα
11: S_γg ← kk · Δα − π
12: S_O ← R_z(S_θg) R_y(S_φg) R_z(S_γg)
13: return S_O

5.5. Gripper Orientation Kernel Generation and Suction Cup ID Encoding

A kernel representing each candidate gripper orientation generated in Section 5.4 is created and used in a 3D convolution to determine the graspable position for each S_O, as in Algorithm 3. Previous studies using 2D convolution [39] represent gripper poses with a binary kernel. However, the convolution result can then only determine the graspable position of the kernel and cannot directly determine which suction cup to activate. For example, as shown in Figure 5, although the convolution results for the four cases are the same, the suction cups chosen for activation differ and cannot be directly determined from the convolution results. Hence, we design a 3D kernel that includes the suction cup ID information. Algorithm 3 is used to generate the kernels S_K of S_O. The shape of one kernel K is N_Kx × N_Ky × N_Kz, where N_Kx = N_Ky = N_Kz = max(d_Ci0)/l. max(d_Ci0) is the maximum distance from a suction cup center to the TCP in local gripper coordinates, and l is the grid size of the kernel, which is equal to that of the voxel grid. The kernel indices of the cup centers are ⌊C/l + max(d_Ci0)/(2l)⌋, where C is the cup center position for S_O. The kernel grids at the cup center indices are filled with the encoded cup ID information, as in line 9 of Algorithm 3. Here, the ith suction cup ID is encoded as 10^{-i}, so that the cup ID is stored in the ith decimal place; this encoding makes it possible to directly obtain the target suction cups to activate by decoding the convolution results (see Section 5.7).
Algorithm 3 Generate encoded kernels
     Input: S_O: gripper orientation samples
        C_0: local cup center positions (see the right side of Figure 2)
        l: voxel size
     Output: S_K: kernels of S_O
 1: N_O ← len(S_O)
 2: S_K ← Zeros(N_O, N_Kx, N_Ky, N_Kz)
 3: for n ← 0 to N_O do
 4:     K ← Zeros(N_Kx, N_Ky, N_Kz)
 5:     for i ← 0 to N_c do
 6:         [C[i], 1] ← [S_O[n], 0; 0, 1] · [C_0[i], 1]ᵀ
 7:         C[i] ← C[i]ᵀ
 8:         # encoding
 9:         K[⌊C[i]/l + max(d_Ci0)/(2l)⌋] ← 10^{-i}
10:     end for
11:     S_K[n] ← K
12: end for
13: return S_K
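The kernel encoding can be illustrated with the following NumPy sketch (ours, not the authors' implementation); the kernel extent here is chosen to cover twice the maximum cup-to-TCP distance, and cup i (0-based) is encoded as 10^-(i+1), which is our reading of Algorithm 3 and its two-cup decoding example (0.10, 0.01, 0.11).

```python
import numpy as np

def generate_encoded_kernel(orientation, cup_offsets_local, voxel_size=0.005):
    """Build one encoded 3D kernel for a sampled gripper orientation (cf. Algorithm 3).

    orientation: 3x3 rotation matrix of the sampled gripper orientation.
    cup_offsets_local: C_0, (n_cups, 3) cup centers in local gripper coordinates.
    Cup i (0-based) is written into the kernel as 10**-(i+1), so the first cup
    occupies the first decimal place, the second cup the second, and so on.
    """
    c0 = np.asarray(cup_offsets_local, dtype=float)
    d_max = np.linalg.norm(c0, axis=1).max()                  # max(d_Ci0)
    n = int(np.ceil(2 * d_max / voxel_size)) + 1              # cube kernel of side n
    kernel = np.zeros((n, n, n))
    centers = (orientation @ c0.T).T                          # rotate cup centers
    idx = np.floor(centers / voxel_size + d_max / voxel_size).astype(int)
    for i, (ix, iy, iz) in enumerate(idx):
        kernel[ix, iy, iz] = 10.0 ** -(i + 1)                 # encoded cup ID
    return kernel
```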

5.6. Three-Dimensional Convolution

We perform 3D convolution to determine the indices in V where the gripper can grasp using multiple suction cups. Because the kernel is generated from an oriented C_0 that lies in the same plane as the TCP, the corresponding kernel indices of the cup centers and the TCP are in the same plane, which satisfies Condition 2 in Section 4. Furthermore, as the distances from the cup centers to the TCP are represented at the same scale as the voxel grid, we can slide the kernel over the voxel grid to determine the voxel grid indices where the TCP satisfies Conditions 2 and 4. Specifically, as in Equation (7), the kernel is centered at each cell of the voxel grid to calculate the convolution sum. Note that N_K is the number of kernels, which is equal to N_O.

$$ ConvRes[n, m, t, p] = \sum_{i=-\frac{N_{Kx}}{2}}^{\frac{N_{Kx}}{2}} \; \sum_{j=-\frac{N_{Ky}}{2}}^{\frac{N_{Ky}}{2}} \; \sum_{k=-\frac{N_{Kz}}{2}}^{\frac{N_{Kz}}{2}} K_n[i, j, k] \cdot V[m+i,\, t+j,\, p+k], \tag{7} $$

for n = 0, …, N_K, m = 0, …, N_Vx, t = 0, …, N_Vy, and p = 0, …, N_Vz.
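Using SciPy, the sliding operation of Equation (7) can be sketched as a correlation of the binary voxel grid with each encoded kernel; the batching over kernels and the zero-padded boundary handling are our assumptions.

```python
import numpy as np
from scipy import ndimage

def convolve_kernels(voxel_grid, kernels):
    """Slide each encoded kernel over the binary voxel grid (cf. Eq. (7)).

    voxel_grid: (Nx, Ny, Nz) binary array.
    kernels: list of (Nk, Nk, Nk) encoded kernels, one per sampled orientation.
    Returns an array of shape (len(kernels), Nx, Ny, Nz); entry [n, m, t, p] sums
    the encoded cup IDs of kernel n whose cup centers land on occupied voxels when
    the kernel center sits at voxel (m, t, p).
    """
    v = voxel_grid.astype(float)
    # Correlation (no kernel flip) matches the index arithmetic V[m+i, t+j, p+k]
    return np.stack([ndimage.correlate(v, k, mode="constant", cval=0.0) for k in kernels])
```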

5.7. Convolution Results Decoding and Normal Direction Check

Algorithm 4 shows the decode function, which decodes the 3D convolution results (ConvRes) to generate grasp candidates. As the 3D convolution sets the kernel center at each grid cell of V and accumulates the kernel values wherever the corresponding voxel grid value is non-zero (Equation (7)), the cup to be activated can be determined by reading off each decimal digit of ConvRes. As in line 6 of Algorithm 4, ConvRes is decoded to the target ith suction cup activation a_i in Equation (1) by scaling ConvRes up by 10^i and then taking the value mod 10. If a_i is 1, a contact point exists for the ith vacuum cup and it should be activated. Otherwise, there is no contact point and the cup should be disabled. For example, for the gripper with two suction cups in Figure 2, there are four (2^2) possible convolution results, 0.00, 0.10, 0.01, and 0.11, whose decoding results are [0, 0], [1, 0], [0, 1], and [1, 1], indicating non-graspable, graspable for only the first cup, graspable for only the second cup, and graspable for both cups, respectively.
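Before the formal listing below, the digit-wise decoding can be illustrated with a short sketch of ours; the rounding step, which guards against floating-point residue from summing the encoded kernel values, is an implementation assumption.

```python
import numpy as np

def decode_activation(conv_res, n_cups):
    """Recover per-cup activation from an encoded convolution result (cf. Algorithm 4).

    conv_res: scalar or array of convolution values such as 0.11, 0.10, 0.01.
    The i-th cup (1-based) was encoded as 10**-i, so its digit sits in the i-th
    decimal place of the convolution result.
    """
    conv_res = np.asarray(conv_res, dtype=float)
    digits = np.rint(conv_res * 10 ** n_cups).astype(np.int64)   # e.g. 0.11 -> 11
    activation = [(digits // 10 ** (n_cups - i)) % 10 for i in range(1, n_cups + 1)]
    return np.stack(activation, axis=-1)                         # shape (..., n_cups)

# Example for a two-cup gripper: 0.11 -> [1, 1], 0.10 -> [1, 0], 0.01 -> [0, 1]
print(decode_activation([0.11, 0.10, 0.01, 0.0], n_cups=2))
```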
Algorithm 4 Decode
     Input: ConvRes: 3D convolution results
     Output: G_cand: grasp candidates for multiple-cup suction
 1: N_K, N_Vx, N_Vy, N_Vz ← ConvRes.shape
 2: A ← Zeros(N_K, N_Vx, N_Vy, N_Vz, N_c)
 3: for Res in ConvRes do
 4:     for i ← 0 to N_c do
 5:         # decoding cup ID
 6:         A[…, i] ← ⌊10^{N_cup} · Res / 10^{N_cup − i}⌋ mod 10
 7:     end for
 8: end for
 9: validInd ← sum(A, dim = −1) ≥ 2
10: S_P ← validInd · l + B_min
11: S_O ← S_O[validInd]
12: S_C ← [S_O, S_P; 0, 1] · [C_0, 1]ᵀ
13: S_A ← A[validInd]
14: G_cand ← [S_P, S_O, S_C, S_A]
15: return G_cand
As A is a binary vector, the sum of A is the number of suction cups to be used. Therefore, we determine the indices (validInd) of V where the sum of A is greater than or equal to two (sum(A, dim = −1) ≥ 2) in order to find the voxel grid indices where multiple suction cups can be used to grasp multiple objects or an object with a large surface area. validInd is further converted to TCP positions in world coordinates (S_P in Algorithm 4), and the corresponding orientations S_O, cup center positions S_C, and target activation statuses S_A are obtained to generate the grasp candidates (G_cand), as in lines 10–14 of Algorithm 4.
The normal directions of all activated cups (a_i = 1) of G_cand are checked against Condition 3. Specifically, the point closest to the contact point of each activated cup is searched for in I_pcd, and the normal of that point is checked for whether it is in the same direction as the gripper's z-axis using Equation (2).

5.8. Ranking

Each G_cand is evaluated and ranked to determine the optimal grasp G_opt. We first perform point clustering on the points with non-zero affordance values, which are extracted from I_pcd, to generate a label map M_label, a distance map M_dist, and an orientation map M_orient, as shown in Figure 6. M_label contains the ID label of each cluster and is later used to calculate how many objects can be grasped. M_dist contains the 3D distance from each point in a cluster to the cluster center. M_orient contains the 3D orientation of each cluster. M_dist and M_orient are generated for the later evaluation of the score (J) of G_opt. The height and width of the three maps are the same as those of I_d.
Lines 3–14 of Algorithm 5 evaluate the maximum number of objects that can be grasped, maxObj, and the score J of each G in G_cand, saving the evaluation results to a dictionary (rankingRes). The image coordinates (u_c, v_c) of the cup center positions are calculated to obtain the contact point label of each cup in M_label. Note that the contact labels might not be unique. If several cups have the same contact point label, these cups can be used to grasp the same object, which has a large surface area. If all cups have different contact point labels, each cup can grasp a distinct object. Therefore, the number of unique contact labels is the maximum number of objects that can be grasped by G. J is the sum of J_center, J_var, and J_orient. J_center (denoted J_dist in Equation (8)) evaluates the distance from the cup center, or from the average of the cup centers, to the cluster center, because holding an object near its center is assumed to be more stable. As in Equation (8) and Figure 7A, J_dist is evaluated as the average of the distances from the cups to the cluster center, obtained by reading M_dist at the average cup center position (avg(v_c, u_c, contactLabel_i)) in each cluster.
$$ J_{dist} = \frac{\sum_{i=0}^{N_{contactLabel}} M_{dist}\left[\operatorname{avg}(v_c, u_c, contactLabel_i)\right]}{maxObj}. \tag{8} $$
J_var is incorporated because there are cases in which one cup is near the cluster center but another cup is far from it, and J_dist cannot distinguish such cases. J_var is used to balance the distances from the cups to the cluster center. Specifically, as in Equation (9) and Figure 7B, J_var is the variance of the M_dist values at the average cup center positions.
$$ J_{var} = \frac{\sum_{i=0}^{N_{contactLabel}} \left( M_{dist}\left[\operatorname{avg}(v_c, u_c, contactLabel_i)\right] - J_{dist} \right)^2}{maxObj}. \tag{9} $$
Algorithm 5 Ranking
     Input: I_aff: affordance map
        I_pcd: point cloud
        G_cand: grasp candidates
     Output: G_opt: optimal grasp
 1: M_label, M_dist, M_orient ← clustering(I_aff, I_pcd)
 2: rankingRes ← Dict()
 3: for G in G_cand do
 4:     P, O, C, A ← G
 5:     u_tcp, v_tcp ← getImgCoord(P)
 6:     u_c, v_c ← getImgCoord(C)
 7:     contactLabel ← unique(M_label[v_c, u_c])
 8:     maxObj ← len(contactLabel)
 9:     J_center ← calcCenterScore(M_dist, u_c, v_c, contactLabel)
10:     J_var ← calcVarScore(M_dist, u_c, v_c, contactLabel)
11:     J_orient ← calcOrientScore(M_orient, u_c, v_c, contactLabel)
12:     J ← J_center + J_var + J_orient
13:     add [maxObj, J] to rankingRes[contactLabel]
14: end for
15: rankingRes ← sort(rankingRes)
16: for Res in rankingRes do
17:     if maxObj in Res > maxObj in G_opt then
18:         G_opt ← Res
19:     else if maxObj in Res = maxObj in G_opt then
20:         if J in Res > J in G_opt then
21:             G_opt ← Res
22:         end if
23:     end if
24: end for
25: return G_opt, rankingRes
J_orient is incorporated to align the orientation of the polygon formed by the cup center positions in a cluster with the cluster orientation. Specifically, we calculate the dot product between the cluster orientation (the unit vector of its long or short axis) and the polygon orientation, as shown in Equation (10) and Figure 7C.
$$ J_{orient} = \frac{\sum_{i=0}^{N_{contactLabel}} M_{orient}[v_c, u_c] \cdot Poly_{orient}(v_c, u_c, contactLabel_i)}{maxObj}. \tag{10} $$
The grasp G and the corresponding maxObj and J are added to the dictionary using contactLabel as the key. Key-level (local) sorting is first performed to sort the stored J values (line 15 in the algorithm). Next, dictionary-level (global) sorting is performed to determine G_opt with the highest maxObj and J (lines 16–24 in the algorithm). Note that both G_opt and the sorted rankingRes are returned because, if the motion planner fails to find a trajectory to G_opt, it searches for a trajectory to another goal with high maxObj and J in rankingRes.
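A simplified sketch of the three ranking scores is given below (ours, not the authors' code); the map layouts and the polygon-orientation computation, which is exact only for a two-cup gripper, are our assumptions.

```python
import numpy as np

def ranking_scores(m_dist, m_orient, cup_uv, contact_labels):
    """Compute maxObj, J_dist, J_var, and J_orient for one candidate (cf. Eqs. (8)-(10)).

    m_dist: (H, W) map of the distance from each pixel's point to its cluster center.
    m_orient: (H, W, 2) map holding a cluster axis unit vector at each pixel.
    cup_uv: (n_cups, 2) image coordinates (u, v) of the activated cup centers.
    contact_labels: (n_cups,) cluster label hit by each cup.
    """
    cup_uv = np.asarray(cup_uv)
    contact_labels = np.asarray(contact_labels)
    labels = np.unique(contact_labels)
    max_obj = len(labels)                        # number of distinct objects grasped
    dists, orients = [], []
    for lab in labels:
        uv = cup_uv[contact_labels == lab]
        u_avg, v_avg = np.rint(uv.mean(axis=0)).astype(int)
        dists.append(m_dist[v_avg, u_avg])       # distance of mean cup position to cluster center
        if len(uv) > 1:
            poly = uv[-1] - uv[0]
            poly = poly / np.linalg.norm(poly)   # cup-layout (polygon) orientation
            orients.append(abs(np.dot(m_orient[v_avg, u_avg], poly)))
        else:
            orients.append(0.0)
    j_dist = np.sum(dists) / max_obj
    j_var = np.sum((np.asarray(dists) - j_dist) ** 2) / max_obj
    j_orient = np.sum(orients) / max_obj
    return max_obj, j_dist, j_var, j_orient
```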

6. Experiments

The multiple-suction-cup planner was validated using previous affordance map datasets as well as real picking experiments. For both validations, the thresholds ε_1 in Equation (2) and ε_2 in Equation (3) were set to 11.5° and 0.01 m, respectively. The voxel grid size l was set to 0.005 m. The angle sampling interval Δα was set to 5°. Validations were performed on an Ubuntu 20.04 PC with an 11th Gen Intel Core™ i7-11700K @ 3.60 GHz × 16 CPU and an NVIDIA GeForce RTX 3060 GPU.

6.1. Validation Using a Previous Affordance Map Dataset

We used three datasets to validate the generality of the multiple-suction-cup grasp planner: Suction FCN [12], SuctionNet-1Billion [13], and SG-U-Net++ [14]. These datasets provide real RGB-D or synthesized depth images and the corresponding affordance maps. Point clouds converted from the depth images and the affordance maps in the datasets were used to determine the optimal multiple-cup graspable poses and the cups to activate for two-cup and four-cup vacuum grippers. The orientation and position accuracy were evaluated by the average errors of Equations (2) and (3), respectively.

6.2. Validation Using the Picking Experiment

To evaluate the robot picking system and the efficiency improvement provided by the multiple-suction-cup grasp planner, we conducted picking experiments and compared the results of single-cup (single-object) grasping and multiple-cup (multiple-object) grasping. A robot with a two-cup vacuum gripper was used to pick items from a bin and place them into a tote (Figure 8A). Figure 8C shows the gripper architecture. The gripper was equipped with PISCO (PISCO CO., LTD., Nagano, Japan) standard suction cups. We used cups with a 20 mm diameter because they can generate sufficient force to grasp and hold the objects used in the experiments (weighing less than 500 g) under our vacuum pump setup. The cups were connected to vacuum ejector ports by tubes. Each ejector port controlled the on/off state of the corresponding cup by supplying (on) or blocking (off) compressed air. A camera installed in the center of the gripper captured the depth image at the robot's home position. The affordance map was then inferred by SG-U-Net++ from the depth image. For single-object grasping, the planner in our previous work [14] was used to find the position with the maximum affordance value and to select the suction cup that could reach the target grasp point via the shortest trajectory. For multiple-object grasping, the multiple-suction-cup grasp planner was first used to determine grasp poses capable of grasping multiple objects or an object with a large surface area using multiple cups. If there was no solution, the planner switched to the single-object grasp planner. Trajectories from the home position to the grasp poses were generated by MoveIt. As shown in Figure 8B, the target object set included boxes, fruits, and daily necessities. The robot was required to pick 50 boxes, 50 fruits, and 51 daily necessities in a cluttered scene. The robot continued grasp attempts until the scene was cleared. A grasp attempt was considered to have failed if the robot could not pick up the item or the item was dropped during the movement of the manipulator. The results of single-object and multiple-object grasping were evaluated and compared by success rate, picks per hour (PPH), and number of pick attempts. The success rate was defined as the number of successful attempts divided by the number of pick attempts. PPH was defined as the number of objects the robot could pick up in one hour. The number of pick attempts was defined as the number of attempts the robot needed to clear the cluttered scene.

7. Results and Discussion

To our knowledge, this study is the first to propose a grasp planner for multiple-suction-cup grippers to grasp multiple objects or an object with a large surface area. Most previous studies used a deep neural network to infer the affordance map for finding the optimal grasp for single-cup grasping. Our planner takes advantage of the affordance map to determine the optimal grasp for multiple-cup grasping. The planner was validated on three previous affordance map datasets, and the results are shown in Table 1. Our planner successfully found multiple-suction-cup grasps from the affordance maps of Suction FCN, SuctionNet-1Billion, and SG-U-Net++, indicating the high generality of the planner. There were no significant differences in position and orientation errors between the two-cup and four-cup gripper planning results. The error was smallest when grasping was planned based on the affordance map from SG-U-Net++ because SG-U-Net++ used synthesized data (e.g., depth images and point cloud normals) without noise. Figure 9 and Figure 10 show examples of grasp planning results for the two-cup and four-cup vacuum grippers. The planner successfully determined which of the cups to activate when grasping.
The physical experiment results show that multiple-cup suction grasping could improve the efficiency of picking tasks. Table 2 shows a comparison of experimental results between single-cup (single-object) and multiple-cup (multiple-object) grasping. For single-object grasping, all three object sets could be cleared by the robot. Daily necessities had the highest success rate (91%) and highest PPH (502) among the three object sets. The success rate of picking fruits was the lowest because the objects had a ball-like shape and rolled and slipped when the gripper pushed them along the normal direction during grasping despite having the correct grasp pose. The success rate of picking boxes was lower than that of daily necessities because when two boxes were very close together, the planner treated them as a single box and grasped the center, which was actually the edge between two boxes. This problem did not occur in the case of multiple-suction-cup grasping because even when two boxes were treated as a single big box, the planner set the averaged cup center positions to the center of the affordable area, as shown in Figure 7, so that the cups did not suck the edge between boxes. For multiple-object grasping, all three object sets could also be cleared by the robot. The success rate for grasping boxes (100%) was the highest among the object sets. The robot picked fruits with the highest speed (PPH = 779). Multiple-object grasping improved the picking speed by 1.45× for boxes (PPH: 467 vs. 677), 1.65× for fruits (PPH: 472 vs. 779), and 1.16× for daily necessities (PPH: 502 vs. 583). These results indicate that multiple-suction-cup grasping can improve picking speed. The improvement in picking daily necessities was minor because it was difficult to find multiple-cup graspable poses due to the complicated shapes of the items. Figure 11 shows one picking trial for multiple-suction-cup grasping of boxes, fruits, and daily necessities. More trials are shown in the Supplementary Materials Video S1 file.
In this study, the object sets consisted of lightweight items. For the fruit object set, we used lightweight plastic fruit samples instead of real fruit to avoid crushing them. We mainly focused on obtaining correct suction poses and assumed that the suction cups can generate enough force to hold the object if the multiple-suction-cup (MSC) grasp pose is correct. Although we believe that the maximum load capacity of the selected suction cup is 500 g and that the cup can hold most common daily necessities, the influence of object weight on the planner result will be validated using objects with various weights (e.g., real fruit with different weights) in the future. In addition, the objects in the experiments had rather large and flat surfaces, such that the seals formed were circular and all contact points between a cup and the surface were in the same plane. In this case, the cups are assumed to be capable of fully generating the suction force. However, for a complicated object shape, the seal of a cup might be a polygon (see Figure 1 in [18]). In this case, a more precise contact force model should be used to evaluate the grasp quality (grasp success probability) of the cups, but this will also increase the planning time. Finding an efficient method to evaluate the contact force for MSC grasp candidates is one of our future goals to make the planner applicable to grasping objects with more complex shapes in dense scenes.
The gripper dimensions affect the planner in two ways. Firstly, an increase in dimensions leads to an increase in the volume of the gripper kernels in Figure 4, which further increases the planning time. Therefore, a proper kernel interval size should be determined based on the gripper dimensions. In this study, the gripper kernel interval size was determined empirically. Automatic determination of the size based on the gripper dimensions will be considered to make the planner applicable to grippers with different dimensions. Secondly, if the gripper is too large, it will easily collide with neighboring objects or the bin, so it will be difficult for the planner to find solutions for MSC grasping.
The picking system is expected to be improved in future work aimed at further increasing the picking speed. In our current design, the layout of the cups is fixed; the positions and orientations of the cups in the gripper cannot be changed. However, real-world scenarios are complex in that items may have complex shapes and their poses are disorganized. Hence, a fixed cup configuration may lead to a low probability of finding MSC grasp poses in a very complex environment (e.g., a dense scene) because few candidates satisfy the MSC grasp conditions. Making the positions and poses of the cups adjustable through a proper gripper mechanical design may increase the probability of finding MSC grasp poses, which could be one direction of our future work to further improve picking efficiency. In addition, as described above, one common failure is that objects can move (e.g., roll) after being grasped. We intend to analyze the dynamics (e.g., object shape, friction, and contact force between items) after grasping in order to find grasps that limit the movement of the object and its neighbors, so that grasp success is improved. Another area for improvement is depth filling, because incomplete depth results in low accuracy when estimating the affordance map and normals and thus leads to low grasp success. Furthermore, we will consider the picking sequence to improve the possibility of picking multiple objects. Picking experiments using a gripper with more suction cups will be conducted to further validate the performance and applicability of the planner for MSC grasping. Finally, our system cannot currently detect whether the gripper has successfully grasped the object; there are no sensors to detect whether the seal has been successfully formed. Grasp success detection using a force sensor will be investigated in the future: if the force load on the end-effector increases after grasping, the object will be considered to be successfully grasped. Based on the success detection result, the robot will be required to re-grasp an object if grasp failure occurs, without returning to its home position, to improve picking efficiency.

8. Conclusions

In this study, we proposed a grasp planner for a multiple-suction-cup vacuum gripper. The planner took advantage of an affordance map to determine grasp poses for multiple-cup grasping using a 3D convolution-based method. Thanks to the encoded cup ID kernel, the planner could directly determine which cups to activate by decoding the convolution results. The planner exhibited good generality on previous affordance map datasets. The planner also showed the ability to improve picking speed compared with single-cup grasping in physical experiments with a real robot. We will work on improving the planner in future work from several directions, including object state analysis after grasping, point cloud or depth image completion, and picking sequence planning.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/robotics13060085/s1, Video S1: video.mp4.

Author Contributions

Conceptualization, P.J., J.O. (Junji Oaki), Y.I. and J.O. (Junichiro Ooga); methodology, P.J.; software, P.J.; validation, P.J. and J.O. (Junichiro Ooga); formal analysis, P.J.; investigation, P.J.; resources, P.J.; data curation, P.J.; writing—original draft preparation, P.J.; writing—review and editing, P.J., J.O. (Junji Oaki), Y.I. and J.O. (Junichiro Ooga); visualization, P.J.; supervision, Y.I. and J.O. (Junichiro Ooga); project administration, Y.I. and J.O. (Junichiro Ooga). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets and source code presented in this article are not readily available due to the policy of the authors’ institution. Requests to access the datasets should be directed to the corresponding author.

Conflicts of Interest

Ping Jiang, Junji Oaki, Yoshiyuki Ishihara, and Junichiro Ooga are employees of the Toshiba Corporation. Ping Jiang and Yoshiyuki Ishihara are inventors of a patent related to this work. US Patent No.: 11691275. B2; Patent Date: 4 July 2023; Name: HANDLING DEVICE AND COMPUTER PROGRAM PRODUCT.

References

  1. Bogue, R. Growth in e-commerce boosts innovation in the warehouse robot market. Ind. Robot. Int. J. 2016, 43, 583–587. [Google Scholar] [CrossRef]
  2. Yu, Y.; Fukuda, K. Analysis of multifinger grasp internal forces for stably grasping multiple polyhedral objects. Int. J. Mechatronics Autom. 2013, 3, 203–216. [Google Scholar] [CrossRef]
  3. Yamada, T.; Yamada, M.; Yamamoto, H. Stability analysis of multiple objects grasped by multifingered hands with revolute joints in 2D. In Proceedings of the 2012 IEEE International Conference on Mechatronics and Automation, Chengdu, China, 5–8 August 2012; pp. 1785–1792. [Google Scholar]
  4. Sakamoto, T.; Wan, W.; Nishi, T.; Harada, K. Efficient picking by considering simultaneous two-object grasping. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 8295–8300. [Google Scholar]
  5. Chen, T.; Shenoy, A.; Kolinko, A.; Shah, S.; Sun, Y. Multi-object grasping–estimating the number of objects in a robotic grasp. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 4995–5001. [Google Scholar]
  6. Shenoy, A.; Chen, T.; Sun, Y. Multi-Object Grasping–Generating Efficient Robotic Picking and Transferring Policy. arXiv 2021, arXiv:2112.09829. [Google Scholar]
  7. Agboh, W.C.; Ichnowski, J.; Goldberg, K.; Dogar, M.R. Multi-object Grasping in the Plane. In Robotics Research; Springer: Berlin/Heidelberg, Germany, 2023; pp. 222–238. [Google Scholar]
  8. Agboh, W.C.; Sharma, S.; Srinivas, K.; Parulekar, M.; Datta, G.; Qiu, T.; Ichnowski, J.; Solowjow, E.; Dogar, M.; Goldberg, K. Learning to Efficiently Plan Robust Frictional Multi-Object Grasps. arXiv 2022, arXiv:2210.07420. [Google Scholar]
  9. Jiang, P.; Ishihara, Y.; Sugiyama, N.; Oaki, J.; Tokura, S.; Sugahara, A.; Ogawa, A. Depth image–based deep learning of grasp planning for textureless planar-faced objects in vision-guided robotic bin-picking. Sensors 2020, 20, 706. [Google Scholar] [CrossRef] [PubMed]
  10. Pattar, S.P.; Hirakawa, T.; Yamashita, T.; Sawanobori, T.; Fujiyoshi, H. Single Suction Grasp Detection for Symmetric Objects Using Shallow Networks Trained with Synthetic Data. IEICE Trans. Inf. Syst. 2022, 105, 1600–1609. [Google Scholar] [CrossRef]
  11. Araki, R.; Hirakawa, T.; Yamashita, T.; Fujiyoshi, H. MT-DSSD: Multi-task deconvolutional single shot detector for object detection, segmentation, and grasping detection. Adv. Robot. 2022, 36, 373–387. [Google Scholar] [CrossRef]
  12. Zeng, A.; Song, S.; Yu, K.T.; Donlon, E.; Hogan, F.R.; Bauza, M.; Ma, D.; Taylor, O.; Liu, M.; Romo, E.; et al. Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching. Int. J. Robot. Res. 2022, 41, 690–705. [Google Scholar] [CrossRef]
  13. Cao, H.; Fang, H.S.; Liu, W.; Lu, C. Suctionnet-1billion: A large-scale benchmark for suction grasping. IEEE Robot. Autom. Lett. 2021, 6, 8718–8725. [Google Scholar] [CrossRef]
  14. Jiang, P.; Oaki, J.; Ishihara, Y.; Ooga, J.; Han, H.; Sugahara, A.; Tokura, S.; Eto, H.; Komoda, K.; Ogawa, A. Learning suction graspability considering grasp quality and robot reachability for bin-picking. Front. Neurorobotics 2022, 16, 806898. [Google Scholar] [CrossRef]
  15. Lenz, I.; Lee, H.; Saxena, A. Deep learning for detecting robotic grasps. Int. J. Robot. Res. 2015, 34, 705–724. [Google Scholar] [CrossRef]
  16. Xu, R.; Chu, F.J.; Vela, P.A. Gknet: Grasp keypoint network for grasp candidates detection. Int. J. Robot. Res. 2022, 41, 361–389. [Google Scholar] [CrossRef]
  17. Yu, S.; Zhai, D.H.; Xia, Y.; Wu, H.; Liao, J. SE-ResUNet: A novel robotic grasp detection method. IEEE Robot. Autom. Lett. 2022, 7, 5238–5245. [Google Scholar] [CrossRef]
  18. Mahler, J.; Matl, M.; Liu, X.; Li, A.; Gealy, D.; Goldberg, K. Dex-net 3.0: Computing robust vacuum suction grasp targets in point clouds using a new analytic model and deep learning. In Proceedings of the 2018 IEEE International Conference on robotics and automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 5620–5627. [Google Scholar]
  19. Utomo, T.W.; Cahyadi, A.I.; Ardiyanto, I. Suction-based Grasp Point Estimation in Cluttered Environment for Robotic Manipulator Using Deep Learning-based Affordance Map. Int. J. Autom. Comput. 2021, 18, 277–287. [Google Scholar] [CrossRef]
  20. Hasegawa, S.; Wada, K.; Kitagawa, S.; Uchimi, Y.; Okada, K.; Inaba, M. Graspfusion: Realizing complex motion by learning and fusing grasp modalities with instance segmentation. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 7235–7241. [Google Scholar]
  21. Morrison, D.; Corke, P.; Leitner, J. Learning robust, real-time, reactive robotic grasping. Int. J. Robot. Res. 2020, 39, 183–201. [Google Scholar] [CrossRef]
  22. Kumra, S.; Joshi, S.; Sahin, F. Antipodal robotic grasping using generative residual convolutional neural network. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 9626–9633. [Google Scholar]
  23. Le, T.N.; Lundell, J.; Abu-Dakka, F.J.; Kyrki, V. Deformation-Aware Data-Driven Grasp Synthesis. arXiv 2021, arXiv:2109.05320. [Google Scholar] [CrossRef]
  24. Cao, H.; Chen, G.; Li, Z.; Lin, J.; Knoll, A. Lightweight convolutional neural network with Gaussian-based grasping representation for robotic grasping detection. arXiv 2021, arXiv:2101.10226. [Google Scholar]
  25. Kumra, S.; Joshi, S.; Sahin, F. GR-ConvNet v2: A Real-Time Multi-Grasp Detection Network for Robotic Grasping. Sensors 2022, 22, 6208. [Google Scholar] [CrossRef] [PubMed]
  26. Depierre, A.; Dellandréa, E.; Chen, L. Jacquard: A large scale dataset for robotic grasp detection. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 3511–3516. [Google Scholar]
  27. Mahler, J.; Matl, M.; Satish, V.; Danielczuk, M.; DeRose, B.; McKinley, S.; Goldberg, K. Learning ambidextrous robot grasping policies. Sci. Robot. 2019, 4, eaau4984. [Google Scholar] [CrossRef]
  28. Kensuke, H.; Makoto, K. Enveloping grasp for multiple objects. In Proceedings of the 1998 IEEE International Conference on Robotics and Automation (ICRA), Leuven, Belgium, 20 May 1998; pp. 2409–2415. [Google Scholar]
  29. Takayoshi, Y.; Hidehiko, Y.; Tsuji, T. Rolling-Based Manipulation for Multiple Objects. IEEE Trans. Robot. Autom. 2000, 16, 457–468. [Google Scholar]
  30. Takayoshi, Y.; Hidehiko, Y. Static Grasp Stability Analysis of Multiple Spatial Objects. J. Control Sci. Eng. 2015, 3, 118–139. [Google Scholar] [CrossRef]
  31. Mucchiani, C.; Yim, M. A novel underactuated end-effector for planar sequential grasping of multiple objects. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 8935–8941. [Google Scholar]
  32. Yao, K.; Billard, A. Exploiting Kinematic Redundancy for Robotic Grasping of Multiple Objects. IEEE Trans. Robot. 2023, 39, 1982–2002. [Google Scholar] [CrossRef]
  33. Mantriota, G. Optimal grasp of vacuum grippers with multiple suction cups. Mech. Mach. Theory 2007, 42, 18–33. [Google Scholar] [CrossRef]
  34. Kozák, V.; Sushkov, R.; Kulich, M.; Přeučil, L. Data-driven object pose estimation in a practical bin-picking application. Sensors 2021, 21, 6093. [Google Scholar] [CrossRef]
  35. Tanaka, J.; Ogawa, A. Cardboard box depalletizing robot using two-surface suction and elastic joint mechanisms: Mechanism proposal and verification. J. Robot. Mechatronics 2019, 31, 474–492. [Google Scholar] [CrossRef]
  36. Leitner, J.; Tow, A.W.; Sünderhauf, N.; Dean, J.E.; Durham, J.W.; Cooper, M.; Eich, M.; Lehnert, C.; Mangels, R.; McCool, C.; et al. The ACRV picking benchmark: A robotic shelf picking benchmark to foster reproducible research. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 4705–4712. [Google Scholar]
  37. Kessens, C.C.; Thomas, J.; Desai, J.P.; Kumar, V. Versatile aerial grasping using self-sealing suction. In Proceedings of the 2016 IEEE international conference on robotics and automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 3249–3254. [Google Scholar]
  38. Islam, F.; Vemula, A.; Kim, S.K.; Dornbush, A.; Salzman, O.; Likhachev, M. Planning, learning and reasoning framework for robot truck unloading. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 5011–5017. [Google Scholar]
  39. Domae, Y.; Okuda, H.; Taguchi, Y.; Sumi, K.; Hirai, T. Fast graspability evaluation on single depth maps for bin picking with general grippers. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 1997–2004. [Google Scholar]
Figure 1. Using multiple suction cups to grasp (A) multiple objects or (B) an object with a large surface area.
Figure 2. An example of a vacuum gripper with two cups. The TCP is the gripper tool center point. d_ci^0 is the distance from the ith cup to the TCP. (x_ci^0, y_ci^0) is the center position of the ith cup in local gripper coordinates. The green trapezium is the suction cup. The blue cylinder is the vacuum cylinder.
Figure 3. An example of conditions for a vacuum gripper with two cups to grasp two objects. $c_p$ is the contact point where a suction cup sucks the surface. Contact points $c_{p1}$ and $c_{p2}$ need to be located in the affordable areas of objects. $c_{p1}$, $c_{p2}$, and the TCP need to lie in the same plane perpendicular to $n_g$. $n_{cp1}$ and $n_{cp2}$ are the normals at the contact points of the left and right cups, respectively, and both need to be parallel to $n_g$. $d_{cp1}$ and $d_{cp2}$ are the distances from the contact points to the TCP for the left and right cups in world coordinates; they need to be equal to the distances from the left cup center ($d_{c1}^{0}$) and the right cup center ($d_{c2}^{0}$) to the TCP in gripper coordinates, respectively.
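The geometric conditions in the caption above translate directly into a pose test. The following is a minimal sketch of such a check; the function name, tolerances, and argument layout are illustrative assumptions rather than the planner's actual implementation, and the contact points are assumed to have already been verified to lie inside affordable areas.

```python
import numpy as np

def check_two_cup_grasp(cp1, cp2, n_cp1, n_cp2, n_g, tcp,
                        d_c1_0, d_c2_0,
                        angle_tol_deg=5.0, dist_tol=0.005):
    """Illustrative check of the two-cup grasp conditions of Figure 3.

    cp1, cp2       : contact point positions (3,) in world coordinates.
    n_cp1, n_cp2   : unit surface normals at the contact points.
    n_g            : unit approach axis of the gripper.
    tcp            : gripper tool-center-point position (3,).
    d_c1_0, d_c2_0 : cup-to-TCP distances in gripper coordinates.
    """
    cos_tol = np.cos(np.deg2rad(angle_tol_deg))

    # Contact normals must be (anti)parallel to the gripper axis n_g.
    normals_ok = (abs(np.dot(n_cp1, n_g)) >= cos_tol and
                  abs(np.dot(n_cp2, n_g)) >= cos_tol)

    # cp1, cp2 and the TCP must lie in a common plane perpendicular to n_g,
    # i.e., their projections onto n_g coincide within tolerance.
    coplanar_ok = (abs(np.dot(cp1 - tcp, n_g)) <= dist_tol and
                   abs(np.dot(cp2 - tcp, n_g)) <= dist_tol)

    # Distances from the contact points to the TCP must match the cup offsets.
    dist_ok = (abs(np.linalg.norm(cp1 - tcp) - d_c1_0) <= dist_tol and
               abs(np.linalg.norm(cp2 - tcp) - d_c2_0) <= dist_tol)

    return normals_ok and coplanar_ok and dist_ok
```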
Figure 4. Overall architecture of the multiple-suction-cup grasp planner. ⊛ is the convolution operator.
Figure 5. The problem of using a binary kernel. Red dots are cup centers, and blue dots are TCP positions. The transparent blue area indicates graspable positions using two cups. The convolution result for two-cup suction grasping is the same for all four cases (the convolved value is 2 in each). However, the suction cup centers lie at different positions in the affordable area, so the cups to activate differ between the four cases, and with a binary kernel the activation pattern cannot be determined directly from the convolution result.
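One way to avoid the ambiguity of a binary kernel is to give each cup a distinguishable value in the kernel, so that the convolution response itself identifies which cup centers land on affordable pixels. The 2D sketch below uses a power-of-two (bit-mask) encoding and `scipy.ndimage.correlate` purely for illustration; the paper's actual 3D kernel, encoding, and decoding scheme may differ.

```python
import numpy as np
from scipy.ndimage import correlate

def encoded_cup_kernel(cup_offsets_px, size):
    """Kernel for one 2D slice of the planner's convolution.

    Each cup i contributes the value 2**i at its center offset, so the
    correlation result at a TCP pixel is the bit mask of cups whose
    centers land on affordable (value 1) pixels.  The power-of-two
    encoding is an illustrative assumption.
    """
    kernel = np.zeros((size, size))
    c = size // 2
    for i, (dy, dx) in enumerate(cup_offsets_px):
        kernel[c + dy, c + dx] = 2 ** i
    return kernel

# Binary affordance map (1 = graspable for a single cup).
affordance = np.zeros((60, 60))
affordance[10:30, 10:50] = 1.0

# Two cups offset +/-10 px from the TCP along x for one sampled orientation.
kernel = encoded_cup_kernel([(0, -10), (0, 10)], size=21)
response = correlate(affordance, kernel, mode='constant', cval=0.0)

# Decode which cups to activate at a candidate TCP pixel.
code = int(response[20, 30])
active_cups = [i for i in range(2) if code & (1 << i)]
print(active_cups)   # e.g., [0, 1] when both cup centers are affordable
```

With this encoding, a response of 3 at a TCP pixel means both cups of a two-cup gripper can be activated there, while 1 or 2 identifies a single activatable cup.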
Figure 6. Clustering results. (A) Label map $M_{label}$. (B) Distance map $M_{dist}$. (C) Orientation map $M_{orient}$. Numbers in (A) are the cluster IDs. (B) is a heatmap of the distance to the center of each clustered area. Green points and red lines in (C) are the center positions and axes of the clustered areas.
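A clustering step like the one shown in Figure 6 can be approximated with connected-component labeling followed by per-cluster statistics. The sketch below is one plausible construction of the label map, the distance map, and the per-cluster axes behind $M_{orient}$; the threshold, library calls, and PCA-based axis estimate are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import label

def cluster_affordable_areas(affordance, threshold=0.5):
    """Illustrative clustering producing the maps shown in Figure 6."""
    mask = affordance > threshold
    labels, num = label(mask)                       # label map (M_label)

    dist_map = np.zeros_like(affordance, dtype=float)
    centers, axes = [], []
    for k in range(1, num + 1):
        ys, xs = np.nonzero(labels == k)
        pts = np.stack([ys, xs], axis=1).astype(float)
        center = pts.mean(axis=0)
        centers.append(center)

        # Distance map (M_dist): distance of each cluster pixel to the centroid.
        dist_map[ys, xs] = np.linalg.norm(pts - center, axis=1)

        # Cluster axis for M_orient: principal direction via PCA
        # (eigenvector of the covariance with the largest eigenvalue).
        cov = np.cov((pts - center).T)
        eigvals, eigvecs = np.linalg.eigh(cov)
        axes.append(eigvecs[:, np.argmax(eigvals)])

    return labels, dist_map, centers, axes
```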
Figure 7. Metrics to evaluate $G_{cand}$. Red dots are cup centers, and blue dots are TCP positions. Green areas are clusters of affordable areas. Green dots are cluster centers. (A) Distance score $J_{dist}$. (B) Distance variation score $J_{var}$. (C) Orientation score $J_{orient}$.
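The three scores illustrated in Figure 7 can be combined into a single ranking value for each grasp candidate. Since the exact definitions of $J_{dist}$, $J_{var}$, and $J_{orient}$ and their weights appear in the paper body rather than here, the sketch below uses plausible stand-in formulas and uniform weights purely to show the structure of such a sorting criterion.

```python
import numpy as np

def score_candidate(cup_centers, cluster_centers, cluster_axes, gripper_axis,
                    w_dist=1.0, w_var=1.0, w_orient=1.0):
    """Illustrative ranking of a grasp candidate G_cand (cf. Figure 7)."""
    # Distance from each activated cup center to its nearest cluster center.
    d = np.array([min(np.linalg.norm(c - cc) for cc in cluster_centers)
                  for c in cup_centers])
    j_dist = 1.0 / (1.0 + d.mean())    # (A) cups close to cluster centers
    j_var = 1.0 / (1.0 + d.std())      # (B) similar cup-to-center distances

    # (C) alignment between the gripper axis and the cluster principal axes.
    g = gripper_axis / np.linalg.norm(gripper_axis)
    j_orient = np.mean([abs(np.dot(g, a / np.linalg.norm(a)))
                        for a in cluster_axes])

    # Weighted sum; higher values rank the candidate earlier.
    return w_dist * j_dist + w_var * j_var + w_orient * j_orient
```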
Figure 8. Experiment setup. (A) Robot. (B) Object set. (C) Gripper.
Figure 9. Examples of grasp planning results for two-cup and four-cup vacuum grippers. (A) Suction FCN. (B) SuctionNet-1Billion. (C) SG-U-Net++. Red dots are the center positions of the cups. Dark blue dots are the center positions of the gripper.
Figure 10. Examples of grasp planning results for two-cup and four-cup vacuum grippers. (A) Suction FCN. (B) SuctionNet-1Billion. (C) SG-U-Net++. Large red spots are the centers of activated cups. Small red spots are the centers of disabled cups. Dark blue dots are the center positions of the gripper.
Figure 11. One picking trial for the multiple-suction-cup grasp of (A) boxes, (B) fruits, and (C) daily necessities. Red dots are the center positions of the cups. Dark blue dots are the center positions of the gripper.
Table 1. Position and orientation error of grasp pose.
| Dataset | Two-Cup Gripper: Position Error, Mean (SD) [m] | Two-Cup Gripper: Orientation Error, Mean (SD) [deg.] | Four-Cup Gripper: Position Error, Mean (SD) [m] | Four-Cup Gripper: Orientation Error, Mean (SD) [deg.] |
|---|---|---|---|---|
| Suction FCN | $6.28 \times 10^{-3}$ ($0.80 \times 10^{-4}$) | 4.50 (6.94) | $5.94 \times 10^{-3}$ ($0.23 \times 10^{-4}$) | 5.04 (11.00) |
| SuctionNet-1Billion | $7.67 \times 10^{-3}$ ($2.43 \times 10^{-4}$) | 4.66 (6.30) | $7.64 \times 10^{-3}$ ($1.41 \times 10^{-4}$) | 4.59 (6.98) |
| SG-U-Net++ | $2.88 \times 10^{-3}$ ($0.18 \times 10^{-4}$) | 2.85 (10.2) | $2.30 \times 10^{-3}$ ($0.07 \times 10^{-4}$) | 2.68 (8.07) |
Table 2. Experimental results.
| Object Set | Method | Total Attempts | Successful Attempts | Success Rate | PPH |
|---|---|---|---|---|---|
| Boxes | Single-object grasping | 59 | 50 | 85% | 468 |
| Boxes | Multiple-object grasping | 36 | 36 | 100% | 677 |
| Fruits | Single-object grasping | 64 | 50 | 78% | 472 |
| Fruits | Multiple-object grasping | 33 | 31 | 94% | 779 |
| Daily necessities | Single-object grasping | 56 | 51 | 91% | 502 |
| Daily necessities | Multiple-object grasping | 53 | 40 | 75% | 583 |
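For readability, the success rates in Table 2 are the (rounded) ratios of successful to total attempts, and the throughput gain of multiple-object grasping can be read directly from the PPH column; for the box set, for example:

\[
\frac{50}{59} \approx 85\%, \qquad \frac{36}{36} = 100\%, \qquad \frac{677}{468} \approx 1.45.
\]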