Toward Future Automatic Warehouses: An Autonomous Depalletizing System Based on Mobile Manipulation and 3D Perception

Abstract: This paper presents a mobile manipulation platform designed for autonomous depalletizing tasks. The proposed solution integrates machine vision, control and mechanical components to increase flexibility and ease of deployment in industrial environments such as warehouses. A collaborative robot mounted on a mobile base is proposed, equipped with a simple manipulation tool and a 3D in-hand vision system that detects parcel boxes on a pallet; the robot pulls them one by one onto the mobile base for transportation. The robot setup avoids the cumbersome implementation of pick-and-place operations, since it does not require lifting the boxes. The 3D vision system is used to provide an initial estimation of the pose of the boxes on the top layer of the pallet, and to accurately detect the separation between the boxes for manipulation. Force measurement provided by the robot together with admittance control are exploited to verify the correct execution of the manipulation task. The proposed system was implemented and tested in a simplified laboratory scenario and the results of experimental trials are reported.


Introduction
Industrial automated warehouses are designed to optimize the transportation and distribution of goods usually stored in cardboard boxes [1]. In this work, we consider the task of automated depalletizing (unloading) a pallet containing homogeneous boxes, namely the origin pallet, with the purpose of composing a new pallet containing potentially different boxes, namely the mixed pallet, by collecting items from different origin pallets. Homogeneous pallets are organized as a grid of cardboard boxes of the same type arranged on multiple stacked layers.
Robotic depalletizers in automated warehouses are usually bulky and limited to four degrees of freedom (DOFs), requiring very high payloads to move large end-effectors.
Robot end-effectors are usually designed to pick up an entire layer of boxes from the origin pallet and place it on a distribution and serialization system composed of multiple conveyor belts, even in the case of light boxes or when only a few boxes need to be transferred from the origin pallet. Moreover, since more than ten thousand different types of goods may need to be stored in a single food and beverage factory, warehouses require large buildings for intermediate buffers to temporarily store the depalletized items before moving them to the destination (mixed) pallets.
In this paper, we address the automated depalletizing problem by defining a new system able to integrate safety, maneuverability and ease of interaction for a palletizing/depalletizing task performed in an industrial environment. This system could be considered a first prototype of a multi-purpose platform with high reconfiguration capabilities for the interaction and assistance of human operators in a shared warehouse environment. To this end, our system exploits a serial collaborative robot (cobot) mounted on an autonomous mobile robot (AMR). Both the serial arm and the AMR must be endowed with collaborative features, since the overall robotic system must be able to safely operate in an industrial environment shared with humans. The cobot is equipped with an eye-in-hand time-of-flight (ToF) 3D camera. The cobot, a UR10e from Universal Robots, has six DOFs, providing a wide workspace for box extraction, a 10 kg payload, and a wrist force/torque sensor for tool wrench measurement. A shovel-shaped tool mounted on the cobot end-effector is used to fit in the gap between boxes and to pull the cardboard boxes one by one onto the mobile base for transportation. Compared with other standard solutions, this type of tool reduces both the total execution time and the load on the manipulator. The 3D camera is located on the cobot end-effector to detect the pose of the boxes and the separation gap between them for tool insertion.
The design of the proposed mobile manipulation system was driven by the fact that payload is one of the main bottleneck problems faced by collaborative mobile robots in industrial environments. Indeed, due to safety and maneuverability, only small cobots can be placed on AMRs. Therefore, the proposed solution is based on decoupling object displacement and lifting operations, leaving the former to the cobot and the latter to an automatic lifting device located on the AMR, next to the cobot. The cobot manipulates the item at hand (e.g., a box) without lifting it, but rather by dragging it onto the lifting device.

Related Work
The optimization of operations inside a warehouse is currently characterized by several research areas. In particular, the implementation of a robotic system able to assist human workers in their tasks requires the capability to effectively plan and optimize the sequence of actions [2][3][4], to correctly localize persons and objects [5] and to interact with the environment and other systems in the network [6]. Since the advent of Industry 4.0, the requirement of optimizing the transport and distribution of products has encouraged the use of autonomous guided vehicles within modern automated warehouses [7]. Moreover, autonomous mobile robots have been introduced to improve flexibility and robustness in modern industrial contexts [8]. Nowadays, the potential collaboration between cobots and mobile platforms is receiving increasing attention from industrial and scientific research, particularly for logistic applications [9]. Bonini et al. [10] explored the possibility of employing a mobile collaborative system for palletizing operations. The hardware consists of a lifting device equipped with a pneumatic gripper, a powered conveyor and a set of sensors which allows interaction between the system itself and the environment. Vacuum grippers are usually adopted for depalletizing cardboard boxes [11][12][13]. However, this solution is not generally safe, since cardboard boxes cannot always hold the weight of the items that they contain. Nakamoto et al. [13] designed a depalletizing system where a second robot arm is used to support the boxes during motion. A robot manipulation system based on 3D vision and a vacuum gripper, for the detection and unloading of cardboard boxes from shipping containers or semi-truck trailers, is presented in [11]. A fixed-base robot depalletizing system designed for supermarket logistic processes is described in [12]. Similarly, the application of vacuum technology was proposed in [13,14], whereas Matsuo et al. [15] developed a mobile robot equipped with a self-weight compensation system. However, these solutions may lack flexibility, and vacuum gripping solutions may be hard to implement on AMRs, especially small-sized ones.
In contrast with previous systems, a non-prehensile manipulation approach was suggested by Lim et al. [16], where items are dragged aboard the mobile robotized system. Non-prehensile manipulation strategies are a subject of growing interest, both in the research and industrial fields, because they reduce the overall weight supported by the gripper, minimize the risk of falls and enable specific motions in cluttered environments that would be impossible under normal grasping scenarios. In [17], a new planning framework that exploits the funneling effect of pushing to deal with uncertain and cluttered environments is proposed. Similarly, in [18] a framework that takes into account the human presence in cluttered environments is analyzed. Acharya et al. [19] presented a motion planning analysis for the optimization of the stability and control of an asymmetric object during non-prehensile manipulation. Ardakani et al. [20] proposed a quasi-static analysis to define a dynamical system that predicts the object behavior as a function of friction forces. In an industrial scenario, the manipulation of a flexible belt exploiting the friction between the belt and the gripper is discussed in [21]. Our work aims to promote new studies on the interaction forces between the object to be picked and the support layer to further simplify the depalletizing operation in a shared human-robot environment.
In terms of perception, similarly to what is proposed in this paper, Hashimoto et al. [22] presented a genetic algorithm to recognize loads on a pallet. The main differences are that the method in [22] adopted a fixed gray scale camera placed above the pallet, while we adopted an eye-in-hand camera system and admittance control to simultaneously achieve the accurate positioning of the robot arm and a controlled interaction with the boxes. Yunardi et al. [23] investigated a method to determine the size of parcels moving on a conveyor belt using RGB cameras. Prasse et al. [24] proposed a system to detect the pose of parcels located on a pallet by combining a time of flight (ToF) sensor and RFIDs, which were used to compute the 3D structure of the layer. Katsoulas et al. [25] proposed methods for the recognition of arbitrary size boxes in cluttered environments using a planar laser scanner, mounted on a robot arm in eye-in-hand configuration. A drawback is that, since they are based on 2.5D edge detectors, they could fail to detect aligned boxes in contact with one another. In [26], a robotic depalletizing system was proposed using uncalibrated vision and 3D laser-assisted image analysis. All sensors were attached to the ceiling. In [27], an RGB-D vision system based on pattern matching was developed for the localization of heterogeneous cases in a depalletizing robotic cell.

Concept and Mechanical Design
The items to be handled are cardboard boxes placed on a standard Euro Pallet (EPAL), in such a manner that the parcel boxes form at most four vertical layers separated by rigid interlayers. The overall depalletizing task may be subdivided into the following sequential steps:
• Robot's self-localization within the workspace;
• Autonomous navigation towards the desired pallet;
• Pose detection of boxes on the top layer of the pallet;
• Extraction of boxes from the pallet and placement aboard the robot.
A manipulation strategy based on dragging goods aboard the mobile robot is employed, in order to overcome the limitations of standard grabbing manipulation, such as a limited robot payload or the possibility of handling only packages that are able to sustain their own weight.
We propose using a serial collaborative manipulator UR10e, provided by Universal Robots (www.universal-robots.com/products/ur10-robot/, accessed on 24 June 2021), installed on a MiR100 AMR, supplied by Mobile Industrial Robots (www.mobile-industrial-robots.com/en/solutions/robots/mir100/, accessed on 24 June 2021), as illustrated in Figure 1. In addition, a scissor lifting mechanism is integrated to collect boxes on board. Thus, the displacement and lifting functions are decoupled. The top of the lifting mechanism is equipped with an idler-roller conveyor to allow items to be dragged and collected. During the manipulation phase, a swivel hatch, mounted on the terminal part of the conveyor, can be set in either an open or a closed configuration. The former position enables items to be dragged either from the pallet to the conveyor or vice versa, whereas the latter position prevents the boxes from falling out during the AMR navigation. The operational capability of the overall system is influenced by how the cobot and the lifting mechanism are integrated on the AMR. For this reason, three different solutions were analyzed. In the first solution, the cobot is installed on a fixed support, and the installation height is selected by optimizing the cobot workspace with respect to the box locations. In this way, system stability is not excessively penalized, but the fixed cobot location limits the boxes that can be reached. In the second solution, the cobot is installed on top of the lifting mechanism, thus simplifying task execution, since the cobot base is always aligned with the conveyor and the box layer to be handled. As a drawback, a more powerful lifting mechanism is needed. The last solution requires the installation of the cobot on a telescopic actuator, thus leading to independent movements of the cobot and the lifting mechanism. However, cost-effectiveness is penalized.
Moreover, in both the second and the third solution, system stability decreases when handling the higher boxes. Another aspect to be considered when choosing the optimal hardware architecture is that, due to the rectangular footprint of the MiR AMR, the cobot and the lifting mechanism can be placed in either a longitudinal or a transverse arrangement. The former (Figure 2a) enhances stability, but possible interference between the lifting mechanism and the cobot limits the base joint rotation. The latter arrangement (Figure 2b) overcomes such limitations by virtue of the larger distance of the cobot with respect to the lifting mechanism; however, AMR navigation may be more challenging because of the non-omnidirectional MiR steering system. As a result of the design analysis (thoroughly described in [28]), we chose the installation of the cobot on a fixed support with a transverse arrangement. It is worth observing that when the AMR is close to the pallet, at most two boxes can be processed, due to the limited workspace of the robot arm; thus, handling the third box of the same row requires the AMR to reposition on the opposite side of the pallet. This drawback can be overcome by using a cobot with a larger workspace mounted on a larger AMR.
The lifting mechanism is composed of two scissor linkages placed in parallel and a linear electric actuator that acts on a transverse ledger. A four-bar linkage enables the rotational motion of the swivel hatch, as represented in Figure 3. For the four-bar actuation, we selected a passive solution for the sake of simplicity, lightness and cost-effectiveness. By employing a crank-slider mechanism, the linear motion of the top wheel installed in the scissor lifting mechanism can be converted into an angular rotation of the four-bar crank. Friction phenomena during box manipulation might result in an undesired sliding of the interlayer. Therefore, we use a pair of rotating clips (Figure 3) which keep the interlayer in place and prevent it from slipping. Each clip comprises an RC servo motor, a main body rigidly attached to the motor, a sliding rod, and a compression spring. The RC servo actuates the rotation of the main body, resulting in a spring compression and a force applied on the interlayer.

Simulations
The industrial scenarios considered in this work were simulated using CoppeliaSim (www.coppeliarobotics.com/coppeliaSim, accessed on 24 June 2021) and evaluated in terms of task execution time and motion feasibility.
Simulations include a comparison of the dragging manipulation approach with a standard pick-and-place solution.
One layer of 21 palletized boxes (of size 250 × 150 × 300 mm, arranged in a grid of 7 rows, 3 columns) must be transported from an initial pallet to a storage pallet, maintaining the initial grid arrangement of the boxes. It is assumed that during the manipulation phase of the items the total payload attached to the robot (i.e., the tool and the box) is always lower than the UR10e maximum payload.
The first simulated scenario (Procedure 1) involves the usage of a shovel-shaped tool on the UR10e end-effector. The tool drags the item onto the lifting device. In this case, it is assumed that the AMR can transport only two boxes per journey. Under these assumptions, the task can be accomplished in 11 journeys: for each of the seven rows of the box grid, the items in the first two columns are manipulated in the first seven journeys, then the items in the third column are handled in the remaining four journeys, with a different picking method. Indeed, a pair of items (in the same row) of the first two columns, once the AMR is positioned next to them, can be pulled by the manipulator without any additional motion of the AMR, whereas the AMR must be moved to manipulate a pair of items (in different rows) of the third column. Figure 4 shows the AMR/cobot system and the grid of 21 boxes, highlighting the items involved in each of the 11 journeys. The shovel-shaped tool of the cobot allows a box to be loaded on the lifting device by means of a linear dragging movement. Firstly, the manipulator executes a collision-free movement from its current initial position towards the detected approaching frame between two consecutive boxes. As will be further described in Section 6, such a movement consists of different point-to-point motions, because a first 3D image of the pallet top layer must be taken, followed by a second image closer to the identified gap between two boxes to refine the detection accuracy. Afterwards, three linear motions are required to insert the tool into the detected gap, drag the box and finally bring the manipulator back to the next starting position.
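The journey count stated above can be checked with a short sketch. It assumes, as in the text, a 7 × 3 grid, two boxes per journey, same-row pairs for the first two columns and cross-row pairs for the third column; the function name `plan_journeys` and the (row, column) indexing are illustrative choices, not part of the simulated system.

```python
def plan_journeys(rows=7, capacity=2):
    """Group the boxes of a rows x 3 grid into AMR journeys.

    Boxes are identified by (row, column) with columns 0, 1, 2.
    """
    # Columns 0 and 1: the two boxes of a row are pulled from a single AMR pose.
    journeys = [[(r, 0), (r, 1)] for r in range(rows)]
    # Column 2: the remaining boxes lie in different rows and are paired across rows.
    third_column = [(r, 2) for r in range(rows)]
    for i in range(0, rows, capacity):
        journeys.append(third_column[i:i + capacity])
    return journeys


journeys = plan_journeys()
print(len(journeys))                   # 11
print(sum(len(j) for j in journeys))   # 21
```

With seven rows, the third column yields three journeys with two boxes and a last journey with a single box, which gives the four additional journeys mentioned above.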
Procedure 1 lists the consecutive motions that are required to perform the loading (and similarly, the unloading) of a single box on the AMR.
Procedure 1: one-box dragging motion sequence, shovel-shaped tool
1 Point-to-point motion to box target frame
2 Linear downward approaching movement (200 mm)
3 Linear dragging movement (600 mm)
4 Linear upward movement (300 mm)

The collision-free point-to-point motions are generated by means of the Open Motion Planning Library framework (OMPL) (https://ompl.kavrakilab.org/, accessed on 24 June 2021), exploiting the OMPL wrapper included in CoppeliaSim. OMPL is also integrated into ROS/MoveIt and employed in the real robot application. Table 1 shows the average simulated elapsed time for each manipulator motion performing the loading and the unloading of two boxes placed in the first two columns. The partial times taken to load and unload the first box are indicated as ${}^{s}T_{l1}$ and ${}^{s}T_{u1}$, respectively, while ${}^{s}T_{l}$ and ${}^{s}T_{u}$ represent the overall time needed for loading/unloading two boxes. The left superscript $s$ indicates the shovel-shaped tool. Note also that in Table 1, motion indices refer to the steps of the first simulated scenario (Procedure 1).
In order to guarantee proper safety bounds, the manipulator joint velocities and the end-effector linear velocity are limited to 40 deg/s and 200 mm/s, respectively. Let:
• $T_j$ be the time required to move the AMR from the initial pallet to the storage pallet;
• $T_s = 1$ s be the time required to move the AMR from the current to the next column of the grid.
The overall time ${}^{s}T_{tot}$ needed to complete the transport of all the 21 boxes can be fairly approximated by Equation (1), where the term $22\,T_j$ accounts for 11 round-trip journeys, the terms $7\,{}^{s}T_{l}$ and $7\,{}^{s}T_{u}$ include the loading/unloading of the aligned boxes in the first and second grid columns, while $7\,{}^{s}T_{l1}$ and $7\,{}^{s}T_{u1}$ are the times taken to handle the remaining single box of the third grid column. It is worth noting that the additional time required to re-execute the procedure in the case of a collision between the tool and the boxes due to the inaccurate detection of the gap, as described in Section 6, is not considered here, because the probability of this type of event is expected to be reduced and ideally made negligible in the future industrial release of the system.
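As a rough illustration, the terms listed above can be tallied as follows. This is only a sketch: the multiplier of the AMR repositioning time is not restated in the text, so that contribution is passed in as a lumped, hypothetical argument (`t_shift_total`), and the numeric values in the example are placeholders rather than the values of Table 1.

```python
def total_transport_time(t_j, t_l, t_u, t_l1, t_u1, t_shift_total=0.0):
    """Approximate time (s) to move 21 boxes with two boxes per journey.

    t_j           one-way AMR travel time between the two pallets
    t_l, t_u      loading / unloading time for a pair of boxes
    t_l1, t_u1    loading / unloading time for a single box
    t_shift_total lumped AMR repositioning time (assumption, see lead-in)
    """
    round_trips = 11
    return (2 * round_trips * t_j      # 22 one-way trips
            + 7 * (t_l + t_u)          # aligned pairs, columns 1 and 2
            + 7 * (t_l1 + t_u1)        # single boxes, column 3
            + t_shift_total)


# Placeholder timings in seconds, chosen only to exercise the formula.
print(total_transport_time(t_j=10.0, t_l=20.0, t_u=20.0, t_l1=10.0, t_u1=10.0))  # 640.0
```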
The second manipulation scheme (Procedure 2) involves the usage of a vacuum gripper mounted on the robot end-effector instead of the shovel-shaped tool. The main difference with the previous approach is that the robot can execute a standard pick-and-place operation with approaching movements at different heights, so that the lifting device is no longer required. Moreover, the detection of the gap between boxes is no longer required and the vision system can identify the picking frame exploiting a single camera acquisition, so that the first robot motion is faster than in the former manipulation approach. Procedure 2 describes the sequence of robot motions required to pick a single box from the pallet and place it on the AMR. In this case, the loading/unloading procedures are slower than those of the former approach because two additional linear motions are needed during the manipulation phase. The time taken to perform each movement is reported in Table 2. The overall time to complete the transport of the 21 boxes with the vacuum gripper, ${}^{v}T_{tot}$, can be computed by means of Equation (1). The former manipulation scheme therefore provides an overall time saving of ${}^{v}T_{tot} - {}^{s}T_{tot} \approx 60$ s. Such a result justifies the choice of equipping the AMR with the lifting device and using a tool that allows the manipulator to drag the items instead of picking and placing them.

Experimental Setup, Perception and Control System
In order to perform a preliminary evaluation of the proposed depalletizing system, a prototype was set up, as shown in Figure 5a, which does not include the lifting device on the AMR. Moreover, only a single layer of boxes is present. Experiments were conducted in a laboratory environment instead of an industrial setting, due to the ongoing COVID-19 pandemic. Since we are interested in demonstrating the capability of the system to collect a limited number of items without manipulating an entire pallet layer, we present experiments that involve the unloading of two boxes. The 3D camera is shown in Figure 5b, as well as the paddle tool connected to the manipulator through a custom flange. The box extraction task is detailed in Algorithm 1. After the mobile base has moved closer to the pallet, the system performs the detection of parcel boxes as described in Section 5.1, by moving the 3D camera to an elevated pose in order to have a complete view of the pallet top layer, and it detects the 3D position of all the boxes in front of the loading surface of the robot. Then, in the box depalletizing plan phase, the sequence of picking operations to be performed is determined according to the pose of the boxes with respect to the robot. Each picking operation is performed as follows. First, an edge of the box is chosen for the picking and its position is evaluated according to the camera estimation and the box dimensions. Then, the camera on the arm is moved above the estimated position to perform a close-view refined estimation of the center of the gap between two boxes, where the tool should be inserted to complete the retrieval (Section 5.2). Once the estimation is obtained, the robot tool is aligned with the gap center and a tool insertion operation is attempted. During the insertion, the tool wrench is continuously evaluated to identify any unexpected collision, possibly due to a wrong detection.
An admittance control scheme prevents damage to the robot and the boxes during the insertion, as reported in Section 5.3. As soon as a collision occurs, the insertion operation is interrupted, and the estimation of the gap is repeated. If no collision is detected, the insertion is considered successful and the box, dragged by the robot tool, is loaded on the robot support surface, which will be replaced by the lifting device in our future work.

Algorithm 1: Robotized Depalletizing Algorithm
Data: 3D camera image, robot tool wrench
Result: depalletizing of a box layer
move camera in the detection pose;
perform detection of parcel boxes;
define box picking sequence in box depalletizing plan;
while box picking sequence is not empty do
    extraction start: consider the first box in the sequence;
    evaluate the box edge position;
    move the 3D camera to the evaluated position;
    perform refined estimation of the gap between boxes using the 3D camera image;
    move the robot tool above the refined gap;
    while insert the robot tool in the gap do
        if vertical tool wrench > wrench threshold then
            extract robot tool;
            go to beginning of current section: extraction start;
        end
    end
    drag the box toward the robot;
    remove current box from the sequence;
end
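The control flow of Algorithm 1 can be sketched in a few lines. The sketch below is not the code running on the robot: perception and motion are injected as hypothetical callables (`refine_gap`, `insert_tool`, `drag_box`), `insert_tool` is assumed to return the peak vertical wrench measured during insertion, and `max_attempts` is an added safeguard that Algorithm 1 does not contain.

```python
def depalletize(boxes, refine_gap, insert_tool, drag_box,
                wrench_threshold=10.0, max_attempts=5):
    """Extract every box in `boxes`; return the gap estimations used per box."""
    attempts_log = []
    queue = list(boxes)
    while queue:
        box = queue[0]                      # extraction start
        attempts = 0
        while True:
            attempts += 1
            gap = refine_gap(box)           # close-view estimation of the gap
            peak_wrench = insert_tool(gap)  # peak vertical wrench during insertion
            if peak_wrench <= wrench_threshold:
                break                       # no collision: insertion succeeded
            if attempts >= max_attempts:    # safeguard, not part of Algorithm 1
                raise RuntimeError("gap estimation failed repeatedly")
        drag_box(box)
        queue.pop(0)
        attempts_log.append(attempts)
    return attempts_log


# Stubbed run: the second box collides once (15 N) before a clean insertion.
wrenches = iter([2.0, 15.0, 3.0])
log = depalletize(["box_1", "box_2"],
                  refine_gap=lambda box: 0.0,
                  insert_tool=lambda gap: next(wrenches),
                  drag_box=lambda box: None)
print(log)  # [1, 2]
```

Keeping the retry loop separate from the queue handling mirrors the "go to extraction start" step: a failed insertion repeats the refined estimation for the same box without touching the picking sequence.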

Detection of Parcel Boxes
In the detection of parcel boxes phase, the robot arm first moves to an observation configuration, where the upper layer of the pallet can be fully observed by the eye-in-hand infra-red 3D camera (IFM Electronics O3D303). Furthermore, the camera exposure time $t_{exp}$ is set to a constant value $t_{exp,far}$, suitable to observe objects from the current distance of the sensor to the pallet, as shown in Figure 6a. Then, the camera acquires a depth image with a resolution of 352 × 264 pixels, which is converted into an organized point cloud containing 3D points $p_{ij}$. The camera also produces an intensity value $r_{ij}$ for each pixel that contains the amount of light returned to the sensor.
As light intensity $r_{ij}$ decreases with distance according to the inverse square law, and since the amount of light received by the sensor is proportional to the current exposure time $t_{exp}$, a corrected intensity image $R' = \{r'_{ij}\}$ is computed as $r'_{ij} = r_{ij} \, \|p_{ij}\|^2 / t_{exp}$.
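The correction can be illustrated with a minimal sketch. It assumes that the points are expressed in the camera frame, so that the norm of each point is its distance from the sensor, and that images are stored as nested lists; the function name is illustrative.

```python
def corrected_intensity(intensity, points, t_exp):
    """Scale each intensity by the squared point distance over the exposure time."""
    corrected = []
    for row_r, row_p in zip(intensity, points):
        corrected.append([
            r * (x * x + y * y + z * z) / t_exp
            for r, (x, y, z) in zip(row_r, row_p)
        ])
    return corrected


# The same surface seen at 1 m and at 2 m returns 4x less light at 2 m;
# after correction both pixels carry the same value.
intensity = [[100.0, 25.0]]
points = [[(0.0, 0.0, 1.0), (0.0, 0.0, 2.0)]]
print(corrected_intensity(intensity, points, t_exp=1.0))  # [[100.0, 100.0]]
```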
As the camera provides its own infra-red lighting, both depth and intensity measurements proved robust to changes in environmental light conditions compatible with indoor industrial environments. In order to ensure stability, pallets are generally arranged as a stack of layers. Each layer above the first one lies on the top planar faces of the parcel boxes of the layer below. In industrial environments, the content and the configuration of a pallet are known in advance. Pallets stored in a warehouse usually contain parcel boxes of the same type. Moreover, palletizing and depalletizing tasks of single parcels are performed by inserting or removing the parcel boxes from the top layer. These assumptions, derived from industrial practice, are exploited to strengthen the robustness of our box detection algorithm.
Hence, in this work, we consider a single type of box of known size, but the proposed approach can be easily extended to handle different box formats. Moreover, we can estimate the equation of the top plane of the highest layer of parcel boxes by applying a RANSAC-based estimation method when the pallet layer is full, i.e., when no box has been removed yet. Hence, we assume that the top plane is always known with respect to the robot. Parcel box detection is constrained to the points $p_{ij}$ which are within a small distance from the plane. Parcel boxes are then detected according to the following steps [29]:
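A RANSAC-based plane estimation of the kind mentioned above can be sketched as follows. This is a generic textbook variant, not the estimator used in the system: the tolerance, the number of iterations, the seed and all function names are assumptions made for the example.

```python
import random


def plane_from_points(a, b, c):
    """Plane (n, d) with n.p + d = 0 through three points, or None if collinear."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = sum(x * x for x in n) ** 0.5
    if norm < 1e-12:
        return None
    n = [x / norm for x in n]
    d = -sum(n[i] * a[i] for i in range(3))
    return n, d


def ransac_plane(points, tol=0.005, iterations=200, seed=0):
    """Return the plane supported by the largest number of points within `tol`."""
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue
        n, d = model
        inliers = [p for p in points
                   if abs(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d) <= tol]
        if len(inliers) > len(best_inliers):
            best, best_inliers = model, inliers
    return best, best_inliers


# Synthetic top layer at z = 1 m plus three points that lie off the plane.
grid = [(0.1 * i, 0.1 * j, 1.0) for i in range(5) for j in range(5)]
outliers = [(0.1, 0.1, 0.7), (0.3, 0.2, 0.5), (0.2, 0.4, 0.8)]
model, inliers = ransac_plane(grid + outliers)
print(len(inliers))
```

Once the plane is known, restricting detection to points close to it amounts to keeping the inlier set returned by the last step.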

Edge Detection and Candidate Boxes Computation
Edges are detected in both the intensity and the depth image acquired by the camera. In particular, discontinuities in the intensity image are detected by applying the standard Canny edge detector, with upper and lower thresholds $U_{canny} = 120$ and $L_{canny} = 60$. Conversely, depth discontinuities are defined as the pixels whose corresponding points belong to the top plane of the pallet (with a tolerance $th_{inlier} = 5$ cm) for which there exists at least one pixel in the neighborhood which does not satisfy the same condition. Straight lines are fitted to the union of the two discontinuity images using the Hough Line Transform.
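The definition of depth discontinuity given above translates directly into a mask operation. The sketch below assumes an 8-connected neighborhood and treats pixels outside the image as neutral; the text only says "neighborhood", so both choices are assumptions.

```python
def depth_discontinuities(inlier):
    """Mark inlier pixels that have at least one non-inlier neighbour.

    inlier[i][j] is True when point p_ij lies on the top plane within tolerance.
    """
    h, w = len(inlier), len(inlier[0])
    edges = [[False] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if not inlier[i][j]:
                continue
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    inside = 0 <= ni < h and 0 <= nj < w
                    if (di or dj) and inside and not inlier[ni][nj]:
                        edges[i][j] = True
    return edges


# A 3 x 3 block of plane pixels surrounded by background: only the
# eight pixels on the rim of the block are discontinuities.
mask = [[1 <= i <= 3 and 1 <= j <= 3 for j in range(5)] for i in range(5)]
edges = depth_discontinuities(mask)
print(sum(cell for row in edges for cell in row))  # 8
```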
Lines are organized in a connectivity graph by connecting them at intersections. A set of candidate boxes $\mathcal{B}$ is generated by locating cycles of length 4 in the graph, i.e., quadrilaterals in the image. A candidate box is only accepted if the edges approximately intersect at a right angle, with tolerance $\theta_{box} = 10^{\circ}$. Moreover, the length of the edges must correspond to the expected size of the box top face (with tolerance $\sigma_{box} = 2$ cm). In the case of a mixed pallet, multiple possible box sizes could be acceptable in this step.
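The acceptance test for a candidate quadrilateral can be sketched as below. It assumes that the four corners are given in order and in metric coordinates on the top plane (so that edge lengths can be compared with the box size in metres); the face size used in the example is illustrative.

```python
import math


def is_valid_box(corners, size, angle_tol_deg=10.0, len_tol=0.02):
    """Check right angles and expected edge lengths of an ordered quadrilateral.

    corners: four (x, y) vertices in order, in metres on the top plane.
    size:    expected (length, width) of the box top face, in metres.
    """
    edges = []
    for k in range(4):
        (x0, y0), (x1, y1) = corners[k], corners[(k + 1) % 4]
        edges.append((x1 - x0, y1 - y0))
    lengths = [math.hypot(ex, ey) for ex, ey in edges]
    if min(lengths) == 0.0:
        return False
    # Consecutive edges must meet at 90 degrees within the angular tolerance.
    for k in range(4):
        (ax, ay), (bx, by) = edges[k], edges[(k + 1) % 4]
        cos_angle = (ax * bx + ay * by) / (lengths[k] * lengths[(k + 1) % 4])
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
        if abs(angle - 90.0) > angle_tol_deg:
            return False
    # Edge lengths must alternate between the two expected side lengths.
    a, b = size
    for first, second in ((a, b), (b, a)):
        expected = [first, second, first, second]
        if all(abs(l - e) <= len_tol for l, e in zip(lengths, expected)):
            return True
    return False


rectangle = [(0.0, 0.0), (0.25, 0.0), (0.25, 0.15), (0.0, 0.15)]
skewed = [(0.0, 0.0), (0.25, 0.0), (0.35, 0.15), (0.10, 0.15)]
print(is_valid_box(rectangle, size=(0.25, 0.15)))  # True
print(is_valid_box(skewed, size=(0.25, 0.15)))     # False
```

For a mixed pallet, the same check could simply be repeated for each admissible face size.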

Genetic Optimization Algorithm
As the set of candidate boxes $\mathcal{B}$ detected in the previous step may contain many spurious or overlapping boxes, a genetic optimization algorithm is used to locate the best subset of boxes $\mathcal{S} \subseteq \mathcal{B}$. The optimization aims at maximizing the area $F(\mathcal{S})$ of the image (in pixels) which is covered by exactly one of the boxes. In particular, the objective function to be optimized is:
$$F(\mathcal{S}) = \sum_{S \in \mathcal{S}} A(S) - \gamma_O \sum_{S \in \mathcal{S}} O(S) - \gamma_I \sum_{S \in \mathcal{S}} I(S) - \gamma_C \sum_{S, S' \in \mathcal{S},\, S \neq S'} C(S, S')$$
The first term is the summation of the areas $A(S)$ (in pixels) of the candidate boxes $S \in \mathcal{S}$, and the remaining three terms are penalty terms, where $O(S)$ is the number of pixels of $S$ which do not belong to the top plane of the pallet, $I(S)$ is the difference between the quadrilateral area and the expected area of the top face of a box, and $C(S, S')$ is the overlapping area between box candidates $S$ and $S'$. Coefficients $\gamma_O = 2$, $\gamma_I = 8$ and $\gamma_C = 2$ are parameters weighting each penalty term.
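The objective described above (total area minus three weighted penalties) can be evaluated with a short sketch. The per-box quantities are assumed to be precomputed and stored in dictionaries, and each overlapping pair is counted once; counting ordered pairs instead would only rescale the overlap weight. All names are illustrative.

```python
from itertools import combinations

# Penalty weights reported in the text.
GAMMA_O, GAMMA_I, GAMMA_C = 2.0, 8.0, 2.0


def objective(subset, area, off_plane, area_error, overlap):
    """Covered area minus weighted penalties for a subset of candidate boxes.

    area, off_plane, area_error: dict box_id -> value in pixels
    overlap: dict frozenset({id_1, id_2}) -> overlapping area in pixels
    """
    score = 0.0
    for s in subset:
        score += area[s] - GAMMA_O * off_plane[s] - GAMMA_I * area_error[s]
    for s1, s2 in combinations(subset, 2):
        score -= GAMMA_C * overlap.get(frozenset((s1, s2)), 0)
    return score


# Three candidates of 100 px each; "c" is a spurious box overlapping "a".
area = {"a": 100, "b": 100, "c": 100}
zeros = {"a": 0, "b": 0, "c": 0}
overlap = {frozenset(("a", "c")): 80}
print(objective({"a", "b"}, area, zeros, zeros, overlap))       # 200.0
print(objective({"a", "b", "c"}, area, zeros, zeros, overlap))  # 140.0
```

In this toy case adding the overlapping candidate lowers the score, which is the behavior the overlap penalty is meant to produce.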
The genetic optimization algorithm operates on a population of $G_{pop} = 100$ individuals $\mathcal{S}_i$, each representing a subset of the candidate box set $\mathcal{B}$. The population is initialized by selecting random subsets of candidate boxes. We define two mutation operators, Mutation1($\mathcal{S}$) and Mutation2($\mathcal{S}$), and a crossover operator Crossover($\mathcal{S}_1$, $\mathcal{S}_2$). Operator Mutation1($\mathcal{S}$), applied with probability $P_{M1} = 0.1$, removes up to three random elements from $\mathcal{S}$. Operator Mutation2($\mathcal{S}$), applied with probability $P_{M2} = 0.1$, removes a random element from $\mathcal{S}$ and replaces it with the element in $\mathcal{B}$ which maximizes $F(\mathcal{S})$. Finally, operator Crossover($\mathcal{S}_1$, $\mathcal{S}_2$), applied with probability $P_C = 0.7$, generates a new individual $\mathcal{S}$ by extracting random elements from $\mathcal{S}_1 \cup \mathcal{S}_2$ as long as they increase the objective function $F(\mathcal{S})$. Furthermore, we define a Fill($\mathcal{S}$, $\mathcal{B}$) operator, which generates a superset of $\mathcal{S}$ by repeatedly adding random candidate boxes from $\mathcal{B}$ until $F(\mathcal{S})$ stops increasing. By applying the Fill operator at the initialization and after each operator, we ensure that each individual is always a greedy local maximum. Optimization ends when the objective function does not improve for $G_{stall} = 20$ consecutive generations.
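The Fill operator is the piece that keeps every individual at a greedy local maximum, and it can be sketched independently of the rest of the genetic algorithm. The version below is one possible reading of "repeatedly adding random candidate boxes": candidates are tried in random order and kept only if they increase the objective. The toy objective on 1D intervals stands in for the real image-based objective and is purely illustrative.

```python
import random


def fill(subset, candidates, objective, rng):
    """Greedily add randomly ordered candidates while the objective increases."""
    current = set(subset)
    best = objective(current)
    pool = [c for c in candidates if c not in current]
    rng.shuffle(pool)
    improved = True
    while improved:
        improved = False
        for c in list(pool):
            trial = current | {c}
            score = objective(trial)
            if score > best:
                current, best = trial, score
                pool.remove(c)
                improved = True
    return current


def toy_objective(subset):
    """Total interval length minus twice the pairwise overlap (stand-in for F)."""
    items = sorted(subset)
    total = sum(end - start for start, end in items)
    overlap = sum(max(0, min(a[1], b[1]) - max(a[0], b[0]))
                  for i, a in enumerate(items) for b in items[i + 1:])
    return total - 2 * overlap


candidates = [(0, 10), (10, 20), (5, 15)]
result = fill(set(), candidates, toy_objective, random.Random(0))
print(sorted(result), toy_objective(result))
```

The result depends on the random order, which is intended: different individuals end up in different local maxima, and the mutation and crossover operators are what move the search between them.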
The genetic optimization algorithm was implemented in C++ using the OpenGA library (https://github.com/Arash-codedev/openGA, accessed on 24 June 2021). The output of the box detection can be appreciated in Figure 6b. Further details on the genetic optimization approach can be found in [30].

Refined Estimation of the Gap between Two Boxes
In the refined estimation of the gap phase, the camera on the arm is moved to the estimated position of the gap between two boxes, at a fixed height above the boxes themselves. The camera is oriented so that the image plane is roughly parallel to the top pallet layer. Moreover, the image x axis is oriented as the expected direction of the box edge. The camera exposure $t_{exp}$ is adjusted to a low value $t_{exp,near}$ to prevent the saturation of the camera sensor. The 3D camera view at this stage can be seen in Figure 6c.
The gap refinement algorithm consists of three steps. First, 2D lines are detected through the probabilistic Hough Transform on the corrected intensity image $R'$. Then, 2D lines are discarded if their angle with respect to the image x axis is too large or if they are too far from the image center, as shown in Figure 6d. Through a proper choice of thresholds, only the horizontal lines in the center of the image remain and most outliers are discarded. Finally, the gap position $p_{ref}$ and orientation $\theta_{ref}$ are computed as the mean position and orientation of the detected lines, and used to determine the target frame for the manipulation operation.
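The filtering and averaging steps can be sketched as follows, taking the line segments as given (for instance, as returned by a probabilistic Hough Transform). The threshold values, the use of the segment midpoint to measure the distance from the image center, and the function name are assumptions made for the example.

```python
import math


def refine_gap(lines, image_center, max_angle_deg=10.0, max_dist_px=30.0):
    """Average the near-horizontal segments close to the image center.

    lines: list of ((x1, y1), (x2, y2)) segments in pixel coordinates.
    Returns (p_ref, theta_ref) or None when no segment survives the filters.
    """
    cx, cy = image_center
    kept = []
    for (x1, y1), (x2, y2) in lines:
        theta = math.degrees(math.atan2(y2 - y1, x2 - x1))
        # Fold the direction into (-90, 90] so a segment and its reverse agree.
        if theta > 90.0:
            theta -= 180.0
        elif theta <= -90.0:
            theta += 180.0
        mx, my = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        close = math.hypot(mx - cx, my - cy) <= max_dist_px
        if abs(theta) <= max_angle_deg and close:
            kept.append((mx, my, theta))
    if not kept:
        return None
    n = len(kept)
    p_ref = (sum(k[0] for k in kept) / n, sum(k[1] for k in kept) / n)
    theta_ref = sum(k[2] for k in kept) / n
    return p_ref, theta_ref


# Two edges of the gap near the center of a 352 x 264 image, plus a vertical outlier.
lines = [((100, 130), (250, 130)), ((100, 134), (250, 134)), ((10, 10), (10, 200))]
print(refine_gap(lines, image_center=(176, 132)))  # ((175.0, 132.0), 0.0)
```

Averaging the two box edges that bound the gap places the reference on the gap centerline, which is where the tool is expected to be inserted.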

Controller Design
During the extraction procedure, an admittance control strategy is implemented for the control of the arm position $x_{ref}$ along the tool insertion direction defined by the selection matrix $\Lambda$. The admittance behavior is based on the tool wrench measure provided by the cobot and it produces a smooth behavior during tool insertion. In particular, a displacement for the control reference is defined at each time instant according to the measured wrench, in order to produce an elastic interaction in the case of collisions and prevent any damage to the boxes. The induced displacement $\Delta x_{des}$ is obtained by means of a first-order digital filter that reduces the wrench estimation noise, expressed in terms of the sample-time delay $z^{-1}$ (Z-transform notation) and of the filter gain $K_p$. The wrench signal is evaluated with respect to a predefined deadzone function $dz(\cdot, \cdot)$ to prevent undesired oscillations of the control.
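To make the role of the deadzone and of the first-order filter concrete, the following sketch shows one plausible discrete-time form. It is not the controller of the paper: the recursion, the shape of the deadzone, the `pole` parameter and all numeric values in the example are assumptions introduced for illustration only.

```python
def deadzone(value, threshold):
    """Return zero inside [-threshold, threshold], a shifted linear value outside."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


class AdmittanceFilter:
    """Hypothetical first-order filter: dx[k] = pole * dx[k-1] + k_p * dz(f[k])."""

    def __init__(self, k_p, pole, f_threshold):
        self.k_p = k_p                  # gain from filtered wrench to displacement
        self.pole = pole                # memory of the previous displacement (0..1)
        self.f_threshold = f_threshold  # wrench below this value is ignored
        self.dx = 0.0

    def update(self, force):
        """Advance one sample and return the displacement along the insertion axis."""
        self.dx = self.pole * self.dx + self.k_p * deadzone(force, self.f_threshold)
        return self.dx


controller = AdmittanceFilter(k_p=0.1, pole=0.5, f_threshold=10.0)
for force in (5.0, 20.0, 0.0):
    print(controller.update(force))
```

With these placeholder values, a 5 N reading stays inside the deadzone and produces no motion, a 20 N contact pushes the reference back, and the displacement then decays once the contact is released.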

Experimental Results
Experiments were performed to show the capabilities of the robot platform and to evaluate its effectiveness in depalletizing tasks. We report unloading tasks in scenarios where the mobile platform is already in place to collect the cardboard boxes, as well as in more dynamic cases where the AMR first approaches the boxes. The experiments were performed with filter gains $K_F$ = 10e−3 and $K_P$ = 0.1, and force threshold $F_1$ = 10 N.

Two-Box Depalletizing and Transportation
A two-box depalletizing task is shown in Figure 7. First, the mobile base approaches a group of boxes by moving to a suitable configuration for the box unloading task, then the cobot raises the 3D camera to an elevated position to detect the boxes. The two-step detection of the gap between adjacent boxes is then executed. Next, the tool is inserted into the gap between the first two boxes and the first box is pulled onto the robot. The procedure is then repeated to manipulate the second box. Finally, the mobile base moves away with the two boxes onboard. In Figure 8, the Cartesian position and the measured wrench on the cobot end-effector are displayed over the course of the task. The low values of the measured wrench confirm that the task was completed without any collision of the tool. The mobile base is exploited to approach the boxes for the unloading task and to transport the boxes away once loaded on the robot.

Figure 9 reports the success rate of the task for 50 box extractions. In most cases, the system performed the extraction on the first attempt. Nonetheless, in a few cases, the first estimation of the gap between boxes was not correct, and a second attempt was required. This is probably due to unexpected changes in light conditions or slight motions of the mobile base. In all cases, the second estimation of the gap was successfully completed and the box was correctly loaded onto the robot.

Figure 9. Overall insertion success rate with respect to gap detection repetitions (vertical axis: success rate, %).

Single Box Extraction
The Cartesian position and the measured wrench on the cobot end-effector during the unloading of a single box are reported in Figure 10a. At first, the box position was estimated by the in-hand camera. Then, the estimated gap between two adjacent boxes was refined to obtain the target frame for tool insertion (t = 4 s). Once the target frame was determined, the tool was aligned with the gap between the boxes (t = 8 s), and an insertion attempt was performed (t = 12.5 s). During the insertion phase, a collision was detected (t = 15.5 s), possibly caused by an inaccurate estimate of the gap position. Therefore, tool insertion was suspended and a new detection of the gap was performed (t = 21 s). Since no further collisions were detected (t = 32 s), the cobot completed the insertion and the box was successfully pulled onto the robot (t = 36 s). When a collision occurs, the admittance control allows for a smooth motion of the end-effector, thus reducing the risk of damaging the robot arm or the target box.
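The detect, insert, and retry sequence described above can be summarized as a short control loop. This is only a sketch: the callbacks `detect_gap`, `try_insert` and `pull_box` are hypothetical placeholders for the perception and motion routines, and the limit of two attempts is an assumption motivated by the reported trials, not a parameter stated in the paper.

```python
def extract_box(detect_gap, try_insert, pull_box, max_attempts=2):
    """Run the insertion with gap re-detection after a collision.

    detect_gap(): returns a target frame for the tool (hypothetical).
    try_insert(target): returns True if the tool was fully inserted,
        False if a collision suspended the insertion (hypothetical).
    pull_box(): pulls the box onto the robot (hypothetical).
    Returns the number of attempts used, or None if all attempts failed.
    """
    for attempt in range(1, max_attempts + 1):
        # A fresh gap estimate is computed before every attempt, so a
        # failed insertion is always followed by a new detection.
        target = detect_gap()
        if try_insert(target):
            pull_box()
            return attempt
    return None
```

Passing the routines as callbacks keeps the retry logic independent of the specific camera and robot interfaces, which also makes it easy to exercise with stubs.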

Complete Layer Depalletizing
A successful example of unloading a pallet layer with four boxes, in a static scenario where the AMR was fixed, is shown in Figure 10a. In this test case, the target boxes were extracted one by one until the pallet layer was empty. The AMR was placed so that all four boxes were inside the robot arm workspace. The boxes were manually removed from the robot platform once unloaded, since the robot platform can carry at most two boxes. The graph in Figure 10b shows that the system was able to successfully extract all four boxes. A collision occurred when the tool was inserted to extract the third box, and a second estimation of the gap was required for the task to be successfully completed. It is noteworthy that each extraction showed a similar behavior and a comparable completion time (with the exception of the collision case). Moreover, the robot was moving at only 10% of its maximum speed. Therefore, the expected performance obtained in Section 4 could be achieved if the robot arm were moving at full speed.

(a) Single box depalletizing task.

Conclusions and Future Work
In this paper, a novel mobile manipulator equipped with a 3D perception system for autonomous depalletizing tasks was presented. The proposed solution shows that it is possible to unload, in a controlled way, just a few boxes from a pallet, without the need to disassemble an entire pallet layer. Therefore, the system can be very effective for building mixed pallets containing different types of goods, in which only a small number of boxes of the same type are needed.
Future work will be devoted to the evaluation of the proposed system in an industrial scenario, which was not possible due to COVID-19-related restrictions. Moreover, the implementation of the depalletizing task will be extended to handle packages of bottles and cans, which will require a more complex manipulation plan. We will also consider the task of building a complete mixed pallet, including the collection of small lots of boxes from multiple single-item pallets and their arrangement on the mixed pallet in a proper order.