Article

Development of a Multifunctional Mobile Manipulation Robot Based on Hierarchical Motion Planning Strategy and Hybrid Grasping

Department of Electromechanical Engineering, Faculty of Science and Technology, University of Macau, Avenida da Universidade, Taipa, Macau 999078, China
*
Author to whom correspondence should be addressed.
Robotics 2025, 14(7), 96; https://doi.org/10.3390/robotics14070096
Submission received: 30 May 2025 / Revised: 3 July 2025 / Accepted: 11 July 2025 / Published: 15 July 2025
(This article belongs to the Section Sensors and Control in Robotics)

Abstract

A mobile manipulation robot combines the navigation capability of unmanned ground vehicles with the manipulation advantage of robotic arms. However, the development of a mobile manipulation robot is challenging due to the required integration of numerous heterogeneous subsystems. In this paper, we propose a multifunctional mobile manipulation robot that integrates perception, mapping, navigation, object detection, and grasping functions into a seamless workflow to conduct search-and-fetch tasks. To realize navigation and collision avoidance in complex environments, a new hierarchical motion planning strategy is proposed by fusing global and local planners. A Control Lyapunov Function (CLF) and a Control Barrier Function (CBF) are employed to realize path tracking and to guarantee safety during navigation. A convolutional neural network and the gripper’s kinematic constraints are adopted to construct a learning-optimization hybrid grasping algorithm that generates precise grasping poses. The efficiency of the developed mobile manipulation robot is demonstrated through indoor fetching experiments, showcasing its promising capabilities in real-world applications.

1. Introduction

The mobile manipulation robot, which integrates a mobile robot’s locomotion with a robotic arm’s manipulation dexterity, holds great promise for applications ranging from industrial automation to domestic services. It is a fundamental and indispensable element of autonomous manufacturing and transportation. The development of a mobile manipulation robot demands both mechanical structure and control algorithm designs. According to the mechanical structure, mobile manipulation robots can be categorized into wheeled robots, tracked robots, legged robots, and hybrid robots. For example, as a popular type of hybrid robot, the legged-wheel robot combines the high speed of a wheeled robot with the obstacle-surmounting ability of a legged robot. The selection of the mobile platform depends on the specific tasks to be accomplished. For instance, DRC-HUBO, developed by the KAIST team for the DARPA Robotics Challenge, has dual mobility modes: a wheel mode and a walking mode. The walking mode allows the robot to traverse a landscape with varying ground levels, such as an area of rubble. The wheel mode is ideal for traversing normal ground quickly and stably. In addition, the manipulators can be constructed from one or more robotic arms, each with multiple degrees of freedom (DOFs).
According to the control strategy, mobile manipulation robots can be categorized into whole-body control and decoupled control. The whole-body control problem is difficult to solve due to its high-dimensional nature, although it better guarantees the optimality of the mobile manipulation. In the motion planning problem, the mobile manipulation robot can be treated as an integrated system with redundant DOFs and a sophisticated kinematic model, which can be handled by sampling-based methods [1], model predictive control (MPC) methods [2], and reinforcement learning (RL) methods [3,4]. In the literature, CLF and CBF have been incorporated into the reactive controllers of a mobile vehicle and a bipedal robot to mitigate waypoint deviation and eliminate non-smooth motions [5,6,7]. Quadratic Programming (QP) has been extended and employed to optimize velocity commands for a 9-DOF mobile manipulator [8,9,10,11]. Additionally, QP has been utilized to solve the recurrent neural network optimization problem for the tracking control of mobile manipulators [12]. The fusion of different algorithms is also noticeable. For example, the CLF-RRT* method was created for the smooth feedback control of a bipedal robot [13]; the MPC-CBF algorithm was applied to obstacle avoidance of an unmanned vehicle [14]; the RL-CBF method was proposed in [15] to provide a safety guarantee for the learning process; and QP was integrated with a deep neural network to allocate velocity commands for a robotic arm and mobile base via a learned inverse dynamics model [16]. In contrast, the decoupled control strategy is more computationally efficient. However, the collaboration between the independent navigation system of the mobile platform and the manipulation system of the robotic arms is a critical issue.
The ability to grasp objects with precision and stability in a dynamic and unstructured environment represents an unavoidable challenge in mobile manipulation. To implement the grasping operation of robotic arms, we can utilize optimization methods [17,18,19,20] and learning-based methods [21,22,23,24]. The former are rooted in specific kinematic models of grippers, which can subsequently be employed as constraints in the optimization process. The learning-based and data-driven methods are more straightforward and avoid mathematical derivation. Nevertheless, their requirement for empirical data is sometimes hard to meet. Recently, multiple kinds of hybrid grasping algorithms have emerged, which combine the two aforementioned methods to achieve accurate and swift grasping [25]. In particular, optimization is commonly used to refine preliminary grasping configurations, and learning is adopted to predict the grasping quality, which is subsequently integrated into the grasping optimization. Moreover, swift and safe motion planning is crucial for the operation of mobile manipulation robots. However, no universal method is available for deployment on a general mobile manipulation robot.
In this work, the design and development of a multifunctional mobile manipulation robot is implemented, and a new hierarchical motion planning strategy is proposed by integrating global and local planners with a CLF-CBF-empowered refiner. It provides a robust framework for navigation and collision avoidance in complex environments. The algorithm enables the robot to generate smooth and collision-free trajectories, over-approximate obstacles, and incorporate safety constraints into the control loop. The main contributions of the paper include the following:
(a)
Development of a reliable mobile manipulation system by combining the state-of-the-art perception, mapping, navigation, and grasping subsystems.
(b)
Construction of a hierarchical navigation system with CLF restricting deviation from trajectory and CBF ensuring obstacle avoidance.
(c)
Incorporation of learning-based and optimization-based grasping algorithms with the antipodal gripper’s kinematic constraints to achieve a higher grasping success rate.
We verify the efficiency of the proposed mobile manipulation system by performing repetitive fetching experiments.
The proposed approach enhances the robot’s ability to interact with the environment, providing a valuable tool for service applications. The remaining parts of the paper are organized as follows. Section 2 reviews related works in mobile manipulation. Section 3 describes system design and implementation details. Section 4 introduces the navigation algorithm. Section 5 explains the hybrid grasping method. Section 6 presents the experimental setup and results. Section 7 concludes the work.

2. Related Works

There is substantial research on mobile manipulation robots nowadays. The control of the mobile platform and robotic arms primarily involves sampling-based methods, model predictive control, and learning-based methods. In [26], a mobile manipulator was built from an ABB YuMi dual-arm robot and a Clearpath Ridgeback mobile platform. The robot was designed to conduct an autonomous fetch-and-carry workflow in unstructured indoor environments. The integration of different actions (e.g., mapping, navigation, object detection, and object mapping), dense reconstruction, and grasp planning was achieved by a state machine with a fixed action sequence and transition conditions. A real-time robot control framework (named PODO) was presented in [27], which is expandable in both software and hardware. The framework can accommodate multiple sub-processes running simultaneously. Hybrid position and force control was implemented to calculate the desired force to be generated by the end effectors.
Apart from the conventional sampling-based and model predictive control methods, learning-based approaches can be used to solve online high-dimensional whole-body control problems. For example, Proximal Policy Optimization (PPO) was used as a reinforcement learning algorithm to train a deep neural network whose inputs comprised LiDAR scan data, the goal position in the end-effector frame, and the robot’s internal states (positions and velocities of the base and joints) [28]. With a handcrafted reward function, the learning-based method could generate efficient actions, leading to collision-free passage through a corridor. In [29], a humanoid mobile manipulator was equipped with two planning hierarchies: a high-level online redundancy resolution based on a neural-dynamic optimization algorithm in operational space, and a low-level reinforcement learning scheme in joint space to learn trajectories under uncertainties. In addition, a hierarchical whole-body control framework was proposed to prevent interference between different functions by projecting the Jacobian matrices of lower-priority tasks into the null space of higher-priority tasks, so that collision avoidance would not affect the completion of navigation and grasping of target objects [30].
In addition to obstacle collision avoidance, self-collision avoidance is a crucial subject in the safe operation of mobile manipulators. The collision avoidance volume can be represented by spherically extended convex hulls that tightly cover the actual volume of the robot, and a damping design is applied to each contact point with respect to the relative motion between the two corresponding contact points [31]. Crowdsourcing through gamification can improve the success rate of object detection by bringing human reasoning into the control loop of autonomous robots. When a mobile manipulator’s head camera with a pre-trained CNN failed to identify bottles confidently among other groceries, human operators were asked to complete classification games in real time to expand the learning dataset and help the robot understand unknown objects [32].
Interaction with a dynamic environment is crucial for industrial and service mobile manipulators. A logistics mobile manipulator employs a multi-layer sensor setup (consisting of six RGB-D cameras and three LiDARs) to detect humans in a shared workspace [33]. The anthropomorphic robot Cosero can detect the legs and torsos of people with LiDAR and subsequently identify registered and unknown individuals with face detection. It segments the head, upper body, and arms of a human instructor in the depth image and takes the line through the eyes and hand as the target direction [34]. An industrial mobile manipulator is equipped with a state machine to perform navigation, docking, object detection, and grasping functions seamlessly in the correct sequence. In the object detection task, the RGB image and depth data were fed into the PVN3D perception network together to estimate the target object’s position in the end effector’s frame [35,36]. Mobile manipulators must be within a reachable zone with regard to the target object’s position. Reference [37] employed deep reinforcement learning approaches to generate velocity commands for a mobile base using the robotic arms’ feedback as inputs. A simple reward function was designed to penalize collision and guarantee adjacency to the target. Model predictive control respects the constraints on mechanical stability and joint limits while tracking a given trajectory. Euclidean signed distance fields were used to represent the surrounding environment, and their gradients were exploited in the MPC optimization process to achieve real-time obstacle avoidance [38].
Overall, the research on mobile manipulation robots highlights both the advancements and the remaining challenges in achieving precise grasping and effective interaction in dynamic environments. Addressing these challenges is critical for the continued evolution of mobile manipulators in real-world applications. This work presents the development of a multifunctional mobile manipulation robot for practical applications.

3. System Design and System Integration

In this section, the system hardware setup, signal communication, and system integration of a mobile manipulation robot are introduced.

3.1. System Hardware Design

For the autonomous fetching task, the mobile manipulator is composed of a four-wheel, differential-drive mobile platform and two robotic arms (see Figure 1). It is equipped with multiple sensors, such as RGB-D cameras and a 3D LiDAR on the front and rear. The two robotic arms are mounted on top of the rear wing to facilitate the manipulation process and are fitted with a gripper and a camera, respectively (see Figure 2).

3.1.1. Mobile Platform

The specifications of the mobile platform are tabulated in Table 1. The differential-drive unmanned ground vehicle (UGV) (model: DGT-01L, Shenzhen Yuhesen Technology, Shenzhen, China) has four large-torque motors. Each motor controls an individual wheel, allowing the robot to turn in place through 360°. The platform is fully covered with a steel plate to protect the computers and motors inside. Its obstacle-surmounting ability allows it to navigate various terrains. The 3D LiDAR has a detection range of 0.4 to 150 m, which enables the platform to map both indoor and outdoor environments. The two RGB-D cameras on the front and rear of the mobile platform have a detection range of 0.2 to 4 m. The point clouds of the LiDAR and cameras are integrated to form a laser scan on the established obstacle map.

3.1.2. Robot Manipulators

For executing the manipulation task, two robot manipulators (model: UR3e, Universal Robots, Odense, Denmark) are mounted on the mobile platform. Their manipulation capacity is sufficient for the majority of target objects, such as water bottles and books. One robotic arm is connected to a two-finger gripper (model: 2F-85, Robotiq, Lévis, QC, Canada) to perform the grasping action, while the other is equipped with a camera (model: Azure, Microsoft Corporation, Redmond, WA, USA). The gripper’s force is adjustable from 20 N to 235 N, allowing it to grasp targets of various textures and hardnesses. The camera is chosen for its high resolution and broad capture angle; it integrates a depth camera, an RGB camera, audio, and an inertial measurement unit.

3.2. Communication Framework

The sensors, control boards, industrial computer, and actuators are connected through a communication framework, as illustrated in Figure 2. The industrial computer, which runs the Ubuntu 18.04 operating system, is housed inside the vehicle. During operation, it connects wirelessly to the operator’s laptop via WiFi. The vehicle can also be operated manually with a remote controller. The 3D LiDAR is used in the mapping and navigation processes and yields maps with a positioning repeatability of approximately 3 cm.
The front and rear cameras provide visual information about the surrounding environment. The point clouds of these sensors are transmitted to the industrial computer and integrated to perceive static and movable obstacles in real time. Odometer and IMU data help to locate the robot on the map. Information from the motors in the robotic arms and gripper is used to determine their configuration relative to the mobile platform. The overall configuration of the mobile manipulator is integrated by the industrial computer and can be monitored with the Rviz software (version 1.14.2) on the operator’s laptop. In the object detection and grasping process, the camera transmits the visual and depth information of target objects to identify the target and generate an optimized grasping position. Once the target position is secured, it is transmitted to the industrial computer to calculate an expected trajectory. Subsequently, the industrial computer generates the expected linear and angular velocities at the robot’s current position, which are then sent to the mobile platform’s vehicle control unit (VCU) via a USB-to-CAN adapter. The VCU is responsible for monitoring and commanding the proportional–integral–derivative (PID) controllers of the four servomotors. They employ velocity, position, and torque control loops to regulate each motor’s angular velocity and report the actual values.

3.3. System Integration and System Behaviors

The completion of mobile manipulation tasks requires collaborative motion planning and control of the mobile base and manipulators. As shown in Figure 3, the workflow can be decomposed into three fundamental steps: object detection, navigation of the mobile platform, and grasp planning of the manipulators. These functions are addressed in the following sections.

3.3.1. Task Planning

Task planning aims to connect the heterogeneous subsystems required for performing a consistent task sequentially. It enables the robot to conduct sophisticated tasks, such as setting up a dinner table, by activating and terminating behaviors in a timely manner, and it gives the robot a degree of autonomy and intelligence through pre-specified to-do lists. There are mainly three kinds of mechanisms for high-level task planning: finite state machines [39], behavior trees [40], and large language models [41]. The finite state machine is relatively simple and efficient compared to the other methods. However, it scales poorly and fails to cope with large numbers of tasks in a long execution sequence. In this paper, since the mobile manipulation task is straightforward and the experimental setting is of a small scale, the finite state machine is adopted to organize the functions and activate each one according to the robot’s condition.
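As an illustrative sketch (not the deployed implementation), the following Python snippet shows how such a finite state machine could sequence the search-and-fetch behaviors; the state names and the robot.* subsystem calls are hypothetical placeholders for the interfaces described in the following subsections.

```python
# Minimal sketch of a finite state machine for the fetch task.
# State names and the robot.* checks are hypothetical placeholders
# for the actual subsystem interfaces of Section 3.3.

class FetchTaskFSM:
    def __init__(self):
        self.state = "DETECT"                      # initial state

    def step(self, robot):
        """Advance one control tick; activate one function per state."""
        if self.state == "DETECT":
            if robot.detect_target():              # YOLO-based detection
                self.state = "NAVIGATE"
        elif self.state == "NAVIGATE":
            if robot.navigate_to_target():         # hierarchical planner
                self.state = "GRASP"
        elif self.state == "GRASP":
            if robot.grasp_target():               # hybrid grasping pipeline
                self.state = "DELIVER"
        elif self.state == "DELIVER":
            if robot.navigate_to_place_stage():    # drop-off navigation
                self.state = "DONE"
        return self.state
```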

3.3.2. Mapping and Object Detection

The first step of the mobile manipulation task is to map the surrounding environment and localize the robot itself. Simultaneous localization and mapping (SLAM) aims to generate the relative positions of obstacles and target objects (see Figure 4). Our robot utilizes LIO-SAM (LiDAR Inertial Odometry via Smoothing and Mapping) to build the 3D map in a large-scale environment and the Cartographer algorithm to build the 2D map in a small-scale indoor environment. LIO-SAM achieves the real-time, tightly coupled sensor fusion of LiDAR and IMU data. This integration allows for highly accurate and robust 3D mapping and localization, even in dynamic and complex environments. The use of a factor graph for optimization helps maintain consistency and precision in the generated maps. Cartographer consumes fewer computational resources compared to other SLAM algorithms. It first constructs a series of grid submaps through the data fusion of LiDAR and IMU. Subsequently, it eliminates cumulative errors and reduces drift using loop-closure optimization. Eventually, the refined submaps are stitched together to form a global map in the back-end optimization.
After building the map, the robot needs to search for the target and locate its position using the RGB-D camera mounted on the end of a robotic arm. The search procedure continues until the target object is located. Then, its 3D position, consisting of the camera coordinates and depth information, is projected onto the map and transferred to the base controller, which triggers the command signal for the mobile base to approach the target. Here, the YOLOv11 algorithm is selected to undertake the object identification function due to its efficiency and fast detection speed (see Figure 5).
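The projection of a detected pixel and its depth value into the map frame follows the standard pinhole camera model; the sketch below illustrates this step under the assumption that the camera intrinsics K and the camera-to-map transform T_map_cam are available from calibration and TF, and it is not the paper's exact implementation.

```python
import numpy as np

def pixel_to_map(u, v, depth, K, T_map_cam):
    """Back-project a detected pixel (u, v) with depth (m) into the map frame.

    K is the 3x3 camera intrinsic matrix and T_map_cam the 4x4 homogeneous
    transform from the camera frame to the map frame (both assumed given);
    this is the standard pinhole model used as an illustration only.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    p_cam = np.array([(u - cx) * depth / fx,
                      (v - cy) * depth / fy,
                      depth,
                      1.0])
    return (T_map_cam @ p_cam)[:3]     # 3D target position in the map frame
```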

3.3.3. Navigation

Once the detection process is completed, the navigation function is activated by the finite state machine, which automatically sets the position of the target center as the navigation destination (see Figure 6). The target position is sent to the global path planner stack by publishing a move_base_simple/goal topic. The navigation process is based on the ROS move_base node, which provides an interface connecting the sensor data, SLAM algorithm, path planners, and motion controllers. Global path planning is conducted by the A* algorithm, which employs a heuristic function to search the grid map given by the SLAM process. Afterwards, the path from the robot’s current position towards the destination is transmitted to the dynamic window approach (DWA) local planner to translate the path from a point-mass model to a four-wheel, differential-drive kinematic model and to cope with movable, unknown obstacles. As a result, the paths generated by the local planner satisfy the robot’s kinematic constraints and are collision-free in real-time operation. The motion controller is the final step that refines a path before it is transmitted to the mobile base’s VCU to activate the motors. Here, we utilize a Control Barrier Function Quadratic Programming method to enforce the safety constraint while smoothing the trajectory.
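Since the goal is delivered through the standard move_base_simple/goal topic, a minimal rospy sketch of this hand-off could look as follows; the map frame name, node setup, and quaternion argument are assumptions of the sketch rather than the exact implementation.

```python
import rospy
from geometry_msgs.msg import PoseStamped

def send_nav_goal(x, y, yaw_quat):
    """Publish the detected target center as a navigation goal for move_base.

    Assumes rospy.init_node(...) has already been called by the enclosing node
    and that the goal is expressed in the 'map' frame.
    """
    pub = rospy.Publisher("/move_base_simple/goal", PoseStamped, queue_size=1)
    rospy.sleep(0.5)                         # let the publisher register
    goal = PoseStamped()
    goal.header.frame_id = "map"
    goal.header.stamp = rospy.Time.now()
    goal.pose.position.x = x
    goal.pose.position.y = y
    (goal.pose.orientation.x, goal.pose.orientation.y,
     goal.pose.orientation.z, goal.pose.orientation.w) = yaw_quat
    pub.publish(goal)
```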

3.3.4. Scene Understanding and Grasping

The navigation process is terminated when the mobile manipulator is within 300 mm of the target object’s center of mass (COM) and a potential collision with the pedestal is detected by the front RGB-D camera. Once the target is within the reachable range of the robotic arms, the scene understanding and grasping process is activated by the state machine. To successfully grasp target objects, the robot has to understand their physical characteristics, e.g., shapes and edges. We employ image segmentation to extract target objects from the background. A dense reconstruction process creates a complete point cloud of the target and facilitates grasping pose detection. Subsequently, to generate grasping positions for the antipodal robotic gripper, a CNN-based learning-optimization module is employed to provide all feasible grasping positions for the target objects. The learning process is based on a customized neural network structure, which incorporates directional estimates and rotational parameters. The network is trained on the Cornell Grasping Dataset. Finally, the optimization process, subject to the antipodal gripper’s kinematic constraints and the object’s geometry, refines the grasping pose using the exponentially weighted gradient descent and root-mean-square propagation components of the adaptive moment estimation (Adam) algorithm. With the precise grasping position of the end effector, the motion planning of the robotic arm is conducted by the RRT-connect algorithm. The other arm, with the camera, moves away from the center and provides real-time images to boost the precision of the grasping. To avoid collision between the robotic arms and the mobile platform, an intuitive movement restraint confines the arms’ end positions within a semi-cylinder.
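A minimal sketch of the semi-cylinder movement restraint is given below; the radius and height limits are illustrative values rather than the robot's calibrated parameters, and the semi-cylinder is taken as the forward half-space intersected with a cylinder about the vertical axis.

```python
import numpy as np

def within_semi_cylinder(p_end, radius=0.6, z_min=0.1, z_max=0.9):
    """Check whether an end-effector position (in the arm base frame) stays
    inside an allowed semi-cylinder workspace.

    p_end = (x, y, z) in meters; the numeric limits are illustrative.
    """
    x, y, z = p_end
    in_front = x >= 0.0                     # forbid reaching backward
    in_radius = np.hypot(x, y) <= radius    # stay within the cylinder radius
    in_height = z_min <= z <= z_max         # stay within the height band
    return in_front and in_radius and in_height
```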

4. Optimization-Based Motion Control Algorithm for Mobile Robot

4.1. Modeling of the Differential-Drive Mobile Platform

The mobile manipulator is composed of a 3-DOF, four-wheel, differential-drive non-holonomic UGV equipped with two 6-DOF robotic arms. The ipsilateral wheels of the UGV always maintain the same speed. Therefore, its forward linear speed $v_f$ and whole-body rotational speed $\omega_w$ can be deduced from the left ($\varpi_L$) and right ($\varpi_R$) wheel speeds as
$$\omega_w = \frac{r}{w}\,(\varpi_R - \varpi_L), \qquad v_f = \frac{r}{2}\,(\varpi_R + \varpi_L),$$
where the wheel radius $r$ and the UGV track width $w$ are known parameters. Transformation into the world coordinate system yields the streamlined mathematical model below:
$$\dot{s} = \begin{bmatrix}\dot{\theta}\\ \dot{x}\\ \dot{y}\end{bmatrix} = \begin{bmatrix}-\dfrac{r}{w} & \dfrac{r}{w}\\[4pt] \dfrac{r}{2}\cos\theta & \dfrac{r}{2}\cos\theta\\[4pt] \dfrac{r}{2}\sin\theta & \dfrac{r}{2}\sin\theta\end{bmatrix} \begin{bmatrix}\varpi_L\\ \varpi_R\end{bmatrix},$$
where $\omega_w = \dot{\theta}$, $\dot{x} = v_f\cos\theta$, and $\dot{y} = v_f\sin\theta$. The gripper’s and camera’s positions in the world frame are eventually obtained through the transformation of 3D coordinate frames between the robotic arm and the mobile base:
$$T_e^w = T_b^w(x, y, \theta)\, T_a^b\, T_e^a,$$
where $T_e^w$ represents the end-effector’s pose in the world frame, which is obtained by multiplying the base pose in the world frame $T_b^w$, the arm’s pose relative to the fixed base $T_a^b$, and the end-effector’s pose in the arm coordinate frame $T_e^a$.
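For illustration, the differential-drive relations and the coordinate-frame chain above can be evaluated with a short Python sketch; the arm-mounting transform T_a_b and the UR3e forward-kinematics transform T_e_a are assumed to be provided elsewhere.

```python
import numpy as np

def base_twist(w_left, w_right, r, w):
    """Body rates of the differential-drive base from wheel speeds (see Eq. above)."""
    omega = r / w * (w_right - w_left)      # yaw rate of the platform
    v_f = r / 2.0 * (w_right + w_left)      # forward linear speed
    return v_f, omega

def T_base_world(x, y, theta):
    """Planar homogeneous transform of the base pose in the world frame."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0, x],
                     [s,  c, 0.0, y],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

# End-effector pose in the world frame: chain the base, arm-mounting, and
# arm forward-kinematics transforms (the latter two are assumed given):
#   T_e_w = T_base_world(x, y, theta) @ T_a_b @ T_e_a
```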

4.2. Construction of Lyapunov Function

The global planner (A*) takes the global grid map and the target position and generates a reference path from the current position to the target. The path is represented by $\epsilon(x_r, y_r, \theta_r)$. CLF is utilized to guarantee successful tracking performance (see Figure 7). We consider a control affine system as follows:
$$\dot{x} = f(x) + g(x)u,$$
where the locally Lipschitz $f(x): \mathbb{R}^n \to \mathbb{R}^n$ governs the evolution of the system state $x \in \mathbb{R}^n$ in the absence of a control input $u \in \mathbb{R}^m$, and $g(x): \mathbb{R}^n \to \mathbb{R}^{n \times m}$ describes the relationship between the system states and inputs. The continuously differentiable function $V(x): \mathbb{R}^n \to \mathbb{R}$ represents the system energy, which should diminish while the system is stabilizing.
If the Lyapunov function V ( x ) satisfies
  • $\mathcal{Z}_c := \{x \in \mathbb{R}^n : V(x) \le c\}$,
  • $V(x_e) = 0$ and $V(x) > 0$ for all $x \in \mathbb{R}^n \setminus \{x_e\}$,
  • $\inf_{u \in U} \dot{V}(x, u) < 0$ for all $x \in \mathcal{Z}_c \setminus \{x_e\}$,
where the constant $c > 0$, then every $x \in \mathcal{Z}_c$ is asymptotically stabilizable to the equilibrium $x_e$, i.e., $\mathcal{Z}_c$ is a region of attraction. Additionally, the decreasing rate of $V(x)$ can be prescribed by a positive constant $\lambda \in \mathbb{R}$, which yields
$$\inf_{u \in U} \dot{V}(x, u) + \lambda V(x) \le 0,$$
where
$$\dot{V}(x, u) = \nabla V(x)\cdot\dot{x} = \nabla V(x)\cdot f(x) + \nabla V(x)\cdot g(x)u = L_f V(x) + L_g V(x)u.$$
The greater the value of $\lambda$, the more aggressive the decline of the CLF will be, and $V(x)$ becomes exponentially stabilizable. The translational tracking error can be written as
$$e = [\,e_x \;\; e_y\,]^T = [\,x_r - x \;\; y_r - y\,]^T.$$
Therefore, the CLF can be intuitively constructed as
$$V(x) = \tfrac{1}{2}\,e^T e + \beta \sin(\delta),$$
where $\delta = |\theta_r - \theta|/2$ and $\beta$ is the weight of the heading-angle deviation. Ideally, the CLF maintains the maximum decreasing rate of $\lambda V(x)$ and reaches its equilibrium state $x_e$, where both translational and rotational tracking errors are eliminated.

4.3. Construction of Control Barrier Function

To guarantee the avoidance of collision with obstacles, CBF is introduced as a safety filter of the control input. The safety guarantee problem is formulated as finding an invariant safe set Ƶ s of system states,
$$\mathcal{Z}_s = \{x \in \mathbb{R}^n : h(x) \ge 0\}, \qquad \partial\mathcal{Z}_s = \{x \in \mathbb{R}^n : h(x) = 0\},$$
which means that $\mathcal{Z}_s$ is the zero-superlevel set of a continuously differentiable function $h(x)$, whose derivative along the system dynamics is
$$\dot{h}(x, u) = \nabla h(x)\cdot\dot{x} = \nabla h(x)\cdot f(x) + \nabla h(x)\cdot g(x)u = L_f h(x) + L_g h(x)u.$$
If $\dot{h}(x, u) \ge 0$ for all $x \in \mathcal{Z}_s$, then $\mathcal{Z}_s$ is a forward invariant set. That is, a system trajectory starting in $\mathcal{Z}_s$ will stay inside it for all future time. To enforce an exponential decay rate on $h(x)$, a positive constant $\alpha \in \mathbb{R}$ is introduced, which induces
$$\sup_{u \in U}\,[\,L_f h(x) + L_g h(x)u\,] + \alpha\bigl(h(x)\bigr) \ge 0,$$
where $\alpha$ is the lower bound of the decreasing rate of $h(x)$. Assume that the obstacle can be confined within a circle of radius $r_u$, whose center $(x_u, y_u)$ is accessible through sensor measurement. The CBF is then constructed as follows:
$$h(x) = (x - x_u)^2 + (y - y_u)^2 - r_u^2.$$

4.4. Quadratic Program Formulation

The above-mentioned CLF and CBF constraints are linear in the control input. The trajectory tracking and obstacle avoidance problems are formulated through a quadratic program controller. A non-negative slack variable $\mu$ is added into the cost function and the CLF constraint to relax the tracking requirement when it conflicts with the collision avoidance condition. The value of $\mu$ increases to keep the controller feasible when avoiding an obstacle forces a deviation from the reference path. That is,
$$\begin{aligned} \operatorname*{argmin}_{u,\,\mu}\;\; & (u - u_{ref})^T H (u - u_{ref}) + p\,\mu^2 \\ \text{s.t.}\;\; & L_f V(x) + L_g V(x)u + \lambda V(x) \le \mu, \\ & L_f h(x) + L_g h(x)u + \alpha\bigl(h(x)\bigr) \ge 0. \end{aligned}$$
This min-norm controller is produced by solving a convex optimization problem via a quadratic program solver (CVXOPT) in real time.
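A minimal sketch of this CLF-CBF quadratic program using CVXOPT is given below, assuming the Lie-derivative terms have already been evaluated at the current state; it illustrates the formulation rather than the deployed controller code.

```python
import numpy as np
from cvxopt import matrix, solvers

def clf_cbf_qp(u_ref, H, p, LfV, LgV, lam, V, Lfh, Lgh, alpha, h):
    """Solve the min-norm CLF-CBF QP over z = [u, mu] (sketch only).

    H is the m x m cost weight, p the slack weight; LgV and Lgh are row
    vectors of length m, and the remaining arguments are scalars assumed
    to be precomputed for the current state.
    """
    m = len(u_ref)
    # Cost (u - u_ref)^T H (u - u_ref) + p*mu^2 written as 1/2 z^T P z + q^T z
    P = np.zeros((m + 1, m + 1))
    P[:m, :m] = 2.0 * H
    P[m, m] = 2.0 * p
    q = np.concatenate([-2.0 * H @ u_ref, [0.0]])
    # Inequalities G z <= h_vec:
    #   CLF:  LgV u - mu <= -LfV - lam*V
    #   CBF: -Lgh u      <=  Lfh + alpha*h
    G = np.zeros((2, m + 1))
    G[0, :m] = LgV
    G[0, m] = -1.0
    G[1, :m] = -np.asarray(Lgh)
    h_vec = np.array([-LfV - lam * V, Lfh + alpha * h])
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h_vec))
    z = np.array(sol["x"]).flatten()
    return z[:m], z[m]          # safe control input and slack value
```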

5. Learning-Optimization Hybrid Grasping Method

After navigating to the target’s vicinity, the robot starts to grasp the target item. First, the wrist-mounted camera scans the grasping scene, and the scene interaction module is activated automatically. Using the depth image from the RGB-D camera, a tailored CNN trained on the grasping dataset produces an interference probability map, in which all potential grasping position candidates are presented with directional information. Subsequently, an optimization module takes the probability map as a scene interaction constraint and combines it with the antipodal gripper’s kinematic constraints to determine the best grasping candidate. The overall hybrid grasping workflow is illustrated in Figure 8.

5.1. Scene Interaction Module

The grasping framework is decomposed into the elements of the grasping information and kinematics of the gripper, which are handled separately. This module focuses on predicting interaction probabilities for various states within the defined interaction space. The grasping information is composed of contact position and direction, which is represented by
$$g = [\,x_g \;\; y_g \;\; \theta_g\,]^T \in \mathbb{R}^3,$$
where $g_p = [x_g, y_g]^T \in \mathbb{R}^2$ indicates the contact position and $\theta_g \in \mathbb{R}$ stands for the approach direction of the antipodal gripper. The probability of feasible grasping information $g$ is incorporated into an interference map through the scene interaction module.
We create a fingertip dataset by isolating grasping labels from existing datasets, emphasizing the positional and directional characteristics of the gripper–object interactions. A CNN is employed to predict both interaction probabilities and orientations of the fingertips using this fingertip dataset. The initial predictions of the CNN are refined with a data ensemble approach, which improves the precision of the interaction probability estimation. During the process, each original RGB image is rotated by an angle $\gamma_i$ ($i = 0, \dots, 7$) to mitigate the possible disturbance caused by the robotic arm’s unstable operation. Then, the predicted probabilities are interpolated across the continuous interaction space to form a complete interaction probability map.

5.1.1. Fingertip Dataset Generation

The fingertip of the gripper is defined by a tuple $F(o_x, o_y, \phi, p)$, where $o_x$ and $o_y$ are the pixel coordinates of the fingertip’s center location, $\phi$ stands for the orientation of the fingertip constrained within $[-\pi, \pi]$, and $p$ represents the probability of the fingertip being present at the specified coordinates. The perception information is represented by $\Upsilon \in \mathbb{R}^{C \times W_i \times H_i}$, where $C$, $W_i$, and $H_i$ denote the channel, width, and height of the image, respectively. The dataset leveraged for this purpose is the Cornell Grasping Dataset. It includes 885 RGB-D images and 5110 rectangular grasping labels characterized by $G(p_c, \phi, w_g, h_g)$, representing the rectangle’s center location, orientation, opening width, and height. Eventually, in the annotation process, each two-finger grasp is broken down into fingertip regions, which are supplemented with angle information. The resultant fingertip dataset comprises a probability heatmap $Q \in \mathbb{R}^{W_i \times H_i}$ (indicating the probability of the presence of fingertips) and an orientation map $\Phi \in \mathbb{R}^{W_i \times H_i}$ (showing the orientation of the fingertips). The probability in the heatmap ranges from 0 to 1, and the fingertip orientation in the orientation map varies within $(-\pi, \pi)$.
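A simple sketch of the annotation step is shown below: each Cornell grasp rectangle is decomposed into two fingertip labels at the ends of its closing axis; assigning a unit probability to positive labels is an assumption of the sketch, not a detail stated in the paper.

```python
import numpy as np

def rectangle_to_fingertips(center, angle, width_open):
    """Decompose one grasp rectangle into two fingertip labels.

    center = (ox, oy) in pixels, angle = closing-axis direction (rad),
    width_open = opening width in pixels. Returns two (ox, oy, phi, p)
    tuples matching the fingertip definition above; p = 1.0 is assumed
    for positive labels in this sketch.
    """
    ox, oy = center
    dx = 0.5 * width_open * np.cos(angle)
    dy = 0.5 * width_open * np.sin(angle)
    # Fingertips sit at the two ends of the closing axis, facing each other.
    f1 = (ox + dx, oy + dy, np.arctan2(-dy, -dx), 1.0)
    f2 = (ox - dx, oy - dy, np.arctan2(dy, dx), 1.0)
    return f1, f2
```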

5.1.2. Fingertip Prediction

The task of the CNN is to predict both the probability heatmap and the orientation map. $\Phi$ is reformulated as $[\Phi_{\sin}, \Phi_{\cos}]$ to facilitate the calculation. The aim is to derive $M_h = (Q, \Phi_{\sin}, \Phi_{\cos}) \in \mathbb{R}^{3 \times W_i \times H_i}$ from the input RGB-D image $\Upsilon$. The labeled fingertip dataset, along with a feature pyramid network and an encoder–decoder structure, is utilized to train the neural network to approximate the mapping function.

5.1.3. Ensemble of Scalar and Directional Data

This approach involves rotating the input image to improve the regression outcomes. The key steps include the following:
  • Applying rotational parameters $\gamma_i$ to transform the image, where $T_{\gamma_i}$ denotes the input transformation with rotational parameter $\gamma_i$.
  • Transforming estimates to restore their original alignment.
  • Calculating the arithmetic mean for scalar values and the circular mean for directional data.
  • The above methodology produces both the scalar estimate and the directional estimate as follows:
$$Q_T = \frac{1}{n}\sum_{i=1}^{n} T_{\gamma_i}^{-1} Q_{\gamma_i}, \qquad \Phi_T = \operatorname{arctan2}\!\left(\frac{\Phi_T^{\sin}}{n},\, \frac{\Phi_T^{\cos}}{n}\right).$$
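The ensemble step can be summarized by the short sketch below, which assumes the per-rotation predictions have already been rotated back to the original alignment; it illustrates the arithmetic and circular means rather than the exact training-time code.

```python
import numpy as np

def ensemble_predictions(Q_list, sin_list, cos_list):
    """Fuse predictions from the n rotated copies of the input image.

    Q_list, sin_list, and cos_list hold heatmaps already transformed back
    to the original alignment (the inverse rotations T_gamma^-1 are applied
    beforehand). The scalar map is averaged arithmetically; the orientation
    is recovered via the circular mean, as in the equation above.
    """
    Q_T = np.mean(Q_list, axis=0)                 # arithmetic mean
    sin_T = np.mean(sin_list, axis=0)
    cos_T = np.mean(cos_list, axis=0)
    Phi_T = np.arctan2(sin_T, cos_T)              # circular mean
    return Q_T, Phi_T
```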

5.1.4. Interpolating to Interference Space

With a B-spline mapping function $f_{\mathrm{Bspline}}$, the interpolation from the $(o_x, o_y)$ pixel coordinates to the interference space $g$ is obtained as
$$B_Q(x_g, y_g) = f_{\mathrm{Bspline}}\bigl(Q_T(o_x, o_y)\bigr), \qquad B_\Phi(x_g, y_g) = f_{\mathrm{Bspline}}\bigl(\Phi_T(o_x, o_y)\bigr).$$
The Dirac delta distribution is utilized to deal with the interaction probability along the θ g axis, i.e.,
$$P(g) = \begin{cases} B_Q(x_g, y_g), & g \in M, \\ 0, & g \in M^{c}, \end{cases}$$
where $M = \{(x_g, y_g, \theta_g) \,|\, x_g, y_g \in \mathbb{R},\ \theta_g = B_\Phi(x_g, y_g)\}$. This modular strategy enhances the overall grasping optimization framework by providing a reliable interaction probability map for the subsequent optimization stages.
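As a sketch of this interpolation step, SciPy's bivariate spline can serve as the f_Bspline mapping; interpolating the sine and cosine of the orientation (rather than the raw angle) is an implementation choice of this sketch to avoid wrap-around artifacts, not a detail specified in the paper.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

def build_interference_maps(Q_T, Phi_T):
    """Fit B-spline interpolants over the pixel grid for B_Q and B_Phi."""
    h, w = Q_T.shape
    ys, xs = np.arange(h), np.arange(w)
    B_Q = RectBivariateSpline(ys, xs, Q_T)
    B_sin = RectBivariateSpline(ys, xs, np.sin(Phi_T))
    B_cos = RectBivariateSpline(ys, xs, np.cos(Phi_T))
    # Recover the angle from the interpolated sine/cosine components.
    B_Phi = lambda yg, xg: np.arctan2(B_sin(yg, xg), B_cos(yg, xg))
    return B_Q, B_Phi

# Example query of a candidate fingertip location (x_g, y_g) in pixels:
#   prob = B_Q(y_g, x_g)[0, 0]; theta = B_Phi(y_g, x_g)[0, 0]
```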

5.2. Grasping Synthesis Algorithm

After establishing the interaction probability map, we proceed to the grasping optimization phase. This phase focuses on generating optimal grasps by utilizing the fingertip states derived from the scene interaction module. The kinematic constraints of the gripper are integrated into the optimization process, ensuring that the resulting grasps are realistic and feasible based on the gripper’s capabilities. The optimization balances the interaction probabilities with these kinematic constraints, resulting in effective grasping configurations (see Figure 9).

5.2.1. Kinematic Constraint

Through planar projection, a grasp pose is determined by the fingertip and anchor point states in the interference space. For an antipodal gripper, the fingertip states are represented as $F_s = (S_1, S_2)$ with $S_i = (S_i^T, S_i^R) = (x_i, y_i, \sigma_i)$, which contains both location $S_i^T$ and direction $S_i^R$ information. In addition to respecting the maximum opening distance $L$, the antipodal gripper’s fingertips must maintain opposite orientations, which is formulated below:
$$C_{Kin}(F_s):\quad \|S_1^T - S_2^T\| \le L, \quad \operatorname{arctan2}(S_1^T - S_2^T) = S_1^R, \quad \operatorname{arctan2}\bigl(-(S_1^T - S_2^T)\bigr) = S_2^R.$$

5.2.2. Scene Constraint

Regarding the interaction probability $P(g)$, the fingertip locations are required to lie within the areas where the interaction probability exceeds a threshold $P_{th}$. The fingertip directions should align with the interpolated orientation map. That is,
$$C_{Sce}(F_s, B_Q, B_\Phi):\quad B_Q(S_i^T) \ge P_{th}, \quad B_\Phi(S_i^T) = S_i^R.$$

5.2.3. Objective Function

The objective function consists of three penalty functions for stability, probability, and orientation.
  • To ensure grasping stability, the configurations should achieve force closure. The grasping stability measure evaluates the candidates based on the distance from the object’s center of mass to the fingertip positions, $O_s(S_i^T) = L_s$.
  • The probability function reflects the aggregation of the scalar estimates of both fingertips, $O_p(S_i^T, B_Q) = B_Q(S_1^T) + B_Q(S_2^T)$.
  • The orientation function is based on the angular constraint of the antipodal gripper, $S_1^R + S_2^R = 0$. Therefore,
$$O_o(S_i^R) = \|S_1^R + S_2^R\|.$$
The optimization tool CasADi and solver IPOPT are employed to process the antipodal grasping optimizer as follows:
$$\min_{S_i}\; O(S_i, B_Q) = O_s - \varsigma\, O_p + \rho\, O_o \qquad \text{s.t.}\quad C_{Kin}(F_s), \;\; C_{Sce}(F_s, B_Q, B_\Phi).$$
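A condensed sketch of this optimizer using CasADi's Opti interface with IPOPT is shown below; the interpolants B_Q_fn and B_Phi_fn are assumed to be CasADi-compatible (e.g., built with casadi.interpolant), the stability term is simplified to the distance between the grasp midpoint and the COM, and the weights are illustrative values rather than the paper's tuned parameters.

```python
import casadi as ca

def optimize_grasp(com, B_Q_fn, B_Phi_fn, L, P_th, sigma=1.0, rho=1.0):
    """Sketch of the antipodal grasp optimizer solved with CasADi/IPOPT."""
    opti = ca.Opti()
    S = opti.variable(2, 3)                      # rows: [x_i, y_i, sigma_i]
    d = S[0, :2] - S[1, :2]                      # fingertip separation vector
    # Kinematic constraints of the antipodal gripper (C_Kin)
    opti.subject_to(ca.dot(d, d) <= L**2)
    opti.subject_to(S[0, 2] == ca.atan2(d[1], d[0]))
    opti.subject_to(S[1, 2] == ca.atan2(-d[1], -d[0]))
    # Scene constraints from the interaction probability map (C_Sce)
    for i in range(2):
        opti.subject_to(B_Q_fn(S[i, :2].T) >= P_th)
        opti.subject_to(B_Phi_fn(S[i, :2].T) == S[i, 2])
    # Objective: stability - sigma * probability + rho * orientation penalty.
    # O_s is simplified here to the squared grasp-midpoint-to-COM distance.
    O_s = ca.sumsqr(0.5 * (S[0, :2] + S[1, :2]) - ca.DM(com).T)
    O_p = B_Q_fn(S[0, :2].T) + B_Q_fn(S[1, :2].T)
    O_o = ca.fabs(S[0, 2] + S[1, 2])
    opti.minimize(O_s - sigma * O_p + rho * O_o)
    opti.solver("ipopt")
    sol = opti.solve()
    return sol.value(S)                          # optimized fingertip states
```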

5.3. Collision Avoidance for Grasping

During the grasping operation, it is possible that the two robotic arms collide with the UGV or with each other. Therefore, to avoid collision with the UGV, the robotic arms’ end-effector positions $T_{e,k}^{a}(x_{e,k}^{a}, y_{e,k}^{a}, z_{e,k}^{a})$ in the arm frame are restricted by $x_{e,k}^{a} \ge L_s$, $k = 1, 2$, where $L_s$ is the length of the safety buffer zone, which ensures that the arm will not reach backward.
The velocity damper below is introduced to eliminate self-collision between two robotic arms,
$$v_r \ge -\varrho\,\frac{l_r - l_b}{l_e - l_b},$$
where $v_r$ is the rate of change of the distance $l_r$ between the gripper-mounted right arm and the camera-mounted left arm, $l_e$ is the effective distance that triggers the damper, and $l_b$ is the bottom-line distance at which the robotic arm must stop. The value of the positive constant $\varrho$ determines how aggressive the damper’s restricting effect is.
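The damper can be evaluated as a simple lower bound on the approach rate, as in the sketch below; it follows the inequality above and deactivates once the inter-arm distance exceeds the trigger distance.

```python
def damper_velocity_limit(l_r, l_e, l_b, rho):
    """Lower bound on the allowed rate of change v_r of the inter-arm distance.

    The bound activates only once l_r drops below the trigger distance l_e
    and forces v_r >= 0 (no further approach) at the stop distance l_b,
    following the velocity damper inequality above.
    """
    if l_r >= l_e:
        return float("-inf")          # damper inactive, no restriction
    return -rho * (l_r - l_b) / (l_e - l_b)
```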

6. Experimental Results

The mobile manipulation robot is tested by several search-and-fetch experiments in a laboratory. Specifically, the compatibility of its multi-subsystems, precision of grasping, and safety of navigation are evaluated.

6.1. Environment Layout

The laboratory is 9.3 m long and 6.6 m wide, and its planar diagram is shown in Figure 4. The stage where the target objects are placed is in the northeast corner of the room; it is 0.6 m long and 0.9 m wide. The robot’s initial pose is set at the southeast corner of a 1.3 m-wide corridor, which is surrounded by obstacles such as a desk and a TV. The placing stage is positioned on the middle-right side of the room against the wall. There are four chairs, each measuring 0.5 m in length and width, positioned at the center of the laboratory.

6.2. Workflow Generation

The mobile manipulation robot is commanded to navigate through a 1.2 m-wide corridor between the chairs and the long desk on the right side to reach the target stage and perform the grasping tasks. Subsequently, the robot turns around and passes through a 1.7 m-wide corridor formed by two rows of chairs to reach the placing stage and drop off the target items. The experiment is conducted with 20 common daily necessities in a laboratory. The target objects for the robot vary significantly in size, ranging from small items (e.g., cable connectors) to larger items (e.g., glass containers). The robot is required to grasp them precisely at the optimal position and hold them stably during navigation.

6.3. Results and Discussion

The navigation and manipulation experiments involve repeating the trial 10 times for each of the 20 different objects. Such repetition helps ensure the consistency and reliability of the robot’s performance. The average operational time over the 200 trials is 156 s, and the success rate is 91.5%. A trial is considered successful when the navigation is collision-free and the manipulation is accurate.
As shown in Figure 10, the robot successfully avoids collisions with obstacles (e.g., the desk’s edges and the TV) and passes through narrow corridors formed by two rows of chairs. Previously unknown obstacles are detected by the front and rear RGB-D cameras, mapped into point clouds, and subsequently added to the local cost maps. The success rate of the navigation task is 93.5%. Considering the length and width of the UGV (see Table 1), it might crash into the chairs or desk when it turns in a confined space. Therefore, the CBF decay rate α is increased from 0.21 to 0.46 to emphasize the CBF-derived safety constraint. The CLF decay rate λ and the relaxation variable μ are both lowered to give more leeway to satisfy the CBF-derived safety constraints.
Figure 11 illustrates the refined path, global path, and local path in different colors, with the initial pose located at the upper left of the coordinate system. These trajectories are generated by the three layers of the hierarchical planner. It is shown that the global planner only considers the previously scanned obstacles in the global cost map, and its trajectory leads directly to the goal pose. The local planner accounts for previously unknown obstacles by integrating real-time sensor data and generates a local path to bypass them. The refined path is the actual path taken by the mobile manipulator, as recorded by AMCL. It helps minimize the risk of collision based on the immediate environmental conditions while maintaining the robot’s ability to reach its destination.
As shown in Figure 12, the robot successfully reaches and grasps the target objects at the optimal grasping positions generated by the proposed learning-optimization grasp algorithm. The target objects vary from fruit to a screwdriver and possess different hardnesses and surface roughnesses. The grasping success rate in the repetitive grasp-and-place experiments is 95.5%, measured independently of the navigation process, with an average grasping time of 33 s. The 3D positions and orientations (represented by quaternions) of the two robotic arms’ end effectors are illustrated in Figure 13 and Figure 14. The failed trials are caused by slippery contact surfaces or insufficient gripper aperture. Therefore, a dataset containing gripper force and aperture information should be established to train the CNN in future work.
To reveal the advantage of the proposed approach over existing similar works, a comparison of the success rate of mobile manipulation tasks is tabulated in Table 2. All of the mobile manipulation tasks were performed in real-world, small-scale indoor environments, involving the approach-and-grasp operation of a target object. It is observed that the proposed mobile manipulation robot achieves the highest success rate, which is enabled by the introduced hierarchical motion planning and hybrid grasping strategies.

7. Conclusions

In this work, we have developed a mobile manipulation robot with two robotic arms. It integrates heterogeneous subsystems that undertake perception, mapping, navigation, and grasping functions into a seamless communication and control framework. The integration requires the construction of a state machine architecture that bridges each subsystem and real-time coordinate transformations among the cameras, robotic arms, and mobile base. Within a hierarchical motion planning structure, the CLF and CBF guarantee tracking performance and obstacle avoidance in a dynamic environment. The precise and reliable manipulation of target objects is achieved through a learning-optimization hybrid algorithm. Search-and-fetch experiments in a laboratory environment with typical daily necessities demonstrate the efficiency and reliability of the proposed mobile manipulation robotic system. Compared with the four related works in Table 2, our method increases the success rate of mobile manipulation tasks by over 8.17%. Notably, our work explores the potential of mobile manipulation robots to address challenges arising from aging populations in shared workspaces, offering scalable assistance for daily tasks. In the future, we will employ a vision-language model (VLM) to realize task planning, which can significantly enhance the mobile manipulation robot’s quality of work and independence.

Author Contributions

Y.C. conceived the hierarchical motion planning strategy, developed the core algorithms, and conducted experimental validation; X.W. designed the robotic platform and hybrid grasping system; Z.W. implemented the control architecture and performed simulations; Q.X. supervised the research, acquired funding, and validated methodologies. All authors contributed to data interpretation and manuscript preparation, with Q.X. serving as corresponding author. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 52175556, in part by Macao Science and Technology Development Fund under Grant 0004/2022/AKP, Grant 0102/2022/A2, and Grant 0078/2023/RIB3, and in part by the University of Macau under Grant MYRG-CRG2022-00004-FST-ICI and Grant MYRG-GRG2024-00222-FST-UMDF.

Data Availability Statement

Data is contained in this paper.

Conflicts of Interest

The authors declare no conflicts of interest. The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Seyboldt, R.; Frese, C.; Zube, A. Sampling-based path planning to cartesian goal positions for a mobile manipulator exploiting kinematic redundancy. In Proceedings of the 47th International Symposium on Robotics (ISR 2016), Munich, Germany, 21–22 June 2016; pp. 1–9. [Google Scholar]
  2. Spahn, M.; Brito, B.; Alonso-Mora, J. Coupled mobile manipulation via trajectory optimization with free space decomposition. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 12759–12765. [Google Scholar]
  3. Xiong, H.; Mendonca, R.; Shaw, K.; Pathak, D. Adaptive mobile manipulation for articulated objects in the open world. arXiv 2024, arXiv:2401.14403. [Google Scholar]
  4. Wang, C.; Zhang, Q.; Tian, Q.; Li, S.; Wang, X.; Lane, D.; Petillot, Y.; Wang, S. Learning mobile manipulation through deep reinforcement learning. Sensors 2020, 20, 939. [Google Scholar] [CrossRef] [PubMed]
  5. Agrawal, A.; Sreenath, K. Discrete control barrier functions for safety-critical control of discrete systems with application to bipedal robot navigation. In Proceedings of the Robotics: Science and Systems, Cambridge, MA, USA, 12–16 July 2017; Volume 13, pp. 1–10. [Google Scholar]
  6. Desai, M.; Ghaffari, A. Clf-cbf based quadratic programs for safe motion control of nonholonomic mobile robots in presence of moving obstacles. In Proceedings of the 2022 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), Sapporo, Japan, 11–15 July 2022; pp. 16–21. [Google Scholar]
  7. Ames, A.D.; Grizzle, J.W.; Tabuada, P. Control barrier function based quadratic programs with application to adaptive cruise control. In Proceedings of the 53rd IEEE Conference on Decision and Control, Los Angeles, CA, USA, 15–17 December 2014; pp. 6271–6278. [Google Scholar]
  8. Haviland, J.; Sünderhauf, N.; Corke, P. A holistic approach to reactive mobile manipulation. IEEE Robot. Autom. Lett. 2022, 7, 3122–3129. [Google Scholar] [CrossRef]
  9. Haviland, J.; Corke, P. NEO: A novel expeditious optimisation algorithm for reactive motion control of manipulators. IEEE Robot. Autom. Lett. 2021, 6, 1043–1050. [Google Scholar] [CrossRef]
  10. Burgess-Limerick, B.; Leitner, C.L.J.; Corke, P. Enabling failure recovery for on-the-move mobile manipulation. arXiv 2023, arXiv:2305.08351. [Google Scholar]
  11. Zimmermann, S.; Poranne, R.; Coros, S. Go fetch!—Dynamic grasps using boston dynamics spot with external robotic arm. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 4488–4494. [Google Scholar]
  12. Sayar, E.; Gao, X.; Hu, Y.; Chen, G.; Knoll, A. Toward coordinated planning and hierarchical optimization control for highly redundant mobile manipulator. ISA Trans. 2024, 146, 16–28. [Google Scholar] [CrossRef]
  13. Huang, J.K.; Grizzle, J.W. Efficient anytime clf reactive planning system for a bipedal robot on undulating terrain. IEEE Trans. Robot. 2023, 39, 2093–2110. [Google Scholar] [CrossRef]
  14. Zeng, J.; Zhang, B.; Sreenath, K. Safety-critical model predictive control with discrete-time control barrier function. In Proceedings of the 2021 American Control Conference (ACC), Xi’an, China, 25–28 May 2021; pp. 3882–3889. [Google Scholar]
  15. Cao, H.; Xiong, H.; Zeng, W.; Jiang, H.; Cai, Z.; Hu, L.; Zhang, L.; Lu, W. Safe Reinforcement Learning-Based Motion Planning for Functional Mobile Robots Suffering Uncontrollable Mobile Robots. IEEE Trans. Intell. Transp. Syst. 2024, 25, 4346–4363. [Google Scholar] [CrossRef]
  16. Dong, K.; Pereida, K.; Shkurti, F.; Schoellig, A.P. Catch the ball: Accurate high-speed motions for mobile manipulators via inverse dynamics learning. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 6718–6725. [Google Scholar]
  17. Miller, A.T.; Allen, P.K. Graspit! a versatile simulator for robotic grasping. IEEE Robot. Autom. Mag. 2004, 11, 110–122. [Google Scholar] [CrossRef]
  18. Liu, T.; Liu, Z.; Jiao, Z.; Zhu, Y.; Zhu, S.C. Synthesizing diverse and physically stable grasps with arbitrary hand structures using differentiable force closure estimator. IEEE Robot. Autom. Lett. 2021, 7, 470–477. [Google Scholar] [CrossRef]
  19. Roa, M.A.; Suárez, R. Grasp quality measures: Review and performance. Auton. Robot. 2015, 38, 65–88. [Google Scholar] [CrossRef]
  20. Varley, J.; DeChant, C.; Richardson, A.; Ruales, J.; Allen, P. Shape completion enabled robotic grasping. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 2442–2447. [Google Scholar]
  21. Newbury, R.; Gu, M.; Chumbley, L.; Mousavian, A.; Eppner, C.; Leitner, J.; Bohg, J.; Morales, A.; Asfour, T.; Kragic, D.; et al. Deep learning approaches to grasp synthesis: A review. IEEE Trans. Robot. 2023, 39, 3994–4015. [Google Scholar] [CrossRef]
  22. Lundell, J.; Corona, E.; Le, T.N.; Verdoja, F.; Weinzaepfel, P.; Rogez, G.; Moreno-Noguer, F.; Kyrki, V. Multi-fingan: Generative coarse-to-fine sampling of multi-finger grasps. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 4495–4501. [Google Scholar]
  23. Veres, M.; Moussa, M.; Taylor, G.W. Modeling grasp motor imagery through deep conditional generative models. IEEE Robot. Autom. Lett. 2017, 2, 757–764. [Google Scholar] [CrossRef]
  24. Kappler, D.; Bohg, J.; Schaal, S. Leveraging big data for grasp planning. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 4304–4311. [Google Scholar]
  25. Wang, X.; Xu, Q. Transferring Grasping Across Grippers: Learning-Optimization Hybrid Framework for Generalized Planar Grasp Generation. IEEE Trans. Robot. 2024, 40, 3388–3405. [Google Scholar] [CrossRef]
  26. Blomqvist, K.; Breyer, M.; Cramariuc, A.; Förster, J.; Grinvald, M.; Tschopp, F.; Chung, J.J.; Ott, L.; Nieto, J.; Siegwart, R. Go fetch: Mobile manipulation in unstructured environments. arXiv 2020, arXiv:2004.00899. [Google Scholar]
  27. Lim, J.; Lee, I.; Shim, I.; Jung, H.; Joe, H.M.; Bae, H.; Sim, O.; Oh, J.; Jung, T.; Shin, S.; et al. Robot system of DRC-HUBO+ and control strategy of team KAIST in DARPA robotics challenge finals. J. Field Robot. 2017, 34, 802–829. [Google Scholar] [CrossRef]
  28. Spenko, M.; Buerger, S.; Iagnemma, K. The DARPA Robotics Challenge Finals: Humanoid Robots to the Rescue; Springer: Berlin/Heidelberg, Germany, 2018; Volume 121. [Google Scholar]
  29. Kindle, J.; Furrer, F.; Novkovic, T.; Chung, J.J.; Siegwart, R.; Nieto, J. Whole-body control of a mobile manipulator using end-to-end reinforcement learning. arXiv 2020, arXiv:2003.02637. [Google Scholar]
  30. Li, M.; Yang, Z.; Zha, F.; Wang, X.; Wang, P.; Li, P.; Ren, Q.; Chen, F. Design and analysis of a whole-body controller for a velocity controlled robot mobile manipulator. Sci. China Inf. Sci. 2020, 63, 1–15. [Google Scholar] [CrossRef]
  31. Dietrich, A.; Wimböck, T.; Täubig, H.; Albu-Schäffer, A.; Hirzinger, G. Extensions to reactive self-collision avoidance for torque and position controlled humanoids. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 3455–3462. [Google Scholar]
  32. Gorjup, G.; Gerez, L.; Liarokapis, M. Leveraging Human Perception in Robot Grasping and Manipulation Through Crowdsourcing and Gamification. Front. Robot. AI 2021, 8, 652760. [Google Scholar] [CrossRef]
  33. Engemann, H.; Du, S.; Kallweit, S.; Cönen, P.; Dawar, H. Omnivil–an autonomous mobile manipulator for flexible production. Sensors 2020, 20, 7249. [Google Scholar] [CrossRef]
  34. Stückler, J.; Schwarz, M.; Behnke, S. Mobile manipulation, tool use, and intuitive interaction for cognitive service robot cosero. Front. Robot. AI 2016, 3, 58. [Google Scholar] [CrossRef]
  35. Outón, J.L.; Merino, I.; Villaverde, I.; Ibarguren, A.; Herrero, H.; Daelman, P.; Sierra, B. A real application of an autonomous industrial mobile manipulator within industrial context. Electronics 2021, 10, 1276. [Google Scholar] [CrossRef]
  36. Pantusin, F.J.; Carvajal, C.P.; Ortiz, J.S.; Andaluz, V.H. Virtual teleoperation system for mobile manipulator robots focused on object transport and manipulation. Technologies 2024, 12, 146. [Google Scholar] [CrossRef]
  37. Iriondo, A.; Lazkano, E.; Susperregi, L.; Urain, J.; Fernandez, A.; Molina, J. Pick and place operations in logistics using a mobile manipulator controlled with deep reinforcement learning. Appl. Sci. 2019, 9, 348. [Google Scholar] [CrossRef]
  38. Pankert, J.; Hutter, M. Perceptive model predictive control for continuous mobile manipulation. IEEE Robot. Autom. Lett. 2020, 5, 6177–6184. [Google Scholar] [CrossRef]
  39. Bohren, J.; Rusu, R.B.; Jones, E.G.; Marder-Eppstein, E.; Pantofaru, C.; Wise, M.; Mösenlechner, L.; Meeussen, W.; Holzer, S. Towards autonomous robotic butlers: Lessons learned with the PR2. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 5568–5575. [Google Scholar]
  40. Nicolescu, M.N.; Matarić, M.J. A hierarchical architecture for behavior-based robots. In Proceedings of the 1st International Joint Conference on Autonomous Agents and Multiagent Systems: Part 1, Bologna, Italy, 15–19 July 2002; pp. 227–233. [Google Scholar]
  41. Zhang, J.; Tang, L.; Song, Y.; Meng, Q.; Qian, H.; Shao, J.; Song, W.; Zhu, S.; Gu, J. FLTRNN: Faithful long-horizon task planning for robotics with large language models. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 6680–6686. [Google Scholar]
  42. Yenamandra, S.; Ramachandran, A.; Yadav, K.; Wang, A.; Khanna, M.; Gervet, T.; Yang, T.Y.; Jain, V.; Clegg, A.W.; Turner, J.; et al. Homerobot: Open-vocabulary mobile manipulation. arXiv 2023, arXiv:2306.11565. [Google Scholar]
  43. Jauhri, S.; Peters, J.; Chalvatzaki, G. Robot learning of mobile manipulation with reachability behavior priors. IEEE Robot. Autom. Lett. 2022, 7, 8399–8406. [Google Scholar] [CrossRef]
Figure 1. Photo of the developed mobile manipulator in action.
Figure 2. Connection scheme of the robot system hardware. Perception instruments are listed on the left. The industrial computer is wirelessly connected to both the operator’s PC and a remote device. The setup allows for seamless communication and control among different components of the robot.
Figure 3. Behavior-based workflow of subsystems. The robot integrates perception, mapping, navigation, and manipulation functions. The connection of these heterogeneous subsystems requires a state machine and ROS nodes.
Figure 4. Planar diagram (left) and map of the experimental environment (right) generated by the Cartographer SLAM algorithm. Only the cubicles and desks in a laboratory are previously known and mapped. Pink and brown rectangles represent unknown obstacles and target stage, respectively.
Figure 5. The results of object detection for the selected subjects (left) and all subjects (right). The center of every object is labeled with its 3D position in the camera coordinate frame.
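The 3D positions annotated in Figure 5 follow from combining each detection with the aligned depth reading; a generic back-projection through the pinhole camera model is sketched below. The intrinsic values and the example pixel are placeholders, not the calibration of the camera used in this work.

```python
import numpy as np


def pixel_to_camera_frame(u: float, v: float, depth_m: float,
                          fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project pixel (u, v) with depth (metres) into the camera coordinate frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])


# Placeholder intrinsics and detection, for illustration only.
fx, fy, cx, cy = 615.0, 615.0, 320.0, 240.0
center_3d = pixel_to_camera_frame(u=352.0, v=261.0, depth_m=0.82, fx=fx, fy=fy, cx=cx, cy=cy)
print(center_3d)  # [x, y, z] of the object's center in the camera frame
```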
Figure 6. Snapshots of the simulation for the robot navigating to the target position. Sub-figures (a,b) illustrate the robot’s movement from its initial position, while sub-figures (c,d) demonstrate its turning maneuver to reach the target stage. White particles are the point cloud generated by the front and rear RGB-D cameras. Red lines are the obstacles scanned by the LiDAR. Purple and green lines in (b,c) represent the depth information of the rear and front obstacles, respectively.
Figure 7. The state s = [x y θ]ᵀ of the UGV in the world coordinate frame is compared with the reference state generated by the high-level global planner. The CLF is designed to mitigate the deviation in the translational and rotational coordinates.
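A standard way to realize the deviation mitigation of Figure 7 while also enforcing the safety guarantee is to combine a CLF tracking constraint and a CBF safety constraint in one quadratic program. The form below is a textbook sketch under assumed symbols (P, λ, α, ρ, δ, d_safe); it is not necessarily the exact controller used by the authors.

```latex
\begin{aligned}
& e = s_{\mathrm{ref}} - s, \qquad V(e) = \tfrac{1}{2}\, e^{\top} P e, \ \ P \succ 0,
  \qquad h(s) = d_{\min}(s) - d_{\mathrm{safe}},\\[2pt]
& u^{*} = \operatorname*{arg\,min}_{u,\ \delta}\ \tfrac{1}{2}\,\lVert u - u_{\mathrm{nom}}\rVert^{2} + \rho\,\delta^{2}\\
& \text{s.t.}\quad \dot{V}(s,u) \le -\lambda\, V(e) + \delta
  \quad \text{(CLF: drive the tracking error to zero)}\\
& \phantom{\text{s.t.}\quad} \dot{h}(s,u) \ge -\alpha\, h(s)
  \quad \text{(CBF: stay clear of sensed obstacles)}
\end{aligned}
```

Here d_min(s) denotes the distance to the nearest sensed obstacle, u_nom is the nominal command from the local planner, and the slack δ keeps the QP feasible when tracking and safety momentarily conflict.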
Figure 8. The streamlined structure of the CNN-based scene inference module, trained for decomposed antipodal grasping on the Cornell Grasping Dataset. The interference probability estimation is further optimized based on the gripper’s kinematic model to enhance grasping stability. The mobile manipulator provides visual input through a wrist-mounted camera and receives the target position and orientation along with the optimal grasping pose.
Figure 9. The results of the learning-based grasping position optimization. First, the original RGB-D image (a) is processed by the Scene Interaction Module, generating an orientation probability map (b1) along with a position probability map (b2). Their combination delivers a general interaction map (c), which is filtered through the Grasp Synthesis Algorithm with kinematic constraints and produces a final optimized grasping pose (d). The actual grasping scenario is displayed in (e).
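The combination and filtering steps of Figure 9 can be sketched as follows: the per-pixel grasp quality, angle, and width maps predicted by the CNN are masked by the gripper's opening limit (85 mm for the Robotiq gripper in Table 1), and the best remaining pixel defines the grasp. The function names, pixel-to-metre scale, and random test maps are illustrative assumptions, not the authors' code.

```python
import numpy as np


def select_grasp(quality: np.ndarray, angle: np.ndarray, width: np.ndarray,
                 depth: np.ndarray, max_opening_m: float = 0.085,
                 px_to_m: float = 1e-3) -> dict:
    """Pick the highest-quality grasp whose predicted opening fits the gripper.

    quality, angle, width, depth are H x W maps (quality in [0, 1], angle in rad,
    width in pixels, depth in metres); max_opening_m is the gripper stroke.
    """
    feasible = (width * px_to_m) <= max_opening_m          # kinematic constraint
    masked = np.where(feasible, quality, -np.inf)          # drop infeasible pixels
    v, u = np.unravel_index(np.argmax(masked), masked.shape)
    return {
        "pixel": (int(u), int(v)),
        "angle_rad": float(angle[v, u]),
        "width_m": float(width[v, u] * px_to_m),
        "depth_m": float(depth[v, u]),
        "quality": float(quality[v, u]),
    }


# Illustrative call with random maps; real maps would come from the trained CNN.
rng = np.random.default_rng(0)
H, W = 224, 224
grasp = select_grasp(rng.random((H, W)),
                     rng.uniform(-np.pi / 2, np.pi / 2, (H, W)),
                     rng.uniform(0.0, 150.0, (H, W)),
                     rng.uniform(0.3, 0.8, (H, W)))
print(grasp)
```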
Figure 10. Demonstration of the mobile manipulation robot performing a navigation and manipulation task: (1, a) Initial positioning; (2, b) Traversing through a narrow corridor; (3, c) Approaching and grasping the target object (a Hall effect ammeter); (4,5, d,e) Executing a turning maneuver in confined space; (6, f) Precise placement of the object on the designated stage. The bottom row shows screenshots from the ROS Rviz interface, which monitors the robot’s real-time operation information.
Figure 11. (a) The 2D trajectory obtained by the global planner (green), the local planner (red), and the refiner (blue). (b) The real-time linear (yellow) and angular (purple) velocity curves of the mobile manipulation robot during navigation. The sampling frequency is 10 Hz.
Figure 12. Demonstration of grasping scenarios for six distinct target objects: (a,d) Observation and computation of optimal grasp poses; (b,c,e,f) Execution of grasp commands. The ROS Rviz displays (left) are compared with real scenes (right).
Figure 13. The 3D position curves (upper) and the quaternion orientation curves (lower) of the left robotic arm, which is equipped with an RGB-D camera.
Figure 14. The 3D position curves (upper) and the quaternion orientation curves (lower) of the right robotic arm, on which an antipodal gripper is mounted.
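The orientation curves in Figures 13 and 14 are expressed as unit quaternions; one common way to produce such curves offline is to convert each sampled end-effector rotation to a quaternion, for example with SciPy, as in the generic sketch below. The pose sequence here is a synthetic placeholder rather than recorded robot data.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# Synthetic placeholder: end-effector rotations sampled during a slow yaw motion.
rotations = [R.from_euler("z", angle_deg, degrees=True) for angle_deg in range(0, 90, 10)]

# Convert each rotation to a quaternion (x, y, z, w) for plotting the orientation curves.
quaternions = np.array([r.as_quat() for r in rotations])
print(quaternions.shape)  # (9, 4): one quaternion per sample
```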
Table 1. Main specifications of the mobile manipulation robot.
Parameter | Value
Height with folded UR3e arms | 1275 mm
Length | 1055 mm
Width | 743 mm
Total weight | 200 kg
Max. payload (mobile platform) | 200 kg
Max. velocity (mobile platform) | 4 km·h⁻¹
Wheel diameter | 400 mm
Surmountable obstacle height | 100 mm
Max. approaching angle | 15°
Max. reach (UR3e) | 500 mm
Max. payload (UR3e) | 3 kg
Max. stroke (Robotiq gripper) | 85 mm
Grip force (Robotiq gripper) | 20 N to 235 N
Max. payload (Robotiq gripper) | 5 kg
Table 2. Comparison of success rate of typical mobile manipulation robots.
Reference | Success rate
[11] | 80.00%
[35] | 83.33%
[42] | 20.00%
[43] | 81.25%
This work | 91.50%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
