Article

Optimized Design and Deep Vision-Based Operation Control of a Multi-Functional Robotic Gripper for an Automatic Loading System

1 Robotics Research Center, Beijing Jiaotong University, Beijing 100044, China
2 Huaneng Coal Technology Research Co., Ltd., Beijing 100070, China
3 Zhalainuoer Coal Industry Co., Ltd., Hulunbuir 021410, China
4 Tangshan Research Institute, Beijing Jiaotong University, Tangshan 063000, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Actuators 2025, 14(6), 259; https://doi.org/10.3390/act14060259
Submission received: 9 April 2025 / Revised: 15 May 2025 / Accepted: 20 May 2025 / Published: 23 May 2025

Abstract

This study presents an optimized design and vision-guided control strategy for a multi-functional robotic gripper integrated into an automatic loading system for warehouse environments. The system adopts a modular architecture, including standardized platforms, transport containers, four collaborative 6-DOF robotic arms, and a multi-sensor vision module. Methodologically, we first developed three gripper prototypes, selecting the optimal design (30° angle between the gripper and container side) through workspace and interference analysis. A deep vision-based recognition system, enhanced by an improved YOLOv5 algorithm and multi-feature fusion, was employed for real-time object detection and pose estimation. Kinematic modeling and seventh-order polynomial trajectory planning ensured smooth and precise robotic arm movements. Key results from simulations and experiments demonstrated a 95.72% success rate in twist lock operations, with a positioning accuracy of 1.2 mm. The system runs on a 35 ms control cycle and achieves substantially higher success rates and accuracy than non-vision-based (open-loop) operation. Practical implications include enabling fully autonomous container handling in logistics, reducing labor costs, and enhancing operational safety. Limitations include dependency on fixed camera setups and sensitivity to extreme lighting conditions.

1. Introduction

With the rapid advancement of intelligent manufacturing and automated logistics, the demand for flexible, efficient, and unmanned material handling systems has become increasingly critical in industrial applications. Traditional loading and unloading processes in warehouses are often labor-intensive and error-prone, resulting in reduced operational efficiency and increased safety risks. In response to these challenges, robotic-based automatic loading systems have emerged as a promising solution to achieve high-precision, high-reliability, and intelligent logistics operations. Such a system involves multiple operational tasks that usually require a robotic arm fitted with a multi-functional robotic gripper, whose design must be optimized accordingly. The system generally adopts vision-based control, and because of the complexity of the environment, multi-feature fusion techniques are needed to ensure stable and robust operation. For example, Kim et al. proposed a novel mechatronics system capable of automating a peg-in-hole assembly process and inspecting the quality of the assembly with vision [1].
In the multi-functional mechanical gripper field, Kang et al. designed and implemented a multi-function gripper for grasping general objects, aiming to enhance versatility and flexibility in industrial automation applications [2]. Gümbel and Dröder proposed a design for highly repeatable and multi-functional grippers intended for precision handling with articulated robots [3]. Cramer et al. developed a user-friendly toolkit to select and design multi-purpose grippers for modular robotic systems [4]. Ai and Chen proposed a multifunctional gripper design at the end of the robot for versatile object manipulation [5]. Kladovasilakis et al. developed a multi-functional bioinspired soft robotic actuator using additive manufacturing techniques [6].
In the optimized design field, Yildiz et al. introduced a robust robot gripper mechanism design using a new hybrid grasshopper optimization algorithm [7]. Nguyen et al. optimized compliant gripper mechanism design by employing a bi-algorithm combining fuzzy logic and ANFIS [8]. Pinskier et al. applied diversity-based topology optimization techniques to the design of soft robotic grippers [9]. Lee et al. presented a closed-structure compliant gripper with morphologically optimized multi-material fingertips for aerial grasping [10]. Sun et al. developed LARG, a lightweight robotic gripper with 3D topology-optimized adaptive fingers [11].
In the vision-based control field, Zablocki et al. reviewed the explainability of deep vision-based autonomous driving systems and identified future challenges [12]. Ghasemieh and Kashef explored explainable artificial intelligence for deep vision-based odometry systems [13]. Wang et al. proposed a real-time deep vision-based method for soft body 3D proprioception [14]. Choi et al. conducted an experimental evaluation of deep vision-based occupancy counting for ventilation control [15]. Wang et al. applied the YOLO-v5 algorithm for real-time recognition of apple stem/calyx in automatic fruit loading systems [16].
In the multi-feature fusion field, Zou et al. developed a new multi-feature fusion convolutional neural network for facial expression recognition [17]. Jiang and Yin proposed a facial expression recognition approach using convolutional block attention and multi-feature fusion [18]. Li et al. presented a multi-feature fusion method for recognizing gastrointestinal metaplasia in medical images [19]. Hu et al. designed a novel multi-feature fusion network with spatial partitioning and cross-attention for gesture recognition using armband signals [20]. Liu and Xu introduced AMFF, an attention-based multi-feature fusion method for human intention recognition [21].
Inspired by port automation and container handling techniques, this study proposes a modular, containerized, and vision-guided robotic system for automated loading and unloading in warehouse scenarios. The system integrates standardized container platforms, modular transport units, multi-degree-of-freedom robotic manipulators, and a deep vision sensing module to perform autonomous material transfer tasks. Central to the system is the design of a multi-functional mechanical gripper capable of handling twist locks and performing precise positioning adjustments under complex spatial constraints.
To enhance system intelligence and operational autonomy, the proposed solution incorporates a multi-sensor vision system utilizing RealSense depth cameras. This enables real-time object recognition, spatial localization, and grasp planning, providing robust perception support for manipulation tasks. A coordinated control strategy for multiple robotic arms further improves task efficiency and ensures collision-free operations.
The objectives of this work are threefold: (1) to develop a mechanical gripper design optimized for both grasping and fine manipulation of twist locks; (2) to establish a vision-guided control framework for real-time object detection and pose estimation; and (3) to validate the system’s performance in simulation and experimental settings. This paper presents the system architecture, mechanical design methodology, motion planning strategies, and experimental results, demonstrating the feasibility and effectiveness of the proposed approach in enabling intelligent and autonomous loading operations in industrial warehouse environments.
Compared with existing state-of-the-art methods, our approach introduces several practical innovations that contribute to its superior performance. First, the geometry-aware gripper design improves workspace coverage and reduces mechanical interference, resulting in more reliable grasping during complex operations. Second, by fusing multiple complementary visual features, the perception module achieves enhanced robustness and precision under varying environmental conditions. Third, the lightweight deep learning model enables real-time inference on standard hardware, ensuring smooth system integration without specialized computation resources. Lastly, the use of high-order motion planning techniques ensures smoother and more stable manipulation trajectories, which benefits both alignment accuracy and mechanical durability. These combined improvements allow our system to deliver consistently higher efficiency and reliability in real-world automated loading tasks.

2. Automatic Loading System

2.1. System Architecture and Composition

Inspired by the automated container loading and unloading processes used in ports, this project proposes an unmanned automated system suitable for warehouse environments, as illustrated in Figure 1. The simulation prototype is established using ROS Noetic Ninjemys with Gazebo 11.x on Ubuntu 20.04 (Focal Fossa). The system adopts a containerized standard storage format similar to that of port logistics, enabling digital management of warehouse goods by recording the contents and locations of each container.
The system is composed of six main modules:
1. Standardized Load-Carrying Platform Module: This can include mining flatbed trucks, cargo trucks, trains, and ships.
2. Modular Transport Container Module: Designed in various forms to carry different materials. As shown on the left side of Figure 1a, each container is equipped with eight corner fittings—four at the bottom for securing the container, and four at the top for lifting and transport.
3. Locking Module: Includes manually adjustable twist locks and fully automatic twist locks. The manual twist locks are mounted on the standardized platform and can raise the locking head to secure the transport container. When locking is not required, the lock head can be lowered to convert the platform into a standard flatbed. The fully automatic twist locks are used to interconnect stacked containers.
4. Overhead Crane and Spreader Module: Responsible for loading and unloading the transport containers onto and from the standardized platform, as well as for stacking containers within the warehouse.
5. Multi-Sensor Recognition and Localization Module: Comprising four cameras, an IMU (Inertial Measurement Unit), and UWB (Ultra Wide Band) positioning modules (LD150, Haoru Tech, Dalian, China). The cameras detect the four bottom corners of the containers and the handles of the manual twist locks within the work zone, providing feedback for automated operation. The IMU and UWB modules estimate the container’s position and orientation outside the working zone.
6. Multi-Robot Arm Collaborative Loading and Unloading Module: Installed in the working zone, consisting of four robotic arms and multifunctional grippers. This module facilitates the automated transfer of containers between the platform and the storage area.
The system comprises four six-degree-of-freedom (6-DOF) robotic arms, specifically BRTIRUS1510A from Borunte (Shanghai, China), symmetrically mounted on both sides of the workspace. Each robotic arm is equipped with a two-finger gripper, RM-GB-17-80-2-1 from Robustmotion (Guangdong, China), to facilitate grasping operations. Additionally, four RealSense D415 depth cameras from Intel (Santa Clara, CA, USA) are individually mounted at the base of each arm to acquire environmental data, providing real-time feedback for task execution by the robotic arms. A single robotic arm with a gripper and a depth camera is shown in Figure 2.

2.2. Loading and Unloading Process

2.2.1. Loading Process

The procedure for loading a transport container onto the load-carrying platform is as follows:
When the overhead crane hoists the transport container above the platform from a designated position in the warehouse, the multi-camera system captures the poses of the four bottom corner fittings of the container relative to the robotic arms’ base coordinate frame. Using this information, the system calculates the precise poses of the fully automatic twist locks mounted at the bottom of the container.
Then, the multi-robot arm collaborative module removes the fully automatic twist locks based on the estimated poses. Afterward, the robotic arms correct the container’s posture and align it with the platform. The overhead crane then lowers the container onto the platform.
Next, the multi-camera system detects the pose of the manual twist lock handles located on the platform. The robotic arms subsequently lift the locking heads according to the handle poses to securely fix the container in place.

2.2.2. Unloading Process

The procedure for unloading a transport container from the load-carrying platform is as follows:
The multi-camera system first detects the pose of the manual twist lock handles on the platform. The multi-robot arm module then lowers the locking heads to release the container.
The overhead crane lifts the container to a certain height. The multi-camera system again captures the poses of the four bottom corner fittings of the container relative to the robotic arms’ base frame, enabling calculation of the poses of the fully automatic twist locks.
Subsequently, the multi-robot arm module attaches the fully automatic twist locks back onto the container according to the estimated poses. Finally, the overhead crane transfers the container from above the platform to a designated storage position in the warehouse for stacking.

3. Optimized Design

3.1. Tasks of the Multi-Robot Arm Collaborative Loading and Unloading Module

Based on the operational workflow described above, the multi-robot arm collaborative loading and unloading module is designed to perform four main tasks:
  • Handling fully automatic twist locks;
  • Locking and unlocking manual adjustable twist locks;
  • Correcting the posture of the transport container;
  • Positioning and securing the transport container.
Among these tasks, handling the fully automatic twist locks is the most technically challenging and demands the highest precision from the multi-camera recognition and localization system. Therefore, the subsequent analysis focuses primarily on the task of handling fully automatic twist locks.

3.2. Operation Process

The operation process of the multi-functional mechanical gripper comprises the procedures for locking and unlocking the fully automatic twist lock.

3.2.1. Locking Procedure

The locking procedure consists of the following steps:
1. Initial Grasping: The robotic arm moves from its initial position to a designated position where the gripper closes to securely grasp the twist lock.
2. Alignment: The robotic arm adjusts its posture and executes a translational motion to align the twist lock with the mounting port of the corner piece.
3. Insertion: The robotic arm performs a vertical motion along the z-axis, raising the twist lock to a certain height until it enters the corner piece.
4. Rotation: The robotic arm rotates the twist lock counterclockwise by 60° about its own central axis, which is parallel to the z-axis (viewed from above).
5. Release: The gripper opens to release the twist lock, allowing it to be securely installed within the corner piece.
6. Return: The robotic arm returns to its initial position.

3.2.2. Unlocking Procedure

The unlocking procedure involves the following steps:
1. Approach: The robotic arm moves from its initial position to a designated location with a specific posture to approach and grasp the twist lock.
2. Grasping: The gripper closes to securely hold the twist lock.
3. Rotation: The robotic arm rotates the twist lock clockwise by 60° about its own central axis, which is parallel to the z-axis (viewed from above).
4. Extraction: The robotic arm performs a downward motion along the z-axis, lowering the twist lock until it exits the corner piece.
5. Placement: The robotic arm moves the twist lock to a designated placement location, where the gripper opens to release the twist lock.
6. Return: The robotic arm returns to its initial position.
The unlocking procedure is essentially the reverse of the locking procedure. The three flag states are shown in Figure 3.

3.3. Optimized Design of the Multi-Functional Robotic Gripper

To meet the requirements of the four core functions of the multi-robot arm module, an optimized design of a multi-functional mechanical gripper has been developed.
An embedded two-finger gripper is used to grip the fully automatic twist locks. A cylindrical structure fixed above the two fingers is employed to rotate and lift the handle of the manually adjustable twist lock. A right-angle structure, formed by the two fingers and a rectangular block on the side, is used to position and secure the four vertical corners of the transport container. Universal wheels at the fingertip ends help align the container and prevent unintended movement or frictional dragging between the container and the fingers.
To accommodate different gripping angles required for the fully automatic twist locks, three types of embedded two-finger grippers have been designed.

3.3.1. Type I Gripper (90° Between Gripper and Container Side)

The first gripper design, in which the angle between the gripper and the short side of the container is 90°, is illustrated in Figure 4a. The gripping posture for locking the twist lock is shown in Figure 4b.
When gripping from the outside, one of the extreme reachable positions is shown in Figure 4c. Although no interference occurs with the container in either extreme posture, the manipulator reaches the limit of its workspace. If the distance between the manipulator and the twist lock is too large, the gripper may fail to reach the required position, resulting in failure to complete the gripping action.
When gripping from the inside, the extreme posture is shown in Figure 4d. In this case, interference occurs between the robot’s lower arm and the container.

3.3.2. Type II Gripper (0° Between Gripper and Container Side)

The second gripper design, where the gripper is parallel to the short side of the container (0° angle), is shown in Figure 5a, and its twist lock gripping posture is shown in Figure 5b.
Two extreme positions for gripping the twist lock using this design are shown in Figure 5c,d. While both positions avoid interference with the container, in one posture, the robotic arm partially extends beneath the container, which increases the risk of collision during motion. Moreover, if the container is relatively low, rotation of joint $\theta_5$ may cause the lower arm of the robot to interfere with the container.

3.3.3. Type III Gripper (30° Between Gripper and Container Side)

The third gripper design, ultimately selected for deployment, features a 30° angle between the gripper and the short side of the container. The model is shown in Figure 6a, and its gripping posture is shown in Figure 6b.
The two extreme gripping positions for this design are illustrated in Figure 6c,d. In both cases, there is no interference with the container, and the robotic arms do not extend beneath the container. Furthermore, both extreme positions are comfortably within the robot’s workspace limits. Based on these advantages, the third gripper design was adopted as the final solution.
During the optimization process, both the workspace of the robotic arms and potential interference between the arms and the container were thoroughly considered. Based on these analyses, the third design of the embedded two-finger gripper was selected, as the first two designs suffered from workspace constraints and physical interference [2]. As shown in Figure 7, the Type I and II grippers exhibit interference or workspace limitations, whereas the Type III gripper retains the same maximum angular clearance before and after the 60° rotation operation; note that four arms carry out the operation simultaneously at the four corners of the container.
As a result, the final optimized multi-functional mechanical gripper design fulfills all four functional requirements, as shown in Figure 8.

4. Sensing Based on Deep Vision

The perception system aims to identify the fully automatic twist lock and determine its pose. It utilizes a YOLOv5-based algorithm to detect the planar coordinates $(x, y)$, depth point cloud information to obtain the z-axis data, and geometric relationships among multiple target features for fusion recognition. This robust recognition and localization approach ensures reliable operation even under varying lighting conditions or when the target is temporarily lost.

4.1. Improved YOLOv5 Object Detection Algorithm

The proposed approach employs an improved object detection algorithm based on the YOLOv5 architecture [22], characterized by optimized network structure and training strategies that significantly reduce computational resource consumption while maintaining high detection accuracy.
Specifically, the algorithm achieves enhanced efficiency through the following techniques:
  • Lightweight Feature Extraction Network: Reduces model parameters, ensuring efficient processing.
  • Dynamic Resolution Input Strategy: Adapts to different computational constraints across various scenarios.
  • Channel Attention Mechanism: Improves the discriminative capability of feature representations.
Because the cameras and workspace are fixed, the observed scene is essentially static. Among the recognized features, only the Corner is weakly correlated with the container type; the remaining features do not lie on the container itself. The dataset is therefore mainly affected by time of day and lighting. To enhance robustness, a specialized dataset comprising 3200 high-resolution images was constructed under different time periods and lighting conditions. The training, validation, and test sets contain 2200, 500, and 500 images, respectively. This dataset encompasses various illumination conditions, target scales, and complex backgrounds. Data augmentation and rigorous annotation procedures are applied to significantly improve the algorithm’s detection accuracy and generalization performance.
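As an illustration of the channel attention technique listed above, the following is a minimal squeeze-and-excitation style block in PyTorch; the channel count and reduction ratio are placeholders rather than the network's actual configuration.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (illustrative sketch)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # squeeze: global spatial average
        self.fc = nn.Sequential(                 # excitation: per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                             # reweight the feature channels

# Example: reweight a 64-channel feature map from the detection backbone.
feat = torch.randn(1, 64, 80, 80)
print(ChannelAttention(64)(feat).shape)  # torch.Size([1, 64, 80, 80])
```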

4.2. Multi-Feature Fusion Recognition

A high-precision recognition method for containers and flatbed trucks is proposed, utilizing six heterogeneous features to achieve robust localization and motion decision-making under complex conditions. The feature system includes the following:
  • Corner piece (Corner);
  • Flatbed truck edge (Edge);
  • Flatbed truck twist lock horizontal component (Lift_lock_1);
  • Flatbed truck twist lock vertical component (Lift_lock_2);
  • Twist lock Orientation (F/B/L/R);
  • Automatic twist lock (Auto_lock).
The Corner feature serves as the primary constraint, while other auxiliary features are hierarchically integrated to enable core functionalities such as automatic twist lock locking/unlocking and flatbed truck twist lock manipulation.
When loading a fully automatic twist lock onto, or unloading it from, a Modular Transport Container Module, motion commands for the robotic arm are generated from the combined three-dimensional coordinates of the Corner and the automatic twist lock orientation. The localization precision is further optimized through multi-source data fusion, involving the following steps:
  • Original coordinates of Auto_lock, Lift_lock_1, Lift_lock_2, and Edge are spatially registered and transformed to match the Corner coordinate system.
  • Kalman filtering is applied for temporal prediction and dynamic weighted fusion of observations, effectively suppressing sensor noise and motion blur (a minimal filter sketch follows this list).
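A minimal sketch of the temporal filtering step is given below, assuming a one-dimensional constant-velocity model per feature coordinate; the noise values are illustrative, and the dynamic weighting described above can be emulated by scaling the measurement noise R per feature.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal 1-D constant-velocity Kalman filter for one feature coordinate.
    Noise levels are illustrative placeholders, not the system's tuned values."""
    def __init__(self, q=1e-3, r=4.0):
        self.x = np.zeros(2)               # state: [position, velocity]
        self.P = np.eye(2) * 1e3           # state covariance (initially uncertain)
        self.Q = q * np.eye(2)             # process noise
        self.R = r                         # measurement noise (pixel^2)

    def step(self, z: float, dt: float) -> float:
        F = np.array([[1.0, dt], [0.0, 1.0]])
        self.x = F @ self.x                # predict state forward in time
        self.P = F @ self.P @ F.T + self.Q
        H = np.array([[1.0, 0.0]])
        S = H @ self.P @ H.T + self.R      # innovation covariance
        K = (self.P @ H.T) / S             # Kalman gain
        self.x = self.x + (K * (z - H @ self.x)).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P
        return self.x[0]                   # fused position estimate

# Usage: smooth a noisy pixel coordinate measured at 30 Hz.
kf = ConstantVelocityKF()
smoothed = [kf.step(z, dt=1 / 30) for z in (320.0, 321.5, 323.1)]
```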
For depth calculation, a modal-based strategy is introduced:
  • For Corner features, depth information is derived from the centroid depth of the bounding box’s four corner pixel points.
  • For other features, the minimum depth value within the bounding box is utilized, balancing geometric precision and robustness against occlusion.
  • A depth thresholding mechanism is implemented to filter erroneous data (the modal strategy is sketched below).
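The sketch below illustrates this modal depth strategy under assumed conventions (integer pixel bounding boxes, a depth image in meters, and illustrative threshold values):

```python
import numpy as np

def feature_depth(depth_img: np.ndarray, box: tuple, kind: str,
                  d_min: float = 0.2, d_max: float = 3.0) -> float:
    """Depth lookup following the modal strategy above (a sketch; the box
    format and thresholds are assumptions, not the paper's values).
    box = (x1, y1, x2, y2) in integer pixels; depth_img in meters."""
    x1, y1, x2, y2 = box
    if kind == "Corner":
        # centroid depth of the bounding box's four corner pixels
        corners = depth_img[[y1, y1, y2, y2], [x1, x2, x1, x2]]
        z = float(np.mean(corners))
    else:
        # minimum valid depth inside the box (robust to partial occlusion)
        roi = depth_img[y1:y2, x1:x2]
        valid = roi[roi > 0]
        z = float(valid.min()) if valid.size else np.nan
    # depth thresholding to reject erroneous returns
    return z if d_min <= z <= d_max else np.nan
```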
The recognition results are shown in Figure 9. The system successfully identifies multiple features, with the primary target being the Corner. Since the container structure is pre-designed, other features maintain consistent geometric relationships with the Corner. When the Corner feature is compromised, its pose can be indirectly calculated through these related features.

5. Operation Control

5.1. Kinematics Model

5.1.1. Forward Kinematics

The forward kinematics of a six-degree-of-freedom (6-DOF) robotic arm determines the position and orientation of the end-effector given the joint angles. The general forward kinematics equation can be represented as:
$$T = \prod_{i=1}^{6} T_i = T_1 T_2 T_3 T_4 T_5 T_6 \tag{1}$$
Each transformation matrix $T_i$ is defined using the Denavit–Hartenberg (DH) convention:
$$T_i = \begin{bmatrix} \cos\theta_i & -\sin\theta_i\cos\alpha_i & \sin\theta_i\sin\alpha_i & a_i\cos\theta_i \\ \sin\theta_i & \cos\theta_i\cos\alpha_i & -\cos\theta_i\sin\alpha_i & a_i\sin\theta_i \\ 0 & \sin\alpha_i & \cos\alpha_i & d_i \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{2}$$
where:
  • $\theta_i$ is the joint angle;
  • $d_i$ is the link offset;
  • $a_i$ is the link length;
  • $\alpha_i$ is the twist angle.
The final transformation matrix $T$ gives the position and orientation of the end-effector in the base frame as follows:
$$T = \begin{bmatrix} R & P \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} n_x & o_x & a_x & p_x \\ n_y & o_y & a_y & p_y \\ n_z & o_z & a_z & p_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{3}$$
$$\begin{aligned}
n_x &= s_{234} c_1 s_6 - c_6\,(s_1 s_5 - c_{234} c_1 c_5) \\
n_y &= c_6\,(c_1 s_5 + c_{234} c_5 s_1) + s_{234} s_1 s_6 \\
n_z &= c_{234} s_6 - s_{234} c_5 c_6 \\
o_x &= s_6\,(s_1 s_5 - c_{234} c_5 c_1) + s_{234} c_1 c_6 \\
o_y &= s_{234} s_1 c_6 - s_6\,(c_1 s_5 + c_{234} s_1 c_5) \\
o_z &= c_{234} c_6 + s_{234} c_5 s_6 \\
a_x &= c_5 s_1 + c_{234} c_1 s_5 \\
a_y &= c_{234} s_1 s_5 - c_1 c_5 \\
a_z &= -s_{234} s_5 \\
p_x &= d_6\,(c_5 s_1 + c_{234} c_1 s_5) + a_2 c_1 c_2 + a_3 c_1 c_{23} + a_4 c_1 c_{234} \\
p_y &= d_6\,(c_{234} s_1 s_5 - c_1 c_5) + a_2 c_2 s_1 + a_3 s_1 c_{23} + a_4 s_1 c_{234} \\
p_z &= d_1 - a_2 s_2 - a_3 s_{23} - a_4 s_{234} - d_6 s_5 s_{234}
\end{aligned} \tag{4}$$
where $s_i$, $c_i$, $s_{ij}$, $c_{ij}$, $s_{ijk}$, and $c_{ijk}$ denote $\sin\theta_i$, $\cos\theta_i$, $\sin(\theta_i + \theta_j)$, $\cos(\theta_i + \theta_j)$, $\sin(\theta_i + \theta_j + \theta_k)$, and $\cos(\theta_i + \theta_j + \theta_k)$, respectively. $P$ is the position vector $[p_x\ p_y\ p_z]^T$, and $R$ is the rotation matrix representing orientation (corresponding to $\alpha$, $\beta$, $\gamma$ using Euler angles, RPY angles, or other representations). Using RPY angles, it yields
$$\begin{aligned}
\beta &= \operatorname{atan2}\!\left(-n_z,\ \sqrt{n_x^2 + n_y^2}\right) \\
\alpha &= \operatorname{atan2}(n_y,\ n_x) \\
\gamma &= \operatorname{atan2}(o_z,\ a_z)
\end{aligned} \tag{5}$$
This can be summarized as follows:
$$X = FK(Q) \tag{6}$$
where $X = [p_x\ p_y\ p_z\ \alpha\ \beta\ \gamma]^T$ is the end-effector pose, $Q = [\theta_1\ \theta_2\ \theta_3\ \theta_4\ \theta_5\ \theta_6]^T$ is the vector of joint angles, and $FK$ is the forward kinematics function.
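To make Equations (1)–(6) concrete, the sketch below chains the DH transforms numerically and extracts the RPY pose of Equation (5); the DH table in the example is a placeholder, not the BRTIRUS1510A's actual parameters.

```python
import numpy as np

def dh(theta, d, a, alpha):
    """Single Denavit-Hartenberg transform T_i, as in Equation (2)."""
    ct, st, ca, sa = np.cos(theta), np.sin(theta), np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def fk(q, dh_table):
    """X = FK(Q): chain the six link transforms and extract the pose."""
    T = np.eye(4)
    for theta, (d, a, alpha) in zip(q, dh_table):
        T = T @ dh(theta, d, a, alpha)
    p = T[:3, 3]
    n, o, a_vec = T[:3, 0], T[:3, 1], T[:3, 2]
    beta = np.arctan2(-n[2], np.hypot(n[0], n[1]))   # RPY angles, Equation (5)
    alpha_ = np.arctan2(n[1], n[0])
    gamma = np.arctan2(o[2], a_vec[2])
    return np.concatenate([p, [alpha_, beta, gamma]])

# Example with placeholder DH rows (d, a, alpha), one per joint:
table = [(0.4, 0.05, np.pi / 2), (0.0, 0.6, 0.0), (0.0, 0.55, 0.0),
         (0.13, 0.0, np.pi / 2), (0.1, 0.0, -np.pi / 2), (0.09, 0.0, 0.0)]
print(fk(np.zeros(6), table))
```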

5.1.2. Inverse Kinematics

Inverse kinematics involves determining the joint angles $\theta_i$ given the desired end-effector pose $T$. This process is generally more complicated than forward kinematics and often requires numerical methods.
The desired transformation matrix can be divided into a rotational part $R$ and a translational part $P$, as in Equation (3). Solving the inverse kinematics involves breaking down the problem into position and orientation components:
  • Position: Solving for $\theta_1$, $\theta_2$, and $\theta_3$ based on the position vector $P$.
  • Orientation: Solving for $\theta_4$, $\theta_5$, and $\theta_6$ based on the rotation matrix $R$.
$$\begin{aligned}
\theta_1 &= \arctan\frac{\pm(p_y - d_6 a_y)}{p_x - d_6 a_x} \\
\theta_5 &= \arctan\frac{\pm\sqrt{(n_x s_1 - n_y c_1)^2 + (o_x s_1 - o_y c_1)^2}}{a_x s_1 - a_y c_1} \\
\theta_6 &= \arctan\frac{o_x s_1 - o_y c_1}{-\,n_x s_1 + n_y c_1} \\
\theta_2 &= \arctan\frac{A^2 + B^2 + a_2^2 - a_3^2}{\pm\sqrt{4 a_2^2 (A^2 + B^2) - \left(A^2 + B^2 + a_2^2 - a_3^2\right)^2}} - \arctan\frac{A}{B} \\
\theta_{234} &= \theta_2 + \theta_3 + \theta_4 = \arctan\frac{-a_z}{a_x c_1 + a_y s_1} \\
\theta_{23} &= \arctan\frac{d_1 - p_z - d_6 s_5 s_{234} - a_4 s_{234} - a_2 s_2}{p_x c_1 + p_y s_1 - a_4 c_{234} - d_6 s_5 c_{234} - a_2 c_2} \\
\theta_3 &= \theta_{23} - \theta_2, \qquad \theta_4 = \theta_{234} - \theta_{23}
\end{aligned} \tag{7}$$
where $A = p_x c_1 + p_y s_1 - a_4 c_{234} - d_6 s_5 c_{234}$ and $B = d_1 - p_z - d_6 s_5 s_{234} - a_4 s_{234}$. To keep the robotic arm away from singular configurations, $\sin\theta_5 \neq 0$ must be ensured.
This process can be summarized as:
$$Q = IK(X) \tag{8}$$
where I K is the inverse kinematics function.
Analytical solutions can be obtained for simple robotic arms, while numerical methods such as Jacobian Inverse or Jacobian Transpose methods are employed for complex configurations.
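As a concrete instance of the numerical route, the sketch below iterates a damped least-squares (regularized Jacobian-inverse) update; fk and jacobian are assumed helpers such as those sketched in this section, the damping constant is illustrative, and the orientation error components are treated as small angle differences.

```python
import numpy as np

def ik_dls(x_des, q0, fk, jacobian, tol=1e-5, damping=0.01, iters=200):
    """Numerical inverse kinematics Q = IK(X) by damped least squares
    (a sketch, not the paper's exact solver)."""
    q = np.array(q0, dtype=float)
    for _ in range(iters):
        e = x_des - fk(q)                    # pose error (position + small angles)
        if np.linalg.norm(e) < tol:
            break
        J = jacobian(q)
        # damped pseudo-inverse keeps the step bounded near singularities
        dq = J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(6), e)
        q = q + dq
    return q
```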

5.2. Jacobian Matrix

The Jacobian matrix $J$ relates the joint velocities $\dot{Q}$ to the end-effector velocity $\dot{X}$:
$$\dot{X} = J\dot{Q} \tag{9}$$
The Jacobian matrix can be derived by differentiating the forward kinematics equations with respect to the joint angles.
$$\begin{aligned}
J_{11} &= d_6 c_1 s_4 s_5 - s_1\left(a_2 + a_3 c_2 + a_4 c_{23} + d_4 s_{23} + d_6 s_{23} c_5 + d_6 c_{23} c_4 s_5\right)\\
J_{12} &= c_1\left\{c_2\left[d_6(c_3 c_5 - c_4 s_3 s_5) + d_4 c_3 - a_4 s_3\right] - s_2\left[a_3 + d_6(c_5 s_3 + c_3 c_4 s_5) + a_4 c_3 + d_4 s_3\right]\right\}\\
J_{13} &= c_1\left[c_{23}(d_4 + d_6 c_5) - s_{23}(a_4 + d_6 c_4 s_5)\right]\\
J_{14} &= d_6 s_5 (c_4 s_1 - s_4 c_1 c_{23}), \qquad J_{15} = d_6\left(c_5 s_1 s_4 - c_1 s_{23} s_5 + c_1 c_{23} c_4 c_5\right), \qquad J_{16} = 0\\
J_{21} &= c_1\left(a_2 + a_3 c_2 + a_4 c_{23} + d_4 s_{23} + d_6 s_{23} c_5 + d_6 c_{23} c_4 s_5\right) + d_6 s_1 s_4 s_5\\
J_{22} &= s_1\left\{c_2\left[d_6(c_3 c_5 - c_4 s_3 s_5) + d_4 c_3 - a_4 s_3\right] - s_2\left[a_3 + d_6(c_5 s_3 + c_3 c_4 s_5) + a_4 c_3 + d_4 s_3\right]\right\}\\
J_{23} &= s_1\left[c_{23}(d_4 + d_6 c_5) - s_{23}(a_4 + d_6 c_4 s_5)\right]\\
J_{24} &= -d_6 s_5 (c_1 c_4 + s_1 s_4 c_{23}), \qquad J_{25} = d_6\left(s_1 c_{23} c_4 c_5 - s_1 s_{23} s_5 - c_1 c_5 s_4\right), \qquad J_{26} = 0\\
J_{31} &= 0, \qquad J_{32} = a_3 c_2 + a_4 c_{23} + d_4 s_{23} + d_6 s_{23} c_5 + d_6 c_{23} c_4 s_5\\
J_{33} &= a_4 c_{23} + d_4 s_{23} + d_6 s_{23} c_5 + d_6 c_{23} c_4 s_5, \qquad J_{34} = -d_6 s_{23} s_4 s_5\\
J_{35} &= d_6\left(c_{23} s_5 + s_{23} c_4 c_5\right), \qquad J_{36} = 0\\
J_{41} &= 0, \qquad J_{42} = J_{43} = s_1, \qquad J_{44} = s_{23} c_1, \qquad J_{45} = c_4 s_1 - s_4 c_1 c_{23}\\
J_{46} &= s_5\left(s_1 s_4 + c_1 c_4 c_{23}\right) + c_5 c_1 s_{23}\\
J_{51} &= 0, \qquad J_{52} = J_{53} = -c_1, \qquad J_{54} = s_{23} s_1, \qquad J_{55} = -(c_1 c_4 + s_1 s_4 c_{23})\\
J_{56} &= c_5 s_1 s_{23} - s_5\left(c_1 s_4 - c_4 s_1 c_{23}\right)\\
J_{61} &= 1, \qquad J_{62} = J_{63} = 0, \qquad J_{64} = c_{23}, \qquad J_{65} = s_{23} s_4, \qquad J_{66} = s_{23} c_4 s_5 - c_{23} c_5
\end{aligned} \tag{10}$$
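Because such element lists easily pick up transcription errors, a numerical cross-check is prudent. The sketch below approximates each Jacobian column by finite differences of the forward kinematics; fk is assumed to be a pose function such as the one sketched in Section 5.1.1, and this is a verification aid rather than part of the controller.

```python
import numpy as np

def numeric_jacobian(fk, q, eps=1e-6):
    """Finite-difference Jacobian: column i is dX/d(theta_i)."""
    x0 = fk(q)
    J = np.zeros((x0.size, q.size))
    for i in range(q.size):
        dq = q.copy()
        dq[i] += eps                      # perturb one joint angle
        J[:, i] = (fk(dq) - x0) / eps     # forward-difference column
    return J
```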

5.3. Trajectory Planning

To ensure smooth and vibration-free motion of the robot, abrupt changes in joint torques must be avoided. This requires the velocity and acceleration of each joint and the end-effector pose to change continuously over time. To achieve this, a seventh-order polynomial trajectory is adopted in Cartesian space. This approach guarantees the continuity of position, velocity, acceleration, and jerk.
$$\begin{aligned}
X(t) &= k_0 + k_1 t + k_2 t^2 + k_3 t^3 + k_4 t^4 + k_5 t^5 + k_6 t^6 + k_7 t^7 \\
\dot{X}(t) &= k_1 + 2 k_2 t + 3 k_3 t^2 + 4 k_4 t^3 + 5 k_5 t^4 + 6 k_6 t^5 + 7 k_7 t^6 \\
\ddot{X}(t) &= 2 k_2 + 6 k_3 t + 12 k_4 t^2 + 20 k_5 t^3 + 30 k_6 t^4 + 42 k_7 t^5 \\
\dddot{X}(t) &= 6 k_3 + 24 k_4 t + 60 k_5 t^2 + 120 k_6 t^3 + 210 k_7 t^4
\end{aligned} \tag{11}$$
where the coefficients are given by:
$$\begin{aligned}
k_0 &= X_0, \qquad k_1 = \dot{X}_0, \qquad k_2 = \tfrac{1}{2}\ddot{X}_0, \qquad k_3 = \tfrac{1}{6}\dddot{X}_0 \\
k_4 &= \frac{210(X_f - X_0) - t_f\left[(30\ddot{X}_0 - 15\ddot{X}_f) t_f + (4\dddot{X}_0 + \dddot{X}_f) t_f^2 + 120\dot{X}_0 + 90\dot{X}_f\right]}{6 t_f^4} \\
k_5 &= -\,\frac{168(X_f - X_0) - t_f\left[(20\ddot{X}_0 - 14\ddot{X}_f) t_f + (2\dddot{X}_0 + \dddot{X}_f) t_f^2 + 90\dot{X}_0 + 78\dot{X}_f\right]}{2 t_f^5} \\
k_6 &= \frac{420(X_f - X_0) - t_f\left[(45\ddot{X}_0 - 39\ddot{X}_f) t_f + (4\dddot{X}_0 + 3\dddot{X}_f) t_f^2 + 216\dot{X}_0 + 204\dot{X}_f\right]}{6 t_f^6} \\
k_7 &= -\,\frac{120(X_f - X_0) - t_f\left[(12\ddot{X}_0 - 12\ddot{X}_f) t_f + (\dddot{X}_0 + \dddot{X}_f) t_f^2 + 60\dot{X}_0 + 60\dot{X}_f\right]}{6 t_f^7}
\end{aligned} \tag{12}$$
with boundary conditions:
$$\begin{aligned}
X(t_0) &= X_0, & \dot{X}(t_0) &= \dot{X}_0, & \ddot{X}(t_0) &= \ddot{X}_0, & \dddot{X}(t_0) &= \dddot{X}_0, \\
X(t_f) &= X_f, & \dot{X}(t_f) &= \dot{X}_f, & \ddot{X}(t_f) &= \ddot{X}_f, & \dddot{X}(t_f) &= \dddot{X}_f
\end{aligned} \tag{13}$$
where $X_0 = [p_{x0}\ p_{y0}\ p_{z0}\ \alpha_0\ \beta_0\ \gamma_0]^T$ and $X_f = [p_{xf}\ p_{yf}\ p_{zf}\ \alpha_f\ \beta_f\ \gamma_f]^T$ represent the initial and final poses of the trajectory, and $t_f$ is the total duration of the planned trajectory.
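As a cross-check on Equation (12), the eight coefficients can also be obtained numerically by stacking the boundary conditions of Equation (13) into a linear system for one Cartesian component; a minimal sketch follows (the motion values in the example are arbitrary):

```python
import numpy as np

def septic_coeffs(x0, v0, a0, j0, xf, vf, af, jf, tf):
    """Solve the boundary conditions of Equation (13) for k0..k7 numerically;
    this yields the same polynomial as the closed forms of Equation (12)."""
    def rows(t):
        return np.array([
            [1, t, t**2,   t**3,    t**4,    t**5,     t**6,     t**7],  # position
            [0, 1, 2*t,  3*t**2,  4*t**3,  5*t**4,   6*t**5,   7*t**6],  # velocity
            [0, 0, 2,    6*t,    12*t**2, 20*t**3,  30*t**4,  42*t**5],  # acceleration
            [0, 0, 0,    6,      24*t,    60*t**2, 120*t**3, 210*t**4],  # jerk
        ])
    M = np.vstack([rows(0.0), rows(tf)])
    b = np.array([x0, v0, a0, j0, xf, vf, af, jf])
    return np.linalg.solve(M, b)   # coefficients k0..k7

# Rest-to-rest motion over 2 s between poses 0 and 0.5 m on one axis:
k = septic_coeffs(0, 0, 0, 0, 0.5, 0, 0, 0, 2.0)
print(np.polyval(k[::-1], 1.0))    # position at t = 1 s (mid-trajectory)
```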

5.4. Control Scheme

The operation control is based on vision positioning and feedback [23]. The control block diagram of the system is shown in Figure 10. The system employs a depth camera and UWB sensors to detect the position $(x, y, z)$ of the corner component, while an IMU sensor measures its attitude angles $(\alpha, \beta, \gamma)$. From the acquired pose $X_c$, the desired end-effector pose of the robotic arm $X_d$ is calculated through the transformation $T_{cd}$ between the operational pose and the recognition pose. The difference between the desired pose and the current pose $X_0$ of the robotic arm, $\Delta X = X_d - X_0$, is scaled by a PD controller to give the pose increment $K\Delta X + D\Delta\dot{X}$ for the next time step, so the commanded pose for the next moment is $X_0 + (K\Delta X + D\Delta\dot{X})$. From this pose, the joint driving angles $\theta_i$ are calculated through inverse kinematics, allowing the robotic arm to perform the required operation. Here, the current pose $X_0$ is computed via forward kinematics from the current joint angles $\theta_i$, and the end-effector pose increment relates to the joint increment through the Jacobian, $\Delta X = J\Delta Q$.
The advantage of this control method lies in its adaptability: when the end-effector is far from the desired pose, it enables a rapid approach to save time; when the end-effector is close to the target pose, it allows a slow approach to avoid collision with the manipulated object.
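One cycle of this scheme can be sketched as follows, where fk and ik stand for the kinematic functions of Equations (6) and (8) and K, D are the diagonal gain vectors listed in Table 1; this is a minimal illustration, not the deployed controller.

```python
import numpy as np

def control_step(x_d, q, fk, ik, K, D, prev_err, dt):
    """One control cycle of the scheme in Figure 10 (a sketch). x_d is the
    desired pose already derived from the camera observation via T_cd."""
    x_0 = fk(q)                      # current pose X_0 from joint angles
    err = x_d - x_0                  # pose error dX = X_d - X_0
    derr = (err - prev_err) / dt     # finite-difference error rate
    dx = K * err + D * derr          # PD-scaled pose increment
    q_next = ik(x_0 + dx)            # joint targets via inverse kinematics
    return q_next, err
```

Because the increment scales with the remaining error, the arm approaches quickly when far from the target and slows down near it, matching the adaptability described above.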

6. Simulation and Experiments

To verify the feasibility and effectiveness of the proposed system design and control method, a simulation system and a prototype system were built for experimental validation. The simulation and experiment parameters are shown in Table 1. Parameters such as K and D were fine-tuned through trials. The parameters of the desired trajectories were produced through Equation (12).

6.1. Simulations

Based on the previously designed automatic loading system hardware, the optimized design of a multi-functional mechanical gripper, and the deep vision-based control algorithms, a dynamic simulation model was constructed in Gazebo, as shown in Figure 1a. Subsequently, simulation experiments were conducted on the loading and unloading of a fully automatic twist lock onto or from a Modular Transport Container Module, as illustrated in Figure 11a,b, respectively. The pose curves of the multi-functional mechanical gripper during loading and unloading are shown in Figure 12a–d, respectively.

6.2. Experiments

The full-scale prototype of automatic loading system is implemented as shown in Figure 1b. Subsequently, prototype experiments were conducted on the loading and unloading of a fully automatic twist lock onto or from a Modular Transport Container Module, as illustrated in Figure 13a,b, respectively.
The perception results of the depth camera during the loading and unloading process are shown in Figure 14a,b, respectively. Many features are recognized, including the target Corner feature, and multi-feature fusion is employed to improve the stability and robustness of positioning. Comparative experiments on the difference between the single-feature and multi-feature perception methods are shown in Table 2. Meanwhile, the object detection algorithm based on the improved YOLOv5 was compared with the traditional YOLOv5: the inference speed of the improved YOLOv5 is 450 FPS, versus 120 FPS for the traditional YOLOv5.
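For reference, detector throughput can be estimated as sketched below; the input size, warm-up count, and device are assumptions, and the measured FPS depends entirely on the hardware, so this will not reproduce the reported 450/120 FPS figures.

```python
import time
import torch

def measure_fps(model, size=(1, 3, 640, 640), n=200, device="cuda"):
    """Rough detector throughput check (a sketch; hardware-dependent)."""
    x = torch.randn(size, device=device)
    model = model.to(device).eval()
    with torch.no_grad():
        for _ in range(20):              # warm-up to stabilize clocks/caches
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()     # flush queued GPU work before timing
        t0 = time.perf_counter()
        for _ in range(n):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return n / (time.perf_counter() - t0)   # frames per second
```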
The position and attitude angle curves of the multi-functional mechanical gripper during the loading process are shown in Figure 15a,b, respectively. As shown in the figure, the fully automated twist lock loading process using Arm 1 can be divided into six stages: initial grasping, rotating and moving beneath the corner piece, lifting the twist lock into the corner piece, rotating the twist lock, releasing it, and returning the gripper to the initial position. In the initial grasping stage, the manipulator remains at its starting position, with the gripper closed to securely hold the twist lock. During the manipulation stage, the end-effector gripper rotates to an angle of 70° about the z-axis, carrying the twist lock to a position directly beneath the corner piece. The manipulator then moves a certain distance along the negative y-axis to align the twist lock with the bottom hole of the corner piece. Subsequently, it moves upward along the positive z-axis. The end-effector then rotates from 70° to 130° about the z-axis. The gripper opens to release the twist lock into the corner piece. Finally, the gripper moves downward and returns to the initial position, completing the installation process.
The position and attitude angle curves of the multi-functional mechanical gripper during the unloading process are shown in Figure 15c,d, respectively. As shown in the figure, the fully automated twist lock unloading process using Arm 1 can be divided into six stages: approaching and reaching beneath the twist lock from the initial position, closing the gripper to grasp the twist lock, rotating the twist lock to unlock it, lowering to extract it, placing it at the designated position, and returning to the initial position. In the initial positioning stage, the manipulator starts from its home position. During the manipulation stage, the manipulator first moves to the rear-right side of the twist lock. The gripper then moves beneath the rear-right of the twist lock and rotates to an angle of 130° about the z-axis. It continues moving to a position directly beneath the twist lock and then moves upward along the positive z-axis. The gripper closes to securely grasp the twist lock and lifts it up by 3 mm. The end-effector then rotates from 130° to 70° about the z-axis. The gripper releases slightly and moves downward to extract the twist lock. Finally, the manipulator moves the twist lock to a designated location, opens the gripper to release it, and returns to the initial position, completing the unloading process.
To highlight the significance of deep vision-based operation control, comparative experiments were conducted with (closed-loop) and without (open-loop) deep vision. The comparative results for efficiency, accuracy, and control cycle are shown in Table 3. Efficiency is reflected by the operation success rate; accuracy is the position error between the gripper and the automatic twist lock; and the control cycle denotes the system latency, i.e., the real-time constraint. The target perception and positioning algorithm takes up most of the cycle time, and a GPU is employed to accelerate it to ensure real-time performance.

7. Discussion

To help readers better appreciate the practical significance of our work, this section summarizes the advantages, limitations, and implications of the proposed system.

7.1. Advantages

  • End-to-end autonomy. The tight coupling of perception, planning, and control allows the system to complete the entire twist lock workflow without human supervision.
  • Industrial-scale robustness. Extensive simulation and full-scale experiments confirm a 95.72 % success rate and 1.2 mm mean positional error under varying illumination, dust, and occlusions—requirements typical of real warehouses.
  • Modular, transferable design. Thanks to the containerized architecture and parameterized gripper CAD model, the solution can be re-scaled to different container sizes or robot brands with minimal re-engineering effort.
  • Computational efficiency. The vision stack runs at > 400 FPS and the whole control loop at 35 ms, meeting real-time constraints on standard off-the-shelf hardware.

7.2. Limitations

  • Environment-specific calibration. Fixed-pose cameras must be re-calibrated if the working volume or robot base is moved; future work could employ active depth sensors on the end-effector to reduce this dependency.
  • Lighting sensitivity. Although multi-feature fusion improves robustness, extreme back-lighting still degrades corner detection. Adaptive exposure control or event cameras are promising remedies.
  • Limited generalization beyond twist locks. The gripper fingertips are optimized for the ISO standard locking head; handling irregular bulk cargo would require replaceable jaws or soft robotic add-ons.
  • Compute-heavy training. Building the 3200-image dataset and training the improved YOLOv5 once demand a high-end GPU cluster; incremental learning strategies could mitigate this overhead.

7.3. Implications for Industrial Deployment

Deploying the system in production warehouses could cut manual labor by more than 50 % and reduce error-induced downtime. Moreover, the design principles, including gripper–workspace co-optimization, feature-level perceptual fusion, and high-order trajectory planning, are applicable to other logistics operations such as palletizing or autonomous container stacking.

8. Conclusions

In this paper, an intelligent automatic loading system with a multi-functional mechanical gripper and deep vision-based control framework was proposed and implemented. The system effectively combines containerized transport logic with robotic automation, enabling unmanned loading and unloading in warehouse environments. Through optimized gripper design and multi-robot coordination, the system successfully completes complex tasks such as twist lock manipulation and container posture adjustment. The integration of a multi-sensor vision system ensures accurate pose estimation and improves operational reliability. Experimental validations confirm that the proposed solution achieves high efficiency and precision, marking a significant step toward fully autonomous industrial logistics systems. Future work will focus on enhancing the generalization capability of the vision module and expanding system scalability for larger scale deployments.

Author Contributions

Conceptualization, G.C. and Y.W.; methodology, G.C., Y.W., J.Z., B.Z., X.S., J.M. and S.T.; software, H.D., A.C., G.Z. and Y.C.; validation, H.D., A.C., G.Z. and Y.C.; formal analysis, H.D., A.C., G.Z. and Y.C.; investigation, G.C., Y.W., J.Z., B.Z., X.S., J.M. and S.T.; resources, G.C. and Y.W.; data curation, H.D., A.C., G.Z. and Y.C.; writing—original draft preparation, G.C.; writing—review and editing, G.C. and S.G.; visualization, H.D., A.C., G.Z. and Y.C.; supervision, G.C. and S.G.; project administration, G.C., Y.W. and S.G.; funding acquisition, G.C., Y.W. and S.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Huaneng Group Headquarters Science and Technology Project “Research and Development of Standardized Material Carrying Platform and Intelligent Loading System” (HNKJ22-HF130), Central Government Guides Local Science and Technology Development Fund Projects (246Z1813G), National Key Research and Development Program of China (2022YFB4701600), Beijing Natural Science Foundation (L243013), A New Generation Information Technology Innovation Project of China University Research Innovation Fund (2023IT098) and Fundamental Research Funds for the Central Universities (2024XKRC069).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Authors Yaohui Wang and Xiaohu Sun were employed by the company Huaneng Coal Technology Research Co., Ltd. Authors Jinliang Zhang, Bo Zhang, Shihe Tian and Jixuan Ma were employed by the company Zhalainuoer Coal Industry Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Kim, D.; TabkhPaz, M.; Park, S.S.; Lee, J. Development of a vision-based automated hole assembly system with quality inspection. Manuf. Lett. 2023, 35, 64–73. [Google Scholar] [CrossRef]
  2. Kang, L.; Seo, J.T.; Kim, S.H.; Kim, W.J.; Yi, B.J. Design and implementation of a multi-function gripper for grasping general objects. Appl. Sci. 2019, 9, 5266. [Google Scholar] [CrossRef]
  3. Gümbel, P.; Dröder, K. Design of Highly Repeatable and Multi-Functional Grippers for Precision Handling with Articulated Robots. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 1971–1977. [Google Scholar]
  4. Cramer, J.; Decraemer, B.; Afzal, M.R.; Kellens, K. A user-friendly toolkit to select and design multi-purpose grippers for modular robotic systems. Procedia CIRP 2024, 127, 2–7. [Google Scholar] [CrossRef]
  5. Ai, S.; Chen, Y. Multifunctional gripper design at the end of the robot. Instrumentation 2022, 9, 4. [Google Scholar]
  6. Kladovasilakis, N.; Sideridis, P.; Tzetzis, D.; Piliounis, K.; Kostavelis, I.; Tzovaras, D. Design and development of a multi-functional bioinspired soft robotic actuator via additive manufacturing. Biomimetics 2022, 7, 105. [Google Scholar] [CrossRef]
  7. Yildiz, B.S.; Pholdee, N.; Bureerat, S.; Yildiz, A.R.; Sait, S.M. Robust design of a robot gripper mechanism using new hybrid grasshopper optimization algorithm. Expert Syst. 2021, 38, e12666. [Google Scholar] [CrossRef]
  8. Nguyen, T.V.; Huynh, N.T.; Vu, N.C.; Kieu, V.N.; Huang, S.C. Optimizing compliant gripper mechanism design by employing an effective bi-algorithm: Fuzzy logic and ANFIS. Microsyst. Technol. 2021, 27, 3389–3412. [Google Scholar] [CrossRef]
  9. Pinskier, J.; Wang, X.; Liow, L.; Xie, Y.; Kumar, P.; Langelaar, M.; Howard, D. Diversity-Based Topology Optimization of Soft Robotic Grippers. Adv. Intell. Syst. 2024, 6, 2300505. [Google Scholar] [CrossRef]
  10. Lee, L.Y.; Syadiqeen, O.A.; Tan, C.P.; Nurzaman, S.G. Closed-structure compliant gripper with morphologically optimized multi-material fingertips for aerial grasping. IEEE Robot. Autom. Lett. 2021, 6, 887–894. [Google Scholar] [CrossRef]
  11. Sun, Y.; Liu, Y.; Pancheri, F.; Lueth, T.C. Larg: A lightweight robotic gripper with 3-d topology optimized adaptive fingers. IEEE/ASME Trans. Mechatronics 2022, 27, 2026–2034. [Google Scholar] [CrossRef]
  12. Zablocki, É.; Ben-Younes, H.; Pérez, P.; Cord, M. Explainability of deep vision-based autonomous driving systems: Review and challenges. Int. J. Comput. Vis. 2022, 130, 2425–2452. [Google Scholar] [CrossRef]
  13. Ghasemieh, A.; Kashef, R. Towards explainable artificial intelligence in deep vision-based odometry. Comput. Electr. Eng. 2024, 115, 109127. [Google Scholar] [CrossRef]
  14. Wang, R.; Wang, S.; Du, S.; Xiao, E.; Yuan, W.; Feng, C. Real-time soft body 3D proprioception via deep vision-based sensing. IEEE Robot. Autom. Lett. 2020, 5, 3382–3389. [Google Scholar] [CrossRef]
  15. Choi, H.; Lee, J.; Yi, Y.; Na, H.; Kang, K.; Kim, T. Deep vision-based occupancy counting: Experimental performance evaluation and implementation of ventilation control. Build. Environ. 2022, 223, 109496. [Google Scholar] [CrossRef]
  16. Wang, Z.; Jin, L.; Wang, S.; Xu, H. Apple stem/calyx real-time recognition using YOLO-v5 algorithm for fruit automatic loading system. Postharvest Biol. Technol. 2022, 185, 111808. [Google Scholar] [CrossRef]
  17. Zou, W.; Zhang, D.; Lee, D.J. A new multi-feature fusion based convolutional neural network for facial expression recognition. Appl. Intell. 2022, 52, 2918–2929. [Google Scholar] [CrossRef]
  18. Jiang, M.; Yin, S. Facial expression recognition based on convolutional block attention module and multi-feature fusion. Int. J. Comput. Vis. Robot. 2023, 13, 21–37. [Google Scholar] [CrossRef]
  19. Li, H.; Vong, C.M.; Wong, P.K.; Ip, W.F.; Yan, T.; Choi, I.C.; Yu, H.H. A multi-feature fusion method for image recognition of gastrointestinal metaplasia (GIM). Biomed. Signal Process. Control 2021, 69, 102909. [Google Scholar] [CrossRef]
  20. Hu, F.; Qian, M.; He, K.; Zhang, W.a.; Yang, X. A novel multi-feature fusion network with spatial partitioning strategy and cross-attention for armband-based gesture recognition. IEEE Trans. Neural Syst. Rehabil. Eng. 2024, 32, 3878–3890. [Google Scholar] [CrossRef]
  21. Liu, C.; Xu, X. AMFF: A new attention-based multi-feature fusion method for intention recognition. Knowl. Based Syst. 2021, 233, 107525. [Google Scholar] [CrossRef]
  22. Zhang, Y.; Guo, Z.; Wu, J.; Tian, Y.; Tang, H.; Guo, X. Real-time vehicle detection based on improved yolo v5. Sustainability 2022, 14, 12274. [Google Scholar] [CrossRef]
  23. Shahria, M.T.; Sunny, M.S.H.; Zarif, M.I.I.; Ghommam, J.; Ahamed, S.I.; Rahman, M.H. A comprehensive review of vision-based robotic applications: Current state, components, approaches, barriers, and potential solutions. Robotics 2022, 11, 139. [Google Scholar] [CrossRef]
Figure 1. Simulation and experimental prototypes of the unmanned warehouse automated loading and unloading system.
Figure 2. The single robotic arm with a gripper and a depth camera.
Figure 3. Three flag states in the operation process. (a) The state before insertion in the locking procedure or after extraction in the unlocking procedure; (b) the state before rotation in the locking procedure or after rotation in the unlocking procedure; (c) the state after rotation in the locking procedure or before rotation in the unlocking procedure.
Figure 4. Type I Embedded Two-Finger Gripper and Its Limitations.
Figure 5. Type II Embedded Two-Finger Gripper and Its Limitations.
Figure 6. Type III Embedded Two-Finger Gripper and Its Limitations.
Figure 7. Comparison among three types of gripper.
Figure 8. The final optimized multi-functional mechanical gripper design. (a) Locking and unlocking manual adjustable twist locks; (b) correcting the posture of the transport container 1; (c) correcting the posture of the transport container 2; (d) positioning and securing the transport container 1; (e) positioning and securing the transport container 2.
Figure 9. Recognition results showing multiple detected features. The primary target is the Corner, while other features provide supplementary information for pose estimation.
Figure 10. The control scheme of deep vision-based operation control.
Figure 11. The simulation experiment of operating the fully automatic twist lock by a multi-functional mechanical gripper.
Figure 12. The pose curves of the multi-functional mechanical gripper in simulations. (a) The position curves during the loading process; (b) the attitude angle curves during the loading process; (c) the position curves during the unloading process; (d) the attitude angle curves during the unloading process.
Figure 13. The prototype experiment of operating the fully automatic twist lock by a multi-functional mechanical gripper.
Figure 14. The perception results of the depth camera.
Figure 15. The pose curves of the multi-functional mechanical gripper in experiments. (a) The position curves during the loading process; (b) the attitude angle curves during the loading process; (c) the position curves during the unloading process; (d) the attitude angle curves during the unloading process.
Table 1. Simulation and experiment parameters.
Parameter | Value
Controller CPU | Core i7-12700
Controller GPU | RTX 4060
Number of DoF | 6 (arm) + 1 (gripper)
K | (0.02, 0.01, 0.01, 0.005, 0.005, 0.009)
D | (0.0015, 0.0011, 0.0013, 0.00035, 0.00033, 0.00087)
$T_{cd}$ | [1 0 0 50; 0 1 0 0; 0 0 1 120; 0 0 0 1] (translation components in mm)
Table 2. Comparative results between single-feature and multi-feature perception methods.
Method | Success Rate | Accuracy
Single-feature | 88.26% | 3.7 mm
Multi-feature | 95.72% | 1.2 mm
Table 3. Comparative results with and without deep vision.
Method | Efficiency | Accuracy | Control Cycle
With deep vision | 97.1% | 1.2 mm | 35 ms
Without deep vision | 63.4% | 15.3 mm | 8 ms