Symmetric UAV Cooperative Lifting Motion Planning in Confined Space

Jingwen Huang; Tianyi Jia; Xiulan Wei

doi:10.3390/sym17122041

,

and

College of Intelligent Engineering and Supply Chain Innovation, Beijing Wuzi University, Beijing 101126, China

^*

Author to whom correspondence should be addressed.

Symmetry2025, 17(12), 2041;https://doi.org/10.3390/sym17122041
(registering DOI)

This article belongs to the Special Issue Symmetry/Asymmetry Studies in Data Mining & Machine Learning

Version Notes

Order Reprints

Abstract

This paper investigates the motion planning problem for symmetric UAV cooperative lifting in confined spaces. A dynamic model of the symmetric UAV cooperative lifting system is established, and differential flatness analysis is employed to transform nonlinear dynamics into constraints on flat outputs, thereby simplifying the motion planning process. The planning framework consists of two levels: path planning and trajectory planning. For path planning, a reinforcement learning-based bidirectional RRT (RLDB-BiRRT) method is proposed, which integrates the random tree expansion mechanism with the DDPG algorithm to achieve adaptive directional bias. This approach effectively mitigates the issues of low search efficiency and excessive redundant nodes inherent in traditional RRT algorithms. For trajectory planning, an adaptive safe flight corridor (SFC) construction method is introduced, combining symmetric ellipsoids and convex polyhedra to generate high-quality linear constraints. Building upon the proposed motion planning method and leveraging differential flatness analysis, a unified planning framework is developed that seamlessly integrates the reinforcement learning-enhanced path planning with adaptive safe corridor construction and differential-flatness-based trajectory optimization, specifically designed for symmetric UAV cooperative lifting tasks in confined spaces. This integrated approach enhances corridor space utilization and ensures trajectory continuity. Simulation experiments validate the effectiveness of the proposed methods, demonstrating their capability to generate dynamically feasible, smooth, and safe transportation trajectories in confined environments, while effectively constraining load swing and UAV attitude angles. This study provides theoretical foundations and practical references for the application of symmetric UAV cooperative lifting in low-altitude logistics and emergency transportation scenarios.

Keywords:

symmetric UAV lifting; reinforcement learning; differential flatness; trajectory planning; Safe Corridor (SFC)

1. Introduction

Low-altitude logistics, an innovative domain emerging from the integration of the low-altitude economy and smart logistics, effectively utilizes low-altitude Unmanned Aerial Vehicles (UAV) to overcome ground transportation constraints. It not only reduces operational costs but also demonstrates significant value in specialized applications such as emergency rescue, medical supply delivery, and oversized cargo transport. Quadrotor UAV are commonly deployed in these scenarios; however, limited payload capacity of individual drones has led to the development of cooperative transportation using symmetric drone formations. When operating in confined spaces, these systems encounter challenges such as spatial constraints, complex terrain, and obstacles that impede movement. To ensure safe and smooth navigation through such environments, motion planning that incorporates UAV dynamics is essential.

Motion planning consists of two levels: path planning and trajectory planning. Path planning identifies a collision-free geometric path in a constrained space, often by simplifying the system as a point mass. Trajectory planning further incorporates dynamic configuration and temporal constraints to generate a time-parameterized executable trajectory. Among various path planning algorithms, sampling-based methods such as the Rapidly-exploring Random Tree (RRT) are widely used. RRT incrementally expands tree nodes toward the goal, offering advantages like short planning times and adaptability to high-dimensional environments. However, it often produces suboptimal paths due to random sampling and exhibits low search efficiency. Improved variants such as RRT* guarantee asymptotic optimality by rewiring the tree, and Informed RRT* further accelerates convergence by heuristically constraining the sampling domain. Recent hybrids couple RRT* with complementary techniques: an adaptive RRT* was fused with the Dynamic Window Approach [] to enhance dynamic feasibility, and a bidirectional RRT integrated with particle-swarm optimization [] was proposed for smoother trajectories. Nevertheless, their efficacy in highly confined spaces remains limited.

In the field of trajectory planning, many remarkable solutions have emerged. Alkomy et al. [] used polynomial trajectories for vibration reduction of a quadrotor with a cable-suspended payload. Quadratic programming (QP) approaches reformulate trajectory generation into problems with linear constraints [] or mixed-integer quadratic programming (MIQP) formulations [], both demonstrating strong performance in complex scenarios. Reinforcement learning (RL) methods generate trajectories via value function approximation and have shown robustness in varying conditions [,]. The challenge of real-time obstacle avoidance in dense environments has been addressed through an integrated sensor-based planning approach [].

While these studies provide a foundation for cooperative trajectory planning, symmetric dual-drone systems introduce additional challenges. Planning must account for both individual drone dynamics and inter-agent coordination. Differential flatness simplifies this by reducing the high-dimensional system into smooth parameterizations of flat outputs, embedding dynamics into the outputs and their derivatives. The challenge of multi-helicopter cooperative transportation in crowded environments has been addressed through real-time trajectory planning leveraging differential flatness []. In multi-UAV systems, Valentim et al. [] introduced a new set of flat outputs for cooperative load transport, achieving precise tracking.

Safe Flight Corridors (SFC) compress the search space into finite-dimensional representations, providing convex constraints for trajectory optimization. Various SFC construction methods have been proposed, including point-cloud processing techniques [,] and convex optimization-based approach []. Common representations include spherical corridors [] and ellipsoidal corridors []. However, these methods often model drones as point masses, neglecting formation constraints in cooperative systems.

To enhance the adaptability of path planning while accounting for the formation constraints of a symmetric UAV cooperative lifting system in confined spaces, this paper proposes a motion planning strategy comprising the following components: (1) A novel integration of DDPG reinforcement learning with a bidirectional RRT framework (RLDB-BiRRT) is introduced, which significantly improves path search efficiency through adaptive parameter tuning. (2) An adaptive safe flight corridor construction method is designed, which enhances the spatial utilization of cooperative systems by hybridizing ellipsoidal and polyhedral geometries, thereby addressing a key challenge long overlooked by safe flight corridor methods—the configuration constraints of UAV formations. (3) A unified planning framework is developed, seamlessly integrating the above reinforcement learning-enhanced path planning method and adaptive safe corridor construction technique with differential-flatness-based trajectory optimization, specifically tailored for symmetric UAV cooperative lifting tasks in confined spaces.

2. Dynamic Model and Analysis

The overall architecture of the symmetric UAV cooperative transportation system is depicted in Figure 1. The process begins with the construction of a grid map of the confined space. A reinforcement learning-based Bi-RRT algorithm then generates a feasible path, followed by the construction of a Safe Flight Corridor (SFC) via convex decomposition. Finally, the optimal trajectory

X_{d}

for the controller is obtained by solving an optimization problem.

Figure 1. The overall framework of the control system.

Compared to single-UAV transport, the symmetric cooperative system introduces additional degrees of freedom and nonlinear dynamic coupling. This necessitates simultaneous management of both UAVs’ attitudes and thrusts while satisfying geometric and dynamic equilibrium constraints, significantly increasing planning and control complexity. To streamline trajectory planning while ensuring dynamic feasibility, the system’s dynamic model is firstly established. Subsequently, differential flatness analysis is applied, transforming the nonlinear dynamics into equivalent constraints on the flat outputs and their derivatives. This reduces constraint dimensionality and enables efficient analytical motion planning.

2.1. Dynamic Model

The physical model and coordinate system of the symmetric UAV cooperative transportation system are illustrated in Figure 2, where

O - X Y Z

denotes the inertial frame. The positions of the symmetric UAVs, UAV1 and UAV2, are denoted as

(x_{1}, y_{1}, z_{1})

and

(x_{2}, y_{2}, z_{2})

, respectively, and the payload position is denoted as

(x_{3}, y_{3}, z_{3})

.

Figure 2. Physical model and coordinate system of the symmetric UAV cooperative lifting system.

The cable lengths connecting UAV1 and UAV2 to the payload are

L_{1}

and

L_{2}

, respectively. The masses of the symmetric UAVs and the payload are

m_{1}

,

m_{2}

, and

m_{3}

, respectively. The angles

θ_{1}

and

θ_{2}

represent the angles between the projections of cables

L_{1}

and

L_{2}

onto the

x y

-plane and the x-axis, respectively. The angles

ϕ_{1}

and

ϕ_{2}

represent the angles between cables

L_{1}

,

L_{2}

and the z-axis, respectively. Under the thrust generated by their propellers, the UAVs achieve motion control along the X, Y, and Z axes of the inertial frame to accomplish spatial positioning and transportation tasks for the payload.

To facilitate modeling and the design of subsequent algorithms, the following four assumptions are made:

The UAV fuselage structure is symmetric.
Both the UAVs and the payload are symmetric rigid bodies with uniform mass density.
The mass of the cables connecting the UAVs to the payload is negligible.
The swing angles of the cables connected to the payload satisfy $ϕ, θ \in (- \frac{π}{2}, \frac{π}{2})$ .

To derive the dynamic model based on the Lagrange equation, the generalized coordinate vector is introduced:

q = {[L_{1}, θ_{1}, ϕ_{1}, L_{2}, θ_{2}, ϕ_{2}, x_{3}, y_{3}, z_{3}]}^{T} \in R^{9}

(1)

where

L_{1}

and

L_{2}

are the cable lengths,

θ_{1}, ϕ_{1}

and

θ_{2}, ϕ_{2}

are the swing angles of cable 1 and cable 2, and

x_{3}, y_{3}, z_{3}

are the payload position coordinates.

The symmetric UAVs are interconnected via the payload and cables. The following kinematic relations are considered in the modeling process:

x_{1} = x_{3} + L_{1} \sin ϕ_{1} \cos θ_{1}, y_{1} = y_{3} + L_{1} \sin ϕ_{1} \sin θ_{1}, z_{1} = z_{3} + L_{1} \cos ϕ_{1}

(2)

Due to the symmetry of the UAVs, we have

x_{2} = x_{3} + L_{2} \sin ϕ_{2} \cos θ_{2}, y_{2} = y_{3} + L_{2} \sin ϕ_{2} \sin θ_{2}, z_{2} = z_{3} + L_{2} \cos ϕ_{2}

(3)

The kinetic energy T and potential energy V of the system are given by:

T = \frac{1}{2} \sum_{i = 1}^{3} m_{i} ({\dot{x}}_{i}^{2} + {\dot{y}}_{i}^{2} + {\dot{z}}_{i}^{2}), V = \sum_{i = 1}^{3} m_{i} g z_{i}

(4)

Substituting the kinematic constraints into these expressions yields the Lagrangian of the system:

L (q, \dot{q}) = T (\dot{q}) - V (q)

(5)

Considering the generalized forces

Q

(which include the UAV thrusts and cable tensions), the equations of motion are derived via the Euler–Lagrange equation:

\frac{d}{d t} (\frac{\partial L}{\partial {\dot{q}}_{i}}) - \frac{\partial L}{\partial q_{i}} = Q_{i}, (i = 1, 2, \dots, 9)

(6)

The above dynamic equations can be expressed in matrix form as:

M (q) \ddot{q} + C (q, \dot{q}) \dot{q} + G (q) = F

(7)

where

M (q) \in R^{9 \times 9}

is the inertia matrix,

C (q, \dot{q}) \in R^{9 \times 9}

is the Coriolis and centrifugal matrix,

G (q) \in R^{9 \times 1}

is the gravity vector, and

F \in R^{9 \times 1}

is the generalized force vector applied to the system. The explicit expressions of the matrix can be found in https://blog.csdn.net/weixin_56678845/article/details/153729276?spm=1001.2014.3001.5502 (accessed on 28 October 2025)

2.2. Differential Flatness Analysis

A system is considered differentially flat if there exists a set of flat outputs such that all states and control inputs can be expressed as algebraic functions of these outputs and their finite-order derivatives []. This property transforms the complex, nonlinear system dynamics into an equivalent set of constraints on the flat outputs, significantly simplifying trajectory planning.

For the symmetric dual-UAV cooperative transport system studied in this paper, under the assumption that the payload is a point mass, the following vector is selected as the flat output:

σ = {[x_{3}, y_{3}, z_{3}, ϕ]}^{T}

(8)

where

x_{3}, y_{3}, z_{3}

are the coordinates of the payload’s center of mass in the inertial frame, and

ϕ

is a common cable inclination angle, embodying the symmetric condition

ϕ_{1} = ϕ_{2} = ϕ

.

The differential flatness is established by demonstrating that all generalized coordinates

q

(from Equation (1)) and the generalized forces F in the system’s Lagrangian dynamics (Equation (7)) can be algebraically expressed using

σ

and its derivatives.

The components of the generalized coordinate vector

q

are expressed in terms of the flat outputs as follows. The payload coordinates

x_{3}, y_{3}, z_{3}

are elements of

σ

by definition. The symmetric cable inclination is directly specified:

ϕ_{1} = ϕ_{2} = ϕ

.

The horizontal swing angles

θ_{1}

and

θ_{2}

are determined by the system’s kinematics and dynamics. The system’s symmetry implies that the two UAVs are symmetrically distributed about the payload in the horizontal plane, leading to the relation:

θ_{2} = θ_{1} + π

(9)

To find an algebraic expression for

θ_{1}

, we analyze the payload’s dynamics. The force balance equations for the payload in the horizontal plane, which are inherent in the structure of the Lagrangian model, are:

\begin{matrix} m_{3} {\ddot{x}}_{3} & = T_{1} \sin ϕ \cos θ_{1} + T_{2} \sin ϕ \cos θ_{2} \end{matrix}

(10)

\begin{matrix} m_{3} {\ddot{y}}_{3} & = T_{1} \sin ϕ \sin θ_{1} + T_{2} \sin ϕ \sin θ_{2} \end{matrix}

(11)

Substituting the geometric constraint

θ_{2} = θ_{1} + π

(i.e.,

\cos θ_{2} = - \cos θ_{1}, \sin θ_{2} = - \sin θ_{1}

) into Equations (10) and (11) yields:

\begin{matrix} m_{3} {\ddot{x}}_{3} & = (T_{1} - T_{2}) \sin ϕ \cos θ_{1} \end{matrix}

(12)

\begin{matrix} m_{3} {\ddot{y}}_{3} & = (T_{1} - T_{2}) \sin ϕ \sin θ_{1} \end{matrix}

(13)

In the ideally symmetric configuration corresponding to either stationary conditions or uniform linear motion, the cable tensions satisfy

T_{1} = T_{2}

. However, the presence of horizontal load acceleration breaks this tension symmetry (

T_{1} \neq T_{2}

) to enable coordinated system motion, thereby producing the control moment essential for manipulating the load’s dynamic state.

A more rigorous approach involves solving for

θ_{1}

directly from the horizontal force balance. For the system to generate horizontal acceleration, a non-zero control moment is required, which implies

T_{1} \neq T_{2}

. Under this condition, dividing Equation (13) by Equation (12) cancels out the common factor

(T_{1} - T_{2}) \sin ϕ

, yielding:

\frac{{\ddot{y}}_{3}}{{\ddot{x}}_{3}} = \frac{\sin θ_{1}}{\cos θ_{1}} = \tan θ_{1}

(14)

Therefore, provided a horizontal acceleration exists

({\ddot{x}}_{3}, {\ddot{y}}_{3}) \neq (0, 0)

, the horizontal swing angle

θ_{1}

is determined by the algebraic relation:

θ_{1} = arctan2 ({\ddot{y}}_{3}, {\ddot{x}}_{3})

(15)

This crucial equation links

θ_{1}

directly to the second-order derivatives of the flat outputs

x_{3}

and

y_{3}

. Consequently,

θ_{2} = θ_{1} + π

.

The cable lengths

L_{1}

and

L_{2}

are treated as known system parameters, held constant at

L_{1} = L_{2} = L_{0}

during transport.

With all components of

q

now expressed as functions of

σ

and its derivatives, the generalized velocities

\dot{q} (t)

and accelerations

\ddot{q} (t)

are likewise determined.

The final step is to show that the generalized forces F in Equation (7) are also algebraic functions of the flat outputs. Substituting the fully parameterized expressions

q (σ)

,

\dot{q} (σ, \dot{σ})

, and

\ddot{q} (σ, \dot{σ}, \ddot{σ})

into the Lagrangian dynamics transforms the differential equation into an identity that algebraically defines F:

F = M (σ, \dot{σ}, \dots) \ddot{q} (σ, \dot{σ}, \ddot{σ}) + C (σ, \dot{σ}, \dots) \dot{q} (σ, \dot{σ}, \ddot{σ}) + G (σ)

(16)

The physical control inputs—the UAV thrust vectors

F_{1}, F_{2}

and the cable tensions

T_{1}, T_{2}

—are linearly related to specific components of this generalized force vector F. For example, the components of F corresponding to the swing angles in the Lagrangian formulation are directly related to the moments generated by the cable tensions. The UAV thrusts can subsequently be resolved from the force balance in their respective translational dynamics, where all other terms (position, tension) are already known functions of

σ

. This completes the proof that all states and inputs are algebraically parameterized by the flat output.

The symmetric dual-UAV cooperative lifting system is differentially flat with the flat output

σ = {[x_{3}, y_{3}, z_{3}, ϕ]}^{T}

. This property allows the complex motion planning problem to be reduced to planning within the flat output space. A trajectory for

σ (t)

automatically defines a dynamically feasible trajectory for the full system, guaranteed by the algebraic consistency with the Lagrangian model. This forms the theoretical foundation for the minimum-snap trajectory optimization presented in Section 5.

3. Path Planning with Reinforcement Learning

3.1. Directional-Biased Bidirectional RRT

The directional-biased bidirectional RRT (DB-BiRRT) algorithm builds upon our prior work [], where the core framework was initially developed and validated. In this paper, we focus on its application to the symmetric UAV cooperative system. The algorithm integrates goal-biased sampling, improved potential field guidance, and adaptive step size to optimize path planning in confined spaces.

3.1.1. Goal-Biased Mechanism with Regional Probability Sampling

The algorithm employs an adaptive goal-biasing strategy to balance exploration and exploitation:

P_{bias} = \{\begin{matrix} P_{direct}, & if r \leq P_{0} \\ P_{regional}, & if r > P_{0} \end{matrix}

(17)

where

P_{direct}

denotes the target point coordinates,

P_{regional}

represents the coordinates of a regional probability cluster point,

P_{0}

is the bias probability threshold, and

r \in (0, 1)

is a random number.

When

r > P_{0}

, sampling is based on a regional probability distribution determined by a Gaussian function:

G (d) = exp [- \frac{1}{2} {(\frac{d}{ζ})}^{2}]

(18)

where

ζ = k L

is the standard deviation proportional to the start-target distance L, k is a scaling factor, and d is the perpendicular distance from a region’s center to the start-target line.

3.1.2. Direction Guidance Mechanism Based on Improved Potential Field

To address local minima in traditional potential fields, we introduce an improved potential field that incorporates random sampling points. The potential functions are defined as:

\begin{matrix} U_{att} (q) & = \{\begin{matrix} \frac{1}{2} α_{1} d_{g} {(q, q_{goal})}^{2}, & d_{g} < d_{g 0} \\ \frac{1}{2} α_{1} d_{g 0} d_{g} (q, q_{goal}), & d_{g} \geq d_{g 0} \end{matrix} \end{matrix}

(19)

\begin{matrix} U_{rep} (q) & = \{\begin{matrix} \frac{1}{2} β {(\frac{1}{d (q, q_{obs})} - \frac{1}{d_{0}})}^{2} d_{g} {(q, q_{goal})}^{n}, & d (q, q_{obs}) < d_{0} \\ 0, & d (q, q_{obs}) \geq d_{0} \end{matrix} \end{matrix}

(20)

where

α_{1}

and

α_{2}

are attractive gain coefficients;

β

is the repulsive gain coefficient; q,

q_{obs}

,

q_{goal}

are the coordinates of the current node, obstacle, and target point;

d (q, q_{obs})

,

d_{g} (q, q_{goal})

are Euclidean distances;

d_{0}

is the maximum influence distance of obstacles;

d_{g 0}

is the threshold distance for the attractive field; n is the repulsive strength parameter (

n > 0

).

The corresponding forces are derived as:

\begin{matrix} F_{att} & = - \nabla U_{att} (q) = \{\begin{matrix} α_{1} d_{g} (q, q_{goal}), & d_{g} < d_{g 0} \\ α_{1} d_{g 0}, & d_{g} \geq d_{g 0} \end{matrix} \end{matrix}

(21)

\begin{matrix} F_{rep} & = - \nabla U_{rep} (q) = \{\begin{matrix} β (\frac{1}{d (q, q_{obs})} - \frac{1}{d_{0}}) \frac{d_{g} {(q, q_{goal})}^{n}}{d {(q, q_{obs})}^{2}} \frac{\partial d (q, q_{obs})}{\partial q}, & d (q, q_{obs}) < d_{0} \\ 0, & d (q, q_{obs}) \geq d_{0} \end{matrix} \end{matrix}

(22)

The direction of

F_{att}

is from the current node toward the target, while

F_{rep}

is directed from the obstacle toward the current node.

3.1.3. Adaptive Step Size Mechanism

An adaptive step size based on obstacle density enhances planning flexibility:

L = L_{\min} + (L_{\max} - L_{\min}) e^{- k \cdot ρ}

(23)

where

ρ = A_{obs} / A_{total}

is the obstacle density ratio,

A_{total}

is the area of the obstacle detection region,

A_{obs}

is the area occupied by obstacles,

L_{\max}

and

L_{\min}

are the step size bounds, and k is a decay coefficient.

These adaptive parameters

P_{0}

,

β

, and L are dynamically optimized by the reinforcement learning module described in the next subsection.

3.2. Reinforcement Learning Supporting

To enhance the adaptability and flexibility of the algorithm in environments of varying complexity, this paper proposes a reinforcement learning-supported DB-BiRRT (RLDB-BiRRT) method, which integrates a learning module to dynamically generate key parameters during node expansion. The overall framework is illustrated in Figure 3.

Figure 3. The framework of RLDB-BiRRT.

With the reinforcement learning support, the adaptive parameters

P_{0}

(target bias probability),

β

(repulsive gain coefficient), and L (step size) in Equations (17), (22), and (23) are dynamically output by a trained decision-making model. The random tree expansion process is modeled as a Markov Decision Process (MDP), with the following components.

3.2.1. State Space

To capture sufficient environmental information around a node, the state space is designed as:

S = {D_{g 1}, D_{g 2}, D_{g 3}, D_{s 1}, D_{s 2}, D_{s 3}, D_{o 1}, D_{o 2}, D_{o 3}, N_{p 1}, N_{p 2}, N_{p 3}}

(24)

where

D_{g i}

,

D_{s i}

,

D_{o i}

and

N_{p i}

represent, respectively, the 3D vectors from the current node of tree i to its corresponding target point, to the nearest random sampling point, and to the nearest obstacle, as well as the total number of nodes within a specified radius of the current node of tree i, for

i = 1, 2, 3

.

3.2.2. Action Space

Based on the design and parameter requirements of RLDB-BiRRT, the action space is defined as a continuous vector:

A = {P_{b 1}, P_{b 2}, β_{1}, β_{2}, L_{1}, L_{2}}

(25)

where

P_{b 1}, P_{b 2}

: Goal-Biased probabilities for the two trees;

β_{1}, β_{2}

: Repulsive gain coefficients for the two trees;

L_{1}, L_{2}

: Step sizes for the two trees.

3.2.3. Reward Function

The reward function is designed to guide the agent through sparse reward challenges by combining multiple objectives. The four core reward components, their design objectives, triggering conditions, and values are defined in Table 1.

Table 1. Design of fundamental reward components.

The total reward

R_{total}

is formulated as a weighted sum of these components:

R_{total} = w_{obs} R_{obs} + w_{prox} R_{prox} + w_{conn} R_{conn} + w_{qual} R_{qual}

(26)

with the weight coefficients set as:

w_{obs} = w_{prox} = w_{conn} = 1.0

and

w_{qual} = 0.8

, to prioritize safety and feasibility over path optimality.

This hierarchical design prioritizes task success and safety. The primary weights on

R_{obs}

,

R_{prox}

, and

R_{conn}

ensure the agent first learns a feasible and safe policy, while the secondary weight on

R_{qual}

refines the solution for optimality, leading to more stable training.

3.2.4. Training Process

The Deep Deterministic Policy Gradient (DDPG) algorithm [] is employed to train the agent due to its suitability for continuous action spaces. The agent is trained over a series of episodes. Each episode terminates upon successful connection of the two trees, collision with an obstacle, or exceeding the maximum number of iterations (set to 2000). Upon completion of training, the trained agent is integrated into the complete workflow of the RLDB-BiRRT algorithm, with the pseudocode shown in Algorithm 1. The algorithm begins by initializing parameters, including start and goal positions, the obstacle map, and the two random trees. The pre-trained agent is then loaded. At each decision step, the agent observes the current state S of the environment and outputs an action vector A. This action vector dynamically sets the parameters for target biasing, repulsive field gain, and step size. Based on these adapted parameters, the algorithm performs goal-biased sampling, regional probability sampling, and calculates the expansion direction using the improved potential field. Combined with the adaptive step size, a new sampling point is determined. This sampling cycle continues until the two random trees successfully connect, yielding a feasible path and completing the path planning process.

Algorithm 1: RLDB-BiRRT: Reinforcement Learning-based Bi-RRT

3.3. Simulation Experiments

Simulation experiments were conducted in MATLAB R2023a on a computer equipped with an Intel i9 CPU and 32 GB RAM, to validate the effectiveness and superiority of the proposed RLDB-BiRRT algorithm in path planning. A custom 3D obstacle map with dimensions of

45 \times 45 \times 45

cubic units was constructed, incorporating various obstacle primitives including spheres, cylinders, and cuboids, arranged to form complex regions, narrow passages, and scenarios that mimic challenging real-world confined spaces.

Parameters for the RLDB-BiRRT algorithm were set as follows: obstacle influence distance

d_{0} = 10

units, attractive field threshold

d_{g 0} = 50

units, attractive gains

α_{1} = 2, α_{2} = 10

, safe distance

d_{safe} = 5

units, repulsive parameter

n = 2

, with adaptive step size bounds

L_{\max} = 3.5

units,

L_{\min} = 0.2

units, and a maximum of 2000 iterations per planning trial. The hyperparameters for the DDPG algorithm, determined through empirical tuning, are listed in Table 2.

Table 2. DDPG algorithm hyperparameters.

The training reward curve of the DDPG agent is shown in Figure 4. It can be observed that the agent’s average reward exhibits considerable variation during the initial exploration phase. As training progresses, the overall trend stabilizes. The reward curve fluctuates within a narrow range after approximately 500 episodes, demonstrating the convergence and improved stability of the training process.

Figure 4. Training reward curve of the DDPG agent.

To evaluate the performance of the RLDB-BiRRT algorithm, three benchmark sampling-based algorithms were selected for comparison: the classic RRT* [], Improved Bi-RRT* [], and APF-RRT* []. These algorithms were tested on identical obstacle maps to ensure a fair and comprehensive performance comparison. For all compared algorithms, the maximum number of iterations was set to 2000, with a fixed step size of 1.75 units, a maximum connection distance of 5 units, and a rewiring radius of 5 units for RRT*-based optimizations.

The simulation results of the four algorithms are qualitatively compared in Figure 5. Twenty independent Monte Carlo trials were conducted for each algorithm to ensure statistical significance. The runtime, path length, and number of nodes were recorded for each trial. The statistical results for computation time and path length are summarized in Figure 6 and Figure 7, respectively. The results indicate that the proposed RLDB-BiRRT algorithm achieves a favorable balance between planning efficiency (time) and path quality (length), consistently outperforming the benchmark algorithms in these key metrics across diverse environmental configurations.

Figure 5. Comparison of paths generated by the four algorithms in a representative scenario.

Figure 6. Comparison of average computation time of the four algorithms.

Figure 7. Comparison of average final path length of the four algorithms.

The proposed RLDB-BiRRT algorithm introduces a learning-based paradigm to dynamically adapt the core expansion parameters of a BiRRT, aiming for greater robustness and efficiency across diverse environments. We note that deep reinforcement learning comprises both offline training and online inference phases. The former is computationally intensive and time-consuming, as it requires extensive environmental interactions and experience replay. However, once training is completed, the computational overhead of online inference—that is, using the trained model to make decisions based on the current state—becomes very low. Each inference step takes approximately 0.08 seconds and consumes 0.315 MB of memory. Such a computational load is considered typical for conventional offline training scenarios. Statistical results show that the agent first discovers a feasible solution within 184 ± 27 episodes and reaches a reward plateau (±5 %) after approximately 487 ± 42 episodes. The entire training process takes 21.6 min. During online execution, the average decision time is only 0.18 s, which meeting the fast-response requirement.

4. Safe Flight Corridor Construction

For symmetric UAV systems performing cooperative transportation in confined spaces, the high obstacle density and stringent spatial constraints present dual challenges. Traditional safe flight corridor (SFC) methods addressed in [,] typically model free space as spheres or ellipsoids. While these approaches suit single-UAV high-speed flight scenarios, they fail to accommodate the unique configuration requirements of dual-UAV cooperative systems, resulting in poor corridor space utilization and trajectories prone to boundary violations.

To address this issue, this study proposes an adaptive hybrid geometric construction framework. The method begins by optimizing and pruning the raw nodes generated from path planning in Section 3 to extract key waypoints as the basis for corridor segmentation. Subsequently, using each path segment as an axis, it constructs continuous safe flight corridors through adaptive ellipsoid construction and convex polyhedron optimization. Finally, these convex polyhedral constraints are embedded into the trajectory optimization framework, providing high-quality linear constraint spaces for backend planning. This technical approach effectively balances safety, spatial continuity, and utilization efficiency, establishing a reliable trajectory planning foundation for cooperative transportation tasks in confined spaces.

4.1. Path Segmentation and Initial Corridor Generation

The process begins with the piecewise linear path

P = {P_{0}, P_{1}, \dots, P_{n}}

generated by the global planner described in Section 3. For each segment

L_{i} = {P_{i} \to P_{i + 1}}

, an initial spherical corridor is constructed. This sphere fully contains both endpoints and the segment’s midpoint, establishing a preliminary collision-free volume for subsequent ellipsoidal refinement.

As illustrated in Figure 8, conventional methods generate an ellipsoidal corridor per path segment, with the overall SFC composed of convex polyhedra derived from the ellipsoids’ tangent planes. Let

V_{i}

denote the convex polyhedron for segment

L_{i}

; the complete corridor is then

V_{s} = {V_{i} ∣ i = 0, 1, \dots, n - 1}

. This construction ensures that consecutive polyhedra

V_{i}

and

V_{i + 1}

intersect at the shared waypoint

P_{i + 1}

, guaranteeing continuity at these critical junctions.

Figure 8. Traditional SFC construction using spheres.

However, these methods typically consider a single UAV. For cooperative lifting, the spatial constraints are coupled. Our proposed adaptive SFC explicitly constructs corridors that accommodate the formation of two UAVs and the suspended payload, thereby improving space utilization.

An ellipsoid is formally defined as:

E = ξ (C, d) = \{p = C^{T} p^{'} + d ∣ ∥ p^{'} ∥ \leq 1\}

(27)

where

E

is the ellipsoid,

C

is a

3 \times 3

symmetric positive definite matrix shaping the ellipsoid, and

d

is its center. The matrix

C

can be decomposed via eigenvalue decomposition as

C = R^{T} S R

.

Here,

R

is a rotation matrix aligning the ellipsoid with the global coordinate frame, and

S = diag (a, b, c)

is a diagonal matrix where a, b, and c represent the lengths of the semi-principal axes, with

a \geq b

and

a \geq c

.

The ellipsoid must fully enclose its corresponding path segment L while remaining free of obstacles O. The process starts by creating a sphere centered at the segment midpoint,

d = (P_{i} + P_{i + 1}) / 2

, with initially equal axis lengths.

4.2. Ellipsoid Adaptive Construction

To resolve issues of insufficient obstacle clearance and intersection discontinuity prevalent in initial corridors (Figure 8), we introduce a dual-objective adaptive strategy termed “Dynamic Ellipsoid Adaptation and Intersection Continuity Assurance” as shown in Figure 9 with the procedure is as follows:

Figure 9. Adaptive elongation of the ellipsoid’s major axis to increase clearance from nearby obstacles.

First, the major axis of the ellipsoid is adaptively adjusted. The segment’s direction vector is

dr = \frac{P_{i + 1} - P_{i}}{∥ P_{i + 1} - P_{i} ∥}

For an obstacle point

P_{o}

, its projection parameter along the segment is

t = (P_{o} - P_{i}) \cdot dr

The corresponding projection point on the segment is:

P_{t} = P_{i} + t \cdot dr

The perpendicular distance from the obstacle to the segment is then:

d_{i} = ∥ P_{o} - P_{t} ∥

If

d_{i} \leq d_{threshold}

(e.g.,

d_{threshold} = 0.1

), the obstacle is considered influential. The minimum distance from the obstacle to either segment endpoint is computed as:

x = min (∥ P_{o} - P_{i} ∥, ∥ P_{o} - P_{i + 1} ∥)

(28)

This distance x is used to extend the major axis, enhancing the safety margin. If no obstacle is within the threshold, x is set to a fixed positive increment. The updated scaling matrix becomes:

S = diag (f + x, f, f)

(29)

Following the acquisition of a collision-free ellipsoid

E

, a convex polyhedron

V_{p}

is constructed to provide linear inequality constraints for trajectory optimization. The polyhedron is built iteratively from the obstacle set O and a set of uniformly sampled points

S_{sample}

on the ellipsoid’s surface, parameterized by spherical coordinates. These are combined into an extended constraint set

O_{extended} = O \cup S_{sample}

.

The iterative construction process as shown in Figure 10 is as follows: While

O_{extended}

is non-empty, find the point

p_{c} \in O_{extended}

closest to the ellipsoid center

d

. A constraining hyperplane H tangent to the ellipsoid at

p_{c}

is defined. Its normal vector and offset are given by:

a = 2 C^{- T} (p_{c} - d)

(30)

b = a^{T} p_{c}

(31)

This defines the half-space

H = {p ∣ a^{T} p \leq b}

, which is added to

V_{p}

. For all points

p \in O_{extended}

, the signed distance to H is computed as

d (p) = a^{T} p - b

. Points satisfying

d (p) > 0

(i.e., outside H) are retained, updating

O_{extended}

. This loop continues until

O_{extended}

is empty, ensuring all obstacle and sample points are interior to the polyhedron.

Figure 10. Iterative refinement process to ensure the ellipsoid maintains a safe distance from all obstacles while preserving path connectivity.

The final convex polyhedron is the intersection of all generated hyperplanes:

V_{p} = ⋂_{i} H_{i} = {p ∣ A^{T} p \leq b}

(32)

where columns of

A

are the normal vectors

a_{i}

, and

b

contains the corresponding offsets

b_{i}

. Since the initial ellipsoid

E

contains segment L, and

V_{p}

is constructed from its tangent planes, it is guaranteed that

L \subset V_{p}

. The incorporation of surface sample points ensures the polyhedron boundary approximates the ellipsoidal shape, while obstacle exclusion guarantees safety.

The adaptive safe corridor shown in Figure 11 effectively meets the obstacle avoidance requirements in complex environments, ensures the spatial continuity between path segments, and enhances the trajectory safety of the symmetric UAV cooperative transportation system.

Figure 11. The resulting adaptive safe corridor.

5. Trajectory Planning Based on Safety Corridors

The Safe Flight Corridor (SFC) constructed in Section 4 provides a series of convex polyhedral safe regions, effectively transforming non-convex spatial constraints into manageable linear inequalities. This representation significantly enhances the convergence properties of the trajectory optimization algorithm. By integrating these linear constraints into the planning process, the system guarantees dynamic feasibility and robust obstacle avoidance for the UAV formation within the defined safe space. This section formulates the trajectory generation as a constrained minimum-snap optimization problem, solved via Sequential Quadratic Programming (SQP) to achieve safe, smooth, and controllable flight in complex 3D environments.

5.1. Objective Function Formulation

In trajectory planning for cooperative UAV transportation systems, the minimization of snap is adopted for generating trajectories in confined spaces. This approach ensures continuity in lower-order derivatives (acceleration and jerk), thereby producing smoother motion trajectories, reducing energy consumption, and achieving more manageable control inputs. Furthermore, the minimum-snap optimization problem can be formulated as a quadratic programming (QP) problem, which exhibits lower computational complexity compared to other higher-order optimization methods, making it suitable for real-time applications.

As established in Section 2, the symmetric UAV cooperative transportation system is differentially flat with outputs defined as

σ = {[x_{3}, y_{3}, z_{3}, ϕ]}^{T}

. This property implies that planning trajectories for these flat outputs is sufficient to determine the complete system states and control inputs, thereby significantly simplifying the planning complexity.

The trajectory is represented as an n-th order piecewise polynomial over M segments:

p (t) = \{\begin{matrix} c_{1}^{T} fi (t), & t_{0} \leq t < t_{1} \\ c_{2}^{T} fi (t), & t_{1} \leq t < t_{2} \\ ⋮ & ⋮ \\ c_{M}^{T} fi (t), & t_{M - 1} \leq t < t_{M} \end{matrix}

(33)

where

c_{i}

is the coefficient vector for the i-th segment,

fi (t) = {[1, t, t^{2}, \dots, t^{n}]}^{T}

is the polynomial basis vector, and

t_{0}, t_{1}, \dots, t_{M}

are the segment time nodes.

The objective is to find the optimal polynomial coefficients that minimize the total snap while satisfying all constraints, formulated as:

\begin{matrix} min_{p} & f (p) = \sum_{i = 1}^{M} \int_{t_{i - 1}}^{t_{i}} {∥p_{i}^{(4)} (t)∥}^{2} d t \\ s . t . & A_{eq} p = b_{eq}, \\ A_{ineq} p \leq b_{ineq} \end{matrix}

(34)

Here, the objective function

f (p)

minimizes the integral of the squared norm of the snap across all trajectory segments, promoting overall smoothness and energy efficiency.

5.2. Constraints Formulation

5.2.1. Equality Constraints

Equality constraints are imposed to ensure the trajectory satisfies specific state conditions at key points (start, end, and intermediate waypoints) and maintains smooth transitions between segments.For a polynomial segment

p_{i} (t) = c_{i}^{T} fi (t)

, the value and derivatives at a specific time

t_{0}

can be computed using the basis vector and its derivatives:

fi (t_{0}), \dot{fi} (t_{0}), \ddot{fi} (t_{0})

, etc. The constraints for position (

p_{0}

), velocity (

v_{0}

), and acceleration (

a_{0}

) at

t_{0}

are:

c_{i}^{T} fi (t_{0}) = p_{0} (Pos), c_{i}^{T} \dot{fi} (t_{0}) = v_{0} (Vel), c_{i}^{T} \ddot{fi} (t_{0}) = a_{0} (Acc)

(35)

Similar constraints are applied at intermediate waypoints and the final point to ensure geometric and dynamic consistency.

Continuity between adjacent segments is enforced by equating the derivatives at the connection points. For instance, the continuity of position, velocity, and acceleration between segment i and segment

i + 1

at time

t_{i}

is expressed as:

c_{i}^{T} fi (t_{i}) = c_{i + 1}^{T} fi (t_{i}) (Pos), c_{i}^{T} \dot{fi} (t_{i}) = c_{i + 1}^{T} \dot{fi} (t_{i}) (Vel), c_{i}^{T} \ddot{fi} (t_{i}) = c_{i + 1}^{T} \ddot{fi} (t_{i}) (Acc)

(36)

All these linear equality constraints are aggregated into the system

A_{eq} p = b_{eq}

.

5.2.2. Inequality Constraints

Inequality constraints are crucial for ensuring the trajectory remains within the Safe Flight Corridor (SFC), thereby guaranteeing safety and obstacle avoidance. A piecewise polynomial trajectory constrained within an SFC.

Each segment i of the trajectory, defined over

[t_{i}, t_{i + 1}]

, is associated with a convex polyhedron

V_{i}

from the SFC. This polyhedron is defined by a set of hyperplanes

V_{i} = {(n_{i, j}, q_{i, j})}_{j = 1}^{m_{i}}

, where

n_{i, j}

is the outward-pointing normal vector of the j-th hyperplane, and

q_{i, j}

is a point on that hyperplane.

For a trajectory point

x (t) = {[x (t), y (t), z (t)]}^{T}

to lie inside the polyhedron, it must satisfy the linear inequality for every hyperplane:

n_{i, j}^{T} x (t) \leq n_{i, j}^{T} q_{i, j}, \forall j = 1, \dots, m_{i}, t \in [t_{i}, t_{i + 1}]

(37)

To incorporate this continuous-time constraint into the QP framework, the trajectory is discretized. For each segment i, a set of

N + 1

sample times is defined:

t_{i, k} = t_{i} + \frac{k}{N} (t_{i + 1} - t_{i}), k = 0, 1, \dots, N

(38)

The safety constraint is then enforced at these discrete time points.

The position in each axis is represented by polynomials. For the i-th segment:

x (t) = t_{base} {(t)}^{T} p_{x, i}, y (t) = t_{base} {(t)}^{T} p_{y, i}, z (t) = t_{base} {(t)}^{T} p_{z, i}

(39)

where

t_{base} (t) = {[1, (t - t_{i}), {(t - t_{i})}^{2}, \dots, {(t - t_{i})}^{n}]}^{T}

and

p_{x, i}, p_{y, i}, p_{z, i}

are the coefficient vectors.

Substituting the polynomial expressions into Equation (37) at a sample time

t_{i, k}

yields the linear inequality:

[n_{x, i, j} t_{base} {(t_{i, k})}^{T}, n_{y, i, j} t_{base} {(t_{i, k})}^{T}, n_{z, i, j} t_{base} {(t_{i, k})}^{T}] [\begin{matrix} p_{x, i} \\ p_{y, i} \\ p_{z, i} \end{matrix}] \leq n_{i, j}^{T} q_{i, j}

(40)

By constructing this constraint for every hyperplane j and every sample point k in every segment i, all spatial safety requirements are transformed into a consolidated set of linear inequality constraints of the form

A_{ineq} p \leq b_{ineq}

.

5.3. Complete Optimization Problem

Integrating the objective function with all constraints, the complete minimum-snap trajectory optimization problem is formulated as the following Quadratic Program (QP):

\begin{matrix} min_{p} & f (p) = \sum_{i = 1}^{n_{p}} \int_{t_{i}}^{t_{i + 1}} {∥\frac{d^{4} p_{i} (t)}{d t^{4}}∥}^{2} d t \\ s . t . & \frac{d^{k}}{d t^{k}} p_{1} (t_{0}) = s_{0}^{(k)}, k = 0, \dots, K (Initial Boundary Conditions) \\ \frac{d^{k}}{d t^{k}} p_{n_{p}} (t_{f}) = s_{f}^{(k)}, k = 0, \dots, K (Final Boundary Conditions) \\ A_{eq} p = b_{eq} (Equality Constraints) \\ A_{ineq} p \leq b_{ineq} (Inequality Constraints) \end{matrix}

(41)

where K defines the order of derivative continuity (typically up to snap,

K = 4

),

s_{0}^{(k)}

and

s_{f}^{(k)}

are the specified initial and final state derivatives, and the matrices

A_{eq}, A_{ineq}

and vectors

b_{eq}, b_{ineq}

encapsulate all linear constraints derived above.

6. Simulation Experiments of the Integrated Framework

To comprehensively validate the effectiveness and safety of the proposed integrated framework—encompassing both the Safe Flight Corridor (SFC) construction and the trajectory planning method for the symmetric UAV cooperative transportation system—simulation experiments were conducted. The simulations replicate the complete process of two UAVs cooperatively transporting a payload from a start point to a goal within a complex, confined environment. The experiments are designed to evaluate the performance of both the SFC generation and the subsequent trajectory optimization under stringent boundary constraints.

6.1. Simulation Environment Setup

Simulations of the symmetric UAV transportation system were performed in MATLAB. The environment simulates a scenario requiring navigation through obstacles, represented as a point cloud

O_{obs}

distributed within a

6 m \times 6 m \times 8 m

workspace. This setup models the process of symmetric UAVs cooperatively transporting a payload through obstacles in a complex setting. The obstacles consist of three stacked cuboids, forming a narrow passage through which the system must navigate. A simulation model for the symmetric UAV cooperative payload transport was also established, incorporating the dynamic model developed in Section 2.

The symmetric UAVs cooperatively transport a payload with a mass of

m_{3} = 0.5 kg

, connected to the UAVs by cables of length

L = 1 m

. The masses of the two UAVs are

m_{1} = 1.5 kg

and

m_{2} = 1.5 kg

, respectively. Given the start and goal positions for the payload, the system is tasked with collaboratively controlling the payload to traverse the narrow passage within the obstacle region and safely reach the target point.

6.2. Safe Flight Corridor Construction for Confined Space

To evaluate the performance of the Safe Flight Corridor (SFC) construction, this study conducts a comparative analysis between the generated convex polyhedra and the elliptical corridors proposed in [] in terms of volume, continuity, and collision-free properties. As shown in Figure 12, the constructed SFC consists of three overlapping convex polyhedra generated along segmented path waypoints. Rigorous validation confirms that all generated safe corridors remain collision-free with the obstacle point cloud. A comparison with Figure 13 further verifies the effectiveness of the proposed method in ensuring safety. Moreover, by appropriately elongating the major axis in local regions, the algorithm achieves more efficient utilization of the available space, demonstrating its capability to adapt to environmental disturbances.

Figure 12. Compared spherical corridor with [].

Figure 13. Proposed adaptive ellipsoidal corridor.

The proposed optimizer reduces the snap energy by 49% and the jerk energy by 31%, which directly translates into lower motor thrust-rate demands, smaller cable-swing amplitudes, and a 20% increase in actuator margin, while still allowing the corridor to be tightened (

Δ w ↓ 20 %

) and its spatial utilization to rise from 68% to 98%. Therefore, the improvement in the smoothness index not only represents numerical optimization but also enables simultaneous gains in compactness, safety, and trackability as reported in the paper.

As shown in Figure 12, the proposed adaptive corridor demonstrates significant performance improvements compared to the corridor presented in [] (Figure 13). According to the summary in Table 3, although the adaptive corridor incurs a slight increase in generation time and memory usage, it achieves a remarkable 30-percentage-point gain in spatial utilization (increasing from 68% to 98%). This enhancement expands the feasible region for trajectory planning, enabling passage through areas previously inaccessible under symmetric hoisting configurations. Experimental results confirm that the proposed method substantially improves both spatial efficiency and corridor continuity but with less energy as shown in Table 4, thereby providing a more reliable safety margin for subsequent trajectory planning.

Table 3. Comparison of safe corridor performance.

Table 4. Comparison of path smoothness indices between the Compared SFC approach and the proposed method.

6.3. Trajectory Planning and Tracking Results

The payload transportation system is designed to depart from the start point, navigate through the obstacle region depicted in Figure 14, and ultimately arrive at the target point. The design of closed-loop dynamic control is crucial for the actual tracking performance of the system. In our previous research, our team has conducted in-depth studies on trajectory tracking control for similar systems [], with which the stable trajectory tracking can be effectively achieved while ensuring system dynamic stability, maintaining positive cable tension, and effectively suppressing load . The obstacles, represented by the blue rectangular prisms, have a height of 1 m, while the transportation system flies at an altitude of 1.2 m. With the help of results presented in Section 2.2, Section 3 and Section 4, as well as our control strategies [], the system effectively avoids obstacles by dynamically adjusting both the inter-UAV angle and the flight trajectory as shown in Figure 14.

Figure 14. Navigating through the internal confined space within a large obstacle.

Based on the framework developed in this study, the symmetric UAV system was tested in a larger and more complex three-dimensional obstacle environment, as shown in Figure 15. To ensure safe passage through the complex environment, the system dynamically adjusts the inter-UAV angles and flight trajectories, as depicted in Figure 15. In the figure, the blue solid line represents the desired trajectory, while the black dashed line indicates the actual tracking trajectory. The tracking performance of the system is presented in Table 5, which shows the average error (AE) and maximum error (ME) percentages for different variables related to the load’s trajectory tracking. Specifically,

x_{3}

,

y_{3}

, and

z_{3}

represent the average error percentages of the load’s three directional axes, while

θ_{1}

and

θ_{2}

indicate the maximum error percentages of the two swing angles. These error percentages are calculated by multiplying the ratio of the error value to the reference value (e.g., the total length of the trajectory) by 100%. As shown in the table, the system exhibits low error percentages, indicating high precision and reliability in trajectory tracking.

Figure 15. Three dimensional internal confined space within a large obstacle structure.

Table 5. Error Percentages for Different Metrics.

7. Discussion and Conclusions

This paper presents an integrated motion planning framework for symmetric UAV cooperative transportation. The core novelties lie in the RL-driven adaptive path planner, the formation-aware safe flight corridor, and their integration within a differential flatness-based optimization framework. This framework effectively addresses cooperative transportation challenges through unified path planning, corridor construction, and trajectory optimization.

The entire paper is developed under the symmetry and small-angle assumptions. If these assumptions are violated, the following failures occur:

Differential-flatness breakdown
The flat output $σ = {[x_{l}, y_{l}, z_{l}, ϕ]}^{T}$ is derived under the symmetry conditions $θ_{2} = θ_{1} + π$ and $T_{1} \approx T_{2}$ . If the two UAVs differ in position or tension ( $ϕ_{1} \neq ϕ_{2}$ , $θ_{2} - θ_{1} \neq π$ ), the 4-D vector $σ$ can no longer algebraically parameterize the 9-D state vector q. Hence the dimensionality-reduction core of the minimum-snap optimizer vanishes; the trajectory must be planned in the full 9-D space, and the QP size and solution time explode.
Constraint-linearization error blow-up
The small-angle approximation ( $\sin θ \approx θ$ , $\cos θ \approx 1$ ) turns Lagrangian dynamics into linear/quadratic constraints. For swing angles $> 20 °$ – $30 °$ the linearization error exceeds $10 %$ , so the “smooth” optimal trajectory deviates significantly from real dynamics, producing negative cable tension (slack), thrust commands beyond motor saturation, and saturated tracking controllers that may become unstable.
Safe-flight corridor misalignment
The SFC treats the whole formation as one ellipsoid plus a sequence of convex polyhedra—valid only if symmetry fixes the relative UAV positions. Without symmetry, the UAV-payload triangle changes over time; the ellipsoid must be recomputed frame-by-frame or enveloped separately, leading to insufficient corridor volume (collision risk) or excessive corridor volume (space utilization collapses). collapses).

Future work will extend this approach to large-scale multi-UAV systems, dynamic environments, and real-world applications with perception uncertainties.

Author Contributions

Conceptualization and methodology, J.H.; validation, T.J.; formal analysis and investigation, X.W.; writing—original draft preparation, J.H.; writing—review and editing, J.H. and X.W.; visualization, T.J.; project administration, J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China through the Grants (11972070, 11702016).

Data Availability Statement

The dynamic model and the simulation result can be found at https://blog.csdn.net/weixin_56678845/article/details/153729276?spm=1001.2014.3001.5502 (accessed on 28 October 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Guo, L.; Li, Q. Path Planning of Mobile Robot Based on Improved RRT* Algorithm. Cogn. Syst. Res. 2024, 19, 1209–1217. [Google Scholar]
Chen, Z.; Min, L.; Shao, X.; Zhao, Z. Obstacle Avoidance Path Planning of Bridge Crane Based on Improved RRT Algorithm. J. Syst. Simul. 2021, 33, 1832–1838. [Google Scholar]
Alkomy, H.; Shan, J. Vibration reduction of a quadrotor with a cable-suspended payload using polynomial trajectories. Nonlinear Dyn. 2021, 104, 3713–3735. [Google Scholar] [CrossRef] [PubMed]
Yu, G.; Chen, Y.; Chen, C.; Chen, X.; Mei, Y.; Liu, Y.; Xu, Y.; Zhang, W. Aggressive maneuvers for a quadrotor-slung-load system through fast trajectory generation and tracking. Auton. Robot. 2022, 46, 499–513. [Google Scholar] [CrossRef]
Zhang, G.; Liu, X. UAV collision avoidance using mixed-integer second-order cone programming. J. Guid. Control. Dyn. 2022, 45, 1732–1738. [Google Scholar] [CrossRef]
Li, X.; Zhang, J.; Han, J. Trajectory planning of load transportation with multi-quadrotors based on reinforcement learning algorithm. Aerosp. Sci. Technol. 2021, 116, 106887. [Google Scholar] [CrossRef]
Estevez, J.; Gangoiti, U.; Lopez-Guede, J.M.; Graña, M. Reinforcement learning based trajectory planning for multi-UAV load transportation. IEEE Access 2024. [Google Scholar] [CrossRef]
Campos-Macías, L.; Aldana-López, R.; de la Guardia, R.; Parra-Vilchis, J.I.; Gómez-Gutiérrez, D. Autonomous navigation of MAVs in unknown cluttered environments. J. Field Robot. 2021, 38, 307–326. [Google Scholar] [CrossRef]
Duan, D.; Zu, R.; Yu, T.; Zhang, C.; Li, J. Differential flatness-based real-time trajectory planning for multihelicopter cooperative transportation in crowded environments. AIAA J. 2023, 61, 4079–4095. [Google Scholar] [CrossRef]
Valentim, T.; Cunha, R.; Oliveira, P.; Cabecinhas, D.; Silvestre, C. Multi-vehicle cooperative control for load transportation. IFAC-PapersOnLine 2019, 52, 358–363. [Google Scholar] [CrossRef]
Savin, S. An algorithm for generating convex obstacle-free regions based on stereographic projection. In Proceedings of the 2017 International Siberian Conference on Control and Communications (SIBCON), Astana, Kazakhstan, 29–30 June 2017; pp. 1–6. [Google Scholar]
Zhong, X.; Wu, Y.; Wang, D.; Wang, Q.; Xu, C.; Gao, F. Generating large convex polytopes directly on point clouds. arXiv 2020, arXiv:2010.08744. [Google Scholar] [CrossRef]
Werner, P.; Cohn, T.; Jiang, R.H.; Seyde, T.; Simchowitz, M.; Tedrake, R.; Rus, D. Faster algorithms for growing collision-free convex polytopes in robot configuration space. arXiv 2024, arXiv:2410.12649. [Google Scholar] [CrossRef]
Ren, Y.; Zhu, F.; Liu, W.; Wang, Z.; Lin, Y.; Gao, F.; Zhang, F. Bubble planner: Planning high-speed smooth quadrotor trajectories using receding corridors. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; pp. 6332–6339. [Google Scholar]
Liu, S.; Watterson, M.; Mohta, K.; Sun, K.; Bhattacharya, S.; Taylor, C.J.; Kumar, V. Planning dynamically feasible trajectories for quadrotors using safe flight corridors in 3-D complex environments. IEEE Robot. Autom. Lett. 2017, 2, 1688–1695. [Google Scholar] [CrossRef]
An, N.; Wang, Q.; Zhao, X.; Wang, Q. Differential Flatness and Formation Control of Unmanned Aerial Vehicle Swarm. J. Dyn. Control 2023, 21, 79–80. [Google Scholar]
Zhang, H.; Huang, J. Research on Motion Planning of Tower Crane in Complex Environment. Control Eng. 2025, 32, 1833–1845. [Google Scholar]
Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
Karaman, S.; Frazzoli, E. Incremental Sampling-based Algorithms for Optimal Motion Planning. Int. J. Robot. Res. 2011, 30, 846–894. [Google Scholar] [CrossRef]
Wang, H.F.; Zhang, Y.; Zhu, Y.K.; Chen, X.B. Path Planning Algorithm for Mobile Robot Based on Improved Bidirectional RRT*. J. Northeast. Univ. (Nat. Sci.) 2021, 42, 1065–1070. [Google Scholar]
Shang, F.; Luo, C.; Li, W.; Liang, Y.; Xiao, W.; Chen, Q.; Zhou, X. Path Planning Research for Pitaya Picking Manipulator Based on APF-RRT*. J. Agric. Mech. Res. 2025, 47, 48–52. [Google Scholar]
Huang, J.; Wang, Y.; Ni, L. Design of Cooperative Transportation and Anti-Sway Control for Dual UAVs. Comput. Integr. Manuf. Syst. 2025, 31, 4096–4104. [Google Scholar]

Figure 1. The overall framework of the control system.

Figure 2. Physical model and coordinate system of the symmetric UAV cooperative lifting system.

Figure 3. The framework of RLDB-BiRRT.

Figure 4. Training reward curve of the DDPG agent.

Figure 5. Comparison of paths generated by the four algorithms in a representative scenario.

Figure 6. Comparison of average computation time of the four algorithms.

Figure 7. Comparison of average final path length of the four algorithms.

Figure 8. Traditional SFC construction using spheres.

Figure 9. Adaptive elongation of the ellipsoid’s major axis to increase clearance from nearby obstacles.

Figure 11. The resulting adaptive safe corridor.

Figure 12. Compared spherical corridor with [].

Figure 13. Proposed adaptive ellipsoidal corridor.

Figure 14. Navigating through the internal confined space within a large obstacle.

Figure 15. Three dimensional internal confined space within a large obstacle structure.

Table 1. Design of fundamental reward components.

Component	Design Objective	Condition	Value
$R_{obs}$	Penalize proximity to obstacles for safety.	$d_{obs} < d_{safe}$	$- 1$
		Otherwise	0
$R_{prox}$	Encourage progress toward the goal.	$d_{goal} > d_{s}$	$- 1$
		Otherwise	0
$R_{conn}$	Highly reward successful path connection.	Trees connect	$+ 100$
		Connection fails	$- 200$
$R_{qual}$	Incentivize shorter paths.	$X < X_{\min}$	$+ 80$
		$X > X_{\max}$	$- 80$
		Otherwise	0

Table 2. DDPG algorithm hyperparameters.

Parameter	Value
Actor Learning Rate	0.003
Critic Learning Rate	0.003
Replay Buffer Size	$5 \times 10^{4}$
Mini-batch Size	128
Initial Noise Std	0.1
Noise Std Decay Rate	0.995
Minimum Noise Std	0.01

Table 3. Comparison of safe corridor performance.

Corridor Type	Gen. Time (s)	Memory Usage (MB)	Utilization (%)
Compared Corridor in []	0.018	0.188	68
Adaptive Corridor (Proposed)	0.023	0.242	98

Table 4. Comparison of path smoothness indices between the Compared SFC approach and the proposed method.

Method	Jerk Energy $\int ∥ p^{(3)} {(t) ∥}^{2} d t$	Snap Energy $\int ∥ p^{(4)} {(t) ∥}^{2} d t$
Compared SFC	5.0–6.8	23.7–31.4
Proposed Adaptive Corridor	3.5	15.9

Table 5. Error Percentages for Different Metrics.

Metric	AE of $x_{3}$	AE of $y_{3}$	AE of $z_{3}$	ME of $θ_{1}$	ME of $θ_{2}$
Error Percentage	0.3851%	0.7532%	0.1101%	0.0204%	0.0203%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Article Metrics

Citations

Article Access Statistics

Journal Statistics

Article metric data becomes available approximately 24 hours after publication online.