RG-HDP-VD: A Physics-Aware Cooperative Trajectory Planning Framework for Heterogeneous Multi-UAVs

Han, Dan; Hua, Zhaoyuan; Zhu, Xinyu; Luo, Liang; Jiang, Hao; Wang, Lifang

doi:10.3390/drones10030192


by Dan Han ¹,²,³, Zhaoyuan Hua ¹,*, Xinyu Zhu ¹, Liang Luo ⁴, Hao Jiang ¹ and Lifang Wang ²

¹ College of Aviation Electronic and Electrical Engineering, Civil Aviation Flight University of China, Chengdu 641419, China

² Institute of Electrical Engineering, Chinese Academy of Sciences, Beijing 100190, China

³ University of Chinese Academy of Sciences, Beijing 100049, China

⁴ Qingdao Air Traffic Management Station, Civil Aviation of China, Qingdao 266300, China

* Author to whom correspondence should be addressed.

Drones 2026, 10(3), 192; https://doi.org/10.3390/drones10030192

Submission received: 21 January 2026 / Revised: 24 February 2026 / Accepted: 4 March 2026 / Published: 10 March 2026

(This article belongs to the Special Issue UAV Path Planning Algorithms for Surveillance and Reconnaissance in Civil Applications)


Highlights

What are the main findings?

A physics-aware cooperative planning framework (RG-HDP-VD) was developed and validated in real-world flights, integrating mass-augmented energy topology, regret-guided arbitration, and velocity decomposition.
The framework demonstrates superior scalability in saturated airspace, maintaining a 95% success rate where baselines fail, while reducing average planning time by ~45% and lowering total system energy by 6.7%.

What are the implications of the main findings?

Physics-consistent right-of-way allocation mitigates energy-inefficient congestion by prioritizing high-penalty platforms, providing a highly scalable alternative to conventional methods that prevents deadlocks.
Mapping rigid time-window constraints into length-feasible regions via velocity envelopes offers a robust way to maintain spatiotemporal feasibility and prevent timing failures without relying on terminal loitering.

Abstract

This paper presents Regret-Guided Heuristic Decentralized Prioritized Planning with Velocity Decomposition (RG-HDP-VD), a physics-aware cooperative trajectory planning framework for relief delivery by heterogeneous unmanned aerial vehicles (UAVs) in post-earthquake, non-convex canyon environments. RG-HDP-VD addresses two prevalent failure modes: energy-inefficient congestion caused by ignoring time-varying payload dynamics, and the collapse of feasible sets due to strict arrival windows in fixed-speed planning. We construct a mass-augmented energy topology and use a mass-augmented energy-aware A* search to extract baseline physical metrics (path length, total energy, and unit-distance energy) for each UAV. Regret-Guided (RG) arbitration then quantifies the relative energy cost of waiting versus detouring at conflicts and grants right-of-way to heavy-load, high-cost platforms. These priorities are embedded into Heuristic Decentralized Prioritized Planning (HDP), which maintains a global spatiotemporal occupancy map and serializes planning to eliminate deadlocks. To satisfy tight time windows, Velocity Decomposition (VD) maps 4D temporal constraints into a 3D path-length feasible interval and is realized via an improved VD-TSRRT* sampling-based planner. In high-fidelity simulations, RG-HDP-VD demonstrates superior scalability over conventional methods, maintaining high success rates (up to 100%) in saturated scenarios, while reducing average planning time by ~45% and total system energy by 6.7%. Finally, real-world flight demonstrations using a heterogeneous quadrotor team validate the framework’s practical feasibility and robust hardware execution.

Keywords:

heterogeneous multi-UAVs; cooperative trajectory planning; physics-aware; regret-guided arbitration; velocity decomposition; dynamic payload

1. Introduction

Post-earthquake rescue operations often take place in extreme conditions, where transportation networks are disrupted, communication infrastructure is degraded, and terrain exhibits drastic elevation variations. In unstructured and constrained airspace formed by canyons, ridges, and fragmented slopes, aerial delivery has become a key means of achieving relief-supply coverage within the golden hour. Consequently, heterogeneous multi-unmanned aerial systems (HMUASs)—leveraging complementary payload capabilities and coordinated task execution—have emerged as an important paradigm for post-disaster UAV operations [1]. As a cornerstone of autonomous HMUAS missions, multi-UAV cooperative path planning (MUCPP) aims to generate collision-free trajectories satisfying physical feasibility and inter-agent safety [2,3,4]. While recent surveys [5,6] have cataloged the algorithmic landscape of MUCPP, they consistently identify scalability, strong constraint coupling, and physical consistency as the most persistent barriers to deployable collaboration.

However, the strong spatiotemporal coupling inherent to post-disaster delivery poses significant challenges, a point also emphasized in recent reviews [6]. First, severe occlusions and bottleneck passages induced by non-convex terrain lead to dense trajectory interleaving in terminal airspace. Second, many missions require simultaneous arrival (SAT) to enable saturated service at target regions [7,8]. This evolves the planning problem into a joint optimization in a high-dimensional configuration space, where complexity grows rapidly with environmental clutter. Some studies [9,10] emphasize that introducing continuous-time domains and variable action durations substantially increases the difficulty of conflict resolution. More critically, post-disaster delivery introduces variable-mass dynamics, a key physical factor often overlooked in conventional MUCPP: the step change in mass due to payload release can cause nonlinear variations in propulsion energy consumption and hovering cost, and such effects differ significantly across heterogeneous platforms. Recent data-driven energy modeling [11] suggests that ignoring such state-dependent power variations can significantly distort cost evaluation, leading to suboptimal or even failed coordination.

Existing MUCPP approaches can generally be categorized into reactive, coupled, and decoupled paradigms [4]. Reactive methods (e.g., Optimal Reciprocal Collision Avoidance (ORCA) [12] and artificial potential fields [13]) generate collision-avoidance behaviors through local rules, offering strong online responsiveness and low computational overhead. However, their reliance on local information limits system-level energy efficiency and completeness guarantees, and makes it difficult to enforce mission-level temporal constraints such as SAT. Coupled methods (e.g., Conflict-Based Search (CBS) [14] and its bounded-suboptimal extension Explicit Estimation Conflict-Based Search (EECBS) [15]) jointly resolve conflicts under a global constraint framework, providing stronger theoretical guarantees but often suffering from scalability and real-time limitations as the number of agents and conflicts increases [14,15,16,17]. Decoupled methods (e.g., prioritized planning (PP) [18], decentralized prioritized planning (DPP) [19], and dynamic-priority sampling planners [20]) improve scalability by serializing multi-agent planning through priority assignment, but their performance can be highly sensitive to priority order and may fail on solvable instances under unfavorable sequencing [18,19,20,21]. While learning-assisted approaches [22,23,24] have gained traction for their adaptability in dense airspace, ensuring strict constraint satisfaction and physical interpretability remains challenging. Specifically, if right-of-way arbitration relies solely on geometric distance or random ordering, it ignores the nonlinear dependence of energy cost on payload mass, potentially leading to physically inefficient outcomes [11,25].

In summary, post-disaster canyon delivery can be formulated as a cooperative planning problem characterized by platform heterogeneity, time-varying payloads, and strong spatiotemporal coupling. This setting exposes two key gaps:

Insufficient modeling of physical heterogeneity and payload changes can trigger energy imbalance and deadlocks. Heavy-lift and light UAVs exhibit strong asymmetry in hovering power. In bottlenecks, distance-based arbitration underestimates the waiting penalty of heavy-load platforms, forcing them to idle where hovering dominates expenditure. This may trigger energy-inefficient congestion. Consistent with the need for realistic cost modeling highlighted in recent surveys [5,6], this gap calls for a physically consistent, regret-guided arbitration mechanism.
Rigid coupling between time windows and fixed-speed assumptions leads to rapid feasible-set shrinkage and cascading failures. SAT missions with tight windows often suffer from solution-space contraction. Fixed-speed assumptions rigidly couple arrival time to path length, meaning any detour directly erodes time margins. While spacetime planning variants (e.g., spacetime RRT* [26]) and learning-based conflict resolution [23,24,27] attempt to handle these constraints, they often lack the explicit velocity elasticity needed for robust rhythm modulation. To prevent single-agent infeasibility from propagating through the fleet, time feasibility should be transformed into distributed velocity envelopes along flight segments, structurally mitigating the contraction of the feasible set.

To address these challenges, we propose a physics-aware cooperative planning framework for heterogeneous multi-UAVs, termed RG-HDP-VD. The framework integrates physical constraints into a layered perception–decision–execution pipeline. First, a mass-augmented energy topology is constructed to extract baseline physical metrics via mass-augmented energy-aware A*. Second, Regret-Guided (RG) dynamic arbitration quantifies the asymmetric energy costs of waiting versus detouring and assigns right-of-way to platforms with higher waiting penalties. This arbitration is embedded into HDP [28] to maintain spatiotemporal consistency and guarantee deadlock-free serialized planning. Finally, Velocity Decomposition (VD) relaxes rigid time-window constraints into an elastic path-length feasible set, enabling efficient planning under tight deadlines when combined with an improved VD-TSRRT* algorithm.

The main contributions are summarized as follows:

Regret-Guided dynamic right-of-way arbitration. A physically grounded fairness mechanism that explicitly captures energy asymmetry across heterogeneous platforms to prevent energy-inefficient congestion in bottleneck airspace.
Velocity-decomposition-based elastic spatiotemporal planning. By introducing velocity envelopes, rigid fixed-speed constraints are decoupled into an elastic path-length feasible set, expanding feasibility under tight time windows in non-convex terrain.
A physics-aware layered cooperative planning framework (RG-HDP-VD). The framework integrates mass-augmented energy topology, dynamic arbitration, and executable 4D trajectory generation with adaptive smoothing (B-spline or PCHIP) and continuous collision checking.

The remainder of this paper is organized as follows: Section 2 formalizes the post-disaster heterogeneous cooperative delivery task as a joint optimization problem; Section 3 details the layered algorithmic implementation of RG-HDP-VD; Section 4 validates the effectiveness and robustness of the proposed framework through high-fidelity simulation experiments; Section 5 presents real-world flight demonstrations; and Section 6 concludes the paper.

2. Problem Formulation and System Modeling

This section models the cooperative material delivery task of heterogeneous multi-UAVs in complex fractal canyon environments as a joint optimization problem constrained by multiple physical limits and strong spatiotemporal coupling. We explicitly characterize the time-varying mass induced by payload release and the resulting differences in energy-consumption topology. In addition, an elastic velocity envelope is introduced to describe the physical feasibility region for temporal coordination, providing a mathematical constraint basis for subsequent algorithm design. The essential mathematical notations used in this framework are detailed in Appendix A.

2.1. System Modeling and Performance Constraints

Consider a multi-UAV system $U = \{U_{1}, \dots, U_{N}\}$ composed of $N$ heterogeneous UAVs executing a cooperative delivery task in a 3D workspace $W \subset \mathbb{R}^{3}$. The planner must generate an executable trajectory $\sigma_{i}$ for each UAV from a start point $s_{i}$ to a goal region $G_{i}$. The environment consists of fractal terrain obstacles $O_{terrain}$ (non-convex height fields) and a cylindrical threat zone $O_{threat}$, which together form $O_{env} = O_{terrain} \cup O_{threat}$. The heterogeneous characteristics of UAV $U_{i}$ are described by the parameter tuple $P_{i}$.

2.1.1. Dynamic Mass Model

Material delivery causes a sudden change in UAV mass at the drop moment. The total mass is modeled as a piecewise constant function with respect to the drop time $t_{drop,i}$:

$$m_{i}(t) = m_{empty,i} + I(t < t_{drop,i}) \cdot m_{payload,i}, \tag{1}$$

where $I(\cdot)$ is the indicator function. This model enables the planner to distinguish the physical cost differences between heavy-load and light-load states.
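As a minimal sketch of Equation (1), the piecewise-constant mass can be written as a single function. The function and argument names are illustrative assumptions, not identifiers from the paper.

```python
def mass_at(t, m_empty, m_payload, t_drop):
    """Total mass per Equation (1): the payload counts only while t < t_drop."""
    return m_empty + (m_payload if t < t_drop else 0.0)


# Example values are illustrative only (kg and s).
loaded = mass_at(5.0, m_empty=2.0, m_payload=1.5, t_drop=10.0)
released = mass_at(10.0, m_empty=2.0, m_payload=1.5, t_drop=10.0)
print(loaded, released)
```

Because the indicator uses a strict inequality, the payload is treated as already released at exactly $t = t_{drop,i}$.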

2.1.2. Mass-Augmented Energy Cost Model

To address the issue of energy fairness among heterogeneous UAVs, the system moves beyond Euclidean distance and constructs a cost model $J_{eng}$ based on physical work. The total energy consumption of any trajectory $\sigma_{i}$ consists of motion work and hovering (waiting) work:

$$J_{eng}(\sigma_{i}) = \int_{0}^{T} \left( P_{motion}(v, \dot{z}, m_{i}(t)) + P_{hover}(m_{i}(t)) \right) dt, \tag{2}$$

where $\dot{z}$ denotes the vertical velocity of the UAV. In implementation, this integral is approximated by discrete path segment accumulation. The model incorporates two key physical attributes:

Asymmetric Motion Power: $P_{motion}$ distinguishes between climbing and descending efficiency, penalizing aggressive maneuvering under heavy-load states;
Nonlinear Hovering Power: $P_{hover} = \alpha_{h} m^{1.5}$, where $\alpha_{h}$ is the hovering power coefficient, implying that the cost for heavy-load platforms to wait at bottlenecks is significantly higher, providing a physical basis for subsequent right-of-way arbitration.
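The discrete segment accumulation of Equation (2) can be sketched as follows. Only the hovering law $P_{hover} = \alpha_{h} m^{1.5}$ comes from the text; the closed form of `motion_power` and its coefficients (`c_level`, `c_climb`, `c_descent`) are hypothetical placeholders chosen to show the climb/descent asymmetry, since the paper's empirical model is not reproduced in this section.

```python
import math


def hover_power(m, alpha_h):
    """Nonlinear hovering power from the text: alpha_h * m^1.5 (W)."""
    return alpha_h * m ** 1.5


def motion_power(v, z_dot, m, c_level=0.5, c_climb=9.81, c_descent=2.0):
    """HYPOTHETICAL asymmetric motion power: climbing costs more than descending."""
    vertical = c_climb * z_dot if z_dot >= 0 else c_descent * (-z_dot)
    return m * (c_level * v + vertical)


def trajectory_energy(waypoints, m_empty, m_payload, t_drop, alpha_h):
    """Approximate Equation (2) by summing (P_motion + P_hover) * dt per segment.

    waypoints: list of (x, y, z, t) tuples with strictly increasing t.
    """
    total = 0.0
    for (x0, y0, z0, t0), (x1, y1, z1, t1) in zip(waypoints, waypoints[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("waypoint times must be strictly increasing")
        # Mass at the start of the segment, following Equation (1).
        m = m_empty + (m_payload if t0 < t_drop else 0.0)
        v = math.dist((x0, y0, z0), (x1, y1, z1)) / dt
        z_dot = (z1 - z0) / dt
        total += (motion_power(v, z_dot, m) + hover_power(m, alpha_h)) * dt
    return total
```

Evaluating the mass at the start of each segment is a simplification; a finer discretization around the drop time would reduce the resulting error.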

2.1.3. Kinematic Feasibility Under a Velocity Envelope

Each UAV is modeled as a point mass subject to speed limits, with dynamics $\dot{p}_{i}(t) = v_{i}(t)$. Unlike fixed-speed assumptions, we introduce an elastic velocity envelope, allowing the aircraft to adjust its speed within physical limits to satisfy temporal coordination. The speed constraint is defined as:

$$v_{min,i} \leq \| v_{i}(t) \| \leq v_{max,i}, \quad \forall t \in [0, T]. \tag{3}$$

The feasible interval $[v_{min,i}, v_{max,i}]$ forms the physical foundation for temporal coordination.

2.2. Spatiotemporal Cooperative Constraints

Cooperative planning requires heterogeneous UAVs to satisfy static obstacle avoidance in non-convex environments, maintain inter-agent safety separation in continuous time, and achieve temporal coordination under physical feasibility. We therefore provide computable constraint definitions from both spatial and temporal coupling perspectives.

2.2.1. Continuous Inter-Agent Safety Constraint

To avoid the tunneling effect caused by discrete time-step detection during high-speed relative motion, we employ continuous occupancy sets to characterize inter-agent separation. Let the spatial occupancy of UAV $U_{i}$ at time $\tau$ be the Minkowski sum of its centroid trajectory and a safety sphere:

$$V_{i}(\tau) = p_{i}(\tau) \oplus B(0, r_{safe}), \tag{4}$$

where $p_{i}(\tau) \in \mathbb{R}^{3}$ is the position, and $r_{safe}$ integrates body scale and control uncertainty. The global temporal safety constraint is defined as:

$$dist(V_{i}(\tau), V_{j}(\tau)) \geq D_{min}, \quad \forall \tau \in [0, T_{com}], \ \forall i \neq j, \tag{5}$$

where $T_{com}$ is the unified mission timeline length. When a UAV reaches the target area first, it is considered to maintain occupancy at the target point, which keeps $V_{i}(\tau)$ well defined throughout the entire time domain.
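To illustrate why a continuous check matters, the sketch below evaluates Equation (5) over one shared time interval in which both UAVs move along straight segments at constant velocity. This piecewise-linear assumption and the function names are illustrative, not the paper's capsule-swept implementation.

```python
import math


def min_center_distance(a0, a1, b0, b1):
    """Closest approach of two points moving linearly over the same interval.

    UAV A moves a0 -> a1 and UAV B moves b0 -> b1 while s goes from 0 to 1.
    """
    r0 = [a - b for a, b in zip(a0, b0)]                      # relative position at s = 0
    dr = [(p1 - p0) - (q1 - q0) for p0, p1, q0, q1 in zip(a0, a1, b0, b1)]
    dr2 = sum(c * c for c in dr)
    if dr2 == 0.0:
        s = 0.0                                               # no relative motion
    else:
        s = min(1.0, max(0.0, -sum(r * d for r, d in zip(r0, dr)) / dr2))
    return math.sqrt(sum((r + s * d) ** 2 for r, d in zip(r0, dr)))


def segment_is_safe(a0, a1, b0, b1, r_safe, d_min):
    """Equation (5) for sphere occupancies: set distance = center distance - 2 * r_safe."""
    gap = max(min_center_distance(a0, a1, b0, b1) - 2.0 * r_safe, 0.0)
    return gap >= d_min


# Crossing paths: endpoints are about 7.07 m apart, but the UAVs meet mid-segment.
a0, a1 = (0.0, 0.0, 0.0), (10.0, 0.0, 0.0)
b0, b1 = (5.0, -5.0, 0.0), (5.0, 5.0, 0.0)
print(math.dist(a0, b0), min_center_distance(a0, a1, b0, b1))
```

A check that samples only the segment endpoints would accept this pair, whereas the closed-form closest approach reports the collision.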

2.2.2. Cooperative Arrival Time Window Constraint

With a velocity envelope, we derive the feasible travel-time interval of each geometric path and evaluate its compliance with individual arrival windows.

For any geometric path $\pi_{i}$ of UAV $U_{i}$, based on its elastic velocity envelope $V_{i} = [v_{min,i}, v_{max,i}]$, the physically reachable time domain mapped by path length $L(\sigma_{i})$ is:

$$T_{phys}(L(\sigma_{i})) = \left[ \frac{L(\sigma_{i})}{v_{max,i}}, \frac{L(\sigma_{i})}{v_{min,i}} \right]. \tag{6}$$

The existence condition for a global cooperative time anchor $t_{co}$ is:

$$\exists \, t_{co} \in \mathbb{R}^{+} \ \text{s.t.} \ t_{co} \in \bigcap_{i=1}^{N} T_{phys}(L(\sigma_{i})). \tag{7}$$
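Equations (6) and (7) reduce to an interval intersection: the latest of the earliest arrival times must not exceed the earliest of the latest arrival times. A minimal sketch, with illustrative function names and example numbers:

```python
def reachable_window(length, v_min, v_max):
    """Equation (6): travel-time interval for a path of the given length."""
    return (length / v_max, length / v_min)


def anchor_interval(lengths, envelopes):
    """Equation (7): intersection of all reachable windows, or None if empty.

    lengths:   path length per UAV (m)
    envelopes: (v_min, v_max) per UAV (m/s), with 0 < v_min <= v_max
    """
    windows = [reachable_window(L, lo, hi) for L, (lo, hi) in zip(lengths, envelopes)]
    t_lo = max(w[0] for w in windows)
    t_hi = min(w[1] for w in windows)
    return (t_lo, t_hi) if t_lo <= t_hi else None


print(anchor_interval([100.0, 150.0], [(5.0, 10.0), (10.0, 20.0)]))
print(anchor_interval([100.0, 500.0], [(5.0, 10.0), (10.0, 20.0)]))
```

In the first call the windows are [10, 20] s and [7.5, 15] s, so any anchor in [10, 15] s is admissible; in the second, the longer path cannot be flown fast enough and no common anchor exists.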

2.3. Cooperative Path Planning Problem Definition

Based on the aforementioned kinematic models, velocity envelope constraints, and spatiotemporal coupling constraints, this paper formalizes the cooperative material delivery task for heterogeneous UAVs as a joint optimization problem.

Problem 1 (Physics-Aware Cooperative Planning).

Given a heterogeneous multi-UAV system $U$, start and goal sets $S, G$, a fractal environment $O_{env}$ (where terrain obstacles $O_{terrain}$ are hard constraints and threat zones involve soft risk penalties via potential fields $R(p)$), and mission-specified individual arrival timing biases $\{\delta_{i}\}_{i=1}^{N}$, find an optimal set of trajectories $\Sigma^{*} = \{\sigma_{1}, \dots, \sigma_{N}\}$ and a global cooperative time anchor $t_{co}$ such that the following weighted cost is minimized:

$$\min_{\Sigma, t_{co}} J_{global} = \sum_{i=1}^{N} \left( \lambda_{eng} \cdot \hat{J}_{eng}(\sigma_{i}) + \lambda_{time} \cdot \hat{J}_{sync}(\sigma_{i}, t_{co}, \delta_{i}) + \gamma \cdot \hat{J}_{risk}(\sigma_{i}) \right). \tag{8}$$

The weights $\lambda_{eng}$ and $\lambda_{time}$ control the trade-off between energy efficiency and time synchronization in Equation (8) and are also used in the time-anchoring objective in Equation (11). We tune $\lambda_{eng}$ and $\lambda_{time}$ using a lexicographic criterion: (i) maximize the mission success rate, and (ii) among solutions achieving full success, minimize total energy and synchronization error. Following this rule, we perform structured parameter sweeps in the baseline Scenario A to identify a feasible region and choose practical defaults (details in Section 4.4). The risk weight $\gamma$ is fixed ($\gamma = 0.2$) as a soft safety margin, and $\epsilon$ is fixed ($\epsilon = 1 \times 10^{-6}$) as a numerical stabilizer. Here, the individual target arrival time is defined as $t_{target,i} = t_{co} + \delta_{i}$ and is used to measure synchronization error.

Cost Function Analysis:

Energy Efficiency ($\hat{J}_{eng}$): A normalized energy term based on Equation (2), guiding the planner to generate energy-saving paths consistent with heterogeneous energy efficiency characteristics;
Synchronization Accuracy ($\hat{J}_{sync}$): Quantifies the deviation relative to the target arrival time:

$$\hat{J}_{sync}(\sigma_{i}, t_{co}, \delta_{i}) = \frac{| t_{arr,i} - t_{target,i} |}{T_{norm}}, \qquad t_{target,i} = t_{co} + \delta_{i}, \tag{9}$$

where $t_{arr,i}$ is the final arrival time of the trajectory, and $T_{norm}$ is a normalization scale (can be taken as $\max(t_{target,i}, \epsilon)$ or uniformly $\max(t_{co}, \epsilon)$ to avoid division by zero). $\delta_{i}$ is given by mission rank and arrival interval (e.g., $\delta_{i} = (rank(i) - 1) \cdot \Delta t_{gap}$). Under strict constraints, this term should converge within the allowable time window error range;
Threat Exposure ($\hat{J}_{risk}$): Conditional Value-at-Risk (CVaR) is employed to assess spatial safety. Let $R(p)$ be the threat potential field value at position $p$, and define the random variable $X_{i}$ as the risk distribution along trajectory $\sigma_{i}$ (e.g., constituted by time-domain sampling of $R(\sigma_{i}(t))$). Then, $\eta$-CVaR [29] is defined as the expected value of the worst $\eta$ fraction (i.e., $100\eta\%$) of high-risk segments:

$$\hat{J}_{risk}(\sigma_{i}) = CVaR_{\eta}(X_{i}) = \inf_{a \in \mathbb{R}} \left\{ a + \frac{1}{\eta} E\left[ (X_{i} - a)^{+} \right] \right\}. \tag{10}$$

Here, $a$ is an auxiliary scalar (VaR-like threshold) introduced in the standard CVaR reformulation; the infimum over $a$ yields $CVaR_{\eta}(X_{i})$, where $(x)^{+} = \max(x, 0)$. This metric enhances robustness against non-deterministic disturbances by heavily penalizing the highest-risk segments of the path rather than the average risk [30].
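The two non-energy terms can be sketched directly from Equations (9) and (10). The empirical CVaR below assumes $X_{i}$ is represented by equally weighted samples of $R(\sigma_{i}(t))$; since the objective in Equation (10) is convex and piecewise linear in $a$ with breakpoints at the samples, it is enough to evaluate it at the sample values. Function names are illustrative.

```python
def sync_cost(t_arr, t_co, delta_i, eps=1e-6):
    """Equation (9) with T_norm = max(t_target, eps)."""
    t_target = t_co + delta_i
    return abs(t_arr - t_target) / max(t_target, eps)


def cvar(samples, eta):
    """Empirical eta-CVaR via the infimum form of Equation (10).

    samples: risk values along the trajectory (equally weighted)
    eta:     tail fraction in (0, 1]
    """
    if not samples or not 0.0 < eta <= 1.0:
        raise ValueError("need at least one sample and 0 < eta <= 1")
    n = len(samples)

    def objective(a):
        return a + sum(max(x - a, 0.0) for x in samples) / (eta * n)

    # The minimum of a convex piecewise-linear function lies at a breakpoint.
    return min(objective(a) for a in samples)


risk_samples = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
print(cvar(risk_samples, 0.2))      # mean of the worst 20% of samples
print(sync_cost(105.0, 100.0, 0.0))
```

With ten samples and $\eta = 0.2$, the result equals the mean of the two largest values, which matches the "worst tail" reading of the definition.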

3. The RG-HDP-VD Physics-Aware Cooperative Planning Framework

3.1. System Architecture and Problem Decomposition

This paper proposes a hierarchical cooperative planning framework, termed RG-HDP-VD, which constructs a layered computable closed-loop as illustrated in Figure 1. In this framework, the RG mechanism handles priority arbitration using consistent physical costs. HDP manages decentralized sequential coordination and spatiotemporal occupancy. Finally, VD decouples time-window feasibility into geometric length-feasible regions, which are then embedded into the underlying planner.

Physics-Aware and Topology Abstraction Layer (L0–L1): Acting as the perception frontend, this layer establishes a unified physical metric baseline. It performs nonlinear fusion of heterogeneous physical parameters (from L0) and environmental constraints, mapping differences in payload time-variance and aerodynamic efficiency into an anisotropic energy cost field. Specifically, rather than employing explicit fluid-dynamics equations, these aerodynamic effects are introduced through the empirical power-consumption models in Equation (2) via $P_{motion}(v, \dot{z}, m_{i}(t))$ and $P_{hover}(m_{i}(t))$. The corresponding model coefficients are included in the platform parameter tuple $P_{i}$ (e.g., the hovering power coefficient $\alpha_{h}$ in Table 1), enabling the cost field to capture macroscopic phenomena such as asymmetric vertical-motion efficiency and nonlinear hovering penalties under heavy load. L1 extracts baseline topological features for each UAV using a Mass-Augmented Energy A* algorithm, outputting a triplet $(L_{i}, E_{i}, k_{i})$ containing geometric path length, total energy consumption, and energy consumption per unit distance. This process maps differences in motion capability across heterogeneous airframes in unstructured environments into comparable physical cost metrics. These metrics then provide an objective baseline for the subsequent game-theoretic layer.

Cooperative Strategic Decision Layer (L2): This layer performs strategic resource allocation based on physical topology information. To address resource-contention deadlocks among heterogeneous UAVs, the Regret-Guided (RG) mechanism uses the ratio between “waiting” and “detouring” energy costs to dynamically generate conflict-free priority sequences, strategically establishing right-of-way advantages for heavy-lift, high-energy platforms. Simultaneously, the Global Time Anchoring block searches for an optimal cooperative tempo ($t_{co}$) within the intersection of physical feasibility regions. This step eliminates rhythm mismatches across different aircraft types in the time domain and outputs rigid spatiotemporal constraint directives to downstream layers. On this basis, HDP acts as the coordination kernel to maintain the global spatiotemporal occupancy map $H_{all}$: it records determined high-priority trajectory occupancies into $H_{all}$ and treats them as dynamic spatiotemporal obstacles for low-priority planning.

Elastic Execution and Validation Layer (L3–L4): This layer addresses feasibility issues under tight time windows, grounding discrete decisions into continuous trajectories. To counter the solution-space contraction caused by fixed-speed assumptions in non-convex terrain, L3 introduces Velocity Decomposition (VD). Using the velocity envelope $[v_{min}, v_{max}]$, rigid time-window constraints are decoupled and mapped into elastic path-length feasible regions. This implies that the planner (VD-TSRRT*) needs only to search for paths satisfying geometric length requirements under $H_{all}$ constraints to implicitly restore temporal feasibility. Finally, L4 performs temporal smoothing (via B-spline or PCHIP), continuous capsule-swept volume detection, and cooperative error auditing. In this way, it generates 4D trajectories $\sigma_{i}(t)$ that satisfy physical feasibility constraints, remain collision-free, and complete the overall “Perception–Decision–Execution” closed loop.
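One way to read the VD mapping is through Equation (6): a path of length $L$ can meet a window $[t^{low}, t^{high}]$ exactly when $[L / v_{max}, L / v_{min}]$ overlaps it, which gives $v_{min} \cdot t^{low} \leq L \leq v_{max} \cdot t^{high}$. The sketch below encodes this derived condition; it is an interpretation under that assumption rather than the exact rule used inside VD-TSRRT*, and the function names are illustrative.

```python
def length_feasible_interval(t_low, t_high, v_min, v_max):
    """Path lengths that can arrive inside [t_low, t_high] at some speed in [v_min, v_max]."""
    if not 0.0 < t_low <= t_high or not 0.0 < v_min <= v_max:
        raise ValueError("require 0 < t_low <= t_high and 0 < v_min <= v_max")
    return (v_min * t_low, v_max * t_high)


def admissible_speeds(length, t_low, t_high, v_min, v_max):
    """Constant speeds that fly `length` within the window, or None if there are none."""
    lo = max(v_min, length / t_high)   # slowest speed that still arrives by t_high
    hi = min(v_max, length / t_low)    # fastest speed that does not arrive before t_low
    return (lo, hi) if lo <= hi else None


print(length_feasible_interval(20.0, 25.0, 5.0, 10.0))
print(admissible_speeds(150.0, 20.0, 25.0, 5.0, 10.0))
print(admissible_speeds(300.0, 20.0, 25.0, 5.0, 10.0))
```

With a fixed speed the length interval would collapse to a narrow band, which is the feasible-set contraction the velocity envelope is meant to relieve.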

3.2. Mass-Augmented Energy Topology

In the post-disaster fractal canyon environment $O_{env} = O_{terrain} \cup O_{threat}$, simple geometric distance cannot reflect the costs of heterogeneous aircraft types. We employ the Mass-Augmented Energy A* algorithm to extract a baseline topology for each UAV $U_{i}$ that satisfies terrain and threat constraints, outputting the triplet $(L_{i}, E_{i}, k_{i})$ as physical tokens for the subsequent cooperative game.

Regarding the topology extraction mechanism, the algorithm constructs a payload-aware cost evaluation model in which instantaneous mass is explicitly embedded into the cost function. The energy for each displacement segment is dynamically weighted based on $m_{seg} = m_{empty,i} + m_{payload,i}$. To avoid the computational explosion associated with high-dimensional search while still ensuring physical fidelity, the algorithm does not expand payload as an explicit fourth dimension. Instead, it attaches payload as a node attribute that participates in the $g(n)$ update, so the skeletal search remains in 3D space.

The output of this layer characterizes only the motion work baseline in a static environment. Hovering waiting costs, which are directly related to dynamic coordination, are explicitly modeled in L2 via regret values and time anchoring. This design of “dynamic–static separation” effectively avoids strong coupling assumptions regarding cooperative strategies during the pre-planning phase.
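A compact sketch of an energy-weighted A* on a 3D grid is given below, returning the triplet $(L_{i}, E_{i}, k_{i})$. The payload flag is carried as a node attribute rather than a fourth search dimension, as described above. Everything else is an assumption made for illustration: the 26-connected grid, the `free` callback, the optional `drop_cell`, and the per-segment energy `mass * (c_level * d + c_climb * climb)`, which stands in for the paper's motion-work model.

```python
import heapq
import math


def energy_astar(free, start, goal, m_empty, m_payload, drop_cell=None,
                 c_level=1.0, c_climb=3.0):
    """Return (L, E, k): path length, energy, and energy per unit distance, or None.

    free(cell) -> bool must be False outside the map and inside obstacles.
    """
    moves = [(dx, dy, dz) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
             for dz in (-1, 0, 1) if (dx, dy, dz) != (0, 0, 0)]

    def h(cell):
        # Lower bound: empty mass, level flight, straight line to the goal.
        return m_empty * c_level * math.dist(cell, goal)

    best = {start: (0.0, 0.0, True)}          # cell -> (g energy, length, payload on board)
    heap = [(h(start), start)]
    closed = set()
    while heap:
        _, cell = heapq.heappop(heap)
        if cell in closed:
            continue
        closed.add(cell)
        g, length, loaded = best[cell]
        if cell == goal:
            return (length, g, (g / length if length > 0 else 0.0))
        if cell == drop_cell:
            loaded = False                    # payload released at this node
        mass = m_empty + (m_payload if loaded else 0.0)
        for dx, dy, dz in moves:
            nxt = (cell[0] + dx, cell[1] + dy, cell[2] + dz)
            if nxt in closed or not free(nxt):
                continue
            d = math.sqrt(dx * dx + dy * dy + dz * dz)
            e = mass * (c_level * d + c_climb * max(dz, 0))   # climbing is penalized
            if nxt not in best or g + e < best[nxt][0]:
                best[nxt] = (g + e, length + d, loaded)
                heapq.heappush(heap, (g + e + h(nxt), nxt))
    return None


def in_box(cell):
    """Obstacle-free 5 x 5 x 3 test volume."""
    x, y, z = cell
    return 0 <= x < 5 and 0 <= y < 5 and 0 <= z < 3


print(energy_astar(in_box, (0, 0, 0), (4, 0, 0), m_empty=2.0, m_payload=1.0))
```

Because the payload flag is inherited from the parent node, the search stays in 3D but is no longer guaranteed optimal when a mid-route drop changes the mass; this mirrors the trade-off between search dimensionality and fidelity discussed above.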

3.3. Regret-Guided Arbitration and Time Anchoring

Based on the physical topological features submitted by L1, this layer focuses on resource allocation across two strategic dimensions: establishing the global cooperative time tempo and adjudicating passage priority in conflict zones. By generating a set of physically feasible rigid time windows $[t_{i}^{low}, t_{i}^{high}]$ and a deadlock-free planning priority sequence $U_{sorted}$, the framework transforms the complex multi-agent game into a serialized single-agent spatiotemporal constrained planning problem.

3.3.1. Global Cooperative Time Anchoring

In heterogeneous multi-UAV systems, the natural rhythms of different aircraft types are difficult to synchronize. L2 executes global anchoring on the time axis to search for an optimal cooperative moment $t_{co}$ that satisfies mission requirements while aligning with collective physical energy efficiency.

The algorithm first calculates the baseline arrival time for each UAV based on cruising speed: $t_{min,i} = L_{i} / v_{cruise,i}$. Subsequently, it searches for the optimal solution within the interval $t \in [\max_{i} t_{min,i}, \ \rho \cdot \max_{i} t_{min,i}]$ by constructing a weighted objective function $J(t)$ that incorporates time efficiency and energy cost:

$$J(t) = \sum_{i=1}^{N} \left( \lambda_{time} \cdot \Delta_{i}(t) + \lambda_{eng} \cdot k_{i}^{norm} \cdot \Delta_{i}(t) \right), \tag{11}$$

where $\Delta_{i}(t) = (t - t_{min,i}) / \max(t_{min,i}, \epsilon)$ is the normalized time deviation and $\rho$ ($\rho \geq 1$) is a time relaxation factor defining the upper bound multiplier for the feasible arrival time search space. In this function, the unit distance energy consumption $k_{i}^{norm}$ serves as a key weighting factor, explicitly amplifying the time deviation cost for heavy-load models. This means that the optimization naturally biases the solution toward “high-energy” platforms. As a result, the selected $t_{co}$ better accommodates the energy-efficient operating regime of heavy-load aircraft.

After determining the baseline time $t_{co}$, the system generates rigid time windows to be issued to the execution layer. To prevent top-level directives from exceeding the physical feasibility boundaries of the airframes, a strict physical-limit truncation mechanism is introduced. First, the physical lower bound on arrival time under maximum thrust is calculated for each aircraft type:

$$t_{min,i}^{phys} = \frac{L_{i}}{v_{cruise,i} \cdot \rho_{max}}, \tag{12}$$

where $\rho_{max}$ (also denoted $v_{max\_ratio}$) is the maximum allowable overspeed ratio. Subsequently, a mandatory check is performed when generating the time window lower bound $t_{i}^{low}$:

$$t_{i}^{low} = \max\{ t_{co} - \Delta T_{early}, \ t_{min,i}^{phys} \}, \qquad t_{i}^{high} = t_{i}^{low} + \delta. \tag{13}$$

Here, $\Delta T_{early} > 0$ (in s) is a user-specified early-arrival tolerance that limits how much earlier than the cooperative anchor time $t_{co}$ a vehicle may be scheduled to arrive, and $\delta$ is the window width (distinct from the per-UAV arrival bias $\delta_{i}$ in Equation (9)). This mechanism ensures that, even if $t_{co}$ is set aggressively, the time window $[t_{i}^{low}, t_{i}^{high}]$ issued to each UAV always lies within its physical feasibility region. It therefore prevents planning failures where high-level spatiotemporal constraints violate low-level dynamic limits.
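The anchoring and truncation steps of Equations (11) to (13) can be sketched as a one-dimensional grid search followed by a clamp. The grid resolution `steps`, the function names, and the example numbers are assumptions; the paper does not state how the search over $t$ is carried out.

```python
def anchor_time(t_min, k_norm, lam_time, lam_eng, rho=1.5, steps=200, eps=1e-6):
    """Grid search for t_co over [max t_min, rho * max t_min] using Equation (11)."""
    t_lo = max(t_min)
    t_hi = rho * t_lo

    def cost(t):
        return sum((lam_time + lam_eng * k) * (t - tm) / max(tm, eps)
                   for tm, k in zip(t_min, k_norm))

    grid = [t_lo + (t_hi - t_lo) * j / steps for j in range(steps + 1)]
    return min(grid, key=cost)


def time_window(t_co, length, v_cruise, rho_max, dt_early, width):
    """Equations (12)-(13): clamp the window start to the physical lower bound."""
    t_phys = length / (v_cruise * rho_max)
    t_low = max(t_co - dt_early, t_phys)
    return (t_low, t_low + width)


t_co = anchor_time(t_min=[10.0, 20.0], k_norm=[0.5, 1.0], lam_time=0.5, lam_eng=0.5)
print(t_co)
print(time_window(t_co, length=100.0, v_cruise=10.0, rho_max=1.2, dt_early=3.0, width=2.0))
print(time_window(t_co, length=300.0, v_cruise=10.0, rho_max=1.2, dt_early=3.0, width=2.0))
```

With non-negative weights, Equation (11) as written is non-decreasing on the search interval, so this sketch returns the left endpoint $\max_{i} t_{min,i}$; the grid search is kept general in case additional terms are included. In the last call the physical bound (25 s) exceeds $t_{co} - \Delta T_{early}$ (17 s), so the window start is truncated to 25 s.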

3.3.2. Regret-Guided Dynamic Right-of-Way Arbitration

Time anchoring solves “when to arrive”, while right-of-way arbitration solves “who goes first”. To achieve energy-aware, fair passage allocation in conflict scenarios, this layer proposes a Regret-Guided (RG) dynamic arbitration mechanism. This mechanism uses the unit distance energy consumption $k_{i}$ (unit: J/m) output by L1 as a strategic proxy metric. It compares the physical cost differences induced by two types of actions at the conflict point, hovering (waiting) and maneuvering (detouring), to calculate a regret value $R_{i}$, thereby generating a right-of-way priority sequence.

Specifically, for the $i$-th UAV, define the waiting energy cost $C_{i}^{wait}$ and the detouring energy cost $C_{i}^{detour}$ (both in J) as:

$$C_{i}^{wait} = P_{i}^{hover} \cdot \Delta t_{i}^{wait} \approx (k_{i} \cdot v_{cruise,i} \cdot \eta_{hover}) \cdot \Delta t_{i}^{wait}, \tag{14}$$

$$C_{i}^{detour} = e_{i}^{unit} \cdot \Delta L_{i}^{detour} \approx (k_{i} \cdot \eta_{reroute}) \cdot \Delta L_{i}^{detour}. \tag{15}$$

Here:

$v_{cruise,i}$ is the cruising speed (m/s). Since $k_{i} \cdot v_{cruise,i}$ has the dimension of power (W), it serves as a baseline proxy for energy consumption per unit time. It is mapped to the hovering power proxy $P_{i}^{hover}$ (W) via the coefficient $\eta_{hover}$.
$\eta_{reroute}$ characterizes the additional energy overhead caused by detouring, mapping $k_{i}$ to the detour unit distance energy proxy $e_{i}^{unit}$ (J/m).
$\Delta t_{i}^{wait}$ is determined by the cooperative time difference and conflict severity, while $\Delta L_{i}^{detour}$ is mapped from the conflict zone’s geometric scale with a lower bound set to avoid degenerate comparisons caused by “zero detour.”

On this basis, the regret value is defined as the ratio of the costs of the two actions:

$$R_{i} = \frac{C_{i}^{wait}}{C_{i}^{detour} + \epsilon}, \tag{16}$$

where $\epsilon$ is a small positive number to prevent numerical instability. A larger $R_{i}$ indicates that the energy penalty associated with waiting is higher relative to that of detouring, suggesting that the platform should preferentially detour rather than hover.

For heterogeneous platforms in bottleneck games, the hovering power of heavy-lift transports increases nonlinearly with mass (

P_{h o v e r} \propto m^{1.5}

), whereas level-flight propulsion energy grows more moderately (approximately proportional to m). Consequently, their

C_{i}^{w a i t}

is often significantly larger than

C_{i}^{d e t o u r}

, resulting in a large

R_{i}

. Conversely, light platforms have lower hovering and maneuvering costs, yielding a smaller

R_{i}

, making them more suitable as “yielders/regulators” in the system. The system allocates right-of-way in descending order of

R_{i}

, prioritizing the passage of high-energy-consumption platforms through bottleneck areas, thereby suppressing energy-inefficient congestion and reducing total system energy consumption in the sense of “physical fairness”.
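As a minimal sketch of Equations (14) to (16) and the descending-order allocation, the following Python snippet computes the proxy costs and regret values for two platforms and sorts them. The class `Platform`, the per-platform `eta_hover` values, `ETA_REROUTE`, `DL_MIN`, and all numbers are illustrative assumptions rather than values from this work.

```python
from dataclasses import dataclass

ETA_REROUTE = 1.1   # assumed detour coefficient (eta_reroute)
EPS = 1e-6          # small stabilizer (epsilon in Eq. 16)
DL_MIN = 1.0        # assumed lower bound on detour length, m


@dataclass
class Platform:
    name: str
    k: float          # unit-distance energy k_i from L1, J/m
    v_cruise: float   # cruising speed, m/s
    eta_hover: float  # assumed per-platform hover coefficient
    dt_wait: float    # waiting time at the conflict, s
    dl_detour: float  # detour length around the conflict, m


def wait_cost(p):
    """Eq. (14): hover power proxy times waiting time, in J."""
    return p.k * p.v_cruise * p.eta_hover * p.dt_wait


def detour_cost(p):
    """Eq. (15): detour unit-energy proxy times detour length, in J."""
    return p.k * ETA_REROUTE * max(p.dl_detour, DL_MIN)


def regret(p):
    """Eq. (16): ratio of waiting cost to detouring cost."""
    return wait_cost(p) / (detour_cost(p) + EPS)


def right_of_way(fleet):
    """Priority sequence: platforms sorted by descending regret."""
    return sorted(fleet, key=regret, reverse=True)


fleet = [
    Platform("light", k=40.0, v_cruise=12.0, eta_hover=1.0,
             dt_wait=30.0, dl_detour=150.0),
    Platform("heavy", k=200.0, v_cruise=8.0, eta_hover=2.0,
             dt_wait=30.0, dl_detour=150.0),
]
for p in right_of_way(fleet):
    print(p.name, round(regret(p), 2))
```

In this proxy form, k_i appears in both the numerator and the denominator of Equation (16), so the ordering in the sketch is driven by the cruising speed, the coefficients, and the wait and detour magnitudes. With the assumed values, the heavy platform receives the higher regret and is placed first.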

3.3.3. HDP Decentralized Priority Coordination and Occupancy Closed-Loop

To ground the physics-aware priority sequence

U = \{U_{1}, \dots, U_{N}\}

obtained from L2 into executable collision-free trajectories, we employ HDP as the cooperative execution backbone. Unlike traditional HDP, which relies on heuristics to determine right-of-way, this framework uses the priority output by the RG mechanism as input, achieving one-way conflict resolution through an asynchronous “Plan–Broadcast–Update” closed loop.

The system executes serialized coordination according to

U_{s o r t e d}

: high-priority UAVs generate trajectories

σ_{i} (t)

first, and their results are treated as occupancy declarations for spatiotemporal resources; low-priority UAVs, upon receiving these trajectories, convert them into dynamic constraints and solve within the remaining feasible region, realizing a priority-ordered coordination mode in which higher-priority UAVs claim spatiotemporal occupancy first, and lower-priority UAVs yield accordingly.

To ensure continuous inter-agent safety and avoid deadlocks, the system maintains a global spatiotemporal occupancy set

H_{a l l}

. When

u_{i}

completes planning and broadcasts its occupancy envelope, any low-priority UAV

u_{k} (k > i)

immediately updates its local constraints:

O_{local}^{(k)} \leftarrow O_{static} \cup H^{(k)}, \quad H^{(k)} = \bigcup_{j < k} \mathrm{SpatioTemporal}(σ_{j}) .

(17)

This update rule ensures that conflict constraints are propagated unidirectionally along the priority sequence: low-priority individuals must adapt to high-priority trajectories, mechanistically eliminating cooperative deadlocks caused by circular waiting.
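The update rule in Equation (17) can be sketched on a discretized grid as follows. This is a simplified illustration under stated assumptions: trajectories are lists of grid cells indexed by timestep, priorities are zero-based list positions, and the names `spatiotemporal` and `local_constraints` are hypothetical helpers, not part of the described implementation.

```python
def spatiotemporal(traj):
    """Occupancy of a discretized trajectory as a set of (timestep, cell)."""
    return {(t, cell) for t, cell in enumerate(traj)}


def local_constraints(static_cells, planned, k):
    """Eq. (17): static obstacles plus occupancies of agents j < k only."""
    dynamic = set()
    for traj in planned[:k]:          # strictly higher-priority agents
        dynamic |= spatiotemporal(traj)

    def blocked(t, cell):
        # A cell is unavailable if it is a static obstacle at any time,
        # or if a higher-priority agent occupies it at timestep t.
        return cell in static_cells or (t, cell) in dynamic

    return blocked


static_cells = {(2, 2)}
planned = [
    [(0, 0), (1, 0), (2, 0)],   # highest priority
    [(0, 1), (1, 1), (2, 1)],   # second priority
]
blocked_for_third = local_constraints(static_cells, planned, k=2)
blocked_for_first = local_constraints(static_cells, planned, k=0)
print(blocked_for_third(1, (1, 0)), blocked_for_first(1, (1, 0)))
```

Because the union only ranges over `planned[:k]`, the highest-priority agent never sees constraints from lower-priority agents, which is the one-way propagation described above.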

Proposition 1 (deadlock-free w.r.t. circular waiting).

Let

U = \{U_{1}, \dots, U_{N}\}

be a total order produced by the RG module. Under the HDP protocol in which agent

u_{k}

plans while treating

\{σ_{j} ∣ j < k\}

as fixed spatiotemporal obstacles (Equation (17)), the constraint-dependency graph is acyclic; hence, the circular-wait condition is absent and cooperative circular-wait deadlocks cannot occur.

Proof (sketch).

By Equation (17), any “yield/wait due to conflict” relation can only point from a lower-priority agent to a higher-priority one (from

k

to some

j < k

). Therefore, all directed edges follow the strict order of

U_{s o r t e d}

, which forms a Directed Acyclic Graph (DAG) and contains no directed cycle. Circular waiting requires a directed cycle; hence, it is impossible. □

Remark (deadlock vs. planning failure).

Proposition 1 addresses cooperative deadlocks in the sense of circular waiting. This is strictly different from planning failure, where a low-priority agent finds no feasible solution under the remaining constraints (an incompleteness issue common to decoupled prioritized planning). Layer L3 (Velocity Decomposition) is explicitly designed to mitigate such feasibility collapse by enlarging the length-feasible set.

3.4. Elastic Execution Based on Velocity Decomposition

Layer L3 introduces Velocity Decomposition (VD), a technique that converts time feasibility from a rigid 4D spatiotemporal constraint into a feasible path-length interval in 3D space. This logic is embedded into the extension, pruning, and cost evaluation mechanisms of RRT*, constituting the VD-TSRRT* planner.

3.4.1. Principle of Velocity Decomposition

For the

i

-th UAV, given the cruising speed

v_{c r u i s e, i}

and the allowable speed adjustment ratio interval

[ρ_{min,i}, ρ_{max,i}]

, the physical velocity envelope available at the execution layer is defined as:

v_{min,i} = v_{cruise,i} \, ρ_{min,i}, \quad v_{max,i} = v_{cruise,i} \, ρ_{max,i}

(18)

For any geometric path

σ_{i}

in 3D space, let its length be

L (π_{i})

. Under this velocity envelope, the physically reachable time interval

T_{p h y s}

corresponding to this path is given by Equation (6).

Therefore, the necessary and sufficient condition for path

σ_{i}

to satisfy the time window

T_{w i n} = [t_{i}^{l o w}, t_{i}^{h i g h}]

issued by L2 is that there exists a non-empty intersection between the two intervals, i.e.,

T_{p h y s} (L (π_{i})) \cap T_{w i n} \neq \emptyset

. This judgment condition can be equivalently rewritten as a feasible region constraint on the path length:

L (π_{i}) \in [v_{m i n, i} \cdot t_{i}^{l o w}, v_{m a x, i} \cdot t_{i}^{h i g h}] ≜ L_{f e a s, i} .

(19)

Consequently, the strong spatiotemporal coupling constraints that originally required handling in the

x - y - z - t

joint configuration space are transformed into interval constraints on the geometric length

L (π_{i})

in the 3D search space. The planner only needs to find a collision-free path in space satisfying

L (π_{i}) \in L_{f e a s, i}

to provide a physically feasible geometric skeleton for the precise time alignment in the subsequent L4 layer, thereby trading velocity elasticity for geometric freedom.
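The equivalence between the time-interval intersection test and the length interval in Equation (19) can be checked numerically. The sketch below is illustrative only: the function names and the example values (velocity envelope of 8 to 12 m/s, window of 50 to 60 s) are assumptions chosen for the demonstration.

```python
def length_feasible_interval(v_min, v_max, t_low, t_high):
    """Eq. (19): path lengths for which the time window is reachable."""
    return v_min * t_low, v_max * t_high


def time_window_reachable(length, v_min, v_max, t_low, t_high):
    """Direct test: does [L/v_max, L/v_min] intersect [t_low, t_high]?"""
    t_fast = length / v_max   # earliest possible arrival
    t_slow = length / v_min   # latest possible arrival
    return t_fast <= t_high and t_slow >= t_low


v_min, v_max = 8.0, 12.0      # assumed velocity envelope, m/s
t_low, t_high = 50.0, 60.0    # assumed arrival window, s

l_lo, l_hi = length_feasible_interval(v_min, v_max, t_low, t_high)
print(l_lo, l_hi)

# Both formulations agree on a sweep of candidate path lengths.
for length in range(300, 800, 10):
    in_interval = l_lo <= length <= l_hi
    assert in_interval == time_window_reachable(
        length, v_min, v_max, t_low, t_high)
```

With these values the feasible lengths span 400 m to 720 m, so a planner working purely in 3D only needs to compare a candidate length against two scalars.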

3.4.2. Implementation of VD-TSRRT* Planning Algorithm

VD-TSRRT* does not search for precise trajectories directly in 4D spatiotemporal space; instead, it generates geometric paths

σ_{i}

in 3D space that satisfy velocity decomposition feasibility, using a cost function to guide the search toward shorter (more energy-efficient) solutions. Its core improvements include three mechanisms:

1. Physical Pre-check: Utilizing the topological prior of the baseline path $σ_{base}$ from L1, if its length $L_{base}$ satisfies the intersection condition between the physically reachable time domain and the target window, sampling is skipped, and the path is output directly:

[\frac{L_{b a s e}}{v_{m a x, i}}, \frac{L_{b a s e}}{v_{m i n, i}}] \cap [t_{i}^{l o w}, t_{i}^{h i g h}] \neq \emptyset .

(20)

2. Pruning & Adaptive Bias: A physical length upper bound $L_{max}$ is introduced, which includes a time tolerance $ϵ_{t}$ and a fixed margin $Δ L$. For a tree node $n$, if its total length estimate satisfies $L_{est} > L_{max}$, it is judged “inevitably late” and pruned:

$L_{m a x} = v_{m a x, i} \cdot (t_{i}^{h i g h} + ϵ_{t}) + Δ L .$

(21)

Simultaneously, if the current optimal path is too short and leads to an “inevitably early” status (i.e., $L_{est} / v_{min,i} < t_{i}^{low} - ϵ_{t}$), the algorithm automatically decreases the goal bias. This forces the random tree to grow more circuitously into the surrounding free space so that it can enter the feasible length interval.

3. Feasibility-Aware Cost and Asymptotic Optimality: A step-type cost function is constructed, treating the time feasibility intersection as a hard threshold:

\mathrm{Cost}(n) = \begin{cases} L_{est}, & \text{if } \left[\frac{L_{est}}{v_{max,i}}, \frac{L_{est}}{v_{min,i}}\right] \cap \left[t_{i}^{low} - ϵ_{t}, \; t_{i}^{high} + ϵ_{t}\right] \neq \emptyset \\ +\infty, & \text{otherwise} \end{cases}

(22)

On this basis, the algorithm explicitly executes Rewire Child and triggers Subtree Cost Propagation, which ensures asymptotic optimality within the feasible region under standard RRT* assumptions [31]. This optimality is conditional on a fixed priority order and the induced spatiotemporal constraints, and does not imply global optimality of the coupled multi-UAV problem. The complete procedure of the proposed VD-TSRRT* planner is summarized in Algorithm 1.

Algorithm 1. VD-TSRRT*(σ_base, t_low, t_high, v_min, v_max)

1.  L_max ← v_max · (t_high + ε_t) + ΔL
2.  if Intersect(L(σ_base)/v_max, L(σ_base)/v_min, t_low, t_high) then
3.      return σ_base
4.  end if
5.  Initialize tree T with x_start; optionally insert a prefix of σ_base into T
6.  best_node ← null; best_feas ← +∞; best_gap ← +∞
7.  for iter = 1 … maxIter do
8.      x_new ← SampleAndExtend(T)
9.      if g(x_new) + h(x_new) > L_max then continue end if   (pruning; Equation (21))
10.     Rewire-Parent(T, x_new); Rewire-Child(T, x_new)
11.     if NearGoal(x_new) then
12.         L_curr ← PathLength(T, x_new)
13.         if Intersect(L_curr/v_max, L_curr/v_min, t_low, t_high) then   (feasible; Equation (19))
14.             if L_curr < best_feas then best_feas ← L_curr; best_node ← x_new end if
15.         else
16.             gap ← TimeGap(L_curr, t_low, t_high, v_min, v_max, ε_t)   (Equation (23))
17.             if gap < best_gap then best_gap ← gap; best_node ← x_new end if
18.         end if
19.     end if
20.     if best_node ≠ null and L_est(best_node)/v_min < t_low − ε_t then decrease goalBias end if
21. end for
22. return BacktrackPath(T, best_node)
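The pruning bound of Equation (21) and the step-type cost of Equation (22) can be written as two small scalar functions. The following Python sketch is a hedged illustration: the function names and the numeric values for the envelope, window, tolerance, and margin are assumptions, and the tree operations of the planner itself are not reproduced.

```python
import math


def length_upper_bound(v_max, t_high, eps_t, delta_l):
    """Eq. (21): longest path that can still arrive within tolerance."""
    return v_max * (t_high + eps_t) + delta_l


def should_prune(g, h, l_max):
    """A node is 'inevitably late' if cost-so-far plus heuristic exceeds L_max."""
    return g + h > l_max


def feasibility_cost(l_est, v_min, v_max, t_low, t_high, eps_t):
    """Eq. (22): path length if the relaxed window is reachable, else +inf."""
    t_fast = l_est / v_max
    t_slow = l_est / v_min
    reachable = t_fast <= t_high + eps_t and t_slow >= t_low - eps_t
    return l_est if reachable else math.inf


# Assumed example values (not taken from the paper).
v_min, v_max = 8.0, 12.0
t_low, t_high = 50.0, 60.0
eps_t, delta_l = 1.0, 5.0

l_max = length_upper_bound(v_max, t_high, eps_t, delta_l)
print(l_max, should_prune(700.0, 50.0, l_max))
print(feasibility_cost(500.0, v_min, v_max, t_low, t_high, eps_t))
print(feasibility_cost(300.0, v_min, v_max, t_low, t_high, eps_t))
```

Returning `math.inf` for infeasible lengths lets an ordinary minimum-cost comparison during rewiring reject such nodes without a separate feasibility branch.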

3.4.3. Fallback Strategy and Sequential Avoidance

Under extremely tight time windows or strong terrain constraints, a path satisfying the intersection condition may temporarily not exist. To prevent an individual planning failure from triggering a collapse of multi-UAV coordination, VD-TSRRT* introduces a fallback mechanism: when no feasible solution exists, it returns the path with the minimum “Time Gap” distance to the feasible interval. Introducing the tolerance

ϵ_{t}

, the gap is calculated as:

gap = \begin{cases} \frac{L}{v_{max,i}} - (t_{i}^{high} + ϵ_{t}), & \text{if } \frac{L}{v_{max,i}} > t_{i}^{high} + ϵ_{t} \quad \text{(inevitably late)} \\ (t_{i}^{low} - ϵ_{t}) - \frac{L}{v_{min,i}}, & \text{if } \frac{L}{v_{min,i}} < t_{i}^{low} - ϵ_{t} \quad \text{(inevitably early)} \end{cases}

(23)

The algorithm selects the candidate path with the minimum

g a p

for output. This design ensures that the system can consistently return an executable solution that minimizes physical-constraint violation, thereby preserving the maximum available slack for Layer L4 temporal fine-tuning or subsequent replanning.
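A minimal sketch of the gap in Equation (23) and of the minimum-gap selection is given below. Equation (23) defines only the two infeasible branches; returning 0 for a feasible length is an assumption made here so that the function is total. The function names and the candidate lengths are likewise illustrative.

```python
def time_gap(length, v_min, v_max, t_low, t_high, eps_t):
    """Eq. (23): temporal distance from the relaxed arrival window, in s."""
    t_fast = length / v_max
    t_slow = length / v_min
    if t_fast > t_high + eps_t:            # inevitably late
        return t_fast - (t_high + eps_t)
    if t_slow < t_low - eps_t:             # inevitably early
        return (t_low - eps_t) - t_slow
    return 0.0                             # assumption: feasible means zero gap


def fallback_length(candidates, v_min, v_max, t_low, t_high, eps_t):
    """Pick the candidate path length with the smallest time gap."""
    return min(
        candidates,
        key=lambda length: time_gap(length, v_min, v_max, t_low, t_high, eps_t),
    )


# Assumed example: all three candidates violate the 50-60 s window.
v_min, v_max, t_low, t_high, eps_t = 8.0, 12.0, 50.0, 60.0, 1.0
candidates = [300.0, 380.0, 780.0]
for length in candidates:
    print(length, time_gap(length, v_min, v_max, t_low, t_high, eps_t))
print(fallback_length(candidates, v_min, v_max, t_low, t_high, eps_t))
```

In this example the 380 m candidate is only 1.5 s too early at minimum speed, so it is the one handed to the downstream timing layer.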

Using the totally ordered priority

U_{s o r t e d}

output by L2, the system plans sequentially and maintains the global spatiotemporal occupancy map

H_{a l l}

. For any low-priority UAV, its planner maps the confirmed high-priority trajectory occupancies as dynamic spatiotemporal obstacles and superimposes them onto the constraint set. This achieves strict inter-agent safety separation under a unified continuous swept-volume collision detection logic. When an upstream trajectory updates, local replanning is triggered via broadcasting and occupancy updates, forming a decentralized closed-loop coordination.

3.5. Trajectory Realization and Continuous Verification

Layer L4 aims to transform the discrete path skeleton output by L3 into a 4D executable trajectory

σ_{i} (t)

that satisfies physical feasibility constraints and continuous safety. First, a cascaded hybrid smoothing strategy is adopted: the Ramer–Douglas–Peucker (RDP) algorithm is used for geometric denoising of the original path [32], and B-Spline [33] or PCHIP interpolation [34] is adaptively selected based on obstacle distance to generate smooth curves. Subsequently, physical boundary projection is executed to map the cooperative time anchor

t_{c o}

issued by L2 into path timestamps, enforcing the velocity bounds within the physical envelope

[v_{m i n}, v_{m a x}]

whenever path deformation causes boundary violations. Furthermore, to eliminate discrete detection blind spots (tunneling effects) during high-speed flight, capsule-swept volume detection is introduced [35]. By constructing the Minkowski Sum of the continuous geometry moving along the trajectory and environmental obstacles, strict collision verification is achieved across the entire time domain (

t \in [0, T]

) [35,36].
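The tunneling effect mentioned above can be illustrated with a deliberately simplified case: a spherical UAV envelope swept along one straight segment (a capsule) tested against a single spherical obstacle. This is a sketch under stated assumptions, not the verification procedure used in this work; the function names, radii, and coordinates are hypothetical.

```python
import math


def point_segment_distance(c, p0, p1):
    """Shortest distance from point c to the segment p0-p1 in 3D."""
    d = [b - a for a, b in zip(p0, p1)]           # segment direction
    w = [x - a for a, x in zip(p0, c)]            # p0 -> c
    dd = sum(x * x for x in d)
    if dd == 0.0:
        s = 0.0                                   # degenerate segment
    else:
        s = sum(x * y for x, y in zip(w, d)) / dd
        s = max(0.0, min(1.0, s))                 # clamp to the segment
    closest = [a + s * x for a, x in zip(p0, d)]
    return math.dist(c, closest)


def capsule_hits_sphere(p0, p1, r_uav, center, r_obs):
    """Continuous check: inflate the obstacle by r_uav (Minkowski sum of spheres)."""
    return point_segment_distance(center, p0, p1) <= r_uav + r_obs


p0, p1 = (0.0, 0.0, 0.0), (10.0, 0.0, 0.0)   # one fast trajectory step
center, r_obs, r_uav = (5.0, 1.0, 0.0), 1.0, 0.5

# Discrete sampling at the two endpoints misses the obstacle ...
sampled_hit = any(math.dist(p, center) <= r_uav + r_obs for p in (p0, p1))
# ... while the swept capsule detects it.
swept_hit = capsule_hits_sphere(p0, p1, r_uav, center, r_obs)
print(sampled_hit, swept_hit)
```

The two checks disagree precisely when the step length is large relative to the obstacle size, which is the high-speed regime in which endpoint sampling becomes unreliable.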

4. Experimental Evaluation and Analysis

This section provides a comprehensive evaluation of the cooperative planning performance of the RG-HDP-VD framework for heterogeneous multi-UAVs in complex non-convex terrain through high-fidelity simulation experiments.

4.1. Experimental Setup

4.1.1. Simulation Environment Deployment and Heterogeneous Physical Models

All experiments were executed on a workstation equipped with an AMD Ryzen 9 9950X processor (16 cores, 4.30 GHz) and 64 GB RAM. The high-fidelity 3D simulation environment and the proposed RG-HDP-VD framework were implemented in MATLAB R2024a. To ensure a fair and reproducible evaluation, a paired Monte Carlo protocol was adopted: in each trial, all compared methods shared identical terrain seeds, start/goal configurations, and obstacle/threat-zone placements.

The experimental workspace was set as a 3D restricted airspace of

500 m \times 300 m \times 150 m

. Non-convex canyon terrain was generated using the Diamond-Square algorithm. Environmental constraints were modeled as follows: (i) rigid terrain obstacles

O_{terrain}

, requiring the trajectory to satisfy

σ_{i} \cap O_{terrain} = \emptyset

; and (ii) soft cylindrical threat zones

O_{threat}

with radii

r_{obs} \in

m. These threat zones were penalized via a CVaR-based risk term

J_{risk}

mapped by a risk potential field

R (p)

.

We considered a fleet of

N = 8

heterogeneous UAVs, comprising heavy-lift and light models. Heavy-lift UAVs followed the piecewise variable-mass model defined in Equation (1), while light UAVs utilized a constant-mass model. The physical parameters are summarized in Table 1, encompassing maximum speeds, hovering power coefficients, safety radii, and the elastic velocity envelopes.

4.1.2. Mission Scenario Design

Two high-pressure mission scenarios were designed to evaluate spatiotemporal coordination and energy-aware arbitration (see Figure 2 and Figure 3).

Scenario A (Saturation Convergence): As shown in Figure 2, UAVs departed from distributed start locations and were required to converge to a single target under sequential arrival constraints. Each UAV

i

was assigned a target arrival time center:

t_{i}^{*} = t_{base} + (i - 1) \cdot Gap

. The actual arrival time was required to fall strictly within the corresponding window, with continuous inter-agent safety separation enforced throughout the flight.

Scenario B (Group Delivery): As illustrated in Figure 3, the UAVs were divided into two groups departing from diagonal start regions. They were tasked with traversing a central bottleneck obstructed by cylindrical threat zones to reach their respective targets. This layout explicitly induced strongly coupled decisions regarding detouring, waiting/hovering, and risk exposure.

4.1.3. Experimental Design and Evaluation Metrics

To comprehensively evaluate the proposed RG-HDP-VD framework, we conducted both baseline comparisons and ablation studies. For the baseline comparisons in Scenario A, the framework was evaluated against two representative methods: ECBS and ORCA. Their implementations were adapted from the established open-source codebases libMultiRobotPlanning and Python-RVO2, respectively, to incorporate unified map/task interfaces and dynamic constraint checking. In these comparative evaluations, the team size

N

was varied as depicted in Figure 4, while other parameters remained consistent with the standard Scenario A settings. For fairness under strict arrival windows, ECBS and ORCA are evaluated with an additional unified speed-retiming layer that modulates execution speed within the same velocity envelope and enforces the same continuous-time safety and timing checks, while keeping their original coordination logic unchanged.

Furthermore, to isolate the individual contributions of the proposed mechanisms, customized ablation studies were conducted with a fixed team size of

N = 8

. Specifically, a Regret-Guided (RG) ablation was performed in Scenario B by comparing the RG-HDP variant against Baseline-Geo, a geometric priority strategy lacking heterogeneous energy awareness. Additionally, a Velocity Decomposition (VD) ablation was conducted in Scenario A by comparing the VD-Enabled variant against Baseline-FixedV. To ensure a fair comparison, Baseline-FixedV locked the cruising speed but permitted hovering or loitering in safe airspace to attempt to satisfy the strict arrival time windows.

Unless otherwise specified in the subsequent subsections, each experimental configuration was rigorously evaluated over 15–20 Monte Carlo trials using the paired protocol. Overall performance was assessed and reported across three primary dimensions: physical efficiency (encompassing total energy consumption and the heavy-lift energy reduction ratio), spatiotemporal robustness (characterized by the planning success rate and time synchronization error), and computational efficiency (measured by the average planning time required to generate executable trajectories).

4.2. Baseline Comparison

To assess scalability under stringent spatiotemporal coupling, we compare the proposed RG-HDP-VD framework with two representative baselines—ECBS (a coupled MAPF solver) and ORCA (a reactive collision-avoidance method)—in Scenario A (Saturation Convergence). For each team size

N \in \{4, 8, 12, 16\}

, we conduct 20 paired Monte Carlo trials with identical seeds, start–goal configurations, and environment instances across methods. For fairness, all outputs are evaluated under the same continuous-time safety verification and timing constraints, and performance is reported in terms of mission success rate (SR), synchronization error

Δ T_{s y n c}

, and total energy consumption

E_{t o t a l}

(Table 2).

Figure 4 and Table 2 summarize the results. Figure 4 is divided into three subfigures, each illustrating a key performance metric:

Figure 4a reports the mission success rate (SR) as the team size increases. RG-HDP-VD remains highly robust, achieving 100% success up to

N = 12

and 95% at

N = 16

. In contrast, ECBS and ORCA degrade rapidly with density, and both fail completely at

N = 16

(0% SR), indicating that their coordination mechanisms cannot reliably satisfy simultaneous continuous-time separation and tight arrival-window constraints in saturated terminal airspace.

Figure 4b shows the synchronization error

Δ T_{s y n c}

. Across all scales, RG-HDP-VD yields the smallest timing deviation (0.8–4.1 s). At

N = 16

, its error (≈4.1 s) is roughly 4.4 times lower than that of ECBS (≈18.2 s) and 8.5 times lower than that of ORCA (≈34.8 s). This advantage is consistent with the core design of RG-HDP-VD, which enforces time-window feasibility through velocity decomposition (i.e., mapping rigid temporal requirements into length-feasible regions) rather than relying on terminal waiting/loitering, which becomes infeasible or unsafe under congestion.

Figure 4c compares total system energy

E_{t o t a l}

. RG-HDP-VD consistently consumes less energy than both baselines for every

N

, and the gap widens as

N

grows. The trend suggests that, under increasing interaction complexity, ECBS and ORCA incur higher energy due to stop-and-go behaviors, redundant avoidance maneuvers, and/or prolonged hovering, whereas RG-HDP-VD preserves smoother, rhythm-consistent trajectories that reduce both unnecessary detours and high-cost waiting.

In summary, these results show that the physics-aware integration of trajectory generation and temporal alignment in RG-HDP-VD yields robust scalability and coordination efficiency. The proposed framework maintains high mission success rates, precise timing, and low energy usage even under saturated traffic conditions, where conventional coupled (ECBS) or reactive (ORCA) methods break down.

4.3. Core Mechanism Ablation and Mechanism Analysis

4.3.1. RG Mechanism Ablation: Energy Efficiency Gains and Waiting Suppression Mechanism

This subsection focuses on evaluating the energy efficiency optimization performance of the Regret-Guided (RG) arbitration mechanism in the heterogeneous multi-UAV group delivery task (Scenario B). It addresses the issue where traditional geometric priority strategies often lead to system-level energy efficiency degradation by ignoring the nonlinear coupling between airframe mass and energy consumption, causing heavy-lift UAVs to be passively detained at bottlenecks.

In 15 paired Monte Carlo trials, RG-HDP demonstrated a significant and stable advantage in system total energy consumption (

E_{t o t a l}

). As shown in Figure 5, compared to the geometric baseline strategy (Baseline-Geo), which had a mean of

1.14 \times 10^{8}

J, RG-HDP reduced the system total energy consumption to

1.06 \times 10^{8}

J, achieving a significant reduction of 6.7%. The paired connecting lines in the box plot reveal extremely strong consistency: in all test samples, RG-HDP corresponded to a lower energy consumption level. This universal performance improvement indicates that the gain stems not from accidental advantages in specific terrains but from structural improvements in arbitration and resource allocation logic.

To reveal the physical mechanism behind the energy savings, Figure 6 decomposes the system energy consumption. The data show that the core benefit of RG-HDP originates from the effective suppression of high-cost hovering: the system total Hover Energy decreased significantly from

7.69 \times 10^{7}

J to

6.54 \times 10^{7}

J, a reduction of up to 15.0%. Correspondingly, the Path Energy increased slightly from

3.68 \times 10^{7}

J to

4.07 \times 10^{7}

J. This characteristic clearly validates the decision logic of the RG mechanism: the planner actively guides some low-regret-value platforms to undertake detouring and yielding tasks (causing a slight increase in path energy) in exchange for the relief of bottleneck congestion and a significant reduction in expensive hovering energy. This strategy of “trading space for energy efficiency” ultimately realized a significant net gain in system energy consumption.
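The trade-off can be verified directly from the reported means. The short Python check below uses only the hover and path energies quoted above; the helper `pct_change` is introduced here for illustration.

```python
def pct_change(before, after):
    """Relative change in percent (negative means a reduction)."""
    return 100.0 * (after - before) / before


hover_before, hover_after = 7.69e7, 6.54e7   # J, Baseline-Geo -> RG-HDP
path_before, path_after = 3.68e7, 4.07e7     # J, Baseline-Geo -> RG-HDP

total_before = hover_before + path_before
total_after = hover_before * 0 + hover_after + path_after

print(round(pct_change(hover_before, hover_after), 1))   # hover energy
print(round(pct_change(path_before, path_after), 1))     # path energy
print(round(pct_change(total_before, total_after), 1))   # system total
```

The sums of the two components reproduce the reported totals of about 1.14 × 10^8 J and 1.06 × 10^8 J, and the roughly 10.6% increase in path energy is outweighed by the 15.0% drop in hover energy, giving the 6.7% net reduction.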

Figure 7 and Figure 8 further reveal how the RG mechanism achieves “physical fairness” by reshaping the right-of-way. In terms of the energy consumption distribution (Figure 7), the energy saving is clearly asymmetric: the energy consumption of heavy-lift UAVs decreased by 8.2% (

9.13 \times 10^{7} J \to 8.37 \times 10^{7} J

), while the energy change for light UAVs was negligible (

- 0.5 %

) and statistically insignificant (n.s.). This difference directly corresponds to the reallocation of passage rights (Figure 8): the RG mechanism compressed the average waiting time of heavy-lift UAVs by 13.5% (

675.1 s \to 583.8 s

), while the waiting time for light aircraft remained basically flat (

- 1.0 %

, n.s.). This indicates that the system successfully identified the cost asymmetry in the heterogeneous game: granting priority passage to heavy-lift UAVs with high waiting costs (i.e., high regret values), while light platforms with lower maneuvering and waiting costs undertake the necessary system regulation tasks. Compared to geometric priority, this differentiated scheduling based on real physical costs achieves substantive fairness and improves system-level energy efficiency.

The trial-by-trial comparison (Figure 9) shows that, in all 15 paired Monte Carlo trials, RG-HDP was lower than the baseline in system total energy, hovering energy, and heavy-platform energy. The two curves rise and fall together as terrain difficulty changes, indicating that the paired design effectively controlled for differences in environmental difficulty. Under identical terrain challenges and random perturbations, RG-HDP maintained a stable energy-saving margin, which indicates that its advantage is consistent across trials rather than specific to particular terrains.

In summary, the ablation experiment in Scenario B supports the necessity of the regret-guided arbitration mechanism. Compared with the geometric-priority baseline, RG-HDP achieved a 6.7% reduction in system total energy consumption. This improvement arises from a cost-aware right-of-way mechanism that trades a modest increase in path (detour) effort for a larger reduction in expensive hovering, effectively shifting regulation actions to lower-cost agents while prioritizing heavy platforms at bottlenecks. By cutting total hovering energy by 15.0% and compressing the waiting time of heavy platforms by 13.5% (with an 8.2% reduction in their energy consumption), it effectively alleviated bottleneck congestion. This physics-aware game mechanism provides a solution that balances fairness and efficiency for resource allocation of heterogeneous multi-UAVs in extreme environments.

4.3.2. VD Mechanism Ablation: Spatiotemporal Feasibility and Synchronization Accuracy Gains of Velocity Decomposition

This subsection aims to verify the core contribution of the Velocity Decomposition (VD) mechanism in high-density spatiotemporal conflict scenarios (Scenario A: Saturation Convergence). By comparing the performance of enabling VD (Ours) versus disabling VD (Baseline-FixedV), this experiment focuses on demonstrating the necessity of decoupling rigid time window constraints into elastic velocity envelopes (i.e., feasible path-length intervals) to avoid high-dimensional spatiotemporal deadlocks.

The experimental results show that RG-HDP-VD achieved a 100% planning success rate in all 20 rounds of simulation: all 160 UAV sorties were able to precisely hit the preset time windows while satisfying continuous obstacle avoidance and inter-agent safety separation. In contrast, the mission-level success rate of Baseline-FixedV was 0%, manifesting as systemic failure rather than random error. This failure is exacerbated by the non-convex canyon geometry, which severely restricts feasible detour and terminal loitering space; consequently, the feasible set for late/low-priority UAVs rapidly contracts and can collapse to near-empty under continuous safety constraints.

Further analysis of the “Distribution of Successful Sorties per Round” (Figure 10) reveals the degradation mode of the Baseline: its successful quantity was mainly concentrated in 1–2 aircraft. As priority decreased, subsequent UAVs, unable to adjust speed to maintain the necessary Arrival Gap, caused temporal conflicts to cascade, ultimately triggering mission-level cooperative collapse. This observation indicates that, under stringent spatiotemporal constraints, a fixed-speed planner without a speed-adjustment degree of freedom effectively reduces the problem to repeated single-agent feasibility checking and therefore cannot resolve the high-dimensional, tightly coupled constraints required for multi-UAV coordination.

To reveal the nature of the failure of Baseline-FixedV, Figure 11 displays the Cumulative Distribution Function (CDF) of arrival time errors. The error is defined as

Δ t = t_{a r r} - t^{*}

, where

t_{a r r}

is the actual arrival time output by the planner, and

t^{*}

is the center of the target time window. The results show that the error of RG-HDP-VD converges near 0, indicating its stable time window locking capability; in contrast, the error of the Baseline shows a significant negative offset, mainly distributed in the

[- 30 s, - 10 s]

interval, and a large number of samples crossed the “Early Arrival Failure” boundary (

- Δ T_{e a r l y} = - 10 s

). This indicates that although the Baseline was allowed to insert loitering to consume time, under terminal high-density and continuous safety constraints, feasible loitering space was extremely scarce, causing “safe feasible solutions” to often degenerate into early-arrival trajectories or be directly judged as unsolvable.
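The error definition and the early-arrival boundary can be expressed compactly. In the sketch below, the 10 s early threshold is taken from the text, the strict inequality at the boundary is an assumption, and the sample errors are hypothetical values used only to exercise the functions.

```python
def arrival_error(t_arr, t_star):
    """Signed arrival error: negative means early, positive means late."""
    return t_arr - t_star


def is_early_failure(dt, dt_early=10.0):
    """Early-arrival failure: the error falls below -dt_early (assumed strict)."""
    return dt < -dt_early


def ecdf(samples, x):
    """Empirical CDF: fraction of samples less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)


# Hypothetical arrival errors in seconds (not experimental data).
errors = [-25.0, -18.0, -12.0, -4.0, 0.5]

print(arrival_error(92.0, 100.0))
print(sum(is_early_failure(e) for e in errors))
print(ecdf(errors, -10.0))
```

Reading the CDF at the boundary value gives the share of sorties that fail by arriving early, which is the quantity the comparison in Figure 11 makes visible.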

Under the fixed-speed assumption, the nominal flight time is rigidly locked by the path length (

T_{f l i g h t} \approx L / v_{f i x e d}

). When the queue order requires a low-priority UAV to significantly postpone its arrival, the only time adjustment means for the Baseline is to insert loitering segments near the terminal to “compensate” for the time difference. However, in the saturation convergence scenario, this strategy triggers three types of chain failures:

1. Terminal Airspace Saturation: Free space near the target area is extremely limited, and terrain undulations restrict available loitering radii. Once the Baseline attempts to loiter at a bottleneck, the detained airframe quickly transforms into a “long-duration dynamic obstacle,” causing the continuous collision detection and inter-agent separation constraints in the terminal airspace to fail simultaneously;
2. Occupancy Cascade: The loitering strategy under fixed speed possesses extremely high spatial exclusivity. The loitering behavior of high-priority UAVs generates long-duration dynamic occlusion in the Spatiotemporal Occupancy Map ($H$-Map), compressing the feasible region for low-priority UAVs and forcing them to seek more distant loitering points, leading to a sharp increase in energy consumption and risk costs;
3. Cost Divergence: To avoid channels occupied by loitering aircraft, subsequent UAVs are forced to execute large-scale detours. This detouring induced by “passive loitering” causes surges in path length and Risk Cost, making the underlying sampling planner unable to converge to a solution satisfying cost constraints within limited iterations.

The arrival time scatter plot in Figure 12 provides further intuitive verification of this structural contradiction. RG-HDP-VD absorbs timing discrepancies during transit via speed modulation, thereby achieving consistent adherence to the scheduled arrival order. Conversely, the data divergence of Baseline-FixedV (orange dots) indicates that as the queue order progresses, the exhaustion of “loitering space” renders the planner unable to match the target time via geometric means (waiting/detouring), leaving it only able to output physically feasible “early arrival” paths.

Furthermore, regarding computational efficiency (Figure 13), although RG-HDP-VD introduces an extra speed search dimension, the average planning time of the Baseline (~460 s) was significantly higher than that of RG-HDP-VD (~250 s). The reason lies in the fact that the Baseline needs to repeatedly attempt to find feasible loitering locations/durations at the terminal under tight constraints. However, this feasible set converges approximately to an empty set in congested airspace, causing the underlying VD-TSRRT* algorithm to frequently trigger constraint relaxation and repeated iterations, reaching the maximum iteration count (MaxIter) in an attempt to find non-existent feasible solutions, thereby incurring a substantial computational overhead. In contrast, RG-HDP-VD, by mapping time windows to length feasible regions and incorporating feasibility screening during the spatial search phase, achieves rapid search convergence, thereby reducing total time consumption.

Combining the analysis of success rate, error distribution, and time consumption, the value of Velocity Decomposition (VD) lies in transforming time adjustment from local loitering at the terminal into feasible-region search and distributed rhythm control along the entire flight segment. In this ablation, it raises the mission success rate from 0% to 100% and reduces the average planning time by approximately 45% (from ~460 s to ~250 s).

4.4. Parameter Sensitivity Analysis

In evaluating parameter sensitivity, a lexicographic performance criterion was adopted: first, the mission success rate was maximized, and only among solutions with full success were total energy use and synchronization error subsequently minimized. The key coefficients $λ_{time}$ and $λ_{eng}$ (the time- and energy-weighting factors in the global objective in Equation (8), reused in the time-anchoring objective in Equation (12)) were chosen through structured sensitivity sweeps under our baseline Scenario A (Saturation Convergence). In contrast, the risk weight $γ$ and the numerical stabilizer $ϵ$ were held fixed ($γ = 0.2$, $ϵ = 1 \times 10^{-6}$), since $γ$ merely enforces a soft safety margin and $ϵ$ is a small computational conditioner with negligible effect on coordination feasibility. Finally, each experiment was repeated 10 times under identical initial conditions, and all reported metrics are averages across these trials to improve statistical reliability.
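The lexicographic criterion can be sketched as a tuple-valued sort key. The rows below are copied from Table 3; the tie-break order (energy before synchronization error) and the names `TABLE_3` and `lexicographic_key` are assumptions made for illustration, since the text does not state how the two secondary metrics are ranked against each other.

```python
# Rows from Table 3:
# (lambda_eng, lambda_time, success_rate_%, E_total in 1e8 J, dT_sync in s)
TABLE_3 = [
    (0.2, 0.0, 0, 1.02, 32.51),
    (0.2, 0.2, 20, 1.04, 16.25),
    (0.2, 0.5, 100, 1.07, 5.52),
    (0.2, 1.0, 100, 1.10, 1.12),
    (0.2, 2.0, 100, 1.18, 0.32),
    (0.5, 0.0, 0, 0.98, 33.33),
    (0.5, 0.2, 20, 1.03, 18.98),
    (0.5, 0.5, 80, 1.06, 10.38),
    (0.5, 1.0, 100, 1.09, 5.54),
    (0.5, 2.0, 100, 1.15, 2.37),
]


def lexicographic_key(row):
    """Higher success first, then lower energy, then lower synchronization error."""
    _, _, success, energy, sync_err = row
    return (-success, energy, sync_err)


# Python compares tuples element by element, which is exactly a lexicographic order.
best = min(TABLE_3, key=lexicographic_key)
```

With this assumed tie-break, the selected row is the lowest-energy setting among those with full success; swapping the last two key entries would instead favor the tightest synchronization.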

4.4.1. Experiment A: Synchronization Pressure ($λ_{time}$)

Table 3 shows that the synchronization weight $λ_{time}$ has a clear success threshold. When $λ_{time}$ is below a critical value (for example, $λ_{time} < 0.5$ with $λ_{eng} = 0.2$), the success rate stays at or below 20% because the planner prioritizes path economy over timing, causing many missed arrival windows. Once $λ_{time}$ reaches this threshold (around 0.5 for $λ_{eng} = 0.2$ and around 1.0 for $λ_{eng} = 0.5$), the success rate rises sharply to 100%. Increasing $λ_{time}$ further continues to reduce $Δ T_{sync}$, with diminishing returns, but incurs an energy penalty. In our data, raising $λ_{time}$ to 2 increases the total energy $E_{total}$ by about 5–10% relative to the threshold setting. Thus, in practice $λ_{time}$ should be set just above the value needed for full success, to balance guaranteed feasibility with energy efficiency.

For example, at $λ_{eng} = 0.2$ the success rate jumps from 20% at $λ_{time} = 0.2$ to 100% at $λ_{time} = 0.5$, after which $Δ T_{sync}$ continues to shrink (improving synchronization) at only a modest energy cost. This confirms the threshold and diminishing-return behavior described above.
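The threshold and the energy penalty can be read off Table 3 with a few lines of Python. This is only a worked check of the reported numbers: the data are copied from Table 3, while the helper names and the choice to measure overhead relative to the threshold setting are assumptions made here.

```python
# (lambda_time, success_rate_%, E_total in 1e8 J) per lambda_eng, copied from Table 3.
SWEEPS = {
    0.2: [(0.0, 0, 1.02), (0.2, 20, 1.04), (0.5, 100, 1.07), (1.0, 100, 1.10), (2.0, 100, 1.18)],
    0.5: [(0.0, 0, 0.98), (0.2, 20, 1.03), (0.5, 80, 1.06), (1.0, 100, 1.09), (2.0, 100, 1.15)],
}


def success_threshold(sweep):
    """Smallest tested lambda_time with 100% success, plus its energy."""
    for lam_time, success, energy in sorted(sweep):
        if success == 100:
            return lam_time, energy
    return None


def energy_overhead_pct(sweep):
    """Extra energy at the largest tested lambda_time relative to the threshold setting."""
    threshold = success_threshold(sweep)
    if threshold is None:
        return None
    _, e_threshold = threshold
    _, _, e_max = max(sweep)  # tuples compare by lambda_time first
    return 100.0 * (e_max - e_threshold) / e_threshold


for lam_eng, sweep in SWEEPS.items():
    lam_time, _ = success_threshold(sweep)
    print(f"lambda_eng={lam_eng}: threshold lambda_time={lam_time}, "
          f"overhead at lambda_time=2.0 is {energy_overhead_pct(sweep):.1f}%")
```

Measured this way, the overhead is roughly 10% for $λ_{eng} = 0.2$ and roughly 5.5% for $λ_{eng} = 0.5$, which is consistent with the 5–10% range quoted above.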

4.4.2. Experiment B: Energy-Fairness Anchor ($λ_{eng}$)

Varying the energy-fairness weight $λ_{eng}$ shifts waiting time between heavy-lift and light UAVs (Figure 14). At $λ_{eng} = 0$ the scheduler ignores platform mass, so heavy UAVs suffer long waits: on average $\approx 18.5$ s versus $\approx 8.2$ s for light UAVs ($T_{heavy} / T_{light} \approx 2.3$). As $λ_{eng}$ increases, heavy-UAV waiting decreases and light-UAV waiting increases. By $λ_{eng} \approx 1.0$ the waits invert: heavy $\approx 9.1$ s, light $\approx 11.8$ s (ratio $\approx 0.77$). At $λ_{eng} \approx 1.2$ heavy UAVs wait $\approx 8.2$ s while light UAVs wait $\approx 15.5$ s (ratio $\approx 0.53$). Beyond $λ_{eng} \approx 1.2$ the heavy-UAV waiting time plateaus ($\approx 7.5$ s at $λ_{eng} = 2.0$) while light UAVs absorb the remainder ($\approx 22.1$ s at $λ_{eng} = 2.0$), so the fairness metric $T_{heavy} / T_{light}$ steadily falls. This trend matches the intuition (and prior analysis) that higher $λ_{eng}$ increasingly favors heavy-lift vehicles, forcing lightweight UAVs to take on extra loitering.
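As a quick arithmetic check, the fairness ratios can be recomputed from the mean waiting times quoted in this paragraph. The waiting times are taken from the text; the dictionary layout and function name are illustrative only.

```python
# Mean waiting times (s) reported in the text: lambda_eng -> (heavy, light)
WAITS = {
    0.0: (18.5, 8.2),
    1.0: (9.1, 11.8),
    1.2: (8.2, 15.5),
    2.0: (7.5, 22.1),
}


def fairness_ratio(heavy_wait, light_wait):
    """T_heavy / T_light; values below 1 mean light UAVs wait longer."""
    return heavy_wait / light_wait


ratios = {lam: round(fairness_ratio(h, l), 2) for lam, (h, l) in WAITS.items()}

# The ratio should fall monotonically as lambda_eng grows.
values = list(ratios.values())
is_decreasing = all(a > b for a, b in zip(values, values[1:]))
```

The recomputed ratios agree with the rounded values in the text and decrease at every step, with the crossover through 1 occurring somewhere between $λ_{eng} = 0$ and $λ_{eng} \approx 1.0$.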

Crucially, the total mission energy $E_{total}$ follows a convex profile. From $λ_{eng} = 0$ to 1.2, $E_{total}$ decreases (from $\approx 1.14$ to $\approx 1.06$, normalized) because prioritizing heavy UAVs avoids costly idling for those high-power platforms. Increasing $λ_{eng}$ beyond $\approx 1.2$ then raises $E_{total}$ again (to $\approx 1.15$ at $λ_{eng} = 2.0$) because light UAVs begin taking longer detours. In other words, an intermediate $λ_{eng}$ (~1.2) balances the heavy/light waiting costs and minimizes total energy. This U-shaped energy response (optimal near $λ_{eng} \approx 1.2$) was also observed in our earlier analysis. In summary, $λ_{eng} \approx 1.2$ provides a practical trade-off: it yields the lowest mission energy without over-penalizing light UAVs, although waiting times are not equalized at this setting (ratio $\approx 0.53$).

4.4.3. Discussion of Secondary Parameters

The remaining tuning parameters (risk weight $γ$ and numerical stabilizer $ϵ$) are held constant because they have no significant effect on core outcomes. In all experiments we fixed $γ = 0.2$ and $ϵ = 1 \times 10^{-6}$ (as in our algorithm setup). The risk weight $γ$ simply enforces a soft safety margin on trajectories; adjusting $γ$ shifts how aggressively UAVs avoid risk but does not fundamentally change mission feasibility or coordination logic. Likewise, the constant $ϵ$ is used for numerical conditioning (e.g., to avoid division by zero) and has negligible impact on the high-level scheduling results. By keeping $γ$ and $ϵ$ fixed, we focus our analysis on the primary trade-offs governed by $λ_{time}$ and $λ_{eng}$ without loss of generality.
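The roles of the four parameters can be made concrete with a small sketch. It assumes that the global objective is a weighted sum of normalized energy, synchronization, and risk terms (as the notation in Table A1 suggests) and that $ϵ$ enters as an additive guard in the normalizing denominator. Both the functional form and every name and number other than $γ = 0.2$ and $ϵ = 1 \times 10^{-6}$ are assumptions; Equation (8) itself is not reproduced here.

```python
from dataclasses import dataclass

GAMMA = 0.2      # risk weight, fixed in all experiments
EPSILON = 1e-6   # numerical stabilizer, fixed in all experiments


@dataclass(frozen=True)
class RawCosts:
    energy: float  # J
    sync: float    # s
    risk: float    # dimensionless exposure


def normalize(value, reference, eps=EPSILON):
    """Scale a raw cost by a reference value; eps keeps the division defined at reference = 0."""
    return value / (reference + eps)


def global_objective(costs, refs, lam_eng, lam_time, gamma=GAMMA):
    """Assumed weighted-sum form over normalized energy, synchronization, and risk."""
    return (lam_eng * normalize(costs.energy, refs.energy)
            + lam_time * normalize(costs.sync, refs.sync)
            + gamma * normalize(costs.risk, refs.risk))


# Hypothetical numbers: a risk-free reference would divide by zero without eps.
refs = RawCosts(energy=1.0e8, sync=10.0, risk=0.0)
plan = RawCosts(energy=1.1e8, sync=2.0, risk=0.0)
score = global_objective(plan, refs, lam_eng=1.2, lam_time=1.0)
```

In this form $ϵ$ only matters when a reference cost is close to zero, which is one way to see why it would have little influence on scheduling outcomes.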

5. Real-World Flight Demonstration

To validate the practical feasibility of the proposed RG-HDP-VD framework, a real-world flight experiment was conducted in a 6 m × 6 m indoor arena featuring a constricted bottleneck passage created by cylindrical obstacles. The mission required a heterogeneous team of three Alpha-type quadrotors manufactured by Differential Intelligence Fly (Hangzhou) Technology Co., Ltd., Hangzhou, China (base mass ≤ 1.9 kg, 400 × 370 × 178 mm), to navigate the gap and reach a common target beyond the obstacles. Each vehicle was equipped with an NVIDIA Jetson Orin NX (16 GB) for high-level computation and an STM32H743 flight controller running PX4 for real-time stabilization. To introduce physical heterogeneity, UAV 1 was augmented with a 0.5 kg payload, increasing its mass by approximately 26% and significantly elevating its hover energy penalty, thereby creating the specific asymmetric cost condition that the RG module is designed to address.
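The effect of the payload on hover power can be estimated with a short calculation. It assumes a hover-power law of the form P = α_h · m^1.5, which is what the W/kg^1.5 unit of the hovering power coefficient in Table 1 suggests, and it takes the base mass to be exactly 1.9 kg (the text gives only an upper bound). The coefficient of the test vehicles is not reported, but it cancels in the ratio.

```python
def hover_power(mass_kg, alpha_h):
    """Assumed hover-power law P = alpha_h * m**1.5 (matches the W/kg^1.5 unit in Table 1)."""
    return alpha_h * mass_kg ** 1.5


def hover_power_increase(base_mass_kg, payload_kg):
    """Relative hover-power increase from adding a payload; alpha_h cancels out."""
    return ((base_mass_kg + payload_kg) / base_mass_kg) ** 1.5 - 1.0


mass_gain = 0.5 / 1.9                        # about 0.26, as stated in the text
power_gain = hover_power_increase(1.9, 0.5)  # about 0.42 under the assumed law
```

Under these assumptions a 26% mass increase corresponds to roughly a 42% increase in hover power, which illustrates why waiting is disproportionately expensive for the loaded vehicle.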

The cooperative 4D trajectories were computed offline using the full RG-HDP-VD pipeline and subsequently uploaded to the onboard systems for real-time tracking. During the planning phase, the RG module automatically prioritized UAV 1 because of its higher loitering penalty, assigning it the first right-of-way through the bottleneck. Simultaneously, the VD module provided elastic timing envelopes that allowed the follower drones (UAV 2 and UAV 3) to modulate their velocities en route rather than resorting to inefficient loitering or terminal hovering. This integrated approach serialized the passage order without requiring any vehicle to come to a full stop.
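The prioritization step can be pictured with a deliberately simplified sketch. The paper's regret definition (Section 3.3.2) is not reproduced here; as a stand-in, the code takes the regret of a UAV to be the energy of its cheapest way of yielding, min($C_i^{wait}$, $C_i^{detour}$), and grants right-of-way in descending order of that value. This stand-in formula, the class and function names, and all energy figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YieldCosts:
    name: str
    wait_energy: float    # J spent hovering while another UAV passes (C_wait)
    detour_energy: float  # J spent flying around the conflict instead (C_detour)


def regret(c):
    """Stand-in regret: energy of the cheapest way to yield (an assumption, not the paper's formula)."""
    return min(c.wait_energy, c.detour_energy)


def right_of_way_order(candidates):
    """UAVs that would lose the most by yielding pass first."""
    return [c.name for c in sorted(candidates, key=regret, reverse=True)]


# Hypothetical energies: the payload-carrying UAV 1 pays more to hover or detour.
fleet = [
    YieldCosts("UAV 2", wait_energy=900.0, detour_energy=1100.0),
    YieldCosts("UAV 1", wait_energy=1400.0, detour_energy=1600.0),
    YieldCosts("UAV 3", wait_energy=950.0, detour_energy=1000.0),
]
order = right_of_way_order(fleet)
```

With these made-up numbers the loaded vehicle is served first, mirroring the ordering observed in the flight test; a different regret definition could change the order among the two followers.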

Experimental results demonstrate high consistency between the planned trajectories and real-world execution. As shown in the planned paths (Figure 15) and the time-lapse sequence of the actual flight (Figure 16), UAV 1 navigated the central bottleneck first, while the follower drones adjusted their flight rhythm within the VD-prescribed velocity bounds to queue smoothly behind the leader. No stop-and-go behavior or hovering deadlocks were observed; instead, the fleet maintained safe separation via continuous velocity modulation. All three vehicles cleared the constricted area without collision and converged at the goal area as scheduled. This successful demonstration validates that the RG-HDP-VD framework effectively translates theoretical physics-aware coordination into robust hardware execution, bridging the gap between high-fidelity simulation and practical multi-UAV operations in complex environments.

6. Conclusions

This paper proposed and validated a physics-aware cooperative planning framework (RG-HDP-VD) to address energy imbalance and rigid spatiotemporal constraints in heterogeneous multi-UAV missions. The framework integrates a Regret-Guided (RG) arbitration mechanism, which allocates right-of-way based on physical cost, with a Velocity Decomposition (VD) approach, which creates elastic time windows for tight-deadline tasks. Simulation results show that RG-HDP-VD mitigates energy-inefficient congestion, reducing total system energy consumption by 6.7% (including a 15% reduction in heavy-lift UAV hovering energy), and that velocity decomposition raises the success rate of tight time-window tasks from 0% to 100% while improving computational efficiency by avoiding infeasible searches. These results indicate that the physics-grounded strategy improves the robustness and efficiency of heterogeneous multi-UAV coordination under complex constraints. Future work will extend the framework to distributed communication settings, dynamic environments, real-world closed-loop validation, and system-level energy-efficiency improvement.

Author Contributions

Conceptualization, D.H.; methodology, D.H. and Z.H.; software, Z.H.; validation, X.Z. and L.L.; formal analysis, Z.H. and L.L.; investigation, X.Z. and H.J.; resources, H.J. and L.W.; data curation, Z.H.; writing—original draft preparation, Z.H.; writing—review and editing, D.H.; visualization, L.L. and Z.H.; supervision, L.W. and H.J.; project administration, L.W.; funding acquisition, L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Civil Aviation Flight University of China, grant numbers 25CAFUC03022 and 25CAFUC03085, and by the National Natural Science Foundation of China (General Program), grant number 6247071842.

Data Availability Statement

The partial implementation code of the proposed RG-HDP-VD framework is publicly available at [IEEE Dataport, https://doi.org/10.21227/qsdc-1291].

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

UAV	Unmanned Aerial Vehicle
HMUAS	Heterogeneous Multi-Unmanned Aerial Systems
MUCPP	Multi-UAV Cooperative Path Planning
SAT	Simultaneous Arrival Task
RG-HDP-VD	Regret-Guided Heuristic Decentralized Prioritized Planning with Velocity Decomposition
RG	Regret-Guided (right-of-way arbitration)
HDP	Heuristic Decentralized Prioritized Planning
VD	Velocity Decomposition
A*	A-star Search
RRT*	Rapidly-exploring Random Tree Star
VD-TSRRT*	Velocity-Decomposition Time–Space RRT*
CDF	Cumulative Distribution Function
CVaR	Conditional Value-at-Risk
APF	Artificial Potential Field
ORCA	Optimal Reciprocal Collision Avoidance
CBS	Conflict-Based Search
EECBS	Explicit Estimation Conflict-Based Search
RDP	Ramer–Douglas–Peucker (path simplification)
PCHIP	Piecewise Cubic Hermite Interpolating Polynomial

Appendix A

Table A1 lists the essential symbols used in the problem formulation and the proposed RG-HDP-VD framework. Symbols are grouped by system/environment, trajectory-time-safety constraints, and objective/cost terms.

Table A1. Essential notation used in the formulation and RG-HDP-VD framework.

Symbol	Meaning
$U = \{U_{i}\}_{i=1}^{N}$	Set of UAVs (heterogeneous multi-UAV system); $N$ is the number of UAVs
$s_{i}, G_{i}$	Start state and goal region of UAV $i$
$O_{env}$	Environment constraint set
$O_{terrain}$	Terrain obstacles (hard constraints)
$O_{threat}$	Threat regions (risk exposure)
$σ_{i}$	Trajectory/path of UAV $i$ (recommended: use $π_{i}$ for the geometric path and $σ_{i}(t)$ for the timed trajectory)
$p_{i}(t)$	Position of UAV $i$ at time $t$
$v_{min,i}, v_{max,i}$	Min/max speed bounds of UAV $i$
$T_{com}$	Common mission timeline length
$t_{co}$	Global cooperative time anchor
$δ_{i}$	Arrival time bias/offset for UAV $i$
$t_{target,i} = t_{co} + δ_{i}$	Target arrival time of UAV $i$
$[t_{low,i}, t_{high,i}]$	Assigned arrival time window for UAV $i$
$r_{safe}$	Safety radius (inflation for body size and uncertainty)
$V_{i}(t) = p_{i}(t) \oplus B(0, r_{safe})$	Occupancy set of UAV $i$ at time $t$ (Minkowski sum)
$D_{min}$	Minimum safe separation threshold
$H_{all}$	Global spatiotemporal occupancy/constraint map used in HDP updates
$m_{i}(t)$	Time-varying mass of UAV $i$ (payload drop causes piecewise change)
$t_{drop,i}$	Payload drop time of UAV $i$
$J_{eng}(σ_{i})$	Energy cost along $σ_{i}$ (motion + hover)
$P_{motion}(\cdot), P_{hover}(m_{i}(t))$	Motion power and hover power terms
$J_{global}$	Global weighted objective for multi-UAV planning
$\hat{J}_{eng}, \hat{J}_{sync}, \hat{J}_{risk}$	Normalized energy/synchronization/risk components
$λ_{eng}, λ_{time}, γ$	Weights for the three objective components
$R(p)$	Threat-field risk value at position $p$
$\mathrm{CVaR}$	Tail-risk metric penalizing high-risk exposure segments
$R_{i}$	Regret value for UAV $i$ (RG-based prioritization)
$C_{i}^{wait}, C_{i}^{detour}$	Energy costs of “wait/hover” vs. “detour” actions in RG arbitration
$L(σ_{i})$	Geometric path length used in VD feasibility check
$L_{feas,i}$	Feasible length interval induced by $[t_{low,i}, t_{high,i}]$ and $[v_{min,i}, v_{max,i}]$

Figure 1. The RG-HDP-VD Cooperative Planning Framework.

Figure 2. Schematic of Saturation Convergence Scenario. Different colored lines indicate the trajectories of different UAVs.

Figure 3. Schematic of Group Delivery Scenario. Different colored lines indicate the trajectories of different UAVs.

Figure 4. Performance trends of RG-HDP-VD, ECBS, and ORCA under Scenario A.

Figure 5. Comparison of System Total Energy Consumption Distribution.

Figure 6. Decomposition of System Energy Components. The dark and light orange colors represent the path energy and hover energy for RG-HDP, respectively, while the dark and light gray colors represent those for the Baseline.

Figure 7. Comparison of Heterogeneous Platform Energy Optimization.

Figure 8. Statistical Analysis of Waiting Time.

Figure 9. Trial-by-Trial Robustness Analysis.

Figure 10. Distribution of Successful Sorties per Round.

Figure 11. Cumulative Distribution Function of Flight Time Errors. The green shaded region marks the feasible arrival-error interval, and the vertical boundary indicates the early-arrival failure threshold.

Figure 12. Arrival Time Scatter Plot.

Figure 13. Computation Time per Round. Different colors denote RG-HDP-VD and Baseline-FixedV, respectively.

Figure 14. Sensitivity analysis of the energy-fairness anchor $λ_{eng}$. In the right panel, the pink curve denotes the normalized total mission energy as a function of $λ_{eng}$, showing a minimum near $λ_{eng} \approx 1.2$.

Figure 15. Planned 3D and top-down trajectories for the three-UAV heterogeneous team in the bottleneck indoor scenario.

Figure 16. Time-lapse sequence of the real-world flight experiment.

Table 1. Heterogeneous UAV Physical Parameter Settings.

Parameter	Symbol	Heavy-Lift UAVs	Light UAVs	Unit
Empty/Payload Mass	$m_{empty} / m_{payload}$	20.0/10.0	2.0/0.5	kg
Max Flight Speed	$v_{max}$	12	15	m/s
Hovering Power Coeff.	$α_{h}$	280	80	$W/kg^{1.5}$
Safety Collision Radius	$r_{safe}$	5.0	3.0	m
Elastic Velocity Envelope	$V_{i}$	7–12	7–15	m/s

Table 2. Comparative results under Scenario A.

$N$	Method	Success Rate (%)	$Δ T_{sync}$ (s)	$E_{total}$ ($\times 10^{8}$ J)
4	OURS	100	0.8	0.56
	ECBS	90	4.2	0.63
	ORCA	55	11.5	0.69
8	OURS	100	1.12	1.1
	ECBS	55	8.3	1.32
	ORCA	15	21.5	1.46
12	OURS	100	2.8	1.7
	ECBS	30	13.6	2.17
	ORCA	5	27.4	2.36
16	OURS	95	4.1	2.31
	ECBS	0	18.2	3.03
	ORCA	0	34.8	3.21

Table 3. $λ_{time}$ threshold and trade-off under two $λ_{eng}$ values.

$λ_{eng}$	$λ_{time}$	Success Rate	$E_{total}$ ($\times 10^{8}$ J)	$Δ T_{sync}$ (s)
0.2	0	0%	1.02	32.51
	0.2	20%	1.04	16.25
	0.5	100%	1.07	5.52
	1	100%	1.10	1.12
	2	100%	1.18	0.32
0.5	0	0%	0.98	33.33
	0.2	20%	1.03	18.98
	0.5	80%	1.06	10.38
	1	100%	1.09	5.54
	2	100%	1.15	2.37

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Han, D.; Hua, Z.; Zhu, X.; Luo, L.; Jiang, H.; Wang, L. RG-HDP-VD: A Physics-Aware Cooperative Trajectory Planning Framework for Heterogeneous Multi-UAVs. Drones 2026, 10, 192. https://doi.org/10.3390/drones10030192
