A Rapid Trajectory Planning Method for Heterogeneous Swarms via Fusion of Visual Navigation and Explainable Decision Trees

Gao, Yang; Yin, Hao; Wang, Wenliang; Guo, Bing; Wang, Yue; Li, Guopeng; Tian, Lingyun; Li, Dongguang

doi:10.3390/drones10040287


by Yang Gao ¹, Hao Yin ², Wenliang Wang ³, Bing Guo ³, Yue Wang ¹,⁴,*, Guopeng Li ⁴, Lingyun Tian ¹ and Dongguang Li ¹

¹ School of Mechatronical Engineering, Beijing Institute of Technology, No. 5 Zhongguancun South Street, Haidian District, Beijing 100081, China

² China Academy of Aerospace System and Innovation, No. 2 Xinjiekou Wai Street, Xicheng District, Beijing 100081, China

³ CSSC Zhejiang Ocean Technology Co., Ltd., Zhoushan 316100, China

⁴ Yangtze River Delta Research Institute Jiaxing, Beijing Institute of Technology, No. 1940 Dongfang North Road, Youchegang Town, Xiuzhou District, Jiaxing 314000, China

* Author to whom correspondence should be addressed.

Drones 2026, 10(4), 287; https://doi.org/10.3390/drones10040287

Submission received: 16 February 2026 / Revised: 10 April 2026 / Accepted: 10 April 2026 / Published: 14 April 2026


Highlights

What are the main findings?

A rapid Cooperative Cross-domain Path Planning (CCPP) framework is proposed for heterogeneous UAV–USV swarms, integrating a vision-to-planning interface, hybrid path–speed co-optimization, and an adaptive interpretable decision tree for online replanning.
Simulation results across simple, dynamic, and highly dynamic maritime scenarios show that CCPP improves synchronized arrival, planning efficiency, and inter-agent safety while maintaining high coverage and target-verification performance.

What are the implications of the main findings?

The proposed method provides an interpretable and real-time feasible solution for cooperative trajectory planning of heterogeneous unmanned swarms in GNSS-challenged and dynamically uncertain maritime environments.
By explicitly modeling uncertainty, dynamic risk, and temporal coordination, this work offers a practical technical basis for future deployment of UAV–USV collaborative search and verification missions.

Abstract

Heterogeneous unmanned swarms of UAVs and USVs are highly promising for complex tasks such as search and recovery in uncharted maritime areas, yet effective cross-domain cooperative trajectory planning remains a key challenge, and its shortcomings often lead to mission delays. This paper proposes a rapid Cooperative Cross-domain Path Planning (CCPP) framework and its associated algorithm for heterogeneous UAV–USV swarms. The framework first establishes a visual-fusion modeling pipeline that converts visual pose estimates, their uncertainties, and semantic dynamic obstacles into a planning representation with robust safety margins and time-varying risk fields. A hybrid velocity-path co-optimization algorithm then simultaneously generates curvature-feasible trajectories and speed profiles under heterogeneous kinematics and explicit temporal constraints. Finally, an adaptive interpretable decision tree acts as a meta-strategy for online replanning and real-time adjustment of modes and weights. To address uneven arrival-time distributions, and inspired by economic inequality analysis, this paper introduces an arrival-time consistency index based on a normalized Gini coefficient to quantify and optimize coordination timing. Comprehensive experiments validate the effectiveness of the proposed approach in enhancing cooperative efficiency and real-time adaptability.

Keywords:

UAV–USV cooperation; synchronized arrival; online replanning; vision-aided navigation; uncertainty-aware planning; explainable decision-making

1. Introduction

In modern industrial and civilian contexts, unmanned systems play an increasingly important role, particularly as the primary executors of highly repetitive, high-risk missions. As mission demands grow, these systems are expected to undertake more complex tasks in more challenging environments. However, mainstream solutions that simply stack single-type, homogeneous unmanned systems have proven inadequate for such tasks [1,2,3]. A typical example is the search for and recovery of underwater targets in complex marine environments, which exposes the limitations of existing approaches. To address these challenges, researchers have proposed collaborative execution by heterogeneous clusters [4,5,6]. Consider a maritime collaborative search system composed of Unmanned Aerial Vehicles (UAVs) and Unmanned Surface Vessels (USVs). The functional complementarity of such a heterogeneous cluster allows it to adapt to complex scenarios and task requirements. However, the distinct characteristics of its platforms make it difficult to achieve rapid online collaborative trajectory planning and temporally consistent coordination during staged joint search missions involving coverage, cue generation, and cross-domain confirmation.

From a planning perspective, enabling online fast cooperative trajectory planning for heterogeneous swarms poses multifaceted challenges, particularly in ensuring synchronized arrival at a designated mission execution area. First, pronounced differences in platform kinematics, such as speed, turning rate, and acceleration limits, lead to highly diverse feasible trajectory sets. Second, the maritime environment is inherently dynamic and uncertain, involving time-varying disturbances such as moving obstacles, ocean currents, and wind. In addition, communication links may suffer delays or outages due to platform heterogeneity and the complexity of the operational environment. When the performance of the Global Navigation Satellite System (GNSS) [7] degrades or fails and no alternative positioning method is available, cooperative missions can be severely compromised. Motivated by the successful deployment of vision-based navigation in a wide range of scenarios [8], this study incorporates it as a key information source for state estimation, aiming to enhance the robustness of heterogeneous trajectory planning in challenging environments.

From the perspective of coordination, achieving robust path coordination remains a persistent core challenge. The Equal-Path-Length Method [9] and the Pure-Velocity Coordination Method [10] are currently the two most mainstream categories of path coordination methods. The former adjusts path lengths to make the travel distances of each agent similar, performing well in simple scenarios. However, when needing to simultaneously satisfy curvature feasibility and obstacle avoidance constraints, it often requires substantial computational iteration, and it may even have no solution under stringent constraints. The latter fixes the path and adjusts speed to achieve synchronized arrival. While computationally more efficient, it is prone to system saturation due to speed range limitations, and synchronization errors may amplify under external disturbances or tracking limits. These limitations have motivated growing interest in integrated path–speed coordination methods that combine perception awareness, robustness, collaboration, and online replanning capability. To this end, this paper proposes a Cooperative Cross-domain Path Planning (CCPP) framework for heterogeneous UAV–USV clusters. This framework systematically integrates the following three parts: (i) a structured vision-to-planning interface capable of mapping pose uncertainty and semantic dynamic obstacles into robust constraints and time-varying risk fields; (ii) a hybrid speed–path collaborative optimization mechanism that jointly updates curvature-feasible paths and along-path speed profiles under explicit temporal consistency constraints; (iii) an Adaptive and Interpretable Decision Tree (AIDT) [11], serving as an interpretable meta-strategy module that triggers online replanning and mode switching based on structured situational features. 
To address the temporal synchronization problem among heterogeneous groups at a deeper level, this study borrows the Gini coefficient, an indicator used in economics for inequality analysis, as a normalized measure of arrival-time disparity. Compared with traditional synchronization-error metrics, such as the range or variance of arrival times, this coefficient provides a more comprehensive characterization of the distribution of arrival times. We apply it both as a regularization term in the optimization objective and as one of the replanning trigger conditions, so that the overall framework forms a more interpretable closed-loop synchronization control system. The main contributions of this paper can be summarized as follows:

1. A modular trajectory collaborative optimization framework for heterogeneous UAV–USV cooperation is proposed. For the first time, it organically integrates curvature-feasible path generation with along-trajectory speed planning under explicit spatiotemporal constraints, addressing the issue of insufficient coupling between spatial path and temporal coordination in existing methods.
2. A risk-aware vision-planning fusion interface with uncertainty quantification is designed. It can explicitly model visual pose uncertainty and semantic dynamic obstacles and convert them online into inflated constraints and continuous risk fields for the planner, supporting proactive risk-sensitive planning in dynamic and uncertain environments.
3. An AIDT-based interpretable and self-adaptive closed-loop planning mechanism is developed for integration within the CCPP framework. Rather than proposing the AIDT itself as a new standalone algorithm, this work redesigns its planning-oriented feature input, meta-action space, and closed-loop coupling with consistency evaluation, objective regularization, and replanning triggering. In this way, transparent logic-driven online replanning and parameter adaptation are achieved for heterogeneous UAV–USV coordinated trajectory planning.

2. Related Work

2.1. Temporal-Constraint Cooperative Arrival Control for Heterogeneous Unmanned Systems

Temporal constraints in the collaborative process of multi-agent groups, as illustrated by deadlines, time windows, and rendezvous constraints, provide a direct formal tool for task-level coordination. Research on Multi-Agent Path Finding (MAPF) on discrete graphs has systematically discussed the feasibility and optimal solutions for problems involving deadlines [12]. For instance, the MAPF-DL model [13] takes the objective of maximizing the number of agents reaching their goals within given deadlines, proving that its optimal solution entails high complexity while also presenting scalable algorithmic ideas based on flow and combinatorial search. On the other hand, methods like Conflict-Based Search (CBS) [14] have established an effective framework for solving optimal MAPF through a strategy of high-level conflict resolution combined with low-level single-agent planning, providing a general skeleton for introducing additional constraints such as time windows and conflicts. However, when transferring the aforementioned time-coordination ideas to heterogeneous clusters operating in the continuous space of the maritime environment, two key challenges emerge. First, disturbances from wind, waves, and currents, combined with communication limitations and safety-domain constraints, make it difficult for a strategy of prior path planning followed by post hoc speed adjustment to guarantee robust synchronization. Second, heterogeneous platforms often possess different kinematic and dynamic constraints, necessitating that the exchange of coordination variables be both compact and capable of online updates. Existing work on path planning for maritime unmanned platforms typically emphasizes global-local hybrid planning under dynamic constraints, for example, a hybrid planning framework incorporating dynamic constraints designed for autonomous surface vessels [15]. 
Nevertheless, a significant gap remains in the unified modeling and online solving of the joint optimization of geometric paths and speed profiles under explicit temporal consistency constraints.

Existing methods thus tend to focus either on time-constrained MAPF on discrete graphs or on geometric feasibility and local obstacle avoidance in continuous maritime space. For mixed heterogeneous unmanned swarms, a planning framework must therefore integrate curvature-feasible path generation with along-path speed regulation under explicit time-consistency constraints while remaining capable of online updates, so that it can both plan trajectories and improve overall mission execution performance.

2.2. Semantic Uncertainty Representation in Dynamic Scene Visual Navigation and Its Planning Interface

In maritime missions where GNSS is limited or obstructed, visual navigation and SLAM (Simultaneous Localization and Mapping) [16] serve as key alternative solutions. However, dynamic sea surface reflections, surges, vessel motions, and clutter violate the “scene rigidity” assumption, leading to UAV pose estimation drift. To address dynamic scenes, works such as DynaSLAM [17] have integrated dynamic object detection and background inpainting based on ORB-SLAM2, highlighting the importance of semantic dynamic information for stable localization and mapping [18]. Yet, from the perspective of usability for planning, merely enhancing the robustness of SLAM in dynamic scenes is insufficient. What the planning layer truly requires is a set of computable and transferable risk representations. This includes how pose uncertainty can be transformed into safety margins, and how semantic dynamic obstacles can be converted into time-varying constraints or risk fields. In other words, the research focus is gradually shifting from “how SLAM internally suppresses dynamics” to “how perceptual results are exposed to the planner in a structured manner,” enabling planning to perform constraint inflation, risk stratification, and safety-feasibility guarantees.

In summary, although dynamic-scene SLAM and semantic perception have matured considerably, a unified, interface-like expression for “propagating uncertainty and semantic dynamic obstacles into planning constraint inflation or time-varying risk fields” remains lacking. Particularly in maritime cross-domain collaboration, there is an absence of a vision-planning bridging module that can directly support robust collaborative planning.

2.3. Interpretable Online Decision-Making Mechanism and Gini-Based Metric for Arrival Time Consistency

For complex tasks in highly complex maritime scenarios, the strategies employed must be not only capable but also controllable and understandable. Rudin [19] argues that, for high-risk decision-making scenarios, it is preferable to use inherently interpretable models directly rather than to provide post hoc explanations for a “black box.” Consequently, rule-based models such as decision trees are often used to construct interpretable policies. However, traditional decision trees lack adaptability unless they are equipped with mechanisms for self-learning and continuous evolution. Wang, Tian, and colleagues [11] proposed the AIDT, which generates decision trees that are both adaptive and interpretable in an evolutionary manner. Its usability and readability have been validated in maritime collaborative mission scenarios, providing direct support for “using an interpretable meta-strategy for online adjustment and re-planning.”

On the other hand, to give “online adjustment” clear triggering conditions, a metric is required that reflects the collaboration quality of a mixed cluster and is numerically stable. The Gini coefficient [20], a fairness metric, inherently characterizes the “degree of distributional inequality.” With its clear statistical definition and favorable normalization properties, it has been widely used as an indicator of imbalance and concentration. Compared with other economic measures of social equity or average welfare, its advantage lies in its inherently bounded range of values. Furthermore, this indicator is typically used to measure an aggregate state within a macroeconomic entity and does not rely on stringent economic assumptions. Because of these characteristics, we adapt it to the “arrival time distribution.” This turns “synchronized arrival” from a loosely estimated quantity without a precise calculation rule into a computable consistency index, which can then be used for evaluation, regularization, and the triggering of re-planning, making re-planning more precise.

In summary, existing work on online collaborative adjustment either relies on non-interpretable deep reinforcement learning policies or lacks a closed-loop structure combining “readable rules plus explicit triggering metrics.” Therefore, this paper introduces an interpretable meta-strategy combined with a Gini-based consistency metric to form an auditable online adjustment closed loop.

3. Formulation

3.1. System Description and Notation

We consider a heterogeneous multi-agent cooperative framework composed of N unmanned platforms (UAVs and USVs), denoted as the mixed agent set

I = {1, 2, \dots, N}

. Each agent is denoted by

i \in I

and moves in continuous space. Its state at time t is denoted as

x_{i} (t)

, and its control input is denoted as

u_{i} (t)

.

The mission considered in this paper is a staged maritime search task rather than a simple point-to-point gathering task. Specifically, the heterogeneous team is required to cooperatively cover the search region, generate suspicious cues by UAV sensing during coverage, and dispatch at least one USV to the cue region for confirmation within a prescribed time window. Therefore, the task objective is not merely to drive all agents to a single fixed terminal region but to coordinate their motion so that coverage quality, cue–confirmation feasibility, safety, and temporal consistency can be simultaneously satisfied.

To keep the formulation compact, at each planning cycle, each agent is assigned a stage-dependent task region. Accordingly, the arrival time

T_{i}

denotes the predicted time for agent i to reach its currently assigned task region under the current coordination mode. Temporal consistency is therefore enforced for coordinated arrivals to these stage-dependent task regions, especially during synchronized confirmation, rendezvous, or revisitation events.

Each agent starts from its initial region

S_{i}

at the beginning of the mission, and under online replanning, its current state serves as the updated initial condition for the next planning cycle. To conveniently decouple the description of geometric path generation and speed regulation, we express the trajectory of agent i in a path-time parameterized form. Its geometric path is represented by the arc-length parameter

s \in [0, L_{i}]

as

p_{i} (s)

, and its speed profile along the path is denoted as

v_{i} (s) \geq 0

. The corresponding arrival time is obtained by

T_{i} = \int_{0}^{L_{i}} \frac{1}{v_{i} (s)} d s

(1)

where

L_{i}

is the path length. This representation allows the arrival time to be regulated through the speed profile while preserving curvature-feasible geometric planning, thereby enabling temporally consistent coordination under heterogeneous kinematic constraints. To make the notion of curvature feasibility explicit, for the geometric path of agent i we use a planar Serret–Frenet description on the arc-length-parameterized curve. Specifically, letting

p_{i} (s) = {[x_{i} (s), y_{i} (s)]}^{⊤}

, the unit tangent vector and unit normal vector are defined as follows:

t_{i} (s) = \frac{d p_{i} (s)}{d s}, ∥ t_{i} (s) ∥ = 1, n_{i} (s) = \frac{d t_{i} (s) / d s}{∥ d t_{i} (s) / d s ∥} .

(2)

Accordingly, the path curvature is defined through the Serret–Frenet relation

\frac{d t_{i} (s)}{d s} = κ_{i} (s) n_{i} (s), κ_{i} (s) = ∥\frac{d^{2} p_{i} (s)}{d s^{2}}∥ = \frac{d ψ_{i} (s)}{d s},

(3)

where

ψ_{i} (s)

denotes the path heading angle. Since the trajectory is executed with along-path speed

v_{i} (s)

, the corresponding heading/yaw-rate control relation is given by

{\dot{ψ}}_{i} (t) = v_{i} (s) κ_{i} (s) .

(4)

Therefore, curvature feasibility can be translated into executable control constraints through the turning-rate limit. In implementation, we use a conservative curvature bound together with platform-specific heading/yaw-rate constraints, while UAV altitude-layer constraints and USV navigable-water constraints are enforced separately.
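
As an illustration of Equations (1), (3), and (4), the following minimal sketch evaluates the arrival time and checks the yaw-rate limit on a sampled path. It assumes the path is given as samples of arc length, along-path speed, and unwrapped heading; the function names, the trapezoidal rule, and the finite-difference curvature estimate are illustrative choices, not the implementation used in the paper.

```python
import math


def arrival_time(s, v):
    """Trapezoidal approximation of T = integral of ds / v(s), Equation (1)."""
    if any(vk <= 0.0 for vk in v):
        raise ValueError("speed must be strictly positive for a finite arrival time")
    total = 0.0
    for k in range(len(s) - 1):
        ds = s[k + 1] - s[k]
        total += 0.5 * ds * (1.0 / v[k] + 1.0 / v[k + 1])
    return total


def curvature_from_heading(s, psi):
    """Per-segment estimate of kappa = d(psi)/ds, Equation (3).

    psi is assumed to be unwrapped (no jumps of 2*pi between samples).
    """
    return [(psi[k + 1] - psi[k]) / (s[k + 1] - s[k]) for k in range(len(s) - 1)]


def yaw_rate_feasible(s, v, psi, yaw_rate_max):
    """Check |v * kappa| <= yaw_rate_max on every segment, Equation (4)."""
    kappa = curvature_from_heading(s, psi)
    for k, kap in enumerate(kappa):
        v_seg = max(v[k], v[k + 1])  # conservative: use the faster end of the segment
        if abs(v_seg * kap) > yaw_rate_max:
            return False
    return True


# Example: quarter circle of radius 50 m flown at a constant 10 m/s.
R, n = 50.0, 100
L = R * math.pi / 2.0
s = [L * k / n for k in range(n + 1)]
psi = [sk / R for sk in s]          # heading grows linearly on a circular arc
v = [10.0] * (n + 1)

T = arrival_time(s, v)              # equals L / 10 for constant speed
ok_fast_turner = yaw_rate_feasible(s, v, psi, yaw_rate_max=0.25)
ok_slow_turner = yaw_rate_feasible(s, v, psi, yaw_rate_max=0.15)
```

On this arc the required yaw rate is v/R = 0.2 rad/s, so a platform limited to 0.15 rad/s would have to slow down or follow a wider path, which is the coupling between speed profile and curvature that the formulation exploits.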

3.2. Environment, Dynamic Obstacles, and Risk Representation

Within the context of a cooperative maritime search scenario, the operational environment can generally be characterized by the following categories: a set of static environmental obstacles, usually denoted as

O^{sta}

, and a set of dynamic environmental obstacles, denoted as

O^{dyn} (t)

. Comprehensive obstacle information can be derived from these two sets and is detected and tracked online by a semantic perception module. The state of each obstacle (position, velocity, and category) in the coordinate frame is denoted as

O_{k} (t)

. Based on the above definitions, we structure this information for the planning layer and introduce a time-varying risk field as

R (q, t) \in [0, 1]

. Here, q denotes a spatial position, and R comprehensively reflects the potential collision risk from semantic dynamic obstacles and the traversability of the environment.

Concurrently, visual navigation in dynamic maritime areas introduces pose estimation uncertainty. For agent i, the pose estimation covariance is denoted by

Σ_{i} (t) \in R^{2 \times 2}

, which represents the planar position uncertainty of the current estimated pose in the navigation frame. Accordingly, the planning layer performs uncertainty-aware inflation of the safety margin. Instead of using an implicit covariance mapping, we explicitly define the inflation amount through the following two-step formulation:

u_{Σ, i} (t) = \sqrt{λ_{max} (Σ_{i} (t))}

(5)

Δ_{i} (t) = α_{Σ} u_{Σ, i} (t)

(6)

Here,

λ_{max} (Σ_{i} (t))

denotes the maximum eigenvalue of the covariance matrix, and

u_{Σ, i} (t)

therefore provides a scalar measure of the dominant positional uncertainty radius. The coefficient

α_{Σ} > 0

is a tunable inflation gain that converts uncertainty magnitude into a conservative geometric safety margin. Combining Equations (5) and (6), the covariance-to-margin mapping can be written compactly as

f (Σ_{i} (t)) = α_{Σ} \sqrt{λ_{max} (Σ_{i} (t))}

. This mapping is monotonic, so larger pose uncertainty directly leads to a larger inflation amount, thereby enhancing robustness against estimation errors and environmental disturbances. Based on the risk field and constraint inflation, the safe feasible region for agent i is specifically described as a time-varying set

F_{i} (t)

, meaning the agent’s shape, inflated by

Δ_{i} (t)

, remains collision-free with the obstacle sets. Subsequent planning will generate feasible paths and speed profiles within

F_{i} (t)

.
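
Equations (5) and (6) can be evaluated in closed form for a 2 × 2 covariance. The sketch below is a minimal illustration under that assumption; the function names are hypothetical, and the off-diagonal entries are averaged only to guard against small asymmetries in a numerically estimated covariance.

```python
import math


def lambda_max_2x2(sigma):
    """Largest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (c, d) = sigma
    off = 0.5 * (b + c)                 # symmetrize the off-diagonal term
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), off)
    return mean + radius


def inflation_margin(sigma, alpha_sigma):
    """Delta_i = alpha_Sigma * sqrt(lambda_max(Sigma_i)), Equations (5) and (6)."""
    u_sigma = math.sqrt(max(lambda_max_2x2(sigma), 0.0))
    return alpha_sigma * u_sigma


# Example: standard deviations of 2 m and 1 m along the principal axes.
sigma_i = [[4.0, 0.0], [0.0, 1.0]]
delta_i = inflation_margin(sigma_i, alpha_sigma=1.5)   # 1.5 * sqrt(4) = 3.0 m
```

Because only the dominant eigenvalue is used, the inflation is isotropic and conservative: the margin is sized for the worst-case direction of the position error.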

3.3. Synchronous Arrival and Temporal Consistency Objective

To further characterize the degree of “simultaneous arrival,” we define the cluster arrival time vector

T = {[T_{1}, \dots, T_{N}]}^{⊤}

. Perfect synchronization corresponds to all

T_{i}

being equal. In practical missions, a certain degree of synchronization error is permissible. Therefore, this paper takes the consistency of arrival times as the core coordination metric, which, together with path quality and risk cost, constitutes the optimization objective. The mission objective can be described as: under the premise of satisfying feasibility constraints, minimize the arrival time deviation, or the consistency metric, as much as possible while simultaneously reducing the path cost and risk cost. Formulated in a unified manner, the total cost can be described in the following weighted form:

J_{total} = \sum_{i \in I} (J_{i}^{path} + J_{i}^{risk}) + λ J^{sync} (T)

(7)

In the equation,

J_{i}^{path}

represents the geometric path and curvature-related cost,

J_{i}^{risk}

represents the accumulated risk cost from traversing the risk field along the path,

J^{sync} (T)

is the synchronous arrival consistency cost, and

λ

is the weight parameter. In summary, we have presented the structured overall framework based on the integration of “path–speed coordination optimization combined with a risk-aware interface and online adjustment,” thereby completing the overall model construction.
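
The weighted structure of Equation (7) can be sketched as follows. The exact form of the synchronization cost is not given in this section, so the example assumes the standard Gini coefficient (mean absolute difference divided by twice the mean) as a stand-in for the normalized arrival-time consistency index; the paper's normalization may differ, and the function names and numbers are illustrative only.

```python
def gini(values):
    """Standard Gini coefficient of non-negative values (0 means all equal)."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    if mean == 0.0:
        return 0.0
    abs_diff = sum(abs(a - b) for a in values for b in values)
    return abs_diff / (2.0 * n * n * mean)


def total_cost(path_costs, risk_costs, arrival_times, lam):
    """J_total = sum_i (J_path_i + J_risk_i) + lambda * J_sync(T), Equation (7).

    J_sync is ASSUMED here to be the Gini coefficient of the arrival times.
    """
    return sum(path_costs) + sum(risk_costs) + lam * gini(arrival_times)


# Two agents: one arrives after 60 s, the other after 120 s.
g_unsynced = gini([60.0, 120.0])            # 120 / 720 = 1/6
g_synced = gini([100.0, 100.0, 100.0])      # identical arrivals give 0
j = total_cost([10.0, 12.0], [1.0, 2.0], [60.0, 120.0], lam=6.0)
```

With these numbers the synchronization term contributes 6 × 1/6 = 1 to a total of 26, which shows how the weight λ trades arrival-time consistency against path and risk costs.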

3.4. Analysis of Model Constraints

To ensure the constructed model operates correctly in the joint maritime anti-submarine search mission with heterogeneous multi-agent systems, targeted constraint design is necessary. These constraint conditions are expressed using set notation

C = {C_{0}, \dots, C_{7}}

, as summarized in Table 1 and Table 2.

3.5. Overall Analysis

Based on the aforementioned objective function and multiple constraint conditions, the cooperative path planning problem studied in this paper can be formally formulated as follows: For each agent i within a heterogeneous cluster, under the premise of satisfying the constraint set

C = {C_{0}, \dots, C_{7}}

, solve for its geometric path

p_{i} (s)

and speed profile

v_{i} (s)

to minimize the total cost

J_{total}

. This optimization problem is addressed within a dynamic environment using a receding horizon framework for online solution. Within this framework, the planning layer can periodically receive external perception updates and, based on these, update coordination variables such as predicted arrival times and task mode flags in real time, ultimately achieving the objectives of real-time replanning and precise synchronized arrival.

4. Method

4.1. CCPP Framework Overview and Architecture

Building on the formulation above, we propose a Cooperative Cross-domain Path Planning (CCPP) framework for joint anti-submarine search missions conducted by UAVs and USVs in complex maritime environments. With real-time cooperative path planning and online dynamic adjustment at its core, the framework takes a given executable task, fuses state estimation information from visual navigation, and rapidly generates executable cooperative trajectories that satisfy the objective function and the constraints. It achieves closed-loop online replanning and consistency optimization through an interpretable meta-strategy, thereby maintaining the real-time optimality of the cooperative trajectories. To enhance system realizability and robustness, CCPP adopts a “three-layer, dual-closed-loop” architecture. The three layers are a perception interface layer, a receding-horizon cooperative planning layer, and a low-level tracking control layer; the two closed loops are a Gini coefficient-based consistency metric loop and an AIDT-triggered meta-strategy loop (supporting weight switching and replanning decisions). This architecture balances perception fusion, cooperative optimization, and execution control, systematically supporting reliable cooperation of heterogeneous unmanned platforms in complex, dynamic maritime environments.

4.2. Framework Diagram

As shown in Figure 1, the CCPP framework adopts a three-layer hierarchical architecture—comprising the Perception Interface Layer, the Cooperative Planning Layer, and the Tracking Control Layer—to facilitate rapid cooperative trajectory generation for heterogeneous UAV–USV clusters.

The framework defines three time scales: the sensing/visual frequency

f_{sens}

, the planning frequency

f_{plan}

, and the control frequency

f_{ctrl}

, which satisfy the relationship

f_{ctrl} ≫ f_{plan} ≫ f_{sens}

. At each planning instant

t_{k}

, the algorithm operates within a receding horizon window

[t_{k}, t_{k} + H]

to compute the geometric path and speed profile

{p_{i} (\cdot), v_{i} (\cdot)}

for each agent

i \in I

, which are then executed by the underlying tracking controller. The arrival time is derived from the aforementioned path–speed parameterization model (Equation (1)). The receding-horizon optimization employs a warm-start strategy, initializing the current cycle with the solution from the previous cycle

{p^{k - 1}, v^{k - 1}}

. Furthermore, the decision to trigger re-planning is dynamically determined by three types of triggers: a safety and feasibility trigger (related to constraints

C_{2}

and

C_{3}

), a communication and task-closure trigger (related to constraints

C_{4}

and

C_{6}

), and a consistency trigger (related to constraint

C_{7}

and the synchronization cost). These trigger categories are aligned with the mode-switching logic defined later in the AIDT meta-strategy module.

4.3. Vision-to-Planning Interface Module Design

This module serves as the perception-to-planning bridge within the CCPP framework. Its function is to transform the multi-source, uncertain information output by visual navigation and semantic perception into structured objects that can be directly utilized by the planning layer. As shown in Figure 2, the interface receives raw inputs such as pose estimates, covariance, and dynamic/static obstacles. Through a series of standardized computational processes, it ultimately outputs inflated constraints, a time-varying risk field, safe corridors, and a feature vector that drives the meta-strategy for trajectory optimization. In this way, perceptual uncertainty is explicitly incorporated into the planning process as hard constraints and soft costs.

More specifically, the input to this interface is not the raw pixel-level visual stream itself, but the structured output of the upstream visual navigation and semantic perception modules, including the current pose estimate

{\hat{x}}_{i} (t)

, the associated covariance

Σ_{i} (t)

, and the detected dynamic and static obstacle states. These quantities are first transformed into planning-oriented geometric objects through uncertainty inflation and obstacle structuring, and are then further converted into safe corridors, time-varying risk fields, and low-dimensional situational features that can be directly consumed by the cooperative planner and the AIDT meta-strategy.

From a geometric point of view, the safe corridor is constructed from the inflated obstacle representation together with the safe feasible region

F_{i} (t)

, so that it defines a collision-free admissible channel for subsequent path optimization. The situational feature extraction module then summarizes the current coordination state into a compact meta-strategy feature vector, whose components are drawn from physically interpretable quantities such as predicted minimum clearance, uncertainty level, communication condition, coverage estimate, and verification estimate.

First, the interface performs uncertainty-driven inflation of the safety constraints. For agent i, its pose estimation covariance

Σ_{i} (t)

is first converted into a scalar uncertainty indicator

u_{Σ, i} (t) = \sqrt{λ_{max} (Σ_{i} (t))}

and is then mapped to a real-time safety-margin inflation amount through

Δ_{i} (t) = α_{Σ} u_{Σ, i} (t)

. This inflation amount is directly used to geometrically inflate the static obstacle set

O^{sta}

and the dynamic obstacle set

O^{dyn} (t)

, generating the inflated obstacle sets

{\tilde{O}}^{sta}

and

{\tilde{O}}^{dyn} (t)

. Consequently, the agent’s current safe feasible region

F_{i} (t)

is obtained, ensuring the satisfaction of safety constraint

C_{3}

. Next, the interface constructs the time-varying risk field and the safe corridor for risk-sensitive planning. The risk field

R (q, t)

comprehensively quantifies collision risk and environmental impassability at any spatial point q. It is formed by the weighted fusion of components such as the dynamic obstacle distance field, shoreline buffer zone, shoal/depth risk field, and clue hot-zone field. Based on this risk field and the safe feasible region

F_{i} (t)

, the interface further generates a collision-free safe corridor

c_{i} (t)

, providing an initial geometrically feasible space for subsequent path optimization. This directly influences the fulfillment of the risk cost term

J_{i}^{risk}

and the geometric feasibility of coverage/search trajectories. In the constraint-level formulation, coverage satisfaction is associated with

C_{5}

, whereas cue–confirmation feasibility is treated separately under

C_{6}

.
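The uncertainty-inflation step described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: it assumes a planar 2×2 position covariance and circular obstacles, and the function names and the value of α_Σ are hypothetical.

```python
import math

def max_eigenvalue_2x2(cov):
    """Largest eigenvalue of a symmetric 2x2 covariance [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    half_trace = 0.5 * (a + c)
    return half_trace + math.sqrt((0.5 * (a - c)) ** 2 + b * b)

def inflation_margin(cov, alpha_sigma):
    """Return (u_sigma, delta) with u_sigma = sqrt(lambda_max(Sigma)), delta = alpha_sigma * u_sigma."""
    u_sigma = math.sqrt(max_eigenvalue_2x2(cov))
    return u_sigma, alpha_sigma * u_sigma

def inflate_discs(obstacles, delta):
    """Grow each circular obstacle (x, y, radius) by the margin delta (assumed obstacle model)."""
    return [(x, y, r + delta) for (x, y, r) in obstacles]

def is_feasible(q, inflated):
    """A point q is in the safe feasible region if it lies outside every inflated disc."""
    return all(math.hypot(q[0] - x, q[1] - y) > r for (x, y, r) in inflated)

cov = [[4.0, 0.0], [0.0, 1.0]]                      # hypothetical position covariance (m^2)
u, delta = inflation_margin(cov, alpha_sigma=1.5)   # alpha_sigma value is illustrative only
inflated = inflate_discs([(10.0, 0.0, 2.0)], delta)
print(u, delta, is_feasible((4.5, 0.0), inflated), is_feasible((5.5, 0.0), inflated))
```

In this example the point (5.5, 0) clears the raw obstacle but falls inside the inflated one, which is how larger pose uncertainty shrinks the feasible region.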

Lastly, the interface extracts the feature vector required by the interpretable meta-strategy. To ensure the triggerability and interpretability of AIDT decisions, a low-dimensional feature vector

s (t_{k})

is computed concurrently. This vector encapsulates key situational information and is formulated as follows:

s (t_{k}) = [d_{min} (t_{k}), u_{Σ} (t_{k}), {\hat{G}}_{T} (t_{k}), {\hat{e}}_{T} (t_{k}), δ_{comm} (t_{k}), η_{cov} (t_{k}), η_{verify} (t_{k})]

(8)

where

d_{min} (\cdot)

is the predicted minimum clearance distance;

u_{Σ}

characterizes the uncertainty magnitude;

η_{cov} (\cdot)

is the coverage-quality estimate associated with constraint

C_{5}

; and

η_{verify} (\cdot)

is the clue-verification satisfiability estimate associated with constraint

C_{6}

.

4.4. Design of the Path–Speed Hybrid Cooperative Solver

This module is the core of the CCPP framework for generating executable trajectories. Its objective is to jointly optimize the geometric path and the along-path speed profile of each agent under explicit temporal-consistency and heterogeneous platform feasibility constraints, thereby jointly handling the previously defined constraint set

C = {C_{0}, \dots, C_{7}}

. These constraints are divided into two categories: hard constraints

(C_{0}, C_{1}, C_{2}, C_{3}, C_{4}, C_{6})

must be strictly satisfied, while soft constraints

(C_{5}, C_{7})

are incorporated as penalty terms.

4.4.1. Overall Objective and Notation

Within each planning cycle, the solution of agent i is represented by a geometric path

p_{i} (\cdot)

and an along-path speed profile

v_{i} (\cdot)

. We parameterize the path by arc length

s \in [0, L (p_{i})]

, and use a time-varying environmental risk field

R (q, t)

evaluated at position q and time t. The arrival time of agent i is determined by the path–speed integral

T_{i} = \int_{0}^{L (p_{i})} \frac{1}{v_{i} (s)} d s,

(9)

and the mixed-cluster arrival-time vector is

T = {T_{1}, \dots, T_{N}}

.

The geometric path cost is defined as

J_{i}^{path} (p_{i}) = w_{L} L (p_{i}) + w_{C} J_{i}^{curv} (p_{i}),

(10)

where

L (p_{i})

denotes the path length and

J_{i}^{curv} (p_{i})

penalizes excessive curvature to ensure smoothness/curvature feasibility.

The risk cost corresponds to the cumulative cost incurred when the agent traverses the risk field

R (q, t)

. Using sampling points

{s_{m}}_{m = 1}^{M}

along the path, we approximate it by discrete integration:

J_{i}^{risk} (p_{i}, v_{i}) \approx \sum_{m = 1}^{M} R (p_{i} (s_{m}), t_{m}) Δ t_{m} .

(11)

The time stamps

{t_{m}}

are induced by the speed profile through the path–speed mapping

t_{i} (s) = t_{0} + \int_{0}^{s} \frac{1}{v_{i} (σ)} d σ,

(12)

so that

t_{m} = t_{i} (s_{m})

and

Δ t_{m} = t_{i} (s_{m}) - t_{i} (s_{m - 1})

.
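Equations (9), (11), and (12) can be evaluated on a discretized path as in the following sketch. It assumes trapezoidal integration of 1/v between arc-length samples and a user-supplied risk callable; both choices, and all names, are illustrative rather than taken from the original implementation.

```python
def time_stamps(s_nodes, speeds, t0=0.0):
    """Approximate t_i(s_m) = t0 + int_0^{s_m} ds / v_i(s) with the trapezoidal rule.

    s_nodes: increasing arc-length samples starting at 0.
    speeds:  strictly positive speeds v_i(s_m) at those samples.
    """
    t = [t0]
    for m in range(1, len(s_nodes)):
        ds = s_nodes[m] - s_nodes[m - 1]
        inv_v = 0.5 * (1.0 / speeds[m] + 1.0 / speeds[m - 1])
        t.append(t[-1] + ds * inv_v)
    return t

def risk_cost(path_points, t, risk):
    """Discrete risk cost: sum over m >= 1 of R(p(s_m), t_m) * (t_m - t_{m-1})."""
    return sum(risk(path_points[m], t[m]) * (t[m] - t[m - 1]) for m in range(1, len(t)))

s_nodes = [0.0, 100.0, 200.0]                        # arc length (m)
speeds = [5.0, 5.0, 5.0]                             # constant 5 m/s profile
points = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0)]
toy_risk = lambda q, t: 1.0 if q[0] > 150.0 else 0.0  # hypothetical risk field

stamps = time_stamps(s_nodes, speeds)
arrival_time = stamps[-1] - stamps[0]                # T_i in Equation (9)
print(stamps, arrival_time, risk_cost(points, stamps, toy_risk))
```

Because the time stamps depend only on the speed profile, they can be held fixed while the path is updated.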

To enforce temporal coordination, we adopt a speed-profile-related cost (denoted as

J^{speed}

) that combines the dynamic cost of speed regulation and the temporal-consistency penalties:

J^{speed} (v; p) = w_{E} E (v) + λ_{sync} Sync (T) + λ_{G} G_{T} (T),

(13)

where

E (v)

measures the dynamic cost of executing the speed profile (specifically, effort/energy or the smoothness of speed changes),

Sync (T)

is a synchronization-consistency penalty defined on T, and

G_{T} (T)

is the Gini coefficient-based dispersion metric for the same arrival-time vector. The weights

w_{L}

,

w_{C}

,

w_{E}

,

λ_{sync}

, and

λ_{G}

balance the contributions of these terms.

4.4.2. Alternating Optimization for Path and Speed Updates

To meet the real-time requirements of online computation, we design an efficient alternating optimization algorithm. Within each planning cycle, over a small number of iterations K, the algorithm iteratively solves two tightly coupled sub-problems.

(i): Path Update Sub-problem

Given the current speed profile

v_{i}^{(k)} (\cdot)

and the risk field

R (q, t)

, we update the path by optimizing geometric variables while treating the induced time stamps

{t_{m}^{(k)}}

and increments

{Δ t_{m}^{(k)}}

as fixed values computed from

v_{i}^{(k)} (\cdot)

via (12):

p_{i}^{(k + 1)} = arg min_{p_{i}} (J_{i}^{path} (p_{i}) + w_{R} \sum_{m = 1}^{M} R (p_{i} (s_{m}), t_{m}^{(k)}) Δ t_{m}^{(k)}),

(14)

subject to the hard feasibility constraints, for example, collision avoidance and the curvature/turning constraints implied by

C_{2}

.

(ii): Speed Profile Update Sub-problem

Based on the updated path

p_{i}^{(k + 1)} (\cdot)

obtained from (i), we fix the path variables and optimize the speed profile to improve temporal consistency while satisfying platform capability constraints:

v_{i}^{(k + 1)} = arg min_{v_{i} (\cdot)} J^{speed} (v_{i}; p_{i}^{(k + 1)}),

(15)

subject to the feasible speed and acceleration bounds

v_{i}^{min} \leq v_{i} (s) \leq v_{i}^{max}, |\frac{d v_{i} (s)}{d t}| \leq a_{i}^{max} .

(16)

Since the arrival time

T_{i}

is determined by the path–speed integral in (9), optimizing the speed profile directly reshapes the arrival-time vector T, thereby influencing the temporal-consistency terms

Sync (T)

and

G_{T} (T)

in (13). This alternating strategy balances geometric feasibility, environmental risk avoidance, and temporal coordination.

Its key advantage is that, when adjusting the speed profile alone cannot achieve satisfactory temporal consistency, the temporal-consistency indicators, including

Sync (T)

and

G_{T} (T)

, or their normalized forms embedded in the feature vector

s (t_{k})

, can be fed back to the AIDT meta-policy, thereby triggering a global re-planning that includes path changes and overcoming the limitations of pure speed coordination methods.
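The limitation of pure speed coordination can be illustrated with a deliberately simplified check. Assuming each agent travels its fixed path at a constant speed within its bounds, agent i can arrive only within the window [L_i / v_max, L_i / v_min]. If these windows do not overlap, no speed adjustment can synchronize the group and a path change is needed. The constant-speed assumption, the choice of the earliest common time, and the function names are ours for illustration.

```python
def arrival_window(length, v_min, v_max):
    """Earliest and latest arrival times for a constant speed in [v_min, v_max]."""
    return length / v_max, length / v_min

def sync_by_speed(agents):
    """agents: list of (path_length, v_min, v_max).

    Returns (common_time, speeds) if a single arrival time is reachable by every
    agent, or (None, None) when speed adjustment alone cannot synchronize them.
    """
    windows = [arrival_window(*a) for a in agents]
    earliest_common = max(lo for lo, _ in windows)
    latest_common = min(hi for _, hi in windows)
    if earliest_common > latest_common:
        return None, None  # would be reported upward to request a path change
    speeds = [length / earliest_common for length, _, _ in agents]
    return earliest_common, speeds

# Hypothetical UAV (3000 m path) and USV (1200 m path): windows overlap.
print(sync_by_speed([(3000.0, 10.0, 25.0), (1200.0, 2.0, 8.0)]))
# Shorter UAV path: its latest arrival (100 s) precedes the USV's earliest (150 s).
print(sync_by_speed([(1000.0, 10.0, 25.0), (1200.0, 2.0, 8.0)]))
```

In the second case only a path modification, such as lengthening the UAV path or shortening the USV path, can restore a common arrival time.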

(iii): Cue–Confirmation Satisfaction Check and Constraint Injection

The coupling constraint

C_{6}

(cue–confirmation satisfaction) is enforced via a feasibility check after each alternating iteration. Specifically, after obtaining

{p_{i}^{(k + 1)} (\cdot), v_{i}^{(k + 1)} (\cdot)}

, we verify whether at least one USV can reach the cue–confirmation region within the current planning horizon. Concretely, we check the condition:

\exists i \in I_{usv} : t_{i} (s_{i}^{cue}) \leq t_{0} + T_{win} \land p_{i} (s_{i}^{cue}) \in Ω_{cue},

(17)

where

Ω_{cue} \subseteq Ω

denotes the spatial cue–confirmation region associated with the currently active suspicious clue,

s_{i}^{cue}

denotes the first arc-length position at which the planned path of USV i enters

Ω_{cue}

, and

T_{win}

is the length of the current planning time window. More precisely,

s_{i}^{cue}

is determined as the first-entry arc-length along the planned path,

s_{i}^{cue} : = inf \{s \in [0, L_{i}] ∣ p_{i} (s) \in Ω_{cue}\} .

(18)

The corresponding time quantity

t_{i} (s_{i}^{cue})

is computed through the path–speed time mapping defined in Equation (12), namely by evaluating the predicted travel time from the current planning instant

t_{0}

to the first entry point on the path. In practical implementation, for each candidate USV, we scan the discretized planned trajectory, detect the first path sample or segment intersecting

Ω_{cue}

, compute the associated

s_{i}^{cue}

, and then evaluate

t_{i} (s_{i}^{cue})

using Equation (12). If no path intersection with

Ω_{cue}

exists, Equation (17) is regarded as unsatisfied for that USV. This feasibility check is evaluated after each alternating iteration using the current candidate path–speed pair

{p_{i}^{(k + 1)} (\cdot), v_{i}^{(k + 1)} (\cdot)}

.

If the condition is violated,

C_{6}

is injected into the optimization as a high-priority hard constraint (or equivalently, as a large-weight penalty term) in the subsequent iterations, so as to guarantee timely cue confirmation. In parallel, the AIDT meta-policy switches to the confirmation-priority mode, which reallocates decision weights and guides the following optimization to favor feasible trajectories that satisfy

C_{6}

.
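The feasibility check in Equations (17) and (18) can be sketched on a discretized trajectory as follows. The sketch assumes a disc-shaped cue region and tests only the trajectory samples, so a segment that crosses the region between two samples would be missed. The region shape, the names, and the numerical values are illustrative assumptions.

```python
import math

def first_entry_time(points, times, center, radius):
    """Time stamp of the first trajectory sample inside the disc-shaped cue region, or None."""
    for q, t in zip(points, times):
        if math.hypot(q[0] - center[0], q[1] - center[1]) <= radius:
            return t
    return None

def cue_confirmation_ok(usv_trajs, center, radius, t0, t_win):
    """True if at least one USV enters the cue region no later than t0 + t_win."""
    for points, times in usv_trajs:
        t_cue = first_entry_time(points, times, center, radius)
        if t_cue is not None and t_cue <= t0 + t_win:
            return True
    return False

# Two hypothetical USV trajectories given as (sample points, time stamps).
usv_a = ([(0.0, 0.0), (50.0, 0.0), (100.0, 0.0)], [0.0, 25.0, 50.0])
usv_b = ([(0.0, 0.0), (0.0, 50.0)], [0.0, 30.0])
cue_center, cue_radius = (100.0, 0.0), 10.0

print(cue_confirmation_ok([usv_a, usv_b], cue_center, cue_radius, t0=0.0, t_win=60.0))
print(cue_confirmation_ok([usv_a, usv_b], cue_center, cue_radius, t0=0.0, t_win=40.0))
```

When the check returns False, the constraint would be injected with high priority in the next iteration, as described above.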

4.5. Consistency Regularization Design Based on the Gini Coefficient

To address unbalanced time distribution during coordinated arrival in heterogeneous swarms, we incorporate the Gini coefficient—a metric originally used to quantify distribution inequality—as a consistency indicator in the CCPP framework. Based on this indicator, we establish a closed-loop mechanism with three stages: (i) online consistency evaluation and reporting, (ii) objective regularization for optimization, and (iii) mode switching (and, if necessary, re-planning triggering) according to the indicator values.

(1): Arrival-Time Gini Coefficient.

The Arrival-Time Gini Coefficient, denoted as

G_{T}

, is used as a temporal-consistency metric to assess swarm synchronization quality from the perspective of arrival-time distribution equality. For a swarm comprising N agents with an arrival time vector

T = {T_{1}, \dots, T_{N}}

,

G_{T}

is computed by:

G_{T} (T) = \frac{\sum_{i = 1}^{N} \sum_{j = 1}^{N} |T_{i} - T_{j}|}{2 N^{2} T^{*}} .

(19)

For a nonnegative sample set

x = {x_{1}, \dots, x_{N}}

with sample mean

\bar{x}

, the standard discrete Gini coefficient can be written in the pairwise-difference normalized form [21,22]

G (x) = \frac{\sum_{i = 1}^{N} \sum_{j = 1}^{N} |x_{i} - x_{j}|}{2 N^{2} \bar{x}} .

(20)

Therefore, by setting

x_{i} = T_{i}

and

\bar{x} = T^{*}

, Equation (19) is exactly the standard discrete Gini coefficient applied to the arrival-time distribution. This form is mathematically equivalent to the classical Lorenz-curve interpretation of equality versus inequality [22]. In the present problem, the economic notion of distributional equality is translated into temporal coordination equality.

The desired coordinated arrival time

T^{*}

is defined as a common reference time computed from the current arrival-time estimates of the swarm,

T^{*} = \frac{1}{N} \sum_{i = 1}^{N} T_{i} (T_{i} \in T)

. Because

T^{*}

is exactly the sample mean of the arrival-time distribution, the normalization in Equation (19) coincides with that of Equation (20). Under the “temporal coordination equality” reading, a smaller

G_{T}

indicates that the arrival-time distribution is closer to perfectly synchronized arrival across the heterogeneous agent set.

This definition encourages the mixed cluster to reduce dispersion in arrival times with respect to a shared reference, thereby improving global synchronization. Under this principle, an individual agent may deviate from its originally estimated arrival time if doing so yields better overall temporal consistency.
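For reference, the pairwise-difference form in Equation (19) can be computed directly as in the sketch below. The arrival times are hypothetical, and the zero-mean guard is our addition for robustness.

```python
def gini(values):
    """Pairwise-difference Gini coefficient: sum_ij |x_i - x_j| / (2 N^2 mean(x))."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # guard added for illustration; all-zero samples are treated as equal
    pairwise = sum(abs(a - b) for a in values for b in values)
    return pairwise / (2 * n * n * mean)

synced = [60.0, 60.0, 60.0, 60.0]      # hypothetical arrival times (s)
spread = [40.0, 60.0, 80.0, 100.0]
print(gini(synced), gini(spread), gini([10 * t for t in spread]))
```

Identical arrival times give a coefficient of zero, and multiplying all arrival times by a common factor leaves the coefficient unchanged, which reflects the scale invariance of the metric.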

(2): Error-Domain Gini Coefficient.

To enhance numerical stability and focus more directly on synchronization mismatch relative to the shared reference time, the Error-Domain Gini Coefficient

G_{E} (e)

is further defined. Let the desired coordinated arrival time be

T^{*}

. The calculation of

G_{E} (e)

is given by:

G_{E} (e) = \frac{\sum_{i = 1}^{N} \sum_{j = 1}^{N} |e_{i} - e_{j}|}{2 N^{2} e^{*}} .

(21)

Likewise, by setting

x_{i} = e_{i}

and

\bar{x} = e^{*}

, Equation (21) is also the standard discrete Gini coefficient, but applied to the synchronization-error distribution rather than the raw arrival-time distribution.

Here, the arrival-time error for each agent is defined as

e_{i} = |T_{i} - T^{*}|

, and the reference error level is defined as

e^{*} = \frac{1}{N} \sum_{i = 1}^{N} e_{i}

. This error-domain form measures how evenly the residual synchronization error is distributed among agents after referencing the common coordinated-arrival time. For the degenerate case

e^{*} = 0

, all agents are perfectly synchronized and we define

G_{E} (e) = 0

.
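A minimal sketch of Equation (21), including the degenerate case, is given below. The input arrival times are hypothetical and the function name is ours.

```python
def error_gini(arrival_times):
    """Error-domain Gini coefficient G_E(e) with e_i = |T_i - T*| and T* the mean arrival time."""
    n = len(arrival_times)
    t_star = sum(arrival_times) / n
    errors = [abs(t - t_star) for t in arrival_times]
    e_star = sum(errors) / n
    if e_star == 0:
        return 0.0  # degenerate case: perfectly synchronized arrival
    pairwise = sum(abs(a - b) for a in errors for b in errors)
    return pairwise / (2 * n * n * e_star)

print(error_gini([60.0, 60.0, 60.0]))          # all agents synchronized
print(error_gini([40.0, 60.0, 80.0, 100.0]))   # errors are [30, 10, 10, 30]
```

For the second input the reference time is 70 s and the mean error is 20 s, so the coefficient measures how unevenly that residual error is shared among the four agents.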

Compared with variance-based or coefficient-of-variation-based dispersion measures, the Gini coefficient is adopted here as the primary consistency indicator for four reasons. First, it directly quantifies the overall pairwise temporal discrepancy within the swarm, which is more closely aligned with the coordinated-arrival objective than a purely mean-centered dispersion measure. Second, it is scale-invariant, so the consistency evaluation is not distorted by the absolute mission-time level. Third, its equality/inequality interpretation is more suitable for describing whether the temporal burden is evenly shared among agents during cooperative execution. Fourth, unlike a max–min arrival-time difference, the Gini coefficient uses the full pairwise discrepancy distribution rather than being dominated only by the two extreme agents.

It should also be noted that the standard discrete Gini coefficient is mathematically well defined for any finite nonnegative sample set and does not rely on large-sample assumptions. Therefore, its use in the present heterogeneous swarm setting with a small number of agents remains theoretically valid. Moreover, since the largest scenario considered in this paper contains only eight agents, the practical computational cost of the pairwise Gini calculation is negligible and does not constitute a limiting factor for online evaluation.

To provide a direct empirical reference for this metric choice, a supplementary comparison with a CV-based consistency metric is further reported in Section 5.2.4 under the same experimental setting.

(3): Closed-loop triggering and mode switching.

Trigger thresholds

τ_{G}

and

τ_{safe}

are set. When the trigger conditions

G_{T} > τ_{G}

or

G_{E} (e) > τ_{G}

are met, the system determines that the current synchronization performance is sub-optimal and enters the “Enhanced Synchronization Optimization” mode. In this mode, the algorithm increases the weights of the consistency cost terms in the objective function,

λ_{sync} \leftarrow α \cdot λ_{sync}

(where

α > 1

), thereby prioritizing convergence to solutions with improved synchronization in subsequent optimization cycles.

Similarly, when safety or communication conditions deteriorate, the AIDT meta-policy switches to corresponding modes to preserve feasibility and robustness. Specifically, switches to “Risk Aversion” or “Communication Recovery” may be triggered when one or more of the following conditions are detected: (i) safety-margin degradation, such as

d_{min} < τ_{safe}

; (ii) communication-quality deterioration, such as

δ_{comm} > τ_{comm}

; or (iii) the number of available neighbors falls below a threshold. These mode switches may further initiate specific types of re-planning to ensure robust mission execution. In the experimental setting, this communication-related trigger is evaluated through the planning-interface-level communication degradation indicator defined in Section 5.1, thereby ensuring consistency between the methodological switching logic and the simulation assumptions.
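The threshold-based switching logic can be summarized by the sketch below. The threshold values are those listed in Section 5.1.1. The priority order (safety, then communication, then synchronization), the comparison of the communication indicator against τ_comm, the mode label "nominal", and the value of α are our assumptions for illustration. The neighbor-count condition is omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Thresholds:
    tau_g: float = 0.24     # consistency threshold
    tau_safe: float = 1.2   # minimum clearance threshold (m)
    tau_comm: float = 0.5   # communication degradation threshold

def select_mode(g_t, g_e, d_min, delta_comm, th=Thresholds()):
    """Map consistency, safety, and communication indicators to an operating mode."""
    if d_min < th.tau_safe:
        return "risk_aversion"
    if delta_comm > th.tau_comm:
        return "communication_recovery"
    if g_t > th.tau_g or g_e > th.tau_g:
        return "enhanced_sync"
    return "nominal"  # hypothetical label for the case where no trigger fires

def boost_sync_weight(lambda_sync, alpha=1.5):
    """Apply lambda_sync <- alpha * lambda_sync with alpha > 1 (alpha value is illustrative)."""
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1")
    return alpha * lambda_sync

print(select_mode(g_t=0.30, g_e=0.10, d_min=5.0, delta_comm=0.1))
print(select_mode(g_t=0.30, g_e=0.10, d_min=1.0, delta_comm=0.9))
print(boost_sync_weight(2.0))
```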

4.6. Online Replanning Design Based on the AIDT Explainable Meta-Strategy

4.6.1. Decision Tree Structure

To achieve rapid online adjustment of strategies in dynamic maritime environments, this paper adopts the AIDT proposed in [11] as the planning meta-strategy rather than introducing a new decision-tree algorithm. The incremental contribution of the present work lies in its task-oriented coupling within the CCPP framework, including the redesign of planning-oriented feature input, the definition of a meta-action space matched to heterogeneous coordinated trajectory planning, and its closed-loop integration with consistency evaluation, objective regularization, and replanning triggering. This meta-strategy is responsible for making decisions during the rolling planning process, including the scope of replanning (global or local), updates to the cost function weights, adjustments to the safety margin inflation factor, and switches in the collaboration mode. Unlike opaque “black-box” strategies, the AIDT explicitly expresses the mapping from feature thresholds to decision actions through a clear tree structure. This structure not only facilitates analysis and debugging but also meets the requirement for traceability of key decision processes during task execution. As shown in Figure 3, the trigger action tree structure describes the decision logic for selecting different replanning actions under varying task and environmental conditions.

4.6.2. Coupling Design of the AIDT Module Within the CCPP Framework

Firstly, it is necessary to define the input features and action space for the meta-strategy. At each planning moment

t_{k}

, the interface layer outputs a feature vector

s (t_{k}) \in R^{d}

. Based on the extracted situational feature vector, the AIDT outputs a planning-oriented meta-action at each planning moment

t_{k}

. Accordingly, the meta-action can be written as:

a (t_{k}) = π_{AIDT} (s (t_{k})) = (ReplanType, Δ w, Δ α_{Σ}, Mode)

(22)

Specifically, ReplanType takes one of the three discrete values {none, local, global}, corresponding to no replanning, local detouring, and global replanning, respectively. The term

Δ w

represents the incremental update to the cost-function weights, and

Δ α_{Σ}

adjusts the uncertainty inflation mapping. Mode is selected from {coverage, verify, sync, comm-safe}, representing coverage-prioritized, verification-prioritized, tight-synchronization, and communication-conservative behaviors, which correspond to the constraint sets

C_{5}

,

C_{6}

,

C_{7}

, and

C_{4}

, respectively. For clarity, this mode–constraint correspondence is used consistently throughout the remaining method and experiment sections, with coverage associated with

C_{5}

, verification associated with

C_{6}

, synchronization associated with

C_{7}

, and communication-conservative behavior associated with

C_{4}

.
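One possible in-memory representation of the meta-action in Equation (22), together with the corresponding parameter update, is sketched below. The class layout, the dictionary-based weights, and the numerical values are hypothetical.

```python
from dataclasses import dataclass

REPLAN_TYPES = ("none", "local", "global")
MODES = ("coverage", "verify", "sync", "comm-safe")
MODE_CONSTRAINT = {"coverage": "C5", "verify": "C6", "sync": "C7", "comm-safe": "C4"}

@dataclass
class MetaAction:
    replan_type: str          # one of REPLAN_TYPES
    delta_w: dict             # increments to named cost weights
    delta_alpha_sigma: float  # increment to the uncertainty-inflation gain
    mode: str                 # one of MODES

    def __post_init__(self):
        if self.replan_type not in REPLAN_TYPES:
            raise ValueError(f"unknown ReplanType: {self.replan_type}")
        if self.mode not in MODES:
            raise ValueError(f"unknown Mode: {self.mode}")

def apply_action(weights, alpha_sigma, action):
    """Return updated copies: w <- w + delta_w and alpha_sigma <- alpha_sigma + delta_alpha_sigma."""
    new_weights = dict(weights)
    for name, inc in action.delta_w.items():
        new_weights[name] = new_weights.get(name, 0.0) + inc
    return new_weights, alpha_sigma + action.delta_alpha_sigma

action = MetaAction("local", {"lambda_sync": 0.5}, 0.25, "sync")
weights, alpha = apply_action({"w_L": 1.0, "lambda_sync": 2.0}, 1.0, action)
print(weights, alpha, MODE_CONSTRAINT[action.mode])
```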

Subsequently, within each rolling planning cycle of the CCPP framework, the invocation of AIDT follows the closed-loop process shown in Figure 4.

Through the aforementioned closed-loop process, the CCPP framework integrates the AIDT meta-strategy. Its core algorithm—comprising the vision–planning interface module, the AIDT meta-strategy inference module, the alternating optimization solver, and the consistency closed-loop module—operates in a coordinated manner to achieve the ultimate objective of real-time, rapid trajectory planning for heterogeneous multi-UAV–USV clusters.

4.7. Algorithm and Complexity Analysis

4.7.1. Algorithm Implementation

As summarized in Algorithm 1, the trigger action tree structure defines the decision logic used in the rolling planning process.

Algorithm 1. Trigger Action Tree Structure in rolling planning.
Input: $M, G_{i}, g_{i}, O^{sta}, O^{dyn} (t), {\hat{x}}_{i} (t), Σ_{i} (t), q (t)$ ,
$C = {C_{0}, \dots, C_{7}}, {Tree}_{AIDT}, H$ ,
$Δ t_{p}$ .
Output: ${p_{i} (s), v_{i} (s)}, T_{i}, {Mode}_{online}, w_{update}$ .
1:	Step1 (Initialization): $p_{i}^{0}, v_{i}^{0}, w_{0} = {w_{L}, w_{R}, w_{S}, λ_{G}, \dots}, α_{Σ}, {Mode}_{0}$ .
2:	`for` each planning time $t_{k}$ `do` ▷ rolling (receding-horizon) cycle
3:	Step2 (Interface): calculate $u_{Σ, i} (t_{k}) = \sqrt{λ_{max} (Σ_{i} (t_{k}))}$ and $Δ_{i} (t_{k}) = α_{Σ} u_{Σ, i} (t_{k})$ ; generate ${\tilde{O}}_{i} (t_{k})$ , safe corridor $c_{i} (t_{k})$ , and risk field $R (q, t_{k})$ . ▷ geometry & risk construction
4:	Step3 (Feature): construct feature vector $s (t_{k}) = {[d_{min}, u_{Σ}, G_{T}, δ_{comm}, η_{cov}, η_{verify}, \dots]}^{⊤}$ . ▷ AIDT inputs
5:	Step4 (AIDT): infer meta-action $a (t_{k}) = π_{AIDT} (s (t_{k}))$ . ▷ trigger–action mapping
6:	Step5 (Update): $w \leftarrow w + Δ w$ , $α_{Σ} \leftarrow α_{Σ} + Δ α_{Σ}$ . ▷ weights & inflation
7:	Step6 (Solver call):
8:	`if` ReplanType = none `then`
9:	perform lightweight speed update (or speed tightening) on the fixed $p_{i}$ . ▷ fast local adjustment
10:	`else`
11:	trigger local or global replanning: execute alternating optimization for K iterations. ▷ path–speed co-optimization
12:	`for` $r = 1$ `to` K `do`
13:	update path: $p_{i}^{r} \leftarrow arg {min}_{p_{i}} J_{path} (p_{i} ; w, R)$ .
14:	update speed: $v_{i}^{r} \leftarrow arg {min}_{v_{i}} J_{speed} (v_{i} ; w, G_{T})$ .
15:	calculate: $T_{i} = \int_{0}^{L (p_{i})} \frac{1}{v_{i} (s)} d s$ . ▷ arrival-time evaluation
16:	`end for`
17:	`end if`
18:	Step7 (Consistency loop): calculate $G_{T} (T)$ ; record trigger metrics for the next cycle’s features $s (t_{k + 1})$ . ▷ close the loop
19:	Step8 (Control): send ${p_{i} (s), v_{i} (s)}$ to the tracking controller; proceed to the next cycle $t_{k + 1}$ .
20:	`end for`

4.7.2. Algorithm Complexity Analysis

This section analyzes the real-time computational overhead of the CCPP framework. The heterogeneous cluster size is denoted as N, the number of discrete nodes for each agent’s path as M, the number of alternating iterations in the receding-horizon optimization as K, the number of dynamic obstacles (the cardinality of the dynamic obstacle set) as

N_{d} = | O_{d} |

, and the environmental grid scale as G. The computational cost within a single planning cycle

t_{k}

primarily stems from three parts:

(i): Perception Interface Layer: This layer executes uncertainty inflation, risk field construction, and safe channel generation. Its complexity is $O (I (G, N_{d}, M))$ , with $I (\cdot)$ capturing the cost of either a distance field update $O (G log G)$ or a query over the obstacle list $O (M \cdot | O_{d} |)$ , depending on the implementation.
(ii): Collaborative Optimization Solver: This core module involves K alternating iterations of the path and speed sub-problems. For a single iteration, the computational complexities of the path update and speed update are denoted as $S_{path} (M)$ and $S_{speed} (M)$ , respectively. Consequently, the total complexity of this part is $O (N \cdot K \cdot (S_{path} (M) + S_{speed} (M)))$ .
(iii): AIDT Meta-Strategy Inference: This process only requires traversing a decision tree with depth $D_{tree}$ , resulting in a negligible overhead of $O (D_{tree})$ .

Therefore, the overall time complexity per planning cycle can be summarized as:

O (I (G, N_{d}, M) + N \cdot K \cdot (S_{path} (M) + S_{speed} (M))) .

(23)

The analysis indicates that the computational overhead scales approximately linearly with the number of agents N and is dominated by the path discretization granularity M and the iteration count K.
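As a rough worked instance of Equation (23), the implementation settings reported in Section 5.1.1 (M = 32 nodes, at most K = 4 iterations) and the largest swarm size mentioned in Section 4.5 (N = 8 agents) give the per-cycle upper bounds below. This only counts sub-problem solves. The per-solve costs S_path and S_speed depend on the chosen solver and are not quantified here.

```latex
N \cdot K = 8 \cdot 4 = 32
\quad\Rightarrow\quad
\text{at most } 32 \text{ path updates and } 32 \text{ speed updates per cycle, each over } M = 32 \text{ nodes},
\qquad
N^{2} = 64 \text{ pairwise terms in the evaluation of } G_{T} .
```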

5. Experiments

5.1. Experimental Setup

5.1.1. Experimental Setup and Reproducibility Settings

All experiments are conducted in a ROS 2 (Humble)-based 2D simulator with deterministic initialization per random seed. Unless otherwise stated, each configuration is evaluated over

n = 20

independent seeds under identical initial conditions and target realizations. We consider three square maritime search regions with half-size L: Scene A (

L = 1000

m;

2 \times 2

km), Scene B (

L = 2500

m;

5 \times 5

km), and Scene C (

L = 5000

m;

10 \times 10

km). The integration time step is

Δ t = 0.2

s. For each seed, targets are sampled uniformly within the region. A run terminates when all targets are verified or when the mission horizon

T_{max} = 21,600 s

is reached. Accordingly, the “arrival-time consistency” studied in this paper refers to coordinated arrival to the currently assigned task region within the staged search process, especially during cue–confirmation episodes, rather than one-off convergence of all agents to a single fixed destination. Key simulation and evaluation settings are summarized in Table 3. These settings are fixed across all compared methods unless explicitly stated otherwise, so as to improve transparency and reproducibility.

For implementation reproducibility, the receding-horizon planner uses a planning horizon of

H = 77 s

, each candidate path is discretized into

M = 32

nodes, and the warm-started alternating path–speed solver runs for at most

K = 4

iterations per planning cycle. The online trigger logic uses thresholds

(τ_{G}, τ_{safe}, τ_{comm}, τ_{verify}) = (0.24, 1.2 m, 0.5, 0.6)

. When these trigger conditions are not activated, only a lightweight speed-tightening update is applied; otherwise, local or global replanning is invoked once the relevant consistency, safety, communication, or verification degradation criterion is met. In the current implementation, the AIDT module is initialized with a tree depth of 5 and is then allowed to grow adaptively during the iterative evolution process. Unless explicitly stated otherwise, these implementation-level parameters are fixed across all compared methods.

At the modeling level, the communication condition is abstracted by a normalized degradation indicator

δ_{comm} (t) = w_{τ} min (\frac{τ_{comm} (t)}{τ_{max}}, 1) + w_{p} p_{loss} (t) + w_{n} (1 - ρ_{n} (t)),

where

τ_{comm} (t)

denotes the communication delay,

p_{loss} (t)

denotes the packet-loss ratio,

ρ_{n} (t)

denotes the available-neighbor ratio, and

w_{τ} + w_{p} + w_{n} = 1

. In this work, communication degradation is represented at the planning interface level through information-quality reduction and neighbor-availability variation, which directly affect situational features, mode switching, and replanning decisions, rather than through full packet-level network-stack emulation.
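The normalized degradation indicator can be evaluated as in the following sketch. The weight values, the delay bound, and the sample inputs are hypothetical, since only the constraint that the weights sum to one is specified above.

```python
def comm_degradation(delay, tau_max, p_loss, rho_n, w_tau, w_p, w_n):
    """delta_comm = w_tau * min(delay / tau_max, 1) + w_p * p_loss + w_n * (1 - rho_n)."""
    if abs(w_tau + w_p + w_n - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w_tau * min(delay / tau_max, 1.0) + w_p * p_loss + w_n * (1.0 - rho_n)

# Hypothetical link state: delay above the bound (saturates), 10% loss, half the neighbors visible.
delta = comm_degradation(delay=0.3, tau_max=0.2, p_loss=0.1, rho_n=0.5,
                         w_tau=0.4, w_p=0.3, w_n=0.3)
print(delta)
```

With these illustrative numbers the indicator is about 0.58, which would exceed the communication threshold of 0.5 reported in the implementation settings.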

Likewise, the GNSS-denied condition is modeled through navigation-state estimates and uncertainty propagation:

{\hat{x}}_{i} (t) = x_{i} (t) + ϵ_{i} (t), ϵ_{i} (t) \sim N (0, Σ_{i} (t)),

u_{Σ, i} (t) = \sqrt{λ_{max} (Σ_{i} (t))},

so that the planner responds to estimated states and their uncertainty, rather than idealized global-position truth, in the subsequent safe-region construction, risk modulation, and replanning process.

All baselines are evaluated in the same simulator under identical initial conditions, target realizations, time step, termination rules, and the same planning-interface-level communication-constrained and GNSS-denied assumptions. Safety-related settings such as the computation of

d_{min}

are kept consistent across methods, and all results are reported over the full seed set without manual cherry-picking. We report six metrics: (i)

G_{T}

, the arrival-time Gini coefficient used to quantify synchronization consistency; (ii)

{Step}_{ms}

, the average per-step planning runtime (ms), summarized using a steady-state tail mean; (iii)

d_{min}

, the worst-case minimum inter-agent separation over the entire run; (iv)

η_{cov}

, the covered-area ratio computed on a uniform grid with cell size

s_{cell}

; (v)

η_{verify}

, the verified-target ratio, defined as the fraction of targets that complete the discover–confirm loop; and (vi)

t_{s}

, the total mission completion time. A run is deemed successful if

η_{verify} = 1.0

within

T_{max}

; otherwise it is counted as a timeout. In time-based summaries, the completion time of a timed-out run is set to

T_{max}

(right-censoring), and the final

η_{verify}

is recorded to reflect partial completion at timeout.

5.1.2. Experimental Scenarios

We designed three scenarios, including a simple static scenario, a moderately dynamic scenario, and a complex highly dynamic scenario, as summarized in Table 4. These scenarios are intended to validate fundamental synchronization and coverage-search capability, to evaluate dynamic obstacle avoidance and online replanning performance, and to stress-test coordination consistency and safety by increasing dynamic obstacle density and introducing simulated ocean-current disturbances.

5.1.3. Experimental Equipment Platform and Details

Experiments were conducted on a workstation with an Intel i9-14900K CPU, NVIDIA RTX A5000 GPU, and 96 GB RAM. The system operates in a receding-horizon framework: the planning frequency is 1 Hz (

Δ t_{p} = 1.0 s

), and the control or logging frequency is 10 Hz. The vision-to-planning interface layer updates

Δ_{i} (t)

, the risk field, and the safe corridors online. Within each planning cycle, the planning layer executes a warm-started alternating path–speed optimization procedure, in which the geometric path and speed profile are updated sequentially under the objective and constraint definitions given in Section 4. The solution from the previous cycle is reused as the warm start for the next cycle. AIDT decisions are then used to trigger lightweight parameter updates or local/global replanning when the corresponding situational conditions are met. Unless explicitly stated otherwise, these implementation-level execution settings are kept fixed across all compared methods, so that the comparison reflects methodological differences rather than uncontrolled execution-level variations.

To comprehensively evaluate the performance and applicability boundaries of the proposed CCPP framework, we conducted a controlled comparison against representative baselines under identical initial conditions. The selected baseline set is intended to provide a fair and reproducible comparison at the trajectory-planning layer under the same low-level dynamics, safety constraints, uncertainty-aware perception inputs, environmental disturbances, communication models, and mission settings, rather than to serve as an exhaustive survey of all system-level heterogeneous UAV–USV coordination frameworks. The baselines include three classical coordination strategies: Pure Speed Coordination (PSC) [24], which synchronizes agents by optimizing speed profiles along fixed geometric paths; Equal-Length Path (ELP) [25], which reduces arrival-time dispersion by adjusting path lengths and executing near-constant-speed motions; and a reactive avoidance method combined with high-level speed adjustment (RAST) [26]. In addition, we introduced three recent planning-oriented methods as supplementary comparisons: a time-elastic local trajectory optimization method TETO [27], which jointly optimizes spatial poses and temporal allocations within a receding horizon; an RRT_ACS hybrid planner [28], which leverages sampling-based exploration guided by ant-colony-style cost bias for global path generation; and a topology-aware local optimization method (DRLC) [29], which maintains and optimizes multiple candidate trajectories of distinctive topologies to avoid local minima. Taken together, these baselines cover fixed-path synchronization, path-length adjustment, reactive avoidance, time-elastic optimization, sampling-based global planning, and topology-aware local optimization. For all methods, we enforced the same platform kinematic limits, environmental disturbances, and communication models, ensuring comparability and fairness of the evaluation.

To ensure the robustness of the statistical conclusions, each unique combination of a planning method and an experimental scenario was independently executed for 20 repeated trials under identical initial conditions. Fixed random seeds ranging from 0 to 19 were employed to balance reproducibility with the influence of stochastic factors. The simulation advanced with a fixed time step, and key performance metrics were logged during each planning cycle of every trial.
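The repeated-trial protocol described above can be sketched as a simple enumeration over methods, scenarios, and seeds. This is a minimal illustration only: the method and scene labels follow the names used in the text, while `build_trial_grid`, `run_all`, and the `run_trial` callback are hypothetical helpers that stand in for the simulator, which is not specified here.

```python
from itertools import product

# Labels follow the text; run_trial is a hypothetical stand-in for the simulator.
METHODS = ["CCPP", "PSC", "ELP", "RAST", "TETO", "RRT_ACS", "DRLC"]
SCENES = ["A", "B", "C"]
SEEDS = range(20)  # fixed random seeds 0..19


def build_trial_grid():
    """Enumerate every (method, scene, seed) combination exactly once."""
    return list(product(METHODS, SCENES, SEEDS))


def run_all(run_trial):
    """Run each trial with its own seed; group results by (method, scene)."""
    results = {}
    for method, scene, seed in build_trial_grid():
        outcome = run_trial(method, scene, seed)
        results.setdefault((method, scene), []).append(outcome)
    return results
```

With seven methods, three scenes, and 20 seeds, the grid contains 420 trials, and every method sees the same seed sequence in every scene.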

The recorded data encompass four core evaluation dimensions: (i) coordination consistency, measured by the Gini coefficient of arrival times $G_{T}$; (ii) safety, assessed via the global minimum safety clearance $d_{min}$ and the number of collisions; (iii) task performance, quantified by the UAV search coverage rate $η_{cov}$, the USV clue verification rate $η_{verify}$, and the total mission time $t_{s}$; and (iv) online deployability, evaluated through the average and maximum single-cycle planning time ${Step}_{ms}$ and the frequency of replanning triggers. All metrics are reported as mean values together with the corresponding standard deviations across the 20 independent trials, and the 95% confidence intervals are reported together with the mean values in Table 5. Specifically, for each metric we compute the sample mean $\bar{x}$ and sample standard deviation $s$ over $n$ independent runs, and form a two-sided 95% CI as

\bar{x} \pm t_{0.975, n - 1} \, s / \sqrt{n}

using the Student’s t-distribution.
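As a minimal sketch of this interval computation, the snippet below evaluates the mean and the CI half-width for one metric. The function name and the small lookup table of critical values are illustrative assumptions; the table only covers the run counts used in this section (n = 20 and n = 10), and a statistics library would be needed for other sample sizes.

```python
import math
import statistics

# Two-sided 95% Student's t critical values t_{0.975, n-1}, tabulated only for
# the run counts used in this section (n = 20 -> 19 dof, n = 10 -> 9 dof).
T_CRIT_975 = {19: 2.093, 9: 2.262}


def confidence_interval_95(samples):
    """Return (mean, half_width) of the two-sided 95% CI for the sample mean."""
    n = len(samples)
    mean = statistics.fmean(samples)
    s = statistics.stdev(samples)  # sample standard deviation (n - 1 denominator)
    half_width = T_CRIT_975[n - 1] * s / math.sqrt(n)
    return mean, half_width
```

The interval is then reported as `mean ± half_width`.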

5.2. Analysis of Experimental Results

5.2.1. Algorithm Implementation

To evaluate the overall performance of the proposed CCPP and competing methods in heterogeneous UAV–USV collaboration, we conduct multiple independent trials for seven algorithms across three representative scenarios. Key metrics related to mission efficiency, safety, and constraint satisfaction are recorded. Given the large number of algorithm–scenario combinations, results for each metric are summarized using grouped bar charts, where bar heights represent the mean values over trials and error bars indicate the variability across trials. This allows for simultaneous assessment of average performance and stability. Unless otherwise stated, percentage improvements are reported relative to the strongest competing baseline (the best non-CCPP method) in the same scenario. The corresponding results are presented in Figure 5, Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10.

The temporal-consistency metric $G_{T}$ quantifies the dispersion of the arrival-time distribution across the heterogeneous swarm, where a lower value indicates better-synchronized coordination. As shown in Figure 5, CCPP achieves the lowest $G_{T}$ in all three scenarios. In Scene A, it attains $0.297 \pm 0.034$, a 10.3% improvement over the second-best method, TETO ($0.331$). In Scene B, CCPP’s score of $0.234 \pm 0.020$ is a 27.5% reduction compared with RRT_ACS ($0.323$). In the more challenging Scene C, CCPP further decreases $G_{T}$ to $0.138 \pm 0.024$, outperforming the runner-up ELP ($0.282$) by 50.9%. Moreover, the relatively low variability in Scenes B and C underscores the consistency of CCPP’s performance across random seeds. Collectively, these results indicate that CCPP delivers the most tightly synchronized arrivals among the compared methods in every scenario.
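To make the dispersion measure concrete, the following sketch computes a textbook Gini coefficient of arrival times from the mean absolute difference. It is an assumption that this form matches the paper's usage only up to normalization: the normalized $G_{T}$ defined in Section 4 may apply an additional scaling, and the function name is hypothetical.

```python
def gini(values):
    """Textbook Gini coefficient via the mean absolute difference.

    Returns 0.0 when all values are equal; larger values mean more dispersion.
    The paper's normalized G_T may differ from this form by a scaling factor.
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    # Equivalent to sum_ij |x_i - x_j| / (2 * n^2 * mean).
    return abs_diff_sum / (2.0 * n * total)
```

Identical arrival times give a value of zero, which is why a lower $G_{T}$ corresponds to tighter synchronization.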

Planning step time ${Step}_{ms}$ quantifies the average computational cost per online planning cycle; a smaller value indicates faster replanning at a given control rate and thus stronger real-time responsiveness. As shown in Figure 6, CCPP achieves the lowest or near-lowest millisecond-level step time across all scenarios. In Scene A, CCPP reaches $1.522 \pm 0.081$ ms, which is only 3.5% higher than the best method, TETO ($1.470$ ms), while remaining 16.1% lower than DRLC ($1.813$ ms) and 29.4% lower than PSC ($2.155$ ms). In Scene B, CCPP attains the best performance with $1.645 \pm 0.011$ ms, a 31.8% reduction compared with the runner-up RRT_ACS ($2.411$ ms). In the more challenging Scene C, CCPP records $1.198 \pm 0.086$ ms, outperforming RRT_ACS ($1.513$ ms) by 20.8%. Overall, these results indicate that CCPP keeps the per-step planning overhead at or near the lowest level among the compared methods, enabling higher-frequency online coordination and faster mission-level response.
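The percentage figures quoted in this section follow the convention stated earlier, namely comparison against the strongest competing baseline in the same scenario. A minimal sketch of that arithmetic is given below; the helper names are hypothetical, and the numbers in the usage note are the step-time means quoted in the paragraph above.

```python
def percent_reduction(ccpp_value, baseline_value):
    """Percentage by which CCPP lowers a lower-is-better metric vs. a baseline."""
    return 100.0 * (baseline_value - ccpp_value) / baseline_value


def percent_increase(ccpp_value, baseline_value):
    """Percentage by which CCPP raises a higher-is-better metric vs. a baseline."""
    return 100.0 * (ccpp_value - baseline_value) / baseline_value


def strongest_baseline(baseline_means, lower_is_better=True):
    """Pick the best non-CCPP method from a {method: mean} mapping."""
    pick = min if lower_is_better else max
    return pick(baseline_means.items(), key=lambda kv: kv[1])
```

For the Scene A step times, `strongest_baseline({"TETO": 1.470, "DRLC": 1.813, "PSC": 2.155})` selects TETO, and `percent_reduction(1.522, 1.470)` is negative (about -3.5), reflecting that CCPP is 3.5% slower there. For Scene B, `percent_reduction(1.645, 2.411)` gives about 31.8.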

Task completion time $t_{s}$ measures the total elapsed time required to accomplish the “detect-all/verify-all” objective. A smaller $t_{s}$ indicates a higher throughput of the detect–confirm loop and thus more responsive online coordination. Since completion times span different orders of magnitude across scenarios and methods, Figure 7 reports $t_{s}$ on a logarithmic scale for readability. In Figure 7, timeouts are plotted at $t_{s} = 5 \times 10^{5} s$ as a visualization penalty on the log scale, while the actual mission horizon is $T_{max} = 21{,}600 s$ and timeout cases are handled in a method-agnostic manner during statistical aggregation. According to Figure 7, CCPP yields the shortest completion time in Scenes A and B and remains competitive in Scene C. In Scene A, CCPP achieves $1.19 \times 10^{4} s \pm 4.70 \times 10^{2} s$, reducing the completion time by 36.3% relative to the fastest baseline PSC ($1.87 \times 10^{4} s$). In Scene B, CCPP attains $1.84 \times 10^{4} s \pm 4.70 \times 10^{2} s$, shortening the mission duration by 43.2% relative to the fastest competing baseline ELP ($3.25 \times 10^{4} s$) and by 48.9% relative to PSC ($3.61 \times 10^{4} s$). In the larger-scale Scene C, CCPP completes the mission in $5.99 \times 10^{2} s \pm 1.64 \times 10^{-1} s$; although this is about 28% slower than RRT_ACS ($4.68 \times 10^{2} s$), it remains substantially faster than PSC/RAST/DRLC (approximately $9.35 \times 10^{2} s$), corresponding to a 35.9% reduction in completion time. Overall, these results show that CCPP markedly compresses the end-to-end mission time relative to most baselines, thereby improving task throughput and responsiveness under online cooperative planning.

The minimum separation metric $d_{min}$ captures the closest inter-agent distance over time, and larger values indicate a larger safety margin. As can be observed from Figure 8, CCPP maintains a consistently high and well-balanced safety margin across all scenarios, achieving $1.392$, $1.131$, and $1.044$ in Scenes A–C, respectively. In Scene A, CCPP improves upon DRLC from $1.305$ to $1.392$, corresponding to an increase of about 6.7%. In Scene C, CCPP surpasses ELP from $0.870$ to $1.044$, yielding an improvement of about 20.0%. While RAST attains the highest mean in Scene B at $1.763$, it also exhibits the largest uncertainty, indicating substantial variability and reduced robustness; therefore, a higher mean alone does not imply superior overall safety performance. Considering both mean and dispersion across scenarios, CCPP offers a favorable trade-off by preserving a high safety margin with more controlled variability, demonstrating more reliable collision avoidance and safety constraint satisfaction.
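The definition of $d_{min}$ as the closest inter-agent distance over time can be illustrated with a short sketch. The data layout (a dictionary of equally sampled position lists per agent) and the function name are assumptions made for illustration; the logging format used in the experiments is not specified here.

```python
import math
from itertools import combinations


def min_separation(trajectories):
    """Smallest pairwise distance over all agent pairs and common time steps.

    trajectories: dict mapping agent id -> list of (x, y, z) positions sampled
    at the same time instants (an assumed data layout).
    """
    d_min = math.inf
    for (_, path_a), (_, path_b) in combinations(trajectories.items(), 2):
        for p, q in zip(path_a, path_b):
            d_min = min(d_min, math.dist(p, q))
    return d_min
```

A safety check then amounts to comparing the returned value with the enforced separation threshold.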

Figure 9 summarizes the coverage satisfaction metric $η_{cov}$, which evaluates how well the coverage requirement is satisfied; a higher value indicates more complete and stable area coverage. CCPP consistently achieves the best coverage satisfaction across all scenarios, reaching 98.2%/95.8%/93.0% in Scene A/B/C, respectively. Compared with the runner-up RRT_ACS, CCPP leads by 0.9/0.9/0.8 percentage points in A/B/C. Against other representative baselines, CCPP outperforms PSC by 3.3/4.3/5.4 percentage points and exceeds RAST by 1.3/2.4/3.5 percentage points in A/B/C. These results demonstrate that CCPP provides stronger coverage-constraint assurance and more reliable coverage quality across varying scenario difficulties.

Figure 10 reports the verification satisfaction score $η_{verify}$, which measures how reliably the team completes the full discover-and-confirm workflow; higher values indicate more consistent confirmation of all targets. CCPP reaches 92.6% in Scene A, improving over PSC, ELP, RAST, TETO, and DRLC by 3.9, 3.0, 2.1, 4.5, and 2.5 percentage points, respectively, and exceeding the next-best RRT_ACS by 1.7 percentage points. As the scenario becomes more challenging, CCPP remains at 88.6% in Scene B and 86.0% in Scene C, staying within 0.1 and 0.2 percentage points of the best method while still outperforming DRLC by 1.8 and 2.9 percentage points, which demonstrates robust verification performance under increasing scenario complexity.

5.2.2. Ablation Study

To further dissect the individual contributions of each core module within the CCPP framework in complex, dynamic environments, we conducted systematic ablation experiments on the most challenging Scenario C. These analyses are intended to provide both component-level sensitivity evidence and repeated-run statistical evidence for separating the roles of the decision-making layer, the vision-to-planning interface, and the consistency-related design within the overall framework. By sequentially ablating the AIDT meta-strategy module, the vision-to-planning interface module, and the Gini-based equalization synchronization mechanism, three ablation variants were derived: NoAIDT, NoVision, and NoGini. In particular, the NoVision setting is used to explicitly evaluate the contribution of the vision perception interface by removing the uncertainty-aware perceptual structuring while keeping the remaining planning framework unchanged as far as possible. All ablation experiments were performed over 20 independent repeated trials, and the evolution curves of 6 key metrics over time were comprehensively recorded. The following three figure groups focus on planning computational efficiency, coordination consistency, and mission reliability, as well as mission coverage and collision risk, respectively. Each group consists of two subfigures, accompanied by detailed quantitative analysis.

For the time-series ablation analysis in Scene C, we use a transformed progress-oriented score, denoted by $P_{G T}$, which is obtained from the raw temporal-consistency metric $G_{T}$ through a monotonic transformation. This transformed quantity is adopted only for visualization in the Scene C time-series ablation plots, so as to more intuitively reflect cumulative coordination progress within the fixed time horizon. Under this representation, a higher value of $P_{G T}$ indicates better progress in the ablation analysis, whereas in the main cross-scenario comparison the raw $G_{T}$ is retained as the formal temporal-consistency metric, for which a lower value indicates better coordination quality.

Figure 11 reports the evolution of the progress-oriented coordination score $P_{G T}$ in Scene C under the ablation settings within 0–300 s, in which the full CCPP consistently exhibits the strongest late-stage coordination progress; a higher $P_{G T}$ indicates better progress, in contrast to the raw $G_{T}$, for which a lower value is better. Specifically, $P_{G T}^{CCPP} = 0.482$, while the three ablated variants yield $P_{G T}^{NoAIDT} = 0.411$, $P_{G T}^{NoVision} = 0.461$, and $P_{G T}^{NoGini} = 0.439$. Accordingly, under this progress-oriented representation, CCPP achieves a 17.3% higher $P_{G T}$ value than NoAIDT, a 9.8% higher value than NoGini, and a 4.6% higher value than NoVision. These results indicate that the AIDT module contributes most to sustained coordination progress, and removing it leads to a pronounced degradation in late-stage cooperative effectiveness; meanwhile, the Gini-based synchronization mechanism continues to provide tangible benefits in the later phase of the mission.

Figure 12 compares the planning computation cost ${Step}_{ms}$ in Scene C within 0–300 s, where ${Step}_{ms}$ characterizes the average per-step decision-making latency. The full CCPP achieves a markedly lower computational cost in the later stage, with ${Step}_{ms}^{CCPP} = 0.398 ms$, whereas the ablated variants yield ${Step}_{ms}^{NoAIDT} = 2.024 ms$, ${Step}_{ms}^{NoVision} = 2.134 ms$, and ${Step}_{ms}^{NoGini} = 1.919 ms$. Consequently, CCPP reduces ${Step}_{ms}$ by 80.3% relative to NoAIDT, by 81.4% relative to NoVision, and by 79.2% relative to NoGini. This demonstrates that, enabled by more effective information fusion and cooperative policy execution, CCPP avoids frequent high-cost replanning in the later phase, thereby substantially reducing computational burden and improving real-time feasibility.

Figure 13 presents the minimum safety distance $d_{min}$ over 0–300 s for the Scene C ablation study, where $d_{min}$ quantifies multi-agent collision-avoidance capability during mission execution. The curves indicate that the full CCPP maintains a substantially larger safety margin in the mid-to-late phase. A minimum inter-agent separation of $250 m$ is enforced as the absolute safety threshold for the heterogeneous swarm (indicated by the red dashed line). Over time, all ablated variants, unlike the complete CCPP, are observed to repeatedly approach this critical boundary. Specifically, $d_{min}^{CCPP} = 422.5 m$, while $d_{min}^{NoAIDT} = 315.5 m$ and $d_{min}^{NoVision} = 267.6 m$; consequently, CCPP improves $d_{min}$ by 33.9% over NoAIDT and by 57.9% over NoVision. Moreover, at the mid-phase ($t \approx 200 s$), CCPP achieves $d_{min} = 667.6 m$ whereas NoAIDT only reaches $346.5 m$, a gap of about 92.7%, which further indicates that removing AIDT markedly weakens safety coordination.

Figure 14 reports the task completion-time CDF for Scene C over 0–600 s, where the CDF characterizes the fraction of trials that have completed the mission by a given time; a curve that rises earlier and is left-shifted indicates higher completion efficiency. The CCPP curve leads consistently across the entire range, suggesting faster and more stable mission execution. For instance, at $t = 300 s$, CCPP completes 50.7% of the tasks, while NoAIDT, NoVision, and NoGini complete only 32.7%, 27.7%, and 41.1%, respectively; in particular, CCPP exceeds NoVision by 23.0 percentage points in completion rate within 300 s. More importantly, the median completion time ($CDF = 0.5$) is $t_{50}^{CCPP} = 245 s$, whereas NoAIDT, NoVision, and NoGini yield $535 s$, $545 s$, and $540 s$, respectively; accordingly, CCPP reduces the median completion time by 54.2% relative to NoAIDT, demonstrating its pronounced execution-efficiency advantage in large-scale scenarios.
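The completion-time CDF and the median time $t_{50}$ can be read off a set of per-trial completion times as in the sketch below. This is an illustration under stated assumptions: timed-out trials are represented by `None`, the function names are hypothetical, and the curves in Figure 14 may have been constructed or smoothed differently.

```python
def completion_cdf(completion_times, t):
    """Fraction of trials finished by time t; None marks a timed-out trial."""
    done = sum(1 for x in completion_times if x is not None and x <= t)
    return done / len(completion_times)


def time_to_fraction(completion_times, fraction=0.5):
    """Earliest recorded completion time at which the CDF reaches `fraction`."""
    finished = sorted(x for x in completion_times if x is not None)
    for x in finished:
        if completion_cdf(completion_times, x) >= fraction:
            return x
    return None  # the requested fraction is never reached
```

Because timed-out trials stay in the denominator, a method with failures can plateau below 1.0, and `time_to_fraction` returns `None` for fractions it never reaches.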

Figure 15 illustrates the temporal evolution of the coverage efficiency $η_{cov}$ over 0–300 s, which quantifies the effective coverage gain per unit time. The full CCPP exhibits the strongest reward-focusing capability in the late phase: near $t = 295 s$, $η_{cov}^{CCPP} = 0.01887$, whereas $η_{cov}^{NoVision} = 0.01512$ and $η_{cov}^{NoGini} = 0.01405$. Accordingly, CCPP improves $η_{cov}$ by 24.8% over NoVision and by 34.3% over NoGini. This indicates that the visual observation module enhances online perception and reward focusing on effective coverage regions, while the Gini-based scheduling mechanism further improves resource/path allocation efficiency across agents, leading to sustained late-stage gains.

To characterize the reliability and timeliness of closing the verification loop, Figure 16 reports the time-series behavior of the verification efficiency $η_{verify}$ using the cumulative completion ratio under a given time threshold. Overall, CCPP remains higher and rises earlier across most of the horizon, suggesting that it more stably drives trials toward verified completion under the highly dynamic and strongly constrained Scene C. At $t_{s} = 300 s$, CCPP reaches 0.44–0.45, while NoAIDT, NoVision, and NoGini attain 0.26, 0.20–0.22, and 0.35, respectively; the corresponding relative gains of CCPP are +70%/+100%/+25%. At $t_{s} = 500 s$, CCPP reaches 0.55, compared with 0.35/0.30/0.34 for NoAIDT/NoVision/NoGini, yielding gains of +57%/+83%/+62%. By $t_{s} = 600 s$, CCPP achieves 1.0, whereas the ablated variants remain within 0.85–0.90, implying that the full framework not only completes earlier but also attains a higher final completion rate with fewer failures/timeouts. In terms of module contributions, NoVision shows a clear lag around 350–450 s, indicating insufficient structured representation of dynamic risks and uncertainties without the vision–planning interface; NoAIDT stays below CCPP throughout, suggesting weaker online replanning due to the absence of the AIDT meta-policy for efficient mode switching and weight guidance; and NoGini briefly approaches CCPP around 200–260 s but falls behind over 300–520 s, reflecting degraded spatiotemporal coordination in critical phases without the synchronization/equilibrium mechanism.

5.2.3. Supplementary Ablation on Path–Speed Coupling

To further isolate the contribution of the proposed path–speed coupling mechanism, we introduce an additional ablation variant based on decoupled sequential optimization. In this setting, geometric path planning and speed coordination are performed in sequence rather than jointly within the rolling optimization loop. More specifically, the path is first generated under the same environmental and safety constraints, after which a separate speed-coordination step is applied on the resulting path. All remaining components, including the vision-to-planning interface, the AIDT-based meta-strategy, and the consistency-related modules, are kept unchanged as far as possible.

The supplementary ablation is conducted on Scene C, which is the most challenging dynamic scenario in this paper. For fairness, the original CCPP joint optimization scheme and the decoupled sequential variant are evaluated under identical initialization settings, target realizations, and random seeds. Each method is tested over $n = 10$ independent runs. We report the synchronization indicator $G_{T}$, total mission time $t_{s}$, planning time per cycle ${Step}_{ms}$, minimum safety distance $d_{min}$, coverage rate $η_{cov}$, and verification rate $η_{verify}$ in order to evaluate the effect of path–speed coupling on temporal coordination, computational efficiency, safety, and mission effectiveness.

Therefore, this supplementary comparison provides a more direct component-level attribution for the proposed joint path–speed coupling mechanism beyond the main system ablation results.

As shown in Table 6, the proposed joint path–speed optimization scheme exhibits clear overall advantages over the decoupled sequential variant in the most critical aspects of heterogeneous cooperative planning. In terms of temporal coordination, the joint scheme reduces the synchronization indicator $G_{T}$ from 0.33 to 0.13, indicating substantially stronger arrival-time consistency among heterogeneous agents. In terms of mission completion efficiency, $t_{s}$ is reduced from 885 s to 552 s, showing that the proposed path–speed coupling enables the swarm to complete the cooperative search-and-confirmation task much faster. In terms of online computational efficiency, the average planning time per cycle is reduced from 1.39 ms to 1.19 ms, suggesting that the coupled formulation does not increase online burden but instead yields a more efficient closed-loop planning process. In terms of mission effectiveness, the coupled scheme also achieves better task outcomes, improving the coverage rate $η_{cov}$ from 86% to 92% and, more importantly, raising the verification rate $η_{verify}$ from 39% to 84%, which demonstrates a much stronger ability to complete the cue–confirmation chain under the challenging Scene C setting. It should also be noted that the decoupled variant attains a larger minimum safety distance $d_{min}$ than the joint scheme. This suggests that the sequential strategy tends to behave more conservatively in spatial separation. However, this gain comes with markedly weaker synchronization, slower task completion, and much lower verification performance. Therefore, the overall results indicate that the proposed joint path–speed coupling mechanism achieves a more desirable balance among coordination consistency, completion efficiency, online planning efficiency, and mission effectiveness, which supports its necessity in the CCPP framework.

5.2.4. Supplementary Comparison with a CV-Based Consistency Metric

To further analyze the influence of different consistency metrics on the temporal coordination behavior of heterogeneous swarms, we introduce a comparison group based on the coefficient of variation (CV) while keeping the visual interface, the AIDT mode-switching logic, and the path–speed co-optimization framework unchanged, and compare it with the original Gini-based consistency metric. Specifically, in addition to the original CCPP-Gini scheme, a CCPP-CV variant is constructed, in which the consistency evaluation and synchronization-trigger mechanism are reformulated using the arrival-time CV and the error-domain CV, while all other modules remain unchanged. For completeness, the arrival-time CV and error-domain CV used in the supplementary comparison are defined as:

\left\{ \begin{aligned} CV_{T} &= \frac{\sqrt{\frac{1}{N} \sum_{i = 1}^{N} (T_{i} - T^{*})^{2}}}{T^{*} + ε}, & T^{*} &= \frac{1}{N} \sum_{i = 1}^{N} T_{i}, \\ CV_{E} &= \frac{\sqrt{\frac{1}{N} \sum_{i = 1}^{N} (e_{i} - e^{*})^{2}}}{e^{*} + ε}, & e_{i} &= |T_{i} - T^{*}|, \quad e^{*} = \frac{1}{N} \sum_{i = 1}^{N} e_{i}. \end{aligned} \right.

(24)

where $ε > 0$ is a small positive constant introduced to avoid denominator degeneration. In addition, to provide a metric-neutral synchronization indicator, we further introduce the mean pairwise arrival-time gap

{\bar{d}}_{T} = \frac{2}{N (N - 1)} \sum_{i < j} |T_{i} - T_{j}| .

(25)
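Equations (24) and (25) translate directly into a few lines of code. The sketch below is illustrative only: the function names and the default value of `eps` are assumptions, since the text specifies only that $ε$ is a small positive constant.

```python
import math
from itertools import combinations


def _cv(values, eps):
    """Population standard deviation divided by (mean + eps)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return std / (mean + eps)


def arrival_time_cv(arrival_times, eps=1e-9):
    """CV_T from Equation (24); eps default is an assumed value."""
    return _cv(arrival_times, eps)


def error_domain_cv(arrival_times, eps=1e-9):
    """CV_E from Equation (24), built on e_i = |T_i - T*|."""
    t_star = sum(arrival_times) / len(arrival_times)
    errors = [abs(t - t_star) for t in arrival_times]
    return _cv(errors, eps)


def mean_pairwise_gap(arrival_times):
    """Mean pairwise arrival-time gap from Equation (25)."""
    n = len(arrival_times)
    total = sum(abs(a - b) for a, b in combinations(arrival_times, 2))
    return 2.0 * total / (n * (n - 1))
```

For example, arrival times of 10, 12, and 14 s give a mean pairwise gap of 8/3 s, a $CV_{T}$ of about 0.136, and a $CV_{E}$ of about 0.707.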

This comparative experiment is conducted on Scene C, which is the most challenging dynamic scenario considered in this paper. Under identical random seeds, initialization conditions, and target realizations, both CCPP-Gini and CCPP-CV are evaluated over $n = 10$ independent runs. Since the purpose of this comparison is to examine how different consistency metrics affect the coordination evolution trend rather than the final task completion instant, all metrics are evaluated at a common fixed horizon. Table 5 reports the comparative results in terms of the minimum safety distance $d_{min}$, the planning time per cycle ${Step}_{ms}$, the coverage rate $η_{cov}$, the verification rate $η_{verify}$, and the mean pairwise arrival-time gap ${\bar{d}}_{T}$. This supplementary comparison also serves as an empirical response to the metric-selection question, showing that the Gini-based design is not only theoretically well grounded, but also practically competitive under the small-scale heterogeneous swarm setting considered in this paper.

The comparison between CCPP-Gini and CCPP-CV uses a fixed 300 s evaluation horizon, and the results are summarized in Table 5. The two consistency metrics exhibit highly comparable performance in terms of the mean pairwise arrival-time gap ${\bar{d}}_{T}$, where CCPP-CV achieves 286.37 and CCPP-Gini yields 287.20. The difference between the two is marginal, indicating that CCPP-Gini can maintain a level of temporal coordination comparable to that of CCPP-CV on this metric. In contrast, CCPP-Gini demonstrates clearer advantages on the remaining key indicators: the planning time per cycle ${Step}_{ms}$ is reduced from 1.45 ms to 1.36 ms, corresponding to a decrease of approximately 6.2%; the minimum safety distance $d_{min}$ is improved from 636.56 to 660.05, corresponding to an increase of approximately 3.7%; the coverage rate $η_{cov}$ is increased from 0.01 to 0.02; and the verification rate $η_{verify}$ is improved from 0.39 to 0.44, corresponding to a relative increase of approximately 12.8%. Moreover, from the perspective of standard deviation, CCPP-Gini exhibits smaller fluctuations in ${\bar{d}}_{T}$, $d_{min}$, ${Step}_{ms}$, and $η_{verify}$, with values of 12.26, 6.88, 0.05, and 0.03, respectively, all lower than those of CCPP-CV (25.48, 35.90, 0.08, and 0.08). This indicates that CCPP-Gini achieves more stable performance across different random seeds. Overall, the Gini-based consistency metric maintains synchronization performance comparable to that of the CV-based alternative, while exhibiting superior online efficiency, safety margin, task advancement capability, and cross-seed stability.

Together with the main ablation study and the path–speed coupling comparison, this metric-level comparison further strengthens the sensitivity-style analysis of how individual components contribute to the overall performance of the proposed CCPP framework.

5.2.5. Supplementary Packet-Loss Sensitivity Analysis

Table 7 reports the supplementary packet-loss sensitivity results in Scene C. As the packet-loss rate increases from 0% to 20%, all methods exhibit a certain degree of performance degradation, which is consistent with the increased difficulty of maintaining timely coordination under communication uncertainty. Nevertheless, the overall relative ranking remains stable across all packet-loss settings.

Specifically, the full CCPP consistently achieves the best overall trade-off among coordination quality, mission efficiency, verification performance, and online planning latency. At packet-loss rates of 0%, 10%, and 20%, CCPP attains $G_{T}$ values of 0.138, 0.145, and 0.148, respectively, compared with 0.281/0.303/0.320 for ELP and 0.293/0.325/0.346 for RAST. The corresponding verification satisfaction values of CCPP remain at 86.3%, 82.0%, and 80.3%, while the average planning latency stays low at 1.22, 1.281, and 1.305 ms, respectively. Relative to ELP, CCPP reduces $G_{T}$ by 50.9%/52.1%/53.8%; relative to RAST, the reductions are 52.9%/55.4%/57.2%. In addition, CCPP improves $η_{verify}$ over ELP by 3.5/5.8/9.1 percentage points and over RAST by 2.7/7.6/11.7 percentage points across the three packet-loss settings. These results indicate that, although packet loss causes moderate absolute degradation, the relative ranking remains unchanged and the advantage of the full CCPP framework is preserved under this supplementary communication-uncertainty perturbation.

5.3. Simulation Experiments

As shown in Figure 17 and Figure 18, the final case study is not drawn from a sparse nominal environment, but from Scene C, which is the most challenging dynamic scenario considered in this paper. The complete CCPP framework generates more orderly and better separated cooperative trajectories, with clearer spatial organization toward the shared goal and confirmation region. In contrast, the ablated variants NoAIDT, NoVision, and NoGini disrupt this coordination more noticeably, leading to stronger motion entanglement and more frequent trajectory crossings. Such behaviors increase collision risk and reduce path efficiency, which further highlights the essential role of the proposed modules in maintaining robust cooperative navigation under challenging conditions such as those in Scene C. In particular, the NoAIDT and NoVision cases jointly expose decision-logic degradation and perception degradation under the same disturbed setting. Because of the perspective effect inherent in 3D visualization, these trajectory figures are mainly intended to provide a qualitative illustration of trajectory organization, separation, crossing suppression, and overall cooperative structure, rather than the exact terminal synchronization instant. The synchronization quality itself is quantitatively supported by the corresponding consistency-related metrics and Scene C ablation results reported in the experimental section above.

6. Conclusions

This paper proposed a Cooperative Cross-domain Path Planning (CCPP) framework for heterogeneous UAV–USV swarms operating in dynamically rough maritime environments. CCPP comprises three key components: (i) a vision-to-planning interface that converts pose uncertainty and semantic dynamic obstacles into inflatable safety constraints and time-varying risk fields; (ii) a hybrid path–speed alternating optimization scheme that generates curvature-feasible trajectories under explicit temporal coordination; and (iii) an Adaptive Interpretable Decision Tree (AIDT) meta-strategy, together with a normalized Gini coefficient for arrival-time consistency, enabling online replanning and mode switching.

Across three scenarios, simulations show that CCPP delivers consistently strong performance against representative baselines while preserving real-time feasibility. To attribute these gains to individual modules, we further report ablation results in Scene C. In the ablation time-series analysis, a transformed progress-oriented coordination score, denoted by $P_{G T}$ and derived from the raw temporal-consistency metric $G_{T}$, is used to visualize cumulative coordination progress within the fixed horizon. Under this representation, CCPP achieves a 17.3% higher $P_{G T}$ value than NoAIDT, reduces the per-step planning latency ${Step}_{ms}$ by approximately 80% relative to the ablated variants, and increases the minimum inter-agent separation $d_{min}$ by up to 57.9% relative to NoVision. Meanwhile, high coverage and verification performance is maintained, indicating robust mission execution under increasing scenario complexity. Overall, the ablation study indicates that each proposed module contributes materially to coordination effectiveness, efficiency, and safety.

Future work will extend the proposed framework to larger formations and more realistic deployment conditions, including field experiments with real UAVs and USVs, while explicitly modeling GNSS-denied effects, communication constraints, and more realistic sea-state dynamics.

Author Contributions

Conceptualization, Y.W. and D.L.; methodology, Y.G.; software, Y.G.; validation, Y.G. and H.Y.; formal analysis, W.W.; data curation, Y.G.; writing—original draft preparation, Y.G., L.T.; writing—review and editing, Y.G. and L.T.; visualization, H.Y.; supervision, B.G.; project administration, G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author due to research management and usage restrictions.

DURC Statement

The current research is limited to the path planning field, is intended solely for beneficial applications, and does not pose a threat to public health or national security. The authors acknowledge the dual-use potential of research involving unmanned systems and autonomous planning technologies and confirm that all necessary precautions have been taken to prevent potential misuse. As an ethical responsibility, the authors strictly adhere to relevant national and international laws on DURC. The authors advocate for responsible deployment, ethical considerations, regulatory compliance, and transparent reporting to mitigate misuse risks and foster beneficial outcomes.

Conflicts of Interest

Authors Wenliang Wang and Bing Guo were employed by the company CSSC Zhejiang Ocean Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Dinelli, C.; Racette, J.; Escarcega, M.; Lotero, S.; Gordon, J.; Montoya, J.; Dunaway, C.; Androulakis, V.; Khaniani, H.; Shao, S.; et al. Configurations and Applications of Multi-Agent Hybrid Drone/Unmanned Ground Vehicle for Underground Environments: A Review. Drones 2023, 7, 136.
2. Do, H.; Jang, J.; Kim, J. Heterogeneous multi-robot system mission planning with cooperative replenishment through data-driven rendezvous point selection. Intell. Serv. Robot. 2025, 18, 61–73.
3. Tejada, J.C.; Toro-Ossaba, A.; López-Gonzalez, A.; Hernandez-Martinez, E.G.; Sanin-Villa, D. A Review of Multi-Robot Systems and Soft Robotics: Challenges and Opportunities. Sensors 2025, 25, 1353.
4. Cao, X.; Liu, W.; Ren, L. Underwater Target Capture Based on Heterogeneous Unmanned System Collaboration. IEEE Trans. Intell. Veh. 2024, 9, 6049–6062.
5. Li, C.; Li, J.; Zhang, G.; Chen, T. IROA-based LDPC-Lévy method for target search of multi AUV–USV system in unknown 3D environment. Ocean Eng. 2023, 286, 115648.
6. Li, Z.; Zhang, W.; Wu, W.; Shi, Y. Consensus Control of Heterogeneous Uncertain Multiple Autonomous Underwater Vehicle Recovery Systems in Scenarios of Implicit Reduced Visibility. J. Mar. Sci. Eng. 2024, 12, 1332.
7. Zhou, X.; Zhang, X.; Yang, X.; Zhao, J.; Liu, Z.; Shuang, F. Towards UAV Localization in GNSS-Denied Environments: The SatLoc Dataset and a Hierarchical Adaptive Fusion Framework. Remote Sens. 2025, 17, 3048.
8. Başhan, V. Reliability assessment of autonomous maritime navigation systems under uncertainty. Ocean Eng. 2025, 333, 121588.
9. Huang, Z.; Mou, W.; Wang, R.; Li, P.; Lyu, Z.; Ou, G. A survey of GNSS receiver autonomous integrity monitoring: Research status and opportunities. Front. Phys. 2025, 13, 1567301.
10. Rodríguez-Martínez, E.A.; Flores-Fuentes, W.; Achakir, F.; Sergiyenko, O.; Murrieta-Rico, F.N. Vision-Based Navigation and Perception for Autonomous Robots: Sensors, SLAM, Control Strategies, and Cross-Domain Applications—A Review. Eng 2025, 6, 153.
11. Gao, Y.; Wang, Y.; Tian, L.; Hong, X.; Xue, C.; Li, D. Evolving adaptive and interpretable decision trees for cooperative submarine search. Def. Technol. 2025, 48, 83–94.
12. Gao, J.; Li, Y.; Li, X.; Yan, K.; Lin, K.; Wu, X. A review of graph-based multi-agent pathfinding solvers: From classical to beyond classical. Knowl.-Based Syst. 2024, 283, 111121.
13. Ma, H.; Wagner, G.; Felner, A.; Li, J.; Kumar, T.K.S.; Koenig, S. Multi-Agent Path Finding with Deadlines. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm, Sweden, 13–19 July 2018; pp. 417–423.
14. Sharon, G.; Stern, R.; Felner, A.; Sturtevant, N.R. Conflict-based search for optimal multi-agent pathfinding. Artif. Intell. 2015, 219, 40–66.
15. Han, S.; Wang, L.; Wang, Y.; He, H. A dynamically hybrid path planning for unmanned surface vehicles based on non-uniform Theta* and improved dynamic windows approach. Ocean Eng. 2022, 257, 111655.
16. Lund, A.; Hansen, P.N.; Thompson, F.; Prabowo, Y.A.; Galeazzi, R. Towards Multi-Domain SLAM in GNSS Denied, Maritime Urban Environments. IFAC-PapersOnLine 2025, 59, 142–147.
17. Bescós, B.; Fácil, J.M.; Civera, J.; Neira, J. DynaSLAM: Tracking, Mapping, and Inpainting in Dynamic Scenes. IEEE Robot. Autom. Lett. 2018, 3, 4076–4083.
18. Fan, Y.; Zhang, Q.; Tang, Y.; Liu, S.; Han, H. Blitz-SLAM: A semantic SLAM in dynamic environments. Pattern Recognit. 2022, 121, 108225.
19. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215.
20. De Maio, F.G. Income inequality measures. J. Epidemiol. Community Health 2007, 61, 849–852.
21. Dorfman, R. A Formula for the Gini Coefficient. Rev. Econ. Stat. 1979, 61, 146–149.
22. Farris, F.A. The Gini Index and Measures of Inequality. Am. Math. Mon. 2010, 117, 851–864.
23. Gao, Y.; Wang, Y.; Tian, L.; Li, D.; Wang, F. Visual Navigation Algorithms for Aircraft Fusing Neural Networks in Denial Environments. Sensors 2024, 24, 4797.
24. Sun, X.; Luo, Y.; Li, K.; Liu, G.; Zhao, J. A decoupled time-optimized multi-robot path parameterization method. Complex Intell. Syst. 2026, 12, 3.
25. Yao, J.; Qi, N. Obstacle-avoiding path planning and simultaneous arrival control for multiple autonomous underwater vehicles. Sci. China Technol. Sci. 2019, 62, 248–259.
26. Yu, X.; Zhu, Y.; Lu, L.; Ou, L. Dynamic Window with Virtual Goal (DW-VG): A New Reactive Obstacle Avoidance Approach Based on Motion Prediction. Robotica 2019, 37, 1438–1456.
27. Wen, Y.; Huang, J.; Jiang, T.; Su, X. Safe and smooth improved time elastic band trajectory planning algorithm. Control Decis. 2022, 37, 2008–2016.
28. Pohan, M.A.R.; Trilaksono, B.R.; Santosa, S.P.; Rohman, A.S. Path Planning Algorithm Using the Hybridization of the Rapidly-Exploring Random Tree and Ant Colony Systems. IEEE Access 2021, 9, 153599–153615.
29. Rösmann, C.; Hoffmann, F.; Bertram, T. Integrated online trajectory planning and optimization in distinctive topologies. Robot. Auton. Syst. 2017, 88, 142–153.

Figure 1. Three-layer architecture of the CCPP framework. The dotted arrows illustrate the cross-layer flow of information and data.

Figure 2. Vision-to-Planning module.

Figure 3. Trigger action tree structure.

Figure 4. The closed-loop process. The nodes, arranged from top to bottom, represent the progression from initiation to termination, while the internal blue and red lines indicate the directions of data flow during execution.

Figure 5. Arrival-time consistency metric $G_{T}$ across three scenarios (mean ± std).

Figure 6. Planning step time ${Step}_{ms}$ across three scenarios (mean ± std).

Figure 7. Task completion time $t_{s}$ across three scenarios (mean ± std).

Figure 8. Minimum separation $d_{min}$ across three scenarios (mean ± std).

Figure 9. Coverage satisfaction $η_{cov}$ across three scenarios (mean ± std). In the figure, the red number “1” denotes a task completion value of 100%.

Figure 10. Verification satisfaction $η_{verify}$ across three scenarios (mean ± std). In the figure, the red number “1” denotes a satisfaction value of 100%.

Figure 11. Time-series ablation of the progress-oriented coordination score $P_{GT}$ on Scene C (mean ± std). The shaded areas indicate the standard deviation around the mean curves.

Figure 12. Ablation of planning step time ${Step}_{ms}$ on Scene C (mean ± std). The shaded areas indicate the standard deviation around the mean curves.

Figure 13. Ablation of minimum safety distance $d_{min}$ on Scene C (mean ± std). The red dotted line indicates the predefined safety-distance threshold.

Figure 14. Ablation of the task completion-time CDF on Scene C (mean ± std). The shaded areas indicate the standard deviation around the mean curves.

Figure 15. Ablation of coverage efficiency $η_{cov}$ on Scene C (mean ± std). The shaded areas indicate the standard deviation around the mean curves.

Figure 16. Ablation of verification efficiency $η_{verify}$ on Scene C (mean ± std). The shaded areas indicate the standard deviation around the mean curves.

Figure 17. Representative intermediate-stage 3D cooperative trajectories in the most challenging dynamic Scene C for CCPP and NoAIDT. This figure is mainly used to illustrate the overall coordination pattern, trajectory separation, crossing suppression, and cooperative organization of the heterogeneous swarm, and to qualitatively show the effect of removing the AIDT decision logic under the same disturbed environment.

Figure 18. Representative intermediate-stage 3D cooperative trajectories in the most challenging dynamic Scene C with a shared goal and confirmation region for NoGini and NoVision. This figure is mainly used to illustrate the overall coordination pattern, trajectory separation, crossing suppression, and cooperative organization of the heterogeneous swarm. In particular, the NoVision case serves as a perception-degradation stress condition for qualitatively examining the effect of weakened vision-to-planning coupling under the same dynamic setting.

Table 1. Constraint set $C = \{C_{0}, \dots, C_{7}\}$ (Part I).

ID	Name	Content (Verbatim Description)
$C_{0}$	Mixed Cluster Constraint	Each executor satisfies $i \in I = I_{uav} \cup I_{usv}$ , and the planning solution is $\{p_{i} (\cdot), v_{i}\}$ .
$C_{1}$	Boundary Conditions and Mission Area Constraints	Each executor starts from its initial region at mission start, and under online replanning the current vehicle state is treated as the updated initial condition. During each planning cycle, the agent shall reach its currently assigned stage-dependent task region while always remaining within the admissible mission area and outside forbidden zones,
		$p_{i} (0) \in G_{i}, p_{i} (L_{i}) \in g_{i}, p_{i} (t) \in Ω ∖ Ω^{forbid}, \forall t \in [0, T_{i}]$ .
		Here, $g_{i}$ denotes the currently assigned task region, which may correspond to a coverage subregion or frontier in the search stage, or a cue-confirmation region in the verification stage.
$C_{2}$	Kinematic and Curvature Feasibility Constraints for Heterogeneous Platforms	To ensure that the generated trajectory is executable, the geometric path is represented in the planar Serret–Frenet frame and must satisfy curvature and speed feasibility constraints. The curvature is defined by $d t_{i} / d s = κ_{i} (s) n_{i} (s)$ and is related to the heading/yaw-rate control through ${\dot{ψ}}_{i} (t) = v_{i} (s) κ_{i} (s)$ .
		$κ_{i} (s) \leq κ_{i}^{max}, v_{i}^{min} \leq v_{i} (s) \leq v_{i}^{max}$ .
		For UAVs, the heading-rate and altitude-layer constraints are enforced as
		$|{\dot{ϕ}}_{i} (t)| \leq {\dot{ϕ}}_{i}^{max}, z_{i}^{min} \leq z_{i} (t) \leq z_{i}^{max}, i \in I_{uav}$ .
		For USVs, the yaw-rate and navigable-water constraints are enforced as
		$|{\dot{φ}}_{i} (t)| \leq {\dot{φ}}_{i}^{max}, p_{i} (t) \in Ω^{nav}, i \in I_{usv}$ .
$C_{3}$	Safety Constraints for Environmental Collision Avoidance and Uncertainty Inflation	To enhance safety under localization/perception uncertainty and sea condition disturbances, all platforms must maintain a minimum safe distance from static and dynamic obstacles, and the safety margin expands with uncertainty:
		$dist (p_{i} (t), O^{sta} \cup O^{dyn} (t)) \geq d_{i}^{safe} + Δ_{i} (t), \forall t \in [0, T_{i}]$
		where the uncertainty inflation term is explicitly computed from the pose covariance as $Δ_{i} (t) = α_{Σ} \sqrt{λ_{max} (Σ_{i} (t))}$ .
		Simultaneously, a minimum distance must be maintained within the team:
		$dist (p_{i} (t), p_{j} (t)) \geq d_{i j}^{safe} + Δ_{i j} (t), \forall i \neq j, \forall t \in [0, T]$ .
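
The uncertainty-inflated clearance test in $C_{3}$ can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a planar 2 × 2 pose covariance and point obstacles, and the function names as well as the numeric values chosen for $α_{Σ}$, $d^{safe}$, and the covariance are hypothetical.

```python
import math


def max_eigenvalue_2x2(cov):
    """Largest eigenvalue of a symmetric 2x2 covariance [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    mean = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return mean + radius


def inflation(cov, alpha_sigma):
    """Delta_i(t) = alpha_Sigma * sqrt(lambda_max(Sigma_i(t)))."""
    return alpha_sigma * math.sqrt(max_eigenvalue_2x2(cov))


def is_clear(position, obstacles, d_safe, cov, alpha_sigma):
    """True if every obstacle is at least d_safe + Delta away from position."""
    margin = d_safe + inflation(cov, alpha_sigma)
    return all(math.dist(position, obs) >= margin for obs in obstacles)


# Hypothetical numbers, chosen only for illustration.
cov = [[4.0, 0.0], [0.0, 1.0]]            # pose covariance in m^2
delta = inflation(cov, alpha_sigma=2.0)    # inflation term in m
ok = is_clear((0.0, 0.0), [(10.0, 0.0)], d_safe=5.0, cov=cov, alpha_sigma=2.0)
```

Using the largest eigenvalue makes the margin follow the worst-case direction of the pose uncertainty, so the safety buffer grows as localization degrades regardless of the obstacle bearing.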

Table 2. Constraint set $C = \{C_{0}, \dots, C_{7}\}$ (Part II).

ID	Name	Content (Verbatim Description)
$C_{4}$	Communication Reachability and Coordination Variable Exchange Constraint	The joint maritime anti-submarine search considered here relies on clue transmission from UAVs to USVs and state feedback from USVs to UAVs. To guarantee online coordination, at each planning update time in the set $τ_{upd}$ , at least one communicable link must exist between coordinating agents:
		$\exists j \in I : dist (p_{i} (t), p_{j} (t)) \leq R_{i j}^{comm}, \forall t \in τ_{upd}$ .
		This constraint ensures that predicted states, mode flags, and clue hotspot information can be synchronized online.
$C_{5}$	Coverage Search Constraint	To ensure effective coverage of the search area $Ω$ , the team’s coverage quality must reach a minimum threshold $η$ :
		$Cov (\{p_{i} (\cdot)\}_{i \in I}, Ω) \geq η$
		where $Cov (\cdot)$ can be defined by metrics such as grid visitation rate, continuous coverage degree, or revisitation frequency; its specific form is given in the subsequent experimental setup.
$C_{6}$	Coupling Constraint from Clue to Confirmation	Considering the cross-domain search chain for underwater targets, after a UAV generates a clue $c (t_{c})$ of a suspicious area, within a given confirmation window at least one USV must enter the confirmation range to perform search and detection:
		$\exists i \in I_{usv} : dist (p_{i} (t), c (t_{c})) \leq R_{verify}$ .
		This constraint closes the loop between “cross-domain clue generation” and “surface search verification,” thereby supporting joint search and online task reallocation.
$C_{7}$	Temporal Consistency (Near-Synchronous) Constraint	For scenarios requiring coordinated encirclement, synchronized point verification, or synchronized revisitation, the team’s arrival time difference can be explicitly constrained:
		$|T_{i} - T_{j}| \leq ε, \forall i, j \in I$
		where $ε$ is the allowable upper bound on the synchronization error; in practice, it can also be treated as a soft constraint and incorporated into the synchronization consistency cost term $J^{sync}$ to enhance feasibility and robustness under strong disturbances and limited communication.
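
The hard and soft forms of the temporal consistency constraint $C_{7}$ can be illustrated with a short sketch. It assumes that the soft form is measured by the standard Gini coefficient (mean absolute difference normalized by twice the mean) applied to the arrival times; the exact definition of $G_{T}$ used by the paper is given in its method section and may differ. The function names and the sample arrival times are hypothetical.

```python
from itertools import combinations


def max_arrival_gap(arrival_times):
    """Largest pairwise arrival-time difference max |T_i - T_j|."""
    return max(
        (abs(ti - tj) for ti, tj in combinations(arrival_times, 2)),
        default=0.0,
    )


def satisfies_c7(arrival_times, eps):
    """Hard form of C7: every pairwise gap is at most eps."""
    return max_arrival_gap(arrival_times) <= eps


def gini(values):
    """Standard Gini coefficient: sum_ij |x_i - x_j| / (2 * n^2 * mean)."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    total = sum(abs(x - y) for x in values for y in values)
    return total / (2 * n * n * mean)


# Hypothetical arrival times in seconds for a 2 UAV + 2 USV team.
arrivals = [598.0, 601.0, 604.0, 597.0]
hard_ok = satisfies_c7(arrivals, eps=10.0)
soft_score = gini(arrivals)  # 0 means perfectly synchronized arrival
```

The hard test depends only on the two most extreme agents, whereas the Gini-style score reflects the spread of the whole team, which is one reason a soft penalty of this kind is convenient as a cost term.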

Table 3. Key simulation and evaluation parameters used in our experiments.

Item	Setting
Time step	$Δ t = 0.2$ s
Mission horizon/timeout	$T_{max} = 21{,}600$ s (if not completed, counted as timeout; see below)
Planning/update schedule	planning frequency 1 Hz ( $Δ t_{p} = 1.0$ s); control/logging frequency 10 Hz; all methods follow the same receding-horizon update schedule within each scene.
Optimization solver	warm-started alternating path–speed optimization in a receding-horizon loop; path and speed are updated sequentially within each planning cycle, with lightweight updates or local/global replanning triggered by the AIDT module.
Objective structure	the planner uses the same cost-function composition and constraint set defined in Section 4, including synchronization consistency, risk-aware planning, and task-related constraints; all compared methods are evaluated under the same implementation-level settings unless explicitly ablated.
Scenes (region size)	Scene A: $2 \times 2$ km ( $L = 1000$ m); Scene B: $5 \times 5$ km ( $L = 2500$ m); Scene C: $10 \times 10$ km ( $L = 5000$ m)
Team size	Scenario-dependent; all methods use the same team size within each scene
Targets	$N_{tar} = 8$ per run; sampled uniformly with a fixed seed
Coverage grid	cell size $s_{cell} = 50$ m
Discover/verify radii	UAV discovery radius $r_{disc} = 200$ m; USV verification radius $r_{ver} = 50$ m
Ocean current disturbance	spatially uniform, time-varying current $v_{c} (t) = [0.8 sin (0.05 t), 0.5 cos (0.04 t)]$ (m/s)
Inter-agent safety	$d_{min}$ computed as the minimum pairwise distance over all agents (worst-case over time)
GNSS-denied setting	GNSS is assumed unavailable. The planner operates on simulator-generated navigation-state estimates as a proxy for onboard pose estimates, together with their associated uncertainty information, rather than on idealized GNSS truth. The resulting uncertainty is injected into planning through the vision-to-planning interface according to the validated visual navigation model reported in [23].
Statistical reporting	mean ± std over n seeds; 95% CI computed as $\bar{x} \pm t_{0.975, n - 1} s / \sqrt{n}$
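
Two of the entries in Table 3 translate directly into code: the time-varying ocean current and the worst-case inter-agent separation $d_{min}$. The sketch below is illustrative only; the function names and the sample trajectories are hypothetical, and it assumes that all agents are logged at the same time stamps.

```python
import math
from itertools import combinations


def ocean_current(t):
    """Spatially uniform current v_c(t) = [0.8 sin(0.05 t), 0.5 cos(0.04 t)] in m/s."""
    return (0.8 * math.sin(0.05 * t), 0.5 * math.cos(0.04 * t))


def min_separation(trajectories):
    """Minimum pairwise distance over all agent pairs and all logged steps.

    trajectories: one list of positions per agent, sampled at identical times.
    """
    d_min = math.inf
    for traj_a, traj_b in combinations(trajectories, 2):
        for pa, pb in zip(traj_a, traj_b):
            d_min = min(d_min, math.dist(pa, pb))
    return d_min


# Hypothetical two-step logs for three agents (positions in metres).
logs = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.0, 3.0), (1.0, 4.0)],
    [(5.0, 0.0), (1.0, 2.0)],
]
d_min = min_separation(logs)      # worst case over pairs and time
current_at_start = ocean_current(0.0)
```

Taking the minimum over both pairs and time makes $d_{min}$ a conservative safety indicator: a single close approach anywhere in the run determines the reported value.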

Table 4. Scenario settings and heterogeneous cluster configurations.

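Scenario	Agent Group	Area
Scenario A (Simple & Static)	2 UAV + 2 USV	2 km × 2 km
Scenario A (Simple & Static)	3 UAV + 3 USV	2 km × 2 km
Scenario B (Moderately Dynamic)	2 UAV + 2 USV	5 km × 5 km
Scenario B (Moderately Dynamic)	3 UAV + 3 USV	5 km × 5 km
Scenario B (Moderately Dynamic)	4 UAV + 4 USV	5 km × 5 km
Scenario C (Complex & Highly Dynamic)	2 UAV + 2 USV	10 km × 10 km
Scenario C (Complex & Highly Dynamic)	3 UAV + 3 USV	10 km × 10 km
Scenario C (Complex & Highly Dynamic)	4 UAV + 4 USV	10 km × 10 km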

Table 5. Supplementary comparison between the Gini-based and CV-based consistency metrics on Scene C (mean ± std over 10 seeds, evaluated at the same fixed 300 s horizon).

Method	${\bar{d}}_{T}$	${Step}_{ms}$	$d_{min}$	$η_{cov}$	$η_{verify}$
CCPP (Gini)	$287.20 \pm 12.26$	$1.36 \pm 0.05$	$660.05 \pm 6.88$	$0.02 \pm 0.01$	$0.44 \pm 0.03$
CCPP_CV	$286.37 \pm 25.48$	$1.45 \pm 0.08$	$636.56 \pm 35.90$	$0.01 \pm 0.02$	$0.39 \pm 0.08$
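
As a worked example of the 95% confidence-interval formula $\bar{x} \pm t_{0.975, n - 1} s / \sqrt{n}$ from the reporting convention, the sketch below applies it to the ${Step}_{ms}$ column of Table 5. It assumes that the reported "std" is the sample standard deviation $s$ over $n = 10$ seeds, and it hard-codes the tabulated Student-t quantile for 9 degrees of freedom rather than computing it; the function names are hypothetical.

```python
import math

# Two-sided 95% Student-t quantile for n - 1 = 9 degrees of freedom
# (standard table value, rounded to three decimals).
T_975_DF9 = 2.262


def ci_half_width(std, n, t_quantile):
    """Half-width t * s / sqrt(n) of the confidence interval."""
    return t_quantile * std / math.sqrt(n)


def confidence_interval(mean, std, n, t_quantile):
    """Return (lower, upper) bounds of the confidence interval."""
    half = ci_half_width(std, n, t_quantile)
    return mean - half, mean + half


# Step_ms entries of Table 5 (mean, std), n = 10 seeds.
gini_ci = confidence_interval(1.36, 0.05, 10, T_975_DF9)
cv_ci = confidence_interval(1.45, 0.08, 10, T_975_DF9)
```

Because the half-width shrinks with $\sqrt{n}$, the interval is considerably narrower than the mean ± std band shown in the table, which is worth keeping in mind when comparing the two rows.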

Table 6. Supplementary comparison between the joint and decoupled path–speed optimization schemes in Scene C (mean ± std over 10 runs).

Method	$G_{T}$	$t_{s}$	${Step}_{ms}$	$d_{min}$	$η_{cov}$	$η_{verify}$
CCPP (Joint)	$0.13 \pm 0.02$	$552 \pm 0.16$	$1.19 \pm 0.09$	$284 \pm 0.03$	$0.92 \pm 0.21$	$0.84 \pm 0.17$
CCPP-DS	$0.33 \pm 0.08$	$885 \pm 0.23$	$1.39 \pm 0.07$	$517 \pm 0.79$	$0.86 \pm 0.11$	$0.39 \pm 0.04$

Table 7. Supplementary packet-loss sensitivity analysis in Scene C (mean ± std). Packet loss is injected at the planning-interface level through a Bernoulli dropout model on shared coordination updates; all other settings are identical to those of the original Scene-C experiment.

Packet Loss Rate	Method	$G_{T}$ ↓	$t_{s}$ ↓	$η_{verify}$ ↑	${Step}_{ms}$ ↓
0%	ELP	0.281 ± 0.03	1680 s	82.8%	1.720 ms
0%	RAST	0.293 ± 0.01	945 s	83.6%	1.640 ms
0%	CCPP	0.138 ± 0.02	599 s	86.3%	1.220 ms
10%	ELP	0.303 ± 0.032	1814 s	76.2%	1.858 ms
10%	RAST	0.325 ± 0.011	1049 s	74.4%	1.820 ms
10%	CCPP	0.145 ± 0.021	629 s	82.0%	1.281 ms
20%	ELP	0.320 ± 0.034	1915 s	71.2%	1.961 ms
20%	RAST	0.346 ± 0.012	1115 s	68.6%	1.935 ms
20%	CCPP	0.148 ± 0.021	641 s	80.3%	1.305 ms

Note: ↑ indicates that a higher value is better, while ↓ indicates that a lower value is better.
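
The Bernoulli dropout model named in the caption of Table 7 can be sketched as follows. This is an assumption-laden illustration rather than the authors' code: each shared coordination update is dropped independently with probability equal to the packet-loss rate, and the receiver is assumed to hold the last successfully delivered value. The hold-last-value policy and the function name are hypothetical.

```python
import random


def simulate_dropout(updates, loss_rate, seed=0):
    """Apply Bernoulli packet loss to a sequence of coordination updates.

    Each update is dropped independently with probability loss_rate.
    The receiver keeps the last delivered value (None before any delivery).
    """
    rng = random.Random(seed)
    received = []
    last = None
    for update in updates:
        if rng.random() >= loss_rate:  # delivered with probability 1 - loss_rate
            last = update
        received.append(last)
    return received


# Hypothetical stream of update indices at the 1 Hz planning rate.
sent = list(range(20))
seen_at_20_percent = simulate_dropout(sent, loss_rate=0.2, seed=1)
stale_steps = sum(1 for s, r in zip(sent, seen_at_20_percent) if r != s)
```

Counting the steps at which the receiver works from stale data gives a simple measure of how much coordination information the planner loses at a given loss rate.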

