Review

A Comprehensive Review of Path-Planning Algorithms for Multi-UAV Swarms

1 National Key Laboratory of Science and Technology on Advanced Light-Duty Gas-Turbine, Institute of Engineering Thermophysics, Chinese Academy of Sciences, Beijing 100190, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Key Laboratory of Space Utilization, Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing 100094, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Drones 2026, 10(1), 11; https://doi.org/10.3390/drones10010011
Submission received: 7 October 2025 / Revised: 15 December 2025 / Accepted: 25 December 2025 / Published: 26 December 2025
(This article belongs to the Section Artificial Intelligence in Drones (AID))

Highlights

What are the main findings?
  • A scenario-conditioned taxonomy (mission × environment dynamics) is established for UAV swarms, mapping centralized, decentralized, and hybrid planners to nine common settings with summary tables and quantitative evidence.
  • Cross-scenario trade-offs among responsiveness, safety, scalability, and energy are synthesized, identifying best-fit choices and typical failure modes under real-world constraints (limited computing, unstable links, imperfect sensing).
What are the implications of the main findings?
  • Provide deployment-oriented guidance for selecting planners given mission constraints and resources, together with a concise evaluation checklist to enable reproducible comparisons.
  • Outline near-term R&D priorities: adaptive planners on resource-constrained platforms; scalable multi-objective planning with safety guarantees; sim-to-real benchmarks/digital twins; energy-aware hierarchical planning; and coupling offline pre-planning with online replanning.

Abstract

Collaborative multi-UAV swarms are central to many missions. This review covers the most recent two years of the literature and organizes it with a scenario-aligned taxonomy of 12 cells (Path/Distribution/Coverage × offline/online × static/dynamic), nine of which are well populated and analyzed. For each populated cell, representative techniques, reported limitations, and scenario-appropriate use are summarized, and cross-scenario trade-offs are made explicit, including scalability versus energy efficiency and centralized versus decentralized (hybrid) architectures. The review also links offline pre-planning to online execution through architecture choices, digital-twin validation, and safety-aware collision avoidance in cluttered airspace. Unlike prior algorithm-centric or bibliometric surveys, this work applies a scenario-conditioned taxonomy, ties best-suited method families to each populated cell, and surfaces reported limitations alongside trade-offs, yielding deployment-oriented guidance that maps methods to mission context. Finally, five near-term priorities are highlighted: (i) compute-aware real-time adaptivity on resource-constrained platforms; (ii) scalable multi-objective scheduling with coupled motion and cooperative control; (iii) bandwidth-aware, conflict-resilient intra-swarm communication with reliability guarantees; (iv) certifiable planning for dense urban low-altitude corridors; and (v) energy-aware, hierarchical planners that couple offline pre-planning with online replanning.

1. Introduction

UAVs have gained widespread adoption in military, agricultural, and rescue operations because of their low cost, compact size, rapid deployment ability, and high stealth capabilities [1,2,3]. Compared with single UAVs, UAV swarms provide broader search coverage, higher operational efficiency, and improved task performance. UAV missions generally fall into six categories: roundup, reconnaissance, surveillance, transportation, target strike, and search. Regardless of mission type, effective path planning underpins the success of UAV swarms. Consequently, designing robust path-planning algorithms is essential for efficient mission execution.
As enabling technologies advance, path-planning algorithms are evolving toward greater intelligence and autonomy. In the early 2000s, classical algorithms—such as Dijkstra, A*, Voronoi diagrams, and Rapidly Exploring Random Trees (RRT)—were commonly employed to compute globally optimal paths in 2D and 3D discrete spaces. However, their adaptability to dynamic environments is limited, and they respond sluggishly to sudden changes. During the 2020s, bio-inspired intelligent algorithms—such as genetic algorithms, particle swarm optimization, and ant colony optimization—enhanced obstacle-avoidance and path-adjustment capabilities in dynamic scenes, thereby improving global search efficiency for UAV swarms. However, these methods still struggle with high computational complexity and slow convergence when faced with cluttered or uncertain environments. The rapid growth of machine learning has brought deep learning (DL) and reinforcement learning (RL) into UAV path planning. Training on large-scale data endows UAV swarms with self-learning, adaptive behavior, and autonomous decision-making. Moreover, multi-agent cooperation and multi-sensor fusion (e.g., vision and LiDAR) have emerged as pivotal research themes. Current efforts aim to boost algorithm generalization, lessen data dependence, and strike a balance between real-time performance and computational overhead.
Over the past decade, multi-UAV swarm planning has progressed substantially, with meaningful results reported across surveillance, disaster response, inspection, and logistics, among other application domains [4,5,6]. The pace has further accelerated in the most recent four years, driven by improved onboard computing, sensing, and communication, as well as the rapid development of learning-based and hybrid planners [5,6]. Despite this growth, the literature remains fragmented: path planning, task distribution, and area coverage are often treated in isolation, and reported conclusions depend strongly on whether missions are planned offline or online and whether environments are static or dynamic. Consequently, practitioners still lack a concise and deployment-oriented overview that links mission context to suitable planning architectures and algorithm families and makes cross-scenario trade-offs and limitations explicit. To address this need, this review focuses on studies published in the most recent four years and organizes them using a scenario-conditioned taxonomy defined by mission type (Path/Distribution/Coverage), planning mode (offline/online), and environment type (static/dynamic).
Several recent surveys provide useful background from complementary viewpoints. Aggarwal and Kumar [2] review UAV path-planning techniques and classify them into sampling-based and artificial intelligence approaches. Yang et al. [3] group 3D path-planning methods into sampling-based, node-based, model-based, bio-inspired, and multi-fusion categories. Ait Saadi et al. [7] organize algorithms by optimization approach (classical, heuristic, meta-heuristic, machine-learning, and hybrid), while Zhang et al. [8] focus on cooperative path planning for multi-UAV systems. Cetinsaya et al. [4] review control and path-planning methods for UAVs and swarms and summarize open challenges. A recent multi-UAV survey [5] compares meta-heuristic, classical, heuristic, machine learning, and hybrid planners mainly in terms of performance and complexity. Wu et al. [6] use bibliometric tools to analyze 2000–2024 UAV path-planning papers. Ghambari et al. [9] propose a method-centric taxonomy for multi-robot motion planning, and Chen et al. [10] and Bui [11] review multi-robot navigation and motion planning from broader robotic perspectives. Athira et al. [12] survey multi-robot task allocation mainly for ground robots.
These works provide a comprehensive picture of algorithm families, system architectures, and research trends, but they are primarily algorithm-centric or platform-agnostic and do not organize methods by UAV-specific mission type, planning mode, and environment. In contrast, the present review uses a scenario-conditioned taxonomy defined by mission (Path/Distribution/Coverage), planning mode (offline/online) and environment type (static/dynamic). Table 1 crosswalks this taxonomy against the above surveys so that the differences in scope and emphasis are explicit.
Prior surveys organize methods by lineage or paradigm, but practitioners decide under operational context—what mission is being flown (Path/Distribution/Coverage), whether planning is offline or online, and whether the environment is static or dynamic. Without a scenario-conditioned lens, evidence remains fragmented: results that are optimal on static maps may break under online dynamics, and algorithms that scale in simulation may stall when bandwidth or energy budgets are tight. This review therefore treats mission × planning mode × environment as the primary axis (12 cells, 9 populated).
Based on this organization, the main contributions of the review are as follows:
(1)
A scenario-conditioned taxonomy for multi-UAV swarm planning is established and applied consistently across the paper. The taxonomy covers twelve mission–planning–environment cells, nine of which are populated by recent work.
(2)
For each populated cell, representative algorithm families are summarized together with their reported limitations, and cross-scenario trade-offs (e.g., scalability versus energy efficiency, centralized versus decentralized or hybrid architectures) are discussed using a common set of evaluation lenses.
(3)
The review provides deployment-oriented guidance by linking offline and online planning through architecture choices, digital-twin-based validation, and safety-aware collision avoidance, and by outlining scenario-specific research directions for disaster response, public-safety surveillance, urban logistics, and other application domains.
The structure of this paper is as follows. Section 2 defines the classification criteria and method families. Section 3 reviews the algorithms across the nine scenarios. Section 4 discusses cross-cutting issues (architecture selection, digital twins, collision avoidance). Section 5 synthesizes trade-offs and scenario-conditioned guidance. Section 6 formulates open scientific problems. Section 7 concludes.

2. Classification and Analysis of Path-Planning Algorithms for Multi-UAV Swarms

2.1. Classification Criteria

2.1.1. Mission Types

The primary mission types for UAV swarms can be grouped into three categories [8]: path, distribution, and coverage.
  • Path Missions: Path missions require multiple UAVs to launch from the same or different sites and then either converge on a common target or maintain designated relative formations during flight. For instance, roundup and surveillance operations typically demand convergence at the target to guarantee mission effectiveness. Path missions commonly optimize either minimal flight time or shortest path length to maximize execution efficiency.
  • Distribution Missions: Distribution missions involve UAVs departing from the same or different launch points and navigating to distinct spatial locations to execute independent tasks. In transport and target-strike operations, UAV swarms must match mission demands with available vehicles, optimizing both allocation and routing to maximize efficiency and accuracy. The core challenge lies in developing algorithms with sufficient computational efficiency and real-time responsiveness to accommodate dynamic distribution requirements.
  • Coverage Missions: Coverage missions feature uncertain objectives; search and reconnaissance operations, for example, require UAV swarms to explore every point within the designated area. During execution, planners must minimize mission cost while avoiding omissions of critical regions and redundant coverage of low-value zones. Because coverage environments are typically cluttered and dynamic, algorithms must offer high computational efficiency and real-time adaptability.
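To make the coverage objective concrete, the following minimal sketch generates a boustrophedon ("lawnmower") sweep over an obstacle-free rectangle at a given sensor-footprint spacing. It is an illustration only (the `boustrophedon_sweep` helper is hypothetical, not from the surveyed literature); practical coverage planners must additionally handle obstacles, no-fly zones, and multi-UAV area partitioning.

```python
def boustrophedon_sweep(width, height, footprint):
    """Generate back-and-forth sweep waypoints covering a width x height
    rectangle with parallel tracks spaced by the sensor footprint.
    Returns a list of (x, y) waypoints; (0, 0) is the area's corner."""
    waypoints = []
    x = footprint / 2.0          # center the first track inside the area
    forward = True               # sweep direction alternates per track
    while x < width:
        y_start, y_end = (0.0, height) if forward else (height, 0.0)
        waypoints.append((x, y_start))
        waypoints.append((x, y_end))
        x += footprint
        forward = not forward
    return waypoints

# Example: a 100 m x 60 m field with a 20 m footprint needs 5 tracks.
path = boustrophedon_sweep(100.0, 60.0, 20.0)
```

Minimizing redundant coverage then amounts to choosing the track spacing close to (but not above) the true sensor footprint, while omission avoidance requires the tracks to span the full area width.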

2.1.2. Planning Methods

Depending on whether prior knowledge of the mission environment is available, path planning is categorized as either online or offline.
  • Online Planning: Online planning dynamically revises a UAV swarm’s flight path during execution based on real-time environmental data and mission updates. As environmental or mission conditions evolve, the path is continually replanned. Consequently, online-planning algorithms must deliver high real-time performance and robust stability to cope with dynamic scenarios.
  • Offline Planning: By contrast, offline planning pre-computes a UAV swarm’s flight path using known environmental data and mission requirements before deployment. Because the path remains fixed during execution, offline-planning algorithms must guarantee global optimality to ensure efficient mission completion.
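The operational difference between the two modes can be sketched in a few lines. In the toy example below (illustrative only; `bfs_path` and `fly` are hypothetical helpers on a 4-connected occupancy grid), the offline plan is computed once from the known map, while the online loop replans whenever a newly observed obstacle appears; the sketch assumes a feasible path always exists.

```python
from collections import deque

def bfs_path(grid, start, goal):
    """Shortest 4-connected path on an occupancy grid (True = blocked)."""
    rows, cols = len(grid), len(grid[0])
    prev, frontier = {start: None}, deque([start])
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols \
                    and not grid[nr][nc] and nxt not in prev:
                prev[nxt] = cell
                frontier.append(nxt)
    return None

def fly(grid, start, goal, new_obstacles=()):
    """Offline: plan once on the known map and follow it. Online: replan
    whenever a new obstacle is observed (new_obstacles maps step index
    -> blocked cell). Assumes a feasible path always exists."""
    obstacles = dict(new_obstacles)
    plan, pos, flown, step = bfs_path(grid, start, goal), start, [start], 0
    while pos != goal:
        if step in obstacles:                  # environment change observed
            r, c = obstacles[step]
            grid[r][c] = True
            plan = bfs_path(grid, pos, goal)   # online replan from current pose
        pos = plan[plan.index(pos) + 1]
        flown.append(pos)
        step += 1
    return flown

# An obstacle appearing mid-flight forces an online detour around (2, 0).
grid = [[False] * 3 for _ in range(3)]
flown = fly(grid, (0, 0), (2, 2), new_obstacles=((1, (2, 0)),))
```

With no observed changes, the loop degenerates to pure offline execution of the pre-computed plan, which is exactly why static, fully known environments rarely justify the online overhead.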

2.1.3. Environment Types

Based on mission-space characteristics, environments are categorized as either static or dynamic.
  • Static Environments: A static environment features fixed obstacles and terrain throughout the planning process. Under such conditions, UAV swarms may acquire complete environmental information and pre-plan paths before deployment. Because frequent real-time computation is unnecessary, algorithms for static environments should prioritize computational efficiency to generate rapid, globally optimal routes.
  • Dynamic Environments: A dynamic environment contains moving obstacles or time-varying conditions (e.g., other aircraft or emerging threats). In such settings, UAV swarms must perceive environmental changes and replan paths in real time; hence, dynamic-environmental algorithms require high real-time performance and flexibility to cope with complex conditions.

2.2. Algorithm Classification

Drawing on three criteria—mission type, planning method, and environment type—this review cross-classifies UAV-swarm mission scenarios into 12 distinct categories, as illustrated in Figure 1. For example, path-planning problems for online operation in dynamic environments (PND), distribution-mission problems for online planning in static environments (DNS), and coverage problems for offline planning in dynamic environments (CFD) are representative cases.
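The three-letter cell codes follow a simple convention—mission letter (P/D/C), planning mode (N = online, F = offline), and environment (S = static, D = dynamic)—so the full grid can be enumerated mechanically, as in this illustrative sketch (the populated/unpopulated split follows the classification applied in Section 3):

```python
from itertools import product

MISSIONS = {"P": "Path", "D": "Distribution", "C": "Coverage"}
MODES = {"N": "online", "F": "offline"}
ENVS = {"S": "static", "D": "dynamic"}

# All 12 mission x planning-mode x environment cells; e.g. "PND" is a
# Path mission, planned online, in a dynamic environment.
CELLS = ["".join(c) for c in product(MISSIONS, MODES, ENVS)]

# The three cells reported as unpopulated in the surveyed literature.
UNPOPULATED = {"PNS", "DNS", "CNS"}
POPULATED = [c for c in CELLS if c not in UNPOPULATED]
```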
Algorithm categorization in this review follows the specifications listed below:
(1)
This review systematically surveys domestic and international literature from the past two years on UAV-swarm path-planning algorithms and summarizes the eligible algorithms according to the proposed classification criteria.
(2)
The absence of algorithms in some categories (e.g., PNS, DNS, CNS) does not imply that these scenarios are unsolvable; the specific reasons are as follows:
  • Marginal benefit vs. cost. In static, fully known maps, continuous re-planning provides little improvement over a high-quality offline plan, yet incurs persistent onboard computing, communication, and energy overhead; most pipelines therefore pre-compute (PFS/DFS/CFS) and keep only a lightweight safety layer online.
  • Problem reformulation. PNS typically degenerates to offline global routing with reactive near-field safety (e.g., APF/MPC) rather than full online re-planning. DNS is commonly handled as offline static scheduling (DFS), with periodic batch updates if tasks drip in; once meaningful exogenous changes appear, it is re-cast as DND. CNS reduces to deterministic coverage path planning (CFS) via decomposition or spanning-tree methods, since the environment does not change; adding online decision making brings little gain without dynamics.
  • Benchmarking and applicability. “Online–static” lacks clear trigger events for re-planning, so widely used benchmarks emphasize offline–static or online–dynamic settings. In regulated airspace, certification also favors pre-flight plans, further steering research away from online–static cells.
  • When these cells matter. They become meaningful only if “static” maps hide latent uncertainty (e.g., intermittent sensing or map incompleteness) or if long-duration missions impose periodic re-optimization; such cases are routinely labeled online–dynamic in practice and therefore appear among the populated cells.
(3)
Each publication is assigned to a single classification category. When an algorithm spans multiple mission forms, it is placed in the most representative category to simplify classification and highlight its primary features.
(4)
To avoid redundancy, algorithms applicable to multiple scenarios are discussed only within the scenario most relevant to their core contribution.
Unlike prior taxonomies that cluster methods primarily by algorithmic lineage or optimization style [2,3,4,5,6,7,8,9,10,11,12], the grid in Figure 1 uses mission type (Path/Distribution/Coverage), planning mode (offline/online), and environment type (static/dynamic) as its primary axes. Methods from different families are grouped into the same scenario cell whenever they address the same mission–planning–environment combination, and each cell is described using a common set of attributes: planning architecture (centralized/decentralized/hybrid), planning horizon, main constraints (safety, communication, energy, payload), evaluation metrics, and reported limitations or failure modes. This consistent description serves two purposes. First, it enables cross-scenario generalization: by reading along rows or columns, one can see how the same algorithm families behave when moving, for example, from Path to Distribution missions or from static to dynamic environments. Second, it supports deployment-oriented lookup: a practitioner can locate the scenario cell closest to a given project and immediately see which combinations of architecture and algorithm families have been tested under comparable assumptions, together with their known trade-offs.
Subsequent sections examine each mission scenario in turn, detailing the applicable algorithms and their latest advances.

3. Research Status of Path-Planning Algorithms for Multi-UAV Swarms

Building on the categorization criteria introduced in the previous section, this review partitions the mission space into nine scenarios, while omitting three rarely applied categories. The subsequent sections detail the research status of these nine scenarios, analyzing how different algorithm families perform and evolve across mission contexts. Related algorithms beyond these scenarios are also summarized to provide additional context.
This section adopts a nine-scenario taxonomy—PND, PFS, PFD, DND, DFD, DFS, CND, CFD, and CFS—to organize recent advances in multi-UAV swarm planning. To avoid redundancy, we first give one-time, family-level definitions and explicitly map each family to the scenarios where it is best suited, stating why it works there (real-time feasibility, scalability, implementability, or coupling with routing/assignment) and what its principal limitations are. When a family spans multiple scenarios, we clarify the scenario-specific mechanisms that make it effective in each case. Subsequent subsections do not repeat family primers; they report only scenario-specific adaptations, empirical performance, and reported limitations, ensuring concision while preserving technical fidelity.
  • Reinforcement Learning (RL):
RL learns policies or value functions from interaction and is best suited to PND, where both the environment and costs are time-varying and online adaptivity is required; it is also effective for decentralized tasking in DND under partial observability and bandwidth limits and can wrap global planners as a local decision layer [13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38]. Its main limitations are sample inefficiency, long training time, non-trivial reward/safety design, sim-to-real gaps, and potential instability under distribution shift.
  • RRT family (RRT, RRT*):
Sampling-based, kinodynamically feasible planning provides anytime solutions: RRT-style planners enable rapid replans in PND when costs or obstacles change online [39,40]. Convergence can be slow in narrow passages or with tight curvature/clearance requirements, post-smoothing is often necessary, and performance depends on the sampling policy and collision-checking budget.
  • Artificial Potential Field (APF):
As reactive near-field safety modules, APFs complement global routes and provide low-latency collision avoidance in PND [41,42,43,44,45]. They are lightweight and easy to deploy, but can suffer from local minima, oscillations near boundaries, sensitivity to gain tuning, and lack global completeness unless coupled with a global planner.
  • Model Predictive Control (MPC):
Model-based look-ahead control is particularly suitable for PND when kinodynamic and safety envelopes dominate and maintains track-keeping under mild disturbances [46,47,48,49]. The approach is compute-intensive, sensitive to model mismatch and horizon choices, and feasibility guarantees hinge on constraint formulations.
  • Ant Colony Optimization (ACO):
Pheromone-guided constructive search is attractive when allocation and routing are tightly coupled: online co-optimization in DND under lossy communications via pheromone re-deposition, and discrete waypoint selection in PND [50,51,52,53,54]. It is sensitive to pheromone/heuristic tuning, struggles on very large graphs, and requires robust evaporation/reset policies for dynamic changes.
  • Meta-heuristic and bio-inspired algorithms:
Population-based global search thrives in large online spaces and coupled assignment–routing pipelines across PND/DND/CND thanks to strong global exploration and easy hybridization with clustering, graph seeding, or local refiners [55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70]. The trade-offs are parameter sensitivity (population size, inertia/cooling, operators), stochastic variability across runs, lack of deterministic optimality, and wall-time growth with dimension or fleet size, unless parallelized.
  • Supervised Learning (SL):
With labeled data, SL replaces hand-crafted heuristics and accelerates decision making in PFS, where environments are static and data are abundant [71]. It offers limited runtime adaptivity and degrades under domain shifts; labels can be costly, and learned surrogates may lack safety guarantees.
  • Graph-search algorithms:
Heuristic and incremental graph searches provide reproducible baselines with optimal/anytime properties on known maps in PFS [72,73,74]. Performance depends heavily on heuristic quality and memory; scalability degrades on very large graphs or high-dimensional kinodynamics, and frequent global replans can be expensive.
  • Genetic Algorithm (GA):
GA encodes solutions for selection–crossover–mutation and is used for offline multi-objective trajectory optimization in PFS, large static assignment in DFS when exact solvers slow down, and coverage tour ordering/refinement in CFS [75,76,77,78,79,80,81,82]. Its limitations are early convergence without diversity control, solution-quality dependence on operator design, and high evaluation costs for complex fitness functions.
  • Particle Swarm Optimization (PSO):
With simple velocity–position updates, PSO scales well and is widely used to encode time-varying costs pre-mission in PFD, to solve multi-objective temporal couplings for allocation in DFD (often with rolling horizons), and to generate sweep paths and adaptive routing for moving targets in CFD [83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108]. Premature convergence and parameter sensitivity (inertia/learning factors) are the main risks; runtime grows with dimensionality and rapid time variation unless mitigated by graph seeding or clustering-assisted hybrids.
  • Unsupervised Learning (UL):
Clustering and embedding methods discover structures without labels to partition tasks/areas and balance load/bandwidth—dynamic partitioning in DND [109,110]. Quality depends on metric and cluster-number choices; methods are sensitive to scaling/outliers and may drift without periodic rebalancing.
  • Area-segmentation algorithms:
Cellular decomposition, Voronoi/centroid partition, and spanning-tree formulations divide the workspace into manageable regions for assignment and routing, underpinning deterministic coverage in CND [111,112,113,114,115]. Typical caveats include boundary oscillations, resolution dependence, potential load imbalance without dynamic reweighting, and the need for robust connectors to stitch sub-tours across regions.
  • Differential Evolution (DE):
Real-valued differential mutation with greedy selection often achieves faster numerical convergence than GA for continuous parametrizations and route refinement in CFS and is also used inside some PSO/GA hybrids as a local improver [116]. It remains sensitive to scale factors and population size, can stagnate on rugged landscapes, and—like other meta-heuristics—lacks deterministic optimality guarantees.
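To ground the family descriptions above in concrete terms, the following minimal sketch implements the canonical PSO velocity–position update on a toy two-dimensional quadratic objective. It is illustrative only (the `pso_minimize` helper is hypothetical); the surveyed planners encode waypoint sequences and multi-objective mission costs rather than a simple quadratic, but the inertia/attraction structure is the same.

```python
import random

def pso_minimize(cost, dim, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Canonical PSO: each particle keeps a velocity and a personal best,
    and is attracted toward both its personal best and the swarm's
    global best; positions are clamped to the search bounds."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost

# Toy objective: squared distance to the point (1, 2).
best, best_cost = pso_minimize(lambda p: (p[0] - 1) ** 2 + (p[1] - 2) ** 2, dim=2)
```

The parameter sensitivity noted above is visible directly in this loop: the inertia weight `w` and learning factors `c1`/`c2` control the exploration–exploitation balance, and poor settings cause premature convergence to the current global best.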
Table 2 provides a one-page taxonomy and qualitative comparison guide that links each algorithm family to the nine scenario cells and summarizes key deployment-relevant factors (runtime compute footprint, offline training/optimization cost, and typical roles). It complements Figure 1 (taxonomy axes) and Section 4.4 (scenario-level recommendations) by making the scenario alignment and resource/applicability trade-offs explicit.
Reading guide for Section 3: Table 2 is used as the navigation map for the scenario-by-scenario review. For each algorithm family, the Primary fit column lists the scenario cells in which the family is most used as the main planning component, while the Secondary fit column lists cells in which it is typically used as an auxiliary layer (e.g., as a safety wrapper, a seeding mechanism, or a local refiner in a hybrid pipeline). Table 2 also provides qualitative labels for runtime planning compute and offline training/optimization cost, together with brief notes on typical roles and applicability assumptions. Each scenario subsection in Section 3 then follows a consistent template: (i) the key operational constraints for the scenario cell, (ii) the primary-fit families and what roles they play, (iii) secondary-fit families and common hybridization, and (iv) a short comparative synthesis with pointers to the corresponding scenario-specific summary tables (Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9, Table 10 and Table 11).
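The primary-fit relationships described above can also be read as a simple lookup table. The sketch below is a simplified, illustrative encoding of the primary fits stated in the family summaries earlier in this section (it is not a substitute for Table 2, and the family keys are shorthand introduced here for illustration):

```python
# Primary scenario fits per algorithm family, as stated in the
# family-level summaries of this section (simplified; see Table 2).
PRIMARY_FIT = {
    "RL": ["PND", "DND"],
    "RRT": ["PND"],
    "APF": ["PND"],
    "MPC": ["PND"],
    "ACO": ["DND", "PND"],
    "meta-heuristic": ["PND", "DND", "CND"],
    "SL": ["PFS"],
    "graph search": ["PFS"],
    "GA": ["PFS", "DFS", "CFS"],
    "PSO": ["PFD", "DFD", "CFD"],
    "UL": ["DND"],
    "area segmentation": ["CND"],
    "DE": ["CFS"],
}

def candidate_families(cell):
    """Return the families whose primary fit includes the scenario cell."""
    return sorted(f for f, cells in PRIMARY_FIT.items() if cell in cells)
```

A practitioner facing, say, an offline static distribution problem (DFS) can query the table to shortlist families before consulting the scenario subsection for adaptations and reported limitations.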
As a practical complement to Table 2 and the reading guide above, it is also useful to clarify how workspace and configuration-space representations affect execution time and, consequently, whether a method is more suitable for online or offline use. Representation choice is a major driver of runtime. Many studies rely on occupancy grids/voxels (2D cost maps or 3D voxel maps), which support graph search and rasterized RL observations but scale strongly with resolution and update frequency. Discrete graphs/roadmaps (waypoint or task graphs) reduce continuous geometry to combinatorial search, which is efficient for moderate graph sizes but costly when dense discretization is required. Sampling-based planners and APF-style local layers typically operate in continuous spaces and rely on collision checks or distance queries against voxel/mesh models, making the collision-checking budget a key real-time bottleneck. Perception-driven pipelines may start from point clouds, but voxelization or surface reconstruction can dominate runtime, even if the planner is lightweight. Correspondingly, in the reviewed literature, graph search is most often grid-based; RRT/RRT* depends on collision checking; APF uses local cost maps or distance fields; MPC requires model fidelity and often obstacle convexification; RL typically uses rasterized or point-cloud-derived features with offline training; PSO/GA/DE optimize waypoint parameterizations over DEM/cost rasters; and ACO/segmentation/clustering operate on discrete graphs or partitioned regions. This representation-centric view helps interpret why certain families appear more often in online cells (fast local layers, bounded collision checking) than offline or slow-horizon cells (high-resolution grids, large graphs, or heavy preprocessing).
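As a concrete illustration of the collision-checking budget, the hypothetical helper below checks a straight 2D segment against an occupancy grid by sampling at half the cell size; the number of occupancy queries, and hence runtime, grows as the resolution is refined (an illustrative model, not a surveyed implementation).

```python
import math

def segment_collides(grid, cell_size, p0, p1):
    """Check a 2D segment against an occupancy grid (True = occupied) by
    sampling at half the cell size. Returns (hit, n_queries) so the
    collision-checking budget can be measured directly."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    n = max(1, int(math.hypot(dx, dy) / (0.5 * cell_size)))
    queries = 0
    for k in range(n + 1):
        t = k / n
        col = int((p0[0] + t * dx) / cell_size)
        row = int((p0[1] + t * dy) / cell_size)
        queries += 1
        if grid[row][col]:
            return True, queries
    return False, queries
```

Halving the cell size roughly doubles the queries per edge; multiplied over the thousands of edges a sampling-based planner tests, this is why collision checking, rather than tree or graph bookkeeping, is often the real-time bottleneck.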

3.1. PND Problem (Path, Online, Dynamic)

3.1.1. Reinforcement-Learning Algorithms

Recent work on the PND scenario uses reinforcement learning (RL) in mainly two ways: learning a global routing policy that cooperates with a lighter near-field safety layer and learning decentralized policies that coordinate coverage or pursuit under partial observability. Across these designs, RL is attractive because it can handle time-varying costs and uncertain environments, but training is resource-hungry and safety is usually enforced by penalties rather than formal constraints.
The first line of work combines a learned virtual leader with local safety controllers. Hu [13] and Wu et al. [20] both let a single DDQN/PPO leader propose global headings, while individual UAVs maintain formation and avoid obstacles via artificial potential field (APF) or flocking rules. These architectures centralize the main decision loop into one actor–critic, which improves scalability and keeps onboard policies simple, while APF-style modules enforce basic separation. They work well in simulated low-altitude scenes with known threat maps or static/moving obstacles, but rely on accurate maps and sensor models, tuned reward weights, and reliable short-range communication. Safety and kinodynamic feasibility are encoded through potential fields and penalties rather than certified guarantees.
A second group of studies uses centralized training with decentralized execution (CTDE) for online path planning and informative coverage. Westheider et al. [14] employ COMA to coordinate multi-UAV exploration on a 3D grid; Arranz et al. [16] train PPO-based sub-agents for search, tracking, and obstacle avoidance under a centralized task allocator; Azzam et al. [21] and Wang et al. [22] adopt actor–critic variants (including MASAC) so that critics access global maps during training while actors run onboard from local observations and limited map sharing. These methods show that RL can scale to medium-size teams and handle partial observability, but they usually assume fixed-altitude planar motion, ideal sensing, and communication, and they still require additional safety layers to cope with cluttered, certification-bound 3D airspace.
A third cluster couples RL with task allocation, pursuit, or richer interaction models. Kong et al. [15] integrate a TD3 motion policy with a supervised target-assignment network, using Hungarian labels as a teaching signal so that assignment and collision-free routing are optimized jointly. Wang et al. [17] design decentralized D3QN policies for data collection in the presence of non-cooperative UAVs and jammers; K-means clustering and distance-based deterrence terms keep complexity manageable while maintaining connectivity. Liu [23] formulates roundup as a game between pursuers and an intelligent evader using Apollonius-circle geometry and a Q-learning-driven payoff matrix to coordinate joint actions, while Liu et al. [26] tackle multi-UAV moving-target search as a DEC-POMDP with a high–low-altitude collaborative architecture and MADDPG-based policies, showing that multi-layered RL can exploit field-of-view and altitude coupling but remains sensitive to FoV/range assumptions and reward design. Cheng et al. [18], Wu et al. [24], and Zhao et al. [25] add hierarchy, model-based lookahead, or motion-primitive libraries around RL cores to improve exploration and respect kinematic constraints. These pipelines make the roles of learning, assignment, and low-level control clearer, but inherit similar limitations: heavy dependence on simulation-based training, sensitivity to reward design and hyper-parameters, and a lack of formal robustness guarantees when sensing or communication degrade.
Overall, PND-oriented RL planners demonstrate strong adaptability to dynamic costs, moving obstacles, and partial observability, and they integrate well with virtual-leader, APF, MPC, or motion-primitive modules [13,14,15,16,17,18,19,20,21,22,23,24,25,26]. However, most studies still rely on 2D or fixed-altitude abstractions, idealized sensing and communication, and penalty-based safety. Bringing these methods closer to deployment will require dynamics-aware safety layers, more realistic environment and sensor models, and careful accounting of onboard computational and energy budgets.
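The penalty-based safety that recurs across these RL planners can be made concrete with a small sketch. The function below is a hypothetical shaped reward (the weights, radii, and thresholds are illustrative placeholders, not values from any cited paper) combining goal progress with soft collision and separation penalties; as noted above, such terms discourage rather than preclude unsafe states.

```python
import math

def shaped_reward(pos, goal, prev_pos, obstacles, neighbors,
                  w_prog=1.0, w_coll=10.0, w_sep=2.0,
                  r_obs=1.0, d_safe=0.5):
    """Penalty-based reward shaping: progress toward the goal minus soft
    safety penalties. Safety is encoded only as penalties (as in the
    surveyed work), not as hard kinodynamic constraints."""
    dist = math.dist
    # Progress term: reduction in distance to goal since the last step.
    progress = dist(prev_pos, goal) - dist(pos, goal)
    # Obstacle penalty: grows linearly once inside the inflated radius r_obs.
    coll = sum(max(0.0, r_obs - dist(pos, o)) for o in obstacles)
    # Separation penalty: fires when closer than d_safe to a teammate.
    sep = sum(max(0.0, d_safe - dist(pos, n)) for n in neighbors)
    return w_prog * progress - w_coll * coll - w_sep * sep
```

A step that advances toward the goal while staying clear of obstacles and teammates earns a positive reward; grazing an obstacle flips the sign, which is exactly the soft, tunable notion of safety the section critiques.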

3.1.2. The Rapidly Exploring Random Tree Algorithm

RRT- and RRT*-style planners in the PND scenario mainly act as kinodynamically feasible global routers that embed timing, aerodynamics, or connectivity directly into the sampling process. Rather than relying on lattice-based grids, these methods explore a continuous state–time space and then smooth the resulting branches into flyable trajectories, making them natural components for missions where flight dynamics and explicit time-stamping matter (e.g., tight corridors, formation timing, or connectivity-aware routing).
Burzyński et al. [39] present a spacetime extension of RRT* that builds a “spacetime tunnel” around a centroid-to-goal backbone and then plans time-stamped, dynamically feasible trajectories for each platform within that tunnel, explicitly incorporating fixed-wing aerodynamics and inter-vehicle separation constraints. The method uses informed sampling to focus search, transforms 2D plans into 3D spacetime to enable collision checking with dynamic agents, and employs geometry-based collision avoidance and Dubins/Bezier trajectory primitives to respect platform maneuverability. In simulation, the algorithm reliably finds feasible swarm trajectories for small teams and reduces search failures by increasing sample counts, offering a practical component for obstacle-dense missions where flight dynamics and timing matter. Limitations include a 2D → spacetime abstraction (fixed-altitude/point-mass simplifications in some steps), sensitivity to sampling and tunnel parameters (σ, D, sample count), and reliance on ideal sensing/communication for dynamic-collision checks; deploying in certification-bound, cluttered 3D airspace would require explicit dynamics-aware safety certification and on-board robustness to sensing/comm failures.
Kelner et al. [40] extend RRT*-style sampling to explicitly account for intra-swarm communication by embedding a Flying Ad Hoc Network model into the planner: planned segments are constrained not only by collision-free reachability but also by radio-link quality and a minimum-spanning-tree connectivity requirement so that every pair of agents remains connected (possibly via multi-hop) during flight. The modification limits sampling radii by acceptable bit-error rates and uses a dynamic MST rebuild to track changing link quality as platforms move, while informed sampling around a centroid-to-goal backbone reduces search effort. This makes the planner practical for missions where maintaining communication and safe separation are both critical (e.g., search-and-rescue), because it couples physical reachability, propagation-aware link budgets, and timing into the same planning loop. Key caveats are the reliance on propagation-model fidelity and line-of-sight assumptions, sensitivity to communication and sampling parameters, and the need for near-real-time topology updates; these assumptions and parameter sensitivities must be addressed before deploying the method in cluttered, certification-bound 3D airspace.
Overall, spacetime RRT* and its FANET-aware extensions show that sampling-based planners can incorporate aerodynamic envelopes, inter-vehicle separation, and communication quality in a single loop, and reliably generate feasible multi-UAV trajectories in dynamic, obstacle-dense scenes. Their main limitations come from the 2D → spacetime abstraction, sensitivity to sampling and tunnel/connectivity parameters, and the assumption of ideal sensing and near-real-time topology updates. Moving toward certified 3D deployments will require dynamics-aware safety layers and robustness to sensing and communication degradation [39,40].
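The core spacetime idea, appending a time coordinate to each node so that edges can be checked against moving agents, can be sketched in a few lines. The toy planner below is not the Multiplatform Spacetime RRT* of [39] (it omits rewiring, informed sampling, and the tunnel construction); it only illustrates how time-stamped sampling enables collision checks against a caller-supplied moving obstacle:

```python
import math, random

def spacetime_rrt(start, goal, obstacle_fn, bounds=10.0, speed=1.0,
                  step=0.8, goal_tol=1.0, max_iter=4000, seed=0):
    """Minimal spacetime RRT: states are (x, y, t); each edge advances time
    with the travelled distance, and collision checks query the obstacle's
    position at the edge's time stamp (a toy stand-in for dynamic agents)."""
    rng = random.Random(seed)
    nodes = [(start[0], start[1], 0.0)]
    parent = {0: None}
    for _ in range(max_iter):
        sx, sy = rng.uniform(0, bounds), rng.uniform(0, bounds)
        # Nearest node in space; time is handled implicitly by fixed speed.
        i = min(range(len(nodes)),
                key=lambda k: math.dist(nodes[k][:2], (sx, sy)))
        x, y, t = nodes[i]
        d = math.dist((x, y), (sx, sy))
        if d < 1e-9:
            continue
        nx = x + step * (sx - x) / d
        ny = y + step * (sy - y) / d
        nt = t + step / speed            # time advances with arc length
        ox, oy, orad = obstacle_fn(nt)   # moving obstacle at the new time
        if math.dist((nx, ny), (ox, oy)) < orad:
            continue                     # edge endpoint collides: reject
        nodes.append((nx, ny, nt))
        parent[len(nodes) - 1] = i
        if math.dist((nx, ny), goal) < goal_tol:
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k]); k = parent[k]
            return path[::-1]            # time-stamped waypoints
    return None
```

With, for example, `obstacle_fn = lambda t: (5.0, 5.0 + 0.1 * t, 1.0)` (a drifting disk of radius 1), the returned waypoints carry strictly increasing time stamps, which is what makes separation constraints against other time-stamped trajectories expressible in the same loop.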

3.1.3. Artificial Potential Field (APF) Methods

In the PND scenario, artificial potential field (APF) methods are primarily used as reactive near-field safety layers and as mission-specific shaping tools around a higher-level route or formation pattern. The surveyed work shows APF being adapted to jammer-aware tracking, affine-formation maneuvering, Theta*-guided formation flight, source seeking, and loiter–attack behaviors, demonstrating how the same basic potential-field idea can be tailored to different operational objectives while keeping computation lightweight.
Xiang et al. [41] formulate multi-target tracking with malicious jammers as a joint interference-minimization problem and decompose it into three cooperating modules: a cluster-evolutionary target association (CETA) for adaptive sub-swarm division and per-target assignment, a jamming-sensitive and singular-case-tolerant artificial potential field (JSSCT-APF) for trajectory generation that incorporates jammer avoidance into the potential field, and a jamming-aware mean-field game (JA-MFG) for distributed power control that trades off link quality and total interference. The dynamic collaboration scheme alternates updates of association, trajectory, and transmit power so the swarm adapts to moving targets and jammers. The paper demonstrates the method’s practical focus by integrating propagation-aware SINR constraints, collision avoidance, and energy limits into the optimization loop. The approach is effective where jammer locations or statistics are approximately known and communication/sensing are reliable, but it depends on propagation-model fidelity, tuned weightings and thresholds, and on-board capability for frequent topology/SINR updates; these issues warrant attention when porting the scheme to cluttered or certification-constrained 3D airspace.
Kang et al. [42] propose an affine-formation maneuvering scheme that combines a global virtual-leader trajectory with local, APF-based trajectory optimization so that each vehicle can locally replan to avoid obstacles while the team preserves its prescribed geometric pattern. The virtual leader acts as a low-rate, global reference (the centroid of selected leaders), and the local planner uses a second-order differentiable virtual force field to produce smooth, dynamically consistent local trajectories; local updates are computed via a receding-horizon RK4 discretization and tuned spring-like gains that trade cohesion against repulsion from obstacles. In simulation, the method improves tracking accuracy and preserves formation integrity under sudden trajectory changes, but this practicality comes with costs and assumptions: the experiments use a two-dimensional, fixed-altitude setting with idealized sensing and communication and involve only a small team (seven agents), and the increased robustness is bought at higher computational expense and sensitivity to APF/spring-constant tuning; these points should be considered when comparing scalability, kinodynamic feasibility, and certification-ready deployment.
Zhao et al. [43] propose a Theta*–APF hybrid that embeds an artificial-potential-field heuristic into an omni-directional Theta* grid search, so the planner prefers low-potential corridors and thus reduces unnecessary node expansions. The method is applied to formation flight under a virtual-leader scheme and is evaluated in 2D and 3D voxel simulations. Results reported in the paper show substantially fewer search nodes, shorter planned paths, fewer inflection points, and much lower search time than a baseline A* implementation, indicating that coupling APF heuristics with any-angle search can materially improve computational efficiency and path smoothness in cluttered maps. Important caveats remain: the approach relies on grid/voxel discretization (raising memory/search loads in high-resolution 3D maps), inherits APF’s susceptibility to local minima and parameter sensitivity, and assumes reliable map knowledge and idealized sensing/communication for formation control; the reported gains may shrink under noisy perception, stricter kinodynamic constraints, or on-board compute limits, so applying the method to real-world, certification-bound 3D operations requires further robustness and runtime profiling.
Chen et al. [44] reformulate swarm source seeking as optimization of a combined navigation function that merges obstacle repulsion, inter-UAV spacing, and source intensity, and use normalized extremum-seeking to follow the function’s gradient without explicit position information. The APF component supplies collision avoidance for obstacles and inter-vehicle safety, while normalized ESC stabilizes updates and avoids the large transients typical of standard ESC. The scheme is validated in single-UAV and swarm settings, including software-in-the-loop tests, showing practical integration with a PX4 velocity loop and LiDAR sensing. The approach assumes reliable onboard sensing, fixed-altitude envelopes, and convexity of the map underlying the navigation function; safety is enforced by potential-based penalties rather than kinodynamic guarantees, and performance depends on tuning of gains and perturbation frequencies, so transfer to cluttered, certification-bound 3D airspace would require dynamics-aware safety layers and robustness to sensing/communication imperfections.
Li et al. [45] augment artificial potential field planning with a bounded repulsive “halo” at the target so that vehicles are attracted toward the objective but settle into a loiter posture rather than collapsing onto the goal; a weak inter-UAV repulsion keeps separation during transit, and a bee colony controller adjusts swarm cohesion, dispersion, and speed to coordinate heterogeneous roles. The design is simple to implement and aligns the planner with the mission’s loiter–attack behavior, yet it inherits APF’s sensitivity to gains and local-minimum traps, presumes reliable target localization and ideal sensing, and enforces safety through penalties rather than kinodynamic guarantees; deployment in cluttered, certification-bound airspace would require a dynamics-aware safety layer and parameter-robust tuning.
Taken together, these APF-based designs confirm that potential fields remain an effective way to encode obstacle, jammer, and inter-UAV constraints at low latency, especially when wrapped around virtual-leader or graph-search backbones. At the same time, they inherit APF’s well-known vulnerabilities—local minima, oscillations, parameter sensitivity, and geometric (non-kinodynamic) safety assumptions—and almost all results rely on fixed-altitude abstractions with ideal sensing and communication. Therefore, for cluttered, certification-bound airspace, APF layers need to be coupled with dynamics-aware safety mechanisms and more systematic gain/threshold tuning [41,42,43,44,45].
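For reference, the basic potential-field command these designs build on can be written compactly. The sketch below uses the classical attractive/repulsive form with an added inter-UAV separation term; all gains and radii are illustrative placeholders rather than values from [41,42,43,44,45]:

```python
import math

def apf_velocity(pos, goal, obstacles, neighbors,
                 k_att=1.0, k_rep=2.0, k_sep=1.0,
                 d0=2.0, d_sep=1.0, v_max=1.0):
    """Classic APF command: attraction to the goal, bounded-range repulsion
    from obstacles, and short-range inter-UAV separation; the summed
    gradient is clipped to a velocity limit. Gains are illustrative."""
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        d = math.dist(pos, (ox, oy))
        if 1e-9 < d < d0:  # repulsion only inside the influence radius d0
            mag = k_rep * (1.0 / d - 1.0 / d0) / d**2
            fx += mag * (pos[0] - ox) / d
            fy += mag * (pos[1] - oy) / d
    for qx, qy in neighbors:
        d = math.dist(pos, (qx, qy))
        if 1e-9 < d < d_sep:  # soft inter-UAV spacing term
            fx += k_sep * (d_sep - d) * (pos[0] - qx) / d
            fy += k_sep * (d_sep - d) * (pos[1] - qy) / d
    norm = math.hypot(fx, fy)
    if norm > v_max:  # clip to the platform's speed envelope
        fx, fy = fx * v_max / norm, fy * v_max / norm
    return fx, fy
```

The repulsive magnitude diverges as distance shrinks, which is what makes the method a good low-latency near-field layer, and also why local minima (attraction and repulsion cancelling) and gain sensitivity are its recurring failure modes.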

3.1.4. Model Predictive Control Algorithm

Model predictive control (MPC)-based planners in the PND scenario are aimed at missions where kinodynamic constraints, obstacle clearance, and energy budgets dominate and where a receding-horizon formulation can justify the computational cost. The reviewed work illustrates three main patterns: centralized MPC for indoor multi-UAV navigation, MPC guided by weather forecasts in threat fields, and distributed MPC augmented with meta-heuristics or APF for better global search and disturbance rejection.
Kallies et al. [46] extend an MPC planner to multi-UAV indoor navigation by solving a receding-horizon MILP that embeds linearized vehicle dynamics, obstacle and separation constraints, and a battery equivalent-consumption term that triggers orderly return when energy is low. The planner updates obstacle sets and waypoint coverage online, assigns rooms/waypoints to different agents, and produces time-stamped, dynamically feasible segments that the vehicles can track with a faster low-level controller. The approach makes the trade-offs explicit between feasibility, coverage, and energy, and demonstrates mission-oriented behaviors such as covering nearby waypoints on the way home. Its practicality is balanced by assumptions and costs: linearized 2D indoor models with ideal sensing, reliance on a fast MILP solver and warm starts, sensitivity to horizon and big-M parameters, and the absence of hard kinodynamic safety proofs. In certification-bound, cluttered 3D deployments, additional dynamics-aware safety layers and communication-loss handling would be required.
Fan et al. [47] couple a receding-horizon MPC with an LSTM forecaster so the planner anticipates short-term wind and convective activity and then reoptimizes the route at each step using a threat field that encodes wind, thunderstorms, and no-fly regions. The LSTM predicts atmospheric parameters on a gridded map; the MPC linearizes vehicle motion and enforces obstacle and separation constraints while adjusting headings to avoid predicted storm cells and compensate for drift. An artificial bee colony routine is used to tune LSTM hyperparameters, so forecasts are stable enough for online use. The approach is well aligned with weather-driven missions where map knowledge is available and forecasts are credible, but it inherits assumptions of fixed-altitude 2D motion, reliable sensing, and communication, and it relies on solver speed and forecast fidelity; applying the method in cluttered, certification-bound 3D airspace would require dynamics-aware safety layers and robustness to forecast errors.
Wang et al. [48] embed a chaotic Gray Wolf Optimizer inside a neighbor-aware distributed MPC so that each vehicle solves its finite-horizon problem with stronger global exploration, while local feasibility comes from MPC constraints and a receding-horizon update. An event-triggered policy suppresses unnecessary solves by checking distance, angle, tracking error, and input bounds against thresholds; no-fly zones are added as penalties and the network shares only neighbor estimates, which keeps messaging light. The design cleanly separates roles (CGWO for global search, MPC for constraint handling, and event triggers for compute throttling), but it rests on a 2D, fixed-altitude point-mass model with ideal sensing/links, depends on solver speed and threshold tuning, and provides penalty-based safety rather than hard kinodynamic guarantees. Deployment in cluttered, certification-bound 3D airspace would require dynamics-aware safety layers and robustness to sensing/communication loss.
Xian and Song [49] combine an offline MPC planner that yields smooth, dynamically feasible global routes with an online APF that is modified by a regulating force to escape local minima and to mitigate lingering near the goal. Dynamic obstacles are handled by an event-triggered switch: the vehicle deviates from the APF only when a threat radius is violated and then returns to the MPC path at a computed recovery point, maximizing reuse of the preplanned trajectory. The formulation linearizes vehicle motion for the global MPC, adds soft collision and compactness constraints via slack variables, and limits APF action to the local plane, where dynamic obstacles are sensed. This separation improves responsiveness under moving hazards while keeping the global path quality, but it assumes fixed-altitude 2D motion and ideal sensing/communications, depends on solver and APF gain tuning, and relies on penalty-based safety rather than hard kinodynamic guarantees—considerations that would need dynamics-aware safety layers and robustness to sensing/comm loss for certification-grade deployments.
Overall, these MPC pipelines offer strong track-keeping, explicit constraint handling, and interpretable trade-offs between coverage, safety, and energy, making them attractive for high-assurance missions with moderate team sizes. Their practicality is nevertheless bound by linearized, fixed-altitude models, reliance on fast MILP or QP solvers, sensitivity to horizon and weighting choices, and the lack of formal kinodynamic safety proofs in cluttered 3D scenes. For deployment on resource-constrained swarms, MPC layers must be paired with dynamics-aware safety shields, robust communication-loss handling, and careful profiling of solver load [46,47,48,49].
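The receding-horizon pattern itself is independent of the solver. The sketch below replaces the MILP/QP solvers used in the surveyed work with a brute-force enumeration over a small discrete heading set (tractable only for toy horizons) to show the plan, score, commit-first-action loop; the weights, horizon, and action set are illustrative assumptions:

```python
import itertools, math

def receding_horizon_step(pos, goal, obstacles, horizon=3, speed=0.5,
                          headings=(-0.6, -0.3, 0.0, 0.3, 0.6),
                          w_goal=1.0, w_obs=50.0, r_safe=1.0):
    """One receding-horizon update: enumerate heading-offset sequences over
    a short horizon, score terminal goal distance plus soft obstacle
    penalties along the rollout, and commit only the first action
    (MPC style). A toy stand-in for the MILP/QP solvers in [46-49]."""
    base = math.atan2(goal[1] - pos[1], goal[0] - pos[0])
    best_cost, best_first = float("inf"), 0.0
    for seq in itertools.product(headings, repeat=horizon):
        x, y, cost = pos[0], pos[1], 0.0
        for dh in seq:
            x += speed * math.cos(base + dh)
            y += speed * math.sin(base + dh)
            # Soft clearance penalty, not a hard constraint.
            for ox, oy in obstacles:
                cost += w_obs * max(0.0, r_safe - math.dist((x, y), (ox, oy)))
        cost += w_goal * math.dist((x, y), goal)
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return base + best_first  # commanded heading for the next step
```

Calling this at every control step and re-sensing the obstacles reproduces the receding-horizon behavior described above; with an obstacle directly on the bearing to the goal the commanded heading deviates, and with a clear path it points straight at the goal.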

3.1.5. Ant Colony Optimization Algorithm

In the PND scenario, ant-colony-based planners are used as constructive, exploration–exploitation searchers that unify mapping, coverage, and routing in a single process. The surveyed approaches either decentralize ant-like exploration with shared revisit and cost maps, or pair ACO with clustering front-ends so that allocation and routing are co-designed, highlighting ACO’s ability to coordinate multiple UAVs with modest onboard computation.
Wee [50] proposes a decentralized swarm strategy that builds a shared revisit map and a return-to-home cost map while agents explore, using ant-foraging logic to spread vehicles and a light communication scheme to exchange only local positions and partial maps. Each UAV senses obstacles with time-of-flight rings and chooses among eight heading primitives; coverage proceeds without a central router, and the accumulated maps later provide a feasible path home for every agent. The design is attractive for GNSS-denied search tasks because mapping and routing are produced in one pass and computation per agent is small. At the same time, the approach assumes fixed-altitude 2D grids, ideal sensing and synchronous messaging, provides penalty-based rather than kinodynamic safety, and evaluates path quality mainly against an iterative A* baseline; transfer to cluttered 3D airspace would require dynamics-aware safety and robustness to perception and link loss.
Guo et al. [51] combine an SOM allocator with an ACO router: targets are embedded onto a neural lattice so competitive learning yields an assignment from targets to vehicles, then an ACO with adaptive pheromone evaporation explores early and exploits later to generate collision-aware routes. A weight-adjustment rule updates winning neurons online to smooth boundary effects and reduce assignment oscillations; the pipeline therefore separates who does what (SOM) from how to get there (ACO). The design is simple and interpretable and tends to shorten total travel when targets cluster, but it assumes known target locations and largely static maps, is sensitive to population size and evaporation parameters, and enforces safety via penalties rather than kinodynamic constraints. Scaling to very large swarms or time-varying targets would require messaging limits, dynamic re-assignment, and dynamics-aware collision handling.
Taken together, these ACO pipelines show that ant-foraging logic can yield simple, scalable exploration and joint assignment–routing schemes for GNSS-denied or unknown environments while keeping per-UAV computation low. However, they assume fixed-altitude grids with ideal sensing and synchronous messaging, benchmark safety and optimality mainly against geometric and A*-style baselines, and remain sensitive to population and evaporation parameters. Scaling to cluttered 3D airspace and very large swarms will require dynamics-aware safety layers, communication-aware design, and support for dynamic re-tasking [50,51].
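The construct, evaporate, deposit loop underlying these pipelines can be shown on a toy weighted graph. The router below is a generic ant-colony sketch (not the revisit-map scheme of [50] or the SOM–ACO pipeline of [51]); parameters such as `alpha`, `beta`, and the evaporation rate `rho` are the population and evaporation knobs whose sensitivity the section notes:

```python
import random

def aco_route(graph, start, goal, ants=20, iters=30, rho=0.3,
              alpha=1.0, beta=2.0, q=1.0, seed=0):
    """Minimal ACO router on a weighted digraph: ants build start-to-goal
    walks biased by pheromone**alpha * (1/edge_length)**beta, pheromone
    evaporates at rate rho, and each ant deposits q/path_length."""
    rng = random.Random(seed)
    tau = {(u, v): 1.0 for u in graph for v in graph[u]}
    best, best_len = None, float("inf")
    for _ in range(iters):
        tours = []
        for _ in range(ants):
            node, path, visited = start, [start], {start}
            while node != goal:
                choices = [v for v in graph[node] if v not in visited]
                if not choices:
                    path = None  # dead end: discard this ant
                    break
                weights = [tau[(node, v)] ** alpha
                           * (1.0 / graph[node][v]) ** beta
                           for v in choices]
                node = rng.choices(choices, weights=weights)[0]
                path.append(node); visited.add(node)
            if path:
                length = sum(graph[a][b] for a, b in zip(path, path[1:]))
                tours.append((path, length))
                if length < best_len:
                    best, best_len = path, length
        # Evaporate, then deposit pheromone proportional to tour quality.
        for e in tau:
            tau[e] *= (1.0 - rho)
        for path, length in tours:
            for a, b in zip(path, path[1:]):
                tau[(a, b)] += q / length
    return best, best_len
```

The adaptive-evaporation variant in [51] corresponds to scheduling `rho` over iterations (high early for exploration, low later for exploitation) rather than keeping it fixed as here.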

3.1.6. Meta-Heuristic and Bio-Inspired Algorithms

In the PND scenario, meta-heuristic and bio-inspired planners are used mainly as offline global optimizers that generate one or more backbone trajectories before execution. These paths are then tracked or locally adapted by lighter online modules (e.g., APF, MPC, motion primitives). The appeal of such methods is robust global exploration of complex 3D terrain with multiple constraints (altitude bands, threat envelopes, curvature limits), at the cost of considerable parameter tuning and a strong reliance on static map knowledge.
The first group of works focuses on population-based global search with enriched operators. Wang et al. [55] improve tuna swarm optimization with elite opposition-based initialization, Lévy-flight jumps and golden-sine updates; Gu et al. [56] strengthen the marine predators algorithm with spiral, crossover, boundary-control, and refined eddy/FADs rules; Xu et al. [57], Fu et al. [58], Liu et al. [59,60], and Yin et al. [61] design krill-, bird-, snake-, sand cat-, and whale-inspired variants with multi-population schemes, distribution-estimation learning, elite pools, Cauchy or chaos perturbations, and boundary-handling. Across these designs, 3D paths are encoded by waypoint chains under composite costs that combine length, smoothness, altitude bands, and threat exposure, and B-spline or Dubins smoothing is applied post-optimization. Results on rugged, cluttered DEMs show improved escape from local minima and lower path costs than baseline population methods. However, all assume known terrain/threat maps and point-mass kinematics, and their behavior is sensitive to population sizes, step-size schedules, and weight coefficients; safety remains penalty-based rather than kinodynamic.
A second group hybridizes meta-heuristics with sampling-based planning or flocking control. Chen et al. [62] combine a starling-inspired flocking controller with an RRT-based global planner lifted to 3D spacetime, so local formation spacing and regrouping are handled by flocking rules, while RRT supplies obstacle-aware guidance. Xiang et al. [63] nest a multi-mode lightning search algorithm around a greedy RRT initializer to plan urban patrol routes that trade execution rate, energy, and impact risk. Wu et al. [64] enhance moth–flame optimization for formation planning over mountains. Zhang et al. [65] inject differential-evolution operators into a fireworks algorithm to coordinate per-UAV firework groups under separation constraints. Hou et al. [66] adopt a different strategy: a library of time-optimal motion primitives is generated offline and paired with precomputed collision masks, so online replanning is reduced to selecting safe primitives rather than rerunning the meta-heuristic. These hybrids make the division of labor explicit—meta-heuristics for slow, global design; RRT/primitive libraries and flocking for online feasibility—and demonstrate that a pre-computed backbone or library can support large teams with limited onboard compute. At the same time, they inherit the same assumptions of known 3D city or terrain models, fixed-altitude or banded motion, and geometric safety; effectiveness depends on numerous hyper-parameters (discharge probabilities, neighborhood radii, library resolution), and none provides formal kinodynamic guarantees.
Overall, meta-heuristic and bio-inspired methods play a supporting role in PND: they are well suited to pre-mission synthesis of globally reasonable paths or primitive libraries under rich terrain and threat models, but less suited to tight real-time constraints on resource-limited platforms. Bringing them closer to deployment would require (i) tighter coupling to dynamics-aware safety layers, (ii) more systematic parameter-robust designs, and (iii) explicit accounting of computation and memory when used as upstream components for online replanning [55,56,57,58,59,60,61,62,63,64,65,66].
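The composite waypoint-chain cost these optimizers minimize is easy to state explicitly. The sketch below scores a 3D waypoint chain on length, smoothness, altitude-band violation, and threat exposure, and pairs it with a plain random-perturbation descent as a bare-bones stand-in for the Lévy-flight, golden-sine, and eddy operators of [55,56,57,58,59,60,61]; all weights and the threat model are illustrative:

```python
import math, random

def path_cost(wps, z_band=(1.0, 3.0), threats=((4.0, 4.0, 2.0),),
              w_len=1.0, w_smooth=0.5, w_alt=5.0, w_threat=10.0):
    """Composite waypoint-chain cost of the kind minimized by the surveyed
    meta-heuristics: length + turn penalty + altitude-band violation +
    threat-envelope exposure. Weights and threats are illustrative."""
    cost = 0.0
    for a, b in zip(wps, wps[1:]):          # path length
        cost += w_len * math.dist(a, b)
    for a, b, c in zip(wps, wps[1:], wps[2:]):  # turn (smoothness) penalty
        v1 = tuple(q - p for p, q in zip(a, b))
        v2 = tuple(q - p for p, q in zip(b, c))
        n1, n2 = math.hypot(*v1), math.hypot(*v2)
        if n1 > 1e-9 and n2 > 1e-9:
            cosang = sum(x * y for x, y in zip(v1, v2)) / (n1 * n2)
            cost += w_smooth * (1.0 - max(-1.0, min(1.0, cosang)))
    for x, y, z in wps:
        # Altitude-band violation (below floor or above ceiling).
        cost += w_alt * (max(0.0, z_band[0] - z) + max(0.0, z - z_band[1]))
        # Threat exposure: penetration depth into cylindrical threat zones.
        for tx, ty, tr in threats:
            cost += w_threat * max(0.0, tr - math.hypot(x - tx, y - ty))
    return cost

def perturb_search(wps, iters=500, sigma=0.5, seed=0):
    """Random-perturbation descent over interior waypoints: a bare-bones
    stand-in for the enriched operators of the surveyed algorithms."""
    rng = random.Random(seed)
    best = [list(w) for w in wps]
    best_c = path_cost([tuple(w) for w in best])
    for _ in range(iters):
        cand = [list(w) for w in best]
        i = rng.randrange(1, len(cand) - 1)   # keep endpoints fixed
        for k in range(3):
            cand[i][k] += rng.gauss(0.0, sigma)
        c = path_cost([tuple(w) for w in cand])
        if c < best_c:
            best, best_c = cand, c
    return [tuple(w) for w in best], best_c
```

Starting from a straight chain that pierces the threat cylinder, the descent pushes the interior waypoint off the threat at the price of a slightly longer, less straight path, which is exactly the length/smoothness/exposure trade-off the composite cost encodes; the richer operators in the cited works differ mainly in how candidate perturbations are proposed.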
The foregoing findings are consolidated in Table 3.
From a comparative perspective, PND methods fall into three complementary roles. RL- and MPC-based planners provide strong online adaptivity and can embed rich objectives but incur high training and inference costs and depend on reasonably accurate models. RRT* and related sampling-based planners bridge global search and kinodynamic feasibility, offering anytime improvements when costs or obstacles change, while APF and ACO modules act as lightweight reactive safety and routing layers around higher-level plans. Meta-heuristic and bio-inspired optimizers are most effective as offline or slow-horizon global designers that produce backbones or motion-primitive libraries later refined online. Table 3 therefore suggests that, in practice, PND deployments will typically combine a cheap reactive layer (APF/ACO), a kinodynamically feasible global planner (RRT*/MPC), and, where compute allows, an RL or meta-heuristic component to capture complex objectives and adaptivity.
Table 3. PND literature summary.
ReferenceMethod (Family → Specific)Scenario and ProblemLimitation
Hu J, Fan L, Lei Y, et al. [13]RL → PPO; formation/APFLow-altitude, radar-evading path via virtual leader; Role: global path; Arch: hybrid; Assump: known radar map; Quant: Mission success rate ≈ 95%; Average path length ≈ 121.7 kmSensitive to radar-model fidelity and reward weights; on-policy training cost; robustness to unknown sensors/wind unclears
Westheider J, Rückin J, Popović M. [14]RL(MARL) → COMAOnline 3D informative coverage; Role: coverage planning; Arch: CTDE/decentralized; Assump: flat grid terrain; Quant: Coverage ratio at mission end ≈ 79% (4 agents); Coverage ratio at mission end with zero communication ≈ 76%Needs global features at training; critic-heavy; no explicit obstacle/kinodynamic safety
Kong X, Zhou Y, Li Z, et al. [15]RL → TD3 + target-assignment network (Hungarian labels)Joint target assignment and collision-free path in 3D dynamic obstacles; Role: per-step centralized assignment + decentralized TD3 motion; Arch: hybrid; Assump: local sensing, spherical obstacles, grid world; Quant: Mission success rate ≈ 84%; Targets reached = 5/5Per-step Hungarian adds cubic cost; relies on stable Q aggregation and engineered reward; no explicit kinodynamic safety
Arranz R, Carramiñana D, Miguel G, et al. [16]RL → PPO sub-agents + deterministic swarm controllerOnline ground surveillance (search, track, avoid); Role: centralized tasking + on-board learned behaviors; Arch: centralized controller with per-UAV sub-agents; Assump: fixed altitude, benign weather, ideal sensors; Quant: Time to first target ≈ 2.8 s; Tracking continuity ≥ 95%Relies on stable comms and perfect sensing; fixed-altitude envelope; training cost; no explicit kinodynamic safety—needs added safety/comm-loss handling for cluttered airspace
Wang X, Gursoy M C. [17]RL → D3QN (dueling double DQN); decentralized multi-agentIoT data-collection path planning with non-cooperative UAVs (and jammer); Role: per-UAV on-board planner; Arch: decentralized with low-level neighbor exchange; Assump: local sensing, TDMA connectivity, velocity-set actions; Quant: Mission success rate ≥ 99%; Collision rate < 0.6%Needs reliable sensing/limited comms; motion discretized; safety via reward shaping—add explicit kinodynamic/safety layer for cluttered, certified airspace
Cheng Y, Li D, Wong W E, et al. [18]RL → MAXQ + SACooperative path planning in 2D grids; Role: hierarchical subtask planning; Arch: centralized training/execution; Assump: point-mass UAVs, fixed altitude, ideal sensing; Quant: Average planning steps lower than MAXQ (proxy for time)2D abstraction; tuned cooling schedule; safety via penalties (no kinodynamic constraints)
Niu Y, Yan X, Wang Y, et al. [19]Meta-heuristic → AEO + MEAEO-RLGlobal optimization for multi-UCAV 3D paths with timing/collision constraints; Role: global optimizer with subpopulations; Arch: centralized coop. via shared costs; Assump: DEM terrain, known threat hemispheres, point-mass, speed bounds; Quant: Mission time ≈ 7.69 h (cooperative arrival); Average path length ≈ 122 kmNeeds known threat maps and weight tuning; grid/point-mass abstraction; safety via penalties (no kinodynamic guarantees)
Wu W, Zhang X. [20]RL → DDQN; virtual leader + APF local avoidanceSwarm navigation with static/dynamic obstacles; Role: global heading (leader) + local APF; Arch: hybrid; Assump: fixed-wing model, reliable sensing/comms;Safety via penalties (no kinodynamic guarantees); discrete heading actions; relies on sensing/comms stability
Azzam R, Boiko I, Zweiri Y. [21]RL(MARL) → Actor–Critic (CTDE); curriculum learningCooperative navigation to simultaneous arrival; Role: central critic (train) + decentralized actors (exec); Arch: CTDE; Assump: fixed altitude, planar model, local observations; Quant: Completion time ≈ 34 s (10 UAVs)Requires reliable sensing/links; safety via reward shaping (no kinodynamic guarantees); fixed-altitude planar abstraction
Wang W, You M, Sun L, et al. [22]RL → MASAC-Discrete (multi-agent SAC); CTDEUnknown-environment cooperative exploration; Role: online coverage planning; Arch: CTDE with decentralized execution; Assump: fixed-altitude 2D grid, local sensing + map sharing; Quant: Task success rate ≈ 93%; Average steps per episode ≈ 121Depends on reliable sensing/sharing; planar abstraction; safety via reward shaping (no kinodynamic guarantees)
Liu Q. [23]Game theory + Q-learning with adaptive exploration; Apollonius-circle captureCooperative roundup of an intelligent evader; Role: geometric capture test + payoff-matrix game + tabular RL; Arch: centralized game solves with per-UAV execution; Assump: ideal sensing/comms, 2D fixed altitude, point-mass kinematics; Quant: Capture time ≈ 22 s; Steps reduced ≈ 51% vs. standard Q-learning2D abstraction; relies on synchronous sensing/comms; safety via penalties (no kinodynamic guarantees); scalability to 3D/noisy maps untested
Wu Q, Liu K, Chen L, et al. [24]RL(CTDE) → MADDPG + MPC-style multi-step value convergence; CTPDE/CTFDE + distance-weighted mean fieldStochastic hazards MAPF; Role: RL waypoints + fluid-field controller; Arch: CTDE (central critic) with CTPDE/CTFDE execution; Quant: reported collision count = 0 in tests; real-robot deployment: 3-UAV demoFixed-altitude, point-mass; ideal sensing/links; parameter-sensitive (mean-field, controller, horizon); training overhead from centralized critics; geometric (non-kinodynamic) safety
Zhao X, Yang R, Zhong L, et al. [25]RL → SAC (LiDAR) + AIT* follow-points; parameter-sharing, off-policy; no commsMulti-UAV path planning and following; Role: SAC end-to-end planner with AIT* tracking; Arch: shared replay; no inter-agent comms; Quant: 3-UAV success (1000 rounds): 829 vs. 705 (baseline SAC)Fixed altitude; no kinodynamic safety; parameter-sensitive (LiDAR/range, rewards)
Liu Y, Li X, Wang J, Wei F, et al. [26]RL → AM-MAPPO + action-mask CA + rule-based target capture + FoV encoding3D moving-target cooperative search; Role: high–low collaboration (sweep → descend) + masked CTDE policy; Quant: captured targets increase with team size (~2.58 → 3.77 → 3.65 for 3/5/8 UAVs); avg uncertainty decreases to ~0.156 at 8 UAVsThree fixed altitude bands; grid/ideal sensing; no kinodynamic safety; parameter-sensitive (FoV/range, rewards/clip)
Burzyński W, Stecz W. [39]Sampling → Spacetime RRT* (Multiplatform Spacetime RRT*)Time-aware multi-UAV trajectory planning in dynamic, obstacle-dense environments; Role: global spacetime tunnel + per-UAV spacetime RRT planning; Arch: centralized backbone + per-UAV planning; Quant: Average planning time ≈ 1.6 s (3000 samples); Mission success rate ≈ 100% (≥2000 samples)2D → spacetime abstraction; sensitive to σ/D/sample choices; assumes reliable sensing/comm for dynamic collision checks; needs dynamics-aware safety for certified 3D deployments
Kelner J M, Burzynski W, Stecz W. [40]Sampling → RRT* (FANET-aware, spacetime)Swarm trajectory planning that enforces multi-hop connectivity (FANET) and collision-free time-stamped paths; Role: global spacetime planner + dynamic MST for connectivity; Arch: centroid backbone + informed samplingAssumes accurate radio-propagation/LOS models and reliable sensing/comm; sensitive to sampling/MST parameters and latency of MST rebuilds; 2D → spacetime simplifications in parts may limit direct transfer to certified 3D airspace
Xiang L, Wang F, Xu W, et al. [41] | Joint → CETA (clustering) + JSSCT-APF + JA-MFG | Adaptive sub-swarm assignment, jammer-aware trajectory and power control; Role: association + jamming-sensitive path + jamming-aware power; Quant (paper): average total interference ≈ 28%; tracking steps ≈ 33% vs. baselines | Requires propagation-model fidelity and reliable sensing/comm; needs tuning of weights/thresholds and frequent topology/SINR updates; direct transfer to cluttered/certified 3D airspace needs extra robustness work
Kang C, Xu J, Bian Y. [42] | Virtual leader + APF (second-order differentiable virtual-force-field); affine formation maneuvering | Formation-keeping obstacle avoidance and continuous configuration change; Role: virtual-leader global reference + local APF-based trajectory replanning; tested in 2D fixed-altitude formation maneuvers; Quant: tracking accuracy ≈ 89.5%; completion time ≈ 450 s | Assumes ideal sensing/communication, 2D fixed-altitude, small team (N = 7); higher computational cost and sensitivity to APF/spring-constant tuning; scalability/kinodynamic certifiability untested
Zhao W, Li L, Wang Y, et al. [43] | Theta*–APF heuristic (omni-directional Theta* with APF-guided heuristic for formation path planning) | 3D/formation path planning in cluttered voxel maps; Role: reduce node expansions and smooth paths while keeping formation via virtual-leader control; Quant: search time ≈ 10.38 s (vs. 21.67 s A*); average path length ≈ 119.21 (grid units) | Relies on grid/voxel discretization (scales poorly with resolution); inherits APF local-minima and parameter sensitivity; assumes reliable maps/sensing/communication and limited kinodynamic testing
Chen G, Yuan S, Zhu X, et al. [44] | ESC + APF (normalized ESC; combined navigation function); swarm source seeking | Unknown-environment source seeking with obstacle and inter-UAV avoidance; Role: normalized ESC for gradient following + APF for collision avoidance; Arch: onboard sensing + PX4 velocity loop; Quant: leader time-to-target ≈ 100 s; minimum follower–follower distance ≈ 0.4 m (no collision) | Assumes reliable onboard sensing and fixed-altitude envelope; safety via potentials (no kinodynamic guarantees); performance sensitive to gains/perturbation frequencies; robustness to sensing/comm loss needs validation
Li J, Zi S, Lu X, et al. [45] | APF (improved) + bee-colony control; bounded goal-repulsion for loiter-attack | Swarm path planning in complex island-reef terrain; Role: target attraction + inter-UAV repulsion + loiter at goal; Arch: APF planner with swarm-intelligence coordination; Quant: mission success rate 100%; attack time window ≈ 2.738 s (with bee-colony control, vs. ≈ 5.946 s for improved APF alone) | Parameter sensitivity and APF local minima; assumes accurate target localization and ideal sensing; safety via potentials (no kinodynamic guarantees)
Kallies C, Gasche S, Karásek R. [46] | Optimal control → MPC (MILP; ECM); energy-aware cooperative planning | Dynamic obstacles and waypoint coverage; Role: MILP planner with energy return; Arch: centralized solver with receding horizon; Quant: covered waypoints 30/36 in 47 s (Scenario 1); 32/36 in 62 s with low-energy return | Linearized 2D indoor model; depends on fast MILP solver and warm starts; parameter sensitivity (horizon, big-M); safety not certified for 3D cluttered airspace
Fan X, Li H, Chen Y, et al. [47] | Optimal control + Deep learning → MPC + LSTM (weather forecast); threat-field replan | Routing under wind and mobile severe weather; Role: LSTM predicts atmosphere, MPC replans with threat field each step; Arch: receding-horizon solver + forecast; Quant: mission success rate ≈ 99%; average planning time ≈ 67 s under wind + moving threats | Assumes fixed-altitude 2D, reliable sensing/comm; depends on solver speed and forecast fidelity; no hard kinodynamic safety
Wang Y, Zhang T, Cai Z, et al. [48] | Meta-heuristic + Optimal control → CGWO + distributed MPC (event-triggered) | Neighbor-sharing MPC with no-fly constraints; Role: CGWO global search + MPC constraint handling; Arch: distributed with event triggers; Quant: tracking-error convergence time ≈ 32 s (CGWO) vs. ≈ 39 s (PSO); total event-triggered solver calls ≈ 217 vs. 429 without event-trigger | 2D fixed-altitude, ideal sensing/links; depends on solver speed and threshold tuning; safety via penalties (no kinodynamic guarantees)
Xian B, Song N. [49] | Optimal control + Reactive → MPC (offline) + improved APF (online); event-triggered change/recovery | Global smooth path via MPC; local dynamic-obstacle avoidance via APF; Role: MPC global guidance + APF local reaction | Assumes fixed-altitude 2D and ideal sensing/comms; requires solver/APF gain tuning; safety via penalties (no kinodynamic guarantees)
Wee L B, Paw Y C. [50] | SLAM + ant-foraging (decentralized revisit/cost maps); exploration and return | GNSS-denied search-and-rescue; Role: decentralized coverage + built-in return-home planning; Arch: local sensing + light map sharing; Quant: coverage ratio at mission end ≈ 99% (50-agent case); average search time ≈ 697 s (50-agent Monte Carlo mean) | 2D fixed-altitude grid and ideal sensing/links; safety via penalties (no kinodynamic guarantees); path optimality mainly benchmarked vs. iterative A*
Guo J, Gao Y, Liu Y. [51] | Clustering + Meta-heuristic → SOM + ACO (adaptive evaporation); joint allocation–routing | Multi-UAV task allocation + collision-aware routing; Role: SOM assignment + ACO path; Arch: centralized pipeline | Assumes known targets, mostly static maps; parameter sensitivity (population, evaporation); safety via penalties (no kinodynamic guarantees); dynamic re-tasking and large-swarm scalability untested
Wang Q, Xu M, Hu Z. [55] | Meta-heuristic → SL-TSO (Sine–Lévy TSO with elite opposition and golden-sine) | Offline global optimizer for 3D UAV paths with altitude/threat constraints; Role: global path synthesis (B-spline smoothing); Arch: centralized; Assump: known terrain/threat maps, point-mass kinematics | Parameter-sensitive; requires known maps; safety via penalties (no kinodynamic guarantees)
Gu G, Li H, Zhao C. [56] | Meta-heuristic → MEMPA (random-spiral; H–V crossover; centroid boundary; refined eddy/FADs) | Offline global optimizer for 3D swarm paths; Role: global path synthesis; Arch: centralized; Assump: known terrain/threat maps, point-mass kinematics; Quant: path cost improved ≈ 10% vs. MPA; ≈ 10% vs. NMPA | Parameter-sensitive; relies on known maps; safety via penalties (no kinodynamic guarantees)
Xu N, Zhu H, Sun J. [57] | Meta-heuristic → Krill-swarm planner (forage/evade/cruise + B-spline) | Offline global planner for 3D plant-protection terrain; Role: global path synthesis; Arch: centralized; Assump: known 3D terrain, ideal sensing; Quant: path length reduced by 1.1–17.5%; operation time reduced by 27.56–75.15% (vs. swarm-intelligence baselines) | Heuristic behavior-switch thresholds (step/perception/crowding) sensitive; point-mass abstraction; safety via distance penalties (no kinodynamic guarantees)
Fu S, Li K, Huang H, et al. [58] | Meta-heuristic → RBMO (small/large group foraging + storage) | Offline global UAV path planning; Role: global path synthesis (B-spline smoothing); Arch: centralized; Assump: known terrain/threat maps; Quant: average path cost ≈ 214 (3D planning); best cost ≈ 214 | Parameter-sensitive; point-mass abstraction; safety via penalties (no kinodynamic guarantees)
Liu P, Sun N, Wan H, et al. [59] | Meta-heuristic → SOEA (elite adversarial + adaptive threshold) | Offline global 3D path planning; Role: global path synthesis (smoothed); Arch: centralized; Assump: known terrain/threat maps, point-mass kinematics | Parameter-sensitive (elite ratio, perturbation, threshold); relies on known maps; safety via penalties (no kinodynamic guarantees)
Liu L, Lu Y, Yang B, et al. [60] | Meta-heuristic → MISCSO (multi-population; distribution-estimation; elite pool; Cauchy perturb.) | Offline global optimizer for 3D UAV paths (length/threat/altitude/smoothness); Role: global path synthesis (B-spline); Arch: centralized; Assump: known terrain/threat, ideal sensing | Sensitive to subpopulation ratios/Gaussian model/Cauchy step; point-mass abstraction; safety via penalties (no kinodynamic guarantees)
Yin S, Yang J, Ma L, et al. [61] | Meta-heuristic → QREWOA (quasi-opposition; real-time boundary; adversarial/history-guided) | Offline 3D path planning with length/threat/altitude/turning constraints; Role: centralized global path synthesis; Arch: centralized; Assump: known DEM and threat map; Quant: planning success rate increased by ~50%; coverage of feasible paths reported as 100% | Single-objective weight surrogate; relies on known maps; point-mass abstraction; safety via penalties (no kinodynamic guarantees)
Chen F, Tang Y, Li N, et al. [62] | Bionic flocking + RRT (Dubins smoothing) | Cooperative 3D path with formation maintenance in rugged terrain; Role: local flocking safety + global RRT; Arch: decentralized flocking + centralized/global planning; Quant: cluster sizes tested = 12/16/20 UAVs | Fixed neighbor/spacing radii and tuned gains; assumes ideal sensing/comms for escape broadcast; safety via geometric penalties (no kinodynamic guarantees)
Xiang H, Han Y, Pan N, et al. [63] | Meta-heuristic → MNRW-LSA + greedy RRT | Cooperative urban-patrol paths (energy/risk constraints); Role: RRT seeding + offline global optimizer; Arch: centralized; Assump: known 3D city model, fixed bounds; Quant: optimal path length ≈ 1.50 km; average running time ≈ 5.9 s | Needs known maps/weight tuning; partial fixed-altitude/point-mass abstraction; parameter-sensitive; safety via penalties (no kinodynamic guarantees)
Wu X, Xu L, Zhen R, et al. [64] | Meta-heuristic → GLMFO (chaos init; adaptive weighted update; crossover/mutation) | Offline global formation path planning over mountains; Role: centralized global path synthesis; Arch: centralized; Assump: known terrain, ideal sensing; Quant: average run time reduced by ~36%; total average iterations reduced by ~35% (vs. baselines) | Parameter-sensitive (weight schedule, crossover/mutation); point-mass abstraction; safety via penalties (no kinodynamic guarantees)
Zhang X, Zhang X, Miao Y. [65] | Meta-heuristic → HDEFWA (DE-sparks; chaotic init; min-radius; info-sharing) | Offline cooperative global paths (length/threat/separation); Role: centralized global optimizer (per-UAV groups); Arch: centralized with cooperative cost; Assump: known terrain/threat, point-mass; Quant: average path cost ≈ 1065.6 (Case I); ≈ 957.6 (Case II) | Although the algorithm incorporates DE to improve FWA, the additional operators (mutation, crossover, selection) increase computation, which may hinder real-time multi-UAV applications
Hou J, Zhou X, Pan N, et al. [66] | Sampling → time-optimal primitive library + env → traj collision masks; async decentralized selector | Online path/coverage in unknown maps; Role: library → mask → min-cost selection; Arch: decentralized, asynchronous; Assump: fixed-altitude, onboard sensing, short-horizon broadcast; Quant: per-agent local planning ≈ 0.427 ms; real-time 1000-UAV sim | Geometric (no kinodynamic) safety; parameter-sensitive; needs high-level guidance for maze-like scenes

3.2. PFS Problem (Path, Offline, Static)

3.2.1. Supervised-Learning Models

Within the PFS scenario, learning-enhanced meta-heuristics serve as one-shot offline global optimizers on static 3D maps. Rather than replacing classical planners outright, methods such as the equilibrium optimizer with generalized opposition and population-level crossover/mutation are used to improve escape from local minima while preserving the familiar waypoint-chain encoding and composite path-cost formulations.
Chen et al. [71] augment EO with two diversity mechanisms—crossover/mutation inside the population and generalized opposition-based learning to sample “opposite” candidates—so individuals update against a balanced pool (top four + mean) rather than the single best, delaying premature convergence while preserving exploitation. Multi-UAV paths are encoded as waypoint sequences under a composite cost of distance, turning penalty, and terminal error; the solver outputs smoothed trajectories for all vehicles on static obstacle maps. The design clarifies its role as a centralized offline global optimizer and improves escape from local minima relative to vanilla EO and several swarm baselines, yet it relies on known terrain, a point-mass abstraction, and weight tuning, and its opposition sampling introduces extra runtime overhead; performance can degrade on some fixed-dimension multimodal cases where alternative meta-heuristics remain competitive.
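The generalized opposition step described above can be illustrated compactly. The sketch below is a heavily simplified stand-in for MGOEO: a 2-D box and a single goal point replace the waypoint-chain encoding and composite path cost, and plain elitist selection replaces the paper's balanced-pool update; only the opposition sampling itself is the point.

```python
import numpy as np

rng = np.random.default_rng(0)
lo, hi = np.array([0.0, 0.0]), np.array([10.0, 10.0])
goal = np.array([8.0, 3.0])   # hypothetical target standing in for a path-cost model

def cost(pop):
    # toy surrogate for a composite path cost: distance to the goal point
    return np.linalg.norm(pop - goal, axis=1)

def generalized_opposition(pop):
    # generalized opposite candidates: x' = k * (lo + hi) - x with a fresh
    # random k in (0, 1) per candidate, clipped back into the search box
    k = rng.random((len(pop), 1))
    return np.clip(k * (lo + hi) - pop, lo, hi)

pop = rng.uniform(lo, hi, size=(20, 2))
init_best = cost(pop).min()
for _ in range(50):
    # evaluate parents together with their opposite candidates
    merged = np.vstack([pop, generalized_opposition(pop)])
    pop = merged[np.argsort(cost(merged))[:20]]   # elitist survivor selection
final_best = cost(pop).min()
```

Because parents compete with their opposites each generation, the best cost is non-increasing, which is the diversity-preserving elitism the opposition scheme relies on.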
Overall, these learning-augmented optimizers demonstrate that modest supervision or opposition-based sampling can improve diversity and solution quality for static multi-UAV routing, at the expense of extra runtime and additional hyper-parameters. Their benefits remain conditional on known obstacle maps, point-mass abstractions, and carefully tuned weights, and they compete with other meta-heuristics on some multimodal benchmarks rather than uniformly outperforming them; in certified applications, they would therefore sit upstream of dynamics-aware safety layers.

3.2.2. Graph-Search Algorithms

In PFS, graph-search-based planners remain the canonical baselines for offline path planning on static maps. The surveyed work extends A* and Dijkstra with workspace partitioning, connectivity-aware graph construction, and homotopy- or history-guided heuristics, clarifying how classical search can be adapted to multi-UAV settings while retaining optimality or anytime properties on voxel grids.
Du et al. [72] couple an enhanced A* with a task-allocation partitioner: the workspace is split into sub-regions, and each UAV plans locally, while an improved open/closed-list implementation replaces repeated scans with a direct query table to cut memory traffic and look-ups. The map is voxelized from DEM data, and paths are generated on a 3D grid, then smoothed for flight. This clarifies roles—centralized partitioning, per-UAV local A*—and reduces contention versus a single-region search. The approach assumes a static, known obstacle map and a grid abstraction without kinodynamic limits; benefits rely on region granularity and data-structure parameters, and dynamic replanning still requires re-runs when obstacles change. For certified deployments, explicit dynamics-aware safety and on-board robustness to sensing/communication loss would be needed.
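The "direct query table" amounts to replacing repeated open/closed-list scans with O(1) hash look-ups. A minimal A* sketch in that spirit follows; the 4-connected toy grid and Manhattan heuristic are illustrative assumptions, not the paper's voxelized DEM setup.

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid (0 = free, 1 = blocked).
    `g_table` (dict) and `closed` (set) act as direct query tables:
    membership and best-cost checks are O(1) instead of list scans."""
    rows, cols = len(grid), len(grid[0])
    h = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])  # Manhattan heuristic
    open_heap = [(h(start), 0, start)]
    g_table = {start: 0}          # best-known cost-to-come per cell
    parent, closed = {}, set()
    while open_heap:
        f, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return path[::-1]
        if cell in closed:
            continue              # stale heap entry: skip, never rescan the list
        closed.add(cell)
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < g_table.get((nr, nc), float("inf")):
                    g_table[(nr, nc)] = ng
                    parent[(nr, nc)] = cell
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None

grid = [[0] * 5 for _ in range(5)]
grid[2][1] = grid[2][2] = grid[2][3] = 1      # a wall forcing a detour
path = astar(grid, (0, 0), (4, 4))
```

Lazy deletion (pushing duplicates and skipping closed cells on pop) is what lets the heap and the dictionaries replace the repeated list traversals of a textbook implementation.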
Bashir et al. [73] formulate urban swarm planning as a connectivity-aware graph problem and couple it with a four-layer controller. A path is first extracted on a connectivity graph (Dijkstra) built from obstacle line-segments and safe-margin constraints; the fleet then flies that path in a line formation while a hierarchical controller preserves backhaul/fleet links and enforces obstacle clearance. The design separates centralized offline path finding (ground side) from per-UAV navigation (onboard), so any vehicle can take the lead while the rest maintain spacing and multi-hop connectivity. The approach is practical for static urban maps, but it assumes known stationary obstacles and a propagation model/threshold to infer link viability; vehicle dynamics are not enforced beyond geometric margins, and real-time robustness hinges on sensing/communication stability and timely topology updates during handovers.
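The connectivity-aware graph construction can be sketched as Dijkstra over waypoint nodes whose edges are pruned by a link-viability test. The free-space path-loss expression and the 30 dB margin below are invented placeholders for the paper's propagation model and thresholds; obstacle clearance is abstracted away.

```python
import heapq, math

def link_ok(p, q, margin_db=30.0):
    """Hypothetical viability test: keep an edge only if the free-space
    path loss over the hop stays inside the radio margin (toy model)."""
    d = math.dist(p, q)
    return 20 * math.log10(max(d, 1.0)) <= margin_db

def dijkstra(nodes, src, dst):
    # build the connectivity graph, pruning non-viable links up front
    adj = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if link_ok(nodes[i], nodes[j]):
                w = math.dist(nodes[i], nodes[j])
                adj[i].append((j, w))
                adj[j].append((i, w))
    dist, parent, pq = {src: 0.0}, {}, [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            path = [u]
            while u in parent:
                u = parent[u]
                path.append(u)
            return path[::-1], d
        if d > dist.get(u, float("inf")):
            continue               # stale entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, u
                heapq.heappush(pq, (nd, v))
    return None, float("inf")

# collinear waypoints 25 m apart: long "shortcut" hops exceed the margin
nodes = [(0, 0), (25, 0), (50, 0), (75, 0), (100, 0)]
path, total = dijkstra(nodes, 0, 4)
```

With this margin, 25 m hops survive but 50 m hops do not, so the planner is forced through every relay, mirroring how the backhaul constraint shapes the route.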
Xie et al. [74] couple a homotopy-aware A* front-end with waypoint pruning and smooth trajectory optimization. A historical-path heuristic biases replanning toward the previous homotopy class, so the route does not jump between symmetric corridors in clutter, a JPS-based filter deletes redundant waypoints to shrink the optimization problem, and an L-BFGS back-end refines the pruned path under formation similarity and obstacle-clearance constraints. The pipeline separates global search, problem-size reduction, and continuous refinement, suppressing path hopping while preserving formation. Assumptions include a static voxelized map, point-mass kinematics with fixed bounds, and ideal sensing. Safety is enforced by geometric margins rather than kinodynamic guarantees, and gains/thresholds must be tuned for symmetry tolerance.
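The redundant-waypoint deletion can be approximated by greedy line-of-sight shortcutting. This Bresenham-based sketch is a simplified stand-in for the JPS-based filter in the paper, operating on a toy 2-D grid rather than a voxel map.

```python
def line_of_sight(grid, a, b):
    """Bresenham traversal: every visited cell on the segment a -> b is free."""
    (r0, c0), (r1, c1) = a, b
    dr, dc = abs(r1 - r0), abs(c1 - c0)
    sr, sc = (1 if r1 > r0 else -1), (1 if c1 > c0 else -1)
    err, r, c = dr - dc, r0, c0
    while True:
        if grid[r][c]:
            return False
        if (r, c) == (r1, c1):
            return True
        e2 = 2 * err
        if e2 > -dc:
            err -= dc
            r += sr
        if e2 < dr:
            err += dr
            c += sc

def prune(path, grid):
    """Greedy shortcutting: keep a waypoint only when the direct segment to
    the farthest visible successor is blocked (stand-in for the JPS filter)."""
    kept, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not line_of_sight(grid, path[i], path[j]):
            j -= 1                 # back off until a visible waypoint is found
        kept.append(path[j])
        i = j
    return kept

grid = [[0] * 5 for _ in range(5)]
pruned = prune([(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)], grid)
```

Shrinking the waypoint chain this way directly reduces the number of decision variables handed to the L-BFGS back-end.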
Collectively, these graph-search pipelines confirm that enhanced A*/Dijkstra variants are well suited to static 3D environments where maps are known, connectivity constraints are explicit, and solver time can be amortized offline. Their limitations stem from grid and voxel abstractions, lack of explicit kinodynamic guarantees, sensitivity to region granularity and heuristic parameters, and the need to rerun planning when obstacles or constraints change; in practice, they therefore serve as high-quality global backbones that are later combined with lighter online safety layers [72,73,74].

3.2.3. The Genetic Algorithm

For PFS missions, genetic algorithms (GAs) are mainly used as multi-objective curve-shaping tools that operate on smooth parametric routes (e.g., Bézier curves) rather than raw waypoint chains. This allows energy, path length, and environment-specific constraints (terrain, weather, no-fly zones) to be combined in a single fitness, while crossover and mutation adjust control points under clearance and boundary requirements.
Kladis et al. [75] encode each UAV’s route as a Bézier curve and use a two-stage GA to optimize a weighted objective that combines energy expenditure with geometric path length under environmental constraints (terrain, traffic, weather, no-fly zones). Normalized fitness and payoff-table style scaling keep objectives comparable, while crossover/mutation reshape control points to satisfy start/goal and clearance limits; the result is a smoothed flyable set of swarm trajectories. The framework is vehicle-agnostic and suits centralized offline planning, but performance hinges on weight selection and payoff-table estimation, assumes a known static map and ideal sensing, and uses a point-mass abstraction with geometric margins rather than kinodynamic guarantees—factors that would require dynamics-aware safety layers for certified deployments.
These GA-based planners show that parametric curve encodings can yield energy-efficient, globally consistent swarm trajectories on static maps and are flexible across vehicle and terrain types. However, their performance hinges on weight and payoff-table tuning, assumes accurate static environment models and point-mass kinematics, and provides only geometric safety margins rather than certified kinodynamic constraints. As a result, GA-based PFS planners are best viewed as offline global designers whose outputs must still be checked and possibly wrapped in dynamics-aware safety layers before deployment.
The foregoing findings are consolidated in Table 4.
In the PFS setting, where maps are static and planning is predominantly offline, graph search, GA/PSO-style meta-heuristics, and supervised learning play distinct but complementary roles. Enhanced A*/Dijkstra variants provide reproducible, often optimal or anytime baselines on voxel grids, making them attractive for safety-critical pre-planning. GA and related meta-heuristics offer greater flexibility for multi-objective optimization and smooth parametric curve design, especially when exact solvers become slow, albeit at the price of parameter tuning and stochastic variability. Supervised surrogates, when sufficient labeled data are available, can replace hand-crafted heuristics and accelerate repeated evaluations. Table 4 shows that realistic deployments are likely to use graph search as a baseline backbone, with GA/PSO or SL components to handle complex objectives, approximations, or repeated planning.
Table 4. PFS literature summary.
Reference | Algorithm/Method | Applicable Scenario and Problem Addressed | Limitation
Chen Y, Pi D, Wang B, et al. [71] | Meta-heuristic → MGOEO (EO + generalized opposition; crossover/mutation) | Offline multi-UAV path planning on static 3D maps; Role: centralized global path synthesis; Arch: centralized; Assump: known obstacle map, point-mass model; Quant: average runtime on Map-1 ≈ 60.5 s; average runtime on Map-2 ≈ 80.1 s | Extra runtime from opposition sampling; relies on known maps/weight tuning; point-mass abstraction; weaker on some fixed-dimension multimodal cases
Du Y. [72] | Graph search → Enhanced A* (query table; task-allocation partition) | Offline 3D grid planning with workspace partition for multi-UAV SAR; Role: centralized partition + per-UAV local A*; Arch: centralized partitioning; Quant: planning time ≈ 29.65 s (enhanced) vs. 51.53 s (A*); max open-list size reduced 5751 → 3286 (representative region) | Static known map; grid abstraction; dynamics not enforced
Bashir N, Boudjit S, Dauphin G. [73] | Graph search → Dijkstra (connectivity-aware) + layered control | Offline urban path with fleet/backhaul connectivity; Role: ground path finding + onboard formation tracking; Arch: centralized planning + per-UAV navigation; Quant: max reaction delay to leader speed change ≈ 0.58 s; minimum UAV–UAV spacing ≈ 15 m during mission | Assumes known static obstacles and radio thresholds; no explicit kinodynamic constraints; relies on sensing/comms stability for handover/topology updates
Xie J, Zhang G, Zhang W, et al. [74] | Graph search → improved A* + JPS; Traj. opt. → L-BFGS | Static 3D grid; formation-aware motion planning; Role: A* seed + JPS pruning + L-BFGS refinement; Arch: centralized; Assump: voxel map, point-mass | The integration of improved A*, JPS-based path simplification, and L-BFGS optimization increases algorithmic complexity, which may reduce real-time applicability for large UAV swarms
Kladis G P, Doitsidis L, Tsourveloudis N C. [75] | GA (multi-objective) → Bézier-curve planner | Offline static 3D map; minimize energy + path length; Role: centralized global path synthesis; Arch: centralized; Assump: DTED terrain, weather, and no-fly zones | Relies on weight/payoff-table tuning; point-mass and geometric safety (no kinodynamic guarantees)

3.3. PFD Problem (Path, Offline, Dynamic)

3.3.1. Particle Swarm Optimization Algorithms

In the PFD scenario, particle swarm optimization (PSO) and its hybrids are widely used to encode time-varying costs and dynamic environments while keeping the heavy optimization phase offline or in a receding-horizon loop. Most methods follow a similar pattern: a structural front-end generates candidate paths or partitions, and PSO refines waypoints or segment parameters under composite costs that mix path length, safety margins, and energy-related terms.
Several studies combine PSO with graph- or sampling-based front-ends to improve exploration on complex maps. Liu et al. [83] use random geometric graphs to generate variable-length candidate paths that are then optimized by PSO in a divide-and-conquer fashion, while Meng et al. [84] seed feasible routes with RRT*, route higher-priority vehicles first, and let lower-priority UAVs treat those trajectories as dynamic obstacles during PSO refinement. These designs clarify the roles of candidate generation and local search and reduce stagnation in local optima, but they rely on known DEM or voxel maps, simplified point-mass kinematics, and geometric clearances rather than kinodynamic guarantees. Graph construction, RRT* seeding, and path-to-particle conversion also introduce extra runtime overhead.
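The priority mechanism can be sketched as PSO over a waypoint chain with a separation penalty against a fixed higher-priority trajectory. The geometry, penalty threshold, and PSO coefficients below are all illustrative assumptions, not values from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(2)

# fixed trajectory of a higher-priority UAV (hypothetical), treated as a
# dynamic obstacle during the lower-priority UAV's refinement
prio_path = np.stack([np.linspace(0.0, 10.0, 20), np.full(20, 2.0)], axis=1)
start, goal = np.array([0.0, 0.0]), np.array([10.0, 0.0])

def cost(flat):
    # waypoint chain: start + three free 2-D waypoints + goal
    wp = np.vstack([start, flat.reshape(-1, 2), goal])
    length = np.linalg.norm(np.diff(wp, axis=0), axis=1).sum()
    # hard geometric penalty when separation from the priority path shrinks
    sep = np.linalg.norm(wp[:, None, :] - prio_path[None, :, :], axis=2).min()
    return length + (100.0 if sep < 1.0 else 0.0)

low, high = np.tile([0.0, -3.0], 3), np.tile([10.0, 3.0], 3)
X = rng.uniform(low, high, (30, 6))
V = np.zeros_like(X)
pbest, pcost = X.copy(), np.apply_along_axis(cost, 1, X)
g = pbest[pcost.argmin()].copy()
init_cost = pcost.min()
for _ in range(100):
    r1, r2 = rng.random(X.shape), rng.random(X.shape)
    V = 0.7 * V + 1.5 * r1 * (pbest - X) + 1.5 * r2 * (g - X)
    X = np.clip(X + V, low, high)
    c = np.apply_along_axis(cost, 1, X)
    better = c < pcost
    pbest[better], pcost[better] = X[better], c[better]
    g = pbest[pcost.argmin()].copy()
```

The penalty turns the higher-priority trajectory into a soft no-go region, which is exactly how sequential (priority-ordered) planning decouples the joint problem into per-UAV searches.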
A second group focuses on dynamic objectives and hybrid search. Huang et al. [85] frame Age-of-Information–aware data collection as a Markov game and augment MATD3 with a PSO-based critic-parameter search and dual replay buffers, while Cao et al. [86] couple a weighted Voronoi partitioner with per-UAV PSO so that airspace is continuously repartitioned and each vehicle optimizes its path within its own adaptive cell. Shao et al. [87] parameterize formation routes with PH curves and refine control points with a multi-population PSO–GA hybrid, and Wang et al. [88] use a cooperative-game hybrid of spherical-vector PSO and Differential Evolution for inspection in urban pipe corridors. These schemes show that PSO can handle dynamic costs, formation constraints, and multi-objective trade-offs, but at the price of added algorithmic complexity, parameter sensitivity (weights, population sizes, update rates) and extra compute from hybrid operators, bargaining steps, or multi-population coordination.
Other work targets specific dynamic tasks. Sheng et al. [89] improve PSO for TDOA-based localization by screening promising UAV combinations and inheriting previous best patterns, Tan et al. [90] tune PSO coefficients toward a Nash equilibrium to stabilize convergence, Wang et al. [91] introduce dynamic clustering and chaotic initialization to escape local optima in an APF-based scene, and Li et al. [92] optimize composite-UAV operations by jointly choosing air-launch points and multi-segment routes via grouped PSO. In each case, PSO provides a flexible search backbone while problem-specific encodings capture task pairing, launch geometry, or receding-horizon decisions. This flexibility comes at a cost: performance depends heavily on encoding design, hyper-parameter schedules, and penalty weights, while safety and vehicle limits remain modeled geometrically rather than through certified kinodynamic constraints.
Overall, PSO and PSO hybrids form a natural choice for PFD problems where costs and constraints vary over time, but the main optimization can still run offline or on a relatively slow horizon. They offer strong global search and easy hybridization with clustering, graph seeding, game-theoretic arbitration, or learning components. At the same time, they are sensitive to parameter settings and incur non-trivial runtime once hybrids and multi-population schemes are introduced, and most existing designs rely on static or quasi-static maps with point-mass abstractions. Closing the gap to real deployments will require better modeling of vehicle dynamics and communication limits, explicit safety layers, and careful profiling of computational load under realistic update rates [83,84,85,86,87,88,89,90,91,92].
The foregoing findings are consolidated in Table 5.
For PFD missions, PSO and other population-based optimizers dominate because they naturally encode time-varying costs and can operate in rolling-horizon frameworks while still running mainly offline or at relatively slow update rates. Graph- or sampling-based front-ends (e.g., RRT*, visibility graphs) act as structural generators, with PSO refining waypoints or segments, and hybrid schemes combine PSO with clustering, decomposition, or game-theoretic arbitration to manage complexity. Compared with PFS, the emphasis shifts from strict optimality to robust performance under dynamic constraints, with RL playing a smaller role due to its training cost and the difficulty of guaranteeing safety. Table 5 indicates that PSO-based hybrids offer a practical compromise between expressiveness and computational load in PFD, especially when cost and constraint changes are slower than the PSO update cycle.
Table 5. PFD literature summary.
Reference | Algorithm/Method | Applicable Scenario and Problem Addressed | Limitation
Liu Y, Zhu X, Zhang X Y, et al. [83] | PSO + RGG (variable-length) + divide-and-conquer | Offline 2D grid; RGG candidates + PSO sub-path refinement; Quant: average normalized path length ≈ 0.248 (vs. 0.872 PSO); iterations to first feasible path ≈ 4.1 (vs. 14.0 PSO) | Static known map; parameter-sensitive (radius/samples/waypoint cap); geometric safety only
Meng Q, Chen K, Qu Q. [84] | PSO (hybrid) + RRT* + priority planning → PPSwarm | Cooperative routing; Role: RRT* seeding + per-UAV PSO with dynamic-obstacle list; Arch: centralized high-level + per-UAV refinement; Quant: average runtime ≈ 111.7 s; mean path cost ≈ 153,744 (Scenario-2) | Known DEM/cylindrical obstacles; point-mass abstraction; safety via geometric distances; performance depends on priority/params; replans needed when map/links change
Huang H, Li Y, Song G, et al. [85] | RL → DP-MATD3 (MATD3 + PSO tuning + dual replay) | AoI-aware multi-UAV data collection; Role: per-UAV policy with CTDE; Arch: centralized training/decentralized exec.; Assump: fixed altitude, local sensing/sharing; Quant: weighted average Age of Information reduced by ~33.3% (5 m/s) and ~27.5% (10 m/s) vs. MATD3 | DP-MATD3 integrates PSO optimization and dual experience pools into MATD3, which increases computational overhead and may hinder real-time deployment in large-scale UAV networks
Cao Z, Li D, Zhang B. [86] | Weighted Voronoi partition + PSO (real-time updates) | Dynamic cluttered airspace; Role: team-level partition + per-UAV local PSO; Arch: hybrid (central partition, onboard refinement); Assump: planar/altitude-band, reliable sensing/links | Sensitive to weights/update rate (boundary oscillation, load imbalance); point-mass abstraction; geometric safety (no kinodynamic guarantees)
Shao Z, Zhou Z, Qu G, et al. [87] | PH-curve parametrization + MHPSGA (multi-population PSO–GA) | Formation-constrained 3D planning in cluttered terrain; Role: PH geometry + hybrid optimizer; Arch: centralized; Assump: known static map, point-mass, curvature bounds | Parameter-sensitive (pop/migration/crossover/mutation); fixed formation templates; geometric safety only (no kinodynamic guarantees)
Wang C, Zhang L, Gao Y, et al. [88] | SPSO + DE (Nash bargaining) → GSPSODE | Offline inspection paths in urban pipe corridors; Role: hybrid global search with bargaining; Arch: centralized; Quant: average runtime (Scene 2) ≈ 122 s; best path cost (Scene 2) ≈ 46,900 | Despite the game-theoretic hybridization, SPSO-DE may still fall into local optima in highly complex environments
Sheng L, Li H, Qi Y, et al. [89] | Improved PSO + similarity screening (TDOA) | Online passive localization + trajectory optimization; Role: select 4 UAVs + optimize their next positions; Arch: centralized screening + per-UAV refinement; Quant: average positioning error ≈ 1.39 km | Although similarity screening reduces redundant calculations, the improved PSO with large populations (e.g., 6000 particles) and inheritance mechanisms still imposes heavy computational overhead, limiting real-time scalability
Tan L, Zhang H, Shi J, et al. [90] | PSO with Nash-equilibrium tuning | Offline 3D grid path planning; Role: PSO search with on-the-fly coefficient balance; Quant: average convergence time reduced by ~32%; average flight distance reduced by ~34% (vs. PSO) | Static known map; point-mass, fixed altitude; relies on Nash reaction-function assumptions; parameter-sensitive; penalty-based safety (no kinodynamic guarantees)
Wang L, Luan Y, Xu L, et al. [91] | DCPSO (dynamic clusters + Tent-chaos) on APF + receding horizon | Online waypoint selection with APF scene; Role: horizon model + clustered PSO search; Quant: mean path length ≈ 108.98 km; best path length ≈ 108.23 km (30 runs) | 2D fixed altitude; ideal sensing/links; clustering overhead; parameter-sensitive; penalty safety (no kinodynamic guarantees)
Li Y, Zhang L, Cai B, et al. [92] | FP-GPSO (Fermat-point grouping PSO) | Unified ALP + three-segment routes in mountains; Role: geometry-informed ALP + PSO segment routing; Quant: mean path cost ≈ 1.45; feasible-path rate = 100% | Needs known DEM/risk–safety maps; point-mass/fixed envelope; sensitive to group sizes and coefficient/weight schedules; penalty-based safety (no kinodynamic guarantees)

3.4. DND Problem (Distribution, Online, Dynamic)

3.4.1. Ant Colony Optimization Algorithm

In the DND scenario, ant colony-based methods are used to jointly handle dynamic task distribution and routing under heterogeneous environments and intermittent communications. The surveyed designs range from wind-aware two-stage pipelines for power-line inspection, through integrated coverage–surveillance–strike frameworks, to distributed ACO schemes that maintain consensus under switching communication graphs.
Li et al. [52] model a 3D mountain site with four heterogeneous wind fields, then split the workflow into task allocation and path planning. Allocation is solved by BACOHBA, a bidirectional ant-colony scheme refined by a discretized Honey Badger local search to avoid one-way bias; planning is solved by HBAFOA, which uses Honey Badger for global exploration and Fruit-Fly moves to repair oversized or out-of-bound steps, yielding feasible, obstacle-free segments that are stitched into full routes. The pipeline couples wind-aware costs (distance/time/energy and wind-change penalties) with DEM-based clearance, improving assignment quality and routing in rugged terrain. Limitations are tied to the need for a known DEM and wind-field model, added algorithmic complexity from the two-stage hybrid (longer wall time), and reliance on geometric penalties rather than kinodynamic guarantees; certified deployments would require dynamics-aware safety layers and robustness to sensing/communication delays.
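The allocation side of such pipelines can be reduced to a generic ACO construction/evaporation loop. The sketch below is not BACOHBA itself (the Honey Badger local search and wind-aware cost terms are omitted); it only shows the classic pheromone–heuristic transition rule on a random UAV-by-task cost matrix with illustrative parameters.

```python
import numpy as np

rng = np.random.default_rng(3)

n_uav, n_task = 3, 6
# hypothetical wind-adjusted cost of UAV i serving task j
cost = rng.uniform(1.0, 10.0, (n_uav, n_task))
tau = np.ones((n_uav, n_task))            # pheromone trails
alpha, beta, rho = 1.0, 2.0, 0.1          # illustrative ACO parameters

def construct():
    """One ant builds a full allocation task-by-task with the
    pheromone^alpha * heuristic^beta transition rule."""
    assign = {}
    for j in range(n_task):
        w = tau[:, j] ** alpha * (1.0 / cost[:, j]) ** beta
        assign[j] = rng.choice(n_uav, p=w / w.sum())
    return assign, sum(cost[i, j] for j, i in assign.items())

best, best_cost = None, np.inf
for _ in range(100):
    sols = [construct() for _ in range(10)]
    tau *= (1 - rho)                      # evaporation
    for assign, c in sols:
        for j, i in assign.items():
            tau[i, j] += 1.0 / c          # reinforce used (UAV, task) pairs
        if c < best_cost:
            best, best_cost = assign, c
```

Capacity limits, timing windows, and the bidirectional search refinement of the actual method would enter as constraints on `construct` and as additional deposit rules.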
Luo [53] proposes an integrated pipeline spanning search–coverage, coordinated surveillance, and strike. A distributed sensing–communication layer builds a situational-awareness map that drives cooperative search coverage; for surveillance, an indirect pheromone mechanism over a target-probability map coordinates multi-UAV monitoring while preserving separations; for strike, a GA-enhanced ACO assigns targets under range, timing, and benefit constraints. The design separates perception → allocation → routing, improves stability via digital-pheromone updates, and shows resilience to node loss in simulation. At the same time, it relies on grid-based maps and idealized comm/sensing, encodes safety through geometric separations rather than kinodynamic guarantees, and requires hand-tuned evaporation/propagation and GA/ACO weights—assumptions that would need dynamics-aware safety layers and robustness to comm loss for field deployment.
Zhang et al. [54] design DCS-UC, a distributed ACO scheme that keeps a swarm coordinated when communication links switch or drop. The method maintains pheromone-matrix consensus over a switching, connected graph so that agents share a consistent view of explored cells, performs position-consensus updates to recover all-UAV positions without full connectivity, and applies a consensus-based collision-avoidance rule when separations shrink. Waypoints are chosen by an ACO transition rule augmented with a coverage-gain heuristic, yielding an online cooperative search despite intermittent links. Assumptions include a 2D fixed-altitude grid, ideal sensing, and a connected topology (at least one spanning tree); convergence depends on consensus frequency vs. pheromone update rate and on network latency, and obstacles are not modeled (focus is on communication effects), so transfer to cluttered 3D airspace would require explicit kinodynamic safety and robustness to sensing/communication loss.
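The two mechanisms at the heart of DCS-UC, pheromone-matrix consensus over communication neighbors and a coverage-biased ACO transition rule, can be sketched as follows (a simplified illustration; the actual consensus laws, heuristic, and parameters in [54] differ):

```python
import random

def consensus_step(pheromones, adjacency):
    """One pheromone-consensus update: each UAV averages its pheromone
    map with those of its current communication neighbors."""
    new = []
    for i, tau in enumerate(pheromones):
        group = [pheromones[j] for j in adjacency[i]] + [tau]
        new.append({cell: sum(t[cell] for t in group) / len(group)
                    for cell in tau})
    return new

def choose_cell(pos, tau, visited, rng, alpha=1.0, beta=2.0):
    """ACO transition rule biased toward coverage gain: unvisited
    neighbor cells get heuristic value 1.0, visited ones 0.1."""
    r, c = pos
    moves = [m for m in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
             if m in tau]
    weights = [(tau[m] ** alpha) * ((1.0 if m not in visited else 0.1) ** beta)
               for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]
```

With a fully connected two-agent topology, one consensus step already makes both pheromone maps identical; on sparser switching graphs, repeated steps drive the maps toward agreement.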
Overall, these ACO-based DND planners highlight that pheromone-guided constructive search can encode coupled allocation–routing decisions and tolerate link instabilities, making them attractive for dynamic distribution tasks with complex terrain and communication conditions. Their weaknesses lie in increased algorithmic complexity and wall time for multi-stage hybrids, dependence on DEM and wind/propagation models, parameter sensitivity (evaporation, weights, consensus rates), and the absence of kinodynamic safety guarantees; moving to certified deployments will therefore require dynamics-aware safety layers and careful profiling under realistic network conditions [52,53,54].

3.4.2. Reinforcement-Learning Algorithms

Reinforcement-learning-based planners in the DND scenario focus on dynamic task allocation and routing under partial observability, resource constraints, and latency limits. The reviewed approaches combine centralized training with decentralized execution, and often integrate graph or attention-based features, solver-based routing, or shared inference workloads to balance decision quality against computation and communication costs.
Dhuheir et al. [27] couple path planning with on-board, distributed CNN inference to minimize end-to-end decision latency. A central PPO agent plans per-UAV trajectories over a cell grid while sharding CNN layers across vehicles; the placement and paths co-evolve so that neighboring UAVs exchange intermediate features with low delay and limited interference. The state encodes layer queues, device capacities, hot-cell coverage, and relative positions; actions decide which UAV executes the next layer and where it moves next. This architecture removes single-UAV bottlenecks and trims communication distance during inference, but it assumes a 2D fixed-altitude grid with ideal sensing/links, fixed bandwidth and power, and no Doppler. Safety is enforced via penalties rather than kinodynamic guarantees, and performance depends on hot-cell design and periodic re-optimization—factors that would require dynamics-aware safety layers and robustness to link loss in certification-bound deployments.
Li et al. [28] split dynamic mission planning into allocation and path layers and train an attention-based RL policy (with graph features) to assign targets in real time as team size and task count change. Given each UAV’s assigned set, the path layer solves an optimal route with Gurobi for quality or a greedy heuristic when latency is critical, so the pipeline can trade optimality for speed on demand. The design cleanly separates learning-based allocation from solver-based routing and supports rapid re-assignment under disturbances, yet it presumes a grid map and known obstacles, central coordination, and reliable links, and it enforces safety via geometric margins rather than kinodynamic guarantees; scalability on very large instances hinges on solver runtime, and frequent re-optimization is required when dynamics are intense.
Du [29] studies swarm routing and task allocation with incomplete information, refining MADDPG for path planning by adding a pseudo-collision reward that discourages entries into uncertain threat zones and adapting MAPPO to handle cooperative–competitive assignment when agents observe only local neighborhoods. The training follows a centralized-training/decentralized-execution pattern: centralized critics see joint information during learning, while onboard actors execute from local observations. This cleanly separates a collision-aware routing layer (MADDPG) from a team-level allocator (MAPPO) and stabilizes behavior under partial observability. The approach, however, assumes a 2D fixed-altitude kinematic model with ideal sensing/links, requires careful tuning of the pseudo-collision reward and MAPPO game parameters, and provides no kinodynamic safety guarantees. Scalability is limited by critic training and replay overhead, and frequent re-optimization is needed as team size or environment changes.
Chen et al. [30] cast joint serving and charging as a partially observable Markov decision process and design two distributed DRL agents: a CNN-based DQN that maps local observations to actions, and a DRQN that augments DQN with an LSTM to retain temporal context. Each UAV runs the same policy on-board while critics are trained centrally, so vehicles decide when to serve, fly, rest, or recharge without knowing others’ full state. The model enforces charging-slot limits, energy budgets, and mobility reachability, and the reward penalizes boundary violations, battery depletion, and ineffective service. This architecture removes the single-controller bottleneck and keeps scheduling feasible under limited visibility; however, it assumes a 2D fixed-altitude grid, ideal sensing/links, and fixed bandwidth/power, and it encodes safety via penalties rather than kinodynamic guarantees. Performance also depends on charging capacity (rate and slots per station) and requires retraining or periodic re-optimization as demand patterns change.
Together, these DND RL pipelines suggest that learning-based allocators can adapt to changing team sizes, task sets, and resource budgets, while solver-backed or heuristic motion layers preserve route quality. Their applicability is nonetheless constrained by 2D grid abstractions, idealized sensing and fixed bandwidth models, heavy critic training and replay overheads, and safety formulations based on geometric penalties rather than certified kinodynamic envelopes. For field deployment, they will need dynamics-aware safety layers, communication-aware design, and clearer guarantees on latency and scalability [27,28,29,30].

3.4.3. Unsupervised Learning Algorithms

The “unsupervised learning” category in DND primarily covers clustering-driven front-ends that structure targets or zones before auction-based assignment and reinforcement-learning-based routing. By grouping targets geographically or by behavior, these methods reduce the dimensionality of the allocation problem and provide more stable bidding and exploration patterns in unknown or partially known maps.
Yu [109] combines a hierarchical target clustering front-end with an enhanced Contract Net Protocol (hybrid centralized–distributed auction plus an optimal-allocation rule) to assign UAVs to target groups and then uses Q-learning to synthesize obstacle-aware paths by agent–environment interaction, with an auxiliary tracker accelerating Q-value updates. The pipeline separates perception/allocation from online routing, stabilizes bidding via digital-pheromone-style cues, and supports coordinated search and tracking in unknown environments. At the same time, the framework rests on a grid map and idealized sensing/links, encodes safety through geometric margins rather than kinodynamic guarantees, and remains parameter-sensitive (pheromone/auction coefficients, clustering thresholds, Q-learning rates). Scalability hinges on auction frequency and message load; frequent re-optimization is required as targets appear/disappear or move.
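The Q-learning routing layer can be illustrated with a self-contained tabular sketch on a small grid; the clustering front-end, auction, and auxiliary tracker of [109] are omitted, and the grid size, rewards, and learning rates here are illustrative assumptions:

```python
import random

def train_q(grid=4, goal=(3, 3), obstacle=(1, 1), episodes=2000, seed=0):
    """Tabular Q-learning for obstacle-aware grid routing."""
    rng = random.Random(seed)
    acts = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    Q = {}
    q = lambda s, a: Q.get((s, a), 0.0)
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(50):
            if rng.random() < 0.2:                     # epsilon-greedy exploration
                a = rng.randrange(4)
            else:
                a = max(range(4), key=lambda b: q(s, b))
            nr, nc = s[0] + acts[a][0], s[1] + acts[a][1]
            nxt = (nr, nc) if 0 <= nr < grid and 0 <= nc < grid else s
            if nxt == obstacle:
                nxt, r = s, -5.0                       # blocked cell: bounce back
            elif nxt == goal:
                r = 10.0
            else:
                r = -1.0
            target = r + 0.9 * max(q(nxt, b) for b in range(4))
            Q[(s, a)] = q(s, a) + 0.5 * (target - q(s, a))
            s = nxt
            if s == goal:
                break
    return Q

def greedy_path(Q, start=(0, 0), goal=(3, 3), obstacle=(1, 1), grid=4, limit=12):
    """Follow the greedy policy induced by the learned Q-table."""
    acts = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    s, path = start, [start]
    for _ in range(limit):
        a = max(range(4), key=lambda b: Q.get((s, b), 0.0))
        nr, nc = s[0] + acts[a][0], s[1] + acts[a][1]
        nxt = (nr, nc) if 0 <= nr < grid and 0 <= nc < grid else s
        if nxt == obstacle:
            nxt = s
        s = nxt
        path.append(s)
        if s == goal:
            break
    return path
```

After training, following the greedy policy yields an obstacle-avoiding route to the goal.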
Wang et al. [110] design a cooperative airport bird-dispersion scheme that couples LSTM–Kalman trajectory prediction with Hungarian task assignment and 3D Dubins curvature-bounded routing. A base-station processes radar tracks to predict the intruder’s motion, allocates interception points to the nearest formation, and commands each UAV along Dubins segments, while a proportional-guidance law expedites approach when far from the target. The pipeline separates perception (radar + LSTM–KF), allocation (Hungarian on a cost matrix), and feasible routing (Dubins + guidance), yielding cohesive, collision-free maneuvers across repeated intrusion attempts. Assumptions include centralized computation, reliable links, and single-bird behavior without flock dynamics. Feasibility and safety are enforced geometrically rather than by kinodynamic guarantees, so field deployment would require dynamics-aware safety layers and robustness to sensing/communication loss.
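The allocation step is a minimum-cost one-to-one matching between formations and predicted interception points. For the small group counts involved, a brute-force stand-in for the Hungarian algorithm makes the idea concrete:

```python
from itertools import permutations

def assign_min_cost(cost):
    """Minimum-cost one-to-one assignment (brute force over n! pairings;
    the Hungarian algorithm solves the same problem in O(n^3) at scale)."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best)
```

For example, `assign_min_cost([[4, 1, 3], [2, 0, 5], [3, 2, 2]])` returns `[1, 0, 2]`: formation 0 takes interception point 1, formation 1 takes point 0, and formation 2 takes point 2.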
Overall, clustering-enhanced pipelines show that unsupervised structure discovery can simplify downstream auctions and RL routing, enabling scalable multi-target tracking and bird-dispersion schemes in dynamic scenes. Their performance, however, depends strongly on clustering metrics and thresholds and on auction and pheromone parameters, and the pipelines assume grid maps with ideal sensing and communication; safety is again geometric rather than kinodynamic. As dynamics intensify and fleet sizes grow, auction frequency, message loads, and re-optimization overhead become critical factors that must be managed explicitly [109,110].

3.4.4. Meta-Heuristic and Bio-Inspired Algorithms

In DND, meta-heuristic and bio-inspired methods are mainly used to stabilize and accelerate market-based task allocation under large target sets and resource constraints. The two-stage auction schemes surveyed here combine data-driven bidding-function tuning with lightweight re-auction mechanisms, aiming to preserve high throughput while reducing idle time and deadlocks in large-scale engagements.
Tan et al. [67] design a two-stage auction for swarms that first learns the bidding-function weight (k) via machine-learning pretuning, then runs a re-auction mechanism during execution, so en route UAVs can relinquish or re-bid when higher-value tasks appear. The framework couples a lightweight central auctioneer with distributed bidders: hierarchical clustering and the learned bidding rule stabilize prices, while a secondary auction with greedy tie-breaking reduces idle time and missed deadlines. This separates parameter learning → formal auction and supports rapid re-assignment under resource limits. Key caveats include sensitivity to the learned weight k and the re-auction frequency, added overhead and latency from secondary auctions, and reliance on a central auctioneer (single-point pressure). Safety and vehicle limits are handled geometrically rather than by kinodynamic guarantees; field deployment would require dynamics-aware safety layers and robustness to link loss.
Wang et al. [68] formulate large-scale cooperative strike as a two-stage greedy auction: Stage 1 ranks tasks via entropy weighting to obtain a fast initial assignment; Stage 2 performs reassignment using a strike-effectiveness index to refine matches and prevent deadlocks. The scheme keeps a lightweight central auctioneer and redistributes tasks when conditions change, so the swarm sustains high-throughput decisions at scale. The design is well suited to dynamic, many-target settings, yet it assumes known static maps with Dubins-curve surrogates for trajectories, omits communication limits and maximum-range constraints (acknowledged by the authors), and relies on weight tuning for the scoring indices. Safety and vehicle kinematics are handled geometrically rather than by certified kinodynamic guarantees, implying that field deployment would require dynamics-aware safety and comm-robustness.
These auction-focused meta-heuristics demonstrate that learned or carefully engineered bidding rules can improve utilization and completion rates without sacrificing the simplicity of centralized auctioneers. Their caveats include sensitivity to learned weights and re-auction frequency, potential bottlenecks at the central auction server, and continued reliance on static-map and Dubins-surrogate assumptions with purely geometric safety. For real deployments, they must be integrated with communication-aware designs, range and kinematic constraints, and dynamics-aware safety layers [67,68].
The foregoing findings are consolidated in Table 6.
In DND scenarios, where distribution and routing must adapt online under partial observability and unstable communications, three families emerge. ACO-based methods provide constructive, pheromone-driven allocation–routing schemes that tolerate link losses and uncertainty but can be slow to converge on very large graphs. RL-based methods are more expressive and can encode complex couplings between communication, sensing, and tasking, yet incur significant training costs and require careful reward and safety design. Clustering-based unsupervised learning and meta-heuristic refinements sit in between: they discover structure in tasks or areas and stabilize auctions or re-allocations while relying on simpler controllers for motion. As summarized in Table 6 and Table 7, practical DND systems are likely to combine RL or ACO for high-level allocation with clustering and lightweight local controllers to keep runtime and communication overhead manageable.
Table 6. DND literature summary.
Reference | Algorithm/Method | Applicable Scenario and Problem Addressed | Limitation
Li K, Yan X, Han Y. [52] | BACOHBA (bidirectional ACO + discrete HBA) + HBAFOA (HBA + FOA) | Power-line inspection with multi-wind fields; Role: wind-aware task allocation + path planning; Quant: average path cost ≈ 7307; average runtime ≈ 378 s | Needs DEM/wind model; higher complexity/longer wall time; geometric (non-kinodynamic) safety
Luo X. [53] | Integrated ACO framework (coverage → surveillance → strike; GA-enhanced ACO for allocation) | Unknown environments: distributed sensing map; pheromone-based surveillance; GA–ACO strike allocation; Quant: coverage achieved = 100% in search demo; min inter-UAV spacing ≥ 150 m during monitoring | Grid/ideal sensing and comm; hand-tuned pheromone/GA–ACO weights; geometric (non-kinodynamic) safety
Zhang H, Ma H, Mersha B W, et al. [54] | Distributed ACO → DCS-UC (pheromone and position consensus + collision avoidance) | Online cooperative search with unstable links; Role: ACO waypointing + consensus + avoidance; Arch: decentralized on switching connected graphs; Quant: coverage completion ≈ 250 s (4 UAVs, fixed topology); collisions = 0 | 2D fixed altitude; obstacles not modeled; relies on consensus frequency/latency; ideal sensing assumed; no kinodynamic guarantees
Dhuheir M A, Baccour E, Erbad A, et al. [27] | RL → PPO (joint trajectory + distributed CNN inference) | Online surveillance with collaborative inference; Role: central agent picks layer-to-UAV and next move; Quant: avg per-request latency ≈ 0.26–0.53 s | 2D fixed altitude; fixed BW/power; no Doppler; ideal sensing; penalty-based safety; periodic re-opt. needed
Li M, Ma Q, Wu G. [28] | Attention-based RL (with GNN) + Gurobi/greedy | Dynamic task allocation with solver-based routing; Role: RL allocation + optimal/greedy path; Quant: task completion ↑ ≈ 27–60%; decision time < 1 s (multi-scale dynamic tests) | Grid/known obstacles; central coordination; solver runtime dominates at scale; geometric (non-kinodynamic) safety; frequent re-optimization under heavy dynamics
Du J. [29] | MADDPG, MAPPO | Partial-observability routing + cooperative–competitive allocation; Role: MADDPG routing + MAPPO allocation | 2D fixed altitude; ideal sensing/links; parameter-sensitive (reward/game weights); critic/replay overhead at scale; geometric/penalty safety (no kinodynamic guarantees)
Chen H C, Yen L H. [30] | RL → DQN/DRQN (CTDE) | Distributed serving–charging scheduling with limited visibility; Role: per-UAV on-board policy; Arch: CTDE/decentralized execution; Quant: average residual energy ≈ 25–55% (DQN/DRQN runs); fails when charging rate = 0.5 or 1.0 with 3 slots/station; works with 9 slots or rate = 2.0 | 2D fixed altitude; fixed BW/power; ideal sensing; penalty safety (no kinodynamic guarantees); retraining/periodic re-opt. under changing demand
Yu S. [109] | Enhanced Contract Net (hybrid) + Q-learning routing | Multi-target tracking with unknown map; Role: hierarchical clustering → hybrid auction → Q-learning paths | Ideal sensing/links; parameter-sensitive (pheromone/auction/Q-rates); auction/message overhead at scale; geometric (non-kinodynamic) safety; frequent re-optimization as targets change
Wang X, Zhang X, Lu Y, et al. [110] | LSTM–Kalman prediction + Hungarian allocation + 3D Dubins + proportional guidance | Airport bird-dispersion; Role: predict → assign → route (centralized base-station; radar-tracked target); Quant: eviction completed ≈ 71 s; optimal formation size = 5 UAVs per group by cost–benefit analysis | Centralized computation and radar dependence; assumes single-bird (no flock dynamics); ideal links; geometric (non-kinodynamic) safety
Tan C, Liu X. [67] | Improved two-stage auction (ML-tuned bidding + re-auction) | Dynamic, resource-constrained allocation; Role: learned bidding + secondary auction; Quant: task-completion improvement (ITCR) ≈ 6–7%; degradation count NPD ≈ 1/100 runs (function + mechanism) | Sensitive to k and re-auction rate; central auctioneer overhead; geometric (non-kinodynamic) safety
Wang G, Wang F, Wang J, et al. [68] | Two-Stage Greedy Auction (TSGAA) | Large-scale naval target allocation; Role: entropy-weighted initial auction + effectiveness-based reassignment; Quant: avg runtime ≈ 0.0004 s (20 UAVs/10 targets); avg runtime ≈ 0.52–0.66 s | Assumes static known map and Dubins surrogate; no comm/range constraints; weight tuning; geometric (non-kinodynamic) safety

3.5. DFD Problem (Distribution, Offline, Dynamic)

Particle Swarm Optimization Algorithms

In the DFD scenario, PSO and PSO-hybrid planners are used to tackle offline distribution problems under dynamic demand or environmental models. Typical tasks include multi-point wildfire response, WSN data collection, multi-weapon/multi-target assignment and logistics routing with time windows. A common pattern is to encode assignment and routing decisions jointly, or to couple a static pre-allocation with a lighter online re-allocation stage, so that dynamic changes are absorbed by partial replanning rather than full recomputation.
One line of work targets spatiotemporal mission fields and multi-objective routing. Yan and Chen [93] combine PSO-based routing over a mountainous DEM with an artificial-bee-colony mechanism that, at each fire point, assesses severity and decides how many UAVs stay for suppression. Beishenalieva and Yoo [94] optimize movement in a 3D WSN by maximizing a time-varying sensing-utility function while minimizing travel time and energy, with grid-cell models and communication-aware constraints. Tang et al. [98] slice dynamic logistics into quasi-static sub-instances, pre-plan routes with an improved PSO, and trigger selective replans when demand and wind patterns change. Han et al. [100] solve post-earthquake SAR routing as a VRPTW with cost and delay penalties using a PSO–GWO hybrid. These studies show that PSO can handle multi-objective trade-offs among sensing value, timeliness, energy, and fleet size on dynamic backgrounds, but they rely on known DEMs, simplified payload–energy models, and penalty-encoded constraints, and they incur non-trivial runtime once hybrid operators and time-slicing are introduced.
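The shared core of these pipelines is a population-based search over a weighted-sum objective. A minimal global-best PSO, into which a length/energy/timeliness objective would be plugged, might look like this (illustrative coefficients; the cited hybrids add bee-colony, mutation, grouping, or GWO operators on top):

```python
import random

def pso(objective, dim, n_particles=20, iters=200, seed=1,
        w=0.7, c1=1.5, c2=1.5):
    """Minimal global-best PSO minimizing `objective` over R^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = objective(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f
```

In the DFD setting, the decision vector would encode waypoints or assignment priorities and the objective a weighted sum of path length, energy, and timeliness penalties.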
The second group focuses on dynamic assignments and re-allocation. Zhang et al. [95] design an enhanced PSO with task-type coding and competitive co-evolution to steer a formation through reconnaissance–strike–assessment phases under time windows; Deng et al. [96] embed a digital twin into a multi-objective, adaptive-weight PSO so that changes in resources, demands and constraints are detected in simulation and reflected in weight adjustments; Yu [97] uses a discrete, matrix-coded PSO (with BAS-style local search) to perform static pre-assignment, and extends CBBA with partial reset to handle in-mission re-tasking under heterogeneous loads and time windows; Li et al. [99] integrate intuitionistic-fuzzy threat assessment with VNS-IBPSO for multi-weapon, multi-target assignment under evolving posture uncertainty. These pipelines separate modeling layers (threat assessment, DT-based state estimation, or pre-assignment) from PSO-based searches and from downstream re-bidding and illustrate that PSO can scale to large assignment spaces when paired with problem-specific encodings. Their limitations lie in heavy dependence on accurate priors (threat probabilities, DT fidelity, task statistics), sensitivity to penalty and weight settings, and increased time/space cost from neighborhood search, subgrouping, or repeated DT response cycles. Safety and platform limits are still implemented through geometric penalties rather than certified kinodynamic constraints.
Overall, PSO and PSO-hybrids are natural tools for DFD problems where the allocation and routing structure is complex, but the main optimization can be performed offline or in slow cycles. They provide flexible encodings and strong global search, and integrate cleanly with digital twins, fuzzy threat models, and auction-style re-assignment. At the same time, parameter sensitivity, reliance on static or quasi-static maps, and the lack of formal safety guarantees remain important gaps when moving toward certified, real-time multi-UAV deployments [93,94,95,96,97,98,99,100].
The foregoing findings are consolidated in Table 7.
In DFD, where allocation and routing are tightly coupled but can be optimized offline or on slow horizons, PSO and other meta-heuristics are the primary tools, sometimes augmented by digital twins, fuzzy threat models, or game-theoretic components. Exact MILP formulations are attractive for small to medium instances, but quickly become intractable as fleet size and task count grow, at which point PSO- and GA-based hybrids provide better scalability at the cost of deterministic optimality. Table 7 highlights that PSO excels at handling multi-objective trade-offs among timeliness, energy, and risk, while GA and other meta-heuristics are often used in outer loops for schedule design or solution diversification. From a deployment standpoint, the balance between exact optimization and heuristic search should be tuned according to instance size and the need for explainable guarantees.
Table 7. DFD literature summary.
Reference | Algorithm/Method | Applicable Scenario and Problem Addressed | Limitation
Yan X, Chen R. [93] | PSO path planning + ABC fire assessment/control | Offline multi-point routing on mountain DEM; Role: PSO for reachability, ABC for severity-driven allocation; Quant: iterations to convergence ≈ 62; per-path computation time ≈ 7.9–13.3 s | Needs known DEM/ideal sensing; PSO local-optimum/parameter sensitivity; ABC heuristic rules; geometric (non-kinodynamic) safety
Beishenalieva A, Yoo S J. [94] | PSO with grid-cell 3D model; multi-objective utility (VSI/time/energy) | Offline multi-UAV planning on static WSN maps; Role: asynchronous next-cell selection with FANET connectivity; Quant: fitness evaluations per movement ≈ 250 vs. 22,832; cumulative sensing value ≈ 95% of full-search | Needs known DEM/sensor stats; free-space/beam assumptions; point-mass and grid-cell abstraction; weight tuning; geometric (non-kinodynamic) safety
Zhang J, Cui Y, Ren J. [95] | Enhanced PSO (particle coding + competitive co-evolution) | Distributed planning for TSTs; Role: alliance + allocation + per-UAV paths; Quant: avg fitness (Scenario-1) ≈ 2.87 × 10^2; avg calc. time (Scenario-1) ≈ 2.5 s (200 runs) | Fixed-altitude, known DEM/obstacles; relies on coding/update parameters; penalty-based safety; comm model assumes low-hop, stable links
Deng M, Yao Z, Li X, et al. [96] | DMOAWPSO (DT-assisted adaptive-weighted multi-objective PSO) | Dynamic multi-objective task allocation with DT change-response; Role: DT monitors scene → MO-PSO updates | Depends on DT fidelity and latency; parameter-sensitive (subgroups/mutation/weight schedule); added compute from response rounds; penalty-based safety (no kinodynamic guarantees)
Yu Y. [97] | Discrete PSO (matrix-coded + BAS) + extended CBBA (partial reset) | Static pre-allocation + dynamic re-tasking with heterogeneous loads and time windows; Quant: overall gain highest with partial-reset CBBA; runtime shorter than full-reset and closer to no-reset | Grid and ideal sensing/links; weight/evaporation and reset-rate sensitive; geometric (non-kinodynamic) safety
Tang G, Xiao T, Du P, et al. [98] | Improved PSO (inferior-solution mutation + selective crossover); time slicing | Dynamic logistics routing sliced into quasi-static sub-instances; Role: improved-PSO pre-planning + selective replans under demand/wind changes | Quasi-static slicing of dynamics; known demand/wind models; parameter-sensitive (mutation/crossover, penalties); replanning overhead as dynamics intensify; geometric (non-kinodynamic) safety
Li Y, Chen W, Liu S, et al. [99] | VNS-IBPSO + intuitionistic-fuzzy MADM | Multi-weapon, multi-target assignment under evolving threat assessments; Role: fuzzy threat assessment → VNS-IBPSO allocation; Quant: convergence time ≈ 4.08 s; fitness variance ≈ 1.0 × 10^−4 (best among compared) | Multi-layer design (fuzzy assessment + improved BPSO updates + VNS) raises computational cost, limiting real-time use in large-scale air combat; geometric (non-kinodynamic) safety
Han D, Jiang H, Wang L, et al. [100] | PSOGWO (PSO + GWO; nonlinear factor; dynamic weighting) | Post-earthquake SAR VRPTW: minimize fleet/cost/penalties; Role: centralized offline allocation + routing; Quant: min rescue cost lower than PSO/GWO; UAV routes satisfy capacity and time windows | Static known map; homogeneous UAVs; ideal sensing/links; parameter-sensitive; geometric/penalty safety (no kinodynamic guarantees)

3.6. DFS Problem (Distribution, Offline, Static)

Genetic Algorithms

For DFS scenarios, genetic algorithms and GA–hybrid planners serve as offline schedulers and route designers that tackle tightly coupled assignment–routing problems under time windows, energy models, and heterogeneous formations. The surveyed pipelines typically split the problem into global assignment (GA/HGSA/AGA/NSGA-II) and local refinement (SCPSO, LNS, PSO-NGDP), clarifying how GA can be combined with other heuristics to handle large static instances.
Wu et al. [76] split heterogeneous formation missions into task assignment and formation/path planning. An HGSA allocator (GA + simulated annealing, with task-based feasibility and group-based selection) builds formations under rich combinatorial constraints; an OAEC controller adds a temporary-target mechanism to standard consensus so formations can be created while bypassing no-fly zones; an MPSO-C planner picks waypoints one-by-one under turn/altitude bounds and hands them to the consensus tracker; finally, a GA shifts formation departure times to remove inter-formation conflicts and the assignment is lightly re-optimized using actual path distances. The pipeline cleanly separates allocation, formation creation, waypointing, and conflict resolution, but it presumes known DEM/no-fly/radar models and ideal sensing/links, uses a fixed-altitude/point-mass abstraction, and enforces safety via geometric penalties rather than kinodynamic guarantees; performance is parameter-sensitive (temporary-point policy, sampling step, PSO settings), and scalability depends on the combined HGSA/GA search effort.
Xiong et al. [77] couple an AGA mission allocator with a SCPSO 3D router on voxelized terrain that includes mountain, tower-EMI, and severe-weather threat models. AGA uses an improved circle pre-process, roulette + elite retention, and adaptive crossover/mutation to stabilize convergence under combinatorial constraints. SCPSO adopts sine/cosine amplitude scheduling with linear-weighted inertia/acceleration to balance exploration and refinement, producing a smoothed trajectory between tasks. The pipeline cleanly separates offline assignment and offline path refinement and is tested on static DEM maps. Assumptions include known threats and a point-mass, fixed-altitude abstraction; benefits depend on data-structure/parameter choices (e.g., list storage in A*, adaptive rates in AGA, amplitude bounds in SCPSO), and safety is enforced via geometric penalties rather than kinodynamic guarantees—factors that limit direct transfer to certification-bound airspace without added dynamics-aware safety.
Pan et al. [78] formulate joint power/trajectory optimization in a UAV-enabled WPCN with obstacles and decompose it into UPAOP (power allocation + hovering points) and UTTOP (3D multi-segment trajectory). NSGA-II-KV improves non-dominated sorting GA with K-means initialization and a variable-dimension mechanism, so hovering points and transmit powers co-evolve; PSO-NGDP adds normal-distribution seeding, GA-style crossover, DE-style mutation, and a pursuit operator on a spacetime waypoint model with obstacle pretreatment. The pipeline clarifies roles—multi-objective power/hovering design then constrained 3D routing—and yields feasible, obstacle-aware paths. Limits: relies on known DEM and link/charging models, fixed-altitude/point-mass abstractions in places, and parameter sensitivity (e.g., σ/D/waypoint count for the spacetime tunnel; GA/PSO settings). Safety is penalty-based rather than kinodynamic, so certified deployment needs dynamics-aware safety and robustness to sensing/communication loss.
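The multi-objective machinery in NSGA-II-KV rests on fast non-dominated sorting, which partitions candidate solutions into Pareto fronts. A compact sketch for minimized objectives, without the crowding-distance and K-means-initialization refinements:

```python
def nd_sort(points):
    """Fast non-dominated sorting (NSGA-II core): returns Pareto fronts
    as lists of indices; all objectives are minimized."""
    n = len(points)
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and a != b
    S = [[] for _ in range(n)]   # indices each solution dominates
    cnt = [0] * n                # how many solutions dominate each index
    for i in range(n):
        for j in range(n):
            if i != j and dominates(points[i], points[j]):
                S[i].append(j)
                cnt[j] += 1
    fronts, current = [], [i for i in range(n) if cnt[i] == 0]
    while current:
        fronts.append(current)
        nxt = []
        for i in current:
            for j in S[i]:
                cnt[j] -= 1
                if cnt[j] == 0:
                    nxt.append(j)
        current = nxt
    return fronts
```

In the UPAOP stage, each point would be a (power, hovering-point) candidate scored on the competing objectives, and selection proceeds front by front.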
Du et al. [79] pose multi-UAV logistics as a cooperative routing model that jointly optimizes service assignment and routes under mixed hard/soft time windows, simultaneous pickup-and-delivery, and an energy-consumption model tied to load. The proposed IGCPA keeps GA for global exploration (elite retention, two-point crossover, swap mutation) and embeds an LNS destroy/repair operator for local improvement of customer segments; feasibility (load, energy, time windows) is enforced while repairing. The pipeline cleanly separates global search (GA) from local refinement (LNS) and yields lower-energy plans on static maps. Assumptions include a known customer/time-window/energy model and point-mass kinematics; performance depends on GA/LNS parameters and the destroy/repair policy, and the two-phase hybrid adds wall-time overhead on larger cases; safety and vehicle dynamics are handled via penalties rather than certified kinodynamic guarantees.
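The LNS destroy/repair idea inside IGCPA can be shown on a single route: remove a few customers at random, then greedily re-insert each at its cheapest position. Feasibility repair (load, energy, time windows) is omitted from this sketch:

```python
import math
import random

def route_len(route, pts):
    """Total Euclidean length of a route over indexed points."""
    return sum(math.dist(pts[route[i]], pts[route[i + 1]])
               for i in range(len(route) - 1))

def destroy_repair(route, pts, k, rng):
    """One LNS step: remove k random customers, then greedily re-insert
    each at its cheapest position (depot endpoints stay fixed)."""
    removed = rng.sample(route[1:-1], k)
    partial = [c for c in route if c not in removed]
    for c in removed:
        best_i = min(range(1, len(partial)),
                     key=lambda i: route_len(partial[:i] + [c] + partial[i:], pts))
        partial.insert(best_i, c)
    return partial
```

In IGCPA this operator runs inside the GA loop, with load/energy/time-window checks enforced during the repair phase.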
Jia et al. [80] build a heterogeneous multi-task allocation model with damage, range, and route-length costs, then propose IM-DPSO, which encodes UAV–task pairings in a two-row priority matrix and fuses GA-style cross-mutation into PSO’s self/social updates; adaptive inertia and learning-factor schedules further suppress early stagnation. Under static maps, the solver assigns ordered task lists to each UAV and yields a feasible global plan. The approach is clear in role (centralized allocation) and improves escape from local minima versus vanilla DPSO/GA, but it assumes a 2D fixed-altitude, known-map setting with ideal sensing/links, requires weight/timetable tuning (cost coefficients, crossover/mutation rates, inertia/learning schedules), and offers no dynamic re-allocation. Safety and platform limits are enforced via geometric penalties rather than kinodynamic guarantees.
Li et al. [81] pair an intuitionistic-fuzzy, time-weighted threat assessment (AHP/entropy-fused factor weights with Gaussian time weighting) with a Variable-Neighborhood Search + Improved Binary PSO allocator. Interval-valued intuitionistic fuzzy numbers encode posture uncertainty; the allocation layer improves BPSO via a V-shaped update and a VNS operator (swap/reverse/insert) to enlarge neighborhoods, suppress premature convergence, and accelerate local improvement. The pipeline cleanly separates threat assessment → allocation, yielding fast, feasible multi-weapon/multi-target plans. Limits include reliance on known hit probabilities and DEM/visibility models, a single-shot/simultaneous execution assumption during assignment, and parameter sensitivity (penalty and weight coefficients, VNS depth). Safety and platform limits are implemented through penalties rather than certified kinodynamic guarantees, so field deployment would require dynamics-aware safety and robustness to sensing/communication loss.
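The swap/reverse/insert neighborhoods that VNS layers onto BPSO can be sketched directly on a permutation encoding. The cost function below is a stand-in for the weapon–target fitness in [81], chosen only to make the descent observable:

```python
import random

def swap(s, i, j):
    s = s[:]; s[i], s[j] = s[j], s[i]; return s

def reverse_seg(s, i, j):
    return s[:i] + s[i:j + 1][::-1] + s[j + 1:]

def insert_move(s, i, j):
    s = s[:]; s.insert(j, s.pop(i)); return s

def vns_step(solution, cost, rng):
    """Variable-neighborhood descent: try swap -> reverse -> insert;
    on any improvement, return to the first (smallest) neighborhood."""
    ops = [swap, reverse_seg, insert_move]
    k = 0
    while k < len(ops):
        i, j = sorted(rng.sample(range(len(solution)), 2))
        cand = ops[k](solution, i, j)
        if cost(cand) < cost(solution):
            solution, k = cand, 0  # improvement: restart neighborhood ladder
        else:
            k += 1
    return solution

route = [3, 0, 2, 4, 1]
cost = lambda s: sum(abs(a - b) for a, b in zip(s, s[1:]))  # proxy route cost
better = vns_step(route, cost, random.Random(0))
```

In VNS-IBPSO this descent is applied to each particle's decoded assignment, enlarging the search neighborhood beyond the V-shaped binary update.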
Overall, these DFS-focused GA frameworks show that combining global GA exploration with local search or secondary meta-heuristics yields competitive energy and route-cost performance on static DEMs, even under complex time-window and formation constraints. Their limitations come from reliance on known customer/terrain/threat models, fixed-altitude point-mass abstractions, parameter sensitivity across multiple modules, and the absence of dynamic re-allocation or kinodynamic safety guarantees. In practice, they are best suited to pre-mission planning, with online adaptation delegated to lighter mechanisms when demand or environment changes [76,77,78,79,80,81].
The foregoing findings are consolidated in Table 8.
For DFS, where both distribution and environment are essentially static, planners have more freedom to trade computation for solution quality. Exact optimization and column-generation methods provide strong baselines, but struggle with very large heterogeneous fleets, whereas GA- and PSO-based hybrids scale better and can incorporate complex constraints and multi-objective criteria. Local search and neighborhood-based refinements (e.g., LNS, PSO-NGDP) are frequently layered on top of GA/PSO to avoid premature convergence. As shown in Table 8, DFS solutions used in practice are likely to employ exact or near-exact solvers for small subproblems and GA/PSO hybrids for larger instances, with the choice driven by fleet size, time windows, and the degree of heterogeneity.
Table 8. DFS literature summary.
Reference | Algorithm/Method | Applicable Scenario and Problem Addressed | Limitation
Wu Y, Liang T, Gou J, et al. [76] | HGSA + OAEC + MPSO-C + GA | Heterogeneous formation mission: GA/SA allocation → OAEC formation → MPSO-C waypointing → GA departure deconfliction; Quant: formation time (OAEC) ≈ 274.6 s; path length (MPSO) ≈ 7.54 × 10⁴ vs. PSO ≈ 1.15 × 10⁵ | Needs known DEM/no-fly/radar; fixed-altitude point-mass; parameter-sensitive (temp-point, step, PSO); safety via penalties (no kinodynamic guarantees); scalability tied to HGSA/GA search
Xiong T, Liu F, Liu H, et al. [77] | AGA (adaptive crossover/mutation) + SCPSO (sine–cosine scheduling) | Offline assignment + 3D routing on static DEM with threats; Role: AGA allocate → SCPSO route; Quant: mean path length ≈ 115.8 vs. 129.6 (SCPSO vs. PSO, 100 iters); mean path length ≈ 114.1 vs. 125.7 (200 iters) | Known DEM/threats; fixed-altitude point-mass; parameter-sensitive; geometric (non-kinodynamic) safety
Pan H, Liu Y, Sun G, et al. [78] | NSGA-II-KV (power/hovering) + PSO-NGDP (3D trajectory) | UAV-WPCN with obstacles; Role: MO power/hovering → spacetime waypoint PSO routing; Quant: coverage up to 18.03%; flight energy up to 25.30% (vs. baselines) | Needs known DEM/charging and channel models; partial fixed-altitude/point-mass; parameter-sensitive; penalty safety
Du P, He X, Cao H, et al. [79] | GA + LNS → IGCPA | Energy-aware logistics routing with mixed time windows; Role: GA global search + LNS local repair; Quant: energy cost ≈ 17% vs. GA; ≈10% vs. PSO (100-customer case) | Needs known customers/time windows/energy model; parameter-sensitive (GA/LNS); hybrid adds wall-time on large cases; point-mass and penalty safety (no kinodynamic guarantees)
Jia Z, Xiao B, Qian H. [80] | Discrete PSO → IM-DPSO (priority matrix; GA cross-mutation; adaptive weights) | Offline multi-task assignment on a static map; Role: centralized allocator with ordered task lists; Quant: execution time ≈ 18 min 27 s; total route length ≈ 396.6 km | 2D fixed altitude; known map; parameter-sensitive (weights/schedules); no dynamic re-allocation; geometric/penalty safety (no kinodynamic guarantees)
Li Y, Chen W, Liu S, et al. [81] | VNS-IBPSO + intuitionistic-fuzzy dynamic threat assessment | Offline multi-weapon/multi-target assignment; Role: fuzzy threat assess. → VNS-IBPSO allocation; Quant: convergence time ≈ 4.08 s; fitness variance ≈ 1.0 × 10⁻⁴ | Needs known hit probs/DEM; single-shot assumption; parameter-sensitive (penalties/weights/VNS depth); penalty-based safety (no kinodynamic guarantees)

3.7. CND Problem (Coverage, Online, Dynamic)

3.7.1. Reinforcement-Learning Algorithms

For CND scenarios, reinforcement learning is used to design coverage policies on grid abstractions that prioritize high-risk areas, manage energy and communication constraints, and coordinate multiple UAVs without hand-crafted patrol patterns. Most work adopts CTDE-style multi-agent RL or value-decomposition frameworks, with per-UAV actors executing onboard from local observations, while critics or mixers operate centrally during training.
The first set of studies learns risk-aware coverage policies on fixed-altitude grids. Demir et al. [31] use per-UAV DDQN agents to patrol forest cells whose fire risk is derived from a GIS-based map, with rewards balancing risk-point collection against penalties for boundary and communication-range violations, revisits, and off-time landings. Puente-Castro et al. [35] accelerate model-free Q-learning with a compact two-layer ANN that maps obstacle/position/visited-cell grids to actions and compare local versus global networks across team sizes. These approaches demonstrate that relatively simple Q-based architectures can learn to favor high-value cells and reduce boundary violations without explicit area-division rules, but they rely on accurate risk maps, fixed-altitude grid motion, and ideal sensing/links; safety and vehicle limits remain geometric, and training time and reward sensitivity are non-trivial.
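The Double-DQN update underlying such per-UAV patrol agents decouples action selection (online network) from evaluation (target network). A minimal sketch of the bootstrap target, with hypothetical Q-values standing in for the learned networks:

```python
def ddqn_target(reward, next_q_online, next_q_target, gamma=0.95, done=False):
    """Double-DQN bootstrap target: the online net picks the next action,
    the target net scores it, decoupling selection from evaluation."""
    if done:
        return reward
    a_star = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[a_star]

# Hypothetical patrol transition: reward for collected risk points,
# Q-vectors over the grid actions (e.g., N/E/S/W).
y = ddqn_target(1.0, next_q_online=[0.2, 0.9, 0.5],
                next_q_target=[0.3, 0.4, 0.8])
```

Here the online net prefers action 1, but the target net's value for that action (0.4), not its own maximum (0.8), enters the target, which tempers the overestimation bias of plain DQN.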
A second group emphasizes energy-awareness and joint communication/coverage objectives. Cheng et al. [32] couple a trace-pheromone field with MADDPG so that evaporating pheromones encode visited history and energy cues, discouraging short-term revisits and saving energy, while actor–critic learning deals with dynamics. Dhuheir et al. [33] use a meta-RL policy to adapt wireless power transfer and data-collection patterns online as team size and device demands change. Baccour et al. [34] apply PPO in a 6G setting to jointly select swarm size and clustering, device–UAV association, RIS phases, UAV trajectories and base-station power. Hou et al. [38] adopt MADDPG with CNN map features to coordinate multi-target searches and reduce repeated visits. These works show that RL can internalize complex trade-offs among throughput, harvested energy, coverage, and stability, and can adapt to changing network or demand patterns. Their common limitations are centralized critics or controllers with significant training costs, simplified propagation and interference models, fixed-altitude grids, and penalty-based safety, all of which need to be revisited for certified operation in realistic 3D airspace.
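The trace-pheromone mechanism coupled with MADDPG in TP-EDC-style designs can be sketched as a simple grid field: visits deposit pheromone, evaporation decays it each step, and the residual level feeds a revisit penalty into the reward. All parameter values below are illustrative assumptions, not the ones used in [32]:

```python
class PheromoneGrid:
    """Trace-pheromone field: visits deposit pheromone that evaporates
    over time, discouraging short-term revisits of the same cell."""
    def __init__(self, w, h, deposit=1.0, rho=0.1):
        self.grid = [[0.0] * w for _ in range(h)]
        self.deposit, self.rho = deposit, rho

    def visit(self, x, y):
        self.grid[y][x] += self.deposit

    def step(self):
        """Evaporation applied once per time step."""
        for row in self.grid:
            for x in range(len(row)):
                row[x] *= (1.0 - self.rho)

    def revisit_penalty(self, x, y, weight=0.5):
        """Reward-shaping term: more residual pheromone, larger penalty."""
        return -weight * self.grid[y][x]

field = PheromoneGrid(3, 3)
field.visit(1, 1)
field.step()
```

The actor–critic learner then sees recently visited cells as less rewarding, which is how visited history and energy cues enter the policy without explicit patrol rules.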
A third line advances multi-agent coordination mechanisms. Zou [36] scales a single-UAV coverage scheme to multi-UAV cooperation using a fused coverage/position grid in each agent’s observation and an improved QMIX mixer with a masked highway connection to better handle non-monotonic returns. He [37] further combines improved QMIX for cell selection with a CTDE multi-agent SAC motion controller that uses attention to fuse features under sensing and communication limits. In both cases, adaptive rewards and shared coverage maps reduce overlaps and conflicts compared with hand-crafted anti-flocking baselines. Assumptions again include fixed-altitude grids, ideal sensing and links, point-mass kinematics, and geometric safety; performance remains sensitive to reward weights, mixer/memory parameters and critic training cost.
Overall, RL-based CND planners confirm that learning-based coverage can reduce revisits and encode risk/energy priorities without rigid partitioning and that CTDE and value decomposition provide workable coordination patterns. At the same time, the reliance on grid abstractions, idealized communication, penalty-based safety, and heavy training burdens points to the need for dynamics-aware safety layers, more realistic environment and link models, and explicit profiling of compute and energy budgets before field deployment [31,32,33,34,35,36,37,38].

3.7.2. Area-Segmentation Algorithms

Within the CND scenario, area-segmentation methods act as structure-imposing front-ends that decompose large coverage regions into manageable sub-areas before or during online dispatch. The surveyed frameworks span wavefront-based waypoint assignment, multi-base-station and sub-region partitioning, dynamic centroid-based resizing, footprint-optimized Voronoi cells, and ID/altitude-based striping, illustrating the variety of ways in which space can be partitioned to balance load and fault tolerance.
Szklany et al. [111] propose Tsunami, an online coverage framework that discretizes a polygonal environment into GPS waypoints offline, then, at runtime, maintains a drone pool and uses a wavefront traversal to dispatch waypoints while repartitioning on the fly as UAVs fail, recharge, or join the swarm. Collision risk is handled during path generation by checking active trajectories and inserting a simple altitude offset; a wavefront order also reduces waypoint contention and idle time. The design separates offline discretization from online work distribution, giving fault tolerance and balanced load without fixed cellular partitions. Assumptions include known polygonal maps and no-fly zones, ideal sensing/links, fixed-altitude waypoint flight, and a geometric (not kinodynamic) safety rule. Performance is sensitive to traversal/parameter choices (e.g., corridor width/σ, sample count), and guaranteed separation or certified safety would require a dynamics-aware safety layer.
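A wavefront traversal of a waypoint grid is essentially a breadth-first expansion from a seed cell, which yields the dispatch order in which cells are handed to the drone pool. A minimal sketch (the grid encoding, 1 = free and 0 = blocked, is an assumption for illustration):

```python
from collections import deque

def wavefront_order(grid, start):
    """BFS wavefront over a waypoint grid: cells are emitted in rings of
    increasing distance from the seed, the order used for online dispatch."""
    h, w = len(grid), len(grid[0])
    seen, order = {start}, []
    q = deque([start])
    while q:
        x, y = q.popleft()
        order.append((x, y))
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < w and 0 <= ny < h and grid[ny][nx] \
                    and (nx, ny) not in seen:
                seen.add((nx, ny))
                q.append((nx, ny))
    return order

order = wavefront_order([[1, 1], [1, 0]], (0, 0))
```

Because neighbors are popped in ring order, adjacent UAVs pull spatially separated waypoints, which is what reduces contention and idle time in the pool-based scheme.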
Yu et al. [112] propose two cooperative coverage planners for obstacle-rich regions with energy constraints. MBS-MUCCPPAFOA uses multiple base stations around the area and plans a global “square-wave” coverage route, letting each UAV depart/return from the nearest station to cut deadhead distance. MUAV-CCPPAFOA-AS first segments the workspace into four sub-areas and plans per-area coverage, reducing long transits and redundant sweeps. Both retain flexible A* obstacle avoidance and assign per-UAV time budgets via a binary-search allocator. The design separates global allocation/partitioning from per-UAV coverage execution, improving robustness across shapes and obstacle densities. Assumptions include a fixed-altitude grid, known polygonal map/no-fly zones, and ideal sensing/links; safety is geometric rather than kinodynamic. Sensitivities arise from partition balance and station placement (boundary oscillations and load imbalance when obstacles cluster), so field deployment would need dynamics-aware safety and online map/communication robustness.
Gui et al. [113] propose a decentralized exploration scheme that continuously resizes each UAV's workspace from a dynamic centroid computed from position and task weight (effective RRT candidate density), so lighter-loaded UAVs inherit larger sectors while heavier-loaded ones shrink. Each vehicle runs an NBV planner that samples RRT targets only inside its current partition, updates its OctoMap locally, and shares only poses/weights over a lightweight 5G/Wi-Fi link. Partition rays (planes through the centroid) repel candidates to suppress overlap, and the first segment of the best branch is executed each cycle. The framework yields balanced assignments and fewer revisits in cluttered indoor/outdoor scenes (system diagram Figure 1, partitioning rules pp. 4–6). Assumptions include known initial relative poses, ideal sensing/links, and a fixed-altitude OctoMap grid. Sampling randomness can induce partition oscillations and trial-to-trial variance (pp. 10–12), and safety is geometric rather than kinodynamic, so certified deployment would require dynamics-aware safety and robustness to sensing/communication loss.
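The dynamic centroid that drives this resizing can be sketched as a load-weighted mean of UAV positions: weighting each pose by its task load pulls the dividing point toward heavily loaded vehicles, shrinking their sectors. This is an illustrative simplification of the DCAS rule, not the exact formula from [113]:

```python
def weighted_centroid(poses, loads):
    """Load-weighted centroid: the partition reference point shifts toward
    heavily loaded UAVs, so their sectors shrink and lighter ones grow."""
    total = sum(loads)
    cx = sum(w * p[0] for p, w in zip(poses, loads)) / total
    cy = sum(w * p[1] for p, w in zip(poses, loads)) / total
    return cx, cy

# Two UAVs: the right one carries three times the task load,
# so the centroid lands much closer to it.
centroid = weighted_centroid([(0, 0), (10, 0)], [1, 3])
```

Partition planes through this centroid then bound each UAV's sampling region for the next NBV cycle.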
Swain et al. [114] build a three-layer pipeline: (i) Voronoi partitions the region into non-overlapping sub-polygons; (ii) within each sub-polygon, waypoints are laid out by optimizing camera-footprint rectangles for high inside-coverage and low spillover; (iii) an online path designer links waypoints and a rule set detects/avoids static and dynamic collisions (FGM-I for static, wait/speed-up rules from collision-cone geometry for cross-over/head-on, timed sleeps for lateral cases). The design separates global partitioning and footprint optimization from online motion and yields shorter routes with fewer collision interventions than baselines. Assumptions include a fixed-altitude 2D grid, ideal sensing/links, and point-mass kinematics; safety is geometric (no kinodynamic guarantees); performance is sensitive to sensing-range/rectangle parameters, and dynamic-obstacle handling hinges on accurate collision-cone estimates.
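The first stage of such pipelines, Voronoi partitioning, reduces on a discrete map to assigning each cell to its nearest generator point. A minimal sketch, assuming grid cells and seed positions as 2D tuples (a stand-in for the sub-polygon construction in [114]):

```python
def voronoi_assign(cells, seeds):
    """Discrete Voronoi partition: each grid cell is assigned to the index
    of its nearest seed (sub-region generator), using squared distance."""
    def d2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return {c: min(range(len(seeds)), key=lambda i: d2(c, seeds[i]))
            for c in cells}

assignment = voronoi_assign([(0, 0), (4, 0), (2, 1)], [(1, 0), (3, 0)])
```

Ties go to the lower seed index; within each resulting sub-region the footprint-rectangle waypointing and online avoidance layers then operate independently.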
Bakirci [115] proposes a distributed scanning scheme for post-disaster surveillance that assigns each UAV an independent sector by segmenting the scene by UAV ID and altitude tier; every drone derives its camera footprint from the horizontal FoV to maximize at-point coverage and prevent overlap, while boundary scan counters guarantee full-area completion (phases illustrated in Figure 1, p. 3). A communication layer uses connector UAVs to relay data and maintain links when ground networks fail (Figure 2, p. 4); k-means aids rapid formation re-convergence, and an HVCR strategy shrinks insecure ranges by organizing border nodes. The approach is simple and scalable for large zones, yet it assumes a known polygonal map, fixed-altitude waypoint flight, and ideal sensing/links; collision avoidance and vehicle limits are enforced by geometric rules rather than kinodynamic guarantees, and performance depends on partition and communication parameters.
Altogether, these segmentation schemes confirm that carefully designed partitions can reduce overlap, shorten transit legs, and improve coverage time, especially when combined with simple local planners or A* avoidance. Their weaknesses include dependence on known polygonal maps and no-fly zones, fixed-altitude waypoint abstractions, parameter sensitivity (corridor width, station placement, partition rays, sensing range), and purely geometric safety. For real-world, certification-bound coverage missions, segmentation must therefore be coupled with dynamics-aware safety layers and robust online map/communication updates [111,112,113,114,115].

3.7.3. Meta-Heuristic and Bio-Inspired Algorithms

For CND, meta-heuristic and bio-inspired methods are used more sparingly and problem-specifically, typically to optimize sensor placement or inspection roles, rather than full online trajectories. The two representative works here focus on booby-inspired zone/role switching for indoor pipe inspection and MRFO–Tabu hybrids for smart-city UAV placement, both treating coverage as a static optimization over grids or LoS-constrained city models.
Aljalaud et al. [69] design a problem-specific heuristic for indoor pipe inspection inspired by booby foraging: UAVs operate in two modes—default inspection and area-restricted search around detected defects—and adopt dynamic primary/secondary/temporary roles to balance load across clustered zones. The map is gridded; requests and role changes are governed by zone state and proximity; simple collision rules and altitude layering maintain separation. This architecture separates zone assignment and local scanning, reduces revisits via ARS, and runs with lightweight data structures rather than heavy meta-heuristics. Assumptions include a fixed-altitude grid with known geometry, ideal sensing/links, and point-mass motion. Safety is geometric rather than kinodynamic, parameters (e.g., zone thresholds, cluster count) require tuning, and performance depends on the chosen clustering and role-switch logic.
Saadi et al. [70] cast smart-city UAV placement as a weighted multi-objective problem (maximize user coverage and inter-UAV connectivity, minimize energy and load imbalance) and propose IMRFO-TS: MRFO is strengthened by a tangential non-linear control to keep exploration dominant late in training, and then a Tabu Search local phase refines the best candidate for each iteration. The system model and objectives (coverage radius from altitude/visibility, connectivity from pairwise range, energy from flight-power model, load balance) are explicitly defined (see § III, pp. 5–6, with the architecture and coverage diagrams on pp. 6–7). The hybrid cleanly separates global exploration (MRFO with non-linear schedule) from local exploitation (TS), improving escape from local optima on static 3D urban maps. Limitations include reliance on known DEM/LoS propagation and fixed power models, parameter sensitivity (control schedule; Tabu neighborhood size), and additional runtime/space overhead per Tabu round. Safety is enforced via geometric margins rather than kinodynamic guarantees, so certified deployments would require dynamics-aware safety and robustness to sensing/communication loss.
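The Tabu Search local phase appended to MRFO in IMRFO-TS-style hybrids can be sketched generically: move to the best non-tabu neighbor each iteration and remember recent solutions to avoid cycling. The neighborhood and cost below are hypothetical one-dimensional stand-ins for the placement objective in [70]:

```python
def tabu_refine(x0, neighbors, cost, iters=50, tenure=5):
    """Tabu-search refinement: accept the best non-tabu neighbor (even if
    worse), keep a short memory of visited solutions, track the best seen."""
    best = cur = x0
    tabu = []
    for _ in range(iters):
        cands = [n for n in neighbors(cur) if n not in tabu]
        if not cands:
            break
        cur = min(cands, key=cost)
        tabu.append(cur)
        if len(tabu) > tenure:
            tabu.pop(0)  # forget the oldest move
        if cost(cur) < cost(best):
            best = cur
    return best

# Toy placement: integer position, quadratic cost with optimum at 7
best = tabu_refine(0, lambda x: [x - 1, x + 1], lambda x: (x - 7) ** 2)
```

Accepting non-improving moves while the tabu list blocks immediate backtracking is what lets the local phase walk out of shallow local optima left by the global MRFO sweep.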
These CND meta-heuristics show that specialized heuristics and hybrid foraging-inspired optimizers can improve coverage, connectivity, and load balance in static or slowly varying environments, while keeping runtime moderate compared with full-blown multi-stage frameworks. However, they depend on known geometry and propagation models, fixed-altitude grids, tuned control schedules, and neighborhood sizes, and again implement only geometric safety. Their role is, therefore, primarily to provide offline deployment or placement designs, with online coverage behavior left to lighter controllers [69,70].
The foregoing findings are consolidated in Table 9.
In CND, RL, area-segmentation and meta-heuristic approaches occupy complementary niches. RL-based coverage policies excel at learning risk- and energy-aware behaviors from grids or abstract maps, especially under partial observability, but require substantial training and depend on the fidelity of risk and communication models. Area-segmentation methods based on cellular decomposition, Voronoi, and centroid partitions provide deterministic structure, reducing overlap and simplifying routing, while meta-heuristic and bio-inspired methods fine-tune placements or roles in static or slowly varying environments. Table 9 suggests that real deployments will often pair a relatively stable segmentation scheme with either RL or simpler heuristic controllers at the vehicle level, leveraging RL when adaptation to complex risk or traffic patterns is critical, and compute permits.
Table 9. CND literature summary.
Reference | Algorithm/Method | Applicable Scenario and Problem Addressed | Limitation
Demir K, Tumen V, Kosunalp S, et al. [31] | RL → DDQN with risk-map grid | Wildfire reconnaissance coverage; Role: risk-prioritized patrol; Arch: per-UAV agent + replay/target net; Assump: fixed altitude, grid map, multi-hop to GS; Quant: point-collection ratio ≈ 30%; boundary violations → 0 after training | Grid/known risk weights; ideal sensing/links; long training; point-mass and geometric safety (no kinodynamic guarantees)
Cheng X, Jiang R, Sang H, et al. [32] | RL → MADDPG + trace-pheromone (TP-EDC) | Energy-aware dynamic coverage with stigmergy; Role: pheromone-guided policy; Arch: per-UAV actors + shared replay; Quant: average coverage rate ≈ 0.92 (6 UAVs); normalized average energy consumption ≈ 0.61 | Grid and fixed altitude; ideal sensing/links; parameter-sensitive (evaporation/bounds); geometric (non-kinodynamic) safety
Dhuheir M, Erbad A, Al-Fuqaha A, et al. [33] | Meta-RL for EH + WIT (disaster zones) | Energy-harvesting + data collection with time/energy/data-rate constraints; Role: central learner + on-board actors; fixed-altitude grid; Quant: total harvested energy +25% vs. DQN; +32% vs. PSO (also higher than greedy) | Grid and fixed altitude; ideal links; no Doppler/interference; penalty-based safety (no kinodynamic guarantees); training cost/episodes
Baccour E, Erbad A, Hamdi M, et al. [34] | PPO for adaptive UAV clustering + RIS phase + association + trajectory + BS power | 6G urban anti-jamming: joint association/trajectory/beam/power with dynamic swarm sizing | Centralized PPO at BS; LoS/propagation assumptions; RIS energy neglected; ZF/SIC to ignore IUI; training cost; geometric (non-kinodynamic) safety
Puente-Castro A, Rivero D, Pedrosa E, et al. [35] | RL → Q-Learning + 2-layer ANN (replay, ε-greedy) | Obstacle-rich grid coverage; Role: local vs. global ANN controllers; Quant: point-collection/action counts reported per map and team size; boundary violations drop to ~0 after training | Fixed-altitude grid; ideal sensing/links; point-mass abstraction; reward/ANN/replay sensitivity; geometric (non-kinodynamic) safety
Zou L. [36] | Improved QMIX (state-space fusion + masked-highway mixer) | Cooperative area search on fixed-altitude grids; Role: fused-grid observation + decentralized actions + joint value mixing; Quant: steps to full coverage ≈ 74 (2 UAVs); ≈35 (5 UAVs) | Grid and fixed altitude; ideal sensing/links; sensitive to mask weight λ and reward settings; geometric (non-kinodynamic) safety
He J. [37] | RL (CTDE) → improved QMIX + multi-agent SAC (attention) | Cooperative coverage with conflict mitigation; Role: grid-cell planning + obstacle-aware motion; Quant: average coverage time reduced ≥11.6% vs. Anti-Flocking; mean coverage time (4 UAVs) ≈ 109 s | Fixed-altitude grid; ideal sensing/links; point-mass model; geometric (non-kinodynamic) safety; centralized critics/training cost
Hou Y, Zhao J, Zhang R, et al. [38] | RL → MADDPG (CTDE) with CNN map features | Large-scale cooperative target search; Role: decentralized actors + centralized critics; Quant: overall steps to find all targets ≈ 2631 vs. 3012 (DQN)/3150 (ACO); success rate ≈ 10% higher than DQN/ACO | Fixed-altitude grid; ideal sensing/links; point-mass model; reward/hyper-parameter sensitivity; centralized-critic training cost; geometric (non-kinodynamic) safety
Szklany M, Cohen A, Boubin J. [111] | Online partitioning + wavefront traversal (“Tsunami”) | Fault-tolerant swarm coverage with dynamic reassignment; Role: offline waypoint grid → online dispatch via drone pool; Quant: coverage time 1.6× faster (ideal); 1.91× faster with faults vs. cellular decomposition (SCoPP) | Requires known polygonal map/NFZs; fixed-altitude, waypoint flight; ideal sensing/links; geometric (non-kinodynamic) safety; parameter-sensitive (corridor/σ/samples)
Yu Y, Lee S. [112] | MBS-MUCCPPAFOA; MUAV-CCPPAFOA-AS (A* avoidance) | Energy-aware multi-UAV coverage with no-fly zones; Role: multi-base-station global plan/four-region segmentation + per-UAV execution; Quant: proposed methods complete coverage at 950 × 950 m where baseline MUSCPP fails; lower completion times across 600–950 m sizes and varied aspect ratios | Fixed-altitude grid; known polygonal map/NFZs; partition/load sensitivity; geometric (non-kinodynamic) safety; ideal sensing/links
Gui J, Yu T, Deng B, et al. [113] | DCAS + NBV/RRT (decentralized) | Dynamic centroid area-segmentation + per-UAV NBV in partitions; Role: partition (load-balanced) + local RRT; Arch: decentralized; Quant: mean completion time (5 UAVs, outdoor) ≈ 376.7 s; indoor ≈ 86.2 s | Needs known initial poses, ideal sensing/links, fixed-altitude OctoMap; sampling variance/partition oscillation; geometric (non-kinodynamic) safety
Swain S, Khilar P M, Senapati B R. [114] | Voronoi + camera-footprint waypointing + online path and collision (FGM-I + cone rules) | Multi-UAV coverage with static and moving obstacles; Role: offline partition/waypoints + online path/avoidance; Quant: path length 16.88 m vs. 20.43 m and collision-avoidance events 5 vs. 7 | Fixed-altitude 2D, ideal sensing/links, parameter sensitivity (Rsense/footprint); geometric (non-kinodynamic) safety
Bakirci M, Ozer M M. [115] | Distributed Swarm Scanning (DSS): ID/altitude segmentation + FoV footprints + boundary counters; connector UAV relays | Post-disaster area scanning; Role: per-UAV sector scan + comm relay; Arch: decentralized scanning + relay layer | Assumes known polygon map and fixed-altitude waypoints; ideal sensing/links; parameter-sensitive (partition and comm); geometric (non-kinodynamic) safety
Aljalaud F, Kurdi H, Youcef-Toumi K. [69] | Booby-inspired heuristic inspection (default + ARS modes; role switching) | Indoor pipe inspection; Role: zone assignment + local scanning; Arch: lightweight heuristic; Quant: mean defect-detection time ↓ ≥13% vs. random; runtime speedup ≥ 3× vs. random | Fixed-altitude grid; ideal sensing/links; parameter-sensitive (zone/thresholds/clusters); geometric (non-kinodynamic) safety
Saadi A A, Soukane A, Meraihi Y, et al. [70] | IMRFO-TS (MRFO + Tabu; tangential non-linear control) | Smart-city UAV placement optimizing coverage/connectivity/energy/load; Role: MRFO exploration + TS local refinement | Needs known DEM/LoS and fixed power model; parameter-sensitive; TS adds runtime/space; geometric (non-kinodynamic) safety

3.8. CFD Problem (Coverage, Offline, Dynamic)

Particle Swarm Optimization Algorithms

In the CFD scenario, PSO and PSO-hybrid planners are primarily used as coverage-route synthesizers on static or quasi-static maps with dynamic targets or risk fields. The surveyed methods combine PSO with receding-horizon optimization, clustering and ACO seeding, cooperative coevolution, or multi-sub-swarm learning, and often include B-spline smoothing, aiming to minimize non-spraying time, maximize detection probability, or balance coverage and energy.
Cheng et al. [101] formulate heterogeneous-swarm search as an online path-optimization problem: UAV kinematics are unified in polar coordinates to accommodate different maneuver envelopes; a multi-search situation map (environment search, pheromone, and target-probability layers) is updated each step—pheromones via a Gaussian-integral growth/volatilization rule—to steer exploration; and an enhanced PSO-RHC optimizes a finite-horizon sum of rewards, with an “optimal-solution rolling” inheritance to accelerate convergence between horizons. This cleanly separates (i) a geometry- and sensing-aware scene model from (ii) horizon-based path optimization that can replan under dynamic targets. The approach, however, relies on a 2D fixed-altitude abstraction and ideal sensing/links, assumes inter-UAV collision is absent (different altitudes), and introduces extra compute/latency from horizon expansion despite the inheritance trick; performance is sensitive to pheromone and PSO hyper-parameters (λ, σ, swarm size, Q). Certified deployment would require a dynamics-aware safety layer and robustness to sensing/communication loss.
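The receding-horizon pattern behind PSO-RHC can be sketched in one dimension: at each step, a small PSO optimizes the next H commands against a horizon reward, only the first command is executed, and the shifted previous best seeds the next horizon (the "optimal-solution rolling" inheritance). Everything below, the 1D state, bounds, and PSO coefficients, is an illustrative assumption, not the model of [101]:

```python
import random

def pso(fitness, dim, lo, hi, n=20, iters=40, seed=0, init=None):
    """Minimal PSO maximizer used inside each receding horizon."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    if init is not None:
        pos[0] = list(init)  # inherit the shifted best from the last horizon
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    for _ in range(iters):
        gbest = max(pbest, key=fitness)
        for i in range(n):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.6 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            if fitness(pos[i]) > fitness(pbest[i]):
                pbest[i] = pos[i][:]
    return max(pbest, key=fitness)

# Receding horizon: optimize H future speed commands, fly the first, re-plan.
target, x, H = 5.0, 0.0, 3
plan = None
for _ in range(6):
    fit = lambda u: -abs(x + sum(u) - target)       # horizon reward
    seed_plan = plan[1:] + [0.0] if plan else None  # rolled-over best solution
    plan = pso(fit, H, -1.0, 1.0, init=seed_plan)
    x += plan[0]                                    # execute first command only
```

Replanning every step is what lets the scheme react to moving targets, while the rolled-over seed keeps successive horizons from restarting the search from scratch.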
Pehlivanoglu et al. [102] fuse Fuzzy C-Means clustering and ACO ordering to generate a high-quality initial swarm, then represent each path with cubic splines and introduce additional waypoints around locally clustered collision points to handle obstacles. Two further variants add a vibrational mutation for diversity and a local-solution prediction step that assembles the best curve segments into a new particle in each iteration. The pipeline cleanly separates candidate generation (FCM/ACO), obstacle repair (extra waypoints), and PSO refinement, producing smooth, collision-free routes on rural/urban DEMs. Assumptions include a static, known map, fixed-altitude point-mass motion, and geometric (not kinodynamic) safety; performance is sensitive to sampling and “corridor” parameters (e.g., σ, D, sample count), and the extra-waypoint repair may reduce spline flexibility or induce sharp turns for fixed-wing vehicles.
Li et al. [103] cast a multi-UAV moving-target search as maximizing the cumulative detection probability under a Markov target model and motion-aware sensing. Each UAV’s path is motion-encoded (amplitude/direction steps) and evolved by its own subswarm; a cooperative-coevolution loop couples sub-swarms by evaluating each particle with the current global bests of the other sub-swarms, and a stagnation detector resets a sub-swarm when it collapses near a local optimum. A layered scene model (target motion, sensor detection, target-probability map) supplies fitness, so the optimizer alternates global coevolution with per-UAV motion updates. The method improves escape from local traps relative to single-swarm PSO/ACO variants, but it assumes a 2D, fixed-altitude grid, ideal sensing/links, and no explicit inter-UAV collision model (separation handled geometrically); performance is parameter-sensitive (e.g., inertia/coefficients, clamping, convergence ε, swarm size) and incurs extra evaluation cost from cross-swarm fitness coupling.
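The cross-swarm fitness coupling at the heart of such cooperative coevolution can be sketched in a few lines: a particle from one sub-swarm is scored by splicing it into the current global bests of the other sub-swarms. The joint objective below is a hypothetical stand-in for the cumulative detection probability in [103]:

```python
def cc_fitness(candidate, idx, team_bests, joint_fitness):
    """Cooperative-coevolution evaluation: score one sub-swarm's particle
    in the context of the other sub-swarms' current global bests."""
    team = list(team_bests)     # copy so the shared context is untouched
    team[idx] = candidate
    return joint_fitness(team)

# Hypothetical joint objective over three UAVs' plan parameters
team_bests = [2.0, 3.0, 4.0]
joint = lambda team: -(sum(team) - 10.0) ** 2
score = cc_fitness(5.0, 1, team_bests, joint)
```

Because each sub-swarm only ever varies its own slot, the evaluation cost grows with team size, which is the extra overhead noted above; the stagnation detector then resets a sub-swarm whose context-conditioned best stops improving.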
Tang et al. [104] divide the swarm into fitness-based sub-swarms and assign level-specific update rules (global-best, sub-swarm-best, ordinary) with an adaptive inertia schedule, so higher-fitness particles guide lower-fitness ones while preserving diversity. Multi-subswarm learning is combined with variable-length path encoding and B-spline smoothing to generate flightable routes that minimize non-spraying time (return trips, replenishment, turns). The pipeline separates candidate generation and local refinement and stabilizes convergence versus single-swarm PSO. Assumptions include a fixed-altitude, known DEM environment, and point-mass motion; performance is sensitive to sub-swarm boundaries and learning weights, and extra overhead arises from sub-swarm division/coordination. Safety and vehicle limits are implemented via geometric penalties rather than kinodynamic guarantees.
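The B-spline smoothing step used to turn waypoint polylines into flyable routes can be sketched with uniform quadratic segments, each blending three consecutive control points; the basis weights below are the standard quadratic B-spline ones, while the waypoints are illustrative:

```python
def bspline_point(p0, p1, p2, t):
    """Evaluate a uniform quadratic B-spline segment at t in [0, 1];
    the three basis weights always sum to 1."""
    b0 = 0.5 * (1 - t) ** 2
    b1 = 0.5 * (-2 * t ** 2 + 2 * t + 1)
    b2 = 0.5 * t ** 2
    return tuple(b0 * a + b1 * b + b2 * c for a, b, c in zip(p0, p1, p2))

def smooth_path(waypoints, samples=5):
    """Slide a 3-point window along the waypoint list, sampling each
    spline segment to produce a smoothed route."""
    out = []
    for i in range(len(waypoints) - 2):
        seg = waypoints[i:i + 3]
        for k in range(samples):
            out.append(bspline_point(*seg, k / samples))
    out.append(bspline_point(*waypoints[-3:], 1.0))
    return out

path = smooth_path([(0, 0), (1, 1), (2, 0), (3, 1)])
```

The smoothed curve stays inside the convex hull of each control triple, which is what bounds turn sharpness between the optimizer's raw waypoints.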
Wang et al. [105] design a distributed exploration framework for unknown space that evaluates paths and terminal states jointly: multi-step selective sampling proposes several short-horizon path/terminal sequences from a grid risk map, and an improved discrete binary PSO (with contraction factor and age-triggered mutation) selects a collision-free sequence that maximizes exploration gain while penalizing energy, action discontinuity, and inter-UAV proximity. A terminal-guided evaluator favors movement toward frontiers; the controller executes the first segment and replans, enabling online coverage without a central planner. Assumptions include a fixed-altitude 2D grid with ideal sensing/links; safety and dynamics are handled by geometric penalties rather than kinodynamic guarantees, and performance is sensitive to sampling horizon/weights and DBPSO parameters.
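The two mechanisms highlighted here, a contraction-factor binary PSO update and age-triggered mutation, can be sketched per bit: the velocity passes through a sigmoid that gives the probability of the bit being 1, and a stagnating particle gets a random bit flip. Coefficient values are illustrative assumptions, not those of [105]:

```python
import math, random

def dbpso_update(vel, bit, pbest_bit, gbest_bit, rng,
                 chi=0.73, c1=1.5, c2=1.5, vmax=4.0):
    """One bit of a discrete binary PSO step with contraction factor chi:
    the clipped velocity maps through a sigmoid to P(bit = 1)."""
    v = chi * (vel + c1 * rng.random() * (pbest_bit - bit)
               + c2 * rng.random() * (gbest_bit - bit))
    v = max(-vmax, min(vmax, v))
    prob = 1.0 / (1.0 + math.exp(-v))
    return v, 1 if rng.random() < prob else 0

def age_mutation(bits, age, rng, limit=10):
    """Age-triggered mutation: flip one random bit once a particle has
    gone `limit` iterations without improvement."""
    if age >= limit:
        i = rng.randrange(len(bits))
        bits = bits[:i] + [1 - bits[i]] + bits[i + 1:]
    return bits

rng = random.Random(0)
v, b = dbpso_update(0.0, 0, pbest_bit=1, gbest_bit=1, rng=rng)
```

In the full planner these updates select among sampled path/terminal sequences, with the exploration-gain fitness and proximity penalties supplying pbest/gbest.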
Yan et al. [106] fuse an O-FCM area divider—membership weighted by nearest-neighbor distance to sharpen cluster borders—with a PSO–ACO path planner that uses dynamic pheromone updating and niche-style PSO refinement to cut repeated sweeps. A honeycomb (hexagonal) coverage model supplies waypoint seeds; DEM-based elevation and sensor cone geometry shape feasible footprints, and routes are smoothed after optimization. The pipeline cleanly separates zoning (balanced sub-areas) from routing (hybrid PSO–ACO), improving load balance and full-coverage planning on static mountain maps. Assumptions include a known 3D terrain map and sensor model, fixed-altitude/point-mass kinematics, and ideal sensing/links; safety is geometric (no kinodynamic guarantees). Performance is parameter-sensitive (e.g., O-FCM weights, pheromone/evaporation, niche PSO settings), and the two-stage hybrid adds runtime overhead versus single-family planners.
Chen et al. [107] propose an IPSO-based deployment scheme that maximizes coverage and uplink rate for UAV-aided data collection over randomly distributed sensors. The area is tiled by Reuleaux-triangle grids to form a distributed coverage area (DCA); a Roman-domination-based “attractive-source” selector prevents different UAVs from choosing adjacent vertices and reduces overlap, while two-stage IPSO (exploration → coverage) updates inertia/coefficients to exchange information early and spread platforms toward uncovered edges later. After deployment, a convex maximum-collection program assigns sensors to UAVs to raise throughput. The method separates (i) geometry- and channel-aware coverage modeling from (ii) IPSO deployment and (iii) convex association, but it assumes a fixed-altitude 2D map, ideal sensing/links, and known channel/LoS parameters; safety is geometric (no kinodynamic guarantees), and performance is parameter-sensitive (e.g., λ/μ/ν in the vertex utility, ω and c1–c3 schedules).
Yan et al. [108] build a clustering–optimization pipeline that first partitions targets by a sequential geographic clustering (tuning cluster diameter and migration distance), solves a per-cluster TSP with ACO to seed tours, and then refines both fleet size and routes with PSO/CSO. The workflow separates 2D assignment (number of UAVs, per-cluster tours) from 3D track shaping (climb/descend angles, sensor viewing geometry), with a Sobol analysis to rank platform–payload factors that drive mission time. Assumptions include a known static map, point-mass kinematics, and fixed altitude; performance is sensitive to clustering/weight settings and the choice of ACO/PSO/CSO parameters, and safety is enforced by geometric margins rather than kinodynamic guarantees—caveats that matter for certified deployment.
Taken together, these CFD PSO pipelines show that PSO can efficiently explore large coverage spaces, handle multi-objective trade-offs, and generate smooth, feasible routes for plant protection, checkpoint coverage, and dynamic-target search. Their limitations are familiar: fixed-altitude grid or DEM abstractions, ideal sensing/links, parameter sensitivity (inertia coefficients, swarm size, clustering, and corridor parameters), additional overhead from multi-swarm coordination and cross-swarm fitness coupling, and geometric rather than kinodynamic safety. Bridging the gap to certified deployments will require dynamics-aware safety layers, more realistic sensing/communication models, and careful profiling of runtime under operational update rates [101,102,103,104,105,106,107,108].
The foregoing findings are consolidated in Table 10.
For CFD missions, PSO and PSO-hybrids again form the core planning tools, reflecting the need to design coverage routes on static or quasi-static maps while accounting for dynamic targets or risk fields. Clustering, graph seeding, and multi-swarm cooperation help PSO manage large search spaces and multi-objective criteria such as non-spraying time, detection probability, and energy. Compared with CND, the focus is more on synthesizing efficient sweep patterns and less on fine-grained online adaptation, so RL plays a limited role. As summarized in Table 10, CFD deployments are likely to use PSO-based coverage planners combined with relatively simple local controllers, with the main design choices concerning how much structure (e.g., rows, corridors, clusters) is imposed before PSO optimization.
Table 10. CFD literature summary.
Reference | Algorithm/Method | Applicable Scenario and Problem Addressed | Limitation
Cheng K, Hu T, Wu D, et al. [101] | PSO-RHC + polar-coord. model + multi-layer map | Heterogeneous-swarm dynamic-target search; Role: horizon-based online replanning guided by pheromone/TPM; Quant: avg detected targets = 8/8; avg steps to finish ≈ 313.5 | 2D fixed altitude; ideal sensing/links; collision ignored (altitude separation); extra compute from horizon; parameter-sensitive (λ, σ, swarm/Q)
Pehlivanoglu V Y, Pehlivanoğlu P. [102] | PSO (FCM + ACO seeding; waypoint repair; mutation; prediction) | Offline multi-UAV checkpoint coverage on static DEMs; Role: seeded PSO + obstacle repair; Quant: mean utility (rural-1, 2 UAVs) 153 vs. 10,091 (PSO-3 vs. PSO-1); iterations ≥ 45% fewer than PSO-2 | Static known map; fixed-altitude point-mass; parameter-sensitive (σ/D/samples); extra-waypoint repair may constrain spline turns; geometric (non-kinodynamic) safety
Li Y, Chen W, Fu B, et al. [103] | Cooperative-Coevolution, Motion-Encoded PSO (CC-MPSO) | Dynamic-target search; Role: per-UAV sub-swarms + cross-swarm fitness coupling; Arch: centralized evaluation with distributed sub-swarms; Quant: optimized 52/75 statistical items across six scenarios; e.g., Scenario-2 (UAV2) mean detection prob. ≈ 0.153 vs. PSO ≈ 0.125 | 2D fixed-altitude grid; ideal sensing/links; no explicit collision model (geometric spacing only); parameter-sensitive (ω, c1/c2, clamp, ε, swarm size); extra compute from cross-swarm fitness coupling
Tang Y, Huang K, Tan Z, et al. [104] | MSC-PSO (fitness-based sub-swarms + level-based learning + adaptive inertia) | Plant-protection paths on static DEM; Role: multi-subswarm PSO + B-spline smoothing; Quant: convergence iterations ≈ 52; total non-spraying time ≈ 12.3 min in sim field | Fixed-altitude known map; parameter-sensitive (sub-swarm bounds/weights); overhead from sub-swarm coordination; geometric (non-kinodynamic) safety
Wang Y, Li X, Zhuang X, et al. [105] | DNBPT: multi-step gain sampling + improved DBPSO | Distributed exploration of unknown grids; Role: path–terminal co-evaluation + rolling execution; Quant: mean exploration time ≈ 65.9 s (Scene II, fixed); ≈205.4 s (Scene III, fixed) | Fixed-altitude 2D grid; ideal sensing/links; geometric (non-kinodynamic) safety; parameter-sensitive (horizon/weights/DBPSO)
Yan X, Chen R, Jiang Z. [106] | PSOHAC: O-FCM zoning + PSO–ACO routing | Offline coverage on mountain DEM; Role: balanced zoning + hybrid routing; Quant: total path length ≈ 1348.6 m; flight time ≈ 78.2 s | Needs known DEM/sensor model; fixed-altitude point-mass; parameter-sensitive (O-FCM/pheromone/PSO); added hybrid overhead; geometric (non-kinodynamic) safety
Chen Y, Qin D, Yang X, et al. [107] | IPSO + Reuleaux-tiling DCA + Roman-domination + convex association | Offline deployment and association for WSN data collection; Role: two-stage IPSO deploy → convex sensor–UAV assignment | Fixed-altitude 2D, ideal sensing/links; parameter-sensitive (λ/μ/ν, ω, c1–c3); geometric (non-kinodynamic) safety
Yan Y, Sun Z, Hou Y, et al. [108] | Clustering + ACO(TSP) seeding + PSO/CSO refinement | Offline fleet sizing + routing on static map; Role: 2D assignment → 3D track shaping; Quant: number of UAVs after optimization = 23; mean path length ≈ 64.34 km | Static known map; sensitive to cluster diameter/migration and ACO/PSO/CSO params; geometric (non-kinodynamic) safety

3.9. CFS Problem (Coverage, Offline, Static)

3.9.1. Genetic Algorithms

In CFS, GA-based planners focus on capacitated coverage-routing problems, such as multi-UAV pesticide spraying, where both vehicle capacity (battery, tank) and route length must be considered jointly. The representative memetic GGA combines GA exploration with guided local search to refine multi-tour chromosomes, treating coverage as a capacitated VRP on static grid maps.
Jasim et al. [82] cast pesticide-spraying with multiple UAVs as a capacitated VRP that jointly minimizes battery and tank consumption, and propose a memetic GGA: a GA generates and evolves FIFO-decoded multi-tour chromosomes (selection/2-point crossover/swap–reverse–insert mutation), while guided local search improves the current best by penalizing frequently used features to avoid local traps; the improved solution is re-inserted into the GA pool (elitism). The formulation enforces per-UAV battery/tank bounds and starts/returns at a depot; constraints are VRP-style, with big-M subtour elimination. Assumptions include a static, known grid map, constant-speed flight, square downward FoV, no wind, and a point-mass kinematic abstraction. Safety and dynamics are handled via geometric margins rather than certified kinodynamic guarantees. Performance hinges on GA/GLS settings (population, crossover/mutation, penalties), and FIFO sequencing can bias early allocations; nevertheless, across increasing node/UAV scales the hybrid consistently improves best/mean objectives and passes non-parametric significance tests against common meta-heuristics, motivating its use as an offline planner for agricultural CPP-VRP.
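The memetic GA plus guided-local-search loop can be illustrated on a small symmetric TSP (a simplified stand-in for the paper's FIFO-decoded capacitated multi-tours): a GA with ordered crossover and swap mutation explores globally, while a GLS step penalizes the current best tour's longest edge and refines it with 2-opt on the penalized cost before elitist re-insertion. All encodings, operators, and gains below are our simplified assumptions for illustration:

```python
import math, random

def tour_len(tour, D):
    n = len(tour)
    return sum(D[tour[i]][tour[(i + 1) % n]] for i in range(n))

def memetic_ga_tsp(D, pop_size=30, gens=80, lam=0.3, seed=7):
    """Memetic GA sketch: GA global search + guided-local-search (GLS)
    refinement with elitist re-insertion, on a plain symmetric TSP."""
    rng = random.Random(seed)
    n = len(D)
    pen = [[0] * n for _ in range(n)]              # GLS feature penalties

    def aug(t):                                    # penalized objective
        return tour_len(t, D) + lam * sum(
            pen[t[i]][t[(i + 1) % n]] for i in range(n))

    def ox(p1, p2):                                # ordered crossover
        a, b = sorted(rng.sample(range(n), 2))
        child = [None] * n
        child[a:b] = p1[a:b]
        fill = [c for c in p2 if c not in child]
        for i in range(n):
            if child[i] is None:
                child[i] = fill.pop(0)
        return child

    def two_opt(t, cost):                          # 2-opt to local optimum
        best, improved = t[:], True
        while improved:
            improved = False
            for i in range(n - 1):
                for j in range(i + 2, n - (0 if i else 1)):
                    cand = best[:i + 1] + best[i + 1:j + 1][::-1] + best[j + 1:]
                    if cost(cand) < cost(best) - 1e-12:
                        best, improved = cand, True
        return best

    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda t: tour_len(t, D))
        elite = pop[0]
        # GLS: penalize the elite's longest edge, refine on penalized cost
        k = max(range(n), key=lambda i: D[elite[i]][elite[(i + 1) % n]])
        pen[elite[k]][elite[(k + 1) % n]] += 1
        refined = two_opt(elite, aug)
        nxt = [min(elite, refined, key=lambda t: tour_len(t, D))]  # elitism
        while len(nxt) < pop_size:
            p1, p2 = rng.sample(pop[:pop_size // 2], 2)
            c = ox(p1, p2)
            if rng.random() < 0.2:                 # swap mutation
                a, b = rng.sample(range(n), 2)
                c[a], c[b] = c[b], c[a]
            nxt.append(c)
        pop = nxt
    best = min(pop, key=lambda t: tour_len(t, D))
    return two_opt(best, lambda t: tour_len(t, D))  # final true-cost polish

# Toy instance: 8 cities on a circle (optimal tour = hull order, ~6.123)
pts = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4)) for k in range(8)]
D = [[math.dist(p, q) for q in pts] for p in pts]
best_tour = memetic_ga_tsp(D)
```

Penalizing the elite's most costly feature is the GLS ingredient that steers the local search away from repeatedly rediscovering the same trap; the paper additionally enforces per-UAV battery/tank capacities, which this sketch omits.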
This CFS GA framework shows that memetic GA–GLS hybrids can significantly reduce energy and resource consumption in waypoint-dense agricultural scenarios, while remaining tractable for medium-sized instances. Its applicability is bounded by static known maps, constant-speed point-mass models, sensitivity to GA/GLS settings, FIFO sequencing bias, and reliance on geometric safety only. As with other offline planners, its output should be combined with dynamics-aware safety and updated when the environment or vehicle models change.

3.9.2. Differential Evolution Algorithms

The DE-based CFS planner treats coverage-path synthesis as a constrained continuous optimization problem over region-zoning and sweep parameters. By combining a POC/POD/POS search model with adaptive DE updates, it seeks routes that respect endurance, communication, and safety constraints while covering polygonal and circular regions under limited prior information.
Fan et al. [116] exploit limited prior information by building a POC/POD/POS-based search model and optimizing multi-UAV routes with an adaptive Differential Evolution (DE) solver. The method mixes polygon and base-circle region types, a variable-angle parallel sweep for convex polygons, and sector/expanded-square/spiral strategies for circular focus zones. Constraints cover endurance, kinematics, no-fly/boundaries, inter-UAV separation, and a topology-switching comms rule. DE uses a generation-scheduled mutation and a 6n-dimensional encoding to improve diversity early and precision late. This cleanly separates (i) region zoning, (ii) constrained path synthesis, and (iii) comms topology updates. Assumptions include a fixed-altitude grid, known DEM/risk map, and ideal links/sensing; safety is geometric (not kinodynamic). Performance is parameter-sensitive (mutation schedule, sampling/spacing, region setup), and rolling replans are needed when priors drift.
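A minimal sketch of the DE core, with the generation-scheduled mutation factor decaying from exploration to precision, is shown below on an unconstrained toy objective; the schedule endpoints, crossover rate, and objective are illustrative assumptions (the paper's 6n-dimensional route encoding and endurance/comms constraints are omitted):

```python
import random

def adaptive_de(f, dim=6, n=30, gens=150, seed=5):
    """DE/rand/1/bin sketch with a generation-scheduled mutation factor.

    F decays linearly from 0.9 to 0.4 so the population explores early
    (diversity) and exploits late (precision), mirroring the adaptive
    schedule used by the CFS planner; bounds and mission constraints
    are omitted for brevity.
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(-10, 10) for _ in range(dim)] for _ in range(n)]
    fit = [f(x) for x in pop]
    CR = 0.9
    for g in range(gens):
        F = 0.9 - 0.5 * g / gens           # scheduled mutation factor
        for i in range(n):
            a, b, c = rng.sample([j for j in range(n) if j != i], 3)
            jrand = rng.randrange(dim)     # guarantees one mutated gene
            trial = [pop[a][d] + F * (pop[b][d] - pop[c][d])
                     if (rng.random() < CR or d == jrand) else pop[i][d]
                     for d in range(dim)]
            fv = f(trial)
            if fv <= fit[i]:               # greedy one-to-one selection
                pop[i], fit[i] = trial, fv
    k = min(range(n), key=lambda i: fit[i])
    return pop[k], fit[k]

sphere = lambda x: sum(c * c for c in x)
_, best_val = adaptive_de(sphere)
```

In the planner itself each genome would encode region-zoning and sweep parameters, and infeasible trials (endurance, no-fly, separation) would be rejected or penalized before the greedy selection step.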
This CFS DE approach demonstrates that adaptive DE can effectively coordinate region zoning, constrained path synthesis, and simple communication-topology rules to achieve high containment probability in static disaster scenarios. Its constraints include reliance on fixed-altitude grids and prior probability/DEM maps, parameter sensitivity in mutation schedules and spacing/region setup, idealized communication switching, and purely geometric safety. As with other offline CFS planners, it is best used for pre-mission design, with rolling replans and dynamics-aware safety layers handling drift in priors and real-time uncertainties.
The foregoing findings are consolidated in Table 11.
In CFS, GA and DE-based planners address capacitated coverage-routing problems such as multi-UAV spraying, where both vehicle endurance and route structure must be optimized. GA–GLS memetic schemes offer flexible encoding for multi-tour solutions and can handle complex agricultural field shapes, while DE often provides faster convergence for continuous parametrization and can be embedded as a local improver inside PSO/GA hybrids. Table 11 indicates that, at the scale of current CFS applications, GA and DE provide an effective trade-off between solution quality and computational effort, with the main limitations being parameter tuning, stochastic variability, and the lack of formal optimality or safety guarantees, which motivates pairing them with dynamic-aware safety checks in deployment.
Table 11. CFS literature summary.
Reference | Algorithm/Method | Applicable Scenario and Problem Addressed | Limitation
Jasim A N, Fourati L C. [82] | GA + GLS → GGA (memetic CVRP solver) | Offline multi-UAV CPP-VRP (min battery + tank); Role: GA global search + GLS best-solution refinement; Quant: mean objective (256 nodes, 4 UAVs) ≈ 859.7; best ≈ 852.0; runtime ≈ 18.0 s at 256/4 vs. ≈3.5 s at 25/1 | Static known map; constant-speed point-mass model; FIFO bias; parameter-sensitive; geometric (non-kinodynamic) safety
Fan X, Li H, Chen Y, et al. [116] | Adaptive DE + POC/POD/POS model; variable-angle sweep (polygon) + sector/expanded-square/spiral (circle) | Offline multi-UAV disaster-area search with endurance/comms/safety constraints; Quant: total POS ≈ 0.874 in the time-limited 6-UAV case; POSC ≈ 0.940 for the polygon-partition global search | Fixed-altitude grid; requires prior probability map/DEM; parameter-sensitive (mutation schedule, spacing/region setup); geometric (non-kinodynamic) safety; comms/topology switching idealized

4. Technique Selection

4.1. Architecture Selection Across the Nine Scenarios: Trade-Offs Among Centralized, Decentralized, and Hybrid Planning

Centralized planning aggregates global information to produce consistent, certificate-friendly paths and handle tightly coupled constraints (e.g., formation, time windows, no-fly corridors). It is preferred when maps are known and stable (PFS), safety and compliance dominate, communication is reliable, and team size is small to moderate; however, it incurs high compute/communication load, slower replan cycles, and a single point of failure, and can be brittle under packet loss—limitations that grow with fleet size and environmental volatility. Decentralized planning distributes computation to the vehicles, yielding low-latency replans, graceful degradation under dropouts, and scalability for large swarms—advantages that align with dynamic online settings (PND/DND/CND) and contested or bandwidth-limited links. The costs are bounded-suboptimality from partial views, risk of inter-agent conflicts or deadlocks without robust collision-avoidance/consensus, and difficulty enforcing global coupling or fairness constraints. In practice, hybrid architecture is common: global partitioning or task allocation is computed centrally pre-mission (PFS/DFS/CFS), while agents execute decentralized, reactive, or learning-based replanning in flight (PND/DND/CND), with event-triggered recentering when communication permits. As a rule of thumb, choose centralized for high-assurance missions in stable maps with reliable comms and tight global constraints; choose decentralized for large teams, volatile environments, and lossy links; use hybrids when missions span both phases or when you must hedge against communication uncertainty.

4.2. Digital Twins Across the Nine Scenarios: A Bridge for Pre-Deployment Synthesis and Validation

A digital twin (DT) is a synchronized virtual replica of the swarm, the operating environment, and mission constraints that enables closed-loop trajectory synthesis and verification before flight. Beyond conventional simulation, the DT reduces sim-to-real mismatch by (i) identifying and calibrating models from prior flights and bench tests (aerodynamics, sensors, battery aging), (ii) supporting multi-fidelity optimization—fast surrogates and learned cost-to-go for broad design sweeps, high-fidelity physics for final validation, (iii) providing Monte Carlo stress testing (wind fields, GPS outages, packet loss) to expose brittle behaviors, (iv) co-simulating networks and latency to size receding-horizon windows and collision-avoidance buffers, (v) enabling SIL/HIL loops and policy training (e.g., RL) with domain randomization, and (vi) enforcing safety envelopes (geofences, minimum separation, energy margins) via runtime monitors that are later mirrored onboard.
Within our nine-scenario taxonomy, the DT acts as the bridge between offline planning and online execution. It verifies curvature/clearance and track-keeping in PFS; encodes time-varying costs and moving-target models for PFD/CFD/CFS; co-optimizes partitioning/scheduling under vehicle and resource limits in DFS/DFD; and hardens PND/DND policies against partial observability and lossy links by testing end-to-end perception-planning-control stacks.
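As a toy illustration of the Monte Carlo stress-testing role, the sketch below runs repeated virtual sorties of a point-mass vehicle under sampled wind gusts and GPS dropouts and reports a success rate; all dynamics, noise models, and thresholds are our illustrative assumptions, not a real DT pipeline:

```python
import random

def stress_test(trials=500, wind_sigma=0.3, drop_prob=0.1, seed=11):
    """Monte Carlo stress test of waypoint tracking in a toy digital twin.

    Each trial samples a constant wind gust and per-step GPS dropouts;
    the twin records whether a point-mass UAV reaches the waypoint
    within the time budget. All models and thresholds are illustrative.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        x, y = 0.0, 0.0
        gx, gy = 10.0, 10.0                    # waypoint
        wx, wy = rng.gauss(0, wind_sigma), rng.gauss(0, wind_sigma)
        for _ in range(200):                   # time budget (steps)
            if rng.random() > drop_prob:       # GPS fix available
                dx, dy = gx - x, gy - y
                dist = (dx * dx + dy * dy) ** 0.5
                if dist < 0.5:                 # reached the waypoint
                    successes += 1
                    break
                x += 0.2 * dx / dist + 0.05 * wx   # guided step + wind drift
                y += 0.2 * dy / dist + 0.05 * wy
            else:                              # dropout: drift with the wind
                x += 0.05 * wx
                y += 0.05 * wy
    return successes / trials

rate = stress_test()
```

Sweeping `wind_sigma` and `drop_prob` over operationally plausible ranges is the simplest form of the brittleness probing described in item (iii); a real DT would substitute calibrated vehicle, sensor, and link models for these placeholders.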

4.3. Safety-Aware Collision Avoidance Across the Nine Scenarios

In cluttered and dynamic environments, effective avoidance in multi-UAV swarms is achieved by coupling low-latency reactive modules with short-horizon replanning, and where kinodynamic limits dominate, optimization-based control—all using methods already covered in this manuscript. Concretely, APF provides immediate, communication-light near-field separation for decentralized teams (natural in PND/DND), with its well-known local-minimum/oscillation issues mitigated by anchoring motion to a global route produced offline or quasi-online by A*/PRM*/RRT* (from PFS) and by simple tie-breaking/priority heuristics. When dynamics and safety envelopes are tight, MPC maintains constraint-aware separation over short horizons (appropriate for PND) at the cost of solver load and model-mismatch sensitivity. For moving obstacles and rapidly changing costs, RRT (or incremental D* Lite/ARA* when a lattice is used) updates the route around the global backbone without discarding it; light post-processing smoothing absorbs last-moment conflicts. RL policies are used in PND/DND/CND for adaptivity to unmodeled effects, but in practice they are wrapped by an APF safety layer so that learned decisions remain separation-safe under distribution shift. Meta-heuristics (e.g., PSO/GA/DE/ACO) and offline graph-search/sampling planners primarily shape conflict-averse waypoints and partitions upstream (e.g., for CFD/CFS or pre-mission path design) rather than serving as the instantaneous avoidance mechanism itself. Across scenarios, this yields a consistent pattern: PFS furnishes the global route; PND executes APF with RRT or MPC for local avoidance; DND relies on decentralized APF plus lightweight replanning; CFD/CFS use global-route guidance with reactive buffering in dense coverage.
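The pattern of wrapping a nominal command (from a learned policy or a global route) with an APF safety layer can be sketched as follows; the repulsion law, distances, and gains are illustrative assumptions rather than a specific paper's design:

```python
import math

def apf_safety_filter(pos, nominal_v, obstacles, v_max=2.0,
                      d_safe=3.0, k_rep=4.0):
    """APF-style safety layer wrapped around a nominal velocity command.

    The nominal velocity (e.g., from an RL policy or a global A* route
    follower) passes through unchanged when all obstacles/neighbors are
    farther than d_safe; otherwise a repulsive term pushing away from
    each near obstacle is added, and the result is clipped to v_max.
    All gains and thresholds are illustrative.
    """
    vx, vy = nominal_v
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 1e-9 < d < d_safe:
            # classic APF repulsion: grows as the obstacle gets closer
            mag = k_rep * (1.0 / d - 1.0 / d_safe) / (d * d)
            vx += mag * dx / d
            vy += mag * dy / d
    speed = math.hypot(vx, vy)
    if speed > v_max:                     # respect actuator limits
        vx, vy = vx * v_max / speed, vy * v_max / speed
    return vx, vy

# A nominal command heading straight at a nearby obstacle is deflected:
safe_v = apf_safety_filter((0.0, 0.0), (1.0, 0.0), [(1.5, 0.2)])
```

Because the filter only perturbs the command inside the d_safe shell, the global route (or learned policy) retains authority in free space, which is precisely how the local-minimum tendencies of a pure APF controller are kept in check.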

4.4. Technical Selection

To keep comparisons fair across heterogeneous setups without over-prescribing metrics, we evaluate each technique family through the following lenses and then synthesize the trade-offs in this section. We look at path quality (length/time/energy and smoothness), safety and robustness (minimum separation, collision/violation incidence, success under disturbances), responsiveness (replanning latency and stability in dynamic scenes), computational footprint (runtime, memory, on-board load, and communication overhead), scalability (growth with team size and task density), constraint compliance (geofences, no-fly corridors, curvature/kinodynamic limits), and deployability (sim-to-real behavior, hardware readiness, parameter sensitivity). Scenario-specific aspects are considered where relevant—distribution emphasizes makespan, total travel, and load fairness; coverage emphasizes coverage ratio, overlap, revisit time, and uniformity. In Section 3.1, Section 3.2, Section 3.3, Section 3.4, Section 3.5, Section 3.6, Section 3.7, Section 3.8 and Section 3.9, we discuss prior results through these lenses and cite quantitative outcomes where benchmarks are directly comparable; here in Section 4.4, we aggregate those judgments into scenario-conditioned recommendations (best-suited families and the reasons).
There is no algorithm family that is universally optimal across missions and environments. We therefore provide best-suited recommendations for each of the nine scenarios covered in Section 3. “Best-suited” denotes the technique family that, across the cited studies, most reliably balances real-time performance, scalability, and implementability under the given operating assumptions. Each recommendation below is a synthesis of Section 3 and is supported by representative evidence from the manuscript.
The specific recommendations for each scenario are synthesized in Table 12, which provides a concise overview of the best-suited techniques and rationales.
Beyond structuring the survey, the scenario-conditioned taxonomy can be used as a simple tool for practical deployment. Given a new multi-UAV project, a practitioner can (i) locate the most relevant scenario cell in the grid according to mission type, planning mode, and environment; (ii) consult Table 12 and the corresponding scenario tables to see which planning architectures and algorithm families have already been evaluated under comparable assumptions, together with their reported limitations; and (iii) read across neighboring cells to understand how these choices evolve when one axis changes (for example, when moving from offline to online planning, or from static to dynamic environments). Families that appear in multiple cells, such as CTDE-based reinforcement learning or PSO hybrids, can then be considered candidates for cross-scenario reuse, while the limitations summarized in the tables indicate which aspects (e.g., safety layer, communication model, energy budget) must be adapted before transfer.
Building on the above scenario-conditioned recommendations, it is important to emphasize the trade-offs that govern practical algorithm selection. Reinforcement learning (RL) methods dominate online/dynamic scenarios due to their adaptivity and scalability, yet their high training cost and energy demand limit deployment on resource-constrained platforms. By contrast, meta-heuristic families such as PSO and GA are more energy-efficient and easier to implement for offline planning, but their scalability diminishes as swarm size or dimensionality increases. Likewise, centralized planners ensure globally consistent and certifiable routes—critical in dense urban corridors—but are computation- and communication-intensive, whereas decentralized planners scale gracefully to large swarms with low latency at the expense of global optimality and fairness. Hybrid schemes that combine centralized pre-mission partitioning with decentralized in-flight adaptation have therefore become increasingly favored in recent studies. This trade-off perspective complements the scenario-based synthesis by making explicit that no algorithm family is universally optimal; the choice is inevitably shaped by mission scale, environmental volatility, and onboard resource constraints.

4.5. Benchmarking

Building on the scenario-based algorithm selection summarized in the previous subsections, this part shifts the focus to comparability across studies. Drawing exclusively on the works surveyed in Section 3, we consolidate (a) the performance metrics reported under each scenario, and (b) representative benchmark tasks.
Table 13 collates only those metrics explicitly reported by the studies surveyed in Section 3 and aligns them with our nine-scenario taxonomy. This ensures consistency with the evaluation lenses already established in Section 4.4 (path quality/energy, safety and robustness, responsiveness, computational footprint, scalability, constraint compliance, and deployability). Illustrative instances include Age of Information for multi-UAV data collection (DP-MATD3), and coverage-efficiency gains under dynamic reassignment (Tsunami), both drawn directly from the cited works. The table therefore serves as a compact index from the scenario to the metrics used in the literature synthesized by this review.
Table 14 abstracts task patterns directly from Section 3 case studies and embeds the reported evaluation setups into the same table (when papers did not name a specific simulator, the environment elements—map type, workspace bounds, obstacle/traffic/wind models—are listed as described).

5. Discussion

Figure 2a summarizes current algorithm families applied to path-planning problems. RL methods dominate owing to their model-free nature, broad applicability, and capacity to operate in high-dimensional spaces under multiple constraints. Real-time policy updates endow RL with greater autonomy in dynamic scenes, granting it a clear edge in shortest- or optimal-path optimization. By contrast, traditional schemes—RRT, APF, and graph-search variants—excel in low-complexity settings thanks to simple heuristics and high computational throughput. However, their sensitivity to dynamics and local-optimum traps has prompted extensive work on tailored enhancements. In small-scale cases, such enhanced variants deliver near-optimal paths with high efficiency. MPC offers robust, forward-looking planning and thus suits missions demanding high path fidelity. Its reliance on accurate models and heavy compute footprints often motivate hybridization with lighter heuristics to improve adaptability in complex environments. Meta-heuristics remain popular for their adaptability and intrinsic parallelism. ACO and GA are flexible, yet compute-intensive and slow to converge, limiting their appeal for real-time optimal routing. PSO is easier to implement and scales naturally to multi-agent cooperation, making it the preferred choice for large-scale path-planning tasks.
In distribution problems, accuracy and speed are the two primary metrics. Figure 2b summarizes the algorithm landscape for this task. PSO evaluates multiple scenarios in parallel, delivering high computational throughput. PSO handles multi-objective requirements, adapts to modest dynamics, and therefore efficiently solves many UAV-swarm allocation problems. Yet, PSO particles are susceptible to local optima, limiting performance on large-scale swarm allocations. RL methods update policies in real time, making them well suited to complex, multi-task allocations. With adaptive learning, RL can uncover near-optimal allocations. Training RL is resource-intensive; hence, it is often hybridized with lighter heuristics to meet real-time deadlines. GA provides robust global search, yielding near-optimal swarm allocations even in diverse and complex task spaces. Because GA is gradient-free, it copes with discontinuous, nonlinear, and hard-to-model allocation landscapes. Even with complex objectives, GA effectively explores the solution space and identifies allocations that satisfy mission requirements.
Coverage algorithms must be tailored to specific mission requirements. Key performance metrics—coverage ratio, completion time, and energy consumption—are driven by those requirements. As illustrated in Figure 2c, PSO converges rapidly, making it ideal for time-critical missions. Its adaptability also allows PSO to satisfy multi-objective, multi-constraint coverage tasks, thereby increasing overall mission success. RL refines global coverage paths through continual self-learning, boosting efficiency while minimizing overlap. A carefully designed reward structure balances competing objectives, enabling RL to handle complex, constrained-coverage scenarios. Area-division approaches lower planning complexity by partitioning the workspace into sub-regions assigned to individual UAVs. Such decomposition prevents redundant sweeps and lets each vehicle operate independently. Localized planning reduces computational load and planning time, simultaneously enhancing operational safety. However, once partitioned, regions are typically static; unexpected obstacles, environmental changes, or fleet adjustments can degrade efficiency. Effective segmentation depends on accurate initial parameters—UAV capability, terrain geometry, and mission priority. Poor initial partitioning can produce oversized, under-served, or overlapping regions. Consequently, area-division excels at task decomposition and load balancing in static, multi-UAV settings.
As Figure 2d indicates, PSO and RL already dominate practical deployments, efficiently solving a wide spectrum of path-planning tasks and holding clear promise for further expansion. Continued research reveals that meta-heuristic and bio-inspired methods increasingly outperform incumbents in niche scenarios with exotic constraints. Future UAV-swarm planners must accommodate diverse theatres and handle tightly coupled objectives—coverage, timing, energy, communication, and safety—rather than optimizing any single metric in isolation. Accordingly, designing general-purpose planning frameworks that transfer across mission types, scale with swarm size, and blend global search with rapid local refinement will be a central research frontier.

6. Future Research

This section distills the scenario-conditioned synthesis developed in Section 2, Section 3 and Section 4 into a focused set of research priorities. Each priority is framed within the mission–planning–environment contexts defined in Figure 1 and assessed through the evaluation lenses already established—responsiveness, safety, scalability, and energy—so that method choice follows from context rather than from algorithm lineage. The agenda explicitly traces back to the evidence consolidated in Section 3 (per-scenario advances with reported limitations) and to the cross-cutting guidance in Section 4 (architecture selection, digital-twin–supported validation, and safety-aware avoidance). In this way, Section 6 extends the taxonomy-based synthesis into testable directions and deployment-oriented guidance, rather than introducing a parallel list.

6.1. Risks and Challenges

Across the nine scenarios, the main risks and challenges fall into four closely related groups, rather than a single monolithic issue.

6.1.1. Real-Time Computation and Scalability

Future theatres—military, agricultural, emergency response, and low-altitude logistics—impose strict cycle-time budgets on resource-limited avionics. Classical graph search and sampling planners lack intrinsic online adaptivity, while many population-based meta-heuristics stagnate in local optima when the search space is high-dimensional and dynamic. Reinforcement-learning (RL) methods offer strong adaptivity and can encode complex objectives, but training is computationally and energy intensive, and embedded deployment is constrained by memory, inference latency, and hyperparameter sensitivity. Balancing online adaptivity against computational footprint and energy budget therefore remains a central risk when moving from simulation to fielded swarms. Emerging mitigation strategies include shifting most training to offline/batch settings with high-fidelity digital twins so that policies are learned in virtual replicas of the real system [117,118], compressing and distilling large policies into lightweight embedded actors for edge deployment [119], sharing parameters across agents in multi-agent RL so that one actor serves many UAVs [25], and using hierarchical designs in which RL selects among pre-computed motion primitives or macro-actions at a coarse time scale while classical controllers handle fast, low-level stabilization [120].

6.1.2. Task Allocation, Cooperation, and Multi-Objective Scheduling

Even with the mission-based taxonomy used in this review, assignment and routing remain tightly coupled; planners must jointly account for reachability, time windows, safety margins, and mission priorities. In large swarms, naive decompositions (first allocation, then routing) can lead to unstable behavior, with frequent reassignments and oscillatory paths. Ensuring scalable multi-objective scheduling and cooperative control—so that heterogeneous fleets can meet deadlines, respect coupling constraints, and maintain fairness as team size and task density grow—remains an open challenge, especially under partial observability.

6.1.3. Communication, Safety, and Sensing Robustness

Multi-UAV swarms depend on low-latency, bandwidth-limited links to share state and mission information. Centralized architectures struggle to scale: they create single points of failure and are brittle under packet loss or heterogeneous links, yet they remain attractive when safety and certification demand global consistency. Decentralized architectures scale more gracefully but can suffer from partial views, deadlocks, and inter-agent conflicts if collision avoidance and consensus are not carefully designed. In both cases, safety is often enforced by geometric thresholds rather than certified kinodynamic envelopes, and many studies assume ideal sensing and map fidelity. In practice, noisy or failed sensors, map errors, and latency can quickly erode the safety margins implied by simulations.

6.1.4. System-Level Concerns: Security, Privacy, and Energy/Lifecycle Constraints

As UAV applications mature, system-level issues increasingly shape path-planning design. Cyber-security and privacy requirements restrict which data can be exchanged and where decision logic is deployed, complicating the use of centralized controllers or cloud-based learning. At the same time, battery, thermal, and lifecycle limits couple route design with charging, maintenance, and fleet rotation policies; the simple energy models used in most current work under-represent these constraints. Coverage of irregular or concave regions remains challenging for many planners, further stressing energy budgets when repeated passes are needed. Future planning frameworks must therefore address security, privacy, and lifecycle-aware energy management alongside classical objectives such as path length and coverage.
These four groups of risks and challenges motivate the open problems articulated in Section 6.2, which recast them into testable research questions tied to specific scenario cells and evaluation lenses.

6.2. Open Scientific Problems

We articulate six open problems using concise, operational language. Each problem is tied to our nine-scenario taxonomy and should be evaluated through the lenses defined in Section 4.4—path quality, safety and robustness, responsiveness, computational footprint, scalability, constraint compliance, and deployability. The intent is to guide verifiable progress without prescribing specific benchmarks.

6.2.1. Compute-Aware, Safety-Critical Online Planning (PND/CND/DND)

The key challenge is to produce online plans within strict cycle-time budgets on resource-limited avionics while maintaining separation and mission performance in dynamic, cluttered scenes. Promising directions include global-backbone–aware incremental replanning, short-horizon optimization with safety filters, and learned controllers wrapped by certified safety layers, with digital twins sizing horizons and buffers pre-flight. Progress should be reported by demonstrating cycle-time compliance, safety under disturbances, and graceful degradation when sensing or computing temporarily degrades.
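A minimal sketch of the "safety filter" idea mentioned above, assuming a single clearance measurement and a braking-distance bound; the acceleration limit, margin, and function name are illustrative assumptions, and certified deployments would substitute full kinodynamic envelopes for this scalar check.

```python
def safety_filter(v_cmd, clearance_m, a_max=4.0, margin_m=1.0):
    """Cap a commanded speed so that the worst-case stopping distance
    v^2 / (2 * a_max) stays inside the measured clearance minus a margin.
    The learned or optimization-based planner proposes v_cmd; the filter
    only ever reduces it, so the wrapped controller cannot violate the bound."""
    usable = max(clearance_m - margin_m, 0.0)
    v_safe = (2.0 * a_max * usable) ** 0.5
    return min(v_cmd, v_safe)
```

Because the filter is a one-line projection onto a safe set, it fits easily inside strict cycle-time budgets and degrades gracefully: with zero usable clearance it commands a stop regardless of what the planner requests.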

6.2.2. Scalable Task Allocation with Coupled Motion (PND/CND/DND)

Heterogeneous fleets must make online assignment decisions that respect reachability and time-window limits while remaining stable as team size and task density grow. Viable approaches include rolling-horizon hybrids (auction/Hungarian/MILP combined with predictive routing), decentralized market mechanisms with minimal messaging, and fairness-aware load balancing. Digital twins enable at-scale what-if evaluation before deployment. Evidence of progress emphasizes makespan, total travel, latency/fairness, and stability under churn and partial observability.
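For intuition on the assignment building block, the sketch below solves a toy UAV-to-task problem by brute force over permutations; the cost matrix is invented for illustration. At fleet scale this enumeration is replaced by the Hungarian method (cubic time) or by auction mechanisms that need only local messaging.

```python
from itertools import permutations

def optimal_assignment(cost):
    """Minimum-cost one-to-one UAV-to-task assignment for small teams.
    cost[i][j] = estimated travel/energy cost for UAV i to serve task j.
    Brute force is exact but factorial; it is only usable for tiny fleets."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best)

# Toy example: 3 UAVs, 3 tasks (costs are hypothetical).
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
print(optimal_assignment(cost))  # prints [1, 0, 2]
```

In a rolling-horizon hybrid, this solver (or its Hungarian/auction replacement) is re-run as tasks arrive or expire, with hysteresis or switching costs added to the matrix to damp the reassignment churn discussed above.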

6.2.3. Bandwidth-Aware Decentralized Planning with Reliability Guarantees (PND/CND/DND/PFD/DFD/CFD)

When communication graphs are time-varying and lossy, planners must retain safety and useful performance with intermittent consensus and local fallback behaviors. The agenda is to identify messaging-light coordination and event-triggered updates that keep separation invariant and bound the performance gap to centralized ideals; digital twins can expose failure modes across packet-loss/latency regimes. Robustness should be quantified under controlled loss patterns and mobility profiles.
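The event-triggered update idea can be made concrete with a small sketch: each UAV's neighbors dead-reckon its motion from the last broadcast state, and a new message is sent only when the true position drifts beyond a threshold. The threshold and the constant-velocity prediction model are illustrative assumptions.

```python
def should_broadcast(actual_pos, last_sent_pos, last_sent_vel, dt, eps=0.5):
    """Event-triggered state sharing. Neighbors predict this UAV's position
    as last_sent_pos + last_sent_vel * dt; a broadcast is triggered only when
    the true position deviates from that prediction by more than eps meters,
    which keeps messaging rates low on well-predicted (e.g., straight) legs."""
    pred = tuple(p + v * dt for p, v in zip(last_sent_pos, last_sent_vel))
    err2 = sum((a - q) ** 2 for a, q in zip(actual_pos, pred))
    return err2 > eps ** 2
```

Tightening `eps` trades bandwidth for separation margin, which is exactly the coupling a digital twin can sweep across packet-loss and latency regimes before flight.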

6.2.4. Certifiable Planning in Dense Low-Altitude Corridors (PND/CND/DND)

Urban airspaces with geofences, curvature/jerk limits and moving obstacles require assurance both pre-flight and at runtime. Risk-based certification frameworks, such as the JARUS Specific Operations Risk Assessment (SORA), as adopted in EASA’s “specific” category rules, and the EU U-space Regulation (EU) 2021/664 for dedicated low-altitude drone airspace already require operators to demonstrate that proposed trajectories, geofences, and contingency procedures keep air and ground risk within acceptable levels and remain compatible with U-space/UTM services [121,122]. In parallel, international UAS Traffic Management (UTM) initiatives and standards such as ASTM F3269 (run-time assurance for complex functions) and ASTM F3548 (UTM UAS Service Supplier interoperability) are pushing towards architectures in which complex onboard planners are supervised by certifiable monitors and integrated with traffic-management services, rather than being certified in isolation [123,124].
Research should therefore reduce the conservatism of safety envelopes, connect offline corridor synthesis to auditable runtime monitors and online updates at practical compute budgets, and use digital twins to vet corner cases before flight. Credible results verify safety margins and low false-positive rates, keep computational costs within onboard constraints, and demonstrate how the proposed planning and monitoring scheme could be incorporated into SORA/UTM/U-space-style approval workflows.

6.2.5. Energy-Aware Hierarchical Planning with Lifecycle Constraints (PFD/PND/CFS/CFD)

Fleets operating under battery and thermal limits need coupled route–schedule–speed planning that can switch between endurance-optimal and time-critical modes and remain feasible during replans. Lightweight yet faithful energy/thermal surrogates calibrated in digital twins, together with triggers for mode switching, are central. Progress is reflected in mission success, with healthy energy margins, endurance gains, and stability across payloads and weather.
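A trigger for switching between the two modes can be sketched as a reserve check; the consumption rates below are placeholder constants standing in for the calibrated energy/thermal surrogates the text calls for, and the function and mode names are assumptions for illustration.

```python
def select_mode(battery_wh, remaining_dist_m, wh_per_m_endurance=0.08,
                wh_per_m_fast=0.15, reserve_frac=0.2):
    """Pick a cruise mode so the mission stays feasible after replans:
    fly time-critical only while the fast consumption profile still lands
    above the required energy reserve, fall back to endurance-optimal cruise
    otherwise, and abort to base when even that margin is gone."""
    reserve = reserve_frac * battery_wh
    if battery_wh - remaining_dist_m * wh_per_m_fast >= reserve:
        return "time_critical"
    if battery_wh - remaining_dist_m * wh_per_m_endurance >= reserve:
        return "endurance"
    return "return_to_base"
```

Because the check is monotone in remaining distance, it can be re-evaluated at every replan without destabilizing the schedule.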

6.3. Application-Oriented Outlook

Aligned with the open problems in Section 6.2, the following domain-driven research tasks specify scenario cells, method patterns, and verifiable metrics. Each item is phrased as an actionable “what to build and how to measure”.
a. Disaster response (wildfire reconnaissance; post-disaster SAR)—(CND/CFD + PND)
Task. Online dynamic coverage with intermittent communications and safety/energy constraints.
Approach. CTDE-style MARL for exploration/re-tasking with a lightweight near-field safety layer (APF); rolling-horizon PSO (/hybrids) for next-best path under evolving hazards; decentralized fallback under packet loss.
Metrics. Time-to-first-scan/time-to-localize, coverage ratio and mean revisit interval, energy per covered area, comm success under loss, minimum-separation violations.
b. Public safety surveillance (urban patrol, border monitoring, large-event security)—(PND/DND/CND)
Task. Path + distribution with moving/ephemeral targets under partial observability.
Approach. Attention/GNN-augmented MARL for assignment/routing; spatiotemporal RRT* or short-horizon MPC for kinodynamic/separation constraints; auction/Hungarian re-assignment to curb overlap.
Metrics. Event detection/recall, track continuity (MOTA/MOTP), patrol-coverage uniformity, mean revisit time, operator workload from false alarms.
c. Urban low-altitude corridors/UAM-UTM—(PND/CND; certification focus)
Task. Certifiable planning under geofences, curvature/jerk and moving obstacles, with auditable runtime monitors.
Approach. MPC with safety shields and incremental replanning on a certified global backbone; event-triggered updates sized via digital-twin stress tests.
Metrics. Verified safety margins, low false-positive rate of runtime monitors, cycle-time compliance on avionics, violation-free hours in dense corridors.
d. Smart logistics (multi-drop delivery with dynamic orders and airspace rules)—(PFD/CFD + DND/DFD)
Task. Coupled allocation–routing under time windows, no-fly zones, and depot/curb constraints.
Approach. Hybrid centralized–decentralized pipeline: market-based assignment + PSO (/DE/ACO) trajectory refinement; rolling-horizon replans for order churn.
Metrics. On-time delivery rate, makespan and total travel, energy per package, constraint-violation count (time-window/geofence/curvature).
e. Linear-asset inspection (power lines, pipelines, rail)—(CFD/CFS + DND)
Task. Length-constrained coverage with connectivity and safety buffers.
Approach. PSO–/ACO-hybrids with corridor seeding; connectivity-aware routing (graph or RRT* + link budgets); decentralized APF for near-field conflicts.
Metrics. Coverage completeness, duplicate-coverage rate, path smoothness, link QoS (throughput/latency) and drop-rate along corridors.
f. Environmental monitoring and precision agriculture—(CFS/CND)
Task. Large-area coverage with energy and revisit constraints.
Approach. Area segmentation (Voronoi/decomposition) for load balance + RL-guided next-best-view; battery-aware scheduling.
Metrics. Coverage ratio, uniformity, energy per hectare, revisit bound for hotspots, battery cycles per sortie.
g. Airport bird-dispersion and airfield safety—(DND/PND)
Task. Predict–intercept with stochastic flock motion and strict separation.
Approach. LSTM (+Kalman) trajectory prediction; Hungarian/auction pairing; curvature-constrained Dubins paths; APF safety layer.
Metrics. Interception success/time, runway-vicinity incursion rate, separation violations, operator interventions.
h. Indoor warehouse swarms (GPS-denied)—(PND; static maps, dynamic agents)
Task. High-throughput multi-UAV traffic with narrow aisles and partial observability.
Approach. VIO/vision-based policies with local safety shields; graph-based global flows + local RL; priority rules at intersections.
Metrics. Throughput (orders/hour), collision/near-miss rate, queuing delay at chokepoints, compute/energy per task.
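Several of the items above (a, e, g) use an APF layer for near-field conflicts; a minimal repulsive-field sketch follows. The gain and safety radius are illustrative assumptions, and a real deployment would add damping and kinodynamic limits to avoid the oscillation APF is known for near obstacles.

```python
import math

def apf_repulsion(pos, neighbors, d_safe=5.0, gain=10.0):
    """Summed repulsive velocity pushing a UAV away from every neighbor
    closer than d_safe, using the classic (1/d - 1/d_safe) potential-field
    gradient. Neighbors beyond d_safe contribute nothing, so the layer is
    inactive until a near-field conflict actually arises."""
    vx, vy = 0.0, 0.0
    for nx, ny in neighbors:
        dx, dy = pos[0] - nx, pos[1] - ny
        d = math.hypot(dx, dy)
        if 0.0 < d < d_safe:
            mag = gain * (1.0 / d - 1.0 / d_safe) / d ** 2
            vx += mag * dx / d
            vy += mag * dy / d
    return vx, vy
```

The output is typically added to the nominal planner's velocity command, so the global plan is perturbed only inside the safety radius.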
Beyond the application instances above, two cross-cutting deployment directions deserve explicit attention in future work. First, software-defined sensing (“software sensors”) can reduce payload and integration burden by inferring state, map updates, and platform health from heterogeneous signals; importantly, it can provide integrity/confidence indicators that planners can consume to trigger confidence-aware replanning under degraded sensing. Second, anti-jamming and interference-resilient operations should be treated as a first-class planning constraint rather than an external add-on. In partially observable and bandwidth-limited settings (notably PND/DND/CND), planners need to reason about link variability, communication outages, and navigation degradation and to support graceful fallbacks (e.g., conservative separation, rendezvous policies, mode switching, or decentralized execution) when integrity drops. Incorporating these effects into digital-twin evaluations and benchmarks that inject sensing failures and interference would help quantify real-time overheads and clarify when online planning is feasible versus when offline backbones and slow-horizon updates are more appropriate.
Method selection should be conditioned on the scenario cell (online/offline; static/dynamic), link reliability and team size (centralized vs. decentralized vs. hybrid), and energy/compute budgets. Publishing domain-specific, reproducible test suites with the above metrics will tighten comparability without prescribing a single algorithm family as universally optimal.

7. Conclusions

This review consolidates the most recent advances on multi-UAV swarms through a nine-scenario taxonomy that jointly spans Path/Distribution/Coverage under offline/online and static/dynamic settings, enabling side-by-side, scenario-conditioned comparisons. We standardize terminology and citations and complement the synthesis with quantitative statistics (Figure 2) and a “Limitations (as reported)” column in all summary tables, making trade-offs explicit. Beyond summarization, we provide practitioner-oriented guidance: Section 3 distills scenario-specific adaptations and evidence; Section 4 treats cross-cutting issues—architecture selection (centralized/decentralized/hybrid), digital-twin–driven pre-deployment validation, and safety-aware collision avoidance in cluttered, dynamic environments—thus bridging offline planning and online execution. Taken together, these elements yield actionable design rules.
Focusing on a four-year window prioritizes recency but may under-represent earlier foundational work; we mitigate this by citing canonical sources where needed. Comparisons are constrained by non-uniform metrics, datasets, and simulators across studies, which limits strict apples-to-apples evaluation; part of the evidence is simulation-only, so sim-to-real gaps and model mismatch remain concerns. Taxonomic boundaries can blur for hybrid pipelines (e.g., offline pre-planning with online replanning), and classification choices inevitably reflect author judgment. Finally, publication bias, incomplete ablations, and restricted access to code/data can affect completeness.
Beyond summarizing individual algorithm families, this review’s nine-scenario taxonomy is intended to be a reusable lens for both analysis and deployment. By organizing recent work into mission–planning–environment cells and describing each cell with a common set of attributes (architecture, horizon, constraints, evaluation lenses, and reported limitations), the taxonomy makes cross-scenario patterns visible and supports “lookup-style” use. Practitioners can map a concrete project onto the closest scenario cell, inspect which combinations of architectures and algorithm families have already been tested under comparable assumptions, and then read across neighboring cells to understand how these choices change when the planning mode or environment shifts. In this way, the taxonomy complements existing algorithm-centric and bibliometric surveys by linking method families to mission context and by turning heterogeneous findings into a structured, deployment-oriented reference.
The synthesis highlights five priorities for near-term progress: (i) real-time adaptivity on resource-constrained platforms; (ii) scalable multi-objective scheduling and cooperative control; (iii) bandwidth-aware, conflict-resilient intra-swarm communication for heterogeneous teams; (iv) safety-aware, certifiable planning for dense, low-altitude airspaces; and (v) energy-aware planners that couple offline pre-planning with online replanning. We hope the taxonomy, statistics, and scenario-conditioned guidance presented here can serve as a practical scaffold for standardized benchmarks, digital-twin-supported validation, and fielded deployments.

Author Contributions

Conceptualization, J.L. (Junqi Li) and J.L. (Junjie Li); formal analysis, J.L. (Junqi Li) and W.M.; writing—original draft preparation, J.L. (Junqi Li) and J.L. (Junjie Li); writing—review and editing, J.L. (Junjie Li) and J.Z.; visualization, J.L. (Junqi Li) and J.L. (Junjie Li); supervision, J.Z. and W.M.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

To improve readability, a concise glossary of frequently used terms, scenario codes, and acronyms is provided below.
Acronyms and Abbreviations
6G: Sixth-generation cellular network (used as a communication context in some studies).
A*: A-star graph-search algorithm for shortest-path planning on grids/graphs.
ACO: Ant colony optimization.
ANN: Artificial neural network.
AoI: Age of Information; a timeliness metric for sensed/communicated data.
APF: Artificial potential field; a reactive method for obstacle/formation avoidance and guidance.
CFD: Coverage mission + offline planning + dynamic environment (scenario code used in this review).
CFS: Coverage mission + offline planning + static environment (scenario code used in this review).
CND: Coverage mission + online planning + dynamic environment (scenario code used in this review).
CNN: Convolutional neural network.
COMA: Counterfactual Multi-Agent policy gradients; a CTDE multi-agent RL method.
CTDE: Centralized training with decentralized execution (multi-agent learning paradigm).
DDPG: Deep Deterministic Policy Gradient (continuous-control RL algorithm).
DDQN: Double Deep Q-Network.
DE: Differential evolution.
DEM: Digital elevation model.
DFD: Distribution mission + offline planning + dynamic environment (scenario code used in this review).
DFS: Distribution mission + offline planning + static environment (scenario code used in this review).
DND: Distribution mission + online planning + dynamic environment (scenario code used in this review).
DQN: Deep Q-Network.
DT: Digital twin; a virtual replica/simulator used for validation, monitoring, or optimization.
FANET: Flying ad hoc network; a self-organized UAV communication network.
GA: Genetic algorithm.
GIS: Geographic information system.
GNSS: Global Navigation Satellite System.
LoS: Line of sight (communication or sensing).
LSTM: Long short-term memory network.
MADDPG: Multi-Agent DDPG (a CTDE multi-agent RL algorithm).
MAPPO: Multi-Agent PPO (a CTDE multi-agent RL algorithm).
MDP: Markov decision process.
MILP: Mixed-integer linear programming.
MPC: Model predictive control; receding-horizon optimization with constraints.
PFD: Path mission + offline planning + dynamic environment (scenario code used in this review).
PFS: Path mission + offline planning + static environment (scenario code used in this review).
PH curve: Pythagorean-hodograph curve; a smooth parametric curve used for trajectory generation.
PND: Path mission + online planning + dynamic environment (scenario code used in this review).
PPO: Proximal Policy Optimization (RL algorithm).
PSO: Particle swarm optimization.
QMIX: Value-decomposition method for cooperative multi-agent RL.
RGG: Random geometric graph.
RIS: Reconfigurable intelligent surface (communication technology).
RRT: Rapidly exploring random tree (sampling-based planner).
RRT*: Asymptotically optimal variant of RRT.
SAC: Soft Actor-Critic (RL algorithm).
SINR: Signal-to-interference-plus-noise ratio.
TD3: Twin delayed DDPG (RL algorithm).
TDOA: Time difference of arrival (used for localization).
VNS: Variable neighborhood search.
VRP: Vehicle routing problem.
VRPTW: Vehicle routing problem with time windows.
WSN: Wireless sensor network.
Key Terms
Centralized planning: Planning/decision making is computed at a central node with (near-)global information and then dispatched to individual UAVs.
Decentralized planning: Each UAV plans using local observations and limited messages; coordination emerges through communication and local rules.
Hybrid architecture: Combines centralized components (e.g., global assignment or map fusion) with decentralized local planning/execution.
Kinodynamic constraints: Motion constraints that account for both kinematics and dynamics (e.g., speed/acceleration limits, turn rate, and actuator bounds).
Motion primitives: A precomputed library of short, feasible maneuvers used to compose longer trajectories with low online computation.
Receding-horizon planning: Repeatedly optimizes over a finite future horizon as new observations arrive (e.g., MPC-style replanning).
Safety layer (shield): An additional mechanism that enforces collision avoidance or constraint satisfaction, even when the high-level planner is imperfect.
Static vs. dynamic environment: Static: obstacles/threats are fixed during planning; Dynamic: obstacles, targets, or threats change over time and require replanning.
Scenario codes (e.g., PND): Three-letter codes used in this review: the first letter indicates mission type (P = Path, D = Distribution, C = Coverage); the second indicates planning mode (N = online, F = offline); and the third indicates environment type (S = static, D = dynamic).
Online vs. offline planning: Offline: plans are computed before execution; Online: plans are updated during execution based on sensed changes or new tasks.
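As a hypothetical convenience for readers working with the scenario codes programmatically, the three-letter convention above maps directly onto a small lookup (function and dictionary names are ours, not part of the review's terminology):

```python
# Position in the code disambiguates the letters: the first letter is the
# mission, the second the planning mode, the third the environment type.
MISSION = {"P": "Path", "D": "Distribution", "C": "Coverage"}
MODE = {"N": "online", "F": "offline"}
ENV = {"S": "static", "D": "dynamic"}

def expand_code(code):
    """Expand a three-letter scenario code from this review,
    e.g. 'PND' -> ('Path', 'online', 'dynamic')."""
    m, p, e = code.upper()
    return MISSION[m], MODE[p], ENV[e]
```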

References

  1. Yang, X.; Wang, R.; Zhang, T. Review of unmanned aerial vehicle swarm path planning based on intelligent optimization. Control Theory Appl. 2020, 37, 2291–2302.
  2. Aggarwal, S.; Kumar, N. Path planning techniques for unmanned aerial vehicles: A review, solutions, and challenges. Comput. Commun. 2020, 149, 270–299.
  3. Yang, L.; Qi, J.; Xiao, J.; Yong, X. A literature review of UAV 3D path planning. In Proceedings of the 11th World Congress on Intelligent Control and Automation, Shenyang, China, 29 June–4 July 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 2376–2381.
  4. Cetinsaya, B.; Reiners, D.; Cruz-Neira, C. From PID to swarms: A decade of advancements in drone control and path planning—A systematic review (2013–2023). Swarm Evol. Comput. 2024, 89, 101626.
  5. Rahman, M.; Sarkar, N.I.; Lutui, R. A survey on multi-UAV path planning: Classification, algorithms, open research problems, and future directions. Drones 2025, 9, 263.
  6. Wu, Q.; Su, Y.; Tan, W.; Zhan, R.; Liu, J.; Jiang, L. UAV path planning trends from 2000 to 2024: A bibliometric analysis and visualization. Drones 2025, 9, 128.
  7. Ait Saadi, A.; Soukane, A.; Meraihi, Y.; Benmessaoud Gabis, A.; Mirjalili, S.; Ramdane-Cherif, A. UAV path planning using optimization approaches: A survey. Arch. Comput. Methods Eng. 2022, 29, 4233–4284.
  8. Zhang, H.; Xin, B.; Dou, L.; Chen, J.; Hirota, K. A review of cooperative path planning of an unmanned aerial vehicle group. Front. Inf. Technol. Electron. Eng. 2020, 21, 1671–1694.
  9. Ghambari, S.; Golabi, M.; Jourdan, L.; Lepagnot, J.; Idoumghar, L. UAV path planning techniques: A survey. RAIRO-Oper. Res. 2024, 58, 2951–2989.
  10. Chen, W.; Chi, W.; Ji, S.; Ye, H.; Liu, J.; Jia, Y.; Yu, J.; Cheng, J. A survey of autonomous robots and multi-robot navigation: Perception, planning and collaboration. Biomim. Intell. Robot. 2025, 5, 100203.
  11. Bui, H. A survey of multi-robot motion planning. arXiv 2023, arXiv:2310.08599.
  12. Athira, K.A.; Udayan, D.J.; Subramaniam, U. A systematic literature review on multi-robot task allocation. ACM Comput. Surv. 2024, 57, 6801–6828.
  13. Hu, J.; Fan, L.; Lei, Y.; Xu, Z.; Fu, W.; Xu, G. Reinforcement learning-based low-altitude path planning for UAS swarm in diverse threat environments. Drones 2023, 7, 567.
  14. Westheider, J.; Rückin, J.; Popović, M. Multi-UAV adaptive path planning using deep reinforcement learning. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 649–656.
  15. Kong, X.; Zhou, Y.; Li, Z.; Wang, S. Multi-UAV simultaneous target assignment and path planning based on deep reinforcement learning in dynamic multiple obstacles environments. Front. Neurorobot. 2024, 17, 1302898.
  16. Arranz, R.; Carramiñana, D.; Miguel, G.; Besada, J.A.; Bernardos, A.M. Application of deep reinforcement learning to UAV swarming for ground surveillance. Sensors 2023, 23, 8766.
  17. Wang, X.; Gursoy, M.C. Robust and decentralized reinforcement learning for UAV path planning in IoT networks. arXiv 2023, arXiv:2312.06250.
  18. Cheng, Y.; Li, D.; Wong, W.; Zhao, M.; Mo, D. Multi-UAV collaborative path planning using hierarchical reinforcement learning and simulated annealing. Int. J. Perform. Eng. 2022, 18, 463–474.
  19. Niu, Y.; Yan, X.; Wang, Y.; Niu, Y. Three-dimensional collaborative path planning for multiple UCAVs based on improved artificial ecosystem optimizer and reinforcement learning. Knowl.-Based Syst. 2023, 276, 110782.
  20. Wu, W.; Zhang, X. Reinforcement learning-based swarm control for UAVs in static and dynamic multi-obstacle environments. In Proceedings of the 2023 China Automation Congress (CAC), Qingdao, China, 3–5 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1387–1392.
  21. Azzam, R.; Boiko, I.; Zweiri, Y. Swarm cooperative navigation using centralized training and decentralized execution. Drones 2023, 7, 193.
  22. Wang, W.L.; You, M.; Sun, L.; Zhang, X.; Zong, Q. Intelligent cooperative exploration and path planning for UAV swarms in unknown environments. Chin. J. Eng. Sci. 2024, 46, 1197–1206.
  23. Liu, J. Research on UAV Swarm Capture Method Based on Game Learning. Master’s Thesis, Xi’an Technological University, Xi’an, China, 2023.
  24. Wu, Q.; Liu, K.; Chen, L.; Lv, J. Multi-Agent Reinforcement Learning-Based UAV Pathfinding for Obstacle Avoidance in Stochastic Environment. arXiv 2023, arXiv:2310.16659.
  25. Zhao, X.; Yang, R.; Zhong, L.; Hou, Z. Multi-UAV Path Planning and Following Based on Multi-Agent Reinforcement Learning. Drones 2024, 8, 18.
  26. Liu, Y.; Li, X.; Wang, J.; Wei, F.; Yang, J. Reinforcement-Learning-Based Multi-UAV Cooperative Search for Moving Targets in 3D Scenarios. Drones 2024, 8, 378.
  27. Dhuheir, M.A.; Baccour, E.; Erbad, A.; Al-Obaidi, S.S.; Hamdi, M. Deep reinforcement learning for trajectory path planning and distributed inference in resource-constrained UAV swarms. IEEE Internet Things J. 2022, 10, 8185–8201.
  28. Li, M.; Ma, Q.; Wu, G. UAV swarm dynamic task planning algorithm based on reinforcement learning. Syst. Simul. Technol. 2023, 19, 193–204.
  29. Du, J. Path Planning and Task Assignment of UAV Swarm Under Incomplete Information. Master’s Thesis, Harbin Engineering University, Harbin, China, 2023.
  30. Chen, H.C.; Yen, L.H. DRL-based distributed joint serving and charging scheduling for UAV swarm. In Proceedings of the 2024 International Conference on Information Networking (ICOIN), Ho Chi Minh City, Vietnam, 17–20 January 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 587–592.
  31. Demir, K.; Tumen, V.; Kosunalp, S.; Iliev, T. A deep reinforcement learning algorithm for trajectory planning of swarm UAV fulfilling wildfire reconnaissance. Electronics 2024, 13, 2568.
  32. Cheng, X.; Jiang, R.; Sang, H.; Li, G.; He, B. Trace pheromone-based energy-efficient UAV dynamic coverage using deep reinforcement learning. IEEE Trans. Cogn. Commun. Netw. 2024, 10, 1063–1074.
  33. Dhuheir, M.; Erbad, A.; Al-Fuqaha, A.; Seid, A.M. Meta reinforcement learning for UAV-assisted energy harvesting IoT devices in disaster-affected areas. IEEE Open J. Commun. Soc. 2024, 5, 2145–2163.
  34. Baccour, E.; Erbad, A.; Hamdi, M.; Guizani, M. RL-based adaptive UAV swarm formation and clustering for secure 6G wireless communications in dynamic dense environments. IEEE Access 2024, 12, 125609–125628.
  35. Puente-Castro, A.; Rivero, D.; Pedrosa, E.; Pereira, A.; Lau, N.; Fernandez-Blanco, E. Q-learning based system for path planning with unmanned aerial vehicles swarms in obstacle environments. Expert Syst. Appl. 2024, 235, 121240.
  36. Zou, L. Research on UAV Cooperative Area Search Planning Based on Reinforcement Learning. Master’s Thesis, Huazhong University of Science and Technology, Wuhan, China, 2023.
  37. He, J. Multi-Agent Reinforcement Learning Regional Coverage Method for Real Tasks and Constraints. Ph.D. Thesis, University of Electronic Science and Technology of China, Chengdu, China, 2023.
  38. Hou, Y.; Zhao, J.; Zhang, R.; Cheng, X.; Yang, L. UAV swarm cooperative target search: A multi-agent reinforcement learning approach. IEEE Trans. Intell. Veh. 2023, 9, 568–578.
  39. Burzyński, W.; Stecz, W. Trajectory planning with multiplatform spacetime RRT. Appl. Intell. 2024, 54, 9524–9541.
  40. Kelner, J.M.; Burzyński, W.; Stecz, W. Modeling UAV swarm flight trajectories using Rapidly-exploring Random Tree algorithm. J. King Saud Univ.-Comput. Inf. Sci. 2024, 36, 101909.
  41. Xiang, L.; Wang, F.; Xu, W.; Zhang, T.; Pan, M.; Han, Z. Dynamic UAV swarm collaboration for multi-targets tracking under malicious jamming: Joint power, path and target association optimization. IEEE Trans. Veh. Technol. 2023, 73, 5410–5425.
  42. Kang, C.; Xu, J.; Bian, Y. Affine formation maneuver control for multi-agent based on optimal flight system. Appl. Sci. 2024, 14, 2292.
  43. Zhao, W.; Li, L.; Wang, Y.; Zhan, H.; Fu, Y.; Song, Y. Research on a global path-planning algorithm for unmanned aerial vehicle swarm in three-dimensional space based on Theta*–artificial potential field method. Drones 2024, 8, 125.
  44. Chen, G.; Yuan, S.; Zhu, X.; Zhou, G.; Zhang, Z. Path planning for fast swarm source seeking in unknown environments. Int. J. Adapt. Control Signal Process. 2024, 38, 360–377.
  45. Li, J.; Zi, S.; Lu, X. Combat strategy of UAV swarm based on improved artificial potential field method. Radio Eng. 2024, 54, 1970–1977.
  46. Kallies, C.; Gasche, S.; Karásek, R. Multi-agent cooperative path planning via model predictive control. In Proceedings of the 2024 Integrated Communications, Navigation and Surveillance Conference (ICNS), Herndon, VA, USA, 23–25 April 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–7.
  47. Fan, X.; Li, H.; Chen, Y.; Dong, D. A path-planning method for UAV swarm under multiple environmental threats. Drones 2024, 8, 171.
  48. Wang, Y.; Zhang, T.; Cai, Z.; Zhao, J.; Wu, K. Multi-UAV coordination control by chaotic grey wolf optimization based distributed MPC with event-triggered strategy. Chin. J. Aeronaut. 2020, 33, 2877–2897.
  49. Xian, B.; Song, N. Multi-UAV path planning based on model predictive control and improved artificial potential field method. Control Decis. 2024, 39, 2133–2141.
  50. Wee, L.B.; Paw, Y.C. Simultaneous mapping localization and path planning for UAV swarm. In Proceedings of the 2023 IEEE Aerospace Conference, Big Sky, MT, USA, 4–11 March 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.
  51. Guo, J.; Gao, Y.; Liu, Y. Task assignment and path planning algorithm for multiple fixed-wing UAVs. J. Taiyuan Univ. Technol. 2025, 56, 348–355.
  52. Li, K.; Yan, X.; Han, Y. Multi-mechanism swarm optimization for multi-UAV task assignment and path planning in transmission line inspection under multi-wind field. Appl. Soft Comput. 2024, 150, 111033.
  53. Luo, X. Research on Multi-UAV Cooperative Task Decision-Making and Planning Method Based on Ant Colony Algorithm in Unknown Environments. Master’s Thesis, University of Electronic Science and Technology of China, Chengdu, China, 2023.
  54. Zhang, H.; Ma, H.; Mersha, B.W.; Zhang, X.; Jin, Y. Distributed cooperative search method for multi-UAV with unstable communications. Appl. Soft Comput. 2023, 148, 110592.
  55. Wang, Q.; Xu, M.; Hu, Z. Path planning of unmanned aerial vehicles based on an improved bio-inspired tuna swarm optimization algorithm. Biomimetics 2024, 9, 388.
  56. Gu, G.; Li, H.; Zhao, C. A multi-strategy enhanced marine predator algorithm for global optimization and UAV swarm path planning. IEEE Access 2024, 12, 112095–112115.
  57. Xu, N.; Zhu, H.; Sun, J. Bionic 3D path planning for plant protection UAVs based on swarm intelligence algorithms and krill swarm behavior. Biomimetics 2024, 9, 353.
  58. Fu, S.; Li, K.; Huang, H.; Ma, C.; Fan, Q.; Zhu, Y. Red-billed blue magpie optimizer: A novel metaheuristic algorithm for 2D/3D UAV path planning and engineering design problems. Artif. Intell. Rev. 2024, 57, 134.
  59. Liu, P.; Sun, N.; Wan, H.; Zhang, C.; Zhao, J.; Wang, G. Improved adaptive snake optimization algorithm with application to multi-UAV path planning. Trans. Inst. Meas. Control 2024, 47, 1639–1650.
  60. Liu, L.; Lu, Y.; Yang, B.; Yang, L.; Zhao, J.; Chen, Y.; Li, L. Research on a multi-strategy improved sand cat swarm optimization algorithm for three-dimensional UAV trajectory path planning. World Electr. Veh. J. 2024, 15, 244.
  61. Yin, S.; Yang, J.; Ma, L.; Fu, M.; Xu, K. An enhanced whale algorithm for three-dimensional path planning for meteorological detection of the unmanned aerial vehicle in complex environments. IEEE Access 2024, 12, 60039–60057.
  62. Chen, F.; Tang, Y.; Li, N.; Wang, T.; Hu, Y. A study of collaborative trajectory planning method based on starling swarm bionic algorithm for multi-unmanned aerial vehicle. Appl. Sci. 2023, 13, 6795.
  63. Xiang, H.; Han, Y.; Pan, N.; Zhang, M.; Wang, Z. Study on multi-UAV cooperative path planning for complex patrol tasks in large cities. Drones 2023, 7, 367.
  64. Wu, X.J.; Xu, L.; Zhen, R.; Wu, X. Global and local moth-flame optimization algorithm for UAV formation path planning under multi-constraints. Int. J. Control Autom. Syst. 2023, 21, 1032–1047.
  65. Zhang, X.; Zhang, X.; Miao, Y. Cooperative global path planning for multiple unmanned aerial vehicles based on improved fireworks algorithm using differential evolution operation. Int. J. Aeronaut. Space Sci. 2023, 24, 1346–1362.
  66. Hou, J.; Zhou, X.; Pan, N.; Li, A.; Guan, Y.; Xu, C.; Gan, Z.; Gao, F. Primitive-Swarm: An Ultra-lightweight and Scalable Planner for Large-scale Aerial Swarms. arXiv 2025, arXiv:2502.16887.
  67. Tan, C.; Liu, X. Improved two-stage task allocation of distributed UAV swarms based on an improved auction mechanism. Int. J. Mach. Learn. Cybern. 2024, 15, 5119–5128.
  68. Wang, G.; Wang, F.; Wang, J.; Li, M.; Gai, L.; Xu, D. Collaborative target assignment problem for large-scale UAV swarm based on two-stage greedy auction algorithm. Aerosp. Sci. Technol. 2024, 149, 109146. [Google Scholar] [CrossRef]
  69. Aljalaud, F.; Kurdi, H.; Youcef-Toumi, K. Autonomous multi-UAV path planning in pipe inspection missions based on booby behavior. Mathematics 2023, 11, 2092. [Google Scholar] [CrossRef]
  70. Saadi, A.A.; Soukane, A.; Meraihi, Y.; Gabis, A.B.; Ramdane-Cherif, A. A hybrid improved manta ray foraging optimization with Tabu search algorithm for solving the UAV placement problem in smart cities. IEEE Access 2023, 11, 24315–24342. [Google Scholar] [CrossRef]
  71. Chen, Y.; Pi, D.; Wang, B.; Mohamed, A.W.; Chen, J.; Wang, Y. Equilibrium optimizer with generalized opposition-based learning for multiple unmanned aerial vehicle path planning. Soft Comput. 2024, 28, 6185–6198. [Google Scholar] [CrossRef]
  72. Du, Y. Multi-UAV search and rescue with enhanced A∗ algorithm path planning in 3D environment. Int. J. Aerosp. Eng. 2023, 2023, 8614117. [Google Scholar] [CrossRef]
  73. Bashir, N.; Boudjit, S.; Dauphin, G. A connectivity aware path planning for a fleet of UAVs in an urban environment. IEEE Trans. Intell. Transp. Syst. 2023, 24, 10537–10552. [Google Scholar] [CrossRef]
  74. Xie, J.; Zhang, G.; Zhang, W. A swarm motion planning algorithm for multi-UAV cooperative tasks. In Proceedings of the 7th National Conference on Swarm Intelligence and Cooperative Control, Harbin, China, 24–27 September 2023; China Command and Control Society, Harbin Institute of Technology Simulation Center: Harbin, China, 2023; p. 7. [Google Scholar]
  75. Kladis, G.P.; Doitsidis, L.; Tsourveloudis, N.C. Energy-efficient path-planning for UAV swarm based missions: A genetic algorithm approach. In Proceedings of the 2024 International Conference on Unmanned Aircraft Systems (ICUAS), Chania, Greece, 11–14 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 458–463. [Google Scholar]
  76. Wu, Y.; Liang, T.; Gou, J.; Tao, C.; Wang, H. Heterogeneous mission planning for multiple UAV formations via metaheuristic algorithms. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 3924–3940. [Google Scholar] [CrossRef]
  77. Xiong, T.; Liu, F.; Liu, H.; Ge, J.; Li, H.; Ding, K.; Li, Q. Multi-drone optimal mission assignment and 3D path planning for disaster rescue. Drones 2023, 7, 394. [Google Scholar] [CrossRef]
  78. Pan, H.; Liu, Y.; Sun, G.; Fan, J.; Liang, S.; Yuen, C. Joint power and 3-D trajectory optimization for UAV-enabled wireless powered communication networks with obstacles. IEEE Trans. Commun. 2023, 71, 2364–2380. [Google Scholar] [CrossRef]
  79. Du, P.; He, X.; Cao, H.; Garg, S.; Kaddoum, G.; Hassan, M.M. AI-based energy-efficient path planning of multiple logistics UAVs in intelligent transportation systems. Comput. Commun. 2023, 207, 46–55. [Google Scholar] [CrossRef]
  80. Jia, Z.; Xiao, B.; Qian, H. Improved mixed discrete particle swarms based multi-task assignment for UAVs. In Proceedings of the 2023 IEEE 12th Data Driven Control and Learning Systems Conference (DDCLS), Harbin, China, 5–7 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 442–448. [Google Scholar]
  81. Wu, J.; Zhang, N.; Li, D.; Bi, J.; Han, G. A context-aware feature fusion method for multi-UAV cooperative air combat. IEEE Trans. Intell. Transp. Syst. 2025, 26, 7197–7210. [Google Scholar] [CrossRef]
  82. Jasim, A.N.; Fourati, L.C. Guided genetic algorithm for solving capacitated vehicle routing problem with unmanned-aerial-vehicles. IEEE Access 2024, 12, 106333–106358. [Google Scholar] [CrossRef]
  83. Liu, Y.; Zhu, X.; Zhang, X.; Xiao, J.; Yu, X. RGG-PSO+: Random geometric graphs based particle swarm optimization method for UAV path planning. Int. J. Comput. Intell. Syst. 2024, 17, 127. [Google Scholar] [CrossRef]
  84. Meng, Q.; Chen, K.; Qu, Q. Ppswarm: Multi-UAV path planning based on hybrid PSO in complex scenarios. Drones 2024, 8, 192. [Google Scholar] [CrossRef]
  85. Huang, H.; Li, Y.; Song, G.; Gai, W. Deep reinforcement learning-driven UAV data collection path planning: A study on minimizing AoI. Electronics 2024, 13, 1871. [Google Scholar] [CrossRef]
  86. Cao, Z.; Li, D.; Zhang, B. Dynamic trajectory planning for UAV cluster by weighted Voronoi diagram with particle swarm optimization. In Proceedings of the International Conference on Autonomous Unmanned Systems, Xi’an, China, 23–25 September 2022; Springer Nature: Singapore, 2022; pp. 3479–3490. [Google Scholar]
  87. Shao, Z.; Zhou, Z.; Qu, G.; Zhu, X. Reference path planning for UAVs formation flight based on PH curve. In Proceedings of the Asia-Pacific International Symposium on Aerospace Technology, Jeju, Republic of Korea, 15–17 November 2021; Springer Nature: Singapore, 2021; pp. 155–168. [Google Scholar]
  88. Wang, C.; Zhang, L.; Gao, Y.; Zheng, X.; Wang, Q. A cooperative game hybrid optimization algorithm applied to UAV inspection path planning in urban pipe corridors. Mathematics 2023, 11, 3620. [Google Scholar] [CrossRef]
  89. Sheng, L.; Li, H.; Qi, Y.; Shi, M. Real-time screening and trajectory optimization of UAVs in cluster based on improved particle swarm optimization algorithm. IEEE Access 2023, 11, 81838–81851. [Google Scholar] [CrossRef]
  90. Tan, L.; Zhang, H.; Shi, J.; Liu, Y.; Yuan, T. A robust multiple unmanned aerial vehicles 3D path planning strategy via improved particle swarm optimization. Comput. Electr. Eng. 2023, 111, 108947. [Google Scholar] [CrossRef]
  91. Wang, L.; Luan, Y.; Xu, L. UAV swarm path planning method based on dynamic cluster particle swarm optimization. Comput. Appl. 2023, 43, 3816–3823. [Google Scholar]
  92. Li, Y.; Zhang, L.; Cai, B.; Liang, Y. Unified path planning for composite UAVs via Fermat point-based grouping particle swarm optimization. Aerosp. Sci. Technol. 2024, 148, 109088. [Google Scholar] [CrossRef]
  93. Yan, X.; Chen, R. Application strategy of unmanned aerial vehicle swarms in forest fire detection based on the fusion of particle swarm optimization and artificial bee colony algorithm. Appl. Sci. 2024, 14, 4937. [Google Scholar] [CrossRef]
  94. Beishenalieva, A.; Yoo, S.J. Multiobjective 3-D UAV movement planning in wireless sensor networks using bioinspired swarm intelligence. IEEE Internet Things J. 2022, 10, 8096–8110. [Google Scholar] [CrossRef]
  95. Zhang, J.; Cui, Y.; Ren, J. Dynamic mission planning algorithm for UAV formation in battlefield environment. IEEE Trans. Aerosp. Electron. Syst. 2022, 59, 3750–3765. [Google Scholar] [CrossRef]
  96. Deng, M.; Yao, Z.; Li, X.; Wang, H.; Nallanathan, A.; Zhang, Z. Dynamic multi-objective AWPSO in DT-assisted UAV cooperative task assignment. IEEE J. Sel. Areas Commun. 2023, 41, 3444–3460. [Google Scholar] [CrossRef]
  97. Yu, Y. Research on UAV Swarm Cooperative Task Assignment Technology in Complex Constrained Environments. Ph.D. Thesis, Xidian University, Xi’an, China, 2023. [Google Scholar]
  98. Tang, G.; Xiao, T.; Du, P.; Zhang, P.; Liu, K.; Tan, L. Improved PSO-based two-phase logistics UAV path planning under dynamic demand and wind conditions. Drones 2024, 8, 356. [Google Scholar] [CrossRef]
  99. Li, Y.; Chen, W.; Liu, S.; Yang, G.; He, F. Multi-UAV cooperative air combat target assignment method based on VNS-IBPSO in complex dynamic environment. Int. J. Aerosp. Eng. 2024, 2024, 9980746. [Google Scholar] [CrossRef]
  100. Han, D.; Jiang, H.; Wang, L.; Zhu, X.; Chen, Y.; Yu, Q. Collaborative task allocation and optimization solution for unmanned aerial vehicles in search and rescue. Drones 2024, 8, 138. [Google Scholar] [CrossRef]
  101. Cheng, K.; Hu, T.; Wu, D.; Li, T.; Wang, S.; Liu, K.; Yi, D. Heterogeneous UAV swarm collaborative search mission path optimization scheme for dynamic targets. Int. J. Aerosp. Eng. 2024, 2024, 6643424. [Google Scholar] [CrossRef]
  102. Pehlivanoglu, V.Y.; Pehlivanoğlu, P. An efficient path planning approach for autonomous multi-UAV system in target coverage problems. Aircr. Eng. Aerosp. Technol. 2024, 96, 690–706. [Google Scholar] [CrossRef]
  103. Li, Y.; Chen, W.; Fu, B.; Wu, Z.; Hao, L.; Yang, G. Research on dynamic target search for multi-UAV based on cooperative coevolution motion-encoded particle swarm optimization. Appl. Sci. 2024, 14, 1326. [Google Scholar] [CrossRef]
  104. Tang, Y.; Huang, K.; Tan, Z.; Fang, M.; Huang, H. Multi-subswarm cooperative particle swarm optimization algorithm and its application. Inf. Sci. 2024, 677, 120887. [Google Scholar] [CrossRef]
  105. Wang, Y.; Li, X.; Zhuang, X.; Li, F.; Liang, Y. A sampling-based distributed exploration method for UAV cluster in unknown environments. Drones 2023, 7, 246. [Google Scholar] [CrossRef]
  106. Yan, X.; Chen, R.; Jiang, Z. UAV cluster mission planning strategy for area coverage tasks. Sensors 2023, 23, 9122. [Google Scholar] [CrossRef]
  107. Chen, Y.; Qin, D.; Yang, X.; Zhang, G.; Zhang, X.; Ma, L. A deployment strategy for UAV-aided data collection in unknown environments. IEEE Sens. J. 2024, 24, 27017–27028. [Google Scholar] [CrossRef]
  108. Yan, Y.; Sun, Z.; Hou, Y.; Zhang, B.; Yuan, Z.; Zhang, G.; Ma, X. UAV swarm mission planning and load sensitivity analysis based on clustering and optimization algorithms. Appl. Sci. 2023, 13, 12438. [Google Scholar] [CrossRef]
  109. Yu, S. Research on Task Assignment and Path Planning Methods of UAV Swarm for Target Tracking. Master’s Thesis, Shenyang University of Technology, Shenyang, China, 2023. [Google Scholar]
  110. Wang, X.; Zhang, X.; Lu, Y.; Zhang, H.; Li, Z.; Zhao, P.; Wang, X. Target trajectory prediction-based UAV swarm cooperative for bird-driving strategy at airport. Electronics 2024, 13, 3868. [Google Scholar] [CrossRef]
  111. Szklany, M.; Cohen, A.; Boubin, J. Tsunami: Scalable, fault tolerant coverage path planning for UAV swarms. In Proceedings of the 2024 International Conference on Unmanned Aircraft Systems (ICUAS), Chania, Greece, 11–14 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 711–717. [Google Scholar]
  112. Yu, Y.; Lee, S. Multi-UAV coverage path assignment algorithm considering flight time and energy consumption. IEEE Access 2024, 12, 26150–26162. [Google Scholar] [CrossRef]
  113. Gui, J.; Yu, T.; Deng, B.; Zhu, X.; Yao, W. Decentralized multi-UAV cooperative exploration using dynamic centroid-based area partition. Drones 2023, 7, 337. [Google Scholar] [CrossRef]
  114. Swain, S.; Khilar, P.M.; Senapati, B.R. An efficient path planning algorithm for 2D ground area coverage using multi-UAV. Wirel. Pers. Commun. 2023, 132, 361–407. [Google Scholar] [CrossRef]
  115. Bakirci, M.; Ozer, M.M. Post-disaster area monitoring with swarm UAV systems for effective search and rescue. In Proceedings of the 2023 10th International Conference on Recent Advances in Air and Space Technologies (RAST), Istanbul, Turkey, 8–10 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6. [Google Scholar]
  116. Zhao, X.; Zhang, W.; Zhang, H.; Zheng, C.; Ma, J.; Zhang, Z. ITD-YOLOv8: An infrared target detection model based on YOLOv8 for unmanned aerial vehicles. Drones 2024, 8, 161. [Google Scholar] [CrossRef]
  117. Li, Z.; Lei, L.; Shen, G.; Liu, X.; Liu, X. Digital Twin-Enabled Deep Reinforcement Learning for Safety-Guaranteed Flocking Motion of UAV Swarm. Trans. Emerg. Telecommun. Technol. 2024, 35, e70011. [Google Scholar] [CrossRef]
  118. Shen, G.; Lei, L.; Zhang, X.; Li, Z.; Cai, S.; Zhang, L. Multi-UAV Cooperative Search Based on Reinforcement Learning with a Digital Twin Driven Training Framework. IEEE Trans. Veh. Technol. 2023, 72, 8354–8368. [Google Scholar] [CrossRef]
  119. Sun, Y.; Fazli, P. Real-Time Policy Distillation in Deep Reinforcement Learning. arXiv 2019, arXiv:1912.12630. [Google Scholar] [CrossRef]
  120. Stulp, F.; Schaal, S. Hierarchical Reinforcement Learning with Movement Primitives. In Proceedings of the 11th IEEE-RAS International Conference on Humanoid Robots, Bled, Slovenia, 26–28 October 2011; pp. 231–238. [Google Scholar]
  121. Joint Authorities for Rulemaking on Unmanned Systems (JARUS). SORA—Specific Operations Risk Assessment for Unmanned Aircraft Systems, v2.5; Main Body (JAR Doc 25); Joint Authorities for Rulemaking on Unmanned Systems (JARUS): Vienna, Austria, 2024. [Google Scholar]
  122. European Commission. Commission Implementing Regulation (EU) 2021/664 of 22 April 2021 on a Regulatory Framework for the U-Space. Off. J. Eur. Union 2021, 139, 161–183. [Google Scholar]
  123. ASTM F3269-21; Standard Practice for Methods to Safely Bound Behavior of Aircraft Systems Containing Complex Functions Using Run-Time Assurance. ASTM International: West Conshohocken, PA, USA, 2021.
  124. ASTM F3548-21; Standard Specification for UAS Traffic Management (UTM) UAS Service Supplier (USS) Interoperability. ASTM International: West Conshohocken, PA, USA, 2021.
Figure 1. UAV-swarm mission scenarios.
Figure 2. Algorithm distributions across scenarios: panels (a–c) report the number of publications (counts, not percentages) for the Path, Distribution, and Coverage scenarios, respectively; panel (d) shows the percentage share (%) of algorithm families applicable to multiple scenarios; panel (e) shows the year-wise distribution of UAV-swarm path-planning publications within the four-year window (2020–2024). Key takeaways: (1) RL methods constitute the largest share in online/dynamic path and coverage studies, reflecting strong adaptability; (2) PSO (and PSO hybrids) dominates offline–dynamic planning and dynamic coverage, being well suited to large search spaces and moving targets.
Table 1. Crosswalk to prior surveys ([2,3,4,5,6,7,8,9,10,11,12]).
| Framework | Primary Axis/Taxonomy Style | Scope/Focus | What It Captures Well | Gaps for Multi-UAV Swarms (re Scenario Cells) |
|---|---|---|---|---|
| Aggarwal & Kumar [2] | Method lineage: classical/heuristic/meta-heuristic/ML/hybrid | UAV path planning (broad) | Clear family catalog; quick orientation by "how it works" | No explicit mission (Path/Distribution/Coverage) × planning (offline/online) × environment (static/dynamic) mapping; weak deployment guidance |
| Yang et al. [3] | Algorithmic paradigm: sampling/node/model-based/bio-inspired/multi-fusion | UAV 3D path planning (classical emphasis) | Nuanced split of classical planners | Platform-/scenario-agnostic; when to add MPC/APF/RL safety or which architecture to pick is unclear |
| Ait Saadi et al. [7] | Optimization approach: classical/heuristic/meta-heuristic/ML/hybrid | UAV path via optimization | Broad optimization view; pros/cons per family | Limited linkage to online vs. offline and dynamic vs. static missions; few cross-scenario rules |
| Zhang et al. [8] | Cooperative path planning; optimization-oriented synthesis | Multi-UAV cooperative path planning | Cooperation aspects summarized | Lacks scenario-conditioned mapping across Path/Distribution/Coverage and environment dynamics |
| Cetinsaya et al. [4] | Systematic review of control + path (2013–2023) | UAVs/UAS and swarms (controls + planning) | Decade-wide sweep; challenges and trends | Platform-level view; not tied to scenario cells or deployment bridges |
| Rahman et al. [5] | Family buckets: meta-heuristic/classical/heuristic/ML/hybrid; comparative criteria | Multi-UAV path planning | Family usage stats; criteria (time, cost, complexity, convergence, adaptability) | Comparisons not anchored to mission/planning/environment contexts |
| Wu et al. [6] | Bibliometric mapping (MKD) of 2000–2024 | Trend and cluster analysis | Macro trends; surge post-2018; scenario-agnostic map | Method-agnostic; no operational scenario guidance |
| Ghambari et al. [9] | Taxonomy + environment modeling; optimality/completeness | General UAV path planning | Modeling choices and criteria well covered | Stops short of dynamic coverage, allocation–routing coupling, bandwidth-limited teams across scenario cells |
| Chen et al. [10] | Navigation stack: perception/planning/collaboration/control | Autonomous and multi-robot navigation (ground and aerial platforms) | System-level view of the navigation stack; links perception, planning and coordination | Not UAV-swarm-specific; no mission (Path/Distribution/Coverage) × planning-mode × environment grid; limited treatment of low-altitude airspace, energy and bandwidth constraints |
| Bui [11] | Four-axis taxonomy: robot model/environment type/communication mode/planner type | Multi-robot motion planning (platform-agnostic) | Makes communication modes and planner centralization explicit; clear gap analysis for different planner types | Focuses on general multi-robot systems; not tailored to low-altitude UAV corridors or swarm-scale missions; no mapping to Path/Distribution/Coverage cells |
| Athira et al. [12] | PRISMA-based task-assignment taxonomy (objectives, constraints, solution methods) | Multi-robot task allocation (mainly ground robots) | Detailed classification of task-assignment formulations and solvers; thorough PRISMA synthesis | Covers allocation but not 3D routing/coverage coupling; little UAV-swarm evidence; no mission/planning/environment mapping across scenario cells |
Table 2. Qualitative comparison of planning families: scenario alignment (primary/secondary fit), compute footprint, and applicability across nine scenarios.
| Algorithm Family | Primary Fit | Secondary Fit | Runtime Compute | Offline Cost | Typical Role/Key Applicability Notes |
|---|---|---|---|---|---|
| Graph search (A*/Dijkstra/JPS) | PFS | PND, PFD, DND, DFD, DFS, CND, CFD, CFS | Medium–High | Low | Reproducible offline backbone on known maps; grid/voxel dependence; expensive global replans on very large graphs |
| RRT/RRT* (incl. spatiotemporal) | PND | PFS, PFD, DND, DFD, DFS, CND, CFD, CFS | Medium–High | Low | Anytime kinodynamic routing; replan-friendly; slow in narrow passages; sensitive to collision-check budget |
| APF/reactive safety layers | PND | DND, CND | Low | Low | Near-field safety wrapper; low latency; local minima/oscillation; gain tuning sensitive; needs global backbone |
| MPC (incl. distributed MPC) | PND | PFS, PFD, DND, DFD, CND, CFD | High | Medium | Constraint-aware receding horizon; strong tracking; solver/model sensitivity; best for moderate team sizes |
| RL/MARL (CTDE, PPO/TD3/COMA, etc.) | PND, CND | PFD, DND, DFD, CFD | Medium (inference) | High | Online adaptivity, rich objectives; reward/safety design + sim-to-real issues; typically needs safety wrapper/monitor |
| PSO/PSO-hybrids | PFD, DFD, CFD | PND, PFS, DND, DFS, CND, CFS | Medium (rolling) | Medium–High | Time-varying cost optimization; parameter sensitivity; improved by graph seeding/clustering/decomposition hybrids |
| GA/DE (incl. hybrids) | PFS, DFS, CFS | PND, PFD, DFD, CND, CFD | Low (runtime) | Medium–High | Offline multi-objective optimization/scheduling/capacitated tours; stochastic variability; no deterministic optimality |
| ACO (incl. SOM/FCM seeding) | DND | PND, PFS, PFD, DFD, DFS, CND, CFD, CFS | Low–Medium | Medium | Constructive allocation–routing under uncertainty; pheromone tuning; large-graph scalability limits; comm/pheromone sharing often needed |
| Unsupervised (clustering/partition embeds) | DND | DFD, DFS, CND, CFD | Low–Medium | Medium | Structure discovery for partitioning and load/bandwidth balancing; metric/cluster sensitivity; needs periodic rebalancing |
| Area segmentation (Voronoi/decomposition) | CND, CFD, CFS | DND | Low | Medium | Deterministic spatial structure for coverage; reduces overlap; requires stitching and dynamic reweighting for balance |
| Auction/market-based tasking | DND | PND, DFD, DFS, CND, CFD | Low–Medium | Medium | Online (re)assignment via bids/consensus; communication-heavy; pairs with local planners for motion feasibility |
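The APF family's role as a low-latency, near-field safety wrapper can be made concrete with a minimal 2D sketch. This is an illustrative implementation under our own assumptions (function names, gains `k_att`/`k_rep`, influence radius `d0`, and step size are not taken from any cited study): attraction pulls linearly toward the goal, repulsion activates only within `d0` of an obstacle, and the classic local-minimum failure mode appears as a vanishing resultant force.

```python
import math

def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=5.0, step=0.5):
    """One 2D artificial-potential-field step: linear attraction to the
    goal plus repulsion from obstacles closer than the influence radius d0."""
    fx = k_att * (goal[0] - pos[0])  # attractive component
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:         # repulsive components
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 1e-9 < d < d0:
            mag = k_rep * (1.0 / d - 1.0 / d0) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d
    norm = math.hypot(fx, fy)
    if norm < 1e-9:                  # classic APF local minimum
        return pos
    return (pos[0] + step * fx / norm, pos[1] + step * fy / norm)

def plan_apf(start, goal, obstacles, tol=0.5, max_iters=500):
    """Roll the stepper forward until within tol of the goal (or budget ends)."""
    path = [start]
    while (math.hypot(path[-1][0] - goal[0], path[-1][1] - goal[1]) > tol
           and len(path) < max_iters):
        nxt = apf_step(path[-1], goal, obstacles)
        if nxt == path[-1]:          # stuck in a local minimum: stop early
            break
        path.append(nxt)
    return path
```

Because the step is taken along the normalized resultant, the wrapper is cheap per cycle but, as the table notes, it needs a global backbone (graph search, RRT*, or MPC) to escape local minima and guarantee mission-level progress.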
Table 12. Best-suited algorithm families across nine UAV mission scenarios.
| Scenario | Best-Suited Algorithm Family | Rationale |
|---|---|---|
| PND | RL (PPO/COMA/TD3/DQN) | Online learning; strong adaptability and scalability in dynamic scenes |
| PFS | Graph search (A*/Dijkstra); GA | Optimality and speed on static maps; GA effective for offline multi-objective paths |
| PFD | PSO/PSO-hybrids | Adapts to time-varying costs; improved convergence and scale |
| DND | Auction/market-based | Real-time scalability; re-bid/utilization benefits |
| DFD | PSO/PSO-hybrids | Scales to large assignments; hybrids escape local optima |
| DFS | GA (and GA-SA hybrids) | Effective constraint-aware offline assignment; integrates with routing |
| CND | RL | Drives exploration; reduces revisits; energy-aware adaptation |
| CFD | PSO/PSO-hybrids | Coverage efficiency and balanced load under dynamic targets/environments |
| CFS | GA; DE | Widely used baselines; hybrids improve efficiency and route quality |
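The market-based family recommended for DND can be sketched as a greedy sequential auction: each round awards the globally cheapest remaining (UAV, task) pair, and winners drop out of later rounds. This is a deliberately simplified illustration, not the two-stage auction mechanisms of the cited studies; the function name and the Euclidean travel cost are our own assumptions.

```python
import math

def auction_assign(uav_pos, task_pos):
    """Greedy sequential auction: each round awards the globally cheapest
    remaining (UAV, task) pair; winners drop out of later rounds."""
    free = set(range(len(uav_pos)))          # UAVs still bidding
    open_tasks = set(range(len(task_pos)))   # tasks not yet awarded
    assignment = {}                          # task index -> UAV index
    while free and open_tasks:
        u, t = min(
            ((u, t) for u in free for t in open_tasks),
            key=lambda pair: math.dist(uav_pos[pair[0]], task_pos[pair[1]]),
        )
        assignment[t] = u
        free.remove(u)
        open_tasks.remove(t)
    return assignment
```

In a decentralized deployment the `min` over all pairs would be replaced by distributed bidding and consensus on the winner, which is why the family is communication-heavy; re-running the loop on the unassigned remainder gives the re-bid behavior noted in the rationale.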
Table 13. Scenario-specific metrics reported by the surveyed studies.
| Scenario | Metrics Reported in the Cited Studies | Representative Sources |
|---|---|---|
| PND | Mission/coverage efficiency; obstacle-avoidance success; robustness and scalability across swarm sizes | DQN hybrid improves task-completion efficiency and obstacle avoidance under dynamic obstacles; COMA outperforms non-learning baselines across sizes/conditions (robustness and scalability) |
| PFS | Energy and path length (multi-objective); runtime/memory; connectivity compliance | GA minimizes energy + length for swarm trajectories; enhanced A* addresses heavy runtime/memory on large spaces; connectivity-aware path planning validated in urban airspace |
| PFD | Shorter paths; faster convergence; lower runtime; AoI for data collection; planning efficiency in dynamic airspace | PPSwarm: shorter paths, faster convergence, lower runtime; DP-MATD3: lowering Age of Information in multi-UAV data collection; weighted-Voronoi + PSO shows high planning efficiency and robust avoidance in dynamic environments |
| DND | Flight time; energy consumption; completion rate; resource consumption | Power-line inspection under heterogeneous wind fields: flight time, energy; two-stage auction boosts completion rate and reduces resource consumption |
| DFD | Information value vs. flight time/energy (multi-objective); convergence/diversity under dynamics; re-allocation efficiency | WSN planning maximizes sensing information while minimizing time and energy; DT-assisted PSO improves convergence/diversity in dynamic conditions; dynamic re-allocation improves execution flexibility/efficiency |
| DFS | Execution efficiency; convergence speed; route cost | Disaster-relief AGA/SCPSO shows strong efficiency, convergence, and route-cost performance on benchmarks |
| CND | Coverage efficiency; task efficiency; energy consumption; exploration time; load balance | Tsunami dynamic reassignment improves coverage efficiency (fault-tolerant); multi-base-station/area-segmentation planners improve task efficiency and reduce energy; DCAS shortens exploration time and improves load balance |
| CFD | Minimum tour distance; rapid, energy-aware exploration; energy/load balance; coverage efficiency | Checkpoint coverage focuses on minimum-distance tours; DNBPT targets maximum gain with minimal energy and yields faster exploration and higher coverage efficiency; PSOHAC raises coverage, trims energy, and balances load |
| CFS | Battery/spray consumption; overall planning efficiency in waypoint-dense fields | GGA reduces battery and spray-tank usage, improving overall planning efficiency |
Table 14. Representative benchmark tasks with reported evaluation setups.
| Task/Scenario Pattern | Brief Task Description | Reported Evaluation Setup |
|---|---|---|
| Forest-fire missions (CND/CFD) | Coverage/search and ignition-source localization; trajectory planning for reconnaissance. | Rugged/complex terrain; dynamic fire-risk maps; simulation-based assessment of coverage/time/energy. |
| Transmission-line inspection under multi-wind fields (DND) | Joint task allocation and path planning targeting safety, time, energy, and connectivity compliance. | Heterogeneous wind-field profiles; environmental constraints along transmission corridors; online reallocation effects. |
| Airport bird-dispersion (DND) | Trajectory prediction + assignment + curvature-constrained interception (Dubins). | Parametric simulations over fleet size, exclusion radius, and intrusion cases; LSTM–Kalman prediction + Hungarian assignment; curvature-limited paths. |
| Urban pipeline-corridor inspection (PFD/CFD) | 3D inspection-route optimization with connectivity/safety constraints. | Corridor geometry and obstacle density specified; smooth, collision-free 3D trajectories; connectivity maintained along the route. |
| Post-disaster SAR (DFS/CND) | Cooperative allocation + routing; distributed swarm scanning for large areas. | Case studies with dynamic reassignment/failure recovery; polygonal maps discretized into GPS waypoints; runtime wavefront dispatch. |
| Plant-protection coverage (CFD/CFS) | Multi-sub-swarm cooperative coverage minimizing round-trip distance and non-spraying time. | Field tiling/area partition; target-density and operation-speed constraints; evaluation by distance/non-spraying time. |
| AoI-driven data collection (PFD/CND) | Multi-UAV tours minimizing the Age of Information. | Time-varying sensing tasks; AoI computed from visit times; communication/scheduling constraints considered. |
| Urban patrol/corridor navigation (PND/PFS/CFD) | Patrol/route planning under urban geofences and curvature limits. | Low-altitude urban corridors; connectivity and minimum-separation constraints; route smoothness/feasibility checks. |
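Coverage benchmarks such as the plant-protection task are typically seeded with lawnmower (boustrophedon) sweeps before any swath-level optimization. The generator below is a minimal illustration for a rectangular field; the function name, the axis-aligned geometry, and the single `swath` spacing parameter are our own assumptions rather than the setup of any cited study.

```python
def boustrophedon_waypoints(width, height, swath):
    """Lawnmower (boustrophedon) sweep of a rectangular field: parallel
    passes spaced by the sensor/spray swath, alternating direction so the
    exit of one pass is adjacent to the entry of the next."""
    waypoints = []
    y, left_to_right = 0.0, True
    while y <= height:
        xs = (0.0, width) if left_to_right else (width, 0.0)
        waypoints.append((xs[0], y))   # pass entry point
        waypoints.append((xs[1], y))   # pass exit point
        left_to_right = not left_to_right
        y += swath
    return waypoints
```

Multi-UAV variants then partition the field (e.g. by Voronoi cells or strips) and run one such sweep per sub-area, which is where the round-trip-distance and non-spraying-time objectives of the benchmark enter.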

Share and Cite

MDPI and ACS Style

Li, J.; Li, J.; Zhang, J.; Meng, W. A Comprehensive Review of Path-Planning Algorithms for Multi-UAV Swarms. Drones 2026, 10, 11. https://doi.org/10.3390/drones10010011
