Article

A Hybrid Optimization Algorithm for Enhanced Path Planning in Dynamic Multi-UAV Environments

School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325000, China
*
Author to whom correspondence should be addressed.
Symmetry 2026, 18(5), 749; https://doi.org/10.3390/sym18050749
Submission received: 7 March 2026 / Revised: 6 April 2026 / Accepted: 20 April 2026 / Published: 27 April 2026
(This article belongs to the Section Computer)

Abstract

Multi-UAV path planning in dynamic and complex environments is a challenging constrained optimization problem because it must simultaneously consider path efficiency, obstacle avoidance, altitude feasibility, flight smoothness, and inter-UAV path diversity. Existing methods often struggle to maintain search diversity, balance exploration and exploitation, and avoid premature convergence in high-dimensional search spaces. To address these issues, this paper proposes a Q-learning-guided Harris Hawk Optimization-Genetic Algorithm (QHHO_GA), which integrates Genetic Algorithm (GA), Harris Hawk Optimization (HHO), Q-learning, prioritized experience replay, entropy-based state partitioning, and a Rapidly-exploring Random Tree (RRT)-based stagnation adjustment mechanism. In the proposed framework, GA enhances population quality and diversity, HHO performs the core search, Q-learning adaptively guides HHO behaviors, and stagnation monitoring with RRT-based adjustment improves the ability to escape regions in which the search is locally trapped. Experimental results on the CEC2017 benchmark suite and a multi-UAV path planning task demonstrate the effectiveness of the proposed method. On the CEC2017 benchmark, QHHO_GA ranks among the top two on 18 out of 30 test functions and achieves the best overall ranking among the compared algorithms. In the UAV path planning experiments, it attains an average ranking of 3.44 and the best overall rank among all compared methods. These results indicate that QHHO_GA is a robust and competitive method for high-dimensional constrained optimization and is particularly effective for complex multi-UAV path planning tasks.

1. Introduction

Unmanned aerial vehicles (UAVs) have been increasingly deployed in disaster response, environmental monitoring, logistics, and intelligent agriculture because of their flexibility, low deployment cost, and ability to operate in complex or inaccessible areas [1,2]. With the growing demand for cooperative and autonomous missions, path planning has evolved from single-UAV navigation to multi-UAV coordination in dynamic environments [3,4]. Compared with ground vehicles, UAVs operate in three-dimensional space and must simultaneously account for obstacle avoidance, altitude feasibility, path smoothness, and mission efficiency, which substantially increases the difficulty of real-time planning [5,6].
For multi-UAV missions, path planning concerns not only the feasibility and safety of an individual trajectory, but also the coordination quality of multiple trajectories. In practical deployments, overly similar or strongly overlapping routes may reduce coverage efficiency, increase redundancy, and weaken the advantage of cooperative operation. Therefore, path planning in dynamic multi-UAV environments should not be formulated solely as a conventional shortest-path problem. Instead, trajectory quality, safety, and spatial differentiation should be considered jointly within a unified optimization framework.
From an optimization perspective, dynamic multi-UAV path planning is a high-dimensional and strongly constrained problem. Feasible trajectories must be generated under coupled requirements on path length, obstacle avoidance, altitude constraints, and flight smoothness, while excessive path overlap among UAVs should also be avoided. These characteristics lead to a complex search landscape in which premature convergence, diversity loss, and local trapping can substantially reduce solution quality. As a result, an effective optimizer should be able to maintain population diversity, regulate its search behavior adaptively, and expand the search space when progress becomes insufficient.
Existing hybrid optimization methods have shown that combining multiple mechanisms can improve path planning performance. However, in complex multi-UAV scenarios, three challenges remain particularly important. First, population quality and search diversity are often difficult to preserve simultaneously throughout the optimization process. Second, the exploration–exploitation balance is not always regulated through an explicit state-dependent decision mechanism. Third, recovery from prolonged search stagnation is frequently weak, which limits the optimizer’s ability to escape locally favorable but globally suboptimal regions. These challenges are especially restrictive in dynamic multi-UAV planning, where the search space is constrained not only by environmental feasibility but also by the coordination requirement of producing sufficiently differentiated trajectories.
To address these issues, this paper proposes a Q-learning-guided Harris Hawk Optimization-Genetic Algorithm (QHHO_GA) for dynamic multi-UAV path planning. In the proposed framework, GA is employed to enhance population quality and maintain diversity, HHO serves as the core search mechanism, and Q-learning is used to guide adaptive behavior selection through a deliberately designed state–action space. In addition, prioritized experience replay, entropy-guided state partitioning, stagnation monitoring, adaptive parameter regulation, and RRT-based stagnation adjustment are integrated into a unified heuristic optimization process. Through this design, the proposed method aims to improve diversity preservation, adaptive search regulation, and recovery from local stagnation in complex constrained search spaces.
The effectiveness of QHHO_GA is evaluated on the CEC2017 benchmark suite and on simulation-based multi-UAV path planning scenarios with multiple obstacle configurations. The main contributions of this paper are summarized as follows:
  • A hybrid optimization framework, QHHO_GA, is proposed for dynamic multi-UAV path planning by integrating GA-based population enhancement, Q-learning-guided HHO adaptive search, and RRT-based stagnation adjustment into a unified search process.
  • An adaptive search mechanism is developed through entropy-guided state partitioning, prioritized experience replay, and Boltzmann-based action selection, enabling more effective regulation of exploration and exploitation while improving the ability to escape premature convergence.
  • A multi-UAV path planning model with an explicit path diversity objective is established, and extensive experiments on benchmark functions and complex terrain scenarios are conducted to verify the effectiveness and competitiveness of the proposed method.

2. Related Works

UAV path planning has been studied extensively, and the related literature reflects a clear evolution from deterministic search to adaptive and hybrid optimization. Early studies mainly relied on classical search-based planners. With the increasing complexity of dynamic and constrained planning tasks, later work progressively incorporated population-based metaheuristics, hybrid frameworks, and learning-guided search strategies. Accordingly, the development of this field can be understood not merely as the accumulation of individual algorithms, but as a gradual transition from fixed search rules to adaptive optimization mechanisms.
Traditional search-based methods. Classical path planning methods, such as A* and D*, remain important because they provide explicit search logic and can generate feasible paths efficiently in structured or relatively static environments [7,8]. Deterministic search has also been extended to more complex planning settings through conflict-based search and related graph-based formulations [9,10]. In continuous spaces, RRT and its variants have been widely adopted because of their strong exploratory capability and flexible search behavior [11,12]. Subsequent developments, including RRT* and related extensions, further improved path quality and asymptotic optimality [13,14]. Other variants have also been proposed for constrained or application-specific planning scenarios [15,16]. These methods are particularly effective when the environment model is relatively clear and the search objective can be represented by deterministic rules. However, when the problem becomes high-dimensional, strongly constrained, or dynamically changing, repeated search and replanning may lead to a substantial computational burden. Moreover, the resulting paths often require further refinement when obstacle avoidance, smoothness, altitude feasibility, and multi-UAV coordination must be considered simultaneously [10,17].
Metaheuristic optimization and improved variants. To overcome the limitations of purely deterministic planning, many studies introduced population-based metaheuristic algorithms into UAV path planning. Classical methods such as GA and Ant Colony Optimization (ACO) provide stochastic exploration through population evolution and probabilistic search [18,19]. Particle Swarm Optimization (PSO) and HHO have likewise been adopted because of their strong global search capability and flexible update mechanisms [20,21]. Beyond these foundational approaches, a large number of improved variants have been proposed to enhance convergence speed, diversity maintenance, and search robustness. Representative examples include the multi-verse optimizer and dynamic metaheuristic variants designed for more complex search landscapes [22,23]. More recent studies have further developed specialized frameworks such as DENSDBO-ASR and PPSwarm for UAV-related optimization scenarios [24,25]. These studies indicate that performance can often be improved by redesigning initialization strategies, update rules, or diversity-preserving mechanisms. Nevertheless, in complex path planning tasks, especially when the feasible solution space is discontinuous or strongly constrained, basic or lightly improved algorithms may still suffer from premature convergence and insufficient adaptability to changing search conditions.
Hybrid and learning-guided optimization frameworks. As optimization problems became more complex, researchers increasingly turned to hybrid frameworks in order to combine the complementary strengths of different algorithms. Representative examples include path planning schemes that integrate deterministic search with evolutionary optimization, such as RRT-GA frameworks [26,27]. Other studies have combined swarm-intelligence methods, including PSO-ACO and related multi-strategy designs, to improve search efficiency and solution quality [28,29]. More recent work has explored hierarchical, co-evolutionary, and agent-based hybrid frameworks for complex planning tasks [30,31]. Adaptive multi-strategy designs have also been proposed to improve robustness in high-dimensional or dynamically changing environments [32,33]. The motivation behind such studies is not hybridization for its own sake, but the use of one mechanism to compensate for the limitations of another. In particular, hybridization is often employed to combine broader exploration with local refinement, or to restore diversity when the search becomes trapped in a limited region. This line of research shows that appropriately designed hybridization can improve optimization performance, although it may also increase structural complexity when the integrated mechanisms are not well aligned.
In recent years, Q-learning-guided metaheuristics have attracted increasing attention because Q-learning can provide lightweight adaptive control without the heavier computational requirements of deep reinforcement learning [34,35]. In UAV-related optimization, reinforcement learning has been increasingly used to improve path planning and decision adaptation under dynamic conditions [36,37]. Related studies have also explored multi-agent and policy-design perspectives for UAV coordination and planning [38,39]. Recent reviews further indicate that the integration of Q-learning with metaheuristics has become an active research direction for operator selection, parameter adaptation, and exploration–exploitation regulation [40]. Promising results have already been reported in both path planning and general search optimization. For example, the IQ-FAT algorithm improves convergence speed by 40% and achieves approximately 90% better dynamic response time than the artificial potential field method [41]. Similarly, the GA and Q-learning hybrid method proposed by Puente-Castro et al. improves path planning performance by 57.14% compared with conventional GA [42]. More recently, Q-learning-guided grey wolf optimization has been shown to improve path cost and convergence behavior in UAV path planning through adaptive search control and diversity-preserving mechanisms [43]. These findings suggest that the effectiveness of Q-learning-guided optimization does not depend solely on the addition of a learning component, but also on whether the state representation and action design are properly matched to the underlying search dynamics.
Based on the above development, the issue addressed in this work is not whether hybridization itself is possible, since many hybrid frameworks already exist, but rather how to design a practically effective combination of mechanisms for complex constrained search tasks. In particular, for HHO-based search, the construction of a meaningful state–action space for adaptive behavior selection is nontrivial. If the state partition is redundant or the action mapping is poorly matched, the hybrid framework may perform no better than, or even worse than, the original optimizer. In addition, path planning problems, especially dynamic multi-UAV tasks, require not only feasible and low-cost trajectories but also stronger diversity maintenance and effective stagnation adjustment in order to avoid overly similar paths and local trapping. These considerations motivate the proposed QHHO_GA framework. In the present work, GA is introduced as a simple but effective population-enhancement mechanism, Q-learning-guided HHO is used for adaptive action selection through a deliberately designed state–action space, and RRT-based stagnation adjustment is incorporated to expand the search space when progress becomes insufficient. Accordingly, the contribution of this work lies not in claiming the first hybridization itself, but in the careful integration of these mechanisms into a coherent framework and in demonstrating its effectiveness for dynamic multi-UAV path planning.

3. Background

This section briefly introduces the four algorithmic components underlying the proposed framework, namely GA, HHO, RRT, and Q-learning. Since these methods provide the theoretical basis of QHHO_GA, only their basic principles and the key formulations used later are summarized here.

3.1. Genetic Algorithm

GA is a population-based optimization algorithm inspired by natural selection and genetic evolution. Through repeated selection, crossover, mutation, and replacement, GA evolves candidate solutions toward promising regions of the search space. Owing to its robust global search capability, GA has been widely used in optimization problems. In the present work, GA is introduced as a population enhancement mechanism for improving solution quality and preserving search diversity. The initial population is generated within the search bounds as
$$X_{N \times D} = lb + (ub - lb) \cdot \mathrm{rand}(N, D)$$
where $N$ is the population size, $D$ is the problem dimension, and $lb$ and $ub$ are the lower and upper bounds of the search space, respectively. To assign higher selection probability to individuals with better fitness under a minimization setting, roulette-wheel selection is adopted:
$$P_i = \frac{1 / (\mathrm{Fitness}(i) + \epsilon)}{\sum_{j=1}^{N} 1 / (\mathrm{Fitness}(j) + \epsilon)}$$
where $P_i$ is the probability that individual $i$ is selected, $\mathrm{Fitness}(i)$ is the fitness value of individual $i$, and $\epsilon$ is a constant used to avoid division-by-zero errors. New offspring are then generated by single-point crossover:
$$C_{1,i} = [P_{1,1:k},\ P_{2,k+1:D}], \qquad C_{2,i} = [P_{2,1:k},\ P_{1,k+1:D}]$$
where $P_1$ and $P_2$ denote the parent individuals, $k$ is a randomly selected crossover point, and $C_1$ and $C_2$ are the generated offspring. To preserve diversity and reduce the risk of local trapping, mutation is further applied as
$$X_{i,j} = lb_j + \mathrm{rand} \cdot (ub_j - lb_j), \quad \text{if } \mathrm{rand} < P_{mut}$$
where $X_{i,j}$ is the gene value of the $j$th dimension of individual $i$, $P_{mut}$ is the mutation probability, and $lb_j$ and $ub_j$ are the search bounds of the corresponding dimension. After crossover and mutation, individuals with poorer fitness are replaced so that the overall population quality can be maintained, typically through an elite-retention strategy.
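The four GA operators above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the scalar bounds, the default value of `eps`, and the use of Python lists are assumptions made for illustration, and the roulette-wheel weights assume non-negative fitness values.

```python
import random


def init_population(n, d, lb, ub, rng):
    # X = lb + (ub - lb) * rand(N, D); scalar bounds are assumed for brevity.
    return [[lb + (ub - lb) * rng.random() for _ in range(d)] for _ in range(n)]


def roulette_probabilities(fitness, eps=1e-12):
    # Minimization setting: each individual is weighted by 1 / (fitness + eps).
    weights = [1.0 / (f + eps) for f in fitness]
    total = sum(weights)
    return [w / total for w in weights]


def single_point_crossover(p1, p2, k):
    # The offspring exchange the parents' tails after crossover point k.
    return p1[:k] + p2[k:], p2[:k] + p1[k:]


def mutate(x, lb, ub, p_mut, rng):
    # Each gene is resampled uniformly within the bounds with probability p_mut.
    child = []
    for gene in x:
        if rng.random() < p_mut:
            child.append(lb + rng.random() * (ub - lb))
        else:
            child.append(gene)
    return child
```

With fitness values 1 and 3, for example, the first individual receives a selection probability of about 0.75, which reflects the inverse-fitness weighting.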

3.2. Harris Hawk Optimization Algorithm

HHO is a population-based metaheuristic inspired by the cooperative hunting behavior of Harris hawks. By switching among different search behaviors, HHO balances global exploration and local exploitation during the optimization process and has shown competitive performance in a variety of optimization tasks. In this study, HHO serves as the core search mechanism of the proposed framework. In HHO, each candidate solution is regarded as the current position of a hawk, and the population evolves iteratively toward the current best solution. The main search behaviors considered in this work include soft besiege, hard besiege, random jump, spiral motion, and surprise pounce. The soft-besiege update is expressed as
$$X_i^{t+1} = X_i^{t} + E \cdot (X_g^{t} - X_i^{t}) + 0.01 \cdot \mathrm{randn}(1, d)$$
where $E$ denotes the escape energy of the prey, $X_g^{t}$ is the current global optimal solution, and $\mathrm{randn}(1, d)$ is a random perturbation following a standard normal distribution. The hard-besiege behavior is given by
$$X_i^{t+1} = X_g^{t} + 0.5 \cdot (X_i^{t} - X_g^{t}) \cdot \mathrm{rand}(1, d)$$
whereas the random-jump behavior, which enlarges the search range during exploration, is written as
$$X_i^{t+1} = X_i^{t} + J \cdot (UB - LB) \cdot \mathrm{rand}(1, d)$$
where $J$ is the jump intensity and $UB$ and $LB$ denote the upper and lower bounds of the search space. In addition, the spiral-motion behavior is defined as
$$X_i^{t+1} = X_g^{t} + r \cdot \cos(\theta) \cdot (X_i^{t} - X_g^{t})$$
where $r$ is the radius factor and $\theta$ is the random angle, while the surprise-pounce behavior is modeled as
$$X_i^{t+1} = X_g^{t} + \mathrm{rand} \cdot (X_i^{t} - X_g^{t}) \cdot \exp(-\alpha \cdot t)$$
where $\mathrm{rand}$ is a random number, $\alpha$ is a decay factor, and $t$ is the current iteration step. These update rules provide the basic HHO behaviors later invoked in the proposed hybrid framework.
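A compact Python sketch of the five behaviors is given below for illustration. The function names and scalar bounds are assumptions, `rand(1, d)` is read as one uniform draw per dimension, and the surprise pounce is assumed to decay as exp(−α·t), which is how the description of α as a decay factor is interpreted here.

```python
import math
import random


def soft_besiege(x, g, energy, rng):
    # Move toward the global best, scaled by escape energy, plus small Gaussian noise.
    return [xi + energy * (gi - xi) + 0.01 * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g)]


def hard_besiege(x, g, rng):
    # Contract around the global best with a per-dimension random factor.
    return [gi + 0.5 * (xi - gi) * rng.random() for xi, gi in zip(x, g)]


def random_jump(x, lb, ub, jump, rng):
    # Exploration step proportional to the width of the search range.
    return [xi + jump * (ub - lb) * rng.random() for xi in x]


def spiral_motion(x, g, radius, theta):
    # Rescale the offset from the global best by radius * cos(theta).
    return [gi + radius * math.cos(theta) * (xi - gi) for xi, gi in zip(x, g)]


def surprise_pounce(x, g, alpha, t, rng):
    # One random scalar; the offset shrinks as the iteration count t grows
    # (assumed exp(-alpha * t) decay).
    scale = rng.random() * math.exp(-alpha * t)
    return [gi + scale * (xi - gi) for xi, gi in zip(x, g)]
```

Under this reading, every behavior except the random jump pulls the candidate toward the current global best, so the random jump is the main source of range-wide exploration among the five.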

3.3. Rapidly Exploring Random Tree

RRT is a randomized tree-based search strategy that incrementally explores the solution space through recursive node expansion. Because of its exploratory capability, RRT has been widely used in motion planning and related search problems. In the proposed framework, RRT is not used as an independent planner, but as a search-space adjustment mechanism when stagnation occurs. Its basic procedure involves root generation, local node extension, and boundary handling. The root node is initialized as
$$q_{\mathrm{root}} = lb + \mathrm{rand}(1, D) \cdot (ub - lb)$$
where $q_{\mathrm{root}}$ denotes the location of the root node, $lb$ and $ub$ are the lower and upper bounds of the search space, and $D$ is the problem dimension. Child nodes are then generated through local randomized extension:
$$q_{\mathrm{child}} = q_{\mathrm{node}} + \delta \cdot (\mathrm{rand}(1, D) - 0.5)$$
where $q_{\mathrm{node}}$ is the current node and $\delta$ is the step size of the expansion. To ensure feasibility with respect to the search domain, each generated child node is projected back into the valid range by
$$q_{\mathrm{child}} = \min(\max(q_{\mathrm{child}}, lb), ub)$$
which guarantees that the generated node remains within the admissible search space.
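The three RRT steps can be sketched as follows. This is an illustrative reading rather than the authors' code: the function names and scalar bounds are assumptions, and the rule for choosing which existing node to extend (uniformly at random here) is not specified in the text.

```python
import random


def rrt_root(lb, ub, d, rng):
    # q_root = lb + rand(1, D) * (ub - lb)
    return [lb + rng.random() * (ub - lb) for _ in range(d)]


def rrt_child(node, delta, lb, ub, rng):
    # Local randomized extension, then projection back into [lb, ub].
    raw = [q + delta * (rng.random() - 0.5) for q in node]
    return [min(max(q, lb), ub) for q in raw]


def grow_tree(lb, ub, d, delta, n_nodes, rng):
    # Assumption: the parent of each new node is drawn uniformly from the tree.
    tree = [rrt_root(lb, ub, d, rng)]
    while len(tree) < n_nodes:
        parent = rng.choice(tree)
        tree.append(rrt_child(parent, delta, lb, ub, rng))
    return tree
```

Because each coordinate moves by at most δ/2 per extension, δ directly controls how far the generated nodes can drift from the root within a given number of expansions.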

3.4. Q-Learning

Q-learning is a model-free reinforcement learning algorithm that learns state–action values through interaction with the environment. By iteratively updating the Q-value of each state–action pair, it gradually improves the action policy according to observed rewards. In QHHO_GA, Q-learning is used to guide adaptive action selection during the search process. The core idea of Q-learning is to estimate the expected cumulative reward associated with taking a certain action in a given state. Its standard temporal-difference update rule is written as
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$
where $Q(s, a)$ denotes the Q-value for choosing action $a$ in state $s$, $\alpha$ is the learning rate, $r$ is the reward obtained after executing action $a$, and $\gamma$ is a discount factor used to measure the contribution of future rewards. The next state is denoted by $s'$, and $\max_{a'} Q(s', a')$ represents the maximum Q-value over the candidate actions in the next state.
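As a brief illustration, the temporal-difference update can be written for a tabular Q-function stored as a list of lists. The function name, the default values of `alpha` and `gamma`, and the choice to return the TD error are assumptions made for this sketch.

```python
def q_update(q_table, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # TD error: reward plus discounted best next-state value, minus the current estimate.
    td_error = r + gamma * max(q_table[s_next]) - q_table[s][a]
    # Move the current estimate a fraction alpha toward the TD target.
    q_table[s][a] += alpha * td_error
    return td_error
```

For instance, with Q(s, a) = 0, a reward of 1, a best next-state value of 2, α = 0.5, and γ = 0.9, the TD error is 1 + 1.8 − 0 = 2.8 and the updated Q-value is 1.4.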

4. Proposed QHHO_GA Algorithm

This section presents the overall framework of QHHO_GA and explains how its main components are integrated into a unified heuristic optimization process. Since the basic principles of GA, HHO, RRT, and Q-learning have been introduced in the previous section, the focus here is placed on their interaction, execution order, and functional coordination within the proposed framework.
As shown in Figure 1 and Figure 2, QHHO_GA consists of four modules, namely GA-based population enhancement, Q-learning-guided HHO adaptive search, stagnation monitoring and adaptive parameter regulation, and RRT-based stagnation adjustment. Starting from an initialized population, GA operations are first used to improve population quality and maintain diversity. Candidate solutions are then updated through Q-learning-guided HHO search, in which the current search state is evaluated and the corresponding HHO behavior is selected adaptively. During this process, prioritized replay is used to improve Q-table updating. When the best solution remains unimproved for a predefined period, the temperature parameter τ and the exploration probability ϵ are increased, and the RRT-based stagnation adjustment mechanism is triggered to inject new candidate solutions and enlarge the search space.

4.1. GA-Based Population Enhancement

In QHHO_GA, GA is employed as a population enhancement mechanism rather than as an independent optimizer. Through selection, crossover, mutation, and fitness-guided replacement, it improves population quality while maintaining useful diversity, thereby providing a more suitable population basis for the subsequent adaptive search process. Since the basic GA operators have already been introduced in the previous section, they are not repeated here.

4.2. Q-Learning-Guided HHO Adaptive Search

This module constitutes the adaptive search core of QHHO_GA. Its main role is to regulate HHO behaviors through Q-learning so that the search process is adjusted according to the current optimization state rather than following only fixed update rules. As summarized in Algorithm 1, each candidate solution is evaluated with respect to the current population state, an HHO action is selected according to the learned Q-table, and the corresponding update is then executed. The resulting transition is stored for subsequent replay-based learning, so that state assessment, action selection, and experience reuse are coupled within a unified adaptive search loop.
To improve learning efficiency, QHHO_GA employs prioritized experience replay, as illustrated in Figure 3. Instead of replaying all transitions with equal importance, the framework assigns higher priority to samples with larger absolute TD errors, because such transitions generally provide stronger learning signals for Q-table updating. The replay probability is defined as
$$P(i) = \frac{|TD_{\mathrm{error}}(i)|^{\alpha}}{\sum_{j=1}^{N} |TD_{\mathrm{error}}(j)|^{\alpha}}$$
where $\alpha$ controls the influence of TD error on sampling probability. The corresponding replay procedure is summarized in Algorithm 2.
Algorithm 1: Q-learning-guided HHO adaptive search
Input: x, fitx, Qtable, X, gbest, fes, fobj, ub, lb, d, replayBuffer, priorityBuffer, replaySize, τ, ϵ, bestfit, worsefit
Output: new_x, newfit, Qtable, fes, replayBuffer, priorityBuffer
Initialize: max_std; min_std;
Compute max_std and min_std over X (by row standard deviation);
Determine the current state using:
    state ← determine_state_hho(x, gbest, fitx, bestfit, worsefit, max_std, min_std, X, ub, lb);
Select an action based on the current state and Q-table using:
    action ← choose_hho_action(state, Qtable, τ, ϵ);
Apply the selected action:
    [new_x, newfit, reward, fes] ← apply_hho_action(action, x, fitx, X, gbest, fes, fobj, ub, lb, d);
Update the global best solution gbest and the best fitness bestfit based on newfit;
Compute the current and next Q-values for updating the Q-table using:
    Qtable ← update_q_table(state, action, reward, next_state, currentQ_value, nextQ_value);
Add the experience (state, action, reward, next_state, TD_error) to the replay buffer and priority buffer;
Perform prioritized replay by sampling from the priority buffer and updating the Q-table.
return new_x, newfit, Qtable, fes, replayBuffer, priorityBuffer
Algorithm 2: Prioritized Experience Replay Based on TD Error
[Algorithm 2 listing: provided as an image (Symmetry 18 00749 i001) in the original article]
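Because the listing of Algorithm 2 is not available as text, the following Python sketch only illustrates the replay probability defined above and one plausible way to use it. The function names, the small `floor` added to each priority so that zero-error transitions remain sampleable, sampling with replacement, and the refresh of stored TD errors after each replayed update are all assumptions, not details taken from the article.

```python
import random


def replay_probabilities(td_errors, priority_exp=0.6, floor=1e-6):
    # P(i) is proportional to |TD_error(i)|^alpha; `floor` is an added safeguard.
    weights = [(abs(e) + floor) ** priority_exp for e in td_errors]
    total = sum(weights)
    return [w / total for w in weights]


def prioritized_replay(q_table, buffer, td_errors, batch_size, lr, gamma,
                       priority_exp, rng):
    # Sample transition indices with replacement according to their priorities.
    probs = replay_probabilities(td_errors, priority_exp)
    picked = rng.choices(range(len(buffer)), weights=probs, k=batch_size)
    for i in picked:
        s, a, r, s_next = buffer[i]
        td = r + gamma * max(q_table[s_next]) - q_table[s][a]
        q_table[s][a] += lr * td
        td_errors[i] = td  # assumed: refresh the stored priority after replay
    return picked
```

Setting `priority_exp` to zero recovers uniform replay, which makes the role of the exponent α easy to check in isolation.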
The adaptive search mechanism further combines entropy-aware state characterization with Boltzmann-based action selection, as illustrated in Figure 4. After the current state is determined, actions are sampled according to
$$P(a) = \frac{e^{Q(s, a) / \tau}}{\sum_{a'} e^{Q(s, a') / \tau}}$$
where τ is the temperature parameter controlling the randomness of action selection. A lower value of τ emphasizes exploitation by favoring actions with larger Q-values, whereas a higher value encourages broader exploration. To characterize the current search condition, the population state is described through normalized fitness, relative distance to the current global best solution, population dispersion, and an entropy-related diversity measure. The population entropy is defined as
$$H = -\sum_{i=1}^{N} p_i \cdot \log(p_i)$$
where $p_i$ denotes the probability distribution of individuals in the population. Based on these indicators, the current search state is evaluated as
$$\mathrm{State} = \begin{cases} 1, & \text{if normalized\_fit} < 0.3 \text{ and normalized\_std} < 0.5, \\ 2, & \text{if distance} > 0.5 \text{ and normalized\_std} > 0.5, \\ 3, & \text{if mean\_dist} < 0.5, \\ 4, & \text{if entropy} > 0.6, \\ 5, & \text{otherwise}. \end{cases}$$
Here, State 1 indicates a highly aggregated population suitable for local search, State 2 corresponds to a more dispersed population requiring broader exploration, State 3 represents a population already close to the current global optimum, State 4 reflects high population diversity, and State 5 is treated as a default transitional state. The default action is defined as
$$x_{\mathrm{new}} = x + 0.1 \cdot \mathrm{randn}(1, d)$$
while the complete state–action correspondence is illustrated in Figure 4. Through repeated interaction among state evaluation, action selection, and replay-based updating, the search process becomes more sensitive to different optimization stages and more capable of balancing exploration and exploitation.
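The selection and state-evaluation rules of this subsection can be sketched in Python as follows. The function names are hypothetical, the state conditions are assumed to be tested in the order in which they are listed, and the role of the exploration probability ϵ is omitted because its combination with Boltzmann sampling is not detailed in the text. Subtracting the maximum Q-value before exponentiation is a standard numerical safeguard that leaves the probabilities unchanged.

```python
import math
import random


def boltzmann_probabilities(q_values, tau):
    # P(a) = exp(Q(s,a)/tau) / sum_a' exp(Q(s,a')/tau), computed stably.
    q_max = max(q_values)
    exps = [math.exp((q - q_max) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(q_values, tau, rng):
    # Sample an action index according to the Boltzmann distribution.
    probs = boltzmann_probabilities(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def population_entropy(probs):
    # H = -sum_i p_i * log(p_i); zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def determine_state(normalized_fit, normalized_std, distance, mean_dist, entropy):
    # Conditions are checked top to bottom (assumed precedence).
    if normalized_fit < 0.3 and normalized_std < 0.5:
        return 1
    if distance > 0.5 and normalized_std > 0.5:
        return 2
    if mean_dist < 0.5:
        return 3
    if entropy > 0.6:
        return 4
    return 5


def default_action(x, rng):
    # x_new = x + 0.1 * randn(1, d)
    return [xi + 0.1 * rng.gauss(0.0, 1.0) for xi in x]
```

With Q-values of 0 and 1, lowering τ from 10 to 0.1 shifts nearly all probability mass to the second action, which illustrates how the temperature trades exploration against exploitation.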

4.3. Stagnation Monitoring and Adaptive Parameter Regulation

In addition to the adaptive search mechanism, QHHO_GA includes a stagnation monitoring module to evaluate whether the optimization process continues to improve the current best solution. Specifically, the framework records the number of consecutive iterations in which the best fitness is not updated. When the stagnation threshold is reached, the temperature parameter τ and the exploration probability ϵ are increased so that the subsequent Q-learning-guided action selection becomes more exploratory. In this way, optimization progress is directly incorporated into parameter regulation rather than being treated as a fixed external condition.
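The monitoring logic can be sketched as a small counter-based class. The class name, the multiplicative increase rule, the gain values, and the upper caps on τ and ϵ are assumptions introduced for illustration; the text specifies only that both parameters are increased once the stagnation threshold is reached.

```python
class StagnationMonitor:
    """Counts consecutive non-improving iterations (minimization setting)."""

    def __init__(self, threshold, tau, eps, tau_gain=1.1, eps_gain=1.1,
                 tau_max=5.0, eps_max=0.5):
        self.threshold = threshold
        self.tau, self.eps = tau, eps
        self.tau_gain, self.eps_gain = tau_gain, eps_gain  # assumed update rule
        self.tau_max, self.eps_max = tau_max, eps_max      # assumed caps
        self.best = float("inf")
        self.counter = 0

    def update(self, best_fit):
        # Return True when stagnation is declared in this iteration.
        if best_fit < self.best:
            self.best = best_fit
            self.counter = 0
            return False
        self.counter += 1
        if self.counter < self.threshold:
            return False
        # Threshold reached: make the subsequent action selection more exploratory.
        self.tau = min(self.tau * self.tau_gain, self.tau_max)
        self.eps = min(self.eps * self.eps_gain, self.eps_max)
        self.counter = 0
        return True
```

A caller would invoke `update` once per iteration with the current best fitness and could use the returned flag to trigger a recovery step.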

4.4. RRT-Based Stagnation Adjustment

When the stagnation monitoring module detects that the current best solution has not improved for a predefined number of iterations, the RRT-based stagnation adjustment mechanism is activated, as illustrated in Figure 5. In QHHO_GA, RRT is not used as an independent optimizer; instead, it serves as a recovery mechanism for enlarging the search space when the population becomes trapped in a limited region. The new candidate solutions generated by the RRT-based random tree generation procedure are used to replace the worst individuals in the population, after which their fitness values are re-evaluated. At the same time, τ and ϵ are increased appropriately to encourage broader exploration.
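The replacement step can be illustrated with the short Python sketch below. The function name and the in-place update are assumptions, minimization is assumed so that the worst individuals are those with the largest fitness values, and the candidate list stands in for the nodes produced by the RRT-based generation procedure.

```python
def replace_worst(population, fitness, candidates, fobj):
    # Rank indices from worst to best (largest fitness first under minimization).
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    # Overwrite the worst individuals one by one and re-evaluate their fitness.
    for idx, cand in zip(order, candidates):
        population[idx] = list(cand)
        fitness[idx] = fobj(cand)
    return population, fitness
```

If more candidates are supplied than there are individuals, the surplus candidates are simply ignored by `zip`, so the population size is preserved.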

4.5. Summary of the Mechanism Hierarchy in QHHO_GA

In summary, QHHO_GA is a hierarchically organized hybrid heuristic optimization framework that integrates population-level, individual-level, and cross-level control mechanisms. At the population level, GA-based population enhancement, stagnation monitoring and adaptive parameter regulation, and RRT-based stagnation adjustment are responsible for improving population quality, maintaining diversity, and enlarging the search space when the search becomes stagnant. At the individual level, HHO action execution and Q-learning-guided action selection enable each candidate solution to adjust its search behavior adaptively. In addition, entropy-based state evaluation, prioritized experience replay, and the adaptive regulation of τ and ϵ connect individual search behavior with population-level statistical information, thereby forming a cross-level control mechanism. Through this coordinated design, the proposed framework forms a closed optimization loop that integrates population evolution, adaptive individual search, and stagnation adjustment.

5. Problem Modeling

This section formulates the multi-UAV path planning problem considered in this work. In dynamic and complex environments, path planning is not limited to finding a shortest feasible route. Instead, the optimizer is required to generate trajectories in a high-dimensional search space while simultaneously satisfying multiple coupled requirements, including path efficiency, obstacle avoidance, altitude feasibility, and flight smoothness. These characteristics make the problem highly constrained and difficult to solve using simple deterministic search alone.
Compared with single-UAV path planning, the multi-UAV case introduces an additional coordination requirement. It is not sufficient for each UAV to independently obtain a feasible path; the trajectories of multiple UAVs should also remain sufficiently differentiated in space. If several UAVs follow highly similar or excessively close routes, task coverage may become redundant, mutual interference may increase, and the cooperative advantage of deploying multiple UAVs may be weakened. Therefore, path diversity is treated here not as an auxiliary preference, but as an explicit component of the multi-UAV optimization objective.
From an optimization perspective, the problem considered here is characterized by a high-dimensional search space, multiple coupled objective components, and strict feasibility constraints. These properties generally produce a complex search landscape with many locally favorable but globally suboptimal regions. Moreover, in the multi-UAV setting, the optimization process should not only seek feasible and low-cost trajectories, but also avoid excessive concentration of solutions, since overly similar paths weaken cooperative effectiveness. As a result, the problem requires the optimizer to maintain population diversity, regulate search behavior according to the current search state, and expand the search space when progress becomes stagnant. In this sense, dynamic multi-UAV path planning provides a representative constrained optimization scenario for assessing the effectiveness of QHHO_GA.
Accordingly, each candidate solution $X_i$ is evaluated by a weighted cost model. The first four cost terms describe the quality and feasibility of an individual UAV path, whereas the fifth term explicitly captures the coordination requirement among multiple UAVs through path diversity. The total cost is defined as follows:
$$F_{\text{total}}(X_i) = \sum_{m=1}^{5} w_m \cdot F_m$$
where $F_{\text{total}}(X_i)$ denotes the total cost associated with the candidate solution $X_i$, $w_m$ denotes the weight assigned to the $m$-th cost component, and $F_m$ denotes the corresponding cost function.
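As an illustration of the weighted cost model, the following minimal Python sketch combines five component costs into a single value. The function name `total_cost` and the example weights are hypothetical and are not taken from the paper. The early return for infinite components is also an assumption: it ensures that an infeasible path stays infeasible even if its weight is zero, since `0 * inf` would otherwise yield NaN.

```python
import math


def total_cost(component_costs, weights):
    """Weighted sum of the cost components F_1..F_5 for one candidate solution."""
    if len(component_costs) != len(weights):
        raise ValueError("component_costs and weights must have the same length")
    # Assumption: any infinite component marks the whole solution as infeasible.
    if any(math.isinf(c) for c in component_costs):
        return math.inf
    return sum(w * c for w, c in zip(weights, component_costs))


# Hypothetical component costs F_1..F_5 and hypothetical weights w_1..w_5.
costs = [10.0, 2.0, 3.0, 1.0, 0.5]
weights = [5.0, 1.0, 10.0, 1.0, 1.0]
print(total_cost(costs, weights))
```

In practice the weights would be tuned to the mission; the values above only show the mechanics of the sum.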

5.1. Path Length Cost

Path length is a fundamental measure of flight efficiency, since shorter trajectories generally reduce flight time, energy consumption, and mission cost. Let $N$ denote the total number of nodes in the complete path, including the start and end nodes, and let $P_{i,j}$ denote the $j$-th node of the path encoded in candidate solution $X_i$. The path length cost is defined as:
$$F_1(X_i) = \sum_{j=1}^{N-1} \left\| \overrightarrow{P_{i,j}P_{i,j+1}} \right\|$$
where $F_1(X_i)$ denotes the path length cost of $X_i$, obtained by summing the Euclidean distances between consecutive path nodes.
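The path length cost can be sketched in a few lines of Python. The representation of a path as a list of `(x, y, z)` tuples and the function name `path_length_cost` are assumptions made for illustration only.

```python
import math


def path_length_cost(nodes):
    """Sum of Euclidean distances between consecutive path nodes."""
    return sum(math.dist(a, b) for a, b in zip(nodes, nodes[1:]))


# Hypothetical three-node path: start, one waypoint, end.
path = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
print(path_length_cost(path))  # 5 + 12 = 17.0
```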

5.2. Obstacle Threat Cost

In addition to path efficiency, the generated trajectory must maintain sufficient clearance from obstacles. Suppose that there are $K$ obstacles in the environment. Each obstacle is represented as a cylindrical structure centered at $C_k$ with radius $R_k$. Let $d_k$ denote the shortest distance from a path segment to the center of the $k$-th obstacle. A smaller $d_k$ indicates a higher collision risk. As shown in Figure 6, the obstacle threat cost $F_2(X_i)$ is defined as:
$$F_2(X_i) = \sum_{j=1}^{N-1} \sum_{k=1}^{K} T_k\left(\overline{P_{i,j}P_{i,j+1}}\right)$$
where the threat term $T_k$ is defined as:
$$T_k\left(\overline{P_{i,j}P_{i,j+1}}\right) = \begin{cases} 0, & \text{if } d_k > S + D + R_k \\ (S + D + R_k) - d_k, & \text{if } D + R_k < d_k \le S + D + R_k \\ \infty, & \text{if } d_k \le D + R_k \end{cases}$$
where D denotes the UAV diameter and S is a safety-margin parameter defining the obstacle risk region. The value of S can be adjusted according to environmental uncertainty and positioning accuracy. When the path remains outside the risk region, the corresponding threat cost is zero. Once the path enters the risk region, the penalty increases as the clearance decreases. If the path enters the collision zone, the threat cost becomes infinite, indicating an infeasible solution.
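The three zones described above (safe, risk, collision) can be sketched as follows. This is a minimal illustration rather than the authors' implementation. It assumes that, because obstacles are vertical cylinders, the distance $d_k$ is measured in the horizontal plane; the helper names and the numeric values are hypothetical.

```python
import math


def segment_point_distance_2d(p, q, c):
    """Shortest horizontal distance from point c to the segment p-q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: p and q coincide horizontally
        return math.hypot(c[0] - p[0], c[1] - p[1])
    # Projection parameter of c onto the segment, clamped to [0, 1].
    t = ((c[0] - p[0]) * dx + (c[1] - p[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(c[0] - (p[0] + t * dx), c[1] - (p[1] + t * dy))


def threat_term(d_k, radius, uav_diameter, safety_margin):
    """Piecewise threat T_k for one segment and one obstacle."""
    collision_limit = uav_diameter + radius
    risk_limit = safety_margin + collision_limit
    if d_k > risk_limit:
        return 0.0                 # outside the risk region
    if d_k > collision_limit:
        return risk_limit - d_k    # penalty grows as clearance shrinks
    return math.inf                # collision zone: infeasible


def obstacle_threat_cost(nodes, obstacles, uav_diameter, safety_margin):
    """Sum of T_k over all path segments and all obstacles (center, radius)."""
    total = 0.0
    for p, q in zip(nodes, nodes[1:]):
        for center, radius in obstacles:
            d_k = segment_point_distance_2d(p, q, center)
            total += threat_term(d_k, radius, uav_diameter, safety_margin)
    return total


# Hypothetical setup: one obstacle 3 units from a straight path.
path = [(0.0, 0.0, 10.0), (10.0, 0.0, 10.0)]
obstacles = [((5.0, 3.0), 1.0)]
print(obstacle_threat_cost(path, obstacles, uav_diameter=0.5, safety_margin=2.0))
```

With these values the collision limit is 1.5 and the risk limit is 3.5, so a clearance of 3.0 falls inside the risk region and yields a finite penalty.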

5.3. Navigation Height Cost

Altitude feasibility is another important constraint in path planning. A feasible trajectory should keep the UAV within an allowable flight-height range with respect to the local ground while avoiding excessive fluctuations that may reduce flight stability. As shown in Figure 7, let $z_{ij}$ denote the flight height of the $j$-th path node in candidate solution $X_i$ with respect to the local terrain. The node-wise altitude cost is defined as:
$$C_{ij}^{\text{alt}} = \begin{cases} \left| z_{ij} - \dfrac{z_{\max} + z_{\min}}{2} \right|, & \text{if } z_{ij} \ge 0 \\ \infty, & \text{otherwise} \end{cases}$$
where $z_{\min}$ and $z_{\max}$ denote the minimum and maximum allowable flight heights with respect to the ground, respectively. This formulation penalizes deviations from the middle of the admissible height band and assigns an infinite penalty when the relative flight height becomes negative. The corresponding absolute altitude used for trajectory generation is given by
$$z_{ij}^{\text{abs}} = H(x_{ij}, y_{ij}) + z_{ij}$$
where $H(x_{ij}, y_{ij})$ denotes the terrain elevation at the corresponding horizontal position. The total navigation height cost is then
$$F_3(X_i) = \sum_{j=1}^{n} C_{ij}^{\text{alt}}$$
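A minimal Python sketch of the navigation height cost follows. It reads the node-wise cost as the absolute deviation from the middle of the admissible band, with an infinite penalty for negative relative height, as the text describes. The use of an absolute value, the function names, and the numeric band limits are assumptions for illustration.

```python
import math


def node_altitude_cost(z_rel, z_min, z_max):
    """Deviation of the relative height from the middle of [z_min, z_max]."""
    if z_rel < 0:
        return math.inf  # below the terrain surface: infeasible
    return abs(z_rel - (z_max + z_min) / 2.0)


def navigation_height_cost(rel_heights, z_min, z_max):
    """Sum of the node-wise altitude costs along one path."""
    return sum(node_altitude_cost(z, z_min, z_max) for z in rel_heights)


def absolute_altitude(terrain_elevation, z_rel):
    """Absolute altitude: terrain elevation plus height above ground."""
    return terrain_elevation + z_rel


# Hypothetical band of 100 to 200 above ground, so the preferred height is 150.
print(navigation_height_cost([150.0, 170.0, 120.0], z_min=100.0, z_max=200.0))
print(absolute_altitude(terrain_elevation=35.0, z_rel=150.0))
```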

5.4. Smoothness Cost

A feasible path should not only be safe but also sufficiently smooth for practical execution. Large fluctuations in yaw angle or pitch angle may reduce flight stability, increase control difficulty, and degrade mission performance. Therefore, the smoothness cost is defined using both the horizontal yaw angle and the vertical pitch angle.
As shown in Figure 8, the horizontal yaw angle $\phi_{ij}$ is defined as the angle between two consecutive path segments projected onto the horizontal plane:
$$\phi_{ij} = \arctan\left( \frac{\left\| \overrightarrow{P_{i,j}P_{i,j+1}} \times \overrightarrow{P_{i,j+1}P_{i,j+2}} \right\|}{\overrightarrow{P_{i,j}P_{i,j+1}} \cdot \overrightarrow{P_{i,j+1}P_{i,j+2}}} \right)$$
The vertical pitch angle $\psi_{ij}$ is defined as:
$$\psi_{ij} = \arctan\left( \frac{z_{i,j+1}^{\text{abs}} - z_{ij}^{\text{abs}}}{\left\| \overrightarrow{P_{i,j}P_{i,j+1}} \right\|} \right)$$
where $z_{i,j+1}^{\text{abs}}$ and $z_{ij}^{\text{abs}}$ denote the absolute altitudes of consecutive nodes. In implementation, excessive turning and excessive changes in climb angle are penalized only when they exceed prescribed thresholds. Thus, the smoothness cost is written as
$$F_4(X_i) = \sum_{j=1}^{N-2} \left[ \max\left(0, |\phi_{ij}| - \phi_{\max}\right) + \max\left(0, |\psi_{i,j+1} - \psi_{ij}| - \psi_{\max}\right) \right]$$
where $\phi_{\max}$ and $\psi_{\max}$ are the allowable turning-angle and climb-angle-change thresholds, respectively. This formulation penalizes sharp turns and abrupt altitude changes only when they exceed the admissible limits, thereby encouraging smoother and more executable trajectories.
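The thresholded smoothness penalty can be sketched as below. Two assumptions are made and should be checked against the original formulation: the yaw angle is computed from the horizontal projections of consecutive segments using `atan2`, and the pitch angle uses the horizontal length of the segment as its denominator. Nodes are `(x, y, z)` tuples with `z` as absolute altitude; all names and values are hypothetical.

```python
import math


def yaw_angle(p0, p1, p2):
    """Turning angle between segments p0-p1 and p1-p2 in the horizontal plane."""
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    cross = ax * by - ay * bx  # z-component of the 2D cross product
    dot = ax * bx + ay * by
    return math.atan2(abs(cross), dot)


def pitch_angle(p0, p1):
    """Climb angle of segment p0-p1 (assumes horizontal length as denominator)."""
    horizontal = math.hypot(p1[0] - p0[0], p1[1] - p0[1])
    return math.atan2(p1[2] - p0[2], horizontal)


def smoothness_cost(nodes, phi_max, psi_max):
    """Penalize yaw and pitch-change magnitudes only above their thresholds."""
    total = 0.0
    for j in range(len(nodes) - 2):
        phi = yaw_angle(nodes[j], nodes[j + 1], nodes[j + 2])
        d_psi = pitch_angle(nodes[j + 1], nodes[j + 2]) - pitch_angle(nodes[j], nodes[j + 1])
        total += max(0.0, abs(phi) - phi_max) + max(0.0, abs(d_psi) - psi_max)
    return total


# Hypothetical level path with one 90-degree turn and a 45-degree yaw threshold.
path = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(smoothness_cost(path, phi_max=math.pi / 4, psi_max=math.pi / 6))
```

Here the turn exceeds the yaw threshold by 45 degrees, so the cost is approximately pi/4, while a straight level path would cost zero.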

5.5. Multi-UAV Flight Diversity Cost

In multi-UAV missions, it is not sufficient to optimize each path independently. The generated trajectories should also remain sufficiently separated in space. If multiple UAVs follow overly similar or closely overlapping paths, mission coverage may become redundant and mutual interference may increase. Therefore, an additional diversity cost is introduced to penalize excessive proximity among UAV trajectories.
Assume that the mission contains $N_u$ UAVs, and let $P_{u,j}$ denote the $j$-th node on the path of the $u$-th UAV. The distribution cost between multiple UAV paths is defined as:
$$J_{\text{dist}} = \sum_{u=1}^{N_u - 1} \sum_{v=u+1}^{N_u} \sum_{k=1}^{n_u - 1} \sum_{l=1}^{n_v - 1} \text{Cost}_{\text{dist}}(S_{u,k}, S_{v,l})$$
where $S_{u,k}$ and $S_{v,l}$ denote the $k$-th and $l$-th path segments of the $u$-th and $v$-th UAVs, respectively. The pairwise segment distance cost is defined as:
$$\text{Cost}_{\text{dist}}(S_{u,k}, S_{v,l}) = \begin{cases} \dfrac{100}{d}, & \text{if } d < 10 \\ \dfrac{1}{d}, & \text{otherwise} \end{cases}$$
where $d$ is the minimum distance between the two path segments, computed as:
$$d = \min_{s \in [0,1],\; t \in [0,1]} \left\| P_{u,k}(s) - P_{v,l}(t) \right\|$$
Finally, the total multi-UAV flight diversity cost is defined as:
$$F_5(X) = b_{\text{dist}} \, J_{\text{dist}}$$
where $b_{\text{dist}}$ is the weight factor of the diversity cost. By introducing this term, the optimization process penalizes overlapping or excessively close UAV paths and thereby reflects the cooperative requirement of multi-UAV missions more explicitly.
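The diversity cost can be sketched as follows. The sketch reads the pairwise cost as 100/d when d < 10 and 1/d otherwise, which is an interpretation of the piecewise definition and should be verified against the published equation. Returning infinity for touching segments (d = 0) is an added assumption, and all function names are hypothetical. The segment-to-segment distance takes the minimum over the four endpoint-to-segment distances and, when it lies inside both segments, the closest-approach pair of the two supporting lines.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def point_segment_distance(p, a, b):
    """Shortest distance from point p to segment a-b in 3D."""
    ab = _sub(b, a)
    denom = _dot(ab, ab)
    t = 0.0 if denom == 0.0 else max(0.0, min(1.0, _dot(_sub(p, a), ab) / denom))
    closest = tuple(a[i] + t * ab[i] for i in range(3))
    return math.dist(p, closest)


def segment_segment_distance(a0, a1, b0, b1):
    """Minimum distance between segments a0-a1 and b0-b1."""
    best = min(
        point_segment_distance(a0, b0, b1),
        point_segment_distance(a1, b0, b1),
        point_segment_distance(b0, a0, a1),
        point_segment_distance(b1, a0, a1),
    )
    u, v, w = _sub(a1, a0), _sub(b1, b0), _sub(a0, b0)
    a, b, c = _dot(u, u), _dot(u, v), _dot(v, v)
    d, e = _dot(u, w), _dot(v, w)
    det = a * c - b * b
    if det > 1e-12:  # non-parallel: check the interior critical point
        s = (b * e - c * d) / det
        t = (a * e - b * d) / det
        if 0.0 <= s <= 1.0 and 0.0 <= t <= 1.0:
            pa = tuple(a0[i] + s * u[i] for i in range(3))
            pb = tuple(b0[i] + t * v[i] for i in range(3))
            best = min(best, math.dist(pa, pb))
    return best


def pair_cost(d):
    """Assumed reading of the piecewise cost: 100/d if d < 10, else 1/d."""
    if d == 0.0:
        return math.inf  # assumption: touching segments are infeasible
    return 100.0 / d if d < 10.0 else 1.0 / d


def diversity_cost(paths, b_dist):
    """b_dist times the sum of pair costs over all segment pairs of all UAV pairs."""
    j_dist = 0.0
    for u in range(len(paths) - 1):
        for v in range(u + 1, len(paths)):
            for a0, a1 in zip(paths[u], paths[u][1:]):
                for b0, b1 in zip(paths[v], paths[v][1:]):
                    j_dist += pair_cost(segment_segment_distance(a0, a1, b0, b1))
    return b_dist * j_dist


# Two hypothetical single-segment paths running in parallel, 5 units apart.
paths = [[(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)], [(0.0, 5.0, 0.0), (10.0, 5.0, 0.0)]]
print(diversity_cost(paths, b_dist=1.0))
```

Note that the quadruple sum scales with the product of the segment counts for every UAV pair, so the cost of evaluating this term grows quickly with path resolution and team size.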
Overall, the optimization objective considered in this work is to obtain a set of feasible and spatially differentiated UAV trajectories by minimizing the weighted cost in Equation (19). The first four cost terms evaluate path efficiency, safety, altitude feasibility, and smoothness, whereas the fifth term models the coordination requirement among multiple UAVs through path diversity.

6. Experimental Results and Analysis

To validate the effectiveness of QHHO_GA, two groups of experiments were conducted. The first group evaluates the general optimization capability of the proposed algorithm on the CEC2017 benchmark set, while the second group examines its performance in a multi-UAV path planning task under complex three-dimensional terrain scenarios. In this way, the proposed method is assessed from both the optimization perspective and the application perspective.

6.1. CEC2017 Benchmark Comparison

For the benchmark evaluation, the CEC2017 test set containing 30 functions was adopted. These functions include unimodal, simple multimodal, hybrid, and composition functions, thus covering different levels of search difficulty. All experiments were run independently 30 times in MATLAB R2023a with a population size of 30 and a maximum of 300,000 function evaluations. The mean and standard deviation were used to measure performance stability, and the Wilcoxon signed-rank test with a significance level of 0.05 was used to evaluate statistical significance. In the following analysis, the symbols “+/=/−” indicate whether QHHO_GA performs better than, equal to, or worse than the comparison algorithms, respectively.
The algorithms compared with QHHO_GA on the CEC2017 benchmark include PSO, Aging Leader and Challengers Particle Swarm Optimization (ALCPSO) [44], Differential Evolution (DE) [45], HHO, Whale Optimization Algorithm (WOA) [46], Random Walk Grey Wolf Optimizer (RWGWO) [47], Cooperative Dynamic Learning Opposition-Based Bat Algorithm (CDLOBA) [48], Differential Fruit Fly Optimization Algorithm (DFOA) [49], Opposition-Based Sine Cosine Algorithm (OBSCA) [50], and Rule-Based Cultural Algorithm (RCBA) [51]. These algorithms were selected because they represent a range of classical and improved population-based optimization strategies, including particle-based search, differential evolution, hawk-inspired search, whale-inspired search, grey-wolf-based search, bat-inspired learning, fruit-fly optimization, sine-cosine optimization, and rule-based hybrid optimization. Therefore, they provide representative baselines for evaluating the optimization performance of QHHO_GA on benchmark functions.
As shown in Figure 9, the benchmark results are visualized through a rank heatmap of the top-10 algorithms on the CEC2017 test suite. For readability, only the ten highest-ranked methods according to the overall ranking are displayed. Darker cells indicate better rankings, and the black horizontal separators divide the benchmark into four function families, namely unimodal functions (F1–F3), multimodal functions (F4–F10), hybrid functions (F11–F20), and composition functions (F21–F30). The heatmap shows that QHHO_GA maintains consistently competitive rankings across a wide range of benchmark functions.
The convergence curves shown in Figure 10 provide additional evidence that the proposed algorithm is especially effective on hybrid and composition functions, where the search landscape is more complex and the risk of premature convergence is higher.
More specifically, QHHO_GA ranks among the top two on 18 out of the 30 CEC2017 benchmark functions when tied positions are taken into account. It ranks first on F3, F10, F13, F18, F19, F28, and F30, and ranks second on F1, F2, F6, F8, F11, F12, F15, F21, and F26. From the perspective of function categories, the proposed method remains highly competitive on the unimodal group and shows particularly strong robustness on the hybrid and composition groups, where population diversity maintenance and stagnation recovery become more important. Such results suggest that the combination of population enhancement, adaptive search, stagnation monitoring, and RRT-based stagnation adjustment enables QHHO_GA to maintain a strong balance between exploration and exploitation in complex optimization landscapes.
The benchmark results also reveal a clear competitive structure among the compared algorithms. RWGWO and DE are the strongest competitors to QHHO_GA in terms of overall ranking, whereas DFOA and WOA show relatively weak robustness on the CEC2017 benchmark. This indicates that the superiority of QHHO_GA is established not only against weaker baselines, but also against strong competitors from different metaheuristic families.
It should also be noted that QHHO_GA is not ranked first on every benchmark function. Its relatively weaker rankings mainly appear on F4, F5, F7, F9, F14, F16, F17, F20, F22, F23, F24, F25, F27, and F29, where it is usually outperformed by DE, RWGWO, or ALCPSO. A cautious interpretation is that these functions may favor more direct exploitation dynamics or more stable local refinement patterns, whereas the diversity-preserving and stagnation-recovery mechanisms of QHHO_GA become more advantageous when the search landscape is more deceptive, composite, or prone to premature convergence. Therefore, the proposed method should be understood as a robust overall optimizer rather than as the best-performing method on every individual function.
Table A1 and Table A2 in Appendix A report the detailed mean and standard deviation results for all compared algorithms. Table 1 lists the final ranking results of QHHO_GA and the comparison algorithms and further confirms the competitiveness of the proposed method on the CEC2017 benchmark set, especially on hybrid and composition functions.

6.2. UAV Experimental Comparison

To further evaluate the practical applicability of QHHO_GA, comparative experiments were conducted on pre-generated three-dimensional hilly terrain maps for multi-UAV path planning. In addition to several baseline algorithms already used in the CEC2017 comparison, this experiment also includes Random Drift Whale Optimization Algorithm (RDWOA) [52], Gaussian Barebone Harris Hawks Optimizer (GBHHO) [53], Enhanced Salp Swarm Algorithm (ESSA) [54,55], RIME Algorithm [56], MCHHO Algorithm [57], Levy-flight Whale Optimization Algorithm (LWOA) [58], and Slime Mould Algorithm (SMA) [59]. These algorithms were selected because they provide representative search strategies for complex path planning scenarios, including adaptive whale optimization, hybrid HHO variants, improved salp-swarm-based search, chaos-enhanced exploration, Levy-flight-based search, and slime-mould-inspired adaptive optimization. Together with OBSCA, ALCPSO, PSO, and WOA, these algorithms serve as the main comparative baselines for the UAV path planning task.
Representative path visualizations are provided for nine UAV scenarios, whereas the quantitative statistical comparison in Table 2 is reported for seven representative scenarios due to space limitations. This design allows both visual inspection of trajectory behaviors and compact quantitative comparison of algorithmic performance.
Table 2 reports the final path-planning cost statistics of all algorithms on seven representative UAV scenarios, together with their average rank and total rank.
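To make the notions of average rank and total rank concrete, the following Python sketch ranks algorithms per scenario by mean cost, averages those ranks, and then ranks the averages. The tie-handling rule (tied entries share the lowest rank) and the cost values are assumptions for illustration; the paper does not specify its exact ranking procedure in this section, and the numbers below are not taken from Table 2.

```python
def competition_ranks(scores):
    """Rank entries by score (1 = lowest); tied entries share the lowest rank."""
    ordered = sorted(scores.values())
    return {name: ordered.index(value) + 1 for name, value in scores.items()}


def average_ranks(cost_table):
    """cost_table maps scenario -> {algorithm: mean cost}; returns mean rank per algorithm."""
    totals = {}
    for costs in cost_table.values():
        for name, rank in competition_ranks(costs).items():
            totals[name] = totals.get(name, 0) + rank
    n_scenarios = len(cost_table)
    return {name: total / n_scenarios for name, total in totals.items()}


# Hypothetical mean costs for three algorithms on three scenarios.
cost_table = {
    "s1": {"A": 10.0, "B": 12.0, "C": 15.0},
    "s2": {"A": 20.0, "B": 18.0, "C": 25.0},
    "s3": {"A": 30.0, "B": 30.0, "C": 28.0},
}
avg = average_ranks(cost_table)
total = competition_ranks(avg)  # total rank = rank of the average ranks
print(avg)
print(total)
```

In this toy example A and B tie on average rank and share total rank 1, which shows why the tie-handling convention should be stated alongside any reported ranking.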
The UAV experimental results show that QHHO_GA achieves an average rank of 3.44 and the best total rank among all 12 methods, indicating strong robustness across different terrain scenarios. As shown in Table 2, the proposed algorithm maintains competitive performance not only in terms of final cost, but also in terms of consistency across multiple UAV test cases. Compared with the strongest competing methods, such as RDWOA and GBHHO, QHHO_GA shows better overall stability, while algorithms such as PSO, ALCPSO, and SMA exhibit substantially weaker robustness in complex three-dimensional environments with multiple obstacles. These results suggest that the hybrid design of QHHO_GA is effective in balancing global exploration, local refinement, and path diversity under constrained multi-UAV path planning settings.
The numerical results further indicate that the main strength of QHHO_GA lies in its overall robustness rather than in dominating every individual scenario. In relatively simple environments, several comparison algorithms can occasionally achieve lower mean costs than QHHO_GA. However, as terrain complexity increases, many competing methods exhibit substantial cost degradation, weaker stability, or more homogeneous trajectory patterns, whereas QHHO_GA remains within a consistently competitive performance range. This explains why the proposed method achieves the best total rank even though it is not the best-performing algorithm on every single test case.
To further visualize the path-planning behavior of the compared algorithms, representative path plots are presented in Figures 11–22. These figures show the three-dimensional views and, where applicable, top views of the generated UAV trajectories under different terrain configurations. Overall, the visual results are consistent with the statistical comparison: in relatively simple scenarios, QHHO_GA remains competitive, while in scenarios with denser obstacles and stronger path interaction requirements, its advantage becomes more evident.
For the relatively simple scenarios represented by f1 and f2 (Figure 11 and Figure 12), QHHO_GA is able to generate feasible and relatively diverse paths while maintaining good path efficiency. In these cases, the shortest-path objective still plays a dominant role, and the performance differences among the strongest methods are not always large. Nevertheless, the comparison plots show that QHHO_GA can still provide more balanced path distributions. In contrast, the paths generated by SMA are vertically duplicated, which introduces unnecessary steering and reduces resource efficiency. In the f2 scenario, PSO requires a pronounced detour, which substantially increases path length, whereas QHHO_GA reaches the target region with a more efficient and better-balanced set of trajectories.
In the scenario of f3 (Figure 13), three obstacles are located directly in the main flight region. Both QHHO_GA and LWOA are able to guide the UAVs around these obstacles; however, QHHO_GA achieves a lower overall path cost by preserving obstacle avoidance performance while still generating more spatially differentiated trajectories. This result suggests that the proposed framework can maintain path diversity without excessively sacrificing path efficiency.
In the scenario of f4 (Figure 14), where four obstacles are distributed inside the terrain, QHHO_GA still produces relatively efficient and well-separated paths. By comparison, the paths generated by GBHHO are more homogeneous and show weaker spatial differentiation. This observation is consistent with the design objective of the proposed method, which explicitly encourages path diversity in multi-UAV missions rather than allowing the trajectories to collapse into highly similar solutions.
The difference becomes more evident in the scenario of f5 (Figure 15), which contains five obstacles and therefore imposes a more complex path interaction structure. In this case, QHHO_GA generates multiple optimized paths from different directions, whereas OBSCA mainly produces a simpler bypass pattern from one side of the terrain. Such behavior indicates that QHHO_GA is more capable of finding alternative feasible routes and is therefore more consistent with the practical requirement of multi-UAV cooperative path planning, where excessive path overlap should be avoided whenever possible.
As the obstacle density increases further, the advantage of QHHO_GA becomes more pronounced. This tendency is already reflected in the quantitative results for f6 and can be seen more clearly in the visual comparison for f7 (Figure 18 and Figure 19). In the seven-obstacle scenario, QHHO_GA enables the five UAVs to identify five feasible paths from three different directions while successfully avoiding obstacles. By contrast, MCHHO tends to generate five highly similar paths from essentially one direction, and one UAV may even fail to obtain a satisfactory obstacle-avoiding solution within the allowed iterations. These results suggest that the proposed framework is more effective in preserving both feasibility and trajectory diversity under increasingly constrained conditions.
A similar pattern is observed in the scenario of f8 (Figure 20 and Figure 21), where obstacles are distributed in a more complex and irregular manner. In this case, QHHO_GA finds five largely distinct paths that start from four directions, with only limited similarity between a small number of trajectories. RDWOA also performs reasonably well in terms of obstacle avoidance and path length; however, its solutions are less diverse overall, with four trajectories being nearly identical and only two main search directions being explored. From the perspective of multi-UAV mission planning, such a path distribution is less desirable than the more spatially differentiated solutions generated by QHHO_GA.
Finally, in the highly constrained scenario of f9 (Figure 22), the terrain contains ten obstacles and imposes the most challenging path planning conditions among the tested visualization cases. Even in this environment, QHHO_GA is still able to generate multiple diverse paths that follow obstacle boundaries accurately while maintaining a relatively low total path cost. Although RDWOA also shows a certain degree of trajectory diversity, its solutions become overly dispersed, resulting in a poorer balance between path cost, altitude cost, and obstacle avoidance. In some cases, this excessive diversity even leads to collisions or less efficient routes. By contrast, QHHO_GA maintains a more effective compromise between diversity and path quality.
In summary, the scenario-based visualization results reinforce the statistical findings reported in Table 2. QHHO_GA performs competitively in simple terrain and shows increasingly clear advantages as obstacle density and path interaction complexity increase. Its main strength lies not only in generating feasible trajectories, but also in producing spatially differentiated cooperative paths while maintaining reasonable path cost. This property is particularly important for practical multi-UAV missions, where path overlap, redundancy, and insufficient coverage may significantly reduce cooperative effectiveness.

7. Conclusions

This paper proposed a hybrid heuristic optimization algorithm, QHHO_GA, and applied it to the multi-UAV path planning problem in complex environments. The proposed framework integrates GA-based population enhancement, Q-learning-guided HHO adaptive search, stagnation monitoring and adaptive parameter regulation, prioritized experience replay, and RRT-based stagnation adjustment into a unified optimization process. Through this design, the algorithm aims to improve search diversity, strengthen the balance between exploration and exploitation, and enhance the ability to escape from locally trapped regions.
The experimental results demonstrate the effectiveness of the proposed method from both the optimization perspective and the application perspective. On the CEC2017 benchmark set, QHHO_GA ranked among the top two on 18 out of 30 test functions, indicating strong competitiveness across different categories of optimization landscapes. In the multi-UAV path planning experiments, QHHO_GA achieved an average ranking of 3.44 and ranked first overall among 12 classical and improved algorithms, showing strong robustness and good solution quality in complex terrain scenarios. In particular, the proposed method demonstrated clear advantages in environments with dense obstacles and complex path interactions, where both search diversity and the ability to avoid premature convergence are critical.
The main strength of QHHO_GA lies in its coordinated hybrid design. GA improves population quality and diversity, HHO provides the core search capability, Q-learning enhances the adaptivity of action selection, and the combination of stagnation monitoring and adaptive parameter regulation with RRT-based stagnation adjustment helps the algorithm restore diversity when progress becomes insufficient. As a result, the proposed framework is able to maintain stable performance in high-dimensional and strongly constrained optimization tasks. In the multi-UAV application considered in this work, the method not only improves path quality and obstacle avoidance performance, but also better supports the generation of spatially differentiated trajectories, which is important for cooperative multi-UAV missions.
Nevertheless, this study still has several limitations. First, the effectiveness of QHHO_GA has been validated only in simulation environments, and no real-world flight experiments have been conducted. Therefore, the current results do not fully reflect the uncertainties, sensor noise, communication delays, and dynamic disturbances that may arise in practical UAV operations. Second, the proposed method has been evaluated on pre-generated terrain and obstacle settings, which, although representative, cannot cover all possible real-world environmental variations. Third, like many population-based hybrid optimization methods, the computational cost of the framework may become a challenge in scenarios with stricter real-time requirements or larger-scale UAV teams.
Future work will therefore focus on addressing these limitations from both the algorithmic and application levels. On the one hand, real-world flight tests will be conducted to further verify the practicality and robustness of QHHO_GA under realistic sensing, communication, and environmental uncertainties. On the other hand, the framework will be extended to improve real-time responsiveness and adaptability in dynamic environments, for example, through tighter integration with online learning strategies, more efficient parameter control, and multi-sensor information fusion. In addition, reducing computational overhead and improving energy efficiency will be important directions for making the proposed method more suitable for lightweight UAV platforms and large-scale cooperative missions.

Author Contributions

Conceptualization, R.L.; Methodology, R.L.; Software, R.L. and Z.X.; Validation, R.L., Z.X. and H.H.; Investigation, R.L.; Writing—original draft, R.L. and H.H.; Writing—review & editing, H.H. and Z.Z.; Supervision, Z.Z.; Project administration, Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. The APC was paid by the authors.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Comparison with the statistical results of the improved algorithms on the benchmark function test set (1).
| Algorithm | F1 mean | F1 std | F2 mean | F2 std | F3 mean | F3 std | F4 mean | F4 std | F5 mean | F5 std |
|---|---|---|---|---|---|---|---|---|---|---|
| QHHO_GA | 4.64 × 10^3 | 4.11 × 10^3 | 3.62 × 10^2 | 5.37 × 10^2 | 3.00 × 10^2 | 5.54 × 10^-6 | 4.66 × 10^2 | 3.87 × 10^1 | 6.07 × 10^2 | 2.70 × 10^1 |
| ALCPSO | 9.29 × 10^3 | 7.52 × 10^3 | 7.56 × 10^16 | 3.96 × 10^17 | 2.59 × 10^4 | 4.43 × 10^3 | 4.89 × 10^2 | 3.39 × 10^1 | 5.99 × 10^2 | 2.27 × 10^1 |
| CDLOBA | 5.30 × 10^3 | 6.72 × 10^3 | 4.15 × 10^14 | 1.97 × 10^15 | 9.39 × 10^2 | 1.46 × 10^3 | 4.57 × 10^2 | 4.15 × 10^1 | 8.58 × 10^2 | 7.26 × 10^1 |
| DE | 1.84 × 10^3 | 3.69 × 10^3 | 3.07 × 10^21 | 8.62 × 10^21 | 1.98 × 10^4 | 4.11 × 10^3 | 4.90 × 10^2 | 9.48 × 10^0 | 6.09 × 10^2 | 8.82 × 10^0 |
| DFOA | 8.46 × 10^10 | 3.78 × 10^8 | 1.37 × 10^61 | 6.56 × 10^60 | 1.08 × 10^9 | 5.53 × 10^7 | 2.28 × 10^4 | 5.85 × 10^2 | 1.03 × 10^3 | 1.44 × 10^1 |
| HHO | 1.10 × 10^7 | 2.59 × 10^6 | 1.87 × 10^13 | 3.68 × 10^13 | 7.95 × 10^3 | 2.78 × 10^3 | 5.25 × 10^2 | 2.49 × 10^1 | 7.33 × 10^2 | 2.68 × 10^1 |
| OBSCA | 1.69 × 10^10 | 2.34 × 10^9 | 1.44 × 10^35 | 3.54 × 10^35 | 6.02 × 10^4 | 7.49 × 10^3 | 2.67 × 10^3 | 8.34 × 10^2 | 8.14 × 10^2 | 2.14 × 10^1 |
| PSO | 1.32 × 10^8 | 1.44 × 10^7 | 3.03 × 10^13 | 3.07 × 10^13 | 6.54 × 10^2 | 3.95 × 10^1 | 4.53 × 10^2 | 3.08 × 10^1 | 7.48 × 10^2 | 3.23 × 10^1 |
| RCBA | 2.07 × 10^4 | 8.56 × 10^3 | 2.00 × 10^2 | 2.01 × 10^-3 | 3.01 × 10^2 | 3.64 × 10^-1 | 4.93 × 10^2 | 3.34 × 10^1 | 8.28 × 10^2 | 6.06 × 10^1 |
| RWGWO | 2.47 × 10^4 | 1.19 × 10^4 | 2.23 × 10^7 | 3.78 × 10^7 | 3.34 × 10^2 | 3.06 × 10^1 | 4.89 × 10^2 | 2.19 × 10^1 | 5.56 × 10^2 | 1.26 × 10^1 |
| WOA | 3.13 × 10^6 | 2.53 × 10^6 | 1.10 × 10^24 | 4.14 × 10^24 | 1.63 × 10^5 | 6.86 × 10^4 | 5.39 × 10^2 | 3.93 × 10^1 | 7.79 × 10^2 | 5.87 × 10^1 |
| Algorithm | F6 mean | F6 std | F7 mean | F7 std | F8 mean | F8 std | F9 mean | F9 std | F10 mean | F10 std |
|---|---|---|---|---|---|---|---|---|---|---|
| QHHO_GA | 6.00 × 10^2 | 3.09 × 10^-2 | 8.78 × 10^2 | 3.59 × 10^1 | 8.97 × 10^2 | 2.25 × 10^1 | 1.93 × 10^3 | 5.44 × 10^2 | 3.44 × 10^3 | 3.66 × 10^2 |
| ALCPSO | 6.05 × 10^2 | 6.29 × 10^0 | 8.54 × 10^2 | 4.41 × 10^1 | 9.07 × 10^2 | 2.78 × 10^1 | 2.05 × 10^3 | 8.25 × 10^2 | 4.47 × 10^3 | 5.17 × 10^2 |
| CDLOBA | 6.72 × 10^2 | 8.52 × 10^0 | 2.72 × 10^3 | 2.92 × 10^2 | 1.09 × 10^3 | 6.05 × 10^1 | 9.08 × 10^3 | 2.21 × 10^3 | 5.48 × 10^3 | 6.26 × 10^2 |
| DE | 6.00 × 10^2 | 0.00 × 10^0 | 8.43 × 10^2 | 1.13 × 10^1 | 9.09 × 10^2 | 7.87 × 10^0 | 9.00 × 10^2 | 1.03 × 10^-13 | 5.87 × 10^3 | 2.62 × 10^2 |
| DFOA | 7.10 × 10^2 | 3.92 × 10^0 | 1.59 × 10^3 | 2.08 × 10^1 | 1.23 × 10^3 | 6.61 × 10^0 | 1.75 × 10^4 | 1.93 × 10^3 | 1.08 × 10^4 | 1.82 × 10^2 |
| HHO | 6.63 × 10^2 | 5.72 × 10^0 | 1.24 × 10^3 | 8.12 × 10^1 | 9.66 × 10^2 | 2.70 × 10^1 | 6.49 × 10^3 | 6.20 × 10^2 | 5.50 × 10^3 | 6.10 × 10^2 |
| OBSCA | 6.55 × 10^2 | 5.12 × 10^0 | 1.17 × 10^3 | 5.00 × 10^1 | 1.07 × 10^3 | 1.40 × 10^1 | 7.06 × 10^3 | 9.83 × 10^2 | 7.27 × 10^3 | 3.49 × 10^2 |
| PSO | 6.49 × 10^2 | 1.40 × 10^1 | 9.17 × 10^2 | 1.84 × 10^1 | 9.91 × 10^2 | 2.72 × 10^1 | 4.50 × 10^3 | 2.53 × 10^3 | 6.21 × 10^3 | 6.48 × 10^2 |
| RCBA | 6.73 × 10^2 | 8.65 × 10^0 | 1.76 × 10^3 | 1.75 × 10^2 | 1.05 × 10^3 | 5.49 × 10^1 | 8.06 × 10^3 | 2.40 × 10^3 | 5.83 × 10^3 | 8.43 × 10^2 |
| RWGWO | 6.00 × 10^2 | 1.91 × 10^-1 | 7.91 × 10^2 | 1.34 × 10^1 | 8.51 × 10^2 | 1.46 × 10^1 | 9.36 × 10^2 | 1.08 × 10^2 | 3.71 × 10^3 | 4.41 × 10^2 |
| WOA | 6.71 × 10^2 | 1.24 × 10^1 | 1.25 × 10^3 | 9.37 × 10^1 | 1.01 × 10^3 | 5.23 × 10^1 | 7.90 × 10^3 | 2.43 × 10^3 | 6.05 × 10^3 | 7.66 × 10^2 |
| Algorithm | F11 mean | F11 std | F12 mean | F12 std | F13 mean | F13 std | F14 mean | F14 std | F15 mean | F15 std |
|---|---|---|---|---|---|---|---|---|---|---|
| QHHO_GA | 1.17 × 10^3 | 3.24 × 10^1 | 8.55 × 10^5 | 5.61 × 10^5 | 1.24 × 10^4 | 9.45 × 10^3 | 1.11 × 10^4 | 1.09 × 10^4 | 6.57 × 10^3 | 5.84 × 10^3 |
| ALCPSO | 1.24 × 10^3 | 6.33 × 10^1 | 9.08 × 10^5 | 1.98 × 10^6 | 2.61 × 10^4 | 2.04 × 10^4 | 3.37 × 10^4 | 8.59 × 10^4 | 1.38 × 10^4 | 1.34 × 10^4 |
| CDLOBA | 1.31 × 10^3 | 5.89 × 10^1 | 2.78 × 10^5 | 2.43 × 10^5 | 1.57 × 10^5 | 9.30 × 10^4 | 5.92 × 10^3 | 4.47 × 10^3 | 7.60 × 10^4 | 4.38 × 10^4 |
| DE | 1.15 × 10^3 | 2.86 × 10^1 | 1.34 × 10^6 | 6.72 × 10^5 | 3.35 × 10^4 | 2.27 × 10^4 | 4.72 × 10^4 | 4.07 × 10^4 | 5.87 × 10^3 | 2.56 × 10^3 |
| DFOA | 5.94 × 10^8 | 1.05 × 10^8 | 2.56 × 10^10 | 6.11 × 10^8 | 3.04 × 10^10 | 2.31 × 10^9 | 1.21 × 10^9 | 6.55 × 10^7 | 4.15 × 10^9 | 4.44 × 10^8 |
| HHO | 1.26 × 10^3 | 3.94 × 10^1 | 1.04 × 10^7 | 6.81 × 10^6 | 5.43 × 10^5 | 5.66 × 10^5 | 6.01 × 10^4 | 8.66 × 10^4 | 5.66 × 10^4 | 3.09 × 10^4 |
| OBSCA | 2.86 × 10^3 | 6.28 × 10^2 | 1.86 × 10^9 | 5.40 × 10^8 | 7.49 × 10^8 | 3.79 × 10^8 | 2.67 × 10^5 | 1.44 × 10^5 | 1.11 × 10^7 | 1.24 × 10^7 |
| PSO | 1.30 × 10^3 | 4.90 × 10^1 | 2.22 × 10^7 | 9.20 × 10^6 | 4.73 × 10^6 | 1.46 × 10^6 | 8.54 × 10^3 | 4.12 × 10^3 | 4.99 × 10^5 | 1.98 × 10^5 |
| RCBA | 1.30 × 10^3 | 7.46 × 10^1 | 2.61 × 10^6 | 1.21 × 10^6 | 1.11 × 10^5 | 8.36 × 10^4 | 7.60 × 10^3 | 4.55 × 10^3 | 5.52 × 10^4 | 4.17 × 10^4 |
| RWGWO | 1.19 × 10^3 | 3.56 × 10^1 | 1.98 × 10^6 | 1.57 × 10^6 | 7.77 × 10^4 | 5.20 × 10^4 | 2.23 × 10^4 | 1.89 × 10^4 | 5.05 × 10^4 | 2.92 × 10^4 |
| WOA | 1.62 × 10^3 | 5.10 × 10^2 | 4.49 × 10^7 | 3.91 × 10^7 | 1.37 × 10^5 | 7.06 × 10^4 | 6.57 × 10^5 | 8.74 × 10^5 | 6.92 × 10^4 | 3.82 × 10^4 |
Table A2. Comparison with the statistical results of the improved algorithms on the benchmark function test set (2).
| Algorithm | F16 mean | F16 std | F17 mean | F17 std | F18 mean | F18 std | F19 mean | F19 std | F20 mean | F20 std |
|---|---|---|---|---|---|---|---|---|---|---|
| QHHO_GA | 2.69 × 10^3 | 3.00 × 10^2 | 2.11 × 10^3 | 1.99 × 10^2 | 6.04 × 10^4 | 4.53 × 10^4 | 5.70 × 10^3 | 3.42 × 10^3 | 2.37 × 10^3 | 1.68 × 10^2 |
| ALCPSO | 2.63 × 10^3 | 2.24 × 10^2 | 2.16 × 10^3 | 2.13 × 10^2 | 1.60 × 10^5 | 2.18 × 10^5 | 1.36 × 10^4 | 1.34 × 10^4 | 2.37 × 10^3 | 1.42 × 10^2 |
| CDLOBA | 3.44 × 10^3 | 3.78 × 10^2 | 2.85 × 10^3 | 3.72 × 10^2 | 1.09 × 10^5 | 7.05 × 10^4 | 6.19 × 10^4 | 2.34 × 10^4 | 2.91 × 10^3 | 1.87 × 10^2 |
| DE | 2.08 × 10^3 | 1.56 × 10^2 | 1.85 × 10^3 | 5.40 × 10^1 | 2.93 × 10^5 | 1.21 × 10^5 | 8.82 × 10^3 | 5.18 × 10^3 | 2.14 × 10^3 | 7.82 × 10^1 |
| DFOA | 2.73 × 10^4 | 2.14 × 10^2 | 1.95 × 10^5 | 2.49 × 10^4 | 4.52 × 10^9 | 3.22 × 10^8 | 3.14 × 10^9 | 3.18 × 10^8 | 3.68 × 10^3 | 9.71 × 10^1 |
| HHO | 3.24 × 10^3 | 4.65 × 10^2 | 2.58 × 10^3 | 3.52 × 10^2 | 1.32 × 10^6 | 1.67 × 10^6 | 3.68 × 10^5 | 2.43 × 10^5 | 2.73 × 10^3 | 2.04 × 10^2 |
| OBSCA | 3.85 × 10^3 | 2.39 × 10^2 | 2.55 × 10^3 | 2.12 × 10^2 | 5.03 × 10^6 | 3.84 × 10^6 | 3.75 × 10^7 | 2.37 × 10^7 | 2.64 × 10^3 | 1.10 × 10^2 |
| PSO | 2.88 × 10^3 | 2.87 × 10^2 | 2.31 × 10^3 | 2.32 × 10^2 | 1.67 × 10^5 | 1.06 × 10^5 | 1.64 × 10^6 | 5.73 × 10^5 | 2.60 × 10^3 | 1.74 × 10^2 |
| RCBA | 3.35 × 10^3 | 3.97 × 10^2 | 2.90 × 10^3 | 3.34 × 10^2 | 1.94 × 10^5 | 1.35 × 10^5 | 1.26 × 10^4 | 7.17 × 10^3 | 2.93 × 10^3 | 2.61 × 10^2 |
| RWGWO | 2.23 × 10^3 | 2.14 × 10^2 | 1.91 × 10^3 | 9.40 × 10^1 | 2.97 × 10^5 | 1.94 × 10^5 | 2.66 × 10^4 | 1.58 × 10^4 | 2.28 × 10^3 | 1.20 × 10^2 |
| WOA | 3.48 × 10^3 | 4.11 × 10^2 | 2.57 × 10^3 | 2.74 × 10^2 | 1.54 × 10^6 | 1.76 × 10^6 | 2.52 × 10^6 | 1.75 × 10^6 | 2.70 × 10^3 | 1.90 × 10^2 |
| Algorithm | F21 mean | F21 std | F22 mean | F22 std | F23 mean | F23 std | F24 mean | F24 std | F25 mean | F25 std |
|---|---|---|---|---|---|---|---|---|---|---|
| QHHO_GA | 2.41 × 10^3 | 2.17 × 10^1 | 4.55 × 10^3 | 1.45 × 10^3 | 2.78 × 10^3 | 3.29 × 10^1 | 3.07 × 10^3 | 6.93 × 10^1 | 2.90 × 10^3 | 2.00 × 10^1 |
| ALCPSO | 2.41 × 10^3 | 2.88 × 10^1 | 4.85 × 10^3 | 1.67 × 10^3 | 2.80 × 10^3 | 5.04 × 10^1 | 2.98 × 10^3 | 4.74 × 10^1 | 2.89 × 10^3 | 1.07 × 10^1 |
| CDLOBA | 2.64 × 10^3 | 6.55 × 10^1 | 7.20 × 10^3 | 1.41 × 10^3 | 3.22 × 10^3 | 1.14 × 10^2 | 3.33 × 10^3 | 1.11 × 10^2 | 2.92 × 10^3 | 3.04 × 10^1 |
| DE | 2.41 × 10^3 | 1.06 × 10^1 | 3.99 × 10^3 | 1.91 × 10^3 | 2.76 × 10^3 | 1.03 × 10^1 | 2.96 × 10^3 | 1.39 × 10^1 | 2.89 × 10^3 | 5.08 × 10^1 |
| DFOA | 3.08 × 10^3 | 2.45 × 10^1 | 1.16 × 10^4 | 3.30 × 10^2 | 5.66 × 10^3 | 1.17 × 10^2 | 5.18 × 10^3 | 1.85 × 10^1 | 6.94 × 10^3 | 2.31 × 10^2 |
| HHO | 2.54 × 10^3 | 4.39 × 10^1 | 6.81 × 10^3 | 1.42 × 10^3 | 3.13 × 10^3 | 1.03 × 10^2 | 3.43 × 10^3 | 1.52 × 10^2 | 2.91 × 10^3 | 2.06 × 10^1 |
| OBSCA | 2.46 × 10^3 | 9.28 × 10^1 | 4.13 × 10^3 | 3.55 × 10^2 | 3.02 × 10^3 | 3.44 × 10^1 | 3.19 × 10^3 | 3.17 × 10^1 | 3.35 × 10^3 | 1.31 × 10^2 |
| PSO | 2.55 × 10^3 | 3.49 × 10^1 | 5.39 × 10^3 | 2.58 × 10^3 | 3.13 × 10^3 | 1.82 × 10^2 | 3.23 × 10^3 | 1.43 × 10^2 | 2.91 × 10^3 | 2.51 × 10^1 |
| RCBA | 2.64 × 10^3 | 8.64 × 10^1 | 7.09 × 10^3 | 1.12 × 10^3 | 3.34 × 10^3 | 1.45 × 10^2 | 3.40 × 10^3 | 1.39 × 10^2 | 2.90 × 10^3 | 2.23 × 10^1 |
| RWGWO | 2.35 × 10^3 | 1.26 × 10^1 | 3.96 × 10^3 | 1.44 × 10^3 | 2.71 × 10^3 | 1.46 × 10^1 | 2.89 × 10^3 | 1.76 × 10^1 | 2.89 × 10^3 | 1.71 × 10^0 |
| WOA | 2.57 × 10^3 | 7.03 × 10^1 | 6.77 × 10^3 | 1.82 × 10^3 | 3.06 × 10^3 | 1.17 × 10^2 | 3.17 × 10^3 | 8.69 × 10^1 | 2.95 × 10^3 | 2.83 × 10^1 |

| Algorithm | F26 mean | F26 std | F27 mean | F27 std | F28 mean | F28 std | F29 mean | F29 std | F30 mean | F30 std |
|---|---|---|---|---|---|---|---|---|---|---|
| QHHO_GA | 4.50 × 10^3 | 1.27 × 10^3 | 3.25 × 10^3 | 2.29 × 10^1 | 3.15 × 10^3 | 4.51 × 10^1 | 3.71 × 10^3 | 1.87 × 10^2 | 9.15 × 10^3 | 3.44 × 10^3 |
| ALCPSO | 4.85 × 10^3 | 8.02 × 10^2 | 3.25 × 10^3 | 2.43 × 10^1 | 3.23 × 10^3 | 3.95 × 10^1 | 3.87 × 10^3 | 1.45 × 10^2 | 2.63 × 10^4 | 5.64 × 10^4 |
| CDLOBA | 9.90 × 10^3 | 2.16 × 10^3 | 3.49 × 10^3 | 1.82 × 10^2 | 3.19 × 10^3 | 6.10 × 10^1 | 5.30 × 10^3 | 6.70 × 10^2 | 2.18 × 10^5 | 2.01 × 10^5 |
| DE | 4.63 × 10^3 | 8.76 × 10^1 | 3.20 × 10^3 | 3.99 × 10^0 | 3.19 × 10^3 | 4.16 × 10^1 | 3.52 × 10^3 | 7.73 × 10^1 | 1.31 × 10^4 | 3.09 × 10^3 |
| DFOA | 1.59 × 10^4 | 2.77 × 10^2 | 9.01 × 10^3 | 1.92 × 10^2 | 9.33 × 10^3 | 1.52 × 10^2 | 2.19 × 10^5 | 2.65 × 10^4 | 9.94 × 10^9 | 2.29 × 10^8 |
| HHO | 6.75 × 10^3 | 1.86 × 10^3 | 3.37 × 10^3 | 1.39 × 10^2 | 3.26 × 10^3 | 3.68 × 10^1 | 4.31 × 10^3 | 3.23 × 10^2 | 1.89 × 10^6 | 1.01 × 10^6 |
| OBSCA | 6.90 × 10^3 | 7.77 × 10^2 | 3.46 × 10^3 | 6.39 × 10^1 | 4.20 × 10^3 | 2.94 × 10^2 | 5.12 × 10^3 | 1.94 × 10^2 | 1.04 × 10^8 | 4.26 × 10^7 |
| PSO | 5.80 × 10^3 | 1.77 × 10^3 | 3.18 × 10^3 | 8.69 × 10^1 | 3.25 × 10^3 | 2.22 × 10^1 | 4.29 × 10^3 | 2.93 × 10^2 | 3.55 × 10^6 | 1.96 × 10^6 |
| RCBA | 9.12 × 10^3 | 2.52 × 10^3 | 3.47 × 10^3 | 1.49 × 10^2 | 3.18 × 10^3 | 6.69 × 10^1 | 4.85 × 10^3 | 5.15 × 10^2 | 2.19 × 10^5 | 1.60 × 10^5 |
| RWGWO | 4.26 × 10^3 | 1.87 × 10^2 | 3.21 × 10^3 | 9.47 × 10^0 | 3.22 × 10^3 | 2.52 × 10^1 | 3.56 × 10^3 | 8.76 × 10^1 | 2.53 × 10^5 | 1.51 × 10^5 |
| WOA | 7.86 × 10^3 | 1.38 × 10^3 | 3.37 × 10^3 | 1.08 × 10^2 | 3.32 × 10^3 | 4.48 × 10^1 | 4.84 × 10^3 | 3.71 × 10^2 | 1.19 × 10^7 | 8.85 × 10^6 |

References

  1. Ghambari, S.; Golabi, M.; Jourdan, L.; Lepagnot, J.; Idoumghar, L. UAV path planning techniques: A survey. RAIRO-Oper. Res. 2024, 58, 2951–2989.
  2. Almakayeel, N. Flying foxes optimization with reinforcement learning for vehicle detection in UAV imagery. Sci. Rep. 2024, 14, 20616.
  3. Kashino, Z.; Nejat, G.; Benhabib, B. Multi-UAV based autonomous wilderness search and rescue using target iso-probability curves. In Proceedings of the 2019 International Conference on Unmanned Aircraft Systems (ICUAS); IEEE: Piscataway, NJ, USA, 2019; pp. 636–643.
  4. Anam, I.; Arafat, N.; Hafiz, M.S.; Jim, J.R.; Kabir, M.M.; Mridha, M. A Systematic Review of UAV and AI Integration for Targeted Disease Detection, Weed Management, and Pest Control in Precision Agriculture. Smart Agric. Technol. 2024, 9, 100647.
  5. Song, J.; Zhao, K.; Liu, Y. Survey on mission planning of multiple unmanned aerial vehicles. Aerospace 2023, 10, 208.
  6. Luo, J.; Tian, Y.; Wang, Z. Research on Unmanned Aerial Vehicle Path Planning. Drones 2024, 8, 51.
  7. Farid, G.; Cocuzza, S.; Younas, T.; Razzaqi, A.A.; Wattoo, W.A.; Cannella, F.; Mo, H. Modified A-star (A*) approach to plan the motion of a quadrotor UAV in three-dimensional obstacle-cluttered environment. Appl. Sci. 2022, 12, 5791.
  8. Korf, R.E. Depth-first iterative-deepening: An optimal admissible tree search. Artif. Intell. 1985, 27, 97–109.
  9. Wang, Z.; Gong, H.; Nie, M.; Liu, X. Research on Multi-UAV Cooperative Dynamic Path Planning Algorithm Based on Conflict Search. Drones 2024, 8, 274.
  10. Huang, T.; Fan, K.; Sun, W. Density gradient-RRT: An improved rapidly exploring random tree algorithm for UAV path planning. Expert Syst. Appl. 2024, 252, 124121.
  11. LaValle, S. Rapidly-exploring random trees: A new tool for path planning. Tech. Rep. 1998, TR 98-11, 1–12.
  12. Kuffner, J.J.; LaValle, S.M. RRT-connect: An efficient approach to single-query path planning. In Proceedings of the 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No. 00CH37065); IEEE: Piscataway, NJ, USA, 2000; Volume 2, pp. 995–1001.
  13. Karaman, S.; Frazzoli, E. Sampling-based algorithms for optimal motion planning. Int. J. Robot. Res. 2011, 30, 846–894.
  14. Urmson, C.; Simmons, R. Approaches for heuristically biasing RRT growth. In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No. 03CH37453); IEEE: Piscataway, NJ, USA, 2003; Volume 2, pp. 1178–1183.
  15. Jiang, C.; Hu, Z.; Mourelatos, Z.P.; Gorsich, D.; Jayakumar, P.; Fu, Y.; Majcher, M. R2-RRT*: Reliability-based robust mission planning of off-road autonomous ground vehicle under uncertain terrain environment. IEEE Trans. Autom. Sci. Eng. 2021, 19, 1030–1046.
  16. Guo, Y.; Liu, X.; Liu, X.; Yang, Y.; Zhang, W. FC-RRT*: An improved path planning algorithm for UAV in 3D complex environment. ISPRS Int. J. Geo-Inf. 2022, 11, 112.
  17. Primatesta, S.; Osman, A.; Rizzo, A. MP-RRT#: A model predictive sampling-based motion planning algorithm for unmanned aircraft systems. J. Intell. Robot. Syst. 2021, 103, 59.
  18. Holland, J.H. Genetic algorithms. Sci. Am. 1992, 267, 66–73.
  19. Dorigo, M.; Birattari, M.; Stutzle, T. Ant colony optimization. IEEE Comput. Intell. Mag. 2006, 1, 28–39.
  20. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95-International Conference on Neural Networks; IEEE: Piscataway, NJ, USA, 1995; Volume 4, pp. 1942–1948.
  21. Heidari, A.A.; Mirjalili, S.; Faris, H.; Aljarah, I.; Mafarja, M.; Chen, H. Harris hawks optimization: Algorithm and applications. Future Gener. Comput. Syst. 2019, 97, 849–872.
  22. Mirjalili, S.; Mirjalili, S.M.; Hatamlou, A. Multi-verse optimizer: A nature-inspired algorithm for global optimization. Neural Comput. Appl. 2016, 27, 495–513.
  23. Yao, P.; Wang, H. Dynamic Adaptive Ant Lion Optimizer applied to route planning for unmanned aerial vehicle. Soft Comput. 2017, 21, 5475–5488.
  24. Shen, Q.; Zhang, D.; He, Q.; Ban, Y.; Zuo, F. A novel multi-objective dung beetle optimizer for Multi-UAV cooperative path planning. Heliyon 2024, 10, e37286.
  25. Meng, Q.; Chen, K.; Qu, Q. PPSwarm: Multi-UAV Path Planning Based on Hybrid PSO in Complex Scenarios. Drones 2024, 8, 192.
  26. Xu, H.; Niu, Z.; Jiang, B.; Zhang, Y.; Chen, S.; Li, Z.; Gao, M.; Zhu, M. ERRT-GA: Expert Genetic Algorithm with Rapidly Exploring Random Tree Initialization for Multi-UAV Path Planning. Drones 2024, 8, 367.
  27. Li, Y.; Meng, X.; Ye, F.; Jiang, T.; Li, Y. Path planning based on clustering and improved ACO in UAV-assisted wireless sensor network. In Proceedings of the 2020 IEEE USNC-CNC-URSI North American Radio Science Meeting (Joint with AP-S Symposium); IEEE: Piscataway, NJ, USA, 2020; pp. 57–58.
  28. Gong, Y.; Chen, K.; Niu, T.; Liu, Y. Grid-Based coverage path planning with NFZ avoidance for UAV using parallel self-adaptive ant colony optimization algorithm in cloud IoT. J. Cloud Comput. 2022, 11, 29.
  29. Xue, J.; Shen, B. A novel swarm intelligence optimization approach: Sparrow search algorithm. Syst. Sci. Control Eng. 2020, 8, 22–34.
  30. Lei, H.; Yan, Y.; Liu, J.; Han, Q.; Li, Z. Hierarchical multi-UAV path planning for urban low altitude environments. IEEE Access 2024, 12, 162109–162121.
  31. Chour, K.; Reddinger, J.P.; Dotterweich, J.; Childers, M.; Humann, J.; Rathinam, S.; Darbha, S. An agent-based modeling framework for the multi-UAV rendezvous recharging problem. Robot. Auton. Syst. 2023, 166, 104442.
  32. Meng, Q.; Qu, Q.; Chen, K.; Yi, T. Multi-UAV Path Planning Based on Cooperative Co-Evolutionary Algorithms with Adaptive Decision Variable Selection. Drones 2024, 8, 435.
  33. Shujuan, H.; Wenqi, C.; Beixuan, L.; Feng, X.; Chao, S.; Wenjuan, Z. An improved BAT algorithm for collaborative dynamic target tracking and path planning of multiple UAV. Comput. Electr. Eng. 2024, 118, 109340.
  34. Peng, Y.; Liu, Y.; Zhang, H. Deep reinforcement learning based path planning for UAV-assisted edge computing networks. In Proceedings of the 2021 IEEE Wireless Communications and Networking Conference (WCNC); IEEE: Piscataway, NJ, USA, 2021; pp. 1–6.
  35. Ejaz, M.; Gui, J.; Asim, M.; ElAffendi, M.; Fung, C.; Abd El-Latif, A.A. RL-Planner: Reinforcement Learning-Enabled Efficient Path Planning in Multi-UAV MEC Systems. IEEE Trans. Netw. Serv. Manag. 2024, 21, 3317–3329.
  36. Beishenalieva, A.; Yoo, S.J. UAV Path Planning for Data Gathering in Wireless Sensor Networks: Spatial and Temporal Substate-Based Q-Learning. IEEE Internet Things J. 2023, 11, 9572–9586.
  37. Tang, J.; Liang, Y.; Li, K. Dynamic Scene Path Planning of UAVs Based on Deep Reinforcement Learning. Drones 2024, 8, 60.
  38. Mahajan, P.; Balamurugan, P.; Kumar, A.; Chalapathi, G.; Chamola, V.; Khabbaz, M. Multi-Objective MDP-based Routing in UAV Networks for Search-based Operations. IEEE Trans. Veh. Technol. 2024, 73, 13777–13789.
  39. Zhao, Z.Y.; Che, Y.L.; Luo, S.; Luo, G.; Wu, K.; Leung, V.C. On Designing Multi-UAV aided Wireless Powered Dynamic Communication via Hierarchical Deep Reinforcement Learning. IEEE Trans. Mob. Comput. 2024, 23, 13991–14004.
  40. Tu, B.; Wang, F.; Han, X.; Fu, X. Q-learning Guided Grey Wolf Optimizer for UAV 3D Path Planning. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 469.
  41. Bo, L.; Zhang, T.; Zhang, H.; Yang, J.; Zhang, Z.; Zhang, C.; Liu, M. Improved Q-learning Algorithm Based on Flower Pollination Algorithm and Tabulation Method for Unmanned Aerial Vehicle Path Planning. IEEE Access 2024, 12, 104429–104444.
  42. Puente-Castro, A.; Rivero, D.; Pedrosa, E.; Pereira, A.; Lau, N.; Fernandez-Blanco, E. Q-learning based system for path planning with unmanned aerial vehicles swarms in obstacle environments. Expert Syst. Appl. 2024, 235, 121240.
  43. Nayeem, G.M.; Fan, M.; Daiyan, G.M. Adaptive Q-learning grey wolf optimizer for UAV path planning. Drones 2025, 9, 246.
  44. Chen, C.Y.; Ye, F. Particle swarm optimization algorithm and its application to clustering analysis. In Proceedings of the 2012 Proceedings of 17th Conference on Electrical Power Distribution; IEEE: Piscataway, NJ, USA, 2012; pp. 789–794.
  45. Storn, R.; Price, K. Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 1997, 11, 341–359.
  46. Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67.
  47. Gupta, S.; Deep, K. A novel random walk grey wolf optimizer. Swarm Evol. Comput. 2019, 44, 101–112.
  48. Yong, J.; He, F.; Li, H.; Zhou, W. A novel bat algorithm based on collaborative and dynamic learning of opposite population. In Proceedings of the 2018 IEEE 22nd International Conference on Computer Supported Cooperative Work in Design (CSCWD); IEEE: Piscataway, NJ, USA, 2018; pp. 541–546.
  49. Niu, J.; Zhong, W.; Liang, Y.; Luo, N.; Qian, F. Fruit fly optimization algorithm based on differential evolution and its application on gasification process operation optimization. Knowl.-Based Syst. 2015, 88, 253–263.
  50. Abd Elaziz, M.; Oliva, D.; Xiong, S. An improved opposition-based sine cosine algorithm for global optimization. Expert Syst. Appl. 2017, 90, 484–500.
  51. Liang, H.; Liu, Y.; Shen, Y.; Li, F.; Man, Y. A hybrid bat algorithm for economic dispatch with random wind power. IEEE Trans. Power Syst. 2018, 33, 5052–5061.
  52. Chen, H.; Yang, C.; Heidari, A.A.; Zhao, X. An efficient double adaptive random spare reinforced whale optimization algorithm. Expert Syst. Appl. 2020, 154, 113018.
  53. Wei, Y.; Lv, H.; Chen, M.; Wang, M.; Heidari, A.A.; Chen, H.; Li, C. Predicting entrepreneurial intention of students: An extreme learning machine with Gaussian barebone Harris hawks optimizer. IEEE Access 2020, 8, 76841–76855.
  54. Qais, M.H.; Hasanien, H.M.; Alghuwainem, S. Enhanced salp swarm algorithm: Application to variable speed wind generators. Eng. Appl. Artif. Intell. 2019, 80, 82–96.
  55. Mirjalili, S.; Gandomi, A.H.; Mirjalili, S.Z.; Saremi, S.; Faris, H.; Mirjalili, S.M. Salp Swarm Algorithm: A bio-inspired optimizer for engineering design problems. Adv. Eng. Softw. 2017, 114, 163–191.
  56. Su, H.; Zhao, D.; Heidari, A.A.; Liu, L.; Zhang, X.; Mafarja, M.; Chen, H. RIME: A physics-based optimization. Neurocomputing 2023, 532, 183–214.
  57. Cao, Y.; Sun, Z.; Name, G. Multi-strategy optimization of HHO algorithm for path planning of warehouse robots. In Proceedings of the China Automation Congress (CAC), Chongqing, China, 17–19 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 735–740.
  58. Ling, Y.; Zhou, Y.; Luo, Q. Lévy flight trajectory-based whale optimization algorithm for global optimization. IEEE Access 2017, 5, 6168–6186.
  59. Li, S.; Chen, H.; Wang, M.; Heidari, A.A.; Mirjalili, S. Slime mould algorithm: A new method for stochastic optimization. Future Gener. Comput. Syst. 2020, 111, 300–323.
Figure 1. Overall block diagram of the proposed QHHO_GA framework.
Figure 2. General flowchart of the QHHO_GA algorithm.
Figure 3. Prioritized experience replay based on TD errors.
Figure 4. Combining entropy state partitioning with QHHO.
Figure 5. The flow chart of RRT with stagnation adjustment.
Figure 6. Obstacle threat schematic.
Figure 7. UAV flight altitude constraint schematic.
Figure 8. UAV flight angle constraint schematic.
Figure 9. Rank heatmap of the ten highest-ranked algorithms on the CEC2017 benchmark.
Figure 10. Convergence curves of the compared algorithms on representative CEC2017 functions.
Figure 11. 3D path and overhead comparison diagram of scenario function 1. Terrain elevation: dark blue (low) → yellow (high) [2D background; 3D surface]. UAV trajectories: colored dashed lines (blue, red, cyan, magenta, green). Start/end points marked. 2D only: threats = black concentric circles. 3D only: obstacles = cylinders (colors for visual distinction). The same color and symbol conventions apply to all subsequent figures unless otherwise noted.
Figure 12. 3D diagram of scenario function 2.
Figure 13. 3D path diagram of scenario function 3.
Figure 14. 3D path diagram of scenario function 4.
Figure 15. 3D path diagram of scenario function 5.
Figure 16. 3D path diagram of scenario function 6.
Figure 17. 2D path diagram of scenario function 6.
Figure 18. 3D path diagram of scenario function 7.
Figure 19. 2D path diagram of scenario function 7.
Figure 20. 3D path and overhead comparison diagram of scenario function 8.
Figure 21. 2D path and overhead comparison diagram of scenario function 8.
Figure 22. 3D path diagram of scenario function 9.
Table 1. Results of comparison between QHHO_GA and improved algorithms.
| Algorithms | Overall rank | +/−/= |
|---|---|---|
| QHHO_GA | 1 | |
| ALCPSO | 4 | 14/3/13 |
| CDLOBA | 8 | 26/2/2 |
| OBSCA | 10 | 29/0/1 |
| RWGWO | 2 | 15/13/2 |
| RCBA | 6 | 26/1/3 |
| HHO | 7 | 29/0/1 |
| DFOA | 11 | 30/0/0 |
| WOA | 9 | 30/0/0 |
| PSO | 5 | 25/1/4 |
| DE | 3 | 12/13/5 |
Table 2. The final cost of each algorithm for UAV path planning.
| Algorithm | f1 mean | f1 std | f4 mean | f4 std | f5 mean | f5 std | f6 mean | f6 std |
|---|---|---|---|---|---|---|---|---|
| QHHO_GA | 5.96 × 10^4 | 1.70 × 10^3 | 7.29 × 10^4 | 3.69 × 10^3 | 7.33 × 10^4 | 2.13 × 10^3 | 7.00 × 10^8 | 6.75 × 10^8 |
| ALCPSO | 7.72 × 10^4 | 1.61 × 10^3 | 2.30 × 10^9 | 1.42 × 10^9 | 3.40 × 10^9 | 2.07 × 10^9 | 6.00 × 10^9 | 1.63 × 10^9 |
| ESSA | 5.60 × 10^4 | 4.24 × 10^2 | 5.95 × 10^4 | 6.45 × 10^2 | 1.00 × 10^8 | 3.16 × 10^8 | 8.00 × 10^8 | 6.32 × 10^8 |
| GBHHO | 5.25 × 10^4 | 1.22 × 10^3 | 1.00 × 10^8 | 3.16 × 10^8 | 5.00 × 10^8 | 7.07 × 10^8 | 4.00 × 10^8 | 9.66 × 10^8 |
| LWOA | 5.71 × 10^4 | 7.11 × 10^2 | 2.00 × 10^8 | 6.32 × 10^8 | 3.00 × 10^8 | 6.75 × 10^8 | 1.10 × 10^9 | 8.76 × 10^8 |
| MCHHO | 5.13 × 10^4 | 6.83 × 10^2 | 1.00 × 10^8 | 3.16 × 10^8 | 2.00 × 10^8 | 6.32 × 10^8 | 1.20 × 10^9 | 1.14 × 10^9 |
| OBSCA | 6.74 × 10^4 | 2.25 × 10^3 | 7.48 × 10^4 | 2.52 × 10^3 | 7.48 × 10^4 | 1.04 × 10^3 | 6.00 × 10^8 | 5.16 × 10^8 |
| PSO | 8.25 × 10^4 | 1.41 × 10^3 | 1.90 × 10^9 | 8.76 × 10^8 | 3.10 × 10^9 | 1.10 × 10^9 | 3.90 × 10^9 | 1.37 × 10^9 |
| RDWOA | 5.29 × 10^4 | 1.39 × 10^3 | 7.06 × 10^4 | 4.60 × 10^3 | 7.51 × 10^4 | 7.10 × 10^3 | 1.30 × 10^9 | 8.23 × 10^8 |
| RIME | 5.86 × 10^4 | 2.49 × 10^3 | 7.85 × 10^4 | 3.97 × 10^3 | 7.86 × 10^4 | 2.40 × 10^3 | 4.00 × 10^8 | 5.16 × 10^8 |
| SMA | 8.24 × 10^4 | 2.21 × 10^3 | 5.50 × 10^9 | 2.68 × 10^9 | 6.60 × 10^9 | 2.59 × 10^9 | 8.10 × 10^9 | 1.97 × 10^9 |
| WOA | 5.54 × 10^4 | 1.32 × 10^3 | 6.00 × 10^8 | 8.43 × 10^8 | 1.10 × 10^9 | 1.29 × 10^9 | 1.90 × 10^9 | 1.20 × 10^9 |

| Algorithm | f7 mean | f7 std | f8 mean | f8 std | f9 mean | f9 std | Average rank | Total rank |
|---|---|---|---|---|---|---|---|---|
| QHHO_GA | 2.00 × 10^8 | 4.22 × 10^8 | 9.00 × 10^8 | 8.76 × 10^8 | 1.50 × 10^9 | 1.65 × 10^9 | 3.444444444 | 1 |
| ALCPSO | 5.80 × 10^9 | 1.75 × 10^9 | 9.10 × 10^9 | 2.18 × 10^9 | 1.06 × 10^10 | 1.65 × 10^9 | 10.77777778 | 11 |
| ESSA | 6.00 × 10^8 | 8.43 × 10^8 | 1.60 × 10^9 | 2.22 × 10^9 | 3.70 × 10^9 | 3.47 × 10^9 | 4.111111111 | 4 |
| GBHHO | 6.00 × 10^8 | 5.16 × 10^8 | 1.40 × 10^9 | 1.43 × 10^9 | 3.40 × 10^9 | 1.65 × 10^9 | 3.888888889 | 3 |
| LWOA | 1.80 × 10^9 | 1.03 × 10^9 | 2.00 × 10^9 | 6.67 × 10^8 | 3.40 × 10^9 | 1.51 × 10^9 | 6.666666667 | 8 |
| MCHHO | 6.00 × 10^8 | 5.16 × 10^8 | 1.90 × 10^9 | 1.37 × 10^9 | 3.70 × 10^9 | 2.16 × 10^9 | 5 | 6 |
| OBSCA | 7.00 × 10^8 | 4.83 × 10^8 | 2.80 × 10^9 | 7.89 × 10^8 | 4.40 × 10^9 | 1.51 × 10^9 | 6 | 7 |
| PSO | 4.20 × 10^9 | 1.40 × 10^9 | 8.30 × 10^9 | 1.57 × 10^9 | 7.90 × 10^9 | 1.29 × 10^9 | 10.33333333 | 10 |
| RDWOA | 1.30 × 10^9 | 9.49 × 10^8 | 1.00 × 10^9 | 1.33 × 10^9 | 2.40 × 10^9 | 1.43 × 10^9 | 3.888888889 | 2 |
| RIME | 5.00 × 10^8 | 7.07 × 10^8 | 9.00 × 10^8 | 9.94 × 10^8 | 1.70 × 10^9 | 1.34 × 10^9 | 4.111111111 | 5 |
| SMA | 8.70 × 10^9 | 2.26 × 10^9 | 1.22 × 10^10 | 1.75 × 10^9 | 1.44 × 10^10 | 1.43 × 10^9 | 11.88888889 | 12 |
| WOA | 2.10 × 10^9 | 7.38 × 10^8 | 4.60 × 10^9 | 2.59 × 10^9 | 4.90 × 10^9 | 1.66 × 10^9 | 7.888888889 | 9 |
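
The "Average rank" and "Total rank" columns of Table 2 can be illustrated with a minimal Python sketch. It assumes that each algorithm is ranked per scenario by its mean cost (1 = lowest), that the per-scenario ranks are averaged, and that the total rank orders the algorithms by that average. The function names `average_ranks` and `total_ranks` are hypothetical, and giving tied means their shared average position is an assumption; the paper may break ties differently, for example by standard deviation or by unrounded means. The example uses only three algorithms and the f7 to f9 means, so its output is not expected to reproduce the values in Table 2, which cover all twelve algorithms and nine scenarios.

```python
from typing import Dict, List


def average_ranks(means: Dict[str, List[float]]) -> Dict[str, float]:
    """Average each algorithm's per-function rank (1 = lowest mean cost)."""
    names = list(means)
    n_funcs = len(next(iter(means.values())))
    totals = {name: 0.0 for name in names}
    for j in range(n_funcs):
        ordered = sorted(names, key=lambda name: means[name][j])
        i = 0
        while i < len(ordered):
            # Extend k to the end of the run of algorithms tied on function j.
            k = i
            while (k + 1 < len(ordered)
                   and means[ordered[k + 1]][j] == means[ordered[i]][j]):
                k += 1
            # Assumption: tied algorithms share the mean of positions i+1..k+1.
            shared = (i + 1 + k + 1) / 2
            for name in ordered[i:k + 1]:
                totals[name] += shared
            i = k + 1
    return {name: totals[name] / n_funcs for name in names}


def total_ranks(avg: Dict[str, float]) -> Dict[str, int]:
    """Order algorithms by average rank; the smallest average gets rank 1."""
    ordered = sorted(avg, key=avg.get)
    return {name: pos for pos, name in enumerate(ordered, start=1)}


# Mean costs on f7, f8, f9 for three rows of Table 2 (illustrative subset).
example = {
    "QHHO_GA": [2.00e8, 9.00e8, 1.50e9],
    "RDWOA": [1.30e9, 1.00e9, 2.40e9],
    "RIME": [5.00e8, 9.00e8, 1.70e9],
}
avg = average_ranks(example)
overall = total_ranks(avg)
for name in sorted(overall, key=overall.get):
    print(f"{name}: average rank {avg[name]:.2f}, total rank {overall[name]}")
```

In this subset QHHO_GA and RIME report the same rounded mean on f8, so under the tie assumption each receives rank 1.5 for that scenario.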
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Liu, R.; Xu, Z.; Hu, H.; Zheng, Z. A Hybrid Optimization Algorithm for Enhanced Path Planning in Dynamic Multi-UAV Environments. Symmetry 2026, 18, 749. https://doi.org/10.3390/sym18050749
