1. Introduction
Unmanned aerial vehicles (UAVs) have been increasingly deployed in disaster response, environmental monitoring, logistics, and intelligent agriculture because of their flexibility, low deployment cost, and ability to operate in complex or inaccessible areas [1,2]. With the growing demand for cooperative and autonomous missions, path planning has evolved from single-UAV navigation to multi-UAV coordination in dynamic environments [3,4]. Compared with ground vehicles, UAVs operate in three-dimensional space and must simultaneously account for obstacle avoidance, altitude feasibility, path smoothness, and mission efficiency, which substantially increases the difficulty of real-time planning [5,6].
For multi-UAV missions, path planning concerns not only the feasibility and safety of an individual trajectory, but also the coordination quality of multiple trajectories. In practical deployments, overly similar or strongly overlapping routes may reduce coverage efficiency, increase redundancy, and weaken the advantage of cooperative operation. Therefore, path planning in dynamic multi-UAV environments should not be formulated solely as a conventional shortest-path problem. Instead, trajectory quality, safety, and spatial differentiation should be considered jointly within a unified optimization framework.
From an optimization perspective, dynamic multi-UAV path planning is a high-dimensional and strongly constrained problem. Feasible trajectories must be generated under coupled requirements on path length, obstacle avoidance, altitude constraints, and flight smoothness, while excessive path overlap among UAVs should also be avoided. These characteristics lead to a complex search landscape in which premature convergence, diversity loss, and local trapping can substantially reduce solution quality. As a result, an effective optimizer should be able to maintain population diversity, regulate its search behavior adaptively, and expand the search space when progress becomes insufficient.
Existing hybrid optimization methods have shown that combining multiple mechanisms can improve path planning performance. However, in complex multi-UAV scenarios, three challenges remain particularly important. First, population quality and search diversity are often difficult to preserve simultaneously throughout the optimization process. Second, the exploration–exploitation balance is not always regulated through an explicit state-dependent decision mechanism. Third, recovery from prolonged search stagnation is frequently weak, which limits the optimizer’s ability to escape locally favorable but globally suboptimal regions. These challenges are especially restrictive in dynamic multi-UAV planning, where the search space is constrained not only by environmental feasibility but also by the coordination requirement of producing sufficiently differentiated trajectories.
To address these issues, this paper proposes a Q-learning-guided Harris Hawk Optimization-Genetic Algorithm (QHHO_GA) for dynamic multi-UAV path planning. In the proposed framework, GA is employed to enhance population quality and maintain diversity, HHO serves as the core search mechanism, and Q-learning is used to guide adaptive behavior selection through a deliberately designed state–action space. In addition, prioritized experience replay, entropy-guided state partitioning, stagnation monitoring, adaptive parameter regulation, and RRT-based stagnation adjustment are integrated into a unified heuristic optimization process. Through this design, the proposed method aims to improve diversity preservation, adaptive search regulation, and recovery from local stagnation in complex constrained search spaces.
The effectiveness of QHHO_GA is evaluated on the CEC2017 benchmark suite and on simulation-based multi-UAV path planning scenarios with multiple obstacle configurations. The main contributions of this paper are summarized as follows:
(1) A hybrid optimization framework, QHHO_GA, is proposed for dynamic multi-UAV path planning by integrating GA-based population enhancement, Q-learning-guided HHO adaptive search, and RRT-based stagnation adjustment into a unified search process.
(2) An adaptive search mechanism is developed through entropy-guided state partitioning, prioritized experience replay, and Boltzmann-based action selection, enabling more effective regulation of exploration and exploitation while improving the ability to escape premature convergence.
(3) A multi-UAV path planning model with an explicit path diversity objective is established, and extensive experiments on benchmark functions and complex terrain scenarios are conducted to verify the effectiveness and competitiveness of the proposed method.
2. Related Works
UAV path planning has been studied extensively, and the related literature reflects a clear evolution from deterministic search to adaptive and hybrid optimization. Early studies mainly relied on classical search-based planners. With the increasing complexity of dynamic and constrained planning tasks, later work progressively incorporated population-based metaheuristics, hybrid frameworks, and learning-guided search strategies. Accordingly, the development of this field can be understood not merely as the accumulation of individual algorithms, but as a gradual transition from fixed search rules to adaptive optimization mechanisms.
Traditional search-based methods. Classical path planning methods, such as A* and D*, remain important because they provide explicit search logic and can generate feasible paths efficiently in structured or relatively static environments [7,8]. Deterministic search has also been extended to more complex planning settings through conflict-based search and related graph-based formulations [9,10]. In continuous spaces, RRT and its variants have been widely adopted because of their strong exploratory capability and flexible search behavior [11,12]. Subsequent developments, including RRT* and related extensions, further improved path quality and asymptotic optimality [13,14]. Other variants have also been proposed for constrained or application-specific planning scenarios [15,16]. These methods are particularly effective when the environment model is relatively clear and the search objective can be represented by deterministic rules. However, when the problem becomes high-dimensional, strongly constrained, or dynamically changing, repeated search and replanning may lead to a substantial computational burden. Moreover, the resulting paths often require further refinement when obstacle avoidance, smoothness, altitude feasibility, and multi-UAV coordination must be considered simultaneously [10,17].
Metaheuristic optimization and improved variants. To overcome the limitations of purely deterministic planning, many studies introduced population-based metaheuristic algorithms into UAV path planning. Classical methods such as GA and Ant Colony Optimization (ACO) provide stochastic exploration through population evolution and probabilistic search [18,19]. Particle Swarm Optimization (PSO) and HHO have likewise been adopted because of their strong global search capability and flexible update mechanisms [20,21]. Beyond these foundational approaches, a large number of improved variants have been proposed to enhance convergence speed, diversity maintenance, and search robustness. Representative examples include the multi-verse optimizer and dynamic metaheuristic variants designed for more complex search landscapes [22,23]. More recent studies have further developed specialized frameworks such as DENSDBO-ASR and PPSwarm for UAV-related optimization scenarios [24,25]. These studies indicate that performance can often be improved by redesigning initialization strategies, update rules, or diversity-preserving mechanisms. Nevertheless, in complex path planning tasks, especially when the feasible solution space is discontinuous or strongly constrained, basic or lightly improved algorithms may still suffer from premature convergence and insufficient adaptability to changing search conditions.
Hybrid and learning-guided optimization frameworks. As optimization problems became more complex, researchers increasingly turned to hybrid frameworks in order to combine the complementary strengths of different algorithms. Representative examples include path planning schemes that integrate deterministic search with evolutionary optimization, such as RRT-GA frameworks [26,27]. Other studies have combined swarm-intelligence methods, including PSO-ACO and related multi-strategy designs, to improve search efficiency and solution quality [28,29]. More recent work has explored hierarchical, co-evolutionary, and agent-based hybrid frameworks for complex planning tasks [30,31]. Adaptive multi-strategy designs have also been proposed to improve robustness in high-dimensional or dynamically changing environments [32,33]. The motivation behind such studies is not hybridization for its own sake, but the use of one mechanism to compensate for the limitations of another. In particular, hybridization is often employed to combine broader exploration with local refinement, or to restore diversity when the search becomes trapped in a limited region. This line of research shows that appropriately designed hybridization can improve optimization performance, although it may also increase structural complexity when the integrated mechanisms are not well aligned.
In recent years, Q-learning-guided metaheuristics have attracted increasing attention because Q-learning can provide lightweight adaptive control without the heavier computational requirements of deep reinforcement learning [34,35]. In UAV-related optimization, reinforcement learning has been increasingly used to improve path planning and decision adaptation under dynamic conditions [36,37]. Related studies have also explored multi-agent and policy-design perspectives for UAV coordination and planning [38,39]. Recent reviews further indicate that the integration of Q-learning with metaheuristics has become an active research direction for operator selection, parameter adaptation, and exploration–exploitation regulation [40]. Promising results have already been reported in both path planning and general search optimization. For example, the IQ-FAT algorithm improves convergence speed by 40% and achieves approximately 90% better dynamic response time than the artificial potential field method [41]. Similarly, the GA and Q-learning hybrid method proposed by Puente-Castro et al. improves path planning performance by 57.14% compared with conventional GA [42]. More recently, Q-learning-guided grey wolf optimization has been shown to improve path cost and convergence behavior in UAV path planning through adaptive search control and diversity-preserving mechanisms [43]. These findings suggest that the effectiveness of Q-learning-guided optimization does not depend solely on the addition of a learning component, but also on whether the state representation and action design are properly matched to the underlying search dynamics.
Based on the above development, the issue addressed in this work is not whether hybridization itself is possible, since many hybrid frameworks already exist, but rather how to design a practically effective combination of mechanisms for complex constrained search tasks. In particular, for HHO-based search, the construction of a meaningful state–action space for adaptive behavior selection is nontrivial. If the state partition is redundant or the action mapping is poorly matched, the hybrid framework may perform no better than, or even worse than, the original optimizer. In addition, path planning problems, especially dynamic multi-UAV tasks, require not only feasible and low-cost trajectories but also stronger diversity maintenance and effective stagnation adjustment in order to avoid overly similar paths and local trapping. These considerations motivate the proposed QHHO_GA framework. In the present work, GA is introduced as a simple but effective population-enhancement mechanism, Q-learning-guided HHO is used for adaptive action selection through a deliberately designed state–action space, and RRT-based stagnation adjustment is incorporated to expand the search space when progress becomes insufficient. Accordingly, the contribution of this work lies not in claiming the first hybridization itself, but in the careful integration of these mechanisms into a coherent framework and in demonstrating its effectiveness for dynamic multi-UAV path planning.
3. Background
This section briefly introduces the four algorithmic components underlying the proposed framework, namely GA, HHO, RRT, and Q-learning. Since these methods provide the theoretical basis of QHHO_GA, only their basic principles and the key formulations used later are summarized here.
3.1. Genetic Algorithm
GA is a population-based optimization algorithm inspired by natural selection and genetic evolution. Through repeated selection, crossover, mutation, and replacement, GA evolves candidate solutions toward promising regions of the search space. Owing to its robust global search capability, GA has been widely used in optimization problems. In the present work, GA is introduced as a population enhancement mechanism for improving solution quality and preserving search diversity. The initial population is generated within the search bounds as

$$X = lb + \mathrm{rand}(N, D) \odot (ub - lb), \tag{1}$$

where $N$ is the population size, $D$ is the problem dimension, and $lb$ and $ub$ are the lower and upper bounds of the search space, respectively. To assign higher selection probability to individuals with better fitness under a minimization setting, roulette-wheel selection is adopted:

$$P_i = \frac{1 / (f_i + \varepsilon)}{\sum_{j=1}^{N} 1 / (f_j + \varepsilon)}, \tag{2}$$

where $P_i$ is the probability that individual $i$ is selected, $f_i$ is the fitness value of individual $i$, and $\varepsilon$ is a constant used to avoid division-by-zero errors. New offspring are then generated by single-point crossover:

$$X_{c1} = \left[ X_{p1}(1{:}k),\; X_{p2}(k{+}1{:}D) \right], \qquad X_{c2} = \left[ X_{p2}(1{:}k),\; X_{p1}(k{+}1{:}D) \right], \tag{3}$$

where $X_{p1}$ and $X_{p2}$ denote the parent individuals, $k$ is a randomly selected crossover point, and $X_{c1}$ and $X_{c2}$ are the generated offspring. To preserve diversity and reduce the risk of local trapping, mutation is further applied as

$$x_{i,j} = \begin{cases} lb_j + \mathrm{rand} \cdot (ub_j - lb_j), & \text{if } \mathrm{rand} < p_m, \\ x_{i,j}, & \text{otherwise}, \end{cases} \tag{4}$$

where $x_{i,j}$ is the gene value of the $j$th dimension of individual $i$, $p_m$ is the mutation probability, each occurrence of $\mathrm{rand}$ is an independent uniform random number in $[0, 1]$, and $lb_j$ and $ub_j$ are the search bounds of the corresponding dimension. After crossover and mutation, individuals with poorer fitness are replaced so that the overall population quality can be maintained, typically through an elite-retention strategy.
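The four GA operators described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: scalar bounds `lb` and `ub` are shared by all dimensions, fitness values are assumed non-negative so that the inverse-fitness weights are valid, and the function names and the sphere objective are hypothetical.

```python
import random

def init_population(n, dim, lb, ub, rng):
    """Uniform random initialization inside [lb, ub] for every dimension."""
    return [[lb + rng.random() * (ub - lb) for _ in range(dim)] for _ in range(n)]

def roulette_select(fitness, rng, eps=1e-12):
    """Pick an index with probability proportional to 1 / (f_i + eps) (minimization)."""
    weights = [1.0 / (f + eps) for f in fitness]
    return rng.choices(range(len(fitness)), weights=weights, k=1)[0]

def single_point_crossover(p1, p2, rng):
    """Swap the tails of two parents after a random cut point k (needs len >= 2)."""
    k = rng.randint(1, len(p1) - 1)
    return p1[:k] + p2[k:], p2[:k] + p1[k:]

def mutate(x, lb, ub, pm, rng):
    """Resample each gene uniformly within the bounds with probability pm."""
    return [lb + rng.random() * (ub - lb) if rng.random() < pm else g for g in x]

rng = random.Random(0)
pop = init_population(6, 4, -5.0, 5.0, rng)
fit = [sum(g * g for g in ind) for ind in pop]  # sphere function as a stand-in objective
i, j = roulette_select(fit, rng), roulette_select(fit, rng)
child1, child2 = single_point_crossover(pop[i], pop[j], rng)
child1 = mutate(child1, -5.0, 5.0, 0.1, rng)
```

Replacement of the poorest individuals by such offspring is omitted here; it would only require sorting the population by fitness.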
3.2. Harris Hawk Optimization Algorithm
HHO is a population-based metaheuristic inspired by the cooperative hunting behavior of Harris hawks. By switching among different search behaviors, HHO balances global exploration and local exploitation during the optimization process and has shown competitive performance in a variety of optimization tasks. In this study, HHO serves as the core search mechanism of the proposed framework. In HHO, each candidate solution is regarded as the current position of a hawk, and the population evolves iteratively toward the current best solution. The main search behaviors considered in this work include soft besiege, hard besiege, random jump, spiral motion, and surprise pounce. The soft-besiege update is expressed as
where
E denotes the escape energy of the prey,
is the current global optimal solution, and
is a random perturbation following a standard normal distribution. The hard-besiege behavior is given by
whereas the random-jump behavior, which enlarges the search range during exploration, is written as
where
J is the jump intensity and
and
denote the upper and lower bounds of the search space. In addition, the spiral-motion behavior is defined as
where
r is the radius factor and
is the random angle, while the surprise-pounce behavior is modeled as
where rand is a random number,
is a decay factor, and
t is the current iteration step. These update rules provide the basic HHO behaviors later invoked in the proposed hybrid framework.
3.3. Rapidly Exploring Random Tree
RRT is a randomized tree-based search strategy that incrementally explores the solution space through recursive node expansion. Because of its exploratory capability, RRT has been widely used in motion planning and related search problems. In the proposed framework, RRT is not used as an independent planner, but as a search-space adjustment mechanism when stagnation occurs. Its basic procedure involves root generation, local node extension, and boundary handling. The root node is initialized as
where
denotes the location of the root node,
and
are the lower and upper bounds of the search space, and
D is the problem dimension. Child nodes are then generated through local randomized extension:
where
is the current node and
is the step size of the expansion. To ensure feasibility with respect to the search domain, each generated child node is projected back into the valid range by
which guarantees that the generated node remains within the admissible search space.
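The three RRT steps described above (root generation, local extension, and projection onto the bounds) can be illustrated as follows. The sketch assumes that the local extension perturbs each coordinate by a uniform offset in `[-step, step]`; the exact perturbation used in the paper is not recoverable from the text, so this choice, like the function names, is an assumption.

```python
import random

def rrt_root(dim, lb, ub, rng):
    """Sample the root node uniformly inside the search bounds."""
    return [lb + rng.random() * (ub - lb) for _ in range(dim)]

def rrt_extend(node, step, lb, ub, rng):
    """Perturb each coordinate by a uniform offset in [-step, step], then clip to bounds."""
    child = [x + step * (2.0 * rng.random() - 1.0) for x in node]
    return [min(max(c, lb), ub) for c in child]

def grow_tree(dim, lb, ub, step, n_nodes, rng):
    """Grow a small tree by extending a randomly chosen existing node each time."""
    nodes = [rrt_root(dim, lb, ub, rng)]
    while len(nodes) < n_nodes:
        parent = rng.choice(nodes)
        nodes.append(rrt_extend(parent, step, lb, ub, rng))
    return nodes

tree = grow_tree(dim=3, lb=-1.0, ub=1.0, step=0.5, n_nodes=10, rng=random.Random(0))
```

The clipping in `rrt_extend` is what keeps every generated node inside the admissible search space, regardless of the step size.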
3.4. Q-Learning
Q-learning is a model-free reinforcement learning algorithm that learns state–action values through interaction with the environment. By iteratively updating the Q-value of each state–action pair, it gradually improves the action policy according to observed rewards. In QHHO_GA, Q-learning is used to guide adaptive action selection during the search process. The core idea of Q-learning is to estimate the expected cumulative reward associated with taking a certain action in a given state. Its standard temporal-difference update rule is written as

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right], \tag{13}$$

where $Q(s, a)$ denotes the Q-value for choosing action $a$ in state $s$, $\alpha$ is the learning rate, $r$ is the reward obtained after executing action $a$, and $\gamma$ is a discount factor used to measure the contribution of future rewards. The next state is denoted by $s'$, and $\max_{a'} Q(s', a')$ represents the maximum Q-value over the candidate actions in the next state.
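As a concrete illustration of the temporal-difference update, the following sketch applies one update to a small tabular Q-function. The table size (5 states by 5 actions) and the values of the learning rate and discount factor are chosen only for illustration and are not taken from the paper.

```python
def q_update(q_table, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one temporal-difference update in place and return the TD error."""
    td_error = r + gamma * max(q_table[s_next]) - q_table[s][a]
    q_table[s][a] += alpha * td_error
    return td_error

q = [[0.0] * 5 for _ in range(5)]  # 5 states x 5 actions, sizes are illustrative
q[1][2] = 1.0                      # pretend one action in the next state is already valued
delta = q_update(q, s=0, a=3, r=0.5, s_next=1)
```

The returned TD error is the quantity that a replay mechanism can later use to rank stored transitions.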
4. Proposed QHHO_GA Algorithm
This section presents the overall framework of QHHO_GA and explains how its main components are integrated into a unified heuristic optimization process. Since the basic principles of GA, HHO, RRT, and Q-learning have been introduced in the previous section, the focus here is placed on their interaction, execution order, and functional coordination within the proposed framework.
As shown in Figure 1 and Figure 2, QHHO_GA consists of four modules, namely GA-based population enhancement, Q-learning-guided HHO adaptive search, stagnation monitoring and adaptive parameter regulation, and RRT-based stagnation adjustment. Starting from an initialized population, GA operations are first used to improve population quality and maintain diversity. Candidate solutions are then updated through Q-learning-guided HHO search, in which the current search state is evaluated and the corresponding HHO behavior is selected adaptively. During this process, prioritized replay is used to improve Q-table updating. When the best solution remains unimproved for a predefined period, the temperature parameter and the exploration probability are increased, and the RRT-based stagnation adjustment mechanism is triggered to inject new candidate solutions and enlarge the search space.
4.1. GA-Based Population Enhancement
In QHHO_GA, GA is employed as a population enhancement mechanism rather than as an independent optimizer. Through selection, crossover, mutation, and fitness-guided replacement, it improves population quality while maintaining useful diversity, thereby providing a more suitable population basis for the subsequent adaptive search process. Since the basic GA operators have already been introduced in the previous section, they are not repeated here.
4.2. Q-Learning-Guided HHO Adaptive Search
This module constitutes the adaptive search core of QHHO_GA. Its main role is to regulate HHO behaviors through Q-learning so that the search process is adjusted according to the current optimization state rather than following only fixed update rules. As summarized in Algorithm 1, each candidate solution is evaluated with respect to the current population state, an HHO action is selected according to the learned Q-table, and the corresponding update is then executed. The resulting transition is stored for subsequent replay-based learning, so that state assessment, action selection, and experience reuse are coupled within a unified adaptive search loop.
To improve learning efficiency, QHHO_GA employs prioritized experience replay, as illustrated in Figure 3. Instead of replaying all transitions with equal importance, the framework assigns higher priority to samples with larger absolute TD errors, because such transitions generally provide stronger learning signals for Q-table updating. The replay probability is defined as

$$P(i) = \frac{|\delta_i|^{\omega}}{\sum_{k} |\delta_k|^{\omega}}, \tag{14}$$

where $\delta_i$ is the TD error of the $i$-th stored transition and $\omega$ controls the influence of TD error on sampling probability. The corresponding replay procedure is summarized in Algorithm 2.
| Algorithm 1: Q-learning-guided HHO adaptive search |
| | Input: ; |
| | |
| | Output: |
| 1 | Initialize: ; ; |
| 2 | Compute and over X (by row standard deviation); |
| 3 | Determine the current state using: |
| 4 | ; |
| 5 | Select an action based on the current state and Q-table using: |
| 6 | ; |
| 7 | Apply the selected action: |
| 8 | ; |
| 9 | Update the global best solution and the best fitness based on ; |
| 10 | Compute the current and next Q-values for updating the Q-table using: |
| 11 | ; |
| 12 | Add the experience to the replay buffer and priority buffer; |
| 13 | Perform prioritized replay by sampling from the priority buffer and updating the Q-table. |
| 14 | return |
| Algorithm 2: Prioritized Experience Replay Based on TD Error |
![Symmetry 18 00749 i001 Symmetry 18 00749 i001]() |
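The prioritized replay step can be sketched as follows, assuming that the sampling probability of a stored transition is proportional to its absolute TD error raised to an exponent. The exponent name `omega`, its default value, and the small constant `eps` (added so that zero-error transitions can still be sampled) are assumptions of this sketch rather than details confirmed by the paper.

```python
import random

def replay_probabilities(td_errors, omega=0.6, eps=1e-6):
    """Sampling probability proportional to (|delta_i| + eps) ** omega."""
    priorities = [(abs(d) + eps) ** omega for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def sample_batch(buffer, td_errors, batch_size, rng, omega=0.6):
    """Draw transitions (with replacement) according to their replay probabilities."""
    probs = replay_probabilities(td_errors, omega)
    return rng.choices(buffer, weights=probs, k=batch_size)

buffer = ["t0", "t1", "t2"]          # placeholders for stored (s, a, r, s') tuples
errors = [0.1, -1.0, 2.0]            # TD errors recorded when each transition was stored
batch = sample_batch(buffer, errors, batch_size=5, rng=random.Random(0))
```

Setting `omega` to zero recovers uniform replay, while larger values concentrate the replay on the transitions with the largest TD errors.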
The adaptive search mechanism further combines entropy-aware state characterization with Boltzmann-based action selection, as illustrated in Figure 4. After the current state is determined, actions are sampled according to

$$P(a \mid s) = \frac{\exp\left( Q(s, a) / \tau \right)}{\sum_{a'} \exp\left( Q(s, a') / \tau \right)}, \tag{15}$$

where $\tau$ is the temperature parameter controlling the randomness of action selection. A lower value of $\tau$ emphasizes exploitation by favoring actions with larger Q-values, whereas a higher value encourages broader exploration. To characterize the current search condition, the population state is described through normalized fitness, relative distance to the current global best solution, population dispersion, and an entropy-related diversity measure. The population entropy is defined as

$$H = -\sum_{i} p_i \log p_i, \tag{16}$$

where $p_i$ denotes the probability distribution of individuals in the population. Based on these indicators, the current search state is evaluated as
Here, State 1 indicates a highly aggregated population suitable for local search, State 2 corresponds to a more dispersed population requiring broader exploration, State 3 represents a population already close to the current global optimum, State 4 reflects high population diversity, and State 5 is treated as a default transitional state. The default action is defined as
while the complete state–action correspondence is illustrated in Figure 4. Through repeated interaction among state evaluation, action selection, and replay-based updating, the search process becomes more sensitive to different optimization stages and more capable of balancing exploration and exploitation.
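Boltzmann action selection and an entropy-based diversity measure can be sketched as below. Two choices here are assumptions made for illustration: the temperature is called `tau`, and the population distribution is estimated from a histogram of fitness values, since the text does not specify how the probabilities are obtained.

```python
import math
import random

def boltzmann_probs(q_values, tau):
    """Softmax over Q-values with temperature tau (max-shifted for numerical stability)."""
    m = max(q_values)
    exps = [math.exp((q - m) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def select_action(q_values, tau, rng):
    """Sample an action index from the Boltzmann distribution."""
    probs = boltzmann_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]

def population_entropy(fitness, bins=10):
    """Shannon entropy of a fitness histogram; binning by fitness is an assumption."""
    lo, hi = min(fitness), max(fitness)
    if hi == lo:
        return 0.0  # all individuals coincide, so there is no diversity
    counts = [0] * bins
    for f in fitness:
        idx = min(int((f - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(fitness)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

action = select_action([0.2, 1.0, 0.5, 0.1, 0.0], tau=0.5, rng=random.Random(0))
```

With a small `tau` the distribution collapses onto the action with the largest Q-value, whereas a large `tau` makes all actions nearly equally likely.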
4.3. Stagnation Monitoring and Adaptive Parameter Regulation
In addition to the adaptive search mechanism, QHHO_GA includes a stagnation monitoring module to evaluate whether the optimization process continues to improve the current best solution. Specifically, the framework records the number of consecutive iterations in which the best fitness is not updated. When the stagnation threshold is reached, the temperature parameter and the exploration probability are increased so that the subsequent Q-learning-guided action selection becomes more exploratory. In this way, optimization progress is directly incorporated into parameter regulation rather than being treated as a fixed external condition.
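The stagnation monitor can be pictured as a counter with a threshold. The following sketch is one possible realization: the multiplicative gains, the upper caps, and all default values are hypothetical, since the paper states only that the temperature parameter and the exploration probability are increased once the threshold is reached.

```python
class StagnationMonitor:
    """Counts non-improving iterations and raises tau / epsilon when a threshold is hit."""

    def __init__(self, threshold=10, tau=1.0, epsilon=0.1,
                 tau_gain=1.2, eps_gain=1.2, tau_max=5.0, eps_max=0.5):
        self.threshold = threshold
        self.tau, self.epsilon = tau, epsilon
        self.tau_gain, self.eps_gain = tau_gain, eps_gain
        self.tau_max, self.eps_max = tau_max, eps_max
        self.best = float("inf")
        self.counter = 0

    def update(self, best_fitness):
        """Return True when stagnation is detected, i.e., recovery should be triggered."""
        if best_fitness < self.best:
            self.best = best_fitness
            self.counter = 0
            return False
        self.counter += 1
        if self.counter >= self.threshold:
            self.tau = min(self.tau * self.tau_gain, self.tau_max)
            self.epsilon = min(self.epsilon * self.eps_gain, self.eps_max)
            self.counter = 0
            return True
        return False

monitor = StagnationMonitor(threshold=3)
flags = [monitor.update(f) for f in [5.0, 5.0, 5.0, 5.0, 4.0]]
```

A `True` flag is the point at which a recovery step, such as injecting new candidates, would be invoked by the surrounding loop.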
4.4. RRT-Based Stagnation Adjustment
When the stagnation monitoring module detects that the current best solution has not improved for a predefined number of iterations, the RRT-based stagnation adjustment mechanism is activated, as illustrated in Figure 5. In QHHO_GA, RRT is not used as an independent optimizer; instead, it serves as a recovery mechanism for enlarging the search space when the population becomes trapped in a limited region. The new candidate solutions generated by the RRT-based random tree generation procedure are used to replace the worst individuals in the population, after which their fitness values are re-evaluated. At the same time, the temperature parameter and the exploration probability are increased appropriately to encourage broader exploration.
4.5. Summary of the Mechanism Hierarchy in QHHO_GA
In summary, QHHO_GA is a hierarchically organized hybrid heuristic optimization framework that integrates population-level, individual-level, and cross-level control mechanisms. At the population level, GA-based population enhancement, stagnation monitoring and adaptive parameter regulation, and RRT-based stagnation adjustment are responsible for improving population quality, maintaining diversity, and enlarging the search space when the search becomes stagnant. At the individual level, HHO action execution and Q-learning-guided action selection enable each candidate solution to adjust its search behavior adaptively. In addition, entropy-based state evaluation, prioritized experience replay, and the adaptive regulation of the temperature parameter and the exploration probability connect individual search behavior with population-level statistical information, thereby forming a cross-level control mechanism. Through this coordinated design, the proposed framework forms a closed optimization loop that integrates population evolution, adaptive individual search, and stagnation adjustment.
5. Problem Modeling
This section formulates the multi-UAV path planning problem considered in this work. In dynamic and complex environments, path planning is not limited to finding a shortest feasible route. Instead, the optimizer is required to generate trajectories in a high-dimensional search space while simultaneously satisfying multiple coupled requirements, including path efficiency, obstacle avoidance, altitude feasibility, and flight smoothness. These characteristics make the problem highly constrained and difficult to solve using simple deterministic search alone.
Compared with single-UAV path planning, the multi-UAV case introduces an additional coordination requirement. It is not sufficient for each UAV to independently obtain a feasible path; the trajectories of multiple UAVs should also remain sufficiently separated in space. If several UAVs follow highly similar or excessively close routes, task coverage may become redundant, mutual interference may increase, and the cooperative advantage of deploying multiple UAVs may be weakened. Therefore, path diversity is treated here not as an auxiliary preference, but as an explicit component of the multi-UAV optimization objective.
From an optimization perspective, the problem considered here is characterized by a high-dimensional search space, multiple coupled objective components, and strict feasibility constraints. These properties generally produce a complex search landscape with many locally favorable but globally suboptimal regions. Moreover, in the multi-UAV setting, the optimization process should not only seek feasible and low-cost trajectories, but also avoid excessive concentration of solutions, since overly similar paths weaken cooperative effectiveness. As a result, the problem requires the optimizer to maintain population diversity, regulate search behavior according to the current search state, and expand the search space when progress becomes stagnant. In this sense, dynamic multi-UAV path planning provides a representative constrained optimization scenario for assessing the effectiveness of QHHO_GA.
Accordingly, each candidate solution $X_i$ is evaluated by a weighted cost model. The first four cost terms describe the quality and feasibility of an individual UAV path, whereas the fifth term explicitly captures the coordination requirement among multiple UAVs through path diversity. The total cost is defined as follows:

$$F(X_i) = \sum_{m=1}^{5} w_m F_m(X_i), \tag{19}$$

where $F(X_i)$ denotes the total cost associated with the candidate solution $X_i$, $w_m$ denotes the weight assigned to the $m$-th cost component, and $F_m(X_i)$ denotes the corresponding cost function.
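The weighted cost model amounts to a weighted sum over the five components. The short sketch below also shows one way to propagate infeasibility: if any component is infinite, the total is reported as infinite even when the corresponding weight is zero. That handling, and the function name, are assumptions of this sketch.

```python
import math

def total_cost(costs, weights):
    """Weighted sum of cost components; any infinite component marks infeasibility."""
    if len(costs) != len(weights):
        raise ValueError("costs and weights must have the same length")
    if any(math.isinf(c) for c in costs):
        return math.inf  # avoids 0 * inf = nan when a weight is zero
    return sum(w * c for w, c in zip(weights, costs))

# Length, threat, height, smoothness, diversity (values are illustrative only).
example = total_cost([120.0, 3.0, 40.0, 0.8, 2.5], [1.0, 5.0, 1.0, 10.0, 2.0])
```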
5.1. Path Length Cost
Path length is a fundamental measure of flight efficiency, since shorter trajectories generally reduce flight time, energy consumption, and mission cost. Let $N$ denote the total number of nodes in the complete path, including the start and end nodes, and let $P_{i,j}$ denote the $j$-th node of the path encoded in candidate solution $X_i$. The path length cost is defined as:

$$F_1(X_i) = \sum_{j=1}^{N-1} \left\| P_{i,j+1} - P_{i,j} \right\|, \tag{20}$$

where $F_1(X_i)$ denotes the path length cost of $X_i$, obtained by summing the Euclidean distances between consecutive path nodes.
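As a small worked example, the path length cost is just the sum of Euclidean distances between consecutive nodes. The node coordinates below are illustrative.

```python
import math

def path_length(nodes):
    """Sum of Euclidean distances between consecutive 3D path nodes."""
    return sum(math.dist(a, b) for a, b in zip(nodes, nodes[1:]))

# A 3-4-5 step in the horizontal plane followed by a 12-unit climb.
length = path_length([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)])
```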
5.2. Obstacle Threat Cost
In addition to path efficiency, the generated trajectory must maintain sufficient clearance from obstacles. Suppose that there are $K$ obstacles in the environment. Each obstacle is represented as a cylindrical structure centered at $C_k$ with radius $R_k$. Let $d_k$ denote the shortest distance from a path segment to the center of the $k$-th obstacle. A smaller $d_k$ indicates a higher collision risk. As shown in Figure 6, the obstacle threat cost $F_2(X_i)$ is defined as:

$$F_2(X_i) = \sum_{j=1}^{N-1} \sum_{k=1}^{K} T_k\left( \overline{P_{i,j} P_{i,j+1}} \right), \tag{21}$$

where the term $T_k$ is defined as:

$$T_k = \begin{cases} 0, & d_k > S + D + R_k, \\ (S + D + R_k) - d_k, & D + R_k < d_k \le S + D + R_k, \\ \infty, & d_k \le D + R_k, \end{cases} \tag{22}$$

where $D$ denotes the UAV diameter and $S$ is a safety-margin parameter defining the obstacle risk region. The value of $S$ can be adjusted according to environmental uncertainty and positioning accuracy. When the path remains outside the risk region, the corresponding threat cost is zero. Once the path enters the risk region, the penalty increases as the clearance decreases. If the path enters the collision zone, the threat cost becomes infinite, indicating an infeasible solution.
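The three-zone behavior described above (no penalty, growing penalty, infeasible) can be sketched as follows. The sketch assumes a penalty that grows linearly with the intrusion depth into the risk region and measures clearance in the horizontal plane, since the obstacles are cylinders; both choices, and all function names, are assumptions for illustration.

```python
import math

def point_segment_distance_2d(c, a, b):
    """Shortest horizontal distance from point c to the segment from a to b."""
    abx, aby = b[0] - a[0], b[1] - a[1]
    acx, acy = c[0] - a[0], c[1] - a[1]
    denom = abx * abx + aby * aby
    t = 0.0 if denom == 0.0 else max(0.0, min(1.0, (acx * abx + acy * aby) / denom))
    return math.hypot(a[0] + t * abx - c[0], a[1] + t * aby - c[1])

def segment_threat(d, radius, uav_diameter, safety):
    """Zero outside the risk region, linear inside it, infinite in the collision zone."""
    if d > safety + uav_diameter + radius:
        return 0.0
    if d > uav_diameter + radius:
        return (safety + uav_diameter + radius) - d
    return math.inf

def threat_cost(nodes, obstacles, uav_diameter, safety):
    """Sum segment penalties over all path segments and all (cx, cy, radius) obstacles."""
    total = 0.0
    for a, b in zip(nodes, nodes[1:]):
        for cx, cy, radius in obstacles:
            d = point_segment_distance_2d((cx, cy), a, b)
            total += segment_threat(d, radius, uav_diameter, safety)
    return total

path = [(0.0, 0.0, 5.0), (10.0, 0.0, 5.0)]
cost = threat_cost(path, [(5.0, 4.0, 1.0), (5.0, 10.0, 1.0)], uav_diameter=1.0, safety=3.0)
```

In this example the first obstacle lies inside the risk region of the segment and contributes a finite penalty, while the second is far enough away to contribute nothing.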
5.3. Navigation Height Cost
Altitude feasibility is another important constraint in path planning. A feasible trajectory should keep the UAV within an allowable flight-height range with respect to the local ground while avoiding excessive fluctuations that may reduce flight stability. As shown in Figure 7, let $h_{i,j}$ denote the flight height of the $j$-th path node in candidate solution $X_i$ with respect to the local terrain. The node-wise altitude cost is defined as:

$$H_{i,j} = \begin{cases} \left| h_{i,j} - \dfrac{h_{\max} + h_{\min}}{2} \right|, & h_{i,j} \ge 0, \\ \infty, & h_{i,j} < 0, \end{cases} \tag{23}$$

where $h_{\min}$ and $h_{\max}$ denote the minimum and maximum allowable flight heights with respect to the ground, respectively. This formulation penalizes deviations from the middle of the admissible height band and assigns an infinite penalty when the relative flight height becomes negative. The corresponding absolute altitude used for trajectory generation is given by

$$z_{i,j} = h_{i,j} + z_{\mathrm{terrain}}(x_{i,j}, y_{i,j}), \tag{24}$$

where $z_{\mathrm{terrain}}(x_{i,j}, y_{i,j})$ denotes the terrain elevation at the corresponding horizontal position. The total navigation height cost is then

$$F_3(X_i) = \sum_{j=1}^{N} H_{i,j}. \tag{25}$$
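A minimal sketch of the height cost follows the verbal description above: each node is penalized by its distance from the middle of the admissible height band, and a negative relative height (below the terrain) is treated as infeasible. The function names and the numerical values are illustrative assumptions.

```python
import math

def node_height_cost(h, h_min, h_max):
    """Distance from the middle of [h_min, h_max]; infinite if the node is below ground."""
    if h < 0:
        return math.inf
    return abs(h - (h_min + h_max) / 2.0)

def height_cost(rel_heights, h_min, h_max):
    """Total navigation height cost over all path nodes."""
    return sum(node_height_cost(h, h_min, h_max) for h in rel_heights)

def absolute_altitude(rel_height, terrain_elevation):
    """Absolute altitude = height above ground + terrain elevation at that position."""
    return rel_height + terrain_elevation

cost = height_cost([150.0, 100.0, 200.0], h_min=100.0, h_max=200.0)
```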
5.4. Smoothness Cost
A feasible path should not only be safe but also sufficiently smooth for practical execution. Large fluctuations in yaw angle or pitch angle may reduce flight stability, increase control difficulty, and degrade mission performance. Therefore, the smoothness cost is defined using both the horizontal yaw angle and the vertical pitch angle.
As shown in
Figure 8, the horizontal yaw angle
is defined as the angle between two consecutive path segments projected onto the horizontal plane:
The vertical pitch angle
is defined as:
where
and
denote the absolute altitudes of consecutive nodes. In implementation, excessive turning and excessive changes in climb angle are penalized only when they exceed prescribed thresholds. Thus, the smoothness cost is written as
where
and
are the allowable turning-angle and climb-angle-change thresholds, respectively. This formulation penalizes sharp turns and abrupt altitude changes only when they exceed the admissible limits, thereby encouraging smoother and more executable trajectories.
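The thresholded smoothness penalty can be illustrated as below. The sketch computes the turning angle from the horizontal projections of consecutive segments and the climb angle from the altitude difference, and it accumulates only the excess over each threshold. The unweighted sum of the two excess terms is an assumption; the paper may weight them differently.

```python
import math

def yaw_change(p0, p1, p2):
    """Turning angle between the horizontal projections of segments p0->p1 and p1->p2."""
    ux, uy = p1[0] - p0[0], p1[1] - p0[1]
    vx, vy = p2[0] - p1[0], p2[1] - p1[1]
    return abs(math.atan2(ux * vy - uy * vx, ux * vx + uy * vy))

def pitch(p0, p1):
    """Climb angle of segment p0->p1 relative to the horizontal plane."""
    return math.atan2(p1[2] - p0[2], math.hypot(p1[0] - p0[0], p1[1] - p0[1]))

def smoothness_cost(nodes, yaw_max, pitch_change_max):
    """Accumulate only the part of each turn or pitch change that exceeds its threshold."""
    cost = 0.0
    for p0, p1, p2 in zip(nodes, nodes[1:], nodes[2:]):
        cost += max(0.0, yaw_change(p0, p1, p2) - yaw_max)
        cost += max(0.0, abs(pitch(p1, p2) - pitch(p0, p1)) - pitch_change_max)
    return cost

right_angle_turn = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
cost = smoothness_cost(right_angle_turn, yaw_max=math.pi / 4, pitch_change_max=0.5)
```

A straight, level path incurs no cost under this formulation, whereas the right-angle turn above is penalized by the amount it exceeds the allowed turning angle.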
5.5. Multi-UAV Flight Diversity Cost
In multi-UAV missions, it is not sufficient to optimize each path independently. The generated trajectories should also remain sufficiently separated in space. If multiple UAVs follow overly similar or closely overlapping paths, mission coverage may become redundant and mutual interference may increase. Therefore, an additional diversity cost is introduced to penalize excessive proximity among UAV trajectories.
Assume that the mission contains
UAVs, and let
denote the
j-th node on the path of the
u-th UAV. The distribution cost between multiple UAV paths is defined as:
where
and
denote the
k-th and
l-th path segments of the
u-th and
v-th UAVs, respectively. The pairwise segment distance cost is defined as:
where
d is the minimum distance between the two path segments, computed as:
Finally, the total multi-UAV flight diversity cost is defined as:
where
is the weight factor of the diversity cost. By introducing this term, the optimization process penalizes overlapping or excessively close UAV paths and thereby reflects the cooperative requirement of multi-UAV missions more explicitly.
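The diversity term relies on the minimum distance between two path segments. The sketch below uses the standard clamped closest-point computation for two 3D line segments and then accumulates a pairwise penalty over all UAV pairs and segment pairs. The hinge penalty `max(0, d_safe - d)`, the parameter `d_safe`, and the function names are assumptions made for illustration; the pairwise segment cost used in this work may take a different form.

```python
import math
from itertools import combinations


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _clamp01(x):
    return max(0.0, min(1.0, x))


def segment_distance(p1, q1, p2, q2, eps=1e-12):
    """Minimum distance between segments p1-q1 and p2-q2 in 3D."""
    d1, d2, r = _sub(q1, p1), _sub(q2, p2), _sub(p1, p2)
    a, e, f = _dot(d1, d1), _dot(d2, d2), _dot(d2, r)
    if a <= eps and e <= eps:          # both segments degenerate to points
        s = t = 0.0
    elif a <= eps:                     # first segment is a point
        s, t = 0.0, _clamp01(f / e)
    else:
        c = _dot(d1, r)
        if e <= eps:                   # second segment is a point
            s, t = _clamp01(-c / a), 0.0
        else:
            b = _dot(d1, d2)
            denom = a * e - b * b      # zero when the segments are parallel
            s = _clamp01((b * f - c * e) / denom) if denom > eps else 0.0
            t = (b * s + f) / e
            if t < 0.0:
                s, t = _clamp01(-c / a), 0.0
            elif t > 1.0:
                s, t = _clamp01((b - c) / a), 1.0
    c1 = tuple(p1[i] + s * d1[i] for i in range(3))
    c2 = tuple(p2[i] + t * d2[i] for i in range(3))
    return math.dist(c1, c2)


def diversity_cost(paths, d_safe, weight=1.0):
    """Weighted sum of hinge penalties over all UAV pairs and segment pairs."""
    total = 0.0
    for path_u, path_v in combinations(paths, 2):
        for k in range(len(path_u) - 1):
            for l in range(len(path_v) - 1):
                d = segment_distance(path_u[k], path_u[k + 1],
                                     path_v[l], path_v[l + 1])
                total += max(0.0, d_safe - d)
    return weight * total
```

Under this assumed penalty, well-separated trajectories contribute nothing, whereas trajectories that run closer than `d_safe` to one another are penalized in proportion to how close they are.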
Overall, the optimization objective considered in this work is to obtain a set of feasible and spatially differentiated UAV trajectories by minimizing the weighted cost in Equation (19). The first four cost terms evaluate path efficiency, safety, altitude feasibility, and smoothness, whereas the fifth term models the coordination requirement among multiple UAVs through path diversity.
6. Experimental Results and Analysis
To validate the effectiveness of QHHO_GA, two groups of experiments were conducted. The first group evaluates the general optimization capability of the proposed algorithm on the CEC2017 benchmark set, while the second group examines its performance in a multi-UAV path planning task under complex three-dimensional terrain scenarios. In this way, the proposed method is assessed from both the optimization perspective and the application perspective.
6.1. CEC2017 Benchmark Comparison
For the benchmark evaluation, the CEC2017 test set containing 30 functions was adopted. These functions include unimodal, simple multimodal, hybrid, and composition functions, thus covering different levels of search difficulty. Each algorithm was run independently 30 times in MATLAB 2023a with a population size of 30 and a maximum of 300,000 function evaluations. The mean and standard deviation over the runs were used to assess performance and stability, and the Wilcoxon signed-rank test with a significance level of 0.05 was used to evaluate statistical significance. In the following analysis, the symbols “+/=/−” indicate whether QHHO_GA performs better than, equal to, or worse than the comparison algorithms, respectively.
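The statistical protocol above can be illustrated with a small, self-contained Python sketch of a two-sided Wilcoxon signed-rank test and the resulting “+/=/−” mark. Several points are assumptions rather than details taken from this work: the 30 runs of the two algorithms are treated as paired samples, lower values are better, zero differences are dropped, and the p-value uses the normal approximation without tie or continuity correction. The function names are hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation).

    Returns (w_plus, w_minus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                                      # average ranks for ties
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (min(w_plus, w_minus) - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided p-value
    return w_plus, w_minus, p_value


def compare(proposed, baseline, alpha=0.05):
    """Return '+', '=', or '-' for the proposed method (minimization assumed)."""
    w_plus, w_minus, p_value = wilcoxon_signed_rank(proposed, baseline)
    if p_value >= alpha:
        return "="
    # More negative differences mean the proposed method has lower errors.
    return "+" if w_minus > w_plus else "-"
```

In practice, a library routine such as `scipy.stats.wilcoxon` in Python or `signrank` in MATLAB would normally be preferred, since these handle small samples and ties more carefully.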
The algorithms compared with QHHO_GA on the CEC2017 benchmark include PSO, Aging Leader and Challengers Particle Swarm Optimization (ALCPSO) [44], Differential Evolution (DE) [45], HHO, Whale Optimization Algorithm (WOA) [46], Random Walk Grey Wolf Optimizer (RWGWO) [47], Cooperative Dynamic Learning Opposition-Based Bat Algorithm (CDLOBA) [48], Differential Fruit Fly Optimization Algorithm (DFOA) [49], Opposition-Based Sine Cosine Algorithm (OBSCA) [50], and Rule-Based Cultural Algorithm (RCBA) [51]. These algorithms were selected because they represent a range of classical and improved population-based optimization strategies, including particle-based search, differential evolution, hawk-inspired search, whale-inspired search, grey-wolf-based search, bat-inspired learning, fruit-fly optimization, sine-cosine optimization, and rule-based hybrid optimization. Therefore, they provide representative baselines for evaluating the optimization performance of QHHO_GA on benchmark functions.
As shown in Figure 9, the benchmark results are visualized through a rank heatmap of the top-10 algorithms on the CEC2017 test suite. For readability, only the ten highest-ranked methods according to the overall ranking are displayed. Darker cells indicate better rankings, and the black horizontal separators divide the benchmark into four function families, namely unimodal functions (F1–F3), multimodal functions (F4–F10), hybrid functions (F11–F20), and composition functions (F21–F30). The heatmap shows that QHHO_GA maintains consistently competitive rankings across a wide range of benchmark functions.
The convergence curves shown in Figure 10 provide additional evidence that the proposed algorithm is especially effective on hybrid and composition functions, where the search landscape is more complex and the risk of premature convergence is higher.
More specifically, QHHO_GA ranks among the top two on 18 out of the 30 CEC2017 benchmark functions when tied positions are taken into account. It ranks first on F3, F10, F13, F18, F19, F28, and F30, and ranks second on F1, F2, F6, F8, F11, F12, F15, F21, and F26. From the perspective of function categories, the proposed method remains highly competitive on the unimodal group and shows particularly strong robustness on the hybrid and composition groups, where population diversity maintenance and stagnation recovery become more important. Such results suggest that the combination of population enhancement, adaptive search, stagnation monitoring, and RRT-based stagnation adjustment enables QHHO_GA to maintain a strong balance between exploration and exploitation in complex optimization landscapes.
The benchmark results also reveal a clear competitive structure among the compared algorithms. RWGWO and DE are the strongest competitors to QHHO_GA in terms of overall ranking, whereas DFOA and WOA show relatively weak robustness on the CEC2017 benchmark. This indicates that the superiority of QHHO_GA is established not only against weaker baselines, but also against strong competitors from different metaheuristic families.
It should also be noted that QHHO_GA is not ranked first on every benchmark function. Its relatively weaker rankings mainly appear on F4, F5, F7, F9, F14, F16, F17, F20, F22, F23, F24, F25, F27, and F29, where it is usually outperformed by DE, RWGWO, or ALCPSO. A cautious interpretation is that these functions may favor more direct exploitation dynamics or more stable local refinement patterns, whereas the diversity-preserving and stagnation-recovery mechanisms of QHHO_GA become more advantageous when the search landscape is more deceptive, composite, or prone to premature convergence. Therefore, the proposed method should be understood as a robust overall optimizer rather than as the best-performing method on every individual function.
Appendix A Table A1 and Table A2 report the detailed mean and standard deviation results for all compared algorithms.
Table 1 summarizes the final ranking results of QHHO_GA and the comparison algorithms and further confirms the competitiveness of the proposed method on the CEC2017 benchmark set, especially on hybrid and composition functions.
6.2. UAV Experimental Comparison
To further evaluate the practical applicability of QHHO_GA, comparative experiments were conducted on pre-generated three-dimensional hilly terrain maps for multi-UAV path planning. In addition to several baseline algorithms already used in the CEC2017 comparison, this experiment also includes Random Drift Whale Optimization Algorithm (RDWOA) [52], Gaussian Barebone Harris Hawks Optimizer (GBHHO) [53], Enhanced Salp Swarm Algorithm (ESSA) [54,55], RIME Algorithm [56], MCHHO Algorithm [57], Levy-flight Whale Optimization Algorithm (LWOA) [58], and Slime Mould Algorithm (SMA) [59]. These algorithms were selected because they provide representative search strategies for complex path planning scenarios, including adaptive whale optimization, hybrid HHO variants, improved salp-swarm-based search, chaos-enhanced exploration, Levy-flight-based search, and slime-mould-inspired adaptive optimization. Together with OBSCA, ALCPSO, PSO, and WOA, these algorithms serve as the main comparative baselines for the UAV path planning task.
Representative path visualizations are provided for nine UAV scenarios, whereas the quantitative statistical comparison in Table 2 is reported for seven representative scenarios due to space limitations. This design allows both visual inspection of trajectory behaviors and compact quantitative comparison of algorithmic performance.
Table 2 reports the final path-planning cost statistics of all algorithms on seven representative UAV scenarios, together with their average rank and total rank.
The UAV experimental results show that QHHO_GA achieves an average rank of 3.44 and the best total rank among all 12 methods, indicating strong robustness across different terrain scenarios. As shown in Table 2, the proposed algorithm maintains competitive performance not only in terms of final cost, but also in terms of consistency across multiple UAV test cases. Compared with the strongest competing methods, such as RDWOA and GBHHO, QHHO_GA shows better overall stability, while algorithms such as PSO, ALCPSO, and SMA exhibit substantially weaker robustness in complex three-dimensional environments with multiple obstacles. These results suggest that the hybrid design of QHHO_GA is effective in balancing global exploration, local refinement, and path diversity under constrained multi-UAV path planning settings.
The numerical results further indicate that the main strength of QHHO_GA lies in its overall robustness rather than in dominating every individual scenario. In relatively simple environments, several comparison algorithms can occasionally achieve lower mean costs than QHHO_GA. However, as terrain complexity increases, many competing methods exhibit substantial cost degradation, weaker stability, or more homogeneous trajectory patterns, whereas QHHO_GA remains within a consistently competitive performance range. This explains why the proposed method achieves the best total rank even though it is not the best-performing algorithm on every single test case.
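To make the notions of average rank and total rank concrete, the following sketch ranks algorithms within each scenario by mean cost, averages those ranks, and then ranks the averages. It is illustrative only: the tie-handling rule (tied algorithms share the average of their positions), the function names, and the toy cost values in the usage note are assumptions, not data or procedures taken from Table 2.

```python
def rank_with_ties(values):
    """1-based ranks where lower is better; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def average_and_total_rank(costs):
    """costs maps an algorithm name to its list of per-scenario mean costs."""
    names = list(costs)
    n_scenarios = len(costs[names[0]])
    rank_sums = {name: 0.0 for name in names}
    for s in range(n_scenarios):
        scenario_ranks = rank_with_ties([costs[name][s] for name in names])
        for name, rank in zip(names, scenario_ranks):
            rank_sums[name] += rank
    average = {name: rank_sums[name] / n_scenarios for name in names}
    total = dict(zip(names, rank_with_ties([average[name] for name in names])))
    return average, total
```

For example, with the hypothetical input `{"A": [1, 2, 3], "B": [2, 1, 1], "C": [3, 3, 2]}`, algorithm B obtains the best total rank even though it is not the best in the first scenario, which mirrors the distinction drawn above between overall robustness and per-scenario dominance.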
To further visualize the path-planning behavior of the compared algorithms, representative path plots are presented in Figures 11–22. These figures show the three-dimensional views and, where applicable, top views of the generated UAV trajectories under different terrain configurations. Overall, the visual results are consistent with the statistical comparison: in relatively simple scenarios, QHHO_GA remains competitive, while in scenarios with denser obstacles and stronger path interaction requirements, its advantage becomes more evident.
For the relatively simple scenarios represented by f1 and f2 (Figure 11 and Figure 12), QHHO_GA is able to generate feasible and relatively diverse paths while maintaining good path efficiency. In these cases, the shortest-path objective still plays a dominant role, and the performance differences among the strongest methods are not always large. Nevertheless, the comparison plots show that QHHO_GA can still provide more balanced path distributions. In contrast, the paths generated by SMA are vertically duplicated, which introduces unnecessary steering and reduces resource efficiency. In the f2 scenario, PSO requires a pronounced detour, which substantially increases path length, whereas QHHO_GA reaches the target region with a more efficient and better-balanced set of trajectories.
In the scenario of f3 (Figure 13), three obstacles are located directly in the main flight region. Both QHHO_GA and LWOA are able to guide the UAVs around these obstacles; however, QHHO_GA achieves a lower overall path cost by preserving obstacle avoidance performance while still generating more spatially differentiated trajectories. This result suggests that the proposed framework can maintain path diversity without excessively sacrificing path efficiency.
In the scenario of f4 (Figure 14), where four obstacles are distributed inside the terrain, QHHO_GA still produces relatively efficient and well-separated paths. By comparison, the paths generated by GBHHO are more homogeneous and show weaker spatial differentiation. This observation is consistent with the design objective of the proposed method, which explicitly encourages path diversity in multi-UAV missions rather than allowing the trajectories to collapse into highly similar solutions.
The difference becomes more evident in the scenario of f5 (Figure 15), which contains five obstacles and therefore imposes a more complex path interaction structure. In this case, QHHO_GA generates multiple optimized paths from different directions, whereas OBSCA mainly produces a simpler bypass pattern from one side of the terrain. Such behavior indicates that QHHO_GA is more capable of finding alternative feasible routes and is therefore more consistent with the practical requirement of multi-UAV cooperative path planning, where excessive path overlap should be avoided whenever possible.
As the obstacle density increases further, the advantage of QHHO_GA becomes more pronounced. This tendency is already reflected in the quantitative results for f6 and can be seen more clearly in the visual comparison for f7 (Figure 18 and Figure 19). In the seven-obstacle scenario, QHHO_GA enables the five UAVs to identify five feasible paths from three different directions while successfully avoiding obstacles. By contrast, MCHHO tends to generate five highly similar paths from essentially one direction, and one UAV may even fail to obtain a satisfactory obstacle-avoiding solution within the allowed iterations. These results suggest that the proposed framework is more effective in preserving both feasibility and trajectory diversity under increasingly constrained conditions.
A similar pattern is observed in the scenario of f8 (Figure 20 and Figure 21), where obstacles are distributed in a more complex and irregular manner. In this case, QHHO_GA finds five largely distinct paths that start from four directions, with only limited similarity between a small number of trajectories. RDWOA also performs reasonably well in terms of obstacle avoidance and path length; however, its solutions are less diverse overall, with four trajectories being nearly identical and only two main search directions being explored. From the perspective of multi-UAV mission planning, such a path distribution is less desirable than the more spatially differentiated solutions generated by QHHO_GA.
Finally, in the highly constrained scenario of f9 (Figure 22), the terrain contains ten obstacles and imposes the most challenging path planning conditions among the tested visualization cases. Even in this environment, QHHO_GA is still able to generate multiple diverse paths that follow obstacle boundaries accurately while maintaining a relatively low total path cost. Although RDWOA also shows a certain degree of trajectory diversity, its solutions become overly dispersed, resulting in a poorer balance between path cost, altitude cost, and obstacle avoidance. In some cases, this excessive diversity even leads to collisions or less efficient routes. By contrast, QHHO_GA maintains a more effective compromise between diversity and path quality.
In summary, the scenario-based visualization results reinforce the statistical findings reported in Table 2. QHHO_GA performs competitively in simple terrain and shows increasingly clear advantages as obstacle density and path interaction complexity increase. Its main strength lies not only in generating feasible trajectories, but also in producing spatially differentiated cooperative paths while maintaining reasonable path cost. This property is particularly important for practical multi-UAV missions, where path overlap, redundancy, and insufficient coverage may significantly reduce cooperative effectiveness.
7. Conclusions
This paper proposed a hybrid heuristic optimization algorithm, QHHO_GA, and applied it to the multi-UAV path planning problem in complex environments. The proposed framework integrates GA-based population enhancement, Q-learning-guided HHO adaptive search, stagnation monitoring and adaptive parameter regulation, prioritized experience replay, and RRT-based stagnation adjustment into a unified optimization process. Through this design, the algorithm aims to improve search diversity, strengthen the balance between exploration and exploitation, and enhance the ability to escape from locally trapped regions.
The experimental results demonstrate the effectiveness of the proposed method from both the optimization perspective and the application perspective. On the CEC2017 benchmark set, QHHO_GA ranked among the top two on 18 out of 30 test functions, indicating strong competitiveness across different categories of optimization landscapes. In the multi-UAV path planning experiments, QHHO_GA achieved an average ranking of 3.44 and ranked first overall among 12 classical and improved algorithms, showing strong robustness and good solution quality in complex terrain scenarios. In particular, the proposed method demonstrated clear advantages in environments with dense obstacles and complex path interactions, where both search diversity and the ability to avoid premature convergence are critical.
The main strength of QHHO_GA lies in its coordinated hybrid design. GA improves population quality and diversity, HHO provides the core search capability, Q-learning enhances the adaptivity of action selection, and the combination of stagnation monitoring and adaptive parameter regulation with RRT-based stagnation adjustment helps the algorithm restore diversity when progress becomes insufficient. As a result, the proposed framework is able to maintain stable performance in high-dimensional and strongly constrained optimization tasks. In the multi-UAV application considered in this work, the method not only improves path quality and obstacle avoidance performance, but also better supports the generation of spatially differentiated trajectories, which is important for cooperative multi-UAV missions.
Nevertheless, this study still has several limitations. First, the effectiveness of QHHO_GA has been validated only in simulation environments, and no real-world flight experiments have been conducted. Therefore, the current results do not fully reflect the uncertainties, sensor noise, communication delays, and dynamic disturbances that may arise in practical UAV operations. Second, the proposed method has been evaluated on pre-generated terrain and obstacle settings, which, although representative, cannot cover all possible real-world environmental variations. Third, like many population-based hybrid optimization methods, the computational cost of the framework may become a challenge in scenarios with stricter real-time requirements or larger-scale UAV teams.
Future work will therefore address these limitations at both the algorithmic and application levels. On the one hand, real-world flight tests will be conducted to further verify the practicality and robustness of QHHO_GA under realistic sensing, communication, and environmental uncertainties. On the other hand, the framework will be extended to improve real-time responsiveness and adaptability in dynamic environments, for example, through tighter integration with online learning strategies, more efficient parameter control, and multi-sensor information fusion. In addition, reducing computational overhead and improving energy efficiency will be important directions for making the proposed method more suitable for lightweight UAV platforms and large-scale cooperative missions.