1. Introduction
Unmanned aerial vehicles (UAVs) are increasingly utilized across various military and civilian applications, driven by rapid technological advancement [1]. Path planning is a critical technology for enabling autonomous UAV navigation [2,3]. In complex 3D environments, a UAV path must satisfy physical and dynamic constraints, such as maximum turning angle, maximum climb angle, path length, and flight cost, while simultaneously avoiding terrain obstacles and threat areas [1]. Currently, path planning algorithms can be broadly divided into four classes: heuristic algorithms, sampling-based methods, learning-based methods, and swarm intelligence-based methods. Heuristic algorithms, such as A* [4], guide the search process using designed heuristic functions, though designing effective heuristics can be challenging. Sampling-based methods, e.g., the rapidly exploring random tree (RRT) [5], can ensure path feasibility but often cannot guarantee optimality. Learning-based methods, such as reinforcement learning (RL) [6], offer adaptive optimization and generalization capabilities but typically require substantial computational resources and data. Concurrently, the rise of large AI models for end-to-end navigation [7] presents a complementary frontier in which efficient optimizers could be integrated to refine AI-generated paths with guaranteed constraint satisfaction. Swarm intelligence-based algorithms usually possess strong global search capabilities and achieve significant improvements in path planning performance; examples include the Differential Evolution (DE) algorithm [8], Ant Colony Optimization (ACO) [9], the Grey Wolf Optimizer (GWO) [10], the Fireworks Algorithm (FWA) [11], and Particle Swarm Optimization (PSO) [11].
DE, recognized for its structural simplicity and robustness [12], has been effectively applied in a wide range of engineering fields [13,14], including UAV path planning and multi-UAV coordination [15]. Researchers have conducted extensive studies and improvements on the standard DE to enhance its performance. Adaptive DE (JADE) [16] incorporates an optional external archive and adaptive mechanisms to improve diversity and global search ability. Multi-strategy adaptive DE variants (MLSHADE-SPA and SaUSDE) [17,18] dynamically select a suitable mutation strategy based on problem characteristics and dynamic changes during evolution, thereby enhancing performance and convergence speed. Iterative local search adaptive DE (SHADE-ILS) [19] combines adaptive mechanisms with an iterative local search. Multiple-Objective DE (MODE) [20] incorporates multiple differential mutation strategies and crossover operators to handle multi-objective optimization problems. Hybrid DE (HDE) [21] enhances robustness against noise and disturbances. However, existing DE variants still suffer from premature convergence and insufficient search efficiency when dealing with complex problems.
In recent years, several studies have applied DE to the UAV path planning problem. Zhou et al. [22] presented a three-dimensional trajectory planning algorithm for UAVs based on an improved DE algorithm. Moreover, considering that path planning is truly a multi-objective optimization problem, in which the conflicting goals of minimizing path length and maximizing the safety margin can be simultaneously important, Mittal et al. [23] used a hybrid multi-objective evolutionary algorithm to optimize flight distance and risk factor simultaneously, thus generating a set of Pareto-optimal paths. However, in complex 3D environments, these methods often converge slowly, become trapped in local optima, and struggle to produce smooth, feasible paths.
Opposition-Based Learning (OBL) [24] has recently been incorporated into population-based optimization techniques to enhance their search capabilities. OBL, as a relatively simple learning mechanism, can analyze known outputs, derive optimal input parameters, and make adjustments to enhance system performance or improve operational processes. OBL does not require complex data processing, making it simple to implement and computationally cost-effective. Meanwhile, quantum theory has provided new ideas for improving optimization algorithms [25]. When individuals follow the rules of quantum-behaved movement, their solutions exhibit higher diversity, thereby expanding the search range. Quantum-behaved Particle Swarm Optimization (QPSO) utilizes quantum mechanical rules for particle motion to help particles escape local optima [26,27]. The Quantum Firefly Algorithm (QFA) employs quantum movement to maintain population diversity [28].
This paper integrates quantum theory and the OBL mechanism to overcome the limitations of standard DE, and proposes a novel Quantum-behaved Loser Reverse-learning Differential Evolution (QLRDE) algorithm. First, a Loser Reverse-Learning Mechanism (LRLM) is introduced to reconstruct the positions of inferior individuals, thereby increasing population diversity. Second, quantum-behaved mutation strategies are adopted, leveraging the probabilistic nature of wave functions to suppress premature convergence. Third, an adaptive parameter adjustment strategy based on the hyperbolic tangent function is designed to dynamically balance exploration and exploitation of the solution space. Simulation results demonstrate that QLRDE outperforms other algorithms on benchmark test functions. Furthermore, this paper presents a significant application of QLRDE to 3D path planning for UAVs. QLRDE encodes B-Spline-based control points as optimization individuals and employs a comprehensive cost function for the UAV path. Experimental evaluations confirm that QLRDE can generate shorter, lower-altitude, and smoother flight paths than other algorithms.
This paper makes three main contributions. First, we propose the Quantum-behaved Loser Reverse-learning Differential Evolution (QLRDE) algorithm, which incorporates three innovations: a quantum-behaved mutation strategy to suppress premature convergence, the Loser Reverse-Learning Mechanism (LRLM) to enhance population diversity, and an adaptive parameter adjustment mechanism to balance exploration and exploitation. Second, experimental evaluations on twelve benchmark functions confirm that QLRDE demonstrates better performance than existing algorithms in terms of search capability and convergence speed, achieving solutions closer to the true global optimum. Third, the proposed QLRDE is effectively applied to the 3D UAV path planning problem. By encoding B-Spline-based control points as optimization individuals and incorporating real-world constraints into a comprehensive cost function, QLRDE generates shorter, lower-altitude, and smoother flight paths, outperforming other algorithms with respect to path quality and robustness.
This paper proceeds as follows. Section 2 reviews the standard DE algorithm. The detailed QLRDE algorithm is introduced in Section 3. Section 4 compares QLRDE with other algorithms on test functions. Section 5 presents the application of QLRDE to UAV path planning. Section 6 discusses experimental results of QLRDE-based UAV path planning. Finally, Section 7 summarizes this paper.
2. Standard DE Algorithm
Operating as a parallel direct search method, DE follows an iterative, population-based stochastic process consisting of mutation, crossover, and selection. Algorithm 1 presents the general structure of the DE algorithm [12].
| Algorithm 1. Structure of the basic DE algorithm |
| 1. | Set the generation index t = 0 |
| 2. | /*Initialization*/ |
| 3. | An initial population of Mpop individuals is randomly generated. The fitness of each individual is then evaluated |
| 4. | While stopping criteria are unsatisfied, do |
| 5. | generation index t = t + 1 |
| 6. | For i = 1: Mpop do |
| 7. | /*Mutation*/ |
| 8. | generate the mutation vector vi from randomly selected, mutually distinct individuals |
| 9. | /*Crossover*/ |
| 10. | apply the crossover scheme to vi and xi to obtain the trial vector ui |
| 11. | /*Selection*/ |
| 12. | evaluate ui, then perform greedy selection between ui and xi |
| 13. | End for |
| 14. | End while |
The standard DE algorithm consists of the following four steps:
(1) Initialization: Generate a random initial population, in which the i-th individual is denoted as xi = (xi,1, xi,2, …, xi,D), with each component initialized as

xi,j = xj,min + rand · (xj,max − xj,min),  j = 1, 2, …, D,

where D denotes the dimensionality of the search space, i = 1, 2, …, Mpop, xj,min and xj,max are the predefined search bounds, and rand is a uniformly distributed random number within [0, 1].
(2) Mutation: In the g-th generation, the mutation vector vi is generated by linearly combining a scaled difference vector with another individual:

vi = xr1 + F · (xr2 − xr3),

where xr1, xr2, and xr3 are three distinct individuals randomly selected from the population with i ∉ {r1, r2, r3}. The amplification factor F > 0 scales the difference vector.
The vector vi should also satisfy the constraints of the search space; a typical bound-handling rule resets out-of-range components to the nearest bound:

vi,j = min(max(vi,j, xj,min), xj,max).
(3) Crossover: Perform a binomial crossover between vi and xi to generate the trial vector ui:

ui,j = vi,j, if rand ≤ CR or j = jrand; otherwise ui,j = xi,j,

where CR is the crossover rate and jrand is a randomly selected integer from 1 to D, which guarantees that ui inherits at least one component from vi.
(4) Selection: By comparing ui and xi, the vector with the lower cost survives into the next generation:

xi(g+1) = ui, if f(ui) ≤ f(xi); otherwise xi(g+1) = xi,

where f(·) denotes the cost function.
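The four steps can be condensed into a short sketch. The following is a generic DE/rand/1/bin illustration with a sphere cost as a stand-in test function, not the paper's implementation:

```python
import numpy as np

def de(cost, bounds, Mpop=40, gmax=200, F=0.5, CR=0.7, seed=0):
    """Minimal DE/rand/1/bin: initialization, mutation, crossover, selection."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    D = len(lo)
    X = lo + rng.random((Mpop, D)) * (hi - lo)            # initialization
    fit = np.array([cost(x) for x in X])
    for _ in range(gmax):
        for i in range(Mpop):
            # Mutation: three distinct individuals, none equal to i
            r1, r2, r3 = rng.choice([j for j in range(Mpop) if j != i],
                                    size=3, replace=False)
            v = np.clip(X[r1] + F * (X[r2] - X[r3]), lo, hi)
            # Binomial crossover; jrand guarantees one component from v
            jrand = rng.integers(D)
            mask = rng.random(D) <= CR
            mask[jrand] = True
            u = np.where(mask, v, X[i])
            # Greedy selection
            fu = cost(u)
            if fu <= fit[i]:
                X[i], fit[i] = u, fu
    return X[fit.argmin()], fit.min()

best, fbest = de(lambda x: float(np.sum(x**2)),
                 (np.full(5, -5.0), np.full(5, 5.0)))
```

On the 5-D sphere function this sketch converges to a near-zero cost within 200 generations, illustrating the selection pressure of the greedy step.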
3. QLRDE
In the standard DE algorithm, mutation vector updates rely primarily on the differences between individuals and lack clear directionality. Moreover, the DE algorithm tends to overfit the current population, further reducing diversity and weakening global search ability. As a result, DE often becomes stuck in local optima, especially when dealing with high-dimensional complex problems. To address these issues, this paper proposes the QLRDE algorithm by combining quantum-behaved mutation strategies with the Loser Reverse-Learning Mechanism.
3.1. Quantum-Behaved Mutation Strategies
According to the principles of quantum mechanics [25], the population of QLRDE is viewed as a quantum system in which each individual exhibits quantum behavior. Additionally, local attractors serve as centers of an attractive potential field, and individuals within this field are able to explore different positions in the feasible region.
Assume a D-dimensional search space and a population of Mpop individuals. During the g-th iteration, let the position vector of individual i, i = 1, 2, …, Mpop, be xi = (xi,1, xi,2, …, xi,D); then the local attractor qi and mutation vector vi are generated by
where rp, rx, and rand are mutually independent random numbers drawn from a uniform distribution in the range (0, 1). The amplification parameter k controls the intensity of the quantum fluctuations acting on the mutation vector vi: a larger k results in a broader exploration region around the local attractor qi. The nonlinear parameter x controls the shape of the attractive potential, governing the transition from exploration to exploitation and thereby balancing the two capabilities.
3.2. Loser Reverse-Learning Mechanism
The LRLM allows individuals that perform poorly during the search to improve through reverse-learning, explore a wider solution space, enhance global optimization ability, and effectively mitigate premature convergence. The implementation and application of this mechanism are detailed below.
3.2.1. Identifying the Losers
The improvement extent δi of xi after the mutation and crossover operations is defined by

δi = f(xi) − f(ui),

where a large value of δi indicates that the individual is improving rapidly.
Then, linear prediction [29] is utilized to estimate the cost at the final generation gmax by

f̂i(gmax) = f(ui) − δi · (gmax − g).

If the predicted final cost is still inferior to the current best cost min f(xi), individual ui is identified as a "loser", denoted uL. In this case, reverse-learning is applied to the loser crossover vector.
3.2.2. Reverse-Learning
OBL expands the search space and increases population diversity by simultaneously exploring both a solution and its antithetical (opposite) solution, thereby preventing premature convergence to local optima and enhancing global search capability [28,29,30].
Suppose a point x = (x1, x2, …, xD) in the D-dimensional space; its opposite point x̂ = (x̂1, x̂2, …, x̂D) can be represented as

x̂j = aj + bj − xj,

where the variable xj is confined to the interval [aj, bj], and j = 1, 2, …, D.
To enhance the population's probability of locating the global optimum, the reverse-learning operation in this paper is modified to update the loser as follows:
where p1 and p2 are uniformly distributed random numbers within (0, 1). The reverse-learning intensity parameter o primarily influences the magnitude of the reverse-learning perturbation: a larger o increases the adjustment strength applied to the loser individual uL, encouraging its movement toward better regions of the solution space. The limitation parameter w acts as a weighting coefficient that, jointly with the random number p1, determines the distance between the reverse-learning direction and the original opposite point; its role is to maintain diversity and stability in the learning process. In practice, o is commonly chosen from [0.4, 1.0] and w from [0.1, 1.0]. The selection criterion balances exploration intensity (controlled by o) with convergence stability (influenced by w), and is often determined through sensitivity analysis, as demonstrated in Section 6.2.
The procedure of the LRLM is specifically described in Algorithm 2.
| Algorithm 2. Loser Reverse-Learning Mechanism |
| 1. | Require: maximal generation number gmax, population size Mpop |
| 2. | For i = 1: Mpop do |
| 3. | If f(ui) < f(xi) then |
| 4. | δi = f(xi) − f(ui) |
| 5. | If δi∙(gmax − g) < f(ui) − min f(xi) then |
| 6. | The crossover vector ui is denoted as uL. |
| 7. | Set adjustment parameters o, w. |
| 8. | For j = 1: D |
| 9. | |
| 10. | End for |
| 11. | Re-evaluate and update the cost f(ui) |
| 12. | End if |
| 13. | End if |
| 14. | End for |
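The loser test can be sketched directly from lines 3–5 of Algorithm 2. Because the modified reverse-learning update (with o, w, p1, and p2) is not reproduced in the text, the sketch below substitutes the basic opposite point of Section 3.2.2 as a simplified stand-in:

```python
import numpy as np

def loser_reverse_learning(X, U, cost, g, gmax, lo, hi):
    """Apply the LRLM test to each trial vector U[i] (sketch).

    A trial vector that improves on its parent, but whose linearly
    predicted cost at generation gmax is still worse than the current
    best, is a 'loser' and is replaced here by its basic OBL opposite
    point (a simplification of the paper's modified update).
    """
    fX = np.array([cost(x) for x in X])
    fU = np.array([cost(u) for u in U])
    best = fX.min()
    for i in range(len(U)):
        if fU[i] < fX[i]:                           # improved over the parent
            delta = fX[i] - fU[i]                   # improvement extent
            if delta * (gmax - g) < fU[i] - best:   # predicted: still a loser
                U[i] = lo + hi - U[i]               # opposite point in [lo, hi]
                fU[i] = cost(U[i])                  # re-evaluate updated loser
    return U, fU
```

Note that the prediction assumes the current per-generation improvement rate δi persists, which is exactly the linear extrapolation used on line 5 of Algorithm 2.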
3.3. Adaptive Parameter Adjustment Mechanism
In the standard DE algorithm, two control parameters, the amplification factor F and the crossover rate CR, are typically fixed throughout the entire evolutionary process. This static configuration tends to produce an imbalance between search diversity and convergence precision across distinct phases, especially when tackling complex high-dimensional problems. To better balance search diversity and convergence precision throughout the search, an adaptive parameter adjustment mechanism is proposed for F and CR. This mechanism dynamically adjusts these parameters for each individual based on successful mutation events, thereby enhancing the algorithm's robustness and convergence efficiency.
For each individual i in the population, the amplification factor Fi and crossover rate CRi are updated as follows.
3.3.1. Parameter Initialization
Initially, Fi and CRi for each individual are randomly initialized within predefined ranges:

Fi = Fmin + rand · (Fmax − Fmin),  CRi = CRmin + rand · (CRmax − CRmin),

where rand ∈ [0, 1] is a uniformly distributed random number.
3.3.2. Parameter Retention
If a trial vector ui succeeds in replacing the target vector xi (i.e., f(ui) ≤ f(xi)), the parameters that generated this successful mutation are considered favorable. The parameters for individual i in the next generation are updated as follows:
where rand ∈ [0, 1] is a uniformly distributed random number and a ∈ {1, 2, …, Mpop} is a randomly chosen index distinct from i. This update introduces a stochastic element that helps preserve population diversity.
3.3.3. Parameter Updating
If ui fails to outperform xi, the parameters remain unchanged for the next generation, allowing the individual to temporarily maintain its current search direction:

Fi(g+1) = Fi(g),  CRi(g+1) = CRi(g).
3.3.4. Boundary Violation Handling
After each update, the values of Fi and CRi are constrained to their respective allowable ranges to ensure stability:

Fi = min(max(Fi, Fmin), Fmax),  CRi = min(max(CRi, CRmin), CRmax).
This adaptive mechanism enables the algorithm to automatically tailor its search parameters to the landscape of the optimization problem. By promoting effective parameter sets and discouraging ineffective ones, the mechanism contributes to a more efficient trade-off between search diversity and convergence precision. Consequently, the adaptive parameter adjustment mechanism significantly enhances the convergence performance and solution quality of QLRDE.
During exploratory phases, the adaptive mechanism promotes higher Fi and CRi, which enhances the mutation strength of the quantum-behaved mutation for global exploration while allowing the Loser Reverse-Learning Mechanism to actively reconstruct inferior individuals. During exploitation phases, the adaptive mechanism reduces parameter variation through its success-based retention strategy, enabling the quantum-behaved mutation to perform refinement while the loser reverse-learning focuses on fine-tuning near optimal regions.
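The success-based retention logic of Sections 3.3.1–3.3.4 can be sketched as follows; the 0.5 mixing probability and the peer-inheritance rule are assumptions standing in for the paper's Equations (13) and (14):

```python
import numpy as np

rng = np.random.default_rng(2)

def adapt_parameters(F, CR, success, bounds_F=(0.1, 0.9), bounds_CR=(0.2, 0.9)):
    """Success-based parameter adaptation sketch (assumed rule).

    On success, an individual keeps its parameters but, with probability
    0.5, inherits those of a random peer a != i to preserve diversity.
    On failure, parameters are left unchanged. Values are finally clipped
    to their allowable ranges (boundary violation handling).
    """
    Mpop = len(F)
    F, CR = F.copy(), CR.copy()
    for i in range(Mpop):
        if success[i] and rng.random() < 0.5:
            a = rng.choice([j for j in range(Mpop) if j != i])
            F[i], CR[i] = F[a], CR[a]     # inherit a peer's successful setup
    F = np.clip(F, *bounds_F)
    CR = np.clip(CR, *bounds_CR)
    return F, CR
```

The key design point is that parameter values are only propagated along successful trials, so the population's parameter distribution drifts toward settings that actually reduce cost on the current landscape.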
By integrating the adaptive mechanism with the quantum-behaved mutation strategies and the Loser Reverse-Learning Mechanism, we obtain a comprehensive and powerful optimization framework, which is detailed in the complete procedure of the QLRDE algorithm provided in Algorithm 3.
3.4. Process of the QLRDE Algorithm
The specific procedure of QLRDE is shown in Algorithm 3.
| Algorithm 3. QLRDE |
| | /*Parameter setting*/ |
| 1. | Set algorithm parameters gmax, Mpop, Fmax, Fmin, CRmax, CRmin. |
| | /*Initialization*/ |
| 2. | Randomly initialize xi and calculate the costs J(xi). |
| 3. | Initialize amplification factors Fi and CRi. |
| 4. | While g < gmax do |
| 5. | Randomly select a, b ∈ [1, Mpop]. |
| 6. | If rand < 0.5 then |
| 7. | |
| 8. | Else |
| 9. | |
| 10. | End if |
| 11. | For i = 1: Mpop do |
| | /*Quantum behavior mutation*/ |
| 12. | Set adjustment parameters k, x. |
| 13. | If rand < 0.5 then |
| 14. | |
| 15. | Else |
| 16. | |
| 17. | End if |
| | /*Crossover*/ |
| 18. | For j = 1: D do |
| 19. | |
| 20. | End for |
| 21. | End for |
| 22. | Apply the Loser Reverse-Learning Mechanism as described in Algorithm 2. |
| 23. | For i = 1: Mpop do |
| | /*Selection*/ |
| 24. | If J(ui) ≤ J(xi) then |
| 25. | |
| 26. | |
| 27. | |
| 28. | |
| 29. | Else |
| 30. | |
| 31. | |
| 32. | |
| 33. | |
| 34. | End if |
| 35. | End for |
| 36. | g = g + 1 |
| 37. | End while |
| 38. | Return the optimal solution. |
3.5. Computational Complexity
Let Mpop denote the population size, D the problem dimension, Gmax the maximum number of generations, and C the cost of one cost-function evaluation. The QLRDE algorithm consists of two main phases: initialization and iterative optimization. The computational complexity of each part is detailed below.
3.5.1. Initialization
The initialization is executed only once at the beginning of the algorithm. It involves generating Mpop random individuals in the D-dimensional search space and evaluating their costs. The computational complexity of this part is O(Mpop·(D + C)).
3.5.2. Optimization
The optimization is executed in each generation and comprises four components: quantum-behaved mutation, crossover, loser reverse-learning, and selection with adaptive parameter adjustment. The complexity of each component is analyzed below under worst-case assumptions.
(1) Quantum-behaved mutation: In each generation, all Mpop individuals undergo mutation according to Equation (7). This operation involves arithmetic computations for each of the D dimensions per individual. Therefore, the computational complexity of the mutation step is O(Mpop·D).
(2) Crossover: Following mutation, the binomial crossover in Equation (4) is applied to each individual. For each of the D dimensions, a random number is generated and compared with the crossover rate CRnew. The resulting complexity is O(Mpop·D).
(3) Loser reverse-learning: The LRLM (Algorithm 2) is invoked after crossover and involves three steps. First, loser identification computes the improvement extent (Equation (8)) and the predicted cost (Equation (9)) for each trial vector, requiring O(Mpop) operations. Second, reverse-learning application, assuming in the worst case that all Mpop trial vectors are identified as losers, applies Equation (11) to every dimension of each loser, resulting in a complexity of O(Mpop·D). Third, re-evaluation re-computes the cost of the updated losers, contributing O(Mpop·C) operations. Thus, the overall worst-case complexity of the LRLM is O(Mpop·(D + C)).
(4) Selection with adaptive parameter adjustment: First, greedy selection compares each trial vector with its corresponding target vector based on cost, requiring O(Mpop) operations. Second, the adaptive mechanism adjusts the parameters Fi and CRi for every individual according to Equations (13) and (14), which contributes O(Mpop) operations.
Summing the dominant terms from the above components, the computational complexity per generation is O(Mpop·(D + C)). The scalar operations are omitted, as they are dominated by the higher-order terms when D is large. Over Gmax generations, the total computational complexity of QLRDE is O(Gmax·Mpop·(D + C)).
The standard DE algorithm possesses the same asymptotic complexity of O(Gmax·Mpop·(D + C)). Therefore, QLRDE belongs to the identical asymptotic complexity class as standard DE, demonstrating that the introduced quantum-behaved mutation, Loser Reverse-Learning Mechanism, and adaptive parameter adjustment do not increase the algorithm's asymptotic computational burden.
In practical implementation, QLRDE introduces a modest constant-factor overhead due to the additional operations in LRLM. However, this overhead is justified by the algorithm’s significantly enhanced search capability and convergence performance, as evidenced by the experimental results in the following sections. The improved convergence characteristics often enable QLRDE to attain high-quality solutions with fewer generations (Gmax), thereby potentially reducing the total computational time in practice.
In summary, the complexity analysis confirms that QLRDE achieves substantial performance improvements while preserving the computational efficiency of the DE.
5. QLRDE for UAV Path Planning
This section presents a path planning solution based on the QLRDE algorithm. The solution begins by employing B-Spline curves to transform the continuous path planning problem into an optimization problem with a finite number of control points, thereby reducing the dimensionality of the decision variables. A comprehensive cost function that incorporates multiple objectives is established, considering path length, threat minimization, flying height, turning angle, and terrain constraints, thus modeling path planning as an optimization problem. The QLRDE algorithm is then employed to optimize the path, ultimately generating a smooth flight path that satisfies dynamic constraints while minimizing the total cost.
5.1. Path Representation Based on B-Spline Curve
The UAV path comprises N discrete waypoints (excluding the fixed start and target); each waypoint pk, k = 1, …, N, is defined by three coordinates (xpk, ypk, zpk), resulting in a 3N-dimensional decision space. For dimensionality reduction and smooth path generation, a B-Spline strategy [31] is employed to derive the flight trajectory from control points, which ultimately represent the target UAV path.
A UAV path defined by discrete waypoints {p0, p1, …, pN, pN+1} with coordinates (xpk, ypk, zpk) for each waypoint corresponds to a set of control points {w0, w1, …, wn, wn+1} with coordinates (xci, yci, zci), i = 0, …, n + 1. The start (p0, w0) and goal (pN+1, wn+1) are prescribed. Path planning is therefore equivalent to solving for the n free control points that, together with the fixed endpoints, produce the desired B-Spline curve-based path.
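To make the control-point-to-path mapping concrete, the following sketch evaluates a clamped cubic B-Spline through the control points via De Boor's algorithm. The clamped uniform knot vector and cubic degree are assumptions, since the paper defers curve construction to [31]:

```python
import numpy as np

def bspline_point(t, knots, ctrl, p):
    """De Boor's algorithm for a single parameter value t."""
    n = len(ctrl) - 1
    k = int(np.searchsorted(knots, t, side="right") - 1)
    k = min(max(k, p), n)                    # clamp span at the endpoints
    d = [np.asarray(ctrl[j], dtype=float) for j in range(k - p, k + 1)]
    for r in range(1, p + 1):
        for j in range(p, r - 1, -1):
            i = j + k - p
            denom = knots[i + p - r + 1] - knots[i]
            alpha = 0.0 if denom == 0.0 else (t - knots[i]) / denom
            d[j] = (1 - alpha) * d[j - 1] + alpha * d[j]
    return d[p]

def bspline_path(ctrl, p=3, samples=100):
    """Sample a clamped B-Spline path through 3D control points.

    ctrl: (m, 3) array -- fixed start w0, n free points, fixed goal wn+1.
    The clamped knot vector repeats the end knots p+1 times, so the
    curve passes exactly through the first and last control points.
    """
    m = len(ctrl)
    interior = np.linspace(0.0, 1.0, m - p + 1)[1:-1]
    knots = np.concatenate((np.zeros(p + 1), interior, np.ones(p + 1)))
    ts = np.linspace(0.0, 1.0, samples)
    return np.array([bspline_point(t, knots, ctrl, p) for t in ts])

# n = 5 free control points plus the fixed start and goal (m = 7)
ctrl = np.array([[0, 0, 0], [10, 20, 5], [30, 10, 8], [50, 30, 12],
                 [70, 20, 10], [80, 40, 15], [90, 50, 10]], dtype=float)
path = bspline_path(ctrl)
```

With this representation, the optimizer only moves the five interior rows of `ctrl`, and every candidate individual decodes to a smooth, endpoint-anchored path.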
5.2. Model of Cost Function
Path planning is formulated as an optimization task that minimizes a comprehensive cost function J, the core criterion for evaluating path quality [27]. As detailed in Section 5.1, the UAV path is represented by waypoints {p0, p1, …, pN, pN+1} with coordinates (xpk, ypk, zpk), k = 0, …, N + 1. The cost function is as follows:
where the total cost J aggregates six terms: the length cost fLC, threat cost fTC, no-fly zone cost fNFC, altitude cost fAC, lateral maneuvering cost fLMC, and vertical maneuvering cost fVMC.
The length cost fLC is given by the accumulated sum of all consecutive segment lengths along the path:

fLC = Σ (k = 0 to N) ‖pk+1 − pk‖.
Designed to penalize proximity to threats, the threat cost fTC accounts for the cumulative threat along the path, factoring in segment lengths and localized threat probabilities as follows:
where Pj,k, Rmax,j, and dj,k denote, respectively, the threat probability for segment pkpk+1 from the j-th threat, the threat's maximum effective radius, and the segment's distance to the threat center.
The UAV should avoid entering no-fly zones, such as harsh climate zones and unknown zones. The no-fly zone cost function fNFC is calculated as follows:
where N is the number of no-fly zones and Lin,k is the length of the UAV path inside the k-th no-fly zone.
The altitude cost fAC is designed to incentivize low-altitude flight, thereby exploiting terrain masking, and to penalize any path colliding with the terrain, as follows:
where Hmap(xpk, ypk) is the terrain elevation at coordinates (xpk, ypk), Hmin is the minimum allowable flight altitude, and C is the penalty coefficient.
The lateral maneuvering cost fLMC penalizes excessive turning by summing penalties at waypoints where the turning angle exceeds the allowable limit, as follows:
where the turning angle ϕk at pk and the penalty C are as defined above. The allowable turning is constrained by a maximum angle ϕmax, which is determined by the UAV's dynamic limits:
where V is the velocity and nmax is the maximum allowable lateral load.
The vertical maneuvering cost fVMC is computed as the sum of penalties imposed at waypoints that violate the allowable climb/glide slope, as follows:
where the maximum climbing slope αk, minimum gliding slope βk, and the instantaneous path slope sk at waypoint pk are calculated according to the aerodynamic model [27].
5.3. Path Planning Using QLRDE
The core of applying the QLRDE algorithm to UAV path planning lies in effectively mapping the path optimization problem onto the evolutionary search framework of QLRDE. This section details this mapping process, including the encoding of B-Spline control points into QLRDE individuals, the design of the cost function as the fitness evaluation criterion, and the specific procedural integration.
In the B-Spline-based path representation described in Section 5.1, a smooth flight path is determined by a sequence of n free-to-move control points w1, …, wn, where each control point has 3D coordinates. Together with the fixed start point w0 and target point wn+1, these points fully define the path.
In the QLRDE algorithm, each individual in the population represents a candidate solution. Specifically, an individual x in the QLRDE population is encoded as the concatenation of all coordinates of the n free control points:

x = (xc1, yc1, zc1, …, xcn, ycn, zcn).

Thus, the problem dimension is D = 3n. Each individual x corresponds to a unique UAV path generated by the B-Spline curve construction formulas. The population of QLRDE therefore explores a space of potential paths by evolving these control points.
The quality of a path, represented by an individual x, is evaluated by the cost function J(x) defined in Equation (17). This function J serves as the fitness function in the QLRDE algorithm, guiding the evolutionary search towards safer and more efficient paths.
During greedy selection, each trial path is directly compared with its parent based on the total cost J(ui) versus J(xi), which establishes steady selection pressure toward lower-cost paths. The penalty mechanism embedded in the cost function plays a crucial role in steering the search: paths that violate hard constraints (e.g., exceeding the turning or climb-angle limits) receive a heavy penalty of order C = 10^3, raising their total cost far above that of any feasible path and ensuring their rapid elimination. For violations of soft constraints (e.g., proximity to threats), a moderate penalty on the order of 10^1 to 10^2 is applied, allowing some exploration near constraint boundaries while gradually guiding the population toward feasible regions. This differentiated penalty strategy enables the algorithm to explore a broad space that includes slightly infeasible solutions in early generations, yet converge strictly to fully feasible and optimized paths in later stages.
For the adaptive parameter adjustment mechanism, the update of an individual's control parameters Fi and CRi is directly coupled to cost improvement: only when J(ui) < J(xi) are the parameters used in that trial considered successful and retained for the next generation via Equation (13); otherwise, the original parameters are kept. This mechanism ensures that search regions in the control-point space that consistently yield cost reductions acquire enhanced local-search capability, while regions that fail to improve are gradually assigned lower search intensity. In particular, individuals that successfully avoid threat zones or terrain obstacles tend to preserve and propagate their parameters, thereby establishing a search bias toward feasibility and low cost within the population.
The process of integrating the QLRDE algorithm for 3D UAV path planning follows a structured iterative procedure, as illustrated in
Figure 4. Algorithm 4 summarizes the detailed procedure, highlighting the integration points between path planning and the QLRDE optimizer.
| Algorithm 4. QLRDE path planning |
| | /*Parameter setting*/ |
| 1. | Set Start point w0, target point wn+1, terrain data Hmap, UAV parameters, nmax, V, Hmin, QLRDE parameters D, gmax, Mpop, Fmax, Fmin, CRmax, CRmin. |
| | /*Initialization*/ |
| 2. | Generation g = 0 |
| 3. | Randomly initialize a population of Mpop individuals xi, i = 1, …, Mpop, within the mission boundaries; each individual xi represents n control points. Calculate the costs J(xi). |
| 4. | Initialize amplification factors Fi and CRi. |
| 5. | While g < gmax do |
| 6. | Randomly select a, b ∈ [1, Mpop]. |
| 7. | If rand < 0.5 then |
| 8. | |
| 9. | Else |
| 10. | |
| 11. | End if |
| 12. | For i = 1: Mpop do |
| | /*Quantum behavior mutation*/ |
| 13. | Set adjustment parameters k, x. |
| 14. | If rand < 0.5 then |
| 15. | |
| 16. | Else |
| 17. | |
| 18. | End if |
| | /*Crossover*/ |
| 19. | For j = 1: D do |
| 20. | |
| 21. | End for |
| 22. | End for |
| 23. | Apply the Loser Reverse-Learning Mechanism as described in Algorithm 2. |
| 24. | For i = 1: Mpop do |
| | /*Selection*/ |
| 25. | If J(ui) ≤ J(xi) then |
| 26. | |
| 27. | |
| 28. | |
| 29. | |
| 30. | |
| 31. | Else |
| 32. | |
| 33. | |
| 34. | |
| 35. | |
| 36. | |
| 37. | End if |
| 38. | End for |
| 39. | g = g + 1 |
| 40. | End while |
| 41. | Return the Jbest and optimal path defined by the control points of xbest. |
5.4. Synergistic Advantages of QLRDE in Path Planning
The unique mechanisms of the QLRDE algorithm confer distinct synergistic advantages for addressing the UAV path planning problem. Primarily, the quantum-behaved mutation strategy, by virtue of its inherent uncertainty, drives individuals to explore distant regions of the search space. This is crucial for escaping local optima frequently induced by complex obstacle fields, thereby enhancing the probability of discovering a globally competitive path. Furthermore, the LRLM actively revitalizes stagnant individuals, effectively preventing premature convergence of the population to a suboptimal path and promoting the exploration of diverse path alternatives around threats and terrain features, thus maintaining critical population diversity. Ultimately, the adaptive parameter adjustment mechanism dynamically fine-tunes the search intensity: encouraging broad exploration during the early evolutionary stages and subsequently focusing on the local refinement of promising paths, achieving a balance between convergence efficiency and robustness.
6. Evaluation and Comparison in Solving Path Planning
6.1. Simulation Results and Comparison
To experimentally evaluate the designed QLRDE-based path planner, several comparative simulation experiments are conducted. This section compares the QLRDE algorithm with several other algorithms, including the LRDE, GOBLDE, standard DE, and PSO algorithms, under two mission cases. A fair comparison was ensured by adopting identical settings for all algorithms, with the maximum iteration count as the termination criterion and a population size of Mpop = 40. The simulation environment was MATLAB R2024a running on a standard personal computer, on which all algorithms were implemented.
Given the inherent stochasticity of swarm intelligence algorithms, performance was evaluated statistically. For each test case, every algorithm was executed independently for 50 runs. The experiments were conducted in a known rectangular mission environment containing predefined terrain and threats. As shown in
Table 6, the positions of the start point, target point, threats, and no-fly zones are all represented by 2D planar coordinates. In the test cases, the UAV mission area is defined as a square airspace with a side length of 90 km. Threats from anti-aircraft guns, radars, and missiles are all modeled as cylinders of infinite height. For example, {[40, 25], 11} indicates that the center of a ground-based threat is located at coordinates [40, 25] km, with a threat radius of 11 km. No-fly zones are represented as rectangular cuboids of infinite height. For instance, {[9, 43], [23, 43], [23, 27], [9, 27]} denotes that the four vertices of the rectangular no-fly zone are situated at [9, 43] km, [23, 43] km, [23, 27] km, and [9, 27] km, respectively. The UAV path was parameterized using
n = 5 B-Spline control points, resulting in an optimization problem of dimension D = 3n = 15. The UAV was configured with the following operational parameters: nmax = 5, Hmin = 20 m, and V = 200 m/s. All tested algorithms shared a common configuration, with Mpop = 40 and gmax = 400. The remaining parameters were set as follows: for LRDE, DE, and GOBLDE, F is a random number in [0.1, 0.9] and CR = 0.7; for PSO, c1 = c2 = 1.4 and w = 0.9; for QLRDE, both the amplification factor F ∈ [0.1, 0.9] and the crossover rate CR ∈ [0.2, 0.9] are dynamically adjusted during the optimization process.
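The cylindrical-threat and cuboid no-fly-zone models above admit a direct feasibility test on a candidate path. The following Python sketch is illustrative: the function names and waypoint representation are assumptions, altitudes are expressed in km for consistency with the mission coordinates (Hmin = 20 m = 0.02 km), and the paper's full cost function (path length, turning angle, etc.) is omitted.

```python
def inside_cylinder(p, center, radius):
    """True if the 2D projection of waypoint p falls inside a cylindrical
    threat of infinite height, e.g. {[40, 25], 11}."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    return dx * dx + dy * dy < radius * radius

def inside_no_fly(p, xmin, xmax, ymin, ymax):
    """True if p's projection lies inside an axis-aligned rectangular
    no-fly zone of infinite height."""
    return xmin < p[0] < xmax and ymin < p[1] < ymax

def path_feasible(waypoints, threats, zones, h_min=0.02):
    """Reject a candidate path if any sampled waypoint enters a threat
    cylinder, a no-fly cuboid, or flies below the minimum altitude (km)."""
    for p in waypoints:
        if p[2] < h_min:
            return False
        if any(inside_cylinder(p, c, r) for c, r in threats):
            return False
        if any(inside_no_fly(p, *z) for z in zones):
            return False
    return True

threats = [((40.0, 25.0), 11.0)]    # {[40, 25], 11} from Table 6
zones = [(9.0, 23.0, 27.0, 43.0)]   # x-extent [9, 23] km, y-extent [27, 43] km
print(path_feasible([(5.0, 5.0, 0.5), (60.0, 60.0, 0.5)], threats, zones))  # True
print(path_feasible([(40.0, 25.0, 0.5)], threats, zones))                   # False
```

In practice such a check is applied to points densely sampled along the B-Spline curve, not only to the control points, so that the interpolated path segments are also verified.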
Figure 5 shows the best 3D UAV paths in the digital terrain environment obtained by the five algorithms after 50 independent runs for Case 1 and Case 2, respectively, where white cylinders represent threat areas from missiles, radars, and anti-aircraft guns, and blue cubes represent no-fly zones.
Figure 6 shows the horizontal projections of the best UAV paths from
Figure 5 on the contour map, where circular areas represent threats and rectangular areas represent no-fly zones. All algorithms successfully planned safe paths that completely avoid threats.
Figure 7 displays the altitude profiles of the best-performing paths for Case 1 and Case 2. The results show that, over 50 runs, all five optimization algorithms successfully generated feasible paths avoiding all danger zones. However, the paths generated by the QLRDE algorithm are the shortest while also exhibiting smaller variations in flight altitude. By effectively combining threat avoidance and path optimization, QLRDE significantly enhances the survivability and mission effectiveness of the UAV. Compared to the paths obtained by LRDE, GOBLDE, DE, and PSO, the QLRDE-generated paths yield lower cost metrics and smaller standard deviations, demonstrating its stronger search capability.
Algorithm performance can be evaluated using the mean cost and the std, which indicate search capability and stability, respectively. The data in Table 7 indicate that QLRDE achieves the minimum values in all five metrics (best, mean, median, worst, and std), with each optimal value highlighted in bold, showing that QLRDE possesses the strongest optimization capability in a statistical sense. Execution efficiency can be evaluated using the average time (AT, the average runtime of a single run). The data in Table 7 show that QLRDE achieves a shorter average time in both test cases, outperforming LRDE and GOBLDE while remaining competitive with DE and PSO.
Figure 8 presents the convergence curves, showing that the QLRDE converges faster and achieves lower cost values than its counterparts.
To study the distribution characteristics of the solution set, Figure 9 plots the cumulative frequency against the cost value J for the solutions obtained from the independent runs, where the cumulative frequency at any cost threshold J is as follows:

Cumulative frequency(J) = N(Jmin ≤ J)/Ntotal,

where the numerator N(Jmin ≤ J) counts the runs in which the obtained minimum cost is at most J, and the denominator Ntotal = 50 is the total number of independent runs.
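The cumulative frequency statistic can be computed directly from the per-run minimum costs; a minimal Python sketch (with hypothetical cost data, not values from the paper) follows:

```python
import numpy as np

def cumulative_frequency(costs, thresholds):
    """Cumulative frequency F(J) = N(Jmin <= J) / Ntotal, where `costs`
    holds the minimum cost obtained in each independent run."""
    costs = np.asarray(costs, dtype=float)
    return np.array([(costs <= J).mean() for J in thresholds])

# Hypothetical per-run minimum costs (5 runs, for illustration only)
costs = [980.0, 1020.0, 1190.0, 1430.0, 1510.0]
print(cumulative_frequency(costs, [1000, 1500]))  # [0.2 0.8]
```

Plotting this quantity over a sweep of thresholds J yields the curves of Figure 9: the further left and the steeper a curve rises toward 1, the more reliably the algorithm finds low-cost paths.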
Figure 9 clearly shows that the QLRDE algorithm delivers better performance than the comparison algorithms. Taking Case 1 as an example, 95% of the solutions found by QLRDE have a cost below the threshold of 1500; in contrast, GOBLDE and LRDE reach this threshold in only about 75% and 80% of the trials, respectively. In Case 2, 100% of the cost values obtained by QLRDE are below the threshold of 1000, whereas the GOBLDE, LRDE, and DE algorithms achieve this in only about 80% of the runs. These results show that QLRDE has a stronger capability than the other algorithms in solving UAV path planning. In summary, the proposed QLRDE algorithm outperforms GOBLDE, LRDE, DE, and PSO.
6.2. Further Discussion of Algorithm Parameters
In the QLRDE algorithm, the parameters k, x, o, and w significantly influence search performance. To ensure the rigor of the experiments, this study first established reasonable ranges for these parameters. Parameter k controls the amplification factor of differential mutation: too small a value may leave the population lacking exploration capability, whereas an excessively large value may cause oscillations in the solution space. Therefore, k = 3 and k = 4 were selected from the commonly used range for comparison. Parameters x and o primarily affect the shape of the nonlinear adjustment function, influencing the step-size distribution during the search and the ability to escape local optima; their effective intervals are concentrated within [0.5, 0.9], so several representative points were selected for experimentation. The parameter w balances the ratio of mutation to retention: an overly small value may weaken exploration capability, while an overly large value may lead to algorithmic instability. Hence, typical values within [0.4, 0.8] were chosen for testing. Although this approach does not exhaustively traverse all possible combinations, it adequately covers the main effective intervals, ensuring the reliability of the analytical conclusions.
Under the above settings, this study designed 16 parameter combinations, each independently run 50 times in both Case 1 and Case 2, and recorded the best, worst, median, and mean values, as well as std in
Table 8 and
Table 9, with the top four values per row marked in bold. The experimental results indicate that, regarding parameter
k, the overall mean value for
k = 4 is significantly lower than that for
k = 3, demonstrating stronger convergence performance. For the parameter
w, although
w = 0.4 achieved favorable mean values in some combinations, nearly all the best and second-best results corresponded to
w = 0.8, indicating that
w = 0.8 exhibits greater stability when combined with
k = 4. In terms of the nonlinear coefficients, the combination
x = 0.7 and
o = 0.9 achieved the smallest mean value in both Case 1 and Case 2, representing the global optimum. It also demonstrated greater stability and better robustness across different experiments.
As reported in Table 8 and Table 9, the combination k = 4, x = 0.7, o = 0.9, and w = 0.8 achieved the smallest best, median, mean, and std values in both Case 1 and Case 2.
In summary, three main conclusions can be drawn: first, k = 4 is superior to k = 3; second, w = 0.8 frequently appears in high-quality solutions and ensures stable performance; and third, x = 0.7 and o = 0.9 provide a good balance between mean performance and stability. Therefore, the optimal parameter combination is determined as k = 4, x = 0.7, o = 0.9, and w = 0.8, which achieves a low mean cost while maintaining robust performance and general applicability.
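The parameter study above can be sketched as a simple grid sweep over the candidate values. In the Python sketch below, the candidate sets for x, o, and w and the planner interface run_planner(k, x, o, w) are illustrative assumptions chosen so that the grid yields 16 combinations; only the summary statistics mirror those reported in Tables 8 and 9.

```python
import itertools
import statistics

def sweep(run_planner, n_runs=50):
    """Run the planner n_runs times per parameter combination and record
    the summary statistics (best, worst, median, mean, std).

    `run_planner(k, x, o, w)` is assumed to return the final cost J of
    one independent planning run; the candidate grids below are
    illustrative, not the paper's exact values.
    """
    grid = itertools.product([3, 4],        # k
                             [0.5, 0.7],    # x
                             [0.7, 0.9],    # o
                             [0.4, 0.8])    # w
    results = {}
    for k, x, o, w in grid:
        costs = sorted(run_planner(k, x, o, w) for _ in range(n_runs))
        results[(k, x, o, w)] = {
            "best": costs[0],
            "worst": costs[-1],
            "median": statistics.median(costs),
            "mean": statistics.fmean(costs),
            "std": statistics.stdev(costs),
        }
    return results
```

Ranking the combinations by mean and std then identifies the most robust setting, mirroring the selection of k = 4, x = 0.7, o = 0.9, and w = 0.8 above.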
The experiments in the two test cases demonstrate the efficacy of the QLRDE algorithm for UAV path planning. In the two test cases, QLRDE achieved the shortest path lengths, with best costs of 269 (average execution time of 93.27 s per run) and 253 (average execution time of 80.57 s per run), respectively. The altitude profiles show that paths generated by QLRDE exhibit smaller variations in flight altitude than those of the other algorithms. Moreover, the cumulative frequency analysis shows that 95% of the solutions found by QLRDE have a cost below the threshold of 1500, demonstrating higher solution reliability than the other algorithms. Additionally, the analysis of 16 parameter combinations identified the optimal set as k = 4, x = 0.7, o = 0.9, and w = 0.8, which achieved the lowest mean cost and the most robust performance across both test cases. Overall, the experimental results demonstrate that QLRDE outperforms the other tested algorithms in solution quality, convergence speed, and computational efficiency.
7. Conclusions
This paper contributes an improved QLRDE algorithm to overcome the local optima and premature convergence of standard DE, together with its application to UAV path planning. The proposed QLRDE integrates three innovations: a quantum-behaved mutation strategy to suppress premature convergence, the LRLM to enhance population diversity, and an adaptive parameter adjustment mechanism to improve robustness and convergence efficiency. These improvements strengthen global exploration and convergence capabilities. Experimental results on twelve benchmark functions showed that QLRDE achieves enhanced convergence speed, solution quality, and stability compared with several state-of-the-art algorithms. Applied to 3D UAV path planning, QLRDE generates short, low-altitude paths while satisfying realistic constraints including maximum turning angle, terrain avoidance, and threat zones. Results from the path planning experiments demonstrate QLRDE's advantages over other algorithms in achieving higher-quality solutions, faster convergence, and more efficient computation.
QLRDE is suitable for medium-dimensional optimization problems (e.g., 20–30 dimensions), where it effectively escapes local optima. It also performs well in engineering applications such as UAV 3D path planning and robotic trajectory optimization, which require smooth and feasible solutions under multiple physical and environmental constraints. Furthermore, in scenarios demanding high robustness, QLRDE maintains consistent performance and stable solution quality across multiple independent runs. However, the current QLRDE is limited to single-UAV path planning and does not support cooperative multi-UAV coordination, which involves challenges such as inter-agent collision avoidance and communication constraints. In addition, the algorithm is an offline planning method that assumes a static environment, and thus does not handle dynamic threats or moving obstacles that would require real-time replanning capabilities.
Future work will focus on two aspects. First, experimental validation will be conducted on real drone platforms or using public UAV path planning datasets to assess the physical feasibility and performance of the planned paths under real-world conditions. Second, we will extend the research to the multi-UAV cooperative path planning problem, incorporating more practical constraints.