Article

Research on Trajectory Planning for a Limited Number of Logistics Drones (≤3) Based on Double-Layer Fusion GWOP

College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
*
Author to whom correspondence should be addressed.
Drones 2025, 9(10), 671; https://doi.org/10.3390/drones9100671
Submission received: 7 August 2025 / Revised: 6 September 2025 / Accepted: 22 September 2025 / Published: 24 September 2025

Abstract

Trajectory planning for logistics UAVs in complex environments faces a key challenge: balancing global search breadth with fine constraint accuracy. Traditional algorithms struggle to simultaneously manage large-scale exploration and complex constraints, and lack sufficient modeling capabilities for multi-UAV systems, limiting cluster logistics efficiency. To address these issues, we propose a GWOP algorithm based on dual-layer fusion of GWO and GRPO and incorporate a graph attention network (GAT). First, CEC2017 benchmark functions evaluate GWOP convergence accuracy and balanced exploration in multi-peak, high-dimensional environments. A hierarchical collaborative architecture, “GWO global coarse-grained search + GRPO local fine-tuning”, is used to overcome the limitations of single-algorithm frameworks. The GAT model constructs a dynamic “environment–UAV–task” association network, enabling environmental feature quantification and multi-constraint adaptation. A multi-factor objective function and constraints are integrated with multi-task cascading decoupling optimization to form a closed-loop collaborative optimization framework. Experimental results show that in single UAV scenarios, GWOP reduces flight cost (FV) by over 15.85% on average. In multi-UAV collaborative scenarios, average path length (APL), optimal path length (OPL), and FV are reduced by 4.08%, 14.08%, and 24.73%, respectively. In conclusion, the proposed method outperforms traditional approaches in path length, obstacle avoidance, and trajectory smoothness, offering a more efficient planning solution for smart logistics.

1. Introduction

With the rapid development of e-commerce retail, higher demands have been placed on express delivery, driving the evolution of smart logistics from “automated warehousing” to “full-chain drone logistics.” Drones have experienced significant growth due to their advantages in flexible low-altitude deployment and strong coverage in remote areas. The market is projected to grow at an annual rate of 55%, expanding from USD 544 million in 2022 to USD 18.311 billion by 2030 [1]. Application scenarios include urban distributed warehouse departures, time-windowed multi-point deliveries, and agricultural supply transportation in complex rural terrains. However, the large-scale adoption of logistics drones heavily depends on adaptive trajectory planning that balances delivery cost and safety. Compared to scenarios such as infrastructure inspection [2], agricultural pest control [3], and disaster search and rescue [4], logistics drone applications impose stricter constraints, necessitating further optimization of traditional trajectory planning.
The core challenge in logistics drone trajectory planning is to find the optimal path from start to destination under multiple constraints. Due to the complexity of logistics environments, classical algorithms [5,6] often fall into local optima. To address this, [7] integrated a triple-improved adaptive neighborhood search into the A* algorithm, introducing a 5-neighborhood search mechanism to enhance planning efficiency. Ref. [8] proposed an improved Dubins-RRT* algorithm to incorporate curvature constraints, enabling adaptation to multiple delivery waypoints. In recent years, meta-heuristic algorithms have become dominant due to their ability to handle complex constraints and high-dimensional spaces.
For example, [2] introduced a cooperative coverage path optimization method based on particle swarm optimization (PSO), achieving parallel optimization of path efficiency, image quality, and computational complexity. Ref. [9] combined GWO with differential evolution (DE) to propose HGWODE, optimizing trajectory length and smoothness. Ref. [10] enhanced adaptability in dynamic environments by introducing roulette selection and Levy flight strategies, improving path length optimization by 82.4%. Ref. [4] proposed a heuristic cross-search and rescue optimization (HC-SAR), enhancing drone conflict avoidance in 3D threat environments through real-time correction and B-spline interpolation. Ref. [11] integrated reinforcement learning (RL) with the artificial bee colony algorithm (ABC), dynamically adjusting search dimensions to improve path accuracy.
In multi-drone collaboration, hybrid and learning-based frameworks have gained attention. Ref. [12] proposed a hybrid algorithm combining improved Golden Jackal Optimization (MGJO) and Dynamic Window Approach (DWA), reducing local optima occurrences in large-scale scenarios from six to two. Ref. [13] introduced an improved Marine Predators Algorithm (MMPA), enhancing convergence speed by 40% through adaptive parameters and Cauchy mutation. Ref. [14] proposed a Multi-Agent Cooperative Navigation System (MACNS) based on Graph Neural Networks (GNNs), integrating PPO for dynamic traffic coordination. Ref. [15] proposed a RL-based Multi-Strategy Cuckoo Search (RLMSCS), reducing threat cost by 21.1% through dynamic parameter adjustment. Ref. [16] explored learning-based adaptive information path planning (AIPP), while [14] proposed GPPO, combining GNN and PPO to improve multi-agent coordination. These studies demonstrate the growing effectiveness of learning-enhanced algorithms in UAV logistics.
The Gray Wolf Optimization (GWO) algorithm, introduced in 2014, simulates wolf pack hierarchy to achieve rapid convergence in UAV trajectory planning. Ref. [17] proposed BGWO with a beetle antenna strategy, improving benchmark function accuracy by 35%. Ref. [3] introduced NAS-GWO, integrating Gaussian mutation and spiral functions to reduce cost values by 13.48%. Ref. [18] proposed RGWO with adaptive inertia weights and escape mechanisms to enhance search efficiency.
Despite its strengths, GWO faces limitations in complex urban environments: (1) It relies solely on α/β/δ wolves for global search without a hierarchical framework of “coarse-grained exploration + fine-grained optimization”. (2) It lacks dynamic coupling between “environmental features” and “UAV tasks,” limiting adaptability. (3) It struggles with multi-dimensional modeling in cooperative operations, requiring a shift from “feasible” to “high-quality” solutions.
To address these issues, this study proposes a hierarchical collaboration framework combining the GAT feature input layer with the GRPO decision-making strategy. By integrating GWO with Group Relative Policy Optimization (GRPO) and incorporating GAT-based dynamic association decoupling and multi-task cascading optimization, the framework enhances global search and real-time strategy adaptation. It supports dynamic task priority changes [19] and sudden obstacle avoidance [20] without reinitializing the population, showing promise in smart logistics [21] and collaborative inspection [22]. Compared to hybrid methods like GWO-DE [9] and RL-ACO [23], the GWOP framework achieves breakthroughs through the following mechanisms:
  • Two-layer Fusion mechanism: A two-layer fusion architecture integrating “GWO global search + GRPO strategy optimization” is proposed to overcome the performance separation between “global search” and “local convergence” inherent in single algorithms. By leveraging the α/β/δ wolf cooperation mechanism in GWO for large-scale space exploration, combined with GRPO’s reward normalization and fine trajectory adjustment via KL constraints, a hierarchical collaboration of “coarse-grained search + fine-grained optimization” is achieved. This provides an effective framework for addressing path planning challenges in complex environments.
  • GAT Feature Correlation Modeling: We construct an “environment—unmanned aerial vehicle” dynamic correlation network. By adaptively allocating attention weights, the network can capture in real-time the correlation strength between the dynamic distribution of environmental obstacles and logistics tasks. Integrated with GAT-based graph convolution quantization and global adaptive parallel decoupling strategies, it effectively quantifies dynamic environmental features, enabling seamless switching between static map pre-planning and online dynamic obstacle avoidance.
  • CEC2017 test model: The improved double-layer fusion GWOP algorithm was evaluated against multiple other algorithms using the CEC2017 test functions, and its adaptability was demonstrated through analyses of convergence accuracy and fitness value. The algorithm’s strengths in balancing exploration across multimodal and high-dimensional problems were examined, leading to the conclusion that the GWOP algorithm exhibits superior performance in complex optimization environments.
  • Modeling of target constraint conditions: Considering the actual flight scenarios of logistics drones in a comprehensive manner, a multi-factor objective function and corresponding constraint conditions are established to enable dynamic strategy adjustment under multiple constraints. This approach addresses the limitations of traditional algorithms in terms of “coarse feature extraction and reliance on static models” within complex environments, thereby advancing path planning from “physical environment adaptation” to “task semantic-driven decision-making.”
  • Multi-task Cascading Decoupling Optimization: By integrating GAT association decoupling and GRPO gradient optimization, the issues of “task conflict” and “resource imbalance” in multi-logistics UAV collaborative operations are resolved. This achieves collaborative optimization across “path length—obstacle avoidance effect—trajectory smoothness,” integrating architectural diversity and breaking the limitation of “single-index optimization” in traditional algorithms. It promotes the evolution of intelligent logistics route planning from “feasible solutions” to “high-quality solutions.”
The organization of this paper is structured as follows: Section 2 outlines the working process of the classic Gray Wolf Optimizer (GWO) algorithm. Section 3 introduces the proposed double-layer fusion GWOP algorithm, which integrates GWO and GRPO, and further incorporates the GAT layer attention mechanism and B-spline trajectory smoothing technique. Section 4 describes the real-world unmanned aerial vehicle (UAV) flight environment and experimental procedures, while also evaluating the performance of multiple algorithms on the CEC2017 test functions. Section 5 establishes the mathematical model of the objective function and constraint conditions for the UAV flight environment, based on which comparative experiments are conducted between the improved double-layer GWOP algorithm and the classic GWO algorithm. Following the initial results, six additional classical algorithms are introduced to assess the robustness of the proposed method. Section 6 presents a comprehensive discussion and comparative analysis of the experimental outcomes involving the improved double-layer GWOP algorithm and other classical algorithms. Finally, Section 7 provides a summary of the study along with recommendations for future research directions.

2. GWO Trajectory Planning Algorithm

The Gray Wolf Optimizer (GWO) is a population-based optimization algorithm inspired by the hunting behavior and social hierarchy of gray wolves. The algorithm categorizes individuals into four roles: alpha (α), beta (β), delta (δ), and omega (ω), as illustrated in Figure 1, encompassing the entire process of hunting, encircling, and attacking prey. It achieves a balance between global exploration and local exploitation through the iterative adjustment of coefficients A and C, where A decreases over iterations and C is a random number within a defined range.
The process of the Grey Wolf Optimizer (GWO) algorithm is outlined in Algorithm 1, involving key parameters such as population size N, dimensionality dim, maximum number of iterations T_max, and boundary initialization [lb, ub]. During the iterative process, a new position X_i(t) is constructed using the decreasing coefficient A, which varies with each iteration, along with random coefficients s_k and C_k. After applying boundary constraints to the newly generated position, the three hierarchical roles of gray wolves (α, β, δ) are updated based on their fitness values.
Algorithm 1: GWO Algorithm Workflow
  Input: N, T_max, dim, [lb, ub]
  Output: X*, f*
1  X_i(0) ← lb + s_i ⊙ (ub − lb), i = 1, 2, …, N   // s_i ~ U(0, 1)
2  f(X_i(0)) ← evaluate_fitness(X_i(0))
3  X_α(0) = argmin_i f(X_i(0))
4  X_β(0) = argmin_{i≠α} f(X_i(0))
5  X_δ(0) = argmin_{i≠α,β} f(X_i(0))
6  for t = 1 to T_max do
7    a(t) = 2(1 − t/T_max)
8    for i = 1 to N do
9      C_k = 2 s_k, k = 1, 2, 3   // s_k ~ U(0, 1)
10     D_{α,i}(t) = |C_1 ⊙ X_α(t−1) − X_i(t−1)|, D_{β,i}(t) = |C_2 ⊙ X_β(t−1) − X_i(t−1)|
11     D_{δ,i}(t) = |C_3 ⊙ X_δ(t−1) − X_i(t−1)|
12     A_k = 2 a(t) s_{k+3} − a(t), k = 1, 2, 3   // s_{k+3} ~ U(0, 1)
13     X_1^(c) = X_α(t−1) − A_1 ⊙ D_{α,i}(t), X_2^(c) = X_β(t−1) − A_2 ⊙ D_{β,i}(t)
14     X_3^(c) = X_δ(t−1) − A_3 ⊙ D_{δ,i}(t)
15     X_i(t) = (X_1^(c) + X_2^(c) + X_3^(c)) / 3
16     for j = 1 to dim do
17       if X_{i,j}(t) < lb_j then
18         X_{i,j}(t) ← lb_j
19       end
20       if X_{i,j}(t) > ub_j then
21         X_{i,j}(t) ← ub_j
22       end
23     end
24   end
25   f(X_i(t)) ← evaluate_fitness(X_i(t)); X_α(t), X_β(t), X_δ(t) = arg top-3 min_i f(X_i(t))
26 end
27 return X* = X_α(T_max), f* = f(X_α(T_max))
Initialize the position of each gray wolf individual by setting the population size of the gray wolf group as N, and initialize the position X_i(0) of the i-th gray wolf as follows.
X_i(0) = lb + s_i ⊙ (ub − lb), i = 1, 2, 3, …, N
In the formula, i represents the index of individuals within the gray wolf population. lb and ub denote the lower and upper bound vectors of the search space, respectively. ℝ^dim signifies the dim-dimensional real space. s_i indicates the random vector for the i-th gray wolf, uniformly distributed in the interval [0, 1]. The symbol ⊙ denotes element-wise multiplication.
By calculating the fitness of each gray wolf in the initial population, the “leadership hierarchy” of the population is determined. This process is used to identify the positions of the top three individuals: the best (α), the second-best (β), and the third-best (δ) wolves.
X_α(0) = argmin_i f(X_i(0)),  X_β(0) = argmin_{i≠α} f(X_i(0)),  X_δ(0) = argmin_{i≠α,β} f(X_i(0))
In the formula, X_α(0), X_β(0), and X_δ(0) represent the position vectors of the α, β, and δ wolves at the initial time t = 0, respectively. X_i(0) denotes the position vector of the i-th gray wolf at the initial time. argmin_i traverses all gray wolf indices i. f(X_i(0)) represents the fitness value of the i-th gray wolf. The traversal range i ≠ α excludes the gray wolf already selected as α, while the range i ≠ α, β excludes those already selected as α and β. The fitness function is evaluated to determine the fitness value of each individual's position. To balance global and local exploration diversity, a dynamic adjustment model is introduced as follows.
a(t) = 2(1 − t/T_max)
In the formula, t denotes the current number of iterations, a ( t ) represents the control parameter at iteration t, and T max indicates the maximum number of iterations of the algorithm. The positions of α, β, and δ wolves are utilized to continuously update the positions of the remaining wolves [24] as follows.
D_{α,i}(t) = |C_1 ⊙ X_α(t−1) − X_i(t−1)|
D_{β,i}(t) = |C_2 ⊙ X_β(t−1) − X_i(t−1)|
D_{δ,i}(t) = |C_3 ⊙ X_δ(t−1) − X_i(t−1)|
C_k = 2 r_k, k = 1, 2, 3
In the formula, D_{α,i}(t), D_{β,i}(t), and D_{δ,i}(t) represent the distance vectors between the i-th gray wolf and the α, β, and δ wolves at iteration t, respectively. X_α(t−1), X_β(t−1), and X_δ(t−1) denote the position vectors of the α, β, and δ wolves at iteration t − 1. X_i(t−1) represents the position vector of the i-th gray wolf at iteration t − 1. The symbol ⊙ stands for the element-wise product, and |·| denotes the element-wise absolute value. Since r_k represents a random vector in the range [0, 1], the coefficients C_k, i.e., C_1, C_2, and C_3, are random coefficient vectors in the range [0, 2]. The candidate location model [25] is calculated as follows.
X_1^(c) = X_α(t−1) − A_1 ⊙ D_{α,i}(t)
X_2^(c) = X_β(t−1) − A_2 ⊙ D_{β,i}(t)
X_3^(c) = X_δ(t−1) − A_3 ⊙ D_{δ,i}(t)
A_k = 2 a(t) s_{k+3} − a(t), k = 1, 2, 3
In the formula, X_α(t−1), X_β(t−1), X_δ(t−1), D_{α,i}(t), D_{β,i}(t), D_{δ,i}(t), ⊙, and |·| have the same meanings as in Formula (4) above, and a(t) has the same meaning as in Formula (3) above. X_1^(c), X_2^(c), and X_3^(c) represent the candidate position vectors guided by the leader wolves α, β, and δ, respectively. Since s_{k+3} represents a random number in the range [0, 1], the coefficients A_k, i.e., A_1, A_2, and A_3, are direction-control coefficients in the range [−a(t), a(t)]. By combining the guidance of the leader wolves with random disturbances, the average of the final three candidate positions is used to update the new position X_i(t) of the gray wolf [26,27].
X_i(t) = (1/3)(X_1^(c) + X_2^(c) + X_3^(c))
After completing the position update of the new gray wolf, boundary constraint processing is performed.
x_{i,j}(t) = { lb_j,        if x_{i,j}(t) < lb_j
            { ub_j,        if x_{i,j}(t) > ub_j     j = 1, 2, …, dim
            { x_{i,j}(t),  otherwise
In the formula, j represents the dimension index, lb_j and ub_j denote the lower and upper bounds of the j-th dimension of the search space, respectively, and x_{i,j}(t) represents the j-th component of the position of the i-th gray wolf at iteration t. The best three solutions are reselected by calculating the fitness of the new population.
X_α(t), X_β(t), X_δ(t) = arg top-3 min_i f(X_i(t))
In the formula, X_α(t), X_β(t), and X_δ(t) denote the position vectors of the α, β, and δ wolves in generation t, respectively, and i denotes the individual index of the gray wolves. arg top-3 min represents the selection of the three individuals with the best fitness values. X_i(t) represents the position vector of the i-th gray wolf at iteration t, and f(X_i(t)) its fitness value. Finally, the global optimal solution is output.
X* = X_α(T_max),  f* = f(X_α(T_max))
In the formula, X * denotes the global optimal solution, f * represents the optimal fitness value, T max signifies the maximum number of iterations, and X α ( T max ) indicates the position of the α-wolf at the final iteration.
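The update rules above (the decaying a(t), distance vectors D, coefficients A_k and C_k, candidate averaging, and boundary clamping) can be condensed into a short sketch. This is a minimal illustrative implementation, not the paper's code; the population size, iteration budget, and fitness function are placeholder choices.

```python
import numpy as np

def gwo(fitness, dim, lb, ub, n_wolves=30, t_max=200, seed=0):
    """Minimal sketch of the classic GWO loop described above.

    `fitness` maps a position vector to a scalar cost (lower is better).
    All defaults here are illustrative, not the paper's settings.
    """
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    # Initialize positions: X_i(0) = lb + s_i * (ub - lb), s_i ~ U(0, 1)
    X = lb + rng.random((n_wolves, dim)) * (ub - lb)
    f = np.apply_along_axis(fitness, 1, X)

    for t in range(1, t_max + 1):
        # Rank the pack: alpha, beta, delta are the three best wolves
        leaders = X[np.argsort(f)[:3]]
        a = 2.0 * (1.0 - t / t_max)              # a(t) decays linearly 2 -> 0
        for i in range(n_wolves):
            cand = np.zeros(dim)
            for leader in leaders:
                C = 2.0 * rng.random(dim)            # C_k in [0, 2]
                A = 2.0 * a * rng.random(dim) - a    # A_k in [-a(t), a(t)]
                D = np.abs(C * leader - X[i])        # distance to leader
                cand += leader - A * D               # leader-guided candidate
            X[i] = np.clip(cand / 3.0, lb, ub)       # average + bound clamp
        f = np.apply_along_axis(fitness, 1, X)

    best = int(np.argmin(f))
    return X[best], f[best]
```

For example, on a 5-dimensional sphere function the loop converges toward the origin within a few hundred iterations.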

3. Double-Layer GWOP Fusion Algorithm Design

3.1. GAT Layer Attention Mechanism

A GAT (Graph Attention Network) is a neural network built upon graph structures, and its algorithmic flow is outlined in Algorithm 2. By constructing association weights between nodes through adaptive learning, GAT accurately captures the relationships between nodes and their neighbors. It relies on the linear transformation of node features combined with Softmax normalization to perform weighted aggregation of neighbor information. Furthermore, GAT enhances the representation of key feature information through multi-head attention concatenation.
Algorithm 2: GAT Attention Mechanism Workflow
  Input: G, d_f, K, d_h, E, η, B, λ
  Output: GAT parameters Θ
1  W_k = RandomMatrix(d_h, d_f), ∀k ∈ {1, 2, …, K}
2  a_k = RandomVector(2 d_h), ∀k ∈ {1, 2, …, K}
3  Θ = {W_1, a_1, W_2, a_2, …, W_K, a_K}
4  for epoch = 1 to E do
5    foreach G_b ∈ MiniBatch(G, B) do
6      H_0 = ExtractNodeFeatures(G_b)
7      for k = 1 to K do
8        H_k = W_k H_0
9        foreach (i, j) ∈ G_b do
10         h_{i,k} = H_k[i]
11         h_{j,k} = H_k[j]
12         e_{i,j,k} = LeakyReLU(a_k^T [h_{i,k} ‖ h_{j,k}])
13         e_k[i, j] = e_{i,j,k}
14       end
15       α_k = Softmax(e_k, dim = 1)
16       h′_{i,k} = Σ_{j∈N(i)} α_{i,j,k} h_{j,k}
17       Z_k[i] = ELU(h′_{i,k})
18     end
19     Z = Concat(Z_1, Z_2, …, Z_K)
20     Γ_b = ComputeLoss(Z, target)
21     Γ = Γ + Γ_b + λ‖Θ‖₂²
22     Θ = Θ − η ∇_Θ Γ
23   end
24 end
25 return Θ
The weight matrix W_k ∈ ℝ^{d_h×d_f} and attention coefficient vector a_k ∈ ℝ^{2 d_h} of each attention head k ∈ {1, 2, …, K} [28] are initialized as follows.
W_k = RandomMatrix(d_h, d_f), k ∈ {1, 2, …, K}
a_k = RandomVector(2 d_h), k ∈ {1, 2, …, K}
In the formula, ℝ represents the real number field, d_h denotes the transformed feature dimension of a node, d_f indicates the initial feature dimension of a node, and ℝ^{d_h×d_f} gives the dimension of the weight matrix W_k. K represents the number of attention heads. For each attention head k, the initial feature matrix H_0 is linearly transformed using the weight matrix W_k.
H_k = W_k H_0 ∈ ℝ^{N×d_h}
In the formula, H_0 represents the initial node feature matrix, N denotes the number of nodes in the batch, and H_k indicates the node feature matrix after transformation. To capture the "UAV-obstacle" avoidance association [29] and achieve a better trajectory, the attention vector is combined with a LeakyReLU computation, and the feature correlation coefficient e_{i,j,k} [30,31] obtained after concatenating the transformed features of neighboring nodes is computed as follows:
e_{i,j,k} = LeakyReLU(a_k^T [h_{i,k} ‖ h_{j,k}])
In the formula, a_k represents the attention vector of the k-th attention head, and LeakyReLU denotes the activation function. h_{i,k} and h_{j,k} represent the transformed features of node i and its neighbor j under the k-th attention head, and ‖ denotes the feature concatenation operation. To address the issue that different neighbors have different effects on the current node, for each node i the neighbor attention coefficients are normalized using the Softmax function [32]:
α_{i,j,k} = Softmax_j(e_{i,j,k}), j ∈ N(i)
In the formula, N(i) denotes the set of neighbors of node i, and α_{i,j,k} represents the normalized attention weight assigned by node i to its neighbor j under the k-th attention head. By incorporating the characteristics of unmanned aerial vehicles (UAVs) into the "nearest obstacle distance" feature, and subsequently aggregating the flight path features of UAVs, adaptive aggregation of neighboring features can be achieved [33].
h′_{i,k} = Σ_{j∈N(i)} α_{i,j,k} h_{j,k}
Z_k[i] = ELU(h′_{i,k})
In the formula, h′_{i,k} denotes the aggregated feature of node i under the k-th attention head, where Σ_{j∈N(i)} α_{i,j,k} h_{j,k} signifies the weighted summation over its neighbors; Z_k[i] represents the output feature of node i under the k-th attention head, and ELU refers to the exponential linear unit, primarily utilized to break the dependency of linear features.
ELU(x) = { x,            x ≥ 0
         { α(e^x − 1),   x < 0
In the formula, α represents a constant. The complementary patterns captured by different attention heads are combined via multi-head concatenation to output the node matrix Z:
Z = Concat(Z_1, Z_2, …, Z_K) ∈ ℝ^{N×(K·d_h)}
In the formula, Z_k denotes the output feature matrix of the k-th attention head, Concat refers to the feature concatenation operation, and K·d_h represents the feature dimension after concatenation. Finally, by quantifying the prediction error of the model, overfitting can be prevented, enabling the GAT to progressively learn more feature information about dangerous obstacles:
Γ_b = ComputeLoss(Z, target)
Γ = Γ + Γ_b + λ‖Θ‖₂²
Θ = Θ − η ∇_Θ Γ
In the formula, Γ_b denotes the predicted loss for the specified batch; Z signifies the model's output node features, and target indicates the true value labels of the nodes. λ represents the regularization coefficient; Θ denotes the set of model parameters, ‖Θ‖₂² refers to the L2 regularization term, and Γ represents the total loss. η stands for the learning rate, and ∇_Θ Γ represents the gradient of the loss function with respect to the parameters Θ, which characterizes the current trend of the loss function with respect to the parameters.
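The per-head pipeline above (linear transform, LeakyReLU attention scores, Softmax over each neighborhood, ELU aggregation, multi-head concatenation) can be sketched as a single forward pass with random, untrained weights. This is a minimal illustration, not the paper's trained model; the head count, dimensions, and LeakyReLU slope are placeholder choices.

```python
import numpy as np

def gat_layer(H0, adj, K=2, d_h=4, seed=0):
    """Sketch of one multi-head GAT layer as in the equations above.

    H0: (N, d_f) node features; adj: (N, N) 0/1 adjacency defining N(i)
    (assumed to include self-loops so every row has a neighbor).
    Random weights stand in for trained parameters; all names illustrative.
    """
    rng = np.random.default_rng(seed)
    N, d_f = H0.shape
    heads = []
    for _ in range(K):
        W = rng.normal(size=(d_h, d_f))   # W_k in R^{d_h x d_f}
        a = rng.normal(size=2 * d_h)      # a_k in R^{2 d_h}
        Hk = H0 @ W.T                     # linear transform of node features
        # e_{i,j} = LeakyReLU(a^T [h_i || h_j]) for every edge (i, j)
        e = np.full((N, N), -np.inf)
        for i in range(N):
            for j in range(N):
                if adj[i, j]:
                    z = a @ np.concatenate([Hk[i], Hk[j]])
                    e[i, j] = z if z > 0 else 0.2 * z   # LeakyReLU
        # Softmax over each node's neighborhood (non-edges get weight 0)
        alpha = np.exp(e - e.max(axis=1, keepdims=True))
        alpha = np.where(np.isfinite(e), alpha, 0.0)
        alpha /= alpha.sum(axis=1, keepdims=True)
        h = alpha @ Hk                                    # weighted aggregation
        heads.append(np.where(h >= 0, h, np.exp(h) - 1))  # ELU
    return np.concatenate(heads, axis=1)                  # Z: (N, K*d_h)
```

With N = 4 nodes, K = 2 heads, and d_h = 4, the concatenated output Z has shape (4, 8), matching Z ∈ ℝ^{N×(K·d_h)}.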

3.2. Group Relative Policy Optimization

The GRPO (Group Relative Policy Optimization) algorithm is illustrated in Figure 2. Its design eliminates the Critic model, thereby overcoming the limitations of large-scale training inherent in traditional reinforcement learning methods. The core idea of GRPO is to derive an estimation benchmark by comparing multiple output results against one another, enabling advantage estimation without a value network.
The GRPO algorithm process is outlined in Algorithm 3, starting with the initial policy π_θ. In the nested loop, a batch D_b is first sampled from the dataset D, and G outputs are collected for each query in the batch using policy π_{θ_old} to compute the output rewards. By constructing the objective function J_GRPO(θ) and integrating it with the advantage function A_i, the policy parameter θ is updated via gradient ascent.
Algorithm 3: GRPO Algorithm Workflow
  Input: π_{θ_init}, r_φ, D, ε, β, u
  Output: π_θ
1  π_θ = π_{θ_init}
2  for iteration = 1 to I do
3    π_ref = π_θ
4    for step = 1 to I do
5      Sample a batch D_b from D
6      π_{θ_old} = π_θ
7      foreach q ∈ D_b do
8        Sample G outputs {o_i}_{i=1}^G ~ π_{θ_old}(o|q)
9        for i = 1 to G do
10         r_i ← r_φ(o_i, q)
11       end
12       mean_r = (1/G) Σ_{i=1}^G r_i,  std_r = √((1/G) Σ_{i=1}^G (r_i − mean_r)²)
13       for i = 1 to G do
14         r̃_i = (r_i − mean_r) / std_r
15       end
16     end
17     for GRPO iteration = 1 to u do
18       J_GRPO(θ) = E_{q~P(Q), {o_i}_{i=1}^G ~ π_{θ_old}(o|q)} [(1/G) Σ_{i=1}^G ψ − β D_KL(π_θ ‖ π_ref)]
19       ψ = min((π_θ(o_i|q)/π_{θ_old}(o_i|q)) A_i, clip(π_θ(o_i|q)/π_{θ_old}(o_i|q), 1−ε, 1+ε) A_i)
20       A_i = (r_i − mean({r_1, r_2, …, r_G})) / std({r_1, r_2, …, r_G})
21       θ ← θ + η ∇_θ J(θ)
22     end
23   end
24   Update r_φ using a replay buffer with samples {(q, o, r_φ(o, q))}
25 end
26 return π_θ
The GRPO objective function primarily comprises three components: the sampling ratio of old and new policies, the policy clipping objective, and the KL divergence regularization term. In contrast to the classic PPO algorithm, the GRPO algorithm does not require a value network and predominantly employs group sampling to achieve efficient advantage estimation. The continuous update model of the strategy is realized by enhancing the reward-penalty mechanism within the objective function [34] as follows:
J_GRPO(θ) = E_{q~P(Q), {o_i}_{i=1}^G ~ π_{θ_old}(o|q)} [ (1/G) Σ_{i=1}^G ψ − M ]
In the above formula, J_GRPO(θ) denotes the objective function value of the GRPO model; E represents the mathematical expectation over sampling from the problem distribution P(Q) and the output group generated by the old policy. q ~ P(Q) signifies sampling the input according to the task probability. {o_i}_{i=1}^G ~ π_{θ_old}(o|q) indicates generating G candidate outputs from the old policy π_{θ_old} for each sampled q. (1/G) Σ_{i=1}^G refers to averaging over the G candidate outputs within each group. ψ and M represent the policy clipping objective and the KL divergence regularization term, respectively [35].
ψ = min(ξ A_i, clip(ξ, 1−ε, 1+ε) A_i)
M = β D_KL(π_θ ‖ π_ref)
In the above formula, ε denotes the clipping threshold, and clip(ξ, 1−ε, 1+ε) limits the amplitude of the policy ratio. β represents the weight of the regularization term, primarily serving to balance policy improvement against the reference constraint. π_ref refers to the reference policy, and D_KL(π_θ ‖ π_ref) indicates the KL divergence between the policy π_θ and the reference policy π_ref. ξ and A_i represent the sampling ratio of the old and new policies and the estimated intra-group advantage, respectively:
ξ = π_θ(o_i|q) / π_{θ_old}(o_i|q)
A_i = (r_i − mean({r_1, r_2, …, r_G})) / std({r_1, r_2, …, r_G})
In the above formula, π_θ(o_i|q) denotes the probability of the current policy generating o_i, and π_{θ_old}(o_i|q) signifies the probability of the old policy generating o_i prior to the update. When ξ > 1, the new policy is more likely to generate o_i; when ξ < 1, the new policy reduces the generation probability of o_i. r_i represents the raw reward of the i-th output, mean({r_1, r_2, …, r_G}) denotes the average of all rewards within the group, and std({r_1, r_2, …, r_G}) their standard deviation. When A_i > 0, the output performs better than the group average; when A_i < 0, it performs worse than the group average.
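The group-relative advantage A_i and the clipped term ψ can be sketched for one sampled group as follows. This is a minimal illustration under the assumption of a single group of G outputs, with the KL regularization term M omitted for brevity; all names are illustrative.

```python
import numpy as np

def grpo_surrogate(logp_new, logp_old, rewards, eps=0.2):
    """Sketch of the critic-free GRPO surrogate for one sampled group.

    logp_new / logp_old: log-probabilities of the G outputs under the new
    and old policies; rewards: raw rewards r_i. The KL term M is omitted.
    """
    r = np.asarray(rewards, float)
    # Group-relative advantage: A_i = (r_i - mean) / std, no value network
    A = (r - r.mean()) / (r.std() + 1e-8)
    # Importance ratio: xi = pi_theta(o_i|q) / pi_old(o_i|q)
    xi = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    # Clipped objective: psi = min(xi * A_i, clip(xi, 1-eps, 1+eps) * A_i)
    psi = np.minimum(xi * A, np.clip(xi, 1 - eps, 1 + eps) * A)
    return psi.mean()
```

When the new and old policies coincide (ξ = 1 for every output), ψ reduces to A_i and the group average of the surrogate is zero, which matches the baseline role of the group mean.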

3.3. Double-Layer GWOP Algorithm Design

The two-layer fusion GWOP algorithm integrates the combination process of GWO and GRPO, with the introduction of the GAT attention mechanism, as illustrated in the overall fusion framework shown in Figure 3. Through the three-dimensional raster coding environment depicted in the upper left of the figure, the representation of digital spatial grid information is completed [36]. By meticulously recording the performance of each trajectory algorithm in 3D route planning, fast search and accurate evaluation of the optimal route planning algorithm are achieved. The top right corner of the figure presents the fundamental framework of the Gray Wolf Optimization (GWO). Building upon this, the GAT attention mechanism in the bottom right corner and the GRPO algorithm are fused to form a comprehensive integration process of “GWO group search + GRPO strategy optimization + GAT graph structure perception.” Finally, the output results are applied to the attitude control of the UAV in the bottom right corner, providing an effective solution for real-time path planning of logistics UAVs in unknown environments.
The process of the GWOP fusion algorithm is outlined in Algorithm 4. First, the GRPO objective is embedded into the GWO fitness calculation as J_c = J_GRPO(θ_c). On this basis, the GAT attention mechanism is incorporated to establish associations among key features, serving as the fitness constraint term J_GRPO(θ_i) = f_task(θ_i) + λ·GAT(G, θ_i) for the GWO algorithm. The complete optimal strategy is obtained via bidirectional collaborative feedback between the GAT attention weights and the GWO search.
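The coupling J_GRPO(θ_i) = f_task(θ_i) + λ·GAT(G, θ_i) amounts to a weighted composition of a task score and a GAT-derived constraint term; a minimal sketch follows, in which both callables and the weight λ are hypothetical placeholders rather than the paper's implementation.

```python
def gwop_fitness(theta, f_task, gat_penalty, lam=0.5):
    """Sketch of the coupled fitness J_GRPO(theta_i) = f_task + lambda * GAT term.

    f_task scores the trajectory encoded by theta (length, smoothness, ...);
    gat_penalty is the GAT-derived constraint term on the graph G. Both
    callables and the weight lam are illustrative placeholders.
    """
    return f_task(theta) + lam * gat_penalty(theta)
```

Each wolf's position θ_i is then ranked by this composed value inside the GWO loop, so the GAT term steers the swarm away from constraint-violating trajectories.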
Algorithm 4: Double-Layer GWOP Fusion Algorithm Workflow
  Input: N, T_max, dim, [lb, ub], GRPO parameters ε, β, G; GAT parameters d_f, d_h, K, E, η, B, λ
  Output: θ*, J_GRPO, GAT parameters Θ
1  θ_i = lb + s_i ⊙ (ub − lb), s_i ~ U(0, 1)^dim
2  for i = 1 to N do
3    J_i = J_GRPO(θ_i)
4    f_i = J_i
5  end
6  θ_α = argmin_i f_i
7  θ_β = argmin_{i≠α} f_i
8  θ_δ = argmin_{i≠α,β} f_i
9  for t = 1 to T_max do
10   a = 2(1 − t/T_max)
11   for i = 1 to N do
12     for k = 1 to 3 do
13       s_k = Random(0, 1)
14       C_k = 2 s_k   // s_k ~ U(0, 1)
15       D_{α,i} = |C_1 ⊙ θ_α − θ_i|
16       D_{β,i} = |C_2 ⊙ θ_β − θ_i|
17       D_{δ,i} = |C_3 ⊙ θ_δ − θ_i|
18     end
19     for k = 1 to 3 do
20       s_{k+3} = Random(0, 1)
21       A_k = 2 a s_{k+3} − a   // s_{k+3} ~ U(0, 1)
22       θ_k^(c) = θ_{{α,β,δ}[k]} − A_k ⊙ D_{{α,β,δ}[k],i}
23     end
24     θ_c = (θ_1^(c) + θ_2^(c) + θ_3^(c)) / 3
25     for j = 1 to dim do
26       if θ_{c,j} < lb_j then
27         θ_{c,j} = lb_j
28       end
29       if θ_{c,j} > ub_j then
30         θ_{c,j} = ub_j
31       end
32     end
33     J_c = J_GRPO(θ_c)
34     f_c = J_c
35     if f_c < f_i then
36       θ_i = θ_c
37       f_i = f_c, J_i = J_c
38     end
39   end
40   θ_α, θ_β, θ_δ = arg top-3 min_i f_i
41 end
42 θ* = θ_α
43 J_GRPO = f(θ_α)
44 for k = 1 to K do
45   W_k = RandomMatrix(d_h, d_f)
46   a_k = RandomVector(2 d_h)
47 end
48 Θ = {W_1, a_1, W_2, a_2, …, W_K, a_K}
49 for epoch = 1 to E do
50   foreach G_b ∈ MiniBatch(G, B) do
51     H_0 = ExtractNodeFeatures(G_b)
52     for k = 1 to K do
53       H_k = W_k H_0
54       foreach (i, j) ∈ G_b do
55         h_{i,k} = H_k[i]
56         h_{j,k} = H_k[j]
57         e_{i,j,k} = LeakyReLU(a_k^T [h_{i,k} ‖ h_{j,k}])
58         e_k[i, j] = e_{i,j,k}
59       end
60       α_k = Softmax(e_k, dim = 1)
61       h′_{i,k} = Σ_{j∈N(i)} α_{i,j,k} h_{j,k}
62       Z_k[i] = ELU(h′_{i,k})
63     end
64     Z = Concat(Z_1, Z_2, …, Z_K)
65     Γ_b = ComputeLoss(Z, target)
66    Γ = Γ + Γ b + λ | | Θ | | 2 2
67    Θ = Θ η Θ Γ
68   end
69   end
70   return   θ * , J G R P O *   ,   Θ
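The per-head attention computation in the GAT training phase of Algorithm 4 (linear projection, LeakyReLU scoring over edges, softmax normalization, and neighbour aggregation) can be sketched with NumPy. This is a minimal single-head sketch under our own naming; the dense edge-score matrix and the 0.2 LeakyReLU slope are illustrative choices, not the paper's implementation:

```python
import numpy as np

def gat_head(H, W, a, edges):
    """One GAT attention head: h'_i = W h_i, e_ij = LeakyReLU(a^T [h'_i; h'_j]),
    alpha_ij = softmax over neighbours, h''_i = sum_j alpha_ij h'_j."""
    Hp = H @ W.T                                  # projected features h'_i
    n = Hp.shape[0]
    e = np.full((n, n), -np.inf)                  # -inf masks non-neighbours
    for i, j in edges:
        z = np.concatenate([Hp[i], Hp[j]]) @ a    # a^T [h'_i ; h'_j]
        e[i, j] = z if z > 0 else 0.2 * z         # LeakyReLU, slope 0.2
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)     # row-wise softmax over edges
    return alpha @ Hp                             # aggregated h''_i
```

Each node should carry a self-loop in the edge list so that its softmax row is well defined; multi-head output is then the concatenation of several such heads, as in line 64 of Algorithm 4.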
The GWO "wolf pack" corresponds to a set of candidate strategies for logistics drones, comprising multiple strategy parameter vectors $\theta_i$. Under any selected strategy $\theta_i$, the logistics UAV executes dynamic trajectory planning and goods distribution. The trajectory performance under $\theta_i$ is evaluated using the GRPO algorithm, and the resulting optimal "α wolf" guides population evolution. By integrating the two-tier architecture of GWO and GRPO, the GWOP algorithm realizes a high-quality global/local search strategy. The initialization of the GWOP model [37] is as follows:
$$\theta_i \in \mathbb{R}^{dim}, \quad \theta_i = lb + s_i \odot (ub - lb), \quad s_i \sim U(0,1)^{dim}$$
$$W_k \sim \mathcal{N}(0, \sigma^2), \quad a_k \sim \mathcal{N}(0, \sigma^2), \quad k = 1, 2, \ldots, K$$
In the formula, $\theta \in \mathbb{R}^{dim}$ denotes the parameter vector of the reinforcement-learning strategy, $\mathbb{R}$ represents the real number field, and $dim$ signifies the dimension of the strategy parameters. $\theta_i$ represents the strategy parameters of the $i$-th individual in the population, $lb$ and $ub$ denote the lower and upper bounds of the parameters, $s_i$ is a random vector whose elements follow a uniform distribution on (0, 1), and $\odot$ signifies element-wise multiplication. $W_k$ represents the weight matrix of the $k$-th attention head, $a_k$ the attention coefficient vector of the $k$-th attention head, $k$ the index of the attention head, and $K$ the total number of attention heads. The reward value model in the fitness evaluation of GRPO [38] is as follows:
$$\tilde{r}_i = \frac{r_i - \mathrm{mean}_r}{\mathrm{std}_r}, \quad \mathrm{mean}_r = \frac{1}{G}\sum_{i=1}^{G} r_i, \quad \mathrm{std}_r = \sqrt{\frac{1}{G}\sum_{i=1}^{G}\left(r_i - \mathrm{mean}_r\right)^2}$$
In the formula, $r_i$ denotes the original reward of the $i$-th sample and $\tilde{r}_i$ its standardized reward. $G$ represents the number of samples of the same type of action, $\mathrm{mean}_r$ the arithmetic mean of the $G$ rewards, and $\mathrm{std}_r$ their standard deviation. The fitness $J(\theta_i)$ of the $i$-th gray wolf individual under the modified GRPO objective function is [39,40,41]:
$$J_{GRPO}(\theta_i) = f_{task}(\theta_i) + \lambda\, GAT(G, \theta_i)$$
$$f_{task} = \mathbb{E}_{q \sim P(Q),\ o \sim \pi_{\theta_{old}}(o|q)} \left[ \frac{1}{G} \sum_{i=1}^{G} \psi - \beta\, D_{KL}\!\left(\pi_{\theta_i} \,\|\, \pi_{\theta_{old}}\right) \right]$$
$$\psi = \min\!\left( \frac{\pi_{\theta_i}(o|q)}{\pi_{\theta_{old}}(o|q)}\, \tilde{r}_i,\ \mathrm{clip}\!\left(\frac{\pi_{\theta_i}(o|q)}{\pi_{\theta_{old}}(o|q)},\ 1-\varepsilon,\ 1+\varepsilon\right) \tilde{r}_i \right)$$
In the formula, $f_{task}(\theta_i)$ denotes the task objective function, $\psi$ the objective term, $\lambda$ the balance coefficient, $GAT(G, \theta_i)$ the graph-structure constraint term, and $G$ the graph-structure data. $\mathbb{E}$ represents the expectation operation, $q$ the environmental state, and $P(Q)$ the state distribution; $o$ represents the action performed by the agent in state $q$, and $\pi_{\theta_{old}}(o|q)$ the action distribution of the old policy. $G$ also denotes the number of action samples in the same state, and $\sum_{i=1}^{G} \psi$ the sum of $\psi$ over the $G$ samples. $\beta$ represents the penalty coefficient of the KL divergence $D_{KL}(\pi_{\theta_i} \| \pi_{\theta_{old}})$. $\theta_i$ denotes the parameters of the current candidate strategy and $\theta_{old}$ those of the old strategy; $\frac{\pi_{\theta_i}(o|q)}{\pi_{\theta_{old}}(o|q)}$ is the probability ratio of output action $o$ between the new and old strategies in state $q$, $\varepsilon$ is the clipping coefficient, and $\mathrm{clip}\big(\frac{\pi_{\theta_i}(o|q)}{\pi_{\theta_{old}}(o|q)},\ 1-\varepsilon,\ 1+\varepsilon\big)$ clips the ratio to the interval $[1-\varepsilon, 1+\varepsilon]$. The model converting the maximization objective of GRPO into the minimization objective of GWO is:
$$f(\theta_i) = -J_{GRPO}(\theta_i)$$
In the formula, $J_{GRPO}(\theta_i)$ denotes the performance evaluation value of GRPO for strategy $\pi_{\theta_i}$, and $f(\theta_i)$ denotes the fitness function value of the $i$-th individual in GWO; the sign change converts GRPO's maximization objective into GWO's minimization objective. Coefficients of the GWO update mechanism [42,43]:
$$a(t) = 2\left(1 - t/T_{\max}\right)$$
$$C_k = 2 r_k, \quad r_k \sim U(0,1), \quad k = 1, 2, 3$$
$$A_k = 2 a(t)\, r_{k2} - a(t), \quad r_{k2} \sim U(0,1), \quad k = 1, 2, 3$$
In the formula, $t$ denotes the current iteration and $T_{\max}$ the maximum number of iterations. $r_k$ and $r_{k2}$ represent uniformly distributed random numbers in (0, 1). $C_k$ is the encircling coefficient and $A_k$ the direction coefficient [44].
$$h_i' = W_k h_i, \quad i \in \mathcal{V}$$
$$e_{i,j} = \mathrm{LeakyReLU}\!\left(a_k^{T}\,[h_i' ; h_j']\right), \quad (i, j) \in \mathcal{E}$$
$$\alpha_{i,j} = \frac{\exp(e_{i,j})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{i,k})}$$
$$h_i'' = \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\, h_j'$$
In the formula, $h_i$ denotes the original feature vector of node $i$, $W_k$ the weight matrix of the $k$-th attention head, $h_i'$ the projected feature vector of node $i$, and $\mathcal{V}$ the node set of the graph. $e_{i,j}$ represents the raw attention coefficient between nodes $i$ and $j$, $a_k^{T}[h_i'; h_j']$ the inner product with the concatenated features, $[h_i'; h_j']$ the feature concatenation operation, and $\mathcal{E}$ the edge set of the graph. $\alpha_{i,j}$ is the normalized attention weight of node $i$ toward neighbor $j$, $\exp(e_{i,j})$ the exponential of the raw coefficient, $\sum_{k \in \mathcal{N}(i)} \exp(e_{i,k})$ the sum of the exponential coefficients over all neighbors of node $i$, and $\mathcal{N}(i)$ the neighbor set of node $i$. $h_i''$ is the new feature vector after node $i$ aggregates its neighbors' features, and $\sum_{j \in \mathcal{N}(i)} \alpha_{i,j} h_j'$ the weighted summation. $C_k$, the encircling coefficient, measures the distance to the leading wolves and is used for the candidate strategy updates [45]:
$$D_\alpha = |C_1 \odot \theta_\alpha - \theta_i|, \quad D_\beta = |C_2 \odot \theta_\beta - \theta_i|, \quad D_\delta = |C_3 \odot \theta_\delta - \theta_i|$$
$$\theta_1^{(c)} = \theta_\alpha - A_1 \odot D_\alpha, \quad \theta_2^{(c)} = \theta_\beta - A_2 \odot D_\beta, \quad \theta_3^{(c)} = \theta_\delta - A_3 \odot D_\delta$$
$$\theta_c = \left(\theta_1^{(c)} + \theta_2^{(c)} + \theta_3^{(c)}\right) / 3$$
In the formula, $\theta_\alpha$, $\theta_\beta$, $\theta_\delta$, and $\theta_i$ respectively denote the current optimal, suboptimal, and third-best strategies and the parameter vector of the current individual; $\odot$ represents element-wise multiplication. $D_\alpha$, $D_\beta$, and $D_\delta$ respectively denote the parameter difference vectors with respect to the α, β, and δ wolves. $\theta_1^{(c)}$, $\theta_2^{(c)}$, and $\theta_3^{(c)}$ respectively denote the candidate parameter vectors guided by the α, β, and δ wolves, and $\theta_c$ denotes the final candidate strategy integrating the three leader-guided strategies. The values of $A_1$, $A_2$, and $A_3$ are dynamically adjusted by the output of GAT [46]:
$$A_k = 2a\left(1 + \gamma\, \mathrm{GAT}_{attention}(i, j)\right) - a$$
In the formula, $A_k$ denotes the control coefficient for the gray wolf position update, $a$ the basic control parameter of GWOP, and $\gamma$ a scaling factor; $\mathrm{GAT}_{attention}(i, j)$ is the attention weight computed by the GAT model for the node pair $(i, j)$, where $i$ is the current node and $j$ a neighboring node. The greedy update and convergence model [47] is as follows:
$$\Theta = \Theta - \eta\, \nabla_\Theta \left[ L_{task} + \lambda\, L_{GWOP} \right]$$
$$\theta^* = \theta_i^{(new)} \quad \text{if} \quad J\!\left(\theta_i^{(new)}\right) > J(\theta_i)$$
In the formula, $\Theta$ denotes the set of learnable GAT parameters, $\eta$ the learning rate, and $\nabla_\Theta$ the gradient of the loss function with respect to $\Theta$. $L_{task}$ represents the task-specific loss, $\lambda$ the balance coefficient, and $L_{GWOP}$ the GWOP guiding loss. $\theta_i^{(new)}$ denotes the new position of the $i$-th gray wolf individual and $J(\theta_i^{(new)})$ the corresponding fitness; $\theta_i$ represents its current position and $J(\theta_i)$ the current fitness, so a wolf is replaced only when the new position improves its evaluation.
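The encircling-distance and candidate-update equations of this subsection can be sketched as a single position-update step (a minimal sketch with illustrative names; here the coefficients $C_k$ and $A_k$ are drawn per element, a common GWO convention):

```python
import random

def gwo_step(theta_i, leaders, a, rng=random.Random(1)):
    """One GWO update: D = |C * theta_leader - theta_i|,
    candidate = theta_leader - A * D for each of the alpha/beta/delta wolves,
    theta_c = element-wise mean of the three leader-guided candidates."""
    dim = len(theta_i)
    candidates = []
    for leader in leaders:                       # alpha, beta, delta wolves
        cand = []
        for j in range(dim):
            C = 2.0 * rng.random()               # encircling coefficient
            A = 2.0 * a * rng.random() - a       # direction coefficient
            D = abs(C * leader[j] - theta_i[j])
            cand.append(leader[j] - A * D)
        candidates.append(cand)
    return [sum(c[j] for c in candidates) / 3.0 for j in range(dim)]
```

As $a$ decays toward zero over the iterations, $A_k \to 0$ and the update collapses onto the average of the three leaders, which is the exploitation phase of the search.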

3.4. B-Spline Trajectory Curve

In applying B-spline trajectory curves to the delivery routes of logistics drones, given the knot vector $U = \{u_0, u_1, \ldots, u_m\}$, the $i$-th basis function of degree $p$, $N_{i,p}(u)$, is defined as follows:
$$N_{i,0}(u) = \begin{cases} 1, & u_i \le u < u_{i+1} \\ 0, & \text{otherwise} \end{cases}$$
$$N_{i,p}(u) = \frac{u - u_i}{u_{i+p} - u_i}\, N_{i,p-1}(u) + \frac{u_{i+p+1} - u}{u_{i+p+1} - u_{i+1}}\, N_{i+1,p-1}(u)$$
In the formula, $u_i$ are the values of the non-decreasing knot vector, and $N_{i,p}(u)$ is the $i$-th B-spline basis function of degree $p$, where $i$ is the basis-function index and $p$ the degree. For $n+1$ control vertices $P_0, P_1, \ldots, P_n \in \mathbb{R}^d$, where $d$ is the dimension, the B-spline curve is the weighted sum of the control vertices:
$$C(u) = \sum_{i=0}^{n} N_{i,p}(u)\, P_i, \quad u \in [u_p, u_{m-p}]$$
In the above formula, $P_i$ represents the control vertices, $C(u)$ is the B-spline curve function, $m$ is the total number of knots, and $u$ is the parameter variable serving as the input of the curve. The minimum-smoothness model of the B-spline curve is as follows:
$$J = \int_{u_0}^{u_n} \left[ \omega_1 \|\ddot{C}(u)\|^2 + \omega_2 \|C(u) - Q(u)\|^2 \right] du$$
In the above formula, $J$ represents the objective of the trajectory curve, and $\omega_1$ and $\omega_2$ are the weight coefficients of the curve; $C(u)$ is the optimized B-spline curve and $\ddot{C}(u)$ its second derivative with respect to the parameter $u$; $Q(u)$ is the reference parametric curve, $\|C(u) - Q(u)\|^2$ is the squared $L_2$ norm of the difference between $C(u)$ and $Q(u)$, $\omega_2 \|C(u) - Q(u)\|^2$ penalizes deviation from the ideal trajectory $Q(u)$, and $\omega_1 \|\ddot{C}(u)\|^2$ penalizes the non-smoothness of the curve.
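The Cox-de Boor recursion and the weighted-sum evaluation of $C(u)$ above can be sketched directly (a minimal sketch; function names are illustrative, and 0/0 terms in the recursion are treated as zero by convention):

```python
def bspline_basis(i, p, u, knots):
    """Cox-de Boor recursion for N_{i,p}(u); zero-denominator terms drop out."""
    if p == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + p] != knots[i]:
        left = ((u - knots[i]) / (knots[i + p] - knots[i])
                * bspline_basis(i, p - 1, u, knots))
    if knots[i + p + 1] != knots[i + 1]:
        right = ((knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1])
                 * bspline_basis(i + 1, p - 1, u, knots))
    return left + right

def bspline_point(u, p, knots, ctrl):
    """C(u) = sum_i N_{i,p}(u) * P_i for control values P_i (scalar here)."""
    return sum(bspline_basis(i, p, u, knots) * ctrl[i] for i in range(len(ctrl)))
```

With a clamped knot vector such as [0, 0, 0, 1, 1, 1] and degree 2, the curve reduces to a quadratic Bezier segment, and the basis functions sum to one inside the valid interval (partition of unity).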

4. Experiment and Evaluation

4.1. Experimental Scenario

In order to verify the effectiveness of the double-layer GWOP algorithm, which integrates GWO and GRPO, in dynamic route planning for logistics UAVs, the entire flight experiment process of the logistics UAV from “loading—taking off—route planning” is recorded, as shown in Figure 4. Specifically, Figure 4a illustrates the process of loading envelope goods into the UAV logistics box; Figure 4b shows the vertical takeoff of the logistics UAV; and Figure 4c–f depict the various stages of the logistics UAV’s distribution route planning and flight process.

4.2. Experimental Results Analysis

The CEC2017 test functions are listed in Table 1. Among these, F1 to F10 are basic shifted and rotated functions, primarily used to evaluate convergence performance on unimodal, multimodal, and complex curved landscapes. F11 to F20 are hybrid functions, mainly designed to assess the algorithm's ability to optimize multiple sub-problems collaboratively. F21 to F25 are composition functions, intended to test robustness and global search capability in highly complex environments by superimposing multiple sub-functions with local distortions.

4.3. CEC2017 Test

4.3.1. CEC2017 Test Data

The results of the improved GWOP algorithm and seven other algorithms, namely GWO, PSO, SSA, DBO, NGO, TSO, and JSOA, on the CEC2017 test functions are shown in Table 2. The reported statistics include the minimum value (Min), the mean value (Mean), and the standard deviation (Std).

4.3.2. CEC2017 Test Results

The results of the improved GWOP algorithm and seven other algorithms GWO, PSO, SSA, DBO, NGO, TSO, and JSOA on the CEC2017 test functions are presented in Figure 5. Based on the convergence curves of the unimodal, multimodal, and composite functions from F1 to F10, it can be observed that the improved two-layer fused GWOP algorithm achieves better convergence performance and lower fitness values compared to the original GWO algorithm. Moreover, the GWOP algorithm consistently obtains the lowest fitness values among all the compared algorithms, including PSO, SSA, DBO, NGO, TSO, and JSOA. Furthermore, GWOP demonstrates superior performance in terms of fitness accuracy for the mixed functions from F11 to F20 and for the complex functions from F21 to F25.

4.4. Experimental Procedure

The path planning of the logistics UAV involves four experiments, as shown in Figure 6. Experiment 1: the improved double-layer fusion GWOP algorithm is compared with the classical GWO algorithm in a single-UAV scenario. Experiment 2: building upon Experiment 1, six additional classical 3D trajectory planning algorithms (PSO, SSA, DBO, NGO, TSO, and JSOA) are introduced alongside GWO for comparison with the improved two-layer fusion GWOP algorithm, to determine whether GWOP outperforms these seven algorithms. Experiment 3: the single-UAV scenario of Experiment 1 is extended to multiple UAVs to compare the GWOP and GWO algorithms. Experiment 4: the scenario of Experiment 2 is similarly extended to multiple UAVs to conduct comparative experiments between GWOP and the other seven path planning algorithms.

5. UAV Trajectory Planning Model Design

5.1. Design of UAV Flight Environment

A target model is constructed by integrating the three key factors: shortest trajectory distance [48], smoothness, and minimum collision risk:
$$f = \lambda_1 \sum_{i=1}^{n-1} \|P_i - P_{i+1}\| + \lambda_2 \sum_{i=2}^{n-1} \theta_i^2 + \lambda_3 \sum_{i=1}^{n} \sum_{o \in O} \frac{1}{d(P_i, o) + \varepsilon}$$
In this formula, $\lambda_1$, $\lambda_2$, and $\lambda_3$ denote the weight coefficients of trajectory length, smoothness, and collision risk, respectively. Based on the specific requirements of trajectory planning for delivery tasks, this study sets $\lambda_1 = 0.6$, $\lambda_2 = 0.2$, and $\lambda_3 = 0.2$.
Logistics unmanned aerial vehicle minimum trajectory distance model:
$$\min f_1 = \sum_{i=1}^{n-1} \|P_i - P_{i+1}\|$$
In the formula, $P_i$ represents the three-dimensional coordinates of the $i$-th waypoint of the UAV's planned trajectory, and $\|P_i - P_{i+1}\| = \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2 + (z_{i+1} - z_i)^2}$ is the Euclidean distance between consecutive waypoints; $n$ indicates the total number of waypoints along the trajectory.
The maximum smoothness model of logistics drones:
$$\max f_2 = -\sum_{i=2}^{n-1} \theta_i^2$$
In the formula, $\theta_i$ represents the turning angle at the $i$-th waypoint of the drone's flight path, which satisfies $\cos\theta_i = \dfrac{v_i \cdot v_{i+1}}{\|v_i\|\,\|v_{i+1}\|}$, where $v_i = P_i - P_{i-1}$ and $v_{i+1} = P_{i+1} - P_i$.
Minimum collision risk model for logistics drones in flight among obstacles:
$$\min f_3 = \sum_{i=1}^{n} \sum_{o \in O} \frac{1}{d(P_i, o) + \varepsilon}$$
In the formula, $\sum_{o \in O}$ represents summation over all obstacles, $o$ the index of a single obstacle, and $O$ the set of all obstacles; $\frac{1}{d(P_i, o) + \varepsilon}$ is the collision risk between a single planning point and a single obstacle, $d(P_i, o)$ the shortest distance from trajectory point $P_i$ to obstacle $o$, and $\varepsilon$ a very small positive number.
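The three sub-objectives combine into the weighted fitness $f$ with $\lambda_1 = 0.6$, $\lambda_2 = 0.2$, $\lambda_3 = 0.2$. A minimal sketch, treating each obstacle as a point centre for the distance term (the names and the point-obstacle simplification are ours, not the paper's):

```python
import math

def path_fitness(pts, obstacles, w=(0.6, 0.2, 0.2), eps=1e-6):
    """f = w1*length + w2*sum(theta_i^2) + w3*sum(1/(d(P_i,o)+eps)).
    pts: list of (x, y, z) waypoints; obstacles: list of (x, y, z) centres."""
    length = sum(math.dist(pts[i], pts[i + 1]) for i in range(len(pts) - 1))
    smooth = 0.0
    for i in range(1, len(pts) - 1):
        v1 = [b - a for a, b in zip(pts[i - 1], pts[i])]
        v2 = [b - a for a, b in zip(pts[i], pts[i + 1])]
        dot = sum(x * y for x, y in zip(v1, v2))
        n1, n2 = math.hypot(*v1), math.hypot(*v2)
        theta = math.acos(max(-1.0, min(1.0, dot / (n1 * n2 + eps))))
        smooth += theta ** 2                       # turning-angle penalty
    risk = sum(1.0 / (math.dist(p, o) + eps) for p in pts for o in obstacles)
    return w[0] * length + w[1] * smooth + w[2] * risk
```

A straight path scores lower than a detour with the same endpoints, since both the length and the turning-angle terms grow for the detour while the obstacle term stays comparable.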

5.2. Construct UAV Flight Constraint Model

5.2.1. Load Constraint

The payload capacity of an unmanned aerial vehicle (UAV) in delivery operations is determined by the difference between its maximum takeoff weight and its empty weight. To account for the payload limitations inherent in logistics UAVs used for cargo transportation, a payload constraint model is formulated:
$$W_{load} \le G_{\max} - W_{empty}$$
In the formula, $W_{load}$ represents the payload, $G_{\max}$ the maximum takeoff weight, $W_{empty}$ the empty weight of the drone, and $G_{\max} - W_{empty}$ the upper limit of the drone's maximum payload.

5.2.2. Turning Radius Constraint

In logistics unmanned aerial vehicle (UAV) delivery transportation, the turning-angle constraints during flight include both vertical and horizontal limitations, as illustrated in Figure 7. Let trajectory point $W_{i,j} = (x_{i,j}, y_{i,j}, z_{i,j})$ represent the $j$-th point along the $i$-th flight path, where $\overrightarrow{W_{i,j} W_{i,j+1}}$ and $\overrightarrow{W_{i,j+1} W_{i,j+2}}$ denote two consecutive trajectory segments during flight, and $\overrightarrow{W'_{i,j} W'_{i,j+1}}$ and $\overrightarrow{W'_{i,j+1} W'_{i,j+2}}$ are their projections onto the horizontal plane $XOY$.
Reasonable corner constraints significantly contribute to the generation of optimal flight path trajectories. When integrated with the aforementioned flight trajectory construction model:
$$\overrightarrow{W'_{i,j} W'_{i,j+1}} = b \times \left( \overrightarrow{W_{i,j} W_{i,j+1}} \times b \right)$$
$$\alpha_{ij} = \arctan\!\left( \frac{\left| \overrightarrow{W'_{i,j} W'_{i,j+1}} \times \overrightarrow{W'_{i,j+1} W'_{i,j+2}} \right|}{\overrightarrow{W'_{i,j} W'_{i,j+1}} \cdot \overrightarrow{W'_{i,j+1} W'_{i,j+2}}} \right)$$
$$\beta_{ij} = \arctan\!\left( \frac{z_{i,j+1} - z_{i,j}}{\left| \overrightarrow{W'_{i,j} W'_{i,j+1}} \right|} \right)$$
In this formula, $\alpha_{ij}$ denotes the horizontal turning angle, $\beta_{ij}$ denotes the vertical turning angle, and $b$ represents the unit vector along the positive vertical coordinate axis, so that $b \times (\cdot \times b)$ projects a segment onto the horizontal plane.
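The projection and turning-angle formulas above can be sketched as follows (a minimal sketch; we project onto the $XOY$ plane by dropping the $z$ component, which is equivalent to $b \times (\cdot \times b)$ with $b$ the vertical unit vector, and the function name is ours):

```python
import math

def turn_angles(w0, w1, w2):
    """Horizontal turn alpha between XY-projected segments w0->w1 and w1->w2,
    and vertical climb angle beta of the first segment."""
    proj = lambda a, b: (b[0] - a[0], b[1] - a[1])   # XY projection of a segment
    u, v = proj(w0, w1), proj(w1, w2)
    cross = u[0] * v[1] - u[1] * v[0]                # 2D cross product (scalar)
    dot = u[0] * v[0] + u[1] * v[1]
    alpha = math.atan2(abs(cross), dot)              # horizontal turning angle
    horiz = math.hypot(*u)
    beta = math.atan2(w1[2] - w0[2], horiz)          # vertical turning angle
    return alpha, beta
```

Using atan2 with the absolute cross product keeps alpha in [0, pi], so left and right turns of equal sharpness are penalized identically.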

5.2.3. Flight Posture Constraint

If the rotation matrix of the logistics unmanned aerial vehicle is set as $R = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix}$, then the corresponding constraints on each attitude angle are as follows:
$$\theta = \arcsin(r_{31}), \quad -\tfrac{\pi}{6} \le \theta \le \tfrac{\pi}{6}$$
$$\phi = \mathrm{arctan2}(r_{32}, r_{33}), \quad -\tfrac{\pi}{4} \le \phi \le \tfrac{\pi}{4}$$
$$\psi = \mathrm{arctan2}(r_{21}, r_{11}), \quad -\pi \le \psi \le \pi$$
In this formula, $\theta$ denotes the pitch angle, $\phi$ the roll angle, and $\psi$ the yaw angle. The corresponding quaternion is $q = (q_0, q_1, q_2, q_3)$, where $q_0$ is the real part. Based on this representation, $\theta = \arcsin\!\big(2(q_1 q_3 - q_0 q_2)\big)$, $\phi = \mathrm{arctan2}\!\big(2(q_0 q_1 + q_2 q_3),\, q_0^2 - q_1^2 - q_2^2 + q_3^2\big)$, and $\psi = \mathrm{arctan2}\!\big(2(q_1 q_2 + q_0 q_3),\, q_0^2 + q_1^2 - q_2^2 - q_3^2\big)$ are calculated.
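The quaternion-based attitude extraction and the bound checks can be sketched as follows (a minimal sketch under the sign conventions written above; function names are ours):

```python
import math

def attitude_from_quaternion(q0, q1, q2, q3):
    """Pitch theta, roll phi, yaw psi from a unit quaternion (q0 = real part),
    using the formulas of this subsection."""
    theta = math.asin(max(-1.0, min(1.0, 2.0 * (q1 * q3 - q0 * q2))))
    phi = math.atan2(2.0 * (q0 * q1 + q2 * q3), q0**2 - q1**2 - q2**2 + q3**2)
    psi = math.atan2(2.0 * (q1 * q2 + q0 * q3), q0**2 + q1**2 - q2**2 - q3**2)
    return theta, phi, psi

def attitude_ok(theta, phi, psi):
    """Check the posture bounds: |theta| <= pi/6, |phi| <= pi/4, |psi| <= pi."""
    return (abs(theta) <= math.pi / 6 and abs(phi) <= math.pi / 4
            and abs(psi) <= math.pi)
```

Clamping the asin argument guards against tiny floating-point drift pushing a near-unit quaternion outside [-1, 1].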

5.2.4. Flight Altitude Constraint

The flight height constraints for the unmanned aerial vehicle (UAV) are illustrated in Figure 8. The flight altitude of the logistics UAV must comply with the maximum allowable height limit. Specifically, $T_{i,j}$ denotes the terrain elevation at the $j$-th waypoint along the $i$-th route, $z_{i,j}$ represents the UAV's flight altitude relative to sea level at that waypoint, $h_{r,\max}$ is the maximum relative flight height, and $h_{a,\max}$ is the maximum absolute flight height.
The flight altitude of the unmanned aerial vehicle must satisfy the following constraint model:
$$H_{i,j} = \begin{cases} \varsigma\,(h_{a,\max} - T_{i,j}), & (z_{i,j} - T_{i,j}) > h_{a,\max} \\ z_{i,j}, & 0 < (z_{i,j} - T_{i,j}) < h_{a,\max} \\ \varsigma\,(h_{i,j} + T_{i,j}), & (z_{i,j} - T_{i,j}) < 0 \end{cases}$$
In this formula, $\varsigma$ represents the altitude coefficient, which has a value range of (0, 1), $h_{i,j}$ is the relative flight height at the $j$-th track point, and $H_{i,j}$ denotes the optimized actual flight altitude when reaching the $j$-th track point on the $i$-th flight route.
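The piecewise altitude correction can be sketched as follows (a minimal sketch with illustrative names; `sigma` stands for the altitude coefficient $\varsigma$ and `h` for the relative height $h_{i,j}$):

```python
def corrected_altitude(z, T, h_a_max, h, sigma=0.5):
    """Piecewise altitude fix: keep z if the relative height (z - T) lies in
    (0, h_a_max); otherwise re-map it per the constraint model above."""
    rel = z - T
    if rel > h_a_max:
        return sigma * (h_a_max - T)   # above the absolute ceiling
    if rel < 0:
        return sigma * (h + T)         # below terrain: re-lift above it
    return z                           # already within the admissible band
```

Only waypoints outside the admissible band are modified, so the correction never perturbs a trajectory that already satisfies the altitude constraint.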

5.2.5. Flight Speed Constraint

During the unmanned aerial vehicle (UAV) delivery process, the speed constraint is expressed as:
$$v \le \sqrt{\frac{T_{\max}^2 - G^2}{k_d}}$$
In this formula, v denotes the horizontal flight speed, G represents the total load during flight, k d is the aerodynamic drag coefficient during flight, and T max denotes the maximum total thrust limit of the unmanned aerial vehicle.

5.3. Simulation Example and Results Analysis

5.3.1. Comparing Single UAV Path Planning Experiments

Experiment 1: Due to the limitations of visualization in the actual flight environment, supplementary experiments were carried out in a visualized three-dimensional environment, as illustrated in Figure 9. In this setup, the green circle denotes the starting point of the flight trajectory, the red triangle indicates the target endpoint of the delivery, and the colored columnar structures represent urban buildings acting as obstacles. In each trial, a single unmanned aerial vehicle (UAV) initiated its flight from the starting point and employed the improved GWOP algorithm, the classic GWO algorithm, and the GRPO algorithm, respectively, to navigate around obstacles. The optimal algorithm is defined as the one that reaches the target endpoint first, with a collision-free trajectory and the shortest path length. To ensure experimental reliability, simulations were conducted over 100, 300, and 500 iterations.
Based on the aforementioned three-dimensional environment, the comparison visualization results of a single unmanned aerial vehicle (UAV) operating under three different trajectory planning algorithms indicate that, in the 100-iteration Figure 9a, all trajectory planning algorithms successfully reach the target endpoint and complete the path from the starting point to the destination. However, as shown in Figure 9b–d, the GWOP algorithm generates a generally smoother obstacle-avoidance path, whereas the GWO and GRPO algorithms experience multiple collisions during trajectory planning. The 300-iteration Figure 9e–h and the 500-iteration Figure 9i–l consistently demonstrate that the improved GWOP algorithm significantly outperforms the classical GWO and GRPO algorithms in terms of obstacle avoidance space utilization and path smoothness. The training results of trajectory planning using a single UAV with the three algorithms across 100, 300, and 500 iterations are presented in Figure 10.
The training results in Figure 10a demonstrate that GWOP achieves rapid convergence within approximately 20 iterations, whereas the GWO and GRPO algorithms exhibit slower initial convergence and only gradually stabilize after around 40 iterations. Consequently, in the 100-iteration experiment, GWOP displays a significantly faster convergence rate compared to GWO and GRPO, reaching the optimal solution with fewer iterations. As shown in Figure 10b, the fitness value of the GWOP algorithm is approximately 25.63, while those of the GWO and GRPO algorithms are 31.58 and 34.58, respectively. This indicates that GWOP incurs lower cost consumption than GWO and GRPO in path planning. Figure 10c,d from the 300-iteration results, as well as Figure 10e,f from the 500-iteration results, further confirm that GWOP converges faster and achieves lower planning costs in trajectory planning.
Experiment 2: Given the superior performance of GWOP over GWO observed in the previous experiment, six additional trajectory planning algorithms—PSO, SSA, DBO, NGO, TSO, and JSOA—are included for comparison. A total of eight experiments are conducted for trajectory planning from the starting point to the target endpoint, as illustrated in Figure 11. The starting point, target endpoint, and building obstacles are configured identically to those in Experiment 1.
Based on the visualization results comparing eight different trajectory planning algorithms for a single unmanned aerial vehicle (UAV) in the aforementioned three-dimensional environment, Figure 11a presents the trajectory curves generated by the eight algorithms from the starting point to the target endpoint. The trajectories reveal notable differences among the algorithms. Combined with Figure 11b–d, it is evident that the GWO algorithm frequently encounters obstacles during trajectory planning, whereas the PSO, SSA, and JSOA algorithms produce redundant obstacle avoidance paths. Therefore, it can be concluded that the GWOP algorithm generates a smoother obstacle avoidance trajectory. A further analysis of the data obtained from the eight algorithms is provided in Figure 12.
The training results from the aforementioned Figure 12a indicate that the GWOP algorithm achieves convergence within approximately 20 iterations and subsequently stabilizes. In contrast, the GWO algorithm exhibits significant fluctuations during the first 53 iterations and converges to the highest cost. The DBO and NGO algorithms also converge within about 40 iterations, with relatively lower costs. Meanwhile, the PSO, TSO, SSA, and JSOA algorithms begin to converge after approximately 60 iterations, demonstrating the slowest convergence speed. As shown in Figure 12b, the fitness values of the GWOP algorithm are concentrated between 20 and 30, clearly outperforming the other seven algorithms in terms of fitness.

5.3.2. Comparing Multi-UAV Path Planning Experiments

Experiment 3: Following the superior performance of the GWOP algorithm in the previous single UAV three-dimensional trajectory planning experiments, this experiment further introduces multiple UAVs to evaluate the robustness of the improved two-layer GWOP algorithm. The trajectory comparison of the improved GWOP algorithm, the original GWO algorithm, and the GRPO algorithm for multiple UAVs is presented in Figure 13. In this setup, three UAVs share the same departure point but have different target endpoints, with the differently colored columns representing urban building obstacles.
Based on the aforementioned three-dimensional environment, the comparison visualization results of multiple unmanned aerial vehicles (UAVs) operating under three different trajectory planning algorithms demonstrate that, as shown in Figure 13a, all three UAVs using these algorithms exhibit basic obstacle avoidance capabilities and successfully complete the task of planning paths from the starting point to the target. From the top view in Figure 13b, it is evident that the UAVs under the GWOP algorithm achieve smoother obstacle avoidance maneuvers. In contrast, UAVs using the GWO and GRPO algorithms experience collisions with obstacles in certain cases, particularly among UAVs with different serial numbers. Notably, no collision behavior occurs during the trajectory planning of the three UAVs using the GWOP algorithm, which effectively reduces energy consumption and control complexity during flight. By synthesizing the various perspective views of the flight environment, it can be concluded that the improved GWOP algorithm offers significant advantages over the classical GWO and GRPO algorithms in multi-UAV flight scenarios. To validate the reliability of the conclusions drawn from the visualization results, further analysis is conducted on the convergence and violin distribution statistics, as presented in Figure 14.
The training results from Figure 14a demonstrate that the GWOP algorithm achieves rapid convergence within 50 iterations for the trajectory planning of three drones. In contrast, UAV2 using the GWO algorithm begins to converge after 5 iterations, while the GRPO algorithm starts to converge after 20 iterations. However, the converged fitness values of both the GWO and GRPO algorithms are higher than those of all drones under the GWOP algorithm. As shown in Figure 14b, the fitness values of the GWOP algorithm are concentrated in the range of 15 to 27, whereas those of the GWO and GRPO algorithms are concentrated in the ranges of 20 to 36 and 21 to 39, respectively. This indicates that the GWOP algorithm outperforms the GWO and GRPO algorithms in terms of the quality of trajectories generated for multiple drones.
Experiment 4: Given the superior performance of the GWOP algorithm over the GWO algorithm in multi-drone trajectory planning, as demonstrated in the previous results, six additional trajectory planning algorithms, PSO, SSA, DBO, NGO, TSO, and JSOA, are introduced for further comparative analysis, as illustrated in Figure 15. A comparative experiment is conducted using eight different algorithms for trajectory planning from the starting point to the target endpoints. The starting point, building obstacles, and number of drones remain consistent with those in Experiment 3, while the number of target endpoints is increased to three to match the number of drones.
From the aforementioned three-dimensional environment involving multiple drones, the visualization results obtained using eight different trajectory planning algorithms demonstrate that, as shown in Figure 15a, all three drones successfully avoid the three-dimensional cylindrical obstacles and reach the target destination using each of the eight algorithms. From the top view in Figure 15b, it is evident that multi-drone operations based on the GWO, TSO, SSA, NGO, and JSOA algorithms exhibit varying degrees of collision. Meanwhile, trajectory planning using the PSO and DBO algorithms results in a local “clustering” phenomenon. In contrast, the GWOP algorithm generates more dispersed trajectories, effectively reducing the “potential conflict risk” in multi-drone collaborative planning. Figure 15c,d further confirm the superior performance of the GWOP algorithm in terms of spatial utilization. To provide a more in-depth evaluation, the convergence characteristics and violin distribution statistics presented in Figure 16 are further analyzed.
The training results from Figure 16a demonstrate that the GWOP algorithm achieves rapid convergence in the trajectory planning of three drones, stabilizing around the 18th iteration. In contrast, the GWO algorithm exhibits significant fluctuations during the first 60 iterations and only stabilizes after the 70th iteration. The DBO and SSA algorithms converge within the first 55 iterations, but their fitness values are notably higher than those of the GWOP algorithm. The TSO, PSO, NGO, and JSOA algorithms converge the slowest, with convergence beginning after the 70th iteration. As shown in Figure 16b, the fitness values of the GWOP algorithm are concentrated in the range of 15 to 22, whereas those of the GWO algorithm fall within the range of 27 to 41. The fitness values of the DBO, NGO, TSO, PSO, SSA, and JSOA algorithms all exceed the range observed for the GWOP algorithm. Therefore, it can be concluded that the GWOP algorithm demonstrates a significant advantage over the other seven algorithms in terms of both convergence speed and trajectory quality in multi-drone trajectory planning.

6. Discussion

6.1. Discussion on the Results of a Single UAV

This section provides a summary and analysis of the results from the aforementioned experimental content. Key metrics are defined as follows: Optimal Path Length (OPL), Average Path Length (APL), Number of Obstacle Collisions (NOC), Fitness Value (FV), Number of UAVs (NOU), Trajectory Planning (TP), Algorithms (Algs), and Comparison (Comp).
In Experiment 1, the GRPO algorithm was integrated into the classical GWO algorithm to improve it, and the resulting two-layer GWOP algorithm was evaluated through ablation experiments against the classical GWO and GRPO algorithms. Trajectory planning of a single unmanned aerial vehicle (UAV) from the starting point to the endpoint was carried out, and the results are summarized in Table 3. Specifically, the GWOP, GWO, and GRPO algorithms were each evaluated at 100, 300, and 500 training iterations across three indicators: Optimal Path Length (OPL), Number of Obstacle Collisions (NOC), and Fitness Value (FV). A result was considered optimal when the OPL and FV values were minimized and no collisions occurred (NOC = 0).
Table 3 provides a detailed comparison of the GWOP, GWO, and GRPO algorithms for a single drone’s flight in terms of the OPL, NOC, and FV indicators; the results are visualized in Figure 17. As shown in Figure 17a, which illustrates the OPL values, the GWO algorithm achieved distances of 21.58 m and 21.77 m at Episodes 100 and 300, respectively, shorter than the GWOP algorithm’s 25.63 m and 26.01 m at the same episodes; however, the GWO algorithm incurred two collisions in both cases, so these shorter paths were not collision-free. Regarding the FV indicator in Figure 17b, the GWOP algorithm obtained fitness values of 25.63, 22.40, and 26.01 at Episodes 100, 300, and 500, respectively, significantly lower than those of the GWO algorithm (31.58, 27.95, and 31.77) and the GRPO algorithm (34.58, 27.57, and 26.79). Furthermore, as indicated by the NOC metric, the GWOP algorithm avoided all obstacles at every iteration level, whereas both the GWO and GRPO algorithms experienced collisions. These results demonstrate the superior performance of the improved GWOP algorithm in three-dimensional trajectory planning for logistics drones.
From Figure 17, it can be concluded that the improved double-layer fusion GWOP algorithm reduced the FV by 18.84% at 100 iterations, 19.86% at 300 iterations, and 18.13% at 500 iterations relative to the classical GWO algorithm, an average reduction of 18.94% across the three iteration levels. Compared with the classical GRPO algorithm, the reductions were 25.88%, 18.75%, and 2.91% at 100, 300, and 500 iterations, respectively, yielding an average reduction of 15.85%.
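The averaged reductions follow directly from the per-level FV pairs quoted from Table 3. A quick check of the arithmetic:

```python
def pct_reduction(baseline, improved):
    """Percentage reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline * 100

gwop = [25.63, 22.40, 26.01]  # GWOP FV at 100/300/500 iterations
gwo  = [31.58, 27.95, 31.77]  # classical GWO FV at the same levels

cuts = [pct_reduction(b, i) for b, i in zip(gwo, gwop)]
print([round(c, 2) for c in cuts])      # → [18.84, 19.86, 18.13]
print(round(sum(cuts) / len(cuts), 2))  # → 18.94
```

The same computation against the GRPO values (34.58, 27.57, 26.79) reproduces the 15.85% average.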
In Experiment 2, the GWOP algorithm demonstrated significantly better performance than the GWO and GRPO algorithms in single UAV trajectory planning. To further evaluate the improved GWOP algorithm, six additional trajectory planning algorithms were introduced, and a comparative analysis was conducted using a total of eight different algorithms. Finally, experiments with 100 iterations were carried out for OPL, NOC, and FV, and the results are summarized in Table 4.
Table 4 presents a detailed comparison and analysis of the GWOP algorithm against seven other algorithms—GWO, PSO, SSA, DBO, NGO, TSO, and JSOA—in the single-drone flight scenario, focusing on the OPL, NOC, and FV metrics. The results are illustrated in Figure 18. As shown in Figure 18a, the GWOP algorithm achieves a trajectory length of 22.24 m, whereas the other seven algorithms yield trajectory lengths of 30.92 m, 26.88 m, 29.34 m, 23.35 m, 23.66 m, 22.43 m, and 26.90 m, respectively. According to Figure 18b, the GWOP algorithm obtains a fitness value of 22.24 for the FV metric, while the corresponding values for the other algorithms are 40.92, 26.88, 29.34, 23.35, 23.66, 22.43, and 26.90. Regarding the NOC data, the trajectories generated by the GWOP, PSO, DBO, NGO, and TSO algorithms satisfy the collision-free requirement in the single-drone delivery task. Overall, the experimental results indicate that the GWOP algorithm outperforms all other tested algorithms.
From Figure 18a, it can be observed that the improved two-layer fused GWOP algorithm achieves an OPL value that is 15.15% lower than the average of the other seven algorithms. As shown in Figure 18b, the improved GWOP algorithm also demonstrates a 19.54% reduction in FV compared to the average of the other algorithms. In summary, these results indicate that the improved two-layer GWOP algorithm offers significant advantages in single UAV trajectory planning.

6.2. Discussion on the Results of Multiple UAVs

In Experiment 3, after verifying the superiority of the GWOP algorithm over multiple algorithms in the single UAV scenario, the number of UAVs was increased to evaluate its performance in a multi-UAV setting. The improved two-layer fused GWOP algorithm was compared with the classic GWO and GRPO algorithms in this multi-UAV scenario. The trajectory planning data from the starting point to the endpoint for multiple UAVs are presented in Table 5. Specifically, 100 iterations were conducted for each of the four evaluation indicators—APL, OPL, NOC, and FV—for the GWOP, GWO, and GRPO algorithms. The evaluation criteria consider APL, OPL, and FV optimal when their values are minimized, and NOC is zero, indicating no collisions.
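The evaluation rule stated above (APL, OPL, and FV minimized, NOC equal to zero) amounts to a feasibility check followed by a cost comparison. A minimal sketch, using the Table 5 values quoted in the discussion; the tie-breaking order of the cost tuple is an assumption, not specified in the paper:

```python
# Each candidate: (name, APL, OPL, FV, NOC), values as quoted from Table 5.
results = [
    ("GWOP", 20.21, 15.13, 20.21, 0),
    ("GWO",  21.07, 17.61, 27.61, 2),
    ("GRPO", 23.49, 20.10, 30.10, 2),
]

# Feasible candidates are collision-free; among them, rank by the
# (FV, APL, OPL) tuple so that the lowest cost wins on every indicator.
feasible = [r for r in results if r[4] == 0]
best = min(feasible, key=lambda r: (r[3], r[1], r[2]))
print(best[0])  # → GWOP
```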
Table 5 presents the APL, OPL, NOC, and FV index data obtained from the flight trajectories of multiple drones using the GWOP, GWO, and GRPO algorithms. The data were visualized and analyzed, resulting in Figure 19. As shown in Figure 19a,b, the GWOP algorithm achieves APL, OPL, and FV values of 20.21 m, 15.13 m, and 20.21 m, respectively, which are significantly lower than the corresponding values for the GWO algorithm (21.07 m, 17.61 m, and 27.61 m) and the GRPO algorithm (23.49 m, 20.10 m, and 30.10 m). Moreover, the GWOP algorithm successfully completed conflict-free obstacle avoidance in the trajectory planning of three drones, whereas both the GWO and GRPO algorithms resulted in two collision incidents. Therefore, the experimental results demonstrate that the GWOP algorithm exhibits a clear advantage over the GWO and GRPO algorithms in multi-drone trajectory planning.
From Figure 19a,b, it can be observed that the improved double-layer fusion GWOP algorithm reduces the APL index by 4.08% compared to the GWO algorithm, the OPL index by 14.08%, and the FV index by 26.80%. Compared with the GRPO algorithm, the reductions are 13.96% for APL, 24.73% for OPL, and 32.86% for FV. These results further confirm the effectiveness of the improved GWOP algorithm in multi-UAV trajectory planning.
In Experiment 4, after demonstrating the superiority of the GWOP algorithm over two other algorithms in the multi-UAV scenario, six additional trajectory planning algorithms were introduced. A comparative analysis involving a total of eight algorithms was then conducted in the 3D multi-UAV trajectory planning environment to further validate the performance of the improved GWOP algorithm. The experimental results from 100 iterations for APL, OPL, NOC, and FV are summarized in Table 6. The evaluation criteria consider the shortest APL, OPL, and FV values as optimal, provided that no NOC occur.
Table 6 presents a detailed comparison and analysis of the GWOP algorithm against seven other algorithms—GWO, PSO, SSA, DBO, NGO, TSO, and JSOA—in the context of multiple drone flights, focusing on the APL, OPL, NOC, and FV metrics. The results are illustrated in Figure 20. According to the APL data in Figure 20b, the GWOP algorithm achieves an average trajectory length of 17.35 m, whereas the other seven algorithms yield lengths of 38.51 m, 21.61 m, 22.55 m, 19.70 m, 18.74 m, 17.87 m, and 21.13 m, respectively. As shown in Figure 20c, the OPL value for the GWOP algorithm is 15.00 m, compared to 32.03 m, 16.05 m, 18.47 m, 16.61 m, 16.57 m, 15.26 m, and 16.30 m for the other algorithms. Regarding the FV metric in Figure 20d, the GWOP algorithm obtains a value of 18.02, while the corresponding values for the other algorithms are 31.37, 22.27, 23.22, 20.37, 19.41, 18.54, and 22.46. Finally, as indicated in Figure 20a, the GWOP algorithm demonstrates a significant advantage over the other seven algorithms across the APL, OPL, NOC, and FV metrics in multi-drone trajectory planning.
To summarize, as indicated in Figure 20a, the improved two-layer fusion GWOP algorithm demonstrates superior performance compared to the other seven algorithms. Specifically, it achieves an APL index that is 24.14% lower than the average, an OPL index that is 20.04% lower, and an FV index that is 19.98% lower. Furthermore, the trajectory curve generated by the GWOP algorithm for path planning from the starting point to the target endpoint satisfies the requirements of an optimal, conflict-free trajectory.
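The “lower than the average” figures are computed against the mean of the seven competing algorithms. A sketch using the APL values quoted from Figure 20b; it reproduces the reported ~24.14% up to rounding of the tabulated values:

```python
gwop_apl = 17.35
# GWO, PSO, SSA, DBO, NGO, TSO, JSOA average path lengths (m)
others_apl = [38.51, 21.61, 22.55, 19.70, 18.74, 17.87, 21.13]

mean_others = sum(others_apl) / len(others_apl)
reduction = (mean_others - gwop_apl) / mean_others * 100
print(round(reduction, 1))  # → 24.1
```

Applying the same formula to the OPL and FV columns yields the ~20.04% and ~19.98% reductions cited above.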

7. Conclusions

This study addresses the challenges of dynamic trajectory planning for unmanned aerial vehicles (UAVs) in logistics environments. First, a three-dimensional multi-objective flight model is established. On this basis, a two-layer fusion GWOP algorithm is proposed, and a graph attention network (GAT) is integrated to enable dynamic environment–UAV–task association. The key experimental findings are summarized as follows:
  • The CEC2017 benchmark functions are employed for evaluation. Compared with the GWO algorithm, the improved two-layer fusion GWOP algorithm demonstrates superior convergence curves and fitness values across the unimodal, multimodal, and complex functions F1–F10. When compared with six other algorithms—PSO, SSA, DBO, NGO, TSO, and JSOA—it achieves better fitness values on the hybrid functions F11–F20 and the composition functions F21–F25.
  • In the single UAV 3D trajectory planning experiment, the improved GWOP algorithm reduces the FV (fitness value) by 18.84%, 19.86%, and 18.13% at 100, 300, and 500 iterations, respectively, yielding an average reduction of 18.94% across all three. The NOC (number of collisions) is reduced from 2 to 0. Compared with the classic GRPO algorithm, the FV reductions are 25.88%, 18.75%, and 2.91% at the same iteration levels, with an average reduction of 15.85%, and the NOC is reduced from 1 to 0. These results highlight the algorithm’s superior path optimization capability and robust obstacle avoidance performance.
  • When extended to a comparison involving eight trajectory planning algorithms, the GWOP algorithm achieves OPL (optimal path length) and FV values that are 15.15% and 19.54% lower than the average, respectively, further confirming its effectiveness in complex 3D trajectory planning for single UAVs.
  • In the multi-UAV 3D scenario, the advantages of the GWOP algorithm become even more pronounced. Compared with the GWO algorithm, it reduces APL (average path length), OPL, and FV by 4.08%, 14.08%, and 26.80%, respectively. Compared with the GRPO algorithm, the reductions are 13.96%, 24.73%, and 32.86%, respectively. The NOC is also reduced from 2 to 0, further validating the algorithm’s superior obstacle avoidance and trajectory optimization capabilities.
  • When evaluated against eight algorithms in a multi-UAV environment, the GWOP algorithm shows reductions of 24.14% in APL (average path length), 20.04% in OPL (optimal path length), and 19.98% in FV (fitness value) compared to the average. Collectively, these experiments demonstrate that the improved GWOP algorithm consistently outperforms other methods in terms of path length, obstacle avoidance, and trajectory smoothness, offering an efficient and reliable planning solution for UAV logistics operations.
Despite its strong performance, future research can further enhance the algorithm in three key directions:
First, improving adaptability to meteorological conditions by incorporating real-time weather data—such as wind intensity and rainfall—to better simulate real-world logistics uncertainties.
Second, expanding to large-scale UAV clusters to explore the algorithm’s scalability in coordinating dozens or even hundreds of UAVs, optimizing communication and path coordination mechanisms for complex swarm logistics tasks.
Third, extending the algorithm to cross-platform and multi-objective optimization, enabling its application in heterogeneous logistics systems such as autonomous ground vehicles and unmanned ships. This would involve integrating constraints like energy consumption and time windows, ultimately supporting the development of a globally coordinated intelligent logistics path planning system and advancing the evolution of the smart logistics ecosystem.

Author Contributions

J.D.: Conceptualization, Investigation, Validation, Writing—original draft, Writing—review and editing, Formal analysis, Data curation. Y.Z.: Formal analysis, Data curation, Investigation, Project administration, Writing—original draft. Y.S.: Data curation, Investigation, Project administration. H.Z.: Conceptualization, Funding acquisition, Supervision, Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Social Science Fund of China (No. 22&ZD169) and the Key Project of the Civil Aviation Joint Fund of the National Natural Science Foundation of China (No. U2133207).

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

OPL   Optimal Path Length
APL   Average Path Length
NOC   Number of Obstacle Collisions
FV    Fitness Value
NOU   Number of UAVs
TP    Trajectory Planning
Algs  Algorithms
Comp  Comparison

Figure 1. GWO algorithm framework.
Figure 2. GRPO algorithm framework.
Figure 3. Double-Layer GWOP fusion algorithm framework.
Figure 4. Trajectory Planning and Delivery Process of Logistics UAV: (a) Load letter cargo; (b) Vertical Takeoff; (c) Waypoint 1; (d) Waypoint 2; (e) Waypoint 3; (f) Waypoint 4.
Figure 5. CEC2017 test results.
Figure 6. Experimental Procedure of the Double-Layer Fusion GWOP Algorithm.
Figure 7. Flight turn angle constraint.
Figure 8. Flight altitude constraint diagram.
Figure 9. Comparative Experiment of 2 Algorithms for Single UAV Path Planning: (a) 3D-Path planning Single UAV (100); (b) Top View Single UAV (100); (c) XZ View Single UAV (100); (d) YZ View Single UAV (100); (e) 3D-Path planning Single UAV (300); (f) Top View Single UAV (300); (g) XZ View Single UAV (300); (h) YZ View Single UAV (300); (i) 3D-Path planning Single UAV (500); (j) Top View Single UAV (500); (k) XZ View Single UAV (500); (l) YZ View Single UAV (500).
Figure 10. Statistical Comparison of 3 Algorithms for Single UAV Path Planning: (a) Convergence Curves Single UAV (100); (b) Comparison of 3 Path Planning Algorithms (100); (c) Convergence Curves Single UAV (300); (d) Comparison of 3 Path Planning Algorithms (300); (e) Convergence Curves Single UAV (500); (f) Comparison of 3 Path Planning Algorithms (500).
Figure 11. Comparative Experiment of 8 Algorithms for Single UAV Path Planning: (a) 3D-Path planning (Single UAV); (b) Top View (Single UAV); (c) XZ View (Single UAV); (d) YZ View (Single UAV).
Figure 12. Statistical Comparison of 8 Algorithms for Single UAV Path Planning: (a) Convergence Curves (Single UAV); (b) Comparison of 8 Path Planning Algorithms (Single UAV).
Figure 13. Comparative Experiment of 3 Algorithms for Multi-UAV Path Planning: (a) 3D-Path planning (3 UAVs); (b) Top View (3 UAVs); (c) XZ View (3 UAVs); (d) YZ View (3 UAVs).
Figure 14. Statistical Comparison of 3 Algorithms for Multi-UAV Path Planning: (a) Convergence Curves (3 UAVs); (b) Comparison of 3 Path Planning Algorithms (3 UAVs).
Figure 15. Comparative Experiment of 8 Algorithms for Multi-UAV Path Planning: (a) 3D-Path planning (3 UAVs); (b) Top View (3 UAVs); (c) XZ View (3 UAVs); (d) YZ View (3 UAVs).
Figure 16. Statistical Comparison of 8 Algorithms for Multi-UAV Path Planning: (a) Convergence Curves (3 UAVs); (b) Comparison of 8 Path Planning Algorithms (3 UAVs).
Figure 17. Comp Results Figure of 3TP Algs for Single UAV: (a) OPL Single UAV 3TP Algs Comp; (b) FV Single UAV 3TP Algs Comp.
Figure 18. Comp Results Figure of 8TP Algs for Single UAV: (a) OPL Single UAV 8TP Algs Comp; (b) FV Single UAV 8TP Algs Comp.
Figure 19. Comp Results Figure of 3TP Algs for Multi-UAVs: (a) 3TP Algs Surface Comp (3 UAVs); (b) 3TP Algs Heatmap Comp (3 UAVs).
Figure 20. Comp Results Figure of 8TP Algs for Multi-UAVs: (a) 8TP Algs Comp (3 UAVs); (b) APL Comp for 3 UAVs with 8TP Algs; (c) OPL Comp for 3 UAVs with 8TP Algs; (d) FV Comp for 3 UAVs with 8TP Algs.
Table 1. CEC2017 test functions.

Function  Function Description              Search Range  Optimum
F1        Shifted and Rotated Bent Cigar    [−100,100]    100
F3        Shifted and Rotated Sum of Power  [−100,100]    300
F4        Shifted and Rotated Rosenbrock    [−100,100]    400
F5        Shifted and Rotated Rastrigin     [−100,100]    500
F6        Shifted and Rotated Schaffer F6   [−100,100]    600
F7        Shifted and Rotated Schwefel      [−100,100]    700
F8        Shifted and Rotated Ackley        [−100,100]    800
F9        Shifted and Rotated Weierstrass   [−100,100]    900
F10       Shifted and Rotated Griewank      [−100,100]    1000
F11       Hybrid Function 1 (N = 3)         [−100,100]    1100
F12       Hybrid Function 2 (N = 3)         [−100,100]    1200
F13       Hybrid Function 3 (N = 3)         [−100,100]    1300
F14       Hybrid Function 4 (N = 4)         [−100,100]    1400
F15       Hybrid Function 5 (N = 4)         [−100,100]    1500
F16       Hybrid Function 6 (N = 4)         [−100,100]    1600
F17       Hybrid Function 7 (N = 5)         [−100,100]    1700
F18       Hybrid Function 8 (N = 5)         [−100,100]    1800
F19       Hybrid Function 9 (N = 5)         [−100,100]    1900
F20       Hybrid Function 10 (N = 6)        [−100,100]    2000
F21       Composition Function 1 (N = 3)    [−100,100]    2100
F22       Composition Function 2 (N = 3)    [−100,100]    2200
F23       Composition Function 3 (N = 4)    [−100,100]    2300
F24       Composition Function 4 (N = 4)    [−100,100]    2400
F25       Composition Function 5 (N = 5)    [−100,100]    2500
Table 2. Data obtained from the different algorithms on the benchmark functions.

Function | Algorithm | Min | Mean | Std
F1 | GWO | 199.6942 | 223.8952 | 14.0158
F1 | PSO | 111.7409 | 140.0931 | 15.8392
F1 | SSA | 123.3391 | 144.8780 | 12.8837
F1 | DBO | 194.1273 | 209.2599 | 10.3549
F1 | NGO | 200.8044 | 214.2256 | 9.7497
F1 | TSO | 160.3911 | 175.7389 | 10.4338
F1 | JSOA | 180.1937 | 205.7861 | 14.6241
F1 | GWOP | 104.6724 | 115.3897 | 8.8722
F3 | GWO | 28,358.3631 | 39,774.9904 | 5710.244
F3 | PSO | 17,644.3925 | 24,685.2489 | 3353.1221
F3 | SSA | 9697.0204 | 13,455.8940 | 1889.9365
F3 | DBO | 26,192.2466 | 30,052.3033 | 1520.5187
F3 | NGO | 21,007.2925 | 26,682.8705 | 2166.6123
F3 | TSO | 7212.8277 | 7944.1156 | 370.1362
F3 | JSOA | 8603.3870 | 9232.7826 | 317.5621
F3 | GWOP | 7190.4553 | 7309.2385 | 51.8563
F4 | GWO | 407.4124 | 426.9561 | 12.0619
F4 | PSO | 401.2658 | 418.5848 | 9.0139
F4 | SSA | 411.9919 | 422.6454 | 10.1980
F4 | DBO | 403.9496 | 475.7415 | 36.4312
F4 | NGO | 400.5129 | 433.9783 | 16.7253
F4 | TSO | 402.6160 | 407.0193 | 3.8720
F4 | JSOA | 436.7995 | 600.0487 | 87.7498
F4 | GWOP | 401.7645 | 412.8103 | 8.9721
F5 | GWO | 504.0811 | 520.5336 | 10.8477
F5 | PSO | 539.5564 | 554.0134 | 10.1109
F5 | SSA | 604.9819 | 620.1995 | 10.3872
F5 | DBO | 582.0188 | 597.9400 | 10.6522
F5 | NGO | 523.8859 | 543.3910 | 12.0463
F5 | TSO | 511.9445 | 522.7534 | 8.9001
F5 | JSOA | 586.9167 | 607.2735 | 12.3935
F5 | GWOP | 502.0501 | 512.8086 | 8.8846
F6 | GWO | 603.2949 | 614.3527 | 8.9760
F6 | PSO | 570.2043 | 610.9744 | 21.5719
F6 | SSA | 658.0606 | 671.1013 | 12.0831
F6 | DBO | 655.4169 | 666.7742 | 9.0800
F6 | NGO | 622.6914 | 638.2184 | 10.5012
F6 | TSO | 601.6734 | 616.3090 | 10.1769
F6 | JSOA | 659.5449 | 679.3991 | 15.8114
F6 | GWOP | 602.9636 | 613.2150 | 8.7333
F7 | GWO | 722.4713 | 739.9215 | 11.2307
F7 | PSO | 751.9333 | 768.8368 | 9.7279
F7 | SSA | 813.2591 | 844.1527 | 21.7951
F7 | DBO | 715.9411 | 805.9927 | 47.1721
F7 | NGO | 752.5464 | 777.3352 | 14.2703
F7 | TSO | 725.7858 | 742.4425 | 10.9262
F7 | JSOA | 782.1305 | 808.2137 | 14.8347
F7 | GWOP | 712.1128 | 726.9038 | 8.6603
F8 | GWO | 804.9292 | 815.5636 | 8.8472
F8 | PSO | 839.4395 | 855.1465 | 9.0143
F8 | SSA | 861.4109 | 872.1668 | 8.8838
F8 | DBO | 867.7398 | 887.9043 | 12.3146
F8 | NGO | 817.9014 | 830.4514 | 7.0711
F8 | TSO | 804.9764 | 824.0656 | 11.1803
F8 | JSOA | 876.4252 | 887.4550 | 8.9674
F8 | GWOP | 801.4056 | 812.1789 | 8.8778
F9 | GWO | 922.5302 | 1029.0626 | 55.9017
F9 | PSO | 904.4682 | 927.0224 | 13.3106
F9 | SSA | 1373.6403 | 1672.4681 | 174.6425
F9 | DBO | 1443.3348 | 2058.6789 | 320.7845
F9 | NGO | 1138.4852 | 1368.5626 | 138.8741
F9 | TSO | 904.5056 | 935.6467 | 17.3205
F9 | JSOA | 2375.6038 | 2860.6900 | 254.9510
F9 | GWOP | 902.6817 | 913.4026 | 8.8735
F10 | GWO | 1700.5368 | 2000.1873 | 173.2051
F10 | PSO | 2563.7048 | 2860.3803 | 148.5062
F10 | SSA | 2263.0570 | 2350.6769 | 51.9615
F10 | DBO | 2617.2892 | 2932.8825 | 157.9622
F10 | NGO | 1739.3814 | 1923.4420 | 92.3015
F10 | TSO | 1838.0149 | 2439.8799 | 316.2278
F10 | JSOA | 2835.7767 | 3015.1094 | 89.9226
F10 | GWOP | 1629.6746 | 1911.6185 | 165.8312
F11 | GWO | 1198.3934 | 1247.3438 | 26.9258
F11 | PSO | 1111.6025 | 1126.6918 | 10.3422
F11 | SSA | 1171.3402 | 1316.1543 | 81.9152
F11 | DBO | 1265.7318 | 1362.6032 | 51.9624
F11 | NGO | 1126.2361 | 1224.1619 | 49.4709
F11 | TSO | 1110.6622 | 1137.0669 | 15.8114
F11 | JSOA | 1209.2471 | 1449.5850 | 134.5850
F11 | GWOP | 1111.0965 | 1121.9514 | 6.7082
F12 | GWO | 1653.6827 | 1897.9735 | 134.1633
F12 | PSO | 1385.5648 | 1694.3517 | 173.2042
F12 | SSA | 2226.2853 | 2241.0972 | 10.2396
F12 | DBO | 1408.542 | 1486.5067 | 43.3029
F12 | NGO | 1564.1293 | 1892.4710 | 187.0829
F12 | TSO | 1794.4268 | 2213.1708 | 235.4435
F12 | JSOA | 1483.4669 | 1630.2579 | 81.9152
F12 | GWOP | 1269.0917 | 1387.2982 | 67.0822
F13 | GWO | 1311.1547 | 1511.6939 | 100.5186
F13 | PSO | 1402.8567 | 1648.5917 | 141.4214
F13 | SSA | 1654.1679 | 1849.0739 | 97.7102
F13 | DBO | 1481.5184 | 1559.6546 | 43.3011
F13 | NGO | 1839.6218 | 2200.0185 | 216.3331
F13 | TSO | 1977.8453 | 2045.5321 | 34.5742
F13 | JSOA | 1419.5982 | 1571.2147 | 81.9163
F13 | GWOP | 1378.7931 | 1505.1852 | 70.7107
F14 | GWO | 1506.7327 | 1557.7897 | 28.8675
F14 | PSO | 1935.4622 | 2052.0645 | 67.0820
F14 | SSA | 1308.0316 | 1563.4021 | 158.1139
F14 | DBO | 1598.8812 | 1641.9349 | 25.0031
F14 | NGO | 1532.8810 | 2016.7747 | 269.2582
F14 | TSO | 1519.2942 | 1787.8413 | 133.7116
F14 | JSOA | 1707.7270 | 1715.2129 | 7.8747
F14 | GWOP | 1427.1532 | 1537.1033 | 55.4259
F15 | GWO | 2815.5633 | 2818.8157 | 1.8708
F15 | PSO | 7033.3291 | 7304.1382 | 152.0690
F15 | SSA | 2239.2530 | 2246.0180 | 7.8427
F15 | DBO | 2043.6193 | 2397.3519 | 205.1346
F15 | NGO | 4341.3267 | 5684.5163 | 787.4008
F15 | TSO | 2106.1466 | 2246.8089 | 81.9155
F15 | JSOA | 17,002.2270 | 3581.0996 | 6928.3242
F15 | GWOP | 1973.4003 | 2032.3893 | 30.3302
F16 | GWO | 1624.9880 | 1848.3706 | 134.1641
F16 | PSO | 1630.8093 | 1922.1510 | 165.8312
F16 | SSA | 2178.0912 | 2348.3049 | 85.3872
F16 | DBO | 2031.4753 | 2244.4391 | 122.4745
F16 | NGO | 1682.1597 | 1780.9209 | 49.8843
F16 | TSO | 1703.3987 | 1866.8732 | 90.1388
F16 | JSOA | 1903.2532 | 2132.3633 | 134.1642
F16 | GWOP | 1618.7213 | 1769.8372 | 86.6025
F17 | GWO | 1740.8918 | 1801.0265 | 34.0147
F17 | PSO | 1769.0717 | 1800.7769 | 19.3649
F17 | SSA | 1835.9871 | 1872.9507 | 21.7945
F17 | DBO | 1879.3856 | 2007.8020 | 67.0822
F17 | NGO | 1782.3045 | 1822.4122 | 23.5442
F17 | TSO | 1709.8100 | 1792.8947 | 47.1699
F17 | JSOA | 1847.8339 | 1914.2622 | 37.4157
F17 | GWOP | 1765.7882 | 1790.7650 | 14.3178
F18 | GWO | 1920.9610 | 2121.5875 | 100.5620
F18 | PSO | 4116.8128 | 5937.1160 | 1019.8039
F18 | SSA | 7174.8955 | 8420.5290 | 670.8204
F18 | DBO | 7665.1584 | 8325.8646 | 370.1364
F18 | NGO | 13,392.1282 | 16,201.4987 | 1670.8204
F18 | TSO | 5928.6741 | 6672.3475 | 433.0127
F18 | JSOA | 2463.5349 | 2627.6745 | 91.2871
F18 | GWOP | 1790.3838 | 2084.8504 | 173.2052
F19 | GWO | 12,769.9149 | 19,578.0563 | 3741.6574
F19 | PSO | 7446.3107 | 8309.6088 | 471.6991
F19 | SSA | 13,697.6576 | 14,568.8387 | 519.6152
F19 | DBO | 11,887.5391 | 13,513.9625 | 912.8709
F19 | NGO | 3321.7081 | 3785.2513 | 269.2585
F19 | TSO | 2334.3152 | 2404.0387 | 37.4166
F19 | JSOA | 2618.7474 | 3027.0182 | 235.4435
F19 | GWOP | 2319.6048 | 2379.6982 | 35.3553
F20 | GWO | 2179.5893 | 2241.2514 | 31.6189
F20 | PSO | 2046.7955 | 2150.6669 | 52.4150
F20 | SSA | 2315.9651 | 2392.3386 | 43.3013
F20 | DBO | 2200.9078 | 2242.8261 | 23.5443
F20 | NGO | 2147.9662 | 2200.7338 | 27.3186
F20 | TSO | 2003.4239 | 2237.6677 | 136.0147
F20 | JSOA | 2290.1059 | 2305.2064 | 10.3444
F20 | GWOP | 2107.1872 | 2148.2014 | 21.6919
F21 | GWO | 2216.6432 | 2306.1456 | 51.9616
F21 | PSO | 2340.0154 | 2354.5055 | 10.1239
F21 | SSA | 2334.1681 | 2403.8723 | 35.5621
F21 | DBO | 2368.0681 | 2382.4550 | 10.0878
F21 | NGO | 2204.8040 | 2261.4396 | 34.0147
F21 | TSO | 2202.4812 | 2283.9535 | 45.0185
F21 | JSOA | 2262.9841 | 2344.8427 | 41.5376
F21 | GWOP | 2211.5305 | 2224.5515 | 9.6109
F22 | GWO | 2205.5170 | 2309.9892 | 52.7125
F22 | PSO | 2201.6532 | 2305.2766 | 52.2921
F22 | SSA | 3498.0365 | 3681.4250 | 101.9804
F22 | DBO | 2689.1098 | 3133.1716 | 254.9511
F22 | NGO | 2305.1063 | 2333.6546 | 16.5831
F22 | TSO | 2104.1656 | 2313.4109 | 122.4746
F22 | JSOA | 2371.1417 | 2674.1010 | 187.0829
F22 | GWOP | 2263.8219 | 2302.7221 | 22.3607
F23 | GWO | 2611.5189 | 2623.6803 | 9.3153
F23 | PSO | 2614.4655 | 2635.3773 | 12.6224
F23 | SSA | 2768.6285 | 2839.4329 | 42.7200
F23 | DBO | 2702.4604 | 2744.1736 | 24.0416
F23 | NGO | 2615.2458 | 2634.8083 | 12.0830
F23 | TSO | 2606.3662 | 2633.2287 | 15.1789
F23 | JSOA | 2713.6315 | 2763.0923 | 28.8676
F23 | GWOP | 2508.5386 | 2615.6636 | 57.0088
F24 | GWO | 2753.8516 | 2767.9796 | 9.9951
F24 | PSO | 2749.1593 | 2780.9635 | 18.0278
F24 | SSA | 2762.8885 | 2864.3542 | 51.2213
F24 | DBO | 2823.2695 | 2868.5801 | 23.7332
F24 | NGO | 2756.1069 | 2780.8886 | 14.2665
F24 | TSO | 2621.0201 | 2736.4937 | 58.1682
F24 | JSOA | 2608.5651 | 2736.5884 | 64.3996
F24 | GWOP | 2518.3326 | 2637.9874 | 60.2421
F25 | GWO | 2914.1375 | 2950.8556 | 21.7947
F25 | PSO | 2892.0841 | 2907.3512 | 8.8921
F25 | SSA | 3457.2638 | 3657.8078 | 100.5210
F25 | DBO | 3148.4886 | 3319.8685 | 96.0469
F25 | NGO | 2912.7585 | 2974.3931 | 31.6181
F25 | TSO | 2901.1772 | 2944.9415 | 22.9964
F25 | JSOA | 2945.8707 | 3092.0851 | 87.7496
F25 | GWOP | 2862.4215 | 2904.9506 | 22.4100
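The Min/Mean/Std columns in Table 2 summarize repeated independent runs of each stochastic optimizer. The sketch below shows how such statistics are typically gathered, using a deliberately minimal grey wolf optimizer on a plain sphere function; the population size, iteration budget, and test function are illustrative assumptions, not the paper's experimental settings.

```python
import numpy as np

def gwo(f, dim, lb, ub, n_wolves=20, n_iter=200, seed=0):
    # Minimal grey wolf optimizer: every wolf is pulled toward the
    # three current best wolves (alpha, beta, delta), while the
    # coefficient a decays 2 -> 0 to shift exploration to exploitation.
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_wolves, dim))
    for t in range(n_iter):
        fit = np.apply_along_axis(f, 1, X)
        leaders = X[np.argsort(fit)[:3]]          # alpha, beta, delta
        a = 2 - 2 * t / n_iter
        new_X = np.zeros_like(X)
        for leader in leaders:
            r1, r2 = rng.random(X.shape), rng.random(X.shape)
            A, C = 2 * a * r1 - a, 2 * r2
            new_X += leader - A * np.abs(C * leader - X)
        X = np.clip(new_X / 3, lb, ub)            # average of the three pulls
    fit = np.apply_along_axis(f, 1, X)
    return float(fit.min())

sphere = lambda x: float(np.sum(x ** 2))
# One table cell's worth of statistics: best, mean, and std over 10 seeds.
results = [gwo(sphere, dim=10, lb=-100.0, ub=100.0, seed=s) for s in range(10)]
print(min(results), float(np.mean(results)), float(np.std(results)))
```

A lower Min indicates the best single run, while Mean and Std capture consistency across runs, which is why Table 2 reports all three rather than a single score.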
Table 3. Comparison results of the three trajectory-planning algorithms for a single UAV.

Metric | GWO 100 | GWO 300 | GWO 500 | GRPO 100 | GRPO 300 | GRPO 500 | GWOP 100 | GWOP 300 | GWOP 500
OPL/m | 21.58 | 22.95 | 21.77 | 29.58 | 27.57 | 26.79 | 25.63 | 22.40 | 26.01
NOC/counts | 2 | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 0
FV | 31.58 | 27.95 | 31.77 | 34.58 | 27.57 | 26.79 | 25.63 | 22.40 | 26.01
Table 4. Comparison results of the eight trajectory-planning algorithms for a single UAV.

Algorithm | OPL/m | NOC/counts | FV
GWO | 30.92 | 5 | 40.92
GWOP | 22.24 | 0 | 22.24
PSO | 26.88 | 0 | 26.88
SSA | 29.34 | 0 | 29.34
DBO | 23.35 | 0 | 23.35
NGO | 23.66 | 0 | 23.66
TSO | 22.43 | 0 | 22.43
JSOA | 26.90 | 0 | 26.90
Table 5. Comparison results of the three trajectory-planning algorithms for multi-UAVs.

Algorithm | APL/m | OPL/m | NOC/counts | FV
GWO | 21.07 | 17.61 | 2 | 27.61
GRPO | 23.49 | 20.10 | 2 | 30.10
GWOP | 20.21 | 15.13 | 0 | 20.21
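For the GWO and GRPO rows of Tables 3 and 5, the reported FV equals OPL plus 5 per recorded collision (e.g. 17.61 + 5 × 2 = 27.61), suggesting a collision-penalized objective of the form sketched below. The penalty weight of 5 is our inference from those rows, not a formula stated in this excerpt, and the GWOP rows indicate the paper's full FV definition includes further terms:

```python
def flight_cost(path_length, n_collisions, penalty=5.0):
    # Collision-penalized fitness: path length plus a fixed cost per
    # collision. The weight 5.0 is inferred from the GWO and GRPO rows
    # of Tables 3 and 5; it is not stated explicitly in this excerpt.
    return path_length + penalty * n_collisions

print(round(flight_cost(17.61, 2), 2))  # 27.61, matching the GWO row of Table 5
print(round(flight_cost(20.10, 2), 2))  # 30.10, matching the GRPO row of Table 5
```

Folding collisions into the scalar fitness lets a single-objective optimizer trade a slightly longer path for a collision-free one, which is how collision-free solutions can still win on FV.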
Table 6. Comparison results of the eight trajectory-planning algorithms for multi-UAVs.

Algorithm | APL/m | OPL/m | NOC/counts | FV
GWO | 38.51 | 32.03 | 6 | 31.37
GWOP | 17.35 | 15.00 | 0 | 18.02
PSO | 21.61 | 16.05 | 6 | 22.27
SSA | 22.55 | 18.47 | 2 | 23.22
DBO | 19.70 | 16.61 | 0 | 20.37
NGO | 18.74 | 16.57 | 1 | 19.41
TSO | 17.87 | 15.26 | 1 | 18.54
JSOA | 21.13 | 16.30 | 2 | 22.46

Deng, J.; Zhang, H.; Zhang, Y.; Sun, Y. Research on Trajectory Planning for a Limited Number of Logistics Drones (≤3) Based on Double-Layer Fusion GWOP. Drones 2025, 9, 671. https://doi.org/10.3390/drones9100671
