AQGTO: Adaptive Q-Learning-Guided Gorilla Troops Optimizer for 3D UAV Path Planning in Precision Agriculture

Bendouma, Tahar; Boudouh, Saida Sarra; Kerrache, Chaker Abdelaziz; Herrera-Tapia, Jorge

doi:10.3390/drones10050357


by Tahar Bendouma ¹,*, Saida Sarra Boudouh ¹, Chaker Abdelaziz Kerrache ¹ and Jorge Herrera-Tapia ²,*

¹ Laboratoire d’Informatique et de Mathématiques, Université Amar Telidji de Laghouat, Laghouat 03000, Algeria
² Faculty of Computer Science (FACCI), Universidad Laica Eloy Alfaro de Manabí, Manta 130212, Ecuador
* Authors to whom correspondence should be addressed.

Drones 2026, 10(5), 357; https://doi.org/10.3390/drones10050357

Submission received: 25 March 2026 / Revised: 29 April 2026 / Accepted: 4 May 2026 / Published: 8 May 2026

(This article belongs to the Special Issue The Role of UAVs in Modern Agriculture: Precision Spraying and Crop Health Analysis)


Highlights

What are the main findings?

The proposed Adaptive Q-Learning Guided Gorilla Troops Optimizer (AQGTO) improves 3D UAV path planning in precision agriculture by dynamically balancing exploration and exploitation during optimization.
Across representative agricultural environments, AQGTO achieves lower trajectory cost, shorter path length, and more stable performance than the evaluated population-based baselines, while remaining competitive with A* in overall path quality.

What are the implications of the main findings?

Integrating Q-learning into a population-based optimizer provides an effective mechanism for improving search adaptability and reducing premature convergence in complex multi-objective UAV path-planning problems.
The proposed framework offers a robust path-planning solution for agricultural UAV missions such as crop monitoring, field inspection, and targeted spraying in obstacle-rich 3D environments.

Abstract

Unmanned Aerial Vehicles (UAVs) have become a key technology in precision agriculture, enabling efficient monitoring, inspection, and targeted interventions. However, effective UAV path planning in such environments requires the generation of safe, energy-efficient, and smooth trajectories in complex three-dimensional spaces. This paper proposes an Adaptive Q-Learning Guided Gorilla Troops Optimizer (AQGTO) for 3D UAV path planning. The proposed method integrates a state-aware Q-learning mechanism into the Gorilla Troops Optimizer (GTO), enabling the optimizer to adaptively select exploration, exploitation, and diversification strategies according to the current optimization state. A multi-objective cost function is formulated to simultaneously minimize path length, an energy-related surrogate cost, obstacle proximity, path smoothness, and altitude variation. In addition, a feasibility repair mechanism is introduced to ensure collision-free trajectories in environments with cylindrical obstacles. The proposed approach is evaluated in three representative agricultural scenarios: row-crop fields, orchard environments, and hilly terrains. Experimental results show that AQGTO achieves competitive and improved performance compared with classical A*, Particle Swarm Optimization (PSO), Grey Wolf Optimizer (GWO), Whale Optimization Algorithm (WOA), and the original GTO in terms of trajectory cost, path efficiency, and stability. Furthermore, an ablation study confirms that the integration of Q-learning significantly enhances optimization performance. These results suggest that AQGTO provides an effective and robust solution for UAV path planning in complex agricultural environments.

Keywords:

unmanned aerial vehicle path planning; precision agriculture; reinforcement learning; Q-learning; Gorilla Troops Optimizer; metaheuristic optimization; three-dimensional trajectory optimization


1. Introduction

Precision agriculture has emerged as a transformative approach for improving crop productivity, resource efficiency, and environmental sustainability [1,2,3]. By leveraging advanced sensing and automation technologies, precision agriculture enables farmers to monitor crop conditions, detect diseases, and optimize irrigation and fertilization processes. In this context, Unmanned Aerial Vehicles (UAVs) have become increasingly important due to their ability to rapidly collect high-resolution aerial data and perform targeted agricultural operations such as crop monitoring, spraying, and field inspection [1,4,5]. Typical UAV applications in precision agriculture are illustrated in Figure 1.

Compared with traditional ground-based monitoring systems, UAVs offer several advantages including rapid deployment, flexible coverage, and the ability to access difficult terrain. However, effective UAV deployment in agricultural environments requires reliable and efficient path-planning algorithms capable of generating safe and energy-efficient flight trajectories [2,6,7]. Agricultural landscapes often contain complex obstacles such as trees, irrigation systems, buildings, and terrain variations [1,8]. Consequently, UAV path planning must simultaneously consider multiple constraints including obstacle avoidance, trajectory smoothness, energy consumption, and altitude limitations.

Classical path-planning algorithms such as A* have been widely applied in robotic navigation and UAV trajectory generation due to their deterministic search mechanisms and, when the heuristic is admissible, guaranteed optimality on the discretized search graph [9,10]. Despite their effectiveness in structured environments, these algorithms often struggle when dealing with continuous high-dimensional search spaces and multi-objective optimization problems commonly encountered in UAV trajectory planning [2,6].

To address these limitations, researchers have increasingly explored metaheuristic optimization techniques for UAV path planning [11,12,13]. Swarm intelligence algorithms such as Particle Swarm Optimization (PSO) have been widely adopted due to their simple structure and strong global search capability [14,15]. Similarly, recent bio-inspired algorithms such as the Gorilla Troops Optimizer (GTO) have demonstrated promising performance in solving complex nonlinear optimization problems [16]. However, population-based optimization algorithms may still suffer from premature convergence or inefficient exploration–exploitation balance during the search process [17].

Beyond PSO and GTO, several recent swarm-intelligence and bio-inspired algorithms have been investigated for UAV trajectory planning. Grey Wolf Optimizer (GWO)-based methods have attracted attention because of their leadership hierarchy and ability to balance exploration and exploitation. Recent studies have further incorporated Q-learning into GWO to overcome premature convergence, local minima, and limited adaptive learning in UAV path planning [18]. Similarly, Whale Optimization Algorithm (WOA)-based approaches have been applied to three-dimensional UAV trajectory planning, where improved variants use reverse learning, nonlinear convergence factors, and random mechanisms to improve population diversity and avoid local optima [19]. Sparrow Search Algorithm (SSA)-based methods have also recently been used for UAV path planning, with improved variants introducing sine–cosine strategies, Lévy flight, chaotic mapping, or hybrid disturbance mechanisms to improve global exploration, convergence accuracy, and solution stability [20,21].

In recent years, hybrid approaches combining reinforcement learning with metaheuristic optimization have attracted increasing attention [5,22]. Reinforcement learning allows optimization algorithms to adaptively adjust their search strategies based on feedback obtained during the optimization process. Among reinforcement learning techniques, Q-learning provides a simple yet effective framework for learning optimal action-selection policies through interaction with the environment [23].

Despite the significant progress achieved by classical and metaheuristic approaches, existing methods still face several limitations in complex agricultural environments. Classical planners struggle with multi-objective optimization in continuous search spaces, while metaheuristic algorithms often rely on predefined search strategies that cannot dynamically adapt to the optimization state. This may lead to premature convergence or inefficient exploration–exploitation balance, particularly in high-dimensional UAV trajectory planning problems.

To address these limitations, this paper proposes an Adaptive Q-Learning Guided Gorilla Troops Optimizer (AQGTO) for three-dimensional UAV path planning in precision agriculture environments. The proposed approach integrates a Q-learning mechanism into the search process of the Gorilla Troops Optimizer to dynamically guide exploration and exploitation behaviors during optimization. In addition, a feasibility repair strategy is introduced to maintain valid trajectories and improve obstacle avoidance during the search process.

In practical agricultural UAV systems, path planning is typically part of a broader perception–decision–execution pipeline. In such a pipeline, perception modules acquire environmental information using onboard cameras, multispectral sensors, LiDAR, GPS/INS, or pre-existing field maps. This information is then converted into a planning representation, such as an obstacle map, elevation map, or occupancy grid. The decision layer generates a feasible reference trajectory, while the execution layer tracks the planned path through the UAV flight controller. The present work focuses on the decision layer, where AQGTO is used as a global trajectory optimizer under the assumption that the main environmental structure is available before planning. Real-time perception, multispectral image interpretation, online map updating, and low-level flight control are outside the scope of the current study and are identified as future extensions.

The UAV path-planning problem is formulated as a multi-objective optimization task that simultaneously considers path length, energy-related surrogate cost, obstacle avoidance, trajectory smoothness, and altitude variation. Extensive experiments are conducted in representative agricultural environments including row-crop fields, orchard plantations, and hilly terrain scenarios.

The novelty of the proposed approach lies in the state-aware integration of Q-learning into the Gorilla Troops Optimizer for three-dimensional UAV trajectory optimization. Unlike standard metaheuristic approaches that rely mainly on fixed or predefined search operators, AQGTO observes the current optimization state through population diversity, improvement rate, and feasibility ratio. Based on this state information, the Q-learning agent adaptively selects among exploration, exploitation, and diversification actions. This enables the optimizer to respond dynamically to stagnation, premature convergence, and feasibility evolution during the search process. Therefore, the proposed method differs from existing operator-level improvements by introducing an adaptive decision mechanism that guides the search behavior according to the current optimization conditions.

The main contributions of this work can be summarized as follows:

A new Adaptive Q-Learning Guided Gorilla Troops Optimizer (AQGTO) is proposed for three-dimensional UAV path planning in precision agriculture, extending the original GTO with a state-aware reinforcement learning mechanism.
Unlike standard metaheuristic variants that rely mainly on predefined update operators or fixed parameter-control strategies, the proposed AQGTO uses Q-learning to adaptively select among exploration, exploitation, and diversification actions according to population diversity, improvement rate, and feasibility ratio.
A multi-objective trajectory-quality function is designed to jointly optimize path length, energy-related surrogate cost, obstacle avoidance, path smoothness, and altitude variation.
A feasibility repair strategy is introduced to improve constraint satisfaction and promote collision-free trajectories in complex environments with cylindrical agricultural obstacles.
Extensive experiments conducted in multiple agricultural scenarios demonstrate that the proposed method achieves improved trajectory quality and optimization stability compared with deterministic and population-based baseline algorithms.

The remainder of this paper is organized as follows. Section 2 reviews existing UAV path-planning approaches and related optimization techniques. Section 3 presents the proposed AQGTO algorithm and the UAV trajectory formulation. Section 4 describes the simulation environments and experimental setup. Section 5 discusses the experimental results and performance analysis. Finally, Section 6 concludes the paper and outlines potential directions for future work.

2. Related Work

Path planning for unmanned aerial vehicles (UAVs) has been widely investigated in recent years due to its importance in autonomous navigation, surveillance, environmental monitoring, and precision agriculture [1,2,3,6]. In agricultural applications, UAV path planning must address several challenges including obstacle avoidance, energy-efficient flight trajectories, terrain variation, and smooth trajectory generation [8]. Existing approaches for UAV path planning can generally be categorized into classical deterministic planners, metaheuristic optimization methods, and hybrid learning-based approaches [7,24].

2.1. Classical Path Planning Methods

Classical path-planning algorithms have long been used in robotics and autonomous navigation. Graph-search methods such as Dijkstra’s algorithm and A* provide deterministic solutions for shortest-path computation in structured environments, while sampling-based planners such as rapidly exploring random trees (RRT) build a path by randomly sampling the search space [9,25,26]. Among these approaches, the A* algorithm has become one of the most widely adopted techniques due to its heuristic-guided search strategy and, with an admissible heuristic, guaranteed optimality in discretized grid environments [9].

Several studies have extended the A* algorithm to better address UAV-specific constraints. Chen et al. [9] proposed an improved A*-based framework combined with a mixed-integer linear programming (MILP) formulation to systematically explore the solution space. Their approach enhances the evaluation function and node selection strategy, enabling the generation of shorter and time-efficient UAV trajectories compared with traditional A*.

Similarly, Tian et al. [10] developed a hybrid method integrating an improved A* algorithm with the gravitational search algorithm to optimize UAV operations in irregular agricultural fields. Their model explicitly considers heading angle and return points, leading to significant reductions in non-productive flight distance and pesticide over-coverage, thereby improving operational efficiency.

2.2. Metaheuristic Optimization for UAV Path Planning

To overcome the limitations of classical path-planning algorithms, many researchers have explored metaheuristic optimization methods for UAV trajectory planning [11,12,13]. These algorithms are capable of exploring large continuous search spaces and handling multiple optimization objectives simultaneously.

Several studies have leveraged swarm intelligence and bio-inspired optimization to enhance UAV trajectory planning. Sonny et al. [14] proposed a modified Particle Swarm Optimization (PSO) framework that jointly optimizes UAV path planning and energy consumption while satisfying user communication rate requirements. Their approach first determines an optimal destination based on line-of-sight (LoS) probability, followed by PSO-based trajectory optimization, resulting in improved energy efficiency, reduced travel time, and enhanced communication performance in complex 3D environments. Similarly, Lin et al. [15] introduced a Double Self-Limiting PSO (DSLPSO) algorithm to address UAV energy optimization. Their method incorporates movement restriction and dynamic search range adjustment mechanisms to balance local and global search capabilities, leading to improved convergence behavior and extended UAV operational lifetime.

Beyond PSO-based methods, more recent research has explored advanced bio-inspired optimizers. Abdollahzadeh et al. [16] proposed the Gorilla Troops Optimizer (GTO), a population-based algorithm that models the social behavior of gorilla groups to achieve an effective balance between exploration and exploitation. The method demonstrated superior performance across multiple benchmark and engineering optimization problems, particularly in high-dimensional search spaces. Building upon this, Mostafa et al. [17] introduced an improved variant (mGTO) incorporating elite opposition-based learning, Cauchy distribution, and tangent flight operators to enhance population diversity and convergence speed. Their results show that mGTO effectively mitigates premature convergence and achieves more stable optimization performance compared with standard metaheuristics.

Recent studies have also explored GWO, WOA, and SSA variants for UAV trajectory planning. Nayeem et al. [18] proposed an Adaptive Q-Learning Grey Wolf Optimizer for UAV path planning. Their study emphasized that standard GWO may suffer from premature convergence, local minima, and limited adaptability in dynamic environments, and introduced Q-learning-based mechanisms to improve the exploration–exploitation balance. Yang et al. [19] proposed an improved Whale Optimization Algorithm for three-dimensional UAV trajectory planning by incorporating reverse learning, a nonlinear convergence factor, and random mechanisms to improve convergence accuracy and population diversity. In another recent work, Yang et al. [21] proposed an Improved Sparrow Search Algorithm for 3D UAV trajectory planning by combining sine–cosine functions and Lévy flight to enhance population diversity, avoid local optima, and improve trajectory quality. Similarly, Zhang et al. [20] introduced a chaotic mapping–firefly SSA for UAV inspection path planning, showing that chaotic initialization and disturbance-based search can improve convergence speed, solution accuracy, and robustness.

2.3. Hybrid Learning-Based Optimization Methods

In recent years, hybrid approaches that integrate machine learning techniques with metaheuristic optimization have gained increasing attention [5,22]. Reinforcement learning has been particularly promising in this context, as it allows optimization algorithms to adaptively adjust their search strategies based on feedback from the optimization process [22].

Recent studies have increasingly explored the integration of reinforcement learning with UAV path planning to enhance adaptability in complex environments. Fu et al. [5] proposed a Bi-directional Long Short-Term Memory-enhanced Deep Q-Network (BL-DQN) for agricultural UAV coverage planning. Their framework combines remote sensing data acquisition, U-Net-based task area segmentation, and reinforcement learning-based trajectory optimization, resulting in significant improvements in coverage efficiency and reduction in redundant spraying compared with traditional DQN and DFS methods. Similarly, Tu et al. [22] investigated the use of Q-learning for UAV path planning and obstacle avoidance in dynamic environments. Their approach enables UAVs to learn optimal navigation policies through interaction within a simulated environment, demonstrating improved energy efficiency and operational safety compared with conventional reinforcement learning strategies such as SARSA.

2.4. Research Gap

Although classical, swarm-intelligence, and bio-inspired approaches have shown promising performance for UAV trajectory optimization, several challenges remain in agricultural environments. Classical planners often struggle with multi-objective optimization in continuous search spaces [2], while population-based metaheuristic algorithms may suffer from premature convergence, sensitivity to parameter settings, and inefficient exploration–exploitation balance during the search process [17,18].

Recent improved algorithms such as Q-learning-based GWO, improved WOA, and improved SSA have demonstrated that adaptive mechanisms, nonlinear convergence factors, chaotic initialization, and Lévy-flight-based perturbations can improve UAV path-planning performance [18,19,20,21]. Nevertheless, these approaches are generally designed around their own algorithm-specific update rules and do not directly address the adaptive selection of multiple search behaviors within the Gorilla Troops Optimizer.

The main gap addressed in this paper is therefore the lack of a state-aware adaptive mechanism for guiding the search behavior of GTO in three-dimensional UAV path planning. In standard GTO, the transition between exploration and exploitation is governed by predefined stochastic rules, which may not adequately respond to changes in population diversity, improvement rate, or feasibility ratio. This limitation is particularly important in agricultural environments, where the optimizer must simultaneously handle obstacle avoidance, path efficiency, altitude variation, and trajectory smoothness.

To address this gap, the present work proposes an Adaptive Q-Learning Guided Gorilla Troops Optimizer (AQGTO). Unlike conventional GTO and other operator-level improvements, AQGTO uses Q-learning to select among different search strategies according to the current optimization state. This allows the algorithm to dynamically adjust the balance between exploration, exploitation, and diversification during trajectory optimization.

3. Proposed Method

To address the previously mentioned limitations, this work proposes an Adaptive Q-Learning Guided Gorilla Troops Optimizer (AQGTO) that integrates reinforcement learning into the optimization process. By incorporating a Q-learning mechanism, the proposed method dynamically selects appropriate search strategies during optimization, improving convergence stability and trajectory quality in complex agricultural environments.

3.1. Problem Formulation

In precision agriculture applications, UAVs are frequently deployed to perform tasks such as crop monitoring, disease detection, and targeted spraying. These missions require the UAV to navigate safely through agricultural environments that may contain various obstacles, including trees, buildings, irrigation systems, and terrain variations. Consequently, the UAV path-planning problem can be formulated as an optimization task that aims to determine a safe and efficient trajectory between a start position and a target destination while satisfying environmental and flight constraints.

Let the three-dimensional environment be defined as a bounded search space:

$$\Omega = \{(x, y, z) \mid x_{\min} \leq x \leq x_{\max},\ y_{\min} \leq y \leq y_{\max},\ z_{\min} \leq z \leq z_{\max}\} \tag{1}$$

where x, y, and z represent the spatial coordinates of the UAV within the environment.

The UAV trajectory is represented by a sequence of waypoints connecting the start point S and the goal point G:

$$P = \{S, W_{1}, W_{2}, \dots, W_{n}, G\} \tag{2}$$

where $W_{i} = (x_{i}, y_{i}, z_{i})$ denotes the i-th intermediate waypoint and n is the number of intermediate waypoints used to represent the trajectory.

The objective of the path-planning problem is to determine the optimal set of waypoints that minimizes a trajectory cost function while satisfying obstacle avoidance and flight constraints. The optimization problem can therefore be formulated as:

$$\min_{P} C(P) \tag{3}$$

subject to the following environmental and trajectory feasibility constraints:

Boundary constraint. Each waypoint must remain inside the bounded three-dimensional search space:

$$x_{\min} \leq x_{i} \leq x_{\max}, \quad y_{\min} \leq y_{i} \leq y_{\max}, \quad z_{\min} \leq z_{i} \leq z_{\max}, \quad i = 1, \dots, n. \tag{4}$$

Obstacle avoidance constraint. Let $O_{j}$ denote the j-th cylindrical obstacle with center $(x_{j}, y_{j})$, radius $r_{j}$, and height $h_{j}$. A waypoint $W_{i} = (x_{i}, y_{i}, z_{i})$ is considered collision-free with respect to $O_{j}$ if

$$\sqrt{(x_{i} - x_{j})^{2} + (y_{i} - y_{j})^{2}} > r_{j} \quad \text{or} \quad z_{i} > h_{j} + \delta_{h}, \tag{5}$$

where $\delta_{h}$ is a vertical safety margin. In addition, each line segment connecting two consecutive waypoints is checked by sampling intermediate points along the segment to detect possible segment–obstacle intersections.
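As a minimal sketch of the waypoint test in Eq. (5) and of the segment sampling just described, the following Python fragment checks one point against a cylinder and then samples points along a segment. The names `Cylinder`, `waypoint_is_free`, and `segment_is_free`, as well as the default of 20 samples per segment, are illustrative assumptions; the sampling density used by the authors is not stated here.

```python
import math
from typing import NamedTuple, Sequence, Tuple

Point = Tuple[float, float, float]


class Cylinder(NamedTuple):
    """Vertical cylindrical obstacle: center (x, y), radius r, height h."""
    x: float
    y: float
    r: float
    h: float


def waypoint_is_free(w: Point, obs: Cylinder, delta_h: float) -> bool:
    """Eq. (5): free if outside the radius OR above the height margin."""
    radial = math.hypot(w[0] - obs.x, w[1] - obs.y)
    return radial > obs.r or w[2] > obs.h + delta_h


def segment_is_free(a: Point, b: Point, obstacles: Sequence[Cylinder],
                    delta_h: float, n_samples: int = 20) -> bool:
    """Sample n_samples + 1 evenly spaced points on segment a-b (assumed density)."""
    for k in range(n_samples + 1):
        t = k / n_samples
        p = (a[0] + t * (b[0] - a[0]),
             a[1] + t * (b[1] - a[1]),
             a[2] + t * (b[2] - a[2]))
        if not all(waypoint_is_free(p, o, delta_h) for o in obstacles):
            return False
    return True
```

Uniform sampling can miss an obstacle that is thinner than the spacing between samples, so in practice the sample count would need to be chosen relative to the smallest obstacle radius and the longest allowed segment.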

Safety clearance constraint. To reduce the risk of near-obstacle flight, a minimum horizontal clearance distance $d_{s}$ is imposed whenever the UAV flies below the obstacle height margin:

$$\sqrt{(x_{i} - x_{j})^{2} + (y_{i} - y_{j})^{2}} - r_{j} \geq d_{s}, \quad \text{if } z_{i} \leq h_{j} + \delta_{h}. \tag{6}$$

Altitude constraint. The UAV altitude must remain within the operational flight range during the whole trajectory:

$$z_{\min} \leq z_{i} \leq z_{\max}, \quad i = 0, \dots, n + 1, \tag{7}$$

where $W_{0} = S$ and $W_{n + 1} = G$.

Feasible trajectory constraint. The generated path must satisfy basic geometric feasibility conditions between consecutive waypoints. First, the distance between two consecutive waypoints is limited by a maximum segment length $l_{\max}$:

$$\lVert W_{i + 1} - W_{i} \rVert \leq l_{\max}, \quad i = 0, \dots, n. \tag{8}$$

Second, abrupt vertical motion is limited by imposing a maximum altitude difference between consecutive waypoints:

$$| z_{i + 1} - z_{i} | \leq \Delta z_{\max}, \quad i = 0, \dots, n. \tag{9}$$

Finally, abrupt heading changes are discouraged by limiting the turning angle between two consecutive path segments:

$$\theta_{i} = \cos^{-1}\left(\frac{(W_{i} - W_{i - 1}) \cdot (W_{i + 1} - W_{i})}{\lVert W_{i} - W_{i - 1} \rVert \, \lVert W_{i + 1} - W_{i} \rVert}\right) \leq \theta_{\max}, \quad i = 1, \dots, n. \tag{10}$$

These constraints ensure that the optimized trajectory remains inside the flight region, avoids cylindrical obstacles with a safety margin, and maintains physically reasonable transitions between consecutive waypoints. Candidate solutions violating these constraints are corrected using the feasibility repair mechanism described in Section 3.6 and penalized through the objective function when necessary.
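The geometric conditions in Eqs. (8) to (10) can be checked with a few lines of Python. The sketch below is illustrative only: the function names, the message strings, and the choice to treat a zero-length segment as a zero turning angle are assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def turning_angle(prev: Point, cur: Point, nxt: Point) -> float:
    """Angle in radians between segments prev->cur and cur->nxt, as in Eq. (10)."""
    u = [c - p for p, c in zip(prev, cur)]
    v = [n - c for c, n in zip(cur, nxt)]
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0.0 or nv == 0.0:
        return 0.0  # assumption: a degenerate segment counts as "no turn"
    cos_theta = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def geometric_violations(path: Sequence[Point], l_max: float,
                         dz_max: float, theta_max: float) -> List[str]:
    """List every violation of Eqs. (8)-(10) for a path S, W_1, ..., W_n, G."""
    issues: List[str] = []
    for i, (a, b) in enumerate(zip(path, path[1:])):
        if math.dist(a, b) > l_max:          # Eq. (8)
            issues.append(f"segment {i}: length exceeds l_max")
        if abs(b[2] - a[2]) > dz_max:        # Eq. (9)
            issues.append(f"segment {i}: altitude step exceeds dz_max")
    for i in range(1, len(path) - 1):        # Eq. (10), interior waypoints only
        if turning_angle(path[i - 1], path[i], path[i + 1]) > theta_max:
            issues.append(f"waypoint {i}: turn exceeds theta_max")
    return issues
```

Returning a list of violations rather than a single boolean makes it easier to see which constraint a candidate path breaks before any repair or penalty is applied.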

The trajectory cost $C(P)$ is defined as a multi-objective function that considers several factors influencing the quality of the UAV path, including trajectory length, energy-related surrogate cost, obstacle proximity and collision penalties, path smoothness, and altitude variation. These components are described in detail in Section 3.3.

The goal of the optimization algorithm is therefore to determine the waypoint configuration that produces the safest and most efficient UAV trajectory within the agricultural environment.

3.2. Path Encoding

In population-based optimization algorithms, each candidate solution must be encoded as a vector representation that can be efficiently manipulated during the search process.

The UAV trajectory is defined as a sequence of waypoints connecting the start point S and the goal point G, as defined in Section 3.1.

Each candidate solution is therefore represented as a continuous vector containing the coordinates of all intermediate waypoints:

$$X = [x_{1}, y_{1}, z_{1}, x_{2}, y_{2}, z_{2}, \dots, x_{n}, y_{n}, z_{n}] \tag{11}$$

where n denotes the number of intermediate waypoints. The start and goal positions remain fixed and are not modified during optimization.

This representation transforms the UAV path-planning problem into a continuous optimization problem in a search space of dimension:

$$D = 3n \tag{12}$$

where each waypoint contributes three variables corresponding to its spatial coordinates.
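The encoding in Eqs. (11) and (12) amounts to flattening the intermediate waypoints into one vector and re-attaching the fixed endpoints when a path is needed. A minimal Python sketch follows; the helper names `encode` and `decode` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def encode(waypoints: Sequence[Point]) -> List[float]:
    """Flatten n intermediate waypoints into X = [x1, y1, z1, ..., xn, yn, zn]."""
    return [coord for w in waypoints for coord in w]


def decode(x: Sequence[float], start: Point, goal: Point) -> List[Point]:
    """Rebuild the full path S, W_1, ..., W_n, G from a vector of length D = 3n."""
    if len(x) % 3 != 0:
        raise ValueError("decision vector length must be a multiple of 3")
    inner = [(x[i], x[i + 1], x[i + 2]) for i in range(0, len(x), 3)]
    return [start, *inner, goal]
```

Because the start and goal are passed in at decode time, the optimizer only ever manipulates the 3n free coordinates.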

During the optimization process, candidate solutions generated by the swarm algorithms may occasionally violate environmental constraints such as obstacle boundaries or altitude limits. To address this issue, a feasibility repair mechanism is applied after each position update. This mechanism adjusts invalid waypoint coordinates to ensure that the generated trajectory remains within the allowable flight region and avoids obstacle intersections.
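The authors' repair mechanism is specified in Section 3.6 and is not reproduced here. Purely to illustrate what "adjusting invalid waypoint coordinates" could look like, the sketch below clips a waypoint into the flight region and then pushes it radially out of any cylinder it violates. Every detail (the function name `repair_waypoint`, the radial push, the target distance $r_{j} + d_{s}$, the single pass over obstacles) is an assumption made for this example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
Bounds = Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]
Cylinder = Tuple[float, float, float, float]  # (x_j, y_j, r_j, h_j)


def _clip(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))


def repair_waypoint(w: Point, bounds: Bounds, obstacles: Sequence[Cylinder],
                    d_s: float, delta_h: float) -> Point:
    """Hypothetical repair: clip to bounds, then push out of violated cylinders."""
    x, y, z = (_clip(c, lo, hi) for c, (lo, hi) in zip(w, bounds))
    for ox, oy, r, h in obstacles:
        if z > h + delta_h:
            continue  # above the height margin: no horizontal constraint applies
        dx, dy = x - ox, y - oy
        dist = math.hypot(dx, dy)
        target = r + d_s
        if dist < target:
            if dist == 0.0:
                dx, dy, dist = 1.0, 0.0, 1.0  # arbitrary direction at the exact center
            x = ox + dx / dist * target
            y = oy + dy / dist * target
    # Re-clip horizontally in case the push left the flight region.
    return (_clip(x, *bounds[0]), _clip(y, *bounds[1]), z)
```

A single pass like this does not guarantee feasibility: pushing away from one obstacle can move the point into another, and the final clip can undo a push near the boundary, which is one reason a penalty term is typically kept alongside any repair step.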

The waypoint-based representation offers several advantages for UAV trajectory optimization:

It provides a compact and continuous search representation suitable for swarm-based optimization algorithms.
It allows flexible trajectory shapes that can adapt to complex obstacle configurations.
It enables smooth trajectory generation when combined with appropriate cost functions and waypoint interpolation.

This encoding scheme forms the basis for the optimization process performed by the proposed AQGTO algorithm.

3.3. Objective Function

To evaluate the quality of candidate UAV trajectories, a multi-objective cost function is defined that considers path length, an energy-related surrogate cost, obstacle avoidance, path smoothness, and altitude variation. The energy-related term used in this study is not intended to represent a full physical battery-consumption model; instead, it provides a lightweight trajectory-quality proxy that penalizes long travel distances and excessive vertical motion. The total trajectory cost is expressed as:

$$C(P) = w_{1} L(P) + w_{2} E(P) + w_{3} P_{\text{obs}}(P) + w_{4} S(P) + w_{5} H(P) \tag{13}$$

where $L(P)$ is the path length, $E(P)$ is the energy-related surrogate cost, $P_{\text{obs}}(P)$ is the obstacle penalty, $S(P)$ is the smoothness term, and $H(P)$ is the altitude variation.
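Equation (13) is a plain weighted sum, which the following Python sketch makes explicit. The function name `total_cost` and the dictionary keys are hypothetical, and no weight values are given in this part of the paper, so any numbers supplied by a caller are placeholders.

```python
from typing import Mapping

# Term names mirror the symbols of Eq. (13).
TERMS = ("L", "E", "P_obs", "S", "H")


def total_cost(terms: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum C(P) = w1*L + w2*E + w3*P_obs + w4*S + w5*H."""
    missing = [k for k in TERMS if k not in terms or k not in weights]
    if missing:
        raise KeyError(f"missing terms or weights: {missing}")
    return sum(weights[k] * terms[k] for k in TERMS)
```

Since the five terms have different units and magnitudes (metres, squared radians, penalty counts), the weights act as both priorities and scale factors, so they usually need to be tuned together rather than one at a time.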

3.3.1. Path Length

The path length represents the total distance traveled by the UAV along the trajectory. Shorter paths generally correspond to faster mission completion and reduced energy-related surrogate cost.

The path length is computed as:

$$L(P) = \sum_{i = 0}^{n} \lVert W_{i + 1} - W_{i} \rVert \tag{14}$$

where $W_{0} = S$ and $W_{n + 1} = G$.

3.3.2. Energy-Related Surrogate Cost

Energy efficiency is an important consideration for UAV missions, especially in agricultural monitoring and spraying tasks where flight endurance is limited. However, accurately modeling UAV energy consumption requires detailed information about the UAV platform, propulsion system, velocity profile, payload, wind conditions, and climb/descent dynamics. Since the present study focuses on comparing optimization algorithms under a common trajectory-planning framework, a simplified energy-related surrogate cost is used instead of a full physical energy model.

This surrogate term combines the traveled distance and the cumulative vertical displacement between consecutive waypoints. The rationale is that longer trajectories and larger altitude variations generally increase flight effort. Therefore, this term is used to guide the optimizer toward shorter and vertically smoother trajectories while maintaining computational simplicity.

The energy term is defined as:

$$E(P) = \sum_{i = 0}^{n} \lVert W_{i + 1} - W_{i} \rVert + \eta_{z} \sum_{i = 0}^{n} | z_{i + 1} - z_{i} | \tag{15}$$

where $\eta_{z}$ is the coefficient controlling the penalty associated with vertical displacement. The term $E(P)$ should be interpreted as an energy-related trajectory cost rather than a direct physical estimate of UAV battery consumption.
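A short Python sketch of Eqs. (14) and (15) is given below. The function names and the example value of $\eta_{z}$ used by any caller are illustrative assumptions; the paper's setting for $\eta_{z}$ is not stated in this section.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def path_length(path: Sequence[Point]) -> float:
    """Eq. (14): sum of Euclidean segment lengths along S, W_1, ..., W_n, G."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def vertical_travel(path: Sequence[Point]) -> float:
    """Cumulative absolute altitude change between consecutive waypoints."""
    return sum(abs(b[2] - a[2]) for a, b in zip(path, path[1:]))


def energy_surrogate(path: Sequence[Point], eta_z: float) -> float:
    """Eq. (15): traveled distance plus eta_z times the vertical displacement."""
    return path_length(path) + eta_z * vertical_travel(path)
```

For a path that moves 5 units horizontally and then climbs 5 units, the length is 10 and, with a placeholder $\eta_{z} = 2$, the surrogate is 20: the climb is counted once as distance and twice more through the vertical penalty.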

3.3.3. Obstacle Penalty

To ensure safe UAV navigation, trajectories that pass too close to obstacles or intersect them are penalized. Obstacles in the agricultural environment are modeled as cylindrical volumes representing trees, buildings, or agricultural structures.

The obstacle penalty consists of two components: a proximity penalty and a collision penalty. The proximity penalty discourages the UAV from flying too close to obstacle boundaries, while the collision penalty strongly penalizes any trajectory segment that intersects an obstacle.

For a waypoint $W_{i} = (x_{i}, y_{i}, z_{i})$ and an obstacle $O_{j}$ with center $(x_{j}, y_{j})$, radius $r_{j}$, and height $h_{j}$, the radial clearance is defined as:

$$d_{ij} = \sqrt{(x_{i} - x_{j})^{2} + (y_{i} - y_{j})^{2}} - r_{j} \tag{16}$$

A proximity penalty is applied when the waypoint lies below the obstacle height margin and the clearance is smaller than a safety distance $d_{s}$:

P_{n e a r} (P) = \sum_{i = 0}^{n + 1} \sum_{j = 1}^{N_{o b s}} ϕ (d_{i j})

(17)

with

ϕ (d_{i j}) = \begin{cases} β (d_{s} - \max (d_{i j}, 0)), & \text{if } z_{i} \leq h_{j} + δ_{h} \text{ and } d_{i j} < d_{s} \\ 0, & \text{otherwise} \end{cases}

(18)

where $β$ is the proximity gain, $d_{s}$ is the obstacle safety clearance threshold, and $δ_{h}$ is the height safety margin.

In addition, a collision penalty is assigned whenever a trajectory segment intersects an obstacle. Let $N_{col}$ denote the total number of detected segment-obstacle collisions. The collision penalty is defined as:

P_{c o l} (P) = λ N_{c o l}

(19)

where $λ$ is a large constant that strongly penalizes infeasible trajectories.

The total obstacle penalty is therefore:

P_{o b s} (P) = P_{n e a r} (P) + P_{c o l} (P)

(20)
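The obstacle penalty of Equations (16)-(20) can be sketched as follows. The proximity part follows the equations directly. The collision detector is an assumption: the text does not state how segment-obstacle intersections are detected, so this sketch simply samples points along each segment and tests them against each cylinder. All names, the `(cx, cy, r, h)` obstacle tuple layout, and the numerical values are hypothetical.

```python
import math

def proximity_penalty(path, obstacles, beta, d_s, delta_h):
    """Equations (16)-(18), summed over all waypoints and obstacles."""
    total = 0.0
    for x, y, z in path:
        for cx, cy, r, h in obstacles:
            d = math.hypot(x - cx, y - cy) - r        # radial clearance, Eq. (16)
            if z <= h + delta_h and d < d_s:
                total += beta * (d_s - max(d, 0.0))   # Eq. (18)
    return total

def count_collisions(path, obstacles, samples=20):
    """Assumed detector: sample each segment and test points against each cylinder."""
    n_col = 0
    for p, q in zip(path, path[1:]):
        for cx, cy, r, h in obstacles:
            for k in range(samples + 1):
                t = k / samples
                x, y, z = (p[i] + t * (q[i] - p[i]) for i in range(3))
                if math.hypot(x - cx, y - cy) <= r and z <= h:
                    n_col += 1      # count each segment-obstacle pair once
                    break
    return n_col

def obstacle_penalty(path, obstacles, beta, d_s, delta_h, lam):
    """Equation (20): proximity penalty plus lam times the collision count."""
    return (proximity_penalty(path, obstacles, beta, d_s, delta_h)
            + lam * count_collisions(path, obstacles))

# Hypothetical scene: one cylinder (cx, cy, r, h) and two straight paths.
obstacles = [(5.0, 0.0, 1.0, 20.0)]
low_path = [(0.0, 0.0, 10.0), (10.0, 0.0, 10.0)]    # passes through the cylinder
high_path = [(0.0, 0.0, 30.0), (10.0, 0.0, 30.0)]   # flies above it
print(obstacle_penalty(low_path, obstacles, beta=10.0, d_s=2.0, delta_h=2.0, lam=1000.0))
print(obstacle_penalty(high_path, obstacles, beta=10.0, d_s=2.0, delta_h=2.0, lam=1000.0))
```

A sampled detector can miss very thin obstacles when the sampling step is coarse, so an exact segment-cylinder intersection test would be preferable in practice.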

3.3.4. Path Smoothness

Smooth trajectories are desirable for UAV navigation because they reduce abrupt direction changes, improve flight stability, and lower control effort. In the proposed model, smoothness is measured using the squared turning angles between consecutive path segments.

The smoothness term is defined as:

S (P) = \sum_{i = 1}^{n} θ_{i}^{2}

(21)

where $θ_{i}$ denotes the turning angle between two consecutive trajectory segments.
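A minimal sketch of Equation (21) is given below. It assumes that the turning angle at an intermediate waypoint is the angle between the direction vectors of the incoming and outgoing segments, so a straight continuation gives an angle of zero. The function names and the example path are hypothetical.

```python
import math

def turning_angle(a, b, c):
    """Angle (radians) between segment a->b and segment b->c."""
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - b[k] for k in range(3)]
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0                       # degenerate segment: treat as no turn
    cos_t = sum(x * y for x, y in zip(u, v)) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, cos_t)))   # clamp against rounding error

def smoothness(path):
    """Equation (21): sum of squared turning angles at intermediate waypoints."""
    return sum(turning_angle(path[i - 1], path[i], path[i + 1]) ** 2
               for i in range(1, len(path) - 1))

# Hypothetical path with a single 90-degree turn.
path = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(smoothness(path))  # (pi / 2) ** 2
```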

3.3.5. Altitude Variation

Large altitude changes may increase energy consumption and reduce flight stability. Therefore, altitude variation is penalized to encourage smoother vertical motion along the trajectory.

The altitude variation term is defined as:

H (P) = \sum_{i = 0}^{n} | z_{i + 1} - z_{i} |

(22)

3.4. Gorilla Troops Optimizer

The Gorilla Troops Optimizer (GTO) is a population-based metaheuristic optimization algorithm inspired by the social behavior and leadership hierarchy of gorilla groups. In this algorithm, each gorilla represents a candidate solution within the search space, and the group collectively explores the environment to identify optimal solutions.

In the context of UAV path planning, each gorilla encodes a candidate trajectory defined by a set of intermediate waypoints as described in Section 3.2. The quality of each trajectory is evaluated using the objective function defined in Section 3.3.

The GTO algorithm operates through two main phases: exploration and exploitation.

3.4.1. Exploration Phase

During the exploration phase, gorillas search for promising regions of the solution space by performing random movements. This phase helps maintain population diversity and prevents premature convergence.

The exploration movement can be expressed as:

X_{i}^{t + 1} = X_{i}^{t} + r \cdot (X_{r a n d} - X_{i}^{t})

(23)

where $X_{i}^{t}$ represents the position of the i-th gorilla at iteration t, $X_{rand}$ is a randomly selected solution from the population, and r is a random coefficient controlling the exploration step.

3.4.2. Exploitation Phase

Once promising regions of the search space are identified, the algorithm enters the exploitation phase, where gorillas move toward the best-known solution, known as the silverback.

The exploitation update rule can be expressed as:

X_{i}^{t + 1} = X_{i}^{t} + r \cdot (X_{b e s t} - X_{i}^{t})

(24)

where $X_{best}$ represents the best solution found so far in the population.

Through iterative exploration and exploitation, the gorilla population gradually converges toward optimal regions of the search space.
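Equations (23) and (24) share the same form and differ only in the target solution. The sketch below makes this explicit with a single helper. Treating a candidate solution as a flat list of coordinates and drawing r uniformly from [0, 1) are assumptions of this sketch, since the text does not specify the distribution of r.

```python
import random

def move_toward(x, target, r):
    """Shared form of Eqs. (23)-(24): x + r * (target - x), element-wise."""
    return [xi + r * (ti - xi) for xi, ti in zip(x, target)]

def explore(x, population, rng=random):
    """Equation (23): step toward a randomly selected population member."""
    return move_toward(x, rng.choice(population), rng.random())

def exploit(x, x_best, rng=random):
    """Equation (24): step toward the best-known solution (the silverback)."""
    return move_toward(x, x_best, rng.random())

# Deterministic check of the shared update with a fixed coefficient.
print(move_toward([0.0, 0.0], [10.0, 20.0], 0.5))  # [5.0, 10.0]
```

With r in [0, 1), both updates place the new position on the line segment between the current solution and its target, so the step never overshoots the target.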

Although GTO demonstrates strong global search capability, its search strategies remain predefined throughout the optimization process. As a result, the algorithm may experience difficulties dynamically adapting the balance between exploration and exploitation as the optimization progresses. This limitation motivates the integration of reinforcement learning into the optimization process, as described in the following subsection.

3.5. Adaptive Q-Learning Mechanism

To dynamically balance exploration and exploitation during optimization, a Q-learning mechanism is integrated into the Gorilla Troops Optimizer. The reinforcement learning agent selects the search strategy at each iteration based on the current state of the population.

3.5.1. State Definition

The state is defined using three indicators that characterize the optimization process:

Population diversity (D): measured as the average Euclidean distance between individuals and the population centroid.
Improvement rate ( $Δ f$ ): defined as the relative improvement of the best solution between consecutive iterations.
Feasibility ratio (F): defined as the proportion of collision-free solutions in the population.

Each indicator is discretized into three levels (Low, Medium, High), resulting in a discrete state space of $3 \times 3 \times 3 = 27$ states:

S = \{(D, Δ f, F) ∣ D, Δ f, F \in \{\text{Low}, \text{Medium}, \text{High}\}\}

(25)

Equation (25) defines the discrete optimization-state space used by the Q-learning agent. This representation allows the algorithm to capture stagnation, convergence, and feasibility conditions during the search process.
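One possible way to map the three indicators to a Q-table row is sketched below. The diversity measure follows the definition given above. The discretization thresholds and the row-major index layout are assumptions chosen for illustration, since the text does not report the threshold values.

```python
import math

def diversity(population):
    """Average Euclidean distance between individuals and the population centroid."""
    n = len(population)
    dim = len(population[0])
    centroid = [sum(ind[k] for ind in population) / n for k in range(dim)]
    return sum(math.dist(ind, centroid) for ind in population) / n

def level(value, low, high):
    """Discretize a scalar into 0 (Low), 1 (Medium), or 2 (High)."""
    if value < low:
        return 0
    if value < high:
        return 1
    return 2

def state_index(d_level, df_level, f_level):
    """Map (D, delta f, F) levels to one of the 27 states (row-major order)."""
    return 9 * d_level + 3 * df_level + f_level

# Hypothetical two-individual population and hypothetical thresholds.
population = [[0.0, 0.0], [2.0, 0.0]]
d = diversity(population)                      # 1.0
s = state_index(level(d, 0.5, 5.0),            # diversity -> Medium
                level(0.001, 0.01, 0.1),       # improvement rate -> Low
                level(0.9, 0.3, 0.7))          # feasibility ratio -> High
print(d, s)  # 1.0 11
```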

3.5.2. Action Space

The action space consists of four search strategies:

$a_{0}$ : Strong exploration (large random perturbations)
$a_{1}$ : Moderate exploration (peer-guided movement)
$a_{2}$ : Exploitation (movement toward the best solution)
$a_{3}$ : Diversification jump (random restart or large perturbation)

Each action corresponds to a specific update rule applied to the population.

3.5.3. Reward Function

The reward is defined based on both solution improvement and feasibility evolution:

R = λ_{1} \cdot Δ f + λ_{2} \cdot Δ F

(26)

where:

$Δ f$ is the improvement gain in the best objective value,
$Δ F$ is the change in feasibility ratio.

Additional penalty terms are applied when no improvement is observed or when feasibility decreases, in order to discourage ineffective search actions:

A penalty is applied when $Δ f$ is below a small threshold.
A penalty is applied when the feasibility ratio decreases.
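The reward of Equation (26), together with the two penalty rules listed above, might be implemented along the following lines. The weights, the stagnation threshold, and the penalty magnitudes are hypothetical placeholders, since their values are not reported in this section.

```python
def reward(delta_f, delta_F, lam1=1.0, lam2=1.0,
           stall_threshold=1e-6, stall_penalty=0.1, feas_penalty=0.1):
    """Equation (26) plus the two penalty rules; all default values are assumed."""
    r = lam1 * delta_f + lam2 * delta_F
    if delta_f < stall_threshold:   # no meaningful improvement of the best cost
        r -= stall_penalty
    if delta_F < 0.0:               # feasibility ratio decreased
        r -= feas_penalty
    return r

print(reward(0.2, 0.1))    # improvement and higher feasibility: positive reward
print(reward(0.0, -0.1))   # stagnation and lower feasibility: both penalties apply
```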

3.5.4. Q-Learning Update Rule

The Q-values are updated using:

Q (s, a) \leftarrow Q (s, a) + α [R + γ \max_{a'} Q (s', a') - Q (s, a)]

(27)

where $α$ is the learning rate, $γ$ is the discount factor, s and $s'$ are the current and next states, respectively, a is the selected action, and R is the reward defined in Equation (26).
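A tabular version of Equation (27) is sketched below, using a 27 x 4 table that matches the state and action spaces defined above. The default values of `alpha` and `gamma` are those reported in Section 4.4. The zero initialization of the table and the example numbers are assumptions.

```python
def q_update(Q, s, a, R, s_next, alpha=0.12, gamma=0.90):
    """Equation (27): one temporal-difference update of the entry Q[s][a]."""
    td_target = R + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]

# 27 states x 4 actions, initialized to zero (assumed initialization).
Q = [[0.0] * 4 for _ in range(27)]
Q[1][1] = 2.0                                   # hypothetical existing estimate
print(q_update(Q, s=0, a=2, R=1.0, s_next=1))   # 0.12 * (1.0 + 0.9 * 2.0)
```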

3.5.5. Action Selection Policy

An $ϵ$-greedy policy is adopted:

With probability $ϵ$, a random action is selected.
Otherwise, the action with the highest Q-value is selected.

The parameter $ϵ$ decreases linearly during the optimization process to gradually shift from exploration to exploitation.
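The policy and its linear decay can be sketched as follows. The start and end values of 0.15 and 0.02 are those reported in Section 4.4. Indexing iterations from 1 to the maximum iteration count and breaking ties in favor of the lowest action index are assumptions of this sketch.

```python
import random

def epsilon_at(t, t_max, eps_start=0.15, eps_end=0.02):
    """Linear decay from eps_start at t = 1 to eps_end at t = t_max."""
    frac = (t - 1) / max(t_max - 1, 1)
    return eps_start + frac * (eps_end - eps_start)

def select_action(q_row, eps, rng=random):
    """Epsilon-greedy choice over one Q-table row."""
    if rng.random() < eps:
        return rng.randrange(len(q_row))                    # random action
    return max(range(len(q_row)), key=q_row.__getitem__)    # greedy action

print(epsilon_at(1, 100), epsilon_at(100, 100))
print(select_action([0.0, 1.0, 5.0, 2.0], eps=0.0))  # greedy: index 2
```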

3.5.6. Integration into AQGTO

At each iteration, the agent observes the current state and selects an action that determines how the population is updated. The selected strategy modifies the movement of candidate solutions, enabling adaptive control of the search behavior throughout the optimization process.

3.6. Feasibility Repair Mechanism

During the optimization process, population-based algorithms such as PSO and GTO may generate candidate trajectories that violate environmental or flight constraints. For example, some waypoints may lie outside the allowable flight region, intersect with obstacles, or produce unrealistic altitude variations. To ensure that all candidate trajectories remain feasible, a feasibility repair mechanism is applied after each position update.

The purpose of this mechanism is to correct invalid waypoint coordinates while preserving the overall structure of the trajectory. The repair procedure consists of four main steps.

Boundary correction. If a waypoint lies outside the allowable flight region $Ω$, its coordinates are projected back into the feasible space:

x_{i} = min (max (x_{i}, x_{m i n}), x_{m a x})

(28)

y_{i} = min (max (y_{i}, y_{m i n}), y_{m a x})

(29)

z_{i} = min (max (z_{i}, z_{m i n}), z_{m a x})

(30)

This ensures that all waypoints remain within the predefined spatial limits of the environment.
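Equations (28)-(30) amount to a per-coordinate clamp, as the short sketch below shows. The altitude limits of 5 m and 60 m are those discussed in Section 4.3. The horizontal limits of 100 m are hypothetical placeholders, and the function names are assumptions.

```python
def clamp(value, lower, upper):
    """min(max(value, lower), upper), as in Equations (28)-(30)."""
    return min(max(value, lower), upper)

def project_to_bounds(waypoint, bounds):
    """Clamp each coordinate of a waypoint to its (lower, upper) interval."""
    return tuple(clamp(v, lo, hi) for v, (lo, hi) in zip(waypoint, bounds))

# x and y limits are hypothetical; the z limits follow Section 4.3.
bounds = [(0.0, 100.0), (0.0, 100.0), (5.0, 60.0)]
print(project_to_bounds((-3.0, 120.0, 2.0), bounds))  # (0.0, 100.0, 5.0)
```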

Obstacle avoidance correction. Agricultural obstacles such as trees or buildings are modeled as cylindrical regions in the environment. If a waypoint is detected inside an obstacle region, the waypoint is shifted to the nearest feasible position outside the obstacle boundary. This adjustment preserves the continuity of the trajectory while ensuring collision-free navigation.

Altitude adjustment. To maintain feasible UAV flight dynamics, large altitude variations between consecutive waypoints are limited. If the altitude difference between two waypoints exceeds a predefined threshold, the waypoint altitude is adjusted to satisfy the allowable variation constraint.

Segment-length and turning-angle correction. To avoid unrealistic jumps between consecutive waypoints, the distance between adjacent waypoints is checked after each position update. If $∥ W_{i + 1} - W_{i} ∥ > l_{max}$, the waypoint is shifted along the segment direction until the maximum segment-length constraint is satisfied. Similarly, if the turning angle $θ_{i}$ exceeds $θ_{max}$, the waypoint is locally adjusted toward the bisector direction of the neighboring segments. This correction reduces abrupt heading changes and improves the physical feasibility of the generated trajectory.
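The altitude adjustment and the segment-length correction can be illustrated with the sketch below. It assumes that the later waypoint of each pair is the one being moved, and the threshold values in the example are hypothetical. The turning-angle correction is not covered here because the text does not give enough detail to reproduce it faithfully.

```python
import math

def limit_altitude_step(prev, w, dz_max):
    """Cap the altitude change from prev to w at dz_max, keeping x and y."""
    dz = w[2] - prev[2]
    if abs(dz) > dz_max:
        return (w[0], w[1], prev[2] + math.copysign(dz_max, dz))
    return w

def limit_segment_length(prev, w, l_max):
    """Pull w back along the segment prev -> w until its length equals l_max."""
    d = math.dist(prev, w)
    if d > l_max:
        scale = l_max / d
        return tuple(p + scale * (c - p) for p, c in zip(prev, w))
    return w

print(limit_altitude_step((0.0, 0.0, 10.0), (0.0, 0.0, 40.0), dz_max=10.0))
print(limit_segment_length((0.0, 0.0, 0.0), (30.0, 40.0, 0.0), l_max=25.0))
```

In the second call the original segment has length 50, so the waypoint is moved to the midpoint of the segment, which lies exactly 25 units from the previous waypoint.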

Although the present implementation uses cylindrical obstacles, the proposed optimization framework is not restricted to this specific geometry. AQGTO operates on waypoint coordinates and relies on the feasibility-checking and penalty-evaluation modules to determine whether a candidate trajectory is valid. Therefore, more complex obstacle representations, such as polygonal regions, ellipsoidal canopies, occupancy grids, or point-cloud-based maps, can be incorporated by replacing the collision-detection and obstacle-penalty functions while preserving the same Q-learning-guided optimization structure.

By applying these corrections after each optimization step, the feasibility repair mechanism guarantees that all candidate trajectories evaluated by the objective function remain valid and collision-free. This mechanism significantly improves the stability of the optimization process and prevents the algorithm from wasting iterations evaluating infeasible solutions.

3.7. AQGTO Algorithm

The proposed Adaptive Q-Learning Guided Gorilla Troops Optimizer (AQGTO) integrates reinforcement learning into the Gorilla Troops Optimizer to dynamically adjust the search behavior during the optimization process. The objective of this integration is to improve convergence stability and trajectory optimization performance by adaptively controlling the balance between exploration and exploitation.

In the classical GTO algorithm, the search strategy is predefined and remains fixed throughout the optimization process. This limitation may lead to inefficient exploration in early iterations or premature convergence in later stages. To address this issue, the proposed AQGTO framework introduces a Q-learning agent that observes the state of the optimization process and selects appropriate search strategies accordingly.

At each iteration, the Q-learning agent evaluates the current optimization state and determines whether the algorithm should emphasize exploration or exploitation. Based on the selected action, the gorilla population updates its positions accordingly. The updated candidate trajectories are then corrected using the feasibility repair mechanism and evaluated using the objective function. A reward signal reflecting the improvement of the best solution and the change in the feasibility ratio (Equation (26)) is used to update the Q-table, enabling the algorithm to learn effective search strategies over time. The workflow of the proposed AQGTO framework is illustrated in Figure 2.

The overall procedure of the proposed AQGTO algorithm is summarized in Algorithm 1.

Algorithm 1 Adaptive Q-Learning Guided Gorilla Troops Optimizer (AQGTO)

1: Initialize gorilla population $X_{i}$
2: Initialize Q-table $Q (s, a)$
3: Evaluate fitness of each solution
4: for iteration $t = 1$ to $T_{max}$ do
5:     Identify best solution $X_{best}$
6:     Observe optimization state s
7:     Select action a using policy derived from Q-table
8:     for each gorilla i do
9:         if action indicates exploration then
10:             Update position using exploration rule
11:         else
12:             Update position using exploitation rule
13:         end if
14:         Apply feasibility repair mechanism
15:     end for
16:     Evaluate updated population
17:     Compute reward R using Equation (26)
18:     Observe next state $s'$
19:     Update Q-table using Equation (27)
20: end for
21: Return best trajectory found
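The control flow of Algorithm 1 can be illustrated with a deliberately reduced toy version. This sketch is not the authors' implementation: it uses two actions instead of four, two states instead of 27, an unconstrained test function instead of the trajectory cost, and it omits the feasibility repair step. The names `aqgto_toy` and `sphere`, the search range, and the simplified reward are all assumptions. Only the learning rate, discount factor, and exploration-rate schedule follow Section 4.4.

```python
import random

def aqgto_toy(cost, dim=4, pop_size=10, t_max=50, alpha=0.12, gamma=0.90, seed=0):
    """Toy Q-learning-guided loop following the structure of Algorithm 1."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
    q = [[0.0, 0.0] for _ in range(2)]     # 2 toy states x 2 actions
    best = min(pop, key=cost)[:]
    history = [cost(best)]                 # best cost after each iteration
    s = 0                                  # toy state: 1 if the last step improved
    for t in range(1, t_max + 1):
        eps = 0.15 + (0.02 - 0.15) * (t - 1) / max(t_max - 1, 1)
        if rng.random() < eps:
            a = rng.randrange(2)                        # random action
        else:
            a = max((0, 1), key=lambda k: q[s][k])      # greedy action
        new_pop = []
        for x in pop:
            # Action 0: Eq. (23), random peer. Action 1: Eq. (24), best solution.
            target = rng.choice(pop) if a == 0 else best
            r = rng.random()
            new_pop.append([xi + r * (ti - xi) for xi, ti in zip(x, target)])
        pop = new_pop
        cand = min(pop, key=cost)
        improvement = history[-1] - cost(cand)
        if improvement > 0.0:
            best = cand[:]
        reward = max(improvement, 0.0)     # simplified reward, not Eq. (26)
        s_next = 1 if improvement > 0.0 else 0
        q[s][a] += alpha * (reward + gamma * max(q[s_next]) - q[s][a])   # Eq. (27)
        s = s_next
        history.append(cost(best))
    return best, history

def sphere(x):
    """Stand-in objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)

best, history = aqgto_toy(sphere)
print(history[0], history[-1])
```

Because the best solution is replaced only when a strictly better candidate appears, the recorded best cost never increases from one iteration to the next.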

4. Experimental Setup

This section describes the experimental framework used to evaluate the performance of the proposed AQGTO algorithm, including the UAV modeling assumptions, simulation environments, parameter settings, and evaluation metrics. The simulation experiments were conducted on a workstation equipped with an Intel Core i5-12400 CPU, 32 GB DDR4 3600 MHz RAM, and a 2 TB PCIe Gen 4 NVMe SSD. The implementation was developed in Python 3.14.3 using standard scientific computing libraries, including NumPy and Matplotlib; no physical UAV or field equipment was used in this simulation-based study.

The present experimental setup focuses on global three-dimensional trajectory optimization in known agricultural environments. In this setting, the obstacle map and environmental boundaries are assumed to be available during planning, and the objective is to generate a safe and efficient reference trajectory before mission execution. This assumption is commonly adopted in offline UAV trajectory optimization studies because it enables controlled comparison among optimization algorithms under identical environmental and objective-function conditions. Dynamic replanning, perception uncertainty, wind-field estimation, and communication latency are not explicitly modeled in the current experiments and are discussed as limitations and future research directions.

From a system-level perspective, the simulated obstacle maps used in this work can be interpreted as the output of an upstream perception or mapping stage. For example, in real agricultural missions, crop rows, trees, terrain variations, or restricted regions may be extracted from UAV imagery, multispectral data, LiDAR point clouds, or geographic field maps before trajectory optimization. AQGTO then operates on this representation to generate a reference path. Therefore, the current evaluation isolates the planning component in order to assess the optimization performance of the proposed algorithm under controlled and reproducible conditions.

4.1. UAV Model and Constraints

In precision agriculture applications, unmanned aerial vehicles are typically deployed to perform tasks such as crop monitoring, spraying, and field mapping. During these missions, the UAV must navigate safely through environments containing obstacles such as trees, buildings, and terrain variations. To ensure realistic trajectory generation, the path-planning process must consider several UAV operational constraints.

In this work, the UAV is modeled as a point-mass system operating in a three-dimensional environment. The trajectory is defined by a sequence of waypoints connecting the start point S to the goal point G, as described in Section 3.2. The UAV is assumed to move along straight-line segments between consecutive waypoints.

Several constraints are imposed to ensure safe and feasible UAV navigation.

Altitude constraint. The UAV altitude must remain within allowable flight limits defined by the mission requirements:

z_{m i n} \leq z_{i} \leq z_{m a x}

(31)

where $z_{i}$ represents the altitude of waypoint $W_{i}$.

Obstacle avoidance constraint. The UAV trajectory must avoid obstacles present in the agricultural environment. Obstacles such as trees or infrastructure elements are modeled as cylindrical regions defined by their center coordinates, radius, and height. Any trajectory segment intersecting an obstacle is considered infeasible and penalized through the objective function.

Trajectory continuity constraint. To maintain feasible UAV motion, the distance between consecutive waypoints must remain within a reasonable range, ensuring smooth transitions between trajectory segments.

Flight safety constraint. A minimum safety distance is maintained between the UAV trajectory and obstacle boundaries to account for navigation uncertainty and environmental disturbances.

These constraints ensure that the generated trajectories are physically realizable and safe for UAV operation in agricultural environments.

4.2. Simulation Environment

To evaluate the performance of the proposed AQGTO algorithm, three representative agricultural path-planning scenarios were simulated. These scenarios are not intended to reproduce a single real farm as a digital twin; rather, they provide controlled test environments that capture common geometric challenges encountered in precision-agriculture UAV missions. The simulated tasks correspond to field monitoring, orchard inspection, and terrain-aware agricultural surveying, where the UAV must generate a collision-free and smooth reference trajectory between a fixed start point and a target location. The considered environments are illustrated in Figure 3.

The simulation space is defined as a bounded three-dimensional region:

Ω = [0, X_{m a x}] \times [0, Y_{m a x}] \times [Z_{m i n}, Z_{m a x}]

(32)

where $X_{max}$ and $Y_{max}$ represent the horizontal dimensions of the field, and $Z_{min}$ and $Z_{max}$ correspond to the allowable UAV altitude limits.

Within this environment, obstacles representing trees, buildings, or agricultural structures are distributed according to the characteristics of the considered scenario. Obstacles are modeled as cylindrical volumes defined by their center coordinates, radius, and height.

The cylindrical obstacle model is used as a geometric abstraction of common agricultural structures such as tree trunks and canopies, irrigation equipment, poles, storage elements, and small field buildings. This representation provides a controlled and computationally efficient setting for evaluating obstacle avoidance and trajectory optimization. Nevertheless, real agricultural environments may contain irregular, non-convex, and heterogeneous obstacle shapes. The present model therefore represents a simplified but useful first-level approximation, and the extension toward more realistic obstacle geometries is discussed in Section 5.9.

Three different agricultural environments are considered in the experiments:

Row-crop environment. This scenario represents a structured agricultural field, such as cereal, vegetable, or maize-like row cultivation, where vegetation rows and irrigation elements form elongated obstacle patterns. The simulated UAV mission corresponds to field monitoring or targeted spraying, where the aircraft must move from one side of the field to the opposite side while maintaining safe clearance from crop rows and field infrastructure. The main planning challenge in this scenario is to generate a short and regular path through narrow free-space corridors without violating obstacle-clearance constraints.

Orchard environment. This scenario represents an orchard or tree-plantation inspection task, where trees are approximated by cylindrical obstacles defined by trunk/canopy radius and height. The obstacle density is higher than in the row-crop environment, which forces the planner to generate more flexible trajectories around tree-like structures. This scenario is representative of UAV missions for canopy monitoring, disease inspection, or localized treatment in orchards.

Hilly terrain environment. This scenario represents agricultural monitoring over uneven or sloped terrain. Terrain variation affects the desired altitude profile and increases the importance of vertical smoothness and altitude-variation penalties. The planning objective is to maintain a safe and regular trajectory while avoiding excessive climb/descent behavior. This scenario is relevant to agricultural fields located in non-flat regions, where altitude changes and terrain clearance must be considered during UAV route generation.

For each environment, the start and goal positions are defined at opposite regions of the search space to ensure sufficiently long trajectories and meaningful optimization challenges. The proposed AQGTO algorithm and baseline methods are evaluated using identical environmental conditions to ensure fair comparisons.

4.3. Simulation Parameters

The experiments were conducted in a simulated three-dimensional agricultural environment designed to represent typical UAV operating conditions in precision agriculture. The simulation parameters defining the environment size, altitude limits, obstacle properties, and trajectory representation are summarized in Table 1.

The selected parameters aim to provide sufficiently complex environments while maintaining computational feasibility for repeated optimization runs. These settings are used consistently for all evaluated algorithms to ensure fair performance comparisons.

The obstacle dimensions and altitude limits were selected to create representative 3D planning challenges rather than to reproduce one specific agricultural field. Cylindrical obstacles with radii between 3 and 5 m and heights between 18 and 30 m provide a simplified approximation of trees, compact canopy regions, irrigation structures, poles, or small field buildings. The altitude range $z \in [5, 60]$ m was intentionally defined as a broad operational search interval to test the optimizer in a three-dimensional space. However, the altitude-variation term, the energy-related surrogate cost, and the feasibility constraints discourage unnecessary vertical motion and promote smoother altitude profiles. In practical deployments, these bounds should be adjusted according to the UAV platform, crop type, regulatory altitude limits, sensor requirements, and mission objective.

All experiments were conducted using the same implementation framework and identical stopping criteria to ensure reproducibility and fair comparison between algorithms.

4.4. Algorithm Parameters

To ensure fair and reproducible comparisons, all optimization algorithms were executed using consistent parameter settings across the experiments. The parameter values were selected based on commonly used configurations in swarm optimization literature and preliminary empirical tuning.

The parameters used for the Particle Swarm Optimization (PSO), Gorilla Troops Optimizer (GTO), and the Q-learning component of the proposed AQGTO algorithm are summarized in Table 2.

The algorithmic parameters were selected to ensure a fair comparison among the evaluated optimization methods under the same computational budget. The population size and maximum number of iterations were kept identical for all population-based algorithms. The PSO parameters were chosen according to commonly used values in swarm-optimization studies and preliminary empirical testing. For the Q-learning component of AQGTO, the learning rate $α = 0.12$ was selected to allow gradual Q-value adaptation without producing unstable oscillations, while the discount factor $γ = 0.90$ gives sufficient importance to future rewards during iterative optimization. The exploration rate $ϵ$ was linearly decreased from 0.15 to 0.02 to encourage moderate exploration during the early iterations and more exploitation-oriented behavior during the later search stages.

All algorithms were executed under identical experimental conditions, including population size, maximum number of iterations, and number of runs (Table 2 and Table 3). This ensures a fair and unbiased comparison of optimization performance across different methods. Each algorithm was run multiple times to account for stochastic variability, and the reported results correspond to the statistical averages obtained across these runs.

The objective-function weights were selected according to the relative importance of the mission requirements in agricultural UAV path planning. Obstacle avoidance was assigned the highest priority because trajectory safety is a mandatory requirement in environments containing trees, irrigation structures, and terrain-related constraints. Path length and energy-related effort were assigned relatively high weights because they directly affect mission duration and UAV endurance. Smoothness and altitude variation were assigned lower weights because they act mainly as regularization terms that reduce abrupt directional and vertical changes without dominating the safety and efficiency objectives. The penalty parameters were selected to strongly discourage infeasible trajectories, particularly segment-obstacle intersections, while still allowing the optimizer to compare feasible candidate paths according to their geometric and energetic quality. Preliminary tuning was conducted to avoid dominance of any single objective term and to ensure stable convergence across the simulated agricultural scenarios.

4.5. Evaluation Metrics

To quantitatively evaluate the performance of the proposed AQGTO algorithm and the baseline methods, several metrics are considered. These metrics assess different aspects of the generated UAV trajectories, including efficiency, safety, and trajectory quality.

Trajectory cost. The primary evaluation metric is the overall trajectory cost defined by the objective function described in Section 3.3. This metric combines path length, energy-related surrogate cost, obstacle penalties, smoothness, and altitude variation into a single value representing the quality of the UAV trajectory.

Path length. Path length measures the total distance traveled by the UAV from the start point to the goal point along the generated trajectory. Shorter paths generally lead to faster mission completion and reduced energy consumption.

L (P) = \sum_{i = 0}^{n} ∥ W_{i + 1} - W_{i} ∥

(33)

Energy-related surrogate cost. The energy-related surrogate cost provides a simplified trajectory-level indicator of flight effort. It combines traveled distance and vertical displacement, and is used only as a comparative optimization metric. It should not be interpreted as a complete physical energy-consumption model, since real UAV energy depends on UAV mass, propulsion characteristics, flight speed, acceleration, payload, wind conditions, and climb/descent dynamics.

Path smoothness. Trajectory smoothness reflects the stability of UAV motion. It is typically measured using the turning angles between consecutive trajectory segments. Smoother trajectories result in more stable UAV flight and lower control effort.

Collision count. The number of collisions indicates whether the generated trajectory intersects with obstacles. A valid UAV path should avoid all obstacles in the environment, resulting in zero collisions.

For each algorithm, the reported results correspond to the average performance obtained over multiple independent runs in order to account for the stochastic nature of population-based optimization methods.

In addition to mean performance, standard deviation values are reported to assess the robustness and stability of each algorithm.

5. Results and Discussion

This section presents and analyzes the experimental results obtained by the proposed AQGTO algorithm, including comparisons with baseline methods, performance across different agricultural scenarios, and evaluations of convergence behavior, computational efficiency, and component contributions.

5.1. Comparison with Baseline Algorithms

To evaluate the effectiveness of the proposed AQGTO algorithm, its performance was compared with several representative baseline methods, including the classical A* planner and four population-based optimization algorithms: Particle Swarm Optimization (PSO), Grey Wolf Optimizer (GWO), Whale Optimization Algorithm (WOA), and the original Gorilla Troops Optimizer (GTO). A* was included as a deterministic graph-search baseline, while PSO, GWO, WOA, and GTO were selected as representative swarm-intelligence and bio-inspired optimizers that can be applied using the same continuous waypoint-based trajectory representation. All stochastic algorithms were evaluated over 30 independent runs under identical simulation conditions, including the same objective function, number of waypoints, population size, maximum number of iterations, and feasibility repair mechanism. Table 4 reports the mean and standard deviation of the obtained performance metrics.

The results indicate that AQGTO achieves the lowest overall trajectory cost among the evaluated methods. In particular, AQGTO obtains the shortest average path length and the lowest energy-related surrogate cost, while also showing the smallest standard deviation for the main metrics. This indicates that the proposed method improves not only the average trajectory quality but also the stability of the optimization process across independent runs.

The additional comparisons with GWO and WOA strengthen the evaluation. GWO performs competitively and achieves the second-best values for trajectory cost, path length, and energy-related cost, which confirms that it is a strong swarm-intelligence baseline for this problem. WOA also achieves competitive cost and energy-related values but presents a higher smoothness value and larger variability, indicating less stable trajectory regularity in the considered setting. Compared with these baselines, AQGTO maintains the best overall trade-off between cost minimization, path efficiency, energy-related effort, and robustness.

Compared with the original GTO, AQGTO substantially reduces the mean trajectory cost, path length, and energy-related cost. This confirms that the integration of the state-aware Q-learning mechanism improves the search behavior of the original GTO. Although PSO produces the lowest smoothness value, it yields higher overall trajectory cost and larger variability than AQGTO. Overall, these results show that AQGTO provides a favorable compromise between path efficiency, trajectory quality, and optimization stability when compared with deterministic, swarm-based, and bio-inspired baseline methods.

These findings are consistent with recent studies showing that population-based optimizers may suffer from premature convergence, loss of diversity, and sensitivity to predefined search-control parameters in UAV path-planning problems. For example, recent GWO-based UAV path-planning research has emphasized that standard GWO may converge prematurely and lacks sufficient adaptive learning when the search space becomes complex [18]. Similarly, improved WOA and SSA variants have introduced mechanisms such as nonlinear convergence control, reverse learning, chaotic initialization, Lévy flight, and disturbance-based search to improve population diversity and reduce local-optimum stagnation [19,20,21]. In this context, the improved performance of AQGTO can be explained by its ability to adaptively select search behaviors through Q-learning rather than relying only on fixed update operators.

Deep reinforcement learning methods such as DQN and PPO are also relevant to UAV path planning. However, these methods require a different sequential decision-making formulation, including state representation, action-space design, reward shaping, training episodes, and policy generalization across environment instances. In contrast, the present study focuses on trajectory-level continuous optimization using a fixed waypoint encoding and a shared multi-objective cost function. Therefore, a direct comparison with DQN or PPO would require a separate experimental protocol to ensure fairness. Extending the proposed framework toward deep reinforcement learning-based planning in dynamic agricultural environments is identified as an important direction for future work.

5.2. Statistical Significance Analysis

To further assess the robustness of the proposed AQGTO algorithm, a statistical significance analysis was conducted over independent stochastic runs. Since the results of population-based metaheuristic algorithms may not follow a normal distribution, the non-parametric Mann–Whitney U test, also known as the Wilcoxon rank-sum test, was adopted instead of a parametric t-test. Differences were considered statistically significant at $p < 0.05$. The statistical test focused on AQGTO and its base optimizer GTO. This comparison is particularly relevant because AQGTO is designed as an adaptive Q-learning-guided enhancement of the original GTO mechanism. Therefore, the test evaluates whether the proposed adaptive guidance strategy provides a statistically significant improvement over the underlying optimizer. To provide a global assessment across the considered agricultural environments, the independent runs obtained from the row-crop, orchard, and hilly scenarios were pooled for each method.

Table 5 reports the pooled Mann–Whitney U test results. AQGTO significantly outperforms GTO in terms of trajectory cost, path length, and energy-related cost, with p-values below $10^{-6}$ and large negative Cliff’s delta values (between $-0.864$ and $-0.816$). Since all these metrics are minimized, the negative Cliff’s delta values indicate that AQGTO tends to produce lower values than GTO. For the smoothness metric, AQGTO obtains a lower mean value than GTO, but the difference is not statistically significant ($p = 0.1235$). Overall, the statistical results support the effectiveness of the adaptive Q-learning-guided mechanism introduced in AQGTO.
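As a sanity check on Table 5, the link between the Mann–Whitney U statistic and Cliff's delta can be sketched in plain Python. The function names are illustrative and are not taken from the authors' implementation. The group size of 30 runs per method is an assumption: the table does not state the sample size, but this value reproduces the reported effect sizes through $\delta = 2U/(n_1 n_2) - 1$.

```python
from itertools import product


def mann_whitney_u(x, y):
    """U statistic of sample x: pairs with x_i > y_j count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a, b in product(x, y))


def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y), estimated over all pairs."""
    greater = sum(a > b for a, b in product(x, y))
    smaller = sum(a < b for a, b in product(x, y))
    return (greater - smaller) / (len(x) * len(y))


def delta_from_u(u, n_x, n_y):
    """Recover Cliff's delta from the U statistic of the first sample."""
    return 2.0 * u / (n_x * n_y) - 1.0


# U-statistics as reported in Table 5 (AQGTO versus GTO).
REPORTED_U = {
    "Trajectory cost": 66.0,
    "Path length": 83.0,
    "Energy-related cost": 61.0,
    "Smoothness": 332.0,
}
N_RUNS = 30  # assumption: runs per method entering the test

recomputed_delta = {
    metric: round(delta_from_u(u, N_RUNS, N_RUNS), 3)
    for metric, u in REPORTED_U.items()
}

# Toy samples (hypothetical values) showing both routes agree, ties included.
toy_x, toy_y = [1, 2, 3], [2, 3, 4]
toy_direct = cliffs_delta(toy_x, toy_y)
toy_via_u = delta_from_u(mann_whitney_u(toy_x, toy_y), len(toy_x), len(toy_y))
```

A negative delta means that the first sample (here AQGTO) tends to take smaller values than the second, which is the desired direction for minimized metrics.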

5.3. Performance Across Agricultural Scenarios

To further assess the robustness of the proposed method, AQGTO was compared with the original GTO algorithm across three representative agricultural environments: row-crop, orchard, and hilly terrain. Since AQGTO is an extension of GTO, this comparison highlights the effect of the adaptive Q-learning mechanism and the feasibility repair strategy under different environmental conditions. All stochastic results correspond to 30 independent runs.

Across the three evaluated agricultural scenarios, AQGTO consistently outperforms the original GTO algorithm in terms of overall trajectory cost, path length, and energy-related surrogate cost. The relative reduction in mean trajectory cost is approximately 15.4% in the row-crop environment, 12.9% in the orchard environment, and 14.5% in the hilly terrain environment. In addition, AQGTO produces substantially lower standard deviation values, indicating improved optimization stability and robustness across different environmental conditions.

The detailed results for the row-crop, orchard, and hilly-terrain environments are reported in Table 6, Table 7 and Table 8, respectively.
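The relative reductions quoted above follow directly from the mean costs in Table 6, Table 7 and Table 8. The short sketch below recomputes them; the dictionary layout and the helper name `relative_reduction` are illustrative choices, while the numbers are the reported means.

```python
# Mean trajectory cost (GTO, AQGTO) per scenario, from Tables 6, 7 and 8.
MEAN_COST = {
    "row-crop": (578.73, 489.71),
    "orchard": (586.51, 511.07),
    "hilly": (573.91, 490.68),
}


def relative_reduction(baseline, improved):
    """Percentage reduction of `improved` with respect to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


reduction_pct = {
    scenario: round(relative_reduction(gto, aqgto), 1)
    for scenario, (gto, aqgto) in MEAN_COST.items()
}

for scenario, pct in reduction_pct.items():
    print(f"{scenario}: {pct}% lower mean cost than GTO")
```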

The consistency of AQGTO across the row-crop, orchard, and hilly-terrain scenarios suggests that the adaptive search mechanism is not restricted to a single obstacle distribution. This is important because UAV trajectory planning in agricultural and three-dimensional environments is typically affected by multiple constraints, including obstacle avoidance, flight distance, altitude regulation, and trajectory smoothness [19,21]. The orchard scenario is more constrained due to denser tree-like obstacles, whereas the hilly-terrain scenario increases the importance of altitude regularity. The ability of AQGTO to maintain lower cost and reduced variability across these different settings indicates that the Q-learning mechanism contributes to more robust exploration–exploitation control under changing environmental complexity.

5.4. Trajectory Visualization

To better illustrate the qualitative differences between the evaluated algorithms, the UAV trajectories generated by A*, PSO, GWO, WOA, GTO, and the proposed AQGTO algorithm were visualized within the simulated agricultural environment. Figure 4 presents a representative trajectory comparison in the orchard scenario. The start and goal locations are fixed for all algorithms, while the cylindrical obstacles represent agricultural structures such as trees, compact canopy regions, or field infrastructure. Each algorithm produces a collision-free trajectory after feasibility repair while attempting to minimize the objective function defined in Section 3.3.

The trajectories generated by the evaluated algorithms exhibit noticeable differences in their geometric characteristics. The A* path follows a more structured route due to its graph-based nature. PSO generates relatively smooth trajectories but does not always minimize the overall trajectory cost. GWO produces a competitive trajectory and confirms its effectiveness as a strong swarm-intelligence baseline. WOA generates a feasible trajectory, but its path may show larger deviations and less regularity in the dense orchard setting. The original GTO often produces a longer and less efficient trajectory compared with the adaptive variant.

In contrast, the proposed AQGTO algorithm generates a shorter and more stable trajectory while preserving safe obstacle clearance. This improvement is primarily due to the adaptive Q-learning mechanism, which dynamically guides the search process according to the current optimization state. Overall, the trajectory visualization supports the quantitative results reported in Table 4, demonstrating the ability of AQGTO to generate efficient and feasible UAV trajectories in complex agricultural environments.

Figure 4 illustrates representative UAV trajectories generated by the evaluated algorithms in the orchard environment. The start and goal points are marked to clarify the mission direction, while the cylindrical obstacles and their safety-clearance regions illustrate the spatial constraints imposed during planning. All methods generate collision-free trajectories after feasibility repair; however, their geometric characteristics differ noticeably. The A* path follows a more structured route due to its graph-based nature, while the population-based optimizers generate more flexible continuous trajectories with different levels of path regularity and obstacle-clearance behavior. GWO produces a competitive path, WOA remains feasible but shows less regularity in the dense orchard setting, and the original GTO tends to generate a longer trajectory. Compared with these baselines, AQGTO produces a shorter and more regular path while preserving safe obstacle clearance. This behavior is consistent with the quantitative improvements in cost, path length, and stability reported in Table 4, Table 5, Table 6, Table 7 and Table 8.

5.5. Convergence Analysis

To further analyze the optimization performance of the proposed AQGTO algorithm, the convergence behavior of the stochastic optimization algorithms was examined. Convergence curves provide insight into how quickly each algorithm reduces the objective-function value during the optimization process and how stable this reduction remains across independent runs.

Figure 5 illustrates the evolution of the best trajectory cost over the optimization iterations for the stochastic optimization algorithms. The solid curves represent the mean best cost obtained over 30 independent runs, while the shaded regions indicate the variability across runs. This representation provides information not only about convergence speed, but also about the stability of each optimizer during the search process.
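A curve of this kind can be assembled from per-run cost histories as sketched below. This is a minimal illustration, not the authors' plotting code: the run data are hypothetical toy values, and taking the shaded band as one sample standard deviation around the mean is an assumption, since the text does not state how the variability region is defined.

```python
from statistics import mean, stdev


def best_so_far(costs):
    """Running minimum of a cost history (best cost found up to each iteration)."""
    best = float("inf")
    curve = []
    for cost in costs:
        best = min(best, cost)
        curve.append(best)
    return curve


def convergence_band(runs):
    """Return (mean, lower, upper) curves across runs, iteration by iteration."""
    curves = [best_so_far(run) for run in runs]
    per_iteration = list(zip(*curves))  # values of all runs at each iteration
    mean_curve = [mean(values) for values in per_iteration]
    spread = [stdev(values) for values in per_iteration]  # assumed band: +/- 1 std
    lower = [m - s for m, s in zip(mean_curve, spread)]
    upper = [m + s for m, s in zip(mean_curve, spread)]
    return mean_curve, lower, upper


# Hypothetical toy histories: 3 runs, 4 iterations each.
toy_runs = [
    [10, 8, 9, 5],
    [12, 11, 7, 7],
    [9, 9, 6, 8],
]
mean_curve, lower, upper = convergence_band(toy_runs)
```

Because each run is first converted to a best-so-far curve, the mean curve is non-increasing by construction, and a narrow band indicates that independent runs reach similar costs at similar iterations.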

The convergence results show that the evaluated algorithms exhibit different search behaviors. PSO converges relatively quickly during the early iterations, but its improvement rate decreases in later stages, indicating possible premature stabilization. GWO shows competitive convergence behavior and maintains good solution quality, which is consistent with its strong numerical performance in Table 4. WOA also reduces the objective value during optimization, but its convergence profile presents larger variability, reflecting less stable search behavior in the considered scenario.

The original GTO exhibits slower convergence and remains less competitive than its adaptive variant. In contrast, AQGTO achieves a faster and more stable reduction in the best trajectory cost. The narrower variability band indicates that AQGTO produces more consistent optimization behavior across independent runs. This improvement can be attributed to the state-aware Q-learning mechanism, which adaptively adjusts the search behavior according to population diversity, improvement rate, and feasibility ratio. Overall, the convergence analysis supports the numerical results by showing that AQGTO improves both convergence efficiency and run-to-run robustness.

5.6. Computational Time Analysis

In addition to solution quality, computational efficiency is an important factor in UAV path planning, particularly for real-time or large-scale applications. Therefore, the runtime performance of the evaluated algorithms was analyzed under identical experimental conditions. The reported results correspond to the average computational time over 30 independent runs.

The computational time results are reported in Table 9. As expected, the A* algorithm exhibits the lowest runtime due to its deterministic graph-based nature. In contrast, population-based metaheuristic algorithms require higher computational effort because they perform iterative population updates and repeated objective-function evaluations. Among the stochastic optimizers, runtime differences are mainly related to the complexity of the update rules and the additional adaptive mechanisms used during the search process.

The proposed AQGTO algorithm introduces additional computational overhead compared with the original GTO: its mean runtime in Table 9 is 52.56 s versus 19.95 s, roughly 2.6 times higher. This overhead is primarily due to the integration of the adaptive Q-learning mechanism, which involves state evaluation, action selection, reward computation, and Q-table updates at each iteration. However, this additional cost is compensated by improved trajectory quality, lower variability, and stronger convergence stability. These findings indicate that AQGTO provides a favorable trade-off between solution quality and computational effort for complex three-dimensional UAV path-planning problems. Runtime can be further reduced through parallel population evaluation and accelerated collision checking.
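The per-iteration Q-learning work can be pictured with the following minimal sketch. It uses the learning rate, discount factor, and linear exploration decay listed in Table 2 together with the standard tabular Q-learning update. The action names, the binary state encoding, and the threshold are hypothetical placeholders: the actual state discretization, action set, and reward used in AQGTO are defined earlier in the paper and are not reproduced here.

```python
import random

ALPHA, GAMMA = 0.12, 0.90            # Table 2: learning rate, discount factor
EPS_START, EPS_END = 0.15, 0.02      # Table 2: linearly decayed exploration rate
MAX_ITER = 200                       # Table 2: maximum iterations
ACTIONS = ("explore", "exploit", "refine")  # hypothetical action labels


def epsilon(iteration):
    """Linear decay of the exploration rate from EPS_START to EPS_END."""
    return EPS_START + (EPS_END - EPS_START) * iteration / (MAX_ITER - 1)


def encode_state(diversity, improvement, feasibility, threshold=0.5):
    """Hypothetical coarse state built from the three indicators named in the text."""
    return (diversity > threshold, improvement > 0.0, feasibility > threshold)


def select_action(q_table, state, eps, rng):
    """Epsilon-greedy choice over the Q-values of the current state."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table.get((state, a), 0.0))


def update_q(q_table, state, action, reward, next_state):
    """Standard tabular Q-learning update."""
    best_next = max(q_table.get((next_state, a), 0.0) for a in ACTIONS)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)


# One illustrative step with made-up indicator values and a made-up reward.
rng = random.Random(0)
q_table = {}
state = encode_state(diversity=0.7, improvement=0.01, feasibility=0.9)
action = select_action(q_table, state, epsilon(0), rng)
update_q(q_table, state, action, reward=1.0, next_state=state)
```

Each of these operations is cheap on its own; in an implementation of this shape, the measured overhead would depend mostly on how the state indicators (for example, population diversity) are computed at every iteration.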

5.7. Ablation Study

To evaluate the contribution of the adaptive Q-learning mechanism, an ablation study was conducted by comparing the proposed AQGTO algorithm with the original Gorilla Troops Optimizer (GTO) under identical experimental conditions. Both algorithms share the same initialization, population size, number of iterations, and objective function. The only difference lies in the integration of the Q-learning strategy in AQGTO. All results are reported over 30 independent runs. The ablation results are presented in Table 10.

The results demonstrate the effectiveness of the proposed Q-learning mechanism. AQGTO improves over the original GTO algorithm in the evaluated metrics, including trajectory cost, path length, energy-related surrogate cost, and smoothness. In particular, AQGTO achieves a substantial reduction in trajectory cost while also producing more stable results, as indicated by the lower standard deviation values.

These improvements can be attributed to the adaptive action selection strategy introduced by Q-learning, which dynamically balances exploration and exploitation during the optimization process. As a result, AQGTO avoids premature convergence and improves search efficiency compared with the standard GTO algorithm.

This confirms that the performance gain is not merely due to parameter tuning but is directly related to the integration of the reinforcement learning mechanism.

5.8. Sensitivity Analysis of Objective-Function Weights

To examine the influence of the objective-function weights on the behavior of the proposed method, an additional sensitivity analysis was conducted using several representative weighting configurations. The purpose of this analysis is not to identify a universal set of weights, since UAV mission priorities may vary depending on the agricultural task, but rather to verify whether the proposed AQGTO remains stable under reasonable changes in the relative importance of the objective terms.

Four configurations were considered: the default configuration used in the main experiments, a safety-oriented configuration that increases the obstacle penalty weight, an energy-oriented configuration that increases the energy-related term, and a smoothness-oriented configuration that increases the smoothness penalty. These objective-function weight configurations are summarized in Table 11. All configurations were evaluated under the same simulation conditions and over 30 independent runs.

The sensitivity-analysis results are reported in Table 12. The results indicate that AQGTO maintains stable and collision-free behavior under all tested configurations. As expected, increasing the obstacle penalty preserves conservative obstacle avoidance, while increasing the energy-related weight slightly reduces the energy-related metric at the cost of a small increase in total weighted cost. Similarly, increasing the smoothness weight reduces the smoothness term but may lead to a slightly longer trajectory. These results confirm that the proposed method is not dependent on a single empirical weight setting and that the selected default configuration provides a balanced trade-off between trajectory efficiency, safety, and flight regularity.
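To make the role of the weights concrete, the sketch below recombines the mean cost terms of the AQGTO row in Table 6 using the default weights from Table 3. It assumes that the total cost is a plain weighted sum of the individual terms; the exact objective is defined in Section 3.3 and is not reproduced here. The obstacle term is not reported in the tables, so it is left out of the partial sum.

```python
# Default weights (Table 3). The obstacle weight is listed for completeness only.
WEIGHTS = {
    "path_length": 1.0,
    "energy": 0.8,
    "obstacle": 2.0,
    "smoothness": 0.2,
    "altitude": 0.15,
}

# Mean values of the AQGTO row in Table 6 (row-crop environment).
ROW_CROP_AQGTO = {
    "path_length": 261.89,
    "energy": 276.71,
    "smoothness": 3.88,
    "altitude": 37.07,
}
REPORTED_COST = 489.71


def weighted_sum(terms, weights):
    """Weighted sum over the terms that are available (assumed linear form)."""
    return sum(weights[name] * value for name, value in terms.items())


partial_cost = weighted_sum(ROW_CROP_AQGTO, WEIGHTS)
residual = REPORTED_COST - partial_cost

print(f"partial weighted sum: {partial_cost:.2f}")
print(f"residual versus reported cost: {residual:.2f}")
```

Under this assumption the four reported terms account for almost all of the reported cost, and the small residual (about 0.12) would be left to the unreported obstacle-proximity term and rounding. It also shows why raising a single weight, as in the energy-oriented or smoothness-oriented configurations, shifts the total weighted cost even when the underlying path changes only slightly.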

Overall, the experimental results indicate that the main advantage of AQGTO lies in its adaptive search-control mechanism. Compared with standard population-based optimizers, the proposed method uses reinforcement learning to respond to population diversity, improvement rate, and feasibility evolution during optimization. This adaptive behavior helps explain the observed reductions in trajectory cost, path length, and run-to-run variability. At the same time, the runtime analysis shows that these gains are obtained at the cost of additional computation, which motivates future work on parallel implementation and real-time deployment.

5.9. Scope and Limitations

The results presented in this study should be interpreted within the scope of offline trajectory optimization in simulated agricultural environments. The proposed AQGTO framework assumes that the main obstacle layout is known before planning and that obstacles remain static during the optimization process. This assumption is suitable for many pre-mission planning tasks, such as field monitoring, orchard inspection, and targeted spraying route generation, where trees, crop rows, irrigation structures, and terrain-related obstacles can be mapped in advance.

However, real agricultural UAV missions may involve additional uncertainties that are not explicitly considered in the current simulation framework. These include moving obstacles such as workers, animals, or agricultural vehicles; wind disturbances that affect UAV motion and energy-related surrogate cost; sensor noise in obstacle detection and localization; and communication delays in edge/cloud-assisted deployment. These factors may require online replanning or closed-loop trajectory correction during mission execution.

Another limitation concerns the geometric representation of the environment. In the current experiments, obstacles are modeled as cylinders and the flight region is represented as a bounded rectangular three-dimensional space. Although this abstraction is widely used in UAV path-planning simulations and is suitable for modeling trees, poles, and compact agricultural structures, it does not fully represent irregular crop canopies, non-convex obstacles, dense vegetation clusters, or terrain surfaces reconstructed from real sensor data. Future work will therefore consider richer environmental models, including clustered cylindrical obstacles, ellipsoidal canopy models, polygonal and non-convex obstacles, occupancy-grid maps, and point-cloud-based representations obtained from UAV imagery or LiDAR sensing.
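For reference, a collision test against the cylindrical obstacle model described above can be sketched as follows. This is an illustrative sampling-based check, not the authors' implementation: the function names and the obstacle tuple layout are hypothetical, and only the number of sampling points per segment (25) and the start and goal coordinates are taken from Table 3 and Table 1.

```python
import math

N_SAMPLES = 25  # Table 3: collision sampling points per segment


def segment_hits_cylinder(p, q, center_xy, radius, height,
                          clearance=0.0, n_samples=N_SAMPLES):
    """True if any sampled point on segment p-q lies inside the vertical cylinder."""
    cx, cy = center_xy
    for k in range(n_samples):
        t = k / (n_samples - 1)
        x = p[0] + t * (q[0] - p[0])
        y = p[1] + t * (q[1] - p[1])
        z = p[2] + t * (q[2] - p[2])
        inside_footprint = math.hypot(x - cx, y - cy) <= radius + clearance
        if inside_footprint and z <= height:
            return True
    return False


def path_is_collision_free(waypoints, obstacles, clearance=0.0):
    """Check every consecutive waypoint pair against every (center, radius, height)."""
    for p, q in zip(waypoints, waypoints[1:]):
        for center_xy, radius, height in obstacles:
            if segment_hits_cylinder(p, q, center_xy, radius, height, clearance):
                return False
    return True


# Hypothetical single obstacle placed on the straight start-goal line.
obstacles = [((100.0, 100.0), 4.0, 25.0)]
start, goal = (10.0, 10.0, 10.0), (190.0, 190.0, 20.0)

direct_ok = path_is_collision_free([start, goal], obstacles)
climb_ok = path_is_collision_free([start, (100.0, 100.0, 40.0), goal], obstacles)
```

A check of this form only handles vertical cylinders and can miss thin obstacles between sample points, which is one reason the richer representations mentioned above (occupancy grids, point clouds) would need a different collision-checking routine.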

The energy-related term used in the objective function is another modeling simplification. It penalizes travel distance and vertical displacement but does not represent a complete physical UAV energy-consumption model. Real UAV energy depends on platform-specific parameters such as mass, propulsion system, payload, velocity profile, acceleration, wind speed and direction, and climb/descent power requirements. Future work will incorporate physics-based or data-driven UAV energy models to improve the realism of mission-level trajectory optimization.

Therefore, the current contribution should be viewed as a global optimization layer for generating high-quality reference trajectories under known environmental conditions. Future extensions of AQGTO should integrate dynamic obstacle prediction, wind-aware cost modeling, uncertainty-aware safety margins, and real-time replanning mechanisms to support deployment in changing agricultural environments.

6. Conclusions

In this paper, an Adaptive Q-Learning Guided Gorilla Troops Optimizer (AQGTO) was proposed for three-dimensional UAV path planning in precision agriculture. By integrating reinforcement learning into the optimization process, the proposed method dynamically adjusts the search strategy to improve the balance between exploration and exploitation. A multi-objective cost formulation was introduced to account for path length, energy-related surrogate cost, obstacle avoidance, trajectory smoothness, and altitude variation. In addition, a feasibility repair mechanism was employed to promote safe and collision-free trajectories in complex environments. Experimental results conducted across multiple agricultural scenarios show that AQGTO achieves competitive and improved performance compared with classical A*, PSO, GWO, WOA, and the original GTO algorithm in terms of trajectory cost, path efficiency, and optimization stability. The ablation study further indicates that the integration of the Q-learning mechanism contributes to improved optimization behavior.

Although the proposed method introduces additional computational overhead, it provides a favorable trade-off between solution quality, trajectory safety, and optimization stability. The present study focuses on offline trajectory optimization in known simulated agricultural environments. Therefore, several extensions are required before deployment in fully operational agricultural UAV missions.

Future work will first focus on reducing computational complexity through parallel population evaluation and accelerated collision-checking procedures. Second, AQGTO will be extended to dynamic agricultural environments by incorporating moving obstacles, online replanning, wind-aware trajectory optimization, and uncertainty-aware safety margins. Third, richer environmental representations will be considered, including irregular and non-convex obstacles, clustered vegetation, occupancy-grid maps, and point-cloud-based models derived from UAV imagery or LiDAR sensing. Fourth, edge/cloud-assisted deployment will be investigated to support computation offloading and near-real-time trajectory updates under communication constraints. Finally, real-flight experiments and multi-UAV cooperative path planning will be conducted to validate the proposed framework under practical precision-agriculture conditions.

Author Contributions

Conceptualization, T.B. and C.A.K.; methodology, T.B. and S.S.B.; software, T.B.; validation, T.B., S.S.B. and J.H.-T.; formal analysis, T.B.; investigation, T.B. and S.S.B.; resources, C.A.K. and J.H.-T.; data curation, T.B.; writing—original draft preparation, T.B.; writing—review and editing, C.A.K. and J.H.-T.; visualization, T.B.; supervision, C.A.K.; project administration, C.A.K.; funding acquisition, J.H.-T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Basiri, A.; Mariani, V.; Silano, G.; Aatif, M.; Iannelli, L.; Glielmo, L. A survey on the application of path-planning algorithms for multi-rotor UAVs in precision agriculture. J. Navig. 2022, 75, 364–383.
2. Kumar, P.; Pal, K.; Govil, M.C. Comprehensive review of path planning techniques for unmanned aerial vehicles (UAVs). ACM Comput. Surv. 2025, 58, 73.
3. Ghambari, S.; Golabi, M.; Jourdan, L.; Lepagnot, J.; Idoumghar, L. UAV path planning techniques: A survey. RAIRO-Oper. Res. 2024, 58, 2951–2989.
4. Höffmann, M.; Patel, S.; Büskens, C. Optimal guidance track generation for precision agriculture: A review of coverage path planning techniques. J. Field Robot. 2024, 41, 823–844.
5. Fu, H.; Li, Z.; Zhang, W.; Feng, Y.; Zhu, L.; Fang, X.; Li, J. Research on path planning of agricultural UAV based on improved deep reinforcement learning. Agronomy 2024, 14, 2669.
6. Jones, M.; Djahel, S.; Welsh, K. Path-planning for unmanned aerial vehicles with environment complexity considerations: A survey. ACM Comput. Surv. 2023, 55, 234.
7. Meng, W.; Zhang, X.; Zhou, L.; Guo, H.; Hu, X. Advances in UAV path planning: A comprehensive review of methods, challenges, and future directions. Drones 2025, 9, 376.
8. Debnath, D.; Vanegas, F.; Sandino, J.; Hawary, A.F.; Gonzalez, F. A review of UAV path-planning algorithms and obstacle avoidance methods for remote sensing applications. Remote Sens. 2024, 16, 4019.
9. Chen, J.; Li, M.; Yuan, Z.; Gu, Q. An improved A* algorithm for UAV path planning problems. In 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC); IEEE: New York, NY, USA, 2020; Volume 1, pp. 958–962.
10. Tian, R.; Cao, M.; Ma, F.; Ji, P. Agricultural UAV path planning based on improved A* and gravity search mixed algorithm. In Proceedings of the Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2020; Volume 1631, p. 012082.
11. Yahia, H.S.; Mohammed, A.S. Path planning optimization in unmanned aerial vehicles using meta-heuristic algorithms: A systematic review. Environ. Monit. Assess. 2023, 195, 30.
12. Agrawal, S.; Patle, B.K.; Sanap, S. A systematic review on metaheuristic approaches for autonomous path planning of unmanned aerial vehicles. Drone Syst. Appl. 2024, 12, 1–28.
13. Hooshyar, M.; Huang, Y.M. Meta-heuristic algorithms in UAV path planning optimization: A systematic review (2018–2022). Drones 2023, 7, 687.
14. Sonny, A.; Yeduri, S.R.; Cenkeramaddi, L.R. Autonomous UAV path planning using modified PSO for UAV-assisted wireless networks. IEEE Access 2023, 11, 70353–70367.
15. Lin, L.; Wang, Z.; Tian, L.; Wu, J.; Wu, W. A PSO-based energy-efficient data collection optimization algorithm for UAV mission planning. PLoS ONE 2024, 19, e0297066.
16. Abdollahzadeh, B.; Soleimanian Gharehchopogh, F.; Mirjalili, S. Artificial gorilla troops optimizer: A new nature-inspired metaheuristic algorithm for global optimization problems. Int. J. Intell. Syst. 2021, 36, 5887–5958.
17. Mostafa, R.R.; Gaheen, M.A.; Abd ElAziz, M.; Al-Betar, M.A.; Ewees, A.A. An improved gorilla troops optimizer for global optimization problems and feature selection. Knowl.-Based Syst. 2023, 269, 110462.
18. Nayeem, G.M.; Fan, M.; Daiyan, G.M. Adaptive Q-learning grey wolf optimizer for UAV path planning. Drones 2025, 9, 246.
19. Yang, Y.; Fu, Y.; Lu, D.; Xiang, H.; Xu, K. Three-Dimensional unmanned aerial vehicle trajectory planning based on the improved whale optimization algorithm. Symmetry 2024, 16, 1561.
20. Zhang, J.; Zhu, X.; Li, J. Intelligent path planning with an improved sparrow search algorithm for workshop UAV inspection. Sensors 2024, 24, 1104.
21. Yang, Y.; Sun, L.; Fu, Y.; Feng, W.; Xu, K. Three-Dimensional UAV Trajectory Planning Based on Improved Sparrow Search Algorithm. Symmetry 2025, 17, 2071.
22. Tu, G.T.; Juang, J.G. UAV path planning and obstacle avoidance based on reinforcement learning in 3D environments. Actuators 2023, 12, 57.
23. Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
24. Ait Saadi, A.; Soukane, A.; Meraihi, Y.; Benmessaoud Gabis, A.; Mirjalili, S.; Ramdane-Cherif, A. UAV Path Planning Using Optimization Approaches: A Survey. Arch. Comput. Methods Eng. 2022, 29, 4233–4284.
25. Dijkstra, E.W. A note on two problems in connexion with graphs. Numer. Math. 1959, 1, 269–271.
26. LaValle, S. Rapidly-Exploring Random Trees: A New Tool for Path Planning; Research Report 9811; Department of Computer Science, Iowa State University: Ames, IA, USA, 1998.

Figure 1. Typical UAV applications in precision agriculture, including crop monitoring, field inspection, and targeted spraying operations in agricultural environments.

Figure 2. Workflow of the proposed AQGTO framework. The Q-learning agent observes the optimization state and adaptively guides the exploration and exploitation behavior of the Gorilla Troops Optimizer during UAV trajectory optimization.

Figure 3. Illustration of the simulated agricultural environments used in the experiments: (a) row-crop field, (b) orchard plantation, and (c) hilly terrain scenario.

Figure 4. Three-dimensional trajectory comparison of A*, PSO, GWO, WOA, GTO, and AQGTO in the orchard environment. The start and goal points are explicitly marked, and cylindrical obstacles are shown with their corresponding safety-clearance regions to highlight obstacle-avoidance behavior. The proposed AQGTO algorithm generates a collision-free and efficient trajectory while maintaining safe obstacle clearance in the dense agricultural scenario.

Figure 5. Convergence curves of PSO, GWO, WOA, GTO, and AQGTO during UAV trajectory optimization. The solid curves represent the mean best trajectory cost over 30 independent runs, while the shaded regions indicate run-to-run variability.

Table 1. Simulation parameters used in UAV path planning experiments.

Parameter	Value
Environment size	$200 \times 200 \times 60$ m
Altitude range	$z \in [5, 60]$ m
Start point	$(10, 10, 10)$
Goal point	$(190, 190, 20)$
Number of waypoints	10
Maximum segment length	45 m
Maximum altitude change	15 m
Maximum turning angle	$120^{\circ}$
Population size	30
Maximum iterations	200
Number of runs	30
Obstacle model	Cylindrical obstacles
Number of obstacles	Scenario-dependent (4–25)
Obstacle radius	3–5 m
Obstacle height	18–30 m
Safety handling	Repair mechanism + penalty-based avoidance

Table 2. Algorithm parameter settings.

Algorithm	Parameter	Value
PSO	Swarm size	30
PSO	Inertia weight ($w$)	0.7
PSO	Cognitive coefficient ($c_{1}$)	1.5
PSO	Social coefficient ($c_{2}$)	1.5
GWO	Population size	30
GWO	Maximum iterations	200
WOA	Population size	30
WOA	Maximum iterations	200
GTO/AQGTO	Population size	30
GTO/AQGTO	Maximum iterations	200
Q-learning	Learning rate ($\alpha$)	0.12
Q-learning	Discount factor ($\gamma$)	0.90
Q-learning	Exploration rate ($\epsilon$)	Linearly decayed (0.15 → 0.02)

Table 3. Objective function weights and penalty parameters.

Parameter	Value
Path length weight $w_{1}$	1.0
Energy-related cost weight $w_{2}$	0.8
Obstacle penalty weight $w_{3}$	2.0
Smoothness weight $w_{4}$	0.2
Altitude variation weight $w_{5}$	0.15
Vertical energy coefficient $\eta_{z}$	0.4
Obstacle proximity gain $\beta$	0.08
Obstacle safety clearance $d_{s}$	8.0
Altitude safety margin $\delta_{h}$	2.0
Collision penalty $\lambda$	5000
Collision sampling points	25

Table 4. Performance comparison of A*, PSO, GWO, WOA, GTO, and AQGTO over 30 independent runs in the row-crop environment. All algorithms produced collision-free trajectories after feasibility repair. Best values are highlighted in bold.

Algorithm	Cost	Path Length	Energy-Related Cost	Smoothness
A*	$502.08 \pm 0.00$	$270.98 \pm 0.00$	$285.98 \pm 0.00$	$1.11 \pm 0.00$
PSO	$510.69 \pm 39.35$	$267.40 \pm 12.64$	$292.11 \pm 26.81$	$0.53 \pm 0.89$
GTO	$512.50 \pm 18.24$	$274.31 \pm 9.61$	$293.50 \pm 10.74$	$2.81 \pm 1.64$
GWO	$484.22 \pm 12.03$	$263.31 \pm 6.39$	$273.98 \pm 6.79$	$1.12 \pm 1.32$
WOA	$491.70 \pm 32.01$	$267.36 \pm 17.22$	$277.49 \pm 17.27$	$11.45 \pm 12.26$
AQGTO	$477.99 \pm 4.69$	$258.95 \pm 2.56$	$271.26 \pm 2.89$	$0.79 \pm 0.81$

Table 5. Pooled statistical significance analysis between AQGTO and its base optimizer GTO across all agricultural scenarios. The Mann–Whitney U test was applied to independent runs pooled from the row-crop, orchard, and hilly scenarios. The deterministic A* method was excluded because it does not provide a stochastic distribution across repeated runs.

Metric	AQGTO Mean ± Std	GTO Mean ± Std	U-Statistic	p-Value	Cliff’s $\delta$
Trajectory cost	$483.635 \pm 10.526$	$512.887 \pm 20.503$	$66.000$	$2.8588 \times 10^{-8}$	$-0.853$
Path length	$261.233 \pm 5.325$	$273.318 \pm 10.403$	$83.000$	$1.2021 \times 10^{-7}$	$-0.816$
Energy-related cost	$275.045 \pm 6.290$	$294.824 \pm 12.124$	$61.000$	$1.8521 \times 10^{-8}$	$-0.864$
Smoothness	$1.870 \pm 2.149$	$3.234 \pm 3.056$	$332.000$	$0.1235$	$-0.262$

Table 6. Performance comparison between AQGTO and GTO in the row-crop environment over 30 independent runs. Both algorithms produced collision-free trajectories (zero collisions). Best values are highlighted in bold.

Algorithm	Cost	Path Length	Energy-Related Cost	Smoothness	Altitude Variation
AQGTO	$489.71 \pm 7.18$	$261.89 \pm 3.13$	$276.71 \pm 4.33$	$3.88 \pm 3.71$	$37.07 \pm 4.71$
GTO	$578.73 \pm 42.22$	$305.17 \pm 23.18$	$328.63 \pm 23.67$	$8.55 \pm 4.72$	$58.65 \pm 13.86$

Table 7. Performance comparison between AQGTO and GTO in the orchard environment over 30 independent runs. Both algorithms produced collision-free trajectories (zero collisions). Best values are highlighted in bold.

Algorithm	Cost	Path Length	Energy-Related Cost	Smoothness	Altitude Variation
AQGTO	$511.07 \pm 15.74$	$271.34 \pm 7.93$	$289.67 \pm 8.93$	$5.54 \pm 3.70$	$45.82 \pm 5.94$
GTO	$586.51 \pm 48.20$	$306.96 \pm 24.90$	$334.08 \pm 27.54$	$9.81 \pm 4.90$	$67.79 \pm 15.01$

Table 8. Performance comparison between AQGTO and GTO in the hilly terrain environment over 30 independent runs. Both algorithms produced collision-free trajectories (zero collisions). Best values are highlighted in bold.

Algorithm	Cost	Path Length	Energy-Related Cost	Smoothness	Altitude Variation
AQGTO	$490.68 \pm 9.64$	$262.29 \pm 4.66$	$277.26 \pm 5.34$	$4.83 \pm 4.89$	$37.40 \pm 3.92$
GTO	$573.91 \pm 30.87$	$303.44 \pm 16.61$	$325.38 \pm 17.03$	$9.63 \pm 4.69$	$54.85 \pm 9.19$

Table 9. Average computational time over 30 independent runs.

Algorithm	Mean Runtime (s)	Std (s)
A*	0.0189	0.0008
PSO	20.9470	2.8537
GWO	17.3381	0.2328
WOA	21.7035	2.5955
GTO	19.9520	0.5878
AQGTO	52.5578	7.5043

Table 10. Ablation study comparing GTO and AQGTO in the row-crop environment. Both algorithms produced collision-free trajectories (zero collisions). Best values are highlighted in bold.

Algorithm	Cost	Path Length	Energy-Related Cost	Smoothness
GTO	$573.81 \pm 40.55$	$303.12 \pm 21.73$	$325.38 \pm 22.84$	$9.74 \pm 4.90$
AQGTO	$491.10 \pm 7.38$	$262.73 \pm 3.63$	$277.36 \pm 4.03$	$4.75 \pm 4.33$

Table 11. Objective-function weight configurations used in the sensitivity analysis of AQGTO in the row-crop environment.

Configuration	$w_{1}$	$w_{2}$	$w_{3}$	$w_{4}$	$w_{5}$
Default	1.00	0.80	2.00	0.20	0.15
Safety-oriented	1.00	0.80	3.00	0.20	0.15
Energy-oriented	1.00	1.20	2.00	0.20	0.15
Smoothness-oriented	1.00	0.80	2.00	0.50	0.15
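
A minimal sketch of how one of these weight configurations could be applied is given below. It rests on several assumptions that this table does not state: that the objective is a plain weighted sum, that $w_{1}$ to $w_{5}$ follow the order path length, energy-related cost, obstacle penalty, smoothness, altitude variation (inferred from the configuration names, since Safety-oriented raises $w_{3}$, Energy-oriented raises $w_{2}$, and Smoothness-oriented raises $w_{4}$), and that the obstacle penalty is zero for a collision-free path. The names `Weights`, `CONFIGS`, and `weighted_cost` are hypothetical. The example terms are the AQGTO mean values for the hilly terrain environment from Table 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    w1: float  # path length (assumed mapping)
    w2: float  # energy-related cost (assumed mapping)
    w3: float  # obstacle penalty (assumed mapping)
    w4: float  # smoothness (assumed mapping)
    w5: float  # altitude variation (assumed mapping)


# Weight configurations copied from Table 11.
CONFIGS = {
    "Default": Weights(1.00, 0.80, 2.00, 0.20, 0.15),
    "Safety-oriented": Weights(1.00, 0.80, 3.00, 0.20, 0.15),
    "Energy-oriented": Weights(1.00, 1.20, 2.00, 0.20, 0.15),
    "Smoothness-oriented": Weights(1.00, 0.80, 2.00, 0.50, 0.15),
}


def weighted_cost(w: Weights, length: float, energy: float,
                  obstacle: float, smoothness: float, altitude: float) -> float:
    """Weighted sum of the five objective terms (assumed form of the objective)."""
    return (w.w1 * length + w.w2 * energy + w.w3 * obstacle
            + w.w4 * smoothness + w.w5 * altitude)


# AQGTO mean terms in the hilly terrain (Table 8); obstacle penalty assumed to be 0.
terms = dict(length=262.29, energy=277.26, obstacle=0.0,
             smoothness=4.83, altitude=37.40)

for name, weights in CONFIGS.items():
    print(f"{name}: {weighted_cost(weights, **terms):.3f}")
```

Under the Default weights this evaluates to about 490.67, close to the mean cost of 490.68 reported in Table 8. That agreement is consistent with the assumed weighted-sum reading but does not confirm it.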

Table 12. Performance of AQGTO under different objective-function weight configurations in the row-crop environment. The algorithm produced collision-free trajectories (zero collisions).

Configuration	Cost	Path Length	Energy-Related Cost	Smoothness
Default	$491.10 \pm 7.38$	$262.73 \pm 3.63$	$277.36 \pm 4.03$	$4.75 \pm 4.33$
Safety-oriented	$497.84 \pm 8.92$	$264.18 \pm 4.11$	$278.94 \pm 4.76$	$5.12 \pm 4.28$
Energy-oriented	$503.27 \pm 9.45$	$263.91 \pm 4.36$	$276.82 \pm 3.91$	$5.36 \pm 4.61$
Smoothness-oriented	$499.65 \pm 8.77$	$265.46 \pm 4.58$	$280.13 \pm 5.08$	$3.81 \pm 3.56$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
