1. Introduction
Precision agriculture has emerged as a transformative approach for improving crop productivity, resource efficiency, and environmental sustainability [1,2,3]. By leveraging advanced sensing and automation technologies, precision agriculture enables farmers to monitor crop conditions, detect diseases, and optimize irrigation and fertilization processes. In this context, Unmanned Aerial Vehicles (UAVs) have become increasingly important due to their ability to rapidly collect high-resolution aerial data and perform targeted agricultural operations such as crop monitoring, spraying, and field inspection [1,4,5]. Typical UAV applications in precision agriculture are illustrated in Figure 1.
Compared with traditional ground-based monitoring systems, UAVs offer several advantages including rapid deployment, flexible coverage, and the ability to access difficult terrain. However, effective UAV deployment in agricultural environments requires reliable and efficient path-planning algorithms capable of generating safe and energy-efficient flight trajectories [2,6,7]. Agricultural landscapes often contain complex obstacles such as trees, irrigation systems, buildings, and terrain variations [1,8]. Consequently, UAV path planning must simultaneously consider multiple constraints including obstacle avoidance, trajectory smoothness, energy consumption, and altitude limitations.
Classical path-planning algorithms such as A* have been widely applied in robotic navigation and UAV trajectory generation due to their deterministic search mechanisms and guaranteed optimality in discretized environments when an admissible heuristic is used [9,10]. Despite their effectiveness in structured environments, these algorithms often struggle when dealing with continuous high-dimensional search spaces and multi-objective optimization problems commonly encountered in UAV trajectory planning [2,6].
To address these limitations, researchers have increasingly explored metaheuristic optimization techniques for UAV path planning [11,12,13]. Swarm intelligence algorithms such as Particle Swarm Optimization (PSO) have been widely adopted due to their simple structure and strong global search capability [14,15]. Similarly, recent bio-inspired algorithms such as the Gorilla Troops Optimizer (GTO) have demonstrated promising performance in solving complex nonlinear optimization problems [16]. However, population-based optimization algorithms may still suffer from premature convergence or inefficient exploration–exploitation balance during the search process [17].
Beyond PSO and GTO, several recent swarm-intelligence and bio-inspired algorithms have been investigated for UAV trajectory planning. Grey Wolf Optimizer (GWO)-based methods have attracted attention because of their leadership hierarchy and ability to balance exploration and exploitation. Recent studies have further incorporated Q-learning into GWO to overcome premature convergence, local minima, and limited adaptive learning in UAV path planning [18]. Similarly, Whale Optimization Algorithm (WOA)-based approaches have been applied to three-dimensional UAV trajectory planning, where improved variants use reverse learning, nonlinear convergence factors, and random mechanisms to improve population diversity and avoid local optima [19]. Sparrow Search Algorithm (SSA)-based methods have also recently been used for UAV path planning, with improved variants introducing sine–cosine strategies, Lévy flight, chaotic mapping, or hybrid disturbance mechanisms to improve global exploration, convergence accuracy, and solution stability [20,21].
In recent years, hybrid approaches combining reinforcement learning with metaheuristic optimization have attracted increasing attention [5,22]. Reinforcement learning allows optimization algorithms to adaptively adjust their search strategies based on feedback obtained during the optimization process. Among reinforcement learning techniques, Q-learning provides a simple yet effective framework for learning optimal action-selection policies through interaction with the environment [23].
Despite the significant progress achieved by classical and metaheuristic approaches, existing methods still face several limitations in complex agricultural environments. Classical planners struggle with multi-objective optimization in continuous search spaces, while metaheuristic algorithms often rely on predefined search strategies that cannot dynamically adapt to the optimization state. This may lead to premature convergence or inefficient exploration–exploitation balance, particularly in high-dimensional UAV trajectory planning problems.
To address these limitations, this paper proposes an Adaptive Q-Learning Guided Gorilla Troops Optimizer (AQGTO) for three-dimensional UAV path planning in precision agriculture environments. The proposed approach integrates a Q-learning mechanism into the search process of the Gorilla Troops Optimizer to dynamically guide exploration and exploitation behaviors during optimization. In addition, a feasibility repair strategy is introduced to maintain valid trajectories and improve obstacle avoidance during the search process.
In practical agricultural UAV systems, path planning is typically part of a broader perception–decision–execution pipeline. In such a pipeline, perception modules acquire environmental information using onboard cameras, multispectral sensors, LiDAR, GPS/INS, or pre-existing field maps. This information is then converted into a planning representation, such as an obstacle map, elevation map, or occupancy grid. The decision layer generates a feasible reference trajectory, while the execution layer tracks the planned path through the UAV flight controller. The present work focuses on the decision layer, where AQGTO is used as a global trajectory optimizer under the assumption that the main environmental structure is available before planning. Real-time perception, multispectral image interpretation, online map updating, and low-level flight control are outside the scope of the current study and are identified as future extensions.
The UAV path-planning problem is formulated as a multi-objective optimization task that simultaneously considers path length, energy-related surrogate cost, obstacle avoidance, trajectory smoothness, and altitude variation. Extensive experiments are conducted in representative agricultural environments including row-crop fields, orchard plantations, and hilly terrain scenarios.
The novelty of the proposed approach lies in the state-aware integration of Q-learning into the Gorilla Troops Optimizer for three-dimensional UAV trajectory optimization. Unlike standard metaheuristic approaches that rely mainly on fixed or predefined search operators, AQGTO observes the current optimization state through population diversity, improvement rate, and feasibility ratio. Based on this state information, the Q-learning agent adaptively selects among exploration, exploitation, and diversification actions. This enables the optimizer to respond dynamically to stagnation, premature convergence, and feasibility evolution during the search process. Therefore, the proposed method differs from existing operator-level improvements by introducing an adaptive decision mechanism that guides the search behavior according to the current optimization conditions.
The main contributions of this work can be summarized as follows:
A new Adaptive Q-Learning Guided Gorilla Troops Optimizer (AQGTO) is proposed for three-dimensional UAV path planning in precision agriculture, extending the original GTO with a state-aware reinforcement learning mechanism.
Unlike standard metaheuristic variants that rely mainly on predefined update operators or fixed parameter-control strategies, the proposed AQGTO uses Q-learning to adaptively select among exploration, exploitation, and diversification actions according to population diversity, improvement rate, and feasibility ratio.
A multi-objective trajectory-quality function is designed to jointly optimize path length, energy-related surrogate cost, obstacle avoidance, path smoothness, and altitude variation.
A feasibility repair strategy is introduced to improve constraint satisfaction and promote collision-free trajectories in complex environments with cylindrical agricultural obstacles.
Extensive experiments conducted in multiple agricultural scenarios demonstrate that the proposed method achieves improved trajectory quality and optimization stability compared with deterministic and population-based baseline algorithms.
The remainder of this paper is organized as follows. Section 2 reviews existing UAV path-planning approaches and related optimization techniques. Section 3 presents the proposed AQGTO algorithm and the UAV trajectory formulation. Section 4 describes the simulation environments and experimental setup. Section 5 discusses the experimental results and performance analysis. Finally, Section 6 concludes the paper and outlines potential directions for future work.
3. Proposed Method
To address the previously mentioned limitations, this work proposes an Adaptive Q-Learning Guided Gorilla Troops Optimizer (AQGTO) that integrates reinforcement learning into the optimization process. By incorporating a Q-learning mechanism, the proposed method dynamically selects appropriate search strategies during optimization, improving convergence stability and trajectory quality in complex agricultural environments.
3.1. Problem Formulation
In precision agriculture applications, UAVs are frequently deployed to perform tasks such as crop monitoring, disease detection, and targeted spraying. These missions require the UAV to navigate safely through agricultural environments that may contain various obstacles, including trees, buildings, irrigation systems, and terrain variations. Consequently, the UAV path-planning problem can be formulated as an optimization task that aims to determine a safe and efficient trajectory between a start position and a target destination while satisfying environmental and flight constraints.
Let the three-dimensional environment be defined as a bounded search space:
$$\Omega = \{(x, y, z) \mid 0 \le x \le X_{\max},\ 0 \le y \le Y_{\max},\ z_{\min} \le z \le z_{\max}\}$$
where $x$, $y$, and $z$ represent the spatial coordinates of the UAV within the environment.
The UAV trajectory is represented by a sequence of waypoints connecting the start point $S$ and the goal point $G$:
$$P = \{S, P_1, P_2, \ldots, P_n, G\}$$
where $P_i = (x_i, y_i, z_i)$ denotes the $i$-th intermediate waypoint and $n$ is the number of intermediate waypoints used to represent the trajectory.
The objective of the path-planning problem is to determine the optimal set of waypoints that minimizes a trajectory cost function while satisfying obstacle avoidance and flight constraints. The optimization problem can therefore be formulated as:
$$\min_{P_1, \ldots, P_n} J(P)$$
subject to the following environmental and trajectory feasibility constraints:
Boundary constraint. Each waypoint $P_i = (x_i, y_i, z_i)$ must remain inside the bounded three-dimensional search space $\Omega$:
$$P_i \in \Omega, \quad i = 1, \ldots, n$$
Obstacle avoidance constraint. Let $O_j$ denote the $j$-th cylindrical obstacle with center $(x_j^{o}, y_j^{o})$, radius $r_j$, and height $h_j$. A waypoint $P_i$ is considered collision-free with respect to $O_j$ if
$$\sqrt{(x_i - x_j^{o})^2 + (y_i - y_j^{o})^2} > r_j \quad \text{or} \quad z_i > h_j + \delta_h$$
where $\delta_h$ is a vertical safety margin. In addition, each line segment connecting two consecutive waypoints is checked by sampling intermediate points along the segment to detect possible segment–obstacle intersections.
Safety clearance constraint. To reduce the risk of near-obstacle flight, a minimum horizontal clearance distance $d_{\text{safe}}$ is imposed whenever the UAV flies below the obstacle height margin:
$$\sqrt{(x_i - x_j^{o})^2 + (y_i - y_j^{o})^2} - r_j \ge d_{\text{safe}} \quad \text{whenever} \quad z_i \le h_j + \delta_h$$
Altitude constraint. The UAV altitude must remain within the operational flight range during the whole trajectory:
$$z_{\min} \le z_i \le z_{\max}, \quad i = 1, \ldots, n$$
where $z_{\min}$ and $z_{\max}$ are the minimum and maximum allowable flight altitudes.
Feasible trajectory constraint. The generated path must satisfy basic geometric feasibility conditions between consecutive waypoints. First, the distance between two consecutive waypoints is limited by a maximum segment length $L_{\max}$:
$$\|P_{i+1} - P_i\| \le L_{\max}$$
Second, abrupt vertical motion is limited by imposing a maximum altitude difference between consecutive waypoints:
$$|z_{i+1} - z_i| \le \Delta z_{\max}$$
Finally, abrupt heading changes are discouraged by limiting the turning angle $\theta_i$ between two consecutive path segments:
$$\theta_i \le \theta_{\max}$$
These constraints ensure that the optimized trajectory remains inside the flight region, avoids cylindrical obstacles with a safety margin, and maintains physically reasonable transitions between consecutive waypoints. Candidate solutions violating these constraints are corrected using the feasibility repair mechanism described in Section 3.6 and penalized through the objective function when necessary.
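The obstacle-related checks described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the `(cx, cy, radius, height)` obstacle tuple, the default of 20 samples per segment, and the `clearance` and `height_margin` parameters are all assumptions introduced here.

```python
import math

def point_hits_cylinder(point, obstacle, clearance=0.0, height_margin=0.0):
    """Return True if `point` violates the (inflated) cylinder.

    `obstacle` is an assumed (cx, cy, radius, height) tuple. The point is in
    conflict only when it is both horizontally within radius + clearance
    and vertically at or below height + height_margin.
    """
    x, y, z = point
    cx, cy, radius, height = obstacle
    horizontal = math.hypot(x - cx, y - cy)
    return horizontal <= radius + clearance and z <= height + height_margin

def segment_hits_cylinder(p, q, obstacle, samples=20, **margins):
    """Sample evenly spaced points on the segment p -> q and test each one."""
    for k in range(samples + 1):
        t = k / samples
        point = tuple(a + t * (b - a) for a, b in zip(p, q))
        if point_hits_cylinder(point, obstacle, **margins):
            return True
    return False

def path_is_collision_free(path, obstacles, samples=20, **margins):
    """Check every consecutive waypoint pair against every obstacle."""
    return not any(
        segment_hits_cylinder(p, q, obstacle, samples, **margins)
        for p, q in zip(path, path[1:])
        for obstacle in obstacles
    )
```

Sampling is an approximation: a thin obstacle can be missed if the sample spacing is larger than its diameter, so the number of samples should be chosen relative to the segment length limit and the smallest obstacle radius.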
The trajectory cost $J(P)$ is defined as a multi-objective function that considers several factors influencing the quality of the UAV path, including trajectory length, energy-related surrogate cost, obstacle proximity penalties, and path smoothness. These components are described in detail in Section 3.3.
The goal of the optimization algorithm is therefore to determine the waypoint configuration that produces the safest and most efficient UAV trajectory within the agricultural environment.
3.2. Path Encoding
In population-based optimization algorithms, each candidate solution must be encoded as a vector representation that can be efficiently manipulated during the search process.
The UAV trajectory is defined as a sequence of waypoints connecting the start point $S$ and the goal point $G$, as defined in Section 3.1.
Each candidate solution is therefore represented as a continuous vector containing the coordinates of all intermediate waypoints:
$$\mathbf{X} = [x_1, y_1, z_1, x_2, y_2, z_2, \ldots, x_n, y_n, z_n]$$
where $n$ denotes the number of intermediate waypoints. The start and goal positions remain fixed and are not modified during optimization.
This representation transforms the UAV path-planning problem into a continuous optimization problem in a search space of dimension:
$$\dim(\mathbf{X}) = 3n$$
where each waypoint contributes three variables corresponding to its spatial coordinates.
During the optimization process, candidate solutions generated by the swarm algorithms may occasionally violate environmental constraints such as obstacle boundaries or altitude limits. To address this issue, a feasibility repair mechanism is applied after each position update. This mechanism adjusts invalid waypoint coordinates to ensure that the generated trajectory remains within the allowable flight region and avoids obstacle intersections.
The waypoint-based representation offers several advantages for UAV trajectory optimization:
It provides a compact and continuous search representation suitable for swarm-based optimization algorithms.
It allows flexible trajectory shapes that can adapt to complex obstacle configurations.
It enables smooth trajectory generation when combined with appropriate cost functions and waypoint interpolation.
This encoding scheme forms the basis for the optimization process performed by the proposed AQGTO algorithm.
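As a small illustration of this encoding, the sketch below flattens intermediate waypoints into a decision vector and rebuilds the full path with the fixed start and goal attached. The helper names `encode` and `decode` are hypothetical and are not taken from the paper.

```python
def encode(waypoints):
    """Flatten n intermediate waypoints (x, y, z) into a vector of length 3n."""
    return [coordinate for waypoint in waypoints for coordinate in waypoint]

def decode(vector, start, goal):
    """Rebuild the full path S, P_1, ..., P_n, G from a flat decision vector."""
    if len(vector) % 3 != 0:
        raise ValueError("decision vector length must be a multiple of 3")
    inner = [tuple(vector[i:i + 3]) for i in range(0, len(vector), 3)]
    # Start and goal are fixed, so they are attached here and never optimized.
    return [tuple(start)] + inner + [tuple(goal)]
```

Keeping the start and goal outside the vector means the optimizer can never move them, which matches the statement that they remain fixed during optimization.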
3.3. Objective Function
To evaluate the quality of candidate UAV trajectories, a multi-objective cost function is defined that considers path length, an energy-related surrogate cost, obstacle avoidance, path smoothness, and altitude variation. The energy-related term used in this study is not intended to represent a full physical battery-consumption model; instead, it provides a lightweight trajectory-quality proxy that penalizes long travel distances and excessive vertical motion. The total trajectory cost is expressed as:
where
is the path length,
is the energy-related surrogate cost,
is the obstacle penalty,
is the smoothness term, and
is the altitude variation.
3.3.1. Path Length
The path length represents the total distance traveled by the UAV along the trajectory. Shorter paths generally correspond to faster mission completion and reduced energy-related surrogate cost.
The path length is computed as:
$$L_{\text{path}} = \sum_{i=0}^{n} \|P_{i+1} - P_i\|$$
where $P_0 = S$ and $P_{n+1} = G$.
3.3.2. Energy-Related Surrogate Cost
Energy efficiency is an important consideration for UAV missions, especially in agricultural monitoring and spraying tasks where flight endurance is limited. However, accurately modeling UAV energy consumption requires detailed information about the UAV platform, propulsion system, velocity profile, payload, wind conditions, and climb/descent dynamics. Since the present study focuses on comparing optimization algorithms under a common trajectory-planning framework, a simplified energy-related surrogate cost is used instead of a full physical energy model.
This surrogate term combines the traveled distance and the cumulative vertical displacement between consecutive waypoints. The rationale is that longer trajectories and larger altitude variations generally increase flight effort. Therefore, this term is used to guide the optimizer toward shorter and vertically smoother trajectories while maintaining computational simplicity.
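The energy term is defined as:
$$E_{\text{sur}} = L_{\text{path}} + \lambda \sum_{i=0}^{n} |z_{i+1} - z_i|$$
where $L_{\text{path}}$ is the path length defined in Section 3.3.1 and $\lambda$ is the coefficient controlling the penalty associated with vertical displacement. The term $E_{\text{sur}}$ should be interpreted as an energy-related trajectory cost rather than a direct physical estimate of UAV battery consumption.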
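A minimal sketch of the path-length and energy-surrogate terms follows, assuming the surrogate is the traveled distance plus a weighted cumulative vertical displacement, as the description above suggests. The function names and the default `vertical_weight` value are illustrative assumptions, not values reported in the paper.

```python
import math

def path_length(path):
    """Sum of Euclidean distances between consecutive points of the full path."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))

def vertical_displacement(path):
    """Cumulative absolute altitude change between consecutive points."""
    return sum(abs(q[2] - p[2]) for p, q in zip(path, path[1:]))

def energy_surrogate(path, vertical_weight=1.0):
    """Distance plus weighted climb/descent; a proxy, not a battery model."""
    return path_length(path) + vertical_weight * vertical_displacement(path)
```

For example, a path that flies 5 m horizontally and then climbs 5 m has a length of 10 m, and with `vertical_weight=2.0` its surrogate cost is 20, so the climb is charged more heavily than level flight of the same length.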
3.3.3. Obstacle Penalty
To ensure safe UAV navigation, trajectories that pass too close to obstacles or intersect them are penalized. Obstacles in the agricultural environment are modeled as cylindrical volumes representing trees, buildings, or agricultural structures.
The obstacle penalty consists of two components: a proximity penalty and a collision penalty. The proximity penalty discourages the UAV from flying too close to obstacle boundaries, while the collision penalty strongly penalizes any trajectory segment that intersects an obstacle.
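For a waypoint $P_i = (x_i, y_i, z_i)$ and an obstacle $O_j$ with center $(x_j^{o}, y_j^{o})$, radius $r_j$, and height $h_j$, the radial clearance is defined as:
$$c_{ij} = \sqrt{(x_i - x_j^{o})^2 + (y_i - y_j^{o})^2} - r_j$$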
A proximity penalty is applied when the waypoint lies below the obstacle height margin and the clearance is smaller than a safety distance
:
with
where
is the proximity gain,
is the obstacle safety clearance threshold, and
is the height safety margin.
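In addition, a collision penalty is assigned whenever a trajectory segment intersects an obstacle. Let $N_{\text{col}}$ denote the total number of detected segment-obstacle collisions. The collision penalty is defined as:
$$P_{\text{col}} = K \, N_{\text{col}}$$
where $K$ is a large constant that strongly penalizes infeasible trajectories.
The total obstacle penalty is therefore the sum of the proximity penalty $P_{\text{prox}}$ and the collision penalty:
$$P_{\text{obs}} = P_{\text{prox}} + P_{\text{col}}$$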
3.3.4. Path Smoothness
Smooth trajectories are desirable for UAV navigation because they reduce abrupt direction changes, improve flight stability, and lower control effort. In the proposed model, smoothness is measured using the squared turning angles between consecutive path segments.
The smoothness term is defined as:
$$S_{\text{smooth}} = \sum_{i=1}^{n} \theta_i^{2}$$
where $\theta_i$ denotes the turning angle between two consecutive trajectory segments.
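The turning-angle computation can be sketched as follows, assuming the angle at each interior point is the angle between the incoming and outgoing segment directions. The function names are hypothetical, and degenerate zero-length segments are treated as contributing no turn, which is a choice made for this sketch only.

```python
import math

def turning_angle(p, q, r):
    """Angle in radians between segment p -> q and segment q -> r."""
    u = [b - a for a, b in zip(p, q)]
    v = [b - a for a, b in zip(q, r)]
    norm_u, norm_v = math.hypot(*u), math.hypot(*v)
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero-length segment adds no turn
    cosine = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, cosine)))

def smoothness(path):
    """Sum of squared turning angles over all interior points of the path."""
    return sum(
        turning_angle(p, q, r) ** 2
        for p, q, r in zip(path, path[1:], path[2:])
    )
```

Squaring the angle penalizes one sharp turn more than several gentle ones with the same total heading change, which is consistent with the stated goal of discouraging abrupt direction changes.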
3.3.5. Altitude Variation
Large altitude changes may increase energy-related surrogate cost and reduce flight stability. Therefore, altitude variation is penalized to encourage smoother vertical motion along the trajectory.
The altitude variation term is defined as:
3.4. Gorilla Troops Optimizer
The Gorilla Troops Optimizer (GTO) is a population-based metaheuristic optimization algorithm inspired by the social behavior and leadership hierarchy of gorilla groups. In this algorithm, each gorilla represents a candidate solution within the search space, and the group collectively explores the environment to identify optimal solutions.
In the context of UAV path planning, each gorilla encodes a candidate trajectory defined by a set of intermediate waypoints as described in Section 3.2. The quality of each trajectory is evaluated using the objective function defined in Section 3.3.
The GTO algorithm operates through two main phases: exploration and exploitation.
3.4.1. Exploration Phase
During the exploration phase, gorillas search for promising regions of the solution space by performing random movements. This phase helps maintain population diversity and prevents premature convergence.
The exploration movement can be expressed as:
where
represents the position of the
i-th gorilla at iteration
t,
is a randomly selected solution from the population, and
r is a random coefficient controlling the exploration step.
3.4.2. Exploitation Phase
Once promising regions of the search space are identified, the algorithm enters the exploitation phase, where gorillas move toward the best-known solution, known as the silverback.
The exploitation update rule can be expressed as:
where
represents the best solution found so far in the population.
Through iterative exploration and exploitation, the gorilla population gradually converges toward optimal regions of the search space.
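To make the two phases concrete, the sketch below uses a deliberately simplified update in which a gorilla moves a random fraction of the way toward a target: a randomly chosen population member for exploration, and the silverback for exploitation. This linear form is an assumption chosen to match the quantities named above (current position, random peer, best solution, random coefficient); it is not the full operator set of the original GTO, and all function names are hypothetical.

```python
import random

def move_toward(position, target, step):
    """Move `position` a fraction `step` of the way toward `target`."""
    return [x + step * (t - x) for x, t in zip(position, target)]

def exploration_update(population, index, rng):
    """Assumed exploration rule: step toward a randomly selected peer."""
    peer = population[rng.randrange(len(population))]
    return move_toward(population[index], peer, rng.random())

def exploitation_update(population, index, best, rng):
    """Assumed exploitation rule: step toward the best-known solution."""
    return move_toward(population[index], best, rng.random())
```

Passing an explicit `random.Random` instance as `rng` keeps runs reproducible, which matters when comparing optimizers over repeated trials.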
Although GTO demonstrates strong global search capability, its search strategies remain predefined throughout the optimization process. As a result, the algorithm may experience difficulties dynamically adapting the balance between exploration and exploitation as the optimization progresses. This limitation motivates the integration of reinforcement learning into the optimization process, as described in the following subsection.
3.5. Adaptive Q-Learning Mechanism
To dynamically balance exploration and exploitation during optimization, a Q-learning mechanism is integrated into the Gorilla Troops Optimizer. The reinforcement learning agent selects the search strategy at each iteration based on the current state of the population.
3.5.1. State Definition
The state is defined using three indicators that characterize the optimization process:
Population diversity (D): measured as the average Euclidean distance between individuals and the population centroid.
Improvement rate ($I$): defined as the relative improvement of the best solution between consecutive iterations.
Feasibility ratio ($F$): defined as the proportion of collision-free solutions in the population.
Each indicator is discretized into three levels (Low, Medium, High), resulting in a discrete state space of $3^3 = 27$ states:
$$\mathcal{S} = \{(D, I, F) \mid D, I, F \in \{\text{Low}, \text{Medium}, \text{High}\}\} \tag{25}$$
Equation (25) defines the discrete optimization-state space used by the Q-learning agent. This representation allows the algorithm to capture stagnation, convergence, and feasibility conditions during the search process.
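One way to map the three indicators to a row of the Q-table is sketched below. The discretization thresholds are not given in this section, so the `cuts` dictionary holds placeholder values chosen only for illustration; the function names are likewise hypothetical.

```python
import math

def population_diversity(population):
    """Average Euclidean distance between individuals and the centroid."""
    size, dim = len(population), len(population[0])
    centroid = [sum(ind[k] for ind in population) / size for k in range(dim)]
    return sum(math.dist(ind, centroid) for ind in population) / size

def level(value, low_cut, high_cut):
    """Map a scalar to 0 (Low), 1 (Medium), or 2 (High)."""
    if value < low_cut:
        return 0
    if value < high_cut:
        return 1
    return 2

def state_index(diversity, improvement, feasibility, cuts):
    """Combine the three levels into one index in the range 0..26."""
    d = level(diversity, *cuts["diversity"])
    i = level(improvement, *cuts["improvement"])
    f = level(feasibility, *cuts["feasibility"])
    return d * 9 + i * 3 + f

# Placeholder thresholds (assumptions, not values from the paper).
cuts = {
    "diversity": (1.0, 5.0),
    "improvement": (0.001, 0.01),
    "feasibility": (0.3, 0.7),
}
```

Encoding the state as a single integer lets the Q-table be stored as a plain 27-row array with one column per action.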
3.5.2. Action Space
The action space consists of four search strategies:
$a_1$: Strong exploration (large random perturbations)
$a_2$: Moderate exploration (peer-guided movement)
$a_3$: Exploitation (movement toward the best solution)
$a_4$: Diversification jump (random restart or large perturbation)
Each action corresponds to a specific update rule applied to the population.
3.5.3. Reward Function
The reward is defined based on both solution improvement and feasibility evolution:
where:
Additional penalty terms are applied when no improvement is observed or when feasibility decreases, in order to discourage ineffective search actions:
3.5.4. Q-Learning Update Rule
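The Q-values are updated using:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ R + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] \tag{27}$$
where $\alpha$ is the learning rate, $\gamma$ is the discount factor, $s$ and $s'$ are the current and next states, respectively, $a$ is the selected action, and $R$ is the reward defined in Equation (26).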
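The tabular update can be sketched as follows for a table with 27 states and 4 actions. The default values of `alpha` and `gamma` are placeholders rather than the settings used in the experiments, and `q_update` is a hypothetical helper name.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning update in place and return the new value."""
    best_next = max(q_table[next_state])       # greedy value of the next state
    td_target = reward + gamma * best_next     # bootstrapped target
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]

# One row per discrete optimization state, one column per search action.
q_table = [[0.0] * 4 for _ in range(27)]
```

Starting from an all-zero table, a reward of 1.0 with `alpha=0.5` moves the selected entry halfway toward the target, giving 0.5 after the first update.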
3.5.5. Action Selection Policy
An $\varepsilon$-greedy policy is adopted:
With probability $\varepsilon$, a random action is selected,
Otherwise, the action with the highest Q-value is selected.
The parameter $\varepsilon$ decreases linearly during the optimization process to gradually shift from exploration to exploitation.
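A compact sketch of this policy with a linear decay schedule is given below. The start and end values of the exploration rate are assumptions made for illustration, as are the function names.

```python
import random

def linear_epsilon(iteration, max_iterations, eps_start=0.9, eps_end=0.05):
    """Linearly interpolate epsilon from eps_start to eps_end over the run."""
    if max_iterations <= 1:
        return eps_end
    fraction = min(max(iteration / (max_iterations - 1), 0.0), 1.0)
    return eps_start + fraction * (eps_end - eps_start)

def select_action(q_row, epsilon, rng=random):
    """Epsilon-greedy choice over one row of the Q-table."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))       # explore: uniform random action
    # Exploit: index of the largest Q-value (first one wins on ties).
    return max(range(len(q_row)), key=q_row.__getitem__)
```

With `epsilon=0.0` the choice is fully greedy, so `select_action([0, 1, 5, 2], 0.0)` always picks index 2.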
3.5.6. Integration into AQGTO
At each iteration, the agent observes the current state and selects an action that determines how the population is updated. The selected strategy modifies the movement of candidate solutions, enabling adaptive control of the search behavior throughout the optimization process.
3.6. Feasibility Repair Mechanism
During the optimization process, population-based algorithms such as PSO and GTO may generate candidate trajectories that violate environmental or flight constraints. For example, some waypoints may lie outside the allowable flight region, intersect with obstacles, or produce unrealistic altitude variations. To ensure that all candidate trajectories remain feasible, a feasibility repair mechanism is applied after each position update.
The purpose of this mechanism is to correct invalid waypoint coordinates while preserving the overall structure of the trajectory. The repair procedure consists of four main steps.
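Boundary correction. If a waypoint lies outside the allowable flight region $\Omega$, its coordinates are projected back into the feasible space:
$$x_i \leftarrow \min(\max(x_i, 0), X_{\max}), \quad y_i \leftarrow \min(\max(y_i, 0), Y_{\max}), \quad z_i \leftarrow \min(\max(z_i, z_{\min}), z_{\max})$$
This ensures that all waypoints remain within the predefined spatial limits of the environment.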
Obstacle avoidance correction. Agricultural obstacles such as trees or buildings are modeled as cylindrical regions in the environment. If a waypoint is detected inside an obstacle region, the waypoint is shifted to the nearest feasible position outside the obstacle boundary. This adjustment preserves the continuity of the trajectory while ensuring collision-free navigation.
Altitude adjustment. To maintain feasible UAV flight dynamics, large altitude variations between consecutive waypoints are limited. If the altitude difference between two waypoints exceeds a predefined threshold, the waypoint altitude is adjusted to satisfy the allowable variation constraint.
Segment-length and turning-angle correction. To avoid unrealistic jumps between consecutive waypoints, the distance between adjacent waypoints is checked after each position update. If $\|P_{i+1} - P_i\| > L_{\max}$, the waypoint is shifted along the segment direction until the maximum segment-length constraint is satisfied. Similarly, if the turning angle exceeds $\theta_{\max}$, the waypoint is locally adjusted toward the bisector direction of the neighboring segments. This correction reduces abrupt heading changes and improves the physical feasibility of the generated trajectory.
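The first three repair steps can be sketched as below. This is an illustrative approximation under stated assumptions: obstacles are `(cx, cy, radius, height)` tuples, a waypoint inside an obstacle is pushed radially outward to the inflated boundary, and `clearance` and `max_step` are hypothetical parameters. The segment-length and turning-angle corrections are omitted for brevity.

```python
import math

def clip(value, low, high):
    """Project a scalar back into the interval [low, high]."""
    return min(max(value, low), high)

def repair_waypoint(waypoint, bounds, obstacles, clearance=0.5):
    """Boundary projection followed by a radial push out of any cylinder."""
    (x_lo, x_hi), (y_lo, y_hi), (z_lo, z_hi) = bounds
    x = clip(waypoint[0], x_lo, x_hi)
    y = clip(waypoint[1], y_lo, y_hi)
    z = clip(waypoint[2], z_lo, z_hi)
    for cx, cy, radius, height in obstacles:
        dist = math.hypot(x - cx, y - cy)
        if z <= height and dist < radius + clearance:
            # Direction from the cylinder axis; arbitrary if exactly on the axis.
            dx, dy = ((x - cx) / dist, (y - cy) / dist) if dist > 0.0 else (1.0, 0.0)
            x = cx + dx * (radius + clearance)
            y = cy + dy * (radius + clearance)
    # Note: the push may leave the bounds or enter another obstacle, so a
    # penalty term or a second repair pass is still needed in general.
    return (x, y, z)

def limit_altitude_steps(path, max_step):
    """Limit the altitude change from each point to the next interior waypoint."""
    repaired = [path[0]]
    for x, y, z in path[1:-1]:
        previous_z = repaired[-1][2]
        repaired.append((x, y, clip(z, previous_z - max_step, previous_z + max_step)))
    repaired.append(path[-1])  # the goal is fixed and is not adjusted here
    return repaired
```

Because each step is local, a repaired waypoint is not guaranteed to satisfy every constraint at once, which is why residual violations still need to be handled by the penalty terms of the objective function.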
Although the present implementation uses cylindrical obstacles, the proposed optimization framework is not restricted to this specific geometry. AQGTO operates on waypoint coordinates and relies on the feasibility-checking and penalty-evaluation modules to determine whether a candidate trajectory is valid. Therefore, more complex obstacle representations, such as polygonal regions, ellipsoidal canopies, occupancy grids, or point-cloud-based maps, can be incorporated by replacing the collision-detection and obstacle-penalty functions while preserving the same Q-learning-guided optimization structure.
By applying these corrections after each optimization step, the feasibility repair mechanism is intended to keep the candidate trajectories evaluated by the objective function inside the flight region and clear of obstacles; any residual violations are penalized through the objective function, as noted in Section 3.1. This improves the stability of the optimization process and reduces the number of iterations spent evaluating infeasible solutions.
3.7. AQGTO Algorithm
The proposed Adaptive Q-Learning Guided Gorilla Troops Optimizer (AQGTO) integrates reinforcement learning into the Gorilla Troops Optimizer to dynamically adjust the search behavior during the optimization process. The objective of this integration is to improve convergence stability and trajectory optimization performance by adaptively controlling the balance between exploration and exploitation.
In the classical GTO algorithm, the search strategy is predefined and remains fixed throughout the optimization process. This limitation may lead to inefficient exploration in early iterations or premature convergence in later stages. To address this issue, the proposed AQGTO framework introduces a Q-learning agent that observes the state of the optimization process and selects appropriate search strategies accordingly.
At each iteration, the Q-learning agent evaluates the current optimization state and determines whether the algorithm should emphasize exploration or exploitation. Based on the selected action, the gorilla population updates its positions accordingly. The updated candidate trajectories are then corrected using the feasibility repair mechanism and evaluated using the objective function. A reward signal reflecting the improvement of the best solution is used to update the Q-table, enabling the algorithm to learn effective search strategies over time. The workflow of the proposed AQGTO framework is illustrated in Figure 2.
The overall procedure of the proposed AQGTO algorithm is summarized in Algorithm 1.
Algorithm 1. Adaptive Q-Learning Guided Gorilla Troops Optimizer (AQGTO)
 1: Initialize gorilla population
 2: Initialize Q-table
 3: Evaluate fitness of each solution
 4: for iteration $t = 1$ to $T_{\max}$ do
 5:     Identify best solution
 6:     Observe optimization state $s$
 7:     Select action $a$ using policy derived from Q-table
 8:     for each gorilla $i$ do
 9:         if action indicates exploration then
10:             Update position using exploration rule
11:         else
12:             Update position using exploitation rule
13:         end if
14:         Apply feasibility repair mechanism
15:     end for
16:     Evaluate updated population
17:     Compute reward $R$
18:     Observe next state $s'$
19:     Update Q-table using Equation (27)
20: end for
21: Return best trajectory found
4. Experimental Setup
This section describes the experimental framework used to evaluate the performance of the proposed AQGTO algorithm, including the UAV modeling assumptions, simulation environments, parameter settings, and evaluation metrics. The simulation experiments were conducted on a workstation equipped with an Intel Core i5-12400 CPU, 32 GB DDR4 3600 MHz RAM, and a 2 TB PCIe Gen 4 NVMe SSD. The implementation was developed in Python 3.14.3 using standard scientific computing libraries, including NumPy and Matplotlib; no physical UAV or field equipment was used in this simulation-based study.
The present experimental setup focuses on global three-dimensional trajectory optimization in known agricultural environments. In this setting, the obstacle map and environmental boundaries are assumed to be available during planning, and the objective is to generate a safe and efficient reference trajectory before mission execution. This assumption is commonly adopted in offline UAV trajectory optimization studies because it enables controlled comparison among optimization algorithms under identical environmental and objective-function conditions. Dynamic replanning, perception uncertainty, wind-field estimation, and communication latency are not explicitly modeled in the current experiments and are discussed as limitations and future research directions.
From a system-level perspective, the simulated obstacle maps used in this work can be interpreted as the output of an upstream perception or mapping stage. For example, in real agricultural missions, crop rows, trees, terrain variations, or restricted regions may be extracted from UAV imagery, multispectral data, LiDAR point clouds, or geographic field maps before trajectory optimization. AQGTO then operates on this representation to generate a reference path. Therefore, the current evaluation isolates the planning component in order to assess the optimization performance of the proposed algorithm under controlled and reproducible conditions.
4.1. UAV Model and Constraints
In precision agriculture applications, unmanned aerial vehicles are typically deployed to perform tasks such as crop monitoring, spraying, and field mapping. During these missions, the UAV must navigate safely through environments containing obstacles such as trees, buildings, and terrain variations. To ensure realistic trajectory generation, the path-planning process must consider several UAV operational constraints.
In this work, the UAV is modeled as a point-mass system operating in a three-dimensional environment. The trajectory is defined by a sequence of waypoints connecting the start point $S$ to the goal point $G$, as described in Section 3.2. The UAV is assumed to move along straight-line segments between consecutive waypoints.
Several constraints are imposed to ensure safe and feasible UAV navigation.
Altitude constraint. The UAV altitude must remain within allowable flight limits defined by the mission requirements:
$$z_{\min} \le z_i \le z_{\max}$$
where $z_i$ represents the altitude of waypoint $P_i$.
Obstacle avoidance constraint. The UAV trajectory must avoid obstacles present in the agricultural environment. Obstacles such as trees or infrastructure elements are modeled as cylindrical regions defined by their center coordinates and radius. Any trajectory segment intersecting an obstacle is considered infeasible and penalized through the objective function.
Trajectory continuity constraint. To maintain feasible UAV motion, the distance between consecutive waypoints must remain within a reasonable range, ensuring smooth transitions between trajectory segments.
Flight safety constraint. A minimum safety distance is maintained between the UAV trajectory and obstacle boundaries to account for navigation uncertainty and environmental disturbances.
These constraints ensure that the generated trajectories are physically realizable and safe for UAV operation in agricultural environments.
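The constraints above can be sketched as simple feasibility checks on a waypoint list. This is a minimal illustration rather than the implementation used in the experiments: the names `Cylinder`, `altitude_ok`, `steps_ok`, `segment_hits`, and `collision_count` are hypothetical, obstacles are assumed to stand on a ground plane at z = 0, and the safety distance is approximated by inflating both the radius and the height of each cylinder by `margin`.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cylinder:
    cx: float      # center x-coordinate (m)
    cy: float      # center y-coordinate (m)
    radius: float  # horizontal radius (m)
    height: float  # top of the obstacle above the assumed ground plane z = 0 (m)


def altitude_ok(path, z_min, z_max):
    """Altitude constraint: every waypoint altitude lies in [z_min, z_max]."""
    return all(z_min <= p[2] <= z_max for p in path)


def steps_ok(path, d_min, d_max):
    """Continuity constraint: each segment length lies in [d_min, d_max]."""
    return all(d_min <= math.dist(p, q) <= d_max for p, q in zip(path, path[1:]))


def segment_hits(p, q, obs, margin=0.0):
    """True if the straight segment p->q enters the cylinder inflated by margin."""
    r, top = obs.radius + margin, obs.height + margin
    dx, dy, dz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    fx, fy = p[0] - obs.cx, p[1] - obs.cy
    a = dx * dx + dy * dy
    c = fx * fx + fy * fy - r * r
    if a == 0.0:                      # purely vertical segment
        if c > 0.0:
            return False
        t0, t1 = 0.0, 1.0
    else:
        b = 2.0 * (fx * dx + fy * dy)
        disc = b * b - 4.0 * a * c
        if disc < 0.0:                # the infinite line misses the disc
            return False
        root = math.sqrt(disc)
        t0 = max(0.0, (-b - root) / (2.0 * a))
        t1 = min(1.0, (-b + root) / (2.0 * a))
        if t0 > t1:                   # the overlap lies outside the segment
            return False
    # Altitude is linear in t, so its minimum over [t0, t1] is at an endpoint.
    lowest = min(p[2] + t0 * dz, p[2] + t1 * dz)
    return lowest <= top


def collision_count(path, obstacles, margin=0.0):
    """Number of (segment, obstacle) pairs that intersect."""
    return sum(segment_hits(p, q, o, margin)
               for p, q in zip(path, path[1:]) for o in obstacles)
```

A path would be treated as feasible when `altitude_ok` and `steps_ok` hold and `collision_count` returns zero for the chosen margin; in a penalty-based formulation the count can instead be fed into the obstacle term of the objective function.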
4.2. Simulation Environment
To evaluate the performance of the proposed AQGTO algorithm, three representative agricultural path-planning scenarios were simulated. These scenarios are not intended to reproduce a single real farm as a digital twin; rather, they provide controlled test environments that capture common geometric challenges encountered in precision-agriculture UAV missions. The simulated tasks correspond to field monitoring, orchard inspection, and terrain-aware agricultural surveying, where the UAV must generate a collision-free and smooth reference trajectory between a fixed start point and a target location. The considered environments are illustrated in Figure 3.
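The simulation space is defined as a bounded three-dimensional region:
$$\Omega = \{(x, y, z) \mid 0 \le x \le X_{\max},\ 0 \le y \le Y_{\max},\ z_{\min} \le z \le z_{\max}\},$$
where $X_{\max}$ and $Y_{\max}$ represent the horizontal dimensions of the field, and $z_{\min}$ and $z_{\max}$ correspond to the allowable UAV altitude limits.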
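As an illustration of how such a bounded region can be used by a waypoint-based planner, the sketch below samples a random candidate trajectory inside the box and clamps out-of-range points back into it. The class `Bounds`, the function `random_path`, the choice of the origin as one corner of the field, and all numeric values in the usage note are assumptions made for this example; the actual dimensions are those listed in the simulation-parameter table.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    x_max: float  # field extent along x (m)
    y_max: float  # field extent along y (m)
    z_min: float  # lowest allowed altitude (m)
    z_max: float  # highest allowed altitude (m)

    def contains(self, p):
        """True if point p = (x, y, z) lies inside the bounded region."""
        x, y, z = p
        return (0.0 <= x <= self.x_max and 0.0 <= y <= self.y_max
                and self.z_min <= z <= self.z_max)

    def clamp(self, p):
        """Project a point onto the region by clipping each coordinate."""
        x, y, z = p
        return (min(max(x, 0.0), self.x_max),
                min(max(y, 0.0), self.y_max),
                min(max(z, self.z_min), self.z_max))


def random_path(bounds, start, goal, n_waypoints, rng):
    """Fixed start and goal with uniformly sampled intermediate waypoints."""
    inner = [(rng.uniform(0.0, bounds.x_max),
              rng.uniform(0.0, bounds.y_max),
              rng.uniform(bounds.z_min, bounds.z_max))
             for _ in range(n_waypoints)]
    return [start] + inner + [goal]
```

For example, `random_path(Bounds(100.0, 100.0, 5.0, 40.0), (0.0, 0.0, 10.0), (100.0, 100.0, 10.0), 8, random.Random(0))` yields one candidate with eight free waypoints; the bounds in this call are placeholders, not the values used in the paper.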
Within this environment, obstacles representing trees, buildings, or agricultural structures are distributed according to the characteristics of the considered scenario. Obstacles are modeled as cylindrical volumes defined by their center coordinates, radius, and height.
The cylindrical obstacle model is used as a geometric abstraction of common agricultural structures such as tree trunks and canopies, irrigation equipment, poles, storage elements, and small field buildings. This representation provides a controlled and computationally efficient setting for evaluating obstacle avoidance and trajectory optimization. Nevertheless, real agricultural environments may contain irregular, non-convex, and heterogeneous obstacle shapes. The present model therefore represents a simplified but useful first-level approximation, and the extension toward more realistic obstacle geometries is discussed in
Section 5.9.
Three different agricultural environments are considered in the experiments:
Row-crop environment. This scenario represents a structured agricultural field, such as cereal, vegetable, or maize-like row cultivation, where vegetation rows and irrigation elements form elongated obstacle patterns. The simulated UAV mission corresponds to field monitoring or targeted spraying, where the aircraft must move from one side of the field to the opposite side while maintaining safe clearance from crop rows and field infrastructure. The main planning challenge in this scenario is to generate a short and regular path through narrow free-space corridors without violating obstacle-clearance constraints.
Orchard environment. This scenario represents an orchard or tree-plantation inspection task, where trees are approximated by cylindrical obstacles defined by trunk/canopy radius and height. The obstacle density is higher than in the row-crop environment, which forces the planner to generate more flexible trajectories around tree-like structures. This scenario is representative of UAV missions for canopy monitoring, disease inspection, or localized treatment in orchards.
Hilly terrain environment. This scenario represents agricultural monitoring over uneven or sloped terrain. Terrain variation affects the desired altitude profile and increases the importance of vertical smoothness and altitude-variation penalties. The planning objective is to maintain a safe and regular trajectory while avoiding excessive climb/descent behavior. This scenario is relevant to agricultural fields located in non-flat regions, where altitude changes and terrain clearance must be considered during UAV route generation.
For each environment, the start and goal positions are defined at opposite regions of the search space to ensure sufficiently long trajectories and meaningful optimization challenges. The proposed AQGTO algorithm and baseline methods are evaluated using identical environmental conditions to ensure fair comparisons.
4.3. Simulation Parameters
The experiments were conducted in a simulated three-dimensional agricultural environment designed to represent typical UAV operating conditions in precision agriculture. The simulation parameters defining the environment size, altitude limits, obstacle properties, and trajectory representation are summarized in Table 1.
The selected parameters aim to provide sufficiently complex environments while maintaining computational feasibility for repeated optimization runs. These settings are used consistently for all evaluated algorithms to ensure fair performance comparisons.
The obstacle dimensions and altitude limits were selected to create representative 3D planning challenges rather than to reproduce one specific agricultural field. Cylindrical obstacles with radii between 3 and 5 m and heights between 18 and 30 m provide a simplified approximation of trees, compact canopy regions, irrigation structures, poles, or small field buildings. The altitude range (see Table 1) was intentionally defined as a broad operational search interval to test the optimizer in a three-dimensional space. However, the altitude-variation term, the energy-related surrogate cost, and the feasibility constraints discourage unnecessary vertical motion and promote smoother altitude profiles. In practical deployments, these bounds should be adjusted according to the UAV platform, crop type, regulatory altitude limits, sensor requirements, and mission objective.
All experiments were conducted using the same implementation framework and identical stopping criteria to ensure reproducibility and fair comparison between algorithms.
4.4. Algorithm Parameters
To ensure fair and reproducible comparisons, all optimization algorithms were executed using consistent parameter settings across the experiments. The parameter values were selected based on commonly used configurations in swarm optimization literature and preliminary empirical tuning.
The parameters used for the Particle Swarm Optimization (PSO), Gorilla Troops Optimizer (GTO), and the Q-learning component of the proposed AQGTO algorithm are summarized in Table 2.
The algorithmic parameters were selected to ensure a fair comparison among the evaluated optimization methods under the same computational budget. The population size and maximum number of iterations were kept identical for all population-based algorithms. The PSO parameters were chosen according to commonly used values in swarm-optimization studies and preliminary empirical testing. For the Q-learning component of AQGTO, the learning rate was selected to allow gradual Q-value adaptation without producing unstable oscillations, while the discount factor gives sufficient importance to future rewards during iterative optimization. The exploration rate was linearly decreased from 0.15 to 0.02 to encourage moderate exploration during the early iterations and more exploitation-oriented behavior during the later search stages.
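The exploration schedule and the tabular update described above can be sketched as follows. Only the endpoints of the exploration rate (0.15 and 0.02) are taken from the text; the function names, the use of an epsilon-greedy rule, and the standard one-step Q-learning update are assumptions made for illustration, and the learning rate `alpha` and discount factor `gamma` are left as arguments because their values are given in the parameter table rather than here.

```python
import random


def epsilon_at(iteration, max_iter, eps_start=0.15, eps_end=0.02):
    """Exploration rate decreased linearly from eps_start to eps_end."""
    if max_iter <= 1:
        return eps_end
    frac = iteration / (max_iter - 1)
    return eps_start + (eps_end - eps_start) * frac


def select_action(q_row, eps, rng):
    """Epsilon-greedy choice over one row of the Q-table."""
    if rng.random() < eps:
        return rng.randrange(len(q_row))                   # explore
    return max(range(len(q_row)), key=q_row.__getitem__)   # exploit


def q_update(q, s, a, reward, s_next, alpha, gamma):
    """One-step tabular Q-learning update, applied in place."""
    target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
```

With this schedule, roughly 15% of the early action choices are random, while late iterations follow the learned Q-values almost exclusively.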
All algorithms were executed under identical experimental conditions, including population size, maximum number of iterations, and number of runs (Table 2 and Table 3). This ensures a fair and unbiased comparison of optimization performance across different methods. Each algorithm was run multiple times to account for stochastic variability, and the reported results correspond to the statistical averages obtained across these runs.
The objective-function weights were selected according to the relative importance of the mission requirements in agricultural UAV path planning. Obstacle avoidance was assigned the highest priority because trajectory safety is a mandatory requirement in environments containing trees, irrigation structures, and terrain-related constraints. Path length and energy-related effort were assigned relatively high weights because they directly affect mission duration and UAV endurance. Smoothness and altitude variation were assigned lower weights because they act mainly as regularization terms that reduce abrupt directional and vertical changes without dominating the safety and efficiency objectives. The penalty parameters were selected to strongly discourage infeasible trajectories, particularly segment-obstacle intersections, while still allowing the optimizer to compare feasible candidate paths according to their geometric and energetic quality. Preliminary tuning was conducted to avoid dominance of any single objective term and to ensure stable convergence across the simulated agricultural scenarios.
4.5. Evaluation Metrics
To quantitatively evaluate the performance of the proposed AQGTO algorithm and the baseline methods, several metrics are considered. These metrics assess different aspects of the generated UAV trajectories, including efficiency, safety, and trajectory quality.
Trajectory cost. The primary evaluation metric is the overall trajectory cost defined by the objective function described in Section 3.3. This metric combines path length, energy-related surrogate cost, obstacle penalties, smoothness, and altitude variation into a single value representing the quality of the UAV trajectory.
Path length. Path length measures the total distance traveled by the UAV from the start point to the goal point along the generated trajectory. Shorter paths generally lead to faster mission completion and reduced energy-related surrogate cost.
Energy-related surrogate cost. The energy-related surrogate cost provides a simplified trajectory-level indicator of flight effort. It combines traveled distance and vertical displacement, and is used only as a comparative optimization metric. It should not be interpreted as a complete physical energy-consumption model, since real UAV energy depends on UAV mass, propulsion characteristics, flight speed, acceleration, payload, wind conditions, and climb/descent dynamics.
Path smoothness. Trajectory smoothness reflects the stability of UAV motion. It is typically measured using the turning angles between consecutive trajectory segments. Smoother trajectories result in more stable UAV flight and lower control effort.
Collision count. The number of collisions indicates whether the generated trajectory intersects with obstacles. A valid UAV path should avoid all obstacles in the environment, resulting in zero collisions.
For each algorithm, the reported results correspond to the average performance obtained over multiple independent runs in order to account for the stochastic nature of population-based optimization methods.
In addition to mean performance, standard deviation values are reported to assess the robustness and stability of each algorithm.
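The geometric metrics listed above can be computed directly from a waypoint list, as in the following sketch. The exact formulas used in the paper are defined in Section 3.3 and are not reproduced here; in particular, the form `length + k_vertical * vertical displacement` for the energy-related surrogate and the use of summed turning angles (in radians) for smoothness are assumptions, and `k_vertical` is a hypothetical coefficient.

```python
import math


def path_length(path):
    """Total Euclidean length of the piecewise-linear trajectory."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))


def vertical_displacement(path):
    """Accumulated absolute altitude change along the trajectory."""
    return sum(abs(q[2] - p[2]) for p, q in zip(path, path[1:]))


def energy_surrogate(path, k_vertical=1.0):
    """Assumed surrogate: traveled distance plus weighted vertical motion."""
    return path_length(path) + k_vertical * vertical_displacement(path)


def turning_angles(path):
    """Angle (radians) between each pair of consecutive segments."""
    angles = []
    for a, b, c in zip(path, path[1:], path[2:]):
        u = [b[i] - a[i] for i in range(3)]
        v = [c[i] - b[i] for i in range(3)]
        nu, nv = math.hypot(*u), math.hypot(*v)
        if nu == 0.0 or nv == 0.0:
            continue  # skip degenerate zero-length segments
        cos_t = sum(x * y for x, y in zip(u, v)) / (nu * nv)
        angles.append(math.acos(max(-1.0, min(1.0, cos_t))))
    return angles


def smoothness(path):
    """Sum of turning angles; lower values indicate a smoother path."""
    return sum(turning_angles(path))
```

Means and standard deviations over independent runs can then be obtained by applying these functions to the best path of each run.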
5. Results and Discussion
This section presents and analyzes the experimental results obtained by the proposed AQGTO algorithm, including comparisons with baseline methods, performance across different agricultural scenarios, and evaluations of convergence behavior, computational efficiency, and component contributions.
5.1. Comparison with Baseline Algorithms
To evaluate the effectiveness of the proposed AQGTO algorithm, its performance was compared with several representative baseline methods, including the classical A* planner and four population-based optimization algorithms: Particle Swarm Optimization (PSO), Grey Wolf Optimizer (GWO), Whale Optimization Algorithm (WOA), and the original Gorilla Troops Optimizer (GTO). A* was included as a deterministic graph-search baseline, while PSO, GWO, WOA, and GTO were selected as representative swarm-intelligence and bio-inspired optimizers that can be applied using the same continuous waypoint-based trajectory representation. All stochastic algorithms were evaluated over 30 independent runs under identical simulation conditions, including the same objective function, number of waypoints, population size, maximum number of iterations, and feasibility repair mechanism.
Table 4 reports the mean and standard deviation of the obtained performance metrics.
The results indicate that AQGTO achieves the lowest overall trajectory cost among the evaluated methods. In particular, AQGTO obtains the shortest average path length and the lowest energy-related surrogate cost, while also showing the smallest standard deviation for the main metrics. This indicates that the proposed method improves not only the average trajectory quality but also the stability of the optimization process across independent runs.
The additional comparisons with GWO and WOA strengthen the evaluation. GWO performs competitively and achieves the second-best values for trajectory cost, path length, and energy-related cost, which confirms that it is a strong swarm-intelligence baseline for this problem. WOA also achieves competitive cost and energy-related values but presents a higher smoothness value and larger variability, indicating less stable trajectory regularity in the considered setting. Compared with these baselines, AQGTO maintains the best overall trade-off between cost minimization, path efficiency, energy-related effort, and robustness.
Compared with the original GTO, AQGTO substantially reduces the mean trajectory cost, path length, and energy-related cost. This confirms that the integration of the state-aware Q-learning mechanism improves the search behavior of the original GTO. Although PSO produces the lowest smoothness value, it yields higher overall trajectory cost and larger variability than AQGTO. Overall, these results show that AQGTO provides a favorable compromise between path efficiency, trajectory quality, and optimization stability when compared with deterministic, swarm-based, and bio-inspired baseline methods.
These findings are consistent with recent studies showing that population-based optimizers may suffer from premature convergence, loss of diversity, and sensitivity to predefined search-control parameters in UAV path-planning problems. For example, recent GWO-based UAV path-planning research has emphasized that standard GWO may converge prematurely and lacks sufficient adaptive learning when the search space becomes complex [18]. Similarly, improved WOA and SSA variants have introduced mechanisms such as nonlinear convergence control, reverse learning, chaotic initialization, Lévy flight, and disturbance-based search to improve population diversity and reduce local-optimum stagnation [19,20,21]. In this context, the improved performance of AQGTO can be explained by its ability to adaptively select search behaviors through Q-learning rather than relying only on fixed update operators.
Deep reinforcement learning methods such as DQN and PPO are also relevant to UAV path planning. However, these methods require a different sequential decision-making formulation, including state representation, action-space design, reward shaping, training episodes, and policy generalization across environment instances. In contrast, the present study focuses on trajectory-level continuous optimization using a fixed waypoint encoding and a shared multi-objective cost function. Therefore, a direct comparison with DQN or PPO would require a separate experimental protocol to ensure fairness. Extending the proposed framework toward deep reinforcement learning-based planning in dynamic agricultural environments is identified as an important direction for future work.
5.2. Statistical Significance Analysis
To further assess the robustness of the proposed AQGTO algorithm, a statistical significance analysis was conducted over independent stochastic runs. Since the results of population-based metaheuristic algorithms may not follow a normal distribution, a non-parametric Mann–Whitney U test, also known as the Wilcoxon rank-sum test, was adopted instead of a parametric t-test. The significance level was set to . The statistical test was focused on AQGTO and its base optimizer GTO. This comparison is particularly relevant because AQGTO is designed as an adaptive Q-learning-guided enhancement of the original GTO mechanism. Therefore, the test evaluates whether the proposed adaptive guidance strategy provides a statistically significant improvement over the underlying optimizer. To provide a global assessment across the considered agricultural environments, the independent runs obtained from the row-crop, orchard, and hilly scenarios were pooled for each method.
Table 5 reports the pooled Mann–Whitney U test results. The results show that AQGTO significantly outperforms GTO in terms of trajectory cost, path length, and energy-related cost, with very small p-values and large negative Cliff’s delta values. Since all these metrics are minimized, the negative Cliff’s delta values indicate that AQGTO consistently tends to produce lower values than GTO. For the smoothness metric, AQGTO obtains a lower mean value than GTO, although the difference is not statistically significant. Overall, the statistical results support the effectiveness of the adaptive Q-learning-guided mechanism introduced in AQGTO.
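For readers who wish to reproduce this type of analysis, the sketch below computes the Mann–Whitney U statistic with a two-sided p-value and Cliff's delta for two samples of run results. It is a dependency-free illustration, not the code used to produce Table 5: the p-value relies on a tie-corrected normal approximation without continuity correction, which is an assumption of this example and may differ slightly from the output of a statistics package.

```python
import math
from collections import Counter


def mann_whitney_u(x, y):
    """Return (U of x, two-sided p-value) via the normal approximation."""
    n, m = len(x), len(y)
    greater = sum(1 for a in x for b in y if a > b)
    equal = sum(1 for a in x for b in y if a == b)
    u = greater + 0.5 * equal
    total = n + m
    # Tie correction: t is the size of each group of identical values.
    tie_term = sum(t ** 3 - t for t in Counter(list(x) + list(y)).values())
    var = n * m / 12.0 * ((total + 1) - tie_term / (total * (total - 1)))
    if var <= 0.0:
        return u, 1.0  # all observations identical: no evidence of a shift
    z = (u - n * m / 2.0) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2.0))


def cliffs_delta(x, y):
    """Effect size in [-1, 1]; negative when x tends to be smaller than y."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))
```

Passing the AQGTO costs as `x` and the GTO costs as `y` gives a negative delta when AQGTO tends to produce lower costs, which matches the sign convention used in the discussion above.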
5.3. Performance Across Agricultural Scenarios
To further assess the robustness of the proposed method, AQGTO was compared with the original GTO algorithm across three representative agricultural environments: row-crop, orchard, and hilly terrain. Since AQGTO is an extension of GTO, this comparison highlights the effect of the adaptive Q-learning mechanism and the feasibility repair strategy under different environmental conditions. All stochastic results correspond to 30 independent runs.
Across the three evaluated agricultural scenarios, AQGTO consistently outperforms the original GTO algorithm in terms of overall trajectory cost, path length, and energy-related surrogate cost. The relative reduction in mean trajectory cost is approximately 15.4% in the row-crop environment, 12.9% in the orchard environment, and 14.5% in the hilly terrain environment. In addition, AQGTO produces substantially lower standard deviation values, indicating improved optimization stability and robustness across different environmental conditions.
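The relative reductions quoted above are understood here as the usual percentage decrease of the mean cost with respect to the baseline. The notation $\bar{J}$ for the mean trajectory cost over the independent runs is introduced only for this illustration:

```latex
\Delta_{\%} = 100 \times
  \frac{\bar{J}_{\mathrm{GTO}} - \bar{J}_{\mathrm{AQGTO}}}{\bar{J}_{\mathrm{GTO}}}
```

A positive value of $\Delta_{\%}$ therefore indicates that AQGTO attains a lower mean cost than GTO in the corresponding scenario.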
The detailed results for the row-crop, orchard, and hilly-terrain environments are reported in Table 6, Table 7 and Table 8, respectively.
The consistency of AQGTO across the row-crop, orchard, and hilly-terrain scenarios suggests that the adaptive search mechanism is not restricted to a single obstacle distribution. This is important because UAV trajectory planning in agricultural and three-dimensional environments is typically affected by multiple constraints, including obstacle avoidance, flight distance, altitude regulation, and trajectory smoothness [19,21]. The orchard scenario is more constrained due to denser tree-like obstacles, whereas the hilly-terrain scenario increases the importance of altitude regularity. The ability of AQGTO to maintain lower cost and reduced variability across these different settings indicates that the Q-learning mechanism contributes to more robust exploration–exploitation control under changing environmental complexity.
5.4. Trajectory Visualization
To better illustrate the qualitative differences between the evaluated algorithms, the UAV trajectories generated by A*, PSO, GWO, WOA, GTO, and the proposed AQGTO algorithm were visualized within the simulated agricultural environment.
Figure 4 presents a representative trajectory comparison in the orchard scenario. The start and goal locations are fixed for all algorithms, while the cylindrical obstacles represent agricultural structures such as trees, compact canopy regions, or field infrastructure. Each algorithm produces a collision-free trajectory after feasibility repair while attempting to minimize the objective function defined in Section 3.3.
The trajectories generated by the evaluated algorithms exhibit noticeable differences in their geometric characteristics. The A* path follows a more structured route due to its graph-based nature. PSO generates relatively smooth trajectories but does not always minimize the overall trajectory cost. GWO produces a competitive trajectory, consistent with its role as a strong swarm-intelligence baseline. WOA generates a feasible trajectory, but its path may show larger deviations and less regularity in the dense orchard setting. The original GTO often produces a longer and less efficient trajectory compared with the adaptive variant.
In contrast, the proposed AQGTO algorithm generates a shorter and more stable trajectory while preserving safe obstacle clearance. This improvement is attributed primarily to the adaptive Q-learning mechanism, which dynamically guides the search process according to the current optimization state. Overall, the trajectory visualization supports the quantitative results reported in Table 4, demonstrating the ability of AQGTO to generate efficient and feasible UAV trajectories in complex agricultural environments.
Figure 4 illustrates representative UAV trajectories generated by the evaluated algorithms in the orchard environment. The start and goal points are marked to clarify the mission direction, while the cylindrical obstacles and their safety-clearance regions illustrate the spatial constraints imposed during planning. All methods generate collision-free trajectories after feasibility repair; however, their geometric characteristics differ noticeably. The A* path follows a more structured route due to its graph-based nature, while the population-based optimizers generate more flexible continuous trajectories with different levels of path regularity and obstacle-clearance behavior. GWO produces a competitive path, WOA remains feasible but shows less regularity in the dense orchard setting, and the original GTO tends to generate a longer trajectory. Compared with these baselines, AQGTO produces a shorter and more regular path while preserving safe obstacle clearance. This behavior is consistent with the quantitative improvements in cost, path length, and stability reported in Table 4, Table 5, Table 6, Table 7 and Table 8.
5.5. Convergence Analysis
To further analyze the optimization performance of the proposed AQGTO algorithm, the convergence behavior of the stochastic optimization algorithms was examined. Convergence curves provide insight into how quickly each algorithm reduces the objective-function value during the optimization process and how stable this reduction remains across independent runs.
Figure 5 illustrates the evolution of the best trajectory cost over the optimization iterations for the stochastic optimization algorithms. The solid curves represent the mean best cost obtained over 30 independent runs, while the shaded regions indicate the variability across runs. This representation provides information not only about convergence speed, but also about the stability of each optimizer during the search process.
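Curves of this kind can be assembled from the per-iteration cost histories of the individual runs, as sketched below. The function names are hypothetical, and the band is taken here as the mean plus or minus one sample standard deviation, which is an assumption: the text only states that the shaded regions indicate variability across runs.

```python
import statistics


def best_so_far(costs):
    """Running minimum: best cost found up to and including each iteration."""
    out, best = [], float("inf")
    for c in costs:
        best = min(best, c)
        out.append(best)
    return out


def convergence_band(histories):
    """Per-iteration mean, lower, and upper curves across independent runs."""
    n_iter = len(histories[0])
    if any(len(h) != n_iter for h in histories):
        raise ValueError("all runs must record the same number of iterations")
    mean, lower, upper = [], [], []
    for t in range(n_iter):
        column = [h[t] for h in histories]
        mu = statistics.fmean(column)
        sd = statistics.stdev(column) if len(column) > 1 else 0.0
        mean.append(mu)
        lower.append(mu - sd)
        upper.append(mu + sd)
    return mean, lower, upper
```

The three returned lists can be passed to any plotting library, with `mean` drawn as the solid curve and the area between `lower` and `upper` filled as the shaded region.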
The convergence results show that the evaluated algorithms exhibit different search behaviors. PSO converges relatively quickly during the early iterations, but its improvement rate decreases in later stages, indicating possible premature stabilization. GWO shows competitive convergence behavior and maintains good solution quality, which is consistent with its strong numerical performance in Table 4. WOA also reduces the objective value during optimization, but its convergence profile presents larger variability, reflecting less stable search behavior in the considered scenario.
The original GTO exhibits slower convergence and remains less competitive than its adaptive variant. In contrast, AQGTO achieves a faster and more stable reduction in the best trajectory cost. The narrower variability band indicates that AQGTO produces more consistent optimization behavior across independent runs. This improvement can be attributed to the state-aware Q-learning mechanism, which adaptively adjusts the search behavior according to population diversity, improvement rate, and feasibility ratio. Overall, the convergence analysis supports the numerical results by showing that AQGTO improves both convergence efficiency and run-to-run robustness.
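To make the notion of a state-aware mechanism concrete, the sketch below shows one possible way to turn the three indicators named above (population diversity, improvement rate, and feasibility ratio) into a discrete Q-table state. The definitions of the indicators, the binary thresholds, and the resulting eight-state encoding are assumptions chosen for illustration and are not claimed to match the state design specified earlier in the paper.

```python
import math


def population_diversity(population):
    """Mean Euclidean distance of the individuals to the population centroid."""
    size, dim = len(population), len(population[0])
    centroid = [sum(ind[d] for ind in population) / size for d in range(dim)]
    return sum(math.dist(ind, centroid) for ind in population) / size


def improvement_rate(prev_best, curr_best):
    """Relative decrease of the best cost between two iterations."""
    if prev_best == 0:
        return 0.0
    return (prev_best - curr_best) / abs(prev_best)


def feasibility_ratio(feasible_flags):
    """Fraction of candidate trajectories that satisfy all constraints."""
    return sum(feasible_flags) / len(feasible_flags)


def encode_state(diversity, improvement, feasibility,
                 div_thr, imp_thr, feas_thr):
    """Map the three indicators to one of 8 states (hypothetical thresholds)."""
    return (4 * (diversity >= div_thr)
            + 2 * (improvement >= imp_thr)
            + (feasibility >= feas_thr))
```

Under this encoding, a state such as 6 would describe a diverse population that is still improving but contains many infeasible candidates, a situation in which a learned policy could favor operators that restore feasibility.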
5.6. Computational Time Analysis
In addition to solution quality, computational efficiency is an important factor in UAV path planning, particularly for real-time or large-scale applications. Therefore, the runtime performance of the evaluated algorithms was analyzed under identical experimental conditions. The reported results correspond to the average computational time over 30 independent runs.
The computational time results are reported in Table 9. As expected, the A* algorithm exhibits the lowest runtime due to its deterministic graph-based nature. In contrast, population-based metaheuristic algorithms require higher computational effort because they perform iterative population updates and repeated objective-function evaluations. Among the stochastic optimizers, runtime differences are mainly related to the complexity of the update rules and the additional adaptive mechanisms used during the search process.
The proposed AQGTO algorithm introduces additional computational overhead compared with the original GTO, primarily due to the integration of the adaptive Q-learning mechanism, which involves state evaluation, action selection, reward computation, and Q-table updates at each iteration. However, this additional cost is offset by improved trajectory quality, lower variability, and stronger convergence stability. These findings indicate that AQGTO provides a favorable trade-off between solution quality and computational effort for complex three-dimensional UAV path-planning problems. Runtime can be further reduced through parallel population evaluation and accelerated collision checking.
5.7. Ablation Study
To evaluate the contribution of the adaptive Q-learning mechanism, an ablation study was conducted by comparing the proposed AQGTO algorithm with the original Gorilla Troops Optimizer (GTO) under identical experimental conditions. Both algorithms share the same initialization, population size, number of iterations, and objective function. The only difference lies in the integration of the Q-learning strategy in AQGTO. All results are reported over 30 independent runs. The ablation results are presented in Table 10.
The results demonstrate the effectiveness of the proposed Q-learning mechanism. AQGTO improves over the original GTO algorithm in the evaluated metrics, including trajectory cost, path length, energy-related surrogate cost, and smoothness. In particular, AQGTO achieves a substantial reduction in trajectory cost while also producing more stable results, as indicated by the lower standard deviation values.
These improvements can be attributed to the adaptive action selection strategy introduced by Q-learning, which dynamically balances exploration and exploitation during the optimization process. As a result, AQGTO avoids premature convergence and improves search efficiency compared with the standard GTO algorithm.
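Because both algorithms share the same parameter settings and differ only in the Q-learning component, this indicates that the performance gain is not merely due to parameter tuning but is directly related to the integration of the reinforcement learning mechanism.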
5.8. Sensitivity Analysis of Objective-Function Weights
To examine the influence of the objective-function weights on the behavior of the proposed method, an additional sensitivity analysis was conducted using several representative weighting configurations. The purpose of this analysis is not to identify a universal set of weights, since UAV mission priorities may vary depending on the agricultural task, but rather to verify whether the proposed AQGTO remains stable under reasonable changes in the relative importance of the objective terms.
Four configurations were considered: the default configuration used in the main experiments, a safety-oriented configuration that increases the obstacle penalty weight, an energy-oriented configuration that increases the energy-related term, and a smoothness-oriented configuration that increases the smoothness penalty. These objective-function weight configurations are summarized in Table 11. All configurations were evaluated under the same simulation conditions and over 30 independent runs.
The sensitivity-analysis results are reported in Table 12. The results indicate that AQGTO maintains stable and collision-free behavior under all tested configurations. As expected, increasing the obstacle penalty preserves conservative obstacle avoidance, while increasing the energy-related weight slightly reduces the energy-related metric at the cost of a small increase in total weighted cost. Similarly, increasing the smoothness weight reduces the smoothness term but may lead to a slightly longer trajectory. These results confirm that the proposed method is not dependent on a single empirical weight setting and that the selected default configuration provides a balanced trade-off between trajectory efficiency, safety, and flight regularity.
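The structure of such a sensitivity study can be sketched as a set of named weight configurations applied to the same cost terms. All numeric weights below are hypothetical placeholders chosen only to respect the qualitative ordering described in the text (obstacle penalty highest, smoothness and altitude variation lowest); they are not the values of Table 11, and the weighted-sum form of the cost is an assumption of this example.

```python
# Hypothetical placeholder weights; the actual values are those of Table 11.
DEFAULT = {"length": 1.0, "energy": 1.0, "obstacle": 10.0,
           "smoothness": 0.2, "altitude": 0.2}

CONFIGS = {
    "default": DEFAULT,
    "safety": {**DEFAULT, "obstacle": 20.0},      # heavier obstacle penalty
    "energy": {**DEFAULT, "energy": 2.0},         # heavier energy-related term
    "smoothness": {**DEFAULT, "smoothness": 0.4}, # heavier smoothness penalty
}


def weighted_cost(terms, weights):
    """Assumed weighted-sum cost over the named objective terms."""
    return sum(weights[name] * terms[name] for name in weights)


def evaluate_all(terms, configs=CONFIGS):
    """Cost of one trajectory under every weight configuration."""
    return {name: weighted_cost(terms, w) for name, w in configs.items()}
```

Because the weights differ between configurations, the resulting total costs are not directly comparable across rows; comparing the individual terms (length, energy, smoothness, collisions) is more informative.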
Overall, the experimental results indicate that the main advantage of AQGTO lies in its adaptive search-control mechanism. Compared with standard population-based optimizers, the proposed method uses reinforcement learning to respond to population diversity, improvement rate, and feasibility evolution during optimization. This adaptive behavior helps explain the observed reductions in trajectory cost, path length, and run-to-run variability. At the same time, the runtime analysis shows that these gains are obtained at the cost of additional computation, which motivates future work on parallel implementation and real-time deployment.
5.9. Scope and Limitations
The results presented in this study should be interpreted within the scope of offline trajectory optimization in simulated agricultural environments. The proposed AQGTO framework assumes that the main obstacle layout is known before planning and that obstacles remain static during the optimization process. This assumption is suitable for many pre-mission planning tasks, such as field monitoring, orchard inspection, and targeted spraying route generation, where trees, crop rows, irrigation structures, and terrain-related obstacles can be mapped in advance.
However, real agricultural UAV missions may involve additional uncertainties that are not explicitly considered in the current simulation framework. These include moving obstacles such as workers, animals, or agricultural vehicles; wind disturbances that affect UAV motion and energy-related surrogate cost; sensor noise in obstacle detection and localization; and communication delays in edge/cloud-assisted deployment. These factors may require online replanning or closed-loop trajectory correction during mission execution.
Another limitation concerns the geometric representation of the environment. In the current experiments, obstacles are modeled as cylinders and the flight region is represented as a bounded rectangular three-dimensional space. Although this abstraction is widely used in UAV path-planning simulations and is suitable for modeling trees, poles, and compact agricultural structures, it does not fully represent irregular crop canopies, non-convex obstacles, dense vegetation clusters, or terrain surfaces reconstructed from real sensor data. Future work will therefore consider richer environmental models, including clustered cylindrical obstacles, ellipsoidal canopy models, polygonal and non-convex obstacles, occupancy-grid maps, and point-cloud-based representations obtained from UAV imagery or LiDAR sensing.
The energy-related term used in the objective function is another modeling simplification. It penalizes travel distance and vertical displacement but does not represent a complete physical UAV energy-consumption model. Real UAV energy depends on platform-specific parameters such as mass, propulsion system, payload, velocity profile, acceleration, wind speed and direction, and climb/descent power requirements. Future work will incorporate physics-based or data-driven UAV energy models to improve the realism of mission-level trajectory optimization.
Taken together, these limitations mean that the current contribution should be viewed as a global optimization layer for generating high-quality reference trajectories under known environmental conditions. Future extensions of AQGTO should integrate dynamic obstacle prediction, wind-aware cost modeling, uncertainty-aware safety margins, and real-time replanning mechanisms to support deployment in changing agricultural environments.
6. Conclusions
In this paper, an Adaptive Q-Learning Guided Gorilla Troops Optimizer (AQGTO) was proposed for three-dimensional UAV path planning in precision agriculture. By integrating reinforcement learning into the optimization process, the proposed method dynamically adjusts the search strategy to improve the balance between exploration and exploitation. A multi-objective cost formulation was introduced to account for path length, energy-related surrogate cost, obstacle avoidance, trajectory smoothness, and altitude variation. In addition, a feasibility repair mechanism was employed to promote safe, collision-free trajectories in complex environments. Experiments across multiple simulated agricultural scenarios show that AQGTO achieves competitive or improved performance relative to classical A*, PSO, GWO, WOA, and the original GTO algorithm in terms of trajectory cost, path efficiency, and optimization stability. The ablation study further indicates that the Q-learning mechanism contributes to this improved optimization behavior.
Although the proposed method introduces additional computational overhead, it provides a favorable trade-off between solution quality, trajectory safety, and optimization stability. Because the present study focuses on offline trajectory optimization in known simulated agricultural environments, several extensions are required before deployment in fully operational agricultural UAV missions.
Future work will first focus on reducing computational complexity through parallel population evaluation and accelerated collision-checking procedures. Second, AQGTO will be extended to dynamic agricultural environments by incorporating moving obstacles, online replanning, wind-aware trajectory optimization, and uncertainty-aware safety margins. Third, richer environmental representations will be considered, including irregular and non-convex obstacles, clustered vegetation, occupancy-grid maps, and point-cloud-based models derived from UAV imagery or LiDAR sensing. Fourth, edge/cloud-assisted deployment will be investigated to support computation offloading and near-real-time trajectory updates under communication constraints. Finally, real-flight experiments and multi-UAV cooperative path planning will be conducted to validate the proposed framework under practical precision-agriculture conditions.