Peer-Review Record

AQGTO: Adaptive Q-Learning-Guided Gorilla Troops Optimizer for 3D UAV Path Planning in Precision Agriculture

Drones 2026, 10(5), 357; https://doi.org/10.3390/drones10050357

by Tahar Bendouma¹,*, Saida Sarra Boudouh¹, Chaker Abdelaziz Kerrache¹ and Jorge Herrera-Tapia²,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Adrian MOLNAR-IRIMIE

Reviewer 4: Anonymous

Submission received: 25 March 2026 / Revised: 29 April 2026 / Accepted: 4 May 2026 / Published: 8 May 2026

(This article belongs to the Special Issue The Role of UAVs in Modern Agriculture: Precision Spraying and Crop Health Analysis)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This manuscript addresses the problem of 3D UAV path planning in precision agriculture and proposes an improved framework that combines Q-learning with the Gorilla Troops Optimizer. Overall, the topic is relevant, the methodological framework is reasonably complete, and the experimental design is generally clear. The reported results suggest that the proposed method offers certain advantages in terms of trajectory cost, path quality, and stability. The manuscript is well organized, but several minor issues still need attention, particularly with regard to the sharpening of the novelty statement, the justification of parameter settings, the statistical support of the results, and the informativeness of the figures and tables. Therefore, I would recommend the manuscript for publication after minor revision. The detailed review comments are as follows:

1. It is suggested that the novelty of the manuscript be clarified more explicitly at the end of the Introduction. Although the basic idea of the proposed AQGTO method is presented, its specific improvement over similar existing studies is still not sufficiently highlighted. A brief statement at the end of the Introduction would help make the contribution of the paper clearer.

2. The parameter section would benefit from a brief justification. Tables 2 and 3 are useful, but at present most values appear to be simply assigned, especially the Q-learning parameters and the objective-function weights. A full sensitivity study may not be necessary for this paper, but the authors should at least explain whether these values were chosen from prior literature, preliminary tuning, or scenario-specific considerations. This is particularly important for the objective weights and penalty terms, which directly affect the reported performance.

3. It is good that the results are reported as mean ± standard deviation. However, the comparative analysis would be more convincing if a simple statistical significance test were added for the main metrics, such as trajectory cost or path length. Since the manuscript emphasizes the stability and superiority of AQGTO across repeated runs, even a basic t-test or a non-parametric alternative would help support that claim more rigorously.

4. Figures 4 and 5 could be made slightly more informative. At present, they serve mainly as qualitative illustrations, which is acceptable, but the captions and associated discussion remain somewhat brief. I would encourage the authors to enhance the visualization by more clearly marking the start/goal points and the obstacle clearance region, or, particularly for the convergence plot, by showing variability bands across repeated runs. This would make the figures more informative rather than simply confirmatory.

5. The conclusion is acceptable, but it could be slightly tighter. I suggest reducing some repetitive summary and stating the current scope and limitations a bit more explicitly.

Author Response

It is suggested that the novelty of the manuscript be clarified more explicitly at the end of the Introduction. Although the basic idea of the proposed AQGTO method is presented, its specific improvement over similar existing studies is still not sufficiently highlighted. A brief statement at the end of the Introduction would help make the contribution of the paper clearer.

Thank you for this constructive comment. We agree that the novelty of the proposed AQGTO method should be stated more explicitly. In the revised manuscript, we strengthened the end of the Introduction by adding a clearer novelty statement and revising the contribution list.

The revised text now clarifies that the main novelty is not merely the use of Q-learning or GTO separately, but the integration of a state-aware Q-learning mechanism into the GTO search process for 3D UAV path planning. Unlike standard metaheuristic variants that rely mainly on predefined exploration and exploitation rules, AQGTO observes the optimization state through population diversity, improvement rate, and feasibility ratio, and uses this information to adaptively select among exploration, exploitation, and diversification actions. This clarification better distinguishes the proposed method from existing PSO-, GWO-, WOA-, SSA-, and GTO-based UAV path-planning approaches.

Changes made in the manuscript: The Introduction was revised to include an explicit novelty statement clarifying the state-aware integration of Q-learning into GTO and distinguishing AQGTO from existing operator-level metaheuristic improvements (page 4, lines 120–129). In addition, the contribution list was revised to emphasize the state-aware reinforcement learning mechanism, adaptive selection of exploration/exploitation/diversification actions, and the multi-objective trajectory-quality formulation (page 4, lines 130–146).
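The state-aware selection described in this response can be illustrated with a minimal tabular Q-learning sketch. It is not the authors' implementation: the three-bin discretization, the assumption that every indicator is scaled to [0, 1], the action names, and the default values of `alpha` and `gamma` are all hypothetical choices made for illustration, and the reward is left to the caller.

```python
import random

# Hypothetical action labels for the three search behaviors named in the response.
ACTIONS = ("explore", "exploit", "diversify")

def discretize(diversity, improvement_rate, feasibility_ratio, bins=3):
    """Map three indicators (assumed to lie in [0, 1]) to a tuple of bin indices."""
    def to_bin(value):
        value = min(max(value, 0.0), 1.0)          # clamp to [0, 1]
        return min(int(value * bins), bins - 1)    # top edge falls in the last bin
    return (to_bin(diversity), to_bin(improvement_rate), to_bin(feasibility_ratio))

def select_action(q_table, state, epsilon, rng=random):
    """Epsilon-greedy choice among the three search actions."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = q_table.get(state, {})
    return max(ACTIONS, key=lambda a: values.get(a, 0.0))

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update; alpha and gamma are placeholder values."""
    row = q_table.setdefault(state, {a: 0.0 for a in ACTIONS})
    next_row = q_table.get(next_state, {})
    best_next = max(next_row.get(a, 0.0) for a in ACTIONS)
    row[action] += alpha * (reward + gamma * best_next - row[action])
    return row[action]
```

In an optimizer loop, one would typically compute the three indicators from the current population, call `select_action`, run the matching update operator, and then call `q_update` with a reward derived from the change in best cost.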

The parameter section would benefit from a brief justification. Tables 2 and 3 are useful, but at present most values appear to be simply assigned, especially the Q-learning parameters and the objective-function weights. A full sensitivity study may not be necessary for this paper, but the authors should at least explain whether these values were chosen from prior literature, preliminary tuning, or scenario-specific considerations. This is particularly important for the objective weights and penalty terms, which directly affect the reported performance.

Thank you for this useful comment. We agree that the parameter settings, especially the Q-learning parameters and objective-function weights, should be better justified. In the revised manuscript, we added an explanatory paragraph after the algorithm and objective-parameter tables. The revised text clarifies that the population size, maximum number of iterations, and PSO parameters were selected based on commonly used settings in swarm-optimization studies and preliminary tuning to ensure comparable computational budgets across algorithms. The Q-learning parameters were chosen to provide stable learning while maintaining moderate exploration during the optimization process.

We also clarified the rationale behind the objective-function weights. Obstacle avoidance and trajectory safety were given the highest priority, followed by path length and energy-related surrogate cost, while smoothness and altitude variation were used as regularization terms to avoid abrupt directional and vertical changes. In addition, we added a weight-sensitivity analysis to verify that AQGTO remains stable under representative weight configurations. The results show that the proposed method maintains collision-free performance and stable trajectory quality across different objective-weight settings.

Changes made in the manuscript: The algorithm-parameter section was revised to justify the common computational budget, population size, maximum number of iterations, PSO parameters, and Q-learning parameters, including the learning rate, discount factor, and exploration-rate schedule (page 19, lines 710–719). The objective-function weight settings were also justified according to agricultural UAV mission priorities, including obstacle avoidance, path length, energy-related effort, smoothness, altitude variation, and penalty parameters (pages 19–20, lines 725–737). In addition, a new sensitivity analysis subsection was added to evaluate AQGTO under default, safety-oriented, energy-oriented, and smoothness-oriented weight configurations (pages 27–28, lines 970–989).
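A weight-sensitivity check of the kind described here amounts to re-scoring the same trajectory under several weight configurations. The sketch below assumes a plain weighted sum of five objective terms; the term names and every numeric weight are illustrative placeholders that only mirror the stated priority order (safety first, then length and energy surrogate, then the regularizers) and are not the values from the manuscript's tables.

```python
def trajectory_cost(terms, weights):
    """Weighted sum of objective terms; both dicts must share the same keys."""
    if set(terms) != set(weights):
        raise ValueError("terms and weights must use the same keys")
    return sum(weights[key] * terms[key] for key in terms)

# Placeholder configurations (hypothetical numbers, NOT the manuscript's settings).
CONFIGS = {
    "default": {
        "obstacle": 0.40, "length": 0.25, "energy": 0.20,
        "smoothness": 0.10, "altitude": 0.05,
    },
    "safety_oriented": {
        "obstacle": 0.60, "length": 0.15, "energy": 0.15,
        "smoothness": 0.05, "altitude": 0.05,
    },
}

def sensitivity(terms, configs=CONFIGS):
    """Score one set of normalized objective terms under each configuration."""
    return {name: trajectory_cost(terms, w) for name, w in configs.items()}
```

Comparing the resulting scores (or, in a full study, re-running the optimizer per configuration) shows how strongly the ranking of candidate trajectories depends on the chosen weights.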

It is good that the results are reported as mean ± standard deviation. However, the comparative analysis would be more convincing if a simple statistical significance test were added for the main metrics, such as trajectory cost or path length. Since the manuscript emphasizes the stability and superiority of AQGTO across repeated runs, even a basic t-test or a non-parametric alternative would help support that claim more rigorously.

Thank you for this valuable comment. We agree that reporting only average performance values is not sufficient to fully demonstrate the robustness of the proposed method. Therefore, we added a statistical significance analysis to the revised manuscript.

Since the performance results of stochastic population-based algorithms may not follow a normal distribution, we used a non-parametric Mann–Whitney U test, also known as the Wilcoxon rank-sum test, rather than a parametric t-test. The significance level was set to p < 0.05.

The statistical analysis was focused on the comparison between AQGTO and GTO, because GTO is the base optimizer from which AQGTO is derived. This allows us to directly evaluate whether the proposed adaptive Q-learning-guided strategy provides a statistically significant improvement over the underlying optimizer. To obtain a more global assessment, the independent runs from the row-crop, orchard, and hilly scenarios were pooled for each method.

The results show that AQGTO significantly outperforms GTO in terms of trajectory cost, path length, and energy-related cost, with very small p-values and large effect sizes. The smoothness metric also shows a lower mean value for AQGTO, although the difference is not statistically significant. These results provide statistical evidence supporting the effectiveness of the proposed adaptive guidance mechanism.

Changes made in the manuscript: A new subsection entitled “Statistical Significance Analysis” was added to the Results and Discussion section to describe the use of the non-parametric Mann–Whitney U test and Cliff’s delta effect size for comparing AQGTO with its base optimizer GTO (page 22, lines 825–837). A new statistical table was also included to report the pooled AQGTO–GTO comparison across the row-crop, orchard, and hilly scenarios, followed by an interpretation of the obtained p-values and effect sizes (pages 22–23, lines 838–845).
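For readers unfamiliar with the two statistics named above, the following standard-library sketch computes the Mann–Whitney U statistic, a two-sided p-value, and Cliff's delta for two samples of per-run costs. It is a simplified illustration, not the authors' analysis script: the p-value uses the normal approximation without tie or continuity correction, which is an assumption of this sketch, and a dedicated statistics library would normally be preferred in practice.

```python
import math

def mann_whitney_cliff(x, y):
    """Return (U, two-sided p, Cliff's delta) for samples x and y.

    U counts pairs with x > y, with ties counted as 0.5. The p-value is a
    normal approximation WITHOUT tie or continuity correction (simplification).
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    ties = n * m - greater - less
    u = greater + 0.5 * ties
    # Cliff's delta: P(x > y) - P(x < y), in [-1, 1].
    delta = (greater - less) / (n * m)
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p, delta
```

When `x` holds costs of the proposed method and `y` those of the baseline, a negative delta means the proposed method tends to produce lower (better) costs; delta is related to U by delta = 2U/(nm) − 1.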

Figures 4 and 5 could be made slightly more informative. At present, they serve mainly as qualitative illustrations, which is acceptable, but the captions and associated discussion remain somewhat brief. I would encourage the authors to enhance the visualization by more clearly marking the start/goal points and the obstacle clearance region, or, particularly for the convergence plot, by showing variability bands across repeated runs. This would make the figures more informative rather than simply confirmatory.

Thank you for this helpful suggestion. We agree that Figures 4 and 5 should be more informative and better connected to the quantitative analysis. In the revised manuscript, we improved the captions and the accompanying discussion of both figures to clarify their interpretation.

For Figure 4, we revised the caption and discussion to explicitly describe the start and goal points, the cylindrical obstacles, and the obstacle-avoidance behavior of the compared trajectories. This makes the qualitative trajectory comparison easier to interpret and better connected to the numerical results reported in the tables.

For Figure 5, we revised the caption and discussion to clarify that the convergence curves represent the evolution of the best trajectory cost during optimization and to better explain the observed differences in convergence behavior among the evaluated algorithms. The revised discussion now emphasizes convergence speed, stability, and the role of the adaptive Q-learning mechanism in reducing fluctuations and improving search behavior.

These changes improve the readability and interpretability of the figures while preserving the original experimental presentation.

Changes made in the manuscript: The trajectory visualization subsection was revised to explicitly describe the algorithms shown in Figure 4, including A*, PSO, GWO, WOA, GTO, and AQGTO, and to explain the start/goal locations, cylindrical obstacles, obstacle-clearance behavior, and qualitative differences among the generated trajectories (page 24, lines 870–891). The caption and discussion of Figure 4 were also expanded to describe the start and goal markers, safety-clearance regions, and the comparative trajectory behavior of the evaluated algorithms (pages 24–25, lines 892–904). The convergence analysis subsection was revised to explain the mean convergence curves and run-to-run variability across 30 independent runs (page 25, lines 905–915). In addition, the caption and discussion of Figure 5 were revised to include PSO, GWO, WOA, GTO, and AQGTO, and to interpret convergence speed, variability, and robustness across independent runs (pages 25–26, lines 916–930).
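The mean convergence curve and variability band requested by the reviewer can be derived from per-run histories as sketched below. This is an illustrative helper, not code from the manuscript: it assumes each run is stored as an equal-length list of best-so-far costs and uses a band of one sample standard deviation around the mean, which is only one of several reasonable choices (quantile bands are another).

```python
import statistics

def convergence_band(runs):
    """Per-iteration mean and +/- one sample standard deviation across runs.

    runs[r][t] is assumed to be the best cost found by run r up to iteration t.
    Returns three lists: (means, lower, upper).
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate variability")
    length = len(runs[0])
    if any(len(run) != length for run in runs):
        raise ValueError("all runs must have the same number of iterations")
    means, lower, upper = [], [], []
    for t in range(length):
        column = [run[t] for run in runs]
        mu = statistics.mean(column)
        sd = statistics.stdev(column)   # sample standard deviation (n - 1)
        means.append(mu)
        lower.append(mu - sd)
        upper.append(mu + sd)
    return means, lower, upper
```

The three returned lists can be passed to any plotting tool, with the mean drawn as a line and the region between `lower` and `upper` shaded.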

The conclusion is acceptable, but it could be slightly tighter. I suggest reducing some repetitive summary and stating the current scope and limitations a bit more explicitly.

Thank you for this helpful suggestion. We agree that the Conclusion should be more concise and should state the scope and limitations more explicitly. In the revised manuscript, we rewrote the Conclusion to reduce repetitive summary statements and to better distinguish between the achieved contributions, the current scope of the study, and future extensions.

The revised Conclusion now clearly states that the present work focuses on offline trajectory optimization in known simulated agricultural environments. It also acknowledges the main limitations, including the simplified obstacle representation, the use of an energy-related surrogate cost rather than a full physical UAV energy model, and the absence of dynamic obstacles, wind disturbances, communication delays, and real-flight validation. Finally, the future-work paragraph was expanded to include concrete directions such as online replanning, wind-aware optimization, irregular obstacle modeling, edge/cloud-assisted deployment, multi-UAV coordination, and real-world UAV experiments.

Changes made in the manuscript: A dedicated “Scope and Limitations” subsection was added to explicitly clarify the current scope of the study as offline trajectory optimization in known simulated agricultural environments and to discuss the main limitations related to static obstacles, wind disturbances, communication delays, simplified cylindrical obstacles, and the use of an energy-related surrogate cost rather than a complete physical UAV energy model (pages 28–29, lines 998–1033). The Conclusion was also revised to provide a more concise summary of the proposed AQGTO framework, its main experimental findings, its computational overhead, and concrete future-work directions including online replanning, wind-aware optimization, irregular obstacle modeling, edge/cloud-assisted deployment, real-flight experiments, and multi-UAV cooperative path planning (page 29, lines 1034–1063).

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Author,

This paper proposes an Adaptive Q-Learning Guided Gorilla Troops Optimizer (AQGTO) for 3D UAV path planning. However, the manuscript requires revisions in several critical areas:

Insufficient data processing and analysis:

The manuscript contains few figures and many tables; consider converting some tables into figures.
The formulas in lines 371 through 393 are not labeled.

Insufficient depth in the discussion and conclusions:

The “5. Results and Discussion” section clearly presents the research findings, but the discussion is limited in scope and lacks supporting references, thereby undermining its credibility.

Author Response

The manuscript contains few figures and many tables; consider converting some tables into figures.

Thank you for this helpful suggestion. We agree that the balance between tables and figures is important for readability. After careful revision, we decided to retain the main numerical tables because they report detailed quantitative results, including mean and standard deviation values over 30 independent runs. These values are important for reproducibility, robustness assessment, and fair comparison among the evaluated algorithms.

Changes made in the manuscript: The baseline-comparison discussion was expanded to make the main trends easier to interpret, including the relative performance of A*, PSO, GWO, WOA, GTO, and AQGTO, and the role of adaptive search behavior in improving trajectory quality and robustness (pages 21–22, lines 785–824). The trajectory-visualization subsection and Figure 4 discussion were revised to better explain the start/goal points, cylindrical obstacles, safety-clearance regions, and qualitative differences among the evaluated trajectories (pages 24–25, lines 870–904). The convergence-analysis subsection and Figure 5 discussion were also revised to explain the mean convergence curves, run-to-run variability, convergence speed, and robustness of the compared stochastic algorithms over 30 independent runs (pages 25–26, lines 905–930).

The formulas in lines 371 through 393 are not labeled.

Thank you for pointing this out. In the revised manuscript, we replaced the unnumbered display equations with numbered equation environments and added appropriate labels.

Changes made in the manuscript: The Q-learning state-space formulation was changed to a numbered and labeled equation and explicitly referenced in the text (page 12, lines 480–485). The reward function and the Q-learning update rule were also converted into numbered equations, with the Q-update rule explicitly referring to the reward equation (pages 12–13, lines 493–507). In addition, Algorithm 1 was revised to refer to the labeled Q-learning update equation instead of repeating an unlabeled display formula (page 16, lines 517–520).

The “5. Results and Discussion” section clearly presents the research findings, but the discussion is limited in scope and lacks supporting references, thereby undermining its credibility.

Thank you for this constructive comment. We agree that the original Results and Discussion section mainly presented the obtained numerical results and did not sufficiently connect them to the broader literature. In the revised manuscript, we expanded the discussion by adding supporting references and by interpreting the observed performance trends in relation to known limitations of population-based optimizers, such as premature convergence, insufficient population diversity, sensitivity to predefined update rules, and instability in high-dimensional constrained UAV path-planning problems.

Specifically, we revised the discussion following the baseline comparison, scenario comparison, and convergence analysis. The revised text now relates the improved performance of AQGTO to recent studies on adaptive and improved GWO, WOA, and SSA variants for UAV trajectory planning. These studies show that adaptive mechanisms, nonlinear convergence factors, population-diversity preservation, chaotic initialization, Lévy-flight mechanisms, and reinforcement learning can improve convergence behavior and reduce local-optimum stagnation. This broader discussion strengthens the interpretation of the results and better situates the proposed AQGTO method within current UAV path-planning literature.

Changes made in the manuscript: The Results and Discussion section was expanded after the baseline comparison to provide a deeper interpretation of the relative performance of A*, PSO, GWO, WOA, GTO, and AQGTO, including the role of adaptive search behavior and optimization stability (pages 21–22, lines 785–824). Additional supporting references were added to connect the observed performance trends to recent GWO-, WOA-, and SSA-based UAV path-planning studies, particularly regarding premature convergence, population diversity, nonlinear convergence control, reverse learning, chaotic initialization, Lévy flight, and local-optimum stagnation (page 22, lines 805–815). The scenario-based discussion was also expanded to interpret AQGTO robustness across row-crop, orchard, and hilly-terrain environments (pages 23–24, lines 853–868). Finally, the convergence-analysis discussion was revised to provide a more detailed interpretation of convergence speed, variability, and run-to-run robustness across the evaluated stochastic algorithms (pages 25–26, lines 916–930).

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The main objective of this paper is to propose an innovative optimization framework that integrates a Q-learning reinforcement mechanism into the Gorilla Troops Optimizer to enhance autonomous navigation for UAVs in complex, 3D simulated agricultural environments (row-crop, orchard, hilly terrain). The main contribution includes a multi-objective cost function accounting for path length, energy, obstacle proximity, smoothness, and altitude variation, alongside a feasibility repair mechanism for cylindrical obstacle avoidance. The authors correctly identify that agricultural landscapes are characterized by a set of constraints, including non-deterministic obstacles, terrain irregularities, and varying crop heights.

However, a significant issue is that the manuscript fails to address the perception-decision-execution framework, where path planning is integrated with real-time remote sensing and multispectral data acquisition. The manuscript treats the agricultural environment as a static space, which overlooks the presence of dynamic obstacles or changing wind conditions.

Another fundamental problem is the lack of an in-depth environment description (e.g., row crop), where details matter: what kind of crop, the technology employed on that field, crop-technology-specific requirements, crop characteristics, what the UAV is supposed to do, and what happens if the pre-planned scenarios change during field operations. The statement in lines 518-519 is very general and does not address a real (simulated) event. Also, the altitude range (Table 1) is rather generous for regular UAV field operations. How were the obstacle sizes chosen, and what happens if obstacles are smaller or if they are moving?

An important issue is that energy consumption (line 277), or energy cost (line 279), is defined as the sum of path length and changes in altitude, which does not amount to energy consumption (this usually depends on UAV speed, payload, and wind speed and direction; please see the energy models proposed in the literature). Please either adopt such a model or use a more appropriate term.

Author Response

A significant issue is that the manuscript fails to address the perception-decision-execution framework, where path planning is integrated with real-time remote sensing and multispectral data acquisition. The manuscript treats the agricultural environment as a static space, which overlooks the presence of dynamic obstacles or changing wind conditions.

Thank you for this insightful comment. We agree that practical agricultural UAV missions require a complete perception–decision–execution pipeline, where environmental information is acquired through onboard sensors or remote sensing, processed into a map or obstacle representation, used by a planning module to generate or update trajectories, and then executed by the UAV control system.

The original manuscript focused mainly on the decision/planning component and did not sufficiently explain how the proposed AQGTO framework fits within such a broader autonomy pipeline. To address this issue, we revised the manuscript by adding a new paragraph describing the role of AQGTO as an offline/global trajectory optimization layer within a perception–decision–execution architecture. We clarified that the current study assumes a known or pre-mapped agricultural environment, while real-time perception, multispectral data interpretation, online map updating, dynamic obstacle prediction, and low-level flight control are outside the current scope.

We also added these aspects as concrete future research directions. In particular, future work will investigate coupling AQGTO with perception-driven map updates from UAV imagery, multispectral sensing, LiDAR, or occupancy grids, as well as online replanning mechanisms for dynamic obstacles and wind-aware trajectory correction. This revision clarifies the methodological scope of the paper while better connecting the proposed planner to realistic precision-agriculture UAV deployments.

Changes made in the manuscript: The Introduction was revised to explicitly describe the role of AQGTO within a broader perception–decision–execution UAV autonomy pipeline, including perception modules, environment representation, decision-level trajectory planning, and trajectory execution through the flight controller (page 3, lines 103–114). The Experimental Setup was also revised to clarify that the simulated obstacle maps can be interpreted as the output of an upstream perception or mapping stage using UAV imagery, multispectral data, LiDAR point clouds, or geographic field maps (page 15, lines 597–604). In addition, the Scope and Limitations section was expanded to clarify that the present work focuses on offline trajectory optimization in known simulated agricultural environments and does not explicitly model dynamic obstacles, wind disturbances, sensor noise, or communication delays; these aspects are identified as future extensions requiring online replanning and closed-loop trajectory correction (pages 28–29, lines 998–1033).

Another fundamental problem is the lack of an in-depth environment description (e.g., row crop), where details matter: what kind of crop, the technology employed on that field, crop-technology-specific requirements, crop characteristics, what the UAV is supposed to do, and what happens if the pre-planned scenarios change during field operations. The statement in lines 518-519 is very general and does not address a real (simulated) event. Also, the altitude range (Table 1) is rather generous for regular UAV field operations. How were the obstacle sizes chosen, and what happens if obstacles are smaller or if they are moving?

Thank you for this detailed and constructive comment. We agree that the original simulation-environment description was too general and did not provide enough information about the agricultural context, the intended UAV mission, and the rationale behind the selected obstacle and altitude settings.

In the revised manuscript, we expanded the description of the three simulated agricultural scenarios. The row-crop scenario is now described as a field-monitoring or targeted-spraying mission over organized crop rows, where the UAV must follow safe corridors between vegetation structures and irrigation elements. The orchard scenario is described as a tree-plantation inspection mission, where cylindrical obstacles approximate tree trunks and canopy regions. The hilly-terrain scenario is described as a monitoring mission over uneven agricultural land, where altitude variation must be controlled to preserve safe clearance and trajectory regularity.

We also clarified that the obstacle dimensions and altitude limits are representative simulation parameters selected to create controlled planning difficulty rather than exact values from a specific crop field. The altitude range was intentionally kept broad to test the optimizer’s behavior in a 3D search space, while the altitude-variation term and feasibility constraints discourage unnecessary vertical motion. Finally, we added a scope and limitations discussion explaining that if field conditions change during operation, the current offline trajectory should be updated through an online replanning layer, which is identified as future work.

Changes made in the manuscript: The Simulation Environment section was revised to clarify that the three agricultural scenarios are representative controlled path-planning environments rather than exact digital twins of a specific farm (page 17, lines 633–640). The row-crop, orchard, and hilly-terrain scenarios were expanded to describe the intended UAV missions, obstacle distributions, and planning challenges, including field monitoring, targeted spraying, orchard inspection, canopy monitoring, disease inspection, localized treatment, and terrain-aware agricultural surveying (page 17, lines 657–681). The rationale for the selected obstacle dimensions and altitude limits was also added after Table 1, explaining that the cylindrical obstacles and altitude range were selected to create representative 3D planning challenges and should be adapted in practical deployments according to the UAV platform, crop type, regulations, sensor requirements, and mission objective (pages 18–19, lines 690–699). Finally, the Scope and Limitations section was expanded to explain that changes during field operations, such as moving obstacles, wind disturbances, sensor noise, or communication delays, require online replanning or closed-loop trajectory correction (pages 28–29, lines 998–1033).

An important issue is that energy consumption (line 277), or energy cost (line 279), is defined as the sum of path length and changes in altitude, which does not amount to energy consumption (this usually depends on UAV speed, payload, and wind speed and direction; please see the energy models proposed in the literature). Please either adopt such a model or use a more appropriate term.

Thank you for this important and technically relevant comment. We agree that the term “energy consumption” in the original manuscript could be misleading, because the formulation used in this study does not represent a complete physical UAV energy model. Real UAV energy consumption depends on several factors, including UAV mass, propulsion system characteristics, flight speed, acceleration, payload, wind speed and direction, and aerodynamic effects. The term used in the original manuscript was therefore too strong.

To address this issue, we revised the manuscript by replacing “energy consumption” with “energy-related cost” or “energy surrogate cost” where appropriate. We now explicitly state that the proposed term is a simplified surrogate used in the trajectory optimization objective to penalize long paths and excessive vertical motion, rather than a direct estimate of battery energy consumption. The formulation was kept as a lightweight trajectory-quality term to preserve computational efficiency and to allow fair comparison among the evaluated optimization algorithms under identical conditions.

In addition, we added a limitation statement explaining that a more realistic energy model should be considered in future work. Such a model may incorporate UAV mass, velocity profile, payload, propulsion characteristics, wind disturbances, and climb/descent power requirements. This revision improves the scientific accuracy of the manuscript and avoids overclaiming the physical interpretation of the proposed cost term.

Changes made in the manuscript: The terminology was revised throughout the manuscript by replacing the overstrong expression “energy consumption” with “energy-related surrogate cost” where appropriate. This terminology was updated in the Abstract (page 1, lines 25–27), the problem formulation and trajectory-cost description (page 8, lines 322–325), the Objective Function subsection (pages 9–10, lines 361–395), and the Evaluation Metrics subsection (page 20, lines 749–755). In addition, the Scope and Limitations section was expanded to explicitly state that the energy-related term penalizes travel distance and vertical displacement but does not represent a complete physical UAV energy-consumption model, and that future work will incorporate physics-based or data-driven UAV energy models (page 28, lines 1021–1028).
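To make the distinction between a surrogate and a physical energy model concrete, the sketch below scores a waypoint path by its Euclidean length plus its accumulated vertical displacement, which is how the reviewer characterizes the term. The exact formula in the manuscript is not reproduced in this record, so the additive form and the `climb_weight` parameter are assumptions introduced here for illustration.

```python
import math

def energy_surrogate(waypoints, climb_weight=1.0):
    """Path length plus weighted total vertical displacement.

    `waypoints` is a sequence of (x, y, z) tuples. `climb_weight` is a
    hypothetical knob, not a parameter named in the manuscript. The result
    is a geometric proxy only: it ignores speed, payload, and wind.
    """
    length = 0.0
    vertical = 0.0
    for p, q in zip(waypoints, waypoints[1:]):
        length += math.dist(p, q)        # 3D Euclidean segment length
        vertical += abs(q[2] - p[2])     # climb and descent both penalized
    return length + climb_weight * vertical
```

For example, a path that travels 5 units horizontally and then climbs 2 units scores 5 + 2 for length plus 2 for vertical motion, giving 9 with the default weight.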

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

Summary:

This paper addresses the multi-objective optimization and complex environment obstacle avoidance problems in 3D UAV path planning for precision agriculture scenarios. It proposes an Adaptive Q-Learning–Guided Gorilla Troops Optimizer (AQGTO), which improves the safety, energy efficiency, and smoothness of path planning by dynamically balancing exploration and exploitation, designing a multi-objective cost function, and implementing a feasibility repair mechanism. The research topic aligns well with the engineering requirements of UAV autonomous operation and smart agriculture and has certain theoretical innovation and engineering application value. However, the manuscript has deficiencies in the definition of methodological innovation, theoretical rigor, completeness of experimental comparisons, and consistency of formulas and expressions. The rationality and robustness of some core designs are not sufficiently verified.

1. The weights of the multi-objective cost function are all set empirically, and no weight sensitivity analysis or parameter tuning justification is provided.

2. The baseline algorithms only include A*, PSO, and the original GTO. Comparisons with state-of-the-art hybrid optimization algorithms (e.g., QL-PSO, GWO, WOA, GTO) and deep reinforcement learning-based path planning methods (e.g., DQN, PPO) are absent, which weakens the persuasiveness of the claimed performance advantages.

3. In the Introduction, besides PSO, it is suggested to review more recent algorithms such as WOA, the Sparrow Search Algorithm, and GWO. The authors can refer to, for example, "three-dimensional UAV trajectory planning based on improved sparrow search algorithm" and "three-dimensional unmanned aerial vehicle trajectory planning based on the improved whale optimization algorithm".

4. In the problem formulation in Section 3.1, the constraints are too abstract. It is suggested to add content to make them more concrete, especially for the “Feasible trajectory constraint”.

5. The experiments are only validated in simulated static environments, and realistic constraints common in agricultural scenarios, such as dynamic obstacles, wind disturbances, and communication delays, are not considered.

6. Obstacles are uniformly modeled as cylinders and the terrain is represented as a regular rectangle. However, real agricultural environments contain non-convex and irregular obstacles. It is unclear whether the experimental model will be expanded to improve its adaptability to real-world scenarios.

7. The manuscript lacks specific directions for future work; only parallelization and multi-UAV collaboration are mentioned in general terms. It is suggested to supplement concrete research directions such as dynamic environments, edge-cloud deployment, and real-flight experiments.

Author Response

The weights of the multi-objective cost function are all set empirically, and no weight sensitivity analysis or parameter tuning justification is provided.

Thank you for this important comment. We agree that the weighting coefficients of the multi-objective cost function should be better justified, since they directly influence the optimization behavior and the interpretation of the results. In the revised manuscript, we have added a justification paragraph after the objective-function parameter table to explain the rationale behind the selected weights. The weights were chosen to reflect the mission priorities in agricultural UAV path planning: obstacle avoidance and safety were assigned the highest priority, followed by path length and energy-related effort, while smoothness and altitude variation were used as secondary regularization terms to improve trajectory feasibility and flight stability.

In addition, to verify that the proposed AQGTO is not overly dependent on a single empirical weight setting, we added a weight-sensitivity analysis in the Results and Discussion section. The analysis evaluates AQGTO under several representative weighting configurations, including balanced, safety-oriented, energy-oriented, and smoothness-oriented settings. The results show that AQGTO maintains stable collision-free performance and competitive trajectory quality across these configurations, while the expected trade-offs between path length, energy-related effort, and smoothness are observed. This revision strengthens the robustness and credibility of the proposed objective formulation.

Changes made in the manuscript: The objective-function parameter section was revised to explain the rationale behind the selected weights and penalty parameters according to agricultural UAV mission priorities, including obstacle avoidance, path length, energy-related effort, smoothness, altitude variation, and infeasibility penalties (pages 19–20, lines 725–737). In addition, a new sensitivity-analysis subsection was added to evaluate AQGTO under default, safety-oriented, energy-oriented, and smoothness-oriented objective-weight configurations (page 27, lines 970–981). The corresponding sensitivity-analysis tables and discussion were added to show that AQGTO maintains stable and collision-free behavior under all tested configurations and that the selected default setting provides a balanced trade-off between trajectory efficiency, safety, and flight regularity (pages 27–28, lines 982–989).
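As a rough illustration of how a weighted objective and a set of named weight configurations fit together, consider the sketch below. All numeric weights, the `Weights` class, and the function names are hypothetical placeholders, since this record does not report the actual values. A full sensitivity analysis would rerun the optimizer under each configuration; the sketch only rescores one fixed set of cost terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    # Field names mirror the cost terms named in the response.
    safety: float
    length: float
    energy: float
    smoothness: float
    altitude: float


# Hypothetical configurations; the numbers are placeholders, not the paper's.
CONFIGS = {
    "default": Weights(5.0, 1.0, 1.0, 0.5, 0.5),
    "safety_oriented": Weights(10.0, 1.0, 1.0, 0.5, 0.5),
    "energy_oriented": Weights(5.0, 1.0, 2.0, 0.5, 0.5),
    "smoothness_oriented": Weights(5.0, 1.0, 1.0, 2.0, 0.5),
}


def total_cost(terms, weights):
    """Weighted sum of cost terms; `terms` keys must match Weights fields."""
    return sum(getattr(weights, name) * value for name, value in terms.items())


def sensitivity_table(terms, configs):
    """Score one fixed set of cost terms under every weight configuration."""
    return {name: total_cost(terms, w) for name, w in configs.items()}
```

For example, `sensitivity_table({"safety": 0.0, "length": 10.0, "energy": 4.0, "smoothness": 2.0, "altitude": 1.0}, CONFIGS)` returns one score per configuration, which makes the effect of each weight change directly comparable.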

The baseline algorithms only include A*, PSO, and the original GTO. Comparisons with state-of-the-art hybrid optimization algorithms (e.g., QL-PSO, GWO, WOA, GTO) and deep reinforcement learning-based path planning methods (e.g., DQN, PPO) are absent, which weakens the persuasiveness of the claimed performance advantages.

Thank you for this valuable comment. We agree that the experimental comparison should be strengthened by including additional recent and representative optimization baselines. In the revised manuscript, we have extended the experimental evaluation by adding two widely used swarm-intelligence and bio-inspired optimizers, namely the Grey Wolf Optimizer (GWO) and the Whale Optimization Algorithm (WOA). These algorithms were selected because they can be implemented under the same continuous waypoint-based path representation, objective function, population size, maximum number of iterations, and number of independent runs used for PSO, GTO, and AQGTO. This enables a fair trajectory-level comparison among population-based optimizers.

The comparison table and the corresponding discussion have been revised accordingly. The new results show that AQGTO maintains the best overall trajectory cost and competitive path quality compared with A*, PSO, GWO, WOA, and GTO. The additional baselines also confirm that the performance gain is not limited to comparison with the original GTO alone.

Regarding deep reinforcement learning methods such as DQN and PPO, we agree that they are relevant to UAV path planning. However, they require a different sequential decision-making formulation, involving discrete or continuous action spaces, training episodes, reward shaping, and policy generalization over many environment instances. A direct comparison with trajectory-level metaheuristic optimizers under the same optimization budget would therefore require a separate experimental protocol. To avoid an unfair or superficial comparison, we have clarified this point in the revised manuscript and added deep reinforcement learning-based planning under dynamic agricultural environments as a concrete direction for future work.

Changes made in the manuscript: The experimental setup was revised to include GWO and WOA as additional stochastic population-based baselines under the same waypoint encoding, objective function, population size, maximum number of iterations, and number of independent runs (page 19, lines 702–719). The baseline-comparison subsection and Table 4 were updated to compare A*, PSO, GWO, WOA, GTO, and AQGTO, followed by an expanded discussion of the relative performance, stability, and trade-offs among the evaluated methods (pages 21–22, lines 772–804). The Related Work section was also expanded to discuss recent GWO-, WOA-, and SSA-based UAV path-planning studies (pages 5–6, lines 205–218). In addition, the manuscript now explains why direct comparison with DQN/PPO would require a different sequential decision-making protocol and identifies deep reinforcement learning-based planning as future work (page 22, lines 816–824). The trajectory and convergence analyses were also updated to include GWO and WOA in the visual comparisons (pages 24–26, lines 870–930).

In the Introduction, besides PSO, it is suggested to review more recent algorithms such as WOA, the Sparrow Search Algorithm, and GWO. The authors can refer to, for example, "three-dimensional UAV trajectory planning based on improved sparrow search algorithm" and "three-dimensional unmanned aerial vehicle trajectory planning based on the improved whale optimization algorithm".

Thank you for this helpful suggestion. We agree that the Introduction and Related Work should better reflect recent developments in metaheuristic UAV path planning beyond PSO and GTO. In the revised manuscript, we have expanded the related work by discussing recent studies on Grey Wolf Optimizer (GWO), Whale Optimization Algorithm (WOA), and Sparrow Search Algorithm (SSA) for UAV path planning and three-dimensional trajectory optimization.

In particular, we added discussion of recent GWO-based UAV path planning work, including Q-learning-based GWO variants that address premature convergence and insufficient adaptive learning. We also reviewed recent improved WOA approaches for 3D UAV trajectory planning, where reverse learning and nonlinear convergence factors are used to improve population diversity and convergence behavior. In addition, we included recent SSA-based UAV path planning studies, including improved SSA variants that use sine–cosine strategies, Lévy flight, chaotic mapping, and hybrid disturbance mechanisms to improve global exploration, convergence speed, and avoidance of local optima.

This revision provides a broader and more up-to-date background for the proposed AQGTO method and clarifies how our contribution differs from existing metaheuristic approaches. While recent improved GWO, WOA, and SSA methods enhance specific search operators or initialization strategies, the proposed AQGTO introduces a Q-learning-guided adaptive strategy-selection mechanism into GTO, allowing the optimizer to dynamically select exploration, exploitation, or diversification actions according to the current optimization state.

Changes made in the manuscript: The Introduction was expanded to include recent GWO-, WOA-, and SSA-based UAV trajectory-planning studies, including Q-learning-based GWO, improved WOA, and improved SSA variants using mechanisms such as reverse learning, nonlinear convergence factors, sine–cosine strategies, Lévy flight, chaotic mapping, and hybrid disturbance mechanisms (page 3, lines 70–82). The Related Work section was also expanded to provide a more detailed discussion of recent GWO, WOA, and SSA variants for UAV path planning and three-dimensional trajectory optimization, and to clarify how the proposed AQGTO differs from these methods by introducing a Q-learning-guided adaptive strategy-selection mechanism into GTO (pages 5–6, lines 205–218).
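To make the idea of Q-learning-guided strategy selection concrete, the following is a minimal tabular sketch using the standard epsilon-greedy policy and the standard Q-learning update. It is not the authors' implementation: the class name `StrategySelector`, the state indexing, the reward signal, and the hyperparameter defaults are all assumptions, since this record does not specify how AQGTO defines them.

```python
import random

# The three action types named in the response.
ACTIONS = ("exploration", "exploitation", "diversification")


class StrategySelector:
    """Tabular Q-learning over a small discrete set of optimization states."""

    def __init__(self, n_states, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        # alpha, gamma, epsilon are assumed defaults, not the paper's values.
        self.q = [[0.0] * len(ACTIONS) for _ in range(n_states)]
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self, state):
        """Epsilon-greedy choice; returns an index into ACTIONS."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        row = self.q[state]
        return row.index(max(row))

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update toward the bootstrapped target."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In an optimizer loop, one plausible arrangement is to call `select` once per iteration, apply the corresponding search operator to the population, and then call `update` with a reward derived from the change in best cost.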

In the problem formulation in Section 3.1, the constraints are too abstract. It is suggested to add content to make them more concrete, especially for the “Feasible trajectory constraint”.

Thank you for this valuable comment. We agree that the original formulation of the constraints in Section 3.1 was too general, particularly the feasible trajectory constraint. In the revised manuscript, we have expanded the problem formulation by providing more concrete mathematical constraints. Specifically, we now explicitly define the boundary constraint, altitude constraint, obstacle avoidance constraint, safety clearance condition, maximum segment-length constraint, maximum altitude-change constraint, and maximum turning-angle constraint.

The feasible trajectory constraint has been clarified as a set of geometric and kinematic feasibility conditions imposed on consecutive waypoints. These conditions ensure that the generated path does not contain unrealistic jumps, excessive vertical changes, or abrupt heading variations that would be difficult for a UAV to execute. This revision makes the path-planning problem formulation more precise and improves the consistency between the mathematical model, the feasibility repair mechanism, and the experimental setup.

Changes made in the manuscript: Section 3.1 was revised to include explicit mathematical constraints for the UAV path-planning problem. The revised formulation now defines the boundary constraint (page 7, lines 289–292), obstacle-avoidance and safety-clearance constraints for cylindrical obstacles (pages 7–8, lines 292–303), the altitude constraint (page 8, lines 303–307), and concrete feasible-trajectory constraints based on maximum segment length, maximum altitude difference, and maximum turning angle (page 8, lines 307–321). These additions make the problem formulation more precise and align it with the feasibility repair mechanism and experimental setup.
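The constraint families listed above can be illustrated with a simple waypoint-level feasibility check. This is a sketch under stated assumptions rather than the manuscript's formulation: the function names, the `(cx, cy, radius, height)` cylinder encoding, and all thresholds are hypothetical, and only waypoints (not the segments between them) are tested against obstacles.

```python
import math


def turning_angle(p0, p1, p2):
    """Angle in radians between segments p0->p1 and p1->p2."""
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p1, p2)]
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0.0 or nv == 0.0:
        return 0.0  # degenerate segment: treat as no turn
    cos_t = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, cos_t)))


def is_feasible(path, bounds, z_range, cylinders, clearance,
                max_seg, max_dz, max_turn):
    """Check boundary, altitude, clearance, and kinematic limits on waypoints."""
    (xmin, xmax), (ymin, ymax) = bounds
    zmin, zmax = z_range
    for x, y, z in path:
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            return False  # boundary constraint
        if not (zmin <= z <= zmax):
            return False  # altitude constraint
        for cx, cy, radius, height in cylinders:
            # Assumed cylinder model: vertical axis, base on the ground.
            if z <= height and math.hypot(x - cx, y - cy) < radius + clearance:
                return False  # obstacle or safety-clearance violation
    for p, q in zip(path, path[1:]):
        if math.dist(p, q) > max_seg or abs(q[2] - p[2]) > max_dz:
            return False  # segment-length or altitude-change limit
    for p0, p1, p2 in zip(path, path[1:], path[2:]):
        if turning_angle(p0, p1, p2) > max_turn:
            return False  # turning-angle limit
    return True
```

A repair mechanism could use the same predicates to decide which waypoints to adjust, although how AQGTO performs the repair is not described in this record.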

The experiments are only validated in simulated static environments, and realistic constraints common in agricultural scenarios, such as dynamic obstacles, wind disturbances, and communication delays, are not considered.

Thank you for this important observation. We agree that dynamic obstacles, wind disturbances, and communication delays are highly relevant in real agricultural UAV deployments. The present study focuses on global/offline 3D trajectory optimization in simulated agricultural environments where the obstacle map is assumed to be known during planning. This assumption allows the proposed AQGTO framework to be evaluated under controlled conditions and compared fairly with deterministic and population-based optimization baselines using the same objective function and environmental constraints.

To avoid overgeneralization, we have clarified this scope in the revised manuscript. We also added a dedicated limitations paragraph explaining that the current experimental setup considers static obstacles and does not yet model moving obstacles, wind fields, sensor uncertainty, or communication latency. In addition, we expanded the future-work discussion to explicitly include online replanning, dynamic obstacle avoidance, wind-aware trajectory optimization, edge/cloud deployment, and real-flight validation. These additions better define the applicability of the current contribution and provide a concrete path for extending AQGTO toward real-time agricultural UAV operations.

Changes made in the manuscript: The Experimental Setup section was revised to clarify that the present study focuses on offline/global three-dimensional trajectory optimization in known agricultural environments, where the obstacle map and environmental boundaries are assumed to be available during planning (page 15, lines 584–596). A system-level clarification was also added to explain that the simulated obstacle maps may be interpreted as outputs of upstream perception or mapping stages based on UAV imagery, multispectral data, LiDAR point clouds, or geographic field maps (page 15, lines 597–604). In addition, the Scope and Limitations section was expanded to explicitly state that the current experiments assume static obstacles and do not explicitly model moving obstacles, wind disturbances, sensor noise, or communication delays; these factors are identified as future extensions requiring online replanning, uncertainty-aware safety margins, wind-aware cost modeling, and closed-loop trajectory correction (pages 28–29, lines 998–1033).

Obstacles are uniformly modeled as cylinders and the terrain is represented as a regular rectangle. However, real agricultural environments contain non-convex and irregular obstacles. It is unclear whether the experimental model will be expanded to improve its adaptability to real-world scenarios.

Thank you for this important comment. We agree that real agricultural environments may contain irregular, non-convex, and heterogeneous obstacles, and that modeling all obstacles as cylinders is a simplification. In the revised manuscript, we clarified that the cylindrical obstacle model is used as a first-level geometric abstraction for common agricultural structures such as trees, orchard canopies, irrigation elements, poles, and small field buildings. This abstraction is commonly used in UAV trajectory-planning simulations because it provides a controlled and computationally efficient way to evaluate obstacle avoidance and trajectory optimization.

To address the reviewer’s concern, we have revised the simulation-environment description and added a limitations paragraph explaining that the current model does not fully capture irregular crop canopies, non-convex structures, or highly heterogeneous terrain. We also added a concrete future-work direction indicating that the framework will be extended to more realistic environmental representations, including clustered cylinders, ellipsoidal canopies, polygonal/non-convex obstacles, occupancy-grid maps, and point-cloud-based obstacle models derived from remote sensing or LiDAR data. Since the proposed AQGTO operates on waypoint coordinates and evaluates candidate paths through a cost function and feasibility checking module, its optimization framework can be adapted to more complex obstacle geometries by replacing the collision-checking and penalty modules without changing the main Q-learning-guided search mechanism.

Changes made in the manuscript: The Simulation Environment section was revised to clarify that the cylindrical obstacle model is used as a geometric abstraction of common agricultural structures such as trees, canopies, irrigation equipment, poles, storage elements, and small field buildings, while acknowledging that real environments may contain irregular, non-convex, and heterogeneous obstacle shapes (page 17, lines 646–655). The Feasibility Repair Mechanism section was expanded to explain that the proposed AQGTO framework is not restricted to cylindrical obstacles and can be adapted to polygonal regions, ellipsoidal canopies, occupancy grids, or point-cloud-based maps by replacing the collision-detection and obstacle-penalty functions (page 14, lines 550–557). The Scope and Limitations section was also expanded to discuss the limitations of cylindrical and rectangular abstractions and to identify richer environmental models as future work, including clustered cylindrical obstacles, ellipsoidal canopy models, polygonal/non-convex obstacles, occupancy-grid maps, and point-cloud-based representations from UAV imagery or LiDAR sensing (page 28, lines 1012–1021). Finally, the Conclusion was revised to include future work on irregular and non-convex obstacles, clustered vegetation, occupancy-grid maps, and point-cloud-based models (page 29, lines 1052–1062).
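The claim that obstacle geometry can be swapped without touching the search mechanism can be illustrated by treating collision detection as an interchangeable predicate. The sketch below is a hypothetical design, not the paper's code: `cylinder_checker`, `occupancy_grid_checker`, `obstacle_penalty`, and the penalty value are all illustrative names and numbers.

```python
import math
from typing import Callable, Iterable, Sequence, Set, Tuple

Point = Tuple[float, float, float]
CollisionCheck = Callable[[Point], bool]


def cylinder_checker(cx: float, cy: float, radius: float, height: float,
                     clearance: float = 0.0) -> CollisionCheck:
    """Vertical cylinder standing on the ground (assumed obstacle model)."""
    def collides(p: Point) -> bool:
        x, y, z = p
        return z <= height and math.hypot(x - cx, y - cy) < radius + clearance
    return collides


def occupancy_grid_checker(occupied: Set[Tuple[int, int, int]],
                           cell: float) -> CollisionCheck:
    """Voxel grid: a point collides if its cell index is marked occupied."""
    def collides(p: Point) -> bool:
        return tuple(math.floor(c / cell) for c in p) in occupied
    return collides


def obstacle_penalty(path: Sequence[Point],
                     checkers: Iterable[CollisionCheck],
                     penalty: float = 1000.0) -> float:
    """Fixed penalty per colliding waypoint, independent of obstacle type."""
    checkers = list(checkers)
    hits = sum(1 for p in path if any(check(p) for check in checkers))
    return penalty * hits
```

Because `obstacle_penalty` only sees callables, a polygonal or point-cloud checker with the same signature could be added without changing the cost function or the optimizer that calls it.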

The manuscript lacks specific directions for future work; only parallelization and multi-UAV collaboration are mentioned in general terms. It is suggested to supplement concrete research directions such as dynamic environments, edge-cloud deployment, and real-flight experiments.

Thank you for this helpful suggestion. We agree that the future-work directions in the original manuscript were too general. In the revised manuscript, we have expanded the Conclusion and the newly added Scope and Limitations subsection to provide more concrete future research directions. These now include dynamic agricultural environments, online replanning, wind-aware trajectory optimization, irregular and non-convex obstacle modeling, edge/cloud-assisted deployment, real-flight experiments, and multi-UAV cooperative path planning.

These additions clarify how the proposed AQGTO framework can be extended beyond offline simulated trajectory optimization. In particular, we now discuss the need to integrate perception-driven environment updates, dynamic obstacle prediction, communication-aware computation offloading, and real UAV validation to assess the practical applicability of the proposed method in operational precision-agriculture missions.

Changes made in the manuscript: The Scope and Limitations section was expanded to identify concrete future extensions required for deployment beyond offline simulated trajectory optimization, including dynamic obstacle prediction, wind-aware cost modeling, uncertainty-aware safety margins, real-time replanning, and closed-loop trajectory correction (pages 28–29, lines 998–1033). The Conclusion was also revised to provide specific future-work directions, including parallel population evaluation, accelerated collision checking, dynamic agricultural environments, online replanning, wind-aware trajectory optimization, irregular and non-convex obstacle modeling, edge/cloud-assisted deployment, real-flight experiments, and multi-UAV cooperative path planning (page 29, lines 1052–1062).
