Review Reports
- Yijun Zhang*,
- Zhiming Li and
- Ku Du
Reviewer 1: Lujun Wan
Reviewer 2: Hongjun Yu
Reviewer 3: Jinyu Fu
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Improvement suggestions
- Theoretical deepening
Supplement the convergence analysis of the adjacent network (e.g., proof of the relationship between embedded space distance and real path cost).
Compare the advantages and disadvantages of different dynamic final goal selection strategies (e.g., maximum connected area vs. nearest area).
- Experimental enhancement
Comparative experiments: Compare against advanced HRL methods such as HIRO [17] and HAC [8] to highlight the performance advantages.
Test continuous state space (such as Gazebo simulation) and dynamic obstacle scenarios.
- Method details are supplemented
Clarify the training data collection strategy of adjacent networks (random exploration or expert trajectory?).
Analyze the quantitative relationship between the number of landmarks (Figure 10) and map size (e.g., landmark density = f (map size)).
- Project applicability
Discuss computing bottlenecks (such as the feasibility of deploying adjacency networks on embedded devices).
It is recommended to add energy consumption indicators (such as total path power vs coverage rate).
Author Response
Comment 1: Theoretical deepening. Supplement the convergence analysis of the adjacent network (e.g., proof of the relationship between embedded space distance and real path cost). Compare the advantages and disadvantages of different dynamic final goal selection strategies (e.g., maximum connected area vs. nearest area).
Response: We thank the reviewer for these valuable suggestions to enhance the theoretical depth of our paper.
Action Taken: Convergence Analysis: While a formal proof of convergence for the Adjacency Network is beyond the scope of this paper, we have enhanced Section 3.1 with a more detailed discussion on its theoretical underpinnings, framing it within the established context of metric learning.
Goal Selection Strategies: We have directly addressed this point. In Section 4.2, we have added a dedicated paragraph to explicitly discuss the trade-offs between different dynamic final goal selection strategies.
Comment 2: Experimental enhancement. Comparative experiments: Compare against advanced HRL methods such as HIRO and HAC to highlight the performance advantages. Test continuous state space (such as Gazebo simulation) and dynamic obstacle scenarios.
Response: We appreciate this suggestion to benchmark our method against these state-of-the-art HRL algorithms. A fair and rigorous comparison with methods like HIRO and HAC would require careful re-implementation and hyperparameter tuning within our specific problem domain, which is a substantial undertaking not feasible within the revision period. Explanation: Our primary contribution lies in adapting existing concepts (like adjacency constraints and landmarks) to the unique, non-goal-oriented domain of CPP, particularly through our dynamic goal generation module. We believe our current comparison against a strong HRL baseline and an ablation (ACHMP) effectively isolates and validates this core contribution.
Action Taken: We have acknowledged the need for comparison with HIRO and HAC, as well as testing in continuous spaces, as critical future work in our Conclusion (Section 6).
Comment 3: Method details are supplemented. Clarify the training data collection strategy of adjacent networks (random exploration or expert trajectory?). Analyze the quantitative relationship between the number of landmarks (Figure 10) and map size (e.g., landmark density = f (map size)).
Response: Thank you for these requests for clarification.
Action Taken: In Section 3.1, we now explicitly state that the Adjacency Network is trained on online agent trajectories collected during interaction with the environment.
In Section 5.4 (Ablation Studies), we analyze the impact of the number of landmarks (Figure 11) and discuss the qualitative relationship between landmark quantity and performance.
Comment 4: Project applicability. Discuss computing bottlenecks (such as the feasibility of deploying adjacency networks on embedded devices). It is recommended to add energy consumption indicators (such as total path power vs coverage rate).
Response: These are important practical considerations.
Action Taken: We have added the "Steps to Reach Milestone" metric in Table 1 and Table 2, which directly serves as a proxy for both energy consumption and mission time. The new learning curves in Figure 9 further visualize this efficiency.
We briefly discuss the computational efficiency of our method in Section 3.2, noting that the primary additional load comes from a small, lightweight Adjacency Network.
Reviewer 2 Report
Comments and Suggestions for Authors
The authors proposed a study that convincingly identifies “subgoal space explosion” in HRL for CPP and motivates the need for both local feasibility constraints and global guidance. The connection between multi-scale observations and hierarchical structure is well articulated. Some comments are below:
1. All experiments are in simulated grid worlds; no results in continuous or more realistic UAV dynamics.
2. Coverage rate is the only reported quantitative metric; path efficiency (length), energy consumption, or time-to-coverage would make results more comprehensive.
3. There is no evaluation of robustness to sensor noise, map inaccuracies, or dynamic obstacles, which are common in real-world CPP.
4. The abstract is clear but could mention the environments used for evaluation to set expectations.
5. Figures would benefit from larger labels and legends for clarity.
6. Some references in the introduction are inserted mid-sentence in a way that interrupts reading flow. Consider grouping them more coherently.
Author Response
Comment 1: All experiments are in simulated grid worlds; no results in continuous or more realistic UAV dynamics.
Response: We thank the reviewer for this important suggestion. We fully agree that validation in continuous, high-fidelity environments is a crucial next step for demonstrating the practical applicability of our method. Given the significant engineering effort required to migrate and re-validate the entire framework, this was not feasible within the revision period. However, we have explicitly acknowledged this as a key direction for future work.
Action Taken: In the Conclusion (Section 6), we have added a point explicitly stating our plan to extend and validate our HRGL framework in continuous environments like Gazebo.
Comment 2: Coverage rate is the only reported quantitative metric; path efficiency (length), energy consumption, or time-to-coverage would make results more comprehensive.
Response: This is an excellent point. To provide a more comprehensive and multi-faceted evaluation, we have introduced several new quantitative metrics that serve as proxies for path efficiency, time-to-coverage, and energy consumption.
Action Taken: We have created two new tables (Table 1 and Table 2) in the new Section 5.3, which now report not only the Final Coverage Rate but also the Arrival Rate at a specific coverage milestone and the Steps to Reach that milestone.
We have also added a new learning curve figure (Figure 9) that plots the "Average Steps to Reach Milestone" throughout training, directly illustrating how quickly path efficiency improves. We believe this new, in-depth analysis directly addresses the reviewer's concerns.
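The three metrics named above can be illustrated with a minimal sketch. The report does not give the paper's exact definitions, so the following is an assumption: each episode is a list of per-step coverage fractions, the Arrival Rate is the share of episodes that reach the milestone, and the Steps to Reach figure is averaged only over episodes that reach it. The function names and the input format are hypothetical.

```python
from typing import List, Optional, Sequence


def steps_to_milestone(coverage: Sequence[float], milestone: float) -> Optional[int]:
    """Return the 1-based step at which coverage first reaches the milestone, or None."""
    for step, value in enumerate(coverage, start=1):
        if value >= milestone:
            return step
    return None


def summarize(episodes: List[Sequence[float]], milestone: float) -> dict:
    """Aggregate the three metrics over a batch of episodes (assumed definitions)."""
    final_rates = [ep[-1] for ep in episodes]
    # Keep only the episodes that actually reached the milestone.
    hits = [
        s for s in (steps_to_milestone(ep, milestone) for ep in episodes)
        if s is not None
    ]
    return {
        "final_coverage_rate": sum(final_rates) / len(episodes),
        "arrival_rate": len(hits) / len(episodes),
        # Assumption: average only over episodes that reached the milestone.
        "avg_steps_to_milestone": sum(hits) / len(hits) if hits else None,
    }


if __name__ == "__main__":
    demo = [[0.2, 0.5, 0.9], [0.1, 0.3, 0.4]]
    print(summarize(demo, milestone=0.8))
```

Under these assumptions, an episode that never reaches the milestone lowers the Arrival Rate but does not distort the step average, which is one reason to report the two numbers side by side.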
Comment 3: There is no evaluation of robustness to sensor noise, map inaccuracies, or dynamic obstacles, which are common in real-world CPP.
Response: We appreciate the reviewer's suggestion regarding robustness. While a full-scale analysis against all these factors is extensive, we have addressed the spirit of this comment by evaluating our method's generalization capabilities in environments of varying complexity.
Action Taken: In our new Section 5.3, we evaluate our method on two maps of significantly different scales and complexities (the 18x18 Urban Area and the much larger 50x50 Mahan Area). The strong performance of our method on the more challenging map (as shown in Table 2 and Figure 9b) serves as a demonstration of its robust generalization to more difficult scenarios.
Comment 4: The abstract is clear but could mention the environments used for evaluation to set expectations.
Response: We agree. The abstract has been revised to include this information.
Action Taken: We have updated the Abstract to state that "Extensive experiments on a variety of simulated grid-world maps demonstrate that HRGL significantly outperforms baseline methods..."
Comment 5: Figures would benefit from larger labels and legends for clarity.
Response: We apologize for the clarity issues in the original figures.
Action Taken: We have revised Figures 3, 6, 8, 10, and 11, which had readability issues, to ensure that all labels, legends, and axes are clear and legible.
Comment 6: Some references in the introduction are inserted mid-sentence in a way that interrupts reading flow. Consider grouping them more coherently.
Response: Thank you for this stylistic advice.
Action Taken: We have thoroughly revised the Introduction section, repositioning citations to the end of sentences or clauses to improve the overall reading flow.
Reviewer 3 Report
Comments and Suggestions for Authors
The framework provides a low-resolution global map for the high-level policy's strategic planning and a high-resolution local map for the low-level policy's execution. The current version is good, but some content still needs to be modified. The details are as follows:
(1) In the introduction section, the author should enhance the discussion of relevant literature in the article. Please consider further refining the wording to enhance objectivity. The author should consider including references to previous articles published in journals about path planning and Multi-agent systems technology over the past three years. This will provide additional context and support for the research presented in the current article.
[1] Z. Yang, J. Fu, Y. Sun and Y. Li, "A Pseudo-Trajectory Homotopy Method for UMVs Information Collection IoT System With an Underwater Communication Constraint," in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2025.3596338.
[2] Meilin Li, Kai Zhang, Yang Liu, Fazhi Song, Tieshan Li, Prescribed-time consensus of nonlinear multi-agent systems by dynamic event-triggered and self-triggered protocol, IEEE Transactions on Automation Science and Engineering, Vol. 22, pp. 16768-16779, 2025.
(2) There are many non-standard aspects in the formula, and it is recommended to optimize and adjust them.
(3) Add appropriate analysis and quantitative description to the experimental results.
For the reasons mentioned above, I recommend that this article undergo a "minor revision."
Author Response
Comment 1: In the introduction section, the author should enhance the discussion of relevant literature in the article... The author should consider including references to previous articles published in journals about path planning and Multi-agent systems technology over the past three years.
Response: We are very grateful to the reviewer for this suggestion and for providing specific references.
Action Taken: We have completely restructured and significantly enhanced the Introduction. It now includes a much deeper discussion of the relevant literature, including the two papers recommended by the reviewer ([29, 30]). We now clearly position our work by first discussing the state-of-the-art in adjacency constraints and landmark guidance for goal-reaching tasks, and then highlighting the unique challenges of CPP (e.g., no fixed goal) that our work is the first to address with our dynamic goal generation module.
Comment 2: There are many non-standard aspects in the formula, and it is recommended to optimize and adjust them.
Response: We apologize for the inconsistencies in our original formulas.
Action Taken: We have meticulously reviewed and corrected all formulas in the manuscript to ensure consistent and standard notation. The corrected formulas are highlighted in blue.
Comment 3: Add appropriate analysis and quantitative description to the experimental results.
Response: We fully agree and have made this a central part of our revision.
Action Taken: We have created an entirely new subsection, "5.3 Final Performance Evaluation," dedicated to this. This section includes two new tables with multiple quantitative metrics (Table 1 and Table 2) and a new, two-panel learning curve figure (Figure 9). A detailed, paragraph-by-paragraph analysis is provided, directly linking the quantitative data in the tables to the visual trends in the learning curves to build a consistent and compelling narrative.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I consent to the publication of this article.
Author Response
Comments 1: I consent to the publication of this article.
Response 1: We sincerely thank the reviewer for their positive evaluation and consent to the publication of our manuscript. We greatly appreciate your recognition of the significance of our work.
Reviewer 3 Report
Comments and Suggestions for Authors
The author has made detailed changes to the relevant technology and is satisfied, however, there are still some grammar errors and reference errors in this article. The author information in references [29] and [30] is all incorrect. I suspect that the author used AI tools to complete the references, but the information provided by these tools is unreliable. Therefore, I suggest that the author should truly review the original text and verify the author's information. Otherwise, this paper cannot be accepted.
Author Response
Comments 2: The author has made detailed changes to the relevant technology and is satisfied, however, there are still some grammar errors and reference errors in this article. The author information in references [29] and [30] is all incorrect. I suspect that the author used AI tools to complete the references, but the information provided by these tools is unreliable. Therefore, I suggest that the author should truly review the original text and verify the author's information. Otherwise, this paper cannot be accepted.
Response 2: We appreciate the reviewer’s attention to the errors in the author information for these references. After thoroughly re-examining the original publications, we have corrected the grammatical errors and the author names and citation details accordingly. The revised references are now fully consistent with the original texts. Following the reviewer’s advice, we carefully re-verified all references against their original sources to ensure that author names, publication details, and formatting strictly comply with the journal’s requirements. This process confirmed the accuracy of the remaining references.
We sincerely apologize for the oversight in the initial submission and assure the reviewer that no AI-generated reference information remains unverified in the revised manuscript. We are committed to ensuring that all citations are accurate and reliable.
25. Huang Zhiao, Liu Fangchen, Su Hao, Mapping state space using landmarks for universal goal reaching, Advances in Neural Information Processing Systems, volume 32, pages 1942–1952, 2019.
26. Zhao Shuen, Leng Yao, Zhao Maojie, Wang Kan, Zeng Jie, Erratum: A Novel Dynamic Lane Changing Trajectory Planning for Autonomous Vehicles Based on Improved APF and RRT Algorithm, International Journal of Automotive Technology, 2024.
27. Ni Jianjun, Gu Yu, Gu Yang, Zhao Yonghao and Shi Pengfei, UAV coverage path planning with limited battery energy based on improved deep double Q-network, International Journal of Control, Automation and Systems, volume 22, number 8, pages 2591-2601, 2024.
28. Kim Kyungseo, Park Junwoo and Kim Jinwhan, Optimized Area Partitioning for Cooperative Underwater Search Using Multiple Autonomous Underwater Vehicles, International Journal of Control, Automation and Systems, volume 23, number 2, pages 392-404, 2025.
29. Li Meilin, Zhang Kai, Liu Yang, et al., Prescribed-Time Consensus of Nonlinear Multi-Agent Systems by Dynamic Event-Triggered and Self-Triggered Protocol, IEEE Transactions on Automation Science and Engineering, volume 22, pages 16768-16779, 2025.
30. Yang Ziao, Fu Jinyu, Sun Yushan and Li Ye, A Pseudo-Trajectory Homotopy Method for UMVs Information Collection IoT System With an Underwater Communication Constraint, IEEE Internet of Things Journal, doi: 10.1109/JIOT.2025.3596338, 2025.