A Macro-Control and Micro-Autonomy Pathfinding Strategy for Multi-Automated Guided Vehicles in Complex Manufacturing Scenarios
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Dear authors,
here are my comments:
- On page 4, lines 153 - 157 the possible directions of AGV movement along with rotations are defined (rotate 90° left, rotate 90° right, rotate 180° back and wait). How should the penalty values in Table 2 be understood? Why are the rotation angles defined in the range: less than or equal to 90° or greater than 90°? In my opinion penalty should be 0 - forward direction, 1 - rotate 90° right or left and 2 - rotate 180° back.
Please explain the values and ranges of rotation angles described in Table 2.
- For better quality, all figures should be inserted as vector graphics (if possible).
- What is the unit of time on figure no. 9?
Best regards
Reviewer
Author Response
Dear Reviewer,
Thank you very much for the time you spent reviewing the manuscript and for your encouraging comments on its merits.
Comments 1:
On page 4, lines 153 - 157 the possible directions of AGV movement along with rotations are defined (rotate 90° left, rotate 90° right, rotate 180° back and wait). How should the penalty values in Table 2 be understood? Why are the rotation angles defined in the range: less than or equal to 90° or greater than 90°? In my opinion penalty should be 0 - forward direction, 1 - rotate 90° right or left and 2 - rotate 180° back.
Please explain the values and ranges of rotation angles described in Table 2.
Response 1:
We agree that the paper does not go into enough detail on the rotational penalty values and that Table 2 contains some ambiguities. We have therefore rewritten this part on pages 10-11, lines 350-353 and lines 358-365.
The angle in Table 2 describes the relative orientation of the AGV to its target, and we have rewritten Table 2 to remove the ambiguities.
In this paper, the rotational penalty p(n) predicts the number of rotate actions needed to reach the target node from each node. It corrects the heuristic function based on Manhattan distance, which can only predict the number of forward actions. In our model, a 90° rotation and a 180° rotation each count as one rotate action; details are on page 5, lines 183-187.
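As a rough illustration of the heuristic described in this response, the sketch below adds an estimated rotation count to the Manhattan distance. The counting rule, the unit-vector heading encoding, and the names rotation_penalty and heuristic are assumptions made for illustration; the revised Table 2 is not reproduced in this record, so the manuscript's actual penalty values may differ.

```python
from typing import Tuple

Cell = Tuple[int, int]  # (x, y) grid coordinates; headings are unit vectors (assumed encoding)


def sign(v: int) -> int:
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def rotation_penalty(pos: Cell, heading: Cell, target: Cell) -> int:
    """Estimate the rotate actions needed on an obstacle-free grid.

    Assumed rule: a 90-degree and a 180-degree rotation each cost one action.
    """
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    if dx == 0 and dy == 0:
        return 0
    # Headings along which a forward action reduces the Manhattan distance.
    useful = set()
    if dx != 0:
        useful.add((sign(dx), 0))
    if dy != 0:
        useful.add((0, sign(dy)))
    facing_useful = heading in useful
    if len(useful) == 1:
        # Target lies on a straight line: aligned, or one rotation (90 or 180 degrees).
        return 0 if facing_useful else 1
    # Two axes to cover: one turn between the legs, plus one if the start heading is unusable.
    return 1 if facing_useful else 2


def heuristic(pos: Cell, heading: Cell, target: Cell) -> int:
    """Manhattan distance (forward actions) plus the rotation estimate p(n)."""
    manhattan = abs(target[0] - pos[0]) + abs(target[1] - pos[1])
    return manhattan + rotation_penalty(pos, heading, target)
```

Under this assumed rule, an AGV facing directly away from a target three cells ahead on the same row gets an estimate of 4 (one rotation plus three forward actions), which reflects treating a 180° turn as a single action.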
Comments 2:
For better quality, all figures should be inserted as vector graphics (if possible).
Response 2:
We apologize for the low-resolution images in the previous version. We have redrawn all the figures to ensure their quality.
Comments 3:
What is the unit of time on figure no. 9?
Response 3:
We apologize for omitting the unit in the figure. The unit of time in Figure 9 is seconds, and we have corrected the figure.
We would like to thank you for your time and for this opportunity to improve the manuscript. We hope you will find the revised version satisfactory.
Sincerely,
The Authors
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
This paper presents a hybrid approach to the Multi-Agent Path Finding (MAPF) problem, combining centralized A* planning with a distributed method based on PER-D3QN-EDAgger. The method is well-motivated and clearly described. However, several concerns limit the impact and clarity of the contribution.
The following are the author's observations, concerns, and opinions:
Real-World Applicability - The method relies on strong assumptions, particularly regarding constant inter-agent communication, without evaluating their feasibility in real-world settings.
Deadlock Resolution and Traffic Rules - The paper emphasizes limitations of centralized methods but omits a discussion of their strengths—specifically, deadlock resolution. It is also unclear how the proposed distributed method addresses deadlocks. The implementation of traffic rules as part of distributed coordination is not discussed.
Lifelong Task Simulation and Sequencing - While the method is positioned as a solution for lifelong MAPF with task decomposition, the evaluation section lacks details on how tasks and sub-tasks are simulated. Additionally, the mechanism for temporal synchronization among agents collaborating on a shared task is not explained.
Augmented Map Representation - The proposed representation uses directional spatial nodes, but the advantages over the standard (x, y, orientation) state formulation are unclear. It is also not specified how node connectivity is established—manually or automatically.
Misc observations:
- Sensor Modality: The manuscript repeatedly refers to AGVs as using radar. This is misleading—LIDAR is the standard sensing modality in most AGV systems. Please revise accordingly.
- Line 207–209: This section appears to be an instruction or comment to the authors rather than formal paper content. It should be either removed or rewritten to maintain a consistent academic tone.
- Line 218: The authors acknowledge that distributed methods may lead to suboptimal and inefficient paths but do not explain how their proposed PER-D3QN-EDAgger approach mitigates this. A detailed justification or experimental evidence is needed.
- Lines 227–228: The text seems to introduce a list but includes no bullet points or continuation. Please verify whether content is missing or formatting was lost.
- Figure 4 (Page 8): Abbreviations such as WCS and RCS are not explained in the figure caption or accompanying text. Readers unfamiliar with these terms will struggle to interpret the figure. Define all acronyms at first use.
- Section 3.1.2 (Enhancements over A*): The section claims enhancements over A*, but the listed modifications—action space redefinition, directional spatial nodes, and a directional heuristic—do not modify the core A* algorithm. Clarify how these are integrated into the A* framework or revise the claim.
- Table 2 (Page 9): The second entry, corresponding to a 180° rotation, should logically incur the highest penalty, yet this is not reflected in the values. Please verify and explain the penalty logic used.
- Equation 10 (Page 10): The variable `done` appears in the equation but is not defined or described anywhere in the text. Clarify its role and how it is computed.
- Figure 5 (Page 11): The table embedded in this figure includes text in Chinese. All elements in figures must be translated to English for clarity and consistency with the rest of the paper.
- Observation Space (Page 11): the paper mentions a 4-layer observation space but does not specify what the value in each layer represents. Detailed descriptions of these layers are essential for reproducibility and understanding.
- Figure 6 (Page 13): The visual quality is low; the figure should be rendered using vector graphics to ensure clarity. Also, the legend order and naming should match the text for consistency.
- Figure 7 (Page 13): Similar to Figure 6, the resolution should be improved, and vectorized graphics are recommended. Use consistent terminology and notation throughout.
- Figures 8–10 (Pages 14-15): The current graphs combine multiple dimensions of data into a single figure, which is hard to interpret. Consider breaking these into individual plots, each representing two dimensions, to enhance clarity.
- Line 379: The term "stability" of results is used but not defined. Clarify whether this refers to variance in performance, robustness to perturbations, or another metric.
- Line 396: The concept of "shelf density" is introduced without definition. Specify whether this is the ratio of shelf area to total area, number of shelves per unit area, or another metric.
- Line 408: "Response time" is not defined. If this is evaluation time of the algorithms, a clear description of the algorithm implementation must be provided.
- Line 416: The acronym "IDA*" should be written as "ID A*" to match its earlier usage in the paper and avoid confusion.
- Line 423: When presenting response time results for the proposed method, clarify whether the time is measured per agent, per episode, or for the entire system.
- Task Generation in Evaluation: The paper lacks information on how the tasks used in the simulation scenarios were generated. Include a description of how tasks and sub-tasks are defined, their distribution, and how collaboration between agents is tested.
Author Response
Dear Reviewer,
Thank you very much for the time you spent reviewing the manuscript and for your encouraging comments on its merits.
Comments 1:
Real-World Applicability - The method relies on strong assumptions, particularly regarding constant inter-agent communication, without evaluating their feasibility in real-world settings.
Response 1:
We agree that our approach lacks a discussion of communication interference. We have therefore added a discussion of system robustness under communication delays or interruptions on page 9, lines 350-353.
Generally speaking, a communication interruption affecting a few AGVs does not affect the stability of system operation, but it will interfere with task completion for those offline AGVs.
Comments 2:
Deadlock Resolution and Traffic Rules - The paper emphasizes limitations of centralized methods but omits a discussion of their strengths—specifically, deadlock resolution. It is also unclear how the proposed distributed method addresses deadlocks. The implementation of traffic rules as part of distributed coordination is not discussed.
Response 2:
We agree with the comment. Although we have not introduced an explicit deadlock-avoidance mechanism, the AGVs can learn implicit coordination behaviors through communication-based training. Of course, no training process can cover all possible deadlock scenarios. Hence, in future work we will monitor AGV task execution in real time and incorporate an explicit deadlock-avoidance policy.
Comments 3:
Lifelong Task Simulation and Sequencing - While the method is positioned as a solution for lifelong MAPF with task decomposition, the evaluation section lacks details on how tasks and sub-tasks are simulated. Additionally, the mechanism for temporal synchronization among agents collaborating on a shared task is not explained.
Response 3:
We apologize for the missing description of the experimental setup; we have rewritten this part. The details of the tasks in the experiment are on page 15, lines 519-523, and page 17, lines 556-559.
The implementation of traffic rules as part of distributed coordination is described on page 9, lines 283-288 and lines 294-297.
Comments 4:
Augmented Map Representation - The proposed representation uses directional spatial nodes, but the advantages over the standard (x, y, orientation) state formulation are unclear. It is also not specified how node connectivity is established—manually or automatically.
Response 4:
The concept of directional spatial nodes was introduced to represent AGV orientation characteristics and to accommodate an AGV-referenced action space. The AGV-referenced action space unifies all basic actions to facilitate subsequent model training. Details are on page 5, lines 188-192.
Node connectivity details are on page 7, lines 237-242.
Comments 5:
Sensor Modality: The manuscript repeatedly refers to AGVs as using radar. This is misleading—LIDAR is the standard sensing modality in most AGV systems. Please revise accordingly.
Response 5:
Thank you for the correction; we have fixed these mistakes.
Comments 6:
Line 207–209: This section appears to be an instruction or comment to the authors rather than formal paper content. It should be either removed or rewritten to maintain a consistent academic tone.
Response 6:
We apologize for an oversight in the proofreading of the paper; the first paragraph of Section 3 was not the formal paper content but a draft, which we have now removed.
Comments 7:
Line 218: The authors acknowledge that distributed methods may lead to suboptimal and inefficient paths but do not explain how their proposed PER-D3QN-EDAgger approach mitigates this. A detailed justification or experimental evidence is needed.
Response 7:
To address the issue of elevated path costs, this work makes use of centralized pre-planning in three ways: (1) incorporating the A* pre-planned path into each agent's observation space; (2) introducing extra A*-based rewards; (3) using the A* pre-planned path as expert demonstrations during learning.
Together, these measures complement one another and guide AGVs, under conflict-free conditions, to follow the high-quality routes produced by centralized planning as closely as possible. Details are on page 13, Figure 5; page 13, lines 430-433; and page 14, Table 3.
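The second measure, an extra A*-based reward, could look roughly like the following sketch. The reward magnitudes and the name shaped_reward are hypothetical and chosen only for illustration; the values in the manuscript's Table 3 are not reproduced in this record.

```python
from typing import List, Tuple

Cell = Tuple[int, int]

# Hypothetical reward magnitudes, not taken from the manuscript.
STEP_COST = -0.1      # small cost for every action
ON_PATH_BONUS = 0.1   # extra reward for staying on the A* pre-planned path
GOAL_REWARD = 1.0     # reward for reaching the target cell


def shaped_reward(next_cell: Cell, astar_path: List[Cell], goal: Cell) -> float:
    """Return the step reward, with a bonus when the AGV stays on its A* path."""
    if next_cell == goal:
        return GOAL_REWARD
    reward = STEP_COST
    if next_cell in astar_path:
        reward += ON_PATH_BONUS
    return reward
```

With these placeholder values, a step along the pre-planned path is cost-neutral while a detour is penalized, which nudges the agent back toward the centralized route once a conflict has been resolved.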
Comments 8:
Lines 227–228: The text seems to introduce a list but includes no bullet points or continuation. Please verify whether content is missing or formatting was lost.
Response 8:
Thank you for the correction; we have fixed the mistake on page 8, lines 267-268.
Comments 9:
Figure 4 (Page 8): Abbreviations such as WCS and RCS are not explained in the figure caption or accompanying text. Readers unfamiliar with these terms will struggle to interpret the figure. Define all acronyms at first use.
Response 9:
We have redrawn Figure 4 on page 8.
Comments 10:
Section 3.1.2 (Enhancements over A*): The section claims enhancements over A*, but the listed modifications—action space redefinition, directional spatial nodes, and a directional heuristic—do not modify the core A* algorithm. Clarify how these are integrated into the A* framework or revise the claim.
Response 10:
We agree with the comment and have rewritten the part on enhancements for AGV pathfinding on pages 10-11, lines 350-365.
Comments 11:
Table 2 (Page 9): The second entry, corresponding to a 180° rotation, should logically incur the highest penalty, yet this is not reflected in the values. Please verify and explain the penalty logic used.
Response 11:
We agree that Table 2 contains some ambiguities. We have therefore rewritten Table 2.
The angle in Table 2 describes the relative orientation of the AGV to its target, and we have rewritten Table 2 to remove the ambiguities.
In this paper, the rotational penalty p(n) predicts the number of rotate actions needed to reach the target node from each node. It corrects the heuristic function based on Manhattan distance, which can only predict the number of forward actions. In our model, a 90° rotation and a 180° rotation each count as one rotate action; details are on page 5, lines 183-187.
Comments 12:
Equation 10 (Page 10): The variable `done` appears in the equation but is not defined or described anywhere in the text. Clarify its role and how it is computed.
Response 12:
We have added the definition on page 12, lines 403-404.
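For readers without access to the revised manuscript: in a standard Double DQN update, a done flag marks a terminal transition and switches off the bootstrapped term of the learning target. The sketch below shows that standard convention only. Equation 10 is not reproduced in this record, so whether it matches this form is an assumption, and the function name double_dqn_target is hypothetical.

```python
from typing import Sequence


def double_dqn_target(reward: float, done: bool, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float]) -> float:
    """Standard Double DQN target: the online net picks the action, the target net scores it."""
    if done:
        # Terminal transition: there is no future return to bootstrap from.
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]
```

In this convention, done would typically be set when an episode ends, for example when an AGV reaches its target or a step limit is hit; which conditions the manuscript uses is not stated here.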
Comments 13:
Figure 5 (Page 11): The table embedded in this figure includes text in Chinese. All elements in figures must be translated to English for clarity and consistency with the rest of the paper.
Response 13:
We apologize for the mistake. We have redrawn Figure 5 on page 13.
Comments 14:
Observation Space (Page 11): the paper mentions a 4-layer observation space but does not specify what the value in each layer represents. Detailed descriptions of these layers are essential for reproducibility and understanding.
Response 14:
We agree with the comment and have described the observation space in detail on pages 13-14, lines 454-476.
Comments 15:
Figure 6 (Page 13): The visual quality is low; the figure should be rendered using vector graphics to ensure clarity. Also, the legend order and naming should match the text for consistency.
Response 15:
We have redrawn the figure to ensure its quality.
Comments 16:
Figure 7 (Page 13): Similar to Figure 6, the resolution should be improved, and vectorized graphics are recommended. Use consistent terminology and notation throughout.
Response 16:
We have redrawn the figure to ensure its quality.
Comments 17:
Figures 8–10 (Pages 14-15): The current graphs combine multiple dimensions of data into a single figure, which is hard to interpret. Consider breaking these into individual plots, each representing two dimensions, to enhance clarity.
Response 17:
We have redrawn the figures to ensure their quality. The three-dimensional figures are intended to give readers a more intuitive impression of each method's performance across different environments, thereby illustrating the differences in adaptability among the methods.
Comments 18:
Line 379: The term "stability" of results is used but not defined. Clarify whether this refers to variance in performance, robustness to perturbations, or another metric.
Response 18:
Stability refers to robustness across environments.
Comments 19:
Line 396: The concept of "shelf density" is introduced without definition. Specify whether this is the ratio of shelf area to total area, number of shelves per unit area, or another metric.
Response 19:
We apologize for the missing definition of shelf density. We have added the definition on page 15, lines 506-514.
Comments 20:
Line 408: "Response time" is not defined. If this is evaluation time of the algorithms, a clear description of the algorithm implementation must be provided.
Response 20:
We agree with this comment and have added the definition of "Response time" on page 17, lines 560-564.
Comments 21:
Line 416: The acronym "IDA*" should be written as "ID A*" to match its earlier usage in the paper and avoid confusion.
Response 21:
Thank you for the correction; we have corrected all occurrences.
Comments 22:
Line 423: When presenting response time results for the proposed method, clarify whether the time is measured per agent, per episode, or for the entire system.
Response 22:
The response time is measured per agent.
We would like to thank you for your time and for this opportunity to improve the manuscript. We hope you will find the revised version satisfactory.
Sincerely,
The Authors
Author Response File: Author Response.docx
Reviewer 3 Report
Comments and Suggestions for Authors
It is recommended for publication after revision, taking into account the aforementioned comments.
Comments for author File: Comments.pdf
Author Response
Dear Reviewer,
Thank you very much for the time you spent reviewing the manuscript and for your encouraging comments on its merits.
Comments 1:
The article does not describe a hierarchical integration model of a complex dynamic system with a stochastic structure of route selection control processes.
Response 1:
We agree that we failed to characterize the hierarchical control system for path selection effectively, and we have rewritten the relevant parts on pages 8-9, lines 260-268 and lines 276-297.
Comments 2:
The target problem model and the concept of creative solutions for crisis situations are not formulated.
Response 2:
The key issues of our model are: limited adaptability, elevated path costs, and slow computational performance.
To address the issue of elevated path costs, this work makes use of centralized pre-planning in three ways: (1) incorporating the A* pre-planned path into each agent's observation space; (2) introducing extra A*-based rewards; (3) using the A* pre-planned path as expert demonstrations during learning.
Together, these measures complement one another and guide AGVs, under conflict-free conditions, to follow the high-quality routes produced by centralized planning as closely as possible. Details are on page 13, lines 430-433, and page 14, Table 3.
To improve adaptability, our distributed planning model applies the ε-Dagger strategy to integrate A* expert experience, accelerating training while preserving exploration capability. See page 13, lines 434-439.
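The ε-Dagger strategy itself is not spelled out in this record, so the sketch below shows only one plausible reading: with probability beta the agent executes the A* expert's action, and otherwise it acts ε-greedily on its own Q-values. The parameters beta and epsilon, the mixing order, and the name select_action are all assumptions made for illustration.

```python
import random
from typing import Sequence


def select_action(q_values: Sequence[float], expert_action: int,
                  beta: float, epsilon: float, rng: random.Random) -> int:
    """Mix expert guidance with epsilon-greedy exploration (assumed scheme)."""
    if rng.random() < beta:
        # Follow the A* expert, as in DAgger-style imitation learning.
        return expert_action
    if rng.random() < epsilon:
        # Keep some random exploration so the policy is not purely imitative.
        return rng.randrange(len(q_values))
    # Otherwise act greedily on the learned Q-values.
    return max(range(len(q_values)), key=q_values.__getitem__)
```

In a scheme like this, beta would usually be decayed during training so that the agent relies on the expert early on and on its own policy later; the manuscript's actual schedule is not given here.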
Moreover, during training, each experimental environment randomly reinitializes both the obstacle distribution and the number of AGVs to ensure that the model generalizes across diverse scenarios. See page 15, lines 509-518.
Finally, to overcome slow computational performance, the centralized pre-planning algorithm uses a single-agent A*, so its computation time grows linearly with the number of AGVs. A detailed analysis can be found on pages 8-9, lines 270-273.
Comments 3:
The types of graphs representing admissible routes and the assessment of boundary (acceptable) risks are not defined.
Response 3:
The node connectivity is defined by the AGV-referenced actions and constrained by the grid map obstacles. Conflict detection uses only the X-Y coordinates. The details are on page 7, lines 237-247.
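A minimal sketch of how such connectivity could be generated automatically is given below, assuming a state of the form (x, y, heading) and the action set named in Reviewer 1's comment (forward, rotate 90° left, rotate 90° right, rotate 180° back, wait). The heading encoding and the names successors and in_vertex_conflict are hypothetical, not taken from the manuscript.

```python
from typing import Dict, Set, Tuple

State = Tuple[int, int, int]  # (x, y, heading index); 0=+x, 1=+y, 2=-x, 3=-y (assumed encoding)
HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def successors(state: State, obstacles: Set[Tuple[int, int]],
               width: int, height: int) -> Dict[str, State]:
    """Return the directional nodes reachable by one AGV-referenced action."""
    x, y, h = state
    out = {
        "wait": (x, y, h),
        "rotate_left": (x, y, (h + 1) % 4),   # counterclockwise with the y axis pointing up
        "rotate_right": (x, y, (h - 1) % 4),
        "rotate_back": (x, y, (h + 2) % 4),
    }
    dx, dy = HEADINGS[h]
    nx, ny = x + dx, y + dy
    # The forward edge exists only if the next cell is inside the map and free.
    if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in obstacles:
        out["forward"] = (nx, ny, h)
    return out


def in_vertex_conflict(a: State, b: State) -> bool:
    """Two AGVs conflict when they share a cell, whatever their headings."""
    return a[:2] == b[:2]
```

Rotations and waiting never change the cell, so only the forward edge depends on the obstacle map, and the conflict check can ignore the heading component entirely.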
Comments 4:
In Figure 5, the procedure for constructing the hierarchy of system control is not justified.
Response 4:
We agree with the comment and have described the observation space in detail on pages 13-14, lines 454-476.
Comments 5:
Potential risks associated with attack factors and control system failures are neither presented nor substantiated.
Response 5:
We agree with the comment and have added a discussion of system robustness under communication delays or interruptions and unexpected situations on page 9, lines 298-319.
Comments 6:
Although the authors provide an overview of existing approaches, it lacks depth in analyzing current competitive strategies, particularly recent work in the field of multi agent reinforcement learning (MARL) for logistics.
Response 6:
We agree that this paper lacks a rigorous review of state-of-the-art MARL. We have therefore added more references on MARL on page 2, lines 57-71.
Comments 7:
The experiment parameters, training environment, precise map configuration, and hyperparameter settings are presented too briefly, making full replication of the study difficult.
Response 7:
We agree with the comment and have rewritten the map configuration and environment setup part on page 15, lines 506-523, and pages 16-17, lines 544-559.
Comments 8:
It would be beneficial to include a more systematic comparison of the proposed approach with other modern methods, including MARL-based models and rule-based hybrid algorithms.
Response 8:
We agree with the comment. We have therefore added PRIMAL2, a state-of-the-art hybrid planning method, as a reference. More details are on pages 16-19.
We would like to thank you for your time and for this opportunity to improve the manuscript. We hope you will find the revised version satisfactory.
Sincerely,
The Authors
Author Response File: Author Response.docx
Reviewer 4 Report
Comments and Suggestions for Authors
The aim of the article submitted for review is to develop a strategy for planning paths for multiple automated guided vehicles (AGVs) in complex production environments. The authors combine the advantages of centralized route planning (as a macro strategy) and autonomous decision-making by individual AGVs (micro strategy), creating a hybrid model that increases the efficiency and adaptability of industrial transport systems.
The introduction to the study is written correctly, the structure of the article is correct. The developed method is presented in a clear and understandable way, both the tested example and the entire research procedure are well described. The adopted goal of the study was preceded by a sufficient literature review. The authors discuss existing problems in the analyzed topic well, and the items cited in the bibliography are up-to-date, so it can be assumed that the conclusions from the review are based on the current state of knowledge, which ensures their credibility.
The following should be indicated as a contribution to scientific development and an element of novelty in the article:
1. Combination of two approaches (central and distributed) - The hybrid model allows to avoid the weaknesses typical of each of the strategies separately.
2. Using the authors' modification of the deep learning algorithm, combining Double Dueling DQN, Prioritized Experience Replay and Imitation Learning.
3. Extending the AGV action space with directionality and rotation, which improved the realism and precision of the model.
4. Modeling the map with directional nodes, which allows for better representation of the actual behavior of the AGV in the factory environment.
The authors demonstrated innovation by combining various algorithms and conducting extensive tests in a very large number of scenarios, diversified in terms of the number of AGVs and the density of obstacles. It was possible to achieve over 95% effectiveness in carrying out tasks with a large number of AGVs, while maintaining a low reaction time. It is worth emphasizing that the model is suitable for use in real industrial environments.
My biggest comment regarding the analyzed article is the lack of real implementation. All tests were carried out exclusively in a simulation environment. The work would definitely gain in value thanks to validation in a physical AGV system. It is worth referring to this in the article and explaining why these studies were not carried out and whether they are planned.
It is also worth mentioning the following issues in conclusion:
- The proposed system is quite complex. Although the authors have demonstrated its high effectiveness, can the degree of complexity make practical implementation in less advanced industrial systems difficult?
- The developed solution assumes that vehicles can communicate effectively, which is not always possible in real conditions (interference, delays). What happens in such a situation?
- It is worth indicating what the analysis of the computational cost looks like with such a low response time, and what the metrics of computational resource consumption are at system scale.
Nevertheless, the above comments do not diminish the value of the text. In my opinion, this is an interesting, innovative work that can significantly influence the development of AGV systems and has the potential to be published.
Author Response
Dear Reviewer,
Thank you very much for the time you spent reviewing the manuscript and for your encouraging comments on its merits.
Comments 1:
The proposed system is quite complex. Although the authors have demonstrated its high effectiveness, can the degree of complexity make practical implementation in less advanced industrial systems difficult?
Response 1:
We acknowledge the complexity of the system presented here. For the central planning system, most of the required data should be obtained via interfaces from mid-level control systems such as the Robotic Control System (RCS) or Warehouse Control System (WCS), which also handle AGV dispatching. If a factory lacks such systems, the central planning system itself may need to monitor and dispatch each AGV, which undoubtedly increases implementation difficulty. The distributed planning system, however, relies primarily on the AGVs themselves and the central planning system, so there is no need for undue concern.
Comments 2:
The developed solution assumes that vehicles can communicate effectively, which is not always possible in real conditions (interference, delays). What happens in such a situation?
Response 2:
We agree that our approach lacks a discussion of communication interference. We have therefore added a discussion of system robustness under communication delays or interruptions on page 9, lines 350-353.
Generally speaking, a communication interruption affecting a few AGVs does not affect the stability of system operation, but it will interfere with task completion for those offline AGVs. In future work, we will therefore train a fallback planner based purely on local data as an emergency contingency.
Comments 3:
It is worth indicating what the analysis of the computational cost looks like with such a low response time, and what the metrics of computational resource consumption are at system scale.
Response 3:
We apologize for neglecting to analyse how the time complexity scales with system size. We have added this analysis on pages 8-9, lines 270-273.
With n AGVs, the time complexity is O(n).
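One way to read this claim, assuming each AGV's path is planned by an independent single-agent A* call whose cost on a fixed map is bounded by a constant, is sketched below. The symbols T_central, T_{A*}^{(i)} and C_map are introduced here for illustration and do not appear in the manuscript.

```latex
T_{\text{central}}(n) \;=\; \sum_{i=1}^{n} T_{A^*}^{(i)} \;\le\; n \cdot C_{\text{map}} \;=\; O(n)
```

The bound C_map depends on the map size, since a single A* search can expand every directional node in the worst case, so the linear growth holds for a fixed map and would not describe scaling in map size.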
We would like to thank you for your time and for this opportunity to improve the manuscript. We hope you will find the revised version satisfactory.
Sincerely,
The Authors
Author Response File: Author Response.docx
Reviewer 5 Report
Comments and Suggestions for Authors
The paper proposes a hybrid pathfinding strategy tailored for complex manufacturing scenarios. The approach integrates centralized planning, utilizing the classic A* algorithm to determine optimal directions between current and target positions. It uses a distributed algorithm where each AGV (Automated Guided Vehicle) considers dynamic elements in the environment to define its path to the target.
The topic is highly relevant, particularly concerning current demands in advanced manufacturing systems. Modern warehouses often operate with thousands of AGVs navigating dynamic environments filled with various products and tightly integrated with IT-based solutions.
One of the paper’s primary weaknesses is the lack of a rigorous review of state-of-the-art hybrid pathfinding methods. While the study is built upon established centralized and distributed techniques, whose limitations are well documented, the authors do not explore the current hybrid strategies that address the same key issues: limited adaptability, elevated path costs, and slow computational performance.
Additionally, the validation process does not benchmark the proposed approach against comparable hybrid methods found in recent literature. Instead, the analysis focuses only on internal module performance without offering external comparisons. Future versions should identify relevant state-of-the-art methods and use them as reference points to effectively measure and contextualize the proposed solution's performance.
In terms of structure and clarity, the manuscript would benefit from some specific revisions:
- The first paragraph of Section 3 should be moved to the beginning of Section 4.
- The distinction between “pre-planned paths” and “planned trajectories” (lines 261–265) should be more clearly articulated.
- It is not evident whether the PER-D3QN-EDAgger component is an original contribution of this work or derived from previous studies. This needs clarification.
- The role of expert knowledge in the E-Dagger Imitation Learning approach should be better explained, particularly when and how it is integrated into the learning process.
Overall, the proposed hybrid strategy addresses an essential challenge in AGV pathfinding by attempting to bridge centralized and distributed decision-making. However, the contribution would be significantly strengthened by deeper engagement with the existing body of hybrid methods and a more robust comparative evaluation. Clarifying technical elements and refining the manuscript's structure will improve the paper’s clarity and impact. With these revisions, the work can potentially contribute to intelligent path planning in smart manufacturing environments.
Author Response
Dear Reviewer,
Thank you very much for the time you spent reviewing the manuscript and for your encouraging comments on its merits.
Comments 1:
One of the paper’s primary weaknesses is the lack of a rigorous review of state-of-the-art hybrid pathfinding methods. While the study is built upon established centralized and distributed techniques, whose limitations are well documented, the authors do not explore the current hybrid strategies that address the same key issues: limited adaptability, elevated path costs, and slow computational performance.
Response 1:
We agree that this paper lacks a rigorous review of state-of-the-art hybrid pathfinding methods. We have therefore added more references on pathfinding methods on page 2, lines 77-79, and page 3, lines 85-95.
To address the issue of elevated path costs, this work makes use of centralized pre-planning in three ways: (1) incorporating the A* pre-planned path into each agent's observation space; (2) introducing extra A*-based rewards; (3) using the A* pre-planned path as expert demonstrations during learning.
Together, these measures complement one another and guide AGVs, under conflict-free conditions, to follow the high-quality routes produced by centralized planning as closely as possible. Details are on page 13, Figure 5; page 13, lines 430-433; and page 14, Table 3.
To improve adaptability, our distributed planning model applies the ε‑Dagger strategy to integrate A* expert experience, accelerating training while preserving exploration capability. Page 13, line 434-439.
Moreover, during training each experimental environment randomly reinitializes both obstacle distributions and the number of AGVs to ensure the model can generalize across diverse scenarios. Page 15, line 509-518.
Finally, to overcome slow computational performance, the centralized pre‑planning algorithm uses a single‐agent A*, so its computation time grows linearly with the number of AGVs. Detailed analysis can be found on page 8-9, line 270-273.
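The linear-growth argument can be illustrated with a minimal sketch, assuming pre-planning simply runs one independent single-agent search per AGV and ignores the other vehicles at this stage. A plain 4-connected grid A* stands in for the manuscript's modified A*; the names `astar`, `preplan`, and `tasks` are hypothetical.

```python
import heapq

def astar(grid, start, goal):
    """Plain 4-connected A* with a Manhattan heuristic (0 = free, 1 = blocked)."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    g_cost = {start: 0}
    parent = {start: None}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:      # walk back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if g > g_cost[cell]:
            continue                     # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                if g + 1 < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = g + 1
                    parent[nxt] = cell
                    heapq.heappush(open_heap, (g + 1 + h(nxt), g + 1, nxt))
    return None                          # goal unreachable

def preplan(grid, tasks):
    """Assumed pre-planning step: exactly one A* call per AGV, no joint search."""
    return {agv: astar(grid, start, goal) for agv, (start, goal) in tasks.items()}
```

Because `preplan` issues one search per entry in `tasks`, and each search is bounded by the map size rather than by the fleet size, the total work scales with the number of AGVs under these assumptions.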
Comments 2:
Additionally, the validation process does not benchmark the proposed approach against comparable hybrid methods found in recent literature. Instead, the analysis focuses only on internal module performance without offering external comparisons. Future versions should identify relevant state-of-the-art methods and use them as reference points to effectively measure and contextualize the proposed solution's performance.
Response 2:
We agree with the comment. Therefore, we have added PRIMAL2, a state-of-the-art hybrid planning method, as a reference for comparison. More details are on pages 16-19.
Comments 3:
The first paragraph of Section 3 should be moved to the beginning of Section 4.
Response 3:
We apologize for an oversight in the proofreading of the paper; the first paragraph of Section 3 was not the formal paper content but a draft, which we have now removed.
Comments 4:
The distinction between “pre-planned paths” and “planned trajectories” (lines 261–265) should be more clearly articulated.
Response 4:
We are sorry for the ambiguity in the paper. The former refers to the AGV’s own A* pre-planned path, and the latter refers to the A* pre-planned paths of its neighboring AGVs. We have now revised the description on page 11, lines 367-369.
Comments 5:
It is not evident whether the PER-D3QN-EDAgger component is an original contribution of this work or derived from previous studies—this needs clarification.
Response 5:
For PER-D3QN-EDAgger, ε-DAgger is an original contribution of this work, whereas PER and D3QN are both derived from previous studies. We have added a clarification on page 11, lines 382-385.
Comments 6:
The role of expert knowledge in the E-Dagger Imitation Learning approach should be better explained, particularly when and how it is integrated into the learning process.
Response 6:
We agree that the ε-Dagger Imitation Learning approach should be better elaborated. Therefore, it is now described in more detail on pages 12-13, lines 423-433.
The expert knowledge is used to replace action selection during learning, reducing the amount of unstructured exploration in the early stages. ε-Dagger allows the model to retain its exploration capability while lowering the probability of randomly selecting an action to ε(1 − ε).
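One possible reading of the ε(1 − ε) figure is sketched below; it is an illustration under stated assumptions, not the manuscript's code. It assumes that with probability ε the A* expert action replaces the agent's choice, and that otherwise a standard ε-greedy rule applies, so a uniformly random action is drawn with probability (1 − ε)·ε. Reusing the same ε for both draws, and the names `epsilon_dagger_action`, `q_values`, and `expert_action`, are assumptions made for illustration.

```python
import random

def epsilon_dagger_action(q_values, expert_action, epsilon, rng=random):
    """Return (action, source) under the assumed two-stage selection rule."""
    # Stage 1 (assumed): with probability epsilon, follow the A* expert.
    if rng.random() < epsilon:
        return expert_action, "expert"
    # Stage 2 (assumed): ordinary epsilon-greedy on the learned Q-values.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values)), "random"
    greedy = max(range(len(q_values)), key=q_values.__getitem__)
    return greedy, "greedy"
```

Under this reading, a random action requires the first draw to fail (probability 1 − ε) and the second to succeed (probability ε), which gives ε(1 − ε), lower than the ε of plain ε-greedy exploration.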
We would like to take this opportunity to thank you for all your time involved and this great opportunity for us to improve the manuscript. We hope you will find this revised version satisfactory.
Sincerely,
The Authors
Author Response File: Author Response.docx
Reviewer 6 Report
Comments and Suggestions for Authors
In order to solve the problem of multi-AGV path search in a complex manufacturing environment, this paper proposed a macro-control & micro-autonomy hybrid strategy that combines centralized path planning and distributed autonomous control methods.
The proposed method is significant in that it can flexibly respond to both static and dynamic environments based on a central-distributed coupling structure, and can maintain high scalability and processing performance, especially in large-scale AGV operating environments.
However, it is also confirmed that some supplements are needed.
First, the method proposed in the proposed paper relies entirely on simulation-based experiments, and its applicability in actual manufacturing or logistics sites has not been verified. Since there are various unpredictable factors in the real environment, reality-based experiments or pilot tests are required to evaluate the effectiveness of the proposed technique.
Second, the proposed method focuses on collaborative control based on state sharing between AGVs, but no way to secure fault-tolerance for network issues such as communication delays or failures is suggested. Since there is a high possibility of problems such as communication instability or sensor malfunction in an actual operating environment, a robustness evaluation considering this situation is required.
The proposed paper stands out in an attempt to effectively solve the problem of multi-AGV path search in a complex manufacturing environment through a hybrid strategy that combines centralized planning and distributed autonomous control. Based on the simulation, the excellence of the proposed method was confirmed, and meaningful results were also presented in terms of scalability and flexibility.
However, it is regrettable that consideration of realistic variables such as lack of verification for actual environmental application and communication failure is insufficient. If these limitations can be supplemented, this study is expected to develop into a study with both practicality and academic contribution in the future.
Author Response
Dear Reviewer,
Thank you very much for your time involved in reviewing the manuscript and your very encouraging comments on the merits.
Comments 1:
First, the method proposed in the proposed paper relies entirely on simulation-based experiments, and its applicability in actual manufacturing or logistics sites has not been verified. Since there are various unpredictable factors in the real environment, reality-based experiments or pilot tests are required to evaluate the effectiveness of the proposed technique.
Response 1:
We agree on the importance of pilot tests, as simulations cannot cover all possible scenarios. Unfortunately, we currently do not have the opportunity to conduct on-site trials or pilot runs in real production environments. However, in future work we will dedicate ourselves to building more realistic experimental environments and will seek partnerships with industrial enterprises to carry out pilot operations in environments such as lights-out warehouses or production workshops.
Comments 2:
Second, the proposed method focuses on collaborative control based on state sharing between AGVs, but no way to secure fault-tolerance for network issues such as communication delays or failures is suggested. Since there is a high possibility of problems such as communication instability or sensor malfunction in an actual operating environment, a robustness evaluation considering this situation is required.
Response 2:
We agree that our approach lacked a discussion of communication interference. We have therefore added a discussion of system robustness under communication delays or interruptions on page 9, lines 350-353.
Generally speaking, a communication interruption affecting a few AGVs does not affect the stability of system operation, but it will interfere with task completion for the offline AGVs. In future work, we will therefore train a fallback planner based purely on local data as an emergency contingency.
We would like to take this opportunity to thank you for all your time involved and this great opportunity for us to improve the manuscript. We hope you will find this revised version satisfactory.
Sincerely,
The Authors
Author Response File: Author Response.docx
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have provided the updated paper with some of the comments taken into the consideration. However, the comments below are still not addressed appropriately:
Comment 10: Section 3.1.2 (Enhancements over A*): The section claims enhancements over A*, but the listed modifications—action space redefinition, directional spatial nodes, and a directional heuristic—do not modify the core A* algorithm. Clarify how these are integrated into the A* framework or revise the claim.
The proposed algorithm doesn't really provide a noteworthy enhancement to A* - the selection of action space and heuristic function is not regarded as being a part of the A* algorithm. The directional nodes are also merely an architectural part of the used graph and again not part of the A* that would have been enhanced.
Comments 17: Figures 8–10 (Pages 14-15): The current graph combines multiple dimensions of data into a single figure, which is hard to interpret. Consider breaking these into individual plots, each representing in two dimensions to enhance clarity.
The 3-D representation employed lacks visual clarity and doesn't really render the results quantitatively to be of any use to an engineer.
Comments 18: Line 379: The term "stability" of results is used but not defined. Clarify whether this refers to variance in performance, robustness to perturbations, or another metric.
It would be beneficial for the reader of the paper that the clarification given to the reviewer is included in the paper.
Comments 19: Line 396: The concept of "shelf density" is introduced without definition. Specify whether this is the ratio of shelf area to total area, number of shelves per unit area, or another metric.
The reviewer still can't really comprehend the logic behind the idea - what space is sampled during shelf initialization? If the complete space is sampled and randomly assigned a position of the shelf or free space, how it is assured that the map is even traversable?
Comments on the Quality of English Language
Some parts of the newly inserted text in the paper need to be linguistically reviewed and improved.
Author Response
Dear Reviewer,
Thank you very much for taking the time out of your busy schedule to review our responses and the revised manuscript. Your comments have been immensely helpful and have provided us with invaluable guidance in improving our paper.
Comments 1:
Section 3.1.2 (Enhancements over A*): The section claims enhancements over A*, but the listed modifications—action space redefinition, directional spatial nodes, and a directional heuristic—do not modify the core A* algorithm. Clarify how these are integrated into the A* framework or revise the claim.
The proposed algorithm doesn't really provide a noteworthy enhancement to A* - the selection of action space and heuristic function is not regarded as being a part of the A* algorithm. The directional nodes are also merely an architectural part of the used graph and again not part of the A* that would have been enhanced.
Response 1:
We agree that we did not improve the core A* algorithm. We have revised the claim and now refer to the method as “modified A*”.
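The point that the search loop stays unchanged while only the inputs differ can be shown with a short sketch. It is an illustration under assumptions, not the manuscript's implementation: nodes are (row, col, heading) triples, every rotation (90° or 180°) costs one action as stated in the paper, the goal is a cell reached with any heading, and the wait action is omitted because it never helps a single agent. The rotation term in `heuristic` is a simple lower bound chosen here for illustration; it is not the content of Table 2.

```python
import heapq

# Headings as (d_row, d_col): 0 = north, 1 = east, 2 = south, 3 = west.
HEADINGS = [(-1, 0), (0, 1), (1, 0), (0, -1)]

def successors(grid, state):
    """Directional node (row, col, heading): move forward or rotate in place."""
    r, c, h = state
    nr, nc = r + HEADINGS[h][0], c + HEADINGS[h][1]
    if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
        yield (nr, nc, h)                 # forward one cell
    for turn in (1, 3, 2):                # right 90, left 90, back 180
        yield (r, c, (h + turn) % 4)      # each rotation is one action

def sign(v):
    return (v > 0) - (v < 0)

def heuristic(state, goal):
    """Manhattan distance plus an assumed lower bound on rotate actions."""
    r, c, h = state
    dr, dc = goal[0] - r, goal[1] - c
    needed = set()                        # travel directions still required
    if dr:
        needed.add((sign(dr), 0))
    if dc:
        needed.add((0, sign(dc)))
    rotations = len(needed) - (HEADINGS[h] in needed)
    return abs(dr) + abs(dc) + rotations

def astar(grid, start, goal):
    """Standard A* loop; returns the number of actions to reach the goal cell."""
    open_heap = [(heuristic(start, goal), 0, start)]
    best_g = {start: 0}
    while open_heap:
        _, g, state = heapq.heappop(open_heap)
        if state[:2] == goal:
            return g
        if g > best_g[state]:
            continue                      # stale heap entry
        for nxt in successors(grid, state):
            if g + 1 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g + 1
                heapq.heappush(open_heap, (g + 1 + heuristic(nxt, goal), g + 1, nxt))
    return None
```

In this sketch `astar` is the textbook loop; the action space, the directional nodes, and the directional heuristic enter only through `successors` and `heuristic`, which is consistent with describing the result as a modified A* setup.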
Comments 2:
Figures 8–10 (Pages 14-15): The current graph combines multiple dimensions of data into a single figure, which is hard to interpret. Consider breaking these into individual plots, each representing in two dimensions to enhance clarity.
The 3-D representation employed lacks visual clarity and doesn't really render the results quantitatively to be of any use to an engineer.
Response 2:
We agree with this comment that the 3-D view did not present quantitative results. Therefore, we have compiled key experimental data into Table 5, on Page 17.
Comments 3:
Line 379: The term "stability" of results is used but not defined. Clarify whether this refers to variance in performance, robustness to perturbations, or another metric.
It would be beneficial for the reader of the paper that the clarification given to the reviewer is included in the paper.
Response 3:
We agree with this comment, and we have added its definition in the paper. Details on page 16, line 533-544.
Comments 4:
Line 396: The concept of "shelf density" is introduced without definition. Specify whether this is the ratio of shelf area to total area, number of shelves per unit area, or another metric.
The reviewer still can't really comprehend the logic behind the idea - what space is sampled during shelf initialization? If the complete space is sampled and randomly assigned a position of the shelf or free space, how it is assured that the map is even traversable?
Response 4:
Please excuse our oversight in the map modeling description. As shown in Figure 2 or Figure 7, in our simulation environment the storage locations (yellow or green cells) are arranged back-to-back, with aisles (white cells) reserved. This layout guarantees access to every storage node. Details on page 6-7, line 224-231; page 7 Figure 2; page 17 Figure 7.
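The traversability argument can be checked mechanically. The sketch below is a hypothetical reconstruction and does not reproduce the layouts in Figure 2 or Figure 7: it assumes storage columns come in back-to-back pairs separated by single aisle columns, with cross aisles every `block_len` rows and an aisle ring around the map. The function names and parameters are assumptions.

```python
from collections import deque

def make_layout(rows, cols, block_len=4):
    """0 = aisle, 1 = storage. Storage columns come in back-to-back pairs."""
    grid = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        if (r - 1) % (block_len + 1) == block_len:
            continue                      # cross aisle row
        for c in range(1, cols - 1):
            if (c - 1) % 3 != 2:          # two storage columns, then one aisle
                grid[r][c] = 1
    return grid

def all_storage_reachable(grid):
    """True if aisles form one connected region touching every storage cell."""
    rows, cols = len(grid), len(grid[0])

    def neighbors(r, c):
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                yield nr, nc

    aisles = [(r, c) for r in range(rows) for c in range(cols) if grid[r][c] == 0]
    if not aisles:
        return False
    seen = {aisles[0]}
    queue = deque(seen)
    while queue:                          # breadth-first search over aisle cells
        r, c = queue.popleft()
        for cell in neighbors(r, c):
            if grid[cell[0]][cell[1]] == 0 and cell not in seen:
                seen.add(cell)
                queue.append(cell)
    if len(seen) != len(aisles):
        return False                      # some aisle is cut off
    return all(
        any(cell in seen for cell in neighbors(r, c))
        for r in range(rows) for c in range(cols) if grid[r][c] == 1
    )
```

With back-to-back pairs, each storage cell has an aisle on its open side, so the check passes by construction; a solid 3×3 storage block would fail because its center cell touches no aisle.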
Comments 5:
Some parts of the newly inserted text in the paper need to be linguistically reviewed and improved.
Response 5:
Thank you for the reminder. We have corrected several grammatical errors throughout the manuscript.
Thank you again for offering further suggestions. We truly appreciate your advice, which has greatly aided our revisions. We hope you will find this revised version satisfactory.
Sincerely,
The Authors
Author Response File: Author Response.docx
Reviewer 3 Report
Comments and Suggestions for Authors
The article can be published.
Author Response
Thank you very much for taking the time out of your busy schedule to review our responses and the revised manuscript. Your comments have been immensely helpful and have provided us with invaluable guidance in improving our paper. Please allow us to express our sincere gratitude.
Reviewer 5 Report
Comments and Suggestions for Authors
For the final revision, it is necessary to indicate for each new reference included in this version whether it is a centralized or distributed approach.
Author Response
Dear Reviewer,
Thank you very much for taking the time out of your busy schedule to review our responses and the revised manuscript. Your comments have been immensely helpful and have provided us with invaluable guidance in improving our paper.
Comments 1:
For the final revision, it is necessary to indicate for each new reference included in this version whether it is a centralized or distributed approach.
Response 1:
We agree with this comment. Therefore, we have indicated, for each new reference, whether it is a distributed solution or a hybrid solution complemented by a centralized system. Details on page 2, line 78-82; page 3, line 88-99.
Thank you again for offering further suggestions. We truly appreciate your advice, which has greatly aided our revisions. We hope you will find this revised version satisfactory.
Sincerely,
The Authors
Author Response File: Author Response.docx