Autonomous Dogfight Decision-Making for Air Combat Based on Reinforcement Learning with Automatic Opponent Sampling
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper presents a reinforcement learning (RL) framework for autonomous dogfight decision-making. It introduces an Automatic Opponent Sampling (AOS) method combined with the Proximal Policy Optimization (PPO) algorithm, addressing limitations of previous RL-based dogfight models. The proposal is interesting, but it requires improvements.
* Include a motivation for preferring data-driven approaches over analytical approaches. Why not use model-based techniques, for example techniques based on energy shaping, e.g., modeling and passivity-based control for a convertible fixed-wing VTOL (Applied Mathematical Modelling)?
* Compare the theoretical assumptions in detail with those of the existing RL approaches mentioned (e.g., PFSP, SAC). Include a discussion of computational complexity and real-time applicability for AOS-PPO and hierarchical decision-making methods (e.g., SECA).
* Variables should be clearly defined within the text. Several equations (e.g., the reward shaping functions) use inconsistent notation, making interpretation difficult. Please make the notation uniform.
* How does AOS-PPO compare with standard PPO and SAC-LSTM in terms of training time and convergence rate?
* It is unclear whether PPO, SAC-LSTM, and PFSP-PPO were reimplemented by the authors or sourced from existing benchmarks.
* Clarify experimental setup details, including opponent initialization and parameter settings.
* The JPC matrix (Figure 9) should include confidence intervals or statistical significance analysis.
* Compare your results with a relevant and recent strategy. Include tables with performance indices.
* Discuss the limitations of the proposed method.
* Include proposed future work in the conclusions.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Reviewer Report
Article Title: "Autonomous Dogfight Decision-Making for Air Combat Based on Reinforcement Learning with Automatic Opponent Sampling."
General Assessment
This article tackles a crucial topic within the field of aerial defense, proposing an innovative approach that utilizes reinforcement learning (RL). The combination of an Automatic Opponent Sampling (AOS) framework with Proximal Policy Optimization (PPO) shows promise, and the experimental results seem solid.
Detailed Comments
- Methodology: The AOS-PPO approach introduced (line 63) requires further explanation regarding its operation and how it enhances the learning of maneuvering policies. It is vital that the Materials and Methods section is clearly articulated to ensure readers fully grasp the foundation of your research. Furthermore, a comparison with techniques such as Prioritized Fictitious Self-Play (PFSP) (line 59) could be strengthened by quantitative analyses to more clearly highlight the advantages of the AOS-PPO approach.
- Results: The findings (lines 427-433) demonstrate the effectiveness of the AOS. However, a more thorough analysis of performance against various adversarial configurations could offer additional insights and assist in generalizing the conclusions. Including this aspect would make the practical implications of your results more apparent.
- Clarity and Visualization: While figures are provided (e.g., Figure 11, line 456), the article would benefit from additional visualizations to illustrate the dynamics of maneuvers. Moreover, it is crucial that the legends accompanying these figures are detailed to improve reader comprehension and adequately contextualize the visual data. This will enhance accessibility for a wider audience.
The article makes a valuable contribution to research on autonomous decision-making in air combat. However, enhancing methodological clarity, offering a more detailed analysis of the results, and ensuring that the Materials and Methods section is well-presented would strengthen the impact of this work. I recommend major revisions focusing on these aspects before publication.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have done a great job addressing my comments. I am happy with the revised manuscript.
Author Response
We would like to express our gratitude to you once again for your constructive comments and valuable suggestions. We will further expand and refine the proposed algorithm in our future work.
Reviewer 2 Report
Comments and Suggestions for Authors
I would like to congratulate you on the substantial improvements made, which clearly demonstrate the quality and rigor of your work.
I would, however, suggest a few minor adjustments to further enhance the clarity and presentation of your article:
Figures 1, 3, 6, and 7: Please consider increasing their size to improve readability.
Algorithm 1: Verify its placement to ensure that it does not interrupt the flow of the paragraph.
Figure 9: Adjust the color scheme to make the values more legible.
Figure 10:
Display the parameters HP, μ, and Vins as titles on the graph, and provide the triplet of these values to avoid overcrowding the figure.
Avoid using red text for annotations; using black for both would be more appropriate.
Check the size of the equations, as some appear excessively large, and reduce them if necessary to improve layout and readability.
Thank you once again for your efforts and for the improvements made to the manuscript.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 3
Reviewer 2 Report
Comments and Suggestions for Authors
I am pleased to propose the acceptance of your manuscript. Thank you for your valuable contribution and excellent work.