Open Access Article

Peer-Review Record

Research on Wargame Decision-Making Method Based on Multi-Agent Deep Deterministic Policy Gradient

Appl. Sci. 2023, 13(7), 4569; https://doi.org/10.3390/app13074569

by Sheng Yu, Wei Zhu * and Yong Wang

Reviewer 1: Kyungyong Chung

Reviewer 2: Muhammet Deveci

Reviewer 3: Marco Zamora

Submission received: 8 March 2023 / Revised: 30 March 2023 / Accepted: 30 March 2023 / Published: 4 April 2023

(This article belongs to the Special Issue Applications of Artificial Intelligence and Machine Learning in Games)

Round 1

Reviewer 1 Report

The proposed approach appears to be technically sound for a relevant topic. A use case for the decision-making method based on Multi-Agent Deep Deterministic Policy Gradient would be interesting to motivate the problem. Is there a specific reason for using a wargame decision-making method, given that removing the background and vectorization are common methods?

Related work could be cited to validate the novelty of this proposal. The motivation could be improved by providing real-world examples of the labeled data shortage problem.

Also, it would be more convincing if the advantages of using a MADDPG architecture for achieving a better result were pointed out. Perhaps, some more systematic research could go into the choice of the methods. Also, some more insight on related work would be helpful in understanding the contribution of this work.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

This paper is very interesting and very well written. However, it needs the following improvements:

- The Introduction is too long. I suggest the authors remove unnecessary information from this section.
- The authors should highlight the study contribution, study objectives, and study novelty.
- Some of the references are out of date. I suggest the authors remove old references and add new ones.
- In Section 2, the authors provided a literature review. I advise the authors to present the current literature and their contribution to it in a summary table.
- Some related papers should be discussed in the paper: (i) Evaluation of Supplier Selection in the Defense Industry Using q-Rung Orthopair Fuzzy Set based EDAS Approach, (ii) On extended power geometric operator for proportional hesitant fuzzy linguistic large-scale group decision-making.
- A comparison analysis should be added to show the superiority of the proposed method. In that section, the authors need to compare the study results with other existing methods.
- Limitations and recommendations for further work should be discussed in the conclusion section.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

Dear Authors

Thank you very much for inviting me to review the paper entitled: Research on Wargame Decision-making Method Based on Multi-Agent Deep Deterministic Policy Gradient. The paper is well structured according to the parameters of the journal. The research topic is well chosen, and I would like to share some observations to improve the work. I will divide the comments into sections that correspond to your work:

Abstract

Wargames serve as simulators for various war scenarios, but as the speed of warfare continues to increase, traditional wargame decision-making methods may no longer be sufficient. Therefore, wargame assisted decision-making methods that incorporate artificial intelligence techniques, especially reinforcement learning, have emerged as a promising solution. However, the current wargame environment generally suffers from a large decision space and sparse rewards. To address these problems, we investigate a wargame decision-making method based on Multi-Agent Deep Deterministic Policy Gradient (MADDPG). To adapt to the wargame environment, we introduced the Partially Observable Markov Decision Process (POMDP), joint action-value function and Gumbel-Softmax estimator to optimize MADDPG. We designed the wargame decision-making method based on the improved MADDPG, added supervised learning to improve training efficiency and reduce action space before the reinforcement learning phase, introduced policy gradient estimator for reducing action space and obtaining the global optimal solution, and additional reward function is designed to solve the sparse reward problem. The experimental results show that our wargame decision-making method performs better in wargame compared with the algorithm before optimization.
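As a reading aid for the abstract above, the following is a minimal sketch of how a Gumbel-Softmax estimator can turn the scores of a policy over discrete actions into a relaxed, nearly one-hot action vector. It is not the authors' implementation: the function names `gumbel_softmax_sample` and `to_one_hot`, the `temperature` parameter, and the example logits are all assumptions made for illustration, and the sketch shows only the forward sampling step, not the gradient computation used during training.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature=1.0, rng=None):
    """Return a relaxed one-hot sample over discrete actions.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = rng or random.Random()
    noisy = []
    for logit in logits:
        # Draw Gumbel(0, 1) noise: g = -log(-log(u)), u ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep logs finite
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def to_one_hot(probs):
    """Discretize a relaxed sample by picking its largest component."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(probs))]


if __name__ == "__main__":
    # Assumed example: unnormalized scores for three discrete actions.
    sample = gumbel_softmax_sample([1.0, 2.0, 0.5], temperature=0.5)
    print(sample, to_one_hot(sample))
```

Lower values of `temperature` push the relaxed sample closer to a one-hot vector, while higher values make it closer to uniform; the appropriate setting for a given wargame environment is not stated in this record.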

Structure of the work: it meets the criteria established by the journal Applied Sciences.

Title of the paper: it is congruent with the content.

Abstract: it is adequately presented and should awaken readers' interest in the subject.

Keywords: adequately defined.

Introduction. The Introduction is duly detailed, includes a review of the literature on the main aspects addressed and includes the general objective of the work where they propose a deep learning method.

• It would be good to include a research question or a research assumption after the literature review where they have mentioned similar works.

• It would be good to include a paragraph at the end of the introduction where the authors explain why the proposed work is innovative.

• Literature review (Related Work). The work is well supported by similar or related studies.

Method

• The method employed is adequate for the main aspects addressed by the work.

• It would be good to review the nomenclature used in the mathematical formulas described in the paper, particularly in paragraph 3.2.1 Partially Observable Markov Decision Process.

• It would be good to review formulas (1) to (8), verify that the nomenclature used is correct.

Results

• They are properly presented.

Discussion

• It would be good to include a discussion of the results before the conclusions, including results of similar studies to support the results obtained, and mentioning again the results of the investigations described in point 2.

Conclusions and Future Work

• It would be good to separate this section into two paragraphs.

• The first paragraph should present the conclusions and indicate more precisely how the main objective of the research was fulfilled.

• For the future work part, where the authors mention that they intend to design decision-making methods directly using discrete algorithms based on policies, it would be better to give a broader explanation (Line 512).

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

It can be accepted from my side.

Author Response

Thank you very much for your valuable comments!

Reviewer 3 Report

Dear Authors

Thank you very much for considering us to review your article entitled: Research on Wargame Decision-making Method Based on Multi-Agent Deep Deterministic Policy Gradient. I believe the improvements have been significant, providing greater clarity to readers interested in the topic.

I can see that the authors have complied with the suggestions regarding the introduction, the verification of the nomenclature in the formulas, and the bibliography, and that the conclusions have improved a lot.

Author Response

Thank you very much for your valuable comments!
