Article
Peer-Review Record

Cascaded Reinforcement Learning Agents for Large Action Spaces in Autonomous Penetration Testing

Appl. Sci. 2022, 12(21), 11265; https://doi.org/10.3390/app122111265
by Khuong Tran 1,*,†, Maxwell Standen 2,†, Junae Kim 2,†, David Bowman 2,†, Toby Richer 2,†, Ashlesha Akella 1,† and Chin-Teng Lin 1,*,†
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4:
Submission received: 30 September 2022 / Revised: 31 October 2022 / Accepted: 1 November 2022 / Published: 7 November 2022

Round 1

Reviewer 1 Report

This paper proposes a new cascaded reinforcement learning architecture, called CRLA, to tackle penetration testing scenarios with large discrete action spaces. It is validated on simulated scenarios from CybORG, in which CRLA showed superior performance to a single DDQN agent. Hence, I recommend the paper for possible publication in your reputed journal.

Author Response

Dear reviewer #1,

We thank you for your time and consideration in reviewing our paper. We appreciate your comment and have had the manuscript proofread accordingly.

Please let us know if the manuscript requires further revision.

Best regards,

Khuong

Reviewer 2 Report

This article presents a novel architecture, called CRLA, that handles large discrete action spaces in an autonomous penetration testing simulator, where the number of actions grows exponentially with the complexity of the designed cybersecurity network. The algebraic action decomposition strategy in the proposed CRLA aims to discover the optimal attack policy faster in scenarios with large action spaces, and it is shown to be more stable than a conventional deep Q-learning agent, which is commonly employed to bring artificial intelligence into autonomous penetration testing. The article addresses an interesting problem, and the experimental investigation is presented and evaluated.
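To make the decomposition idea concrete, the following is a minimal sketch of a two-level cascade over a factored discrete action space. It assumes the flat action space factors as a product of two smaller sub-spaces; all names, network shapes, and the one-hot conditioning scheme are illustrative assumptions, not the authors' CRLA implementation.

```python
import torch
import torch.nn as nn

class CascadedPolicy(nn.Module):
    """Illustrative two-level cascade for a factored discrete action space.

    A flat action space of size n1 * n2 is decomposed algebraically:
    the first agent picks sub-action a1 in [0, n1); the second agent
    conditions on a1 and picks a2 in [0, n2). The flat action index
    is recovered as a1 * n2 + a2. (Sketch only, not the paper's CRLA.)
    """

    def __init__(self, obs_dim: int, n1: int, n2: int, hidden: int = 64):
        super().__init__()
        self.n1, self.n2 = n1, n2
        self.q1 = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                nn.Linear(hidden, n1))
        # The second agent sees the observation plus a one-hot of a1.
        self.q2 = nn.Sequential(nn.Linear(obs_dim + n1, hidden), nn.ReLU(),
                                nn.Linear(hidden, n2))

    def act(self, obs: torch.Tensor) -> int:
        a1 = self.q1(obs).argmax().item()        # first sub-action
        one_hot = torch.zeros(self.n1)
        one_hot[a1] = 1.0
        a2 = self.q2(torch.cat([obs, one_hot])).argmax().item()
        return a1 * self.n2 + a2                 # flat action index

# Each head scores at most max(n1, n2) actions, so the joint space of
# n1 * n2 = 500 actions is never enumerated by a single Q-network.
policy = CascadedPolicy(obs_dim=8, n1=10, n2=50)
action = policy.act(torch.randn(8))
```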

The article introduces the problem briefly, then describes the proposed solution, and finally sheds light on its effectiveness, which is excellent.

The authors present an extensive and complete literature review contextualizing their contribution, and the references are mostly up-to-date and relevant.

All in all, I consider this contribution interesting enough to deserve acceptance.

Author Response

Dear Reviewer 2,

We appreciate your time and consideration in reviewing our manuscript. We are glad that you found our work useful and deserving of acceptance.

Thank you

Best regards,

Khuong

Reviewer 3 Report

The article answers the question of whether it is possible to reduce the effort of performing penetration testing on platforms with a large action space. The text contains a literature review and a comparative analysis of achievements on the subject to date.
The paper presents a novel architecture named deep cascaded reinforcement learning agents.
The problem considered in this paper is a discrete-time reinforcement learning task modelled as a Markov decision process.
The algebraic action decomposition strategy of CRLA is demonstrated to find the optimal attack policy in scenarios with complex action spaces.
The authors successfully applied the divide-and-conquer principle to answer the question posed, and the results of the research are convincingly presented.
Inconsistencies with the journal template were noticed in the cited sources.

Author Response

Dear Reviewer 3,

We appreciate your time and consideration in reviewing our manuscript. We are glad that you found our work useful and convincing. We have modified our citations to make them more consistent with the template, and the manuscript has been proofread.

Please let us know if the manuscript needs further revision.

Best regards,

Khuong

Reviewer 4 Report

This paper proposes employing deep cascaded reinforcement learning to deal with the large discrete action spaces of autonomous penetration testers. Experimental results demonstrate the effectiveness of the proposed scheme. I find this paper potentially useful to the community.

This paper is well prepared and easy to follow. However, I have a concern about the symbols in the equations. For example, in A^i, t^i, and u^i, do you mean A to the power of i? If not, I would suggest that the authors double-check and consider other ways of conveying the meaning, e.g., A^(i), t^(i), u^(i).

Author Response

Dear Reviewer 4,

We appreciate your time and consideration in reviewing our manuscript. We are glad that you found our work to be useful to the community.

Regarding the notation, the superscript i denotes the individual agent i; for example, a^i is the action generated by agent i. We use this notation (without parentheses) because it is the standard notation in the multi-agent reinforcement learning literature, such as in [1,2] (see also the illustrative notation sketch after the references below). If you need further clarification, please do not hesitate to let us know.

Again, we thank you for reviewing our work.

Best regards,

Khuong

[1] QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning.

[2] Rethinking the Implementation Tricks and Monotonicity Constraints in Cooperative Multi-Agent Reinforcement Learning.
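For illustration, the superscript convention described in the response to Reviewer 4 can be written out as follows. This is a generic rendering of standard multi-agent reinforcement learning notation, assumed from the cited literature rather than copied from the paper:

```latex
% Generic multi-agent RL notation: superscripts index agents (not powers),
% subscripts index time steps. Illustrative only; not excerpted from the paper.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
For $n$ agents, agent $i$ draws its action from its own action set:
\[
  a^i_t \in A^i, \qquad
  \mathbf{u}_t = \bigl(a^1_t, \dots, a^n_t\bigr) \in A^1 \times \dots \times A^n .
\]
\end{document}
```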
