Review Reports - Secure State Estimation of Cyber-Physical System under Cyber Attacks: Q-Learning vs. SARSA

Round 1

Reviewer 1 Report

The paper presents a framework for comparing RL algorithms. The comparison is carried out in the context of a security-related use case, i.e., a CPS under a DoS attack.

The paper has a good structure, appropriate language with only minor typos, and sufficient related work.

The paper's contribution is the provided framework and the comparison of two algorithms. The framework itself is composed of known techniques, albeit with tailoring and a different take on them. However, it is not clear whether it can be generalized to compare other algorithms or to other use cases (e.g., what would be the outcome of changing the game formulation?).

It is interesting that the system environment is not represented as a player in the game; it would be worthwhile to add a few sentences on how including it might change the framework.

The evaluation and comparison of the algorithms is also a claimed contribution. However, I would strongly suggest presenting the outcomes of the evaluation in more detail: what can actually be concluded, when each algorithm is preferable, whether the difference is significant, and which factors other than the Q-table are considered (computational complexity, required resources, timing). Without such an evaluation, the comparison itself looks more like an engineering task.

As the paper is grounded in a security use case, its potential readers could come from the security domain. Thus, I would suggest explaining the ML aspects in more detail. What does a lower Q-table mean when applied to the estimation of a security state? Even a few examples of what the ML-related parameters mean in practice would improve the paper's readability.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Secure state estimation of cyber-physical system under cyber attacks: Q-learning vs SARSA

1. I suggest that the two-player zero-sum game be explained in much more detail.

2. Figure 1 needs major revision and a more detailed depiction.

3. A detailed flow chart that clearly highlights each step is required; Figure 3 is not sufficient.

4. Section 4.3 requires major revision, particularly regarding the Nash equilibrium policies of the sensor and the attacker.

5. Tables 1-9 need detailed explanations.

6. Please revise the conclusion for better clarity.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

This paper proposes a reinforcement learning (RL) algorithm for the security problem of state estimation of a cyber-physical system (CPS) under a denial-of-service (DoS) attack. In my view, this paper is well organized and the proposed method is valuable for this research field. After reviewing this paper, I have the following questions and suggestions.

Has the effect of other action selection methods, such as softmax, been investigated?
The choice of the epsilon value should be discussed in more detail, together with a sensitivity analysis.
The role of the parameters of the proposed algorithm should be discussed in a separate section. For example, which parameters control exploration and which control exploitation?
Some figures need to be enhanced in terms of quality and resolution.
All significant similar works should be reviewed, including good recent works in this area that are most similar to this paper. The authors are advised to go through the following feature selection algorithms and MAY make use of them when updating the introduction and related work sections: Ant-TD and EFR-ESO.
What are the advantages and disadvantages of this study compared to the existing studies in this area? This needs to be addressed explicitly and in a separate subsection.
There are many grammatical mistakes and typographical errors.
Please provide pseudocode in a standard format for the proposed algorithm.
The proposed method should be compared with at least 3 other novel methods.
The experimental results indicate that the proposed method performs well, but a stronger theoretical analysis and justification of the algorithm would be more convincing. The objective of the research should be stated clearly in terms of the problems addressed and the expected results, showing how the proposed technique advances the state of the art by overcoming the limitations of existing work. In addition, the results obtained must be interpreted.
The proposed algorithm should be analyzed experimentally in terms of running time and compared with other algorithms.
Some final cosmetic comments:

* The results of your comparative study should be discussed in depth, with more insightful comments on the behaviour of your algorithm in various case studies. Discussing results should not mean reading out the tables and figures once again.
* Avoid lumping references, as in [x, y] and elsewhere. Instead, summarize the main contribution of each referenced paper in a separate sentence. In scientific and research papers, it is not necessary to give several references that say exactly the same thing. That would be strange anyway, since what would then be the innovative scientific contribution of the referenced papers? For each statement, cite only one reference.
* Avoid using first person.
* Avoid using abbreviations and acronyms in title, abstract, headings and highlights.
* Please avoid placing one heading directly after another with nothing in between; either merge the headings or provide a short paragraph in between.
* The first time you use an acronym in the text, please write the full name followed by the acronym in parentheses. Do not use acronyms in the title, abstract, chapter headings, or highlights.
* The results should be further elaborated to show how they could be used for the real applications.

* Are all the images used in this work copyright-free? If not, have the authors obtained proper copyright permission to reuse them? Please clarify; this is simply to ensure that all the figures can be published in this work.

* Also, the list of references should be carefully checked to ensure consistency among all references and their compliance with the journal's referencing policy.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Secure state estimation of cyber-physical system under cyber attacks: Q-learning vs SARSA

The current state of the manuscript may be considered for publication.

Reviewer 3 Report

Good revisions have been made to the paper, and the revised version, compared with the previous one, has the necessary qualities for acceptance. In my opinion, the article is acceptable in its current form.