Article
Peer-Review Record

Synthetic Experiences for Accelerating DQN Performance in Discrete Non-Deterministic Environments†

Algorithms 2021, 14(8), 226; https://doi.org/10.3390/a14080226
by Wenzel Pilar von Pilchau 1,*, Anthony Stein 2 and Jörg Hähner 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 30 June 2021 / Revised: 24 July 2021 / Accepted: 26 July 2021 / Published: 27 July 2021
(This article belongs to the Special Issue Algorithmic Aspects of Neural Networks)

Round 1

Reviewer 1 Report

The paper is well written and the results are satisfactory. The idea of using Interpolated Experience Replay to create synthetic experiences is very interesting. However, the problem description (4.1) is not very clear and should be improved. The presentation of the experimental results is also not clear and should be improved.

Author Response

First of all, thank you very much for the review and the valuable input from your side.

Point 1: However, the problem description (4.1) is not very clear and should be improved.

Response 1: The problem description was reworked and now covers two main topics. The first part of the chapter describes the FrozenLake environment as it is found in OpenAI Gym. The end of the first part describes the changes we made to the reward function, together with our reasons for doing so.
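For context, the kind of reward modification described above can be sketched as follows. The exact values and the function name are illustrative assumptions for this response, not the precise reward function used in the paper:

```python
def shaped_reward(done: bool, reached_goal: bool) -> float:
    """Illustrative FrozenLake reward shaping (hypothetical values).

    The standard Gym FrozenLake environment returns 1.0 only on reaching
    the goal and 0.0 otherwise, so falling into a hole and taking a safe
    step look identical to the agent. One common modification penalises
    episode-ending failures so the two outcomes are distinguishable.
    """
    if reached_goal:
        return 1.0   # goal tile reached
    if done:
        return -1.0  # episode ended without reaching the goal, i.e. a hole
    return 0.0       # ordinary step on frozen ice
```

Such a shaping makes the learning signal denser in non-deterministic settings, where the slippery dynamics would otherwise mask the consequences of an action for many episodes.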

The second part covers the problem of oscillation that occurs in the learning phase with vanilla ER. We formally describe what happens and present our expectations of how to mitigate this effect while simultaneously accelerating the learning process.
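To make the mitigation idea concrete, here is a minimal sketch of a replay buffer that, alongside real transitions, can emit a synthetic experience whose reward is the average of all rewards observed for a given state-action pair. The class and method names are our illustrative assumptions, not the exact implementation from the paper:

```python
from collections import defaultdict


class InterpolatedBuffer:
    """Sketch of an interpolated experience replay (IER) buffer.

    In a non-deterministic environment the same (state, action) pair can
    yield different rewards across episodes. Replaying a synthetic
    experience whose reward is the running average of all observed rewards
    is one way to damp the oscillation that vanilla ER can cause.
    """

    def __init__(self):
        self.real = []                    # stored real transitions
        self.rewards = defaultdict(list)  # (state, action) -> observed rewards

    def store(self, state, action, reward, next_state):
        """Store a real transition and record its reward."""
        self.real.append((state, action, reward, next_state))
        self.rewards[(state, action)].append(reward)

    def synthetic(self, state, action, next_state):
        """Build a synthetic transition with the averaged reward."""
        seen = self.rewards[(state, action)]
        avg_reward = sum(seen) / len(seen)
        return (state, action, avg_reward, next_state)
```

Training on such averaged transitions exposes the network to the expected reward of a state-action pair instead of individual noisy samples, which is the intuition behind the accelerated, less oscillatory learning curves.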

We hope the revised problem description is clearer and meets your expectations.


Point 2: The presentation of experimental results is not clear and should be improved. 

Response 2: The experimental results chapter now contains only the presentation of the graphs; the entire interpretation has been moved to the interpretation chapter and reworked to be more to the point.

Reviewer 2 Report

1. The proposed idea looks novel, but the experimental results are poor. The FrozenLake environment is too simplistic. Rather than deferring this to Future Work, the paper should be revised to present results for a more complex environment.

2. Throughout the paper there are many awkward English sentences. The wording needs to be completely revised.

3. The following are minor comments:
- Line 8: The word "Reaply" should be replaced with "Replay".
- Line 93: In the sentence that starts with "DQN perform an Q-learning…", the word "an" should be removed, or replaced by "a" if needed.

Author Response

First of all, thank you very much for your review and the valuable input.

Point 1: The proposed idea looks novel, but the experimental results are poor. The FrozenLake environment is too simplistic. Rather than deferring this to Future Work, the paper should be revised to present results for a more complex environment.

Response 1: You identified a very good point here, and we agree with you. Unfortunately, 10 days is not enough time to re-evaluate and, more importantly, to further develop our approach to work in continuous environments. The presented approach is designed for discrete, non-deterministic environments and only works in such settings. The FrozenLake environment fits perfectly here, which is why we chose this problem to evaluate our approach on. We are currently extending IER to interpolate follow-up states as well (which is needed for continuous environments), but we do not yet have satisfying results. By evaluating three different state encodings, we were able to examine different network structures and also introduce additional difficulties (conceptual aliasing states). This work serves as an initial study of this research area. We will cover all the points you mentioned in future work, but we cannot address them within this submission.

Point 2: Throughout the paper there are many awkward English sentences. The wording needs to be completely revised.

Response 2: The article was revised by a native speaker and now hopefully meets your expectations.

Point 3: The following are minor comments:
- Line 8: The word "Reaply" should be replaced with "Replay".
- Line 93: In the sentence that starts with "DQN perform an Q-learning…", the word "an" should be removed, or replaced by "a" if needed.

Response 3: See Response 2.

Round 2

Reviewer 2 Report

No further comments.
