Article
Peer-Review Record

Proximal Policy Optimization Through a Deep Reinforcement Learning Framework for Multiple Autonomous Vehicles at a Non-Signalized Intersection

Appl. Sci. 2020, 10(16), 5722; https://doi.org/10.3390/app10165722
by Duy Quang Tran and Sang-Hoon Bae *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 22 July 2020 / Revised: 15 August 2020 / Accepted: 17 August 2020 / Published: 18 August 2020
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

In the sentence "According to a national motor vehicle crash survey…", you should specify which national survey you are referring to.

I suggest that you use the following references for deeper insight into the application of DRL in signalized intersection control: "Application of Deep Reinforcement Learning in Traffic Signal Control: An Overview and Impact of Open Traffic Data" and "The use of cooperative approach in intelligent speed adaptation".

In Figure 1, it should be clearly noted where you use a DNN, since you mention DRL. Do you consider the MLP to be a DNN? If yes, please elaborate on why. An MLP can have numerous hidden layers, but that alone does not make such a network "deep".

I did not fully understand Figure 1: does it refer only to a single AV at the intersection, or to all of them? Please elaborate on this and re-arrange the figure accordingly. Currently, the figure shows (a picture of) one AV, which I found confusing (the inscription below the picture of the AV is very blurry). All vehicles at the intersection form the state, and this should be presented clearly in the figure.

In your paper, are you only optimizing the speed of all AVs at the intersection? Are you not computing their trajectories? Are those trajectories governed by SUMO? This should be clearly elaborated in the paper. At this point, the approach looks somewhat like an ISA (intelligent speed adaptation) system for AVs.
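
To illustrate the speed-only control question above, the following is a minimal sketch (not the authors' code) of how an RL action can command only target speeds through SUMO's TraCI API while SUMO's own car-following model generates the resulting trajectories; the configuration file name, the vehicle ID scheme, and the fixed speed value are assumptions for illustration:

    # Minimal sketch: an RL action only commands target speeds for the AVs;
    # SUMO's car-following and lane-change models then produce the trajectories.
    import traci

    traci.start(["sumo", "-c", "intersection.sumocfg"])  # hypothetical config file

    for step in range(100):
        av_ids = [v for v in traci.vehicle.getIDList() if v.startswith("av")]  # assumed ID scheme
        for vid in av_ids:
            target_speed = 8.0  # m/s; in the paper this would come from the trained policy
            traci.vehicle.setSpeed(vid, target_speed)  # speed command only, not a trajectory
        traci.simulationStep()  # SUMO integrates positions/trajectories internally

    traci.close()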

In Table 2, is the number of hidden layers part of the PPO hyperparameters or of the MLP network structure? I am confused about what 256 x 256 x 256 means. The batch size is significantly larger than the number of iterations; can you elaborate on that? It would be great to provide a detailed explanation of the batch structure.
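
For reference, the following minimal sketch shows one common way such settings are expressed (assuming a Stable-Baselines3-style PPO, which is not necessarily what the authors used): "256 x 256 x 256" would then denote an MLP with three hidden layers of 256 units, and the mini-batch is drawn from each collected rollout of n_steps samples rather than being tied to the number of training iterations. All numeric values below are illustrative:

    # Minimal sketch (assumption: Stable-Baselines3-style PPO; the paper's own
    # implementation is not shown here).
    import gymnasium as gym
    from stable_baselines3 import PPO

    env = gym.make("CartPole-v1")  # placeholder environment, not the intersection scenario

    model = PPO(
        "MlpPolicy",
        env,
        n_steps=2048,                                   # rollout length per update
        batch_size=256,                                 # mini-batch drawn from the 2048-sample rollout
        n_epochs=10,                                    # passes over each rollout
        learning_rate=3e-4,
        policy_kwargs=dict(net_arch=[256, 256, 256]),   # "256 x 256 x 256" hidden layers
    )
    model.learn(total_timesteps=50_000)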

Results are nicely presented and elaborated.

Author Response

Dear Reviewer,

We would like to thank you for your careful and thorough reading of our manuscript and for your thoughtful comments and constructive suggestions. In particular, thank you for giving us this opportunity to further revise our manuscript. We have revised the manuscript in response to your suggestions and hope that the improved version is acceptable for publication in Applied Sciences.

Please refer to the attached file for our responses.

We look forward to hearing from you.

Best regards.

Author Response File: Author Response.docx

Reviewer 2 Report


  1. Overall comment
    - The paper shows the possibility of using autonomous vehicles (AVs) at a non-signalized intersection as a way of relieving traffic congestion, and it demonstrates experimentally that fully autonomous traffic is more efficient than human-driven traffic. However, the explanation of the paper's main contribution seems to be lacking. PPO is an existing deep RL algorithm. Please consider adding more details on what has been developed or improved relative to standard PPO through PPO (Proximal Policy Optimization) hyperparameter optimization. For example, you could include specific information such as the procedure used to optimize the PPO hyperparameters, or which points improve on the standard PPO hyperparameters.
    - The paper explains the problem with the current situation and its solution, and the figures and tables are helpful for understanding the article. However, it lacks recent related work. Many studies attempt to solve problems such as traffic congestion, and a lot of recent related work is based on deep reinforcement learning. It would be helpful if the authors added a comparison between the proposed paper and the latest research.
  2. Comments on Methods
    - The paper presents the main components of reinforcement learning, such as the state and action, and describes the model's learning method through the overall architecture. However, it lacks an explanation of the design of the reward function in Section 2.4.6 (Reward function). The reward function is one of the most important elements in reinforcement learning, and the learning performance depends on how it is designed. As the paper uses the speed value in the reward function, explaining why speed was chosen would help readers understand the model's performance (see the sketch after this list for one possible illustration).
  3. Comments on Experiment
    - The paper performs various experiments in a virtual simulation environment to show the efficiency of autonomous vehicles (AVs). However, applying the approach to a real environment does not seem feasible, as the scale of the road network used in the experiments is too small. If additional experiments in a larger road network are difficult to perform, it would be good to show that the proposed algorithm can be extended to a larger environment.
    - PPO hyperparameter optimization is one of the main contributions of this paper. It would aid understanding if the authors added a performance comparison between the hyperparameter-optimized PPO model and a baseline (control) model in the experiments of Section 3.
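
Regarding the reward-function point above, the following is a minimal sketch of a speed-based reward of the kind the comment refers to; the paper's exact formulation, weights, and collision handling are not reproduced here, and the desired speed and penalty values are assumptions:

    # Minimal sketch of a speed-based reward (illustrative only).
    def speed_reward(speeds, desired_speed=10.0, collision=False):
        """Reward closeness of the average AV speed to the desired speed; penalize collisions."""
        if collision:
            return -10.0                  # hypothetical collision penalty
        if not speeds:
            return 0.0
        avg_speed = sum(speeds) / len(speeds)
        # Encourages throughput: reward grows as AVs approach the desired speed.
        return 1.0 - abs(desired_speed - avg_speed) / desired_speed

    # Example: all AVs near the desired speed -> reward close to 1.0
    print(speed_reward([9.5, 10.2, 9.8]))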

Author Response

Dear Reviewer,

We would like to thank you for your careful and thorough reading of our manuscript and for your thoughtful comments and constructive suggestions. In particular, thank you for giving us this opportunity to further revise our manuscript. We have revised the manuscript in response to your suggestions and hope that the improved version is acceptable for publication in Applied Sciences.

Please refer to the attached file for our responses.

We look forward to hearing from you.

Best regards.

Author Response File: Author Response.docx
