Peer-Review Record

Discretionary Lane-Change Decision and Control via Parameterized Soft Actor–Critic for Hybrid Action Space

Machines 2024, 12(4), 213; https://doi.org/10.3390/machines12040213
by Yuan Lin, Xiao Liu * and Zishun Zheng
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Submission received: 18 February 2024 / Revised: 21 March 2024 / Accepted: 21 March 2024 / Published: 22 March 2024
(This article belongs to the Special Issue Data-Driven and Learning-Based Control for Vehicle Applications)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

In this study, the authors conducted simulated training for automatic lane-changing scenarios using both Deep Reinforcement Learning (DRL) and Model Predictive Control (MPC). Additionally, they used the PASAC (Parameterized Soft Actor-Critic) algorithm to train a model equipped with a DRL strategy.

Discussion:

- We cannot consider the comparison between DRL and MPC as a contribution;

- The authors have used (but not proposed) the PASAC algorithm to train a model equipped with a DRL strategy;

- List the various parameters.

- Introduce equation 21 (motivation).

- Comment on equation 21 (meaning of each term).

- The titles of Figures 6 and 7 are very long.

- The authors can use the Bayesian technique to select the best hyperparameters for each case study.

- Give the complexity of the different models used;

- Can the authors give a mathematical explanation of the behaviors described between lines 332 and 337?

- In the abstract, the authors said "under consistent conditions of relevant hyperparameters, both MPC and PASAC achieve a collision rate of 0%", so they should give these conditions in the conclusion.

Author Response

Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding highlighted changes in the re-submitted manuscript.


Comments 1: We cannot consider the comparison between DRL and MPC as a contribution;

Response 1: Thank you for your comment. To date, there has been no comparative study of automatic lane-change decision-making between hybrid-action DRL and MPC, particularly one comparing MPC costs for selecting different lanes. We therefore believe our research makes a contribution to this field.


Comments 2: The authors have used (but not proposed) the PASAC algorithm to train a model equipped with a DRL strategy;

Response 2: Thank you for your comment. Indeed, other scholars have proposed similar algorithms, but to date no one has applied them to lane changing.

Specifically, we used the Parameterized Soft Actor-Critic (PASAC) algorithm to train a DRL-based lane-change strategy that outputs both discrete lane-change decisions and continuous longitudinal vehicle acceleration.                          --Abstract

In this study, we used a novel hybrid-action reinforcement learning algorithm, PASAC, and compared it with MPC in the decision and control problems of autonomous vehicles during the lane-change process.                          --Section 6
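As background for readers, the sketch below illustrates in Python (PyTorch) how a hybrid-action policy head of the kind described might emit both a discrete lane-change decision and a continuous acceleration. The architecture, dimensions, and names are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class HybridPolicyHead(nn.Module):
    """Illustrative PASAC-style policy head (an assumed architecture, not the paper's).

    Emits logits over discrete lane-change decisions (keep lane / change left /
    change right) plus the mean and log-std of a Gaussian over the continuous
    longitudinal acceleration.
    """
    def __init__(self, state_dim: int, hidden: int = 256, n_discrete: int = 3):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.discrete_logits = nn.Linear(hidden, n_discrete)  # lane-change decision
        self.accel_mean = nn.Linear(hidden, 1)                # continuous acceleration
        self.accel_log_std = nn.Linear(hidden, 1)

    def forward(self, state: torch.Tensor):
        h = self.trunk(state)
        log_std = self.accel_log_std(h).clamp(-20, 2)  # usual SAC-style clamp
        return self.discrete_logits(h), self.accel_mean(h), log_std

# Sampling one hybrid action (all numbers are placeholders):
policy = HybridPolicyHead(state_dim=10)
logits, mean, log_std = policy(torch.randn(1, 10))
lane_decision = torch.distributions.Categorical(logits=logits).sample()
accel = torch.tanh(torch.distributions.Normal(mean, log_std.exp()).rsample())  # squash to [-1, 1]
```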

Comments 3: Introduce equation 21 (motivation) and comment on equation 21 (meaning of each term).

Response 3: Thank you for your comment. We have supplemented it accordingly.

The primary objectives of the first and second terms are to ensure an appropriate following distance between the ego and lead vehicles. The third term aims to reduce the likelihood of collisions and maintain a safe ego speed. The fourth term is designed to enhance driving comfort by penalizing jerk.                          --Section 4.2
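Since Eq. (21) itself is not reproduced in this record, a generic stage cost of the shape the authors describe might read as follows; the weights w_i, reference values, and exact penalty forms are assumptions for illustration only:

```latex
J \;=\; \sum_{k=0}^{N-1} \Big[
      \underbrace{w_1\,\big(d_k - d_{\mathrm{ref}}\big)^2}_{\text{following distance}}
    + \underbrace{w_2\,\big(v_k - v_{\mathrm{lead},k}\big)^2}_{\text{speed matching}}
    + \underbrace{w_3\,\max\!\big(0,\; d_{\mathrm{safe}} - d_k\big)^2}_{\text{collision avoidance}}
    + \underbrace{w_4\,\big(a_k - a_{k-1}\big)^2}_{\text{jerk penalty}}
\Big]
```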

Comments 4: The titles of Figures 6 and 7 are very long.

Response 4: Thank you for your comment. We have rewritten the titles of Figures 6 and 7; the revised captions appear in Section 5.3.

Comments 5: The authors can use the Bayesian technique to select the best hyperparameters for each case study, and give the complexity of the different models used.

Response 5: Thank you for your comment. Currently, we do not have sufficient time to conduct this research, but we will carefully consider it in the future. The complexity of the DRL model is high, since it is based on deep neural networks, whereas that of the MPC model is low, since it has only two state variables.

Comments 6: Can the authors give a mathematical explanation of the behaviors described between lines 332 and 337?

Response 6: Thank you for your comment. In lines 332–337, the ego vehicle's behaviors depend on the traffic conditions. Moreover, since neural networks are a black-box solution, it is challenging to explain the DRL solution from a mathematical perspective.

Comments 7: List the various parameters.

Response 7: Thank you for your comment. We have made adjustments to the various traffic flow parameters.                          --Table 4

Comments 8: In the abstract, the authors said "under consistent conditions of relevant hyperparameters, both MPC and PASAC achieve a collision rate of 0%", so they should give these conditions in the conclusion.

Response 8: Thank you for your comment. We have added this content:

DRL training and MPC optimization had the same reward (cost) function, and both MPC and PASAC achieved a collision rate of 0%.                          --Section 6


Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

The study presents an innovative approach to tackling the challenge of automatic lane changing in autonomous driving, a critical aspect of the broader field of autonomous vehicle technology. By integrating Deep Reinforcement Learning (DRL) and Model Predictive Control (MPC) into simulated training scenarios, and introducing the PASAC algorithm, the research contributes valuable insights into the decision-making mechanisms of autonomous driving systems. However, several critical comments and areas for further exploration emerge from this study:

  • The comparison between DRL (specifically, PASAC) and MPC is commendable for its novelty. However, the study could benefit from a more detailed methodological description, including the specific configurations of the DRL and MPC models, to enhance reproducibility and allow for a deeper understanding of why these methods perform equally well in terms of collision rates.
  • Further exploration into how each method handles different traffic densities, weather conditions, and unexpected road incidents could provide more nuanced insights into their respective capabilities and limitations.
  • While simulation results are promising, the transition from simulated environments to real-world application entails numerous challenges not addressed in the study. Factors such as sensor noise, real-time processing constraints, and interaction with non-autonomous vehicles are critical for the practical deployment of autonomous driving technologies.
  • Investigating the scalability of the PASAC algorithm and its integration with other autonomous driving functions, such as adaptive cruise control and emergency braking, could reveal its potential as a holistic decision-making framework for autonomous vehicles.

Comments on the Quality of English Language

No comments

Author Response

Response to Reviewer 2 Comments

Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding highlighted changes in the re-submitted manuscript.


Comments 1: The comparison between DRL (specifically, PASAC) and MPC is commendable for its novelty. However, the study could benefit from a more detailed methodological description, including the specific configurations of the DRL and MPC models, to enhance reproducibility and allow for a deeper understanding of why these methods perform equally well in terms of collision rates.

Response 1: Thank you for your comment. In both MPC and DRL, the reward/cost function is the same, and they also have the same actions. The DRL environment state is selected based on the MPC model, and the traffic flow conditions are the same. We hope this clarifies the methodology.


Comments 2: Further exploration into how each method handles different traffic densities, weather conditions, and unexpected road incidents could provide more nuanced insights into their respective capabilities and limitations.

Response 2: Thank you for your comment. We have supplemented the tests for different traffic densities in Table 4. As for other conditions such as weather and unexpected road incidents, we will investigate them in future research. We added the following in Section 6.

In the future, we will also consider more complex scenarios, such as weather conditions and unexpected road incidents.                          --Section 6

Comments 3: While simulation results are promising, the transition from simulated environments to real-world application entails numerous challenges not addressed in the study. Factors such as sensor noise, real-time processing constraints, and interaction with non-autonomous vehicles are critical for the practical deployment of autonomous driving technologies. Investigating the scalability of the PASAC algorithm and its integration with other autonomous driving functions, such as adaptive cruise control and emergency braking, could reveal its potential as a holistic decision-making framework for autonomous vehicles.

Response 3: Thank you for your comment. We will thoroughly consider the challenges mentioned and conduct research on the scalability of the PASAC algorithm, as well as its integration with other autonomous driving functions such as adaptive cruise control and emergency braking in the future.

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

This paper compares a DRL and an MPC algorithm for automated vehicle lane changes. The DRL agent is designed as a PASAC. SUMO simulations show the improved performance of DRL with respect to MPC. The authors can consider the following comments, which could probably enhance the draft in its current form.

Major Issue:

1. Page 3, line 15, quoting the authors: “The DRL (Deep Reinforcement Learning) solution, based on neural networks, has short computation times and is suitable for real-time applications.” Even though DRL does not involve online numerical optimization, the offline training process can be time-consuming, which MPC can avoid.

2. The MPC formulation (20) simplifies the vehicle as a point mass. In fact, vehicle dynamics are much more complicated. The authors may refer to [R1] for details.

3. Below (21), quoting the authors: “Formula (15) is consistent with the PASAC reward function.” Since the prediction horizon is infinite, how could that be implemented in MPC? Please explicitly formulate the MPC problem, as in [R2].

Minor issue:

  1. Page 7, line 224, equation (19)?

  2. How were the weights in Table 1 determined?

[R1]. Wang, Z., Cook, A., Shao, Y., Xu, G. and Chen, J.M., 2023, June. Cooperative merging speed planning: A vehicle-dynamics-free method. In 2023 IEEE Intelligent Vehicles Symposium (IV) (pp. 1-8). IEEE.

[R2]. Falcone, P., Borrelli, F., Asgari, J., Tseng, H.E. and Hrovat, D., 2007. Predictive active steering control for autonomous vehicle systems. IEEE Transactions on Control Systems Technology, 15(3), pp.566-580.

Comments on the Quality of English Language

Fine

Author Response

Response to Reviewer 3 Comments

Thank you for taking the time to review this manuscript. Please find the detailed responses below and the corresponding highlighted changes in the re-submitted manuscript.

Comments 1: Page 3, line 15, quoting the authors: “The DRL (Deep Reinforcement Learning) solution, based on neural networks, has short computation times and is suitable for real-time applications.” Even though DRL does not involve online numerical optimization, the offline training process can be time-consuming, which MPC can avoid.

Response 1: Thank you for your comment. We have made modifications.

On the other hand, the DRL solution, based on neural networks, despite being time-consuming during offline training, has short execution times and is suitable for real-time applications.                          --Section 1

Comments 2: The MPC formulation (20) simplified the vehicle as a point mass. In fact, vehicle dynamics are much more complicated. The authors may refer [R1] for details.

Response 2: Thank you for your comment. We have added relevant content.

It is worth noting that we simplify the vehicles to point masses, without considering the low-level vehicle dynamics [35].                          --Section 4.1

[35] Wang Z, Cook A, Shao Y, et al. Cooperative merging speed planning: A vehicle-dynamics-free method[C]//2023 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2023: 1-8.                          --References


Comments 3: Below (21), quoting the authors: “Formula (15) is consistent with the PASAC reward function.” Since the prediction horizon is infinite, how could that be implemented in MPC? Please explicitly formulate the MPC problem, as in [R2].

Response 3: Thank you for your comment. We have supplemented the relevant content:

In the prediction horizon of MPC, Falcone et al. fixed the values of the slip and friction coefficients within the predictive time horizon, ensuring they remain constant and equal to the estimated values at the current moment [36]. Similarly, in this paper, we choose N = 5 as the prediction horizon and use the leading vehicle's velocity at the current time step k throughout the horizon.                          --Section 4.5

[36] Falcone P, Borrelli F, Asgari J, et al. Predictive active steering control for autonomous vehicle systems[J]. IEEE Transactions on Control Systems Technology, 2007, 15(3): 566-580.                          --References
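To make this receding-horizon treatment concrete, here is a minimal single-step MPC sketch in Python (using cvxpy) in which the lead vehicle's velocity is frozen at its current value over an N = 5 horizon, as described above. The dynamics, bounds, weights, and numbers are illustrative assumptions, not the paper's exact formulation.

```python
import cvxpy as cp

# One illustrative MPC step: point-mass ego following a lead vehicle whose
# velocity is held constant over the horizon, as in the response above.
# All numbers, bounds, and weights are placeholders, not the paper's values.
N, dt = 5, 0.1
d0, v0, v_lead = 30.0, 25.0, 24.0      # current gap [m], ego and lead speeds [m/s]
d_ref = 20.0                           # desired following gap [m]
w_gap, w_vel, w_acc = 1.0, 0.5, 0.1    # illustrative cost weights

a = cp.Variable(N)        # control sequence: ego acceleration [m/s^2]
d = cp.Variable(N + 1)    # gap to the lead vehicle
v = cp.Variable(N + 1)    # ego velocity

constraints = [d[0] == d0, v[0] == v0]
for k in range(N):
    constraints += [
        v[k + 1] == v[k] + dt * a[k],             # ego point-mass dynamics
        d[k + 1] == d[k] + dt * (v_lead - v[k]),  # lead velocity frozen at step k's value
        cp.abs(a[k]) <= 3.0,                      # comfort/actuator bound
    ]
constraints += [d[1:] >= 5.0]                     # hard safety gap

cost = cp.sum(w_gap * cp.square(d[1:] - d_ref)
              + w_vel * cp.square(v[1:] - v_lead)
              + w_acc * cp.square(a))
cp.Problem(cp.Minimize(cost), constraints).solve()
print("acceleration to apply this step:", a.value[0])  # receding horizon: apply only a[0]
```

In a receding-horizon loop, only the first acceleration a[0] would be applied before re-solving with the newly measured lead velocity.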


Comments 4: Page 7, line 224, equation (19)?

Response 4: Thank you for your comment. This is where the control variable in MPC, the acceleration, corresponds to the continuous action in DRL. Equation (19) is the state-space equation.
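Since Eq. (19) is not reproduced in this record, a common discrete-time state-space form with acceleration as the control input, consistent with the two state variables mentioned in Response 5 to Reviewer 1, would be (an assumed illustration, not necessarily the paper's exact equation):

```latex
x_{k+1} = A\,x_k + B\,u_k,\qquad
x_k = \begin{bmatrix} s_k \\ v_k \end{bmatrix},\quad
A = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix},\quad
B = \begin{bmatrix} \tfrac{1}{2}\Delta t^{2} \\ \Delta t \end{bmatrix},\quad
u_k = a_k
```

where s_k and v_k denote position and velocity and a_k is the commanded acceleration.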


Comments 5: How were the weights in Table 1 determined?

Response 5: Thank you for your comment. We have supplemented as follows.

The weights were determined through manual tuning.                          --Section 3.4

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The authors addressed most of the questions.

Comments on the Quality of English Language

It is quite good.

Author Response

Thanks for your comments.

Reviewer 3 Report

Comments and Suggestions for Authors

The revised manuscript is acceptable.

Author Response

Thanks for your comments.
