Cloud-Assisted Nonlinear Model Predictive Control with Deep Reinforcement Learning for Autonomous Vehicle Path Tracking
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
- In line 112, the tire–road friction coefficient is introduced. However, this parameter characterizes not only the road surface but also the tire–road interaction. Please expand on this aspect and clarify how it is modelled or measured in this work.
- In the Introduction, several approaches for path-tracking control are mentioned; however, the discussion is limited to PID and MPC controllers. There are many other relevant methods, such as robust control and LQR-based approaches, that should also be acknowledged. It is recommended to include references to additional related works, for example: "Static Output-Feedback Path-Tracking Controller Tolerant to Steering Actuator Faults for Distributed Driven Electric Vehicles" (robust control), and "Robust Path Tracking Control with Lateral Dynamics Optimization: A Focus on Sideslip Reduction and Yaw Rate Stability Using Linear Quadratic Regulator and Genetic Algorithms" (LQR control).
- In Equation (12), the reward function for the reinforcement learning (RL) agent is presented. The deviation of the vehicle position with respect to the reference trajectory is defined based on a sinusoidal path used in this study. Although this approach is suitable for the given case, it lacks generalizability, as the RL model may not perform adequately with different trajectory types. This assumption simplifies training but limits applicability. Please consider providing a more general definition of the vehicle’s deviation from the trajectory.
- More details regarding the training of the RL agent should be included. Please specify aspects such as the computer hardware used (e.g., CPU model, GPU type), training duration, and any relevant computational settings.
- Please specify which software or simulation environment was used to model the vehicle dynamics.
- Additional maneuvers should be included to more thoroughly validate the proposed approach. Furthermore, please indicate the time delay between the NMPC controller and the vehicle during simulation. Several key implementation details appear to be missing and should be provided for completeness.
- More up-to-date works should be cited.
Author Response
Comment 1. In line 112, the tire–road friction coefficient is introduced. However, this parameter characterizes not only the road surface but also the tire–road interaction. Please expand on this aspect and clarify how it is modelled or measured in this work.
Response. Thank you for the comment. The tire-road friction coefficient µi indeed characterizes the tire-road interaction and describes the relationship between the lateral tire force and the vertical load. In our study, we assume that µi is constant and known. In the numerical study, it was set to 1, as shown in the newly added Table 1. We have clarified this in the revised version.
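For reference, a minimal sketch of the role of µi under a simplified friction-saturation view (our illustrative assumption, not necessarily the exact tire model used in the paper) is:

```latex
% Illustrative sketch (our assumption, not necessarily the paper's tire model):
% the friction coefficient \mu_i bounds the lateral tire force F_{y,i}
% on tire i by the vertical load F_{z,i}.
\[
  |F_{y,i}| \le \mu_i \, F_{z,i}, \qquad \mu_i = 1 \ \text{(constant and known)}
\]
```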
Comment 2. In the Introduction, several approaches for path-tracking control are mentioned; however,
the discussion is limited to PID and MPC controllers. There are many other relevant methods, such as
robust control and LQR-based approaches, that should also be acknowledged. It is recommended to include
references to additional related works, for example: "Static Output-Feedback Path-Tracking Controller Tolerant to Steering Actuator Faults for Distributed Driven Electric Vehicles" (robust control), and "Robust Path Tracking Control with Lateral Dynamics Optimization: A Focus on Sideslip Reduction and Yaw Rate Stability Using Linear Quadratic Regulator and Genetic Algorithms" (LQR control).
Response. We appreciate the reviewer's suggestions. The suggested references are indeed relevant, and we have added them to the revised manuscript.
Comment 3. In Equation (12), the reward function for the reinforcement learning (RL) agent is presented. The deviation of the vehicle position with respect to the reference trajectory is defined based on a sinusoidal path used in this study. Although this approach is suitable for the given case, it lacks generalizability, as the RL model may not perform adequately with different trajectory types. This assumption simplifies training but limits applicability. Please consider providing a more general definition of the vehicle's deviation from the trajectory.
Response. We appreciate your observation. As originally written, (12) indeed appears specific to the chosen sinusoidal trajectory. In fact, the reward only requires the deviation from the reference, and we have updated (12) to make this clear.
We agree that, since the decision-making model was trained only on the sinusoidal reference trajectory, its performance on other trajectories remains unclear, and we plan to investigate this in future studies.
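For illustration, one trajectory-agnostic way to define the deviation (a minimal Python sketch; the function and variable names are ours, not from the manuscript) is the distance from the vehicle position to the nearest point of a discretized reference path:

```python
import numpy as np

def path_deviation(position, reference_path):
    """Deviation of the vehicle from an arbitrary reference path.

    position:       array of shape (2,), current (x, y) of the vehicle.
    reference_path: array of shape (N, 2), discretized reference waypoints.

    Returns the Euclidean distance to the nearest reference waypoint,
    which works for sinusoids, straight lines, or any other geometry.
    """
    diffs = reference_path - position       # (N, 2) offsets to each waypoint
    dists = np.linalg.norm(diffs, axis=1)   # (N,) distances
    return dists.min()

# Example: the same reward term applies to any reference geometry.
xs = np.linspace(0.0, 100.0, 1000)
ref = np.stack([xs, np.sin(0.1 * xs)], axis=1)   # sinusoidal reference
e = path_deviation(np.array([10.0, 0.5]), ref)
reward = -e**2   # e.g., penalize the squared deviation in the RL reward
```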
Comment 4. More details regarding the training of the RL agent should be included. Please specify
aspects such as the computer hardware used (e.g., CPU model, GPU type), training duration, and any
relevant computational settings.
Response. Thanks for this comment. All experiments were conducted on an Ubuntu 20.04 system equipped with two NVIDIA GeForce RTX 2080 Ti GPUs (12 GB each), an AMD 9820X processor, and 64 GB of RAM. The implementation is based on PyTorch v2.1.1. Training typically requires approximately 2 hours to complete. In the revised manuscript, we have added this description in Section 4.
Comment 5. Please specify which software or simulation environment was used to model the vehicle
dynamics.
Response. Thank you for the question. We hand-coded the simulation environment in Python, and our code is open-sourced at https://gitee.com/majortom123/mpc. In the revised manuscript, we have made this clear in Section 4.
Comment 6. Additional maneuvers should be included to more thoroughly validate the proposed approach. Furthermore, please indicate the time delay between the NMPC controller and the vehicle during simulation. Several key implementation details appear to be missing and should be provided for completeness.
Response. We agree that adding more maneuver scenarios would further validate the proposed approach. However, due to the limited time for this response, we leave it for future work. We believe that the existing results adequately validate the proposed cloud-local fusion algorithm.
The communication delay is set to 2 steps and the prediction horizon to 5 steps. We have also added additional information about the simulation setup in Section 4.
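As an illustration of how such a delay can be emulated in the simulation loop (a schematic sketch under the stated settings; the buffer mechanism and names are our assumptions, and the open-sourced code is authoritative for the actual implementation):

```python
from collections import deque

DELAY = 2  # communication delay (steps) between the cloud NMPC and the vehicle

# Buffer pre-filled with safe default commands for the first DELAY steps.
cloud_buffer = deque([0.0] * DELAY, maxlen=DELAY)

def delayed_cloud_control(u_cloud_now):
    """Return the cloud control that actually reaches the vehicle this step.

    u_cloud_now: control just computed by the cloud NMPC at the current step.
    The value popped from the buffer was computed DELAY steps ago, emulating
    the round-trip communication latency.
    """
    u_applied = cloud_buffer.popleft()  # command sent DELAY steps ago arrives now
    cloud_buffer.append(u_cloud_now)    # current command arrives DELAY steps later
    return u_applied
```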
Comment 7. More up-to-date works should be cited.
Response. Thanks for the suggestion. We have now added more recent literature in the revised version,
including [10], [19], and [21].
Finally, we thank the reviewer again for the valuable suggestions and comments, and hope our revisions
address your concerns.
Reviewer 2 Report
Comments and Suggestions for Authors
- Clearly specify the simulation setup: vehicle parameters, tire model constants, road friction settings, disturbance bounds, latency values, cloud computation times, prediction and control horizons for both cloud and local MPC, and constraint sets.
- Detail the fusion policy action a and clarify whether a is continuous in [0, 1] or a trigger.
- Reconcile with Algorithm 1, which describes binary actions.
- Explain how u_t is implemented and stabilized.
- Describe how robust constraint enforcement is implemented for initial-state uncertainty and disturbances.
- Clarify the reference path and scenarios: provide trajectories beyond a single sinusoid; test corner cases (varying speeds, friction changes, larger delays, sensor noise).
- Please add a focused state-of-the-art comparison: benchmark against recent robust/nonlinear MPC, event-triggered MPC with guarantees, and supervisory/hybrid fusion schemes using standard metrics across multiple scenarios.
- Report mean ± std over several runs.
- Fix tense consistency, article usage, and typographical issues. Standardize notation between sections.
Author Response
Comment 1. Clearly specify the simulation setup: vehicle parameters, tire model constants, road friction
settings, disturbance bounds, latency values, cloud computation times, prediction and control horizons for
both cloud and local MPC, and constraint sets.
Response. We sincerely appreciate this comment. We have added Table 1 to summarize the main simulation parameters. In addition, we have expanded the description of the MPC implementation details in Section 4. Further implementation details can be found in our open-sourced code: https://gitee.com/majortom123/mpc.
Comment 2. Detail the fusion policy action a and clarify whether a is continuous in [0, 1] or a trigger. Reconcile with Algorithm 1, which describes binary actions.
Response. Thank you for catching this inconsistency. The proposed method performs an affine fusion of the cloud and local controls according to u_t = a · û_t + (1 − a) · ū_t. Thus, a is a continuous weight in [0, 1] rather than a binary trigger. We have corrected Algorithm 1 accordingly.
Comment 3. Explain how u_t is implemented and stabilized.
Response. At each time instant t = 0, 1, …, N − 1, two candidate controls, û_t from the cloud MPC and ū_{t|t} from the local MPC, are available. The goal is to systematically fuse them to achieve enhanced performance. Specifically, the RL agent generates the weight a, and the final control is computed as u_t = a · û_t + (1 − a) · ū_{t|t}. We have expanded this explanation near the end of Section 3 in the revised manuscript.
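For concreteness, a minimal sketch of this fusion step (illustrative only; the function name and the simplified interface to the RL agent are our assumptions):

```python
import numpy as np

def fuse_controls(a, u_cloud, u_local):
    """Affine fusion of the cloud and local MPC controls.

    a:       fusion weight in [0, 1] produced by the RL agent
             (a = 1 trusts the cloud NMPC fully, a = 0 the local MPC).
    u_cloud: control û_t from the cloud NMPC.
    u_local: control ū_t from the local MPC.
    """
    a = float(np.clip(a, 0.0, 1.0))  # keep the weight a valid convex coefficient
    return a * u_cloud + (1.0 - a) * u_local

# Example: the RL agent leans toward the cloud solution.
u_t = fuse_controls(a=0.7, u_cloud=0.12, u_local=0.08)  # -> 0.108
```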
Comment 4. Describe how robust constraint enforcement is implemented for initial-state uncertainty
and disturbances.
Response. We appreciate this comment. Robust constraint enforcement follows the same procedure as
in our prior work [15], specifically equations (26)–(32). In the revised manuscript, we have included a
brief explanation and explicitly referenced [15] for additional details.
Comment 5. Clarify the reference path and scenarios: provide trajectories beyond a single sinusoid; test
corner cases (varying speeds, friction changes, larger delays, sensor noise).
Response. Thank you for this valuable suggestion. We agree that incorporating additional reference
trajectories and varied operating conditions would further strengthen the paper. However, due to time
constraints, we were unable to complete these additional simulations. We believe that the current set
of results—covering a range of process-noise and sensor-noise levels—still effectively demonstrates the
robustness and benefits of the proposed framework.
Comment 6. Please add a focused state-of-the-art comparison: benchmark against recent robust/nonlinear MPC, event-triggered MPC with guarantees, and supervisory/hybrid fusion schemes using standard metrics across multiple scenarios. Report mean ± std over several runs.
Response. We appreciate this thoughtful suggestion. Unfortunately, given the limited time available
for the revision, we could not perform the extensive benchmarking requested. Nonetheless, we compare
the fused controller with both standalone local MPC and cloud MPC, and the results consistently show
improved performance across varying noise levels.
Comment 7. Fix tense consistency, article usage, and typographical issues. Standardize notation between
sections.
Response. We have thoroughly revised the manuscript to correct typographical issues, improve grammatical consistency, and standardize notation throughout.
Finally, we thank the reviewer again for the valuable suggestions and comments, and hope our revisions
address your concerns.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
Some comments from the previous revision were not addressed properly, specifically the justification for not adding more maneuvers or other method comparisons. This should be justified within the manuscript.
Fig. 1 should be larger and centered.
Author Response
Comment 1. Some comments from the previous revision were not addressed properly, specifically the justification for not adding more maneuvers or other method comparisons. This should be justified within the manuscript.
Response. Thank you for the comment. To further evaluate the generalizability of the proposed method, additional experiments were conducted on a straight-line reference trajectory (y = 0). The results, shown in Fig. 6, consistently demonstrate the effectiveness of our approach across different path geometries.
Comment 2. Fig. 1 should be larger and centered.
Response. We appreciate the reviewer's suggestion. We have enlarged and centered Fig. 1 as suggested.
Finally, we thank the reviewer again for the valuable suggestions and comments, and hope our revisions
address your concerns.
Reviewer 2 Report
Comments and Suggestions for Authors
I would like to thank the Authors. In my opinion, the paper is ready to be published.
Author Response
We would like to thank you for the critical comments and suggestions that have enabled us to improve our manuscript.
Round 3
Reviewer 1 Report
Comments and Suggestions for Authors
- The legends are missing from Figures 5 and 6.
- The references suggested by the reviewer for including additional types of controllers in the introduction were not added, even though the response indicated that they had been incorporated in the previous review.
- Could a scenario more complex than a straight-line case be considered?
Author Response
Comment 1. The legends are missing from Figures 5 and 6.
Response. Thank you for the comment. We have now updated both figures with detailed and descriptive legends to improve clarity and interpretability. These additions provide clear visual guidance to distinguish between controllers, reference paths, and simulation conditions, facilitating better understanding of the comparative results presented.
Comment 2. The references suggested by the reviewer for including additional types of controllers in the introduction were not added, even though the response indicated that they had been incorporated in the previous review.
Response. Thank you for your attentive review and for bringing this oversight to our attention. We sincerely apologize for the confusion caused. During a previous round of revisions, an older version of the manuscript was inadvertently used, which led to the omission of the suggested references. We have now carefully corrected this and ensured that all recommended citations concerning additional controller types have been properly incorporated into the introduction (see the newly added references [7], [8]).
Comment 3. Could a scenario more complex than a straight-line case be considered?
Response. Thank you for raising this valuable point regarding the need for more complex scenarios. In response, we would like to clarify our current experimental design and provide our reasoning regarding scenario selection.
During our internal validation, we did in fact evaluate the proposed method under additional driving scenarios, for example, the path shown in the figure.

Finally, we thank the reviewer again for the valuable suggestions and comments, and hope our revisions address your concerns.
