Cloud-Assisted Nonlinear Model Predictive Control with Deep Reinforcement Learning for Autonomous Vehicle Path Tracking

Zhang, Yuxuan; Chen, Bing; Wang, Yan; Li, Nan

doi:10.3390/act14120609

Open AccessArticle

Cloud-Assisted Nonlinear Model Predictive Control with Deep Reinforcement Learning for Autonomous Vehicle Path Tracking

¹

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

²

Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hong Kong, China

³

School of Automotive Studies, Tongji University, Shanghai 200092, China

^*

Authors to whom correspondence should be addressed.

Actuators 2025, 14(12), 609; https://doi.org/10.3390/act14120609

Submission received: 18 October 2025 / Revised: 7 December 2025 / Accepted: 9 December 2025 / Published: 13 December 2025

(This article belongs to the Section Actuators for Surface Vehicles)

Download

Browse Figures

Review Reports Versions Notes

Abstract

Model Predictive Control (MPC) stands out as a prominent method for achieving optimal control in autonomous driving applications. However, the effectiveness of MPC approaches critically depends on the availability of accurate dynamic models and often necessitates substantial computational overhead for real-time optimization procedures at every iteration. Recently, the research community has been increasingly drawn to the concept of cloud-assisted MPC, which harnesses the capabilities of powerful cloud computing to provide users with on-demand computational resources and data storage services. Within these cloud-assisted MPC frameworks, control signals are merged with a cloud-based MPC, which leverages the substantial processing power of cloud infrastructure to determine optimal control actions using detailed nonlinear models for greater accuracy. Simultaneously, a local MPC runs on simplified linear models constrained by limited on-device computing resources, delivering prompt control responses at the cost of reduced model accuracy. To achieve an effective trade-off between rapid response and model fidelity, this work presents a new model-free deep reinforcement learning structure designed to merge cloud and local MPC outputs. Tests conducted on path-following scenarios show that the introduced method achieves superior control performance compared to existing reinforcement learning baselines and conventional rule-based fusion strategies.

Keywords:

autonomous vehicle (AV); model predictive control (MPC); cloud control; reinforcement learning (RL); path tracking

1. Introduction

Autonomous vehicles (AVs) have attracted dramatic research attention in the past decades [1]. The deployment of advanced control systems in autonomous vehicles (AV) can improve road safety by significantly mitigating traffic incidents and associated casualties, while also ensuring a more comfortable ride experience [2]. In a typical hierarchical autonomous driving framework, one key component in achieving safe and comfortable driving is the path tracking module [3], which directly impacts vehicle movement. Specifically, with the objective of accurately and smoothly following the path planned by the upper-level path planner, the path tracking module generates control demands for lower-level actuators in real-time under constrained computation and communication conditions [4].

Several control methods are utilized to design path tracking controllers [5,6], including state feedback controllers, sliding mode control, proportional integral derivative (PID) control, robust control [7], linear quadratic regulator (LQR) control [8] and model predictive control (MPC), etc. The prominence of MPC in recent path tracking research (as shown in [9], Figure 1) is largely attributable to its proficiency in managing the optimization of nonlinear, constrained Multi-Input Multi-Output systems. This specific capability makes it exceptionally applicable to the demands of real-world autonomous vehicle path tracking [10]. However, due to complex vehicle dynamics and limited computing power onboard, solving a constrained MPC with a high-fidelity nonlinear vehicle dynamics model in real-time is always unavailable for onboard deployment [11]. Additionally, state estimation under data loss and noise uncertainty presents significant challenges for vehicle control systems [12].

Recently, with the advancement of cloud computing, new opportunities have opened for autonomous driving. Due to the benefits offered by cloud computing in providing seemingly limitless computing capabilities and data storage services, increasing research attention is being directed towards cloud-assisted control systems [13,14], including cloud-assisted MPC design. For example, a dual control framework is proposed in [15,16]. This framework involves solving constrained linear-quadratic MPC problems in the cloud for computing control inputs, supplemented by a local backup controller. However, they lack consideration of critical issues such as plant–model mismatch, connectivity, latency, and feasibility, which are crucial to ensure safety in autonomous driving tasks [17].

Taking into account the existing challenges of communication delays, disturbances, and plant–model mismatch in cloud-assisted MPC, in our previous work [18], we present an innovative cloud-assisted MPC framework to manage system constraints in complex nonlinear systems, as shown in Figure 1, in which a cloud computational node hosts a high-fidelity nonlinear MPC, while a simplified linear MPC version is executed locally. In this way, the computational burden on the local controller can be significantly reduced without compromising optimization due to cloud MPC. The key component of integrating cloud MPC and local MPC is the control fusion module, whose objective is to select appropriate control to minimize future cumulative costs, contributing to enhanced control performance. As a result, our approach successfully achieves improved control performance, offering a well-balanced solution that capitalizes on the strengths of both nonlinear MPC and linear MPC within a cloud-assisted framework. A key limitation of the rule-based switching approach is its requirement for prior closed-loop system knowledge and worst-case cost analysis. Such comprehensive information is often impractical to obtain for complex autonomous vehicle systems affected by dynamic disturbances.

To address the problem mentioned above of the control fusion module in our proposed cloud-assisted MPC framework, in this paper, we exploit the model-free deep reinforcement learning (DRL) technique [19,20] to learn the optimal control fusion policy. The DRL-based fusion strategy empowers the system to acquire optimal control signal fusion techniques through dynamic interactions with the environment, facilitating adaptive adjustments to varying conditions. In contrast to rule-based policies, DRL operates without prior knowledge of closed-loop dynamics and circumvents the need to precisely evaluate how disturbances propagate through complex nonlinear systems. Moreover, our framework achieves a harmonious equilibrium between control precision and effort by incorporating both factors into a multi-objective reward function, thus achieving better path-tracking control performance. The main contributions of this paper are summarized as follows.

1.: A DRL-driven model-free cloud-assisted MPC framework is extended based on our previous work [18]. DRL enables the learning of control fusion strategies, thereby bypassing the requirement for exact future cost estimation under disturbances, a crucial capability for complex dynamic environments.
2.: The DRL-cloud-assisted MPC is validated in the AV path-tracking task. In cloud MPC, a high-fidelity nonlinear vehicle dynamics model is applied that considers the angle of tire slip and axle load transfer, which is commonly neglected in existing MPC-based path tracking research due to the limited computing power on board [9]. The results demonstrate superior performance compared to both cloud-only and local-only baselines, as well as the rule-based method.
3.: The developed cloud-assisted MPC with classical DRL algorithms are well benchmarked and the codes are open-sourced (Codes: https://gitee.com/majortom123/mpc, accessed on 1 January 2025), which is expected to be a valuable resource for future research in the domain of cloud-assisted MPC with DRL and beyond.

The remainder of the paper is organized as follows: Section 2 formulates the AV path tracking problem and illustrates the cloud-assisted MPC framework. The DRL-based control fusion policy is presented in Section 3. Experiments, results, and discussions are presented in Section 4. Finally, we conclude the paper and discuss future works in Section 5.

2. Methodology

In this section, we first introduce the nonlinear vehicle dynamics model for the AV path-tracking task. Then, the cloud-assisted MPC framework is introduced.

2.1. Vehicle Dynamics and Path Tracking Problem

We consider a single-track nonlinear vehicle dynamics model with consideration of tire slip characteristics [9,21] to represent the actual vehicle dynamics system. The dynamic behavior of the vehicle’s center of gravity and its wheels is governed by the following equations.

\begin{matrix} \dot{p_{x}} & = v_{x} cos ψ - v_{y} sin ψ \end{matrix}

(1a)

\begin{matrix} {\dot{v}}_{x} & = v_{y} r + \frac{2}{m} \sum_{i = f, r} F_{x, i} - g sin σ_{g} - \frac{1}{m} F_{a} \end{matrix}

(1b)

\begin{matrix} \dot{p_{y}} & = v_{x} sin ψ + v_{y} cos ψ \end{matrix}

(1c)

\begin{matrix} {\dot{v}}_{y} & = - v_{x} r + \frac{2}{m} \sum_{i = f, r} F_{y, i} \end{matrix}

(1d)

\begin{matrix} \dot{ψ} & = r \end{matrix}

(1e)

\begin{matrix} \dot{r} & = \frac{1}{I} (2 L_{x f} F_{y, f} - 2 L_{x r} F_{y, r}), \end{matrix}

(1f)

where the state variables are: the CG positions

p_{x}

(longitudinal) and

p_{y}

(lateral); the global heading angle

ψ

; the body frame velocities

v_{x}

(longitudinal) and

v_{y}

(lateral); and the yaw rate r. The forces include the aerodynamic drag

F_{a}

and the tire forces

F_{x}

,

F_{y}

. The parameters are: vehicle mass m, yaw inertia I, and the distances from the CG to the front and rear axles,

L_{x f}

and

L_{x r}

, respectively.

The vehicle frame tire forces

F_{x, i}

and

F_{y, i}

from Equations (1b) and (1d) are obtained via a transformation from the wheel frame:

\begin{matrix} F_{x, i} & = {\bar{F}}_{x, i} cos β_{i} - {\bar{F}}_{y, i} sin β_{i} \end{matrix}

(2a)

\begin{matrix} F_{y, i} & = {\bar{F}}_{x, i} sin β_{i} + {\bar{F}}_{y, i} cos β_{i} \end{matrix}

(2b)

In this model,

β_{i}

is the wheel/road angle, and the index i (f or r) specifies the front or rear wheel. The longitudinal and lateral forces in the wheel frame,

{\bar{F}}_{x, i}

and

{\bar{F}}_{y, i}

, are determined as follows:

\begin{matrix} {\bar{F}}_{x, i} & = \frac{T_{i}}{2 R} \end{matrix}

(3a)

\begin{matrix} {\bar{F}}_{y, i} & = C_{i} μ_{i} F_{z, i} α_{i} \end{matrix}

(3b)

The parameters in these equations are: axle torque

T_{i}

, effective tire radius R, tire cornering stiffness

C_{i}

, road friction coefficient

μ_{i}

, and slip angle

α_{i}

[22]. Accurate estimation of road friction coefficient is crucial for vehicle stability control, and recent learning-based frameworks have shown promising results in this domain [23]. The vertical force

F_{z, i}

in (3b) is approximated by a static load transfer model. The vertical load

F_{z, i}

in Equation (3b) is given by a static load transfer model:

\begin{matrix} F_{z, i} = \frac{L_{x, i} m g}{2 (L_{x, f} + L_{x, r})} . \end{matrix}

(4)

Define state vector

x = {[p_{x}, v_{x}, p_{y}, v_{y}, ψ, r]}^{T}

and the control input

u = {[T_{f}, β_{f}]}^{T}

. With a sampling time of 0.05 s, the above nonlinear vehicle dynamics system can be discretized and written into the state-space form as

x_{t + 1} = A x_{t} + B u_{t} + f (x_{t}, u_{t}) + w_{t},

(5)

where

w_{t} = {[w_{x}, w_{v_{x}}, w_{y}, w_{v_{y}}, w_{ψ}, w_{r}]}^{T} \in W

represents the uncertainties in system dynamics. Here, an assumption is given that the input of the disturbance

w_{t}

is bounded, taking values from a known set

W \subset R^{p}

, with its supremum norm characterized by

{sup}_{w \in W} | w | = ω

.

The MPC strategy operates by solving an online optimization problem at each time step to determine the optimal control sequence

U_{t} = {u_{t}, u_{t + 1}, \dots u_{t + N - 1}}

and the resulting state sequence

X_{t} = {x_{t + 1}, x_{t + 2}, \dots x_{t + N}}

across a prediction horizon of length N:

\begin{matrix} min_{X_{t}, U_{t}} & J = \sum_{k = 0}^{N - 1} ϕ (x_{t + k}, u_{t + k}) \end{matrix}

(6a)

\begin{matrix} s . t . & x_{t} = {\hat{x}}_{t} \end{matrix}

(6b)

\begin{matrix} x_{t + k} = A x_{t + k - 1} + B u_{t + k - 1} \end{matrix}

\begin{matrix} + f (x_{t + k - 1}, u_{t + k - 1}) + w_{t + k - 1}, 1 \leq k \leq N \end{matrix}

(6c)

\begin{matrix} x_{min} \leq x_{t + k} \leq x_{max}, 1 \leq k \leq N \end{matrix}

(6d)

\begin{matrix} u_{min} \leq u_{t + k} \leq u_{max}, 0 \leq k \leq N - 1 \end{matrix}

(6e)

\begin{matrix} Δ_{min} \leq u_{t + k} - u_{t + k - 1} \leq Δ_{max} \end{matrix}

(6f)

\begin{matrix} 0 \leq k \leq N - 1 \end{matrix}

(6g)

where t is the current time instant,

ϕ (x_{t}, u_{t})

is the cost to be minimized. The constraints limit changes in states and control inputs within certain limits.

This paper addresses autonomous vehicle trajectory tracking, specifically for the trajectory shaped by

\begin{matrix} y_{r} = 4 sin (\frac{2 π}{100} x_{r}) \end{matrix}

(7)

where

x_{r}

and

y_{r}

are longitudinal and lateral reference positions, respectively. Therefore, the objective in (6a) is rewritten as:

\begin{matrix} J (X_{t}, U_{t}) = \sum_{k = 1}^{p} {∥ x_{t + k} (3) - 4 sin (\frac{2 π}{100} x_{t + k} (1)) ∥}_{Q_{t}}^{2} + \sum_{k = 0}^{p - 1} ({∥u_{t + k} - u_{t + k}^{r}∥}_{Q_{u}}^{2}) . \end{matrix}

(8)

2.2. Cloud-Assisted MPC Framework

As shown in Figure 1, there are two different MPC problems to be solved on the cloud and local sides separately. Based on the framework, the cloud MPC leverages a high-fidelity nonlinear model over an extended prediction horizon, computing a solution only upon receiving a new control task. In parallel, the local MPC utilizes a simplified linear model with a receding horizon, executing an optimization at every sampling instant throughout the task duration. Accordingly, the cloud is tasked with solving the following nonlinear optimization problem:

\begin{matrix} min & J = \sum_{τ = 0}^{N - 1} ϕ ({\hat{x}}_{τ}, u_{τ}) \end{matrix}

(9a)

\begin{matrix} s . t . {\hat{x}}_{t + 1} = A {\hat{x}}_{t} + B u_{t} + f ({\hat{x}}_{t}, u_{t}) \end{matrix}

(9b)

\begin{matrix} ({\{{\hat{x}}_{τ}\}}_{τ = 1}^{N}, {\{u_{τ}\}}_{τ = 0}^{N - 1}) \in Ξ^{c} \end{matrix}

(9c)

where

\hat{x}

represents predicted states in the cloud MPC. The model used here is a high-fidelity model including the nonlinear term

f ({\hat{x}}_{t}, u_{t})

.

Ξ^{c}

is constraints with the same form as in (6d)–(6f).

Given the constrained computational capabilities onboard, the local MPC employs a linear dynamic model for its operations, so the MPC problem is:

\begin{matrix} min & J_{t} = \sum_{τ = t}^{N - 1} ϕ ({\bar{x}}_{τ ∣ t}, u_{τ ∣ t}) \end{matrix}

(10a)

\begin{matrix} s . t . {\bar{x}}_{t ∣ t} = x_{t} \end{matrix}

(10b)

\begin{matrix} {\bar{x}}_{τ + 1 ∣ t} = A {\bar{x}}_{τ ∣ t} + B u_{τ ∣ t} \end{matrix}

(10c)

\begin{matrix} ({\{{\bar{x}}_{τ ∣ t}\}}_{τ = 1}^{N}, {\{u_{τ ∣ t}\}}_{τ = 0}^{N - 1}) \in Ξ_{t}^{l} . \end{matrix}

(10d)

where

\bar{x}

represents predicted states in the local MPC. The symbolic expression

{(\cdot)}_{τ ∣ t}

in the expressions above signifies a forecast of the variable

{(\cdot)}_{τ}

that is computed based on information available at time t.

In terms of cloud triggers, we assume that, the local system can initiate a request to the cloud for solving the nonlinear MPC problem when a control task is assigned. This process inevitably incurs a “request–response delay”, which consists of communication uplink–downlink delay and cloud computation time. To tackle this problem, a “prediction-ahead-of-time” approach is adopted, which ensures that the cloud can predict the state with the maximum delay time

Δ t

ahead. For example, if the request is sent to the cloud at

t = k - Δ t

, the cloud will predict the vehicle state

Δ t

time ahead,

\hat{x} ((k - Δ t) + Δ t) = \hat{x} (k)

by assuming state remaining constant over the time interval

Δ t

or by running forward simulation based on system model (5). The predicted state

\hat{x}

then serves as the initial condition for the cloud MPC, which subsequently computes the optimal control sequence. Because the local side can receive the cloud control law within the maximum delay time

Δ t

, the cloud control law can be exactly started at time instant

t = (k - Δ t) + Δ t = k

, thereby ensuring that no stale control commands are applied.

However, due to prediction inaccuracy, an initial state uncertainty for cloud MPC is inevitable. In addition, during the implementation process, the model mismatch and environmental disturbances may also lead to model prediction errors. To address these problems, the robust constraint enforcement technique is used to ensure the states can be constrained even with uncertainty. Because of the limited space and the main focus of control fusion design in this paper, we recommend interested readers to read our previous work [18].

A key innovation of our cloud-assisted nonlinear MPC strategy is a novel integration mechanism that seamlessly merges cloud and local control actions to achieve enhanced performance. In our previous research, we used a rule-based policy to realize the control switching. A main limitation of the proposed switching strategy is its inherent reliance on worst-case bounds, which means it does not guarantee optimality in minimizing the cumulative cost. Achieving optimality would demand an exact evaluation of disturbance effects through the nonlinear system (5), presenting a formidable challenge for complex dynamical systems. Therefore, in this paper, we want to deploy DRL technique to automatically learn an optimal switch policy. The details of DRL-based control fusion design are introduced in Section 3.

3. DRL-Based Control Fusion Design

In this section, we will first introduce the basic knowledge of DRL. Then, the DRL-based control fusion module for cloud-assisted MPC will be developed as shown in Algorithm 1.

Algorithm 1 DRL-based Cloud-Assisted MPC.

Require: M, T,

d t

,

γ

, N

Ensure:

θ

1: Initialize

θ

,

D \leftarrow \emptyset

2: for

j = 0

to

M - 1

do

3: Initialize

s_{t}

, Z, U,

k \leftarrow 0

4: while

t \leq T

do

5: Select fusion action based on current state

6:

a_{t} \sim π_{θ} (s_{t})

7: Environment Simulation

8:

k \leftarrow 0

9:

(Z, U) \leftarrow Solve cloud optimal control problem (9 a)

10:

u \leftarrow Solve local MPC problem (10 a) to obtain {\hat{u}}_{t}

Local control

11: Fuse control input from the dual controller

12:

u_{t} = a * {\hat{u}}_{t} + (1 - a) * \bar{U} (k)

13:

k \leftarrow k + 1

14: Apply control and simulate system dynamics

15:

x_{t + 1} \leftarrow Simulate system dynamics (5) using u_{t}

16: Calculate reward

17:

r_{t} \leftarrow Equation (12)

18: Observe next state and store experience

19:

s_{t + 1} \leftarrow (x_{t + 1}, Z (k))

20: Store

(s_{t}, a_{t}, r_{t}, s_{t + 1})

in

D

21: Update policy parameters

22: Sample N experiences from

D

and update

θ

23: Move to next time step

24:

s_{t} \leftarrow s_{t + 1}

25:

t \leftarrow t + d t

26: end while

27: end for

3.1. Preliminary of DRL

In an RL setting, an agent sequentially observes states

s_{t} \in S \subseteq R^{n}

, and performs actions

a_{t} \in A \subseteq R^{m}

at each timestep t, guided by a policy

π (a_{t} | s_{t})

. The simulation environment advances to a subsequent state

s_{t + 1}

according to the transition dynamics

p (\cdot | s_{t}, a_{t})

, concurrently yielding an immediate reward

r_{t} = r (s_{t}, a_{t}, s_{t + 1})

. The learning objective is to obtain an optimal policy

π^{*} : S \to A

that maps states to actions such that the cumulative discounted return

R_{t} = \sum_{k = 0}^{T} γ^{k} r_{t + k}

is maximized, where

r_{t + k}

denotes the reward at future step

t + k

, while

γ \in (0, 1]

and T correspond to the discount factor and episode horizon, respectively.

We define the state–action value function

Q^{π} (s_{t}, a_{t}) = E (R_{t} | s_{t}, a_{t})

as the expected total return when starting in state

s_{t}

, executing action

a_{t}

, and thereafter following policy

π

. The optimal Q-function, given by

Q^{*} (s_{t}, a_{t}) = {max}_{π} Q^{π} (s_{t}, a_{t})

, directly induces the optimal greedy policy

π^{*} (a_{t} | s_{t})

. Similarly, the state value function

V^{π} (s_{t}) = E (R_{t} | s_{t})

captures the expected return from state

s_{t}

when consistently adhering to policy

π

.

Building upon these foundations, we highlight several key RL algorithms tailored for continuous action spaces: Deep Deterministic Policy Gradient (DDPG) is designed for continuous control, utilizing an actor–critic framework where a deterministic actor network maps states directly to actions, and a critic network is used to evaluate the quality (Q-value) of those state–action pairs [24]. Twin Delayed Deep Deterministic (TD3) improves upon DDPG by introducing twin Q-networks to reduce overestimation bias and stabilize policy learning [25]. Soft Actor–Critic (SAC) adds an entropy term to the reward function, encouraging the policy toward diverse actions and enhancing the robustness of the learned policy [26]. Trust Region Policy Optimization (TRPO) aims to monotonically improve policy updates through a constrained optimization framework, ensuring robust policy improvement with theoretical guarantees of performance [27]. In contrast, Proximal Policy Optimization (PPO) offers a practical approach to policy optimization in discrete spaces by employing a clipped surrogate objective, ensuring stable and efficient learning [28]. These algorithms play a crucial role in advancing reinforcement learning by improving both efficiency and stability, and they will be evaluated within the proposed cloud-assisted MPC framework.

3.2. DRL-Based Cloud-Assisted MPC

In this paper, we formulate the problem of DRL-based cloud-assisted MPC for path planning in autonomous vehicles as a Partially Observable Markov Decision Process

M

(POMDP) and solve with model-free RL algorithms [19], represented by the tuple

M = (A, S, R, T, γ)

:

1.: Action space $A$ : The action generated by the DRL represents the fusion strategy for the DRL-based cloud-assisted MPC, ranging continuously from 0 to 1. This strategy integrates the control signals from both the cloud and local controllers. When $a = 1$ , the system solely utilizes the cloud MPC, analogous to a cloud-only approach. Conversely, when it comes to $a = 0$ , it employs only the local controller, aligning with the local-only MPC strategy.
2.: State space $S$ : The state space is designed to represent the simulation environment to learn a fusion policy and is defined as follows:

$S = {p_{x}, p_{y}, v_{x}, v_{y}, ψ, r} .$

(11)
3.: Reward function $R$ : The design of the reward function plays a critical role in guiding RL agents toward target behaviors. The specific form of the reward function is given in Equation (8), where the term $r ({\hat{x}}_{t}, u_{t})$ is defined as:

$\begin{matrix} r ({\hat{x}}_{t}, u_{t}) = {||({\hat{x}}_{t} (3) - x_{t}^{r} (3)||}_{Q_{t}}^{2} + | | u_{t} - u_{t}^{r} {| |}_{Q_{u}}^{2}, \end{matrix}$

(12)

where ${\hat{x}}_{t}$ denotes the actual system state (or its current state estimate in the absence of direct measurement), $x_{t}^{r}$ represents the reference trajectory and $x_{t}^{r} (3) = 4 sin (\frac{2 π}{100} x_{t}^{r} (1))$ , and $u_{t}$ represents the real-time control action, synthesized as $u_{t} = a * {\hat{u}}_{t} + (1 - a) * {\bar{u}}_{t}$ , an affine combination of cloud control and local control.
4.: Transition probabilities $T$ : As a model-free RL approach, our framework operates without an explicit model of the transition dynamics $T (s^{'} | s, a)$ , which characterizes the robot’s environmental interactions.

4. Simulation Results

This section presents extensive numerical tests to assess the effectiveness of the proposed framework against multiple baseline methods. The key simulation parameters of the vehicle dyanmics are shown in Table 1. The communication delay is set as 2 steps and the prediction horizon is 5 steps. The readers are referred to our open-sourced code (https://gitee.com/majortom123/mpc, accessed on 1 January 2025) for additional details about the simulation setup. The RL network architecture processes the input state through an initial fully-connected layer comprising 128 units, succeeded by two additional FC layers, each with 128 neurons. For the LSTM variant, the final FC layer is substituted by an LSTM layer containing 128 hidden units. The output layer produces two Q-values, representing the two possible control decisions: to trigger or not to trigger the control intervention. Following the DDQN algorithm, the target network parameters are synchronized with the online network every

N_{0} = 1000

training steps.

The off-policy reinforcement learning models undergo training for 50,000 environmental steps, equivalent to approximately 500 episodes. Each episode spans a duration of T = 20 s with a sampling interval of

d t

= 200 ms, resulting in an episode horizon of

T_{e} = 100

decision steps. In contrast, on-policy methods such as PPO typically demand extended training periods while offering enhanced stability [29]; accordingly, these algorithms are trained for 1000 episodes to ensure adequate convergence. The Markov decision process parameters include a discount factor

γ = 0.99

and a mini-batch size

N = 64

. Additional hyperparameters comprise a learning rate

η = 1 \times 10^{- 4}

and an experience replay buffer capacity of 5000 transitions. For the DDQN implementation, an

ϵ

-greedy exploration strategy is employed, with

ϵ

undergoing linear decay from 1.0 to 0.01 throughout the initial 5000 training steps. The reference trajectory we use follows (7). All experiments were conducted on an Ubuntu 20.04 system equipped with two NVIDIA GeForce RTX 2080 Ti GPUs (NVIDIA Corporation, Santa Clara, CA, USA, 12 GB each), an AMD 9820X (Advanced Micro Devices, Santa Clara, CA, USA) processor, and 64 GB of RAM. The implementation is based on PyTorch v2.1.1. Training typically requires approximately 2 h to complete.

We first evaluate some classical RL algorithms to see which algorithm works best for the considered vehicle tracking problem, including TRPO [27], PPO [28], SAC [26], TD3 [25], and DDPG [24]. The evaluation curves of those RL algorithms under nominal disturbances (

w = 0.02, w_{x} = 0.1

) are illusrated in Figure 2, which shows that TD3 and PPO are two best-performing models. We further evaluate the performance of different RL models under various disturbance levels. As shown in Figure 3, PPO stands out as the best model that consistently outperforms other models. It is worth noting that all RL models can work better than either local MPC or cloud MPC alone.

Since PPO works best, it is chosen as the RL algorithm for further comparisons. A detailed comparison between PPO and local/cloud MPC under different initial conditions is shown in Figure 4. It can be seen that PPO consistently perform better for a wide range of initial conditions.

In addition, the control and trajectories of different strategies are shown in Figure 5. It can be seen that the RL intelligently fuses local and cloud control to generate smooth tracking performance, demonstrating its capability of learning the hybrid control strategy for improved performance. To further evaluate the generalizability of our method, the same experimental procedure is applied to a straight-line reference trajectory

y = 0

. The results shown in Figure 6 consistently demonstrate the effectiveness of our approach across different reference paths.

5. Conclusions & Future Work

In this paper, we proposed a novel cloud-assisted framework with advanced reinforcement learning (RL) techniques. A multi-objective reward function was proposed to balance control performance and control efforts. In comparison to existing approach, the proposed algorithm does not require any knowledge of the closed-loop dynamics (i.e., model-free) and delivers superior performance. The proposed approach was comprehensively validated on the path planning problem for autonomous driving. The RL-based approach outperformed both the cloud-only and local-only MPC baselines, as well as the rule-based approach.

In subsequent research, we plan to extend this work by incorporating dynamic computational constraints and associated costs into the deep-RL-eMPC architecture for autonomous path tracking. We also intend to validate the framework using high-fidelity simulation platforms, including professional tools such as CARLA, under more realistic and complex driving scenarios. Finally, the theoretical guarantees of the proposed method, particularly regarding stability and convergence, will be rigorously evaluated through hardware-in-the-loop experiments.

Author Contributions

Conceptualization, Y.Z. and B.C.; methodology, Y.Z.; software, Y.Z.; validation, Y.Z., Y.W. and N.L.; formal analysis, Y.Z.; investigation, Y.Z.; resources, B.C.; data curation, Y.W.; writing—original draft preparation, Y.Z.; writing—review and editing, Y.W. and N.L.; visualization, Y.Z.; supervision, B.C.; project administration, Y.W. and N.L.; funding acquisition, B.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the National Natural Science Foundation of China under Grant 52402482.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available at https://gitee.com/majortom123/mpc, accessed on 1 January 2025.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

Bagloee, S.A.; Tavana, M.; Asadi, M.; Oliver, T. Autonomous vehicles: Challenges, opportunities, and future implications for transportation policies. J. Mod. Transp. 2016, 24, 284–303. [Google Scholar] [CrossRef]
Koopman, P.; Wagner, M. Autonomous vehicle safety: An interdisciplinary challenge. IEEE Intell. Transp. Syst. Mag. 2017, 9, 90–96. [Google Scholar] [CrossRef]
Yurtsever, E.; Lambert, J.; Carballo, A.; Takeda, K. A survey of autonomous driving: Common practices and emerging technologies. IEEE Access 2020, 8, 58443–58469. [Google Scholar] [CrossRef]
Zhou, Z.; Rother, C.; Chen, J. Event-triggered model predictive control for autonomous vehicle path tracking: Validation using carla simulator. IEEE Trans. Intell. Veh. 2023, 8, 3547–3555. [Google Scholar] [CrossRef]
Amer, N.H.; Zamzuri, H.; Hudha, K.; Kadir, Z.A. Modelling and control strategies in path tracking control for autonomous ground vehicles: A review of state of the art and challenges. J. Intell. Robot. Syst. 2017, 86, 225–254. [Google Scholar] [CrossRef]
Shi, Z.; Liu, H.; Liu, Z.; Li, T.; Shi, Z.; Zhuang, W. Safety-Critical Lane Change Control of Autonomous Vehicles on Curved Roads Based on Control Barrier Functions. In Proceedings of the 2023 IEEE International Automated Vehicle Validation Conference (IAVVC), Austin, TX, USA, 16–18 October 2023; IEEE: New York, NY, USA, 2023; pp. 1–8. [Google Scholar]
Meléndez-Useros, M.; Viadero-Monasterio, F.; Jiménez-Salas, M.; López-Boada, M.J. Static Output-Feedback Path-Tracking Controller Tolerant to Steering Actuator Faults for Distributed Driven Electric Vehicles. World Electr. Veh. J. 2025, 16, 40. [Google Scholar] [CrossRef]
Al-bayati, K.Y.; Mahmood, A.; Szabolcsi, R. Robust Path Tracking Control with Lateral Dynamics Optimization: A Focus on Sideslip Reduction and Yaw Rate Stability Using Linear Quadratic Regulator and Genetic Algorithms. Vehicles 2025, 7, 50. [Google Scholar] [CrossRef]
Stano, P.; Montanaro, U.; Tavernini, D.; Tufo, M.; Fiengo, G.; Novella, L.; Sorniotti, A. Model predictive path tracking control for automated road vehicles: A review. Annu. Rev. Control 2023, 55, 194–236. [Google Scholar]
Grüne, L.; Pannek, J.; Grüne, L.; Pannek, J. Nonlinear Model Predictive Control; Springer: Berlin/Heidelberg, Germany, 2017. [Google Scholar]
Schwenzer, M.; Ay, M.; Bergs, T.; Abel, D. Review on model predictive control: An engineering perspective. Int. J. Adv. Manuf. Technol. 2021, 117, 1327–1349. [Google Scholar] [CrossRef]
Wang, Y.; Tian, F.; Wang, J.; Li, K. A Bayesian expectation maximization algorithm for state estimation of intelligent vehicles considering data loss and noise uncertainty. Sci. China Technol. Sci. 2025, 68, 1220801. [Google Scholar] [CrossRef]
Givehchi, O.; Trsek, H.; Jasperneite, J. Cloud computing for industrial automation systems—A comprehensive overview. In Proceedings of the 2013 IEEE 18th Conference on Emerging Technologies & Factory Automation (ETFA), Cagliari, Italy, 10–13 September 2013; IEEE: New York, NY, USA, 2013; pp. 1–4. [Google Scholar]
Breivold, H.P.; Sandström, K. Internet of things for industrial automation–challenges and technical solutions. In Proceedings of the 2015 IEEE International Conference on Data Science and Data Intensive Systems, Sydney, Australia, 11–13 December 2015; IEEE: New York, NY, USA, 2015; pp. 532–539. [Google Scholar]
Skarin, P.; Eker, J.; Kihl, M.; Årzén, K.E. An assisting model predictive controller approach to control over the cloud. arXiv 2019, arXiv:1905.06305. [Google Scholar] [CrossRef]
Skarin, P.; Tärneberg, W.; Årzén, K.E.; Kihl, M. Control-over-the-cloud: A performance study for cloud-native, critical control systems. In Proceedings of the 2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC), Leicester, UK, 7–10 December 2020; IEEE: New York, NY, USA, 2020; pp. 57–66. [Google Scholar]
Chu, W.; Wuniri, Q.; Du, X.; Xiong, Q.; Huang, T.; Li, K. Cloud control system architectures, technologies and applications on intelligent and connected vehicles: A review. Chin. J. Mech. Eng. 2021, 34, 139. [Google Scholar] [CrossRef]
Li, N.; Zhang, K.; Li, Z.; Srivastava, V.; Yin, X. Cloud-assisted nonlinear model predictive control for finite-duration tasks. IEEE Trans. Autom. Control 2022, 68, 5287–5300. [Google Scholar] [CrossRef]
Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep reinforcement learning: A brief survey. IEEE Signal Process. Mag. 2017, 34, 26–38. [Google Scholar] [CrossRef]
Li, Y. Deep reinforcement learning: An overview. arXiv 2017, arXiv:1701.07274. [Google Scholar]
Dang, F.; Chen, D.; Chen, J.; Li, Z. Event-triggered model predictive control with deep reinforcement learning for autonomous driving. IEEE Trans. Intell. Veh. 2023, 9, 459–468. [Google Scholar] [CrossRef]
Chen, J.; Yi, Z. Comparison of event-triggered model predictive control for autonomous vehicle path tracking. In Proceedings of the 2021 IEEE Conference on Control Technology and Applications (CCTA), San Diego, CA, USA, 9–11 August 2021; IEEE: New York, NY, USA, 2021; pp. 808–813. [Google Scholar]
Wang, Y.; Yin, G.; Hang, P.; Zhao, J.; Lin, Y.; Huang, C. Fundamental estimation for tire road friction coefficient: A model-based learning framework. IEEE Trans. Veh. Technol. 2024, 74, 481–493. [Google Scholar] [CrossRef]
Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
Fujimoto, S.; Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 1587–1596. [Google Scholar]
Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 1861–1870. [Google Scholar]
Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; Moritz, P. Trust region policy optimization. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 1889–1897. [Google Scholar]
Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar] [CrossRef]
Christodoulou, P. Soft actor-critic for discrete action settings. arXiv 2019, arXiv:1910.07207. [Google Scholar] [CrossRef]

Figure 1. Illustration of the proposed cloud-assisted MPC framework embedded with DRL-based control fusion policy.

Figure 2. Evaluation curves during training with different RL benchmarks for path tracking.

Figure 3. Performance comparison between local, cloud MPC controllers and five classical RL benchmarks.

Figure 4. Performance comparison between local, cloud MPC controllers and PPO under different initial conditions.

Figure 5. Tracking Performance under Sinusoidal Reference Path. The first row shows the vehicle trajectories on the X-Y plane. The trajectories tracked by the Local MPC, Cloud MPC, and the PPO-based DRL fusion controller are shown for comparison with reference trajectory (7) under

w = 0.11

and

w_{x} = 0

. The fusion controller achieves the smoothest and most accurate tracking. The second row presents the corresponding control inputs over time. The subplots illustrate the control signals generated by each controller, showing how the DRL agent intelligently blends the local and cloud commands to produce a stable and effective control input, avoiding the oscillations seen in the Local MPC or the latency-induced jumps in the Cloud MPC.

Figure 5. Tracking Performance under Sinusoidal Reference Path. The first row shows the vehicle trajectories on the X-Y plane. The trajectories tracked by the Local MPC, Cloud MPC, and the PPO-based DRL fusion controller are shown for comparison with reference trajectory (7) under

w = 0.11

and

w_{x} = 0

. The fusion controller achieves the smoothest and most accurate tracking. The second row presents the corresponding control inputs over time. The subplots illustrate the control signals generated by each controller, showing how the DRL agent intelligently blends the local and cloud commands to produce a stable and effective control input, avoiding the oscillations seen in the Local MPC or the latency-induced jumps in the Cloud MPC.

Figure 6. Tracking Performance under Straight-Line Reference Path. The first row shows the vehicle trajectories on the X-Y plane. The trajectories tracked by the Local MPC, Cloud MPC, and the PPO-based DRL fusion controller are shown for comparison with reference trajectory

y = 0

under

w = 0.11

and

w_{x} = 0

. The fusion controller demonstrates minimal lateral deviation and superior stabilization. The second row presents the corresponding control inputs over time. The subplots show the control effort of each strategy. The DRL fusion policy effectively balances the rapid response of the Local MPC and the optimality of the Cloud MPC, resulting in a controlled, smooth input that rejects disturbances and maintains precise path following.

Figure 6. Tracking Performance under Straight-Line Reference Path. The first row shows the vehicle trajectories on the X-Y plane. The trajectories tracked by the Local MPC, Cloud MPC, and the PPO-based DRL fusion controller are shown for comparison with reference trajectory

y = 0

under

w = 0.11

and

w_{x} = 0

. The fusion controller demonstrates minimal lateral deviation and superior stabilization. The second row presents the corresponding control inputs over time. The subplots show the control effort of each strategy. The DRL fusion policy effectively balances the rapid response of the Local MPC and the optimality of the Cloud MPC, resulting in a controlled, smooth input that rejects disturbances and maintains precise path following.

Table 1. The vehicle simulation parameters of MPC.

Parameter	$m [kg]$	$L_{xf} [m]$	$L_{xr} [m]$	$I [kg \cdot m^{2}]$	$R [m]$	$C_{i}$ [-]	$μ_{i}$ [-]
Values	1500	1.2	1.4	4192	0.2159	−4.5837	1

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhang, Y.; Chen, B.; Wang, Y.; Li, N. Cloud-Assisted Nonlinear Model Predictive Control with Deep Reinforcement Learning for Autonomous Vehicle Path Tracking. Actuators 2025, 14, 609. https://doi.org/10.3390/act14120609

AMA Style

Zhang Y, Chen B, Wang Y, Li N. Cloud-Assisted Nonlinear Model Predictive Control with Deep Reinforcement Learning for Autonomous Vehicle Path Tracking. Actuators. 2025; 14(12):609. https://doi.org/10.3390/act14120609

Chicago/Turabian Style

Zhang, Yuxuan, Bing Chen, Yan Wang, and Nan Li. 2025. "Cloud-Assisted Nonlinear Model Predictive Control with Deep Reinforcement Learning for Autonomous Vehicle Path Tracking" Actuators 14, no. 12: 609. https://doi.org/10.3390/act14120609

APA Style

Zhang, Y., Chen, B., Wang, Y., & Li, N. (2025). Cloud-Assisted Nonlinear Model Predictive Control with Deep Reinforcement Learning for Autonomous Vehicle Path Tracking. Actuators, 14(12), 609. https://doi.org/10.3390/act14120609

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Cloud-Assisted Nonlinear Model Predictive Control with Deep Reinforcement Learning for Autonomous Vehicle Path Tracking

Abstract

1. Introduction

2. Methodology

2.1. Vehicle Dynamics and Path Tracking Problem

2.2. Cloud-Assisted MPC Framework

3. DRL-Based Control Fusion Design

3.1. Preliminary of DRL

3.2. DRL-Based Cloud-Assisted MPC

4. Simulation Results

5. Conclusions & Future Work

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI