Trajectory Tracking Control of Intelligent Vehicles with Adaptive Model Predictive Control and Reinforcement Learning Under Variable Curvature Roads

Fang, Yuying; Wang, Pengwei; Gao, Song; Sun, Binbin; Zhang, Qing; Zhang, Yuhua

doi:10.3390/technologies13090394

Open AccessArticle

Trajectory Tracking Control of Intelligent Vehicles with Adaptive Model Predictive Control and Reinforcement Learning Under Variable Curvature Roads

by

Yuying Fang

^1,2

,

Pengwei Wang

^1,2,*

,

Song Gao

^1,2,

Binbin Sun

^1,2,*,

Qing Zhang

^1,2 and

Yuhua Zhang

^1,2

¹

School of Transportation and Vehicle Engineering, Shandong University of Technology, Zibo 255000, China

²

Key Laboratory of Integrated Design and Intelligence of New Energy Vehicles in Shandong Province, Shandong University of Technology, Zibo 255000, China

^*

Authors to whom correspondence should be addressed.

Technologies 2025, 13(9), 394; https://doi.org/10.3390/technologies13090394

Submission received: 26 July 2025 / Revised: 21 August 2025 / Accepted: 29 August 2025 / Published: 1 September 2025

(This article belongs to the Section Manufacturing Technology)

Download

Browse Figures

Versions Notes

Abstract

To improve the tracking accuracy and the adaptability of intelligent vehicles in various road conditions, an adaptive model predictive controller combining reinforcement learning is proposed in this paper. Firstly, to solve the problem of control accuracy decline caused by a fixed prediction time domain, a low-computational-cost adaptive prediction horizon strategy based on a two-dimensional Gaussian function is designed to realize the real-time adjustment of prediction time domain change with vehicle speed and road curvature. Secondly, to address the problem of tracking stability reduction under complex road conditions, the Deep Q-Network (DQN) algorithm is used to adjust the weight matrix of the Model Predictive Control (MPC) algorithm; then, the convergence speed and control effectiveness of the tracking controller are improved. Finally, hardware-in-the-loop tests and real vehicle tests are conducted. The results show that the proposed adaptive predictive horizon controller (DQN-AP-MPC) solves the problem of poor control performance caused by fixed predictive time domain and fixed weight matrix values, significantly improving the tracking accuracy of intelligent vehicles under different road conditions. Especially under variable curvature and high-speed conditions, the proposed controller reduces the maximum lateral error by 76.81% compared to the unimproved MPC controller, and reduces the average absolute error by 64.44%. The proposed controller has a faster convergence speed and better trajectory tracking performance when tested on variable curvature road conditions and double lane roads.

Keywords:

intelligent vehicle; model predictive control; adaptive; deep Q-network; weight matrix

1. Introduction

Intelligent vehicles have significant advantages in improving vehicle safety and traffic efficiency, and the research of intelligent vehicle has become a current hotspot [1]. The key technologies of intelligent vehicles include environmental perception, driving decision making, path planning, and motion control. The motion control module aims to operate the desired trajectory of the vehicle track quickly and stably [2]. How to ensure control accuracy and vehicle stability under dynamic environmental changes and complex road conditions is still a difficult problem to solve. Therefore, tracking control has become one of the key research areas in current intelligent vehicle research.

Currently, common trajectory tracking control methods include PID control [3,4], preview control [5], fuzzy control [6], sliding mode control [7], and model predictive control [8,9]. PID Control cannot handle multi-variable coupling and multi-objective constraints, with parameter tuning heavily reliant on empirical knowledge, resulting in poor adaptability to varying operating conditions. Preview control has weak robustness against external disturbances, making it difficult to achieve effective control under high-speed driving conditions. The accuracy of fuzzy control is limited, as it depends on empirical rules and lacks general applicability. Sliding mode control tends to chattering phenomena, severely affecting the controller’s effectiveness in real traffic scenarios. Compared to other control algorithms, Model Predictive Control (MPC) has significant advantages [10]. It can directly incorporate multiple inputs, multiple outputs, and state constraints into the optimization framework. MPC models the predictive model by considering external disturbances and model errors, significantly improving the adaptability of the control system [11]. Moreover, MPC achieves multi-objective optimization by adjusting weights, improving system flexibility. With its unique strengths in predictive model construction and optimal control solving, MPC has garnered widespread attention from researchers.

For the construction of the MPC prediction model, the fixed prediction time domain is usually used in the traditional method [12]. However, when dealing with complex driving conditions, a fixed prediction range can limit the accuracy of trajectory tracking. Currently, researchers widely adopt variable horizon MPC algorithms to address these challenges. Pei et al. [13] proposed an MPC controller that updates the prediction horizon based on vehicle speed. The experimental results demonstrate that the proposed controller achieves high control accuracy, but it is difficult to meet the control requirements of complex road conditions. Wu et al. [14] used a gray relational analysis method to determine the optimal prediction horizon parameters for MPC under different vehicle speeds. This approach improved the computational efficiency, but the tracking accuracy of the modified controller still needs further improvement. Zhang et al. [15] proposed a fuzzy adaptive trajectory tracking control approach based on road surface adhesion coefficient estimation. However, the above studies only focus on optimizing the design for specific scenarios, resulting in response delays in dynamic environments such as sudden curvature changes, making it difficult to maintain stable control performance.

For the optimal control part of MPC, in order to balance the importance of each objective, weight factors are often introduced for adjustment. Mei et al. [16] proposed an online self-tuning method for weight coefficients based on an improved particle swarm algorithm. Through iterative optimization, the weight coefficients that minimize the objective function were obtained. However, its adaptability needs to be enhanced. Sun et al. [17] used a fuzzy adaptive track tracking control based on the estimation of a road adhesion coefficient, which improved the stability of the controller. Traditional MPC relies on accurate system models and fixed control strategies, which have certain limitations. Reinforcement Learning (RL), due to its strengths in dynamic decision making and adaptive optimization, has been increasingly integrated with MPC by researchers [18]. Li et al. [19] used Deep Deterministic Policy Gradient (DDPG) to perform offline tuning of the weight matrix in the MPC objective function, training an optimal weight matrix. Shi et al. [20] proposed a risk threshold model based on RL, classifying scene risk levels according to scenario features, improving the adaptability of MPC to real-world scenarios. Wang et al. [21] designed a reward function to guide the adjustment of neural network parameters, improving the parking efficiency of intelligent vehicles. However, learning-based MPC algorithms face challenges such as instability caused by reliance on learned strategies and have limited generalization abilities due to complex modeling, requiring further optimization and improvement.

In summary, to solve the problem of control performance decline caused by the fixed prediction time domain and weight matrix solution, an adaptive model prediction controller based on reinforcement learning optimization is proposed in this paper. The prediction model and optimal solution of MPC are improved, which effectively improves the adaptability and tracking accuracy of the controller in various driving scenarios. The technology route is shown in Figure 1 and the main contributions can be summarized as follows: Firstly, a three-degree-of-freedom vehicle dynamics model and a trajectory tracking error model are established; Secondly, an adaptive prediction horizon strategy based on a two-dimensional Gaussian function is designed and the real-time adjustment of prediction time domain change with vehicle speed and road curvature is realized. The tracking accuracy is further improved; Thirdly, to address the problem of tracking stability reduction under complex road conditions, the Deep Q-Network (DQN) algorithm is used to adjust the weight matrix of the Model Predictive Control (MPC) algorithm, and the convergence speed and control effectiveness of the tracking controller are improved; Finally, to verify the performance of the proposed controller, co-simulation, hardware-in-the-loop tests and real vehicle tests are conducted.

The remainder of this paper is organized as follows. Section 2 conducts vehicle modeling; Section 3 provides a detailed introduction to the improved controller; in Section 4, simulation experiments are conducted; in Section 5, bench and vehicle validation is conducted; and Section 6 concludes the paper.

2. Vehicle Modeling and State Equations

The vehicle model and tracking error are key factors in studying trajectory tracking control problems. To track the desired trajectory, the control system must first define the tracking error in the objective function and then design a trajectory tracking control strategy based on the vehicle model to eliminate the defined tracking error. Therefore, it is crucial to construct a vehicle model that can accurately characterize the vehicle’s motion state while meeting the real-time performance of the control algorithm.

2.1. Vehicle Three-Degree of Freedom Dynamic Model

The vehicle dynamics model considers tire deformation, force, and relative motion, offering stronger predictive performance and stability compared to kinematic models. This paper focuses on the coupled longitudinal and lateral control of vehicles based on an improved model predictive algorithm. It establishes a vehicle dynamics model and tire model for trajectory tracking control, modeling the three degrees of freedom in longitudinal motion, lateral motion, and yaw motion to analyze the system’s accuracy, stability, and robustness. It is assumed that the forces on both sides are symmetrical, as shown in Figure 2 for the three-degree-of-freedom vehicle model [22].

In Figure 2, OXY represents the inertial coordinate system; oxy represents the vehicle body coordinate system; a and b are the distances from the front and rear axle centers to the center of mass; F_cf and F_cr are the cornering forces on the front and rear wheels; α_f and α_r are the tire slip angles of the front and rear wheels; F_lf and F_lr are the longitudinal forces on the front and rear wheels; F_xf and F_xr are the x-direction forces on the front and rear tires; F_yf and F_yr are the y-direction forces on the front and rear tires; δ_f, β are the front wheel steering angle and the side slip angle of the center of mass, respectively; v is the velocity of the vehicle’s center of mass, v_x is the longitudinal component of the center of mass velocity, v_y is the lateral component of the center of mass velocity; and

φ

and

\dot{φ}

are the vehicle yaw angle and yaw rate, respectively.

Based on Newton’s second law, the mechanical equations for the vehicle along the three axes are shown in Equation System (1):

\{\begin{cases} m ({\dot{v}}_{x} - v_{y} \dot{φ}) = F_{x f} + F_{x r} \\ m ({\dot{v}}_{y} + v_{x} \dot{φ}) = F_{y f} + F_{y r} \\ I_{z} \ddot{φ} = a F_{y f} - b F_{y r} \end{cases},

(1)

where m represents the total mass of the vehicle, and I_z represents the moment of inertia.

δ_{f}, δ_{r}

are the front and rear wheel steering angles, respectively. This paper only considers front-wheel steering; thus,

δ_{r} = 0

. The tire forces in the longitudinal and lateral directions can be decomposed into the resultant forces along the x-axis and y-axis of the vehicle body coordinate system, as shown in Equation (2):

\begin{array}{l} F_{x f} = F_{l f} \cos δ_{f} - F_{c f} \sin δ_{f} \\ F_{x r} = F_{l r} \\ F_{y f} = F_{l f} \sin δ_{f} + F_{c f} \cos δ_{f} \\ F_{y r} = F_{c r} \end{array},

(2)

When the tire slip angle and slip rate are small, based on the magic formula, tire forces can be approximated as linear functions, as shown in Equation (3):

\{\begin{cases} F_{x f} = C_{l f} s_{f}, F_{y f} = C_{α f} α_{f} \\ F_{x r} = C_{l r} s_{r}, F_{y r} = C_{α r} α_{r} \end{cases},

(3)

where C_lf and C_lr represent the longitudinal stiffness of the front and rear tires;

s_{f}

,

s_{r}

represent the slip ratios of the front and rear tires;

C_{α f}

and

C_{α r}

are the longitudinal cornering stiffness of the front and rear tires; and

α_{f}

,

α_{r}

are the tire slip angles of the front and rear tires.

To further simplify the equations and improve the fitting accuracy of the tire model, based on the small-angle assumption

\cos θ \approx 1, \sin θ \approx 0, \tan θ \approx 0

, the slip angles of the front and rear wheels can be approximated as Equation (4):

\{\begin{cases} α_{f} = \frac{v_{y} + a \dot{φ}}{v_{x}} - δ_{f} \\ α_{r} = \frac{v_{y} - b \dot{φ}}{v_{x}} \end{cases},

(4)

Based on the above analysis, the three-degree-of-freedom vehicle dynamics model for lateral motion, longitudinal motion, and yaw motion is established as shown in Equation (5):

\{\begin{cases} m {\dot{v}}_{y} = - m v_{x} \dot{φ} + C_{l f} s_{f} δ_{f} + C_{α f} (\frac{v_{y} + a \dot{φ}}{v_{x}} - δ_{f}) + C_{α r} (\frac{v_{y} - b \dot{φ}}{v_{x}}) \\ m {\dot{v}}_{x} = m v_{y} \dot{φ} + C_{l f} s_{f} - C_{α f} (\frac{v_{y} + a \dot{φ}}{v_{x}} - δ_{f}) δ_{f} + C_{l r} s_{r} \\ I_{z} \ddot{φ} = a [C_{l f} s_{f} δ_{f} + C_{α f} (\frac{v_{y} + a \dot{φ}}{v_{x}} - δ_{f})] - b C_{α r} (\frac{v_{y} - b \dot{φ}}{v_{x}}) \end{cases},

(5)

Since the reference trajectory information is represented based on the geodetic coordinate system, and the aforementioned three-degree-of-freedom vehicle dynamics model is built under the vehicle body coordinate system, it is necessary to perform a conversion between the vehicle body coordinate system and the ground inertial coordinate system. The conversion formula is Equation System (6):

\{\begin{cases} \dot{Y} = \dot{x} \sin φ + \dot{y} \cos φ \\ \dot{X} = \dot{x} \cos φ - \dot{y} \sin φ \end{cases},

(6)

In summary, the state variable of the system is

x = {[\begin{matrix} \begin{matrix} v_{y} & v_{x} & φ & \dot{φ} \end{matrix} & Y & X \end{matrix}]}^{T}

, which is used to evaluate the accuracy and stability of the trajectory tracking control strategy. The control variable of the system is

u = {[\begin{matrix} δ_{f} & v_{x} \end{matrix}]}^{T}

, which is the independent variable for the optimization solution of the subsequent controller.

2.2. Error Model of Trajectory Tracking

In the autonomous driving control module, it is necessary to calculate the errors in the Frenet coordinate system [23], which is based on the actual state in the Cartesian coordinate system and the projected state of the reference trajectory, and minimize these errors through control algorithms. The selected error state variables include lateral position error, lateral velocity error, yaw angle error, yaw rate error, longitudinal position error, and longitudinal velocity error, which are used to establish a dynamic error model. During trajectory tracking, road curvature has a significant impact on the vehicle steering characteristics and driving stability, directly affecting model accuracy. Therefore, the error model must fully consider variations in road curvature. The specific model is shown in Figure 3.

In the Cartesian coordinate system, the state of a vehicle can be represented by

(x, y, k, φ, v, a)

, where

(x, y)

is the position coordinate of the vehicle in the Cartesian coordinate system, and

(k, φ, v, a)

are the road curvature, yaw angle, velocity, and acceleration at the center of mass; in the Frenet coordinate system, the state of the vehicle can be represented by

(s, \dot{s}, \ddot{s}, d, \dot{d}, \ddot{d})

, where

s, \dot{s}, \ddot{s}

are the longitudinal displacement, longitudinal velocity, and longitudinal acceleration of the vehicle, and

d, \dot{d}, \ddot{d}

are the lateral displacement, lateral velocity, and lateral acceleration of the vehicle, respectively.

The reference line of a vehicle is usually composed of a series of discrete points. When the number of discrete points is sufficient, the sum of the distances between points can be approximately considered as the lateral displacement s. Assuming that projection point

(x_{n}, y_{n})

is the n-th point among the discrete points, then the longitudinal displacement s is Formula (7):

s = \sum_{n = 1}^{N} {[{(x_{n} - x_{n - 1})}^{2} + {(y_{n} - y_{n - 1})}^{2}]}^{\frac{1}{2}},

(7)

Based on the triangular vector relationship, the lateral displacement d can be calculated as Formula (8):

d = (\vec{h} - \vec{r}) \cdot {\vec{n}}_{r e f},

(8)

Furthermore, the longitudinal velocity

\dot{s}

and lateral velocity

\dot{d}

can be derived as Formulas (9) and (10):

\dot{s} = \frac{|{\vec{v}}_{h}| \cos (θ - φ_{r e f})}{1 - k_{r e f} e_{d}},

(9)

\begin{array}{l} \dot{d} & = |{\vec{v}}_{h}| \sin (θ - φ_{r e f}) \\ = |{\vec{v}}_{h}| \sin (φ + β - φ_{r e f}) \\ = |{\vec{v}}_{h}| \sin β \cos (φ - φ_{r e f}) + |{\vec{v}}_{h}| \cos β \sin (φ - φ_{r e f}) \\ = v_{y} \cos (φ - φ_{r e f}) + v_{x} \sin (φ - φ_{r e f}) \end{array},

(10)

where

φ

is the vehicle yaw angle, and

β

is the sideslip angle at the center of mass. According to automotive theory, the heading angle

θ = φ + β

.

In the Frenet coordinate system, the lateral error e_d is the lateral distance from the center of mass of the vehicle to the reference trajectory point P; the longitudinal error e_s is the longitudinal distance from the center of mass of the vehicle to the reference trajectory point P; and

φ_{r e f}

is the angle between the tangent of point P on the reference trajectory and the X-axis, the heading angle error is

e_{φ} = φ - φ_{r e f}

, and the road curvature of point P on the reference trajectory is

ρ_{r e f}

. The car trajectory tracking error equation is summarized as Equation System (11):

\{\begin{cases} e_{d} = d \\ {\dot{e}}_{d} = v_{x} e_{φ} + v_{y} \\ e_{φ} = θ - φ_{r e f} \approx φ - φ_{r e f} \\ {\dot{e}}_{φ} = \dot{φ} - \frac{ρ_{r e f} v_{x}}{1 - ρ_{r e f} e_{d}} \\ {\dot{e}}_{s t a t i o n} = e_{s p e e d} = {\dot{e}}_{s} \\ {\dot{e}}_{s p e e d} = a - a_{r e f} = Δ a \end{cases},

(11)

2.3. Linearization and Discretization of the Model

At a certain operating moment, the system’s state variable is

x_{r}

, and the control variable is

u_{r}

. The state-space equation at the initial operating point of the system is given by the following:

{\dot{x}}_{r} = f (x_{r}, u_{r})

.

To linearize the state-space equation around any point

(x_{r}, u_{r})

in the system, perform a first-order Taylor expansion, neglecting higher-order terms and retaining only the first-order terms. This transforms the nonlinear model into a computationally convenient linear model. The result is shown in Equation (12):

\dot{x} = f (x_{r}, u_{r}) + \underset{λ (x)}{\underset{⏟}{\frac{\partial f (x, u)}{\partial x}}} (x - x_{r}) + \underset{λ (u)}{\underset{⏟}{\frac{\partial f (x, u)}{\partial u}}} (u - u_{r}),

(12)

where

λ (x)

is the Jacobian matrix of the system state transition function f with respect to the state variable x, and

λ (u)

is the Jacobian matrix of the system state transition function f with respect to the control variable u.

Let

A = λ (x), B = λ (u)

, by subtracting Equation (12) from the state space equation of the initial operating point of the system. Then, obtain the linear state space expression, as shown in Equation (13):

\dot{\tilde{x}} = A \tilde{x} + B \tilde{u},

(13)

where

\dot{\tilde{x}} = \dot{x} - {\dot{x}}_{r}

,

\tilde{x} = x - x_{r}

,

\tilde{u} = u - u_{r}

and

\tilde{x} = {[\begin{matrix} \begin{matrix} \begin{matrix} {\dot{e}}_{d} & {\dot{e}}_{s} \end{matrix} & e_{φ} \end{matrix} & {\dot{e}}_{φ} & e_{d} & e_{s} \end{matrix}]}^{T}

,

\tilde{u} = {[\begin{matrix} Δ δ_{f} & Δ v_{x} \end{matrix}]}^{T}

.

A = [\begin{matrix} - \frac{2 (C_{α f} + C_{α r})}{m v_{x}} & \frac{\partial f (v_{y})}{\partial v_{x}} & 0 & - v_{x} + \frac{2 (a C_{α f} + b C_{α r})}{m v_{x}} & 0 & 0 \\ \dot{φ} - \frac{2 C_{α f} δ}{m v_{x}} & \frac{\partial f (v_{x})}{\partial v_{x}} & 0 & v_{y} - \frac{2 b C_{α f} δ}{m v_{x}} & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ \frac{- 2 a C_{α f} + 2 b C_{α r}}{I_{z} v_{x}} & \frac{\partial f (\dot{φ})}{\partial v_{x}} & 0 & - \frac{2 (a^{2} C_{α f} + b^{2} C_{α r})}{I_{z} v_{x}} & 0 & 0 \\ \cos φ & \sin φ & v_{x} \cos φ - v_{y} \sin φ & 0 & 0 & 0 \\ - \sin φ & \cos φ & - v_{y} \cos φ - v_{x} \sin φ & 0 & 0 & 0 \end{matrix}],

(14)

B = [\begin{matrix} \frac{2 C_{α f}}{m} & \frac{\partial f (v_{y})}{\partial v_{x}} \\ \frac{\partial f (v_{x})}{\partial δ} & \frac{\partial f (v_{x})}{\partial v_{x}} \\ 0 & 0 \\ \frac{2 a C_{α f}}{I_{z}} & \frac{\partial f (\dot{φ})}{\partial v_{x}} \\ 0 & \sin φ \\ 0 & \cos φ \end{matrix}],

(15)

where

\{\begin{cases} \frac{\partial f (v_{y})}{\partial v_{x}} = [2 C_{α f} (v_{y} + a \dot{φ}) + 2 C_{α r} (v_{y} - b \dot{φ})] / m {v_{x}}^{2} - \dot{φ} \\ \frac{\partial f (v_{x})}{\partial v_{x}} = [2 C_{α f} δ (v_{y} + a \dot{φ})] / m {v_{x}}^{2} \\ \frac{\partial f (\dot{φ})}{\partial v_{x}} = [2 a C_{α f} (v_{y} + a \dot{φ}) - 2 b C_{α r} (v_{y} - b \dot{φ})] / I_{z} {v_{x}}^{2} \\ \frac{\partial f (v_{x})}{\partial δ} = [2 C_{α f} (2 δ - \frac{v_{y} + a \dot{φ}}{v_{x}})] / m \end{cases}

.

The matrix A describes the coupling relationships among different state variables of the vehicle system, while the matrix B represents the effects of control inputs (steering angle and velocity) on vehicle motion. By linearization, these two matrices allow the complex nonlinear vehicle model to be approximated as a linear system within a certain operating range, thereby facilitating controller design and analysis. It has been verified that the midpoint Euler method is more accurate than the forward Euler method and the backward Euler method. Therefore, the midpoint Euler method is used to discretize Formula (13) and obtain the discrete expression of the state space, which is as follows:

\dot{\tilde{x}} (k + 1) = A_{k t} \tilde{x} (k) + B_{k t} \tilde{u} (k),

(16)

where

A_{k t} = I + T A_{d} (t), B_{k t} = I + T B_{d} (t)

, I is the identity matrix and T is the sampling time.

By linearizing and discretizing the nonlinear state space, a linear time-varying prediction model is established. In the prediction horizon, the difference between the current state and the control input is used to predict the future system state and output, iterating continuously [24]. Then, the rolling optimization of Model Predictive Control is performed by establishing the objective function of the system and the constraints during the driving process, followed by optimization and solution.

3. Design of a Learning-Based Adaptive Trajectory Tracking Controller

MPC is an advanced control method. In this chapter, two improvements are made to the benchmark MPC controller. The first is to design an adaptive predictive time-domain strategy, and the second is to improve the weight matrix based on reinforcement learning methods to achieve precise and stable control of the system.

3.1. Objective Function with Constraints

The objective function with constraints is one of the key components of the coupled longitudinal and lateral MPC controller. To balance accuracy, stability, and ride comfort during the trajectory tracking process, the objective function needs to trade off tracking error, control consumption, and the magnitude of control increment changes.

Due to the many inherent physical constraints of the vehicle, a small change in the steering wheel angle at higher vehicle speeds can cause significant lateral acceleration. Therefore, hard constraints must be imposed on the control variables during the solution process. Additionally, to ensure the feasibility of optimization, soft constraints are applied to the control increments. Thus, Formula (17) can be obtained:

\{\begin{cases} u_{\min} \leq u (t) \leq u_{\max}, t = 0, 1, \dots, N_{c} - 1 \\ Δ u_{\min} - ε \leq Δ u (t) \leq Δ u_{\max} + ε, t = 0, 1, \dots, N_{c} - 1 \end{cases},

(17)

where

u_{\min} = {[\begin{matrix} δ_{\min} & v_{\min} \end{matrix}]}^{T}

represents the lower constraint bound of the control variable;

u_{\max} = {[\begin{matrix} δ_{\max} & v_{\max} \end{matrix}]}^{T}

represents the upper constraint bound of the control variable;

Δ u_{\min} = {[\begin{matrix} Δ δ_{\min} & Δ v_{\min} \end{matrix}]}^{T}

represents the lower constraint bound of the control increment; and

Δ u_{\max} = {[\begin{matrix} Δ δ_{\max} & Δ v_{\max} \end{matrix}]}^{T}

represents the upper constraint bound of the control increment.

Taking into account the constraints of the road lateral position, tire roll angle, vehicle adhesion conditions, and control increment, the design objective function and constraints are as shown in Equation (18):

\min J (x (t), Δ u (t), ε) = \min_{x_{k}, u_{k}} \sum_{i = 1}^{N_{p}} ‖x (t + i |t) - {x_{r e f} (t + i |t)‖}_{Q}^{2} + \sum_{i = 0}^{N_{c} - 1} {‖Δ u (t + i |t)‖}_{R}^{2} + ρ ε^{2} s . t . \{\begin{cases} Δ u_{\min} \leq Δ u \leq Δ u_{\max} \\ u_{\min} \leq A Δ u_{t} + u_{t} \leq u_{\max} \\ y_{h c, \min} \leq y_{h c} \leq y_{h c, \max} \\ y_{s c, \min} - ε \leq y_{s c} \leq y_{s c, \max} + ε \\ ε > 0 \end{cases},

(18)

where

x (k + i |t)

represents the actual output,

x_{r e f} (k + i |t)

represents the reference output,

i = 1, 2, \dots, N_{c}, \dots N_{p}

,

N_{c}

is the control horizon,

N_{p}

is the prediction horizon,

Δ u (k + i |t)

is the control increment, Q is the weight matrix for the state variables, R is the weight matrix for the control increments,

ρ

is the weight for the relaxation factor,

ε

is the relaxation factor,

y_{h c}

represents the hard constraints, which are outputs that cannot have their constraint ranges relaxed, and

y_{s c}

represents the soft constraints, which are outputs whose constraint ranges can be adjusted through relaxation factors.

MPC controller solving problems belongs to constrained convex quadratic programming, which has good feasibility and optimality. Although the computational complexity increases with the increase in the prediction step size and number of constraints, setting reasonable prediction time domains and other methods can meet the requirements of real-time vehicle control. Therefore, this algorithm has strong feasibility and promotional value in both theoretical and engineering applications.

3.2. Adaptive Prediction Horizon Strategy

In the MPC algorithm, considering the greater influence of the prediction horizon N_p, a control horizon N_c is set to 10 under the premise of ensuring vehicle stability. The criteria for selecting the prediction horizon vary with different driving speeds. At low speeds, a shorter prediction horizon can achieve better control effects; at high speeds, increasing the prediction horizon ensures vehicle stability. Additionally, the prediction horizon must consider the curvature characteristics of the road ahead. When the road curvature is large, the prediction horizon should be reduced to improve control precision; when the road curvature is small, the prediction horizon should be increased to maintain vehicle stability.

To enhance the controller’s performance under various road curvatures and speed conditions, this paper proposes an adaptive prediction horizon strategy based on a two-dimensional Gaussian function. This method designs a smooth and continuous prediction horizon adjustment mechanism, more effectively accommodating the complexity and diversity of curved road driving. Compared to traditional methods, this strategy not only reduces computational complexity but also significantly improves the stability and adaptability of the controller. Furthermore, the design of the prediction horizon fully considers the coupling relationship between vehicle speed and road curvature, and its form is as shown in Formula (19):

ρ = \frac{\dot{x} \ddot{y} - \dot{y} \ddot{x}}{{({\dot{x}}^{2} + {\dot{y}}^{2})}^{3 / 2}},

(19)

where

ρ

represents the road curvature,

\dot{x}, \dot{y}

are the first derivatives of the x and y coordinates of the curve at a certain moment, and

\ddot{x}, \ddot{y}

are the second derivatives of the x and y coordinates, respectively.

N_{p} (v, ρ) = r o u n d (N_{p, \max} \exp (\frac{{(v_{s + Δ s} - v_{s})}^{2}}{{σ_{1}}^{2}} + \frac{ρ^{2}}{{σ_{2}}^{2}})),

(20)

where

N_{p, \max}

represents the upper limit of the prediction horizon,

σ_{1}

is the speed influence factor, and

σ_{2}

is the road curvature influence factor.

Based on the vehicle characteristics and performance requirements, the parameters are calibrated to obtain a predicted horizon offline database. Combined with the vehicle model and control algorithm, the three-dimensional distribution of the prediction horizon concerning vehicle speed and road curvature is obtained, as shown in Figure 4.

3.3. Solution of Controller

To facilitate the solution and computation by a computer, the optimization problem can be transformed into a quadratic programming problem, and its standard form is Equation (21):

\min J = \min \frac{1}{2} {\tilde{U}}^{T} H \tilde{U} + {\tilde{U}}^{T} f π s . t . \{\begin{cases} l b A \leq A x \leq u b A \\ l b \leq x \leq u b \end{cases},

(21)

where

\tilde{U} = [\begin{matrix} Δ U (k) \\ ε \end{matrix}]

;

Δ U = I_{N_{c} \times 1} \otimes Δ u

(

\otimes

denotes the Kronecker product); the Hessian matrix is denoted by

H = 2 [\begin{matrix} Z^{T} Q Z + R & 0 \\ 0 & ρ \end{matrix}]

; the gradient matrix is denoted by

f = [\begin{matrix} 2 E^{T} Q Z & 0 \end{matrix}]

;

Q = d i a g (\begin{matrix} \begin{matrix} \begin{matrix} W_{{\dot{e}}_{d}} & W_{{\dot{e}}_{s}} \end{matrix} & W_{e_{φ}} \end{matrix} & W_{{\dot{e}}_{φ}} & W_{e_{d}} & W_{e_{s}} \end{matrix})

;

R = d i a g (\begin{matrix} W_{δ_{f}} & W_{v_{x}} \end{matrix})

; x represents the optimization variable;

l b

represents the lower bound of the optimization variable constraints;

u b

represents the upper bound of the optimization variable constraints; and

A

is the coefficient matrix of the inequality.

Q and R are the weight matrices, defining the priority in control objectives to balance the weights of system state deviations and control input weights. Due to the need to consider driving smoothness and riding comfort during the design process of the coupled lateral and longitudinal control system, the changes in state increments and control increments should be as small as possible. Therefore, they require precise constraints, where

W_{e_{d}}

is the weight for lateral position deviation;

W_{{\dot{e}}_{d}}

is the weight for lateral velocity deviation;

W_{e_{s}}

is the weight for longitudinal position deviation;

W_{e_{s}}

is the weight for longitudinal velocity deviation;

W_{e_{φ}}

is the weight for yaw angle deviation;

W_{{\dot{e}}_{φ}}

is the weight for yaw rate deviation;

W_{δ_{f}}

is the weight for the front wheel steering angle; and

W_{v_{x}}

is the weight for longitudinal velocity. For the state weight matrix

Q = d i a g (q_{11}, q_{22}, \dots q_{n n})

, the larger the value

q_{i i}

, the higher the sensitivity of the system to the deviation of the state variable

x_{i}

from the target, meaning that the system places more emphasis on the accuracy of that state variable to ensure that the system is closer to the target state. For the control weight matrix

R = d i a g (r_{11}, r_{22}, \dots r_{n n})

, the larger the values

r_{i i}

, the more the system tends to limit the amplitude of the control input

u_{i}

, avoiding drastic adjustments, which can ensure the smoothness of control actions and reduce system wear. The improper selection of weight matrices may lead to system oscillation, energy waste, poor tracking performance, and other issues.

3.4. Optimization of Weight Matrix Based on DQN

The adaptive prediction horizon MPC faces issues of control delay during vehicle steering and return to straight, leading to increased tracking error, especially at high speeds or under extreme conditions. Reinforcement learning algorithms can optimize weights in real time based on environmental characteristics, allowing controllers to maintain trajectory tracking accuracy in complex environments and avoid control instability. In terms of optimizing the MPC weight matrix, a deep Q-network centered on the value function does not require additional learning of the system’s dynamic environment model. It directly learns Q values using interaction data, bypassing the modeling step. This approach is particularly advantageous for scenarios with complex or difficult-to-model dynamic models. Therefore, this paper adopts a deep Q-network to optimize the weight matrix. By penalizing behaviors that deviate from the expected state and control actions, a more precise control target is formed. The DQN can help MPC find the optimal values for these matrices, achieving the best control performance of the controller. The specific implementation process is shown in Figure 5.

The state spaces includes the current state and the current values of the weight matrices

s_{t} = [\begin{matrix} \begin{matrix} {\dot{e}}_{d} & {\dot{e}}_{s} & e_{φ} & \begin{matrix} {\dot{e}}_{φ} & e_{d} \end{matrix} \end{matrix} & e_{s} & Q & R \end{matrix}]

; the action space a involves adjusting the strategies for Q and R, with each action being an adjustment to each element of these matrices. Since the matrices are diagonal, actions can be defined as

a = (Δ q_{11}, Δ q_{22}, Δ q_{33}, Δ q_{44}, Δ q_{55}, Δ q_{66}, Δ r_{11}, Δ r_{22})

. Actions can be discretized,

Δ = \{\begin{matrix} - 1000 & 0 & 1000 \end{matrix}\}

, such as by increasing, maintaining, or decreasing a specific weight. The action a_t adjusts Q and R, and the MPC controller computes the control input u_t, resulting in the next state s_t₊₁ and the immediate reward r_t; the reward function consists of two parts: one is the state error penalty

J_{s t a t e} = \sum_{k = 0}^{N_{c} - 1} {e_{k}}^{T} Q e_{k}

, and the other is the control input penalty

J_{c o n t r o l} = \sum_{k = o}^{N_{c} - 1} {u_{k}}^{T} R u_{k}

. The total reward can be expressed as

r_{t} = - (ω_{1} J_{s t a t e} + ω_{2} J_{c o n t r o l})

, where smaller state errors and control costs lead to higher rewards.

ω_{1}, ω_{2}

are used to balance the proportion between state error and control input.

The Deep Q-Network algorithm uses the Bellman equation to iteratively update the

Q_{π} (s, a)

. The parameter values and explanations of the Deep Q-Network are shown in Table 1, and the pseudocode of the algorithm is shown in Algorithm 1.

Algorithm 1 DQN-Based Tuning of MPC Weighting Matrices

Input: Environment model f(x,u); Discrete action set A = {(Q1, R1), (Q2, R2), …, (Qm, Rm)}; State space S; Learning parameters: γ, α, ε, replay buffer size M, batch size B, target update interval T

Output : Optimized Q_{π}

(s,a) function approximator
//Initialize experience replay and networks
1: Initialize replay buffer D←∅
2:

Initialize action - value network Q_{π}

(s,a; θ) with random weights θ
3:

Initialize target network Q_{π}

’ (s, a; θ -) \leftarrow Q_{π}

(s,a; θ)
//Loop over training episodes
4: for episode = 1 to N_episodes do
5: Initialize state s
6: for t = 1 to T_episode do
7: with probability

ε

select random action a∈A

8 : Otherwise select a = argmax_a Q_{π}

(s,a; θ)
9: Decode (Q_weight, R_weight)←action a
10: Apply MPC using (Q_weight, R_weight) to compute u *
11: Simulate f(x,u *) to obtain next state s’
12: Compute reward r based on tracking error and control effort
13: Store transition (s,a,r,s’) in D
14: Sample random minibatch of B transitions (sj,aj,rj,sj’) from D

15 : Compute target yj \leftarrow rj + γ \cdot \max_{a ’} Q_{π}

′(sj′, a′; θ−)
16: Update θ by minimizing loss:

L = (1 / B) \cdot Σ j [yj - Q_{π}

(sj, aj; θ)]²
17: Every T steps, update target network:
θ−←θ
18: s←s’
19: end for
20: end for

Repeat the aforementioned process until the maximum number of training episodes is reached. Once the accumulated rewards converge within a certain number of episodes, select the Q and R matrices that yield the highest accumulated reward and smallest penalties as the optimization results. Then, incorporate these matrices into the objective function of the Quadratic Programming (QP) problem for solving.

J = {[Y - Y_{r e f}]}^{T} \bar{Q} [Y - Y_{r e f}] + Δ U^{T} \bar{R} Δ U + ε^{T} ρ ε,

(22)

where

\bar{Q} = I_{N_{p}} \otimes Q

,

\bar{R} = I_{N_{c}} \otimes R

.

During the optimization process, the rational selection of weight matrices is crucial for the efficient and stable operation of the MPC controller. The round training of reinforcement learning continuously adjusts the weight matrices. The convergence trend of accumulated rewards (or penalties) serves as the optimization criterion, clearly quantifying the performance differences between various Q and R combinations, avoiding the trial-and-error nature of traditional methods in selecting weights. Additionally, this reward-driven optimization approach has dynamic adaptability, breaking through the limitations of traditional algorithms that rely on manual parameter adjustments.

Under the constraints, the optimal weight matrix is obtained by minimizing the cost function, then incorporated into the objective function of the MPC optimal control input. The optimal variable in each control period of the controller can be solved through the following formula, while also obtaining a series of control increments and relaxation factors within the control horizon. Relaxation factors are typically used to handle uncertainties in system constraints, which helps improve the stability and efficiency of the solution process.

Δ U_{t}^{*} = {[\begin{matrix} \begin{matrix} Δ u_{t}^{*} & Δ u_{t + 1}^{*} \end{matrix} & \dots & Δ u_{t + N_{c} - 1}^{*} & ε \end{matrix}]}^{T},

(23)

In subsequent control processes, the first term in the optimized control sequence is used as the optimal input at the current moment and fed back into the system, and then the process is repeated. Through rolling optimization, the controller can continuously update the control inputs to achieve precise control of the entire system. Especially in trajectory tracking problems, the MPC method can adjust control inputs based on real-time feedback information and future predictions, allowing the controller to precisely control complex dynamic systems and achieve the desired control objectives.

u (t) = u (t - 1) + Δ u_{t},

(24)

4. Simulation Analysis

This chapter validates the performance of the improved controller, analyzes the experimental results, and compares the performance between algorithms.

4.1. Simulation Modeling

To verify the superiority of the improved controller, this paper builds a joint simulation platform based on CarSim and MATLAB/Simulink, as shown in Figure 6. CarSim provides a high-fidelity vehicle dynamics model, including suspension, tires, and powertrain modules, and transmits real-time state information such as the vehicle position, speed, and yaw rate to the controller. The controller, implemented in Simulink, calculates the optimal control inputs (steering angle and acceleration/braking) at each sampling step and sends them back to CarSim, thereby forming a closed-loop control process. Table 2 summarizes the vehicle and controller parameters used in the simulation. To comprehensively evaluate the proposed method, multiple driving scenarios are designed. The controller performance is assessed in terms of trajectory tracking accuracy and disturbance robustness, enabling a complete and fair comparison with the baseline method.

4.2. Reference Trajectory

To verify the performance of the designed controller at higher vehicle speeds and on roads with varying curvatures, this paper sets a path with changing curvature. The path is divided into three segments with small, medium, and large curvature changes. The maximum curvatures of these segments are 0.005, 0.01, and 0.015, respectively, simulating the driving conditions of the controller in different bends. As shown in Figure 7.

In addition, the double lane path is selected as the second path to test the performance of the controller. As shown in Figure 8. It has the following key advantages: The double lane path involves significant direction changes and different curvature transitions, effectively simulating dynamic curvature changes in road conditions, which can evaluate the adaptability of the controller when facing various curvatures. The double lane path simulates complex scenarios such as overtaking and obstacle avoidance that may occur in real driving, testing the controller’s response capability in high-dynamic situations. Moreover, experiments with the double lane path can be repeated multiple times, facilitating performance comparisons between different controllers and providing feasibility assurance for subsequent real vehicle experiments.

4.3. Evaluation Index

Maximum Absolute Error (MAX), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE) [25] are commonly used model evaluation indices that can analyze the performance of algorithms from different perspectives. The specific calculation formula is shown in Formulas (25)–(27):

Calculated value : y = \{y_{1}, y_{2}, \dots, y_{n}\} Reference value : \hat{y} = \{{\hat{y}}_{1}, {\hat{y}}_{2}, \dots, {\hat{y}}_{n}\}

M A X = \max_{i \in [1, n]} |y_{i} - {\hat{y}}_{i}|

(25)

M A E = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - {\hat{y}}_{i}|

(26)

R M S E = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}

(27)

Firstly, the tracking performance comparison of the unimproved MPC controller (MPC), the adaptive predictive horizon MPC controller (AP-MPC), and the adaptive predictive horizon MPC controller, based on deep Q-network (DQN-AP-MPC), on variable curvature roads at a speed of 90 km/h is shown in Figure 9 and Figure 10.

The simulation experiment results show that under the extreme working conditions of high speed and large curvature, the maximum lateral error of the controller designed in this paper is only 0.33958 m, which is comparable to the error of Wang et al. [26] at a speed of 80 km/h. The confidence interval for lateral tracking error is [−0.03334 0.22112], with a confidence level of over 95%. Compared with the unimproved MPC controller, the tracking accuracy has been improved by 76.80671%, and the maximum longitudinal speed error is only 0.33546 km/h. After passing through a section with a large curvature, the improved controller can quickly track the target path and target vehicle speed, and control the error within a small range. The organized data results are shown in Table 3, which fully demonstrates the superior performance of the controller designed in this paper in dealing with extreme working conditions, providing reference for practical applications.

Secondly, compare the tracking performance of three controllers on the double line path. This article selects three types of vehicle speeds to test the performance of the controller, with 36 km/h for low-speed conditions, 54 km/h for medium-speed conditions, and 72 km/h for high-speed conditions.

From Figure 11 and Figure 12, it can be seen that under high-speed conditions, the maximum tracking error of AP-MPC is reduced by 29.33% compared to MPC, the maximum tracking error of DQN-AP-MPC is reduced by 52.24% and the average absolute error of DQN-AP-MPC is 0.0797 m, which is reduced by 52.23% compared to MPC. Not only has the tracking accuracy of the DQN-AP-MPC controller been improved, but the comparison of different controller parameters in the figure also shows that its convergence speed and stability have been improved.

From Figure 13, it can be seen that under medium-speed conditions, compared to MPC, the maximum tracking error of AP-MPC is reduced by 29.23%, and the maximum tracking error of DQN-AP-MPC is reduced by 59.51%. The average absolute error of DQN-AP-MPC is 0.0366 m, which is a 66.57% reduction compared to MPC. Compared to high-speed conditions, the performance improvement of the enhanced trajectory tracking controller is more significant.

From Figure 14, it can be seen that under low-speed conditions, compared to MPC, the maximum tracking error of AP-MPC is reduced by 28.33%, and the maximum tracking error of DQN-AP-MPC is reduced by 67.85%. The average absolute error is reduced by 63.95%, and the root mean square error is reduced by 46.48%.

The trajectory tracking error analysis of the three controllers proposed in this article at different vehicle speeds is shown in Table 4. The confidence interval for the lateral tracking error at a speed of 72 km/h is [−0.04894 0.20831]; the confidence interval for the lateral tracking error at a speed of 54 km/h is [−0.02261 0.09581]; the confidence interval for the lateral tracking error at a speed of 36 km/h is [−0.01413 0.06852]; and the confidence level has reached over 95%. Through data analysis, it can be seen that the improvement effect of the maximum tracking error becomes more significant as the vehicle speed decreases.

5. Hardware-in-the-Loop Experiment and Real Vehicle Experiment

5.1. The Hardware-in-the-Loop Platform

Hardware-in-the-loop testing involves incorporating some hardware into the simulation test, which can consider certain real-world factors and has a higher level of confidence. Compared to real vehicle road tests, HIL testing offers greater safety and allows for algorithm simulation experiments under high-risk conditions. Therefore, HIL testing is an indispensable step before real vehicle experiments. The HIL experimental framework used in this paper is shown in Figure 15.

5.2. Analysis of the Hardware-in-the-Loop Experimental Results

The double lane path can more comprehensively test the dynamic changes in real driving environments, imposing higher requirements on the response capability and stability of vehicle control systems. Therefore, the hardware-in-the-loop experiment selects the double lane path for testing.

Figure 16 and Figure 17 show the hardware-in-the-loop experimental results of different controllers under higher speed conditions. Affected by hardware sensor noise and mechanical system lag, the MPC controller became unstable during the recovery process, with a recovery lateral error of 0.2399 m, compared to a simulation experiment recovery error of 0.1817 m. The AP-MPC controller had a maximum lateral error of 0.4219 m, with a simulation experiment maximum lateral error of 0.3578 m. The DQN-AP-MPC controller had a maximum lateral error of 0.2992 m and an average absolute error of 0.0817 m, with a simulation experiment maximum lateral error of 0.2834 m and an average absolute error of 0.0797 m. This verifies that the DQN-AP-MPC controller can complete the hardware-in-the-loop experiment with higher precision and a faster convergence speed under higher speed conditions. The differences between the hardware-in-the-loop experimental results and the simulation results are within an acceptable range, fully validating the effectiveness of the improved controller. This lays a foundation for further optimization and practical application of the controller.

5.3. Real Vehicle Experimental Equipment and Road Scenarios

This paper uses an intelligent vehicle modified based on the Haval H7 as the real vehicle experimental platform, as shown in Figure 18a. To collect and analyze driving data such as the vehicle position, front wheel steering angle, and yaw angle, the upper computer communicates with hardware devices to monitor the vehicle status in real time. The improved MPC control algorithm is written in Matlab and deployed in the ROS environment. The CAN nodes subscribe and send them to the vehicle chassis in the form of CAN messages. The specific connection situation is shown in Figure 18b.

The real vehicle experiment was conducted on a safe road section within the campus, with the road scene and path shown in Figure 19.

5.4. Analysis of Real Vehicle Experimental Results

To ensure the safety and controllability of the experiment, the real vehicle experiment was conducted at a controlled speed of 18 km/h. The results of the real vehicle experiment are shown in Figure 20. The chattering of the data in the figure is caused by the combined effects of the inherent characteristics of the system and external environmental interference, which are inevitable to some extent in practical applications. This is a common phenomenon in trajectory tracking control processes.

According to the analysis, the maximum lateral tracking error for path tracking is 0.3372 m, and the average absolute error is 0.1413 m; the average absolute error for the front wheel steering angle is 0.8793 degrees, which is slightly larger due to the influence of transmission system vibration and delay, but within the allowable range of error. The average absolute error for the yaw rate is 1.2731 degrees per second, and the average absolute error for the longitudinal vehicle speed is 0.1543 km/h. These parameters indicate that the improved controller accurately controls the basic dynamic characteristics of the vehicle. The real vehicle experimental results of the controller designed in this paper are basically consistent with the reference indicators, fully verifying the effectiveness and robustness of the improved controller in real driving environments, providing strong support for its promotion in practical applications.

6. Conclusions

To solve the problem of the tracking performance decline of intelligent vehicles in complex driving conditions, an adaptive model predictive tracking controller based on Deep Q-Network is proposed in this paper. By improving the prediction model part and optimization solution part of the MPC algorithm, the tracking performance and the adaptability are further improved. The experimental result proves that the designed adaptive prediction time domain strategy based on two-dimensional Gaussian function achieves the real-time adjustment of MPC prediction horizon, and the adaptive prediction time domain improves the tracking accuracy of the controller under variable curvature and other complex operating conditions. On this basis, the reinforcement learning algorithm of DQN is used to optimize and adjust the weight matrix of the MPC cost function, and the optimized controller has higher tracking accuracy and control stability in the stage of tracing back to the expected path. The improved algorithm demonstrates good comprehensive control performance in multi-testing conditions. In addition, the development of autonomous vehicle technology can drive intelligent transportation and promote new travel services and business models.

This study still has the following shortcomings. The current DQN-AP-MPC controller has a large computational load under complex operating conditions, and in the future, real-time performance can be improved through model reduction or lightweight network design. Secondly, the controller has limited adaptability in extreme environments such as low adhesion road conditions, and uncertainty modeling can be introduced in the future to enhance robustness.

Author Contributions

Conceptualization, Y.F.; methodology, Y.F.; software, Y.F.; validation, Y.F. and P.W.; formal analysis, Y.F.; investigation, Y.Z.; resources, S.G.; writing—original draft preparation, Y.F.; writing—review and editing, P.W.; visualization, Q.Z.; supervision, B.S.; project administration, B.S.; funding acquisition, S.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 52102465); the National Natural Science Foundation of China General Project (52475269); the Natural Science Foundation of Shandong Province (Grant No. ZR2022MF230); the Natural Science Foundation of Shandong Province (Grant No. ZR2021QF039); the Natural Science Foundation of Shandong Province General Project (ZR2024ME179); Small and Medium-sized Enterprise Innovation Capability Improvement Project of Shandong Province (Grant No. 2022TSGC2277); the Major Scientific and Technological Innovation Project of Shandong Province (Grant No. 2023CXGC010111). Any opinions expressed in this paper are solely those of the authors and do not represent those of the sponsors. The authors would like to thank experienced anonymous reviewers for their constructive and valuable suggestions for improving the overall quality of this paper.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Dong, N.; Li, X.; Wu, Z. On integrated lateral and longitudinal control of brain-controlled vehicles. Neurocomputing 2024, 597, 127957. [Google Scholar] [CrossRef]
Dong, C.; Ye, Q.; Dai, S. Neural-network-based adaptive output-feedback formation tracking control of USVs under collision avoidance and connectivity maintenance constraints. Neurocomputing 2020, 401, 101–112. [Google Scholar] [CrossRef]
Wang, H.Y.; Zuo, Z.Q.; Deng, Y.X.; Wang, Y.J.; Yang, H.J. Overview of Motion Control for Autonomous Vehicles. Control Theory Appl. 2024, 41, 1507–1522. Available online: http://kns.cnki.net/kcms/detail/44.1240.TP.20231214.0029.006.html (accessed on 11 September 2024). (In Chinese).
Zheng, C.; Du, Y.; Liu, Z.J. A Survey of Lateral Control Methods for Autonomous Vehicles. Auto. Eng. 2024, 1–10. (In Chinese) [Google Scholar] [CrossRef]
Han, W.; Li, A.; Huang, X.; Li, W.; Cao, J.; Bu, H. Trajectory tracking of in-wheel motor electric vehicles based on preview time adaptive and torque difference control. Adv. Mech. Eng. 2022, 14, 16878132221089909. [Google Scholar] [CrossRef]
Xu, L.J.; Chen, Q.H.; Wang, L.L.; Yao, L.J. Path Tracking of Agricultural Vehicles Based on 4WIS–4WID Structure and Fuzzy Control. Appl. Sci. 2023, 13, 8495. [Google Scholar] [CrossRef]
Wang, R.C.; Liu, H.; Ye, Q.; Fu, Y.X. Research on Fuzzy Sliding Mode Lateral Motion Control of Intelligent Vehicles Based on Three Degrees of Freedom. J. Automot. Eng. 2022, 12, 749–758. [Google Scholar]
Li, J.; Kong, X.Y.; Wang, X.Y.; Hu, Z.; Lan, H.; Wang, Z.Y. Model prediction based active steering intelligent unmanned vehicle lateral steady-state control. J. Beijing Inst. Technol. 2023, 43, 812–819. [Google Scholar] [CrossRef]
Li, T.; Ren, H.; Li, C. Intelligent electric vehicle trajectory tracking control algorithm based on weight coefficient adaptive optimal control. Trans. Inst. Meas. Control 2024, 46, 2647–2663. [Google Scholar] [CrossRef]
Chen, Z.N.; Tong, L.; Li, X.D.; Xu, Z.F. Trajectory Tracking Control of Adaptive Time Domain Model Predictive Control. J. Chongqing Univ. Technol. 2024, 38, 78–85. [Google Scholar]
Guo, J.H.; Li, W.C.; Luo, Y.G.; Li, K.Q. Model Predictive Adaptive Cruise Control of Intelligent Electric Vehicles Based on Deep Reinforcement Learning Algorithm FWOR Driver Characteristics. Int. J. Automot. Technol. 2023, 24, 1175–1187. [Google Scholar] [CrossRef]
Wang, X.D.; Yuan, L.C.; Huang, J.L.; Gao, Y.; Li, H.F. Research on continuous obstacle avoidance trajectory planning and tracking control strategies for low speed intelligent vehicles with different road curvature changes. Available online: https://link.cnki.net/urlid/61.1114.th.20240902.1707.013 (accessed on 11 September 2024). (In Chinese).
Pei, Y.L.; Zhang, C.X.; Fu, B.H.; Ran, S.M. Vehicle path tracking control based on adaptive sampling period and predictive time-domain MPC. Transp. Res. 2024, 10, 46–55. (In Chinese) [Google Scholar] [CrossRef]
Wu, C.S.; Gao, S.Y. Adaptive predictive time-domain parameter MPC vehicle trajectory tracking control. J. Chongqing Univ. Technol. 2024, 38, 99–108. [Google Scholar]
Zhang, X.W.; Chen, L.; Sun, H.W.; Ji, Q. Fuzzy adaptive trajectory tracking control based on road adhesion coefficient estimation. J. Chongqing Univ. Technol. 2024, 38, 92–101. [Google Scholar]
Mei, R.K.; Yu, X.H. Model Predictive Speed Control Weight Coefficient Self-tuning Based on PSO. Micro Spec. Mot. 2022, 50, 42–46+51. (In Chinese) [Google Scholar] [CrossRef]
Sun, D.S.; Anahita, J.; Bart, D.S. Adaptive parameterized model predictive control based on reinforcement learning: A synthesis framework. Eng. Appl. Artif. Intell. 2024, 136, 109009. [Google Scholar] [CrossRef]
Han, M.; He, H.W.; Shi, M.; Liu, W.; Cao, J.F.; Wu, J.d. Research on Path Tracking Control Based on Learning for Autonomous Vehicle Model Prediction. Automot. Eng. 2024, 46, 1197–1207. [Google Scholar] [CrossRef]
Li, J.S.; Peng, X.Y.; Li, B.; Sreeram, V.; Wu, J.W.; Chen, Z.A.; Li, M.Z. Model predictive control for constrained robot manipulator visual servoing tuned by reinforcement learning. Math. Biosci. Eng. 2023, 20, 10495–10513. [Google Scholar] [CrossRef]
Shi, J.R.; Li, K.X.; Piao, C.H.; Gao, J.; Chen, L.Z. Model-Based Predictive Control and Reinforcement Learning for Planning Vehicle-Parking Trajectories for Vertica Parking Spaces. Sensors 2023, 23, 7124. [Google Scholar] [CrossRef]
Wang, L.W.; Yang, S.; Yuan, K.; Huang, Y.J.; Chen, H. A Combined Reinforcement Learning and Model Predictive Control for Car-Following Maneuver of Autonomous Vehicles. Chin. J. Mech. Eng. 2023, 36, 1–11. [Google Scholar] [CrossRef]
Hejtmánek, P.; Čavoj, O.; Porteš, P. Evaluation of vehicle handling by a simplified single track model. Perner’s Contacts 2013, 8, 42–52. [Google Scholar]
Wiseman, Y. Autonomous Vehicles, Encyclopedia of Information Science and Technology, 5th ed.; IGI Global: Hershey, PA, USA, 2020; Volume 1, Chapter 1; pp. 1–11. [Google Scholar]
Xiao, H.Z.; Chen, C.L.; Lai, G.Y.; Yu, D.X.; Zhang, Y. Integrated nonholonomic multi-robot consensus tracking formation using neural-network-optimized distributed model predictive control strategy. Neurocomputing 2023, 518, 282–293. [Google Scholar] [CrossRef]
Han, T.Y.; Shen, Y.J.; Bao, Q.; Qu, Q.K.; Wu, Z. Adaptive Cruise Control Strategy for Vehicle Curves Based on Human like Decision Making and Horizontal/Vertical Coordination. Chin. J. Highw. 2023, 36, 211–223. (In Chinese) [Google Scholar] [CrossRef]
Wang, H.B.; Wang, C.Y.; Zhao, L.F.; Hu, Y.P. Intelligent vehicle path tracking with variable parameter MPC multi-objective control based on reinforcement learning. Chin. J. Highw. 2024, 37, 157–169. (In Chinese) [Google Scholar] [CrossRef]

Figure 1. Technical route. The reinforcement learning framework generates new parameters based on the system state, control actions, and parameter optimization, which are input into the MPC controller. The MPC controller outputs control actions based on the system state and new parameters, and sends them to a co-simulation, hardware-in-the-loop platform and real vehicles for verification.

Figure 2. Three-degree of freedom dynamic model of vehicle.

Figure 3. Dynamic error model.

Figure 4. The three-dimensional distribution of the prediction horizon.

Figure 5. The principle of the DQN algorithm.

Figure 6. Control system modeling.

Figure 7. Variable curvature reference path.

Figure 8. The double line reference path.

Figure 9. 90 km/h comparison of tracking performance among different controllers.

Figure 10. 90 km/h comparison of errors among different controllers.

Figure 11. 72 km/h comparison of tracking performance among different controllers.

Figure 12. 72 km/h comparison of errors among different controllers.

Figure 13. 54 km/h comparison of tracking performance among different controllers.

Figure 14. 36 km/h comparison of tracking performance among different controllers.

Figure 15. Hardware-in-the-loop experimental framework.

Figure 16. 72 km/h comparison of tracking performance among different controllers.

Figure 17. 72 km/h Comparison of errors among different controller.

Figure 18. Intelligent vehicle.

Figure 19. Top view of real vehicle experiment. The dashed line represents the reference path, and the box is used to highlight the actual position of the intelligent vehicle.

Figure 20. The results of the real vehicle experiment.

Table 1. The Deep Q-network parameters and explanations.

Parameters	Symbols	Value or Value Range	Explanation
Discount factor	γ	0.95~0.99	Determines the importance of future rewards.
Learning rate	α	0.001	Step size for updating the neural network parameters.
Exploration rate	ε	1.0→0.01 (decay)	Probability of selecting a random action to balance exploration and exploitation.
Replay buffer size	M	10,000~50,000	Maximum number of past transitions stored for training.
Batch size	B	64	Number of samples randomly drawn from the replay buffer per update.
Target update interval	T	100~500 steps	Frequency (in steps) to update the target Q-network.
State space dimension	s	-	Represents tuning decisions for MPC weighting matrices.
Action space dimension	a	-	Trainable parameters of the action-value function approximator.

Table 2. Model parameter table.

Parameters	Symbols	Value or Value Range
Prediction time domain	$N_{p}$	10~30
Control time domain	$N_{c}$	10
Sampling period	$Δ T$	0.05 s
Front wheel corner restraint	$[\begin{matrix} δ_{f, \min} & δ_{f, \max} \end{matrix}]$	−0.1745~0.1745 rad
Incremental constraint of front wheel angle	$[\begin{matrix} Δ δ_{f, \min} & Δ δ_{f, \max} \end{matrix}]$	−0.0148~0.0148 rad
Longitudinal speed constraint	$[\begin{matrix} v_{x, \min} & v_{x, m a x} \end{matrix}]$	0~30 m/s
Longitudinal speed increment constraint	$[\begin{matrix} Δ v_{x, \min} & Δ v_{x, \min} \end{matrix}]$	−2~2 m/s
Relaxation factor	$ε$	5
Weight coefficient of relaxation factor	$ρ$	1000

Table 3. Comparison of tracking errors.

Index	Controller	MAX↓	MSE↓	RMSE↓
Lateral position error (m)	MPC	1.4641	0.2639	0.4764
	AP-MPC	0.7519	0.1558	0.2348
	DQR-AP-MPC	0.3396	0.0939	0.1272
Longitudinal speed (km/h)	MPC	0.9273	0.1992	0.3075
	AP-MPC	0.5349	0.1345	0.1949
	DQR-AP-MPC	0.3355	0.0701	0.0980

“↓” represents a decreasing trend, and the smaller the error, the better.

Table 4. Comparison of tracking errors at different vehicle speeds.

Speed	Controller	MAX↓	MSE↓	RMSE↓
72 km/h	MPC	0.5929	0.1580	0.2401
	AP-MPC	0.4190	0.1202	0.1947
	DQR-AP-MPC	0.2834	0.0797	0.1286
54 km/h	MPC	0.4019	0.1096	0.1794
	AP-MPC	0.2844	0.0754	0.1207
	DQR-AP-MPC	0.1627	0.0366	0.0592
36 km/h	MPC	0.3154	0.0755	0.1245
	AP-MPC	0.2261	0.0505	0.0831
	DQR-AP-MPC	0.1014	0.0272	0.0413

“↓” represents a decreasing trend, and the smaller the error, the better.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Fang, Y.; Wang, P.; Gao, S.; Sun, B.; Zhang, Q.; Zhang, Y. Trajectory Tracking Control of Intelligent Vehicles with Adaptive Model Predictive Control and Reinforcement Learning Under Variable Curvature Roads. Technologies 2025, 13, 394. https://doi.org/10.3390/technologies13090394

AMA Style

Fang Y, Wang P, Gao S, Sun B, Zhang Q, Zhang Y. Trajectory Tracking Control of Intelligent Vehicles with Adaptive Model Predictive Control and Reinforcement Learning Under Variable Curvature Roads. Technologies. 2025; 13(9):394. https://doi.org/10.3390/technologies13090394

Chicago/Turabian Style

Fang, Yuying, Pengwei Wang, Song Gao, Binbin Sun, Qing Zhang, and Yuhua Zhang. 2025. "Trajectory Tracking Control of Intelligent Vehicles with Adaptive Model Predictive Control and Reinforcement Learning Under Variable Curvature Roads" Technologies 13, no. 9: 394. https://doi.org/10.3390/technologies13090394

APA Style

Fang, Y., Wang, P., Gao, S., Sun, B., Zhang, Q., & Zhang, Y. (2025). Trajectory Tracking Control of Intelligent Vehicles with Adaptive Model Predictive Control and Reinforcement Learning Under Variable Curvature Roads. Technologies, 13(9), 394. https://doi.org/10.3390/technologies13090394

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Trajectory Tracking Control of Intelligent Vehicles with Adaptive Model Predictive Control and Reinforcement Learning Under Variable Curvature Roads

Abstract

1. Introduction

2. Vehicle Modeling and State Equations

2.1. Vehicle Three-Degree of Freedom Dynamic Model

2.2. Error Model of Trajectory Tracking

2.3. Linearization and Discretization of the Model

3. Design of a Learning-Based Adaptive Trajectory Tracking Controller

3.1. Objective Function with Constraints

3.2. Adaptive Prediction Horizon Strategy

3.3. Solution of Controller

3.4. Optimization of Weight Matrix Based on DQN

4. Simulation Analysis

4.1. Simulation Modeling

4.2. Reference Trajectory

4.3. Evaluation Index

5. Hardware-in-the-Loop Experiment and Real Vehicle Experiment

5.1. The Hardware-in-the-Loop Platform

5.2. Analysis of the Hardware-in-the-Loop Experimental Results

5.3. Real Vehicle Experimental Equipment and Road Scenarios

5.4. Analysis of Real Vehicle Experimental Results

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI