1. Introduction
Intelligent vehicles have significant advantages in improving vehicle safety and traffic efficiency, and the research of intelligent vehicle has become a current hotspot [
1]. The key technologies of intelligent vehicles include environmental perception, driving decision making, path planning, and motion control. The motion control module aims to operate the desired trajectory of the vehicle track quickly and stably [
2]. How to ensure control accuracy and vehicle stability under dynamic environmental changes and complex road conditions is still a difficult problem to solve. Therefore, tracking control has become one of the key research areas in current intelligent vehicle research.
Currently, common trajectory tracking control methods include PID control [
3,
4], preview control [
5], fuzzy control [
6], sliding mode control [
7], and model predictive control [
8,
9]. PID Control cannot handle multi-variable coupling and multi-objective constraints, with parameter tuning heavily reliant on empirical knowledge, resulting in poor adaptability to varying operating conditions. Preview control has weak robustness against external disturbances, making it difficult to achieve effective control under high-speed driving conditions. The accuracy of fuzzy control is limited, as it depends on empirical rules and lacks general applicability. Sliding mode control tends to chattering phenomena, severely affecting the controller’s effectiveness in real traffic scenarios. Compared to other control algorithms, Model Predictive Control (MPC) has significant advantages [
10]. It can directly incorporate multiple inputs, multiple outputs, and state constraints into the optimization framework. MPC models the predictive model by considering external disturbances and model errors, significantly improving the adaptability of the control system [
11]. Moreover, MPC achieves multi-objective optimization by adjusting weights, improving system flexibility. With its unique strengths in predictive model construction and optimal control solving, MPC has garnered widespread attention from researchers.
For the construction of the MPC prediction model, the fixed prediction time domain is usually used in the traditional method [
12]. However, when dealing with complex driving conditions, a fixed prediction range can limit the accuracy of trajectory tracking. Currently, researchers widely adopt variable horizon MPC algorithms to address these challenges. Pei et al. [
13] proposed an MPC controller that updates the prediction horizon based on vehicle speed. The experimental results demonstrate that the proposed controller achieves high control accuracy, but it is difficult to meet the control requirements of complex road conditions. Wu et al. [
14] used a gray relational analysis method to determine the optimal prediction horizon parameters for MPC under different vehicle speeds. This approach improved the computational efficiency, but the tracking accuracy of the modified controller still needs further improvement. Zhang et al. [
15] proposed a fuzzy adaptive trajectory tracking control approach based on road surface adhesion coefficient estimation. However, the above studies only focus on optimizing the design for specific scenarios, resulting in response delays in dynamic environments such as sudden curvature changes, making it difficult to maintain stable control performance.
For the optimal control part of MPC, in order to balance the importance of each objective, weight factors are often introduced for adjustment. Mei et al. [
16] proposed an online self-tuning method for weight coefficients based on an improved particle swarm algorithm. Through iterative optimization, the weight coefficients that minimize the objective function were obtained. However, its adaptability needs to be enhanced. Sun et al. [
17] used a fuzzy adaptive track tracking control based on the estimation of a road adhesion coefficient, which improved the stability of the controller. Traditional MPC relies on accurate system models and fixed control strategies, which have certain limitations. Reinforcement Learning (RL), due to its strengths in dynamic decision making and adaptive optimization, has been increasingly integrated with MPC by researchers [
18]. Li et al. [
19] used Deep Deterministic Policy Gradient (DDPG) to perform offline tuning of the weight matrix in the MPC objective function, training an optimal weight matrix. Shi et al. [
20] proposed a risk threshold model based on RL, classifying scene risk levels according to scenario features, improving the adaptability of MPC to real-world scenarios. Wang et al. [
21] designed a reward function to guide the adjustment of neural network parameters, improving the parking efficiency of intelligent vehicles. However, learning-based MPC algorithms face challenges such as instability caused by reliance on learned strategies and have limited generalization abilities due to complex modeling, requiring further optimization and improvement.
In summary, to solve the problem of control performance decline caused by the fixed prediction time domain and weight matrix solution, an adaptive model prediction controller based on reinforcement learning optimization is proposed in this paper. The prediction model and optimal solution of MPC are improved, which effectively improves the adaptability and tracking accuracy of the controller in various driving scenarios. The technology route is shown in
Figure 1 and the main contributions can be summarized as follows: Firstly, a three-degree-of-freedom vehicle dynamics model and a trajectory tracking error model are established; Secondly, an adaptive prediction horizon strategy based on a two-dimensional Gaussian function is designed and the real-time adjustment of prediction time domain change with vehicle speed and road curvature is realized. The tracking accuracy is further improved; Thirdly, to address the problem of tracking stability reduction under complex road conditions, the Deep Q-Network (DQN) algorithm is used to adjust the weight matrix of the Model Predictive Control (MPC) algorithm, and the convergence speed and control effectiveness of the tracking controller are improved; Finally, to verify the performance of the proposed controller, co-simulation, hardware-in-the-loop tests and real vehicle tests are conducted.
The remainder of this paper is organized as follows.
Section 2 conducts vehicle modeling;
Section 3 provides a detailed introduction to the improved controller; in
Section 4, simulation experiments are conducted; in
Section 5, bench and vehicle validation is conducted; and
Section 6 concludes the paper.
2. Vehicle Modeling and State Equations
The vehicle model and tracking error are key factors in studying trajectory tracking control problems. To track the desired trajectory, the control system must first define the tracking error in the objective function and then design a trajectory tracking control strategy based on the vehicle model to eliminate the defined tracking error. Therefore, it is crucial to construct a vehicle model that can accurately characterize the vehicle’s motion state while meeting the real-time performance of the control algorithm.
2.1. Vehicle Three-Degree of Freedom Dynamic Model
The vehicle dynamics model considers tire deformation, force, and relative motion, offering stronger predictive performance and stability compared to kinematic models. This paper focuses on the coupled longitudinal and lateral control of vehicles based on an improved model predictive algorithm. It establishes a vehicle dynamics model and tire model for trajectory tracking control, modeling the three degrees of freedom in longitudinal motion, lateral motion, and yaw motion to analyze the system’s accuracy, stability, and robustness. It is assumed that the forces on both sides are symmetrical, as shown in
Figure 2 for the three-degree-of-freedom vehicle model [
22].
In
Figure 2,
OXY represents the inertial coordinate system;
oxy represents the vehicle body coordinate system;
a and
b are the distances from the front and rear axle centers to the center of mass;
Fcf and
Fcr are the cornering forces on the front and rear wheels;
αf and
αr are the tire slip angles of the front and rear wheels;
Flf and
Flr are the longitudinal forces on the front and rear wheels;
Fxf and
Fxr are the x-direction forces on the front and rear tires;
Fyf and
Fyr are the y-direction forces on the front and rear tires;
δf,
β are the front wheel steering angle and the side slip angle of the center of mass, respectively;
v is the velocity of the vehicle’s center of mass,
vx is the longitudinal component of the center of mass velocity,
vy is the lateral component of the center of mass velocity; and
and
are the vehicle yaw angle and yaw rate, respectively.
Based on Newton’s second law, the mechanical equations for the vehicle along the three axes are shown in Equation System (1):
where
m represents the total mass of the vehicle, and
Iz represents the moment of inertia.
are the front and rear wheel steering angles, respectively. This paper only considers front-wheel steering; thus,
. The tire forces in the longitudinal and lateral directions can be decomposed into the resultant forces along the
x-axis and
y-axis of the vehicle body coordinate system, as shown in Equation (2):
When the tire slip angle and slip rate are small, based on the magic formula, tire forces can be approximated as linear functions, as shown in Equation (3):
where
Clf and
Clr represent the longitudinal stiffness of the front and rear tires;
,
represent the slip ratios of the front and rear tires;
and
are the longitudinal cornering stiffness of the front and rear tires; and
,
are the tire slip angles of the front and rear tires.
To further simplify the equations and improve the fitting accuracy of the tire model, based on the small-angle assumption
, the slip angles of the front and rear wheels can be approximated as Equation (4):
Based on the above analysis, the three-degree-of-freedom vehicle dynamics model for lateral motion, longitudinal motion, and yaw motion is established as shown in Equation (5):
Since the reference trajectory information is represented based on the geodetic coordinate system, and the aforementioned three-degree-of-freedom vehicle dynamics model is built under the vehicle body coordinate system, it is necessary to perform a conversion between the vehicle body coordinate system and the ground inertial coordinate system. The conversion formula is Equation System (6):
In summary, the state variable of the system is , which is used to evaluate the accuracy and stability of the trajectory tracking control strategy. The control variable of the system is , which is the independent variable for the optimization solution of the subsequent controller.
2.2. Error Model of Trajectory Tracking
In the autonomous driving control module, it is necessary to calculate the errors in the Frenet coordinate system [
23], which is based on the actual state in the Cartesian coordinate system and the projected state of the reference trajectory, and minimize these errors through control algorithms. The selected error state variables include lateral position error, lateral velocity error, yaw angle error, yaw rate error, longitudinal position error, and longitudinal velocity error, which are used to establish a dynamic error model. During trajectory tracking, road curvature has a significant impact on the vehicle steering characteristics and driving stability, directly affecting model accuracy. Therefore, the error model must fully consider variations in road curvature. The specific model is shown in
Figure 3.
In the Cartesian coordinate system, the state of a vehicle can be represented by , where is the position coordinate of the vehicle in the Cartesian coordinate system, and are the road curvature, yaw angle, velocity, and acceleration at the center of mass; in the Frenet coordinate system, the state of the vehicle can be represented by , where are the longitudinal displacement, longitudinal velocity, and longitudinal acceleration of the vehicle, and are the lateral displacement, lateral velocity, and lateral acceleration of the vehicle, respectively.
The reference line of a vehicle is usually composed of a series of discrete points. When the number of discrete points is sufficient, the sum of the distances between points can be approximately considered as the lateral displacement
s. Assuming that projection point
is the
n-th point among the discrete points, then the longitudinal displacement
s is Formula (7):
Based on the triangular vector relationship, the lateral displacement
d can be calculated as Formula (8):
Furthermore, the longitudinal velocity
and lateral velocity
can be derived as Formulas (9) and (10):
where
is the vehicle yaw angle, and
is the sideslip angle at the center of mass. According to automotive theory, the heading angle
.
In the Frenet coordinate system, the lateral error
ed is the lateral distance from the center of mass of the vehicle to the reference trajectory point
P; the longitudinal error
es is the longitudinal distance from the center of mass of the vehicle to the reference trajectory point
P; and
is the angle between the tangent of point
P on the reference trajectory and the
X-axis, the heading angle error is
, and the road curvature of point
P on the reference trajectory is
. The car trajectory tracking error equation is summarized as Equation System (11):
2.3. Linearization and Discretization of the Model
At a certain operating moment, the system’s state variable is , and the control variable is . The state-space equation at the initial operating point of the system is given by the following: .
To linearize the state-space equation around any point
in the system, perform a first-order Taylor expansion, neglecting higher-order terms and retaining only the first-order terms. This transforms the nonlinear model into a computationally convenient linear model. The result is shown in Equation (12):
where
is the Jacobian matrix of the system state transition function
f with respect to the state variable
x, and
is the Jacobian matrix of the system state transition function
f with respect to the control variable
u.
Let
, by subtracting Equation (12) from the state space equation of the initial operating point of the system. Then, obtain the linear state space expression, as shown in Equation (13):
where
,
,
and
,
.
where
.
The matrix
A describes the coupling relationships among different state variables of the vehicle system, while the matrix
B represents the effects of control inputs (steering angle and velocity) on vehicle motion. By linearization, these two matrices allow the complex nonlinear vehicle model to be approximated as a linear system within a certain operating range, thereby facilitating controller design and analysis. It has been verified that the midpoint Euler method is more accurate than the forward Euler method and the backward Euler method. Therefore, the midpoint Euler method is used to discretize Formula (13) and obtain the discrete expression of the state space, which is as follows:
where
,
I is the identity matrix and
T is the sampling time.
By linearizing and discretizing the nonlinear state space, a linear time-varying prediction model is established. In the prediction horizon, the difference between the current state and the control input is used to predict the future system state and output, iterating continuously [
24]. Then, the rolling optimization of Model Predictive Control is performed by establishing the objective function of the system and the constraints during the driving process, followed by optimization and solution.
3. Design of a Learning-Based Adaptive Trajectory Tracking Controller
MPC is an advanced control method. In this chapter, two improvements are made to the benchmark MPC controller. The first is to design an adaptive predictive time-domain strategy, and the second is to improve the weight matrix based on reinforcement learning methods to achieve precise and stable control of the system.
3.1. Objective Function with Constraints
The objective function with constraints is one of the key components of the coupled longitudinal and lateral MPC controller. To balance accuracy, stability, and ride comfort during the trajectory tracking process, the objective function needs to trade off tracking error, control consumption, and the magnitude of control increment changes.
Due to the many inherent physical constraints of the vehicle, a small change in the steering wheel angle at higher vehicle speeds can cause significant lateral acceleration. Therefore, hard constraints must be imposed on the control variables during the solution process. Additionally, to ensure the feasibility of optimization, soft constraints are applied to the control increments. Thus, Formula (17) can be obtained:
where
represents the lower constraint bound of the control variable;
represents the upper constraint bound of the control variable;
represents the lower constraint bound of the control increment; and
represents the upper constraint bound of the control increment.
Taking into account the constraints of the road lateral position, tire roll angle, vehicle adhesion conditions, and control increment, the design objective function and constraints are as shown in Equation (18):
where
represents the actual output,
represents the reference output,
,
is the control horizon,
is the prediction horizon,
is the control increment,
Q is the weight matrix for the state variables,
R is the weight matrix for the control increments,
is the weight for the relaxation factor,
is the relaxation factor,
represents the hard constraints, which are outputs that cannot have their constraint ranges relaxed, and
represents the soft constraints, which are outputs whose constraint ranges can be adjusted through relaxation factors.
MPC controller solving problems belongs to constrained convex quadratic programming, which has good feasibility and optimality. Although the computational complexity increases with the increase in the prediction step size and number of constraints, setting reasonable prediction time domains and other methods can meet the requirements of real-time vehicle control. Therefore, this algorithm has strong feasibility and promotional value in both theoretical and engineering applications.
3.2. Adaptive Prediction Horizon Strategy
In the MPC algorithm, considering the greater influence of the prediction horizon Np, a control horizon Nc is set to 10 under the premise of ensuring vehicle stability. The criteria for selecting the prediction horizon vary with different driving speeds. At low speeds, a shorter prediction horizon can achieve better control effects; at high speeds, increasing the prediction horizon ensures vehicle stability. Additionally, the prediction horizon must consider the curvature characteristics of the road ahead. When the road curvature is large, the prediction horizon should be reduced to improve control precision; when the road curvature is small, the prediction horizon should be increased to maintain vehicle stability.
To enhance the controller’s performance under various road curvatures and speed conditions, this paper proposes an adaptive prediction horizon strategy based on a two-dimensional Gaussian function. This method designs a smooth and continuous prediction horizon adjustment mechanism, more effectively accommodating the complexity and diversity of curved road driving. Compared to traditional methods, this strategy not only reduces computational complexity but also significantly improves the stability and adaptability of the controller. Furthermore, the design of the prediction horizon fully considers the coupling relationship between vehicle speed and road curvature, and its form is as shown in Formula (19):
where
represents the road curvature,
are the first derivatives of the
x and
y coordinates of the curve at a certain moment, and
are the second derivatives of the
x and
y coordinates, respectively.
where
represents the upper limit of the prediction horizon,
is the speed influence factor, and
is the road curvature influence factor.
Based on the vehicle characteristics and performance requirements, the parameters are calibrated to obtain a predicted horizon offline database. Combined with the vehicle model and control algorithm, the three-dimensional distribution of the prediction horizon concerning vehicle speed and road curvature is obtained, as shown in
Figure 4.
3.3. Solution of Controller
To facilitate the solution and computation by a computer, the optimization problem can be transformed into a quadratic programming problem, and its standard form is Equation (21):
where
;
(
denotes the Kronecker product); the Hessian matrix is denoted by
; the gradient matrix is denoted by
;
;
;
x represents the optimization variable;
represents the lower bound of the optimization variable constraints;
represents the upper bound of the optimization variable constraints; and
is the coefficient matrix of the inequality.
Q and R are the weight matrices, defining the priority in control objectives to balance the weights of system state deviations and control input weights. Due to the need to consider driving smoothness and riding comfort during the design process of the coupled lateral and longitudinal control system, the changes in state increments and control increments should be as small as possible. Therefore, they require precise constraints, where is the weight for lateral position deviation; is the weight for lateral velocity deviation; is the weight for longitudinal position deviation; is the weight for longitudinal velocity deviation; is the weight for yaw angle deviation; is the weight for yaw rate deviation; is the weight for the front wheel steering angle; and is the weight for longitudinal velocity. For the state weight matrix , the larger the value , the higher the sensitivity of the system to the deviation of the state variable from the target, meaning that the system places more emphasis on the accuracy of that state variable to ensure that the system is closer to the target state. For the control weight matrix , the larger the values , the more the system tends to limit the amplitude of the control input , avoiding drastic adjustments, which can ensure the smoothness of control actions and reduce system wear. The improper selection of weight matrices may lead to system oscillation, energy waste, poor tracking performance, and other issues.
3.4. Optimization of Weight Matrix Based on DQN
The adaptive prediction horizon MPC faces issues of control delay during vehicle steering and return to straight, leading to increased tracking error, especially at high speeds or under extreme conditions. Reinforcement learning algorithms can optimize weights in real time based on environmental characteristics, allowing controllers to maintain trajectory tracking accuracy in complex environments and avoid control instability. In terms of optimizing the MPC weight matrix, a deep Q-network centered on the value function does not require additional learning of the system’s dynamic environment model. It directly learns Q values using interaction data, bypassing the modeling step. This approach is particularly advantageous for scenarios with complex or difficult-to-model dynamic models. Therefore, this paper adopts a deep Q-network to optimize the weight matrix. By penalizing behaviors that deviate from the expected state and control actions, a more precise control target is formed. The DQN can help MPC find the optimal values for these matrices, achieving the best control performance of the controller. The specific implementation process is shown in
Figure 5.
The state spaces includes the current state and the current values of the weight matrices ; the action space a involves adjusting the strategies for Q and R, with each action being an adjustment to each element of these matrices. Since the matrices are diagonal, actions can be defined as . Actions can be discretized, , such as by increasing, maintaining, or decreasing a specific weight. The action at adjusts Q and R, and the MPC controller computes the control input ut, resulting in the next state st+1 and the immediate reward rt; the reward function consists of two parts: one is the state error penalty , and the other is the control input penalty . The total reward can be expressed as , where smaller state errors and control costs lead to higher rewards. are used to balance the proportion between state error and control input.
The Deep Q-Network algorithm uses the Bellman equation to iteratively update the
. The parameter values and explanations of the Deep Q-Network are shown in
Table 1, and the pseudocode of the algorithm is shown in Algorithm 1.
Algorithm 1 DQN-Based Tuning of MPC Weighting Matrices |
Input: Environment model f(x,u); Discrete action set A = {(Q1, R1), (Q2, R2), …, (Qm, Rm)}; State space S; Learning parameters: γ, α, ε, replay buffer size M, batch size B, target update interval T (s,a) function approximator
//Initialize experience replay and networks 1: Initialize replay buffer D←∅ 2: (s,a; θ) with random weights θ 3: (s,a; θ) //Loop over training episodes 4: for episode = 1 to N_episodes do 5: Initialize state s 6: for t = 1 to T_episode do 7: with probability select random action a∈A (s,a; θ) 9: Decode (Q_weight, R_weight)←action a 10: Apply MPC using (Q_weight, R_weight) to compute u * 11: Simulate f(x,u *) to obtain next state s’ 12: Compute reward r based on tracking error and control effort 13: Store transition (s,a,r,s’) in D 14: Sample random minibatch of B transitions (sj,aj,rj,sj’) from D ′(sj′, a′; θ−) 16: Update θ by minimizing loss: (sj, aj; θ)]2 17: Every T steps, update target network: θ−←θ 18: s←s’ 19: end for 20: end for |
Repeat the aforementioned process until the maximum number of training episodes is reached. Once the accumulated rewards converge within a certain number of episodes, select the
Q and
R matrices that yield the highest accumulated reward and smallest penalties as the optimization results. Then, incorporate these matrices into the objective function of the Quadratic Programming (QP) problem for solving.
where
,
.
During the optimization process, the rational selection of weight matrices is crucial for the efficient and stable operation of the MPC controller. The round training of reinforcement learning continuously adjusts the weight matrices. The convergence trend of accumulated rewards (or penalties) serves as the optimization criterion, clearly quantifying the performance differences between various Q and R combinations, avoiding the trial-and-error nature of traditional methods in selecting weights. Additionally, this reward-driven optimization approach has dynamic adaptability, breaking through the limitations of traditional algorithms that rely on manual parameter adjustments.
Under the constraints, the optimal weight matrix is obtained by minimizing the cost function, then incorporated into the objective function of the MPC optimal control input. The optimal variable in each control period of the controller can be solved through the following formula, while also obtaining a series of control increments and relaxation factors within the control horizon. Relaxation factors are typically used to handle uncertainties in system constraints, which helps improve the stability and efficiency of the solution process.
In subsequent control processes, the first term in the optimized control sequence is used as the optimal input at the current moment and fed back into the system, and then the process is repeated. Through rolling optimization, the controller can continuously update the control inputs to achieve precise control of the entire system. Especially in trajectory tracking problems, the MPC method can adjust control inputs based on real-time feedback information and future predictions, allowing the controller to precisely control complex dynamic systems and achieve the desired control objectives.
6. Conclusions
To solve the problem of the tracking performance decline of intelligent vehicles in complex driving conditions, an adaptive model predictive tracking controller based on Deep Q-Network is proposed in this paper. By improving the prediction model part and optimization solution part of the MPC algorithm, the tracking performance and the adaptability are further improved. The experimental result proves that the designed adaptive prediction time domain strategy based on two-dimensional Gaussian function achieves the real-time adjustment of MPC prediction horizon, and the adaptive prediction time domain improves the tracking accuracy of the controller under variable curvature and other complex operating conditions. On this basis, the reinforcement learning algorithm of DQN is used to optimize and adjust the weight matrix of the MPC cost function, and the optimized controller has higher tracking accuracy and control stability in the stage of tracing back to the expected path. The improved algorithm demonstrates good comprehensive control performance in multi-testing conditions. In addition, the development of autonomous vehicle technology can drive intelligent transportation and promote new travel services and business models.
This study still has the following shortcomings. The current DQN-AP-MPC controller has a large computational load under complex operating conditions, and in the future, real-time performance can be improved through model reduction or lightweight network design. Secondly, the controller has limited adaptability in extreme environments such as low adhesion road conditions, and uncertainty modeling can be introduced in the future to enhance robustness.