This section incorporates the extended disturbance estimator (EDE) into the car-following control scheme. The purpose is to estimate the impact of the “equivalent disturbance” on the ACV’s car-following system; the estimator gains are subsequently adjusted through reinforcement learning.
4.1. Extended Disturbance Estimator Design
Let yi indicate the difference between the safe spacing xsafe and the actual spacing of the two neighboring vehicles, where xveh is the length of the ego vehicle. Let Δvi indicate the velocity difference between the ego vehicle and the leading vehicle. A sliding surface is then selected as outlined below:
This formulation ensures that both position and velocity errors are considered in the control law, allowing for responsive adjustments that maintain safe following distances. The positive controller gain c1 balances the position and velocity terms to enhance stability.
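To make this concrete, the following minimal Python sketch evaluates the sliding surface from the spacing error yi and the velocity error Δvi, assuming the common linear form si = c1·yi + Δvi implied by the description above (the exact equation is not reproduced here, so this form is an assumption):

```python
def sliding_surface(y_i: float, dv_i: float, c1: float) -> float:
    """Sliding surface combining the spacing error y_i and velocity error dv_i.

    Assumes the common linear form s_i = c1 * y_i + dv_i, where the positive
    gain c1 weights the position term against the velocity term.
    """
    return c1 * y_i + dv_i


# Illustrative values: 1.2 m spacing error, 0.5 m/s closing speed, gain c1 = 0.8.
s = sliding_surface(y_i=1.2, dv_i=-0.5, c1=0.8)
print(f"s_i = {s:.3f}")
```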
The derivative of si can be obtained through the following steps:
Substituting Equation (1) and yi into Equation (3), we obtain the following result:
Define di as an “equivalent disturbance”, represented as
Then, Equation (5) can be reformulated as the following:
By utilizing Equation (6), the value of di can be determined indirectly from the previously measured ui, vi, and ai and the associated calculated terms. Defining z1 = di and z2 = ḋi, the “equivalent disturbance” can be reformulated as the following:
Additionally, Equation (7) can be expressed in the following form through the definitions A = [0, 1; 0, 0], B = [0, 1]T, and z = [z1, z2]T:
Hence, the system in Equation (8) can be represented by Equation (6). By setting C = [1, 0], the output equation is expressed as follows:
Given the observability of the pair (A, C), the disturbance estimator for an autonomous driving vehicle can be formulated based on the guidelines provided in [30] by establishing the following theorem.
Theorem 1. Suppose that the second derivative of the equivalent disturbance di is bounded and satisfies |d̈i| ≤ δ1 for the system. The boundedness of d̈i is reasonable in typical driving scenarios, as physical disturbances (such as aerodynamic drag or road gradient) generally vary gradually. The parameter δ1 is chosen based on the expected disturbance levels, ensuring robustness to realistic variations in traffic and road conditions. Considering Equations (8) and (9), the EDE can be designed in the following form:
Therefore, the disturbance estimation error can be driven to a region near the origin through the selection of suitable gains L = [l1, l2]T and an auxiliary variable p.
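Equation (10) itself (including the role of the auxiliary variable p) is not reproduced above, so the sketch below implements a generic Luenberger-type extended-state observer assembled from the same ingredients, A = [0, 1; 0, 0], C = [1, 0], and gain vector L = [l1, l2]T: the estimated state (di and its derivative) is corrected by the mismatch between the disturbance reconstructed via Equation (6) and the current estimate. The gains, step size, and the stand-in “measured” disturbance are illustrative assumptions, not the paper’s design.

```python
import numpy as np

# Extended-state model of the equivalent disturbance: z = [d_i, d_i_dot]^T.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
C = np.array([[1.0, 0.0]])


def ede_step(z_hat, d_measured, L, dt):
    """One Euler step of a Luenberger-type extended disturbance estimator.

    z_hat      : current estimate [d_hat, d_dot_hat]
    d_measured : equivalent disturbance reconstructed from u_i, v_i, a_i (Equation (6))
    L          : estimator gains [l1, l2]
    dt         : integration step
    """
    innovation = d_measured - (C @ z_hat).item()   # d_i - C z_hat
    z_dot = A @ z_hat + L * innovation             # correction through gains l1, l2
    return z_hat + dt * z_dot


# Example: track a slowly varying disturbance with hand-picked gains.
z_hat = np.zeros(2)
L = np.array([8.0, 16.0])                          # illustrative [l1, l2]
dt = 0.01
for k in range(500):
    d_true = 0.5 * np.sin(0.2 * k * dt)            # stand-in for road grade / drag effects
    z_hat = ede_step(z_hat, d_true, L, dt)
print("estimated disturbance:", z_hat[0])
```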
Equation (10) can be rewritten by substituting the second line into the derivative of the first line.
Substituting Equations (6) and (9) into Equation (11) results in
Subtracting Equation (12) from Equation (8) results in
Selecting suitable gains ensures that A − LC is Hurwitz, owing to the observability of the pair (A, C). Therefore, for any positive definite matrix Q > 0, a unique positive definite matrix P > 0 exists.
Select a Lyapunov function.
The time derivative is obtained by differentiating Equation (15) along Equation (13) and applying Equation (14):
Let λP denote the largest eigenvalue of matrix P and λQ the smallest eigenvalue of matrix Q, resulting in
Here, |·|, ||·||2, and ||·||F represent the absolute value of a variable, the 2-norm of a vector, and the Frobenius norm of a matrix, respectively. It follows that the time derivative of V1 is negative outside a bounded neighborhood of the origin, so the decrease in V1 drives the system trajectory towards that region. Consequently, by selecting suitable gains L = [l1, l2]T, the system trajectory will ultimately converge to a bounded region around the origin.
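As a numerical sanity check on the argument above, the short script below verifies that a candidate gain vector L renders A − LC Hurwitz, solves the Lyapunov equation (A − LC)TP + P(A − LC) = −Q for the unique positive definite P, and reports λP and λQ; the numerical values of L and Q are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
C = np.array([[1.0, 0.0]])
L = np.array([[8.0],
              [16.0]])                      # candidate gains [l1, l2]^T (illustrative)
Q = np.eye(2)                               # any positive definite Q

M = A - L @ C                               # observer error matrix
assert np.all(np.linalg.eigvals(M).real < 0), "A - LC is not Hurwitz"

# Solve (A - LC)^T P + P (A - LC) = -Q for the unique P > 0.
P = solve_continuous_lyapunov(M.T, -Q)
assert np.all(np.linalg.eigvalsh(P) > 0)    # P is symmetric positive definite

print("lambda_P (largest eigenvalue of P):", np.linalg.eigvalsh(P).max())
print("lambda_Q (smallest eigenvalue of Q):", np.linalg.eigvalsh(Q).min())
```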
Statement 1. The traditional EDE relies on manual adjustment of its gains, which are determined through experience. However, due to the constantly changing nature of disturbances in traffic environments, a fixed EDE gain cannot meet the demands of complex traffic scenarios, ultimately reducing the precision of the disturbance estimates.
4.2. Estimator Gain Adjusted by Reinforcement Learning
The EDE gain for the proposed car-following system is adjusted using reinforcement learning, which maximizes the cumulative return through trial and error to obtain an optimal policy. DDPG is a common reinforcement learning algorithm that is well suited to continuous action spaces. Therefore, the DDPG algorithm (Algorithm 1) is utilized in this study to adjust the EDE gains. Two car-following scenarios are selected for ease of reinforcement learning application, as depicted in Figure 1.
Algorithm 1: DDPG Algorithm
1. Initialize: critic network Q(s, a|θQ) and actor μ(s|θμ) with random weights θQ and θμ; target networks Q′ and μ′ with weights θQ′ ← θQ and θμ′ ← θμ; replay buffer R.
2. For episode = 1 to M do:
3. For time step t = 1 to T do: select action at according to the current policy and exploration noise; execute action at and observe reward rt and new state st+1; store transition (st, at, rt, st+1) in replay buffer R.
4. Sample a random minibatch of N transitions (si, ai, ri, si+1) from R.
5. Set the target value for each sampled transition:
6. Update the critic by minimizing the loss:
7. Update the actor policy using the sampled policy gradient:
8. Update the target networks (soft update).
9. End for (time steps)
10. End for (episodes)
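The following PyTorch sketch shows how Steps 5-8 of Algorithm 1 translate into code for one sampled minibatch: the target value ri + γQ′(si+1, μ′(si+1)), the critic’s mean-squared loss, the actor update via the sampled policy gradient, and the soft update of the target networks. The network sizes, state/action dimensions, and hyperparameters (γ, τ, learning rates) are placeholders, not the settings used in this study.

```python
import copy
import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

state_dim, action_dim = 3, 1                     # placeholder sizes for the EDE-gain task
actor = mlp(state_dim, action_dim)               # mu(s | theta_mu)
critic = mlp(state_dim + action_dim, 1)          # Q(s, a | theta_Q)
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)   # target networks
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma, tau = 0.99, 0.005                         # discount and soft-update rate (assumed)

def ddpg_update(s, a, r, s_next):
    """One minibatch update corresponding to Steps 5-8 of Algorithm 1."""
    # Step 5: target value y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})).
    with torch.no_grad():
        y = r + gamma * critic_t(torch.cat([s_next, actor_t(s_next)], dim=1))
    # Step 6: update the critic by minimizing the mean-squared loss.
    critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), y)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()
    # Step 7: update the actor with the sampled deterministic policy gradient.
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
    # Step 8: soft update of the target networks.
    with torch.no_grad():
        for net, net_t in ((actor, actor_t), (critic, critic_t)):
            for p, p_t in zip(net.parameters(), net_t.parameters()):
                p_t.mul_(1 - tau).add_(tau * p)

# Example call on a random minibatch of N = 32 transitions (terminal flags omitted).
N = 32
ddpg_update(torch.randn(N, state_dim), torch.randn(N, action_dim),
            torch.randn(N, 1), torch.randn(N, state_dim))
```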
In Scenario 1, there is a single leading vehicle and a single following vehicle. The two vehicles exchange speed, position, and other relevant information through V2V communication. The vehicle highlighted in the blue box in Figure 1 implements the car-following system outlined in this research, where the EDE gain is continuously updated in real time using reinforcement learning.
The Markov decision process is modeled first. The action of the DDPG is selected as the EDE gain of the following vehicle. The state space is then selected as
The relative movement between the following vehicle and the front vehicle is calculated as yi = xi − x0 − (xveh − xsafe), where xsafe represents the safe spacing and xveh represents the vehicle length. The velocity error between the following vehicle and the front vehicle is determined as Δvi = vi − v0, where vi and ai represent the velocity and acceleration of the following vehicle, respectively.
The reward function is selected as follows:
In the equation, aik represents the acceleration of the i-th following vehicle at the k-th frame. Additionally, vmax, amax, and ΔT stand for the maximum velocity, maximum acceleration, and time step, respectively. The absolute value of a variable is indicated by |·|, and ω1 and ω2 are positive coefficients.
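Equation (19) is not reproduced above, but its ingredients are listed: the spacing and velocity errors, the acceleration aik, the normalizers vmax, amax, and ΔT, and the positive weights ω1 and ω2. The sketch below assembles an illustrative reward from those ingredients; the exact combination and the default numerical values are assumptions, not the paper’s formula.

```python
def scenario1_reward(y_i, dv_i, a_ik, v_max=33.3, a_max=3.0, dT=0.1,
                     w1=1.0, w2=0.5):
    """Illustrative car-following reward in the spirit of Equation (19).

    Penalizes the normalized spacing error, the normalized velocity error,
    and the acceleration a_ik of the i-th following vehicle at frame k.
    The exact weighting is an assumption; w1, w2 > 0 and the normalizers
    v_max, a_max, dT correspond to the symbols in the text.
    """
    spacing_term = w1 * abs(y_i) / (v_max * dT)    # spacing error scaled by distance per step
    velocity_term = w2 * abs(dv_i) / v_max         # velocity error scaled by maximum velocity
    comfort_term = abs(a_ik) / a_max               # penalize harsh acceleration or braking
    return -(spacing_term + velocity_term + comfort_term)


# Example: small spacing error, mild closing speed, gentle braking.
print(scenario1_reward(y_i=0.8, dv_i=-0.4, a_ik=-0.6))
```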
In Scenario 2, there is a front vehicle and two following vehicles. As depicted in Figure 1, the car-following approach presented in this study is applied to following vehicles 1 and 2, with real-time updates of the EDE gain through reinforcement learning.
We formulate a Markov decision process for Scenario 2, where the action of the DDPG is selected as the EDE gains of the two following vehicles simultaneously. The state space is defined as follows:
The relative movement between following vehicle 1 and the front vehicle is given by yi = xi − x0 − (xveh − xsafe), and the corresponding velocity error is Δvi = vi − v0. Similarly, the relative movement between following vehicle 2 and the front vehicle is yi+1 = xi+1 − x0 − 2(xveh − xsafe), and the velocity error is Δvi+1 = vi+1 − v0. The velocities of following vehicles 1 and 2 are denoted by vi and vi+1, respectively.
The reward function in Scenario 2 is calculated as the sum of Equation (19) over following vehicles 1 and 2, as follows:
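Since the Scenario 2 reward is the sum of Equation (19) over following vehicles 1 and 2, it can be expressed directly in terms of the illustrative Scenario 1 reward sketched above (again an assumed form, not the paper’s exact expression):

```python
def scenario2_reward(y_i, dv_i, a_ik, y_ip1, dv_ip1, a_ip1k, **kwargs):
    """Scenario 2 reward: the Scenario 1 reward summed over following
    vehicles 1 (index i) and 2 (index i+1)."""
    return (scenario1_reward(y_i, dv_i, a_ik, **kwargs)
            + scenario1_reward(y_ip1, dv_ip1, a_ip1k, **kwargs))


# Example: following vehicle 2 lags slightly more than following vehicle 1.
print(scenario2_reward(0.8, -0.4, -0.6, 1.5, -0.9, -1.1))
```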
Statement 2. In contrast to the conventional EDE method, which relies on experience to adjust the gains, the EDE gains in this research are fine-tuned using RL. RL involves continuous trial and error during training to obtain the policy with the highest cumulative reward, ensuring accurate adaptive disturbance estimation in complex traffic situations. Consequently, through reinforcement learning, the EDE gains can be optimized for various state spaces, enhancing the precision of the disturbance estimate.
Statement 3. For more than three vehicles, the EDE gains are adjusted by combining Scenarios 1 and 2. For instance, if there are four ACVs, Scenario 1 is applied to vehicles 1 and 2, with vehicle 2 controlled by the proposed method, while Scenario 2 is applied to vehicles 2, 3, and 4, with vehicles 3 and 4 controlled by the proposed method. Because of possible communication delays, the proposed car-following structure only considers V2V communication between neighboring vehicles, making it suitable for small platoons.
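As a concrete reading of Statement 3, the helper below partitions a platoon of ACVs into the Scenario 1 and Scenario 2 assignments described above. The four-vehicle case matches the example in the text; extending the chaining to longer platoons is an assumed generalization.

```python
def assign_scenarios(n_vehicles: int):
    """Partition a platoon of ACVs into Scenario 1 / Scenario 2 blocks.

    Follows the four-vehicle example in Statement 3 and extends it by
    chaining further Scenario 2 triples (an assumed generalization):
    Scenario 1 covers vehicles (1, 2) and controls vehicle 2; each
    Scenario 2 block covers (k, k+1, k+2) and controls k+1 and k+2.
    """
    blocks = [("Scenario 1", (1, 2), [2])]
    k = 2
    while k + 2 <= n_vehicles:
        blocks.append(("Scenario 2", (k, k + 1, k + 2), [k + 1, k + 2]))
        k += 2
    return blocks


print(assign_scenarios(4))
# [('Scenario 1', (1, 2), [2]), ('Scenario 2', (2, 3, 4), [3, 4])]
```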