1. Introduction
Connected and automated vehicles (CAVs) are a key development direction for the automotive industry. They not only offer an important way to address traffic safety, resource consumption, and environmental pollution, but are also a core element of intelligent transportation systems. Cooperative adaptive cruise control (CACC) based on on-board sensors and vehicle-to-vehicle (V2V) and/or infrastructure-to-vehicle (I2V) communication has become a hot spot in intelligent vehicle research [1,2]. Through vehicle-to-everything (V2X) communication, a CACC system receives dynamic information about the surrounding environment in real time, which improves driving safety [3,4]. At the same time, CACC significantly improves road capacity, reduces fuel consumption, and decreases environmental pollution [5,6,7].
By sharing information among vehicles, a CACC system allows automated vehicles to form platoons and drive at harmonized speeds with smaller constant time gaps between vehicles [8]. Because CACC improves the performance of the vehicular platooning system and helps ensure vehicle safety, it has attracted wide attention from researchers. Previous methods for CACC include proportional-integral-derivative (PID) control [9,10], sliding mode control (SMC) [11,12], model predictive control (MPC) [13,14,15], and H-infinity (H∞) control [16,17]. Owing to their low complexity and light computational load, PID controllers play an important role in the control field. However, the parameters of a PID controller must be tuned manually and cannot adapt to different working conditions. The performance of SMC, MPC, and H∞ methods is closely tied to model accuracy: they require a reasonably good model of the controlled system, and the more precise the model, the better the control performance. Nevertheless, the complex nonlinear dynamics of longitudinal platoon motion make an accurate model difficult to establish.
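To make the manual-tuning limitation concrete, the following is a minimal sketch of a discrete-time PID controller; the class name, gain values, and spacing-error signal are illustrative assumptions, not any cited implementation.

```python
# Minimal discrete-time PID controller sketch (illustrative only).
# The fixed gains kp, ki, kd are hypothetical hand-tuned values; in the
# classical setting they must be retuned whenever conditions change.

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        # Accumulate the integral term and approximate the derivative.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: regulate the spacing error between a follower and its predecessor.
controller = PID(kp=1.2, ki=0.05, kd=0.4, dt=0.05)  # hand-tuned, fixed gains
spacing_error = 0.8  # desired gap minus actual gap (m), hypothetical value
acceleration_command = controller.step(spacing_error)
```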
In recent years, Google’s DeepMind team has combined deep neural networks with the decision-making capabilities of reinforcement learning to establish the deep reinforcement learning (DRL) framework [18]. The deep deterministic policy gradient (DDPG) algorithm was subsequently proposed to handle continuous action spaces [19], and DRL has achieved good results in autonomous driving control [20]. At present, DRL algorithms are mainly applied to the control of individual vehicles, divided into longitudinal [21,22] and lateral [23,24] motion control. Zhu et al. [21] trained on real-world driving data and proposed a human-like car-following model based on the DDPG algorithm that is more accurate than traditional methods. A DRL-based lane-change model was designed that achieves more stable, safe, and efficient behavior by adjusting the reward function [23]. Chen et al. [25] proposed a path-tracking control architecture that combines a conventional pure pursuit method with a DRL algorithm, finding that adding the DRL module in parallel improves the performance of a traditional controller under various operating conditions. Zhou et al. [26] proposed a framework for learning drivers’ car-following behavior based on maximum-entropy deep inverse reinforcement learning. To address the overly simple simulation scenes in the above studies, Makantasis et al. [27] built a traffic flow model in the SUMO simulator to train the agent, and an integrated car-following and lane-changing model using DDPG was developed and trained in the VISSIM simulation environment [28]. Some studies have tried to put theory into practice [22], but a DRL algorithm based on deep neural networks is a “black box” model: its control principle is opaque and carries significant uncertainty, and training results depend on the choice of random seeds, making them unstable. This is why current DRL algorithms are mainly implemented on simulation platforms and are difficult to deploy on real vehicles [29].
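For reference, the following is a minimal sketch of the two networks at the heart of DDPG, a deterministic actor for continuous actions and a Q-value critic, assuming PyTorch; the layer sizes and state/action dimensions are illustrative and not taken from any of the cited works.

```python
# Minimal DDPG actor-critic network sketch (PyTorch assumed).
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: maps a state to a continuous action."""
    def __init__(self, state_dim, action_dim, max_action):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),  # bounded output in [-1, 1]
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)

class Critic(nn.Module):
    """Action-value function Q(s, a) used to train the actor."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# Example instantiation (dimensions hypothetical): the state could hold the
# spacing error, relative speed, ego speed, and ego acceleration, and the
# action could be an acceleration command bounded by 3 m/s^2.
actor = Actor(state_dim=4, action_dim=1, max_action=3.0)
critic = Critic(state_dim=4, action_dim=1)
```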
A learning controller has strong capabilities of discrimination, memory, and self-adjustment: it can tune its own parameters according to the controlled object and environmental changes to achieve the best control performance. There are currently three main types of learning control systems: iterative learning control (ILC) [30,31], adaptive control based on neural networks (NN) [32,33], and learning control based on the Markov decision process (MDP) [34,35]. Wang et al. [30] presented a novel learning-based cruise controller for autonomous land vehicles (ALVs), consisting of a time-varying proportional-integral (PI) module and an actor-critic learning control module. Lu et al. [36] designed a personalized driving behavior learning system based on neural reinforcement learning (NRL), which used data collected by on-board sensors to learn a driver’s longitudinal speed control characteristics. Combining DRL with traditional control methods has been a research hot spot in recent years: such hybrids exploit the self-learning and self-tuning abilities of DRL while relying on the traditional controller to ensure system stability. A learning-based predictive control (LPC) method using the actor-critic framework was proposed and shown to be asymptotically stable in the sense of Lyapunov [37]. Ure et al. [38] developed a reinforcement learning framework that automatically tunes the weights of MPC-based adaptive cruise control (ACC) systems, significantly shortening the exhausting manual weight-tuning process.
In summary, researchers in different fields have carried out numerous studies on the longitudinal motion control of vehicular platoons, but the following deficiencies remain. (1) Platoon controllers struggle to adapt to various working conditions, and controller parameters must be set manually by professional engineers (e.g., for PID); existing controllers such as MPC, LQR, and H∞ need a high-precision model of the controlled object, which is very difficult to obtain. (2) Neural networks and their derived controllers belong to supervised machine learning: they can only imitate the parameter-adjustment strategies of expert demonstrations, which are not necessarily optimal, and their generalization ability remains to be proved. (3) End-to-end learning methods perform well in autonomous driving simulation environments, but their interpretability is poor, and little literature analyzes the stability of the resulting control system; vehicular platoons have complex nonlinear dynamics, so the actual control effect cannot be guaranteed.
In view of the above problems, this paper proposes a learning control method that uses DDPG-based PID for the longitudinal motion control of vehicular platoons. PID controllers are the most commonly used in industrial applications because of their structural simplicity and robust performance, but traditional PID tuning is not only time-consuming and laborious, it also cannot adapt to different working conditions. We therefore propose a novel vehicular platooning control strategy using DDPG-based PID to solve this problem. To the best of the authors’ knowledge, this is the first reported use of DDPG-based PID for vehicular platooning control. The PID controller parameters are automatically adjusted in real time according to the state by a trained DDPG algorithm. Through offline training and learning in a simulation environment, the PID controller can adapt to different road conditions and to platoon acceleration and deceleration maneuvers. The advantage of this scheme is that the PID parameters do not rely on any manual tuning and can better adapt to changing working conditions; the DDPG-based PID controller thus eliminates drawbacks of the traditional PID controller such as insufficient adaptability and difficult parameter regulation. In addition, the stability of the vehicular platooning system is proved by stability theory to ensure safety. Compared with the traditional PID controller, the DDPG-based PID is therefore more robust. This study further develops the learning control method and provides a new idea for the practical application of DRL algorithms in industry. However, the hardware-in-the-loop (HIL) simulation simplifies the road environment conditions; carrying out real-vehicle experiments to further verify the stability and reliability of the platoon controller is the focus of our next research.
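To illustrate the core idea, the following is a minimal sketch of one control step of this gain-scheduling scheme: a trained DDPG actor maps the current state to PID gains, which then drive a standard PID law. The state layout, gain ranges, and actor interface are illustrative assumptions, not the paper’s exact formulation.

```python
# Sketch of the gain-scheduling idea behind a DDPG-based PID controller:
# at every control step a trained DDPG actor maps the current platoon state
# to PID gains, which then drive a standard PID law (assumptions throughout).
import numpy as np

GAIN_LOW = np.array([0.0, 0.0, 0.0])   # lower bounds for (kp, ki, kd), assumed
GAIN_HIGH = np.array([5.0, 1.0, 2.0])  # upper bounds, assumed for the sketch

def ddpg_pid_step(actor, state, error, integral, prev_error, dt):
    """One control step: the actor re-tunes the gains, PID computes the command."""
    # The actor outputs an action in [-1, 1]^3; rescale it to the gain ranges.
    action = actor(state)
    kp, ki, kd = GAIN_LOW + 0.5 * (action + 1.0) * (GAIN_HIGH - GAIN_LOW)

    # Standard discrete-time PID law on the spacing error.
    integral += error * dt
    derivative = (error - prev_error) / dt
    command = kp * error + ki * integral + kd * derivative
    return command, integral, error  # carry integral and error to the next step
```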
The work in this paper is an extension of our previous publication [39]. The remainder of this paper is organized as follows. Section 2 presents the architecture of the vehicular platooning control system and a string stability analysis. Section 3 illustrates how the vehicular platoon control problem is formulated as an MDP model. Section 4 describes the training of the DDPG-based PID control algorithm. Section 5 presents the experimental results, and Section 6 analyzes and discusses them. Finally, conclusions and an outlook on future work are provided in the last section.