Traffic congestion wastes time and slows travel, and it is one of the main challenges that traffic management agencies and road users must overcome. According to a national motor vehicle crash survey in the United States, 47% of collisions in 2015 occurred at intersections [1]. Automated vehicles (AVs) have recently shown the potential to prevent human errors and improve the quality of traffic services, with full autonomy expected as soon as 2050 [2]. This means of transportation could save the United States economy approximately $450 billion each year [3]. Recently, the intelligent transport system (ITS) domain was developed to provide smoother, smarter, and safer journeys for traffic participants. Early ITS applications, such as traffic control in Japan, route guidance systems in Berlin, and Intelligent Vehicle Highway Systems in the United States, have been in use since the 1980s. However, the ITS domain concentrates only on intelligent techniques located in vehicles and road infrastructure. To solve communication problems between vehicles and road infrastructure, cooperative intelligent transport systems (C-ITS) enable these systems to communicate and share information in real time, providing safe and convenient travel. Motivated by the uncertainty surrounding the deployment of AVs in real environments, this study focuses on mixed-autonomy traffic settings, in which complex interactions between AVs and human-driven vehicles occur in various continuous control tasks.
In car-following models, adaptive cruise control (ACC) is used to model driver behavior. ACC systems are an important part of the driver assistance systems in premium vehicles and use a radar sensor to maintain the relative distance to the preceding vehicle. Previous studies have attempted to connect automated vehicle applications in order to improve traffic safety and capacity. Rajamani and Zhu [4] applied an ACC system to a semi-automated vehicle. The cooperative ACC (CACC) model is a next-generation ACC system that considers both the lead car in the same lane and the car in front in the adjacent lane [5]. Nonetheless, both ACC and CACC depend on a constant-spacing policy. As an improvement, the intelligent driver model (IDM) was designed to enhance ACC and CACC systems using real-world experimental data [6]. The IDM, introduced by Treiber et al. [7], provides more advantages and more realistic values than a plain ACC system. In particular, the IDM improves road capacity and reduces the time headway [8].
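To make the car-following dynamics above concrete, the IDM computes each vehicle's acceleration from its own speed, the bumper-to-bumper gap to its leader, and the speed difference between them. The sketch below is a minimal illustration only; the parameter values (desired speed `v0`, time headway `T`, and so on) are generic textbook defaults, not the values calibrated in this study:

```python
import math

def idm_acceleration(v, v_lead, gap,
                     v0=30.0,   # desired speed (m/s) -- illustrative value
                     T=1.5,     # safe time headway (s)
                     a=1.0,     # maximum acceleration (m/s^2)
                     b=1.5,     # comfortable deceleration (m/s^2)
                     delta=4,   # acceleration exponent
                     s0=2.0):   # minimum standstill gap (m)
    """Intelligent driver model: acceleration of the following vehicle."""
    dv = v - v_lead  # approach rate toward the leader
    # Desired dynamic gap: standstill gap + headway term + braking term.
    s_star = s0 + max(0.0, v * T + v * dv / (2 * math.sqrt(a * b)))
    # Free-road term minus interaction term with the leader.
    return a * (1 - (v / v0) ** delta - (s_star / gap) ** 2)
```

On an empty road the interaction term vanishes and the vehicle accelerates toward its desired speed, while closing rapidly on a short gap yields strong braking, which is the behavior the IDM was designed to reproduce.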
Motivated by the challenge of learning complex policies, reinforcement learning (RL) was developed around trial and error in order to find the best action in uncertain and dynamic environments. RL is a branch of machine learning that differs from supervised and unsupervised learning: it optimizes a reward signal instead of finding a hidden structure. Bellman [9] proposed Markov decision processes (MDPs) as discrete stochastic models for optimal control, and Howard [10] introduced the policy iteration method for solving MDPs. RL methods fall into three basic categories: policy-based, value-based, and actor-critic methods [11]. Recent studies have applied RL to Atari 2600 games [12], fused RL with Monte Carlo tree search in AlphaGo [13], and applied RL to continuous control tasks [14]. In order to obtain reliable simulation performance, deep reinforcement learning (deep RL), in which RL is combined with an artificial neural network (ANN), can be used to learn the most appropriate actions in a dynamic environment. Deep RL has, for example, been applied to traffic signal control. Furthermore, recent breakthroughs in artificial intelligence (AI) have produced deep RL methods suitable for a range of applications, including high-fidelity simulators such as the Arcade Learning Environment, a virtual environment with more than 55 different games [15]; a model-based control platform, multi-joint dynamics with contact (MuJoCo), for control applications [16]; and deep convolutional neural networks (CNNs) for guiding the policy search method [17]. Recent studies have applied deep RL to adaptive traffic signal control (ATSC) [18], and an overview of recent deep RL applications for ATSC is given in [20]. Large-scale traffic light control for multiple agents was achieved using a cooperative deep RL framework [21], and a multi-agent RL framework for traffic light control outperformed previous methods [22]. However, the rules of signalized intersections are frequently broken by aggressive drivers. In addition, a non-signalized intersection is a complex traffic situation with a high collision rate. Therefore, it is necessary to study autonomous driving in mixed-traffic conditions at a non-signalized intersection by adopting deep RL.
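As a minimal illustration of Howard's policy iteration on an MDP, the sketch below alternates exact policy evaluation with greedy policy improvement on a toy two-state, two-action problem; the transition probabilities and rewards are invented for illustration and have no connection to the traffic setting of this study:

```python
import numpy as np

# Toy MDP: P[a][s, s'] = transition probability, R[a][s] = expected reward.
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.1, 0.9], [0.7, 0.3]])}
R = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 2.0])}
gamma = 0.9        # discount factor
n_states = 2

policy = np.zeros(n_states, dtype=int)   # start with action 0 in every state
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
    P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
    R_pi = np.array([R[policy[s]][s] for s in range(n_states)])
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily with respect to V.
    Q = np.array([[R[a][s] + gamma * P[a][s] @ V for a in (0, 1)]
                  for s in range(n_states)])
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):   # greedy policy is unchanged
        break                                # -> optimal policy found
    policy = new_policy
```

Value-based deep RL methods replace the exact table `V` with a neural network approximator, and actor-critic methods additionally parameterize the policy itself.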
In order to improve RL performance on continuous control tasks, various studies have applied RL with neural network function approximators, such as deep Q-learning [23], original policy gradient methods [24], and trust region policy optimization (TRPO) [25]. However, deep Q-learning remains poorly understood and fails to converge on many simple tasks, while TRPO is highly complex. Proximal policy optimization (PPO) instead performs multiple epochs of minibatch updates rather than a single gradient update per sample [26]. Thus, PPO within a deep RL framework has become a promising approach to controlling multiple autonomous vehicles. PPO-based deep RL has been applied to control lane-change decisions with respect to safety, efficiency, and comfort [27], and has been leveraged to optimize mixed-traffic conditions at a roundabout [28]. Nevertheless, these studies did not consider the PPO hyperparameters under real traffic volumes, and research on PPO hyperparameters for non-signalized intersections has been lacking.
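For reference, the core of PPO is a clipped surrogate objective that bounds how far the new policy may move from the old one in each minibatch update. The following is a minimal sketch; the clip range `eps = 0.2` is the value suggested in the original PPO paper, not necessarily the one tuned in this study:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective for a batch of samples.

    ratio     = pi_theta(a|s) / pi_theta_old(a|s)  (probability ratio)
    advantage = estimated advantage of the sampled action
    eps       = clip range (0.2 suggested by Schulman et al.)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Pessimistic (lower) bound: removes the incentive to push the
    # ratio outside [1 - eps, 1 + eps], stabilizing the update.
    return np.minimum(unclipped, clipped)
```

Because the objective is a lower bound, it can safely be maximized over several epochs of minibatch gradient ascent on the same rollout data, which is what distinguishes PPO from a single-step policy gradient update.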
The most difficult problem for researchers in autonomous driving is training and validating driving control models in a physical environment. To address this, simulation has been used to represent the real world. Pomerleau [29] used a neural network, the Autonomous Land Vehicle In a Neural Network (ALVINN), to steer a vehicle from road images. Later, the open racing car simulator (TORCS), a multi-agent car simulator, was developed for AI research through a low-level application programming interface [30]. However, TORCS does not support urban driving simulations and lacks such elements as pedestrians, traffic rules, and intersections. More recently, researchers have adopted deep RL to analyze autonomous driving strategies. For example, the car learning to act (CARLA) open urban driving simulator allows driving models to be trained and validated with respect to both perception and control [31]. However, CARLA is a three-dimensional (3D) simulator intended for testing individual autonomous vehicles. Furthermore, simulation of urban mobility (SUMO), an open-source traffic simulator, enables the simulation of traffic scenarios over a large area [32], including traffic signal control [19]. The set of possible SUMO simulations can be expanded through the traffic control interface (TraCI), which interacts with other programming languages such as Python and Matlab [35]. In addition, Flow is a Python-based open-source tool that connects a traffic simulator (e.g., SUMO, Aimsun) with a reinforcement learning library (e.g., RLlib, rllab) [36]. Flow can be used to train a deep RL algorithm and evaluate a mixed-autonomy traffic controller, such as a traffic light or an urban network [37]. Recent studies have applied Flow to evaluate the effectiveness of AVs in a network [38] and to reduce the frequency and magnitude of stop-and-go waves at various AV penetration rates [40]. The experimental results showed that multi-agent RL policies outperformed the baselines in terms of average velocity and reward. Moreover, a higher average velocity reduces delay time, fuel consumption, and emissions. Thus, average velocity has become an effective metric for training a deep RL policy for the real world.
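As a rough illustration of how average velocity can serve as a reward signal, the hypothetical function below scores the mean network speed against a desired speed. It is a generic sketch only, not the reward actually used by Flow or by this study, and the desired speed `v_des` is an assumed value:

```python
import numpy as np

def average_velocity_reward(speeds, v_des=12.0, fail=False):
    """Reward an RL policy for keeping the mean network speed near v_des.

    speeds : iterable of current vehicle speeds (m/s)
    v_des  : desired network speed (m/s) -- an illustrative assumption
    fail   : True when the rollout terminated in a collision
    """
    if fail or len(speeds) == 0:
        return 0.0                      # no reward on crash or empty network
    speeds = np.asarray(speeds, dtype=float)
    # Normalized closeness of the mean speed to the desired speed, in [0, 1].
    return max(0.0, 1.0 - abs(speeds.mean() - v_des) / v_des)
```

A reward of this shape is maximized when traffic flows uniformly near the desired speed, which indirectly penalizes the stop-and-go behavior that inflates delay, fuel consumption, and emissions.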
In this study, we present a deep RL method for simulating mixed-autonomy traffic at a non-signalized intersection. The proposed method combines RL with a multilayer perceptron (MLP) and considers the effectiveness of leading autonomous vehicles. In addition, we apply a set of PPO hyperparameters to enhance the simulation performance. First, we perform a leading-autonomous-vehicle experiment at a non-signalized intersection with an AV penetration rate varying from 10% to 100% in 10% increments. Second, we input the PPO hyperparameters into the MLP for the leading-autonomous-vehicle experiment. Finally, leading-human-driven-vehicle and all-human-driven-vehicle experiments are used to evaluate the superiority of the proposed method. The major contributions of this work are as follows.
An enhanced hybrid deep RL method is presented that uses a PPO algorithm with MLP and RL models to capture the effectiveness of the leading autonomous vehicle at a non-signalized intersection under AV penetration rates ranging from 10% to 100% in 10% increments. The leading-autonomous-vehicle experiment yields a significant improvement over the leading-human-driven-vehicle and all-human-driven-vehicle experiments in terms of training policy, mobility, and energy efficiency.
A set of PPO hyperparameters is proposed in order to explore the effect of automated feature extraction on policy prediction and to obtain reliable simulation performance at a non-signalized intersection under real traffic volumes.
A significant reduction in traffic perturbations at a non-signalized intersection is demonstrated for AV penetration rates ranging from 10% to 100% in 10% increments.
The rest of this paper is organized as follows. Section 2 presents the deep RL framework, the longitudinal dynamic models, the policy optimization method, and the proposed model's architecture. Section 3 describes the simulation experiments and presents the results. Section 4 contains our conclusions.