Reinforcement-Learning-Based Decision and Control for Autonomous Vehicle at Two-Way Single-Lane Unsignalized Intersection

Intersections have attracted wide attention owing to their complexity and high rate of traffic accidents. In developing L3-and-above autonomous-driving techniques, the problems of autonomous-driving decision making and control at intersections must be solved. In this article, a decision-and-control method based on reinforcement learning and speed prediction is proposed to manage the merging of straight and turning vehicles at two-way single-lane unsignalized intersections. The key positions for collision avoidance during confluence are determined by establishing a road-geometry model, and on this basis, the expected speed of the straight vehicle that ensures safe passage is calculated. Then, a reinforcement-learning algorithm is employed to solve the decision-and-control problem of the straight vehicle, and the expected speed is incorporated to direct the agent to learn and converge to the planned decision. Simulations were conducted to verify the performance of the proposed method, and the results show that it can generate proper decisions for the straight vehicle to pass the intersection while guaranteeing good safety and traffic efficiency.


Introduction
With the rapid development of automatic-driving technology, many functions of low-level advanced driver-assistance systems have been implemented in an increasing number of vehicles. However, high-level automatic-driving systems require safer and more intelligent decision making and control for automated vehicles in increasingly complex traffic scenes. As a typical traffic scene with a high incidence of accidents, unsignalized intersections have been investigated by many researchers, who have studied decision making and control to improve driving safety and efficiency [1,2].
As a classical method, behavior prediction for surrounding vehicles in the traffic environment has proved to be an efficient way of dealing with decision-making problems. Zyner et al. [3] leveraged a long short-term memory (LSTM) recurrent neural network (RNN) to predict driver intention when a vehicle enters an intersection, supporting the decision making of an autonomous vehicle. A decision-making framework proposed by Samyeul in [4] predicts the future trajectories of observed vehicles and delineates the potentially dangerous collision area, helping autonomous vehicles navigate the intersection safely and efficiently. In [5], a motion-planning method for autonomous vehicles is introduced based on the rapidly exploring random tree (RRT) algorithm. To solve motion-planning problems in environments with dynamic obstacles, the method combines RRT with the configuration-time space to improve the quality of the planned trajectory. Ramyar et al. [6] presented a data-driven technique using the Takagi-Sugeno fuzzy model to simulate and predict driver behavior at intersections, further improving prediction accuracy.
Model predictive control (MPC), a commonly used control method, has been widely exploited in decision and control of autonomous driving at intersections. In [7], a bilevel MPC algorithm is established for the coordination of autonomous vehicles at intersections, and a distributed sequential quadratic programming (SQP) method is leveraged to solve the intersection-level optimization problem. Zhao et al. [8] developed an MPC-based collaborative-driving algorithm for connected and automated vehicles at unsignalized intersections, in which a decentralized controller guides each vehicle smoothly through the intersection. In [9], a probabilistic model was devised to predict the trajectory of the target vehicle and was subsequently integrated into a collision-avoidance model. Katriniok et al. [10] proposed a distributed MPC approach that enables multiple vehicles to pass through an intersection simultaneously in a safe and efficient manner. Decision-making control at intersections with multiple surrounding vehicles was studied in [11], wherein a robust MPC searches for a safe passage through the scene while planning the optimal trajectory.
In recent years, partially observable Markov decision processes (POMDPs) have been increasingly employed for autonomous-driving decisions at intersections. Bouton et al. [12] formulated the traffic problem at unsignalized intersections as a POMDP and solved it with Monte Carlo sampling. Shu et al. [13] proposed a decision-making and control method for left-turning intelligent vehicles based on key turning points at intersections, employing a POMDP to solve for the optimal speed sequence during the left turn. Kye et al. [14] introduced an intent-aware decision-making method for autonomous driving at unsignalized intersections, in which the intents of traffic participants were modeled as dynamic Bayesian networks and the intent-aware decision-making problem was formulated as a POMDP based on the inference results. Hubmann et al. [15] considered occlusions caused by static and dynamic objects simultaneously and proposed a general POMDP-based autonomous-driving strategy for urban conditions. In [16], a POMDP framework was proposed for online autonomous driving in different situations.
Machine-learning algorithms such as reinforcement learning (RL) are also widely exploited in decision and control. Deep RL (DRL) combines the perception ability of deep learning with the decision-making capability of RL and performs well in continuous motion-control problems [17,18]. Islee et al. [19] investigated the effectiveness of DRL for intersection decision-control problems. In a comparative study, a deep Q-network learned strategies that outperformed common heuristic methods on indicators such as traffic time and traffic rate; however, its generalization ability was limited. Shi et al. [20] proposed a coordinated control method using proximal policy optimization in a vehicle-road-cloud integration system, in which RL learns a policy that lets connected vehicles cross the intersection safely. Chen et al. [21] proposed an autonomous intersection-management system based on DRL, applying a braking safety-control model to ensure the safety of each autonomous vehicle at the intersection. Zhou et al. [22] established an RL-based car-following model for intelligent vehicles to improve driving behavior at intersections. With an effective reward function, the learned model works well under different conditions, improving fuel consumption, safety, and driving efficiency.
In view of the current research on autonomous-driving decision making and control at intersections, planning methods based on the predicted states of surrounding vehicles usually quantify the collision risk at the intersection, and rule-based strategies are then used to make decisions for intelligent vehicles. However, rule-based strategies generalize poorly, and their formulation depends on practical experience, which greatly affects the effectiveness of the algorithm. Decision and control of intelligent vehicles at intersections are complex problems involving multifactor coupling [23]. Because crossing an intersection is a complex driving behavior [24], the intersection-scene model must be simplified to some extent before decision rules can be derived from the quantified risk, which leads to discrepancies between the simplified scene and the actual one [25]. POMDP models generally require a large amount of computation. Although Monte Carlo sampling can mitigate this concern, the required discretization of the motion space also degrades accuracy to some extent [13].
Methods based on model prediction rely strongly on the accuracy of the established model; thus, many factors must be considered comprehensively during modeling to achieve a satisfactory control effect [8]. In contrast, RL does not require a specific control model owing to its model-free nature. Decision making for a straight-going intelligent vehicle at an intersection is a continuous action-control problem, so it is reasonable to expect that this decision-making and control problem can be solved by an RL method.
Motivated by these considerations, a decision-and-control model based on RL is designed in this study. The main contributions are as follows. (1) A method is proposed to judge the priority of crossing the intersection based on speed prediction by an autoregressive integrated moving average (ARIMA) model and to calculate the expected speed of the autonomous vehicle. (2) A decision-and-control model is constructed based on speed prediction and RL. The model incorporates the expected speed to guide the RL agent to converge in the optimal direction, thereby saving learning time.
(3) A multiobjective evaluation system for the decision-making and control effect is established that considers success rate, speed penalty, safety, traffic efficiency, and comfort.
The remainder of this article is structured as follows. In Section 2, the geometric model of the road and the circular model of the vehicle body are established, and a mathematical analysis of intersection confluence trajectory data is presented. In Section 3, the decision-and-control method based on speed prediction and RL, together with the evaluation method, is introduced in detail. In Section 4, simulations are conducted and the effects are validated. Section 5 draws the main conclusions of this study.

Intersection Confluence Condition Modeling
To better analyze the decision-making process and explain the mathematical model of the subsequent decision-making control, the road-geometry model of the research object is constructed first. When passing through an intersection, a vehicle generally has three possible directions, as shown in Figure 1a, where a-f represent the possible driving directions. As shown in Figure 1b, the relationship between two vehicles can be divided into three types: irrelevant (1-4), crossing (5-7), and confluent (8, 9). The areas where a collision may occur are marked with yellow boxes. Under the confluence condition, the two vehicles eventually drive into the same lane; therefore, the potential collision area is longer than in the other conditions. This scene includes not only the decision-making and control problem of the two vehicles while they pass through the intersection but also their continuous interaction after confluence. Therefore, this paper selects directions b and d for subsequent modeling and analysis. Figure 2 illustrates the road-geometry model for two-way single-lane confluence. Straight and turning vehicles enter the intersection from different junctions and eventually converge into the same lane. The center lines of the east-west and north-south lanes are labeled $L_{C1}$ and $L_{C2}$, respectively; $L_{S1}$ through $L_{S4}$ represent the stop lines at the intersection; and $(x', y')$ is the confluence point of the two vehicles.

Circular Model of Vehicle Body
The trajectory in Figure 2 shows only the movement of the centroids of the straight and turning vehicles, without considering the actual geometric size of the vehicles. In a real driving scenario, the geometric size of the vehicle body cannot be ignored if the potential risk of collision is to be avoided while the two vehicles converge at the intersection. Therefore, a circular model, which has been widely adopted in studies on vehicle collisions, is used to represent the vehicle body profile hereinafter, as shown in Figure 3. In this manner, the radius of the circular model can be calculated by (1), where W and L denote the width and length of the vehicle, respectively, and r denotes the radius of the body circle.


Considering the circular model of the vehicle body, the actual motion trajectories of the straight and turning vehicles under the confluence condition are shown in Figure 4. The trajectory of the turning vehicle is assumed to consist of two straight lines and a quarter arc, in which TR and GS denote the turning section and straight section of the turning vehicle, respectively; R denotes the radius of the arc; and $L'$ represents the chord length of the arc. In the TR phase, the vehicle turns right and eventually merges into the same lane as the straight vehicle. As the two vehicles draw closer, the risk of collision increases; therefore, this is the area on which our research focuses.

The critical condition for collision judgment can be determined by (2), where $(x_0, y_0)$ denotes the coordinates of the straight vehicle and $(x_0', y_0')$ denotes the coordinates of the turning vehicle; the two body circles are tangent when their centers are exactly $2r$ apart:

$$\sqrt{(x_0 - x_0')^2 + (y_0 - y_0')^2} = 2r \quad (2)$$

According to (2), the areas where collisions may occur during the confluence of two vehicles can be established, as shown in Figure 5. The CA area in red, enclosed by the lines at a distance of 2r from the trajectory centerlines $y_2$ and $x_1$, is the only potential collision area of the two vehicles before the confluence point $(x', y')$, since the distance between the two vehicles cannot be less than 2r outside this area.

Depending on the positions and speeds of the two vehicles during confluence, (2) has several feasible solutions. Although it is difficult to solve for all vehicle positions at which the body circles are tangent, the important points in the CA area can be analyzed. We selected the confluence point $(x', y')$ and the turning point of the turning vehicle $(x_1, y_1)$ for analysis.

By finding the point on the track of the turning vehicle at a distance of 2r from the merging point, it can be observed that when the straight and turning vehicles are located around the merging point and the end of the GS segment, their body circles are tangent, as shown in Figure 6. Another important body circle, C2, is obtained by taking the position symmetric to the merging point $(x', y')$ about the straight line $x = x_1$. The analysis shows that the two vehicles cannot collide before the straight vehicle reaches the position of the C2 body circle. When either vehicle reaches the position of C3 first, the other vehicle should pass through the intersection after it in car-following mode. Hence, the body circles C1, C2, and C3 approximately represent the key positions for collision avoidance in the proposed road-geometry model. A detailed analysis of the three body circles is presented in the following content.
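The geometric relations above can be checked numerically. The Python sketch below is illustrative only: it tests the tangency condition (2) for two body circles of radius r, computes the C2 center as the reflection of the merging point $(x', y')$ about the line $x = x_1$, and evaluates the chord of the quarter arc, which for a quarter circle of radius R is $\sqrt{2}R$. The function names, the numerical tolerance, and the sample coordinates are assumptions for illustration, not part of the original method.

```python
import math


def circles_touch_or_overlap(center_a, center_b, r, tol=1e-9):
    """Return True if two body circles of radius r are tangent or overlapping.

    Condition (2): the circles are tangent when their centers are exactly 2r apart,
    so any center distance <= 2r indicates contact.
    """
    return math.dist(center_a, center_b) <= 2.0 * r + tol


def reflect_about_vertical(point, x_line):
    """Mirror a point about the vertical line x = x_line (used for the C2 center)."""
    x, y = point
    return (2.0 * x_line - x, y)


def quarter_arc_chord(radius):
    """Chord length L' of a quarter arc with the given radius R: sqrt(2) * R."""
    return math.sqrt(2.0) * radius


# Hypothetical coordinates: merging point (x', y') = (3.0, 5.0), turning point x1 = 1.0
c2_center = reflect_about_vertical((3.0, 5.0), 1.0)
print(c2_center)  # (-1.0, 5.0)
```

Using `<=` rather than `==` in the contact test treats overlap as a collision as well, which is the conservative reading for safety checks.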

Statistical Analysis of Intersection Confluence Trajectory Data
To reflect actual driving conditions, the intersection-trajectory dataset published by Open ITS [26] was used to analyze driving behavior at intersections. Open ITS is a traffic-data resource-sharing platform built jointly by several research institutes and enterprises. Its intersection trajectory data cover a total of 60 confluence cases. Figure 7 shows one example from the dataset, which records the time histories of the position, speed, and acceleration of the straight and turning vehicles.
Because a speed that is too high or too low at an intersection is dangerous, it is necessary to determine speed and acceleration thresholds for vehicles driving through the intersection. The data were grouped according to whether the straight vehicle passes the intersection first, and the statistical results are shown in Figures 8 and 9: Figure 8 shows the speed and acceleration distributions of the straight and turning vehicles when the straight vehicle passes before the turning vehicle, and Figure 9 shows the distributions when the straight vehicle gives way. The statistical results reveal that, under both working conditions, speed and acceleration are concentrated within a limited range: speed mainly lies between 0 and 8 m/s, and acceleration mainly lies between −2 and 2 m/s², providing a reference for setting boundary conditions.
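The grouped statistics can be reproduced with a few NumPy helpers. The sketch below assumes a hypothetical data layout (a dictionary keyed by passing order, each entry holding speed and acceleration arrays); it reports the share of samples inside the 0 to 8 m/s and −2 to 2 m/s² bands and a central percentile interval, which is one possible way to make "mainly distributed" quantitative. It is not the authors' analysis script.

```python
import numpy as np


def fraction_within(values, lower, upper):
    """Share of samples that fall inside [lower, upper]."""
    v = np.asarray(values, dtype=float)
    return float(np.mean((v >= lower) & (v <= upper)))


def central_range(values, coverage=0.95):
    """Percentile interval containing the central `coverage` share of the samples."""
    v = np.asarray(values, dtype=float)
    tail = (1.0 - coverage) / 2.0 * 100.0
    lower, upper = np.percentile(v, [tail, 100.0 - tail])
    return float(lower), float(upper)


def summarize_by_order(groups):
    """groups: {passing_order: {"speed": array, "accel": array}} (hypothetical layout)."""
    summary = {}
    for order, data in groups.items():
        summary[order] = {
            "speed_share_0_to_8": fraction_within(data["speed"], 0.0, 8.0),
            "accel_share_pm2": fraction_within(data["accel"], -2.0, 2.0),
            "speed_central_95": central_range(data["speed"]),
        }
    return summary
```

Reporting both the in-band share and a percentile interval makes it visible whether the chosen bounds cut off a meaningful tail of the distribution.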


Turning-Vehicle Speed Prediction
Owing to the limited space and high risk of collision at the intersection, the states of surrounding vehicles should be predicted for safety. Hence, an ARIMA model is used to predict the future speed of the turning vehicle. Taking the confluence trajectory data from the first data group in the Open ITS dataset as an example, the model order is determined by the inductive method based on the autocorrelation function (ACF) and partial autocorrelation function (PACF) [27], which are shown in Figure 10. The second-order difference of the speed in the selected trajectory data is found to be a stationary time series. The blue lines in the figure mark the 95% confidence interval. In general, the order of the ARIMA model is determined by the last lag outside the confidence interval. Therefore, in this study, both the autoregressive and moving-average orders of the second-order-difference ARIMA model of the turning vehicle are set to 6, that is, ARIMA(6, 2, 6).
With the established ARIMA speed-prediction model, the future speed of the turning vehicle can be predicted, informing the straight vehicle's decision to go ahead or give way at the intersection. The sampling interval of speed in the dataset is 0.04 s. To reduce the computing load of on-board processors while maintaining accuracy and predictability, the prediction horizon is set to 0.2 s, that is, five sampling steps. Figure 11 shows the result of speed prediction. The mean actual vehicle speed is 2.331 m/s, while the mean square speed errors of the rolling prediction over the next five steps are 0.0243, 0.0325, 0.0370, 0.0410, and 0.0411, respectively. Relative to the mean speed, these errors range from about 1.0% to 1.8% and all remain below 2%, demonstrating good prediction performance.
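The paper fits a full ARIMA(6, 2, 6) model. The dependency-light sketch below deliberately simplifies this to an ARI(p, 2) model (second-order differencing plus a least-squares autoregressive fit, with no moving-average terms) to show how a rolling five-step forecast is produced and how the second differencing is undone. The function names and the default p = 6 are assumptions for illustration. To fit the full model with MA terms, a library such as statsmodels (the `ARIMA` class in `statsmodels.tsa.arima.model` with `order=(6, 2, 6)`) would be the natural choice.

```python
import numpy as np


def fit_ar(series, p):
    """Least-squares AR(p) fit with intercept; returns [c, phi_1, ..., phi_p]."""
    x = np.asarray(series, dtype=float)
    y = x[p:]
    # Column k holds the lag-k values x_{t-k} aligned with the targets y = x_t
    lags = [x[p - k:len(x) - k] for k in range(1, p + 1)]
    design = np.column_stack([np.ones(len(y))] + lags)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def forecast_speed(speeds, p=6, steps=5):
    """Forecast `steps` future speeds with an ARI(p, 2) model (simplified ARIMA, no MA terms)."""
    v = np.asarray(speeds, dtype=float)
    d2 = np.diff(v, n=2)  # second-order difference, assumed stationary as in Figure 10
    beta = fit_ar(d2, p)
    history = list(d2)
    v_prev, v_last = v[-2], v[-1]
    preds = []
    for _ in range(steps):
        recent = np.array(history[-1:-p - 1:-1])  # x_{t-1}, ..., x_{t-p}
        d2_next = beta[0] + beta[1:] @ recent
        history.append(d2_next)
        # Undo second differencing: v_t = d2_t + 2 v_{t-1} - v_{t-2}
        v_next = d2_next + 2.0 * v_last - v_prev
        preds.append(v_next)
        v_prev, v_last = v_last, v_next
    return np.array(preds)


# With a 0.04 s sampling interval, steps=5 corresponds to the 0.2 s horizon in the text.
```

Dropping the MA terms keeps the fit a single linear least-squares problem, at the cost of some accuracy on noisy data compared with a properly estimated ARIMA(6, 2, 6).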


Decision and Control Based on RL
RL is commonly employed to solve complex decision and control problems. In the decision process, the tuple $(S, A, R, S')$ is the basic unit of each training step, in which $S$ denotes the current state, $S'$ denotes the new state reached from $S$ after taking action $A$, and the reward $R$ is received according to the action and state. The state space and action space must be carefully constructed for RL-based decision making of intelligent vehicles. The state space is constructed mainly from the position and speed information of the two vehicles, as shown in (3) to (5). In this paper, the subscript ego denotes the ego (straight) vehicle, and the subscript env denotes the turning vehicle in the environment.
where $S_{ego}$ and $S_{env}$ denote the state sequences of the straight vehicle and turning vehicle, respectively, and $(x, y, v)$ denote the abscissa, ordinate, and velocity of a vehicle. For the action space, a natural choice is to set the actions as the throttle percentage and brake-pedal pressure, which simplifies the design of the tracking controller. However, outputting both actions simultaneously can lead to unreasonable strategies, such as pressing the throttle and brake at the same time. Thus, the action space in (6) to (8) is constructed to avoid this problem, in which Action_mix denotes either the brake-pedal pressure or the throttle opening of the vehicle, and its value range [Action_min, Action_max] is determined by simulation experiments based on the statistical results in Figures 8 and 9.
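Since equations (3) to (8) are not reproduced here, the following sketch only illustrates the idea under explicit assumptions: the state is a single snapshot of $(x, y, v)$ for both vehicles (the paper speaks of state sequences), and a single `Action_mix` value is split so that positive values command throttle opening and negative values command brake pressure. The bounds `ACTION_MIN` and `ACTION_MAX` are hypothetical placeholders for the simulation-tuned range.

```python
import numpy as np

# Hypothetical bounds; the paper tunes [Action_min, Action_max] by simulation.
ACTION_MIN, ACTION_MAX = -1.0, 1.0


def build_state(ego, env):
    """Stack (x, y, v) of the straight (ego) and turning (env) vehicles into one vector."""
    return np.array([ego["x"], ego["y"], ego["v"], env["x"], env["y"], env["v"]], dtype=float)


def split_action(action_mix):
    """Map one Action_mix value to (throttle, brake) so both are never active together.

    Assumption: positive values command throttle opening and negative values
    command brake pressure, each normalized to [0, 1].
    """
    a = float(np.clip(action_mix, ACTION_MIN, ACTION_MAX))
    throttle = max(a, 0.0) / ACTION_MAX
    brake = max(-a, 0.0) / -ACTION_MIN
    return throttle, brake
```

Because only one scalar is output, the policy can never request throttle and brake at the same time, which is exactly the failure mode the text describes for a two-output action space.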
The arrival times of the straight and turning vehicles from their current positions to the three centers $(x', y')$, $(x_2, y_2)$, and $(x_1, y_1)$ shown in Figure 6 can be calculated according to (9) to (15). For the straight vehicle, the minimum time to arrive at each key position is calculated with the permitted maximum acceleration $a_{max}$, which is 2 m/s² in this study. Here, $v_{pre}$ denotes the result of the ARIMA multistep speed-prediction model, $\{v_{pre}^1, v_{pre}^2, \ldots, v_{pre}^{l-1}, v_{pre}^l\}$, where $l$ is the prediction length of five steps. In addition, (9) and (11) consider the case in which the straight vehicle reaches the maximum speed $v_{max}$ before arriving at the key position; $v_{max}$ is set to 8 m/s according to the conclusion of Section 2.2.
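The exact forms of (9) to (15) are not recoverable from this text. As an illustrative sketch only, the code below uses standard constant-acceleration kinematics for the straight vehicle, which accelerates at $a_{max} = 2$ m/s² until it reaches $v_{max} = 8$ m/s and then cruises. For the turning vehicle, it adopts a hypothetical simplification: the path distance divided by the mean of the five ARIMA-predicted speeds. Path distances to the key positions are taken as inputs.

```python
import math

A_MAX = 2.0  # m/s^2, permitted maximum acceleration (from the text)
V_MAX = 8.0  # m/s, speed cap from the Open ITS statistics


def straight_min_arrival_time(distance, v0, a_max=A_MAX, v_max=V_MAX):
    """Minimum time to cover `distance` when accelerating at a_max, capped at v_max."""
    if distance <= 0:
        return 0.0
    v0 = min(v0, v_max)
    d_acc = (v_max ** 2 - v0 ** 2) / (2.0 * a_max)  # distance needed to reach v_max
    if distance <= d_acc:
        # v_max is never reached: solve distance = v0 t + a_max t^2 / 2 for t
        return (-v0 + math.sqrt(v0 ** 2 + 2.0 * a_max * distance)) / a_max
    t_acc = (v_max - v0) / a_max
    return t_acc + (distance - d_acc) / v_max


def turning_arrival_time(distance, v_pred):
    """Hypothetical estimate: distance over the mean of the predicted speeds."""
    v_mean = sum(v_pred) / len(v_pred)
    return math.inf if v_mean <= 0 else distance / v_mean
```

The two branches of `straight_min_arrival_time` mirror the distinction the text draws in (9) and (11) between reaching and not reaching $v_{max}$ before the key position.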
If two human-driven vehicles arrive at an intersection at different times, the vehicle that should give way is determined by the "first-in, first-out" rule. If they arrive at the same time, traffic regulations stipulate that the straight vehicle has the right of way. However, because of drivers' limited perception, it is sometimes difficult to judge accurately which of the two vehicles arrives first; therefore, both vehicles usually slow down and pass through the intersection one after the other, which greatly reduces the traffic efficiency of the unsignalized intersection. The following method is therefore proposed to facilitate safe and orderly traffic. For safety reasons, the straight vehicle should speed up and pass through the intersection only when it can do so with certainty; otherwise, it should slow down appropriately to give way.
In this paper, the priority for crossing the intersection is determined as follows: if the time for the straight vehicle to arrive at $(x', y')$ is less than the time for the turning vehicle to reach $(x_1, y_1)$, the straight vehicle goes ahead. Otherwise, if the time for the straight vehicle to arrive at $(x_2, y_2)$ is longer than the time for the turning vehicle to arrive at $(x_1, y_1)$, the straight vehicle gives way to the turning vehicle. Based on the arrival times calculated above, the decision-making model outputs a decision of going ahead or giving way, and the RL agent is guided to learn and converge to the planned traffic strategy through the calculated expected acceleration. The expected speed of the intelligent vehicle is computed using (16), where $\Delta t$ denotes the step interval between two decisions. The expected acceleration is calculated by (17), which considers the order in which the two vehicles reach the key positions. Here, $v_{min}$ denotes the lower limit of the vehicle speed, and $K_1$ denotes the constant of the inverse-proportional function (set to 1 here), meaning that the acceleration of the straight vehicle is inversely proportional to the arrival-time difference between the two vehicles.
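The priority rule can be written directly from the text; the reference acceleration is not, because (17) is not reproduced here. In the sketch below, `decide` follows the stated rule, while `reference_speed` applies the update $v_{ref} = v_{ego} + a_{ref}\Delta t$ with a hypothetical $a_{ref}$ of magnitude $K_1/|\Delta t_{arr}|$ capped at $a_{max}$ (positive to go ahead, negative to yield) and a hypothetical clip of $v_{ref}$ to $[v_{min}, v_{max}]$. The "undecided" label for the case the text leaves unspecified is also an assumption.

```python
A_MAX = 2.0  # m/s^2, permitted maximum acceleration from the text
V_MIN = 0.0  # m/s, hypothetical lower speed limit v_min
V_MAX = 8.0  # m/s, upper speed limit from the Open ITS statistics
K1 = 1.0     # inverse-proportionality constant, set to 1 in the text


def decide(t_straight_conf, t_straight_c2, t_turn_tp):
    """Priority rule using arrival times at (x', y') and (x2, y2) for the straight
    vehicle and at (x1, y1) for the turning vehicle."""
    if t_straight_conf < t_turn_tp:
        return "go"
    if t_straight_c2 > t_turn_tp:
        return "yield"
    return "undecided"  # hypothetical label: this case is not specified in the text


def reference_speed(v_ego, decision, time_gap, dt):
    """v_ref = v_ego + a_ref * dt, with a hypothetical inverse-proportional a_ref."""
    if decision == "undecided":
        a_ref = 0.0
    else:
        gap = abs(time_gap)
        magnitude = A_MAX if gap == 0 else min(K1 / gap, A_MAX)
        a_ref = magnitude if decision == "go" else -magnitude
    v_ref = v_ego + a_ref * dt
    return min(max(v_ref, V_MIN), V_MAX)
```

For example, with a 0.5 s arrival-time margin the sketch commands the full 2 m/s² to go ahead, whereas with a 2 s margin when yielding it decelerates at only 0.5 m/s².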
$$v_{\mathrm{ref}} = v_{\mathrm{ego}} + a_{\mathrm{ref}}\,\Delta t \tag{16}$$

The mathematical model of the RL reward function determines the direction in which the agent converges during learning. To ensure efficient and safe crossing of the intersection, the reward function shown in (18) to (22) is constructed. It comprises terms for collision, speed, and whether the end point is reached. In addition, it includes the difference between the reference acceleration calculated by (17) and the actual acceleration of the vehicle, which guides the agent to converge toward the output of the decision-making model. The magnitude of R_f is smaller than that of the other rewards, which accelerates the convergence of the agent during training while avoiding restricting its ability to explore.
where R_G denotes the reward for reaching the end point, R_C the penalty for collision, R_S the penalty for driving too fast or too slow, and R_f the reward for keeping the actual speed close to the reference speed. x_d denotes the abscissa of the end point, and v_upper and v_lower denote the upper and lower speed limits, respectively. The deep deterministic policy gradient (DDPG) algorithm is employed for RL training. DDPG consists mainly of an actor network and a critic network: the actor outputs actions according to the current state, and the critic outputs the value of the state-action pair. The flowchart of the decision-and-control model at the intersection based on RL and ARIMA speed prediction is shown in Figure 12.
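The structure of the reward described above can be sketched as a sum of four terms. Equations (18) to (22) are not reproduced here, so every magnitude below (`r_goal`, `r_collision`, `r_speed`, `w_f`) is a hypothetical placeholder, and the sketch assumes the straight vehicle travels in the +x direction toward x_d. Following the text, R_f uses the gap between a_ref and the actual acceleration; by (16), (a_ref − a)Δt equals v_ref minus the speed actually reached one step later, which is why it can also be read as a speed-tracking reward.

```python
def step_reward(x: float, v: float, a: float, a_ref: float, collided: bool,
                x_d: float, v_lower: float, v_upper: float,
                r_goal: float = 100.0, r_collision: float = -100.0,
                r_speed: float = -1.0, w_f: float = 0.1) -> float:
    """Hypothetical per-step reward R_G + R_C + R_S + R_f.

    All magnitudes are placeholders, not the values used in (18)-(22).
    Assumes the straight vehicle travels in the +x direction toward x_d.
    """
    r_g = r_goal if x >= x_d else 0.0                 # R_G: end point reached
    r_c = r_collision if collided else 0.0            # R_C: collision penalty
    out_of_range = v > v_upper or v < v_lower
    r_s = r_speed if out_of_range else 0.0            # R_S: too fast or too slow
    # R_f: small shaping term; by (16), (a_ref - a) * dt equals v_ref minus the
    # speed actually reached one step later, so this also tracks v_ref.
    r_f = -w_f * abs(a_ref - a)
    return r_g + r_c + r_s + r_f
```

Keeping `w_f` small relative to the other terms reflects the text's point that R_f should steer the agent without restricting exploration.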

Model-Evaluation Method
To quantitatively analyze the control performance of the model, five indices are used: success rate, speed penalty, safety, traffic efficiency, and comfort. The success rate is calculated by (23), which scores 100 points if the vehicle finally reaches the finish line and 0 points otherwise. The speed penalty is evaluated by (24), in which Σt_1 denotes the total time during which the vehicle speed is above or below the desired speed range, and Σt_2 denotes the total time until the last vehicle passes the finish line. The minimum distance between the two vehicles is adopted as the index of driving safety for intelligent vehicles. According to the circular vehicle-body model established in Section 2, the risk of collision increases as the two body circles approach each other. Hence, a margin of 1.5 times the radius of the body circle is reserved in this study and defined as the optimal interval between the two vehicles; distances longer or shorter than this optimum are considered unsatisfactory. The maximum distance is defined as the distance between the initial positions of the two vehicles. The safety assessment at the intersection is given by (25), where D is the diameter of the body circle, L_max is the distance between the initial positions of the two vehicles, and d_min is the minimum distance between the two vehicles during the run. For the intersection simulation scene, L_max can be calculated directly from (26), in which L is the width of the vehicle model, L_1 is the distance from the initial position of the straight vehicle to the longitudinal centerline of the intersection, and L_2 is the distance from the initial position of the turning vehicle to the horizontal centerline of the intersection.
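The first three indices can be sketched as simple scoring functions. Only (23) is fully described in the text; the forms used for (24) and (25) are assumptions chosen to match the verbal description. The speed-penalty score is taken as the share of time spent inside the desired speed range, and the safety score peaks at the optimal interval and falls linearly to zero when the body circles touch (d_min = D) or at d_min = L_max. The sketch also assumes d_min is a centre-to-centre distance and that the reserved margin of 1.5 radii is measured between circle edges, giving an optimal centre distance of D + 0.75D.

```python
def success_score(passed: bool) -> float:
    """(23): 100 points if the vehicle reaches the finish line, else 0."""
    return 100.0 if passed else 0.0


def speed_penalty_score(t_out_of_range: float, t_total: float) -> float:
    """Assumed form of (24): share of time inside the desired speed range."""
    if t_total <= 0:
        raise ValueError("t_total must be positive")
    return 100.0 * (1.0 - t_out_of_range / t_total)


def safety_score(d_min: float, D: float, L_max: float) -> float:
    """Assumed piecewise-linear form of (25).

    100 at the optimal interval, 0 when the body circles touch (d_min <= D)
    or when d_min reaches L_max. Assumes L_max > optimal distance.
    """
    d_opt = D + 1.5 * (D / 2.0)  # centre distance with a 1.5 r edge gap (assumption)
    if d_min <= D or d_min >= L_max:
        return 0.0
    if d_min <= d_opt:
        return 100.0 * (d_min - D) / (d_opt - D)
    return 100.0 * (L_max - d_min) / (L_max - d_opt)
```

A linear fall-off on each side is only one plausible choice; any monotone shape that peaks at the optimal interval would match the description equally well.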
The traffic efficiency of vehicles at intersections is usually evaluated from the time taken to cross the intersection. This study evaluates the traffic efficiency of intelligent vehicles using (27), where T_min and T_max denote the times from T = 0 to reaching the finish line when the vehicle accelerates to the upper speed limit at the maximum permitted acceleration and decelerates to the lower speed limit at the maximum permitted deceleration, respectively, as expressed in (28) and (29). Here, v_ini, v_upper, and v_lower denote the initial speed, upper speed limit, and lower speed limit, respectively, and S denotes the distance from the initial position to the finish line.
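The bounds T_min and T_max follow from constant-acceleration kinematics: ramp from v_ini to the target speed at the maximum permitted rate, then hold that speed until the finish line (or reach the finish line during the ramp). The sketch below computes that time; it is an interpretation of (28) and (29), not a copy of them. The linear mapping used for the efficiency score is an assumed form of (27), and it requires v_lower > 0 so that T_max stays finite.

```python
import math


def time_to_finish(S: float, v_ini: float, v_target: float, a_abs: float) -> float:
    """Time to cover distance S when ramping from v_ini to v_target at |a| = a_abs,
    then holding v_target. Requires v_target > 0 and a_abs > 0."""
    if v_target <= 0 or a_abs <= 0:
        raise ValueError("v_target and a_abs must be positive")
    if v_target == v_ini:
        return S / v_ini
    a = a_abs if v_target > v_ini else -a_abs
    d_ramp = (v_target ** 2 - v_ini ** 2) / (2.0 * a)  # distance covered while ramping
    if d_ramp >= S:
        # Finish line reached during the ramp: first root of S = v_ini*t + a*t^2/2.
        return (-v_ini + math.sqrt(v_ini ** 2 + 2.0 * a * S)) / a
    t_ramp = (v_target - v_ini) / a
    return t_ramp + (S - d_ramp) / v_target


def efficiency_score(T: float, T_min: float, T_max: float) -> float:
    """Assumed form of (27): 100 at T_min, 0 at T_max, linear in between, clipped."""
    frac = (T_max - T) / (T_max - T_min)
    return 100.0 * min(max(frac, 0.0), 1.0)


# Hypothetical usage: T_min = time_to_finish(S, v_ini, v_upper, a_max)
#                     T_max = time_to_finish(S, v_ini, v_lower, d_max)
```

Splitting into a ramp phase and a cruise phase keeps the bounds exact for the "accelerate or decelerate to the limited speed" scenario described in the text.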
Evaluating vehicle comfort is relatively complex. In this study, the test standard in [28] is adopted to evaluate passenger comfort when crossing the intersection. The standard uses the root-mean-square (RMS) value of the frequency-weighted acceleration to evaluate the effect of vibration on human comfort and health. The calculation proceeds as follows. Given the time-domain acceleration a(t), the weighted acceleration time series a_w(t) is obtained by passing a(t) through the filter defined by the frequency-weighting function w(f), as expressed in (31); the RMS value of the weighted acceleration is then calculated by (30).
where T denotes the duration of the vibration analysis. According to [28], the frequency-weighting function w(f) differs with the input point and the direction of vibration. Only the vibration caused by longitudinal acceleration is considered in this study, so the seat back is selected as the input point for the comfort analysis. The standard stipulates that the overall RMS value of the weighted acceleration combines the weighted values of all directions of the axis system, as shown in (32). Since lateral and vertical vibration are ignored in this study, the total weighted RMS acceleration simplifies to (33), where k_x = 0.8 is obtained from a look-up table. Table 1 presents the relationship between the total weighted RMS acceleration and the passenger comfort level specified in [28]. The six comfort levels, from "no discomfort" to "extremely uncomfortable", are scored as 100, 80, 60, 40, 20, and 0 points, respectively, and the driving comfort of the vehicle is evaluated according to the passenger comfort level in Table 1.
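For uniformly sampled data, the RMS in (30) reduces to the square root of the mean of the squared weighted samples. The sketch below assumes the input has already passed through the weighting filter of (31), which is not implemented here, and assumes that (32) takes the root-sum-of-squares form used in whole-body vibration standards such as ISO 2631-1. With the lateral and vertical factors set to zero, it reduces to (33) with k_x = 0.8.

```python
import math
from typing import Sequence


def weighted_rms(a_w: Sequence[float]) -> float:
    """Discrete counterpart of (30) for uniformly sampled a_w(t).

    With a constant sample interval, (1/T) * integral of a_w(t)^2 over [0, T]
    becomes the mean of the squared samples. a_w is assumed to be the output
    of the frequency-weighting filter (31), which is not implemented here.
    """
    if not a_w:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(x * x for x in a_w) / len(a_w))


def total_weighted_rms(a_wx: float, a_wy: float = 0.0, a_wz: float = 0.0,
                       k_x: float = 0.8, k_y: float = 0.0, k_z: float = 0.0) -> float:
    """Root-sum-of-squares combination assumed for (32).

    With lateral and vertical terms ignored (k_y = k_z = 0 here), this reduces
    to (33): a_v = k_x * a_wx, with k_x = 0.8 as stated in the text.
    """
    return math.sqrt((k_x * a_wx) ** 2 + (k_y * a_wy) ** 2 + (k_z * a_wz) ** 2)
```

The resulting a_v would then be looked up in Table 1 to obtain the comfort level; the level thresholds are not reproduced here.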
The overall comfort score is calculated by (34), in which S_C(i) is the comfort score of the i-th time period and S_C(max) is the full score. After the score of each evaluation index is calculated, the comprehensive score of the decision making for the straight vehicle is calculated by (35), where k_i denotes the weight coefficient (0.2 for every index in this study). The weight coefficients k_i can be adjusted to suit different scenarios. For instance, if passengers care more about comfort and safety than about passing time, k_5 and k_3 can be increased and k_4 reduced.

Table 1. Relationship between the RMS value of the total weighted acceleration and passenger comfort level in [28].
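The scoring chain can be sketched as below. The level-to-score mapping (100 to 0 in steps of 20) and the weights k_i = 0.2 come from the text; the averaging form used for (34) and the weighted-sum form used for (35) are assumptions. The index order follows the text (success rate, speed penalty, safety, traffic efficiency, comfort), which matches the example that k_3 weights safety, k_4 efficiency, and k_5 comfort.

```python
from typing import Sequence

# Table 1 levels ordered from "no discomfort" (index 0) to
# "extremely uncomfortable" (index 5), scored as stated in the text.
LEVEL_SCORES = (100.0, 80.0, 60.0, 40.0, 20.0, 0.0)


def level_score(level: int) -> float:
    """Score of one time period, given its Table 1 comfort level (0-5)."""
    if not 0 <= level < len(LEVEL_SCORES):
        raise ValueError("comfort level must be in 0..5")
    return LEVEL_SCORES[level]


def overall_comfort(period_scores: Sequence[float], full_score: float = 100.0) -> float:
    """Assumed form of (34): mean of S_C(i) as a percentage of S_C(max)."""
    return 100.0 * sum(period_scores) / (len(period_scores) * full_score)


def comprehensive(scores: Sequence[float],
                  weights: Sequence[float] = (0.2, 0.2, 0.2, 0.2, 0.2)) -> float:
    """Assumed form of (35): sum of k_i * S_i.

    Index order follows the text: success rate, speed penalty, safety,
    traffic efficiency, comfort (so k_3 = safety, k_4 = efficiency, k_5 = comfort).
    """
    if len(scores) != len(weights):
        raise ValueError("one weight per index is required")
    return sum(k * s for k, s in zip(weights, scores))
```

Passing a different `weights` tuple reproduces the scenario-specific reweighting described above, for example raising k_3 and k_5 while lowering k_4.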

Validation and Discussion
In this study, traffic simulations are conducted to verify the effectiveness of the proposed method. The RL agent is first trained with the proposed method under different intersection driving conditions. Simulations with the same initial speed are then performed to validate the decisions. Finally, the overall performance of the proposed method is evaluated comprehensively with the indices described in Section 3.3.

Simulation Validation
To verify the effectiveness of the proposed method, a number of cosimulation tests are carried out with Prescan and Matlab/Simulink. In the simulation, Prescan provides the intersection scene, and vehicle information is exchanged between Prescan and the Simulink control models via a virtual CAN bus. A schematic diagram of the cosimulation process is shown in Figure 13. Before the cosimulation, the road scenario model is constructed in Prescan, as shown in Figure 14. The parameter values for each road segment are provided in Table 2.
The trajectory of the turning vehicle in the cosimulation is constructed from the Open ITS dataset. Its speed curve is shown in Figure 15, the RL parameters are listed in Table 3, and the reward convergence curve during training is shown in Figure 16. The decision-making agent for the straight vehicle is obtained after RL training. Verification experiments are conducted on the trained agent under two working conditions to evaluate the control performance. To allow comparison across the working conditions, the initial speed of the agent is kept the same in both experiments. The speed range at the intersection is set from 0 to 8 m/s, and the initial speed of the vehicle in the verification experiments is set to an intermediate value of 5 m/s within this range.

The simulation results under working condition I are shown in Figure 17a-f. They show that the straight vehicle slows down and gives way when the turning vehicle reaches the confluence point first. However, rather than braking to a stop, the straight vehicle keeps driving at a speed slightly above the lower speed limit. After the turning vehicle passes the confluence point, the straight vehicle accelerates to follow it as quickly as possible. In Figure 17a, the turning vehicle stops at the terminal of the intersection because the end of its trajectory is set there. Figure 17b shows the distances of the two vehicles to the terminal. After passing the destination, the turning vehicle drives toward the end of the virtual scene, which explains why its speed curve first decreases and then increases toward the end. Moreover, Figure 17c,d shows no unreasonable strategy that would operate the throttle and brake pedal simultaneously, and the last two panels show that the acceleration and the distance between the two vehicles remain within their limits throughout the process.
Figure 18 shows the simulation animation of the straight vehicle passing through the intersection under working condition I. The yellow rectangle in the figure represents the straight vehicle, so the passing order of the two vehicles can be seen in the animation.
The simulation results under working condition II are shown in Figures 19a-f and 20. In working condition II, the speed of the turning vehicle is noticeably lower than in working condition I, and the straight vehicle chooses to pass the intersection first. The results show that the speed and acceleration of the intelligent vehicle under working condition II also remain within their limits.
As the above simulation results show, compared with the method proposed in [29], which uses sequential MDPs with standard or bipotential features, the proposed method produces a smoother speed curve when the vehicle passes through the intersection and thus improves comfort. Because a lower speed limit is set and the expected speed is used to guide the agent, stopping at the intersection is effectively avoided, and the stop time at the intersection is noticeably reduced compared with the results presented in [29], thereby improving the traffic efficiency of the intersection.

Effect Evaluation
The above analysis shows that the proposed controller makes appropriate decisions for the straight vehicle: the trained agent chooses a suitable moment to pass the intersection when encountering different turning vehicles. Because the simulation durations of working conditions I and II differ, the acceleration data from 0 to 10 s are used to calculate the RMS value of the weighted acceleration for comparison. Because the acceleration fluctuates, a single RMS value computed over the entire record would obscure local fluctuations and make the comfort evaluation unreliable. The RMS of the weighted acceleration is therefore calculated separately for ten 1 s intervals from 0 to 10 s. The results for the two working conditions are listed in Tables 4 and 5, respectively, and are scored according to the standard established in Section 3.3. They show that passenger comfort in the straight vehicle is worse at the beginning, since some acceleration or deceleration is unavoidable in order to go ahead or give way; after this stage, passenger comfort improves during the rest of the confluence process.

Table 6 shows the comprehensive score and the individual index scores of the proposed controller under the two conditions. For success rate and speed penalty, the straight vehicle reaches the end point within the speed limits in both conditions, so both items score 100 out of 100. For safety, under working condition I the minimum distance between the two vehicles stays longer than, but close to, the optimal interval, whereas under working condition II it falls below the optimal interval; therefore, the safety score under condition II is lower than that under condition I.
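The windowed comfort calculation described above can be sketched as follows: the 0 to 10 s record is split into ten equal windows and an RMS value is computed for each. The input is assumed to be frequency-weighted longitudinal acceleration sampled uniformly at `fs` Hz from t = 0; the function name and parameters are hypothetical.

```python
import math
from typing import List, Sequence


def windowed_rms(a_w: Sequence[float], fs: float,
                 t_end: float = 10.0, n_windows: int = 10) -> List[float]:
    """RMS of the weighted acceleration in n_windows equal windows over [0, t_end).

    a_w: frequency-weighted longitudinal acceleration sampled at fs [Hz],
    starting at t = 0. With the defaults this gives ten 1 s windows.
    """
    n_samples = int(round(t_end * fs))
    if len(a_w) < n_samples:
        raise ValueError("record is shorter than t_end")
    per_window = n_samples // n_windows
    if per_window == 0:
        raise ValueError("sampling rate too low for the requested windows")
    result = []
    for i in range(n_windows):
        seg = a_w[i * per_window:(i + 1) * per_window]
        result.append(math.sqrt(sum(x * x for x in seg) / len(seg)))
    return result
```

Each window's value would then be multiplied by k_x, mapped to a Table 1 comfort level and its score, and combined through (34); computing per-window values keeps a short acceleration burst from being averaged away over the whole record.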
In terms of traffic efficiency, under condition II the straight vehicle accelerates according to the predicted speed of the turning vehicle and passes through the intersection ahead of it, which reduces the passage time; it therefore obtains a higher traffic-efficiency score, at the expense of comfort. Overall, both working conditions achieve good comprehensive scores, indicating that the proposed decision-and-control strategy can guide the vehicle through the intersection safely and efficiently under different driving conditions.

Conclusions
To solve the decision-making and control problem for straight and turning vehicles at intersections, a decision-and-control method based on RL and vehicle-speed prediction is proposed. A road-geometry model of the intersection is built, and the distributions of speed and acceleration, together with the main factors that influence the decision process, are analyzed using an open-source confluence-trajectory dataset for intersections. The ARIMA method is used to predict the speed of the turning vehicle over a future horizon, and a decision-making method for intersections based on RL and ARIMA is proposed. Cosimulations of the established intersection scene validate the effectiveness of the proposed algorithm: the trained RL agent makes appropriate decisions to pass the intersection safely and efficiently under two working conditions. Finally, the performance of the proposed method is evaluated on five indices using the proposed evaluation standard. The results show that under different working conditions, the proposed method performs well on all indices and on the comprehensive score.
However, this study assumes that sensing of the surroundings is precise and that the vehicle can receive the relevant traffic information. The influence of errors introduced by the perception layer should be addressed in future work. In addition, treating multiple vehicles as agents, the interactions and games between vehicles need further investigation.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.