A Review of Research on Intersection Control Based on Connected Vehicles and Data-Driven Intelligent Approaches

Abstract: Benefiting from the application of vehicle communication networks and new technologies, such as connected vehicles, video monitoring, automated vehicles and vehicle-road collaboration, traffic network data can be observed in real time. Applied in the field of traffic control, these technologies can provide high-quality input data and support a more comprehensive evaluation of the effectiveness of traffic control. However, most of the control theories and strategies adopted by adaptive control systems cannot effectively use these real-time, high-precision data, so intersection control theory needs to be developed further. This paper reviews intersection control strategies from many perspectives, including intelligent data-driven control, conventional timing control, actuated control and model-based traffic control. There are four main directions for intersection control in the connected vehicle environment: (1) data-driven reinforcement learning control; (2) adaptive performance optimization control; (3) traffic control based on the connected vehicle (CV) environment; and (4) multiple intersection control based on the CV environment. The review gives a clear view of data-driven intelligent control theory and its application to intelligent transportation systems.


Introduction
With socio-economic development and accelerating urbanization, existing urban roads cannot cope with the increasing traffic flow, which puts tremendous pressure on urban traffic management departments. Urban intersections show a high incidence of congestion, which has a huge impact on traffic efficiency. Efforts to relieve traffic congestion should therefore start with the control of intersections, so as to improve the efficiency of road traffic [1].
Over the past few decades, traffic signal control has developed from the initial fixed-time control, to actuated control, to adaptive control with real-time adjustment capability [2]. Fixed-time control calculates the signal timing from historical flow data, so its timing does not adapt to changing traffic demand [3]. Actuated control is more flexible, but the sensors it relies on are expensive to maintain [4], and it needs a larger set of predefined static parameters to change the timings [5].
used as a powerful motion detector [14], to collect road conditions and detect dangerous obstacles to reduce traffic accidents [15]. The intelligent transportation system (ITS) is a comprehensive, real-time, accurate and efficient transportation and management system. Relying on communication technology, connected vehicles can now also serve as sensors that collect high-precision status information, contributing greatly to intersection control [16]. Extensive research on autonomous driving has also given fresh momentum to the deployment of ITS [17]. As the next generation of mobile networks, 5G can disseminate larger volumes of data between vehicles and the cloud in a short time, which can significantly increase ITS performance and helps to capture the dynamic behavior of the ITS [18].
The structure of the intelligent transportation system is shown in Figure 1. Part 1 is the collection of traffic information; the vehicle status can be obtained from the network data of the CV or from loop detectors. Part 2 senses and predicts the traffic state, for example the real-time queue length. Part 3 is traffic system control, which combines various strategies and ultimately produces the signal timing. Part 4 is high-level traffic monitoring and management, which has the highest authority and can obtain all the information. Parts 2 and 3 together are called the transport system model (TSM), which realizes sensing, prediction and control. This article likewise reviews adaptive signal control at intersections from the perspectives of sensing, prediction and control.
With the advancement of communication technology, the data that can be applied to traffic control is constantly being enriched. However, traditional adaptive control strategies struggle to process large amounts of data, which calls for adaptive control algorithms and strategies suited to this data-rich environment.
The traffic control methods reviewed in this paper are suited to urban intersections where the signal lights and vehicles are capable of network communication. Research on advanced intersection control methods is also useful for increasing traffic efficiency in crowded urban traffic. This article analyzes the technical differences and features of three kinds of traffic control, namely model-based traffic control, intelligent computing traffic control and data-driven traffic control, and discusses the key issues and technology development trends of these adaptive control modeling methods in data-rich traffic environments. The second section introduces signal sensing at intersections, the third section introduces traditional adaptive control methods, the fourth section introduces the development trends of adaptive signal control methods in the CV environment, and the fifth section summarizes them. An overview of the related literature is shown in Figure 2.

Data Perception for Intersection
V2X technology turns the connected vehicle into a "mobile sensor" that can collect traffic information [19]. The management of the Internet of Vehicles has become a focus of traffic management. How to accurately obtain traffic information from the data of a small number of connected vehicles is a central question for researchers working on vehicle network data processing and its applications [20]. Zheng et al. used GPS trajectory data from a small number of connected cars to predict traffic volume at an intersection. Their model assumes that vehicle arrivals follow a Poisson distribution and formulates a maximum likelihood problem whose solution gives the predicted traffic volume [21]. Wang et al. proposed a new topology framework based on connected vehicle data. The framework is used to model the road network and to represent how traffic flow propagates, and a regression neural network is designed as an online predictor that learns the traffic transmission patterns in the road network [22]. Connected car data is widely used because it comes from various sources and provides a variety of traffic information.
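The idea of estimating volume from sparse connected-vehicle data under a Poisson arrival assumption can be illustrated with a deliberately simplified sketch. It is not the model of Zheng et al.: it assumes that the penetration rate is known and that each vehicle is connected independently, so the observed CV counts per cycle are themselves Poisson with a thinned rate. The function names and the sample counts are hypothetical.

```python
from math import lgamma, log


def estimate_arrival_rate(cv_counts, penetration_rate):
    """Maximum likelihood estimate of total arrivals per cycle.

    Assumption (not from the source): total arrivals per cycle are
    Poisson(lam) and each vehicle is connected independently with a known
    probability, so CV counts are Poisson(penetration_rate * lam).
    """
    if not cv_counts:
        raise ValueError("need at least one observed cycle")
    if not 0.0 < penetration_rate <= 1.0:
        raise ValueError("penetration_rate must be in (0, 1]")
    mean_cv = sum(cv_counts) / len(cv_counts)
    return mean_cv / penetration_rate


def log_likelihood(lam, cv_counts, penetration_rate):
    """Poisson log-likelihood of the observed CV counts for a candidate lam."""
    mu = penetration_rate * lam
    return sum(k * log(mu) - mu - lgamma(k + 1) for k in cv_counts)


# Hypothetical CV counts observed over eight signal cycles.
counts = [1, 0, 2, 1, 3, 0, 1, 2]
lam_hat = estimate_arrival_rate(counts, penetration_rate=0.1)
```

With a low penetration rate the estimate is noisy, which is why the amount of data needed to reach a given accuracy is a research question in its own right.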
Regular vehicles (RV), connected vehicles (CV) and automated vehicles (AV) together compose the traffic flow, which changes the traffic composition of the urban road network. In addition to private cars, there will be buses in the traffic, which makes the traffic streams more complicated, as shown in Figure 3. The information network of connected and automated vehicles affects driving behavior and demand, so a multi-modal traffic flow control theory is urgently needed [23]. Adaptive flow control is used to meet the needs of real-time traffic. The adaptive traffic control system with V2X
Today, it is not uncommon for transportation infrastructure and vehicles to be networked and to provide useful information. The rapid development of the Internet of Things has greatly improved the framework [32]. Many researchers have focused on the use of the Internet of Things in ITS, as well as on traffic information collection and dissemination, and traffic flow management and monitoring [33]. Figure 4 shows an implementation of 4G/5G communication technology that connects intelligent cameras, loops and connected vehicles to the Internet of Things. The intelligent cameras, connected vehicles and loops sense the traffic state and transmit these messages over the 4G/5G network. The features are processed on the cloud server, and finally information perception and prediction are performed by algorithms on the MATLAB platform.
The reliability of V2X communication is clearly the key factor in realizing signal perception in the CV environment. Meng et al. pointed out that the loss of vehicle information and location errors would result in signal timing failure and large delays [34]. Robust optimization is a common method for handling uncertainty in traffic control [35]. Tong et al. proposed a stochastic programming model to schedule adaptive signal control, which minimizes the expected delay of vehicles, greatly reduces the computing cost, and makes better use of updated traffic information [36]. Li et al. established a predictable interference control scheme for the V2V network to ensure the reliability of communication. Although this method focuses on vehicle-to-vehicle communication, it can also be extended to V2I communication [37]. Filipe et al. developed a robust deep reinforcement learning (DRL) algorithm by pre-training for emergencies, such as communication interruption, and then transferring the learned parameters to the target scenario. The results showed that this method performed best among the methods compared [38].
However, the role of the algorithm is relatively limited; ensuring robust V2X communication is the fundamental way to solve this problem. By combining the advantages of dedicated short-range communication (DSRC) and cellular networks, researchers use cellular networks (4G/5G) as a backup data channel for the vehicle, which overcomes the limited communication capability of a single V2X technology and makes the communication more robust [39,40]. The emergence of 5G+DSRC V2X communication technology makes higher-quality communication possible.
Various traffic states, such as traffic flow, traffic congestion and queue length at intersections, are collected, extracted and integrated through the interconnected transportation infrastructure and vehicles. This information offers broad prospects for improving traffic control. In this regard, emerging connected vehicles and wireless technologies are undoubtedly a watershed, making the exchange of transportation information more convenient. With the development of equipment technology, the performance and computing power of transportation infrastructure controllers have changed substantially. In addition, the powerful computing capabilities of cloud and edge computing provide new possibilities for improving traffic signal perception and prediction [41,42], and make it possible to build a smart city framework that combines the intelligent Internet of Things (IoT), cloud, edge and 5G to provide more efficient and reasonable services [43].

Conventional Adaptive Control Methods of Intersections
Many means of traffic control go beyond fixed-time control and actuated control. Adaptive control regards a single intersection as an uncertain system and uses real-time traffic status updates as the values of its input parameters. The control unit calculates the values of the output parameters so that the signal control system reaches an optimal state, which makes up for the shortcomings of the first two control methods. At an intersection, signal control is judged by whether the signal timing responds to feedback quickly and accurately, and whether a signal control scheme can be provided in a timely manner. Adaptive control of urban traffic signals is an extremely complex and difficult control problem, from both a theoretical and a practical perspective [1].
Based on the previous literature, traditional adaptive signal control can be summarized into three categories, as shown in Figure 5: model-based traffic control (MBC), intelligent computing and data-driven control. These can also be blended with each other to form hybrid algorithms for controlling traffic lights, and they have contributed substantially to traffic control in the past and present.
Figure 5. Overview of the conventional adaptive control methods.

Introduction to Adaptive Signal Control System
The development of adaptive signal control can be roughly divided into the following six stages, as shown in Table 1. There are many ways to classify urban traffic signal control methods and structures. According to the control strategy, urban traffic control methods are classified as follows:
(1) Offline optimization method. The typical representative of this method is the TRANSYT system. TRANSYT is a simulation/optimization model and serves as an informal international standard. The timing scheme is based on the historical data of the transportation network, and mainly uses computer modeling, optimization and simulation techniques. The objective function of this method mainly uses the number of stops and the delay time as indicators, and a hill-climbing method is used to optimize the offset and green time [44].
(2) Online plan selection method. This method is typically represented by the SCATS (Sydney coordinated adaptive traffic system) system. It uses offline optimization to prepare several timing schemes corresponding to different traffic flows on the road network, and selects among them online using a degree-of-saturation measure and the comprehensive traffic flow.
(3) Online scheme generation method. The typical representative of this method is the SCOOT (split, cycle and offset optimization technique) system. It consists of a traffic prediction model and timing parameter optimization. The traffic model is processed online, and directly calculates a series of parameters and operating indicators based on real-time feedback concerning traffic conditions on the road network.
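The hill-climbing search mentioned for the offline optimization method can be sketched on a toy problem. The sketch below is not TRANSYT's traffic model: it assumes a two-phase intersection with uniform arrivals, uses the uniform-delay term of Webster's formula together with the corresponding stop fraction, and searches only over the green split. All parameter names and values are hypothetical.

```python
def performance_index(g1, cycle, lost_time, flows, sat_flows, stop_weight):
    """Weighted sum of delay and stops for a two-phase signal (toy model).

    g1 is the effective green of phase 1; phase 2 receives the remainder.
    Uniform arrivals and undersaturated operation are assumed.
    """
    greens = (g1, cycle - lost_time - g1)
    total = 0.0
    for g, q, s in zip(greens, flows, sat_flows):
        lam = g / cycle          # green ratio
        y = q / s                # flow ratio
        if lam <= y:             # oversaturated: outside the toy model
            return float("inf")
        delay = cycle * (1 - lam) ** 2 / (2 * (1 - y))   # s per vehicle
        stops = (1 - lam) / (1 - y)                      # fraction stopped
        total += q * (delay + stop_weight * stops)
    return total


def hill_climb(g1, step, **model):
    """Move g1 by +/- step while the performance index keeps improving."""
    best = performance_index(g1, **model)
    upper = model["cycle"] - model["lost_time"]
    while True:
        candidates = [g for g in (g1 - step, g1 + step) if 0 < g < upper]
        if not candidates:
            return g1, best
        score, g_new = min((performance_index(g, **model), g) for g in candidates)
        if score >= best:
            return g1, best      # no neighbour is better: local optimum
        g1, best = g_new, score


model = dict(cycle=90.0, lost_time=10.0, flows=(600.0, 450.0),
             sat_flows=(1800.0, 1800.0), stop_weight=4.0)
g_opt, pi_opt = hill_climb(40.0, 1.0, **model)
```

Because the search only ever accepts improving neighbours, it stops at the first local optimum it reaches. That is harmless for this convex toy problem but is a known limitation on real networks, where the starting plan matters.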
According to the control structure, urban traffic control systems are classified as follows:
(1) Centralized traffic signal control system. All signals in the control area are connected, and the entire system is controlled centrally by the control center. The centralized control system has only one control center, and each intersection is directly connected to it, forming a star topology. In this structure, each intersection executes the control strategy formulated by the control center and, at the same time, transmits its own traffic information to the control center in real time; the control center adjusts the control scheme based on the traffic information of each intersection [45,46]. Centralized control places greater demands on the control center system. Errors in the central system will paralyze the entire system, so the robustness of the centralized system is poor. The SCOOT system is a centralized traffic signal control system [47].
(2) Distributed traffic signal control system. The distributed control system is mainly composed of three levels: the general control center, the sub-control center and the intersection. The general control center is responsible for the overall scheduling of the system, the coordination of the tasks of the sub-control centers and the handling of global affairs. It has the highest control ability and priority. The sub-control center is responsible for formulating signal management strategies and other functions for the intersections of its area or main road, and the intersections are responsible for tasks such as collecting traffic information at their locations and implementing control strategies [48]. The distributed control structure improves the reliability of the system. The existing control system SCATS (Sydney coordinated adaptive traffic system) has a distributed control structure [49].
This paper systematically reviews the application of adaptive control methods to urban road traffic signals from three perspectives: model-based control (MBC) methods, intelligent computing control methods and data-driven control methods. Against the background of the evolution of traffic control systems, the key issues and development trends of the three adaptive control methods applied to intersection traffic control are discussed [23].
As shown in Table 1, which classifies earlier adaptive traffic control systems, these systems continue to iterate and their key technologies continue to advance, but current traffic control systems can no longer meet the needs of the changing traffic environment. With the development of V2X, autonomous driving, CV, and vehicle-road collaboration, the resolution and accuracy of vehicle flow data are constantly improving. New adaptive traffic control systems are needed to keep pace with these developments, and the development of adaptive control theory is the top priority.

Model-Based Traffic Control
The MBC method follows feedforward control theory and adopts adaptive control and optimal control methods and technologies. It establishes a multi-objective timing model of network-level traffic signals and uses an online optimization module to calculate the optimal control scheme (cycle length, green split and offset), so that the control parameters of the intersection signal are updated in time and can respond to changes in traffic demand at the intersection. According to the control objective, MBC methods can be roughly divided into two types, namely the comprehensive performance indicators method and the green wave band method [23].
Comprehensive performance indicators jointly consider delays, the number of stops, queue lengths and throughput in order to obtain the best overall network efficiency [50]. Most current adaptive traffic signal control methods use centralized, coordinated control based on comprehensive performance indicators. The mainstream commercial traffic control systems, such as SCOOT, UTOPIA and MOTION, and timing software such as TRANSYT and SYNCHRO, have evolved on the basis of MBC methods. The optimization and generation of the benchmark coordination schemes of these systems follow control logic similar to that of TRANSYT, except that the form of the specific timing model and the optimization method differ [51].
The green wave band method reduces the delay of vehicles traveling in the coordinated direction by widening the green waves along the arterial corridor, which greatly improves the traffic efficiency of the mainline. Common green band-based mainline coordination control methods include the maximum green-band method MAXBAND and the variable-bandwidth multi-green-band method MULTIBAND [52]. The green wave method does not directly take delays, the number of stops or queue lengths as control targets. Instead, by maximizing the time window in which vehicles can pass continuously without stopping, it effectively reduces the travel time and the delay of vehicles in the coordinated direction. However, the green wave method aims to find the maximum green wave band for a mainline platoon that runs continuously. The common cycle length is usually long, which can easily cause problems, such as an overly long cycle being imposed on non-critical intersections [53].
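The basic relationship behind a green wave can be shown with a minimal sketch: for one-way progression at a constant design speed, each signal's offset is the cumulative travel time from the first signal, taken modulo the common cycle length. This is only the ideal one-way case; MAXBAND and MULTIBAND solve a two-way bandwidth optimization that is not reproduced here. The function name and the numbers are hypothetical.

```python
def progression_offsets(distances_m, speed_mps, cycle_s):
    """Ideal one-way green-wave offsets along an arterial.

    distances_m[i] is the spacing between signal i and signal i + 1.
    The first signal is the reference with offset 0.
    """
    offsets = [0.0]
    travel_time = 0.0
    for distance in distances_m:
        travel_time += distance / speed_mps
        offsets.append(travel_time % cycle_s)
    return offsets


# Four signals spaced 300 m, 450 m and 600 m apart, 15 m/s design speed,
# 80 s common cycle.
offsets = progression_offsets([300, 450, 600], speed_mps=15.0, cycle_s=80)
```

The modulo operation is where the common cycle constraint enters: every intersection on the corridor must run the same cycle length for these offsets to repeat from cycle to cycle.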
Obviously, it is very difficult to establish an accurate model, and the robustness of the MBC method is difficult to guarantee [54].

Traffic Control Based on Intelligent Computing
Artificial intelligence technology has given computers the ability to simulate human reasoning and learning processes, and to learn optimization strategies autonomously during the interaction between traffic controllers and traffic environments. It has strong nonlinear approximation and learning capabilities. In the past 10 years, traffic control methods based on intelligent computing, which use multi-agent modeling technology and do not require precise mathematical models, have attracted the attention of many scholars. Among them, fuzzy logic, neural networks and group intelligence algorithms dominate [55].

Fuzzy Logic
Fuzzy logic (FL) encodes the experience of a traffic manager or expert in signal control as if-then fuzzy rules and uses fuzzy inference and reasoning to make decisions; it is an effective nonlinear tool for handling uncertainty [56]. Applying fuzzy control to urban traffic signal control is an effective way to address urban traffic problems. As early as 1977, Pappis used fuzzy control in traffic control, and established a fuzzy control rule base or expert system for various traffic conditions, achieving great results [57]. Since then, in order to improve the ability of fuzzy controllers to solve actual traffic problems, multi-level fuzzy structure models, such as two-level and three-level fuzzy models, have been proposed and extended from single intersections to regional traffic control. At the same time, hybrid intelligent control methods that combine fuzzy logic with group intelligence or neural networks have been studied by many scholars to improve the learning ability of fuzzy logic itself [58]. Research shows that applying fuzzy control to traffic signal control works well, but the structure and parameters of manually designed fuzzy controllers are affected by individual subjectivity. Optimizing the structure and parameters of the fuzzy controller online is therefore the current research focus for fuzzy control applied to urban traffic signal control [6].
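A minimal sketch of a rule-based fuzzy signal controller is given below. It is not the controller of any cited work: the membership functions, the four rules and the output values are hypothetical, and a zero-order Sugeno-style weighted average is assumed for defuzzification. The inputs are the number of vehicles approaching on the green phase and the queue waiting on the red phase.

```python
def falling(x, lo, hi):
    """Membership that is 1 below lo, 0 above hi and linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def green_extension(arrivals, queue):
    """Green extension in seconds from two fuzzy inputs (illustrative rules)."""
    a_few = falling(arrivals, 0.0, 10.0)
    a_many = 1.0 - a_few
    q_few = falling(queue, 0.0, 10.0)
    q_many = 1.0 - q_few
    # Each rule: (firing strength using min as fuzzy AND, crisp output in s).
    rules = [
        (min(a_many, q_few), 10.0),   # many arrivals, short queue: extend long
        (min(a_many, q_many), 5.0),   # many arrivals, long queue: extend a bit
        (min(a_few, q_few), 3.0),     # few arrivals, short queue: short extension
        (min(a_few, q_many), 0.0),    # few arrivals, long queue: end the green
    ]
    total_weight = sum(weight for weight, _ in rules)
    return sum(weight * out for weight, out in rules) / total_weight
```

For example, `green_extension(20, 0)` fires only the first rule, while intermediate inputs blend several rules. The subjectivity noted in the text shows up here directly: every breakpoint and output value was chosen by hand.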

Neural Network
An artificial neural network (ANN) is a self-learning system that has been shown, in theory, to have strong approximation ability and self-learning capability for nonlinear mappings, and it is widely used in traffic pattern recognition, adaptive control and other fields [23]. Nakatsuji used a fuzzy neural network in a forward dynamic programming algorithm. Fuzzy control is also used in the PRODYN real-time control system [59].
Srinivasan et al. adopted a multi-agent architecture to model the traffic signal control problem as a distributed unsupervised response model. In this architecture, the signal light at each intersection is an agent, which is approximated using a fuzzy neural network. Simulation results show that the average vehicle delay is reduced by 78% [60]. In addition, Srinivasan et al. used each agent to handle an intersection, which thus constitutes a single-layer distributed multi-agent traffic signal control system. Through the cooperation between the agents, the delay is effectively reduced by 35.6% [61].
Traffic situations are very complicated and traffic systems have strong nonlinear characteristics; collecting state data is difficult and time-consuming, which results in slow convergence of the system. In addition, the setting of the initial training weights has a great influence on the training speed and performance of the neural network [23].

Group Intelligence
Group intelligence algorithms, such as genetic algorithms (GA), ant colony optimization (ACO) and particle swarm optimization (PSO) simulate the social behavior of living things [62], as they use the group search strategy and the information exchange between individuals in the group to carry out global random searches and parallel optimizations. The search process does not depend on the gradient information of the object, and has been widely used in traffic control [63].
Zhao considered lane restructuring strategies (for example, reversible lanes, one-way streets, turn restrictions and cross-elimination) for enhancing the capacity of the transport network, and established a mathematical model of the network traffic equilibrium. In this model, lane reorganization optimization and traffic control strategies are integrated in a single structure, and a genetic algorithm (GA) is used to obtain the optimal solution [64]. Li et al. proposed a hybrid algorithm based on simulated annealing (SA) and a GA to optimize arterial signal timing. The method is effective and shows that the optimal green time should be proportional to the number of critical lanes in each phase. Various signal optimization models can use this algorithm, and it is well suited to optimizing the phase sequence of oversaturated intersections. Compared with SA or GA alone, the SA-GA algorithm has advantages in solution quality and convergence speed [65].
With regard to complex multi-objective optimization control, traditional group intelligence algorithms may face issues such as becoming trapped in local optima [66]. In practical applications, simulated annealing, golden section local search operators, or a combination of multiple algorithms are often used to build a group intelligence algorithm with a mixed structure, to improve the local search capabilities of traditional group algorithms such as GA [62]. However, due to the limitations of optimization efficiency, hybrid group intelligence algorithms are only suitable for offline optimal adaptive control.
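How a GA searches over signal timings can be illustrated with a small sketch. It does not reproduce any of the cited algorithms: it assumes a single four-phase intersection, Webster's two-term delay formula as the fitness function, a blend crossover and a mutation that shifts green time between two phases so that the total green stays fixed. All constants are hypothetical.

```python
import random

CYCLE = 100.0                             # s
LOST_TIME = 12.0                          # s per cycle
FLOWS = [350.0, 200.0, 300.0, 150.0]      # veh/h per phase (hypothetical)
SAT_FLOWS = [1800.0] * 4                  # veh/h
MIN_GREEN = 7.0                           # s


def total_delay(greens):
    """Total delay rate (veh*s per s) from Webster's two-term formula."""
    total = 0.0
    for g, q, s in zip(greens, FLOWS, SAT_FLOWS):
        if g < MIN_GREEN:
            return float("inf")
        lam = g / CYCLE                   # green ratio
        x = q / (lam * s)                 # degree of saturation
        if x >= 1.0:
            return float("inf")
        q_s = q / 3600.0                  # veh/s
        uniform = CYCLE * (1 - lam) ** 2 / (2 * (1 - lam * x))
        overflow = x ** 2 / (2 * q_s * (1 - x))
        total += q_s * (uniform + overflow)
    return total


def mutate(greens, rng, max_shift=4.0):
    """Move a random amount of green from one phase to another."""
    child = list(greens)
    i, j = rng.sample(range(len(child)), 2)
    shift = rng.uniform(0.0, max_shift)
    child[i] += shift
    child[j] -= shift
    return child


def crossover(a, b, rng):
    """Blend two parents; a convex combination keeps the total green fixed."""
    w = rng.random()
    return [w * x + (1 - w) * y for x, y in zip(a, b)]


def run_ga(generations=60, pop_size=30, seed=0):
    rng = random.Random(seed)
    equal = [(CYCLE - LOST_TIME) / len(FLOWS)] * len(FLOWS)
    population = [equal] + [mutate(equal, rng, 10.0) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=total_delay)
        elite = population[: pop_size // 5]       # elitism: keep the best plans
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = elite + children
    return min(population, key=total_delay)


best_plan = run_ga()
```

Because the elite is carried over unchanged, the best plan never gets worse from one generation to the next. The repeated fitness evaluations are what make this kind of search expensive when the fitness function is a network simulation rather than a closed-form delay formula.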

Data-Driven Traffic Control
Data-driven control (DDC), also known as model-free control, designs the controller directly from the measured input/output (I/O) data of the controlled object, without a mathematical model of the environment. Iterative learning control (ILC) is an important application of DDC, and the idea of ILC has been widely used in reinforcement learning (RL) and adaptive dynamic programming (ADP) algorithms. It has been successfully applied to single-point, arterial and regional traffic signal control [67].
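The ILC idea can be made concrete with a minimal sketch of a P-type learning law, u_{k+1}(t) = u_k(t) + L e_k(t+1), which corrects the input of the next repetition using only the measured error of the current one. The plant below is a hypothetical first-order system used purely to generate I/O data; it is not a traffic model, and the gain is chosen so that |1 - L b| < 1, the usual convergence condition for this law.

```python
def run_trial(u, a=0.5, b=1.0):
    """One repetition of a hypothetical plant y(t+1) = a*y(t) + b*u(t), y(0) = 0."""
    y = [0.0]
    for u_t in u:
        y.append(a * y[-1] + b * u_t)
    return y


def ilc(reference, trials=50, gain=0.8):
    """P-type ILC: the update uses measured errors only, not the plant model."""
    horizon = len(reference) - 1
    u = [0.0] * horizon
    history = []                      # max tracking error of each trial
    for _ in range(trials):
        y = run_trial(u)
        error = [r - y_t for r, y_t in zip(reference, y)]
        history.append(max(abs(e) for e in error[1:]))
        u = [u_t + gain * error[t + 1] for t, u_t in enumerate(u)]
    return u, history


reference = [0.0] + [1.0] * 10        # desired output over one repetition
u_learned, history = ilc(reference)
```

In a traffic setting the repetition would typically be the same period on successive days, which is what makes the daily recurrence of traffic patterns useful to this family of methods.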

Reinforcement Learning
Reinforcement learning (RL), as a machine learning algorithm, adopts the method of learning while acquiring examples and updating the model, and guides the next action with the current model. It then updates the model according to the feedback after the next action is performed, and iterates continuously until the model converges. The combination of the strong adaptability of RL and the highly time-variable traffic flow is of great significance for the realization of urban intelligent signal management [68].
The RL framework is shown in Figure 6. An environment is first extracted from the task to be completed, and the state, action and reward for performing the action are then abstracted from it. In signal control, the intelligent signal light is an agent that obtains the state from the intersection and performs a signal timing action, which moves the intersection from one state to another. If the agent's signal timing effectively reduces traffic congestion, it is rewarded; otherwise, it is not. Liang et al. used the collected traffic data to divide the entire intersection into multiple grids, thereby categorizing complex traffic scenarios into multiple states. By defining the position and speed of the vehicles at the intersection as the state, the green light duration of each phase as the action, and the change in cumulative waiting time between adjacent cycles as the reward signal, the signal optimization problem is expressed as a Markov decision process. A convolutional neural network is used to solve the model, and multiple optimization elements, such as a dueling network, a target network, a double Q-learning network and prioritized experience replay, are integrated to improve the control performance [70].
Erwin et al. expressed the traffic flow optimization problem as a Markov decision problem, using a Q-learning algorithm to learn the speed distribution law of high-speed sections, and determined the maximum allowable speed to alleviate the traffic congestion problem at the entrance and exit of ramps [71].
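The state, action and reward loop described above can be sketched with tabular Q-learning on a toy two-approach intersection. This is a deliberately simplified illustration rather than the method of Liang et al. or Erwin et al.: the environment, its arrival probabilities, the state encoding (two queue lengths plus the current phase) and the reward (negative total queue) are all assumptions made for the example.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)      # 0 = keep the current phase, 1 = switch phase
MAX_QUEUE = 10        # queues are capped to keep the state space small


def step(state, action, rng, arrival_prob=(0.3, 0.2)):
    """Toy environment: state = (queue_0, queue_1, phase with green)."""
    queues, phase = [state[0], state[1]], state[2]
    if action == 1:
        phase = 1 - phase              # switching costs one step of discharge
    elif queues[phase] > 0:
        queues[phase] -= 1             # serve one vehicle on the green approach
    for i, p in enumerate(arrival_prob):
        if rng.random() < p:
            queues[i] = min(MAX_QUEUE, queues[i] + 1)
    reward = -sum(queues)              # fewer waiting vehicles is better
    return (queues[0], queues[1], phase), reward


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """Standard Q-learning update toward reward + gamma * max_a Q(s', a)."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


def train(steps=20000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0, 0.0])
    state = (0, 0, 0)
    for _ in range(steps):
        if rng.random() < epsilon:     # epsilon-greedy exploration
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[state][a])
        next_state, reward = step(state, action, rng)
        q_update(q, state, action, reward, next_state)
        state = next_state
    return q


q_table = train()
```

The table grows with every distinct state that is visited, which is the scalability limit that motivates replacing the table with a neural network in deep reinforcement learning.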

Adaptive Dynamic Programming
ADP is a near-optimal method emerging in the field of optimal control. It draws on RL, ANN, FL, etc., and can provide many solutions and specific technical methods for solving nonlinear system optimization problems. ADP is an interdisciplinary field arising from the integration of artificial neural networks, optimal control and RL. It can also be considered as an extension of continuous learning in discrete fields, and it has been described as a modern version of reinforcement learning. ADP is widely used in various complex control fields [72].
Dynamic programming algorithms are computationally intractable [3]. Compared with dynamic programming, adaptive dynamic programming uses a combination of offline and online training methods, which ensures that the system parameters can quickly and accurately change with the real state and can make the system robust [73].
In order to enable the signal controller to respond in real time, Cai et al. proposed a control algorithm based on adaptive dynamic programming (ADP). The algorithm combines dynamic programming with reinforcement learning to approximate the solution, which improves the computing efficiency. In addition, simulation results show that the algorithm can dynamically allocate green time, automatically adjust control parameters and quickly respond with new signal timings. Compared with the fixed timing method, this algorithm substantially reduces vehicle delays [74].
Cai et al. also proposed an adaptive signal control method that uses adaptive dynamic programming to give the traffic controller the ability to learn continuously from its own performance. In this way, the remaining travel time of a vehicle approaching the stop line of the intersection can be predicted. This method has achieved good results in dynamic control and in optimal control performance [75].

Development Trends of Adaptive Signal Control for Intersections in Future Traffic Environments
Current adaptive control methods, such as MBC, intelligent computing methods and data-driven methods, can only be applied to small-scale and linear control systems. In the current era of the Internet of Everything, new control methods are needed to break through this limitation. Based on the literature of the last 10 years, current research on vehicle-intersection control can be divided into the following categories: data-driven RL control of intersections; adaptive control of intersections based on performance optimization; intersection control in the connected and automated vehicle environment; and arterial and network control in the connected and autonomous vehicle environment [68,76]. Table 2 summarizes the main technical characteristics of the traditional adaptive control methods and the current research hotspots in adaptive signal control.

Data-Driven RL Control
The rapid development of new technologies, such as video information, vehicle detection, CV, and autonomous driving, makes the information collected from traffic data more abundant. RL is a model-free learning algorithm, which is very suitable for regional traffic control. The use of RL for traffic signal control has become very widespread, and it has been used for single-point, arterial and regional signal control [77]. There is no doubt that transportation systems will become more intelligent with machine learning, big data, and excellent computing resources. The cost and difficulty of data storage and analysis will become lower because of the collection of multi-source transportation data [78].
Although traditional reinforcement learning performs well with simple models, such as piecewise-constant tables and linear regression, its scalability and optimality are limited in realistic settings. In recent years, combining deep neural networks (DNN) with reinforcement learning has become an effective way to improve the performance of reinforcement learning on complex problems [79]. Given today's rich data, it is feasible to use a DNN to extract useful features from the raw traffic information, formulate signal control as a reinforcement learning problem, and then use reinforcement learning to learn the optimal traffic signal strategy [80]. Figure 7 shows the structure of a common deep neural network, which consists of three convolutional layers and two fully connected layers. The first two convolutional layers extract low-level features, and the last convolutional layer extracts high-level features. Finally, the two fully connected layers perform vehicle classification and data processing. Because a DNN can extract information useful for signal control from massive state data, its combination with reinforcement learning adapts to current developments more effectively than reinforcement learning alone. This section focuses on deep reinforcement learning (DRL). Gao et al. designed a deep reinforcement learning algorithm to extract more effective information for traffic signal control from rich traffic conditions. This algorithm can autonomously learn the optimal strategy for the intersection environment. To make the algorithm more stable, it combines experience replay with a target network. In simulation, compared with common intersection control algorithms, this algorithm significantly lowers delay: by 86% relative to fixed timing and by 47% relative to the longest queue length algorithm [80].
Li et al. established a deep neural network (DNN) to implement the signal timing strategy. Relying on the powerful processing ability of the DNN, it provides more accurate input for calculating the Q function of reinforcement learning. They implicitly model the relationship between control actions and system state changes to find the optimal control strategy. Comparison with other methods shows that this method works well for intersection signal timing. In addition, because of the strong capabilities of DRL, they identified the integration of DRL and ITS as the development trend for the next generation of ITS [81].
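The stabilization mechanism described for Gao et al. (replaying stored experience and bootstrapping from a separate target network) can be illustrated with a minimal sketch. It is not the authors' implementation: a plain dictionary stands in for the DNN Q-function, and the names `ReplayBuffer`, `td_target`, `train_step` and `sync_target`, as well as the learning rate and discount factor, are assumptions made for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng):
        # Uniform sampling breaks the temporal correlation of consecutive steps.
        return rng.sample(list(self.buffer), batch_size)


def td_target(reward, next_state, target_q, actions, gamma=0.9):
    """Bootstrapped target computed from the frozen target estimator."""
    best_next = max(target_q.get((next_state, a), 0.0) for a in actions)
    return reward + gamma * best_next


def train_step(q, target_q, batch, actions, alpha=0.1, gamma=0.9):
    """Move the online estimator towards the targets for one sampled batch."""
    for state, action, reward, next_state in batch:
        target = td_target(reward, next_state, target_q, actions, gamma)
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (target - old)


def sync_target(q, target_q):
    """Periodically copy the online estimator into the target estimator."""
    target_q.clear()
    target_q.update(q)
```

In a DRL controller the dictionaries would be replaced by two copies of the network, and `sync_target` would be called every fixed number of training steps so that the targets change slowly.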
Considering the wide application of data-driven strategies in ITS, Zhang et al. proposed a novel multi-agent-based, data-driven, distributed adaptive collaborative control (MA-DD-DACC) method. The algorithm uses a distributed strategy combined with online parameter learning. Its data input is simple: the signal timing of the intersection is derived from the I/O flow queuing length data and the network topology. A Lyapunov stability analysis shows that this method guarantees the consistency and ultimate boundedness of the distributed consensus coordination error of the queuing strength [82]. Chu et al. took advantage of the learning capabilities of deep reinforcement learning. Considering the limitations of centralized strategies in large-scale adaptive traffic signal control (ATSC), they designed a fully scalable and decentralized multi-agent RL (MARL) algorithm based on advantage actor-critic (A2C). This algorithm distributes global control to local RL agents, which effectively solves the scalability problem, but it is limited by the quality of communication between agents. Simulation on the real road network of Monaco shows that the algorithm has better robustness and sampling efficiency than other algorithms (independent A2C and Q-learning) [79]. After collecting enough data, deep learning can predict customer demand and driver supply with high accuracy. Data-driven DRL control shows great potential [78].
Many applications of machine learning are effective, but existing and developing machine learning methods still fall short of the needs of ITS and cannot yet drive ITS to its full potential.

Research on Traffic Control Based on Adaptive Performance Optimization
Emerging CV technology can provide traffic information such as vehicle position, speed, and acceleration. Some scholars have focused on how to use data from connected vehicles effectively to optimize signal timing and the adaptive performance of traffic signals. Premier et al. first proposed the application of connected vehicle technology to traffic signal control, obtaining data from connected vehicles, including vehicle identification number, location, speed, time, and other traffic information. Considering the limited communication range, the optimization interval is 5 s and the prediction interval is 20 s. The problem is solved by dynamic programming and full enumeration [83]. To achieve a better optimization effect, Yao et al. designed an adaptive signal control method for intersections based on rolling layer optimization, which improved the solution efficiency and optimization level, with an optimization interval of only 1 s [84].
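The idea of optimizing over a 20 s prediction interval in 5 s steps by full enumeration can be sketched as follows. This is an illustrative toy, not the method of [83]: it assumes that each phase serves exactly one approach, that arrivals per step are known constants, and that the cost is the total number of queued vehicles summed over the steps. The function names are hypothetical.

```python
from itertools import product


def simulate(queues, plan, arrivals, discharge):
    """Total queued vehicle-steps when `plan` (one phase per step) is applied."""
    q = list(queues)
    total = 0
    for phase in plan:
        for i in range(len(q)):
            q[i] += arrivals[i]                   # vehicles joining each approach
        q[phase] = max(0, q[phase] - discharge)   # only the green approach discharges
        total += sum(q)
    return total


def best_plan(queues, arrivals, discharge, horizon_s=20, step_s=5):
    """Enumerate every phase sequence over the horizon and keep the cheapest."""
    steps = horizon_s // step_s
    phases = range(len(queues))
    return min(product(phases, repeat=steps),
               key=lambda plan: simulate(queues, plan, arrivals, discharge))


def rolling_control(queues, arrivals, discharge, n_decisions):
    """Apply only the first step of each optimized plan, then re-optimize."""
    q = list(queues)
    applied = []
    for _ in range(n_decisions):
        phase = best_plan(q, arrivals, discharge)[0]
        for i in range(len(q)):
            q[i] += arrivals[i]
        q[phase] = max(0, q[phase] - discharge)
        applied.append(phase)
    return applied, q
```

Full enumeration grows exponentially with the number of steps, which is one reason dynamic programming is used alongside it for longer horizons.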
Diakaki et al. proposed using general automatic control theory to enrich urban traffic control. The proposed method was verified on a large-scale urban road network, and the results show that it handles coordinated signal control effectively. In particular, this method can also be applied to the regulation of saturated traffic flow [85]. Kosmatopoulos et al. studied adaptive optimization (AO) schemes based on the principle of stochastic approximation, and found that algorithms of this type, such as random directions Kiefer-Wolfowitz (RDKW) and simultaneous perturbation stochastic approximation (SPSA), suffer from transient instability. They proposed an improved algorithm as a solution. The results show that the algorithm can greatly improve the optimization performance and can achieve convergence under most conditions, ensuring good transient performance [86].
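For readers unfamiliar with SPSA, the following minimal sketch shows the standard (unimproved) algorithm: all parameters are perturbed simultaneously along a random ±1 direction, and the gradient is estimated from just two evaluations of the objective. It does not reproduce the improved algorithm of [86]. The gain-sequence constants and the `quadratic_loss` test function are assumptions chosen for illustration.

```python
import random


def spsa_minimize(loss, theta, iterations=2000, a=0.1, c=0.1, A=10.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Standard SPSA: two loss evaluations per iteration, whatever the dimension."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha   # decaying step size
        c_k = c / (k + 1) ** gamma       # decaying perturbation size
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)
        # The same scalar difference yields every component of the estimate.
        theta = [t - a_k * diff / (2.0 * c_k * d) for t, d in zip(theta, delta)]
    return theta


def quadratic_loss(x):
    """Hypothetical stand-in for a measured network performance index."""
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2
```

In a traffic setting, `loss` would be a measured or simulated performance index, so every evaluation is expensive and noisy; this is why the two-evaluation gradient estimate is attractive and why transient behavior matters.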
For large-scale traffic control systems, there is no systematic method to establish the design parameters automatically, so deploying and maintaining the traffic system involves huge costs. Kouvelas et al. developed an adaptive fine-tuning algorithm that helps managers determine the design parameters of traffic control systems, including splits and cycles. The algorithm was tested on an actual road network, and good results were obtained [87]. In coordinated signal control, optimizing the joint action of coordinated intersections has always been a very challenging problem. In response, Zhu et al. proposed a reinforcement learning algorithm based on cross-tree learning. The simulation results show that this algorithm can interact with the random environment through the coordination of agents and accurately infer the best joint action, which effectively reduces vehicle delay [88]. Kouvelas et al. also developed an adaptive optimization scheme for perimeter control of heterogeneous transportation networks, and discarded boundary control restrictions. The derived multi-boundary control scheme was tested in a micro-simulation of a large city network with more than 1500 roads. The results show that the proposed control scheme achieves better congestion allocation [89].

Research on Traffic Control Based on the Environment of CV
Mixed traffic flow consists of RV, CV and AV, and this changes the composition of urban road network traffic.
Connected and automated vehicle (CAV) technology greatly changes the composition of traffic flow and also provides technical support for extracting richer traffic state data. For the signal control problem in this setting, the common approach is to express the state variables as directly observable indicators, such as queue length and travel time. Additional assumptions are then imposed: vehicle arrivals follow a probability distribution such as the Poisson distribution, and vehicle movement follows a queuing model, the LWR model, or a car-following model. Finally, signal timing is determined by controlling the phases and the phase sequence of the signal [23].
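A minimal sketch of this modelling approach, under stated assumptions, is shown below: arrivals per cycle are drawn from a Poisson distribution, and the observable state (queue length) evolves according to a simple point-queue model. The saturation flow, green time and function names are illustrative assumptions and are not taken from [23].

```python
import math
import random


def poisson(rate, rng):
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    limit = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def queue_step(queue, arrivals, green_s, saturation_flow_vps):
    """Point-queue update over one cycle: add arrivals, discharge during green."""
    capacity = int(green_s * saturation_flow_vps)  # vehicles served per cycle
    return max(0, queue + arrivals - capacity)


def simulate_cycles(rate_per_cycle, green_s, saturation_flow_vps, cycles, seed=0):
    """Residual queue at the end of each cycle for one approach."""
    rng = random.Random(seed)
    queue, history = 0, []
    for _ in range(cycles):
        queue = queue_step(queue, poisson(rate_per_cycle, rng),
                           green_s, saturation_flow_vps)
        history.append(queue)
    return history
```

A signal timing method built on such a model would choose `green_s` for each phase so that an indicator derived from the queue history, such as total delay, is minimized.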
At present, the CV environment is still being deployed gradually and is at a preliminary stage. Intersection control for this scenario remains a frontier field. Relying on this technology, new adaptive control can meet the requirements for accurate, real-time traffic control better than previous methods. However, some problems remain. Existing research is mainly based on optimization methods with simplified models, whereas reality is considerably more complicated. The penetration rate of CV is a key parameter that determines the effectiveness of the control algorithm. Existing research mainly focuses on CV, and the estimation of the status of RV in the CV environment is limited. In addition, existing methods generally consider only single-modal traffic and give limited consideration to mixed traffic. Moreover, because communication quality is limited, data loss has a considerable impact on control performance.
Feng et al. [39] used the high-precision data provided by CV to control intersection signals. They proposed an estimation of vehicle location and speed (EVLS) algorithm, and took minimum total vehicle delay and minimum queue length as the objective functions. In operation, the road is divided into different regions for analysis, including the queuing region, the deceleration region and the free-flow region. Simulation results show that this algorithm can reduce the total delay by 16.33% under a high penetration rate. However, the approach mainly holds when the CV penetration rate is high; the current traffic flow environment is not considered, and the objective function is not modified to account for it [26].
Relying on CV technology, Lee et al. considered the composition of complex traffic flows in the current development stage, noting that CV penetration cannot reach a relatively high level within a short transition period. They developed a cumulative travel time response (CTR) algorithm. The algorithm is based on Kalman filtering, which estimates the traffic state stochastically, and it was verified by simulation experiments. The results show that a 30% penetration rate is the minimum required for CTR to be implemented with a good control effect, which is of great significance for reducing traffic congestion. In addition, when the penetration rate reaches 100%, the total delay can be reduced by up to 34% [90].
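The role of the Kalman filter in such a scheme, namely refining a noisy traffic-state estimate as new CV measurements arrive, can be shown with a scalar sketch. This is a generic textbook filter with a random-walk state model, not the CTR algorithm of [90]; the variable names and noise variances are assumptions.

```python
def kalman_step(estimate, variance, measurement, process_var, measurement_var):
    """One predict/update cycle of a scalar Kalman filter (random-walk model)."""
    # Predict: the state is assumed unchanged, but uncertainty grows.
    prior = estimate
    prior_var = variance + process_var
    # Update: blend prediction and measurement according to their variances.
    gain = prior_var / (prior_var + measurement_var)
    posterior = prior + gain * (measurement - prior)
    posterior_var = (1.0 - gain) * prior_var
    return posterior, posterior_var


def filter_series(measurements, initial, initial_var, process_var, measurement_var):
    """Run the filter over a sequence of noisy measurements."""
    estimate, variance = initial, initial_var
    estimates = []
    for z in measurements:
        estimate, variance = kalman_step(estimate, variance, z,
                                         process_var, measurement_var)
        estimates.append(estimate)
    return estimates
```

At low penetration rates the measurement variance is large, so the gain is small and the estimate leans on the prediction; this is consistent with the reported need for a minimum penetration rate.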
Guler et al. designed an algorithm to optimize traffic control at intersections. The algorithm uses CV data and takes into account the length of the vehicle exit sequence and the signal flexibility at the intersection to optimize the control effect. The results show that increasing the penetration rate from 0% to 60% significantly lowers delay. When the penetration rate exceeds 60%, the additional reduction in delay gradually diminishes and is smaller than that obtained in the 10% to 60% range. Overall, connected vehicles can play a significant role in reducing traffic congestion when used for signal timing [91].
Pandit et al. relied on a vehicular ad hoc network (VANET) to control signals at intersections. The VANET collects and processes vehicle status in the network, such as position and speed. They model the problem as job scheduling on a processor, treating vehicle platoons as jobs. On this basis, the oldest job first (OJF) algorithm is proposed, which uses the data collected by the VANET as input to determine the order of arrivals and thereby control the phase. This algorithm can adjust signal control online in real time. Compared with the traditional method, it reduces delay noticeably. It is worth noting that this algorithm achieves ideal control effects only in scenarios with high penetration rates [92].
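The job-scheduling analogy can be made concrete with a small sketch in which each platoon is a job and the signal serves the platoon that arrived earliest. It illustrates the oldest-job-first principle only and is not the algorithm of [92]: the `Platoon` fields, the constant discharge headway and the absence of conflicting-phase and clearance logic are all simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass
class Platoon:
    approach: str        # approach the platoon is travelling on
    arrival_time: float  # time (s) at which the platoon reaches the stop line
    size: int            # number of vehicles in the platoon


def oldest_job_first(platoons):
    """Order platoons so that the earliest arrival is served first."""
    return sorted(platoons, key=lambda p: p.arrival_time)


def schedule(platoons, headway_s=2.0):
    """Assign consecutive green windows, one platoon (job) at a time."""
    clock = 0.0
    plan = []
    for p in oldest_job_first(platoons):
        start = max(clock, p.arrival_time)  # cannot serve before arrival
        end = start + p.size * headway_s    # time needed to discharge the platoon
        plan.append((p.approach, start, end))
        clock = end
    return plan
```

For example, `schedule([Platoon("N", 5.0, 3), Platoon("E", 0.0, 2)])` serves the eastern platoon first because it arrived earlier, then the northern one.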
It can be seen that, although CV can provide a great deal of useful information for adaptive signal control, a key problem remains. When the penetration rate is low, the collected data samples cannot sufficiently reflect the true state of traffic, and large errors occur. In this case, the control performance may even be worse than that of traditional control methods. Therefore, CV penetration is a key factor [93].
To address low penetration rates, Li et al. proposed a new DP algorithm. The problem is formulated as a mixed-integer nonlinear program whose objectives are to minimize fuel consumption and travel time. The algorithm approximates the value of the phase sequence decision at each stage, so that the resulting control strategy can take a fixed cycle length as input. The results show that this algorithm not only reduces delay more than fixed-time and actuated control, but also performs better in low-penetration environments [94].

Multiple Intersection Control Based on CV Environment
Most studies of CAV-based intersection control focus on a single intersection, and few address the optimization and coordination of corridor-level signals. Corridor-level control of multiple intersections based on CAV therefore still faces many challenges. On the one hand, coordinating multiple intersections effectively with CAV information is very complicated because of the large-scale computation involved. On the other hand, the difficulties of control design for a single intersection and for a corridor remain. These issues require further specialized research. An adaptive control system allocates green time according to real-time traffic demand. Its control strategies fall mainly into two categories [47]: (1) centralized control strategies, in which vehicles only collect traffic data and all processing is done by a central controller; and (2) distributed control strategies, in which in-vehicle computing capabilities process part of the data. In time-varying, random and complex traffic scenarios, adaptive signal control systems can respond dynamically to traffic conditions. However, centralized signal control systems have disadvantages, such as poor stability and high computation and communication costs. Distributed signal control strategies can overcome these shortcomings, but the performance of the overall network cannot be characterized by the optimization of local intersections alone. Therefore, on the basis of information exchange between associated intersections, it is still necessary to design a precise coordination mechanism between intersections and to adopt a joint timing strategy to improve overall traffic efficiency. The autonomy, collaboration and interactivity of agent technology meet the inherent needs of distributed adaptive traffic signal control.
Because the agent (intersection signal controller) can sense the surrounding environment and respond promptly to changes in the traffic state without the direct intervention of people or other factors, it can actively make plans based on goals and environmental requirements to automate traffic control. At the same time, through the mutual cooperation of the distributed intersection agents, a multi-agent control strategy can optimize the performance of the global road network.
The centralized strategy connects all the signal controllers in the control area, and the control center controls the entire system, which is simple to implement and highly efficient. Tan et al. proposed a new DRL algorithm to solve large-scale traffic signal control problems through cooperation between a centralized global agent and multiple regional agents. By analyzing the results obtained by the regional agents and coordinating across regions through the centralized global agent, the algorithm can find a well-performing control strategy for signal control problems with large discrete action spaces [95].
Zhang et al. proposed the QCOMBO algorithm, which couples centralized and independent learning through a new consistency regularizer. It optimizes a three-part composite objective: an individual part based on the independent deep Q-network (DQN) loss function, a global part for learning the overall action-value function, and a shaping term that minimizes the difference between the weighted sum of the individual Q values and the global Q value. The algorithm ensures cooperation between agents by maximizing the global reward, while preserving each agent's ability to use its own observations and rewards to optimize individual performance [96].
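The structure of such a three-part objective can be sketched for a single transition as follows. This is an illustration of the composition only, not the loss of [96]: scalar Q values replace network outputs, and averaging the individual losses over agents and the regularization coefficient `reg_coeff` are assumptions.

```python
def qcombo_loss(individual_q, individual_targets, global_q, global_target,
                weights, reg_coeff=1.0):
    """Composite loss: individual TD error + global TD error + consistency term."""
    n = len(individual_q)
    # Part 1: independent DQN-style squared TD errors, averaged over agents.
    individual_loss = sum((t - q) ** 2
                          for q, t in zip(individual_q, individual_targets)) / n
    # Part 2: squared TD error of the global action-value estimate.
    global_loss = (global_target - global_q) ** 2
    # Part 3: penalize disagreement between the global Q value and the
    # weighted sum of the agents' individual Q values.
    weighted_sum = sum(w * q for w, q in zip(weights, individual_q))
    consistency = (global_q - weighted_sum) ** 2
    return individual_loss + global_loss + reg_coeff * consistency
```

With `reg_coeff` set to zero the agents learn independently of the global estimate; larger values pull the individual estimates towards agreement with the global one.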
Global knowledge is available for centralized fusion, which makes it simple to implement and gives high overall performance. The cost is a high communication requirement and heavy computation for data processing in the fusion center. Benefiting from collaboration among neighboring agents, distributed processing, which can better achieve data consistency, has received much attention [97,98]. Distributed control may help solve large-scale network problems based on CAV by dividing the network control problem into several simpler problems at each intersection [99].
Goodall et al. proposed a predictive microscopic simulation algorithm that uses vehicle location, direction, and speed data from connected vehicles. The control algorithm uses a rolling horizon mechanism, in which the phase is selected to optimize the objective function over the next 15 seconds. The objective function is constructed from delay, stops, decelerations, or a combination of these. The algorithm is relatively simple: it requires neither point detection nor signal-to-signal communication, and it responds fully to the immediate demands of vehicles. The algorithm brings considerable improvement under low and medium traffic demand, but its performance decreases in saturated and oversaturated conditions [100].
Islam proposed a distributed coordination method to optimize signal timing for connected urban street networks, in which vehicles and intersections can exchange information. The novelty of this work lies in solving the signal timing optimization problem with a decentralized approach instead of a centralized architecture. Because this distribution greatly reduces the complexity of the problem, the proposed method is real-time and scalable. In addition, the distributed mathematical programs continuously coordinate with each other to avoid locally optimal solutions and move towards global optimality [101].
Li considered the multi-level optimization problem of hierarchical control and proposed a CV-based coordinated traffic signal optimization framework. The problem is decomposed into two levels: at the intersection level, DP is used to optimize phase durations, and at the corridor level, the offsets of all intersections are optimized. To solve the two models, a prediction-based solution technique was developed, and good results were achieved [102].
The aforementioned papers study a single traffic mode. Xiang et al. proposed a novel adaptive multi-agent traffic signal control method integrated in a networked communication environment. The method has the following features: it requires no traffic model and is suited to parallel processing, regional coordination and shared learning. The method was evaluated on a 23 km road network with 22 intersections in the urban area of Xiaogan, China. Compared with the traditional fixed-time and actuated methods, the results reveal that the average travel time per vehicle, the average delay per vehicle, and the average queue length are significantly reduced [103]. Zamanipour et al. designed a multi-modal intelligent traffic signal control method to ensure optimal signal scheduling while minimizing the negative impact on RV [104]. Unfortunately, a theory of coordinated control for multi-modal traffic flow has not yet been established; developing one is a direction for future work.

Discussion
In recent years, communication technology has continued to improve iteratively. New technologies, such as vehicle-road coordination, vehicle-to-vehicle communication (V2V), vehicle-to-infrastructure communication (V2I), V2X and autonomous driving, are also emerging and continuing to develop. The modes of traffic flow are constantly changing, and a mix of RV, CV and AV is the main trend in current urban traffic composition. For the traffic control system, traffic status data have been greatly enriched, which lays the foundation for real-time, efficient and accurate traffic control. A new generation of adaptive control systems relies on this setting to respond well to uncertainty and to make intelligent decisions. The basis of the intelligent transportation system is the adaptive control method, so new control algorithms and strategies that adapt to these developments must be devised and integrated, to achieve the goal of reducing traffic congestion and improving the efficiency of urban road traffic.
Based on the current development of connected vehicles and existing intersection control technology, this paper groups the prospects for intersection signal control into four main categories: (1) data-driven reinforcement learning control; (2) traffic control based on adaptive performance optimization; (3) traffic control based on the CAV environment; and (4) multiple intersection control based on the CV environment. All four directions build on the CV environment. With the advent of the era of 5G and the Internet of Everything, they will surely show greater vitality.
Therefore, in today's era of 5G and the Internet of Everything, data are greatly enriched, and adaptive control methods based on the CV environment also need to develop continually as technology progresses. The main trends can be summarized as follows: (1) Combine the new adaptive control methods in Section 4 to develop a new generation of traffic control systems with higher computing power, higher responsiveness, accuracy and robustness, and carry out overall control of urban traffic. (2) Ensure network security for 5G, V2X and the Internet of Things, which involves protecting private vehicles and personal privacy, as well as defending against signal collection attacks by malicious actors. Some of the literature has shown that signal attacks can cause a large amount of vehicle delay. (3) Current control methods are mainly aimed at a single traffic mode. In the future, there will be a mixed traffic flow of CV, AV and RV, and self-driving vehicles will have different levels of automation. Therefore, traffic control for multi-modal traffic flow is a problem that must be solved in the future. (4) Balance multiple control objectives, such as vehicle delay, fuel consumption, and the safety of motor vehicles and non-motor vehicles, to achieve the best overall control performance. This involves how to design, integrate and constrain different control objectives.

Conflicts of Interest:
The authors declare no conflict of interest.