A Two-Level Rolling Optimization Model for Real-time Adaptive Signal Control

Recently, dynamic traffic flow prediction models have increasingly been developed in a connected vehicle environment, which will be conducive to the development of more advanced traffic signal control systems. This paper proposes a rolling optimization model for real-time adaptive signal control based on a dynamic traffic flow model. The proposed method consists of two levels, i.e., barrier group and phase. The upper level optimizes the length of the barrier group based on dynamic programming. The lower level optimizes the signal phase lengths with the objective of minimizing vehicle delay. Then, to capture the dynamic traffic flow, a rolling strategy was developed based on a real-time traffic flow prediction model. Finally, the proposed method was compared to the Controlled Optimization of Phases (COP) algorithm in a simulation experiment. The results showed that the average vehicle delay was significantly reduced, by as much as 17.95%, using the proposed method.


Introduction
With the development of the social economy, traffic congestion has become one of the most significant problems in many cities. Traffic signal control is a critical form of traffic control and management for reducing urban traffic congestion. Traffic signal control theory has been established for over 60 years, starting from the pioneering work of Webster [1]. Since then, research and development in traffic signal control has largely fallen into three types of control strategies: fixed-time control, actuated control, and responsive control.
Fixed-time control is based on historic traffic data and assumes that traffic demand is constant. Actuated control uses preset rules to adapt to traffic flow based on detected traffic data (mainly vehicle passage/presence). Responsive control optimizes the signal timing plans based on real-time detected traffic data and improves the usage of intersection capacity [2][3][4]. Several responsive traffic control systems are widely used around the world [5][6][7]: SCATS [8] was developed in Australia, SCOOT [9] in Britain, PRODYN [10] and CRONOS [11] in France, UTOPIA [12] in Italy, and OPAC [13] and RHODES [14,15] in the USA.
The optimization algorithm, which generates the optimal signal timing plans for a given objective, is regarded as an indispensable part of adaptive control systems. At present, the optimization algorithms in adaptive control systems can be divided into the following categories [16]: dynamic programming [14,17,18], genetic algorithms [19,20], neural networks [21,22], and fuzzy logic control [23,24]. Because of its fast calculation speed, the dynamic programming algorithm is widely used in adaptive control systems such as PRODYN [10], UTOPIA [12], OPAC [13] and RHODES [14,15]. In the RHODES system, the optimization algorithm is a dynamic programming algorithm named Controlled Optimization of Phases (COP) [14]. In 2015, Feng et al. [17] proposed a dynamic programming signal timing optimization algorithm based on connected vehicle data. However, these algorithms do not consider the time-varying characteristics of the traffic flow during the optimization horizon. The longer the optimization horizon is, the less accurate the prediction becomes. Therefore, signal timing plans based on these methods are often unsatisfactory.
With the development of connected vehicle technology, the time granularity of prediction data is becoming smaller and smaller [25]. Therefore, the use of small-granularity predicted data to optimize the signal timing plans has been a popular research topic in the last couple of years. To address this issue, a two-level rolling optimization model for real-time adaptive signal control based on a dynamic traffic flow model was developed. The proposed method consists of two levels, i.e., barrier group and phase. The upper level optimizes the length of the barrier group based on dynamic programming. The lower level optimizes the signal phase lengths with the objective of minimizing vehicle delay. Then, to capture the dynamic traffic flow, a rolling strategy was developed based on a real-time traffic flow prediction model. Finally, the proposed method was compared with the Controlled Optimization of Phases (COP) algorithm in a simulation experiment.
The remainder of this paper is structured as follows. The traffic signal timing optimization model and algorithm are described in Section 2. In Section 3, a case study and discussion of the proposed method are presented. Conclusions and future work are discussed in Section 4.

Traffic Signal Timing Optimization Algorithm
The signal timing optimization algorithm optimizes signal phase durations based on a dynamic traffic flow prediction model. In this paper, the optimization method consists of two levels. At the upper level, dynamic programming (DP) is applied to the barrier groups, where a barrier group is the group of phases between two barriers. Based on this definition, a standard National Electrical Manufacturers Association (NEMA) ring-barrier controller structure is shown in Figure 1. The figure illustrates a phase sequence with left-turn movements leading the opposing through movements on both the major and minor streets. The diagram shows phases 1 and 5 ending at different times. The subsequent phases (phases 2 and 6, respectively) may begin once the previous phase has used its time. Once the barrier is crossed, phases 3 and 7 operate, followed by phases 4 and 8. The cycle ends with the completion of phases 4 and 8.
The calculation of the performance function of the upper level is passed to the lower level. The lower-level (individual phase) optimization is formulated as an integer linear programming problem. In this study, the objective is to minimize the total vehicle delay, and the sequence of barrier groups is assumed to be fixed. Time is discretized into 1 s intervals, and the optimization is performed over a predetermined planning horizon, e.g., 80 s. The problem is therefore to find the length of each phase that minimizes vehicle delay, subject to constraints such as the minimum and maximum green time of each phase. The two-level optimization model is discussed in the following subsections.

Dynamic Programming Algorithm
The upper-level optimization is a DP problem. Figure 2 shows the relationship among the variables, and Table 1 lists the notation of the parameters and variables used in the DP algorithm.
In this paper, a forward and a backward recursion were used to solve the DP problem. The forward recursion calculates the performance measure (objective function) based on the decision and state variables and records the optimal value function for each stage. The backward recursion retrieves the optimal policy starting from the final stage and working backward. The details of the forward and backward recursion are described below.
The forward recursion is based on the allocation of time to each barrier group, with the barrier groups acting as the stages of the DP. Considering each barrier group as a stage, the algorithm plans as many stages as necessary to obtain the optimal solution. The rings and phases within one barrier group are defined in Figure 1. The phases in each barrier group are divided into two rings, where $r$ represents the ring index and $p$ represents the phase index within the ring. Due to the variability of traffic demand, the algorithm will not produce a fixed cycle length.
The minimum and maximum allowable barrier group lengths are calculated from the signal timing parameters, as shown in Equations (1) and (2). The parameters include the minimum green, maximum green, yellow change, and red clearance times of each phase.
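The bounds described above can be illustrated with a minimal sketch. It is not necessarily the form of Equations (1) and (2): it assumes that a phase's duration is its green time plus its change interval, and that both rings must span exactly the same barrier group length. The `Phase` container, the function name, and the sample timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """Timing parameters of one phase, in seconds (illustrative container)."""
    min_green: int
    max_green: int
    change_interval: int  # yellow change + red clearance


def barrier_group_bounds(rings):
    """Return (shortest, longest) feasible barrier group length.

    Assumption: in every ring, the phase durations (green + change interval)
    must add up to the same barrier group length.
    """
    ring_min = [sum(p.min_green + p.change_interval for p in ring) for ring in rings]
    ring_max = [sum(p.max_green + p.change_interval for p in ring) for ring in rings]
    shortest = max(ring_min)  # every ring needs at least its own minimum
    longest = min(ring_max)   # no ring may exceed its own maximum
    if shortest > longest:
        raise ValueError("no barrier group length satisfies both rings")
    return shortest, longest


# Hypothetical timings for one barrier group with two rings of two phases each.
rings = [
    [Phase(5, 20, 4), Phase(10, 40, 5)],  # ring 1
    [Phase(7, 25, 4), Phase(10, 30, 5)],  # ring 2
]
print(barrier_group_bounds(rings))
```

Under these assumptions the lower bound is driven by the ring with the larger minimum requirement, and the upper bound by the ring with the smaller maximum allowance.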
Then, given stage $j$, the calculated minimum and maximum times for that barrier group, and the total number of discrete time steps $T$, the set of state variables $S_j$ is determined by Equation (3).
Given the state variable $s_j$ and the calculated minimum and maximum times for the barrier group, the set of feasible decision variables $X_j(s_j)$ is determined by Equation (4).
After $S_j$ and $X_j(s_j)$ have been determined, DP is used to search for the best signal timing plan. The forward recursion is described as follows.
Step 2: Calculate $S_j$.
Step 3: For each $s_j \in S_j$ { Calculate $X_j(s_j)$. Set $v_j(s_j) = \min_{x_j \in X_j(s_j)} \{ f_j(s_j, x_j) + v_{j-1}(s_{j-1}) \}$ and record $x_j^*(s_j)$ as the optimal solution. }
Step 4: If the stopping criterion (the sum of the minimum barrier group lengths exceeding $T$) is not yet met, proceed to the next stage; else STOP.
For each barrier group, DP calculates the optimal decision $x_j^*(s_j)$ for each state variable $s_j$. The performance measure (objective function) $f_j(s_j, x_j)$ is evaluated at the lower optimization level, subject to the constraint imposed by the control variable $x_j$. The stopping criterion is met when the sum of the minimum lengths of all barrier groups is larger than $T$. This stopping criterion differs from that of the COP algorithm [14], which does not consider the constraint of the maximum green time of a phase group. In addition, because pedestrians need to cross the street, barrier groups are not allowed to be skipped in this study.
After decisions have been made for all barrier groups, the optimal decision $x_j^*(s_j)$ of each barrier group is retrieved in the backward recursion.
The optimal plan is retrieved starting from barrier group $J$, since this barrier group holds the minimum performance measure $v_J^*(T)$, such as the minimum delay or number of stops.
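The forward and backward recursions can be sketched as follows. This is a simplified illustration rather than the exact procedure of the paper: it assumes a fixed number of stages supplied by the caller (instead of the stopping criterion of Step 4), assumes $s_{j-1} = s_j - x_j$ with $v_0(0) = 0$, and requires the final state to equal $T$. The function name and the `stage_cost` callback, which stands in for the lower-level evaluation of $f_j(s_j, x_j)$, are hypothetical.

```python
def solve_barrier_dp(bounds, horizon, stage_cost):
    """Forward/backward DP over barrier groups (illustrative sketch).

    bounds     -- list of (x_min, x_max) per stage, in time steps
    horizon    -- T, the total number of time steps to allocate
    stage_cost -- callable f(j, s_j, x_j) returning the cost of stage j
    Returns (total cost, [x_1*, ..., x_J*]).
    """
    value = [{0: 0.0}]  # value[j][s] = v_j(s), with v_0(0) = 0
    choice = [{}]       # choice[j][s] = x_j*(s)

    # Forward recursion: v_j(s_j) = min over x_j of f_j(s_j, x_j) + v_{j-1}(s_j - x_j)
    for j, (x_min, x_max) in enumerate(bounds, start=1):
        v_j, x_j = {}, {}
        for s_prev, v_prev in value[j - 1].items():
            for x in range(x_min, x_max + 1):
                s = s_prev + x
                if s > horizon:
                    break  # larger x only moves further past the horizon
                candidate = v_prev + stage_cost(j, s, x)
                if s not in v_j or candidate < v_j[s]:
                    v_j[s], x_j[s] = candidate, x
        value.append(v_j)
        choice.append(x_j)

    if horizon not in value[-1]:
        raise ValueError("horizon cannot be filled exactly with these bounds")

    # Backward recursion: start from state T at the last stage and step back.
    plan, s = [], horizon
    for j in range(len(bounds), 0, -1):
        x = choice[j][s]
        plan.append(x)
        s -= x
    plan.reverse()
    return value[-1][horizon], plan


# Toy cost that favors 10-step barrier groups; two stages must fill 30 steps.
cost, plan = solve_barrier_dp([(5, 20), (5, 20)], 30, lambda j, s, x: (x - 10) ** 2)
print(cost, plan)
```

Storing the best decision per state during the forward pass is what lets the backward pass recover the plan without re-evaluating any stage cost.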

Integer Linear Programming
In Step 3 of the forward recursion, $f_j(s_j, x_j)$, the optimal performance measure (objective function) at stage $j$ given barrier group state $s_j$ and control $x_j$, needs to be calculated. The value of $f_j(s_j, x_j)$ depends on the green duration of the $j$th barrier group. In this study, vehicle delay is used as the objective function. The lower-level integer linear programming problem is formulated in Equations (5)-(10). To solve it, the candidate phase durations are enumerated to find the minimum-delay combination for the given $x_j$. The arrival flow of each phase at each time step, $A_{r,p}(t)$, comes from a predicted arrival table, which is a two-dimensional matrix indexed by time and phase. The value in each cell is the number of vehicles that will arrive at the stop bar after time interval $t$ requesting phase $p$ in ring $r$, and it is the result of the traffic flow prediction model [26].
First, the cumulative delay can be calculated using the IQA method [27]. Given that $x_j$ is the length of the barrier group, the lower level solves the following optimization problem. The objective function is shown in Equation (5), where $d(g_{r,p}, R_{r,p})$ is the total delay for the given $g_{r,p}$ and $R_{r,p}$, and $l_{r,p}(t)$ denotes the queue length of phase $p$ in ring $r$ at time interval $t$.
The queue length at time interval $t$ depends on the queue length at time interval $t - 1$ and on the vehicles arriving and departing during time interval $t$. It follows the basic flow conservation relationship shown in Equation (7):
$$l_{r,p}(t) = l_{r,p}(t-1) + A_{r,p}(t) - D_{r,p}(t) \qquad (7)$$
where $A_{r,p}(t)$ and $D_{r,p}(t)$ denote the number of arriving and departing vehicles of phase $p$ in ring $r$ at time interval $t$, respectively. The number of vehicles departing at time interval $t$ is related to the initial queue at time interval $t$, the traffic signal state, and the saturation flow rate. This relationship is shown in Equation (8),
where $S_{r,p}$ denotes the saturation flow rate of phase $p$ in ring $r$, in veh/s. Then, the total duration of the phases in each ring of the barrier group should equal the current decision variable $x_j$ (Equation (9)). In addition, the duration of each phase is bounded by a lower limit and an upper limit, as shown in Equation (10).
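The lower-level enumeration for one ring can be sketched as follows. It is a simplified stand-in, not the IQA method [27] or the exact Equations (5)-(10): it assumes that departures in a green step are limited by the saturation flow and by the vehicles present, that no vehicles depart otherwise, that delay accumulates as one vehicle-second per queued vehicle per 1 s step, and that a ring serves two phases in sequence, each followed by its change interval. All function and parameter names are hypothetical.

```python
def queue_delay(arrivals, is_green, sat_flow, initial_queue=0.0):
    """Accumulate delay (veh*s) for one phase over 1 s time steps."""
    queue, delay = initial_queue, 0.0
    for arriving, green in zip(arrivals, is_green):
        # Assumed departure rule: bounded by saturation flow and vehicles present.
        departures = min(queue + arriving, sat_flow) if green else 0.0
        queue = queue + arriving - departures  # flow conservation
        delay += queue                         # each queued vehicle waits 1 s
    return delay


def best_ring_split(x, limits, change, arrivals, sat_flow):
    """Enumerate green splits of one ring for a barrier group of length x.

    limits   -- [(g_min, g_max), (g_min, g_max)] for the two phases
    change   -- [c_1, c_2], change intervals after each phase
    arrivals -- [per-step arrivals of phase 1, per-step arrivals of phase 2]
    sat_flow -- [S_1, S_2], saturation flows in veh/s
    Returns (delay, g_1, g_2), or None if no split is feasible.
    """
    (lo1, hi1), (lo2, hi2) = limits
    best = None
    for g1 in range(lo1, hi1 + 1):
        g2 = x - g1 - change[0] - change[1]  # ring must fill the barrier group
        if not lo2 <= g2 <= hi2:
            continue
        start2 = g1 + change[0]
        green1 = [t < g1 for t in range(x)]
        green2 = [start2 <= t < start2 + g2 for t in range(x)]
        delay = (queue_delay(arrivals[0], green1, sat_flow[0])
                 + queue_delay(arrivals[1], green2, sat_flow[1]))
        if best is None or delay < best[0]:
            best = (delay, g1, g2)
    return best


# Hypothetical 10 s barrier group: 2 vehicles request phase 1, 3 request phase 2.
arrivals = [[2] + [0] * 9, [3] + [0] * 9]
result = best_ring_split(10, [(2, 6), (2, 6)], [1, 1], arrivals, [1.0, 1.0])
print(result)
```

In this toy case the shortest feasible green for the first phase is selected, because it already clears that phase's two vehicles and lets the longer queue of the second phase start discharging sooner.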

Rolling Strategy
To limit the effect of errors in the predicted vehicle arrivals and to make use of newly collected data, a rolling strategy based on the DP algorithm was proposed. The algorithm is solved at the end of each time step, as shown in Figure 3.

As shown in Figure 3, if each optimization horizon includes $n$ time steps (a number that depends on the length of the prediction horizon), the first optimization horizon covers time steps 1 to $n$. After the first time step is finished, the second optimization horizon covers time steps 2 to $n + 1$. By analogy, whenever a time step is finished, a new optimization is performed immediately based on the latest forecast data. Therefore, the length of the time step is an important parameter in the rolling strategy. Different time step lengths are discussed in the case study.
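The rolling strategy can be sketched as a simple loop. This is an illustration under stated assumptions, not the implementation used in the study: `predict` and `optimize` are hypothetical callbacks standing in for the traffic flow prediction model [26] and the two-level optimization, and the plan is assumed to be a per-second sequence of which only the first time step is implemented before re-optimizing.

```python
def rolling_control(total_time, step, steps_per_horizon, predict, optimize):
    """Re-optimize at the end of every time step and apply only the first step.

    total_time        -- length of the controlled period, in seconds
    step              -- rolling time step length, in seconds
    steps_per_horizon -- n, the number of time steps per optimization horizon
    predict           -- hypothetical callable (start, horizon) -> forecast
    optimize          -- hypothetical callable forecast -> per-second plan
    """
    applied = []
    horizon = steps_per_horizon * step
    for start in range(0, total_time, step):
        forecast = predict(start, horizon)  # latest forecast for this window
        plan = optimize(forecast)           # plan covering the whole horizon
        applied.extend(plan[:step])         # implement the first time step only
    return applied


# Toy stand-ins: the "forecast" is the list of seconds in the window and the
# "plan" simply echoes it, which makes the rolling windows easy to inspect.
applied = rolling_control(
    total_time=12,
    step=4,
    steps_per_horizon=3,
    predict=lambda start, horizon: list(range(start, start + horizon)),
    optimize=lambda forecast: forecast,
)
print(applied)
```

A shorter `step` means more frequent re-optimization with fresher forecasts, at the price of more optimization runs over the same period.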

Case Study and Discussion
As shown in Figure 4, the test network is an actual road network in Chengdu, China, with a typical grid structure. Geometric data were collected in the field to reflect real conditions and were then modeled in the microscopic simulation software Vissim [28]. Of the five intersections in the network, the one marked 5 in Figure 4 was chosen as the test intersection for the proposed control system. The surrounding area is needed by the traffic flow prediction model [26] and to ensure realistic traffic flows. Fully actuated control was applied to all the other intersections. As a reasonable simplification, right-turn traffic was not modeled in this study; only through and left-turn traffic flows were modeled.
The simulation warm-up time was set to 900 s, and the effective simulation time was 3600 s. Different traffic volume levels were modeled in Vissim [28] to test the adaptability of the proposed control system to real traffic conditions. In addition, to analyze the sensitivity to the rolling time step, three rolling time steps were used: 2 s, 4 s, and 6 s. The average delays of the two control methods were collected from the simulation data and are plotted in Figure 5.
As shown in Figure 5, the average vehicle delay of both methods increased as the traffic volume increased. However, compared with the COP algorithm, the proposed control method always had a lower average vehicle delay. The benefit is mainly due to the rolling strategy on which the proposed method is based, which can capture the real-time characteristics of traffic flow. In addition, as shown in Figure 5, the smaller the time step of the rolling optimization, the lower the delay achieved by the proposed control method, which shows that smaller rolling time steps can bring better performance.
Next, the average vehicle delay of each phase was obtained as shown in Tables 2-4.
Algorithms 2019, 12, x; doi: FOR PEER REVIEW www.mdpi.com/journal/algorithms
As shown in Tables 2-4, a reduction in average delay is observed for all phases. For the studied intersection, the proposed control method reduced the average vehicle delay in each phase compared with the COP algorithm. The results show that the proposed method was able to reduce both the total vehicle delay of the intersection and the delay of each phase. In addition, the average vehicle delay was reduced by as much as 17.95%, 12.32%, and 11.78% when the traffic volumes were 2500, 3500, and 4500 veh/h, respectively.

Conclusions
Based on the field survey data and the simulation analysis, the following conclusions were reached in this study: (1) at the whole-intersection level, the proposed algorithm produces less delay than the COP algorithm, with the average vehicle delay reduced by as much as 17.95%; (2) at the phase level, the proposed algorithm reduces the vehicle delay in each phase compared with the COP algorithm; (3) smaller rolling time steps can bring better performance.

Future Work
Coordinated control (with a common cycle and coordinated offsets) will be studied in future work. In addition, some open areas remain worth investigating, such as multi-objective optimization to achieve a more balanced signal control plan, and a feedback strategy that achieves more robust control by using post-event vehicle delay, queueing, and turning data, etc., supported by new intelligent transportation technology.

Figure 1. The standard NEMA dual-ring controller diagram.

Figure 2. The relationship between the states, decisions, and the total number of discrete time-steps.

Figure 3. Diagram of rolling horizon optimization strategy.

Figure 4. Diagram of the simulated road network: (a) road network; (b) the controlled intersection.

Figure 5. The control performance versus volume for Controlled Optimization of Phases (COP) and the proposed method.


Table 1. Notation of key parameters and variables used in the dynamic programming (DP) algorithm.
| Variable | Description |
| --- | --- |
| $p$ | Phase index in each ring and barrier group, $p = 1, 2$. |
| $r$ | Ring index in each barrier group, $r = 1, 2$. |
| $j$ | Index of barrier groups/stages. |
| $J$ | Last stage calculated by the DP before stopping. |
| $x_j$ | Decision variable denoting the length of barrier group $j$. |
| $s_j$ | State variable denoting the total number of time steps from the start time to barrier group $j$. |
| $S_j$ | The set of state variables $s_j$. |
| $T$ | The total number of discrete time steps in the planning horizon, seconds. |
| $X_j(s_j)$ | The set of feasible control decisions, given barrier group state $s_j$. |
| $f_j(s_j, x_j)$ | Performance measure (objective function) at stage $j$, given barrier group state $s_j$ and control variable $x_j$. |
| $v_j(s_j)$ | Value function (cumulative value of prior performance measures), given state variable $s_j$. |
| $g_{r,p}$ | Green time of phase $p$ in ring $r$. |
| | Minimum green time of phase $p$ in ring $r$. |
| | Maximum green time of phase $p$ in ring $r$. |
| | Phase change interval, which is the total of the yellow change and red clearance times of phase $p$ in ring $r$. |
| | Minimum possible barrier group length of stage $j$. |
| | Maximum possible barrier group length of stage $j$. |

Table 2. The average vehicle delay of each phase (volume level is 4500 veh/h).

Table 3. The average vehicle delay of each phase (volume level is 3500 veh/h).

Table 4. The average vehicle delay of each phase (volume level is 2500 veh/h).