Trafﬁc Signal Control System Based on Intelligent Transportation System and Reinforcement Learning

: Trafﬁc congestion has several causes, including insufﬁcient road capacity, unrestricted demand and improper scheduling of trafﬁc signal phases. A great variety of efforts have been made to properly program such phases. Some of them are based on traditional transportation assumptions, and others are adaptive, allowing the system to learn the control law (signal program) from data obtained from different sources. Reinforcement Learning (RL) is a technique commonly used in previous research. However, properly determining the states and the reward is key to obtain good results and to have a real chance to implement it. This paper proposes and implements a trafﬁc signal control system (TSCS), detailing its development stages: (a) Intelligent Transportation System (ITS) architecture design for the TSCS; (b) design and development of a system prototype, including an RL algorithm to minimize the vehicle queue at intersections, and detection and calculation of such queues by adapting a computer vision algorithm; and (c) design and development of system tests to validate operation of the algorithms and the system prototype. Results include the development of the tests for each module (vehicle queue measurement and RL algorithm) and real-time integration tests. Finally, the article presents a system simulation in the context of a medium-sized city in a developing country, showing that the proposed system allowed reduction of vehicle queues by 29%, of waiting time by 50%, and of lost time by 50%, when compared to ﬁxed phase times in trafﬁc signals.


Introduction
High vehicle congestion causes around 40% of the world's population to spend at least one hour on the road every day [1].According to a report published by the INRIX organization, drivers in Bogotá (Colombia), the city, spend the most time in traffic congestion worldwide [2], losing an average of 191 h in year 2019 due to traffic congestion.
Facts such as this underscore the importance of looking for solutions to these traffic problems, especially in developing countries such as Colombia.These problems can have different causes, such as insufficient infrastructure capacity, unrestricted demand, and lack of synchronization in traffic signals [3].
Traffic signal control is an important and challenging real-world problem, which aims to minimize the travel time of vehicles by coordinating their movements at road intersections [4][5][6][7][8].
Many cities in developing countries feature traditional traffic signal control systems (TSCS), programmed with fixed times calculated by assumptions of expected traffic.Changes to those fixed times are burdensome and can take a long time (several days or weeks) when the need for such a change is detected.Other cities feature systems with dynamic time assignments by using loop sensors that detect passing vehicles and help determine the direction of an intersection requiring a longer green light time to decrease traffic queues [5].Usage of loop sensors for dynamic allocation of times of traffic signal plans has some limitations, mainly in heavy traffic where there are many stopped vehicles that do not pass through the sensors [4,5].
RL is the most used traditional Machine Learning approach for traffic signal control because it does not need an exact model of the traffic flow behavior in a network of intersections.RL is a concept where a system can learn how to bring the underlying process in a desired state according to the defined reward function using available actions.The system learns an optimal policy how to do that, and in the case of traffic signal control, the optimal signal program is learned.An RL-based agent gains knowledge through its learning process [15].
RL techniques are used to improve traffic signal control through trial-and-error searches.Compared with traditional transportation methods, RL avoids making hard assumptions about the traffic and learns directly from the feedback by trying different strategies [4].
There are four main components to formulate a traffic signal control problem under the framework of RL [5]:

•
Reward.A reward defines the goal in a RL problem.

•
State.The state captures the situation on the road and converts it to values.

•
Selection of action scheme.Different action schemes also influence the performance of traffic signal control strategies.

•
Coordination strategy.Achieving coordination is one of the challenges complicating the signal control problem.
The Q-learning algorithm is one of the most used RL approaches for traffic control [20] because it is relatively simple when compared to other types of RL algorithms.The Qlearning algorithm uses a table (Q-table), which stores values known as Q-values, that map to a state-action combination.A Q-value for a state-action combination represents the quality of an action taken from that state.Better Q-values imply better chances of getting greater rewards [15].
The definition of the state and the reward for applying RL to traffic signal control show is a key issue.Regarding the reward, it is important to consider the ultimate objective in urban transportation: minimizing vehicle travel time [4,5].Travel time is influenced not only by traffic signals, but by other elements such as queue length, waiting time, speed, throughput, and delay.Each one of these elements could be considered to define an adequate reward.Previous studies use combinations of these elements (such as [13][14][15][16][17]) to try to establish an adequate reward.Regarding the state, elements such as queue length, waiting time, volume, delay, speed, phase duration, and congestion are considered to represent the states in traffic signal control [4,5].Some proposals combine two or more of such elements or use novel elements such as images to represent the states, which results in a state with high dimensionality representation.In such cases, state approximation techniques such as tile-and coarse-coding, neural networks, etc., are considered.
Selection of some combinations of reward and states for application of RL increases the complexity of the proposed systems, making the dimensions of the matrices used by the RL algorithms too large and increasing the training time.In addition, improvement is not significant when compared to simple selection schemes using one element for reward and one for states [4,5].
A possible solution to this problem of large-dimension matrices (curse of dimensionality problem [21]), is the use of Deep Reinforcement Learning (DRL), which integrates deep learning methodologies with RL [21,22].The main purpose of this integration is approximating the policy function to tackle the curse of dimensionality problem, by using Deep Neural Network (DNN) based models [15].
The approach used in this research, as an initiative of the authors to consider the system application context, is the development of a simple TSCS which can be implemented at intersections in cities of developing countries.The proposed control system does not use a very large Q-table, thus avoiding the dimensionality problem.Moreover, basic parameters were selected to achieve a simple design of both state and reward to avoid using DRL.The proposed system uses the Q-learning algorithm, and considers the number of vehicles on each approaching lane and the green light time of the traffic signals of each one of the four directions of the intersection (West-East WE, North-South NS, East-West EW, South-North SN) as the state; the selected reward is the vehicle queue length.The actions considered by the system are to leave the green light time unchanged, increase or decrease it for each one of the directions of the intersection.This simple implementation eases data collection and reduces training time.
In addition to the RL approach, it is important to consider the advantages of Intelligent Transportation Systems (ITS) in the traffic control system.An ITS plays an important role in public transport management, traffic control, security, and other issues.Traffic flow detection is an important part of an ITS.Based on the real-time acquisition of urban road traffic flow information, an ITS provides intelligent guidance for relieving traffic jams and reducing environmental pollution [23].
A case study was used to evaluate the operation of the proposed TSCS, at a traffic signal intersection in the city of Popayán, Colombia.To test the operation of the TSCS in the integrated system (mainly the correct operation of the recommendation times of the phases of the traffic signals), it was also necessary to develop a simulated prototype of the system using the SUMO software 1.7.0 version [24] because a live operational test in the actual infrastructure was not possible.The obtained results were compared and discussed, with the support of statistical graphs and comparative tables to facilitate analysis.
The remainder of this paper is organized as follows: Section 2 presents the materials and methods used; including the literature review, design, development, testing, and simulation phases; Section 3 presents the results of the TSCS testing and simulations; Section 4 discusses the results; and Section 5 presents the conclusions of the study.

Materials and Methods
This section presents the materials and methods used for each of the stages of the research (systematic literature review, determination of an adequate ITS architecture, design and development of a system prototype, and designing and developing of system tests).

Systematic Literature Review
It is also important to highlight that this work conducts research on "How to perform adequate traffic control at intersections with traffic signals, in cities with infrastructure and budget limitations, considering the necessary standardization of mobility services for adequate interoperability".The hypothesis raised to answer the research question establishes that "It is possible to do so by developing a TSCS, focused on recommending times for traffic signals, using Machine Learning, and based on ITS services".
The work described in this paper included a rigorous systematic literature review performed with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology.In the literature review, the four stages of the methodology were performed (identification, screening, eligibility, and inclusion).Three databases were included in the identification phase: EBSCO, IEEE, and Science Direct.The query string used was: "Traffic control OR traffic lights OR ((intelligent transportation system) AND traffic)) AND artificial intelligence AND time", obtaining 3295 documents.After filtering for year and type of publication, a total of 1780 documents were obtained.In the screening phase, duplicate documents were eliminated and filtered by title and/or abstract, yielding 46 documents.In the eligibility phase, through specific criteria and review of the full document, 26 documents were selected.Finally, in the inclusion phase, the following four groups of documents were identified for which a qualitative and quantitative synthesis was performed:

•
Services or systems for intelligent control of traffic signal time.

•
Design or implementation of mobility services through ITS that are not directly related to TSCS.

•
Methodologies, models, analyses, algorithms, devices, and studies related to counting vehicles and control of traffic signal time.

•
Artificial Intelligences (AI) techniques implemented in mobility services, which cannot be included in any of the previous groups.
This systematic review made it possible to identify the important aspects to consider for TSCS development.From the 26 documents selected in the eligibility phase, the documents for the state of art were selected.The state of art documents [14], refs.[25][26][27][28][29][30][31][32][33][34][35][36][37][38][39][40][41][42][43][44] made it possible to identify important aspects for the work described in this paper.Through a detailed review, it was possible to identify which documents contained relevant aspects, and which characteristics of our work had not been considered by most of the works reviewed.Table 1 presents a summary of the analysis of the state of the art.
The novelty of the work described in this paper was determined by the relevant aspects for the implementation of a TSCS in the target context, which have not been considered by most of the works reviewed in the state of the art.The novel factors identified were:

•
The use of an ITS architecture as a basis for developing the prototype.

•
Collecting a dataset for traffic analysis.The work proposed the collection of videos of the selected intersection as a case study to evaluate traffic flow at seven different time ranges per day.

•
Using an algorithm to size vehicle queues.The proposed TSCS includes an algorithm for counting vehicles from the video signals captured at the intersection.

•
The development of the RL algorithm for recommendation of traffic signal times in a way as simple as possible, with basic parameters and a short execution time.Most proposals found use RL with multi-agent and/or deep learning, which increases system complexity.

•
Consideration of the target context, in aspects such as budget, lane characteristics and types of intersections.In cities similar to Popayán, traffic networks with lanes with similar flow of vehicles and with the same number of lanes are not common.Common intersections feature a main road or highway (with three or four vehicle lanes), and secondary roads with one or two lanes.

•
Validation of traffic control modules in a real environment.This was possible due to the limitation in scope regarding the number of considered intersections.The proposed TSCS considers only an intersection, because the objective was to go beyond a simulation of the operation in a microscopic traffic software, considering its design, development, and implementation.As shown in Table 1, most of the reviewed references consider only the simulation of the system.The proposed TSCS only considers the Q-learning algorithms, because one of the objectives of the work was to find an adequate traffic control option that would facilitate implementation in the considered context.Q-learning algorithms allow the implementation of RL without a steep increase of system complexity in terms of required components for information collection.

Determination of an Adequate ITS Architecture for the TSCS
An ITS aims to promote the safety and mobility of transport, reducing traffic and mitigating air pollution, among other benefits.These ITS are implemented in most developed countries through an ITS architecture and its services.An ITS architecture allows to identify the different components (modules) composing a certain service, describes the interaction between the actors (service users) and the components, and the information (content and structure) that will be exchanged between the components [45].
Reference architectures are used to define a context-aware suitable architecture for the TSCS.The most used reference architectures in the world include the American architecture, known as Architecture Reference for Cooperative and Intelligent Transportation (ARC-IT) [46]; the European architecture, known as European Intelligent Transport Systems Framework Architecture (FRAME) [47]; and the ISO 14813 architecture [48].These ITS architectures are very well documented and updated.
ITS architectures evaluated in the Colombian environment were also reviewed.The ITS national architecture initiative for Colombia, based upon ARC-IT, was launched in 2010 [49].The National ITS Architecture of Colombia is a guide for the ITS integration in Colombia.Although it was an appropriate initiative at the time, it did not achieve its mission and it has not been properly updated and adapted since its launch.Although the Colombian government has advanced in terms of ITS standardization (Act 1450 of 2011, Article 84; Decree 2060 of October 2015 [50]), this has been insufficient for adequate design, development, and implementation of mobility services in intermediate cities.
After conducting a review of the different service sets of the ARC-IT, FRAME, and ISO architectures, several elements were chosen to build a suitable architecture for the TSCS.
In the ARC-IT architecture [46], the service area that best suits the TSCS is the Traffic Management (TM) area.After reviewing the services proposed in this area, the Traffic Signal Control (TM03) and Traffic Metering (TM05) services were considered because the system must measure the traffic flow at an intersection to determine the better times for their traffic signals, thus reducing the vehicle waiting time.Figures 1 and 2 show the architecture of the TM03 and TM05 services.The European architecture (FRAME) proposes ten service areas; area number three (Manage Traffic) is directly related to the TSCS.This area includes five high-level functions.The Provide Traffic Control function was chosen, because it provides facilities for traffic management using the road network, functionality to manage the urban parts of the network, and facilities that allow data collection on the use of the road network and prioritizing certain vehicles [51].The European architecture (FRAME) proposes ten service areas; area number three (Manage Traffic) is directly related to the TSCS.This area includes five high-level functions.The Provide Traffic Control function was chosen, because it provides facilities for traffic management using the road network, functionality to manage the urban parts of the network, and facilities that allow data collection on the use of the road network and prioritizing certain vehicles [51].
Figure 3 presents the proposed architecture for the TSCS, which considers results of the systematic review of literature, and the reviewed ITS reference architectures.Three of the components used in the TM03 and TM05 ARC-IT services are included: ITS Roadway Equipment, Traffic Management Center (TMC) and Vehicles (presented as Driver in ARC-IT).The traffic measurement, traffic signal control, and basic road surveillance modules are also included.The three components of the proposed ITS architecture allow traffic monitoring, data, and video collection, determination of the best traffic signal times for a given intersection, and programming of the traffic signals phases of any intersection.The ITS traffic control system architecture proposed did not include the other actors (pedestrian and cyclist) suggested by ARC-IT architecture (Figures 1 and 2), because they have a very low degree of interaction with the traffic control system in the target context.The actors who want to cross the street near a high traffic intersection use elevated pedestrian bridges, or the pedestrian crossing zones located before the traffic signal, at the times that it is possible for them to cross.The ITS traffic control system architecture proposed did not include the other actors (pedestrian and cyclist) suggested by ARC-IT architecture (Figures 1 and 2), because they have a very low degree of interaction with the traffic control system in the target context.The actors who want to cross the street near a high traffic intersection use elevated pedestrian bridges, or the pedestrian crossing zones located before the traffic signal, at the times that it is possible for them to cross.
The proposed ITS architecture uses high-level functions from the FRAME architecture described in the provide traffic control area (Provide Urban Traffic Management and Provide Traffic Predictions).The proposed architecture maintains most of the physical components proposed in ARC-IT, but also considers the functionality of the two mentioned reference architectures (FRAME and ARC-IT).
The "Basic Road Surveillance" functional module (white box in Figure 3) is responsible for capturing and relaying the video signals from each one of the directions of a given intersection to the TMC.This functionality allows the "Traffic Operations Personnel" to visualize the level of video traffic of a certain intersection with traffic signals.
The "Roadway Traffic Metering" functional module converts video signals into data to determine the length of the vehicle queue in each direction of the intersection.This data is also sent to the TMC.
Finally, the functional module defined as "Roadway Traffic Signal Control" oversees programming the appropriate phases of the traffic signals at a given intersection.Its input is the set of values that the TMC determines as adequate.The adequate values are calculated after analyzing the current traffic data at the intersection and the environmental conditions using an AI algorithm.Next, in Section 2.3.1 the detailed design of the components and modules of the proposed prototype is presented.Section 2.3.2 presents the design and development of the RL traffic signal time recommendation AI algorithm, which is the principal smart component of TSCS.Finally, Section 2.3.3 presents the RL algorithm adjustments for simulation with the SUMO software are presented.

Description of Prototype's Components and Modules
Figure 4 shows the two physical components of the prototype: 1. ITS Roadway Equipment (ITS RE).Consists of a Raspberry PI 3 single-board computer (SBC) and a surveillance camera located in the structure of each of the traffic signals at the intersection.The Raspberry Pi SBC contains the three functional mod- Next, in Section 2.3.1 the detailed design of the components and modules of the proposed prototype is presented.Section 2.3.2 presents the design and development of the RL traffic signal time recommendation AI algorithm, which is the principal smart component of TSCS.Finally, Section 2.3.3 presents the RL algorithm adjustments for simulation with the SUMO software are presented.

Description of Prototype's Components and Modules
Figure 4 shows the two physical components of the prototype: 1.
ITS Roadway Equipment (ITS RE).Consists of a Raspberry PI 3 single-board computer (SBC) and a surveillance camera located in the structure of each of the traffic signals at the intersection.The Raspberry Pi SBC contains the three functional modules of the ITS RE: Basic Road Surveillance, Road Traffic Signal Control, and Road Traffic Measurement.2.
Traffic Management Center (TMC).The TMC is located on a server in the cloud.The three functional modules the TMC are: Basic Surveillance (TMC BS), Traffic Control (TMC TC), and Road Traffic Measurement (TMC RTM).
A description of the six functional modules of the prototype follows: 1.
The Basic Road Surveillance (ITS RE BRS) module is responsible for capturing the image of the road at each traffic lane (with help of the surveillance camera) and transmitting it to the TMC.This module can capture the video signal from each one of the directions in the intersection (using a camera in each direction) to reduce costs.

2.
The Road Traffic Signal Control (ITS RE RTSC) module sends the current traffic signal plan from the ITS RE to the TMC; this plan determines the order the intersection traffic signals change and the duration of the green light time of each of the signals.
This information allows to determine the state of the intersection at the TMC.This module also receives updates of the traffic signal plan (from the TMC) to adjust them according to instructions issued by the Traffic Operation Personnel (TOP).

3.
The Road Traffic Measurement (ITS RE RTM) module reviews video signals of current traffic in each direction of the intersection and counts the waiting vehicles (queue size), for this it captures an image periodically, detects and count the waiting vehicles using a computer vision algorithm.The used algorithm was an adaptation of the YOLO v3 algorithm [51].An example of the vehicle counting performed by the algorithm is shown in Figure 5. YOLO version 3 is a real-time object detection system that, unlike other systems such as Region-based Convolutional Neural Networks (R-CNN), applies a single neural network to the entire image.This network divides the image into regions and generates bounding boxes and probabilities for each region, making this system 100 times faster than Fast R-CNN [52].
Electronics 2021, 10, x FOR PEER REVIEW 9 of 32 YOLO v3 algorithm [51].An example of the vehicle counting performed by the algorithm is shown in Figure 5. YOLO version 3 is a real-time object detection system that, unlike other systems such as Region-based Convolutional Neural Networks (R-CNN), applies a single neural network to the entire image.This network divides the image into regions and generates bounding boxes and probabilities for each region, making this system 100 times faster than Fast R-CNN [52].

4.
The TMC Basic Surveillance (TMC BS) module receives the video signals from each of the four directions of the intersection and presents them on the web app, with which the TOP interacts.The Flask Python library [53] and the Ngrok tunneling application [54] were used to allow simultaneous presentation of the four video signals on the Web application.

5.
The TMC Traffic Control (TMC TC) module queries the traffic measurement parameters from the system database and transmits them to the RTM.Additionally, the TMC TC module queries the ITS RE about the current traffic state at the intersection.It requests the current state of the vehicle queue (short or long) and the green light time of each one of the directions of the intersection (short or long).The obtained response is an 8-bit array, where the four Most Significant Bits (MSB) are the green light times at each traffic signal (0 for short time, 1 for long time); and the four Least Significant Bits (LSB) represent the vehicle queue size in each direction (0 for short queue, 1 for long queue).6.
The TMC Traffic Measurement (TMC TM) module analyzes the current traffic state using the parameters obtained from the TMC TC module and makes a recommendation to update the time plan of the intersection's traffic signals.This module uses a RL traffic signal recommendation AI algorithm for such purposes.If the suggested recommendation differs from the current state, it is presented to the TOP in the Web app so that she can determine whether to update the traffic signal plan through the RTSC module.Information on the algorithm used in this module is detailed in Section 2.3.2.
A description of the data flows between the functional modules of the ITS RE and the TMC follows.

1.
Traffic measurement parameters.This information is sent from the TMC TM module to the ITS RE RTM through a database request.The parameters define three aspects: the periodicity (in seconds) of traffic measurement in the ITS RE; the maximum size of the vehicle queue in each of the directions of the intersection; and the values (in seconds) of the short and long times of the green light in each of the intersection directions.

2.
Traffic measurement status.Response message from the ITS RE RTM to the TM TMC.Contains the number of vehicles identified by the AI algorithm (queue sizes) at the intersection.This information is stored in the database.

3.
Current traffic signal control plan.Information about the current traffic signal phases and cycle at the intersection is coded in a string (e.g., STSTLTLT that means Short Time, Short Time, Long Time, and Long Time for each direction of intersection), which is sent from the ITS RE RTSC module to the corresponding module in the TMC.The traffic signal phases include the order in which the traffic signals change and the green light times (in seconds) for each one of them; this information is also stored in the database.4.
Updates to the traffic signal control plan.Message sent from the TMC TSC module to the respective ITS RE module through the database to update the traffic signal phases based on the recommendations that have been determined using the AI algorithm.
The message includes the order in which the traffic signals change and the green light times (in seconds) for each traffic signal coded into a string (e.g., LTLTSTST).

5.
Video feeds of the intersection in each direction.This message includes video feeds (in low or medium resolution) of the intersection in each direction, so the Traffic Operations Personnel (TOP) can see the respective videos.

RL Traffic Signal Time Recommendation AI Algorithm
The RL process is explained in Figure 6.The figure identifies the mentioned elements (states, actions, reward, and environment).The agent initially reviews the system environment (current state and possible actions), then decides which action to take (among the possible ones defined in the environment), executes the action, and in response receives a reward and a new state.This process is repeated continuously until an optimal state is achieved.
ceives a reward and a new state.This process is repeated continuously until an optimal state is achieved.
The objective of the RL algorithm is to minimize the length of the vehicle queue at intersections, for which the established rewards are the reduction of vehicle queues in each one of the directions.The action taken by the agent generates a new state in the traffic signals (new phase times).This new state will possibly increase or decrease the size of each of the vehicle queues.The algorithm aims to decrease the length of all long queues with the new assigned state, thus, yielding the highest reward.If this does not happen, the reward value obtained will vary (according to the established policies).The algorithm revises the new state of the vehicle queues two signal cycles after the state change.The objective of the RL algorithm is to minimize the length of the vehicle queue at intersections, for which the established rewards are the reduction of vehicle queues in each one of the directions.The action taken by the agent generates a new state in the traffic signals (new phase times).This new state will possibly increase or decrease the size of each of the vehicle queues.The algorithm aims to decrease the length of all long queues with the new assigned state, thus, yielding the highest reward.If this does not happen, the reward value obtained will vary (according to the established policies).The algorithm revises the new state of the vehicle queues two signal cycles after the state change.
Figure 7 presents the components of the implemented algorithm for the TSCS.The phases of the traffic signals for a 4-way intersection are effectively at least eight.In some cases, more than eight might be used when additional phases are used to allow special turns.Both the system analysis and the RL traffic signal time recommendation AI algorithm considered eight phases.But four of the eight phases do not change at any time: when the color in any traffic signal is red-yellow, or yellow (red-yellow is used in some countries to indicate that the traffic light will change from red to green).Each one of these static phases was assigned a duration of five seconds, considering the international standard duration for this color in traffic signals is 3-6 s (which is set for an intersection depending on factors such as: design speed, acceleration/decelerate rate, the width of intersection, and reaction time).Table 2 presents the eight considered phases for the chosen 4-way intersection.
Electronics 2021, 10, x FOR PEER REVIEW 11 of 32 Figure 7 presents the components of the implemented algorithm for the TSCS.The phases of the traffic signals for a 4-way intersection are effectively at least eight.In some cases, more than eight might be used when additional phases are used to allow special turns.Both the system analysis and the RL traffic signal time recommendation AI algorithm considered eight phases.But four of the eight phases do not change at any time: when the color in any traffic signal is red-yellow, or yellow (red-yellow is used in some countries to indicate that the traffic light will change from red to green).Each one of these static phases was assigned a duration of five seconds, considering the international standard duration for this color in traffic signals is 3-6 s (which is set for an intersection depending on factors such as: design speed, acceleration/decelerate rate, the width of intersection, and reaction time).Table 2 presents the eight considered phases for the chosen 4way intersection.By having four phases with static time, it is only required to define the time of four phases to determine a new time for the eight possible phases.For this reason, it is possible to encode the times of the four phases to which the time is modified as NS, SN, EW, and WE.The code defines the duration of the four phases for which the time can be modified (green light in any of the 4 traffic signals).Time for each direction is defined as either ST or LT (short time or long time).Defining the system in this way simplifies the algorithm and shortens the execution time.
In the developed algorithm, states were identified with an 8-bit array, the four MSB are green light times (0 for short time, 1 for long time) in each traffic signal, for each travel direction (WE, NS, EW, SN); and the four LSB represent vehicle queue lengths for each direction (0 for short queue, 1 for long queue).The actions were identified with a 4-digit array.Each digit controls an action for each traffic signal.If the green time in traffic signal is to be kept, the value of the respective action digit is 0; if the green time is to be decreased the value of the action digit is 1; and if it is to be increased the value is 2. Regarding rewards, if it is possible to decrease the vehicle queue in a certain direction, the most positive value of the reward is taken; if the vehicle queue is maintained, the reward is zero or less than the previous one; finally, if the vehicle queue is increased, the value of the reward is negative.The total reward is calculated by considering the reward obtained in each direction of the intersection.
The implemented algorithm operates in a similar way to the general way presented in Figure 6.The current state of the environment (current plan times and vehicle queue sizes) is the input to the algorithm.The output of the algorithm is the best course of action to perform with the times of each of the traffic signals, indicating whether the time should be maintained, increased, or decreased.The selected action generates a new state in the environment, and this new state is expected to improve traffic conditions.
The developed RL algorithm is based upon Gym from OpenAI [55].Gym is a toolkit for developing and comparing RL algorithms.The Gym exercise used as a base (called Taxi v3) is related to the movement of a taxicab in a 4 × 4 matrix which has obstacles in some positions [56].This exercise was used to facilitate identification and handling of states, actions, rewards, and environment for this specific system, and to identify the way to code this type of algorithm using the Python language.
The implemented algorithm was a Q-learning algorithm.The Q-values were initialized through a training process, which establishes these values depending upon certain basic reward policies.For example, a decrease in a vehicle queue generates a positive reward for the respective direction, while an increase in a vehicle queue generates a negative reward.
This training process takes approximately three to five minutes, depending on the number of training cycles.For the final tests, 10,000 cycles were used, and the training took approximately three minutes.This training would only be performed the first time the algorithm is executed; each time the algorithm is run the Q-values are updated, so the algorithm is constantly learning.
Once training is finished, traffic control operation starts.The agent exposes itself to the environment and receives different rewards by executing different actions.The Q-values are updated using the following equation [57]: where Q*(st, ac) is a new Q-value, Q(st, ac) is the current Q-value for a state-action pair, α (0 < α ≤ 1) is the learning rate, rw is the reward received from the environment, and γ (0 ≤ γ ≤ 1) is the discount rate, which determines how much importance the future rewards have.The parameters α (alpha) and γ (gamma) are known as "hyper-parameters" in Q-learning algorithms.
Figure 8 presents a flow chart explaining the algorithm operation.The Q-table is created and updated in an external text file, this allows to load information obtained from previous executions of the algorithm, allowing to continue the learning process.The algorithm is available at [58].The TSCS system learns during the interaction with the environment in the exploration phase through the algorithm of Figure 8.The algorithm would be executed in a real operating environment (with real traffic flow), but it was not possible to do so, because a traffic signal that could be updated and verified was not available.To verify its correct execution and operation, the values of the new TSCS states were updated in the database using external software.Since it was not possible to test the system in an integrated way with all the modules in a real environment, the development of a simulation through suitable software (SUMO) was proposed; this simulation required adjustments to the initially proposed algorithm which are explained in Section 2.3.3.The simulation made the generation of the Q-table easier; therefore, for a real environment, Q-table training could be performed initially.Such previous training would allow having adequate values before starting the interaction with the environment.
Although the algorithm developed for the real environment could not be tested in an integrated way with the rest of the components developed for the TSCS, it is important to know how it is proposed to learn it, through the generation and permanent updating of the Q-table, which is presented below.
An offline approach to generate the initial Q-table was used, considering a previously recorded response of the underlying traffic process (intersection) to be controlled.This is done to expedite the learning phase required before the system can be implemented.
Subsequently, the Q-table is updated during the interaction with the environment, which can take several minutes, hours, or days of execution in a real traffic environment (depending on the complexity of the environment).For this reason, the developed algorithm gives the option to suspend the process during the exploration phase and resume the update of the Q-table, loading it from a text file (which is continuously updated).The TSCS system learns during the interaction with the environment in the exploration phase through the algorithm of Figure 8.The algorithm would be executed in a real operating environment (with real traffic flow), but it was not possible to do so, because a traffic signal that could be updated and verified was not available.To verify its correct execution and operation, the values of the new TSCS states were updated in the database using external software.Since it was not possible to test the system in an integrated way with all the modules in a real environment, the development of a simulation through suitable software (SUMO) was proposed; this simulation required adjustments to the initially proposed algorithm which are explained in Section 2.3.3.The simulation made the generation of the Q-table easier; therefore, for a real environment, Q-table training could be performed initially.Such previous training would allow having adequate values before starting the interaction with the environment.
Although the algorithm developed for the real environment could not be tested in an integrated way with the rest of the components developed for the TSCS, it is important to know how it is proposed to learn it, through the generation and permanent updating of the Q-table, which is presented below.
An offline approach to generate the initial Q-table was used, considering a previously recorded response of the underlying traffic process (intersection) to be controlled.This is done to expedite the learning phase required before the system can be implemented.
Subsequently, the Q-table is updated during the interaction with the environment, which can take several minutes, hours, or days of execution in a real traffic environment (depending on the complexity of the environment).For this reason, the developed algorithm gives the option to suspend the process during the exploration phase and resume the update of the Q-table, loading it from a text file (which is continuously updated).
Next, an example of algorithm operation is presented.Bear in mind that the 4 MSB of the state array represent the green light times for each one of the travel directions in the intersection (WE, NS, EW, SN), and the 4 LSB represent the vehicle queue length for each travel direction.Thus, if the 4-way intersection is in the state "10101110", this means that the traffic signals in the WE and EW direction have a long green light time and the traffic signals in the NS and SN directions have a short green light time.In addition, the vehicle queue of the WE, NS, and EW directions exceed the maximum threshold and are considered as long queues, while the vehicle queue in the SN direction is short.
By asking the algorithm for the value of the action that generates the highest reward, the value "0202" could be obtained as an answer.This value causes the green light times of the traffic signals with the WE and EW directions to keep the same value, and the green light time of the traffic signals with the NS and SN directions to increase.With this action, the new status would be "11111110".The algorithm waits for two complete cycles of the traffic signals and rechecks the status at that time to verify if traffic conditions have improved.If conditions improve (e.g., one or more vehicle queues change from long to short) the reward will have a positive value.The Q-table is updated with each update of the status of the intersection.

RL Algorithm Adjustments for Simulation
Once the development of the prototype was finished, unit functional tests of some modules were performed (which are described in Section 2.4).However, testing the operation of the RL traffic signal time recommendation AI algorithm and its interaction with the environment in a real environment was complicated, because special permits were required from government entities in charge of traffic control in a city.In addition, it was important to perform a previous functionality assessment.For this reason, a new version of the algorithm (RL traffic signal time recommendation) was developed to allow simulation through SUMO, a microscopic traffic simulation software [24].Figure 9 presents a flow diagram explaining the operation of the algorithm allowing the simulation of the TSCS using SUMO.The algorithm can be downloaded at [59].The version of the algorithm presented in Figure 8 applies to a real operating environment (with real traffic flow), while the version of the algorithm presented in Figure 9 is required for evaluating performance with simulated traffic flows.
The generation of the Q-table in this algorithm (Figure 9) is performed in a similar way to that explained for the algorithm in Figure 8 (Section 2.3.2).The generation of the initial Q-table in this simulation algorithm and its permanent updating during the simulated cycles would allow adequately training the Q-table in a real environment, before starting the interaction.
Nevertheless, the algorithm presented in Figure 9 has important changes compared to the initial version presented in Figure 8.In this version of the algorithm, an integration was made between the initial version (which used only RL) and a SUMO interface, allowing simulation of the online traffic control named TraCI 1.7.0 version (Traffic Control Interface) [60].TraCI made it possible to measure the vehicle queues (generated in the simulation) and to program the times of the simulated traffic signals in each of the directions (using external files for time plan definition).
of the algorithm (RL traffic signal time recommendation) was developed to allow simulation through SUMO, a microscopic traffic simulation software [24].Figure 9 presents a flow diagram explaining the operation of the algorithm allowing the simulation of the TSCS using SUMO.The algorithm can be downloaded at [59].The version of the algorithm presented in Figure 8 applies to a real operating environment (with real traffic flow), while the version of the algorithm presented in Figure 9 is required for evaluating performance with simulated traffic flows.The generation of the Q-table in this algorithm (Figure 9) is performed in a similar way to that explained for the algorithm in Figure 8 (Section 2.3.2).The generation of the initial Q-table in this simulation algorithm and its permanent updating during the simulated cycles would allow adequately training the Q-table in a real environment, before starting the interaction.
Nevertheless, the algorithm presented in Figure 9   The traffic flow simulation was compared to two other options to evaluate the performance of the RL algorithm.The first option was the assignment of fixed times in the traffic signals, and the second option was a basic recommendation of traffic signal times, depending on the vehicle queue length for each direction.A setup variable (named "simulation_type") was used to configure the type of simulation to be performed.
The algorithm has two additional setup variables.The second setup variable named "route_generation" determines whether a route file should be generated for the SUMO simulation.The route file determines how many vehicles are going to travel on the simulated roads, along with their speeds and routes.SUMO performs the simulation with the existent route file.If the "route generation" variable is set to 1, a new route file will be generated; if the variable is set to 0, the existing one is used.Finally, the third setup variable named "Q-table_training" determines if a training of the RL algorithm should be performed.If the variable is set to 1, a training of the algorithm is performed, if the variable is set to 0, training is not performed, and the Q-table is loaded from a text file.
The algorithm simulates the steps corresponding to one day (86,400 s); one simulation step in SUMO corresponds to one second.By having different types of simulation, the algorithm allows to simulate traffic behavior at the intersection, using the same vehicle routes and different types of traffic control.The different tests performed with the developed algorithm are explained in Section 2.4.3.The simulated vehicle queue measurement is performed every 150 simulation steps, which corresponds to approximately two cycles of the traffic signals.

Designing and Developing of System Tests
The tests performed on the TSCS were: Each of them is described in detail below.

Design and Development of Computer Vision Tests (for Calculating Vehicle Queues)
In the TSCS prototype, the measurement of vehicle queues was proposed through the ITS RE RTM module.This module includes a vehicle queue AI algorithm which receives the video signals of the traffic of the four directions of an intersection as input and performs the periodic measurement of the vehicle queue in each direction.The algorithm was developed and tested in stages, increasing the complexity of the functionality and scaling the number of signals evaluated in each stage.Initially, the algorithm was tested using some publicly available images of roads, each with a considerable number of vehicles, analyzing the images one by one.Then, publicly available video files of traffic cameras at different intersections were used, measuring the vehicle queues periodically.These measurements were analyzed, reviewing which model offered by YOLO could have the best ratio between processing time and fidelity of detection results.These tests were also done by reviewing the videos one by one.Later, the counting was tested with a video taken and streamed from a USB camera connected to the Raspberry Pi SBC (in a line of vehicles not located at a traffic signal intersection).Finally, this process was scaled up to stream two videos simultaneously.
To test the vehicle queue algorithm in a 4-way intersection with traffic signals, an intersection in the city of Popayán was chosen as a case of study.This point has the needed characteristics and presents a high level of traffic.The selected point is located at latitude 2.459476 N and longitude 76.597016 W. Video was captured in the four directions of the intersection, and these videos were used for the final tests of the vehicle counting algorithm.The detailed information about design of captured video signals and the planned structure of the data set is presented in Section 2.4.2.
Two commonly used metrics in this type of research, precision and recall [61,62], were used to evaluate the performance of the computer vision algorithm for vehicle detection.Before presenting the definition of these two metrics, it is important to clarify the following definitions regarding the results obtained at the time of detecting a vehicle in an image or video using the algorithm [61,62]:

•
True positive (TP): the algorithm detects a vehicle, and a vehicle is present.

•
False positive (FP): the algorithm detects a vehicle, and a vehicle is not present.

•
False negative (FN): the algorithm does not detect a vehicle, and a vehicle is present.

•
True negative (TN): the algorithm does not detect a vehicle, and a vehicle is not present.
Precision is the ratio of the number of TP to the total number of positive predictions.The formula is as follows.
Precision = (TP)/(TP + FP) Recall is the ratio of the number of true positives to the total number of actual (relevant) objects: Recall = (TP)/(TP + FN) These two metrics were calculated for the final three types of computer vision tests, and the results obtained in each of them are presented in Section 3.1.

Design of Captured Video Signals and the Planned Data Set
To take the videos in the selected intersection, a series of recording sessions were performed, which consisted of taking videos of three cycles per traffic signal.Considering that the intersection has four directions (four traffic signals), each session required taking four videos, each one with a duration of three cycles, which was estimated at approximately 2 to 5 min.The duration of a session was estimated at approximately one hour.The plan for capturing the videos was to film on each of the seven days of the week (Monday to Sunday) and to hold a session in the morning and another in the afternoon, trying to do the filming at the times with peak traffic flow.Therefore, a total of approximately 14 h was planned for the 14 sessions.The principal limitations considered were unfavorable weather, achieving a good location (a high vantage point) to record the traffic, and visualizing the entire queue in each direction of the intersection.Videos were ordered by the day of the week, then by the time range of the day (morning or afternoon), and then by the direction at the intersection.Two months were allocated to perform the entire process, considering that filming would not take place on consecutive days.0.4, the algorithm tended to give too much importance to rewards of future states, which is not favorable because the objective is to find the best reward for the next state.While for values lower than 0.4 the reward was not significant, because an immediate reward in this case is not possible.Considering that the reward would be evaluated two cycles after the state change in the traffic signals, the ideal value was an intermediate value (0.4).
The chosen real intersection was represented in SUMO, considering the dimensions of the roads, the number of lanes in each direction, and the current phases of the traffic signals.Figure 10 presents the four directions of the intersection represented in SUMO, along with the vehicular flow, which accounts for differently sized vehicles (cars are yellow, buses are red, and trucks are green).The light color of each traffic signal is represented as a line across the road, before entering the intersection.Some aspects of the intersection such as U-turns were not considered.Figure 10 indicates the maneuvers a vehicle can perform at the intersection: go straight ahead, turn left, or turn right.The information collected in the video signal data set and the vehicle queue algorithm was used to simulate the vehicle frequency in each direction of the intersection.Operation was simulated for 5 days (86,400 execution steps in SUMO were equivalent to 86,400 simulated seconds, which is equivalent to the time of one day).Simulation was performed as follows: • Initially, the setup variables were set to the following values: simulation_type was set to 2, route_generation to 1, and Q-table_training to 1.This way, the simulation of the first day of TSCS operation was performed using the RL recommendation, generating a new vehicle routing file and training the algorithm's Q-table.The information collected in the video signal data set and the vehicle queue algorithm was used to simulate the vehicle frequency in each direction of the intersection.Operation was simulated for 5 days (86,400 execution steps in SUMO were equivalent to 86,400 simulated seconds, which is equivalent to the time of one day).Simulation was performed as follows: • Initially, the setup variables were set to the following values: simulation_type was set to 2, route_generation to 1, and Q-table_training to 1.This way, the simulation of the first day of TSCS operation was performed using the RL recommendation, generating a new vehicle routing file and training the algorithm's Q-table.

•
Once the simulation of the first day was done, results were compared to the other two types of simulations (basic recommendation and fixed times) for the first day.
To perform the other two simulations, the setup variables were configured as follows: for the second simulation (basic recommendation) simulation_type was set to 1, route_generation to 0, and Q_table_training to 0; for the third simulation (fixed times) simulation_type was set to 0, and the other two were unchanged.This guaranteed that results obtained with the three different types of simulation were tested, for the same vehicle routes in a day.

•
To simulate days 2, 3, 4, and 5, the first two steps were repeated.The algorithm guarantees randomness in the generation of vehicle routes, so as not to repeat the same route file on the other days.Each file generated from the routes has a random frequency of vehicles in each direction, at each time range of the day (7 time ranges per day were used: 0-6 a.m., 6-9 a.m., 9-11 a.m., 11 a.m.-2 p.m., 2-5 p.m., 5-8 p.m., 8 p.m.-0 a.m.).In this way, an attempt was made to approximate the behavior of a real environment, in which the frequency of vehicles is similar from day to day of the week, but with certain changes between hour ranges.
Once all the algorithm simulations had been performed, the information was analyzed and graphed.These results can be reviewed in Section 3.3.

Web Application Testing
The web application tests were designed according to the functionality identified with the prototype development methodology (Scrum).Through Scrum, the "user stories" of each one of the actors were identified.This same functionality determined by the "user stories" was verified through the tests of the web application.Each "user story" has one or more associated interfaces of the web application, therefore, when the functionality of the "user story" is verified, the interfaces of the web application were verified to operate as expected.Some additional "user stories" were related to other functional modules.For example, the "user stories" related to maps required the use of Google APIs, also the "user stories" related to the algorithms required verifying the interaction between the web application and such algorithms.

Real-Time Module Integration Testing
Finally, once the TSCS simulation yielded satisfactory results, some real-time tests of the system were performed to verify that the operating times were adequate and the system was able to operate in a production environment.The ITS RE RTM module was tested by inputting four video signals simultaneously, these four video signals were used to estimate the vehicle queue lengths by the AI algorithm applied to count the vehicles in each of the directions of the intersection and report the information to the database.The TSCS web interface was used to query the status of each of the traffic signals and the latest vehicle queue data in each direction of the intersection.Then, through the web interface, the execution of the recommendation algorithm was requested, and execution time was measured.This test was performed 20 times measuring the response time for vehicle queue length calculation and the response time of the recommendation algorithm.It was not possible to test the update of the traffic signal times, because it was not possible to obtain the necessary test infrastructure, nor was it possible to obtain authorization from the city government to test the system in a real setting.

Results
The results obtained in each of the groups of tests performed are presented below.

Computer Vision Results (Measurement of Vehicle Queues)
This Subsection presents the results of the tests outlined in Section 2.3.1.The first stage of testing, where publicly available images were used, allowed confirming that YOLOv3 adapts to the needs required by the TSCS for recognition and counting of vehicles, as it is an extremely fast and accurate model.At this stage, the metrics mentioned in Section 2.3.1 were not used.
In the second stage, with the help of publicly available traffic videos, 3 videos with an average duration of 60 s were reviewed, considering a periodicity of 10 s for measurement of the vehicle queue.From this stage onwards, the precision and recall metrics were used to evaluate the performance of the algorithm.At this stage, a comparison was made between two models (YOLOv3-320 and YOLOv3-tiny) to define the cost of machine vs. effectiveness of each model.Results showed that the best model to work with a Raspberry PI SBC was the YOLOv3-320 (which counts vehicles at 45 frames per second, and has a much lower error rate) with an average precision of 100% and an average recall of 90.28%.The results obtained with the YOLOv3-tiny model were very low in terms of recall, with an average of 29.72%.The results obtained for vehicle counting with the YOLOv3-320 model are presented in Table 3.For the third stage of testing, five videos taken at the selected intersection was used.The total duration of the videos was 700 s approximately, and a vehicle queue count was performed every 20 s. Results showed an average precision of 100% and an average recall of 94.34%.The results of this third stage are presented in Table 4.The results presented in Tables 3 and 4 correspond to tests performed to evaluate correct operation of the algorithm.Table 3 presents results using videos from the Internet, and Table 4 uses videos feeds from the selected intersection.These tests used a low number of images (one image every 20 s of video) and were useful to adjust some algorithm parameters and to perform the final tests (fourth stage).Stage 4 analyzed 50 images generated by reviewing a considerable number of videos (more than 1000 s of videos were reviewed to obtain the 50 collected images) directly at the intersection in the four directions (also taking an image every 20 s of each video).
At the stage 4 testing, a simultaneous counting of two videos was performed to test the scalability of transmission and counting of vehicle queues for the selected intersection, in this case using the data set of collected traffic videos.A comparison of the number of vehicles present at the intersection with respect to the vehicles counted by the algorithm in 50 images was made.Results of stage 4 are presented as a bar chart (Figure 11), where each bar represents the value for the recall metric.The precision metric of the algorithm for each of the 50 measurements was 100%, because in no case was a vehicle identified in the image, in a place that did not correspond.
An average recall of 92.67% was obtained in the final tests performed on the counting algorithm, which is considered acceptable for the TSCS prototype operation.In total, 14 out of the 50 analyzed images yielded a correct vehicle count, whereas in 36 of the 50 images the number of real vehicles was greater than the vehicles counted by the algorithm, with an average of two non-detected vehicles.In Figure 12, a screenshot at the time of performing one of the final stage tests, where the videos captured in the selected intersection were used, is shown.
the scalability of transmission and counting of vehicle queues for the selected intersecti in this case using the data set of collected traffic videos.A comparison of the number vehicles present at the intersection with respect to the vehicles counted by the algorit in 50 images was made.Results of stage 4 are presented as a bar chart (Figure 11), wh each bar represents the value for the recall metric.The precision metric of the algorit for each of the 50 measurements was 100%, because in no case was a vehicle identified the image, in a place that did not correspond.An average recall of 92.67% was obtained in the final tests performed on the count algorithm, which is considered acceptable for the TSCS prototype operation.In total, out of the 50 analyzed images yielded a correct vehicle count, whereas in 36 of the images the number of real vehicles was greater than the vehicles counted by the algorith with an average of two non-detected vehicles.In Figure 12, a screenshot at the time performing one of the final stage tests, where the videos captured in the selected inters tion were used, is shown.

Data Set of Collected Video Signals
This Subsection presents the results of video data set planned in Section 2.3.2.The data set of video signals was organized as follows.The first classification criterion was the day of the week; videos were not taken in consecutive weeks to increase data randomness.Within each day the videos were taken in two sessions, morning and afternoon (second criterion).The files were named with the following structure: travel direction in the intersection, (e.g., N-S which means that the shot was taken from cars coming from north to south), followed by the acronym of the day and finally the time at which videos were taken.Each video contains three cycles of green lights for each direction, yielding an approximate duration of 3:30 for the N-S and S-N directions and 1:40 for the E-W and W-E directions.A total of 56 videos were collected.Videos were coded in H.264 format at 480p resolution, because they allow an accurate count with the YOLO tool and do not consume a large amount of storage.This data set can be accessed at the following Google drive link: https://bit.ly/3apf7i6(accessed on 10 August 2021).
These videos were used to test the measurement algorithm.This data set can be used for future mobility projects related to traffic measurement or vehicle detection.

Results of TSCS Simulation Test Using SUMO
This Subsection presents the results of the tests outlined in Section 2.3.3.Three types of simulations were performed through using SUMO and the algorithm explained in Figure 9.To change the type of simulation, the parameter named "simulation_type" was

Data Set of Collected Video Signals
This Subsection presents the results of video data set planned in Section 2.3.2.The data set of video signals was organized as follows.The first classification criterion was the day of the week; videos were not taken in consecutive weeks to increase data randomness.Within each day the videos were taken in two sessions, morning and afternoon (second criterion).The files were named with the following structure: travel direction in the intersection, (e.g., N-S which means that the shot was taken from cars coming from north to south), followed by the acronym of the day and finally the time at which videos were taken.Each video contains three cycles of green lights for each direction, yielding an approximate duration of 3:30 for the N-S and S-N directions and 1:40 for the E-W and W-E directions.A total of 56 videos were collected.Videos were coded in H.264 format at 480p resolution, because they allow an accurate count with the YOLO tool and do not consume a large amount of storage.This data set can be accessed at the following Google drive link: https://bit.ly/3apf7i6(accessed on 10 August 2021).
These videos were used to test the measurement algorithm.This data set can be used for future mobility projects related to traffic measurement or vehicle detection.

Results of TSCS Simulation Test Using SUMO
This Subsection presents the results of the tests outlined in Section 2.3.3.Three types of simulations were performed through using SUMO and the algorithm explained in Figure 9.To change the type of simulation, the parameter named "simulation_type" was used.The "RL" simulation (setting simulation_type to 2) is used by the recommendation function to obtain the new state in the traffic signals (with the Q-table).The "basic" simulation (setting simulation_type to 1) changes the state of a traffic light (in a certain direction) to a long time (in green) if the queue of vehicles is long, and vice versa.Finally, the "fixed" simulation (setting simulation_type to 0) uses fixed green light times for each of the intersection traffic signals.The fixed times for each direction of the intersection were defined by simulation operation tests to allow the best operation.The ideal value for this parameter was 20 s in all directions.
Initially, the objective of reducing vehicle queues in all directions was tested.The algorithm used the vehicle queues measured every two cycles of the traffic signals in the TSCS database.Measurements were made for each of the three types of simulation ("RL", "basic", and "fixed") during five days of the week.Some results obtained with the vehicle queue measurements (day 1 and day 2) for the three simulation types are presented in Figures 13 and 14.The results show that the mean size of vehicle queues obtained in the "RL" simulation type was shorter than the mean size of vehicle queues obtained in the "fixed" simulation (28.83% shorter for day 1 and 29.66% shorter for day 2).In addition, the mean size of vehicle queues for the "RL" simulation was shorter (2% for day 1 and 4% for day 2) than the mean size of vehicle queues obtained in the "basic" simulation."RL" simulation type was shorter than the mean size of vehicle queues obtained in the "fixed" simulation (28.83% shorter for day 1 and 29.66% shorter for day 2).In addition, the mean size of vehicle queues for the "RL" simulation was shorter (2% for day 1 and 4% for day 2) than the mean size of vehicle queues obtained in the "basic" simulation.Although the difference between the vehicle queue size of each of the simulation types verifies the fulfillment of the objective of the RL algorithm, it does not represent information that can be analyzed in detail.Therefore, other variables were searched in the simulation results file (generated by SUMO for each simulation run).The variables selected from the file were waiting time and time loss.A low value of these variables means better traffic behavior at the intersection, because vehicles must wait less time at the intersection and lose less time moving.These variables are directly related to the vehicle queues, because if the vehicle queue becomes shorter, the waiting time and time loss decrease accordingly.Although the difference between the vehicle queue size of each of the simulation types verifies the fulfillment of the objective of the RL algorithm, it does not represent information that can be analyzed in detail.Therefore, other variables were searched in the simulation results file (generated by SUMO for each simulation run).The variables selected from the file were waiting time and time loss.A low value of these variables means better traffic behavior at the intersection, because vehicles must wait less time at the intersection and lose less time moving.These variables are directly related to the vehicle queues, because if the vehicle queue becomes shorter, the waiting time and time loss decrease accordingly.
The results file has a value of these variables for each vehicle (in seconds).To compare the three types of simulation in traffic control, the mean of the values of the variables The results file has a value of these variables for each vehicle (in seconds).To compare the three types of simulation in traffic control, the mean of the values of the variables was calculated for each day, both for each of the time ranges, and for the entire day.Figures 17 and 18 show the behavior of the three types of simulations by time range, for the waiting time variable.Figure 17 shows the results of day 1 and Figure 18 the results of day 2. The charts show better values for most of the time ranges when the RL recommendation type simulation is used.
Figures 19 and 20 show the behavior of the three types of simulations by time range, for the time loss variable.Behavior is like that of the waiting time variable.
The RL recommendation for changing the times of the phases of the traffic signals is better when compared to the traffic control with fixed times, in percentages ranging from 46.86% to 52.70% (49.79% on average for the five days), which is a considerable improvement.The RL recommendation is also better with respect to traffic control with the basic recommendation, in percentages ranging from 2.69% to 6.74% (4.6% on average for the five days).Regarding time loss, the algorithm using the RL recommendation simulation type is also better compared to the traffic control with fixed times in the traffic signals, in percentages ranging from 46.98% to 52.32% (49.65% on average for the five days).The RL recommendation is also better with respect to traffic control with the basic recommendation, in percentages ranging from 2.45% to 5.75% (4.1% on average for the five days).
The behavior of the simulations performed during each of the five days is similar, approximately the same percentages of difference between the three types of simulation are maintained on each day.This behavior occurs in the two variables (waiting time and time loss).This indicates that the behavior of the RL recommendation for the times of the traffic signals operates properly even if there are changes in the routes and frequencies of the vehicles.Figures 17 and 18 show the behavior of the three types of simulations by time range, for the waiting time variable.Figure 17 shows the results of day 1 and Figure 18 the results of day 2. The charts show better values for most of the time ranges when the RL recommendation type simulation is used.
Figures 19 and 20 show the behavior of the three types of simulations by time range, for the time loss variable.Behavior is like that of the waiting time variable.
The RL recommendation for changing the times of the phases of the traffic signals is better when compared to the traffic control with fixed times, in percentages ranging from 46.86% to 52.70% (49.79% on average for the five days), which is a considerable improvement.The RL recommendation is also better with respect to traffic control with the basic recommendation, in percentages ranging from 2.69% to 6.74% (4.6% on average for the five days).
Regarding time loss, the algorithm using the RL recommendation simulation type is also better compared to the traffic control with fixed times in the traffic signals, in percentages ranging from 46.98% to 52.32% (49.65% on average for the five days).The RL  recommendation is also better with respect to traffic control with the basic recommendation, in percentages ranging from 2.45% to 5.75% (4.1% on average for the five days).
The behavior of the simulations performed during each of the five days is similar, approximately the same percentages of difference between the three types of simulation are maintained on each day.This behavior occurs in the two variables (waiting time and time loss).This indicates that the behavior of the RL recommendation for the times of the traffic signals operates properly even if there are changes in the routes and frequencies of the vehicles.

Results of the Web Application Operation Tests
This Subsection presents the results of the tests outlined in Section 2.4.4.The functionality planned for each of the users of the web application was successfully developed.Integration of the web application with the two developed algorithms (vehicle queue and RL traffic signal recommendation) required some adjustments, because some minor inconveniences were detected during the initial tests.All other operational tests of the web application functionality were successful.Figure 22 shows a table of vehicle queue sizes at the managed intersections.This information is updated in real time.
Figure 23 shows the interface for the RL algorithm execution at a given intersection.The interface shows the current green light times and the queue size for each direction.The new green light time recommendations are also shown, and the TOP user is asked to confirm the update.Figure 22 shows a table of vehicle queue sizes at the managed intersections.This information is updated in real time.
Figure 23 shows the interface for the RL algorithm execution at a given intersection.The interface shows the current green light times and the queue size for each direction.The new green light time recommendations are also shown, and the TOP user is asked to confirm the update.

Results of Real-Time Module Integration Testing
The real-time module integration testing aimed to evaluate the way in which the different components interact and exchange information.Additionally, they were performed to determine the response times of the TSCS in an operating environment at an intersection.This test was performed 20 times measuring the response time of the vehicle queue measurement algorithm using four video signals (one signal in each direction of the intersection) and the response time of the recommendation algorithm.
The 4 vehicle queues were calculated in an average of 1.57 s.Each video feed captures an image approximately every 20 s to be analyzed by the algorithm.The selection of the image in each of the four video feeds can be done in parallel to save time.Data is recorded in the database when the last queue is processed; the maximum delay time between the start of the measurement and the data recorded in the database was 2.59 s on average, for the 20 tests performed.The mean response time of the time recommendation algorithm was around 7 s.These time delays are adequate because they are much lower than the cycle times of the traffic signals at the intersection (150 s) and than the time for measuring the vehicle queues (20 s).

Discussion
The ITS architecture designed for the TSCS is based upon international reference architectures (ARC-IT and FRAME) and the context of a city in a developing country.This

Results of Real-Time Module Integration Testing
The real-time module integration testing aimed to evaluate the way in which the different components interact and exchange information.Additionally, they were performed to determine the response times of the TSCS in an operating environment at an intersection.This test was performed 20 times measuring the response time of the vehicle queue measurement algorithm using four video signals (one signal in each direction of the intersection) and the response time of the recommendation algorithm.
The 4 vehicle queues were calculated in an average of 1.57 s.Each video feed captures an image approximately every 20 s to be analyzed by the algorithm.The selection of the image in each of the four video feeds can be done in parallel to save time.Data is recorded in the database when the last queue is processed; the maximum delay time between the start of the measurement and the data recorded in the database was 2.59 s on average, for the 20 tests performed.The mean response time of the time recommendation algorithm was around 7 s.These time delays are adequate because they are much lower than the cycle times of the traffic signals at the intersection (150 s) and than the time for measuring the vehicle queues (20 s).

Discussion
The ITS architecture designed for the TSCS is based upon international reference architectures (ARC-IT and FRAME) and the context of a city in a developing country.This ITS architecture of the TSCS can be used by projects in cities that have a similar context, for development of prototypes of this system.Likewise, the procedure used for the design of the ITS architecture can be taken as a model for design of an architecture of another system related to ITSs in future research.
The developed TSCS prototype used the proposed ITS architecture as its main source.In addition, specific technologies and hardware elements were proposed, considering budget limitations for a city in a developing country.The prototype was developed and tested with good results; therefore, it can be taken as a reference for developments in similar cities.Although the traffic control tests of the prototype were simulated due to logistical limitations of performing such control tests in a real environment, an attempt was made to reflect an environment as close to reality as possible.
Analyzing the results obtained from the simulation, it was possible to show that the system had similar results to works reviewed in literature related to RL (such as [4], [5]).The developed prototype improves traffic (specifically, waiting time and time loss of vehicles) at a traffic signal intersection by approximately 50% when compared to traffic control performed with traffic signals using fixed green light times.This percentage is similar to those obtained in related research.However, the prototype in this work uses a simpler, standardized architecture, featuring viable components and low complexity algorithms for determination of state-actions, coding and execution efficiency.
The results of the proposed prototype were also compared to the use of a basic recommendation algorithm.The waiting and loss times for the traffic control prototype were approximately 5% lower than those obtained with the basic recommendation, which represents a smaller scale, nevertheless significant improvement.
The improvement reported in other works related to traffic control using RL shows that using complex proposals (in terms of determination of states, actions, and rewards and more robust algorithms) yields not so significant improvements, with a low percentage of reduction of times of vehicles at the intersection.Therefore, it is considered that the percentage of improvement of this prototype is significant, considering its low complexity and limited implementation budget.
The developed prototype implemented two AI algorithms with very good results.The vehicle queue algorithm allowed counting the waiting vehicles in each direction of the intersection, with an average recall of 92.67% and average precision of 100%.The values of these metrics are considered as sufficient and adequate.
The RL recommendation time algorithm was developed in two versions.The first version allowed integration with the prototype in production, but it was not possible to test it in a real environment (due to the already mentioned logistical limitations).The second version allowed simulation of the prototype using the SUMO software, yielding positive results.
The hyper-parameter values of the RL recommendation time algorithm were modified, seeking the best results in the simulation in terms of vehicle queue length.The values of the hyper-parameters yielding the best results were alpha = 0.6 and gamma = 0.4.
Once the algorithms were independently tested, real-time integration tests were performed to determine if the response time was adequate.The results obtained showed that the maximum delay time between the start of the measurement of the four vehicle queues and the values being recorded in the database was 2.59 s on average, for the 20 tests performed.The response time mean of the time recommendation algorithm was 7 s.These times were considered adequate, because the time of a traffic signal cycle is approximately 150 s and the period between queue measurement is approximately 20 s.The system operator (TOP) can request for the execution of the time recommendation algorithm at any time, which has as input information from the vehicle queue (data collected 23 s ago or less), and the response is obtained in approximately 7 s.
Another significant result of the developed work was the compiled data set of video signals.These videos allow visualizing the behavior of current traffic in the four directions of the intersection selected as a use case.The measurements were taken in the morning and in the afternoon of seven days, collecting 56 videos, and a total of 240 min of recording.This data set was useful to determine the frequency of vehicles at the intersection (to configure the developed simulation) and may be useful for future work related to traffic control in similar cities.
Finally, the developed web application and database allow interaction of users (administrator and TOP) with the TSCS.The interfaces allow an intuitive interaction and are fully integrated with the developed algorithms to allow an adequate traffic control of the intersections managed by the system.
Recommendations for future work related to this research are focused on three aspects.The first aspect is to allow scaling in the flexibility of the developed prototype, in such a way that it is possible to configure the number of directions of an intersection and the number of lanes in each direction.This would allow applying this prototype in a greater number of intersections in a city, with a simpler configuration.The second aspect is to allow more categories for green light times (currently limited to Short Time and Long Time), and describing the possible states in the environment in a greater detail.This improvement could allow obtaining better results in the designed and executed tests.The third aspect is the implementation of a TSCS with a broader scope (more than one intersection) proposed in the same target context.The extension of the scope could be tested with independent agents and with multiple RL agents to evaluate the results obtained in each case.It is important to consider that the development of the complete system and its implementation are proposed, not only the simulation, for which the required hardware and communications components should be considered.

Conclusions
The developed prototype for the TSCS was based on a suitable ITS architecture, which considered international reference architectures and the context of a city in a developing country.The prototype allows integration with other mobility systems due to its modular design.This work showed that RL is a suitable approach for this type of situation where decision-based learning is required.
Additionally, the selected components of the prototype, the suggested technologies, and the simplicity of the algorithms should allow easy implementation in cities with similar contexts.Most of the components of the developed TSCS were tested in a real environment, obtaining very good results.
The prototype component in charge of counting the vehicles through computer vision was tested through four phases, obtaining good results.The precision and recall metrics were used to evaluate performance.In the final testing phase of this module, recall was 92.67% and precision was 100%.These results were considered sufficient and adequate, because the TSCS does not require calculating an exact number of vehicles to determine if the vehicle queue is long or short.
The component in charge of making the recommendation time for the traffic signals was also tested.The operation was as expected, and real-time integration tests were performed to guarantee that the response time of the system was consistent with the operation that it would have in a real environment.The response time when requesting a new status of the traffic signals (new times for the phases) was also consistent with the duration of the cycle of the traffic signals at the intersection.
Testing the full functionality of the TSCS, including the traffic signals time modification and measurement of results obtained with this change, would require having an available intersection with traffic signals controlling vehicle flows in all directions.This required scenario was not available for logistics and safety reasons.Therefore, a simulation of the TSCS operation was performed using the SUMO simulation tool.This simulation was tested by comparing it with two other types of traffic control (fixed times in the traffic signals and traffic control with a basic recommendation).The simulation results identified an improvement in the main goal; vehicle queue length was reduced approximately 29% with respect to control with fixed times, and 3% with respect to basic recommendation.Moreover, the simulation identified a reduction of the waiting time and time loss of the vehicles passing through the intersection of approximately 50% with respect to control with fixed times, and of approximately 5% with respect to control with a basic recommendation, with the advantage of low complexity in the development of the system.

Electronics 2021 ,
10, x FOR PEER REVIEW 6 of 32 their traffic signals, thus reducing the vehicle waiting time.Figures 1 and 2 show the architecture of the TM03 and TM05 services.

Figure 1 .
Figure 1.ITS Architecture for the TM03 (Traffic Signal Control) service proposed by ARC-IT [46].Figure 1. ITS Architecture for the TM03 (Traffic Signal Control) service proposed by ARC-IT [46].

Figure 1 .
Figure 1.ITS Architecture for the TM03 (Traffic Signal Control) service proposed by ARC-IT [46].Figure 1. ITS Architecture for the TM03 (Traffic Signal Control) service proposed by ARC-IT [46].

Figure 1 .
Figure 1.ITS Architecture for the TM03 (Traffic Signal Control) service proposed by ARC-IT [46].

Figure 3
Figure 3 presents the proposed architecture for the TSCS, which considers results of the systematic review of literature, and the reviewed ITS reference architectures.Three of the components used in the TM03 and TM05 ARC-IT services are included: ITS Roadway Equipment, Traffic Management Center (TMC) and Vehicles (presented as Driver in ARC-IT).The traffic measurement, traffic signal control, and basic road surveillance modules are also included.The three components of the proposed ITS architecture allow traffic monitoring, data, and video collection, determination of the best traffic signal times for a given intersection, and programming of the traffic signals phases of any intersection.

Figure 4
Figure 4 presents a diagram of the prototype, with the selected components, modules and technologies.Electronics 2021, 10, x FOR PEER REVIEW 8 of 32

Figure 5 .Figure 5 .
Figure 5. Example of vehicle queue counting by algorithm (based in YOLO v3). 4. The TMC Basic Surveillance (TMC BS) module receives the video signals from each of the four directions of the intersection and presents them on the web app, with which the TOP interacts.The Flask Python library [53] and the Ngrok tunneling application [54] were used to allow simultaneous presentation of the four video signalsFigure 5. Example of vehicle queue counting by algorithm (based in YOLO v3).
has important changes compared to the initial version presented in Figure 8.In this version of the algorithm, an integration was made between the initial version (which used only RL) and a SUMO interface, allowing simulation of the online traffic control named TraCI 1.7.0 version (Traffic Control Interface) [60].TraCI made it possible to measure the vehicle queues (generated in the simulation) and to program the times of the simulated traffic signals in each of the directions (using external files for time plan definition).

•
Design and development of computer vision tests (for calculating vehicle queues); • Design of captured video signals and the planned data set; • Design and development of TSCS simulation tests using SUMO; • Web application testing; • Real-time module integration testing.

Figure 10 .
Figure 10.Screenshot of RL algorithm simulation (using SUMO) indicating the maneuvers a vehicle can perform at the intersection.The yellow triangles on the roads simulate small vehicles, the red rectangles simulate buses, while the green rectangles simulate large trucks.Green arrows indicate the maneuvers a vehicle can perform at the intersection.

Figure 10 .
Figure 10.Screenshot of RL algorithm simulation (using SUMO) indicating the maneuvers a vehicle can perform at the intersection.The yellow triangles on the roads simulate small vehicles, the red rectangles simulate buses, while the green rectangles simulate large trucks.Green arrows indicate the maneuvers a vehicle can perform at the intersection.

Figure 11 .
Figure 11.Results of the counting algorithm used on data set (stage 4 of tests of the measurement vehicles queue).

Figure 11 . 32 Figure 12 .
Figure 11.Results of the counting algorithm used on data set (stage 4 of tests of the measurement vehicles queue).Electronics 2021, 10, x FOR PEER REVIEW 21 of 32

Figure 12 .
Figure 12.Screenshot: fourth stage of tests, measurement of vehicle queue.

Figure 13 .
Figure 13.Mean size of vehicle queues, on day 1 by time ranges, for each type of simulation.Figure 13.Mean size of vehicle queues, on day 1 by time ranges, for each type of simulation.

Figure 13 .
Figure 13.Mean size of vehicle queues, on day 1 by time ranges, for each type of simulation.Figure 13.Mean size of vehicle queues, on day 1 by time ranges, for each type of simulation.

Figure 13 .
Figure 13.Mean size of vehicle queues, on day 1 by time ranges, for each type of simulation.

Figure 14 .
Figure 14.Mean size of vehicle queues, on day 2 by time ranges, for each type of simulation.

Figure 14 .
Figure 14.Mean size of vehicle queues, on day 2 by time ranges, for each type of simulation.

Figures 15 -
20 present some of the obtained results.Figures 15 and 16 present the mean of waiting time and loss time (in seconds) of the vehicles, during 5 days, for each type of simulation.Figures 17 and 18 present the mean of waiting time (in seconds) for each of the time ranges, for two days.Figures 19 and 20 present the mean of time loss (in seconds) for each of the time ranges, for the same two days.Figures 15 and 16 show that simulation through the RL recommendation type has a better behavior (lower values) in terms of waiting time, when compared to the other two types of simulation.Electronics 2021, 10, x FOR PEER REVIEW 23 of 32 was calculated for each day, both for each of the time ranges, and for the entire day.Figures 15-20 present some of the obtained results.Figures 15 and 16 present the mean of waiting time and loss time (in seconds) of the vehicles, during 5 days, for each type of simulation.Figures 17 and 18 present the mean of waiting time (in seconds) for each of the time ranges, for two days.Figures 19 and 20 present the mean of time loss (in seconds) for each of the time ranges, for the same two days.Figures 15 and 16 show that simulation through the RL recommendation type has a better behavior (lower values) in terms of waiting time, when compared to the other two types of simulation.

Figure 15 .
Figure 15.Mean of waiting time (in seconds) of the vehicles, during 5 days, for each simulation.Figure 15.Mean of waiting time (in seconds) of the vehicles, during 5 days, for each simulation.

Figure 15 .
Figure 15.Mean of waiting time (in seconds) of the vehicles, during 5 days, for each simulation.Figure 15.Mean of waiting time (in seconds) of the vehicles, during 5 days, for each simulation.

Figure 15 .
Figure 15.Mean of waiting time (in seconds) of the vehicles, during 5 days, for each simulation.

Figure 16 .
Figure 16.Mean of time loss (in seconds) of the vehicles, during 5 days, for each simulation.Figure 16.Mean of time loss (in seconds) of the vehicles, during 5 days, for each simulation.

Figure 16 . 32 Figure 17 .
Figure 16.Mean of time loss (in seconds) of the vehicles, during 5 days, for each simulation.Figure 16.Mean of time loss (in seconds) of the vehicles, during 5 days, for each simulation.Electronics 2021, 10, x FOR PEER REVIEW 24 of 32

Figure 18 .
Figure 18.Mean of waiting time (in seconds), on day 2 by time ranges, for each type of simulation.

Figure 17 .
Figure 17.Mean of waiting time (in seconds) of the vehicles, on day 1 by time ranges, for each simulation.

Figure 17 .
Figure 17.Mean of waiting time (in seconds) of the vehicles, on day 1 by time ranges, for each simulation.

Figure 18 .
Figure 18.Mean of waiting time (in seconds), on day 2 by time ranges, for each type of simulation.Figure 18. Mean of waiting time (in seconds), on day 2 by time ranges, for each type of simulation.

Figure 18 . 32 Figure 19 .
Figure 18.Mean of waiting time (in seconds), on day 2 by time ranges, for each type of simulation.Figure 18. Mean of waiting time (in seconds), on day 2 by time ranges, for each type of simulation.Electronics 2021, 10, x FOR PEER REVIEW 25 of 32

Figure 19 .
Figure 19.Mean of time loss (in seconds), on day 1 by time ranges, for each type of simulation.Figure 19.Mean of time loss (in seconds), on day 1 by time ranges, for each type of simulation.

Figure 19 .
Figure 19.Mean of time loss (in seconds), on day 1 by time ranges, for each type of simulation.

Figure 20 .
Figure 20.Mean of time loss (in seconds), on day 2 by time ranges, for each type of simulation.

Figure 20 .
Figure 20.Mean of time loss (in seconds), on day 2 by time ranges, for each type of simulation.

3. 4 .
Results of the Web Application Operation TestsThis Subsection presents the results of the tests outlined in Section 2.4.4.The functionality planned for each of the users of the web application was successfully developed.Integration of the web application with the two developed algorithms (vehicle queue and RL traffic signal recommendation) required some adjustments, because some minor inconveniences were detected during the initial tests.All other operational tests of the web application functionality were successful.Figures21-23show screenshots of the web application interfaces for some of the main TSCS functions.Figures21 and 22correspond to administrator-type user functionality, whereas Figure23corresponds to functionality of the TOP type user.Figure21shows a map of the city.The red markers pinpoint the location of the intersections managed by the system.
Figures 21-23 show screenshots of the web application interfaces for some of the main TSCS functions.Figures 21 and 22 correspond to administrator-type user functionality, whereas Figure 23 corresponds to functionality of the TOP type user.Figure 21 shows a map of the city.The red markers pinpoint the location of the intersections managed by the system.

Table 1 .
Summary of evaluation of state of art.
Number Traffic Light 1 Traffic Light 2 Traffic Light 3 Traffic Light 4

Table 3 .
Test results for publicly available traffic videos.

Table 4 .
Video test results for selected intersection traffic.