Integrating Visible Light Communication and AI for Adaptive Traffic Management: A Focus on Reward Functions and Rerouting Coordination

Manuela Vieira; Gonçalo Galvão; Manuel A. Vieira; Mário Vestias; Paula Louro; Pedro Vieira

doi:10.3390/app15010116

¹ DEETC-ISEL/IPL, R. Conselheiro Emídio Navarro, 1949-014 Lisboa, Portugal

² UNINOVA-CTS and LASI, Quinta da Torre, Monte da Caparica, 2829-516 Caparica, Portugal

³ NOVA School of Science and Technology, Quinta da Torre, Monte da Caparica, 2829-516 Caparica, Portugal

⁴ INESC INOV, R. Alves Redol, 9, 1000-029 Lisboa, Portugal

Appl. Sci. 2025, 15(1), 116; https://doi.org/10.3390/app15010116

This article belongs to the Special Issue Novel Advances in Internet of Vehicles

Abstract

This study combines Visible Light Communication (VLC) and Artificial Intelligence (AI) to optimize traffic signal control, reduce congestion, and enhance safety. Utilizing existing road infrastructure, VLC technology transmits real-time data on vehicle and pedestrian positions, speeds, and queues. AI agents, powered by Deep Reinforcement Learning (DRL), process these data to manage traffic flows dynamically, applying anti-bottlenecking and rerouting techniques. A global agent coordinates local agents, enabling indirect communication and a unified DRL model that adjusts traffic light phases in real time using a queue/request/response system. A key focus of this work is the design of reward functions for standard and rerouting scenarios. In standard scenarios, the reward function prioritizes wide green bands for vehicles while penalizing pedestrian rule violations, balancing efficiency and safety. In rerouting scenarios, it dynamically prevents queuing spillovers at neighboring intersections, mitigating cascading congestion and ensuring safe, timely pedestrian crossings. Simulation experiments in the SUMO urban mobility simulator and real-world trials validate the system across diverse intersection types, including four-way crossings, T-intersections, and roundabouts. Results show significant reductions in vehicle and pedestrian waiting times, particularly in rerouting scenarios, demonstrating the system’s scalability and adaptability. By integrating VLC technology and AI-driven adaptive control, this approach achieves efficient, safe, and flexible traffic management. The proposed system addresses urban mobility challenges effectively, offering a robust solution to modern traffic demands while improving the travel experience for all road users.

Keywords:

Visible Light Communication (VLC); Artificial Intelligence (AI); Deep Reinforcement Learning (DRL); traffic signal control; traffic management; reward functions; rerouting scenarios; pedestrian safety; congestion reduction; urban mobility; adaptive traffic control

1. Introduction

Urban traffic management presents a significant challenge for modern cities, as the growing volume of vehicles and pedestrians contributes to congestion, delays, and safety risks [1]. Given the spatial constraints in urban areas, expanding road infrastructure is no longer a viable solution. Instead, optimizing traffic flow at intersections has become critical, with adaptive traffic signal control emerging as one of the most effective strategies.

Adaptive systems can significantly reduce congestion and improve intersection efficiency by using real-time data from traffic networks, such as traffic flow, waiting times, and vehicle queues. Visible Light Communication (VLC) offers a versatile and efficient solution for modern traffic management, seamlessly integrating into existing infrastructure, such as vehicles, streetlights, and traffic signals [2,3]. Beyond illumination and communication, VLC enables dynamic and adaptive traffic signal control [4,5], particularly when combined with AI. This paper introduces a novel approach to traffic management based on DRL that emphasizes the design and impact of reward functions tailored for standard and rerouting scenarios, enhancing both vehicular efficiency and pedestrian safety.

Intersections are key bottlenecks in road networks, making intelligent signal control crucial for improving traffic flow. Recent advances in Deep Reinforcement Learning (DRL) have shown promise in dynamically managing traffic signals for both vehicles and pedestrians [6,7]. However, optimizing traffic flow across multiple intersections is challenging due to varying traffic conditions and the need for information sharing [8]. The rise of connected vehicles (CVs) adds further potential to traffic management. Through advanced communication technologies, CVs can exchange real-time traffic and safety information with each other and with the infrastructure, improving road safety, comfort, and efficiency. This connected traffic environment enables better optimization of both vehicular and pedestrian flows. VLC is an innovative solution that can complement CV technologies: by leveraging LED technology in both road infrastructure and vehicles, it paves the way for smarter, more efficient traffic management.

This work investigates the integration of VLC and AI, particularly DRL, to design reward functions that optimize traffic signal control in both standard and rerouting scenarios. This research focuses on applying DRL in vehicular communication to enhance traffic flow, improve intersection efficiency, and balance the needs of vehicles and pedestrians in real-world urban environments. By leveraging emerging technologies, such as VLC and connected vehicles (CVs), the study aims to demonstrate their potential to address modern traffic management challenges effectively.

The core contributions are as follows:

Demonstrating the feasibility of VLC in outdoor environments as a complementary technology to existing vehicular communication systems.
Proposing an integrated framework where VLC is utilized to collect real-time traffic data, which are then processed and analyzed using DRL models.
Showcasing how the proposed approach can effectively optimize vehicle flow and traffic signal control in urban traffic scenarios, enhancing intersection efficiency and addressing modern traffic management challenges.

The paper is organized as follows. The introduction outlines the challenges in urban traffic management and explores the potential of integrating connected vehicles (CVs) with Visible Light Communication (VLC) to optimize traffic signal control and rerouting. Section 2 discusses the advantages of using VLC for communication in traffic systems, highlights key issues, such as congestion and safety, examines VLC’s role in real-time traffic management, and introduces Deep Reinforcement Learning (DRL) for traffic signal optimization. Section 3 describes the system architecture, the role of rewards in DRL, coordination strategies between agents, and the simulation used to validate the approach. Section 4 covers network training and testing, global agent decision making, and the impact of VLC and CV integration on traffic rerouting, along with associated challenges. Finally, Section 5 concludes the paper by summarizing the key findings, addressing the study’s limitations, and suggesting potential future research directions.

2. Background and Related Work

2.1. Key Challenges in Urban Traffic Management

Urban traffic management faces numerous challenges, including congestion, unpredictable delays, and safety risks. As cities grow and vehicle numbers increase, traffic congestion becomes a major concern, especially during peak hours. Road networks, particularly at intersections, become saturated, creating bottlenecks that hinder traffic flow. Accidents, vehicle breakdowns, or road work can exacerbate congestion and cause extended delays. Poorly coordinated traffic signals, often outdated, contribute to unnecessary stops, further increasing congestion [9]. Traditional approaches like road expansion and static signals have significant limitations in handling modern urban traffic demands. These solutions are reactive, lack flexibility, and fail to address long-term sustainability and safety needs. Emerging technologies, such as adaptive traffic control, connected vehicles, and intelligent communication systems, offer promising solutions. To address these challenges, solutions have been proposed that leverage machine learning algorithms to predict optimal routes, reduce average waiting times, alleviate congestion, minimize travel costs, and lower air pollution levels. These systems analyze traffic patterns, vehicle classifications, accident data, and environmental factors, such as precipitation levels [10]. By accurately analyzing traffic demand, future conditions can be predicted and used to optimize vehicle routes. This approach becomes particularly effective in the context of autonomous vehicles, which exhibit more predictable behavior. Such predictability allows city management systems to mitigate congestion and pollution more effectively, thereby improving traffic flow through centralized control mechanisms [11].

Current traffic control algorithms often focus on optimizing vehicular flow but overlook the complexities of pedestrian movements [12]. This gap highlights the need for reinforcement-learning-based traffic systems that incorporate pedestrian safety.

Traffic delays are closely tied to congestion, as static, non-adaptive traffic signals fail to respond to real-time traffic conditions. During rush hours, the road capacity is often exceeded, creating long wait times and slow travel speeds. A critical issue is the lack of communication between adjacent intersections, which hinders efficient synchronization of traffic signals, leading to delays.

In multi-intersection scenarios, the scalability of single-agent systems is limited, necessitating collaborative strategies [13,14,15]. Adaptive traffic control in Vehicle-to-Everything (V2X) environments offers a solution by using detailed data for improved traffic flow and safety. This approach models traffic flow dynamics and enhances scalability by integrating data that traditional systems fail to capture. Compared with the traffic flow and occupancy information provided by fixed coil detectors in the traditional traffic environment, an adaptive traffic control system in the V2X environment can collect more detailed data, such as vehicle position, speed, queue length, and stopping time. While Vehicle-to-Vehicle (V2V) links are particularly important for safety functionalities, such as pre-crash sensing and forward collision warning, Infrastructure-to-Vehicle/Pedestrian (I2V/P) links provide CVs and pedestrians with a variety of useful information.

Emerging solutions like adaptive traffic signal control (ATSC) [16], connected vehicles (CV) [17], and reinforcement learning (RL) algorithms [18] enable dynamic, real-time traffic management. These methods adapt to changing traffic patterns, reducing delays while improving safety and efficiency. For example, ATSC systems optimize traffic signal phases using sensor data, minimizing congestion. In environments with connected and autonomous vehicles (CAVs), technologies like vehicle speed guidance systems further reduce queues and delays. However, implementing these advancements faces challenges, including ensuring seamless integration of traffic lights, sensors, and vehicles, and requires substantial investments in infrastructure, hardware, and software.

2.2. Innovative Solutions: Integrating Connected Vehicles with VLC

Advancements in wireless communication technologies and the development of V2V and V2I systems present new opportunities to optimize urban traffic networks by coordinating traffic signal control with driving behaviors [19]. Addressing these challenges, this paper proposes a novel approach that integrates VLC-based localization with learning-driven traffic signal control. The goal is to provide comprehensive management of pedestrian and vehicular traffic, prioritizing reduced waiting times and improved safety in multi-intersection environments.

CV technology is revolutionizing urban traffic management by enabling V2V and V2I communications. This real-time exchange of traffic and safety data, such as speed, location, and road conditions, improves traffic flow, enhances safety, and reduces accidents. Simultaneously, Visible Light Communication (VLC) is emerging as a groundbreaking solution for traffic management. By modulating the intensity of LED lights, commonly used in streetlights, traffic signals, and vehicle headlights, VLC transmits data while maintaining its primary function of illumination. This dual-purpose capability allows VLC to integrate seamlessly into existing infrastructure, transforming ordinary light sources into effective communication tools for smart urban environments. With its ability to deliver high-speed, localized communication, VLC is a key enabler in the development of smarter and more adaptive traffic management solutions.

To develop an intelligent control system for safe vehicle management at intersections utilizing V2V, V/P2I, and I2V/P communications, RL concepts are applied. RL trains agents by rewarding desired behaviors and penalizing undesired ones [20]. The effectiveness of the proposed V-VLC system in multi-intersection scenarios is evaluated using the agent-based Simulation of Urban Mobility (SUMO) platform [21]. SUMO is an open-source traffic simulation software developed to model complex traffic scenarios. It allows for the simulation of various traffic networks, vehicle types, and mobility patterns, making it widely applicable for urban traffic analysis and studies involving autonomous vehicles. We utilized different levels of simulation models within SUMO to represent the traffic scenario. These levels include the following:

Microscopic models, which simulate individual vehicle behaviors, including lane-changing, car-following, and interactions with traffic signals.
Macroscopic models, which model traffic flow using averaged variables, such as vehicle density and speed, over larger sections of a network.

Through experience, the agent learns to avoid negative outcomes and focus on positive actions. Traffic lights in SUMO are managed by the learning agent, which adjusts traffic signal phases based on its decisions to optimize traffic flow. The agent aims to explore new states while maximizing cumulative rewards to devise an optimal policy. A dynamic phasing diagram and a state matrix, based on total accumulated time, illustrate the approach.

By combining CV’s real-time data sharing with VLC’s high-precision communication, urban traffic systems can achieve synchronized control, improving traffic efficiency and safety at intersections and across multi-intersection networks. Together, these technologies form the backbone of an intelligent, adaptive traffic management ecosystem.

2.3. Leveraging VLC for Intelligent Traffic Solutions

The V-VLC system, as depicted in Figure 1a, utilizes a mesh cellular hybrid structure with two controllers. The “mesh” controller at streetlights relays messages to vehicles, while the “mesh/cellular” hybrid controller at the traffic lights acts as a border-router for edge computing [22,23].

Figure 1. (a) 2D representation of the V-VLC architecture. (b) V-VLC Emitter and receivers’ relative positions and illustration of the coverage map with the footprint regions in the unit cell (#1–#9) and steering angle codes (2–9) [22].

The system integrates a modulated light transmitter and a PIN–PIN-based receiver to enable wireless communication. Figure 1b illustrates the relative positions of the emitter and receivers, along with a coverage map depicting footprint regions (#1–#9) within the unit cell and their corresponding steering angles (δ). The transmitter employs ON–OFF Keying (OOK) amplitude modulation to modulate light emitted from tetra-chromatic white LEDs (WLEDs) positioned at the corners of square unit cells. These WLEDs combine Red (R: 626 nm), Green (G: 530 nm), Blue (B: 470 nm), and Violet (V: 390 nm) light to produce white illumination [24].

Each WLED channel generates calibrated wavelength signals, enabling up to four independent emitters per unit. The receiver detects combinations of two (#3, #5, #7, #9), three (#2, #4, #6, #8), or four (#1) optical signals, resulting in 2⁴ distinct combinations and 16 unique photocurrent levels [20]. A PIN–PIN demultiplexer filters the signals, decoding the transmitted message based on prior amplitude calibration [25].
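As a rough illustration of the amplitude-calibrated decoding described above, the following Python sketch enumerates the 2⁴ ON/OFF combinations of the four channels and maps a measured photocurrent to the nearest calibrated level. The gain values and the function names are hypothetical placeholders chosen so that all 16 sums are distinct; they are not the calibration of the actual PIN–PIN receiver.

```python
from itertools import product

CHANNELS = ("R", "G", "B", "V")

# Hypothetical calibrated photocurrent contribution of each channel
# (arbitrary units); values are assumed, not measured.
GAINS = {"R": 8.0, "G": 4.0, "B": 2.0, "V": 1.0}


def build_level_table(gains=GAINS):
    """Map every ON/OFF combination of (R, G, B, V) to its expected photocurrent."""
    table = {}
    for bits in product((0, 1), repeat=len(CHANNELS)):
        table[bits] = sum(b * gains[ch] for b, ch in zip(bits, CHANNELS))
    return table


def decode(photocurrent, table):
    """Return the (R, G, B, V) bit tuple whose calibrated level is closest."""
    return min(table, key=lambda bits: abs(table[bits] - photocurrent))


LEVELS = build_level_table()
```

With these assumed gains, a slightly noisy reading such as 12.2 decodes to the combination in which only the R and G emitters are ON.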

The architecture supports Infrastructure-to-Cloud (I2IM) communication through embedded platforms for processing and sensor interfacing, as well as peer-to-peer Vehicle-to-Vehicle (V2V) communication for data sharing. Streetlights act as geo-transmitters spaced 20 m apart, emitting I2V messages, including synchronization, physical IDs, and traffic data. Vehicles and pedestrians entering a streetlight’s range receive a unique identifier q_i(x,y,t) along with traffic updates.

Intersection management utilizes queue/request/response mechanisms and temporal/spatial relative pose concepts. Vehicles or pedestrians approaching intersections send crossing requests (V/P2I). Traffic signals acknowledge these (I2V/P) and assign occupancy trajectories within footprint regions (Figure 1b). Collision risks delay acknowledgments until resolved. Vehicle speeds are tracked using transmitter IDs, while mesh nodes estimate relative poses q_ij(t) for neighboring vehicles [26]. Requests also include position, direction, and speed, with leader–follower data supporting request confirmation. Delays are analyzed based on queue lengths within cells at the start and end of green phases (Figure 1a).

The VLC system ensures real-time monitoring of pedestrians, vehicles, and infrastructure. Key metrics, such as queue formation and pedestrian density at corners, are evaluated to enhance safety. P2I2P communication facilitates travel time estimation, while transmitter tracking IDs provide insights into speed and waiting times.

2.4. DRL Framework for Traffic Signal Control

To manage traffic efficiently with data collected via VLC, a Multi-Agent Reinforcement Learning (MARL) system was implemented. Three agents (C0, C1, C2) were considered. The traffic scenario analyzed in this study is depicted in Figure 2. It includes three uniform 4-arm junctions, spaced at varying distances. The lane between junctions C0 and C1 is 400 m long, while the lane between C1 and C2 is only 200 m long.

Figure 2. Environment of the arterial scenario.

Each intersection is managed by an agent that maps its environment into cells, gathering data on vehicles and pedestrians. Communication between the infrastructure and vehicles ensures a cooperative information-sharing setup. Traffic data collected by the agents are stored in a shared memory, allowing a single network to be trained for controlling all intersections and selecting the best phase for each.

A Deep-Q Network (DQN), trained using deep Q-learning, optimizes the system. Instead of traditional tabular Q-learning, which stores Q-values for each state–action pair, the DQN uses a neural network (NN) to predict Q-values based on the current state. This approach handles real-world scenarios with large, continuous state–action spaces. Figure 3 illustrates the simulation and training process.

Figure 3. Flowchart during simulation and training.

At each time step t, the network receives the current state s_t, executes an action a_t, and transitions to the next state s_{t+1}. A reward r_t evaluates the action’s effectiveness based on vehicle and pedestrian waiting times (Equation (1)). Waiting time (wt_{veh,t} / wt_{ped,t}) is the duration a vehicle or pedestrian moves at less than 0.1 m/s since entering the environment. n represents the total number of vehicles/pedestrians in the environment at t. This value, atwt_t, accumulates until they cross the intersection, ensuring the reward reflects both traffic flow and pedestrian efficiency.

atwt_{veh,t} = \sum_{veh=1}^{n} wt_{veh,t}, \qquad atwt_{ped,t} = \sum_{ped=1}^{n} wt_{ped,t}

(1)

The final reward, r_t, is defined in Equation (2), where atwt_t and atwt_{t−1} are the accumulated total waiting times of all the cars/pedestrians in the intersection, captured at t and t − 1, respectively. The weights p_veh and p_ped are set according to the priority that the agent should give to vehicles and pedestrians during network training. The agent will learn a policy that benefits one more than the other or keeps the system balanced if the weights are equal.

r_t = p_{veh} (atwt_{veh,t-1} - atwt_{veh,t}) + p_{ped} (atwt_{ped,t-1} - atwt_{ped,t})

(2)
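Equations (1) and (2) can be sketched in a few lines of Python. The function names and the default weights of 0.5 are illustrative assumptions; the weights actually used for training are a design choice and are not fixed by the equations.

```python
def accumulated_waiting_time(waiting_times):
    """Equation (1): sum of the individual waiting times (s) of everyone present."""
    return sum(waiting_times)


def reward(veh_wt_prev, veh_wt_now, ped_wt_prev, ped_wt_now,
           p_veh=0.5, p_ped=0.5):
    """Equation (2): weighted decrease in accumulated waiting time from t-1 to t.

    p_veh and p_ped default to 0.5 here purely for illustration.
    """
    d_veh = accumulated_waiting_time(veh_wt_prev) - accumulated_waiting_time(veh_wt_now)
    d_ped = accumulated_waiting_time(ped_wt_prev) - accumulated_waiting_time(ped_wt_now)
    return p_veh * d_veh + p_ped * d_ped


# Vehicles: 14 s accumulated at t-1, 7 s at t. Pedestrians: 3 s at t-1, 5 s at t.
example_reward = reward([10, 4, 0], [2, 5], [3], [4, 1])
```

In this example the vehicle term contributes 0.5 × 7 and the pedestrian term 0.5 × (−2), so the reward is positive (2.5): the action reduced vehicle waiting more than it increased pedestrian waiting.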

This experience e_t = (s_t, a_t, r_t, s_{t+1}) is stored in the replay memory, to be used later to train the agent. The replay memory is a dataset of the agent’s experiences D_t = (e₁, e₂, …, e_t), gathered as the agent interacts with the environment over time (t = 1, 2, …, n). In training, a mini-batch of random samples is selected to break the temporal correlation between consecutive experiences, preventing inefficient learning from highly correlated data. The replay memory buffer fills to a specific length, and, when full, older experiences are overwritten by new ones.
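A minimal sketch of such a replay memory is given below, assuming a fixed-capacity buffer built on Python's `collections.deque`. The class name, method names, and capacity are hypothetical; the buffer length used in the experiments is not specified here.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-length experience buffer; the oldest entries are dropped when full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        """Append one experience e_t = (s_t, a_t, r_t, s_{t+1})."""
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a random mini-batch (without replacement) to decorrelate samples."""
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.store(state=step, action=0, reward=0.0, next_state=step + 1)
```

After five insertions into a buffer of capacity three, only the experiences from steps 2, 3, and 4 remain.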

The neural network implemented is a fully connected layer network (FCLN), whose weights θk are used to approximate the Q-values Q(s, a; θk). The input layer has 164 neurons, representing the state of the environment. It is followed by 5 hidden layers of 400 neurons each, all using the rectified linear unit (ReLU), an activation function commonly used in deep neural networks that introduces non-linearity and allows the NN to learn complex patterns and representations in the data. The output layer has 9 neurons, each giving the Q-value of one action. The actions are linked to traffic management in the next sections. The agent chooses as its next action the one with the maximum Q-value.
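To make the layer dimensions concrete, the following pure-Python sketch builds a network with the stated shape (164 inputs, five hidden layers of 400 ReLU units, 9 outputs), counts its trainable parameters, and selects the greedy action. The random initialization scheme and all function names are assumptions for illustration only; a real implementation would use a deep learning framework.

```python
import random

LAYER_SIZES = [164] + [400] * 5 + [9]  # input, 5 hidden layers, output


def count_parameters(sizes=LAYER_SIZES):
    """Weights plus biases of every fully connected layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def init_network(sizes=LAYER_SIZES, seed=0):
    """Random weights (scaled Gaussian, an assumed choice) and zero biases."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = (2.0 / n_in) ** 0.5
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, state):
    """Return the 9 Q-values; ReLU on hidden layers, linear output layer."""
    x = list(state)
    last = len(layers) - 1
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < last:
            x = [max(0.0, v) for v in x]
    return x


def greedy_action(q_values):
    """Index of the largest Q-value."""
    return max(range(len(q_values)), key=q_values.__getitem__)
```

For these layer sizes the count is 66,000 + 4 × 160,400 + 3,609 = 711,209 trainable parameters.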

To improve the prediction of Q-values, a mean-squared-error (MSE) loss function is used, which quantifies the difference between the predicted Q-values and the target Q-values, as shown in Equation (3).

MSELoss = \frac{1}{N} \sum_{i=1}^{N} (Q_{target} - Q_{pred})^{2}

(3)

where N is the number of samples in the memory, Q_pred is the Q-value predicted by the main network, and Q_target is obtained from a secondary network that is similar to the main one but is not trained. At certain intervals of epochs (the episodes used to train the main network), the weights of the main network are copied to the secondary network, bringing greater stability to the training. The Q_target values are calculated using Equation (4), where r_t is the reward obtained and γ is a discount factor applied to the maximum Q_target value, lowering the importance of the future reward compared to the immediate reward.

Q_{target} = r_{t} + \gamma \cdot \max_{a} Q_{target}(s_{t+1}, a)

(4)
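The target and loss computations of Equations (3) and (4) can be sketched as follows. The discount factor used in the example (γ = 0.5) and the function names are illustrative assumptions, not values reported for the trained agent.

```python
import copy


def q_target(reward, next_q_values, gamma):
    """Equation (4): r_t plus the discounted maximum target-network Q-value at s_{t+1}."""
    return reward + gamma * max(next_q_values)


def mse_loss(targets, predictions):
    """Equation (3): mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def sync_target_network(main_weights):
    """Periodic hard update: return an independent copy of the main-network weights."""
    return copy.deepcopy(main_weights)


# One experience with r_t = 2.0 and target-network outputs for s_{t+1}.
target = q_target(2.0, [1.0, 3.0, 2.0], gamma=0.5)   # 2.0 + 0.5 * 3.0
loss = mse_loss([target, 1.0], [3.0, 2.0])           # (0.25 + 1.0) / 2
```

Because the copy returned by `sync_target_network` is independent of the main weights, subsequent training steps on the main network do not shift the targets until the next synchronization.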

The MSE loss function calculates the squared difference between predicted and target values, aiming to minimize this difference during training. The model adjusts the weights θk of the neurons to better approximate the Q-value and improve predictions, leading to better decision making by the agent. The network is evaluated based on the average number of cars in queues, showing that it can manage traffic. However, without communication between agents about lane occupancy, peaks in vehicle queues can occur.

As traffic scenarios become more complex with multiple intersections, communication between agents becomes crucial. In simpler cases, where vehicles only pass through one junction, communication is less important. But with closely spaced junctions and heavy traffic, such as during rush hour, agents must coordinate and exchange information to manage queues effectively. This dynamic phase activation reduces congestion and ensures smoother traffic flow.

3. Proposed Approach and Methodology

This study presents a novel approach to urban traffic management by integrating VLC with AI, specifically leveraging DRL. The proposed system is designed to optimize traffic signal control in both standard and rerouting scenarios, ensuring efficient vehicular flow and pedestrian safety.

3.1. System Architecture

The system comprises three main components:

VLC-Enabled Infrastructure: Traffic lights and roadside units equipped with VLC modules to enable real-time data transmission between vehicles, infrastructure, and pedestrians.
AI Agents: Independent DRL-based agents are deployed at each intersection, with a centralized global agent coordinating the local ones. This structure ensures scalability and real-time adaptability to traffic dynamics.
Data Integration Platform: A platform that collects, processes, and integrates real-time data from VLC-enabled devices, including vehicle positions, speeds, and pedestrian requests.

3.1.1. VLC-Enabled Infrastructure

Arterial traffic signal control refers to managing intersections formed by crossing two or more main roads that are either radial or circular in design. The layout and spacing between intersections vary depending on traffic volume, road capacity, and network design. Each approach at an intersection comprises multiple lanes to accommodate different vehicle movements, such as left turns, right turns, and through-traffic. These intersections are governed by standard traffic rules, with priority movements determined by the traffic signals in place.

The scenario was idealized to reflect a futuristic urban environment tailored for autonomous vehicles. Key aspects include the following:

Multiple lanes on each arm of the junction to reduce queuing and enhance traffic flow.
Exclusive pedestrian phases to ensure complete pedestrian safety by segregating vehicle and pedestrian movements.
The scenario assumes that autonomous vehicles pre-plan their routes, enabling seamless navigation within designated lanes.
These assumptions align with a forward-looking vision of urban traffic management, optimizing safety and efficiency while accommodating future technological advancements.

Figure 4 presents the simulated environment for each four-legged intersection of the arterial scenario in Figure 2, showcasing the optical infrastructure (X_ij), the generated footprints (#1–#9), and the interactions between connected vehicles and pedestrians. The streetlights along the lanes, denoted as X_i,j, are identified by integers that follow the opposite direction of traffic (N, S, E, W), starting from streetlight 0 at the required signalized intersection and extending, in line, towards streetlights K, L, M, and N at the neighboring junction.

Figure 4. Simulated scenario for each junction: four-legged intersection and environment with the optical infrastructure (X_ij), the generated footprints (#1–#9), and the connected cars and pedestrians. Dashed lines show the sidewalks.

Each arm of the junction has two lanes: one for left turns and another for vehicles to continue straight or turn right. This design is tailored for CAVs. Using VLC (Figure 1), the vehicle gathers information about the junction’s footprint regions, lanes, and surroundings to select the appropriate lane. Once positioned, the vehicle communicates its intention to cross to the Intersection Manager (IM) via V2I technology. The environment also includes sidewalks with designated pedestrian waiting zones at each intersection, ensuring safe waiting areas until the crossing phase is activated for pedestrians to use the zebra crossings.

The traffic signal phase timing is determined by factors like traffic demand, intersection layout, and traffic management objectives (e.g., minimizing delays, maximizing traffic flow). Traffic signals typically operate in phases, with green lights allowing movement in one direction while red lights restrict conflicting movements. In this study, we design eight vehicle signal phases and one exclusive pedestrian phase for each intersection, as illustrated in Figure 5a. For vehicles, there is a north–south phase (P1) where they can either proceed straight or turn right, followed by a phase where vehicles coming from the north can cross in all directions (P2), and another phase where vehicles from the south can do the same (P3). Additionally, there is a phase where vehicles traveling from both the north and the south can only turn left (P4). The same phase structure applies to the east–west direction (P5, P6, P7, P8). Pedestrians have an exclusive phase during which all vehicle traffic lights are red, allowing them to cross safely without interference from vehicles. This exclusive phase ensures pedestrian safety by preventing any crossover between pedestrians and vehicles at the crossings.
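One way to picture the link between the nine phases and the nine network outputs is a simple lookup table, sketched below. The ordering of the action indices and the assignment of P6 and P7 to the east and west approaches (by analogy with P2 and P3) are assumptions made for illustration; only the phase descriptions themselves come from the text.

```python
# Assumed mapping from DQN action index to signal phase.
PHASES = {
    0: ("P1", "north-south: straight or right turn"),
    1: ("P2", "from north: all directions"),
    2: ("P3", "from south: all directions"),
    3: ("P4", "north-south: left turn only"),
    4: ("P5", "east-west: straight or right turn"),
    5: ("P6", "from east: all directions"),   # east/west order is assumed
    6: ("P7", "from west: all directions"),   # east/west order is assumed
    7: ("P8", "east-west: left turn only"),
    8: ("PED", "exclusive pedestrian phase: all vehicle lights red"),
}


def phase_name(action):
    """Name of the phase selected by a given action index."""
    return PHASES[action][0]


def is_pedestrian_phase(action):
    """True only for the exclusive pedestrian phase."""
    return phase_name(action) == "PED"
```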

Figure 5. (a) Phasing diagram. (b) Schematic diagram of one junction with coded lanes (L/0–7) and traffic lights (TL/0–15). Arrows show the traffic directions.

A traffic control system consisting of sixteen traffic lights (TL) has been implemented to manage the flow of vehicles approaching the intersections. These traffic lights are numbered (TL 0–15), as shown in Figure 5b, which also displays the numbering of the lanes (L 0–7), consistent across all three junctions. These sixteen traffic lights enable the implementation of traffic phases to regulate the flow of vehicles and pedestrians.

Due to the dynamic nature of traffic flow, significant fluctuations occur, including variations in the number of stops and acceleration/deceleration events both on arterial roads and at individual intersections. These fluctuations are considered when forecasting potential traffic conflicts in the design.

The main challenge in controlling traffic across multiple intersections is the coordination required to manage the varying traffic conditions between intersections. The target roads between intersections vary in distance, making it crucial to synchronize traffic signals based on traffic volume, vehicle movements, and road capacity. Without proper coordination, localized traffic signal control can lead to inefficiencies like congestion or increased waiting times, especially when multiple intersections interact closely in an arterial network.

3.1.2. Multi-Agent Reinforcement Learning

Figure 6 illustrates the centralized control algorithm, where there is no direct communication between agents. Our research has explored various junction types, starting with a single junction [24] and progressing to two [22] and now three, focusing on the behavior and dynamics of different traffic scenarios in these environments. Because the junctions are homogeneous, the similar experiences observed by each agent allow for training of a single neural network, which acts as a global agent to make decisions and determine the best actions. This approach has proven effective in managing both pedestrian and vehicle traffic at these intersections. However, large occupancy peaks in the queues can sometimes occur, leading to congestion and inefficient traffic flow.

Figure 6. A schematic of the algorithm employed using centralized MARL.

The global agent can dynamically direct local agents to reroute traffic, avoiding congestion zones and ensuring smoother flow. By monitoring traffic density and flow patterns, it adjusts signal phases across intersections to create synchronized green waves that enhance throughput and reduce stop-and-go behavior. For instance, it may shorten green time at one intersection to prevent overwhelming a downstream intersection [27,28,29].

From our study across various scenarios, we gained insights into traffic queue behavior and identified lane capacity limits. To address the lack of direct communication between agents, we integrated this knowledge into the network by setting threshold values for queues. This allows the global system to manage traffic flow in critical sections by evaluating the volume that can be accommodated in each direction.

Although the neural network is trained centrally, traffic signal agents at each intersection locally implement these rerouting strategies. Congestion thresholds are set to adapt and optimize traffic flow, ensuring efficient management even in high-demand conditions.

3.1.3. Data Integration Platform

The protocol is designed for a platform that collects, processes, and integrates real-time data from VLC-enabled devices, including vehicle positions, speeds, and pedestrian requests. The communication protocol defines the structure and rules governing the exchange of information, outlining the synchronization, identification, and payload portions of the transmitted frame. The communication protocol is detailed in Table 1.

Table 1. Message protocol defined for each of the V-VLC communications.

The frame structure begins with a 5-bit synchronization block (Sync) with the pattern [10101], used to synchronize the receivers and mark the start of a new frame (SOF). Following this, the TIME block encodes the current time with an 18-bit sequence (6 + 6 + 6), representing the hour, minute, and second. A flag with the pattern [111] indicates that specific ID blocks will follow.

Each ID block consists of 4 bits, starting with the communication type (COM) that specifies the communication between streetlights (L), vehicles (V), pedestrians (P), and infrastructure (I). The next block provides the localization of transmitters, defined by x and y coordinates. Depending on the communication type, additional details include the occupied lane (Lane 0–7), requested traffic light signals (TL 0–15), the number of vehicles behind the leader (Veic. nr), the ID assigned by the Intersection Manager (IM) to acknowledge vehicle messages, the cardinal direction (Direct.), and the active phase (Phase), provided by the infrastructure in a “request” or “response” message.

For traffic-related messages, the frame includes vehicle information, such as coordinates, the position of vehicles behind the leader (CarIDx, CarIDy), and traffic-related data (payload), such as road conditions, average waiting times, and weather information. The frame ends with a 4-bit End of Frame (EoF) block, represented by the pattern [0000], signaling the conclusion of the frame.
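The frame layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the treatment of every ID field as a plain 4-bit integer, and the order of the ID fields are assumptions made for the example, while the Sync, flag, and EoF patterns and the three 6-bit time fields follow the text.

```python
SYNC = "10101"  # 5-bit start-of-frame pattern (from the text)
FLAG = "111"    # flag announcing that ID blocks follow (from the text)
EOF = "0000"    # 4-bit end-of-frame pattern (from the text)


def to_bits(value, width):
    """Return `value` as a zero-padded binary string of `width` bits."""
    if not 0 <= value < 2 ** width:
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b")


def encode_time(hour, minute, second):
    """Encode the TIME block as three 6-bit fields: hour, minute, second."""
    return to_bits(hour, 6) + to_bits(minute, 6) + to_bits(second, 6)


def build_frame(hour, minute, second, id_blocks, payload_bits=""):
    """Assemble Sync | TIME | flag | ID blocks | payload | EoF."""
    ids = "".join(to_bits(v, 4) for v in id_blocks)
    return SYNC + encode_time(hour, minute, second) + FLAG + ids + payload_bits + EOF


def parse_header(frame):
    """Recover the time and the 4-bit ID blocks from a payload-free frame."""
    if not (frame.startswith(SYNC) and frame.endswith(EOF)):
        raise ValueError("missing Sync or EoF pattern")
    body = frame[len(SYNC):-len(EOF)]
    hour, minute, second = (int(body[i:i + 6], 2) for i in (0, 6, 12))
    if body[18:21] != FLAG:
        raise ValueError("missing ID flag")
    ids_part = body[21:]
    ids = [int(ids_part[i:i + 4], 2) for i in range(0, len(ids_part), 4)]
    return (hour, minute, second), ids


# Hypothetical message: COM = 2, Lane = 2, one vehicle behind, sent at 11:18:30.
frame = build_frame(11, 18, 30, [2, 2, 1])
```

Parsing the frame back with `parse_header(frame)` returns the same time and ID values, which is a convenient sanity check when experimenting with alternative field orders.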

In Figure 7, the moments of communication for both vehicles are illustrated. A highlighted car is coming from the north, heading towards C0, and it is waiting for its phase to be activated to turn left. Various VLC communications, V2V, V2I, and I2V, are being studied during this phase. At C0, pedestrians in the designated waiting area are awaiting the pedestrian phase activation. During this time, P2I communications are initiated, where the pedestrian makes a crossing request, and I2P serves as the IM’s response.

Figure 7. Simulated VLC scenario. Three junctions (C0, C1, and C2) with the RGBV ID transmitters. (a) C0. (b) C1. (c) C2.

Figure 8a,b demonstrate the MUX signal and the decoded messages between the analyzed vehicles and the traffic lights toward C0 and their movement after crossing C0 toward C1, respectively.

Figure 8. Normalized MUX signal and the decoded messages between the analyzed vehicles and the traffic lights toward C0 (a) and their movement after crossing C0 toward C1 (b).

In Figure 8a, for V2V communication (COM: 2), the vehicle behind the target car communicates with the one in front, providing its position as G_5,1, R_5,10, V_4,0, the lane it is in (Lane: 2), the number of vehicles following it (in this case, none), and the time of communication (11:18:30); as there are no cars behind, it does not transmit anything in these blocks to the car in front. After receiving this communication, the leader then makes a request to the IM through V2I. It provides its position as G_5,1, R_5,10, V_4,0, the traffic light to which the request is being made (TL: 2), the number of vehicles following it (Veic. (nr): 1), the time of the communication (TIME: 11:18:31), the car identifier (y, x: G_5,1), and the number of cars behind the follower, which is currently 0. Next, the I2V communication occurs, where the leader receives a response with the same information at 11:18:32, indicating that the active phase is 4 (W > E). After the N > S Left phase is activated, the cars move toward the C1 intersection, where they are currently lined up waiting.

In Figure 8b, after activating the NS Left phase, the cars move toward the C1 intersection, where they are currently queued. Here, L2V and V2V communications are presented for the vehicle under study. In the first L2V communication, information is transmitted to the vehicles, such as their positions on the road (R_3,3; B_4,8; G_3,2) and the time the communication is established (TIME: 11:23:15). In the V2V communication for the last vehicle in the queue, the transmitted information includes its position (R_3,3; B_4,8; G_3,2), the road it is on (L: 0), the number of vehicles behind it (currently 0), and the time the communication is made (TIME: 11:23:16). Because there are no cars behind it, this vehicle does not pass any information forward in these blocks.

For the next vehicle in the queue, V2V communication transmits information like its position (R_3,3; B_4,8; G_3,2), the road it is on (L: 0), the number of vehicles following it (in this case, 1), and the time the communication is established (TIME: 11:23:17), and it informs the vehicle ahead that there is one car behind it, providing its position (R_3,3). After passing through C1, the vehicles arrive at C2, where they queue again, awaiting their phase. V2V and L2V communications are re-established. For the L2V communication, information is transmitted to the first vehicle under study, including its position on the road (R_3,1; B_4,6; G_3,2) and the time the communication is established (11:26:10). In the V2V communication, the first car under study has three vehicles behind it on the same road. It then communicates its position (R_3,1; B_4,6; G_3,2), the road it is on (lane 0), the number of vehicles following it (3), the communication time (11:26:11), the identifiers of the vehicles behind it (G_3,2; R_3,3; G_3,4), as well as the number of vehicles following each of those cars. Figure 7 illustrates the different communication moments.

L2V Communication: Vehicles receive their positions in the environment.
V2V and V2I Communication: Vehicles receive their positions and communicate both with one another and with the infrastructure to relay positioning, traffic light phases, and vehicle status. This data exchange helps vehicles align their movements with the active traffic phases, ensuring a coordinated flow.
P2I and I2P Communication: Pedestrians also participate in a similar communication cycle, requesting to cross intersections and receiving confirmations. The infrastructure responds based on the active traffic phase, managing pedestrian crossings in harmony with vehicular phases to improve safety and efficiency.
Traffic Flow and Phase Coordination: Each intersection has specific phases (Figure 5) to control pedestrian and vehicle movements. By synchronizing these phases across multiple intersections (C0, C1, C2), the system can handle complex flows and enhance safety, preventing conflicts between vehicles and pedestrians.

Overall, the V-VLC protocols provide a structured way for traffic management systems to coordinate and streamline both pedestrian and vehicle movement, which can be particularly beneficial in densely populated or high-traffic areas. The approach leverages real-time data to reduce delays, improve safety, and optimize intersection efficiency.

3.2. Role of Rewards in Standard and Rerouting Scenarios

The design of the reward function and the integration of inter-agent communication significantly affect the system’s ability to handle traffic, producing different strategies for the scenario in question, here the standard and rerouting scenarios. The choice of reward shapes the policy learned by the neural network and therefore the resulting traffic control strategy. While isolated optimization may suffice in simple environments, interconnected traffic systems require coordinated, adaptive strategies to achieve efficient and balanced flows. Two types of scenarios were considered:

-: Standard Scenarios: Rewards typically aim to minimize queue lengths, waiting times, or delays at a single intersection. With this reward, the traffic strategy developed during network training involves each agent considering only the intersection it controls, without contemplating neighboring intersections. This encourages each agent to optimize locally without necessarily considering downstream impacts. While this approach is effective in isolated or less congested intersections, it may lead to suboptimal results when intersections are interconnected. Static reward structures often fail to adapt to dynamic conditions, leading to peaks in occupancy.
-: Rerouting Scenarios: In scenarios where traffic flow from one intersection affects neighboring ones, the reward design needs to account for global traffic metrics. For example, this might involve penalizing congestion caused by vehicles heading toward already congested intersections or rewarding actions that reduce overall network pressure, even if local delays increase temporarily. Introducing a hierarchical reward structure, where global objectives (e.g., reducing network-wide congestion) take precedence over local ones, leads to a new policy in the training, resulting in a new strategy that fits this type of connected intersection scenario, improving overall performance. Alternatively, dynamically adjusting weights in the reward function based on the observed environment (e.g., during rush hours or peak flow) can help address evolving traffic patterns.

The necessity for inter-agent communication becomes evident in dense and complex networks. Without communication, the agents optimize independently, which can lead to localized bottlenecks or oscillatory behavior. Communication based on sharing lane occupancy or flow rates allows agents to make informed decisions, aligning local actions with network-wide goals. For example, an agent may delay activating a green phase for incoming traffic if the downstream intersection is nearing capacity, or it can also prioritize outgoing traffic to alleviate pressure on upstream intersections. When vehicles travel between intersections, the system must consider how decisions at one intersection propagate throughout the network. This requires the following:

-: Real-time data exchange to synchronize decisions.
-: Adaptive strategies that consider not only local queue lengths but also expected arrivals from neighboring intersections.

When designing the reward function (Equation (2), Figure 6), agents controlling intersections in proximity must adopt strategies to redistribute traffic dynamically. Localized reward components ensure that intersections minimize immediate queues and waiting times.

If an intersection’s queue length exceeds a threshold, neighboring agents might adjust their phase timings to divert or delay incoming traffic. Coordination can also enable smoother transitions for vehicles heading to subsequent intersections, preventing cascading congestion. Globalized reward components encourage behavior that aligns with network-wide goals, such as reducing total system delay or balancing traffic flow across intersections.

Three scenarios were analyzed, each addressing different traffic conditions. In the standard scenario, most vehicles go straight; in the symmetric rerouting scenario, traffic is redistributed via turns; and in the asymmetric rerouting scenario, a priority direction is given precedence. We began with a typical arterial scenario, the “standard” scenario, where 75% of vehicles travel straight and 25% turn. However, when traffic demand exceeds capacity, the system activates the “symmetrical” rerouting scenario, where 75% of vehicles are redirected to turn, helping balance traffic load. An “asymmetrical” rerouting scenario was also explored, where one direction is prioritized over the other, allowing for greater flow in one direction and reducing traffic in the opposite direction. Rerouting is applied exclusively to the prioritized direction.

To further optimize traffic in rerouting scenarios, upstream anti-bottlenecking and smart rerouting techniques are employed, adjusting intersection control in real time based on congestion levels and dynamically assigning priority to alternative routes. To integrate the three traffic scenarios (“standard”, “symmetrical rerouting”, and “asymmetrical rerouting”) into the reward equation (Equation (2)), the reward was adjusted dynamically to account for the specific goals and constraints of each scenario. This was achieved by introducing into the reward function scenario-specific weights, ω, that adjust the priority of the reward based on the current traffic scenario or parameters, ensuring that the policy aligns with the objectives of each traffic condition. Below is a modified version of the reward equation:

$$r_{t} = \omega \, p_{veh} \left( atwt_{veh,t-1} - atwt_{veh,t} \right) + \omega \, p_{ped} \left( atwt_{ped,t-1} - atwt_{ped,t} \right) \qquad (5)$$

To accommodate varying traffic conditions, the reward function incorporates scenario-specific weights (ω) and parameters ($p_{veh}$ and $p_{ped}$). The weights are dynamically adjusted based on the current traffic scenario:

Standard Scenario: ω = 1 serves as the baseline, ensuring a balanced prioritization of vehicles ($p_{veh}$) and pedestrians ($p_{ped}$).
Symmetrical Rerouting: ω > 1 increases responsiveness to congestion, with $p_{veh}$ adjusted to prioritize turning movements, reducing oversaturation of straight-through lanes.
Asymmetrical Rerouting: ω varies across directions to reflect priorities, e.g., ω = 1.5 for the prioritized direction and ω = 0.8 for the less-prioritized direction. In this scenario, $p_{veh}$ is weighted more heavily in the prioritized direction.

These weights and parameters are selected using a combination of empirical testing and domain-specific traffic management knowledge, ensuring the system’s adaptability to real-world traffic challenges and referring to our previous work [22,24].

The inclusion of ω allows the reward function to adapt to the needs of the traffic scenario dynamically. By varying $p_{veh}$ and $p_{ped}$, the system can optimize for different priorities, such as faster vehicle movement in high-demand scenarios or equitable pedestrian crossings in normal conditions. This approach ensures that the reinforcement learning agent learns policies tailored to the specific challenges and goals of each traffic scenario, improving overall system performance.
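As a worked illustration of Equation (5), the following sketch evaluates the reward for one decision step. It is a minimal example under stated assumptions: the function and argument names are hypothetical, the waiting-time values are invented for the arithmetic, and the equal split $p_{veh} = p_{ped} = 0.5$ mirrors the 50/50 weighting used for training. The values ω = 1 and ω = 1.5 are those quoted above for the standard scenario and the prioritized direction.

```python
def reward(omega, p_veh, p_ped,
           atwt_veh_prev, atwt_veh_now,
           atwt_ped_prev, atwt_ped_now):
    """Evaluate Equation (5) for a single step.

    Each term is positive when the corresponding waiting-time quantity
    decreased between step t-1 and step t, and negative when it grew.
    """
    vehicle_term = omega * p_veh * (atwt_veh_prev - atwt_veh_now)
    pedestrian_term = omega * p_ped * (atwt_ped_prev - atwt_ped_now)
    return vehicle_term + pedestrian_term


# Invented example: vehicle waiting fell from 100 to 80,
# pedestrian waiting rose from 40 to 50.
r_standard = reward(1.0, 0.5, 0.5, 100, 80, 40, 50)     # omega = 1 baseline
r_prioritized = reward(1.5, 0.5, 0.5, 100, 80, 40, 50)  # omega = 1.5
```

With these numbers the baseline gives 0.5 · 20 + 0.5 · (−10) = 5, and the prioritized direction scales the same change to 7.5, so ω amplifies both the gain for vehicles and the penalty for pedestrians by the same factor.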

3.3. Coordination Strategy

The proposed system uses a distributed DRL approach where local agents manage individual intersections while the global agent facilitates indirect communication and coordination. This ensures synchronization of traffic signals across intersections, addressing scalability and congestion issues.

Different DRL methods can be used depending on the traffic scenario at hand. For multiple intersections, existing methods fall into two categories: centralized control and decentralized control. In centralized control, a global agent is trained to control the traffic signals of the entire network. Each local agent observes an intersection and stores its experience, which is used to train the global agent; the global agent then controls the environment and indicates which action should be taken at each intersection. Decentralized control formulates multi-intersection signal control as a multi-agent system, in which each agent is trained to control a single intersection and observes only part of the traffic environment.

A global agent coordinating local agents offers significant advantages for achieving network-wide optimization, especially in complex urban scenarios. Some advantages include the following:

Inter-Agent Communication and Coordination: The global agent has access to aggregated data from all local agents. This allows it to identify bottlenecks, optimize traffic signal phase timings for high-priority routes, and balance traffic loads. Local agents might have conflicting goals (e.g., prioritizing their intersection’s flow versus preventing downstream congestion). It mediates these conflicts by aligning local actions with global objectives.
Dynamic and Context-Aware Strategies: By observing traffic density and flow patterns, the global agent adjusts signal phases across intersections, creating synchronized green waves that improve throughput and reduce stop-and-go behavior. During events like accidents, road closures, or high pedestrian density (e.g., near stadiums or schools), the global agent coordinates local responses to adapt quickly to changing conditions.
Global Reward Integration: While local agents optimize using their localized reward functions, the global agent enforces a composite reward that balances local and network-wide priorities by penalizing actions that cause downstream congestion or pedestrian risks and rewarding synchronized actions that optimize overall traffic flow.
Incident Management: Detects anomalies, such as sudden congestion or accidents, and coordinates local agents to minimize the impact.

Without a global agent, the local agents optimize their own intersections independently. An upstream intersection may release vehicles without considering downstream congestion, causing queue spillovers. A global agent, by contrast, monitors the entire corridor and instructs the upstream intersection to hold traffic temporarily so that downstream intersections can clear. This prevents cascading congestion and ensures smoother flow.
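The hold decision described here can be pictured as a simple check over the links of the corridor. The sketch below is illustrative only: in the proposed system this behavior is learned by the DRL policy rather than hard-coded, and the function name, the dictionary layout, and the threshold value used in the example are assumptions.

```python
def upstream_holds(link_queues, threshold):
    """Return the upstream intersections that should hold traffic.

    `link_queues` maps (upstream, downstream) pairs to the number of
    vehicles currently on the link between them. An upstream
    intersection is asked to hold when that count exceeds `threshold`.
    """
    return {
        upstream
        for (upstream, _downstream), count in link_queues.items()
        if count > threshold
    }


# Hypothetical snapshot of the C0-C1-C2 corridor.
snapshot = {
    ("C0", "C1"): 27,  # above the limit: C0 should hold eastbound traffic
    ("C1", "C2"): 10,
    ("C2", "C1"): 26,  # above the limit: C2 should hold westbound traffic
    ("C1", "C0"): 25,  # exactly at the limit: no hold
}
holds = upstream_holds(snapshot, threshold=25)
```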

3.4. Simulation and Validation

The methodology is validated through simulations using the SUMO urban mobility simulator. The system’s performance is assessed under various traffic conditions and intersection types, with metrics including vehicle delays, pedestrian waiting times, and safety incidents.

Traffic demand data were synthesized based on expected vehicle movement patterns across different times of the day, simulating realistic urban scenarios. We incorporated historical traffic data, where available, to ensure the representation of typical flow distributions between origin and destination points.

By integrating VLC, DRL, and connected vehicle technologies, this approach provides a scalable, adaptive, and efficient solution for modern urban traffic challenges.

For the simulation environment, a scenario of three connected four-arm intersections with two lanes in both directions is considered (Figure 4). Each of these intersections is controlled by an agent that maps the environment around it in cells, acquiring different information about the vehicles and pedestrians travelling through the intersection (Figure 8). The state of each of the three intersections is divided into three layers totalling 164 cells. The first layer is made up of 80 cells, with 10 for each lane routing vehicles to the junction, indicating their presence. If a vehicle is inside a cell, the cell is filled with ‘1’; otherwise, it is filled with ‘0’. The second layer, made up of the same number of cells, indicates the normalized speed of the cars in each cell, if any are present. The third layer, made up of just 4 cells, represents the waiting zones, indicating the number of pedestrians standing still waiting for their phase to become active. This state representation helps the agent map the environment around the intersection and ends up being very similar to the states observed via VLC, illustrated in Figure 9. In this case, the vehicles are identified over time by the lane they are in and by the traffic light they are communicating with. Pedestrians, on the other hand, are identified over time by the waiting zone they are in, as well as the traffic light they are communicating with.

Figure 9. A schematic of the state representation for each junction.
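The three-layer state can be assembled as a flat vector of 164 values. The sketch below is a minimal illustration: the function name, the (lane, cell, speed) tuple format, the lane-major ordering of cells, and the normalization by a `max_speed` argument are assumptions, while the layer sizes (80 + 80 + 4) follow the description above.

```python
LANES = 8            # incoming lanes per intersection (80 cells / 10 per lane)
CELLS_PER_LANE = 10  # cells per incoming lane
WAITING_ZONES = 4    # pedestrian waiting zones per intersection


def build_state(vehicles, pedestrians_waiting, max_speed):
    """Build the 164-value state of one intersection.

    vehicles: iterable of (lane, cell, speed) tuples, with lane in 0..7
        and cell in 0..9 (hypothetical indexing).
    pedestrians_waiting: four counts, one per waiting zone.
    max_speed: speed used to normalize into [0, 1] (assumed parameter).
    """
    if len(pedestrians_waiting) != WAITING_ZONES:
        raise ValueError("expected one pedestrian count per waiting zone")
    presence = [0.0] * (LANES * CELLS_PER_LANE)  # layer 1: occupancy
    speed = [0.0] * (LANES * CELLS_PER_LANE)     # layer 2: normalized speed
    for lane, cell, v in vehicles:
        index = lane * CELLS_PER_LANE + cell
        presence[index] = 1.0
        speed[index] = min(v / max_speed, 1.0)
    zones = [float(n) for n in pedestrians_waiting]  # layer 3: waiting zones
    return presence + speed + zones


# Two vehicles (one moving, one stopped) and six waiting pedestrians.
state = build_state([(2, 0, 5.0), (7, 9, 0.0)], [3, 0, 1, 2], max_speed=10.0)
```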

4. Results and Discussion

4.1. Network Training

The number of vehicles and pedestrians, along with their respective speed values, were derived from previous traffic studies and experiments conducted in the city of Lisbon. Using this empirical data as a foundation, we simulated 2600 vehicles and 2000 pedestrians over the course of one hour to represent a typical peak-hour scenario in an urban environment. The learning rate was selected to balance convergence speed and stability, following guidelines from foundational work on reinforcement learning and optimization. Preliminary experiments were conducted to identify the optimal range, ensuring that the model avoided divergence or excessively slow training [22,24].

This approach ensures that the experimental conditions closely align with real-world traffic patterns while providing a controlled environment for our analysis. It also aligns our parameter tuning with practices in mixed traffic flow analyses and provides insights into dynamic parameter optimization in traffic signal control [30,31]. To compare the scenarios, three neural networks were trained with a 50/50 reward function, these being the weights for vehicles and pedestrians, respectively. The environment simulated 2600 vehicles and 2000 pedestrians over 300 episodes, each lasting 3600 s. The batch size was tuned to balance computational efficiency and the stability of gradient updates. Smaller batch sizes provided better generalization for our traffic scenarios, while larger sizes caused slower response to changing dynamics.

Training parameters are given in Table 2.

Table 2. Training parameters.

The three neural networks utilized in the comparison are based on a Feed Forward Neural Network (FFNN) architecture, structured as follows:

Input Layer: 164 neurons, representing the current state of the environment.
Hidden Layers: Five fully connected hidden layers, each utilizing the ReLU activation function for non-linearity.
Output Layer: 9 neurons, each corresponding to a discrete action that the agent can take in the environment.

This choice of architecture was made to balance computational efficiency with sufficient model complexity to capture the dynamics of the traffic environment.
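To make the layer structure concrete, the following dependency-free sketch builds a network with the stated shape (164 inputs, five ReLU hidden layers, 9 outputs) and runs a forward pass. It is not the trained model: the hidden-layer width, the random initialization, and the reading of the outputs as per-action scores from which the highest is selected are assumptions made for illustration.

```python
import random

STATE_SIZE = 164    # input neurons (from the text)
N_ACTIONS = 9       # output neurons (from the text)
HIDDEN_LAYERS = 5   # fully connected ReLU layers (from the text)
HIDDEN_WIDTH = 32   # hypothetical width, chosen only for this example


def init_network(seed=0):
    """Create (weights, biases) pairs for each fully connected layer."""
    rng = random.Random(seed)
    sizes = [STATE_SIZE] + [HIDDEN_WIDTH] * HIDDEN_LAYERS + [N_ACTIONS]
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, state):
    """Apply ReLU after every hidden layer; the output layer is linear."""
    x = list(state)
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, value) for value in x]
    return x


def greedy_action(layers, state):
    """Return the index of the highest-scoring action."""
    scores = forward(layers, state)
    return max(range(len(scores)), key=scores.__getitem__)


network = init_network()
scores = forward(network, [1.0] * STATE_SIZE)
action = greedy_action(network, [1.0] * STATE_SIZE)
```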

To analyze the results obtained, Figure 10a presents the cumulative negative reward graph, where significant differences can be observed among the curves for the three training scenarios. In the arterial scenario, the system faces a higher volume of bidirectional traffic flow, resulting in a lower reward compared to the other scenarios. This is attributed to the increased traffic volume creating substantial pressure at each junction, leading to longer waiting times and larger queues.

Figure 10. Network training for the three scenarios. (a) Cumulative negative reward. (b) Average queue size.

In contrast, the symmetrical rerouting scenario shows an improvement in reward. By implementing micro-control at each junction, the system dynamically suggests better routes for vehicles, reducing queue sizes and alleviating pressure at critical points. This optimization leads to a higher cumulative reward as congestion is mitigated.

In the asymmetrical rerouting scenario, where traffic flow is adjusted to prioritize vehicles moving from west to east, the system adapts by prioritizing phases that facilitate movement in this direction. Combined with rerouting at critical junctions, this approach significantly reduces pressure across all intersections, leading to a notable increase in rewards compared to the other scenarios.

Overall, the results demonstrate that the networks were effectively trained, as rewards became progressively less negative over time. The symmetric rerouting and asymmetric rerouting scenarios showed faster convergence towards optimal strategies, greater stability, and robust performance, as evidenced by their more consistent reward distributions compared to the standard arterial scenario.

Figure 10b illustrates the average queue size during training for all three scenarios, aligning with the reward analysis. In the arterial scenario, the bidirectional traffic flow exerts higher pressure on the environment, resulting in larger queues. In the rerouting scenario, micro-control at the junctions ensures better traffic phase activation, accounting for lane capacity limits. This prevents excessive pressure, even when the flow is shifted to a 25/75 distribution.

Finally, the asymmetric rerouting scenario exhibits the lowest average queue sizes. By significantly reducing traffic volume in the east-to-west direction—one of the most critical traffic flows—the system minimizes congestion more effectively than in the other scenarios. This leads to better queue management and overall improved traffic performance.

4.2. Network Testing

To evaluate the trained networks, simulations were conducted for 3600 s involving 2600 vehicles and 2000 pedestrians. The number of cars and pedestrians in the simulation was derived from a detailed analysis in which traffic phase cycles were dynamically defined [22,24].

The environment scenario was optimized based on observed traffic patterns, and the number of vehicles and pedestrians was estimated under the condition of the longest traffic signal phase.
The intelligent system, powered by neural networks, was implemented and demonstrated that these levels of traffic flow could be efficiently managed.

The evaluation metrics validate the optimized performance of the proposed algorithm in terms of traffic safety and efficiency. Additional analyses were performed to assess the effects of optimization on halting vehicles and pedestrians in both standard and rerouting scenarios. The results confirmed that these traffic densities were feasible when optimal decisions were made by the neural-network-based intelligent system. This systematic approach allowed for realistic simulation conditions that closely mirror peak-hour scenarios.

Figure 11a–c depict the halting vehicles at junctions C0, C1, and C2, respectively.

Figure 11. Comparison of trends over time for vehicle halting sessions at intersections in standard versus rerouting symmetrical and asymmetrical scenarios: (a) Intersection C0, (b) Intersection C1, and (c) Intersection C2.

In the scenario with three horizontally arranged junctions, C1 is identified as the critical junction. Situated between C0 and C2, it receives traffic from both directions, subjecting it to significant pressure and increasing vehicle queues. By incorporating queue length thresholds, the system dynamically responds to congestion levels on key roads, such as those connecting C0 to C1 and C1 to C2.

For example, vehicles traveling from west to east at junction C0 are informed of road conditions between C0 and C1 upon entering the system. If the vehicle count exceeds a set threshold, such as 25 cars (despite the road’s 400 m capacity to hold more), vehicles are rerouted to turn right instead. This rerouting prevents additional congestion on the road between C0 and C1, as illustrated in Figure 11a for the rerouting and asymmetric rerouting scenarios. Vehicles that reroute exit the system sooner, alleviating pressure at subsequent junctions. This leaves space for new vehicles to enter and prevents scenarios where cars are stalled at green lights due to blocked lanes ahead.
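The threshold rule in this example can be written as a short decision function. This is a sketch of the rule as stated, not the deployed control logic: the function name and the string labels for movements are hypothetical, and the default of 25 vehicles is the example threshold quoted above.

```python
def choose_movement(intended, link_vehicle_count, threshold=25):
    """Decide the movement of a vehicle entering at C0 heading east.

    A vehicle that intends to go straight onto the C0-C1 road is
    rerouted to turn right when the number of vehicles already on that
    road exceeds the threshold. Other movements are left unchanged.
    """
    if intended == "straight" and link_vehicle_count > threshold:
        return "right"
    return intended


rerouted = choose_movement("straight", link_vehicle_count=26)
unchanged = choose_movement("straight", link_vehicle_count=25)
```

Keeping the threshold below the physical capacity of the 400 m road leaves headroom on the link, so vehicles released by a green phase always have space to move into.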

Additionally, the system manages congestion by prioritizing traffic phases that direct fewer cars into critical lanes. For instance, instead of activating the west-to-east phase, which would funnel more cars into C1’s critical lane, the system may activate a north-to-south phase or a pedestrian phase, allowing C1 to clear its critical lane. As shown in Figure 11b, during the peak period between 800 and 2200 s, the standard scenario exhibits significantly higher congestion at C1 compared to the rerouting scenarios. C1, being the pivotal junction between C0 and C2, connects two critical roads in the network. Strict micro-control is necessary, particularly due to the disparity in road lengths: the road between C0 and C1 is 400 m long, while the road between C1 and C2 is only 200 m. This difference in capacity requires careful traffic management to prevent overwhelming C2. For example, vehicles traveling from C2 to C1 and heading toward C0 move more easily due to transitioning from the shorter 200 m road to the longer 400 m road. However, even in this case, the system ensures the 25-vehicle limit is respected to maintain traffic flow.

In Figure 11c, the halting vehicles at C2 are shown. The rerouting scenarios consistently reduce the number of stopped vehicles compared to the standard scenario, aligning with earlier observations. While congestion management at C2 is generally effective, it requires greater precision due to the interplay of route changes and phase activations.

The asymmetric rerouting scenario focuses on managing the higher traffic flow in the west-to-east direction, deprioritizing the east-to-west flow where traffic volumes are lower. By reducing the flow from east to west, a critical region in the system, the system handles peak traffic volumes (between 500 and 2000 s) effectively. During these peak periods, the number of stopped vehicles across the three junctions averages around 50, demonstrating the system’s ability to maintain stability even under unbalanced flow conditions.

Compared to the high queue levels observed in the standard scenario at junctions C0, C1, and C2, the rerouting scenario reduces average traffic pressure by 66%, 50%, and 75%, respectively. These results highlight the significant efficiency gains provided by rerouting, making arterial roads more effective and improving overall traffic flow.

Careful management of the 200 m road connecting C2 to C1 is critical, as its limited capacity can exacerbate congestion at C1. By implementing dynamic rerouting and optimized phase activations, the system ensures balanced traffic distribution, minimizing pressure on all intersections.

Figure 12 presents the number of halted pedestrians at each intersection for both the standard and rerouting scenarios. A comparison of pedestrian traffic between these scenarios reveals that the rerouting scenario generally results in fewer pedestrians waiting in designated areas.

Figure 12. Comparison of trends over time for pedestrian halting sessions at intersections in standard versus rerouting symmetrical and asymmetrical scenarios: (a) intersection C0, (b) intersection C1, and (c) intersection C2.

The most significant differences between scenarios occur at intersections C0 and C2, where the rerouting scenario more effectively utilizes the pedestrian phase, enabling smoother pedestrian flow. At C1, the critical junction with the highest overall pedestrian volume, there are more halted pedestrians than at C0 and C2. However, even at C1, the rerouting scenario consistently shows fewer waiting pedestrians compared to the standard scenario.

While the rerouting scenario does not explicitly prioritize pedestrian flow, it indirectly benefits pedestrians by reducing vehicle queues. With fewer vehicles clogging the intersections, the system can more frequently activate pedestrian phases, facilitating safer and more efficient crossings.

The pedestrian phase serves a dual purpose: it not only manages pedestrian flow but also helps regulate vehicle traffic on critical roads connecting intersections. As such, pedestrians indirectly benefit from improved traffic flow in the rerouting scenario. However, the overall reduction in halted pedestrian numbers is modest. This is because activating a pedestrian phase does not guarantee that large numbers of pedestrians will cross, as the availability of pedestrians in waiting zones varies. Factors like smaller groups, fewer pedestrians arriving during active phases, or inconsistent arrival patterns influence this variability.

Peaks in pedestrian halting sessions are closely tied to crossing periods, with the size of these peaks reflecting congestion levels and pedestrian reactions to connected vehicle traffic. The rerouting scenario exhibits smaller peaks than the standard scenario, whose more pronounced peaks indicate higher stress levels. Between 500 and 1000 s, the rerouting scenario reduces average pedestrian pressure by 25%, alleviating congestion during peak traffic times when pedestrian and vehicle wait times overlap. This reduction in pressure lowers the risk of pedestrians being struck by vehicles, enhancing safety performance and underscoring the importance of rerouting in traffic management.
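A windowed comparison of this kind can be sketched as follows, assuming that pedestrian pressure is measured as the number of halted pedestrians sampled over time. The sample values are made up for illustration and are not data from the study; `mean_in_window` and `percent_reduction` are hypothetical helper names.

```python
def mean_in_window(samples, start_s, end_s):
    """Average of (time_s, value) samples with start_s <= time_s <= end_s."""
    values = [v for t, v in samples if start_s <= t <= end_s]
    if not values:
        raise ValueError("no samples in window")
    return sum(values) / len(values)


def percent_reduction(baseline, candidate, start_s=500, end_s=1000):
    """Relative drop of the windowed mean from baseline to candidate, in percent."""
    base = mean_in_window(baseline, start_s, end_s)
    cand = mean_in_window(candidate, start_s, end_s)
    return 100.0 * (base - cand) / base


# Made-up halted-pedestrian counts, NOT measurements from the study.
standard = [(400, 5), (500, 8), (750, 12), (1000, 8), (1100, 4)]
rerouting = [(400, 5), (500, 6), (750, 9), (1000, 6), (1100, 4)]
print(percent_reduction(standard, rerouting))
```

Restricting the average to the 500 to 1000 s window keeps quiet periods outside the peak from diluting the comparison.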

The asymmetric rerouting scenario shows slightly higher pedestrian waiting numbers at C0 and C2 than the symmetric rerouting scenario. This difference arises from the redistribution of vehicles: instead of reducing vehicle numbers from the east, they are rerouted to other entry points across the network. As a result, intersections like C0 and C2 experience a slight increase in pedestrian presence, as priority is given to managing the high flow of west-to-east vehicles.

At C1, the critical junction, pedestrians experience the longest wait times due to the increased vehicle flow originating from the west. Consequently, the number of pedestrians in waiting zones rises as a result of the higher traffic pressure at this intersection.

In conclusion, the rerouting scenario demonstrates superior performance in reducing pedestrian halts and enhancing safety, especially during peak periods. Despite some minor increases in pedestrian pressure under the asymmetric rerouting scenario, both rerouting approaches improve overall traffic flow and safety for all road users.

4.3. Global Agent Decisions

Figure 13 shows a comparison of trends over time for all of the active phases (agent actions) validated by the global agent at C0, C1, and C2 intersections in standard (left) versus rerouting symmetrical (middle) and asymmetrical (right) scenarios. At the top, the nine actions that the agent can take in the environment are displayed. These mini-figures highlight the differences in traffic management strategies across the scenarios under consideration. They represent how the neural network adapted its strategy during training, reflecting the learned traffic signal control patterns tailored to the observed traffic dynamics in each environment.

Figure 13. Comparison of trends over time for the active phases (agent actions) at C0, C1, and C2 intersections in standard (left) versus rerouting symmetrical (middle) and asymmetrical (right) scenarios. The nine possible phases are indicated at the top.

Results show that, as is characteristic of dynamic traffic control systems, the system departs from a fixed phase sequence and continuously adapts to real-time traffic conditions. Crucially, pedestrian phases are triggered only upon pedestrian request, so phase usage favors vehicular movement unless a pedestrian need arises.
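One possible way to enforce request-only pedestrian phases is to mask the pedestrian action during greedy selection. The study does not specify the mechanism (it may equally emerge from the reward penalties), so the following is a minimal sketch under that assumption; the zero-based index of Phase 9 and the function name `select_phase` are hypothetical.

```python
PEDESTRIAN_PHASE = 8  # zero-based index of Phase 9 (assumed ordering of the nine actions)


def select_phase(q_values, pedestrian_request):
    """Greedy phase choice; the pedestrian phase is eligible only on request."""
    candidates = [
        (q, idx)
        for idx, q in enumerate(q_values)
        if pedestrian_request or idx != PEDESTRIAN_PHASE
    ]
    _, best_idx = max(candidates)
    return best_idx + 1  # report as Phase 1..9


# Illustrative action values for one decision step (not learned values).
q = [0.2, -0.1, 0.0, -0.3, 0.5, 0.1, -0.2, 0.3, 0.9]
print(select_phase(q, pedestrian_request=False))
print(select_phase(q, pedestrian_request=True))
```

With the mask in place, the highest-valued vehicle phase is chosen whenever no pedestrian is waiting, even if the pedestrian phase has the largest action value.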

When comparing the three scenarios, the allocated green times and phase sequences across intersections exhibit notable variations over time. During the first 1500 s, green times for Phase 1 (N > S) and Phase 5 (E > W) at intersections C0 and C1 in the standard scenario differ significantly from those at C2. In contrast, the rerouting scenarios show distinct adjustments: green times for Phase 1 at C0 and C1 nearly double, while green times for Phase 5 at C2 increase significantly in both rerouting configurations.

Furthermore, the symmetric and asymmetric rerouting scenarios prioritize Phase 9 (pedestrians) and Phase 8 (left turns) during the initial period. This emphasis reflects a strategic focus on smoother rerouting and effective congestion management by prioritizing pedestrian and turning movements to support traffic flow redistribution.

This adaptive system efficiently handles both pedestrian and vehicle flows under varying traffic conditions.

Standard Scenario: The system aims to maximize green band efficiency for vehicles while minimizing pedestrian rule violations. Green lights are coordinated to reduce vehicle stops, while pedestrian crossing times are scheduled to prevent excessive wait times.
Rerouting Scenarios: When arterial traffic demand exceeds system capacity due to incidents or severe congestion, the system activates rerouting mechanisms. Traffic light coordination is dynamically reconfigured to redistribute traffic flow. For example, in these scenarios, 25% of vehicles are directed along the more congested straight paths, while the remaining 75% are rerouted to turning movements. This adjustment alleviates congestion on main arteries and enhances overall flow efficiency.
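The 25%/75% split described for the rerouting scenarios can be sketched as a random route assignment. This is a minimal illustration that assumes each vehicle is assigned independently; the function name `assign_routes`, the route labels, and the seed are hypothetical, and the study's actual route generation in SUMO may differ.

```python
import random


def assign_routes(vehicle_ids, straight_share=0.25, seed=0):
    """Label each vehicle 'straight' or 'turn' with the given straight share."""
    rng = random.Random(seed)
    return {
        vid: "straight" if rng.random() < straight_share else "turn"
        for vid in vehicle_ids
    }


routes = assign_routes(range(2600))  # 2600 vehicles, as generated per episode
straight = sum(1 for r in routes.values() if r == "straight")
print(straight / len(routes))
```

Because the assignment is random, the realized share only approximates 25% for any single episode.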

In the symmetric rerouting scenario, equal priority is given to both traffic directions, ensuring balanced flow across the network. Green times and phase sequences adapt dynamically to accommodate redistributed traffic. In the asymmetric rerouting scenario, the system assigns priority based on traffic demand. For instance, when one direction experiences higher traffic volumes, green times for corresponding high-traffic lanes are extended. This asymmetry effectively reduces pressure on critical lanes while maintaining manageable pedestrian waiting zones.

Table 3 presents the overall percentages of green times for intersections C0, C1, and C2 over a training segment for the three scenarios. These values are also visualized in Figure 14 for comparison.

Table 3. Global percentages of green times for C0, C1, and C2 for both standard and rerouting scenarios.

Figure 14. Comparison of green time trends across all active phases at intersections C0, C1, and C2. Active phases are indicated at the top for clarity. (a) Standard scenario. (b) Symmetric rerouting scenario. (c) Asymmetric rerouting scenario.

As expected, the system prioritizes critical phases (P1, P5, P6, and P9) with green bands that depend on the scenario, adapting its strategy to reduce wait times and improve traffic flow.
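Green-time percentages of this kind can be derived from a per-step log of active phases. The following sketch assumes one logged phase number per simulation step; the log shown is hypothetical and `green_time_shares` is an illustrative helper, not code from the study.

```python
from collections import Counter


def green_time_shares(phase_log):
    """Percentage of logged steps spent in each phase (P1..P9), rounded."""
    if not phase_log:
        raise ValueError("phase log is empty")
    counts = Counter(phase_log)
    total = len(phase_log)
    return {f"P{p}": round(100 * counts.get(p, 0) / total) for p in range(1, 10)}


# Hypothetical per-step phase log for one intersection (not data from the study).
log = [5] * 4 + [1] * 3 + [9] * 2 + [6] * 1
print(green_time_shares(log))
```

Because each share is rounded independently, a column of percentages computed this way need not sum to exactly 100%.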

Figure 14 provides a comparison of green time trends across all active phases at intersections C0, C1, and C2 for the analyzed scenarios. Active phases are indicated at the top of the figure for clarity.

Standard Scenario: Green times are predominantly allocated to the arterial direction, particularly the west–east phase (P5). However, this allocation sharply decreases from 37% at C0 to 20% at C2, creating controlled bottlenecks along specific road sections or chains of sections. This imbalance leads to queue build-ups at junctions C0, C1, and C2, as shown in Figure 11a–c. While prioritizing the arterial direction, the irregular green time distribution contributes to congestion, especially at downstream intersections.
Symmetrical Rerouting Scenario: Green times are redistributed more evenly, with P5 green times decreasing gradually from 33% to 28%. This balanced allocation ensures smoother and more consistent traffic flow across the network, significantly reducing congestion and preventing bottlenecks observed in the standard scenario.
Asymmetrical Rerouting Scenario: Green time allocation is further adapted by increasing the duration of P6 to prioritize the west-to-east (W > E) direction, addressing the higher vehicle flow in this direction. To achieve this prioritization, the system reduces the activation of less critical north–south phases (P2, P3, and P4), ensuring optimal resource utilization for the prioritized W > E traffic.

These results demonstrate how the dynamic adjustment of ω, p_veh, p_ped, and phase-specific green times enables the system to adapt effectively to varying traffic scenarios, improving overall traffic performance. These examples highlight the system’s ability to manage diverse challenges and achieve scenario-specific goals.

Pedestrian phases also exhibit notable differences between the scenarios. In the standard scenario, the pedestrian phase (P9) at C1 nearly triples, significantly enhancing safe crossing opportunities. This adjustment helps reduce queues at all intersections, as vehicles in the standard scenario are rerouted to exit the environment via right turns, alleviating pressure downstream. In the asymmetric rerouting scenario, pedestrian phases (P9) maintain similar durations at C0 and C2 but decrease at C1. This reduction is necessary to allocate more green time to vehicle phases supporting the W > E direction, ensuring stable vehicle flows without causing excessive pressure on critical junctions.

The adaptive nature of the system allows it to effectively manage both vehicle and pedestrian traffic. For vehicles, it minimizes critical queues by dynamically redistributing green times, thereby reducing congestion and relieving pressure at intersections. For pedestrians, consistently activated crossing phases ensure safe and efficient movement through intersections, except when temporary reductions are required to manage critical vehicle flows. This dynamic flexibility extends to traffic phase activation, which occurs without a fixed sequence, allowing the system to respond in real time to evolving traffic patterns at each intersection.

The system demonstrates its adaptability by accommodating 2600 vehicles and 2000 pedestrians under diverse traffic conditions. The standard scenario, while prioritizing arterial traffic, results in bottlenecks and longer vehicle queues but compensates by enhancing pedestrian safety through extended crossing times (P9). The symmetric rerouting scenario achieves a more balanced green time allocation, reducing congestion and distributing traffic loads more evenly. Finally, the asymmetric rerouting scenario prioritizes W > E traffic through increased green times for P6, effectively reducing congestion in critical directions while temporarily reducing pedestrian crossing times at C1 to maintain overall flow stability.

Overall, the system dynamically adapts to real-time conditions, ensuring efficient traffic flow and safety for all road users. Each scenario demonstrates specific trade-offs and benefits, highlighting the importance of flexible, adaptive traffic management strategies in complex urban environments.

4.4. Impact of VLC and CV Integration on Traffic Rerouting

The integration of Visible Light Communication (VLC) and connected vehicle (CV) systems within traffic management demonstrates significant potential to improve traffic flow and safety, particularly through rerouting mechanisms. In the rerouting scenarios, these systems show notable benefits in reducing congestion and enhancing road safety, underscoring their value in advanced traffic management strategies.

The rerouting scenarios dynamically adjust traffic light phases to reduce congestion while maintaining pedestrian safety. Adaptive prioritization of green times ensures that pedestrian waiting zones remain within manageable limits, optimizing both vehicle throughput and pedestrian flow. The asymmetric rerouting scenario, in particular, further tailors signal adjustments to localized traffic demands, providing a more targeted approach to congestion management.

To evaluate the impact of the reward function across scenarios, we previously conducted experiments [22,24] with varying traffic densities for one and two arterial intersections. The results showed that in high-traffic conditions, prioritizing vehicles (ω > 1, high p_veh) improved vehicle throughput but slightly increased pedestrian wait times. In low-traffic conditions, balanced weights (ω = 1, equal p_veh and p_ped) optimized overall system performance.
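The effect of these weights can be illustrated with a deliberately simplified reward. The functional form below, a negative weighted sum of halted vehicles and halted pedestrians, is an assumption made only for illustration and is not the reward function defined in this paper; the state values are likewise hypothetical.

```python
def reward(halted_vehicles, halted_pedestrians, omega=1.0, p_veh=1.0, p_ped=1.0):
    """Negative weighted sum of halted road users (assumed form, for illustration)."""
    return -(omega * p_veh * halted_vehicles + p_ped * halted_pedestrians)


# Two hypothetical outcomes of a signal decision: (halted vehicles, halted pedestrians).
state_a = (10, 4)  # favors pedestrians
state_b = (6, 9)   # favors vehicles

balanced = [reward(*s) for s in (state_a, state_b)]
vehicle_first = [reward(*s, omega=2.0) for s in (state_a, state_b)]
print(balanced, vehicle_first)
```

With balanced weights the pedestrian-friendly outcome scores higher, while with ω = 2 the vehicle-friendly outcome is preferred. This mirrors the reported trade-off between vehicle throughput and pedestrian waiting.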

By integrating VLC and CV technologies with rerouting systems, traffic management becomes more responsive, more efficient, and safer for all road users. These technologies enable real-time communication between intersections, allowing traffic signals to synchronize and adjust dynamically based on live data. This reduces stop-and-go driving, minimizes congestion, and facilitates smoother traffic flow across the network.

Real-time data sharing between intersections allows rerouting systems to anticipate vehicle queues and proactively adjust signal phases to minimize waiting times. This capability benefits vehicles and pedestrians by reducing delays and ensuring safer, more timely pedestrian crossings while avoiding conflicts with vehicular traffic.

Rerouting plays a crucial role in congestion management by redirecting vehicles away from heavily congested areas and preventing critical intersections from becoming overwhelmed. Additionally, these systems can address unexpected issues, such as malfunctioning signals or sudden congestion spikes, preventing minor problems from escalating into major traffic disruptions.

During blockages or heavy congestion, rerouting enhances network efficiency by dynamically assigning priority to alternative routes, redistributing traffic loads, and reducing bottlenecks at key intersections. This adaptive control ensures real-time adjustments to traffic flows, optimizing both arterial and local road conditions.

5. Conclusions and Future Trends

5.1. Summary

This study presents a detailed examination of intelligent arterial traffic management systems spanning multiple intersections, focusing on the synergistic potential of VLC and DRL techniques to enhance safety and efficiency. By leveraging vehicle-to-vehicle VLC messages and intelligent state representations, we developed dynamic traffic control models capable of managing traffic efficiently across interconnected intersections.

Using Multi-Agent Reinforcement Learning integrated with the SUMO simulator, we demonstrated the effectiveness of our approach in optimizing traffic flow and reducing congestion. The simulations accounted for various critical factors, including vehicle and pedestrian volumes, velocities, pedestrian clearance times, and waiting zone occupancy. These parameters are pivotal for ensuring robust and efficient traffic management systems. Our approach proved highly adaptable to dynamic traffic scenarios, showcasing the importance of continuous learning in fluctuating urban environments.

The results highlight the efficacy of rerouting in enhancing arterial road efficiency while demonstrating tangible benefits, such as shorter queue lengths and fewer stopped vehicles at intersections. When analyzing the three scenarios, standard, symmetric rerouting, and asymmetric rerouting, it is evident that rerouting significantly enhances traffic flow by redistributing vehicles more effectively. In the symmetric rerouting scenario, green times were more evenly allocated across all intersections, resulting in smoother traffic flow and reduced bottlenecks. The prioritization of phases, such as pedestrian crossing and left turns, facilitated better synchronization and balance in traffic management, accommodating both vehicles and pedestrians. The asymmetric rerouting scenario further demonstrated its advantage by tailoring traffic adjustments to prioritize high-traffic directions, such as W > E, while reducing activations of less critical phases, such as N > S and S > N. This targeted approach addressed localized congestion issues effectively, ensuring a more balanced distribution of traffic without overwhelming specific intersections. Both rerouting scenarios also demonstrated substantial benefits for pedestrian safety by reducing vehicle queues and enabling more frequent pedestrian phases. The overall reduction in halted vehicles and improved traffic fluidity contributed to a safer and more efficient traffic environment for all road users.

5.2. Novel Contributions

This paper presents a novel integration of Visible Light Communication (VLC) and Artificial Intelligence (AI) for adaptive traffic management. Unlike prior studies that primarily focus on VLC-based communication systems or traffic control methods [5,22,24,25,26], this study proposes a unified framework that combines VLC and AI. The AI-based approach leverages Deep Reinforcement Learning (DRL) to dynamically optimize traffic flow and enhance safety.

A key contribution of this paper is the design of customized reward functions to handle rerouting scenarios effectively. These reward functions prevent queue spillovers and mitigate cascading congestion while ensuring safe and timely pedestrian crossings—a feature not addressed in previous studies.

Another significant advancement is the implementation of global coordination for traffic management using multi-agent systems. Unlike previous work that focused on single-intersection optimization, this study employs a global agent coordination mechanism to manage traffic signals across multi-intersection networks. This approach facilitates indirect communication between local agents, enhancing scalability and adaptability.

5.3. Key Challenges and Limitation Costs

Despite these benefits, this study also identifies key challenges and limitations associated with implementing these systems in real-world scenarios. The effectiveness of rerouting heavily depends on the speed and reliability of real-time data transmission. Delays in communication between intersections can disrupt synchronization, leading to inefficient traffic control and safety risks from misaligned signals. Scalability becomes a critical issue as traffic systems expand to larger urban areas, requiring robust computing infrastructure capable of processing and analyzing vast amounts of data without performance degradation. Furthermore, network downtime or instability, particularly in regions with outdated infrastructure, can undermine rerouting efficiency, resulting in operational delays and inefficiencies. Upgrading these systems to accommodate rerouting capabilities often requires significant investment in hardware and software. Even with reliable data, the performance of rerouting systems depends on decision-making algorithms that must process information quickly and accurately. Delays or inefficiencies in algorithmic processing can compromise system effectiveness, leading to suboptimal traffic flow and increased congestion.

Although precise cost estimates are not calculated due to the research-oriented focus of this study, we address potential barriers and practical considerations for implementing multi-intersection VLC and DRL systems. Key aspects include the following:

-: Hardware Costs: Expenses depend on the deployment scale and performance requirements, encompassing LED transmitters, photodetectors, and edge-computing devices for real-time processing.
-: Maintenance Requirements: Regular upkeep involves replacing LEDs, calibrating photodetectors, and updating edge devices, with costs influenced by environmental factors and usage intensity.
-: Compatibility with Existing Systems: Deployment may require retrofitting traditional traffic lights and ensuring DRL system integration with centralized traffic management.

5.4. Challenges and Mitigation Strategies in VLC and Multi-Agent Communication

To provide a comprehensive analysis of the practical challenges and potential solutions associated with VLC and multi-agent communication systems, we have identified key issues and proposed strategies for mitigation:

VLC Disruptions and Bottlenecks: VLC communication is highly dependent on direct line-of-sight (LOS), which can be obstructed by vehicles or pedestrians. Enhancing signal coverage can be achieved by deploying multiple transmitters, receivers, and reflective surfaces. Additionally, ambient light interference, such as sunlight, can be mitigated using optical filters.

Multi-Agent Communication Challenges: Ensuring synchronized communication in distributed environments is essential to prevent delays in decision making. This can be addressed using time-stamping protocols and clock synchronization to enhance coordination. Implementing edge computing at intersections also reduces reliance on centralized servers, minimizing communication latency.
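A time-stamp check of this kind can be sketched using the Hour/Min/Sec triple carried in the TIME field of the message protocol (Table 1). The 2 s age budget and the helper names `to_seconds` and `is_fresh` are illustrative assumptions; the sketch also assumes that both timestamps fall on the same day.

```python
def to_seconds(hour, minute, second):
    """Convert an Hour/Min/Sec triple (as in the TIME field) to seconds of day."""
    return hour * 3600 + minute * 60 + second


def is_fresh(sent, received, max_age_s=2):
    """Accept a message only if 0 <= age <= max_age_s (same-day timestamps)."""
    age = to_seconds(*received) - to_seconds(*sent)
    return 0 <= age <= max_age_s


print(is_fresh((8, 15, 30), (8, 15, 31)))  # message is 1 s old
print(is_fresh((8, 15, 30), (8, 15, 40)))  # message is 10 s old
print(is_fresh((8, 15, 31), (8, 15, 30)))  # negative age: clocks disagree
```

Rejecting negative ages as well as stale ones gives a simple signal that two agents' clocks have drifted and need resynchronization.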

Scalability Concerns: In high-density traffic networks, increased data exchange among agents can lead to network congestion. Adopting priority-based communication protocols ensures that critical messages are transmitted promptly, even in congested conditions.
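A priority-based send queue can be sketched with a binary heap. The priority levels and message labels below are hypothetical, chosen only to show the ordering behavior; the class `PriorityOutbox` is not part of the system described in this study.

```python
import heapq
import itertools


class PriorityOutbox:
    """Send queue that releases the most critical message first."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def push(self, priority, message):
        # Lower number = more urgent.
        heapq.heappush(self._heap, (priority, next(self._order), message))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


outbox = PriorityOutbox()
outbox.push(2, "queue-length update")
outbox.push(0, "pedestrian request")
outbox.push(1, "spillback warning")
outbox.push(0, "phase change notice")
sent = [outbox.pop() for _ in range(len(outbox))]
print(sent)
```

The insertion counter guarantees that messages of equal priority leave in arrival order, so urgent traffic is never reordered among itself.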

Future Work: Future research will focus on testing these mitigation strategies in simulated environments with realistic traffic scenarios and varying network loads. Experimental validations in small-scale deployments will also be conducted to assess the effectiveness of the proposed solutions.

5.5. Future Trends

Addressing these challenges is essential to unlocking the full potential of VLC and CV integration, creating safer, more efficient and adaptive traffic management systems. Looking ahead, the scalability and generalization of these proposed approaches remain critical areas of exploration. Future research will focus on testing and validating the methods across diverse urban settings characterized by varying traffic volumes, road layouts, and pedestrian dynamics. By continuing to refine and extend these techniques, this research aims to advance the development of intelligent traffic management systems that effectively address the growing complexities of urban mobility.

Author Contributions

Conceptualization, M.V. (Manuela Vieira); Formal analysis, M.A.V. and M.V. (Manuela Vieira); Investigation, M.V. (Mário Vestias), P.L., and P.V.; Methodology, M.V. (Manuela Vieira), G.G., and M.A.V.; Software, G.G.; Validation, G.G., M.A.V., M.V. (Mário Vestias), P.L., and P.V.; Writing—original draft, G.G.; Writing—review and editing, M.V. (Manuela Vieira). All authors have read and agreed to the published version of the manuscript.

Funding

This research received support from FCT—Fundação para a Ciência e a Tecnologia, through the Research Unit CTS—Center of Technology and Systems, with references UIDB/00066/2020 and IPL/IDICA/2024/INUTRAM_ISEL.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors acknowledge CTS-ISEL and IPL.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

1. Siegel, J.E.; Erb, D.C.; Sarma, S.E. A survey of the connected vehicle landscape—Architectures, enabling technologies, applications, and development areas. IEEE Trans. Intell. Transp. Syst. 2018, 19, 2391–2406.
2. O’Brien, D.; Le Minh, H.; Zeng, L.; Faulkner, G.; Lee, K.; Jung, D.; Oh, Y.; Won, E.T. Indoor visible light communications: Challenges and prospects. Proc. SPIE—Int. Soc. Opt. Eng. 2008, 7091, 60–68.
3. Parth, H.; Pathak, X.; Pengfei, H.; Prasant, M. Visible Light Communication, Networking and Sensing: Potential and Challenges. IEEE Commun. Surv. Tutor. 2015, 17, 2047–2077.
4. Caputo, S.; Mucchi, L.; Cataliotti, F.; Seminara, M.; Nawaz, T.; Catani, J. Measurement-based VLC channel characterization for I2V communications in a real urban scenario. Veh. Commun. 2021, 28, 100305.
5. Vieira, M.A.; Vieira, M.; Vieira, P.; Louro, P. Optical signal processing for a smart vehicle lighting system using a-SiCH technology. In Proceedings of the SPIE Optics+Electronics, Prague, Czech Republic, 24–27 April 2017; Volume 10231.
6. Ge, H.; Song, Y.; Wu, C.; Ren, J.; Tan, G. Cooperative Deep Q-Learning With Q-Value Transfer for Multi-Intersection Signal Control. IEEE Access 2019, 7, 40797–40809.
7. Vidali, A.; Crociani, L.; Vizzari, G.; Bandini, S. A Deep Reinforcement Learning Approach to Adaptive Traffic Lights Management. In Proceedings of the 20th Workshop “From Objects to Agents”, WOA 2019, Parma, Italy, 26–28 June 2019; pp. 42–50.
8. Oroojlooy, A.; Hajinezhad, D. A review of cooperative multi-agent deep reinforcement learning. Appl. Intell. 2023, 53, 13677–13722.
9. Girao, P.S.; Alegria, F.; Viegas, J.M.; Lu, B.; Vieira, J. Wireless System for Traffic Control and Law Enforcement. In Proceedings of the 2006 IEEE International Conference on Industrial Technology, Mumbai, India, 15–17 December 2006; pp. 1768–1770.
10. Khanna, A.; Goyal, R.; Verma, M.; Joshi, D. Intelligent traffic management system for smart cities: First International Conference, FTNCT 2018, Solan, India, 9–10 February 2018, Revised Selected Papers. In Futuristic Trends in Network and Communication Technologies; Springer: Singapore, 2018; pp. 152–164.
11. Zambrano-Martinez, J.L.; Calafate, C.T.; Soler, D.; Lemus-Zúñiga, L.-G.; Cano, J.-C.; Manzoni, P.; Gayraud, T. A Centralized Route-Management Solution for Autonomous Vehicles in Urban Areas. Electronics 2019, 8, 722.
12. Oskarbski, J.; Guminska, L.; Miszewski, M.; Oskarbska, I. Analysis of Signalized Intersections in the Context of Pedestrian Traffic. Transp. Res. Procedia 2016, 14, 2138–2147.
13. Pribyl, O.; Pribyl, P.; Lom, M.; Svitek, M. Modeling of Smart Cities Based on ITS Architecture. IEEE Intell. Transp. Syst. Mag. 2018, 11, 28–36.
14. Miucic, R. Connected Vehicles: Intelligent Transportation Systems; Springer: Cham, Switzerland, 2019.
15. Kodi, J.H. Evaluating the Mobility and Safety Benefits of Adaptive Signal Control Technology (ASCT). Master’s Thesis, University of North Florida, Jacksonville, FL, USA, 2019. Available online: https://digitalcommons.unf.edu/etd/930 (accessed on 7 April 2024).
16. Bilal, J.M.; Jacob, D. Intelligent Traffic Control System. In Proceedings of the 2007 IEEE International Conference on Signal Processing and Communications, Dubai, United Arab Emirates, 24–27 November 2007; pp. 496–499.
17. Yousefi, S.; Altman, E.; El-Azouzi, R.; Fathy, M. Analytical Model for Connectivity in Vehicular Ad Hoc Networks. IEEE Trans. Veh. Technol. 2008, 57, 3341–3356.
18. Zhang, J.; Wang, F.-Y.; Wang, K.; Lin, W.-H.; Xu, X.; Chen, C. Data-Driven Intelligent Transportation Systems: A Survey. IEEE Trans. Intell. Transp. Syst. 2011, 12, 1624–1639.
19. Shen, W.; Tsai, H. Testing vehicle-to-vehicle visible light communications in real-world driving scenarios. In Proceedings of the 2017 IEEE Vehicular Networking Conference (VNC), Torino, Italy, 27–29 November 2017; pp. 187–194.
20. Liang, X.; Du, X.; Wang, G.; Han, Z. A Deep Reinforcement Learning Network for Traffic Light Cycle Control. IEEE Trans. Veh. Technol. 2019, 68, 1243–1253.
21. Lopez, P.A.; Behrisch, M.; Bieker-Walz, L.; Erdmann, J.; Flötteröd, Y.P.; Hilbrich, R.; Lücken, L.; Rummel, J.; Wagner, P.; Wiessner, E. Microscopic Traffic Simulation using SUMO. In Proceedings of the 21st IEEE International Conference on Intelligent Transportation Systems, Maui, HI, USA, 4–7 November 2018; pp. 2575–2582.
22. Vieira, M.; Vieira, M.A.; Galvão, G.; Louro, P.; Véstias, M.; Vieira, P. Enhancing Urban Intersection Efficiency: Utilizing Visible Light Communication and Learning-Driven Control for Improved Traffic Signal Performance. Vehicles 2024, 6, 666–692.
23. Yousefpour, A.; Fung, C.; Nguyen, T.; Kadiyala, K.; Jalali, F.; Niakanlahiji, A.; Kong, J.; Jue, J.P. All one needs to know about fog computing and related edge computing paradigms: A complete survey. J. Syst. Archit. 2019, 98, 289–330.
24. Vieira, M.A.; Vieira, P.; Fernandes, R.; Louro, P. Dynamic Vehicular Visible Light Communication for Traffic Management. In Next-Generation Optical Communication: Components, Sub-Systems, and Systems XII; Li, G., Nakajima, K., Srivastava, A.K., Eds.; SPIE: Bellingham, WA, USA, 2023; p. 124290O.
25. Fernandes, R.; Vieira, M.A.; Vieira, P.; Louro, P.A.; Véstias, M. Using visible light communication to implement intelligent traffic signals and cooperative trajectories at urban intersections. In Light-Emitting Devices, Materials, and Applications XXVII; Kim, J.K., Krames, M.R., Strassburg, M., Eds.; SPIE: Bellingham, WA, USA, 2023; p. 124410G.
26. Vieira, M.A.; Vieira, M.; Louro, P.; Vieira, P. Cooperative vehicular communication systems based on visible light communication. Opt. Eng. 2018, 57, 076101.
27. Tang, F.; Kawamoto, Y.; Kato, N.; Liu, J. Future Intelligent and Secure Vehicular Network Toward 6G: Machine-Learning Approaches. Proc. IEEE 2019, 108, 292–307.
28. Luong, N.C.; Hoang, D.T.; Gong, S.; Niyato, D.; Wang, P.; Liang, Y.-C.; Kim, D.I. Applications of Deep Reinforcement Learning in Communications and Networking: A Survey. IEEE Commun. Surv. Tutor. 2019, 21, 3133–3174.
29. Ye, H.; Li, G.Y.; Juang, B.-H.F. Deep Reinforcement Learning Based Resource Allocation for V2V Communications. IEEE Trans. Veh. Technol. 2019, 68, 3163–3173.
30. Wu, S.; Sun, D.J.; Qiu, G. Emission analysis based on mixed traffic flow and license plate recognition model. Transp. Res. Part D Transp. Environ. 2024, 134, 104331.
31. Chen, S.; Sun, D.J. An Improved Adaptive Signal Control Method for Isolated Signalized Intersection Based on Dynamic Programming. IEEE Intell. Transp. Syst. Mag. 2016, 8, 4–14.

Table 1. Message protocol defined for each of the V-VLC communications.

	SOF 5 Bits	TIME 6 + 6 + 6 Bits			FLAG 3 Bits	COM 4 Bits	POSITION 4 + 4 Bits		PAYLOAD 4 + 4 + 4 + 4 + 4 + 4 Bits					EOF
L2V	Sync	Hour	Min	Sec	END	1	y	x	0000 + 0000					EOF
V2V	Sync	Hour	Min	Sec	END	2	y	x	Lane (0–7)	Device (nr)	Device IDy	Device IDx	Nr. behind	EOF
V2I	Sync	Hour	Min	Sec	END	3	y	x	TL (0–15)	Device (nr)	Device IDy	Device IDx	Nr. behind	EOF
I2V	Sync	Hour	Min	Sec	END	4	y	x	TL (0–15)	Device ID	Device IDy	Device IDx	Nr. behind	EOF
P2I	Sync	Hour	Min	Sec	END	5	y	x	TL (0–15)	N,S,E,W.	……			EOF
I2P	Sync	Hour	Min	Sec	END	6	y	x	TL (0–15)	Phase	…….			EOF
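
As a rough illustration of how a frame laid out as in Table 1 could be serialized, the following Python sketch concatenates the fields in the order of the table header, using the bit widths stated there (6 + 6 + 6 bits for TIME, 4 bits for COM, 4 + 4 bits for POSITION, and 4 bits per payload field). The COM codes are taken from the table. The SOF, FLAG, and EOF bit patterns, the EOF width, and all names (`Frame`, `to_bits`, `COM_CODES`) are assumptions introduced for this example and are not specified in the table.

```python
from dataclasses import dataclass, field
from typing import List

# COM codes as listed in Table 1.
COM_CODES = {"L2V": 1, "V2V": 2, "V2I": 3, "I2V": 4, "P2I": 5, "I2P": 6}

# ASSUMPTION: Table 1 gives only the widths of SOF (5 bits) and FLAG (3 bits)
# and no width for EOF, so these bit patterns are placeholders.
SOF_BITS = "10101"
FLAG_BITS = "000"
EOF_BITS = "0000"


def to_bits(value: int, width: int) -> str:
    """Return value as a fixed-width binary string, rejecting overflow."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b")


@dataclass
class Frame:
    kind: str                  # one of the COM_CODES keys
    hour: int
    minute: int
    second: int
    y: int                     # POSITION, 4 bits
    x: int                     # POSITION, 4 bits
    payload: List[int] = field(default_factory=list)  # 4-bit fields

    def encode(self) -> str:
        """Concatenate the fields in the column order of Table 1."""
        parts = [
            SOF_BITS,
            to_bits(self.hour, 6),
            to_bits(self.minute, 6),
            to_bits(self.second, 6),
            FLAG_BITS,
            to_bits(COM_CODES[self.kind], 4),
            to_bits(self.y, 4),
            to_bits(self.x, 4),
        ]
        parts += [to_bits(p, 4) for p in self.payload]
        parts.append(EOF_BITS)
        return "".join(parts)


# Hypothetical V2I request: traffic light 7, followed by four further
# 4-bit payload fields with made-up values.
frame = Frame("V2I", hour=10, minute=25, second=46, y=3, x=2,
              payload=[7, 1, 3, 2, 4])
bits = frame.encode()
```

With these placeholder patterns, the COM field occupies bits 26 to 29 of the encoded string, so a receiver in this sketch could identify the message type by reading that slice before parsing the payload.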

Table 2. Training parameters.

Parameter	Value
Number of Episodes	300
Max Steps	3600
Vehicles Generated	2600
Pedestrians Generated	2000
Hidden Layers	400
Activation Function	ReLU
Width Layers	400
Batch Size	100
Learning Rate	0.001
Training Epochs	800
Memory Size	50,000
Number of States (Input Layers)	164
Number of Actions (Output Layers)	9
Gamma	0.75
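
To make the role of several Table 2 parameters concrete, the sketch below collects them in a configuration object and shows how the batch size, memory size, and gamma would typically enter a replay-based Q-learning update. The numeric values are copied from Table 2. The update rule (a one-step target, reward plus gamma times the best next-state Q-value) and the names `TrainingConfig`, `ReplayMemory`, and `q_target` are assumptions made for illustration, not details confirmed by the table.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values copied from Table 2.
    episodes: int = 300
    max_steps: int = 3600
    batch_size: int = 100
    learning_rate: float = 0.001
    memory_size: int = 50_000
    num_states: int = 164      # size of the input layer
    num_actions: int = 9       # one output per traffic light phase
    gamma: float = 0.75        # discount factor


class ReplayMemory:
    """Fixed-capacity buffer; the oldest samples are dropped first."""

    def __init__(self, capacity: int):
        self._buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state) -> None:
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        # Return at most batch_size distinct stored samples.
        n = min(batch_size, len(self._buffer))
        return random.sample(list(self._buffer), n)

    def __len__(self) -> int:
        return len(self._buffer)


def q_target(reward: float, next_q_values, gamma: float) -> float:
    """ASSUMED one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    return reward + gamma * max(next_q_values)


cfg = TrainingConfig()
memory = ReplayMemory(cfg.memory_size)

# Hypothetical numbers: a reward of -4 and two next-state Q-values.
target = q_target(-4.0, [-2.0, -8.0], cfg.gamma)  # -4 + 0.75 * (-2) = -5.5
```

With gamma at 0.75, a reward ten steps ahead is weighted by roughly 0.056, so under this assumed update rule the agent would favor near-term queue reduction over long-horizon effects.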

Table 3. Global percentages of green times for C0, C1, and C2 for both standard and rerouting scenarios.

Standard (% Green Time)				Symmetrical (% Green Time)				Asymmetrical (% Green Time)
	C0	C1	C2		C0	C1	C2		C0	C1	C2
P1	21%	15%	13%	P1	13%	17%	11%	P1	13%	20%	15%
P2	2%	10%	6%	P2	9%	5%	10%	P2	5%	4%	6%
P3	3%	5%	9%	P3	2%	6%	7%	P3	10%	2%	6%
P4	2%	2%	4%	P4	3%	1%	2%	P4	0%	1%	2%
P5	37%	30%	22%	P5	33%	32%	28%	P5	31%	29%	16%
P6	7%	7%	19%	P6	6%	12%	9%	P6	14%	15%	17%
P7	4%	5%	4%	P7	9%	9%	3%	P7	6%	7%	10%
P8	7%	10%	11%	P8	7%	4%	8%	P8	1%	9%	9%
P9	16%	30%	22%	P9	18%	14%	21%	P9	20%	12%	19%

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
