A Grid-Enabled Vision and Machine Learning Framework for Safer and Smarter Intersections: Enhancing Real-Time Roadway Intelligence and Vehicle Coordination

Jha, Manoj K.; Jha, Pranav K.; Yadav, Rupesh K.

doi:10.3390/infrastructures11020041

Open Access Article


by Manoj K. Jha ¹,*, Pranav K. Jha ² and Rupesh K. Yadav ³

¹ Department of Information Technology, University of Maryland Global Campus, Adelphi, MD 20783, USA

² AI Solutions Architect, MKJHA Consulting, Inc., Severn, MD 21144, USA

³ Civil Engineering, College of Engineering, Design and Computing, University of Colorado, Denver, CO 80204, USA

* Author to whom correspondence should be addressed.

Infrastructures 2026, 11(2), 41; https://doi.org/10.3390/infrastructures11020041

Submission received: 28 October 2025 / Revised: 21 January 2026 / Accepted: 22 January 2026 / Published: 27 January 2026

(This article belongs to the Special Issue Safer Roads Ahead: Exploring the Latest Innovations and Advancements in Road Design and Safety Technology)


Abstract

Urban intersections are critical nodes for roadway safety, congestion management, and autonomous vehicle coordination. Traditional traffic control systems based on fixed-time signals and static sensors lack adaptability to real-time risks such as red-light violations, near-miss incidents, and multimodal conflicts. This study presents a grid-enabled framework integrating computer vision and machine learning to enhance real-time intersection intelligence and road safety. The system overlays a computational grid on the roadway, processes live video feeds, and extracts dynamic parameters including vehicle trajectories, deceleration patterns, and queue evolution. A novel active learning module improves detection accuracy under low visibility and occlusion, reducing false alarms in collision and violation detection. Designed for edge-computing environments, the framework interfaces with signal controllers to enable adaptive signal timing, proactive collision avoidance, and emergency vehicle prioritization. Case studies from multiple intersections typical of US cities show improved phase utilization, reduced intersection conflicts, and enhanced throughput. A grid-based heatmap visualization highlights spatial risk zones, supporting data-driven decision-making. The proposed framework bridges static infrastructure and intelligent mobility systems, advancing safer, smarter, and more connected roadway operations.

Keywords:

intelligent traffic management; computational grid; computer vision; active learning; red-light violations; autonomous vehicle coordination

1. Introduction

Managing urban traffic at intersections remains a critical challenge in intelligent transportation systems (ITSs), particularly in environments experiencing rapid growth in vehicle volumes, multimodal mobility, and increasing demands for real-time decision-making. Traditional traffic control methods such as fixed-time signals, embedded inductive loop detectors, and centralized traffic operations centers are often reactive and poorly equipped to address dynamic traffic conditions, red-light violations, or the behavioral characteristics of connected and autonomous vehicles (CAVs) [1,2]. These approaches are further constrained by reliance on static infrastructure, which contrasts sharply with the dynamic nature of traffic flow captured via live video feeds from roadside cameras.

Recent advances in computer vision and machine learning (ML) provide unprecedented opportunities to augment conventional traffic control systems with intelligent, real-time data processing capabilities [3,4,5]. Specifically, integrating motion analysis, vehicle classification, and anomaly detection from video streams enables adaptive and proactive traffic management strategies. However, challenges persist in robustness under low light, occlusion, and variable camera resolution. Many ML-based models remain supervised, requiring extensive labeled datasets that are often unavailable or unreliable in real-world scenarios.

To address these gaps, this paper introduces a novel framework that integrates computational grid modeling, computer vision, and active learning-based ML for dynamic traffic monitoring and intelligent intersection management. The core idea is to bridge the gap between static infrastructure (including geometric layouts and signal hardware) and the dynamic traffic environment that evolves in real time. By overlaying a computational grid on the roadway and leveraging video analytics, the system estimates vehicle speeds, deceleration rates, and stopping distances. The ML module, powered by active learning, adaptively selects informative instances for labeling to enhance classification performance under uncertain visual conditions.

This study builds on a significant body of prior work by Jha and colleagues, including ML-driven CAV communications, bi-layered travel-time prediction, cybersecurity detection in connected vehicles, and active learning for infrastructure risk modeling. Additional foundational contributions from the team address signal timing optimization [6], dynamic sight distance modeling [7], and urban congestion hotspot prediction [8].

External research provides further context for this work. Wang et al. [9] proposed grid-based traffic abstraction using probe vehicle data, Konrad et al. [10] developed occupancy grid techniques for rural road estimation, Liu et al. [11] applied grid mapping for recurring congestion analysis with taxi GPS data, and Zhou [12] introduced parallel genetic algorithms for predictive traffic control via grid computing. Vision-based traffic monitoring methods, such as YOLO, DeepSORT, and R-CNN, have gained traction in real-time ITS deployments [13,14].

To validate the proposed framework, six real-world case studies are presented, drawn from six intersections typical of US cities and spanning diverse operational environments: (1) grid-based vehicle flow mapping; (2) entropy-driven uncertainty quantification for nighttime detection; (3) phase optimization at an urban intersection; (4) spatiotemporal grid coordination; (5) multimodal priority at a transit-adjacent intersection; and (6) weather-resilient detection and control under degraded visibility. These studies demonstrate the framework’s adaptability to varying lighting conditions, traffic patterns, and geometric configurations.

This paper advances the field by unifying these domains into a deployable, edge-compatible system architecture capable of informing adaptive signal timing, enhancing red-light violation detection, and supporting vehicle-to-infrastructure (V2I) communication for autonomous vehicle coordination.

2. Literature Review

Recent advances in intelligent traffic signal control have produced a wide range of approaches that differ in sensing modalities, decision-making paradigms, and deployment scale. Over the last decade, research has significantly reshaped intelligent traffic signal control, particularly through deep reinforcement learning, vision-based perception, and edge/cloud-enabled distributed control architectures. This section covers these recent methodological advances alongside classical foundational works that remain standard baselines in traffic systems research, with emphasis on current trends and open challenges. We review the most relevant strands of prior work and identify the gap addressed by the proposed framework.

2.1. Fixed-Time, Actuated, and Adaptive Signal Control

Traditional traffic signal control strategies rely on fixed-time plans or actuated logic based on loop detectors, magnetometers, or radar sensors. These methods are widely deployed due to their simplicity, reliability, and low computational cost. Adaptive extensions such as SCOOT and SCATS introduce limited real-time responsiveness by adjusting signal timings based on observed traffic conditions. However, these systems depend on aggregated measurements and predefined rules, making them less effective under non-recurrent congestion, incidents, or rapidly changing demand patterns [1,15,16].

2.2. Reinforcement Learning and Multi-Agent Control

Reinforcement learning (RL) approaches model traffic signal control as a sequential decision-making problem, where intersections learn policies through interaction with the traffic environment. Early studies demonstrated the feasibility of tabular and actor–critic methods for isolated intersections, while more recent work has adopted deep reinforcement learning and multi-agent formulations. Although promising results have been reported in simulation environments, RL-based controllers often exhibit high sample complexity, sensitivity to non-stationary traffic demand, and limited scalability when extended to corridor- or network-level control [17,18,19].

2.3. Vision-Driven Traffic Perception and Safety Analytics

Advances in computer vision have enabled fine-grained traffic perception, including vehicle detection, tracking, trajectory extraction, and classification of diverse road users. These capabilities have supported the development of surrogate safety measures such as time-to-collision and post-encroachment time, providing richer insights into potential conflicts. Despite these advances, vision-based safety and risk metrics are typically employed for offline analysis or monitoring rather than being embedded directly within real-time traffic signal control loops [20,21,22].

2.4. Distributed and Grid-Enabled Traffic Management

To address the computational and data-management challenges of city-scale intelligent transportation systems, distributed and grid-enabled computing architectures have been proposed. These approaches enable parallel processing of perception, prediction, and optimization tasks across multiple nodes, supporting scalability and reduced latency. However, existing distributed traffic management systems primarily focus on data aggregation and monitoring, with limited integration of adaptive learning and safety-aware control in real-time operations [23,24].

2.5. Identified Gap and Positioning of This Work

The reviewed literature reveals a persistent separation between traffic control, perception, learning, and safety assessment. Fixed-time and adaptive controllers lack rich situational awareness, reinforcement learning approaches face challenges in robustness and scalability, vision-based methods are rarely incorporated into closed-loop control, and distributed architectures seldom integrate learning and risk metrics into decision-making. This work addresses these limitations by proposing a unified, grid-enabled framework that integrates vision-driven risk and conflict assessment, active learning under distribution shift, and adaptive signal control within a single operational pipeline.

The integration of advanced sensing, machine learning (ML), and edge computing is transforming urban traffic management by enabling responsive, anomaly-aware control that surpasses traditional loop-based or fixed-time systems. These technologies facilitate real-time monitoring, adaptive signal control, and improved situational awareness for both human-operated and autonomous vehicles. Despite these advancements, challenges remain in robustness, scalability, and integration with legacy infrastructure.

2.6. Computer Vision and Deep Learning for Traffic Sensing

Deep learning-based vision systems, including YOLOv5–v8, Faster R-CNN, DeepSORT, and RetinaNet, are widely deployed for real-time detection and tracking of vehicles, pedestrians, and cyclists [13,14]. Modern architectures incorporate attention modules, multi-scale feature fusion, and temporal aggregation to improve robustness under challenging conditions such as low light, occlusion, or adverse weather [25]. Thermal, infrared, and multispectral fusion models further enhance detection in low-visibility environments, enabling continuous operation at night or in fog [26]. However, many deep learning models are computationally intensive, posing challenges for real-time deployment on resource-constrained edge devices.

2.7. Edge and IoT Architectures for Real-Time Monitoring

Edge computing devices, such as NVIDIA Jetson Nano, Google Coral, and Raspberry Pi, allow local processing of video streams, reducing latency and bandwidth usage compared to centralized cloud processing. Cloud-edge architectures, such as Edge YOLO, combine on-device inference with cloud-based analytics to balance speed and accuracy [27,28]. Edge computing enables scalable deployments of ITS across multiple intersections, but system integration with traffic controllers and V2I platforms remains a complex challenge, particularly in heterogeneous urban environments.

2.8. Computational Grid and Spatiotemporal Discretization

Spatial grid modeling discretizes roadways into cells, enabling precise estimation of vehicle density, speed, and flow patterns [9,11]. Occupancy grid methods have been applied to both urban and rural traffic networks, facilitating traffic state estimation and anomaly detection [10]. Advanced approaches, such as parallel genetic algorithms for predictive traffic control [12], allow real-time optimization of signal phases and adaptive routing. Despite these advances, integrating grid-based modeling with real-time sensor data and computer vision outputs is still an open research challenge, particularly for intersections with complex geometries.

2.9. Active Learning and Vision Uncertainty

Active learning techniques reduce the burden of manual labeling by selecting the most informative samples for training under uncertain conditions [29]. In traffic environments with variable lighting, partial occlusions, and heterogeneous camera angles, uncertainty-aware ML frameworks improve classification performance and reduce false positives. Combining uncertainty sampling with confidence scoring allows systems to dynamically refine their models in real-world deployment, supporting more reliable vehicle classification and anomaly detection for ITS.

2.10. Reinforcement Learning for Adaptive Signal Control

Deep reinforcement learning (DRL) models, including DQN, A3C, and multi-agent RL, have been successfully applied to adaptive traffic signal control [18,30,31]. These models learn control policies from real-time traffic flow, congestion patterns, and phase performance metrics, and have been reported to outperform traditional pre-timed and actuated signal schemes. Multi-agent RL approaches enable coordination among intersections, optimizing network-wide traffic throughput and minimizing cumulative delays. However, challenges persist in training DRL agents for large-scale networks due to non-stationary traffic dynamics and sparse reward signals.

2.11. V2V Communication and Multi-Agent Coordination

Recent advances in vehicle-to-vehicle (V2V) communication and multi-agent systems have significantly enhanced intersection management strategies. Several studies have demonstrated the potential of distributed approaches in intelligent transportation systems:

Distributed Control: Recent work by Milanés and Shladover [32] presents a decentralized cooperative driving framework enabling vehicles to negotiate right-of-way with minimal centralized coordination.
Cooperative Perception: Cooperative perception for collision avoidance has been advanced by studies such as Kim et al. [33], who demonstrated how shared situational awareness improves safety in complex urban environments.
Communication Protocols: The work of Molina-Masegosa and Gozalvez [34] established key performance benchmarks for low-latency, high-reliability LTE-V2X communications in dense vehicular scenarios.
Multi-Agent Reinforcement Learning: Recent developments in MARL for traffic signal and flow optimization, as shown in Chu et al. [35], demonstrate strong improvements in distributed traffic management.

2.12. Intersection Safety and CAV-V2I Integration

Vision-based enforcement systems, including deep learning classifiers, are widely used to detect helmetless riding, red-light violations, and wrong-way driving [36,37]. Vehicle-to-infrastructure (V2I) platforms support CAV signal coordination, enabling low-latency communication between vehicles and traffic controllers [38]. Combining vision analytics with V2I capabilities allows predictive safety measures, such as early warnings for red-light violations or dynamic phase adjustments for emergency vehicle prioritization. Despite these capabilities, fully integrated systems that combine computer vision, ML, edge computing, and V2I coordination remain limited in deployment, highlighting the need for unified frameworks capable of handling heterogeneous data streams, diverse intersection geometries, and varying traffic conditions.

2.13. Positioning with Respect to Related Prior Work

This study builds on foundational work by the authors and their research team. Only those prior studies by the authors that establish methodological continuity are briefly referenced here. The focus of this paper is not to summarize the authors’ research trajectory but to situate the present framework within unresolved challenges identified across the broader literature:

Signal Timing Optimization [6]: Explored methods to optimize traffic signal timing to reduce delays, conflict points, and accidents while improving flow and minimizing fuel consumption. Adjustments in timing account for traffic growth and changing patterns, postponing the need for costly infrastructure improvements.
Urban Congestion Hotspot Prediction [8]: Developed an ML framework integrating geospatial and live traffic sensor data to predict recurring congestion hotspots in urban areas. A case study from Maryland identified top segments on I-95 and I-495, with predictions validated using loop detector data. The approach enhances travel-time prediction and congestion management.
Three-Dimensional Sight Distance Modeling [7]: Studied the dynamic sight distance problem at signalized intersections using Random Forest classifiers. Factors such as queued vehicles, obstruction angles, driver age, and peak-hour conditions were included to model driver gap acceptance. The ML approach outperformed traditional methods, with potential extensions to network-level traffic simulations and signal optimization.

2.14. Identified Research Gaps

Despite significant progress in CAV and traffic management research, several gaps remain:

1. Few frameworks effectively integrate spatial grid modeling, real-time video analytics, and CAV-compatible vehicle-to-infrastructure (V2I) control into a unified system.
2. Robust detection of red-light violations under uncertain or adverse visual conditions is still limited.
3. Comprehensive integration with edge computing platforms has not been fully explored, particularly in terms of end-to-end system validation.
4. Multi-site studies demonstrating generalizability across varying lighting, road geometry, and congestion scenarios are scarce.

This paper addresses these challenges with a unified, deployable framework that integrates grid modeling, edge-compatible ML, active learning, and vision-based traffic analytics to support intersection intelligence and autonomous vehicle coordination.

3. Methodology

This section specifies the operational model used to translate real-time perception into control decisions.

3.1. Notation and State Representation

We consider an intersection set $I$ and discrete decision epochs $t = 1, 2, \dots$. For each intersection $i \in I$, we define a drift indicator $\Delta_i(t)$ as a scalar measure capturing distributional shift in perception features or prediction uncertainty at intersection $i$ and time $t$. This quantity is later used to trigger selective model updates via active learning.

We define the state vector as follows:

$$ s_i(t) = [\, q_i(t),\ \rho_i(t),\ v_i(t),\ r_i(t),\ \Delta_i(t) \,] \tag{1} $$

where $q_i$ is the queue length (vehicles), $\rho_i$ is the density, $v_i$ is the mean approach speed, and $r_i$ is a conflict risk index derived from CV-based trajectories.

3.2. Action Space and Safety Constraints

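Let $A_i$ denote the set of feasible control actions at intersection $i$. In this work, an action corresponds to modifying phase durations:

$$ a_i(t) = \{ \delta g_{i,p}(t) \}_{p \in P_i} \tag{2} $$

where $P_i$ is the set of signal phases at intersection $i$ and $\delta g_{i,p}(t)$ is the adjustment to the green duration $g_{i,p}$ of phase $p$.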

These actions are subject to standard operational and safety constraints:

$$ g_{i,p}^{\min} \le g_{i,p}(t) \le g_{i,p}^{\max}, \quad \forall p \in P_i \tag{3} $$

In addition, clearance constraints (yellow/all-red) are enforced as required by the controller specification.
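As an illustration of Equations (2) and (3), the sketch below enumerates per-phase green-time adjustments and keeps only those that respect the min/max green bounds. The phase names, the candidate step sizes (−5, 0, +5 s), and the dictionary inputs are hypothetical; clearance (yellow/all-red) constraints are not modeled here.

```python
from itertools import product


def feasible_actions(g_current, g_min, g_max, steps=(-5.0, 0.0, 5.0)):
    """Enumerate actions {delta g_p} whose adjusted green times satisfy Eq. (3).

    g_current, g_min, g_max: dicts mapping a (hypothetical) phase id to seconds.
    steps: candidate per-phase adjustments in seconds (illustrative assumption).
    Clearance intervals are NOT checked in this sketch.
    """
    phases = sorted(g_current)
    actions = []
    for deltas in product(steps, repeat=len(phases)):
        action = dict(zip(phases, deltas))
        # Keep the action only if every adjusted green time stays within its bounds.
        if all(g_min[p] <= g_current[p] + action[p] <= g_max[p] for p in phases):
            actions.append(action)
    return actions


# Illustrative two-phase intersection: the EW phase is already at its minimum green.
g_cur = {"NS": 30.0, "EW": 20.0}
g_lo = {"NS": 10.0, "EW": 20.0}
g_hi = {"NS": 60.0, "EW": 40.0}
candidates = feasible_actions(g_cur, g_lo, g_hi)
```

Brute-force enumeration grows exponentially with the number of phases, which is tolerable for the handful of phases at a typical intersection.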

3.3. Objective (Control Criterion)

We define a multi-term objective capturing delay, queues, and safety risk:

$$ J(t) = \sum_{i \in I} \big( \alpha D_i(t) + \beta Q_i(t) + \gamma R_i(t) \big) \tag{4} $$

Here, $D_i$ is the estimated delay, $Q_i$ is an aggregated queue metric, and $R_i$ is the risk penalty derived from the trajectory-based conflict model defined in Section 3.8. The weights $\alpha$, $\beta$, and $\gamma$ set the trade-off among delay, queueing, and risk.

3.4. Decision Rule

At each epoch $t$, the controller selects

$$ a_i^{\star}(t) = \arg\min_{a_i(t) \in A_i} \hat{J}_i\big(t + 1 \mid s_i(t), a_i(t)\big) \tag{5} $$

where $\hat{J}_i$ is the predicted one-step (or receding-horizon) objective under candidate actions.
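A minimal sketch of Equations (4) and (5) for a single intersection follows. The text does not specify how $\hat{J}_i$ is predicted, so `toy_predict` is a hypothetical one-step model, and the weight values are illustrative only.

```python
def objective(delay, queue, risk, alpha=1.0, beta=0.5, gamma=2.0):
    """Per-intersection term of Eq. (4): alpha*D + beta*Q + gamma*R (illustrative weights)."""
    return alpha * delay + beta * queue + gamma * risk


def select_action(state, candidates, predict_next):
    """Eq. (5): pick the candidate with the lowest predicted one-step objective.

    predict_next(state, action) -> (delay, queue, risk) is a hypothetical predictor.
    """
    return min(candidates, key=lambda a: objective(*predict_next(state, a)))


def toy_predict(state, action):
    """Toy model: extra green (s) clears queue but adds some delay for other movements."""
    delay = state["delay"] + 0.2 * action
    queue = max(0.0, state["queue"] - action)
    return delay, queue, state["risk"]


state = {"delay": 30.0, "queue": 12.0, "risk": 0.2}
best = select_action(state, [-5.0, 0.0, 5.0], toy_predict)
```

Because the predictor is passed in as a function, a learned or simulation-based model could replace the toy one without changing the selection logic.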

3.5. Implementation Details

3.5.1. Dataset Description

We evaluate our framework on multiple datasets to assess robustness across different scenarios. The datasets used for the evaluation are summarized in Table 1.

3.5.2. Model Architecture and Training

Our implementation uses the following key components:

Object Detection: YOLOv8n with CSPDarknet53 backbone, pretrained on COCO and fine-tuned on our dataset.
Tracking: DeepSORT with Kalman filtering and Mahalanobis distance for association.
Grid Processing: 0.5 m × 0.5 m grid cells with 10 Hz update rate.
Training: Adam optimizer with initial learning rate of 0.001, batch size of 32, and early stopping with patience of 10 epochs.
Hardware: NVIDIA Jetson AGX Xavier for edge deployment, achieving 25 FPS at 1920 × 1080 resolution.

3.5.3. Active Learning Strategy

The active learning module employs an entropy-based sampling approach to identify uncertain detections. For each detection $x_i$ with class probabilities $p(y \mid x_i)$, we compute the entropy:

$$ H(x_i) = -\sum_{y} p(y \mid x_i) \log p(y \mid x_i) \tag{6} $$

Detections with entropy above a threshold $\tau$ are flagged for human review. The threshold is dynamically adjusted based on the current model’s performance on a held-out validation set.
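The entropy test in Equation (6) can be sketched as below. The natural logarithm and the `(detection_id, probabilities)` input format are assumptions, and the threshold is fixed here rather than adjusted on a validation set.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a class-probability vector, Eq. (6)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def flag_uncertain(detections, tau):
    """Return ids of detections whose entropy exceeds tau (sent for human review).

    detections: list of (detection_id, class_probabilities), a hypothetical format.
    """
    return [det_id for det_id, probs in detections if entropy(probs) > tau]


# A confident detection and an ambiguous one (e.g., an occluded vehicle at night).
detections = [("det_a", [0.98, 0.01, 0.01]), ("det_b", [0.40, 0.35, 0.25])]
flagged = flag_uncertain(detections, tau=0.5)
```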

3.6. Active Learning and Drift-Triggered Updates

The active learning loop reduces labeling and retraining cost while maintaining robustness under non-stationary traffic patterns (e.g., weather, incidents, seasonal changes). Let $U$ denote unlabeled frames/clips and $L$ the labeled samples used to train the perception model.

3.6.1. Drift Detection

We compute a drift score $\Delta_i(t)$ using feature statistics from the perception backbone (e.g., embedding shift) or prediction uncertainty shift:

$$ \Delta_i(t) = d\big(\phi(x_i(t)),\ \phi(x_i(t - \tau))\big) \tag{7} $$

Here, $\phi(\cdot)$ is an embedding function, $d(\cdot, \cdot)$ is a distance or divergence metric used to quantify distributional change, and $\tau$ denotes a reference time lag (distinct from the entropy threshold in Equation (6)).
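Equation (7) leaves $\phi$ and $d$ open. The sketch below assumes that embeddings $\phi(x)$ for a current and a reference window have already been computed by the backbone, and uses the cosine distance between window-mean embeddings as $d$; this is one illustrative choice, not the paper's stated metric.

```python
import math


def mean_embedding(vectors):
    """Element-wise mean of a non-empty list of equal-length embedding vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def cosine_distance(u, v):
    """1 - cosine similarity; 0 for identical directions, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("zero-norm embedding; cosine distance undefined")
    return 1.0 - dot / (nu * nv)


def drift_score(current_window, reference_window):
    """Delta_i(t) = d(phi(x_i(t)), phi(x_i(t - tau))) with d = cosine distance (assumed)."""
    return cosine_distance(mean_embedding(current_window), mean_embedding(reference_window))
```

A divergence between feature distributions (rather than between means) would be more sensitive to changes in spread, at higher cost.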

3.6.2. Query Strategy

When $\Delta_i(t)$ exceeds a threshold $\eta$, new samples are queried using an uncertainty-based sampling strategy:

$$ x^{\star} = \arg\max_{x \in U} H\big(p_{\theta}(\cdot \mid x)\big) \tag{8} $$

In this expression, $H(\cdot)$ denotes predictive entropy, which prioritizes samples for which the model exhibits high uncertainty.
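The drift-gated query of Equation (8) can be sketched as follows. Returning the top-$k$ samples is a batch generalization assumed here ($k = 1$ recovers the single arg max), and the pool format (sample id mapped to model probabilities) is hypothetical.

```python
import math


def predictive_entropy(probs):
    """Entropy H(p_theta(. | x)) of the model's class probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def query_batch(unlabeled, delta, eta, k=1):
    """If drift delta exceeds eta, return the k most uncertain sample ids; else none.

    unlabeled: dict sample_id -> class probabilities from p_theta (hypothetical format).
    """
    if delta <= eta:
        return []  # no drift detected: skip labeling this epoch
    ranked = sorted(unlabeled, key=lambda sid: predictive_entropy(unlabeled[sid]), reverse=True)
    return ranked[:k]


pool = {"frame_1": [0.9, 0.1], "frame_2": [0.5, 0.5], "frame_3": [0.7, 0.3]}
```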

3.6.3. Update and Deployment

Queried samples are labeled and appended to $L$, and the perception model is fine-tuned periodically. Only the affected intersections or corridors are updated, limiting computational overhead and operational disruption.

3.6.4. Evaluation Protocol

We evaluate our framework using the following metrics:

Detection: mAP@0.5, mAP@0.5:0.95.
Tracking: MOTA, MOTP, ID switches.
Traffic Analysis: Vehicle count error, speed estimation error, and queue length accuracy.
System Performance: End-to-end latency, CPU/GPU utilization, and memory footprint.

Statistical significance of improvements is assessed using paired t-tests with $\alpha = 0.05$. All experiments are repeated 5 times with different random seeds, and we report the mean and standard deviation of each metric.
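The paired t-test over five seeds can be sketched with the Python standard library. The per-seed values below are placeholders for illustration only, not results from this study, and the hard-coded critical value applies only to five paired runs (df = 4).

```python
import math
import statistics

# Two-sided critical value of Student's t for df = 4 (five paired runs) at alpha = 0.05.
T_CRIT_DF4 = 2.776


def paired_t_test(baseline, proposed, t_crit=T_CRIT_DF4):
    """Paired t-test on per-seed metric values; returns (t statistic, significant?)."""
    if len(baseline) != len(proposed) or len(baseline) < 2:
        raise ValueError("need two equally long lists with at least two runs")
    diffs = [p - b for b, p in zip(baseline, proposed)]
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, abs(t) > t_crit


# Placeholder per-seed mAP@0.5 values (illustrative only, not reported results).
baseline_map = [0.70, 0.71, 0.69, 0.70, 0.72]
proposed_map = [0.74, 0.74, 0.73, 0.75, 0.75]
t_stat, significant = paired_t_test(baseline_map, proposed_map)
summary = (statistics.mean(proposed_map), statistics.stdev(proposed_map))  # mean, sd
```

For other run counts, `t_crit` must be replaced with the matching quantile (for example from `scipy.stats.t.ppf`).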

This section presents a comprehensive framework for intelligent intersection monitoring, integrating spatial grid modeling, computer vision, and machine learning (ML) to enable real-time traffic analysis. The methodology is structured into six interrelated components that form a cohesive computational pipeline: grid-based spatiotemporal modeling, vehicle detection and tracking, traffic behavior estimation, active learning refinement, traffic flow analysis, and edge deployment optimization.

3.7. Overall Computational Pipeline and Component Integration

The proposed framework operates as an integrated system where each component feeds into subsequent stages, creating a continuous feedback loop for adaptive traffic management. Figure 1 illustrates the end-to-end workflow and data flow between components.

The components are integrated as follows:

1. Grid-based spatiotemporal modeling establishes the foundational spatial discretization of the intersection. Each grid cell $g_{ij}$ serves as a spatial unit where traffic variables (occupancy, speed, acceleration, and trajectory angle) are mapped and updated in real time. This grid structure provides a common reference frame that enables efficient spatial queries and aggregation of traffic metrics.

2. Vehicle detection and tracking processes video feeds to extract object locations, bounding boxes, and trajectories using YOLOv5n and DeepSORT with Kalman filtering. The detected positions are transformed into grid coordinates, establishing a direct mapping between pixel space and the computational grid. This module feeds real-time vehicle data into the grid structure, maintaining temporal consistency through object tracking.

3. Traffic behavior estimation computes cell-level spatiotemporal variables using the grid-mapped trajectories. This includes instantaneous speed, acceleration, queue states, and conflict points. The behavioral metrics are derived from the grid-annotated detections, making this module dependent on the outputs of both the grid model and detection/tracking modules.

4. Risk and conflict assessment evaluates safety-critical conditions such as red-light-running exposure, potential collision trajectories, and flow-based risk scores. This module consumes the cell-level behavioral variables produced by the behavior estimation component and generates risk indices that inform the adaptive control system.

5. The active learning module operates in parallel to continuously improve detection accuracy. It monitors uncertainty in detections using entropy-based sampling and identifies informative samples for model retraining. This creates a feedback loop that enhances the performance of the detection and tracking module, particularly in challenging conditions like occlusion or poor visibility.

6. Adaptive signal control integrates phase demand scores, congestion indices, and risk metrics to optimize signal timing plans. The control decisions are informed by the grid-based analysis of traffic conditions and can be adjusted in real time to respond to changing traffic patterns or emerging safety concerns.

This integrated architecture enables the system to maintain a continuous cycle of perception, analysis, and control, with each component contributing to a comprehensive understanding of intersection dynamics while adapting to evolving traffic conditions.
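The pixel-to-grid mapping used in step 2 can be sketched as below. The sketch assumes a planar road surface with a pre-calibrated 3×3 image-to-ground homography (the matrix shown is a made-up, scale-only example), the 0.5 m cells of Section 3.5.2, and an arbitrary convention that row $i$ follows the ground $y$ axis and column $j$ the $x$ axis.

```python
import math


def pixel_to_ground(H, u, v):
    """Map pixel (u, v) to ground-plane metres using a 3x3 homography H (nested lists)."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w


def ground_to_cell(x, y, cell=0.5, n_rows=40, n_cols=40):
    """Return grid indices (i, j) for a ground point, or None if outside the grid."""
    i, j = math.floor(y / cell), math.floor(x / cell)
    if 0 <= i < n_rows and 0 <= j < n_cols:
        return i, j
    return None


# Made-up calibration: 1 pixel = 1 cm on the road, no perspective distortion.
H_EXAMPLE = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]]
```

In practice the ground contact point of a vehicle (for example, the bottom-centre of its bounding box) is the pixel that would be projected.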

3.8. Risk and Conflict Assessment from Trajectories

Risk is computed from CV-extracted road-user trajectories and used directly inside the control objective (Equation (4)). Let $T(t)$ denote a set of trajectories within a time window $[t - \omega, t]$.

3.8.1. Trajectory Extraction

Each road user $k$ has a trajectory $\tau_k = \{ (x_k(\ell), y_k(\ell), \ell) \}_{\ell}$ obtained via detection and tracking.

3.8.2. Conflict Computation

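For each pair $(k, m)$, we estimate spatiotemporal proximity and crossing likelihood:

$$ \psi_{k,m} = \min_{\ell} \big\lVert (x_k(\ell), y_k(\ell)) - (x_m(\ell), y_m(\ell)) \big\rVert \tag{9} $$

where the minimum is taken over time stamps $\ell$ at which both road users are observed.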

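We then define an intersection-level risk index:

$$ r_i(t) = \sum_{(k,m) \in P(t)} \mathbb{1}\left[ \psi_{k,m} < \epsilon \right] \cdot w_{k,m} \tag{10} $$

Here, $P(t)$ is the set of road-user pairs drawn from $T(t)$, $\epsilon$ is a proximity threshold, and $w_{k,m}$ is a pair weight that can incorporate relative speed or road-user type (vehicle/pedestrian).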
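Equations (9) and (10) can be sketched as follows. Trajectories are assumed to be dicts mapping a time stamp to an $(x, y)$ ground position in metres, and the example weight function, which doubles the weight of pairs involving a pedestrian, is a hypothetical choice.

```python
import math
from itertools import combinations


def min_separation(traj_k, traj_m):
    """Eq. (9): minimum distance over time stamps observed in both trajectories."""
    common = set(traj_k) & set(traj_m)
    if not common:
        return math.inf  # never co-present, so no proximity conflict
    return min(math.dist(traj_k[l], traj_m[l]) for l in common)


def risk_index(trajectories, eps, weight=lambda k, m: 1.0):
    """Eq. (10): weighted count of road-user pairs closer than eps."""
    r = 0.0
    for k, m in combinations(sorted(trajectories), 2):
        if min_separation(trajectories[k], trajectories[m]) < eps:
            r += weight(k, m)
    return r


trajectories = {
    "car1": {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0)},
    "ped1": {1: (1.0, 1.5), 2: (2.0, 3.0)},
    "car2": {0: (10.0, 10.0)},
}
ped_weight = lambda k, m: 2.0 if "ped" in k + m else 1.0  # hypothetical weighting
```

Pairwise comparison is quadratic in the number of road users; for dense scenes, the grid cells of Section 3.11 could restrict comparisons to neighbouring cells.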

3.9. Adaptive Signal Control

Algorithm 1 summarizes the closed-loop decision process executed at each control epoch. The procedure integrates state estimation, risk assessment, and action selection to adapt signal timings in real time.

Algorithm 1: Closed-loop adaptive signal control (per epoch t)

Input: Sensor/CV inputs, current phase plan, operational constraints

Output: Updated phase durations and/or next phase decision

Step 1: Compute the system state $s_i(t)$ using queue lengths, traffic density, vehicle speed, and the risk indicator $r_i(t)$;

Step 2: Compute the drift score $\Delta_i(t)$;

Step 3: if $\Delta_i(t) > \eta$ then

| Trigger an active-learning query;

else

| Continue without querying;

end

Step 4: Generate candidate control actions $a_i(t) \in A_i$ that satisfy operational constraints;

Step 5: Evaluate the predicted cost $\hat{J}_i(t + 1 \mid s_i(t), a_i(t))$ for each candidate action;

Step 6: Select the optimal action $a_i^{\star}(t) = \arg\min_{a_i(t) \in A_i} \hat{J}_i$;

Step 7: Dispatch the updated signal timing to the controller;

Step 8: Log control actions and observed system outcomes;
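One epoch of Algorithm 1 can be sketched with every component injected as a function. The state and drift score are assumed to be computed upstream (Steps 1–2), the candidates are assumed pre-filtered for feasibility (Step 4), and all component functions in the usage example are hypothetical stand-ins.

```python
def control_epoch(state, drift, eta, candidates, predict_cost, query, dispatch, log):
    """Run Steps 3-8 of Algorithm 1 for one intersection and return the chosen action."""
    # Step 3: trigger an active-learning query only when drift exceeds eta.
    queried = drift > eta
    if queried:
        query()
    # Steps 5-6: evaluate the predicted cost J_hat of each candidate and take the arg min.
    best = min(candidates, key=lambda a: predict_cost(state, a))
    # Step 7: send the chosen timing change to the signal controller.
    dispatch(best)
    # Step 8: record the decision for later evaluation.
    log({"state": state, "drift": drift, "queried": queried, "action": best})
    return best


# Illustrative usage with stand-in components.
queries, sent, records = [], [], []
chosen = control_epoch(
    state={"queue": 12},
    drift=0.5,
    eta=0.3,
    candidates=[-5, 0, 5],
    predict_cost=lambda s, a: (s["queue"] - 2 * a) ** 2,  # toy cost model
    query=lambda: queries.append("query"),
    dispatch=sent.append,
    log=records.append,
)
```

Injecting the components keeps the loop testable in isolation and lets the perception, drift, and prediction modules be swapped independently.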

3.10. Decision-Making Process

The transformation from sensor data to control actions involves several key steps:

1. Perception: Raw sensor data are processed to detect and track objects, which are then mapped to the grid representation.

2. Situation Assessment: The system evaluates the current traffic state using metrics such as the following:

Vehicle density and flow rates;
Conflict points and potential collision risks;
Queue lengths and waiting times.

3. Decision Generation: Based on the assessed situation, the system generates control actions using a combination of the following:

Rule-based strategies for safety-critical scenarios;
Optimization-based approaches for efficiency;
Learning-based methods for handling complex, uncertain situations.

4. Action Execution: Control signals are sent to traffic lights, V2X infrastructure, or directly to connected vehicles.

The entire process operates in real time with an end-to-end latency of less than 100 ms, enabling responsive control even in dynamic traffic conditions.

3.11. Grid-Based Spatiotemporal Modeling

To capture the spatial and temporal dynamics of traffic, each monitored intersection is discretized into a 2D computational grid $G$ composed of cells $g_{ij}$:

$$ G = \{ g_{ij} \mid i = 1, \dots, m;\ j = 1, \dots, n \} \tag{11} $$

Each cell stores a time series of traffic features:

$$ F_{ij}(t) = \{ c_t, v_t, a_t, \theta_t \} \tag{12} $$

where $c_t$ is the vehicle count, $v_t$ is the speed, $a_t$ is the acceleration, and $\theta_t$ is the heading. The grid cell area is

$$ A_c = \delta_x \cdot \delta_y \tag{13} $$

and vehicle counts are normalized as

$$ \tilde{c}_{ij}(t) = \frac{c_{ij}(t)}{\max_{i,j} c_{ij}(t)} \tag{14} $$

allowing comparison across cells and time.
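Equations (11), (13), and (14) can be sketched as a per-frame cell count followed by max-normalization. The sketch assumes vehicle positions are already in ground-plane metres, square 0.5 m cells (Section 3.5.2), a hypothetical 40 × 40 grid, and row $i$ along $y$ and column $j$ along $x$.

```python
import math
from collections import Counter


def cell_counts(points, cell=0.5, n_rows=40, n_cols=40):
    """Count vehicles per grid cell g_ij (Eq. 11) from ground-plane (x, y) in metres."""
    counts = Counter()
    for x, y in points:
        i, j = math.floor(y / cell), math.floor(x / cell)
        if 0 <= i < n_rows and 0 <= j < n_cols:  # ignore points outside the grid
            counts[(i, j)] += 1
    return counts


def cell_area(dx=0.5, dy=0.5):
    """Eq. (13): A_c = delta_x * delta_y, in square metres."""
    return dx * dy


def normalized_counts(counts):
    """Eq. (14): divide each cell count by the largest count in the grid."""
    peak = max(counts.values(), default=0)
    return {c: n / peak for c, n in counts.items()} if peak else {}


counts = cell_counts([(0.1, 0.1), (0.2, 0.3), (0.7, 0.1)])
```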

3.12. Vehicle Detection and Tracking

Vehicles are detected in each frame using YOLOv5n, producing bounding boxes:

$$B_t = \{ b_k = (x_k, y_k, w_k, h_k) \}_{k=1}^{K}. \tag{15}$$

Motion is estimated via optical flow, where the flow field $\vec{v}(x, y, t)$ of the image intensity $I(x, y, t)$ satisfies the brightness-constancy constraint

$$\nabla I \cdot \vec{v}(x, y, t) + \frac{\partial I}{\partial t} = 0, \qquad \nabla I = \left( \frac{\partial I}{\partial x}, \frac{\partial I}{\partial y} \right), \tag{16}$$

and vehicle trajectories are refined using a Kalman filter:

$$x_{t+1} = A x_t + B u_t + w_t, \tag{17}$$

where $x_t$ represents the state vector, $A$ and $B$ are the state-transition and control-input matrices, $u_t$ represents the control input, and $w_t$ represents the process noise. Because the filter predicts each vehicle's state between detections, this combination keeps tracks continuous under occlusions and partial observations.
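Equation (17) gives only the process model. The following is a minimal sketch of a full predict/update cycle. It assumes a constant-velocity state $[p_x, p_y, v_x, v_y]$ and no control input ($u_t = 0$, so the $B u_t$ term drops out). It models $w_t$ as discrete white-noise acceleration and uses ground-plane centroids as measurements. The class name and the noise values (3 m/s² acceleration noise, 0.5 m measurement noise) are hypothetical, not taken from the framework.

```python
import numpy as np


class ConstantVelocityKF:
    """2D constant-velocity Kalman filter; state x = [px, py, vx, vy]."""

    def __init__(self, dt: float, accel_std: float = 3.0, meas_std: float = 0.5):
        self.A = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)  # only the centroid is observed
        # Discrete white-noise acceleration model for w_t, applied per axis.
        q = accel_std ** 2
        q_pp, q_pv, q_vv = q * dt**4 / 4, q * dt**3 / 2, q * dt**2
        self.Q = np.array([[q_pp, 0, q_pv, 0],
                           [0, q_pp, 0, q_pv],
                           [q_pv, 0, q_vv, 0],
                           [0, q_pv, 0, q_vv]])
        self.R = meas_std ** 2 * np.eye(2)
        self.x = np.zeros(4)
        self.P = 100.0 * np.eye(4)  # large initial uncertainty

    def predict(self) -> np.ndarray:
        """Time update x_{t+1} = A x_t + w_t (u_t = 0, so B u_t is dropped)."""
        self.x = self.A @ self.x
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x

    def update(self, z) -> np.ndarray:
        """Measurement update with a detected centroid z = (px, py)."""
        innovation = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x


if __name__ == "__main__":
    kf = ConstantVelocityKF(dt=0.1)
    for k in range(100):
        kf.predict()
        if k not in (40, 41, 42):         # simulated short occlusion: no detection
            kf.update((1.0 * k, 0.0))   # vehicle moving at 10 m/s along x
    print(kf.x[2])                      # velocity estimate, close to 10 m/s
```

When a detection is missing, the filter still calls `predict()` but skips `update()`. This lets the track coast through a brief occlusion.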

3.13. Traffic Behavior Estimation

Key traffic parameters are derived from vehicle trajectories. Speed and acceleration are calculated as follows:

$$v_i(t) = \frac{\Delta s}{\Delta t}, \qquad a_i(t) = \frac{d v_i}{d t}. \tag{18}$$

Stopping distance, which informs risk assessment, is defined as follows:

$$s_s = \frac{v^2}{2a} + t_r v, \tag{19}$$

where $a$ is the deceleration rate and $t_r$ is the driver reaction time. The red-light violation risk is quantified as follows:

$$R_v = P(s_s > d_{\text{stop}}) \cdot \mathbb{1}_{\phi = \text{red}}, \tag{20}$$

where $d_{\text{stop}}$ is the distance to the stop line and the indicator $\mathbb{1}_{\phi = \text{red}}$ equals 1 only when the signal phase $\phi$ is red. The aggregated flow risk across the grid is

$$R_{\text{flow}} = \sum_{i,j} \rho_{ij}(t) \cdot P_c(g_{ij}), \tag{21}$$

where $\rho_{ij}(t)$ is the cell density, and $P_c$ is the conflict probability.
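Equation (20) leaves open how $P(s_s > d_{\text{stop}})$ is obtained. The following is a minimal sketch under one stated assumption: reaction time $t_r$ is treated as random, Normal with mean 1.5 s and standard deviation 0.3 s and clipped at zero, and the probability is estimated by Monte Carlo. The distribution, its parameters, and the function names are hypothetical. Deceleration is held fixed, and Equation (21) is included as a density-weighted sum.

```python
import numpy as np


def stopping_distance(v, decel, t_r):
    """Eq. (19): braking distance v^2 / (2a) plus reaction distance t_r * v (SI units)."""
    return v ** 2 / (2.0 * decel) + t_r * v


def red_light_violation_risk(v, decel, d_stop, phase,
                             t_r_mean=1.5, t_r_std=0.3, n=10_000, seed=0):
    """Eq. (20), with P(s_s > d_stop) estimated by Monte Carlo over reaction time.

    Assumption: t_r ~ Normal(t_r_mean, t_r_std), clipped at 0 (hypothetical values).
    """
    if phase != "red":  # the indicator 1_{phi = red} is zero
        return 0.0
    rng = np.random.default_rng(seed)
    t_r = np.clip(rng.normal(t_r_mean, t_r_std, n), 0.0, None)
    return float(np.mean(stopping_distance(v, decel, t_r) > d_stop))


def flow_risk(density, conflict_prob):
    """Eq. (21): sum over cells of rho_ij(t) * P_c(g_ij)."""
    return float(np.sum(np.asarray(density, float) * np.asarray(conflict_prob, float)))


if __name__ == "__main__":
    print(stopping_distance(15.0, 5.0, 1.0))                  # 37.5 m
    print(red_light_violation_risk(15.0, 4.5, 40.0, "red"))   # roughly 0.95
```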

3.14. Active Learning for Classification

Vehicle classification employs an active learning loop to improve model accuracy with minimal labeling effort. Class probabilities are computed as follows:

$$P(y \mid x) = \operatorname{softmax}(f_\theta(x)), \tag{22}$$

and sample uncertainty is measured by the entropy over the $C$ classes:

$$H(x) = -\sum_{i=1}^{C} P(y_i \mid x) \log P(y_i \mid x). \tag{23}$$

The most uncertain samples are labeled by an annotator, with $y^*$ denoting the supplied label, and added to the training set:

$$x^* = \arg\max_x H(x), \qquad D_{\text{new}} = D_{\text{old}} \cup \{ (x^*, y^*) \}. \tag{24}$$

Weights are then updated via gradient descent with learning rate $\eta$ and loss $L$:

$$\theta \leftarrow \theta - \eta \nabla_\theta L(f_\theta(x^*), y^*), \tag{25}$$

yielding refined predictions $\hat{y} = \arg\max_y P(y \mid x)$.
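Equations (22) to (24) amount to entropy-based uncertainty sampling. The following is a minimal NumPy sketch. It assumes the classifier's raw logits are available as a 2D array. It selects a batch of the `budget` highest-entropy samples, which is the batched form of the single $\arg\max$ in Eq. (24). The function names and the `budget` parameter are hypothetical. The retraining step of Eq. (25) is left to whatever training framework is in use.

```python
import numpy as np


def softmax(logits):
    """Eq. (22): row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def entropy(probs, eps=1e-12):
    """Eq. (23): Shannon entropy of each row; eps guards against log(0)."""
    return -np.sum(probs * np.log(probs + eps), axis=1)


def select_queries(logits, budget):
    """Batched form of Eq. (24): indices of the `budget` highest-entropy samples."""
    scores = entropy(softmax(np.asarray(logits, dtype=float)))
    return np.argsort(-scores)[:budget]


if __name__ == "__main__":
    logits = np.array([[5.0, 0.0, 0.0],   # confident prediction
                       [0.0, 0.0, 0.0],   # uniform: maximally uncertain
                       [2.0, 1.0, 0.0]])  # moderately uncertain
    print(select_queries(logits, 2))      # [1 2]
```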

3.15. Traffic Flow Analysis and Risk Functions

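Grid-based traffic flow metrics include density $\rho_{ij}$ and flow $q_{ij}$:

$$\rho_{ij}(t) = \frac{c_t}{A_c}, \qquad q_{ij}(t) = \rho_{ij}(t) \cdot v_t. \tag{26}$$

The conflict probability for merging lanes is as follows:

$$P_c = 1 - e^{-\lambda \Delta t}, \tag{27}$$

that is, the probability of at least one conflicting arrival within the interval $\Delta t$ when such arrivals follow a Poisson process with rate $\lambda$. The queue length is as follows:

$$L_q(t) = s(t) - g(t) \cdot \mu. \tag{28}$$

The average uniform delay per vehicle is given by the first term of Webster's delay formula:

$$d = \frac{C (1 - g/C)^2}{2 \left(1 - (g/C)\, x\right)}, \tag{29}$$

where $C$ is the cycle length, $g$ is the effective green time, and $x$ is the degree of saturation (volume-to-capacity ratio). The grid-level red-light exposure is as follows:

$$E_r = \sum_{g_{ij} \in R} R_v(g_{ij}). \tag{30}$$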
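Equations (26), (27), and (29) can be computed directly once units are fixed. The following is a minimal sketch. It assumes SI units: metres, seconds, and vehicles per square metre. The cap of $x$ at 1.0 follows the Highway Capacity Manual's uniform-delay term. Equation (29) as printed uses $x$ directly. All function names are hypothetical.

```python
import math


def density(count, cell_area_m2):
    """Eq. (26): vehicles per square metre in one cell."""
    return count / cell_area_m2


def flow(rho, speed_mps):
    """Eq. (26): q = rho * v."""
    return rho * speed_mps


def conflict_probability(rate_per_s, dt_s):
    """Eq. (27): P(at least one conflicting arrival in dt) for Poisson rate lambda."""
    return 1.0 - math.exp(-rate_per_s * dt_s)


def uniform_delay(cycle_s, green_s, x):
    """Eq. (29): Webster/HCM uniform delay in s/veh.

    x is capped at 1.0 as in the HCM uniform-delay term; Eq. (29) uses x directly.
    """
    if not 0.0 < green_s <= cycle_s:
        raise ValueError("effective green must satisfy 0 < g <= C")
    g_over_c = green_s / cycle_s
    if g_over_c == 1.0:
        return 0.0  # no red time, hence no uniform delay
    return cycle_s * (1.0 - g_over_c) ** 2 / (2.0 * (1.0 - g_over_c * min(x, 1.0)))


if __name__ == "__main__":
    print(uniform_delay(90.0, 45.0, 0.8))   # 18.75 s/veh
    print(flow(density(5, 100.0), 10.0))    # veh/(m*s) for a 10 m x 10 m cell
```

For example, a 90 s cycle with 45 s of effective green at $x = 0.8$ gives $d = 90 \cdot 0.25 / (2 \cdot 0.6) = 18.75$ s/veh.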

3.16. Edge Deployment Modeling

System performance on edge devices is evaluated via latency, energy, and memory:

$$T_{\text{total}} = T_{\text{detect}} + T_{\text{track}} + T_{\text{grid}} + T_{\text{infer}}, \qquad E = \int_0^T P(t)\, dt, \qquad M = N_f \cdot S_f, \tag{31}$$

where $P(t)$ is the instantaneous power draw over an operating window of length $T$, $N_f$ is the number of frames, and $S_f$ is the per-frame memory. These metrics are used to verify that the framework meets real-time operational requirements.
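The three quantities in Equation (31) are straightforward to compute from profiling data. The following is a minimal sketch. It assumes per-stage timings in milliseconds and power sampled at discrete timestamps. The integral is approximated with the trapezoidal rule. The stage timings in the usage example are hypothetical, not measurements from the paper.

```python
import numpy as np

LATENCY_BUDGET_MS = 100.0  # end-to-end target stated earlier in the methodology


def total_latency_ms(stage_ms):
    """T_total: sum of detection, tracking, grid-mapping, and inference times."""
    return float(sum(stage_ms.values()))


def energy_joules(t_s, power_w):
    """E = integral of P(t) dt, approximated with the trapezoidal rule on samples."""
    t = np.asarray(t_s, dtype=float)
    p = np.asarray(power_w, dtype=float)
    return float(np.sum((p[1:] + p[:-1]) * np.diff(t)) / 2.0)


def buffer_memory_mib(n_frames, frame_bytes):
    """M = N_f * S_f, reported in MiB."""
    return n_frames * frame_bytes / 2**20


if __name__ == "__main__":
    stages = {"detect": 28.0, "track": 6.0, "grid": 3.0, "infer": 12.0}  # hypothetical
    t = total_latency_ms(stages)
    print(t, t <= LATENCY_BUDGET_MS)                    # 49.0 True
    print(energy_joules([0, 30, 60], [10, 12, 10]))     # 660.0 J
    print(buffer_memory_mib(30, 1920 * 1080 * 3))       # 30 RGB 1080p frames
```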

This methodology unifies dynamic vehicle perception, spatial aggregation, active learning, and edge deployment to enable near real-time decision-making for traffic safety, signal optimization, and autonomous vehicle coordination at urban intersections.

3.17. Grid-Enhanced Vehicle Flow Analytics

Urban intersections often present visibility challenges due to low-light conditions, nighttime operations, or adverse weather, which can significantly degrade vehicle detection accuracy. To address this, we developed a robust grid-enhanced vehicle flow analytics framework that integrates computer vision with machine learning, ensuring reliable detection and tracking under challenging environmental conditions.

The proposed framework employs a multi-stage convolutional neural network (CNN) pipeline. The first stage applies image pre-processing techniques, including histogram equalization and contrast-limited adaptive histogram equalization (CLAHE), to enhance low-light video frames. In the second stage, a YOLOv7-tiny object detection model, fine-tuned on nighttime traffic datasets, identifies vehicles in each frame. Vehicle identities are maintained across frames using Deep SORT (Simple Online and Realtime Tracking), providing consistent trajectory IDs and mitigating re-identification errors in crowded or occluded scenes.
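The first pre-processing stage can be illustrated with global histogram equalization. The following is a minimal NumPy sketch that assumes 8-bit grayscale frames. It implements only the global step. OpenCV ships both `cv2.equalizeHist` and a CLAHE implementation (`cv2.createCLAHE`), and a deployment would more likely use those. The tile-based, clip-limited CLAHE variant is not reproduced here. The function name is hypothetical.

```python
import numpy as np


def equalize_hist(gray):
    """Global histogram equalization of a uint8 grayscale frame via its CDF."""
    if gray.dtype != np.uint8:
        raise TypeError("expected a uint8 grayscale image")
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF value of the darkest occupied bin
    if cdf_min == gray.size:           # constant image: nothing to stretch
        return gray.copy()
    lut = (cdf - cdf_min) / (gray.size - cdf_min) * 255.0
    lut = np.clip(np.round(lut), 0, 255).astype(np.uint8)
    return lut[gray]                   # map every pixel through the lookup table


if __name__ == "__main__":
    dark = (np.arange(64, dtype=np.uint8).reshape(8, 8) // 4)  # values 0..15 only
    bright = equalize_hist(dark)
    print(dark.max(), bright.max())                             # 15 255
```

Stretching a dark frame's narrow intensity range across 0 to 255 is what makes low-contrast vehicles more separable before detection. CLAHE does the same per tile and limits noise amplification.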

Post-processing incorporates motion-based filtering and temporal consistency checks to remove spurious detections. Detected vehicles are mapped onto a pre-defined computational grid overlaying the roadway. Each grid cell stores attributes such as occupancy state, vehicle ID, timestamp, speed estimate, and deceleration profile. This discretization allows precise estimation of dynamic traffic parameters and facilitates downstream traffic flow analysis.

Given a video frame $F_t$, the detection pipeline generates bounding boxes $B_i^t$ for each vehicle $i$, which are mapped to grid cells $G_{x,y}$ based on their pixel coordinates. Vehicle velocity $v_i$ and acceleration $a_i$ are computed as temporal differences of centroid positions:

$$v_i(t) = \frac{d(P_i^{t+1}, P_i^t)}{\Delta t}, \tag{32}$$

$$a_i(t) = \frac{v_i(t+1) - v_i(t)}{\Delta t}, \tag{33}$$

where $P_i^t$ represents the centroid of vehicle $i$ in frame $t$, $d(\cdot, \cdot)$ is the distance between consecutive centroids, and $\Delta t$ is the frame interval. These estimates provide the basis for safety-critical metrics such as clearance time $T_c$ and stopping distance $D_s$ during red-light phases:

$$T_c = \frac{L}{v_i(t)}, \tag{34}$$

$$D_s = \frac{v_i(t)^2}{2\, a_i(t)}, \tag{35}$$

where $L$ is the intersection crossing length. Equation (35) applies to decelerating vehicles, with $a_i(t)$ taken as the magnitude of the deceleration.
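Equations (32) to (35) reduce to finite differences on a centroid track. The following is a minimal sketch. It assumes centroids are already in ground-plane metres. Raw pixel centroids would first need camera calibration for $v_i$ to be in physical units. The function names are hypothetical. `braking_distance` returns infinity for non-decelerating vehicles because Eq. (35) is not meaningful for them.

```python
import numpy as np


def speeds(centroids, dt):
    """Eq. (32): distance between consecutive centroids divided by the frame interval."""
    p = np.asarray(centroids, dtype=float)
    return np.linalg.norm(np.diff(p, axis=0), axis=1) / dt


def accelerations(v, dt):
    """Eq. (33): forward difference of consecutive speed estimates."""
    return np.diff(np.asarray(v, dtype=float)) / dt


def clearance_time(length_m, v):
    """Eq. (34): time to cross an intersection of length L at speed v."""
    return length_m / v if v > 0 else float("inf")


def braking_distance(v, a):
    """Eq. (35) for a decelerating vehicle (a < 0), using the deceleration magnitude."""
    if a >= 0:
        return float("inf")  # not braking: no finite stop under current acceleration
    return v ** 2 / (2.0 * abs(a))


if __name__ == "__main__":
    track = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]   # centroids at 0.1 s intervals
    v = speeds(track, 0.1)                         # [10. 20.] m/s
    print(v, accelerations(v, 0.1))                # [100.] m/s^2
    print(clearance_time(20.0, 10.0), braking_distance(10.0, -5.0))  # 2.0 10.0
```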

To quantify the operational load on each signal phase, a phase demand score is defined as follows:

$$S_{\text{phase}}(t) = \sum_{i=1}^{m} \frac{\rho_{i,t} \cdot v_{i,t}}{C_{i,j,t} + \epsilon}, \tag{36}$$

where $m$ is the number of approach grid cells, $\rho_{i,t}$ denotes vehicle density, $v_{i,t}$ is the average speed within the cell, $C_{i,j,t}$ is the cell conflict probability, and $\epsilon$ is a small constant to prevent division by zero. This metric informs adaptive phase extension and signal prioritization decisions, linking real-time traffic perception with control strategies.
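Equation (36) can be evaluated per phase from the cell arrays. The following is a minimal sketch. It assumes each phase is described by equal-length lists of density, mean speed, and conflict probability for its approach cells. `most_urgent_phase` is a hypothetical greedy selector that picks the highest score. It is a simplification and not the model-predictive controller used in Case Study 5.

```python
import numpy as np


def phase_demand_score(density, speed, conflict, eps=1e-6):
    """Eq. (36): sum over a phase's approach cells of rho * v / (C + eps)."""
    rho = np.asarray(density, dtype=float)
    v = np.asarray(speed, dtype=float)
    c = np.asarray(conflict, dtype=float)
    return float(np.sum(rho * v / (c + eps)))


def most_urgent_phase(cells_by_phase, eps=1e-6):
    """Hypothetical greedy selector: pick the phase with the largest S_phase."""
    scores = {phase: phase_demand_score(*cells, eps=eps)
              for phase, cells in cells_by_phase.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    phases = {
        "NS": ([0.10, 0.08], [8.0, 6.0], [0.4, 0.5]),  # (rho, v, C) per approach cell
        "EW": ([0.05, 0.04], [9.0, 7.0], [0.3, 0.6]),
    }
    best, scores = most_urgent_phase(phases)
    print(best, scores)
```

As written, Eq. (36) gives less weight to cells with a higher conflict probability. A controller using the score may therefore need a separate safety rule for high-conflict cells.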

Experimental evaluation at a mid-sized urban intersection demonstrated a detection accuracy of 92.7% during nighttime, outperforming the 78.3% achieved using baseline YOLOv5 models. The grid-based representation enables seamless integration with traffic flow estimation, red-light violation analysis, and connected vehicle communication, supporting both conventional ITS deployments and CAV-enabled intersections.

4. Implementation and Reproducibility

This section documents implementation details to support reproducibility of the proposed framework, including data sources and preprocessing, training and inference configuration, and real-time control update constraints.

4.1. Data and Preprocessing

The system operates on video streams acquired from fixed traffic cameras positioned to cover the full intersection influence area. Video data are processed at a fixed frame rate and resolution to balance perception accuracy and computational efficiency. Preprocessing includes frame resizing, normalization, and region-of-interest masking to exclude irrelevant background areas.

Annotated data used for training perception models consist of bounding boxes and tracking identifiers for vehicles and other road users. Annotation is performed either offline or incrementally through the active learning pipeline, ensuring that labeling effort is focused on informative samples identified under distributional drift.

4.2. Training and Inference Details

Perception models are implemented using a deep convolutional backbone suitable for real-time inference, with weights initialized from publicly available pre-trained models. Training is performed using a stochastic gradient-based optimizer with a fixed learning rate schedule and mini-batch updates.

Inference is executed continuously on incoming video streams, while model updates are performed asynchronously to avoid disrupting real-time operation. Training and inference workloads are deployed on GPU-enabled compute nodes within the grid architecture, allowing multiple intersections or corridors to be processed in parallel.

4.3. Control Update Frequency and Latency Budget

Signal control decisions are updated at a fixed control epoch selected to balance responsiveness and stability. All perception, risk assessment, and decision computations are completed within the allotted latency budget for each control cycle.

Updated signal timing plans respect minimum green, maximum green, and clearance constraints enforced by the signal controller. This ensures that adaptive updates remain compatible with operational and safety requirements while enabling timely response to changing traffic conditions.
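Respecting these constraints amounts to clamping each adaptive adjustment before it reaches the controller. The following is a minimal sketch. The limit values used in the example (7 s minimum green, 45 s maximum green, 4.5 s clearance) are hypothetical, as are the function and class names. Real limits come from the agency's timing policy and the controller configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseLimits:
    min_green_s: float  # minimum green
    max_green_s: float  # maximum green
    clearance_s: float  # yellow change + red clearance


def constrained_green(current_green_s, requested_change_s, limits):
    """Apply an adaptive adjustment, then clamp it to [min_green, max_green]."""
    proposed = current_green_s + requested_change_s
    return min(max(proposed, limits.min_green_s), limits.max_green_s)


def phase_duration(green_s, limits):
    """Clearance is always appended and never shortened by the adaptive logic."""
    return green_s + limits.clearance_s


if __name__ == "__main__":
    lim = PhaseLimits(min_green_s=7.0, max_green_s=45.0, clearance_s=4.5)
    g = constrained_green(40.0, 10.0, lim)  # requested 50 s is capped at 45 s
    print(g, phase_duration(g, lim))        # 45.0 49.5
```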

5. Case Studies, Results, and Discussion

This section presents six representative case studies designed to evaluate the proposed intelligent intersection framework under diverse traffic, environmental, and operational conditions. The studies leverage both semi-synthetic and real-world datasets to test system robustness, including variations in vehicle volume, lighting, intersection geometry, and phase pressure. Evaluation metrics include vehicle detection accuracy, trajectory estimation fidelity, red-light violation detection, average delay, and grid-level risk indices. Visual analytics from the computational grid provide insights into dynamic traffic behavior and intersection performance.

5.1. Datasets

Table 2 summarizes the datasets employed across the six case studies. Each dataset is selected to reflect specific operational challenges such as low-light conditions, high traffic density, or multi-modal interactions.

Each dataset imposes distinct constraints—such as variable illumination, vehicle occlusion, and fluctuating traffic flow—to assess the generalizability of the proposed framework. The following subsections provide detailed analysis for each case study, including quantitative performance metrics and visual outputs from the grid-based analytics.

5.2. Case Study 1: Suburban Daytime Flow (D1)

The first case study focuses on a suburban four-leg intersection under moderate daytime traffic conditions (1800 veh/h). Figure 2 illustrates the grid-level traffic flow map generated from the vehicle detection and trajectory estimation pipeline. High-resolution grid cells capture both lane-level occupancy and vehicle speed variations, enabling precise calculation of intersection performance metrics.

Table 3 summarizes key performance indicators. The average vehicle speed of 29.4 mph and delay of 16.2 s/veh indicate smooth traffic flow, while the grid risk index $R_{\text{flow}} = 0.173$ reflects localized congestion and potential conflict zones. Classification accuracy of 87.1% demonstrates reliable vehicle type identification, supporting downstream V2I and adaptive signal control applications. The low number of red-light violations (eight events) further confirms effective phase coordination under normal traffic conditions.

Analysis of the grid heatmap reveals that most vehicle accumulation occurs near the approach cells for the east and west legs, highlighting the potential for targeted phase extensions during peak periods. The combination of trajectory estimation, active learning refinement, and grid-based aggregation demonstrates the framework’s capability to monitor real-time traffic states and support adaptive intersection management, even in moderately complex suburban environments.

Case study D1 demonstrates the effectiveness of the proposed grid-based intersection framework under favorable conditions, including daytime lighting and moderate traffic volumes. The high classification accuracy (87.1%) and low red-light violation count indicate that the integrated computer vision and active learning modules function reliably when occlusion and visual noise are minimal.

The grid representation provides granular insights into lane-level occupancy, speed variations, and phase pressure, enabling identification of potential congestion points even in relatively uncongested scenarios. This case establishes a baseline performance for the framework, serving as a reference for evaluating robustness under more challenging conditions such as nighttime operation, heavy traffic, or multi-modal interactions.

Furthermore, the observed correlation between grid-level vehicle density and average delay highlights the utility of the phase demand score $S_{\text{phase}}$ in guiding adaptive signal control. Overall, D1 confirms that the methodology can capture fine-grained traffic dynamics and support data-driven decision-making for both human-operated and connected vehicle intersections.

5.3. Case Study 2: Nighttime Congestion (D2)

Figure 3 shows entropy values across labeled and unlabeled samples in a low-light urban setting.

Case study D2 evaluates the proposed framework under low-light urban conditions, simulating nighttime congestion at a medium-density intersection. This scenario stresses the vehicle detection pipeline due to motion blur, occlusion, and reduced contrast, providing a robust test for system resilience.

Entropy-based uncertainty quantification, illustrated in Figure 3, guided the selective labeling of high-uncertainty frames. This process effectively mitigated false positives and missed detections, raising overall vehicle detection accuracy from a baseline of 78.3% (without active learning) to 92.7%. These results indicate that active learning can substantially improve detection performance under adverse visual conditions.

The computational grid maintained reliable estimates of vehicle occupancy, speed, and queue lengths, enabling accurate calculation of the phase demand score $S_{\text{phase}}$ even with partially degraded sensor inputs. The findings highlight the robustness of the integrated methodology, combining entropy-driven active learning, grid-based aggregation, and adaptive tracking, and confirm its suitability for real-time intelligent transportation system (ITS) applications during nighttime operations.

5.4. Case Study 3: Urban High-Volume Arterial (D3)

Table 4 presents the estimated vehicle densities across four directional zones.

Figure 4 overlays vehicle trajectories onto the spatial grid.

Case study D3 examines system performance under high-volume urban arterial conditions. Differences in vehicle density appear across the directional zones (Table 4), underscoring the asymmetric load patterns typical of arterial corridors.

The spatial grid framework effectively captures vehicle occupancy, trajectory evolution, and speed distribution, while the integrated tracking and active learning modules sustain reliable inference even under dense traffic conditions. The model accommodates complex nonlinear interactions, such as lane changes, merges, and acceleration–deceleration dynamics.

Temporal consistency in detected trajectories improves classification stability and reduces false detections compared to frame-by-frame analysis. Phase demand scores ($S_{\text{phase}}$), derived from grid cell densities, provide actionable indicators for adaptive signal timing, particularly in zones exhibiting high flow pressure.

Overall, case study D3 demonstrates that the proposed framework scales effectively to high-throughput urban environments, maintaining spatial coherence and enabling real-time signal optimization within complex arterial networks.

5.5. Case Study 4: Evening Rush Adaptive Control (D4)

The model simulated real-time adaptive signal control using risk indicators. Table 5 shows comparative metrics.

Figure 5 shows the delay distribution before and after adaptation.

Case study D4 examines the framework under evening rush-hour conditions, emphasizing real-time adaptive signal control based on grid-level risk metrics. Table 5 shows that the integration of phase demand scores, vehicle density, and trajectory data allowed the system to optimize signal timing dynamically.

The results indicate a 21.3% reduction in average vehicle delay, a 45.5% reduction in red-light violations, and an increase in classification accuracy from 88.2% to 91.0%. The aggregate flow risk index $R_{\text{flow}}$ also decreased markedly, indicating that adaptive interventions effectively mitigate congestion and improve intersection safety.

Figure 5 visually confirms the redistribution of delays across lanes, highlighting the system’s ability to balance traffic pressure across approaches. This case study underscores the benefits of combining grid-based monitoring, active learning, and real-time adaptive control in medium-to-high complexity intersections.

5.6. Overall Discussion

Across the first four case studies (D1–D4), the intelligent intersection framework consistently demonstrated the following:

Robust performance across varying lighting conditions, from daylight to nighttime congestion.
Scalability to high-volume arterial traffic with coherent grid-level occupancy and trajectory tracking.
Efficacy of active learning in improving detection accuracy and reducing false positives in challenging visual environments.
Practical utility of grid-informed adaptive signal control, reducing delays, violations, and overall traffic risk.

These findings collectively validate the framework’s ability to operate in heterogeneous traffic environments, provide actionable metrics for decision-making, and maintain computational efficiency through selective learning and grid-level aggregation. The methodology thus bridges static infrastructure constraints and dynamic traffic flow realities, offering a deployable solution for next-generation intelligent transportation systems.

5.7. Case Study 5: Spatiotemporal Grid Optimization at a Multi-Lane Intersection

This case study evaluates the grid-based intelligent control system at a complex, signalized multi-lane intersection, which features multiple turning movements and mixed-mode traffic.

A $6 \times 6$ computational grid was overlaid on the intersection, with each cell covering approximately $10\ \text{m} \times 10\ \text{m}$. Vehicles detected via computer vision were dynamically assigned to grid cells based on the centroids of their bounding boxes. Each cell captured key features, including temporal vehicle density $\rho_{i,t}$, mean speed $v_{i,t}$, and conflict probability with adjacent cells $C_{i,j,t}$.

To quantify directional pressure and guide phase selection, a local phase demand score was defined as follows:

$$S_{\text{phase}}(t) = \sum_{i=1}^{m} \frac{\rho_{i,t} \cdot v_{i,t}}{C_{i,j,t} + \epsilon}, \tag{37}$$

where the sum runs over the $m$ approach cells contributing to a phase, and $\epsilon$ is a small smoothing term to prevent division by zero.

These phase demand scores were input into a model-predictive control algorithm to dynamically schedule the most urgent phase. Over a 30 min simulation using real traffic volumes provided by VDOT, the grid-enhanced approach yielded substantial improvements:

23.4% reduction in vehicle idling time;
15.1% improvement in average travel time;
18.6% increase in phase responsiveness under asymmetric demand.

Figure 6 visualizes the spatiotemporal pressure heatmap, highlighting congestion hotspots and guiding adaptive signal interventions.

This study demonstrates that fine-grained, grid-based modeling of vehicle occupancy and flow can effectively inform phase-level interventions at multi-lane intersections. By integrating density, speed, and inter-cell conflict metrics, the system adapts in near real time to heterogeneous traffic patterns and asymmetric demands. The observed improvements in idling time, travel efficiency, and phase responsiveness confirm that spatiotemporal grid dynamics provide actionable insights for both human operators and connected vehicle coordination.

Together with the previous case studies (D1–D4), these results reinforce the scalability and robustness of the proposed framework, showing that grid-based intelligence can support adaptive, AI-enabled traffic management in high-complexity urban intersections.

5.8. Case Study 6: Evaluation Across Lighting and Demand Scenarios

This case study assesses the proposed framework across four distinct urban intersection scenarios, varying in lighting, traffic volume, and signal complexity. The goal is to demonstrate adaptability under real-world conditions, including low-light detection, congestion pressure estimation, and adaptive signal control.

5.8.1. Daytime Suburban Intersection

A four-leg suburban intersection was monitored under clear daytime conditions. Using YOLOv7-tiny combined with Deep SORT, the system achieved a tracking accuracy of 94.2% at a volume of 1800 vehicles per hour. Figure 7 visualizes the grid-based vehicle flow heatmap, where the arrows represent typical traffic markings on the asphalt. Table 6 shows the performance metrics.

Insight: The daytime scenario provides a baseline demonstrating high detection fidelity and accurate phase demand estimation under optimal visibility.

5.8.2. Low-Light Urban Arterial

Performance under reduced visibility was evaluated using low-light infrared video. Preprocessing included histogram equalization and contrast enhancement. Entropy-driven active learning selected 12.5% of frames for labeling, improving classification precision by 9.3%. Figure 8 shows the entropy-based uncertainty heatmap.

Insight: Without active learning, dark SUVs and pedestrians were misclassified 13.1% of the time. After refinement, misclassification dropped to 4.8%, highlighting the effectiveness of entropy-based sample selection under low-light conditions.

5.8.3. Adaptive Signal Control Using Grid Pressure

A high-volume T-intersection was modeled with a $6 \times 6$ grid (36 cells). Each cell contributed to a phase score computed as per Equation (5). Real-time adaptive control was simulated in SUMO using VDOT traffic profiles. Table 7 compares performance before and after adaptive control.

Insight: Adaptive logic reduced delay by 27.7% and red-light violations by 35.4% during evening peaks, confirming the value of grid-based pressure metrics in real-time signal optimization.

5.8.4. Multi-Modal Complexity

This large-scale scenario involved mixed traffic, including cars, buses, and bicycles. Additional grid layers were introduced for dedicated bus and bike lanes. Multi-class detection with YOLOv7 achieved 89.8% overall precision. Figure 9 shows the spatiotemporal phase pressure heatmap.

Results:

Car detection accuracy: 92.3%; bus: 84.7%; bike: 78.6%
Red-light violations decreased by 42.1% with bus-prioritization logic.
Average bus delay reduced from 31.2 s to 21.4 s.

Insight: Class-specific signal strategies and lane-based grid segmentation are essential for multi-modal traffic environments, improving both safety and operational efficiency.

Overall, these scenarios demonstrate that the proposed framework adapts effectively across diverse lighting conditions, traffic volumes, and intersection complexities. Improvements in detection accuracy, delay reduction, and violation mitigation support its suitability for deployment in smart intersection systems.

6. Conclusions and Future Work

This paper presented an intelligent traffic management framework that integrates computational grid modeling with computer vision and machine learning to dynamically monitor and manage real-world intersections. By leveraging both static infrastructure data and live camera feeds, the system enables proactive control strategies informed by spatially localized traffic states and predictive learning models.

Extensive case studies across various intersection types typical of US cities, lighting conditions, and traffic volumes demonstrated the framework’s adaptability and effectiveness. The key outcomes include significant improvements in traffic classification accuracy, reductions in average delay, and enhanced detection of red-light violations, validating the framework’s utility under both moderate and high-volume conditions. The integration of entropy-driven active learning and adaptive signal control further enhanced system robustness under uncertainty.

Key Findings:

Grid-based modeling provides consistent spatial granularity for fusing static and dynamic traffic features.
Active learning reduces labeling overhead while improving robustness under sensor noise and visual uncertainty.
Real-time feedback enables measurable performance gains in intersection-level delay reduction and safety metrics.

Future Work: Several directions are envisioned to enhance and expand the framework.

7. Discussion and Limitations

7.1. Edge Deployment and Scalability

The framework can be extended with a focus on edge computing architectures to enable real-time processing and decision-making at the network edge. The key aspects include the following:

Edge-Cloud Collaboration: Develop a hierarchical architecture where lightweight models run on edge devices for real-time inference, while more complex computations are offloaded to the cloud.
Resource Optimization: Implement model quantization and pruning techniques to reduce computational requirements while maintaining accuracy.
Distributed Processing: Enable parallel processing across multiple edge nodes to handle high-throughput traffic scenarios.
Latency Reduction: Optimize communication protocols between edge devices and the central system to minimize latency.

7.2. Scalability Enhancements

To ensure the framework’s applicability across diverse intersection types and scales, we will focus on the following:

Modular Architecture: Design components that can be easily scaled or modified based on intersection complexity.
Adaptive Resource Allocation: Implement dynamic resource management to handle varying traffic loads and computational demands.
Federated Learning: Enable collaborative model training across multiple intersections while preserving data privacy.
Load Balancing: Develop strategies to distribute computational load across available resources efficiently.

7.3. Multi-Modal Integration and Advanced Features

Multi-Modal Integration: Extending capabilities to pedestrians, cyclists, and transit operations via multi-camera sensor fusion.
Adversarial Robustness: Developing defenses against spoofing and adversarial attacks targeting traffic classification and prediction models.
Explainability and Trust: Embedding explainable AI modules to improve transparency and support informed decision-making for traffic engineers and policymakers.

7.4. Edge-to-Cloud Continuum

Extend the framework to leverage the full potential of edge-to-cloud continuum computing:

Adaptive Offloading: Implement intelligent workload distribution between edge and cloud resources based on network conditions and computational requirements.
Incremental Learning: Enable continuous model improvement through edge-based learning while maintaining system stability.
Energy Efficiency: Optimize power consumption for battery-powered edge devices to ensure long-term operation.
Standardization: Develop interfaces and protocols for seamless integration with existing smart city infrastructure.

7.5. Technical Contributions

Our framework introduces several key innovations that advance the state-of-the-art in intelligent intersection management:

1. Hybrid Grid Representation: Unlike traditional occupancy grids, our approach combines geometric and semantic information in a unified representation that enables more efficient spatial reasoning and real-time processing. The grid structure allows for the following:

Efficient spatial queries and aggregation of traffic metrics.
Seamless integration of heterogeneous sensor data.
Scalable representation that adapts to different intersection geometries.

2. Active Learning Strategy: We propose a novel uncertainty sampling approach that considers both detection confidence and spatial consistency, significantly reducing annotation costs while maintaining high accuracy. The entropy-based selection mechanism ensures that only the most informative samples are used for model refinement.

3. Real-Time Optimization: Our adaptive control algorithm uses a combination of model predictive control and reinforcement learning to optimize signal timing with sub-second latency. The system continuously adapts to changing traffic conditions by carrying out the following:

Monitoring real-time traffic flow and queue dynamics.
Predicting future traffic states using historical patterns.
Balancing competing objectives (e.g., delay minimization, throughput maximization).

4. Edge-Cloud Collaboration: The framework supports distributed processing, with lightweight models running on edge devices and more complex computations offloaded to the cloud when needed. This hybrid approach enables the following:

Low-latency decision making at the edge.
Scalable processing for multiple intersections.
Continuous model improvement through federated learning.

These contributions are validated through extensive experiments showing significant improvements over baseline methods in terms of both accuracy and computational efficiency. The framework’s modular design also allows for easy integration of new components and adaptation to different deployment scenarios.

Additional technical contributions of this work are as follows:

Operational Decision Model Integrating Safety Risk: We define an intersection state/action formulation and a multi-term objective that incorporates conflict risk computed from trajectories (Equations (1)–(10)).
Drift-Triggered Active Learning in the Control Loop: We specify a drift detection and query strategy that selectively updates perception models under distribution shift (Section 3.6).
Closed-Loop Adaptive Control Procedure Suitable for Distributed Execution: We provide an end-to-end control algorithm and show how it maps to grid-enabled execution (Algorithm 1 and Section 4).

In summary, this study demonstrates that combining grid-based spatial modeling, computer vision, and machine learning provides a scalable and adaptive framework for intelligent intersections. The results underscore its potential for deployment in next-generation smart city traffic management systems, paving the way for safer, more efficient, and responsive urban mobility.

Author Contributions

Conceptualization, M.K.J.; methodology, M.K.J.; software, M.K.J., P.K.J. and R.K.Y.; validation, M.K.J.; formal analysis, M.K.J.; investigation, M.K.J.; resources, M.K.J.; data curation, M.K.J.; writing—original draft preparation, M.K.J.; writing—review and editing, M.K.J., P.K.J. and R.K.Y.; visualization, M.K.J.; supervision, M.K.J.; project administration, M.K.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used in this study will be available from the corresponding author upon reasonable request.

Acknowledgments

The authors gratefully acknowledge the inspiration drawn from recent advancements in artificial intelligence and computer vision technologies. The increasing emphasis on connected and autonomous vehicles, including autonomous taxis and infrastructure-assisted driving systems, provided strong motivation for this work. Additionally, the societal imperative to reduce pedestrian-related accidents and red-light violations further underscores the need for intelligent intersection management. This research is a response to these emerging challenges and opportunities.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study, data collection, analysis, interpretation, writing of this manuscript, or the decision to publish the results.

References

1. Papageorgiou, M.; Diakaki, C.; Dinopoulou, V.; Kotsialos, A.; Wang, Y. Review of road traffic control strategies. Proc. IEEE 2003, 91, 2043–2067. Available online: https://ieeexplore.ieee.org/abstract/document/1246386 (accessed on 20 October 2025).
2. Wang, Y.; Yang, X.; Liang, H.; Liu, Y. A Review of the Self-Adaptive Traffic Signal Control System Based on Future Traffic Environment. J. Adv. Transp. 2018, 2018, 1096123.
3. Liu, G.; Shi, H.; Kiani, A.; Khreishah, A.; Lee, J.; Ansari, N.; Liu, C.; Yousef, M.M. Smart Traffic Monitoring System Using Computer Vision and Edge Computing. IEEE Trans. Intell. Transp. Syst. 2021, 23, 12027–12038.
4. Bugeja, M.; Dingli, A.; Attard, M.; Seychell, D. Comparison of Vehicle Detection Techniques Applied to IP Camera Video Feeds for Use in Intelligent Transport Systems. Transp. Res. Procedia 2020, 45, 971–978.
5. Ren, X.; Wang, D.; Laskey, M.; Goldberg, K. Extraction of Vehicle Trajectories from Online Video Streams; University of California at Berkeley: Berkeley, CA, USA, 2018; Available online: https://digicoll.lib.berkeley.edu/record/136164/files/EECS-2018-44.pdf (accessed on 20 October 2025).
6. Jha, M.K.; Ogallo, H.G. Traffic Signal Timing Optimization Analysis and Practice. In Encyclopedia of Business Analytics and Optimization; IGI Global: Hershey, PA, USA, 2014; pp. 1–13.
7. Jha, M.K.; Ogallo, H. Studying the Dynamic Sight Distance Problem with a Machine Learning Algorithm. In Proceedings of the 2021 Annual Transportation Research Board (TRB) Meeting, Washington, DC, USA, 21–29 January 2021; Paper No. TRBAM-21-03783. Available online: https://www.researchgate.net/publication/358501890_Studying_the_Dynamic_Sight_Distance_Problem_with_a_Machine_Learning_Algorithm (accessed on 27 October 2025).
8. Jha, M.K.; Jaiswal, R.; Varma, D.S.K.; Rankavat, S.; Bachu, A.K.; Jha, P.K. A Machine Learning Approach to Traffic Congestion Hotspot Identification and Prediction. Future Transp. 2025, 5, 161.
9. Wang, L.; Yan, X.; Liu, Y.; Liu, X.; Chen, D. Grid Mapping for Road Network Abstraction and Traffic Congestion Identification Based on Probe Vehicle Data. J. Transp. Eng. Part A Syst. 2021, 147, 04021024.
10. Konrad, M.; Nuss, D.; Dietmayer, K. Localization in Digital Maps for Road Course Estimation Using Grid Maps. In Proceedings of the 2012 IEEE Intelligent Vehicles Symposium, Alcala de Henares, Spain, 3–7 June 2012; IEEE: Madrid, Spain, 2012; pp. 87–92.
11. Rodriguez-Deniz, H.; Jenelius, E.; Villani, M. Urban Network Travel Time Prediction via Online Multi-Output Gaussian Process Regression. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–6.
12. Zhou, Y. Web Services-Based Grid Computing for Traffic Flow Predictive Control. In Proceedings of the 2009 WRI Global Congress on Intelligent Systems, Xiamen, China, 19–21 May 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 455–459.
13. Villa, J.; García, F.; Jover, R.; Martínez, V.; Armingol, J.M. Intelligent Infrastructure for Traffic Monitoring Based on Deep Learning and Edge Computing. J. Adv. Transp. 2024, 2024, 3679014.
14. Chaudhuri, A. Smart Traffic Management of Vehicles Using Faster R-CNN Based Deep Learning Method. Sci. Rep. 2024, 14, 10357.
15. Hunt, P.B.; Robertson, D.I.; Bretherton, R.D.; Winton, R.I. A Traffic Responsive Method of Coordinating Signals; Transport and Road Research Laboratory: Berkshire, UK, 1981; Available online: https://trid.trb.org/View/179439 (accessed on 27 October 2025).
16. Sims, A.G. The Sydney coordinated adaptive traffic system. In Proceedings of the Engineering Foundation Conference on Research Directions in Computer Control of Urban Traffic Systems, Pacific Grove, CA, USA, 11–16 February 1979; Available online: https://trid.trb.org/View/1206560 (accessed on 27 October 2025).
17. Wiering, M.A. Multi-agent reinforcement learning for traffic light control. In Proceedings of the Seventeenth International Conference (ICML ’2000), Stanford, CA, USA, 29 June–2 July 2000; Available online: https://www.researchgate.net/publication/221346141_Multi-Agent_Reinforcement_Learning_for_Traffic_Light_Control (accessed on 21 October 2025).
18. Li, L.; Lv, Y.; Wang, F.-Y. Traffic signal timing via deep reinforcement learning. IEEE/CAA J. Autom. Sin. 2016, 3, 247–254.
19. Wei, H.; Zheng, G.; Yao, H.; Li, Z. CoLight: Learning network-level cooperation for traffic signal control. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; Available online: https://dl.acm.org/doi/abs/10.1145/3357384.3357902 (accessed on 27 October 2025).
20. Saunier, N.; Sayed, T. A feature-based tracking algorithm for vehicles in intersections. In Proceedings of the 3rd Canadian Conference on Computer and Robot Vision (CRV ’06); IEEE: Quebec, QC, Canada, 2006; p. 59.
21. Minderhoud, M.M.; Bovy, P.H.L. Extended time-to-collision measures for road traffic safety assessment. Accid. Anal. Prev. 2001, 33, 89–97.
22. Zhou, X.; Wang, D.; Krähenbühl, P. Objects as points. arXiv 2019, arXiv:1904.07850.
23. Zheng, Y.; Capra, L.; Wolfson, O.; Yang, H. Urban computing: Concepts, methodologies, and applications. ACM Trans. Intell. Syst. Technol. 2014, 5, 38.
24. Mahmud, S.; Day, C.M. Evaluation of arterial signal coordination with commercial connected vehicle data: Empirical traffic flow visualization and performance measurement. arXiv 2022, arXiv:2212.02315.
25. Liao, N.; Guan, J. Multi-Scale Convolutional Feature Fusion Network Based on Attention Mechanism for IoT Traffic Classification. Int. J. Comput. Intell. Syst. 2024, 17, 36.
26. Wang, H.-F.; Jiao, Y.-M.; Hao, T.; Shan, Y.-H.; Song, S.-Z.; Huang, H. Low-Visibility Vehicle-Road Environment Perception Based on the Multi-Modal Visual Features Fusion of Polarization and Infrared. IEEE Trans. Intell. Transp. Syst. 2023, 24, 11997–12013.
27. Liang, S.; Wu, H.; Zhen, L.; Hua, Q.; Garg, S.; Kaddoum, G.; Hassan, M.M.; Yu, K. Edge YOLO: Real-Time Intelligent Object Detection System Based on Edge-Cloud Cooperation in Autonomous Vehicles. IEEE Trans. Intell. Transp. Syst. 2022, 23, 25345–25360.
28. Peyman, M.; Copado, P.J.; Tordecilla, R.D.; Martins, L.C.; Xhafa, F.; Juan, A.A. Edge Computing and IoT Analytics for Agile Optimization in Intelligent Transportation Systems. Energies 2021, 14, 6309.
29. Abdellatif, A.A.; Chiasserini, C.F.; Malandrino, F.; Mohamed, A.; Erbad, A. Active Learning with Noisy Labelers for Improving Classification Accuracy of Connected Vehicles. IEEE Trans. Veh. Technol. 2021, 70, 3059–3070.
30. Gao, J.; Shen, Y.; Liu, J.; Ito, M.; Shiratori, N. Adaptive Traffic Signal Control: Deep Reinforcement Learning Algorithm with Experience Replay and Target Network. arXiv 2017, arXiv:1705.02755.
31. Mousavi, S.S.; Schukat, M.; Howley, E. Traffic Light Control Using Deep Policy-Gradient and Value-Function-Based Reinforcement Learning. IET Intell. Transp. Syst. 2017, 11, 417–432.
32. Meier, J.-N.; Kailas, A.; Adla, R.; Bitar, G.; Moradi-Pari, E.; Abuchaar, O.; Ali, M.; Abubakr, M.; Deering, R.; Ibrahim, U.; et al. Implementation and evaluation of cooperative adaptive cruise control functionalities. IET Intell. Transp. Syst. 2018, 12, 1124–1133.
33. Thandavarayan, G.; Sepulcre, M.; Gozalvez, J. Cooperative perception for connected and automated vehicles: Evaluation and impact of congestion control. IEEE Access 2020, 8, 197665–197683.
34. Hajisami, A.; Lansford, J.; Dingankar, A.; Misener, J. A tutorial on the LTE-V2X direct communication. IEEE Open J. Veh. Technol. 2022, 3, 388–398.
35. Chu, T.; Wang, J.; Codecà, L.; Li, Z. Multi-agent deep reinforcement learning for large-scale traffic signal control. IEEE Trans. Intell. Transp. Syst. 2020, 21, 1086–1095.
36. Singh, T.; Rajput, V.; Prasad, U.; Kumar, M. Real-Time Traffic Light Violations Using Distributed Streaming. J. Supercomput. 2023, 79, 7533.
37. Merrad, A.; Daoud, W.; Dalouli, A.; Latrech, B.; Nouri, A.N. AI-Powered Simultaneous Multi-Vehicle Speed Estimation for Intelligent Traffic Monitoring in Developing Regions Using YOLOv7 and DeepSORT. ITEGAM-JETIA 2025, 11, 193–200.
38. Xu, B.; Ban, X.J.; Bian, Y.; Wang, J.; Li, K. V2I based cooperation between traffic signal and approaching automated vehicles. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017; pp. 1658–1664.
39. Yu, F.; Xian, W.; Chen, Y.; Liu, F.; Liao, M.; Madhavan, V.; Darrell, T. BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling. arXiv 2020, arXiv:1805.04687.

Figure 1. End-to-end computational pipeline showing the integration of framework components. Solid arrows indicate primary data flow, while dashed red arrows show feedback loops for adaptive learning and control.

Figure 2. Grid-level traffic flow map for D1.

Figure 3. Entropy-based uncertainty for active learning (D2).

Figure 4. Vehicle trajectories mapped on spatial grid (D3).

Figure 5. Average delay before and after adaptive control (D4).

Figure 6. Spatiotemporal pressure score heatmap for signal phase optimization.

Figure 7. Grid-based vehicle flow heatmap.

Figure 8. Entropy-based uncertainty heatmap.

Figure 9. Spatiotemporal phase pressure heatmap.
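Figures 3 and 8 refer to entropy-based uncertainty for active learning, but the captions do not state the scoring rule. The following is a minimal sketch, assuming the common formulation in which each detection's class-probability vector is scored by its Shannon entropy $H(\mathbf{p}) = -\sum_k p_k \log p_k$ and the highest-entropy samples are queued for manual labeling under a fixed budget. The function names `predictive_entropy` and `select_queries`, the three-class label set, the probability values, and the budget are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one detection's class-probability vector."""
    # Zero-probability classes are skipped because p * log(p) -> 0 as p -> 0.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(batch: List[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` most uncertain samples (highest entropy first)."""
    ranked = sorted(
        range(len(batch)),
        key=lambda i: predictive_entropy(batch[i]),
        reverse=True,
    )
    return ranked[:budget]


if __name__ == "__main__":
    # Hypothetical softmax outputs over (car, truck, pedestrian) for four detections.
    detections = [
        [0.98, 0.01, 0.01],  # confident detection
        [0.40, 0.35, 0.25],  # ambiguous, e.g. under low light or occlusion
        [0.70, 0.20, 0.10],  # moderately confident
        [0.34, 0.33, 0.33],  # near-uniform, most uncertain
    ]
    print(select_queries(detections, budget=2))  # prints [3, 1]
```

On this hypothetical batch, the near-uniform vector (index 3) and the ambiguous one (index 1) are sent for labeling, while the confident detection (index 0) is not queried. A per-cell average of such scores is one plausible way to build a grid-level uncertainty map, though the captions alone do not confirm how the heatmaps were aggregated.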

Table 1. Datasets used for evaluation.

Dataset	Source	Size (Hours)	Key Characteristics
UrbanDay	City Traffic Cameras	120	Clear weather, high traffic density
NightIntersection	Custom Collection	80	Low-light conditions, various weather
MixedTraffic	NGSIM & Custom	200	Mixed human/autonomous vehicles
AdverseWeather	BDD100K [39]	150	Rain, fog, and snow conditions

Table 2. Datasets used in case studies.

ID	Intersection	Scenario	Volume (veh/h)
D1	Intersection 1	Suburban, Daytime	1800
D2	Intersection 2	Urban, Nighttime	1300
D3	Intersection 3	Midday Peak	2200
D4	Intersection 4	Evening Adaptive	2100
D5	Intersection 5	Multi-modal	2500
D6	Intersection 6	Coordinated Grid	1900

Table 3. Performance metrics: case study 1.

Metric	Value
Average Speed	29.4 mph
Average Delay	16.2 s/veh
Red-Light Violations	8
Classification Accuracy	87.1%
Grid Risk Index ($R_{\text{flow}}$)	0.173

Table 4. Zone-wise density for D3.

Zone ID	Density (veh/km)
Z1 (Northbound)	81.5
Z2 (Southbound)	73.4
Z3 (Eastbound)	92.7
Z4 (Westbound)	88.2

Table 5. Before vs. after adaptive control (D4).

Metric	Before	After
Average Delay (s)	24.6	19.4
Violations	11	6
Classification Accuracy	88.2%	91.0%
Risk Index ($R_{\text{flow}}$)	0.212	0.147

Table 6. Performance metrics.

Metric	Value
Detection Accuracy	94.2%
Average Delay (s/veh)	14.8
Clearance Time Accuracy	91.5%
Red-Light Violation Rate	0.7%

Table 7. Before/after adaptive control.

Metric	Before	After
Average Delay (s)	26.7	19.3
Queue Length (m)	112.5	84.2
Phase Responsiveness Score	0.68	0.91
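The before/after rows in Tables 5 and 7 can be summarized as relative changes. The following is a minimal arithmetic sketch using values transcribed from those two tables. The helper name `relative_change` is hypothetical, and the percentages it prints are derived here rather than reported by the authors.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative values are reductions)."""
    return 100.0 * (after - before) / before


# Before/after values transcribed from Table 5 (D4) and Table 7.
TABLE_5 = {
    "Average delay (s)": (24.6, 19.4),
    "Violations": (11, 6),
    "Risk index R_flow": (0.212, 0.147),
}
TABLE_7 = {
    "Average delay (s)": (26.7, 19.3),
    "Queue length (m)": (112.5, 84.2),
    "Phase responsiveness score": (0.68, 0.91),
}

if __name__ == "__main__":
    for name, table in (("Table 5", TABLE_5), ("Table 7", TABLE_7)):
        for metric, (before, after) in table.items():
            # Signed percentage with one decimal place, e.g. "-21.1%".
            print(f"{name} | {metric}: {relative_change(before, after):+.1f}%")
```

Run as written, this gives average-delay reductions of about 21.1% (Table 5) and 27.7% (Table 7), a 45.5% drop in violations, and a 33.8% increase in the phase responsiveness score. Classification accuracy in Table 5 is left out on purpose: for a percentage metric the absolute difference (+2.8 percentage points) is the clearer summary.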

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Jha, M.K.; Jha, P.K.; Yadav, R.K. A Grid-Enabled Vision and Machine Learning Framework for Safer and Smarter Intersections: Enhancing Real-Time Roadway Intelligence and Vehicle Coordination. Infrastructures 2026, 11, 41. https://doi.org/10.3390/infrastructures11020041

