1. Introduction
Traffic management has become a global priority due to rapid urbanization, increasing vehicle density, and the need to reduce traffic congestion, accidents, and environmental impact. Intelligent traffic monitoring and accident detection systems are essential components for ensuring safety, optimizing road capacity, and improving mobility. While artificial intelligence and data-driven approaches are gaining importance in research because of their adaptability and high accuracy, traditional detection methods remain highly important in modern traffic systems. These classic techniques, such as threshold-based algorithms, Kalman filters, and rule-based systems, have been validated over decades of use and are widely implemented in real-world transportation environments because of their robustness, reliability, and proven operational performance.
In addition, our departments, which conducted this research, have a long tradition in traffic monitoring and the development of intelligent systems and have contributed significantly to both classical and artificial intelligence-based detection methods. This academic and practical heritage underlines the importance of linking historical knowledge with new technologies.
It is important to clearly distinguish between traffic object detection and traffic incident detection, as these two tasks address different problem levels within intelligent transportation systems.
Traffic object detection focuses on the identification and localization of physical entities present in traffic scenes, such as vehicles, pedestrians, cyclists, traffic signs, or obstacles. Typical inputs include images or video streams from surveillance cameras, dashcams, or onboard sensors, as well as data from LiDAR or radar systems. The primary outputs of object detection are bounding boxes, object classes, confidence scores, and, in advanced cases, segmentation masks or three-dimensional object representations. The main objective of traffic object detection is accurate perception of the traffic environment at a given time.
In contrast, traffic incident detection operates at a higher semantic level and aims to identify abnormal or non-recurrent traffic events, such as accidents, sudden congestion, stalled vehicles, or hazardous situations. Its inputs often consist of aggregated traffic parameters (e.g., speed, flow, density), object trajectories, spatio-temporal patterns, or outputs generated by object detection and tracking modules. The outputs of incident detection systems are typically event labels, alarms, risk scores, or incident occurrence times, which support traffic management decisions and emergency response.
In many practical systems, traffic object detection serves as a fundamental building block for traffic incident detection. Reliable detection and tracking of traffic participants enable the extraction of higher-level features, such as vehicle interactions, speed changes, or trajectory anomalies, which are subsequently used to infer incidents. Therefore, while closely related, object detection and incident detection represent distinct yet complementary tasks within traffic monitoring architectures.
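As a minimal sketch of this layering, assuming an upstream tracker already delivers per-vehicle positions along a lane, the speed-change feature mentioned above can be computed from consecutive track samples and turned into a simple incident cue. The function names, the data layout, and the 4.0 m/s² deceleration threshold are hypothetical choices for illustration, not a method taken from the reviewed literature.

```python
# Illustrative sketch: names, data layout and the 4.0 m/s^2 threshold are hypothetical.
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A track is a time-ordered list of (timestamp in s, position along the lane in m),
# as produced by an upstream detection-and-tracking module.
Track = List[Tuple[float, float]]


@dataclass
class HardBrakingEvent:
    track_id: int
    time_s: float
    decel_mps2: float


def finite_difference_speeds(track: Track) -> List[Tuple[float, float]]:
    """Speed (m/s) between consecutive samples, stamped at the later sample."""
    return [(t1, (x1 - x0) / (t1 - t0))
            for (t0, x0), (t1, x1) in zip(track, track[1:])]


def detect_hard_braking(tracks: Dict[int, Track],
                        decel_threshold: float = 4.0) -> List[HardBrakingEvent]:
    """Flag intervals in which a tracked vehicle decelerates faster than the threshold."""
    events = []
    for track_id, track in tracks.items():
        speeds = finite_difference_speeds(track)
        for (t0, v0), (t1, v1) in zip(speeds, speeds[1:]):
            decel = (v0 - v1) / (t1 - t0)
            if decel > decel_threshold:
                events.append(HardBrakingEvent(track_id, t1, decel))
    return events


tracks = {
    1: [(0.0, 0.0), (1.0, 15.0), (2.0, 30.0), (3.0, 37.0)],  # 15 m/s, then a sudden slowdown
    2: [(0.0, 0.0), (1.0, 10.0), (2.0, 20.0), (3.0, 30.0)],  # steady 10 m/s
}
print(detect_hard_braking(tracks))
# -> [HardBrakingEvent(track_id=1, time_s=3.0, decel_mps2=8.0)]
```

In practice, finite-difference speeds amplify localization noise, so a deployed system would smooth the trajectories before thresholding them.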
Accordingly, this literature study examines both classical detection techniques and artificial intelligence-based approaches to provide a comprehensive understanding of current trends, their effectiveness, implementation challenges, and opportunities for integrated solutions.
Figure 1 presents a unified diagram that categorizes all reviewed traffic object and incident detection approaches. The diagram provides a structured overview of classical and artificial intelligence-based methodologies and clarifies the relationships between different detection strategies.
This paper provides a comprehensive review of traffic object and incident detection methods within intelligent transportation systems, focusing on both classical and artificial intelligence-based approaches. It outlines the key challenges faced by current systems, including environmental variability, real-time processing constraints, and limited robustness under adverse conditions such as rain, fog, and low illumination. The review establishes a clear scope by distinguishing between object detection and incident detection while emphasizing their interdependence.
The paper first presents commonly used datasets, sensor modalities, and evaluation metrics, highlighting inconsistencies in benchmarking practices across studies. It then introduces a methodological classification that organizes existing approaches into classical techniques and AI-based methods. Traffic object detection methods are reviewed in detail, covering motion-based, feature-based, rule-based, traditional machine learning, and deep learning approaches, along with recent advances such as attention mechanisms, transformer models, and graph-based architectures.
Finally, the study reviews traffic incident detection methods, ranging from threshold- and rule-based techniques to statistical modeling, deep learning, anomaly detection, and predictive approaches. A critical assessment of each category is provided, considering accuracy, scalability, computational cost, and environmental robustness. The paper concludes by identifying open research challenges and recommending future directions, including multimodal sensing, improved generalization across weather conditions, and integrated, real-time traffic detection frameworks.
For easier navigation through the article, Figure 2 visualizes its structure, and Table 1 summarizes it.
2. Dataset Evaluation
Adverse weather conditions and complex environmental factors remain a critical challenge for traffic object detection systems, significantly influencing both algorithmic performance and the validity of experimental evaluation. Many of the datasets commonly used in the literature reviewed in this study were collected under controlled or predominantly favorable conditions, which limits their ability to fully reflect real-world operational environments.
Several widely used object detection benchmarks, such as KITTI and Cityscapes, provide high-quality annotations and well-structured urban scenes captured mainly under clear weather and daytime conditions. While these datasets have been instrumental in advancing object detection accuracy and multi-class recognition, their limited representation of adverse weather, low illumination, and extreme visibility degradation restricts the assessment of model robustness in realistic deployment scenarios. As a result, models trained and evaluated on these datasets often exhibit performance degradation when exposed to rain, fog, snow, or nighttime conditions.
The BDD100K dataset represents a partial step toward addressing this limitation by including images captured under diverse weather conditions, such as rain and fog, as well as different times of day. Several deep learning-based studies reviewed in this manuscript rely on BDD100K to demonstrate improved generalization. However, even in this case, adverse weather samples constitute only a subset of the dataset, and evaluation protocols rarely isolate weather-specific performance, making it difficult to quantify robustness systematically.
Traffic-specific benchmarks such as UA-DETRAC, frequently used in studies based on YOLO and related deep learning architectures, include video sequences recorded under varying illumination and weather conditions, including cloudy, rainy, and nighttime scenarios. Despite this diversity, many studies report aggregate detection metrics without explicitly analyzing performance degradation across environmental categories. Consequently, while UA-DETRAC enables evaluation under more realistic conditions, its potential for detailed environmental analysis remains underutilized.
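A minimal sketch of the environment-aware reporting that such benchmarks would allow: per-sequence results are grouped by a condition label and recall is computed per group alongside the pooled value. The record layout, the condition labels, and the counts are made-up toy values for illustration, not UA-DETRAC results.

```python
# Illustrative sketch: condition labels and counts are made-up toy values (hypothetical).
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One record per evaluated sequence: (condition label, true positives, false negatives).
Record = Tuple[str, int, int]


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0


def recall_by_condition(records: Iterable[Record]) -> Tuple[float, Dict[str, float]]:
    """Return the pooled (aggregate) recall and the recall for each condition."""
    tp_by, fn_by = defaultdict(int), defaultdict(int)
    for condition, tp, fn in records:
        tp_by[condition] += tp
        fn_by[condition] += fn
    overall = recall(sum(tp_by.values()), sum(fn_by.values()))
    per_condition = {c: recall(tp_by[c], fn_by[c]) for c in tp_by}
    return overall, per_condition


records = [("sunny", 90, 10), ("rainy", 30, 20), ("night", 60, 40)]
overall, per_condition = recall_by_condition(records)
print(round(overall, 2), per_condition)
# -> 0.72 {'sunny': 0.9, 'rainy': 0.6, 'night': 0.6}
```

In this toy example, the pooled recall of 0.72 hides the drop from 0.9 under clear conditions to 0.6 in rain and at night, which is precisely the information that aggregate-only reporting discards.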
For classical background subtraction and motion-based approaches, datasets such as CDnet are commonly employed to evaluate robustness against illumination changes, dynamic backgrounds, and weather effects. CDnet explicitly includes challenging categories such as bad weather, intermittent object motion, and low frame rate videos, making it a valuable benchmark for assessing environmental sensitivity. However, these datasets are often disconnected from traffic-specific semantics, focusing primarily on foreground segmentation rather than high-level object understanding.
In the context of traffic incident detection, datasets differ substantially from visual object detection benchmarks. Studies relying on loop detector measurements, floating car data (FCD), or aggregated traffic flow data implicitly incorporate environmental effects such as weather-induced speed reduction or congestion patterns. However, these datasets rarely include explicit weather labels, making it difficult to disentangle environmental influence from traffic dynamics. Similarly, LiDAR-based trajectory datasets used for near-crash or conflict detection provide accurate geometric information and improved robustness to illumination changes, yet their availability is limited and data collection costs remain high.
Overall, the reviewed literature reveals a persistent gap between algorithmic development and dataset realism. While advanced detection models increasingly emphasize robustness and adaptability, the lack of standardized, weather-diverse benchmarks and environment-aware evaluation protocols limits meaningful comparison across methods. Future research should prioritize the development of multimodal datasets that explicitly annotate environmental conditions and support systematic evaluation of detection performance under adverse and dynamically changing traffic environments. Coverage of commonly used traffic detection datasets is summarized in Table 2.
5. Object Detection
Building upon the conceptual and methodological framework outlined in the previous section, this section focuses on traffic object detection methods, which constitute the foundational perception layer of many intelligent transportation systems. Object detection methods can be divided into two main categories: classic detection methods and detection methods using artificial intelligence. Classic detection methods, such as background subtraction, optical flow, edge detection, the Hough transform, SIFT (Scale-Invariant Feature Transform)/SURF (Speeded-Up Robust Features), and rule-based heuristics, are used to detect the presence and location of objects such as vehicles or pedestrians in an image or video frame. They do not classify objects as finely as artificial intelligence-based methods, but they still produce detections in the form of bounding boxes or regions of interest.
Detection methods using artificial intelligence, such as SVM + HOG (Support Vector Machine + Histogram of Oriented Gradients), Random Forests, YOLO, SSD, Faster R-CNN (Region-based Convolutional Neural Networks), DETR (DEtection TRansformer), and PointNet, are used to build complete object detection pipelines rather than serving only as image classifiers. They output bounding boxes and classes such as cars and pedestrians and, in advanced cases, segmentation masks or 3D bounding boxes.
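When we take a closer look at classic detection methods, we can divide them into the following categories:
Background and Motion-Based Methods;
Shape and Edge-Based Methods;
Feature-Based Methods;
Rule-Based Methods.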
On the other hand, we can divide detection methods using artificial intelligence into the following categories:
Traditional Machine Learning;
Deep Learning–Object Detection;
Advanced Architectures and Trends.
All reviewed categories for this section are presented in Figure 3.
5.1. Background and Motion-Based Methods
Background and motion-based methods rely on the idea that moving objects differ from the static background in a video stream. By modeling the background (road, buildings, static environment) and then subtracting it from each frame, we can detect the foreground objects—typically vehicles, pedestrians, or cyclists. They are fast and effective in controlled conditions (e.g., fixed cameras, good lighting), but limited when it comes to robust classification or handling complex urban scenes. Core techniques of this method are background subtraction, optical flow, and temporal differencing.
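The core idea can be sketched in a few lines, assuming grayscale frames stored as nested lists and a running-average background model, which is one of the simplest variants (the mixture and codebook models reviewed below are more elaborate). The function names, the learning rate `alpha`, and the threshold are hypothetical. Temporal differencing uses the same comparison, with the previous frame as the reference instead of the background model.

```python
# Illustrative sketch on tiny grayscale frames stored as nested lists;
# names, the learning rate alpha and the threshold are hypothetical.
from typing import List

Frame = List[List[float]]
Mask = List[List[bool]]


def update_background(background: Frame, frame: Frame, alpha: float = 0.05) -> Frame:
    """Running-average background model: B <- (1 - alpha) * B + alpha * I."""
    return [[(1 - alpha) * b + alpha * i for b, i in zip(b_row, i_row)]
            for b_row, i_row in zip(background, frame)]


def subtract(reference: Frame, frame: Frame, threshold: float = 30.0) -> Mask:
    """Mark pixels whose absolute difference from the reference exceeds the threshold.

    With the background model as reference this is background subtraction;
    with the previous frame as reference it is temporal differencing.
    """
    return [[abs(i - b) > threshold for b, i in zip(b_row, i_row)]
            for b_row, i_row in zip(reference, frame)]


background = [[100.0] * 4 for _ in range(3)]   # empty road surface
frame = [row[:] for row in background]
frame[1][2] = 200.0                            # a bright vehicle pixel appears

foreground = subtract(background, frame)
print(sum(sum(row) for row in foreground))     # -> 1
background = update_background(background, frame, alpha=0.5)
print(background[1][2])                        # -> 150.0
```

The learning rate controls the trade-off described above: a small `alpha` keeps slow vehicles from being absorbed into the background, while a large one adapts faster to illumination changes.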
Research on background subtraction and object detection in traffic scenes has focused on improving robustness, adaptability, and real-time performance under challenging conditions. The study in ref. [1] enhanced traditional background subtraction by addressing illumination changes, shadows, and stationary vehicles, achieving high detection precision at reduced computational cost. Similarly, ref. [2] integrated Mixture of Gaussians (MoG) and texture models with motion history and Kalman filtering to track multiple vehicles reliably in real time. The study by [3] contributed an adaptive Local Binary Pattern (LBP) histogram-based model capable of adjusting to background variations, supporting efficient traffic monitoring. Earlier, ref. [4] proposed tracking pixel clusters in color and position space to detect moving vehicles from mobile cameras, demonstrating resilience to occlusions and shape variations. The study [5] refined the MoG approach with adaptive updates and effective shadow detection, improving segmentation accuracy and distinguishing shadows from moving objects.
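Several of these trackers pair segmentation with Kalman filtering. The following is a minimal one-dimensional constant-velocity Kalman filter, assuming position-only measurements along a lane and hypothetical noise values. It is a generic textbook formulation, not the specific tracker of [2], and the class and parameter names are illustrative.

```python
# Illustrative sketch: a 1D constant-velocity Kalman filter for a vehicle's
# position along a lane. Class name and noise values q, r are hypothetical.
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConstantVelocityKalman:
    dt: float = 1.0   # time between frames
    q: float = 0.01   # process noise added to the diagonal of P (simplified Q)
    r: float = 1.0    # measurement noise variance
    x: List[float] = field(default_factory=lambda: [0.0, 0.0])  # [position, velocity]
    P: List[List[float]] = field(default_factory=lambda: [[1e3, 0.0], [0.0, 1e3]])

    def predict(self) -> None:
        dt = self.dt
        (p00, p01), (p10, p11) = self.P
        # State transition F = [[1, dt], [0, 1]]: x <- F x
        self.x = [self.x[0] + dt * self.x[1], self.x[1]]
        # Covariance propagation: P <- F P F^T + Q
        self.P = [[p00 + dt * (p01 + p10) + dt * dt * p11 + self.q, p01 + dt * p11],
                  [p10 + dt * p11, p11 + self.q]]

    def update(self, z: float) -> None:
        (p00, p01), (p10, p11) = self.P
        # Measurement model H = [1, 0]: only the position is observed.
        s = p00 + self.r              # innovation variance
        k0, k1 = p00 / s, p10 / s     # Kalman gain
        innovation = z - self.x[0]
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        # Covariance update: P <- (I - K H) P
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]


kf = ConstantVelocityKalman()
for t in range(1, 21):
    kf.predict()
    kf.update(2.0 * t)   # noiseless measurements of a vehicle moving 2 m per frame
print([round(v, 2) for v in kf.x])
```

With these noiseless measurements, the estimate settles close to a position of 40 m and a velocity of 2 m per frame. The predicted position is also what lets a tracker bridge short occlusions when the segmentation misses a vehicle.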
In another study, ref. [6] further optimized the MoG model by dynamically adjusting the Gaussian components per pixel, improving processing speed and adaptability in complex scenes. The work by [7] proposed a codebook-based background modeling technique that quantizes pixel values to improve speed and memory efficiency while handling illumination changes and motion. Building upon this idea, ref. [8] combined codebook modeling with texture, color, and shape cues to improve detection accuracy in complex environments. The study by [9] introduced PAWCS (Pixel-based Adaptive Word Consensus Segmenter), a dictionary-based segmentation model with adaptive feedback that achieved state-of-the-art performance and superior F-measure results on benchmark datasets. Finally, ref. [10] developed PBAS (Pixel-Based Adaptive Segmenter), a non-parametric, pixel-level adaptive model that dynamically adjusts decision thresholds and learning rates, demonstrating excellent performance across various lighting and environmental conditions.
Collectively, these studies illustrate the steady evolution from static background subtraction methods toward adaptive, feedback-driven, and feature-rich models. Each contribution advanced the field by improving speed, accuracy, and robustness, laying the groundwork for modern, intelligent traffic monitoring and surveillance systems.
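To make the MoG idea concrete, here is a simplified single-pixel mixture in the spirit of the Stauffer-Grimson formulation. It assumes grayscale intensities, a fixed maximum number of components, and the common simplification of using the learning rate `alpha` as the mean and variance update rate. Class names and all parameter values are hypothetical. The refinements described for [5] and [6] (shadow detection, per-pixel adjustment of components) are not included.

```python
# Illustrative, simplified per-pixel Mixture of Gaussians (Stauffer-Grimson style);
# class names and all parameter values are hypothetical defaults.
import math
from dataclasses import dataclass
from typing import List


@dataclass(eq=False)  # components are compared by identity
class Component:
    weight: float
    mean: float
    var: float


class PixelMixture:
    def __init__(self, k=3, alpha=0.01, bg_ratio=0.7, init_var=225.0,
                 min_var=4.0, match_sigmas=2.5):
        self.k, self.alpha, self.bg_ratio = k, alpha, bg_ratio
        self.init_var, self.min_var, self.match_sigmas = init_var, min_var, match_sigmas
        self.components: List[Component] = []

    def _ranked(self) -> List[Component]:
        # Stable, frequently observed modes (high weight, low variance) rank first.
        return sorted(self.components, key=lambda g: g.weight / math.sqrt(g.var), reverse=True)

    def _background(self) -> List[Component]:
        # The top-ranked components whose weights sum past bg_ratio model the background.
        chosen, total = [], 0.0
        for g in self._ranked():
            chosen.append(g)
            total += g.weight
            if total > self.bg_ratio:
                break
        return chosen

    def apply(self, x: float) -> bool:
        """Classify intensity x against the current model, then update it. True = foreground."""
        matched = next((g for g in self._ranked()
                        if abs(x - g.mean) <= self.match_sigmas * math.sqrt(g.var)), None)
        is_foreground = matched is None or all(g is not matched for g in self._background())

        for g in self.components:
            g.weight *= 1 - self.alpha
        if matched is not None:
            matched.weight += self.alpha
            rho = self.alpha  # simplification of rho = alpha * N(x | mean, var)
            matched.mean += rho * (x - matched.mean)
            matched.var = max(self.min_var,
                              matched.var + rho * ((x - matched.mean) ** 2 - matched.var))
        else:
            new = Component(self.alpha, x, self.init_var)
            if len(self.components) < self.k:
                self.components.append(new)
            else:  # replace the least supported component
                weakest = min(range(self.k), key=lambda i: self.components[i].weight)
                self.components[weakest] = new
        total = sum(g.weight for g in self.components)
        for g in self.components:
            g.weight /= total
        return is_foreground


pixel = PixelMixture()
for _ in range(100):
    pixel.apply(100.0)                          # road surface observed repeatedly
print(pixel.apply(100.0), pixel.apply(200.0))   # -> False True
```

Keeping several components per pixel is what lets MoG absorb recurring background variation (e.g., swaying vegetation or flickering lights) that a single running average cannot represent.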
We created two tables for comparison purposes, Table 3 and Table 4, in which we summarize the most relevant studies.
5.2. Shape and Edge-Based Methods
Shape and Edge-Based methods are classical computer vision techniques that detect and recognize objects by analyzing their geometric outlines rather than relying on pixel intensities or textures. These approaches usually begin with edge detection (e.g., Canny, Sobel) to extract object boundaries from an image. Once edges are identified, algorithms such as the Hough Transform or contour matching are applied to fit predefined shapes like lines, circles, or ellipses. The advantage of these methods is that they are relatively simple, interpretable, and computationally efficient, making them suitable for well-defined geometric objects. However, they often struggle in complex, cluttered, or noisy environments, and they lack robustness compared to modern machine learning or deep learning methods.
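The edge-then-fit pipeline can be sketched in plain Python, assuming a grayscale image stored as nested lists. Here a Sobel operator stands in for a full Canny detector (no smoothing, non-maximum suppression, or hysteresis), and Hough voting uses hypothetical bins of 1 degree in θ and 1 pixel in ρ. Function names and thresholds are illustrative.

```python
# Illustrative sketch: Sobel gradients (standing in for Canny) followed by Hough
# voting for straight lines. Names, thresholds and bin sizes are hypothetical.
import math
from collections import Counter
from typing import List, Tuple

Image = List[List[float]]
Point = Tuple[int, int]  # (row, col)


def sobel_edge_points(img: Image, threshold: float) -> List[Point]:
    """Interior pixels whose Sobel gradient magnitude exceeds the threshold."""
    points = []
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            gx = (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r][c - 1] - img[r + 1][c - 1])
            gy = (img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r - 1][c] - img[r - 1][c + 1])
            if math.hypot(gx, gy) > threshold:
                points.append((r, c))
    return points


def strongest_line(points: List[Point], n_theta: int = 180) -> Tuple[float, int]:
    """Each edge point votes for every line rho = x*cos(theta) + y*sin(theta) through it."""
    votes = Counter()
    for y, x in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            votes[(t, round(x * math.cos(theta) + y * math.sin(theta)))] += 1
    (t, rho), _ = votes.most_common(1)[0]
    return 180.0 * t / n_theta, rho  # (theta in degrees, rho in pixels)


# A vertical intensity step between columns 4 and 5 of a 10 x 10 image.
img = [[255.0 if c >= 5 else 0.0 for c in range(10)] for _ in range(10)]
edges = sobel_edge_points(img, threshold=100.0)
print(sorted({c for _, c in edges}))                  # -> [4, 5]
print(strongest_line([(y, 3) for y in range(10)]))    # points on the vertical line x = 3
```

The step edge produces a two-pixel-wide response (columns 4 and 5), which is the kind of thick edge that Canny's non-maximum suppression thins. Coarse Hough bins also let nearby angles collect similar vote counts, illustrating why such methods become fragile in cluttered scenes.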
Shape- and edge-based approaches have played a key role in early traffic image analysis, offering interpretable, geometry-driven solutions for object detection. The study by [11] proposed an edge-based detection framework that optimizes cost functions to fit geometric shapes such as circles, ellipses, and lines. This method achieved higher accuracy and fewer false detections than traditional Hough Transform techniques, even under noisy or cluttered conditions. Building on this foundation, ref. [12] integrated background subtraction, edge-based feature extraction, and a neural network classifier to detect and classify vehicles by size. Their system maintained over 96% accuracy (up to 99.47% in some cases) while remaining robust to illumination changes and shadows. The work of [13] further advanced vehicle recognition by extracting edge points with a Canny operator and describing them with a modified SIFT descriptor, capturing both geometric and appearance information. Their probabilistic constellation model achieved up to 99% classification accuracy in distinguishing vehicle types, outperforming other shape-based methods. Focusing on license plate recognition, ref. [14] combined edge density enhancement, morphological processing, and color verification to improve detection under challenging lighting and motion conditions, achieving 97% accuracy on real-world datasets. Finally, ref. [15] explored template matching for vehicle and object tracking, demonstrating good performance (around 90% accuracy) in stable environments while highlighting the limitations of such methods under occlusion, rotation, and illumination variation.
Collectively, these studies demonstrate that integrating edge, shape, and appearance features enables effective detection and classification in traffic imagery. While template- and edge-based methods perform well in structured conditions, hybrid approaches that incorporate learning and adaptive models offer better generalization to real-world environments. These methods are summarized in Table 5.
5.3. Feature-Based Methods
Feature-based methods recognize or detect objects by extracting and analyzing distinctive local features from images rather than relying on global shape, intensity, or edges alone. These features are usually points, regions, or patches that are repeatable (detected reliably under different conditions) and discriminative (good at distinguishing one object from another). Common examples include:
Corner detectors (e.g., Harris, FAST);
Blob/region detectors (e.g., DoG, MSER (Maximally Stable Extremal Regions));
Local descriptors (e.g., SIFT, SURF, ORB) that encode information about texture, gradients, and local appearance.
The main strengths of feature-based methods are their invariance to scale and rotation and their robustness to illumination changes, which make them well suited to tasks such as matching, recognition, and tracking. They are widely used in object recognition, image stitching, 3D reconstruction, and visual SLAM. However, they have limitations:
They perform poorly on textureless objects (like vehicles with uniform surfaces).
They may be sensitive to noise or severe occlusion.
Extracting and matching features can be computationally expensive, though modern fast descriptors address this.
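The matching step is commonly made more robust with Lowe's ratio test, which keeps a match only if the nearest descriptor is clearly closer than the second-nearest one. The sketch below assumes float descriptors compared with Euclidean distance, as is typical for SIFT/SURF-style vectors (binary descriptors such as ORB are usually compared with Hamming distance), and uses toy two-dimensional vectors. The function name and the 0.75 ratio are hypothetical choices.

```python
# Illustrative sketch: nearest-neighbour descriptor matching with Lowe's ratio test.
# Toy 2D vectors stand in for real SIFT/SURF descriptors; name and ratio are hypothetical.
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def ratio_test_matches(query: List[Descriptor], train: List[Descriptor],
                       ratio: float = 0.75) -> List[Tuple[int, int]]:
    """Keep (query_idx, train_idx) pairs whose nearest neighbour is clearly closer than the second."""
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted((math.dist(q, t), ti) for ti, t in enumerate(train))
        if len(ranked) >= 2 and ranked[0][0] < ratio * ranked[1][0]:
            matches.append((qi, ranked[0][1]))
    return matches


query = [(0.0, 0.0), (5.0, 5.0)]
train = [(0.0, 0.1), (10.0, 10.0), (5.0, 5.2), (5.0, 4.8)]  # last two are ambiguous for query 1
print(ratio_test_matches(query, train))   # -> [(0, 0)]
```

The second query point is rejected because two candidates are almost equally close, which is exactly the repetitive-texture ambiguity that makes uniform vehicle surfaces hard for feature-based methods.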
Feature-based methods focus on extracting and tracking distinctive visual features to improve object detection and classification in complex traffic scenes. Such methods were used in a few studies; for example, ref. [16] developed a real-time vehicle tracking system that combined lane detection, shadow removal, and contour-based analysis, using corner-point sub-features and Kalman filtering to achieve higher tracking accuracy under occlusions and lighting variations. A study by [17] enhanced robustness in dynamic conditions by integrating invariant feature extraction with Discrete Dynamic Swarm Optimization (DDSO), enabling reliable tracking despite scale, rotation, or appearance changes and outperforming particle filter and mean-shift approaches. Later, ref. [18] proposed a LiDAR-based detection framework that employs adaptive clustering and geometric-intensity feature extraction to classify vehicles, pedestrians, and bicyclists in real time, achieving high accuracy and precision while effectively managing occlusions and environmental variability. A comparison of the methods is presented in Table 6.
5.4. Rule-Based Methods for Object Detection
Rule-based methods use predefined logical rules to interpret visual data and make decisions about detected objects or events. Instead of relying on statistical learning, they apply knowledge-driven reasoning, often using if–then production rules, to handle detection, tracking, and classification.
In traffic monitoring, rule-based approaches typically combine low-level image analysis (like detecting motion, edges, or headlights) with high-level reasoning that enforces consistency over time. For example, rules may correct false detections, maintain object identity through occlusions, or distinguish between real vehicles and noise.
Their strength lies in interpretability and robustness: they can explicitly encode domain knowledge (e.g., “headlights appear in pairs at night” or “objects must move along lanes”), making them effective in structured environments. However, they may struggle with highly dynamic or unstructured scenes where fixed rules cannot cover all variations.
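A single rule of the kind quoted above can be written as a small if-then check. The sketch assumes that bright blobs have already been extracted from a night-time frame and pairs them when they are roughly level, plausibly spaced, and similar in size. The `Blob` type, the function name, and every pixel threshold are hypothetical and would depend on camera geometry.

```python
# Illustrative sketch of one rule: "headlights appear in pairs at night".
# The Blob type, function name and all pixel thresholds are hypothetical.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Blob:
    x: float     # centroid column (px)
    y: float     # centroid row (px)
    area: float  # size (px)


def pair_headlights(blobs: List[Blob], max_dy: float = 5.0, min_dx: float = 20.0,
                    max_dx: float = 120.0, max_area_ratio: float = 2.0) -> List[Tuple[Blob, Blob]]:
    """IF two bright blobs are level, plausibly spaced and similar in size THEN they form one vehicle."""
    order = sorted(range(len(blobs)), key=lambda i: blobs[i].x)
    used, pairs = set(), []
    for pos, i in enumerate(order):
        if i in used:
            continue
        left = blobs[i]
        best: Optional[int] = None
        for j in order[pos + 1:]:
            right = blobs[j]
            dx = right.x - left.x
            similar = max(left.area, right.area) <= max_area_ratio * min(left.area, right.area)
            if (j not in used and abs(left.y - right.y) <= max_dy
                    and min_dx <= dx <= max_dx and similar):
                if best is None or dx < blobs[best].x - left.x:
                    best = j  # prefer the nearest valid partner
        if best is not None:
            used.update((i, best))
            pairs.append((left, blobs[best]))
    return pairs


blobs = [Blob(100, 200, 50), Blob(160, 202, 55),   # one car
         Blob(400, 50, 10),                        # isolated street light
         Blob(600, 300, 40), Blob(660, 298, 45)]   # another car
print(len(pair_headlights(blobs)))   # -> 2
```

Blobs that satisfy no rule, such as the street light here, are simply left unpaired. This interpretability comes at a price: every new situation the rules did not anticipate requires another hand-written condition.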
Rule-based approaches have proven highly effective in traffic and navigation systems by combining low-level perception with high-level reasoning to manage complex and dynamic environments. These methods were also used in a few of the reviewed studies. For example, ref. [19] developed a system that integrates image-based vehicle and headlight detection with a rule-based reasoning module, allowing the system to correct errors, handle occlusions, and maintain consistency in tracking. This approach achieved nearly 97% tracking accuracy during daytime and approximately 96% at night, demonstrating robust performance for vehicle counting under varying conditions. The study by [20] extended this concept with a two-layer architecture: low-level image processing for vehicle detection, using spatio-temporal segmentation during the day and headlight-pair analysis at night, combined with a high-level production-rule reasoning module. Their system reliably tracked vehicles around the clock with 94–96.9% accuracy, effectively managing shadows, segmentation errors, and stopped vehicles. The work of [21] applied rule-based logic to traffic signal control through SPPORT, which prioritizes queues, prevents spillback, and grants conditional transit priority. Simulations showed that SPPORT reduced delays by up to 49% relative to fixed-time signals and improved transit efficiency, highlighting the effectiveness of rules in traffic management. Finally, ref. [22] demonstrated the versatility of rule-based strategies in maritime navigation by integrating COLREGs (the International Regulations for Preventing Collisions at Sea) into the R-RA* path planning algorithm for unmanned surface vehicles. This approach dynamically incorporates navigational constraints for overtaking, head-on, and crossing encounters while enabling real-time trajectory repair, producing safe, compliant, and efficient paths significantly faster than the standard A* algorithm. Collectively, these studies illustrate that rule-based frameworks enhance system reliability, accuracy, and operational efficiency by encoding expert knowledge and enforcing logical constraints across traffic and navigation applications. A comparison of the studies is presented in Table 7.
5.5. Traditional Machine Learning
Traditional machine learning approaches to object detection in traffic scenes rely on carefully designed features and classical classifiers rather than end-to-end deep learning models. These methods typically begin by extracting visual or geometric features from images or video frames that are expected to capture the relevant characteristics of objects such as vehicles, pedestrians, or cyclists. Common feature types include edge and corner detectors, gradient-based descriptors like SIFT or HOG, color histograms, and texture features. Once extracted, these features are fed into supervised machine learning algorithms, such as support vector machines (SVMs), decision trees, k-nearest neighbors (k-NN), or ensemble methods like AdaBoost (Adaptive Boosting), which are trained to differentiate between objects of interest and background elements.
Detection in traffic scenarios often involves scanning the entire image using a sliding window at multiple scales, computing features for each window, and classifying it to determine the presence or absence of an object. Additional pre-processing steps, such as background subtraction, motion detection, or optical flow analysis, are frequently incorporated to reduce false positives and focus the detection on regions likely to contain moving objects. Temporal information from video sequences may also be exploited, allowing detections in one frame to inform predictions in subsequent frames, improving robustness under occlusion or partial visibility.
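The multi-scale sliding-window search can be sketched as follows, assuming a fixed base window, a list of scales, and a stride of half the window size. The scoring function is a hypothetical stand-in for a trained classifier, such as the decision value of a linear SVM on HOG features. All names and values are illustrative.

```python
# Illustrative sketch of multi-scale sliding-window detection. Window size,
# scales, stride and the stand-in scoring function are hypothetical.
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def sliding_windows(img_w: int, img_h: int, base: Tuple[int, int],
                    scales: List[float], stride_frac: float = 0.5) -> Iterator[Box]:
    """Yield windows of size base * scale, stepping by stride_frac of the window size."""
    for s in scales:
        w, h = round(base[0] * s), round(base[1] * s)
        if w > img_w or h > img_h:
            continue  # window larger than the image at this scale
        step_x, step_y = max(1, int(w * stride_frac)), max(1, int(h * stride_frac))
        for y in range(0, img_h - h + 1, step_y):
            for x in range(0, img_w - w + 1, step_x):
                yield (x, y, w, h)


def detect(img_w: int, img_h: int, base: Tuple[int, int], scales: List[float],
           score: Callable[[Box], float], threshold: float) -> List[Tuple[Box, float]]:
    """Keep every window whose classifier score exceeds the threshold."""
    hits = []
    for box in sliding_windows(img_w, img_h, base, scales):
        value = score(box)
        if value > threshold:
            hits.append((box, value))
    return hits


def toy_score(box: Box) -> float:
    # Hypothetical classifier: highest for windows centred on (64, 32).
    x, y, w, h = box
    return 1.0 - (abs(x + w / 2 - 64) + abs(y + h / 2 - 32)) / 64


print(len(list(sliding_windows(128, 64, (64, 32), [1.0, 2.0]))))   # -> 10
print(detect(128, 64, (64, 32), [1.0, 2.0], toy_score, threshold=0.9))
# -> [((32, 16, 64, 32), 1.0), ((0, 0, 128, 64), 1.0)]
```

Both the 64 × 32 window and the 128 × 64 window centred on the same location fire, so such pipelines still need a duplicate-removal step such as non-maximum suppression. The number of windows also grows quickly with finer strides and more scales, which explains much of the computational cost of this approach.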
Despite their interpretability and relatively low computational cost compared to modern deep learning methods, traditional machine learning techniques face significant challenges in real-world traffic scenes. Variations in illumination, weather conditions, shadows, object scale, and pose, as well as cluttered environments with overlapping or occluded vehicles, often degrade performance. Low-resolution imagery from distant traffic cameras or noisy sensor data can further reduce detection accuracy. Nonetheless, these methods laid the foundational principles for automated traffic object detection and remain relevant in contexts where labeled data is limited, computational resources are constrained, or lightweight, explainable solutions are preferred. Researchers often combine traditional feature-based methods with heuristic rules or probabilistic models to enhance detection stability and reduce errors, demonstrating that careful design of features and classifiers can still yield effective results for traffic monitoring and analysis.
Traditional machine learning approaches for traffic object detection have relied heavily on handcrafted features and classical classifiers to achieve reliable recognition and tracking performance. A study by [23] proposed a hybrid framework combining Binarized Normed Gradients (BING) for rapid object localization with PCANet and SVM classification for road marking recognition. Their system achieved a classification accuracy of over 96.8% across nine road marking classes, demonstrating strong robustness and generalization across diverse environments. The work of [24] conducted a comparative analysis between conventional HOG-SVM models and modern deep learning detectors such as SSD and YOLO, revealing that while the traditional method achieved limited performance (21.6% mAP at 2 fps), the deep models significantly outperformed it in both accuracy (up to 82% mAP) and processing speed (up to 95 fps). To enhance detection reliability, ref. [25] introduced an improved HOG-SVM-based model that integrates region proposal mechanisms inspired by human visual attention, achieving a detection rate of 0.91 and an overall recognition accuracy of 94.1%. More recently, ref. [26] developed a robust SVM-based framework that combines background subtraction, contour-based segmentation, and motion descriptors to maintain detection stability under varying illumination, weather, and motion conditions. Overall, these studies illustrate the progressive refinement of traditional machine learning techniques, moving from feature-engineered pipelines toward hybrid and adaptive systems that aim to narrow the performance gap with deep learning approaches. The studies are compared in Table 8.
5.6. Deep Learning with Object Detection
Deep learning has transformed object detection in traffic by replacing handcrafted features and traditional classifiers with end-to-end trainable models that directly learn to extract and represent meaningful patterns from raw data. Convolutional Neural Networks (CNNs), in particular, have become the backbone of modern traffic object detection, enabling systems to automatically identify vehicles, pedestrians, cyclists, and traffic signs with high accuracy. Unlike traditional machine learning methods that rely on fixed descriptors such as edges or gradients, deep learning models learn hierarchical feature representations, ranging from low-level textures to high-level semantic structures, which makes them far more robust to variations in scale, illumination, occlusion, and complex backgrounds common in traffic environments.
Object detection frameworks in traffic applications generally fall into two categories: two-stage detectors and one-stage detectors. Two-stage methods, such as Faster R-CNN, first generate region proposals and then classify and refine bounding boxes for objects. They tend to achieve higher accuracy and are well suited for tasks like traffic surveillance where precision is critical, for example, in detecting violations or monitoring pedestrian crossings. One-stage detectors, such as YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector), eliminate the proposal step and directly predict object classes and bounding boxes in a single pass, making them much faster and suitable for real-time applications such as autonomous driving or smart intersections. More recent advances, like YOLOv8, EfficientDet, and transformer-based approaches (e.g., DETR), further improve the balance between accuracy and speed, allowing scalable deployment in both cloud-based traffic monitoring systems and embedded platforms in vehicles.
Deep learning methods also integrate temporal information in video-based traffic analysis. Recurrent Neural Networks (RNNs), LSTMs, and more recently 3D CNNs and transformers capture motion cues across frames, improving detection stability for moving vehicles and pedestrians. Some approaches incorporate multimodal data from LiDAR, radar, or thermal cameras, fusing visual and sensor inputs to improve robustness under challenging conditions such as nighttime, fog, or heavy traffic congestion. Additionally, post-processing techniques like non-maximum suppression (NMS) are used to handle overlapping detections, ensuring that each object is localized precisely without redundancy.
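Greedy NMS, as commonly implemented, keeps the highest-scoring box and discards every remaining box whose intersection over union (IoU) with it exceeds a threshold, then repeats. The sketch below is single-class, uses a hypothetical 0.5 threshold, and is illustrative only. Frameworks provide optimized versions (for example, `torchvision.ops.nms`).

```python
# Illustrative sketch of greedy, single-class non-maximum suppression (NMS).
# Function names and the 0.5 IoU threshold are hypothetical choices.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while remaining:
        best = remaining.pop(0)
        keep.append(best)
        # Drop every lower-scored box that overlaps the kept one too much.
        remaining = [i for i in remaining if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]  # two duplicates of one car, one other car
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))   # -> [0, 2]
```

The threshold is a trade-off in dense traffic: set too low, it merges genuinely adjacent vehicles; set too high, it leaves duplicate detections.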
One of the main advantages of deep learning-based object detection in traffic is its scalability. With large annotated datasets such as KITTI, Cityscapes, BDD100K, or Mapillary Vistas, models can be trained to recognize diverse traffic participants under a wide range of conditions. Transfer learning further allows pre-trained models to be fine-tuned for specific urban environments with relatively small amounts of local data, reducing development costs and improving adaptability. While deep learning models demand significant computational resources for training, advances in GPUs (Graphics Processing Units), TPUs (Tensor Processing Units), and lightweight architectures have made real-time inference feasible on edge devices and embedded systems.
Overall, deep learning-based object detection has become the dominant approach for traffic monitoring, autonomous driving, and intelligent transportation systems. By learning robust, high-level representations of complex traffic scenes, these methods achieve superior accuracy and generalization compared to traditional approaches, making them essential for reliable and safe decision-making in real-world traffic applications.
Deep learning-based object detection methods have substantially advanced traffic monitoring by improving accuracy, adaptability, and real-time performance. The study by [27] demonstrated the feasibility of fine-tuning SSD for ADAS (Advanced Driver-Assistance Systems) by augmenting pedestrian and cyclist data, achieving roughly 10% higher precision and real-time inference at about 29.4 FPS. Building on the YOLO family, ref. [28] extended YOLOv3 with additional prediction layers and spatial pyramid pooling (SPP), improving robustness to scale variation and occlusion and yielding 85.29% mAP on the UA-DETRAC benchmark. A work by [29] proposed a low-cost, camera-based congestion estimator using YOLOv3, reporting about 86% precision and 87% recall without per-camera retraining. The study by [30] implemented a YOLO/OpenCV real-time traffic management prototype on embedded hardware that adapts signal timing to live vehicle counts, demonstrating effective congestion relief compared with fixed-timing methods. A study by [31] introduced a projective consistency loss and integrated Global Context and ASPP (Atrous Spatial Pyramid Pooling) modules into YOLOv3 to jointly predict object classes and distances, achieving 89.28% mAP and accurate 2.5D localization on KITTI. More recent systems have moved toward integrated traffic control: [32] combined YOLOv7, R-CNN, and CNN classifiers in an automated traffic management system that surpassed 90% detection accuracy under diverse conditions, and [33] developed a self-adaptive control framework using YOLOv3 and CNNs to detect vehicles and violations, achieving 88.43% detection accuracy and 90.45% wrong-way detection accuracy. Collectively, these works illustrate a trajectory from single-model detection improvements toward multi-functional, real-time frameworks that couple robust perception with traffic control and scene understanding. A comparison of the studies is presented in Table 9.
5.7. Advanced Architectures and Trends
In recent years, artificial intelligence has significantly transformed traffic object detection, introducing architectures that emphasize adaptability, efficiency, and contextual understanding. Whereas traditional rule-based systems relied on fixed heuristics and handcrafted features, contemporary AI-based methods focus on autonomous learning, probabilistic modeling, and intelligent feature fusion to improve robustness in dynamic road environments.
One important trend is the integration of multi-sensor fusion frameworks, where data from cameras, LiDAR, radar, and other sensors are combined to create a richer and more reliable environmental perception. These architectures employ Bayesian inference, fuzzy logic, and Markov-based decision models to reason about uncertain or incomplete information, for example, inferring the presence of a vehicle in poor visibility conditions or estimating motion patterns from partially observed trajectories.
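As a minimal illustration of the Bayesian reasoning mentioned above, the sketch below fuses per-sensor evidence into a posterior probability that a vehicle is present. It assumes that the sensors are conditionally independent given the true state (a naive-Bayes simplification) and that each sensor output has already been converted into a likelihood ratio; the function name, sensor names, and numbers are hypothetical.

```python
import math

def fuse_presence(prior, likelihood_ratios):
    """Posterior probability that a vehicle is present.

    prior: P(vehicle) before any sensor evidence (0 < prior < 1).
    likelihood_ratios: sensor name -> P(obs | vehicle) / P(obs | no vehicle).
    Assumes the sensors are conditionally independent given the true state.
    """
    # In log-odds form, independent pieces of evidence simply add up.
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios.values():
        log_odds += math.log(ratio)
    # Convert log-odds back to a probability (logistic function).
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical evidence in fog: the camera is barely informative,
# while radar and LiDAR still support the presence of a vehicle.
evidence = {"camera": 1.2, "radar": 6.0, "lidar": 3.0}
p = fuse_presence(prior=0.1, likelihood_ratios=evidence)
print(f"P(vehicle present) = {p:.3f}")  # prints P(vehicle present) = 0.706
```

Working in log-odds avoids multiplying many small probabilities and makes it easy to add or drop a sensor. The independence assumption is the main simplification: in fog, camera and LiDAR errors may well be correlated, which a fuller model would have to represent.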
Another emerging direction involves context-aware and behavior-driven detection systems. These systems not only identify objects such as vehicles, pedestrians, and road signs but also interpret their interactions and probable intentions. By integrating spatial and temporal reasoning, such systems enable predictive capabilities, allowing for early detection of potential traffic risks or congestion patterns.
There is also a growing focus on lightweight and computationally efficient architectures that can operate on embedded devices, mobile platforms, or roadside units. These systems are designed to balance accuracy with low latency, making them suitable for real-time applications in intelligent transportation infrastructure. Optimization techniques, such as feature selection, dimensionality reduction, and energy-efficient computation, are becoming essential to deploy AI-based detectors in large-scale urban networks.
Another notable trend is the rise of hybrid and adaptive AI models, which combine different learning paradigms, for instance, merging symbolic reasoning with statistical inference or coupling supervised learning with reinforcement learning. Such models are capable of self-adaptation: the system can adjust detection thresholds, retrain classifiers, or recalibrate sensors based on environmental feedback. This adaptability is crucial for maintaining consistent detection performance under changing conditions such as weather variations, lighting differences, or evolving traffic behaviors.
Finally, the incorporation of ethical and safety considerations into AI-driven architectures has become a key research focus. Ensuring transparency, explainability, and reliability in detection decisions is particularly vital in traffic systems that interact closely with human drivers and pedestrians. Future trends point toward autonomous yet accountable AI systems capable of not only detecting and classifying objects but also explaining their reasoning processes in a manner that supports verification and public trust.
Recent advancements in object detection for intelligent transportation systems have focused on integrating deep learning models with distributed and adaptive computing architectures to enhance real-time performance, scalability, and efficiency. The work of [34] employed local feature extraction techniques such as SIFT, SURF, and ORB combined with an MF ArtMap classifier and a genetic algorithm, achieving up to 83% accuracy in vehicle classification within five seconds per image, demonstrating feasibility for practical traffic monitoring. In the same year, ref. [35] proposed a cloud–edge collaborative framework using BING-based region-of-interest extraction and Faster R-CNN, improving data transmission efficiency by 60% with minimal accuracy loss through adaptive model updating. The study by [36] advanced these methods by introducing a hybrid architecture integrating optimization algorithms with adaptive feature extraction to enhance recognition accuracy and computational efficiency, achieving robust results under complex traffic conditions. More recently, ref. [37] developed a distributed edge computing framework with lightweight detection algorithms and intelligent workload balancing, significantly reducing latency and energy consumption while maintaining high accuracy and scalability across multiple edge nodes. Collectively, these studies highlight the transition toward adaptive, distributed, and optimization-driven architectures capable of supporting real-time, large-scale intelligent transportation applications. A summary of the studies is presented in Table 10.
5.8. Impact of Adverse Weather and Environmental Conditions
Adverse weather conditions and complex environmental factors represent a persistent challenge for both traffic object detection and higher-level traffic analysis. Conditions such as rain, fog, snow, low illumination, glare, and nighttime operation significantly degrade image quality, reduce contrast, and introduce noise, which negatively affects detection accuracy and system reliability.
Classical computer vision methods are particularly sensitive to illumination changes and weather-induced artifacts, as their performance often relies on stable background modeling, edge clarity, or handcrafted features. Although some approaches incorporate adaptive thresholds, shadow removal, or texture-based cues, their robustness remains limited under severe environmental variability.
Deep learning-based object detection methods demonstrate improved resilience due to their ability to learn complex feature representations. However, their performance is still highly dependent on the diversity and representativeness of training data. Models trained predominantly on clear-weather datasets often exhibit reduced generalization when deployed under adverse conditions. To mitigate this issue, recent studies explore data augmentation strategies, domain adaptation, and multimodal sensor fusion, combining RGB (Red, Green, and Blue) cameras with LiDAR, radar, or thermal imaging.
Despite these advances, many studies continue to evaluate detection performance under ideal or near-ideal conditions, while systematic benchmarking under adverse weather remains limited. This gap highlights the need for standardized datasets, evaluation protocols, and adaptive detection frameworks capable of maintaining reliable performance in real-world traffic environments characterized by significant environmental uncertainty.
5.9. Strengths, Limitations, and Trade-Offs of Traffic Object Detection Approaches
Traffic object detection methods exhibit distinct strengths and limitations depending on their underlying paradigms, data requirements, and deployment contexts. Classical computer vision approaches offer high interpretability, low computational complexity, and deterministic behavior, which makes them attractive for real-time applications and resource-constrained environments. However, their reliance on handcrafted features and background modeling renders them highly sensitive to illumination changes, weather variability, and scene dynamics, limiting their robustness in complex urban traffic scenarios.
Traditional machine learning methods improve upon classical approaches by incorporating statistical learning and feature-based classification. These methods demonstrate better generalization across moderate environmental variations and enable principled optimization of decision boundaries. Nevertheless, their performance remains strongly dependent on feature engineering quality and training data representativeness. As traffic environments become increasingly heterogeneous, handcrafted feature sets struggle to capture complex interactions among traffic participants.
Deep learning-based object detection methods currently achieve state-of-the-art performance in terms of detection accuracy, multi-class recognition, and adaptability. Convolutional neural networks, one-stage and two-stage detectors, and transformer-based architectures effectively learn hierarchical feature representations directly from data, reducing the need for manual feature design. Despite these advantages, deep learning models introduce significant trade-offs. They require large-scale annotated datasets, substantial computational resources, and careful hyperparameter tuning to achieve stable and reproducible results. Furthermore, their black-box nature limits interpretability, which is a critical concern in safety-critical traffic monitoring applications.
A fundamental trade-off emerges between accuracy and efficiency. High-accuracy models often rely on deep architectures with high inference latency, whereas real-time systems prioritize lightweight models at the expense of detection precision. Additionally, robustness to adverse weather and occlusions remains inconsistent across architectures, as many models are trained and evaluated under idealized conditions. Consequently, no single detection paradigm offers a universally optimal solution, and practical deployments increasingly favor hybrid systems that balance accuracy, efficiency, and robustness based on operational requirements.
6. Traffic Incident Detection
Following the discussion of object-level perception techniques, this section reviews traffic incident detection methods that operate at a higher semantic level and aim to identify abnormal or non-recurrent traffic events. Traffic incident detection is a fundamental part of intelligent transport systems (ITS), traffic monitoring mechanisms, and comprehensive road safety management systems. The primary goal of these methodologies is to identify non-standard events in traffic, such as traffic accidents, traffic jams, sudden slowdowns, or other abnormalities in traffic flow behavior. The ability to detect these events in a timely and accurate manner enables prompt response by control systems, effective resource allocation, traffic flow optimization, and minimization of the risk of secondary incidents.
In terms of methodological approach, incident detection systems can be divided into two basic categories. Classic methods are based on explicitly defined rules, thresholds, and statistical models, and are characterized by predictable computational complexity, simple implementation, and the ability to provide results in near real time. They enable reliable detection in predictable traffic scenarios and are suitable for situations with relatively stable traffic flows. On the other hand, methods using artificial intelligence apply machine learning, deep learning, clustering, and graph modeling algorithms that enable adaptive processing of historical and real-time data. These approaches can capture complex and nonlinear patterns in traffic, predict the likelihood of incidents, and optimize traffic management even in dynamically changing environments.
The integration of both approaches—classical analytical data processing and adaptive AI methods—enables the creation of robust systems that combine fast, predictable detection with predictive and autonomous capabilities, thereby increasing the overall efficiency and safety of traffic flow management.
Traffic incident detection builds upon the outputs of object detection systems. By identifying vehicles, pedestrians, and other relevant objects in real time, incident detection modules can infer abnormal or hazardous situations, such as accidents, congestion, or unsafe maneuvers. This hierarchical relationship underscores the importance of accurate and robust object detection as a prerequisite for reliable incident analysis.
Classic detection methods can be divided as follows:
Threshold-based detection;
Rule-based methods;
Image processing techniques;
Statistical modeling with Kalman filters;
Statistical modeling with hidden Markov models;
Frequency analysis;
Dynamic time warping (DTW);
Histogram-based methods.
On the other hand, detection methods based on artificial intelligence can be divided as follows:
Computer vision with deep learning;
Anomaly detection;
Predictive modeling;
Traffic flow optimization;
Natural language processing;
Clustering;
Graph-based models.
All reviewed categories for this section are presented in Figure 4.
6.1. Threshold-Based Detection
Threshold-based detection is one of the simplest and most commonly used methods in traffic monitoring systems. Its principle is based on comparing current values of traffic parameters, such as speed, density, or traffic flow, with predefined thresholds. If a measured value crosses its limit (for example, speed falls below or density rises above the corresponding threshold), the system interprets this as a potential incident or abnormality in traffic behavior.
Threshold values can be set statically—based on historical data—or dynamically, where thresholds are automatically adjusted to current conditions such as time of day or weather. This approach is computationally efficient, easy to implement, and allows for rapid real-time response, such as activating warning systems or adjusting traffic signs.
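A minimal sketch of both variants, assuming speed is the monitored parameter and that historical samples are grouped by hour of day. The static rule uses one fixed limit; the dynamic rule flags a measurement that falls more than k standard deviations below the historical mean for that hour. The function names, the 40 km/h limit, k = 3, and the sample history are illustrative assumptions, not values taken from the reviewed studies.

```python
import statistics

def static_alarm(speed_kmh, limit_kmh=40.0):
    """Static rule: flag any speed below one fixed limit."""
    return speed_kmh < limit_kmh

def build_profile(history):
    """history: hour of day -> list of past speed samples (km/h).
    Returns hour -> (mean, sample standard deviation)."""
    return {hour: (statistics.mean(s), statistics.stdev(s))
            for hour, s in history.items()}

def dynamic_alarm(speed_kmh, hour, profile, k=3.0):
    """Dynamic rule: flag a speed more than k std devs below the hourly mean."""
    mean, std = profile[hour]
    return speed_kmh < mean - k * std

# Hypothetical history: free flow at night, slower traffic at the 8:00 peak.
history = {
    3: [98, 102, 100, 99, 101],
    8: [45, 50, 48, 52, 47],
}
profile = build_profile(history)

print(static_alarm(35))               # True: below the fixed 40 km/h limit
print(dynamic_alarm(35, 8, profile))  # True: well below the rush-hour norm
print(dynamic_alarm(46, 8, profile))  # False: normal for rush hour
print(static_alarm(60))               # False: the fixed limit misses it
print(dynamic_alarm(60, 3, profile))  # True: far too slow for 3:00
```

The last two lines show the practical difference: 60 km/h passes a fixed limit, yet it is clearly abnormal for an empty night-time road, which only the time-dependent threshold captures.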
The disadvantage of this method is its limited flexibility and sensitivity to changing traffic conditions. Incorrectly set thresholds can lead to frequent false alarms or, conversely, to actual incidents being overlooked. Therefore, newer approaches use optimized threshold techniques, such as methods based on entropy or bimodal histograms, which allow for automatic and more accurate determination of threshold values. In practice, threshold methods are often combined with more advanced artificial intelligence models or algorithms, which increases the accuracy and reliability of detection in real traffic conditions.
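The bimodal-histogram idea can be sketched with Otsu's method, one well-known automatic thresholding technique, chosen here purely as an illustration. It assumes the measurements split into two populations (for example, congested and free-flow speeds) and picks the histogram split that maximizes the between-class variance. The bin count and the sample speeds are hypothetical.

```python
def otsu_threshold(values, bins=20):
    """Threshold that maximizes between-class variance (Otsu's method).

    values: 1-D measurements, e.g., speeds in km/h.
    Returns the upper edge of the histogram bin where the best split falls.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # no spread, nothing to split
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]

    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))
    w0, sum0 = 0, 0.0          # size and weighted sum of the lower class
    best_score, best_t = -1.0, lo
    for i in range(bins - 1):  # candidate split after bin i
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        # Between-class variance up to the constant factor 1 / total**2.
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_score, best_t = score, lo + (i + 1) * width
    return best_t

# Hypothetical mix of congested (15-25 km/h) and free-flow (85-100 km/h) speeds.
speeds = [15, 18, 20, 22, 25, 85, 90, 92, 95, 100]
threshold = otsu_threshold(speeds)
print(threshold)  # prints 27.75
```

Any split in the empty gap between the two clusters scores equally here, so the method returns the first one it finds; with real, noisier data the histogram valley is rarely that clean.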
Research on threshold-based traffic incident detection has evolved significantly over the years. In the work by [38], the authors combined improved nonparametric regression (INPR) with a standard normal deviate (SND) threshold to forecast traffic flow and detect incidents, aiming to provide a simple, reliable, and easily implementable algorithm; simulations showed low forecast errors and effective incident detection, with detection time depending on the chosen SND threshold. The study by [39] developed a system in which deviations of speed, density, and flow from predefined thresholds trigger incident alerts; evaluated on both simulated and real traffic data, it achieved reliable real-time detection suitable for practical deployment in traffic control systems. The work by [40] integrated floating car data (FCD) with urban traffic signal cycles to detect congestion at intersections by comparing average speed and travel time against predefined thresholds across multiple detection cycles; applied to real data from Ningbo, the method proved fast and practical, with high detection accuracy, effective congestion alarms, and reliable management of congestion dissipation. Finally, ref. [41] compared five theoretical methods for determining threshold values in order to automate and objectify traffic-risk classification, finding that the minimum cross-entropy method achieved the highest predictive accuracy while reducing subjectivity in threshold selection. The studies are compared in Table 11.
6.2. Rule-Based Methods for Incident Detection
Rule-based systems represent a traditional approach to traffic incident detection, using logically defined rules and inference mechanisms to analyze traffic data. Their operation is based on a knowledge base in which “if-then” relationships are explicitly defined. These rules make it possible to evaluate the current traffic situation and identify deviations from normal behavior, such as speed reductions, traffic jams, or traffic accidents.
Rule-based systems can include multiple levels of logic, from simple control conditions to complex multi-stage decision-making processes. Modern implementations often use multi-agent systems (MAS), in which individual agents independently process data from sensors and coordinate their decisions according to defined rules. An inference engine (such as Drools) ensures that rules are applied in real time, enabling immediate response to incidents or changes in the traffic environment.
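The if-then mechanism can be illustrated with a tiny rule evaluator in plain Python. This is a minimal single-pass sketch, not a substitute for an inference engine such as Drools, which additionally maintains a working memory and chains rules. Each rule pairs a condition on the current detector readings with a conclusion; the rule names, variables, and numeric limits are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Reading = Dict[str, float]

@dataclass
class Rule:
    name: str
    condition: Callable[[Reading], bool]
    conclusion: str

# Hypothetical knowledge base; the limits are illustrative only.
RULES: List[Rule] = [
    Rule("stopped_traffic",
         lambda r: r["speed_kmh"] < 10 and r["occupancy"] > 0.6,
         "possible incident: stopped traffic"),
    Rule("sudden_slowdown",
         lambda r: r["speed_kmh"] < 0.5 * r["upstream_speed_kmh"],
         "warning: sudden slowdown versus upstream"),
    Rule("low_visibility",
         lambda r: r["visibility_m"] < 100,
         "notify: reduced visibility"),
]

def evaluate(reading: Reading, rules: List[Rule] = RULES) -> List[str]:
    """Single pass: return the conclusion of every rule whose condition holds."""
    return [rule.conclusion for rule in rules if rule.condition(reading)]

reading = {"speed_kmh": 8, "occupancy": 0.7,
           "upstream_speed_kmh": 90, "visibility_m": 400}
for alert in evaluate(reading):
    print(alert)
```

Keeping the rules as data, separate from the evaluation loop, reflects the modularity discussed below: a rule can be added or adjusted without touching the rest of the system.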
The main advantages of rule-based systems are their transparency, modularity, and easy interpretation of results. They allow for rapid adaptation by modifying rules without the need to change the basic architecture of the system. However, their effectiveness depends heavily on the quality and completeness of the knowledge base: improperly defined or incomplete rules can lead to false detections or delayed responses.
In current practice, these systems are often combined with advanced analytical techniques or artificial intelligence methods. Hybrid approaches allow the rule-based system to provide an interpretable logical layer, while machine learning algorithms increase the accuracy and adaptability of the system when processing large and dynamic traffic data.
Early work by [42] developed a rule-based multi-agent system for traffic management using the JADE platform and the Drools inference engine, employing logical, detection, and notification rules to detect incidents efficiently and send prioritized alerts, particularly under adverse weather conditions. Building on rule-based approaches, ref. [43] proposed a predictive model for real-time accident risk estimation using loop detector data, demonstrating high accuracy and performance comparable to or exceeding that of logistic regression and decision tree methods. More recently, ref. [44] leveraged LiDAR-extracted trajectories and the Speed-Distance Profile (SDP) to identify vehicle–pedestrian near-crash events, detecting all recorded events and accurately estimating vehicle and pedestrian speeds. A comparison of the studies is presented in Table 12.
6.3. Image Processing Techniques
Image processing techniques represent an important group of methods used to detect traffic incidents, working with visual data obtained from traffic cameras, sensors, or drones. Their goal is to automatically analyze images and videos to identify objects, track their movement, and recognize patterns of behavior that may signal an incident, traffic congestion, or other anomaly.
These methods are based on computer vision and a set of algorithms that enable image data to be processed in several steps, from image pre-processing (e.g., noise removal, contrast normalization) through segmentation and feature extraction to motion analysis. The most commonly used techniques include motion detection, optical flow, object tracking, and scene change detection. These approaches make it possible to distinguish between static and dynamic objects, monitor their trajectories, and identify events such as collisions, stopped vehicles, or traffic jams.
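The motion-detection step can be sketched with a running-average background model on grayscale frames. To stay self-contained, the sketch works on small nested lists in plain Python; real systems would use an image-processing library. Pixels that differ from the background by more than a threshold are marked as foreground. The learning rate alpha, the threshold, and the toy frames are assumed values.

```python
def update_background(bg, frame, alpha=0.05):
    """Exponential running average: bg <- (1 - alpha) * bg + alpha * frame."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(bg, frame)]

def foreground_mask(bg, frame, threshold=30):
    """True where a pixel differs from the background by more than threshold."""
    return [[abs(f - b) > threshold for b, f in zip(brow, frow)]
            for brow, frow in zip(bg, frame)]

# Hypothetical 4x6 grayscale frames: empty road (intensity 100),
# then a dark vehicle (intensity 20) covering rows 1-2, columns 2-3.
empty = [[100] * 6 for _ in range(4)]
with_car = [row[:] for row in empty]
for r in (1, 2):
    for c in (2, 3):
        with_car[r][c] = 20

bg = [[float(v) for v in row] for row in empty]
mask = foreground_mask(bg, with_car)
moving_pixels = sum(cell for row in mask for cell in row)
print(moving_pixels)  # prints 4

# The background slowly adapts toward the new frame.
bg = update_background(bg, with_car)
```

Because the running average gradually absorbs anything that stays still, a vehicle that stops will eventually fade into the background; one way this is sometimes handled in stopped-vehicle detection is to compare backgrounds that adapt at different rates.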
A significant advantage of image processing is its ability to provide a high level of detail and comprehensive information about the traffic situation without the need for a dense network of physical sensors. In addition, it allows simultaneous monitoring of multiple lanes or intersections and is relatively flexible to changes in traffic infrastructure. On the other hand, the accuracy of these methods is significantly affected by the quality of the input image, lighting conditions, environmental conditions, and the computational complexity of the algorithms.
In recent years, traditional image processing techniques have been gradually combined with deep learning methods (e.g., convolutional neural networks, CNNs), which significantly improve the ability of systems to automatically recognize and classify objects in images. The combination of classic image processing algorithms with intelligent models provides more robust solutions capable of more reliable detection of traffic incidents, even in complex and dynamic environments.
Research on image-based traffic incident detection has evolved substantially in recent decades. One of the earliest studies [45] reviewed traditional threshold- and prediction-based incident detection methods using loop detectors and proposed using image processing to detect incidents through visual traffic features such as stopped vehicles, speed changes, and spatial flow patterns, highlighting the potential of image-based detection if challenges in background modeling and threshold selection are addressed, although no experimental data were provided. Another study [46] evaluated two real-world traffic systems and implemented image-processing modules using edge detection, intensity thresholds, and region segmentation to reduce false alarms caused by environmental factors like shadows, snow, rain, and glare; their results showed that shadow detection reduced false alarms by 96% and the other modules substantially improved system reliability with minimal computational cost. Another study [47] developed a multi-stage vehicle detection, tracking, and classification system combining background subtraction, image differencing, morphological operations, and edge detection operators (Sobel, Prewitt, and Canny), achieving the highest detection accuracy with the Canny operator, which effectively minimized false detections. The work of [48] applied edge detection techniques on key frames of traffic videos and used a Naive Bayes classifier with preprocessing steps including normalization, discretization, and fuzzification, achieving classification accuracies between 79% and 89%, with the best results obtained from operators providing more stable edge extraction. The studies are compared in Table 13.
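Of the edge operators named in [47], Sobel is the easiest to write out. The sketch below slides the two 3x3 Sobel kernels over a grayscale image (nested lists, border pixels left at zero) and returns the gradient magnitude. Prewitt differs only in using weight 1 instead of 2 in the middle row or column of the kernels, while Canny adds Gaussian smoothing, non-maximum suppression, and hysteresis thresholding on top of such gradients. The toy image is hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(img):
    """Gradient magnitude of a grayscale image given as a list of rows.
    Border pixels are left at 0 for simplicity."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            out[y][x] = math.hypot(gx, gy)
    return out

# Hypothetical image: dark road on the left, bright vehicle on the right.
img = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = sobel_magnitude(img)
print([round(v) for v in edges[2]])  # prints [0, 0, 1020, 1020, 0, 0]
```

The response is concentrated on the two columns that straddle the intensity step, which is the property the cited systems exploit to outline vehicles.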
6.4. Statistical Modeling—Kalman Filters
Kalman filters are among the classic methods of statistical modeling used for processing and predicting traffic data, especially in tasks such as tracking vehicle movements and estimating traffic flow. Their basic principle is the recursive estimation of the actual state of the system based on sequentially arriving measurements, which may be subject to random noise or inaccuracies. In the context of traffic applications, the Kalman filter allows for the dynamic estimation of parameters such as average speed, density, or vehicle flow, thereby detecting deviations from normal traffic behavior.
Mathematically, it is a linear estimator that minimizes the mean squared error between the estimated and the actual state of the system. In practice, it is used to predict traffic trends, track vehicle trajectories, or smooth measured data from sensors such as induction loops, radar devices, or GPS units. The advantage of Kalman filters is their ability to work effectively in environments with incomplete or inaccurate data and to provide stable estimates even in the presence of random noise.
Kalman filters have proven particularly useful in transportation systems for real-time incident detection, where changes in estimated parameters (such as sudden traffic slowdowns or speed reductions) can signal an accident or road obstruction. In addition, they are used in predictive traffic management models, where they are used for short-term traffic situation forecasts and optimization of control interventions.
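A scalar Kalman filter makes the recursive predict/update cycle concrete. The sketch assumes a random-walk model for mean link speed (x_k = x_{k-1} + w_k with variance q), noisy detector measurements (z_k = x_k + v_k with variance r), and flags a possible incident when the normalized innovation, i.e., the measurement minus the prediction divided by its predicted standard deviation, exceeds a gate. The values of q, r, the initial variance, the gate of 3, and the speed series are illustrative assumptions rather than calibrated parameters.

```python
import math

def kalman_speed_monitor(measurements, x0, p0=25.0, q=1.0, r=16.0, gate=3.0):
    """Scalar Kalman filter on a random-walk speed model.

    Returns a list of (estimate, alarm) pairs; alarm is True when the
    normalized innovation exceeds `gate` standard deviations.
    """
    x, p = x0, p0
    results = []
    for z in measurements:
        # Predict: a random walk keeps the mean; uncertainty grows by q.
        p = p + q
        # Innovation (measurement residual) and its variance.
        y = z - x
        s = p + r
        alarm = abs(y) / math.sqrt(s) > gate
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / s
        x = x + k * y
        p = (1 - k) * p
        results.append((x, alarm))
    return results

# Hypothetical 1-minute mean speeds (km/h): steady flow, then an abrupt drop.
speeds = [88, 91, 89, 90, 92, 89, 55, 50, 48]
for t, (est, alarm) in enumerate(kalman_speed_monitor(speeds, x0=90.0)):
    print(t, round(est, 1), "ALARM" if alarm else "")
```

With these settings the ordinary fluctuations stay well inside the gate, and the alarm is raised from the first low reading onward. Because the filter keeps adapting, the alarm would eventually stop once the estimate converges to the new speed level, so a practical detector would add persistence checks or a separate incident state.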
The main disadvantage of the classic Kalman filter is its assumption of linear system dynamics and Gaussian noise, which can be limiting in real traffic conditions. For this reason, nonlinear extensions and alternatives have been developed, such as the extended Kalman filter (EKF) and the particle filter, which can operate in nonlinear environments and provide more accurate estimates. These variants increase the robustness of the system and improve the reliability of traffic incident detection even in highly dynamic and complex traffic situations.
Early advancements in Kalman filter-based traffic modeling focused on improving real-time estimation accuracy and later evolved toward incident detection and adaptive forecasting. In 2010, ref. [49] applied Kalman filters to estimate and predict real-time traffic conditions, modeling interactions between speed, density, and origin–destination flows; the extended Kalman filter achieved a 12.6–19.1% improvement in estimation accuracy, demonstrating significant benefits for traffic control center operations. Building on this, ref. [50] in 2013 introduced an adaptive Kalman filter enhanced with historical averages and heuristic model predictions supplemented by pseudo-observations, with the heuristic-based predictor delivering the most accurate short-term and multi-stage traffic flow forecasts. In 2014, ref. [51] advanced the field further with an Interactive Multiple Model Ensemble Kalman Filter that dynamically switched between traffic models to detect abnormal conditions, improving real-time traffic estimation and detecting incidents beyond the capabilities of traditional Kalman filters. The studies are compared in Table 14.
6.5. Statistical Modeling—Hidden Markov Model
The Hidden Markov Model (HMM) represents an advanced statistical approach to modeling transport processes that are stochastic and time-dependent in nature. This model is based on the assumption that the monitored traffic system can be represented as a sequence of hidden states that cannot be directly observed but can be derived from observable variables—such as speed, density, or traffic flow intensity. Each state of the system has a defined probability of transition to another state and a probability distribution of observation generation.
In the field of traffic incident detection, HMMs are used to identify transitions between normal and abnormal traffic states, with the model learning to recognize typical traffic flow behavior patterns based on historical data. Once trained, the system can evaluate, in real time, the probability that the current traffic state corresponds to an incident such as a traffic accident or congestion. This approach enables continuous monitoring of traffic and provides an adaptive solution capable of taking into account temporary or local changes in the traffic environment.
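The real-time probability evaluation described above corresponds to HMM filtering with the forward algorithm. The sketch assumes a two-state model (normal, incident) and speed observations discretized into high, medium, and low; all start, transition, and emission probabilities are hypothetical placeholders that would normally be estimated from historical data, for example with the Baum–Welch algorithm.

```python
STATES = ("normal", "incident")

# Hypothetical parameters (each row sums to 1).
START = {"normal": 0.99, "incident": 0.01}
TRANS = {
    "normal":   {"normal": 0.98, "incident": 0.02},
    "incident": {"normal": 0.10, "incident": 0.90},
}
EMIT = {
    "normal":   {"high": 0.70, "medium": 0.25, "low": 0.05},
    "incident": {"high": 0.05, "medium": 0.25, "low": 0.70},
}

def filter_incident_prob(observations):
    """Forward algorithm with per-step normalization.
    Returns P(state = incident | observations so far) after each step."""
    belief = None
    probs = []
    for obs in observations:
        if belief is None:
            prior = dict(START)
        else:
            # Predict: propagate the previous belief through the transitions.
            prior = {s: sum(belief[p] * TRANS[p][s] for p in STATES)
                     for s in STATES}
        # Update: weight by the emission likelihood, then normalize.
        unnorm = {s: prior[s] * EMIT[s][obs] for s in STATES}
        total = sum(unnorm.values())
        belief = {s: v / total for s, v in unnorm.items()}
        probs.append(belief["incident"])
    return probs

obs = ["high", "high", "medium", "low", "low", "low"]
for o, p in zip(obs, filter_incident_prob(obs)):
    print(f"{o:>6}: P(incident) = {p:.3f}")
```

A single low reading only raises the incident probability moderately, because the model considers incidents rare; consecutive low readings quickly push it close to 1. This is the sense in which the HMM weighs evidence over time instead of reacting to one measurement.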
The advantage of hidden Markov models is their ability to handle uncertainty and randomness in data, which makes them suitable for dynamic and unpredictable transport systems. In addition, they provide a mathematically rigorous framework for evaluating probabilities and identifying sequences of events. Their disadvantages, however, are computational complexity and sensitivity to the quality of the training data, which must adequately represent different traffic conditions.
In modern practice, HMMs are often combined with machine learning methods and neural networks, creating hybrid models capable of increasing detection accuracy and system adaptability. Such approaches make it possible to combine the statistical robustness of Markov models with the generalization capabilities of artificial intelligence, leading to more reliable real-time incident identification even in complex traffic networks.
The use of hidden Markov models (HMMs) for traffic incident detection has evolved significantly over time, demonstrating increasing accuracy and adaptability to real-world conditions. In 2004, ref. [52] applied Gaussian mixture hidden Markov models combined with the Viterbi algorithm to classify highway traffic conditions from video data without tracking individual vehicles, achieving 94% accuracy across six defined traffic states and demonstrating strong real-time detection capabilities. In 2008, ref. [53] extended the application of HMMs to accident detection using real and simulated impact pulse data, creating a system that detected both frontal and angular crashes within just 6 milliseconds and achieved 100% accuracy, substantially outperforming traditional threshold-based methods. By 2011, ref. [54] advanced HMM use further by integrating vehicle tracking and motion interaction features extracted from video into a continuous HMM framework, enabling real-time recognition of incidents such as crashes and overtaking maneuvers with over 85% accuracy and highlighting the robustness of HMMs for automated video-based traffic monitoring. A comparison of the studies is shown in Table 15.
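The Viterbi algorithm used in [52] solves the decoding counterpart of filtering: recovering the single most likely sequence of hidden traffic states for a whole observation sequence. The compact log-space sketch below uses a hypothetical two-state model with binary speed observations; it only illustrates the recurrence and is far simpler than the six-state Gaussian-mixture model of the cited work.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence, computed in log space."""
    # score[s]: best log-probability of any path ending in state s.
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}
    backptrs = []
    for obs in observations[1:]:
        new_score, ptr = {}, {}
        for s in states:
            # Best predecessor for state s at this step.
            prev = max(states, key=lambda p: score[p] + math.log(trans_p[p][s]))
            new_score[s] = (score[prev] + math.log(trans_p[prev][s])
                            + math.log(emit_p[s][obs]))
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=score.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Hypothetical two-state model with "high"/"low" speed observations.
states = ("normal", "incident")
start = {"normal": 0.9, "incident": 0.1}
trans = {"normal": {"normal": 0.9, "incident": 0.1},
         "incident": {"normal": 0.2, "incident": 0.8}}
emit = {"normal": {"high": 0.8, "low": 0.2},
        "incident": {"high": 0.1, "low": 0.9}}

obs = ["high", "high", "low", "low", "low", "high"]
print(viterbi(obs, states, start, trans, emit))
```

With these parameters the decoded path labels the three consecutive low readings as an incident episode and the surrounding readings as normal. Working with log-probabilities avoids numerical underflow on long sequences.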
6.6. Frequency Analysis
Frequency analysis is an analytical approach used to examine the periodic and oscillatory properties of traffic signals. The basic idea behind this method is to transform time-based traffic data (e.g., speed, density, or traffic intensity) into the frequency domain, where it is possible to identify dominant frequency components that reflect regular or recurring patterns in traffic behavior. This approach makes it possible to effectively detect anomalies or disturbances in the signal structure that may indicate the occurrence of a traffic incident, sudden slowdown, or traffic flow breakdown.
The most commonly used frequency analysis tool is the Fourier transform, usually computed with the fast Fourier transform (FFT) algorithm, which decomposes traffic data into a set of harmonic components. Tracking how the frequency spectrum changes over time makes it possible to identify deviations from normal system behavior. In more dynamic settings, the short-time Fourier transform (STFT) or wavelet transforms are used, which allow simultaneous analysis of the time and frequency characteristics of traffic flow.
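A short sketch of the spectral view: remove the mean from a traffic series, compute its discrete Fourier transform, and report the period of the strongest component. For self-containment it evaluates the DFT directly in O(N^2) time; an FFT routine would produce the same coefficients much faster. The synthetic series, a hypothetical 90 s signal cycle sampled every 5 s plus a small ripple, is an assumption made for illustration.

```python
import cmath
import math

def dominant_period(series, dt):
    """Period (in the time unit of dt) of the strongest non-zero
    frequency in the DFT of the mean-removed series."""
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]
    best_k, best_mag = 0, -1.0
    # For a real-valued signal, frequencies 1..n//2 carry all the information.
    for k in range(1, n // 2 + 1):
        coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coef) > best_mag:
            best_k, best_mag = k, abs(coef)
    return n * dt / best_k

# Synthetic flow at a signalized intersection: 90-s cycle, 5-s sampling,
# 180 samples (10 cycles), plus a weaker ripple with a 35-s period.
dt = 5.0
series = [20 + 8 * math.sin(2 * math.pi * (t * dt) / 90.0)
          + 1.0 * math.sin(2 * math.pi * t / 7.0)
          for t in range(180)]
print(dominant_period(series, dt))  # prints 90.0
```

In a monitoring setting, one would track how the magnitude at the dominant frequency, or the spectrum as a whole, changes from one analysis window to the next; a sudden loss of the regular cycle is the kind of disturbance described above.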
Frequency methods are particularly effective in detecting periodic phenomena such as cyclical traffic jams at intersections or rhythmic changes in traffic flow, and in identifying sudden signal disturbances caused by incidents. Their advantage is high accuracy in analyzing stable and periodic data and the ability to detect subtle changes in traffic dynamics that are not apparent in the time domain.
On the other hand, a limitation of frequency analysis is its lower effectiveness with highly non-stationary traffic data, where traffic conditions change rapidly. Therefore, in modern practice, it is often combined with time-frequency methods or intelligent approaches that can respond dynamically to changing conditions. Within traffic management systems, frequency analysis is a valuable tool for diagnostics, trend prediction, and early incident detection.
In 2000, ref. [55] applied the discrete wavelet transform to traffic flow data to remove high-frequency noise and then used linear discriminant analysis to extract low-frequency features representing incident patterns, which were fed into a neural network for detection. Their objective was to improve incident detection accuracy using frequency-based features, and the results showed a significant reduction in false alarms and enhanced performance compared to traditional threshold-based methods.
By 2003, ref. [56] further advanced frequency-domain approaches by using the discrete wavelet transform to analyze traffic signals in the time–frequency domain and detect abrupt changes indicating incidents. The objective was to enhance accuracy and minimize false alarms, and the wavelet-based method outperformed both traditional and neural network approaches, achieving higher detection accuracy and lower false alarm rates.
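The general principle that an abrupt change in a traffic signal produces large wavelet detail coefficients at its location can be illustrated with the Haar wavelet, the simplest discrete wavelet. The sketch uses an undecimated (shift-invariant) Haar detail, so that a step is caught wherever it falls in the series. It illustrates the principle only and is not the method of [55] or [56]; the occupancy values and the threshold are hypothetical.

```python
import math

def haar_detail(x, half):
    """Undecimated Haar detail coefficients at scale 2 * half:
    d[t] = (sum(x[t:t+half]) - sum(x[t+half:t+2*half])) / sqrt(2 * half).
    half=1 gives the classic first-level Haar detail."""
    norm = math.sqrt(2 * half)
    return [(sum(x[t:t + half]) - sum(x[t + half:t + 2 * half])) / norm
            for t in range(len(x) - 2 * half + 1)]

def locate_change(x, half=2, threshold=5.0):
    """Index where the largest abrupt change begins, or None if no
    detail coefficient exceeds the threshold."""
    d = haar_detail(x, half)
    t = max(range(len(d)), key=lambda i: abs(d[i]))
    return t + half if abs(d[t]) > threshold else None

# Hypothetical lane occupancy (%): small fluctuations, then a jump
# consistent with a blockage starting at sample 8.
occ = [10, 12, 11, 10, 12, 11, 10, 12, 30, 31, 29, 30, 31, 29]
print(locate_change(occ))      # prints 8
print(locate_change(occ[:8]))  # prints None
```

The ordinary fluctuations produce detail coefficients of about 1 or less, whereas the jump yields a coefficient of magnitude 19.5. Larger values of `half` average out more noise at the cost of coarser localization, which mirrors the scale trade-off of multi-level wavelet analysis.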
In 2015, ref. [57] applied the discrete Fourier transform to vehicle induction signatures from loop sensors to derive spectral characteristics independent of speed. Their objective was to classify vehicles by type using frequency-domain analysis rather than time-domain metrics, and the results demonstrated significantly improved classification accuracy for passenger cars, vans, and trucks, surpassing traditional vehicle-length-based approaches.
Finally, in 2016, ref. [58] combined the discrete Fourier transform with clustering to analyze vehicle acceleration time series from the Naturalistic Driving Study database and identify accident and risk events. The methodology aimed to detect safety-critical events by analyzing motion frequency patterns; the algorithm detected 78% of verified critical events, indicating its suitability for driver monitoring, safety analysis, and insurance risk assessment. A comparison of studies is shown in Table 16.
6.7. Dynamic Time Warping—DTW
The Dynamic Time Warping (DTW) method is one of the advanced techniques for comparing time series, widely used in the detection of traffic incidents based on the analysis of traffic variables such as speed, density, or vehicle flow. The basic principle of DTW is to enable non-linear time alignment of two data sequences in order to reveal similarities even when events occur with different time delays or durations. This approach is particularly suitable in situations where traffic dynamics change over time and classic methods of direct comparison of time series fail.
In the context of incident detection, DTW is used to compare the current traffic signal pattern with reference patterns of normal behavior. If significant time or amplitude deviations occur between them, the system can identify the occurrence of a traffic incident, congestion, or other anomaly. The method is particularly effective in processing data from sensors or traffic loops that provide continuous measurements over time, and provides robust evaluation even in the presence of noise or missing data.
The main advantage of DTW is its flexibility: it can match time series that are shifted, stretched, or compressed in time, which makes it suitable for dynamic transport systems. On the other hand, its computational complexity, which grows with the product of the two sequence lengths, can be a limiting factor when processing large amounts of data in real time. For this reason, optimized variants such as FastDTW or a Sakoe–Chiba band constraint, which restricts the alignment to a narrow diagonal window, are often used to reduce the computational load while maintaining high accuracy.
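A minimal Python sketch of DTW with an optional Sakoe–Chiba band is shown below. It uses the absolute difference as the local cost and the standard three-way recursion; the speed profiles, the band half-width of five samples, and the idea of flagging an incident when the distance to the reference pattern is large are illustrative assumptions.

```python
import numpy as np


def dtw_distance(a, b, window=None):
    """DTW distance between 1-D sequences a and b.

    window is the Sakoe–Chiba band half-width: cell (i, j) is only filled when
    |i - j| <= window, which cuts the cost from O(n*m) to roughly O(n*window).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = a.size, b.size
    w = max(n, m) if window is None else max(window, abs(n - m))  # band must reach cell (n, m)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or diagonal match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


# Hypothetical 48-sample speed profiles (km/h) around a recurrent slowdown.
t = np.arange(48)
reference = 100 - 40 * np.exp(-((t - 20) / 5.0) ** 2)
delayed = 100 - 40 * np.exp(-((t - 23) / 5.0) ** 2)  # same slowdown, three samples later
incident = reference.copy()
incident[25:40] = 15.0                               # hypothetical sharp speed collapse

print(dtw_distance(reference, delayed, window=5))    # small: warping absorbs the delay
print(np.abs(reference - delayed).sum())             # lock-step L1 distance is much larger
print(dtw_distance(reference, incident, window=5))   # large: no alignment explains the collapse
```

Because the diagonal (lock-step) path always lies inside the band, the banded DTW distance can never exceed the lock-step L1 distance; the band only limits how far the alignment may drift.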
In modern systems, DTW is often integrated with machine learning methods or classification algorithms that automatically recognize patterns in time series and improve the system’s ability to detect incidents. In this way, DTW becomes part of hybrid approaches that combine the advantages of classical analytical techniques with the adaptive properties of artificial intelligence.
In 2012, ref. [59] used dynamic time warping to analyze traffic data from point detectors placed at different locations along the traffic stream, identifying incidents as sudden irregularities between flows. Their objective was to create a real-time accident detection system capable of predicting incidents and identifying anomalies in microscopic traffic variables; the system achieved a 94% detection rate, with integration into video surveillance helping to reduce false alarms. In the same year, ref. [60] combined dynamic time warping with artificial traffic patterns to analyze indicators such as speed and time headway for real-time congestion prediction, reaching an accuracy of 82.53% and showing improved reliability as more data were processed. By 2014, ref. [61] advanced this approach by combining dynamic time warping with threshold-based classification logic to compare real-time traffic flow patterns against historical data, with the objective of accurately detecting abnormal traffic conditions under fluctuating congestion; the method demonstrated high detection accuracy and low false alarm rates on real freeway data. A comparison of studies is shown in Table 17.
6.8. Histogram-Based Methods
Histogram-based methods represent a simple but effective approach to traffic incident detection, utilizing the statistical distribution of traffic parameter values such as vehicle speed, density, intensity, or acceleration. A histogram serves as a graphical representation of the frequency of occurrence of individual values, with the shape of the distribution reflecting the nature of the traffic flow. Changes in the distribution of values may indicate the occurrence of abnormal events, such as traffic accidents, traffic jams, or road congestion.
In detection systems, histograms are used to determine threshold values or probability limits between normal and abnormal traffic conditions. For example, in the case of a bimodal distribution, one peak of the histogram can be assigned to smooth traffic and the other to congestion or an incident. By comparing the current histogram with the reference distribution, it is possible to effectively identify the emergence of a new traffic condition or a sudden change in the dynamics of the system.
The main advantage of histogram methods is their computational simplicity, transparency, and easy implementation, making them suitable for real-time traffic data processing. These methods can provide rapid feedback to traffic management systems, which is key to preventing secondary accidents or optimizing traffic flow.
Despite their simplicity, however, histogram-based methods face limitations, especially when processing non-stationary and noisy data, where the shape of the histogram can change dynamically. Therefore, in modern practice, they are often combined with advanced statistical techniques or intelligent algorithms that enable automatic determination of optimal thresholds and adaptation of the system to current traffic conditions. In this way, histogram approaches retain their simplicity while gaining greater accuracy and robustness in detecting traffic incidents.
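The comparison of a current histogram with a reference distribution can be sketched in a few lines of Python. The example below bins spot speeds, normalizes the counts, and measures the Hellinger distance between the two histograms. The bin width, the synthetic speed samples, and the 0.3 alarm threshold are hypothetical, and other distances (for example, chi-square or histogram intersection) would serve the same role.

```python
import numpy as np

BINS = np.arange(0, 150, 10)  # hypothetical 10 km/h speed bins from 0 to 140 km/h


def speed_histogram(speeds):
    """Normalized histogram (empirical probability per bin) of spot speeds."""
    counts, _ = np.histogram(np.clip(speeds, 0, 139.9), bins=BINS)
    return counts / max(counts.sum(), 1)


def hellinger(p, q):
    """Hellinger distance between two discrete distributions; 0 = identical, 1 = disjoint."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


rng = np.random.default_rng(2)
reference = speed_histogram(rng.normal(95, 8, 2000))   # long-term free-flow profile
normal_now = speed_histogram(rng.normal(95, 8, 200))   # current window, no incident
incident_now = speed_histogram(np.concatenate([        # current window, 60% of vehicles queued
    rng.normal(20, 8, 120), rng.normal(95, 8, 80)]))

THRESHOLD = 0.3  # hypothetical alarm level
for name, h in [("normal", normal_now), ("incident", incident_now)]:
    d = hellinger(reference, h)
    print(name, round(d, 3), "ALARM" if d > THRESHOLD else "ok")
```

A bimodal current histogram, as in the incident window, shifts probability mass into bins where the reference is nearly empty, which is what drives the distance up.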
In 2009, ref. [62] introduced a traffic monitoring approach using traffic density histograms to estimate road connectivity, where each segment maintained its own histogram and connectivity was determined through threshold comparisons. Their objective was to create a scalable method for connectivity estimation without tracking individual vehicles, and the method successfully identified disconnected road segments while reducing false positives compared to traditional peer-to-peer schemes. In 2011, ref. [63] developed a system that converted traffic images into grayscale or RGB histograms, which were then classified by a neural network to distinguish between normal and corrupted surveillance images; it achieved over 90% detection accuracy and showed that RGB histograms with max scaling are effective for reliable image quality monitoring. By 2017, ref. [64] advanced histogram-based methods for real-time traffic management by using histogram models to monitor lane-level traffic density and predict congestion, dynamically rerouting vehicles in simulations; the results showed accurate congestion prediction and reduced travel times, demonstrating suitability for integration into intelligent transport systems. A comparison of studies is shown in Table 18.
6.9. Computer Vision with Deep Learning
Deep learning-based computer vision is one of the most advanced and accurate methods for detecting traffic incidents. It uses convolutional neural networks (CNN) and their advanced architectures to automatically analyze images and video recordings from traffic cameras, without the need for manually defined rules or thresholds. These models are capable of independently extracting complex visual features—such as the presence of collided vehicles, traffic jams, sudden stops, or movement anomalies—and classifying them in real time with high accuracy.
Modern architectures such as YOLO (You Only Look Once), Faster R-CNN, Mask R-CNN, or the Vision Transformer (ViT) make it possible to detect and locate objects (vehicles, pedestrians, obstacles) in a traffic scene, track their trajectories, and identify events indicating an incident. When extended with 3D convolutions or recurrent neural networks, these models can also capture spatiotemporal context, which increases their ability to identify dynamic changes in traffic flow and analyze the causes of incidents.
A significant advantage of deep learning computer vision is its scalability and adaptability. These models can be continuously trained on new data, allowing them to adapt to changing traffic patterns, different geographic locations, and weather conditions. In addition, deep learning methods allow image data to be combined with other sources of information (e.g., sensor data or GPS), resulting in multimodal detection systems with even greater robustness.
The disadvantage of these approaches is their high computational complexity and the need for extensive annotated training data. However, with the use of cloud platforms, edge computing, and transfer learning, these limitations can be effectively mitigated, making deep learning computer vision a key element in modern intelligent transportation systems.
In 2019, ref. [65] applied Mask R-CNN for vehicle detection and segmentation, combined with centroid-based object tracking that analyzed trajectory angles, bounding-box overlap, and speed changes to identify accidents in real time; the model achieved 71% accuracy on real-world video recordings. In 2021, ref. [66] integrated YOLOv3 for vehicle detection with recurrent neural networks (RNNs) and support vector machines (SVMs) to predict short-term traffic density for dynamic traffic management, achieving normalized prediction accuracies between 0.44 and 0.53 for horizons of 1–5 min, with the SVM proving more stable and faster than the RNN. In the same year, ref. [67] introduced a model that combined the InceptionV4 convolutional neural network with ConvLSTM to capture spatial and temporal features in urban video streams, achieving 98% accident detection accuracy on the CADP dataset. In 2022, ref. [68] implemented YOLOv5 with DeepSORT (Simple Online and Realtime Tracking with a Deep Association Metric) tracking and homographic projection to monitor traffic during major events, reaching 96.91% vehicle counting accuracy and turn detection performance between 43% and 72%, while ref. [69] introduced a hybrid CNN-LSTM model to capture spatial–temporal dependencies in urban and highway traffic data, outperforming traditional approaches in anomaly detection accuracy, recall, and F1 score. In 2023, ref. [70] integrated three deep learning models (YOLOv5 with DeepSORT for vehicle identification, YOLOv5 for accident detection, and ResNet152 for fire recognition) to classify accident severity in real time, achieving 99.2% accuracy in vehicle detection, 83.3% in accident detection, and 98.9% in fire detection. In 2024, ref. [71] developed a system using YOLOv8 and DeepSORT to detect and track urban traffic incidents with 98.4% accuracy, 97.2% recall, and 98.5% precision, while ref. [72] applied YOLO with SORT to detect anomalies on highways, demonstrating real-time effectiveness in improving safety through automated anomaly recognition. Also in 2024, ref. [73] combined convolutional neural networks with anomaly detection algorithms to improve traffic accident monitoring, increasing accuracy and reducing false positives, and ref. [74] compared optical flow and CNN methods optimized with space-filling Hilbert curves, finding that the CNN-SFC model achieved 96.5% accuracy, while the optical flow-SFC model reached 93.8% with greater computational efficiency. A comparison of studies is shown in Table 19.
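The trajectory cues listed for ref. [65] (trajectory angle, overlap, and speed change) can be illustrated with a small rule-based post-processing step in Python. This is a minimal sketch, not the authors' implementation: it assumes that a detector such as Mask R-CNN and a centroid tracker have already produced per-frame centroids and bounding boxes, and the three thresholds in the hypothetical `flag_possible_collision` function are illustrative values.

```python
import numpy as np


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def motion_cues(centroids, fps=25.0):
    """Largest relative speed drop and largest heading change (degrees) along one track."""
    c = np.asarray(centroids, dtype=float)
    v = np.diff(c, axis=0) * fps                    # pixel velocity between frames
    speed = np.linalg.norm(v, axis=1)
    heading = np.arctan2(v[:, 1], v[:, 0])
    prev = speed[:-1]
    drops = np.where(prev > 0, (prev - speed[1:]) / np.maximum(prev, 1e-9), 0.0)
    turns = np.abs(np.angle(np.exp(1j * np.diff(heading))))  # wrapped heading change in [0, pi]
    return float(drops.max()), float(np.degrees(turns.max()))


def flag_possible_collision(track, box, other_box, fps=25.0,
                            min_iou=0.05, min_drop=0.5, min_turn_deg=45.0):
    """Flag a track whose box overlaps another vehicle while it brakes hard and swerves."""
    drop, turn = motion_cues(track, fps)
    overlap = iou(box, other_box)
    return overlap >= min_iou and drop >= min_drop and turn >= min_turn_deg


# Hypothetical track: steady motion to the right, then an abrupt stop with a sideways jolt.
crash_track = [(10 + 4 * i, 100) for i in range(10)] + [(46, 101), (46.5, 103), (46.6, 104)]
straight_track = [(10 + 4 * i, 100) for i in range(13)]

print(flag_possible_collision(crash_track, (36, 90, 56, 112), (50, 92, 70, 114)))       # True
print(flag_possible_collision(straight_track, (58, 90, 78, 112), (150, 92, 170, 114)))  # False
```

Requiring all three cues at once is a design choice that trades recall for fewer false alarms; a single cue, such as overlap alone, fires on every occlusion between vehicles.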
6.10. Anomaly Detection
Anomaly detection in intelligent transport systems is a method of identifying non-standard or unexpected events that may indicate traffic incidents, accidents, or abnormal behavior by road users. Unlike traditional approaches based on fixed thresholds or rules, AI-based methods can learn from historical data and recognize complex, nonlinear patterns that traditional methods may not capture.
Key approaches include neural networks, autoencoders, graph neural network (GNN)-based models, and clustering techniques. Autoencoders, for example, learn to reconstruct normal traffic flow behavior, so large deviations between the reconstructed and actual states signal an anomaly. Graph models enable anomaly detection across the entire transport network, taking into account the spatiotemporal relationships between individual road sections and nodes. Clustering, in turn, allows the identification of outlier traffic patterns that differ significantly from typical traffic conditions.
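The autoencoder principle can be sketched without a deep-learning framework by using a linear autoencoder. With a squared reconstruction loss, an optimal linear autoencoder spans the same subspace as the leading principal components, so the Python sketch below fits it in closed form with an SVD instead of gradient descent. This is a simplification of the nonlinear autoencoders used in practice; the four-detector speed features and the 99th-percentile threshold are hypothetical.

```python
import numpy as np


class LinearAutoencoder:
    """Linear autoencoder with k latent units, fitted in closed form via SVD."""

    def __init__(self, k=1, quantile=0.99):
        self.k = k
        self.quantile = quantile  # hypothetical: training errors above this quantile count as anomalous

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0) + 1e-9
        Z = (X - self.mean_) / self.std_
        _, _, vt = np.linalg.svd(Z, full_matrices=False)
        self.W_ = vt[: self.k].T                 # encoder weights; the decoder is W_.T
        self.threshold_ = np.quantile(self.reconstruction_error(X), self.quantile)
        return self

    def reconstruction_error(self, X):
        Z = (np.asarray(X, dtype=float) - self.mean_) / self.std_
        Z_hat = (Z @ self.W_) @ self.W_.T        # encode to k dimensions, then decode
        return np.mean((Z - Z_hat) ** 2, axis=1)

    def is_anomaly(self, X):
        return self.reconstruction_error(X) > self.threshold_


# Hypothetical training data: speeds (km/h) at four consecutive detectors under normal
# conditions, which move together because they share the same upstream demand.
rng = np.random.default_rng(3)
base = rng.normal(90, 10, size=(1000, 1))
normal_speeds = base + rng.normal(0, 2, size=(1000, 4))

model = LinearAutoencoder(k=1).fit(normal_speeds)
test = np.array([[80, 81, 79, 80],    # consistent slowdown: well reconstructed
                 [35, 40, 95, 96]])   # queue upstream, free flow downstream: incident-like
print(model.is_anomaly(test))         # [False  True]
```

The uniform slowdown is not flagged because it lies along the learned "all detectors move together" direction; only the spatially inconsistent pattern produces a large reconstruction error.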
Anomaly detection using AI is suitable for predictive traffic management, critical situation monitoring, and real-time decision support. These methods increase the accuracy of incident identification, minimize false alarms, and provide adaptive solutions for dynamic traffic environments.
In 2018, ref. [75] introduced the Shadow Traffic model, which uses simulated “shadow vehicles” within established traffic simulators such as SUMO and VISSIM to replicate abnormal behaviors, including accidents, vehicle breakdowns, and pedestrian interactions, achieving realistic anomaly simulation with minimal computational overhead and seamless integration into existing systems. In 2022, ref. [76] developed an RNN/LSTM-based approach using unsupervised preprocessing and deep quantile regression to analyze mobile network traffic and identify anomalies without requiring labeled data, distinguishing short-term and long-term disruptions with F1 scores of 87–88% on metropolitan traffic datasets. In the same year, ref. [77] applied ResNet50-based convolutional neural networks to traffic camera video streams for automated accident detection, achieving 82% real-time detection accuracy and reducing the need for human monitoring. Also in 2022, ref. [78] introduced Nepod, a future localization-based anomaly detection framework that predicts vehicle and pedestrian trajectories and flags deviations using AnoPred, enabling real-time detection of rare or unseen events without labeled training data and achieving high accuracy across 4677 video samples covering diverse anomaly categories. In 2023, ref. [80] developed an LSTM-based deep representation approach for freeway accident detection using loop detector traffic flow data. This framework addresses class imbalance with resampling techniques and encodes spatiotemporal features through LSTM, achieving a true positive rate of 0.71 and a false positive rate of 0.25, with accidents detectable in less than 18 min. However, its performance is sensitive to site-specific sensor arrangements, and it is limited to roads equipped with loop detectors. Most recently, in 2024, ref. [79] proposed a deep learning-based traffic accident detection framework using a trainable I3D-CONVLSTM2D model that fuses RGB and optical flow inputs through ConvLSTM2D layers and feature fusion, leveraging transfer learning to capture complex spatiotemporal patterns in videos. This model achieved a mean average precision of 87% and accuracy, precision, recall, and F1 scores of 80%, demonstrating robust performance in real-time accident detection, although it still faces challenges with stationary vehicles, environmental interference, and computational demands on edge devices. Studies are compared in Table 20.
6.11. Predictive Modeling
Predictive modeling is one of the key intelligent methods for detecting and predicting traffic incidents, with the aim of anticipating abnormal events based on historical and current traffic data. Unlike traditional reactive approaches, which identify incidents only after they occur, predictive modeling makes it possible to estimate the probability of future incidents and take preventive measures as part of traffic flow management.
These methods are based on advanced machine learning and deep learning algorithms such as recurrent neural networks (RNNs), long short-term memory (LSTM) networks, gated recurrent units (GRUs), or transformer-based models, which can process time series of traffic variables and identify complex nonlinear relationships between them. These models use inputs such as vehicle speed, traffic flow, traffic density, weather conditions, or historical accident rates to make probabilistic predictions in the short and medium term.
Predictive modeling enables early detection of risky traffic situations, thereby increasing the effectiveness of dynamic control systems and contributing to a reduction in secondary accidents. Models can be trained at various levels—from link-level to network-level prediction—and can adaptively respond to traffic fluctuations during the day or seasonal variations.
The main advantage of these methods is their ability to generalize and adapt, making them particularly suitable for complex and dynamically changing traffic environments. The main limitations include dependence on data quality and the need to regularly update models to maintain high predictive accuracy. Despite these challenges, predictive modeling is becoming central to intelligent traffic management and an important tool for the transition to fully autonomous transportation infrastructures.
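To make the recurrent processing concrete, the Python sketch below runs the standard LSTM cell equations over a short window of traffic variables using NumPy. The weights are random and untrained, so the output is not a meaningful forecast; in practice the parameters would be learned with backpropagation through time in a deep-learning framework. The window length, feature choice, and hidden size are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_forward(X, W, U, b):
    """Run one LSTM layer over X of shape (T, F); W: (4H, F), U: (4H, H), b: (4H,)."""
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    states = []
    for x_t in X:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:H])            # input gate: how much new information to write
        f = sigmoid(z[H:2 * H])       # forget gate: how much of the old cell state to keep
        o = sigmoid(z[2 * H:3 * H])   # output gate: how much of the cell state to expose
        g = np.tanh(z[3 * H:])        # candidate cell update
        c = f * c + i * g             # cell state carries long-term memory
        h = o * np.tanh(c)            # hidden state passed to the next time step
        states.append(h)
    return np.stack(states)


# Hypothetical input: 12 five-minute intervals of normalized speed, flow, and occupancy.
rng = np.random.default_rng(4)
T, F, H = 12, 3, 8
X = rng.normal(size=(T, F))
W = rng.normal(scale=0.3, size=(4 * H, F))
U = rng.normal(scale=0.3, size=(4 * H, H))
b = np.zeros(4 * H)
w_out = rng.normal(scale=0.3, size=H)   # linear read-out from the last hidden state

states = lstm_forward(X, W, U, b)
next_speed = float(w_out @ states[-1])  # one-step-ahead prediction (meaningless until trained)
print(states.shape, next_speed)
```

The forget gate is what lets the cell state retain information across many intervals, which is why LSTMs are preferred over plain RNNs for the longer observation windows mentioned in the studies below.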
The study in ref. [81] developed a hybrid approach combining long short-term memory (LSTM) networks and multilayer perceptrons (MLPs) to predict the remaining duration of traffic incidents in real time, addressing the need for dynamic and accurate forecasting based on live traffic data. Using a 30-min observation window, the integrated model achieved an area under the curve (AUC) exceeding 0.85 and a prediction accuracy above 75%, indicating that continuously updated traffic information significantly enhances predictive performance and incident management.
Expanding on predictive traffic modeling, ref. [82] proposed a comprehensive framework that integrates traffic condition analysis, iterative prediction modeling, and intelligent decision-making to optimize traffic flow and mitigate congestion. The model demonstrated high accuracy in forecasting traffic density and speed, and its forecasts supported dynamic rerouting strategies that improved road efficiency.
In the context of non-standard traffic events, such as accidents and adverse weather, ref. [83] systematically compared multiple machine learning approaches, including MLP, convolutional neural networks (CNN), LSTM, CNN-LSTM hybrids, and LSTM autoencoders, to predict traffic flow under variable conditions. Their results indicated that the one-dimensional CNN-LSTM model outperformed other architectures, achieving the lowest root mean square error (RMSE) and mean absolute error (MAE), and effectively capturing the impact of incidents, with accidents reducing road capacity by approximately 10–25% and rainfall significantly increasing congestion levels.
Addressing the challenge of sparse sensor coverage, ref. [84] introduced the Edge and Node Awareness Dual Autoencoder (ENDAE), a novel approach designed to reconstruct traffic network structures and detect incidents at both node and edge levels. The ENDAE framework enhanced incident detection performance by increasing recall by 12.5%, reducing detection delays by 18.5%, and notably doubling the recall rate for edge-level incidents, demonstrating its effectiveness in complex or under-instrumented traffic networks.
Most recently, ref. [85] leveraged existing optical fiber infrastructure to monitor traffic without specialized sensors, employing a bidirectional LSTM (Bi-LSTM) with attention mechanisms to process changes in the state of polarization (SOP) of the transmitted light for accurate vehicle movement detection. This approach achieved an accuracy above 99%, correctly identifying 2140 of 2141 traffic events, and reliably captured both daily and hourly traffic fluctuations.
Finally, ref. [86] proposed a hybrid traffic incident detection model combining Generative Adversarial Networks (GAN) and a Temporal-Spatial Stacked Autoencoder (TSSAE). The GAN was used to generate additional incident samples to address class imbalance and small sample sizes, while the TSSAE captured both spatial and temporal correlations among traffic variables. The model achieved its best performance on balanced datasets, with detection rates (DR) up to 90.64%, classification rates (CR) up to 89.92%, low false alarm rates (FAR) around 5.2%, and an AUC of 0.852, while still outperforming benchmark models on imbalanced datasets. This study highlights the importance of data augmentation and hybrid deep learning architectures for real-time incident detection.
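Detection rate, false alarm rate, classification rate, and mean time to detect recur throughout these studies, but their exact definitions differ between papers. The Python sketch below computes them under one common interval-based convention, which is an assumption: an incident counts as detected if any alarm falls inside it, FAR is the share of incident-free intervals that raise an alarm, CR is the share of correctly classified intervals, and MTTD is measured from incident onset. The 30 s interval length and the example alarms are hypothetical.

```python
import numpy as np


def evaluate_aid(alarms, incidents, interval_min=0.5):
    """Interval-based DR, FAR, CR, and MTTD for an automatic incident detection run.

    alarms: one boolean per detection interval.
    incidents: list of (start, end) interval indices, end exclusive.
    """
    alarms = np.asarray(alarms, dtype=bool)
    truth = np.zeros(alarms.size, dtype=bool)
    delays = []
    for start, end in incidents:
        truth[start:end] = True
        hits = np.flatnonzero(alarms[start:end])
        if hits.size:                           # detected: record time to the first alarm
            delays.append(hits[0] * interval_min)
    non_incident = ~truth
    return {
        "DR": len(delays) / len(incidents),
        "FAR": float(np.sum(alarms & non_incident) / max(np.sum(non_incident), 1)),
        "CR": float(np.mean(alarms == truth)),
        "MTTD_min": float(np.mean(delays)) if delays else float("nan"),
    }


# Hypothetical run: 20 thirty-second intervals, two incidents, three alarms.
alarms = np.zeros(20, dtype=bool)
alarms[[2, 7, 8]] = True   # interval 2 is a false alarm
metrics = evaluate_aid(alarms, incidents=[(5, 9), (14, 17)])
print(metrics)             # DR 0.5, FAR 1/13, CR 0.7, MTTD 1.0 min
```

Because FAR is normalized by the number of incident-free intervals, a detector evaluated at a finer time resolution reports a lower FAR for the same number of false alarms, which is one reason published FAR values are hard to compare directly.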
Collectively, these studies illustrate the progressive evolution of traffic prediction and incident detection methodologies, moving from hybrid neural architectures for dynamic forecasting to integrated network-aware autoencoder models and infrastructure-efficient monitoring solutions, emphasizing the growing sophistication and practical applicability of machine learning approaches in modern traffic management systems. A comparison of studies is summarized in Table 21.
6.12. Traffic Flow Optimization
Traffic flow optimization is an advanced approach that uses artificial intelligence methods to manage and regulate traffic in order to minimize congestion, reduce the likelihood of incidents, and maximize traffic flow. These methods focus on proactive management of traffic processes, rather than simply responding to existing problems. They use complex models based on reinforcement learning (RL), evolutionary algorithms, metaheuristics, and multi-agent systems that enable autonomous decision-making based on current and predicted traffic conditions.
Traffic flow optimization systems model the traffic network as a dynamic, multidimensional system in which individual control elements (such as traffic lights, ramp metering, variable traffic signs, or dedicated lanes) behave as intelligent agents. Through learning, these agents seek to find the optimal control strategy that minimizes time delays, the number of stops, or the length of traffic jams. Reinforcement learning allows these agents to learn from their interaction with the traffic environment, adaptively improving their decision-making over time based on a reward function.
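The agent-learning loop described above can be illustrated with tabular Q-learning on a deliberately tiny, hypothetical intersection: the state is the pair of queue lengths on two approaches (capped at four vehicles), the action chooses which approach gets green, and the reward is the negative total queue. The queue dynamics, discharge rate, and learning parameters are assumptions for illustration only; the cited studies use deep networks and microscopic simulators instead.

```python
import numpy as np

MAX_Q = 4       # hypothetical queue cap per approach (vehicles)
DISCHARGE = 2   # hypothetical vehicles served per green step
N_ACTIONS = 2   # 0 = green for north-south, 1 = green for east-west


def step(state, action):
    """Deterministic toy dynamics: serve the green approach, then one arrival per approach."""
    q = list(state)
    q[action] = max(q[action] - DISCHARGE, 0)
    q = [min(x + 1, MAX_Q) for x in q]
    return tuple(q), -float(sum(q))       # reward: negative total queue after the step


def train_q_table(episodes=3000, horizon=20, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((MAX_Q + 1, MAX_Q + 1, N_ACTIONS))
    for _ in range(episodes):
        s = tuple(int(v) for v in rng.integers(0, MAX_Q + 1, size=2))  # random start state
        for _ in range(horizon):
            if rng.random() < eps:                                   # explore
                a = int(rng.integers(N_ACTIONS))
            else:                                                    # exploit current estimate
                a = int(np.argmax(Q[s]))
            s_next, r = step(s, a)
            td_target = r + gamma * np.max(Q[s_next])
            Q[s + (a,)] += alpha * (td_target - Q[s + (a,)])         # Q-learning update
            s = s_next
    return Q


Q = train_q_table()
policy = np.argmax(Q, axis=2)  # greedy action for every (north-south, east-west) queue pair
print(policy)
```

Random start states are used so that rarely reached queue combinations are still visited often enough to be learned; in a real controller, a simulator such as those mentioned in the cited studies would play the role of `step`.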
A significant benefit of this method is its ability to prevent incidents through dynamic traffic flow management. Optimization models can redistribute traffic loads, predict critical locations with increasing density, and automatically apply regulatory measures. Integration with predictive models makes it possible to transform the traffic system from a passive to a proactive and autonomously controlled system capable of responding even before critical conditions arise.
The disadvantage of these methods is their high computational complexity and the need for simulated environments to train the models. However, thanks to the use of cloud technologies, edge computing, and digital twins of transport networks, these limitations are gradually being overcome. Traffic flow optimization thus represents a fundamental step towards fully intelligent transport infrastructures of the future.
The study in ref. [87] introduced a deep reinforcement learning (DRL) approach using a double dueling deep Q-network (3DQN) to optimize traffic signal control based on high-resolution event-based data, aiming to dynamically adjust signal phase durations and improve traffic flow efficiency. Their results demonstrated a reduction in vehicle delay of up to 21.2%, a decrease in queue length of 29.7%, and an increase in average vehicle speed of 15.5% compared to a fixed-time signal strategy, highlighting the potential of DRL to manage traffic adaptively in real time. In parallel, ref. [88] developed the open-source CAREL framework to evaluate the robustness of reinforcement learning-based traffic signal controllers under varying uncertainties, including demand surges, traffic incidents, and sensor failures. Their findings indicated that RL controllers consistently outperformed traditional fixed-time and actuated strategies, dynamically redistributing green times and maintaining stable traffic control even in the presence of system disruptions, which demonstrates the resilience and practical applicability of adaptive RL models in realistic traffic scenarios. Building on the integration of artificial intelligence in traffic management, ref. [89] employed an artificial neural network (ANN) for traffic incident detection within a VISSIM-simulated highway environment, with the objective of improving the speed and accuracy of incident identification. The model achieved a detection accuracy of 100%, a false alarm rate (FAR) of 1.29%, and a mean time to detect (MTTD) of 1.6 min, significantly outperforming traditional rule-based and statistical approaches. Similarly, ref. [90] combined Q-learning with genetic algorithms to optimize traffic light schedules at urban intersections, aiming to minimize vehicle waiting times and alleviate congestion. Their approach reduced average waiting times by 12.54% compared to a fixed cyclic strategy and by 10.39% relative to the Longest Queue First method, highlighting the advantages of hybrid optimization techniques in urban traffic signal control. Collectively, these studies illustrate the progressive adoption of reinforcement learning, neural networks, and hybrid optimization strategies in traffic management, demonstrating both the efficiency gains and the robustness improvements achievable through intelligent, adaptive traffic control systems. Studies are compared in Table 22.
6.13. Natural Language Processing
Natural language processing (NLP) is a modern approach to detecting traffic incidents that uses analysis of unstructured text and voice data from external information sources. These sources include social networks, driver voice reports, data from dispatch centers, media reports, and crowdsourcing platforms such as Waze and Google Maps. The basic task of NLP methods in traffic management is the automatic extraction of relevant information about incidents and their transformation into a form that can be used by intelligent traffic systems in real time.
Modern NLP models, based on transformer architectures and large language models (LLMs), can analyze language patterns, identify the context of events, and distinguish between different types of incidents, such as accidents, traffic restrictions, or sudden deterioration of weather conditions. These systems are capable of detecting not only explicit information (e.g., “accident on the D1 highway”) but also implicit indicators of danger (e.g., “car stopped in the emergency lane” or “traffic jam”). In addition, NLP enables sentiment analysis, which increases detection accuracy by assessing the urgency and severity of an incident based on the emotional tone or verbal expression of users.
A significant advantage of NLP is its ability to provide timely, even predictive information, as incidents are often reported by users before they are detected by traditional sensor systems. The integration of NLP with other detection methods allows the creation of hybrid models that combine structured traffic data with situational context from text sources, resulting in greater system accuracy and robustness.
The challenge with NLP methods is the need to process multilingual and informal language inputs, the presence of noise in the data, and the identification of false reports. However, modern models can mitigate these problems through adaptive learning and validation of information from multiple sources. NLP is thus becoming a strategically important tool for holistic traffic incident detection, complementing sensor infrastructure with human-generated knowledge and situational awareness.
In the work of ref. [91], natural language processing (NLP) was combined with complex event processing (CEP) to extract and classify traffic updates from Twitter, aiming to transform unstructured crowdsourced data into structured, real-time traffic alerts. Their system achieved F1 scores above 70% and provided users with maps and subscription alerts reflecting current traffic conditions. Building on crowdsourced social data for traffic monitoring, ref. [92] applied NLP and machine learning techniques, specifically Semi-Naive Bayes and supervised latent Dirichlet allocation (sLDA), to filter and categorize tweets related to traffic incidents while incorporating geocoding for spatial analysis. This approach achieved an overall classification accuracy of 90.5%, detected up to 71% of officially reported accidents, and identified 206 additional events that were not captured by conventional reporting channels, demonstrating the potential of social media as a complementary traffic monitoring tool. The study in ref. [93] further enhanced the processing of transportation-related Twitter data by combining NLP, text mining, and machine learning algorithms, including logistic regression and SVMs, with Apache Spark (version 2.1.1) for scalable big data processing. Their methodology effectively extracted incident locations and severity levels, maintaining high accuracy even under large-volume real-time data streams, thus improving the speed and reliability of traffic information dissemination. More recently, ref. [94] employed NLP and machine learning using AWD-LSTM (an ASGD weight-dropped LSTM) and ULMFiT to analyze Twitter messages for urban event classification and citizens’ mood assessment, with the goal of providing city authorities with actionable situational awareness. Their model achieved a classification accuracy of 88.5%, while multiple regression analyses explained 60–90% of the variance in the examined variables, supporting the concept of citizens acting as “social sensors” and providing valuable insights into urban dynamics.
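The tweet-filtering step in studies such as ref. [92] can be illustrated with a plain multinomial Naive Bayes classifier written in standard-library Python. This is the textbook multinomial model with Laplace smoothing, not the Semi-Naive Bayes or sLDA variants used in that study, and the handful of training messages is invented for illustration; a real system would need thousands of labeled posts, plus geocoding of the extracted locations.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with Laplace (add-alpha) smoothing."""

    def fit(self, docs, labels, alpha=1.0):
        self.alpha = alpha
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc):
        n_docs = sum(self.prior.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = math.log(self.prior[c] / n_docs)   # log prior P(class)
            for w in tokenize(doc):
                if w in self.vocab:                 # words never seen in training are skipped
                    s += math.log((self.counts[c][w] + self.alpha) /
                                  (self.totals[c] + self.alpha * v))
            scores[c] = s
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Invented training messages (hypothetical); real systems train on thousands of labeled posts.
docs = [
    "crash on the highway near exit 12, two lanes blocked",
    "accident at the bridge, traffic backed up",
    "stalled truck blocking left lane, heavy traffic",
    "collision on ring road, expect delays",
    "great concert tonight downtown",
    "new coffee shop opened on main street",
    "lovely weather for a walk in the park",
    "watching the game with friends tonight",
]
labels = ["incident"] * 4 + ["other"] * 4

clf = MultinomialNB().fit(docs, labels)
print(clf.predict("two cars crash, lanes blocked on the highway"))  # incident
print(clf.predict("coffee and a walk in the park"))                 # other
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and smoothing prevents a single unseen class-word pair from forcing a class probability to zero.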
Extending NLP beyond text-only sources, ref. [95] introduced a multimodal framework that integrates computer vision, graph-based reasoning, and transformer-based language models to generate structured natural language descriptions of traffic signs from visual inputs. By translating static traffic infrastructure into machine-interpretable semantic representations, their approach demonstrates how NLP can complement crowdsourced and sensor-based data by enriching traffic understanding with high-level contextual information. In a related direction, ref. [96] proposed Visual Traffic Knowledge Graph Generation (VTKGG), leveraging a Hierarchical Graph Attention Network (HGAT) to analyze traffic scene images and extract heterogeneous elements such as roads, lanes, and sign components, while reasoning over complex relations. Evaluated on the RS10K dataset, the framework achieved F1 scores of 0.903 for detection, 0.823 for overall relation reasoning, and a mean TFPM F1 of 0.841, producing structured traffic knowledge graphs suitable for autonomous driving, map correction, and traffic assistance. The main limitation is the reliance on high-resolution images and precise annotations, with potential performance degradation under severe occlusion, low light, or motion blur.
In a complementary line of work, ref. [97] proposed SignEye, a stepwise reasoning pipeline that interprets traffic signs from a vehicle’s first-person view (TSI-FPV) and supports autonomous driving through a Traffic Guidance Assistant (TGA). SignEye combines vision–language models with structured traffic sign descriptions in egocentric relative positions (EgoRPD), enabling lane- and road-level planning decisions, speed regulation assessment, and navigation guidance. Evaluated on the Traffic-CN dataset, the method outperformed general vision–language models, particularly in lane change and speed planning tasks, while the structured descriptions ensured clarity and interpretability. Its main limitation lies in handling severe occlusions, where fully hidden signs cannot be detected. Complementing first-person view methods, ref. [98] introduced TSI-arch, a multi-task framework that detects, recognizes, and interprets traffic signs into natural language instructions from high-resolution road images. TSI-arch decomposes the task into sign detection, sign recognition, and sign interpretation modules, leveraging pixel-level segmentation, transformer-based text recognition, and large language model-based semantic reasoning to generate accurate, syntactically and semantically coherent instructions. Evaluated on the TSI-CN dataset, TSI-arch achieved superior performance across detection, recognition, and interpretation, including handling large variations in sign scale, complex guide panels, and multiple symbols per sign. Despite this robust performance, it relies on high-quality, high-resolution imagery, and its accuracy can be affected by adverse conditions such as low light, motion blur, or occlusions.
Collectively, these studies highlight the progressive evolution of NLP-based traffic analysis from social media mining to multimodal, vision–language semantic interpretation. They illustrate the increasing sophistication of language-driven approaches in transforming heterogeneous data into accurate, actionable intelligence for city management. A comparison of the studies is presented in Table 23 and Table 24.
6.14. Clustering
Clustering is an unsupervised learning method used to identify traffic incidents by searching for natural structures and patterns in data without the need for prior human labeling of incidents. The aim of this method is to divide traffic data—such as speed, density, flow, or spatiotemporal characteristics—into groups (clusters), where each cluster represents a typical traffic condition. Incidents are identified as anomalous clusters or outliers from normal traffic behavior.
The most commonly used algorithms in traffic incident detection include k-means, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), and hierarchical clustering. DBSCAN is particularly suitable because it labels points in low-density regions as noise. This allows atypical traffic situations to be flagged as outliers without defining the number of clusters in advance. These algorithms analyze multidimensional traffic data and separate normal patterns of smooth traffic from patterns characteristic of incidents, such as sudden speed reductions or sharp drops in flow.
Clustering is particularly effective in environments where typical traffic behavior is highly variable. The clusters are learned from the observed data and can be re-estimated as conditions change. Unlike traditional methods that use fixed thresholds, clustering methods enable adaptive incident detection based on actual traffic flow behavior, which can increase accuracy and reduce false alarms. Combined with artificial intelligence methods and graph models, clustering can form the basis of autonomous detection systems that require neither manual supervision nor static configuration.
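The following minimal Python sketch makes the noise-as-outlier idea concrete. It implements DBSCAN from scratch and applies it to a handful of hypothetical detector readings (speed in km/h, occupancy in %). The data, `eps`, and `min_pts` values are illustrative assumptions, not taken from any reviewed study. In practice the features would be normalized first, because DBSCAN's Euclidean neighborhood is sensitive to differing units.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    Straightforward O(n^2) version for illustration only.
    """
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reachable from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels


# Hypothetical (speed km/h, occupancy %) readings from one detector station.
readings = [
    (98, 9), (96, 10), (100, 8), (97, 9), (99, 10),  # free flow
    (42, 26), (45, 25), (40, 27), (43, 26),          # recurrent peak-hour congestion
    (12, 48),                                        # isolated anomaly
]
labels = dbscan(readings, eps=5.0, min_pts=3)
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Free flow and recurrent congestion each form their own dense cluster, so routine peak-hour slowdowns are not flagged. Only the isolated low-speed, high-occupancy reading is labeled -1 and becomes an incident candidate.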
The work of [99] proposed the General Potential Data Field with Spectral Clustering (GPDfSC) method to cluster vehicle trajectories. It was combined with Bayesian decision theory to detect anomalous traffic behavior, with the aim of identifying abnormal traffic patterns in heavy traffic without relying on manually labeled data. Their approach achieved a detection accuracy of 98.87%, outperforming traditional methods and demonstrating successful real-time implementation. The study of [100] applied Canopy K-Means clustering to classify traffic accident locations, using data preprocessing techniques such as normalization and removal of missing values. Its objective was to identify accident hotspots and segment areas according to risk for traffic safety planning. This method provided more accurate results than standard K-Means and effectively highlighted high-risk zones. Extending clustering to spatiotemporal traffic analysis, ref. [101] combined the K-Means algorithm with LOESS regression to define the spatial and temporal extent of traffic events based on real-world speed changes. Their model accurately delineated the areas affected by accidents and work zones without relying on static thresholds, demonstrating superior scalability and efficiency compared to traditional approaches. More recently, ref. [102] integrated dynamic time warping (DTW), a graph search algorithm (GSA), and self-organizing neural network maps for trajectory clustering and traffic event detection. The goal was to improve the speed and accuracy of incident identification through vehicle motion pattern analysis. This approach achieved a 44.8% reduction in computation time while maintaining a high success rate in real-time traffic event classification. Collectively, these studies illustrate the progressive application of clustering and pattern recognition techniques to traffic incident detection and risk assessment. They emphasize the importance of combining spatiotemporal analysis with scalable, real-time methodologies to enhance traffic safety and operational efficiency. The studies are compared in Table 25.
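Dynamic time warping, one of the components used in [102], measures the similarity of two sequences that may be locally stretched or shifted in time. The sketch below is the textbook dynamic-programming formulation applied to hypothetical one-dimensional speed profiles. It does not reproduce the graph search or self-organizing map stages of that study. Real trajectory clustering would typically compare multidimensional positions rather than scalar speeds.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance between 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between the two samples
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only (b sample repeated)
                cost[i][j - 1],      # advance in b only (a sample repeated)
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Hypothetical speed profiles (m/s) sampled at equal intervals.
braking = [30, 30, 20, 10, 0]
braking_earlier = [30, 20, 10, 0, 0]  # same manoeuvre, starting one step earlier
steady = [30, 30, 30, 30, 30]

lockstep = sum(abs(x - y) for x, y in zip(braking, braking_earlier))
print(lockstep)                                # 30
print(dtw_distance(braking, braking_earlier))  # 0.0
print(dtw_distance(braking, steady))           # 60.0
```

A point-by-point comparison penalizes the one-step shift between the two braking profiles. DTW aligns them perfectly while still separating braking from steady driving, which is why it suits clustering of vehicle motion patterns.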
6.15. Graph-Based Models
Graph-based models represent an advanced approach to traffic incident detection that reflects the actual structure of the traffic network as a system of interconnected segments and nodes. The traffic infrastructure is modeled as a graph in which nodes typically represent intersections or traffic sensors. Edges represent the connections between them, most often the road segments along which traffic flows. In some formulations, road segments themselves are the nodes and their adjacency defines the edges. This approach makes it possible to capture spatial dependencies and interactions between parts of the transport network that traditional linear models cannot adequately represent.
Modern graph models, especially Graph Neural Networks (GNNs), can process information at the level of the entire transport system. They can identify not only incidents on a single section but also how those incidents spread across the network. For example, an incident on one section of a highway affects traffic flow on connecting segments, which a graph model can capture by dynamically updating the states of nodes and edges. Beyond incident detection, these models also enable predictive analysis, estimating the probability of incidents occurring in neighboring sections based on local anomalies.
Graph models are often used in combination with convolutional and recurrent neural networks, resulting in spatio-temporal graph neural networks (ST-GNN). These analyze traffic as a spatio-temporal process, where incidents cause disruptions to the flow that spread over time and space. This enables these models to detect incidents at an early stage and predict their impact on the entire network, thereby significantly increasing the efficiency of traffic management.
The advantage of graph models is their ability to work with heterogeneous inputs, including data from sensors, GPS vehicle locations, camera systems, and information from social networks. In addition to incident detection, they are also used to optimize traffic management, identify congested sections, and design adaptive responses in real time.
The overall benefit of graph-based models is a holistic view of traffic events, which allows for the capture of complex interactions and provides more accurate, robust, and adaptive incident detection mechanisms than traditional methods.
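The neighbor-to-neighbor spread described above can be illustrated with the widely used graph convolution propagation rule of Kipf and Welling, `H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)`. The minimal sketch below makes three assumptions:

- a hypothetical four-segment corridor;
- a single hand-set anomaly score per node;
- an identity weight matrix in place of trained parameters.

It therefore shows only how information diffuses over the graph. It does not show how any of the reviewed models is trained.

```python
import math


def normalized_adjacency(n, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list of lists for an undirected graph."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]  # degrees including the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def propagate(a_hat, h, weight):
    """One graph-convolution step: ReLU(A_hat @ H @ W), H is n x f_in, W is f_in x f_out."""
    n, f_in, f_out = len(h), len(weight), len(weight[0])
    # Aggregate each node's features with those of its neighbors.
    agg = [[sum(a_hat[i][k] * h[k][c] for k in range(n)) for c in range(f_in)] for i in range(n)]
    # Linear transform followed by ReLU.
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(f_in))) for o in range(f_out)]
            for i in range(n)]


# Hypothetical corridor 0 - 1 - 2 - 3; only segment 1 shows an anomaly.
edges = [(0, 1), (1, 2), (2, 3)]
a_hat = normalized_adjacency(4, edges)
h0 = [[0.0], [1.0], [0.0], [0.0]]
w = [[1.0]]  # identity weight instead of a learned one
h1 = propagate(a_hat, h0, w)
h2 = propagate(a_hat, h1, w)
print([round(x[0], 3) for x in h1])  # [0.408, 0.333, 0.333, 0.0]
print([round(x[0], 3) for x in h2])  # [0.34, 0.389, 0.222, 0.136]
```

After one step, only the direct neighbors of segment 1 receive a nonzero score. After two steps, segment 3 is reached as well. Each stacked layer widens the receptive field by one hop, which is how deeper GNNs relate an incident to conditions further along the network.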
The study by [103] employed simulated incident traffic data to train a graph convolutional recurrent neural network (GCRNN), modeling the road network as a graph with nodes representing sensors and edges representing traffic links. The study aimed to improve traffic flow prediction during incidents, where traditional approaches often fail due to limited real-world data. The GCRNN model achieved lower mean absolute error (MAE) and root mean square error (RMSE) than LSTM and conventional methods. It effectively captured spatiotemporal dependencies and enabled accurate short-term predictions during traffic disruptions. In parallel, ref. [104] designed a spatio-temporal incident dynamic graph neural network (STIDGNN), constructing dynamic graphs from historical passenger flow data and applying convolutional and recurrent operations to extract complex spatiotemporal relationships. The model addressed the limitations of standard CNNs and RNNs in capturing dynamic interactions between transport nodes. It achieved lower prediction errors in both short-term and long-term forecasts, including during peak hours and special events. Complementing these approaches, ref. [105] introduced a Variational Graph Recurrent Neural Network (VGRAN) that combines graph convolutions with Bayesian deep learning, leveraging variational inference to model uncertainty in traffic forecasts. The VGRAN framework outperformed existing models such as DCRNN (Diffusion Convolutional Recurrent Neural Network) and Graph WaveNet on the METR-LA and PEMS-BAY datasets. It reduced RMSE and mean absolute percentage error (MAPE) while enabling probabilistic predictions suitable for safety-critical applications. In 2023, ref. [106] proposed Graph(Graph), a nested graph-based framework for early traffic accident anticipation using dashcam videos. The approach models interactions between detected objects within and across frames via spatio-temporal object and frame graphs, processed with graph convolutional and attention layers. By combining local object interactions with global frame features, the framework achieves precise and early predictions. It outperformed prior methods on the DAD (Dashcam Accident Dataset) and CCD datasets, with an AP (Average Precision) of 63.6% and a mean Time-to-Accident of 4.45 s. Limitations include dependence on object detection quality, short video clip lengths, and high computational requirements for graph processing. Collectively, these studies underscore the growing importance of graph-based neural network architectures for traffic prediction. They demonstrate their effectiveness in capturing spatiotemporal dependencies, modeling dynamic interactions, and providing robust, uncertainty-aware forecasts for incident-affected traffic conditions. A comparison of the studies is provided in Table 26.
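MAE, RMSE, and MAPE recur throughout the studies above. The sketch below computes them for hypothetical speed forecasts around a sharp incident-related drop. It skips zero-valued targets in MAPE, where the metric is undefined. That choice is an assumption: benchmarks differ in how they mask missing or zero readings.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error; penalizes large errors more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in %, skipping zero targets (assumed masking rule)."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(abs((t - p) / t) for t, p in pairs) / len(pairs)


# Hypothetical observed vs. forecast speeds (km/h) as an incident develops.
y_true = [60, 50, 40, 20]
y_pred = [58, 52, 40, 30]
print(mae(y_true, y_pred))            # 3.5
print(round(rmse(y_true, y_pred), 3))  # 5.196
print(round(mape(y_true, y_pred), 2))  # 14.33
```

The single 10 km/h miss at the sharpest drop inflates RMSE and MAPE much more than MAE. This is one reason incident-focused forecasting studies tend to report several error metrics side by side.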
6.16. Critical Assessment of Traffic Incident Detection Approaches
Traffic incident detection methods differ fundamentally in their assumptions, data dependencies, and responsiveness to dynamic traffic conditions. Statistical and rule-based approaches are computationally efficient and interpretable, enabling fast detection of predefined incident patterns. However, their reliance on fixed thresholds and handcrafted rules often results in high false alarm rates when traffic conditions deviate from expected norms, such as during weather-induced congestion or special events.
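A classic example of such fixed-threshold logic is the family of California comparative algorithms, which compare occupancy at an upstream and a downstream detector station. The sketch below follows the basic three-test structure commonly described for that family. The threshold values and occupancy readings are illustrative assumptions; operational deployments calibrate thresholds per site.

```python
def occupancy_incident_check(up_now, down_now, down_two_steps_ago, t1, t2, t3):
    """Return True when all three occupancy tests exceed their thresholds.

    up_now:             upstream station occupancy (%) at interval t
    down_now:           downstream station occupancy (%) at interval t
    down_two_steps_ago: downstream station occupancy (%) at interval t - 2
    t1, t2, t3:         site-specific calibration thresholds (hypothetical here)
    """
    if up_now <= 0 or down_two_steps_ago <= 0:
        return False  # avoid division by zero for empty detectors
    occdf = up_now - down_now      # absolute spatial occupancy difference
    occrdf = occdf / up_now        # relative spatial occupancy difference
    docctd = (down_two_steps_ago - down_now) / down_two_steps_ago  # relative drop downstream
    return occdf >= t1 and occrdf >= t2 and docctd >= t3


T1, T2, T3 = 10.0, 0.5, 0.3  # illustrative thresholds only

# Blockage between stations: queue builds upstream, downstream empties.
print(occupancy_incident_check(35, 8, 20, T1, T2, T3))   # True
# Recurrent congestion: both stations are busy, so no alarm.
print(occupancy_incident_check(35, 30, 28, T1, T2, T3))  # False
```

Because the thresholds are fixed, any non-incident event that produces a similar upstream/downstream occupancy contrast could trigger an alarm. One such event is a localized weather-induced slowdown, which is the false-alarm mechanism noted above.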
Machine learning-based incident detection approaches enhance adaptability by learning incident patterns from historical data. These methods can capture non-linear relationships between traffic variables and incident occurrences, improving detection accuracy in moderately complex scenarios. Nevertheless, they are sensitive to class imbalance, limited availability of labeled incident data, and temporal variability in traffic patterns. Models trained on historical data may fail to generalize to unseen incident types or evolving traffic behaviors.
Deep learning-based and spatio-temporal modeling approaches offer improved performance by leveraging temporal dependencies, trajectory information, and multimodal inputs. While these methods demonstrate superior detection capability, they introduce increased computational complexity and latency, which may conflict with real-time incident management requirements. Moreover, the scarcity of high-quality, event-level annotated datasets restricts robust evaluation and comparative benchmarking across methods.
A key trade-off in traffic incident detection lies between detection timeliness and reliability. Early detection systems may generate false alarms, whereas conservative models risk delayed responses with potentially severe consequences. As a result, incident detection frameworks increasingly emphasize probabilistic reasoning, multi-source validation, and confidence-aware decision-making to balance responsiveness and accuracy.
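A persistence rule is one simple way to expose the timeliness-versus-reliability trade-off. An alarm is raised only after an incident score has stayed at or above a threshold for `persistence` consecutive intervals. The scores, threshold, and persistence values below are hypothetical and chosen only to illustrate the effect.

```python
def first_alarm(scores, threshold, persistence):
    """Index of the interval at which an alarm is first raised, or None.

    An alarm fires once `persistence` consecutive scores are >= threshold.
    """
    run = 0  # length of the current streak of above-threshold scores
    for t, s in enumerate(scores):
        run = run + 1 if s >= threshold else 0
        if run >= persistence:
            return t
    return None


# Hypothetical per-interval incident scores: a transient spike at t=2,
# then a genuine incident starting at t=6.
scores = [0.1, 0.2, 0.9, 0.2, 0.1, 0.2, 0.8, 0.85, 0.9, 0.95]

print(first_alarm(scores, 0.7, 1))  # 2  (fast, but triggered by the spike)
print(first_alarm(scores, 0.7, 3))  # 8  (spike suppressed, two intervals late)
```

Raising `persistence` filters out short-lived noise at the cost of a fixed detection delay. Probabilistic or multi-source confirmation schemes aim to obtain the same filtering with less delay.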
7. Discussion
An analysis of the reviewed articles reveals clear trends in the development of traffic accident detection methods. Classical approaches, such as threshold-based measurements and statistical filters, provide excellent interpretability but lack flexibility in the face of dynamic changes in traffic. Although these techniques are effective under controlled conditions, they generalize poorly to complex traffic patterns or environmental disturbances. Advanced artificial intelligence methods, including predictive modeling, clustering, and graph-based architectures, consistently outperform classical methods in detection accuracy and response time. However, these gains are often obtained with models trained on datasets collected in favorable weather conditions, raising concerns about their robustness in practical deployment.
Beyond summarizing performance trends, a key objective of this discussion is to identify open research challenges that limit the real-world applicability of existing traffic detection methods and to outline potential mitigation strategies. While recent advances have significantly improved detection accuracy, the reviewed literature reveals persistent gaps related to environmental robustness, generalization across datasets, real-time constraints, and evaluation practices. Addressing these challenges is critical for transitioning traffic detection systems from controlled experimental settings to large-scale operational deployments.
To provide a clearer comparison across methods, Table 27 summarizes classical and AI-based traffic detection approaches, highlighting methodology, objective, performance results, and real-time capability. The real-time capability column allows readers to assess the practical deployment potential of each method under operational constraints.
Some authors have included adverse weather conditions in training and evaluating models, including research utilizing wavelet-based techniques [55] and Kalman filter extensions [51]. These studies demonstrate that environmental noise significantly increases the number of false alarms and reduces sensitivity unless models are specifically adapted or retrained for these conditions.
While Table 27 provides a structured comparison of classical and AI-based approaches, it also reveals that high detection accuracy alone is insufficient for practical deployment. Methods achieving strong benchmark performance often rely on computationally intensive architectures or assume stable sensing conditions. Simpler models, by contrast, remain attractive in real-time scenarios but struggle under complex traffic dynamics. This observation highlights a fundamental trade-off between accuracy, robustness, and computational efficiency that must be carefully balanced in real-world traffic detection systems.
From a system-level perspective, the reviewed studies indicate that traffic object detection and traffic incident detection should not be treated as independent components, but rather as tightly coupled stages within a unified detection pipeline. The performance limitations observed at the incident detection level are often rooted in upstream perception errors, including missed object detections, inaccurate localization, or unstable trajectory estimation. Such errors may propagate through the system, amplifying false alarms or leading to delayed incident recognition.
A critical trade-off emerges between detection accuracy and operational efficiency. High-capacity object detection models provide richer semantic information and improved recognition under complex traffic interactions, yet their computational demands may conflict with real-time incident detection requirements, particularly in large-scale or edge-based deployments. Conversely, lightweight object detection approaches enable low-latency processing but may lack the contextual detail required for reliable incident inference under occlusion or adverse weather conditions.
The reviewed literature further suggests that robustness to environmental variability must be addressed at both detection levels simultaneously. While some studies incorporate weather-aware object detection or sensor fusion strategies, incident detection models often assume stable upstream perception. This mismatch highlights the need for end-to-end optimization and confidence-aware decision mechanisms that explicitly account for uncertainty introduced at the object detection stage.
Beyond algorithmic design, the reviewed literature reveals substantial challenges related to evaluation practices and context-dependent performance assessment. Reported detection accuracies and response times are strongly influenced by dataset composition, traffic density, camera viewpoint, annotation granularity, and environmental conditions. As a result, performance metrics obtained under controlled benchmark settings often fail to capture the variability and uncertainty encountered in real-world traffic deployments.
Furthermore, evaluation protocols frequently rely on aggregate metrics, such as average precision or overall detection rate, which may obscure critical failure modes, including missed detections under occlusion, delayed incident recognition, or elevated false alarm rates during abnormal traffic conditions. In safety-critical applications, these limitations can significantly impact operational reliability, even when benchmark performance appears favorable.
The lack of standardized evaluation frameworks further complicates cross-study comparison. Differences in dataset splits, incident definitions, temporal evaluation windows, and performance thresholds limit the interpretability of reported results. Consequently, future research should emphasize context-aware evaluation strategies that explicitly account for environmental variability, traffic dynamics, and deployment constraints in order to provide more meaningful and actionable performance assessment.
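A short stratified evaluation shows how aggregate metrics can mask failure modes. The counts below are hypothetical and not drawn from any reviewed study. The detector finds 86 of 90 incidents in clear weather but only 4 of 10 in fog, yet it still reports an overall detection rate of 90%.

```python
from collections import defaultdict


def detection_rate_by_condition(records):
    """records: iterable of (condition, detected) pairs, one per ground-truth incident.

    Returns (overall detection rate, {condition: detection rate}).
    """
    totals, hits = defaultdict(int), defaultdict(int)
    for condition, detected in records:
        totals[condition] += 1
        hits[condition] += int(detected)
    overall = sum(hits.values()) / sum(totals.values())
    per_condition = {c: hits[c] / totals[c] for c in totals}
    return overall, per_condition


# Hypothetical evaluation log: 90 clear-weather incidents, 10 fog incidents.
records = [("clear", i < 86) for i in range(90)] + [("fog", i < 4) for i in range(10)]
overall, per_condition = detection_rate_by_condition(records)
print(overall)                                               # 0.9
print({c: round(r, 3) for c, r in per_condition.items()})   # {'clear': 0.956, 'fog': 0.4}
```

Reporting the per-condition breakdown alongside the aggregate value makes the weak fog performance visible. A single headline figure would otherwise hide it.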
These limitations indicate that future progress in traffic detection requires mitigation strategies that go beyond architectural improvements. Promising directions include multimodal sensor fusion to reduce sensitivity to adverse weather, uncertainty-aware inference to account for noisy upstream detections, lightweight model compression techniques for edge deployment, and standardized evaluation protocols that better reflect real-world operational conditions. Without such integrated solutions, performance gains reported under controlled benchmark settings are unlikely to translate into reliable field deployment.
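The sketch below is a minimal illustration of how sensor fusion can reduce sensitivity to degraded inputs, using inverse-variance weighting. It assumes independent, unbiased estimates whose noise variances are known. The example pairs a camera-based speed estimate that becomes noisy in fog with a radar estimate assumed here to be less affected. All values are hypothetical.

```python
def inverse_variance_fusion(estimates):
    """Fuse independent, unbiased estimates given as (value, variance) pairs.

    Returns (fused value, fused variance); noisier inputs receive
    proportionally less weight.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    fused = sum(w * value for w, (value, _) in zip(weights, estimates)) / total
    return fused, 1.0 / total


# Hypothetical speed estimates (km/h) for the same segment in fog.
camera = (70.0, 100.0)  # degraded visibility: large assumed variance
radar = (52.0, 4.0)     # assumed to be less affected by fog
fused, var = inverse_variance_fusion([camera, radar])
print(round(fused, 2), round(var, 2))  # 52.69 3.85
```

The unreliable camera reading barely shifts the fused estimate, and the fused variance is lower than that of either input. This property makes fusion attractive when one modality degrades under adverse weather. It holds only if the variance estimates themselves track the changing conditions.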
Beyond evaluation practices, several persistent challenges limit the practical deployment of traffic detection systems. Adverse weather conditions significantly degrade sensor data quality; these include rain, fog, snow, and low-light scenarios. They remain a major source of performance instability for both object detection and incident detection models. While some approaches incorporate weather-aware training or sensor fusion, robustness under diverse and rapidly changing environmental conditions remains insufficiently addressed.
Occlusion represents another critical challenge, particularly in dense urban traffic where overlapping vehicles, pedestrians, and infrastructure elements hinder reliable perception and tracking. Partial or prolonged occlusions frequently lead to fragmented trajectories and missed detections, which subsequently impair higher-level incident inference and risk assessment.
Scalability further constrains real-world applicability, as large-scale traffic monitoring systems must process high-resolution data streams from numerous sensors under strict real-time constraints. Increasing model complexity and data volume exacerbate computational and communication demands, especially in centralized architectures. These challenges highlight the necessity of distributed processing, edge-based inference, and adaptive resource management to enable scalable and resilient traffic detection solutions.
Despite these advances, most of the reviewed articles evaluate their models under normal weather conditions, pointing to a significant gap in the literature. With regard to real-world deployment, future research should prioritize environmental adaptability as a key performance criterion, incorporating multimodal sensor data and learning strategies that take weather conditions into account in order to improve resilience. The inclusion of adverse weather conditions is not only beneficial but essential for the development of next-generation intelligent transportation systems capable of reliably detecting incidents under all operating conditions.
In addition to accuracy and real-time performance, environmental adaptability is a critical consideration for AI-based traffic detection methods. Previous studies often report degradation under adverse conditions without disentangling the contributions of individual technical modules. In this work, we address this by performing targeted, module-oriented ablation experiments that assess how components such as the feature extraction backbone, residual blocks, and transfer learning configurations contribute to robustness under diverse environmental conditions [107]. This approach clarifies the technical pathway for “environmental intelligent integration” and highlights which modules most effectively mitigate performance loss in scenarios such as snow, rain, or partial occlusion.
More broadly, this analysis underscores the need to move beyond end-to-end performance reporting toward component-level understanding and system-aware evaluation, which remain largely underexplored in current traffic detection research.