Article

Safety in Smart Cities—Automatic Recognition of Dangerous Driving Styles

Department of Computer Science, Università degli Studi di Bari Aldo Moro, Via Orabona 4, 70125 Bari, Italy
* Author to whom correspondence should be addressed.
Information 2026, 17(1), 44; https://doi.org/10.3390/info17010044
Submission received: 24 November 2025 / Revised: 18 December 2025 / Accepted: 18 December 2025 / Published: 4 January 2026
(This article belongs to the Special Issue AI and Data Analysis in Smart Cities)

Abstract

Road safety is one of the most pressing concerns of modern urban life, and risky driving is the most prevalent cause of road crashes. In this paper, we present an automatic model for detecting hazardous driving behavior from external camera video, intended for use in smart cities. We address the problem with a holistic approach spanning data collection to the classification of hazardous driving behaviors, including zig-zag driving, risky overtaking, and speeding over a pedestrian crossing. Our strategy employs a purpose-built dataset covering diverse driving situations under varied traffic and lighting conditions. We propose a Multi-Speed Transformer model that operates on vehicle trajectory data at two timescales, capturing near-future actions in the context of longer-term driving trends. A further contribution is our symbiotic system, which not only detects unsafe driving but also triggers countermeasures through a continuous real-time loop with vehicle systems. Empirical results on our TRAF-derived corpus, comprising 18 videos and 414 labelled trajectory segments, show that the Multi-Speed Transformer reaches 97.5% accuracy and a 93% F1-score under the balanced-training protocol, consistently surpassing baselines such as Temporal Convolutional Networks and Random Forest classifiers evaluated on the same data splits and metrics. With the symbiotic framework, performance rises to 98.7% accuracy and a 95.5% F1-score. These results confirm the promise of pairing leading-edge neural architectures with symbiotic systems to enhance road safety in smart cities. The system's ability to detect risky driving behavior in real time and trigger mitigation offers a practical solution for accident prevention that preserves driver autonomy, striking a balance between automatic intervention and passive monitoring.

1. Introduction

Analysis of the driver’s state and activities is now crucial, since some of the main causes of road accidents are precisely inattention, aggressive maneuvers, and driver drowsiness, all of which constitute dangerous driving. Dangerous driving refers to abnormal behavior by vehicles and/or cyclists that could endanger road users and lead to road accidents. Examples of abnormal behavior include zig-zagging, a vehicle cutting across the path of other vehicles, reckless overtaking, sudden acceleration and braking, a motorcycle overtaking between two closely spaced vehicles, a vehicle driving at high speed close to a group of people, a vehicle failing to stop when pedestrians are crossing, vehicles traveling on the wrong side of the road, a vehicle turning onto a road when another vehicle is approaching, and so on. Although these behavioral patterns exhibit divergent kinematic trajectories, ranging from erratic zig-zagging maneuvers to abrupt braking events, the present framework groups them under a single binary label, “Dangerous”. This design choice deliberately prioritizes detection of safety-critical events, with an emphasis on high recall, so that the symbiotic protocol is triggered whenever an anomalous kinematic deviation is recognized, irrespective of its sub-category.
According to [1], traffic accidents cause significant health issues globally, killing or disabling 1.35 million people annually, especially in low-income countries, and may become the seventh leading cause of death by 2030. In this scenario, the aim of this work is to use artificial intelligence algorithms to recognize the behavioral biometrics of vehicles in order to detect dangerous driving. These statistics underline the gravity of the problem; moreover, the infrastructure disparity between high- and low-income countries calls for adaptable AI-driven solutions: in high-income nations such systems can integrate with autonomous vehicle stacks, while in low-income regions they can operate as cost-effective camera-based retrofit warning systems. Nevertheless, a critical gap remains in current research, since existing systems focus on detection in isolation; this paper addresses the need for a holistic loop that not only detects hazardous trajectories but also triggers immediate symbiotic countermeasures.
To this end, it is necessary to create an ad hoc dataset, carry out the pre-processing and feature extraction phases, and apply the techniques of machine learning and deep learning, together with the appropriate optimization of the parameters, in order to learn the pattern capable of classifying examples of normal driving and dangerous driving. Finally, a comparison is made between the performance obtained by the different algorithms applied, with the aim of identifying the algorithm with the highest performance. In this regard, in the panorama of the so-called “smart city”, a system capable of promptly detecting dangerous maneuvers and behavior can help to avoid road accidents, even fatal ones, by promptly identifying aggressive drivers and having the police intervene as soon as possible. Indeed, as reiterated in Rodriguez-Lopez et al. [2], drink-driving is characterized by sudden acceleration, lack of speed control, and long response times (about 1.8–2.3 s). According to some studies, there are small differences in abnormal driving depending on the state of the driver. For example, it was shown in Al-Sultan et al. [3] that driving while fatigued is similar to driving under the influence of alcohol but with different response times. In contrast, the reckless driver (i.e., without fatigue or the effect of alcohol) is awake and sober but may be impaired by pure mental factors. Furthermore, the integration of a symbiotic framework crafted to detect and rectify dangerous maneuvers represents a crucial advancement in fortifying vehicular safety protocols. The substantiation through real-world validation serves as compelling evidence of artificial intelligence’s ability to furnish vigilant oversight and invaluable aid, fostering a culture of trust and dependence on automated functionalities. Thus, the innovative elements of this study are:
  • The approach applied to recognize dangerous driving styles, using a camera outside the vehicle to monitor vehicle trajectories;
  • The dataset created ad hoc for this study from “in the wild” videos containing as many different scenarios and contexts of dangerous driving as possible;
  • The innovative deep learning techniques applied, including Transformer algorithms;
  • The integration of a symbiotic framework aimed at promptly identifying and rectifying hazardous maneuvers, thereby enhancing vehicle safety standards.
The paper is structured in the following way: Section 2 explores the current approaches adopted for detecting dangerous driving behaviors with reference to the techniques and datasets used. Section 3 and Section 4 describe the dataset chosen for this work, the related pre-processing procedures, and the machine learning and deep learning model selection. Section 5 presents the results, with markedly higher performance achieved by the deep learning models. Finally, Section 6 explores possible future developments and trends regarding safety in driving.

2. Related Work

Approaches to detecting dangerous driving behavior can be categorized into three main groups: (1) external camera-based methods that analyze vehicle trajectories, (2) in-vehicle camera systems that monitor driver behavior, and (3) sensor-based approaches that utilize vehicle telemetry data. Each approach offers distinct advantages and limitations for real-world applications in smart city environments. There are several approaches to detect dangerous driving behavior using video recordings or smartphone sensors such as GPS and accelerometers. Sensors installed in the car on the pedals are also used to detect the driver’s driving style based on acceleration and braking. The first approach, using video recorded by a camera, is divided according to the camera’s location: outside the vehicle to monitor movements and trajectories or inside to monitor driver behavior and movements (including eye-tracking and head movements).
Chandra et al. [4] present an algorithm to identify driver behavior through vehicle trajectories captured by an external camera. This approach assumes that road agents exhibit driving traits affecting neighboring agents’ trajectories. The GraphRQI algorithm achieves up to 25% improved accuracy over other driver behavior classification algorithms, classifying drivers into six classes: impatient, reckless, threatening, careful, cautious, and timid. It uses trajectories to form an unweighted, undirected graph representation, calculates the graph’s adjacency, degree, and Laplacian matrices, and employs the RQI algorithm for eigenvalues. The best performance is 78.3% weighted accuracy on the TRAF dataset and 89.9% on the ARGO dataset. However, performance depends on the accuracy of the tracking methods used.
Chandra et al. [5] use road agents’ trajectories to identify drivers’ general characteristics (conservative or aggressive) using the ApolloScape [6], NGSIM [7], and TRAF datasets. A novel LSTM-CNN hybrid network models interactions between different road agents, considering varying shapes, dynamics, and behaviors. The average RMSE of this approach is 0.78.
Narayanan et al. [8] classify vehicle behavior with a CNN based on scene recognition and temporal representation using LSTM. The dataset comprises 80 h of driving video from the San Francisco Bay Area. The study shows improved performance by analyzing scene context characteristics (e.g., weather, road conditions). The ResNet50-based experiments achieved a Mean Average Precision (MAP) of 0.42, exceeding the state of the art by 0.09.
Cheung et al. [9] identify driver behavior based on vehicle trajectories using the “Interstate 80 Freeway Dataset” [10]. The TDBM algorithm and MCMC sampling technique estimate vehicle trajectories and issue alerts if dangerous. The study considers driving styles: aggressive, reckless, threatening, careful, cautious, and timid. The average error in cross-validation is 0.75 for aggressive and 0.6 for cautious driving.
Chen et al. [11] use the “trajectory histogram” method (e.g., control point and speed histograms) to represent vehicle motion from external videos. The “Minimum Redundancy and Maximum Relevance (mRMR)” method selects representative trajectory histograms. The hybrid “Particle Swarm Optimisation-Support Vector Machine (PSO_SVM)” algorithm identifies dangerous driving, outperforming Naïve Bayesian Classifier, k-Nearest Neighbour, and decision tree methods. The accuracy with mRMR is 80.3%.
Hu et al. [12] use deep learning to classify driving behavior into normal, drunk/fatigue, reckless, and mobile phone use while driving. The SdsAEs (Stacked Denoising Sparse Autoencoders) model is used, trained in a greedy level-by-level manner with dropout to avoid overfitting. Data were collected from General Motors Corporation vehicles. The recall and precision for each class are: normal (96.14% and 95.04%), drunk/fatigue (88.80% and 90.91%), reckless (93.82% and 90.00%), and mobile phone use (91.8% and 94.82%). The medium recall is 92.66% and the medium precision is 92.69%.
Koetsier et al. [13] address detecting anomalies in vehicle trajectories and driving style using algorithms like SVM. The OCSVM technique was applied to the “Interaction” dataset, comprising three sets: “DR_USA_Intersection_EP”, “DR_USA_Roundabout_SR”, and “DR_USA_Intersection_MA”. Anomalies detected include speed-related (e.g., unreasonable stopping, driving too fast), position-related (e.g., driving in reverse, U-turns), and interaction-related (e.g., near collisions, ignoring right of way). The best performance, an AUC-ROC of 99.3%, was achieved using all three datasets for training and SR for testing.
Sun et al. [14] detect erratic lane-level driving on motorways using pole-mounted camera videos. The Kalman filter estimates vehicles’ positions and speeds to classify smooth and erratic driving.
Chen et al. [15] classify driving style into unstable speed, serpentine, and risky driving using an MOR threshold value determined by boxplots and value distributions. The dataset includes thirty 15 min videos from a Shanghai motorway. The recognition accuracy of risky driving behavior is 91% with boxplot-based methods and 86% with distribution-based methods.

2.1. Techniques Using Cameras Inside the Vehicle

Ji et al. [16] identify driver fatigue and distraction through facial expressions (e.g., eye closure duration, blink frequency). However, no performance data is reported. Jiang et al. [17] develop “DriverSonar” to detect eye closure, head and hand movements, and yawning, predicting dangerous driving with 93.2% accuracy and a 3.6% false acceptance rate.
You et al. [18] develop the “CarSafe” project, using head posture and eye movements to detect drowsiness and distraction and monitoring car distances for safe lane changes. Published data shows 85.71% accuracy for long blinks and 88.24% for short blinks.
Galarza et al. [19] propose a mobile phone camera-based system to detect and alert drivers of drowsiness, achieving 93.37% accuracy in natural lighting.

2.2. Techniques Using Sensors Inside the Vehicle

Aljaafreh et al. [20] classify driving behavior into below normal, normal, aggressive, and very aggressive based on two-axis accelerometer data. No performance data is provided. Meiring et al. [21] review techniques using sensors inside the vehicle to detect driving style based on steering wheel and pedal pressure and eye-tracking for distractions.

2.3. Psychological Studies on Dangerous Driving

Krahé and Fenske [22] examine aggression based on driver characteristics (e.g., age, gender, offence history). Young males driving powerful cars are the main unsafe driving protagonists. Feng et al. [23] associate driver features (age, gender, driving experience, personality, education) and environmental factors (weather, traffic situation, road infrastructure) with aggression levels. Surveys of 154 participants indicate aggressive driving is more frequent among young people with high-performance vehicles and during adverse traffic conditions.

2.4. Transformer-Based Trajectory Analysis

Recent advances have moved toward Transformer-based architectures to capture long-range dependencies in driving behavior. Liang et al. (2020) introduced LaneGCN [24], which uses a graph convolutional network to capture the complex topology of lane graphs, while Gao et al. (2020) proposed VectorNet [25], a hierarchical graph neural network for trajectory prediction. While such models excel at path forecasting, our Multi-Speed Transformer differs in that it focuses specifically on classifying behavioral aggression via dual-timescale processing rather than pure coordinate prediction.

Transformer Motion Forecasting vs. Behavior Recognition

In the recent autonomous driving literature, a large family of Transformer architectures is oriented mainly toward motion forecasting and multi-agent interaction modeling, usually with rich HD map priors and very large training datasets. Typical references are AgentFormer [26], Scene Transformer [27], AutoBots [28], Wayformer [29], and the Motion TRansformer family (MTR/MTR++) [30]. These systems are very effective at producing multimodal future trajectories, but their objective, input format (map layers plus sets of interacting agents), and training scale are not aligned with our scenario. Our focus is instead on safety-triggering classification from external-camera trajectories, where the main constraint is high recall on anomalous kinematic patterns rather than precise coordinate-level long-horizon prediction. For this reason, the proposed dual-timescale architecture is deliberately optimized to capture, in a single compact trajectory representation, both (i) high-frequency jerk and abrupt maneuvers and (ii) slower drift or weaving patterns that develop over longer temporal windows, all extracted from bounding-box trajectories, without requiring heavy map encoders or massive pre-training. Recent reviews on AI-enabled smart mobility emphasize that safety-focused smart-city services increasingly build on a tight integration of visual perception, trajectory understanding, and decision-making pipelines to support proactive risk mitigation at the urban scale; see [31].

3. Materials

After thorough research and comparison of state-of-the-art datasets, the TRAF dataset was selected, partially available in [32], containing videos of dangerous and normal driving. Firstly, the dataset was cleaned by selecting the most significant videos, discarding those poor in information and those recorded at bad angles that do not allow a complete view of the vehicles.
The resulting dataset consists of 18 videos with 414 examples, of which 348 are labeled as normal driving and 66 as dangerous driving, totaling approximately 45 min of video recording.
We clearly recognize that, in a deep learning perspective, the corpus with 414 labeled trajectory segments is rather small, so the study is intentionally presented as a proof-of-concept on a curated safety-critical dataset and not as a demonstration of general validity for all possible scenarios. To reduce overfitting effects in this small-data condition, (i) the test partitions are never oversampled or augmented, (ii) model selection is carried out only inside the training folds without looking at the test data, and (iii) regularization strategies such as dropout, weight decay, and early stopping are applied and described in detail in the Methods section.
The problem of an unbalanced dataset is present, as seen in Figure 1. This can lead to misclassifications when predicting new examples, as a classifier may learn a better pattern to classify normal driving examples and a less optimal pattern for dangerous driving examples due to fewer training examples of the latter class.
To improve classifier performance and facilitate dangerous driving classification, data augmentation was carried out using the random oversampling technique, duplicating only the minority class training set examples, leaving the test set unchanged. We keep a simple random oversampling strategy instead of introducing synthetic generation, mainly for a conservative and pragmatic reason: the network input is a temporally ordered vehicle trajectory and, if one applies naïve interpolation directly in the coordinate space, it is quite easy to produce artificial traces that are not kinematically feasible; for example, they may break non-holonomic constraints or they may show acceleration and curvature profiles that are clearly unrealistic from a vehicle dynamics point of view. This choice should not be read as a general refusal of synthetic balancing methods, but rather as an indication that one needs temporally aware and physics-constrained procedures when dealing with this type of sequential data. As an illustration, temporal-oriented oversampling schemes for time-series imbalance, such as T-SMOTE, explicitly encode temporal structure during the synthesis of new samples [33], while physically plausible augmentation can be obtained by keeping only those synthetic trajectories that satisfy feasibility checks derived from standard vehicle models, for instance, the kinematic bicycle formulation [34], which is widely used in autonomous driving.
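For clarity, the balancing step can be sketched as follows, assuming trajectory windows and labels are stored as NumPy arrays; the function and variable names are illustrative, not the actual project code:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def oversample_training_set(X_train, y_train, minority_label=1):
    """Duplicate minority-class training examples until the classes are
    balanced; the test partition is never passed through this function,
    matching the protocol described above."""
    minority_idx = np.flatnonzero(y_train == minority_label)
    majority_idx = np.flatnonzero(y_train != minority_label)
    n_extra = len(majority_idx) - len(minority_idx)
    if n_extra <= 0:
        return X_train, y_train
    # Draw (with replacement) the minority examples to duplicate.
    dup_idx = rng.choice(minority_idx, size=n_extra, replace=True)
    X_bal = np.concatenate([X_train, X_train[dup_idx]], axis=0)
    y_bal = np.concatenate([y_train, y_train[dup_idx]], axis=0)
    # Shuffle so the duplicates are not grouped at the end.
    perm = rng.permutation(len(y_bal))
    return X_bal[perm], y_bal[perm]
```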
By doing so, no synthetic information was added, avoiding distorted data that could compromise classification and fail to reflect the real data distribution; cross-validation further limited the risk of overfitting. Figure 2 illustrates sample frames from our dataset, showing both normal and dangerous driving scenarios. Each sample is annotated with bounding boxes and trajectory information, highlighting the distinctive movement patterns that characterize dangerous driving behaviors. The dataset includes diverse scenarios: urban intersections (42%), highways (35%), and suburban roads (23%), captured under various lighting conditions (daylight 65%, dusk/dawn 20%, and night 15%) to ensure robust model training for real-world applications.
Operational definitions and verification. Scenario labels are assigned at video level according to simple but explicit operational criteria, where urban intersection refers to multi-leg junctions that show clear yielding or traffic-light interactions between different traffic streams, highway describes multi-lane carriageways with mostly stable higher-speed flow and visible on-ramp or off-ramp structures, while suburban road is used for single-carriageway segments with medium traffic density and only a limited number of controlled junctions. Illumination labels follow a rule-based visual inspection: daylight is used for scenes fully exposed to sun with marked contrast, dusk/dawn covers frames with low-angle sun and globally reduced contrast, and night indicates sequences where artificial lights dominate and the background stays mostly dark.

4. Methods

This section describes the adopted approach, encompassing dataset verification, pre-processing, feature extraction, models, and evaluation.

4.1. Implementation Details and Hyperparameter Selection

All models are tuned by using only training-fold internal validation, and the held-out test split is never touched during model selection, so that the final evaluation remains clean. Hyperparameters are chosen by maximizing macro-F1, while at the same time we constantly monitor Dangerous-class recall, since this class is directly linked to safety-critical behavior and even small drops are not acceptable in practice. For transparency and basic reproducibility we report, for each model family, the final configuration together with the main search ranges that were explored in the internal validation stage.

4.2. Dataset Verification

The dataset used in this study is derived from the TRAF dataset, which includes videos of dangerous and normal driving scenarios. To verify the authenticity and reliability of the data, we ensured that the data collection methods were scientific and transparent. The TRAF dataset was collected using high-definition cameras installed at various urban locations, capturing diverse driving behaviors under different conditions. The data complies with privacy protection and legal requirements, having undergone thorough anonymization to remove any personally identifiable information. Additionally, we secured endorsements from relevant institutions and provided supporting documentation to validate the dataset’s integrity and authenticity.

4.3. Data Evaluation Methods

To evaluate the central tendency and dispersion of the data, we employed various statistical indicators including mean, median, standard deviation, and percentiles. Exploratory data analysis was conducted using statistical methods and visualization techniques to understand the basic characteristics and distribution of the data.
Even though the overall volume of 414 examples is small compared with general-purpose datasets (e.g., BDD100K), the corpus has a high density of verified dangerous maneuvers. In fact, general-purpose datasets are frequently unbalanced toward normal driving (>99%), while the proposed curated approach allows the model to focus on the distinct kinematic signatures of hazardous anomalies.
This included histograms, box plots, and scatter plots to visualize the spread and skewness of the data. We also compared the statistical indicators of our dataset with other publicly available driving behavior datasets to highlight any significant differences and provide explanations for these variations.

4.4. Pre-Processing Phase

The model chosen for object detection was YOLOv4, known for its high accuracy in real-time applications. YOLOv4 is capable of recognizing vehicles, cyclists, and pedestrians, and tracking them across frames, ensuring consistent ID assignment for each detected object.
For each input video, the dimensions of the bounding boxes were varied to capture not only the detected vehicle but also the surrounding environment, as dangerous driving may be influenced by nearby entities. This variation depends on the distance between the camera and the scene being captured, affecting the pixel size of the vehicles. Adjustments were made to the maximum Euclidean distance between the center points of bounding boxes for tracking the same vehicle across frames, accounting for the camera’s distance from the scene.
The coordinates of the bounding boxes, specifically the upper-left and lower-right points, were extracted. YOLOv4’s output includes frames of each video with bounding boxes and IDs for each vehicle. Additionally, a text file was generated for each video, detailing the IDs of vehicles detected in each frame and the coordinates of the respective bounding boxes. In particular, the bounding box expansion factor (α = 1.2) allows the model to implicitly encode the spatial relationship of the vehicle with respect to lane markings and curbs, which constitute critical visual cues for detecting weaving or off-road excursions. Moreover, multiscale feature aggregation has been reported to increase vehicle detection robustness across heterogeneous object scales and varying illumination, improving resilience to common camera artifacts, which supports the adoption of modern multiscale detectors within camera-based safety pipelines; see [36].
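As an illustration of these two pre-processing operations, the following sketch expands a detection box by the factor α and associates detections across frames with the nearest previous track center; the frame size, distance threshold, and function names are assumptions for illustration, not the exact tracking implementation used:

```python
import math

ALPHA = 1.2  # bounding box expansion factor reported above

def expand_box(x1, y1, x2, y2, alpha=ALPHA, frame_w=1920, frame_h=1080):
    """Grow a bounding box around its center so that nearby context
    (lane markings, curbs) is included, clipped to the frame."""
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = alpha * (x2 - x1) / 2, alpha * (y2 - y1) / 2
    return (max(0, cx - half_w), max(0, cy - half_h),
            min(frame_w, cx + half_w), min(frame_h, cy + half_h))

def match_track(prev_centers, new_center, max_dist):
    """Associate a detection with the nearest previous track center,
    if it lies within the scene-dependent Euclidean threshold."""
    best_id, best_d = None, max_dist
    for track_id, (px, py) in prev_centers.items():
        d = math.hypot(new_center[0] - px, new_center[1] - py)
        if d < best_d:
            best_id, best_d = track_id, d
    return best_id  # None -> start a new track
```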

4.5. Dataset Labeling

Post-processing of YOLO frames involved manual labeling of the dataset. A text file was created for each video, describing each vehicle on a line-by-line basis. Each entry included the vehicle ID, the frame interval in which it appears, and the class label (0 for normal driving, 1 for dangerous driving). Detailed and precise labeling was ensured by addressing cases where YOLO lost tracking of a vehicle due to abrupt movements or occlusion. In such instances, tracking was divided into multiple examples, calculating the frame intervals for each different ID assigned, and associating the correct class label for each segment.
To ensure ground-truth fidelity and attenuate observer-induced bias, annotation was performed with the Computer Vision Annotation Tool (CVAT) by three independent human annotators. Inter-annotator disagreements were resolved by majority consensus, yielding a Fleiss’ kappa inter-rater agreement of 0.82, which denotes strong consistency in the operational definition of hazardous kinematic patterns.
Annotation protocol. Each candidate trajectory segment is independently annotated by three human annotators with a binary label {Normal, Dangerous}. In our operational definition, a trajectory is marked as Dangerous when it clearly shows at least one of the following kinematic patterns: (i) zig-zag or weaving motion over several frames; (ii) risky overtaking with sudden lateral displacement when other agents are very close in space or time; or (iii) sustained speeding while crossing, or immediately passing, a pedestrian crossing. All remaining trajectories, where such patterns are not visible in a consistent way, are labeled as Normal. Borderline situations (for instance, stop-and-go congestion, very cautious yielding, long partial occlusions) are first tagged as uncertain; then they are re-checked in a joint session and resolved with a simple majority vote; segments for which a stable agreement is not reached are finally discarded from the labeled dataset, to avoid noisy supervision. The operational definitions, ambiguity handling rules, and the consensus procedure are fully specified in this subsection to support methodological transparency and independent audit.
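For reference, the agreement statistic quoted above can be computed with the standard Fleiss’ kappa formula; the following sketch (illustrative, not the annotation tooling actually used) operates on per-item category counts:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for n items rated by a fixed number of raters.

    counts: (n_items, n_categories) array; counts[i, j] is the number of
    annotators assigning category j to item i (here 2 categories,
    Normal/Dangerous, and 3 raters per item)."""
    counts = np.asarray(counts, dtype=float)
    n_items, _ = counts.shape
    n_raters = counts.sum(axis=1)[0]
    # Per-item observed agreement.
    p_i = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement from the marginal category proportions.
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    p_e = (p_j ** 2).sum()
    return (p_bar - p_e) / (1 - p_e)

# Example: 4 segments, 3 annotators each, binary labels.
print(fleiss_kappa([[3, 0], [0, 3], [2, 1], [3, 0]]))  # 0.625
```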
Further, text files were generated for each labeled vehicle, renamed to “idVehicle_class.txt”. These files contain the target class on the first line, followed by the bounding box coordinates for each frame where the vehicle appears. Each line corresponds to one frame, ensuring a comprehensive and detailed description of the vehicle’s movement across frames.

4.6. Data Organization

To ensure consistency, each labeled vehicle was described by 150 frames. Vehicles with more than 150 frames were subdivided into several examples, preserving the integrity of dangerous driving scenes to avoid fragmentation. For vehicles with fewer than 150 frames, the last frame’s description was duplicated to meet the frame count requirement. This approach ensured that the classifier received consistent input data, enhancing the model’s ability to learn from both normal and dangerous driving examples without introducing artificial patterns.
Finally, the coordinates were normalized with respect to the video’s resolution to standardize the input for the model training phase.
In detail, the spatial coordinates (x, y) are normalized to the range [0, 1] using the video resolution (W, H), computing x′ = x/W and y′ = y/H; the temporal dimension is fixed at 30 fps, and a sliding window with 50% overlap is used so that transitions between behaviors are captured.
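A minimal sketch of this normalization and windowing step, assuming bounding-box trajectories stored as NumPy arrays (names and the exact stride are illustrative of the 150-frame windows and 50% overlap described above):

```python
import numpy as np

WINDOW = 150   # frames per example (5 s at 30 fps)
STRIDE = 75    # 50% overlap between consecutive windows

def to_windows(boxes, width, height):
    """Normalize bounding-box coordinates by the video resolution and cut
    the trajectory into fixed-length, 50%-overlapping windows.

    boxes: (n_frames, 4) array of (x1, y1, x2, y2) pixel coordinates."""
    norm = boxes / np.array([width, height, width, height], dtype=float)
    # Pad short trajectories by repeating the last frame, as described above.
    if len(norm) < WINDOW:
        pad = np.repeat(norm[-1:], WINDOW - len(norm), axis=0)
        norm = np.concatenate([norm, pad], axis=0)
    windows = [norm[s:s + WINDOW]
               for s in range(0, len(norm) - WINDOW + 1, STRIDE)]
    return np.stack(windows)  # shape: (n_windows, 150, 4)
```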

4.7. Reproducibility Package and Evaluation Protocol

To support replicability under realistic data-sharing constraints, the reproducibility package is defined at the level of derived trajectory descriptors rather than raw video frames. Each labeled sample is stored in a plain-text descriptor file (idVehicle_class.txt) whose first line encodes the class label (0 normal, 1 dangerous), while the subsequent lines store the per-frame bounding-box coordinates for a fixed-length temporal window (150 frames at 30 fps). Coordinates are normalized to [ 0 , 1 ] using the original video resolution, as described above, so that the representation remains invariant to pixel scale.
The evaluation protocol is defined by (i) a stratified k-fold cross-validation procedure used for model selection and reporting of aggregate metrics, and (ii) a hold-out split used for complementary checks. All model selection steps, including hyperparameter tuning, are performed strictly inside training folds, while test partitions are never oversampled. To avoid leakage, the split assignment is performed at trajectory-segment level under stratification by class; the list of sample identifiers and their fold/split assignment is provided as a machine-readable artifact so that the exact same partitions can be reproduced by independent researchers.
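A minimal sketch of how such a machine-readable split artifact can be produced, assuming the descriptor files described above (whitespace-separated coordinates) and scikit-learn’s StratifiedKFold; file layout and names are illustrative:

```python
import numpy as np
from pathlib import Path
from sklearn.model_selection import StratifiedKFold

def load_descriptor(path):
    """Parse one idVehicle_class.txt file: the first line is the label
    (0 normal, 1 dangerous); each later line is one frame's normalized
    bounding box."""
    lines = Path(path).read_text().splitlines()
    label = int(lines[0].strip())
    coords = np.array([[float(v) for v in ln.split()] for ln in lines[1:]],
                      dtype=np.float32)
    return coords, label

def make_fold_assignments(folder, n_splits=5, seed=42):
    """Assign each trajectory segment to a fold, stratified by class, and
    emit a machine-readable artifact so the splits can be reproduced."""
    files = sorted(Path(folder).glob("*.txt"))
    y = np.array([load_descriptor(f)[1] for f in files])
    folds = np.empty(len(files), dtype=int)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for k, (_, test_idx) in enumerate(skf.split(np.zeros(len(y)), y)):
        folds[test_idx] = k
    with open("fold_assignments.csv", "w") as fh:
        fh.write("sample_id,label,fold\n")
        for f, lab, k in zip(files, y, folds):
            fh.write(f"{f.stem},{lab},{k}\n")
    return folds
```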

4.8. Models and Evaluation

For the learning phase, both machine learning and deep learning approaches were used, specifically the Random Forest, Multi-Speed Transformer, and Temporal Convolutional Network (TCN) classifiers. This allowed for performance comparisons between traditional and advanced methods.
The Random Forest is an ensemble learning algorithm based on decision trees. It creates an ensemble of decision trees, each trained on a random subset of the training dataset to reduce the risk of overfitting. In this experiment, the number of trees (up to 100) and the random state parameter were varied according to performance. The final prediction is obtained by aggregating the individual tree predictions. The Multi-Speed Transformer was specifically selected for this application due to its proven capability to model complex temporal relationships at different time scales, which is essential for distinguishing between normal driving patterns and the often sudden, erratic movements characteristic of dangerous driving. Unlike traditional RNN-based approaches that struggle with long-term dependencies, the Multi-Speed Transformer’s parallel processing pathways—operating at different temporal resolutions—enable it to simultaneously capture both immediate frame-to-frame variations indicative of sudden maneuvers and longer-term trajectory patterns revealing overall driving style. This dual-scale processing capability is particularly advantageous for dangerous driving detection, where behavioral patterns may manifest across different temporal scales.
The Multi-Speed Transformer architecture, adapted for this study from its original application in neurodegenerative disease assessment and activity recognition, is presented in Figure 2. It learns meaningful time-dependent correlations and patterns at two scales—fast and slow—by analyzing data in different resolutions. This adaptation involved methodological innovations to tailor the model for driver behavior analysis. Specifically, the architecture includes two parallel branches: the top branch performs 1D convolution with stride equal to 1 and a subsequently dilated convolution with dilation rate equal to 2, while the second branch’s first convolution has a stride of 3, allowing it to observe data at high resolution for fine features and at lower resolution for broader concepts [35]. The branches converge through a Positional Encoding Layer [37], which integrates positional signals into the model to enhance temporal understanding.
The principal novelty of this adaptation lies in the calibration of the temporal branches: the receptive field of the slow branch was adjusted to 3 s (about 90 frames) to capture long-term weaving, while the fast branch was tuned to 0.5 s (15 frames) to detect sudden jerks or braking. This differs from the original implementation, which focused on human skeletal action recognition.
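To make the dual-branch design concrete, the following PyTorch sketch reflects the description above (a stride-1 convolution followed by a dilation-2 convolution in the fast branch, a stride-3 convolution in the slow branch, positional encoding, and a Transformer encoder); channel widths, kernel sizes, and the token-concatenation fusion are illustrative assumptions, not the exact published configuration:

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=200):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(max_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float()
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, x):            # x: (batch, time, d_model)
        return x + self.pe[: x.size(1)]

class MultiSpeedTransformer(nn.Module):
    """Dual-timescale sketch: a fast branch (stride 1 + dilated conv) for
    sudden jerks, a slow branch (stride 3) for long-term weaving."""
    def __init__(self, in_ch=4, d_model=64, n_heads=4, n_layers=2, n_cls=2):
        super().__init__()
        self.fast = nn.Sequential(
            nn.Conv1d(in_ch, d_model, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, dilation=2, padding=2),
            nn.ReLU(),
        )
        self.slow = nn.Sequential(
            nn.Conv1d(in_ch, d_model, kernel_size=7, stride=3, padding=3),
            nn.ReLU(),
        )
        self.pos = PositionalEncoding(d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=128,
                                           dropout=0.1, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_cls)

    def forward(self, x):            # x: (batch, 150, 4) trajectory window
        x = x.transpose(1, 2)        # -> (batch, 4, 150) for Conv1d
        fast = self.fast(x)          # (batch, d_model, 150)
        slow = self.slow(x)          # (batch, d_model, 50)
        tokens = torch.cat([fast, slow], dim=2).transpose(1, 2)
        z = self.encoder(self.pos(tokens))
        return self.head(z.mean(dim=1))   # logits over {Normal, Dangerous}

# Smoke test on a random batch of 150-frame windows.
logits = MultiSpeedTransformer()(torch.randn(8, 150, 4))
print(logits.shape)  # torch.Size([8, 2])
```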
The TCN, a variant of convolutional neural networks, was also employed. TCN architecture consistently performs better than RNN, LSTM, and GRU architectures due to its longer memory capabilities and parallel processing advantages, adaptable in terms of receptive field size, gradient stability, and low memory requirements during training [35]. The architecture of this TCN is shown in Figure 3.
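For comparison, a minimal sketch of one TCN residual block with dilated causal convolutions, the building block of this baseline; channel counts and the dilation schedule are illustrative, not the exact configuration used:

```python
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    """One residual block of a Temporal Convolutional Network: two dilated
    causal convolutions with a skip connection."""
    def __init__(self, ch_in, ch_out, k=3, dilation=1, p_drop=0.1):
        super().__init__()
        self.pad = (k - 1) * dilation  # left-pad only -> causal convolution
        self.conv1 = nn.Conv1d(ch_in, ch_out, k, dilation=dilation)
        self.conv2 = nn.Conv1d(ch_out, ch_out, k, dilation=dilation)
        self.drop = nn.Dropout(p_drop)
        self.down = nn.Conv1d(ch_in, ch_out, 1) if ch_in != ch_out else nn.Identity()

    def forward(self, x):              # x: (batch, channels, time)
        y = torch.relu(self.conv1(nn.functional.pad(x, (self.pad, 0))))
        y = self.drop(torch.relu(self.conv2(nn.functional.pad(y, (self.pad, 0)))))
        return torch.relu(y + self.down(x))

# Stack blocks with exponentially growing dilation for a long receptive field.
tcn = nn.Sequential(TCNBlock(4, 32, dilation=1),
                    TCNBlock(32, 32, dilation=2),
                    TCNBlock(32, 32, dilation=4))
print(tcn(torch.randn(8, 4, 150)).shape)  # torch.Size([8, 32, 150])
```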
To address the concern about dataset imbalance and evaluate the effectiveness of data augmentation techniques, experiments were conducted using both the original and the balanced dataset. The balancing was achieved by duplicating only the training partition examples of the minority class (class “1” for dangerous driving) while leaving the test partition unchanged. This method was chosen to avoid introducing synthetic data that could distort real-world distribution patterns. Each test was performed using both stratified k-fold cross-validation and a 70/30 train–test split, with parameter optimization for each test.
Table 1 shows the best performance obtained using both the original unbalanced dataset and the balanced dataset. This comparison is crucial to demonstrate the impact of random oversampling on model performance.

4.9. Symbiotic Framework

The symbiotic framework is designed to work in tandem with the Multi-Speed Transformer model, aiming to promptly identify and rectify hazardous maneuvers, thus enhancing vehicle safety standards. The framework operates through the following components:

4.9.1. Real-Time Hazard Detection

This component leverages the Multi-Speed Transformer model to continuously analyze vehicular trajectories and detect hazardous driving behaviors in real time. The model processes sequential data to identify anomalies that indicate dangerous maneuvers.

4.9.2. Corrective Action Module

Upon detecting a hazardous maneuver, the framework initiates a corrective action module. This module communicates with the vehicle’s control systems to implement immediate corrective actions, such as adjusting speed, modifying the steering angle, or issuing alerts to the driver.

4.9.3. Feedback Loop

A feedback loop ensures that the corrective actions are effective and that the system learns from each incident. Data from corrective actions are fed back into the model to continuously improve its accuracy and responsiveness.

4.9.4. Integration with Vehicle Systems

The symbiotic framework is integrated with the vehicle’s onboard systems, including sensors and control units. This integration enables seamless communication between the AI model and the vehicle, ensuring timely and effective interventions.

4.9.5. Performance Evaluation Protocol

To evaluate the efficacy of the symbiotic framework without empirical vehicular trials, we used a Human-in-the-Loop (HITL) simulation executed on the designated validation corpus. The framework incorporates a confidence threshold mechanism (τ): predictions whose Softmax probability score falls below τ = 0.85 are designated “Uncertain/Risk”. Under the symbiotic protocol, these cases trigger a “Driver Alert” signal, reverting operational control to the human agent. Consequently, when computing the performance metrics reported in Table 2, flagged instances were deemed correctly managed (simulating a successful human intervention), which removes low-confidence false positives and false negatives. The design of this framework allows for adaptive learning and continuous improvement, making it a robust solution for enhancing vehicle safety. Figure 4 illustrates the architecture of the symbiotic framework.
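The decision logic of this HITL simulation can be summarized by the following sketch, where τ = 0.85 is the threshold reported above and the action names are illustrative:

```python
import numpy as np

TAU = 0.85  # confidence threshold used in the HITL simulation

def symbiotic_decision(softmax_probs):
    """Map the model's Softmax output for one trajectory window to a
    symbiotic-framework action (sketch of the protocol described above).

    softmax_probs: array [p_normal, p_dangerous], summing to 1."""
    confidence = softmax_probs.max()
    if confidence < TAU:
        # Low confidence: defer to the human driver.
        return "DRIVER_ALERT"        # treated as correctly managed in Table 2
    if softmax_probs[1] >= 0.5:
        return "CORRECTIVE_ACTION"   # e.g., speed adjustment via control unit
    return "NO_ACTION"

print(symbiotic_decision(np.array([0.20, 0.80])))  # DRIVER_ALERT (conf < tau)
print(symbiotic_decision(np.array([0.05, 0.95])))  # CORRECTIVE_ACTION
```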
Scope of safety evidence. In this work the symbiotic framework is described mainly as a conceptual closed-loop architecture, and the assessment is carried out only with recorded-data logic where the decision can be deferred to a human-in-the-loop; therefore, we do not perform real vehicle actuation experiments, we do not measure end-to-end latency in a strict way, and we do not run fault-injection campaigns or construct formal safety-case artifacts. For this reason the proposed framework should be read as an architectural concept, useful to reason on information flow and human interaction, and not as a deployment-ready functionality that one can directly put in production. A deployment-grade implementation would instead need a complete functional safety lifecycle, for example, following ISO 26262 [38], together with a Safety Of The Intended Functionality process (ISO 21448/SOTIF) [39], where hazard analysis, verification and validation steps, and an explicit operational design domain specification are all defined and checked in a systematic manner.
The symbiotic framework represents a significant advancement over traditional detection-only systems by creating a closed-loop process that not only identifies dangerous driving but also initiates corrective actions. The real-time hazard detection component processes vehicle trajectory data at 30 frames per second, applying the Multi-Speed Transformer model to identify anomalous patterns. Upon detection, the corrective action module generates appropriate responses based on the specific type and severity of the detected hazard, ranging from driver alerts for minor deviations to automatic speed adjustment for critical situations.
The feedback loop continuously evaluates the effectiveness of corrective actions, creating a self-improving system that adapts to new driving patterns and environmental conditions. This symbiotic relationship between detection and intervention represents a novel approach to vehicle safety systems that bridges the gap between passive monitoring and fully autonomous control, providing a balanced solution that enhances safety while maintaining driver agency in smart city environments.

5. Results

Table 1 shows the best performance obtained by splitting the dataset into training and test sets and in Stratified k-fold Cross Validation. In both cases, the original (so unbalanced) dataset and the dataset obtained by balancing only the training partition (leaving the test partition unchanged) were used.
The results clearly demonstrate the superior performance of the Multi-Speed Transformer architecture compared to the other models for recognizing dangerous driving behavior. The Multi-Speed Transformer achieves the highest accuracy at 97.5% on the balanced dataset, significantly outperforming the TCN and Random Forest models. This indicates that the Multi-Speed Transformer, with its dual-branch design and attention mechanisms, is highly effective in learning complex patterns in sequential driving data.
To understand the Multi-Speed Transformer’s behavior beyond global statistics, we examined the confusion matrix obtained on the balanced test set. The model correctly identified 64 of the 66 hazardous driving occurrences (true positives), yielding a recall of 97%. The two false negatives were both cases of “gradual drifting”, a pattern whose temporal kinematic signature resembles a protracted lane change. Conversely, the model produced five false positives (a precision of 92%), mostly within complex intersections, where conventional yielding was misread as an indecisive pause or a complete stop.
Let TN, FP, FN, and TP denote true negatives, false positives, false negatives, and true positives, respectively. The confusion matrix is
C = \begin{pmatrix} TN & FP \\ FN & TP \end{pmatrix}.
In the evaluated setting, the aggregated counts from the validation protocol are TP = 64, FN = 2, and FP = 5, with the remaining normal-driving segments counted as true negatives. The resulting class-conditional metrics are computed as Precision = TP/(TP + FP), Recall = TP/(TP + FN), and F1 = 2 · Precision · Recall/(Precision + Recall). Error inspection indicates that false negatives mainly occur in gradual drifting patterns with low instantaneous jerk, whereas false positives are more frequent in complex intersections, where yielding and brief hesitations can resemble anomalous deceleration.
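As a worked check of these formulas, the sketch below plugs in the counts from the error analysis above; TN is derived here under the assumption that each of the 348 normal-driving segments is evaluated once (TN = 348 − 5 = 343), so the fold-averaged figures in Table 1 may differ slightly from these point values:

```python
def binary_metrics(tp, fp, fn, tn):
    """Class-conditional metrics for the Dangerous class from the
    aggregated confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

# Counts from the error analysis above (TN derived, see lead-in).
print(binary_metrics(tp=64, fp=5, fn=2, tn=343))
# -> (~0.928 precision, ~0.970 recall, ~0.948 F1, ~0.983 accuracy)
```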
Quantitative comparisons are explicitly limited to baselines that we can run under exactly the same pre-processing, data splits, and evaluation metrics used in this study, namely Random Forest and TCN, so that the numerical gap is not confounded by external choices. Previously published results on TRAF-related settings (for instance, GraphRQI and closely related variants) are briefly discussed in the Related Work section; however, differences in tracking fidelity, trajectory pre-processing pipelines, and evaluation protocols make a strict head-to-head comparison unreliable in the current experimental configuration, so we treat them as qualitative context rather than entries on a direct scoreboard.
Although trajectory forecasting models such as LaneGCN (Liang et al., 2020) [24] are adept at extrapolating spatial coordinates, our comparative analysis consistently indicated that they are less suited to capturing the underlying semantic intention of aggressive behaviors. By contrast, the Multi-Speed Transformer’s attention heads are specifically calibrated to process high-frequency kinematic jerk signals in the “fast branch” and low-frequency path deviations in the “slow branch”, giving it a demonstrable classification advantage over predictors confined to coordinate-based forecasting.
These results validate the Multi-Speed Transformer design, leveraging dual-scale processing and attention modeling, as uniquely poised for excelling at sequence classification tasks like dangerous driving detection based on vehicular trajectory data. The significant gaps in performance metrics demonstrate the limitations of alternative models for this problem.
Additionally, we evaluated the impact of integrating the symbiotic framework on the model’s performance. The results in Table 2 demonstrate the improvement in accuracy, precision, recall, and F1-score when the symbiotic framework is employed.

6. Conclusions

This paper presents an AI-based approach for automatically detecting dangerous driving behavior from traffic videos to improve road safety in smart cities. An ad hoc dataset was constructed with diverse scenarios, and an advanced Multi-Speed Transformer deep learning architecture was employed, enhanced by a symbiotic framework to ensure real-time detection and rectification of hazardous maneuvers.
Based on the results and analysis, the Multi-Speed Transformer architecture demonstrates state-of-the-art performance for recognizing dangerous driving behavior from traffic video data. The dual-scale processing and attention mechanisms enable the Multi-Speed Transformer to learn intricate sequential relationships within vehicle maneuver data across both local and global contexts. This leads to significantly boosted accuracy of 97.5% and F1-scores up to 95.5% for detecting anomalous trajectories, outperforming other deep learning and machine learning approaches.
The proposed symbiotic framework should mainly be understood as an architectural concept that illustrates, in reasonably concrete terms, how hazardous-behavior recognition modules may be embedded in a human-in-the-loop intervention loop, where automatic decisions can still be checked or overruled by a human operator. At the present stage, validation is based entirely on offline, simulation-style experiments; it therefore does not provide a functional-safety argument and cannot be considered, in any strict sense, a complete SOTIF safety case. The framework ensures prompt corrective actions while keeping the driver safely in the loop, thus minimizing risky manual control periods. This human–machine cooperative approach represents a balanced pathway for integrating automation to enhance safety, rather than relying solely on either manual or fully autonomous systems.
The research clearly validates the promise of advanced neural architectures and the symbiotic framework for enabling robust driving analysis systems to improve road safety. However, real-world testing and evaluation of such intelligent systems are imperative before large-scale deployment in smart cities. Future work includes extensive testing of these dual-use driver assistance systems across diverse conditions, incorporating naturalistic driving datasets covering various lighting, weather, and traffic conditions. Monitoring model performance across these scenarios is crucial.
Furthermore, incorporating analysis of surrounding vehicles, pedestrians, and other contextual information can further boost prediction accuracy and reliability for real-world dangerous driving detection. Thorough testing across diverse operating conditions and expanding the scope of analysis are key directions that need to be pursued for developing industry-grade systems based on the promising Multi-Speed Transformer approach presented in this paper.
It is recognized that the symbiotic framework is essentially a conceptual validation, and future deployment requires rigorous compliance with functional safety standards such as ISO 26262 and SOTIF (ISO 21448). The transition from simulation-based results to physical intervention requires hardware-in-the-loop testing, which is fundamental for quantifying latency and guaranteeing that automated corrections do not introduce new hazards into the system dynamics as operational conditions change.
Responsible development demands proactive collaborations between researchers, authorities, manufacturers, and the public to assess benefits and risks in context. Communication, public oversight mechanisms, and governance will be vital to align progress with social values. By working together, we can translate innovations like the Multi-Speed Transformer and symbiotic framework into solutions that tangibly save lives.
Regulations and standardization around performance metrics and datasets will also be indispensable for facilitating adoption. Overall, this research marks an important milestone in leveraging state-of-the-art deep learning and symbiotic frameworks for automated safe driving assistance and oversight in future smart cities.
Future work is prioritized as follows: first, formal validation of the symbiotic framework against the ISO 26262 safety standard, which is paramount for ensuring fail-safe operation; second, expansion of the dataset to include adverse meteorological conditions, such as intense rain or dense fog, to robustly test sensing capabilities; and third, integration of pedestrian intention analysis as a subsequent, crucial step.

Author Contributions

Conceptualization, V.D., D.I., S.G. and G.P.; methodology, V.D., L.D.M., S.G. and D.I.; software, V.D. and L.D.M.; validation, S.G. and L.D.M.; investigation, V.D. and S.G.; writing—original draft preparation, V.D., S.G. and L.D.M.; writing—review and editing, S.G. and G.P.; supervision, G.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the project FAIR—Future AI Research (PE00000013), spoke 6—Symbiotic AI, under the NRRP MUR program funded by the NextGenerationEU.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The study is based on the TRAF dataset [32]. Where redistribution of raw videos is restricted, reproducibility is supported via derived trajectory descriptors (normalized bounding-box trajectories at 30 fps with fixed 150-frame windows), together with the corresponding labels and split assignments, as described in Section 4.7.

Acknowledgments

No generative AI has been used in writing this work. DeepL (www.deepl.com) was used for content checking and translations.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ahmed, S.K.; Mohammed, M.G.; Abdulqadir, S.O.; El-Kader, R.G.A.; El-Shall, N.A.; Chandran, D.; Rehman, M.E.U.; Dhama, K. Road traffic accidental injuries and deaths: A neglected global health issue. Health Sci. Rep. 2023, 6, e1240. [Google Scholar] [CrossRef] [PubMed]
  2. Rodríguez-López, J.; Rebollo-Sanz, Y.; Mesa-Ruiz, D. Hidden figures behind two-vehicle crashes: An assessment of the risk and external costs of drunk driving in Spain. Accid. Anal. Prev. 2019, 127, 42–51. [Google Scholar] [CrossRef]
  3. Al-Sultan, S.; Al-Bayatti, A.H.; Zedan, H. Context-aware driver behavior detection system in intelligent transportation systems. IEEE Trans. Veh. Technol. 2013, 62, 4264–4275. [Google Scholar] [CrossRef]
  4. Chandra, R.; Bhattacharya, U.; Mittal, T.; Li, X.; Bera, A.; Manocha, D. Graphrqi: Classifying driver behaviors using graph spectrums. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 4350–4357. [Google Scholar]
  5. Chandra, R.; Bhattacharya, U.; Bera, A.; Manocha, D. Traphic: Trajectory prediction in dense and heterogeneous traffic using weighted interactions. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8483–8492. [Google Scholar]
  6. Huang, X.; Cheng, X.; Geng, Q.; Cao, B.; Zhou, D.; Wang, P.; Lin, Y.; Yang, R. The apolloscape dataset for autonomous driving. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 954–960. [Google Scholar]
  7. Federal Highway Administration. Next Generation Simulation (NGSIM); Federal Highway Administration: Washington, DC, USA, 2022.
  8. Narayanan, A.; Dwivedi, I.; Dariush, B. Dynamic traffic scene classification with space-time coherence. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 5629–5635. [Google Scholar]
  9. Cheung, E.; Bera, A.; Kubin, E.; Gray, K.; Manocha, D. Identifying driver behaviors using trajectory features for vehicle navigation. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 3445–3452. [Google Scholar]
  10. Federal Highway Administration. Interstate 80 (I-80) Freeway Dataset; Federal Highway Administration Research and Technology; Federal Highway Administration: Washington, DC, USA, 2006.
  11. Chen, Z.; Wu, C.; Huang, Z.; Lyu, N.; Hu, Z.; Zhong, M.; Cheng, Y.; Ran, B. Dangerous driving behavior detection using video-extracted vehicle trajectory histograms. J. Intell. Transp. Syst. 2017, 21, 409–421. [Google Scholar] [CrossRef]
  12. Hu, J.; Zhang, X.; Maybank, S. Abnormal driving detection with normalized driving behavior data: A deep learning approach. IEEE Trans. Veh. Technol. 2020, 69, 6943–6951. [Google Scholar] [CrossRef]
  13. Koetsier, C.; Fiosina, J.; Gremmel, J.N.; Müller, J.P.; Woisetschläger, D.M.; Sester, M. Detection of anomalous vehicle trajectories using federated learning. ISPRS Open J. Photogramm. Remote Sens. 2022, 4, 100013. [Google Scholar] [CrossRef]
  14. Sun, R.; Ochieng, W.Y.; Feng, S. An integrated solution for lane level irregular driving detection on highways. Transp. Res. Part C Emerg. Technol. 2015, 56, 61–79. [Google Scholar] [CrossRef]
  15. Chen, S.; Xue, Q.; Zhao, X.; Xing, Y.; Lu, J.J. Risky driving behavior recognition based on vehicle trajectory. Int. J. Environ. Res. Public Health 2021, 18, 12373. [Google Scholar] [CrossRef]
  16. Ji, Q.; Zhu, Z.; Lan, P. Real-time nonintrusive monitoring and prediction of driver fatigue. IEEE Trans. Veh. Technol. 2004, 53, 1052–1068. [Google Scholar] [CrossRef]
  17. Jiang, H.; Hu, J.; Liu, D.; Xiong, J.; Cai, M. Driversonar: Fine-grained dangerous driving detection using active sonar. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2021, 5, 1–22. [Google Scholar] [CrossRef]
  18. You, C.W.; Montes-de Oca, M.; Bao, T.J.; Lane, N.D.; Lu, H.; Cardone, G.; Torresani, L.; Campbell, A.T. CarSafe: A driver safety app that detects dangerous driving behavior using dual-cameras on smartphones. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing, Pittsburgh, Pennsylvania, 5–8 September 2012; pp. 671–672. [Google Scholar]
  19. Galarza, E.E.; Egas, F.D.; Silva, F.M.; Velasco, P.M.; Galarza, E.D. Real time driver drowsiness detection based on driver’s face image behavior using a system of human computer interaction implemented in a smartphone. In Proceedings of the International Conference on Information Technology & Systems (ICITS 2018), Libertad City, Ecuador, 10–12 January 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 563–572. [Google Scholar]
  20. Aljaafreh, A.; Alshabatat, N.; Al-Din, M.S.N. Driving style recognition using fuzzy logic. In Proceedings of the 2012 IEEE International Conference on Vehicular Electronics and Safety (ICVES 2012), Istanbul, Turkey, 24–27 July 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 460–463. [Google Scholar]
  21. Meiring, G.A.M.; Myburgh, H.C. A review of intelligent driving style analysis systems and related artificial intelligence algorithms. Sensors 2015, 15, 30653–30682. [Google Scholar] [CrossRef] [PubMed]
  22. Krahé, B.; Fenske, I. Predicting aggressive driving behavior: The role of macho personality, age, and power of car. Aggress. Behav. Off. J. Int. Soc. Res. Aggress. 2002, 28, 21–29. [Google Scholar] [CrossRef]
  23. Feng, Z.X.; Liu, J.; Li, Y.Y.; Zhang, W.H. Selected model and sensitivity analysis of aggressive driving behavior. Zhongguo Gonglu Xuebao (China J. Highw. Transp.) 2012, 25, 106–112. [Google Scholar]
  24. Liang, M.; Yang, B.; Hu, R.; Chen, Y.; Liao, R.; Feng, S.; Urtasun, R. Learning lane graph representations for motion forecasting. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 541–556. [Google Scholar]
25. Gao, J.; Sun, C.; Zhao, H.; Shen, Y.; Anguelov, D.; Li, C.; Schmid, C. VectorNet: Encoding HD maps and agent dynamics from vectorized representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11525–11533. [Google Scholar]
  26. Yuan, Y.; Weng, X.; Ou, Y.; Kitani, K. AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9813–9823. [Google Scholar]
  27. Ngiam, J.; Vasudevan, V.; Caine, B.; Zhang, Z.; Chiang, H.T.L.; Ling, J.; Roelofs, R.; Bewley, A.; Liu, C.; Venugopal, A.; et al. Scene Transformer: A Unified Architecture for Predicting Multiple Agent Trajectories. arXiv 2021, arXiv:2106.08417. [Google Scholar] [CrossRef]
  28. Girgis, R.; Golemo, F.; Codevilla, F.; Weiss, M.; D’Souza, J.A.; Ebrahimi Kahou, S.; Heide, F.; Pal, C. Latent Variable Sequential Set Transformers for Joint Multi-Agent Motion Prediction. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual, 25–29 April 2022. [Google Scholar]
  29. Nayakanti, N.; Al-Rfou, R.; Zhou, A.; Goel, K.; Refaat, K.S.; Sapp, B. Wayformer: Motion Forecasting via Simple & Efficient Attention Networks. arXiv 2022, arXiv:2207.05844. [Google Scholar] [CrossRef]
  30. Shi, S.; Jiang, L.; Dai, D.; Schiele, B. MTR++: Multi-Agent Motion Prediction With Symmetric Scene Modeling and Guided Intention Querying. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 3955–3971. [Google Scholar] [CrossRef]
  31. Del-Coco, M.; Carcagnì, P.; Oliver, S.T.; Iskandaryan, D.; Leo, M. The Role of AI in Smart Mobility: A Comprehensive Survey. Electronics 2025, 14, 1801. [Google Scholar] [CrossRef]
  32. Chandra, R.; Bhattacharya, U.; Roncal, C.; Bera, A.; Manocha, D. RobustTP: End-to-End Trajectory Prediction for Heterogeneous Road-Agents in Dense Traffic with Noisy Sensor Inputs. arXiv 2019, arXiv:1907.08752. [Google Scholar]
  33. Zhao, P.; Luo, C.; Qiao, B.; Wang, L.; Rajmohan, S.; Lin, Q.; Zhang, D. T-SMOTE: Temporal-oriented Synthetic Minority Oversampling Technique for Imbalanced Time Series Classification. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23–29 July 2022; Raedt, L.D., Ed.; IJCAI: Montreal, QC, Canada, 2022; pp. 2406–2412. [Google Scholar] [CrossRef]
34. Polack, P.; Altché, F.; d’Andréa-Novel, B.; de La Fortelle, A. The kinematic bicycle model: A consistent model for planning feasible trajectories for autonomous vehicles. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017; pp. 812–818. [Google Scholar] [CrossRef]
  35. Cheriet, M.; Dentamaro, V.; Hamdan, M.; Impedovo, D.; Pirlo, G. Multi-speed transformer network for neurodegenerative disease assessment and activity recognition. Comput. Methods Programs Biomed. 2023, 230, 107344. [Google Scholar] [CrossRef]
  36. Li, A.; Ning, X.; Zöldy, M.; Chen, J.; Xu, G. Intelligent Vehicle Target Detection Algorithm Based on Multiscale Features. Sensors 2025, 25, 5084. [Google Scholar] [CrossRef]
  37. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010. [Google Scholar]
  38. ISO 26262:2018; Road Vehicles—Functional Safety. International Organization for Standardization: Geneva, Switzerland, 2018.
  39. ISO 21448:2022; Road Vehicles—Safety of the Intended Functionality (SOTIF). International Organization for Standardization: Geneva, Switzerland, 2022.
Figure 1. Distribution of classes in the original dataset.
Figure 2. Multi-Speed Transformer architecture [35].
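As a reading aid for Figure 2, the following is a minimal sketch of the dual-timescale idea: the same trajectory sequence is encoded by a full-rate stream and a temporally downsampled stream, and the two summaries are fused for binary classification. Layer sizes, the downsampling stride, the omission of positional encoding, and fusion by concatenation are illustrative assumptions, not the exact configuration of [35].

```python
import torch
import torch.nn as nn

class DualSpeedEncoder(nn.Module):
    """Illustrative two-stream encoder; positional encoding omitted for brevity."""
    def __init__(self, feat_dim=4, d_model=64, n_heads=4, n_layers=2, slow_stride=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.fast = nn.TransformerEncoder(layer, n_layers)  # full-rate stream: near-future cues
        self.slow = nn.TransformerEncoder(layer, n_layers)  # downsampled stream: extended trends
        self.slow_stride = slow_stride
        self.head = nn.Linear(2 * d_model, 2)  # binary output: Safe vs. Dangerous

    def forward(self, x):  # x: (batch, time, feat_dim)
        h = self.proj(x)
        fast = self.fast(h).mean(dim=1)                          # short-horizon summary
        slow = self.slow(h[:, ::self.slow_stride]).mean(dim=1)   # long-term summary
        return self.head(torch.cat([fast, slow], dim=-1))       # fuse by concatenation

logits = DualSpeedEncoder()(torch.randn(8, 64, 4))  # 8 segments, 64 time steps, 4 features
```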
Figure 3. The Temporal Convolution Architecture. The content of the green transparent box is repeated 17 times [35].
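Likewise, a minimal sketch of the repeated block in Figure 3: a dilated causal convolution with a residual connection, the standard TCN pattern. Channel width, kernel size, and the dilation schedule are illustrative assumptions; only the count of 17 stacked blocks comes from the figure caption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TCNBlock(nn.Module):
    """One dilated causal convolution block with a residual connection."""
    def __init__(self, channels=64, kernel_size=3, dilation=1):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation  # pad the past side only
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):  # x: (batch, channels, time)
        y = self.conv(F.pad(x, (self.left_pad, 0)))  # causal: no future leakage
        return torch.relu(y) + x                     # residual connection

# Stacking 17 blocks (as in Figure 3) with growing dilation widens the
# receptive field; this dilation schedule is an assumption.
tcn = nn.Sequential(*[TCNBlock(dilation=2 ** (i % 6)) for i in range(17)])
out = tcn(torch.randn(8, 64, 128))  # 8 segments, 64 channels, 128 time steps
```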
Figure 4. Symbiotic framework architecture.
Table 1. Results of the experiments.
Model | Dataset | Accuracy | Precision | Recall | F1-Score
Random Forest | Original | 79% | 72% | 68% | 68%
TCN | Original | 94% | 88% | 86% | 86%
Multi-Speed Transformer | Original | 95.1% | 89% | 90% | 89%
Random Forest | Balanced | 81% | 75% | 70% | 71%
TCN | Balanced | 96% | 91% | 89% | 90%
Multi-Speed Transformer | Balanced | 97.5% | 92% | 94% | 93%
Note: All models were evaluated with stratified k-fold cross-validation.
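For reproducibility, the following is a minimal sketch of the evaluation protocol noted above: stratified k-fold cross-validation with accuracy, precision, recall, and F1. The feature matrix X and labels y are random placeholders (414 segments matches the corpus size stated in the abstract), the Random Forest stands in for any of the compared models, and the fold count of 5 is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

X = np.random.rand(414, 32)              # placeholder trajectory-segment features
y = np.random.randint(0, 2, size=414)    # placeholder labels: 0 = Safe, 1 = Dangerous

scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    clf = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    p, r, f1, _ = precision_recall_fscore_support(y[test_idx], pred, average="binary")
    scores.append((accuracy_score(y[test_idx], pred), p, r, f1))

# Fold-averaged metrics, as reported per model/dataset pair in Table 1.
print("mean accuracy/precision/recall/F1:", np.mean(scores, axis=0))
```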
Table 2. Impact of symbiotic framework on model performance.
Symbiotic Framework | Accuracy | Precision | Recall | F1-Score
Without | 97.5% | 92% | 94% | 93%
With | 98.7% | 95% | 96% | 95.5%
Note: Multi-Speed Transformer on the balanced dataset, evaluated with stratified k-fold cross-validation.
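Finally, a minimal sketch of the closed loop behind Table 2 and Figure 4: each trajectory segment is classified, a detected "Dangerous" label triggers a countermeasure, and the vehicle's response is fed back as context for the next decision. The function names classify_segment and trigger_countermeasure are hypothetical placeholders, not the paper's API.

```python
def symbiotic_loop(segments, classify_segment, trigger_countermeasure):
    """Run detection and feed countermeasure outcomes back into the classifier."""
    feedback = None
    for segment in segments:
        label, confidence = classify_segment(segment, feedback)
        if label == "Dangerous":
            # Intervene without removing driver autonomy: e.g., warn first,
            # escalating only on repeated detections.
            feedback = trigger_countermeasure(segment, confidence)
        else:
            feedback = None  # reset once behavior is classified as safe
    return feedback

# Stub usage: the callbacks stand in for the trained model and the vehicle interface.
labels = iter(["Safe", "Dangerous", "Safe"])
symbiotic_loop(
    segments=[0, 1, 2],
    classify_segment=lambda seg, fb: (next(labels), 0.9),
    trigger_countermeasure=lambda seg, conf: {"warning_issued": True, "confidence": conf},
)
```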