1. Introduction
Analysis of the driver’s state and activities is nowadays crucial, since among the main causes of road accidents are, precisely, inattention, aggressive maneuvers, and driver drowsiness, all of which constitute dangerous driving. Dangerous driving refers to abnormal behavior by vehicles and/or cyclists that could endanger road users and lead to road accidents. Examples of abnormal behavior are zig-zagging movement, a vehicle cutting across the path of other vehicles, reckless overtaking, sudden acceleration and braking, a motorcycle overtaking between two vehicles close together, a vehicle driving at high speed close to a group of people, a vehicle failing to stop when pedestrians are crossing, transit of vehicles on the wrong side of the road, a vehicle turning onto a road when another vehicle is approaching, etc. Although these behavioral patterns exhibit divergent kinematic trajectories, ranging from erratic zig-zagging maneuvers to abrupt braking events, the present framework groups them under a single binary label, “Dangerous”. This design choice prioritizes detection of safety-critical events, with an emphasis on high recall, so that the symbiotic protocol is activated whenever an anomalous kinematic deviation is recognized, irrespective of its particular sub-category.
According to [
1], traffic accidents globally cause significant health issues, killing or disabling 1.35 million people annually, especially in low-income countries, and road injuries may become the seventh leading cause of death by 2030. In this scenario, the aim of this work is to use artificial intelligence algorithms to recognize the behavioral biometrics of vehicles in order to detect dangerous driving. These statistics underline the gravity of the problem; moreover, the infrastructure disparity between high- and low-income countries calls for adaptable AI-driven solutions: in high-income nations such systems can integrate with autonomous vehicle stacks, while in low-income regions they can operate as cost-effective, camera-based retrofit warning systems. Nevertheless, a critical gap remains in current research, since existing systems focus on detection in isolation; this paper addresses the need for a holistic loop that not only detects hazardous trajectories but also triggers immediate symbiotic countermeasures.
To this end, it is necessary to create an ad hoc dataset, carry out the pre-processing and feature extraction phases, and apply the techniques of machine learning and deep learning, together with the appropriate optimization of the parameters, in order to learn the pattern capable of classifying examples of normal driving and dangerous driving. Finally, a comparison is made between the performance obtained by the different algorithms applied, with the aim of identifying the algorithm with the highest performance. In this regard, in the panorama of the so-called “smart city”, a system capable of promptly detecting dangerous maneuvers and behavior can help to avoid road accidents, even fatal ones, by promptly identifying aggressive drivers and having the police intervene as soon as possible. Indeed, as reiterated in Rodriguez-Lopez et al. [
2], drink-driving is characterized by sudden acceleration, lack of speed control, and long response times (about 1.8–2.3 s). According to some studies, there are small differences in abnormal driving depending on the state of the driver. For example, it was shown in Al-Sultan et al. [
3] that driving while fatigued is similar to driving under the influence of alcohol but with different response times. In contrast, the reckless driver (i.e., neither fatigued nor under the influence of alcohol) is awake and sober but may be impaired by purely mental factors. Furthermore, integrating a symbiotic framework designed to detect and rectify dangerous maneuvers is a crucial advancement in strengthening vehicular safety protocols. Real-world validation provides compelling evidence of artificial intelligence’s ability to deliver vigilant oversight and valuable assistance, fostering trust in automated functionalities. Thus, the innovative elements of this study are:
The approach applied to recognize dangerous driving style, using a camera outside the vehicle to monitor its trajectory;
The dataset created ad hoc for this study from “in the wild” videos containing as many different scenarios and contexts of dangerous driving as possible;
The innovative deep learning techniques applied, including Transformer algorithms;
The integration of a symbiotic framework aimed at promptly identifying and rectifying hazardous maneuvers, thereby enhancing vehicle safety standards.
The paper is structured in the following way:
Section 2 explores the current approaches adopted for detecting dangerous driving behaviors with reference to techniques and datasets used.
Section 3 and
Section 4 explain the dataset chosen for this work and relative pre-processing procedures, and machine learning and deep learning model selection.
Section 5 shows the results achieved, with markedly better performance reached by the deep learning models. Finally,
Section 6 explores possible future developments and trends regarding safety in driving.
2. Related Work
Approaches to detecting dangerous driving behavior can be categorized into three main groups: (1) external camera-based methods that analyze vehicle trajectories, (2) in-vehicle camera systems that monitor driver behavior, and (3) sensor-based approaches that utilize vehicle telemetry data, such as smartphone GPS and accelerometer readings or pedal-mounted sensors that capture the driver’s acceleration and braking style. Each approach offers distinct advantages and limitations for real-world applications in smart city environments. Camera-based methods are further divided according to the camera’s location: outside the vehicle, to monitor movements and trajectories, or inside, to monitor driver behavior and movements (including eye-tracking and head movements).
Chandra et al. [
4] present an algorithm to identify driver behavior through vehicle trajectories captured by an external camera. This approach assumes that road agents exhibit driving traits affecting neighboring agents’ trajectories. The GraphRQI algorithm achieves up to 25% improved accuracy over other driver behavior classification algorithms, classifying drivers into six classes: impatient, reckless, threatening, careful, cautious, and timid. It uses trajectories to form an unweighted, undirected graph representation, calculates the graph’s adjacency, degree, and Laplacian matrices, and employs the RQI algorithm for eigenvalues. The best performance is 78.3% weighted accuracy on the TRAF dataset and 89.9% on the ARGO dataset. However, performance depends on the accuracy of the tracking methods used.
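As background, the two ingredients named here can be sketched in a few lines of NumPy: building the graph Laplacian from an adjacency matrix and running generic Rayleigh quotient iteration (RQI) for an eigenpair. This is the textbook algorithm, not the authors’ GraphRQI implementation.

```python
import numpy as np

def laplacian(adj):
    """Graph Laplacian L = D - A for an unweighted, undirected graph."""
    return np.diag(adj.sum(axis=1)) - adj

def rayleigh_quotient_iteration(L, v0, iters=20):
    """Generic RQI: converges cubically to an eigenpair of symmetric L."""
    v = v0 / np.linalg.norm(v0)
    mu = v @ L @ v                                   # initial Rayleigh quotient
    for _ in range(iters):
        try:
            w = np.linalg.solve(L - mu * np.eye(len(L)), v)
        except np.linalg.LinAlgError:                # (L - mu I) singular: converged
            break
        v = w / np.linalg.norm(w)
        mu = v @ L @ v                               # refine the eigenvalue estimate
    return mu, v
```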
Chandra et al. [
5] use road agents’ trajectories to identify drivers’ general characteristics (conservative or aggressive) using the ApolloScape [
6], NGSIM [
7], and TRAF datasets. A novel LSTM-CNN hybrid network models interactions between different road agents, considering varying shapes, dynamics, and behaviors. The average RMSE of this approach is 0.78.
Narayanan et al. [
8] classify vehicle behavior with a CNN based on scene recognition and temporal representation using LSTM. The dataset comprises 80 h of driving video from the San Francisco Bay Area. The study shows improved performance by analyzing scene context characteristics (e.g., weather, road conditions). The ResNet50-based experiments achieved a Mean Average Precision (MAP) of 0.42, exceeding the state of the art by 0.09.
Cheung et al. [
9] identify driver behavior based on vehicle trajectories using the “Interstate 80 Freeway Dataset” [
10]. The TDBM algorithm and MCMC sampling technique estimate vehicle trajectories and issue alerts if dangerous. The study considers driving styles: aggressive, reckless, threatening, careful, cautious, and timid. The average error in cross-validation is 0.75 for aggressive and 0.6 for cautious driving.
Chen et al. [
11] use the “trajectory histogram” method (e.g., control point and speed histograms) to represent vehicle motion from external videos. The “Minimum Redundancy and Maximum Relevance (mRMR)” method selects representative trajectory histograms. The hybrid “Particle Swarm Optimisation-Support Vector Machine (PSO_SVM)” algorithm identifies dangerous driving, outperforming Naïve Bayesian Classifier, k-Nearest Neighbour, and decision tree methods. The accuracy with mRMR is 80.3%.
Hu et al. [
12] use deep learning to classify driving behavior into normal, drunk/fatigue, reckless, and mobile phone use while driving. The SdsAEs (Stacked Denoising Sparse Autoencoders) model is trained in a greedy layer-wise manner with dropout to avoid overfitting. Data were collected from General Motors Corporation vehicles. The recall and precision for each class are: normal (96.14% and 95.04%), drunk/fatigue (88.80% and 90.91%), reckless (93.82% and 90.00%), and mobile phone use (91.8% and 94.82%). The mean recall is 92.66% and the mean precision is 92.69%.
Koetsier et al. [
13] address detecting anomalies in vehicle trajectories and driving style using algorithms like SVM. The OCSVM technique was applied to the “Interaction” dataset, comprising three sets: “DR_USA_Intersection_EP”, “DR_USA_Roundabout_SR”, and “DR_USA_Intersection_MA”. Anomalies detected include speed-related (e.g., unreasonable stopping, driving too fast), position-related (e.g., driving in reverse, U-turns), and interaction-related (e.g., near collisions, ignoring right of way). The best performance, an AUC-ROC of 99.3%, was achieved using all three datasets for training and SR for testing.
Sun et al. [
14] detect erratic lane-level driving on motorways using pole-mounted camera videos. The Kalman filter estimates vehicles’ positions and speeds to classify smooth and erratic driving.
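As background, a generic constant-velocity Kalman filter for per-vehicle position and speed estimation can be sketched as follows; this is the textbook formulation, not the authors’ exact configuration, and the frame rate and noise parameters are illustrative assumptions.

```python
import numpy as np

dt = 1 / 30                                  # assumed video frame interval (s)
F = np.array([[1, dt], [0, 1]])              # constant-velocity state transition
H = np.array([[1.0, 0.0]])                   # only position is observed
Q = np.eye(2) * 1e-3                         # process noise (illustrative)
R = np.array([[0.5]])                        # measurement noise (illustrative)

def kalman_step(x, P, z):
    """One predict/update cycle for state x = [position, speed]."""
    x = F @ x                                # predict state
    P = F @ P @ F.T + Q                      # predict covariance
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + (K @ (z - H @ x)).ravel()        # update with measurement z
    P = (np.eye(2) - K @ H) @ P
    return x, P
```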
Chen et al. [
15] classify driving style into unstable speed, serpentine, and risky driving using an MOR threshold value determined by boxplots and value distributions. The dataset includes thirty 15 min videos from a Shanghai motorway. The recognition accuracy of risky driving behavior is 91% with boxplot-based methods and 86% with distribution-based methods.
3. Materials
After thorough research and comparison of state-of-the-art datasets, the TRAF dataset was selected, partially available in [
32], containing videos of dangerous and normal driving. Firstly, the dataset was cleaned by selecting the most significant videos, discarding those with little information and those recorded at angles that prevent a complete view of the vehicles.
The resulting dataset consists of 18 videos with 414 examples, of which 348 are classified as normal driving and 66 as dangerous driving, totaling approximately 45 min of video recording.
We clearly recognize that, in a deep learning perspective, the corpus with 414 labeled trajectory segments is rather small, so the study is intentionally presented as a proof-of-concept on a curated safety-critical dataset and not as a demonstration of general validity for all possible scenarios. To reduce overfitting effects in this small-data condition, (i) the test partitions are never oversampled or augmented, (ii) model selection is carried out only inside the training folds without looking at the test data, and (iii) regularization strategies such as dropout, weight decay, and early stopping are applied and described in detail in the Methods section.
The problem of an unbalanced dataset is present, as seen in
Figure 1. This can lead to misclassifications when predicting new examples, as a classifier may learn a better pattern to classify normal driving examples and a less optimal pattern for dangerous driving examples due to fewer training examples of the latter class.
To improve classifier performance and facilitate dangerous driving classification, data augmentation was carried out using the random oversampling technique, duplicating only the minority-class training examples and leaving the test set unchanged. We keep simple random oversampling instead of synthetic generation for a conservative, pragmatic reason: the network input is a temporally ordered vehicle trajectory, and naïve interpolation directly in coordinate space can easily produce artificial traces that are not kinematically feasible, for example by violating non-holonomic constraints or exhibiting acceleration and curvature profiles that are unrealistic from a vehicle-dynamics standpoint. This choice is not a general rejection of synthetic balancing methods; rather, it indicates that temporally aware and physics-constrained procedures are needed for this type of sequential data. As an illustration, temporal oversampling schemes for time-series imbalance, such as T-SMOTE, explicitly encode temporal structure during the synthesis of new samples [
33], while physically plausible augmentation can be obtained by keeping only those synthetic trajectories that satisfy feasibility checks derived from standard vehicle models, for instance, the kinematic bicycle formulation [
34], which is widely used in autonomous driving.
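By way of illustration, a feasibility filter derived from the kinematic bicycle model could be sketched as follows; the wheelbase and the speed, acceleration, and steering limits are hypothetical placeholder values, and this filter is not part of the pipeline used in this work.

```python
import numpy as np

def is_kinematically_feasible(xy, dt=1/30, wheelbase=2.7,
                              max_speed=55.0, max_accel=8.0,
                              max_steer=np.radians(35)):
    """Heuristic feasibility check for a synthetic trajectory.

    xy: (T, 2) array of vehicle positions in metres.
    Returns False if the trace implies speeds, accelerations, or
    steering angles outside plausible vehicle limits (placeholder values).
    """
    v = np.linalg.norm(np.diff(xy, axis=0), axis=1) / dt        # speed per step
    if v.max() > max_speed:
        return False
    a = np.abs(np.diff(v)) / dt                                 # longitudinal accel
    if a.size and a.max() > max_accel:
        return False
    heading = np.arctan2(np.diff(xy[:, 1]), np.diff(xy[:, 0]))  # heading per step
    yaw_rate = np.abs(np.diff(np.unwrap(heading))) / dt
    # kinematic bicycle: yaw_rate = (v / L) * tan(steer), so the trace
    # implies steer = arctan(yaw_rate * L / v); reject infeasible angles
    steer = np.arctan2(yaw_rate * wheelbase, np.maximum(v[1:], 1e-6))
    return bool(steer.max() <= max_steer) if steer.size else True
```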
By restricting augmentation to plain duplication, no synthetic information was added, avoiding distorted data that could compromise classification or misrepresent the real data distribution; overfitting was further mitigated through cross-validation.
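For concreteness, a minimal sketch of the duplication step, assuming examples stored in NumPy arrays with binary labels; it is applied to the training partition only, so the test partition keeps its natural distribution.

```python
import numpy as np

def oversample_training_set(X_train, y_train, seed=42):
    """Randomly duplicate minority-class examples until classes match.

    Applied to the training partition only; the test partition is
    never resampled, so reported metrics reflect the true distribution.
    """
    rng = np.random.default_rng(seed)
    minority = 1 if (y_train == 1).sum() < (y_train == 0).sum() else 0
    idx_min = np.flatnonzero(y_train == minority)
    n_needed = len(y_train) - 2 * len(idx_min)   # copies required for balance
    extra = rng.choice(idx_min, size=n_needed, replace=True)
    X_bal = np.concatenate([X_train, X_train[extra]])
    y_bal = np.concatenate([y_train, y_train[extra]])
    return X_bal, y_bal
```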
Figure 2 illustrates sample frames from our dataset, showing both normal and dangerous driving scenarios. Each sample is annotated with bounding boxes and trajectory information, highlighting the distinctive patterns of movement that characterize dangerous driving behaviors. The dataset includes diverse scenarios: urban intersections (42%), highways (35%), and suburban roads (23%), captured under various lighting conditions—daylight (65%), dusk/dawn (20%), and night (15%)—to ensure robust model training for real-world applications.
Operational definitions and verification. Scenario labels are assigned at video level according to simple but explicit operational criteria, where urban intersection refers to multi-leg junctions that show clear yielding or traffic-light interactions between different traffic streams, highway describes multi-lane carriageways with mostly stable higher-speed flow and visible on-ramp or off-ramp structures, while suburban road is used for single-carriageway segments with medium traffic density and only a limited number of controlled junctions. Illumination labels follow a rule-based visual inspection: daylight is used for scenes fully exposed to sun with marked contrast, dusk/dawn covers frames with low-angle sun and globally reduced contrast, and night indicates sequences where artificial lights dominate and the background stays mostly dark.
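Although the illumination labels were assigned by rule-based visual inspection, the spirit of the rules can be sketched as a brightness/contrast heuristic; the thresholds below are illustrative assumptions, not calibrated values from this study.

```python
import cv2

def illumination_label(frame_bgr, bright_thr=110, dark_thr=50, contrast_thr=40):
    """Rule-of-thumb illumination tag from mean luma and contrast.

    Thresholds are illustrative and would need tuning per camera setup.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    mean, std = float(gray.mean()), float(gray.std())
    if mean >= bright_thr and std >= contrast_thr:  # well-lit, marked contrast
        return "daylight"
    if mean <= dark_thr:                            # artificial lights dominate
        return "night"
    return "dusk/dawn"                              # low-angle sun, low contrast
```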
4. Methods
This section describes the adopted approach, encompassing dataset verification, pre-processing, feature extraction, models, and evaluation.
4.5. Dataset Labeling
Post-processing of the YOLO output involved manual labeling of the dataset. A text file was created for each video, describing each vehicle on a line-by-line basis. Each entry included the vehicle ID, the frame interval in which it appears, and the class label (0 for normal driving, 1 for dangerous driving). Detailed and precise labeling was ensured by addressing cases where YOLO lost track of a vehicle due to abrupt movements or occlusion. In such instances, the track was divided into multiple examples, calculating the frame intervals for each different ID assigned and associating the correct class label with each segment.
To establish ground-truth fidelity and mitigate observer bias, annotation was performed with the Computer Vision Annotation Tool (CVAT) by three independent human annotators. Inter-annotator disagreements were resolved by a majority-consensus scheme, and the resulting Fleiss’ kappa inter-rater agreement coefficient indicated strong consistency in the operational definition of hazardous maneuvers.
Annotation protocol. Each candidate trajectory segment is independently annotated by three human annotators with a binary label {Normal, Dangerous}. In our operational definition, a trajectory is marked as Dangerous when it clearly shows at least one of the following kinematic patterns: (i) zig-zag or weaving motion over several frames; (ii) risky overtaking with sudden lateral displacement when other agents are very close in space or time; or (iii) sustained speeding while crossing, or immediately passing, a pedestrian crossing. All remaining trajectories, where such patterns are not visible in a consistent way, are labeled as Normal. Borderline situations (for instance, stop-and-go congestion, very cautious yielding, long partial occlusions) are first tagged as uncertain; then they are re-checked in a joint session and resolved with a simple majority vote; segments for which a stable agreement is not reached are finally discarded from the labeled dataset, to avoid noisy supervision. The operational definitions, ambiguity handling rules, and the consensus procedure are fully specified in this subsection to support methodological transparency and independent audit.
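For reference, the agreement statistic can be computed from the raw annotation matrix with statsmodels; the matrix below is a hypothetical illustration with one row per trajectory segment and one column per annotator.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical annotation matrix: rows = trajectory segments,
# columns = the three annotators, entries = 0 (Normal) / 1 (Dangerous).
ratings = np.array([
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, 0],   # a disagreement, later resolved by majority vote
    [0, 0, 0],
])

table, _ = aggregate_raters(ratings)    # per-item counts for each category
print("Fleiss' kappa:", fleiss_kappa(table, method="fleiss"))
```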
Further, text files were generated for each labeled vehicle, renamed to “idVehicle_class.txt”. These files contain the target class on the first line, followed by the bounding box coordinates for each frame where the vehicle appears. Each line corresponds to one frame, ensuring a comprehensive and detailed description of the vehicle’s movement across frames.
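A minimal reader for these per-vehicle files is sketched below; the exact layout of the bounding-box values within each line is an assumption (whitespace-separated x, y, width, height).

```python
import numpy as np

def load_vehicle_file(path):
    """Parse an 'idVehicle_class.txt' file.

    First line: target class (0 normal, 1 dangerous).
    Remaining lines: one bounding box per frame, assumed to be
    whitespace-separated 'x y w h' values.
    """
    with open(path) as f:
        label = int(f.readline().strip())
        boxes = np.array([[float(v) for v in line.split()]
                          for line in f if line.strip()])
    return label, boxes  # boxes: (n_frames, 4)
```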
4.8. Models and Evaluation
For the learning phase, both machine learning and deep learning approaches were used, specifically the Random Forest, Multi-Speed Transformer, and Temporal Convolutional Network (TCN) classifiers. This allowed for performance comparisons between traditional and advanced methods.
The Random Forest is an ensemble learning algorithm based on decision trees. It creates an ensemble of decision trees, each trained on a random subset of the training dataset to reduce the risk of overfitting. In this experiment, the number of trees (up to 100) and the random state parameter were varied according to performance. The final prediction is obtained by aggregating the individual tree predictions. The Multi-Speed Transformer was specifically selected for this application due to its proven capability to model complex temporal relationships at different time scales, which is essential for distinguishing between normal driving patterns and the often sudden, erratic movements characteristic of dangerous driving. Unlike traditional RNN-based approaches that struggle with long-term dependencies, the Multi-Speed Transformer’s parallel processing pathways—operating at different temporal resolutions—enable it to simultaneously capture both immediate frame-to-frame variations indicative of sudden maneuvers and longer-term trajectory patterns revealing overall driving style. This dual-scale processing capability is particularly advantageous for dangerous driving detection, where behavioral patterns may manifest across different temporal scales.
The Multi-Speed Transformer architecture, adapted for this study from its original application in neurodegenerative disease assessment and activity recognition, is presented in
Figure 2. It learns meaningful time-dependent correlations and patterns at two scales—fast and slow—by analyzing data in different resolutions. This adaptation involved methodological innovations to tailor the model for driver behavior analysis. Specifically, the architecture includes two parallel branches: the top branch performs 1D convolution with stride equal to 1 and a subsequently dilated convolution with dilation rate equal to 2, while the second branch’s first convolution has a stride of 3, allowing it to observe data at high resolution for fine features and at lower resolution for broader concepts [
35]. The branches converge through a Positional Encoding Layer [
37], which integrates positional signals into the model to enhance temporal understanding.
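A schematic PyTorch sketch consistent with this description is given below; layer widths, the exact fusion of the two branches (here, concatenation along the time axis), and the classification head are illustrative assumptions rather than the authors’ exact configuration.

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Standard sinusoidal positional encoding."""
    def __init__(self, d_model, max_len=2048):
        super().__init__()
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, x):                       # x: (B, T, d_model)
        return x + self.pe[: x.size(1)]

class MultiSpeedTransformer(nn.Module):
    """Schematic dual-branch model: fast branch (stride 1, then dilation 2)
    and slow branch (first conv with stride 3), fused before self-attention."""
    def __init__(self, in_ch=4, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.fast = nn.Sequential(              # high temporal resolution
            nn.Conv1d(in_ch, d_model, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv1d(d_model, d_model, 3, dilation=2, padding=2), nn.ReLU(),
        )
        self.slow = nn.Sequential(              # coarser resolution
            nn.Conv1d(in_ch, d_model, 3, stride=3), nn.ReLU(),
        )
        self.pos = PositionalEncoding(d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 2)       # normal vs. dangerous

    def forward(self, x):                       # x: (B, in_ch, T)
        tokens = torch.cat([self.fast(x), self.slow(x)], dim=2)  # join in time
        z = self.encoder(self.pos(tokens.transpose(1, 2)))
        return self.head(z.mean(dim=1))         # mean-pool, then classify
```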
The principal novelty of this adaptation is the calibration of the temporal branches: the receptive field of the slow branch was set to 3 s (about 90 frames) to capture long-term weaving, while the fast branch was tuned to 0.5 s (15 frames) to detect sudden jerks or braking. This differs from the original implementation, which focused on human skeletal action recognition.
The TCN, a variant of convolutional neural networks, was also employed. TCN architectures have been shown to outperform RNN, LSTM, and GRU architectures thanks to longer effective memory and parallelizable convolutions; they are also adaptable in receptive field size, exhibit stable gradients, and have low memory requirements during training [
35]. The architecture of this TCN is shown in
Figure 3.
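For reference, a minimal sketch of the standard TCN building block, dilated causal convolutions wrapped in a residual unit, is shown below; the channel sizes and dilation schedule are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    """Residual block with dilated causal convolutions (sketch)."""
    def __init__(self, in_ch, out_ch, kernel=3, dilation=1, p_drop=0.2):
        super().__init__()
        self.pad = (kernel - 1) * dilation            # left-pad for causality
        self.conv1 = nn.Conv1d(in_ch, out_ch, kernel, dilation=dilation)
        self.conv2 = nn.Conv1d(out_ch, out_ch, kernel, dilation=dilation)
        self.drop = nn.Dropout(p_drop)
        self.skip = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def _causal(self, conv, x):
        return conv(nn.functional.pad(x, (self.pad, 0)))  # pad the past only

    def forward(self, x):                             # x: (B, C, T)
        h = self.drop(torch.relu(self._causal(self.conv1, x)))
        h = self.drop(torch.relu(self._causal(self.conv2, h)))
        return torch.relu(h + self.skip(x))

# Stack blocks with doubling dilation to grow the receptive field
tcn = nn.Sequential(TCNBlock(4, 64, dilation=1),
                    TCNBlock(64, 64, dilation=2),
                    TCNBlock(64, 64, dilation=4))
```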
To address the concern about dataset imbalance and evaluate the effectiveness of data augmentation techniques, experiments were conducted using both the original and the balanced dataset. The balancing was achieved by duplicating only the training partition examples of the minority class (class “1” for dangerous driving) while leaving the test partition unchanged. This method was chosen to avoid introducing synthetic data that could distort real-world distribution patterns. Each test was performed using both stratified k-fold cross-validation and a 70/30 train–test split, with parameter optimization for each test.
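A minimal sketch of this protocol is given below, assuming a 2D matrix X of per-trajectory features and binary labels y (placeholders here), and reusing the oversample_training_set sketch from Section 3; the Random Forest settings mirror the description above, with only the training fold balanced.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(414, 32))                  # placeholder per-trajectory features
y = (rng.random(414) < 66 / 414).astype(int)    # placeholder labels (~16% positive)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in skf.split(X, y):
    # balance the training fold only; the test fold keeps its true distribution
    X_tr, y_tr = oversample_training_set(X[train_idx], y[train_idx])
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X_tr, y_tr)
    scores.append(f1_score(y[test_idx], clf.predict(X[test_idx])))
print("mean F1 across folds:", np.mean(scores))
```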
Table 1 shows the best performance obtained using both the original unbalanced dataset and the balanced dataset. This comparison is crucial to demonstrate the impact of random oversampling on model performance.
5. Results
Table 1 shows the best performance obtained by splitting the dataset into training and test sets and in Stratified k-fold Cross Validation. In both cases, the original (so unbalanced) dataset and the dataset obtained by balancing only the training partition (leaving the test partition unchanged) were used.
The results clearly demonstrate the superior performance of the Multi-Speed Transformer architecture compared to the other models for recognizing dangerous driving behavior. The Multi-Speed Transformer achieves the highest accuracy at 97.5% on the balanced dataset, significantly outperforming the TCN and Random Forest models. This indicates that the Multi-Speed Transformer, with its dual-branch design and attention mechanisms, is highly effective in learning complex patterns in sequential driving data.
To understand the Multi-Speed Transformer’s behavior beyond aggregate statistics, we examined the confusion matrix obtained on the balanced test set. The model correctly identified 64 of the 66 dangerous driving instances (true positives), yielding a recall of 97%. The two false negatives were cases of “gradual drifting”, whose temporal kinematic signatures resembled a prolonged lane change. The model also produced five false positives (precision of 92%), mostly in complex intersection layouts, where normal yielding was misread as an indecisive pause or a complete stop.
Let $TN$, $FP$, $FN$, and $TP$ denote true negatives, false positives, false negatives, and true positives, respectively. The confusion matrix is
$$
C = \begin{pmatrix} TN & FP \\ FN & TP \end{pmatrix}.
$$
In the evaluated setting, the counts aggregated over the validation protocol are $TP = 64$, $FN = 2$, $FP = 5$, and $TN = 343$ (the remaining normal-driving examples). The resulting class-conditional metrics are Precision $= TP/(TP+FP) = 64/69 \approx 0.93$, Recall $= TP/(TP+FN) = 64/66 \approx 0.97$, and $F_1 = 2 \cdot \mathrm{Precision} \cdot \mathrm{Recall} / (\mathrm{Precision} + \mathrm{Recall}) \approx 0.95$. Error inspection indicates that false negatives mainly occur in gradual drifting patterns with low instantaneous jerk, whereas false positives are more frequent at complex intersections, where yielding and brief hesitations can resemble anomalous deceleration.
Quantitative comparisons are explicitly limited to baselines that we can run under exactly the same pre-processing, data splits, and evaluation metrics as this study, namely Random Forest and TCN, so that the numerical gap is not confounded by external choices. Previously published results on TRAF-related settings (for instance, GraphRQI and closely related variants) are discussed in the Related Work section; however, differences in tracking fidelity, trajectory pre-processing pipelines, and evaluation protocols make a strict head-to-head comparison unreliable in the current experimental configuration, so we treat them as qualitative context rather than as direct scoreboard entries.
Even though trajectory forecasting models such as LaneGCN (Liang et al., 2020) [24] are adept at extrapolating spatial coordinates, our comparative analysis indicates that they do not capture the underlying semantic intention of aggressive behaviors. By contrast, the Multi-Speed Transformer’s attention heads are calibrated to process high-frequency kinematic jerk signals in the “fast branch” and low-frequency path deviations in the “slow branch”, giving it a demonstrable classification advantage over predictors whose scope is confined to coordinate-based forecasting.
These results validate the Multi-Speed Transformer design, whose dual-scale processing and attention modeling make it well suited to sequence classification tasks such as dangerous driving detection from vehicular trajectory data. The significant performance gaps expose the limitations of the alternative models for this problem.
Additionally, we evaluated the impact of integrating the symbiotic framework on the model’s performance. The results in
Table 2 demonstrate the improvement in accuracy, precision, recall, and F1-score when the symbiotic framework is employed.
6. Conclusions
This paper presents an AI-based approach for automatically detecting dangerous driving behavior from traffic videos to improve road safety in smart cities. An ad hoc dataset was constructed with diverse scenarios, and an advanced Multi-Speed Transformer deep learning architecture was employed, enhanced by a symbiotic framework to ensure real-time detection and rectification of hazardous maneuvers.
Based on the results and analysis, the Multi-Speed Transformer architecture demonstrates state-of-the-art performance for recognizing dangerous driving behavior from traffic video data. The dual-scale processing and attention mechanisms enable the Multi-Speed Transformer to learn intricate sequential relationships within vehicle maneuver data across both local and global contexts. This leads to significantly boosted accuracy of 97.5% and F1-scores up to 95.5% for detecting anomalous trajectories, outperforming other deep learning and machine learning approaches.
The proposed symbiotic framework should mainly be understood as an architectural concept that illustrates, in reasonably concrete terms, how hazardous-behavior recognition modules can be plugged into a human-in-the-loop intervention loop, where automatic decisions can still be checked or overruled by a human operator. At this stage, validation is based entirely on offline and simulation-style experiments; it therefore does not provide a functional-safety argument and cannot be considered, in any strict sense, a complete SOTIF safety case. The framework nevertheless supports prompt corrective actions while keeping the driver safely in the loop, minimizing risky periods of manual control. This human–machine cooperative approach represents a balanced pathway for integrating automation to enhance safety, rather than relying solely on either manual or fully autonomous systems.
The research clearly validates the promise of advanced neural architectures and the symbiotic framework for enabling robust driving analysis systems to improve road safety. However, real-world testing and evaluation of such intelligent systems are imperative before large-scale deployment in smart cities. Future work includes extensive testing of these dual-use driver assistance systems across diverse conditions, incorporating naturalistic driving datasets covering various lighting, weather, and traffic conditions. Monitoring model performance across these scenarios is crucial.
Furthermore, incorporating analysis of surrounding vehicles, pedestrians, and other contextual information can further boost prediction accuracy and reliability for real-world dangerous driving detection. Thorough testing across diverse operating conditions and expanding the scope of analysis are key directions that need to be pursued for developing industry-grade systems based on the promising Multi-Speed Transformer approach presented in this paper.
It is recognized that the symbiotic framework is essentially a conceptual validation; future deployment requires rigorous compliance with functional safety standards such as ISO 26262 and SOTIF (ISO 21448). The transition from simulation-based results to physical intervention calls for hardware-in-the-loop testing, which is fundamental to quantify latency and to guarantee that automated corrections do not introduce new hazards into the system dynamics when operational conditions change.
Responsible development demands proactive collaborations between researchers, authorities, manufacturers, and the public to assess benefits and risks in context. Communication, public oversight mechanisms, and governance will be vital to align progress with social values. By working together, we can translate innovations like the Multi-Speed Transformer and symbiotic framework into solutions that tangibly save lives.
Regulations and standardization around performance metrics and datasets will also be indispensable for facilitating adoption. Overall, this research marks an important milestone in leveraging state-of-the-art deep learning and symbiotic frameworks for automated safe driving assistance and oversight in future smart cities.
Future work is prioritized as follows: first, formal validation of the symbiotic framework under the ISO 26262 safety standard, which is of paramount importance for fail-safe operation; second, expansion of the existing dataset to include adverse meteorological conditions, such as intense rain or dense fog, to robustly test sensing capabilities; third, the integration of pedestrian intention analysis as a subsequent, crucial step.