3.1. Crash Severity and Injury Analysis
The analysis of crash injury severity is an important objective in road safety research. The literature reveals a gradual shift from classical statistical models toward data-driven hybrid frameworks. Classical methods, mainly logistic regression and ordered response models, remain in common use because of their interpretability and robustness [4,7,15,31,32]. Other studies have shown that low-bias and random-parameters logistic regression models can characterize the effects of road geometry, environmental conditions, and crash configuration on injury outcomes while also accounting for the variability arising from rare fatal or disabling injuries [4,7,31]. Bayesian implementations improve these models by accounting for latent correlations and uncertainty, which yields more reliable severity estimates for complex, multidirectional crash data [15].
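The ordered response models referred to above can be illustrated with a minimal ordered logit sketch, in which the cumulative probability of a severity level is P(y ≤ j | x) = Λ(τ_j − x′β), with Λ the logistic CDF and τ_j strictly increasing thresholds. The coefficients, thresholds, and predictors below are hypothetical placeholders rather than estimates from the cited studies, and model estimation (e.g., by maximum likelihood) is not shown.

```python
import math


def logistic_cdf(z: float) -> float:
    """Standard logistic CDF, Lambda(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(x, beta, thresholds):
    """Return P(y = j | x) for each of len(thresholds) + 1 ordered severity levels.

    Uses the cumulative form P(y <= j | x) = Lambda(tau_j - x'beta).
    """
    if any(t2 <= t1 for t1, t2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    xb = sum(xi * bi for xi, bi in zip(x, beta))
    # Cumulative probabilities; the highest category closes at 1.
    cumulative = [logistic_cdf(tau - xb) for tau in thresholds] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


# Hypothetical model: three levels (no injury, injury, severe/fatal) and two
# illustrative predictors (night-time indicator, speed limit / 10).
beta = [0.6, 0.25]
thresholds = [2.0, 4.5]
probs_day = ordered_logit_probs([0, 5.0], beta, thresholds)
probs_night = ordered_logit_probs([1, 5.0], beta, thresholds)
print([round(p, 3) for p in probs_day])
print([round(p, 3) for p in probs_night])
```

With this parameterization, a positive coefficient shifts probability mass toward the higher-severity categories, which is how coefficient signs are usually read in this model family.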
Another area of research analyzes injury severity for specific user types, such as riders of electric bicycles (e-bikes) and other light vehicles. Studies of e-bike crashes use generalized ordered probit models with random parameters and heterogeneity across environments to show that lighting, horizontal curvature, speed limits, and user characteristics influence the probability of severe or fatal injuries [33,34]. These findings highlight the context dependence of injury severity mechanisms for semi-vulnerable road users, which differ from those of passenger vehicle occupants.
A significant part of the specialized literature has used machine learning models to improve the prediction of injury severity classes. Comparative studies show that ensemble learning methods, such as Random Forest and gradient boosting machines, can achieve higher overall accuracy than traditional statistical models when large, structured datasets are available [5,6,32]. However, these approaches struggle with extreme class imbalance, which leads to poor predictive performance for rare but severe injuries [3,6]. Machine learning tools have also been applied to small or specialized datasets: studies on tricycles and motorcycles show that predictive performance under limited data can remain acceptable with careful feature selection and appropriate validation schemes [35,36].
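One common mitigation for the class imbalance noted above is to reweight the training loss inversely to class frequency. The sketch below uses the "balanced" heuristic n_samples / (n_classes × n_c), the same formula scikit-learn applies for `class_weight="balanced"`; the label counts and the naive predicted distribution are hypothetical, and reweighting is only one option alongside resampling or decision-threshold tuning.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * n_c) for each class c."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * n_c) for c, n_c in counts.items()}


def weighted_log_loss(y_true, p_pred, weights):
    """Class-weighted mean negative log-likelihood.

    p_pred[i] maps each class label to the predicted probability for sample i.
    """
    total = sum(weights[y] * -math.log(p[y]) for y, p in zip(y_true, p_pred))
    return total / sum(weights[y] for y in y_true)


# Hypothetical labels: 90 property-damage-only (PDO), 9 injury, 1 fatal crash.
labels = ["PDO"] * 90 + ["injury"] * 9 + ["fatal"]
weights = balanced_class_weights(labels)

# A naive model that predicts the same distribution for every crash.
preds = [{"PDO": 0.90, "injury": 0.08, "fatal": 0.02}] * len(labels)
uniform = {c: 1.0 for c in weights}
print(weights)
print(round(weighted_log_loss(labels, preds, uniform), 3),
      round(weighted_log_loss(labels, preds, weights), 3))
```

After reweighting, the single fatal crash carries as much total weight as the 90 PDO crashes, so a model that neglects severe outcomes is penalized much more heavily.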
Machine learning-based approaches to crash severity increasingly incorporate unstructured data sources through natural language processing. Models that combine structured crash variables with features derived from unstructured police crash narratives outperform models that use only tabular data [37,38]. These approaches are promising but add complexity related to text preprocessing, narrative diversity, and model traceability.
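A minimal way to combine structured crash variables with narrative text is to append text-derived features to the tabular feature vector. In the sketch below, binary keyword flags stand in for the richer NLP representations (e.g., TF-IDF vectors or embeddings) used in the cited studies; the keyword list, field names, and example record are hypothetical.

```python
import re

# Hypothetical keyword vocabulary; real studies learn text features from data.
KEYWORDS = ["rollover", "ejected", "pedestrian", "alcohol", "wrong way"]


def narrative_features(text):
    """Binary flags: 1 if a keyword phrase appears as whole words in the narrative."""
    lowered = text.lower()
    return [1 if re.search(r"\b" + re.escape(k) + r"\b", lowered) else 0
            for k in KEYWORDS]


def combined_features(record):
    """Concatenate structured fields with narrative-derived flags."""
    structured = [record["speed_limit"], record["night"], record["num_vehicles"]]
    return structured + narrative_features(record["narrative"])


crash = {
    "speed_limit": 90,
    "night": 1,
    "num_vehicles": 1,
    "narrative": "Vehicle left roadway and began a rollover; driver was ejected.",
}
print(combined_features(crash))
```

Even this crude representation shows where the added complexity comes from: the text features depend on preprocessing choices (case folding, tokenization, vocabulary) that do not arise with tabular data alone.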
Emerging research also investigates the connection between injury severity analysis and automatic crash detection systems. Although these systems are designed mainly for real-time event detection and risk prediction rather than post-crash severity modeling, studies based on convolutional neural networks (CNNs), computer vision techniques, and graph-based analytics provide additional insights into crash mechanisms and dangerous scenarios [1,39,40]. These methods do not replace conventional post-crash severity modeling, but they link predictive severity analysis to proactive traffic safety monitoring.
In summary, the literature suggests a trend toward hybrid methodologies that combine interpretable statistical models with predictive ML or novel data sources (see Table 3). There is no universally optimal model; the choice depends on data availability, the road user type under investigation, and the required trade-off between explanatory capacity and predictive efficiency [1,2,3,4,5,6,7,15,31,32,33,34,35,36,37,38,39,40,41,42].
3.2. Driver Behavior and Human Factors
Driver attitude and behavior are leading human factors influencing crash risk. Studies in this field aim to understand how drivers feel, decide, and behave in challenging situations such as drowsiness, distraction, stress, agitation, or risk-taking, and to model these phenomena using both classical and data-driven techniques [12,43,44,45].
A baseline strand of research describes behavior through well-defined variables and interpretable models. These models relate driver characteristics (age, experience), situational context (traffic, weather, time of day), and maneuver-level indicators (speed choice, braking patterns, lane keeping) to unsafe driving or crash likelihood [12,44,45]. Such studies still matter because they support clearer causal interpretation and provide a basis for justifiable interventions and policy decisions, at least where explanation rather than pure prediction is the goal of the analysis [12,45].
With the growing prevalence of sensing and telematics, the literature has extended to high-frequency signals for identifying driver states and driving styles. Several studies use in-vehicle sensors, smartphones, or dashcams to extract behavioral descriptors (for example, hard braking, hard acceleration, and sharp cornering) and apply ML clustering and supervised learning to classify driver profiles and risk patterns automatically [12,14,46,47]. One representative case study used an Android app to collect smartphone sensor streams and context data, evaluating various learning approaches to build driver profiles for feedback dashboards [48]. These pipelines reflect a practical trend: low-cost sensing combined with ML could enable scalable behavioral monitoring, but model validity depends on sensor quality (for example, the stability of sampling rates) and context coverage [14,46,47].
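Extracting a descriptor such as hard braking can be sketched as threshold-based event detection on a longitudinal acceleration trace. The −3.0 m/s² threshold, 10 Hz sampling rate, minimum duration, and example trace are illustrative assumptions; the cited studies use their own thresholds and sensor preprocessing.

```python
def detect_hard_braking(accel, hz=10.0, threshold=-3.0, min_duration_s=0.3):
    """Return (start_s, end_s, peak_decel) for each hard-braking event.

    accel: longitudinal acceleration in m/s^2 (negative = deceleration), assumed
    uniformly sampled at `hz`. An event is a run of consecutive samples at or
    below `threshold` lasting at least `min_duration_s`; end_s is the time of
    the first sample after the run.
    """
    samples = list(accel)
    min_samples = max(1, int(round(min_duration_s * hz)))
    events, start = [], None
    # An infinite sentinel always exceeds the threshold, closing any trailing run.
    for i, a in enumerate(samples + [float("inf")]):
        if a <= threshold and start is None:
            start = i
        elif a > threshold and start is not None:
            if i - start >= min_samples:
                events.append((start / hz, i / hz, min(samples[start:i])))
            start = None
    return events


# Hypothetical 10 Hz trace: one sustained hard brake, then a one-sample spike.
trace = [0.1, -0.5, -3.2, -4.1, -3.8, -1.0, 0.0, -3.5, 0.2]
print(detect_hard_braking(trace))
```

The minimum-duration filter discards the isolated spike at the end of the trace, a simple guard against the sensor noise and sampling instability mentioned above; event counts per kilometer or per hour would then feed the clustering or classification step.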
A distinct area of research investigates the cognitive and physiological aspects of poor driving behavior, such as fatigue, stress, and inattention. This research uses physiological measures or behavioral surrogates to detect changes in driver state that correlate with performance decline. A consistent theme is the challenge of early detection: identifying cognitive load or fatigue before errors manifest is desirable for prevention but difficult because physiological signals vary across individuals and environments [8,44,45,49,50].
An evolution of these approaches is multimodal, context-aware modeling. Recent works integrate kinematic data with contextual factors (road type, traffic conditions, and weather) and behavioral or physiological signals to improve model robustness and real-world applicability [8,49,50,51]. This rests on the observation that the same action (for example, a sudden brake) can mean something entirely different depending on context, such as defensive versus aggressive driving [8,49,50,51]. Models incorporating contextual features are therefore preferred over those that depend only on raw vehicle signals. Generalization, however, remains unsolved: models trained on one region, fleet, or driver population often require recalibration when deployed in new conditions or driving cultures [49,50,51].
Human factors studies in conditionally automated driving add a further layer. Driver monitoring and behavior modeling focused on attention allocation, trust calibration, and takeover readiness [39,40,41,42,43] consider the influence of non-driving-related tasks on safety margins. In this context, research often highlights that “safe behavior” relates not only to steering or braking quality but also to supervisory control and situation awareness [13,52,53,54,55]. These factors are typically evaluated using simulator experiments (e.g., [52]), controlled field pilots, or curated datasets in which driver state and interaction events are observed with adequate precision [53,54,55].
Higher-level behavioral representation and reasoning systems (such as knowledge-based models, misbehavior detection architectures, and data-driven behavioral pattern mining) are increasingly seen as key technologies for accelerating the development of intelligent transportation systems. These techniques aim to move beyond simple classification toward designs that can contextualize behavior, provide explanations when appropriate, and adapt to practical concerns of privacy, bias, and real-world deployment [13,48,50,56,57,58].
The literature reveals an evolution from traditional, interpretable population-level analysis, through sensor-rich behavior inference based on machine learning, to the modeling of human factors in context and under automation (see Table 4). The most robust contributions are those that balance methodological rigor (clear ground truth and robust validation) with realistic sensing constraints and the situational variability of human driver behavior [8,12,13,14,43,44,45,46,48,49,50,51,52,53,54,55,56,57,58].
3.3. Vulnerable Road Users
The safety of vulnerable road users (VRUs) has become a growing area of research. Pedestrians, cyclists, and users of light personal mobility devices have little physical protection, and their conspicuity depends on infrastructure design, traffic state, and visibility conditions. The VRU literature establishes that crash and injury risk is determined not only by vehicle dynamics but also by a complex set of infrastructural, spatial, behavioral, and environmental variables [18,59,60].
Many studies use traditional statistical models to quantify the severity and frequency of VRU injuries. Logistic regression and ordered response analyses have been used extensively to study the effects of vehicle speed, intersection geometry, crossing design, lighting conditions, and demographic characteristics on pedestrian and cyclist safety [18,60]. These investigations consistently show that poor lighting, complex urban intersection layouts, long crossing distances, and higher operating speeds substantially increase the probability of severe or fatal injury to VRUs. A key advantage of these methods is their interpretability, which facilitates safety-oriented infrastructure design and policy development [61,62].
More sophisticated modeling approaches extend this framework by incorporating unobserved heterogeneity and spatial variation. Random parameters and Bayesian models have been applied to pedestrian and cyclist crash data to describe variation across places, times, and population cohorts, demonstrating that VRU risk factors are context-sensitive and not uniformly distributed across a road network [17,59,63,64]. These findings underline the need for local safety analysis, and results should be extrapolated between regions with caution and only after recalibration.
Spatial analysis is another important research area for VRU safety. Methods such as geographic information system (GIS) mapping, kernel density estimation, and network-based hotspot detection are commonly used to identify and monitor high-risk locations for pedestrian and cyclist crashes [65,66,67,68,69]. By explicitly accounting for the structure of the road network, network-based methods capture more realistic patterns of exposure and conflict locations than Euclidean approaches. The resulting hotspot maps can support targeted interventions such as improved crossings, traffic calming, and nighttime lighting. However, the utility of these models depends on data quality, geocoding precision, and parameter settings [67,68].
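The kernel density estimation step used in hotspot mapping can be sketched with a planar Gaussian kernel evaluated on a grid. This is the Euclidean variant; network KDE, which the text notes is more realistic, replaces straight-line distance with shortest-path distance along the road network. The crash coordinates, grid spacing, and 150 m bandwidth are hypothetical.

```python
import math


def kde_2d(points, grid, bandwidth):
    """Planar Gaussian KDE evaluated at each grid location.

    points, grid: lists of (x, y) coordinates in metres (projected CRS assumed).
    Returns densities per square metre, in the same order as `grid`.
    """
    n = len(points)
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * n)
    densities = []
    for gx, gy in grid:
        s = 0.0
        for px, py in points:
            d2 = (gx - px) ** 2 + (gy - py) ** 2
            s += math.exp(-d2 / (2.0 * bandwidth ** 2))
        densities.append(norm * s)
    return densities


# Hypothetical pedestrian crashes: a cluster near (100, 100) plus one outlier.
crashes = [(100, 100), (120, 110), (90, 130), (110, 95), (800, 820)]
grid = [(x, y) for x in range(0, 1001, 100) for y in range(0, 1001, 100)]
density = kde_2d(crashes, grid, bandwidth=150.0)
hotspot = grid[max(range(len(grid)), key=density.__getitem__)]
print(hotspot)
```

Changing `bandwidth` can merge or split apparent hotspots, which is one concrete form of the parameter sensitivity noted above.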
Research also addresses the behavioral dimension of VRU safety, focusing on pedestrians' decisions and risk perception. Some studies draw on social-cognitive models (including the Theory of Planned Behavior) to explain unsafe crossing and compliance with traffic regulations, considering perceived risk, social and group norms, and personality traits [70,71,72]. These user-focused approaches investigate perceived safety through surveys and controlled studies in immersive virtual reality to identify vulnerabilities proactively in contexts where crash data are sparse or inherently reactive [59,70].
A less developed but distinct area of research studies interactions between VRUs and advanced vehicles. This research uses agent-based models [17], probabilistic frameworks, and simulation studies to examine conflicts involving pedestrians, cyclists, and automated or connected vehicles. These studies often use surrogate safety measures (for example, post-encroachment time) to evaluate conflict risk before a crash occurs [62,73,74]. Although such methods provide useful insights into future traffic conditions, their results are sensitive to modeling assumptions and need to be validated with real traffic data.
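Post-encroachment time (PET), one of the surrogate safety measures mentioned above, is the time between the first road user leaving a conflict area and the second road user arriving at it. The sketch below assumes that entry and exit timestamps have already been extracted (e.g., from video trajectories); the 1.5 s threshold for flagging a critical conflict is an illustrative assumption, since thresholds vary across studies.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    """Times (in seconds) at which a road user enters and exits a conflict area."""
    user_id: str
    t_enter: float
    t_exit: float


def post_encroachment_time(a: Passage, b: Passage) -> float:
    """PET = entry time of the later user minus exit time of the earlier user."""
    first, second = (a, b) if a.t_enter <= b.t_enter else (b, a)
    return second.t_enter - first.t_exit


def is_critical(pet: float, threshold_s: float = 1.5) -> bool:
    """Flag conflicts whose PET falls below an assumed threshold."""
    return pet < threshold_s


pedestrian = Passage("ped_1", t_enter=10.0, t_exit=12.4)
vehicle = Passage("veh_7", t_enter=13.1, t_exit=13.6)
pet = post_encroachment_time(pedestrian, vehicle)
print(round(pet, 2), is_critical(pet))
```

A small PET means the two users missed each other narrowly, which is why the measure can flag risk without waiting for a crash to occur.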
Overall, the literature shows that analyzing VRU safety requires integrating interpretable statistical models with spatial and behavioral theories, as well as emerging practices such as simulation-based assessment of new technologies. Context-specific approaches adapted to local infrastructure, traffic data, and VRU behavior are important. Progress in VRU safety depends on successfully coordinating these complementary approaches and applying them analytically to account for the mechanisms of vulnerability identified across studies [17,18,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,75,76,77]. Several representative test cases are presented in Table 5.
3.4. Spatial Analysis and Hotspot Detection
In recent years, traffic safety research has moved from “global-average” explanations toward more spatial thinking, since crashes are influenced by both spatial dependence and spatial heterogeneity [78]. Hotspot detection and clustering are more than cartographic outputs; they are decision-making tools that direct constrained resources toward locations where risk is highest and where interventions are likely to have the greatest impact [79,80,81].
One major research direction concerns hotspot discovery and crash clustering using geographic information systems (GIS) and spatial statistics combined with clustering algorithms and network-aware density approaches. Recent studies demonstrate that algorithm selection matters: hierarchical and density-based clustering reveal different hotspot structures, and proximity analysis can identify underserved areas close to high-incidence locations [80,81]. Network-based techniques, such as modified kernel density estimation and network kriging, make hotspots more realistic because they account for network geometry instead of only Euclidean distance [79]. In settings with low-quality data, priority-value ranking methods remain applicable for identifying hazardous sites and prompting low-cost countermeasures until more sophisticated modeling is possible [82]. At the macro level, spatial summaries such as mortality rate mapping combined with spatial autocorrelation analysis support resource allocation and benchmarking between regions [83].
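Spatial autocorrelation, used above in regional mortality mapping, is commonly summarized with global Moran's I, I = (n / S0) · Σ_i Σ_j w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², where S0 is the sum of all weights. The sketch below computes it for hypothetical regions with binary contiguity weights; significance testing (e.g., by permutation) is omitted.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight S0
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


# Hypothetical: five regions along a corridor, each adjacent to its neighbours.
rates = [12.0, 11.5, 10.8, 4.2, 3.9]  # e.g., road deaths per 100,000 population
n = len(rates)
w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
print(round(morans_i(rates, w), 3))
```

Values well above the null expectation of −1/(n − 1) indicate that high-rate regions border other high-rate regions, which is the clustering signal used for regional benchmarking.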
A complementary research direction addresses spatial heterogeneity by moving from global models to local modeling techniques such as geographically weighted regression (GWR) and spatial machine learning. To capture environmental effects that are both nonlinear and spatially varying, geographically weighted deep learning has been proposed and can potentially outperform classic GWR baselines. Related research indicates that standard GWR detects spatial variation in the relationship between built environment characteristics and crashes, with differences in direction and magnitude between areas [16,84]. For speeding-related injury severity, geographically weighted neural networks (GWNNs) learn local models for each crash location and compare the spatial variation in marginal effects to support place-based countermeasures instead of universal rules [85]. At the city network level, geographically weighted random forest (GWRF) generalizes this approach to identify factors contributing to crash frequency, with better predictive power and interpretable local importance patterns [86]. Multiscale geographically weighted regression (MGWR) further indicates that distinct contributing factors may operate at different spatial scales, which improves model fit and interpretability for urban crash patterns [86]. Spatial regression on planning areas remains useful and feasible, particularly for producing actionable models with standard GIS tools (such as GeoDa 1.5.37 software) in routine agency operations [87]. Spatial panel models are important for addressing spatial spillovers and temporal dynamics relevant to both inference and policy [88].
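The core of GWR is a separate weighted least-squares fit at each location, with observations down-weighted by their distance from that location. The pure-Python sketch below fits a one-predictor local model with a Gaussian kernel; the data, coordinates, and 1 km bandwidth are hypothetical, and bandwidth selection (e.g., by cross-validation or AICc) and the multiscale extension (MGWR) are not shown.

```python
import math


def gwr_local_fit(coords, x, y, target, bandwidth):
    """Local (intercept, slope) at `target` via Gaussian-kernel weighted least squares.

    coords: (easting, northing) of each observation in metres (projected CRS assumed);
    x, y: predictor and response values; weights w_i = exp(-0.5 * (d_i / bandwidth)^2).
    """
    w = [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical zones: the predictor raises crash counts in the west and lowers
# them in the east, a sign reversal that a single global model would average away.
coords = [(0, 0), (500, 0), (1000, 0), (0, 500), (500, 500),
          (5000, 0), (5500, 0), (6000, 0), (5000, 500), (5500, 500)]
x = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11, 9, 8, 7, 6, 5]
west = gwr_local_fit(coords, x, y, target=(500, 250), bandwidth=1000.0)
east = gwr_local_fit(coords, x, y, target=(5500, 250), bandwidth=1000.0)
print([round(v, 2) for v in west], [round(v, 2) for v in east])
```

Repeating the local fit at every observation location yields the maps of spatially varying coefficients that the cited GWR studies interpret.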
Apart from geographical factors, research also integrates behavioral, operational, and data-driven mechanisms to explain why risk clusters in specific segments at particular times. Analyses of areas near highway tunnels show that danger is likely to develop through the temporal–spatial evolution of driving-task demands under restricted visibility and intervisibility conditions, which explains why local segments may show higher risk than the overall average [89]. Crash and safety analysis also encounters heterogeneity that is not purely spatial (unobserved differences across units, driver groups, or environments), which calls for spatiotemporal crash prediction methods that address this additional challenge [90,91]. Injury-severity modeling that accounts for unobserved heterogeneity among vulnerable road users offers a more contextual explanation of cyclist injury outcomes [92]. Emerging areas include intelligent and connected mobility: latent class analyses of autonomous vehicle crash reports reveal that crash scenarios may form distinct subclasses, with spatial risk shaped by operational mode and environment [93]. Meanwhile, spatio-temporal graph learning has been well studied for traffic state prediction, and its outputs could be adopted as input data streams for safety analytics [94]. Another branch uses driving maneuvers that capture critical situations, extracted from naturalistic driving data, for spatial crash prediction, serving as a proactive safety service or early warning strategy [95]. When spatial aggregation is treated as a methodological choice and model sensitivity to different temporal and spatial groupings is assessed, the best model is no longer obvious; aggregation-sensitive modeling of crash frequency is therefore essential for sound spatial safety analysis [96].
Spatial analytics and hotspot detection support road safety studies by (i) distinguishing statistically significant clusters from raw-count clusters, (ii) facilitating location-level targeted interventions that account for spatial heterogeneity, and (iii) connecting machine learning with GIS to make decision-making more transparent [78,80,85]. Important practical and methodological limitations remain, however. First, obtaining precisely geocoded data can be difficult, especially in rural areas. Second, there is a trade-off between model complexity and interpretability. Third, spatial results may be sensitive to aggregation decisions, and transferring models between regions is not straightforward without rigorous local validation and calibration [88,90,96].
Table 6 presents several case studies on spatial analysis and hotspot detection.
3.5. Artificial Intelligence, Machine Learning, and Computer Vision in Intelligent Transportation Systems
Artificial intelligence (AI), machine learning (ML), and computer vision are core components of modern intelligent transportation systems (ITS) that can automate perception, prediction, and decision-making in complex traffic scenes [74]. Vehicle detection, classification, and incident prediction are central ITS tasks, in which deep learning (DL) architectures improve recognition accuracy in complex real-world scenes containing objects of varying scale, occlusion, and multiple viewing angles [10]. Recent advances based on the YOLO (You Only Look Once) algorithm show promising results for detecting vehicles at multiple orientations and spatial scales, overcoming drawbacks of traditional image-processing pipelines [10]. In parallel, Vehicular Ad Hoc Networks (VANETs) enable low-latency vehicle-to-infrastructure communication for real-time data sharing, supporting traffic safety and coordinated control strategies [20]. With the development of big data analytics and ML, research on traffic prediction and optimization has evolved toward supporting informed travel decisions and reducing the environmental impacts of congestion [84]. Multimodal sensing systems that combine visual and auditory inputs also enhance perception by enabling robust recognition of emergency vehicles in poor visibility or in the presence of background noise [21].
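YOLO-family detectors produce many overlapping candidate boxes per vehicle, which are typically filtered with intersection-over-union (IoU) and greedy non-maximum suppression (NMS). The sketch below shows this generic post-processing step in plain Python; it is not the YOLO network itself, and the boxes, scores, and 0.5 IoU threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate detections of one car and a separate detection of another.
boxes = [(10, 10, 110, 60), (14, 12, 112, 62), (200, 40, 280, 90)]
scores = [0.92, 0.85, 0.78]
print(nms(boxes, scores))
```

The IoU threshold trades off duplicate suppression against merging genuinely adjacent vehicles, which matters in the dense, occluded scenes described above.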
Advances in learning methodologies are expanding the capabilities of ITS. Meta-transfer metric learning and deep spatio-temporal models have been developed to improve model performance in small-data settings and enable adaptive learning, which would be valuable for next-generation communication systems such as 6G-enabled ITS [97,98]. Interpretable ML methods provide clearer insights into complex crash causality, supporting real-world operations and safety regulation [9]. Recent real-time AI solutions can process live camera feeds and traffic context data to identify crashes, blockages, and abnormal traffic density in smart city scenarios, enabling proactive traffic management. Blockchain-based authentication frameworks secure data exchange and preserve integrity in autonomous vehicle–infrastructure interactions [99,100].
Big data analytics is another important component of modern ITS, supporting large-scale, real-time data processing and data-driven safety management at the network level [11]. Large-scale metric learning approaches and deep convolutional neural networks (DCNNs) offer a cost-efficient alternative to manual safety inspection by processing vast volumes of road and infrastructure images [101]. Existing computer vision systems that detect facial landmarks enable real-time detection of drowsiness and distraction by tracking the degree of eyelid closure, blink dynamics, and yawn-related facial cues [102,103]. ML systems synthesize data from traffic offenses, GPS traces, and crash history to build driver risk profiles, enabling preventive interventions by fleet managers [104,105]. Artificial neural networks (ANNs) and related learning architectures extend these methods by predicting crash risk from combined environmental and behavioral predictors, allowing for proactive countermeasures on highways [106].
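One widely used landmark-based cue for eyelid closure is the eye aspect ratio (EAR), computed from six eye landmarks p1 to p6 as EAR = (‖p2 − p6‖ + ‖p3 − p5‖) / (2‖p1 − p4‖); a sustained drop indicates closed eyes. The sketch below computes EAR and flags long runs of low values. The 0.2 threshold, 15-frame minimum, and landmark coordinates are illustrative assumptions, and the cited systems may rely on different cues and thresholds.

```python
import math


def eye_aspect_ratio(eye):
    """EAR from six (x, y) landmarks ordered p1..p6 around the eye.

    p1 and p4 are the eye corners; (p2, p6) and (p3, p5) are upper/lower lid pairs.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    return vertical / (2.0 * math.dist(p1, p4))


def closed_eye_episodes(ear_series, threshold=0.2, min_frames=15):
    """Return (start_frame, length) for runs where EAR stays below `threshold`."""
    episodes, start = [], None
    # An infinite sentinel always exceeds the threshold, closing any trailing run.
    for i, ear in enumerate(list(ear_series) + [float("inf")]):
        if ear < threshold and start is None:
            start = i
        elif ear >= threshold and start is not None:
            if i - start >= min_frames:
                episodes.append((start, i - start))
            start = None
    return episodes


# Hypothetical landmarks for an open and a nearly closed eye.
open_eye = [(0, 0), (3, -2), (7, -2), (10, 0), (7, 2), (3, 2)]
closed_eye = [(0, 0), (3, -0.5), (7, -0.5), (10, 0), (7, 0.5), (3, 0.5)]
series = ([eye_aspect_ratio(open_eye)] * 10 + [eye_aspect_ratio(closed_eye)] * 20
          + [eye_aspect_ratio(open_eye)] * 5)
print(closed_eye_episodes(series))
```

The minimum-frame condition separates ordinary blinks from prolonged closures, which is where blink dynamics enter the drowsiness decision.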
Beyond perception and prediction, security and system resilience are important concerns. Blockchain-based authentication frameworks use certificate schemes to preserve anonymity and validate vehicle-to-infrastructure communication, mitigating spoofing and tampering in distributed traffic scenarios [99,100]. ML-based crash severity modeling helps identify risk factors associated with fatalities and supports targeted countermeasures against severe and fatal crashes [107]. Integrating AI into IoT (Internet of Things) sensor networks, adaptive analytics pipelines, and distributed data management has the potential to improve the resilience and responsiveness of ITS at different operational levels [74,108]. These technologies enable ITS to sense, forecast, and act, transforming them from reactive surveillance systems into proactive safety management tools.
AI and ML represent a new operational model for ITS, automating perception, prediction, and decision support [10,11,20]. DL methods are reported to reach human-comparable accuracy in tasks such as vehicle detection and risk detection, and they can be optimized for real-time operation and predictive safety applications directly on road networks [74,97,109]. Scalable multimodal sensor fusion is extending system awareness to challenging visibility and noise conditions. Interpretable AI supports evidence-based interventions by uncovering relevant relationships within complex predictive models [9,21,104]. Low-cost video analysis platforms facilitate continuous monitoring, shifting road safety from reactive analysis to proactive control in connected mobility [101,108,110].
However, current work has several limitations. Most AI and ML techniques require large amounts of labeled training data, and model performance may degrade in adverse conditions such as snow, fog, or nighttime [86,90,97]. High computational requirements, privacy constraints, and security vulnerabilities also hinder real-world deployment, especially for large-scale or resource-constrained applications [9,99,102]. Despite progress, model bias, limited generalization between regions, and gaps in technical infrastructure remain major obstacles to widespread adoption, particularly for road safety ITS applications that must operate reliably in extreme, high-stakes situations [74,100,102,103,107]. Beyond detection and prediction performance, the use of AI in ITS raises ethical and economic issues. AI systems often rely on large volumes of traffic, image, behavioral, and sensor data, which raises questions about privacy, data access, and use [11,21,24]. Another issue is model bias and limited transparency, especially when complex models are used in traffic monitoring, risk detection, and automated assistance systems [9,102]. Security is also important, as connected transportation systems depend on reliable data exchange and protection against spoofing, manipulation, and unauthorized access [99,100]. From an economic point of view, adopting these systems requires investment in sensors, computing resources, storage, communication infrastructure, and model maintenance [11,21]. These limitations show that the use of AI in road safety depends not only on model accuracy but also on data governance, system security, infrastructure capacity, and institutional support [24]. Representative test cases of ITS are detailed in Table 7.
Taken together, the themes identified by the BERTopic analysis show a change in how road safety problems are studied. Earlier work focused mainly on crash description, injury analysis, and statistical modeling based on structured crash records. Recent studies show a stronger focus on machine learning, spatial analysis, computer vision, and multimodal data for prediction, monitoring, and prevention [22,23,24,26]. This shift affects both research questions and research practice. Questions centered only on why crashes occurred are now accompanied by questions about where risk is forming, how it can be detected earlier, and how transport systems can respond before a crash occurs. This also links road safety research more directly to traffic operations, infrastructure planning, and intelligent transportation systems [11,22,23,24,26].