Mapping Research Trends in Road Safety: A Topic Modeling Perspective

Tudor, Iulius Alexandru; Gîrbacia, Florin

doi:10.3390/vehicles8040069

Open Access Article


Iulius Alexandru Tudor * and Florin Gîrbacia

Department of Automotive and Transport Engineering, Transilvania University of Brasov, 500036 Brasov, Romania

* Author to whom correspondence should be addressed.

Vehicles 2026, 8(4), 69; https://doi.org/10.3390/vehicles8040069

Submission received: 2 February 2026 / Revised: 24 March 2026 / Accepted: 25 March 2026 / Published: 27 March 2026

(This article belongs to the Special Issue Intelligent Mobility and Sustainable Automotive Technologies)


Abstract

Over the past decade, road safety research has developed rapidly, driven by the expansion of large crash databases, the adoption of artificial intelligence techniques, and the demand for proactive and predictive safety solutions. This study conducts a data-driven review of recent research trends in transport safety. It focuses on main domains including crash severity analysis, human factors, vulnerable road users (VRUs), spatial modeling, and artificial intelligence applications. A systematic search of the Scopus database identified 15,599 relevant scientific papers published between 2016 and 2025. The titles, abstracts, and keywords of this corpus were preprocessed using a natural language processing pipeline and then analyzed with BERTopic, a transformer-based topic modeling framework. The analysis identified 29 distinct research topics, further synthesized into five major thematic areas: (1) crash severity and injury analysis, (2) driver behavior and human factors, (3) vulnerable road users, (4) artificial intelligence, machine learning, and computer vision in intelligent transportation systems, and (5) spatial analysis and hotspot detection. Publications related to artificial intelligence and machine learning have increased notably since 2020. The results show a transition from descriptive, post-crash studies to integrated, multimodal, predictive analysis, amounting to a paradigm shift in the field. This study also identifies ethical and economic issues associated with the use of artificial intelligence in intelligent transportation systems, including data management, infrastructure requirements, system security, and model transparency. Overall, the field is moving from intuition-based models toward explainable, spatially explicit, and data-intensive models, which facilitate proactive risk assessment and informed decision-making.

Keywords:

road accident; traffic accident; motor vehicle crash; traffic collision; traffic safety

1. Introduction

Over the last two decades, road safety research has developed rapidly, driven by the application of new data analysis methodologies, advanced statistical models, and artificial intelligence (AI). This evolution was primarily fueled by the growing volume and diversity of data from sources such as road infrastructure sensors, connected vehicles, telematics, and human behavioral data [1,2,3]. Consequently, methodological approaches in the field have shifted significantly. The scientific literature documents a move from traditional models to hybrid methods and machine learning algorithms. The traditional models include logistic regression and ordered probit models for risk factor analysis. The newer algorithms are capable of capturing nonlinear interactions among variables [4,5,6,7]. Topic modeling analysis of publications from 2016 to 2025 highlights several converging research directions. Five primary thematic clusters can be identified: crash and injury severity modeling, driver behavior and human factors [8], vulnerable road user safety, spatial analysis and hotspot detection, and artificial intelligence applications, including deep learning and computer vision. The literature also shows a temporal pattern: recent studies increasingly examine explainable predictive models for autonomous vehicles and intelligent infrastructure [9,10,11]. Within the artificial intelligence cluster, models such as Random Forest, XGBoost, and Deep Neural Networks capture data heterogeneity better than traditional linear models. In the cited studies, they also estimate injury severity with high discriminative performance (AUC > 0.90) [1,3,6,9].

Research on driver behavior is also evolving through the integration of physiological signals, such as electroencephalography (EEG) and electrocardiography (ECG), with vehicle CAN-bus data. These signals are increasingly incorporated into Advanced Driver Assistance Systems (ADAS), where they support the early detection of driver stress, fatigue, and inattention [12,13,14].

Research on vulnerable road users (VRUs) has introduced a geographic perspective to urban safety, strongly aligned with spatial analysis methods. Bayesian and spatio-temporal models, often based on Geographic Information Systems (GIS), are effectively used to identify high-risk areas and estimate risk factors related to lighting conditions, pedestrian flow, and intersection design [15,16,17,18,19].

Machine learning applications in intelligent transportation extend beyond safety modeling to enable real-time traffic surveillance, automated incident detection, and mobility flow optimization [11,20,21]. This analysis reveals the field’s interdisciplinary progression from isolated accident studies toward a systemic approach that integrates spatial, behavioral, and sensory data. This shift marks a pivotal move from reactive, post-event analysis to proactive and explainable road risk prediction.

Recent literature on road safety has been extensively analyzed through bibliometric, scientometric, and systematic reviews, with the aim of mapping the structure of the field, research trends, and the evolution of methodological paradigms. Bibliometric analyses have examined the thematic dynamics of road safety research, the relationships between sub-fields, and the evolution of scientific collaboration networks. These studies indicate a gradual diversification of research directions and an increasing level of interdisciplinarity within the road safety domain [22,23,24,25]. In particular, studies dedicated to the relationship between artificial intelligence and road safety have shown an accelerated growth of research oriented towards accident prediction, driver behavior analysis, and the development of intelligent transport systems, concomitant with the conceptual fragmentation of the field into distinct thematic clusters [24].

Systematic reviews on road accident analysis and modeling have brought together traditional statistical approaches, modern machine learning techniques, and emerging technologies in a unified conceptual framework. These reviews highlight the increasing focus on data-driven paradigms and the lack of standardized, generalizable methodological frameworks at the domain level [26]. Reviews on machine learning applications in road accident prediction have examined algorithm performance, data types, and validation strategies. These studies show that ensemble and deep learning models often achieve strong predictive performance. However, they also report limitations related to interpretability, class imbalance, and generalization to new data [3,27,28,29]. At the same time, bibliometric and comparative analyses of ML and AutoML applications have shown a rapid diversification of predictive strategies and an increasing use of automated methods in road safety research [28].

Existing studies show that road safety research is undergoing a major change in research methods. The field is moving from traditional statistical methods to methods based on artificial intelligence and data analysis. At the same time, the literature remains fragmented across research topics and lacks integrative analytical frameworks able to capture the latent semantic structure and the evolution of the field [3,22,23,24,25,26,27,28,29].

The present article performs a large-scale analysis of research trends in the field of road safety by applying a data-driven topic modeling framework, based on transformer models, to the scientific literature indexed by the Scopus database.

The objective of the presented analysis is to map the semantic and temporal nature of road safety research by identifying dominant and emerging themes and the relationships between them. The proposed approach allows the identification of latent semantic structures, relationships between research topics, and the evolutionary dynamics of the field. An integrative perspective on thematic fragmentation and interconnections between research directions is provided, contributing to overcoming the methodological limitations of traditional bibliometric analyses.

2. Materials and Methods

2.1. Data Collection

The dataset for this study was collected from the Scopus database because it provides broad coverage of peer-reviewed literature and structured metadata suitable for text mining and bibliometric analysis. The use of a single database, however, introduces a limitation: some relevant studies indexed only in other databases, such as Web of Science or specialized repositories, may not be included in the dataset. Even so, Scopus remains one of the main sources used for scientometric studies in transportation and road safety research. Future work may combine multiple databases to extend coverage and reduce selection bias.

A systematic search was conducted to retrieve articles published between January 2016 and July 2025. The search query was designed to be broad in the transport safety domain (see Appendix A). This query specifically targeted the title, abstract, and keyword fields. It restricted the results to English-language publications and specific document types: articles, conference papers, reviews, book chapters, short surveys, and data papers. This query yielded a corpus of 15,599 records. For each record, the title, abstract, and author-provided keywords were extracted, ensuring a rich representation of its content. Records with missing titles or abstracts were excluded, and duplicates based on these fields were removed to ensure the integrity of the dataset.
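The record-cleaning rules above can be sketched as follows. Records with a missing title or abstract are excluded, duplicates on those fields are dropped, and the title, abstract, and keywords are kept together as one document. This is a minimal illustration under stated assumptions, not the authors' code. The field names `title`, `abstract`, and `keywords` are assumed and may not match the exact Scopus export columns. The case-insensitive duplicate key is also a hypothetical choice.

```python
def clean_records(records):
    """Drop records lacking a title or abstract, then drop duplicates on those fields."""
    seen = set()
    cleaned = []
    for rec in records:
        title = (rec.get("title") or "").strip()
        abstract = (rec.get("abstract") or "").strip()
        # Exclude records whose title or abstract is missing or empty.
        if not title or not abstract:
            continue
        # Hypothetical choice: case-insensitive key, so re-cased copies count as duplicates.
        key = (title.lower(), abstract.lower())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def to_document(rec):
    """Concatenate title, abstract, and author keywords into one text for topic modeling."""
    parts = [rec["title"], rec["abstract"], rec.get("keywords") or ""]
    return " ".join(p.strip() for p in parts if p and p.strip())


records = [
    {"title": "A", "abstract": "x", "keywords": "k"},
    {"title": "A", "abstract": "x"},   # duplicate of the first record
    {"title": "", "abstract": "y"},    # missing title
    {"title": "B", "abstract": None},  # missing abstract
]
corpus = [to_document(r) for r in clean_records(records)]
```

In this toy input only the first record survives, and it becomes the single document "A x k".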

2.2. Topic Modeling

A multi-step preprocessing pipeline was implemented using the Natural Language Toolkit (NLTK) library in Python 3.12.13 to prepare the textual data for topic modeling. This phase normalized the text and reduced noise to improve the quality of the resulting topics. All text was converted to lowercase, numerical digits and special characters were removed using regular expressions, and the cleaned text was tokenized into individual words. Lemmatization and stopword removal were then applied to ensure semantic quality. Part-of-Speech (POS) tagging identified the grammatical category of each token. This allowed the NLTK WordNetLemmatizer to reduce each word to its context-appropriate dictionary base form. Next, an expanded stopword filter was applied. This list combined the standard NLTK English stopwords with a custom dictionary of 57 domain-specific and academic terms that lack discriminative semantic value (such as ‘accident’, ‘crash’, ‘study’, ‘model’, ‘et’, ‘al’). Finally, all tokens consisting of two characters or fewer were removed.

The preprocessed corpus was then analyzed using Bidirectional Encoder Representations from Transformers for Topic Modeling (BERTopic) [30]. Each document was converted into a high-dimensional numerical vector using the Sentence-Transformer model sentence-transformers/all-mpnet-base-v2. This model was selected for its ability to capture semantic meaning and context in English texts. Human-readable topic representations were generated with a Class-based Term Frequency-Inverse Document Frequency (c-TF-IDF) approach. In this step, the vocabulary was built with Scikit-Learn's CountVectorizer, configured with an n-gram range of (1, 3) to capture meaningful unigrams, bigrams, and trigrams. To reduce noise, terms appearing in fewer than 5 documents (min_df = 5) or in more than 85% of the documents (max_df = 0.85) were filtered out.
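The text-cleaning steps described above can be approximated with the Python standard library. This is a hedged sketch, not the authors' implementation, and it makes three assumptions:

- The stopword sets are small hypothetical stand-ins, since only six of the 57 custom terms are named in the text.
- Tokenization splits on whitespace rather than using an NLTK tokenizer.
- Lemmatization is left as a pluggable hook. A POS-aware wrapper around NLTK's `WordNetLemmatizer` could be passed in its place.

```python
import re

# Hypothetical stand-ins: the study used NLTK's English stopword list plus 57 custom terms.
BASE_STOPWORDS = {"the", "of", "and", "in", "to", "a", "for", "on", "with",
                  "was", "were", "is", "are", "this", "that", "by"}
DOMAIN_STOPWORDS = {"accident", "crash", "study", "model", "et", "al"}
STOPWORDS = BASE_STOPWORDS | DOMAIN_STOPWORDS


def identity_lemmatizer(tokens):
    """Placeholder for the POS-aware WordNet lemmatization step."""
    return tokens


def preprocess(text, lemmatize=identity_lemmatizer):
    text = text.lower()                                  # 1. lowercase
    text = re.sub(r"[^a-z\s]", " ", text)                # 2. drop digits and special characters
    tokens = text.split()                                # 3. whitespace tokenization (assumption)
    tokens = lemmatize(tokens)                           # 4. lemmatization hook
    tokens = [t for t in tokens if t not in STOPWORDS]   # 5. expanded stopword filter
    return [t for t in tokens if len(t) > 2]             # 6. drop tokens of <= 2 characters


tokens = preprocess("The crash study, et al. 2020: Pedestrian injuries at night!")
```

Here `tokens` is `['pedestrian', 'injuries', 'night']`. A real lemmatizer would also map "injuries" to "injury". Because lemmatization runs before stopword removal, an inflected form such as "crashes" is first reduced to "crash" and then removed by the domain stopword list.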
The final topic representations were then refined using a KeyBERT-inspired model to extract the top 10 most representative keywords for each identified topic.
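The vocabulary filter and the c-TF-IDF weighting can be illustrated with a small standard-library sketch. It mirrors the documented behavior of three CountVectorizer settings: `ngram_range`, `min_df` (an absolute count), and `max_df` (a proportion). It also follows the c-TF-IDF formula described in the BERTopic documentation, W(t, c) = tf(t, c) · log(1 + A / f(t)), where:

- tf(t, c) is the L1-normalized frequency of term t in class c;
- f(t) is the frequency of t across all classes;
- A is the average number of terms per class.

The sketch omits the KeyBERT-inspired refinement and is only meant to show how the weighting favors topic-specific terms. It is not the BERTopic implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=3):
    """All contiguous n-grams with n_min <= n <= n_max, like ngram_range=(1, 3)."""
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def build_vocabulary(docs, min_df=5, max_df=0.85):
    """Keep n-grams whose document frequency lies in [min_df, max_df * n_docs]."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(ngrams(tokens)))  # count each term once per document
    max_count = max_df * len(docs)
    return {term for term, count in doc_freq.items() if min_df <= count <= max_count}


def c_tf_idf_keywords(docs, labels, vocab, top_n=10):
    """Rank terms per topic by W(t, c) = tf(t, c) * log(1 + A / f(t))."""
    class_counts = {}
    for tokens, label in zip(docs, labels):
        # All documents of one topic are pooled into a single class "document".
        class_counts.setdefault(label, Counter()).update(
            g for g in ngrams(tokens) if g in vocab)
    term_totals = Counter()  # f(t): frequency of t summed over all classes
    for counts in class_counts.values():
        term_totals.update(counts)
    if not term_totals:
        return {label: [] for label in class_counts}
    avg_terms = sum(term_totals.values()) / len(class_counts)  # A
    keywords = {}
    for label, counts in class_counts.items():
        n_terms = sum(counts.values())
        scores = {t: (c / n_terms) * math.log(1 + avg_terms / term_totals[t])
                  for t, c in counts.items()}
        keywords[label] = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return keywords


docs = [["road", "pedestrian"], ["road", "pedestrian"], ["road", "driver"], ["road", "driver"]]
labels = [0, 0, 1, 1]
vocab = build_vocabulary(docs, min_df=2, max_df=1.0)
keywords = c_tf_idf_keywords(docs, labels, vocab)
```

In this toy corpus, "road" occurs in every topic, so it receives the lowest weight in each topic's keyword list. With the study's max_df = 0.85 it would be removed from the vocabulary altogether.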

To enable effective clustering of the document embeddings, their dimensionality was first reduced using the Uniform Manifold Approximation and Projection (UMAP) algorithm. The UMAP configuration was chosen specifically to preserve the corpus’s broad thematic relationships. Setting n_neighbors = 50 (higher than the default of 15) prioritized the global topological structure. The embeddings were reduced to five dimensions (n_components = 5) using the cosine metric. This configuration kept the broader relationships between road safety sub-disciplines intact. The reduced embeddings were then clustered using the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) algorithm. To identify only established research trends and avoid micro-topics, min_cluster_size was set to 80 (about 0.5% of the corpus). This ensured that the resulting topics represent statistically meaningful concentrations of documents. HDBSCAN also handles noise by assigning documents in sparse regions to an outlier group (labeled Topic -1).

A sensitivity analysis was conducted to validate the robustness of the topic structure and to ensure that the findings did not depend on specific parameter choices. The min_cluster_size parameter was tested at thresholds of 20, 40, 60, 80, and 100. Varying this parameter controlled the granularity of the topic model. A threshold of 20 produced a fine-grained structure with 89 distinct topics, whereas a threshold of 100 generated a more consolidated model with 26 broader topics. Importantly, the proportion of documents classified as outliers remained stable across all tests, ranging only between 44.0% and 51.5%. This stability indicates a consistent core of dense thematic research clusters within the corpus. Despite the variation in granularity, the same primary thematic areas consistently emerged as the most prominent: crash severity and injury analysis, driver behavior and human factors, vulnerable road users (VRUs), AI and machine learning, and spatial analysis. This indicates that the core super-clusters represent the principal research trends within the corpus.
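The percentages quoted above can be checked with a few lines of arithmetic. The check uses only the corpus size (15,599), the tested min_cluster_size values, and the 7,598 outliers reported for the final model in Section 2.3.

```python
CORPUS_SIZE = 15_599
FINAL_OUTLIERS = 7_598  # documents assigned to Topic -1 in the final model


def share(count, total=CORPUS_SIZE):
    """Percentage of the corpus represented by `count` documents."""
    return 100.0 * count / total


# min_cluster_size thresholds tested in the sensitivity analysis.
for size in (20, 40, 60, 80, 100):
    print(f"min_cluster_size={size:>3}: {share(size):.2f}% of the corpus")

print(f"outlier share of the final model: {share(FINAL_OUTLIERS):.1f}%")
```

The tested thresholds span about 0.13% to 0.64% of the corpus, with the chosen value of 80 at 0.51%. The final outlier share of 48.7% lies inside the 44.0% to 51.5% range observed across thresholds.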

2.3. Results of BERTopic Analysis

The BERTopic analysis of 15,599 scientific abstracts identified 29 distinct, cohesive research topics within the transport safety literature. A large share of the abstracts (n = 7598, about 48.7% of the corpus) were classified as outliers (Topic -1). This outlier classification is not a modeling failure. Rather, it reflects the actual heterogeneity of global road safety research, as captured by the HDBSCAN algorithm. Unlike partition-based algorithms such as K-Means, which force every document into a cluster, HDBSCAN groups only documents that form dense and clearly defined semantic neighborhoods. Two assessments were conducted to justify the classification of these documents as outliers (Topic -1). First, a text-length distribution analysis confirmed that the outlier documents do not represent low-quality noise or incomplete records. Their average word count (166.6 words) closely matched that of documents within valid topics (172.1 words), and the density distributions of word counts for the two groups overlap almost perfectly (see Figure 1). This indicates that documents were assigned to the outlier cluster because of their semantic isolation in the vector space, not because they lacked sufficient textual data.

Second, an inspection of the outlier corpus indicated that it consists of specialized, cross-disciplinary, or localized research rather than discarded data. A manual review of a random sample of outlier documents supported this characterization. Representative documents include highly specific geographic case studies (for example: “Modeling fatal traffic accident occurrences in small Indian cities, Patiala, and Rajpura”), medical analyses (for example: “Factors related to healthcare costs of road traffic accidents in Bucaramanga, Colombia”), and niche technological investigations (for example: “Safety analysis and countermeasure research of power lithium-ion battery road transport”). HDBSCAN isolated these documents because they lack the semantic density and volume needed to form a broad trend. Excluding them from the primary thematic clusters improved the overall topic model. Evaluated using the Gensim Python library, the final model achieved a topic coherence score of 0.508 and a topic diversity of 0.778. These scores indicate that the retained topics represent coherent, generalizable research themes. By allowing these sparse documents to remain outliers, the algorithm prevents topic dilution. As a result, the 29 identified topics (see Table 1) represent reliable and coherent focal points within the scientific community. Document distribution across these topics is non-uniform: a few dominant topics contain many documents, while many smaller topics are specialized (see Figure 2).
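Topic diversity is commonly defined as the proportion of unique words among the top-k keywords of all topics. Under that definition, a score of 0.778 means that roughly 78% of all keyword slots are filled by distinct words. The text does not state the definition or the value of k used in the study. The sketch below therefore assumes this common definition and top_k = 10, matching the ten keywords per topic described in Section 2.2. Coherence is not reproduced here, because the text names Gensim but not the coherence measure. Gensim's `CoherenceModel` supports several measures, for example `c_v` and `u_mass`.

```python
def topic_diversity(topic_keywords, top_k=10):
    """Share of unique words among the top-k keywords of all topics (1.0 = no overlap)."""
    words = [w for keywords in topic_keywords for w in keywords[:top_k]]
    if not words:
        return 0.0
    return len(set(words)) / len(words)


# Two toy topics sharing one keyword: 5 unique words out of 6 slots.
toy_topics = [["pedestrian", "crossing", "speed"], ["speed", "driver", "fatigue"]]
diversity = topic_diversity(toy_topics)
```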

To understand how the 29 topics group together at a higher level, an algorithmic hierarchical clustering analysis was performed based on the cosine distances between the topic embeddings (see Figure 3). This resulted in four distinct super-clusters:

Injury cluster. This cluster groups topics related to the medical outcomes of crashes, containing trauma care (Topic 4), spinal cord injuries (Topic 13), and general injury severity analysis (Topics 0, 7, 19).
Human factors cluster. This large cluster focuses on the driver, including themes of unsafe driving (Topics 3, 27), driver distraction (Topic 12), and alcohol-impaired driving (Topic 9).
Technology and AI cluster. This cluster is dedicated to computational methods, including topics related to machine learning, deep learning, and neural networks (Topics 6, 8, and 24), as well as physiological monitoring (Topic 1).
Infrastructure cluster. Topics from this cluster concern the physical road environment, including highway safety and engineering (Topic 25), pavement and road surface analysis (Topic 21), and spatial analysis of crash hotspots (Topic 5).
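The grouping of topics into super-clusters can be illustrated with a small agglomerative clustering sketch on cosine distances between topic embeddings. The text does not state the linkage criterion, so average linkage is an assumption here. In practice, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` or BERTopic's `hierarchical_topics` would normally replace this hand-written loop.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def agglomerate(embeddings, n_clusters):
    """Average-linkage agglomerative clustering of topic embeddings.

    Starts with one cluster per topic and repeatedly merges the pair of clusters
    with the smallest mean pairwise cosine distance until n_clusters remain.
    """
    clusters = [[i] for i in range(len(embeddings))]
    dist = {(i, j): cosine_distance(embeddings[i], embeddings[j])
            for i, j in combinations(range(len(embeddings)), 2)}

    def linkage(a, b):
        pairs = [(min(i, j), max(i, j)) for i in a for j in b]
        return sum(dist[p] for p in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # combinations yields (a, b) with a < b, so deleting b leaves index a valid.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Four toy topic embeddings forming two obvious groups.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
groups = agglomerate(toy_embeddings, n_clusters=2)
```

Stopping the merges at four clusters corresponds to cutting the dendrogram (Figure 3) at the level that yields the four super-clusters listed above.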

The analysis of topic frequency over time reveals the dynamics of research in the transport safety field. Established topics such as “Crash Severity Analysis” (Topic 0) and “Driver Behavior” (Topic 3) show consistent, stable research output. In contrast, topics related to AI and machine learning are growing rapidly. As shown in Figure 4, machine learning (Topic 6) and deep learning (Topic 8) have increased markedly in frequency, mainly since 2020. This indicates a shift in research focus toward data-intensive and predictive solutions for transport safety.
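The per-year topic counts behind a figure such as Figure 4 reduce to counting documents by (topic, year). The sketch below also computes each topic's share of the year's clustered documents. This normalization is an assumption, not a step stated in the text. It separates the growth of a topic from the overall growth of publication volume. BERTopic also provides a `topics_over_time` method for this kind of analysis.

```python
from collections import Counter, defaultdict


def topic_frequency_by_year(topics, years, skip_outliers=True):
    """Count documents per topic and publication year."""
    counts = defaultdict(Counter)
    for topic, year in zip(topics, years):
        if skip_outliers and topic == -1:  # Topic -1 holds HDBSCAN outliers
            continue
        counts[topic][year] += 1
    return counts


def yearly_share(counts, topic):
    """Share of each year's clustered documents that belong to `topic`."""
    totals = Counter()
    for per_year in counts.values():
        totals.update(per_year)
    return {year: counts[topic][year] / totals[year] for year in sorted(totals)}


# Toy data: topic assignments and publication years for six documents.
counts = topic_frequency_by_year([0, 0, 6, 6, 6, -1],
                                 [2019, 2021, 2019, 2021, 2021, 2021])
share_topic_6 = yearly_share(counts, 6)
```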

Topic frequency analysis reveals five primary thematic areas in transport safety research. These thematic areas, each composed of one or more related topics, represent the core focus of the scientific community. The clustering algorithm grouped the topics by semantic similarity, but algorithms do not inherently understand established road safety domains. We therefore manually reviewed the 29 topics and 4 super-clusters and grouped them into five major themes based on topic frequency, document volume, and practical relevance (see Table 2). This process was guided by three criteria:

(i) the boundaries of the hierarchical clustering dendrogram (Figure 3), used to group closely related topics;
(ii) overlap in the main keywords of each topic; and
(iii) agreement with known research directions in road safety.

When a topic could fit into more than one area, the final assignment was based on a review by an expert in road safety and accident analysis from the European Association for Accident Research and Accident Analysis. The identified major themes are (see Figure 5):

Crash severity and injury analysis (Topics 0, 4, 7). This is the leading research area, focused on modeling and predicting the severity of injuries resulting from a crash. Key research activities include using statistical methods like logistic regression and analyzing hospital trauma registry data.
Driver behavior and human factors (Topics 1, 3, 9). The second largest theme focuses on human factors, investigating unsafe driving behaviors, safety education, driver impairment (drunk driving), and the use of technology (ECG, alarms) to monitor driver state and inattention.
Vulnerable road users (VRUs) (Topics 2, 15). A significant body of work is dedicated to the safety of pedestrians and cyclists. Research in this area explores infrastructure improvements, risk perception, and safety measures specific to VRUs.
AI, Machine Learning, and Computer Vision (Topics 6, 8, 24). This rapidly growing theme involves the application of advanced computational methods. Research includes using deep learning and convolutional neural networks (CNNs) for incident detection, traffic flow prediction, and image-based analysis.
Spatial Analysis & Hotspot Detection (Topic 5). This theme covers the geographical aspects of road safety, using GIS and spatial statistics to identify and analyze accident “hotspot” locations.

3. Results

3.1. Crash Severity and Injury Analysis

The analysis of crash injury severity is an important objective in road safety research. The literature reveals a gradual trend from classical statistical models toward data-driven hybrid frameworks. Classical methods, mainly logistic regression and ordered response models, remain in common use due to their interpretability and robustness [4,7,15,31,32]. Other studies have shown that logistic regression models with low bias and random parameters can characterize the effects of road geometry, environmental conditions, and accident configuration on injury outcomes while simultaneously managing the variation resulting from rare fatal accidents or injuries leading to disability [4,7,31]. Bayesian implementations improve these models by taking into account latent correlations and uncertainty, leading to more reliable severity estimates for complex data with multidirectional crashes [15].
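For reference, the ordered response models mentioned above share a latent-variable form. A standard textbook formulation follows; the notation is ours, not taken from the cited studies.

```latex
% Latent injury propensity and observed severity category j = 1, ..., J
y_i^{*} = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i, \qquad
y_i = j \iff \tau_{j-1} < y_i^{*} \le \tau_j

% Category probabilities, with thresholds \tau_0 = -\infty and \tau_J = +\infty
P(y_i = j \mid \mathbf{x}_i)
  = F\!\left(\tau_j - \mathbf{x}_i^{\top}\boldsymbol{\beta}\right)
  - F\!\left(\tau_{j-1} - \mathbf{x}_i^{\top}\boldsymbol{\beta}\right)

% Random-parameter variant: coefficients vary across observations
\boldsymbol{\beta}_i = \boldsymbol{\beta} + \boldsymbol{\sigma} \odot \boldsymbol{\omega}_i,
\qquad \boldsymbol{\omega}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
```

Here F is the standard normal CDF for ordered probit and the logistic CDF for ordered logit. Generalized ordered models go one step further and let the thresholds τ_j themselves depend on covariates.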

Another area of research analyzes injury severity for specific user types, such as riders of electric bicycles (e-bikes) and other light vehicles. Studies of e-bike accidents use a generalized ordered probit model (GOM) with random parameters and heterogeneity in the environments. They show that lighting, horizontal road curvature, speed limits, and user characteristics influence the probability of severe or fatal injuries [33,34]. These findings show that injury severity mechanisms for semi-vulnerable road users are context-dependent and differ from those of passenger vehicle occupants.

A significant part of the specialized literature has used machine learning models to improve the prediction of injury severity classes. Comparative studies show that ensemble learning methods, such as Random Forest and gradient boosting machines, can achieve better overall accuracy than traditional statistical models when large, structured datasets are available [5,6,32]. However, these machine learning approaches struggle with extreme class imbalance, which leads to poor predictive performance for rare but severe injuries [3,6]. Machine learning tools have also been applied to small or specialized datasets. Studies on tricycles and motorcycles show that predictive performance under limited data can still be acceptable when features are carefully selected and appropriate validation schemes are adopted [35,36].
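The class-imbalance problem can be made concrete with a toy calculation. The data below are hypothetical: 95 non-severe and 5 severe crashes. The classifier is deliberately degenerate and always predicts the majority class. The example shows that high overall accuracy can coexist with zero recall on the rare, severe class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, cls):
    """Fraction of true `cls` cases that were predicted as `cls`."""
    support = sum(t == cls for t in y_true)
    hits = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    return hits / support if support else 0.0


# Hypothetical test set: 95 slight-injury and 5 severe/fatal crashes.
y_true = ["slight"] * 95 + ["severe"] * 5
# Degenerate classifier that always predicts the majority class.
y_pred = ["slight"] * len(y_true)

acc = accuracy(y_true, y_pred)                                      # 0.95
severe_recall = recall(y_true, y_pred, "severe")                    # 0.0
balanced = (recall(y_true, y_pred, "slight") + severe_recall) / 2   # 0.5
print(acc, severe_recall, balanced)
```

Balanced accuracy, the mean of per-class recalls, exposes the failure that plain accuracy hides. This is why studies of rare severe outcomes usually report class-sensitive metrics.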

Machine learning-based approaches to accident severity are on the rise, with a trend toward incorporating unstructured data sources through natural language processing. Some models combine structured accident variables with features derived from unstructured police accident narratives. These models outperform models that use only tabular data [37,38]. Such approaches are promising but add complexity related to text preprocessing, narrative diversity, and model traceability.
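A minimal way to combine structured accident variables with features from police narratives is to append bag-of-words counts to the tabular feature vector. The sketch is illustrative only. The narrative vocabulary and the three tabular fields are hypothetical. The cited studies may use richer text representations, such as TF-IDF weights or embeddings.

```python
import re
from collections import Counter

# Hypothetical narrative vocabulary; in practice it would be learned from
# the training narratives rather than fixed by hand.
NARRATIVE_VOCAB = ["rollover", "ejected", "pedestrian", "airbag"]


def narrative_features(narrative, vocab=NARRATIVE_VOCAB):
    """Bag-of-words counts of vocabulary terms in a free-text crash narrative."""
    counts = Counter(re.findall(r"[a-z]+", narrative.lower()))
    return [counts[word] for word in vocab]


def combined_features(tabular, narrative):
    """Structured crash variables followed by narrative term counts."""
    return list(tabular) + narrative_features(narrative)


# Hypothetical tabular fields: night-time flag, speed limit (km/h), wet-road flag.
row = combined_features([1, 65, 0], "Vehicle rollover; driver ejected, airbag deployed.")
```

The resulting row, `[1, 65, 0, 1, 1, 0, 1]`, can be fed to any tabular classifier. The added columns are also where the text-preprocessing and traceability costs mentioned above come from.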

Emerging research also investigates the connection between injury severity analysis and automatic accident detection systems. These studies rely on convolutional neural networks (CNNs), computer vision techniques, and graph-based analytics. Although they are mainly designed for real-time event detection and risk prediction rather than post-crash severity modeling, they provide additional insights into crash mechanisms and dangerous scenarios [1,39,40]. These methods do not replace conventional post-crash severity modeling, but they help link predictive severity analysis with proactive traffic safety monitoring.

In summary, the literature suggests a trend toward hybrid methodologies. These frameworks combine interpretable statistical models with predictive ML or novel data sources (see Table 3). There is no universally optimal model; the choice depends on data availability, the specific road user type under investigation, and the required trade-off between explanatory capacity and predictive efficiency [1,2,3,4,5,6,7,15,31,32,33,34,35,36,37,38,39,40,41,42].

3.2. Driver Behavior and Human Factors

Driver attitude and behavior are leading human factors influencing crash risk. The studies in this research field aim at understanding how drivers feel, decide, and behave in challenging situations such as drowsiness, distraction, stress, agitation, or risk-taking and at modeling these phenomena using both classical and data-driven techniques [12,43,44,45].

A baseline body of research describes behavior through well-defined variables and interpretable models. These models relate three groups of factors to unsafe driving or crash likelihood [12,44,45]:

- driver characteristics (age, experience);
- situational context (traffic, weather, time of day);
- maneuver-level indicators (speed choice, braking patterns, lane keeping).

Such studies remain relevant because they allow for clear causal inference and form the basis for justifiable interventions and policy decisions. This holds at least where the goal of the analysis is explanation rather than pure prediction [12,45].

With the increasing prevalence of sensing and telematics, the literature has extended to high-frequency signals, which are used to identify driver states and driving styles. Several studies use in-vehicle sensors, smartphones, or dashboard cameras to extract behavioral descriptors (for example, hard braking, hard acceleration, and cornering). They then apply ML clustering and supervised learning to classify driver profiles and risk patterns automatically [12,14,46,47]. A representative case study used an Android app to collect smartphone sensor streams and context data, evaluating various learning approaches to build driver profiles for feedback dashboards [48]. These pipelines illustrate a practical trend: low-cost sensing combined with ML could enable scalable behavioral monitoring. However, model validity depends on sensor quality (for example, stability of sampling rates) and context coverage [14,46,47].
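The extraction of behavioral descriptors such as hard braking and hard acceleration can be sketched as threshold detection on a longitudinal acceleration signal. The 3.0 m/s² threshold and the three-sample minimum duration are hypothetical values. Because the duration is counted in samples, the same setting means 0.3 s at 10 Hz but 0.06 s at 50 Hz. This is one concrete reason why stable sampling rates matter for these pipelines.

```python
def detect_events(accel, threshold=3.0, min_samples=3):
    """Find runs of hard braking / hard acceleration in a longitudinal signal.

    accel: longitudinal acceleration samples in m/s^2 at a fixed sampling rate.
    Returns (kind, first_index, last_index) for every run of at least
    `min_samples` consecutive samples beyond +/- threshold.
    """
    events = []
    kind, start = None, 0
    # A trailing neutral sample closes any run still open at the end.
    for i, a in enumerate(list(accel) + [0.0]):
        if a <= -threshold:
            current = "hard_brake"
        elif a >= threshold:
            current = "hard_accel"
        else:
            current = None
        if current != kind:
            if kind is not None and i - start >= min_samples:
                events.append((kind, start, i - 1))
            kind, start = current, i
    return events


signal = [0.2, -3.5, -4.1, -3.2, -0.5, 3.3, 3.1, 0.0, 3.6, 3.8, 3.2, 1.0]
events = detect_events(signal)
```

The two-sample burst at indices 5 and 6 is ignored as too short, leaving one hard-braking event (indices 1 to 3) and one hard-acceleration event (indices 8 to 10).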

A distinct area of research investigates the cognitive and physiological aspects of poor driving behavior, such as fatigue, stress, and inattention. This research uses physiological measures or behavioral surrogates to detect changes in driver state that can be correlated with performance decline. A consistent research theme is the challenge of early detection. Identifying cognitive load or fatigue before errors manifest is desirable for prevention. It is difficult, however, because physiological signals vary across individuals and environments [8,44,45,49,50].

An evolution of these approaches is multimodal, context-aware modeling. Recent works integrate kinematic data with contextual factors (road type, traffic conditions, and weather) and with behavioral or physiological signals to improve model robustness and real-world applicability [8,49,50,51]. The rationale is that the same action (for example, sudden braking) can have an entirely different meaning in different contexts, such as defensive versus aggressive driving [8,49,50,51]. Models incorporating contextual features are therefore preferred over those that depend only on raw vehicle signals. Generalization, however, remains an open problem. Models trained on one region, fleet, or driver population often require recalibration when deployed under new conditions or in different driving cultures [49,50,51].

Human factors studies in conditionally automated driving add a further layer. Driver monitoring and behavior modeling in this setting focus on attention allocation, trust calibration, and takeover readiness [39,40,41,42,43]. They take into account the influence of non-driving-related tasks on safety margins. In this context, research often highlights that “safe behavior” relates not only to steering or braking quality but also to supervisory control and situation awareness [13,52,53,54,55]. These factors are typically evaluated using simulator experiments (such as [52]), controlled pilot field operations, or curated datasets in which driver state and interaction events are observed with adequate precision [53,54,55].

Higher-level behavioral representation and reasoning systems are increasingly seen as key technologies for accelerating the development of intelligent transportation systems. Examples include knowledge-based models, misbehavior detection architectures, and data-driven behavioral pattern mining. These techniques aim to move beyond simple classification toward designs that can contextualize behavior, provide explanations where appropriate, and adapt. They must still do so within the practical constraints of privacy, bias, and real-world deployment [13,48,50,56,57,58].

The literature reveals an evolution in three stages (see Table 4). It starts with traditional, interpretable analysis at the population level. It then moves to sensor-rich behavior inference based on machine learning. Finally, it reaches modeling of human factors that accounts for context and automation. The most robust contributions balance methodological rigor (clear ground truth and robust validation) with realistic sensing constraints and the situational variability of human driver behavior [8,12,13,14,43,44,45,46,48,49,50,51,52,53,54,55,56,57,58].

3.3. Vulnerable Road Users

The safety of vulnerable road users has become a growing area of research. Pedestrians, cyclists, and users of light personal mobility devices have little physical protection, and their safety depends on infrastructure design, traffic conditions, and visibility conditions. The literature on VRUs establishes that crash and injury risk is determined not only by vehicle dynamics but also by a complex set of infrastructural, spatial, behavioral, and environmental variables [18,59,60].

Many studies use traditional statistical models to measure the severity and frequency of injuries to vulnerable road users. Logistic regression and ordered response analyses have been used extensively to study the effects of several factors on pedestrian and cyclist safety [18,60]. These factors include vehicle speed, intersection geometry, crossing design, lighting conditions, and demographic characteristics. These investigations consistently show that poor lighting, complex urban intersection layouts, long crossing distances, and higher operating speeds substantially increase the probability of severe or fatal injury to vulnerable road users. One advantage of such methods is that they are easily interpreted and directly support safety-oriented infrastructure design and policy development [61,62].

More sophisticated modeling approaches extend this framework by incorporating unobserved heterogeneity and spatial variation. Random parameters and Bayesian models have been applied to pedestrian and cyclist crash data to describe variation across places, times, and population groups, demonstrating that VRU risk factors are context-sensitive and not uniformly distributed across a road network [17,59,63,64]. These findings underline the need for local safety analysis, and results should be extrapolated between regions with caution and only after recalibration.
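One simple way to see how Bayesian models temper site-level estimates is the conjugate Poisson–Gamma model: each site's observed VRU crash rate is pulled toward a prior mean, more strongly when the site has little exposure. The sketch below uses made-up counts, a hypothetical exposure unit, and an assumed Gamma prior; it is a minimal illustration of Bayesian shrinkage, not the random-parameters or full Bayesian spatial specifications used in [17,59,63,64].

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    crashes: int     # observed pedestrian/cyclist crashes at the site
    exposure: float  # hypothetical exposure unit, e.g. site-years x 1,000 daily pedestrians


# Hypothetical prior for the site crash rate: Gamma(shape=ALPHA, rate=BETA), prior mean 0.5
ALPHA, BETA = 2.0, 4.0


def posterior_rate(site: Site) -> tuple:
    """Conjugate update for y ~ Poisson(rate * exposure): rate | y ~ Gamma(ALPHA + y, BETA + exposure).

    Returns the posterior mean and standard deviation of the crash rate."""
    shape = ALPHA + site.crashes
    rate = BETA + site.exposure
    return shape / rate, shape ** 0.5 / rate


sites = [Site("A", 3, 1.0), Site("B", 30, 10.0), Site("C", 0, 2.0)]
for s in sites:
    mean, sd = posterior_rate(s)
    print(f"{s.name}: raw rate={s.crashes / s.exposure:.2f}  posterior mean={mean:.2f}  sd={sd:.2f}")
```

Sites A and B share the same raw rate of 3.0, but A's posterior mean (1.00) is pulled much closer to the prior mean of 0.5 than B's (about 2.29), because A has one tenth of the exposure. Site C, with no recorded crashes, still receives a non-zero estimate.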

Spatial analysis is another important research area for VRU safety. Methods such as GIS-based mapping, kernel density estimation, and network-based hotspot detection are commonly used to identify and monitor high-risk locations for pedestrian and cyclist crashes [65,66,67,68,69]. Because network-based methods explicitly account for the layout of the road network, they capture more realistic patterns of exposure and conflict locations than approaches based on Euclidean distance. The resulting hotspot maps can guide targeted interventions such as improved crossings, traffic calming, and nighttime lighting. However, the usefulness of these models depends on data quality, geocoding precision, and parameter settings [67,68].
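The difference between Euclidean and network-based density can be made concrete with a small network kernel density estimate, in which each crash contributes to nearby locations according to shortest-path distance along the road graph rather than straight-line distance. The sketch below assumes crashes have already been snapped to graph nodes and uses a quartic kernel with a hypothetical bandwidth; the road layout is invented, and dedicated network KDE tools additionally split edges into short segments (lixels), which is omitted here.

```python
import heapq
import math
from collections import defaultdict


def build_graph(edges):
    """Undirected road graph as an adjacency list: node -> [(neighbour, length_m)]."""
    graph = defaultdict(list)
    for u, v, length in edges:
        graph[u].append((v, length))
        graph[v].append((u, length))
    return graph


def shortest_paths(graph, source, cutoff):
    """Dijkstra network distances from `source`, pruned at `cutoff` metres."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd <= cutoff and nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def quartic(d, h):
    """Quartic (biweight) kernel weight, zero at or beyond bandwidth h (unnormalized)."""
    return (1 - (d / h) ** 2) ** 2 if d < h else 0.0


def network_kde(graph, crash_nodes, h):
    """Sum kernel weights at every node reachable within h along the network."""
    density = defaultdict(float)
    for c in crash_nodes:
        for node, d in shortest_paths(graph, c, h).items():
            density[node] += quartic(d, h)
    return dict(density)


# Hypothetical layout: two parallel streets 20 m apart, joined only by a 300 m link
edges = [("A", "B", 100), ("B", "C", 100), ("D", "E", 100), ("E", "F", 100), ("C", "F", 300)]
coords = {"B": (100.0, 0.0), "E": (100.0, 20.0)}  # hypothetical planar coordinates (m)
graph = build_graph(edges)
crashes = ["B", "B", "B"]  # three pedestrian crashes snapped to node B
h = 250.0                  # hypothetical bandwidth (m)

net = network_kde(graph, crashes, h)
euclid_E = 3 * quartic(math.dist(coords["B"], coords["E"]), h)
print("network density:", net)
print("density at E: network", net.get("E", 0.0), "vs Euclidean", round(euclid_E, 3))
```

Node E lies only 20 m from the crashes in a straight line, so a planar KDE assigns it almost the full density, yet along the network it is 500 m away and receives none. Unnormalized kernel weights are used because only the relative ranking of locations matters for hotspot identification.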

Research also addresses the behavioral dimension of VRU safety, focusing on pedestrians' decisions and their perception of risk. Some studies draw on social-cognitive models such as the Theory of Planned Behavior to explain unsafe crossing and compliance with traffic regulations in terms of perceived risk, social norms, group influences, and personality traits [70,71,72]. These user-focused approaches assess perceived safety through surveys and controlled experiments in immersive virtual reality, making it possible to identify vulnerabilities proactively in contexts where crash data are sparse or become available only after crashes have occurred [59,70].

A smaller but distinct line of research studies the interaction between VRU safety and advanced vehicles. It uses agent-based models [17], probabilistic frameworks, and simulation studies to analyze conflicts between pedestrians or cyclists and automated or connected vehicles. These studies often rely on surrogate safety measures (for example, post-encroachment time) to evaluate conflict risk before any crash occurs [62,73,74]. Although such methods offer useful insight into future traffic conditions, their results are sensitive to modeling assumptions that need to be validated with real traffic data.
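Post-encroachment time (PET) is straightforward to compute once trajectories have been reduced to entry and exit times for a shared conflict area. The sketch below assumes those timestamps are already available (for example from video tracking or a simulation run) and flags serious conflicts with a hypothetical 1.5 s threshold; the threshold and the example timestamps are assumptions, and the cited studies may use different values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Passage:
    user: str       # e.g. "pedestrian", "automated vehicle"
    t_enter: float  # time (s) the road user enters the conflict area
    t_exit: float   # time (s) the road user leaves the conflict area


def post_encroachment_time(a: Passage, b: Passage) -> Optional[float]:
    """PET = entry time of the second user minus exit time of the first user.

    Returns None when both users occupy the conflict area at the same time,
    a situation that PET does not describe."""
    first, second = sorted((a, b), key=lambda p: p.t_enter)
    pet = second.t_enter - first.t_exit
    return pet if pet >= 0 else None


PET_THRESHOLD_S = 1.5  # hypothetical cut-off for a "serious" conflict

ped = Passage("pedestrian", t_enter=10.0, t_exit=12.4)
car = Passage("automated vehicle", t_enter=13.1, t_exit=13.6)
pet = post_encroachment_time(ped, car)
print(f"PET = {pet:.1f} s, serious conflict: {pet is not None and pet < PET_THRESHOLD_S}")
```

The function is symmetric in its arguments, so the order in which the two road users are passed does not matter.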

Overall, the literature shows that analyzing VRU safety requires combining interpretable statistical models with spatial and behavioral theories, as well as emerging practices such as simulation-based assessment of new vehicle technologies. Approaches should be context-specific, adapted to local infrastructure, traffic data, and VRU behavior. Progress in VRU safety depends on coordinating these complementary approaches and applying them in ways that account for the mechanisms of vulnerability identified across studies [17,18,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,75,76,77]. Several representative test cases are presented in Table 5.

3.4. Spatial Analysis and Hotspot Detection

In recent years, traffic safety research has moved from “global-average” explanations toward spatially explicit thinking, because crashes exhibit both spatial dependence and spatial heterogeneity [78]. Hotspot detection and clustering are more than cartographic outputs; they are decision-making tools that direct limited resources toward locations where risk is concentrated and where interventions are likely to have the greatest impact [79,80,81].

One major research direction is hotspot discovery and crash clustering using geographic information systems (GIS) and spatial statistics, combined with clustering algorithms and network-aware density methods. Recent studies show that algorithm selection matters: hierarchical and density-based clustering reveal different hotspot structures, and proximity analysis can identify underserved areas located close to high-incidence sites [80,81]. Network-based techniques, such as modified kernel density estimation and network kriging, produce more realistic hotspots because they account for network geometry rather than Euclidean distance alone [79]. In settings with limited or low-quality data, simpler methods that rank sites by a priority value remain useful for identifying hazardous sites and prompting low-cost countermeasures until more sophisticated modeling is possible [82]. At the macro level, spatial summaries such as mortality-rate mapping combined with spatial autocorrelation analysis support resource allocation and benchmarking between regions [83].
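The observation that algorithm choice shapes hotspot structure can be explored with a density-based method such as DBSCAN, which groups crashes that lie within a distance `eps` of enough neighbours and leaves isolated crashes as noise. The following is a minimal pure-Python sketch on hypothetical projected coordinates (metres); the `eps` and `min_pts` values are illustrative assumptions, and real studies typically rely on library implementations and, on road networks, on network rather than Euclidean distance.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point: cluster id >= 0, or -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Brute-force O(n^2) neighbourhood query (includes the point itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # previously noise: reachable, so a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point: keep expanding the cluster
    return labels


# Hypothetical crash locations in projected coordinates (metres)
crashes = [(0, 0), (10, 5), (5, 12), (12, 12),   # dense group near an intersection
           (500, 500), (510, 505), (505, 490),   # second dense group
           (250, 250)]                           # isolated crash
labels = dbscan(crashes, eps=20.0, min_pts=3)
print(labels)
```

Raising `min_pts` to 5 on the same data turns every crash into noise, a small reminder of how strongly hotspot outputs depend on parameter settings.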

A complementary research direction addresses spatial heterogeneity by moving from global models to local techniques such as geographically weighted regression (GWR) and spatial machine learning. To capture built-environment effects, geographically weighted deep learning has been proposed to model nonlinear, spatially varying relationships, and it can outperform classical GWR baselines. Related research shows that standard GWR detects spatial variation in the relationship between built-environment characteristics and crashes, with differences in direction and magnitude between areas [16,84]. For speeding-related injury severity, geographically weighted neural networks (GWNNs) learn a local model for each crash location, and the resulting spatial variation in marginal effects supports place-based countermeasures instead of universal rules [85]. At the city network level, geographically weighted random forest (GWRF) extends this approach to identify determinants of crash frequency with better predictive power and interpretable local importance patterns [86]. Multiscale geographically weighted regression (MGWR) further shows that different contributing factors may operate at different spatial scales, which improves model fit and interpretability for urban crash patterns [86]. Spatial regression on planning areas remains useful and feasible, particularly because actionable models can be built with standard GIS tools (such as GeoDa 1.5.37) as part of agencies' routine operations [87]. Spatial panel models are important for capturing spatial externalities and temporal dynamics, which matter for both inference and policy [88].
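The core idea of GWR, estimating a separate regression at each location with nearby observations weighted more heavily, fits in a few lines. The sketch below uses a single predictor, a bisquare kernel, a fixed bandwidth, and synthetic zones whose true slope differs between a western and an eastern cluster; all values are hypothetical, and full GWR/MGWR software also selects the bandwidth (for example by AICc) and reports local standard errors, which is omitted here.

```python
import math


def weighted_fit(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ym - b1 * xm, b1


def bisquare(d, h):
    """Bisquare kernel: weight decays with distance d and is zero beyond bandwidth h."""
    return (1 - (d / h) ** 2) ** 2 if d < h else 0.0


def gwr(coords, x, y, h):
    """One local weighted regression per zone centroid, weighting zones by distance."""
    fits = []
    for ci in coords:
        w = [bisquare(math.dist(ci, cj), h) for cj in coords]
        fits.append(weighted_fit(x, y, w))
    return fits


# Hypothetical zones: five in the west, five in the east (centroids in km)
coords = [(float(u), 0.0) for u in (0, 1, 2, 3, 4, 10, 11, 12, 13, 14)]
x = [1, 2, 3, 4, 5] * 2  # e.g. intersection density (hypothetical)
y = [1 + 0.5 * v for v in x[:5]] + [1 + 2.0 * v for v in x[5:]]  # crash rate, true slope 0.5 vs 2.0

local = gwr(coords, x, y, h=5.0)
global_b0, global_b1 = weighted_fit(x, y, [1.0] * len(x))  # ordinary (global) regression
print("global slope:", round(global_b1, 3))
print("local slopes:", [round(b1, 3) for _, b1 in local])
```

The single global slope (1.25) averages away the east–west difference that the local fits recover, which is the motivation for place-based rather than universal countermeasures.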

Beyond geographical factors, research also integrates behavioral, operational, and data-driven mechanisms to explain why risk clusters on specific segments at particular times. Analyses of highway tunnel zones show that risk can build up through the spatiotemporal evolution of driving-task demands under restricted visibility and sight-distance conditions, which explains why local segments may show higher risk than the network-wide average [89]. Crash analysis also encounters heterogeneity that is not purely spatial (unobserved differences across units, driver groups, or environments), which calls for spatiotemporal crash prediction methods that address this additional challenge [90,91]. Injury-severity models that account for unobserved heterogeneity among vulnerable road users offer a more contextual explanation of cyclist injury outcomes [92]. Emerging areas include intelligent and connected mobility: latent class analyses of autonomous vehicle crash reports reveal that crash scenarios form distinct subclasses, with spatial risk shaped by operating mode and environment [93]. Meanwhile, spatiotemporal graph learning, which has been well studied for traffic-state prediction, could supply input data streams for safety analytics if appropriately integrated [94]. Another branch uses safety-critical maneuvers extracted from naturalistic driving data for spatial crash prediction, supporting proactive safety services or early warning strategies [95]. When spatial aggregation is treated as a methodological choice and model sensitivity to different spatial and temporal groupings is assessed, the best model is no longer obvious; aggregation-sensitive modeling of crash frequency is therefore essential for sound spatial safety analysis [96].
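The sensitivity of spatial results to aggregation (often called the modifiable areal unit problem) can be demonstrated with a toy example in which the same hypothetical crash points, counted on two grid resolutions, yield different "top" cells. This is only an illustration of the aggregation effect discussed above, not a reproduction of the modeling in [96]; the coordinates and cell sizes are invented.

```python
from collections import Counter


def grid_counts(points, cell):
    """Count crash points per square grid cell of side `cell` (metres)."""
    return Counter((int(x // cell), int(y // cell)) for x, y in points)


# Hypothetical crash coordinates (metres): a tight group and a looser, larger group
crashes = [(10, 10), (20, 20), (30, 30), (40, 40),
           (250, 250), (350, 250), (250, 350), (350, 350), (260, 260), (360, 360)]
for cell in (100, 200):
    top_cell, n = grid_counts(crashes, cell).most_common(1)[0]
    print(f"{cell} m grid: top cell {top_cell} with {n} crashes")
```

At 100 m the tight group of four crashes ranks first, while at 200 m the looser group of six crashes, previously split over four cells, becomes the top hotspot.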

Spatial analytics and hotspot detection support road safety studies by (i) distinguishing statistically significant clusters from raw-count clusters, (ii) enabling location-level interventions that reflect spatial heterogeneity, and (iii) integrating machine learning with GIS to make decision-making more transparent [78,80,85]. Important practical and methodological limitations remain, however. First, precise geocoding can be difficult to obtain, especially in rural areas. Second, there is a trade-off between model complexity and interpretability. Third, spatial results may be sensitive to aggregation decisions, and transferring models between regions is not straightforward without rigorous local validation and calibration [88,90,96]. Table 6 presents several case studies exploring spatial analysis and hotspot detection.
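Point (i) above, separating statistically significant clusters from high raw counts, is commonly operationalized with local statistics such as Getis-Ord Gi*. The sketch below computes Gi* z-scores for hypothetical crash counts on a sequence of road segments with simple adjacency weights; the counts, the contiguity rule, and the 1.96 cut-off (a two-sided 5% level under the normal approximation, without any multiple-testing correction) are illustrative assumptions.

```python
import math


def getis_ord_gi_star(values, neighbours):
    """Getis-Ord Gi* z-scores with binary weights.

    neighbours[i] lists the units j with w_ij = 1 and must include i itself (the 'star')."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z = []
    for i in range(n):
        w_sum = len(neighbours[i])  # sum_j w_ij; with 0/1 weights this also equals sum_j w_ij^2
        local_sum = sum(values[j] for j in neighbours[i])
        num = local_sum - mean * w_sum
        den = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        z.append(num / den)
    return z


# Hypothetical crash counts on 12 consecutive road segments
counts = [1, 0, 1, 1, 6, 7, 6, 1, 0, 9, 0, 1]
# Contiguity along the road: each segment plus its immediate predecessor and successor
nbrs = [[j for j in (i - 1, i, i + 1) if 0 <= j < len(counts)] for i in range(len(counts))]
z = getis_ord_gi_star(counts, nbrs)
for i, (c, zi) in enumerate(zip(counts, z)):
    print(f"segment {i:2d}: count={c}  Gi* z={zi:5.2f}{'  hot spot' if zi > 1.96 else ''}")
```

In this toy example, segment 9 has the highest raw count but a z-score near zero because its neighbours recorded no crashes, whereas segment 5 is flagged because it sits inside a run of high counts.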

3.5. Artificial Intelligence, Machine Learning, and Computer Vision in Intelligent Transportation Systems

Artificial intelligence (AI), machine learning (ML), and computer vision are core components of modern intelligent transportation systems (ITS), automating perception, prediction, and decision-making in complex traffic scenes [74]. Vehicle detection, classification, and incident prediction are central ITS tasks, and deep learning (DL) architectures have improved recognition accuracy in complex real-world scenes with objects of varying scale, occlusion, and multiple viewing angles [10]. Recent advances based on the YOLO (You Only Look Once) algorithm show promising results for detecting vehicles at multiple orientations and spatial scales, overcoming drawbacks of traditional image-processing pipelines [10]. In parallel, Vehicular Ad Hoc Networks (VANETs) enable low-latency vehicle-to-infrastructure communication for real-time sharing of safety information and coordinated control strategies [20]. With the development of big data analytics and ML, the focus has broadened from traffic prediction and optimization toward supporting informed travel decisions and reducing the environmental impact of congestion [84]. Multimodal sensing systems that combine visual and auditory inputs also enhance perception, enabling robust recognition of emergency vehicles in poor visibility or in the presence of background noise [21].
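YOLO-style detectors typically emit several overlapping candidate boxes for the same vehicle, which are usually reduced to one box per object using intersection over union (IoU) and non-maximum suppression (NMS). The sketch below shows greedy NMS in plain Python on hypothetical boxes and confidence scores; it illustrates this generic post-processing step only, not the YOLOv8n-CGW architecture of [10], and the 0.5 IoU threshold is an assumed default.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: List[Box], scores: List[float], iou_thr: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it above iou_thr, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep


# Hypothetical raw detections: two overlapping boxes on one car, one box on a second car
boxes = [(100, 100, 200, 180), (105, 98, 205, 182), (300, 120, 380, 190)]
scores = [0.91, 0.85, 0.78]
print(non_max_suppression(boxes, scores))  # indices of the boxes that survive
```

The two boxes on the first car overlap with an IoU of about 0.86, so the lower-scoring one is suppressed, while the box on the second car is kept.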

Advances in learning methodologies are expanding the capabilities of ITS. Meta-transfer metric learning and deep spatiotemporal models have been developed to improve performance in small-data settings and to enable adaptive learning, which would be valuable for next-generation systems such as 6G-enabled ITS [97,98]. Interpretable ML methods provide clear insight into complex crash causality, supporting real-world deployment and safety regulation [9]. Recent real-time AI solutions can process live camera feeds together with traffic context data to identify crashes, blockages, and abnormal traffic density in smart city scenarios, enabling proactive traffic management. Blockchain-based authentication frameworks help secure data exchange and preserve data integrity in autonomous vehicle–infrastructure interactions [99,100].

Big data analytics is another important component of modern ITS, supporting real-time processing of large-scale data and data-driven safety management at the network level [11]. Large-scale metric learning approaches and deep convolutional neural networks (CNNs) offer a cost-efficient alternative to manual safety inspection by processing large volumes of road and infrastructure images [101]. Computer vision systems based on facial landmark detection support real-time detection of drowsiness and distraction by tracking eyelid closure, blink dynamics, and yawning-related facial movements [102,103]. ML systems combine data on traffic offenses, GPS traces, and crash history to build driver risk profiles, enabling preventive interventions by fleet managers [104,105]. Artificial neural networks (ANNs) and related learning architectures extend these methods by predicting crash risk from combined environmental and behavioral predictors, allowing proactive countermeasures on highways [106].
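One widely used landmark-based drowsiness cue is the eye aspect ratio (EAR), which compares the vertical eyelid opening with the eye width and drops toward zero when the eye closes. The sketch below computes EAR from six (x, y) landmarks per eye and the share of frames in which the eye is judged closed, a PERCLOS-style indicator. The landmark ordering, the 0.2 threshold, and the landmark coordinates are assumptions for illustration; the systems cited in [102,103] may use different features and thresholds.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """EAR = (|p2-p6| + |p3-p5|) / (2 |p1-p4|) for six landmarks p1..p6 around one eye.

    p1 and p4 are the eye corners; upper-lid points p2, p3 pair with lower-lid points p6, p5."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def closed_fraction(ear_series: Sequence[float], threshold: float = 0.2) -> float:
    """Share of frames with EAR below `threshold` (a PERCLOS-style indicator)."""
    return sum(e < threshold for e in ear_series) / len(ear_series)


# Hypothetical landmark coordinates (pixels; image y axis points down)
open_eye = [(0, 0), (10, -6), (20, -6), (30, 0), (20, 6), (10, 6)]
closed_eye = [(0, 0), (10, -1), (20, -1), (30, 0), (20, 1), (10, 1)]
print(round(eye_aspect_ratio(open_eye), 3), round(eye_aspect_ratio(closed_eye), 3))

# Ten hypothetical frames, three of them with the eye nearly closed
series = [eye_aspect_ratio(open_eye)] * 7 + [eye_aspect_ratio(closed_eye)] * 3
print(closed_fraction(series))
```

Because EAR is a ratio, it is largely insensitive to the driver's distance from the camera, which is one reason it is popular for in-cabin monitoring.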

Beyond perception and prediction, security and system resilience are important concerns. Blockchain-based authentication frameworks use certificate schemes to preserve anonymity while validating vehicle-to-infrastructure communication, which mitigates spoofing and tampering in distributed traffic scenarios [99,100]. ML-based crash severity modeling helps identify risk factors associated with fatalities and enables targeted countermeasures against severe and fatal crashes [107]. Integrating AI with Internet of Things (IoT) sensor networks, adaptive analytics pipelines, and distributed data management has the potential to improve the resilience and responsiveness of ITS at different operational levels [74,108]. Together, these technologies enable ITS to sense, forecast, and act, transforming them from reactive surveillance systems into tools for proactive safety management.

AI and ML represent a new operational model for ITS, automating perception, prediction, and decision support [10,11,20]. DL methods are achieving human-comparable accuracy in tasks such as vehicle detection and risk detection, and they can be optimized for real-time operation to support safety prediction directly on road networks [74,97,109]. Scalable multimodal sensor fusion extends system awareness to challenging visibility and noise conditions. Interpretable AI supports evidence-based interventions by revealing the relationships captured within complex predictive models [9,21,104]. Low-cost video analysis platforms enable continuous monitoring, shifting road safety from reactive analysis to proactive control in connected mobility [101,108,110].

However, current work has several limitations. Most AI and ML techniques require large amounts of labeled training data, and model performance may degrade in adverse conditions such as snow, fog, or darkness [86,90,97]. High computational requirements, privacy constraints, and security vulnerabilities also hinder real-world deployment, especially at scale or in resource-constrained applications [9,99,102]. Despite recent progress, model bias, limited generalization between regions, and gaps in technical infrastructure remain major obstacles to widespread adoption, particularly for road safety applications of ITS, which must operate reliably in extreme, high-stakes situations [74,100,102,103,107].

Beyond detection and prediction performance, the use of AI in ITS raises ethical and economic issues. AI systems often rely on large volumes of traffic, image, behavioral, and sensor data, which raises questions about privacy, data access, and data use [11,21,24]. Model bias and limited transparency are further concerns, especially when complex models are used in traffic monitoring, risk detection, and automated assistance systems [9,102]. Security is also important, as connected transportation systems depend on reliable data exchange and protection against spoofing, manipulation, and unauthorized access [99,100]. From an economic point of view, deploying these systems requires investment in sensors, computing resources, storage, communication infrastructure, and model maintenance [11,21]. These limitations show that the use of AI in road safety depends not only on model accuracy but also on data governance, system security, infrastructure capacity, and institutional support [24]. Representative test cases of ITS are detailed in Table 7.

Taken together, the themes identified by the BERTopic analysis show a change in the way road safety problems are studied. Earlier work focused mainly on accident description, injury analysis, and statistical modeling based on structured crash records. Recent studies show a stronger focus on machine learning, spatial analysis, computer vision, and multimodal data for prediction, monitoring, and prevention [22,23,24,26]. This change affects both research questions and research practice. Questions centered only on why crashes occurred are now accompanied by questions about where risk is forming, how it can be detected earlier, and how transport systems can respond before a crash occurs. This also links road safety research more directly to traffic operations, infrastructure planning, and intelligent transport systems [11,22,23,24,26].

4. Conclusions

BERTopic-based topic modeling of publications from 2016 to 2025 identified five main research directions in road safety: injury severity analysis, driver behavior, vulnerable road users, spatial analysis and hotspot detection, and artificial intelligence in intelligent transportation systems.

Recent studies show that data analysis and artificial intelligence methods can address several limitations of conventional statistical models. Traditional models often struggle to represent nonlinear relationships and complex interactions among road infrastructure, traffic conditions, human behavior, and environmental factors [1,2,3,4,5,6,7,15,31,32,36,38,42]. Hybrid models that combine DL architectures with classical ML algorithms offer better predictive performance. The growth of Explainable AI (XAI) techniques such as SHapley Additive exPlanations (SHAP) improves model interpretability and increases trust in model outputs [5,6,9,35,36,42]. This interpretability is important for translating analytical results into decisions about spatial planning and road safety measures [9,11].
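SHAP builds on Shapley values from cooperative game theory. For a model with only a few features, Shapley values can be computed exactly by enumerating feature coalitions; the sketch below does so for a hypothetical crash-risk scoring function, replacing "absent" features with baseline values (one common choice of value function, but not the only one). The SHAP library itself relies on faster model-specific or sampling-based approximations, and nothing here reproduces the models of the cited studies.

```python
from itertools import combinations
from math import factorial


def risk_score(x):
    """Hypothetical crash-risk model with an interaction between speed and wet road."""
    speed, wet, night = x
    return 0.02 * speed + 0.5 * wet + 0.3 * night + 0.01 * speed * wet


def shapley_values(f, x, baseline):
    """Exact Shapley values; features outside a coalition are set to their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


x = [90.0, 1.0, 1.0]         # hypothetical case: 90 km/h, wet road, night
baseline = [50.0, 0.0, 0.0]  # hypothetical reference conditions
phi = shapley_values(risk_score, x, baseline)
print([round(p, 3) for p in phi])
# Efficiency property: attributions sum to f(x) - f(baseline)
print(round(sum(phi), 6), round(risk_score(x) - risk_score(baseline), 6))
```

The speed-by-wet interaction is split between the two features that create it, and the attributions add up exactly to the difference between the prediction and the baseline, which is what makes such explanations auditable.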

Another trend identified in the literature is the increasing use of multimodal data. These datasets combine spatial information, vehicle dynamics, driver behavior, and visual or sensor data. Such methods provide a more detailed representation of traffic risk and enable continuous monitoring of traffic participants and conditions. As a result, the focus of safety analysis is gradually shifting from retrospective accident studies toward proactive risk management and intervention before accidents occur [3,16,67,93,101]. The development of GIS and spatial modeling techniques such as Bayesian spatial models and geographically weighted regression applications has also enhanced the identification of local risk patterns and high-risk locations across urban or regional road networks [58,78,80,83,85].

These research trends also have practical implications for road safety. Artificial intelligence, spatial analysis, and sensor technologies support the development of driver assistance systems, traffic monitoring, incident detection, and infrastructure assessment, and studies in this area show that machine learning, computer vision, and traffic data can improve incident detection, traffic monitoring, and crash risk prediction [9,10,11,21].

These methods help identify traffic hazards and provide practical solutions for road traffic planning and management. In this context, the shift toward safety prediction is connected not only to research methods but also to the adoption of data-driven transport systems and road safety management.

Despite these advances, several barriers continue to limit the practical implementation of road safety models based on artificial intelligence. Data quality remains a major challenge: crash and traffic databases often contain missing values, inconsistent variable definitions, and differences between regional data collection systems. These issues reduce the ability of predictive models to perform well outside the regions where they were developed [79,80,82,83]. A further limitation is that many studies still treat crashes as isolated events rather than as the outcome of processes that evolve over time. This static perspective restricts the development of real-time, adaptive safety strategies [77,82,83].

Another limitation of this review is that we analyzed only English-language publications. Road safety is highly localized, depending heavily on regional traffic laws, driving culture, and local infrastructure. Because we excluded non-English literature, our review likely misses some regional research, especially case studies published in national journals.

The reviewed literature suggests several directions for future research. One direction is the development of dynamic spatial and temporal risk models that capture how risk changes across time and space. Such models could better represent real traffic conditions and support adaptive safety measures. However, their application requires long and consistent time series data and involves complex model calibration and validation procedures [77,82,83,84,86].

A second direction involves integrating multimodal data into unified analytical frameworks. Combining spatial, behavioral, and sensor-based data can improve risk estimation and situational awareness. At the same time, this approach faces limitations related to data heterogeneity, synchronization between sources, and high computational cost. Addressing them will require research on consistent data processing pipelines and efficient data integration strategies [3,16,67,93,101].

A third direction is the use of local and adaptive models to represent spatial heterogeneity more realistically. Approaches such as geographically weighted regression and spatial machine learning reveal how the relationships between risk factors and road crashes vary across space. Their main benefit is that they support targeted, context-specific safety measures. A significant drawback is their sensitivity to parameter choices and their poor transferability, which often requires careful recalibration for each new geographic region [83,84,85,86].

A fourth direction is the use of critical driving events and naturalistic driving data to predict risk proactively. By analyzing near-miss situations and safety-critical maneuvers, these methods can reveal risks before crashes occur and support the development of early warning systems. Their principal drawbacks are dependence on sophisticated sensing infrastructure, associated privacy concerns, and the lack of a uniform definition of a critical event [80,95].
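A common way to extract safety-critical events from naturalistic car-following data is to compute time to collision (TTC), the remaining gap divided by the closing speed, and to flag episodes in which it falls below a threshold. The sketch below uses a hypothetical trace sampled every 0.5 s and an assumed 3 s threshold; as noted above, there is no uniform definition of a critical event, so both the indicator and the threshold are modeling choices rather than a standard.

```python
import math
from typing import List, Sequence, Tuple


def time_to_collision(gap_m: float, v_follow: float, v_lead: float) -> float:
    """Car-following TTC in seconds; infinite when the follower is not closing in."""
    closing_speed = v_follow - v_lead  # m/s
    return gap_m / closing_speed if closing_speed > 0 else math.inf


def critical_episodes(samples: Sequence[Tuple[float, float, float]],
                      threshold_s: float = 3.0, dt: float = 0.1) -> List[Tuple[float, float]]:
    """(start, end) times of runs of consecutive samples whose TTC is below threshold_s.

    Each sample is (gap_m, follower_speed_mps, leader_speed_mps), recorded every dt seconds."""
    episodes, start = [], None
    for k, (gap, v_f, v_l) in enumerate(samples):
        t = k * dt
        below = time_to_collision(gap, v_f, v_l) < threshold_s
        if below and start is None:
            start = t
        elif not below and start is not None:
            episodes.append((start, t))
            start = None
    if start is not None:  # episode still open at the end of the trace
        episodes.append((start, len(samples) * dt))
    return episodes


# Hypothetical trace sampled every 0.5 s: (gap m, follower speed m/s, leader speed m/s)
trace = [(30, 20, 20), (30, 22, 18), (28, 25, 15), (23, 25, 15), (18, 20, 20), (18, 18, 20)]
print(critical_episodes(trace, dt=0.5))
```

In a real naturalistic dataset the same scan would run over millions of samples, and the flagged episodes would then be reviewed or used as proxy events in place of scarce crash records.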

A fifth direction concerns improving the transfer of models across regions. Predictive models based on artificial intelligence may perform well in one context but show reduced accuracy when applied to different geographic or cultural environments. Future research should therefore focus on cross-region validation, domain adaptation methods, and automatic model recalibration. Although these methods could improve practical applicability, they still depend on improved data harmonization and resources for continuous validation [79,80,82,83,94,98].
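Among the simplest recalibration strategies for transferring a crash-severity classifier to a new region is to keep its coefficients but re-estimate the intercept so that predicted and observed severe-crash rates agree locally (often called recalibration-in-the-large). The sketch below solves for that intercept shift by bisection on hypothetical predictions and outcomes; it is a minimal illustration, not one of the domain-adaptation methods referenced in this paragraph, and a fuller recalibration would also re-estimate the slope on the logit scale.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def sigmoid(z: float) -> float:
    return 1 / (1 + math.exp(-z))


def recalibrate_intercept(pred, observed, lo=-10.0, hi=10.0, iters=100):
    """Intercept shift d such that mean(sigmoid(logit(p) + d)) equals the observed event rate.

    The mean recalibrated probability increases monotonically with d, so bisection suffices."""
    target = sum(observed) / len(observed)

    def mean_pred(d):
        return sum(sigmoid(logit(p) + d) for p in pred) / len(pred)

    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean_pred(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical: source-region model predictions for ten crashes in a new region,
# and whether each crash was actually severe (1) or not (0)
pred = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.1, 0.3]
observed = [0, 1, 0, 1, 0, 0, 1, 0, 0, 1]
d = recalibrate_intercept(pred, observed)
adjusted = [sigmoid(logit(p) + d) for p in pred]
print(f"shift = {d:.3f}; mean predicted {sum(pred) / len(pred):.2f} -> {sum(adjusted) / len(adjusted):.2f}")
```

Shifting only the intercept preserves the model's ranking of crashes by risk while correcting the overall level, which is why it is often a first step before heavier domain adaptation.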

Road safety research is moving toward predictive and proactive approaches supported by artificial intelligence and spatial analysis. These methods offer improvements in predictive performance and analytical capability. However, their practical impact depends on addressing current challenges related to data quality, model transparency, robustness, and transferability. Further progress in the field will require not only methodological advances but also coordinated efforts in data standardization and responsible implementation of artificial intelligence in transportation systems [98,104,105,107,108,109].

Author Contributions

Conceptualization, F.G. and I.A.T.; methodology, F.G. and I.A.T.; software, F.G.; validation, F.G. and I.A.T.; formal analysis, I.A.T.; investigation, F.G. and I.A.T.; resources, I.A.T.; data curation, F.G. and I.A.T.; writing—original draft preparation, F.G. and I.A.T.; writing—review and editing, F.G. and I.A.T.; visualization, F.G.; supervision, F.G.; project administration, F.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a grant of the Ministry of Research, Innovation and Digitization, CNCS-UEFISCDI, project number PN-IV-P2-2.1-TE-2023-1434, within PNCDI IV.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
ML	Machine Learning
DL	Deep Learning
ITS	Intelligent Transportation Systems
VRUs	Vulnerable Road Users
CNN	Convolutional Neural Network
ANN	Artificial Neural Network
DNN	Deep Neural Network
GIS	Geographic Information Systems
NLP	Natural Language Processing
NLTK	Natural Language Toolkit
BERTopic	Bidirectional Encoder Representations from Transformers for Topic Modeling
c-TF-IDF	Class-based Term Frequency–Inverse Document Frequency
UMAP	Uniform Manifold Approximation and Projection
HDBSCAN	Hierarchical Density-Based Spatial Clustering of Applications with Noise
POS	Part-of-Speech
YOLO	You Only Look Once
VANETs	Vehicular Ad Hoc Networks
IoT	Internet of Things
GPS	Global Positioning System
XAI	Explainable Artificial Intelligence
SHAP	SHapley Additive exPlanations
GWR	Geographically Weighted Regression
MGWR	Multiscale Geographically Weighted Regression
GWNN	Geographically Weighted Neural Network
GWRF	Geographically Weighted Random Forest
EB	Electric Bicycle
GOMs	Generalized Ordered Probit Models
6G	Sixth Generation Mobile Networks
MIT	Massachusetts Institute of Technology
mAP	mean Average Precision
ECG	Electrocardiogram
AutoML	Automated Machine Learning
AUC	Area Under the Curve
EEG	Electroencephalography
CAN-bus	Controller Area Network bus
ADAS	Advanced Driver Assistance Systems
XGBoost	Extreme Gradient Boosting
ROC	Receiver Operating Characteristic

Appendix A

TITLE-ABS-KEY (Road accident analysis) AND PUBYEAR > 2015 AND PUBYEAR < 2026 AND (LIMIT-TO (DOCTYPE, "ar") OR LIMIT-TO (DOCTYPE, "cp") OR LIMIT-TO (DOCTYPE, "re") OR LIMIT-TO (DOCTYPE, "ch") OR LIMIT-TO (DOCTYPE, "sh") OR LIMIT-TO (DOCTYPE, "dp")) AND (LIMIT-TO (LANGUAGE, "English"))

References

Ghomi, H.; Bagheri, M.; Fu, L.; Miranda-Moreno, L.F. Analyzing injury severity factors at highway railway grade crossing accidents involving vulnerable road users: A comparative study. J. Traffic Inj. Prev. 2016, 17, 833–841. [Google Scholar] [CrossRef]
Haleem, K.; Alluri, P.; Gan, A. Analyzing pedestrian crash injury severity at signalized and non-signalized locations. J. Accid. Anal. Prev. 2015, 81, 14–23. [Google Scholar] [CrossRef] [PubMed]
Santos, K.; Dias, J.P.; Amado, C. A literature review of machine learning algorithms for crash injury severity prediction. J. Saf. Res. 2022, 80, 254–269. [Google Scholar] [CrossRef] [PubMed]
Liu, S.; Fan, W.D.; Li, Y. Injury severity analysis of rollover crashes for passenger cars and light trucks considering temporal stability: A random parameters logit approach with heterogeneity in mean and variance. J. Saf. Res. 2021, 78, 276–291. [Google Scholar] [CrossRef]
Madushani, J.S.; Sandamal, R.K.; Meddage, D.; Pasindu, H.; Gomes, P.A. Evaluating expressway traffic crash severity by using logistic regression and explainable & supervised machine learning classifiers. J. Transp. Eng. 2023, 13, 100190. [Google Scholar]
Rahim, M.A.; Hassan, H.M. A deep learning based traffic crash severity prediction framework. J. Accid. Anal. Prev. 2021, 154, 106090. [Google Scholar] [CrossRef]
Tanishita, M.; Sekiguchi, Y.; Sunaga, D. Impact analysis of road infrastructure and traffic control on severity of pedestrian–vehicle crashes at intersections and non-intersections using bias-reduced logistic regression. J. IATSS Res. 2023, 47, 233–239. [Google Scholar] [CrossRef]
Helland, A.; Lydersen, S.; Lervåg, L.-E.; Jenssen, G.D.; Mørland, J.; Slørdal, L. Driving simulator sickness: Impact on driving performance, influence of blood alcohol concentration, and effect of repeated simulator exposures. J. Accid. Anal. Prev. 2016, 94, 180–187. [Google Scholar] [CrossRef]
Ahmed, S.; Hossain, M.A.; Ray, S.K.; Bhuiyan, M.M.I.; Sabuj, S.R. A study on road accident prediction and contributing factors using explainable machine learning models: Analysis and performance. J. Transp. Res. Interdiscip. Perspect. 2023, 19, 100814. [Google Scholar] [CrossRef]
Berwo, M.A.; Fang, Y.; Sarwar, N.; Mahmood, J.; Aljohani, M.; Elhosseini, M. YOLOv8n-CGW: A novel approach to multi-oriented vehicle detection in intelligent transportation systems. J. Multimed. Tools Appl. 2025, 84, 3809–3840. [Google Scholar] [CrossRef]
Nguyen, H.P.; Nguyen, P.Q.P.; Bui, V.D. Applications of big data analytics in traffic management in intelligent transportation systems. J. JOIV: Int. J. Inform. Vis. 2022, 6, 177–187. [Google Scholar] [CrossRef]
Chen, L.-l.; Zhao, Y.; Ye, P.-f.; Zhang, J.; Zou, J.-z. Detecting driving stress in physiological signals based on multimodal feature analysis and kernel classifiers. J. Expert. Syst. Appl. 2017, 85, 279–291. [Google Scholar] [CrossRef]
Hu, Y.; Wang, F.; Ye, D.; Wu, M.; Kang, J.; Yu, R. Llm-based misbehavior detection architecture for enhanced traffic safety in connected autonomous vehicles. J. IEEE Trans. Veh. Technol. 2025, 74, 12829–12841. [Google Scholar] [CrossRef]
Useche, S.A.; Alonso, F.; Montoro, L.; Esteban, C. Distraction of cyclists: How does it influence their risky behaviors and traffic crashes? J. PeerJ 2018, 6, e5616.
Cheng, W.; Gill, G.S.; Sakrani, T.; Dasu, M.; Zhou, J. Predicting motorcycle crash injury severity using weather data and alternative Bayesian multivariate crash frequency models. J. Accid. Anal. Prev. 2017, 108, 172–180.
Li, T.; Liu, S.; Fan, G.; Zhao, H.; Zhang, M.; Fan, J.; Li, C. Spatial heterogeneity effect of built environment on traffic safety using geographically weighted atrous convolutions neural network. J. Accid. Anal. Prev. 2025, 213, 107934.
Merzougui, S.E.; Limani, X.; Gavrielides, A.; Reiter, P.; Palazzi, C.E.; Marquez-Barja, J. Leveraging edge computing and orchestration platform for enhanced pedestrian safety application: The DEDICAT-6G approach. In Proceedings of the 2024 IEEE 21st Consumer Communications & Networking Conference (CCNC); IEEE: New York, NY, USA, 2024; pp. 380–383.
Mukherjee, D. Assessing pedestrian safety at urban signalized intersections across various land use types: Insights from a mid-sized Indian city. J. Discov. Appl. Sci. 2025, 7, 384.
Wang, S.; Gao, K.; Zhang, L.; Yu, B.; Easa, S.M. Geographically weighted machine learning for modeling spatial heterogeneity in traffic crash frequency and determinants in US. J. Accid. Anal. Prev. 2024, 199, 107528.
Behura, A.; Kumar, A.; Jain, P.K. A comparative performance analysis of vehicular routing protocols in intelligent transportation systems. J. Telecommun. Syst. 2025, 88, 26.
Zohaib, M.; Asim, M.; ELAffendi, M. Enhancing emergency vehicle detection: A deep learning approach with multimodal fusion. J. Math. 2024, 12, 1514.
Tzika-Kostopoulou, D.; Nathanail, E.; Kokkinos, K. Big data in transportation: A systematic literature analysis and topic classification. J. Knowl. Inf. Syst. 2024, 66, 5021–5046.
Ulu, M.; Türkan, Y.S. Bibliometric Analysis of Traffic Accident Prediction Studies from 2003 to 2023: Trends, Patterns and Future Directions. J. Promet-Traffic Transp. 2024, 36, 833–851.
Woo, S.H.; Choi, M.S.; Duffy, V.G. Artificial Intelligence and Transportations on Road Safety: A Bibliometric Review. In Proceedings of the International Conference on Human-Computer Interaction; Springer Nature: Berlin/Heidelberg, Germany, 2023; pp. 450–464.
Zou, X.; Vu, H.L.; Huang, H. Fifty years of accident analysis & prevention: A bibliometric and scientometric overview. J. Accid. Anal. Prev. 2020, 144, 105568.
Skaug, L.; Nojoumian, M.; Dang, N.; Yap, A. Road Crash Analysis and Modeling: A Systematic Review of Methods, Data, and Emerging Technologies. J. Appl. Sci. 2025, 15, 7115.
Ali, Y.; Hussain, F.; Haque, M.M. Advances, challenges, and future research needs in machine learning-based crash prediction models: A systematic review. J. Accid. Anal. Prev. 2024, 194, 107378.
Angarita-Zapata, J.S.; Maestre-Gongora, G.; Calderín, J.F. A bibliometric analysis and benchmark of machine learning and AutoML in crash severity prediction: The case study of three Colombian cities. J. Sens. 2021, 21, 8401.
Silva, P.B.; Andrade, M.; Ferreira, S. Machine learning applied to road safety modeling: A systematic literature review. J. Traffic Transp. Eng. 2020, 7, 775–790.
Grootendorst, M. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv 2022, arXiv:2203.05794.
Duddu, V.R.; Penmetsa, P.; Pulugurtha, S.S. Modeling and comparing injury severity of at-fault and not at-fault drivers in crashes. J. Accid. Anal. Prev. 2018, 120, 55–63.
Jamal, A.; Zahid, M.; Tauhidur Rahman, M.; Al-Ahmadi, H.M.; Almoshaogeh, M.; Farooq, D.; Ahmad, M. Injury severity prediction of traffic crashes with ensemble machine learning techniques: A comparative study. J. Int. J. Inj. Control Saf. Promot. 2021, 28, 408–427.
Chang, F.; Haque, M.M.; Yasmin, S.; Huang, H. Crash injury severity analysis of E-Bike Riders: A random parameters generalized ordered probit model with heterogeneity in means. J. Saf. Sci. 2022, 146, 105545.
Qian, Q.; Shi, J. Comparison of injury severity between E-bikes-related and other two-wheelers-related accidents: Based on an accident dataset. J. Accid. Anal. Prev. 2023, 190, 107189.
Ijaz, M.; Zahid, M.; Jamal, A. A comparative study of machine learning classifiers for injury severity prediction of crashes involving three-wheeled motorized rickshaw. J. Accid. Anal. Prev. 2021, 154, 106094.
Santos, K.; Firme, B.; Dias, J.P.; Amado, C. Analysis of motorcycle accident injury severity and performance comparison of machine learning algorithms. J. Transp. Res. Rec. 2024, 2678, 736–748.
Shao, Y.; Shi, X.; Zhang, Y.; Shiwakoti, N.; Xu, Y.; Ye, Z. Injury severity prediction and exploration of behavior-cause relationships in automotive crashes using natural language processing and extreme gradient boosting. J. Eng. Appl. Artif. Intell. 2024, 133, 108542.
Zeng, Q.; Wang, Q.; Zhang, K.; Wong, S.; Xu, P. Analysis of the injury severity of motor vehicle–pedestrian crashes at urban intersections using spatiotemporal logistic regression models. J. Accid. Anal. Prev. 2023, 189, 107119.
Singh, N.; Kumar, M. Conceptual Framework for Accident Prone Hotspot Identification and Removal using Historical Data Analytics. In Proceedings of the 2020 IEEE 17th India Council International Conference (INDICON); IEEE: New York, NY, USA, 2020; pp. 1–7.
Vandana, B.; Jogi, S.; Shenoy, G.; Karanth, S.; Hegde, S. A Comprehensive Framework for Accident Detection and Risk Assessment Using CNN and Graph-Based Analytics. In Proceedings of the 2025 International Conference on Artificial Intelligence and Data Engineering (AIDE); IEEE: New York, NY, USA, 2025; pp. 834–839.
Nour, M.K.; Naseer, A.; Alkazemi, B.; Jamil, M.A. Road traffic accidents injury data analytics. J. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 762–770.
Zhang, J.; Li, Z.; Pu, Z.; Xu, C. Comparing prediction performance for crash injury severity among various machine learning and statistical methods. J. IEEE Access 2018, 6, 60079–60087.
Alkinani, M.H.; Khan, W.Z.; Arshad, Q. Detecting human driver inattentive and aggressive driving behavior using deep learning: Recent advances, requirements and open challenges. J. IEEE Access 2020, 8, 105008–105030.
Magaña, V.C.; Pañeda, X.G.; Garcia, R.; Paiva, S.; Pozueco, L. Beside and behind the wheel: Factors that influence driving stress and driving behavior. J. Sustain. 2021, 13, 4775.
Rabea, A.F.A.; Ahmad, S.A.; Jantan, S.D.; Soh, A.C.; Ishak, A.J.; Adnan, R.N.E.R.; Al-Qazzaz, N.K. Driver’s fatigue classification based on physiological signals using RNN-LSTM technique. In Proceedings of the 2022 IEEE-EMBS Conference on Biomedical Engineering and Sciences (IECBES); IEEE: New York, NY, USA, 2022; pp. 280–285.
Mallia, L.; Lazuras, L.; Violani, C.; Lucidi, F. Crash risk and aberrant driving behaviors among bus drivers: The role of personality and attitudes towards traffic safety. J. Accid. Anal. Prev. 2015, 79, 145–151.
Xiao, Y.; Dai, M.; Xue, S. Driving Behavior of Older and Younger Drivers in Simplified Emergency Scenarios. J. Sens. 2025, 25, 5178.
Valente, J.; Ramalho, C.; Vinha, P.; Mora, C.; Jardim, S. Using machine learning to understand driving behavior patterns. J. Procedia Comput. Sci. 2024, 239, 1823–1830.
Fugiglando, U.; Massaro, E.; Santi, P.; Milardo, S.; Abida, K.; Stahlmann, R.; Netter, F.; Ratti, C. Driving behavior analysis through CAN bus data in an uncontrolled environment. J. IEEE Trans. Intell. Transp. Syst. 2018, 20, 737–748.
Yuan, D.; Zhou, K.; Yang, C. Architecture and application of traffic safety management knowledge graph based on Neo4j. J. Sustain. 2023, 15, 9786.
Damodariya, S.; Patel, C. Identification of factors causing risky driving behavior on high-speed multi-lane highways in India through principal component analysis. J. Int. J. Eng. 2022, 35, 2130–2138.
Distefano, N.; Leonardi, S.; Pulvirenti, G.; Romano, R.; Merat, N.; Boer, E.; Woolridge, E. Physiological and driving behaviour changes associated to different road intersections. J. Eur. Transp. 2020, 77, 4.
Moin, A.S.; Tanuja, C.; Kumar, N.S.; Nallakaruppan, M.; Senthilkumaran, U. Sensing drunken drivers using data science. In Proceedings of the 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS); IEEE: New York, NY, USA, 2019; pp. 1148–1151.
Sahayadhas, A.; Sundaraj, K.; Murugappan, M.; Palaniappan, R. Physiological signal based detection of driver hypovigilance using higher order spectra. J. Expert. Syst. Appl. 2015, 42, 8669–8677.
Selvathi, D.; Dhivya, N. Realization of VLSI architecture to detect driver drowsiness for road accident avoidance system. In Proceedings of the 2016 Online International Conference on Green Engineering and Technologies (IC-GET); IEEE: New York, NY, USA, 2016; pp. 1–5.
Nouh, R.; Singh, M.; Singh, D. SafeDrive: Hybrid recommendation system architecture for early safety predication using Internet of Vehicles. J. Sens. 2021, 21, 3893.
Wu, C.-C. The development and implementation of driving information collection and recording system for driving behavior analysis. In Proceedings of the 2020 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-Taiwan); IEEE: New York, NY, USA, 2020; pp. 1–2.
Xu, L.; Guo, L.; Ge, P.; Wang, X. Transition in Different Critical Situations: How Non-Driving Related Tasks Affect Drivers’ Physiological Response and Takeover Behavior After Partial Automation Silent Failures. J. IEEE Trans. Intell. Transp. Syst. 2024, 25, 16642–16652.
Hyder, A.; Subbarao, S.S. Navigating safety: Pedestrian safety perceptions at uncontrolled crosswalks in Hyderabad, India. J. Innov. Infrastruct. Solut. 2025, 10, 11.
Kim, S.; Choi, S.; Kim, B.H. Analysis of factors affecting pedestrian safety for the elderly and identification of vulnerable areas in Seoul. J. Accid. Anal. Prev. 2025, 211, 107878.
Hussain, M.S.; Kumari, R.; Nimesh, V.; Goswami, A.K. Assessing impact of urban street infrastructure on pedestrian safety perception. J. Proc. Inst. Civ. Eng.-Urban Des. Plan. 2021, 174, 76–84.
Stipancic, J.; Miranda-Moreno, L.; Strauss, J.; Labbe, A. Pedestrian safety at signalized intersections: Modelling spatial effects of exposure, geometry and signalization on a large urban network. J. Accid. Anal. Prev. 2020, 134, 105265.
Ihssian, A.; Ismail, K. Pedestrian Safety Perception Analysis at Intersections in Ottawa. In Proceedings of the Canadian Society of Civil Engineering Annual Conference; Springer Nature: Berlin/Heidelberg, Germany, 2023; pp. 15–26.
Singh, S.; Ali, Y.; Haque, M.M. A Bayesian extreme value theory modelling framework to assess corridor-wide pedestrian safety using autonomous vehicle sensor data. J. Accid. Anal. Prev. 2024, 195, 107416.
Budzynski, M.; Jamroz, K.; Mackun, T. Pedestrian safety in road traffic in Poland. In Proceedings of the IOP Conference Series: Materials Science and Engineering; IOP: Bristol, UK, 2017; p. 042064.
Guo, Q.; Xu, P.; Pei, X.; Wong, S.; Yao, D. The effect of road network patterns on pedestrian safety: A zone-based Bayesian spatial modeling approach. J. Accid. Anal. Prev. 2017, 99, 114–124.
Kaygisiz, Ö.; Yildiz, A.; Duzgun, S. Spatio-temporal pedestrian accident analysis to improve urban pedestrian safety: The case of the Eskisehir Motorway. J. Gazi Univ. J. Sci. 2015, 28, 623–630.
Koekemoer, K.; Van Gesselleen, M.; Van Niekerk, A.; Govender, R.; Van As, A.B. Child pedestrian safety knowledge, behaviour and road injury in Cape Town, South Africa. J. Accid. Anal. Prev. 2017, 99, 202–209.
Lin, P.-S.; Guo, R.; Bialkowska-Jelinska, E.; Kourtellis, A.; Zhang, Y. Development of countermeasures to effectively improve pedestrian safety in low-income areas. J. Traffic Transp. Eng. 2019, 6, 162–174.
Ham, J.-S.; Kim, D.H.; Jung, N.; Moon, J. CIPF: Crossing intention prediction network based on feature fusion modules for improving pedestrian safety. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2023; pp. 3666–3675.
Mahdinia, I.; Khattak, A.J.; Haque, A.M. How effective are pedestrian crash prevention systems in improving pedestrian safety? Harnessing large-scale experimental data. J. Accid. Anal. Prev. 2022, 171, 106669.
Zainala, S.K.; Borhana, M.N.; Yazid, M.M.; Ibrahim, A.H. The application of theory of planned behaviour in pedestrian safety: A literature approach. J. Kejuruter. 2023, 35, 539–549.
Utriainen, R. The potential impacts of automated vehicles on pedestrian safety in a four-season country. J. Intell. Transp. Syst. 2020, 25, 188–196.
Zhu, L.; Yu, F.R.; Wang, Y.; Ning, B.; Tang, T. Big data analytics in intelligent transportation systems: A survey. J. IEEE Trans. Intell. Transp. Syst. 2018, 20, 383–398.
Jung, S.; Qin, X.; Oh, C. Improving strategic policies for pedestrian safety enhancement using classification tree modeling. J. Transp. Res. Part A Policy Pract. 2016, 85, 53–64.
Ni, Y.; Wang, M.; Sun, J.; Li, K. Evaluation of pedestrian safety at intersections: A theoretical framework based on pedestrian-vehicle interaction patterns. J. Accid. Anal. Prev. 2016, 96, 118–129.
Zhu, H.; Almukdad, A.; Iryo-Asano, M.; Alhajyaseen, W.K.; Nakamura, H.; Zhang, X. A novel agent-based framework for evaluating pedestrian safety at unsignalized mid-block crosswalks. J. Accid. Anal. Prev. 2021, 159, 106288.
Ziakopoulos, A.; Yannis, G. A review of spatial approaches in road safety. J. Accid. Anal. Prev. 2020, 135, 105323.
Bisht, L.S.; Tiwari, G. Identification of road traffic crashes hotspots on an intercity expressway in India using geospatial techniques. J. IATSS Res. 2023, 47, 349–356.
Kamh, H.; Alyami, S.H.; Khattak, A.; Alyami, M.; Almujibah, H. Exploring road traffic accidents hotspots using clustering algorithms and GIS-based spatial analysis. J. IEEE Access 2024, 13, 60944–60954.
Khosravi, Y.; Hosseinali, F.; Adresi, M. Identifying accident prone areas and factors influencing the severity of crashes using machine learning and spatial analyses. J. Sci. Rep. 2024, 14, 29836.
Mekonnen, E.; Quezon, E.T.; Mohammed, M. Investigation of traffic accident prone areas related to existing road condition and driver’s behavior along Menagesha-Ambo road section. Glob. Sci. J. 2018, 6, 82.
Assi, K. Traffic crash severity prediction—A synergy by hybrid principal component analysis and machine learning models. J. Int. J. Environ. Res. Public Health 2020, 17, 7598.
Huang, Y.; Wang, X.; Patton, D. Examining spatial relationships between crashes and the built environment: A geographically weighted regression approach. J. Transp. Geogr. 2018, 69, 221–233.
Zhang, Z.; Xu, N.; Liu, J.; Jones, S. Exploring spatial heterogeneity in factors associated with injury severity in speeding-related crashes: An integrated machine learning and spatial modeling approach. J. Accid. Anal. Prev. 2024, 206, 107697.
Tang, X.; Bi, R.; Wang, Z. Spatial analysis of moving-vehicle crashes and fixed-object crashes based on multi-scale geographically weighted regression. J. Accid. Anal. Prev. 2023, 189, 107123.
Rhee, K.-A.; Kim, J.-K.; Lee, Y.-i.; Ulfarsson, G.F. Spatial regression analysis of traffic crashes in Seoul. J. Accid. Anal. Prev. 2016, 91, 190–199.
Soro, W.L.; Zhou, Y.; Wayoro, D. Crash rates analysis in China using a spatial panel model. J. IATSS Res. 2017, 41, 123–128.
Bei, R.; Du, Z.; Lyu, N.; Yu, L.; Yang, Y. Exploring the Mechanism for Increased Risk in Freeway Tunnel Approach Zones: A Perspective on Temporal-spatial Evolution of Driving Predictions, Tasks, and Behaviors. J. Accid. Anal. Prev. 2025, 211, 107914.
Kumar, S.; Toshniwal, D.; Parida, M. A comparative analysis of heterogeneity in road accident data using data mining techniques. J. Evol. Syst. 2017, 8, 147–155.
Yan, Y.; Zhang, Y.; Yang, X.; Hu, J.; Tang, J.; Guo, Z. Crash prediction based on random effect negative binomial model considering data heterogeneity. J. Phys. A Stat. Mech. Its Appl. 2020, 547, 123858.
Fountas, G.; Fonzone, A.; Olowosegun, A.; McTigue, C. Addressing unobserved heterogeneity in the analysis of bicycle crash injuries in Scotland: A correlated random parameters ordered probit approach with heterogeneity in means. J. Anal. Methods Accid. Res. 2021, 32, 100181.
Ren, Q.; Xu, M. Heterogeneity in crash patterns of autonomous vehicles: The latent class analysis coupled with multinomial logit model. J. Accid. Anal. Prev. 2025, 209, 107827.
Diao, C.; Zhang, D.; Liang, W.; Li, K.-C.; Hong, Y.; Gaudiot, J.-L. A novel spatial-temporal multi-scale alignment graph neural network security model for vehicles prediction. J. IEEE Trans. Intell. Transp. Syst. 2022, 24, 904–914.
Ryder, B.; Dahlinger, A.; Gahr, B.; Zundritsch, P.; Wortmann, F.; Fleisch, E. Spatial prediction of traffic accidents with critical driving events–Insights from a nationwide field study. J. Transp. Res. Part A Policy Pract. 2019, 124, 611–626.
Costa, J.O.; Maria, A.P.; Pereira, P.A.; Freitas, E.F.; Soares, F.E. Portuguese two-lane highways: Modelling crash frequencies for different temporal and spatial aggregation of crash data. J. Transp. Res. Interdiscip. Perspect. 2018, 33, 92–103.
Staby, S.; Manjusha, R. Spatial-temporal analysis for traffic incident detection using deep learning. In Proceedings of the 2023 Innovations in Power and Advanced Computing Technologies (i-PACT); IEEE: New York, NY, USA, 2023; pp. 1–6.
Sun, L.; Liang, J.; Zhang, C.; Wu, D.; Zhang, Y. Meta-transfer metric learning for time series classification in 6G-supported intelligent transportation systems. J. IEEE Trans. Intell. Transp. Syst. 2023, 25, 2757–2767.
Lei, A.; Cruickshank, H.; Cao, Y.; Asuquo, P.; Ogah, C.P.A.; Sun, Z. Blockchain-based dynamic key management for heterogeneous intelligent transportation systems. J. IEEE Internet Things J. 2017, 4, 1832–1843.
Vangala, A.; Bera, B.; Saha, S.; Das, A.K.; Kumar, N.; Park, Y. Blockchain-enabled certificate-based authentication for vehicle accident detection and notification in intelligent transportation systems. J. IEEE Sens. J. 2020, 21, 15824–15838.
Wang, L.; Gong, W.; Li, X. Deep convolutional neural network and its application in image recognition of road safety projects. J. Int. J. Perform. Eng. 2019, 15, 2182.
Albasrawi, R.; Fadhil, F.F.; Ghazal, M.T. Driver drowsiness monitoring system based on facial Landmark detection with convolutional neural network for prediction. J. Bull. Electr. Eng. Inform. 2022, 11, 2637–2644.
Rathi, R.; Sawant, A.; Jain, L.; Kulkarni, S. Driver Fatigue and Distraction Analysis Using Machine Learning Algorithms. In Proceedings of the International Conference on Innovative Computing and Communications: Proceedings of ICICC 2020; Springer Nature: Berlin/Heidelberg, Germany, 2020; Volume 1, pp. 1037–1046.
Fang, A.; Qiu, C.; Zhao, L.; Jin, Y. Driver risk assessment using traffic violation and accident data by machine learning approaches. In Proceedings of the 2018 3rd IEEE International Conference on Intelligent Transportation Engineering (ICITE); IEEE: New York, NY, USA, 2018; pp. 291–295.
Thaika, M.; Tasneeyapant, S.; Cheamanunkul, S. A fast, scalable, unsupervised approach to real-time traffic incident detection. In Proceedings of the 2018 15th International Joint Conference on Computer Science and Software Engineering (JCSSE); IEEE: New York, NY, USA, 2018; pp. 1–6.
Yasin Çodur, M.; Tortum, A. An artificial neural network model for highway accident prediction: A case study of Erzurum, Turkey. J. Promet-Traffic Transp. 2015, 27, 217–225.
Jamal, A.; Umer, W. Exploring the injury severity risk factors in fatal crashes with neural network. J. Int. J. Environ. Res. Public Health 2020, 17, 7466.
Nikolaev, A.B.; Sapego, Y.S.; Ivakhnenko, A.M.; Stroganov, V. Analysis of the incident detection technologies and algorithms in intelligent transport systems. J. Int. J. Appl. Eng. Res. 2017, 12, 4765–4774.
Chawla, P.; Hasurkar, R.; Bogadi, C.R.; Korlapati, N.S.; Rajendran, R.; Ravichandran, S.; Tolem, S.C.; Gao, J.Z. Real-time traffic congestion prediction using big data and machine learning techniques. J. World J. Eng. 2024, 21, 140–155.
Al-Agroudy, Z.; Mohamed, A.; Ashraf, Z.; Al-Sayed, S.; Gad, W.; Hassan, Z.; Reda, R. AI-safe transportation: Real-time incident detection and alerting system in smart cities. In Proceedings of the 2023 Eleventh International Conference on Intelligent Computing and Information Systems (ICICIS); IEEE: New York, NY, USA, 2023; pp. 523–529.

Figure 1. Density plot comparing the preprocessed word counts of documents classified as outliers with those of documents assigned to valid topics.
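The comparison in Figure 1 can be reproduced in outline as follows. This is a minimal sketch, not the authors' plotting code: it assumes BERTopic's convention of assigning outlier documents to topic -1, counts words by splitting the already preprocessed text on whitespace, and uses a Gaussian kernel with Silverman's rule-of-thumb bandwidth, which is our choice. The function names are hypothetical.

```python
import numpy as np


def split_word_counts(docs, topics):
    """Split preprocessed word counts into outlier (-1) and assigned groups."""
    counts = [len(doc.split()) for doc in docs]
    outliers = [c for c, t in zip(counts, topics) if t == -1]
    assigned = [c for c, t in zip(counts, topics) if t != -1]
    return outliers, assigned


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a 1-D Gaussian kernel density estimate at each grid point."""
    x = np.asarray(samples, dtype=float)
    n = x.size
    if bandwidth is None:
        # Silverman's rule of thumb: 0.9 * min(std, IQR / 1.34) * n^(-1/5)
        std = x.std(ddof=1)
        iqr = np.subtract(*np.percentile(x, [75, 25]))
        spread = min(std, iqr / 1.34) if iqr > 0 else std
        bandwidth = 0.9 * (spread if spread > 0 else 1.0) * n ** (-0.2)
    z = (np.asarray(grid, dtype=float)[:, None] - x[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2.0 * np.pi))
```

Calling `split_word_counts` on the corpus and its topic assignments, then evaluating `gaussian_kde_1d` for each group on a shared grid, yields the two curves to overlay.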

Figure 2. Distribution of documents per topic.

Figure 3. Hierarchical clustering of topics.

Figure 4. Temporal evolution of the top 10 most frequent topics.
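Figure 4 is, in essence, a table of document counts per topic and publication year. A minimal sketch of that tabulation follows, assuming topic labels and publication years are available as parallel lists and that outliers carry the label -1 (the BERTopic convention); the function name is hypothetical. BERTopic itself also provides a `topics_over_time` method for this purpose, which additionally recomputes the topic keywords for each time bin.

```python
from collections import Counter, defaultdict


def topic_counts_by_year(topics, years, top_n=10):
    """Count documents per year for the top_n most frequent non-outlier topics.

    Returns {topic_id: {year: count}} with years in ascending order.
    """
    frequency = Counter(t for t in topics if t != -1)
    top_topics = [t for t, _ in frequency.most_common(top_n)]
    table = {t: defaultdict(int) for t in top_topics}
    for topic, year in zip(topics, years):
        if topic in table:
            table[topic][year] += 1
    return {t: dict(sorted(counts.items())) for t, counts in table.items()}
```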

Figure 5. Top 5 research themes by document count. The chart aggregates documents from related individual topics into broader themes, showing the relative prevalence of each major research area.

Table 1. Dominant research topics identified by BERTopic based on document distribution.

Topic ID	Document Count	Top Keywords	Thematic Area	Objective
0	813	injury severity, severity prediction, logistic regression, analytics	Crash Severity & Injury Analysis	This topic covers research that uses statistical and analytical methods to build models that estimate injury severity in traffic crashes.
1	628	physiological, alarm, architecture, ecg, inattention	Driver Behavior & Human Factors	This theme focuses on monitoring driver condition using physiological signals and in-vehicle systems to detect distraction, fatigue, or reduced attention.
2	536	transportation pedestrian safety, improve pedestrian safety, safety perception, pedestrian safety risk	Vulnerable Road Users (VRUs)	This topic includes studies related to pedestrian safety, infrastructure improvements, and perception of risk in walking environments.
3	456	safety education, risky driving behavior, driving behavior	Driver Behavior & Human Factors	This topic analyzes risky driving behaviors and evaluates interventions such as training and education programs to improve safety.
4	435	trauma registry, human injury, hospital mortality, trauma care, morbidity, mortality	Crash Severity & Injury Analysis	This theme examines medical outcomes after crashes, including injury patterns, mortality, and the performance of trauma care systems.
5	421	prone area, safety spatial, street spatial variable, road street spatial, heterogeneity	Spatial Analysis & Hotspot Identification	This topic focuses on identifying accident-prone areas using spatial analysis and GIS techniques to understand geographic risk patterns.
6	366	machine learning, intelligent transportation, incident detection	AI, Machine Learning & Computer Vision	This theme explores the use of machine learning algorithms for traffic monitoring, incident detection, and safety optimization in ITS.
7	361	affect injury severity, contribute injury severity, influence injury severity, affect injury	Crash Severity & Injury Analysis	This topic investigates the factors that influence injury severity, focusing on causal relationships rather than prediction alone.
8	306	deep learning, intelligent transportation, neural network CNN, neural network	AI, Machine Learning & Computer Vision	This theme covers advanced deep learning techniques applied to visual data, including object detection, traffic analysis, and scene understanding.
9	306	automobile drive, car drive, drunken drive, toxicology, drive influence alcohol	Driver Behavior & Human Factors	This topic focuses on the effects of alcohol and substances on driving performance, including detection methods and toxicological analysis.
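The "Top Keywords" column is produced by BERTopic's class-based TF-IDF (c-TF-IDF), which pools all documents of a topic into a single document. The sketch below follows the weighting described for BERTopic, W(t, c) = (tf(t, c) / tokens(c)) × log(1 + A / f(t)), where tokens(c) is the number of tokens in topic c, f(t) is the frequency of term t across all topics, and A is the average number of tokens per topic. It is a simplified illustration: it uses whitespace unigrams, whereas the keywords in Table 1 include n-grams, and the toy topics in the test are hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(topic_docs):
    """Class-based TF-IDF weights per topic.

    topic_docs maps a topic id to its (preprocessed) documents. All documents
    of a topic are pooled, and each term t in topic c is weighted as
        (tf(t, c) / tokens(c)) * log(1 + A / f(t)),
    where f(t) is the frequency of t over all topics and A is the average
    number of tokens per topic.
    """
    tf = {c: Counter(w for doc in docs for w in doc.split())
          for c, docs in topic_docs.items()}
    overall = Counter()
    for counts in tf.values():
        overall.update(counts)
    avg_tokens = sum(overall.values()) / len(tf)
    weights = {}
    for c, counts in tf.items():
        n_tokens = sum(counts.values())
        weights[c] = {w: (n / n_tokens) * math.log(1 + avg_tokens / overall[w])
                      for w, n in counts.items()}
    return weights


def top_keywords(weights, k=3):
    """Return the k highest-weighted terms per topic (ties broken alphabetically)."""
    return {c: [w for w, _ in sorted(ws.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
            for c, ws in weights.items()}
```

Terms shared across topics (such as a generic "model") receive a smaller idf factor, which is why topic-specific terms rise to the top of each keyword list.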

Table 2. Mapping of algorithmic topics and super-clusters to the manually defined thematic areas used in the literature review.

Thematic Areas	Algorithmic Cluster	BERTopic ID	Objective
1. Crash severity and injury analysis	Injury cluster	0, 4, 7, 13, 19	Modeling injury severity
2. Driver behavior and human factors	Human factors cluster & technology/AI cluster	1, 3, 9, 12, 27	Unsafe driving, psychological states, distraction, driver monitoring.
3. Vulnerable road users	Infrastructure cluster & injury cluster	2, 15	Pedestrian and cyclist safety, infrastructure adjustments, VRU risk perception.
4. AI, Machine Learning and Computer Vision	Technology and AI cluster	6, 8, 24	Advanced computational methods, deep learning, incident detection, CNNs.
5. Spatial analysis and hotspot detection	Infrastructure cluster	5, 21, 25	Geographic distribution of crashes, GIS, spatial statistics, network-level risks.
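Figure 5 aggregates topic-level document counts into the five thematic areas defined in Table 2. The sketch below shows that aggregation using the Table 2 mapping. Because this section lists document counts only for topics 0 to 9 (Table 1), every other topic ID contributes zero here, so the totals cover only the ten most frequent topics and are not the Figure 5 values. The constant and function names are hypothetical.

```python
# Thematic area -> BERTopic topic IDs, transcribed from Table 2.
THEME_TOPICS = {
    "Crash severity and injury analysis": [0, 4, 7, 13, 19],
    "Driver behavior and human factors": [1, 3, 9, 12, 27],
    "Vulnerable road users": [2, 15],
    "AI, Machine Learning and Computer Vision": [6, 8, 24],
    "Spatial analysis and hotspot detection": [5, 21, 25],
}

# Document counts per topic, available in Table 1 for topics 0-9 only.
TOPIC_DOC_COUNTS = {0: 813, 1: 628, 2: 536, 3: 456, 4: 435,
                    5: 421, 6: 366, 7: 361, 8: 306, 9: 306}


def aggregate_by_theme(theme_topics, topic_counts):
    """Sum topic document counts per theme, largest theme first.

    Topics missing from topic_counts contribute zero.
    """
    totals = {theme: sum(topic_counts.get(t, 0) for t in ids)
              for theme, ids in theme_topics.items()}
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

With the Table 1 counts, this gives 1609 documents for crash severity and injury analysis, 1390 for driver behavior and human factors, 672 for AI, machine learning and computer vision, 536 for vulnerable road users, and 421 for spatial analysis and hotspot detection.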

Table 3. Test cases for the category crash severity and injury analysis.

Ref.	Brief Summary of the Method	Data	Key Findings	Challenges
[7]	A random parameters logit model was applied to study injury severity in rollover crashes, allowing the analysis to capture unobserved heterogeneity across crash observations.	A large United States rollover crash dataset containing information on vehicle type, speed, roadway alignment, and environmental conditions.	The results show significant heterogeneity in how speed and vehicle characteristics affect injury severity.	The model is more complex to interpret and is not directly transferable to non-rollover crashes or crashes involving two-wheeled vehicles.
[4]	Injury severity was predicted from structured variables using machine learning, comparing deep learning models with classical machine learning algorithms; no explicit explainability techniques were applied.	A national crash dataset including vehicle dynamics, roadway attributes, and environmental factors.	Machine learning and deep learning models achieve predictive performance comparable to or better than traditional statistical approaches.	Interpretability is limited, data requirements are high, and model performance is sensitive to class imbalance.
[6]	Bias-reduced logistic regression was used to analyze crash injury severity, correcting for rare-event bias and unstable estimates of fatal injuries while keeping the model coefficients fully interpretable.	A national crash database including roadway geometry, collision configuration, vehicle characteristics, and environmental conditions.	Roadway geometry and impact configuration strongly influence injury severity, and bias reduction improves the robustness of estimates for fatal injury outcomes.	The method has limited ability to represent nonlinear relationships and depends strongly on correct model specification.
[3]	A systematic review of machine learning methods for crash injury severity analysis summarized commonly used algorithms, data sources, and reported performance measures.	A collection of published studies using machine learning and statistical methods for injury severity modeling.	The review shows growing use of machine learning, particularly ensemble methods, and highlights ongoing challenges with data imbalance and limited interpretability.	No original models were developed, and the conclusions depend on the quality and consistency of the reviewed studies.
[39]	A conceptual framework was proposed for identifying accident-prone locations using historical crash data, with an emphasis on spatial clustering techniques and structured analytical workflows.	Aggregated historical crash datasets with spatial and temporal information.	The framework demonstrates the potential of data-driven analytics for identifying high-risk locations in road networks.	The approach is conceptual and does not include direct injury severity modeling or empirical validation.
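For the bias-reduced logistic regression in Table 3 ([6]), a common choice is Firth's penalized likelihood, which adds half the log-determinant of the Fisher information to the log-likelihood and keeps estimates finite even when fatal outcomes are rare or perfectly separated. The specific bias-reduction variant used in [6] is not stated here, so the following is a minimal NumPy sketch of Firth's method under that assumption, solved by Fisher-scoring iterations on the modified score; the step cap is a safeguard we added, and the function name is hypothetical.

```python
import numpy as np


def firth_logistic(X, y, max_iter=200, tol=1e-10, max_step=5.0):
    """Firth bias-reduced logistic regression (a sketch).

    X: (n, k) design matrix; include a column of ones for the intercept.
    y: (n,) binary outcome, e.g. 1 = fatal injury, 0 = non-fatal.
    Iterates beta <- beta + I(beta)^-1 U*(beta), where
    U*(beta) = X^T (y - p + h * (0.5 - p)) is Firth's modified score and
    h holds the diagonal of the hat matrix W^1/2 X I^-1 X^T W^1/2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info_inv = np.linalg.inv(X.T @ (X * w[:, None]))  # (X^T W X)^-1
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)   # hat-matrix diagonal
        step = info_inv @ (X.T @ (y - p + h * (0.5 - p)))
        largest = np.max(np.abs(step))
        if largest > max_step:                             # damp very large steps
            step *= max_step / largest
        beta += step
        if largest < tol:
            break
    return beta
```

The intercept-only case gives a quick check: with k events in n observations, the Firth estimate of the event probability is (k + 0.5)/(n + 1), so four fatal outcomes out of four yield p = 0.9, whereas the maximum-likelihood estimate diverges.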

Table 4. Test cases for the category driver behavior and human factors.

Ref.	Brief Summary of the Method	Data	Key Findings	Challenges
[12]	A behavior-oriented statistical analysis linked driver characteristics and situational factors to risky driving behavior and crash involvement using interpretable modeling approaches.	Structured crash and driver datasets including age, driving experience, time of day, traffic conditions, and behavioral indicators.	Driver demographics and situational context significantly influence risky behavior patterns and the likelihood of crash involvement.	Limited temporal resolution; behavioral states are inferred indirectly rather than measured explicitly.
[14]	A human factors analysis was performed in partially automated driving environments, focusing on driver attention, engagement in non-driving tasks, and readiness to take control during automation transitions.	Driving simulator and controlled experimental datasets capturing driver-automation interaction events.	Engagement in non-driving tasks significantly delays takeover responses and reduces situation awareness.	Results from simulator-based experiments may not fully generalize to naturalistic driving conditions.
[49]	Smartphone-based driver behavior profiling was carried out using the embedded inertial and global positioning system (GPS) sensors, combined with supervised machine learning for driving style classification.	Real-world driving trips recorded with smartphone sensors, including accelerometer, gyroscope, and GPS data.	Low-cost sensing enables scalable detection of aggressive and unsafe driving behaviors.	Sensor noise, inconsistent sampling rates, and limited contextual information reduce robustness.
[13]	A multimodal driver monitoring approach combined vision-based features with vehicle kinematics to detect distraction and unsafe behavior using machine learning classifiers.	In-vehicle camera data, steering and pedal inputs, and contextual driving information.	Multimodal data fusion improves detection accuracy compared with single-sensor systems.	Sensitivity to lighting conditions and visual occlusions, along with high computational demands for real-time deployment.
[57]	Behavioral pattern analysis of experimental or observational driving data was used to identify unsafe maneuvers and the effects of cognitive workload on driving performance.	Driving behavior datasets, including maneuver-level indicators and contextual variables.	Increased cognitive workload and behavioral irregularities are strongly associated with elevated risk indicators.	Difficulty isolating behavioral effects from environmental influences, and limited scalability of the approach.
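Smartphone-based profiling approaches commonly convert raw inertial signals into per-trip features before a classifier is trained. The following sketch shows only that feature-extraction step, under stated assumptions: the longitudinal acceleration is already aligned with the vehicle axis and expressed in m/s², and the 0.3 g harsh-braking and harsh-acceleration thresholds are illustrative values, not taken from any cited study. All names are hypothetical; the resulting features could then feed any supervised classifier.

```python
from dataclasses import dataclass

G = 9.81  # standard gravity in m/s^2


@dataclass
class TripFeatures:
    harsh_brakes: int
    harsh_accels: int
    events_per_km: float


def count_events(signal, threshold):
    """Count contiguous runs of samples above threshold (one event per run)."""
    events, inside = 0, False
    for value in signal:
        above = value > threshold
        if above and not inside:
            events += 1
        inside = above
    return events


def trip_features(long_accel, distance_km, brake_g=0.3, accel_g=0.3):
    """Per-trip harsh-event features from longitudinal acceleration (m/s^2).

    The 0.3 g default thresholds are illustrative assumptions.
    """
    brakes = count_events([-a for a in long_accel], brake_g * G)
    accels = count_events(long_accel, accel_g * G)
    rate = (brakes + accels) / distance_km if distance_km > 0 else 0.0
    return TripFeatures(brakes, accels, rate)
```

Counting runs rather than individual samples keeps a single braking maneuver from being counted several times when the sensor samples at a high rate.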

Table 5. Test cases for the category vulnerable road users.

Ref.	Brief Summary of the Method	Data	Key Findings	Challenges
[18]	Interpretable statistical regression models were employed to evaluate pedestrian injury risk, with particular emphasis on roadway design, the traffic speed environment, and environmental conditions. This approach allows infrastructure-related effects to be examined while accounting for demographic and traffic characteristics.	Pedestrian crash records enriched with detailed information on roadway geometry, speed limits, intersection type, lighting conditions, and pedestrian demographic characteristics.	The results indicate that vehicle speed, complex intersection layouts, and inadequate lighting are key contributors to severe and fatal pedestrian injuries, offering clear, actionable guidance for safety-oriented infrastructure interventions.	The reliance on linear model assumptions limits the representation of complex interactions and spatial spillover effects, while data completeness and reporting accuracy directly influence the robustness of the results.
[59]	A user-centered approach to pedestrian safety was adopted, focusing on perceived risk rather than recorded crash outcomes and integrating survey-based indicators with contextual traffic and infrastructure characteristics.	Questionnaire data collected from pedestrians and linked with local road characteristics, traffic exposure levels, and demographic attributes.	Findings show that perceived safety varies considerably across demographic groups and is strongly shaped by crossing complexity, traffic volume, and the quality of pedestrian facilities.	Because perception-based indicators do not directly reflect objective crash risk, results may be influenced by subjective bias and familiarity with the local environment.
[72]	Network-based kernel density estimation was applied to pedestrian crash data to identify high-risk locations while explicitly accounting for road network structure and temporal variation.	Pedestrian crash records mapped onto the road network and enriched with temporal information describing when incidents occurred.	Compared with planar (Euclidean) distance-based approaches, network-based methods provide more realistic and policy-relevant identification of crash hotspots, particularly within dense urban road networks.	The method involves high computational complexity and is sensitive to bandwidth selection, requiring an accurate road network representation and precise crash location data.
[69]	A geographic information system (GIS)-based spatial analysis using kernel density estimation was conducted to identify pedestrian and cyclist crash hotspots and to support the spatial prioritization of safety interventions.	Geocoded vulnerable road user crash datasets integrated within a GIS framework, including both spatial and temporal attributes.	The analysis reveals persistent clusters of pedestrian and cyclist crashes at specific intersections and road segments, enabling targeted countermeasures such as traffic calming and improved crossing design.	Results are sensitive to geocoding accuracy, spatial resolution, and parameter selection, while the underlying causal mechanisms are not explicitly modeled.
[67]	Behavioral modeling of pedestrian crossing decisions was performed using the Theory of Planned Behavior, incorporating psychological constructs such as perceived risk, social norms, and behavioral intention.	Survey datasets capturing attitudes, subjective norms, perceived behavioral control, and self-reported pedestrian behavior.	The results show that pedestrian compliance and risky crossing behavior are strongly influenced by perceived risk and social factors, often more strongly than by objective traffic conditions alone.	Dependence on self-reported data introduces response bias, and the identified behavioral relationships may vary across cultural and regional contexts.
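Studies [69] and [72] both rest on kernel density estimation (KDE). The minimal sketch below shows the planar variant, assuming crash locations have already been projected to metric coordinates: each crash spreads a quartic (biweight) kernel of bandwidth h over the surrounding area, and the contributions are summed at grid-cell centers. The grid size, bandwidth, and coordinates are hypothetical. Network-based KDE, as in [72], would replace the straight-line distance (`math.hypot`) with shortest-path distance along the road network, which this sketch does not implement.

```python
import math

def quartic_kernel_2d(d, h):
    """Quartic (biweight) kernel in 2-D; integrates to 1 over the plane, zero beyond bandwidth h."""
    if d >= h:
        return 0.0
    u = d / h
    return 3.0 / (math.pi * h * h) * (1.0 - u * u) ** 2

def kde_grid(crashes, x0, y0, nx, ny, cell, h):
    """Return {(i, j): density} at the center of each grid cell (crashes per square meter)."""
    surface = {}
    for i in range(nx):
        for j in range(ny):
            cx = x0 + (i + 0.5) * cell
            cy = y0 + (j + 0.5) * cell
            surface[(i, j)] = sum(
                quartic_kernel_2d(math.hypot(cx - x, cy - y), h) for x, y in crashes
            )
    return surface

# Hypothetical crash coordinates in meters: a tight cluster near (125, 125) plus two isolated crashes.
crashes = [(120, 127), (125, 122), (130, 126), (123, 130), (400, 420), (700, 150)]
surface = kde_grid(crashes, x0=0, y0=0, nx=16, ny=16, cell=50, h=100)
hotspot = max(surface, key=surface.get)
print(hotspot)  # (2, 2): the cell whose center is (125, 125)
```

Because every density value depends on h, rerunning with a smaller or larger bandwidth can split or merge clusters, which is the bandwidth sensitivity noted for [72] and the parameter sensitivity noted for [69].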

Table 6. Test cases related to spatial analysis and hotspot detection.

Ref.	Brief Summary of the Method	Data	Key Findings	Challenges
[16]	A geographically weighted atrous convolutional neural network regression model was proposed to capture spatial heterogeneity in the effects of the built environment on road safety. The approach addresses key limitations of classical geographically weighted regression, including the neglect of road network distance and the nonlinear decay of spatial influence.	Empirical crash and built environment data collected in Jinan City, China.	The proposed model outperforms classical geographically weighted regression in predictive performance. Intersection density and bus stop density emerge as stronger risk factors than population density, land use mix, and destination accessibility. Population density shows bidirectional effects, while land use mix exhibits pronounced spatial variability.	Integration of high-resolution spatial data is challenging due to data quality and resolution issues. The model is sensitive to local calibration choices such as bandwidth and weighting schemes, involves high computational cost, and is less interpretable than classical statistical models.
[81]	A mixed analytical pipeline was developed combining hierarchical clustering for hotspot detection, field validation through site visits and police reports, and injury severity prediction using supervised machine learning based on spatial and environmental attributes.	Crash data from the Yazd to Kerman Road in Iran, enriched with information on lighting, climate, road slope, alignment, and geometric characteristics. Contextual factors were verified through field inspections.	Overlapping clusters reveal two consistent high-risk zones. Identified causes include rest area locations, insufficient lighting on curves, inadequate signage, and reduced visibility during dust storms. For injury severity prediction, the k-nearest neighbors method achieved higher accuracy than the random forest approach.	Results depend strongly on the quality and consistency of spatial and field-collected data. Transferability is limited without local recalibration, and clustering outcomes are sensitive to parameter selection and location errors.
[79]	A critical review of spatial approaches in road safety research was conducted, covering spatial units of analysis, model families including econometric, Bayesian, and machine learning methods, spatial aggregation strategies, and well-known methodological issues such as the modifiable areal unit problem and boundary effects.	Published literature on spatial road safety analysis, including studies focused on vulnerable road users.	The review clarifies why spatial dependence and spatial heterogeneity matter in safety analysis and shows how choices related to spatial units and neighborhood definition can substantially influence results. It also synthesizes strengths, limitations, and future directions for spatial modeling approaches.	As a review study, it does not provide direct numerical results. The applicability of recommendations depends on context, and persistent structural issues such as the modifiable areal unit problem and comparability across studies remain unresolved.
[78]	Hotspot identification methods were compared along an expressway by evaluating ordinary kriging, kernel density estimation, and network-based kernel density estimation for detecting crash concentrations along road segments.	Fatal crash data collected along a 165 km expressway in India between August 2012 and October 2018.	While all methods identify common high-risk locations, the comparative analysis shows that network-based kernel density estimation is more effective for detecting hotspots along short road segments. The findings are particularly relevant for method selection in low- and middle-income countries.	The results depend on geocoding accuracy and road network representation and are sensitive to parameter settings such as bandwidth and segmentation. Limited or variable exposure data may bias method comparisons.
[96]	Crash frequency on two-lane highways was modeled by testing different temporal and spatial aggregation schemes using generalized estimating equations to account for correlation and excess zero crash counts.	Data from 88 road segments of 200 m each, located outside urban areas, including crash frequency, average annual daily traffic, and geometric characteristics, covering the period from 1999 to 2010.	Key influencing factors include average annual daily traffic, lane width, vertical sinuosity, and the density of access points. The model provides acceptable crash prediction performance for 400 m segments aggregated over two-year periods.	Model results depend on the chosen aggregation scheme and sample size. A large number of zero-crash observations complicates estimation, and transferability to other road networks or countries requires recalibration.
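For a single corridor such as the expressway studied in [78], network distance reduces to the difference in chainage (distance along the road), so network-based KDE can be sketched in one dimension. The example below is a simplified illustration, not the implementation used in [78]: it assumes crashes are already linearly referenced in kilometers, uses a 1-D quartic kernel, evaluates density at the midpoint of fixed-length segments, and flags segments whose density exceeds a hypothetical threshold. Segment length, bandwidth, threshold, and crash positions are all invented for illustration.

```python
import math

def quartic_kernel_1d(d, h):
    """1-D quartic kernel; integrates to 1 over the line, zero beyond bandwidth h."""
    u = abs(d) / h
    return 15.0 / (16.0 * h) * (1.0 - u * u) ** 2 if u < 1.0 else 0.0

def linear_kde(chainages_km, road_length_km, seg_km, h_km):
    """Return [(segment_start_km, crashes_per_km)] evaluated at each segment midpoint."""
    n_segments = math.ceil(road_length_km / seg_km)
    profile = []
    for k in range(n_segments):
        start = k * seg_km
        mid = min(start + seg_km / 2, road_length_km)
        density = sum(quartic_kernel_1d(mid - c, h_km) for c in chainages_km)
        profile.append((start, density))
    return profile

def flag_hotspots(profile, min_density):
    """Hypothetical rule: a segment is a hotspot if its density exceeds min_density."""
    return [start for start, density in profile if density > min_density]

# Hypothetical crash chainages on a 10 km stretch: a cluster near km 2.2 and scattered crashes.
crashes_km = [2.1, 2.2, 2.3, 2.4, 5.0, 7.8, 9.1]
profile = linear_kde(crashes_km, road_length_km=10.0, seg_km=0.5, h_km=0.5)
print(flag_hotspots(profile, min_density=3.0))  # [2.0]
```

Two simplifications are worth keeping in mind: near the ends of the road part of each kernel falls off the corridor, so densities there are underestimated, and on a branching network distances must be computed along the graph, which this sketch does not handle.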

Table 7. Test cases for artificial intelligence, machine learning, and computer vision in intelligent transportation systems.

Ref.	Brief Summary of the Method	Data	Key Findings	Challenges
[83]	Interpretable machine learning models were applied to crash prediction and road safety analysis, with the objective of balancing predictive performance and model transparency for safety-related decision-making.	Large-scale structured crash datasets, including roadway characteristics, traffic conditions, environmental factors, and behavioral variables.	The models identify dominant risk factors influencing crash occurrence and injury severity while maintaining transparency and trustworthiness suitable for safety-critical applications.	A trade-off exists between interpretability and predictive accuracy, and explainability techniques may oversimplify complex nonlinear relationships.
[21]	A big data analytics framework was developed to support traffic monitoring, bottleneck prediction, and safety management through large-scale data processing and machine learning techniques.	High-volume, network-level traffic data collected from sensors, GPS traces, cameras, and historical traffic records.	The framework enables near-real-time assessment of traffic states and supports proactive safety management, reducing reliance on delayed, crash-based indicators.	Data heterogeneity and imbalance, high storage and computational requirements, and privacy and data protection concerns limit large-scale deployment.
[9]	A multimodal detection framework was proposed that combines visual and acoustic sensing to improve the recognition of emergency vehicles and critical traffic events under adverse conditions.	Synchronized video and acoustic data collected from roadside and in-vehicle sensors across a range of visibility and noise conditions.	Multimodal data fusion substantially improves detection robustness compared to vision-only approaches, particularly in low-visibility and high-noise environments.	The approach requires precise sensor synchronization, is sensitive to environmental noise, and involves increased computational and hardware demands.
[10]	A deep learning-based real-time incident detection system was developed using CNN-based object detection models, mainly YOLO and Faster R-CNN, integrated into a smart city architecture. The system processes live streams from street cameras to detect, classify, and assess the severity of road and environmental incidents, and it automatically generates alerts and reports for authorities and users.	A diverse annotated dataset of real-world traffic and environmental incidents (e.g., car accidents, fires, floods, road anomalies) obtained under the MIT license, combined with live video feeds from street cameras. Data augmentation and preprocessing techniques were applied to improve model robustness.	The YOLO-based model achieved high real-time detection performance, with an average accuracy of approximately 91% and a mean average precision (mAP) of 90% at a confidence threshold of 0.5. The system demonstrated efficient incident detection, classification, and severity assessment, supporting proactive traffic management and enhanced road safety in smart city environments.	Key limitations include the complexity of detecting multiple simultaneous incidents in real time, sensitivity to environmental conditions and background noise, reliance on large annotated datasets, and the computational requirements of real-time deployment in large-scale urban environments.
[107]	Machine learning-based accident severity models were developed to identify factors associated with fatal outcomes and to support targeted road safety interventions.	Multiyear crash datasets containing detailed injury severity information, vehicle characteristics, roadway attributes, and environmental conditions.	The analysis reveals key predictors of severe and fatal crashes, allowing prioritization of countermeasures for high-risk scenarios.	Severe class imbalance, sensitivity to reporting quality, and limited generalization without local recalibration constrain practical application.
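Severe class imbalance is listed as a constraint for [107]. The sketch below is not drawn from [107]; it illustrates two standard responses using hypothetical label counts: inverse-frequency class weights computed as n / (k × n_c), which is the formula scikit-learn documents for `class_weight="balanced"`, and macro-averaged recall, which exposes a majority-class predictor that plain accuracy rewards. The class names and counts are assumptions for illustration only.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n / (k * n_c) so every class carries equal total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_recall(y_true, y_pred):
    """Mean of per-class recall; every severity class counts equally regardless of its size."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical severity labels: 900 property-damage-only, 90 injury, 10 fatal crashes.
y_true = ["pdo"] * 900 + ["injury"] * 90 + ["fatal"] * 10
weights = balanced_class_weights(y_true)
print({c: round(w, 3) for c, w in weights.items()})  # {'pdo': 0.37, 'injury': 3.704, 'fatal': 33.333}

# A trivial model that always predicts the majority class.
y_pred = ["pdo"] * len(y_true)
print(accuracy(y_true, y_pred))      # 0.9
print(macro_recall(y_true, y_pred))  # 0.3333333333333333 (fatal and injury recall are both 0)
```

Passing such weights to a learner (for example through a per-sample weight argument) makes errors on fatal crashes cost more during training; whether this helps on a given dataset should be judged with class-aware metrics such as macro recall rather than overall accuracy.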

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tudor, I.A.; Gîrbacia, F. Mapping Research Trends in Road Safety: A Topic Modeling Perspective. Vehicles 2026, 8, 69. https://doi.org/10.3390/vehicles8040069

AMA Style

Tudor IA, Gîrbacia F. Mapping Research Trends in Road Safety: A Topic Modeling Perspective. Vehicles. 2026; 8(4):69. https://doi.org/10.3390/vehicles8040069

Chicago/Turabian Style

Tudor, Iulius Alexandru, and Florin Gîrbacia. 2026. "Mapping Research Trends in Road Safety: A Topic Modeling Perspective" Vehicles 8, no. 4: 69. https://doi.org/10.3390/vehicles8040069

APA Style

Tudor, I. A., & Gîrbacia, F. (2026). Mapping Research Trends in Road Safety: A Topic Modeling Perspective. Vehicles, 8(4), 69. https://doi.org/10.3390/vehicles8040069
