1. Introduction
Proactive road safety analysis aims to identify and mitigate hazardous conditions before crashes occur. Crash-based models, however, rely on rare and often underreported events that must accumulate over extended periods to support analysis [1, 2], which makes them inherently unsuitable for such applications, particularly in real-time contexts. This has led to an increasing reliance on traffic conflicts as surrogate safety indicators, which capture high-frequency interactions among road users that are associated with elevated crash likelihood and allow for more timely safety assessment [
3,
4,
5].
The rapid development of sensing technologies, particularly video analytics and trajectory data extraction, has substantially expanded the scope of conflict-based modelling. Recent studies increasingly rely on high-resolution trajectory data derived from UAVs, computer vision, and sensor fusion systems to capture detailed spatiotemporal interactions among road users [
6,
7,
8,
9]. Early approaches primarily relied on statistical and regression formulations to model conflict frequency and severity [
3,
4,
10]. More recent work has incorporated temporal structures and short-term updating mechanisms, including cycle-level modelling and sliding time windows, enabling risk estimation over short intervals rather than relying on purely static representations [
11,
12,
13]. In parallel, Bayesian frameworks have been increasingly adopted to explicitly account for uncertainty, heterogeneity, and data limitations in dynamic traffic environments [
12,
14,
15].
A major methodological development in the field has been the adoption of extreme value theory (EVT), which focuses on modelling rare but safety-critical conflict events. EVT-based approaches provide a probabilistic framework linking the tails of conflict distributions to crash likelihood, and have demonstrated strong potential for short-term crash risk estimation across different traffic environments [
5,
6,
14,
16,
17,
18]. At the same time, machine learning methods have gained prominence due to their ability to capture complex nonlinear relationships in high-dimensional data, particularly when using trajectory and video-based inputs [
7,
9,
19,
20]. While these approaches often achieve strong predictive performance, their interpretability and explicit treatment of uncertainty remain limited in many applications, although recent studies have begun to address these limitations through explainable AI and probabilistic deep learning frameworks [
7,
8].
More recently, hybrid modelling strategies have emerged, combining probabilistic and data-driven techniques to leverage complementary strengths. Examples include the integration of EVT with machine learning for crash risk forecasting [
21], Bayesian–EVT frameworks for multi-level conflict modelling [
22], machine learning–Bayesian spatial frameworks [
23], and hierarchical hybrid models that jointly address occurrence and frequency of conflicts [
24,
25]. Such developments reflect a growing emphasis on balancing predictive performance, interpretability, uncertainty quantification, and operational feasibility.
Despite this rapid methodological evolution, the literature remains fragmented, with limited efforts to systematically compare modelling paradigms across key analytical dimensions. Existing studies differ substantially in terms of data sources (e.g., trajectory-based vs. aggregated traffic states), temporal representation (instantaneous, short-horizon, or cycle-level), conflict indicators (e.g., TTC, PET, MTTC, or alternative measures), and modelling objectives (frequency, severity, or risk evolution), making direct comparison challenging [
8,
12,
13].
This review addresses these limitations by providing a structured, model-centric synthesis of conflict-based approaches for real-time crash risk assessment. The literature is organized into five major modelling paradigms: statistical and regression-based models, Bayesian frameworks, EVT-based approaches, machine learning methods, and hybrid models. This classification enables a systematic comparison of their underlying assumptions and methodological characteristics.
Given the evolving and interdisciplinary nature of the field, many studies combine elements from different modelling paradigms. Accordingly, the classification adopted in this review allows for overlap across paradigms, such that individual studies may be associated with more than one methodological category when they incorporate multiple modelling components. Hybrid models are, however, distinguished as a separate category, referring specifically to studies that explicitly integrate two or more paradigms within a unified modelling framework. In this sense, the classification is interpreted as heuristic rather than strictly mutually exclusive, reflecting the inherent methodological convergence and overlap that characterize the literature.
To move beyond descriptive categorisation, the review adopts a structured cross-paradigm framework that examines models across multiple analytical dimensions organized into two complementary groups: (i) model-intrinsic methodological characteristics, which describe how models are formulated and evaluated, and (ii) application, data, and deployment characteristics, which describe how models are applied in real-world traffic environments. This distinction enables a clearer separation between methodological capability and practical applicability.
Beyond descriptive classification, the review makes three main contributions that address key gaps within the context of conflict-based real-time and short-term crash risk assessment. First, it advances a model-centric perspective that shifts the focus from application-specific studies to underlying modelling paradigms and their analytical logic. Second, it provides a structured cross-paradigm comparison across both methodological and application-oriented dimensions, enabling a systematic evaluation of the relative strengths, limitations, and deployment potential of different approaches. Third, it introduces a conceptual distinction between conflict frequency and conflict severity, treating them as complementary and cross-cutting dimensions of real-time crash risk, and using this lens to interpret model behaviour across paradigms.
The remainder of this paper is structured as follows (see
Figure 1).
Section 2 outlines the systematic review methodology.
Section 3 presents the analytical synthesis of the literature, including a model-centric comparison across paradigms and a frequency–severity interpretive framework.
Section 4 discusses key research gaps and emerging directions. Finally,
Section 5 concludes the paper and provides actionable recommendations for researchers and policymakers.
2. Methodology
This study adopts the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [
26] to ensure a transparent, systematic, and reproducible review process. The objective is to comprehensively identify and synthesize studies examining the use of traffic conflicts as surrogate safety indicators for real-time and short-term crash risk assessment, with particular emphasis on modelling approaches.
A structured literature search was conducted across Scopus, Web of Science, and ProQuest, selected for their broad and multidisciplinary coverage of transportation, engineering, and data-driven research. These databases were accessed between 6 January and 28 March 2026.
Initial queries based on core terms (traffic conflict, conflict-based, real-time, and short-term) were progressively expanded to include related expressions such as surrogate safety, crash precursors, near-miss, dynamic, prediction, forecast, online, and near real-time. This expansion accounts for variations in terminology across studies describing similar concepts. Boolean operators (OR, AND) and keyword combinations were refined for each database to maximize retrieval efficiency while minimizing irrelevant results.
To ensure relevance and consistency, the search was restricted to peer-reviewed journal articles published in English between 2016 and 2026, thereby focusing on recent methodological developments, including the emergence of machine learning, extreme value theory, and hybrid modelling approaches. While this restriction may exclude earlier foundational work, it allows the review to reflect the current state of the art.
The study selection process consisted of four sequential stages:
Identification: The initial search yielded 853 records (233 from Scopus, 212 from Web of Science, and 408 from ProQuest). These records were consolidated into a unified dataset, and duplicates were removed to prevent bias arising from overlapping database indexing.
Screening: Titles and abstracts were reviewed to exclude studies not directly related to traffic conflicts, surrogate safety indicators, or real-time crash risk assessment. This step ensured conceptual alignment with the objectives of the review by filtering out studies focused solely on crash data or unrelated traffic phenomena.
Eligibility: Of the remaining 75 studies, 19 were excluded due to inaccessibility or an invalid DOI. Full-text articles were assessed to determine methodological relevance. Studies were retained only if they (i) explicitly addressed real-time or short-term conflict-based modelling, and (ii) provided sufficient methodological detail to support meaningful comparison. Studies were excluded if they focused only on historical crash-frequency modelling, did not use traffic conflicts or surrogate safety indicators, lacked real-time or short-term relevance, were not peer-reviewed journal articles, were not written in English, or did not provide sufficient methodological detail for comparison.
Inclusion: A total of 56 studies were retained. To enhance coverage, a snowballing approach [
27] was applied by examining reference lists and citations of eligible studies, identifying 14 additional relevant articles. This resulted in a final sample of 70 studies included in the review.
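The record counts reported across the four stages can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the variable names are illustrative and not part of the review protocol.

```python
# Record counts as reported in the selection stages above.
identified = {"Scopus": 233, "Web of Science": 212, "ProQuest": 408}
full_text_candidates = 75    # studies remaining at the eligibility stage
excluded_at_eligibility = 19  # inaccessible or invalid DOI
snowballed = 14               # added via reference and citation tracking

total_identified = sum(identified.values())
retained = full_text_candidates - excluded_at_eligibility
final_sample = retained + snowballed

print(total_identified, retained, final_sample)
```

The three printed values should match the 853 identified records, the 56 retained studies, and the final sample of 70 reported in the text.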
The screening and eligibility assessment were conducted by two independent reviewers to minimize subjective bias. Each reviewer independently evaluated titles, abstracts, and full-text articles against the predefined inclusion criteria. Discrepancies in study selection were discussed and resolved through consensus. This procedure ensured consistency in study selection and reduced the risk of individual reviewer bias.
To enhance reproducibility, the inclusion and exclusion criteria were defined a priori and applied consistently across all stages of the review. A standardized screening protocol was used to guide the evaluation of each record, including explicit criteria related to study scope, methodological relevance, and data adequacy. Prior to full screening, a pilot screening phase was conducted on a subset of studies to align reviewer interpretation of the criteria. Ambiguous cases were re-evaluated through joint discussion to ensure consistent application of the criteria.
Given the heterogeneity of modelling approaches, data sources, and evaluation metrics, a qualitative, model-centric synthesis was adopted. A quantitative meta-analysis was not appropriate due to the absence of standardized outcome measures and the substantial methodological diversity across studies. Instead, studies were categorized based on modelling paradigms and analyzed across key analytical dimensions. Classification was based on the different methodological approaches identified within each study, allowing for overlap where hybrid or multi-method structures were present. Accordingly, the classification is interpreted as heuristic and non-exclusive, reflecting the inherently hybrid and evolving nature of modelling approaches rather than rigid categorical boundaries.
The overall study identification and selection process is summarized in
Figure 2.
3. Results and Analytical Synthesis
3.1. Descriptive Analysis
A summary of the reviewed studies is presented in
Table 1. Excluding the two methodological references [
26,
27] and the three review/synthesis papers [
1,
28,
29], the analytical dataset comprises 67 model-focused studies. Because several studies combine multiple modelling logics, the paradigm categories are not mutually exclusive; thus, counts across categories exceed the number of unique studies. Machine learning and deep learning approaches appear most frequently, with 37 studies, followed by Bayesian probabilistic approaches (18 studies) and EVT-based models (18 studies), then statistical and regression-based models (17 studies) and hybrid frameworks (14 studies).
In line with the classification framework outlined in
Section 1, studies may be associated with multiple paradigms when they incorporate different methodological components. Hybrid models are, however, identified more selectively, referring only to studies that explicitly integrate two or more paradigms within a unified modelling framework. Accordingly, hybrid studies are also reflected within their constituent paradigms (e.g., statistical, Bayesian, EVT, or machine learning) where applicable, while being distinguished as a separate category to represent their integrated design. This treatment ensures consistency between the paradigm counts and the underlying methodological composition of the reviewed studies. In particular, Bayesian methods used solely for parameter estimation within another modelling framework (e.g., EVT) are not treated as separate paradigms, whereas time-series formulations (e.g., ARIMA) are classified under the statistical/regression-based paradigm as extensions for modelling temporal dependence.
The distribution of studies across paradigms is further clarified in
Figure 3 through the UpSet representation, which highlights both exclusive and overlapping modelling structures. The results indicate that a substantial proportion of studies are concentrated within single paradigms, particularly machine learning (28 studies) and statistical/regression-based approaches (10 studies), while a smaller but meaningful subset adopts dual or hybrid configurations. EVT-based approaches appear only occasionally as standalone models (three studies) and are more frequently integrated with other paradigms, reflecting their primary role in modelling extreme risk rather than general conflict occurrence. Similarly, Bayesian approaches appear as standalone models in four studies and more commonly intersect with EVT (four studies in B+E) and, to a lesser extent, machine learning (two studies in hybrid B+M), supporting uncertainty quantification and probabilistic inference within more complex modelling pipelines. Hybrid approaches (14 studies) remain limited but represent the most methodologically integrated designs, combining complementary strengths across paradigms. Among these, the most common configurations are Bayesian–EVT integrations (six studies), followed by EVT–machine learning (two studies), Bayesian–machine learning (two studies), statistical–EVT (two studies), statistical–machine learning (one study), and a single three-way integration (Bayesian–EVT–machine learning).
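The difference between per-paradigm totals (which can sum to more than the number of studies) and UpSet-style exclusive counts (which sum exactly to the number of studies) can be illustrated with a small sketch. The study identifiers and tags below are hypothetical and are not drawn from the reviewed dataset; the separate hybrid category used in this review is omitted for brevity.

```python
from collections import Counter

# Hypothetical study tags, for illustration only (not the reviewed dataset).
# S = statistical, B = Bayesian, E = EVT, M = machine learning.
studies = {
    "study_a": {"M"},
    "study_b": {"S"},
    "study_c": {"B", "E"},
    "study_d": {"E", "M"},
    "study_e": {"B", "E", "M"},
}

# Per-paradigm totals: a study counts once for every paradigm it uses.
paradigm_totals = Counter(tag for tags in studies.values() for tag in tags)

# UpSet-style exclusive counts: each study counts once, under its exact combination.
exclusive_counts = Counter("+".join(sorted(tags)) for tags in studies.values())

print(dict(paradigm_totals))
print(dict(exclusive_counts))
print(sum(paradigm_totals.values()), "paradigm tags across", len(studies), "studies")
```

Under this toy tagging, the per-paradigm totals add up to more than the five studies, whereas the exclusive counts add up to exactly five, which is the same relationship that holds between the paradigm counts in Table 1 and the intersections in Figure 3.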
Overall, the descriptive results indicate three main patterns: first, machine learning approaches dominate the recent literature numerically; second, probabilistic, EVT-based, and hybrid models remain important because they address uncertainty, rare events, and integrated risk representation; third, methodological sophistication does not necessarily imply operational maturity, since deployment also depends on data availability, computational feasibility, interpretability, validation depth, and transferability. These patterns motivate the cross-paradigm analytical synthesis in
Section 3.2 and the frequency–severity interpretation developed in
Section 3.3.
3.2. Model-Centric Synthesis
Organizing the reviewed studies into five modelling paradigms provides a useful classification of the literature, consistent with prior reviews [
1,
28]. However, because several studies combine multiple modelling logics, this classification is interpreted as heuristic rather than strictly mutually exclusive. To move beyond descriptive categorisation, the following synthesis compares the five paradigms across nine analytical dimensions: conflict representation and modelling logic; application context and traffic environment; data requirements and input characteristics; sample size and dataset scale; uncertainty and heterogeneity modelling; temporal modelling and prediction horizon; model performance and validation strategy; computational complexity and implementation; and operational readiness and transferability. These dimensions are summarized in
Table 2 and
Table 3, where model-intrinsic methodological characteristics and application/deployment characteristics are synthesized into structured comparative frameworks.
3.2.1. Conflict Representation and Modelling Logic
The five paradigms differ fundamentally in how they define and operationalise traffic conflicts. Statistical and regression-based models typically rely on threshold-based surrogate safety indicators, where conflicts are converted into counts, rates, or severity classes. Ref. [
3] modelled rear-end conflicts at signalized intersections using cycle-level TTC-based counts, testing thresholds from 0.5 s to 3.0 s and adopting TTC ≤ 1.5 s as the main safety performance threshold. Ref. [
4] extended this logic using TTC ≤ 2.5 s, MTTC ≤ 2.5 s, and DRAC ≥ 1.5 m/s², while ref. [11] compared conflict rate indicators using TTC ≤ 3.0 s, MTTC ≤ 3.0 s, and DRAC ≥ 1.5 m/s². These models therefore follow a count-based or censored rate logic and are mainly suited to representing conflict occurrence within short operational intervals.
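As a minimal sketch of this threshold-based logic, the code below computes TTC and DRAC for a rear-end car-following pair and flags a conflict against the thresholds quoted above. It assumes one common formulation (TTC as gap over closing speed, DRAC as squared closing speed over twice the gap) and a strictly positive bumper-to-bumper gap; the cited studies may use different variants, and the function names and example values are illustrative.

```python
import math

def ttc(gap_m, v_follow, v_lead):
    """Time to collision (s) for a follower closing on a leader; inf if not closing."""
    closing = v_follow - v_lead
    return gap_m / closing if closing > 0 else math.inf

def drac(gap_m, v_follow, v_lead):
    """Deceleration rate to avoid a crash (m/s^2); 0 if not closing."""
    closing = v_follow - v_lead
    return closing ** 2 / (2 * gap_m) if closing > 0 else 0.0

def classify(gap_m, v_follow, v_lead, ttc_max=1.5, drac_min=1.5):
    """Flag a conflict when TTC <= ttc_max or DRAC >= drac_min."""
    t = ttc(gap_m, v_follow, v_lead)
    d = drac(gap_m, v_follow, v_lead)
    return {"ttc": t, "drac": d,
            "ttc_conflict": t <= ttc_max, "drac_conflict": d >= drac_min}

# Hypothetical observations: (gap in m, follower speed, leader speed in m/s).
observations = [(12.0, 15.0, 5.0), (30.0, 12.0, 10.0), (8.0, 10.0, 11.0)]
results = [classify(*obs) for obs in observations]

# Count-based logic: number of TTC conflicts in this (hypothetical) interval.
ttc_conflict_count = sum(r["ttc_conflict"] for r in results)
print(ttc_conflict_count)
```

Aggregating the flags per signal cycle or time interval yields the conflict counts and rates that the regression models above take as their dependent variable.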
This threshold-based logic has also been adapted to more diverse conflict types and road environments, including bicycle–vehicle conflicts [
50], left-turn conflicts reconstructed from vehicle trajectories [
51], rear-end conflicts at unsignalized intersections [
7,
30], vehicle–pedestrian interactions [
31], and sharp curve traffic conflicts [
32]. In freeway and heterogeneous traffic settings, conflict representation has been extended through severity indices, PET categories, macroscopic safety measures, and context-specific indicators [
33,
34,
35,
52]. More recent studies have also introduced composite or dependence-based indicators, including video-based TTC–ML representations in weaving areas [
53], temporal margin and behavioural feature indicators for vehicle–bicycle conflicts [
36], and vine copula-based spatial dependence modelling of ramp area conflict risk [
13].
Bayesian models often use similar conflict indicators but embed them within probabilistic structures. Refs. [
4,
10,
11] retained TTC-, MTTC-, and DRAC-based conflict definitions but estimated conflict risk through full Bayesian or Bayesian Tobit models. Bayesian formulations therefore do not necessarily redefine conflicts; rather, they reinterpret conflict occurrence probabilistically by estimating posterior distributions of risk, parameters, and heterogeneity. This logic is extended in spatial and hierarchical settings by [
13,
23,
33,
43].
EVT-based models represent a distinct modelling logic because they focus on extreme conflicts rather than all observed conflicts. Refs. [
2,
15] modelled extremes of surrogate safety indicators using BM–GEV or POT–GPD frameworks. Refs. [
5,
16,
44] further developed Bayesian or hierarchical EVT formulations. In this paradigm, the analytical focus shifts from the frequency of all conflicts to the tail behaviour of severe interactions that are more closely associated with crash occurrence.
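The POT–GPD logic can be sketched as follows: TTC values are negated so that more severe conflicts become larger values, excesses over a threshold are fitted with a generalized Pareto distribution, and the fitted tail is extrapolated to the crash boundary (negated TTC of zero). This sketch uses a method-of-moments fit purely to stay dependency-free, which is a simplification; the reviewed studies typically rely on maximum likelihood or Bayesian estimation. The TTC sample and the threshold are hypothetical.

```python
import math
import statistics

def gpd_method_of_moments(excesses):
    """Estimate GPD shape (xi) and scale (sigma) from threshold excesses."""
    m = statistics.mean(excesses)
    v = statistics.variance(excesses)
    ratio = m * m / v
    return 0.5 * (1.0 - ratio), 0.5 * m * (1.0 + ratio)

def exceedance_prob(x, u, xi, sigma, zeta):
    """P(X > x) for x >= u, where zeta = P(X > u) and excesses follow GPD(xi, sigma)."""
    y = x - u
    if y < 0:
        raise ValueError("x must not lie below the threshold u")
    if abs(xi) < 1e-12:
        return zeta * math.exp(-y / sigma)  # exponential tail in the xi -> 0 limit
    t = 1.0 + xi * y / sigma
    if t <= 0:
        return 0.0  # x lies beyond the finite upper endpoint (xi < 0)
    return zeta * t ** (-1.0 / xi)

# Hypothetical minimum-TTC observations (s), one per interaction.
ttc_minima = [3.2, 2.8, 1.4, 0.9, 2.1, 1.2, 4.0, 0.6, 1.45, 2.6, 1.1, 0.3]
negated = [-t for t in ttc_minima]
u = -1.5  # threshold on negated TTC, i.e. TTC < 1.5 s

excesses = [x - u for x in negated if x > u]
zeta = len(excesses) / len(negated)
xi, sigma = gpd_method_of_moments(excesses)
p_crash = exceedance_prob(0.0, u, xi, sigma, zeta)
print(f"xi={xi:.3f}, sigma={sigma:.3f}, P(negated TTC > 0)={p_crash:.4f}")
```

With so few excesses the estimates are unstable, which mirrors the sensitivity of EVT models to the number of extreme observations noted elsewhere in this review.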
Machine learning models adopt a more data-driven representation. Rather than relying only on predefined TTC or PET thresholds, ML models learn conflict patterns from trajectory, video, LiDAR, connected vehicle, or communication-based data. Refs. [
9,
37,
54] used high-dimensional data to predict conflict or risk states. LSTM and GRU models have been used to capture sequential pedestrian–vehicle conflict evolution [
55,
56], while GNN-based models represent spatiotemporal interaction structures in mixed traffic [
19]. Other ML applications include urban freeway and congested highway rear-end conflict prediction [
57,
58], rear-end conflict identification at unsignalized intersections [
7], tunnel risk modelling [
59], roundabout conflict analysis [
20], video-based conflict detection using spatial-temporal analytics [
54], deep learning with LTE access data [
60], and reinforcement learning-based collision risk assessment [
61].
Hybrid models combine these logics. Refs. [
24,
38] integrated EVT with time-series forecasting, ref. [
23] combined machine learning-derived predictors with Bayesian spatial Poisson modelling, and refs. [
16,
22] combined Bayesian and EVT components within hierarchical rare-event frameworks. Recent statistical–machine learning integrations further demonstrate how interpretable statistical structures can be combined with data-driven prediction modules for real-time conflict assessment [
39]. Hybrid approaches therefore attempt to represent conflict occurrence, severity, uncertainty, and temporal evolution within a single modelling structure.
3.2.2. Application Context and Traffic Environment
A key but often underexplored dimension in conflict-based modelling is the application context, referring to the type of road facility, traffic conditions, and operational environment in which models are developed and applied. Across the reviewed studies, clear differences emerge in how modelling paradigms align with specific traffic contexts.
Statistical and regression-based models are predominantly applied to signalized intersections and controlled urban environments, where structured traffic flow and well-defined signal cycles support aggregation-based models [
3,
4,
35]. Their use has been extended to heterogeneous traffic conditions, including weak lane discipline environments and mixed traffic involving bicycles and pedestrians [
31,
50,
52], although such applications often require context-specific calibration.
Bayesian models expand this applicability to network-level and spatially distributed systems, including connected and autonomous vehicle (CAV) environments and merging areas [
23,
43]. Their ability to incorporate spatial dependence and hierarchical structure makes them particularly suitable for multi-site and system-level safety assessment, although their application remains concentrated in relatively structured datasets.
EVT-based models are most commonly applied in safety-critical contexts, including intersections, highways, and hazardous locations where extreme conflicts are observable [
2,
45]. They are especially relevant for risk-sensitive applications, such as safest route identification and high-risk location screening, but their reliance on extreme observations can limit applicability in low-conflict or data-sparse environments.
Machine learning models exhibit the broadest application coverage, spanning intersections, freeways, tunnels, roundabouts, and mixed traffic environments [
9,
20,
59]. Their flexibility allows them to adapt to data-rich environments, including video analytics, LiDAR, and connected vehicle systems, making them well suited for complex and high-density traffic conditions. However, their performance remains strongly dependent on data availability and quality.
Hybrid models are primarily applied in complex, data-rich, and emerging environments, including network-level systems, multi-modal traffic, and CAV contexts [
23,
24]. These models are particularly relevant for integrated safety assessment and real-time forecasting, where multiple dimensions of risk (frequency, severity, uncertainty) must be jointly modelled. Nevertheless, their application remains limited to a relatively small number of studies due to high data and computational requirements.
Overall, application context acts as a structuring constraint on model selection: simpler models dominate structured and data-limited environments, while advanced and hybrid approaches are concentrated in complex, data-rich, and emerging traffic systems.
3.2.3. Data Requirements and Input Characteristics
The paradigms also differ substantially in their data requirements. Statistical and regression-based models are generally compatible with aggregated traffic variables, cycle-level data, and structured surrogate safety indicators. Refs. [
3,
4,
11] used signal cycle-level indicators and traffic flow variables, while ref. [
35] proposed a macroscopic framework based on PSD ≤ 1 and time spent in conflict. Ref. [
52] further demonstrated how conflict-based safety evaluation can be adapted to heterogeneous and weak lane discipline traffic using context-sensitive indicators. These models are therefore suitable when only aggregated or moderately detailed trajectory-derived data are available.
Bayesian models require similar inputs when used as extensions of regression models but can also incorporate richer hierarchical, spatial, or multivariate data structures. Refs. [
13,
40] used copula-based dependence structures, while ref. [
23] incorporated machine learning-derived features into a Bayesian spatial Poisson model for large-scale prediction. Ref. [
43] applied a Bayesian hierarchical approach in connected and autonomous vehicle merging areas, demonstrating that Bayesian models can accommodate spatial, interaction-level, and site-specific data.
EVT models require high-resolution conflict observations because their main target is the tail of the conflict distribution. This makes them dependent on the reliable detection of severe conflicts or extreme surrogate safety values. Refs. [
2,
5,
15,
16] relied on sufficiently detailed conflict measurements to estimate tail distributions. EVT applications in safest route and hazardous location identification further demonstrate the need for high-frequency conflict data across space and time [
45,
46].
Machine learning models are the most data intensive. They typically require high-dimensional trajectory, video, LiDAR, connected vehicle, or multi-sensor inputs. Refs. [
9,
37,
53,
54] demonstrated the use of video and trajectory-based data; ref. [
47] used LiDAR data with Bayesian deep learning; ref. [
6] used real-world pre-crash trajectories, and ref. [
62] used connected vehicle information for real-time longitudinal conflict risk prediction. Ref. [
36] further illustrated how temporal margins and behavioural features can enrich early risk assessment in vehicle–bicycle interactions. These models benefit from rich data but are more sensitive to data quality, sensor coverage, labelling consistency, and distribution shift.
Hybrid models typically require the richest data structures because they combine several modelling components. Refs. [
21,
24] used AI-based video analytics with EVT and forecasting structures, ref. [
23] used large-scale traffic conflict features with Bayesian spatial modelling, and ref. [
48] developed a pedestrian-focused ML-based real-time crash risk forecasting framework. These models therefore require not only detailed data but also compatibility between statistical, probabilistic, and learning-based modules.
3.2.4. Sample Size and Dataset Scale
The required dataset scale generally increases from statistical models through Bayesian and EVT models to ML and hybrid approaches. Statistical and regression-based models can operate with relatively small to medium datasets, particularly when conflicts are aggregated by cycle, interval, or facility segment. Refs. [
3,
4,
34,
35] indicated that interpretable models can be built from structured conflict counts, PET classes, or macroscopic conflict exposure indicators. However, their reliability remains sensitive to the number of observed conflicts, the selected thresholds, and the representativeness of the collection period, as highlighted by [
63,
64].
Bayesian models can be useful in sparse data settings because prior distributions and hierarchical structures help stabilize inference. Refs. [
3,
4,
11] illustrated the use of Bayesian formulations for cycle-level conflict rates, while refs. [
23,
43] demonstrated that Bayesian spatial and hierarchical models can borrow information across locations or contexts. However, Bayesian models still require sufficient data for convergence, posterior stability, and the credible estimation of heterogeneity.
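The stabilizing effect of a prior in sparse data settings can be illustrated with a conjugate Gamma–Poisson update, in which the posterior mean conflict rate is pulled towards the prior mean when few cycles are observed. This is a textbook example chosen for transparency, not the specification used in the cited studies; the prior parameters and counts are hypothetical.

```python
def posterior_rate(counts, prior_shape, prior_rate):
    """Gamma(shape, rate) prior with Poisson counts gives a Gamma posterior.

    Returns (posterior shape, posterior rate, posterior mean).
    """
    post_shape = prior_shape + sum(counts)
    post_rate = prior_rate + len(counts)
    return post_shape, post_rate, post_shape / post_rate

# Hypothetical prior: mean of 2 conflicts per cycle (shape 2, rate 1).
prior_shape, prior_rate = 2.0, 1.0

sparse_site = [0, 1, 0]                        # three observed cycles
rich_site = [4, 6, 5, 5, 4, 6, 5, 5, 4, 6]     # ten observed cycles

for name, counts in (("sparse", sparse_site), ("rich", rich_site)):
    _, _, post_mean = posterior_rate(counts, prior_shape, prior_rate)
    raw_mean = sum(counts) / len(counts)
    print(f"{name}: raw mean={raw_mean:.3f}, posterior mean={post_mean:.3f}")
```

The sparse site is shrunk noticeably towards the prior mean, while the data-rich site stays close to its raw mean; hierarchical models extend this idea by estimating the shared prior from all sites.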
EVT models have a specific data scale requirement: they need enough extreme observations rather than merely a large number of ordinary conflicts. Refs. [
2,
5,
15,
16,
44] demonstrated the potential of EVT, but the reliability of GEV or POT–GPD estimates depends heavily on threshold selection, block definition, and the number of severe conflict observations. This makes EVT powerful for crash risk inference but vulnerable in data-sparse or low-conflict environments.
Machine learning models generally require larger datasets than statistical and Bayesian models. Deep learning models such as those used by [
9,
19,
55,
56,
60,
65] rely on sufficient labelled data to learn nonlinear and temporal patterns. Ref. [
66] specifically highlights class imbalance as a major issue in traffic conflict prediction. Transfer learning and meta-learning approaches attempt to reduce the burden of large training datasets, as illustrated by [
67,
68], but these remain emerging rather than standard practice.
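One simple way to counter class imbalance is to weight each class inversely to its frequency in the loss function. The sketch below shows this generic weighting scheme and a weighted binary cross-entropy; it is an assumption-based illustration rather than the method of any cited study, and the label distribution is hypothetical.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}

def weighted_log_loss(labels, probs, weights):
    """Weighted binary cross-entropy; probs are predicted P(label == 1)."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(labels, probs):
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum

# Hypothetical labels: 95 non-conflict intervals, 5 conflict intervals.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights)

# A model that always predicts a 1% conflict probability.
probs = [0.01] * len(labels)
unweighted = weighted_log_loss(labels, probs, {0: 1.0, 1: 1.0})
weighted = weighted_log_loss(labels, probs, weights)
print(unweighted, weighted)
```

Under the weighted loss, a model that effectively ignores the rare conflict class is penalized much more heavily than under the unweighted loss.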
Hybrid models typically require medium to large datasets because they must support multiple analytical components. The models mentioned in refs. [
21,
24,
38] require sufficient video-derived observations for both EVT and forecasting components, while ref. [
23] requires large-scale data to support ML and Bayesian spatial modelling. Their data demand is therefore high, particularly when rare-event, spatial, and temporal components are jointly modelled.
3.2.5. Uncertainty and Heterogeneity Modelling
Uncertainty treatment is one of the clearest differences across paradigms. Statistical and regression-based models handle uncertainty mainly through distributional assumptions, goodness-of-fit diagnostics, and random parameter extensions. Poisson and Negative Binomial models capture variability and overdispersion in conflict counts [
3], while Tobit models address censoring in conflict rate data [
4]. Ref. [
34] used random parameter ordered logit models with heterogeneity in means, ref. [
35] applied grouped random parameter Tobit models, and ref. [
12] used Poisson-lognormal-Lindley distributions to address overdispersion and excess zero cycles.
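Whether cycle-level conflict counts show overdispersion or excess zeros can be screened with two quick diagnostics before choosing among Poisson, Negative Binomial, or zero-heavy formulations. The sketch below computes the variance-to-mean ratio and compares the observed share of zero cycles with the share a Poisson model with the same mean would imply. The counts are hypothetical and the diagnostics are generic, not taken from the cited studies.

```python
import math
import statistics

def count_diagnostics(counts):
    """Simple screening statistics for conflict counts per cycle."""
    mean = statistics.mean(counts)
    var = statistics.variance(counts)  # sample variance
    return {
        "mean": mean,
        "dispersion_index": var / mean,          # about 1 under a Poisson model
        "observed_zero_share": counts.count(0) / len(counts),
        "poisson_zero_share": math.exp(-mean),   # P(Y = 0) under Poisson(mean)
    }

# Hypothetical conflict counts for ten signal cycles.
cycle_counts = [0, 0, 0, 1, 0, 5, 0, 0, 2, 0]
diag = count_diagnostics(cycle_counts)
for key, value in diag.items():
    print(f"{key}: {value:.3f}")
```

A dispersion index well above one and an observed zero share well above the Poisson-implied share point towards the overdispersed and zero-heavy formulations discussed above.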
Bayesian models provide the strongest formal treatment of uncertainty. Posterior distributions allow model parameters, conflict rates, and crash risk estimates to be interpreted probabilistically. Refs. [
4,
11] used Bayesian Tobit models to estimate censored conflict rates, while refs. [
23,
43] used Bayesian spatial and hierarchical structures to account for unobserved heterogeneity. Bayesian EVT models further extend uncertainty modelling to rare and severe conflicts, as shown by [
5,
15,
16,
18,
44].
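The censoring logic behind Tobit conflict rate models can be made concrete through the log-likelihood: uncensored observations contribute a normal density term, while observations at the censoring limit (for example, cycles with a zero conflict rate) contribute the probability mass below that limit. The sketch below evaluates this likelihood for given parameters; in a Bayesian Tobit model it would be combined with priors and explored by a sampler, which is not shown here. The data and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

def tobit_loglik(y, mu, sigma, lower=0.0):
    """Log-likelihood of a left-censored (Tobit) normal model.

    y: observed values, censored at `lower`; mu: latent means (one per observation).
    """
    std = NormalDist()
    loglik = 0.0
    for yi, mi in zip(y, mu):
        if yi <= lower:
            # Censored: probability that the latent value falls at or below the limit.
            loglik += math.log(std.cdf((lower - mi) / sigma))
        else:
            # Uncensored: normal density of the observed value.
            loglik += math.log(std.pdf((yi - mi) / sigma) / sigma)
    return loglik

# Hypothetical conflict rates per cycle (zero = censored) and latent means.
rates = [0.0, 0.4, 0.0, 1.1, 0.7]
latent_means = [-0.2, 0.5, 0.1, 0.9, 0.6]
print(tobit_loglik(rates, latent_means, sigma=0.5))
```

In practice the latent means would be a linear function of traffic covariates, with the coefficients and sigma treated as unknown parameters.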
EVT models address uncertainty through tail distribution estimation and extreme value parameters. They are particularly strong in modelling uncertainty associated with rare events, but parameter stability depends on the availability of extreme observations and the selected threshold. Refs. [
5,
15,
16] improved this by combining EVT with Bayesian inference. However, standard EVT applications may still be sensitive to threshold selection and limited tail samples, as also discussed by [
29].
Machine learning models traditionally provide weaker explicit uncertainty treatment. Most ML models generate point predictions or classifications without formal predictive uncertainty. Recent work has begun to address this limitation through Bayesian deep learning [
47], uncertainty-aware spatiotemporal learning [
8], and imbalance-aware methods such as weighted losses and resampling [
66]. Nevertheless, uncertainty propagation remains less mature in ML than in Bayesian and Bayesian–EVT frameworks.
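To illustrate the weighted-loss idea mentioned above, the following sketch implements a class-weighted binary cross-entropy in plain Python. The labels, predicted probabilities, and the inverse-frequency choice of `pos_weight` are assumptions made for illustration; they do not reproduce the setup of any cited study.

```python
import math


def weighted_bce(y_true, p_pred, pos_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy with an extra weight on the positive (conflict) class."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical imbalanced sample: one severe conflict among ten observations,
# scored by a model that predicts a low probability for every case.
y_true = [0] * 9 + [1]
p_pred = [0.1] * 10

n_pos = sum(y_true)
n_neg = len(y_true) - n_pos
plain = weighted_bce(y_true, p_pred)
weighted = weighted_bce(y_true, p_pred, pos_weight=n_neg / n_pos)

print(f"unweighted loss = {plain:.3f}, weighted loss = {weighted:.3f}")
```

Under the unweighted loss, a model that ignores the rare class is penalized only mildly; the weighted version makes the missed conflict dominate the objective.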
Hybrid models offer the potential for more complete uncertainty treatment because they can combine probabilistic components with data-driven learning. Refs. [
8,
16,
19,
23,
24] illustrated different ways of combining rare-event modelling, uncertainty estimation, statistical structure, and nonlinear prediction. However, uncertainty is not always propagated across all stages of the hybrid pipeline, making this a key remaining challenge.
3.2.6. Temporal Modelling and Prediction Horizon
Temporal modelling ranges from short aggregation windows to dynamic forecasting. Statistical and regression-based models usually operate over short intervals such as signal cycles, fixed time windows, or lane-level aggregation intervals. Refs. [
3,
4,
11] linked cycle-level conflicts to traffic variables such as volume, queue length, shock-wave characteristics, and platoon ratio. Ref. [
33] used 30 s freeway lane-level intervals, while ref. [
35] computed conflict exposure in 60 m × 1 s spatiotemporal windows. Ref. [
12] further incorporated autoregressive dependence across adjacent signal cycles. Spatiotemporal conflict risk evolution has also been explicitly analyzed using trajectory-based approaches [
41].
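The short-interval aggregation described above can be sketched as a sliding-window count over conflict timestamps. The timestamps are hypothetical, and the 30 s window with a 15 s step is chosen only to echo the interval lengths mentioned in this subsection; `sliding_window_counts` is an illustrative helper, not a function from any cited framework.

```python
def sliding_window_counts(event_times, window, step, horizon):
    """Count events in each window [start, start + window) that fits within the horizon."""
    counts = []
    start = 0
    while start + window <= horizon:
        end = start + window
        counts.append(sum(1 for t in event_times if start <= t < end))
        start += step
    return counts


# Hypothetical conflict timestamps in whole seconds over a 90 s observation period.
conflict_times = [3, 12, 14, 31, 33, 34, 58, 61, 88]

window_counts = sliding_window_counts(conflict_times, window=30, step=15, horizon=90)
print(window_counts)
```

Setting `step` equal to `window` gives non-overlapping fixed intervals, whereas a smaller step yields overlapping windows that update the risk estimate more frequently.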
Bayesian models extend temporal modelling through dynamic updating, time-varying parameters, and hierarchical temporal structures. Refs. [
15,
16] modelled evolving crash risk using dynamic Bayesian EVT frameworks, while ref. [
11] incorporated temporal correlation into Bayesian Tobit conflict rate models. Ref. [
12] provided empirical evidence of temporal correlation in severe conflicts, supporting the need for dynamic modelling. Ref. [
47] also used Bayesian deep learning for cycle-level prediction using LiDAR data, and ref. [
18] developed conditional Bayesian POT models for short-term crash risk forecasting.
EVT models are increasingly dynamic but remain uneven in temporal capability. Refs. [
2,
15,
16] demonstrated real-time or dynamic EVT applications, while ref. [
24] combined GEV theory with ARIMA for short-term forecasting. Refs. [
18,
21,
38] further extended EVT into forecasting-oriented settings. Comparative forecasting work also shows how near-future crash prediction can be evaluated across alternative model families [
69]. Nevertheless, many EVT applications still focus on short-window extreme event estimation rather than continuous multi-step forecasting.
Machine learning models are generally the strongest in temporal representation. LSTM and GRU models capture sequence dependence in pedestrian–vehicle conflict evolution [
55,
56], while GNN-based approaches capture spatiotemporal interactions [
19]. Connected vehicle and trajectory-based studies also support real-time prediction at fine temporal scales [
6,
32,
62,
65]. Video-based weaving area analysis and communication data-based deep learning further illustrate how short-horizon conflict prediction can be implemented using dynamic traffic data streams [
53,
60]. However, many ML studies still focus on short-term classification or prediction rather than long-horizon forecasting.
Hybrid models provide the most integrated temporal structures by combining sequence learning, dynamic updating, and time-series forecasting. Refs. [
24,
38] combined EVT and time-series models, while ref. [
21] proposed a bi-level real-time forecasting framework. Ref. [
18] supports short-term forecasting using conditional EVT, and ref. [
25] further developed dynamic short-term crash risk prediction using a novel conflict indicator in emerging mixed traffic flow. These approaches are promising for real-time crash risk forecasting, although continuous deployment and long-horizon forecasting remain limited.
3.2.7. Model Performance and Validation Strategy
Validation practices differ substantially across paradigms. Statistical and regression-based models are usually evaluated through goodness-of-fit and comparative statistical criteria. Ref. [
3] used AIC, scaled deviance, Pearson χ², parameter significance, and Durbin–Watson tests. Refs. [
4,
11] used DIC, posterior significance, and convergence diagnostics for Bayesian Tobit formulations. Ref. [
40] used Kendall’s tau, while ref. [
34] used log-likelihood, AIC, BIC, odds ratios, and coefficient significance. Ref. [
35] added stronger predictive validation using a 70/30 split, five-fold cross-validation, RMSE, MAE, MSE, R², and predicted versus observed comparisons.
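For reference, the predictive error measures named above can be computed as in the sketch below. The observed and predicted conflict rates are hypothetical, and `regression_metrics` is an illustrative helper rather than code from any cited study.

```python
import math
from statistics import mean


def regression_metrics(observed, predicted):
    """Return MAE, MSE, RMSE, and the coefficient of determination (R squared)."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    mae = mean(abs(r) for r in residuals)
    mse = mean(r * r for r in residuals)
    rmse = math.sqrt(mse)
    obs_mean = mean(observed)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical observed versus predicted conflict rates on a held-out sample.
observed = [2.0, 4.0, 6.0]
predicted = [2.0, 5.0, 5.0]

metrics = regression_metrics(observed, predicted)
print(metrics)
```

Computing these quantities on a held-out split or within cross-validation folds, rather than on the estimation sample, is what distinguishes predictive validation from goodness-of-fit assessment.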
Bayesian models rely on both predictive and Bayesian-specific validation. Refs. [
4,
11] used DIC, convergence diagnostics, and posterior significance. Bayesian EVT applications assess posterior inference, model comparison, and tail adequacy [
5,
15,
16,
44]. Spatial and hierarchical models also involve comparison across locations or structures, as shown by [
23,
43].
EVT validation focuses on tail distribution adequacy and crash risk consistency. Common validation approaches include goodness-of-fit tests, tail diagnostics, comparison of predicted and observed crash risk, and sensitivity to thresholds. Ref. [
49] concluded that EVT-based models can outperform traditional surrogate-based approaches in capturing the relationship between conflicts and crashes. However, transfer validation across sites remains limited, and EVT results remain sensitive to threshold choice and the availability of extreme observations.
Machine learning validation relies mainly on predictive performance metrics, including accuracy, precision, recall, F1 score, AUC, cross-validation, and benchmarking. Refs. [
66,
70,
71] used comparative ML evaluation frameworks. Earlier ML-based freeway rear-end collision risk modelling also provides evidence on the predictive use of learning algorithms in real-time safety assessment [
57]. Ref. [
6] strengthened validation by using real-world pre-crash trajectory data, while refs. [
67,
68] explicitly addressed transferability through transfer learning and meta-learning. However, validation remains heterogeneous, and only a minority of studies test robustness under distribution shift.
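The classification metrics listed above can be derived from a confusion matrix, as in the following sketch. The labels are hypothetical (1 denotes a critical conflict), and `classification_metrics` is an illustrative helper.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 score for binary labels (1 = critical conflict)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test-set labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]

scores = classification_metrics(y_true, y_pred)
print(scores)
```

Reporting recall and F1 alongside accuracy matters in this setting because critical conflicts are rare, so a model can reach high accuracy while missing most of them.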
Hybrid validation combines methods from multiple paradigms. Refs. [
23,
24] used combinations of statistical diagnostics, probabilistic evaluation, and predictive metrics such as RMSE, MAE, or AUC. Ref. [
39] further illustrated how integrated statistical–ML frameworks require validation strategies that account for both interpretable model structure and predictive performance. Hybrid models therefore offer broader validation possibilities, but their evaluation is also more complex because each module may require separate diagnostics.
3.2.8. Computational Complexity and Implementation
The paradigms show clear differences in computational burden. Statistical and regression-based models are generally the least demanding. Their reliance on aggregated variables, count models, Tobit models, or random parameter extensions makes them relatively easy to implement and interpret. This is evident in cycle-level and macroscopic applications by [
3,
4,
12,
34,
35]. Their main implementation challenge lies not in computation but in threshold selection, data aggregation, and model specification.
Bayesian models are more computationally intensive because they require posterior estimation, convergence checking, and often MCMC-based inference. The models in [
4,
10,
11,
15,
16,
18] demonstrated the value of Bayesian inference, but these models require more careful calibration and diagnostic assessment. Their implementation burden increases further in spatial, hierarchical, or Bayesian–EVT formulations [
23,
43].
EVT models require specialized statistical calibration. Their computation is not always excessive, but implementation is sensitive to threshold selection, block definition, tail fit diagnostics, and parameter stability. Refs. [
2,
5,
15,
16] demonstrate EVT’s methodological rigour, while refs. [
45,
46] show its use in route and hazardous location applications. However, EVT implementation requires expertise in extreme value modelling and sufficient extreme observations.
Machine learning models require substantial computational resources, especially when using deep learning, video analytics, LiDAR, GNNs, reinforcement learning, or transfer learning. The models in [
9,
19,
47,
54,
60,
61,
65] illustrate the computational demands associated with high-dimensional data streams and complex architectures. These models may be powerful but require training infrastructure, labelled data, tuning procedures, and often GPU-level computation.
Hybrid models are the most complex to implement because they combine several computational layers. Refs. [
16,
21,
23,
24,
25,
38,
39] indicate that hybrid models require coordination between EVT, Bayesian inference, machine learning, statistical modelling, and forecasting components. Their complexity can improve predictive capacity but creates challenges in calibration, interpretation, reproducibility, and real-time deployment.
3.2.9. Operational Readiness and Transferability
Operational readiness depends not only on predictive performance but also on interpretability, data availability, computational cost, and transferability. Statistical and regression-based models are the most operationally ready because they are interpretable, relatively simple, and compatible with aggregated data. Refs. [
3,
4,
35,
64,
71] demonstrate that these models can support short-interval monitoring and practical safety assessment. The practical scope of such models is broadened by applications to heterogeneous traffic with weak lane discipline, bicycle–vehicle conflicts, pedestrian interactions, sharp curves, and unsignalized intersections [
30,
31,
32,
50,
52]. However, their transferability remains limited by context-specific thresholds and facility-specific calibration.
Bayesian models offer strong decision-support value because they provide uncertainty-aware outputs. This is useful for operational safety systems where confidence in risk estimates matters. The models in [
4,
11,
23,
43,
47] demonstrate this potential. However, their operational use is constrained by computational complexity, convergence requirements, and the need for expert model specification.
EVT models are operationally valuable for high-risk event detection, crash risk inference, safest route analysis, and hazardous location identification. Refs. [
2,
15,
16,
45,
46,
49] indicate that EVT provides a theoretically grounded bridge between conflicts and crash risk. Yet EVT deployment remains constrained by sensitivity to thresholds, the need for sufficient extreme events, and limited transfer validation.
Machine learning models have strong potential for real-time prediction because they can process high-dimensional streaming data and capture nonlinear interaction patterns. Applications using video analytics, LiDAR, trajectories, connected vehicles, and communication-based traffic data demonstrate this potential [
6,
9,
37,
47,
54,
60,
62]. However, operational readiness is reduced by interpretability limitations, data requirements, robustness issues, and transferability concerns. The directions taken in refs. [
67,
68] directly address these limitations through transfer learning and meta-learning, but such approaches remain emerging.
Hybrid models represent the most conceptually comprehensive but least operationally mature paradigm. They can jointly model frequency, severity, uncertainty, and temporal evolution, as illustrated by [
16,
21,
22,
23,
24,
38,
39,
48]. However, their deployment is limited by high complexity, calibration burden, interpretability challenges, and the difficulty of integrating multiple modules into real-time traffic management systems. Therefore, although hybrid models are theoretically promising, they should not be described as operationally superior without stronger evidence of real-world deployment and transfer validation.
3.3. Frequency and Severity Dimensions
In addition to the nine comparison dimensions, the reviewed paradigms can be interpreted through the complementary lenses of conflict frequency and conflict severity. This distinction is not a separate methodological dimension in the same sense as data requirements, uncertainty, validation, or deployment; rather, it is a cross-cutting conceptual lens that clarifies the functional role of each paradigm in real-time crash risk assessment.
Conflict frequency refers to how often conflicts occur within a defined time interval, road segment, signal cycle, or traffic state. Statistical and regression-based models are most directly aligned with this perspective because they model conflict counts, rates, or probabilities using threshold-based indicators such as TTC, PET, MTTC, and DRAC [
3,
4,
35]. Bayesian models extend this frequency-oriented logic by incorporating posterior uncertainty, spatial variation, temporal dependence, and heterogeneity into conflict rate estimation [
10,
11,
23].
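The frequency perspective can be illustrated with a threshold-based conflict count. The sketch below uses the standard car-following definition of TTC (gap divided by closing speed) and counts observations below a chosen threshold. The observations and candidate thresholds are hypothetical, and the function names are illustrative.

```python
import math


def time_to_collision(gap_m, v_follow_mps, v_lead_mps):
    """Car-following TTC in seconds; infinite when the follower is not closing in."""
    closing_speed = v_follow_mps - v_lead_mps
    if closing_speed <= 0:
        return math.inf
    return gap_m / closing_speed


def conflict_count(observations, ttc_threshold_s):
    """Number of observations whose TTC falls strictly below the threshold."""
    return sum(
        1
        for gap, v_follow, v_lead in observations
        if time_to_collision(gap, v_follow, v_lead) < ttc_threshold_s
    )


# Hypothetical (gap in m, follower speed in m/s, leader speed in m/s) within one interval.
observations = [
    (20.0, 15.0, 10.0),  # TTC = 4.0 s
    (5.0, 14.0, 10.0),   # TTC = 1.25 s
    (10.0, 10.0, 12.0),  # leader is faster, so no collision course
    (9.0, 16.0, 10.0),   # TTC = 1.5 s
    (30.0, 12.0, 11.0),  # TTC = 30.0 s
]

counts_by_threshold = {thr: conflict_count(observations, thr) for thr in (1.5, 2.0, 4.5)}
print(counts_by_threshold)
```

The same interval yields different conflict counts under different thresholds, which is why the frequency estimates produced by these models are tied to the chosen indicator definition.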
Conflict severity refers to the intensity or crash relevance of a conflict, particularly when interactions approach extreme or safety-critical conditions. EVT-based models are most explicitly severity-oriented because they focus on the tail of the conflict distribution and model extreme values of surrogate indicators such as minimum TTC or PET exceedances [
2,
5,
15]. ML models also contribute to severity modelling by learning risk scores or conflict escalation patterns from high-dimensional trajectory, video, LiDAR, connected vehicle, or communication-based traffic data [
6,
9,
47,
60]. However, their severity representation is often data-driven rather than explicitly grounded in physical thresholds.
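The link between tail modelling and crash relevance can be sketched with the peak-over-threshold survival function. The example assumes a convention often used with negated TTC, under which a crash corresponds to the negated indicator reaching zero; this convention, the parameter values, and the function name `pot_crash_probability` are assumptions made for illustration, not results from the cited studies.

```python
import math


def pot_crash_probability(xi, sigma, u, exceed_rate):
    """P(negated indicator >= 0) under a GPD fitted above threshold u (u <= 0).

    xi: GPD shape, sigma: GPD scale (> 0), exceed_rate: P(negated indicator > u).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if u > 0:
        raise ValueError("threshold must be non-positive on the negated scale")
    distance = -u  # distance from the threshold to the crash boundary at zero
    if abs(xi) < 1e-12:
        tail = math.exp(-distance / sigma)  # exponential limit of the GPD
    else:
        base = 1.0 + xi * distance / sigma
        # For xi < 0 the distribution has a finite upper endpoint; beyond it the tail is zero.
        tail = 0.0 if base <= 0 else base ** (-1.0 / xi)
    return exceed_rate * tail


# Hypothetical parameter values (illustrative only).
p_light_tail = pot_crash_probability(xi=0.0, sigma=1.0, u=-2.0, exceed_rate=0.1)
p_heavy_tail = pot_crash_probability(xi=0.5, sigma=1.0, u=-2.0, exceed_rate=0.1)
p_bounded = pot_crash_probability(xi=-0.5, sigma=1.0, u=-3.0, exceed_rate=0.1)

print(p_light_tail, p_heavy_tail, p_bounded)
```

The three cases show how strongly the implied crash probability depends on the estimated shape parameter: a bounded tail can imply zero crash probability, whereas a heavier tail raises it noticeably for the same threshold and exceedance rate.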
Hybrid models provide the strongest integration of frequency and severity because they combine occurrence/exposure modelling with probabilistic, extreme value, statistical, and learning-based components [
22,
23,
24,
39,
48]. This makes them particularly suitable for short-term crash risk forecasting, although their operational use remains limited by computational complexity and calibration requirements (see Table 4).
Thus, the frequency–severity distinction helps explain why no single paradigm is universally superior. Statistical and Bayesian models are better suited to monitoring conflict occurrence and rate variation; EVT models are better suited to estimating crash-relevant extreme risk; ML models are better suited to learning complex interaction patterns; and hybrid models attempt to combine these roles within integrated forecasting frameworks.
4. Research Gaps and Emerging Directions
Despite significant progress in conflict-based real-time crash risk modelling, the reviewed literature reveals persistent methodological and practical limitations. These gaps are evident across all modelling paradigms and highlight critical opportunities for future research. To provide clarity and analytical coherence, the discussion is organized into two parts: (i) key research gaps and (ii) emerging directions aimed at addressing these limitations.
4.1. Research Gaps
4.1.1. Fragmented Conflict Definitions and Limitations in Modelling Rare and Extreme Events
A fundamental limitation across the literature is the absence of standardized definitions for traffic conflicts. Most studies rely on surrogate safety indicators such as TTC, PET, and DRAC [
3,
28], whose effectiveness depends heavily on threshold selection and context-specific calibration [
49]. Differences in thresholds, aggregation strategies, and interaction definitions lead to inconsistent conflict representations and limit comparability across studies. While recent efforts introduce continuous safety measures [
33] and interaction-based indicators [
31], these approaches do not fully resolve this fragmentation.
This limitation directly affects the modelling of rare and extreme events. Although traffic conflicts provide higher frequency observations than crashes, not all conflicts are equally informative. Many statistical and machine learning models treat conflicts uniformly, without distinguishing between ordinary and extreme interactions. EVT-based approaches explicitly model extremes [
2,
29], but their effectiveness depends on threshold selection and sufficient tail data. Machine learning models, in contrast, face challenges related to class imbalance and rare-event prediction [
66], often leading to biased predictions toward non-critical events.
Overall, the lack of consistent and severity-sensitive conflict definitions constrains both the identification of crash-relevant events and the robustness of rare-event modelling.
4.1.2. Incomplete Treatment of Uncertainty and Fragmented Spatiotemporal Modelling
Uncertainty and spatiotemporal dynamics are unevenly addressed across modelling paradigms. Bayesian frameworks provide formal probabilistic inference [
10,
18], whereas most regression and machine learning models rely on point estimates without explicitly quantifying prediction uncertainty. Moreover, multiple sources of uncertainty, including data noise, model specification, and environmental variability, are rarely distinguished or consistently propagated through modelling pipelines.
At the same time, temporal and spatiotemporal dependencies are not fully integrated. Early models rely on aggregated conflict counts and assume temporal independence, while more recent studies incorporate temporal correlation [
11,
12] and spatiotemporal dynamics [
19,
41]. However, these features remain inconsistently represented across paradigms. Statistical and EVT-based models often lack rich temporal representations, whereas machine learning approaches, despite strong spatiotemporal capabilities, may lack interpretability and probabilistic structure.
This fragmented treatment limits the ability of existing models to reliably capture dynamic risk evolution under uncertainty.
4.1.3. Limited Validation and Transferability
Validation practices remain highly heterogeneous across conflict-based crash risk modelling studies, limiting the robustness and generalizability of reported findings. Many models are developed and evaluated using single-site or context-specific datasets, which constrains their applicability to broader traffic environments and operational conditions.
Across modelling paradigms, validation approaches differ substantially. Machine learning studies typically rely on internal validation techniques such as cross-validation and performance metrics (e.g., accuracy, precision, recall, AUC) [
66], which assess predictive performance but do not guarantee transferability beyond the training context. Bayesian models employ posterior predictive checks and model comparison criteria [
72], providing a more rigorous probabilistic assessment, yet these evaluations are also commonly conducted within the same dataset.
A key limitation is the limited use of external validation across independent sites, time periods, or traffic conditions. As a result, model performance often remains context-dependent, and the ability to generalize across heterogeneous environments is not systematically assessed. This issue is further compounded by the absence of standardized benchmark datasets and evaluation protocols, which restricts reproducibility and hinders direct comparison across studies.
Overall, the lack of consistent and externally validated evaluation frameworks limits confidence in model robustness and constrains the practical deployment of conflict-based crash risk models across diverse traffic settings.
4.1.4. Barriers to Scalability and Real-Time Deployment
Despite their explicit focus on real-time applications, many conflict-based crash risk models face significant challenges in scalability and operational deployment. These challenges arise from the combined effects of computational complexity, data requirements, and system integration constraints.
The nature of these limitations varies across modelling paradigms. Bayesian and hybrid models, while offering rich representations of uncertainty and risk, are often computationally intensive due to iterative inference procedures and multi-stage modelling structures. Machine learning approaches, particularly deep learning models, require large volumes of high-resolution data, such as trajectory, LiDAR, or video data, as well as substantial computational resources for training and real-time inference.
In addition to computational challenges, deployment is constrained by data availability, sensing infrastructure, and system interoperability. Many existing studies rely on high-quality datasets that may not be readily available in real-world settings, limiting the transfer of these models to operational environments. Furthermore, integrating modelling frameworks into existing traffic management systems remains complex, requiring robust data pipelines, real-time processing capabilities, and compatibility with intelligent transportation system architectures.
Although some studies demonstrate promising applications, including real-time warning systems [
42] and safety-oriented routing strategies [
45,
46], large-scale and continuous deployment remains limited. This gap highlights the need for modelling approaches that are not only methodologically advanced but also computationally efficient, data-accessible, and compatible with real-world operational constraints.
4.2. Emerging Research Directions
4.2.1. Standardized and Adaptive Conflict Representations
To address inconsistencies in conflict definitions, future research should prioritize the development of standardized yet adaptive conflict indicators. Existing studies highlight the sensitivity of conflict-based models to the choice of surrogate measures and thresholds, as well as to data collection strategies and modelling assumptions [
28,
49,
64]. Such dependencies limit comparability across studies and reduce the reliability of derived risk estimates.
Recent work has explored data-driven threshold selection within EVT frameworks and the development of context-aware indicators that adapt to varying traffic conditions [
14,
18].
In parallel, trajectory-based and interaction-driven formulations are gaining traction as alternatives to fixed threshold definitions, enabling a more continuous and behaviourally grounded representation of traffic interactions [
31,
33]. Advancing this direction requires not only methodological innovation but also the establishment of benchmark datasets and standardized evaluation protocols to support reproducibility, cross-study comparison, and model transferability.
4.2.2. EVT–Machine Learning Integration for Rare-Event Modelling
A key emerging direction is the integration of extreme value theory with machine learning to jointly model rare and high-risk traffic events. This approach leverages the theoretical strength of EVT in representing tail behaviour alongside the predictive flexibility of machine learning in capturing complex, nonlinear patterns.
Recent hybrid EVT–ML frameworks demonstrate strong potential for short-term crash risk forecasting, particularly in data-rich environments where both extreme event modelling and pattern recognition are required [
23,
24,
38,
39]. At the same time, the literature increasingly emphasizes the importance of addressing class imbalance and rare-event prediction challenges, which can bias models toward non-critical outcomes. Emerging solutions include imbalance-aware learning strategies, synthetic data generation, and simulation-enhanced training approaches, which aim to improve robustness under highly skewed safety datasets [
66]. Further research is needed to refine these integrated frameworks, particularly for real-time and streaming applications.
4.2.3. Integrated Uncertainty-Aware and Spatiotemporal Modelling Frameworks
Future research should move toward integrated frameworks that jointly capture uncertainty and spatiotemporal dynamics in conflict-based safety modelling. Bayesian approaches provide a strong foundation for uncertainty quantification through probabilistic inference and hierarchical modelling [
4,
10,
44], while recent advances in uncertainty-aware machine learning, including Bayesian deep learning, extend these capabilities to high-dimensional and real-time settings [
8,
47].
In parallel, the increasing availability of high-resolution trajectory and sensor data has enabled more sophisticated modelling of temporal dependencies and spatial interactions using sequence learning and graph-based approaches [
19,
41,
65]. Bayesian models have also incorporated temporal correlation and heterogeneity in conflict rates [
11], offering complementary perspectives on dynamic safety processes.
However, uncertainty and spatiotemporal dynamics are still rarely integrated within a unified framework. Advancing this direction requires the development of computationally efficient models capable of jointly representing temporal evolution, spatial interactions, and multiple sources of uncertainty, including both epistemic and aleatory components. Such integration is essential for improving model reliability, interpretability, and real-time decision support.
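One common way to separate the two uncertainty components is the law of total variance applied to an ensemble of probabilistic predictors. In the sketch below, the average of the members' predictive variances is treated as the aleatory part and the spread of their predictive means as the epistemic part. The ensemble outputs are hypothetical, and this decomposition is offered as an illustration rather than as the approach of any cited study.

```python
from statistics import mean, pvariance


def decompose_uncertainty(member_means, member_variances):
    """Split total predictive variance into aleatory and epistemic parts.

    aleatory  = average of the members' predictive variances
    epistemic = variance of the members' predictive means
    """
    aleatory = mean(member_variances)
    epistemic = pvariance(member_means)
    return aleatory, epistemic, aleatory + epistemic


# Hypothetical outputs of two ensemble members for one road segment and time step.
member_means = [0.2, 0.4]
member_variances = [0.1, 0.3]

aleatory, epistemic, total = decompose_uncertainty(member_means, member_variances)
print(f"aleatory = {aleatory:.3f}, epistemic = {epistemic:.3f}, total = {total:.3f}")
```

Additional data or better model specification should shrink the epistemic term, whereas the aleatory term reflects variability that remains even with a well-identified model.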
4.2.4. Transferability, Benchmarking, and Generalizable Modelling
Improving the transferability and generalizability of conflict-based models remains a critical research priority. Most existing models are developed and validated within specific traffic contexts, limiting their applicability across different environments and reducing confidence in their broader use [
71].
Emerging approaches such as transfer learning and meta-learning offer promising solutions by enabling models to adapt to new conditions with limited additional data [
67,
68]. However, systematic evaluation of transferability remains limited, and the absence of standardized benchmark datasets further hinders consistent comparison across modelling approaches.
Future research should focus on establishing common evaluation frameworks, cross-site validation protocols, and shared datasets to support reproducibility and robust performance assessment. Strengthening these aspects is essential for moving from context-specific models toward scalable and widely applicable safety assessment tools.
4.2.5. Scalable Real-Time Deployment and Unified Hybrid Systems
Advancing conflict-based safety modelling requires not only methodological innovation but also scalable real-time implementation and system integration. The increasing availability of high-resolution data from trajectory tracking, video analytics, and connected vehicle systems has created new opportunities for real-time safety assessment [
21,
33,
37]. Several studies demonstrate the feasibility of applications such as real-time crash risk forecasting, dynamic monitoring, and warning systems [
21,
42], as well as safety-oriented routing strategies [
45,
46].
At the same time, there is growing interest in unified hybrid modelling frameworks that combine statistical, Bayesian, EVT-based, and machine learning approaches to capture frequency, severity, uncertainty, and temporal dynamics within a single modelling pipeline [
17,
21,
23,
24,
39]. These approaches offer a pathway toward more comprehensive and operationally relevant safety models.
However, challenges remain in achieving computational efficiency, ensuring data reliability, and integrating models within intelligent transportation systems and connected vehicle ecosystems. Future work should therefore focus on efficient real-time inference, edge computing solutions, and seamless system integration, enabling the deployment of robust, interpretable, and scalable safety modelling frameworks in real-world environments.
5. Conclusions
This review provides a model-centric synthesis of conflict-based approaches for real-time crash risk assessment by organizing the literature into five modelling paradigms: statistical/regression-based, Bayesian, EVT-based, machine learning, and hybrid approaches. This classification enables a structured comparison of how different modelling traditions conceptualize and operationalize traffic conflicts. The analysis suggests that no single paradigm is universally optimal; rather, each captures distinct dimensions of crash risk. Statistical and Bayesian models tend to emphasize conflict frequency and support interpretable inference, EVT-based approaches focus on extreme interactions that may be more closely associated with crash occurrence, while machine learning models are well suited to capturing complex, high-dimensional interaction patterns. Hybrid frameworks offer the potential to integrate these complementary strengths within a unified modelling structure.
From a practical perspective, model selection should be guided by data availability, prediction objectives, and operational constraints. In data-limited contexts, statistical and Bayesian models provide relatively interpretable and computationally efficient solutions, making them suitable for exploratory analysis and policy-oriented applications. EVT-based approaches are particularly relevant when the objective is to characterize rare and safety-critical events. In contrast, machine learning models are more appropriate in data-rich environments where high-resolution trajectory or sensor data are available, enabling short-term prediction of complex interaction patterns. Hybrid frameworks are most applicable in advanced settings where multiple modelling objectives—such as prediction, uncertainty representation, and interpretability—need to be addressed simultaneously, although their implementation typically requires greater data availability and computational resources.
For researchers, several directions warrant further investigation. These include the development of standardized yet adaptive conflict indicators, improved modelling of rare and extreme events, and the systematic incorporation of uncertainty quantification across modelling pipelines. In addition, greater emphasis on cross-site validation and benchmarking using heterogeneous datasets is needed to assess model robustness and transferability. For policymakers and practitioners, effective deployment depends on sustained investment in sensing infrastructure, data integration, and computational capacity. The integration of conflict-based models into traffic management and intelligent transportation systems has the potential to support real-time monitoring and proactive safety interventions, although practical implementation challenges remain.
Overall, effective real-time safety modelling requires balancing interpretability, predictive performance, and operational feasibility. Model selection should therefore be informed by the specific application context, including the required prediction horizon, acceptable levels of uncertainty, and available data infrastructure. By providing a comparative framework and a frequency–severity perspective on crash risk, this review contributes to more informed model selection and supports the development of scalable, reliable, and data-driven traffic safety management systems.