Article
Peer-Review Record

MRI-Copula: A Hybrid Copula–Machine Learning Framework for Multivariate Risk Indexing in Urban Traffic Safety

Sustainability 2025, 17(20), 9210; https://doi.org/10.3390/su17209210
by Fayez Alanazi 1,*, Abdalziz Alruwaili 1 and Amir Shtayat 2
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 16 September 2025 / Revised: 9 October 2025 / Accepted: 14 October 2025 / Published: 17 October 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1. Although the paper provides a detailed description of the six stages of the MRI-Copula framework, some technical details (such as the integration logic of CatBoost–SHAP and the Vine Copula parameter tuning process) lack specific implementation steps.
2. Section 2.5 of the review on Copula–ML hybrids focuses on technology and lacks discussion of its applications in the field of transportation. It is suggested to add a table to compare the advantages and disadvantages of hybrid methods in existing traffic safety research.
3. The study is based solely on 835 accident data from Jeddah, Saudi Arabia, with insufficient sample size and regional representativeness.
4. The comparison with the benchmark model only focuses on performance indicators, without analyzing dimensions such as computational efficiency and interpretability.
5. Variable selection relies on domain knowledge, but does not specify how to handle potential collinearity between variables. It is recommended to supplement statistical basis or sensitivity analysis for variable screening.

Author Response

We sincerely appreciate your valuable time and thoughtful feedback. The comments and suggestions were instrumental in refining the methodological framework and strengthening the overall clarity of the manuscript. We have carefully addressed each point raised, providing detailed explanations and corresponding revisions highlighted in red within both this document and the revised manuscript. We have also attached the response as a file.

Comment 1

“Although the paper provides a detailed description of the six stages of the MRI-Copula framework, some technical details (such as the integration logic of CatBoost–SHAP and the Vine Copula parameter tuning process) lack specific implementation steps.”

Response:

We thank the reviewer for this constructive observation. In response, Sections 3.3 and 3.4 have been merged and comprehensively revised to include detailed implementation steps of both the CatBoost–SHAP integration and the Vine Copula parameter tuning process. The revised section explicitly outlines:

  • The derivation, normalization, and integration of SHAP values as predictive weights.
  • The application of rank transformation to copula marginals.
  • The Akaike Information Criterion (AIC)-based selection among candidate copula families (Clayton, Gumbel, Student-t, Frank, and Independence).
  • The estimation of upper-tail dependence coefficients (χᵤ) using Monte Carlo simulation and bootstrap confidence intervals.
  • The α-blending optimization strategy that integrates SHAP-based and copula-based weights into a unified Multi-Risk Index (MRI).

These additions make the integration logic and parameterization process transparent and reproducible.

Revised Manuscript Excerpt (Section 3.3):

3.3. CatBoost–SHAP and Vine Copula Integration for MRI Construction

To capture both predictive influence and extremal dependence in crash severity modeling, we designed a two-step integration of CatBoost–SHAP feature attribution and Vine copula–based dependence analysis. The process ensured that the Multi-Risk Index (MRI) reflects both how strongly features predict severity and how risk factors co-escalate in extreme crash conditions.

Step 1. Predictive Feature Importance with CatBoost–SHAP

CatBoost (Prokhorenkova et al., 2018), a gradient boosting algorithm robust to categorical variables and missing data, was trained on the crash dataset with hyperparameters (learning rate, depth, number of iterations) optimized using stratified 5-fold cross-validation to maximize AUC.

Feature importance was quantified via SHapley Additive exPlanations (SHAP) (Lundberg & Lee, 2017). For each observation $i$ and feature $j$, the SHAP value $\phi_{ij}$ represents the marginal contribution of feature $j$ to the prediction. Global importance was obtained as the mean absolute SHAP value:

$$I_j = \frac{1}{N} \sum_{i=1}^{N} \lvert \phi_{ij} \rvert \qquad (1)$$

The SHAP weights were normalized to sum to 1 across ERI, BRI, and SRI, producing predictive importance scores that highlight which indices most strongly influence crash severity.
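The averaging and normalization described above can be illustrated with a minimal sketch. It assumes a SHAP matrix is already available (rows are observations, columns are ERI, BRI, SRI); the matrix values and the function name `shap_weights` are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def shap_weights(shap_values: np.ndarray) -> np.ndarray:
    """Mean absolute SHAP value per column, normalized to sum to 1."""
    importance = np.abs(shap_values).mean(axis=0)
    return importance / importance.sum()

# Hypothetical SHAP matrix: 4 observations x 3 indices (ERI, BRI, SRI).
phi = np.array([
    [ 0.20, -0.10, 0.05],
    [-0.30,  0.20, 0.05],
    [ 0.10, -0.30, 0.05],
    [-0.20,  0.20, 0.05],
])
w_shap = shap_weights(phi)  # one weight per index, summing to 1
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so an index that pushes predictions strongly in both directions still receives a large weight.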

Step 2. Extremal Dependence with Vine Copula

To capture nonlinear, asymmetric, and joint-tail dependencies, the indices (ERI, BRI, SRI) were first rank-transformed to the unit interval:

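$$u_{ik} = \frac{\operatorname{rank}(x_{ik})}{N + 1}, \qquad k \in \{\mathrm{ERI}, \mathrm{BRI}, \mathrm{SRI}\} \qquad (2)$$

where $x_{ik}$ is the value of index $k$ for observation $i$ and $N$ is the number of observations.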
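A rank transformation of this kind might look as follows. This is a sketch under two assumptions that the excerpt does not confirm: ranks are divided by the sample size plus one (the usual pseudo-observation convention, which keeps values strictly inside the unit interval), and ties are broken by order of appearance. The function name and the sample values are hypothetical.

```python
import numpy as np

def pseudo_observations(x: np.ndarray) -> np.ndarray:
    """Column-wise ordinal ranks scaled into (0, 1) by dividing by n + 1."""
    n = x.shape[0]
    # argsort of argsort yields 0-based ranks; ties are broken by position.
    ranks = np.argsort(np.argsort(x, axis=0, kind="stable"), axis=0) + 1
    return ranks / (n + 1)

# Hypothetical values for two indices over three crashes.
x = np.array([[0.9, 10.0],
              [0.1, 30.0],
              [0.5, 20.0]])
u = pseudo_observations(x)
```

Because only ranks are used, the result is unchanged by any monotone rescaling of the original indices, which is what allows the copula to describe dependence separately from the marginal distributions.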

 

A regular vine (R-vine) copula (Aas et al., 2009) was then fitted to these uniform marginals, with pair-copula families automatically selected from a candidate set (Clayton, Gumbel, Student-t, Frank, Independence) based on the Akaike Information Criterion (AIC).
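To make the AIC-based family selection concrete, the sketch below compares only two of the candidate families, Independence and Clayton, for a single pair of variables. It is an illustration rather than the authors' procedure: the grid-search maximum likelihood fit, the simulated data, and all function names are assumptions, and a full R-vine would repeat this choice for every pair-copula in every tree.

```python
import numpy as np

def clayton_loglik(u, v, theta):
    """Log-likelihood of the bivariate Clayton copula density (theta > 0)."""
    s = u ** -theta + v ** -theta - 1.0
    return np.sum(np.log1p(theta)
                  - (1.0 + theta) * (np.log(u) + np.log(v))
                  - (2.0 + 1.0 / theta) * np.log(s))

def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2 log L."""
    return 2.0 * n_params - 2.0 * loglik

def select_family(u, v, thetas=np.linspace(0.05, 10.0, 200)):
    """Return the lower-AIC family between Independence and Clayton."""
    logliks = np.array([clayton_loglik(u, v, t) for t in thetas])
    best = int(np.argmax(logliks))
    candidates = {
        "Independence": aic(0.0, 0),  # density is 1, so the log-likelihood is 0
        "Clayton": aic(logliks[best], 1),
    }
    return min(candidates, key=candidates.get), candidates, float(thetas[best])

# Simulated Clayton pair (conditional inversion), used only as test input.
rng = np.random.default_rng(0)
theta_true = 2.0
u = rng.uniform(size=500)
w = rng.uniform(size=500)
v = ((w ** (-theta_true / (1 + theta_true)) - 1) * u ** -theta_true + 1) ** (-1 / theta_true)

family, aics, theta_hat = select_family(u, v)
```

The penalty term 2k is what lets the Independence copula win when the fitted dependence is too weak to justify an extra parameter.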

The upper tail dependence coefficient,

$$\chi_u = \lim_{q \to 1^-} P\left(U_2 > q \mid U_1 > q\right) \qquad (3)$$

was estimated via Monte Carlo simulation (n = 10,000) and bootstrapped (n = 100) to construct 95% confidence intervals. Mean values across simulations provided copula-derived extremal weights, $w^{\mathrm{Copula}}$.
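One way such an estimate could be computed from simulated pairs is sketched below. It assumes a finite threshold (q = 0.95) as a stand-in for the limit and a percentile bootstrap for the interval; neither choice is stated in the excerpt, and the simulated inputs are placeholders rather than draws from the fitted vine.

```python
import numpy as np

def chi_upper(u, v, q=0.95):
    """Empirical P(V > q | U > q), a finite-threshold proxy for the limit."""
    exceed_u = u > q
    if not exceed_u.any():
        return np.nan
    return float(np.mean(v[exceed_u] > q))

def bootstrap_ci(u, v, q=0.95, n_boot=100, seed=0):
    """Percentile 95% interval for chi_upper from n_boot resamples of the pairs."""
    rng = np.random.default_rng(seed)
    n = len(u)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample pairs with replacement
        stats.append(chi_upper(u[idx], v[idx], q))
    low, high = np.nanpercentile(stats, [2.5, 97.5])
    return float(low), float(high)

# Placeholder samples: 10,000 draws, matching the simulation size in the text.
rng = np.random.default_rng(1)
u = rng.uniform(size=10_000)
v_indep = rng.uniform(size=10_000)

chi_same = chi_upper(u, u)          # perfectly dependent pair
chi_indep = chi_upper(u, v_indep)   # independent pair, expected near 1 - q
ci_low, ci_high = bootstrap_ci(u, v_indep)
```

For an independent pair the estimate sits near 1 - q and shrinks toward zero as q approaches 1, whereas a tail-dependent pair keeps a value well above that level.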

Step 3. Multi-Risk Index (MRI) Construction

To integrate predictive and dependence perspectives, we applied a convex blend:

 

$$w_k^{\mathrm{MRI}} = \alpha\, w_k^{\mathrm{SHAP}} + (1 - \alpha)\, w_k^{\mathrm{Copula}}, \qquad (4)$$

where α balances predictive accuracy and dependence-awareness. The optimal α was selected via stratified 5-fold cross-validation over a grid search (0.0–1.0 in increments of 0.1), with AUC maximization as the criterion. A constraint α ≤ 0.8 ensured that at least 20% of the weight derived from copula dependence, safeguarding against purely predictive weighting.
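The blending and constrained grid search can be sketched as follows. The scoring function here is a hypothetical stand-in for the cross-validated AUC used in the manuscript, and the weight vectors are invented for illustration; only the grid (steps of 0.1) and the cap at 0.8 come from the text.

```python
import numpy as np

def blend_weights(w_shap, w_copula, alpha):
    """Convex blend: alpha on SHAP weights, 1 - alpha on copula weights."""
    return alpha * np.asarray(w_shap) + (1.0 - alpha) * np.asarray(w_copula)

def select_alpha(w_shap, w_copula, score_fn, alpha_max=0.8, step=0.1):
    """Grid search over alpha, keeping only values that satisfy alpha <= alpha_max."""
    grid = np.round(np.arange(0.0, 1.0 + step / 2, step), 10)
    grid = grid[grid <= alpha_max + 1e-12]
    scores = [score_fn(blend_weights(w_shap, w_copula, a)) for a in grid]
    return float(grid[int(np.argmax(scores))])

# Hypothetical weights for (ERI, BRI, SRI).
w_shap = np.array([0.6, 0.3, 0.1])
w_copula = np.array([0.2, 0.3, 0.5])

# Stand-in score: closeness to a target weight vector (NOT a real AUC).
target = np.array([0.48, 0.30, 0.22])
alpha_interior = select_alpha(w_shap, w_copula, lambda w: -np.sum((w - target) ** 2))

# When the score always favors SHAP, the cap binds and alpha stops at 0.8.
alpha_capped = select_alpha(w_shap, w_copula, lambda w: -np.sum((w - w_shap) ** 2))
```

Since both input vectors sum to 1, every blend also sums to 1, so the result can be used directly as index weights.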

The final MRI scores were appended as an additional feature to the dataset, and a CatBoost classifier was retrained on this enhanced feature set. Performance was evaluated on a held-out test set using F1, AUC, Precision, Recall, Accuracy, and Matthews Correlation Coefficient (MCC), with 95% confidence intervals estimated via 1,000 bootstrap replications.
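A percentile bootstrap for a test-set metric, as described above, might be implemented along these lines. The labels and predictions are hypothetical, F1 is used as the example metric, and the percentile method is an assumption, since the excerpt does not say which bootstrap interval was used.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1) from hard predictions."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def bootstrap_metric_ci(y_true, y_pred, metric, n_boot=1000, seed=0):
    """Percentile 95% interval from n_boot resamples of the test set."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        draws[b] = metric(y_true[idx], y_pred[idx])
    low, high = np.percentile(draws, [2.5, 97.5])
    return float(low), float(high)

# Hypothetical test set: 50 severe (1) and 100 minor (0) crashes.
y_true = np.array([1] * 50 + [0] * 100)
y_pred = y_true.copy()
y_pred[:5] = 0      # 5 false negatives
y_pred[50:53] = 1   # 3 false positives

point = f1_score(y_true, y_pred)
ci_low, ci_high = bootstrap_metric_ci(y_true, y_pred, f1_score)
```

Resampling whole cases keeps each true label paired with its prediction, which is what makes the interval reflect test-set sampling variability.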

For benchmarking, three alternative variants were implemented: (i) MRI-Corr (Spearman correlation in place of copula), (ii) MRI-Interact (explicit CatBoost interaction terms), and (iii) MRI-PCA (first principal component of indices).

Comment 2

“Section 2.5 of the review on Copula–ML hybrids focus on technology and lacks discussion on its applications in the field of transportation. It is suggested to add a table to compare the advantages and disadvantages of hybrid methods in existing traffic safety research.”

Response:

We appreciate this insightful suggestion. Section 2.5 has been substantially expanded to situate Copula–ML hybrid methods within the transportation safety context. The revised discussion now:

  1. Reviews relevant ML-based and copula-based transportation safety studies (e.g., ensemble SHAP models for crash severity, multivariate copulas for intersection and wildlife-vehicle crash modeling).
  2. Introduces Table 1, which systematically compares ML-only, Copula-only, and Copula–ML hybrid approaches in terms of their domains, methodological contributions, and limitations.
  3. Highlights that while ML models excel in predictive performance and copulas in dependence modeling, hybrid integration remains underutilized—thus motivating the MRI-Copula framework.

Revised Manuscript Excerpt (Section 2.5):

2.5. The Copula–ML Divide: Toward Hybrid Approaches in Safety Analytics

Although both copula models and machine learning (ML) have been applied in traffic safety research, their development has largely proceeded in parallel rather than in integration. On the ML side, ensemble algorithms and interpretable models have demonstrated strong predictive performance. For example, Asadi et al. (2023) developed a Self-Paced Ensemble–SHAP framework for work-zone crash severity classification, while Jamal et al. (2021) compared ensemble techniques such as Random Forest, XGBoost, and SVM, showing notable improvements in severity prediction. Similarly, Laphrom et al. (2024) used XGBoost-SHAP with heterogeneity modeling to capture the temporal patterns of multivehicle truck-involved crashes. These approaches illustrate the predictive strength of ML but generally treat features as independent and overlook multivariate dependence structures.

By contrast, copula-based methods focus on capturing dependence and joint distributions. Zou et al. (2019) applied copulas to correct underreporting in wildlife–vehicle collisions, while Wang et al. (2019) developed a multivariate copula temporal framework to jointly estimate severity, crash type, vehicle damage, and driver error. These studies highlight the value of copulas in modeling joint outcomes and tail dependencies, but their focus has largely been descriptive rather than predictive.

Outside the transportation domain, copula–ML hybrids have begun to emerge. Bermúdez and Karlis (2022) combined copula-based finite mixture regression with insurance claim modeling, and Sun et al. (2019) employed vine copulas to generate synthetic data for ML pipelines, enhancing generalization. In environmental risk modeling, Bayesian ML ensembles with copula-based uncertainty quantification have shown promise for robust groundwater forecasting (Zhu et al., 2023). These applications demonstrate the feasibility of embedding copula-derived dependence structures into predictive frameworks.

Despite these advances, no study in road safety has explicitly fused copula-based dependence with ML-derived feature importance into a unified predictive index. This methodological gap—what we call the copula–ML divide—reflects the tendency of copula research to emphasize dependence without predictive integration, while ML research prioritizes accuracy but neglects joint risk structures. The MRI-Copula framework presented in this study bridges this divide by integrating vine copula tail dependence with CatBoost-SHAP feature importance into a multivariate, interpretable risk index for crash severity.

To situate our contribution, Table 1 compares representative ML-only, copula-only, and hybrid approaches, highlighting their methodological scope, strengths, and limitations.

 

Table 1. Applications of Copula–ML and Related Methods in Transportation Safety Research

| Study | Domain | Method | Key Contribution | Limitation |
|---|---|---|---|---|
| Asadi et al. (2023) | Work-zone crash severity | Self-Paced Ensemble + SHAP | Improved severity classification with interpretable feature attribution | Ignores dependence among risk factors |
| Jamal et al. (2021) | Traffic crash severity | Ensemble ML (RF, XGBoost, SVM) | Demonstrated strong predictive performance across classifiers | Limited interpretability; no dependence modeling |
| Laphrom et al. (2024) | Truck-involved crashes | XGBoost + SHAP with heterogeneity modeling | Captured temporal and unobserved heterogeneity | Still treats predictors as marginal, not joint |
| Zou et al. (2019) | Wildlife–vehicle collisions | Copula-based modeling | Corrected underreporting bias, improved hotspot identification | Descriptive focus; no predictive integration |
| Wang et al. (2019) | Intersection crashes | Multivariate copula temporal model | Jointly modelled severity, crash type, damage, driver error | Complexity limits scalability; predictive capacity underexplored |
| Bermúdez & Karlis (2022) | Insurance claims | Copula-based finite mixture regression | Combined copulas with regression mixtures for joint outcomes | Outside transport; not tested in crash analytics |
| Sun et al. (2019) | Data generation | Vine copula + ML | Used vine copulas for synthetic data to improve ML generalization | Methodological demonstration, not applied to safety data |

 

Comment 3

“The study is based solely on 877 accident data from Jeddah, Saudi Arabia, with insufficient sample size and regional representativeness.”

Response:

We acknowledge and agree with this important limitation. The Jeddah dataset was selected as a proof-of-concept case study to evaluate the methodological feasibility of the MRI-Copula framework under realistic, data-limited urban conditions. While the dataset is region-specific, the proposed framework itself is generalizable once retrained on other datasets.

To clarify this, we have made the following updates:

  • In Section 3.1 (Data Preprocessing and Feature Engineering), we explicitly describe the dataset as a proof-of-concept demonstration rather than a globally representative sample.
  • In the Discussion (Section 5), we emphasize the need for multi-city validation and larger, cross-regional datasets to test robustness and generalizability.
  • Future work now explicitly proposes expanding the framework to multi-regional and open crash databases and integrating exposure and near-miss data.

Revised Manuscript Excerpt (Section 3.1):

3.1. Data Preprocessing and Feature Engineering

This study employs a dataset of 877 police-reported crashes from Jeddah, Saudi Arabia (2019–2023), serving as a proof-of-concept case for testing the MRI-Copula framework in an urban setting characterized by rapid motorization and varied crash determinants. Although region-specific, the dataset provides a representative basis for assessing methodological feasibility before scaling to multi-regional applications.

Each record contains 27 attributes spanning temporal, environmental, infrastructural, vehicle, and driver domains, with Injury Severity defined as a binary target: 0 = minor and 1 = severe/fatal. The class distribution (66.6 % minor, 33.4 % severe/fatal) supports stratified binary classification.

Temporal features (month, day of week, and weekend indicator) were derived from crash dates. Categorical variables (e.g., weather, driver gender, road type) were processed natively in CatBoost and encoded for LightGBM, while numerical variables (e.g., age, speed limit) were median-imputed. Categorical missing values were imputed using the mode (Joel et al., 2024). The cleaned dataset contained no missing values and was normalized for downstream modeling.
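The two imputation rules mentioned above (median for numerical variables, mode for categorical ones) can be sketched in a few lines. The column values and helper names are hypothetical; the manuscript does not show its own preprocessing code.

```python
from collections import Counter

import numpy as np

def impute_median(values):
    """Replace NaN entries of a numeric column with the column median."""
    arr = np.array(values, dtype=float)
    arr[np.isnan(arr)] = np.nanmedian(arr)
    return arr

def impute_mode(values, missing=None):
    """Replace missing entries of a categorical column with the most frequent value."""
    observed = [v for v in values if v is not missing]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is missing else v for v in values]

# Hypothetical columns with one missing entry each.
ages = impute_median([25, np.nan, 40, 31])
weather = impute_mode(["clear", None, "rain", "clear"])
```

In a cross-validated pipeline, the median and mode would normally be computed on the training folds only and then applied to the held-out fold to avoid leakage.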

This preprocessing pipeline ensured data completeness, interpretability, and consistency across classifiers. Future extensions will validate the framework on larger, multi-regional datasets to enhance generalizability and capture cross-regional variations in crash causation.

 

Revised Manuscript Excerpt (Section 5 – Discussion):

Several limitations should be acknowledged; these findings should be interpreted within the scope of the study design. The analysis was based on 877 crash records from Jeddah, Saudi Arabia, which offers valuable insights into a Middle Eastern urban context but may not fully capture the diversity of cultural, infrastructural, and climatic conditions found elsewhere. The relatively modest sample size also places natural limits on statistical power, although this was partly mitigated by cross-validation and bootstrap replication. In addition, vine copula modeling remains computationally intensive, which may constrain its use in real-time or resource-limited settings. Finally, as with most observational crash data, the relationships observed are associative rather than strictly causal (Pearl, 2009).

Future work can address these constraints by scaling the framework to larger, multi-regional datasets to enhance generalizability, integrating geospatial analytics for corridor-level risk mapping (Essa & Sayed, 2019), and incorporating near-miss and exposure data to improve robustness (Hu et al., 2022). Addressing fairness and equity concerns is also essential to prevent algorithmic bias (Raji et al., 2020). Embedding MRI-Copula in human-in-the-loop decision systems could further bridge analytics with practitioner judgment, enabling adaptive and context-aware safety management.

 

Comment 4

“The comparison with the benchmark model only focuses on performance indicators, without analysing dimensions such as computational efficiency and interpretability.”

Response:

We thank the reviewer for this excellent point. The revised results section (Section 4.6) now incorporates a comprehensive benchmarking analysis that includes computational efficiency (runtime) and interpretability dimensions in addition to performance metrics.

Specifically:

  • Runtime analysis was added to Table 6, showing that LightGBM and HistGB models achieved competitive accuracy (AUC ≈ 0.95–0.96; F1 ≈ 0.80–0.89) within fractions of a second, compared to approximately 32 seconds for CatBoost.
  • The section now explicitly discusses the trade-off between predictive performance and computational cost.
  • Interpretability improvements are highlighted, showing how the inclusion of ERI, BRI, and SRI indices (and the α-weighted MRI) provides a transparent mapping between feature dependence and crash outcomes.
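Training times of the kind reported in Table 6 are typically measured as wall-clock time around the fitting call. The helper below is a generic sketch of that idea; `time_fit` and the placeholder workload are hypothetical, and the response does not describe how the authors actually timed their models.

```python
import time

def time_fit(fit_fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls to fit_fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fit_fn()
        best = min(best, time.perf_counter() - start)
    return best

# Placeholder workload standing in for a model's fit call.
elapsed = time_fit(lambda: sum(i * i for i in range(100_000)))
```

Taking the minimum over a few repeats reduces the influence of background load, which matters when comparing models that train in well under a second.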

Revised Manuscript (Section 4.6):

4.6 Comparative Model Performance Across Classifiers

Table 6 presents a comparative evaluation of the MRI-Copula framework across four classifiers—Logistic Regression, CatBoost, LightGBM, and Histogram-based Gradient Boosting (HistGB)—under three modeling scenarios: Baseline, With Indices, and With MRI. The Baseline models rely solely on the original crash-related variables, the With Indices models incorporate interpretable sub-indices representing Environmental (ERI), Behavioural (BRI), and Systemic (SRI) risks, while the With MRI models further integrate the α-weighted Multi-Risk Index (MRI), which combines SHAP-derived feature importance with vine-copula-based dependence weights, optimized at α = 0.80.

Table 6. Comparative Performance of Classifiers Integrated with the MRI-Copula Framework

| Model | Scenario | F1 | AUC | Precision | Recall | Accuracy | MCC | Training Time (s) |
|---|---|---|---|---|---|---|---|---|
| Logistic Regression | Baseline | 0.7895 | 0.8995 | 0.8182 | 0.7627 | 0.8636 | 0.6897 | 0.4000 |
| Logistic Regression | With Indices | 0.7652 | 0.8986 | 0.7857 | 0.7458 | 0.8466 | 0.6519 | 0.3830 |
| Logistic Regression | With MRI | 0.7788 | 0.8982 | 0.8148 | 0.7458 | 0.8580 | 0.6759 | 0.3850 |
| CatBoost | Baseline | 0.9027 | 0.9878 | 0.9444 | 0.8644 | 0.9375 | 0.8586 | 32.3240 |
| CatBoost | With Indices | 0.9043 | 0.9862 | 0.9286 | 0.8814 | 0.9375 | 0.8586 | 32.2200 |
| CatBoost | With MRI | 0.9043 | 0.9855 | 0.9286 | 0.8814 | 0.9375 | 0.8586 | 32.7060 |
| LightGBM | Baseline | 0.8696 | 0.9619 | 0.8929 | 0.8475 | 0.9148 | 0.8069 | 0.0660 |
| LightGBM | With Indices | 0.8929 | 0.9635 | 0.9434 | 0.8475 | 0.9318 | 0.8457 | 0.0830 |
| LightGBM | With MRI | 0.8496 | 0.9581 | 0.8889 | 0.8136 | 0.9034 | 0.7803 | 0.0670 |
| HistGB | Baseline | 0.8148 | 0.9502 | 0.8980 | 0.7458 | 0.8864 | 0.7404 | 0.2290 |
| HistGB | With Indices | 0.8224 | 0.9557 | 0.9167 | 0.7458 | 0.8920 | 0.7542 | 0.2160 |
| HistGB | With MRI | 0.8000 | 0.9478 | 0.9130 | 0.7119 | 0.8807 | 0.7281 | 0.2300 |

Across all configurations, the CatBoost + MRI-Copula model achieved the best overall predictive performance (F1 = 0.904; AUC = 0.985), demonstrating superior discrimination of severe crashes while maintaining balanced precision (0.929) and recall (0.881). The marginal differences among the Baseline, With Indices, and With MRI configurations indicate that the introduction of interpretable risk structures and dependence weighting did not compromise accuracy, confirming the robustness of the MRI-Copula integration. In comparison, LightGBM achieved similarly high predictive performance (AUC ≈ 0.96; F1 ≈ 0.87–0.89) but required substantially less training time (approximately 0.07 seconds), underscoring its efficiency and scalability for large-scale or near–real-time deployment. HistGB yielded slightly lower but competitive accuracy (AUC ≈ 0.95; F1 ≈ 0.81–0.82) within a modest runtime of about 0.23 seconds, further confirming the generalizability of the MRI-Copula approach across gradient-boosting frameworks.

By contrast, Logistic Regression consistently exhibited the lowest predictive performance (AUC ≈ 0.90; F1 ≈ 0.78), reflecting the inherent limitations of linear models in capturing the nonlinear interactions and complex dependencies among risk indices. This clear contrast between linear and ensemble-based learners highlights the methodological advantage of embedding dependency-aware indices within nonlinear architectures. Moreover, the inclusion of the interpretable ERI, BRI, and SRI sub-indices enhanced transparency by revealing how domain-specific risks contribute to overall crash severity, while the α-weighted MRI provided a unified representation of predictive and dependence-based relationships.

These results confirm that the MRI-Copula framework achieves a balanced integration of predictive accuracy, computational efficiency, and interpretability. CatBoost remains the most analytically robust and interpretable configuration, while LightGBM and HistGB offer practical alternatives for real-time or resource-constrained applications. Together, these findings position the MRI-Copula framework as a reliable, scalable, and theoretically grounded tool for data-driven road-safety management.
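For readers who want to see how the threshold-based metrics in Table 6 relate to one another, the sketch below computes them from confusion-matrix counts. The counts are hypothetical: they were chosen to be consistent with the CatBoost Baseline row (for a 176-case test set), and the manuscript excerpt does not report a confusion matrix.

```python
import math

def metrics_from_counts(tp, fp, fn, tn):
    """Precision, recall, F1, accuracy, and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "mcc": mcc,
    }

# Hypothetical counts (assumed, not reported): severe crashes are the positive class.
m = metrics_from_counts(tp=51, fp=3, fn=8, tn=114)
rounded = {name: round(value, 4) for name, value in m.items()}
```

MCC uses all four cells of the confusion matrix, which is why it is a useful companion to F1 on a test set where roughly one third of cases are severe.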

 

Revised Manuscript Excerpt (Section 5 – Discussion):

Comparative results across classifiers further confirmed the robustness of the MRI-Copula framework. As presented in Section 4.6, the CatBoost + MRI-Copula configuration achieved the highest predictive performance (AUC = 0.985; F1 = 0.904), demonstrating strong discriminative capability and balanced precision–recall trade-offs. LightGBM and HistGB models achieved comparably high accuracy (AUC ≈ 0.96; F1 ≈ 0.82–0.87) but required only fractions of a second for training, compared to approximately 32 seconds for CatBoost. This contrast reveals an important trade-off between predictive power and computational efficiency, highlighting the framework’s flexibility for different deployment contexts. CatBoost remains preferable when analytical accuracy and interpretability are prioritized, while LightGBM and HistGB provide practical, time-efficient alternatives for real-time or large-scale traffic safety monitoring applications.

 

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

This study constructs an interpretable multivariate risk index framework by integrating the dependence quantification capabilities of vine copulas with the predictive power of CatBoost machine learning, achieving accurate prediction and factor analysis of urban traffic accident risks, which has great practical value. However, the following opinions are provided for the author to better improve the paper.

  1. The abstract mentions the integration of Copula dependency and SHAP feature importance, but the weight allocation criteria for both in the final risk index need more detailed explanation.
  2. There are many abbreviations in the paper. It is recommended that the author carefully check to ensure that each abbreviation is given its corresponding full name when it first appears, such as MRI-Copula, CatBoost-SHAP, XGBoost, AUC.
  3. It is suggested that the author add a summary of the core issues after the existing research introduction in the literature review (or discussion section), systematically sorting out and summarizing the key contradictions that have not been fully resolved in the existing research, in order to strengthen the targeted support of the research value of this paper.
  4. The literature review section needs to be strengthened. The following related literature is suggested for review: https://doi.org/10.1016/j.physa.2023.128980, https://doi.org/10.1109/JSEN.2020.3007809, https://doi.org/10.1109/TITS.2025.3559498.
  5. In the methodology of Section 3, it is mentioned that, for categorical variables, CatBoost preserves string inputs while LightGBM requires conversion to categorical types. However, the current statement is slightly vague. It is recommended to clearly state whether CatBoost was the only model considered as the final choice or whether it was compared with other models (such as LightGBM). If a comparison was made, please briefly explain the comparison results and the basis for choosing CatBoost to enhance the persuasiveness of the methodology.
  6. In Section 4.3, it is mentioned that the Systemic Risk Index (SRI) is conditionally independent of the Environmental Risk Index (ERI) and the Behavioral Risk Index (BRI). It is suggested that the author provide a deeper explanation of the practical significance of this statistical finding to enhance the policy guidance value of the model.

Author Response

We thank the reviewer for the insightful and constructive observations, which have greatly enhanced the academic rigor and precision of our work. Each comment was carefully considered, and corresponding modifications were incorporated into the manuscript. The revisions are marked in red to ensure transparency in the changes made. The responses to the comments are also attached.

Comment 1

“The abstract mentions the integration of Copula dependency and SHAP feature importance, but the weight allocation criteria for both in the final risk index need more detailed explanation.”

Response:
We appreciate the reviewer’s insightful comment. In the revised manuscript, Section 3.3 (CatBoost–SHAP and Vine Copula Integration for MRI Construction) has been expanded to provide a detailed explanation of how the weighting coefficient (α) is determined and how it governs the proportional contribution of SHAP-based feature importance and Copula-based dependence in the final Multi-Risk Index (MRI).

A convex integration scheme (Equation 4),

$$w_k^{\mathrm{MRI}} = \alpha\, w_k^{\mathrm{SHAP}} + (1 - \alpha)\, w_k^{\mathrm{Copula}} \qquad (4)$$

is now clearly described, where α balances predictive accuracy (CatBoost–SHAP) and dependence-awareness (Vine Copula). The optimization of α was conducted through a grid search with stratified 5-fold cross-validation over α ∈ [0, 1] (in increments of 0.1), using AUC maximization as the criterion.

The optimal α = 0.80 was empirically found to offer the best trade-off between predictive strength and interpretability, ensuring that at least 20% of the final weighting derived from Copula-based dependence. This addition clarifies both the quantitative optimization process and the conceptual rationale behind integrating predictive and dependence structures.

Revised Manuscript Excerpt (Section 3.3):

3.3. CatBoost–SHAP and Vine Copula Integration for MRI Construction

To capture both predictive influence and extremal dependence in crash severity modeling, we designed a two-step integration of CatBoost–SHAP feature attribution and Vine copula–based dependence analysis. The process ensured that the Multi-Risk Index (MRI) reflects both how strongly features predict severity and how risk factors co-escalate in extreme crash conditions.

Step 1. Predictive Feature Importance with CatBoost–SHAP

CatBoost (Prokhorenkova et al., 2018), a gradient boosting algorithm robust to categorical variables and missing data, was trained on the crash dataset with hyperparameters (learning rate, depth, number of iterations) optimized using stratified 5-fold cross-validation to maximize AUC.

Feature importance was quantified via SHapley Additive exPlanations (SHAP) (Lundberg & Lee, 2017). For each observation $i$ and feature $j$, the SHAP value $\phi_{ij}$ represents the marginal contribution of feature $j$ to the prediction. Global importance was obtained as the mean absolute SHAP value:

$$I_j = \frac{1}{N} \sum_{i=1}^{N} \lvert \phi_{ij} \rvert \qquad (1)$$

The SHAP weights were normalized to sum to 1 across ERI, BRI, and SRI, producing predictive importance scores that highlight which indices most strongly influence crash severity.

Step 2. Extremal Dependence with Vine Copula

To capture nonlinear, asymmetric, and joint-tail dependencies, the indices (ERI, BRI, SRI) were first rank-transformed to the unit interval:

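$$u_{ik} = \frac{\operatorname{rank}(x_{ik})}{N + 1}, \qquad k \in \{\mathrm{ERI}, \mathrm{BRI}, \mathrm{SRI}\} \qquad (2)$$

where $x_{ik}$ is the value of index $k$ for observation $i$ and $N$ is the number of observations.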

 

A regular vine (R-vine) copula (Aas et al., 2009) was then fitted to these uniform marginals, with pair-copula families automatically selected from a candidate set (Clayton, Gumbel, Student-t, Frank, Independence) based on the Akaike Information Criterion (AIC).

The upper tail dependence coefficient,

$$\chi_u = \lim_{q \to 1^-} P\left(U_2 > q \mid U_1 > q\right) \qquad (3)$$

was estimated via Monte Carlo simulation (n = 10,000) and bootstrapped (n = 100) to construct 95% confidence intervals. Mean values across simulations provided copula-derived extremal weights, $w^{\mathrm{Copula}}$.

Step 3. Multi-Risk Index (MRI) Construction

To integrate predictive and dependence perspectives, we applied a convex blend:

 

$$w_k^{\mathrm{MRI}} = \alpha\, w_k^{\mathrm{SHAP}} + (1 - \alpha)\, w_k^{\mathrm{Copula}}, \qquad (4)$$

where α balances predictive accuracy and dependence-awareness. The optimal α was selected via stratified 5-fold cross-validation over a grid search (0.0–1.0 in increments of 0.1), with AUC maximization as the criterion. A constraint α ≤ 0.8 ensured that at least 20% of the weight derived from copula dependence, safeguarding against purely predictive weighting.

The final MRI scores were appended as an additional feature to the dataset, and a CatBoost classifier was retrained on this enhanced feature set. Performance was evaluated on a held-out test set using F1, AUC, Precision, Recall, Accuracy, and Matthews Correlation Coefficient (MCC), with 95% confidence intervals estimated via 1,000 bootstrap replications.

For benchmarking, three alternative variants were implemented: (i) MRI-Corr (Spearman correlation in place of copula), (ii) MRI-Interact (explicit CatBoost interaction terms), and (iii) MRI-PCA (first principal component of indices).

 

Comment 2

“There are many abbreviations in the paper. It is recommended that the author carefully check to ensure that each abbreviation is given its corresponding full name when it first appears, such as MRI-Copula, CatBoost-SHAP, XGBoost, AUC.”

Response:
We thank the reviewer for this helpful observation. The entire manuscript has been carefully reviewed to ensure that every abbreviation is defined upon its first appearance. For example:

  • MRI-Copula: Multivariate Risk Index based on Copula integration.
  • CatBoost–SHAP: Categorical Boosting with SHapley Additive exPlanations.
  • XGBoost: Extreme Gradient Boosting.
  • AUC: Area Under the Receiver Operating Characteristic Curve.

These clarifications have been incorporated consistently throughout the Abstract, Methodology, Results, and Discussion sections to improve clarity and readability for both technical and non-technical audiences. Moreover, an abbreviation section is included in the manuscript:

 

The following abbreviations are used in this manuscript:

| Abbreviation | Definition |
|---|---|
| MRI | Multivariate Risk Index |
| MRI-Copula | Multivariate Risk Index with Copula Integration |
| ERI | Environmental Risk Index |
| BRI | Behavioural Risk Index |
| SRI | Systemic Risk Index |
| ML | Machine Learning |
| AI | Artificial Intelligence |
| IML | Interpretable Machine Learning |
| SHAP | SHapley Additive exPlanations |
| ALE | Accumulated Local Effects |
| DCA | Decision Curve Analysis |
| AUC | Area Under the Curve |
| MCC | Matthews Correlation Coefficient |
| PCA | Principal Component Analysis |
| SPF | Safety Performance Function |
| CPM | Crash Prediction Model |
| RSA | Road Safety Audit |
| GBM | Gradient Boosting Machine |
| RF | Random Forest |
| XGBoost | Extreme Gradient Boosting |
| SVM | Support Vector Machine |

Comment 3

“It is suggested that the author add a summary of the core issues after the existing research introduction in the literature review (or discussion section), systematically sort out and summarize the key contradictions that have not been fully resolved in the existing research, in order to strengthen the targeted support of the research value of this paper.”

Response:
We appreciate this valuable recommendation. In response, a new integrative subsection has been developed in Section 2.5 of the Literature Review, which now systematically synthesizes and summarizes the core contradictions and research gaps that motivated the present study.

The revised section highlights:

  • The lack of integration between dependence modeling (Copulas) and explainable ML (e.g., SHAP) in traffic safety analytics.
  • The absence of interpretable composite indices linking environmental, behavioural, and systemic risk factors.
  • The limited attention to tail dependence and co-escalation of risk factors in extreme crash events.
  • The imbalance between accuracy and interpretability, constraining use in policy contexts.

This restructuring clarifies the paper’s research value and positions the MRI-Copula framework as a targeted methodological response to these unresolved issues.

Revised Manuscript Excerpt (Section 2.5):

2.5. The Copula–ML Divide: Toward Hybrid Approaches in Safety Analytics

Although both copula models and machine learning (ML) have been applied in traffic safety research, their development has largely proceeded in parallel rather than in integration. On the ML side, ensemble algorithms and interpretable models have demonstrated strong predictive performance. For example, Asadi et al. (2023) developed a Self-Paced Ensemble–SHAP framework for work-zone crash severity classification, while Jamal et al. (2021) compared ensemble techniques such as Random Forest, XGBoost, and SVM, showing notable improvements in severity prediction. Similarly, Laphrom et al. (2024) used XGBoost-SHAP with heterogeneity modeling to capture the temporal patterns of multivehicle truck-involved crashes. These approaches illustrate the predictive strength of ML but generally treat features as independent and overlook multivariate dependence structures.

By contrast, copula-based methods focus on capturing dependence and joint distributions. Zou et al. (2019) applied copulas to correct underreporting in wildlife–vehicle collisions, while Wang et al. (2019) developed a multivariate copula temporal framework to jointly estimate severity, crash type, vehicle damage, and driver error. These studies highlight the value of copulas in modeling joint outcomes and tail dependencies, but their focus has largely been descriptive rather than predictive.

Outside the transportation domain, copula–ML hybrids have begun to emerge. Bermúdez and Karlis (2022) combined copula-based finite mixture regression with insurance claim modeling, and Sun et al. (2019) employed vine copulas to generate synthetic data for ML pipelines, enhancing generalization. In environmental risk modeling, Bayesian ML ensembles with copula-based uncertainty quantification have shown promise for robust groundwater forecasting (Zhu et al., 2023). These applications demonstrate the feasibility of embedding copula-derived dependence structures into predictive frameworks.

Despite these advances, no study in road safety has explicitly fused copula-based dependence with ML-derived feature importance into a unified predictive index. This methodological gap—what we call the copula–ML divide—reflects the tendency of copula research to emphasize dependence without predictive integration, while ML research prioritizes accuracy but neglects joint risk structures. The MRI-Copula framework presented in this study bridges this divide by integrating vine copula tail dependence with CatBoost-SHAP feature importance into a multivariate, interpretable risk index for crash severity.

To situate our contribution, Table 1 compares representative ML-only, copula-only, and hybrid approaches, highlighting their methodological scope, strengths, and limitations.

 

Table 1. Applications of Copula–ML and Related Methods in Transportation Safety Research

| Study | Domain | Method | Key Contribution | Limitation |
|---|---|---|---|---|
| Asadi et al. (2023) | Work-zone crash severity | Self-Paced Ensemble + SHAP | Improved severity classification with interpretable feature attribution | Ignores dependence among risk factors |
| Jamal et al. (2021) | Traffic crash severity | Ensemble ML (RF, XGBoost, SVM) | Demonstrated strong predictive performance across classifiers | Limited interpretability; no dependence modeling |
| Laphrom et al. (2024) | Truck-involved crashes | XGBoost + SHAP with heterogeneity modeling | Captured temporal and unobserved heterogeneity | Still treats predictors as marginal, not joint |
| Zou et al. (2019) | Wildlife–vehicle collisions | Copula-based modeling | Corrected underreporting bias, improved hotspot identification | Descriptive focus; no predictive integration |
| Wang et al. (2019) | Intersection crashes | Multivariate copula temporal model | Jointly modelled severity, crash type, damage, driver error | Complexity limits scalability; predictive capacity underexplored |
| Bermúdez & Karlis (2022) | Insurance claims | Copula-based finite mixture regression | Combined copulas with regression mixtures for joint outcomes | Outside transport; not tested in crash analytics |
| Sun et al. (2019) | Data generation | Vine copula + ML | Used vine copulas for synthetic data to improve ML generalization | Methodological demonstration, not applied to safety data |

 

Comment 4

“The literature review section needs to be strengthened. The following related literature is suggested for review: https://doi.org/10.1016/j.physa.2023.128980, https://doi.org/10.1109/JSEN.2020.3007809, https://doi.org/10.1109/TITS.2025.3559498 .”

Response:
We thank the reviewer for these useful recommendations. All three cited works have been incorporated into the revised Literature Review to broaden the scope of recent ML applications in transportation and infrastructure studies. Specifically, the following integrations were made:

  • Chen et al. (2023): Analyzing Differences of Highway Lane-Changing Behavior Using Vehicle Trajectory Data (Physica A, DOI: 10.1016/j.physa.2023.128980) – example of trajectory-based ML in behavioural analysis.
  • Chen et al. (2020): Sensing Data Supported Traffic Flow Prediction via Denoising Schemes and ANN (IEEE Sensors Journal, DOI: 10.1109/JSEN.2020.3007809) – demonstrates preprocessing-enhanced neural network modeling.
  • Huang et al. (2025): Multi-Perspective Semantic Segmentation of Ground Penetrating Radar Images for Pavement Subsurface Objects (IEEE T-ITS, DOI: 10.1109/TITS.2025.3559498) – applies deep learning to pavement condition monitoring.

These studies illustrate the diversity of ML applications across traffic behaviour, flow prediction, and infrastructure monitoring. The revised paragraph uses these examples to highlight the novelty of the present study: unlike prior works that focus on pattern recognition within individual subdomains, the MRI-Copula framework integrates dependence modeling and explainable ML for crash severity analysis.

Revised Literature Review Excerpt (Section 2.1):

More recently, ML applications in transportation have expanded beyond crash analysis to diverse areas such as lane-changing behaviour modeling (S. Chen et al., 2023), traffic air quality modelling (Suleiman et al., 2020), sensor-driven traffic flow prediction (X. Chen et al., 2020), and pavement subsurface imaging and condition assessment (Huang et al., 2025). These studies illustrate the breadth of data-driven approaches transforming traffic and infrastructure analytics. However, while such models excel at complex pattern recognition, they are often criticized as “black boxes,” lacking the interpretability and causal transparency required for policy-oriented safety analysis (Benfaress et al., 2024).

 

Comment 5

“In the methodology of Section 3, it is mentioned that the categorical variable CatBoost preserves string inputs and converts LightGBM to categorical types. However, the current statement is slightly vague. It is recommended to clearly state whether CatBoost is the only model chosen as the final choice, or is it compared with other models (such as LightGBM)? If a comparison has been made, please briefly explain the comparison results and the basis for choosing CatBoost to enhance the persuasiveness of the methodology.”

Response:
We appreciate this important clarification request. The revised manuscript now explicitly states that CatBoost was used as the primary model for developing and interpreting the MRI-Copula framework, due to its superior handling of categorical features and compatibility with SHAP interpretability (see section 3.3 above).

However, the study also performs comparative analyses against LightGBM, Histogram-based Gradient Boosting (HistGB), and Logistic Regression, as detailed in Section 4.6 (Comparative Model Performance).

This addition clarifies the methodological rationale for model selection and highlights the framework’s generalizability across gradient-boosting methods.

Revised Manuscript (Section 4.6):

4.6 Comparative Model Performance Across Classifiers

Table 6 presents a comparative evaluation of the MRI-Copula framework across four classifiers—Logistic Regression, CatBoost, LightGBM, and Histogram-based Gradient Boosting (HistGB)—under three modeling scenarios: Baseline, With Indices, and With MRI. The Baseline models rely solely on the original crash-related variables, the With Indices models incorporate interpretable sub-indices representing Environmental (ERI), Behavioural (BRI), and Systemic (SRI) risks, while the With MRI models further integrate the α-weighted Multi-Risk Index (MRI), which combines SHAP-derived feature importance with vine-copula-based dependence weights, optimized at α = 0.80.

Table 6. Comparative Performance of Classifiers Integrated with the MRI-Copula Framework

| Model | Scenario | F1 | AUC | Precision | Recall | Accuracy | MCC | Training Time (s) |
|---|---|---|---|---|---|---|---|---|
| Logistic Regression | Baseline | 0.7895 | 0.8995 | 0.8182 | 0.7627 | 0.8636 | 0.6897 | 0.4000 |
| Logistic Regression | With Indices | 0.7652 | 0.8986 | 0.7857 | 0.7458 | 0.8466 | 0.6519 | 0.3830 |
| Logistic Regression | With MRI | 0.7788 | 0.8982 | 0.8148 | 0.7458 | 0.8580 | 0.6759 | 0.3850 |
| CatBoost | Baseline | 0.9027 | 0.9878 | 0.9444 | 0.8644 | 0.9375 | 0.8586 | 32.3240 |
| CatBoost | With Indices | 0.9043 | 0.9862 | 0.9286 | 0.8814 | 0.9375 | 0.8586 | 32.2200 |
| CatBoost | With MRI | 0.9043 | 0.9855 | 0.9286 | 0.8814 | 0.9375 | 0.8586 | 32.7060 |
| LightGBM | Baseline | 0.8696 | 0.9619 | 0.8929 | 0.8475 | 0.9148 | 0.8069 | 0.0660 |
| LightGBM | With Indices | 0.8929 | 0.9635 | 0.9434 | 0.8475 | 0.9318 | 0.8457 | 0.0830 |
| LightGBM | With MRI | 0.8496 | 0.9581 | 0.8889 | 0.8136 | 0.9034 | 0.7803 | 0.0670 |
| HistGB | Baseline | 0.8148 | 0.9502 | 0.8980 | 0.7458 | 0.8864 | 0.7404 | 0.2290 |
| HistGB | With Indices | 0.8224 | 0.9557 | 0.9167 | 0.7458 | 0.8920 | 0.7542 | 0.2160 |
| HistGB | With MRI | 0.8000 | 0.9478 | 0.9130 | 0.7119 | 0.8807 | 0.7281 | 0.2300 |

Across all configurations, the CatBoost + MRI-Copula model achieved the best overall predictive performance (F1 = 0.904; AUC = 0.985), demonstrating superior discrimination of severe crashes while maintaining balanced precision (0.929) and recall (0.881). The marginal differences among the Baseline, With Indices, and With MRI configurations indicate that the introduction of interpretable risk structures and dependence weighting did not compromise accuracy, confirming the robustness of the MRI-Copula integration. In comparison, LightGBM achieved similarly high predictive performance (AUC ≈ 0.96; F1 ≈ 0.87–0.89) but required substantially less training time (approximately 0.07 seconds), underscoring its efficiency and scalability for large-scale or near–real-time deployment. HistGB yielded slightly lower but competitive accuracy (AUC ≈ 0.95; F1 ≈ 0.81–0.82) within a modest runtime of about 0.23 seconds, further confirming the generalizability of the MRI-Copula approach across gradient-boosting frameworks.
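As a reading aid for Table 6, the sketch below shows how the reported metrics relate to one another through a single confusion matrix. The counts (TP = 52, FP = 4, FN = 7, TN = 113, i.e. a 176-crash test set) are not reported in the excerpt; they are an inference, chosen only because they reproduce the CatBoost "With MRI" row to four decimals. The function name `binary_metrics` is hypothetical.

```python
from math import sqrt
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Matthews Correlation Coefficient: uses all four cells of the matrix.
    mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "mcc": mcc}


# ASSUMPTION: inferred counts, not taken from the manuscript.
catboost_mri = binary_metrics(tp=52, fp=4, fn=7, tn=113)
for name, value in catboost_mri.items():
    print(f"{name}: {value:.4f}")
```

AUC and training time cannot be recovered this way, since they depend on the predicted scores and the fitting run rather than on the thresholded counts.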

By contrast, Logistic Regression consistently exhibited the lowest predictive performance (AUC ≈ 0.90; F1 ≈ 0.78), reflecting the inherent limitations of linear models in capturing the nonlinear interactions and complex dependencies among risk indices. This clear contrast between linear and ensemble-based learners highlights the methodological advantage of embedding dependency-aware indices within nonlinear architectures. Moreover, the inclusion of the interpretable ERI, BRI, and SRI sub-indices enhanced transparency by revealing how domain-specific risks contribute to overall crash severity, while the α-weighted MRI provided a unified representation of predictive and dependence-based relationships.

These results confirm that the MRI-Copula framework achieves a balanced integration of predictive accuracy, computational efficiency, and interpretability. CatBoost remains the most analytically robust and interpretable configuration, while LightGBM and HistGB offer practical alternatives for real-time or resource-constrained applications. Together, these findings position the MRI-Copula framework as a reliable, scalable, and theoretically grounded tool for data-driven road-safety management.

 

Revised Manuscript Excerpt (Section 5 – Discussion):

Comparative results across classifiers further confirmed the robustness of the MRI-Copula framework. As presented in Section 4.6, the CatBoost + MRI-Copula configuration achieved the highest predictive performance (AUC = 0.985; F1 = 0.904), demonstrating strong discriminative capability and balanced precision–recall trade-offs. LightGBM and HistGB models achieved comparably high accuracy (AUC ≈ 0.95–0.96; F1 ≈ 0.80–0.89) but required only fractions of a second for training, compared to approximately 32 seconds for CatBoost. This contrast shows an important trade-off between predictive power and computational efficiency, highlighting the framework’s flexibility for different deployment contexts. CatBoost remains preferable when analytical accuracy and interpretability are prioritized, while LightGBM and HistGB provide practical, time-efficient alternatives for real-time or large-scale traffic safety monitoring applications.

 

Comment 6

“The paper lacks a clear discussion of model interpretability and the rationale behind integrating SHAP and Copula. Please elaborate on how this integration enhances model transparency and understanding of crash severity mechanisms.”

Response:
We thank the reviewer for this crucial observation. The revised manuscript now provides a comprehensive explanation of how the integration of SHAP and Vine Copula enhances interpretability.

Specifically, in Section 3.3, we describe that SHAP (SHapley Additive exPlanations) quantifies each variable’s marginal contribution to crash severity prediction—ensuring feature-level transparency. Simultaneously, the Vine Copula captures nonlinear and tail-dependent interactions among risk factors, revealing how environmental and behavioural variables co-escalate under extreme conditions.

These two perspectives are fused through the α-weighted MRI formulation (Equation 4), yielding a composite index that reflects both individual feature effects and joint dependence structures. The optimized weight (α = 0.80) ensures that the MRI-Copula maintains both theoretical soundness and practical interpretability.

In the Discussion (Section 5), we further emphasize that the MRI-Copula introduces domain-relevant sub-indices—Environmental (ERI), Behavioural (BRI), and Systemic (SRI)—whose combined influence is transparently governed by the α-weighting mechanism. This explicit structure allows policymakers to trace the contribution of each risk domain to overall crash severity, thereby enhancing both interpretability and operational relevance.

Revised Manuscript:

Section 5 — Interpretability and Policy Relevance Discussion.

The conditional independence of the risk indices offers significant policy implications. It reveals that systemic or infrastructure-related risks, such as those linked to road geometry, surface quality, and access control, serve as stabilizing elements within the broader crash-risk ecosystem. Well-designed infrastructure can therefore buffer the effects of adverse environmental conditions and risky driving behaviour, preventing these from escalating into severe crashes.

This finding emphasises the complementary roles of long-term infrastructure planning and short-term behavioural interventions in traffic safety management. Behavioural risks (BRI) require continuous enforcement and awareness programs, whereas systemic risks (SRI) demand sustained investment in road design, maintenance, and access management to enhance network resilience. Improvements such as better geometric alignment, lane separation, and roadside protection can mitigate behavioural volatility and make safety outcomes less sensitive to temporary human or environmental disturbances.

The independence of SRI from ERI and BRI thus supports a multi-layered safety strategy: behavioural countermeasures should target immediate crash precursors, while infrastructure design should function as a long-term stabilizer that structurally reduces exposure to compounding risks. This evidence-based distinction provides actionable guidance for policymakers to balance resources effectively among enforcement, education, and engineering measures.

Beyond these policy implications, the MRI-Copula framework itself functions as a scalable decision-support tool for real-world safety applications. By integrating environmental, behavioural, and systemic components within an interpretable α-weighted structure, the model enables data-driven prioritization of high-risk road segments, driver categories, or time periods. The sub-indices (ERI, BRI, SRI) further guide the type of intervention required: behavioural, infrastructural, or environmental. Moreover, the comparable accuracy of LightGBM and HistGB models, coupled with their lower computational cost, allows for real-time deployment in intelligent transportation system (ITS) dashboards. This integration bridges predictive analytics with operational decision-making, supporting adaptive and context-aware traffic safety management.

 

 

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors
  1. The novelty of the proposed MRI-Copula framework is mentioned, but its contribution compared to existing traffic risk prediction models (e.g., deep learning or ensemble-based models) is not clearly positioned. Could the authors explicitly highlight what is fundamentally new in MRI-Copula beyond combining copulas with machine learning?

  2. The description of datasets and variables is limited. Could the authors provide clearer details about data preprocessing, variable definitions, and whether the dataset is balanced across traffic conditions? Without this, it is difficult to assess the generalizability of results.

  3. The framework’s implementation lacks sufficient technical detail. For example, how were hyperparameters for the machine learning models chosen, and how sensitive is MRI-Copula to these choices? Was cross-validation applied?

  4. The results are presented with tables and metrics, but no clear baseline models are included for fair comparison. Could the authors add results against simpler benchmarks (e.g., logistic regression, random forest) to show relative gains more convincingly?

  5. The discussion of limitations is very brief. What are the known weaknesses of MRI-Copula, for example regarding data sparsity, computational costs, or robustness across cities with very different traffic patterns?

  6. The practical contribution for urban traffic management is not fully convincing. Could the authors clarify how urban planners or policy makers could realistically apply MRI-Copula in decision-making, beyond theoretical results?

Author Response

We extend our sincere gratitude to the reviewer for the comprehensive and critical evaluation of the manuscript. The detailed feedback contributed significantly to improving the coherence, technical accuracy, and presentation quality of our study. All responses and textual amendments are provided below, with revisions highlighted in red for clarity. The response is also attached as a file.

Comment 1

“The novelty of the proposed MRI-Copula framework is mentioned, but its contribution compared to existing traffic risk prediction models (e.g., deep learning or ensemble-based models) is not clearly positioned. Could the authors explicitly highlight what is fundamentally new in MRI-Copula beyond combining copulas with machine learning?”

Response:

We thank the reviewer for this perceptive comment. The revised manuscript now explicitly articulates the novel conceptual and methodological contributions of the proposed Multi-Risk Index based on Copula Integration (MRI-Copula) framework relative to existing traffic-risk models.

Conventional crash-severity models typically rely on:

  1. Purely predictive approaches (e.g., deep learning, gradient boosting) that maximize accuracy but lack interpretability; or
  2. Dependence-based statistical models (e.g., copulas) that capture inter-variable relationships but are not optimized for predictive performance.

The MRI-Copula framework bridges this methodological divide through a principled α-weighted fusion of two complementary components:

  • CatBoost–SHAP predictive weights, representing each risk index’s marginal contribution to crash severity; and
  • Vine-copula-based tail-dependence weights, capturing the co-escalation of extreme risk factors.

This produces a composite, interpretable Multi-Risk Index (MRI) that simultaneously preserves predictive strength and structural dependence, a capability not offered by standard ensemble or deep-learning models.

Additionally, Section 4.6 now includes a quantitative benchmark, showing that the MRI-Copula achieves AUC ≈ 0.985 while improving interpretability through explicit Environmental (ERI), Behavioural (BRI), and Systemic (SRI) risk constructs. This combined emphasis on accuracy, dependence awareness, and interpretability establishes MRI-Copula as a distinct analytical paradigm in traffic-safety modeling, beyond conventional hybridization.

Revised Manuscript: (Sections 3.3 and 4.6)

3.3. CatBoost–SHAP and Vine Copula Integration for MRI Construction

To capture both predictive influence and extremal dependence in crash severity modeling, we designed a two-step integration of CatBoost–SHAP feature attribution and Vine copula–based dependence analysis. The process ensured that the Multi-Risk Index (MRI) reflects both how strongly features predict severity and how risk factors co-escalate in extreme crash conditions.

Step 1. Predictive Feature Importance with CatBoost–SHAP

CatBoost (Prokhorenkova et al., 2018), a gradient boosting algorithm robust to categorical variables and missing data, was trained on the crash dataset with hyperparameters (learning rate, depth, number of iterations) optimized using stratified 5-fold cross-validation to maximize AUC.

Feature importance was quantified via SHapley Additive exPlanations (SHAP) (Lundberg & Lee, 2017). For each observation $i$ and feature $j$, SHAP values $\phi_{ij}$ represent the marginal contribution of $x_j$ to the prediction. Global importance was obtained as the mean absolute SHAP value:

$$I_j = \frac{1}{n} \sum_{i=1}^{n} \left| \phi_{ij} \right| \tag{1}$$

The SHAP weights were normalized to sum to 1 across ERI, BRI, and SRI, producing predictive importance scores that highlight which indices most strongly influence crash severity.
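The mean-absolute-SHAP aggregation and the normalization to unit sum can be sketched as follows. This is a minimal illustration, not the authors' implementation: the SHAP matrix is made up, it is assumed to already hold one column per index (ERI, BRI, SRI), and the names `mean_abs_shap` and `normalize` are hypothetical. In practice the matrix would come from a SHAP explainer applied to the trained CatBoost model.

```python
from typing import Dict, List


def mean_abs_shap(shap_values: List[List[float]]) -> List[float]:
    """Mean absolute SHAP value per column (feature) over all rows (observations)."""
    n = len(shap_values)
    n_features = len(shap_values[0])
    return [sum(abs(row[j]) for row in shap_values) / n for j in range(n_features)]


def normalize(weights: List[float]) -> List[float]:
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


# ASSUMPTION: made-up SHAP values; rows = crashes, columns = (ERI, BRI, SRI).
phi = [
    [0.10, -0.30, 0.05],
    [-0.20, 0.40, 0.05],
    [0.30, -0.50, -0.10],
    [0.20, 0.40, 0.00],
]
importance = mean_abs_shap(phi)  # global importance per index
w_shap: Dict[str, float] = dict(zip(["ERI", "BRI", "SRI"], normalize(importance)))
print(w_shap)
```

Taking absolute values before averaging matters: positive and negative contributions would otherwise cancel and understate an index's influence.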

Step 2. Extremal Dependence with Vine Copula

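To capture nonlinear, asymmetric, and joint-tail dependencies, the indices (ERI, BRI, SRI) were first rank-transformed to the unit interval:

$$u_{ik} = \frac{\operatorname{rank}(x_{ik})}{n + 1}, \quad k \in \{\text{ERI}, \text{BRI}, \text{SRI}\}, \tag{2}$$

where $x_{ik}$ is the value of index $k$ for observation $i$ and $n$ is the number of observations.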
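A rank transform to the unit interval can be sketched as below. The sketch assumes the common pseudo-observation convention (average rank divided by n + 1, which keeps values strictly inside (0, 1)); the excerpt does not state which convention or tie-handling rule was used, so both are assumptions, as is the function name `pseudo_observations`.

```python
from typing import List


def pseudo_observations(values: List[float]) -> List[float]:
    """Map values to (0, 1) using average ranks divided by n + 1."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        # Extend the block over tied values so they share one average rank.
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return [r / (n + 1) for r in ranks]


# ASSUMPTION: made-up ERI scores for four crashes, including one tie.
eri = [0.42, 0.10, 0.77, 0.10]
u_eri = pseudo_observations(eri)
print(u_eri)
```

Because only ranks enter the result, the transform removes the marginal scale of each index, which is what allows the copula to describe dependence separately from the marginals.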

 

A regular vine (R-vine) copula (Aas et al., 2009) was then fitted to these uniform marginals, with pair-copula families automatically selected from a candidate set (Clayton, Gumbel, Student-t, Frank, Independence) based on the Akaike Information Criterion (AIC).

The upper tail dependence coefficient,

$$\lambda_U = \lim_{q \to 1^-} P\left(U_2 > q \mid U_1 > q\right) \tag{3}$$

was estimated via Monte Carlo simulation (n = 10,000) and bootstrapped (n = 100) to construct 95% confidence intervals. Mean values across simulations provided copula-derived extremal weights, $w_{\text{Copula}}$.
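The simulation-based estimate described above can be approximated empirically at a finite threshold. The sketch below is an illustration under stated assumptions: it takes two vectors of uniform pseudo-observations (here a made-up grid rather than draws from a fitted R-vine), uses a fixed threshold q = 0.95 as a proxy for the limit, and builds a percentile interval from 100 bootstrap resamples. The function names are hypothetical, and fitting the vine copula itself is outside the sketch.

```python
import random
from typing import List, Sequence, Tuple


def upper_tail_dependence(u: Sequence[float], v: Sequence[float], q: float = 0.95) -> float:
    """Empirical P(V > q | U > q) at a finite threshold q."""
    exceed = [i for i, ui in enumerate(u) if ui > q]
    if not exceed:
        return 0.0  # convention chosen here when no point exceeds q
    return sum(1 for i in exceed if v[i] > q) / len(exceed)


def bootstrap_tail_dependence(u: Sequence[float], v: Sequence[float], q: float = 0.95,
                              n_boot: int = 100, seed: int = 0) -> Tuple[float, float, float]:
    """Mean and approximate 95% percentile interval over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(u)
    estimates: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample pairs with replacement
        estimates.append(upper_tail_dependence([u[i] for i in idx], [v[i] for i in idx], q))
    estimates.sort()
    lower = estimates[int(0.025 * (n_boot - 1))]
    upper = estimates[int(0.975 * (n_boot - 1))]
    return sum(estimates) / n_boot, lower, upper


# ASSUMPTION: a deterministic grid standing in for simulated copula draws.
grid = [(i + 0.5) / 1000 for i in range(1000)]
lam_comonotone = upper_tail_dependence(grid, grid)     # perfectly co-moving pair
lam_counter = upper_tail_dependence(grid, grid[::-1])  # perfectly opposed pair
mean_lam, ci_low, ci_high = bootstrap_tail_dependence(grid, grid)
```

The two extreme cases bracket the possible values: a perfectly co-moving pair always exceeds the threshold jointly, while a perfectly opposed pair never does.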

Step 3. Multi-Risk Index (MRI) Construction

To integrate predictive and dependence perspectives, we applied a convex blend:

$$w_{\text{MRI}} = \alpha \, w_{\text{SHAP}} + (1 - \alpha) \, w_{\text{Copula}}, \tag{4}$$

where α balances predictive accuracy and dependence-awareness. The optimal α was selected via stratified 5-fold cross-validation over a grid search (0.0 – 1.0 in increments of 0.1), with AUC maximization as the criterion. A constraint α ≤ 0.8 ensured that at least 20% weight derived from copula dependence, safeguarding against purely predictive weighting.
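The blending and the constrained grid search can be sketched as follows. Several parts are assumptions made for illustration: the weight values are invented, the MRI score is taken to be a weighted sum of the three sub-indices (the excerpt does not spell out this step), and `toy_score` is a stand-in for the cross-validated AUC that the text uses as the selection criterion. All function names are hypothetical.

```python
from typing import Callable, Dict, List


def blend_weights(w_shap: Dict[str, float], w_copula: Dict[str, float],
                  alpha: float) -> Dict[str, float]:
    """Convex combination: alpha on SHAP weights, (1 - alpha) on copula weights."""
    return {k: alpha * w_shap[k] + (1 - alpha) * w_copula[k] for k in w_shap}


def mri_score(indices: Dict[str, float], weights: Dict[str, float]) -> float:
    """ASSUMPTION: MRI as a weighted sum of the sub-index values."""
    return sum(weights[k] * indices[k] for k in weights)


def alpha_grid(step: float = 0.1, alpha_max: float = 0.8) -> List[float]:
    """Grid 0.0, 0.1, ... capped at alpha_max so the copula share stays >= 1 - alpha_max."""
    n_steps = int(round(alpha_max / step))
    return [round(i * step, 10) for i in range(n_steps + 1)]


def select_alpha(score_fn: Callable[[float], float], grid: List[float]) -> float:
    """Return the grid value with the highest score."""
    return max(grid, key=score_fn)


# ASSUMPTION: made-up weights; each set sums to 1.
w_shap = {"ERI": 0.3, "BRI": 0.6, "SRI": 0.1}
w_copula = {"ERI": 0.5, "BRI": 0.4, "SRI": 0.1}
w_mri = blend_weights(w_shap, w_copula, alpha=0.8)
score = mri_score({"ERI": 0.2, "BRI": 0.9, "SRI": 0.5}, w_mri)


def toy_score(alpha: float) -> float:
    """Stand-in for cross-validated AUC; peaks near alpha = 0.62."""
    return -(alpha - 0.62) ** 2


best_alpha = select_alpha(toy_score, alpha_grid())
```

Because both input weight sets sum to 1, any convex blend of them also sums to 1, so the blended weights remain directly comparable across values of α.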

The final MRI scores were appended as an additional feature to the dataset, and a CatBoost classifier was retrained on this enhanced feature set. Performance was evaluated on a held-out test set using F1, AUC, Precision, Recall, Accuracy, and Matthews Correlation Coefficient (MCC), with 95% confidence intervals estimated via 1,000 bootstrap replications.

For benchmarking, three alternative variants were implemented: (i) MRI-Corr (Spearman correlation in place of copula), (ii) MRI-Interact (explicit CatBoost interaction terms), and (iii) MRI-PCA (first principal component of indices).

 

4.6 Comparative Model Performance Across Classifiers

Table 6 presents a comparative evaluation of the MRI-Copula framework across four classifiers—Logistic Regression, CatBoost, LightGBM, and Histogram-based Gradient Boosting (HistGB)—under three modeling scenarios: Baseline, With Indices, and With MRI. The Baseline models rely solely on the original crash-related variables, the With Indices models incorporate interpretable sub-indices representing Environmental (ERI), Behavioural (BRI), and Systemic (SRI) risks, while the With MRI models further integrate the α-weighted Multi-Risk Index (MRI), which combines SHAP-derived feature importance with vine-copula-based dependence weights, optimized at α = 0.80.

Table 6. Comparative Performance of Classifiers Integrated with the MRI-Copula Framework

| Model | Scenario | F1 | AUC | Precision | Recall | Accuracy | MCC | Training Time (s) |
|---|---|---|---|---|---|---|---|---|
| Logistic Regression | Baseline | 0.7895 | 0.8995 | 0.8182 | 0.7627 | 0.8636 | 0.6897 | 0.4000 |
| Logistic Regression | With Indices | 0.7652 | 0.8986 | 0.7857 | 0.7458 | 0.8466 | 0.6519 | 0.3830 |
| Logistic Regression | With MRI | 0.7788 | 0.8982 | 0.8148 | 0.7458 | 0.8580 | 0.6759 | 0.3850 |
| CatBoost | Baseline | 0.9027 | 0.9878 | 0.9444 | 0.8644 | 0.9375 | 0.8586 | 32.3240 |
| CatBoost | With Indices | 0.9043 | 0.9862 | 0.9286 | 0.8814 | 0.9375 | 0.8586 | 32.2200 |
| CatBoost | With MRI | 0.9043 | 0.9855 | 0.9286 | 0.8814 | 0.9375 | 0.8586 | 32.7060 |
| LightGBM | Baseline | 0.8696 | 0.9619 | 0.8929 | 0.8475 | 0.9148 | 0.8069 | 0.0660 |
| LightGBM | With Indices | 0.8929 | 0.9635 | 0.9434 | 0.8475 | 0.9318 | 0.8457 | 0.0830 |
| LightGBM | With MRI | 0.8496 | 0.9581 | 0.8889 | 0.8136 | 0.9034 | 0.7803 | 0.0670 |
| HistGB | Baseline | 0.8148 | 0.9502 | 0.8980 | 0.7458 | 0.8864 | 0.7404 | 0.2290 |
| HistGB | With Indices | 0.8224 | 0.9557 | 0.9167 | 0.7458 | 0.8920 | 0.7542 | 0.2160 |
| HistGB | With MRI | 0.8000 | 0.9478 | 0.9130 | 0.7119 | 0.8807 | 0.7281 | 0.2300 |

Across all configurations, the CatBoost + MRI-Copula model achieved the best overall predictive performance (F1 = 0.904; AUC = 0.985), demonstrating superior discrimination of severe crashes while maintaining balanced precision (0.929) and recall (0.881). The marginal differences among the Baseline, With Indices, and With MRI configurations indicate that the introduction of interpretable risk structures and dependence weighting did not compromise accuracy, confirming the robustness of the MRI-Copula integration. In comparison, LightGBM achieved similarly high predictive performance (AUC ≈ 0.96; F1 ≈ 0.87–0.89) but required substantially less training time (approximately 0.07 seconds), underscoring its efficiency and scalability for large-scale or near–real-time deployment. HistGB yielded slightly lower but competitive accuracy (AUC ≈ 0.95; F1 ≈ 0.81–0.82) within a modest runtime of about 0.23 seconds, further confirming the generalizability of the MRI-Copula approach across gradient-boosting frameworks.

By contrast, Logistic Regression consistently exhibited the lowest predictive performance (AUC ≈ 0.90; F1 ≈ 0.78), reflecting the inherent limitations of linear models in capturing the nonlinear interactions and complex dependencies among risk indices. This clear contrast between linear and ensemble-based learners highlights the methodological advantage of embedding dependency-aware indices within nonlinear architectures. Moreover, the inclusion of the interpretable ERI, BRI, and SRI sub-indices enhanced transparency by revealing how domain-specific risks contribute to overall crash severity, while the α-weighted MRI provided a unified representation of predictive and dependence-based relationships.

These results confirm that the MRI-Copula framework achieves a balanced integration of predictive accuracy, computational efficiency, and interpretability. CatBoost remains the most analytically robust and interpretable configuration, while LightGBM and HistGB offer practical alternatives for real-time or resource-constrained applications. Together, these findings position the MRI-Copula framework as a reliable, scalable, and theoretically grounded tool for data-driven road-safety management.

Revised Manuscript Excerpt (Section 5 – Discussion):

Comparative results across classifiers further confirmed the robustness of the MRI-Copula framework. As presented in Section 4.6, the CatBoost + MRI-Copula configuration achieved the highest predictive performance (AUC = 0.985; F1 = 0.904), demonstrating strong discriminative capability and balanced precision–recall trade-offs. LightGBM and HistGB models achieved comparably high accuracy (AUC ≈ 0.95–0.96; F1 ≈ 0.80–0.89) but required only fractions of a second for training, compared to approximately 32 seconds for CatBoost. This contrast shows an important trade-off between predictive power and computational efficiency, highlighting the framework’s flexibility for different deployment contexts. CatBoost remains preferable when analytical accuracy and interpretability are prioritized, while LightGBM and HistGB provide practical, time-efficient alternatives for real-time or large-scale traffic safety monitoring applications.

 

Comment 2

“The data preprocessing and feature description are not sufficiently detailed. The authors should clarify the nature, structure, and quality of the dataset, including missing values and class balance.”

Response:

We appreciate the reviewer’s valuable suggestion. Section 3.1 (Data Preprocessing and Feature Engineering) has been substantially revised to clearly describe the dataset and preprocessing pipeline. The revised text now includes:

  1. Dataset structure and coverage: 877 police-reported crashes (2019–2023, Jeddah, Saudi Arabia) with 27 temporal, environmental, infrastructural, vehicle, and driver-related variables.
  2. Target variable: binary Injury Severity (0 = minor, 1 = severe/fatal) with 66.6% minor and 33.4% severe/fatal crashes.
  3. Data imputation: numerical features imputed via median; categorical features via mode (Joel et al., 2024).
  4. Encoding and feature engineering: temporal derivatives (month, weekday, weekend) and categorical encoding optimized for CatBoost and LightGBM.
  5. Data quality verification: no missing values remained in the dataset after imputation and cleaning.

This enhanced description improves transparency, ensuring the dataset’s suitability for both predictive modeling and dependence estimation.

Revised Manuscript Reference: Section 3.1

3.1. Data Preprocessing and Feature Engineering

This study employs a dataset of 877 police-reported crashes from Jeddah, Saudi Arabia (2019–2023), serving as a proof-of-concept case for testing the MRI-Copula framework in an urban setting characterized by rapid motorization and varied crash determinants. Although region-specific, the dataset provides a representative basis for assessing methodological feasibility before scaling to multi-regional applications.

Each record contains 27 attributes spanning temporal, environmental, infrastructural, vehicle, and driver domains, with Injury Severity defined as a binary target: 0 = minor and 1 = severe/fatal. The class distribution (66.6 % minor, 33.4 % severe/fatal) supports stratified binary classification.

Temporal features (month, day of week, and weekend indicator) were derived from crash dates. Categorical variables (e.g., weather, driver gender, road type) were processed natively in CatBoost and encoded for LightGBM, while numerical variables (e.g., age, speed limit) were median-imputed. Categorical missing values were imputed using the mode (Joel et al., 2024). The cleaned dataset contained no missing values and was normalized for downstream modeling.
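The imputation and temporal-feature steps described above can be sketched with the Python standard library. The records and field names (`crash_date`, `age`, `weather`) are hypothetical, and the Friday-Saturday weekend definition is an assumption made for this example rather than something stated in the manuscript.

```python
from datetime import date
from statistics import median, mode

# Hypothetical records; field names are illustrative, not the dataset's columns.
records = [
    {"crash_date": date(2021, 3, 5), "age": 34, "weather": "clear"},
    {"crash_date": date(2022, 7, 16), "age": None, "weather": "dust"},
    {"crash_date": date(2019, 11, 3), "age": 52, "weather": None},
    {"crash_date": date(2023, 1, 9), "age": 27, "weather": "clear"},
]

def impute(rows, field, fill_fn):
    """Fill missing values of `field` with fill_fn applied to the observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = fill_fn(observed)
    for r in rows:
        if r[field] is None:
            r[field] = fill

impute(records, "age", median)    # numerical feature: median imputation
impute(records, "weather", mode)  # categorical feature: mode imputation

# ASSUMPTION: weekend = Friday and Saturday (date.weekday() uses Monday = 0).
WEEKEND_DAYS = {4, 5}
for r in records:
    d = r["crash_date"]
    r["month"] = d.month
    r["weekday"] = d.weekday()
    r["is_weekend"] = int(d.weekday() in WEEKEND_DAYS)

for r in records:
    print(r)
```

In a real pipeline the fill values would be computed on the training split only and then reused on the test split, so that no information leaks from held-out data.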

This preprocessing pipeline ensured data completeness, interpretability, and consistency across classifiers. Future extensions will validate the framework on larger, multi-regional datasets to enhance generalizability and capture cross-regional variations in crash causation.

Comment 3

“The framework’s implementation lacks sufficient technical detail. For example, how were hyperparameters for the machine learning models chosen, and how sensitive is MRI-Copula to these choices? Was cross-validation applied?”

Response:

We thank the reviewer for highlighting this. The revised Section 3.3 now provides explicit technical details on hyperparameter optimization, sensitivity analysis, and validation procedures:

  1. Hyperparameter tuning: Key CatBoost parameters (learning rate, depth, iterations) were optimized using stratified 5-fold cross-validation, maximizing AUC as the selection criterion.
  2. MRI weight sensitivity: The α-weight parameter controlling SHAP–Copula integration was tuned through a grid search (0.0–1.0, step 0.1), constrained to α ≤ 0.8 to preserve at least 20% Copula-derived contribution for interpretability.
  3. Validation robustness: Performance was tested on a held-out test set and further validated through 1,000 bootstrap replications, yielding stable F1, AUC, Precision, Recall, Accuracy, and MCC estimates.
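A minimal sketch of the constrained α grid search in point 2 follows. It assumes the MRI score is a convex combination, α times a SHAP-derived component plus (1 − α) times a Copula-derived component; the exact formula is not given in this response, so `blend`, `tune_alpha`, and the toy scores are hypothetical. AUC is computed directly from pairwise comparisons to keep the example self-contained.

```python
def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def blend(alpha, shap_part, copula_part):
    # ASSUMED convex combination; the manuscript's exact MRI formula may differ.
    return [alpha * s + (1 - alpha) * c for s, c in zip(shap_part, copula_part)]

def tune_alpha(y_true, shap_part, copula_part, alpha_max=0.8):
    grid = [round(0.1 * i, 1) for i in range(11)]   # 0.0, 0.1, ..., 1.0
    allowed = [a for a in grid if a <= alpha_max]   # keeps at least 20% Copula weight
    scored = [(auc(y_true, blend(a, shap_part, copula_part)), a) for a in allowed]
    best_auc, best_alpha = max(scored)              # ties resolve to the larger alpha
    return best_alpha, best_auc

# Toy validation data (illustrative only).
y_val = [0, 0, 1, 1, 0, 1]
shap_scores = [0.2, 0.4, 0.7, 0.5, 0.6, 0.9]
copula_scores = [0.1, 0.3, 0.4, 0.8, 0.2, 0.6]

best_alpha, best_auc = tune_alpha(y_val, shap_scores, copula_scores)
print(f"selected alpha={best_alpha}, validation AUC={best_auc:.3f}")
```

In the setting described above, the AUC for each α would be averaged over the stratified folds rather than computed on a single validation set.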

These methodological clarifications confirm that MRI-Copula is both computationally validated and robust against overfitting.
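The bootstrap validation mentioned in point 3 can be illustrated as follows. This sketch resamples test-set predictions 1,000 times and reports a percentile interval for F1; the use of percentile intervals, the helper names, and the toy predictions are assumptions, since the response does not specify how the bootstrap estimates were summarized.

```python
import random

def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for a metric computed on paired labels."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample test cases with replacement, keeping label/prediction pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi

# Toy test set: every seventh prediction is flipped to simulate errors.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0] * 10
y_pred = [1 - t if i % 7 == 0 else t for i, t in enumerate(y_true)]

point = f1_score(y_true, y_pred)
lo, hi = bootstrap_ci(y_true, y_pred, f1_score)
print(f"F1 = {point:.3f}, 95% bootstrap interval = [{lo:.3f}, {hi:.3f}]")
```

A narrow interval indicates that the metric is stable under resampling of the test set; it does not by itself rule out overfitting to the data source.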

Revised Manuscript: Section 3.3 as shown above.

Comment 4

“The results are presented with tables and metrics, but no clear baseline models are included for fair comparison. Could the authors add results against simpler benchmarks (e.g., logistic regression, random forest) to show relative gains more convincingly?”

Response:

We fully agree and thank the reviewer for this important suggestion. The revised Section 4.6 (Comparative Model Performance Across Classifiers) now includes baseline comparisons with Logistic Regression alongside gradient-boosting models (CatBoost, LightGBM, HistGB).

The expanded Table 6 demonstrates that:

  • Logistic Regression achieved an AUC of 0.899 and an F1 score of 0.789, representing the baseline performance.
  • The CatBoost + MRI-Copula model attained the highest predictive accuracy, with AUC = 0.985 and F1 = 0.904, an absolute gain of about 0.115 in F1 and 0.086 in AUC over the baseline (roughly 15% and 10% in relative terms).
  • LightGBM and HistGB produced comparable accuracy (AUC ≈ 0.96; F1 ≈ 0.87–0.82) with runtimes of a fraction of a second, confirming their computational scalability.
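The gains over the baseline can be read either as absolute differences or as relative changes. A quick check using only the AUC and F1 values reported above:

```python
baseline = {"AUC": 0.899, "F1": 0.789}  # Logistic Regression
hybrid = {"AUC": 0.985, "F1": 0.904}    # CatBoost + MRI-Copula

gains = {}
for name in ("AUC", "F1"):
    absolute = hybrid[name] - baseline[name]
    relative = absolute / baseline[name]
    gains[name] = (absolute, relative)
    print(f"{name}: +{absolute:.3f} absolute, +{relative:.1%} relative")

# AUC: +0.086 absolute, +9.6% relative
# F1: +0.115 absolute, +14.6% relative
```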

These results strengthen the fairness and interpretability of the evaluation by contrasting MRI-Copula performance against both simple statistical and advanced ensemble baselines.

Revised Manuscript:

  • Section 4.6: Comparative Model Performance Across Classifiers (Table 6).
  • Section 5:  Discussion (computational trade-off and interpretability).

Comment 5

“The discussion of limitations is very brief. What are the known weaknesses of MRI-Copula, for example regarding data sparsity, computational costs, or robustness across cities with very different traffic patterns?”

Response:

We appreciate this crucial comment and have now substantially expanded the Discussion to address these limitations explicitly. The revised section identifies four key areas:

  1. Data sparsity and generalizability: The study’s dataset (877 Jeddah crashes) is region-specific and moderately sized. This is now explicitly stated as a limitation, with emphasis on the framework’s role as a proof-of-concept to be validated across larger, multi-city datasets.
  2. Computational costs: Vine Copula and SHAP computations are resource-intensive, particularly for tail-dependence estimation. The runtime analysis (Section 4.6) quantifies this cost: CatBoost + MRI ≈ 32 s versus a fraction of a second for LightGBM/HistGB.
  3. Cross-city robustness: Dependence structures among risk indices vary across contexts (e.g., congestion-heavy vs. high-speed urban areas). Future research will focus on multi-regional calibration to capture this heterogeneity.
  4. Causal interpretability: The framework identifies associative rather than strictly causal relationships (Pearl, 2009), and this distinction is now explicitly acknowledged.

These revisions provide a balanced and transparent discussion of scope, limitations, and directions for future enhancement.

Revised Manuscript: Section 5

Several limitations should be acknowledged; these findings should be interpreted within the scope of the study design. The analysis was based on 877 crash records from Jeddah, Saudi Arabia, which offers valuable insights into a Middle Eastern urban context but may not fully capture the diversity of cultural, infrastructural, and climatic conditions found elsewhere. The relatively modest sample size also places natural limits on statistical power, although this was partly mitigated by cross-validation and bootstrap replication. In addition, vine copula modeling remains computationally intensive, which may constrain its use in real-time or resource-limited settings. Finally, as with most observational crash data, the relationships observed are associative rather than strictly causal (Pearl, 2009).

Future work can address these constraints by scaling the framework to larger, multi-regional datasets to enhance generalizability, integrating geospatial analytics for corridor-level risk mapping (Essa & Sayed, 2019), and incorporating near-miss and exposure data to improve robustness (Hu et al., 2022). Addressing fairness and equity concerns is also essential to prevent algorithmic bias (Raji et al., 2020). Embedding MRI-Copula in human-in-the-loop decision systems could further bridge analytics with practitioner judgment, enabling adaptive and context-aware safety management.

Comment 6

“The practical contribution for urban traffic management is not fully convincing. Could the authors clarify how urban planners or policymakers could realistically apply MRI-Copula in decision-making, beyond theoretical results?”

Response:

We thank the reviewer for this valuable suggestion. The revised Discussion (Section 5) now articulates practical pathways through which MRI-Copula can inform urban safety management and policymaking.

Concrete use cases include:

  1. Data-driven enforcement thresholds: Identifying severity inflection points (e.g., ~100 km/h) for adaptive speed-limit zoning.
  2. Proactive risk prioritization: Ranking high-risk road segments, driver categories, or time windows using MRI scores to guide targeted interventions.
  3. Integration into ITS systems: Embedding MRI-Copula outputs in real-time traffic management dashboards for human-in-the-loop decision-making.
  4. Policy tailoring: Disaggregating risks into Environmental (ERI), Behavioural (BRI), and Systemic (SRI) indices to guide context-specific interventions, such as infrastructure redesign, behavioural campaigns, or environment-adaptive control.
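As a rough illustration of use cases 2 and 4, the sketch below ranks road segments by an MRI score and maps the dominant sub-index to an intervention type. The segment identifiers and index values are invented for the example, and the "largest sub-index decides the intervention" rule is an assumption, not a procedure taken from the manuscript.

```python
# Hypothetical segments with illustrative index values in [0, 1]; not study results.
segments = [
    {"id": "seg-A", "mri": 0.71, "ERI": 0.30, "BRI": 0.85, "SRI": 0.40},
    {"id": "seg-B", "mri": 0.44, "ERI": 0.55, "BRI": 0.35, "SRI": 0.20},
    {"id": "seg-C", "mri": 0.83, "ERI": 0.25, "BRI": 0.50, "SRI": 0.90},
    {"id": "seg-D", "mri": 0.62, "ERI": 0.75, "BRI": 0.45, "SRI": 0.30},
]

# Intervention types follow the three categories named in the response.
INTERVENTION = {
    "ERI": "environment-adaptive control",
    "BRI": "behavioural campaign",
    "SRI": "infrastructure redesign",
}

def prioritize(rows, top_k=2):
    """Return the top_k segments by MRI with their dominant sub-index."""
    ranked = sorted(rows, key=lambda r: r["mri"], reverse=True)[:top_k]
    plan = []
    for r in ranked:
        # ASSUMED rule: the largest sub-index determines the intervention type.
        dominant = max(("ERI", "BRI", "SRI"), key=lambda k: r[k])
        plan.append((r["id"], dominant, INTERVENTION[dominant]))
    return plan

for seg_id, index, action in prioritize(segments):
    print(f"{seg_id}: dominant {index} -> {action}")
```

The same ranking could be applied to driver categories or time windows by replacing the segment identifier with the relevant grouping key.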

The discussion concludes that MRI-Copula provides a scalable, interpretable decision-support tool, linking analytical precision with actionable safety policy.

Revised Manuscript Reference: Section 5

The conditional independence of the risk indices offers significant policy implications. It reveals that systemic or infrastructure-related risks, such as those linked to road geometry, surface quality, and access control, serve as stabilizing elements within the broader crash-risk ecosystem. Well-designed infrastructure can therefore buffer the effects of adverse environmental conditions and risky driving behaviour, preventing these from escalating into severe crashes.

This finding emphasises the complementary roles of long-term infrastructure planning and short-term behavioural interventions in traffic safety management. Behavioural risks (BRI) require continuous enforcement and awareness programs, whereas systemic risks (SRI) demand sustained investment in road design, maintenance, and access management to enhance network resilience. Improvements such as better geometric alignment, lane separation, and roadside protection can mitigate behavioural volatility and make safety outcomes less sensitive to temporary human or environmental disturbances.

The independence of SRI from ERI and BRI thus supports a multi-layered safety strategy: behavioural countermeasures should target immediate crash precursors, while infrastructure design should function as a long-term stabilizer that structurally reduces exposure to compounding risks. This evidence-based distinction provides actionable guidance for policymakers to balance resources effectively among enforcement, education, and engineering measures.

Beyond these policy implications, the MRI-Copula framework itself functions as a scalable decision-support tool for real-world safety applications. By integrating environmental, behavioural, and systemic components within an interpretable α-weighted structure, the model enables data-driven prioritization of high-risk road segments, driver categories, or time periods. The sub-indices (ERI, BRI, SRI) further indicate the type of intervention required: environmental, behavioural, or infrastructural. Moreover, the comparable accuracy of LightGBM and HistGB models, coupled with their lower computational cost, allows for real-time deployment in intelligent transportation system (ITS) dashboards. This integration bridges predictive analytics with operational decision-making, supporting adaptive and context-aware traffic safety management.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Problems have been addressed.

Reviewer 2 Report

Comments and Suggestions for Authors

I have carefully reviewed the revised manuscript and the authors' responses. It is evident that the authors have made considerable effort to provide detailed explanations for each point of contention. After considering the improvements made and the thoroughness of the revisions, I believe that the manuscript has improved significantly and now meets the publication standards.
