A Hybrid Approach to Investigating Factors Associated with Crash Injury Severity: Integrating Interpretable Machine Learning with Logit Model

Wang, Chenxi; Serre, Thierry

doi:10.3390/app151910417


by Chenxi Wang ¹,²,* and Thierry Serre ²

¹ Department of Civil and Industrial Engineering (DICI), University of Pisa, 56122 Pisa, Italy

² Laboratory of Accident Mechanism Analysis (LMA), Université Gustave Eiffel, 13300 Salon-de-Provence, France

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(19), 10417; https://doi.org/10.3390/app151910417

Submission received: 30 July 2025 / Revised: 22 September 2025 / Accepted: 23 September 2025 / Published: 25 September 2025

(This article belongs to the Special Issue Advances in Land, Rail and Maritime Transport and in City Logistics)


Abstract

Understanding the determinants of crash injury severity is essential for developing effective safety strategies and reducing traffic-related losses. This study proposes a hybrid analytical framework that integrates interpretable machine learning with statistical modeling to address the limitations of existing approaches. A Random Forest (RF) classifier, combined with Shapley Additive Explanations (SHAP), was first employed to capture nonlinear relationships and identify key predictors of injury outcomes, including safety equipment, age, gender, and the presence of fixed obstacles. Random Forest was chosen for its strong predictive performance in capturing nonlinear relationships, while SHAP provides transparent explanations of model predictions. To ensure statistical rigor and quantify associations, a Partial Proportional Odds (PPO) model was subsequently applied, allowing for the relaxation of the proportional odds assumption (POA) and enabling the estimation of marginal effects. The results consistently highlight the protective role of safety equipment and the increased risks associated with fixed obstacles, adverse weather, and nighttime conditions. For instance, seatbelt use is associated with a 29.61% higher probability of no injury, whereas fixed obstacles are associated with a 29.36% lower probability of no injury and a higher risk of severe injury. These findings support safety campaigns that encourage protective equipment use and infrastructure policies aimed at reducing roadside obstacles and improving nighttime visibility. Future research will focus on accounting for unobserved heterogeneity and validating the framework across multi-regional datasets to improve its generalizability and policy relevance.

Keywords:

crash injury severity; random forest; Shapley additive explanations; partial proportional odds model

1. Introduction

Road traffic crashes are a threat to the public, leading to approximately 1.19 million deaths and up to 50 million injuries globally each year, according to the World Health Organization (WHO) [1]. In France, 3193 people died on metropolitan roads in 2024, with an additional 239 deaths in overseas regions. Notably, 84% of deadly crashes involved male drivers, and the lack of helmet use among children significantly increased the risk of serious brain injuries even at low-speed impacts (e.g., 10.8 km/h) [2]. These facts highlight how individual characteristics (e.g., gender, protective equipment use) play a critical role in crash outcomes, emphasizing the importance of feature-based analysis for effective safety interventions.

Researchers have investigated factors associated with crashes, including drivers’ characteristics (e.g., age, gender), road conditions (e.g., lighting, layout, wet/dry), driver physical conditions (drug/alcohol, vision), vehicle characteristics, and weather (rainy, snowy) [3,4]. To model the relationship between these variables and injury outcomes, classical statistical methods, particularly the multinomial logit (MNLogit) [5] and ordered probit models [6], have been applied due to their interpretability and ability to estimate marginal effects. However, such models are limited by linear assumptions and cannot capture complex interactions or nonlinear patterns. In recent years, machine learning approaches (MLAs) have gained attention in traffic safety research due to their ability to handle complex, nonlinear relationships and large datasets. Among them, ensemble models such as Random Forest (RF) [7] and Extreme Gradient Boosting (XGBoost) [8] have demonstrated strong performance in crash severity classification. However, their black-box nature limits their applicability in policy contexts where interpretability is essential.

To address these gaps, this study proposes a hybrid analytical framework that integrates: (i) a machine learning model combined with interpretable methods to provide explainable predictions and (ii) a logit-based module to statistically validate and quantify the associations of key features. The remainder of this paper is structured as follows: Section 2 reviews the related literature. Section 3 presents the dataset in detail. Section 4 describes the proposed framework and methodology. Section 5 reports the experimental results. Finally, Section 6 concludes the study.

2. Literature Review

Numerous studies have explored how individual characteristics, road environments, and vehicle-related factors contribute to different injury outcomes. However, while a variety of modeling approaches have been proposed, several limitations remain. This section reviews the existing literature in two key dimensions: modeling techniques and interpretability, to highlight the research gap addressed by this study.

2.1. Traditional Statistical Modeling Approaches

Classical statistical models, such as logit, probit, and logistic regression models, have been widely adopted in crash severity analysis due to their interpretability. These models allow for the estimation of marginal effects and examination of the correlations between explanatory variables (e.g., age, gender, weather, lighting conditions) and crash severity [9]. Classical models in crash severity analysis generally fall into two types: unordered and ordered categorical models.

Unordered categorical models

MNLogit is often applied for crash severity analysis due to its flexibility in handling multiple unordered outcome categories, without requiring the proportional odds assumption (POA). Chen et al. [5] applied a MNLogit model to examine how driver behavior, vehicle type, and environmental factors influence crash injury severity. Injury outcomes were classified into four unordered levels. The model identified several significant predictors of severity, including the driver’s physical condition, vehicle type (motorcycle and heavy truck), pedestrian age (26–65 and over 65), light condition and wet road surface. However, the model assumes the independence of irrelevant alternatives (IIA), an assumption that may be violated in practice, as severity levels in crash data are often correlated.

To address the limitations of the IIA assumption inherent in MNLogit, the nested logit model introduces a hierarchical structure by grouping alternatives into nests (e.g., grouping severe and fatal together). This allows for correlations among related outcome categories and improves the model’s flexibility in capturing real-world decision processes. For example, Razi-Ardakani et al. [10] used this model to analyze the influence of driver distraction on crash types in single-vehicle and two-vehicle crashes, effectively accounting for unobserved correlations in the error terms. The analysis revealed more accurately how various distraction factors (such as cellphone usage, cognitive distractions, and distractions from passengers or outside events) are associated with the probability of different crash types, including run-off-road crashes, collisions with fixed objects, and rear-end crashes. Nonetheless, the nesting structure must be pre-specified, and incorrect nesting may lead to biased results.

Ordered categorical models

When the outcome exhibits a natural ordering, like crash severity levels, models like the ordered probit and ordered logit regression (OLR) are preferred. The ordered probit model exploits the ordinal structure of the dependent variable to produce more efficient estimates. However, it assumes that the effect of explanatory variables is constant across all category thresholds, which may oversimplify relationships in practice. Similarly, OLR takes advantage of the rank order of categories to yield parsimonious and interpretable results. For example, Ma et al. [11] employed an OLR model to examine the statistical characteristics of hazardous crashes. They identified ten significant risk factors, such as traffic violations, unsafe driving behaviors, and vehicle defects, and proposed three enforcement countermeasures. Nonetheless, like the ordered probit, the OLR model also relies on the POA, which may limit model flexibility when the assumption is violated. In such cases, more flexible alternatives such as the generalized ordered logit (GOL) model, partial proportional odds (PPO) models, or even MNLogit/nested logit models are commonly employed to relax the assumption and provide more robust estimates.

However, in practice, GOL often suffers from non-convergence due to the large number of parameters, the imbalanced distribution of severity categories, and multicollinearity among predictors. Even when convergence is achieved, the model may fail to guarantee the monotonic order of severity levels, which undermines interpretability. The increased complexity also reduces its applicability in policy-oriented studies. To overcome these shortcomings, the PPO model has been proposed as a more flexible alternative. PPO allows some variables to satisfy the POA while permitting others to vary across thresholds, thereby balancing interpretability with flexibility. For example, Li and Fan [12] demonstrated that combining latent class clustering with PPO in truck crash severity analysis yielded more robust estimates and clearer insights into the impacts of roadway, driver, and environmental factors.

Despite their strengths, all these models have limitations: they assume linear relationships between variables and outcomes, and are less effective in capturing complex, nonlinear patterns that often characterize crash data.

2.2. Machine Learning Approaches

In response to the limitations of traditional statistical models, MLAs have been increasingly adopted for crash injury modeling. MLAs, including decision trees (DT) [13], RF [14], support vector machines (SVM) [15,16], and gradient boosting machines (GBMs) [17], have demonstrated superior performance in classifying injury severity. Their ability to learn complex patterns makes them particularly well-suited for crash severity analysis. For instance, DT provides a clear, rule-based representation, while SVMs excel at identifying optimal hyperplanes for classification in high-dimensional spaces. GBMs, through their iterative boosting approach, can achieve high predictive accuracy by sequentially correcting errors of previous models. Recent studies have demonstrated the effectiveness of these models in various contexts. For example, Champahom et al. [17] proposed a framework to develop a spatial-temporal crash risk map using ensemble decision tree models (including DT, RF, and XGBoost) to prioritize high-risk segments. The study found that the XGBoost model performed better in explaining and interpreting the factors associated with crash frequency, while the RF model was better for detecting trends and predicting crash frequency. Similarly, Dimitrijevic et al. [15] developed a hybrid framework for predicting the severity of work-zone crashes by incorporating optimization strategies into SVM models. They introduced two variants: a genetic algorithm-optimized SVM and a greedy-search-optimized SVM. Their findings indicated that the genetic algorithm-optimized SVM achieved the highest prediction accuracy, demonstrating the potential of combining MLAs with metaheuristic optimization techniques to enhance predictive performance in safety-critical applications.

However, a significant limitation of most MLAs is their lack of explainability. While they can achieve accurate predictions, they often lack transparency, making it difficult to understand how a given prediction was reached. To address this interpretability issue, recent research has introduced methods that explain MLA outputs. One of the most popular approaches is Shapley Additive Explanations (SHAP), which assigns an importance value to each feature for a particular prediction, representing its contribution to the prediction compared to the average prediction. This allows for both global and local interpretability. Sum et al. [14] and Sun et al. [18] combined the RF with the SHAP method to gain a deeper understanding of the determinants of crash severity in different scenarios. By utilizing the SHAP tool, they effectively explained the prediction results of the RF, revealing how key factors such as roadway characteristics, environmental conditions, and vehicle involvement are associated with crash severity. Hasan et al. [19] employed various MLAs with SHAP values to predict injury severity. Their results show that XGBoost outperforms RF, CatBoost, and LightGBM models in predicting crash severity. Partial dependence plots from SHAP values indicate a higher likelihood of injury crashes due to speeding in clear weather, and more injury crashes from multi-vehicle collisions at intersections. Such findings can assist policymakers in implementing appropriate countermeasures to enhance driver safety.

In addition to these studies, recent research has also explored hybrid approaches that integrate interpretable MLAs with statistical models. For instance, Samerei and Aghabayk [20] investigated the transition from two-vehicle crashes to chain reaction crashes using a framework that combines latent class clustering, random parameter logit models, and SHAP analysis. Their work highlights the value of addressing unobserved heterogeneity and using SHAP to interpret variable effects across different clusters. In addition, Jafari and Persaud [21] proposed a hybrid Structural Equation Modeling–Artificial Neural Network (SEM–ANN) approach to investigate the association between road, weather, and socioeconomic factors and crash frequency and severity. Their framework first applied SEM to capture complex and moderating linear relationships, followed by ANN to identify key nonlinear predictors of crash intensity. These studies demonstrate the advantages of combining interpretable statistical modeling with machine learning to provide complementary explanations of crash mechanisms.

3. Data Source

The dataset used in this study includes traffic crashes in France from 2019 to 2023, sourced from the official open data platform of the Observatoire national interministériel de la sécurité routière (ONISR) [22]. The original dataset consists of four files: crash location (e.g., number of lanes, slope, road surface condition), crash characteristics (e.g., time of crash, lighting conditions, weather), vehicle characteristics (e.g., vehicle type, fixed obstacle, mobile obstacle, maneuver before collision), and user characteristics (e.g., gender, safety equipment usage, age). All the datasets and their descriptions can be found on the ONISR website [22].

To reduce regional heterogeneity, the analysis was restricted to the Bouches-du-Rhône area, one of the most comprehensively recorded regions in terms of traffic crashes. This area provides a representative dataset for targeted investigation. Moreover, the study focused on the most frequent crash scenario, defined by catr = 4 (municipal roads) and catv = 7 (light vehicles). This restriction allowed the analysis to concentrate on high-frequency and policy-relevant collision types, thereby enhancing both the robustness and interpretability of the results. In this dataset, 0.72% of crashes were fatal, 10.64% involved hospitalized injuries, 43.9% resulted in minor injuries, and 44.8% caused no injury. To balance outcome categories, fatal and hospitalized-injury crashes were merged into a single severe category, which facilitates the identification of factors associated with the most serious injury outcomes in the subsequent modeling.
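The scenario restriction and the category merge described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: `catr` and `catv` are the variable names given in the text, while the `severity` field, its string labels, and the sample records are hypothetical.

```python
# Hypothetical records: "catr" and "catv" follow the variable names in the text;
# the "severity" field and its string labels are illustrative assumptions.
records = [
    {"catr": 4, "catv": 7, "severity": "fatal"},
    {"catr": 4, "catv": 7, "severity": "hospitalized"},
    {"catr": 4, "catv": 7, "severity": "minor"},
    {"catr": 3, "catv": 7, "severity": "none"},    # other road category, dropped
    {"catr": 4, "catv": 10, "severity": "minor"},  # other vehicle category, dropped
    {"catr": 4, "catv": 7, "severity": "none"},
]

# Fatal and hospitalized outcomes collapse into one "severe" class.
MERGED_CLASS = {
    "fatal": "severe",
    "hospitalized": "severe",
    "minor": "minor",
    "none": "none",
}


def select_and_merge(rows, road_category=4, vehicle_category=7):
    """Keep the studied scenario and return the merged three-class labels."""
    kept = [
        r for r in rows
        if r["catr"] == road_category and r["catv"] == vehicle_category
    ]
    return [MERGED_CLASS[r["severity"]] for r in kept]


labels = select_and_merge(records)
print(labels)

# Share of the merged severe class implied by the reported percentages.
severe_share = round(0.72 + 10.64, 2)
print(severe_share)
```

With the reported shares, the merged severe class accounts for about 11.36% of the selected crashes, which is still a minority class but far less sparse than the fatal category alone.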

The original dataset contained 67 variables, many of which are not directly relevant to the research objectives. A systematic feature selection process was therefore implemented. Variables with excessive missingness or weak relevance were excluded first. Next, collinearity diagnostics were conducted to eliminate highly correlated features and reduce multicollinearity. In addition, variance filtering and univariate statistical tests (e.g., chi-square tests and analysis of variance) were applied to further refine the feature set, retaining only those variables significantly associated with crash severity. Finally, 10 key variables were selected for subsequent statistical modeling and analysis. Table 1 and Table 2 summarize the descriptive statistics of categorical and numerical variables, respectively. Among categorical features, males accounted for the majority of road users involved in crashes (62.6% of severe-injury cases and 70.3% of non-injury cases). Most crashes occurred without a fixed obstacle (over 90% across all severity levels). Seatbelt use was the most common safety equipment, yet a considerable proportion of severe crashes still involved unprotected users (38.5%). Only a small fraction of crashes involved a moving obstacle (approximately 11.8% of severe cases). Regarding environmental factors, the majority of crashes occurred during daylight (over 60% across severity groups), whereas dusk/dawn and nighttime crashes accounted for smaller shares. Road alignment was predominantly straight (over 85%), and most crashes took place on flat rather than sloped roads. The majority occurred under normal weather conditions, while adverse weather (rain or snow) represented fewer than 10% of crashes.

Regarding numerical features (Table 2), the average number of lanes (nbv) was approximately 2.7, with severe crashes occurring slightly more often on wider roads (mean = 3.02). The mean age of crash-involved users was around 40 years, with injured users being slightly younger than those in non-injury crashes. Overall, these descriptive statistics provide a comprehensive overview of the dataset composition and suggest that both roadway and user characteristics play an important role in shaping crash severity outcomes.

4. Methodology

4.1. Framework

This study proposes a hybrid modeling framework that integrates interpretable machine learning with statistical methods to investigate the predictors of crash injury severity. First, tree-based models are applied to the dataset, and feature importance is computed with SHAP. Then, logit-based models are employed to validate the results of the Random Forest and provide more reliable statistical evidence. The framework rests on the assumption that crash records are independently and identically distributed and representative of the study area.

4.2. Random Forest

RF is a widely used ensemble learning algorithm valued for its robustness and ability to reduce overfitting. It effectively handles high-dimensional and imbalanced data, making it suitable for complex crash injury severity prediction. The algorithm is built upon two key principles: bootstrap sampling of the data and random feature selection at each node split. For the multi-class task of predicting injury severity in traffic crashes, the dataset $(X_{i}, Y_{i})$ is composed of feature vectors $X_{i}$ and injury severity levels $Y_{i}$. The RF algorithm comprises three main components: data sampling, node splitting with random feature selection, and ensemble voting.

First, bootstrap sampling with replacement is performed on the original dataset to generate training subsets $D_{b} = \{(X_{i}, Y_{i})\}$, $i = 1, 2, \dots, N$, that are used to construct $B$ decision trees. Second, at each node of a tree, a random subset of $K$ features $\{f_{1}, f_{2}, \dots, f_{K}\} \subseteq F$ is selected. Based on the Gini index, the optimal feature for splitting is chosen from this subset. For a given node $t$, the Gini index is defined as:

$$G(t) = 1 - \sum_{j = 1}^{C} p_{j}^{2}$$

(1)

where $C = 3$ is the number of classes, and $p_{j}$ is the proportion of samples in node $t$ that belong to class $j$. By minimizing node impurity, the Gini criterion ensures that samples in the resulting child nodes are as homogeneous as possible. Finally, RF combines the predictions of all trees to determine the final classification. Each tree $T_{b}$ predicts an output class $\hat{Y}_{b}$ for an input sample $x$. The total votes for each class $c$ are computed as:

$$V_{c} = \sum_{b = 1}^{B} I(\hat{Y}_{b} = c)$$

(2)

where the indicator function $I(\cdot)$ returns 1 if the condition is true and 0 otherwise. The final predicted class is determined by majority voting (Equation (3)), corresponding to minor injury, no injury or severe injury.

$$\hat{Y} = \arg\max_{c \in \{0, 1, 2\}} V_{c}$$

(3)
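As a small worked illustration of the Gini index and the majority vote defined above, the sketch below evaluates both on toy inputs. The function names and the example labels are hypothetical, and the tie-breaking rule (smallest class label wins) is an assumption made only to keep the example deterministic.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority_vote(tree_predictions):
    """Return the class with the most votes; ties go to the smallest label."""
    votes = Counter(tree_predictions)
    return max(sorted(votes), key=votes.get)


mixed_node = [0, 0, 1, 2]        # class proportions 0.5, 0.25, 0.25
pure_node = [1, 1, 1]            # a single class, so impurity is zero
tree_outputs = [2, 0, 2, 1, 2]   # predictions of five hypothetical trees

print(gini(mixed_node))          # 0.625
print(gini(pure_node))           # 0.0
print(majority_vote(tree_outputs))  # 2
```

For the mixed node, the index is 1 − (0.5² + 0.25² + 0.25²) = 0.625, while a perfectly homogeneous node scores 0, which is the state that each split tries to approach.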

To ensure robust model performance, the RF classifier was optimized through a nested cross-validation procedure, where a grid search was applied in the inner folds for hyperparameter selection and the outer folds provided unbiased performance estimates. Several hyperparameters were systematically tuned. The number of trees (n_estimators) was explored in the range of 100 to 500, with 300 trees ultimately selected to achieve a balance between predictive stability and computational efficiency. The maximum tree depth (max_depth) was tuned between unrestricted growth and fixed depths of 10, 20, and 30, with 20 yielding the best trade-off between capturing nonlinear interactions and preventing overfitting. For node-splitting criteria, the minimum number of samples required to split an internal node (min_samples_split) was set to 5, while the minimum number of samples at a terminal leaf (min_samples_leaf) was set to 2, both of which were determined to reduce variance without overly constraining tree growth. The number of features considered at each split (max_features) was tested using sqrt and log2 rules, with sqrt selected as it consistently produced better generalization across folds. To mitigate severe class imbalance in the dataset, the class_weight parameter was specified as balanced, which assigns weights inversely proportional to class frequencies. A fixed random seed (random_state = 42) was applied to ensure reproducibility. Final hyperparameter choices were determined based on a combination of all the evaluation metrics.
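To make the "inversely proportional to class frequencies" weighting concrete, the sketch below computes weights with the formula n_samples / (n_classes × class count), which is how scikit-learn documents its `balanced` option. The label counts are hypothetical and only roughly mimic the class shares reported for this dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical sample of 100 crashes (not the study data).
toy_labels = ["none"] * 45 + ["minor"] * 44 + ["severe"] * 11
weights = balanced_class_weights(toy_labels)

for cls, weight in weights.items():
    print(cls, round(weight, 3))
```

Under this scheme the rare severe class receives a weight of about 3.03 against roughly 0.74 to 0.76 for the two majority classes, and the weights summed over all samples equal the sample size, so the overall scale of the loss is preserved.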

4.3. Shapley Additive Explanations

SHAP is a method for interpreting machine learning model outputs. It is based on cooperative game theory and quantifies the marginal contribution of each feature to the prediction results. However, calculating SHAP values by considering all possible feature combinations is computationally intensive, because the number of combinations grows exponentially with the number of features. To address this, TreeSHAP, an optimized approach designed for tree-based models, improves computational efficiency by exploiting the structure of the trees. In a decision tree, predictions are determined by traversing paths from the root node to leaf nodes, and TreeSHAP calculates the marginal contributions of features along these paths. For a tree-based model $\hat{f}$, given the input features $X = \{x_{1}, x_{2}, \dots, x_{n}\}$, the SHAP value of feature $x_{i}$, denoted as $\phi_{i}$, is computed using Equation (4).

$$\phi_{i} = \sum_{j \in paths} P_{j} \cdot [f_{j}(X \cup \{x_{i}\}) - f_{j}(X)]$$

(4)

The algorithm initializes the path weights and adjusts them based on the encountered nodes. For each path $j$, it calculates the difference between the predicted values with and without the feature $x_{i}$. Specifically, $f_{j}(X \cup \{x_{i}\})$ represents the predicted value at a node including $x_{i}$, while $f_{j}(X)$ represents the predicted value excluding $x_{i}$. The contribution of the feature $x_{i}$ to the output of path $j$ is weighted by $P_{j}$, the probability or importance of the path, and summed over all paths.

The aggregated weighted contributions across all paths yield $\phi_{i}$, the SHAP value of $x_{i}$, which represents the average marginal contribution of the feature to the model’s prediction. TreeSHAP-based feature importance and visualizations enable the identification of the most influential features and their main effects on crash severity predictions.
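The idea of an average marginal contribution can be illustrated with a brute-force Shapley computation on a toy model. The sketch below is not TreeSHAP: it enumerates every feature subset, which is exactly the exponential cost that TreeSHAP avoids. The toy model, the feature meanings, the background rows, and the choice to replace "absent" features with background values are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def toy_model(z):
    """Hypothetical severity score: z = [fixed obstacle, no seatbelt, night]."""
    return 0.1 + 0.5 * z[0] + 0.2 * z[1] * z[2]


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating all feature subsets."""
    n = len(x)

    def value(subset):
        # Mean prediction when features in `subset` take the values of x
        # and the remaining features are taken from each background row.
        total = 0.0
        for row in background:
            z = [x[i] if i in subset else row[i] for i in range(n)]
            total += model(z)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi, value(set())


background = [[0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]]
x = [1, 1, 1]
phi, base_value = shapley_values(toy_model, x, background)
print(phi, base_value)
```

The values satisfy the additivity property that gives SHAP its name: the contributions sum to the gap between the prediction for `x` and the average prediction over the background rows. Here the purely additive first feature receives 0.25, and the two interacting features share their joint effect equally.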

4.4. Statistical Models

To complement the machine learning analysis and provide formal statistical inference, we employed several logistic regression approaches suitable for modeling crash injury severity outcomes. The choice among these models depends on the theoretical assumptions about the outcome structure and empirical validation of model assumptions.

The MNLogit model treats injury severity categories as unordered outcomes, making no assumptions about the ordinal relationship between severity levels. For outcome categories $j = 1, 2, \dots, J$, the probability of injury category $j$ given predictors $X$ is expressed as:

$$P(Y = j \mid X) = \frac{\exp(X \beta_{j})}{\sum_{k = 1}^{J} \exp(X \beta_{k})}$$

(5)

where $\beta_{j}$ denotes the coefficient vector of category $j$, and $J$ is the total number of outcome categories. The model relies on the independence of irrelevant alternatives (IIA) assumption, which can be tested using Hausman–McFadden or Small–Hsiao tests.
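A minimal numerical sketch of the MNLogit probability follows. The coefficient values are hypothetical, and fixing the first category's coefficients at zero is a common identification convention assumed here, not something stated in the text.

```python
import math


def mnlogit_probs(x, betas):
    """Category probabilities exp(x.beta_j) / sum_k exp(x.beta_k)."""
    scores = [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# x = [intercept, seatbelt indicator]; one coefficient vector per category.
x = [1.0, 1.0]
betas = [
    [0.0, 0.0],    # no injury (reference category, assumed)
    [0.2, -0.3],   # minor injury (hypothetical values)
    [-1.0, -0.8],  # severe injury (hypothetical values)
]

probs = mnlogit_probs(x, betas)
print([round(p, 3) for p in probs])
```

Because each category has its own coefficient vector, nothing in this specification forces the categories to behave as an ordered scale, which is why the ordered alternatives are considered next.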

Given the natural ordering of injury severity categories (no injury < minor injury < severe injury), the ordered logit model provides a more parsimonious specification. This model assumes proportional odds, meaning that the effect of explanatory variables is constant across all thresholds of the ordered response. The cumulative logit is expressed as:

$$\log\left(\frac{P(Y \leq j)}{P(Y > j)}\right) = \alpha_{j} - X\beta$$

(6)

where $\alpha_{j}$ represents the cut-off thresholds, and $\beta$ is a vector of coefficients common across all thresholds. The POA can be tested using the Brant test.
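The cumulative logit can be turned into category probabilities by differencing adjacent cumulative probabilities. The sketch below assumes three ordered categories (no injury, minor, severe), two hypothetical thresholds, and a single hypothetical coefficient for a fixed-obstacle indicator.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(x, beta, alphas):
    """Category probabilities from logit P(Y <= j) = alpha_j - x.beta."""
    eta = sum(b * v for b, v in zip(beta, x))
    cumulative = [sigmoid(a - eta) for a in alphas] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


alphas = [-0.2, 2.0]  # hypothetical thresholds (must be increasing)
beta = [0.9]          # hypothetical effect of a fixed obstacle

p_without = ordered_logit_probs([0.0], beta, alphas)
p_with = ordered_logit_probs([1.0], beta, alphas)
print([round(p, 3) for p in p_without])
print([round(p, 3) for p in p_with])
```

A positive coefficient lowers every cumulative probability by the same amount on the logit scale, shifting mass toward the more severe categories. That identical shift at each threshold is what the proportional odds assumption means in practice.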

When the POA is violated, the PPO model provides a flexible alternative. This model allows some predictors to vary across thresholds while others remain constant, providing a compromise between the flexibility of multinomial logit and the parsimony of ordered logit. The cumulative logit function is expressed as:

$$\log\left(\frac{P(Y \leq j)}{P(Y > j)}\right) = \alpha_{j} - X\beta - Z\gamma_{j}$$

(7)

where $X$ represents variables satisfying the proportional odds assumption with coefficient vector $\beta$, and $Z$ represents variables with threshold-specific effects $\gamma_{j}$. The assignment of variables to $X$ or $Z$ is determined based on formal statistical tests of the proportional odds assumption.
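The following sketch evaluates the PPO specification for one observation. All coefficient values are hypothetical: `beta` applies to a variable assumed to satisfy proportional odds, while `gammas` holds one coefficient vector per threshold for a variable assumed to violate it. The monotonicity check reflects the fact that threshold-specific effects can, in principle, produce cumulative probabilities that are not increasing.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(coefs, values):
    return sum(c * v for c, v in zip(coefs, values))


def ppo_probs(x, z, beta, gammas, alphas):
    """Category probabilities from logit P(Y <= j) = alpha_j - x.beta - z.gamma_j."""
    cumulative = [
        sigmoid(alpha - dot(beta, x) - dot(gamma, z))
        for alpha, gamma in zip(alphas, gammas)
    ]
    if any(later < earlier for earlier, later in zip(cumulative, cumulative[1:])):
        raise ValueError("cumulative probabilities are not monotone")
    cumulative.append(1.0)
    return [cumulative[0]] + [
        cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))
    ]


alphas = [-0.2, 2.0]      # hypothetical thresholds
beta = [0.3]              # proportional effect (e.g., nighttime)
gammas = [[0.5], [1.4]]   # threshold-specific effects (e.g., fixed obstacle)

probs = ppo_probs([1.0], [1.0], beta, gammas, alphas)
print([round(p, 3) for p in probs])

# With identical gammas at every threshold the model reduces to the ordered logit.
reduced = ppo_probs([1.0], [1.0], beta, [[0.5], [0.5]], alphas)
print([round(p, 3) for p in reduced])
```

In this toy setting the larger coefficient at the second threshold concentrates the obstacle effect on the severe category, a pattern that a single proportional coefficient could not express.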

Model selection follows a systematic approach: (1) test the IIA assumption for MNLogit; (2) test the POA for ordered models; (3) select the most appropriate specification based on assumption validity and model fit criteria, including AIC and BIC. Average marginal effects are computed for the final model to facilitate interpretation of covariate associations with injury severity probabilities.
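For reference, the fit criteria and the average marginal effect mentioned above can be written in their standard textbook forms. The notation is an assumption introduced here: $k$ is the number of estimated parameters, $N$ the number of observations, $\hat{L}$ the maximized likelihood, and $d$ a binary covariate such as seatbelt use.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
\mathrm{BIC} = k\ln N - 2\ln\hat{L}

\mathrm{AME}_{j}(d) = \frac{1}{N}\sum_{i=1}^{N}
  \left[ P(Y_{i} = j \mid x_{i}, d_{i} = 1) - P(Y_{i} = j \mid x_{i}, d_{i} = 0) \right]
```

Lower AIC or BIC values indicate a better balance between fit and complexity, with BIC penalizing additional parameters more heavily in large samples. Because the category probabilities sum to one, the average marginal effects of a covariate sum to zero across the severity categories.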

4.5. Evaluation Metrics

Since crash injury severity is modeled as a three-class classification problem, standard multi-class evaluation metrics are employed. These include Accuracy, Precision, Recall, F1-score (per class, Macro, Micro, Weighted), Balanced Accuracy, Area Under the Receiver Operating Characteristic Curve (ROC-AUC), Area Under the Precision–Recall Curve (PR-AUC), and Confusion Matrices. For multi-class metrics such as ROC-AUC and PR-AUC, the One-vs-Rest (OvR) strategy is adopted: each class is treated in turn as the positive class, with all others as negative. Subsequently, Macro-averaging and Micro-averaging are reported to provide a class-balanced and frequency-weighted evaluation, respectively. Macro-F1 and Balanced Accuracy further emphasize fairness across classes, while PR-AUC is particularly informative for rare but safety-critical outcomes such as severe injuries. Finally, confusion matrices are reported per class to provide an intuitive view of classification errors across severity levels. Table 3 summarizes the definitions and formulas.
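The threshold-free metrics aside, most of the listed quantities follow directly from a confusion matrix. The sketch below uses the standard definitions with a hypothetical three-class matrix whose rows are true classes and whose columns are predicted classes; the counts are invented for illustration.

```python
def summarize(cm):
    """Accuracy, macro-F1, micro-F1 and balanced accuracy from a confusion matrix.

    cm[i][j] counts samples of true class i that were predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    f1_scores, recalls = [], []
    tp_sum = fp_sum = fn_sum = 0
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(k)) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    return {
        "accuracy": tp_sum / total,
        "macro_f1": sum(f1_scores) / k,
        "micro_f1": 2 * tp_sum / (2 * tp_sum + fp_sum + fn_sum),
        "balanced_accuracy": sum(recalls) / k,
    }


# Hypothetical counts: rows/columns ordered as no injury, minor, severe.
confusion = [
    [50, 10, 0],
    [8, 30, 2],
    [1, 4, 5],
]
metrics = summarize(confusion)
for name, value in metrics.items():
    print(name, round(value, 3))
```

In single-label multi-class problems, micro-F1 coincides with accuracy, whereas macro-F1 and balanced accuracy weight each class equally and therefore fall when the minority class is poorly recognized.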

5. Results and Discussion

5.1. Model Results

Categorical variables were dummy encoded to avoid imposing artificial ordinal relationships, while numerical variables were standardized to ensure comparability of SHAP values across different scales. The final results are reported as the mean ± standard deviation (or 95% confidence interval) across the five outer test folds.
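The two preprocessing steps can be sketched as follows. The category labels and ages are hypothetical, the `drop_first` switch is an illustrative option (the text does not say whether a reference level was dropped), and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def dummy_encode(values, drop_first=False):
    """Return the kept levels and one 0/1 indicator column per kept level."""
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


def standardize(values):
    """Z-scores: subtract the mean and divide by the population std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


lighting = ["day", "night", "day", "dusk"]  # hypothetical lighting labels
levels, encoded = dummy_encode(lighting)
print(levels)
print(encoded)

ages = [20, 40, 60]                         # hypothetical ages
z_scores = standardize(ages)
print([round(z, 3) for z in z_scores])
```

Indicator columns keep the model from reading an order into arbitrary category codes, and z-scores put numerical features such as age and number of lanes on a common scale.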

5.1.1. Validation of Injury Severity Category Merging

Given the severe class imbalance in the original four-category classification, a comparative analysis was conducted to justify merging the fatal and hospitalized-injury categories. Using 5-fold cross-validation, both four-class (no injury, minor, hospitalized, fatal) and three-class (no injury, minor, hospitalized and fatal combined) classification schemes were evaluated. As shown in Table 4, results from the four-class model revealed that the fatal category suffered from extremely poor performance due to insufficient sample size, with most fatal cases misclassified. In contrast, the three-class model demonstrated consistent and statistically robust improvements: balanced accuracy increased from 41.4 ± 1.8% to 50.1 ± 1.9% (+8.7 percentage points); macro F1-score improved from 45.0 ± 2.9% to 50.2 ± 2.2% (+5.2 percentage points); ROC-AUC (macro) rose from 79.2 ± 1.1% to 81.9 ± 1.5% (+2.7 percentage points); and PR-AUC (macro) improved substantially from 53.5 ± 2.4% to 66.8 ± 3.5% (+13.3 percentage points). The merged category also maintains a sufficient sample size for reliable statistical inference.

5.1.2. Model Results from Random Forest

The ROC curves in Figure 1a show that the model achieves strong discriminative ability across all three severity levels. Specifically, the AUC values are 0.89 for severe injury (class 0), 0.89 for minor injury (class 1), and 0.93 for no injury (class 2). The macro-average AUC of 0.90 and micro-average AUC of 0.93 confirm that the model maintains balanced predictive performance across classes. The confusion matrix in Figure 1b provides additional insights. Importantly, severe injuries are rarely misclassified as no injury, which is crucial from a road safety perspective.

The comprehensive evaluation metrics in Table 5 demonstrate that the RF model achieves robust and balanced predictive performance across the three injury severity categories. At the class level, the no injury group achieves the highest performance (Precision = 0.83, Recall = 0.86, F1 = 0.85), followed by minor injury (Precision = 0.80, Recall = 0.79, F1 = 0.79). The severe injury class, being the minority category, presents lower but still acceptable values (Precision = 0.70, Recall = 0.64, F1 = 0.67). This indicates that although the model captures a majority of severe cases, there remains a tendency to misclassify some severe injuries into less severe categories, reflecting the inherent data imbalance.

Other evaluation metrics are summarized in Table 6. The model achieves an overall accuracy of 0.814, with consistent results across macro-F1 (0.770), micro-F1 (0.814), and weighted-F1 (0.813). The balanced accuracy of 0.761 highlights that the model’s predictive ability remains reliable across classes, even under imbalance. The ROC-AUC values are consistently high: 0.902 (macro) and 0.927 (micro), confirming strong ranking capability across severity levels. However, PR-AUC provides a more realistic assessment of minority-class performance. The results show a micro PR-AUC of 0.791 and macro PR-AUC of 0.857, with class-specific values of 0.882 (no injury), 0.849 (minor injury), and 0.639 (severe injury). The lower PR-AUC for the severe injury class highlights the persistent challenge of identifying rare severe-injury cases, as precision and recall for this group remain relatively modest compared to the majority classes.
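As a quick arithmetic check, the reported macro-F1 and balanced accuracy can be compared with the unweighted means of the per-class F1 and recall values quoted from Table 5. The sketch below uses only those rounded figures, so agreement is expected only up to rounding.

```python
# Per-class values as quoted in the text (rounded to two decimals).
f1_by_class = {"no injury": 0.85, "minor injury": 0.79, "severe injury": 0.67}
recall_by_class = {"no injury": 0.86, "minor injury": 0.79, "severe injury": 0.64}

# Macro-F1 is the unweighted mean of per-class F1 scores.
macro_f1 = sum(f1_by_class.values()) / len(f1_by_class)

# Balanced accuracy is the unweighted mean of per-class recalls.
balanced_accuracy = sum(recall_by_class.values()) / len(recall_by_class)

print(round(macro_f1, 3))
print(round(balanced_accuracy, 3))
```

The means come out at about 0.770 and 0.763, in line with the reported 0.770 and 0.761 once the two-decimal rounding of the inputs is taken into account. Both averages are pulled down by the severe injury class, which is the pattern the text attributes to class imbalance.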

5.2. Model Interpretation by SHAP

Figure 2 shows the feature importance values computed using SHAP for the RF model. The bars represent the mean absolute SHAP values, with colors indicating contributions to minor injury (Class 0, blue), no injury (Class 1, green), and severe injury (Class 2, red). The results suggest that safety equipment (secu) shows the strongest association with injury severity predictions across all classes, as it contributes the highest SHAP values. Fixed obstacle (obs), age, and gender (sexe) also exhibit substantial contributions, indicating that these features are highly relevant for the model’s classification of crash outcomes. In particular, fixed obstacle presence (obs) is most strongly associated with predictions of severe injuries.

For the no-injury and minor-injury categories, age and gender appear more influential relative to severe injuries, while road surface condition (surf) and infrastructure complexity (infra) are more important for severe injury predictions. Other contextual features such as mobile obstacles (obsm), driving maneuvers (manv), and weather (atm) also contribute to the model, though their associations are weaker. By contrast, the remaining features have relatively low SHAP values, suggesting limited relevance in the Random Forest model’s decision process.
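The importance measure described above is the mean absolute SHAP value per feature and class. A minimal sketch of that aggregation in plain Python follows. The SHAP values, the three-feature subset, and the helper name `mean_abs_shap` are illustrative assumptions, not outputs of the study's model.

```python
# Hypothetical SHAP values, indexed as shap_values[class][sample][feature].
# For each sample and feature, the values are chosen to sum to zero across
# classes, as happens when the explained outputs are class probabilities.
feature_names = ["secu", "obs", "age"]
shap_values = {
    "no_injury":     [[0.20, -0.10, 0.02], [-0.10, 0.06, -0.02]],
    "minor_injury":  [[-0.12, 0.04, 0.01], [0.08, -0.02, 0.03]],
    "severe_injury": [[-0.08, 0.06, -0.03], [0.02, -0.04, -0.01]],
}


def mean_abs_shap(rows):
    """Average |SHAP| over samples, one value per feature."""
    n_samples = len(rows)
    n_features = len(rows[0])
    return [
        sum(abs(row[j]) for row in rows) / n_samples
        for j in range(n_features)
    ]


# Per-class importance: the quantity a stacked bar chart would display.
importance = {
    cls: dict(zip(feature_names, mean_abs_shap(rows)))
    for cls, rows in shap_values.items()
}

# Overall ranking: sum the per-class importances for each feature.
total = {f: sum(importance[cls][f] for cls in importance) for f in feature_names}
ranking = sorted(total, key=total.get, reverse=True)

for cls, values in importance.items():
    print(cls, {f: round(v, 3) for f, v in values.items()})
print("ranking:", ranking)
```

Taking absolute values before averaging matters: a feature that pushes some predictions up and others down would otherwise average to near zero and appear unimportant.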

Furthermore, SHAP analysis was used to interpret the prediction patterns across injury categories. As shown in Figure 3a, the type of safety equipment (secu) exhibits distinct associations with the model outputs: lower-coded devices such as seat belts are more negatively associated with predictions of severe outcomes, whereas higher-coded devices such as child seats and other equipment show comparatively weaker associations. Predictions of severe injury show stronger associations with crashes involving fixed obstacles, whereas non-severe predictions are less associated with such obstacles.

In the prediction results for no-injury crashes (Figure 3b), safety equipment, the presence of fixed obstacles, older individuals, and male drivers were associated with higher predicted probabilities of no injury. These features appeared as key indicators in the model outputs linked with lower injury severity. For minor injury crashes (Figure 3c), safety equipment (secu) also showed notable associations, although its SHAP values were more dispersed. This dispersion indicates that different types of safety equipment were not consistently linked with higher or lower probability of minor injuries. Instead, age showed a clearer pattern: younger individuals were more often associated with predictions of minor injuries, whereas older individuals tended to be linked with predictions of no injury. Female drivers were more frequently associated with minor injury predictions compared to males.

In summary, the SHAP analysis highlights that safety equipment is the most strongly associated feature across all injury categories, with a particularly strong link to predictions of severe injuries. Age and gender also show relevant associations: younger individuals are more frequently linked with minor injury predictions, while older individuals are more often linked with no injury outcomes. However, the relationship between age and severe injury predictions appears less consistent. Female individuals are also more often associated with minor injury predictions.

5.3. Model Interpretation by Logit Models

While the Random Forest model combined with SHAP values provides valuable insights into the contributions of individual features to crash severity, these findings are inherently model-dependent and do not constitute formal statistical inference. To complement the machine learning analysis and ensure statistical rigor, logit-based approaches were additionally employed.

The statistical modeling approach followed a systematic model selection protocol guided by the ordinal nature of injury severity outcomes. We began with ordered logit regression, which naturally accommodates the hierarchical structure of severity categories (no injury < minor injury < severe injury). However, the Brant test (Table 7) revealed violations of the POA for several key covariates (χ² = 63.58, p < 0.001), indicating that the effect of these variables varies across severity thresholds.

To address this limitation while preserving the ordinal framework, a PPO model was implemented. This specification allows covariates that violate proportionality to have threshold-specific effects while constraining the others to have constant effects across categories. For completeness, an MNLogit model was also estimated, but it was rejected because diagnostic tests confirmed violations of the IIA assumption. The PPO model maintains the ordinal structure of injury severity while capturing the heterogeneous covariate effects observed in the data.

As shown in Table 8, the PPO model attains a higher log-likelihood than the ordered logit model (−6182.75 vs. −6201.8) and a better overall fit, as indicated by the lower AIC (12,409.5 vs. 12,435.6). Since the BIC is only available for the PPO specification, a direct BIC comparison is not possible. These results suggest that relaxing the proportional odds assumption while retaining interpretability yields improved model adequacy compared to the ordered logit alternative.
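The AIC values in Table 8 can be related to the log-likelihoods through the definition AIC = 2k − 2 ln L, where k is the number of estimated parameters. The sketch below is a rough arithmetic check under that definition. The parameter counts it prints are back-calculated from the reported values and are not stated by the authors, and the function name is hypothetical.

```python
# Log-likelihood and AIC values copied from Table 8.
fits = {
    "ordered_logit": {"loglik": -6201.8, "aic": 12435.6},
    "ppo":           {"loglik": -6182.75, "aic": 12409.5},
}


def implied_num_params(loglik, aic):
    """Solve AIC = 2k - 2*loglik for k."""
    return (aic + 2.0 * loglik) / 2.0


k_ol = implied_num_params(**fits["ordered_logit"])
k_ppo = implied_num_params(**fits["ppo"])

# Negative difference means the PPO model has the lower (better) AIC.
delta_aic = fits["ppo"]["aic"] - fits["ordered_logit"]["aic"]

print(f"implied k (ordered logit) = {k_ol:.1f}")
print(f"implied k (PPO)           = {k_ppo:.1f}")
print(f"AIC(PPO) - AIC(OL)        = {delta_aic:.1f}")
```

The AIC penalizes each extra parameter by 2, so the PPO model is preferred only if its gain in log-likelihood outweighs the cost of its additional threshold-specific coefficients.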

The estimated coefficients from the PPO model are reported in Table 9. Panel I and Panel II represent the two cumulative logit equations in the PPO model: Panel I corresponds to the severe vs. (minor + no injury) threshold, while Panel II corresponds to the (severe + minor) vs. no injury threshold. Several variables, including seatbelt use, age, gender, obstacle presence, and lighting conditions, show statistically significant associations with injury severity.
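To make the two-equation structure concrete, the sketch below converts two cumulative logits into three category probabilities. It assumes that Panel I models the log-odds of a non-severe outcome and Panel II the log-odds of no injury, so that a positive coefficient shifts probability toward less severe outcomes. This sign convention, the intercepts, and the function names are assumptions made for illustration; Table 9 reports neither intercepts nor the exact parameterization.

```python
import math


def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def ppo_probabilities(eta_panel1, eta_panel2):
    """Turn two cumulative logits into (severe, minor, no_injury).

    Assumed convention (not confirmed by the source):
      eta_panel1 = logit of P(minor or no injury), i.e. not severe;
      eta_panel2 = logit of P(no injury).
    """
    p_not_severe = logistic(eta_panel1)
    p_no_injury = logistic(eta_panel2)
    # With threshold-specific coefficients the two curves can cross,
    # which would produce a negative probability for the middle category.
    if p_no_injury > p_not_severe:
        raise ValueError("cumulative probabilities cross for this input")
    return (1.0 - p_not_severe, p_not_severe - p_no_injury, p_no_injury)


# Hypothetical intercepts, chosen only to produce plausible probabilities.
alpha1, alpha2 = 2.0, -0.5
# Seatbelt coefficient from Table 9; it is identical in both panels.
beta_seatbelt = 1.738

without_belt = ppo_probabilities(alpha1, alpha2)
with_belt = ppo_probabilities(alpha1 + beta_seatbelt, alpha2 + beta_seatbelt)

for label, probs in (("no equipment", without_belt), ("seatbelt", with_belt)):
    severe, minor, none = probs
    print(f"{label:>12}: severe={severe:.3f} minor={minor:.3f} none={none:.3f}")
```

A covariate with the same coefficient in both panels, such as seatbelt use, shifts both thresholds equally. A covariate with different coefficients, such as age or gender, shifts them by different amounts, which is what relaxing the proportional odds assumption allows.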

To complement the coefficient estimates, Table 10 presents the corresponding average marginal effects (AMEs), which provide quantitative estimates of probability changes associated with each predictor.

RF+SHAP analysis revealed that safety equipment possessed the highest feature importance in model predictions, while the PPO model quantifies this relationship precisely: seatbelt use was associated with a 29.61% higher probability of no injury compared with no equipment. The effect of age demonstrates consistent patterns: SHAP indicated that younger individuals more frequently appeared in minor injury predictions, while older individuals were associated with no injury outcomes. The PPO model further indicated that each additional year of age was linked to a 0.24% higher probability of no injury, 0.19% lower probability of minor injury, and a 0.04% lower probability of severe injury.
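The average marginal effects are expressed on the probability scale, so an AME of 0.2961 corresponds to a change of 29.61 percentage points. The sketch below converts the two rows of Table 10 discussed in this paragraph and checks that the effects across the three categories sum to approximately zero, as they should when the category probabilities sum to one. The helper name is hypothetical.

```python
# Average marginal effects copied from Table 10,
# ordered as (severe, minor, no_injury).
ame = {
    "secu1": (-0.0370, -0.2591, 0.2961),  # seatbelt vs. no equipment
    "age":   (-0.0004, -0.0019, 0.0024),  # per additional year of age
}
categories = ("severe", "minor", "no_injury")


def to_percentage_points(effect):
    """Convert a change in probability to percentage points."""
    return round(effect * 100.0, 2)


for name, effects in ame.items():
    converted = {
        cat: to_percentage_points(e) for cat, e in zip(categories, effects)
    }
    # The residual is nonzero only because the table values are rounded.
    residual = sum(effects)
    print(name, converted, f"sum={residual:+.4f}")
```

Because an AME is an average of local effects, multiplying the per-year age effect by a large age gap gives only a rough approximation of the corresponding change in probability.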

SHAP highlighted fixed obstacles as key predictors of severe crashes, ranking second in feature importance. The PPO marginal effects confirm this association: fixed obstacles are associated with a 29.36% lower probability of no injury and higher probabilities of severe (+6.78%) and minor (+17.19%) injuries.

SHAP results suggested that male drivers were more frequently linked to no injury outcomes, consistent with PPO estimates indicating that females had a 15.59% lower probability of no injury compared to males. Moreover, PPO marginal effects emphasized the role of roadway and environmental factors: mobile obstacles were associated with a higher probability of no injury, adverse weather with a higher probability of injury outcomes, additional lanes with a lower probability of no injury, and night-time lighting (particularly with streetlights) with higher injury probabilities. In conclusion, both approaches consistently identified safety equipment, age, obstacle presence, and environmental conditions as primary correlates of crash severity.

6. Conclusions

This study developed a hybrid analytical framework integrating RF with SHAP analysis and PPO modeling to investigate crash injury severity factors. RF+SHAP analysis identified safety equipment as the most influential predictor across all severity categories, followed by fixed obstacle presence, age, and environmental conditions. The PPO model quantified these relationships, revealing that seatbelt use is associated with a 29.61% higher probability of no injury, whereas fixed obstacles are associated with a 29.36% lower probability of no injury. In addition, SHAP results suggested that male drivers were more frequently linked to no-injury outcomes, consistent with PPO estimates indicating that females had a 15.59% lower probability of no injury compared to males. Both approaches consistently highlighted the protective role of safety equipment and higher risks from fixed obstacles and adverse environmental conditions.

However, several important limitations constrain the interpretation and generalizability of the findings. Unobserved heterogeneity across crash circumstances and driver behavior may confound the observed associations. Data limitations include potential selection bias in crash reporting and missing information on vehicle conditions and detailed driver behavior. Future research will address these limitations. First, methodological extensions should incorporate unobserved heterogeneity through random parameter models, latent class analysis, or hierarchical modeling structures. Second, cross-regional validation using larger datasets would enhance external validity and identify context-dependent effects. Despite these limitations, the integrated framework demonstrates how interpretable machine learning and statistical validation can enhance reliability.

Author Contributions

C.W. was responsible for conceptualization, methodology, software, data analysis, and original draft preparation. T.S. contributed to supervision, validation, and manuscript review. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset analyzed in this study is openly available at https://www.data.gouv.fr.

Acknowledgments

During the preparation of this manuscript, the author(s) used ChatGPT-4o for the purposes of improving English grammar. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AUC	Area Under the Curve
MLAs	Machine Learning Algorithms
SHAP	Shapley Additive Explanations
TreeSHAP	Tree-based Shapley Additive Explanations
MNLogit	Multinomial Logit Model
ROC	Receiver Operating Characteristic
DT	Decision Tree
SVM	Support Vector Machine
GBM	Gradient Boosting Machine
XGBoost	eXtreme Gradient Boosting
OLR	Ordered Logit Regression
GOL	Generalized Ordered Logit
WHO	World Health Organization
POA	Proportional Odds Assumption
ONISR	Observatoire National Interministériel de la Sécurité Routière
PPO	Partial Proportional Odds (model)
AME	Average Marginal Effect(s)
OvR	One-vs-Rest
AIC	Akaike Information Criterion
BIC	Bayesian Information Criterion
CI	Confidence Interval
IIA	Independence of Irrelevant Alternatives
PR-AUC	Area Under the Precision–Recall Curve
RF	Random Forest

Figure 1. Model performance: ROC curve and confusion matrix.

Figure 2. The feature importance in different classes.

Figure 3. Features influencing the prediction of 3 classes.

Table 1. Descriptive statistics of categorical variables.

Variable	Description	Category	Severe Injury (No.)	Severe Injury (%)	No Injury (No.)	No Injury (%)	Minor Injury (No.)	Minor Injury (%)
sexe	Gender	1: Male	14,533	62.6	16,321	70.3	13,187	56.8
sexe	Gender	2: Female	8683	37.4	6895	29.7	10,029	43.2
obs	Fixed obstacle	0: No fixed obstacle	19,525	84.1	22,334	96.2	20,987	90.4
obs	Fixed obstacle	1: Fixed obstacle	813	3.5	348	1.5	650	2.8
secu	Safety equipment	0: No safety equipment	8938	38.5	5084	21.9	7522	32.4
secu	Safety equipment	1: Seatbelt	14,046	60.5	18,085	77.9	15,601	67.2
secu	Safety equipment	2: Child seat	0	0.0	0	0.0	0	0.0
secu	Safety equipment	3: Others	232	1.0	46	0.2	93	0.4
obsm	Mobile obstacle	0: No mobile obstacle	2739	11.8	813	3.5	1579	6.8
obsm	Mobile obstacle	1: Mobile obstacle	1230	5.3	2670	11.5	1416	6.1
lum	Lighting condition	1: Daylight	14,185	61.1	16,414	70.7	15,578	67.1
lum	Lighting condition	2: Dusk or dawn	952	4.1	1184	5.1	1393	6.0
lum	Lighting condition	3: Night without street lighting	720	3.1	302	1.3	163	0.7
lum	Lighting condition	4: Night with street lighting not lit	93	0.4	163	0.7	139	0.6
lum	Lighting condition	5: Night with street lighting lit	7267	31.3	5154	22.2	5920	25.5
plan	Road alignment	1: Straight	19,943	85.9	20,732	89.3	20,128	86.7
plan	Road alignment	0: Curved	1602	6.9	1323	5.7	1486	6.4
prof	Road longitudinal profile	0: Sloped	511	2.2	116	0.5	70	0.3
prof	Road longitudinal profile	1: Flat	16,924	72.9	17,621	75.9	17,342	74.7
atm	Weather condition	1: Normal	20,500	88.3	21,034	90.6	20,314	87.5
atm	Weather condition	2: Rainy or snowy	1602	6.9	1207	5.2	1695	7.3

Table 2. Descriptive statistics of numeric variables.

Variable	Severity	Mean	Std	Min	25%	50%	75%	Max
nbv	No injury	2.69	1.93	0.0	1.0	3.0	3.0	11.0
nbv	Minor injury	2.7	2.0	0.0	1.0	3.0	3.0	11.0
nbv	Severe injury	3.02	1.55	0.0	3.0	3.0	3.0	9.0
age	No injury	43.9	17.43	3.0	30.0	41.0	56.0	100.0
age	Minor injury	39.77	16.74	2.0	27.0	36.0	50.0	101.0
age	Severe injury	40.08	16.42	8.0	28.0	36.0	50.0	96.0

Table 3. Evaluation metrics.

Metric	Formula
Accuracy	$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
Precision (per class)	$\mathrm{Precision}_{i} = \frac{TP_{i}}{TP_{i} + FP_{i}}$
Recall (per class)	$\mathrm{Recall}_{i} = \frac{TP_{i}}{TP_{i} + FN_{i}}$
F1-score (per class)	$F1_{i} = \frac{2 \cdot \mathrm{Precision}_{i} \cdot \mathrm{Recall}_{i}}{\mathrm{Precision}_{i} + \mathrm{Recall}_{i}}$
Macro-F1	$F1_{\mathrm{Macro}} = \frac{1}{C} \sum_{i=1}^{C} F1_{i}$
Micro-F1	$F1_{\mathrm{Micro}} = \frac{2 \sum_{i=1}^{C} TP_{i}}{2 \sum_{i=1}^{C} TP_{i} + \sum_{i=1}^{C} FP_{i} + \sum_{i=1}^{C} FN_{i}}$
Weighted-F1	$F1_{\mathrm{Weighted}} = \frac{\sum_{i=1}^{C} w_{i} \cdot F1_{i}}{\sum_{i=1}^{C} w_{i}}, \quad w_{i} = \text{number of samples in class } i$
Balanced Accuracy	$BA = \frac{1}{C} \sum_{i=1}^{C} \mathrm{Recall}_{i} = \frac{1}{C} \sum_{i=1}^{C} \frac{TP_{i}}{TP_{i} + FN_{i}}$
ROC-AUC (OvR, Macro, Micro, Class-wise)	$\mathrm{AUC}_{\mathrm{ROC}} = \int_{0}^{1} \mathrm{TPR}(\mathrm{FPR}) \, d(\mathrm{FPR}), \quad \mathrm{TPR} = \frac{TP}{TP + FN}, \quad \mathrm{FPR} = \frac{FP}{FP + TN}$
PR-AUC (OvR, Class-wise)	$\mathrm{AUC}_{\mathrm{PR}} = \int_{0}^{1} \mathrm{Precision}(\mathrm{Recall}) \, d(\mathrm{Recall})$
Confusion Matrix	$\begin{bmatrix} TP & FP \\ FN & TN \end{bmatrix}$

where TP (True Positives): correctly predicted instances of a given class, TN (True Negatives): correctly predicted instances that do not belong to the given class, FP (False Positives): instances incorrectly predicted as the given class but belong to other classes, FN (False Negatives): instances of the given class incorrectly predicted as other classes.
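In the multiclass setting, TP, FP, and FN are obtained per class by treating that class as positive and all others as negative. The sketch below shows this one-vs-rest bookkeeping in plain Python. The confusion matrix is hypothetical, the row/column orientation (rows are true classes, columns are predicted classes) is an assumption, and the function name is illustrative.

```python
def per_class_metrics(cm, labels):
    """Compute precision, recall, and F1 per class, one-vs-rest.

    Assumed orientation: cm[i][j] is the number of samples whose true
    class is i and whose predicted class is j.
    """
    n = len(labels)
    results = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fn = sum(cm[i][j] for j in range(n) if j != i)  # rest of row i
        fp = sum(cm[j][i] for j in range(n) if j != i)  # rest of column i
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2.0 * precision * recall / denom if denom else 0.0
        results[label] = {"precision": precision, "recall": recall, "f1": f1}
    return results


# Hypothetical confusion matrix, not the one shown in Figure 1b.
labels = ["no_injury", "minor_injury", "severe_injury"]
cm = [
    [80, 15, 5],
    [20, 70, 10],
    [5, 15, 30],
]

metrics = per_class_metrics(cm, labels)
accuracy = sum(cm[i][i] for i in range(len(labels))) / sum(map(sum, cm))

for label, m in metrics.items():
    print(label, {k: round(v, 3) for k, v in m.items()})
print("accuracy:", round(accuracy, 3))
```

The macro, weighted, and balanced scores in Table 3 are then averages of these per-class quantities.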

Table 4. Performance comparison before and after merging the hospitalized and fatal categories.

	Not Merged	Merged
Balanced accuracy	0.414 ± 0.018	0.501 ± 0.019
Macro F1	0.450 ± 0.029	0.502 ± 0.022
ROC-AUC (macro)	0.792 ± 0.011	0.819 ± 0.015
PR-AUC (macro)	0.535 ± 0.024	0.668 ± 0.035

Table 5. Comprehensive performance metrics.

Classification Report
	Precision	Recall	F1-Score	Support
No injury	0.83	0.86	0.85	2386
Minor injury	0.80	0.79	0.79	1788
Severe injury	0.70	0.64	0.67	311
accuracy			0.81	4485
macro avg	0.78	0.76	0.77	4485
weighted avg	0.81	0.81	0.81	4485

Table 6. Other metrics on the test set.

Metric	Value
Accuracy	0.8137 ± 0.0046
Balanced Accuracy	0.7607 ± 0.0046
Macro-F1	0.7697 ± 0.0041
Micro-F1	0.8137 ± 0.0021
Weighted-F1	0.8129 ± 0.0058
ROC-AUC (macro)	0.9017 ± 0.0039
ROC-AUC (micro)	0.9270 ± 0.0021
PR-AUC (macro)	0.7908 ± 0.0027
PR-AUC (micro)	0.8565 ± 0.0061
PR-AUC (per class)	No injury: 0.8819 ± 0.0023
	Minor injury: 0.8488 ± 0.0018
	Severe injury: 0.6393 ± 0.0031

Table 7. Brant test results.

Variable	Chi²	p-Value	Violation (Yes/No)
age	5.27	0.020	Yes
secu1	2.11	0.150	No
secu2	1.28	0.203	No
secu3	1.74	0.180	No
nbv	8.53	0.000	Yes
sexe2	12.43	0.000	Yes
obs1	3.87	0.050	Yes
obsm1	0.00	1.000	No
lum2	1.93	0.160	No
lum3	17.45	0.000	Yes
lum4	0.06	0.800	No
lum5	0.06	0.810	No
plan0	0.45	0.500	No
prof0	0.31	0.590	No
atm0	4.26	0.040	Yes

Table 8. Model fit statistics: ordered logit vs. PPO.

Model	Log-Likelihood	AIC	BIC
Ordered Logit	−6201.8	12,435.6	n/a
PPO Model	−6182.75	12,409.5	13,052.4

Table 9. PPO model results: coefficients.

Variable	Description	Panel I Coefficient	Panel II Coefficient
obs = 1 (ref = 0)	Fixed obstacle vs. None	−0.990 ***	−0.990 ***
obsm = 1 (ref = 0)	Mobile obstacle vs. None	0.223 *	0.223 *
plan = 0 (ref = 1)	Not straight vs. Straight	−0.107	−0.107
prof = 0 (ref = 1)	Not flat vs. Flat	−0.041	−0.041
atm = 0 (ref = 1)	Bad vs. Normal weather	−0.184 *	−0.184 *
secu = 1 (ref = 0)	Seatbelt vs. No equipment	1.738 ***	1.738 ***
secu = 2 (ref = 0)	Child seat vs. No equipment	0.029	0.029
secu = 3 (ref = 0)	Others vs. No equipment	0.102	0.102
age	Age	−0.003 *	−0.011 ***
nbv	The number of lanes	0.091 **	0.019
sexe = 2 (ref = 1)	Female vs. Male	0.343 ***	0.672 ***
lum = 2 (ref = 1)	Lum: Dawn/Dusk vs. Daylight	−0.344	0.072
lum = 3 (ref = 1)	Lum: Night without street lighting vs. Daylight	0.724 **	−0.278
lum = 4 (ref = 1)	Lum: Night with street lighting not lit vs. Daylight	−0.263	−0.128
lum = 5 (ref = 1)	Lum: Night with street lighting lit vs. Daylight	0.234 *	0.260 ***

Note: *** indicates significance at the 0.001 level, ** at the 0.01 level, and * at the 0.05 level.

Table 10. Average marginal effects.

Variable	Severe	Minor	No Injury
age	−0.0004 (−0.0005, −0.0003)	−0.0019 (−0.0025, −0.0013)	0.0024 (0.0017, 0.0030)
nbv	0.0012 (0.0002, 0.0022)	0.0096 (0.0091, 0.0098)	−0.0066 (−0.0120, −0.0012)
sexe2	0.0365 (0.0289, 0.0451)	0.1194 (0.1029, 0.1367)	−0.1559 (−0.1806, −0.1320)
obs1	0.0678 (0.0480, 0.0902)	0.1719 (0.1412, 0.1978)	−0.2936 (−0.2845, −0.1889)
obsm1	−0.0084 (−0.0156, −0.0008)	−0.0427 (−0.0832, −0.0038)	0.0511 (0.0046, 0.0985)
lum2	0.0012 (−0.0075, 0.0112)	0.0045 (−0.0364, 0.0472)	−0.0058 (−0.0585, 0.0436)
lum3	−0.0032 (−0.0169, 0.0152)	0.0189 (−0.0938, 0.0581)	−0.0221 (−0.0738, 0.1106)
lum4	−0.0040 (−0.0210, 0.0194)	−0.0241 (−0.1171, 0.0732)	0.0280 (−0.0922, 0.1379)
lum5	0.0124 (0.0069, 0.0185)	0.0497 (0.0290, 0.0701)	−0.0620 (−0.0877, −0.0354)
plan0	0.0053 (−0.0015, 0.0115)	−0.0225 (−0.0700, 0.0475)	−0.0278 (−0.0586, 0.0086)
prof0	0.0021 (−0.0024, 0.0071)	0.0093 (−0.0111, 0.0297)	−0.0114 (−0.0369, 0.0135)
atm0	0.0087 (0.0022, 0.0171)	0.0357 (0.0100, 0.0650)	−0.0444 (−0.0812, −0.0124)
secu1	−0.0370 (−0.0423, −0.0324)	−0.2591 (−0.2891, −0.2257)	0.2961 (0.2582, 0.3311)
secu2	−0.0010 (−0.0104, 0.0102)	−0.0059 (−0.0525, 0.0424)	0.0069 (−0.0528, 0.0626)
secu3	−0.0041 (−0.0125, 0.0064)	−0.0203 (−0.0621, 0.0277)	0.0244 (−0.0343, 0.0746)

Note: Entries are average marginal effects (AME) with 95% confidence intervals in parentheses. Effects are considered statistically significant when the 95% confidence interval does not include zero.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
