1. Introduction
The development of autonomous vehicles (AVs) is expected to transform the transportation system [1,2,3], owing to their potential to eliminate the need for human drivers and reduce associated costs. However, as AVs are increasingly being tested and operated, they pose significant crash risks. Many of these crashes are ultimately attributable to performance limitations in the AV system itself, such as perception errors, flawed decision-making algorithms, or inadequate control responses under complex driving scenarios [4,5]. It is therefore necessary to analyze the factors influencing crash outcomes using crash data to develop effective countermeasures.
Concurrently, significant research and development efforts are dedicated to enhancing the intrinsic safety of AV systems to prevent crashes proactively. Beyond incremental improvements in core perception, planning, and control modules [6,7,8], a particularly promising direction is the development of runtime-enabled active collision-avoidance techniques [9]. These systems operate as a parallel safety layer, continuously monitoring the vehicle’s internal state and external environment in real time. Using advanced risk prediction models, they can identify potentially hazardous situations that may not be adequately handled by the primary autonomous driving stack [10,11]. When an imminent risk is detected, these systems can execute safeguarded maneuvers or emergency interventions, such as constrained emergency braking or steering within stable dynamics limits, to avoid or mitigate collisions [12,13,14]. Other complementary approaches include formal methods for runtime assurance, fault-tolerant system architectures, and comprehensive validation using simulation [15,16,17,18]. While these technological advancements are crucial for raising the overall safety floor and handling edge cases, analyzing real-world crash data remains indispensable. Such analysis provides empirical evidence on the performance boundaries and residual failure modes of existing systems, even those equipped with advanced safety layers. It reveals which scenarios, system states, or environmental interactions continue to challenge AVs, thereby offering critical, evidence-based guidance for prioritizing future research, refining safety architectures, and validating the effectiveness of new mitigation strategies. Therefore, complementing proactive safety technology development with systematic forensic analysis of crash outcomes constitutes a vital, dual-path strategy for accelerating the safe maturation of AVs.
The risk factors related to traffic crashes include human, vehicle, road, and environmental factors. As the driver’s role is assumed by the vehicle itself [4], analyzing vehicle-related factors, which serve as proxies for the underlying performance of the perception, decision-making, and control systems, becomes paramount for diagnosing system failures and enhancing safety. Consequently, it is necessary to study the joint effects of different vehicle factors and the interactive effects of vehicle and environmental factors [19] on crash outcomes. This approach helps to identify which combinations of vehicle states and environmental conditions are most likely to overwhelm the current AV technological stack, leading to collisions. In addition, different types of collisions are attributed to different factors. Crashes involving vulnerable road users, including pedestrians and cyclists, are more likely to have serious consequences [20]. Therefore, the research objectives also include the influence of risk factors on crashes involving vulnerable road users.
Crash data are most frequently analyzed with traditional statistical models and machine learning methods. The application of traditional statistical models to causal inference is constrained by specific assumptions, including the independence of irrelevant alternatives (IIA) assumption and the parallel lines assumption [21]. To overcome these limitations, machine learning techniques have been introduced [22], such as decision tree algorithms [23]. The Gradient Boosting Decision Tree (GBDT) is a prominent machine learning algorithm, distinguished by its efficiency, accuracy, and interpretability [24]. eXtreme Gradient Boosting (XGBoost) is a representative GBDT implementation widely used in crash analysis [25]. LightGBM has demonstrated better outcomes than XGBoost on numerous standard classification benchmarks. Meanwhile, it directly supports categorical features without requiring the one-hot encoding needed by XGBoost, reducing the dimensionality and potential sparsity issues that arise from expanding categories into multiple binary features [26]. SHapley Additive exPlanations (SHAP) is a technique that provides interpretability for machine learning algorithms [25,27,28]. Unlike tree-based algorithms that merely indicate feature importance without elucidating directional effects on model outputs, SHAP values quantify both the magnitude and polarity of each variable’s contribution to individual predictions. In addition, SHAP delivers both local explanations for single predictions and global interpretations that highlight feature importance. Deriving their theoretical foundation from coalitional game theory, SHAP values provide more consistent and equitable feature contribution attributions than traditional tree-based importance measures like split gain or cover, resulting in more dependable explanatory capabilities. Therefore, LightGBM is employed together with SHAP to elucidate the influence of each independent variable on the outcome.
This study aims to evaluate the impact of vehicle factors on the severity and collision type of crashes involving AVs. An advanced machine learning algorithm, LightGBM, is introduced into the analysis of AV-involved crashes, and SHAP is used for interpretation [29]. The joint effects of different vehicle factors and the interactive effects of vehicle and environmental factors are studied, which are important to the safety of AVs. The main contributions of this paper are summarized as follows:
- (1) It introduces an advanced machine learning framework (LightGBM combined with SHAP) for a systematic analysis of AV-involved crash outcomes, addressing limitations of traditional statistical models.
- (2) It explicitly investigates and quantifies the joint effects of various vehicle factors and their interactions with environmental factors on crash severity and collision type, thereby identifying high-risk scenarios that challenge current AV capabilities.
- (3) It provides specific insights into the factors influencing crashes involving vulnerable road users, which pose significant perceptual and predictive challenges to AV systems.
- (4) By interpreting model outcomes via SHAP, it offers actionable insights that can inform the refinement of core AV technologies, including perception algorithms, decision-making logic, and control strategies, thereby contributing to the development of more robust autonomous driving systems.
The subsequent content is divided into four main sections. Commencing with an extensive review of relevant scholarship, the article then characterizes the research data, elaborates on the methodological framework, and concludes with an analysis of outcomes and their implications.
It is crucial to emphasize that this study operates within an observational, associational framework. We employ SHAP to explain the predictive model’s behavior and to uncover which factors are most strongly associated with adverse outcomes in the available data. The findings highlight potential risk indicators and high-priority scenarios for AV safety. However, establishing causal relationships between specific vehicle factors and crash outcomes requires different methodological approaches, which are discussed as important avenues for future research.
2. Literature Review
The analysis of crashes involving autonomous vehicles extends beyond traditional factor identification to encompass the complex interplay between the vehicle’s internal systems and the external operational environment. This systems-oriented perspective is crucial, as AV crashes often stem not from single points of failure, but from mismatches or performance boundaries within the entire automated driving stack, comprising sensing, perception, decision-making, and control execution [4,5,30].
Previous research has identified a range of factors influencing crash severity [31] and types [32,33,34,35]. These factors usually include road and environment factors [36,37,38,39,40,41,42], like crash location, land use, weather, lighting, speed limit, and road types, as well as some factors related to vehicles, such as precrash movements and driving mode, among others. Notably, factors like lighting and weather directly challenge an AV’s perception system, a cornerstone of its operational safety. Inclement weather, for instance, can degrade sensor (e.g., LiDAR, camera) performance, leading to inaccurate environment models and subsequent decision errors [43].
Vehicle factors assume greater importance for the safety of AVs as the driver’s role is assumed by the vehicle itself, with its movements controlled by algorithms. However, limited existing research has focused on the impact of these vehicle factors on crashes [36,37,38,39,40,41,42], particularly those that serve as proxies for the performance of the underlying autonomy stack. Specifically, factors like the novelty of the autopilot function are important, as such functions can significantly impact operational safety and be upgraded quickly, but current research lacks consideration of this factor.
In the analysis of crashes involving AVs, logit models are among the most frequently employed statistical techniques [41,42]; they have good interpretability but severe shortcomings [28]. Other researchers have employed machine learning techniques to circumvent these constraints, with decision trees [31,34] and their variants being the most widely used methods. A more sophisticated model, LightGBM [26], has also been employed for crash analysis, but only for crash types and severity of conventional vehicles; it has not been applied to the analysis of crashes involving autonomous vehicles.
Table 1 is a summary of the methods, factors, and outcomes of studies on AV-involved crashes using decision tree-based methods. Although machine learning techniques demonstrate remarkable predictive capabilities, their inherent opacity and limited explanatory capacity frequently raise concerns among stakeholders.
SHapley Additive exPlanations (SHAP) has been developed to achieve interpretability of results [44]. Integrating machine learning techniques with the SHAP framework [32,33] enables comprehensive elucidation of predictive outputs derived from algorithmic processing of collision-related datasets [25,27,28]. This is pivotal for a system-level safety analysis, as it allows researchers to move beyond predicting “what” will happen to understanding “why” it might happen, thereby linking statistical patterns to potential engineering flaws. While previous studies have investigated factors contributing to injury severity, the application of more advanced and interpretable machine learning methods like LightGBM combined with SHAP to AV crash data remains nascent. Therefore, it is necessary to introduce such methods to uncover the mechanisms associated with AV crashes, ultimately informing the advancement of more dependable autonomous transportation solutions.
4. Materials and Methods
In this research, the LightGBM (v4.4.0) model was trained for analysis. Boosting algorithms have distinctive capabilities for addressing datasets with small sample sizes. GBDT is a tree-based ensemble learning framework that exhibits high prediction accuracy. In comparison to basic GBDT, XGBoost applies a second-order Taylor expansion to the objective function, thereby establishing a more efficient and accurate framework.
LightGBM is an efficient ensemble machine learning method that employs Gradient-based One-Side Sampling (GOSS) to split internal nodes based on variance gain and Exclusive Feature Bundling (EFB) for input feature dimension reduction [26]. This approach was designed to overcome computational bottlenecks and enhance the scalability of the XGBoost framework. It has been demonstrated that LightGBM exhibits superior performance compared to other gradient boosting algorithms in terms of training speed and predictive accuracy. Categorical features are also supported without requiring one-hot encoding, which can cause dimensionality and sparsity problems [26]. A distinctive strength of LightGBM as a tree-based algorithm lies in its tolerance to multicollinearity, making it well-suited for safety analytics where predictors frequently demonstrate interdependence. This capability allows LightGBM [46] to accommodate correlated input variables without compromising model integrity.
In this gradient boosting architecture, the final prediction constitutes the cumulative output generated by an ensemble of decision trees:

\hat{y}_i = \sum_{k=1}^{K} f_k(x_i) \quad (1)

In this formulation, K specifies the quantity of constituent trees and f_k denotes the k-th tree. The GBDT training procedure aims to construct an approximator that optimizes the composite objective, mirroring XGBoost’s dual-component structure that combines a loss minimization term with regularization constraints, mathematically represented in Equation (2):

Obj = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \quad (2)

After a second-order Taylor expansion of the loss, the objective for a fixed tree structure reduces to

Obj \approx -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T, \qquad w_j^* = -\frac{G_j}{H_j + \lambda}

In this formulation, T denotes the total count of terminal nodes within the tree structure. For any given leaf j, the parameters G_j and H_j represent the aggregated first-order gradient statistics and cumulative second-order gradient statistics, respectively, computed across all training instances assigned to that leaf. The term w_j^* corresponds to the optimal weight value assigned to leaf j during the model fitting process. \lambda is the coefficient of the regularization penalty term. The hyperparameter \gamma serves to regulate the structural sophistication of the decision tree.
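The per-leaf gradient statistics, optimal leaf weights, and regularized objective described above can be sketched in a few lines of Python. This is an illustrative toy computation, not LightGBM’s internals; the gradients, hessians, and leaf assignments are made-up values.

```python
# Sketch: optimal leaf weights and structure score for one fixed tree,
# following the second-order form of the objective. All data are toy values.

def leaf_stats(g, h, leaf_of):
    """Aggregate first-order (G_j) and second-order (H_j) gradient
    statistics per leaf j."""
    G, H = {}, {}
    for gi, hi, j in zip(g, h, leaf_of):
        G[j] = G.get(j, 0.0) + gi
        H[j] = H.get(j, 0.0) + hi
    return G, H

def optimal_weights_and_objective(g, h, leaf_of, lam=1.0, gamma=0.1):
    """w_j* = -G_j / (H_j + lambda); Obj = -1/2 * sum_j G_j^2/(H_j+lam) + gamma*T."""
    G, H = leaf_stats(g, h, leaf_of)
    w = {j: -G[j] / (H[j] + lam) for j in G}
    obj = -0.5 * sum(G[j] ** 2 / (H[j] + lam) for j in G) + gamma * len(G)
    return w, obj

# Toy example: 4 training instances routed to 2 leaves.
g = [0.5, -0.3, 0.2, -0.4]    # first-order gradients
h = [0.25, 0.21, 0.24, 0.24]  # second-order gradients
leaves = [0, 0, 1, 1]
w, obj = optimal_weights_and_objective(g, h, leaves)
# Leaf 0: G = 0.2, H = 0.46, so w_0* = -0.2 / 1.46
```

The negative structure score (the summation term in `obj`) is what guides split selection: a split is kept only if the resulting gain exceeds the complexity penalty γ.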
LightGBM employs the Gradient-based One-Side Sampling (GOSS) technique for node splitting, departing from conventional entropy-based criteria used in standard GBDT. This approach prioritizes the top a \times 100\% of instances with substantial gradient magnitudes for inclusion in subset A, while randomly sampling a fraction b of the remaining lower-gradient observations to form subset B. The partitioning mechanism subsequently computes the variance improvement metric \tilde{V}_j(d) over the combined dataset A \cup B:

\tilde{V}_j(d) = \frac{1}{n} \left[ \frac{\left( \sum_{x_i \in A_l} g_i + \frac{1-a}{b} \sum_{x_i \in B_l} g_i \right)^2}{n_l^j(d)} + \frac{\left( \sum_{x_i \in A_r} g_i + \frac{1-a}{b} \sum_{x_i \in B_r} g_i \right)^2}{n_r^j(d)} \right] \quad (3)

where g_i is the gradient of instance x_i; A_l and B_l (A_r and B_r) denote the members of A and B falling to the left (right) of candidate split point d on feature j; n_l^j(d) and n_r^j(d) are the corresponding instance counts; and the factor (1-a)/b reweights the sampled low-gradient instances so that the gain estimate remains unbiased.
Complementing its GOSS methodology, LightGBM further integrates EFB to enhance computational efficiency during training while preserving model fidelity. Sparse high-dimensional feature sets often exhibit disjoint activation characteristics, with different features rarely assuming non-zero values concurrently within any given instance. The EFB technique aggregates mutually exclusive features into a composite feature representation.
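The GOSS sampling step can be sketched as follows. This is a self-contained illustration of the idea, not the library’s internal implementation; the gradient values are toy numbers.

```python
import random

def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    """GOSS sketch: keep the top a*100% of instances by |gradient|
    (subset A), randomly sample a fraction b of the remainder
    (subset B), and reweight B by (1 - a) / b so that gradient sums
    over A u B remain unbiased estimates of the full-data sums."""
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_k = int(a * n)
    A = order[:top_k]                       # large-gradient instances
    rest = order[top_k:]
    rng = random.Random(seed)
    B = rng.sample(rest, int(b * n))        # sampled small-gradient instances
    weight_B = (1 - a) / b                  # amplification factor for B
    return A, B, weight_B

grads = [0.9, -0.05, 0.4, 0.02, -0.7, 0.01, 0.3, -0.02, 0.6, 0.03]
A, B, wB = goss_sample(grads, a=0.2, b=0.2)
# A holds the two largest-|gradient| instances (indices 0 and 4).
```

The reweighting factor is what lets the variance gain in Equation (3) be computed over the much smaller set A ∪ B without biasing split decisions toward the retained large-gradient instances.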
Recall is the proportion of all actual positive samples that the model correctly predicts as positive. Precision measures the proportion of true positives among the samples the model predicts as positive. The F1-score constitutes a balanced performance metric derived from the harmonic mean of precision and recall. For all binary classification tasks in this study, class predictions were obtained by applying a fixed decision threshold of 0.5 to the predicted probabilities. This consistent threshold ensures a fair comparison focused on inherent model discriminative ability, while acknowledging that threshold tuning could optimize specific metrics (e.g., recall) for particular applications.
The term ‘TP’ represents instances correctly identified as positive by the model when the actual classification is positive, whereas ‘FP’ indicates cases erroneously classified as positive when the ground truth is negative.
The Brier Score was additionally computed to assess probability calibration, measuring the mean squared difference between predicted probabilities and actual outcomes. Lower Brier Score values indicate better-calibrated probabilities. The formula is:

BS = \frac{1}{N} \sum_{i=1}^{N} (p_i - o_i)^2

The term p_i is the predicted probability for instance i, and o_i is the actual outcome (0 or 1).
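The threshold-based metrics and the Brier score can be computed directly from predicted probabilities, as in this minimal stdlib sketch (illustrative labels and probabilities):

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Precision, recall, and F1 at a fixed decision threshold, plus the
    Brier score (mean squared error of the predicted probabilities)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1, "brier": brier}

m = binary_metrics([1, 0, 1, 1, 0], [0.9, 0.6, 0.4, 0.8, 0.2])
# At threshold 0.5: TP = 2 (0.9, 0.8), FP = 1 (0.6), FN = 1 (0.4),
# so precision = recall = F1 = 2/3; Brier = 0.81 / 5 = 0.162.
```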
ROC-AUC quantifies the overall discriminative capacity of a binary classifier by measuring the entire two-dimensional area beneath its ROC curve. Superior predictive capability is indicated by higher AUC measurements. The ROC-AUC is computed as:

AUC = \int_0^1 \mathrm{TPR}\big(\mathrm{FPR}^{-1}(x)\big)\, dx

where TPR characterizes the proportion of correctly identified positive instances across different threshold values, FPR quantifies the fraction of negative cases mistakenly classified as positive under varying threshold settings, and FPR^{-1} signifies the inverse function of the FPR function.
In addition to the standard ROC-AUC, we computed the Precision-Recall Area Under the Curve (PR-AUC) to specifically assess model performance on imbalanced datasets. PR-AUC is particularly informative for classification tasks with skewed class distributions, as it focuses on the performance of the positive (minority) class by evaluating the trade-off between precision and recall across different probability thresholds.
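One standard way to compute PR-AUC is the average-precision formulation, which sums precision weighted by recall increments while sweeping the decision threshold down through the scores. The sketch below is a stdlib illustration with toy data (it assumes distinct scores; tied scores would need to be processed together):

```python
def average_precision(y_true, y_score):
    """PR-AUC via average precision: AP = sum_n (R_n - R_{n-1}) * P_n,
    taking each descending score as a threshold."""
    pairs = sorted(zip(y_score, y_true), reverse=True)  # highest score first
    n_pos = sum(y_true)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for _, label in pairs:
        if label == 1:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / n_pos
        ap += (recall - prev_recall) * precision  # step in recall * precision
        prev_recall = recall
    return ap

ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
# Steps: (R=0.5, P=1) then (R=1.0, P=2/3), so AP = 0.5*1 + 0.5*(2/3) = 5/6.
```

Unlike ROC-AUC, this quantity is anchored to the positive-class prevalence, which is why the PR-AUC values reported later are numerically lower than the corresponding ROC-AUC values on the imbalanced crash data.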
Bayesian optimization was employed to determine the optimal hyperparameters for the LightGBM, XGBoost, and SVM models. This approach has gained prominence for its effectiveness in optimizing model parameters, attributed to its systematic exploration of the hyperparameter domain [47]. The fundamental principle of Bayesian optimization involves constructing a probabilistic model to approximate the objective function, which is then used to guide the selection of subsequent sampling points. Initially, a prior distribution is defined to encapsulate the initial assumptions about the objective function. An acquisition function provides a criterion for selecting the next evaluation point, after which the probabilistic model is updated, and the process is iteratively repeated until a predefined stopping condition is satisfied.
A list of potential values for each hyperparameter is put forward to define the search space. The search space of the number of estimators is between 50 and 300, and the maximum depth limit is set between 3 and 5. The learning rate, denoted by η, was set to the values {0.01, 0.05, 0.1, 0.2}. A ten-fold cross-validation was conducted for each combination of values to identify the optimal values. The iteration process was terminated when the metric on the test set did not decrease for 10 consecutive rounds.
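The search space and stopping rule just described can be sketched as follows. Note this is a simplified stand-in: the study uses Bayesian optimization with 10-fold cross-validation, whereas the sketch shuffles candidates and uses a mock objective (`mock_cv_loss` is a hypothetical placeholder) purely to illustrate the stated search space and the 10-rounds-without-improvement termination rule.

```python
import itertools
import random

# Search space as stated in the text (all values illustrative of the ranges).
search_space = {
    "n_estimators": list(range(50, 301, 50)),
    "max_depth": [3, 4, 5],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
}

def mock_cv_loss(params, rng):
    """Hypothetical placeholder: a real run would train the model with
    `params` on each of 10 folds and average the validation loss."""
    return rng.random()

def tune(space, patience=10, seed=0):
    rng = random.Random(seed)
    candidates = [dict(zip(space, vals))
                  for vals in itertools.product(*space.values())]
    rng.shuffle(candidates)
    best_loss, best_params, stale = float("inf"), None, 0
    for params in candidates:
        loss = mock_cv_loss(params, rng)
        if loss < best_loss:
            best_loss, best_params, stale = loss, params, 0
        else:
            stale += 1
            if stale >= patience:  # stop after 10 non-improving rounds
                break
    return best_params, best_loss

best_params, best_loss = tune(search_space)
```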
Once the optimal model had been obtained through hyperparameter tuning, the prediction results were subjected to interpretation, that is, ascertaining the contribution of each feature to the predictions made by the model. Traditional feature importance measures in tree models (split count, cover, and gain) are typically employed, but these methods might not accurately reflect the true contribution of each feature and can sometimes lead to misleading conclusions [48]. By contrast, the aggregate of all feature importance values provided by SHAP [44] represents the locally specific output of the model, thus guaranteeing “local accuracy”. SHAP operates by measuring the additive importance of each feature in shaping the final prediction, thereby offering dual-level interpretability that encompasses both overall model behavior and individual instance predictions.
The model’s output is constituted by the aggregation of individual feature contributions combined with a baseline intercept term, expressed mathematically as:

g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j

In this framework, g serves as an interpretive function that provides intuitive insights into the complex model f. The binary indicator z'_j assumes a value of 1 when feature j is present and 0 when absent. The parameter \phi_j quantifies the attribution value assigned to feature j, while \phi_0 represents the baseline prediction from a null model without any features. Mathematical derivation establishes that the attribution term \phi_j must satisfy the following unique representation:

\phi_j = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|! \, (|F| - |S| - 1)!}{|F|!} \left[ f_{S \cup \{j\}}(x_{S \cup \{j\}}) - f_S(x_S) \right]

In this formulation, \phi_j represents the Shapley additive explanation assigned to feature j, quantifying its specific attribution; F corresponds to the complete feature set undergoing interpretation, with |F| indicating its total element count; S refers to any feature subset derived from F that excludes feature j, where |S| specifies the number of features in this subset; and f_{S \cup \{j\}} and f_S indicate the model’s predictive outputs when feature j is incorporated and omitted, respectively.
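The Shapley attribution can be computed by brute-force subset enumeration for a handful of features, which makes the formula concrete (SHAP’s tree explainer computes the same quantity efficiently for tree ensembles). The model `f` below is a hypothetical additive toy, not the study’s LightGBM model:

```python
from itertools import combinations
from math import factorial

def shapley_values(model, n_features):
    """Exact Shapley attributions phi_j by enumerating every subset S of
    the other features (feasible only for small n_features).
    `model(present)` must return the prediction when only the feature
    indices in `present` are 'switched on'."""
    F = list(range(n_features))
    phi = [0.0] * n_features
    for j in F:
        others = [k for k in F if k != j]
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                # Coalitional weight |S|! (|F| - |S| - 1)! / |F|!
                w = (factorial(size) * factorial(n_features - size - 1)
                     / factorial(n_features))
                phi[j] += w * (model(set(S) | {j}) - model(set(S)))
    return phi

# Hypothetical additive model f(x) = 2*x0 + 3*x1 at x = (1, 1);
# an 'absent' feature falls back to a baseline value of 0.
def f(present):
    return (2.0 if 0 in present else 0.0) + (3.0 if 1 in present else 0.0)

phi = shapley_values(f, 2)
# For an additive model the attributions recover each term: phi = [2.0, 3.0].
```

The attributions also satisfy local accuracy: they sum to the difference between the full-model prediction and the empty-coalition baseline.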
5. Experiment
To quantify uncertainty in performance estimates, 95% confidence intervals were calculated for all metrics using stratified bootstrapping with 1000 iterations on the test set, preserving the original class distribution in each bootstrap sample (for detailed information, please refer to the Supplementary Material). The performance of each model is illustrated in Table 3. In addition to LightGBM and XGBoost, our experimental framework incorporates support vector machines (SVMs) as a benchmark algorithm for comparative analysis. Comparative analysis confirms LightGBM’s superiority across all evaluation dimensions. XGBoost shows competitive performance with an ROC-AUC of 0.773 [0.745–0.801] and recall of 0.562 [0.505–0.619], achieving an F1-score of 0.608 [0.560–0.656]. However, LightGBM maintains a clear lead with higher ROC-AUC (0.797 [0.770–0.824]), PR-AUC (0.589 [0.545–0.633]), recall (0.625 [0.570–0.680]), precision, and F1-score (0.860 [0.824–0.896]). Therefore, the LightGBM method is used for prediction.
The PR-AUC analysis is particularly informative given the class imbalance. LightGBM’s PR-AUC of 0.589 [0.545–0.633] substantially outperforms XGBoost (0.512 [0.468–0.556]) and SVM (0.385 [0.342–0.428]), demonstrating its superior ability to identify the minority class. The Brier Score further confirms LightGBM’s advantage in probability calibration (0.168 [0.158–0.178]), compared to XGBoost (0.187 [0.176–0.198]) and SVM (0.215 [0.202–0.228]). For LightGBM, the optimal hyperparameters were determined as: maximum depth = 4, minimum child samples = 5, and number of estimators = 200.
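The stratified bootstrap used for the reported confidence intervals can be sketched as below. The labels, probabilities, and the Brier metric are illustrative; the real procedure resamples the held-out test-set predictions for each metric in the same way:

```python
import random

def stratified_bootstrap_ci(y_true, y_prob, metric, n_boot=1000, seed=0):
    """95% percentile CI for `metric` via stratified bootstrap: positives
    and negatives are resampled separately so every replicate keeps the
    original class distribution."""
    rng = random.Random(seed)
    pos = [(t, p) for t, p in zip(y_true, y_prob) if t == 1]
    neg = [(t, p) for t, p in zip(y_true, y_prob) if t == 0]
    scores = []
    for _ in range(n_boot):
        sample = ([pos[rng.randrange(len(pos))] for _ in pos]
                  + [neg[rng.randrange(len(neg))] for _ in neg])
        ts, ps = zip(*sample)
        scores.append(metric(ts, ps))
    scores.sort()
    return scores[int(0.025 * n_boot)], scores[int(0.975 * n_boot) - 1]

def brier(y_true, y_prob):
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

y = [1, 1, 1, 0, 0, 0, 0, 0]           # imbalanced toy labels
p = [0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6, 0.2]
lo, hi = stratified_bootstrap_ci(y, p, brier, n_boot=1000)
```

Stratification matters here because an unstratified resample of a small, imbalanced test set can occasionally contain very few (or zero) minority-class instances, which would make metrics like recall or PR-AUC undefined or wildly unstable.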
For collision type prediction (Table 4), LightGBM again provides the best overall performance. It achieves the highest ROC-AUC (0.837 [0.795–0.879]), PR-AUC (0.523 [0.471–0.575]) and F1-score (0.632 [0.548–0.716]), along with the lowest Brier Score (0.142 [0.129–0.155]). While XGBoost matches LightGBM’s recall, it shows substantially lower F1-score and PR-AUC. The lower absolute values of PR-AUC compared to ROC-AUC across all models reflect the challenge of predicting the sparse “vulnerable road user involved” class. LightGBM’s superior PR-AUC (0.523 [0.471–0.575]) compared to XGBoost (0.485 [0.431–0.539]) and SVM (0.418 [0.362–0.474]) demonstrates its particular robustness in this highly imbalanced classification scenario. For the collision type model, the optimal LightGBM hyperparameters were: maximum depth = 4, minimum child samples = 44, and number of estimators = 200.
As expected, this resampling technique improved the recall for the minority class compared to training on the raw imbalanced data (e.g., recall increased from 0.51 to 0.625 for the LightGBM severity prediction model), demonstrating enhanced sensitivity. However, final evaluation metrics and all model interpretations (SHAP) are reported on the original test set to provide a realistic assessment of model performance and avoid over-optimism.
SHAP analysis quantifies the influence of predictor variables on both crash severity levels and collision types. The magnitude of SHAP values directly reflects the relative influence of features on model predictions.
Figure 1 summarizes the results of the SHAP analysis of serious crashes. It can be observed that operator type, mileage, contact area, year since crash, and vehicle pre-crash speed contribute most significantly to the model’s predictions. This indicates that it is crucial to investigate the vehicle characteristics of AVs. Simultaneously, a comprehensive investigation of the interdependencies among these variables becomes crucial, particularly regarding the impact of contextual elements like road type and lighting.
The predicted severity level varies with the operator type, as depicted in Figure 2. Autonomous driving functions of low SAE levels, such as Lane Centering Control (LCC), are already in use in a considerable number of vehicles. Some crashes may occur when consumers utilize these functions, and others occur when developers operate test vehicles. For consumer-operated vehicles, higher mileage is associated with a higher predicted probability of serious crashes. According to the model, for test-operator vehicles, lower mileage is associated with a higher predicted probability of serious crashes. This may coincide with the deployment of less mature software versions during early testing phases, although the exact causality cannot be established.
Next, Figure 3 illustrates the impact of mileage. As illustrated in the dependence plot, most of the vehicles involved in crashes that occurred more than eight years ago had mileage exceeding 80,000. These higher-mileage vehicles show a positive association with serious crashes. For more recently developed vehicles, higher mileage is linked to a greater increase in the predicted probability of serious crashes.
Highways and streets are associated with increased predicted severity, whereas intersections are associated with decreased predicted severity (Figure 4). This result is consistent with previous research [42,49]. The stronger association for streets may be linked to the diversity of traffic participants there, as vehicles may have difficulty coping with complex and changing traffic conditions, or more vulnerable road users may be present. For highways, the higher contribution may be related to higher speeds, as more kinetic energy causes more damage. It is notable that newer vehicles show a weaker association with serious crashes on highways and at intersections in the model. However, in parking lots and on streets, newer models are associated with a higher proportion of serious crashes in the dataset. One potential explanation for this discrepancy is that there are still unresolved corner-case problems for AVs on streets.
The conjecture about speed can be verified by another dependence plot (Figure 5). Streets and highways, which are associated with higher predicted severity, also tend to have crashes occurring at higher speeds.
Furthermore, among the precrash movements, proceeding straight and changing lanes are associated with increased predicted crash severity (Figure 6). Higher speeds exhibit a stronger positive association with crash severity when the vehicle is proceeding straight. A more nuanced and counterintuitive pattern is observed for turning vehicles: higher precrash speeds show a weaker association with increased severity in the model. This finding appears to contradict conventional traffic safety wisdom. While intriguing, it must be interpreted with caution, as it may reflect specific, unobserved contextual factors in the dataset rather than a generalizable safety principle. Several non-mutually exclusive hypotheses could explain this pattern within the AV-specific context: (1) AVs may be programmed to engage, or may only engage, higher speeds during turns in well-structured, low-risk environments (e.g., highway ramps with clear visibility, controlled test areas) where conflict probability is inherently low; (2) the dataset may lack granular details about turn geometry (e.g., radius, bank) or concurrent environmental conditions that fully define the risk scenario; (3) this statistical association may be influenced by confounding factors not included in the model. This result highlights the complexity of interpreting AV behavior from aggregate crash data and underscores the need for more detailed investigations to unpack the underlying conditions.
Vulnerable road users, like cyclists and pedestrians, are more likely to be injured in the absence of vehicle protection. At the same time, AVs are less capable of sensing small moving objects, posing more of a threat to them [4]. Therefore, vulnerable users deserve attention in the safety research of AVs. The following analysis of factors associated with crashes involving vulnerable road users (VRUs) is based on a substantially smaller subset of data compared to the overall crash severity model. The trends and associations reported below, while derived from our interpretable ML framework, should be regarded as preliminary and exploratory due to the limited sample size. They highlight potential signals that merit investigation with larger, dedicated datasets rather than offering definitive conclusions.
Environmental factors appear to be more prominent in the prediction of injuries sustained by vulnerable road users (Figure 7). The type of road is the most significant influencing factor, followed by precrash speed, mileage, year since crash, and then lighting and operator type.
At intersections, newer vehicles are associated with a higher predicted probability of injuries to vulnerable road users, while on highways, older vehicles show a stronger association with such injuries (Figure 8). This may be because newer vehicles have sophisticated algorithms that perform better in less complex environments like freeways, while older vehicles tend to take more cautious action at intersections.
Under dark-but-lighted and dusk conditions, older vehicles are associated with a higher likelihood of injuries to vulnerable road users in our model, whereas newer vehicles show a stronger association under full darkness and daylight conditions (Figure 9). This indicates that older vehicles exhibit driving behaviors more akin to those of humans, as dusk is an environment in which human drivers are particularly prone to collisions [19]. It is therefore important to include hazardous road conditions in future studies of crash injury severity involving AVs.
To assess the sensitivity of our interpretations to the use of SMOTE-NC, we compared the SHAP-derived feature importance rankings from the LightGBM model (trained with SMOTE-NC) with those from an auxiliary model trained without oversampling. While absolute SHAP values differed, the relative order of the top five most influential features remained consistent for both crash severity and collision type predictions. This suggests that the core interpretative insights regarding key risk factors are stable and not an artifact of the sampling strategy.
To assess the temporal stability and generalizability of the identified predictive patterns, we performed a temporal hold-out validation. The dataset was split chronologically, with crashes occurring before 2021 (approximately 70%) used for training and validation, and crashes from 2021 onward (approximately 30%) held out as a temporal test set. This split approximates a scenario where a model trained on past data is used to predict future incidents. The LightGBM model, configured with the previously determined optimal hyperparameters, demonstrated robust temporal generalizability for both prediction tasks. For crash severity, the model achieved an AUC of 0.781 on the temporal test set, compared to 0.797 on the random split. For collision type (VRU-involved vs. others), it attained an AUC of 0.821, versus 0.837 in the main analysis. The slight and comparable decreases in performance indicate that the feature importance patterns and associations learned by the model are reasonably stable over time and are not merely capturing transient historical artifacts. Detailed performance metrics for both tasks in this temporal validation are provided in Supplementary Tables S3.1 and S3.2.
6. Conclusions and Future Work
6.1. Conclusions
This research analyzed the effects of vehicle factors on crash severity and collision types using the AVOID dataset. LightGBM was chosen for its superior performance on metrics including PR-AUC, recall, and F1-score, and SHAP was adopted to interpret the associations between crash outcomes and vehicle factors.
The SHAP analysis revealed that operator type, mileage, contact area, years since the crash, and vehicle pre-crash speed were the most important predictive factors for crash severity in our model, indicating that these vehicle characteristics warrant close investigation. Operator type, which indicates whether the vehicle is operated by consumers or tested by businesses, is the most important factor. For consumer-operated vehicles, higher mileage is associated with a higher probability of serious crashes; overall, higher mileage is a prominent factor linked to serious crashes. Regarding road types, highways and streets are associated with increased severity, while intersections are associated with decreased severity; crashes on highways and streets tend to occur at higher speeds. Pre-crash movement is another important predictive factor: proceeding straight and changing lanes are associated with higher predicted severity, and the contribution of higher speed to severity is greater for vehicles proceeding straight than for turning vehicles.
Vulnerable road users deserve attention in safety research. In our exploratory analysis of the limited VRU-involved crash data, road type emerged as the most influential feature in the model, followed by pre-crash speed, mileage, time since the crash, lighting, and operator type. On highways, newer vehicles show a weaker association with injuries to vulnerable road users. Lighting is another important environmental factor that may interact with vehicle characteristics: under dark-but-lighted and dusk conditions, newer vehicles are associated with a lower likelihood of injuries to vulnerable road users, whereas the association is stronger under full darkness and daylight conditions.
The current study has provided insight into the vehicle factors that influence the outcomes of AVs-involved crashes. The results can also help manufacturers develop safer vehicles [
50] and provide a reference for transportation management agencies in regulating autonomous vehicles. For example, manufacturers can improve sensors and vehicle-control algorithms for crash-prone scenarios, and agencies can restrict the permitted level of automation based on the operating environment.
6.2. Limitations and Future Research
This study identified key vehicle, roadway, and environmental factors associated with the severity and type of AV-involved crashes using an interpretable machine-learning framework. It is important to emphasize that the reported relationships are statistical associations derived from the model and the historical dataset. They highlight potential risk indicators and areas for further investigation but do not confirm direct causation. These associations could be influenced by unmeasured or residual confounding.
Unmeasured confounding variables (e.g., specific software versions, detailed sensor configurations, traffic density) may influence both the predictor variables and the outcomes. Although the variable “year since crash” serves as a proxy for technological evolution, it does not reflect the precise software state at the time of the incident. Future research could benefit from richer datasets containing direct measures of AV system maturity (e.g., software version logs, disengagement reports) and employ quasi-experimental or causal inference frameworks to move beyond association towards causation.
Future research should prioritize study designs more conducive to causal inference, such as natural experiments or meticulously matched observational studies. Building on such designs, established methods from confounding research—including sensitivity analysis (e.g., the E-value [
51,
52]) and quantitative bias analysis [
53]—can be systematically employed to evaluate the potential impact of unmeasured confounding. Through the application of these methods, a more rigorous and quantified measure of confidence can be provided regarding the causal plausibility of the risk factors identified by the predictive model in this study.
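The E-value sensitivity analysis referenced above has a closed form: for an observed risk ratio RR ≥ 1, E = RR + sqrt(RR × (RR − 1)), i.e., the minimum strength of association an unmeasured confounder would need with both exposure and outcome to explain the estimate away. A minimal sketch (the example risk ratio is illustrative, not a result from this study):

```python
import math

# Sketch of the E-value computation for a point-estimate risk ratio.
# E = RR + sqrt(RR * (RR - 1)) for RR >= 1; protective estimates
# (RR < 1) are inverted before applying the formula.

def e_value(rr):
    """E-value for a point-estimate risk ratio (rr > 0)."""
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))

# Illustrative example: an observed risk ratio of 2.0 would require an
# unmeasured confounder associated with both exposure and outcome by a
# risk ratio of at least ~3.41 to fully explain it away.
print(round(e_value(2.0), 2))  # 3.41
```

Larger E-values thus indicate greater robustness of an identified risk factor to unmeasured confounding, which is the quantified confidence measure the text refers to.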
Furthermore, the binary categorization of crash severity and collision type, while necessary and justified for our specific analytical objectives given the dataset, represents a simplification of real-world complexity. It does not capture gradients within injury severity (e.g., separating fatal from severe injury) nor the distinct crash dynamics associated with different non-VRU collision partners (e.g., fixed object vs. truck). Future research with larger, more detailed datasets would enable more granular, multi-class analyses to uncover these finer-grained relationships.
Our investigation into crashes involving vulnerable road users (VRUs) is constrained by a very sparse dataset. The insights from this part of the analysis are inherently preliminary and serve primarily for hypothesis generation. A critical future direction is to compile a significantly larger, multi-source dataset dedicated to VRU-AV interactions to support more reliable and stable SHAP-based interpretation.
Additionally, our analysis surfaced specific findings, such as the association between higher turning speeds and lower predicted severity, that challenge conventional expectations. These points of divergence are not shortcomings but rather valuable outcomes of applying an interpretable framework to novel AV data. They serve as precise hypotheses for future research. We recommend that subsequent studies employ high-resolution data (e.g., simulation logs, detailed telemetry) to investigate the specific operational contexts and vehicle behaviors that give rise to such statistical patterns, moving from correlation towards a mechanistic understanding.