1. Introduction
Human error is the leading cause of vehicle crashes and incidents [1,2]. These errors include distracted driving, aggressive driving, fatigued driving, and driving under the influence of alcohol and drugs [3,4]. Automated vehicle (AV) technology is expected to mitigate such errors by reducing human involvement in operating the vehicle, thereby enhancing safety. AVs enhance road safety by reducing reliance on human drivers [5]. The Society of Automotive Engineers (SAE) classifies driving automation from Level 0 (no automation) to Level 5 (full automation) [6]. Using technologies such as Light Detection and Ranging (LiDAR), Radio Detection and Ranging (RADAR), and cameras, AVs maintain constant awareness of their surroundings and strictly follow traffic rules. By mitigating issues such as blind spots and poor visibility, they can reduce the risk of collisions caused by human error, potentially transforming transportation into a safer and more efficient system [7,8].
The introduction of vehicle automation has transformed road safety paradigms [9]. As the prevalence of Automated Driving Systems (ADS) and Advanced Driver Assistance Systems (ADAS-L2) increases, understanding the nature and outcomes of related crashes becomes critical [10,11]. An ADS refers to the hardware and software that collectively enable a vehicle to perform the entire dynamic driving task. Essentially, it is a system that can handle the complexities of driving without requiring constant human input, at least within its operational limits. ADAS at Level 2 (L2) is a set of technologies that provide both steering and acceleration or deceleration assistance to the driver. This means the vehicle can manage lane centering and maintain a safe following distance from other vehicles, but the driver must remain engaged and be ready to take over control at any time [12].
ADAS-L2 typically encompasses SAE Levels 1 and 2. In these levels, the system provides specific driving assistance, such as simultaneous speed and steering input, but critically, the human driver is required to remain fully engaged in the driving task and always retain ultimate responsibility. Common examples of ADAS-L2 include Forward Collision Warning (FCW), Automatic Emergency Braking (AEB), Lane Departure Warning (LDW), and Lane Keeping Assistance (LKA). In contrast, ADS refers to SAE Levels 3 through 5. At these higher levels, the vehicle is designed to perform the entire dynamic driving task autonomously within a defined operational design domain (ODD), significantly reducing or eliminating the need for constant human driver involvement. The fundamental differences in driver responsibility and system capabilities between ADAS-L2 and ADS are crucial, as they profoundly influence crash dynamics and the resulting severity outcomes, leading research to often analyze these categories separately [13,14,15,16].
The rapid development and widespread adoption of ADAS-L2 and ADS are fundamentally transforming the transportation sector. These technologies hold significant promise for enhancing road safety by mitigating human error, which is consistently identified as the predominant cause of crashes. Despite these advancements, traffic collisions continue to pose a substantial public health challenge [17,18,19,20,21], with statistics revealing approximately 40,000 fatalities annually in the United States alone [22]. The potential of ADAS-L2 and ADS to substantially reduce these figures underscores the critical importance of rigorously evaluating their safety performance.
ADS and ADAS-L2 technologies promise to enhance safety by reducing human error, maintaining safe distances, and supporting decision-making during complex driving scenarios [23]. However, as real-world deployments increase, questions remain about how these technologies perform under various environmental and roadway conditions, and how such conditions may influence crash outcomes [24]. Crash severity is a critical outcome that reflects the potential harm to occupants, pedestrians, and property. Identifying the contextual factors that contribute to more severe crashes is essential for both system design improvements and the development of appropriate infrastructure and regulatory responses [25]. Despite ongoing progress in automation, data-driven insights into the determinants of crash severity remain limited, particularly regarding the interaction between roadway, lighting, and weather conditions and automated technologies.
Therefore, the goal of this study is to examine the severity and contributing factors of crashes involving vehicles equipped with ADS and ADAS-L2, applying machine learning methods to national and state-level datasets from the National Highway Traffic Safety Administration (NHTSA). This research uncovers patterns and risk elements associated with ADS and ADAS-L2 operations, provides a deeper understanding of safety challenges, and offers insights for improving these technologies that can enhance public trust and promote their safe integration into existing transportation systems. The availability of standardized, publicly available crash reports from NHTSA offers a unique opportunity to fill this knowledge gap using statistical and machine learning techniques. The ability to conduct robust research on crash severity is fundamentally dependent on the availability and quality of crash data. In the U.S., NHTSA plays a central role in regulating and collecting data on crashes involving both ADS and Level 2 ADAS vehicles. Given the limited reporting coverage and high proportion of missing severity data in the ADAS-L2 cohort, the conclusions emphasize relative differences rather than absolute risk estimates.
2. Literature Review
The proliferation of vehicles equipped with ADS and ADAS-L2 technologies has significantly altered the landscape of transportation safety research. These systems aim to mitigate driver errors, enhance road safety, and ultimately reduce the severity of crashes. However, the integration of ADS and ADAS-L2 has introduced new challenges and complexities in understanding the real-world safety implications of these systems [26]. This review synthesizes existing literature to identify key research themes, highlight pivotal studies, address ongoing debates, recognize gaps, and propose directions for future research.
Many researchers have explored various aspects of these technologies, with their impact on safety being a key area of emphasis. Moradloo et al. examined complex crash scenarios in which AV performance is challenged; their results show that most edge-case incidents were caused by unplanned obstacles and unexpected environmental conditions [27]. Liu et al. [28] found that AVs require more perception-reaction (PR) time than human drivers, have a limited ability to predict lane changes, and exhibit deficiencies in time-space path planning relative to human drivers [28]. Han et al. used black-box, video-based methods to predict high-risk scenarios by monitoring movement direction and changes in object size, and showed that higher speeds increase injury severity and that 80% of drivers fail to perform avoidance maneuvers [29]. Categorizing AV crashes as preventable or unpreventable, and by cause and initiator, has also yielded insights into their safety [30,31]. Das et al. showed that AV-exclusive platoons performed similarly to traditional-vehicle (TV)-exclusive platoons in incident time [32]. Huang et al. found that crashes involving ADAS-L2-equipped vehicles had a higher rate of rear-end collisions than those involving ADS-equipped vehicles [33]. Crash patterns, roadway type, model year, surface condition, and incident time influenced outcomes such as contact location and injury severity in AVs [25].
Human interaction is another factor that affects ADAS-L2 and ADS performance. Forster et al. reported that responsibility-focused training for critical events improved perception, planned behavior, and takeover timing, whereas limitation-focused training showed no advantage over no training [34]. Another study by Zhang et al. showed that ADAS-L2 capability, as well as total mileage, was linked to increased trust and reduced visual tracking [10]. Research on gender differences found that female drivers were more concerned about safety and reliability and preferred hands-on experience [35].
Researchers have also employed simulation and evaluation tools. For instance, Kroger et al. proposed measurable levels of risk that provide consistent safety standards based on engineering, regulatory, and sociological considerations [36]. Ali et al. introduced OpenCat and reported that closely coupled benchmarks limited the ability to generalize across ADAS-L2 models and simulation tools [37]. One study demonstrated successful modular testing with dynamic control over obstacles and Vehicle-to-Everything (V2X) messaging in ADAS-L2/AD systems [38]. Singh et al. emphasized the impact of simulator-specific architectures and assumptions on ADAS-L2 evaluation and found that the use of generalized simulation platforms increases the reliability of results across platforms [39]. Scally et al. focused on partial overlap conditions between the lead vehicle and the driving lane to demonstrate how such edge cases affect the dependability of automatic emergency braking (AEB) and forward collision warning (FCW) systems [40].
Beck et al. highlighted the importance of sensor fusion and simulation in post-crash analyses [41]. A multi-case study of actual fatalities identified inadequate safety management systems (SMS), insufficient redundancy in failure mitigation, and subpar operational design domains [42]. Malin et al. reported that adding AEB and Electronic Stability Control (ESC) features to AVs resulted in a minor drop in injuries and fatalities nationwide [43]. Pimentel et al. [44] noted the shortcomings of the current framework, including the lack of coordination on measurement accuracy, validation standards, and AV safety guidelines, and presented the Enterprise-Wide Quality and Integrated Management System (EWQIMS) as a potential ISO-compliant method of assisting with safety analysis and AV development [44].
Environmental and roadway characteristics also contribute to crash severity. Crash injuries predominantly occur on streets, likely due to higher traffic volumes, more frequent stops, and increased pedestrian activity in urban environments. Avenues accounted for 15% of incidents, while boulevards had the fewest at 4%. This suggests that AV navigation systems may require further sophistication to handle the complexities of street-level traffic effectively. Other factors, such as intersection geometry, lighting conditions, traffic control signals, parking provisions, the number of vehicles involved, and weather/road surface conditions, were found to be less impactful but still notable predictors of injury outcomes. However, specific studies have observed that ADAS-L2 crashes were more common during unfavorable conditions like wet surfaces, adverse weather, and dark environments, whereas ADS crashes exhibited the opposite pattern. This observation highlights potential differences in the operational design domains and robustness of various ADAS-L2 and ADS [45,46,47]. Despite the advancements in understanding ADS/ADAS-L2 crash severity, significant gaps persist in the current literature, mainly stemming from limitations in data availability and quality, as well as challenges inherent in comprehensive crash investigations [47].
3. Data and Methodology
3.1. Data
The data used in this study were obtained from the NHTSA Standing General Order on Crash Reporting website [48], which consists of real-world crashes associated with ADS and Level 2 ADAS vehicles reported by manufacturers and operators. The dataset includes incidents involving ADAS-L2 and ADS since July 2021 and is categorized by specific variables, along with their respective percentages in the total dataset. Overall, the dataset consists of 1434 crashes associated with ADS and 2193 crashes related to ADAS-L2.
Table 1 presents the ADS crash dataset, including the distribution of the crashes by month and year to identify the months with the highest percentage of crashes. Moreover, the type of operator involved in each crash is investigated, as it provides crucial information about the nature of the crashes. Other contributing factors considered include vehicle and driver characteristics, roadway type and conditions, climatic and weather conditions, and lighting conditions, all of which are known to influence crash occurrence and severity. The final section of the table displays crash involvement and severity rates for ADS-related incidents. Similarly, Table 2 presents data for ADAS-L2-related crashes, following the same structure and analytical approach as Table 1.
The descriptive analysis of ADS-related crash data reveals distinct patterns in terms of temporal distribution, operator involvement, roadway characteristics, environmental conditions, and incident severity. The ADS crashes had the highest rate between August and December and increased year over year from 21.47% in 2022 to 37.37% in 2024. Environmental and roadway factors also play a significant role in ADS crashes. Most incidents occurred on streets (47.07%) and at intersections (39.68%), highlighting the challenges that urban environments pose for automated systems. The analysis of light conditions reveals that 58.31% of crashes occurred during the daytime, while 35.84% occurred during nighttime with artificial illumination, indicating that visibility remains an issue.
While ADS-involved crashes show an increasing trend, most crashes are not classified as severe: only 1.36% of the total crashes resulted in serious injuries and 0.12% resulted in death, while 82.02% of the crashes resulted in no injuries.
ADAS-L2-related crash data analyses highlight notable trends concerning the distribution of incidents across time, operator behavior, roadway conditions, environmental conditions, and crash severity. There is a seasonal cycle, with the highest percentages of crashes occurring from October to December, peaking in December (11.77%) and November (11.54%). A steady increase in crashes is also observed over the years, with 2024 accounting for the highest percentage (38.40%). Regarding roadway type, most reported crashes occurred on highways and freeways (36.27%), followed by streets (7.71%) and intersections (7.12%).
Lighting conditions were often unreported (61.04%). Among the known cases, daylight crashes were the most common (21.81%), followed by those occurring in dark but well-lit conditions (9.26%). Fixed objects (13.18%) and passenger cars (8.07%) were the most frequently involved crash partners in ADAS-L2 crashes. In terms of severity, although the proportion of fatal crashes (2.19%) was slightly higher than for ADS-related crashes, most incidents had unknown severity classifications (82.34%), making it difficult to assess overall crash outcomes with certainty.
Figure 1 compares the injury distribution between ADAS-L2 and ADS crashes. A clear difference emerges: ADAS-L2 crashes have a substantially higher proportion of injury outcomes compared to ADS crashes. This pattern aligns with the operational differences between the two systems. ADAS-L2 crashes still rely heavily on human supervision, which may lead to delayed reaction times or incomplete hazard detection, especially in unexpected scenarios. In contrast, ADS crashes, which largely occur during controlled testing, show a much lower injury rate, suggesting that ADS vehicles may be involved in collisions in less hazardous contexts or at lower speeds. This comparison highlights system type as an important factor influencing crash severity.
Figure 2 illustrates the interaction between lighting conditions and system type on crash severity. For ADAS-L2 crashes, injury outcomes increase notably under poor lighting, particularly in dark environments. This suggests that reduced visibility places greater demands on the human driver, thereby increasing injury risk when the driver must rapidly re-engage from partial automation. For ADS crashes, injury rates remain relatively low across lighting conditions. However, a modest elevation appears in dark or transitional (dawn/dusk) settings, indicating that environmental visibility still poses challenges even for higher-level automation. These differences underscore that lighting conditions have a more pronounced influence on severity for ADAS-L2 crashes than for ADS crashes, highlighting the combined impact of visibility and system type on crash outcomes.
3.2. Methodology
The raw datasets from the NHTSA contained crash data for ADS and ADAS-L2-involved vehicles, with each crash record described by categorical variables such as the time of incident, roadway characteristics, lighting conditions, weather conditions, crash partner, and crash severity. Initial exploration revealed that both datasets contained a significant number of entries with missing or ambiguous labels (Table 1 and Table 2), particularly in the “Severity” field, where many entries were recorded as “Unknown”. Therefore, to prepare the data for supervised machine learning analysis, several cleaning steps were applied. First, records with missing or undefined severity values were removed from the dataset to ensure reliable model training and evaluation. The severity variable was then recoded into a binary variable: “Injury” (including any entry labeled as “Minor”, “Major”, or indicating bodily harm) versus “No Injury” (records explicitly stating “No Injuries Reported”).
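For illustration, a minimal R sketch of this filtering and recoding step could look as follows; the data frame name ads_crashes and the column names are hypothetical placeholders rather than the variable names used in the actual analysis:

```r
library(dplyr)

# Drop records with missing or undefined severity, then recode to a binary outcome.
# "ads_crashes" and the column names are illustrative placeholders.
ads_clean <- ads_crashes %>%
  filter(!is.na(Severity), !Severity %in% c("Unknown", "")) %>%
  mutate(
    Injury = ifelse(Severity == "No Injuries Reported", "No Injury", "Injury"),
    Injury = factor(Injury, levels = c("No Injury", "Injury"))
  )
```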
Next, all categorical variables were converted into factors and expanded into dummy (indicator) variables, a transformation required by most machine learning algorithms. Variables such as lighting, roadway type, weather conditions, and crash partner were retained as key predictors. The time of the incident was excluded due to its inconsistent formatting. After these preprocessing steps, the cleaned dataset included only those crashes with clear severity outcomes and complete data across the selected variables, ensuring a consistent basis for model development and interpretation.
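A corresponding sketch of the factor conversion and dummy-variable expansion, here using caret's dummyVars(); the predictor names are again placeholders:

```r
library(caret)

# Illustrative predictor names; the retained variables are converted to factors
predictors <- c("Lighting", "RoadwayType", "Weather", "CrashPartner")
ads_clean[predictors] <- lapply(ads_clean[predictors], factor)

# One-hot (dummy) expansion of the categorical predictors
dv <- dummyVars(~ ., data = ads_clean[, predictors])
X  <- predict(dv, newdata = ads_clean[, predictors])   # numeric design matrix
y  <- ads_clean$Injury                                 # binary outcome factor
```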
The data were randomly split into training and testing sets, with 70% of records used for model training and 30% for validation. Model performance was evaluated on the test set using metrics including overall accuracy, sensitivity, specificity, balanced accuracy, and Cohen’s kappa. Variable importance analysis was performed for the random forest and XGBoost (eXtreme Gradient Boosting) models to interpret the most influential predictors of injury severity. All analyses were conducted using R, utilizing packages such as caret [49], randomForest [50,51], and xgboost [52,53]. Logistic regression was selected for its interpretability and statistical grounding, random forest for its ability to handle complex nonlinear interactions and estimate feature importance, and XGBoost for its high accuracy and efficiency in handling tabular data [54,55,56].
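A condensed sketch of this split-and-evaluate workflow in caret is shown below; the seed, cross-validation settings, and object names are illustrative assumptions (continuing the hypothetical preprocessing sketch), not the exact configuration used in the study:

```r
set.seed(42)  # illustrative seed

# 70/30 stratified split on the outcome
idx     <- createDataPartition(y, p = 0.70, list = FALSE)
train_x <- X[idx, ];  train_y <- y[idx]
test_x  <- X[-idx, ]; test_y  <- y[-idx]

# Example: random forest fitted through caret with 5-fold cross-validation
rf_fit <- train(x = train_x, y = train_y, method = "rf",
                trControl = trainControl(method = "cv", number = 5))
pred   <- predict(rf_fit, newdata = test_x)

# Accuracy, sensitivity, specificity, balanced accuracy, and Cohen's kappa
confusionMatrix(pred, test_y, positive = "Injury")
```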
Because the ADS dataset exhibited a highly imbalanced distribution between “Injury” and “No Injury” outcomes, model performance was evaluated using class-specific metrics, including recall, precision, and F1-score for the minority “Injury” class. These metrics provide a more reliable assessment than overall accuracy, which can be inflated by the majority class. No resampling or class-weighting techniques were applied in this study; however, the class imbalance is explicitly recognized and incorporated into the interpretation of the model results.
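For the minority “Injury” class, precision, recall, and the F1-score can be computed directly from the hold-out predictions, as in the following continuation of the sketch above:

```r
# Precision, recall, and F1 for the minority "Injury" class,
# derived from the hold-out predictions in the earlier sketch
tab <- table(Predicted = pred, Actual = test_y)
tp  <- tab["Injury", "Injury"]
fp  <- tab["Injury", "No Injury"]
fn  <- tab["No Injury", "Injury"]

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)   # sensitivity of the "Injury" class
f1        <- 2 * precision * recall / (precision + recall)
c(precision = precision, recall = recall, F1 = f1)
```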
Four supervised machine learning models were applied to predict crash injury severity: logistic regression, support vector machines (SVM), random forest, and XGBoost. Logistic regression serves as a baseline linear classifier that estimates the probability of injury as a function of crash-related predictors. SVM constructs an optimal hyperplane to separate injury and non-injury cases, allowing for nonlinear decision boundaries through kernel functions. Random forest is an ensemble of decision trees that captures complex interactions by aggregating multiple tree-based models to improve stability and reduce overfitting. XGBoost is a gradient-boosting algorithm that sequentially builds decision trees, optimizing each tree based on previous model errors, and incorporates regularization and efficient handling of categorical and missing data.
3.2.1. Logistic Regression
Logistic Regression is a supervised machine learning algorithm used for classification problems. Unlike linear regression, which predicts continuous values, it predicts the probability that an input belongs to a specific class. It is used for binary classification where the output can be one of two possible categories, such as Yes/No, True/False, or 0/1. It uses a sigmoid function to convert inputs into a probability value between 0 and 1 [55].
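A minimal R example of such a baseline model, fitted with glm() on the hypothetical cleaned data from the earlier sketches:

```r
# Baseline logistic regression: glm() with a binomial family models the
# probability of injury through the logistic (sigmoid) link function.
# Predictor and object names follow the hypothetical preprocessing sketch.
logit_fit <- glm(Injury ~ Lighting + RoadwayType + Weather + CrashPartner,
                 data = ads_clean[idx, ], family = binomial)

# Predicted injury probabilities on the hold-out records, thresholded at 0.5
p_injury <- predict(logit_fit, newdata = ads_clean[-idx, ], type = "response")
pred_lr  <- ifelse(p_injury > 0.5, "Injury", "No Injury")
```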
3.2.2. Random Forest
Random Forest is a widely used machine learning algorithm developed by Leo Breiman and Adele Cutler, which combines the output of multiple decision trees to reach a single result. Its ease of use and flexibility, coupled with its effectiveness as a random forest classifier, have fueled its adoption, as it handles both classification and regression problems [57].
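A brief sketch of fitting a random forest directly with the randomForest package and inspecting variable importance; the hyperparameter values are illustrative assumptions:

```r
library(randomForest)

# Random forest fitted directly with the randomForest package;
# importance = TRUE stores permutation-based variable importance
rf_model <- randomForest(x = train_x, y = train_y,
                         ntree = 500, importance = TRUE)

varImpPlot(rf_model)  # mean decrease in accuracy and Gini impurity per predictor
```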
3.2.3. SVM
A support vector machine (SVM) is a supervised machine learning algorithm that classifies data by finding an optimal line or hyperplane that maximizes the distance between each class in an N-dimensional space. SVMs are commonly used in classification problems. They distinguish between two classes by finding the optimal hyperplane that maximizes the margin between the closest data points of opposite classes [58,59].
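One possible way to fit an SVM within the same caret workflow is via the "svmRadial" method (which wraps the kernlab package); since the paper does not name the SVM implementation used, this is only an assumption:

```r
# SVM with a radial-basis kernel fitted through caret ("svmRadial" wraps kernlab);
# the dummy-coded predictors are centered and scaled before fitting
svm_fit  <- train(x = train_x, y = train_y, method = "svmRadial",
                  preProcess = c("center", "scale"),
                  trControl  = trainControl(method = "cv", number = 5))
svm_pred <- predict(svm_fit, newdata = test_x)
confusionMatrix(svm_pred, test_y, positive = "Injury")
```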
3.2.4. XGBoost
XGBoost is a distributed, open-source machine learning library that utilizes gradient-boosted decision trees, a supervised learning boosting algorithm that leverages gradient descent. It is known for its speed, efficiency, and ability to scale well with large datasets. Developed by Tianqi Chen from the University of Washington, XGBoost is an advanced implementation of gradient boosting with the same general framework; that is, it combines weak learner trees into strong learners by adding up residuals [60].
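A minimal xgboost sketch consistent with the workflow above; the hyperparameter values are illustrative assumptions, not the tuned settings from the study:

```r
library(xgboost)

# The dummy-coded feature matrix from the preprocessing sketch is already numeric,
# as required by xgboost; labels are encoded as 1 = Injury, 0 = No Injury
dtrain <- xgb.DMatrix(data = train_x, label = as.numeric(train_y == "Injury"))
dtest  <- xgb.DMatrix(data = test_x,  label = as.numeric(test_y  == "Injury"))

params <- list(
  objective        = "binary:logistic",
  eta              = 0.1,   # shrinkage (learning rate)
  max_depth        = 4,
  subsample        = 0.8,
  colsample_bytree = 0.8    # column sampling
)

xgb_fit <- xgb.train(params = params, data = dtrain, nrounds = 200,
                     watchlist = list(eval = dtest), verbose = 0)
```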
3.2.5. Model Comparison and Theoretical Basis
The four machine learning algorithms used in this study differ substantially in their assumptions, learning mechanisms, and suitability for complex crash datasets. Logistic regression assumes linear separability between classes, which limits its ability to capture nonlinear relationships inherent in crash dynamics [56]. Support Vector Machines (SVMs) address nonlinearity through kernel functions but can struggle with high-dimensional categorical variables and imbalanced classes [58,59]. Random Forests reduce overfitting by aggregating multiple decision trees, effectively capturing nonlinearities and interactions [57]; however, their performance may degrade when dealing with sparse categories or complex variable interactions.
XGBoost extends gradient-boosted decision trees by using sequential tree-building, regularization, and optimized handling of missing values [60]. Its ability to capture nonlinear, high-order interactions among roadway, environmental, and crash-partner variables makes it particularly well suited for ADS and ADAS-L2 crash data, which include numerous categorical variables and multiple interacting risk factors. XGBoost also incorporates shrinkage, column sampling, and an efficient split-finding algorithm, offering improved generalization and robustness compared to traditional ensemble methods. These characteristics make XGBoost theoretically advantageous for crash severity modeling in heterogeneous, real-world automated vehicle datasets.
The predictors included in the study represent all crash-related variables that were consistently reported in the NHTSA Standing General Order dataset and that have established relevance in the transportation safety literature. These variables, such as lighting conditions, roadway type, weather, and crash partner type, are among the few reliably available inputs across ADS and ADAS-L2 crash reports. XGBoost evaluates all predictors simultaneously and assigns importance scores based on their contribution to improving model performance. Therefore, the predictors highlighted in the study are those that the XGBoost model identified as the most influential in distinguishing between injury and non-injury outcomes. Their inclusion reflects both their theoretical significance in crash-severity research and their empirical importance within the available dataset.
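In practice, such importance scores can be extracted from a fitted xgboost model as sketched below, continuing the hypothetical example from Section 3.2.4:

```r
# Gain-based importance scores from the fitted XGBoost model
imp <- xgb.importance(feature_names = colnames(train_x), model = xgb_fit)
head(imp)                  # Gain, Cover, and Frequency for each predictor
xgb.plot.importance(imp)   # bar chart of the most influential features
```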
5. Discussion
The findings of this study highlight the factors contributing to crashes involving ADS and Level-2 ADAS technologies. ADS crashes are primarily linked to test vehicles operating in complex urban environments and most frequently occur on streets and at intersections. These incidents often happen under clear weather and dry surface conditions, suggesting that crash causes may be more closely related to system limitations and traffic complexity than to external hazards. The spatial analysis further supports these trends.
Figure 1 illustrates a high concentration of ADS crashes in California, Arizona, and Texas, reflecting the focused testing activity of AV companies, likely due to their favorable weather conditions that support consistent data collection and system performance.
In contrast, ADAS-L2 crashes are more geographically widespread, reflecting the broader adoption of these technologies by consumers. However, significant data gaps, especially in roadway surface, lighting, and severity reporting, limit a complete understanding of ADAS-L2 crash patterns.
Figure 6 illustrates the broader national spread of ADAS-L2-related crashes, with notable counts in New York, Florida, and Pennsylvania.
The results of the machine learning models provided additional depth by uncovering the key predictive factors associated with injury severity in crashes. For ADAS-L2 crashes, feature importance rankings pointed toward crashes involving fixed objects and low-light conditions on dark roadways as the top contributors to injury severity. These systems, which rely heavily on driver engagement, may not compensate adequately when human attention lapses, particularly in situations where visual cues are degraded, or the environment is non-uniform (e.g., rural or poorly lit roads).
Crashes involving ADS-equipped vehicles were predominantly no-injury incidents, with only 1.36% classified as serious and 0.12% as fatal. This aligns with the finding that most ADS crashes occurred in favorable environmental conditions: dry road surfaces (94.9%), clear weather (80.2%), and daylight (58.3%), suggesting that crash severity is influenced more by system performance limitations in complex traffic environments than by external hazards. The machine learning models, particularly XGBoost, identified other road user interactions (pedestrians, cyclists, motorcyclists), darkness, and fixed object collisions as key injury predictors, emphasizing the challenges ADS face in urban multimodal contexts, even under optimal visibility and surface conditions.
In contrast, ADAS-L2 crashes showed a slightly higher proportion of fatal outcomes (2.19%) and were disproportionately associated with poor visibility and fixed-object collisions. XGBoost revealed that lighting conditions, particularly “dark, not lighted” environments, along with wet roads and stationary hazards, were the most influential factors in predicting injury. Unlike ADS crashes, which are system-controlled, ADAS-L2 crashes reflect human behavior under partial automation. These results suggest that the presence of ADAS-L2 systems may not sufficiently compensate for driver inattention or misjudgment in adverse conditions. Furthermore, a large portion of the ADAS-L2 dataset (over 60%) had missing or unknown severity values, highlighting the limitations of current reporting standards and the need for improved crash documentation.
From a policy standpoint, this study reveals the urgent need to improve data reporting consistency in ADAS-L2 crash reports, especially in fields related to severity classification and environmental context. Federal agencies and manufacturers must coordinate efforts to ensure that future crash datasets are complete, standardized, and sufficiently detailed to allow meaningful machine learning-based safety evaluations.
Several limitations should be noted. The analysis was constrained by the relatively high proportion of missing or “unknown” severity outcomes in the ADAS-L2 dataset, which may introduce selection bias and limit the generalizability of the results. Additionally, the NHTSA Standing General Order data does not include detailed information on driver behavior, system engagement status, or precise crash dynamics, all of which could further elucidate the pathways to injury. Exposure data (e.g., vehicle miles traveled by system type) was also not available, precluding calculation of accurate risk rates.
While this study reveals temporal patterns in ADAS-L2 crashes, it relies on raw crash counts that are not normalized by the number of ADAS-L2-equipped vehicles in operation. As such, the observed upward trend over time, particularly in late 2024, may reflect increased adoption and deployment of these vehicles rather than a deterioration in system safety or reliability. Without adjusting for vehicle exposure (e.g., number of vehicles on the road or vehicle miles traveled), it is not possible to determine whether the crash risk per vehicle is increasing, decreasing, or remaining constant. Unfortunately, publicly available datasets providing consistent and granular information on the fleet size or VMT of ADAS-equipped vehicles remain limited. Future research should integrate such normalization metrics, which would allow for a more accurate evaluation of crash risk over time.
A key limitation of this study is the reliance on a single data source, the NHTSA Standing General Order crash reporting dataset. While this dataset is currently the only nationwide, standardized source that separately documents ADS and ADAS-L2 crashes, it contains substantial missing or unreported fields, particularly for severity, lighting, weather, and roadway characteristics in ADAS-L2 reports. These gaps may introduce selection bias, reduce statistical power, and limit the generalizability of the machine-learning results. Because no complementary national dataset provides equivalent ADS/ADAS-L2 distinctions, cross-validation with external datasets was not feasible in the present study. This study did not evaluate crash severity under combined environmental or roadway scenarios due to limitations in the completeness, consistency, and sample size of the NHTSA SGO dataset. Many variables include substantial missing or categorical fragmentation, making it infeasible to construct reliable multi-factor scenarios. As a result, the analysis focused on individual predictors, and future research with richer datasets is needed to examine combined-condition effects.
This study did not analyze state-level differences in ADAS-L2 crashes, so the findings reflect overall national patterns rather than state-specific characteristics. Regional factors such as reporting practices, roadway environments, climate, and ADAS adoption levels may influence crash outcomes. Future research should examine these state-by-state variations to better understand how local conditions shape ADAS-L2 crash severity.
Future research should seek to address these limitations by integrating exposure measures, more granular operational and behavioral data, and potentially linking crash reports with vehicle telemetry and roadway inventory datasets. Longitudinal studies examining changes in risk profiles as ADS and ADAS-L2 technologies mature and as deployment environments diversify will be critical.
Moreover, the ADS dataset shows a pronounced class imbalance, with more than 80% of cases categorized as “No Injury”. As a result, traditional accuracy metrics may overestimate model performance. In this study, we emphasized class-specific evaluation metrics, particularly recall and F1-score for the injury class, to more accurately capture performance on minority outcomes. XGBoost achieved the highest injury recall, indicating relatively strong sensitivity to rare but critical injury events. Nonetheless, future studies should incorporate resampling strategies, cost-sensitive learning, or synthetic data generation (e.g., SMOTE) to further mitigate imbalance effects and strengthen minority-class prediction.
Future work should also incorporate additional datasets to validate the robustness and applicability of the findings. Potential sources include the Fatality Analysis Reporting System (FARS) and the Crash Report Sampling System (CRSS), which offer detailed national crash information; state-level crash databases with richer roadway and environmental attributes; manufacturer disengagement and operational design domain (ODD) reports; and naturalistic driving datasets such as SHRP2. Although these sources do not explicitly differentiate ADS and ADAS-L2 crashes in the same manner as the NHTSA SGO dataset, integrating them through variable harmonization or linkage methodologies could enable broader risk estimation and more comprehensive model validation.
Although ADS and ADAS-L2 crashes likely vary across states due to differences in roadway design, infrastructure quality, and traffic regulations, no national database currently links ADS/ADAS-L2 crashes to detailed policy or infrastructure attributes. As a result, the present study focuses on national-level patterns, and future work should incorporate standardized state-level policy and roadway datasets when such information becomes available.
6. Conclusions
This study examined crash severity outcomes associated with ADS and ADAS-L2, using crash data from the NHTSA. By applying data cleaning, recoding, and supervised machine learning techniques to real-world crash data, this study aimed to identify the most critical factors contributing to injury severity in crashes involving ADS and ADAS-L2-equipped vehicles.
Descriptive findings revealed distinct temporal, spatial, and environmental patterns for ADS and ADAS-L2 crashes. ADS crashes predominantly occurred on urban streets and intersections, under clear weather and daylight conditions, with the highest concentration reported in California, Arizona, and Texas, states known for concentrated AV testing. These incidents were mainly associated with commercial or test vehicles and typically resulted in property damage or minor injuries, with fatal crashes comprising only 0.12% of the dataset. In contrast, ADAS-L2 crashes were more geographically dispersed and often occurred on highways or freeways. A higher percentage of fatal crashes (2.19%) was observed among ADAS-L2 incidents.
Machine learning models were developed separately for ADS and ADAS-L2 datasets to predict crash injury severity. Among the models tested, XGBoost consistently achieved the highest performance across both datasets, demonstrating strong accuracy, precision, and recall. For ADS crashes, injury severity was most influenced by interactions with other road users (pedestrians, cyclists, motorcyclists), poor lighting conditions, and collisions with fixed objects or SUVs. For ADAS-L2, key predictors of injury severity included fixed object collisions, dark roadways, and wet surfaces.
From a policy and engineering standpoint, this research underscores the need for several key actions. First, federal and industry stakeholders should improve the consistency and completeness of crash reporting, particularly for ADAS-L2 events where missing severity and roadway condition data limit reliable inference. Second, system developers must prioritize enhancing object detection, pedestrian awareness, and decision-making capabilities of ADS in complex urban environments. Third, human–machine interface improvements and driver training protocols are critical for ADAS-L2 users, particularly to address challenges associated with partial automation during adverse conditions.
To better align these findings with regulatory practice, this study also highlights the need for more detailed, scenario-based evaluation within existing safety standards. Specifically, targeted updates to Federal Motor Vehicle Safety Standards (FMVSS) procedures could incorporate real-world test conditions identified in this study, such as pedestrian interaction scenarios for ADS and limited-visibility or wet-surface scenarios for ADAS-L2. These additions would help regulatory authorities more accurately assess system performance under conditions where higher-severity crashes are most likely to occur.
Overall, this study contributes to the growing body of literature on the real-world safety performance of automated and semi-automated vehicles. By identifying key predictors of injury severity and highlighting systemic data limitations, the findings provide actionable insights for system developers, policymakers, and researchers striving to improve safety outcomes in the era of vehicle automation.