Structured Risk Identification for Sustainable Safety in Mixed Autonomous Traffic: A Layered Data-Driven Approach

Hyorim Han; Soongbong Lee; Jeongho Jeong; Jongwoo Lee

doi:10.3390/su17167284

Korea Transport Institute, Sejong-si 30147, Republic of Korea

* Author to whom correspondence should be addressed.

Sustainability 2025, 17(16), 7284; https://doi.org/10.3390/su17167284

This article belongs to the Collection Accident Prevention and Risk Management for Safe and Sustainable Transportation

Abstract

With the accelerated commercialization of autonomous vehicles, new accident types and complex risk factors have emerged beyond the scope of existing traffic safety management systems. This study aims to contribute to sustainable safety by establishing a quantitative basis for early recognition and response to high-risk situations in urban traffic environments where autonomous and conventional vehicles coexist. To this end, high-risk factors were identified through a combination of literature meta-analysis, accident history and image analysis, autonomous driving video review, and expert seminars. For analytical structuring, the six-layer scenario framework from the PEGASUS project was redefined. Using the analytic hierarchy process (AHP), 28 high-risk factors were identified. A risk prediction model framework was then developed, incorporating observational indicators derived from expert rankings. These indicators were structured as input variables for both road segments and autonomous vehicles, enabling spatial risk assessment through agent-based strategies. This space–object integration-based prediction model supports the early detection of high-risk situations, the designation of high-enforcement zones, and the development of preventive safety systems, infrastructure improvements, and policy measures. Ultimately, the findings offer a pathway toward achieving sustainable safety in mixed traffic environments during the initial deployment phase of autonomous vehicles.

Keywords: proactive risk management; high-risk situation prediction; layered safety framework; traffic safety indicators; mixed traffic environment; AHP (analytic hierarchy process); sustainable safety; autonomous vehicles

1. Introduction

Worldwide, technological development and institutional preparation for the commercialization of autonomous vehicles are actively underway. In particular, real-world demonstrations, legal system enhancements, and infrastructure development for fully autonomous driving (level 4 or higher) have been implemented in the United States, Europe, and Korea, and the accelerated adoption of autonomous vehicles necessitates a new paradigm in traffic safety management [1,2]. These changes have led to the emergence of new accident types and risk factors that are difficult to address with existing traffic safety management methods alone.

Unlike conventional accidents, such as simple driver judgment errors and collisions between vehicles, autonomous vehicle accidents have fundamentally different characteristics due to the complex involvement of factors, including sensor recognition errors, system malfunctions, and digital communication failures [3,4]. These complex, system-oriented accident mechanisms highlight the importance of “prevention” compared to “follow-up actions” in responding to autonomous vehicle accidents, indicating that it is essential to construct a system capable of detecting high-risk situations in advance and adequately responding to them.

Accordingly, studies have actively focused on risk-based assessment and real-time prediction model development to prevent autonomous vehicle accidents [5,6]. In particular, there is a growing need for a comprehensive risk prediction system that considers the characteristics of road segments, weather conditions, and interactions with surrounding objects in addition to the driving conditions of the vehicle, which requires an integrated approach beyond conventional single-variable analysis [7]. Most existing models, however, were developed in limited environments (e.g., highways) or focused solely on vehicle-level risk prediction, limiting their ability to fully reflect the complexity and diversity of actual traffic environments.

To overcome these limitations, the risk factors required to detect and prevent high-risk situations before accidents in the urban traffic environment with mixed autonomous and conventional vehicles were systematically derived in this study. To this end, data from various sources (e.g., literature, accident videos, and accident history) were comprehensively collected and analyzed, and variable importance was quantified through expert consultation. Furthermore, high-risk prediction indicators applicable to road segments and individual autonomous vehicles were separately derived to support the development of a customized risk response system based on real-time detection.

2. Literature Review

2.1. Literature on Accident Factors in Mixed Traffic with Autonomous Vehicles

Previous studies have examined the causes of autonomous vehicle accidents from various perspectives, including road environment, driver behavior, and system errors. In terms of road infrastructure, intersections, lane count, dedicated lanes, and on-street parking were found to be factors that influence accident occurrence [8,9,10]. Variable facility complexity and specialized road sections were also found to affect driving safety. In particular, junction and deceleration lane design were found to increase risk [11]. Li et al. (2024) empirically demonstrated that lane-changing maneuvers on highways significantly increase the risk of collisions when combined with external factors such as lighting conditions and congestion [12]. This suggests that not only road structures but also driving environments and behaviors interact in complex ways to elevate accident risk.

With respect to the characteristics of traffic flow, the speed deviation, congestion, and upstream traffic were closely associated with autonomous vehicle accidents [7,13]. In this context, Chen et al. (2023) analyzed the differences in lane-changing behavior between autonomous and conventional vehicles, indicating that interactions between the conservative behavior of autonomous vehicles and the aggressive driving characteristics of human drivers may trigger collision risks [14]. This highlights that, in addition to static factors, dynamic behavioral differences also serve as critical contributors to accidents. Environmental factors such as inclement weather (e.g., rain, snow, and fog) or poor illumination conditions were identified as significant accident risk factors [15,16,17].

In terms of interactions with moving objects, errors by pedestrians, bicyclists, and other vehicle drivers were found to be the primary causes of autonomous vehicle accidents. In particular, rear-end collisions were frequently reported [8,10]. Additionally, cognitive and control errors in autonomous driving systems have been consistently identified as major accident causes [17,18].

2.2. Literature on Survey Items for Autonomous Vehicle Accidents

To investigate traffic accidents in a mixed traffic environment that involves autonomous vehicles, diverse information should be collected in an integrated manner, and related studies suggest that the survey items should be systematically constructed. Road geometric structures (e.g., lane count, curvature radius, and intersection types) are closely related to the occurrence of accidents, and surrounding facilities (e.g., on-street parking lots and dedicated lanes) also affect accident severity [19,20,21].

Vinoth and Sasikumar (2024) emphasized that accident investigation items should include lane markings, sign recognition, and crosswalk locations to accurately analyze the causes of failures in autonomous perception systems [22]. This suggests the need to structure accident causality not only in terms of physical road features but also from the perspective of perceptibility.

Variable and temporary facilities must be recorded because changes in road conditions, such as construction zones and temporary lanes, can affect autonomous driving system (ADS) operation [20,23]. For items related to the driving behavior of vehicles, the vehicle speed, acceleration, inter-vehicle distance, and brake operation status are repeatedly mentioned, and they are used as key items of platooning risk assessment [24,25]. Jeong et al. (2023) suggested that collecting real-time interaction data between the vehicle and surrounding objects in the moments leading up to a crash is useful not only for identifying accident causes but also for developing future risk prediction models [11]. They specifically argued that items such as brake reaction time, the timing of obstacle recognition, and whether evasive maneuvers were attempted should also be included.

As for environmental variables, road conditions, weather, lighting, and location information are closely related to the cognitive accuracy of autonomous driving systems, and they were considered essential in almost every study [26,27,28]. It was also emphasized that interaction data with surrounding objects is necessary to assess the situation immediately before an accident, and that object identification and distance measurement are required [23,29].

Regarding digital elements, the operation status, failure record, and mode-change timing of the autonomous driving system should be recorded in DSSAD (Data Storage System for Automated Driving) and the EDR (Event Data Recorder) as key data for accident-cause analysis and liability attribution [25,28,30].

2.3. Literature on Autonomous Vehicle Risk Scenarios

In constructing autonomous vehicle risk scenarios, previous studies commonly consider infrastructure, environmental conditions, moving objects, and sensors. In terms of road infrastructure, road geometric structures (e.g., intersections, lane count, and curvature) are frequently used as the background for scenarios, and conflicts in intersections or cut-ins in lanes are repeatedly handled as major accident situations [31,32,33].

Chen et al. (2023) classified risk scenarios by road type, traffic signals, and vehicle interactions for simulation-based safety evaluations of autonomous driving, aiming to systematically replicate real accident conditions through various combinations [14]. This study enhanced the comprehensiveness and effectiveness of scenario design by emphasizing multi-layered structuring of road and object conditions.

Fixed infrastructure, including traffic lights and signs, is also essential for evaluating autonomous vehicle recognition and response. Weather and lighting conditions are reflected in almost all scenarios, as they are directly related to the cognitive accuracy of the autonomous driving system [34,35,36]. Weather variation, the time of day, and lighting availability directly affect sensor cognitive performance. In actual accident cases, restricted visibility was identified as a primary cause. Qu et al. (2025) identified types of autonomous driving risk scenarios based on domestic accident footage and proposed commonly included elements for each scenario, such as “illumination by time of day,” “multiple object movements within intersections,” and “rate of inter-vehicle distance reduction” [37]. They emphasized the importance of constructing context-specific scenarios by incorporating accident contexts frequently observed in Korean urban environments (e.g., illegal U-turns and traffic signal violations by bicycles).

Moving objects and dynamic variables are key elements that reflect the interactions between autonomous vehicles and surrounding objects [32,38,39]. Sudden stops, vehicle lane changes, and pedestrian crossings are commonly included in scenarios as high-risk behaviors, and dynamic information (e.g., relative position, velocity, and TTC (time to collision)) is also considered. Sensor and digital elements are handled as essential conditions to reflect technical limitations, including sensor viewing angles, obstacle shading, and sensor errors [33,35,36].

2.4. Literature on Traffic Safety-Related Indicators in Mixed Traffic with Autonomous Vehicles

To predict high-risk situations in advance in mixed traffic environments, traffic safety indicators that quantitatively capture pre-accident risk signals are needed. Previous studies have mainly classified these indicators into three categories. First, driving behavior indicators include the inter-vehicle distance, time headway, lane departure rate, and longitudinal and lateral acceleration [3,40].

Guo et al. (2023) proposed additional indicators beyond driving behavior metrics, such as the brake response delay time and deceleration pattern variation in autonomous vehicles [41]. By analyzing actual autonomous driving simulation data, they demonstrated that subtle deceleration changes just before abrupt stops serve as key predictive signals of impending accidents. This highlights the need to incorporate dynamic predictive elements that static threshold-based indicators alone cannot capture.

Second, inter-vehicle interaction indicators include the time to collision (TTC), post encroachment time (PET), total impact time (TIT), crash potential index (CPI), and collision avoidance rate (CAR) [4,42,43]. Third, weather and road environment indicators include the precipitation effect, view range, road surface condition, reflective intensity, road curvature, and lane count [34,43,44].

Tang et al. (2025) comparatively analyzed actual autonomous driving footage and manual vehicle operation records and identified “relative speed variation” and “lateral approach angle” as core predictive indicators of collisions in mixed traffic environments [45]. They particularly recommended using the “Lateral Conflict Indicator” in conjunction with PET for lateral interactions at intersections or merging points, suggesting a way to address the limitations of traditional longitudinal-focused metrics in perceiving complex situations.

As such, recent studies have proposed composite indicators based not only on traditional TTC or distance-based metrics, but also on dynamic changes in actual driving behavior and object interactions. To enhance the accuracy of high-risk situation prediction in mixed traffic environments, the integration of static and dynamic indicators is increasingly emphasized.

2.5. Literature on High-Risk Situation Prediction Models in Mixed Traffic with Autonomous Vehicles

Studies on real-time high-risk situation prediction in mixed traffic environments have evolved around traffic flow characteristics, environmental conditions, and digital elements. Most focus on quantifying accident likelihood using structured data. Traffic flow characteristics and driving behavior variables are central to most prediction models. Average speed, speed standard deviation, and occupancy rate are closely related to driving stability and are effective in quantifying accident risks [5,6,7,13,26].

Meng et al. (2023) proposed a model for predicting high-risk situations just before entering intersections using key variables such as speed variation, intersection approach distance, and longitudinal gap [46]. They quantitatively demonstrated that speed deviations and abnormal acceleration patterns occurring in mixed flows of autonomous and manually driven vehicles increase collision probability, suggesting the potential for real-time detection-based warning systems.

Environmental and weather variables complement traffic flow variables to enhance predictive performance, while visibility, road wetness, and temperature increase accident risks by affecting vehicle control and driver response [5,6,7,17]. These variables are primarily collected through external sensor data and serve as critical complementary elements in real-time fusion-based prediction models.

Through real-world driving data, Khelfa et al. (2023) demonstrated that weather changes and road surface conditions are directly linked to sensor perception failures in autonomous vehicles [47]. They introduced “sensor perception accuracy degradation rate” as a new independent variable to enhance model sensitivity. Emphasizing the need for digital infrastructure support in adverse weather conditions, they also highlighted the feasibility of applying HD map correction cycles and infrastructure-cooperative prediction models.

Digital communication and infrastructure variables are unique risk factors in autonomous driving, and communication failure, the construction of HD maps (high-definition maps), and road infrastructure quality are used as input variables of prediction models [4,26].

Gu et al. (2022) developed a sub-model composed solely of digital risk factors by incorporating V2X communication failures and fault probabilities in road segments without high-definition maps [48]. The study empirically demonstrated the independent predictive impact of digital elements by forecasting the conditions under which sensor-communication fusion may fail.

Wang et al. (2024) developed a deep learning-based extreme gradient boosting (XGBoost) model focused on predicting decision-making errors of autonomous vehicles in mixed traffic environments [49]. By training on time-series data from the 10 s preceding an accident, the model could predict not only the onset but also the duration of high-risk situations. They integrated vehicle driving logs and sensor statuses, suggesting the potential for real-time warning system development.

As such, recent studies on high-risk prediction models have expanded beyond single-variable analyses toward multidimensional predictive frameworks that integrate traffic flow and environmental and digital variables. Notably, these models quantitatively reflect the interactive characteristics and system limitations specific to mixed traffic environments, distinguishing them from conventional traffic accident prediction models.

2.6. Research Differentiation

Previous studies commonly state that autonomous vehicle accidents are influenced by both environmental conditions (e.g., weather, illumination, and road geometry) and system factors (e.g., ADS operation status and fallback execution), unlike conventional vehicles, and that such system data must be included in accident investigations. Since most of them analyze causes after accidents based on the California DMV (Department of Motor Vehicles) accident reports or autonomous driving mode disengagement reports, research on predicting high-risk situations in the pre-accident phase remains relatively insufficient.

This study assumes autonomous vehicle operation in mixed traffic and aims to detect high-risk situations early in the pre-accident phase and derive preventive factors. To this end, data from diverse sources (literature meta-analysis, accident history, video footage, and expert seminars) were integrated and analyzed to identify high-risk factors using a data-driven approach. Subsequently, an importance evaluation based on expert prior knowledge was conducted using the analytic hierarchy process (AHP), enabling the quantitative integration of heterogeneous data.

The study also designed a model for predicting high-risk situations by structuring road segment-based risk indicators and autonomous vehicle-level risk indicators in parallel. This makes it possible to precisely detect risks in dual structures beyond single-dimensional prediction by separately constructing space-based variables (e.g., traffic flow, road geometry, and weather conditions), variables based on individual vehicles (e.g., vehicle driving behavior and sensor errors), and digital-related variables (e.g., perception errors).

3. Methodology

3.1. Overall Research Flow

This study aims to propose a framework for predicting high-risk situations in road environments where autonomous and conventional vehicles coexist. To this end, the causes of high-risk situations were analyzed from multiple perspectives and structured into quantifiable indicators. The feasibility of designing a prediction model using the indicators as input variables was then examined. The research comprises the following five main steps. Figure 1 presents the overall research flow.

Figure 1. Overall research flow.

First, in identifying high-risk factors, five types of data were used to reflect various aspects, such as accident context, driver behavior, infrastructure impacts, and autonomous driving system errors. The data used included literature-based meta-analysis, accident history analysis, accident video analysis, autonomous driving video analysis, and expert seminars. Based on these data, multidimensional high-risk factors were comprehensively collected and refined.

Second, the identified high-risk factors were systematically classified with reference to the six-layer structure of the PEGASUS joint project (Project for the Establishment of Generally Accepted Quality Standards, Tools, Methods, Processes, and Scenarios for the Approval of Autonomous Driving Functions), which was reorganized into an analysis-oriented layer system in accordance with the purpose of this study. While the existing PEGASUS system focused on the verification of autonomous driving systems, this study defined each layer as a category closely related to accidents and driving data, considering applicability to the prediction model and the feasibility of data mapping.

Third, for relative importance analysis, the AHP was applied based on the reorganized layer system. The AHP is a multi-criteria decision-making (MCDM) technique. In this study, a panel of 14 experts evaluated relative importance among layers and factors using the pairwise comparison method. The weight for each factor was then calculated through a consistency check. Based on this, the priorities of high-risk factors were quantitatively derived.

Fourth, in the step of deriving high-risk indicators and designing the prediction model, input indicators applicable to the actual prediction model were derived based on the AHP results, and a conceptual framework was developed. Specifically, a candidate set of observable indicators was compiled for each high-risk factor based on the existing literature, and representative indicators were selected through a follow-up survey with the same expert group. Responses were collected using the complete ranking method, and average ranks were calculated to identify the indicators best representing each factor. Based on the derived indicators, a high-risk situation prediction framework was conceptually presented from the perspectives of road segments and individual autonomous vehicles.

3.2. Meta-Analysis Methodology

Meta-analysis is a technique to comprehensively determine the effectiveness or tendency of a particular variable by statistically integrating the results reported from individual studies [50]. In this study, meta-analysis was applied to quantify high-risk factors for autonomous vehicles using existing literature. Its purpose is to secure basic data for subsequent quantitative assessment (AHP) by systematically analyzing the likelihood and significance of accident-related factors [50,51].

The analysis was conducted using two approaches, depending on the data type. First, binary meta-analysis calculates the risk ratio (RR) based on the presence or absence of a specific high-risk factor in each study [50,52]. The RR is defined as follows:

$$RR = \frac{a / (a + b)}{c / (c + d)} \tag{1}$$

where a, b, c, and d are the frequencies according to the presence or absence of the factor and the occurrence of accidents, respectively. The standard error of the log-transformed RR is calculated as follows.

$$SE(\log(RR)) = \sqrt{\frac{1}{a} - \frac{1}{a + b} + \frac{1}{c} - \frac{1}{c + d}} \tag{2}$$
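As a worked illustration of Equations (1) and (2), the following minimal sketch computes the RR, the standard error of its logarithm, and an approximate 95% confidence interval. The cell layout of the 2x2 table (a and b for "factor present", c and d for "factor absent") is the conventional one and is assumed here, the counts are hypothetical, and the function names are illustrative only.

```python
import math

def risk_ratio(a, b, c, d):
    """Equation (1). Assumed layout of the 2x2 table:
    a = factor present, accident      b = factor present, no accident
    c = factor absent, accident       d = factor absent, no accident
    """
    return (a / (a + b)) / (c / (c + d))

def se_log_rr(a, b, c, d):
    """Equation (2): standard error of log(RR)."""
    return math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))

def rr_confidence_interval(a, b, c, d, z=1.96):
    """Approximate confidence interval, built on the log scale."""
    log_rr = math.log(risk_ratio(a, b, c, d))
    se = se_log_rr(a, b, c, d)
    return math.exp(log_rr - z * se), math.exp(log_rr + z * se)

# Hypothetical counts, not taken from the study.
a, b, c, d = 30, 70, 15, 85
rr = risk_ratio(a, b, c, d)
se = se_log_rr(a, b, c, d)
lower, upper = rr_confidence_interval(a, b, c, d)
print(f"RR = {rr:.2f}, SE(log RR) = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval whose lower bound lies above 1 would suggest that the factor is associated with a higher accident likelihood in that hypothetical table.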

Second, effect size meta-analysis is a method for calculating the standardized effect size (ES) in consideration of the influence of each study (e.g., the number of citations and the influence of the journal) and deriving the overall average effect size by applying the weight for each study ($w_{i} = 1 / SE_{i}^{2}$) [53]. The overall effect size is calculated as follows.

$$ES_{total} = \frac{\sum w_{i} ES_{i}}{\sum w_{i}} \tag{3}$$

For both analyses, a random effects model was applied to account for heterogeneity across studies. This allowed for a comprehensive estimation of the effects of high-risk factors under various conditions without over-reliance on any single study [52,53].
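The pooling in Equation (3) can be sketched as follows. The text states that a random effects model was applied but does not name the estimator of the between-study variance, so the DerSimonian-Laird estimator used below is an assumption, as are the function names and the input values. Under a random effects model, the weight in Equation (3) is usually extended to $1 / (SE_{i}^{2} + \tau^{2})$, where $\tau^{2}$ is the between-study variance.

```python
def pooled_fixed(es, se):
    """Equation (3) with inverse-variance weights w_i = 1 / SE_i^2."""
    w = [1 / s ** 2 for s in se]
    return sum(wi * e for wi, e in zip(w, es)) / sum(w)

def dersimonian_laird_tau2(es, se):
    """Between-study variance (assumed estimator: DerSimonian-Laird)."""
    w = [1 / s ** 2 for s in se]
    mean_fixed = pooled_fixed(es, se)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (e - mean_fixed) ** 2 for wi, e in zip(w, es))
    df = len(es) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - df) / c)

def pooled_random(es, se):
    """Equation (3) with random-effects weights 1 / (SE_i^2 + tau^2)."""
    tau2 = dersimonian_laird_tau2(es, se)
    w = [1 / (s ** 2 + tau2) for s in se]
    return sum(wi * e for wi, e in zip(w, es)) / sum(w)

# Hypothetical effect sizes and standard errors for three studies.
effect_sizes = [0.2, 0.5, 0.9]
std_errors = [0.1, 0.2, 0.1]

fixed = pooled_fixed(effect_sizes, std_errors)
tau2 = dersimonian_laird_tau2(effect_sizes, std_errors)
random_effects = pooled_random(effect_sizes, std_errors)
print(f"fixed = {fixed:.3f}, tau^2 = {tau2:.3f}, random = {random_effects:.3f}")
```

When the studies disagree strongly, $\tau^{2}$ grows and the random-effects weights become more even, which is how the model avoids over-reliance on any single precise study.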

3.3. AHP (Analytic Hierarchy Process) Methodology

AHP is a representative MCDM technique that quantifies relative importance via pairwise comparisons among multiple criteria and alternatives. In this study, AHP was applied to evaluate the relative importance of the identified high-risk factors and determine their priorities as input variables for the prediction model. AHP has been widely used in the fields of traffic safety and autonomous driving risk analysis as well, because it enables structural comparisons and can quantify expert judgments [54,55].

The AHP procedure proceeds as follows. First, a pairwise comparison matrix among high-risk factors or layers is constructed, and the relative importance of each element is evaluated on a scale of 1 to 9 points. The weight vector is then calculated via eigenvalue analysis, and judgment consistency is evaluated through a consistency test [56]. The consistency index (CI) and consistency ratio (CR) are defined by the following formulas.

$$CI = \frac{\lambda_{\max} - n}{n - 1}, \qquad CR = \frac{CI}{RI} \tag{4}$$

where $\lambda_{\max}$ is the maximum eigenvalue of the comparison matrix, n is the dimension of the matrix, and RI is the Random Index. In general, acceptable consistency is defined as CR < 0.1 or 0.15 [56]. To ensure logical coherence and maintain the reliability of the analysis, responses with a consistency ratio (CR) exceeding the acceptable threshold were excluded from the aggregation [56]. Final weights were derived by applying the arithmetic mean to valid responses, considering its simplicity, interpretability, and widespread use in existing AHP-based traffic safety research [57].

In this study, an AHP survey was conducted with 14 experts using a two-tier hierarchical structure (comparison among layers and comparison among factors within layers). The final weights were normalized to a total of 1.0 after averaging all responses.
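The procedure described above (eigenvector weights, the consistency check in Equation (4), exclusion of inconsistent responses, arithmetic averaging, and normalization) can be sketched as follows. This is a minimal illustration, not the survey tool used in the study: the principal eigenvector is approximated by power iteration, the Random Index table holds commonly cited Saaty values, the threshold of 0.1 is one of the two values mentioned in the text, and the two expert matrices are hypothetical.

```python
# Commonly cited Saaty Random Index values for matrix sizes 1 to 9 (assumed table).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector and lambda_max by power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [x / total for x in nxt]
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max

def consistency_ratio(lambda_max, n):
    """Equation (4): CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]

def aggregate_weights(expert_matrices, cr_threshold=0.1):
    """Drop responses above the CR threshold, average the rest, normalize to 1."""
    valid = []
    for m in expert_matrices:
        w, lam = ahp_weights(m)
        if consistency_ratio(lam, len(m)) <= cr_threshold:
            valid.append(w)
    if not valid:
        raise ValueError("no response passed the consistency check")
    n = len(valid[0])
    mean = [sum(w[i] for w in valid) / len(valid) for i in range(n)]
    total = sum(mean)
    return [x / total for x in mean]

# Hypothetical pairwise comparisons among three factors from two experts.
consistent = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
inconsistent = [[1, 9, 1 / 9], [1 / 9, 1, 9], [9, 1 / 9, 1]]

final_weights = aggregate_weights([consistent, inconsistent])
print([round(x, 3) for x in final_weights])
```

In this hypothetical case the second response is circular (each factor strongly dominates the next), so its CR is far above the threshold and only the first response contributes to the final weights.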

3.4. Average Ranking Analysis Methodology

Average ranking analysis is a method for calculating the relative importance or representativeness of each item based on the averages of all rankings when pairwise comparisons among the items to be evaluated are difficult [58]. In this study, average ranking analysis was applied to select observable indicators corresponding to the high-risk factors identified through AHP analysis. It was judged to be more suitable than the pairwise comparison method because of the large number of indicators and the high independence among the items.

The same 14 experts who participated in the AHP analysis conducted the ranking. For each high-risk factor, two to five candidate indicators were presented, and the complete ranking method, in which the experts rank the indicators based on their representativeness, was applied. The average rank of each indicator was then calculated using the following formula.

$$\text{AverageRank}_{j} = \frac{1}{N} \sum_{i = 1}^{N} r_{i j} \tag{5}$$

where $r_{i j}$ is the ranking given to the j-th indicator by the i-th expert, and N is the total number of respondents. A lower average ranking value indicates that the indicator better represents the factor. The indicator with the lowest average rank was selected as the representative indicator for each high-risk factor. The indicators were then used as input variables in the prediction framework.
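Equation (5) and the selection rule can be illustrated with a short sketch. The candidate indicators and the rankings from four experts below are hypothetical and are not results of the survey; the function names are illustrative only.

```python
def average_ranks(rankings):
    """Equation (5): rankings[i][j] is the rank given by expert i to indicator j."""
    n_experts = len(rankings)
    n_indicators = len(rankings[0])
    return [sum(r[j] for r in rankings) / n_experts for j in range(n_indicators)]

def representative_indicator(names, rankings):
    """Return the indicator with the lowest average rank, plus all averages."""
    avg = average_ranks(rankings)
    best = min(range(len(names)), key=lambda j: avg[j])
    return names[best], avg

# Hypothetical candidates for one high-risk factor, ranked by four experts
# (1 = most representative).
candidates = ["TTC", "PET", "time headway"]
expert_rankings = [
    [1, 2, 3],
    [1, 3, 2],
    [2, 1, 3],
    [1, 2, 3],
]

selected, averages = representative_indicator(candidates, expert_rankings)
print(selected, averages)
```

If two indicators share the lowest average rank, this sketch simply returns the first one listed; a tie-breaking rule would need to be defined separately.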

This method has the advantage of reducing the cognitive load on experts while still reflecting collective preferences, and is widely used in MCDM contexts where expert consensus is needed without complex matrix-based evaluations. Previous studies have also highlighted the applicability of average ranking analysis in fields requiring efficient prioritization, such as policy formulation and engineering assessment [59,60].

4. Results

4.1. Identification of High-Risk Factors

In this study, various types of empirical data and expert opinions were utilized to design a framework for systematically analyzing and predicting high-risk situations in a mixed traffic environment where autonomous and conventional vehicles coexist. High-risk factors were identified through literature review, accident histories, accident videos, autonomous vehicle driving footage, and expert seminars. The data were constructed to capture risk factors from different aspects, including the context of accidents, driving environment, and technical errors. The analysis proceeded according to the following steps.

4.1.1. Existing Literature-Based Meta-Analysis

After collecting a total of 85 studies, 58 studies (18 studies for accident factors, 6 studies for scenarios, 30 studies for prediction models, and 4 studies for regulations) were selected as final analysis targets based on meta-analysis applicability and content consistency. Binary and effect size meta-analyses were conducted based on the occurrence and significance of autonomous vehicle risk factors reported in each study.

In the binary meta-analysis, whether certain high-risk factors were mentioned or not was summarized using binary values (zero or one), and the frequency ratio and RR were calculated. Consequently, acceleration/deceleration, intersections, rear-end collisions, fog, rain, inter-vehicle distance, and lane count were identified as high-risk factors, each showing an RR exceeding 25%. These high-risk factors are highlighted in blue in Table 1.

Table 1. Results of binary meta-analysis.

In the effect size meta-analysis, weights were assigned based on study influence (e.g., the number of citations and the influence index of the journal), and the relative importance of each factor was quantified. The results of the effect size meta-analysis are presented in Table 2, with the factors identified as high-risk highlighted in blue for ease of reference. The analysis results revealed that acceleration/deceleration, fog, rain, intersections, the lane count, rear-end collisions, inter-vehicle distance, and pedestrian collisions were the main factors. Notably, factors related to autonomous vehicle driving behavior showed effect sizes exceeding 0.5.

Table 2. Results of effect size meta-analysis.

4.1.2. Accident History Data Analysis

To identify high-risk factors associated with actual accidents in mixed traffic environments, the accident history data of conventional and autonomous vehicles were utilized in this study. The data used for analysis were the conventional vehicle accident history from the Korean National Police Agency (2019–2023, a total of 1,037,516 cases) and the autonomous vehicle accident history from the California DMV (2019–2024, a total of 332 cases). Items, such as the date and time of accidents, location, weather, road surface condition, overview of accidents, and the number of injuries, were extracted and used for analysis.

After calculating accident frequency and injury rates, items exceeding a threshold for either indicator were identified as high-risk factors (Equation (6)). The third quartile (75%) served as the threshold, as shown in Figure 2. For items with insufficient sample sizes, the median value (50%) was used as a supplementary criterion. Quartile-based thresholds were chosen because traditional statistics (e.g., the mean and standard deviation) can be easily distorted by extreme values (outliers), whereas quartile-based thresholds stably represent the upper segments of the data distribution. In particular, when high-risk items are selected based on actual accident data, as in this study, the risk tends to be concentrated in the upper risk sections rather than around the statistical average. Thus, the third quartile is a more suitable threshold for this analysis.

$$P_{\mathrm{injured}}(A_i) = \frac{N_{\mathrm{injured}} + N_{\mathrm{fatality}}}{N_{\mathrm{injured\,total}}} \tag{6}$$

Figure 2. Identification of high-risk factors based on accident data.

$P_{\mathrm{injured}}(A_i)$ is the injury rate in Category $A_i$.
$N_{\mathrm{injured}}$ is the total number of seriously injured people in Category $A_i$.
$N_{\mathrm{fatality}}$ is the total number of fatalities in Category $A_i$.
$N_{\mathrm{injured\,total}}$ is the total number of injuries across all accidents.
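The threshold rule can be illustrated with a minimal Python sketch. It assumes that an item is flagged when either its accident frequency or its injury rate (Equation (6)) strictly exceeds the third quartile of the observed values. The records, the item names, and the helper functions `injury_rate`, `third_quartile`, and `high_risk_items` are hypothetical and are not taken from the study data; the "inclusive" quartile convention is likewise an assumption.

```python
from statistics import quantiles


def injury_rate(n_injured, n_fatality, n_injured_total):
    """Equation (6): (injured + fatalities) in a category over total injuries."""
    return (n_injured + n_fatality) / n_injured_total


def third_quartile(values):
    """Third quartile (75%) of the observed values (inclusive method, assumed)."""
    return quantiles(list(values), n=4, method="inclusive")[2]


def high_risk_items(frequency, rate):
    """Items whose frequency OR injury rate exceeds the third quartile."""
    freq_cut = third_quartile(frequency.values())
    rate_cut = third_quartile(rate.values())
    return sorted(
        item for item in frequency
        if frequency[item] > freq_cut or rate[item] > rate_cut
    )


# Hypothetical records: item -> (accidents, injured, fatalities)
records = {
    "intersection": (500, 180, 2),
    "rain": (120, 90, 6),
    "underground road": (90, 200, 15),
    "clear": (300, 150, 2),
    "dry": (280, 140, 1),
}

n_injured_total = sum(injured for _, injured, _ in records.values())
frequency = {item: acc for item, (acc, _, _) in records.items()}
rate = {
    item: injury_rate(injured, fatal, n_injured_total)
    for item, (_, injured, fatal) in records.items()
}
flagged = high_risk_items(frequency, rate)
print(flagged)
```

In this toy data set, one item is flagged by frequency and another by injury rate, which is why the rule uses "either indicator" rather than requiring both.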

From the conventional vehicle accident history, 11 items were identified: underground roads, intersections, structures, cloudy weather, rain, wetness/moisture, pedestrians, trucks, two-wheelers, centerline violation, and traffic signal violations. From the autonomous vehicle accident history, 18 items were identified: ramp entry and exit, intersection, structure, nighttime, cloudy, clear, dry, lane change, sudden stop, right turn, left turn, moving straight, truck, passenger car, van, bicycle, on-road stopping, and driver inattention.

4.1.3. Accident Video Data Analysis

The accident videos posted online were analyzed to supplement the limitations of the structured accident history data. A total of 10,030 conventional vehicle accident videos and 65 autonomous vehicle accident videos were collected, and they were classified into 24 items after converting the subtitles, text, and audio information into text format in a database.

Similar to the accident history analysis, items in the third quartile or higher were classified as high-risk factors. For conventional vehicles, 12 items were identified: intersection, bus-only lane, daytime, sudden stop, moving straight, pedestrian, passenger car, van, two-wheelers, centerline violation, signal violation, and driver inattention. For autonomous vehicle accidents, nine items were identified: single roadway, structure, emergency vehicle, nighttime, clear, sudden stop, moving straight, drunk driving, and perception error. Notably, “perception error” emerged as a key factor reflecting the inherent vulnerability of autonomous driving systems. As described, accident video-based analysis is effective in capturing behavioral and cognitive elements that existing historical data may overlook.

4.1.4. Autonomous Vehicle Driving Video Data Analysis

In addition to post-accident data, such as existing accident history and accident videos, Waymo’s autonomous vehicle driving video data were used in this study to identify potential high-risk situations encountered during autonomous vehicle operation. The analysis was conducted to identify high-risk factors with a focus on errors that may occur in the perception and judgment processes of autonomous vehicles. Detailed analysis procedures and technologies are provided in Kim et al. (2025) [61].

As shown in Figure 3, the analysis consists of (1) extracting text information from generative artificial intelligence (AI)-based accident reports, (2) extracting autonomous vehicle accident factors, (3) captioning Waymo driving images, and (4) classifying high-risk situations through the BERT (Bidirectional Encoder Representations from Transformers) model. Risk indicators in autonomous driving scenarios were analyzed using textual and visual data.

Figure 3. Process of identifying high-risk factors based on autonomous vehicle driving video analysis [61].

The analysis identified ten factors that may lead to high-risk situations during autonomous driving: intersection, highway, nighttime, daytime, clear, pedestrian, bicycle, on-road parking, on-road stopping, and perception error. In particular, “perception error” was identified as a key factor showing how the sensor-based perception system of autonomous vehicles can fail in actual road environments, a finding that distinguishes this analysis from conventional accident-based analysis. The video analysis also revealed that common elements, such as stopped or parked vehicles, may pose a threat to autonomous driving systems.

4.1.5. Expert Seminar

To compensate for the limitations of quantitative analysis, a seminar was held with seven experts in the field of autonomous driving safety. Through the seminar, high-risk factors were identified from practical perspectives, including the perception/decision/control of autonomous vehicles, interactions with drivers, and communication/security threats.

The seminar covered seven key topics, as outlined in Table 3. For each topic, the participants presented high-risk situations from various perspectives, including the limitations of the perception/decision/control processes of autonomous vehicles, interactions with drivers in mixed driving conditions, and communication/cybersecurity threats. Additionally, high-risk possibilities were further discussed for factors that are not captured in existing data but may frequently occur in real road environments, such as the impact of vehicle type distribution and delays in decision-making at complex intersections.

Table 3. Results of key factor identification through expert consultation.

More than 20 high-risk factors were identified during the seminar. These factors spanned multiple categories, including road structures, traffic flow interactions, object behavior, and digital elements. In particular, the experts repeatedly emphasized digital-based risk factors (e.g., perception error, decision error, DDT fallback, and cyberattacks) and risk factors in complex road segments (e.g., intersection, ramp in/out, and bottleneck point). These results served as a foundation for empirically deriving high-risk factors in real-world driving conditions, and were reflected in AHP-based importance assessment and prediction framework design.

4.2. Layer-Based Reclassification

The six-layer structure proposed by the PEGASUS project, which classifies road and driving environments for scenario-based verification of autonomous vehicles, has been widely used to ensure the stability and reliability of autonomous driving functions. The purpose of this study, however, is to establish a factor structure for predicting high-risk situations that may occur in a mixed traffic environment with autonomous and conventional vehicles rather than for scenario creation or system testing. For this prediction-oriented purpose, the existing PEGASUS system has limitations.

First, the PEGASUS framework lacks consistency with empirical data (e.g., accident history and video data) because it emphasizes a system-centered environment definition. Second, it is limited in precisely analyzing accident causes because road structures and transportation infrastructure are integrated into one category. Third, each factor must be translated into measurable and quantifiable indicators for prediction model design, but the existing framework fails to meet these requirements.

To improve predictability and data consistency, this study restructured the layer system based on three criteria. First, it must align with actual accident and driving data. Second, it must be translatable into quantitative indicators that can be used as input variables for the prediction model. Third, the corresponding factors must significantly contribute to risk detection and response in the perception–decision–control processes of autonomous vehicles. Based on these criteria, the PEGASUS system was reorganized into six analysis-oriented layers (see Table 4). This layer structure is kept unchanged throughout the AHP-based analysis and the prediction framework design.

Table 4. Reclassified data layers and their correspondence to the original PEGASUS framework.

(1): Road facilities: Physical structures and equipment installed for road safety and efficiency (e.g., lighting facilities, road alignment, and road grades), which are basic infrastructure that corresponds to the background conditions of accidents.
(2): Variable and temporary facilities: Facilities with operation modes that vary over time or situation (e.g., variable lanes and construction zones), representing dynamic risk factors that complicate prediction.
(3): Traffic flow characteristics: Aggregate movement patterns of vehicles on the road, including density, speed, conflicts, and bottlenecks. These act as key indicators of persistent risk levels.
(4): Environmental variables: External conditions, such as weather, time, and road surface condition, which are directly related to driving stability by affecting vehicle sensor performance or drivers’ view.
(5): Moving objects: Road users, including vehicles, pedestrians, and bicycles, and their behaviors (e.g., lane changes, signal violations, and inter-vehicle distance), which directly contribute to hazardous conditions.
(6): Digital: Digital-based elements (e.g., perception, decision, and communication systems related to autonomous driving functions), including inherent technological risks, such as sensor malfunctions and cyberattacks.

The identified high-risk factors were obtained from five different data sources and classified according to the redefined six-layer framework. Specifically, six factors were derived through literature meta-analysis, 32 from expert seminars, and 24 from accident history data analysis (11 from conventional vehicle accidents and 18 from autonomous vehicle accidents, with 5 overlapping items). In addition, 23 factors were identified through driving video analysis (12 from conventional vehicle accident videos, 9 from autonomous vehicle accident videos, and 9 from autonomous vehicle driving videos, with 7 overlapping items). Due to length constraints, the final classification of high-risk factors based on the six-layer system is presented in Appendix A.

The analysis revealed that integrating heterogeneous data sources enabled the identification of high-risk factors that would be difficult to capture using a single source. Some high-risk factors were identified only from specific data sources, allowing for a more comprehensive reflection of potential risks across various environments and situations.

For example, digital-based factors such as “hacking,” “cyberattack,” and “DDT fallback failure (failure of driver takeover in autonomous driving systems)” were not captured in traditional sources such as literature reviews, accident records, or video analysis but were identified in expert seminars as critical high-risk elements in autonomous environments. This suggests that software errors or external communication threats in advanced autonomous systems may act as accident triggers. In addition, the “dilemma zone,” although a high-risk situation arising from behavioral differences between autonomous and conventional vehicles, did not appear explicitly in accident data or videos but was identified through expert surveys as a unique intersection risk in mixed traffic environments. Another example is “emergency vehicle,” which was identified only through autonomous vehicle accident videos, while dynamic or temporary facility-related risks such as “bus-only lane,” “construction zone,” “variable lane,” and “high-accident zone” were identified solely in expert seminars. Environmental variables like “cloudy” appeared consistently across multiple sources, including autonomous and conventional accident histories and expert seminars. Likewise, “nighttime” was repeatedly confirmed in autonomous vehicle accident histories, accident videos, and driving footage, supporting the notion that reduced visibility conditions may impair autonomous system performance.

4.3. AHP Analysis Design and Execution

The AHP was applied in this study to quantitatively evaluate the relative importance of the identified high-risk factors. AHP is an analytical technique to quantify expert judgments in multi-criteria decision-making problems and systematically derive relative priorities among factors. It has been widely used to assess risks in complex systems, such as autonomous vehicles [21,62].

The AHP model designed in this study uses a two-level hierarchy structure. The first hierarchy compares the relative importance of the six redefined data layers, and the second hierarchy evaluates the relative importance of the high-risk factors in each layer. The structure did not include all initially identified factors as-is. Instead, it integrated items with overlapping meanings or inclusive relationships and excluded general conditions (e.g., “clear”) with a low correlation with high-risk situations from analysis targets to improve survey efficiency and response reliability. Table 5 summarizes the factors included in the final AHP survey.

Table 5. The risk factors included in the final AHP survey.

The expert survey was conducted with 14 experts in the fields of autonomous driving and traffic safety in Korea: nine professors, four researchers, and one research professor. The expert selection criteria were as follows: (1) more than 10 years of research or working experience in the relevant field, (2) participation in government-led projects related to autonomous vehicle safety or smart mobility, and (3) affiliation with major domestic universities or national research institutes. All experts are currently engaged in active research on autonomous driving. Pairwise comparison matrices were built on a scale of one to nine points, and the weights for each hierarchical level were derived from their eigenvectors. Reliability was confirmed, as the consistency ratio (CR) remained below 0.15 for all responses.
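As an illustration of the eigenvector and consistency calculations, the following Python sketch derives weights from a single pairwise comparison matrix by power iteration and computes the CR using Saaty's random index values. The 3 × 3 matrix is hypothetical, and the function names are illustrative; the sketch does not reproduce the survey responses or any aggregation across experts.

```python
# Saaty's random index (RI) for matrix orders 1 to 9
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector by power iteration.

    Returns the normalized weights and an estimate of lambda_max.
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        w_next = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(w_next)
        w = [x / total for x in w_next]
    # lambda_max estimated as the mean of (A w)_i / w_i
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max


def consistency_ratio(lambda_max, n):
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    if n < 3:
        return 0.0  # matrices of order 1 or 2 are always consistent
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical comparison of three layers on the 1-9 scale
matrix = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights, lambda_max = ahp_weights(matrix)
cr = consistency_ratio(lambda_max, len(matrix))
print([round(x, 3) for x in weights], round(cr, 4))
```

A response would be retained only if its CR falls below the chosen cutoff; the cutoff itself is a study design choice and is passed in by the analyst rather than fixed in the code.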

According to the results of the first hierarchy, the “moving objects (0.2437)” and “digital (0.2221)” layers exhibited the highest importance, while “road facilities (0.0643)” showed relatively low importance. This indicates that, in the experts' judgment, driving behavior and system error factors have a more significant impact than physical infrastructure in high-risk situations (see Table 6).

Table 6. Results of the AHP (the first hierarchy).

In the second hierarchy, weights for high-risk factors were determined for each layer. For example, the truck, pedestrian, and signal violation factors exhibited high importance in “moving objects”. Similarly, the decision error, perception error, and control error factors showed relatively high weights in “digital”. In contrast, “on-road parking/stopping” and “vehicle type distribution” received low weights (see Table 7).

Table 7. Results of the AHP (the second hierarchy).

Accordingly, high-risk factors in the “moving objects” and “digital” layers should be prioritized in future autonomous driving safety system designs. For instance, high-precision object recognition algorithms to mitigate interaction risks with trucks and pedestrians, real-time inter-vehicle distance monitoring systems, and signal violation detection functions are needed. In addition, to address perception and decision-making errors, internal error detection and alert systems within autonomous vehicles, as well as preemptive response scenarios and safe driving mode activation for DDT fallback situations, must be implemented.

Finally, considering the usability of the analysis, not all factors were carried forward evenly. In accordance with the Pareto principle (80:20), the factors were ranked by overall importance, and only the top-ranked factors that together account for a cumulative weight of 0.8 were adopted. As a result, 28 high-risk factors were identified across the six layers. They were used to derive key input variables for high-risk indicators and prediction framework design (see Table 8).

Table 8. Final high-risk factors identified.
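The selection step can be sketched in Python as follows. The sketch assumes that the overall importance of a factor is the product of its layer weight and its within-layer weight, and that the factor whose weight makes the cumulative total reach 0.8 is still included; both points are assumptions, since the exact tie-breaking and cutoff handling are not specified here. All weights and the function names `overall_weights` and `pareto_select` are hypothetical.

```python
def overall_weights(layer_weights, local_weights):
    """Overall weight = layer weight x within-layer weight (assumed)."""
    return {
        factor: layer_weights[layer] * weight
        for layer, factors in local_weights.items()
        for factor, weight in factors.items()
    }


def pareto_select(weights, cutoff=0.8):
    """Keep top-ranked factors until the cumulative weight reaches the cutoff."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for factor, weight in ranked:
        if cumulative >= cutoff:
            break
        selected.append(factor)
        cumulative += weight
    return selected


# Hypothetical two-layer example (weights are illustrative only)
layer_weights = {"moving objects": 0.55, "digital": 0.45}
local_weights = {
    "moving objects": {"pedestrian": 0.5, "truck": 0.3, "on-road parking": 0.2},
    "digital": {"perception error": 0.7, "cyberattack": 0.3},
}
selected = pareto_select(overall_weights(layer_weights, local_weights))
print(selected)
```

Here four of the five hypothetical factors are retained, and the lowest-weighted factor is dropped once the cumulative weight has passed 0.8.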

4.4. High-Risk Indicator Design and Prediction Framework Conceptualization

The AHP analysis in the previous section quantified the relative importance of each high-risk factor in a mixed environment with autonomous vehicles. However, most of these factors are qualitative concepts. Observable indicators were therefore derived for each factor to establish quantitative decision criteria and link them to the prediction flow, and a prediction framework was designed on this basis.

4.4.1. Identification of High-Risk Indicators

High-risk indicators were identified in two steps: literature-based matching and expert consultation. First, a group of candidate indicators corresponding to each high-risk factor was formed by examining the literature related to traffic safety indicators and autonomous vehicle performance evaluation. For example, PET, the weaving ratio, and TTC represent “conflict” factors, while the stopping rate, average speed, and traffic flow density reflect “congestion” factors. In this process, the researchers of the present study added candidate indicators beyond those in the literature, considering practical applicability and the possibility of securing data. Table 9 summarizes the candidate indicators and their descriptions. The full list of variables and their descriptions used for the average ranking methodology is provided in Appendix B due to space limitations.

Table 9. Additional candidate indicators for high-risk prediction: practical considerations and data availability.
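For readers less familiar with the conflict indicators named above, the following Python sketch shows how TTC (time to collision) and PET (post-encroachment time) are commonly computed. The formulas are the standard textbook definitions and are assumed here, since this section does not state how the study computed them; the function names and the numerical values are illustrative.

```python
import math


def time_to_collision(gap_m, follower_speed_mps, leader_speed_mps):
    """TTC: gap divided by closing speed; infinite if the gap is not closing."""
    closing_speed = follower_speed_mps - leader_speed_mps
    if closing_speed <= 0:
        return math.inf
    return gap_m / closing_speed


def post_encroachment_time(first_exit_s, second_entry_s):
    """PET: time between the first road user leaving the conflict area
    and the second road user entering it."""
    return second_entry_s - first_exit_s


# Follower at 15 m/s, leader at 10 m/s, 20 m apart
ttc = time_to_collision(20.0, 15.0, 10.0)
# First user leaves the conflict area at t = 12.0 s, second enters at t = 13.5 s
pet = post_encroachment_time(12.0, 13.5)
print(ttc, pet)
```

Smaller values of either indicator correspond to a narrower safety margin, which is why they are natural candidates for representing "conflict" factors.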

As a follow-up step, a complete ranking survey on the candidate indicators for each factor was conducted with the 14 experts who participated in the AHP analysis. The experts ranked the indicators by representativeness, and the rankings were aggregated using the average rank method. This method can quantify relative representativeness when the number of items makes pairwise comparison difficult, and it has been widely used in the multi-criteria decision-making (MCDM) field [63,64]. The three indicators with the lowest average rank values (i.e., those ranked as most representative) were selected as representatives for each factor, as shown in Table 10.

Table 10. Key indicators for application in high-risk prediction models.
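The average rank method can be sketched in a few lines of Python. The sketch assumes that each expert assigns rank 1 to the most representative indicator, so that a lower mean rank is better. The three expert rankings and the candidate "lane change rate" are hypothetical; only PET, TTC, and the weaving ratio are named in the text as conflict indicators.

```python
def average_ranks(rankings):
    """Mean rank per indicator across experts (rank 1 = most representative)."""
    indicators = rankings[0].keys()
    return {
        ind: sum(r[ind] for r in rankings) / len(rankings)
        for ind in indicators
    }


def top_indicators(rankings, k=3):
    """The k indicators with the lowest mean rank."""
    avg = average_ranks(rankings)
    return sorted(avg, key=avg.get)[:k]


# Hypothetical complete rankings from three experts for the "conflict" factor
rankings = [
    {"TTC": 1, "PET": 2, "weaving ratio": 3, "lane change rate": 4},
    {"TTC": 2, "PET": 1, "weaving ratio": 4, "lane change rate": 3},
    {"TTC": 1, "PET": 3, "weaving ratio": 2, "lane change rate": 4},
]
representatives = top_indicators(rankings)
print(representatives)
```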

4.4.2. Proposal of a High-Risk Situation Prediction Framework in Mixed Traffic with Autonomous Vehicles

High-risk factors must be transformed into measurable indicators for use in applications such as prediction models. As such, in this study, representative indicators that correspond to the key high-risk factors identified based on AHP were derived. These indicators serve as critical input variables for predicting and addressing high-risk situations.

In particular, this study distinguishes two prediction levels (road segments and autonomous vehicles) and conceptually proposes a framework suited to the structural characteristics of each level. The levels were distinguished because high-risk situations at the two levels have different causes and operational mechanisms. While road segments have relatively static risk patterns due to physical infrastructure and traffic flow characteristics, the risks for autonomous vehicles change in real time with the interactions and environmental factors encountered while driving. Therefore, the types of risk indicators and risk prediction methods are heterogeneous, requiring a prediction system with separate structures for each level.

Accordingly, the proposed framework adopts an input–process–output structure.

Input: The representative high-risk indicators identified through the AHP and expert consultation (e.g., accident frequency, conflict rate, stopping rate, weather conditions, inter-vehicle distance, and perception error)
Process: Forecasting indicator values over time using time-series algorithms (e.g., LSTM (long short-term memory)).
Output: Calculating the risk index for road segments or autonomous vehicles based on the predicted indicator values.
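The input–process–output flow above can be sketched in Python as follows. To keep the example dependency-free, simple exponential smoothing is used as a stand-in for the time-series model; it is not an LSTM and is not the method proposed in the study. The indicator names, histories, weights, and upper bounds are all hypothetical.

```python
def forecast_next(series, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing
    (a lightweight stand-in for a time-series model such as LSTM)."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def risk_index(predicted, weights, upper_bounds):
    """Weighted sum of predicted indicators, each scaled to [0, 1]."""
    score = 0.0
    for name, value in predicted.items():
        scaled = min(max(value / upper_bounds[name], 0.0), 1.0)
        score += weights[name] * scaled
    return score / sum(weights.values())


# Input: hypothetical indicator histories for one road segment
history = {
    "conflict_rate": [0.10, 0.12, 0.20],
    "stopping_rate": [0.30, 0.30, 0.30],
}
# Process: forecast each indicator one step ahead
predicted = {name: forecast_next(series) for name, series in history.items()}
# Output: combine the forecasts into a segment risk index
weights = {"conflict_rate": 0.6, "stopping_rate": 0.4}
upper_bounds = {"conflict_rate": 1.0, "stopping_rate": 1.0}
segment_risk = risk_index(predicted, weights, upper_bounds)
print(predicted, segment_risk)
```

Separating the forecasting function from the index calculation mirrors the process and output stages, so the forecaster can be replaced by another model without changing how the index is computed.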

Road segment-level prediction uses two methods depending on data availability, as shown in Figure 4. In segments with only historical data, indicators that represent mid- to long-term risk levels (e.g., accident frequency, stopping rate, and conflict rate) are utilized. In segments where real-time information is available, indicators that fluctuate in real time (e.g., traffic volume, speed, and weather conditions) are additionally reflected to enable more sensitive detection. Static and dynamic risks can be addressed simultaneously by integrating historical and real-time data.

Figure 4. Conceptual framework for high-risk prediction model (road segment level).

Autonomous vehicle-level prediction uses the real-time driving behavior of vehicles, surrounding object information, and environmental conditions as input values to predict high-risk indicator values in n seconds. The vehicle-level risk index is calculated from these values (Figure 5). The input variables include vehicle behavior (e.g., inter-vehicle distance, lane change, and turning behavior), the relative speed and location of surrounding objects, illumination and weather conditions, and the risk of the road segment. The predicted high-risk indicators are converted into a risk index using a weighted sum or rule-based logic. The index informs operational strategies for autonomous vehicles.

Figure 5. Conceptual framework for high-risk prediction model (autonomous vehicle level).

Additionally, vehicle-level prediction is linked to segment-level prediction. Because identical behavior may pose greater risk in the high-risk segment, the road segment risk is integrated into the input value of the autonomous vehicle model. This makes it possible to construct a hierarchical prediction system that accounts for both temporal and spatial sensitivity.
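A minimal Python sketch of this linkage is given below. It assumes that the vehicle-level index is a weighted sum of behavior indicators (already scaled to [0, 1]) blended with the segment risk, plus one rule-based override. The blending weight `segment_weight`, the override rule for a flagged perception error, and all indicator names and values are hypothetical and are not specified by the framework.

```python
def vehicle_risk_index(indicators, weights, segment_risk, segment_weight=0.3):
    """Blend vehicle-level indicators with the risk of the current segment."""
    # Rule-based logic (hypothetical): a flagged perception error
    # is treated as maximal risk regardless of the other inputs.
    if indicators.get("perception_error", 0.0) >= 1.0:
        return 1.0
    # Weighted sum of the behavior indicators listed in `weights`
    behaviour = sum(weights[k] * indicators[k] for k in weights)
    behaviour /= sum(weights.values())
    return (1 - segment_weight) * behaviour + segment_weight * segment_risk


indicators = {"short_headway": 0.8, "lane_change": 0.5, "perception_error": 0.0}
weights = {"short_headway": 0.6, "lane_change": 0.4}

# Identical behavior evaluated on a low-risk and a high-risk segment
risk_low_segment = vehicle_risk_index(indicators, weights, segment_risk=0.1)
risk_high_segment = vehicle_risk_index(indicators, weights, segment_risk=0.9)
print(risk_low_segment, risk_high_segment)
```

With these inputs, the same driving behavior yields a higher index on the high-risk segment, which is the effect the hierarchical linkage is meant to capture.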

As described, high-risk indicators serve as prediction targets and inputs and can be combined with various algorithms, including LSTM, the GRU (gated recurrent unit), random forest, and the support vector machine (SVM). The proposed framework can be extended into a unified prediction system that integrates real-time response and prevention measures in a mixed environment with autonomous vehicles.

5. Discussion

This study aimed to establish a foundation for predicting high-risk situations in mixed traffic environments involving both autonomous and conventional vehicles by integrating five heterogeneous data sources—literature, accident history, accident videos, autonomous vehicle driving videos, and expert seminars—to identify high-risk factors. Unlike previous studies that relied solely on accident frequency or severity, this study also comprehensively considered system errors (e.g., perception and decision failures), variable road infrastructure conditions (e.g., construction zones and variable lanes), and real-time risk variables (e.g., traffic conflicts and congestion), thereby establishing indicators that can quantitatively detect pre-crash risk signals. Notably, the PEGASUS six-layer framework was restructured to suit the study’s objectives, and the derived high-risk factors were quantified using the AHP and average ranking methods to transform them into input variables directly applicable to prediction models. Based on this, a high-risk prediction framework was proposed.

The results of this study are broadly consistent with those of previous research on the major causes of autonomous vehicle accidents. In terms of road infrastructure, previous studies have reported that design features such as intersection types, lane count, dedicated lanes, and on-street parking are closely related to accident occurrence [8,9,10,11]; this study likewise identified these elements as high-risk factors. Regarding traffic flow characteristics, several risk factors identified in this study align with prior findings that speed variation, congestion levels, and upstream traffic conditions significantly affect driving stability and accident risk in autonomous vehicles [7,12]. Additionally, this study incorporated environmental variables such as inclement weather and reduced visibility, which have been shown to impair the accuracy of autonomous perception systems and increase accident likelihood [13,14,15].

Accidents resulting from interactions between autonomous vehicles and surrounding objects were also a key area of discussion. Sudden interactions with pedestrians, bicycles, and conventional vehicles are closely associated with rear-end collisions and other major accident types [8,10], which aligns with this study’s emphasis on the high importance of “moving object” factors. Furthermore, perception and control errors within autonomous systems have been cited as major accident causes [15,16]; this study identified “digital elements” and “system risks” as high-risk factors.

Unlike previous studies that primarily focused on single factors or post-accident analysis, this study offers significant academic distinction by integrating heterogeneous data sources and quantitatively identifying latent risk factors in the pre-accident phase within a multilayered structure. Specifically, by restructuring the PEGASUS six-layer framework and applying AHP and average ranking methods, the study converted expert qualitative judgments into real-time, actionable quantitative indicators—thereby presenting a structured, multilayered metric system that can be directly applied in prediction model design [3,4,5,6,7,20,22,25,29,30,31,32,33,34,35,36,37,38,39].

This research presents a proactive approach to risk management by quantitatively identifying and structuring risk factors in advance, based on various data, moving beyond response-oriented analyses conducted after the occurrence of an accident. By comprehensively considering various dimensions of risk, such as technical failures, traffic interactions, and digital vulnerabilities, within a multi-layered structure, the study can contribute to the development of a long-term, sustainable traffic safety system rather than short-term reactive measures. This aligns with the need for a resilient, prevention-oriented urban traffic system design, particularly in the early stages of the commercialization of autonomous vehicles. Therefore, the results of this study can be used as practical basic data for overcoming the limitations of existing traffic safety management systems and establishing future traffic safety strategies.

Despite these contributions, this study has some limitations. First, although it proposes a novel framework that integrates heterogeneous data and applies a multilayered structure to identify high-risk factors in mixed traffic environments, it remains at a conceptual level and has not yet been applied or validated in real-world driving environments. Second, while the study systematically derived indicators by integrating expert judgment and diverse data, it did not empirically verify whether these indicators could effectively predict high-risk situations in actual autonomous driving contexts. Third, most of the autonomous vehicle data used in this study correspond to Level 3 automation, which limits the framework’s applicability to future Level 4 full-automation environments. The limited availability of autonomous vehicle accident data may also affect the diversity of analyzable cases. However, the study sought to reflect the unique risks of autonomous driving by collecting accident data as comprehensively as possible and incorporating expert assessments, despite practical constraints. Fourth, although the use of the AHP and average ranking methods to quantify expert opinions is structurally useful, the results may still reflect subjectivity depending on the expert group composition and judgment variability. Fifth, the applicability and performance of the prediction model may be affected by the absence of real-time sensor data and varying traffic densities, which were not considered in this study.

To address these limitations, future research should focus on implementing and validating prediction models that combine road segment-level structural risk factors and vehicle driving characteristics derived in this study. These models should be applied not only to fixed structural factors but also in conjunction with real-time operational data, and their performance should be experimentally evaluated in high-fidelity test environments—such as driving simulators or autonomous driving testbeds—under various traffic, environmental, and digital conditions. In particular, it is crucial to empirically verify how high-risk factors derived via AHP affect the occurrence of accidents or near-misses in actual autonomous driving scenarios.

Additionally, to overcome the limitation of Level 3-based data, future research could utilize Level 4 autonomous vehicle data (e.g., from test vehicles (including sensor data), pilot programs, or restricted operational domains) or design scenario-based simulation experiments assuming Level 4 conditions. This approach would not only enhance the framework’s alignment with future autonomous deployment environments but also contribute to refining scenario-based risk response strategies.

Finally, to address the limitations of expert-based weighting, future work could supplement or calibrate expert judgments with data-driven methods such as machine learning-based variable importance analysis or regression coefficient–based impact estimation. This hybrid approach—combining expert insight and empirical data—can enhance model objectivity and robustness while expanding its applicability across diverse road environments and stages of autonomous driving technology adoption.

6. Conclusions

This study followed these procedures to establish a quantitative basis for proactively detecting and responding to high-risk situations in mixed traffic environments involving autonomous and conventional vehicles. First, high-risk factors were identified from multiple perspectives, including literature meta-analysis, accident history and video data analysis, autonomous vehicle driving video footage, and expert seminars. After redefining the six-layer structure of PEGASUS to suit the research objectives, the relative importance of each factor was calculated using the AHP, and representative indicators corresponding to the high-risk factors were derived through average ranking analysis. Finally, using the derived indicators, a conceptual framework was designed to predict risks at both the road segment and autonomous vehicle levels.

The findings showed that high-risk factors were not confined to a single category. They were derived from various dimensions, such as road physical characteristics, traffic flow interactions, weather and illumination conditions, the driving behavior of autonomous vehicles, and system errors. This highlights the complexity and layered risk structure inherent in mixed traffic environments involving autonomous vehicles. The layer-based analysis revealed that the importance of the “moving object,” “digital,” and “traffic flow characteristics” layers was comparatively high. This shows that both technical faults and real-time interaction risks significantly affect the safety of autonomous vehicles. In addition, driving behavior patterns of autonomous systems, perception and decision-making errors, and external communication threats were identified as more influential core factors than physical infrastructure. Digital-based risk factors such as “DDT fallback” and “cyberattack,” which are not captured in existing accident data, were additionally derived through expert seminars. This indicates the need for a diversified approach to identify high-risk factors in the future design of autonomous driving safety systems. By quantifying the identified high-risk factors through the AHP and the average ranking method, they were converted into input variables directly applicable to prediction models. Utilizing these variables, the study proposed a dual-layer prediction framework that distinguishes between spatial units (road segments) and object units (autonomous vehicles)—a key contribution of this research.

This study offers three main contributions. First, an attempt was made to enhance the data-based precision of autonomous driving safety research by comprehensively analyzing empirical data with varying characteristics and deriving high-risk factors from multiple perspectives. Second, layer-based expert judgments were systematically quantified using the AHP and the average ranking method. They were subsequently converted into structured input variables for use in prediction models. Third, a basis for linking real-time response with proactive risk diagnosis was presented by distinguishing between the space level (road segments) and the object level (autonomous vehicles) in prediction model design.

However, this study has the following limitations. First, the proposed framework and high-risk factors have not yet been empirically validated in actual autonomous driving environments. Second, most of the autonomous vehicle data used in the analysis correspond to Level 3, which limits applicability to highly automated (Level 4) scenarios. Third, the expert evaluations obtained with the AHP are subject to potential bias depending on the composition and judgment of the expert panel. Fourth, the actual influence of the quantitatively derived factors on accident occurrence or near-miss situations has not been empirically verified. Fifth, the study did not address how prediction model applicability and performance may degrade under varying traffic densities or in the absence of real-time sensor data.

Future research should implement prediction models based on the identified high-risk factors and validate their performance across various conditions using simulators or testbeds. Moreover, acquiring Level 4 data (including sensor data) or designing scenario-based experiments could expand applicability. Combining expert evaluation with data-driven analysis would further strengthen model objectivity and effectiveness. Such studies could support the development of real-time risk prediction systems for autonomous vehicles, the refinement of scenario-based simulations, and infrastructure strategies focused on high-risk zones, ultimately contributing to safety assurance and policy development in the early stages of commercialization.

This study provides a foundation for building a sustainable traffic safety system by detecting and structuring risk factors in advance, going beyond short-term accident prevention. Specifically, the multi-layered structure and the system of quantified indicators can inform the design of real-time response systems, infrastructure strategies targeting high-risk segments, and policy interventions from a proactive risk management perspective. This approach can help minimize accident risk in the early stages of autonomous vehicle commercialization and enhance the long-term resilience and operational stability of the traffic system.

Author Contributions

Conceptualization, H.H., J.J. and S.L.; methodology, H.H. and J.J.; software, H.H. and J.J.; formal analysis, H.H. and J.J.; investigation, H.H., J.J. and J.L.; resources, S.L.; data curation, H.H. and J.J.; writing—original draft preparation, H.H.; writing—review and editing, H.H., J.J., S.L. and J.L.; visualization, H.H. and J.J.; supervision, S.L.; project administration, S.L.; funding acquisition, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Korea Institute of Police Technology (No: RS-2024-00405603).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data collected in this study are not publicly available; interested parties may contact the corresponding author to request access.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Layer-wise high-risk factors in mixed autonomous vehicle environments.

Layer	High-Risk Factors	Literature Review	Expert Consultation	Historical Accident Data: Police Accident Data (CV)	Historical Accident Data: CA DMV Collision Report (AV)	Video-Based Accident Data: Accident Video (CV)	Video-Based Accident Data: Accident Video (AV)	Video-Based Accident Data: Driving Video (AV)
Road facilities	Underpass			o
	Ramp on/off		o		o
	Single roadway						o
	Intersection	o	o	o	o	o		o
	Tunnel		o
	Roundabout		o
	Road lighting		o
	Expressway/Highway							o
	Road structure/facility			o	o		o
	# of lanes	o	o
Variable and temporary facilities	Bus-only lane		o			o
	Variable lane		o
	High-accident zone		o
	Construction zone		o
Traffic flow characteristics	Traffic conflict		o
	Dilemma zone		o
	Vehicle type distribution	o	o
	Bottleneck point		o
	Emergency vehicle
	Stop-and-go traffic		o
Environmental variable	Nighttime				o		o	o
	Daytime					o		o
	Fog	o	o
	Cloudy		o	o	o
	Wet/Moist Surface			o
	Clear		o		o		o	o
	Dry				o
	Rain/Snow	o		o
Moving objects	Lane change		o		o
	Sudden stop		o		o	o	o
	Right turn		o		o
	Left turn		o		o
	Moving straight		o		o	o	o
	Pedestrian	o		o		o		o
	Truck			o	o
	Passenger Car				o	o
	Van				o	o
	Bicycle				o			o
	Motorcycle			o		o
	On-road parking		o					o
	On-road stopping		o		o			o
	Inter-vehicle distance		o
	Centerline violation			o		o
	Signal violation		o	o		o
	Drunk driving						o
	Signal waiting
	Driver inattention		o		o	o
Digital	Perception error		o				o
	Decision error		o
	Control error		o
	DDT fallback		o
	In-vehicle/V2X communication		o
	Hacking		o
	Cyberattack		o

Appendix B

Table A2. Comprehensive variable list for high-risk indicator derivation.

Layer	High-Risk Indicators	Definition
Road facilities	Presence of intersection	Indicates whether an intersection is present in the road segment.
	Presence of a crosswalk	Indicates whether a pedestrian crosswalk exists in the area.
	Intersection type	Specifies the type of intersection, such as T-junction or four-way.
	Presence of on/off-ramp	Indicates the presence of highway or expressway access ramps.
	Speed limit	Maximum legal speed allowed on the road segment.
	Lane count	Number of lanes in the road segment.
	Presence of roundabout	Indicates whether a roundabout is present in the segment.
	Presence of underpass (road)	Indicates whether the segment includes an underpass.
	Underpass (road) length	Length of the road underpass in meters.
	Presence of tunnel	Indicates whether a tunnel is present on the road segment.
	Tunnel length	Length of the tunnel in meters.
Variable and temporary facilities	Presence of road obstacles	Indicates whether any obstacles are present on the road.
	Location of road obstacles	Specifies where obstacles are located (e.g., lane or shoulder).
	Type of road obstacles	Describes the obstacle type (e.g., pothole, debris, or roadkill).
	Presence of construction/work zone	Indicates if construction or maintenance work is ongoing.
	Location of construction/work zone	Specifies the position of the work zone on the road.
	Presence of reversible lane	Indicates if a reversible traffic lane is present.
Traffic flow characteristics	Weaving ratio	Rate of lane-changing or weaving maneuvers per segment.
	PET (post encroachment time)	Time interval between two vehicles occupying the same location.
	Standard deviation of link travel speed	Measures the variation in vehicle speeds across a road segment.
	Presence of accident-prone zone	Indicates if the segment is classified as accident-prone.
	Proportion of risky driving behavior	Share of vehicles exhibiting aggressive or dangerous maneuvers.
	EDI (erratic driving index)	Metric quantifying unexpected or irregular vehicle movements.
	Stopping rate within the segment	Frequency of vehicles coming to a complete stop in the segment.
	Presence of dilemma zone	Indicates if a zone exists where drivers hesitate to stop or proceed at yellow light.
	Speed difference between adjacent links	Difference in average speed between neighboring road segments.
Environmental variables	Heavy vehicle ratio	Percentage of large vehicles (e.g., trucks and buses) among total traffic.
	Snowfall amount	Volume of snowfall measured in the area.
	Snowfall duration	Duration of snowfall in hours or minutes.
	Rainfall amount	Volume of rainfall measured in the area.
	Rainfall duration	Duration of rainfall in hours or minutes.
	Temperature	Ambient temperature in degrees Celsius.
	Night time period presence	Indicates if the condition occurs during nighttime.
Moving objects	Presence of pedestrians near autonomous vehicle	Indicates pedestrian presence close to autonomous vehicles.
	Presence of freight vehicles near autonomous vehicle	Presence of trucks or delivery vehicles near AVs.
	Presence of signal violation	Indicates whether a traffic signal violation has occurred.
	Acceleration/Deceleration	Change in vehicle speed per unit time (m/s²).
	Speed	Current speed of the vehicle.
	Jerk	Rate of change in acceleration, indicating abrupt movement.
	Angular velocity per second	Rate of vehicle’s directional rotation over time.
	Inter-vehicle distance	Distance between the subject vehicle and the one ahead.
Digital	Presence of perception error	Whether the AV experienced sensor or detection failures.
	Frequency of perception errors	How often perception errors occur within a timeframe.
	Sensor field of view	Detection range and coverage angle of the AV’s sensors.
	Presence of decision-making error	Whether the AV made an incorrect driving decision.
	Frequency of decision-making errors	Number of incorrect decisions made during operation.
	Presence of control error	Whether the AV experienced control system malfunctions.
	Frequency of control errors	Number of control-related failures in vehicle systems.
	Presence of DDT fallback	Indicates whether fallback mode was triggered in AV operation.
	Frequency of DDT fallback	Count of fallback events initiated during driving.
	Presence of cyberattack	Whether a cyberattack targeting vehicle systems occurred.
	Frequency of cyberattacks	Number of cyberattacks detected during vehicle operation.
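Several of the object-level indicators in Table A2 can be derived from basic trajectory samples. The sketch below assumes speed is sampled at a fixed interval and uses simple forward differences for acceleration and jerk; it also computes PET as the gap between one road user leaving a conflict point and the next one arriving. The function names, the sampling assumption, and the example values are hypothetical and are not taken from the study's data pipeline.

```python
def finite_difference(values, dt):
    """Forward differences of a uniformly sampled series (units per second)."""
    return [(later - earlier) / dt for earlier, later in zip(values, values[1:])]


def kinematic_indicators(speeds_mps, dt):
    """Return (acceleration in m/s^2, jerk in m/s^3) from speed samples in m/s."""
    acceleration = finite_difference(speeds_mps, dt)
    jerk = finite_difference(acceleration, dt)
    return acceleration, jerk


def post_encroachment_time(first_exit_time_s, second_entry_time_s):
    """PET: time between the first user leaving and the second user arriving."""
    return second_entry_time_s - first_exit_time_s


# Hypothetical 1 Hz speed trace of a vehicle braking from 10 m/s to 4 m/s.
speeds = [10.0, 10.0, 8.0, 4.0, 4.0]
acceleration, jerk = kinematic_indicators(speeds, dt=1.0)
pet = post_encroachment_time(first_exit_time_s=12.4, second_entry_time_s=13.9)
```

Real sensor traces would usually need smoothing before differencing, since jerk computed this way amplifies measurement noise.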

Appendix C

Table A3. Abbreviation list.

Abbreviation	Full Name	Definition
PEGASUS Joint Project	Project for the Establishment of Generally Accepted Quality Standards, Tools, Methods, Processes, and Scenarios for the Approval of Autonomous Driving Functions	German Joint Project for Standardization of Autonomous Driving Safety Validation
AHP	Analytic Hierarchy Process	A hierarchical decision-making analysis method that quantifies the relative importance of factors
ADS	Autonomous Driving Systems	Autonomous driving system
DSSAD	Data Storage System for Automated Driving	Autonomous driving recorder (stores data during operation)
EDR	Event Data Recorder	Device that stores data at the moment of an accident (also used in non-autonomous vehicles)
TTC	Time to Collision	Time remaining until collision (a risk indicator)
PET	Post Encroachment Time	Time gap between two objects passing through the same point (a surrogate indicator for spatial threat)
TIT	Total Impact Time	Total time to collision (used as a composite indicator of collision probability)
CPI	Crash Potential Index	Collision potential index
CAR	Collision Avoidance Rate	Collision avoidance rate (evasive performance of the system or driver)
HD Maps	High-Definition Maps	High-precision map (centimeter-level accuracy, essential for autonomous driving)
California DMV	California Department of Motor Vehicles	California Department of Motor Vehicles (responsible for releasing autonomous vehicle-related data)
MCDM	Multi-criteria Decision-making	Multi-criteria decision-making methods (including AHP, TOPSIS, etc.)
RR	Risk Ratio	A ratio of risk levels between comparison groups
ES	Effect Size	A statistical measure of magnitude of impact
CI	Consistency Index	Consistency index (used to assess consistency in AHP matrices)
CR	Consistency Ratio	A value obtained by dividing the CI by the RI, used to judge acceptability
RI	Random Index	Average consistency index of a random matrix (used in CR calculation)
BERT	Bidirectional Encoder Representations from Transformers	Deep learning model for natural language processing
CV	Conventional Vehicle	Conventional vehicle (driven by a human)
AV	Autonomous Vehicle	Autonomous vehicle
CAV	Connected Autonomous Vehicle	Connected autonomous vehicle (AV capable of vehicle to everything communication)
DDT Fallback	Dynamic Driving Task Fallback	Fallback driving task performed by a human when the autonomous system fails during driving
V2X Communication	Vehicle-to-Everything Communication	Communication between the vehicle and external elements (includes V2V (Vehicle-to-Vehicle), V2I (Vehicle-to-Infrastructure), and V2P (Vehicle-to-Pedestrian), etc.)
EDI	Erratic Driving Index	Calculated as the sum of the area exceeding critical thresholds of aggressive driving indicators (e.g., speed, acceleration, jerk, and yaw) during the analysis period, divided by travel time
LSTM	Long Short-Term Memory	Recurrent neural network (RNN) architecture specialized in time-series data prediction
GRU	Gated Recurrent Unit	Recurrent neural network architecture that is a lightweight version of LSTM
SVM	Support Vector Machine	Supervised learning models used for classification and regression analysis
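The EDI definition in Table A3 (area above critical thresholds, summed over indicators and divided by travel time) can be sketched as follows. This is an illustrative reading of that definition only: the rectangle-rule integration, the use of absolute values, the threshold values, and the absence of any normalization across indicators are all assumptions, and the source does not specify them.

```python
def exceedance_area(series, threshold, dt):
    """Area of |x| above the threshold, integrated with the rectangle rule."""
    return sum(max(abs(x) - threshold, 0.0) for x in series) * dt


def erratic_driving_index(indicators, thresholds, dt):
    """Sum of exceedance areas over all indicators, divided by travel time."""
    sample_count = len(next(iter(indicators.values())))
    travel_time = sample_count * dt
    total_area = sum(
        exceedance_area(indicators[name], thresholds[name], dt)
        for name in indicators
    )
    return total_area / travel_time


# Hypothetical 1 Hz indicator traces and hypothetical critical thresholds.
indicators = {
    "acceleration": [0.5, -3.5, -4.0, 1.0],  # m/s^2
    "jerk": [1.0, 6.0, -2.0, 0.0],           # m/s^3
}
thresholds = {"acceleration": 3.0, "jerk": 5.0}
edi = erratic_driving_index(indicators, thresholds, dt=1.0)
```

In this example the acceleration trace exceeds its threshold by 0.5 and 1.0, the jerk trace by 1.0, so the index is 2.5 divided by 4 s of travel, or 0.625.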

References

Anderson, J.M.; Kalra, N.; Stanley, K.D.; Samaras, C.; Oluwatola, T.A. Autonomous Vehicle Technology: A Guide for Policymakers; Rand Corporation: Santa Monica, CA, USA, 2014.
Santacreu, A.; Yannis, G.; Léon, O.; Crist, P. Safe Micromobility; International Transport Forum: Leipzig, Germany; OECD: Paris, France, 2020; Available online: https://www.researchgate.net/publication/357689595_Safe_Micromobility (accessed on 17 February 2020).
Katrakazas, C.; Quddus, M.; Chen, W.-H. A new integrated collision risk assessment methodology for autonomous vehicles. Accid. Anal. Prev. 2019, 127, 61–79.
Goudarzi, P.; Hassanzadeh, B. Collision risk in autonomous vehicles: Classification, challenges, and open research areas. Vehicles 2024, 6, 157–190.
Xu, C.; Wang, W.; Liu, P.; Zhang, F. Development of a real-time crash risk prediction model incorporating the various crash mechanisms across different traffic states. Traffic Inj. Prev. 2015, 16, 28–35.
Li, P.; Abdel-Aty, M.; Yuan, J. Real-time crash risk prediction on arterials based on LSTM-CNN. Accid. Anal. Prev. 2020, 135, 105371.
Hossain, M.; Abdel-Aty, M.; Quddus, M.A.; Muromachi, Y.; Sadeek, S.N. Real-time crash prediction models: State-of-the-art, design pathways and ubiquitous requirements. Accid. Anal. Prev. 2019, 124, 66–84.
Petrović, Đ.; Mijailović, R.; Pešić, D. Traffic accidents with autonomous vehicles: Type of collisions, manoeuvres and errors of conventional vehicles’ drivers. Transp. Res. Procedia 2020, 45, 161–168.
Kim, C.H.; Kim, J.H. Investigating autonomous vehicle accidents at urban intersections based on road geometry data. J. Korean Soc. Road Eng. 2023, 25, 255–263.
Lee, S.H.; Park, J.Y. A study on autonomous vehicle crash hierarchy analysis and severity model based on Bayesian probabilistic inference. J. Korean Soc. Transp. 2024, 42, 77–93.
Jeong, A.R.; Cho, Y.; Oh, C. A methodology of identifying hazardous freeway segment based on multi-agent driving simulations for the mixed situation of autonomous and manual vehicles. J. Korean Soc. Transp. 2023, 41, 495–508.
Li, J.; Ling, M.; Zang, X.; Luo, Q.; Yang, J.; Chen, J.; Guo, X. Quantifying risks of lane-changing behavior in highways with vehicle trajectory data under different driving environments. Int. J. Mod. Phys. C 2024, 35, 2450141.
Park, H.; Haghani, A.; Samuel, S.; Knodler, M.A. Real-time prediction and avoidance of secondary crashes under unexpected traffic congestion. Accid. Anal. Prev. 2018, 112, 39–49.
Chen, S.; Piao, L.; Zang, X.; Luo, Q.; Li, J.; Yang, J.; Rong, J. Analyzing differences of highway lane-changing behavior using vehicle trajectory data. Phys. A Stat. Mech. Its Appl. 2023, 624, 128980.
Szénási, S.; Kertész, G.; Felde, I.; Nádai, L. Statistical accident analysis supporting the control of autonomous vehicles. J. Comput. Methods Sci. Eng. 2021, 21, 85–97.
Das, S.; Dutta, A.; Tsapakis, I. Automated vehicle collisions in California: Applying Bayesian latent class model. IATSS Res. 2020, 44, 300–308.
Zheng, O.; Abdel-Aty, M.; Wang, Z.; Ding, S.; Wang, D.; Huang, Y. Avoid: Autonomous vehicle operation incident dataset across the globe. arXiv 2023, arXiv:2303.12889.
Favarò, F.; Eurich, S.; Nader, N. Autonomous vehicles’ disengagements: Trends, triggers, and regulatory limitations. Accid. Anal. Prev. 2018, 110, 136–148.
Kim, J.-Y. Law and Economics of Artificial Intelligence: Optimal Liability Rules for Accident Losses Caused by Fully Autonomous Vehicles. SSRN Electron. J. 2023.
Hyeon, S.H.; Son, J.W.; Oh, Y.C.; You, B.Y. A study of the DSSAD data elements derivation through autonomous driving data analysis on expressways. J. Korean Soc. Intell. Transp. Syst. 2024, 23, 97–106.
Lee, J.; Ahn, S.; Lee, J.; Roh, C.; Chang, I. Analysis of Safety Indicators by Pedestrian Accident Types in Urban Community Roads. J. Korea Inst. Intell. Transp. Syst. 2024, 23, 34–46.
Vinoth, K.; Sasikumar, P. Multi-sensor fusion and segmentation for autonomous vehicle multi-object tracking using deep Q networks. Sci. Rep. 2024, 14, 31130.
Saputra, D.B. Accident Investigation in the Automated Traffic System. Master’s Thesis, Westsächsische Hochschule Zwickau, Zwickau, Germany, 2023.
Kang, H.J.; Woo, N.E.; Park, G.O.; Song, J.H. A study on the direction of data triggers and elements for automated vehicle data recorder. J. Auto-Veh. Saf. Assoc. 2023, 15, 71–78.
Masello, L.; Sheehan, B.; Murphy, F.; Castignani, G.; McDonnell, K.; Ryan, C. From traditional to autonomous vehicles: A systematic review of data availability. Transp. Res. Rec. 2022, 2676, 161–193.
Elamrani Abou Elassad, Z.; Mousannif, H.; Al Moatassime, H. Class-imbalanced crash prediction based on real-time traffic and weather data: A driving simulator study. Traffic Inj. Prev. 2020, 21, 201–208.
Kim, H.; Han, H.; You, Y.; Hong, J.; Song, T.J. A comprehensive traffic accident investigation system for identifying causes of the accident involving events with autonomous vehicle. J. Adv. Transp. 2024, 2024, 9966310.
Répás, J.; Berek, L.; Schmidt, M. Autonomous Vehicles Forensics-The next step of the Digital Vehicles Forensics. In Proceedings of the 2022 IEEE 1st International Conference on Cognitive Mobility (CogMob), Budapest, Hungary, 12–13 October 2022.
Girdhar, M.; Hong, J.; You, Y.; Song, T.J. Anomaly Detection for Connected and Automated Vehicles: Accident Analysis. In Proceedings of the 2023 IEEE Transportation Electrification Conference & Expo (ITEC), Detroit, MI, USA, 21–23 June 2023.
Park, G.O.; Kang, H.J.; Woo, N.E. Types and necessity of data (EDR/DSSAD) recorded by automated vehicles. Auto J. J. Korean Soc. Automot. Eng. 2023, 45, 24–27.
Park, S.M.; So, J.; Ko, H.; Jeong, H.; Yun, I. Development of safety evaluation scenarios for autonomous vehicle tests using 5-layer format (case of the community road). J. Korean Soc. Intell. Transp. Syst. 2019, 18, 114–128.
Lee, H.J.; Jang, M.; Song, J.; Hwang, K. Development of autonomous vehicle traffic accident scenarios in urban areas based on real-world accident data using association rule mining. J. Korean Soc. Transp. 2023, 41, 375.
Lee, J.W.; Lee, M.S.; Jeong, J.I. Intersection collision situation simulation of automated vehicle considering sensor range. J. Auto-Veh. Saf. Assoc. 2021, 13, 114–122.
Lee, J.M.; Jeong, E.I.; Song, B.S. Critical scenario generation for collision avoidance of automated vehicles based on traffic accident analysis and machine learning. J. Korean Soc. Automot. Eng. 2020, 28, 817–826.
Lee, W.; Kang, M.; Hwang, K. A study on predicting accident vulnerable situation and deriving scenarios of automated vehicle based on actual driving data using vision transformer. J. Korean Soc. Intell. Transp. Syst. 2022, 21, 233–252.
Oh, S.M.; Choi, J.H.; Jang, K.T.; Yoon, J.W. Analysis of autonomous vehicles risk cases for developing level 4+ autonomous driving test scenarios: Focusing on perceptual blind. J. Korean Soc. Intell. Transp. Syst. 2024, 23, 173–188.
Qu, D.; Wang, K.; Dai, S.; Chen, Y.; Cui, S.; Yang, Y. Vehicle game lane-changing mechanism and strategy evolution based on trajectory data. Sci. Rep. 2025, 15, 4841.
Song, Y.; Chitturi, M.V.; Noyce, D.A. Automated vehicle crash sequences: Patterns and potential uses in safety testing. Accid. Anal. Prev. 2021, 153, 106017.
So, J.J.; Park, I.; Wee, J.; Park, S.; Yun, I. Generating traffic safety test scenarios for automated vehicles using a big data technique. KSCE J. Civ. Eng. 2019, 23, 2702–2712.
Chen, S.; Zong, S.; Chen, T.; Huang, Z.; Chen, Y.; Labi, S. A taxonomy for autonomous vehicles considering ambient road infrastructure. Sustainability 2023, 15, 11258.
Guo, H.; Xie, K.; Keyvan-Ekbatani, M. Modeling driver’s evasive behavior during safety-critical lane changes: Two-dimensional time-to-collision and deep reinforcement learning. Accid. Anal. Prev. 2023, 186, 107063.
Virdi, N.; Grzybowska, H.; Waller, S.T.; Dixit, V. A safety assessment of mixed fleets with connected and autonomous vehicles using the surrogate safety assessment module. Accid. Anal. Prev. 2019, 131, 95–111.
Shi, X.; Wong, Y.D.; Li, M.Z.F.; Chai, C. Key risk indicators for accident assessment conditioned on pre-crash vehicle trajectory. Accid. Anal. Prev. 2018, 117, 346–356.
Khan, G.; Qin, X.; Noyce, D.A. Spatial analysis of weather crash patterns. J. Transp. Eng. 2008, 134, 191–202.
Tang, W.; Wang, H.; Ma, J.; Yang, C.; Yin, C. Vehicle collision risk assessment method in highway work zone based on trajectory data. Traffic Inj. Prev. 2025, 1–8.
Meng, D.; Xiao, W.; Zhang, L.; Zhang, Z.; Liu, Z. Vehicle trajectory prediction based predictive collision risk assessment for autonomous driving in highway scenarios. arXiv 2023, arXiv:2304.05610.
Khelfa, B.; Ba, I.; Tordeux, A. Predicting highway lane-changing maneuvers: A benchmark analysis of machine and ensemble learning algorithms. Phys. A Stat. Mech. Its Appl. 2023, 612, 128471.
Gu, R. Integrating fuzzy trajectory data and artificial intelligence methods for multi-style lane-changing behavior prediction. arXiv 2022, arXiv:2205.05016.
Wang, X.; Liu, S.; Zhang, J.; Ni, D. Real-Time Risk Identification and Prediction for the Target Lane’s Following Vehicle during Lane Change. Transp. Res. Rec. 2024, 2678, 1785–1798.
Choi, J.H.; Lim, J.B.; Lee, S.B. A meta analysis of the effects of road safety facilities on accident reduction: Focusing on signalized intersection. J. Korean Soc. Transp. 2016, 34, 291–303.
Jo, Y.; Youn, S.-M.; Oh, C. Effectiveness Analysis of Variable Speed Limit Systems (VSL) in Work Zones based on Meta-analysis. J. Korea Inst. Intell. Transp. Syst. 2016, 15, 91–103.
Hu, W.; Zhang, T.; Zhang, Y.; Chang, A.H.S. Non-driving-related tasks and drivers’ takeover time: A meta-analysis. Transp. Res. Part F Traffic Psychol. Behav. 2024, 103, 623–637.
Zhang, T.; Zeng, W.; Zhang, Y.; Tao, D.; Li, G.; Qu, X. What drives people to use automated vehicles? A meta-analytic review. Accid. Anal. Prev. 2021, 159, 106270.
Liberatore, M.J.; Nydick, R.L. The analytic hierarchy process in medical and health care decision making: A literature review. Eur. J. Oper. Res. 2008, 189, 194–207.
Ho, W. Integrated analytic hierarchy process and its applications–A literature review. Eur. J. Oper. Res. 2008, 186, 211–228.
Saaty, T.L. Fundamentals of Decision Making and Priority Theory with the Analytic Hierarchy Process; RWS Publications: Pittsburgh, PA, USA, 1994.
Ishizaka, A.; Labib, A. Review of the main developments in the analytic hierarchy process. Expert Syst. Appl. 2011, 38, 14336–14345.
Borgulya, I. A ranking method for multiple-criteria decision-making. Int. J. Syst. Sci. 1997, 28, 905–912.
Boran, F.E.; Genç, S.; Kurt, M.; Akay, D. A multi-criteria intuitionistic fuzzy group decision making for supplier selection with TOPSIS method. Expert Syst. Appl. 2009, 36, 11363–11368.
Taherdoost, H.; Madanchian, M. Multi-criteria decision making (MCDM) methods and concepts. Encyclopedia 2023, 3, 77–87.
Kim, H.; You, Y.; Han, H.; Cho, M.J.; Song, T.J. Traffic Accidents Scenarios Based on Autonomous Vehicle Functional Safety Systems. J. Korean Soc. Intell. Transp. Syst. 2025, 24, 250–266.
Kang, W.; Wee, J.; Shin, H.C.; Kim, S. A study on policy directions for smart mobility service in the post-COVID-19 era using the ahp technique. J. Korea Inst. Intell. Transp. Syst. 2024, 23, 100–116.
Cheng, E.W.; Li, H. Analytic hierarchy process: An approach to determine measures for business performance. Meas. Bus. Excell. 2001, 5, 30–37.
Wedley, W.C. Consistency prediction for incomplete AHP matrices. Math. Comput. Model. 1993, 17, 151–161.

Figure 1. Overall research flow.

Figure 2. Identification of high-risk factors based on accident data.

Figure 3. Process of identifying high-risk factors based on autonomous vehicle driving video analysis [61].

Figure 4. Conceptual framework for high-risk prediction model (road segment level).

Figure 5. Conceptual framework for high-risk prediction model (autonomous vehicle level).

Table 1. Results of binary meta-analysis.

Risk Factors	# of Events	# of Studies	Relative Risk	95% Confidence Interval (Lower)	95% Confidence Interval (Upper)
Acceleration/Deceleration	22	58	0.3793	0.2656	0.508
Intersection	20	58	0.3448	0.2356	0.508
Rear-end collision	18	58	0.3103	0.2062	0.4733
Fog	18	58	0.3103	0.2062	0.138
Rain	17	58	0.2931	0.1918	0.138
Inter-vehicle distance	15	58	0.2586	0.1635	0.4201
# of lanes	15	58	0.2586	0.1635	0.4201
Presence of lighting	12	58	0.2069	0.1225	0.3838
Pedestrian collision	8	58	0.1379	0.0716	0.3277
Bicycle collision	7	58	0.1207	0.0597	0.2493
Lane change	6	58	0.1034	0.0483	0.2288
Steering angle	5	58	0.0862	0.0379	0.1864
Driver intervention	5	58	0.0862	0.0379	0.1864
Vehicle error	5	58	0.0862	0.0379	0.1864
Tunnel	4	58	0.0689	0.0271	0.1643
Roadside parking	4	58	0.0689	0.0271	0.1643
Wind speed	4	58	0.0689	0.0271	0.1643
Jerk	3	58	0.0517	0.0177	0.1414
Severity	3	58	0.0517	0.0177	0.1414
TTC	3	58	0.0517	0.0177	0.1414
Sideswipe	2	58	0.0345	0.0095	0.1173
EDR	2	58	0.0345	0.0095	0.1173
Hacking	2	58	0.0345	0.0095	0.1173
Perception error	2	58	0.0345	0.0095	0.1173
Shockwave	2	58	0.0345	0.0095	0.1173
Bus stop	1	58	0.0172	0.0031	0.0914
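The ratio column in Table 1 equals the number of events divided by the number of studies (for example, 22/58 = 0.3793). The interval method is not stated in this section; the bounds reported for the first and last rows are numerically consistent with a Wilson score interval at z = 1.96, so the sketch below adopts that as an assumption. The function name is hypothetical.

```python
import math


def wilson_interval(events, total, z=1.96):
    """Proportion events/total with a Wilson score interval (assumed method)."""
    p = events / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half_width = z * math.sqrt(
        p * (1.0 - p) / total + z * z / (4.0 * total * total)
    ) / denom
    return p, center - half_width, center + half_width


# First row of Table 1: Acceleration/Deceleration, 22 events in 58 studies.
ratio, lower, upper = wilson_interval(22, 58)

# Last row of Table 1: Bus stop, 1 event in 58 studies.
ratio_last, lower_last, upper_last = wilson_interval(1, 58)
```

Under this assumption the first row evaluates to roughly 0.379 with bounds near 0.266 and 0.508, and the last row to roughly 0.017 with bounds near 0.003 and 0.091, in line with the tabulated values for those rows.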

Table 2. Results of effect size meta-analysis.

Risk Factors	Mean Effect Size	p-Value	Risk Factors	Mean Effect Size	p-Value
Acceleration/Deceleration	0.5050	0.000011	Tunnel	0.0771	0.069588
Fog	0.3447	0.000112	Jerk	0.0581	0.151900
Rain	0.3172	0.000218	Steering angle	0.0538	0.161178
Intersection	0.3097	0.000121	Vehicle error	0.0457	0.153285
# of lanes	0.2872	0.000607	Roadside parking	0.0372	0.192602
Rear-end collision	0.2665	0.000746	Bicycle collision	0.0353	0.013797
Inter-vehicle distance	0.2512	0.001605	TTC	0.0330	0.243168
Pedestrian collision	0.1237	0.018445	Shockwave	0.0322	0.159097
Dedicated lane	0.0963	0.045377	Sideswipe	0.0225	0.194688
Wind speed	0.0826	0.083216	EDR	0.0159	0.321537
Severity	0.0826	0.083216	Hacking	0.0097	0.160178
Lane change	0.0803	0.061907	Perception error	0.0064	0.160178
Driver intervention	0.0803	0.099426	Bus stop	0.0012	0.321537

Table 3. Results of key factor identification through expert consultation.

No.	Key Points	Risk Factors
1	Necessity of AI-based predictive driving analysis in connected autonomous vehicle (CAV) environments	Intersection, inter-vehicle distance, dilemma zone
2	Scenario-based prediction models and object recognition using video information	Perception error, intersection, right turn, roundabout
3	Need for lane operation analysis according to the penetration rate of autonomous vehicles	Vehicle type distribution, roundabout
4	Necessity of analyzing video-based accident data and identifying risk factors	Construction zone, high-accident zone, dedicated lane, vehicle type distribution, bottleneck point
5	Real-time accident risk prediction and infrastructure-based driving assistance technologies	Intersection, ramp in/out, lane changing, # of lanes, stop and go traffic, traffic conflict
6	Importance of establishing a cybersecurity framework for cooperative autonomous driving	Perception/decision/control error, hacking, DDT (dynamic driving task) fallback, in-vehicle/V2X (vehicle to everything) communication, cyberattack
7	Identification and risk prediction of complex driving hazard situations	Intersection, tunnel

Table 4. Reclassified data layers and their correspondence to the original PEGASUS framework.

No.	Reclassified Layer (This Study)	Corresponding PEGASUS Layer	Comparison and Reclassification Direction
1	Road facilities	Road level, traffic infrastructure	Consolidates physical and structural elements such as alignment, grade, and roadside objects. Redefined to emphasize infrastructure-related baseline conditions.
2	Variable and temporary facilities	Traffic infrastructure, temporary modifications	Separates dynamic or situational elements (e.g., bus-only lanes and construction zones) from static infrastructure for improved risk predictability.
3	Traffic flow characteristics	(Not specified in PEGASUS)	Newly introduced to capture macroscopic flow features (e.g., bottlenecks, and conflict points) as key risk indicators.
4	Environmental variable	Environmental conditions	Retains external conditions such as weather, lighting, and road surface, but highlights measurability and influence on sensor performance.
5	Moving objects	Dynamic objects	Refined focus on the behavior and interaction patterns of road users (e.g., lane changes, violations, and pedestrian movements).
6	Digital	Digital information	Expanded to encompass system-level risks such as perception/judgment errors, fallback scenarios, and cybersecurity threats in autonomous driving systems.

Table 5. The risk factors included in the final AHP survey.

Layer	Risk Factors
Road facilities	Intersection	Roundabout
	Ramp on/off	Underpass
	Tunnel	# of lanes
	Road structure/facility	Expressway/Highway
Variable and temporary facilities	Road obstacle	Construction zone
	Variable lane	Bus-only lane
Traffic flow characteristics	Traffic conflict	High-accident zone
	Dilemma zone	Stop and go traffic
	Bottleneck point	Vehicle type distribution
Environmental variable	Snowfall	Wet/Moist surface
	Rainfall	Night time
	Fog
Moving objects	Truck	Pedestrian
	Signal violation	Sudden stop
	Lane change	Inter-vehicle distance
	Centerline violation	Turning
	On-road parking/stopping
Digital	Perception error	Decision error
	Control error	Cyberattack
	DDT fallback	In-vehicle/V2X communication

Table 6. Results of the AHP (the first hierarchy).

Layer		Importance
5	Moving objects	0.2437
6	Digital	0.2246
3	Traffic flow characteristics	0.1889
4	Environmental variables	0.1870
2	Variable/Temporary facilities	0.0915
1	Road facilities	0.0643

Table 7. Results of the AHP (the second hierarchy).

Layer		High-Risk Factors	Importance	Rank
1	Road facilities	Intersection	0.2777	1
		Roundabout	0.1958	2
		Ramp on/off	0.1880	3
		Underpass	0.1026	4
		Tunnel	0.0846	5
		Road structure/facility	0.0566	6
		# of lanes	0.0505	7
		Expressway/highway	0.0442	8
2	Variable/Temporary facilities	Road obstacle	0.4136	1
		Construction zone	0.2878	2
		Variable lane	0.2056	3
		Bus-only lane	0.0931	4
3	Traffic flow characteristics	Traffic conflict	0.2776	1
		Dilemma zone	0.1713	2
		High-accident zone	0.1684	3
		Stop and go traffic	0.1460	4
		Bottleneck point	0.1340	5
		Vehicle type distribution	0.1026	6
4	Environmental variables	Snowfall	0.3003	1
		Wet/Moist surface	0.2811	2
		Rainfall	0.1933	3
		Night time	0.1656	4
		Fog	0.0598	5
5	Moving objects	Truck	0.1840	1
		Signal violation	0.1702	2
		Pedestrian	0.1506	3
		Sudden stop	0.1346	4
		Inter-vehicle distance	0.1053	5
		Lane change	0.0984	6
		Centerline violation	0.0595	7
		Turning	0.0557	8
		On-road parking/stopping	0.0417	9
6	Digital	Perception error	0.2512	1
		Decision error	0.2408	2
		Cyberattack	0.1604	3
		Control error	0.1277	4
		DDT fallback	0.1168	5
		In-vehicle/V2X communication	0.1032	6
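
Tables 6 and 7 report weights at two levels of the hierarchy. In common AHP practice, a factor's global weight is obtained by multiplying its layer weight by its within-layer (local) weight. The source does not state whether the authors computed global weights or used them to select the final factors, so the Python sketch below is illustrative only. The function names are hypothetical, and only a handful of Table 7 entries are included.

```python
# First-hierarchy (layer) weights, copied from Table 6.
LAYER_WEIGHTS = {
    "Moving objects": 0.2437,
    "Digital": 0.2246,
    "Traffic flow characteristics": 0.1889,
    "Environmental variables": 0.1870,
    "Variable/Temporary facilities": 0.0915,
    "Road facilities": 0.0643,
}

# A small subset of second-hierarchy (local) weights, copied from Table 7.
LOCAL_WEIGHTS = {
    ("Moving objects", "Truck"): 0.1840,
    ("Digital", "Perception error"): 0.2512,
    ("Road facilities", "Intersection"): 0.2777,
    ("Variable/Temporary facilities", "Road obstacle"): 0.4136,
}


def global_weight(layer, factor):
    """Global weight = layer weight x local weight (standard AHP composition)."""
    return LAYER_WEIGHTS[layer] * LOCAL_WEIGHTS[(layer, factor)]


def rank_by_global_weight():
    """Return (layer, factor) pairs sorted from highest to lowest global weight."""
    scored = {key: global_weight(*key) for key in LOCAL_WEIGHTS}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for (layer, factor), weight in rank_by_global_weight():
        print(f"{layer:32s} {factor:18s} {weight:.4f}")
```

Under this composition, a top-ranked factor in a low-weight layer (for example, Intersection in Road facilities) can still end up with a smaller global weight than a top-ranked factor in a high-weight layer such as Digital.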

Table 8. Final high-risk factors identified.

Layer	High-Risk Factors
Road Facilities	Intersection	Roundabout
	Ramp on/off	Underpass
	Tunnel
Variable and Temporary Facilities	Road obstacle	Construction zone
	Variable lane
Traffic Flow Characteristics	Traffic conflict	High-accident zone
	Dilemma zone	Stop and go traffic
	Bottleneck point
Environmental Variables	Snowfall	Wet/Moist surface
	Rainfall	Night time
Moving Objects	Truck	Pedestrian
	Signal violation	Sudden stop
	Lane change	Inter-vehicle distance
Digital	Perception error	Decision error
	Control error	Cyberattack
	DDT fallback

Table 9. Additional candidate indicators for high-risk prediction: practical considerations and data availability.

High-Risk Factors	Definition	Purpose of Use
Weaving ratio	Proportion of weaving traffic to total traffic volume within a weaving segment. Weaving traffic: the traffic flow that must cross other streams within a weaving segment to reach its intended direction. Weaving segment: a road segment (≤750 m) where vehicles cross paths in the same direction and change lanes without traffic control facilities, typically with merging and diverging areas in sequence.	Conflict risk analysis within road segments
EDI (Erratic Driving Index)	Calculated as the sum of the area exceeding critical thresholds of aggressive driving indicators (e.g., speed, acceleration, jerk, and yaw) during the analysis period, divided by travel time	Assessment of individual vehicle driving stability
Proportion of risky driving behavior	Proportion of time during which risky driving behaviors (speeding, sudden deceleration, hard braking, and sharp turning) are observed in the analysis period	Assessment of individual vehicle driving stability
Stopping rate within the segment	Number of stops per unit time within the segment (excluding stops due to traffic signals)	Used for risk assessment of crash occurrence and congestion evaluation
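
The Table 9 definitions can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes uniformly sampled signals with time step `dt`, approximates the "area exceeding critical thresholds" with a rectangle rule, measures exceedance on absolute values, and sums raw areas across indicators without normalising units (the definition does not say how indicators are combined). All function names and threshold values are hypothetical.

```python
def weaving_ratio(weaving_volume, total_volume):
    """Share of weaving traffic in the total volume of a weaving segment."""
    if total_volume <= 0:
        raise ValueError("total_volume must be positive")
    return weaving_volume / total_volume


def exceedance_area(samples, threshold, dt):
    """Rectangle-rule area of |signal| above a threshold (assumed form)."""
    return sum(max(abs(x) - threshold, 0.0) for x in samples) * dt


def erratic_driving_index(series, thresholds, dt):
    """Summed exceedance area over all indicators, divided by travel time.

    series:     indicator name -> list of samples (equal lengths assumed)
    thresholds: indicator name -> critical threshold (hypothetical values)
    """
    n_samples = len(next(iter(series.values())))
    if n_samples == 0:
        raise ValueError("series must contain at least one sample")
    travel_time = n_samples * dt
    total_area = sum(
        exceedance_area(series[name], thresholds[name], dt) for name in series
    )
    return total_area / travel_time


def risky_time_proportion(risky_flags):
    """Fraction of time steps flagged as risky driving behavior."""
    if not risky_flags:
        raise ValueError("risky_flags must not be empty")
    return sum(1 for flag in risky_flags if flag) / len(risky_flags)


def stopping_rate(stop_count, duration):
    """Stops per unit time in a segment (signal-induced stops already excluded)."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return stop_count / duration
```

For example, `weaving_ratio(300, 1200)` gives 0.25, and `risky_time_proportion([True, False, False, True])` gives 0.5.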

Table 10. Key indicators for application in high-risk prediction models.

Layer		High-Risk Factors	High-Risk Indicators	Average Rank	Rank
1	Road facilities	Intersection	Presence of intersection	1.86	1
			Presence of crosswalk	2.93	2
			Intersection type (T-junction, four-way, etc.)	3.50	3
		Ramp on/off	Presence of on/off-ramp	1.21	1
			Speed limit	2.21	2
			Lane count	2.64	3
		Roundabout	Presence of roundabout	1.64	1
			Lane count	2.79	3
			Presence of crosswalk	2.57	2
		Underpass	Presence of underpass (road)	1.50	1
			Underpass (road) length	2.29	2
			Lane count	2.57	3
		Tunnel	Presence of tunnel	1.43	1
			Tunnel length	2.43	2
			Lane count	2.50	3
2	Variable and temporary facilities	Road obstacle	Presence of road obstacles	1.36	1
			Location of road obstacles (lane, shoulder, etc.)	2.36	3
			Type of road obstacles (pothole, debris, roadkill, etc.)	2.29	2
		Construction zone	Presence of construction/work zone	1.14	1
			Location of construction/work zone (lane, shoulder, etc.)	2.00	2
		Variable lane	Presence of reversible lane	1.07	1
3	Traffic flow characteristics	Traffic conflict	Weaving ratio	1.43	1
			PET (post encroachment time)	2.93	3
			Standard deviation of link travel speed	2.71	2
		High-accident zone	Presence of accident-prone zone	2.36	2
			Proportion of risky driving behavior	2.00	1
			EDI (erratic driving index)	2.86	3
		Stop-and-go traffic	Standard deviation of link travel speed	2.79	3
			PET (post encroachment time)	2.50	1
			Stopping rate within the segment	2.64	2
		Dilemma zone	Presence of dilemma zone	1.00	1
		Bottleneck point	Speed difference between adjacent links	1.71	1
			Standard deviation of link travel speed	1.71	1
			Heavy vehicle ratio	2.79	3
4	Environmental variable	Snowfall	Snowfall amount	1.00	1
			Snowfall duration	2.14	2
		Rainfall	Rainfall amount	1.00	1
			Rainfall duration	2.00	2
		Wet/Moist surface	Rainfall amount	2.14	2
			Snowfall amount	1.50	1
			Temperature	2.93	3
		Night time	Night time period presence	1.00	1
5	Moving object	Pedestrian	Presence of pedestrians near autonomous vehicle	1.14	1
		Truck	Presence of freight vehicles near autonomous vehicle	1.07	1
		Signal violation	Presence of signal violation	1.00	1
		Sudden stop	Acceleration/Deceleration	1.36	1
			Speed	2.43	3
			Jerk	2.21	2
		Lane change	Acceleration/Deceleration	1.64	1
			Angular velocity per second	2.43	2
			Jerk	3.07	3
		Inter-vehicle distance	Inter-vehicle distance	1.14	1
6	Digital	Perception error	Presence of perception error	1.36	1
			Frequency of perception errors	1.71	2
			Sensor field of view	2.93	3
		Decision error	Presence of decision-making error	1.36	1
			Frequency of decision-making errors	1.64	2
			Sensor field of view	3.00	3
		Control error	Presence of control error	1.36	1
			Frequency of control errors	1.64	2
			Sensor field of view	3.00	3
		DDT fallback	Presence of DDT fallback	1.43	1
			Frequency of DDT fallback	1.57	2
		Cyberattack	Presence of cyberattack	1.21	1
			Frequency of cyberattacks	1.79	2
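
The Rank column in Table 10 follows from ordering the indicators of each high-risk factor by their average rank, with the lowest average ranked first. The Python sketch below reproduces that step. Tied averages are assumed to share the better rank and the next rank is skipped (competition ranking), which appears consistent with the Bottleneck point rows but is not stated in the source. The function name is hypothetical.

```python
def competition_rank(average_ranks):
    """Map each indicator to its rank; ties share the lowest rank position.

    average_ranks: indicator name -> average expert rank (lower is better)
    """
    ordered = sorted(average_ranks.values())
    # list.index returns the first position of a value, so tied values
    # receive the same rank and the following rank is skipped.
    return {name: ordered.index(value) + 1 for name, value in average_ranks.items()}


# Average ranks copied from Table 10.
sudden_stop = {
    "Acceleration/Deceleration": 1.36,
    "Speed": 2.43,
    "Jerk": 2.21,
}
bottleneck_point = {
    "Speed difference between adjacent links": 1.71,
    "Standard deviation of link travel speed": 1.71,
    "Heavy vehicle ratio": 2.79,
}

if __name__ == "__main__":
    print(competition_rank(sudden_stop))
    print(competition_rank(bottleneck_point))
```

For the Sudden stop indicators this yields ranks 1, 3, and 2, and for Bottleneck point it yields 1, 1, and 3, matching the table.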

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
