Article

Structured Risk Identification for Sustainable Safety in Mixed Autonomous Traffic: A Layered Data-Driven Approach

Korea Transport Institute, Sejong-si 30147, Republic of Korea
*
Author to whom correspondence should be addressed.
Sustainability 2025, 17(16), 7284; https://doi.org/10.3390/su17167284
Submission received: 26 June 2025 / Revised: 1 August 2025 / Accepted: 9 August 2025 / Published: 12 August 2025

Abstract

With the accelerated commercialization of autonomous vehicles, new accident types and complex risk factors have emerged beyond the scope of existing traffic safety management systems. This study aims to contribute to sustainable safety by establishing a quantitative basis for early recognition and response to high-risk situations in urban traffic environments where autonomous and conventional vehicles coexist. To this end, high-risk factors were identified through a combination of literature meta-analysis, accident history and image analysis, autonomous driving video review, and expert seminars. For analytical structuring, the six-layer scenario framework from the PEGASUS project was redefined. Using the analytic hierarchy process (AHP), 28 high-risk factors were identified. A risk prediction model framework was then developed, incorporating observational indicators derived from expert rankings. These indicators were structured as input variables for both road segments and autonomous vehicles, enabling spatial risk assessment through agent-based strategies. This space–object integration-based prediction model supports the early detection of high-risk situations, the designation of high-enforcement zones, and the development of preventive safety systems, infrastructure improvements, and policy measures. Ultimately, the findings offer a pathway toward achieving sustainable safety in mixed traffic environments during the initial deployment phase of autonomous vehicles.

1. Introduction

Worldwide, technological development and institutional preparation for the commercialization of autonomous vehicles are actively underway. In particular, real-world demonstrations, legal system enhancements, and infrastructure development for fully autonomous driving (level 4 or higher) have been implemented in the United States, Europe, and Korea, and the accelerated adoption of autonomous vehicles necessitates a new paradigm in traffic safety management [1,2]. These changes have led to the emergence of new accident types and risk factors that are difficult to address with existing traffic safety management methods alone.
Unlike conventional accidents, such as simple driver judgment errors and collisions between vehicles, autonomous vehicle accidents have fundamentally different characteristics due to the complex involvement of factors, including sensor recognition errors, system malfunctions, and digital communication failures [3,4]. These complex, system-oriented accident mechanisms highlight the importance of “prevention” compared to “follow-up actions” in responding to autonomous vehicle accidents, indicating that it is essential to construct a system capable of detecting high-risk situations in advance and adequately responding to them.
Accordingly, studies have actively focused on risk-based assessment and real-time prediction model development to prevent autonomous vehicle accidents [5,6]. In particular, there is a growing need for a comprehensive risk prediction system that considers the characteristics of road segments, weather conditions, and interactions with surrounding objects, as well as the driving conditions of the vehicle. Such a system requires an integrated approach that goes beyond conventional single-variable analysis [7]. Most existing models, however, were developed in limited environments (e.g., highways) or focused solely on vehicle-level risk prediction, limiting their ability to fully reflect the complexity and diversity of actual traffic environments.
To overcome these limitations, this study systematically derived the risk factors required to detect and prevent high-risk situations before accidents occur in urban traffic environments where autonomous and conventional vehicles are mixed. To this end, data from various sources (e.g., literature, accident videos, and accident history) were comprehensively collected and analyzed, and variable importance was quantified through expert consultation. Furthermore, high-risk prediction indicators applicable to road segments and to individual autonomous vehicles were derived separately to support the development of a customized risk response system based on real-time detection.

2. Literature Review

2.1. Literature on Accident Factors in Mixed Traffic with Autonomous Vehicles

Previous studies have examined the causes of autonomous vehicle accidents from various perspectives, including road environment, driver behavior, and system errors. In terms of road infrastructure, intersections, lane count, dedicated lanes, and on-street parking were found to be factors that influence accident occurrence [8,9,10]. Variable facility complexity and specialized road sections were also found to affect driving safety. In particular, junction and deceleration lane design were found to increase risk [11]. Li et al. (2024) empirically demonstrated that lane-changing maneuvers on highways significantly increase the risk of collisions when combined with external factors such as lighting conditions and congestion [12]. This suggests that not only road structures but also driving environments and behaviors interact in complex ways to elevate accident risk.
With respect to the characteristics of traffic flow, the speed deviation, congestion, and upstream traffic were closely associated with autonomous vehicle accidents [7,13]. In this context, Chen et al. (2023) analyzed the differences in lane-changing behavior between autonomous and conventional vehicles, indicating that interactions between the conservative behavior of autonomous vehicles and the aggressive driving characteristics of human drivers may trigger collision risks [14]. This highlights that, in addition to static factors, dynamic behavioral differences also serve as critical contributors to accidents. Environmental factors such as inclement weather (e.g., rain, snow, and fog) or poor illumination conditions were identified as significant accident risk factors [15,16,17].
In terms of interactions with moving objects, errors by pedestrians, bicyclists, and other vehicle drivers were found to be the primary causes of autonomous vehicle accidents. In particular, rear-end collisions were frequently reported [8,10]. Additionally, cognitive and control errors in autonomous driving systems have been consistently identified as major accident causes [17,18].

2.2. Literature on Survey Items for Autonomous Vehicle Accidents

Investigating traffic accidents in a mixed traffic environment that involves autonomous vehicles requires diverse information to be collected in an integrated manner, and related studies suggest that the survey items should be constructed systematically. Road geometric structures (e.g., lane count, curvature radius, and intersection types) are closely related to the occurrence of accidents, and surrounding facilities (e.g., on-street parking lots and dedicated lanes) also affect accident severity [19,20,21].
Vinoth and Sasikumar (2024) emphasized that accident investigation items should include lane markings, sign recognition, and crosswalk locations to accurately analyze the causes of failures in autonomous perception systems [22]. This suggests the need to structure accident causality not only in terms of physical road features but also from the perspective of perceptibility.
Variable and temporary facilities must be recorded because changes in road conditions, such as construction zones and temporary lanes, can affect autonomous driving system (ADS) operation [20,23]. For items related to the driving behavior of vehicles, the vehicle speed, acceleration, inter-vehicle distance, and brake operation status are repeatedly mentioned, and they are used as key items of platooning risk assessment [24,25]. Jeong et al. (2023) suggested that collecting real-time interaction data between the vehicle and surrounding objects in the moments leading up to a crash is useful not only for identifying accident causes but also for developing future risk prediction models [11]. They specifically argued that items such as brake reaction time, the timing of obstacle recognition, and whether evasive maneuvers were attempted should also be included.
As for environmental variables, road conditions, weather, lighting, and location information are closely related to the cognitive accuracy of autonomous driving systems, and they were considered essential in almost every study [26,27,28]. It was also emphasized that interaction data with surrounding objects is necessary to assess the situation immediately before an accident, and that object identification and distance measurement are required [23,29].
Regarding digital elements, the operation status, failure record, and mode-change timing of the autonomous driving system should be recorded in DSSAD (Data Storage System for Automated Driving) and the EDR (Event Data Recorder) as key data for accident-cause analysis and liability attribution [25,28,30].

2.3. Literature on Autonomous Vehicle Risk Scenarios

In constructing autonomous vehicle risk scenarios, previous studies commonly consider infrastructure, environmental conditions, moving objects, and sensors. In terms of road infrastructure, road geometric structures (e.g., intersections, lane count, and curvature) are frequently used as the background for scenarios, and conflicts in intersections or cut-ins in lanes are repeatedly handled as major accident situations [31,32,33].
Chen et al. (2023) classified risk scenarios by road type, traffic signals, and vehicle interactions for simulation-based safety evaluations of autonomous driving, aiming to systematically replicate real accident conditions through various combinations [14]. This study enhanced the comprehensiveness and effectiveness of scenario design by emphasizing multi-layered structuring of road and object conditions.
Fixed infrastructure, including traffic lights and signs, is also essential for evaluating autonomous vehicle recognition and response. Weather and lighting conditions are reflected in almost all scenarios, as they are directly related to the cognitive accuracy of the autonomous driving system [34,35,36]. Weather variation, the time of day, and lighting availability directly affect sensor cognitive performance. In actual accident cases, restricted visibility was identified as a primary cause. Qu et al. (2025) identified types of autonomous driving risk scenarios based on domestic accident footage and proposed commonly included elements for each scenario, such as “illumination by time of day,” “multiple object movements within intersections,” and “rate of inter-vehicle distance reduction” [37]. They emphasized the importance of constructing context-specific scenarios by incorporating accident contexts frequently observed in Korean urban environments (e.g., illegal U-turns and traffic signal violations by bicycles).
Moving objects and dynamic variables are key elements that reflect the interactions between autonomous vehicles and surrounding objects [32,38,39]. Sudden stops, vehicle lane changes, and pedestrian crossings are commonly included in scenarios as high-risk behaviors, and dynamic information (e.g., relative position, velocity, and TTC (time to collision)) is also considered. Sensor and digital elements are handled as essential conditions to reflect technical limitations, including sensor viewing angles, obstacle shading, and sensor errors [33,35,36].

2.4. Literature on Traffic Safety-Related Indicators in Mixed Traffic with Autonomous Vehicles

To predict high-risk situations in advance in mixed traffic environments, traffic safety indicators that quantitatively capture pre-accident risk signals are needed. Previous studies have mainly classified these indicators into three categories. First, driving behavior indicators include the inter-vehicle distance, time headway, lane departure rate, and longitudinal and lateral acceleration [3,40].
Guo et al. (2023) proposed additional indicators beyond driving behavior metrics, such as the brake response delay time and deceleration pattern variation in autonomous vehicles [41]. By analyzing actual autonomous driving simulation data, they demonstrated that subtle deceleration changes just before abrupt stops serve as key predictive signals of impending accidents. This highlights the need to incorporate dynamic predictive elements that static threshold-based indicators alone cannot capture.
Second, inter-vehicle interaction indicators include the time to collision (TTC), post encroachment time (PET), total impact time (TIT), crash potential index (CPI), and collision avoidance rate (CAR) [4,42,43]. Third, weather and road environment indicators include the precipitation effect, view range, road surface condition, reflective intensity, road curvature, and lane count [34,43,44].
Tang et al. (2025) comparatively analyzed actual autonomous driving footage and manual vehicle operation records and identified “relative speed variation” and “lateral approach angle” as core predictive indicators of collisions in mixed traffic environments [45]. They particularly recommended using the “Lateral Conflict Indicator” in conjunction with PET for lateral interactions at intersections or merging points, suggesting a way to address the limitations of traditional longitudinal-focused metrics in perceiving complex situations.
As such, recent studies have proposed composite indicators based not only on traditional TTC or distance-based metrics, but also on dynamic changes in actual driving behavior and object interactions. To enhance the accuracy of high-risk situation prediction in mixed traffic environments, the integration of static and dynamic indicators is increasingly emphasized.

2.5. Literature on High-Risk Situation Prediction Models in Mixed Traffic with Autonomous Vehicles

Studies on real-time high-risk situation prediction in mixed traffic environments have evolved around traffic flow characteristics, environmental conditions, and digital elements. Most focus on quantifying accident likelihood using structured data. Traffic flow characteristics and driving behavior variables are central to most prediction models. Average speed, speed standard deviation, and occupancy rate are closely related to driving stability and are effective in quantifying accident risks [5,6,7,13,26].
Meng et al. (2023) proposed a model for predicting high-risk situations just before entering intersections using key variables such as speed variation, intersection approach distance, and longitudinal gap [46]. They quantitatively demonstrated that speed deviations and abnormal acceleration patterns occurring in mixed flows of autonomous and manually driven vehicles increase collision probability, suggesting the potential for real-time detection-based warning systems.
Environmental and weather variables complement traffic flow variables to enhance predictive performance, while visibility, road wetness, and temperature increase accident risks by affecting vehicle control and driver response [5,6,7,17]. These variables are primarily collected through external sensor data and serve as critical complementary elements in real-time fusion-based prediction models.
Through real-world driving data, Khelfa et al. (2023) demonstrated that weather changes and road surface conditions are directly linked to sensor perception failures in autonomous vehicles [47]. They introduced “sensor perception accuracy degradation rate” as a new independent variable to enhance model sensitivity. Emphasizing the need for digital infrastructure support in adverse weather conditions, they also highlighted the feasibility of applying HD map correction cycles and infrastructure-cooperative prediction models.
Digital communication and infrastructure variables are unique risk factors in autonomous driving, and communication failure, the construction of HD maps (high-definition maps), and road infrastructure quality are used as input variables of prediction models [4,26].
Gu et al. (2022) developed a sub-model composed solely of digital risk factors by incorporating V2X communication failures and fault probabilities in road segments without high-definition maps [48]. The study empirically demonstrated the independent predictive impact of digital elements by forecasting the conditions under which sensor-communication fusion may fail.
Wang et al. (2024) developed a machine learning-based extreme gradient boosting (XGBoost) model focused on predicting decision-making errors of autonomous vehicles in mixed traffic environments [49]. By training on time-series data from the 10 s before an accident, the model could predict not only the onset but also the duration of high-risk situations. They integrated vehicle driving logs and sensor statuses, suggesting the potential for real-time warning system development.
As such, recent studies on high-risk prediction models have expanded beyond single-variable analyses toward multidimensional predictive frameworks that integrate traffic flow and environmental and digital variables. Notably, these models quantitatively reflect the interactive characteristics and system limitations specific to mixed traffic environments, distinguishing them from conventional traffic accident prediction models.

2.6. Research Differentiation

Previous studies commonly state that autonomous vehicle accidents are influenced by both environmental conditions (e.g., weather, illumination, and road geometry) and system factors (e.g., ADS operation status and fallback execution), unlike conventional vehicles, and that such system data must be included in accident investigations. Since most of them analyze causes after accidents based on the California DMV (Department of Motor Vehicles) accident reports or autonomous driving mode disengagement reports, research on predicting high-risk situations in the pre-accident phase remains relatively insufficient.
This study assumes autonomous vehicle operation in mixed traffic and aims to detect high-risk situations early in the pre-accident phase and derive preventive factors. To this end, data from diverse sources (e.g., literature meta-analysis, accident history, video footage, expert seminars, etc.) were integrated and analyzed to identify high-risk factors using a data-driven approach. Subsequently, an importance evaluation based on expert prior knowledge was conducted using the analytic hierarchy process (AHP), enabling the quantitative integration of heterogeneous data.
The study also designed a model for predicting high-risk situations by structuring road segment-based risk indicators and autonomous vehicle-level risk indicators in parallel. This makes it possible to precisely detect risks in dual structures beyond single-dimensional prediction by separately constructing space-based variables (e.g., traffic flow, road geometry, and weather conditions), variables based on individual vehicles (e.g., vehicle driving behavior and sensor errors), and digital-related variables (e.g., perception errors).

3. Methodology

3.1. Overall Research Flow

This study aims to propose a framework for predicting high-risk situations in road environments where autonomous and conventional vehicles coexist. To this end, the causes of high-risk situations were analyzed from multiple perspectives and structured into quantifiable indicators. The feasibility of designing a prediction model using the indicators as input variables was then examined. The research comprises the main steps described below. Figure 1 presents the overall research flow.
First, in identifying high-risk factors, five types of data were used to reflect various aspects, such as accident context, driver behavior, infrastructure impacts, and autonomous driving system errors. The data used included literature-based meta-analysis, accident history analysis, accident video analysis, autonomous driving video analysis, and expert seminars. Based on these data, multidimensional high-risk factors were comprehensively collected and refined.
Second, the identified high-risk factors were systematically classified with reference to the six-layer structure of the PEGASUS joint project (Project for the Establishment of Generally Accepted Quality Standards, Tools, Methods, Processes, and Scenarios for the Approval of Autonomous Driving Functions), which was reorganized into an analysis-oriented layer system in accordance with the purpose of this study. While the existing PEGASUS system focused on the verification of autonomous driving systems, this study defined each layer as a category closely related to accidents and driving data, based on its applicability to the prediction model and the feasibility of data mapping.
Third, for relative importance analysis, the AHP was applied based on the reorganized layer system. The AHP is a multi-criteria decision-making (MCDM) technique. In this study, a panel of 14 experts evaluated relative importance among layers and factors using the pairwise comparison method. The weight for each factor was then calculated through a consistency check. Based on this, the priorities of high-risk factors were quantitatively derived.
Fourth, in the step of deriving high-risk indicators and designing the prediction model, input indicators applicable to the actual prediction model were derived based on the AHP results, and a conceptual framework was developed. A candidate pool of observable indicators was first compiled for each high-risk factor based on the existing literature, and representative indicators were then selected through a follow-up survey with the same expert group. Responses were collected using the complete ranking method, and average ranks were calculated to identify the indicators that best represent each factor. Based on the derived indicators, a high-risk situation prediction framework was conceptually presented from the perspectives of road segments and individual autonomous vehicles.

3.2. Meta-Analysis Methodology

Meta-analysis is a technique to comprehensively determine the effectiveness or tendency of a particular variable by statistically integrating the results reported from individual studies [50]. In this study, meta-analysis was applied to quantify high-risk factors for autonomous vehicles using existing literature. Its purpose is to secure basic data for subsequent quantitative assessment (AHP) by systematically analyzing the likelihood and significance of accident-related factors [50,51].
The analysis was conducted using two approaches, depending on the data type. First, binary meta-analysis calculates the risk ratio (RR) based on the presence or absence of a specific high-risk factor in each study [50,52]. The RR is defined as follows:
$$RR = \frac{a/(a+b)}{c/(c+d)} \tag{1}$$
where a, b, c, and d are the frequencies according to the presence or absence of the factor and the occurrence of accidents, respectively. The standard error of the log-transformed RR is calculated as follows.
$$SE(\log RR) = \sqrt{\frac{1}{a} - \frac{1}{a+b} + \frac{1}{c} - \frac{1}{c+d}} \tag{2}$$
Second, effect size meta-analysis is a method for calculating the standardized effect size (ES) in consideration of the influence of each study (e.g., the number of citations and the influence of the journal) and deriving the overall average effect size by applying the weight for each study ($w_i = 1/SE_i^2$) [53]. The overall effect size is calculated as follows.
$$ES_{total} = \frac{\sum_i w_i \, ES_i}{\sum_i w_i} \tag{3}$$
For both analyses, a random effects model was applied to account for heterogeneity across studies. This allowed for a comprehensive estimation of the effects of high-risk factors under various conditions without over-reliance on any single study [52,53].
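As a rough illustration of the formulas above, the following Python sketch computes the risk ratio, the standard error of its logarithm, and an inverse-variance weighted mean effect size. The function names, the example counts, and the `tau2` parameter are assumptions introduced here for illustration; the source does not describe its implementation or how the between-study variance of the random effects model was estimated. The sketch also assumes the usual 2×2 layout, with a and b counting accident and non-accident cases when the factor is present, and c and d the same when it is absent.

```python
import math


def risk_ratio(a, b, c, d):
    """RR = [a / (a + b)] / [c / (c + d)] for a 2x2 frequency table."""
    return (a / (a + b)) / (c / (c + d))


def se_log_rr(a, b, c, d):
    """Standard error of log(RR) for the same 2x2 table."""
    return math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))


def pooled_effect(effects, std_errors, tau2=0.0):
    """Weighted mean effect size with weights w_i = 1 / (SE_i^2 + tau2).

    tau2 = 0 reproduces w_i = 1 / SE_i^2 from the text. A positive tau2
    (hypothetical between-study variance) gives random-effects weights.
    """
    weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    weighted_sum = sum(w * es for w, es in zip(weights, effects))
    return weighted_sum / sum(weights)


# Hypothetical counts, not taken from the study.
rr = risk_ratio(30, 70, 15, 85)
se = se_log_rr(30, 70, 15, 85)
# 95% confidence interval on the RR scale (normal approximation on log RR).
ci_low = math.exp(math.log(rr) - 1.96 * se)
ci_high = math.exp(math.log(rr) + 1.96 * se)
```

Setting `tau2` above zero flattens the weights, so that no single precise study dominates the pooled estimate, which is the behavior the random effects model is meant to provide.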

3.3. AHP (Analytic Hierarchy Process) Methodology

AHP is a representative MCDM technique that quantifies relative importance via pairwise comparisons among multiple criteria and alternatives. In this study, AHP was applied to evaluate the relative importance of the identified high-risk factors and determine their priorities as input variables for the prediction model. AHP has been widely used in the fields of traffic safety and autonomous driving risk analysis as well, because it enables structural comparisons and can quantify expert judgments [54,55].
The AHP procedure proceeds as follows. First, a pairwise comparison matrix among high-risk factors or layers is constructed, and the relative importance of each element is evaluated on a scale of 1 to 9 points. The weight vector is then calculated via eigenvalue analysis, and judgment consistency is evaluated through a consistency test [56]. The consistency index (CI) and consistency ratio (CR) are defined by the following formulas.
$$CI = \frac{\lambda_{max} - n}{n - 1}, \qquad CR = \frac{CI}{RI} \tag{4}$$
where $\lambda_{max}$ is the maximum eigenvalue of the comparison matrix, n is the dimension of the matrix, and RI is the Random Index. In general, acceptable consistency is defined as CR < 0.1 or 0.15 [56]. To ensure logical coherence and maintain the reliability of the analysis, responses with a consistency ratio (CR) exceeding the acceptable threshold were excluded from the aggregation [56]. Final weights were derived by applying the arithmetic mean to valid responses, considering its simplicity, interpretability, and widespread use in existing AHP-based traffic safety research [57].
In this study, an AHP survey was conducted with 14 experts using a two-tier hierarchical structure (comparison among layers and comparison among factors within layers). The final weights were normalized to a total of 1.0 after averaging all valid responses.
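The AHP procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the principal eigenvector is approximated by power iteration instead of a library eigen-solver, the Random Index values are the commonly tabulated ones for matrices of size 1 to 9, and the function names, the 0.1 cut-off default, and the example matrices are hypothetical.

```python
# Commonly tabulated Random Index (RI) values for matrix sizes 1..9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix, iterations=100):
    """Return (weights, lambda_max, CI, CR) for a pairwise comparison matrix."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):  # power iteration toward the principal eigenvector
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, lambda_max, ci, cr


def aggregate_weights(matrices, cr_threshold=0.1):
    """Arithmetic mean of weights over responses with CR <= threshold, renormalized."""
    kept = []
    for m in matrices:
        w, _, _, cr = ahp_weights(m)
        if cr <= cr_threshold:
            kept.append(w)
    if not kept:
        raise ValueError("no response passed the consistency check")
    n = len(kept[0])
    mean = [sum(w[i] for w in kept) / len(kept) for i in range(n)]
    total = sum(mean)
    return [x / total for x in mean]


# Hypothetical responses: one perfectly consistent, one strongly inconsistent.
consistent = [[1, 2, 6], [1 / 2, 1, 3], [1 / 6, 1 / 3, 1]]
inconsistent = [[1, 9, 1 / 9], [1 / 9, 1, 9], [9, 1 / 9, 1]]
final = aggregate_weights([consistent, inconsistent])
```

In this toy input the second response is dropped by the consistency check, so the final weights equal those of the first response.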

3.4. Average Ranking Analysis Methodology

Average ranking analysis is a method for calculating the relative importance or representativeness of each item from the average of the ranks it receives, and it is used when pairwise comparisons among the items to be evaluated are difficult [58]. In this study, average ranking analysis was applied to select observable indicators that correspond to the high-risk factors identified through the AHP analysis. It was judged more suitable than the pairwise comparison method because of the large number of indicators and the high independence among the items.
The same 14 experts who participated in the AHP analysis conducted the ranking. For each high-risk factor, two to five candidate indicators were presented, and the complete ranking method, in which the experts rank the indicators based on their representativeness, was applied. The average rank of each indicator was then calculated using the following formula.
$$\text{Average Rank}_j = \frac{1}{N} \sum_{i=1}^{N} r_{ij} \tag{5}$$
where $r_{ij}$ is the ranking given to the j-th indicator by the i-th expert, and N is the total number of respondents. A lower average ranking value indicates that the indicator better represents the factor. The indicator with the lowest average rank was selected as the representative indicator for each high-risk factor. The indicators were then used as input variables in the prediction framework.
This method has the advantage of reducing the cognitive load on experts while still reflecting collective preferences, and is widely used in MCDM contexts where expert consensus is needed without complex matrix-based evaluations. Previous studies have also highlighted the applicability of average ranking analysis in fields requiring efficient prioritization, such as policy formulation and engineering assessment [59,60].
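The average rank calculation and the selection of a representative indicator can be illustrated with a short Python sketch. The indicator names and rankings below are hypothetical examples, not survey data from the study, and the function names are introduced here only for illustration.

```python
def average_ranks(rankings):
    """Average rank per indicator; rankings[i][name] is expert i's rank for name."""
    n_experts = len(rankings)
    names = rankings[0].keys()
    return {name: sum(r[name] for r in rankings) / n_experts for name in names}


def representative_indicator(rankings):
    """Indicator with the lowest average rank (first one wins on a tie)."""
    avg = average_ranks(rankings)
    return min(avg, key=avg.get)


# Hypothetical complete rankings from three experts for one high-risk factor.
rankings = [
    {"TTC": 1, "PET": 2, "time headway": 3},
    {"TTC": 2, "PET": 1, "time headway": 3},
    {"TTC": 1, "PET": 3, "time headway": 2},
]
best = representative_indicator(rankings)
```

How ties between indicators were resolved is not stated in the text, so the first-wins behavior here is only a placeholder.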

4. Results

4.1. Identification of High-Risk Factors

In this study, various types of empirical data and expert opinions were utilized to design a framework for systematically analyzing and predicting high-risk situations in a mixed traffic environment where autonomous and conventional vehicles coexist. High-risk factors were identified through literature review, accident histories, accident videos, autonomous vehicle driving footage, and expert seminars. The data were constructed to capture risk factors from different aspects, including the context of accidents, driving environment, and technical errors. The analysis proceeded according to the following steps.

4.1.1. Existing Literature-Based Meta-Analysis

After collecting a total of 85 studies, 58 studies (18 studies for accident factors, 6 studies for scenarios, 30 studies for prediction models, and 4 studies for regulations) were selected as final analysis targets based on meta-analysis applicability and content consistency. Binary and effect size meta-analyses were conducted based on the occurrence and significance of autonomous vehicle risk factors reported in each study.
In the binary meta-analysis, whether certain high-risk factors were mentioned or not was summarized using binary values (zero or one), and the frequency ratio and RR were calculated. Consequently, acceleration/deceleration, intersections, rear-end collisions, fog, rain, inter-vehicle distance, and lane count were identified as high-risk factors, each showing an RR exceeding 25%. These high-risk factors are highlighted in blue in Table 1.
In the effect size meta-analysis, weights were assigned based on study influence (e.g., the number of citations and the influence index of the journal), and the relative importance of each factor was quantified. The results of the effect size meta-analysis are presented in Table 2, with the factors identified as high-risk highlighted in blue for ease of reference. The analysis results revealed that acceleration/deceleration, fog, rain, intersections, the lane count, rear-end collisions, inter-vehicle distance, and pedestrian collisions were the main factors. Notably, factors related to autonomous vehicle driving behavior showed effect sizes exceeding 0.5.

4.1.2. Accident History Data Analysis

To identify high-risk factors associated with actual accidents in mixed traffic environments, the accident history data of conventional and autonomous vehicles were utilized in this study. The data used for analysis were the conventional vehicle accident history from the Korean National Police Agency (2019–2023, a total of 1,037,516 cases) and the autonomous vehicle accident history from the California DMV (2019–2024, a total of 332 cases). Items, such as the date and time of accidents, location, weather, road surface condition, overview of accidents, and the number of injuries, were extracted and used for analysis.
After calculating accident frequency and injury rates, items exceeding a threshold for either indicator were identified as high-risk factors (Equation (6)). In this instance, the third quartile (75%) served as the threshold, as shown in Figure 2. For items with insufficient sample sizes, the median value (50%) was used as a supplementary criterion. This is because traditional statistics (e.g., the mean and standard deviation) can be easily distorted by extreme values (outliers). At the same time, quartile-based thresholds stably represent the upper segments of the data distribution. In particular, when high-risk items are selected based on actual accident data, as in this study, the risk tends to be concentrated in the upper risk sections rather than the statistical average. Thus, the third quartile is a more suitable threshold for this analysis.
$$P_{injured}(A_i) = \frac{N_{injured} + N_{fatality}}{N_{injured\ total}} \tag{6}$$
  • $P_{injured}(A_i)$ is the injury rate in Category $A_i$.
  • $N_{injured}$ is the total number of serious (injured) people in Category $A_i$.
  • $N_{fatality}$ is the total number of fatalities in Category $A_i$.
  • $N_{injured\ total}$ is the total number of injuries across all accidents.
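As a minimal sketch of this thresholding step, the Python snippet below computes the injury rate of Equation (6) for each category and flags the categories whose accident frequency or injury rate is at or above the third quartile. The category names, counts, and helper names (`injury_rate`, `third_quartile`, `flag_high_risk`) are hypothetical. The quartile is computed with the inclusive method of `statistics.quantiles`, which is an assumption because the study does not state its quartile convention, and the supplementary median criterion for small samples is not implemented.

```python
from statistics import quantiles


def injury_rate(n_injured, n_fatality, n_injured_total):
    """Equation (6): casualties in one category over all injuries."""
    return (n_injured + n_fatality) / n_injured_total


def third_quartile(values):
    """75th percentile (inclusive method; an assumed convention)."""
    return quantiles(values, n=4, method="inclusive")[2]


def flag_high_risk(frequency, rate):
    """Categories at or above Q3 for accident frequency OR injury rate."""
    q3_freq = third_quartile(list(frequency.values()))
    q3_rate = third_quartile(list(rate.values()))
    return sorted(
        cat for cat in frequency
        if frequency[cat] >= q3_freq or rate[cat] >= q3_rate
    )


# Hypothetical data: category -> (accidents, injured, fatalities)
counts = {
    "intersection": (400, 90, 10),
    "tunnel": (50, 60, 5),
    "straight road": (300, 40, 2),
    "ramp": (120, 20, 1),
    "bridge": (80, 15, 2),
}
N_INJURED_TOTAL = 1000  # hypothetical total number of injuries

frequency = {cat: c[0] for cat, c in counts.items()}
rate = {cat: injury_rate(c[1], c[2], N_INJURED_TOTAL) for cat, c in counts.items()}
high_risk = flag_high_risk(frequency, rate)
print(high_risk)
```

In this toy data, "tunnel" is flagged only through its injury rate and "straight road" only through its frequency, which is why the two indicators are combined with a logical OR.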
From the conventional vehicle accident history, 11 items were identified: underground roads, intersections, structures, cloudy weather, rain, wetness/moisture, pedestrians, trucks, two-wheelers, centerline violation, and traffic signal violations. From the autonomous vehicle accident history, 18 items were identified: ramp entry and exit, intersection, structure, nighttime, cloudy, clear, dry, lane change, sudden stop, right turn, left turn, moving straight, truck, passenger car, van, bicycle, on-road stopping, and driver inattention.

4.1.3. Accident Video Data Analysis

Accident videos posted online were analyzed to compensate for the limitations of the structured accident history data. A total of 10,030 conventional vehicle accident videos and 65 autonomous vehicle accident videos were collected. Their subtitles, text, and audio information were converted into text format and stored in a database, and the content was then classified into 24 items.
Similar to the accident history analysis, items in the third quartile or higher were classified as high-risk factors. For conventional vehicles, 12 items were identified: intersection, bus-only lane, daytime, sudden stop, moving straight, pedestrian, passenger car, van, two-wheelers, centerline violation, signal violation, and driver inattention. For autonomous vehicle accidents, nine items were identified: single roadway, structure, emergency vehicle, nighttime, clear, sudden stop, moving straight, drunk driving, and perception error. Notably, “perception error” emerged as a key factor reflecting the inherent vulnerability of autonomous driving systems. As described, accident video-based analysis is effective in capturing behavioral and cognitive elements that existing historical data may overlook.

4.1.4. Autonomous Vehicle Driving Video Data Analysis

In addition to post-accident data, such as existing accident history and accident videos, Waymo’s autonomous vehicle driving video data were used in this study to identify potential high-risk situations encountered during autonomous vehicle operation. The analysis was conducted to identify high-risk factors with a focus on errors that may occur in the perception and judgment processes of autonomous vehicles. Detailed analysis procedures and technologies are provided in Kim et al. (2025) [61].
As shown in Figure 3, the analysis consists of (1) extracting text information from generative artificial intelligence (AI)-based accident reports, (2) extracting autonomous vehicle accident factors, (3) captioning Waymo driving images, and (4) classifying high-risk situations through the BERT (Bidirectional Encoder Representations from Transformers) model. Risk indicators in autonomous driving scenarios were analyzed using textual and visual data.
The analysis identified ten factors that may lead to high-risk situations during autonomous driving: intersection, highway, nighttime, daytime, clear, pedestrian, bicycle, on-road parking, on-road stopping, and perception error. In particular, “perception error” was identified as a key factor because it shows how the sensor-based perception system of autonomous vehicles can fail in the actual road environment, a finding that distinguishes this analysis from conventional accident-based analysis. The video analysis also revealed that common elements, such as stopped or parked vehicles, may pose a threat to autonomous driving systems.

4.1.5. Expert Seminar

To compensate for the limitations of quantitative analysis, a seminar was held with seven experts in the field of autonomous driving safety. Through the seminar, high-risk factors were identified from practical perspectives, including the perception/decision/control of autonomous vehicles, interactions with drivers, and communication/security threats.
The seminar covered seven key topics, as outlined in Table 3. For each topic, the participants presented high-risk situations from various perspectives, including the limitations of the perception/decision/control processes of autonomous vehicles, interactions with drivers in mixed driving conditions, and communication/cybersecurity threats. Additionally, high-risk possibilities were further discussed for factors that are not captured in existing data but may frequently occur in real road environments, such as the impact of vehicle type distribution and delays in decision-making at complex intersections.
More than 20 high-risk factors were identified during the seminar. These factors spanned multiple categories, including road structures, traffic flow interactions, object behavior, and digital elements. In particular, the experts repeatedly emphasized digital-based risk factors (e.g., perception error, decision error, DDT fallback, and cyberattacks) and risk factors in complex road segments (e.g., intersection, ramp in/out, and bottleneck point). These results served as a foundation for empirically deriving high-risk factors in real-world driving conditions, and were reflected in AHP-based importance assessment and prediction framework design.

4.2. Layer-Based Reclassification

The six-layer structure proposed by the PEGASUS project, which classifies road and driving environments for scenario-based verification of autonomous vehicles, has been widely used to ensure the stability and reliability of autonomous driving functions. The purpose of this study, however, is to establish a factor structure for predicting high-risk situations that may occur in a mixed traffic environment with autonomous and conventional vehicles rather than for scenario creation or system testing. For this prediction-oriented purpose, the existing PEGASUS system has limitations.
First, the PEGASUS framework lacks consistency with empirical data (e.g., accident history and video data) because it emphasizes a system-centered environment definition. Second, it is limited in precisely analyzing accident causes because road structures and transportation infrastructure are integrated into one category. Third, each factor must be translated into measurable and quantifiable indicators for prediction model design, but the existing framework fails to meet these requirements.
To improve predictability and data consistency, this study restructured the layer system based on three criteria. First, it must align with actual accident and driving data. Second, it must be translatable into quantitative indicators that can be used as input variables for the prediction model. Third, the corresponding factors must significantly contribute to risk detection and response in the perception–decision–control processes of autonomous vehicles. Based on these criteria, the PEGASUS system was reorganized into six analysis-oriented layers as follows (see Table 4). The same structure is maintained throughout the AHP-based analysis and the design of the prediction framework.
(1)
Road facilities: Physical structures and equipment installed for road safety and efficiency (e.g., lighting facilities, road alignment, and road grades), which are basic infrastructure that corresponds to the background conditions of accidents.
(2)
Variable and temporary facilities: Facilities with operation modes that vary over time or situation (e.g., variable lanes and construction zones), representing dynamic risk factors that complicate prediction.
(3)
Traffic flow characteristics: Aggregate movement patterns of vehicles on the road, including density, speed, conflicts, and bottlenecks. These act as key indicators of persistent risk levels.
(4)
Environmental variables: External conditions, such as weather, time, and road surface condition, which are directly related to driving stability by affecting vehicle sensor performance or drivers’ view.
(5)
Moving objects: Road users, including vehicles, pedestrians, and bicycles, and their behaviors (e.g., lane changes, signal violations, and inter-vehicle distance), which directly contribute to hazardous conditions.
(6)
Digital: Digital-based elements (e.g., perception, decision, and communication systems related to autonomous driving functions), including inherent technological risks, such as sensor malfunctions and cyberattacks.
The identified high-risk factors were obtained from five different data sources and classified according to the redefined six-layer framework. Specifically, six factors were derived through literature meta-analysis, 32 from expert seminars, and 24 from accident history data analysis (11 from conventional vehicle accidents and 18 from autonomous vehicle accidents, with 5 overlapping items). In addition, 23 factors were identified through video analysis (12 from conventional vehicle accident videos, 9 from autonomous vehicle accident videos, and 9 from autonomous vehicle driving videos, with 7 overlapping items). Due to length constraints, the final classification of high-risk factors based on the six-layer system is presented in Appendix A.
The analysis revealed that integrating heterogeneous data sources enabled the identification of high-risk factors that would be difficult to capture using a single source. Some high-risk factors were identified only from specific data sources, allowing for a more comprehensive reflection of potential risks across various environments and situations.
For example, digital-based factors such as “hacking,” “cyberattack,” and “DDT fallback failure (failure of driver takeover in autonomous driving systems)” were not captured in traditional sources such as literature reviews, accident records, or video analysis but were identified in expert seminars as critical high-risk elements in autonomous environments. This suggests that software errors or external communication threats in advanced autonomous systems can act as actual accident triggers. In addition, the “dilemma zone,” although a high-risk situation arising from behavioral differences between autonomous and conventional vehicles, did not appear explicitly in accident data or videos but was identified through expert surveys as a unique intersection risk in mixed traffic environments. Another example is “emergency vehicle,” which was identified only through autonomous vehicle accident videos, while dynamic or temporary facility-related risks such as “bus-only lane,” “construction zone,” “variable lane,” and “high-accident zone” were identified solely in expert seminars. Environmental variables like “cloudy” appeared consistently across multiple sources, including autonomous and conventional accident histories and expert seminars. Likewise, “nighttime” was repeatedly confirmed in autonomous vehicle accident histories, accident videos, and driving footage, supporting the notion that reduced visibility conditions may impair autonomous system performance.

4.3. AHP Analysis Design and Execution

The AHP was applied in this study to quantitatively evaluate the relative importance of the identified high-risk factors. AHP is an analytical technique to quantify expert judgments in multi-criteria decision-making problems and systematically derive relative priorities among factors. It has been widely used to assess risks in complex systems, such as autonomous vehicles [21,62].
The AHP model designed in this study uses a two-level hierarchy structure. The first hierarchy compares the relative importance of the six redefined data layers, and the second hierarchy evaluates the relative importance of the high-risk factors in each layer. The structure did not include all initially identified factors as-is. Instead, it integrated items with overlapping meanings or inclusive relationships and excluded general conditions (e.g., “clear”) with a low correlation with high-risk situations from analysis targets to improve survey efficiency and response reliability. Table 5 summarizes the factors included in the final AHP survey.
The expert survey was conducted with a total of 14 experts in the fields of autonomous driving and traffic safety in Korea: nine professors, four researchers, and one research professor. The expert selection criteria were as follows: (1) more than 10 years of research or working experience in the relevant field, (2) participation in government-led projects related to autonomous vehicle safety or smart mobility, and (3) affiliation with major domestic universities or national research institutes. Notably, all experts are currently engaged in active research on autonomous driving. Pairwise comparisons were made on a scale of one to nine points, and the weights for each hierarchical level were obtained as the eigenvectors of the resulting pairwise comparison matrices. Reliability was confirmed, as the consistency ratio (CR) remained below 0.15 for all responses.
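The weighting and consistency check described above can be sketched as follows. This is an illustrative implementation of the standard AHP procedure, not the authors' code: the principal eigenvector is approximated by power iteration, the random index values are Saaty's commonly tabulated ones, the 0.15 threshold is the one reported in this study, and the 3 × 3 comparison matrix is hypothetical.

```python
# Saaty's random index (RI) for matrix sizes 3 to 9
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def _mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector by power iteration.

    The returned weights are normalized to sum to 1.
    """
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = _mat_vec(matrix, weights)
        total = sum(product)
        weights = [p / total for p in product]
    return weights


def consistency_ratio(matrix, weights):
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    product = _mat_vec(matrix, weights)
    lambda_max = sum(p / w for p, w in zip(product, weights)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# Hypothetical pairwise comparisons of three factors on the 1-9 scale
comparisons = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights = ahp_weights(comparisons)
cr = consistency_ratio(comparisons, weights)
print([round(w, 3) for w in weights], round(cr, 4))
print("acceptable" if cr < 0.15 else "revise judgments")
```

A response whose CR exceeds the threshold would normally be returned to the expert for revision or excluded; the study reports that this was not needed because all responses stayed below 0.15.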
According to the results of the first hierarchy, “moving objects (0.2437)” and “digital (0.2221)” layers exhibited the highest importance, while “road facilities (0.0643)” showed relatively low importance. This indicates that driving behavior and system error factors have a more significant impact than physical infrastructure in high-risk situations (see Table 6).
In the second hierarchy, weights for high-risk factors were determined for each layer. For example, the truck, pedestrian, and signal violation factors exhibited high importance in “moving objects”. Similarly, the decision error, perception error, and control error factors showed relatively high weights in “digital”. On the other hand, “on-road parking/stopping” and “vehicle type distribution” received low weights (see Table 7).
Accordingly, high-risk factors in the “moving objects” and “digital” layers should be prioritized in future autonomous driving safety system designs. For instance, high-precision object recognition algorithms to mitigate interaction risks with trucks and pedestrians, real-time inter-vehicle distance monitoring systems, and signal violation detection functions are needed. In addition, to address perception and decision-making errors, internal error detection and alert systems within autonomous vehicles, as well as preemptive response scenarios and safe driving mode activation for DDT fallback situations, must be implemented.
Finally, considering the usability of the analysis, the factors were not all reflected evenly. Instead, in accordance with the Pareto principle (80:20), factors were adopted in descending order of overall importance until their cumulative weight reached 0.8. As a result, 28 high-risk factors were identified across the six layers. They were used to derive key input variables for high-risk indicators and prediction framework design (see Table 8).
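The cumulative-weight selection can be illustrated with a short sketch. It assumes that the overall importance of a factor is the product of its layer weight and its within-layer weight, which is the usual AHP convention but is not spelled out in this section. The two layer weights are the reported values, while the factor names and within-layer weights are hypothetical and only the two layers are shown.

```python
def overall_weight(layer_weight, local_weight):
    """Overall importance as layer weight times within-layer weight (assumed)."""
    return layer_weight * local_weight


def select_by_cumulative_weight(weights, cutoff=0.8):
    """Adopt factors in descending weight order until the cutoff is reached."""
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, weight in ranked:
        if cumulative >= cutoff:
            break
        selected.append(name)
        cumulative += weight / total  # share of the total weight
    return selected


# Reported layer weights; within-layer weights below are hypothetical
MOVING_OBJECTS, DIGITAL = 0.2437, 0.2221
weights = {
    "truck": overall_weight(MOVING_OBJECTS, 0.40),
    "pedestrian": overall_weight(MOVING_OBJECTS, 0.35),
    "vehicle type distribution": overall_weight(MOVING_OBJECTS, 0.05),
    "decision error": overall_weight(DIGITAL, 0.45),
    "perception error": overall_weight(DIGITAL, 0.40),
    "cyberattack": overall_weight(DIGITAL, 0.10),
}
selected = select_by_cumulative_weight(weights)
print(selected)
```

Because the loop stops only once the running share has reached the cutoff, the factor that crosses 0.8 is itself retained; a stricter variant could exclude it.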

4.4. High-Risk Indicator Design and Prediction Framework Conceptualization

The relative importance of each high-risk factor in a mixed environment with autonomous vehicles was quantitatively derived through the AHP analysis in the previous section. However, because most of these factors are qualitative concepts, observable indicators were derived for each factor to establish quantitative decision criteria and link them to the prediction flow. A prediction framework was then designed on this basis.

4.4.1. Identification of High-Risk Indicators

High-risk indicators were identified in two steps: literature-based matching and expert consultation. First, a group of candidate indicators corresponding to each high-risk factor was formed by examining the literature related to traffic safety indicators and autonomous vehicle performance evaluation. For example, PET, the weaving ratio, and TTC represent “conflict” factors, while the stopping rate, average speed, and traffic flow density reflect “congestion” factors. In this process, the researchers of the present study added candidate indicators beyond those in the literature, considering practical applicability and the possibility of securing data. Table 9 summarizes the candidate indicators and their descriptions. The full list of variables and their descriptions used for the average ranking methodology is provided in Appendix B due to space limitations.
As a follow-up step, a complete ranking survey was conducted on the candidate indicators for each factor with the 14 experts who participated in the AHP analysis. The experts ranked the indicators by representativeness, and the responses were aggregated using the average rank method. This method can quantify relative representativeness when the number of items makes pairwise comparison difficult, and it has been widely used in the MCDM field [63,64]. The three indicators with the lowest average rank values (i.e., those ranked as most representative) were selected as representatives for each factor, as shown in Table 10.
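A minimal sketch of the average rank method is given below. The candidate indicators PET, TTC, and weaving ratio are those named for the “conflict” factor; the fourth candidate and all of the rankings are hypothetical, and only three experts are shown for brevity.

```python
from statistics import mean


def average_ranks(rankings):
    """Mean rank of each indicator across experts (1 = most representative)."""
    indicators = rankings[0].keys()
    return {ind: mean(r[ind] for r in rankings) for ind in indicators}


def top_indicators(rankings, k=3):
    """The k indicators with the lowest average rank (ties broken by name)."""
    avg = average_ranks(rankings)
    return sorted(avg, key=lambda ind: (avg[ind], ind))[:k]


# Hypothetical complete rankings from three experts for the "conflict" factor
rankings = [
    {"TTC": 1, "PET": 2, "weaving ratio": 3, "candidate D": 4},
    {"TTC": 2, "PET": 1, "weaving ratio": 4, "candidate D": 3},
    {"TTC": 1, "PET": 3, "weaving ratio": 2, "candidate D": 4},
]
representatives = top_indicators(rankings)
print(average_ranks(rankings))
print(representatives)
```

Each expert must rank every candidate for the mean to be comparable across indicators, which is why the survey is described as a complete ranking.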

4.4.2. Proposal of a High-Risk Situation Prediction Framework in Mixed Traffic with Autonomous Vehicles

High-risk factors must be transformed into measurable indicators for use in applications such as prediction models. As such, in this study, representative indicators that correspond to the key high-risk factors identified based on AHP were derived. These indicators serve as critical input variables for predicting and addressing high-risk situations.
In particular, two prediction levels (road segments and autonomous vehicles) were distinguished, and a framework was conceptually proposed around the structural characteristics of each level. The levels were distinguished because high-risk situations at the two levels have different causes and operational mechanisms. While road segments have relatively static risk patterns shaped by physical infrastructure and traffic flow characteristics, the risks of autonomous vehicles vary in real time with the interactions and environmental conditions encountered while driving. Therefore, the types of risk indicators and the risk prediction methods are heterogeneous, requiring a prediction system with a separate structure for each level.
Accordingly, the proposed framework adopts an input–process–output structure.
  • Input: The representative high-risk indicators identified through the AHP and expert consultation (e.g., accident frequency, conflict rate, stopping rate, weather conditions, inter-vehicle distance, and perception error).
  • Process: Forecasting indicator values over time using time-series algorithms (e.g., LSTM (long short-term memory)).
  • Output: Calculating the risk index for road segments or autonomous vehicles based on the predicted indicator values.
Road segment-level prediction uses two methods depending on data availability, as shown in Figure 4. In segments with only historical data, indicators that represent mid- to long-term risk levels (e.g., accident frequency, stopping rate, and conflict rate) are utilized. In segments where real-time information is available, indicators that fluctuate in real time (e.g., traffic volume, speed, and weather conditions) are additionally reflected to enable more sensitive detection. Static and dynamic risks can be addressed simultaneously by integrating historical and real-time data.
Autonomous vehicle-level prediction uses the real-time driving behavior of vehicles, surrounding object information, and environmental conditions as input values to predict high-risk indicator values in n seconds. The vehicle-level risk index is calculated from these values (Figure 5). The input variables include vehicle behavior (e.g., inter-vehicle distance, lane change, and turning behavior), the relative speed and location of surrounding objects, illumination and weather conditions, and the risk of the road segment. The predicted high-risk indicators are converted into a risk index using a weighted sum or rule-based logic. The index informs operational strategies for autonomous vehicles.
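The weighted-sum variant of the vehicle-level risk index could look like the sketch below. It covers only the final conversion step: the indicator values are assumed to have already been forecast for n seconds ahead (e.g., by an LSTM) and scaled to the range [0, 1], and the indicator names, weights, and values are hypothetical rather than taken from the study.

```python
def risk_index(predicted, weights):
    """Weighted sum of predicted indicator values, normalized by total weight.

    `predicted` maps indicator name -> forecast value scaled to [0, 1];
    `weights` maps the same names -> importance weights (e.g., from the AHP).
    """
    missing = set(weights) - set(predicted)
    if missing:
        raise KeyError(f"no prediction for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[name] * predicted[name] for name in weights) / total_weight


# Hypothetical weights; "segment risk" carries the road segment-level risk
weights = {"short headway": 0.5, "perception error": 0.3, "segment risk": 0.2}

# Same predicted driving behavior in a low-risk and a high-risk segment
low_segment = {"short headway": 0.8, "perception error": 0.5, "segment risk": 0.2}
high_segment = {"short headway": 0.8, "perception error": 0.5, "segment risk": 0.9}

index_low = risk_index(low_segment, weights)
index_high = risk_index(high_segment, weights)
print(round(index_low, 2), round(index_high, 2))
```

Passing the segment risk as one of the weighted inputs means that identical vehicle behavior produces a higher index in a riskier segment. A rule-based alternative would instead apply thresholds to individual indicators.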
Additionally, vehicle-level prediction is linked to segment-level prediction. Because identical behavior may pose greater risk in the high-risk segment, the road segment risk is integrated into the input value of the autonomous vehicle model. This makes it possible to construct a hierarchical prediction system that accounts for both temporal and spatial sensitivity.
As described, high-risk indicators serve as prediction targets and inputs and can be combined with various algorithms, including LSTM, the GRU (gated recurrent unit), random forest, and the support vector machine (SVM). The proposed framework can be extended into a unified prediction system that integrates real-time response and prevention measures in a mixed environment with autonomous vehicles.

5. Discussion

This study aimed to establish a foundation for predicting high-risk situations in mixed traffic environments involving both autonomous and conventional vehicles by integrating five heterogeneous data sources—literature, accident history, accident videos, autonomous vehicle driving videos, and expert seminars—to identify high-risk factors. Unlike previous studies that relied solely on accident frequency or severity, this study also comprehensively considered system errors (e.g., perception and decision failures), variable road infrastructure conditions (e.g., construction zones and variable lanes), and real-time risk variables (e.g., traffic conflicts and congestion), thereby establishing indicators that can quantitatively detect pre-crash risk signals. Notably, the PEGASUS six-layer framework was restructured to suit the study’s objectives, and the derived high-risk factors were quantified using the AHP and average ranking methods to transform them into input variables directly applicable to prediction models. Based on this, a high-risk prediction framework was proposed.
The results of this study are broadly consistent with those of previous research on the major causes of autonomous vehicle accidents. In terms of road infrastructure, previous studies have reported that design features such as intersection types, lane count, dedicated lanes, and on-street parking are closely related to accident occurrence [8,9,10,11]; this study likewise identified these elements as high-risk factors. Regarding traffic flow characteristics, several risk factors identified in this study align with prior findings that speed variation, congestion levels, and upstream traffic conditions significantly affect driving stability and accident risk in autonomous vehicles [7,12]. Additionally, this study incorporated environmental variables such as inclement weather and reduced visibility, which have been shown to impair the accuracy of autonomous perception systems and increase accident likelihood [13,14,15].
Accidents resulting from interactions between autonomous vehicles and surrounding objects were also a key area of discussion. Sudden interactions with pedestrians, bicycles, and conventional vehicles are closely associated with rear-end collisions and other major accident types [8,10], which aligns with the high importance this study assigned to the “moving objects” layer. Furthermore, perception and control errors within autonomous systems have been cited as major accident causes [15,16]; this study likewise identified “digital elements” and “system risks” as high-risk factors.
Unlike previous studies that primarily focused on single factors or post-accident analysis, this study is distinguished by its integration of heterogeneous data sources and its quantitative identification of latent risk factors in the pre-accident phase within a multilayered structure. Specifically, by restructuring the PEGASUS six-layer framework and applying the AHP and average ranking methods, the study converted qualitative expert judgments into actionable quantitative indicators that can be used in real time, thereby presenting a structured, multilayered metric system that can be directly applied in prediction model design [3,4,5,6,7,20,22,25,29,30,31,32,33,34,35,36,37,38,39].
This research presents a proactive approach to risk management by quantitatively identifying and structuring risk factors in advance, based on various data, moving beyond response-oriented analyses conducted after the occurrence of an accident. By comprehensively considering various dimensions of risk, such as technical failures, traffic interactions, and digital vulnerabilities, within a multi-layered structure, the study can contribute to the development of a long-term, sustainable traffic safety system rather than short-term reactive measures. This aligns with the need for a resilient, prevention-oriented urban traffic system design, particularly in the early stages of the commercialization of autonomous vehicles. Therefore, the results of this study can be used as practical basic data for overcoming the limitations of existing traffic safety management systems and establishing future traffic safety strategies.
Despite these contributions, this study has some limitations. First, although it proposes a novel framework that integrates heterogeneous data and applies a multilayered structure to identify high-risk factors in mixed traffic environments, it remains at a conceptual level and has not yet been applied or validated in real-world driving environments. Second, while the study systematically derived indicators by integrating expert judgment and diverse data, it did not empirically verify whether these indicators could effectively predict high-risk situations in actual autonomous driving contexts. Third, most of the autonomous vehicle data used in this study correspond to Level 3 automation, which limits the framework’s applicability to future Level 4 (high automation) environments. The limited availability of autonomous vehicle accident data may also affect the diversity of analyzable cases. However, the study sought to reflect the unique risks of autonomous driving by collecting accident data as comprehensively as possible and incorporating expert assessments, despite practical constraints. Fourth, although the use of the AHP and average ranking methods to quantify expert opinions is structurally useful, the results may still reflect subjectivity depending on the expert group composition and judgment variability. Fifth, the applicability and performance of the prediction model may be affected by the absence of real-time sensor data and varying traffic densities, which were not considered in this study.
To address these limitations, future research should focus on implementing and validating prediction models that combine road segment-level structural risk factors and vehicle driving characteristics derived in this study. These models should be applied not only to fixed structural factors but also in conjunction with real-time operational data, and their performance should be experimentally evaluated in high-fidelity test environments—such as driving simulators or autonomous driving testbeds—under various traffic, environmental, and digital conditions. In particular, it is crucial to empirically verify how high-risk factors derived via AHP affect the occurrence of accidents or near-misses in actual autonomous driving scenarios.
Additionally, to overcome the limitation of Level 3-based data, future research could utilize Level 4 autonomous vehicle data (e.g., from test vehicles (including sensor data), pilot programs, or restricted operational domains) or design scenario-based simulation experiments assuming Level 4 conditions. This approach would not only enhance the framework’s alignment with future autonomous deployment environments but also contribute to refining scenario-based risk response strategies.
Finally, to address the limitations of expert-based weighting, future work could supplement or calibrate expert judgments with data-driven methods such as machine learning-based variable importance analysis or regression coefficient–based impact estimation. This hybrid approach—combining expert insight and empirical data—can enhance model objectivity and robustness while expanding its applicability across diverse road environments and stages of autonomous driving technology adoption.

6. Conclusions

This study followed these procedures to establish a quantitative basis for proactively detecting and responding to high-risk situations in mixed traffic environments involving autonomous and conventional vehicles. First, high-risk factors were identified from multiple perspectives, including literature meta-analysis, accident history and video data analysis, autonomous vehicle driving video footage, and expert seminars. After redefining the six-layer structure of PEGASUS to suit the research objectives, the relative importance of each factor was calculated using the AHP, and representative indicators corresponding to the high-risk factors were derived through average ranking analysis. Finally, using the derived indicators, a conceptual framework was designed to predict risks at both the road segment and autonomous vehicle levels.
The findings showed that high-risk factors were not confined to a single category. They were derived from various dimensions, such as road physical characteristics, traffic flow interactions, weather and illumination conditions, the driving behavior of autonomous vehicles, and system errors. This highlights the complexity and layered risk structure inherent in mixed traffic environments involving autonomous vehicles. The layer-based analysis revealed that the importance of the “moving object,” “digital,” and “traffic flow characteristics” layers was comparatively high. This shows that both technical faults and real-time interaction risks significantly affect the safety of autonomous vehicles. In addition, driving behavior patterns of autonomous systems, perception and decision-making errors, and external communication threats were identified as more influential core factors than physical infrastructure. Digital-based risk factors such as “DDT fallback” and “cyberattack,” which are not captured in existing accident data, were additionally derived through expert seminars. This indicates the need for a diversified approach to identify high-risk factors in the future design of autonomous driving safety systems. By quantifying the identified high-risk factors through the AHP and the average ranking method, they were converted into input variables directly applicable to prediction models. Utilizing these variables, the study proposed a dual-layer prediction framework that distinguishes between spatial units (road segments) and object units (autonomous vehicles)—a key contribution of this research.
This study offers three main contributions. First, an attempt was made to enhance the data-based precision of autonomous driving safety research by comprehensively analyzing empirical data with varying characteristics and deriving high-risk factors from multiple perspectives. Second, layer-based expert judgments were systematically quantified using the AHP and the average ranking method. They were subsequently converted into structured input variables for use in prediction models. Third, a basis for linking real-time response with proactive risk diagnosis was presented by distinguishing between the space level (road segments) and the object level (autonomous vehicles) in prediction model design.
However, this study has the following limitations. First, the proposed framework and high-risk factors have not yet been empirically validated in actual autonomous driving environments. Second, most of the autonomous vehicle data used in the analysis correspond to Level 3, limiting applicability to highly automated (Level 4) scenarios. Third, the expert evaluations using the AHP are subject to potential bias depending on expert composition and judgment. Fourth, the actual influence of the quantitatively derived factors on accident occurrence or near-miss situations has not been empirically verified. Fifth, limitations regarding prediction model applicability and potential performance degradation due to varying traffic densities and the absence of real-time sensor data were not addressed.
Future research should implement prediction models based on the identified high-risk factors and validate their performance across various conditions using simulators or testbeds. Moreover, acquiring Level 4 data, including sensor data, or designing scenario-based experiments could expand applicability. Combining expert evaluation with data-driven analysis would further strengthen model objectivity and effectiveness. Such studies could be used to develop real-time risk prediction systems for autonomous vehicles, refine scenario-based simulations, and establish infrastructure strategies focused on high-risk zones, ultimately contributing to safety assurance and policy development in the early stages of commercialization.
This study provides a foundation for building a sustainable traffic safety system that goes beyond short-term accident prevention by detecting and structuring risk factors in advance. Specifically, the multi-layered structure and the system of quantified indicators can support the design of real-time response systems, infrastructure strategies for high-risk segments, and policy interventions from a proactive risk management perspective. This approach can help minimize accident risk in the early stages of autonomous vehicle commercialization and enhance the long-term resilience and operational stability of the traffic system.

Author Contributions

Conceptualization, H.H., J.J. and S.L.; methodology, H.H. and J.J.; software, H.H. and J.J.; formal analysis, H.H. and J.J.; investigation, H.H., J.J. and J.L.; resources, S.L.; data curation, H.H. and J.J.; writing—original draft preparation, H.H.; writing—review and editing, H.H., J.J., S.L. and J.L.; visualization, H.H. and J.J.; supervision, S.L.; project administration, S.L.; funding acquisition, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Korea Institute of Police Technology (No: RS-2024-00405603).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data collected in this study are not publicly available; interested parties may contact the corresponding author to request access.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Layer-wise high-risk factors in mixed autonomous vehicle environments.
Layer | High-Risk Factors | Literature Review | Expert Consultation | Historical Accident Data: Police Accident Data (CV) | Historical Accident Data: CA DMV Collision Report (AV) | Video-Based Accident Data: Accident Video (CV) | Video-Based Accident Data: Accident Video (AV) | Video-Based Accident Data: Driving Video (AV)
Road
facilities
Underpass o
ramp on/off o o
Single roadway o
intersectionooooo o
tunnel o
Roundabout o
Road lighting o
Expressway/Highway o
Road structure/facility oo o
# of lanesoo
Variable and temporary
facilities
Bus-only lane o o
variable lane o
High-accident zone o
Construction zone o
Traffic flow
characteristics
Traffic conflict o
dilemma zone o
Vehicle type distributionoo
Bottleneck point o
Emergency vehicle
Stop-and-go traffic o
Environmental
variable
Nighttime o oo
Daytime o o
Fogoo
Cloudy ooo
Wet/Moist Surface o
Clear o o oo
Dry o
Rain/Snowo o
Moving
objects
Lane change o o
Sudden stop o ooo
Right turn o o
Left turn o o
Moving straight o ooo
Pedestriano o o o
Truck oo
Passenger Car oo
Van oo
Bicycle o o
Motorcycle o o
On-road parking o o
On-road stopping o o o
Inter-vehicle distance o
Centerline violation o o
Signal violation oo o
Drunk driving o
Signal waiting
Driver inattention o oo
Digital
Perception error o o
Decision error o
Control error o
DDT fallback o
In-vehicle/V2X communication o
Hacking o
Cyberattack o

Appendix B

Table A2. Comprehensive variable list for high-risk indicator derivation.
Layer | High-Risk Indicators | Definition
Road facilities | Presence of intersection | Indicates whether an intersection is present in the road segment.
Road facilities | Presence of a crosswalk | Indicates whether a pedestrian crosswalk exists in the area.
Road facilities | Intersection type | Specifies the type of intersection, such as T-junction or four-way.
Road facilities | Presence of on/off-ramp | Indicates the presence of highway or expressway access ramps.
Road facilities | Speed limit | Maximum legal speed allowed on the road segment.
Road facilities | Lane count | Number of lanes in the road segment.
Road facilities | Presence of roundabout | Indicates whether a roundabout is present in the segment.
Road facilities | Presence of underpass (road) | Indicates whether the segment includes an underpass.
Road facilities | Underpass (road) length | Length of the road underpass in meters.
Road facilities | Presence of tunnel | Indicates whether a tunnel is present on the road segment.
Road facilities | Tunnel length | Length of the tunnel in meters.
Variable and temporary facilities | Presence of road obstacles | Indicates whether any obstacles are present on the road.
Variable and temporary facilities | Location of road obstacles | Specifies where obstacles are located (e.g., lane or shoulder).
Variable and temporary facilities | Type of road obstacles | Describes the obstacle type (e.g., pothole, debris, or roadkill).
Variable and temporary facilities | Presence of construction/work zone | Indicates if construction or maintenance work is ongoing.
Variable and temporary facilities | Location of construction/work zone | Specifies the position of the work zone on the road.
Variable and temporary facilities | Presence of reversible lane | Indicates if a reversible traffic lane is present.
Traffic flow characteristics | Weaving ratio | Rate of lane-changing or weaving maneuvers per segment.
Traffic flow characteristics | PET (post encroachment time) | Time interval between two vehicles occupying the same location.
Traffic flow characteristics | Standard deviation of link travel speed | Measures the variation in vehicle speeds across a road segment.
Traffic flow characteristics | Presence of accident-prone zone | Indicates if the segment is classified as accident-prone.
Traffic flow characteristics | Proportion of risky driving behavior | Share of vehicles exhibiting aggressive or dangerous maneuvers.
Traffic flow characteristics | EDI (erratic driving index) | Metric quantifying unexpected or irregular vehicle movements.
Traffic flow characteristics | Stopping rate within the segment | Frequency of vehicles coming to a complete stop in the segment.
Traffic flow characteristics | Presence of dilemma zone | Indicates if a zone exists where drivers hesitate to stop or proceed at yellow light.
Traffic flow characteristics | Speed difference between adjacent links | Difference in average speed between neighboring road segments.
Environmental variables | Heavy vehicle ratio | Percentage of large vehicles (e.g., trucks and buses) among total traffic.
Environmental variables | Snowfall amount | Volume of snowfall measured in the area.
Environmental variables | Snowfall duration | Duration of snowfall in hours or minutes.
Environmental variables | Rainfall amount | Volume of rainfall measured in the area.
Environmental variables | Rainfall duration | Duration of rainfall in hours or minutes.
Environmental variables | Temperature | Ambient temperature in degrees Celsius.
Environmental variables | Night time period presence | Indicates if the condition occurs during nighttime.
Moving objects | Presence of pedestrians near autonomous vehicle | Indicates pedestrian presence close to autonomous vehicles.
Moving objects | Presence of freight vehicles near autonomous vehicle | Presence of trucks or delivery vehicles near AVs.
Moving objects | Presence of signal violation | Indicates whether a traffic signal violation has occurred.
Moving objects | Acceleration/Deceleration | Change in vehicle speed per unit time (m/s²).
Moving objects | Speed | Current speed of the vehicle.
Moving objects | Jerk | Rate of change in acceleration, indicating abrupt movement.
Moving objects | Angular velocity per second | Rate of vehicle’s directional rotation over time.
Moving objects | Inter-vehicle distance | Distance between the subject vehicle and the one ahead.
Digital | Presence of perception error | Whether the AV experienced sensor or detection failures.
Digital | Frequency of perception errors | How often perception errors occur within a timeframe.
Digital | Sensor field of view | Detection range and coverage angle of the AV’s sensors.
Digital | Presence of decision-making error | Whether the AV made an incorrect driving decision.
Digital | Frequency of decision-making errors | Number of incorrect decisions made during operation.
Digital | Presence of control error | Whether the AV experienced control system malfunctions.
Digital | Frequency of control errors | Number of control-related failures in vehicle systems.
Digital | Presence of DDT fallback | Indicates whether fallback mode was triggered in AV operation.
Digital | Frequency of DDT fallback | Count of fallback events initiated during driving.
Digital | Presence of cyberattack | Whether a cyberattack targeting vehicle systems occurred.
Digital | Frequency of cyberattacks | Number of cyberattacks detected during vehicle operation.
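Several of the indicators in Table A2 can be derived from basic trajectory data. The following is a minimal sketch, not the authors' implementation: it assumes a uniformly sampled speed trace and uses finite differences for acceleration and jerk, the population standard deviation for speed variation, and the conventional PET reading (time between the first road user leaving a conflict point and the second arriving). The function names, the sample values, and the sampling interval are hypothetical.

```python
from statistics import pstdev


def finite_difference(values, dt):
    """Forward differences of a uniformly sampled series, per second."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def post_encroachment_time(first_exit_time, second_entry_time):
    """PET in seconds: gap between the first road user leaving a
    conflict point and the second road user arriving at it."""
    return second_entry_time - first_exit_time


# Hypothetical AV speed trace in m/s, sampled every 0.5 s (illustrative only).
speeds = [10.0, 10.0, 9.0, 7.0, 6.0]
dt = 0.5

accel = finite_difference(speeds, dt)  # m/s^2
jerk = finite_difference(accel, dt)    # m/s^3
speed_sd = pstdev(speeds)              # spread of speeds, m/s
pet = post_encroachment_time(first_exit_time=12.4, second_entry_time=13.9)

print(accel, jerk, round(speed_sd, 3), round(pet, 2))
```

Real sensor traces are noisy, so a practical pipeline would likely smooth the speed series before differentiating; that step is omitted here.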

Appendix C

Table A3. Abbreviation list.
Abbreviation | Full Term | Description
PEGASUS Joint Project | Project for the Establishment of Generally Accepted Quality Standards, Tools, Methods, Processes, and Scenarios for the Approval of Autonomous Driving Functions | German Joint Project for Standardization of Autonomous Driving Safety Validation
AHP | Analytic Hierarchy Process | A hierarchical decision-making analysis method that quantifies the relative importance of factors
ADS | Autonomous Driving Systems | Autonomous driving system
DSSAD | Data Storage System for Automated Driving | Autonomous driving recorder (stores data during operation)
EDR | Event Data Recorder | Device that stores data at the moment of an accident (also used in non-autonomous vehicles)
TTC | Time to Collision | Time remaining until collision (a risk indicator)
PET | Post Encroachment Time | Time gap between two objects passing through the same point (a surrogate indicator for spatial threat)
TIT | Total Impact Time | Total time to collision (used as a composite indicator of collision probability)
CPI | Crash Potential Index | Collision potential index
CAR | Collision Avoidance Rate | Collision avoidance rate (evasive performance of the system or driver)
HD Maps | High-Definition Maps | High-precision map (centimeter-level accuracy, essential for autonomous driving)
California DMV | California Department of Motor Vehicles | California Department of Motor Vehicles (responsible for releasing autonomous vehicle-related data)
MCDM | Multi-criteria Decision-making | Multi-criteria decision-making methods (including AHP, TOPSIS, etc.)
RR | Risk Ratio | A ratio of risk levels between comparison groups
ES | Effect Size | A statistical measure of magnitude of impact
CI | Consistency Index | Consistency index (used to assess consistency in AHP matrices)
CR | Consistency Ratio | A value obtained by dividing the CI by the RI, used to judge acceptability
RI | Random Index | Average consistency index of a random matrix (used in CR calculation)
BERT | Bidirectional Encoder Representations from Transformers | Deep learning model for natural language processing
CV | Conventional Vehicle | Conventional vehicle (driven by a human)
AV | Autonomous Vehicle | Autonomous vehicle
CAV | Connected Autonomous Vehicle | Connected autonomous vehicle (AV capable of vehicle-to-everything communication)
DDT Fallback | Dynamic Driving Task Fallback | Fallback driving task performed by a human when the autonomous system fails during driving
V2X Communication | Vehicle-to-Everything Communication | Communication between the vehicle and external elements (includes V2V (Vehicle-to-Vehicle), V2I (Vehicle-to-Infrastructure), and V2P (Vehicle-to-Pedestrian), etc.)
EDI | Erratic Driving Index | Calculated as the sum of the area exceeding critical thresholds of aggressive driving indicators (e.g., speed, acceleration, jerk, and yaw) during the analysis period, divided by travel time
LSTM | Long Short-Term Memory | Recurrent neural network (RNN) architecture specialized in time-series data prediction
GRU | Gated Recurrent Unit | Recurrent neural network architecture that is a lightweight version of LSTM
SVM | Support Vector Machine | Supervised learning models used for classification and regression analysis
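The EDI and TTC definitions in Table A3 can be illustrated with a short sketch. It is one possible reading of those definitions, not the authors' implementation: the exceedance area is approximated with a rectangle rule on uniformly sampled data, TTC is taken as gap divided by closing speed, and every threshold, series, and function name is hypothetical. The source does not say how indicators with different units are normalized before summation, so the sketch simply adds the raw areas.

```python
def exceedance_area(series, threshold, dt):
    """Approximate area of |x| above a threshold (rectangle rule)."""
    return sum(max(0.0, abs(x) - threshold) for x in series) * dt


def erratic_driving_index(indicators, thresholds, dt, travel_time):
    """Sum of threshold-exceedance areas over all indicators,
    divided by travel time (one reading of the EDI definition)."""
    if travel_time <= 0:
        raise ValueError("travel_time must be positive")
    total = sum(
        exceedance_area(series, thresholds[name], dt)
        for name, series in indicators.items()
    )
    return total / travel_time


def time_to_collision(gap_m, follower_speed, leader_speed):
    """TTC in seconds; infinite when the follower is not closing in."""
    closing_speed = follower_speed - leader_speed
    if closing_speed <= 0:
        return float("inf")
    return gap_m / closing_speed


# Hypothetical 4 s trace sampled at 1 s; thresholds are illustrative only.
indicators = {
    "acceleration": [0.5, -3.5, -4.0, 1.0],  # m/s^2
    "jerk": [1.0, -6.0, 2.0, 0.5],           # m/s^3
}
thresholds = {"acceleration": 3.0, "jerk": 5.0}

edi = erratic_driving_index(indicators, thresholds, dt=1.0, travel_time=4.0)
ttc = time_to_collision(gap_m=20.0, follower_speed=15.0, leader_speed=10.0)
print(edi, ttc)
```

Dividing by travel time makes the index comparable across trips of different duration, which matches the stated purpose of the normalization in the definition.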

References

  1. Anderson, J.M.; Kalra, N.; Stanley, K.D.; Samaras, C.; Oluwatola, T.A. Autonomous Vehicle Technology: A Guide for Policymakers; Rand Corporation: Santa Monica, CA, USA, 2014.
  2. Santacreu, A.; Yannis, G.; Léon, O.; Crist, P. Safe Micromobility; International Transport Forum: Leipzig, Germany; OECD: Paris, France, 2020; Available online: https://www.researchgate.net/publication/357689595_Safe_Micromobility (accessed on 17 February 2020).
  3. Katrakazas, C.; Quddus, M.; Chen, W.-H. A new integrated collision risk assessment methodology for autonomous vehicles. Accid. Anal. Prev. 2019, 127, 61–79.
  4. Goudarzi, P.; Hassanzadeh, B. Collision risk in autonomous vehicles: Classification, challenges, and open research areas. Vehicles 2024, 6, 157–190.
  5. Xu, C.; Wang, W.; Liu, P.; Zhang, F. Development of a real-time crash risk prediction model incorporating the various crash mechanisms across different traffic states. Traffic Inj. Prev. 2015, 16, 28–35.
  6. Li, P.; Abdel-Aty, M.; Yuan, J. Real-time crash risk prediction on arterials based on LSTM-CNN. Accid. Anal. Prev. 2020, 135, 105371.
  7. Hossain, M.; Abdel-Aty, M.; Quddus, M.A.; Muromachi, Y.; Sadeek, S.N. Real-time crash prediction models: State-of-the-art, design pathways and ubiquitous requirements. Accid. Anal. Prev. 2019, 124, 66–84.
  8. Petrović, Đ.; Mijailović, R.; Pešić, D. Traffic accidents with autonomous vehicles: Type of collisions, manoeuvres and errors of conventional vehicles’ drivers. Transp. Res. Procedia 2020, 45, 161–168.
  9. Kim, C.H.; Kim, J.H. Investigating autonomous vehicle accidents at urban intersections based on road geometry data. J. Korean Soc. Road Eng. 2023, 25, 255–263.
  10. Lee, S.H.; Park, J.Y. A study on autonomous vehicle crash hierarchy analysis and severity model based on Bayesian probabilistic inference. J. Korean Soc. Transp. 2024, 42, 77–93.
  11. Jeong, A.R.; Cho, Y.; Oh, C. A methodology of identifying hazardous freeway segment based on multi-agent driving simulations for the mixed situation of autonomous and manual vehicles. J. Korean Soc. Transp. 2023, 41, 495–508.
  12. Li, J.; Ling, M.; Zang, X.; Luo, Q.; Yang, J.; Chen, J.; Guo, X. Quantifying risks of lane-changing behavior in highways with vehicle trajectory data under different driving environments. Int. J. Mod. Phys. C 2024, 35, 2450141.
  13. Park, H.; Haghani, A.; Samuel, S.; Knodler, M.A. Real-time prediction and avoidance of secondary crashes under unexpected traffic congestion. Accid. Anal. Prev. 2018, 112, 39–49.
  14. Chen, S.; Piao, L.; Zang, X.; Luo, Q.; Li, J.; Yang, J.; Rong, J. Analyzing differences of highway lane-changing behavior using vehicle trajectory data. Phys. A Stat. Mech. Its Appl. 2023, 624, 128980.
  15. Szénási, S.; Kertész, G.; Felde, I.; Nádai, L. Statistical accident analysis supporting the control of autonomous vehicles. J. Comput. Methods Sci. Eng. 2021, 21, 85–97.
  16. Das, S.; Dutta, A.; Tsapakis, I. Automated vehicle collisions in California: Applying Bayesian latent class model. IATSS Res. 2020, 44, 300–308.
  17. Zheng, O.; Abdel-Aty, M.; Wang, Z.; Ding, S.; Wang, D.; Huang, Y. Avoid: Autonomous vehicle operation incident dataset across the globe. arXiv 2023, arXiv:2303.12889.
  18. Favarò, F.; Eurich, S.; Nader, N. Autonomous vehicles’ disengagements: Trends, triggers, and regulatory limitations. Accid. Anal. Prev. 2018, 110, 136–148.
  19. Kim, J.-Y. Law and Economics of Artificial Intelligence: Optimal Liability Rules for Accident Losses Caused by Fully Autonomous Vehicles. SSRN Electron. J. 2023.
  20. Hyeon, S.H.; Son, J.W.; Oh, Y.C.; You, B.Y. A study of the DSSAD data elements derivation through autonomous driving data analysis on expressways. J. Korean Soc. Intell. Transp. Syst. 2024, 23, 97–106.
  21. Lee, J.; Ahn, S.; Lee, J.; Roh, C.; Chang, I. Analysis of Safety Indicators by Pedestrian Accident Types in Urban Community Roads. J. Korea Inst. Intell. Transp. Syst. 2024, 23, 34–46.
  22. Vinoth, K.; Sasikumar, P. Multi-sensor fusion and segmentation for autonomous vehicle multi-object tracking using deep Q networks. Sci. Rep. 2024, 14, 31130.
  23. Saputra, D.B. Accident Investigation in the Automated Traffic System. Master’s Thesis, Westsächsische Hochschule Zwickau, Zwickau, Germany, 2023.
  24. Kang, H.J.; Woo, N.E.; Park, G.O.; Song, J.H. A study on the direction of data triggers and elements for automated vehicle data recorder. J. Auto-Veh. Saf. Assoc. 2023, 15, 71–78.
  25. Masello, L.; Sheehan, B.; Murphy, F.; Castignani, G.; McDonnell, K.; Ryan, C. From traditional to autonomous vehicles: A systematic review of data availability. Transp. Res. Rec. 2022, 2676, 161–193.
  26. Elamrani Abou Elassad, Z.; Mousannif, H.; Al Moatassime, H. Class-imbalanced crash prediction based on real-time traffic and weather data: A driving simulator study. Traffic Inj. Prev. 2020, 21, 201–208.
  27. Kim, H.; Han, H.; You, Y.; Hong, J.; Song, T.J. A comprehensive traffic accident investigation system for identifying causes of the accident involving events with autonomous vehicle. J. Adv. Transp. 2024, 2024, 9966310.
  28. Répás, J.; Berek, L.; Schmidt, M. Autonomous Vehicles Forensics-The next step of the Digital Vehicles Forensics. In Proceedings of the 2022 IEEE 1st International Conference on Cognitive Mobility (CogMob), Budapest, Hungary, 12–13 October 2022.
  29. Girdhar, M.; Hong, J.; You, Y.; Song, T.J. Anomaly Detection for Connected and Automated Vehicles: Accident Analysis. In Proceedings of the 2023 IEEE Transportation Electrification Conference & Expo (ITEC), Detroit, MI, USA, 21–23 June 2023.
  30. Park, G.O.; Kang, H.J.; Woo, N.E. Types and necessity of data (EDR/DSSAD) recorded by automated vehicles. Auto J. J. Korean Soc. Automot. Eng. 2023, 45, 24–27.
  31. Park, S.M.; So, J.; Ko, H.; Jeong, H.; Yun, I. Development of safety evaluation scenarios for autonomous vehicle tests using 5-layer format (case of the community road). J. Korean Soc. Intell. Transp. Syst. 2019, 18, 114–128.
  32. Lee, H.J.; Jang, M.; Song, J.; Hwang, K. Development of autonomous vehicle traffic accident scenarios in urban areas based on real-world accident data using association rule mining. J. Korean Soc. Transp. 2023, 41, 375.
  33. Lee, J.W.; Lee, M.S.; Jeong, J.I. Intersection collision situation simulation of automated vehicle considering sensor range. J. Auto-Veh. Saf. Assoc. 2021, 13, 114–122.
  34. Lee, J.M.; Jeong, E.I.; Song, B.S. Critical scenario generation for collision avoidance of automated vehicles based on traffic accident analysis and machine learning. J. Korean Soc. Automot. Eng. 2020, 28, 817–826.
  35. Lee, W.; Kang, M.; Hwang, K. A study on predicting accident vulnerable situation and deriving scenarios of automated vehicle based on actual driving data using vision transformer. J. Korean Soc. Intell. Transp. Syst. 2022, 21, 233–252.
  36. Oh, S.M.; Choi, J.H.; Jang, K.T.; Yoon, J.W. Analysis of autonomous vehicles risk cases for developing level 4+ autonomous driving test scenarios: Focusing on perceptual blind. J. Korean Soc. Intell. Transp. Syst. 2024, 23, 173–188.
  37. Qu, D.; Wang, K.; Dai, S.; Chen, Y.; Cui, S.; Yang, Y. Vehicle game lane-changing mechanism and strategy evolution based on trajectory data. Sci. Rep. 2025, 15, 4841.
  38. Song, Y.; Chitturi, M.V.; Noyce, D.A. Automated vehicle crash sequences: Patterns and potential uses in safety testing. Accid. Anal. Prev. 2021, 153, 106017.
  39. So, J.J.; Park, I.; Wee, J.; Park, S.; Yun, I. Generating traffic safety test scenarios for automated vehicles using a big data technique. KSCE J. Civ. Eng. 2019, 23, 2702–2712.
  40. Chen, S.; Zong, S.; Chen, T.; Huang, Z.; Chen, Y.; Labi, S. A taxonomy for autonomous vehicles considering ambient road infrastructure. Sustainability 2023, 15, 11258.
  41. Guo, H.; Xie, K.; Keyvan-Ekbatani, M. Modeling driver’s evasive behavior during safety-critical lane changes: Two-dimensional time-to-collision and deep reinforcement learning. Accid. Anal. Prev. 2023, 186, 107063.
  42. Virdi, N.; Grzybowska, H.; Waller, S.T.; Dixit, V. A safety assessment of mixed fleets with connected and autonomous vehicles using the surrogate safety assessment module. Accid. Anal. Prev. 2019, 131, 95–111.
  43. Shi, X.; Wong, Y.D.; Li, M.Z.F.; Chai, C. Key risk indicators for accident assessment conditioned on pre-crash vehicle trajectory. Accid. Anal. Prev. 2018, 117, 346–356.
  44. Khan, G.; Qin, X.; Noyce, D.A. Spatial analysis of weather crash patterns. J. Transp. Eng. 2008, 134, 191–202.
  45. Tang, W.; Wang, H.; Ma, J.; Yang, C.; Yin, C. Vehicle collision risk assessment method in highway work zone based on trajectory data. Traffic Inj. Prev. 2025, 1–8.
  46. Meng, D.; Xiao, W.; Zhang, L.; Zhang, Z.; Liu, Z. Vehicle trajectory prediction based predictive collision risk assessment for autonomous driving in highway scenarios. arXiv 2023, arXiv:2304.05610.
  47. Khelfa, B.; Ba, I.; Tordeux, A. Predicting highway lane-changing maneuvers: A benchmark analysis of machine and ensemble learning algorithms. Phys. A Stat. Mech. Its Appl. 2023, 612, 128471.
  48. Gu, R. Integrating fuzzy trajectory data and artificial intelligence methods for multi-style lane-changing behavior prediction. arXiv 2022, arXiv:2205.05016.
  49. Wang, X.; Liu, S.; Zhang, J.; Ni, D. Real-Time Risk Identification and Prediction for the Target Lane’s Following Vehicle during Lane Change. Transp. Res. Rec. 2024, 2678, 1785–1798.
  50. Choi, J.H.; Lim, J.B.; Lee, S.B. A meta analysis of the effects of road safety facilities on accident reduction: Focusing on signalized intersection. J. Korean Soc. Transp. 2016, 34, 291–303.
  51. Jo, Y.; Youn, S.-M.; Oh, C. Effectiveness Analysis of Variable Speed Limit Systems (VSL) in Work Zones based on Meta-analysis. J. Korea Inst. Intell. Transp. Syst. 2016, 15, 91–103.
  52. Hu, W.; Zhang, T.; Zhang, Y.; Chang, A.H.S. Non-driving-related tasks and drivers’ takeover time: A meta-analysis. Transp. Res. Part F Traffic Psychol. Behav. 2024, 103, 623–637.
  53. Zhang, T.; Zeng, W.; Zhang, Y.; Tao, D.; Li, G.; Qu, X. What drives people to use automated vehicles? A meta-analytic review. Accid. Anal. Prev. 2021, 159, 106270.
  54. Liberatore, M.J.; Nydick, R.L. The analytic hierarchy process in medical and health care decision making: A literature review. Eur. J. Oper. Res. 2008, 189, 194–207.
  55. Ho, W. Integrated analytic hierarchy process and its applications–A literature review. Eur. J. Oper. Res. 2008, 186, 211–228.
  56. Saaty, T.L. Fundamentals of Decision Making and Priority Theory with the Analytic Hierarchy Process; RWS Publications: Pittsburgh, PA, USA, 1994.
  57. Ishizaka, A.; Labib, A. Review of the main developments in the analytic hierarchy process. Expert Syst. Appl. 2011, 38, 14336–14345.
  58. Borgulya, I. A ranking method for multiple-criteria decision-making. Int. J. Syst. Sci. 1997, 28, 905–912.
  59. Boran, F.E.; Genç, S.; Kurt, M.; Akay, D. A multi-criteria intuitionistic fuzzy group decision making for supplier selection with TOPSIS method. Expert Syst. Appl. 2009, 36, 11363–11368.
  60. Taherdoost, H.; Madanchian, M. Multi-criteria decision making (MCDM) methods and concepts. Encyclopedia 2023, 3, 77–87.
  61. Kim, H.; You, Y.; Han, H.; Cho, M.J.; Song, T.J. Traffic Accidents Scenarios Based on Autonomous Vehicle Functional Safety Systems. J. Korean Soc. Intell. Transp. Syst. 2025, 24, 250–266.
  62. Kang, W.; Wee, J.; Shin, H.C.; Kim, S. A study on policy directions for smart mobility service in the post-COVID-19 era using the ahp technique. J. Korea Inst. Intell. Transp. Syst. 2024, 23, 100–116.
  63. Cheng, E.W.; Li, H. Analytic hierarchy process: An approach to determine measures for business performance. Meas. Bus. Excell. 2001, 5, 30–37.
  64. Wedley, W.C. Consistency prediction for incomplete AHP matrices. Math. Comput. Model. 1993, 17, 151–161.
Figure 1. Overall research flow.
Figure 2. Identification of high-risk factors based on accident data.
Figure 3. Process of identifying high-risk factors based on autonomous vehicle driving video analysis [61].
Figure 4. Conceptual framework for high-risk prediction model (road segment level).
Figure 5. Conceptual framework for high-risk prediction model (autonomous vehicle level).
Table 1. Results of binary meta-analysis.
Risk Factors | # of Events | # of Studies | Relative Risk | 95% Confidence Interval (Lower) | 95% Confidence Interval (Upper)
Acceleration/Deceleration | 22 | 58 | 0.3793 | 0.2656 | 0.508
Intersection | 20 | 58 | 0.3448 | 0.2356 | 0.508
Rear-end collision | 18 | 58 | 0.3103 | 0.2062 | 0.4733
Fog | 18 | 58 | 0.3103 | 0.2062 | 0.138
Rain | 17 | 58 | 0.2931 | 0.1918 | 0.138
Inter-vehicle distance | 15 | 58 | 0.2586 | 0.1635 | 0.4201
# of lanes | 15 | 58 | 0.2586 | 0.1635 | 0.4201
Presence of lighting | 12 | 58 | 0.2069 | 0.1225 | 0.3838
Pedestrian collision | 8 | 58 | 0.1379 | 0.0716 | 0.3277
Bicycle collision | 7 | 58 | 0.1207 | 0.0597 | 0.2493
Lane change | 6 | 58 | 0.1034 | 0.0483 | 0.2288
Steering angle | 5 | 58 | 0.0862 | 0.0379 | 0.1864
Driver intervention | 5 | 58 | 0.0862 | 0.0379 | 0.1864
Vehicle error | 5 | 58 | 0.0862 | 0.0379 | 0.1864
Tunnel | 4 | 58 | 0.0689 | 0.0271 | 0.1643
Roadside parking | 4 | 58 | 0.0689 | 0.0271 | 0.1643
Wind speed | 4 | 58 | 0.0689 | 0.0271 | 0.1643
Jerk | 3 | 58 | 0.0517 | 0.0177 | 0.1414
Severity | 3 | 58 | 0.0517 | 0.0177 | 0.1414
TTC | 3 | 58 | 0.0517 | 0.0177 | 0.1414
Sideswipe | 2 | 58 | 0.0345 | 0.0095 | 0.1173
EDR | 2 | 58 | 0.0345 | 0.0095 | 0.1173
Hacking | 2 | 58 | 0.0345 | 0.0095 | 0.1173
Perception error | 2 | 58 | 0.0345 | 0.0095 | 0.1173
Shockwave | 2 | 58 | 0.0345 | 0.0095 | 0.1173
Bus stop | 1 | 58 | 0.0172 | 0.0031 | 0.0914
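The "Relative Risk" values in Table 1 equal the number of events divided by the number of studies (for example, 22/58 = 0.3793). The sketch below reproduces the first row under the assumption that the bounds come from a 95% Wilson score interval. That assumption is inferred from the numbers and is not stated in the source; the function name is hypothetical.

```python
from math import sqrt


def wilson_interval(events, n, z=1.959964):
    """Proportion events/n with a Wilson score interval (default 95%)."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need 0 <= events <= n and n > 0")
    p = events / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, center - half_width, center + half_width


# First row of Table 1: acceleration/deceleration, 22 events in 58 studies.
proportion, lower, upper = wilson_interval(22, 58)
print(round(proportion, 4), round(lower, 4), round(upper, 4))
```

The Wilson interval stays inside [0, 1] even for small counts such as 1/58, which makes it a reasonable choice for the sparse rows at the bottom of the table.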
Table 2. Results of effect size meta-analysis.
Risk Factors | Mean Effect Size | p-Value | Risk Factors | Mean Effect Size | p-Value
Acceleration/Deceleration | 0.5050 | 0.000011 | Tunnel | 0.0771 | 0.069588
Fog | 0.3447 | 0.000112 | Jerk | 0.0581 | 0.151900
Rain | 0.3172 | 0.000218 | Steering angle | 0.0538 | 0.161178
Intersection | 0.3097 | 0.000121 | Vehicle error | 0.0457 | 0.153285
# of lanes | 0.2872 | 0.000607 | Roadside parking | 0.0372 | 0.192602
Rear-end collision | 0.2665 | 0.000746 | Bicycle collision | 0.0353 | 0.013797
Inter-vehicle distance | 0.2512 | 0.001605 | TTC | 0.0330 | 0.243168
Pedestrian collision | 0.1237 | 0.018445 | Shockwave | 0.0322 | 0.159097
Dedicated lane | 0.0963 | 0.045377 | Sideswipe | 0.0225 | 0.194688
Wind speed | 0.0826 | 0.083216 | EDR | 0.0159 | 0.321537
Severity | 0.0826 | 0.083216 | Hacking | 0.0097 | 0.160178
Lane change | 0.0803 | 0.061907 | Perception error | 0.0064 | 0.160178
Driver intervention | 0.0803 | 0.099426 | Bus stop | 0.0012 | 0.321537
Table 3. Results of key factor identification through expert consultation.
No. | Key Points | Risk Factors
1 | Necessity of AI-based predictive driving analysis in connected autonomous vehicle (CAV) environments | Intersection, inter-vehicle distance, dilemma zone
2 | Scenario-based prediction models and object recognition using video information | Perception error, intersection, right turn, roundabout
3 | Need for lane operation analysis according to the penetration rate of autonomous vehicles | Vehicle type distribution, roundabout
4 | Necessity of analyzing video-based accident data and identifying risk factors | Construction zone, high-accident zone, dedicated lane, vehicle type distribution, bottleneck point
5 | Real-time accident risk prediction and infrastructure-based driving assistance technologies | Intersection, ramp in/out, lane changing, # of lanes, stop and go traffic, traffic conflict
6 | Importance of establishing a cybersecurity framework for cooperative autonomous driving | Perception/decision/control error, hacking, DDT (dynamic driving task) fallback, in-vehicle/V2X (vehicle to everything) communication, cyberattack
7 | Identification and risk prediction of complex driving hazard situations | Intersection, tunnel
Table 4. Reclassified data layers and their correspondence to the original PEGASUS framework.
No. | Reclassified Layer (This Study) | Corresponding PEGASUS Layer | Comparison and Reclassification Direction
1 | Road facilities | Road level, traffic infrastructure | Consolidates physical and structural elements such as alignment, grade, and roadside objects. Redefined to emphasize infrastructure-related baseline conditions.
2 | Variable and temporary facilities | Traffic infrastructure, temporary modifications | Separates dynamic or situational elements (e.g., bus-only lanes and construction zones) from static infrastructure for improved risk predictability.
3 | Traffic flow characteristics | (Not specified in PEGASUS) | Newly introduced to capture macroscopic flow features (e.g., bottlenecks and conflict points) as key risk indicators.
4 | Environmental variable | Environmental conditions | Retains external conditions such as weather, lighting, and road surface, but highlights measurability and influence on sensor performance.
5 | Moving objects | Dynamic objects | Refined focus on the behavior and interaction patterns of road users (e.g., lane changes, violations, and pedestrian movements).
6 | Digital | Digital information | Expanded to encompass system-level risks such as perception/judgment errors, fallback scenarios, and cybersecurity threats in autonomous driving systems.
Table 5. The risk factors included in the final AHP survey.
Key Points | Risk Factors
Road facilities | Intersection; Roundabout; Ramp on/off; Underpass; Tunnel; # of lanes; Road structure/facility; Expressway/Highway
Variable and temporary facilities | Road obstacle; Construction zone; Variable lane; Bus-only lane
Traffic flow characteristics | Traffic conflict; High-accident zone; Dilemma zone; Stop and go traffic; Bottleneck point; Vehicle type distribution
Environmental variable | Snowfall; Wet/Moist surface; Rainfall; Night time; Fog
Moving objects | Truck; Pedestrian; Signal violation; Sudden stop; Lane change; Inter-vehicle distance; Centerline violation; Turning; On-road parking/stopping
Digital | Perception error; Decision error; Control error; Cyberattack; DDT fallback; In-vehicle/V2X communication
Table 6. Results of the AHP (the first hierarchy).
No. | Layer | Importance
5 | Moving objects | 0.2437
6 | Digital | 0.2246
3 | Traffic flow characteristics | 0.1889
4 | Environmental variables | 0.1870
2 | Variable/Temporary facilities | 0.0915
1 | Road facilities | 0.0643
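Importance weights such as those in Table 6 are typically obtained from a pairwise comparison matrix, with the consistency ratio (CR = CI/RI) used to judge whether the judgments are acceptable. The following is a minimal sketch of that standard AHP calculation using power iteration; it is not the authors' code. The comparison matrix is hypothetical rather than the survey data, and the random index values are the commonly cited Saaty figures.

```python
# Commonly cited Saaty random index (RI) values by matrix order.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def ahp_weights(matrix, iterations=100):
    """Return (weights, CI, CR) for a reciprocal pairwise comparison matrix."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        # Power iteration converges to the principal eigenvector.
        product = mat_vec(matrix, weights)
        total = sum(product)
        weights = [x / total for x in product]
    product = mat_vec(matrix, weights)
    lambda_max = sum(p / w for p, w in zip(product, weights)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return weights, ci, cr


# Hypothetical, perfectly consistent 3x3 matrix built from weights 0.6/0.3/0.1.
comparisons = [
    [1.0, 2.0, 6.0],
    [1 / 2, 1.0, 3.0],
    [1 / 6, 1 / 3, 1.0],
]
weights, ci, cr = ahp_weights(comparisons)
print([round(w, 4) for w in weights], round(ci, 6), round(cr, 6))
```

A CR of 0.1 or less is the conventional acceptance threshold in the AHP literature; real expert matrices are rarely perfectly consistent, so CR will usually be positive.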
Table 7. Results of the AHP (the second hierarchy).
Table 7. Results of the AHP (the second hierarchy).
| Layer | High-Risk Factors | Importance | Rank |
|---|---|---|---|
| 1 Road facilities | Intersection | 0.2777 | 1 |
| | Roundabout | 0.1958 | 2 |
| | Ramp on/off | 0.1880 | 3 |
| | Underpass | 0.1026 | 4 |
| | Tunnel | 0.0846 | 5 |
| | Road structure/facility | 0.0566 | 6 |
| | # of lanes | 0.0505 | 7 |
| | Expressway/highway | 0.0442 | 8 |
| 2 Variable/Temporary facilities | Road obstacle | 0.4136 | 1 |
| | Construction zone | 0.2878 | 2 |
| | Variable lane | 0.2056 | 3 |
| | Bus-only lane | 0.0931 | 4 |
| 3 Traffic flow characteristics | Traffic conflict | 0.2776 | 1 |
| | Dilemma zone | 0.1713 | 2 |
| | High-accident zone | 0.1684 | 3 |
| | Stop and go traffic | 0.1460 | 4 |
| | Bottleneck point | 0.1340 | 5 |
| | Vehicle type distribution | 0.1026 | 6 |
| 4 Environmental variables | Snowfall | 0.3003 | 1 |
| | Wet/Moist surface | 0.2811 | 2 |
| | Rainfall | 0.1933 | 3 |
| | Night time | 0.1656 | 4 |
| | Fog | 0.0598 | 5 |
| 5 Moving objects | Truck | 0.1840 | 1 |
| | Signal violation | 0.1702 | 2 |
| | Pedestrian | 0.1506 | 3 |
| | Sudden stop | 0.1346 | 4 |
| | Inter-vehicle distance | 0.1053 | 5 |
| | Lane change | 0.0984 | 6 |
| | Centerline violation | 0.0595 | 7 |
| | Turning | 0.0557 | 8 |
| | On-road parking/stopping | 0.0417 | 9 |
| 6 Digital | Perception error | 0.2512 | 1 |
| | Decision error | 0.2408 | 2 |
| | Cyberattack | 0.1604 | 3 |
| | Control error | 0.1277 | 4 |
| | DDT fallback | 0.1168 | 5 |
| | In-vehicle/V2X communication | 0.1032 | 6 |
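
Tables 6 and 7 report layer-level weights and within-layer (local) factor weights separately. A common way to compare factors across layers in AHP is to multiply each local weight by the weight of its layer. The sketch below illustrates that synthesis step using the published values; whether the authors computed such global weights is an assumption, and the function name `global_weights` is hypothetical.

```python
# Layer weights from Table 6 (first hierarchy).
LAYER_WEIGHTS = {
    "Road facilities": 0.0643,
    "Variable/Temporary facilities": 0.0915,
    "Traffic flow characteristics": 0.1889,
    "Environmental variables": 0.1870,
    "Moving objects": 0.2437,
    "Digital": 0.2246,
}

# Local (within-layer) factor weights from Table 7 (second hierarchy).
LOCAL_WEIGHTS = {
    "Road facilities": {
        "Intersection": 0.2777, "Roundabout": 0.1958, "Ramp on/off": 0.1880,
        "Underpass": 0.1026, "Tunnel": 0.0846, "Road structure/facility": 0.0566,
        "# of lanes": 0.0505, "Expressway/highway": 0.0442,
    },
    "Variable/Temporary facilities": {
        "Road obstacle": 0.4136, "Construction zone": 0.2878,
        "Variable lane": 0.2056, "Bus-only lane": 0.0931,
    },
    "Traffic flow characteristics": {
        "Traffic conflict": 0.2776, "Dilemma zone": 0.1713,
        "High-accident zone": 0.1684, "Stop and go traffic": 0.1460,
        "Bottleneck point": 0.1340, "Vehicle type distribution": 0.1026,
    },
    "Environmental variables": {
        "Snowfall": 0.3003, "Wet/Moist surface": 0.2811, "Rainfall": 0.1933,
        "Night time": 0.1656, "Fog": 0.0598,
    },
    "Moving objects": {
        "Truck": 0.1840, "Signal violation": 0.1702, "Pedestrian": 0.1506,
        "Sudden stop": 0.1346, "Inter-vehicle distance": 0.1053,
        "Lane change": 0.0984, "Centerline violation": 0.0595,
        "Turning": 0.0557, "On-road parking/stopping": 0.0417,
    },
    "Digital": {
        "Perception error": 0.2512, "Decision error": 0.2408,
        "Cyberattack": 0.1604, "Control error": 0.1277,
        "DDT fallback": 0.1168, "In-vehicle/V2X communication": 0.1032,
    },
}


def global_weights(layer_weights, local_weights):
    """Return {(layer, factor): layer weight * local weight}, largest first."""
    scores = {
        (layer, factor): layer_weights[layer] * weight
        for layer, factors in local_weights.items()
        for factor, weight in factors.items()
    }
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    ranked = global_weights(LAYER_WEIGHTS, LOCAL_WEIGHTS)
    for (layer, factor), score in list(ranked.items())[:5]:
        print(f"{score:.4f}  {factor} ({layer})")
```

Because the published weights are rounded to four decimals, the global weights sum to approximately, not exactly, one.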
Table 8. Final high-risk factors identified.
| Layer | High-Risk Factors |
|---|---|
| Road Facilities | Intersection; Roundabout; Ramp on/off; Underpass; Tunnel |
| Variable and Temporary Facilities | Road obstacle; Construction zone; Variable lane |
| Traffic Flow Characteristics | Traffic conflict; High-accident zone; Dilemma zone; Stop and go traffic; Bottleneck point |
| Environmental Variables | Snowfall; Wet/Moist surface; Rainfall; Night time |
| Moving Objects | Truck; Pedestrian; Signal violation; Sudden stop; Lane change; Inter-vehicle distance |
| Digital | Perception error; Decision error; Control error; Cyberattack; DDT fallback |
Table 9. Additional candidate indicators for high-risk prediction: practical considerations and data availability.
| High-Risk Factors | Definition | Purpose of Use |
|---|---|---|
| Weaving ratio | Proportion of weaving traffic to total traffic volume within a weaving segment. Weaving traffic: the traffic flow that must cross other streams within a weaving segment to reach its intended direction. Weaving segment: a road segment (≤750 m) where vehicles cross paths in the same direction and change lanes without traffic control facilities, typically with merging and diverging areas in sequence. | Conflict risk analysis within road segments |
| EDI (Erratic Driving Index) | Calculated as the sum of the area exceeding critical thresholds of aggressive driving indicators (e.g., speed, acceleration, jerk, and yaw) during the analysis period, divided by travel time | Assessment of individual vehicle driving stability |
| Proportion of risky driving behavior | Proportion of time during which risky driving behaviors (speeding, sudden deceleration, hard braking, and sharp turning) are observed in the analysis period | |
| Stopping rate within the segment | Number of stops per unit time within the segment (excluding stops due to traffic signals) | Used for risk assessment of crash occurrence and congestion evaluation |
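
The indicator definitions in Table 9 can be read as simple ratios. The following is a minimal sketch of one possible reading, not the authors' implementation: the function names, the threshold values, and the sample data are all hypothetical, and the EDI computation assumes uniformly sampled signals, a rectangle-rule approximation of the exceedance area, and absolute values for signed indicators. Because the indicators carry different units, a practical version would probably need some normalization that Table 9 does not specify.

```python
def weaving_ratio(weaving_volume, total_volume):
    """Share of weaving traffic in the total volume of a weaving segment."""
    if total_volume <= 0:
        raise ValueError("total_volume must be positive")
    return weaving_volume / total_volume


def exceedance_area(samples, threshold, dt):
    """Rectangle-rule area of |sample| above the threshold (assumed reading)."""
    return sum(max(abs(x) - threshold, 0.0) for x in samples) * dt


def erratic_driving_index(signals, thresholds, dt, travel_time):
    """Sum of exceedance areas over all indicators, divided by travel time."""
    if travel_time <= 0:
        raise ValueError("travel_time must be positive")
    total = sum(
        exceedance_area(signals[name], limit, dt)
        for name, limit in thresholds.items()
    )
    return total / travel_time


def risky_time_proportion(risky_flags):
    """Fraction of samples in which any risky behavior was observed."""
    if not risky_flags:
        raise ValueError("risky_flags must not be empty")
    return sum(1 for flag in risky_flags if flag) / len(risky_flags)


def stopping_rate(non_signal_stops, duration):
    """Stops per unit time, with signal-induced stops already excluded."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return non_signal_stops / duration


if __name__ == "__main__":
    # Hypothetical 4-second trace sampled at 1 Hz; thresholds are illustrative.
    signals = {"accel": [0.5, 3.5, -4.0, 1.0], "jerk": [0.2, 2.5, 0.1, 0.0]}
    thresholds = {"accel": 3.0, "jerk": 2.0}
    print(erratic_driving_index(signals, thresholds, dt=1.0, travel_time=4.0))
    print(weaving_ratio(weaving_volume=300, total_volume=1200))
    print(risky_time_proportion([False, True, True, False]))
    print(stopping_rate(non_signal_stops=3, duration=60.0))
```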
Table 10. Key indicators for application in high-risk prediction models.
| Layer | High-Risk Factors | High-Risk Indicators | Average Rank | Rank |
|---|---|---|---|---|
| 1 Road facilities | Intersection | Presence of intersection | 1.86 | 1 |
| | | Presence of a crosswalk | 2.93 | 2 |
| | | Intersection type (T-junction, four-way, etc.) | 3.50 | 3 |
| | Ramp on/off | Presence of on/off-ramp | 1.21 | 1 |
| | | Speed limit | 2.21 | 2 |
| | | Lane count | 2.64 | 3 |
| | Roundabout | Presence of roundabout | 1.64 | 1 |
| | | Lane count | 2.79 | 3 |
| | | Presence of crosswalk | 2.57 | 2 |
| | Underpass | Presence of underpass (road) | 1.50 | 1 |
| | | Underpass (road) length | 2.29 | 2 |
| | | Lane count | 2.57 | 3 |
| | Tunnel | Presence of tunnel | 1.43 | 1 |
| | | Tunnel length | 2.43 | 2 |
| | | Lane count | 2.50 | 3 |
| 2 Variable and temporary facilities | Road obstacle | Presence of road obstacles | 1.36 | 1 |
| | | Location of road obstacles (lane, shoulder, etc.) | 2.36 | 3 |
| | | Type of road obstacles (pothole, debris, roadkill, etc.) | 2.29 | 2 |
| | Construction zone | Presence of construction/work zone | 1.14 | 1 |
| | | Location of construction/work zone (lane, shoulder, etc.) | 2.00 | 2 |
| | Variable lane | Presence of reversible lane | 1.07 | 1 |
| 3 Traffic flow characteristics | Traffic conflict | Weaving ratio | 1.43 | 1 |
| | | PET (post encroachment time) | 2.93 | 3 |
| | | Standard deviation of link travel speed | 2.71 | 2 |
| | High-accident zone | Presence of accident-prone zone | 2.36 | 2 |
| | | Proportion of risky driving behavior | 2.00 | 1 |
| | | EDI (erratic driving index) | 2.86 | 3 |
| | Stop-and-go traffic | Standard deviation of link travel speed | 2.79 | 3 |
| | | PET (post encroachment time) | 2.50 | 1 |
| | | Stopping rate within the segment | 2.64 | 2 |
| | Dilemma zone | Presence of dilemma zone | 1.00 | 1 |
| | Bottleneck point | Speed difference between adjacent links | 1.71 | 1 |
| | | Standard deviation of link travel speed | 1.71 | 1 |
| | | Heavy vehicle ratio | 2.79 | 3 |
| 4 Environmental variable | Snowfall | Snowfall amount | 1.00 | 1 |
| | | Snowfall duration | 2.14 | 2 |
| | Rainfall | Rainfall amount | 1.00 | 1 |
| | | Rainfall duration | 2.00 | 2 |
| | Wet/Moist surface | Rainfall amount | 2.14 | 2 |
| | | Snowfall amount | 1.50 | 1 |
| | | Temperature | 2.93 | 3 |
| | Night time | Night time period presence | 1.00 | 1 |
| 5 Moving object | Pedestrian | Presence of pedestrians near autonomous vehicle | 1.14 | 1 |
| | Truck | Presence of freight vehicles near autonomous vehicle | 1.07 | 1 |
| | Signal violation | Presence of signal violation | 1.00 | 1 |
| | Sudden stop | Acceleration/Deceleration | 1.36 | 1 |
| | | Speed | 2.43 | 3 |
| | | Jerk | 2.21 | 2 |
| | Lane change | Acceleration/Deceleration | 1.64 | 1 |
| | | Angular velocity per second | 2.43 | 2 |
| | | Jerk | 3.07 | 3 |
| | Inter-vehicle distance | Inter-vehicle distance | 1.14 | 1 |
| 6 Digital | Perception error | Presence of perception error | 1.36 | 1 |
| | | Frequency of perception errors | 1.71 | 2 |
| | | Sensor field of view | 2.93 | 3 |
| | Decision error | Presence of decision-making error | 1.36 | 1 |
| | | Frequency of decision-making errors | 1.64 | 2 |
| | | Sensor field of view | 3.00 | 3 |
| | Control error | Presence of control error | 1.36 | 1 |
| | | Frequency of control errors | 1.64 | 2 |
| | | Sensor field of view | 3.00 | 3 |
| | DDT Fallback | Presence of DDT fallback | 1.43 | 1 |
| | | Frequency of DDT fallback | 1.57 | 2 |
| | Cyberattack | Presence of cyberattack | 1.21 | 1 |
| | | Frequency of cyberattacks | 1.79 | 2 |
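
The Rank column in Table 10 appears to follow from ordering each factor's indicators by ascending average rank, with tied indicators sharing the lowest position (the two Bottleneck point indicators at 1.71 are both ranked 1, and the next is ranked 3). A minimal sketch of that tie-handling rule follows; the function name `competition_rank` is hypothetical, and the rule itself is inferred from the table rather than stated by the authors.

```python
def competition_rank(average_ranks):
    """Rank indicators by ascending average rank; ties share the lowest rank."""
    ordered = sorted(average_ranks.values())
    # list.index returns the first position of a value, so ties get the same rank.
    return {name: ordered.index(value) + 1 for name, value in average_ranks.items()}


if __name__ == "__main__":
    # Average ranks for the "Bottleneck point" factor, taken from Table 10.
    bottleneck = {
        "Speed difference between adjacent links": 1.71,
        "Standard deviation of link travel speed": 1.71,
        "Heavy vehicle ratio": 2.79,
    }
    for name, rank in competition_rank(bottleneck).items():
        print(rank, name)
```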
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Han, H.; Lee, S.; Jeong, J.; Lee, J. Structured Risk Identification for Sustainable Safety in Mixed Autonomous Traffic: A Layered Data-Driven Approach. Sustainability 2025, 17, 7284. https://doi.org/10.3390/su17167284
