1. Introduction
The merging area of urban expressways is a critical point where traffic flows converge. It exhibits a significantly higher accident risk than basic segments, and its safety performance substantially affects the stable operation of the entire road network [1,2,3]. With the rapid advancement of Connected and Automated Vehicles (CAVs), mixed traffic flows comprising CAVs and human-driven vehicles represent an inevitable transitional stage in the evolution of Intelligent Transportation Systems. This mixed environment not only alters traditional traffic dynamics but also introduces new analytical challenges for safety assessment in merging areas. On the one hand, real-world accident data are extremely scarce during the initial deployment stage of CAVs, rendering post hoc evaluation methods that rely on historical statistics inadequate. On the other hand, the interaction between CAVs and human-driven vehicles generates complex, nonlinear spatiotemporal risk patterns that are difficult to capture with traditional macroscopic safety metrics. Consequently, there is an urgent need to develop a new safety assessment framework that integrates proactive prediction with spatiotemporally refined analysis. By establishing predictive mathematical models (e.g., for conflict frequency) and applying associated optimization methods (e.g., for parameter calibration and control strategy design), this framework can systematically address the safety challenges of merging areas under mixed traffic flows. It thereby provides theoretical support for proactive safety management and transportation system optimization.
Research on the impact of mixed traffic flows comprising CAVs and human-driven vehicles (HVs) has delineated a clear framework, spanning from macroscopic performance to microscopic mechanisms. Macroscopic simulation studies indicate that the introduction of CAVs can effectively enhance road capacity, improve traffic flow stability, and reduce emissions [4,5]. At the microscopic safety level, CAVs demonstrate significant safety advantages in routine driving as well as interaction scenarios such as merging and lane-changing, particularly in adverse conditions where they effectively reduce conflicts and delays [6,7]. Their safety benefits are generally positively correlated with penetration rates, leading to systemic improvements once a certain threshold (e.g., 25–30%) is reached [8,9]. From a theoretical perspective, research employing methods such as fundamental diagram models and evolutionary game theory has provided in-depth insights into the heterogeneous interaction processes between CAVs and HVs in behaviors like car-following and lane-changing, as well as their impact on the stability of mixed traffic flows [10,11,12]. Recently, the analysis of traffic conflict hotspots has begun to leverage CAV data to delineate risk contours in mixed traffic streams, evaluating the safety implications of connectivity technologies [13]. These findings lay an important foundation for understanding the potential benefits of CAV-integrated mixed traffic. However, existing studies have predominantly focused on CAV performance per se or the macroscopic efficiency of mixed traffic, with few systematically addressing the new risk assessment challenges arising from heterogeneous vehicle interactions from a methodological perspective of safety evaluation. During the transitional phase of mixed traffic, where accident data remain scarce, there is a pressing need to develop proactive safety assessment methods based on real-time interaction behaviors, with the Traffic Conflict Technique (TCT) being a prime example.
The TCT, as a non-accident proactive safety analysis method, assesses risk by quantifying interaction behaviors among road users, effectively compensating for the insufficiency of historical accident data. The development of TCT has evolved through stages of methodological construction, technological integration, and scenario expansion. During the methodological construction phase, research focused on standardizing and refining conflict indicators (e.g., Pseudo Time-to-Collision, PTTC [14]) and establishing a conflict severity grading system [15], laying the foundation for reliable application of TCT. With technological advancements, the integration of TCT with methods such as microscopic simulation and extreme value theory has enabled efficient generation of large-scale conflict data and long-term accident risk inference based on small samples, respectively [16,17]. Concurrently, the sub-field of traffic conflict hotspot analysis, which aims to identify and analyze locations with concentrated conflicts, has seen significant methodological evolution. The advent of high-resolution data from unmanned aerial vehicles (UAVs) has enabled the precise identification of microscopic-level hotspots, while spatial statistical methods like Kernel Density Estimation (KDE) have become standard for visualizing risk agglomeration [18,19,20]. Recent studies have further expanded the temporal dimension, investigating the spatiotemporal evolution of conflicts across different time periods and complex scenarios [21]. This progression has shifted TCT from “phenomenon description” toward “mechanism modeling” and “risk prediction.” In terms of application, the scope of TCT has expanded from traditional intersections to complex environments such as highway merging areas [22,23], heterogeneous traffic flows [24], and intelligent tunnels [25]. Conflict hotspot analysis has similarly diversified, extending to high-risk scenarios like highway merging zones and inclement weather operations [26]. These advancements collectively drive the evolution of TCT from a post hoc analytical tool to a proactive safety management instrument. In particular, the deep integration of TCT with microscopic simulation provides a critical tool for evaluating the safety conditions of emerging traffic environments, such as mixed CAV traffic, under controllable settings.
Merging areas, as bottlenecks where traffic flows inevitably converge, have long been a focal point in safety evaluation research. Given the scarcity of high-quality historical accident data, methods such as TCT are widely used for safety assessment in these locations. This has spurred multidimensional methodological explorations: studies on control evaluation have quantified the efficacy of measures like ramp metering in reducing conflicts [27]; efforts in comprehensive assessment have developed multi-indicator safety evaluation systems [28,29]; and theoretical applications have introduced new perspectives, such as traffic wave and driver workload theories. Additionally, research has revealed the influence mechanisms of road geometry and traffic flow parameters on conflict formation [30] and extended application scenarios to specific complex environments such as underground interchanges and construction zones [23]. However, most existing studies are based on traditional human-driven traffic environments, and the developed evaluation systems predominantly rely on macroscopic or aggregated conflict indicators (e.g., total conflict frequency within a region or static density maps). Such approaches often standardize conflict thresholds and focus on identifying where risks aggregate, but they struggle to mechanistically explain how heterogeneous interactions dynamically shape the fine-grained, time-varying spatial distribution of conflicts. This makes it difficult to capture the specific spatial distribution and dynamic evolution patterns of risks within merging areas. In a mixed CAV traffic environment, the spatiotemporal characteristics of vehicle interactions may undergo qualitative changes. This oversight of risk “spatial blind spots” and their generative mechanisms hampers the development of precise proactive control strategies with existing methods.
In summary, while existing research has achieved fruitful results in mixed traffic flow characteristics, traffic conflict techniques, and safety evaluation of merging areas, several challenges remain when applying these to safety assessment of merging areas in mixed CAV traffic environments. First, there is a lack of predictive models: studies have largely focused on macroscopic performance simulations, with insufficient integration of microscopic conflict interaction mechanisms to quantitatively capture the dynamic relationship between CAV penetration rates and safety levels. Second, indicator adaptability is inadequate: the introduction of CAVs alters interaction logics, and the effectiveness of traditional conflict indicators in mixed traffic environments requires systematic validation and calibration. Third, the spatial granularity of existing methods is coarse. Conventional approaches, including hotspot analysis, typically produce static, aggregated risk maps. This fails to capture the microscopic spatial heterogeneity and dynamic evolution of risks, thereby precluding the precise, lane-level or sub-lane-level control necessary for targeted interventions.
To address these challenges, this study proposes a novel, integrated safety assessment framework that uniquely bridges conflict prediction with spatial risk distribution mapping. Unlike conventional methods that treat safety metrics in isolation, this framework systematically links predictive modeling, indicator validation, and spatial diagnostics to deliver a comprehensive, multi-resolution view of risk. Specifically, it achieves integration at three levels: (1) constructing a predictive model that quantifies the safety benefits of mixed CAV traffic as a function of penetration rates; (2) validating and calibrating conflict indicators to ensure their relevance and sensitivity in mixed traffic environments; and (3) establishing a high-resolution spatial risk model to uncover the micro-level distribution and spatiotemporal evolution of risk hotspots—a key gap in current merging-area assessments. Ultimately, this framework advances safety evaluation from coarse, macroscopic statistics toward fine-grained, predictable, and spatially localizable risk intelligence. It thereby provides a methodologically coherent and practically usable tool for proactive safety management and system optimization in mixed CAV traffic environments.
2. Data and Methods
2.1. Study Area and Data Source
The data foundation of this study is the publicly available, high-precision Aerial Dataset for China’s Congested Highways & Expressways (AD4CHE). This dataset was collected via drone aerial photography with a sampling frequency of 30 Hz. It contains multi-dimensional motion states and interaction data, including vehicle trajectories, speed, acceleration, and inter-vehicle relative positions, providing high-precision data support for microscopic traffic behavior modeling and conflict analysis [31,32].
The study area is a two-lane, direct-type acceleration lane merging area on an urban expressway (as shown in Figure 1). According to the geometric information provided by the dataset, the physical acceleration lane in this area is approximately 140 m in length, with a standard lane width of 3.5 m. The mainline design speed is 60 km/h. To comprehensively capture vehicle interaction behaviors throughout the entire merging process from initiation to completion, the analysis scope (or “study section”) was extended beyond the acceleration lane. It spans from the ramp nose, through the entire acceleration lane (~140 m), and continues along the corresponding mainline lanes to capture the post-merging stabilization. This defines a contiguous study section with a total length of approximately 350 m.
From the AD4CHE dataset, this study extracted data from three video clips (referred to as “merging area scenarios”) numbered DJI_0001 to DJI_0003. The total recording duration of these three scenarios is approximately 900 s (totaling about 27,000 frames), covering a spatial span of 350 m. Through preprocessing steps such as coordinate correction, map-matching, and filtering on the raw trajectory data, precise sequences of position, instantaneous speed, and acceleration for all vehicles within the study scope were extracted.
Statistical analysis of the extracted trajectory data indicates that the average speed of vehicles in the study area is approximately 50 km/h (standard deviation: 10 km/h), and the average acceleration is approximately −0.2 m/s² (standard deviation: 0.8 m/s²). Based on the processed high-precision trajectory data, potential traffic conflict events were further identified using indicators such as Time to Collision and Cumulative Time to Collision, establishing a reliable data foundation for subsequent modeling analysis and risk quantification.
2.2. Comprehensive Safety Risk Assessment Method for Merging Areas
The Traffic Conflict Technique (TCT) quantifies observable situations in which two or more road users approach each other in space and time to such an extent that there is a risk of collision if their movements remain unchanged. It effectively addresses the scarcity and lag of historical accident data and has become a mainstream method for proactive traffic safety assessment [33]. In merging areas, where vehicle trajectories necessarily converge and interactions are frequent, conflicts primarily manifest as two typical patterns: rear-end conflicts (vehicle movement direction angle θ < 15°) and lateral conflicts (15° ≤ θ ≤ 85°). To quantify the safety risk in this area comprehensively, it is necessary to construct a quantitative framework that integrates conflict identification, severity classification, and comprehensive assessment.
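The angle-based typing rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes θ is measured as the angle between the two vehicles' velocity vectors, and the function names and the "unclassified" label for angles above 85° are hypothetical.

```python
import math

def heading_angle_deg(v1, v2):
    """Angle in degrees between two velocity vectors (vx, vy).

    Assumption: theta in the text is the angle between the movement
    directions of the two interacting vehicles.
    """
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("a stationary vehicle has no movement direction")
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_t))

def classify_conflict_type(theta_deg):
    """Apply the thresholds from the text: <15 rear-end, 15..85 lateral."""
    if theta_deg < 0.0:
        raise ValueError("theta must be non-negative")
    if theta_deg < 15.0:
        return "rear-end"
    if theta_deg <= 85.0:
        return "lateral"
    return "unclassified"  # hypothetical label; the text does not cover this range
```

For example, two vehicles travelling in the same direction give θ = 0° and are typed as a rear-end interaction, while a merging vehicle crossing at 30° is typed as lateral.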
Among the various indicators in TCT, such as time headway, post-encroachment time, deceleration rate to avoid a crash, and time to collision [34,35,36], Time to Collision (TTC) is selected as the core identification indicator in this study because it continuously and dynamically characterizes the urgency of conflict evolution. Based on the identified conflict events, the cumulative frequency curve method [25,37] is employed, using the 85th and 20th percentiles of TTC values as thresholds to classify conflicts into two severity levels, “general conflicts” and “severe conflicts”, thereby distinguishing different levels of risk severity.
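A minimal Python sketch of TTC computation and percentile-based severity grading follows. It rests on assumptions that the text does not spell out: TTC is computed for a simple car-following pair as gap divided by closing speed, and the two thresholds are applied so that TTC at or below the 20th percentile is "severe", TTC between the 20th and 85th percentiles is "general", and anything above is not counted. All function names are hypothetical.

```python
import math

def time_to_collision(gap_m, v_follow_mps, v_lead_mps):
    """TTC for a follower closing on a leader; infinite if not closing."""
    closing_speed = v_follow_mps - v_lead_mps
    if closing_speed <= 0.0:
        return math.inf
    return gap_m / closing_speed

def percentile(values, p):
    """Linear-interpolation percentile (p in 0..100) of a non-empty sample."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

def classify_severity(ttc, t20, t85):
    """Assumed rule: <= t20 severe, <= t85 general, otherwise no conflict."""
    if ttc <= t20:
        return "severe"
    if ttc <= t85:
        return "general"
    return "none"

# Illustrative TTC sample in seconds (not data from the study).
ttc_sample = [float(i) for i in range(1, 12)]
t20 = percentile(ttc_sample, 20)
t85 = percentile(ttc_sample, 85)
```

The thresholds are derived from the cumulative distribution of the observed TTC sample, so they adapt to the site rather than relying on a fixed value.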
However, the count of conflicts of a single type or severity level is insufficient to fully reflect the overall safety condition of a merging area. To achieve a comprehensive quantification of safety levels and enable cross-scenario comparability, this study further integrates conflict type, severity, and their potential societal consequences to construct a Comprehensive Conflict Risk Index (CCRI). This index aims to aggregate multidimensional risk information, and its calculation formula is as follows:
where $R_g$ and $R_s$ represent the number of general and severe rear-end conflicts, respectively; $S_g$ and $S_s$ represent the number of general and severe lateral conflicts, respectively.
To clarify the derivation of the CCRI parameters, the rationale behind its two sets of weights is explained below so that each has a clear physical meaning and statistical basis. The two sets are determined by different methods.
(1) Conflict Severity Weights (0.14, 0.30, 0.18, 0.38): These are calculated from the “accident conversion risk”, characterized by the reciprocal of the average TTC. A smaller TTC value indicates a more urgent conflict and a higher accident risk. The severity weights for the four conflict categories (general rear-end, severe rear-end, general lateral, severe lateral) are obtained by normalizing their risk values so that the four weights sum to one (calculation basis provided in Table 1).
(2) Conflict Type Weights (Rear-end: 0.54, Lateral: 0.46): These are determined based on the direct economic losses from traffic accidents. According to national traffic accident statistics from 2016–2018, the average direct economic loss per accident for rear-end collisions and lateral collisions (including sideswipe) was calculated separately. Normalization of these average losses yields the respective weights for the two conflict types, reflecting the differing severity of their potential socioeconomic consequences (calculation basis provided in Table 2).
The two weight sets are intended to balance the physical characteristics of conflicts with their potential socioeconomic impacts. Because the severity weights are derived from TTC, and a smaller TTC implies a higher probability of imminent collision, severe conflicts receive higher weights than general conflicts of the same type.
The conflict type weights (Rear-end: 0.54, Lateral: 0.46) are calibrated on the average direct economic losses associated with actual traffic accidents (as shown in Table 2). While it is acknowledged that not all conflicts result in accidents, accident losses serve as a pragmatic surrogate for the relative severity of different conflict types in terms of potential property damage and human injury. This approach assumes that conflict types responsible for higher average economic losses in the event of an accident inherently possess greater destructive potential and societal consequence. Finally, each weight set (severity or type) is normalized to sum to unity, which mitigates scale bias.
Through the above weighting approach, this comprehensive index integrates the frequency of conflicts, the severity of real-time risk, and their potential socioeconomic impact. It achieves a transition from single-indicator measurement to a multi-dimensional, structured comprehensive assessment of safety risk in merging areas, providing a core quantitative basis for subsequent in-depth safety analysis.
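As an illustration of how the two weight sets could combine, the sketch below computes a CCRI value in Python. The functional form is an assumption, since the equation itself is not reproduced in this text: each conflict count is multiplied by its severity weight, the weighted counts are summed within each conflict type, and the two sums are combined with the type weights. The assignment of the four severity weights to categories follows the order in which they are listed; all identifiers are hypothetical.

```python
# Severity weights in the order listed in the text (assumed mapping).
SEVERITY_WEIGHTS = {
    "rear_general": 0.14,
    "rear_severe": 0.30,
    "lateral_general": 0.18,
    "lateral_severe": 0.38,
}
# Conflict type weights from the text.
TYPE_WEIGHTS = {"rear": 0.54, "lateral": 0.46}

def ccri(r_g, r_s, s_g, s_s):
    """Assumed form: type weight times the severity-weighted count per type."""
    rear = (SEVERITY_WEIGHTS["rear_general"] * r_g
            + SEVERITY_WEIGHTS["rear_severe"] * r_s)
    lateral = (SEVERITY_WEIGHTS["lateral_general"] * s_g
               + SEVERITY_WEIGHTS["lateral_severe"] * s_s)
    return TYPE_WEIGHTS["rear"] * rear + TYPE_WEIGHTS["lateral"] * lateral

# Illustrative counts, not results from the study.
example_value = ccri(r_g=10, r_s=5, s_g=8, s_s=2)
```

Under this assumed form, the index is linear in each conflict count, so doubling every count doubles the index; any other aggregation used by the authors would change the numerical values but not the role of the two weight sets.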
2.3. Traffic Simulation Validation Method
To systematically obtain traffic conflict data across a range of CAV penetration rates, a realistic simulation of a two-lane freeway merging area was developed using SUMO. This environment provided the data foundation necessary for subsequent conflict modeling and safety assessment. In the context where real-world mixed CAV traffic data is difficult to acquire, microscopic simulation provides a reliable means for generating controllable and reproducible experimental data.
2.3.1. Simulation Environment Setup and Parameter Calibration
(1) Basic Traffic Flow Settings. Based on the actual geometric parameters from the AD4CHE dataset, a road network model consistent with the measured scenario was constructed using SUMO’s Netedit tool. To simulate the dynamic traffic demand of the merging area, a total of 6 traffic flows were defined on the mainline and the ramp. The simulation duration was set to 4000 s to sufficiently eliminate random fluctuations. The vehicle composition, flow rate, routing information, and other details for each traffic flow were calibrated based on measured data (see Table 3). The “Generation probability” for each flow, a core stochastic input parameter, was calculated by converting the observed hourly traffic volume from the AD4CHE dataset into a per-second probability of vehicle generation. This ensures that the simulated traffic demand quantitatively replicates the real-world conditions. Key parameters, including the passenger car-to-truck ratio and the ramp flow ratio, collectively constitute the baseline traffic scenario for subsequent penetration rate control experiments.
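The volume-to-probability conversion described above can be sketched as follows. This is a hedged illustration: it assumes a 1 s insertion interval, so that the per-second probability is simply the hourly volume divided by 3600, and the function names and example volume are hypothetical rather than values from Table 3.

```python
SECONDS_PER_HOUR = 3600.0

def generation_probability(hourly_volume_veh):
    """Convert an hourly volume (veh/h) to a per-second insertion probability."""
    if hourly_volume_veh < 0:
        raise ValueError("volume must be non-negative")
    p = hourly_volume_veh / SECONDS_PER_HOUR
    if p > 1.0:
        # More than one vehicle per second cannot be expressed as a probability.
        raise ValueError("volume too high for a per-second probability")
    return p

def expected_vehicles(hourly_volume_veh, duration_s):
    """Expected number of generated vehicles over the simulation duration."""
    return generation_probability(hourly_volume_veh) * duration_s

# Illustrative flow of 900 veh/h over the 4000 s run length used in the text.
p_example = generation_probability(900)
n_example = expected_vehicles(900, 4000)
```

A value obtained this way would typically be passed to a probability-based flow definition in the simulator; because insertion is stochastic, the realized vehicle count in any single run fluctuates around the expected value.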
(2) Vehicle Types and Car-Following Behavior Parameters. To accurately simulate mixed traffic flow, the parameters for the car-following and lane-changing models of both human-driven vehicles and CAVs were calibrated separately. For human-driven vehicles, the Intelligent Driver Model (IDM) was used as the car-following model, and the LC2013 model was used as the lane-changing model. Through iterative optimization, the optimal values for key parameters such as acceleration, deceleration, and minimum safe spacing were determined (see Table 4).
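For readers unfamiliar with the IDM, the sketch below evaluates its standard acceleration equation in Python, showing where the calibrated acceleration, deceleration, and minimum-spacing parameters enter. The parameter values are illustrative placeholders, not the calibrated values of Table 4, and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class IDMParams:
    # Illustrative values only; the calibrated values are in the paper's Table 4.
    v0: float = 16.7    # desired speed (m/s), about 60 km/h
    T: float = 1.5      # desired time headway (s)
    a: float = 1.5      # maximum acceleration (m/s^2)
    b: float = 2.0      # comfortable deceleration (m/s^2)
    s0: float = 2.0     # minimum gap at standstill (m)
    delta: float = 4.0  # acceleration exponent

def idm_acceleration(v, gap, dv, p):
    """Standard IDM acceleration.

    v   : follower speed (m/s)
    gap : net distance to the leader (m), must be positive
    dv  : approach rate, follower speed minus leader speed (m/s)
    """
    # Desired dynamic gap: standstill gap plus headway and braking terms.
    s_star = p.s0 + max(0.0, v * p.T + v * dv / (2.0 * math.sqrt(p.a * p.b)))
    return p.a * (1.0 - (v / p.v0) ** p.delta - (s_star / gap) ** 2)

params = IDMParams()
free_road = idm_acceleration(0.0, 1e9, 0.0, params)    # close to the maximum a
closing_in = idm_acceleration(15.0, 10.0, 5.0, params)  # strong braking
```

The free-road term drives the vehicle toward its desired speed, while the interaction term grows rapidly as the actual gap falls below the desired gap, which is why the minimum spacing and deceleration parameters matter for conflict generation.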
For Connected and Automated Vehicles (CAVs), a Cooperative Adaptive Cruise Control (CACC) model and a lane-changing model with a cooperative decision-making mechanism were employed to reflect their vehicle-to-vehicle communication and coordinated control characteristics. The CACC model parameters for CAVs in this study were determined based on the CoEXist project and related research [40]. Using Helmond, the Netherlands, as the test site, this project collected real-world operational data from automated vehicles, focusing on analyzing their car-following behavioral characteristics and optimizing simulation model parameters based on empirical analysis results. The specific parameter settings are shown in Table 5 [41,42].
The parameter calibration strategy in this study serves two distinct but complementary purposes to ensure the validity of the mixed traffic simulation. First, calibrating the human-driven vehicle models (IDM and LC2013) against the real-world AD4CHE dataset aims to faithfully reproduce the baseline traffic flow characteristics and driver heterogeneity. Second, and critically, to avoid the pitfall of extrapolating human-driven vehicle parameters to high CAV penetration rates, the CAV behaviors are instead governed by parameters derived from the empirical CoEXist project [35]. This ensures that simulated CAV interactions (e.g., smaller time gaps in cooperative driving) are based on real-world automated vehicle operations, not theoretical assumptions. Therefore, while microscopic simulation faces inherent challenges in predicting absolute safety outcomes, this parameter setup provides a reasonable and methodologically sound approximation for evaluating the relative changes in safety and efficiency across different CAV penetration rates.
2.3.2. Model Validation Method and Results
To evaluate the reliability of the simulation model, the average travel time, the number of rear-end conflicts, and the number of lateral conflicts were selected as the core validation metrics. Given the inherent randomness of microscopic traffic conflict events, establishing a reasonable and acceptable error margin for model validation is crucial. Synthesizing related practices, this study defines a relative error (E) not exceeding 15% as the acceptable validation criterion. The relative error is calculated as follows:
$$E = \frac{\left| x_2 - x_1 \right|}{x_1} \times 100\%$$

where $x_1$ represents the actual (observed) value, and $x_2$ represents the simulated value.
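The validation check can be sketched in Python as follows, assuming the usual definition of relative error with the observed value as the reference. The function names and the example counts are hypothetical and are not taken from Table 6.

```python
def relative_error(observed, simulated):
    """Relative error in percent, with the observed value as the reference."""
    if observed == 0:
        raise ValueError("observed value must be non-zero")
    return abs(simulated - observed) / abs(observed) * 100.0

def passes_validation(observed, simulated, tolerance_pct=15.0):
    """True if the relative error does not exceed the acceptance criterion."""
    return relative_error(observed, simulated) <= tolerance_pct

# Illustrative values: 200 observed conflicts versus 226 simulated.
example_error = relative_error(200, 226)
example_ok = passes_validation(200, 226)
```

Applying the same check to each validation metric (travel time per lane, rear-end conflicts, lateral conflicts) gives a simple pass/fail summary against the 15% criterion.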
The validation results (see Table 6) indicate that the relative errors for all metrics are controlled within 15%, meeting the preset standard. Specifically: (1) The error for average travel time ranges between 5.3% and 9.4%, showing stable consistency across different lanes. This demonstrates the model’s high reliability in replicating macroscopic operational efficiency. (2) The errors for the number of rear-end and lateral conflicts are 12.6% and 14.1%, respectively. While these are the largest among all metrics, they remain within the acceptable range, indicating the model’s effectiveness in capturing the key characteristics of microscopic interactions and safety risks. In summary, the constructed simulation model can reliably replicate the traffic operation and conflict characteristics of the real-world merging area, providing a solid data foundation for subsequent research.
3. Traffic Conflict Prediction Model for Merging Areas
To construct a traffic conflict prediction model for merging areas under mixed traffic flow conditions, this study first utilizes a validated microscopic simulation model and employs orthogonal experimental design to systematically obtain multi-scenario conflict data. It then analyzes the influence mechanisms of key factors, and based on this analysis, establishes a negative binomial regression prediction model. This model serves as the core foundation for achieving quantitative safety assessment and prediction in merging areas.
3.1. Simulation and Data Acquisition Based on Orthogonal Experimental Design
To systematically obtain traffic conflict data under mixed traffic conditions and to efficiently analyze the influence of multiple factors, this study employs an orthogonal experimental design. A full factorial design for five factors at five levels would require $5^5 = 3125$ simulation runs, which is computationally prohibitive for the high-fidelity microscopic simulation used. The orthogonal design significantly reduces the number of required runs while ensuring a balanced and representative exploration of the factor space, allowing for the reliable estimation of main effects. The $L_{25}$ orthogonal array was selected, generating 25 distinct scenario combinations (see Table 7). This approach provides a robust foundation for the subsequent development of a statistical prediction model focused on the primary influences of each factor. It is acknowledged that this design is most efficient for estimating main effects; the exploration of higher-order interaction effects, while possible to a limited extent, was not the primary focus of this screening stage.
(1) Influencing Factors and Level Design: Five key influencing factors were selected: traffic volume (Q), CAV penetration rate (AR), merging ratio (CR), acceleration lane length (L), and truck proportion (TP). The rationale for this selection lies in their established impact on merging-area safety: Q governs traffic density and interaction frequency; AR defines the mixed-traffic interaction logic; CR influences competitive maneuvering intensity; L determines the physical space for merging; and TP introduces vehicle heterogeneity affecting traffic stability.
To systematically examine their effects within a computationally feasible framework, an $L_{25}$ orthogonal array was employed. Accordingly, each factor was assigned five levels (see Table 7). This configuration ensures a balanced and efficient exploration of the factor space. The specific level values were determined based on the measured ranges from the AD4CHE dataset and relevant highway design specifications, ensuring the scenarios are both representative of real-world conditions and capable of covering a spectrum of traffic states from free flow to congestion. This design provides the necessary granularity to model the nuanced influence of each factor on safety outcomes.
(2) Experimental Scheme and Simulation Execution: An $L_{25}(5^6)$ orthogonal array was selected to design the experiments. This standard array is suited to screening experiments involving up to six factors at five levels each. For the present study with five key factors, it provides a balanced and orthogonal design with only 25 required runs. This is the minimal configuration that allows each level of every factor to be tested an equal number of times and combined uniformly with the levels of all other factors, thereby ensuring an efficient and statistically reliable estimation of the main effects for each factor. Five columns of this array were used to define 25 balanced and representative simulation scenario combinations. All scenarios were run sequentially on the SUMO platform, with each simulation lasting 4000 s, including a 400 s warm-up period to eliminate the influence of initial states. During the simulations, the TraCI interface was used to collect and record, in real time, the number of general and severe rear-end and sideswipe conflicts in each scenario, thereby providing a systematic and efficient data foundation for constructing the subsequent conflict prediction model.
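To make the balance property concrete, the following Python sketch builds a 25-run, five-level orthogonal array using a standard modular-arithmetic construction and verifies that every pair of columns contains each level combination exactly once. This is one common way to generate such an array; the row order and column assignment need not match Table 7, levels are coded 0 to 4 rather than mapped to physical values, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

LEVELS = 5

def l25_array(n_factors=5):
    """Build 25 runs for up to six five-level factors (levels coded 0..4).

    Columns are i, j, and (i + k*j) mod 5 for k = 1..4, which are
    pairwise orthogonal because 5 is prime.
    """
    if not 1 <= n_factors <= 6:
        raise ValueError("this construction supports 1 to 6 factors")
    rows = []
    for i, j in product(range(LEVELS), repeat=2):
        full_row = [i, j] + [(i + k * j) % LEVELS for k in range(1, LEVELS)]
        rows.append(full_row[:n_factors])
    return rows

def is_orthogonal(rows):
    """Check that each pair of columns holds every level pair exactly once."""
    n_cols = len(rows[0])
    for c1, c2 in combinations(range(n_cols), 2):
        counts = Counter((row[c1], row[c2]) for row in rows)
        if len(counts) != LEVELS ** 2 or set(counts.values()) != {1}:
            return False
    return True

design = l25_array(5)
```

Each coded level would then be replaced by the corresponding physical value of Q, AR, CR, L, or TP before the scenario is passed to the simulator.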
3.2. Analysis of Key Influencing Factors of Traffic Conflicts
Drawing on simulation data from orthogonal experiments, this study analyzes the effects of five key factors on the frequency of four traffic conflict types in the merging area. These factors are: traffic volume (Q), CAV penetration rate (AR), merging ratio (CR), acceleration lane length (L), and truck proportion (TP). The results are shown in Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6.
(1) Influence of Traffic Volume
As shown in Figure 2, as traffic volume increases from 4500 veh/h to 6500 veh/h, all types of conflicts exhibit a monotonically increasing trend. Among them, the severe sideswipe conflict shows the highest growth rate, indicating that it is the most sensitive to changes in traffic volume. Within the 4500–5000 veh/h range, the traffic flow transitions from free flow to stable flow, and the number of conflicts begins to increase significantly. When the traffic volume exceeds 5500 veh/h and enters a congested flow state, the reduced speed differential between vehicles leads to a decrease in interaction intensity, causing the growth rate of conflict frequency to gradually level off. This pattern verifies that the safety risk in the merging area increases nonlinearly as capacity is approached, and the rate of risk exacerbation slows down under saturated flow conditions.
Figure 2.
Traffic volume—traffic conflicts.
(2) Influence of CAV Penetration Rate
As shown in Figure 3, the introduction of CAVs leads to a significant improvement in safety. As the penetration rate increases from 10% to 50%, all types of conflicts show a continuous reduction, with the severe sideswipe conflict exhibiting the most pronounced decline rate. When the penetration rate reaches approximately 30%, a clear inflection point appears on the conflict curve. At a penetration rate of 50%, the number of each type of conflict is reduced by approximately 50% to 75% compared to the 10% baseline scenario. This indicates that the cooperative control capability of CAVs plays a decisive role in enhancing the safety level of the merging area.
Figure 3.
CAV penetration rate—traffic conflicts.
(3) Influence of Merging Ratio
As shown in Figure 4, an increase in the merging ratio from 0.15 to 0.35 leads to a rise in all conflict types. Among them, severe sideswipe conflict exhibits the most dramatic response, characterized by the steepest growth slope. This is directly attributable to the fact that a higher merging ratio results in more vehicles entering from the ramp, which significantly intensifies lateral interactions and competitive maneuvers between vehicles, thereby leading to a sharp increase in sideswipe conflict risk.
Figure 4.
Merging ratio—traffic conflicts.
(4) Influence of Lane Length
As shown in Figure 5, increasing the length of the acceleration lane effectively enhances safety in the merging area. As the lane length increases from 160 m to 280 m, all types of traffic conflicts decrease significantly. Among them, the reduction in sideswipe conflicts, particularly severe ones, is the most pronounced. This benefit is attributed to the longer acceleration lane providing sufficient space for vehicles to execute smoother merging maneuvers, thereby effectively mitigating the intensity of lateral interactions. Concurrently, rear-end conflicts are also reduced owing to the increased space for speed adjustment and the resultant more stable car-following behavior.
Figure 5.
Lane length—traffic conflicts.
(5) Influence of Truck Proportion
Within the relatively low variation range set in this study (1% to 5%), Figure 6 shows that the frequencies of the four types of traffic conflicts do not exhibit a statistically significant monotonic pattern as the truck proportion increases, with their fluctuation range being relatively limited. This finding indicates that, in this low-proportion scenario, the truck proportion is not a key dominant factor influencing conflicts in the merging area. Its disruptive effect may be diluted by the driving behavior of the predominant passenger car population.
The comprehensive analysis reveals that traffic volume, CAV penetration rate, merging ratio, and acceleration lane length are the key factors influencing conflicts in the merging area, providing a direct basis for the selection of independent variables in the subsequent prediction model.
Figure 6.
Truck proportion—traffic conflicts.
3.3. Development of a Traffic Conflict Prediction Model Based on Negative Binomial Regression
Based on the multi-scenario traffic conflict data acquired from the orthogonal experiments, this section develops a negative binomial regression model. The model predicts the occurrence counts of four conflict types (general/severe rear-end, general/severe sideswipe) in the merging area, thereby providing a core modeling tool for subsequent quantitative safety assessment. This statistical prediction model is not intended to replace the aforementioned simulation model. Microscopic traffic simulation captures the details of vehicle interactions, but it is computationally expensive. Moreover, the relationship between its outputs (e.g., conflict counts) and input parameters (e.g., Q, AR) is implicit, which makes direct analysis and rapid scenario testing difficult. Fitting a negative binomial regression model to the simulation data yields a lightweight, interpretable, and computationally efficient surrogate model. This approach offers two core advantages: (1) it directly quantifies the marginal effect of each influencing factor on conflict frequency, clarifying the direction and magnitude of its impact and thereby providing a clear mathematical basis for mechanistic analysis; (2) it enables near-instantaneous prediction of conflict counts for any traffic scenario within the calibrated parameter range, which significantly improves the efficiency of large-scale scenario screening and safety assessment and provides the essential input for the subsequent development of the composite risk index and spatial distribution model.
3.3.1. Basis for Model Selection
Traffic conflict data are count variables and are typically modeled with Poisson-family regression models. In this study, the variance of each of the four conflict counts is significantly greater than its mean, indicating pronounced over-dispersion, which violates the equidispersion assumption (variance equal to the mean) underlying Poisson regression. Consequently, the negative binomial regression model is selected as the modeling foundation, as it accommodates over-dispersion by introducing a dispersion parameter (α).
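The over-dispersion check described above can be illustrated with a short sketch. It compares the sample variance with the sample mean; a ratio well above 1 points away from Poisson and toward a negative binomial model, whose variance is commonly written as μ + αμ² (this NB2 form is an assumption, since the parameterization is not stated here). The function name and the counts are hypothetical and are not the study's data.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return sample variance divided by sample mean for a list of counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return variance(counts) / m


# Hypothetical conflict counts, for illustration only (not the study's data).
example_counts = [12, 30, 7, 55, 21, 90, 15, 42]
ratio = dispersion_ratio(example_counts)

# A ratio close to 1 is consistent with Poisson equidispersion;
# a ratio well above 1 suggests over-dispersion.
print(f"variance/mean = {ratio:.2f}")
```

A formal test of over-dispersion would normally accompany such a ratio; the sketch only shows the quantity being compared.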
3.3.2. Variable Screening and Model Construction
The initial modeling effort focuses on quantifying the individual (main) effects of the key influencing factors to establish a foundational and interpretable predictive relationship. While interaction and quadratic effects can provide deeper mechanistic insights, their reliable estimation would require a more extensive experimental design specifically tailored for that purpose. The L25 orthogonal array employed in this study is optimized for the efficient estimation of main effects, which aligns with the primary objective of this screening and modeling stage. The investigation of potential higher-order effects is suggested as a valuable direction for future research.
First, a multicollinearity diagnosis was performed for the five key influencing factors: traffic volume (Q), CAV penetration rate (AR), merging ratio (CR), lane length (L), and truck proportion (TP). As shown in
Table 8, the variance inflation factor (VIF) values for all variables are well below the common threshold of 5 (the maximum is 1.087), indicating negligible multicollinearity among the independent variables. Therefore, all of them can be included as candidate variables in the preliminary model. Subsequently, using these factors as independent variables and the respective counts of the four conflict types as dependent variables, stepwise regression (with significance levels for entry and removal of 0.05 and 0.10, respectively) was used for variable screening and construction of the negative binomial regression models.
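For readers unfamiliar with the diagnostic, the following is a minimal sketch of how a VIF can be computed, using the standard definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing factor j on the remaining factors. It is not the authors' implementation; the helper names and the small design matrices are hypothetical. The first example is a balanced two-factor design, for which the VIFs are 1, in line with the near-unity values expected from an orthogonal array.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(y, xs):
    """R^2 of an ordinary least squares fit of y on the columns xs plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + xs
    k = len(cols)
    xtx = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(beta[i] * cols[i][t] for i in range(k)) for t in range(n)]
    ybar = sum(y) / n
    ss_res = sum((y[t] - fitted[t]) ** 2 for t in range(n))
    ss_tot = sum((y[t] - ybar) ** 2 for t in range(n))
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """Return VIF_j = 1 / (1 - R_j^2) for each factor column."""
    result = []
    for j, col in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        result.append(1.0 / (1.0 - r_squared(col, others)))
    return result


# Hypothetical balanced design: two factors, three levels each, all combinations.
x1 = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0]
x2 = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
balanced_vifs = vif([x1, x2])

# Hypothetical strongly correlated factors, for contrast.
correlated_vifs = vif([[1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0, 4.0, 6.0]])
print(balanced_vifs, correlated_vifs)
```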
Based on the significant coefficients presented in
Table 8, the final prediction models established for the four types of conflicts are as follows:
- (1)
General Rear-end Conflict: N1 = exp(1.95 + 0.00054Q − 1.64AR + 1.04CR − 0.30L)
- (2)
Severe Rear-end Conflict: N2 = exp(1.69 + 0.00056Q − 1.60AR + 1.16CR − 1.08L)
- (3)
General Sideswipe Conflict: N3 = exp(2.64 + 0.00048Q − 1.52AR + 1.17CR − 0.53L)
- (4)
Severe Sideswipe Conflict: N4 = exp(2.26 + 0.00051Q − 1.46AR + 1.37CR − 1.20L)
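As an illustration of how the four fitted models act as a lightweight surrogate, the sketch below evaluates them for a single scenario. The coefficients are copied from the equations above; the dictionary keys, the function name, and the sample inputs are hypothetical. This section does not state the units in which Q, AR, CR, and L entered the fit, so the sample call assumes AR and CR as fractions and L in kilometres purely for illustration. Inputs must use the same units as the fitted models for the predictions to be meaningful.

```python
import math

# Coefficients from the four fitted models: (intercept, Q, AR, CR, L).
COEFFS = {
    "general_rear_end": (1.95, 0.00054, -1.64, 1.04, -0.30),
    "severe_rear_end": (1.69, 0.00056, -1.60, 1.16, -1.08),
    "general_sideswipe": (2.64, 0.00048, -1.52, 1.17, -0.53),
    "severe_sideswipe": (2.26, 0.00051, -1.46, 1.37, -1.20),
}


def predict_conflicts(kind, q, ar, cr, lane_length):
    """Predicted conflict count N = exp(b0 + bQ*Q + bAR*AR + bCR*CR + bL*L)."""
    b0, b_q, b_ar, b_cr, b_l = COEFFS[kind]
    linear = b0 + b_q * q + b_ar * ar + b_cr * cr + b_l * lane_length
    return math.exp(linear)


# Hypothetical scenario; the units here are assumptions, not taken from the paper.
scenario = {"q": 3000, "ar": 0.30, "cr": 0.25, "lane_length": 0.20}
predictions = {kind: predict_conflicts(kind, **scenario) for kind in COEFFS}
for kind, value in predictions.items():
    print(f"{kind}: {value:.1f}")
```

Predictions should only be requested for scenarios inside the parameter ranges covered by the orthogonal experiments, since the surrogate is not calibrated outside them.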
3.3.3. Model Validation and Result Analysis
To evaluate the prediction accuracy and generalization ability of the models, a hold-out method was employed for validation. From the 25 sets of orthogonal experimental data, 5 sets (20% of the total data) were randomly selected to form an independent test set, while the remaining 20 sets (80%) were used for model training. The validation results show that the relative error of all four conflict prediction models is within 15% (with a maximum error of 12.3%), indicating that the models possess good prediction accuracy and generalization ability.
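The hold-out procedure above can be sketched as follows. This is an assumed, minimal version: the function names, the random seed, and the observed and predicted values are hypothetical, and the relative error is taken as |predicted − observed| / observed per test scenario, which is one common definition and may differ from the exact metric used in the study.

```python
import random


def holdout_split(n_runs, n_test, seed=0):
    """Randomly split run indices into (train, test) lists without overlap."""
    indices = list(range(n_runs))
    rng = random.Random(seed)
    rng.shuffle(indices)
    test = sorted(indices[:n_test])
    train = sorted(indices[n_test:])
    return train, test


def relative_errors(observed, predicted):
    """Per-scenario relative error |predicted - observed| / observed."""
    return [abs(p - o) / o for o, p in zip(observed, predicted)]


# 25 orthogonal runs, 5 held out for testing, 20 used for fitting.
train_idx, test_idx = holdout_split(n_runs=25, n_test=5)

# Hypothetical observed and predicted conflict counts on the test runs.
observed = [40.0, 55.0, 23.0, 61.0, 34.0]
predicted = [43.0, 52.0, 25.0, 58.0, 36.0]
errors = relative_errors(observed, predicted)
print(f"max relative error = {max(errors):.1%}")
```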
The parameter estimation results for the final models are shown in
Table 9, which quantifies the influence of each factor on conflict risk: (1) Traffic volume has a highly significant positive effect on all conflict types, consistent with the common understanding that increased traffic density leads to more interaction opportunities. (2) CAV penetration rate shows a highly significant negative effect in all models, with a relatively large absolute coefficient. This indicates that the introduction of CAVs can substantially reduce conflict risk, making it a key factor for enhancing safety in merging areas. (3) An increase in the merging ratio significantly raises the number of conflicts, reflecting that greater merging demand intensifies disturbance to the mainline traffic flow. (4) Increasing the length of the acceleration lane significantly reduces the number of conflicts (a negative coefficient in all four models), that is, it has a positive impact on safety, providing a basis for optimizing the geometric design of merging areas.
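Because each model is log-linear, a coefficient can be read as a multiplicative effect on the predicted count. The worked example below uses the general rear-end coefficients reported in Section 3.3.2 and assumes that AR and CR are expressed as fractions, which this section does not state explicitly; the symbols β and Δ are introduced here only for illustration.

```latex
% Ratio of predicted counts when factor x_j changes by \Delta, other factors fixed
\frac{N_k(x_j + \Delta)}{N_k(x_j)} = \exp\left(\beta_{k,j}\,\Delta\right)

% General rear-end model (k = 1), AR increased by 0.1:
\exp(-1.64 \times 0.1) = \exp(-0.164) \approx 0.85

% General rear-end model (k = 1), CR increased by 0.1:
\exp(1.04 \times 0.1) = \exp(0.104) \approx 1.11
```

Under these assumptions, a 0.1 increase in CAV penetration corresponds to roughly 15% fewer predicted general rear-end conflicts, and a 0.1 increase in merging ratio to roughly 11% more, with all other factors held constant.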
These quantitative conclusions from the regression models and the trend analysis results from
Section 3.2 corroborate each other. Beyond statistical support, the models give the specific direction and magnitude of each factor's effect, moving the account of the conflict generation mechanism from qualitative understanding to quantitative description.
6. Conclusions
This study developed an integrated spatiotemporal safety assessment framework to address the safety evaluation challenges in freeway merging areas under mixed CAV and human-driven traffic. The main conclusions are as follows:
(1) A novel “prediction-location-evaluation” framework was proposed and validated, which bridges quantifiable conflict prediction with high-resolution, spatially explicit risk mapping. This approach overcomes the limitations of conventional methods regarding data dependency and coarse granularity.
(2) The framework reveals a fundamental shift in risk patterns with increasing CAV penetration: high-risk areas migrate systematically from the “static geometric bottleneck” at the ramp merge point to a “dynamic interaction interface” on the mainline. This spatial reconfiguration provides a mechanistic explanation for how CAVs enhance safety.
(3) A multi-dimensional safety evaluation system was established, integrating conflict frequency, severity, and spatial distribution. By applying grey variable-weight clustering, an objective safety grading system (A–D) was created. A case study demonstrated that increasing CAV penetration from 10% to 50% can improve the safety grade from D (Poor) to A (Excellent), offering a direct tool for quantifying CAV benefits.
(4) The study provides a complete methodological toolbox that includes models, indicators, and evaluation criteria, thereby supporting risk diagnosis, safety assessment, and informed decision-making for merging area management during the mixed-traffic transition.
Future work will focus on validating the framework with real-world connected vehicle data, extending it to more complex scenarios (e.g., urban weaving areas), and developing proactive intervention strategies based on the predicted risk patterns, thereby closing the loop from safety assessment to active prevention and control.