Analysis of Factors Influencing the Severity of Vehicle-to-Vehicle Accidents Considering the Built Environment: An Interpretable Machine Learning Model

Abstract: Understanding the causes of road traffic accidents is crucial; however, because data collection is conducted by traffic police, accident-related environmental information is not available. To fill this gap, we collect information on the built environment within a 500 m radius of each accident site; model the factors influencing accident severity in Shenyang, China, from 2018 to 2020 using the Random Forest algorithm; and use the SHapley Additive exPlanation method to interpret the underlying driving forces. We integrate five indicators of the built environment with 18 characteristics covering human and vehicle at-fault characteristics, infrastructure, time, climate, and land use attributes. Our results show that, among the top 10 factors, road type, urban/rural, season, and speed limit have a significant positive effect on accident severity, while the density of commercial POIs has a significant negative effect. Pairs of factors, namely urban/rural and road type, commercial-POI density and vehicle type, and road type and season, have significant effects on accident severity through an interactive mechanism. These findings provide important information for improving road safety.


Introduction
In recent years, the analysis of road traffic crashes has emerged as a critical area of research in sustainable urban development, primarily due to the severe consequences of such crashes, which result in significant human and property losses. The World Health Organization reports that road traffic accidents account for approximately 1.35 million deaths yearly [1]. Consequently, identifying effective strategies to mitigate fatalities from road traffic accidents has become a shared objective among scholars and policymakers.
Efforts to reduce deaths in road traffic accidents can be broadly categorized into two areas of focus: decreasing the frequency of accidents and minimizing the severity of accidents [2]. Studies investigating the frequency of road traffic accidents emphasize the typical characteristics of accident-prone locations [3], while research examining the severity of road traffic accidents tends to explore the mechanisms through which various key factors influence accident severity [4,5].
In constructing the set of influencing factors, previous research has predominantly relied on traffic police-reported data and has focused on driver characteristics [6]. However, the built environment also considerably impacts road traffic accidents [7]. Researchers discovered early on that the objective environment has an unavoidable influence on accidents. Early urban planning attached importance to incremental development, and problems existed in land use, job-housing balance, and traffic density, which affected residents' travel patterns. Moreover, conflicts between residents' traffic demand and supply were significant, which to some extent affected traffic safety. Numerous studies have found that the probability and severity of urban traffic accidents are closely related to the built environment, which directly affects regional traffic flow, speed, and traffic conflicts. Therefore, adding the influence of the built environment to accident analysis can yield a more principled analysis of accidents than using police survey data alone.
Mathematical statistics or machine learning methods are commonly employed to identify key influencing factors. Mathematical statistical models are more interpretable, whereas machine learning methods can explore more complex relationships between the independent and dependent variables [8]. In a study by Ahmed et al. [9], six machine learning algorithms, namely Random Forest (RF), Decision Jungle (DJ), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), and Categorical Boosting (CatBoost), were used to analyze road accidents in New Zealand from 2016 to 2020, and RF gave the best prediction results. Therefore, in this study, we use the Random Forest (RF) algorithm to explore key influencing factors, including the environmental features surrounding the accident locations. Furthermore, we apply the SHapley Additive exPlanation (SHAP) method [10] to compensate for the limited inherent interpretability of machine learning methods, facilitating a deeper exploration of how crucial determinants affect crash severity.
The objective of this paper is to integrate macroscopic built environment data with police-reported data. We aim to explore the role of various factors in accidents from the perspective of the at-fault party, especially examining whether certain built environment factors lead to fatal outcomes. Additionally, we investigate the impact of two-factor interactions on fatal and non-fatal accidents to provide targeted theoretical support and policy recommendations. The remainder of this paper is organized as follows: Section 2 summarizes past research, Section 3 presents an overview of the data, Section 4 introduces the proposed methodological framework, Section 5 presents the results, and Section 6 discusses the implications those results have for policy making regarding pedestrian safety.

Literature Review
In exploring determinants affecting the severity of road traffic accidents, the academic literature primarily emphasizes six key dimensions: human factors, vehicle factors, road characteristics, crash characteristics, time, and environmental conditions [11]. Among them, human and vehicle factors include aspects such as driver's age [12], gender [13], driving experience [14], and vehicle type [15]; road characteristics involve elements such as the physical isolation of the road [16] and road type [17]; crash characteristics include factors such as accident reason [18], cross-sectional location [19], and whether the crash occurred at an intersection [20]; time and environmental conditions include variables such as seasonality [21], weather [22], and climatic factors [23]. The relationship between traffic accidents and the built environment has been studied since the last century. Researchers have been trying to discover the contributing factors behind accidents through micro- and macro-level studies. Ding et al. [24] found a significant nonlinear relationship between some built environment factors and pedestrian crash frequency. Ewing et al. [25] reached two conclusions that defy conventional wisdom: dense urban traffic environments are safer than suburban areas, and unforgiving design treatments may improve road safety. The probability and severity of urban traffic accidents are closely related to the built environment, which affects regional traffic flows, speeds, and traffic conflicts. Although human causes are the primary influence on accident occurrence [26], various studies have shown that there are still unidentified influences. Scholars have found that accident rates are reduced around schools, while commercial areas positively impact accidents [27]. The intensity of land use impacts accidents, which points to a possible influence of the built environment.
The factors that may affect the accident are derived from the built environment. In addition, Lee et al. [28] also found the influence of the built environment on different age groups of pedestrians, and Merlin et al. [29] focused on the relationship between the characteristics of the built environment and crash rates in smaller geographical units. Accident analysis that considers the built environment objectively explores existing buildings from a microscopic perspective and considers the problem more comprehensively.
Over the years, mathematical and statistical methods have remained a standard part of analyzing factors influencing traffic accidents. Several researchers have employed these methods to gain insights into different aspects of traffic accidents. Nam et al. (2000) conducted a statistical analysis of many factors impacting incident duration [30]. Huang et al. (2008) implemented a binomial logistic model to assess the severity of driver injuries at intersections in traffic accidents [31]. In a study on significant factors affecting crash injury severity at public highway-railroad-grade crossings, Haleem and Gan (2015) proposed using a mixed logit model [32]. More recently, Wang et al. (2021) developed an evaluation system based on crash characteristics and identified key influencing factors using a multinomial logit model [33]. Mathematical statistical methods typically provide detailed results for judging factor significance; however, since they are generally employed when the model has been verified as feasible, there are limitations on the factors that can be included in the statistical calculations.
In recent years, machine learning models have emerged as a popular approach for identifying crucial factors that influence traffic accidents. Various studies have employed different machine learning algorithms to predict and analyze traffic accident data. For instance, Xu and Luo (2021) developed a prediction and early warning model using the Random Forest algorithm, which demonstrated good predictive performance for unsafe acts [34]. Another study applied methods such as the Random Forest algorithm and Bayesian logistic regression to identify crash precursors for each type of highway area [35]. Das A et al. applied conditional inference forests to identify risk factors for collision severity for clusters of different arterial corridor lengths [36]. Similarly, Wen et al. (2021) applied the SHapley Additive exPlanation (SHAP) approach to interpret the outputs of machine learning methods when analyzing risk factors associated with road segment crashes [37]; that research found that the RF method is efficient in situations where potential variables have complicated relationships, such as those impacting crash severity. Chang et al. (2022) investigated the nonlinear relationship between pedestrian fatalities and related factors using the XGBoost model, with the results being interpreted through the SHAP method [38]. A further study used the same approach to conduct an exploratory analysis of the factors in freight-truck-related crashes [39]. Wang et al. (2022) sought to identify the causes of varying levels of delay in metropolitan and non-metropolitan areas using improved Random Forest and LightGBM algorithms [40]. In a study by Shakil Ahmed et al. [9], six machine learning algorithms, Random Forest (RF), Decision Jungle (DJ), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), and Categorical Boosting (CatBoost), were used to analyze and predict road accident data in New Zealand for the period 2016 to 2020. The comparison results show that RF gives the best predictions.
Machine learning methods have certain limitations, most notably reduced explanatory power: they cannot explain cause-effect relationships and can only describe correlations between variables. Nevertheless, their flexible data requirements make them suitable for considering a wide range of influencing factors. In addition, applying the SHAP method can help circumvent these limitations and provide valuable insights into the underlying factors influencing traffic accidents. Overall, this study employs a machine learning method in conjunction with SHAP to analyze factors that impact the severity of road traffic accidents.

Data Resource and Processing
This research is based on traffic accident data from 2018 to 2020 in Shenyang, Liaoning Province, China. Because the accident characteristics data reported by the traffic police give only the specific address of each accident, we converted the address text into latitude and longitude in the WGS84 coordinate system.
The following preliminary screening of samples is performed considering the sample size required by the Random Forest model and the availability of data sources: (1) data involving pedestrians and single-vehicle collisions were excluded, and data on crashes between two motor vehicles, between non-motor vehicles and motor vehicles, and between two non-motor vehicles were selected; (2) data on drivers bearing equal, primary, or total responsibility in the liability determination were extracted; (3) data on exceptional cases, such as tire blowouts and a driver's sudden illness, were removed; (4) during the coordinate matching process, data with latitude and longitude accuracies below 30% were excluded.
A total of 1378 entries were obtained after data processing.
Building on the findings of previous researchers, we decided to incorporate built environment influences into the police-reported accident data already available. The police-reported data already contain some influences that can be described as built environment, such as the location of the accident on the road cross-section and the type of physical isolation facilities at the accident site. However, we believe that the survey data describe the macroscopic characteristics of the environment inadequately; for example, land use properties are missing, which may leave the results without key explanatory variables. Therefore, five macro-level built environment influences that could be manually matched and obtained from maps were added.
For the environmental factors, we obtain the point of interest (POI) data for Shenyang, China. In GIS, a POI can be a house, a store, a mailbox, a bus stop, etc. The number of POIs represents the value of the whole system to a certain extent. Each POI contains four pieces of information: name, category, coordinates, and classification. Comprehensive POI information is necessary to enrich a navigation map. On the basis of the POI data obtained, buffer zone analysis is conducted with ArcGIS 10.6. The density of each type of POI within the buffer zone is computed to explore the impact of the macroscopic built environment on fatal and non-fatal accidents. As shown in Figure 1, with reference to related studies [41], a circular buffer with a radius of 500 m, centered on the accident sample, is selected as the study area. The densities of the road network, residential facilities, living services, educational facilities, and commercial centers in the study area are calculated.
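The buffer-based density statistic can be illustrated with a minimal Python sketch. It is not the ArcGIS workflow used in the study: the haversine distance, the function names, and the coordinates are assumptions introduced purely for illustration, and only count-based POI density (not road network length) is covered.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def poi_density(accident, pois, radius_m=500.0):
    """Number of POIs per km^2 inside a circular buffer around the accident."""
    lat0, lon0 = accident
    count = sum(
        1 for lat, lon in pois
        if haversine_m(lat0, lon0, lat, lon) <= radius_m
    )
    area_km2 = math.pi * (radius_m / 1000.0) ** 2
    return count / area_km2

# Hypothetical coordinates, for illustration only.
accident = (41.8057, 123.4315)
commercial_pois = [
    (41.8060, 123.4320),  # a few tens of metres away: inside the buffer
    (41.8080, 123.4340),  # a few hundred metres away: inside the buffer
    (41.8200, 123.4500),  # more than 2 km away: outside the buffer
]
density = poi_density(accident, commercial_pois)
```

With a 500 m radius the buffer area is about 0.785 km², so two POIs inside the buffer give a density of roughly 2.5 pcs/km².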

Variable Selection
Crash severity is the dependent variable. Our study has two crash severity levels: fatal and non-fatal. The explanatory variables are selected based on previous research and can be categorized into five aspects: human and vehicle at-fault characteristics, infrastructure, time, climate, and land use attributes. Table 1 lists the specific definitions and quantity statistics of the 23 explanatory variables.

Methodology
This section describes a machine learning tree model called Random Forest, as well as the method of tuning its hyperparameters using grid search. Additionally, we will discuss the SHAP method, which is used to enhance the interpretability of machine learning models.

Random Forest
Because CART produces only binary splits, it keeps the decision tree compact and efficient to generate, and it is therefore used as the base learner in ensemble algorithms. The default splitting criterion for the CART classification tree is the Gini coefficient. A split is chosen so that each child node achieves the highest purity, i.e., ideally all observations falling in a child node belong to the same class, at which point the Gini coefficient is minimized, the purity is maximized, and the uncertainty is minimized. The metrics are calculated as follows:

$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} p_k^2$$

$$\mathrm{Gini}(D, A = a) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$

where $D$ is the training dataset, $p_k$ is the probability that a sample point belongs to class $k$ (of $K$ classes), $a$ is one of the possible values of feature $A$, and $D_1$ and $D_2$ are the two subsets into which $D$ is split according to $a$.
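As a small worked example of the Gini criterion, the sketch below computes the impurity of a label set (one minus the sum of squared class proportions) and the size-weighted impurity of a binary split. The toy records and the feature name `area` are hypothetical and are not taken from the study's dataset.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(rows, labels, feature, value):
    """Size-weighted Gini impurity of the binary split feature == value."""
    left = [y for x, y in zip(rows, labels) if x[feature] == value]
    right = [y for x, y in zip(rows, labels) if x[feature] != value]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical toy records, for illustration only.
rows = [{"area": "rural"}, {"area": "rural"},
        {"area": "urban"}, {"area": "urban"}]
labels = ["fatal", "fatal", "non-fatal", "fatal"]

parent_impurity = gini(labels)                           # 0.375
split_impurity = gini_split(rows, labels, "area", "rural")  # 0.25
```

The split lowers the impurity from 0.375 to 0.25, which is why CART would prefer it over leaving the node unsplit.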
A Random Forest is an ensemble learning method that builds multiple decision trees and combines their predictions to make more accurate and robust predictions. The construction of a Random Forest consists mainly of the following: (1) Construct sample subset. Randomly select N samples from the original training sample set C to generate a new training set, and repeat this process K times to generate K sample sets.
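Step (1) can be sketched in a few lines of Python. The sketch assumes that the N samples are drawn with replacement (the usual bootstrap), which the text does not state explicitly; the function name and the toy sample set are hypothetical.

```python
import random

def bootstrap_sets(samples, n, k, seed=0):
    """Draw k training sets of n samples each, sampling with replacement."""
    rng = random.Random(seed)
    return [[rng.choice(samples) for _ in range(n)] for _ in range(k)]

# Toy stand-in for the original training sample set C.
C = list(range(10))
sample_sets = bootstrap_sets(C, n=len(C), k=3)
```

Each of the K sets would then be used to grow one decision tree, so the trees differ because they see different resampled data.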

Hyperparameter Adjustment
The number of decision trees is one of the essential hyperparameters of the Random Forest model. Too few trees can lead to underfitting, while limiting the depth of each tree helps prevent overfitting; this study uses grid search to select these two hyperparameters. Figure 2 gives an overview of the parameter selection and model evaluation process with GridSearchCV. Grid search is a standard hyperparameter optimization method in machine learning. When training a model, it is often necessary to choose specific hyperparameters, such as the learning rate or regularization factor, to tune the model's performance. Grid search performs an exhaustive search over a predefined range of hyperparameter values to find the best combination. It is simple, quick for small search spaces, and effective. The selected maximum tree depth is 4, and the selected number of trees is 35.
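The exhaustive search itself can be sketched without any machine learning library. In the sketch below, `toy_score` is a hypothetical stand-in for the cross-validated accuracy that GridSearchCV would compute by fitting a model for each combination; it is constructed to peak at 35 trees and depth 4 only so that the example mirrors the values reported here, and the candidate grids are likewise assumptions.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and keep the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    """Hypothetical score surface, NOT a real cross-validation result."""
    return (1.0
            - 0.001 * abs(params["n_estimators"] - 35)
            - 0.01 * abs(params["max_depth"] - 4))

# Hypothetical candidate values for the two hyperparameters.
param_grid = {
    "n_estimators": [5, 15, 25, 35, 45],
    "max_depth": [2, 3, 4, 5, 6],
}
best_params, best_score = grid_search(param_grid, toy_score)
```

The grid above has 5 × 5 = 25 combinations; the cost of grid search grows multiplicatively with each added hyperparameter, which is why the search ranges are kept small.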

Shapley Value and SHAP
SHAP stands for SHapley Additive exPlanation. It is an additive explanatory model inspired by the Shapley value, which originated in cooperative game theory. The SHAP model can show which factors influence the final prediction, which improves the interpretability of the machine learning model and also allows feature importance to be calculated [42]. For ensemble tree models, the model output is a probability value when performing a classification task. Thus, SHAP attributes the output value to the Shapley value of each feature to measure the effect of the feature on the final output value. Notably, SHAP not only displays the degree of influence of each feature variable on the result for each sample, but also shows whether the feature contribution is positive or negative.
The Shapley value of a feature is the weighted average of its marginal contributions over all subsets of the remaining features. SHAP expresses the explanation as an additive model:

$$g\left(z^{(i)}\right) = \phi_0 + \sum_{j=1}^{M} \phi_j^{(i)} z_j^{(i)}$$

where $g$ is the explanatory model; $z$ is the indicator vector of the sample, whose $j$-th entry is 1 if the feature takes the value of the original sample instance of interest, or 0 if it is replaced by the value of a randomly selected sample; $M$ is the number of features; $\phi_0$ is the mean of the predicted values; $\phi_j$ is the Shapley value of feature $j$; and the superscript $(i)$ denotes the $i$-th sample.
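To make the weighted average of marginal contributions concrete, the sketch below computes exact Shapley values for a three-feature toy game using the classical weight |S|!(n − |S| − 1)!/n! for each subset S of the other features. The value function `toy_value` and its numbers are hypothetical; in SHAP the value of a subset would instead be the model's expected output given only those features, and practical implementations use faster tree-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating all subsets of the other features."""
    n = len(features)
    phi = {}
    for j in features:
        others = [f for f in features if f != j]
        total = 0.0
        for size in range(n):  # subset sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {j}) - value_fn(s))
        phi[j] = total
    return phi

def toy_value(subset):
    """Hypothetical model output when only the features in `subset` are known."""
    v = 0.2  # baseline output with no feature known
    if "road_type" in subset:
        v += 0.3
    if "speed_limit" in subset:
        v += 0.1
    if "road_type" in subset and "speed_limit" in subset:
        v += 0.2  # interaction term, shared equally between the two features
    return v

features = ["road_type", "speed_limit", "season"]
phi = shapley_values(toy_value, features)
phi_0 = toy_value(frozenset())
full_output = toy_value(frozenset(features))
```

The baseline plus the three Shapley values reproduces the full output, which is the additivity property that gives the method its name; the feature that never changes the output (`season` in this toy game) receives a value of zero.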

Analysis and Discussion of Results
SHAP explains the predictions by calculating the contribution of each variable to the prediction, and the results are displayed in Figure 3. The x-axis represents the mean absolute SHAP value over the whole sample, and the y-axis lists the explanatory variables. In this study, we use Python 3.9 to build and interpret the model through the 'Random Forest' package and 'SHAP' package. The data are divided into a training set and a test set in a ratio of 70% to 30%; then, 10% of the training set is taken as the validation set. The search ranges of n_estimators (the total number of decision trees in the model) and max_depth are set, and the two parameters are tuned via grid search and cross-validation. For the RF model of vehicle-to-vehicle accidents, the highest training-set accuracy of 0.7653 is obtained when max_depth = 4 and n_estimators = 35. max_features is set to auto; features can thus be down-selected, i.e., features whose importance is too low can be discarded. Choosing sqrt, which means that √n of the n features are considered for tree building each time, reduces the interference of multicollinearity between features in the sample.
As shown in Figure 3, road type and vehicle type significantly impact the severity of accidents, followed by urban/rural, season, speed limit, catering and commercial POI density, life services POI density, accident reason, driving experience, and road network density. To determine the direction of each feature's effect, local interpretation is performed. A positive SHAP value suggests that the influencing factor has a positive effect on the dependent variable; in contrast, a negative SHAP value suggests that the influencing factor has a negative effect on the dependent variable. Each row represents an independent variable that affects the severity of accidents, and each dot represents a sample. Feature values increase gradually from low (blue) to high (red); where high feature values coincide with large SHAP values, a higher value of the independent variable corresponds to a more severe accident.
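The partition sizes implied by the 70/30 split followed by a 10% validation hold-out can be checked with a short sketch applied to the 1378 screened records. Rounding down at each step, the random seed, and the absence of stratification by severity are assumptions; the study does not report these details.

```python
import random

def split_indices(n, test_frac=0.30, val_frac_of_train=0.10, seed=42):
    """Shuffle indices, hold out a test set, then a validation set from the rest."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)              # rounding down is an assumption
    test_idx, rest = idx[:n_test], idx[n_test:]
    n_val = int(len(rest) * val_frac_of_train)
    val_idx, train_idx = rest[:n_val], rest[n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_indices(1378)
```

Under these assumptions the split yields 413 test records, 96 validation records, and 869 records left for fitting.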
In summary, the severity of the accident is higher under the following conditions: the road type is a higher-level road; the driver at-fault is driving a motorcar or minivan; the speed limit is high; the cause of the accident is failure to follow signal instructions; the driver is an elderly person or teenager; the location of the accident is in a rural area; the season is winter; the density of the Commercial-POI and network is low; and the density of the Service-POI is high.
An analysis of the impact of each feature on the model output can be made based on this figure: (1) For road type, as its feature value increases, it has a positive effect on accident severity; for example, accidents on urban expressways are usually more severe. As its feature value decreases, it has a negative effect on accident severity; for example, accidents on trunk roads or lower-level roads are usually less severe. This result is consistent with previous studies [43]. Contrary to common perception, low-grade roads may have a more negative impact on fatal accidents because of the mandatory restrictions imposed by overly narrow roads or one-way streets. In addition, high-grade roads in China tend to have higher speed limits, so high speed limits are one of the reasons for the positive impact on fatal accidents. (2) Regarding vehicle type, a decrease in its feature value is associated with a negative impact on fatal accidents. Specifically, if the driver at fault is driving a non-motorized vehicle, the effect of vehicle type on fatal accidents is relatively small. However, when the feature value of vehicle type is at an intermediate level, it has a more significant positive effect on fatal accidents than when its feature value is high. In other words, if the driver at fault is driving a minivan or motorcar, the impact of vehicle type on fatal accidents is greater than if the driver at fault is driving a non-motorized vehicle. The dataset contains fewer drivers of minivans and small trucks, which is common in developed Chinese cities today, where passenger car travel is prevalent. However, the construction of road infrastructure still leaves much to be desired and can create traffic congestion in most areas, leading to higher accident rates and more severe accidents.
(3) For urban/rural areas, the two colors of its feature values correspond to urban (blue) and rural (red). The areas where the SHAP values are less than zero are mainly blue points, indicating a negative effect on the severity of accidents, i.e., the possibility of minor accidents is greater when accidents occur in urban districts. The areas where the SHAP values are greater than zero are mainly red points, indicating a positive effect on the severity of accidents, i.e., accidents in rural districts are usually more severe [43]. The positive effect of suburban areas on fatal accidents is much more significant than that of urban areas, which is similar to the principle mentioned above, precisely because of the developed road network in cities and towns. Meanwhile, suburban areas are primarily national or provincial roads, so drivers are less alert, thus having a positive effect on fatal accidents, which is corroborated by the result that lower road network density has a more positive impact. (4) For speed limit, as its feature value increases, it positively affects the severity of accidents, i.e., the higher the speed limit, the more severe the accident. Similar findings have been generated in previous studies [44]. As its feature value decreases, it has a negative effect on the severity of accidents, i.e., the lower the speed limit, the less severe the accident. (5) For the age of the driver, as its feature value increases or decreases, it has a positive effect on the severity of accidents, i.e., the higher or lower the age range of the driver, the more severe the accident. This is related to driver behavior psychology, as research shows that middle-aged drivers have safer driving styles, while young people may have more impulsive or less experienced driving skills. 
In traffic behavior psychology studies, drivers who are too young lack driving experience and are prone to rashness and recklessness, while drivers who are too old react slowly and are less able to handle emergencies [36], so these groups have higher accident severity than middle-aged drivers. (6) For season, the severity of accidents corresponds to multiple feature values, but the positive SHAP values mainly consist of red dots, indicating that accidents occurring in winter tend to be more severe. This is due to the long winters in Shenyang, China, where most days between November and March have low temperatures and snowy weather, resulting in worse road conditions and more severe accidents [17]. This result also indirectly demonstrates that weather may have an impact on the severity of accidents. Shenyang is a city in northeastern China, and the most important feature of this region is its long winter season of up to six months; for half of the year, road conditions are complicated by low temperatures or snowfall, which demands a high level of driving skill and vehicle performance. Therefore, the long winter season has a positive impact on fatal accidents. (7) For Service-POI density, as its feature value increases, it has a positive effect on the severity of accidents, indicating that when the density of Service-POIs is high, the severity of accidents tends to be higher. This result is consistent with previous studies [38]. As its feature value decreases, it has a negative effect on the severity of accidents, indicating that when the Service-POI density is low, the severity of accidents tends to be lower.
(8) For the cause of the accident, as its feature value increases, it has a positive effect on the severity of accidents, indicating that when the cause of an accident is failure to follow signal instructions, the severity tends to be higher [40]. As its feature value decreases, it has a negative effect on the severity of accidents, indicating that when the cause of an accident is improper operation by the driver, the severity tends to be lower. In Shenyang, disobeying signals can be very costly for motor vehicle drivers, whereas enforcement for non-motorized vehicles and pedestrians is weaker, a problem that has become more prominent with the development of the take-out industry. Disobeying signals is more likely to produce fatal accidents, which is in line with expectations. After all, severe penalties for drivers who violate signal rules have been incorporated into Chinese traffic laws. (9) For network density, as its feature value decreases, it positively affects the severity of accidents, indicating that when the road network density is low, the severity of accidents tends to be higher. As its feature value increases, it has a negative effect on the severity of accidents, indicating that when the road network density is high, the severity of accidents tends to be lower. By contrast, Zafri et al. [45] found that higher road network density has a positive effect on fatalities in pedestrian accidents. (10) For Commercial-POI density, as its feature value decreases, it positively affects the severity of accidents, indicating that when the density of Commercial-POIs is low, the severity of accidents tends to be higher [46]. As its feature value increases, it has a negative effect on the severity of accidents, indicating that when the density of Commercial-POIs is high, the severity of accidents tends to be lower.
In the preceding analysis, we observed the trends of the top ten single-factor effects on accidents, and the results showed that low- and medium-density Commercial-POIs were associated with more serious accidents, while high-density Service-POIs had a positive effect on accident severity. Because the single-factor results could not clarify the specific mechanism behind the effect of POI density, we turn to two-factor interactions to explore the possible reasons for its contribution to fatal accidents [37]. A two-factor interaction analysis is performed on the top ten most important influencing factors, and significant characteristics arising between different factors are found. Figure 5 shows that the importance of the characteristic variables changes when road type interacts with season. High-level roads are more likely to produce non-fatal crashes in winter, while they are more likely to have fatal crashes in spring and summer. The potential impact of crashes is considered in subsequent studies. Main roads and other types of roads, which may be paved with dirt or concrete, are more likely to be icy in winter due to cold and wet conditions, and poorer road conditions can create other hazardous situations or make the environment high-risk. As mentioned, traffic environments in dense urban areas are safer than low-traffic environments in suburban areas because far fewer miles are traveled per capita, and the lower travel speeds make fatal accidents less likely. However, a more nuanced conclusion emerges from the interaction results. Figure 6 shows that a suburban area has a higher impact on fatal accidents, but high-level roads in rural areas have a stronger promoting effect on non-fatal accidents.
Therefore, we suggest further regulating penalties and increasing the density of electronic enforcement in rural areas; strengthening road infrastructure in rural areas, for example by improving crosswalks or underpasses; and enhancing education to strictly prohibit pedestrians from crossing the road. Non-motorized vehicles and motorcycles are more likely to cause fatal accidents in areas with high Commercial-POI densities, greater than 500 pcs/km², while minivans, large passenger trucks, and other vehicles are more likely to do so within the low-density Commercial-POI range of 0-50 pcs/km². We speculate that this is mainly because areas with restaurants are closely associated with take-out platforms, which require employees to complete orders within a limited time frame. This results in frequent illegal driving and a high incidence of accidents involving non-motorized vehicles and motorcycles, which has increased the number of fatal accidents in which the driver is at fault. In contrast, the travel rate of motor vehicles and other medium and large vehicles increases in areas with lower Commercial-POI density. The density of Service-POIs is not as high as that of Commercial-POIs due to the spatial distribution pattern of cities in developing countries and the different needs of residents. The likelihood of fatal accidents increases under the combined influence of negligence by drivers of non-motorized vehicles, motorcycles, and other vehicle types and low-density Service-POIs. Under low-density Service-POI conditions, the impact on non-fatal crashes is higher for cars and minivans, with ambiguous results for large trucks. The interactions of Commercial-POI density and Service-POI density with vehicle type indicate that the supervision of non-motorized vehicles and motorcycles should be strengthened to reduce the accident rate.
Wang et al. [47] pointed out the factors influencing the occurrence of e-bike accidents in China, and Outay et al. [48] pointed out that motorcycle accidents are more serious during peak hours, which may be related to severe braking or acceleration events. Zhu et al. [49] found that people with the same socio-economic background were more likely to be involved in accidents between bicycles and other vehicles.
As population density increases, activity trajectories become broader and peak-hour travel increases, so the built environment needs to be taken as the premise for analyzing accident impact factors. The studies above reached similar conclusions, which confirms that whatever results or entry points are explored, they are inextricably linked with the environment. This paper explored points of interest and road network density, which is a cutting-edge research trend. This is very important for road traffic safety research, as it can help optimize regional configuration in future urban planning and reduce the possibility of serious accidents at the root.
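The two-factor interactions discussed above can be illustrated with a minimal sketch of the Shapley interaction index, which underlies SHAP interaction values. The sketch enumerates coalitions exactly, which is feasible only for a handful of features. The scoring function `toy_risk`, its coefficients, the three binary factors, and the all-zero baseline are hypothetical stand-ins chosen for illustration; they are not the fitted Random Forest or the Shenyang data.

```python
from itertools import combinations
from math import factorial

FEATURES = ("high_grade_road", "winter", "rural")  # hypothetical binary factors


def toy_risk(x):
    """Hypothetical fatality score, NOT the fitted model from the paper."""
    road, winter, rural = x
    # The negative cross term mimics a road-type x season interaction.
    return 0.10 + 0.20 * road + 0.05 * winter + 0.10 * rural - 0.15 * road * winter


def coalition_value(model, x, baseline, subset):
    """Model output when only the features in `subset` take the instance's values."""
    mixed = tuple(x[k] if k in subset else baseline[k] for k in range(len(x)))
    return model(mixed)


def shapley_interaction(model, x, baseline, i, j):
    """Exact Shapley interaction index between features i and j (i != j)."""
    m = len(x)
    others = [k for k in range(m) if k not in (i, j)]
    total = 0.0
    for size in range(len(others) + 1):
        # Weight |S|! (M - |S| - 2)! / (2 (M - 1)!) for coalitions of this size.
        weight = factorial(size) * factorial(m - size - 2) / (2 * factorial(m - 1))
        for subset in combinations(others, size):
            s = set(subset)
            # Effect of adding i and j together minus the effect of adding each alone.
            delta = (
                coalition_value(model, x, baseline, s | {i, j})
                - coalition_value(model, x, baseline, s | {i})
                - coalition_value(model, x, baseline, s | {j})
                + coalition_value(model, x, baseline, s)
            )
            total += weight * delta
    return total


instance = (1, 1, 1)  # high-grade road, winter, rural
baseline = (0, 0, 0)  # reference accident with all three factors absent
for i, j in combinations(range(len(FEATURES)), 2):
    phi = shapley_interaction(toy_risk, instance, baseline, i, j)
    print(f"{FEATURES[i]} x {FEATURES[j]}: {phi:+.3f}")
```

In this toy setting only the road-season pair receives a non-zero value, equal to half of the -0.15 cross term, because the index splits each pairwise effect evenly between the orderings (i, j) and (j, i). Pairs whose effects are purely additive receive zero.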

Selected Sources of Research Methodology
The methods in this paper were chosen with reference to previous studies, comparing candidates by the type of data source, their effectiveness, and whether they are cutting-edge. Referring to the articles of Ding [24], Ewing [25], Ahmad [26], Wen [37], and others, we found that the machine learning approach has a great advantage over traditional regression models in that it can be trained on high-dimensional data without prior feature selection. Even if a large portion of the features is missing, the Random Forest algorithm can still maintain its accuracy, and its prediction results are not affected by multicollinearity. Random Forest can solve both classification and regression problems and performs well in both; however, it is usually used for classification, and the article [40] refers to classification studies in the field. The final dependent variable is accident severity, categorized as fatal or non-fatal. The article also considered the influence of the built environment, based on the spatial distribution characteristics of the study. After weighing consistency with the overall content of the article and consulting related research, we found too little data of this type, and too poor a fit with the machine learning models used here, to support an analysis of spatial characteristics, which was therefore discarded. This will be taken up as a direction for future research, for example a study of accident frequency from a spatio-temporal perspective.
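To make the classification setting concrete, the following is a minimal sketch of a Random Forest style classifier for a binary fatal/non-fatal label, written with the standard library only. It assumes depth-one trees (stumps), so the random feature subset drawn per tree coincides with the subset drawn per split. The data are synthetic, and the factor names, function names, and parameter values are hypothetical; this is not the implementation or configuration used in the study.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the lowest weighted Gini impurity."""
    majority = Counter(labels).most_common(1)[0][0]
    best, best_score = (None, None, majority, majority), gini(labels)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best_score = score
                best = (f, t, Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    if f is None:  # no useful split was found: predict the majority class
        return left_label
    return left_label if row[f] <= t else right_label


def fit_forest(rows, labels, n_trees=25, n_sub_features=2, seed=0):
    """Train stumps on bootstrap samples, each seeing a random feature subset."""
    rng = random.Random(seed)
    n, m = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]       # bootstrap sample
        feats = rng.sample(range(m), n_sub_features)     # random feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]


# Synthetic records: (high_speed_limit, rural, winter); label 1 = fatal.
# By construction the label simply copies the first factor.
ROWS = [(s, r, w) for s in (0, 1) for r in (0, 1) for w in (0, 1)] * 2
LABELS = [s for s, _, _ in ROWS]

forest = fit_forest(ROWS, LABELS, n_trees=25, n_sub_features=2, seed=0)
accuracy = sum(predict_forest(forest, r) == y for r, y in zip(ROWS, LABELS)) / len(ROWS)
print(f"training accuracy: {accuracy:.2f}")
```

The bootstrap sampling and per-tree feature subsets are what distinguish this ensemble from a single decision tree; a practical model would grow deeper trees and evaluate on held-out data rather than on the training records.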

Conclusions
This study analyzes mixed motorized and non-motorized traffic accident data from Shenyang, Liaoning Province, China, from 2018 to 2020, focusing on the perspective of the responsible party. The primary objective is to identify the factors that have a greater impact on fatal/non-fatal accidents and the direction (positive or negative) of their effects, and to explore whether, given one factor known to strongly affect fatal accidents, a second factor interacts with it to produce a different mechanism of action on fatal accidents. The key research findings are outlined as follows:
(1) The Shenyang data underwent descriptive statistical analysis and variable classification processing. To assess the importance and influence of 23 accident-related factors, the Random Forest and SHAP methods were employed. The results demonstrate that the most significant feature impacting accident severity is road type, followed by vehicle type, urban/rural, season, speed limit, Commercial-POI, Service-POI, cause of accident, age of driver, and network. Fatal accidents are more likely to occur on high-grade roads, particularly when involving small passenger and cargo vehicles in rural areas and during winter. Additional contributing factors included high speed limits, failure to adhere to signal instructions, varying driver experience levels, and the moderately low density of Commercial-POIs and road networks.
(5) Focusing on the top 10 selected variables, this study examined the mechanism of influence on fatal accidents in the context of two-factor interactions. The findings indicate that the interactions between road type and season, between vehicle type and Commercial-POI, and between road type and urban/rural displayed noteworthy characteristics concerning fatal accidents. Consequently, this investigation contributes theoretical support to traffic management.
The innovation of this paper is to draw on vehicle-to-vehicle accident data from the severe cold region of mainland China, combining the traffic police report data used as the single source in previous studies with self-collected data on the built environment. The results reveal that, in addition to human factors, environmental influences also play a key role in fatal accidents. Where factors were significant in the single-factor results, the authors went on to explore two-factor effects, which yielded interesting results. Although SHAP compensates for the usual problem of poor model interpretability, its two-factor interaction mechanism can only be explored by manually screening the interaction factors, so two-factor and especially multi-factor interaction mechanisms remain to be explored more fully. In the future, we will include more built-environment factors. Furthermore, we will consider exploring multi-factor interaction mechanisms using association rule algorithms such as Apriori.
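As an indication of how association rule mining could surface multi-factor combinations without manual screening, the following is a minimal sketch of the Apriori procedure followed by rule extraction. The six accident records, the item names, and the support and confidence thresholds are synthetic assumptions made for illustration; they are not drawn from the Shenyang data, and the function names are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count candidates and keep those that are frequent at this level.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                confidence = support / frequent[a]  # subsets of frequent sets are frequent
                if confidence >= min_confidence:
                    found.append((a, itemset - a, support, confidence))
    return found


# Synthetic accident records, one set of categorical attributes per accident.
records = [
    {"rural", "high_grade_road", "winter", "fatal"},
    {"rural", "high_grade_road", "fatal"},
    {"rural", "high_grade_road", "summer", "fatal"},
    {"urban", "branch_road", "winter", "non_fatal"},
    {"urban", "branch_road", "summer", "non_fatal"},
    {"urban", "high_grade_road", "winter", "non_fatal"},
]

frequent = apriori(records, min_support=0.5)
for antecedent, consequent, support, confidence in association_rules(frequent, 0.9):
    if consequent == frozenset({"fatal"}):
        print(sorted(antecedent), "->", sorted(consequent),
              f"support={support:.2f} confidence={confidence:.2f}")
```

Unlike the pairwise SHAP analysis, the candidate combinations here are generated by the algorithm itself, at the cost of requiring all factors to be discretized into categorical items beforehand.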