Data Mining-Based Collision Scenarios of Vehicles and Two Wheelers for the Safety Assessment of Intelligent Driving Functions

Abstract: Safety performance testing of intelligent driving vehicles relies on collision scenarios drawn from real road traffic environments. To study the collision scenarios and accident characteristics of vehicles and two wheelers (TWs) under the complex traffic conditions in China, this paper proposes first applying clustering analysis to traffic accident data to obtain base scenarios and then applying an association rule algorithm to each base scenario to uncover the latent connections among its accident attributes and describe the collision scenarios in more detail. The study is based on data from 335 vehicle and two-wheeler crashes in the National Automobile Accident In-Depth Investigation System (NAIS). Clustering analysis partitioned the crash data into eight clusters of vehicle and two-wheeler base scenarios, and association rules were applied to the remaining accident attributes, revealing common crash characteristics that describe the base scenarios in more detail. In the end, eleven types of detailed vehicle and two-wheeler collision scenarios were constructed, covering straight roads, intersections, and T-junctions. The results provide richer and more suitable crash scenarios of vehicles and two wheelers in China's complex traffic and are an important reference for the future development of intelligent driving testing scenarios.


Introduction
In recent years, road accident safety has been a major concern, with the World Health Organization's Global Status Report on Road Safety 2018 showing that around 1.35 million people die in road accidents each year. Of particular concern is the fact that about 23 percent of all road traffic deaths are related to two-wheeler (TW) accidents [1]. In China, the number of accidents involving TWs has increased year by year in the past five years. According to data from the National Bureau of Statistics [2], there were 86,881 accidents involving TWs in 2021, and the number of casualties was as high as 115,940. Consequently, the traffic safety of TW riders as vulnerable road users (VRUs) needs to be addressed urgently. Advanced Driver Assistance Systems (ADAS) can effectively reduce the risk of road traffic accidents [3]. With the continuous development and application of ADAS, constructing collision scenarios from real traffic accident data has become a vital link in research on the functional safety of intelligent driving vehicles [4]. However, intelligent driving still faces significant challenges, and its further development must be evaluated by testing in more complex driving environments [5]. There are significant differences between China's road traffic environment and that of other countries; the accident scenarios involving TWs are more complex and diverse. Therefore, it is necessary to conduct an in-depth study of TW collision scenarios in China in order to promote the application and development of the safety functions of intelligent driving systems in China and to improve safety in traffic accidents between vehicles and TWs.
Clustering algorithms, as a kind of data mining technique [6], are highly reproducible and can reduce the influence of researchers' subjective opinions on results when performing big data analysis, and they have been widely used to extract collision scenarios from traffic accident data. Li et al. [7] initially employed systematic clustering and chi-square tests to identify seven hazardous scenarios involving TWs and used PreScan to build relevant testing scenarios. Hu et al. [8] considered factors such as the degree of accident casualties, motion states, and vehicle speeds for different scenarios. They derived detailed motion state characteristics for 11 categories of car-to-TW collision scenarios. Zhou et al. [9] developed kinematic models based on the fundamental scenarios obtained from clustering car-to-TW accident data at intersections and derived five sets of hazardous scenarios. Xu et al. [10] utilized multivariate logistic regression to investigate the influencing factors of accident severity and identify feature elements for testing scenarios, and subsequently used these feature elements as clustering parameters to extract eight types of intersection testing scenarios. Building on the China In-Depth Accident Study (CIDAS) dataset, Sui et al. [11] applied the K-medoids clustering algorithm to 672 accident cases involving cars and TWs, resulting in 6 common collision scenarios; these were then compared to the 4 typical scenarios obtained by Cao et al. [12]. Wang et al. [13] employed 239 crash cases from the China In-Depth Mobility Safety Study-Traffic Accident (CIMSS-TA). They summarized six functional scenarios using K-medoids clustering based on seven collision characteristics; additionally, they established dynamic parameters for collision trajectory analysis during hazardous moments and generated testing scenarios suitable for autonomous driving. Pan et al. [14] conducted an in-depth analysis of traffic accident data with monitored videos. They classified TWs into three typical types and used clustering analysis and accident characteristics to identify collision scenarios involving various types of TWs and cars.
Currently, research mainly focuses on extracting testing scenarios from accident data but lacks exploration of the relationships among accident attributes and of their potential features. The obtained scenarios are relatively simple and do not consider complex traffic environmental factors, such as vehicle lanes, road speed limits, and road infrastructure. Traditional clustering algorithms are commonly used as research methods, but they face challenges in achieving clear data classification when dealing with a large number of variables [15], which may result in the loss of scenarios involving important variable associations [6]. Moreover, analyzing road accident data requires consideration of data heterogeneity; otherwise, certain relationships between the data may remain hidden [16]. Association rules, as a data mining technique, can extract the hidden relationships between various attributes in a large amount of data and are widely used in traffic accident data analysis [17]. Xu et al. [18] applied association rules to serious casualty traffic accidents (accidents with more than 10 fatalities) in China to reveal the accident factors that often occur together and their interdependence, and thereby to determine the characteristics of serious casualty accidents. Das et al. [19] identified factors and hidden features affecting fatal pedestrian crashes at intersections in the United States by applying association rule mining to detailed pedestrian accident data to help understand the collision scenarios of pedestrian accidents at intersections. However, directly computing association rules on the entire dataset leads to an enormous number of rules that are difficult to interpret. Kumar [20] demonstrated that conducting clustering analysis on the dataset before applying association rules can partially eliminate the heterogeneity in traffic accident data and mitigate the issue of having a high number of difficult-to-interpret rules. Nitsche et al. [21] investigated the key scenarios and collision characteristics of traffic accidents at UK junctions by means of K-medoids and association rules, obtained clusters of collisions under thirteen types of T-junctions and six types of intersections, and identified twelve pre-crash scenarios at junctions, taking into account clusters of high-injury outcomes of the accidents.
To the best of our knowledge, there are no studies that use association rules to mine accident characteristics of vehicle-to-TW collisions while extracting vehicle-to-TW collision scenarios through clustering analysis. Consequently, in this paper, the combination of clustering analysis and association rules is used for the first time to analyze vehicle-to-TW accident data. It aims to gain insights into the accident characteristics and key patterns of vehicle-to-TW collision scenarios in China and to provide scenario references for the assessment of intelligent driving safety performance in China based on accident data in the Songjiang district of Shanghai from NAIS. First, the base scenarios are obtained by initially dividing the traffic accident data through clustering analysis. Then, the association rules algorithm is applied to the base scenarios to generate the rest of the more detailed accident attributes, ultimately constructing vehicle-to-TW collision scenarios that are suitable for complex traffic conditions in China.

Sources of Accident Data
The accident data in this paper come from about 800 traffic accidents in Songjiang District, Shanghai, collected by NAIS during 2018-2021. There were more than 400 accidents involving TWs, accounting for about 54%. Considering the complexity of vehicle-to-TW traffic accidents, this paper screens accident cases by the following conditions:
1. A collision between a vehicle (car, SUV, and MPV) and a TW;
2. The type of road is straight, an intersection, or a T-junction;
3. The motions of the vehicle and TW were limited to traveling straight ahead, turning, and others (the driver was waiting to turn left, reversing, performing a U-turn, or overtaking);
4. Vehicle-to-TW rear-end accidents were ruled out.
Consequently, 335 real accident cases were selected to analyze the collision scenarios of vehicle-to-TW accidents.

Accident Variable Extraction and Coding
Traffic accidents are caused by "human-vehicle-road-environment" interactions [14]. The purpose of this study is to extract vehicle-to-TW collision scenarios in a real traffic environment. Therefore, considering the different variables in the four elements and the demand for establishing subsequent testing scenarios, the scenarios should be accurately and adequately described using as few variables as possible [21]. Combined with previous studies [7][8][9][10], eleven variables in the four main elements of "human-vehicle-road-environment" were selected to describe the vehicle-to-TW collision scenarios. The different variables corresponding to the four main elements are shown in Figure 1.
The performance of cluster analysis can be affected by high-dimensional data [21], so all variables were divided into two groups, one for clustering analysis and one for association rules. The names, attributes, and frequencies of the variables are shown in Tables 1 and 2. The clustering variables (in Table 1) mainly include the kinematic state of the participants prior to the collision and environmental variables [22] (weather, light, and so on), with a total of five variables and seventeen attributes. The variables used for the association rule mining (in Table 2) include more detailed accident variables related to the road infrastructure and so on (injury severity of the TW rider, speed limit on the road, and so on), with a total of six variables and thirty-three attributes. Because few accidents in the data of this study involved collision forces on the vehicle in the 3-9 o'clock directions, the 3-9 o'clock directions are grouped together in Table 2.
Injury severity of the TW rider is classified into four levels according to the Maximum Abbreviated Injury Scale [23] (MAIS): uninjured (MAIS 0), slight (MAIS 1-2), serious (MAIS 3-5), and fatal (MAIS 6). Vehicles traveling in lanes adjacent to a non-motorized carriageway or curb are in the inside lane; otherwise, they are in the outside lane, and the lane attribute is combined with the number of lanes in the direction that the vehicle is traveling. The motion of the TW relative to the vehicle is divided into left and right about the longitudinal center axis of the vehicle. The direction of the collision force on a vehicle is divided into 12 directions, called the 1-12 o'clock directions, as shown in Figure 2. The division is made by taking the vehicle as the center and dividing the position of the first collision point between the TW and the vehicle, relative to the vehicle, in 30° steps. The position directly in front of the vehicle corresponds to the 12 o'clock direction, and the remaining directions run clockwise from 1 to 11 o'clock.
Clustering analysis is an algorithm for categorization based on distance or similarity, and it is important to avoid the effect of unequal distances between different attributes on the similarity between samples. The above variables are discrete categorical variables and need to be coded to ensure that the distances between attributes are measurable during clustering analysis and that the distances between the same attributes in the same clustered variable are zero, while the distances between different attributes are equal. In this paper, the variables are coded using one-hot encoding [24], which is a common form of encoding in machine learning. It maps the values of an unordered categorical variable into Euclidean space and indicates the state of the variable with the binary codes 0 and 1.
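The equal-distance property described above can be checked with a minimal sketch. The attribute levels below are illustrative assumptions (Table 1 is not reproduced here), and the helper names `one_hot` and `euclidean` are hypothetical, not taken from the study.

```python
from itertools import combinations
from math import sqrt

# Assumed attribute levels for one clustering variable (illustrative only).
WEATHER_LEVELS = ["sunny", "cloudy", "rainy/snowy"]

def one_hot(value, levels):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    if value not in levels:
        raise ValueError(f"unknown attribute: {value!r}")
    return [1 if level == value else 0 for level in levels]

def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

codes = {level: one_hot(level, WEATHER_LEVELS) for level in WEATHER_LEVELS}

# Identical attributes are at distance zero ...
same = euclidean(codes["sunny"], codes["sunny"])
# ... and every pair of different attributes is at the same distance.
pairwise = {
    euclidean(codes[a], codes[b]) for a, b in combinations(WEATHER_LEVELS, 2)
}
```

Under this encoding, any two different attributes of the same variable differ in exactly two positions, so their distance is always the square root of 2, regardless of how many attributes the variable has.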

Hierarchical Clustering
Clustering analysis is an unsupervised learning method for discovering groupings in data. It can greatly reduce the influence of researchers' subjective opinions on the scenario classification results and is highly reproducible. In this paper, one of the most common clustering algorithms, hierarchical clustering, is used to cluster 335 accident cases in order to obtain the base scenarios. The steps of the hierarchical clustering algorithm are as follows:
1. Each sample is a separate cluster;
2. The distance between different samples is calculated, and the two closest samples are combined into one cluster;
3. The distance between the different clusters is calculated, and the two closest clusters are combined into one new cluster;
4. Step 3 is repeated until all the samples are merged into one cluster.
The distance between different samples is calculated using Euclidean distance. Each sample contains m variables, where the ith sample can be represented as:

$$X_i = (X_{i1}, X_{i2}, \ldots, X_{im})$$

where $X_{im}$ denotes the value (0 or 1) of the mth variable in the ith sample, and the distance between samples i and j is:

$$d_{ij} = \sqrt{\sum_{k=1}^{m} (X_{ik} - X_{jk})^2}$$

The distance between the different clusters is calculated using Ward's method. First, the within-cluster sum of squares of deviations is calculated for each cluster separately; then, two clusters are selected to be merged into one. Since the sum of squares of deviations increases when the number of clusters is reduced by one, the two clusters whose merger gives the smallest increase in the sum of squares of deviations are chosen for merging.
The within-cluster sum of squares of deviations of the samples is as follows:

$$S_Q = \sum_{i=1}^{n_Q} (X_{iQ} - \bar{X}_Q)^{\mathrm{T}} (X_{iQ} - \bar{X}_Q)$$

where $n_Q$ is the number of samples in the cluster $C_Q$, $X_{iQ}$ is the ith sample in the cluster $C_Q$, and $\bar{X}_Q$ is the centroid of $C_Q$.
The distance between clusters is:

$$D_{LQ}^2 = S_R - S_L - S_Q$$

where $S_R$ and $S_L$ are the sums of squares of deviations for clusters $C_R$ and $C_L$, and $C_R$ is the cluster formed by merging $C_L$ and $C_Q$.
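The merging criterion can be illustrated with a small, unoptimized sketch of agglomerative clustering in which the cost of a merge is the increase in the within-cluster sum of squares. The toy one-hot samples are made up, and the function names are hypothetical; this is a sketch of Ward's criterion under those assumptions, not the implementation used in the study. Because a single sample is treated as a one-element cluster, steps 2 and 3 of the procedure collapse into one loop.

```python
def centroid(cluster):
    """Component-wise mean of a list of equally long vectors."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]

def within_ss(cluster):
    """Within-cluster sum of squared deviations from the centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(row, c)) for row in cluster)

def ward_cost(a, b):
    """Increase in the sum of squares caused by merging clusters a and b."""
    return within_ss(a + b) - within_ss(a) - within_ss(b)

def ward_clustering(samples, k):
    """Merge the cheapest pair repeatedly until k clusters remain.

    Returns a list of clusters, each a list of sample indices.
    """
    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a = [samples[t] for t in clusters[i]]
                b = [samples[t] for t in clusters[j]]
                cost = ward_cost(a, b)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Made-up one-hot samples: three variables with two attributes each.
toy = [
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1],
]
groups = ward_clustering(toy, 2)
```

Stopping at a chosen k, as here, corresponds to cutting the full merge tree at the level that leaves k clusters.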

Association Rules Mining
Association rules mining is a popular method of data analysis in road traffic safety research [25][26][27]. Association rules mining, also known as "frequent item mining", is widely used to discover associations between incident attributes [28]. Each sample in association rules mining is called a transaction (t), which is a set of items (here, accident attributes) drawn from the itemset I. A rule can be expressed as A → B, where A represents the antecedent and B represents the consequent; meanwhile, A ⊆ I and B ⊆ I. It is worth noting that these rules represent associative relationships between attributes and cannot be interpreted as causal relationships between antecedent and consequent [29].
In this paper, we used the Apriori algorithm for association rules. Apriori is one of the most commonly used association rule algorithms in the field of traffic accident data analysis. The steps of the Apriori algorithm are as follows. First, find all the frequent itemsets that satisfy the minimum support; then, generate from these frequent itemsets the strong association rules that satisfy the minimum confidence. Support is the frequency with which a rule occurs in the data and indicates the importance of the rule. A higher support threshold keeps only rules that occur frequently, while a lower support threshold yields more rules, some of which may occur too rarely to be representative. Confidence represents how reliable the rule is. Higher confidence thresholds produce more strongly correlated rules, while lower confidence thresholds may result in more rules but less reliable ones. Therefore, appropriate support and confidence thresholds need to be chosen to ensure that a moderate and representative number of rules is mined.
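The two Apriori steps can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the transactions are made up, the item strings only imitate the "variable = attribute" notation, and the function names are hypothetical. Itemsets are limited to three items here as an assumption.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(transactions, min_support, max_len=3):
    """Step 1: return {itemset: support} for all frequent itemsets."""
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items}
    frequent = {}
    size = 1
    while level and size <= max_len:
        kept = {}
        for candidate in level:
            s = support(candidate, transactions)
            if s >= min_support:
                kept[candidate] = s
        frequent.update(kept)
        size += 1
        # Join frequent itemsets into larger candidates and prune any
        # candidate that has an infrequent subset (the Apriori property).
        level = {
            a | b
            for a in kept
            for b in kept
            if len(a | b) == size
            and all(frozenset(sub) in kept
                    for sub in combinations(a | b, size - 1))
        }
    return frequent

def rules(frequent, min_confidence):
    """Step 2: return (antecedent, consequent, support, confidence) tuples."""
    out = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                conf = s / frequent[ante]
                if conf >= min_confidence:
                    out.append((ante, itemset - ante, s, conf))
    return out

# Made-up transactions for illustration only.
toy = [
    frozenset({"Motr=L", "Injury=Sei", "Lane=Odc"}),
    frozenset({"Motr=L", "Injury=Sei"}),
    frozenset({"Motr=L", "Injury=Sei", "Lane=Odc"}),
    frozenset({"Motr=R", "Injury=Sli"}),
    frozenset({"Motr=R", "Injury=Sei"}),
]
freq = apriori(toy, min_support=0.1)
found = rules(freq, min_confidence=0.75)
```

Lowering `min_support` or `min_confidence` in this sketch increases the number of returned rules, which mirrors the trade-off described above.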
In the field of road traffic safety, different studies have set different support and confidence thresholds according to the research purpose and sample size [18,19,25]. The main objective of the association rules in this paper is similar to that of Nitsche's study [21], which obtains detailed relevant accident attributes for collision scenarios, and the samples used for association rule mining in this paper are small. Therefore, the minimum support was set to 0.1 after experimenting with different thresholds, and the minimum confidence was set to 0.75, following the value used in [21].
Lift, also known as "interestingness", is a metric that measures the degree of correlation between the antecedent and the consequent of a rule [28]. It compares how often the consequent occurs given the antecedent with how often the consequent occurs overall. A rule with Lift > 1 indicates that the antecedent and consequent are positively correlated. Therefore, strong association rules with Lift > 1 were further screened in this study.
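As a hedged illustration of the three measures, the sketch below uses the standard definitions: support is the share of transactions containing both antecedent and consequent, confidence is that count divided by the count of transactions containing the antecedent, and lift is confidence divided by the overall share of the consequent. The transactions and the function name `rule_metrics` are made up for the example.

```python
def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    n = len(transactions)
    n_a = sum(antecedent <= t for t in transactions)
    n_b = sum(consequent <= t for t in transactions)
    n_ab = sum((antecedent | consequent) <= t for t in transactions)
    if n_a == 0 or n_b == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_ab / n
    confidence = n_ab / n_a
    lift = confidence / (n_b / n)
    return support, confidence, lift

# Ten made-up transactions: A occurs in 4, B in 5, and A with B in 4.
A, B = frozenset({"A"}), frozenset({"B"})
toy = (
    [frozenset({"A", "B"})] * 4
    + [frozenset({"B"})] * 1
    + [frozenset({"C"})] * 5
)
sup, conf, lift = rule_metrics(A, B, toy)
```

In this toy data, B occurs in half of all transactions but in every transaction that contains A, so the lift is above 1 and the rule would pass the Lift > 1 screen.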

Base Scenarios and Rules Mining

Accident Data Clustering
In this paper, we used silhouette analysis [30] to assess the effectiveness of clustering and obtain the optimal clustering result; it helps to analyze the cohesiveness and separation of clusters. Each cluster is characterized by a silhouette coefficient, and silhouette coefficients close to 1 indicate better clustering results for that cluster. The average silhouette width (ASW) is the average of the silhouette coefficients under the current number of clusters, which is used to select the most appropriate number of clusters; the larger the ASW, the higher the clustering validity. Although the ASW values gradually become larger as the number of clusters increases, association rules subsequently need to be calculated for each cluster. Therefore, each cluster was required to contain at least 30 samples to ensure that the association rule analysis is supported by sufficient sample data to reveal the relationship between different attributes. In this study, the optimal number of clusters was determined by comparing the ASW and minimum sample size for different numbers of clusters. Figure 3a shows the ASW and the minimum sample size in clusters for different numbers of clusters from k = 2 to k = 15; the clustering results with k > 8 are excluded based on the minimum sample size. Further, the ASW of the remaining clustering results was analyzed. Among them, k = 8 gives the highest ASW (0.29) and the best overall clustering result, so the number of clusters was set to k = 8.
The silhouette values of all samples in each cluster when k = 8 are shown in Figure 3b. Samples with negative silhouette values may be assigned to the wrong clusters. C2, C4, C6, C7, and C8 all have samples that may be assigned to the wrong clusters, but the overall number of such samples is low, which indicates that the vast majority of the samples were assigned to the correct clusters and that the clusters reflect the similarity between their samples well. In addition, C1, C3, and C5 have no samples with negative silhouette values and have larger overall silhouette values, so their accident characteristics are more obvious.
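The silhouette value of a sample and the selection rule for k can be sketched as follows. The silhouette formula is the standard one, s = (b - a) / max(a, b), where a is the mean distance to the other samples of the same cluster and b is the smallest mean distance to the samples of any other cluster. The sample points, the candidate table, and the function names are assumptions made for illustration; only the ASW of 0.29 at k = 8 and the 30-sample minimum come from the text.

```python
from math import dist

def silhouette_values(samples, labels):
    """Return the silhouette value of every sample (needs at least 2 clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # a singleton cluster gets a silhouette value of 0
            values.append(0.0)
            continue
        a = sum(dist(samples[i], samples[j]) for j in own) / len(own)
        b = min(
            sum(dist(samples[i], samples[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        values.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return values

def choose_k(candidates, min_size=30):
    """Pick the k with the largest ASW whose smallest cluster has >= min_size samples.

    `candidates` maps k -> (ASW, size of the smallest cluster).
    """
    feasible = {k: asw for k, (asw, size) in candidates.items() if size >= min_size}
    return max(feasible, key=feasible.get)

# Made-up points forming two well-separated groups.
points = [[0, 0], [0, 1], [5, 5], [5, 6]]
vals = silhouette_values(points, [0, 0, 1, 1])
asw = sum(vals) / len(vals)

# Hypothetical ASW and minimum-size values, except ASW = 0.29 at k = 8.
best_k = choose_k({7: (0.27, 35), 8: (0.29, 31), 9: (0.30, 22)})
```

A negative entry in `vals` would mark a sample that lies closer, on average, to another cluster than to its own.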
The inconsistency coefficients of the cluster analysis were further examined in order to increase confidence in the selected number of clusters [6]. The larger the increase in the inconsistency coefficient at a merging step, the better the clustering obtained at the preceding step. As shown in Figure 4, the inconsistency coefficient corresponding to the 328th clustering step increases substantially relative to that of the 327th clustering step. Therefore, the 327th clustering step is more effective; since 335 samples remain in 8 clusters after 327 merges, this means that the number of clusters is 8.
The clustering variables in Table 1 were used to cluster the 335 vehicle-to-TW accidents into the 8 clusters of vehicle-to-TW accident base scenarios shown in Table 3, where the grey cells mark the accident attributes that account for more than 80% of a characteristic variable in a cluster and are taken as the main characteristics of that base scenario. Based on the clustering results in Table 3, it can also be concluded that the accident attributes of C1 and C3 are more obvious.
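The 80% criterion for the main characteristics of a base scenario can be sketched as below. The records and the function name `dominant_attributes` are made up for illustration; Table 3 itself is not reproduced.

```python
from collections import Counter

def dominant_attributes(cluster_rows, threshold=0.8):
    """Return {variable: attribute} for attributes whose share exceeds `threshold`.

    `cluster_rows` is a list of dicts mapping variable name -> attribute,
    one dict per accident in the cluster.
    """
    n = len(cluster_rows)
    result = {}
    for var in cluster_rows[0]:
        attr, count = Counter(row[var] for row in cluster_rows).most_common(1)[0]
        if count / n > threshold:
            result[var] = attr
    return result

# Ten made-up accidents in one hypothetical cluster.
rows = (
    [{"road": "straight", "weather": "sunny", "light": "day"}] * 6
    + [{"road": "straight", "weather": "sunny", "light": "night"}] * 3
    + [{"road": "intersection", "weather": "sunny", "light": "night"}] * 1
)
main = dominant_attributes(rows)
```

Here the light condition is split 6 to 4, so it is not reported as a main characteristic, which parallels a cluster in which no clear delineation of light is formed.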
C1 and C2 are accident scenarios on straight roads. C1 is the largest cluster, with a total of 53 crashes (15.8%), and mainly represents perpendicular vehicle-to-TW collisions on straight roads in sunny weather during the day. C2 differs from C1 in the weather, but no clear delineation of the light condition was formed for the accidents in C2. C3, C4, and C7 are all accident scenarios at intersections. All of the accidents in C3 involved a vehicle traveling straight ahead and a TW traveling straight ahead at an intersection in sunny weather during the day. C4 occurred in cloudy weather, when the vehicle was traveling straight ahead or turning left while the TW was traveling straight ahead, resulting in a conflict. C7 is clearly an intersection accident scenario of a "TW crossing the road" during a well-lit night. C6 occurred at a T-junction during the day, when the vehicle was traveling straight ahead or turning left and made contact with a TW traveling straight ahead. C5 is an accident scenario in rainy/snowy weather on a straight road or at an intersection. C8 shows no obvious segmentation by road type but is the only cluster that represents an accident scenario where a vehicle is traveling straight ahead and a TW is making a left turn. The above eight base scenarios basically cover the standard vehicle-to-TW conflict scenarios in Euro NCAP [31] and highlight the collision characteristics of vehicle-to-TW accidents that are unique to China: nighttime scenarios and scenarios in which the vehicle or TW turns.

Collision Scenarios Derived from Association Rules
For each cluster base scenario, association rules were computed using the attributes given in Table 2. Because the Apriori algorithm produced a large number of rules for the eight clusters in this paper, we do not give all of them. Several clusters of base scenarios were selected for rule calculation based on the accident variables (road type and motion of the vehicle and TW) that reflect the key collision scenarios, and the degree of clustering was also considered so that the accident characteristics of the clustered variables in the collision scenarios are more obvious.
Among clusters C1, C2, C3, C4, C6, and C7, whose salient feature is the road type (straight road, intersection, or T-junction), we chose the best-separated cluster for each road type: C1, C3, and C6, respectively. C8 is the only one of the eight clusters in which the vehicle travels straight ahead and the TW turns left. The remaining cluster, C5, shows strong within-cluster similarity, with all accident characteristics except road type being prominent, so the other accident attributes of C8 and C5 were also investigated further. Based on the above analyses, five clusters were identified for applying the association rules: C1, C3, C5, C6, and C8.
The association rules aim to obtain the accident attributes, so we analyzed only two-item and three-item rules. The association rule results are illustrated using C3 as an example. C3 generated 29 rules, as shown in Table 4, which gives the antecedent, consequent, support, confidence, and lift of the rules, sorted by support value. Each rule consists of an antecedent and a consequent, which are expressed as "short name of variable = code of attribute". These rules represent the degree of association and dependency of the Table 2 accident attributes in cluster C3, but they do not represent causal relationships between accident attributes. The motion of the TW relative to the vehicle in the association rule variables, combined with the motion of the vehicle and TW in the clustering variables, gives a clear indication of the collision configuration. Therefore, the analysis of the rules focuses on the other accident attributes associated with the motion of the TW relative to the vehicle. As can be seen in Table 4, the rule with the highest support is "Motr = L and Injury = Sei": the TW came from the left side of the vehicle, and the TW rider was seriously injured. This is also related to the vehicle traveling on a road with a speed limit of "40 mph" (Splim = 40 mph, rule No. 18), in the "Outside lane of a dual carriageway" in its direction of travel (Lane = Odc, rule Nos. 4 and 5), with a "Central green belt" as the road center separation (Rcensep = Cgb, rule No. 7). Rule No. 3 is "Motr = R and Splim = 60 mph", which means that a vehicle is traveling on a road with a speed limit of 60 mph and a TW comes from the right side of the vehicle; this is related to "Injury = Sei" (rule No. 8), and "Motr = R" is also related to "Injury = Sli" (rule No. 10) and "Dirt = O1" (rule Nos. 25, 26, and 27). As a result, C3 derives two collision scenarios, C3.1 and C3.2, where the direction of motion of the TW relative to the vehicle (Motr = L or Motr = R) is used as the dividing variable for the derived collision scenarios.
The accident characteristics of the final collision scenario are a combination of the accident attributes of each base scenario and the remaining accident attributes obtained from the association rules. Taking C3.1 as an example, the accident attributes of the base scenario C3 (intersection, sunny, day, straight ahead, straight ahead), together with the remaining accident attributes obtained by the association rules (serious injury, outside lane of a dual carriageway, left, 40 mph, central green belt, and 11 o'clock direction), give a detailed description of the final collision scenario. In C3.1, at an intersection on a sunny day, a vehicle is driving straight ahead in the outside lane of a dual carriageway separated by a central green belt, with a road speed limit of 40 mph; the TW travels straight ahead from the vehicle's left side, the TW rider is seriously injured, and the collision force on the vehicle is in the 11 o'clock direction. In C3.2, at an intersection on a sunny day, a vehicle is driving straight ahead on a road with a speed limit of 60 mph; the TW travels straight ahead from the vehicle's right side, the TW rider is seriously or slightly injured, and the collision force on the vehicle is in the 1 o'clock direction.

[...] scenarios based on road type, weather, light, motion of vehicle, and motion of TW. In addition, this study also considered accident attributes such as road speed limit, vehicle traveling lane, and the motion of the TW relative to the vehicle and obtained the rest of the strongly related accident attributes of the base scenarios using the association rules to further describe the scenarios in detail. Some key attributes, such as serious injuries of the TW rider, a road speed limit of 60 mph, and a collision force on the vehicle in the 12 o'clock direction, often appeared in the generated rules. This reveals the potential collision characteristics of vehicle-to-TW accidents in complex traffic environments in China, ultimately resulting in 11 categories of vehicle-to-TW collision scenarios covering straight roads, intersections, and T-junctions.
The results of this study support the existing findings on accident safety of vehicle-to-TW accidents.The collision scenarios obtained in this paper help to reduce the number of possible variations in accident attributes, such as vehicle trajectory, road speed limit, and the number of lanes, when building intelligent driving testing scenarios.This study provides a reference for the establishment of vehicle-to-TW testing scenarios for intelligent driving functional safety assessment.

Figure 1. Variables of the vehicle-to-TW collision scenario.


Figure 2. Twelve directions of collision force on a vehicle.


Figure 3. Silhouette analysis plot: (a) ASW values and minimum sample sizes for different numbers of clusters; (b) silhouette values for k = 8.


Table 1. Accident variables used for clustering analysis.

Table 2. Accident variables used for association rule mining.

Table 4. Rules obtained for C3 (abbreviated codes for variables and corresponding attributes are in Table 2).