Article

An Association Rule Mining-Based Modeling Framework for Characterizing Urban Road Traffic Accidents

1 School of Management, Research Institute of Digital Governance and Management Decision Innovation, Wuhan University of Technology, Wuhan 430070, China
2 School of Safety Science and Emergency Management, Wuhan University of Technology, Wuhan 430070, China
3 Hubei Information and Communication Co., Ltd., Wuhan 430014, China
4 National Engineering Research Center for Educational Big Data, Central China Normal University, Wuhan 430079, China
5 Department of Civil and Environmental Engineering, Florida State University, Tallahassee, FL 32306, USA
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Sustainability 2024, 16(23), 10597; https://doi.org/10.3390/su162310597
Submission received: 22 October 2024 / Revised: 26 November 2024 / Accepted: 28 November 2024 / Published: 3 December 2024

Abstract

The World Health Organization has recognized road traffic accidents as a global crisis, particularly in urban environments. Despite extensive research endeavors, significant gaps remain in our understanding of how various factors interact to influence urban road traffic incidents. This study analyzed data from 4285 urban road traffic accidents in Hubei Province, employing a two-step clustering algorithm to classify accidents into distinct groups based on specific conditions. Subsequently, association rule mining was utilized to discern relationships between accident characteristics within each cluster. Additionally, a classification based on the association rule algorithm was implemented to develop a predictive model for analyzing factors contributing to casualties. The data were categorized into clusters based on weather and road conditions, with separate discussions conducted for each scenario. The findings indicated that urban congestion is the most critical factor contributing to accidents. Interestingly, rather than in severe weather, accidents were more prevalent during cloudy, light-rain conditions. Electric vehicles and motorcycles emerged as the most vulnerable groups. Furthermore, a notable interaction was observed between the day of the week, time of day, and weather conditions. The predictive model achieved an impressive average accuracy of 86.9%. This methodology facilitates the identification of contributing factors and mechanisms underlying urban road traffic accidents in China and holds potential for establishing accident analysis models in similar contexts. The interactive visualization of association rules further enhances the applicability of the findings. The findings of this study can provide valuable insights for traffic management authorities to understand the causes of urban road traffic accidents, assisting them in devising effective policy measures and countermeasures to reduce the occurrence of accidents and casualties.

1. Introduction

Road traffic accidents have inflicted significant damage. According to the World Health Organization (WHO), approximately 1.3 million individuals lose their lives in road traffic incidents each year, with an additional 20 to 50 million suffering non-fatal injuries that frequently result in disability [1]. Daily, global road traffic accidents claim the lives of over 3500 people, culminating in an annual death toll exceeding 1.25 million—a figure that has remained relatively stable since 2007 [2]. In China, a total of 256,409 traffic accidents were documented in 2022, resulting in 60,676 fatalities and 263,621 injuries (National Bureau of Statistics of China, 2023) [3].
The characteristics of Urban Road Traffic Accidents (URTAs) are diverse and multifaceted. The interplay among these characteristics is intricate, as urban road traffic constitutes a dynamic system comprising humans, vehicles, roads, and the surrounding environment [4]. Characteristics refer to features such as “an electric vehicle is struck by a motor vehicle”, “the road condition is wet”, or “the weather is cloudy”, among others. Some of these characteristics may serve as catalysts for the occurrence of URTAs and can thus be identified as contributory factors. Macioszek and Grana [5] have identified key factors that increase the risk of severe injuries and fatalities among cyclists, including the gender and age of both drivers and cyclists, driving under the influence of alcohol, speeding, the speed of cyclists before the incident, vehicle type (notably trucks), location of the accident, time of day, and type of accident.
In traffic safety management, the importance of understanding the interrelationships among the factors and mechanisms that contribute to urban road traffic accidents cannot be overstated. However, certain accident characteristics only heighten the risk of accidents when they coalesce with other specific characteristics and do not independently increase the risk [6]. For instance, compared to single-vehicle collisions, poor lighting conditions are shown to significantly exacerbate the severity of injuries in multi-vehicle crashes [7]. The mechanisms underlying these characteristics are intricate, as they are interconnected and interact with one another. Consequently, the removal of any one characteristic would mitigate the risk of the associated traffic accident. Identifying an accurate and direct approach to addressing these complexities remains a prominent topic in contemporary academic discourse.
Over the years, scholars have employed a variety of methodologies to elucidate the contributing factors and mechanisms underlying urban road traffic accidents, including the two-step inter-cluster rule mining technique [8], the Log-Gaussian Cox model [9], and neural network models [10]. However, these approaches are not without their limitations. Non-parametric methods, such as decision tree models, often fall short of providing transparent explanations for their results. Conversely, parametric methods like statistical analysis may elucidate the marginal effects of individual variables; yet, the arbitrary distinction between dependent and independent variables can lead to erroneous inferences. Furthermore, both methodologies neglect to consider the potential influences arising from interactions between two or more variables.
This study innovatively investigates the impact of the complex interactions among multiple variables on the characterization and casualty outcomes of urban road traffic accidents. The research employs a two-step clustering algorithm and association rule mining as its primary methodologies. Initially, the two-step clustering algorithm categorizes 4285 urban road traffic accident records from Hubei Province into homogeneous clusters, with each cluster representing accidents under specific conditions. Subsequently, the association rule mining algorithm extracts the interrelationships among features within each cluster to unveil the underlying mechanisms of accident occurrence.
To further analyze the factors contributing to casualties, the study also applies a classification algorithm based on association rules to construct a predictive model. Specifically, the association rule mining algorithm identifies valuable strong association rules from the dataset by setting thresholds for minimum support and minimum confidence, reflecting the interactions between different features and their influence on accident outcomes. The research employs visualization techniques to showcase rules with high lift values, further emphasizing the highly correlated feature combinations.
Ultimately, the study concludes that urban congestion is the most significant factor contributing to frequent accidents, with a higher incidence of accidents occurring during overcast and light-rain conditions, and that electric vehicles and motorcycles are the most vulnerable groups in such incidents. Additionally, notable interactions between weather, periods, and accident types are revealed, and a predictive model is constructed, achieving an average prediction accuracy of 86.9%.
This research contributes in three main ways:
First, the study investigates the complex interactions among various factors, such as road conditions, weather, and collision types, that contribute to urban traffic accidents. It further analyzes how these interactions affect both the representation of incidents and their associated casualty outcomes. This inquiry addresses a significant gap in the existing literature concerning the comprehensive interplay of multiple variables.
Secondly, this research innovatively combines two-step clustering with association rule mining methodologies. By utilizing two-step clustering to classify accident data into homogeneous groups, it then employs association rule mining to elucidate causal relationships between accidents and their outcomes within each group as well as across the entire dataset. This methodology enhances analytical accuracy while revealing distinct patterns and risk factors pertinent to varying contexts. Based on these identified rules, a classification prediction model is constructed that autonomously extracts key features for forecasting purposes, thereby markedly improving both predictive accuracy and practical applicability.
Finally, this investigation successfully delineates contributing factors and mechanisms underlying urban traffic accidents while providing robust modeling support for similar analyses in related scenarios. Additionally, it underscores the importance of visualizing association rules to enhance practical utility; through visualization tools, one can effectively illustrate the causal relationships between accident causes and their resultant effects.

2. Literature Review

This section reviews the characteristics and contributing factors of urban road traffic accidents, and the analysis methods used in the literature.

2.1. Characteristics and Contributing Factors

A thorough analysis of the existing literature has illuminated the association of road, vehicle, and environmental factors with the incidence of urban road traffic accidents (Table 1).
A study conducted in a sub-urban area of Pakistan highlights that adverse weather conditions—such as fog, rain, severe cold, and high temperatures—alongside driver age (particularly the heightened risk associated with younger drivers) and vehicle type (the elevated accident rates of smaller vehicles) are critical factors influencing the incidence of road traffic accidents [11]. Similarly, research on urban road traffic accidents in Ningbo indicates that varying times (such as weekdays versus weekends and different hours of the day), diverse spatial areas (such as different administrative districts), and distinct demographic groups (including unemployed individuals and migrant workers) all affect the frequency of urban road traffic accidents [12]. Furthermore, human factors such as driver characteristics (age, gender) and behaviors (speeding) have also been found to be relevant [13]. A study on fatal accidents in Shenzhen from 2018 to 2022 similarly reveals that a lack of driver safety awareness and improper practices are among the primary causes of accidents [14]. Notably, factors external to the road transport system, such as socioeconomic influences, may also impact driver behavior, thereby affecting traffic accidents [15].
As illustrated in Table 1, the existing literature on urban road traffic accidents encompasses multiple dimensions, including vehicle, road, environmental, personnel, and societal factors. These elements have been combined and examined in various ways by different researchers, shedding light on their roles in accidents. However, despite the wealth of insights provided by these studies, most tend to focus on the isolated impact of individual factors, lacking a comprehensive exploration of the mechanisms underlying the combined effects of multiple factors. This paper specifically investigates the mechanisms by which various objective factors collectively contribute to traffic accidents. Future research may uncover the influence of driver characteristics, societal factors, and other unique elements on driver behaviors.
Table 1. Characteristics and contributing factors considered in related literature.
Reference | Vehicle | Road | Environment | Human | Society
Ackaah et al. (2020) [16]; Zichu et al. (2021) [17]
Yoshifumi et al. (2023) [18]
Mohammadi et al. (2023) [1]
Qinaat et al. (2019) [19]
Hongliang et al. (2020) [20]
Ramírez and Valencia. (2021) [9]; Li J and Zhao. (2022) [21]
Paul et al. (2019) [15]
Zhang et al. (2023) [14]
Hammad et al. (2019) [11]
Mahdi et al. (2019) [22]; Jiang et al. (2020) [23]; Samerei et al. (2021) [24]; Chen et al. (2022) [25]; Yanni et al. (2023) [26]
Kong et al. (2020) [13]; Hu et al. (2022); Khanh et al. (2023) [27]
Kashani and Besharati. (2017) [28]; Xu et al. (2018) [29]; Zhu (2020) [30]; Xiong et al. (2021) [12]; Olowosegun et al. (2022) [31]
Yingyu et al. (2018) [32]

2.2. Methodology

The association rule mining algorithm is particularly valuable for uncovering and visualizing the relationships among contributing factors and has been extensively explored in the existing literature (Table 2).
For instance, Xu and Bao [29] utilized association rule mining algorithms to examine exceptionally severe traffic accidents, employing two types of static image visualizations to present the derived association rules. Owais et al. [33] used a trained DRNN model to conduct Latin Hypercube Sampling simulations to determine the impact of each explanatory component on the final accident injury severity level. Jiang and Yuen [23] harnessed GIS technology to visually represent the findings of their investigation into motorcycle traffic accidents, facilitating the identification of accident hotspots linked to fatal factors. Moussa et al. [34] proposed a sophisticated approach that combines a deep learning paradigm with Variance-Based Global Sensitivity Analysis (VB/GSA), introducing a deep residual neural network structure to identify the significant attributes associated with injury severity levels in rear-end accidents. Kong et al. [13] applied the Classification Based on Association (CBA) algorithm to uncover hidden patterns from two perspectives of speeding: duration and behavior. Khanh et al. [2] implemented a two-step clustering method to partition the dataset into homogeneous clusters, subsequently employing association rule mining (ARM) techniques to investigate correlations between causes and traffic accidents both across the entire dataset and within each cluster. Reuben Tamakloe [8] integrated Cluster Correspondence Analysis (CCA) and Association Rule Mining (ARM) techniques to categorize PMD (Personal Mobility Devices) rider failure crash data into homogeneous sets, revealing unique risk factor patterns within each cluster and further exploring associated factor combinations.
As evidenced in Table 2, the current research methodologies encompass non-parametric methods (such as artificial neural networks and random forest algorithms), parametric methods (like multiple logistic regression models), association rule analysis, clustering methods, as well as meta-analyses and surveys. Each of these methods possesses its advantages and disadvantages, offering a diverse array of perspectives for the study of urban road traffic accidents. However, as previously mentioned, these methods exhibit limitations in elucidating the potential impacts of interactions among multiple factors. In particular, traditional regression models often struggle to manage the intricate relationships among numerous variables, while clustering analysis, although capable of categorizing data into distinct groups, lacks the means to deeply reveal the relationships between characteristics both within and across these groups.
Table 2. Methods used in related literature.
Method | Model | Reference
Non-parametric methods | Artificial neural network; random forest algorithm | Zhu (2020) [30]
Parametric methods | Multiple logistic regression models; ordered logit model; logit model; the plume diffusion model; spatiotemporal model | Mahdi et al. (2019) [22]; Li J and Zhao (2022) [21]; Yang et al. (2023) [35]; Chaudhuri et al. (2023) [36]; Yoshifumi et al. (2023) [18]
Association rule analysis | Association rule mining; classification-based association rule mining | Xu et al. (2018) [29]; Jiang et al. (2020) [23]; Kong et al. (2020) [13]; Zhu (2020) [30]; Khanh et al. (2023) [2]
Clustering methods | Two-step clustering | Kashani and Besharati (2017) [28]; Khanh et al. (2023) [2]
Meta-analysis | Full-text assessment and meta-analysis | Qinaat et al. (2019) [19]
Survey | Survey and subject matter expert workshop; a fuzzy comprehensive evaluation model based on analytic hierarchy process | Paul et al. (2019) [15]; Yanni et al. (2023) [26]
Visualization of findings | GIS technology; static image; cellular automata Markov chain models | Xu et al. (2018) [29]; Jiang et al. (2020) [23]; Mohammadi et al. (2023) [1]
To address the gap in understanding the multifaceted mechanisms underlying urban road traffic accidents, this paper innovatively integrates a two-step clustering approach with an association rule mining algorithm. By employing clustering techniques, the data are segmented into homogeneous groups, while the association rule mining algorithm reveals the intricate relationships between the causes and outcomes of accidents within both the groups and the overall dataset. This not only enhances analytical precision but also establishes an efficient classification prediction model, significantly improving predictive accuracy. Furthermore, this study successfully elucidates the contributory factors and mechanisms behind accidents, and by visualizing the association rules, it enhances practical guidance, providing robust support for accident analysis in similar contexts.

3. Methodology

In this study, we examine the correlation between various characteristics of urban road traffic accidents and investigate the factors contributing to casualties. Our methodology entails the application of a two-step clustering algorithm to distinct data subsets, followed by the utilization of an association rule mining algorithm to discern relationships among features. Subsequently, we employ a classification model grounded in association rules to develop a predictive model for analyzing and forecasting factors that lead to casualties. The urban road traffic accident data utilized in this study encompasses 4285 records sourced from the Accident Management Database of the Hubei Provincial Traffic Police Brigade in China, covering the period from May 2019 to July 2023.

3.1. Clustering Analysis

We classify traffic accident data into similar categories using the two-step clustering algorithm, which is a modified version of a hierarchical clustering algorithm called Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) [37]. The two-step clustering algorithm can automatically provide the optimal clustering solution. It can show the optimal number of clusters and the variables that play a decisive role. The two-step clustering algorithm is accomplished through a pre-clustering phase and a clustering phase.
Step 1 is pre-clustering. The data records are scanned sequentially, and a distance criterion determines whether the current record should be merged with a previously formed dense area or should start a new one. The dense areas are stored as summary statistics called clustering features (CFs). This phase uses the BIRCH algorithm, itself a form of hierarchical clustering, to build a Clustering Feature Tree: the tree grows progressively as entries are added and updated during the traversal of the dataset, with node splitting used as a means of refinement. Step 2 clusters the CFs obtained in Step 1. Once the Clustering Feature Tree is established, a hierarchical clustering algorithm merges the pre-clusters on the principle of minimal log-likelihood distance and generates a set of clustering solutions with different numbers of clusters. To select the best solution, i.e., to determine the optimal number of clusters, the algorithm uses the Bayesian information criterion (BIC) as the criterion statistic. Both steps use the log-likelihood distance as the distance measure, which can handle both continuous and discrete variables.
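To make the distance measure concrete, the following is a minimal Python sketch of a log-likelihood distance restricted to categorical variables, which is the situation in this dataset. It assumes the form commonly given for two-step clustering, d(i, j) = ξ_i + ξ_j − ξ_<i,j>, where ξ for a cluster is minus its size times the sum of the per-variable entropies. The function names and the toy records are illustrative assumptions and are not taken from the SPSS implementation; the variance term for continuous variables is omitted.

```python
import math
from collections import Counter

def cluster_entropy_term(records):
    """Return xi = -N * (sum of per-variable entropies) for one cluster.

    Each record is a tuple of categorical values (one entry per variable).
    """
    n = len(records)
    n_vars = len(records[0])
    total_entropy = 0.0
    for k in range(n_vars):
        counts = Counter(rec[k] for rec in records)
        # Shannon entropy of variable k inside this cluster.
        total_entropy += -sum((c / n) * math.log(c / n) for c in counts.values())
    return -n * total_entropy

def log_likelihood_distance(cluster_i, cluster_j):
    """Decrease in log-likelihood caused by merging the two clusters."""
    merged = cluster_i + cluster_j
    return (cluster_entropy_term(cluster_i)
            + cluster_entropy_term(cluster_j)
            - cluster_entropy_term(merged))

# Hypothetical pre-clusters described by (road condition, weather).
dry_fine = [("dry", "sunny"), ("dry", "cloudy")]
wet_rain = [("wet", "light rain"), ("wet", "light rain")]

d_same = log_likelihood_distance(dry_fine, dry_fine)  # identical distributions
d_diff = log_likelihood_distance(dry_fine, wet_rain)  # disjoint categories
```

Merging two clusters with identical category distributions costs nothing, whereas merging clusters with disjoint categories gives a large distance, so the hierarchical step merges the former first.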
Recently, some researchers have used clustering technology to analyze traffic accident data sets and then analyzed the different clusters separately [24,28].

3.2. Association Rule Mining

In this paper, the association rule mining algorithm is used to identify the sets of characteristics that frequently occur together in urban road traffic accidents. Association rule mining is a data mining technique used to discover interesting relationships, especially frequent patterns, associations, and correlations among variables in large databases. This technique was first proposed by Agrawal et al. in 1994. It searches for characteristics that co-occur more frequently than would be expected if they were statistically independent. Compared to other statistical and machine learning methods, association rule methods do not require any parametric assumptions. Association rule mining includes two steps: (1) scanning the database iteratively to identify all frequent item sets; (2) generating strong association rules from the identified frequent item sets. This method can find unexpected and interesting rules, which can be used as references to reduce accidents. Recently, researchers have used association rule discovery to explore problems related to traffic accidents, such as motorcycle accidents [23], extraordinarily severe traffic crashes [29], and vehicle–bicycle hit-and-run crashes [30].
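Step (1) can be illustrated with a short level-wise search written in plain Python. This is a sketch in the spirit of Apriori-style candidate generation, not the implementation used in the study; the function name `frequent_itemsets`, the toy transactions, and the 0.5 threshold are assumptions chosen only to keep the example small.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search: return {itemset: support} for all frequent item sets."""
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while candidates:
        # Keep the candidates whose support reaches the threshold.
        level = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent item sets of this size into candidates one item larger,
        # keeping a candidate only if all of its smaller subsets are frequent.
        size += 1
        joined = {a | b for a in level for b in level if len(a | b) == size}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent

# Hypothetical accident records encoded as sets of "variable=value" items.
transactions = [
    frozenset({"Road section=Prosperous", "Collision type=crash", "Weather=cloudy"}),
    frozenset({"Road section=Prosperous", "Collision type=crash"}),
    frozenset({"Road section=Prosperous", "Weather=light rain"}),
    frozenset({"Collision type=crash", "Weather=cloudy"}),
]
itemsets = frequent_itemsets(transactions, min_support=0.5)
```

The pruning rule relies on the fact that an item set cannot be frequent unless all of its subsets are frequent, which keeps the number of candidates checked at each level small.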
An association rule has the form “if A, then B”, written A → B, where A and B are item sets with A ∩ B = ∅. A is called the antecedent and B the consequent. The rule suggests that the occurrence of A increases the probability of B occurring. Support, confidence, and lift are the evaluation indicators for frequent item sets and strong association rules. Support is the proportion of transactions in which a particular item set appears. Confidence is the probability that the consequent occurs given that the antecedent has occurred, and it is an important measure of the strength of a rule. Lift measures the interestingness of a rule: it is the ratio of the probability of the consequent given the antecedent to the unconditional probability of the consequent. A lift greater than 1 indicates that the rule is meaningful. The algorithm uses minimum support and minimum confidence as thresholds to select valuable association rules; these thresholds need to be determined according to the characteristics of the data set and the research purpose.
Support is defined as the frequency that the antecedent transaction A and the consequent transaction B occur simultaneously in the entire data set, i.e., the probability that A and B co-occur.
Confidence is defined as the conditional probability that B occurs given that A occurs [38].
Lift is defined as the ratio of the probability of occurrence of B when A occurs to the probability of occurrence of B in the data set. Lift indicates the effect of A on the probability of occurrence of B. When the lift is 1, A and B are independent of each other. When the lift is greater than 1, A and B are positively correlated, and the higher the lift of a generated rule, the stronger the association. Rules with lift > 1 therefore have research value, and in this study we define them as “interesting rules”.
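$$\mathrm{support}(A \rightarrow B) = \frac{N(A \cup B)}{N(\mathrm{data})} = P(A \cup B)$$
$$\mathrm{confidence}(A \rightarrow B) = \frac{\mathrm{support}(A \rightarrow B)}{\mathrm{support}(A)} = \frac{P(A \cup B)}{P(A)} = P(B \mid A)$$
$$\mathrm{lift}(A \rightarrow B) = \frac{\mathrm{confidence}(A \rightarrow B)}{\mathrm{support}(B)} = \frac{P(B \mid A)}{P(B)}$$
Here N(A ∪ B) is the number of records that contain every item of A and of B, N(data) is the total number of records, and P(A ∪ B) is the corresponding probability that A and B co-occur.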
It should be noted that the interesting rules produced by association rule mining are statistical. The antecedent and consequent of a rule do not necessarily have a causal relationship; a rule should be interpreted as a correlation, and the relationship behind it needs to be further explored and interpreted.
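The three measures defined above can be computed directly from their definitions. The sketch below is a plain Python illustration; the five toy records and the helper names `support` and `rule_metrics` are assumptions made for the example and do not come from the study data.

```python
def support(transactions, itemset):
    """Proportion of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes the antecedent and the consequent each occur at least once.
    """
    supp_ab = support(transactions, antecedent | consequent)
    conf = supp_ab / support(transactions, antecedent)
    lift = conf / support(transactions, consequent)
    return supp_ab, conf, lift

# Hypothetical accident records encoded as sets of "variable=value" items.
transactions = [
    frozenset({"Slot=night", "Weather=light rain", "Road section=Prosperous"}),
    frozenset({"Slot=night", "Weather=light rain", "Road section=Prosperous"}),
    frozenset({"Slot=night", "Weather=cloudy", "Road section=Prosperous"}),
    frozenset({"Slot=day", "Weather=cloudy", "Road section=Prosperous"}),
    frozenset({"Slot=day", "Weather=cloudy", "Road section=Non-prosperous"}),
]

supp, conf, lift = rule_metrics(
    transactions,
    antecedent=frozenset({"Slot=night", "Weather=light rain"}),
    consequent=frozenset({"Road section=Prosperous"}),
)
is_interesting = lift > 1  # the "interesting rule" criterion used in this study
```

In this toy data the rule holds in 2 of 5 records, every record matching the antecedent also matches the consequent, and the consequent alone appears in 4 of 5 records, so the lift is 1 / 0.8 = 1.25.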

3.3. Classification Based on Association Rule Mining

The classification based on association (CBA) algorithm is used to explore the factors contributing to casualties. One benefit of this algorithm is its ability to construct rule sets for target classes; the rule set can be used to investigate, statistically, the extent to which factors and their combinations affect the occurrence of casualties. Another benefit is that this non-parametric approach makes no parametric assumptions, so the conclusions may be more objective. This method has been used to study speeding behaviors [13].
CBA is an integration framework that combines association rule mining and classification rule mining [39]. In association rule mining, the discovery target is not predetermined, and all rules in the database that satisfy the stated minimum support and minimum confidence are found [13]; in classification rule mining, the goal is to discover a small number of rules in the database that form an accurate classifier. CBA focuses on mining a special subset of association rules called class association rules (CARs) and constructing a classifier from the CARs found. The consequents of CARs are restricted to the classification class attribute [40]. The classification rules obtained by CBA have the same form as the rules obtained by the association rule mining algorithm, and support, confidence, and lift are interpreted in the same way.
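The idea of restricting consequents to the class attribute can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the CBA implementation used in the study: it enumerates antecedents by brute force, ranks rules by confidence, then support, then length, and classifies with the first matching rule, but it omits the database-coverage pruning of the full CBA classifier builder. The function names, the toy records, and the thresholds are assumptions.

```python
from collections import Counter
from itertools import combinations

def mine_cars(records, labels, min_support, min_confidence, max_len=2):
    """Return class association rules as (antecedent, label, support, confidence)."""
    n = len(records)
    antecedent_counts = Counter()
    rule_counts = Counter()
    for rec, label in zip(records, labels):
        items = sorted(rec.items())
        for size in range(1, max_len + 1):
            for combo in combinations(items, size):
                ante = frozenset(combo)
                antecedent_counts[ante] += 1
                rule_counts[(ante, label)] += 1
    rules = []
    for (ante, label), count in rule_counts.items():
        supp = count / n
        conf = count / antecedent_counts[ante]
        if supp >= min_support and conf >= min_confidence:
            rules.append((ante, label, supp, conf))
    # Precedence: higher confidence, then higher support, then shorter antecedent.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules

def classify(rules, default_label, record):
    """Predict with the first (highest-precedence) rule that matches the record."""
    items = set(record.items())
    for ante, label, _, _ in rules:
        if ante <= items:
            return label
    return default_label

# Hypothetical training records; the class attribute is Casualties (Y/N).
records = [
    {"Positive hit": "electric vehicle", "Slot": "night"},
    {"Positive hit": "electric vehicle", "Slot": "day"},
    {"Positive hit": "motorcycle", "Slot": "night"},
    {"Positive hit": "vehicle", "Slot": "day"},
    {"Positive hit": "vehicle", "Slot": "night"},
    {"Positive hit": "vehicle", "Slot": "day"},
    {"Positive hit": "vehicle", "Slot": "night"},
]
labels = ["Y", "Y", "Y", "N", "N", "N", "N"]

rules = mine_cars(records, labels, min_support=0.25, min_confidence=0.6)
default_label = Counter(labels).most_common(1)[0][0]  # majority class fallback
prediction = classify(rules, default_label,
                      {"Positive hit": "electric vehicle", "Slot": "day"})
```

Because every rule ends in a class label, the ranked rule list doubles as a readable classifier: each prediction can be traced back to the single rule that produced it.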

4. Data Description

4.1. Data Sources

The road traffic accident data were collected from the accident management database of the traffic police brigade in Hubei province, China. A total of 4285 accident records from May 2019 to July 2023 were organized. Each record included four types of information: (i) details of the traffic accident, including collision type, casualties, and subjects of the accident; (ii) vehicle attributes, including the vehicle types in active hits and positive hits; (iii) road information, including road section, road type, and road conditions; (iv) environmental characteristics, including the week, slot, and weather.
The final dataset contained 4285 urban road traffic accidents described by 11 variables with 72 characteristics, which are listed in Table A1.

4.2. Data Use

The variables “Active Hit” and “Positive Hit” are further elaborations of the variable “Subjects of the accident”, which give a detailed account of the subjects that hit other subjects and the subjects that were hit in the traffic accident. Therefore, in the clustering process, the variables “Active Hit” and “Positive Hit” were not considered.
In the clustering process, eight variables were considered: collision type, subjects of the accident, road section, road type, road conditions, week, slot, and weather. In the association rule analysis process, nine variables were considered: collision type, active hit, positive hit, road section, road type, road conditions, week, slot, and weather. In the classification analysis process, ten variables were considered: collision type, casualty, active hit, positive hit, road section, road type, road conditions, week, slot, and weather.
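The three variable subsets described above can be written down as a small configuration, which makes it easy to check that each stage uses the stated number of variables. The following Python sketch uses lower-case identifiers of our own choosing; only the variable names themselves come from the text.

```python
# The 11 variables of the dataset, as named in the text.
ALL_VARIABLES = [
    "collision type", "casualty", "subjects of the accident",
    "active hit", "positive hit",
    "road section", "road type", "road conditions",
    "week", "slot", "weather",
]

# Clustering: "active hit" and "positive hit" are excluded because they
# elaborate "subjects of the accident"; "casualty" is not used either.
CLUSTERING_VARIABLES = [
    "collision type", "subjects of the accident",
    "road section", "road type", "road conditions",
    "week", "slot", "weather",
]

# Association rule analysis: the two hit variables replace "subjects of the accident".
ASSOCIATION_VARIABLES = [
    "collision type", "active hit", "positive hit",
    "road section", "road type", "road conditions",
    "week", "slot", "weather",
]

# Classification: the association variables plus the class attribute "casualty".
CLASSIFICATION_VARIABLES = ASSOCIATION_VARIABLES + ["casualty"]

variable_counts = {
    "clustering": len(CLUSTERING_VARIABLES),
    "association": len(ASSOCIATION_VARIABLES),
    "classification": len(CLASSIFICATION_VARIABLES),
}
```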

5. Results

5.1. Two-Step Clustering Analysis

We used the two-step cluster algorithm provided by SPSS 12.0. The algorithm automatically provides the optimal number of clusters, and the results show that it is ideal to divide the data into two clusters. These clusters were analyzed and labeled according to their distributions of variables, with “Weather” and “Road conditions” as the major differentiating variables. These two variables were therefore chosen to describe the clusters. Based on the variable distribution, the two clusters are:
Cluster 1: accidents on dry roads in fine weather.
Cluster 2: accidents on wet roads in bad weather.
By elaborating the distribution of the variables between the two clusters, the traffic accident pattern for each cluster can be defined. As shown in Table 3, 100% of the accidents for Cluster 1 occurred on dry roads. The weather types for Cluster 1 were sunny, overcast, and cloudy. In contrast, all traffic accidents in Cluster 2 occurred in bad weather with rain, and the roads were wet.

5.2. Association Rule Analysis

To generate association rules among the characteristics in road traffic data, we used the association rule mining algorithm that the R 12.0 software offered. In addition, the minimum support and minimum confidence were set as 0.1 and 0.7, respectively, after some trial-and-error runs. Each cluster was studied separately.
Items. The item frequencies of Cluster 1 and Cluster 2 are shown in Figure 1 and Figure 2. There are clear differences in the ordering of item frequencies between Figure 1 and Figure 2. However, both graphs show that the four most frequent items are {Road section = Prosperous}, {Collision type = crash}, {Active hit = vehicle} and {Road type = straight}, indicating that these items are highly associated with the occurrence of road traffic accidents in any condition.
Interesting rules. As mentioned in Section 3.2, a rule with lift > 1 indicates that its antecedent and consequent have positive interdependence, which is of research value. We define rules with lift > 1 as interesting rules. A total of 64 association rules were generated from Cluster 1, of which 51 have lift > 1. A total of 82 rules were generated from Cluster 2, of which 56 have lift > 1. The interesting rules for each cluster, ranked in descending order of lift, are listed in Table A2 and Table A3. Table 4 and Table 5 show the first five rows of Table A2 and Table A3, respectively.
Pattern of rules. To help comprehend the patterns of these interesting rules, we created graph-based visualizations of the rules formed in each cluster: Figure 3 (Cluster 1) and Figure 4 (Cluster 2). Such visualizations show the relationships between individual items in the rule set. Items are represented as labeled vertices, and rules are represented as circles connected to items by arrows. The arrows lead from one or more labels (the antecedent) to one circle (the rule), and then from the circle to further labels (the consequent), so that the labels (items) are linked by circles (rules). The circles also show the strength of the correlation between the items: darker red circles indicate rules with higher lift, and higher lift indicates a stronger association. The force-directed layout used in the visualization moves items contained in many rules, and rules that share many items, to the center. Items that appear in very few rules are pushed to the periphery of the plot.
Highly associated items. The higher the lift of a rule, the stronger the correlation among the items contained in this rule. Mining high-lift rules can obtain highly associated characteristics. From the graph in Figure 3, we can find most high-lift rules, including {Road type = crossroad} and {Road section = Prosperous}. Items {Road type = Straight} and {Road section = Non-prosperous} are highly associated with each other. They mainly appear in the rules with high lift together. The items {Collision Type = crash}, {Weather = cloudy}, and {Positive Hit = electric vehicle}, are highly linked to high-lift rules. From the graph in Figure 4, we found the greatest correlation among {Road section = Prosperous}, {Slot = night}, and {Weather = light rain}. They appear in the rules with the highest lift. Consistent with Cluster 1, {Road section = Non-prosperous} and {Road type = Straight} appear in many high-lift rules. {Positive Hit = motorcycle} appeared in only one interesting rule—the rule with the highest lift. We should note that since these characteristics are related, and even work together, eliminating any feature of these rules will reduce the risk of corresponding traffic accident patterns.

5.3. Classification Based on Association Rule Analysis

We next explore potential factors contributing to casualties in traffic accidents. We performed classification based on association rules (CBA) using the ‘artless’ package in R. To extract useful classification rules, the minimum support and minimum confidence were set to 0.05 and 0.5, respectively.
A total of 57 classification rules were generated. There are two categories on the right-hand side of the rules: {Casualties = Y} and {Casualties = N}. The consequent {Casualties = Y} indicates that this traffic accident resulted in casualties. The consequent {Casualties = N} indicates that there were no casualties in this traffic accident. Among 57 classification rules, there are 31 {Casualties = Y} rules and 26 {Casualties = N} rules. Ranking the rules in descending order of lift, rules for each category are listed in Table A4 and Table A5. Table 6 and Table 7 show the first five rows of Table A4 and Table A5.
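To make the idea of classification rules concrete, here is a deliberately simplified Python sketch of a CBA-style classifier. It mines rules whose right-hand side is the class label, keeps those meeting the 0.05 support and 0.5 confidence thresholds stated above, and predicts with the first matching rule. It is an assumption-laden illustration rather than the R implementation used in the study: antecedents are limited to two items, rules are ordered by confidence, then support, then antecedent length, no pruning step is applied, and the four records are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

Rule = Tuple[Tuple[str, ...], str, float, float]  # (antecedent, label, support, confidence)


def mine_class_rules(
    records: List[Set[str]],
    labels: List[str],
    min_support: float = 0.05,
    min_confidence: float = 0.5,
    max_len: int = 2,
) -> List[Rule]:
    """Enumerate small antecedents and keep rules that pass both thresholds."""
    n = len(records)
    antecedent_counts: Counter = Counter()
    rule_counts: Counter = Counter()
    for items, label in zip(records, labels):
        for size in range(1, max_len + 1):
            for combo in combinations(sorted(items), size):
                antecedent_counts[combo] += 1
                rule_counts[(combo, label)] += 1
    rules: List[Rule] = []
    for (combo, label), count in rule_counts.items():
        supp = count / n
        conf = count / antecedent_counts[combo]
        if supp >= min_support and conf >= min_confidence:
            rules.append((combo, label, supp, conf))
    # Assumed precedence: higher confidence, then higher support, then shorter antecedent.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def predict(rules: List[Rule], items: Set[str], default: str) -> str:
    """Return the label of the first rule whose antecedent is contained in `items`."""
    for combo, label, _, _ in rules:
        if set(combo) <= items:
            return label
    return default


# Hypothetical toy records; "Y"/"N" mirror {Casualties = Y} and {Casualties = N}.
records = [
    {"Positive_Hit=pedestrian", "Road_type=Straight"},
    {"Positive_Hit=pedestrian", "Road_type=Crossroad"},
    {"Positive_Hit=vehicle", "Road_type=Straight"},
    {"Positive_Hit=vehicle", "Road_type=Crossroad"},
]
labels = ["Y", "Y", "N", "N"]

rules = mine_class_rules(records, labels)
pred_pedestrian = predict(rules, {"Positive_Hit=pedestrian", "Road_type=Straight"}, default="N")
pred_vehicle = predict(rules, {"Positive_Hit=vehicle", "Road_type=Crossroad"}, default="N")
```

The default label is only used when no rule matches a record.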
Figure 5 provides a graph-based visualization of how the classification rules are formed. As shown in Figure 5, the graph has two centers, {Casualties = Y} and {Casualties = N}. Six items, {Road conditions = dry}, {Road type = Crossroad}, {Road section = Prosperous}, {Weather = cloudy}, {Road conditions = wet}, and {Road type = straight}, are connected to both centers, while the others are connected to only one. {Positive Hit = vehicle} appears in all rules related to {Casualties = N}. {Positive Hit = pedestrian}, {Slot = forenoon}, and {Casualties = Y} appear together in rules with high lift, indicating that they are highly associated. {Positive Hit = electric vehicle}, {Active Hit = electric vehicle}, {Positive Hit = motorcycle}, and {Collision type = crash} are each found in a large number of rules related to {Casualties = Y}.
The CBA classifier performed well. Under ten-fold cross-validation, the mean accuracy was 86.9%, the mean precision 87.9%, and the mean recall 85.3%, with maximum values of 89.7%, 90.5%, and 88.6%, respectively. In other words, on this dataset the model correctly predicts whether an accident involves casualties 86.9% of the time; 87.9% of the accidents it predicts as casualty accidents actually involve casualties; and it identifies 85.3% of the actual casualty accidents.
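The three measures reported above can be computed per fold and then averaged, as in the following Python sketch. The two small folds are hypothetical stand-ins for the ten folds used in the study, the function names are assumptions, and "Y" is treated as the positive class (an accident with casualties).

```python
from statistics import mean
from typing import List, Sequence, Tuple


def fold_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "Y"
) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall) for one cross-validation fold."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def summarize(folds: List[Tuple[Sequence[str], Sequence[str]]]):
    """Mean and maximum of each metric across folds."""
    per_fold = [fold_metrics(t, p) for t, p in folds]
    means = tuple(mean(m[i] for m in per_fold) for i in range(3))
    maxima = tuple(max(m[i] for m in per_fold) for i in range(3))
    return means, maxima


# Hypothetical (true labels, predicted labels) for two folds.
folds = [
    (["Y", "Y", "N", "N"], ["Y", "N", "N", "N"]),
    (["Y", "N", "Y", "N"], ["Y", "Y", "Y", "N"]),
]
means, maxima = summarize(folds)
```

Here `means` holds the average accuracy, precision, and recall, and `maxima` the best value of each across folds, matching the way the results are reported in the text.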

5.4. Comparison of Two Clusters

Next, we compare the association rules of the two clusters and summarize the similarities and differences in the factors influencing traffic accidents under good and bad weather conditions.
(a) Regardless of the weather conditions, the primary factor influencing urban traffic accidents is the congestion level of the road segments. Busy roadways, especially intersections with high traffic volume, are prone to accidents. However, in adverse weather conditions, such as slippery roads, it is important to note that even less congested straight roads can pose a risk for accidents, and drivers should not let their guard down.
(b) In urban areas, people are busy commuting during weekdays, leading to traffic congestion, which makes weekdays the most likely time for accidents to occur. On days with good weather, the probability of traffic accidents is slightly higher on Mondays, Tuesdays, and Thursdays. Conversely, on days with poor weather, Fridays are particularly prone to accidents. Increased caution is advised during peak hours on Friday evenings when people are returning home.
(c) Motorcycles and electric vehicles are more susceptible to traffic accidents and often are the victims of such incidents. On dry roads with good weather conditions, electric vehicles have a strong correlation with collision accidents, making them more likely to be involved in crashes. In adverse weather conditions with wet roads, both electric vehicles and motorcycles are at risk of collision accidents, often being struck by motor vehicles.
(d) Regardless of the weather conditions, the evening hours (5:00 PM to 7:00 PM) are the most likely times for traffic accidents to occur throughout the day, coinciding with the time when most workers are leaving their jobs and students are finishing school. On days with good weather, the morning hours (9:00 AM to 12:00 PM) also present a potential time for accidents, whereas in bad weather and wet road conditions, special attention should be paid to nighttime driving safety.
(e) A more detailed analysis of the impact of weather on the occurrence of traffic accidents reveals that accidents are most frequent during overcast and light-rain conditions. Under circumstances of low visibility and slippery road surfaces, drivers should exercise caution.

6. Discussion

We use three data mining methods to study the factors (accident details, vehicle, road, and environment) contributing to road traffic accidents. In the first stage, the two-step clustering algorithm is used to separate the accident data. In the second stage, the association rule mining algorithm is used to investigate the characteristics of URTAs and the relations among the variables. In the third stage, the CBA classifier is used to extract the factors contributing to casualties. The main observations from the models are as follows.

6.1. Contributory Factors to Road Traffic Accidents

The clustering results showed that the factors that separate accident data into different clusters are weather and road conditions. The data were divided into two clusters. The first cluster represented urban road traffic accidents that occurred on dry roads and in good weather. The second one represented urban road traffic accidents on wet roads and in bad weather.
Commonalities: Both clusters show that the four most frequent items are {Road section = Prosperous}, {Collision type = crash}, {Active hit = vehicle}, and {Road type = straight}, indicating that these items are highly associated with urban road traffic accidents. Particularly, {Road section = Prosperous} had the highest frequency of traffic accidents, which is consistent with previous studies that assess the relationship between traffic congestion and accidents (Retallack and Ostendorf 2019). Therefore, to reduce the incidence of urban road traffic accidents effectively, authorities should focus their efforts on road sections that have a high traffic volume (over 10,000 vehicles per day). Often, the vehicle responsible for traffic accidents (the one that hits others) is a motor vehicle and is prone to crash-type traffic accidents.
Cluster 1: The conclusions are derived from Table A2 and Figure 3. (a) Rules 17, 18, 37, 38, and 41 all show that {Road section = Prosperous} and {Collision type = crash} are strongly correlated. This is easy to understand, as busy areas are prone to traffic accidents. (b) In addition to crashes, the second most frequent collision type in Figure 1 is {Collision type = scrape}. Possible reasons are that high traffic volumes may increase the probability of lane changes (e.g., to reach a dedicated turn lane) and overtaking maneuvers, as well as inadequate allocation of driver attention [41]. (c) Cloudy weather is more likely than sunny weather to be associated with traffic accidents. This may be because visibility is lower than on sunny days, roads are relatively less dry, and people drive faster and less carefully on cloudy days for fear of rain. Rule 5 shows that {Weather = cloudy} and {Slot = night} are strongly correlated. Rule 6 suggests that accidents are likely to occur on cloudy Sundays, perhaps because people prefer to drive out on cool, cloudy days. (d) Busy roads on working days are also prone to traffic accidents, as shown in Rules 35, 40, and 43, which indicate a slightly higher statistical probability of traffic accidents on Mondays, Tuesdays, and Thursdays. A study of traffic accidents in urban weaving sections in China found that traffic accidents were much more likely to occur on weekdays than on weekends [42].
Cluster 2: The conclusions are derived from Table A3 and Figure 4. (a) Rule 1, with the highest lift, shows that {Slot = night}, {Road section = Prosperous}, and {Weather = light rain} are strongly related, which is inconsistent with the perception that the worse the weather, the more likely traffic accidents are to occur. One reason may be that people prefer to use public transport (such as the metro) in severe weather (such as heavy rainstorms). In addition, traffic authorities deploy staff to manage traffic in severe weather. However, people may not pay enough attention to light rain, and the slippery roads and poor visibility caused by such weather can easily lead to traffic accidents. It has been shown that traffic accident rates and injury rates increase statistically significantly, by 10% and 8%, respectively, during periods of rainfall compared to dry weather [43]. (b) Rules 27 and 36 point to something people may not be aware of: Friday is a day of the week on which traffic accidents are particularly likely to occur. One possible explanation is that people are physically tired on Fridays after a week of exertion and mentally excited by the upcoming holiday, and are therefore less focused on driving. A study in Canada shows that traffic accidents on highways carry a significantly increased risk of death on Fridays [44]. (c) {Positive hit = electric vehicle}, {Collision type = crash}, and {Road section = prosperous} are related. There is also a strong correlation between {Collision type = crash} and {Positive hit = motorcycle}. Motorbikes and electric bikes are prone to accidents when the roads are slippery and the weather is bad, and they are usually the vehicles that are hit. Motorbikes and electric bikes are popular because of their convenience. However, their riders often speed, run red lights, and disobey other traffic laws, and such behavior is very likely to lead to traffic accidents. According to a study conducted in China to describe the riding behavior of e-bike riders, 26.6 percent of the 18,150 e-bike riders observed broke road rules such as running red lights, riding in the opposite direction, and riding in motor vehicle lanes [45]. (d) The previous analysis highlighted that urban areas with high traffic flow are prone to traffic accidents. Nevertheless, it is important to note that {Road section = non-prosperous} and {Road type = straight} are strongly related. These two items also appear together in many rules with high support, suggesting that accidents can also occur on straight roads with low traffic volumes. One possible explanation is that in bad weather, people usually encounter congested traffic and therefore tend to let their guard down, and even speed up, when driving onto a clear straight road with low traffic flow, which leads to a crash. Rain increases the risk of accidents in low-congestion situations. In a previous study, the risk associated with rainfall was compared separately for each of 15 congestion levels, showing that the impact of rainfall on accident occurrence increased when congestion levels were low. At low congestion levels, the risk of an accident in rainfall is five times higher than in good weather [46].

6.2. Contributory Factors to Casualties

Six items, {Road conditions = dry}, {Road type = Crossroad}, {Road section = Prosperous}, {Weather = cloudy}, {Road conditions = wet}, and {Road type = straight}, are connected to both {Casualties = N} and {Casualties = Y}, while the others are connected to only one. This suggests that these six factors do not directly affect casualties in traffic accidents.
No-casualty accidents: From Table A5 and Figure 5, we can see that {Casualties = N} and {Positive hit = vehicle} appear in almost all classification rules at the same time. This indicates that if a motor vehicle is hit, there will be few casualties in a traffic accident. Also, {Casualties = N} and {Positive hit = vehicle} appear in the five rules with high lift.
Casualty accidents: The conclusions are derived from Table A4 and Figure 5. (a) {Casualties = Y} and {Positive Hit = pedestrian} clearly have the strongest correlation. They appear in the three rules with the highest lift. The traffic department should take measures to reduce pedestrian injuries and fatalities. One study noted that police presence at busy intersections improves pedestrian safety during busy nights and weekends, and that elevated pedestrian bridges at mid-block locations reduce the likelihood of pedestrian and car crashes [47]. (b) {Positive hit = motorcycle}, {Active hit = motorcycle}, {Positive hit = electric vehicle}, and {Active hit = electric vehicle} all appear in multiple rules. This shows that motorcycles and electric vehicles are vulnerable to traffic accidents and that their riders are very likely to be injured or even killed. A nationwide study shows that e-bike riders (91.07%) represent the majority of patients hospitalized for traffic crashes [48]. Of particular note is the high mortality rate (5.7%) of bicycle (including motorized) accidents that result in hospitalization. (c) {Slot = afternoon} and {Casualties = Y} appear in a rule with high lift. This may be because drivers tend to be drowsy and inattentive and to drive carelessly in the afternoon. Some studies show that most workday injuries causing hospitalization occur in the morning and afternoon, peaking at 8 a.m. and 4 p.m. People need to pay extra attention to fatigued driving and take a nap before driving if necessary. (d) Most of the casualties occur in crashes.

6.3. Suggestions for Reducing Road Traffic Accidents

The following are some suggestions for reducing urban road traffic accidents based on the findings of this paper.
To reduce traffic accidents and casualties, the primary aim is to reduce traffic congestion. Congestion can lead to aggressive driving. Traffic police can provide more information to drivers, urging them to take alternative routes, and a campaign highlighting the dangers of aggressive driving could be useful. One of the major characteristics of road traffic is that the periods and areas with high traffic flow are somehow fixed. Therefore, the government can vigorously develop public transportation to reduce the passenger demand for private cars and cabs in commercial and residential areas, and enhance the traffic organization near congestion points in a targeted manner to promote traffic flow circulation. In addition to the rush hour congestion, another reason for accidents is that people are prone to fatigue during these periods. Driving when fatigued may result in more fatal and serious injuries in traffic accidents [49]. Some public facilities are also useful in reducing traffic accidents. Elevated intersections at central block locations reduce the likelihood of pedestrian and vehicle crashes; high levels of lighting at intersections are also useful.
Of course, drivers shouldn’t relax on straight roads with low traffic volumes. Studies have identified driving compensation effects. Elements of the roadway that are typically considered to be associated with higher driving risk (e.g., “circular curves” and “diversions and merges”) can reduce the probability of an accident [50]. When the driving environment becomes complex in terms of road geometry and traffic flow, drivers become more cautious and prudent.
The influence of weather on road traffic is complex. On the one hand, it will affect visibility, and physically and psychologically interfere with the driver’s driving. On the other hand, it will also affect other objective conditions such as road conditions [51]. It is suggested to increase the visibility of road markings in the rain and improve the retroreflection level of traffic signs [52].
It is important to wear protective equipment such as helmets on bicycles and e-bikes; this can effectively reduce the occurrence of traumatic brain injuries such as epidural hematomas and open-head injuries.

7. Conclusions

The characteristics of urban road traffic accidents are diverse, and this study aims to analyze the potential effects of the interaction between two or more characteristics. A two-step clustering algorithm was utilized to partition 4285 urban road traffic accident data records in Hubei Province into homogeneous clusters, with each cluster representing accidents under specific conditions. Subsequently, utilizing the clusters as input, the association rule mining algorithm was employed to extract the relationship between features. Finally, using the classification based on the association rule algorithm, the factors leading to casualties were analyzed and a prediction model was constructed for casualty occurrence.
In this study, three data mining methods were used to investigate factors contributing to road traffic accidents (accident details, vehicles, roads, and environment). The two-step clustering algorithm was applied for accident data segmentation. The association rule mining algorithm was utilized for studying relationships between urban road traffic accident features and variables. The CBA classifier was employed for identifying factors contributing to increased casualties. Key findings from three phases were obtained:
1. (a) Busy areas are prone to traffic accidents; (b) High traffic volumes may increase the likelihood of lane changes and overtaking operations; (c) Cloudy weather is more likely than sunny weather to cause traffic accidents; (d) There is a slightly higher statistical probability of traffic accidents on Mondays, Tuesdays, and Thursdays.
2. (a) More traffic accidents occur during light rain than heavy rain; (b) Friday has the highest likelihood of traffic accidents among days of the week; (c) Motorcycles and electric bicycles are prone to accidents on slippery roads and in inclement weather, and they are usually the vehicles that are hit; (d) There is a higher probability of accidents on straight roads with low traffic.
3. (a) Among casualty accidents, pedestrians suffer the most casualties; (b) Motorcycles and electric vehicles have a high risk of injury or death in traffic accidents; (c) Injuries and deaths are more likely in the afternoon when people are mentally tired; (d) Most deaths and injuries occur in crash-type collisions.
Based on the findings of this study, the following recommendations are made to reduce traffic accidents on urban roads. There is a positive nonlinear causal relationship between traffic congestion and the number of accidents. An analysis of over 10 billion observations worldwide in 2019 indicates that, overall, a 10% reduction in traffic delays would lead to a 3.4% decrease in accidents, equivalent to more than 72,000 crashes. Furthermore, there is also a positive correlation between traffic congestion and the number of serious or fatal accidents on major roads. Increased traffic congestion may lead to more aggressive driving behavior on the roads. Traffic police can provide drivers with more information, urging them to choose alternative routes during peak hours, and a campaign highlighting the dangers of aggressive driving under congested conditions may be beneficial.
A primary characteristic of urban road traffic is that the periods and areas of high traffic flow are relatively fixed. A study in China indicates that on weekdays, there are peak times in the morning (8:00–10:00), at noon (12:00–14:00), and in the evening (17:00–19:00), while on weekends, there are only peak times at noon and in the evening; the travel demand in downtown areas is also relatively stable. Therefore, to address urban traffic congestion, government agencies should vigorously develop public transportation to reduce the demand for private cars and taxis in commercial and residential areas. Targeted efforts should be made to enhance traffic organization near congestion points to facilitate the flow of traffic.
In future research, similar studies can be conducted in various regions to enhance the reliability of road traffic safety knowledge. Furthermore, the impact of human characteristics and behaviors, as well as socioeconomic and other relevant factors on URTAs should also be taken into consideration.

Author Contributions

Conceptualization, L.D. and F.H.; Methodology, L.D. and H.L.; Formal analysis, L.D.; Geocoding and cleaning, F.H.; Writing, L.D., F.H., H.L., and S.C.; original draft review, Q.G.; review and editing, L.D.; supervision, L.D. and S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Science Foundation of Hubei Province (Grant No. 2020CFB162), the Humanities and Social Science Fund of the Ministry of Education of China (Grant No. 20YJC630018), and the National Natural Science Foundation of China (Grant No. 72104190).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was not obtained from the patients due to the nature of the study.

Data Availability Statement

All the data used in this study are available via Appendix A, Appendix B, Appendix C and Appendix D.

Acknowledgments

The authors would like to acknowledge all anonymous reviewers for their valuable comments and suggestions.

Conflicts of Interest

Author Hua Lu was employed by the company Hubei Information and Communication Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. Data Description

Table A1 shows variable descriptions for 4285 urban road traffic accidents. Table A1 contains 11 variables and 72 characteristics.
Table A1. Variable description.
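| Factor | Variable | Description | Characteristics | Count | Percentage |
| --- | --- | --- | --- | --- | --- |
| Details of traffic accidents | Collision Type | Specific types of collisions in traffic accidents | 1: crash | 2990 | 69.78% |
| | | | 2: dodge | 55 | 1.28% |
| | | | 3: shunt | 71 | 1.66% |
| | | | 4: rollover | 36 | 0.84% |
| | | | 5: scrape | 1015 | 23.69% |
| | | | 6: other | 118 | 2.75% |
| | Casualties | Whether there were casualties in the traffic accident | 1: N | 2169 | 50.62% |
| | | | 2: Y | 2116 | 49.38% |
| | Subjects of the accident | The types of subjects involved in the traffic accident, such as a vehicle and an animal | 1: unilateral | 292 | 6.81% |
| | | | 2: non-pedestrian | 79 | 1.84% |
| | | | 3: both-non | 259 | 6.04% |
| | | | 4: vehicle-non | 1017 | 23.72% |
| | | | 5: vehicle-animal | 18 | 0.42% |
| | | | 6: vehicle-pedestrian | 194 | 4.52% |
| | | | 7: vehicle | 2212 | 51.55% |
| | | | 8: other | 214 | 4.99% |
| Vehicle factors | Active Hit | The subject that hit others in the accident | 1: pedestrian | 1 | 0.02% |
| | | | 2: bicycle | 36 | 0.84% |
| | | | 3: motorcycle | 382 | 8.91% |
| | | | 4: electric vehicle | 603 | 14.07% |
| | | | 5: vehicle | 2106 | 49.15% |
| | | | 6: taxi | 171 | 3.99% |
| | | | 7: off road | 72 | 1.68% |
| | | | 8: bus | 75 | 1.75% |
| | | | 9: van | 114 | 2.66% |
| | | | 10: small truck | 71 | 1.66% |
| | | | 11: truck | 313 | 7.30% |
| | | | 12: big truck | 89 | 2.08% |
| | | | 13: other | 252 | 5.88% |
| | Positive Hit | The subject that was hit in the accident | 1: animal | 26 | 0.61% |
| | | | 2: pedestrian | 294 | 6.86% |
| | | | 3: bicycle | 119 | 2.78% |
| | | | 4: motorcycle | 504 | 11.76% |
| | | | 5: electric vehicle | 1011 | 23.59% |
| | | | 6: vehicle | 1517 | 35.40% |
| | | | 7: taxi | 70 | 1.63% |
| | | | 8: off road | 65 | 1.52% |
| | | | 9: bus | 27 | 0.63% |
| | | | 10: van | 87 | 2.03% |
| | | | 11: small truck | 37 | 0.86% |
| | | | 12: truck | 136 | 3.17% |
| | | | 13: big truck | 35 | 0.82% |
| | | | 14: other | 357 | 8.33% |
| Road factors | Road section | Whether the road where the accident occurred is a road area | 1: Non prosperous | 1130 | 26.37% |
| | | | 2: prosperous | 3155 | 73.63% |
| | Road type | Type of the road | 1: Crossroad | 1436 | 33.51% |
| | | | 2: Straight | 2849 | 66.49% |
| | Road conditions | Condition of the road | 1: dry | 3304 | 77.11% |
| | | | 2: wet | 981 | 22.89% |
| Environment factors | Week | Describe the day of the week on which the traffic accident occurred | 1: Monday | 636 | 14.84% |
| | | | 2: Tuesday | 622 | 14.52% |
| | | | 3: Wednesday | 623 | 14.54% |
| | | | 4: Thursday | 613 | 14.31% |
| | | | 5: Friday | 656 | 15.31% |
| | | | 6: Saturday | 566 | 13.21% |
| | | | 7: Sunday | 569 | 13.28% |
| | Slot | Describe the slot of a day when a traffic accident occurs. | 1: late night | 103 | 2.40% |
| | | | 2: dawn | 117 | 2.73% |
| | | | 3: morning | 863 | 20.14% |
| | | | 4: forenoon | 799 | 18.65% |
| | | | 5: noon | 372 | 8.68% |
| | | | 6: afternoon | 507 | 11.83% |
| | | | 7: evening | 744 | 17.36% |
| | | | 8: night | 780 | 18.20% |
| | Weather | The weather conditions at the time of the traffic accident | 1: sunny | 694 | 16.20% |
| | | | 2: overcast | 488 | 11.39% |
| | | | 3: cloudy | 2122 | 49.52% |
| | | | 4: light rain | 510 | 11.90% |
| | | | 5: moderate rain | 215 | 5.02% |
| | | | 6: heavy rain | 111 | 2.59% |
| | | | 7: rainstorm | 106 | 2.47% |
| | | | 8: thunderstorm | 39 | 0.91% |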

Appendix B. Data Description

Table A2 shows interesting rules for Cluster 1, ranked in descending order of lift. Rules in Table A2 reveal the associations among the characteristics of urban road traffic accidents on dry roads in fine weather.
Table A2. Interesting rules for Cluster 1.
| Rules | Antecedent | Consequent | Support | Confidence | Lift |
| --- | --- | --- | --- | --- | --- |
| 1 | {Active_Hit = vehicle, Road_section = Non_prosperous} | {Road_type = Straight} | 0.104116 | 0.796296 | 1.189943 |
| 2 | {Road_section = Non_prosperous} | {Road_type = Straight} | 0.21247 | 0.780868 | 1.166887 |
| 3 | {Weather = cloudy, Road_section = Non_prosperous} | {Road_type = Straight} | 0.133475 | 0.776408 | 1.160223 |
| 4 | {Collision_Type = crash, Road_section = Non_prosperous} | {Road_type = Straight} | 0.14437 | 0.77561 | 1.15903 |
| 5 | {Slot = night} | {Weather = cloudy} | 0.12954 | 0.741768 | 1.154948 |
| 6 | {Week = Sun} | {Weather = cloudy} | 0.100787 | 0.736726 | 1.147098 |
| 7 | {Weather = cloudy, Collision_Type = crash, Road_type = Crossroad} | {Road_section = Prosperous} | 0.125605 | 0.833333 | 1.144837 |
| 8 | {Active_Hit = vehicle, Road_type = Crossroad} | {Road_section = Prosperous} | 0.131659 | 0.83174 | 1.142648 |
| 9 | {Active_Hit = vehicle, Collision_Type = crash, Road_type = Crossroad} | {Road_section = Prosperous} | 0.101695 | 0.82963 | 1.139749 |
| 10 | {Weather = cloudy, Positive_Hit = electric_vehicle} | {Collision_Type = crash} | 0.121671 | 0.789784 | 1.139496 |
| 11 | {Positive_Hit = electric_vehicle} | {Collision_Type = crash} | 0.187954 | 0.786076 | 1.134146 |
| 12 | {Weather = cloudy, Road_type = Crossroad} | {Road_section = Prosperous} | 0.178571 | 0.822873 | 1.130467 |
| 13 | {Collision_Type = crash, Road_type = Crossroad} | {Road_section = Prosperous} | 0.191586 | 0.821012 | 1.12791 |
| 14 | {Positive_Hit = electric_vehicle, Road_section = Prosperous} | {Collision_Type = crash} | 0.136199 | 0.78125 | 1.127183 |
| 15 | {Road_type = Crossroad} | {Road_section = Prosperous} | 0.271186 | 0.819762 | 1.126193 |
| 16 | {Positive_Hit = electric_vehicle, Road_type = Straight} | {Collision_Type = crash} | 0.122579 | 0.780347 | 1.12588 |
| 17 | {Active_Hit = vehicle, Road_type = Crossroad} | {Collision_Type = crash} | 0.122579 | 0.774379 | 1.117269 |
| 18 | {Active_Hit = vehicle, Road_section = Prosperous, Road_type = Crossroad} | {Collision_Type = crash} | 0.101695 | 0.772414 | 1.114435 |
| 19 | {Active_Hit = vehicle, Road_section = Prosperous} | {Collision_Type = crash} | 0.258777 | 0.746725 | 1.077371 |
| 20 | {Active_Hit = vehicle} | {Collision_Type = crash} | 0.35563 | 0.745086 | 1.075006 |
| 21 | {Weather = cloudy, Active_Hit = vehicle} | {Collision_Type = crash} | 0.222458 | 0.743175 | 1.072249 |
| 22 | {Weather = cloudy, Active_Hit = vehicle, Road_section = Prosperous} | {Collision_Type = crash} | 0.162833 | 0.742069 | 1.070653 |
| 23 | {Week = Mon} | {Collision_Type = crash} | 0.124395 | 0.736559 | 1.062704 |
| 24 | {Weather = cloudy, Active_Hit = vehicle, Road_type = Straight} | {Collision_Type = crash} | 0.144068 | 0.731183 | 1.054947 |
| 25 | {Active_Hit = vehicle, Road_section = Prosperous, Road_type = Straight} | {Collision_Type = crash} | 0.157082 | 0.730986 | 1.054663 |
| 26 | {Active_Hit = vehicle, Road_type = Straight} | {Collision_Type = crash} | 0.233051 | 0.73055 | 1.054034 |
| 27 | {Slot = night} | {Road_section = Prosperous} | 0.132567 | 0.759099 | 1.042853 |
| 28 | {Weather = sunny, Road_section = Prosperous} | {Collision_Type = crash} | 0.110169 | 0.722222 | 1.042018 |
| 29 | {Active_Hit = electric_vehicle} | {Collision_Type = crash} | 0.102906 | 0.718816 | 1.037104 |
| 30 | {Slot = evening} | {Collision_Type = crash} | 0.127724 | 0.717687 | 1.035475 |
| 31 | {Positive_Hit = vehicle, Collision_Type = crash} | {Road_section = Prosperous} | 0.162833 | 0.749304 | 1.029397 |
| 32 | {Weather = sunny} | {Collision_Type = crash} | 0.149818 | 0.713256 | 1.029083 |
| 33 | {Weather = overcast} | {Collision_Type = crash} | 0.105327 | 0.713115 | 1.028878 |
| 34 | {Slot = forenoon} | {Collision_Type = crash} | 0.134685 | 0.710863 | 1.025629 |
| 35 | {Week = Mon} | {Road_section = Prosperous} | 0.125908 | 0.74552 | 1.024198 |
| 36 | {Week = Thurs} | {Collision_Type = crash} | 0.100182 | 0.708779 | 1.022623 |
| 37 | {Road_section = Prosperous, Road_type = Crossroad} | {Collision_Type = crash} | 0.191586 | 0.706473 | 1.019296 |
| 38 | {Road_type = Crossroad} | {Collision_Type = crash} | 0.233354 | 0.705398 | 1.017745 |
| 39 | {Weather = cloudy, Positive_Hit = electric_vehicle} | {Road_section = Prosperous} | 0.114104 | 0.740668 | 1.017533 |
| 40 | {Week = Tues} | {Road_section = Prosperous} | 0.109564 | 0.740286 | 1.017009 |
| 41 | {Weather = cloudy, Road_section = Prosperous, Road_type = Crossroad} | {Collision_Type = crash} | 0.125605 | 0.70339 | 1.014847 |
| 42 | {Active_Hit = vehicle, Positive_Hit = vehicle} | {Road_section = Prosperous} | 0.128027 | 0.736934 | 1.012403 |
| 43 | {Week = Thurs} | {Road_section = Prosperous} | 0.104116 | 0.736617 | 1.011967 |
| 44 | {Weather = sunny, Collision_Type = crash} | {Road_section = Prosperous} | 0.110169 | 0.735354 | 1.010232 |
| 45 | {Weather = cloudy, Collision_Type = crash} | {Road_section = Prosperous} | 0.321126 | 0.733241 | 1.00733 |
| 46 | {Weather = cloudy, Active_Hit = vehicle} | {Road_section = Prosperous} | 0.219431 | 0.733064 | 1.007086 |
| 47 | {Weather = cloudy} | {Road_section = Prosperous} | 0.470339 | 0.732328 | 1.006076 |
| 48 | {Weather = cloudy, Active_Hit = vehicle, Collision_Type = crash} | {Road_section = Prosperous} | 0.162833 | 0.731973 | 1.005588 |
| 49 | {Collision_Type = crash} | {Road_section = Prosperous} | 0.506961 | 0.731441 | 1.004857 |
| 50 | {Positive_Hit = electric_vehicle} | {Road_section = Prosperous} | 0.174334 | 0.729114 | 1.00166 |
| 51 | {Positive_Hit = vehicle} | {Road_section = Prosperous} | 0.260593 | 0.728426 | 1.000716 |
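As a reading aid for Table A2, the columns can be cross-checked against each other. Assuming the standard definition of lift (confidence divided by the support of the consequent), the share of Cluster 1 accidents on straight roads can be recovered from Rules 1 and 2, which share the consequent {Road_type = Straight}:

```latex
\mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{supp}(B)}
\quad\Longrightarrow\quad
\mathrm{supp}(B) = \frac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{lift}(A \Rightarrow B)}

\text{Rule 1: } \frac{0.796296}{1.189943} \approx 0.669
\qquad
\text{Rule 2: } \frac{0.780868}{1.166887} \approx 0.669
```

Both rules imply that roughly 66.9% of the accidents in Cluster 1 occurred on straight roads, as expected, because the support of the consequent does not depend on the antecedent.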

Appendix C. Data Description

Table A3 shows interesting rules for Cluster 2, ranked in descending order of lift. Rules in Table A3 reveal the associations among the characteristics of urban road traffic accidents on wet roads in bad weather.
Table A3. Interesting rules for Cluster 2.
| Rule | Antecedent | Consequent | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 1 | {Slot = night, Road_section = Prosperous} | {Weather = light_rain} | 0.117227 | 0.705521 | 1.357091 |
| 2 | {Collision_Type = crash, Road_section = Non_prosperous} | {Road_type = Straight} | 0.14577 | 0.861446 | 1.324574 |
| 3 | {Road_section = Non_prosperous} | {Road_type = Straight} | 0.195719 | 0.831169 | 1.27802 |
| 4 | {Active_Hit = vehicle, Road_section = Non_prosperous} | {Road_type = Straight} | 0.103976 | 0.809524 | 1.244738 |
| 5 | {Positive_Hit = motorcycle} | {Collision_Type = crash} | 0.110092 | 0.864 | 1.210834 |
| 6 | {Weather = light_rain, Collision_Type = crash, Road_type = Crossroad} | {Road_section = Prosperous} | 0.121305 | 0.922481 | 1.206605 |
| 7 | {Weather = light_rain, Road_type = Crossroad} | {Road_section = Prosperous} | 0.169215 | 0.912088 | 1.193011 |
| 8 | {Active_Hit = vehicle, Positive_Hit = electric_vehicle} | {Collision_Type = crash} | 0.103976 | 0.85 | 1.191214 |
| 9 | {Collision_Type = crash, Road_type = Crossroad} | {Road_section = Prosperous} | 0.220183 | 0.903766 | 1.182126 |
| 10 | {Positive_Hit = vehicle, Road_type = Crossroad} | {Road_section = Prosperous} | 0.108053 | 0.898305 | 1.174983 |
| 11 | {Active_Hit = vehicle, Collision_Type = crash, Road_type = Crossroad} | {Road_section = Prosperous} | 0.120285 | 0.887218 | 1.160481 |
| 12 | {Road_type = Crossroad} | {Road_section = Prosperous} | 0.309888 | 0.886297 | 1.159277 |
| 13 | {Positive_Hit = electric_vehicle, Road_section = Prosperous} | {Collision_Type = crash} | 0.134557 | 0.809816 | 1.134899 |
| 14 | {Active_Hit = vehicle, Road_type = Crossroad} | {Road_section = Prosperous} | 0.160041 | 0.867403 | 1.134564 |
| 15 | {Weather = moderate_rain, Collision_Type = crash} | {Road_type = Straight} | 0.111111 | 0.726667 | 1.117335 |
| 16 | {Positive_Hit = electric_vehicle} | {Collision_Type = crash} | 0.178389 | 0.791855 | 1.109729 |
| 17 | {Positive_Hit = electric_vehicle, Road_type = Straight} | {Collision_Type = crash} | 0.109072 | 0.781022 | 1.094546 |
| 18 | {Active_Hit = vehicle, Road_type = Straight} | {Collision_Type = crash} | 0.276249 | 0.778736 | 1.091342 |
| 19 | {Weather = light_rain, Active_Hit = vehicle, Road_section = Prosperous, Road_type = Straight} | {Collision_Type = crash} | 0.101937 | 0.769231 | 1.078022 |
| 20 | {Active_Hit = vehicle, Road_section = Prosperous, Road_type = Straight} | {Collision_Type = crash} | 0.192661 | 0.768293 | 1.076707 |
| 21 | {Active_Hit = vehicle} | {Collision_Type = crash} | 0.411825 | 0.763705 | 1.070278 |
| 22 | {Slot = night, Collision_Type = crash} | {Road_section = Prosperous} | 0.123344 | 0.817568 | 1.069378 |
| 23 | {Active_Hit = vehicle, Road_section = Prosperous} | {Collision_Type = crash} | 0.312946 | 0.761787 | 1.06759 |
| 24 | {Week = wed} | {Collision_Type = crash} | 0.147808 | 0.759162 | 1.063912 |
| 25 | {Weather = light_rain, Active_Hit = vehicle, Road_section = Prosperous} | {Collision_Type = crash} | 0.168196 | 0.756881 | 1.060714 |
| 26 | {Slot = night, Weather = light_rain} | {Road_section = Prosperous} | 0.117227 | 0.809859 | 1.059296 |
| 27 | {Week = Fri, Collision_Type = crash} | {Road_section = Prosperous} | 0.11213 | 0.808824 | 1.057941 |
| 28 | {Weather = light_rain, Active_Hit = vehicle, Collision_Type = crash} | {Road_section = Prosperous} | 0.168196 | 0.808824 | 1.057941 |
| 29 | {Slot = evening} | {Road_section = Prosperous} | 0.12844 | 0.807692 | 1.056462 |
| 30 | {Active_Hit = vehicle, Road_section = Prosperous, Road_type = Crossroad} | {Collision_Type = crash} | 0.120285 | 0.751592 | 1.053303 |
| 31 | {Week = wed, Road_section = Prosperous} | {Collision_Type = crash} | 0.110092 | 0.75 | 1.051071 |
| 32 | {Weather = light_rain, Active_Hit = vehicle, Road_type = Straight} | {Collision_Type = crash} | 0.134557 | 0.75 | 1.051071 |
| 33 | {Slot = night} | {Road_section = Prosperous} | 0.166157 | 0.802956 | 1.050266 |
| 34 | {Slot = night, Active_Hit = vehicle} | {Road_section = Prosperous} | 0.101937 | 0.8 | 1.0464 |
| 35 | {Weather = light_rain, Positive_Hit = vehicle} | {Road_section = Prosperous} | 0.142712 | 0.8 | 1.0464 |
| 36 | {Week = Fri} | {Road_section = Prosperous} | 0.16208 | 0.798995 | 1.045085 |
| 37 | {Road_section = Non_prosperous, Road_type = Straight} | {Collision_Type = crash} | 0.14577 | 0.744792 | 1.043772 |
| 38 | {Weather = light_rain, Active_Hit = vehicle} | {Collision_Type = crash} | 0.207951 | 0.744526 | 1.043399 |
| 39 | {Weather = light_rain, Active_Hit = vehicle} | {Road_section = Prosperous} | 0.222222 | 0.79562 | 1.040672 |
| 40 | {Slot = night, Road_section = Prosperous} | {Collision_Type = crash} | 0.123344 | 0.742331 | 1.040324 |
| 41 | {Weather = moderate_rain, Road_type = Straight} | {Collision_Type = crash} | 0.111111 | 0.741497 | 1.039155 |
| 42 | {Active_Hit = vehicle, Positive_Hit = vehicle} | {Road_section = Prosperous} | 0.155963 | 0.792746 | 1.036912 |
| 43 | {Active_Hit = electric_vehicle} | {Road_section = Prosperous} | 0.104995 | 0.792308 | 1.036338 |
| 44 | {Slot = evening} | {Collision_Type = crash} | 0.117227 | 0.737179 | 1.033104 |
| 45 | {Weather = light_rain, Collision_Type = crash} | {Road_section = Prosperous} | 0.284404 | 0.788136 | 1.030881 |
| 46 | {Positive_Hit = vehicle} | {Road_section = Prosperous} | 0.269113 | 0.78806 | 1.030782 |
| 47 | {Active_Hit = vehicle, Road_type = Crossroad} | {Collision_Type = crash} | 0.135576 | 0.734807 | 1.029779 |
| 48 | {Weather = light_rain} | {Road_section = Prosperous} | 0.407747 | 0.784314 | 1.025882 |
| 49 | {Slot = night} | {Collision_Type = crash} | 0.150866 | 0.729064 | 1.021731 |
| 50 | {Road_type = Straight} | {Collision_Type = crash} | 0.469929 | 0.722571 | 1.012631 |
| 51 | {Collision_Type = scrape} | {Road_section = Prosperous} | 0.174312 | 0.773756 | 1.012072 |
| 52 | {Road_section = Non_prosperous} | {Collision_Type = crash} | 0.169215 | 0.718615 | 1.007087 |
| 53 | {Slot = night, Weather = light_rain} | {Collision_Type = crash} | 0.103976 | 0.71831 | 1.00666 |
| 54 | {Weather = light_rain, Road_section = Prosperous, Road_type = Crossroad} | {Collision_Type = crash} | 0.121305 | 0.716867 | 1.004639 |
| 55 | {Weather = moderate_rain} | {Road_section = Prosperous} | 0.168196 | 0.767442 | 1.003814 |
| 56 | {Week = Thurs} | {Road_section = Prosperous} | 0.114169 | 0.767123 | 1.003397 |
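
The support, confidence, and lift columns of Table A3 follow the standard definitions for a rule A → B: support is P(A and B), confidence is P(B | A), and lift is confidence divided by P(B). The Python sketch below recomputes the three values from raw counts. The counts are not reported in the paper; they are back-calculated assumptions for rule 48 ({Weather = light_rain} → {Road_section = Prosperous}), chosen to agree with the 981 Cluster 2 records in Table 3. The helper name `rule_metrics` is illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleMetrics:
    support: float
    confidence: float
    lift: float


def rule_metrics(n_total: int, n_antecedent: int,
                 n_consequent: int, n_both: int) -> RuleMetrics:
    """Support, confidence and lift of a rule A -> B from co-occurrence counts."""
    support = n_both / n_total                      # P(A and B)
    confidence = n_both / n_antecedent              # P(B | A)
    lift = confidence / (n_consequent / n_total)    # P(B | A) / P(B)
    return RuleMetrics(support, confidence, lift)


# ASSUMED counts (back-calculated, not taken from the paper):
# 981 records in Cluster 2, 510 with light rain, 750 on prosperous
# road sections, and 400 with both attributes.
m = rule_metrics(n_total=981, n_antecedent=510, n_consequent=750, n_both=400)
print(round(m.support, 6), round(m.confidence, 6), round(m.lift, 6))
# prints: 0.407747 0.784314 1.025882
```

A lift only slightly above 1, as for this rule, means the antecedent raises the probability of the consequent only marginally over its base rate, which is why rules near the bottom of Table A3 carry little extra information.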

Appendix D. Data Description

Table A4 lists the classification rules that characterize urban road traffic accidents with casualties.
Table A5 lists the classification rules that characterize urban road traffic accidents without casualties.
Table A4. Classification rules for casualty accidents.
| Rule | Antecedent | Consequent | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 1 | {Positive_Hit = pedestrian, Collision_Type = crash} | {Casualties = Y} | 0.060443407 | 0.981060606 | 1.986694091 |
| 2 | {Road_Conditions = dry, Positive_Hit = pedestrian} | {Casualties = Y} | 0.052042007 | 0.973799127 | 1.971989252 |
| 3 | {Positive_Hit = pedestrian} | {Casualties = Y} | 0.066744457 | 0.972789116 | 1.969943932 |
| 4 | {Slot = forenoon, Positive_Hit = electric_vehicle} | {Casualties = Y} | 0.047607935 | 0.948837209 | 1.92144019 |
| 5 | {Active_Hit = electric_vehicle, Positive_Hit = electric_vehicle} | {Casualties = Y} | 0.049008168 | 0.917030568 | 1.857030237 |
| 6 | {Weather = cloudy, Positive_Hit = electric_vehicle, Road_section = Prosperous, Road_type = Straight} | {Casualties = Y} | 0.046441074 | 0.904545455 | 1.831747293 |
| 7 | {Positive_Hit = electric_vehicle, Collision_Type = crash, Road_type = Crossroad} | {Casualties = Y} | 0.059743291 | 0.901408451 | 1.825394712 |
| 8 | {Positive_Hit = electric_vehicle, Collision_Type = crash} | {Casualties = Y} | 0.165927655 | 0.89321608 | 1.808804775 |
| 9 | {Road_Conditions = wet, Positive_Hit = electric_vehicle} | {Casualties = Y} | 0.045974329 | 0.891402715 | 1.805132625 |
| 10 | {Road_Conditions = dry, Positive_Hit = electric_vehicle, Road_section = Prosperous, Road_type = Straight} | {Casualties = Y} | 0.072578763 | 0.891117479 | 1.804555007 |
| 11 | {Active_Hit = electric_vehicle, Road_type = Straight} | {Casualties = Y} | 0.084714119 | 0.883211679 | 1.788545389 |
| 12 | {Active_Hit = electric_vehicle, Collision_Type = crash} | {Casualties = Y} | 0.089381564 | 0.882488479 | 1.787080876 |
| 13 | {Positive_Hit = electric_vehicle, Road_type = Straight} | {Casualties = Y} | 0.134889148 | 0.881097561 | 1.784264201 |
| 14 | {Active_Hit = motorcycle, Collision_Type = crash} | {Casualties = Y} | 0.054142357 | 0.878787879 | 1.779586985 |
| 15 | {Positive_Hit = electric_vehicle, Road_section = Prosperous} | {Casualties = Y} | 0.151458576 | 0.878213802 | 1.778424453 |
| 16 | {Weather = cloudy, Positive_Hit = motorcycle} | {Casualties = Y} | 0.049941657 | 0.87704918 | 1.776066039 |
| 17 | {Weather = cloudy, Active_Hit = electric_vehicle} | {Casualties = Y} | 0.060443407 | 0.875 | 1.771916352 |
| 18 | {Positive_Hit = electric_vehicle} | {Casualties = Y} | 0.20630105 | 0.8743818 | 1.770664468 |
| 19 | {Active_Hit = electric_vehicle} | {Casualties = Y} | 0.122753792 | 0.872305141 | 1.766459135 |
| 20 | {Road_Conditions = dry, Positive_Hit = motorcycle, Collision_Type = crash} | {Casualties = Y} | 0.061143524 | 0.867549669 | 1.756829079 |
| 21 | {Road_Conditions = dry, Positive_Hit = motorcycle, Road_section = Prosperous} | {Casualties = Y} | 0.055309218 | 0.864963504 | 1.751591972 |
| 22 | {Road_Conditions = dry, Positive_Hit = motorcycle, Road_type = Straight} | {Casualties = Y} | 0.052275379 | 0.864864865 | 1.751392224 |
| 23 | {Road_Conditions = dry, Positive_Hit = motorcycle} | {Casualties = Y} | 0.076312719 | 0.862796834 | 1.747204363 |
| 24 | {Positive_Hit = motorcycle, Collision_Type = crash, Road_section = Prosperous} | {Casualties = Y} | 0.059976663 | 0.862416107 | 1.746433374 |
| 25 | {Positive_Hit = motorcycle, Collision_Type = crash, Road_type = Straight} | {Casualties = Y} | 0.056242707 | 0.860714286 | 1.742987105 |
| 26 | {Positive_Hit = motorcycle, Road_type = Straight} | {Casualties = Y} | 0.06907818 | 0.860465116 | 1.742482525 |
| 27 | {Positive_Hit = motorcycle, Collision_Type = crash} | {Casualties = Y} | 0.081913652 | 0.856097561 | 1.733638019 |
| 28 | {Positive_Hit = motorcycle, Road_section = Prosperous} | {Casualties = Y} | 0.072812135 | 0.854794521 | 1.730999301 |
| 29 | {Positive_Hit = motorcycle} | {Casualties = Y} | 0.099883314 | 0.849206349 | 1.71968299 |
| 30 | {Active_Hit = motorcycle, Road_type = Straight} | {Casualties = Y} | 0.049941657 | 0.8359375 | 1.692812943 |
| 31 | {Active_Hit = motorcycle} | {Casualties = Y} | 0.074445741 | 0.835078534 | 1.691073496 |
Table A5. Classification rules for no-casualty accidents.
| Rule | Antecedent | Consequent | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 1 | {Active_Hit = other, Positive_Hit = vehicle} | {Casualties = N} | 0.052508751 | 1 | 1.975564776 |
| 2 | {Active_Hit = other} | {Casualties = N} | 0.057876313 | 0.984126984 | 1.944206605 |
| 3 | {Active_Hit = vehicle, Positive_Hit = vehicle, Collision_Type = scrape} | {Casualties = N} | 0.051808635 | 0.969432314 | 1.915176333 |
| 4 | {Weather = cloudy, Active_Hit = vehicle, Positive_Hit = vehicle, Road_section = Prosperous} | {Casualties = N} | 0.056476079 | 0.968 | 1.912346704 |
| 5 | {Weather = cloudy, Active_Hit = vehicle, Positive_Hit = vehicle} | {Casualties = N} | 0.078879813 | 0.965714286 | 1.907831127 |
| 6 | {Active_Hit = vehicle, Positive_Hit = vehicle, Road_type = Straight} | {Casualties = N} | 0.114819137 | 0.959064327 | 1.894693704 |
| 7 | {Active_Hit = vehicle, Positive_Hit = vehicle} | {Casualties = N} | 0.171528588 | 0.958279009 | 1.893142256 |
| 8 | {Slot = night, Positive_Hit = vehicle, Road_section = Prosperous} | {Casualties = N} | 0.052508751 | 0.933609959 | 1.844406949 |
| 9 | {Positive_Hit = vehicle, Collision_Type = scrape, Road_section = Prosperous} | {Casualties = N} | 0.083547258 | 0.932291667 | 1.841802578 |
| 10 | {Slot = night, Positive_Hit = vehicle} | {Casualties = N} | 0.068844807 | 0.92476489 | 1.826932944 |
| 11 | {Slot = morning, Positive_Hit = vehicle, Road_section = Prosperous} | {Casualties = N} | 0.050641774 | 0.923404255 | 1.824244921 |
| 12 | {Weather = cloudy, Positive_Hit = vehicle, Collision_Type = scrape} | {Casualties = N} | 0.057876313 | 0.921933086 | 1.82133853 |
| 13 | {Positive_Hit = vehicle, Collision_Type = scrape} | {Casualties = N} | 0.112252042 | 0.917938931 | 1.81344782 |
| 14 | {Slot = evening, Road_Conditions = dry, Positive_Hit = vehicle} | {Casualties = N} | 0.049941657 | 0.914529915 | 1.806713086 |
| 15 | {Slot = morning, Road_Conditions = dry, Positive_Hit = vehicle} | {Casualties = N} | 0.05344224 | 0.912350598 | 1.802407704 |
| 16 | {Slot = evening, Positive_Hit = vehicle} | {Casualties = N} | 0.063010502 | 0.909090909 | 1.795967979 |
| 17 | {Slot = morning, Positive_Hit = vehicle} | {Casualties = N} | 0.066744457 | 0.907936508 | 1.793687384 |
| 18 | {Weather = cloudy, Positive_Hit = vehicle, Road_type = Crossroad} | {Casualties = N} | 0.054375729 | 0.8996139 | 1.777245532 |
| 19 | {Week = Fri, Positive_Hit = vehicle} | {Casualties = N} | 0.049941657 | 0.891666667 | 1.761545259 |
| 20 | {Road_Conditions = wet, Positive_Hit = vehicle, Road_type = Straight} | {Casualties = N} | 0.044807468 | 0.884792627 | 1.747965148 |
| 21 | {Road_Conditions = dry, Positive_Hit = vehicle, Road_type = Crossroad} | {Casualties = N} | 0.081913652 | 0.881909548 | 1.742269438 |
| 22 | {Positive_Hit = vehicle, Road_type = Crossroad} | {Casualties = N} | 0.10571762 | 0.877906977 | 1.7343621 |
| 23 | {Positive_Hit = vehicle, Road_section = Non_prosperous} | {Casualties = N} | 0.080280047 | 0.87755102 | 1.733658885 |
| 24 | {Weather = cloudy, Positive_Hit = vehicle, Road_section = Prosperous} | {Casualties = N} | 0.109218203 | 0.876404494 | 1.731393849 |
| 25 | {Week = wed, Positive_Hit = vehicle} | {Casualties = N} | 0.04504084 | 0.873303167 | 1.725266977 |
| 26 | {Positive_Hit = vehicle} | {Casualties = N} | 0.308284714 | 0.870797627 | 1.720317119 |
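
To illustrate how class association rules such as those in Tables A4 and A5 can act as a predictive model, the sketch below applies a small rule list to a single accident record in the general style of classification based on association rules: rules are ranked by confidence (ties broken by support) and the first rule whose antecedent matches decides the class. This is a simplified illustration, not the authors' implementation. The rule subset, the rounded values, the `ClassRule` and `classify` names, and the default class "N" (the slightly larger class in Table 3) are all assumptions made for the example.

```python
from typing import Dict, List, NamedTuple


class ClassRule(NamedTuple):
    antecedent: Dict[str, str]  # attribute -> required value
    label: str                  # predicted value of Casualties ("Y" or "N")
    support: float
    confidence: float


# A handful of rules copied from Tables A4 and A5 (values rounded to 4 decimals).
RULES: List[ClassRule] = [
    ClassRule({"Positive_Hit": "pedestrian"}, "Y", 0.0667, 0.9728),
    ClassRule({"Positive_Hit": "electric_vehicle"}, "Y", 0.2063, 0.8744),
    ClassRule({"Positive_Hit": "motorcycle"}, "Y", 0.0999, 0.8492),
    ClassRule({"Active_Hit": "vehicle", "Positive_Hit": "vehicle"}, "N", 0.1715, 0.9583),
    ClassRule({"Positive_Hit": "vehicle"}, "N", 0.3083, 0.8708),
]


def classify(record: Dict[str, str], rules: List[ClassRule], default: str = "N") -> str:
    """Return the label of the highest-ranked rule whose antecedent matches the record."""
    # Rank by confidence first, then by support (both descending).
    ranked = sorted(rules, key=lambda r: (r.confidence, r.support), reverse=True)
    for rule in ranked:
        if all(record.get(attr) == value for attr, value in rule.antecedent.items()):
            return rule.label
    return default  # ASSUMED fallback when no rule fires


# Hypothetical records, for illustration only.
hit_ebike = {"Active_Hit": "vehicle", "Positive_Hit": "electric_vehicle", "Road_type": "Crossroad"}
car_vs_car = {"Active_Hit": "vehicle", "Positive_Hit": "vehicle"}
unmatched = {"Positive_Hit": "other"}

print(classify(hit_ebike, RULES))   # Y
print(classify(car_vs_car, RULES))  # N
print(classify(unmatched, RULES))   # N (default)
```

Ranking by confidence before support means that a specific, highly reliable rule is consulted before a broader but weaker one that covers the same record.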

References

  1. Mohammadi, A.; Kiani, B.; Mahmoudzadeh, H.; Bergquist, R. Pedestrian Road Traffic Accidents in Metropolitan Areas: GIS-Based Prediction Modelling of Cases in Mashhad, Iran. Sustainability 2023, 15, 10576. [Google Scholar] [CrossRef]
  2. Le, K.G.; Tran, Q.H.; Do, V. Urban Traffic Accident Features Investigation to Improve Urban Transportation Infrastructure Sustainability by Integrating GIS and Data Mining Techniques. Sustainability 2024, 16, 107. [Google Scholar] [CrossRef]
  3. National Bureau of Statistics of China. China Statistical Yearbook; National Bureau of Statistics of China: Beijing, China, 2023.
  4. Qu, Y.; Lin, Z.; Li, H.; Zhang, X. Feature recognition of urban road traffic accidents based on GA-XGBoost in the context of big data. IEEE Access 2019, 7, 170106–170115. [Google Scholar] [CrossRef]
  5. Macioszek, E.; Grana, A. The Analysis of the Factors Influencing the Severity of Bicyclist Injury in Bicyclist-Vehicle Crashes. Sustainability 2022, 14, 215. [Google Scholar] [CrossRef]
  6. Fountas, G.; Anastasopoulos, P.C.; Abdel-Aty, M. Analysis of accident injury-severities using correlated random parameters ordered probit approach with time-variant covariates. Anal. Methods Accid. Res. 2018, 18, 57–68. [Google Scholar] [CrossRef]
  7. Wu, Q.; Chen, F.; Zhang, G.; Liu, X.C.; Wang, H.; Bogus, S.M. Mixed logit model-based driver injury severity investigations in single-and multi-vehicle crashes on rural two-lane highways. Accid. Anal. Prev. 2014, 72, 105–115. [Google Scholar] [CrossRef]
  8. Tamakloe, R.; Zhang, K.; Hossain, A.; Kim, I.; Park, S.H. Critical risk factors associated with fatal/severe crash outcomes in personal mobility device rider at-fault crashes: A two-step inter-cluster rule mining technique. Accid. Anal. Prev. 2024, 199, 107527. [Google Scholar] [CrossRef] [PubMed]
  9. Ramírez, A.F.; Valencia, C. Spatiotemporal correlation study of traffic accidents with fatalities and injuries in Bogota (Colombia). Accid. Anal. Prev. 2021, 149, 105848. [Google Scholar] [CrossRef]
  10. Li, P.; Abdel-Aty, M.; Yuan, J. Real-time crash risk prediction on arterials based on LSTM-CNN. Accid. Anal. Prev. 2020, 135, 105371. [Google Scholar] [CrossRef]
  11. Hammad, H.M.; Ashraf, M.; Abbas, F.; Bakhat, H.F.; Qaisrani, S.A.; Mubeen, M.; Fahad, S.; Awais, M. Environmental factors affecting the frequency of road traffic accidents: A case study of a sub-urban area of Pakistan. Environ. Sci. Pollut. Res. 2019, 26, 11674–11685. [Google Scholar] [CrossRef]
  12. Xiong, X.; Zhang, S.; Guo, L. Non-motorized vehicle traffic accidents in China: Analysing road users’ precrash behaviors and implications for road safety. Int. J. Saf. Secur. Eng. 2021, 11, 105–116. [Google Scholar] [CrossRef]
  13. Kong, X.; Das, S.; Jha, K.; Zhang, Y. Understanding speeding behavior from naturalistic driving data: Applying classification based association rule mining. Accid. Anal. Prev. 2020, 144, 105620. [Google Scholar] [CrossRef] [PubMed]
  14. Zhang, X.; Qi, S.; Zheng, A.; Luo, Y.; Hao, S. Data-Driven Analysis of Fatal Urban Traffic Accident Characteristics and Safety Enhancement Research. Sustainability 2023, 15, 3259. [Google Scholar] [CrossRef]
  15. Salmon, P.M.; Read, G.J.; Beanland, V.; Thompson, J.; Filtness, A.J.; Hulme, A.; McClure, R.; Johnston, I. Bad behaviour or societal failure? Perceptions of the factors contributing to drivers’ engagement in the fatal five driving behaviors. Appl. Ergon. 2019, 74, 162–171. [Google Scholar] [CrossRef]
  16. Ackaah, W.; Apuseyine, B.A.; Afukaar, F.K. Road traffic crashes at night-time: Characteristics and risk factors. Int. J. Inj. Control Saf. Promot. 2020, 27, 392–399. [Google Scholar] [CrossRef]
  17. Zhou, Z.; Meng, F.; Song, C.; Sze, N.; Guo, Z.; Ouyang, N. Investigating the uniqueness of crash injury severity in freeway tunnels: A comparative study in Guizhou, China. J. Saf. Res. 2021, 77, 105–113. [Google Scholar] [CrossRef]
  18. Wada, Y.; Asami, Y.; Hino, K.; Nishi, H.; Shiode, S.; Shiode, N. Road junction configurations and the severity of traffic accidents in Japan. Sustainability 2023, 15, 2722. [Google Scholar] [CrossRef]
  19. Hussain, Q.; Feng, H.; Grzebieta, R.; Brijs, T.; Olivier, J. The relationship between impact speed and the probability of pedestrian fatality during a vehicle-pedestrian crash: A systematic review and meta-analysis. Accid. Anal. Prev. 2019, 129, 241–249. [Google Scholar] [CrossRef]
  20. Ding, H.; Sze, N.; Li, H.; Guo, Y. Roles of infrastructure and land use in bicycle crash exposure and frequency: A case study using Greater London bike-sharing data. Accid. Anal. Prev. 2020, 144, 105652. [Google Scholar] [CrossRef]
  21. Li, J.; Zhao, Z. Impact of COVID-19 travel-restriction policies on road traffic accident patterns with emphasis on cyclists: A case study of New York City. Accid. Anal. Prev. 2022, 167, 106586. [Google Scholar] [CrossRef]
  22. Rezapour, M.; Moomen, M.; Ksaibati, K. Ordered logistic models of influencing factors on crash injury severity of single and multiple-vehicle downgrade crashes: A case study in Wyoming. J. Saf. Res. 2019, 68, 107–118. [Google Scholar] [CrossRef] [PubMed]
  23. Jiang, F.; Yuen, K.K.R.; Lee, E.W.M. Analysis of motorcycle accidents using association rule mining-based framework with parameter optimization and GIS technology. J. Saf. Res. 2020, 75, 292–309. [Google Scholar] [CrossRef]
  24. Samerei, S.A.; Aghabayk, K.; Mohammadi, A.; Shiwakoti, N. Data mining approach to model bus crash severity in Australia. J. Saf. Res. 2021, 76, 73–82. [Google Scholar] [CrossRef]
  25. Chen, M.; Zhou, L.; Choo, S.; Lee, H. Analysis of risk factors affecting urban truck traffic accident severity in Korea. Sustainability 2022, 14, 2901. [Google Scholar] [CrossRef]
  26. Huang, Y.; Chen, F.; Song, M.; Pan, X.; You, K. Effect evaluation of traffic guidance in urban underground road diverging and merging areas: A simulator study. Accid. Anal. Prev. 2023, 186, 107036. [Google Scholar] [CrossRef]
  27. Hu, L.; Li, H.; Huang, J.; Wang, F.; Lin, M.; Wu, X.; Wu, N. Investigating the severity of non-urban road traffic accidents in typical regions of Sichuan and Guizhou, China. Traffic Inj. Prev. 2022, 23, 290–295. [Google Scholar] [CrossRef]
  28. Kashani, A.T.; Besharati, M.M. Fatality rate of pedestrians and fatal crash involvement rate of drivers in pedestrian crashes: A case study of Iran. Int. J. Inj. Control. Saf. Promot. 2017, 24, 222–231. [Google Scholar] [CrossRef] [PubMed]
  29. Xu, C.; Bao, J.; Wang, C.; Liu, P. Association rule analysis of factors contributing to extraordinarily severe traffic crashes in China. J. Saf. Res. 2018, 67, 65–75. [Google Scholar] [CrossRef] [PubMed]
  30. Zhu, S. Investigation of vehicle-bicycle hit-and-run crashes. Traffic Inj. Prev. 2020, 21, 506–511. [Google Scholar] [CrossRef]
  31. Olowosegun, A.; Babajide, N.; Akintola, A.; Fountas, G.; Fonzone, A. Analysis of pedestrian accident injury-severities at road junctions and crossings using an advanced random parameter modeling framework: The case of Scotland. Accid. Anal. Prev. 2022, 169, 106610. [Google Scholar] [CrossRef]
  32. Zhang, Y.; Liu, T.; Bai, Q.; Shao, W.; Wang, Q. New systems-based method to analyze road traffic accidents. Transp. Res. Part F Traffic Psychol. Behav. 2018, 54, 96–109. [Google Scholar] [CrossRef]
  33. Owais, M.; Alshehri, A.; Gyani, J.; Aljarbou, M.H.; Alsulamy, S. Prioritizing rear-end crash explanatory factors for injury severity level using deep learning and global sensitivity analysis. Expert Syst. Appl. 2024, 245, 15. [Google Scholar] [CrossRef]
  34. Moussa, G.S.; Owais, M.; Dabbour, E. Variance-based global sensitivity analysis for rear-end crash investigation using deep learning. Accid. Anal. Prev. 2022, 165, 14. [Google Scholar] [CrossRef]
  35. Yang, L.; Luo, X.; Zuo, Z.; Zhou, S.; Huang, T.; Luo, S. A novel approach for fine-grained traffic risk characterization and evaluation of urban road intersections. Accid. Anal. Prev. 2023, 181, 106934. [Google Scholar] [CrossRef] [PubMed]
  36. Chaudhuri, S.; Juan, P.; Mateu, J. Spatio-temporal modeling of traffic accident incidence on urban road networks based on an explicit network triangulation. J. Appl. Stat. 2023, 50, 3229–3250. [Google Scholar] [CrossRef] [PubMed]
  37. Zhang, T.; Ramakrishnan, R.; Livny, M. BIRCH: An efficient data clustering method for very large databases. ACM Sigmod Rec. 1996, 25, 103–114. [Google Scholar] [CrossRef]
  38. Nowak, S. Some problems of causal interpretation of statistical relationships. Philos. Sci. 1960, 27, 23–38. [Google Scholar] [CrossRef]
  39. Liu, B.; Hsu, W.; Ma, Y. Integrating classification and association rule mining. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 27–31 August 1998; pp. 80–86. [Google Scholar]
  40. Shimada, K.; Hirasawa, K.; Hu, J. Class association rule mining with a chi-squared test using genetic network programming. In Proceedings of the 2006 IEEE International Conference on Systems, Man and Cybernetics, Taipei, Taiwan, 8–11 October 2006; pp. 5338–5344. [Google Scholar]
  41. Intini, P.; Berloco, N.; Fonzone, A.; Fountas, G.; Ranieri, V. The influence of traffic, geometric and context variables on urban crash types: A grouped random parameter multinomial logit approach. Anal. Methods Accid. Res. 2020, 28, 100141. [Google Scholar] [CrossRef]
  42. Mao, X.; Yuan, C.; Gan, J.; Zhang, S. Risk factors affecting traffic accidents at urban weaving sections: Evidence from China. Int. J. Environ. Res. Public Health 2019, 16, 1542. [Google Scholar] [CrossRef]
  43. Black, A.W.; Villarini, G.; Mote, T.L. Effects of rainfall on vehicle crashes in six US states. Weather. Clim. Soc. 2017, 9, 53–70. [Google Scholar] [CrossRef]
  44. Rzeznikiewiz, D.; Tamim, H.; Macpherson, A.K. Risk of death in crashes on Ontario’s highways. BMC Public Health 2012, 12, 1–7. [Google Scholar] [CrossRef] [PubMed]
  45. Du, W.; Yang, J.; Powis, B.; Zheng, X.; Ozanne-Smith, J.; Bilston, L.; He, J.; Ma, T.; Wang, X.; Wu, M. Epidemiological profile of hospitalized injuries among electric bicycle riders admitted to a rural hospital in Suzhou: A cross-sectional study. Inj. Prev. 2014, 20, 128–133. [Google Scholar] [CrossRef] [PubMed]
  46. Retallack, A.E.; Ostendorf, B. Relationship between traffic volume and accident frequency at intersections. Int. J. Environ. Res. Public Health 2020, 17, 1393. [Google Scholar] [CrossRef] [PubMed]
  47. Bagloee, S.A.; Asadi, M. Crash analysis at intersections in the CBD: A survival analysis model. Transp. Res. Part A Policy Pract. 2016, 94, 558–572. [Google Scholar] [CrossRef]
  48. Siman-Tov, M.; Radomislensky, I.; Group, I.T.; Peleg, K. The casualties from electric bike and motorized scooter road accidents. Traffic Inj. Prev. 2017, 18, 318–323. [Google Scholar] [CrossRef]
  49. Moradi, A.; Nazari, S.S.H.; Rahmani, K. Sleepiness and the risk of road traffic accidents: A systematic review and meta-analysis of previous studies. Transp. Res. Part F: Traffic Psychol. Behav. 2019, 65, 620–629. [Google Scholar] [CrossRef]
  50. Zhang, C.; He, J.; Yan, X.; Liu, Z.; Chen, Y.; Zhang, H. Exploring relationships between microscopic kinetic parameters of tires under normal driving conditions, road characteristics, and accident types. J. Saf. Res. 2021, 78, 80–95. [Google Scholar] [CrossRef]
  51. Lee, J.; Chae, J.; Yoon, T.; Yang, H. Traffic accident severity analysis with rain-related factors using structural equation modeling–A case study of Seoul City. Accid. Anal. Prev. 2018, 112, 1–10. [Google Scholar] [CrossRef]
  52. Seflers, M.; Kreicbergs, J.; Sauter, G. Equipment Condition for Zebra Crossing Night-Time Safety Performance in Latvia. Balt. J. Road Bridge Eng. 2021, 16, 108–125. [Google Scholar] [CrossRef]
Figure 1. Item frequency plot graphs of Cluster 1.
Figure 2. Item frequency plot graphs of Cluster 2.
Figure 3. Graph-based visualization for interesting rules of Cluster 1.
Figure 4. Graph-based visualization for interesting rules of Cluster 2.
Figure 5. Graph-based visualization for classification rules of URTAs.
Table 3. Cluster description.
| Cluster | Description | Count (Percentage) | Casualty | Non-Casualty |
|---|---|---|---|---|
| 1 | Accidents on dry roads in fine weather | 3304 (77.10%) | 1639 | 1665 |
| 2 | Accidents on wet roads in rainy weather | 981 (22.90%) | 477 | 504 |
| Total | | 4285 | 2116 | 2169 |
Table 4. Interesting rules (partial) for Cluster 1.
| Rule | Antecedent | Consequent | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 1 | {Active_Hit = vehicle, Road_section = Non_prosperous} | {Road_type = Straight} | 0.104116 | 0.796296 | 1.189943 |
| 2 | {Road_section = Non_prosperous} | {Road_type = Straight} | 0.21247 | 0.780868 | 1.166887 |
| 3 | {Weather = cloudy, Road_section = Non_prosperous} | {Road_type = Straight} | 0.133475 | 0.776408 | 1.160223 |
| 4 | {Collision_Type = crash, Road_section = Non_prosperous} | {Road_type = Straight} | 0.14437 | 0.77561 | 1.15903 |
| 5 | {Slot = night} | {Weather = cloudy} | 0.12954 | 0.741768 | 1.154948 |
Table 5. Interesting rules (partial) for Cluster 2.
| Rule | Antecedent | Consequent | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 1 | {Slot = night, Road_section = Prosperous} | {Weather = light_rain} | 0.117227 | 0.705521 | 1.357091 |
| 2 | {Collision_Type = crash, Road_section = Non_prosperous} | {Road_type = Straight} | 0.14577 | 0.861446 | 1.324574 |
| 3 | {Road_section = Non_prosperous} | {Road_type = Straight} | 0.195719 | 0.831169 | 1.27802 |
| 4 | {Active_Hit = vehicle, Road_section = Non_prosperous} | {Road_type = Straight} | 0.103976 | 0.809524 | 1.244738 |
| 5 | {Positive_Hit = motorcycle} | {Collision_Type = crash} | 0.110092 | 0.864 | 1.210834 |
Table 6. Classification rules (partial) for casualty accidents {Casualties = Y}.
| Rule | Antecedent | Consequent | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 1 | {Positive_Hit = pedestrian, Collision_Type = crash} | {Casualties = Y} | 0.060443407 | 0.981060606 | 1.986694091 |
| 2 | {Road_Conditions = dry, Positive_Hit = pedestrian} | {Casualties = Y} | 0.052042007 | 0.973799127 | 1.971989252 |
| 3 | {Positive_Hit = pedestrian} | {Casualties = Y} | 0.066744457 | 0.972789116 | 1.969943932 |
| 4 | {Slot = forenoon, Positive_Hit = electric_vehicle} | {Casualties = Y} | 0.047607935 | 0.948837209 | 1.92144019 |
| 5 | {Active_Hit = electric_vehicle, Positive_Hit = electric_vehicle} | {Casualties = Y} | 0.049008168 | 0.917030568 | 1.857030237 |
Table 7. Classification rules (partial) for no-casualty accidents.
| Rule | Antecedent | Consequent | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 1 | {Active_Hit = other, Positive_Hit = vehicle} | {Casualties = N} | 0.052508751 | 1 | 1.975564776 |
| 2 | {Active_Hit = other} | {Casualties = N} | 0.057876313 | 0.984126984 | 1.944206605 |
| 3 | {Active_Hit = vehicle, Positive_Hit = vehicle, Collision_Type = scrape} | {Casualties = N} | 0.051808635 | 0.969432314 | 1.915176333 |
| 4 | {Weather = cloudy, Active_Hit = vehicle, Positive_Hit = vehicle, Road_section = Prosperous} | {Casualties = N} | 0.056476079 | 0.968 | 1.912346704 |
| 5 | {Weather = cloudy, Active_Hit = vehicle, Positive_Hit = vehicle} | {Casualties = N} | 0.078879813 | 0.965714286 | 1.907831127 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Du, L.; Huang, F.; Lu, H.; Chen, S.; Guo, Q. An Association Rule Mining-Based Modeling Framework for Characterizing Urban Road Traffic Accidents. Sustainability 2024, 16, 10597. https://doi.org/10.3390/su162310597