Applying Data Mining Approaches for Analyzing Hazardous Materials Transportation Accidents on Different Types of Roads

Abstract: With the increase in the demand for and transportation of hazardous materials (Hazmat), frequent Hazmat road transport accidents, high death tolls and property damage have caused widespread societal concern. Therefore, it is necessary to carry out risk factor analysis of Hazmat transportation; predict the severity of accidents; and develop targeted, extensive and refined preventive measures to guarantee the safety of Hazmat road transportation. Based on the philosophy of graded risk management, this study used the Apriori algorithm in association rule mining (ARM) technology to analyze Hazmat transport accidents, using road types as classification criteria to find rules that had strong associations with property-damage-only (PDO) accidents and casualty (CAS) accidents under different road types. The results indicated that, when urban roads were used for Hazmat transportation, accidents involving PDO had a strong association with weather (WEA), traffic signals (TS), surface conditions (SC), fatigue (FAT) and vehicle safety status (VSS), and accidents involving CAS had a strong association with VSS, equipment safety status (ESS), time of day (TOD) and WEA. Among Hazmat transport incidents on rural roads, the incidence of PDO accidents was associated with intersections (IN), SC, WEA, vehicle type (VT), and segment type (ST), while the occurrence of CAS accidents was associated with qualification (QUA), ESS, TS, VSS, SC, WEA, TOD, and month (MON). For Hazmat road transport accidents on highways, strong associations were identified between the occurrence of PDO accidents and related items, such as IN, SC, WEA and FAT, and between the occurrence of CAS accidents and related items, such as ESS, TOD, VSS, WEA and SC. The accident characteristics exemplified by strongly correlated rules were used as the input to the prediction model.
Considering the scarcity of these events, four prediction models were selected to predict the severity of Hazmat accidents on each road type employing four analyses, and the most suitable prediction model was determined based on the evaluation criteria. The results showed that extreme gradient boosting (XGBoost) is preferable for predicting the severity of Hazmat accidents occurring on urban roads and highways, while nearest neighbor classification (NNC) is more suitable for predicting the severity of Hazmat accidents occurring on rural roads.


Introduction
China has become the world's largest producer and seller of chemicals, and the accompanying logistics have also increased rapidly with the booming development of production, sales and related activities. Due to the uneven geographical distribution of product supply and product demand in China's industries, approximately 95% of hazardous materials (Hazmat) in China must be transported off-site [1]. Due to policy constraints, geographical differences, and nonuniform technical conditions, information systems are not interoperable, and railroads, waterways and other modes of transport are not fully inferences about the influence of the factors of interest [17]. Machine learning approaches are adaptable to processing outliers, missing data, and noisy data and are versatile, requiring no or few previous assumptions about input variables [18][19][20][21][22][23][24][25][26]. These methods can effectively solve the problems associated with the above statistical methods and achieve more accurate predictions of accident severity [17]. Huting et al. [27] used the random forest model to identify factors that affected the probability of a responsible bus accident in the Minneapolis-Saint Paul, Minnesota, metropolitan area. They found that bus drivers are at greater risk toward the middle of their shift, especially when in dense traffic. Yassin et al. [28] used a hybrid k-means and random forest algorithm approach to road accident prediction and model interpretation. They found that driver experience and day, light condition, driver age, and service year of the vehicle were the decisive contributing factors for serious injury, light injury, and fatal severity, respectively. Harb et al. [29] investigated the features of drivers, vehicles, and settings associated with accident avoidance strategies. Additionally, the random forests approach was used to prioritize the drivers, vehicles, and environmental variables of accident avoidance operations. 
They discovered that obstructions to drivers' sight, physical disability, and attention were all connected with collision avoidance actions during incidents. Additionally, the speed limit was connected with avoidance movements for rear-end crashes, and vehicle type was associated with avoidance efforts for head-on and angle collisions. Lv et al. [30] investigated how to identify the traffic accident potential by using the k-nearest neighbor method with real-time traffic data and found that the k-nearest neighbor method outperformed the conventional c-means clustering method. An investigation by Ma et al. [31] of the 3146 traffic deaths in Los Angeles between 2010 and 2012, using a methodological framework of XGBoost and grid analysis, revealed the eight most essential elements that contributed to the fatalities. Drunk driving, partying, rear-end crashes, poor illumination, pedestrian contact, motorcycle contact, the day of the week, and the hour of the day were the most significant influences, in that order. Soleimani et al. [32] utilized XGBoost to determine the relative importance of crossing closure criteria using accidents data from 18,485 road-rail grade crossings in the United States. The model's accuracy was 0.991, which was higher than that of decision trees and random forests. Parsa et al. [33] applied XGBoost and Shapley Additive exPlanations (SHAP) for real-time accident detection and characterization. The findings indicated that XGBoost could reliably detect accidents with a 99% detection rate, 79% accuracy rate, and a 0.16% false alarm rate. Additionally, it was suggested that speed, population, network, land use, and weather conditions all substantially affected the likelihood of accidents.
However, since machine learning methods are 'black box' approaches, the analysis and prediction of severity classification often lack a direct and clear interpretation of accident severity and related variables [34]. In contrast, the association rule mining algorithm, as an unsupervised algorithm that does not rely on any assumptions or a priori knowledge to discover hidden but meaningful connections in a dataset, can discover the associations between different accident characteristics, including their severity [35][36][37]. This data mining methodology has been identified as a potential decision support tool for traffic safety engineers [38][39][40][41]. Montella et al. [42] investigated the contributory crash factors in 15 urban roundabouts located in Italy and studied the interdependences between these factors. They identified numerous contributory factors related to road and environment deficiencies but unrelated to the road user or the vehicle. Das et al. [43] adopted an association rule mining method to investigate driver lane-keeping ability in foggy weather conditions. Their study indicated that affected visibility, male drivers, a higher number of lanes, and the presence of horizontal curves were associated with poor lane-keeping performance in several rules. Langford et al. [44] utilized an unsupervised association mining approach to uncover trends in a database of vehicle-pedestrian collisions. They discovered that highlighting traffic illumination helped to mitigate the severity of pedestrian accidents. Xu et al. [45] used the association rule mining approach to find sets of accident contributing elements that were often found together in significant casualty collisions. They reported a complicated connection between road user behavior, vehicle parameters, road geometry qualities, and environmental elements that lead to significant casualty collisions. Yu et al. [46] used the Apriori algorithm to find significant correlations between crash severity and crash-related parameters. The created rules showed that male drivers aged 29 are more likely to be engaged in fatal incidents on non-separable roads, while property damage crashes are more likely to occur in towns.
Despite these findings, there is still a lack of studies that use data mining technologies to uncover the hidden correlations in Hazmat road transport accident-related datasets. In light of this gap, a primary objective of this study is to apply the association rule mining (ARM) approach to extensively explore the characteristics and contributing factors of Hazmat road transport accidents that occur on different kinds of roads. At the same time, multiple prediction models are evaluated to determine the best severity prediction model for accidents occurring on different road types. The findings of this research will aid in the complete understanding of basic patterns of Hazmat road transport accidents on various road types to target and guide policy and decision-making initiatives to enhance the safety of Hazmat road transport.

Association Rule Mining
ARM is a typical unsupervised learning technique that uses data mining ideas to uncover hidden correlations between variables in a database [36]. Its functions include discovering frequent itemsets and discovering association rules, and its process is composed of the following two steps:
(1) The frequent itemset mining method is used to find all the frequent itemsets.
(2) Strong association rules are produced according to the obtained frequent itemsets.

Apriori Algorithm
The Apriori algorithm is a classic data mining algorithm that follows the a priori principle; that is, if an itemset is an infrequent itemset, then all its supersets are also infrequent itemsets, and if a rule does not have a strong association relationship, then all the subsets of the rule also do not have a strong association relationship. This approach can avoid the calculations caused by infrequent candidate itemsets. After several passes over the dataset, multiple robust candidate itemsets and multiple strongly correlated rules can be generated [37].
The process of determining the frequent itemsets with the Apriori algorithm is shown in Figure 1. $C_1, C_2, \cdots, C_k, \cdots, C_K$ denote the candidate 1-itemsets, 2-itemsets, ..., $k$-itemsets, ..., $K$-itemsets, respectively. $L_1, L_2, \cdots, L_k, \cdots, L_K$ denote the corresponding frequent itemsets with $k$ items. Scan represents the dataset scanning function, which filters the itemsets by the set minimum support and discards those that do not meet it. The remaining itemsets that meet the requirement constitute the set $L_k$. The different frequent $k$-itemsets are then combined into the candidate $(k+1)$-itemsets.
After determining the frequent itemsets, the association rule mining criteria are used to find strong association relationships. The process is as follows. First, we start with a frequent itemset, create a list of rules with only one element on the right-hand side, and then calculate those rules' confidence and lift values. Next, the remaining rules are merged to create a new list of rules with two elements on the right-hand side of the rule, and the confidence and lift values of those rules are calculated. This step is repeated by adding elements to the rule's right-hand side, iterating through all the rules, and finally selecting the rules that satisfy the threshold.
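The frequent itemset search described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used in the study: the function name `apriori`, the toy transactions and the 0.5 support threshold are all assumptions made for the example, and the item labels merely imitate the coding style used later in the paper.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (illustrative sketch)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset
        return sum(itemset <= t for t in transactions) / n

    # C1 -> L1: scan all single items and keep the frequent ones
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}
    current = {c: support(c) for c in current}
    current = {c: s for c, s in current.items() if s >= min_support}
    frequent = dict(current)

    k = 1
    while current:
        # Join: combine frequent k-itemsets into candidate (k+1)-itemsets
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k + 1}
        # Prune: by the a priori principle, every k-subset must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k))}
        # Scan: keep only candidates that reach the minimum support
        current = {c: support(c) for c in candidates}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy data: four coded accident records
transactions = [
    {"WEA-1", "SC-1", "Severity-PDO"},
    {"WEA-1", "SC-1", "Severity-PDO"},
    {"WEA-1", "SC-0", "Severity-CAS"},
    {"WEA-0", "SC-0", "Severity-CAS"},
]
frequent = apriori(transactions, min_support=0.5)
```

On this toy input the infrequent item WEA-0 is discarded in the first scan, so no candidate containing it is ever generated, which is the saving that the a priori principle provides.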

Association Rule Assessment Criteria
Support, confidence and lift are commonly used assessment metrics for frequent itemsets and strong association rules. In the Hazmat road transport accident dataset $D$, a rule is defined as an implication of the form $X \rightarrow Y$ between two itemsets $X$ (the antecedent) and $Y$ (the consequent) that satisfy the requirements $X, Y \subseteq I$ and $X \cap Y = \emptyset$, where $I$ is the set of all items.
The support of the rule is the probability that $X$ and $Y$ hold together among all the possible presented cases. Support can be mathematically defined as shown in Equation (1).
$$\mathrm{Support}(X \rightarrow Y) = \frac{|X \cup Y|}{|D|} \tag{1}$$
where $|X \cup Y|$ is the number of times both itemsets $X$ and $Y$ occur together and $|D|$ is the number of records in the accident database.
The confidence of the rule is the conditional probability that the consequent $Y$ is true under the condition of the antecedent $X$, as defined in Equation (2).
$$\mathrm{Confidence}(X \rightarrow Y) = \frac{|X \cup Y|}{|X|} \tag{2}$$
where $|X|$ denotes the number of occurrences of itemset $X$, and $|X \cup Y|$ denotes the number of occurrences of both the $X$ and $Y$ itemsets. The lift takes into account how much the likelihood of occurrence of $Y$ varies as a result of $X$. Equation (3) may be used to compute the lift value.
$$\mathrm{Lift}(X \rightarrow Y) = \frac{\mathrm{Confidence}(X \rightarrow Y)}{\mathrm{Support}(Y)} = \frac{|X \cup Y| \cdot |D|}{|X| \cdot |Y|} \tag{3}$$
Lift = 1 indicates no correlation between the antecedent and consequent, Lift > 1 indicates a positive correlation between the antecedent and consequent, and Lift < 1 indicates a negative correlation between the antecedent and consequent.
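To make the three metrics concrete, the following sketch computes them for a single rule by direct counting. Writing X for the antecedent, Y for the consequent and D for the set of accident records, support is the share of records containing both X and Y, confidence is that count divided by the number of records containing X, and lift is confidence divided by the share of records containing Y. The function name `rule_metrics` and the toy records are assumptions made for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    X, Y = set(antecedent), set(consequent)
    n = len(transactions)
    n_x = sum(X <= t for t in transactions)         # records containing X
    n_y = sum(Y <= t for t in transactions)         # records containing Y
    n_xy = sum((X | Y) <= t for t in transactions)  # records containing X and Y
    support = n_xy / n
    confidence = n_xy / n_x      # assumes X occurs at least once
    lift = confidence / (n_y / n)  # assumes Y occurs at least once
    return support, confidence, lift

# Hypothetical toy data: four coded accident records
records = [
    {"WEA-1", "SC-1", "Severity-PDO"},
    {"WEA-1", "SC-1", "Severity-PDO"},
    {"WEA-1", "SC-0", "Severity-CAS"},
    {"WEA-0", "SC-0", "Severity-CAS"},
]
metrics = rule_metrics(records, {"WEA-1", "SC-1"}, {"Severity-PDO"})
```

Here the rule holds in 2 of 4 records (support 0.5), in every record that contains the antecedent (confidence 1.0), and PDO occurs in half of all records, so the lift is 1.0 / 0.5 = 2.0: in this toy sample PDO is twice as likely under the antecedent as on average.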

Ordinal Logit (OL)
Ordered logit models are derived from econometric models and are among the common models used to perform ordered discrete data analysis and forecasting [16]. These models map the latent, difficult-to-observe, continuous variable $y^*$ into an observable ordered variable $y$ to represent the severity propensity, and $y^*$ and $y$ are related by Equation (4).
$$y = j \quad \text{if} \quad \tau_{j-1} < y^* \le \tau_j, \qquad j = 1, \ldots, J \tag{4}$$
where $\tau = (\tau_0, \tau_1, \cdots, \tau_j, \cdots, \tau_J)$ denotes the set of accident severity grading points. Accident severity is represented by the ordered variable $y$, and the various characteristics affecting accident severity are represented by $X$. The general form of the model is $y_i^* = \beta X_i + \varepsilon_i$,
where $X_i = (x_{i1}, x_{i2}, \ldots, x_{ik}, \ldots, x_{iK})$, $i = 1, \ldots, n$, $k = 1, \ldots, K$, is the vector of accident severity influencing factors; $\beta = (\beta_1, \beta_2, \ldots, \beta_K)$ is the vector of parameters corresponding to the influencing factors; $x_{ik}$ is the observed value of the $k$th influencing factor of the $i$th accident; $n$ is the total number of accident samples; $K$ is the number of influencing factors for each accident; and $\varepsilon_i$ is the random error term, which is the sum of other factors that are difficult to observe but have an impact on the severity of the accident.
In the ordered logit model, $\varepsilon$ obeys the Gumbel distribution, its probability density function is $f(\varepsilon)$, its cumulative distribution function is $F(\varepsilon)$, and $E(\varepsilon) = 0$.
From Equation (4) and the general form of the model, it can be derived that the probability of the $i$th accident being of severity $j$ is
$$P(y_i = j) = F(\tau_j - \beta X_i) - F(\tau_{j-1} - \beta X_i)$$
where the $i$th accident occurrence ratio (odds) is
$$\frac{P(y_i \le j)}{1 - P(y_i \le j)} = \frac{F(\tau_j - \beta X_i)}{1 - F(\tau_j - \beta X_i)}$$
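The way the grading points turn a linear predictor into severity probabilities can be illustrated numerically. The sketch below assumes the standard logistic cumulative distribution function, which is the conventional choice for an ordered logit model; that choice, the function names and the example values are assumptions for illustration and are not taken from the fitted model in this study.

```python
import math

def logistic_cdf(z):
    # Standard logistic CDF (assumed form of the error distribution)
    return 1.0 / (1.0 + math.exp(-z))

def ordered_logit_probs(linear_predictor, thresholds):
    """Probabilities of each ordered severity level.

    linear_predictor: the value of beta * X for one accident
    thresholds: increasing interior grading points; the outermost
                grading points are taken as -inf and +inf
    """
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    cdf = []
    for cut in cuts:
        if cut == -math.inf:
            cdf.append(0.0)
        elif cut == math.inf:
            cdf.append(1.0)
        else:
            cdf.append(logistic_cdf(cut - linear_predictor))
    # P(y = j) is the difference of the CDF at consecutive grading points
    return [cdf[j] - cdf[j - 1] for j in range(1, len(cdf))]

# Two severity levels (PDO, CAS) need a single interior grading point
probs_neutral = ordered_logit_probs(0.0, [0.0])
probs_shifted = ordered_logit_probs(1.5, [0.0])
```

With two levels the model reduces to a binary logit: a linear predictor equal to the grading point gives equal probabilities, and a larger linear predictor moves probability mass toward the higher severity level.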

Nearest Neighbor Classification (NNC)
NNC, sometimes referred to as the k nearest neighbors method, classifies an observation of interest by examining the closest k observations; if the majority of these k instances belong to a specific class, then the new observation is assigned to that class. Its essential elements are the k value [47], the distance between two instances in the feature space [48], and the classification decision rule. The value of k is chosen by starting from k = 1 and gradually increasing it, and the final value is determined according to the classification performance. The distance calculation method is chosen according to the application scenario and the characteristics of the data; the Euclidean distance and the Manhattan distance are the most common choices [49]. The classification decision rule is generally the majority voting rule, that is, the category held by the majority of the k neighbors is assigned to the test sample.
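The three elements named above (the k value, the distance measure and the majority voting rule) map directly onto a short sketch. The function names, the two-dimensional toy features and the labels are assumptions for illustration; the study's own features are the coded accident characteristics.

```python
from collections import Counter

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training points by distance to the query and keep the k closest
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    neighbors = ranked[:k]
    # Majority voting rule: the most frequent label among the neighbors wins
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data with two well-separated groups
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
train_y = ["PDO", "PDO", "PDO", "CAS", "CAS", "CAS"]

pred_a = knn_predict(train_X, train_y, (1, 1), k=3)
pred_b = knn_predict(train_X, train_y, (5, 4), k=3, distance=manhattan)
```

In practice k would be tuned as the text describes, by increasing it from 1 and comparing the classification performance on held-out data.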

Random Forests (RF)
The core of the RF algorithm is to construct multiple mutually independent evaluators and then to combine their predictions, by averaging or by majority vote, to decide the final result. The primary computational process consists of three parts: sample set selection, decision tree construction, and decision tree combination [50].
(1) Sample set selection. From an original training set containing n samples, K rounds of data extraction are performed. In each round, samples are drawn at random one at a time, and each sample is put back into the original training set before the next one is drawn, until n samples have been collected. This finally yields K datasets, each as large as the original training set. Because the sampling is random, each sampled dataset differs from the original dataset and from the other sampled datasets.
(2) Decision tree construction. The core problem of a decision tree is to find the right features on which to make judgments, that is, how to branch. When each sample has M attributes and a node of the decision tree requires splitting, m attributes are randomly chosen from these M attributes such that m ≪ M. Then, using some criterion (Gini coefficient or information gain), one of these m attributes is chosen as the node's splitting attribute. Splitting continues until no more splitting is possible.
(3) Decision tree combination. Because each decision tree in this research is independent, all trees are treated as equally important: in the RF combination phase, the weight of each decision tree is equal, and all of the decision trees vote on the final classification outcome.
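The sampling and combination steps can be sketched without building the trees themselves. The helper names below (`bootstrap_sample`, `random_feature_subset`, `majority_vote`) and the toy inputs are assumptions for illustration; a full random forest would fit one decision tree per bootstrap sample between these steps.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Step (1): draw len(data) samples with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

def random_feature_subset(n_features, m, rng):
    """Step (2): pick m of the M feature indices to consider at one split."""
    return rng.sample(range(n_features), m)

def majority_vote(predictions):
    """Step (3): every tree has equal weight; the most common label wins."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
data = list(range(10))            # stand-in for 10 training records
sample = bootstrap_sample(data, rng)
features = random_feature_subset(n_features=12, m=3, rng=rng)
final_label = majority_vote(["CAS", "PDO", "CAS"])
```

Because sampling is done with replacement, a bootstrap sample has the same size as the original set but usually repeats some records and omits others, which is what makes the individual trees differ.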
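
Extreme Gradient Boosting (XGBoost)
The objective function of XGBoost combines a loss term with a term that penalizes model complexity, as expressed in Equation (5).
$$Obj = \sum_{i=1}^{m} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \tag{5}$$
where $i$ is the $i$th sample in the dataset, $m$ is the total amount of data imported into the $k$th tree, and $K$ is the number of all trees created. When only $t$ trees have been created, the complexity term should be $\sum_{k=1}^{t} \Omega(f_k)$.
$y_i$ is the actual label, $\hat{y}_i$ is the predicted value, and $\Omega$ is a function that determines the tree model's complexity based on the tree's structure.
When $t$ trees are created, the predicted value $\hat{y}_i^{(t)}$ in the traditional loss function is expressed as Equation (6).
$$\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i) \tag{6}$$
As a result, the classic loss function is connected to all well-established trees. $\hat{y}_i^{(t)}$ stores the outcomes of all tree iterations, making a direct connection between the tree's structure and the model effect. The objective function is expressed as Equation (7).
$$Obj^{(t)} = \sum_{i=1}^{m} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \sum_{k=1}^{t-1} \Omega(f_k) + \Omega(f_t) \tag{7}$$
Using Taylor's formula as a guide, the objective function may be expressed as shown in Equation (8) after expansion.
$$Obj^{(t)} \approx \sum_{i=1}^{m} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \sum_{k=1}^{t-1} \Omega(f_k) + \Omega(f_t) \tag{8}$$
where $g_i$ and $h_i$ are the first- and second-order derivatives of the loss function $l(y_i, \hat{y}_i^{(t-1)})$ over $\hat{y}_i^{(t-1)}$, respectively. The constant terms are irrelevant to the result of the $t$th iteration, so the constant terms $l(y_i, \hat{y}_i^{(t-1)})$ and $\sum_{k=1}^{t-1} \Omega(f_k)$ are removed from the objective function. The objective function is expressed as Equation (9).
$$Obj^{(t)} = \sum_{i=1}^{m} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \tag{9}$$
The structure of the tree is redefined according to Equation (10).
$$f_t(x_i) = w_{q(x_i)} \tag{10}$$
where $q(x_i)$ is the leaf node where sample $i$ is located, and $w_{q(x_i)}$ is the score obtained by this sample falling in the $q(x_i)$ leaf node of the $t$th tree.
If a tree has a total of $T$ leaf nodes, each with an index of $j$, the weight of the samples in the leaf nodes is $w_j$. Equation (11) describes the complexity of the model $\Omega(f)$.
$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \tag{11}$$
where $\gamma$ and $\lambda$ are regularization coefficients.
The objective function may be turned into Equation (12) by including the tree's structure into the loss function and specifying the set of samples stored on a leaf with index $j$ as $I_j$.
$$Obj^{(t)} = \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) w_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) w_j^2 \right] + \gamma T \tag{12}$$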
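One further step, standard in the gradient boosting literature although not written out here, shows why Equation (12) is convenient. The notation is an assumption for this sketch: $g_i$ and $h_i$ are the first- and second-order derivatives of the loss for sample $i$, $I_j$ is the set of samples on leaf $j$, $w_j$ is the leaf weight, $T$ is the number of leaves, and $\gamma$ and $\lambda$ are the regularization coefficients of the complexity term. Writing $G_j$ and $H_j$ for the leaf-wise sums, the objective is a separate quadratic in each $w_j$ and can be minimized in closed form:

```latex
% Leaf-wise sums of the first- and second-order derivatives
G_j = \sum_{i \in I_j} g_i, \qquad H_j = \sum_{i \in I_j} h_i

% Objective as a sum of independent quadratics in the leaf weights
Obj^{(t)} = \sum_{j=1}^{T} \left[ G_j w_j + \tfrac{1}{2} (H_j + \lambda) w_j^2 \right] + \gamma T

% Setting the derivative with respect to w_j to zero
\frac{\partial\, Obj^{(t)}}{\partial w_j} = G_j + (H_j + \lambda) w_j = 0
\;\;\Longrightarrow\;\;
w_j^{*} = -\frac{G_j}{H_j + \lambda}

% Substituting w_j^* back gives the score of a fixed tree structure
Obj^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T
```

The resulting value depends only on the tree structure, so it can be used to compare candidate splits: a smaller value indicates a better structure.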

Predictive Performance Evaluation Indexes
The confusion matrix is a special kind of table that is used to visualize an algorithm's performance. Table 1 illustrates the confusion matrix for a two-class classifier, where TN represents the number of correct predictions that an instance is negative, FP represents the number of incorrect predictions that an instance is positive, FN represents the number of incorrect predictions that an instance is negative, and TP represents the number of correct predictions that an instance is positive.
While the optimal outcome is to achieve a high overall model prediction accuracy, greater preference is given to the prediction of CAS accidents; that is, it is more desirable to capture the occurrence of the minority category of accidents. Additionally, the influence of the imbalance of sample categories in the actual accident data on the index results should be eliminated. Therefore, three indexes were chosen [52]: accuracy, which evaluates the overall effectiveness of the model; recall, which captures a particular category; and the area under the receiver operating characteristics (ROC) curve (AUC), which equalizes the impact of the sample imbalance on the index results.
Accuracy is the proportion of all correctly judged results, as shown in Equation (13).
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{13}$$
Recall is the probability of being predicted as a positive sample out of the actual positive samples, as shown in Equation (14).
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{14}$$
FPR is the proportion of false positive prediction values within the sum of true negative and false positive values, as shown in Equation (15).
$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{15}$$
When the distribution of positive and negative samples in the test set changes, the ROC curve, with the TPR as the y-axis and the FPR as the x-axis, can be kept constant; the higher the TPR (Recall) and the smaller the FPR, the more efficient the model and algorithm. From a geometric point of view, the larger the AUC is, the better the model, so the AUC can be used as a metric measuring the reliability of the algorithm and the model.
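The indexes can be checked by hand on a small example. The sketch below computes accuracy, recall and FPR from the four confusion matrix counts, and computes the AUC through its rank interpretation (the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half). The function names and all numbers are assumptions made for illustration, not results from this study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, recall (TPR) and false positive rate from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)   # share of actual positives that were captured
    fpr = fp / (fp + tn)      # share of actual negatives flagged as positive
    return accuracy, recall, fpr

def auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical counts: 60 actual positives (CAS) and 40 actual negatives (PDO)
accuracy, recall, fpr = confusion_metrics(tp=40, fp=10, tn=30, fn=20)

# Hypothetical scores for four test samples (1 = CAS, 0 = PDO)
area = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In the second example, three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75; the pairwise form is quadratic in the sample size and is intended only to show the definition.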

Data Sources
In this paper, we selected 900 accidents resulting from the transportation of Hazmat by road between 2016 and 2020; these data were obtained from the Hazardous Chemicals Registration Center of the Ministry of Emergency Management of China. After screening and integration, the final data used for analysis included 862 accident cases, mainly involving attributes such as accident casualties, driver attributes, vehicle attributes, road attributes, environmental attributes, and Hazmat types. According to the road type where each accident occurred, the accidents were divided into three markedly different groups (rural road, urban road and highway) and analyzed separately; these groups accounted for 11.14%, 23.43%, and 65.43% of the total number of accidents, respectively. Depending on the casualties of the accidents, the accident severities were divided into property-damage-only (PDO) and casualty (CAS) categories, accounting for 43.97% and 56.03% of the total number of accidents, respectively. To facilitate the modeling and analysis of the data, the accident characteristics need to be coded. The statistical results after feature coding are shown in Table 2.
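As an illustration of how coded records can be prepared for rule mining, the sketch below turns each record into a set of items of the form FEATURE-code (the style used for items such as WEA-1 and Severity-PDO in the rules reported later) and groups the records by road type. The field name `ROAD`, the helper names and the example records are hypothetical; the actual coding scheme is the one given in Table 2.

```python
def record_to_items(record):
    """Turn one coded accident record into a set of 'FEATURE-code' items."""
    return {f"{feature}-{code}" for feature, code in record.items()}

def split_by_road_type(records, road_key="ROAD"):
    """Group item sets by road type so each group can be mined separately."""
    groups = {}
    for record in records:
        # The road type defines the group and is not used as an item itself
        features = {k: v for k, v in record.items() if k != road_key}
        groups.setdefault(record[road_key], []).append(record_to_items(features))
    return groups

# Hypothetical coded records
coded_records = [
    {"ROAD": "urban", "WEA": 1, "SC": 1, "Severity": "PDO"},
    {"ROAD": "rural", "WEA": 1, "SC": 0, "Severity": "CAS"},
    {"ROAD": "urban", "WEA": 0, "SC": 0, "Severity": "CAS"},
]
by_road = split_by_road_type(coded_records)
```

Each group returned by `split_by_road_type` is a list of item sets, which is the transaction format that association rule mining expects.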

Association Rule Mining
To arrive at significant results, it is critical to calibrate the minimum support and confidence levels. Defining proper cutoff points will result in the discovery of novel rules. A trial-and-error approach using iterative support and confidence combinations was utilized to develop a fair set of thresholds for each of the road types investigated. Then, using the lift values, itemsets with a high association to accident severity were retrieved. Higher lift values indicate a stronger link between the rule's consequent, or right-side item (RSI or Y), and the rule's antecedent, or left-side item (LSI or X).

Urban Roads
The minimum support, confidence and lift thresholds were defined as 0.3, 0.9, and 1.1, respectively. A total of 50 rules were generated using accident severity as the consequent. The top ten rules in descending order of lift values for different severity levels were selected and are presented in Table 3. Figure 2 shows the relationship between each antecedent and consequent.
(1) PDO Accidents.
As shown in Table 3, the occurrence of PDO accidents had a strong association with WEA, TS, SC, FAT and VSS. The highest lift value is 2.059 for the LSI term X {WEA-1, TS-1, SC-1}, which indicates that the probability of PDO accidents occurring under clear weather, dry road surface and up to standard road traffic signs is 2.059 times that of the average occurrence of PDO accidents on urban roads. This means that clear weather, a good road surface environment and standard sign markings in the city have certain helpful effects on reducing the severity of accidents. These benefits may exist because clear weather provides drivers with a clear view and a better grasp of the surrounding environment [43]; the dry road surface ensures that there is enough friction between the vehicle and the road surface, which can balance with the large inertia force of the heavy-duty Hazmat transport vehicle and allows the driver to control the vehicle better when danger occurs; and the presence of sign markings regulates the behavior of road users, controls the speed of motor vehicles [29], and effectively separates pedestrians, nonmotorized vehicles and motor vehicles, reducing the possibility of other road participants being involved in accidents and increasing the possibility of escape from Hazmat subaccidents.
(2) CAS Accidents.
As shown in Table 3, the occurrence of CAS accidents showed a higher propensity to be linked to VSS, ESS, TOD, WEA and QUA. The highest lift value is 2.276, for the rule {VSS-1, ESS-1, TOD-1} → {Severity-CAS}. This finding indicates that the probability of CAS accidents occurring at 1-3 a.m. with transport vehicles whose loading equipment and vehicle technology are in good condition is 2.276 times that of the average occurrence of CAS accidents on urban roads. This means that although the vehicles entering the city and their loading equipment are in great technical condition, the probability of causing casualties in accidents that occur in the early morning hours is still high. The reason for this may be that urban transport management applies strict access standards to Hazmat transport vehicles, so the technical condition of the vehicles that are admitted is relatively good [15]. Meanwhile, urban areas have strict requirements on the access times and roadways of Hazmat transport vehicles, and access is concentrated between 23:00 and 5:00. According to human physiological characteristics, in the early morning hours, individuals are prone to fatigue and sleepiness, and the ability to accurately evaluate the driving environment and handle risk correctly is reduced [14]. In addition, because there are fewer road users and law enforcement officers during the night, drivers may engage in illegal driving, hit-and-run and other dangerous behaviors.
(3) Proposals to Improve Safety in Hazmat Transport on Urban Roads.
To improve the safety of Hazmat road transport on urban roads, the following approaches should be taken into consideration. Law enforcement departments should increase supervision, enforcement, and accident tracking while increasing the cost of violations to eliminate unsafe driver behaviors. Road units should be used with increased investment in science, technology and personnel to provide timely detection and effective handling of dangerous road surface environments according to three aspects: initial forecasts (weather forecasts, event monitoring and regular analysis), timely warnings (information dissemination, extensive channels and directed push), and active interventions (road control, variable information and on-site command). It is also important to set standardized signs and markings [46]. Transportation companies should conduct psychological tests for drivers to avoid hiring aggressive and dangerous drivers. Specialized departments and transport companies should also conduct regular emergency rescue training and drills for Hazmat transport accidents.

Rural Roads
Minimum support, confidence, and lift thresholds of 0.2, 0.80, and 1.1, respectively, were specified. For accident severity, a total of 67 rules were produced. Among these, the top ten rules ranked by lift value for each severity level were chosen and are given in Table 4. The link between each antecedent and consequent is shown in Figure 3.
(1) PDO Accidents.
As shown in Table 4, the features with strong association rules with the occurrence of PDO accidents were IN, SC, WEA, VT, and ST. The highest lift value is 1.943, for the rule {IN-0, SC-1, WEA-1} → {Severity-PDO}. This rule signifies that the probability of PDO accidents occurring at nonintersections with clear weather and dry road surface environments is 1.943 times that of the average occurrence of PDO accidents on rural roads. This implies that, in clear weather and under good road surface conditions, the probability of a serious accident is higher at an intersection. The reasons for this phenomenon include the following: (1) Hazmat transport vehicles are mostly heavy semitrailers with a high center of gravity; in the process of turning, the centrifugal force of the curve and the lateral force of the vehicle rotation on the tires increase the lateral slip force, making the vehicle susceptible to rolling over. Moreover, the large body, long wheelbase and high driver position increase the vehicle blind spots and the area of the inner wheel difference [12]; (2) Road junctions are not equipped with signal lights or other traffic signs and markings; motor vehicles, nonmotorized vehicles and pedestrians are mixed; and personnel are more concentrated; (3) The level of supervision of road transportation of Hazmat in rural areas is low, and there are many driving violations, such as running red lights and speeding at intersections.
(2) CAS Accidents.
We identified a strong association between the occurrence of CAS accidents and related items such as QUA, ESS, TS, VSS, SC, WEA, TOD and MON, as shown in Table 4. The rule with the highest lift value of 2.432 is {QUA-0, TOD-1, VSS-1} → {Severity-CAS}. This rule demonstrates that the probability of CAS accidents occurring in the early morning hours and involving drivers who are not qualified to drive tankers transporting Hazmat is 2.432 times that of the average occurrence of CAS accidents on rural roads. This means that driver qualification, accident time, and vehicle type significantly influence whether the accident will cause casualties. Possible reasons are mainly that rural areas have inadequate supervision over front-line transportation and Hazmat transportation enterprises and lack long-term management mechanisms. Some enterprises that have not obtained Hazmat transport qualifications attempt to avoid supervision by choosing rural roads. Drivers who are not qualified for transportation have insufficient knowledge of the physical and chemical characteristics of Hazmat, transportation requirements, precautions, rescue measures, and so forth. Moreover, their awareness of safety and legal systems is weak. Vehicles without transport qualifications do not meet the requirements for vehicle stability, braking, tank pressure resistance and impact, making them susceptible to leakage, fire or explosions. Road lights in rural areas are not well configured, so driving visibility is poor in the early morning [28], and drivers are prone to fatigue, leading to a decrease in the perception of the surrounding environment and the ability to perform driving operations. The physical and chemical properties of different Hazmat differ greatly from each other, and the consequences of an accident are diverse and complex. Rescue work is highly professional and difficult to perform, requiring coordination with relevant departments to scientifically configure emergency rescue resources and equipment. However, a lack of resources for emergency treatment exists in rural areas, often resulting in missing the best time for disposal due to the lengthy delivery time. In addition, limited medical care in rural areas makes emergency medical assistance difficult, so the best time to treat an injury may be missed and the injury may worsen.
The lift value of rule {MON-10, WEA-1, SC-1} → {Severity-CAS} is also 2.432, which is interpreted as the probability of CAS accidents in October, when the weather is sunny and the road surface is dry, is 2.432 times higher than the average rate of CAS accidents on rural roads. This means that month, road surface conditions and weather conditions are strongly correlated with the occurrence of casualties in rural road accidents [43]. The possible reason for the above phenomenon is that October is the autumn harvest season, and roads with good road surface conditions are illegally occupied by farmers for grain drying in sunny weather. At this time, the flying chaff seriously affects driver and pedestrian vision; the surface of the rounded grain and smooth straw reduces the stability of the vehicle; and the contact of straw with the vehicle is likely to induce mechanical failure of the vehicle and can even ignite Hazmat in the process of friction, causing a fire or explosion and seriously affecting the safety of road traffic.
(3) Proposals to Improve Safety in Hazmat Transport on Rural Roads.
Additional mobile inspection stations for Hazmat should be set up at appropriate locations on rural roads to increase on-site supervision of Hazmat transport in rural areas. Under government leadership, several departments should jointly crack down on the unlicensed transport of Hazmat, strengthen management at the source, and establish a long-term management mechanism. It is crucial to increase the number of streetlights and optimize traffic signal devices at intersections to improve the technical conditions of rural roads [53]. Observation windows should be fitted into the passenger-side doors of heavy vehicles, and these vehicles should be equipped with other side assistance systems, such as blind spot cameras and radar, to reduce the impact of visual blind spots on transport safety. By linking transport enterprises and regulatory units, a whole-process transport supervision system can be established. Digital registration, intelligent query and route management covering Hazmat, driver and vehicle information make it possible to supervise and analyze the behavior of drivers and escort personnel, the state of the Hazmat, and the state of vehicles and loading equipment, ensuring safety throughout the transportation process. Access standards for Hazmat transportation drivers can be improved by including driving skills, risk avoidance skills, and risk awareness in the audit criteria, and driver education should continue throughout drivers' professional careers. Finally, it is important to preset or optimize emergency rescue sites for Hazmat road transport accidents in rural areas and to strengthen coordination with the local public security traffic police, fire and emergency services, and medical and health departments.

Highways
The minimum support, confidence and lift thresholds were defined as 0.2, 0.77, and 1.5, respectively. A total of 77 rules were generated using accident severity as a consequence (RSI). The top ten rules in descending order of lift values for different severity levels were selected and are presented in Table 5. Figure 4 shows the relationship between each antecedent and consequent.
Table 5. Top 10 rules ranked by the lift value of each severity (highways).
No. | Association Rules | Support | Confidence | Lift
(1) PDO Accidents.
PDO accidents showed a higher propensity to be linked to IN, SC, WEA, and FAT, as shown in Table 5. The highest lift value is 2.044 for LSI {IN-0, SC-1, WEA-1}. The results reveal that the probability of PDO accidents occurring at non-intersections in clear weather with dry road surfaces is 2.044 times the average occurrence of PDO accidents on highways. In clear weather and good road surface conditions, the probability of serious accidents is higher at highway entrances and exits, especially at exits [5]. This is mainly because, in the exit diversion area, the speed difference between vehicles continuing straight and vehicles turning off is greater than the speed difference between vehicles on the general roadway. In particular, when the exit advance sign is not placed at a reasonable distance (the sign is too close to the diversion nose), the driver needs to brake and turn sharply before entering the off-ramp. Compared to ordinary vehicles, Hazmat transport vehicles are heavier and have greater inertia, which makes it difficult for them to enter the exit smoothly in a short time and thus leads to traffic accidents.
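The thresholding and ranking behind Table 5 can be sketched as follows. This is a simplified illustration under stated assumptions: it enumerates candidate antecedents by brute force instead of using the level-wise pruning of the Apriori algorithm (the resulting rule set is the same on small data), the records are hypothetical toy data, and the function name `mine_severity_rules` and its parameters are introduced here for illustration. Only the three threshold values (0.2, 0.77, 1.5) and the top-ten ranking by lift come from the text.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

# Thresholds quoted in the text for the highway data set.
MIN_SUPPORT, MIN_CONFIDENCE, MIN_LIFT = 0.2, 0.77, 1.5

Rule = Tuple[Tuple[str, ...], str, float, float, float]


def mine_severity_rules(records: List[FrozenSet[str]],
                        severity_items: Sequence[str],
                        max_len: int = 3,
                        top_n: int = 10) -> List[Rule]:
    """Return the top rules (antecedent, severity, support, confidence, lift)."""
    n = len(records)

    def supp(itemset: FrozenSet[str]) -> float:
        return sum(1 for r in records if itemset <= r) / n

    # Candidate antecedent items are all items except the severity labels.
    factors = sorted({item for r in records for item in r} - set(severity_items))
    rules: List[Rule] = []
    for target in severity_items:
        sup_c = supp(frozenset({target}))
        if sup_c == 0:
            continue
        for size in range(1, max_len + 1):
            for combo in combinations(factors, size):
                antecedent = frozenset(combo)
                sup_rule = supp(antecedent | {target})
                if sup_rule < MIN_SUPPORT:
                    continue  # also guarantees the antecedent support is non-zero
                confidence = sup_rule / supp(antecedent)
                lift = confidence / sup_c
                if confidence >= MIN_CONFIDENCE and lift >= MIN_LIFT:
                    rules.append((combo, target, sup_rule, confidence, lift))
    rules.sort(key=lambda rule: rule[4], reverse=True)  # descending lift
    return rules[:top_n]


# Hypothetical toy records (NOT the study data).
records = (
    [frozenset({"IN-0", "SC-1", "WEA-1", "Severity-PDO"})] * 3
    + [frozenset({"IN-1", "SC-1", "WEA-1", "Severity-PDO"})]
    + [frozenset({"IN-1", "SC-2", "WEA-2", "Severity-CAS"})] * 3
    + [frozenset({"IN-1", "SC-1", "WEA-2", "Severity-CAS"})] * 2
    + [frozenset({"IN-0", "SC-2", "WEA-2", "Severity-CAS"})]
)

top_rules = mine_severity_rules(records, ["Severity-PDO", "Severity-CAS"])
for antecedent, severity, sup, conf, lift in top_rules:
    print(f"{set(antecedent)} -> {severity}: "
          f"support={sup:.2f}, confidence={conf:.2f}, lift={lift:.3f}")
```

Fixing the consequent to the severity label keeps the rule set focused on accident outcomes, which mirrors the way the text restricts rules to those with accident severity on the right-hand side.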
(2) CAS Accidents.
We identified strong associations between Hazmat road transport accidents involving casualties and related items, such as ESS, TOD, VSS, WEA and SC, as shown in Table 5. The highest lift value, 2.482, is found for rule {SC-2, WEA-2, TOD-3} → {Severity-CAS}, which is interpreted as the probability of CAS accidents occurring at 7:00-9:00 a.m. on wet road surfaces being 2.482 times the average occurrence of CAS accidents on highways. This rule signifies a strong association among weather, road surface conditions, time of day and the occurrence of CAS accidents. This is mainly because of the high travel speed and large traffic flow on highways, where any change in the driving environment may create safety hazards. For example, rainfall reduces visibility on the road, impairing the driver's visual judgment, and also reduces the friction between the wheels and the ground, affecting the braking performance of the vehicle [13]. Hazmat in the fourth category can react violently on contact with water or moisture, releasing a large amount of flammable gas and heat, and may burn or explode even without an open flame. Fog diffuses and absorbs light, and the small water droplets in the air blur objects on the road, seriously hindering the driver's sight and easily causing rear-end collisions and other accidents [43]. Snow and ice affect transport safety mainly by reducing visibility and the road friction coefficient. At the same time, depending on their physical and chemical properties, certain types of Hazmat change state under high-temperature or cold conditions, which affects the safety of load-bearing equipment. Adverse weather conditions can also hamper rescue work after Hazmat transport accidents.
Furthermore, although nighttime (23:00-06:00) prohibitions have been developed and implemented for Hazmat road transport vehicles, transport companies, driven by expected profits, keep drivers on the road, which leads to driver fatigue in the early morning and a loss of accurate perception of the road environment and of the ability to deal with emergencies. At the same time, because of the inherent physical and chemical characteristics of Hazmat, leakage, fire and explosion can easily follow an accident; where a large number of vehicles are concentrated, mass death and injury can easily result.
(3) Proposals to Improve Safety in Hazmat Transport on Highways.
Suggested modifications to accident-prone exits include installing speed feedback devices, appropriately increasing the distance between exit signs and ramps, placing crash barrels in exit triangles, and establishing emergency rescue facilities and equipment storage stations for Hazmat in service areas near entrances and exits [14]. Regular hazard surveys and updates of the permitted hours for Hazmat road transport should be conducted according to regional, seasonal, and other characteristics. In addition, the following recommendations warrant further consideration: strengthening the inspection of fatigued driving at night, establishing joint and several liability between enterprises and drivers for fatigued driving, increasing the cost of noncompliance, and requiring enterprises to take primary responsibility for traffic safety. Road operators can deploy real-time weather monitoring systems and install variable speed limit signs and treble horns to set reasonable speed limits based on weather conditions and provide drivers with real-time information on the weather and road environment.

Performance of the Prediction Models
The features that strongly correlate with accident severity on each road type are used as the inputs to each prediction model, and the model outputs are evaluated against the evaluation indexes; the evaluation results are shown in Table 6. This analysis shows that XGBoost is more suitable for predicting the severity of Hazmat road transport accidents that occur on urban roads and highways, and NNC is more suitable for predicting the severity of accidents that occur on rural roads.
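The comparison step can be illustrated with a minimal sketch. The evaluation indexes are not restated in this section, so the code assumes accuracy, precision, recall and F1 score with CAS as the positive class; the model names `model_a` and `model_b`, the labels and the predictions are hypothetical placeholders and do not reproduce the values in Table 6. The function names `binary_metrics` and `best_model` are likewise introduced here for illustration.

```python
from typing import Dict, List, Tuple


def binary_metrics(y_true: List[str], y_pred: List[str],
                   positive: str = "CAS") -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def best_model(y_true: List[str], predictions: Dict[str, List[str]],
               criterion: str = "f1") -> Tuple[str, Dict[str, Dict[str, float]]]:
    """Score every model on the same test labels and pick the best by `criterion`."""
    scores = {name: binary_metrics(y_true, y_pred)
              for name, y_pred in predictions.items()}
    winner = max(scores, key=lambda name: scores[name][criterion])
    return winner, scores


# Hypothetical test labels and predictions for one road type (NOT the study results).
y_true = ["CAS", "PDO", "CAS", "PDO", "PDO", "CAS", "PDO", "PDO"]
predictions = {
    "model_a": ["CAS", "PDO", "CAS", "PDO", "CAS", "CAS", "PDO", "PDO"],
    "model_b": ["CAS", "PDO", "PDO", "PDO", "PDO", "PDO", "PDO", "PDO"],
}

winner, scores = best_model(y_true, predictions)
print(winner, scores[winner])
```

Selecting on F1 rather than accuracy is one reasonable choice when casualty accidents are scarce, because a model that rarely predicts CAS can still score well on accuracy; running the same selection separately for each road type yields one preferred model per road type.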

Conclusions
Hazmat road transport accidents occur from time to time, often causing heavy casualties, property damage and environmental damage, and the safety management of Hazmat transportation has attracted widespread public concern. Exploring the leading causes and predicting the severity of Hazmat road transport accidents, using road types as grading criteria, is meaningful for building a community with traffic safety as a priority.
The main contributions of the paper are summarized below:
(1) The use of ARM can both compensate for the negative impact of correlation between risk factors treated as independent variables in accident severity analysis and address the shortcoming that machine learning cannot provide a reasonable explanation for the antecedents and consequences of accident occurrences. This approach also provides meaningful relationship maps for factors that are strongly associated with the occurrence of accidents of different severities on different road types.
The contributory factors for accidents of different severity on different road types, explored using the Apriori algorithm, are as follows: on urban roads, PDO accidents are strongly associated with WEA, TS, SC, FAT and VSS, and CAS accidents with VSS, ESS, TOD and WEA; on rural roads, PDO accidents are strongly associated with IN, SC, WEA, VT and ST, and CAS accidents with QUA, ESS, TS, VSS, SC, WEA, TOD and MON; on highways, PDO accidents are strongly associated with IN, SC, WEA and FAT, and CAS accidents with ESS, TOD, VSS, WEA and SC.
Based on the results of the study, possible preventive measures for the safety of Hazmat road transport on different road types are as follows:
(a) To improve the safety of Hazmat road transportation in urban areas, the road administration unit needs to continuously ensure good road surface conditions. The transportation management department should improve access standards and monitoring of Hazmat transport vehicles entering urban areas. Law enforcement departments need to increase the frequency of supervision, prosecution and punishment of Hazmat transport violations at night to eliminate dangerous driver behaviors. The main consideration, however, is to avoid routing Hazmat transport vehicles through densely populated urban areas;
(b) In rural areas, the incidence and severity of Hazmat road transport accidents can be reduced by strengthening the monitoring and punishment of the illegal transport of Hazmat; improving road users' basic knowledge of traffic safety and their safety and risk awareness; optimizing the traffic infrastructure; and setting up more Hazmat rescue stations equipped with special materials for Hazmat accident rescue;
(c) The safety of highway transportation can be improved by establishing a whole-process supervision system for the transportation of Hazmat with the help of fifth-generation (5G) networks, big data, the Internet of Things, biotechnology and other technologies. The supervisory system can continuously monitor driver fatigue, the state of the Hazmat, the driving speed of the vehicle, and the driving environment of the highway and make timely interventions according to the actual situation.
(2) Multiple prediction models were selected, and the features that form strongly correlated rules with accident severity were used as their inputs, allowing the best model for predicting the severity of Hazmat transport accidents to be determined for each road type. The risk features discovered by the Apriori algorithm on different road types were input into the different prediction models for case studies, and it was found that, when predicting the severity of Hazmat road transport accidents, XGBoost should be chosen for urban roads and highways, and NNC should be chosen for rural roads.
This study has the following limitations, which should be addressed in future research:
(a) In this paper, when classifying the severity of Hazmat road transport accidents, only human casualty determinants are considered, and the salient features of environmental damage caused by Hazmat transport accidents are not reflected. In future research, it will be necessary to quantify the data on damage to the environment to achieve a more comprehensive analysis of the severity of accidents;
(b) In this paper, when analyzing the factors influencing accident severity, objective factors such as roads, vehicles and the external environment are considered, but the subjective aspects of drivers' psychological and physiological states are not analyzed. In future research, we need to obtain more information about the subjective state of drivers through questionnaires, surveillance videos and physiological state testing instruments to analyze the influence of drivers on the occurrence of accidents.
Institutional Review Board Statement: Not applicable.