Determinants and Prediction of Injury Severities in Multi-Vehicle-Involved Crashes

Multi-vehicle (MV) crashes, which can lead to great damages to society, have always been a serious issue for traffic safety. A further understanding of crash severity can help transportation engineers identify the critical reasons and find effective countermeasures to improve transportation safety. However, studies involving methods of machine learning to predict the possibility of injury-severity of MV crashes are rarely seen. Besides that, previous studies have rarely taken temporal stability into consideration in MV crashes. To bridge these knowledge gaps, two kinds of models: random parameters logit model (RPL), with heterogeneities in the means and variances, and Random Forest (RF) were employed in this research to identify the critical contributing factors and to predict the possibility of MV injury-severity. Three-year (2016–2018) MV data from Washington, United States, extracted from the Highway Safety Information System (HSIS), were applied for crash injury-severity analysis. In addition, a series of likelihood ratio tests were conducted for temporal stability between different years. Four indicators were employed to measure the prediction performance of the selected models, and four categories of crash-related characteristics were specifically investigated based on the RPL model. The results showed that the machine learning-based models performed better than the statistical models did when taking the overall accuracy as an evaluation indicator. However, the statistical models had a better prediction performance than the machine learning models had considering crash costs. Temporal instabilities were present between 2016 and 2017 MV data. The effect of significant factors was elaborated based on the RPL model with heterogeneities in the means and variances.


Introduction
Road traffic injuries have become the eighth-leading cause of death for people of all ages, which remains a serious problem globally. Road traffic injuries are the first cause of death among people 5-29 years of age [1]. Traffic crashes also severely impact social and economic loss [2]. According to the National Highway Traffic Safety Administration (NHTSA), there were 33,654 traffic-related fatalities that involved 51,872 cars in 2018. Singlevehicle (SV) crashes and multi-vehicle (MV) crashes accounted for 56.80% and 43.20%, respectively [3]. However, the number of cars involved in MV and SV crashes accounted for 63.15% and 36.85%, respectively, indicating that MV crashes had more causalities compared to SV crashes. In other words, MV crashes can result in a greater social property loss and cause greater damage to roadside structures and vehicles [2]. As stated above, it is vital to investigate the relationship between crash risk factors and injury-severity in MV crashes. It should be noted that a crash that involves two or more cars is referred to as MV, whereas a crash involving one car is regarded as SV in this study.
In general, existing studies on MV crashes have mainly focused on two categories: critical risk factors analysis and the prediction of injury-severity. As for crash risk factors analysis, many researchers have investigated the relationship between the crash risk factors (i.e., alcohol) and crash injury-severity via statistical models. Binary discrete models, such as the binary logit model or probit model, have widely been used in the studies related to two crash severity levels. Multiple levels of crash severity can be investigated by multinomial models. There have also been many studies that aimed to analyze the unobserved heterogeneity of crash risk factors. Venkataraman et al. [4] employed the random parameter negative binomial model to investigate the heterogeneity in road segments, and the number of vehicles involved in the crash was one of ways to aggregate crashes. The results showed that the heterogeneity could be captured through the random parameters. Seraneeprakarn et al. [5] studied injury-severity in SV and MV crashes that involved at least one hybrid vehicle, and they noted that the estimation model that empowered heterogeneities in the means and variances of random parameters endowed much more flexibility in analyzing the data with the unobserved heterogeneity. Rahimi et al. [6] conducted comprehensive research on the determinants of the injury-severity of truck drivers in single-vehicle truck crashes by developing a random threshold random parameters hierarchical ordered probit model. The increase in probability for fatalities was linked with a wide range of variables (i.e., driver's education, presence of curves on roadways, and high-speed limit). Shao et al. [7] analyzed the influence of variables related to injury severity in truck-involved rear-end crashes. To analyze data between 2006 and 2015 from the United States for both SV and MV crashes, three random parameters probit models were developed. Specifically, they identified a significant difference between car-strike-truck crashes and truck-strike-car crashes. In another study conducted by Rezapour et al. [8], the differences in SV and MV crashes on downgrades was investigated via the ordered logit model. They identified that there were four significant variables: safety equipment use, lighting conditions, posted speed limit, and lane width, in SV and MV crashes. Hong et al. [2] investigated the impacts of crash risk factors on MV crashes via a double-hurdle approach. By analyzing the data collected from 2011 to 2017 in South Korea, they found that driver violations (i.e., improper distance between vehicles, reversing, and passing) significantly increased the likelihood of injury-severity in MV crashes.
With respect to the prediction of injury-severity, some studies applied machine learning methods to predict the possibility of the injury severity outcomes in recent years. Support Vector Machine (SVM) has often been used as a prediction tool in trafficrelated studies. Li et al. [9] investigated the prediction performance of the SVM model for motor vehicle crashes. It was found that the SVM model was more accurate and effective than the traditional Negative Binomial (NB) model was. Besides, some studies analyzing crash risk factors also used SVM models [10,11]. Random Forest (RF) was also used to predict crash injury-severity. In a study conducted by Harb et al. [12], RF was employed to reveal the associations between crash avoidance maneuvers and crash characteristics (i.e., driver characteristics and vehicle characteristics). It was found that drivers characteristics were the most important factors in all types of crashes. In addition, to achieve a better prediction performance, many researchers employed various kinds of measures to predict crash injury-severity. Sameen and Pradhan [13] used the Recurrent Neural Network (RNN) to predict the injury severity of 1130 crashes collected from 2009 to 2015. Besides that, the back-propagation neural network (BPNN), nearest-neighbor classification (NNC), and K-means clustering were also employed to make predictions about the possibility of injury-severity outcome [14,15].
As summarized above, previous studies have mostly focused on the analysis of the MV crash risk factors. However, limited studies have considered the unobserved heterogeneity and temporal stability on MV crashes. Besides, few studies have predicted the possibility of the injury-severity on MV crashes by using machine learning methods. This was because of the complications that these crashes may have and the potential errors in the model estimations. In this paper, to bridge these gaps in the previous studies, a random parameter logit model with heterogeneities in the means and variances was developed to investigate the crash risk factors in MV crashes. Furthermore, three methods (traditional statistical methods, advanced statistical methods, and machine learning-based methods) dominated the analysis of crash data. To further investigate the differences between advanced statistical models and machine learning-based models, two categories of models were employed to identify the critical contributing factors and to predict the possibility of crash injury-severity.
In the remainder of this paper, we begin by elaborating on statistical and machine learning-based methodologies. A series of likelihood ratio tests on temporal stability are presented. A detailed model evaluation system including four indicators is introduced. The model estimation results and conclusions are then presented, which is followed by the description and processing of data.

Methodology
In this paper, two kinds of methods, including statistical methods (RPL) and machine learning-based methods (RF), were developed to analyze the critical risk factors and predict the probability of crash injury severities, respectively. This section presents a brief introduction to the above-mentioned methods and the processing of crash data.

Data Processing
The data used in this research were MV crashes collected from HSIS, which provides a large number of major risk factors and outstanding quality of the crash data. The Federal Highway Administration (FHWA) has established this database, which contains 10-state highway safety data since 1987. The three-year (2016-2018) crash data in Washington were utilized in this study. However, as stated in Section 3, only the 2017-2018 crash data were employed after a series of log-likelihood ratio tests.
The investigation of these crashes was based on the "Guidebook for State Data Files California." According to the variables of "numvehs," which represents the total number of cars involved in the crash, only crash records involving more than two cars were selected for this study. There were 26,026 MVs reported by the police between 2017 and 2018. After removing the insufficient crash information, 13,478 MVs were left within 2 years (2017-2018). However, each crash record was established on the occupants' information, indicating that other detailed crash information (i.e., driver characteristics and road characteristics) were doubled, except for the occupants' information. This can result in a serious collinearity of data. To solve this problem, the related information of the first vehicle involved in MV crashes should be reserved. To be precise, the selection of the first vehicle was based on the "vehno," which represents the vehicle number.
The "severity" representing the most severe injury in the crash was employed as the determinant variable. The explanatory variables shown in Appendix A were classified into four categories: driver characteristics, road characteristics, crash characteristics, and occupant characteristics. Factors such as physical condition, helmet, and vehicle violation were found significant on injury-severity level according to some previous research [4][5][6][7][8].
Due to the high account of missing values, they were not considered in this study. However, these indicators can also be represented with other indicators, such as road surface condition, roadway classification, and driver age/gender. More detailed information on data processing is shown in Figure 1.

Random Parameters Logit Model (RPL)
The unobserved heterogeneity originating from various explanatory variables (such as driver characteristics, environmental characteristics, vehicle characteristics, and roadway characteristics) and the energy dissipation via the vehicle structure were crucial in the analysis of crash risk factors. In addition, the resulting effect of energy dissipation varied from occupant physical condition, vehicle safety equipment, and bone mass [5]. Many previous studies emphasized the importance of taking the unobserved heterogeneity into consideration while analyzing crash-related factors. Furthermore, random parameters logit (RPL) models [16][17][18][19], random parameters logit models with heterogeneities in the means [20,21], and random parameters logit models with heterogeneities in the means and variances [5,22,23] have all been successfully utilized in the investigation of crash injury-severity. Hence, to account for the unobserved heterogeneity, the random parameters logit model with heterogeneities in the means and variances was used in this study.
To achieve a random parameter logit model, an injury-severity propensity function was defined, as shown in Equation (1): where determines the probability of injury-severity category k (property damage only, possible injury, and fatal) in crashes i, and is a vector of explanatory variables (driver characteristics, road characteristics, crash characteristics, and occupant characteristics) and presents a vector of the estimable parameters for injury-severity level k. Given the crash-specific unobserved heterogeneity, the estimable parameters β k are allowed to vary from different observations through a density function, where φ represents a vector of parameters of the density function. In addition, is a random term that follows a type I extreme value (i.e., Gumbel) distribution. Thus, the calculation of the probability for each crash injury-severity level is shown in Equation (2) [24]:

Random Parameters Logit Model (RPL)
The unobserved heterogeneity originating from various explanatory variables (such as driver characteristics, environmental characteristics, vehicle characteristics, and roadway characteristics) and the energy dissipation via the vehicle structure were crucial in the analysis of crash risk factors. In addition, the resulting effect of energy dissipation varied from occupant physical condition, vehicle safety equipment, and bone mass [5]. Many previous studies emphasized the importance of taking the unobserved heterogeneity into consideration while analyzing crash-related factors. Furthermore, random parameters logit (RPL) models [16][17][18][19], random parameters logit models with heterogeneities in the means [20,21], and random parameters logit models with heterogeneities in the means and variances [5,22,23] have all been successfully utilized in the investigation of crash injuryseverity. Hence, to account for the unobserved heterogeneity, the random parameters logit model with heterogeneities in the means and variances was used in this study.
To achieve a random parameter logit model, an injury-severity propensity function V ki was defined, as shown in Equation (1): where V ki determines the probability of injury-severity category k (property damage only, possible injury, and fatal) in crashes i, and X ki is a vector of explanatory variables (driver characteristics, road characteristics, crash characteristics, and occupant characteristics) and presents a vector of the estimable parameters for injury-severity level k. Given the crash-specific unobserved heterogeneity, the estimable parameters β k are allowed to vary from different observations through a density function, where ϕ represents a vector of parameters of the density function. In addition, ε ki is a random term that follows a type I extreme value (i.e., Gumbel) distribution. Thus, the calculation of the probability for each crash injury-severity level is shown in Equation (2) [24]: where P i (k) represents the possibility of crash i with injury-severity level k. Furthermore, the random parameters with heterogeneities in the means and variances is defined as Equation (3) by Seraneeprakarn et al. [5].
where β represents the mean estimated parameters across all crashes, Z k is a vector of crash-related variables capturing the heterogeneity in the means for all crashes, and δ k is the corresponding vector of estimated parameters. W k is a vector of crash-specific variables that explain the heterogeneity in the standard deviation σ k with corresponding parameter vector ω k , and υ k is a disturbance term.

Random Forest
Random Forest (RF), Support Vector Machine (SVM), and Back-Propagation Neural Network (BPNN) had all been widely and successfully used in predicting the possibility of injury-severity outcome [25][26][27][28]. Choosing a suitable method for the crash prediction is critical. In addition, the crash datasets are high-dimensional, imbalanced, and have a large sample size. The determinant variable was divided into three categories. SVM cannot handle well datasets with a large sample size. A large sample size would take a long computational time. Besides, SVM is good at binary classification rather than multi-classification. The Back-Propagation Neural Network (BPNN) would also take a long training time, and the structure of the neural network is often determined based on experience. It restricts the generalization of the model. Furthermore, the performance of BPNN has a strong dependence on the sample. Imbalanced datasets are not conducive to the prediction of BPNN. However, the RF is good at dealing with high-dimensional datasets and would take less computational time than others at the same sample size. In addition, RF considers the interactive influence between each feature during the training. This makes the prediction results more reliable. Therefore, the RF was utilized in this paper to further validate and evaluate its ability in the performance of crash prediction.
Random Forest (RF) is an ensemble learning method. RF is based on bagging by taking decision trees as the base learner, which was proposed by Breiman in 2001 [29]. Given the input dataset represented as (xi,yi), xi represents all the crash-related explanatory variables, and yi is the injury severity. The construction of the RF model is shown in Figure 2, which consists of three parts: (a) The construction of a training set. The training set is extracted from the original dataset by using bootstrap. Additionally, the bootstrap is a kind of nonparametric Monte Carlo method, ensuring each sample has the same chance to be selected.
(b) Decision-tree generation. Each decision-tree is generated by part of all features, which is randomly selected. That is, each decision-tree is a base learner.
(c) Results combination. Each training set will be classified by their own decision-tree. Then, the final classification result is the mode value in the all-decision-tree prediction result. In addition, each decision-tree result has the same weight in the final vote.

Model Evaluation
According to previous studies [13,30], a test dataset was used to evaluate the tion performance of different models, and the prediction results for each mode described by a confusion matrix (also called an error matrix) [31], as shown in Ta make full use of the crash datasets and to perform a robust evaluation, a 10-fol validation was used in this paper for evaluating the predictive performance o model [32]. Note: P, the prediction result; R, the ration of the prediction result over the number of cras

Model Evaluation
According to previous studies [13,30], a test dataset was used to evaluate the prediction performance of different models, and the prediction results for each model can be described by a confusion matrix (also called an error matrix) [31], as shown in Table 1. To make full use of the crash datasets and to perform a robust evaluation, a 10-fold cross-validation was used in this paper for evaluating the predictive performance of the RF model [32]. Note: P, the prediction result; R, the ration of the prediction result over the number of crashes; N, actual number of crashes.
In this error matrix, the column values are the predicted results while the row values are the actual results. That is, P ij is defined as the number of crashes with injury-severity level i, but it is predicted as j. R ij is the ratio of the prediction result over the number of crashes with injury-severity level j. Furthermore, the overall correct prediction ratio is calculated as Equation (6).
Additionally, for each injury-severity level i, the calculated value R ii can indicate the correct prediction ratio.
However, under particularly crash-related conditions (weather, vehicle type, driver age, etc.), the indicator R overall may not provide reliable results. There are two reasons: (a) A large account of crashes are from one specific severity level, such as Property Damage Only (PDO), which accounted for a large portion in many crash datasets. An insensitive prediction model would regard all the crashes as a specific frequent severity level, which would result in a higher R overall . However, a sensitive model would regard the specific frequent severity level as the same as the other severity levels, which may have a lower R overall ; and (b) R overall indicated that the value of each injury-severity level was equal. That is, the social influence and property loss caused by the different injury-severity levels were equal [14].
Hence, in this study, another three indicators aiming to evaluate the prediction accuracy were used. These indicators took crash-related economic costs in the evaluation of prediction performance, as can be shown in Equation (7).
The comprehensive crash cost (CCC) consisted of two parts: economical crash costs (ECC) and quality-adjusted life years (QALY). It is worth explaining the meaning of QALY to gain a further understanding of Equation (7). QALY is an indicator that can estimate the value of the lost quality-of-life due to crashes by quantifying the value of some behaviors people would take to avoid injury or death [33]. That is, ECC and QALY represent observable and unobservable costs due to crashes, respectively [34].
Moreover, the comprehensive crash cost of 2017, as shown in Table 2, was updated by using the consumer price index (CPI) and median usual weekly earnings (MUWE) based on Crash Costs for Highway Safety Analysis (CCHSA) [33]. Based on the comprehensive crash cost of 2017, the actual overall costs of crashes (AOCC) and the predicted overall costs of crashes (POCC) were defined as Equations (8) and (9), respectively.
Furthermore, the overall prediction mean absolute error (OPMAE), overall prediction absolute percentage error (OPAPE), and overall prediction root-mean-squared error (OPRMSE) were defined as Equations (10)-(12), respectively. Both OPMAE and OPRMSE can evaluate the absolute errors between the predicted and actual costs. The OPAPE was used to measure the relative errors between them.

Results and Discussion
In this section, likelihood ratio tests of temporal stability were conducted, and the prediction performance of the two selected models was comprehensively measured. First, two series of likelihood ratio tests were conducted to examine the temporal stability of three-year crash datasets. Secondly, the process of determining various parameters for machine learning-based models was elaborated in detail. Thirdly, the prediction performance among these models was evaluated by comparing four indicators (R overall , OPMAE, OPAPE, and OPRMSE). The final step was to analyze the effects of significant variables on injury-severity.

Likelihood Ratio Tests
Many previous studies found that the critical factors of crash injury severity showed temporal instability [35]. Considering this, a series of tests were conducted in this paper to examine the differences between different years' MV crashes via likelihood ratio tests.
To comprehensively examine the temporal stability of MV crashes injury severity, two series of likelihood ratio tests were conducted. The first series of likelihood ratio tests were utilized to identify whether the different estimated parameters were stable between two individual years. This likelihood ratio test is defined as [35,36]: where LL(β y1y2 ) is the log-likelihood at convergence for the model estimating parameters from y2 while using data subset y1, whereas LL(β y1 ) denotes the log-likelihood at convergence for the model using data from y1's data. For each model comparison, the test was carried out the other way around based on the y1 subset and y2 subset to obtain two different results. That is, taking the estimated parameters of the 2017 model as the starting values and employing them in the 2016 data, the χ 2 between 2017 and 2016 can be calculated. The calculated χ 2 was 35.06 with 9 freedoms in this dataset, illustrating that the null hypothesis that the 2017 and 2016 data are the same can be rejected at a high confidence level (the corresponding confidence level more than 99.99%). Likewise, other two-year periods can be identified as equal to the null hypotheses being rejected at a high confidence level except the 2017 and 2018 data. Table 3 shows χ 2 values with degrees of freedom in parentheses and confidence level in square brackets. Cells in italics indicate the null hypothesis that the temporal stability cannot be rejected at a high confidence level (>95%).
Besides, for investigating the temporal stability between the joint model and each separate model, the second series of likelihood ratio tests can be written as [35]: where LL(β 2017-2018 ) identifies the log-likelihood at the convergence of the model corresponding to all available year data (2017 and 2018), and LL(β i ) represents the log-likelihood at convergence of the model with only one year (2017 or 2018) data. The value of χ 2 was 1.62 with 14 degrees of freedom (the corresponding confidence level was about 0.00%). This result showed that the null hypothesis that the contributing factors of crash injury severity in separate models is of temporal stability cannot be rejected at a high confidence level. The estimated parameters of the joint model can be utilized for analysis.

Model Estimation
The prediction accuracy with the number of trees was evaluated using the learning curve, which aimed to find the best prediction accuracy. As shown in Figure 3, the prediction accuracy reached its maximum value when the number of trees was 652, which indicated that the RF model constructing 652 trees obtained the best prediction performance. According to previous research, the number of optimal features used for splitting was to be set as the square root of all features. Besides, for investigating the temporal stability between the joint model and each separate model, the second series of likelihood ratio tests can be written as [35]: . This result showed that the null hypothesis that the contributing factors of crash injury severity in separate models is of temporal stability cannot be rejected at a high confidence level. The estimated parameters of the joint model can be utilized for analysis.

Model Estimation
The prediction accuracy with the number of trees was evaluated using the learning curve, which aimed to find the best prediction accuracy. As shown in Figure 3, the prediction accuracy reached its maximum value when the number of trees was 652, which indicated that the RF model constructing 652 trees obtained the best prediction performance. According to previous research, the number of optimal features used for splitting was to be set as the square root of all features.

Model Prediction and Discussion
As shown in Table 4 the OPAPE of the statistical models was 3.61%, while the machine learning-based model was 27.12%. In the comparison of these two methods, the OPAPE values of the statistical model were lower than those of the machine learning-based model. That is, the statistical model had a better performance concerning crash costs than the machine learning model had. However, the overall prediction accuracy of the machine learning-based model was higher than that of the statistical model (the R overall of the statistical model was 56.59%, and the R overall of the machine learning-based method was 67.16%). Furthermore, the statistical and machine learning-based models had a similar trend, respectively, regarding OPMAE and OPRMSE. Note: R overall , the overall correct prediction ratio; OPMAE, the overall prediction mean absolute error; OPAPE, the overall prediction absolute percentage error; OPRMSE, the overall prediction root-mean-squared error; POCC, the predicted overall costs of crashes; AOCC, the actual overall costs of crashes. Table 4 depended on the way in which to utilize them in practice. For example, as a road designer, the relationship between different conditions (i.e., road characteristics and environmental characteristics) with the potential injuries is crucial. Therefore, the prediction models were selected based on the overall prediction accuracy. As for insurance companies, the crash costs deserve more attention. Hence, the crash-costs-related indicators such as OPMAE, OPAPE, and OPRMSE should be selected first. It was easy to conclude that the statistical methods had a better performance than the machine learning-based models considering crash costs.

The interpretation of the indicators shown in
Additionally, a higher overall prediction accuracy did not imply a better prediction performance on a specific type of injury-severity, due to the different account of various injury severities. In order to choose one model that can predict a specific level of injuryseverity as accurate as possible, the R ii indicator in the confusion matrix should be selected. As shown in Table 5, the machine learning-based model achieved the best prediction performance (its R ii was 85.14%) when the property damage only was the only concern. As for injury, the prediction accuracy of the statistical model reached 34.03%.

Effect of Significant Factors
After analyzing the prediction performance among the selected models, a detailed investigation of the effects of significant crash-related factors was critical to further understand these models. Depending on the mathematical definition of these models (statistical/machine learning-based models), statistical models were able to clearly explain the effects of various parameters by utilizing coefficients and significance. Furthermore, the RPL model was estimated by the simulated maximum likelihood, which was an efficient method to random draws. To estimate more accurate parameters, 500 Halton draws were used in this study. The density function f (β|ϕ) followed a normal distribution, similar to previous studies [22,23,35]. The whole estimated results are shown in Appendix B.
Regarding the driver characteristics, among the characteristics of drivers, "old-aged driver" was found to significantly decrease the likelihood of property damage only. "Middle-aged driver" increased the likelihood of property damage only. "Male driver" was found to increase the likelihood of property damage only, which was in line with previous studies [23]. In addition, taking "sudden slowing maneuvers" can significantly decrease the possibility of Fatal injury compared with "skipping involved." The roadway characteristics found to be significant were: "Wet/snow/slush/ice road surface" and "Rural freeways." "Wet/snow/slush/ice road surface" increased the likelihood of property damage only. The indicator "Rural freeways" was found to be statistically significant in increasing the likelihood of the Fatal injury.
Appendix B also shows that the "crash not occurring at intersection or driveway" increased the possibility of the property damage only. The indicator "weekend" decreased the likelihood of Injury.
With regard to the characteristics of the occupant, "Male occupant" was found to decrease the likelihood of the property damage only. The indicator "Old-aged occupant" was found to decrease the likelihood of the property damage only in our study, which was in line with some previous studies [23]. The indicator "Ejected" was found to significantly decrease the likelihood of the property damage only. In addition, the indicator "second row" was found to decrease the likelihood of the Injury.
As indicated in Appendix B, two variables were identified as random parameters in this study: "Occupant restraints" and "Male driver." The indicator of "Occupant restraints" followed a normal distribution with a mean of −1.5017 and a standard deviation of 4.5840, which means that this variable was negative for observations of 62.84% (decreasing the likelihood of the injury) and positive for observations of 37.16% (increasing the likelihood of the injury). That is, the crashes were not prone to occur when the occupants' safety equipment was used. As for the analysis of "Male driver", it can also be interpreted through the same way.
With respect to the random parameters with heterogeneity in the means, the indicator of occupant restraints [I] and Male driver [I] were found to produce random parameters with heterogeneity in the means. The negative values of −0.5543 indicated that the mean of occupant restraints indicator decreased if the driver took "sudden slowing maneuvers," also meaning that the possibility of injury was decreased. As for the heterogeneity in the variance of random parameters, the "Middle-aged driver" was found to decrease the variance of the indicator "Occupant restraints." Based on the above analysis, the corresponding measurements decreasing the likelihood of crashes is shown in Figure 4

Conclusions
This research employed two methods (statistical methods: RPL; machine learningbased method: RF) to analyze significant crash-related factors and to predict the possibility of injury-severity outcomes based on the dataset of 13,667 crashes extracted from the HSIS database.

Conclusions
This research employed two methods (statistical methods: RPL; machine learningbased method: RF) to analyze significant crash-related factors and to predict the possibility of injury-severity outcomes based on the dataset of 13,667 crashes extracted from the HSIS database.
As for the crash prediction, the overall accuracies of the RF and RPL model were 56.59% and 67.14%, respectively. The OPMAE and OPAPE of these two models were 2143 and 3.61%, and 14,076 and 27.12%, respectively. Regarding crash costs, the OPRMSE of the RPL and RF model were USD 137 and USD 895 (millions).
For significant crash-related factors, the variables "old-aged driver," "Male occupant," "Old-aged occupant," "Ejected," and "Second row" may decrease the likelihood of crash injury severity; while variables "Male driver," "Wet/snow/slush/ice road surface," "Straight," "Not at intersection or driveway," and "Weekend" could increase the possibility of crash injury severity. In addition, the indicator "Occupant restraints" and "Male driver" were identified as random parameters. The above findings could be applied by various walks of life (e.g., the government, transportation-related enterprise, and insurance company) to improve transportation safety and reduce the crash costs.
It should be noted that there are still some limitations in this study. Many prevailing machine learning-based methods, such as ANN and KNN, were not used for comparison in this study. The imbalance datasets may result in biases in crash prediction and critical risk factors analysis. Besides that, the out-of-date crash datasets may result in some bias in the analysis of crash critical factors, and some factors (road alignment, occupant/driver physical condition, etc.) were not considered in this research. Future studies will focus on these above-mentioned issues.