Crash Severity Analysis of Highways Based on Multinomial Logistic Regression Model, Decision Tree Techniques, and Artificial Neural Network: A Modeling Comparison

Abstract: The classification of vehicular crashes by severity is crucial because crashes differ in their financial and injury costs. In addition, crashes can be avoided by identifying their influential factors through accurate prediction modeling. Crash severity analysis therefore requires accurate and time-saving prediction models for classifying crashes by severity. Moreover, statistical models are limited in identifying the potential severity of crashes from the influencing factors they incorporate. Unlike previous research efforts, which applied data mining models to a limited set of crash severity classes (property damage only (PDO), fatality, and injury), the present study sought to predict crash frequency according to five severity levels: PDO, fatality, severe injury, other visible injuries, and complaint of pain. The multinomial logistic regression (MLR) model and data mining approaches, including the artificial neural network-multilayer perceptron (ANN-MLP) and two decision tree techniques (i.e., the Chi-square automatic interaction detector (CHAID) and C5.0), were applied to traffic crash records for State Highways in California, USA. A comparison of the relative importance of ten qualitative and ten quantitative independent variables in CHAID and C5.0 indicated that the cause of the crash (X1) and the number of vehicles (X5) were the most influential variables involved in the crash. However, the ANN-MLP model identified the cause of the crash (X1) and weather (X2) as the most contributing variables. In addition, the MLR model showed that the driver's age (X11) accounts for a larger proportion of traffic crash severity. The sensitivity analysis demonstrated that C5.0 had the best performance for predicting road crash severity. Not only did C5.0 take a shorter time (0.05 s) than CHAID, MLP, and MLR, but it also achieved the highest accuracy rate for the training set. Its overall prediction accuracy on the training data was approximately 88.09%, compared to 77.21% and 70.21% for the CHAID and MLP models, respectively. In general, the findings of this study revealed that C5.0 can be a promising tool for predicting road crash severity.


Introduction
Every year, more than 1.3 million people die and as many as 50 million are injured in road crashes worldwide. According to official statistics by the World Health Organization [1], traffic crashes are projected to be the fifth leading cause of death in the world by 2030. Every year, traffic crashes impose tremendous costs in terms of human casualties, agony, and economic losses on people and governments worldwide [2][3][4]. According to the Highway Safety Information System (HSIS), there were 3898 fatal crashes in California in 2017, an increase of 34.29% since 2012. Most of the drivers involved were speeding at the time of the crash, and most crashes involved two vehicles [5].
Crashes vary in terms of fatality and injury levels. However, other studies classify crash severity only as fatality, injury, and property damage only (PDO). Studying crash severity in further detail therefore helps researchers identify the factors that most influence crash occurrence [6,7]. The significance of road traffic crashes and the need to curb them have compelled researchers to focus extensively on crash analysis. The capability of crash analysis is vital for reducing fatalities and injuries resulting from vehicles on roads [6]. Thus, the reliable analysis of road crashes requires accurate knowledge of the factors that influence them. The usual starting approach, however, has been to use statistical models, including logit and ordered probit models, to predict crash severity. Previous experience reveals that these models rely on predefined functions, which decreases their accuracy. This deficiency also leads to missing values in the dataset being unintentionally ignored. Data mining techniques have recently been shown to be non-parametric tools capable of managing outliers and missing values [8][9][10].
PDO, fatality, severe injury, other visible injuries, and complaint of pain play important roles in the proportions of crashes and should be considered in crash analysis. Accordingly, this classification provides more and better details regarding crash severity than the three typical levels of fatality, injury, and PDO. Crash prediction models each have their own benefits and limitations, and there is no consensus on the best one; they still encompass various limitations and have not achieved optimal performance. Therefore, more extensive model comparisons should be conducted to determine which data mining techniques better fit crash severity analysis data. To this end, this study mainly aimed to investigate five classes of crash severity, including PDO, fatality, severe injury, other visible injuries, and complaint of pain, based on the highway safety information system (HSIS) data for all state highways in California, USA, in 2012-2014. The study further sought to find the most appropriate model by finding the best fit on the data in crash severity analysis using the Waikato environment for knowledge analysis (WEKA) software. Subsequently, the results obtained from this model were compared with those of the other models, namely the multinomial logistic regression (MLR) model and data mining techniques such as the C5.0 and Chi-square automatic interaction detector (CHAID) algorithms and the artificial neural network-multilayer perceptron (ANN-MLP), using accuracy measures. The remaining sections of this study are organized as follows.
Section 2 discusses works related to predicting crash severity via statistical and data mining approaches. In addition, the gap between previous studies and the present study is characterized with regard to the five classes of crash severity. Section 3 presents the research method, which is based on the HSIS database for 2012-2014 and applies the MLR model, the DT techniques (i.e., C5.0 and CHAID), and the ANN-MLP model, with their hyper-parameter settings in WEKA software, in order to predict crash severity. The modeling findings and discussions of the proposed models are explained in Section 4, followed by a sensitivity analysis among the proposed models in order to select the best predictive model. Section 5 presents the conclusions.

Literature Review
Researchers have recently focused on different types of traffic crash analysis, specifically the development and application of crash severity prediction models. Crash severity models attempt to estimate the probability that a crash will fall into various severity levels, including PDO, minor injury (other visible injuries and complaint of pain), severe injury, and fatality, based on contributing factors [11][12][13][14][15]. Researchers in crash severity evaluations employ different modeling processes, the most prominent of which are regression and data mining techniques. Regression techniques such as logit and probit have been used to analyze traffic crash severity [16]. Crash severity can generally be considered a random event; thus, statistical models, particularly regression analysis, have been widely applied to explore the associated contributing factors [17,18].
Compared with other types of regression models, choice and logistic regression models have been employed more frequently. However, most regression models have their own model assumptions and predefined relationships between dependent and independent variables. Therefore, any violation of these assumptions may lead to entirely erroneous predictions.
Rezapour et al. [19] used a multinomial regression model to identify the parameters affecting traffic barrier crash severity. The results indicated that the multinomial logistic regression model is appropriate for both non-interstate and interstate crashes involving traffic barriers. Moreover, road surface conditions, age, driver restraint, and curve negotiation were found to be the most influential factors in the severity of traffic barrier crashes on non-interstate highways. Wahab and Jiang [20] used a multinomial regression model to explore the factors affecting motorcycle crash severity in Ghana and found that motorcycle crashes occurring during the daytime, on road curves, and in adverse weather conditions decrease the probability of fatal injury. Rezapour and Ksaibati [21] compared the performance of multinomial and ordinal logistic regression models in predicting the injury severity of truck crashes and reported that multinomial logistic regression predicted injury severity better than the ordinal logistic regression model because it does not assume normality and linearity in the violation and crash data. Pradipta et al. [22] also used multinomial logistic regression to identify factors influencing crash severity in West Nusa Tenggara, Indonesia. They found that road function, vehicle type, crash type, possession of a driving license, use of driver safety equipment, driver distraction, and crash location have a significant correlation with crash severity. Vajari et al. [23] used a multinomial logit model to predict motorcycle crash severity at Australian intersections. The results indicated that factors such as female motorcyclists, snowy, stormy, or foggy weather, rainy weather, evening rush-hour crashes, and unpaved roads reduced the probability of fatal injuries. Further, some studies used the multinomial logistic regression model to find factors contributing to crash severity, and their results indicated that multinomial logistic regression can appropriately predict the crash severity level according to the factors leading to crash occurrence [24][25][26]. Since crash analysis is based on many variables, multinomial logistic regression is drawn upon to investigate the effect of these variables on crash severity. In addition, previous studies that used multinomial logistic regression to examine factors associated with crash severity indicated that it predicts different levels of crash severity better than other statistical methods such as the discriminant and ordered logit models [21,24,27].
In contrast to statistical models, data mining classification comprises several distinct families of techniques, such as the support vector machine (SVM), Bayesian classifier, ANN, and decision trees. ANNs are non-parametric methods that are widely employed by researchers in crash severity evaluations. Abdelwahab and Abdel-Aty [28] employed an ANN to predict vehicle collisions at signalized intersections in central Florida. They compared the ANN results with those of a fuzzy approach, and the ANN classification showed relatively better performance. Further, Shirmohammadi et al. [29] used a clustering analysis approach to classify drivers' behaviors regarding road crash severity. Shirmohammadi et al. [30] also identified crash-prone road locations using wavelet theory and a multi-criteria decision-making method and concluded that combining ANN-based wavelet theory with this method could serve as a new road crash severity technique. Likewise, Alkheder et al. [31] used WEKA data mining software to build ANN classifiers to predict the injury severity of traffic crashes based on 5973 traffic crash records from Abu Dhabi during 2008-2013 and demonstrated that the developed ANN classifiers can predict crash severity with reasonable accuracy. In another study, Taamneh et al. [32] reported that clustering the data prior to classification resulted in higher precision than no clustering. Similarly, Mokhtarimousavi et al. [33] used SVM models for work zone crash injury severity prediction. Wahab and Jiang [34] developed machine learning algorithms to predict motorcycle crash severity. Amiri et al. [35] focused on predicting the severity of fixed-object crashes among elderly drivers using ANN models and a hybrid intelligent genetic algorithm. Some studies reported that machine learning techniques perform better than ANN models in improving safety for transport modes, including pedestrian and motorcycle crash severity [36][37][38]. As another data mining technique, Chang and Chien [39] focused on decision trees (DTs) to study crash severity. Chong et al. [40] compared DT and neural network data mining methods to model the severity of head-on collisions. The accuracy of the neural network and DT models varied depending on the severity type being predicted. Furthermore, Beshah and Hill [41] evaluated the performance of DT, naive Bayes, and K-nearest neighbor classifiers in crash severity evaluation and found that the accuracy of these three data mining techniques was 80.20%, 79.90%, and 80.82%, respectively. Other researchers preferred the Chi-square automatic interaction detector (CHAID) algorithm because of its distinct structure in crash analysis and concluded that CHAID has an acceptable prediction accuracy for fatality severity [42][43][44][45][46]. Behbahani et al. [47] used an extreme learning machine (ELM), an advanced model that is very fast compared with other algorithms and can predict precisely. As a feedforward neural network with random weights, the ELM showed noticeable benefits over other algorithms. It can be an effective predictor for crash data, especially when the amount of labeled data is relatively small. Amiri et al. [48] employed five different data mining methods, including the Bayesian network, ANN-MLP, ANN-radial basis function, SVM-polynomial, and SVM-sigmoid, to determine which of these techniques performs better in predicting crash severity. Moreover, Iranitalab and Khattak [49] compared several statistical and machine learning methods for crash severity prediction. Singh et al. [50] applied a deep neural network-based predictive model to quantify the effects of various variables on crash frequency and to provide a ranked list of variables based on their importance.
A review of previous studies revealed that crash severity analysis has so far been limited to the PDO, fatality, and injury levels. To the best of our knowledge, almost no study has focused on investigating a finer set of crash severity classes. In light of the relevant literature, the novelty of the present study is two-fold. First, this study applies five classes of crash severity, including PDO, fatality, severe injury, other visible injuries, and complaint of pain, to provide an accurate analysis of the factors influencing crashes. Second, the present study evaluates and compares the MLR model with different data mining techniques, including the ANN-MLP model and two DT algorithms (i.e., C5.0 and CHAID), using the HSIS dataset for all state highways in California, USA, and proposes the most accurate models for crash prediction purposes. The data source applied in this study includes three years of crashes linked to system-wide roadway characteristics, traffic volumes, and crash data. Using the HSIS data helps determine how much different countermeasures can reduce road crash potential.

Research Method
The present study considers a comprehensive classification of crash severity, comprising PDO, fatality, severe injury, other visible injuries, and complaint of pain, based on the HSIS dataset for 2012-2014. It then seeks to find the most appropriate predictive model among the MLR model, the DT techniques (i.e., C5.0 and CHAID), and the ANN-MLP model by finding the best fit on the data in crash severity analysis. Qualitative and quantitative independent variables are determined from the HSIS dataset, and crash severity is then examined in five severity classes. The MLR models for the severity classes are proposed based on training, validation, and correlation analysis. The data mining approaches, including the C5.0, CHAID, and ANN-MLP models, are applied in WEKA software by means of hyper-parameter settings, the relative importance of variables, and correctly and incorrectly classified instances. Additionally, accuracy, the area under the receiver operating characteristic (ROC) curve (AUC), and classification time are taken into consideration in predicting crash severity. Then, a sensitivity analysis is applied based on the running time of classifying crash severity. Figure 1 presents the overall flowchart and the process for evaluating the credibility and precision (performance evaluation) of the selected models in explaining the five nominated classes of severity.

Data Description
Because comprehensive data are essential for this study, the crash data were obtained from the California HSIS database, which covers all State highways and comprises crash information for the years 2012-2014. The response variable of the model is crash severity, which is classified into five levels: PDO, fatality, severe injury, other visible injuries, and complaint of pain.
Table 1 provides a total of 20 qualitative and quantitative explanatory (independent) variables evaluated in this study. The qualitative variables are divided into different categories (codes) with descriptions, including the cause of the crash, weather conditions, road surface conditions, lighting conditions, the number of involved vehicles, median type, facility access, design speed, surface type, and gender. Quantitative variables in the areas of humans, environments, roads, and vehicles contributing to the occurrence of crashes are also listed in Table 1. There are 145,142, 131,508, and 152,908 crash records during 2012-2014, most of which are of PDO severity, followed by complaint of pain and other visible injuries. As shown in Figure 2, 66% of crashes belong to PDO. Meanwhile, fatality and severe injuries constitute approximately 3% of all crashes. In addition, other visible injuries make up 12% of crashes, while complaint of pain accounts for 21% of crashes. Information on the percentage of each condition under which the crashes occurred is presented in Table 1. Except for the cause of the crash, the number of involved vehicles, and design speed, the percentages for variables such as lighting and surface conditions do not reflect the potential for increased crash occurrence relative to other categories of the same variable, since the exposure of traffic volume to these conditions is not equal. For instance, a slippery surface is known to be an influential factor in increasing traffic crashes. However, most crashes took place on a dry surface (Table 1). This is because the period during which the surface is slippery is far shorter than the period during which it is dry; thus, the traffic volume is less exposed to a slippery surface, and fewer crashes are expected accordingly. Therefore, judgments based upon these percentages are misleading, and further investigation is required in this regard. On the other hand, the data reveal that speeding is the major cause of crashes, comprising nearly half of them, followed by other (hazardous) violations and improper turns. Roads with design speeds greater than 70 miles per hour are significantly more prone to crashes than those with a lower design speed. Moreover, most traffic crashes involve two vehicles, followed by single-vehicle crashes. The statistics of the quantitative variables are summarized in Table 2.
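The exposure argument above can be illustrated with a small sketch. All numbers below are hypothetical and chosen only for illustration; they are not taken from the HSIS data, and `crash_rate` is an illustrative helper rather than anything used in the study. The sketch shows how a condition can account for most crashes while still having the lower crash rate per unit of exposure.

```python
# Hypothetical numbers for illustration only; not taken from the HSIS data.
def crash_rate(crashes, exposure_hours):
    """Crashes per hour of exposure to a given surface condition."""
    return crashes / exposure_hours

dry = {"crashes": 900, "exposure_hours": 8000}
wet = {"crashes": 100, "exposure_hours": 500}

# Share of all crashes that happened on a dry surface.
share_dry = dry["crashes"] / (dry["crashes"] + wet["crashes"])

# Exposure-normalized rates tell a different story than raw shares.
rate_dry = crash_rate(**dry)
rate_wet = crash_rate(**wet)

print(f"dry share of crashes: {share_dry:.0%}")
print(f"dry rate: {rate_dry:.4f}/h, wet rate: {rate_wet:.4f}/h")
```

In this made-up case most crashes occur on dry surfaces, yet the rate per hour of exposure is higher on the wet surface, which is why raw percentages alone can mislead.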

MLR Model
In the present study, to apply the MLR model for predicting crash severity, the dependent variable is denoted by Y, which takes levels i = 1 to 5 sequenced from low to high and corresponding to the crash severity (PDO, fatality, severe injury, other visible injuries, and complaint of pain), and k indexes the observations (crashes). The independent variables are denoted by X_i1, X_i2, ..., X_ij, where j is the number of predictors based on the dataset in Table 1. Thus, the multinomial logistic regression model for crash k having severity level i can be expressed as Equation (1) [51][52][53], where α_i and β_ij represent the constant for crash severity level i and the regression coefficient, respectively. P_k(Y_ij ≤ i | X_j) is the cumulative probability of Y_ij conditional on X_j for crash severity level i (Y = 1 (PDO); Y = 2 (fatality); Y = 3 (severe injury); Y = 4 (other visible injuries); Y = 5 (complaint of pain)). The multinomial logistic probability model can then be expressed as Equation (2). Pearson's χ² tests the goodness of fit of the model by comparing the model's predictions of crash severity with the actual observations [54]. The calculation formula is expressed by Equation (3), where K, O_k, and E_k denote the number of covariate types, the observed frequency in the k-th covariate type, and the predicted frequency in the k-th covariate type, respectively. A smaller Pearson χ² statistic indicates no significant difference between the values predicted by the model and the actual values, that is, the model fits well. Conversely, a larger statistic means that the model fit is poor [55].
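For orientation, forms consistent with the symbols defined above can be written out as follows. This is a sketch under an assumption: the cumulative probability notation P_k(Y ≤ i | X) suggests a cumulative-logit reading, and the expressions below are standard textbook forms, not a verified reproduction of the authors' Equations (1) to (3).

```latex
% Assumed cumulative-logit form with constant \alpha_i and coefficients \beta_{ij}
\ln\!\left(\frac{P_k(Y \le i \mid X)}{1 - P_k(Y \le i \mid X)}\right)
  = \alpha_i + \sum_{j} \beta_{ij} X_{ij}

% Solving the expression above for the probability
P_k(Y \le i \mid X)
  = \frac{\exp\!\left(\alpha_i + \sum_{j} \beta_{ij} X_{ij}\right)}
         {1 + \exp\!\left(\alpha_i + \sum_{j} \beta_{ij} X_{ij}\right)}

% Standard Pearson chi-square statistic over K covariate types,
% with observed frequencies O_k and predicted frequencies E_k
\chi^2 = \sum_{k=1}^{K} \frac{(O_k - E_k)^2}{E_k}
```

Under this reading, a small value of the statistic relative to its degrees of freedom corresponds to the "no significant difference" case described in the text.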

ANN-MLP Model
ANN-MLP is a supervised learning technique applied to the classification and regression of datasets in different applications [56,57]. This technique creates a feed-forward artificial neural network that consists of multiple nodes organized in three or more layers (i.e., the input layer, the output layer, and one or more hidden layers in between). The input variables are mapped onto the output variables through the hidden layer or layers [58,59]. ANN-MLP has been successfully used to solve many difficult problems by utilizing a backpropagation algorithm to train the generated networks. MLP has the capability of separating data that are not linearly separable [56,57,60]. In this study, the ANN-MLP technique was employed to generate a classifier to accurately predict crash severity. It is noteworthy that this method is capable of approximating any finite nonlinear function with extremely high accuracy; thus, it can be practical in the present study. During training, the inputs of the first layer are multiplied by weight coefficients, which may be initialized to randomly selected numbers, and are then passed to the neurons in the second layer. The initial hyper-parameter settings of the ANN-MLP model in WEKA software for predicting crash severity in the present study (e.g., hidden layers, learning rate, momentum, and attribute normalization) are summarized in Table 3.
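The forward pass described above (inputs multiplied by randomly initialized weights and passed to the next layer) can be sketched in plain Python. This is a minimal illustration, not the WEKA implementation: the layer sizes, the sigmoid hidden activation, the softmax output over five severity classes, and all function names are assumptions made for the example, and training by backpropagation is omitted.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def softmax(z):
    # Subtract the maximum for numerical stability.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Random initial weights (one row per output neuron) and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, weights, biases):
    """Weighted sum of the inputs for every neuron in the layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def forward(x, hidden, output):
    h = [sigmoid(v) for v in dense(x, *hidden)]   # input -> hidden layer
    return softmax(dense(h, *output))             # hidden -> class probabilities

rng = random.Random(0)                            # fixed seed for repeatability
hidden = init_layer(n_in=4, n_out=3, rng=rng)     # 4 hypothetical input features
output = init_layer(n_in=3, n_out=5, rng=rng)     # 5 severity classes
probs = forward([0.2, 0.7, 0.1, 1.0], hidden, output)
print(probs)
```

With untrained random weights the five probabilities are arbitrary; training would adjust the weights so that the largest probability tends to fall on the observed severity class.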

DT Techniques
The DT technique is a decision support tool in which tree-like graphs and their feasible outcomes are used to visually display the data [61]. These graphs are made of internal nodes, branches, and leaf nodes. Each internal node expresses a "test" of an attribute, each branch represents the outcome of the test, and each leaf node describes a class label. The paths from the root to the leaves express classification rules [62]. The DT algorithm is a new tool for analyzing the existing crash dataset and predicting crash severity [63]. In the DT, the value of a particular criterion is generally used to specify each internal node. The hyper-parameter settings were selected according to Table 3 to yield the best performance for the DTs in WEKA software. The algorithms applied in the present paper are described below.

C5.0 DT Technique
The C5.0 algorithm is the generalized form of the Iterative Dichotomiser 3 algorithm and uses the gain ratio to select the most important attributes [61]. C5.0 can generate classifiers displayed either as DTs or as rulesets. Many studies prefer rulesets since they are easier to understand than DTs. The C5.0 algorithm proceeds as follows. In the first step, it builds a large tree based on all of the attribute values and then finalizes the decision rule by pruning. In the second step, a heuristic approach is applied for pruning by considering the statistical significance of splits. In the third step, the branch nodes are processed after the best rule is fixed. Finally, the class value is assigned in the last node, which is called the leaf node [64,65]. The initial hyper-parameter settings of the C5.0 DT technique in WEKA software for predicting crash severity in the present study are provided in Table 3.
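The gain ratio criterion mentioned above can be sketched as follows. This is a minimal illustration of the standard definition (information gain divided by split information) for a categorical attribute; the function names and the toy data are hypothetical, and the sketch does not reproduce the full C5.0 procedure (pruning, rulesets, handling of continuous attributes).

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by the split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class labels after the split.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

# Toy data (hypothetical): four crashes with two candidate attributes.
severity = ["injury", "injury", "PDO", "PDO"]
cause = ["speeding", "speeding", "turn", "turn"]
weather = ["clear", "rain", "clear", "rain"]

print(gain_ratio(cause, severity))    # cause separates the classes perfectly
print(gain_ratio(weather, severity))  # weather carries no information here
```

In this toy case the attribute with the higher gain ratio (cause) would be chosen as the splitter.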

CHAID DT Technique
A CHAID tree is a DT that is formed by repeatedly splitting subsets of the space into two or more child nodes, beginning with the entire dataset [66]. To determine the best split at any node, any permissible pair of categories of the predictor variables is merged until there is no statistically significant difference within the pair with respect to the objective variable [63,66,67]. Chi-square tests are applied at each stage in building the CHAID tree to ensure that each branch is associated with a statistically significant predictor of the response variable [68,69]. The CHAID algorithm proceeds as follows. In the first step, the best partition for each predictor is selected, and the data are then subgrouped based on the selected predictor. In the second step, each of these subgroups is analyzed again to produce further subgroups for analysis. In the third step, for each selected pair, the CHAID algorithm checks whether the p-value is greater than a certain threshold, in which case the values are merged and a search is made for an additional potential pair to be merged. Finally, this procedure is continued until no significant pairs are found [65,70].
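The pairwise merging step can be sketched with a plain Pearson chi-square statistic. This is a simplified illustration under stated assumptions: the counts are hypothetical, the function names are invented for the example, and the p-value computation (which needs the chi-square distribution) is omitted. Comparing raw statistics is only equivalent to comparing p-values when every pair has the same degrees of freedom, as is the case here.

```python
from itertools import combinations

def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

def most_similar_pair(counts):
    """Pair of predictor categories whose response distributions differ least."""
    return min(combinations(counts, 2),
               key=lambda pair: chi_square([counts[pair[0]], counts[pair[1]]]))

# Hypothetical counts per weather category: [PDO crashes, injury crashes].
counts = {
    "clear": [60, 40],
    "cloudy": [58, 42],
    "rain": [30, 70],
}

best_pair = most_similar_pair(counts)
print(best_pair, chi_square([counts[best_pair[0]], counts[best_pair[1]]]))
```

In a full CHAID implementation the least different pair would be merged only if its p-value exceeded the merge threshold, and the search would then repeat on the reduced set of categories.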
The initial hyper-parameter settings of the CHAID DT technique in WEKA software for predicting crash severity in the present study are presented in Table 3. Because of space constraints, Table 4 provides only the most significant rules identified in the present study. The frequency of each input attribute in PDO, fatality, severe injury, other visible injuries, and complaint of pain is illustrated in Figures 3 and 4. Based on the data in Figure 3, the number of rules generated by C5.0 for PDO, fatality, severe injury, other visible injuries, and complaint of pain is 12, 25, 96, 135, and 189, respectively. As shown, CAUSE (X1), the number of involved vehicles in the crash (NUMVEHS (X5)), road surface conditions (RDSURF (X3)), design speed (DESG_SPD (X8)), and WEATHER (X2) are the primary splitters in the C5.0 model. This implies that these variables are critical in classifying PDO, fatality, severe injury, other visible injuries, and complaint of pain in traffic crashes with the C5.0 model. The number of rules generated by the CHAID model for PDO, fatality, severe injury, other visible injuries, and complaint of pain is 23, 35, 110, 145, and 198, respectively (Figure 4). Four variables are the primary splitters in the CHAID model: CAUSE (X1), the number of vehicles (NUMVEHS (X5)), WEATHER (X2), and AADT (X15). This indicates that these variables are essential in categorizing PDO, fatality, severe injury, other visible injuries, and complaint of pain in traffic crashes with the CHAID model.

Performance Evaluation of Classifier Accuracy
To determine which algorithm yields the most accurate outcome, it is essential to compare and evaluate the findings of the modeling techniques. Several measures can be considered in performance evaluation; however, the performance of classification algorithms is usually checked by evaluating the correctness of the classification. Accuracy is a fraction that represents the overall success of the classification [71]. Equation (4) presents the general form of the accuracy applied in the comparison process. Table 5 provides the 2 × 2 confusion matrix for a binary classifier that has only positive and negative classes (in our case, it becomes 5 × 5 as there are five classes). TP, TN, FP, and FN can be described as follows [65,66,70]:
TP_i = True positive, namely, instances observed to be from class i that are classified (predicted) correctly as belonging to class i.
FN_i = False negative, namely, instances observed to be from class i that are classified incorrectly as belonging to a class other than i.
FP_i = False positive, namely, instances not observed to be from class i that are classified incorrectly as belonging to class i.
TN_i = True negative, namely, instances not observed to be from class i that are classified correctly as belonging to a class other than i.
Other evaluation measures commonly used to evaluate the effectiveness of a classifier for each class are the true positive rate (TPR), the false positive rate (FPR), and the ROC curve. Equations (4) and (5) explain how to calculate these measures for the class Positive in Table 5.
Recall is the proportion of instances classified as Positive, among all instances belonging to the class Positive.Note that the overall accuracy of a classifier can also be calculated by taking the weighted average of all recall values.
The FPR, or (1 - specificity), is the proportion of instances classified as class Positive while belonging to a different class, among all instances that are not of class Positive, as shown in Equation (5). Finally, the ROC curve is a plot of the TPR (i.e., recall) against the FPR at various threshold settings, showing the trade-offs between true positives (benefits) and false positives (costs).
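The per-class quantities defined above can be computed directly from a multi-class confusion matrix, as in the following sketch. The matrix values and function names are hypothetical; the sketch uses the standard definitions TPR = TP / (TP + FN), FPR = FP / (FP + TN), and accuracy = correctly classified instances / all instances, and it also checks the remark that overall accuracy equals the support-weighted average of the per-class recall values.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of instances observed in class i and predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                        # observed i, predicted as another class
        fp = sum(cm[r][i] for r in range(n)) - tp   # predicted i, observed in another class
        tn = total - tp - fn - fp
        tpr = tp / (tp + fn) if tp + fn else 0.0    # recall
        fpr = fp / (fp + tn) if fp + tn else 0.0    # 1 - specificity
        metrics.append({"TP": tp, "FN": fn, "FP": fp, "TN": tn, "TPR": tpr, "FPR": fpr})
    return metrics

def accuracy(cm):
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

# Hypothetical 3-class confusion matrix (rows: observed, columns: predicted).
cm = [
    [50, 5, 5],
    [10, 20, 0],
    [2, 3, 5],
]

metrics = per_class_metrics(cm)
total = sum(sum(row) for row in cm)
# Recall of each class weighted by its share of the observed instances.
weighted_recall = sum(m["TPR"] * sum(cm[i]) / total for i, m in enumerate(metrics))

print(metrics[0])
print(accuracy(cm), weighted_recall)
```

The same functions apply unchanged to a 5 × 5 matrix for the five severity classes.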

Results
After initializing the MLR model, the MLR equations were compared with each other by means of training and validation, correlation analysis between the independent variables and crash severity, and the significance level of the independent variables in order to find the most appropriate MLR equation for crash severity prediction. The findings of the DT techniques (i.e., C5.0 and CHAID) and the ANN-MLP model in WEKA software for predicting crash severity are presented in terms of correctly and incorrectly classified instances, accuracy, AUCs, and the classification time of crash severity. In the first stage, the DT techniques (i.e., C5.0 and CHAID) and the ANN-MLP model used the entire dataset as the training set, and the precision of the classifier was then determined from its accuracy in predicting the class of every crash. In the second stage, 10-fold cross-validation was employed to evaluate accuracy. To this end, the entire dataset was randomly divided into 10 subsets. A single subset was used as the testing data and the remaining subsets as the training data, and the process was repeated 10 times so that each of the 10 subsets was used exactly once as the testing data. As a result, the entire dataset was used for validation. In the third stage, the overall performance was determined by averaging the results of the 10 folds. In the final stage, to control for any problem resulting from the imbalanced distribution of crash severity in the dataset, the dataset was resampled to bias the crash severity distribution toward a uniform distribution, and 10-fold cross-validation was then applied again to evaluate performance. Hence, a sensitivity analysis based on the running time of classifying crash severity and on the 10-fold cross-validation, training set, and resampled training set was carried out to find the best model. The findings of the proposed models are presented in the following subsections.
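The 10-fold procedure described above can be sketched in plain Python. This is an illustration of the general technique rather than the WEKA routine used in the study: `k_fold_indices`, `cross_validate`, and the `evaluate` callback are hypothetical names, and the resampling step for class imbalance is not shown.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists; each index is used for testing exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # random assignment to folds
    folds = [idx[i::k] for i in range(k)]     # k nearly equal-sized subsets
    for i in range(k):
        test = folds[i]
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test

def cross_validate(n, evaluate, k=10, seed=0):
    """Average a user-supplied score over the k folds.

    `evaluate(train, test)` is a hypothetical callback that fits a model on the
    training indices and returns its score (e.g., accuracy) on the test indices.
    """
    scores = [evaluate(train, test) for train, test in k_fold_indices(n, k, seed)]
    return sum(scores) / len(scores)

# Usage with a dummy scorer that just reports the test-fold size.
mean_fold_size = cross_validate(25, lambda train, test: len(test), k=10)
print(mean_fold_size)
```

Replacing the dummy scorer with one that trains a classifier and returns its accuracy gives the averaged cross-validated accuracy described in the text.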

Correlation Analysis of Independent Variables
To examine the correlation between the independent variables and the dependent variable, seven types of logistic regression models were run, namely, MLR Main, MLR Inter, MLR Poly, MLR Main Inter, MLR Main Poly, MLR Inter Poly, and MLR Main Inter Poly.
Based on the obtained data (Table 6), MLR Inter, MLR Main Inter, MLR Inter Poly, and MLR Main Inter Poly showed the greatest overfitting, since all of them achieved high accuracy on the training set but performed poorly on the validation set, leaving a large gap between training and validation performance. Thus, these four models are unsuitable as predictive models for this dataset. MLR Main was found to be the best logistic regression model since it had the highest accuracy compared to MLR Poly and MLR Main Poly, even though it showed slight overfitting. Based on the correlation analysis between the independent variables and crash severity, seven independent variables demonstrated significant correlations (p < 0.05), including the cause of the crash (X1), weather conditions (X2), road surface conditions (X3), lighting conditions (X4), the number of vehicles (X5), design speed (X8), and, from the driver's aspect, driver's age (X11). The significance levels are shown in Table 7. According to its lower values of the Akaike information criterion (AIC), Bayesian information criterion (BIC), and Pearson's Chi-squared test (χ²) in comparison with the other variables in Table 7, driver's age (X11) accounts for a larger proportion of traffic crash severity among the independent variables. Thus, traffic crashes are closely related to human factors.
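For reference, the two information criteria and the training-validation gap used above can be computed as follows. This is a sketch using the standard textbook definitions of AIC and BIC; the function names and the numeric inputs are hypothetical and are not taken from Tables 6 or 7.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def overfit_gap(train_accuracy, validation_accuracy):
    """A large positive gap suggests that the model overfits the training set."""
    return train_accuracy - validation_accuracy


# Hypothetical values for two candidate models fitted to the same 1000 records.
simple_model = {"aic": aic(-520.0, 8), "bic": bic(-520.0, 8, 1000)}
complex_model = {"aic": aic(-515.0, 40), "bic": bic(-515.0, 40, 1000)}

# Lower AIC/BIC is preferred; the extra parameters here are not worth the small
# gain in log-likelihood.
preferred = "simple" if simple_model["aic"] < complex_model["aic"] else "complex"
```

Both criteria penalize the number of parameters, so a model with many interaction or polynomial terms needs a clearly better likelihood to be preferred.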

Testing Goodness of Fit on the Models
Table 9 presents the results of the Pearson χ² and deviance goodness-of-fit tests. As shown, the p-values of the Pearson χ² and deviance statistics are both >0.05; thus, at the significance level α = 0.05, the model fit is acceptable.
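The two statistics named above can be illustrated for grouped counts. This is a minimal sketch assuming the usual definitions (Pearson χ² as the sum of squared residuals over expected counts, deviance as twice the sum of observed counts times the log ratio); the observed and expected counts are invented and do not reproduce Table 9.

```python
import math


def pearson_chi2(observed, expected):
    """Pearson statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def deviance(observed, expected):
    """Deviance (likelihood-ratio) statistic: 2 * sum of O * ln(O / E).

    Cells with O == 0 contribute nothing to the sum.
    """
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


# Hypothetical observed counts and model-expected counts for three cells.
observed = [10, 20, 30]
expected = [12.0, 18.0, 30.0]

x2 = pearson_chi2(observed, expected)
g2 = deviance(observed, expected)
# A p-value would follow from the chi-square distribution with the appropriate
# degrees of freedom, for example via scipy.stats.chi2.sf(x2, df) if SciPy is
# available; a p-value above 0.05 is read as an acceptable fit.
```

Small values of both statistics relative to their degrees of freedom mean that the fitted model reproduces the observed counts closely.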

DT Techniques and the ANN-MLP Model
Figure 5 presents the relative importance of the independent variables under the C5.0, CHAID, and ANN-MLP models. Based on Figure 5, C5.0 assigns one-quarter of the relative importance to CAUSE (X1), another one-quarter to the number of vehicles involved in the crash (NUMVEHS (X5)), and the remainder to the other variables. According to C5.0, CAUSE (X1), the number of vehicles (NUMVEHS (X5)), road surface conditions (RDSURF (X3)), design speed (DESG_SPD (X8)), and WEATHER (X2) were the most influential variables in the occurrence of crashes.
On the other hand, CHAID attributes one-third of the weight of the crash frequency model to CAUSE (X1), a quarter to the number of vehicles involved in the crash (NUMVEHS (X5)), and the remainder to the other variables. Based on CHAID, CAUSE (X1), the number of vehicles (NUMVEHS (X5)), WEATHER (X2), and AADT (X15) were the most influential variables in the occurrence of crashes. Unlike the DT models, ANN-MLP shows a reasonably homogeneous distribution of relative importance, so the differences among variables are less pronounced. Nevertheless, two variables, CAUSE (X1) and WEATHER (X2), stand out as important in the occurrence of crashes.
Overall, C5.0 and CHAID identified CAUSE (X1) and NUMVEHS (X5) as the most influential variables in the occurrence of crashes, whereas the ANN-MLP model identified CAUSE (X1) and WEATHER (X2) as the most contributing variables.
To show the performance of each decision tree technique for crash severity, the accuracy was computed for each sample dataset (training set, 10-fold cross-validation, and resampled training set) from the correctly and incorrectly classified instances using Equation (4); the results are given in Tables 10-12. The accuracy results for the C5.0 model are shown in Table 10. On the training dataset, the C5.0 prediction accuracy for PDO, fatality, severe injury, other visible injuries, and complaint of pain was 86.72%, 23.67%, 39.65%, 55.78%, and 69.80%, respectively, and the overall prediction accuracy was approximately 88.09%. Based on the 10-fold cross-validation in Table 10, the prediction accuracy for PDO, fatality, severe injury, other visible injuries, and complaint of pain was 78.56%, 10.82%, 17.45%, 25.11%, and 45.78%, respectively, and the overall prediction accuracy was nearly 72.08%. After resampling, however, the prediction accuracy for PDO, fatality, severe injury, other visible injuries, and complaint of pain was 94.53%, 76.87%, 83.26%, 89.10%, and 90.33%, respectively, and the overall prediction accuracy of the C5.0 model was approximately 89.45%. Based on these data, the prediction accuracy improved after resampling the training set.
The accuracy of the CHAID classifier, computed from the correctly and incorrectly classified instances using Equation (4), is shown in Table 11. On the training dataset, the CHAID prediction accuracy for PDO, fatality, severe injury, other visible injuries, and complaint of pain was 86.73%, 23.67%, 36.78%, 68.95%, and 10.99%, respectively, and the overall prediction accuracy was nearly 77.21%. According to the 10-fold cross-validation, the prediction accuracy for PDO, fatality, severe injury, other visible injuries, and complaint of pain was 67.99%, 17.31%, 22.71%, 35.76%, and 8.89%, respectively, and the overall prediction accuracy was approximately 51.55%. After resampling the training dataset, however, the prediction accuracy for PDO, fatality, severe injury, other visible injuries, and complaint of pain was 88.61%, 76.60%, 45.78%, 65.90%, and 76.89%, respectively, and the overall prediction accuracy was nearly 80.49%. Thus, the prediction accuracy increased after resampling the training data.
The prediction findings for the ANN-MLP classifier are presented in Table 12. On the training dataset, the MLP classifier prediction accuracy for PDO, fatality, severe injury, other visible injuries, and complaint of pain was 63.67%, 45.89%, 65.81%, 28.90%, and 16.54%, respectively, and the overall prediction accuracy was approximately 70.21%. Based on the 10-fold cross-validation in Table 12, the prediction accuracy for PDO, fatality, severe injury, other visible injuries, and complaint of pain was 64.22%, 30.89%, 25.10%, 19.23%, and 9.26%, respectively, and the overall prediction accuracy was around 53.80%. After resampling the training dataset, the prediction accuracy was 88.61%, 85.67%, 78.90%, 82.38%, and 85.57% for PDO, fatality, severe injury, other visible injuries, and complaint of pain, respectively, and the overall prediction accuracy was nearly 76.24%. Thus, the prediction accuracy improved after resampling the training data (Table 12).
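The per-class and overall accuracies reported above are derived from counts of correctly and incorrectly classified instances. The sketch below assumes that Equation (4) is the usual ratio of correctly classified instances to all instances; the confusion matrix is a made-up three-class example, not data from Tables 10-12.

```python
def per_class_accuracy(confusion):
    """Share of each true class that was classified correctly.

    confusion[i][j] is the number of crashes of true class i predicted as class j.
    """
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


def overall_accuracy(confusion):
    """Correctly classified instances divided by all instances."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical confusion matrix: rows are true classes, columns are predictions.
classes = ["PDO", "fatality", "severe injury"]
confusion = [
    [80, 15, 5],   # true PDO
    [10, 5, 5],    # true fatality
    [20, 10, 30],  # true severe injury
]

by_class = dict(zip(classes, per_class_accuracy(confusion)))
overall = overall_accuracy(confusion)
```

Because the overall figure weights every crash equally, a frequent class such as PDO dominates it, which is one reason the per-class values are reported alongside the overall accuracy.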

Sensitivity Analysis
Sensitivity analysis was performed on crash severity prediction for the DT techniques and the MLP model. The data in Tables 10-12 indicate that building the MLP classifier takes longer than building the other classifiers (approximately 179 s), whereas building the C5.0 and CHAID classifiers takes 0.05 and 0.76 s, respectively. Figure 6 shows that the overall accuracy of the C5.0 classifier is higher than that of the CHAID and ANN-MLP classifiers in predicting crash severity for 10-fold cross-validation, the training set, and the resampled training set. The high accuracy of C5.0 indicates that it is the best predictive model among those compared. Additionally, the prediction accuracy of all classifiers increased after resampling the training set, indicating improved crash severity prediction performance for the proposed models.
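The comparison behind this sensitivity analysis can be reproduced directly from the overall accuracies and build times quoted in the text. The dictionary layout and variable names below are assumptions made for illustration; the numbers are the ones reported above.

```python
# Overall accuracy (%) per evaluation setting and classifier build time (s),
# as quoted in the text for Tables 10-12.
reported = {
    "C5.0": {"training": 88.09, "cross_validation": 72.08,
             "resampled": 89.45, "build_time_s": 0.05},
    "CHAID": {"training": 77.21, "cross_validation": 51.55,
              "resampled": 80.49, "build_time_s": 0.76},
    "ANN-MLP": {"training": 70.21, "cross_validation": 53.80,
                "resampled": 76.24, "build_time_s": 179.0},
}

settings = ("training", "cross_validation", "resampled")

# Model with the highest overall accuracy in each evaluation setting.
best_by_setting = {
    setting: max(reported, key=lambda model: reported[model][setting])
    for setting in settings
}

# Model that is fastest to build.
fastest = min(reported, key=lambda model: reported[model]["build_time_s"])

# Gain in overall accuracy from resampling, relative to the plain training set.
resampling_gain = {
    model: round(values["resampled"] - values["training"], 2)
    for model, values in reported.items()
}
```

On these figures C5.0 ranks first in every setting and is also the fastest to build, and every model gains accuracy after resampling.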

Conclusions
The classification of crashes based on their severity is crucial since not all crashes have the same financial and injury values. Further, in crash severity analysis, accurate and time-saving prediction models are necessary for classifying crashes based on their severity. In the present study, the crash frequencies of different severity levels (PDO, fatality, severe injury, other visible injuries, and complaint of pain) were predicted for all state highways in California, USA during 2012-2014 using the MLR model, DT algorithms (C5.0 and CHAID), and the ANN-MLP model. Influential independent qualitative and quantitative variables (10 variables of each type) were used for modeling purposes. The following conclusions can be drawn from the obtained data:
(1) Using the MLR models, the cause of the crash (X1), weather conditions (X2), road surface conditions (X3), lighting conditions (X4), the number of vehicles (X5), design speed (X8), and, from the driver's aspect, driver's age (X11) showed significant correlations with crash severity. In addition, given its lower values of AIC, BIC, and χ² in comparison with the other variables, driver's age (X11) accounts for a larger proportion of traffic crash severity among the independent variables.
(2) The C5.0 and CHAID models indicated that the cause of the crash (CAUSE (X1)) and the number of vehicles (NUMVEHS (X5)) were the most important variables involved in the occurrence of crashes.
(3) The ANN-MLP model indicated that CAUSE (X1) and WEATHER (X2) were the most influential variables in crash severity.
(4) With the C5.0 DT model, using the entire dataset as the training set with 10-fold cross-validation and after resampling, the prediction accuracy was 94.53%, 76.87%, 83.26%, 89.10%, and 90.33% for PDO, fatality, severe injury, other visible injuries, and complaint of pain, respectively. For the CHAID classifier, under the same conditions, the prediction accuracy was 88.61%, 76.60%, 45.78%, 65.90%, and 76.89% for PDO, fatality, severe injury, other visible injuries, and complaint of pain, respectively. For the ANN-MLP classifier, under the same conditions, the prediction accuracy for PDO, fatality, severe injury, other visible injuries, and complaint of pain was 88.61%, 85.67%, 78.90%, 82.38%, and 85.57%, respectively.
Finally, the sensitivity analysis showed that the C5.0 model, with five variables, was the best model for predicting road crash severity since it demonstrated the highest accuracy on the training and validation sets compared to the CHAID, ANN-MLP, and MLR models.

Figure 1. The flowchart and process for the prediction of crash severity in the present study. Note: ANN-MLP: Artificial neural network-multilayer perceptron; HSIS: Highway safety information system; PDO: Property damage only; MLR: Multinomial logistic regression; CHAID: Chi-square automatic interaction detector.

Figure 3. Distribution of Five Severity Classes Regarding the C5.0 Model; Note: PDO: Property damage only.

Figure 4. Distribution of Five Severity Classes Regarding the CHAID Model; Note: PDO: Property damage only; CHAID: Chi-square automatic interaction detector.

Figure 5. Relative Importance of Variables Based on the Proposed Models; Note: ANN-MLP: Artificial neural network-multilayer perceptron; CHAID: Chi-square automatic interaction detector.

Figure 7 illustrates the comparison among the proposed models in terms of the identified variables contributing to crash occurrence. As shown, C5.0, with five variables, was chosen as the best model for predicting the types of road crash severity since it achieved the highest accuracy on the training and validation sets compared to the CHAID, ANN-MLP, and MLR models.

Figure 7. Prediction Data Using Different Proposed Models Based on Accuracy and the Number of Variables; Note: ANN-MLP: Artificial neural network-multilayer perceptron; CHAID: Chi-square automatic interaction detector.

Table 1. Qualitative and Quantitative Independent Variables Employed in the Models (2012-2014).

Table 2. Statistical Analysis of Quantitative Variables.

Table 3. Hyper-parameter Settings for All Classifiers in the Present Study.

Table 6. Different Proposed Types of Logistic Regression Equations.

Table 7. Significance Level of Independent Variables.

Table 7. Cont. Note: AIC: Akaike information criterion of the simplified model; BIC: Bayesian information criterion of the simplified model. Lower values of AIC, BIC, and χ² indicate lower penalty terms; hence, an important variable is selected in the model. df: Degrees of freedom.

Table 8. MLR model for crash severity.

Table 9. Goodness of Fit.

Table 10. Data of Prediction Crash Severity Regarding the C5.0 Model.
Note: PDO: Property damage only; AUC: Area under the curve.

Table 11. Data of Prediction Crash Severity Regarding the CHAID Model.
Note: PDO: Property damage only; AUC: Area under the curve; CHAID: Chi-square automatic interaction detector.

Table 12. Data of Prediction Crash Severity Regarding the ANN-MLP Model.