Multilevel Mixed-Effects Models to Identify Contributing Factors on Freight Vehicle Crash Severity

Freight vehicle crashes are more serious than crashes involving other vehicles because, once they occur, they are likely to cause major damage and injury; countermeasures are therefore needed. The fatality rate from freight vehicle crashes is 1.5 times higher than that of all other accidents, and the death rate from expressway freight vehicle crashes continues to increase. In this study, ten freight vehicle crash severity models (ordered logit and probit, multinomial logit and probit, mixed-effects logit and probit, random-effects ordered logit and probit, and multilevel mixed-effects ordered logit and probit) are used to analyze the factors affecting freight vehicle crash severity. The models were constructed using data collected from expressways over eight years; 13 factors that increase crash severity and 7 factors that reduce it were identified. A comparison of the 10 models using AIC and BIC showed that the multilevel mixed-effects ordered probit model performed best. The severity-related factors derived in this study are expected to contribute to improving freight vehicle safety on expressway sections.


Introduction
Korea's truck traffic and cargo volume are steadily increasing. According to the Korea Transport DataBase (KTDB) analysis of cargo traffic demand, road cargo volume has grown by 1.2% annually since 2015, and the total traffic volume of cargo vehicles has increased by 3.2% annually [1,2]. In particular, the traffic volume of large cargo trucks exceeding 8.5 tons is increasing significantly: the traffic volume of large trucks and trailers increased by 11.1%, compared with a 1.7% increase for small trucks and a 6.9% increase for medium-sized trucks. In addition, according to statistics from the National Logistics Information Center, even while domestic traffic in Korea was decreasing due to COVID-19, the delivery industry expanded, with delivery cargo volume 20.9% higher than before COVID-19. With the increase in truck traffic, truck safety on highways is becoming a problem. Truck crashes are more serious than crashes of general vehicles; once they occur, they are likely to become large-scale crashes, so countermeasures are required. According to the Korea Expressway Corporation, 48.5% of the 523 highway deaths over the five years from 2015 to 2019 were caused by truck-related crashes. In addition, the proportion of deaths from truck crashes has continued to increase, from 42.7% in 2017 to 43.7% in 2018 and 46.6% in 2019. According to the Korea Road Traffic Authority (KoRoad) Traffic Accident Analysis System (TAAS), the overall crash fatality rate was 1.93%, while the fatality rate of truck crashes was 3.49%, which is approximately 1.5 times higher. This study aims to analyze the factors influencing the severity of truck crashes using highway truck crash data for the eight years from 2011 to 2018. For the analysis, models were developed that combine the ordered model with additional methods such as the multinomial, random-parameter, mixed-effects, and multilevel methods.
that the truck accident was significantly different from ordinary vehicles in terms of the high-speed limit, whether to turn left or right, and whether to collide. Rosenbloom et al. (2009) [16] surveyed a total of 167 drivers in Israel and compared the on-road behavior of truck drivers and non-truck drivers using analysis of variance. Zhu and Srinivasan (2011) [17] analyzed truck accidents in the Large Truck Crash Causation Study (LTCCS) data using an ordered probit model and found that accident severity was highest when a freight vehicle and a passenger car collided head-on at an intersection. Chang and Chen (2013) [18] built a CART classification tree from truck traffic accidents in Taiwan and analyzed the characteristics of the accidents; drunk driving and seat belt use were found to have a great influence on the occurrence of fatal accidents. Choi (2013) [19] analyzed the crash severity of highway trucks according to weather conditions using binary logistic regression. Under normal weather conditions, night driving and speeding affected severity, whereas under abnormal weather conditions, severity increased in downhill sections, on bridges, and in tunnel sections. Han et al. (2014) [20] analyzed the characteristics of truck accidents using traffic accident data from Seoul and the CART technique. They found that the possibility of death varies greatly depending on whether protective equipment such as seat belts is worn, and that the more experienced the driver, the higher the severity of the accident. Hong et al. (2019) [21] sought to identify the factors influencing freight vehicle crashes and to estimate crash probability more accurately by accounting for endogenous driver traffic violations with a two-stage residual inclusion approach, a method used in nonlinear regression analysis. The driver's physical condition, improper passing, speeding, and safe distance violations were identified as influencing factors. Park et al. (2019) [22] developed multivariate adaptive regression splines (MARS) models to analyze crashes of all vehicle types and freight vehicle crashes. The MARS models showed better fit than the negative binomial (NB) models.
In previous studies, crash severity analysis was performed mainly using a binary logistic model or an ordered probit model, and it was difficult to find cases in which other models were used. This study aims to construct truck crash severity models using a variety of models in addition to those used in previous studies. Beyond the commonly used ordered model, methodologies such as the multinomial, random-effects, mixed-effects, and multilevel methods are applied. The constructed models were then evaluated by comparing their performance.

Ordered Model
The ordered model is a representative model that can be used when the dependent variable has three or more ordered categories. In general, a binary probit or logit model can be used when the dependent variable has two outcomes. However, if the dependent variable is not binary, analysis with a binary probit or logit model risks errors. In particular, a model that ignores the ordering may wrongly treat the difference between Y = 0 and Y = 1 (property damage only versus injury) as equivalent to the difference between Y = 1 and Y = 2 (injury versus fatal). Ordered models address the limitation that binary models cannot be used when the dependent variable has ordered categories. The logit model and the probit model share the same basic characteristics but differ in the probability distribution assumed for the error term. The logit model assumes that the error terms have equal variance and follow independent logistic distributions, whereas the probit model assumes that the error terms follow a normal distribution with equal variance and zero covariance. The ordered probit model can be expressed as Equation (1) [23–28], where y is a latent utility consisting of a measurable utility and an unmeasurable utility. µ denotes the thresholds, which are estimated together with the coefficients of the explanatory variables; there are J − 1 of them for J severity categories. These can be used to calculate the selection probability of each alternative, which can be expressed as Equation (2).
To verify the suitability of the finally derived model, ρ² (the likelihood ratio index), which represents the fit of the entire model, is used. Similar to R² in regression analysis, ρ² takes a value between 0 and 1, and the closer it is to 1, the better the fit. When ρ² is 0.2 or more, the fit may be regarded as sufficient.
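As an illustration of how the thresholds µ turn a latent index into severity probabilities, the following minimal Python sketch assumes the standard ordered probit form, P(y = j) = Φ(µ_j − βX) − Φ(µ_(j−1) − βX), and McFadden's definition of the likelihood ratio index, ρ² = 1 − LL(β)/LL(0). The function names, threshold values, and log-likelihoods are hypothetical and are not taken from the study.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(xb, thresholds):
    """Return P(y = j) for j = 0..J-1 given the linear index xb = beta * X
    and J-1 increasing thresholds (assumed standard ordered probit form)."""
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    cdf = [normal_cdf(c - xb) for c in cuts]
    return [cdf[j + 1] - cdf[j] for j in range(len(cuts) - 1)]


def likelihood_ratio_index(ll_model, ll_null):
    """McFadden's rho-squared: 1 - LL(model) / LL(constants-only model)."""
    return 1.0 - ll_model / ll_null


# Hypothetical values: three severity levels (PDO, injury, fatal) need two thresholds.
example_probs = ordered_probit_probs(xb=0.5, thresholds=[1.0, 2.0])
example_rho2 = likelihood_ratio_index(ll_model=-800.0, ll_null=-1000.0)
```

Raising the linear index shifts probability mass from the lowest category toward the highest, which is why each coefficient in an ordered model moves severity in a single direction.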

Multinomial Model
The multinomial model is a useful analysis method for classifying observations according to the values of predictors, and it can be used when the dependent variable has neither hierarchy nor order. It starts from the concept that the dependent variable follows a multinomial distribution and is characterized as an extension of the binary choice situation to multinomial choice. U_ni is a function that determines the severity; if it is linear, it takes the form of Equation (3) [29–33],
where ε is an error term describing unobserved effects on severity. If ε follows the type 1 extreme value distribution, the model becomes a multinomial logit model.
The coefficients can be estimated by the maximum likelihood method. In an ordered logit or probit model, each variable is constrained to shift severity in only one direction, but in reality a variable can affect severity in both directions, in a U-shaped form. The multinomial model differs from the other models in that it has this flexibility.
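A minimal sketch of how severity probabilities follow from linear utilities under the type 1 extreme value assumption, which yields the multinomial logit form P(i) = exp(V_i) / Σ_j exp(V_j). The coefficient values and the two-variable layout (an intercept and a night indicator) are hypothetical, chosen only for illustration.

```python
import math


def linear_utility(beta, x):
    """Systematic part of the utility: the inner product of beta and x."""
    return sum(b * v for b, v in zip(beta, x))


def multinomial_logit_probs(utilities):
    """Multinomial logit probabilities; subtracting the maximum utility
    before exponentiating avoids overflow without changing the result."""
    top = max(utilities)
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical coefficients per severity outcome: [intercept, night indicator].
betas = {
    "pdo": [0.0, 0.0],      # reference outcome, utility fixed at zero
    "injury": [-1.5, 0.2],
    "fatal": [-3.0, 0.6],
}
x = [1.0, 1.0]  # intercept term and night = 1

utilities = [linear_utility(b, x) for b in betas.values()]
probs = dict(zip(betas, multinomial_logit_probs(utilities)))
```

Because each outcome has its own coefficient vector, a variable can raise the probability of one severity level without forcing a monotonic shift across all levels, which is the flexibility the ordered models lack.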

Mixed Model
Theoretically, logistic models do not allow parameters to vary across observations. To address this heterogeneity, mixed-effects models (also referred to as random-parameters models) were developed to allow some parameters to vary across crash observations [34,35]. The mixed-effects logistic model is given as Pr(y_i) = ∫ Φ(X_i β) f(β|ϕ) dβ, where Pr(·) = probability of injury severity y_i of crash observation i, β = vector of regression coefficients, X_i = vector of explanatory variables for crash observation i, Φ = cumulative distribution function of the logistic distribution, and f(β|ϕ) = density function of the random parameter β with distribution parameter ϕ. This equation is a mixture of two distributions: Φ(X_i β) for the error term and f(β|ϕ) for the random parameters.
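The mixture of Φ(X_i β) and f(β|ϕ) generally has no closed form and is commonly evaluated by simulation, averaging Φ(X_i β) over random draws of β. The sketch below illustrates that idea under the assumption that each coefficient is independently normally distributed; the source does not state which mixing distribution was used, and the function names and numeric values are hypothetical.

```python
import math
import random


def logistic_cdf(z):
    """Cumulative distribution function of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def simulated_mixed_logit_prob(x, beta_mean, beta_sd, n_draws=2000, seed=0):
    """Approximate the integral of logistic_cdf(x * beta) over the density of beta
    by Monte Carlo, drawing each coefficient from Normal(mean, sd).
    The normal mixing distribution is an assumption made for illustration."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        beta = [rng.gauss(m, s) for m, s in zip(beta_mean, beta_sd)]
        total += logistic_cdf(sum(b * v for b, v in zip(beta, x)))
    return total / n_draws


# Hypothetical example: the second coefficient varies across crash observations.
p_random = simulated_mixed_logit_prob([1.0, 2.0], [0.5, -0.25], [0.0, 1.0])
```

Setting every standard deviation to zero collapses the mixture to the ordinary fixed-parameter logit probability, which provides a quick sanity check on the simulation.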

Random-Effects Ordered Model
Discrete choice models for ordinal response data have been applied to explore injury severity in transport safety [7,36,37]. Because occupants of the same vehicle are expected to share unobserved factors affecting injury severity, a random-effects component was included in the model. The random-effects ordered probit model used in this study follows the specification in [38,39], where y*_ij = latent variable representing the severity of the ith person in the jth vehicle, β = (k × 1) vector of coefficients for the independent variables, X_i = (1 × k) vector of observed independent variables, and ε_ij = error term combining the random effect for group j and the person-level random effect.

Multilevel Mixed-Effects Model
A multilevel model can explicitly model complex variances and heterogeneity. Its advantage is that it yields a multilevel mixed-effects model combining the fixed parameters estimated by the ordered model with the random parameters of the multilevel structure. A multilevel mixed-effects ordered logit model can be expressed as follows [40]:
Y*_ijk = X_ijk β + W_jk δ + V_k γ + u_jk + v_k + e_ijk
where Y*_ijk = latent continuous response representing the level of driver injury for driver i, traffic crash j, and area k; X, W, V = fixed parts of the explanatory variable design matrices for the first, second, and third levels; β, δ, γ = corresponding coefficients; u_jk + v_k + e_ijk = random part of the model, in which subsets of X, W, and V serve as design matrices for the first, second, and third levels, representing both random intercepts and random coefficients; e_ijk = set of first-level random effects; u_jk = set of second-level random intercepts and random coefficients; and v_k = set of third-level random intercepts and random coefficients.

Marginal Effect
Marginal effects are used to interpret the results derived from each model accurately. The marginal effect estimates, through partial differentiation, the change in crash severity probability caused by a one-unit increase in an independent variable, which makes it possible to identify how each independent variable affects accident severity. The direct marginal effects are calculated by Equation (7) [41–43].
The direct marginal effects represent the impact of a change in variable k of outcome j on the probability that crash i falls in severity outcome j.
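To make the idea of a marginal effect concrete, the sketch below approximates the partial derivative of each severity probability with respect to one explanatory variable by a central finite difference. It uses an ordered probit form purely as an example; it is not the paper's Equation (7), and the coefficient, index, and threshold values are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probs(xb, thresholds):
    """Ordered probit probabilities for each severity category."""
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    return [normal_cdf(cuts[j + 1] - xb) - normal_cdf(cuts[j] - xb)
            for j in range(len(cuts) - 1)]


def marginal_effects(beta_k, xb, thresholds, step=1e-5):
    """Central-difference estimate of dP(y = j)/dx_k for every category j.
    Changing x_k by `step` changes the linear index xb by beta_k * step."""
    up = category_probs(xb + beta_k * step, thresholds)
    down = category_probs(xb - beta_k * step, thresholds)
    return [(u - d) / (2.0 * step) for u, d in zip(up, down)]


# Hypothetical values: a positive coefficient evaluated at index 0.5.
effects = marginal_effects(beta_k=0.4, xb=0.5, thresholds=[1.0, 2.0])
```

Because the category probabilities always sum to one, the marginal effects across severity levels sum to zero: an increase in the probability of a fatal crash must be offset by decreases elsewhere.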

Analysis of Data
This study used data on freight vehicle crashes that occurred on expressways during the eight years from 2011 to 2018. The data were collected from sections under the jurisdiction of the Korea Expressway Corporation; data from privately operated highway sections are not included. A total of 22,619 freight vehicle crashes occurred over the eight years, and the dependent variable was classified into three levels: death, injury, and property damage only [43–48]. The independent variables comprise 18 types. There were 808 deaths, accounting for 3.6% of the total, and injuries accounted for 10.5% of the total. Crashes at night accounted for 33.8%, and 60% of all crashes occurred on highway main lines. Crashes caused by driver factors accounted for 73% of all crashes, and 95% of crashes occurred when there were no traffic obstacles. The most frequent crash causes were, in order, negligence, speeding, vehicle defects, and drowsy driving. By crash type, car-facility crashes accounted for 61.9% and car-car crashes for 19.5%, and 65% of all crashes occurred in clear weather. Only 161 crashes occurred on right curves, while 2719 occurred on left curves, and the rates of crashes on uphill and downhill slopes were similar. Medium-sized cargo trucks had the highest number of crashes, and by driver age group, crashes were most frequent among those in their 20s, followed by those in their 50s and 40s. For crashes that occurred while driving, the average speed was 91 km/h. The basic statistics for each variable are shown in Table 1.

Results and Discussion
Using the variables described in Table 1, ordered logit and probit models, multinomial logit and probit models, mixed-effects ordered logit and probit models, random-effects ordered logit and probit models, and multilevel mixed-effects ordered logit and probit models were constructed. Maximum likelihood estimation was used for each model. Each model was built by removing variables until only those significant at the 95% confidence level remained; the final results are as follows.

Ordered Model
The results of the ordered logit and probit models are shown in Table 2. The variables derived from the logit model and the probit model were the same except for the Saturday variable, and all variables had the same sign in both models. Coefficients were high for crashes caused by driver carelessness, such as speeding, drowsiness, failure to keep a safe distance, and negligence. Crash severity also increased significantly when traffic ahead was congested or the vehicle ahead was stopped. In contrast, severity decreased at tollgates, in ramp sections, in snowy weather, and in construction areas; these factors appear to lower severity because drivers have to slow down. In addition, the coefficients of the ordered logit model were approximately twice as large as those of the ordered probit model. The marginal effects for the ordered logit and probit models are shown in Table 3. In most cases, the influence on injury crashes was the largest, but some variables show different results. For the nighttime variable, the influence was greatest on fatal crashes in the ordered logit model but on injury crashes in the ordered probit model. By age group, drivers in their 40s and 50s had the greatest impact on injury crashes, whereas those over 60 had the greatest impact on fatal crashes.

Multinomial Model
The results of the multinomial logit and probit models are shown in Table 4. Four variables (year, ramp, crash factor-vehicle, and Saturday) affected only injury crashes. Night, tollgate, construction area, guard fence, and age group affected only fatal crashes. Drowsiness, lack of safe distance, and negligence had a significant effect on both injury and death, whereas speeding affected only fatal crashes. Regarding weather, fog increased crash severity. Regarding road alignment, severity was high in right-curve and downhill sections. This is believed to be because soundproof walls may restrict the driver's field of view in right-curve sections and because braking problems may arise in downhill sections. The variables derived from each model all have the same sign. The marginal effects for the multinomial logit and probit models are shown in Table 5. Fog and road fault affected fatal and property damage only crashes. By age group, the severity of fatal crashes increased for all age groups from the 30s upward.

Mixed-Effects Model
For the mixed-effects model, the variables in Table 6 were finally derived. The 50s age group was significant only in the logit model, and the median-other and guard fence variables were significant only in the probit model. The mixed-effects model combines the results of the ordered and multinomial models analyzed above: speeding, drowsiness, lack of safe distance, and negligence, which had high coefficients in the ordered model, were derived together with the right curve and downhill variables, which had high coefficients in the multinomial model. All variables derived in both the logit and probit models have the same sign.

Random-Effects Ordered Model
For the random-effects ordered model, the variables in Table 7 were derived. Except for the guard fence variable, the variables derived from the logit model and the probit model were the same. Unusually, a different result was obtained for crash type. In the other models, the severity coefficient was highest for car-person crashes, followed by car-car and car-facility crashes, but in the random-effects ordered model the car-facility coefficient was the highest and the car-person coefficient the lowest. The marginal effects for the random-effects ordered logit and probit models are shown in Table 8. In most cases, the influence was greatest for injury crashes, but for tollgates the influence was greatest for property damage only crashes. This seems to be because crashes within tollgates are mainly collisions with facilities. In addition, for guard fences, the effect on property damage only crashes was greatest in the random-effects ordered logit model, while the opposite result was found in the probit model.

Table 8. Marginal effects of the random-effects ordered logit and random-effects ordered probit models.

Multilevel Mixed-Effects Ordered Model
For the multilevel mixed-effects ordered model, the variables in Table 9 were derived. The influencing factors were the same as those of the models analyzed above. The variables derived from the logit model and the probit model were the same, and all variables have the same sign. The marginal effects for the multilevel mixed-effects ordered logit and probit models are shown in Table 10. In most cases, the effect was greatest for injury crashes, but for tollgates and construction areas it was greatest for property damage only crashes.

Model Comparison
AIC (Akaike information criterion) and BIC (Bayesian information criterion) were used to compare the logit and probit models estimated on the same data. The AIC and BIC can be formulated as follows [49]: AIC = −2LL + 2k and BIC = −2LL + ln(N) × k, where LL is the log-likelihood of the full model with statistically significant explanatory variables, k is the number of parameters estimated in the model, and N is the number of observations. In addition to measuring model fit, these two measures account for model complexity by penalizing the criterion for the number of parameters included in the model. This penalty is applied through the 2k and ln(N) × k terms in the equations. When models are fit to the same data set, the model with the lower AIC and BIC is considered to outperform the others. The AIC and BIC for all models are reported in Table 11. Comparing the values for each model shows that the multilevel mixed-effects ordered probit model had the lowest AIC and BIC, followed by the multilevel mixed-effects ordered logit model. Comparing model groups, the random-effects model performed best after the multilevel mixed-effects model, and the multinomial model performed very similarly to the random-effects model. In addition, in all model groups, the AIC and BIC of the probit model were lower than those of the logit model.
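The comparison procedure can be sketched in a few lines of Python using the standard definitions AIC = −2LL + 2k and BIC = −2LL + ln(N) × k. The number of observations is the crash count reported for this data set; the model names, log-likelihoods, and parameter counts are hypothetical placeholders, not values from Table 11.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 LL + 2 k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 LL + ln(N) k."""
    return -2.0 * log_likelihood + math.log(n_obs) * n_params


N_OBS = 22619  # freight vehicle crashes in the eight-year data set

# Hypothetical (log-likelihood, number of parameters) pairs for two models.
candidates = {
    "model_a": (-9000.0, 20),
    "model_b": (-8980.0, 35),
}

scores = {name: (aic(ll, k), bic(ll, k, N_OBS))
          for name, (ll, k) in candidates.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

With N this large, ln(N) is about 10, so BIC penalizes each additional parameter roughly five times as heavily as AIC; in this made-up example the two criteria therefore prefer different models, which is why reporting both is informative.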

Conclusions
This study examined the factors affecting the severity of highway truck crashes by applying 10 crash severity models to truck crash data collected on highways over eight years. A total of 26 variables were selected across the 10 models, and the influence of each variable is summarized in Table 12. The variables that increased accident severity were night, speed, vehicle factors, speeding, drowsy driving, congestion, front vehicle stopping, fog, downhill, safety distance not secured, front gaze negligence, right curve, and age. For night driving, the coefficient was high, especially in the models related to fatal crashes. Because Korean expressway policy provides toll discounts at night, the share of truck traffic at night is high, and fatigue from long-distance operation accumulates, increasing collision severity. For similar reasons, drowsy driving also has a high coefficient, including for fatal crashes, so measures such as installing additional rest areas and sleep rest areas are needed. Speed increased accident severity in all models, but its coefficient was low, 0.01 or less, in every model. This is attributed to cases in which speed was missing from the crash data. Speeding appeared as a major variable in all models. For a vehicle stopped ahead, the effect is attributed to secondary accidents, in which an additional crash involves a vehicle stopped on the highway because of a breakdown or a prior accident.
In particular, when occupants, including the driver, of a vehicle stopped on the main line were exposed on the road because of a preceding breakdown or accident, accident severity increased, and the coefficient of the vehicle-to-vehicle accident type was confirmed to be the highest. For congestion, frequent stop-and-go situations were found to lead to failure to secure a safe distance and to forward gaze negligence. For age, in all models, the older the driver, the higher the accident severity coefficient. However, unlike in previous studies, the accident severity of young drivers was the lowest, so further research is needed. For the right curve, severity increased because trucks mainly use the right lane, which makes it difficult to see ahead through a right curve.
The variables that lowered accident severity were tollgates and ramps. Tollgates and ramps require deceleration and, in many cases, the driver's concentration. Road fault, snow, wet surface conditions, and construction areas were also found to lower accident severity, but this result should be interpreted with various situations in mind. These four factors are situations that drivers must pass through with very high concentration, and the need for low-speed driving is accordingly high. However, as soon as a driver error or another problem occurs in such sections, a very dangerous situation may arise. With a road fault, drivers often take dangerous actions such as sudden lane changes to avoid the affected section. In snow, a vehicle without equipment such as snow chains risks driving on a very slippery road. Similarly, on a wet road surface, there is a risk that the surface is slippery and that the brakes do not grip properly because of the water film. Finally, construction areas have geometric problems: the road narrows and the alignment is poor. In addition, the presence of workers at construction sites increases the risk of pedestrian accidents on expressways.
In the comparison between models, the multilevel mixed-effects ordered logit and probit models showed the best performance. This is because conditions such as random effects are all considered within the model, which combines the advantages of each existing model [40].
Future work should include analysis with severity models not used in this study. In addition, time of day was classified simply as day or night, so analysis with a more detailed classification appears necessary. Finally, specific measures to improve the transport safety of cargo trucks should be sought through comparison with accident severity models for all vehicles on highways.