1. Introduction
Traffic crashes are ranked as the seventh leading cause of death in the U.S. [1]. In economic terms, the cost of crashes amounts to $871 billion [2]. Run-off-road (ROR) crashes account for a significant proportion of that cost, especially as they make up a significant proportion of severe crashes. Even though traffic barriers are a well-known countermeasure for reducing the severity of ROR crashes, barrier crashes themselves still account for a high proportion of severe crashes.
1.1. Traffic Barriers and Shoulder Width
Because this study considers three models, relating to crash severity, shoulder width, and barrier type, the next few paragraphs discuss some of the related studies.
One study evaluated the severity of median barrier crashes [3]. A nested logit model, which accounts for unobserved effects shared among the severity levels, was employed. The results indicated that a collision with a cable median barrier decreases the probability of a severe crash.
The effects of various geometric characteristics of median barriers were evaluated using a random-parameter model [4]. The results indicated that crashes involving concrete barriers are more prone to be severe. The impact of shoulder width on traffic safety has also been evaluated extensively in the literature. Shoulders are provided as recovery areas or for emergency stops. Wider shoulders could encourage drivers to adopt a higher operating speed, which might result in higher crash severity [5].
However, none of the past studies took into consideration that barrier types are likely to be installed at the discretion of policy makers, who assign a specific barrier to a specific location. For instance, concrete barriers are more likely to be installed at locations with higher traffic and no sharp curves. Cable barriers, on the other hand, being less stiff in crashes, might be installed at locations with lower traffic and a lower likelihood of crashes. In addition, despite the contribution of past studies to the safety of traffic barriers, none of them considered the possible endogeneity of shoulder width in the severity of traffic barrier crashes.
1.2. Bivariate Copula Method
Copula-based methods can be used in statistical analyses to account for possible endogeneity. The next few paragraphs highlight a few studies that employed copula-based methods in traffic safety.
Injury severity, crash type, vehicle damage, and driver error were estimated jointly for intersection crash data [6]. It was found that those factors are highly correlated, and that a joint analysis of the variables is needed. In another study, the interaction terms of predictors and various years were considered for temporal instability evaluation. A binary logit copula-based method was used to model the severity of two-vehicle crashes while considering endogeneity [7]. In that study, collision type was treated as endogenous to the crash severity outcome. The significance of the copula dependence highlighted the importance of considering the endogeneity between collision type and injury severity.
In another study, a bivariate copula-based method was implemented to identify the factors contributing to the injury outcomes of at-fault and not-at-fault drivers in head-on collisions [8]. A positive dependence was detected between the injury outcome models of at-fault and not-at-fault drivers.
The choice of policy makers in setting up warning signs at more hazardous locations was evaluated in a past study [9]. The results highlighted an important and significant correlation attributable to the choice of policy makers.
To the best of the authors' knowledge, no study has applied a trivariate copula to a transportation problem, so a study that implemented the method in another field is outlined in the next subsection.
1.3. Trivariate Copula
A trivariate copula method was implemented to model sample selection and treatment effects on family health care demand [10]. Three equations were considered in that study: a dichotomous choice equation for insurance status selection, and one equation for the health care use of each spouse. Significant dependence was observed between the regressors and the outcome model.
1.4. Why Copula Is Needed for This Case Study
Despite much effort in applying copula-based methods in traffic safety studies, a comprehensive application of the copula method to traffic barriers is missing. This is especially important as many barrier characteristics, such as the selection of a specific barrier type for a specific location, might be decided by policy makers based on criteria such as the crash proneness of a location or the amount of traffic. For instance, concrete barriers have been installed at locations with a higher likelihood of crashes and higher traffic, such as median locations. That is due to the rigidity of concrete barriers, which allows them to withstand numerous crashes.
The design of various roadway characteristics, such as shoulder width, is expected to vary with the conditions of the road segment. Those variables are also typically correlated with crashes, necessitating the use of copulas for modeling their dependence structures. For instance, shoulder width decisions might be driven by truck traffic, to accommodate the needs of trucks, or by higher traffic volumes, so that drivers have more time to return to the roadway after running off the road.
Despite the importance of the copula method for safety analysis, especially for evaluating various aspects of roadside design, few studies have investigated those factors while accounting for endogeneity. Thus, this study was conducted to identify the factors contributing to barrier crash severity after accounting for the correlation between the error terms of the decisions that determine the barrier type or shoulder width at specific locations and the error term of the crash severity model itself. Furthermore, shoulder width and barrier type were considered as independent variables in the main model of crash severity. The three considered barrier types are shown in Figure 1 for readers who are not familiar with them.
Finally, it should be noted that numerical simulation has also been employed for analyzing road safety barriers; this paragraph outlines a few such studies. One paper addressed the numerical simulation of concrete barriers [11] and identified the locations in concrete barrier segments. The safety performance of precast concrete barriers was evaluated with a numerical method [12]. To obtain a precise representation of the damage state, a stochastic damage-plasticity model was adopted. The impact locations, loads, and boundary constraints were highlighted.
The remainder of this manuscript is organized as follows: the method section details the methodology implemented in this study. The results section illustrates the approach on a real dataset, whereas the discussion section discusses the findings and potential directions for future studies.
2. Method
The method section is presented in four subsections. First, the general background briefly presents the general framework of copulas. The second subsection, on the Gaussian copula mixture model, presents the Gaussian copula in general and for three univariate variables. The multivariate copula model subsection describes how the copula method is implemented on a multivariate model and briefly outlines the process of estimating the parameters. As endogeneity is at the core of the copula method, the last subsection discusses that concept.
2.1. General Background
The term copula is derived from the Latin word copula, meaning to connect or join [13]. The concept behind the method lies in the question of how much information is gained for predicting an observation $X_2$ by knowing an observation $X_1$. Such a gain is expected when the two random variables are interrelated or dependent. The copula method models both the marginal distributions and their interrelationship.
The first step in the above prediction process is to transform the random variables $X_i$ into uniformly distributed random variables $U_i = F_i(X_i)$. The original values of $X_i$ can be recovered by using the inverse of the cumulative distribution function (cdf) $F_i$, or $X_i = F_i^{-1}(U_i)$, where $F_i^{-1}$ is called the quantile transformation.
Based on Sklar's theorem, for a d-dimensional cdf $F$ with margins $F_1, \ldots, F_d$, there exists a copula $C$ such that:
$$F(x_1, \ldots, x_d) = C\big(F_1(x_1), \ldots, F_d(x_d)\big)$$
From the above, it should be noted that while $x_i$ can range over $(-\infty, \infty)$, the range of $F_i(x_i)$ is [0, 1], because the value of a cdf is a probability.
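The transformation to uniform variables and its inverse can be illustrated with a short sketch. It assumes, purely for illustration, a normal marginal with mean 60 and standard deviation 10 (values that are not taken from this study) and relies on `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Assumed marginal for illustration only: X ~ Normal(mean=60, sd=10).
marginal = NormalDist(mu=60.0, sigma=10.0)


def to_uniform(x: float) -> float:
    """Probability integral transform U = F(X); the result lies in [0, 1]."""
    return marginal.cdf(x)


def from_uniform(u: float) -> float:
    """Quantile transformation X = F^{-1}(U); requires 0 < u < 1."""
    return marginal.inv_cdf(u)


observations = [45.0, 60.0, 72.5]          # hypothetical raw values
uniforms = [to_uniform(x) for x in observations]
recovered = [from_uniform(u) for u in uniforms]
```

Any continuous marginal could replace the normal one here; the copula then operates only on the transformed values in `uniforms`.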
One of the important features of a copula-based model is its treatment of dependence. Consider a bivariate copula $C(u, v)$ and a bivariate cdf $H(x, y)$, where $H(x, y) = C(F(x), G(y))$. We would have:
$$C(v \mid u) = P(V \le v \mid U = u) = \frac{\partial C(u, v)}{\partial u}$$
where $U$ and $V$ are two uniform random variables, $u$ is known, and the objective is to find the conditional distribution of the unknown, $v$.
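As an illustration of such a conditional distribution, the sketch below evaluates it for one specific family, the bivariate Gaussian copula discussed in the next subsection, for which the conditional cdf has the known closed form $\Phi\big((\Phi^{-1}(v) - \rho\,\Phi^{-1}(u)) / \sqrt{1 - \rho^2}\big)$. The function name and the example values of `rho` are assumptions made for this example only.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def gaussian_conditional_cdf(v: float, u: float, rho: float) -> float:
    """P(V <= v | U = u) under a bivariate Gaussian copula with correlation rho.

    Requires 0 < u < 1, 0 < v < 1 and -1 < rho < 1.
    """
    z_u = _STD_NORMAL.inv_cdf(u)
    z_v = _STD_NORMAL.inv_cdf(v)
    return _STD_NORMAL.cdf((z_v - rho * z_u) / sqrt(1.0 - rho * rho))


# With rho = 0, knowing u carries no information, so the result is simply v.
independent_case = gaussian_conditional_cdf(0.3, 0.9, 0.0)
# With positive dependence, a high u makes a low v less likely.
dependent_case = gaussian_conditional_cdf(0.5, 0.9, 0.7)
```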
2.2. Gaussian Copula Mixture Model
It has been argued that Archimedean copulas, such as Frank or Gumbel, lack the flexibility to accurately model the dependence across a large number of variables [14]. In addition, there are limitations in extending Archimedean copulas to trivariate or higher orders. One of the main difficulties is that the lower-level dependence cannot be preserved, because the copula does not provide a unique distribution when the sequence of variables is altered [15].
However, the multivariate Gaussian distribution can be implemented without suffering from the aforementioned limitations [16]. That is why only the Gaussian copula was considered here, given the possible limitations of other copulas in accommodating the trivariate case.
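A Gaussian copula can be written in terms of its correlation matrix. For instance, consider a two-dimensional Gaussian copula with interrelationship $\rho$ and correlation matrix $P$:
$$C_P(u_1, u_2) = \Phi_P\big(\Phi^{-1}(u_1), \Phi^{-1}(u_2)\big)$$
where $\Phi^{-1}$ is the inverse cumulative distribution function of the standard normal distribution, and $\Phi_P$ is the multivariate standard normal distribution function. The marginal distributions of the two underlying variables are restricted to be Gaussian.
The correlation matrix $P$ results from a covariance matrix $\Sigma$ as:
$$P_{ij} = \frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii}\,\Sigma_{jj}}}, \qquad P = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$
Based on the above formula, the Gaussian copula can be written as:
$$C_P(u_1, u_2) = \int_{-\infty}^{\Phi^{-1}(u_1)} \int_{-\infty}^{\Phi^{-1}(u_2)} \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left(-\frac{s^2 - 2\rho s t + t^2}{2(1 - \rho^2)}\right) dt\, ds$$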
For $\rho = 0$, $\rho = 1$, and $\rho = -1$, the copula reduces to the (a) independence, (b) comonotonicity, and (c) countermonotonicity copula, respectively [17]. In summary, the above equation is equal to the multivariate standard normal distribution with the correlation matrix given above.
It is worth discussing the simulation process for the Gaussian copula, where all margins are univariate Gaussian, which can be summarized as follows:
- Compute the Cholesky decomposition of the covariance matrix, and set A as the lower triangular matrix.
- Iterate over the following steps:
  ◦ Generate a vector z from a random normal distribution, where $z_i \sim N(0, 1)$ independently.
  ◦ Set x = Az.
  ◦ Return U as $U = (\Phi(x_1), \ldots, \Phi(x_d))$.
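The simulation steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the code used in the study: the three-dimensional correlation matrix, the number of draws, and the function names are hypothetical, and a correlation matrix (unit variances) is assumed so that applying the standard normal cdf yields uniform margins.

```python
import math
import random
from statistics import NormalDist


def cholesky_lower(matrix):
    """Return lower-triangular A with A A^T = matrix (symmetric positive definite)."""
    n = len(matrix)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(a[i][k] * a[j][k] for k in range(j))
            if i == j:
                a[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                a[i][j] = (matrix[i][j] - s) / a[j][j]
    return a


def simulate_gaussian_copula(corr, n_draws, seed=0):
    """Draw n_draws vectors U from a Gaussian copula with correlation matrix corr."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    a = cholesky_lower(corr)
    d = len(corr)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]            # independent N(0, 1)
        x = [sum(a[i][k] * z[k] for k in range(i + 1)) for i in range(d)]  # x = A z
        draws.append([std_normal.cdf(xi) for xi in x])         # U_i = Phi(x_i)
    return draws


# Hypothetical correlation matrix for three variables (illustrative values only).
corr = [[1.0, -0.5, -0.4],
        [-0.5, 1.0, 0.3],
        [-0.4, 0.3, 1.0]]
sample = simulate_gaussian_copula(corr, n_draws=2000)
```

Each row of `sample` has uniform margins, while the dependence between its columns follows the sign pattern of `corr`.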
Now consider three-dimensional joint of
, we know:
where
is the probability density function (pdf).
From the above equation,
are parameters that need to be estimated. Based on Sklar’s theorem,
could be written as:
On the other hand, from a binary dimension,
could be written as:
Now based on Equations (6)–(8) we would have:
It should be noted that are univariate pdf’s.
Now the
could be written as the pair copulas of
[
14,
18]. In addition, thetas, dependence, are incorporated in various copulas. For instance,
would be incorporated in
.
2.3. Multivariate Model Copula
For a trivariate model, the copula is applied to the error terms instead of the univariate variables themselves. It should be noted that the Gaussian copula results from the multivariate normal distribution. For instance, consider a trivariate model as follows:
where
and
and
are three latent equations
, and
are coefficients related to three modeling components. From the above equations, the resultant distributions would be fully known if the joint distribution of
and
, and
are known [
19]. For trivariate model, the joint distribution of the three models could be written as:
The current study considered a trivariate copula for the three equations. The logit link function was used for the marginal distributions of the three models. Here, the difference between a simple logistic regression model and a copula-based logistic model is that the copula model accounts for dependence between the residuals of the marginal models, while simple logistic regression does not. In other words, the multivariate copula model can be seen as a joint distribution of residuals, with the dependence among the errors governed by the parameters θi [20]. The correlations across the error terms might capture the presence of unobservable factors affecting the decision makers' processes.
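Parameter estimates of logistic regression can be computed as follows. First, the likelihood function of logistic regression can be written in terms of the binomial mass function as:
$$L(\beta \mid y) = \prod_{i=1}^{N} \binom{n_i}{y_i} \pi_i^{y_i} (1 - \pi_i)^{n_i - y_i} \tag{12}$$
For $n_i$ trials, there are $\binom{n_i}{y_i}$ ways to arrange $y_i$ successes among those $n_i$ trials [21]. Furthermore, the probability of $y_i$ successes is $\pi_i^{y_i}$, and the probability of $n_i - y_i$ failures is $(1 - \pi_i)^{n_i - y_i}$.
It should be noted that the log odds of $Y$ is a linear combination of the coefficients. Maximum likelihood can now be used to find the $\beta$ that maximizes the above equation. The values are identified where the first derivative of the above is zero and the second derivative is less than zero. Based on the log odds formula we have:
$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \eta_i = \sum_{k=0}^{K} x_{ik} \beta_k \tag{13}$$
In addition, for $\pi_i$ we have:
$$\pi_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}} \tag{14}$$
Substituting Equation (14) into Equation (12), we would have:
$$L(\beta \mid y) = \prod_{i=1}^{N} \binom{n_i}{y_i} \left(\frac{e^{\eta_i}}{1 + e^{\eta_i}}\right)^{y_i} \left(\frac{1}{1 + e^{\eta_i}}\right)^{n_i - y_i} \tag{15}$$
After some algebra, the above equation reduces to the kernel of the likelihood function to be maximized. Taking the log of the kernel, we would have:
$$\ell(\beta) = \sum_{i=1}^{N} \left[ y_i \eta_i - n_i \log\left(1 + e^{\eta_i}\right) \right] \tag{16}$$
Equating the derivative of the above with respect to each $\beta_k$ to zero and solving iteratively with the Newton-Raphson method, which uses both the first and second derivatives of the log-likelihood, yields the estimates of the betas.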
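The Newton-Raphson iteration for a logistic regression can be sketched as follows. This is a minimal illustration with one predictor plus an intercept and a small hypothetical dataset, not the estimation routine used in the study; note that each Newton step uses both the gradient and the Hessian of the log-likelihood, and that the function and variable names are assumptions made for this example.

```python
import math


def fit_logistic_newton(x, y, n_iter=25, tol=1e-10):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0               # gradient (score) of the log-likelihood
        h00 = h01 = h11 = 0.0       # entries of the symmetric 2x2 Hessian
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 -= w
            h01 -= w * xi
            h11 -= w * xi * xi
        # Newton step beta_new = beta - H^{-1} g, with the 2x2 inverse in closed form.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 - step0, b1 - step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical, non-separable data (illustrative only).
x_data = [0, 1, 2, 3, 4, 5, 6, 7]
y_data = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat = fit_logistic_newton(x_data, y_data)
fitted = [1.0 / (1.0 + math.exp(-(b0_hat + b1_hat * xi))) for xi in x_data]
```

At convergence the score equations hold, so the fitted probabilities sum to the number of observed successes.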
In summary, Maximum likelihood is implemented to make the observed data most probable, where
is a density function which would be maximized:
The variance-covariance matrix would be estimated by the Hessian matrix as follows:
The above likelihood estimation, with some minor modifications using a penalized additive method, can be implemented for the copula method as follows [22, 23]:
where
are the location and scale parameter of logistic models,
m are marginal distribution parameters,
is dependence coefficient,
is degree of freedom. The marginal distributions of
Ys could be specified by cdf’s and densities as
F and
f, respectively. In addition,
consists of coefficient vectors of
. Furthermore, the copula density
c could be written as:
Now consider a conditional distribution of
based on our case of the binary logistic regression for marginal distribution for two scenarios would be based on
Table 1 [
24] as:
2.4. Endogeneity
The role of copulas is to account for dependence across the residuals of the marginal models [24]. Endogeneity results from correlation across error terms, reflecting unobserved factors that affect two endogenous models but cannot be measured, and thus end up in the error terms.
In this study, endogeneity was considered because we are interested in estimating the impacts of various predictors on the severity of traffic barrier crashes in the presence of unobserved confounders. In economics studies, this problem arises when important covariates have been omitted and, as a result, become part of the error terms [20]. After accounting for endogeneity in the model, if the correlation between the treatment and outcome error terms is found not to be important, the treatment can be regarded as exogenous in the main model of crash severity. Various techniques can be implemented to account for endogeneity. For instance, the latent instrumental variable (LIV) approach uses a discrete latent variable to account for dependencies between regressors and error terms.
The trivariate copula-based model in this study controls for unobserved confounders through three equations: the main equation describes the binary outcome in terms of the two treatments, while the other two equations model the factors contributing to the treatments. The error terms, in turn, capture the omitted variables that are not considered in the model.
Endogeneity also results from incorporating the dependent variable of one equation as an independent predictor in the main equation. In this study, barrier type and shoulder width, which are dependent variables in the other models, were considered as independent variables in the crash severity model. To illustrate the process, Figure 2 is provided, which highlights that while the standard copula accounts for correlation across the error terms, the models are further extended to account for endogeneity by including barrier type and shoulder width as explanatory variables in the main model of crash severity.
3. Study Contributions
Finally, the study contributions can be summarized as follows:
The choice among traffic safety barriers is expected to be confounded by various policy-making decisions, so to account for that unseen factor, the correlation across the error terms was modeled with the copula method.
It is also important to account for the endogeneity across the models. While the copula model accounts for the error term correlations across the models, barrier type and shoulder width are also expected to be important predictors of barrier crash severity. So, endogeneity was accounted for by incorporating those factors as explanatory variables in the main model of barrier crash severity.
Accounting for the aforementioned points is especially important when studying traffic barrier crashes, because of the confounding effect of policy makers assigning barrier types to specific locations.
This is one of the earliest studies to implement the copula technique to account for policy makers' subjectivity in making a specific policy.
4. Data
The dataset used in this study was obtained from the Wyoming Department of Transportation (WYDOT) through the Critical Analysis Reporting Environment (CARE). The data were filtered to include only crashes in which hitting a barrier was the first harmful event. Because the contributing factors differ in multivehicle crashes, only single-vehicle barrier crashes were considered. Due to the low number of cable barriers and the significant variation in the designs of temporary barriers, those barriers were excluded from this study. The data were also filtered to include only observations on the state's interstate system.
The traffic barrier geometric characteristics were collected from 4176 miles of roadway in Wyoming and include barrier height, shoulder width, barrier type, and offset. Offset is defined as the distance from the outer edge of the shoulder to the side of the barrier. The Wyoming road map is presented in Figure 3 to provide an overview of the road network in the state.
Originally, there were three datasets: crash, traffic, and barrier geometric characteristics. The crash dataset was filtered to include only crashes that occurred between 2007 and 2016. More recent data were not incorporated because they were not available at the time of analysis. The traffic and crash datasets were aggregated to the barrier geometric characteristics dataset by matching highway system, milepost, and direction of travel. The explanatory variables were averaged when aggregating the crashes to the barrier dataset, while the crash severity counts were summed. For instance, the average of having no improper action in the crashes is 0.131 (see Table 2), which means that the majority of drivers hitting barriers showed some form of improper driving, as the mean of this binary predictor is closer to zero, the reference category of improper driving.
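The aggregation step can be illustrated with a small sketch: crash records are grouped by highway system, milepost, and direction of travel, binary explanatory variables are averaged, and severe crashes are summed. The field names and records below are hypothetical and do not reflect the WYDOT or CARE schema; matching on an exact milepost is also a simplifying assumption, since in practice a crash would be matched to the milepost range covered by a barrier.

```python
from collections import defaultdict

# Hypothetical crash records (illustrative field names and values only).
crashes = [
    {"route": "I-80", "milepost": 10, "direction": "E", "no_improper_action": 0, "severe": 1},
    {"route": "I-80", "milepost": 10, "direction": "E", "no_improper_action": 1, "severe": 0},
    {"route": "I-80", "milepost": 10, "direction": "E", "no_improper_action": 0, "severe": 0},
    {"route": "I-25", "milepost": 42, "direction": "N", "no_improper_action": 0, "severe": 0},
]


def aggregate_crashes(records):
    """Average explanatory variables and total severe crashes per barrier key."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["route"], rec["milepost"], rec["direction"])
        groups[key].append(rec)
    summary = {}
    for key, recs in groups.items():
        summary[key] = {
            "n_crashes": len(recs),
            "mean_no_improper_action": sum(r["no_improper_action"] for r in recs) / len(recs),
            "total_severe": sum(r["severe"] for r in recs),
        }
    return summary


by_barrier = aggregate_crashes(crashes)
```

A mean close to zero for `mean_no_improper_action` indicates that most crashes at that barrier involved some improper action, which is how the value of 0.131 in the text is read.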
Due to the similarities between box beam and W-beam barriers, compared with concrete barriers, those two types were combined and set as the reference category, coded 0. The mean of this predictor (0.048) highlights that the majority of barriers were non-concrete.
The model consists of three equations: the barrier crash severity equation, which identifies the factors contributing to barrier crash severity, and two equations for the variables that are endogenous to barrier crash severity, shoulder width and barrier type. A barrier is considered hazardous if it experienced at least one fatal or severe crash during the analysis period. Shoulder width is converted into two categories at a cut point of 12 feet; the Federal Highway Administration (FHWA) recommends a minimum shoulder width of 10 feet for the interstate system, but a shoulder width of 12 feet is recommended when truck traffic exceeds 250 design hourly volume (DHV, the design hourly volume for one direction) [25]. Barrier type is binary, coded 0 (the reference category) for box beam or W-beam barriers and 1 for concrete barriers.
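The coding of the three binary responses can be sketched as below. The function and field names are hypothetical, and treating a width of exactly 12 feet as belonging to the wider category is an assumption, since the text only gives the cut point.

```python
NON_CONCRETE_TYPES = {"box beam", "w-beam"}  # combined reference category (coded 0)


def code_barrier(shoulder_width_ft, barrier_type, fatal_or_severe_crashes):
    """Return the three binary responses for one barrier segment."""
    kind = barrier_type.lower()
    if kind != "concrete" and kind not in NON_CONCRETE_TYPES:
        raise ValueError(f"unexpected barrier type: {barrier_type!r}")
    return {
        # Assumption: widths of exactly 12 ft fall in the wider category.
        "wide_shoulder": 1 if shoulder_width_ft >= 12 else 0,
        "concrete": 1 if kind == "concrete" else 0,
        # Hazardous: at least one fatal or severe crash in the analysis period.
        "hazardous": 1 if fatal_or_severe_crashes >= 1 else 0,
    }


example_a = code_barrier(12.0, "Concrete", 2)
example_b = code_barrier(10.0, "W-beam", 0)
```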
5. Results
The results of the trivariate copula are presented in four subsections. The first two subsections present the results for the two endogenous binary variables, shoulder width and barrier type. The third details the results of the main model of barrier crash severity. The last covers the copula dependence among the three models.
5.1. Endogenous Factor of Shoulder Width
The results of this part of the model reflect policy makers' decisions to set a wider shoulder width of 12 feet in the state. They show that a wider shoulder is more likely when truck traffic and barrier length increase. For locations with higher truck traffic, it is intuitive to accommodate truck drivers with a wider shoulder for purposes such as pulling over for mechanical issues, recovery, or resting. The impact of barrier length could also depend on other unseen factors that lead policy makers to a decision about shoulder width.
5.2. Endogenous Factor of Barrier Types
For this model, a binary response was created by taking W-beam and box beam barriers as the reference category, compared with concrete barriers. The two barrier types were combined into one reference category because of their similarity. As expected, barrier height and offset are two variables that differ between the two categories, reflecting the design policy of these barriers. The results are presented in Table 3.
For instance, concrete barriers are taller than other barriers, as seen in the direction of this coefficient. Likewise, a larger offset is more likely to be assigned to W-beam barriers than to concrete barriers. Interaction terms across important predictors were considered as well: where a curve must be negotiated at a location with a higher posted speed limit, a concrete barrier is less likely. This is likely due to the nature of concrete barriers, which cannot be installed on curved segments.
5.3. Injury Severity Model (Main Model)
In addition to important variables such as posted speed limit or improper restraint use, the two variables discussed above were used as endogenous variables in this model. The results indicated that an increase in posted speed limit results in a decrease in the severity of barrier crashes. Even though the impact seems counterintuitive, it might result from the fact that higher posted speed limits are found at locations with less critical geometric characteristics, which reduces barrier crash severity.
It was found that having no improper driving action reduces the severity of barrier crashes compared with cases in which drivers showed some sign of improper driving, such as speeding or driving under the influence (DUI). Numerous studies have been conducted on the impacts of speeding and DUI on crash severity.
Moving to the variable of drivers' emotional condition: various emotional conditions, such as being angry, sad, or irritable at the time of the crash, were compared with cases in which drivers were observed to be under normal condition. As expected, being under a non-normal emotional condition increases the likelihood of severe barrier crashes. The impact highlights the importance of emotional condition for crash severity. It could be attributed to the fact that drivers under normal emotional conditions assess physical and emotional traffic hazards accurately, resulting in a decrease in crash risk [26]. The results for residency indicated that drivers who are Wyoming residents are more prone to severe barrier crashes. The impact might result from the confidence that resident drivers have while driving on Wyoming roadways. This might be due to a diminished fear of possible hazards on the road among confident resident drivers, who believe that they are not at risk [27].
Another factor found to be important is alcohol involvement. The impact is intuitive, and many studies have been conducted on the effect of DUI on crash severity. There is worldwide evidence that driving under the influence raises both crash risk and crash severity. Alcohol use significantly impairs driving skills [28].
Even though the impacts of the other two models were evaluated through the correlation across the error terms, barrier type and shoulder width were also considered in the third model as exogenous variables. The barrier type variable indicates that concrete barriers increase the severity of crashes compared with other barriers. This impact is intuitive and relates to the greater energy absorption capacity of W-beam barriers compared with concrete barriers, which are rigid and lack flexibility.
Moving to the factor of shoulder width, the results indicated that a wider shoulder of 12 feet increases the severity of barrier crashes. The impact was attributed to the fact that vehicles leaving the roadway need to be stopped by barriers as soon as possible, before traversing a wider shoulder [29].
5.4. Copula Dependence Parameters
The copula dependence parameters highlight the presence of unobservable factors. If the unobservable factors affecting the outcomes were uncorrelated, estimating all the models simultaneously in one framework would be unnecessary. The copula accounts for dependence between the two treatments and the outcome. The large and significant dependence parameters across the models highlight a strong negative association between the unstructured terms of the models (see the bottom of Table 3).
Ignoring the dependencies might result in erroneous parameter estimates. Accounting for the correlated models is important as it could be hypothesized that concrete barriers are installed at locations with higher crash frequency or higher crash severity, such as medians. Likewise, wider shoulders are installed at locations where policy makers deemed them necessary.
Even though the studies in the literature, along with this study, highlight the impact of concrete barriers on crash severity, it is worth investigating the relationship between the unseen factors across these two models. The copula dependence parameter indicated that the unseen factors that lead to the installation of concrete barriers at specific locations at the same time decrease the severity of barrier crashes. This result is contrary to the observed effect of barrier type on barrier crash severity. It could be related to WYDOT's decisions to install concrete barriers at the right locations, which consequently reduce the severity of barrier crashes.
Moving to the relationship between the error terms of the shoulder width model and the barrier crash severity model, the results indicated that while the main effect of a wider shoulder increases the likelihood of severe barrier crashes, there is a negative correlation between the error terms of the two models. In other words, the unseen factors, which might be related to policy makers, that increase shoulder width at the same time decrease the severity of barrier crashes.
Moving to the last copula dependence, between shoulder width and barrier type, the results indicated that the unseen factors that lead to the installation of concrete barriers also increase shoulder width to its maximum value. This highlights that concrete barrier designs are associated with wider shoulders. It should be emphasized that the unseen factors should have been among the models' predictors, but because they were omitted, they have become part of the models' error terms.
6. Conclusions
Fixed-object crashes are considered one of the leading causes of road fatalities around the world. Traffic barriers have been installed with the objective of mitigating the severity of those crashes. However, traffic barriers still account for a significant proportion of severe crashes. Extensive effort has been made to identify the factors contributing to those crashes, in the hope that the severity of barrier crashes would be reduced. However, despite these efforts, none of the past studies considered the correlations between the error terms of endogenous variables and barrier crash severity.
Leaving this methodological shortcoming unaddressed has been justified by arguing that the unseen factors cannot be measured, or that no significant issue arises because those unobserved factors are absorbed in the error terms. However, there is a considerable chance that the error terms of the endogenous variables and the outcome model are correlated. Not accounting for those unseen factors through the correlations of the error terms might result in biased or erroneous modeling results.
That is especially important for traffic barrier analysis, as unseen factors are expected to affect the choice of treatments and thus to end up in the error terms. A significant portion of the error terms is expected to result from the decisions of policy makers in the state, who design roadway safety features, such as allocating specific barrier types or shoulder widths, based on traffic or other safety criteria.
Copulas are general tools for describing dependence structures by considering correlations across separate equations. In this study, a unified likelihood function framework was used to join different marginal models. The method accounts for pairwise dependence across the models' residuals, given the explanatory variables. Three joint models were considered and estimated using a copula-based method.
Two treatment models along with an outcome model were considered in this study. Here, the selection of barrier types or shoulder width settings, as treatments, is biased toward roadway safety or other objectives of policy makers. This selection bias, which has often been ignored, cannot be measured by traditional statistical methods and is termed endogenous selection bias. Ignoring the selection bias could lead to erroneous results, especially when high correlations exist.
While Archimedean copulas have been extensively employed on bivariate problems, they have limitations in higher dimensions, e.g., the trivariate case. Thus, Gaussian copulas were implemented in this study to accommodate the trivariate model.
The Gaussian copula was selected over Archimedean copulas because the latter suffer from an inflexible structure in higher dimensions. The Gaussian copula conveniently expresses the endogeneity in the form of the correlation matrix of a multivariate Gaussian distribution [30].
Similar to previous studies, the results highlight that, for instance, concrete barriers or wider shoulders increase the odds of severe barrier crashes. At the same time, the copula dependencies indicated negative, large, and significant correlations across the error terms of the models. This supports the policy makers' decisions to install concrete barriers at specific locations or to set up wider shoulders in order to reduce barrier crash severity.
In other words, WYDOT's decisions on barrier type and shoulder width are in line with the safety of the Wyoming interstate system. Accounting for the correlations of error terms is important in this study, especially because there were strong and significant correlations between the error terms of the regressors and the outcome model.
More studies are needed that take endogeneity into account while analyzing the safety aspects of various treatments. That is because those choices are expected to be influenced by the safety of the specific locations or by limitations on installing different types of safety measures at a given location. Our results provide WYDOT with the assurance that, despite the highlighted contribution of concrete barriers to higher crash severity, their choice of where to install concrete barriers correlates negatively with the severity of barrier crashes. It would be valuable to have more information on the reasons behind the selection of barrier types or shoulder widths. In that way, this study would have been able to confirm the findings against the actual justification for setting up specific barrier types or shoulder widths. Future studies are encouraged to consider and evaluate those factors in their analyses.