Modeling Collision Probability on Freeway: Accounting for Different Types and Severities in Various LOS

Abstract: In this study, collision-related data were collected on the I-880 freeway of California in the United States from 2006 to 2011. Our objective was to study the collision probability of different collision types and severities in different traffic states. The traffic states were divided by the traditional level of service (LOS) method. Various Bayesian conditional logit models have been established to analyze the relationship between the collision probability of different collision patterns and LOSs. The results showed that LOS A had the best safety performance associated with all of the collision types and severities, LOS C had the worst safety performance associated with hit object collisions, LOS D had the worst safety performance associated with sideswipe collisions and rear end collisions, and LOS F had the worst safety performance associated with injury collisions. The five-stage Bayesian random parameter sequential logit model was established to quantify the effects of different variables on the collision probability of various collision types and severities. In addition to LOS, the visibility, road surface, weather, ramp, and number of lanes had significant effects on different collision types and severities. The results of the sequential logit model showed that weather variables, road variables, and LOS had significant effects on different collision types and severities. It was found that fewer lanes and low visibility can both increase the injury probability of hit object collisions, sideswipe collisions, and rear end collisions. Ramp segments could decrease the injury probability of hit object collisions and sideswipe collisions. Normal weather conditions could increase the injury probability of hit object collisions and rear end collisions. Normal road surfaces could increase the injury probability of rear end collisions.


Introduction
With the widespread use of freeway traffic surveillance systems, researchers have started using high-resolution dynamic traffic flow data to identify traffic conditions before collision occurrences. Numerous studies have developed real-time collision probability models for estimating the relative probability of collisions, given dynamic traffic flow data [1][2][3][4][5]. These studies have generally used a case-control study design, in which the traffic conditions before collisions are treated as collision cases and those under collision-free conditions are treated as control cases. With the case-control dataset, researchers have developed real-time collision probability models to analyze the relationship between the probability of a collision and traffic-related variables, including geometric design factors, environment factors, traffic flow factors, crash characteristic factors, driver behavior factors, and control strategy factors on a freeway.
Numerous researchers have studied the spatio-temporal evolution of traffic flows by dividing the traffic flows into different states. However, relatively few studies have investigated the collision types under different traffic conditions. Thus, it is necessary to explore the collision mechanism of different types and severities in various traffic flow states. In previous studies, it has been proven that there is a significant difference in safety performance for different collision types and severities in various traffic flow states [6][7][8].
In this study, the traffic flow is separated into six states by level of service (LOS). The main purpose is to identify the relationship between the LOS and different collision types and severities, and to explore how contributing factors affect collision risks for different types and severities. The collision-related data were collected on the I-880 freeway of California in the United States from 2006 to 2011. Bayesian conditional logit models were established to analyze the statistical relationship between the collision probability of different collision patterns and LOSs. The five-stage Bayesian random parameter sequential logit model was established to quantify the effects of various variables on the collision probability of different types and severities. This research can help traffic management personnel better understand which LOS is more dangerous for different collision types and severities and identify the contributing factors of different collision types and severities in different LOSs. The results can be applied to reduce the collision probability of different types and severities in different LOSs.

Literature Review
Although most studies have explored the collision mechanism without considering the traffic flow states [1][2][3][4][5], some researchers analyzed the collision types and severities in different traffic flow states. In early studies, Golob et al. separated the traffic flow into various states. The researchers indicated that the traffic flow states with high densities could increase the probability of property damage only and multi-vehicle collisions, while the traffic flow states with low densities could increase the probability of single-vehicle and injury collisions [6]. Subsequently, Golob et al. separated the traffic flow into eight states and analyzed the relationship between collision types and traffic states. The results indicated that there is a significant difference in collision characteristics associated with various traffic states. However, there is insufficient qualitative analysis of the contributing factors in various traffic states [7].
Recently, Li et al. found that the speed-related variables can significantly affect the collision probability of different traffic flow states on a freeway. According to the speeds upstream and downstream of a crash, the traffic flow is separated into four states: back of queue, congested traffic, front of queue, and free flow. The results showed that the variation of speeds could increase the probability of a collision in free flow conditions, while the coefficient of speed variation could increase the probability of a collision in back of queue and congested traffic [8]. Subsequently, Li et al. found that there was a significant relationship between rear end collisions and the magnitude of lengthwise traffic variations, while sideswipe collisions were significantly related to the traffic variation between adjacent lanes on a freeway [9]. Wang et al. analyzed the short-term variation and spatial-temporal characteristics of traffic flow associated with sideswipe collisions. The results implied that the occurrence of sideswipe collisions was significantly related to occupancy, average flow, and speed variance [10]. Kwak et al. defined the traffic flow states by uncongested and congested conditions. The results indicated that there was a significant difference in collision probability across traffic states and road types [11]. Xu et al. applied four traffic states defined by four-phase traffic theory. The preliminary analysis showed that collision probability, as well as collision severities and types, were significantly affected by traffic flow states. Nonlinear canonical correlation analysis was applied to analyze the collision mechanism. The results showed that the contributing factors leading to the occurrence of collisions were significantly different for varying traffic flow states [12]. Xu et al. developed collision probability models to explore the relationship between the probability of collisions and various traffic states separated by the three-phase traffic theory. The study implied that some transitional states were more dangerous than free flow, such as the transitional state from synchronized flow to free flow and the transitional state from wide moving jams to synchronized flow [13].
Numerous studies have focused on how traffic flow operates in different traffic states. The evolution of traffic dynamics on freeways is complex, and the formation of various traffic flow states is influenced by a set of factors. Therefore, traffic flow is classified into different states, typically based on traffic flow characteristics such as speed, flow rate, and density. Hall et al. separated the traffic flow into three states [14]. Wu separated the traffic flow into four states [15]. Kerner divided the traffic flow into three phases [16]. In this study, LOS is applied to separate the traffic states, which is one of the common methods for identifying traffic states. In LOS theory, traffic flow is separated into six states, according to density [17].

Data Sources
Crash data, environment data, geometric design data, and traffic flow data were collected from the I-880 freeway in California, United States between 2006 and 2011. The freeway is 34 miles in length and located between the cities of Oakland and San Jose. There are 119 loop detector stations and three weather stations along the freeway.
A total of 9919 collisions were reported and used for further data analysis. To account for uncertainty in the reported occurrence time, the collision-related data for every collision were collected from 5 min to 10 min before the reported occurrence time. This method has been shown to be effective in previous studies [4,13]. Previous studies also suggested that the gain in statistical power is negligible when the control-to-case ratio exceeds 4:1 [13]. Thus, a control-to-case ratio of 4:1 was used in this study. For each collision case, the authors randomly selected four paired observations of the non-collision traffic data on the basis of three matching factors: the time, the location, and the weather [13]. For example, collision No. 67 occurred at post-mile 3.95 at 3:00 p.m. on 9 November 2009. Traffic data taken at the nearest detector station from 2:50 p.m. to 2:55 p.m. on 9 November 2009 were included in the collision cases as an observation. Then, the paired collision-free traffic data taken at the same loop detector station during the same period on four randomly selected collision-free days in the same weather conditions were used as four observations in the non-collision cases. In this study, collision severity was divided into two categories: injury or fatal collisions, and property damage only (PDO) collisions.
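The matched sampling described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `sample_control_days`, the data layout, and the toy weather labels are all assumptions made for the example. Matching on time and location is implied by reading the same detector and the same 5-min window on each selected control day.

```python
import random
from datetime import date, timedelta


def sample_control_days(crash_day, candidate_days, weather_by_day, crash_days,
                        n_controls=4, seed=0):
    """Pick n_controls collision-free days whose weather matches the crash day.

    Hypothetical helper: days are matched on weather only; the same detector
    station and time window would then be read on each returned day.
    """
    rng = random.Random(seed)
    target_weather = weather_by_day[crash_day]
    eligible = [
        d for d in candidate_days
        if d not in crash_days and weather_by_day[d] == target_weather
    ]
    if len(eligible) < n_controls:
        raise ValueError("not enough matching collision-free days")
    return rng.sample(eligible, n_controls)


# Toy data (invented for illustration only).
crash_day = date(2009, 11, 9)
all_days = [date(2009, 11, 1) + timedelta(days=i) for i in range(30)]
weather_by_day = {d: ("rain" if d.day % 5 == 0 else "clear") for d in all_days}
crash_days = {crash_day, date(2009, 11, 12)}

controls = sample_control_days(crash_day, all_days, weather_by_day, crash_days)
```

Each collision case then contributes one matched set of five observations (one case and four controls), which is the unit the conditional logit model works with.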
Table 1 presents the boundary values of density for the different LOSs. In this study, the traffic flow was divided into six states according to the LOS on the freeway. In addition, the statistical results in Table 2 show that the numbers of collisions of different types and severities differ considerably across LOSs.
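A density-based LOS classification can be sketched as below. Table 1 is not reproduced in this text, so the thresholds used here are an assumption: they are the Highway Capacity Manual density criteria for basic freeway segments, in passenger cars per mile per lane, and may differ from the boundary values the authors used. The names `LOS_UPPER_BOUNDS` and `density_to_los` are illustrative.

```python
from bisect import bisect_left

# Assumed upper density bounds (pc/mi/ln) for LOS A to E; anything above the
# last bound is LOS F. These may not match Table 1 of the paper.
LOS_UPPER_BOUNDS = [11, 18, 26, 35, 45]
LOS_LABELS = ["A", "B", "C", "D", "E", "F"]


def density_to_los(density):
    """Map a traffic density to one of the six LOS states."""
    if density < 0:
        raise ValueError("density must be non-negative")
    # bisect_left places a density equal to a bound in the lower (better) LOS.
    return LOS_LABELS[bisect_left(LOS_UPPER_BOUNDS, density)]


example_states = [density_to_los(d) for d in (5, 11, 20, 45, 60)]
```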

Methods
In this study, the Bayesian conditional logit model was built to analyze the relative safety performance of different collision types and severities without considering other traffic-related factors in different LOSs. A five-stage Bayesian random parameter sequential logit model was applied to quantify the effects of various variables on the collision probability of different types and severities.

Bayesian Conditional Logit Model
In previous studies, the conditional logit model has already been used to analyze the safety performance of different traffic states [18,19]. The model is written as follows:

$$\operatorname{logit}(p_{ij}) = \alpha_i + \sum_{k=1}^{K} \beta_k x_{ijk}$$

where $p_{ij}$ is the probability that the $j$th observation in the $i$th matched set is a collision, and $x_{ijk}$ is the $k$th unmatched factor for the $j$th observation (case or control) in the $i$th matched set. Therefore, $X = \{x_{ijk}\}$ collects the unmatched factors of all observations in all matched sets. The index $i$ runs from 1 to $I$, where $I$ denotes the number of matched sets; $j$ identifies the observation within a matched set, where $J$ represents the number of controls in every matched set; and $k$ runs from 1 to $K$, where $K$ represents the number of contributing factors. $\alpha_i$ is the effect of the matching factors on the probability of collision occurrence for the $i$th matched set, and $\beta_k$ is the coefficient of the $k$th unmatched contributing factor.
To account for the selection bias introduced by the matched case-control design, a conditional likelihood needs to be developed. The conditional probability that the first vector of explanatory variables, $x_{i0}$, in the $i$th matched set corresponds to the case, conditional on $x_{i0}, x_{i1}, \ldots, x_{iJ}$ being the vectors of explanatory variables in the $i$th matched set, is given as

$$P_i(\beta) = \frac{\exp\left(\sum_{k=1}^{K} \beta_k x_{i0k}\right)}{\sum_{j=0}^{J} \exp\left(\sum_{k=1}^{K} \beta_k x_{ijk}\right)}$$

The matched-set effects $\alpha_i$ cancel out of this expression. Thus, the likelihood function of the conditional logit model can be written as [15]

$$L(\beta) = \prod_{i=1}^{I} \frac{\exp\left(\sum_{k=1}^{K} \beta_k x_{i0k}\right)}{\sum_{j=0}^{J} \exp\left(\sum_{k=1}^{K} \beta_k x_{ijk}\right)}$$

Bayesian inference was applied to this model using Markov Chain Monte Carlo (MCMC) methods. An advantage of this approach is that every parameter in the model has a prior distribution. The posterior distribution of the parameters is expressed as

$$f(\beta \mid Y) = \frac{f(Y, \beta)}{f(Y)} = \frac{f(Y \mid \beta)\,\pi(\beta)}{\int f(Y, \beta)\,d\beta} \propto f(Y \mid \beta)\,\pi(\beta)$$

where $f(\beta \mid Y)$ is the posterior joint probability distribution (JPD) of the parameter $\beta$, based on the data set $Y$; $f(Y, \beta)$ is the JPD of the parameter $\beta$ and the data set $Y$; $f(Y \mid \beta)$ denotes the likelihood of $Y$ conditional on the parameter $\beta$; and $\pi(\beta)$ is the prior distribution of the parameter $\beta$.
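The non-informative prior distribution in this method is written as

$$\beta \sim \mathrm{Normal}(0_K, 10^6 I_K)$$

where $0_K$ represents a $K \times 1$ vector of zeros and $I_K$ represents the $K \times K$ identity matrix. Finally, the posterior JPD $f(\beta \mid Y)$ is written as

$$f(\beta \mid Y) \propto \prod_{i=1}^{I} \frac{\exp\left(\sum_{k=1}^{K} \beta_k x_{i0k}\right)}{\sum_{j=0}^{J} \exp\left(\sum_{k=1}^{K} \beta_k x_{ijk}\right)} \times \prod_{k=1}^{K} \exp\left(-\frac{\beta_k^2}{2 \times 10^6}\right)$$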
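The unnormalized log posterior that an MCMC sampler would evaluate can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function names and the toy matched sets are invented, the case is assumed to be stored first in each matched set, and the covariates are the five LOS indicators (LOS B to LOS F, with LOS A as the reference) used later in the paper.

```python
import math


def conditional_log_likelihood(beta, matched_sets):
    """Log of the conditional logit likelihood.

    matched_sets[i][j] is the covariate list of observation j in matched
    set i; index j = 0 is assumed to be the case, the rest are controls.
    """
    total = 0.0
    for matched_set in matched_sets:
        scores = [sum(b * x for b, x in zip(beta, row)) for row in matched_set]
        top = max(scores)  # subtract the maximum for numerical stability
        log_denominator = top + math.log(sum(math.exp(s - top) for s in scores))
        total += scores[0] - log_denominator
    return total


def log_prior(beta, variance=1e6):
    """Log density of independent Normal(0, variance) priors."""
    return sum(
        -0.5 * b * b / variance - 0.5 * math.log(2 * math.pi * variance)
        for b in beta
    )


def log_posterior(beta, matched_sets):
    """Unnormalized log posterior: log likelihood plus log prior."""
    return conditional_log_likelihood(beta, matched_sets) + log_prior(beta)


# Toy matched sets (invented): columns are indicators for LOS B, C, D, E, F.
toy_sets = [
    [[0, 1, 0, 0, 0], [0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]],
    [[0, 0, 1, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 0, 0, 0, 0]],
]
ll_zero = conditional_log_likelihood([0.0] * 5, toy_sets)
ll_risky = conditional_log_likelihood([0.0, 1.0, 1.0, 0.0, 0.0], toy_sets)
```

With all coefficients at zero, every observation in a matched set is equally likely to be the case, so each set contributes log(1/5). Coefficients that raise the score of the observed cases increase the likelihood.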

Bayesian Random Parameter Sequential Logit Model
In previous studies, the ordered logit model was one of the popular methods used to analyze collision severities. However, this method has some limitations for analyzing collision severities, which motivated the modeling choices in this study as follows:
1. The ordered logit model assumes that the parameter estimates of different collision types and severities are the same [20,21]. In contrast, the sequential logit model can explain the differences in contributing factors across different collision types and severities [20,21].
2. In addition, the sequential logit model explains the correlation of collision probability between different collision types and severities [22,23]. The collision probabilities for the different collision types and severities are calculated by Equations (8) through (13), respectively.
3. Moreover, collisions are affected by various traffic-related factors [24][25][26], and the contributing factors in this study cannot explain all of the variance in collision types and severities. Thus, there is unobserved heterogeneity in the sequential logit model [27][28][29], which can result in inconsistent and biased estimation [30][31][32]. To overcome this limitation, random parameters were applied in this study.
Therefore, the five-stage Bayesian random parameter sequential logit model was applied to calculate the collision probability of different severities and types. As shown in Figure 1, four Bayesian random parameter binary logit models were built from Stage 1 to Stage 4. Subsequently, three Bayesian random parameter binary logit models were built at Stage 5. Together, these binary logit models form the five-stage Bayesian random parameter sequential logit model. Specifically, the Stage 1 model calculated the collision probability ($P_{Collision}$) without considering different collision types and severities. The Stage 2 model calculated the probability of a hit object collision ($P_{Collision} \times P_{HO}$) without considering collision severities. The Stage 3 model calculated the probability of a sideswipe collision ($P_{Collision} \times (1 - P_{HO}) \times P_{SW}$) without considering collision severities. The Stage 4 model calculated the probability of a rear end collision ($P_{Collision} \times (1 - P_{HO}) \times (1 - P_{SW}) \times P_{RE}$) without considering collision severities. Finally, three collision severity models were established at Stage 5. The injury probability of a hit object collision is $P_{HO\_I}$, the injury probability of a sideswipe collision is $P_{SW\_I}$, and the injury probability of a rear end collision is $P_{RE\_I}$. The absolute probability of a collision of a given type and severity is then the product of the corresponding stage probabilities. For a hit object collision with injury, it is given as follows:

$$P(\text{Hit object collision with injury}) = P(\text{Collision}) \times P(\text{Hit object collision} \mid \text{Collision}) \times P(\text{Injury collision} \mid \text{Hit object collision}) = P_{Collision} \times P_{HO} \times P_{HO\_I}$$

The specification of the basic Bayesian random parameter binary logit model has already been introduced in previous studies [33,34].
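The chaining of the five stages can be sketched in a few lines. This is a minimal illustration with made-up stage probabilities, not estimates from the paper. The function name is hypothetical, and the PDO probability is taken as the complement of the injury probability, which assumes the two severity categories used in this study are exhaustive.

```python
def sequential_probabilities(p_collision, p_ho, p_sw, p_re,
                             p_ho_i, p_sw_i, p_re_i):
    """Combine the stage probabilities into absolute type/severity probabilities."""
    stage_values = (p_collision, p_ho, p_sw, p_re, p_ho_i, p_sw_i, p_re_i)
    if any(not 0.0 <= p <= 1.0 for p in stage_values):
        raise ValueError("all stage probabilities must lie in [0, 1]")

    # Stages 1 to 4: absolute probability of each collision type.
    p_type = {
        "hit object": p_collision * p_ho,
        "sideswipe": p_collision * (1 - p_ho) * p_sw,
        "rear end": p_collision * (1 - p_ho) * (1 - p_sw) * p_re,
    }
    # Stage 5: injury probability conditional on the collision type.
    p_injury = {"hit object": p_ho_i, "sideswipe": p_sw_i, "rear end": p_re_i}

    result = {}
    for collision_type, p in p_type.items():
        result[(collision_type, "injury")] = p * p_injury[collision_type]
        # Assumption: PDO is the complement of injury within each type.
        result[(collision_type, "PDO")] = p * (1 - p_injury[collision_type])
    return result


# Made-up stage probabilities, for illustration only.
probs = sequential_probabilities(0.2, 0.25, 0.4, 0.9, 0.3, 0.2, 0.35)
```

Because each stage conditions on the outcomes excluded by the previous stages, the six absolute probabilities can never sum to more than the Stage 1 collision probability.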

Results and Discussion
To analyze the relative safety performance of various collision types and severities between different LOSs, the Bayesian conditional logit model was used in Section 5.1. In the Bayesian conditional logit models, only LOSs were regarded as candidate variables without considering other variables.
To quantify the effects of various variables on the collision probability of different collision types and severities in various LOSs, the five-stage Bayesian random parameter sequential logit model was applied in Section 5.2. In addition to LOS variables, five other candidate variables were also considered in the five-stage Bayesian random parameter sequential logit model.

Safety Performance of LOS by Different Collision Types and Severities
According to the LOS on the freeway, the Bayesian conditional logit model was used to analyze the relative safety performance of different collision types and severities without considering other traffic-related factors in different LOSs. There are five indicator variables in this model, including LOS B, LOS C, LOS D, LOS E, and LOS F. LOS A was considered as the reference level. Therefore, the purpose of this section is to explore the relative safety performance between LOS A and other LOSs. Other traffic flow variables were not included in this model, because LOSs were highly correlated with the traffic flow variables [34].
Bayesian inference for this model used three parallel MCMC chains, with a total of 10,000 iterations and 4000 burn-in iterations [30,31]. The results of the Bayesian conditional logit models are shown in Table 3. The 95% credible interval for each parameter in Table 3 indicates that the LOSs significantly affect the collision probability of different types and severities. The odds ratio for each variable was used to quantify the safety performance of each LOS relative to LOS A. Specifically, as shown in Table 3 for hit object collisions, the odds ratios of LOS B and LOS C were significantly greater than that of LOS A, whereas the odds ratios of LOS D, LOS E, and LOS F were not significantly greater than that of LOS A. Accordingly, LOS A was the safest traffic state, with the lowest hit object collision probability. LOS C had the highest hit object collision likelihood, which was 3.319 times that of LOS A. The hit object collision probability of LOS B was 1.878 times that of LOS A, but lower than that of LOS C. In previous studies, the results indicated that a hit object collision was more likely to occur in traffic flow states with low density [6,7]. In this study, LOS C had a higher density than LOS A and LOS B. Thus, LOS C was the most dangerous of all LOSs for hit object collisions. The analysis of LOS C can also be applied to the results of LOS B.
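In a logit model, the odds ratio of an indicator variable is the exponential of its coefficient, so an odds ratio of 3.319 corresponds to a coefficient of ln(3.319). The sketch below shows one way to summarize posterior draws of a coefficient as an odds ratio with a 95% credible interval. The draws are simulated for illustration and are not the study's MCMC output, and the function name is an assumption.

```python
import math
import random


def odds_ratio_summary(beta_draws, level=0.95):
    """Return (mean OR, lower bound, upper bound, significant flag).

    The flag is True when the credible interval of the odds ratio excludes 1.
    """
    odds_ratios = sorted(math.exp(b) for b in beta_draws)
    n = len(odds_ratios)
    tail = (1 - level) / 2
    lower = odds_ratios[int(math.floor(tail * (n - 1)))]
    upper = odds_ratios[int(math.ceil((1 - tail) * (n - 1)))]
    mean_or = sum(odds_ratios) / n
    return mean_or, lower, upper, (lower > 1 or upper < 1)


# Simulated posterior draws centered on ln(3.319); spread chosen arbitrarily.
rng = random.Random(1)
draws = [rng.gauss(math.log(3.319), 0.2) for _ in range(6000)]
mean_or, lower, upper, significant = odds_ratio_summary(draws)
```

An interval that lies entirely above 1 indicates that the LOS has a higher collision likelihood than the reference level, LOS A.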
The results of sideswipe collisions are shown in Table 3. Three odds ratios were significantly greater than that of LOS A: those of LOS B, LOS C, and LOS D. Two odds ratios were not significantly greater than that of LOS A: those of LOS E and LOS F. The highest sideswipe collision probability was for LOS D, followed by LOS C and LOS B. LOS A had the lowest sideswipe collision probability. Specifically, the sideswipe collision probability of LOS B was 2.130 times that of LOS A, the sideswipe collision probability of LOS C was 3.349 times that of LOS A, and the sideswipe collision probability of LOS D was 6.279 times that of LOS A. Among these four states, LOS D had the highest density, followed by LOS C, LOS B, and LOS A. As density increases, lane-changing behaviors become more frequent in the traffic flow. More lane-changing behaviors can increase the risk of a sideswipe collision [16]. Thus, the highest sideswipe collision probability was for LOS D, followed by LOS C, LOS B, and LOS A.
In the Bayesian conditional logit model for rear end collisions, the results showed that there were some significant differences in different LOSs. All other LOSs were more dangerous than LOS A for rear end collisions. The highest rear end collision probability was for LOS D, followed by LOS C, LOS F, LOS E, and LOS B. Specifically, the rear end collision probability of LOS B was 2.995 times higher than LOS A, the rear end collision probability of LOS C was 5.032 times higher than LOS A, the rear end collision probability of LOS D was 6.078 times higher than LOS A, the rear end collision probability of LOS E was 4.053 times higher than LOS A, and the rear end collision probability of LOS F was 4.795 times higher than LOS A. The reason why LOS D had the highest rear end collision probability is similar to the sideswipe collisions above. In addition, although LOS E and LOS F had higher densities than LOS D, there was less space in LOS E and LOS F for vehicles to change lanes. Thus, LOS D had the highest rear end collision probability. Although LOS C was still in free flow, some transitional states from free flow to congested flow started to emerge with sudden reductions in speed. This is the reason why LOS C had the second-highest rear end collision probability.
As shown in Table 3, the estimation results of the Bayesian conditional logit model for injury collisions show that the LOS significantly affected the probability of injury collision occurrences. LOS F had the highest injury collision probability, followed by LOS D, LOS C, LOS B, and LOS E. Specifically, the injury collision probability of LOS B was 2.838 times higher than LOS A, the injury collision probability of LOS C was 3.923 times higher than LOS A, the injury collision probability of LOS D was 4.468 times higher than LOS A, the injury collision probability of LOS E was 2.494 times higher than LOS A, and the injury collision probability of LOS F was 5.098 times higher than LOS A. It has been proven that a higher density can lead to injury collisions [6,7]. Thus, LOS F had the highest injury collision probability. In LOS D and LOS C, more transitional states from free flow to congested flow started to emerge with sudden reductions in speed. Due to LOS D having a higher density than LOS C, LOS D had the second-highest injury collision probability, followed by LOS C.

The Sequential Logit Model for Collision Types and Severities
The five-stage Bayesian random parameter sequential logit model was established to quantify the relationship between LOS and collision probability by different types and severities. Specifically, Section 5.2.1 explores the relationship between LOS and collision types. In that section, Stage 1 is used to predict the collision likelihood, and Stages 2-4 are used to predict the hit object collision probability, sideswipe collision probability, and rear end collision probability, respectively. Section 5.2.2 explores the relationship between LOS and collision severities. In that section, Stage 5 is used to predict the injury probability for different collision types. To avoid the correlation between traffic flow variables and LOS, LOS was the only traffic flow variable considered in the models. In addition to LOS, as shown in Table 4, five other candidate variables, including visibility, road surface, weather, ramp, and number of lanes, were also taken into consideration at every stage. The simulation method of this section is similar to that of Section 4.1.

Table 4. Candidate variables.
Candidate Variables | Explanation
Vi | Visibility (mile)
We | 1 = worse weather conditions; 0 = normal weather conditions
Rs | 1 = worse road surface; 0 = normal road surface
Ra | 1 = ramp segment; 0 = non-ramp segment
Nl | Number of lanes

Table 5 presents the results of the collision probability for different types, from Stage 1 to Stage 4. As shown at Stage 1, low visibility significantly increases collision probability. LOS A has a random negative coefficient, indicating that the collision probability decreases in LOS A. In previous studies, it has been shown that free flow has the best safety performance [15] because, with less flow and more space, the driver has sufficient time to adopt emergency measures in LOS A. LOS C has a positive effect on the collision probability. Although vehicles are still in free flow in LOS C, the space between vehicles becomes smaller than in LOS A and LOS B. In previous studies, it has been demonstrated that more drivers will take advantage of higher speeds under uncongested conditions [35,36]. Thus, higher speed and smaller space can lead to less response time for drivers to take emergency measures.
As shown in Table 5, the hit object collision probability was calculated for Stage 2. This model has six significant variables, including number of lanes, visibility, road surface, LOS B, LOS C, and LOS D. The results indicated that the hit object collision probability increases with fewer lanes, low visibility, and a worse road surface. Specifically, fewer lanes can lead to less space between vehicles. Low visibility can result in less response time for drivers to take emergency measures. Vehicles need longer braking distances in worse road surface conditions. In addition, the hit object collision probability decreases in LOS B, LOS C, and LOS D.
The Stage 3 model was established to predict the likelihood of a sideswipe collision. This model has seven significant variables, as shown in Table 5, including number of lanes, visibility, ramp, LOS A, LOS B, LOS C, and LOS D. Number of lanes, visibility, and ramp have random negative coefficients, indicating that the sideswipe collision probability increases with fewer lanes, low visibility, and non-ramp conditions. In LOS A, LOS B, LOS C, and LOS D situations, the sideswipe collision probability decreases. The results indicated that low occupancy can decrease the sideswipe collision probability because there is more space for drivers to take emergency measures in low occupancy conditions.
The Stage 4 model was established to predict rear end collision likelihood. This model has six significant variables, as shown in Table 5, including number of lanes, visibility, road surface, LOS A, LOS C, and LOS D. More lanes, high visibility, and a worse road surface have positive effects on rear end collision probability. For LOS A, LOS C, and LOS D, the rear end collision probability increases.

Sequential Model for Collision Severities by Different Types
Table 6 presents the results of the probability of collision severities by different types at Stage 5. It was found that there are significant differences in the contributing factors of the estimation results. For hit object collisions, the results indicate that four variables can significantly affect the severity. All of the significant variables have negative effects on the injury probability of hit object collisions. Specifically, fewer lanes, non-ramp segments, normal weather, and low visibility can increase the injury probability of hit object collisions. The results indicate that higher speeds in non-ramp segments, normal weather conditions, fewer lanes, and less response time in low-visibility conditions can increase the injury probability of hit object collisions. In addition, it was found that LOS has no effect on the severity of hit object collisions.
For sideswipe collisions, the results imply that four variables can significantly affect the severity of sideswipe collisions. Specifically, fewer lanes, non-ramp segments, and low visibility can increase the injury probability of sideswipe collisions. The results are similar to those for hit object collisions above. Moreover, a PDO sideswipe collision is more likely to occur in LOS D.
For rear end collisions, the results indicate that five variables can significantly affect the severity of rear end collisions. Specifically, fewer lanes, non-ramp segments, low visibility, and normal road surfaces can increase the injury probability of rear end collisions. Furthermore, a PDO rear end collision is more likely to occur in LOS A.

Conclusions
In this study, the main purpose was to identify the relationship between LOS and different collision types and severities, and to explore how contributing factors affect collision risks for different types and severities. The collision-related data were obtained from the I-880 freeway in California, United States, for the period from 2006 to 2011. The Bayesian conditional logit model was built to analyze the relative safety performance of different collision types and severities without considering other traffic-related factors in different LOSs. A five-stage Bayesian random parameter sequential logit model was applied to quantify the effects of various variables on the collision probability of different types and severities.
Specifically, as shown in Figure 2, the results of the Bayesian conditional logit models in Table 3 indicate that LOS A is the safest traffic state for different collision types and severities. LOS C has the worst safety performance associated with hit object collisions, and the hit object collision probability in LOS C is 3.319 times that in LOS A. LOS D has the worst safety performance associated with sideswipe collisions and rear end collisions: the sideswipe collision probability and the rear end collision probability in LOS D are 6.279 and 6.078 times those in LOS A, respectively. LOS F has the worst safety performance associated with injury collisions, and the injury collision probability in LOS F is 5.098 times that in LOS A, because an injury collision is more likely to occur in traffic flow states with high occupancy [6,7]. The results of the sequential logit model showed that weather variables, road variables, and LOS had significant effects on different collision types and severities. It was found that fewer lanes and low visibility can both increase the injury probability of hit object collisions, sideswipe collisions, and rear end collisions. Ramp segments could decrease the injury probability of hit object collisions and sideswipe collisions. Normal weather conditions could increase the injury probability of hit object collisions and rear end collisions. Normal road surfaces could increase the injury probability of rear end collisions.
This research can help transportation professionals better understand which LOS is more dangerous for different collision types and severities, and identify the contributing factors of different collision types and severities in different LOSs. The results can be applied to reduce the collision probability of different types and severities in different LOSs.
However, there are still some issues that need to be studied in the future. First, other methods of dividing traffic flow states, such as three-phase traffic theory, should be adopted. Second, more variables, such as driver behavior and geometric design, should be used in the models. Finally, the transferability of the models in this study still needs to be verified.