Impacts of Real-Time Traffic State on Urban Expressway Crashes by Collision and Vehicle Type

With the rapid development of urban expressway systems in China in recent years, traffic safety problems have attracted increasing attention. Variation in traffic flow is considered to have a significant impact on the safety performance of expressways. This study therefore explores how variation in traffic flow measurements, such as average speed, speed variation and traffic volume, affects crash risk. First, crashes were classified by crash type (rear-end or side-impact collisions) and by the vehicles involved (heavy-vehicle-related or light-vehicle-related collisions). Then, the corresponding crash data were aggregated based on the similarity of traffic flow conditions and crash types. Finally, a random effect negative binomial model was introduced to account for the heterogeneity of crash risk arising from variance within the traffic flow and across crash types. The results show that the significant influencing factors differ by crash type. Specifically, the percentage of heavy vehicles in the traffic flow has a negative impact on rear-end collisions and light-vehicle-related collisions, but no obvious correlation with side-impact collisions or heavy-vehicle-related collisions. Average speed, speed variation and traffic volume have interactive effects on the crash rate. When within-lane speed variation is high and average speed is low, the risk of all types of crashes tends to be higher; the crash risk also increases when within-lane speed variation is low and average speed is high. In addition, when speed variation between lanes is high and traffic volume is low, the risk of rear-end, side-impact and heavy-vehicle-related collisions tends to be higher; the crash risk likewise increases when speed variation between lanes is low and traffic volume is high.


Introduction
With the rapid development and improvement of traffic detection and information communication technology, collecting massive amounts of high-precision real-time traffic flow and crash data is becoming much easier. Research on the relationship between crashes and real-time traffic flow data has therefore attracted extensive attention in recent years [1][2][3]. Identifying traffic conditions with a high crash risk can provide strong support for the formulation of crash early warning strategies during practical traffic operations.
The influence of traffic flow characteristics on crash frequency has been extensively studied, which provides useful insights for formulating effective traffic safety improvement measures. Previous studies suggest that speed and speed variance are correlated with the occurrence of crashes [4][5][6], but the reported impact of speed-related measurements on crash rates differs from study to study. Some studies show a positive relationship between speed, speed variation and crash rates [4,7]; for example, Wang's study [4] shows that if the average speed on urban arterials increases by 1%, the crash frequency will increase by 0.7%, and the crash frequency will [...] gation. The random effect negative binomial model is found to be a better choice than other models because it accounts for over-dispersion and heterogeneity in the data [26][27][28][29][30].
In summary, the above-mentioned studies mainly analyze the direct correlation between traffic speed or volume and crash risk. They found that traffic speed or volume has a significant positive or negative correlation with crash risk, but their conclusions sometimes conflict. The interactive impact of traffic variation on road safety therefore remains unclear and requires further in-depth and systematic analysis. In modeling the impact of traffic state, only a few previous studies considered crash types or the vehicles involved, and they largely ignored the potential effect of data aggregation and data heterogeneity on crash prediction. Therefore, to address the heterogeneity of traffic variations, this paper introduces a random effect negative binomial model to study the relationship between traffic variation and crash frequency on urban expressways. An aggregation method for the crash data based on the similarity of traffic flow conditions is used to study the occurrence mechanism of various crashes. The results are expected to provide theoretical support for real-time early warning of road safety, particularly for urban expressways.

Data and Methodology
Previous studies show that many factors can influence the crash rate on different types of roads, and urban arterials are no exception. Traffic conditions on such roads may also be affected by traffic signal control and other traffic management countermeasures. As a result, the relationship between traffic variations and crash rate is considered to be stronger on expressways, and this relationship is the focus of this paper. The paper first analyzes the aggregated crash data and the concurrent traffic data. Then, predictive models are developed using selected variables. Finally, the models are validated.

Collection of Crash and Traffic Data
Detailed crash data and real-time traffic flow data are used to study the impact of traffic variations on crashes. The urban expressway studied is part of the Third Ring Road of the City of Wuhan, with a total length of 37 km, and it is fitted with similar guardrails and a central median throughout. The alignment radius and road control of the tested corridor are consistent with those required for the design speed of 80 km/h; no obvious changes in road factors are found along the test segment, which mainly carries truck traffic, compared to other urban arterial roads. In addition, the expressway selected for this study underwent safety audits during design and construction, and potential road risks were removed prior to opening in accordance with the Design Specifications for Highway Safety Facilities and other standards. As a result, the effects of road geometric design, weather conditions and other factors on crash rates are not considered in this study.
Microwave traffic flow detectors are installed along the studied segments to collect real-time traffic flow data. The studied segment is a divided two-way roadway with six or eight lanes, a design speed of 80 km/h and a corresponding maximum traffic capacity of 2100 pcu/h/lane. Heavy vehicles are restricted to the third lane (on two-way six-lane segments) or the third and fourth lanes (on two-way eight-lane segments). The data were collected in two periods, from 1 September 2018 to 30 November 2018 and from 1 March 2019 to 31 May 2019. The maximum peak-hour traffic flow is 1784 pcu/h/lane, with an average off-peak flow of 772 pcu/h/lane. The average traffic volumes on the on-ramps and off-ramps are 168 pcu/h and 152 pcu/h, respectively. The corresponding maximum travel speed is 78 km/h, with an average speed of 52 ± 15 km/h. The selected expressway segment mostly operates at Level of Service B or C, and no serious congestion was found during the above periods.
The crash data are extracted from the traffic crash database of the city's traffic management department, which records detailed information such as the location, time and type of each crash. A total of 1188 crashes occurred during the study periods, of which rear-end collisions and side-impact collisions accounted for 54% and 41%, respectively. Together, these two types account for 95% of the total and thus constitute the majority of the crashes on the urban expressway studied. Vehicles are divided into heavy vehicles and light vehicles according to the Chinese Automobile Classification Standard. A crash is counted as a heavy-vehicle crash once a heavy vehicle is involved; there are 353 such crashes, whereas the other 835 crashes involve only light vehicles. A large proportion of crashes thus involves heavy vehicles, which poses significant safety risks to the users of the facility. The real-time traffic flow data are collected by microwave traffic flow detectors installed along the facility. There are 27 sets of detectors along the urban expressway under study, with an average spacing of about 1.37 km, which collect real-time traffic data such as vehicle passing time, speed and vehicle type on each lane. The traffic flow measurements, including average speed, traffic volume, proportion of heavy vehicles, speed variation among lanes and speed variation within each lane, are collected every 5 min for each crash type and collision vehicle type and are summarized in Table 1.

Data Processing and Filtering
Previous studies show that the traffic flow condition prior to a crash is closely related to the occurrence of the crash. For instance, Oh collected real-time traffic flow data through the loop detector upstream of the crash location and used the traffic flow data just 5 min prior to the crash report time to identify crashes [31,32]. Abdel-Aty concluded that the speed variation detected by the closest loop detector in the interval 5-10 min prior to the crash report time has the most significant impact on crashes [33,34]. Based on these experiences, the pre-crash traffic flow conditions in this study are defined as those 5-10 min prior to the reported crash time, as collected by the closest detector upstream of the crash location. To improve the reliability of the modelling results, crashes more than 800 m away from the detectors were screened out, considering that the average distance between detectors in previous research is around 800 m [2,7,21]. The workflow of data collection is shown in Figure 1: the relevant traffic flow data are extracted according to the occurrence time of each crash, and the detector supplying these data is selected according to the location of the crash, so as to identify the traffic flow conditions before the crash.
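The matching rule described above (closest upstream detector, 800 m screening distance, window of 5-10 min before the reported crash time) can be sketched as follows. This is only an illustration: the `Detector` class, the field names and the use of a one-dimensional chainage in metres are assumptions, not part of the study's data format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Detector:
    detector_id: str   # hypothetical identifier
    position_m: float  # assumed chainage along the travel direction, in metres


def closest_upstream_detector(
    crash_position_m: float,
    detectors: Sequence[Detector],
    max_distance_m: float = 800.0,
) -> Optional[Detector]:
    """Return the nearest detector upstream of the crash.

    Returns None when no upstream detector exists or when the nearest one
    is more than max_distance_m away, in which case the crash is screened out.
    """
    upstream = [d for d in detectors if d.position_m <= crash_position_m]
    if not upstream:
        return None
    nearest = max(upstream, key=lambda d: d.position_m)
    if crash_position_m - nearest.position_m > max_distance_m:
        return None
    return nearest


def pre_crash_window(crash_time: datetime) -> Tuple[datetime, datetime]:
    """Return (start, end) of the interval 5-10 min before the reported crash time."""
    return crash_time - timedelta(minutes=10), crash_time - timedelta(minutes=5)
```

For a crash reported at 08:30, the window returned is 08:20 to 08:25, which corresponds to one 5-min aggregation interval of the detector data.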

For the urban expressway studied, the raw traffic flow data of each lane were recorded and aggregated at 5-min intervals. However, the collected traffic flow data often contain abnormal and missing values because of data noise and hardware equipment failure. Such abnormal data, which appear as wrong or missing traffic volumes and speeds, differ markedly from normal data and cannot be used to study any patterns, so they must be cleaned to avoid a negative impact on the model. Therefore, threshold checks and logical reasoning are combined to detect abnormal data. In this study, all invalid and unrealistic values are excluded from further analysis, and the rules for excluding outliers are: (1) "missing or outlier" records in the raw data; (2) speed < 0 km/h or speed > 100 km/h; (3) traffic volume < 0 pcu or traffic volume > 150 pcu in five minutes; (4) number of lanes > 5; (5) heavy-vehicle proportion < 0.
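The five exclusion rules can be expressed as a single validity check on each 5-min record. The sketch below assumes a dictionary-style record with hypothetical field names (`speed_kmh`, `volume_pcu`, `lane`, `heavy_share`, `flagged`); the thresholds are the ones stated in the rules.

```python
from typing import Any, Mapping

MAX_SPEED_KMH = 100.0         # rule (2)
MAX_VOLUME_PCU_5MIN = 150.0   # rule (3)
MAX_LANE_NUMBER = 5           # rule (4)
REQUIRED_FIELDS = ("speed_kmh", "volume_pcu", "lane", "heavy_share")


def is_valid_record(record: Mapping[str, Any]) -> bool:
    """Return True when a 5-min record passes all five exclusion rules."""
    # Rule (1): missing values, or records marked as outliers in the raw data
    # (the "flagged" field is an assumed representation of that marking).
    if record.get("flagged"):
        return False
    if any(record.get(name) is None for name in REQUIRED_FIELDS):
        return False
    # Rule (2): speed outside [0, 100] km/h.
    if not 0.0 <= record["speed_kmh"] <= MAX_SPEED_KMH:
        return False
    # Rule (3): volume outside [0, 150] pcu per five minutes.
    if not 0.0 <= record["volume_pcu"] <= MAX_VOLUME_PCU_5MIN:
        return False
    # Rule (4): lane number greater than 5.
    if record["lane"] > MAX_LANE_NUMBER:
        return False
    # Rule (5): negative heavy-vehicle proportion.
    if record["heavy_share"] < 0.0:
        return False
    return True
```

Applying `is_valid_record` as a filter before aggregation keeps boundary values such as a speed of exactly 100 km/h, since the rules exclude only values strictly beyond the thresholds.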

Variable Selection and Setting
This study uses five variables to study the relationship between traffic variations and the risk of crashes: the average traffic volume per lane, the proportion of heavy vehicles, the average speed, the speed variation between lanes and the speed variation within each lane. In the following data analysis steps, the traffic volume of each vehicle type is converted into Passenger Car Units (PCU) according to the defined conversion coefficients. In addition, the original traffic flow data are aggregated into 5-min units to remove the impact of occasional flow fluctuations.
Traffic volume $q$: the average PCU value per lane in a five-minute period, where $L$ is the number of lanes and $q_l$ is the PCU value for a five-minute period on lane $l$.
Heavy-vehicle proportion $w$: the proportion of heavy vehicles among all vehicles that pass through a segment in a five-minute period, that is, the number of heavy vehicles divided by the total number of vehicles $Q$ in that period.
Average speed $v$: the average of the individual speeds of all $Q$ vehicles present on a road section along one traveling direction in a five-minute period.
Speed variation between lanes: for each one-minute interval $t$, the standard deviation of the lane average speeds $v_{l,t}$ around the all-lane average speed $v_t$ is calculated, and the average of these standard deviations over the $T$ one-minute intervals of the 5-min period is taken as the between-lane speed variation.
Speed variation within lanes: for each lane $l$, the standard deviation of vehicle speeds around the lane's 5-min average speed is calculated, and the average of these standard deviations over all $L$ lanes is taken as the within-lane speed variation.
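The displayed equations for these five variables are not available in the text. A plausible reconstruction from the verbal definitions is given below. The symbols $Q_h$ (number of heavy vehicles), $\bar{v}$, $\sigma_b$, $\sigma_w$, $N_l$ and $v_{l,n}$ are assumed notation, and the use of the population form of the standard deviation (division by $L$ or $N_l$ rather than by one less) is also an assumption.

```latex
\begin{align}
q &= \frac{1}{L}\sum_{l=1}^{L} q_l \\
w &= \frac{Q_h}{Q} \\
\bar{v} &= \frac{1}{Q}\sum_{n=1}^{Q} v_n \\
\sigma_b &= \frac{1}{T}\sum_{t=1}^{T}
  \sqrt{\frac{1}{L}\sum_{l=1}^{L}\left(v_{l,t}-\bar{v}_t\right)^2} \\
\sigma_w &= \frac{1}{L}\sum_{l=1}^{L}
  \sqrt{\frac{1}{N_l}\sum_{n=1}^{N_l}\left(v_{l,n}-\bar{v}_l\right)^2}
\end{align}
```

Here $v_n$ is the speed of the $n$th vehicle, $\bar{v}_t$ is the average speed over all lanes in minute $t$, $T = 5$ is the number of one-minute intervals, $N_l$ is the number of vehicles observed in lane $l$ during the 5-min period, and $\bar{v}_l$ is the 5-min average speed in lane $l$.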

Data Aggregation
In this paper, the impact of traffic states on crash frequency is investigated under different traffic flow conditions, and each traffic flow condition is defined as a crash scenario. A total of 432 crash scenarios (i.e., 4 levels of average speed × 4 levels of traffic volume × 3 levels of speed variation between lanes × 3 levels of within-lane speed variation × 3 levels of heavy-vehicle proportion) are defined, covering all possible traffic flow scenarios that may lead to crashes, and each scenario represents a unique traffic condition. The crash frequency in each scenario is recorded by crash type (rear-end collisions and side-impact collisions) and by vehicle type (heavy-vehicle related collisions and light-vehicle related collisions). The crash data grouped into the same scenario are aggregated to form an analysis dataset, and the median of each traffic variable in each group is used to represent the corresponding traffic condition. In addition, the average vehicle-hours spent traversing the test segment in each scenario is introduced as an exposure variable to calculate the probability of crashes under a specific traffic flow condition. Here, $VHT_i$ represents the average vehicle-hours travelled per kilometer in the $i$th scenario, $q_i$ is the traffic volume under that scenario and $v_i$ is the average speed under the same scenario.
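The exposure equation itself is not available in the text. Under the stated definitions, a natural form (an assumption, not confirmed by the source) is the ratio of volume to speed, since each vehicle needs $1/v_i$ hours to travel one kilometer:

```latex
\begin{equation}
VHT_i = \frac{q_i}{v_i}
\end{equation}
```

As a worked example with illustrative values, $q_i = 1200$ pcu/h and $v_i = 60$ km/h give $VHT_i = 20$ vehicle-hours per kilometer per hour of operation. If $q_i$ is expressed per 5-min interval instead, the result scales by a constant factor, which does not affect the relative exposure between scenarios.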

Crash Prediction Modelling
Traditional count models for crash frequency prediction include the Poisson regression model and the negative binomial model, and negative binomial models have been widely used to handle the over-dispersion inherent in count data. As in previous studies, the crash frequency data aggregated by traffic condition are assumed in this paper to follow the negative binomial distribution, where $\lambda_{ik}$ and $y_{ik}$ refer to the expected and the observed crash frequency for collision type $k$ in scenario $i$, respectively, and $\alpha$ represents the over-dispersion parameter.
To describe the unobserved heterogeneity of the modeling data, a random effect term $\phi_{ik}$ is introduced into the negative binomial model, where $\beta_{k0}$ represents the intercept for crash type $k$, $\beta_{km}$ is the coefficient of the $m$th explanatory variable for crash type $k$, $e_i$ is the value of the exposure variable for the $i$th observation, $X_{ikm}$ is the value of the $m$th explanatory variable for the $i$th observation for crash type $k$, and $\phi_{ik}$ is the unobserved heterogeneity for the $i$th observation for crash type $k$, which follows a normal distribution with a mean of zero and a variance of $\sigma_{\phi}^{2}$.
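The two model equations are not available in the text. The sketch below writes them in the standard negative binomial (NB2) parameterization with a log link, using the symbols defined above. Treating the exposure $e_i$ as an offset with a unit coefficient is an assumption.

```latex
\begin{align}
P\left(y_{ik}\mid\lambda_{ik},\alpha\right) &=
  \frac{\Gamma\left(y_{ik}+1/\alpha\right)}{\Gamma\left(1/\alpha\right)\,\Gamma\left(y_{ik}+1\right)}
  \left(\frac{1}{1+\alpha\lambda_{ik}}\right)^{1/\alpha}
  \left(\frac{\alpha\lambda_{ik}}{1+\alpha\lambda_{ik}}\right)^{y_{ik}} \\
\ln\lambda_{ik} &= \ln e_i + \beta_{k0} + \sum_{m}\beta_{km}X_{ikm} + \phi_{ik},
  \qquad \phi_{ik}\sim N\left(0,\sigma_{\phi}^{2}\right)
\end{align}
```

In this parameterization the mean is $\lambda_{ik}$ and the variance is $\lambda_{ik}+\alpha\lambda_{ik}^{2}$, so $\alpha > 0$ captures over-dispersion relative to the Poisson model.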

Prediction Performance Evaluation
The Akaike Information Criterion (AIC) is the main statistic used to check the goodness-of-fit of the models developed in this paper; a smaller AIC value indicates a better fit. The Bayesian Information Criterion (BIC) is used as a supplement to the AIC, and a smaller BIC value likewise indicates a better fit of the model.
To evaluate the accuracy of the predicted results, two indicators are introduced: Mean Absolute Deviation (MAD) and Mean Squared Error (MSE). MAD describes the average absolute deviation between the predicted and the observed crash frequency across scenarios, and MSE is the average of the squared deviations. Smaller values of MAD and MSE mean a higher prediction accuracy. In addition, $R^2$ is introduced to describe the accuracy of the model; its value ranges from 0 to 1, and a higher value means a better model fit. The literature indicates that a model with $R^2$ greater than 0.4 is considered to have a good fit. In these indicators, $y_{ik}$ is the observed average crash frequency for crash type $k$ in scenario $i$.
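The indicators named above can be computed with a few lines of code. The sketch below uses the common textbook definitions, which are assumed to match the paper's missing equations: AIC as 2k minus twice the log-likelihood, BIC as k ln(n) minus twice the log-likelihood, and $R^2$ as one minus the ratio of residual to total sum of squares. With this definition $R^2$ can fall below 0 for a very poor model.

```python
import math
from typing import Sequence


def mad(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Deviation between observed and predicted crash frequencies."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Squared Error between observed and predicted crash frequencies."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion; smaller is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion; smaller is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood
```

Because BIC penalizes each parameter by ln(n) rather than 2, it favors smaller models than AIC whenever the number of observations exceeds about 7.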

Analysis of Traffic Flow and Crash Data
Analysis and visualization of the above variables reveal clear temporal patterns in the traffic flow, as shown in Figure 2. For instance, the traffic volume collected on site clearly shows a morning and an evening peak, as demonstrated in Figure 2a. The proportion of heavy vehicles is lower in the daytime and much higher at nighttime and in the early morning, which is related to the travel restriction policies for heavy vehicles on urban expressways, as shown in Figure 2b. Figure 2c shows the changes over time in the speed variation among lanes and the within-lane speed variation. Both variables are higher at nighttime and in the early morning, which may be explained by the lower traffic volume at those times.
Pre-crash traffic conditions are extracted and then combined with the historical traffic crash data. Each traffic variable is divided into levels as follows. The average speed is first divided into 4 equal levels, each covering 25% of its cumulative distribution; the dataset for each average speed level is then divided into 4 equal parts according to the cumulative distribution of traffic volume. Similarly, the speed variation between lanes within each traffic volume level is divided into 3 levels, the speed variation within lanes within each between-lane speed variation level is divided into 3 levels, and the heavy-vehicle proportion within each within-lane speed variation level is divided into 3 levels as well. After data aggregation, there are 432 traffic scenarios. The summary statistics of the scenario-based dataset are shown in Table 2.
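The nested equal-frequency splitting described above can be sketched in plain Python. The record field names and the `PLAN` tuple are hypothetical; the order and number of levels (4, 4, 3, 3, 3) follow the text, and the handling of ties and of group sizes that do not divide evenly is an assumption.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]

# Splitting order and number of levels, as described in the text.
PLAN: Tuple[Tuple[str, int], ...] = (
    ("avg_speed", 4),
    ("volume", 4),
    ("sd_between", 3),
    ("sd_within", 3),
    ("heavy_share", 3),
)


def split_equal(records: List[Record], key: str, parts: int) -> List[List[Record]]:
    """Sort by `key` and cut into `parts` groups of (nearly) equal size."""
    ordered = sorted(records, key=lambda r: r[key])
    n = len(ordered)
    bounds = [round(i * n / parts) for i in range(parts + 1)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(parts)]


def nested_scenarios(
    records: Sequence[Record],
    plan: Sequence[Tuple[str, int]] = PLAN,
) -> Dict[Tuple[int, ...], List[Record]]:
    """Split records level by level; each key of the result labels one scenario."""
    groups: Dict[Tuple[int, ...], List[Record]] = {(): list(records)}
    for key, parts in plan:
        next_groups: Dict[Tuple[int, ...], List[Record]] = {}
        for label, members in groups.items():
            for level, chunk in enumerate(split_equal(members, key, parts)):
                next_groups[label + (level,)] = chunk
        groups = next_groups
    return groups


def scenario_medians(members: Sequence[Record], keys: Sequence[str]) -> Record:
    """Represent a non-empty scenario by the median of each traffic variable."""
    return {key: median(r[key] for r in members) for key in keys}
```

Because each variable is split within the groups produced by the previous variable, the level boundaries of, say, traffic volume differ between average speed levels, which is what distinguishes this scheme from a simple grid of global quantiles.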

Negative Binomial Model
Different combinations of the above independent variables are tested to develop the optimal models and to control for possible interactions among the independent variables. The best combination of independent variables is selected on the basis of minimum AIC. Table 3 shows the posterior estimation of the random effect negative binomial model based on the crash scenario dataset. The estimated parameters are statistically significant at the 95% confidence level.
According to the estimation results for the rear-end collision and side-impact collision prediction models, the significant independent variables finally included in the models are: average speed (Mean = 0.0801, p value = 0.00 < 0.05), traffic volume (Mean = 0.0258, p value = 0.00 < 0.05), speed variation among lanes (Mean = 0.1939, p value = 0.00 < 0.05), within-lane speed variation (Mean = 0.6270, p value = 0.00 < 0.05), the interaction term between average speed and within-lane speed variation (Mean = −0.0124, p value = 0.00 < 0.05), and the interaction term between traffic volume and speed variation between lanes (Mean = −0.0041, p value = 0.00 < 0.05). Unlike in the side-impact collision model, the heavy-vehicle proportion is also a significant independent variable in the rear-end collision model (p value = 0.00 < 0.05). Its coefficient is negative (Mean = −6.4851), indicating that a higher heavy-vehicle proportion is associated with a lower crash risk. According to the analysis of the traffic variation patterns of heavy vehicles, the number of heavy vehicles traveling at nighttime and in the early morning in the studied area is much higher than during the daytime, whereas the majority of recorded crashes occurred during daylight hours, which may explain the inverse relationship between the proportion of heavy vehicles and the crash rate.
When analyzing the relationship between the crash frequency and independent variables such as traffic volume and average speed, their effects on crashes cannot be analyzed separately because of their interaction effects. Figure 3 plots the relationship between the average speed, the within-lane speed variation and the crash rate. With a combination of higher within-lane speed variation and lower average speed (or vice versa), the curve becomes very steep, indicating that the crash rate increases very quickly under such a scenario. A high within-lane speed variation combined with a low average speed may indicate that the roadway is congested, with vehicles making frequent stop-and-go maneuvers. Because the distance between vehicles is limited, the driver's time to respond to a sudden speed change of the vehicle ahead is reduced, which leads to more rear-end collisions. On the other hand, a higher average speed combined with a lower within-lane speed variation also increases the crash risk, which mainly reflects the impact of the higher average speed. When a vehicle is operating at a higher speed, the crash risk increases because the braking distance is longer and the driver's response time is very limited. Nevertheless, the reduced crash risk under the combination of higher within-lane speed variation and higher average speed may be related to the sample size: such traffic conditions were less frequent in the crash sample data used for the analysis. An earlier study divided rear-end collisions into low-speed and high-speed scenarios, and its findings are consistent with the conclusions of this study. Under high-speed conditions, speed is positively correlated with crash frequency, while under low-speed conditions, a larger speed variation is found to increase crash risk.

Correlation between Traffic Volume, Speed Variation and Crash Rate
The relationship among traffic volume, speed variation among lanes and the crash rate is plotted in Figure 4. The speed variation among lanes essentially reflects driving behavior related to lane changing or overtaking. The results show that the crash rate is higher under low-flow conditions with a high speed variation among lanes. The entrances and exits of ramps are closely spaced along the section of the urban expressway under study, and vehicle weaving and overtaking near the ramps are frequent. Frequent lane changes and overtaking lead to a higher risk of collision. In addition, at high traffic volumes there is more weaving among vehicles, which leads to greater exposure to crash risk.
Based on the estimation results, the relationships between the traffic flow variables and the heavy-vehicle and light-vehicle related collision rates are drawn in Figures 5 and 6, respectively. For light-vehicle related collisions, the significant independent variables in the model include average speed, traffic volume, proportion of heavy vehicles, speed variation among lanes, within-lane speed variation, the interaction term between average speed and within-lane speed variation, and the interaction term between traffic volume and speed variation between lanes. As shown in Figure 5, in the light-vehicle related collision model, in contrast to the rear-end and side-impact collision models, the effect of traffic volume on the crash rate is reduced by a large speed variation among lanes, owing to the interaction term between speed variation among lanes and traffic volume.
For the heavy-vehicle related collision model, the significant independent variables are: average speed, speed variation among lanes, within-lane speed variation, the interaction term between average speed and within-lane speed variation, and the interaction term between average speed and speed variation among lanes. Heavy-vehicle related collisions are more likely to occur under a high level of within-lane speed variation combined with a low level of average speed. The posted speed limits for heavy vehicles and light vehicles on the urban expressway differ, and heavy vehicles generally drive at a relatively lower speed. Under such an operation policy, the within-lane speed variation is higher and the average speed of the road segment is lower. The higher crash risk may therefore stem from the impact of heavy vehicles on the operating speed, or from the occurrence of traffic congestion, either of which leads to more overtaking. When the average speed is high, heavy vehicles tend to be involved in collisions because of their own design issues. Such a result is consistent with previous studies, which concluded that crashes related to heavy vehicles happen with a higher probability under scenarios with a high operating speed and speed variation. However, under the scenario of a high within-lane speed variation and a high average speed, the crash risk decreases, which may be related to the rarity of such traffic flow conditions in the crash sample data used for analysis.

Study a "Safe" Traffic Flow Threshold in Practice
Elasticity analysis can be used to further quantify the effect of traffic flow on crashes and to reveal the relationship between traffic flow and crash frequency, which in turn can provide a reference for formulating traffic safety improvement measures.
The calibrated random effects negative binomial model identifies the important independent variables for the collision prediction model. To further quantify the degree of influence of each independent variable on the dependent variable, the elasticity analysis method is used. The independent variables in this study are all continuous, so the elasticity coefficient is calculated with the formula for continuous variables, in which E_j represents the elasticity coefficient of the jth independent variable and x_j denotes the mean of the jth independent variable.
Because the model contains interaction terms, the elasticity coefficients of the respective variables may not have fixed values. As shown in Figure 7, the elasticity coefficient of within-lane speed variation decreases as the average speed increases. The average speed thresholds below which the elasticity coefficient of within-lane speed variation is positive are 51.09, 50.56, 64.95, 58.13, and 59.62 for overall crashes, rear-end collisions, side-impact collisions, heavy-vehicle-related collisions, and light-vehicle-related collisions, respectively. When the average speed is below these values, an increase in within-lane speed variation may result in more collisions; when the average speed is above them, an increase in within-lane speed variation decreases the crash frequency. The traffic volume thresholds that result in positive elasticity coefficients for speed variation among lanes are 69.67, 47.29, 45.76, and 46.33 for overall crashes, rear-end collisions, side-impact collisions, and light-vehicle-related collisions, respectively. For heavy-vehicle-related collisions, the average speed threshold that results in a positive elasticity coefficient for speed variation among lanes is 62.55. These results provide insights for developing traffic operation policies to improve traffic safety.
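To make the origin of such thresholds concrete, the following derivation is a sketch under stated assumptions rather than the paper's own formula. It assumes a log-linear mean function, as is typical for negative binomial models, and uses symbols introduced here only for illustration: $\lambda$ for the expected crash frequency, $v$ for the average speed, $s$ for the within-lane speed variation, and $\beta_s$, $\beta_{vs}$ for the main-effect and interaction coefficients.

```latex
% Assumed log-linear mean function with one interaction term
\ln \lambda = \beta_0 + \beta_v v + \beta_s s + \beta_{vs}\, v s + \dots

% Elasticity of \lambda with respect to s (continuous variable)
E_s = \frac{\partial \lambda}{\partial s} \cdot \frac{s}{\lambda}
    = \left( \beta_s + \beta_{vs}\, v \right) s

% With \beta_s > 0 and \beta_{vs} < 0, the sign of E_s changes at v^{*}
E_s > 0 \iff v < v^{*}, \qquad v^{*} = -\frac{\beta_s}{\beta_{vs}}
```

Under these assumptions the elasticity of within-lane speed variation is not a single number but a function of the average speed, and the reported thresholds correspond to the speed $v^{*}$ at which that function crosses zero.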
Specifically, traffic safety can be improved by adjusting the traffic volume, the vehicle composition of the traffic flow, and the speed distribution of vehicles.
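The threshold idea can be illustrated with a minimal Python sketch. It assumes a log-linear mean function in which ln(lambda) contains the terms beta_main * x + beta_inter * x * z, so that the elasticity with respect to x is (beta_main + beta_inter * z) * x. The function names and the coefficient values are hypothetical and chosen only for illustration; they are not the estimates reported in this study.

```python
import math


def elasticity(beta_main: float, beta_inter: float, x: float, z: float) -> float:
    """Elasticity of expected crash frequency with respect to x.

    Assumes ln(lambda) contains beta_main * x + beta_inter * x * z, so that
    d ln(lambda) / d ln(x) = (beta_main + beta_inter * z) * x.
    """
    return (beta_main + beta_inter * z) * x


def sign_change_threshold(beta_main: float, beta_inter: float) -> float:
    """Value of z at which the elasticity with respect to x changes sign."""
    if beta_inter == 0:
        raise ValueError("no interaction term: the elasticity never changes sign")
    return -beta_main / beta_inter


# Hypothetical coefficients for illustration only (not the paper's estimates).
BETA_SPEED_VAR = 0.5       # main effect of within-lane speed variation
BETA_INTERACTION = -0.01   # interaction: average speed x within-lane speed variation

if __name__ == "__main__":
    v_star = sign_change_threshold(BETA_SPEED_VAR, BETA_INTERACTION)
    print(f"elasticity changes sign at average speed {v_star:.2f}")
    for avg_speed in (40.0, 70.0):
        e = elasticity(BETA_SPEED_VAR, BETA_INTERACTION, x=2.0, z=avg_speed)
        print(f"average speed {avg_speed:.0f}: elasticity {e:+.3f}")
```

With a positive main effect and a negative interaction coefficient, the elasticity is positive below the threshold and negative above it, which mirrors the pattern described for within-lane speed variation.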

Conclusions and Discussion
This paper introduces a random effect negative binomial model to analyze the impact of traffic flow variables such as average speed, speed variation and traffic volume on crash risk, based on crash data and concurrent traffic flow data collected by high-precision microwave traffic flow detectors on urban expressways. In this study, the crashes are subdivided into rear-end collisions/side-impact collisions and heavy-vehicle-related collisions/light-vehicle-related collisions. The crash data are aggregated based on the similarity of traffic flow conditions, crash scenarios that reflect all possible types of traffic flow conditions in the studied area are developed, and the mechanism of each type of crash is then analyzed.
The results show that the significant influencing factors differ for each type of crash. For rear-end collisions, a higher speed variation within lanes is associated with a higher crash risk. This finding is consistent with other studies [3]. Because the distance between vehicles is limited, the driver's response time to a sudden speed change of surrounding vehicles is reduced, which leads to rear-end collisions. Under high-speed traffic operation conditions, speed is positively correlated with crash frequency, while under low-speed conditions a larger speed variation increases the crash risk [33]. The results from this study are largely in line with a previous study [35], which shows that crashes are more likely in the presence of high speed variation under low-flow conditions. Frequent lane changes and overtaking on road sections also lead to a higher risk of collision [1]. The results are also consistent with previous studies which found that crashes related to heavy vehicles occur more frequently in the presence of high operating speeds and high speed variation [7].
By analyzing the relationship between traffic flow measurements and various types of crashes, this study improves the level of detail of crash modeling and provides practical guidance for traffic safety management. Although this study has achieved its major goal, it has several limitations. First, weather conditions have a potential impact on the occurrence of crashes, and they will be considered in our models once detailed weather data are available. Second, road geometric characteristics are also correlated with the occurrence of crashes, as are traffic delays, economic and societal costs, and other factors, none of which have been considered in this study. Third, the traffic flow variables have different safety effects on crash severity, but more than 90 percent of the crash samples are property damage only, so crash severity has not been analyzed in detail. Lastly, this study used only one section of urban expressway in the city of Wuhan as a case study; the limited sample size and single road type may therefore affect the rigor of its contributions. Moreover, this study has shown that if there are more heavy vehicles in the traffic flow, the crash risk would be higher. The performance of commercial vehicle drivers is believed to be one of the contributing factors; however, only GPS-based surveillance measurements, namely speed and position data, were available for this study. These two types of data did not support further investigation of driving performance, and because the speed data had already been analyzed, no further variable was used in this paper. When connected vehicle technology becomes more widespread and more risky driving behaviors can be detected, a new crash prediction model can be established that takes heavy vehicle drivers' performance into account.
Therefore, provided that more crash samples can be obtained through Big Data technology, it is of interest to study the mechanism of crash severity based on real-time traffic flow and driving behavior data in the future. Such analyses can also provide a more valuable reference for the formulation of road safety improvement measures.