Influence Factors on Injury Severity of Traffic Accidents and Differences in Urban Functional Zones: The Empirical Analysis of Beijing

This study identifies the factors that influence the injury severity of traffic accidents in Beijing and discusses how they differ across urban functional zones. A total of 3982 accident records were analyzed for the whole city and for each urban functional zone. A set of 17 influence factors was established, covering accident attribute, occurrence time, infrastructure, management status, and environmental condition. Factors were preselected with Pearson's chi-squared test. Binary logistic regression analysis was then used to calibrate the impact of each factor value on injury severity, and classification and regression tree analysis was used to assess the degree of influence of each factor. Pearson's chi-squared test and binary logistic regression analysis reveal both similarities and differences among the urban functional zones. There are two common influence factors (accident type and cross-section position), six personalized influence factors (lighting condition, visibility, signal control, road physical isolation facility, occurrence period, and road type), and nine weak influence factors. The results of binary logistic regression analysis and classification and regression tree analysis are basically the same. By combining the two methods, the factors that deserve attention in each urban functional zone, and the factor values that need special attention, are determined.


Introduction
Road traffic injuries have a huge impact on health security and development. There were 1.25 million road traffic deaths globally in 2013 [1]. In 2015, a total of 187,781 traffic accidents occurred in China, including 58,022 deaths and 199,880 injuries, with a direct economic loss of more than 1 billion RMB [2]. It is very important to study the influencing factors of traffic accidents and eliminate potential accidents.
China has the largest population in the world and a vast territory, and conditions vary considerably among regions. Within a megacity, there are also differences among urban functional zones. Accident causes are generally analyzed for the city as a whole, and the differences among urban functional zones, which can be found by analyzing traffic accident statistics, are often neglected. Taking Beijing as an example, the city can be divided into four parts based on urban master planning. Zone 1 is the capital functional core zone, Zone 2 is the urban functional expansion zone, Zone 3 is the new urban development zone, and Zone 4 is the ecological conservation development zone. Additionally, the factors related to traffic accidents differ greatly among urban functional zones. Taking Beijing as the research object, this paper analyzes the influence factors on injury severity of traffic accidents and discusses the differences between the whole city and the individual urban functional zones.

Analysis of Traffic Accidents in Beijing
As the capital and one of the largest cities in China, Beijing has a population of 21.7 million people and 5.6 million motor vehicles [3]. Recently, many studies have focused on traffic accidents in Beijing. Yan et al. [5] presented a comprehensive analysis of motor vehicle-bicycle crashes to find the interrelationship of irregular maneuvers, crash patterns, and bicyclist injury severity. Zhao et al. [6] investigated the relative likelihood of pedestrian head injuries based on person, vehicular, and environmental factors. Qiu et al. [7] put forward a novel multi-objective particle swarm optimization-based partial classification method to identify the contributing factors that influence accident severity. Li and Guo [8] developed a sub-distribution hazard regression model for competing risks analysis on traffic accident duration time. Yuan and Chen [9] established a logistic regression model to analyze the significance of the main contributing factors of crashes between vehicles and vulnerable road users. Most of these recent studies of traffic accidents in Beijing address the influence factors of a particular type of accident.
Most of these studies focus on risk analysis, accident causation mechanisms, behavior analysis, and similar topics. They are also generally small-sample studies, because data surveys are difficult to conduct. As a result, almost no studies have examined different functional zones. However, one or a few types of traffic accidents can hardly reflect the overall characteristics of traffic accidents in Beijing, and the differences among zones are not negligible. In practice, urban traffic safety activities are based on the overall characteristics of urban traffic accidents. This paper therefore considers the various types of traffic accidents in Beijing and puts forward a basis for traffic safety governance in the whole city and in the different urban functional zones. Traffic safety activities, which include both the planning and design of infrastructure and daily traffic management, are important ways to reduce traffic accidents systematically. These activities determine the choice of influence factors of traffic accidents.

Influence Factors on Traffic Accidents
Many prior studies have examined the influence factors on traffic accidents. Šliupas [10] discussed the impact of road parameters and the surrounding area on traffic accidents. Miškinis and Valuntaite [11] examined the correlation between traffic accidents and driving experience. Kunt et al. [12] considered driver information, vehicle information, weather condition, road surface, etc. Beak et al. [13] dealt with the relationship between operational method and traffic accidents. Ivan et al. [14] analyzed traffic accidents under low-light conditions. Lu et al. [15] studied the correlation between accident injury severity and potential factors, such as driver factors, environmental factors, vehicle factors, and tunnel factors.
Most current studies of influence factors focus on accident attributes, occurrence time, infrastructure, management status, and environmental conditions. However, different influence factors have different impacts in different environments, so mathematical models are needed to calibrate the relationship. Current studies commonly adopt negative binomial regression models [16,17], structural equation models [18], linear and multiple regression models [10], random effects models [19], hypothesis testing models [13], multiple logistic regression [20], ordered logit models [15], etc.
According to previous publications, traffic safety depends on the integrated and complex relationship among various components [21], for example: A. human factors: the psychology of the vehicle's driver, pedestrians, etc.; B. traffic flow: the traffic, the vehicle, signal control mode, etc.; C. road infrastructure: road type, road line style, central isolation facility, etc.; D. environmental conditions: road safety attribute, lighting condition, etc.
In the study of influence factors on injury severity of traffic accidents in Beijing, a set of influence factors is established from multiple aspects. Further screening of influence factors and careful analysis of core factors are carried out.

Data
Based on the statistical data of traffic accidents in Beijing, a set of influence factors on injury severity of traffic accidents in Beijing is set up. Y indicates the injury severity of a traffic accident. X_i indicates an independent variable that may have a significant impact on injury severity. The set of influence factors covers five aspects: A. Accident attribute, including accident type X_1; B. Time of occurrence, including day of the week X_2 and time interval X_3; C. Infrastructure, including cross-section position X_4, central isolation facility X_5, physical isolation facility X_6, pavement condition X_7, pavement structure X_8, intersections type X_9, road line style X_10, and road type X_11; D. Management status, including road safety attribute X_12 and signal control mode X_13; and E. Environmental condition, including weather X_14, visibility X_15, lighting condition X_16, and road surface condition X_17.
Based on the investigation of injuries in Beijing from 2014 to 2015, 3982 data points were selected after excluding abnormal data, such as incomplete records and obvious errors. The definitions and descriptive statistics of Y and X_i are shown in Table 2. The traffic accidents in the whole city and in the different functional zones of Beijing were analyzed statistically in terms of accident attributes, and clear differences among the urban functional zones were found. Differences in traffic accident type among urban functional zones are shown in Table 3. Differences in other traffic characteristics among urban functional zones [4] are shown in Table 4.

Table 3. Differences of traffic accident type in urban functional zone. Column headers: Zone; Severity; Accident Type.

Methods
Using the 3982 accident records from Beijing, and considering both the whole city and the urban functional zones, the factors that actually affect the injury severity of traffic accidents are screened from a set of candidate influence factors. First, Pearson's chi-squared test is used to examine the correlation between severity and each candidate factor, and the correlated factors are selected to simplify the subsequent analysis. Second, binary logistic regression analysis is used to study the influence of the values of each factor. Finally, classification and regression tree analysis is used to study the degree of influence of each factor.

Pearson's Chi-Squared Test
The relationship between Y and X_i (i = 1, 2, ..., 17) is studied to screen the influence factors preliminarily. The data are collated in a contingency table, where columns indicate X_i and rows indicate Y. If X_i has h levels and Y has l levels, the table is called an h × l contingency table. The chi-square statistic is considered significant at the 0.05 level.
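As an illustration of this screening step, the following minimal sketch computes the Pearson chi-square statistic and its degrees of freedom for one contingency table. The function name `chi_square_statistic` and the counts are hypothetical and are not taken from the Beijing data; the only external fact used is the standard critical value 3.841 for one degree of freedom at the 0.05 level.

```python
def chi_square_statistic(table):
    """Return (chi2, dof) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of Y and X_i.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical 2 x 2 table: rows are two levels of Y, columns two levels of one X_i.
table = [[30, 70],
         [60, 40]]
chi2, dof = chi_square_statistic(table)

# Standard chi-square critical value for dof = 1 at the 0.05 level.
CRITICAL_0_05_DF1 = 3.841
significant = chi2 > CRITICAL_0_05_DF1
print(round(chi2, 3), dof, significant)
```

For tables with more levels, the critical value for the corresponding degrees of freedom would be needed, or a p-value from a statistics package.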

Binary Logistic Regression Analysis
The influence factors preliminarily screened by Pearson's chi-squared test should be further selected, and the effects of the selected factors on injury severity of traffic accidents should be determined. This analysis process can be realized by binary logistic regression analysis (BLR).
The probability P of a given traffic accident severity is:

$$P = \frac{\exp\left(\alpha + \sum_{i} \beta_i X_i\right)}{1 + \exp\left(\alpha + \sum_{i} \beta_i X_i\right)}$$

where α is a constant and β_i is the parameter of the independent variable X_i. Transformed into logarithmic form:

$$\ln\frac{P}{1 - P} = \alpha + \sum_{i} \beta_i X_i$$

All the factors that affect the severity of accidents should be screened in order to find those that have a significant impact on the severity of traffic accidents. A significance level of 0.05 is used as the screening rule.
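The logistic form described above can be sketched numerically as follows. This is a minimal illustration, not the fitted model: the values of `alpha`, `betas`, and `xs` are made up, and the helper names are hypothetical. It shows that the logit of the probability recovers the linear predictor and that exp(β) gives the odds ratio reported as Exp(B).

```python
import math


def severity_probability(alpha, betas, xs):
    """Probability from the logistic model with constant alpha and parameters betas."""
    z = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Logarithmic form: ln(P / (1 - P))."""
    return math.log(p / (1.0 - p))


def odds_ratio(beta):
    """Exp(B): multiplicative change in the odds for a one-unit change in X_i."""
    return math.exp(beta)


# Made-up coefficients and indicator values, for illustration only.
alpha = -2.0
betas = [0.8, -0.3]
xs = [1, 0]

p = severity_probability(alpha, betas, xs)
print(round(p, 4), round(logit(p), 4), round(odds_ratio(betas[0]), 4))
```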

Classification and Regression Tree Analysis
Classification and regression tree (CART) is a method for learning the conditional probability distribution of an output random variable given the input variables. CART splits each feature in two. After the best feature and the best split value are selected, a binary tree is generated, and the CART algorithm is completed by pruning. For classification, CART uses the Gini index as the criterion for feature selection.
In a classification problem with K classes, where the probability that a sample point belongs to class k is p_k, the Gini index of the probability distribution is defined as:

$$\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2$$

Let C_k be the subset of sample D that belongs to class k; then the Gini index of D is:

$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} \left(\frac{|C_k|}{|D|}\right)^2$$

If condition A divides sample D into two subsets D_1 and D_2, then the Gini index of D under condition A is:

$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$

The Gini index indicates the uncertainty of the samples. The importance of the most influential factor is set to 100%, and the importance of each other factor is expressed as a percentage of it. A normalized importance of at least 20% is considered meaningful.
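The Gini index of a sample and of a binary split can be sketched as below. The labels and the split are hypothetical toy data, and the function names are chosen only for this illustration.

```python
from collections import Counter


def gini(labels):
    """Gini index of a sample: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(labels_d1, labels_d2):
    """Gini index of D under a condition A that splits D into D_1 and D_2."""
    n = len(labels_d1) + len(labels_d2)
    return (len(labels_d1) / n) * gini(labels_d1) + (len(labels_d2) / n) * gini(labels_d2)


# Toy severity labels for the two subsets produced by a hypothetical condition A.
d1 = ["fatal", "fatal", "fatal", "injury"]
d2 = ["injury", "injury", "injury", "injury"]

print(gini(d1 + d2))       # uncertainty before the split
print(gini_split(d1, d2))  # uncertainty after the split
```

A split that lowers the Gini index more is preferred, which is how CART chooses the feature and the split value at each node.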

Pearson's Chi-Squared Test
Based on Pearson's chi-squared test, the correlation between severity and the influence factors is studied for the whole city and for the different urban functional zones. The values of the chi-square statistic are shown in Table 5.

Whole City
Accident type, time interval, cross-section position, physical isolation facility, pavement condition, intersections type, road line style, road type, signal control mode, visibility, and lighting condition are 11 factors that are closely connected with severity. These 11 factors should be considered in the subsequent analysis.
For Zone 1, accident type, cross-section position, pavement condition, and road type are four factors that are closely connected with severity.
For Zone 2, accident type, time interval, cross-section position, pavement structure, intersections type, road safety attribute, weather, and visibility are eight factors that are closely connected with severity.
For Zone 3, accident type, time interval, central isolation facility, physical isolation facility, road line style, road type, signal control mode, visibility, and lighting conditions are nine factors that are closely connected with severity.
For Zone 4, accident type, time interval, cross-section position, road type, and lighting condition are five factors that are closely connected with severity.
As shown in Figure 1, the following results can be found: accident type, time interval, cross-section position, and road type appeared no less than four times; central isolation facility, physical isolation facility, pavement condition, pavement structure, intersections type, road line style, road safety attribute, signal control mode, weather, visibility, and lighting conditions appeared, but less than four times; day of the week and road surface condition did not appear.
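The appearance counts summarized above can be reproduced from the factor lists given in the text for the whole city and the four zones. The sketch below only tallies those lists; the variable names are chosen for this illustration.

```python
from collections import Counter

# The 17 candidate factors defined in the Data section.
ALL_FACTORS = [
    "accident type", "day of the week", "time interval",
    "cross-section position", "central isolation facility",
    "physical isolation facility", "pavement condition",
    "pavement structure", "intersections type", "road line style",
    "road type", "road safety attribute", "signal control mode",
    "weather", "visibility", "lighting condition",
    "road surface condition",
]

# Factors reported in the text as closely connected with severity.
SELECTED = {
    "whole city": [
        "accident type", "time interval", "cross-section position",
        "physical isolation facility", "pavement condition",
        "intersections type", "road line style", "road type",
        "signal control mode", "visibility", "lighting condition",
    ],
    "zone 1": ["accident type", "cross-section position",
               "pavement condition", "road type"],
    "zone 2": ["accident type", "time interval", "cross-section position",
               "pavement structure", "intersections type",
               "road safety attribute", "weather", "visibility"],
    "zone 3": ["accident type", "time interval",
               "central isolation facility", "physical isolation facility",
               "road line style", "road type", "signal control mode",
               "visibility", "lighting condition"],
    "zone 4": ["accident type", "time interval", "cross-section position",
               "road type", "lighting condition"],
}

counts = Counter(f for factors in SELECTED.values() for f in factors)
frequent = sorted(f for f in ALL_FACTORS if counts[f] >= 4)
occasional = sorted(f for f in ALL_FACTORS if 0 < counts[f] < 4)
absent = sorted(f for f in ALL_FACTORS if counts[f] == 0)

print(frequent)
print(absent)
```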

Binary Logistic Regression Analysis
Based on BLR, the influence factors on injury severity of traffic accidents are studied for the whole city and for the different urban functional zones. The results of BLR are shown in Table 6.
In Table 6, "-" indicates that the significance value is greater than 0.05, so the estimate is not meaningful; "/" indicates that there are no data; "#" indicates the reference category for Exp(B).

Whole City
The Exp(B) values indicate that the odds of a fatal accident for accident types 1, 2, 3, and 4 are 1.598, 4.422, 3.353, and 2.401 times, respectively, those for accident type 5. The two highest values belong to accident types 2 and 3.
The odds of a fatal accident for cross-section positions 1, 2, 3, 4, 5, and 6 are 0.794, 0.794, 0.794, 2.105, 0.614, and 2.858 times, respectively, those for cross-section position 7. The two highest values belong to cross-section positions 6 and 4.
The odds of a fatal accident for physical isolation facilities 1, 2, and 3 are 1.059, 0.846, and 1.334 times, respectively, those for physical isolation facility 4. The two highest values belong to physical isolation facilities 3 and 1.
The odds of a fatal accident for road types 1, 2, 3, 4, and 5 are 2.231, 1.091, 0.994, 1.170, and 1.502 times, respectively, those for road type 6. The two highest values belong to road types 1 and 5.
The odds of a fatal accident for signal control modes 1 and 2 are 0.785 and 0.785 times, respectively, those for signal control mode 3. The two highest values belong to signal control modes 2 and 3.
The odds of a fatal accident for visibilities 1, 2, and 3 are 0.590, 1.028, and 1.273 times, respectively, those for visibility 4. The two highest values belong to visibilities 3 and 2.
The odds of a fatal accident for lighting conditions 1, 2, 3, and 4 are 1.020, 0.914, 2.162, and 2.044 times, respectively, those for lighting condition 5. The two highest values belong to lighting conditions 3 and 4.
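The selection of the top two levels of each factor can be checked with a small ranking sketch. The ratios below are the whole-city values quoted above; the reference level of each factor is assigned a ratio of 1 by definition. The function name `top_levels` is a hypothetical helper, and the sketch is only a check of the arithmetic, not part of the original analysis.

```python
def top_levels(ratios, reference_level, n=2):
    """Rank factor levels by their ratio to the reference level (ratio 1.0)."""
    all_ratios = dict(ratios)
    all_ratios[reference_level] = 1.0
    ranked = sorted(all_ratios, key=lambda level: all_ratios[level], reverse=True)
    return ranked[:n]


# Whole-city values quoted in the text, keyed by factor level.
accident_type = {1: 1.598, 2: 4.422, 3: 3.353, 4: 2.401}               # reference: 5
cross_section = {1: 0.794, 2: 0.794, 3: 0.794, 4: 2.105, 5: 0.614, 6: 2.858}  # reference: 7
road_type = {1: 2.231, 2: 1.091, 3: 0.994, 4: 1.170, 5: 1.502}         # reference: 6
lighting = {1: 1.020, 2: 0.914, 3: 2.162, 4: 2.044}                    # reference: 5

print(top_levels(accident_type, reference_level=5))
print(top_levels(cross_section, reference_level=7))
print(top_levels(road_type, reference_level=6))
print(top_levels(lighting, reference_level=5))
```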

Zone 1-4
The factors affecting the probability of a fatal accident differ from zone 1 to zone 4. The level with the highest probability of a fatal accident for each factor is shown in Table 4, and the analysis process is the same as for the whole city, in order to find the most influential factors.
By BLR, the influence factors are further screened. As shown in Figure 2, accident type and cross-section position appeared no less than four times; lighting condition, visibility, signal control mode, physical isolation facility, time interval, and road type appeared, but less than four times; day of the week, central isolation facility, pavement condition, pavement structure, intersections type, road line style, road safety attribute, weather, and road surface condition did not appear.

Compared with the results of Pearson's chi-squared test, there are some differences. The number of factors that appear no less than four times decreased by 2; the number of factors that appear but less than four times decreased by 5; and the number of factors that do not appear increased by 7.

Classification and Regression Tree Analysis
Based on CART, the degree of influence of the various factors on injury severity of traffic accidents is studied for the whole city and for the different urban functional zones. A normalized importance of at least 20% is considered meaningful. The results of CART are shown in Table 7.

Whole City
Ranked by their influence on accident severity, the factors are, in descending order: lighting condition, accident type, road type, visibility, signal control mode, time interval, physical isolation facility, cross-section position, pavement condition, road line style, and intersections type. For the top five, the normalized importance is more than 20%.
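The normalization behind this kind of ranking can be sketched as follows: the largest raw importance is scaled to 100%, the others are expressed relative to it, and factors below the 20% level are set aside. The raw importance values and factor names below are invented for illustration; the actual values are those reported in Table 7.

```python
def normalized_importance(raw):
    """Scale raw importances so that the most influential factor equals 100%."""
    top = max(raw.values())
    return {name: 100.0 * value / top for name, value in raw.items()}


def meaningful_factors(raw, threshold=20.0):
    """Return factors at or above the threshold, in descending order of importance."""
    norm = normalized_importance(raw)
    ranked = sorted(norm, key=norm.get, reverse=True)
    return [name for name in ranked if norm[name] >= threshold]


# Invented raw importances for three placeholder factors.
raw = {"factor A": 0.050, "factor B": 0.030, "factor C": 0.008}

norm = normalized_importance(raw)
print({name: round(value, 1) for name, value in norm.items()})
print(meaningful_factors(raw))
```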

Zone 1
Ranked by their influence on accident severity, the factors are, in descending order: accident type, cross-section position, road type, and pavement condition. The normalized importance of all factors is more than 20%.

Zone 2
Ranked by their influence on accident severity, the factors are, in descending order: accident type, cross-section position, visibility, weather, time interval, and pavement structure. For the top five, the normalized importance is more than 20%.

Zone 3
Ranked by their influence on accident severity, the factors are, in descending order: lighting condition, road type, visibility, accident type, physical isolation facility, signal control mode, central isolation facility, time interval, and road line style. For the top seven, the normalized importance is more than 20%.

Zone 4
Ranked by their influence on accident severity, the factors are, in descending order: accident type, time interval, cross-section position, road type, and lighting condition. The normalized importance of all factors is more than 20%.
In Table 7, "-" indicates that the chi-square statistic is not significant, so classification and regression tree analysis is not carried out; "*" indicates that the value is below 20%.

Consistency Analysis of BLR and CART
BLR and CART analyze the characteristics of influence factors from different angles. It is necessary to discuss the consistency of the two methods. As shown in Table 8, the conclusions of the two methods are basically the same.

Comparative Analysis of Influencing Factors
Based on the results of BLR, the influence factors that appear no less than four times are defined as common influence factors, those that appear but less than four times as personalized influence factors, and those that do not appear as weak influence factors. The differences between the whole city and the different functional zones are further analyzed in Figure 3.
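The three-way definition can be written as a simple rule on the number of analyses (whole city plus four zones) in which a factor appears. The sketch below is illustrative: the function name and the example counts are hypothetical, and only the thresholds come from the definition above.

```python
def classify_factor(appearances, common_threshold=4):
    """Classify a factor by how many of the five analyses it appears in."""
    if appearances >= common_threshold:
        return "common"
    if appearances > 0:
        return "personalized"
    return "weak"


# Hypothetical appearance counts for three placeholder factors.
example_counts = {"factor P": 5, "factor Q": 2, "factor R": 0}
labels = {name: classify_factor(n) for name, n in example_counts.items()}
print(labels)
```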

Accident Type
As in the previous study by Al-Ghamdi [22], accident type is a common influence factor and needs attention when its value is 2 or 3. In the whole city, zone 2, and zone 4, attention should be paid to accident types 2 and 3. In zone 3, accident type 4 should be a focus in addition to accident type 2. In zone 1, accident type 5 should be a focus in addition to accident type 3.

Time Interval
Time interval is a personalized influence factor. Only in zone 4 should time intervals 1 and 2 be considered. Kim [23] found a similar result: the morning rush hour (between 06:00 and 09:59 a.m.) increased the probability of a fatality.

Cross-section Position
Cross-section position is a common influence factor, and the value 4 needs the most attention. In the whole city and zone 2, cross-section positions 4 and 6 need focus; in zone 1, cross-section positions 4 and 7; and in zone 4, cross-section positions 4 and 5. Several previous studies [24][25][26] reported a similar result, namely that cycling on the sidewalk was more dangerous than cycling on the road.

Physical Isolation Facility
Physical isolation facility is a personalized influence factor. Only in the whole city should attention be paid to physical isolation facilities 1 and 3. Osman et al. [27] reported a similar result, namely that a lack of access control increased the possibility of serious injury.

Road Type
Road type is a personalized influence factor and varies greatly. In the whole city, road types 1 and 5 should be considered; in zone 1, road types 2 and 4; and in zone 4, only road type 2. A previous study [27] also showed that urban principal arterials could contribute to injury severity.

Signal Control Mode
Signal control mode is a personalized influence factor. In the whole city, signal control modes 2 and 3 need more attention, and in zone 3, signal control modes 1 and 2. This is similar to the result reported by Osman et al. [27] that signalized control was associated with a lower likelihood of serious injury than non-signalized control.

Visibility
Visibility is a personalized influence factor. In the whole city and zone 2, attention should be paid to visibilities 2 and 3, and in zone 3, to visibilities 3 and 4. Klop and Khattak [28] showed that fog increased injury severity, partially because the inclement weather reduced visibility.

Lighting Condition
Lighting condition is a personalized influence factor, and the distribution of its index is relatively concentrated. In the whole city and zone 3, lighting conditions 3 and 4 need more attention. Ivan et al. [14] found a similar result, namely that low lighting conditions significantly influenced accident occurrence.

Comparative Analysis of Urban Functional Zone
Integrating the conclusions of BLR and CART, the features of the different urban functional zones are explored.

Whole City
The factors that should be focused on in the whole city include lighting condition, accident type, road type, visibility, and signal control mode.
In particular, attention should be paid to the following situations: the value of lighting condition is 3 or 4; the value of accident type is 2 or 3; the value of road type is 1 or 5; the value of visibility is 3 or 2; the value of signal control mode is 2 or 3.
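One possible reading of how the two methods are combined for the whole city is sketched below: keep the factors whose normalized CART importance exceeds 20% and that also have reported whole-city BLR results. The rule itself is an assumption made for illustration, since the text does not spell it out, and it is not claimed to hold for every zone. The factor lists are taken from the whole-city BLR and CART results given earlier.

```python
# Factors with whole-city BLR results reported in the text.
blr_significant = {
    "accident type", "cross-section position", "physical isolation facility",
    "road type", "signal control mode", "visibility", "lighting condition",
}

# Whole-city CART ranking from the text, in descending order of importance.
cart_ranked = [
    "lighting condition", "accident type", "road type", "visibility",
    "signal control mode", "time interval", "physical isolation facility",
    "cross-section position", "pavement condition", "road line style",
    "intersections type",
]

# The text states that the top five exceed the 20% normalized importance level.
cart_above_20 = cart_ranked[:5]

# Assumed combination rule: keep CART-important factors that BLR also supports.
focus = [factor for factor in cart_above_20 if factor in blr_significant]
print(focus)
```

Under this assumed rule, the result coincides with the five whole-city focus factors listed above.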

Zone 1
The factors that should be focused on in zone 1 include accident type, cross-section position, road type, and pavement condition.
In particular, attention should be paid to the following situations: the value of accident type is 3 or 5; the value of cross-section position is 7 or 4; the value of road type is 4 or 2.
Zone 1 is the center of the city and is mainly characterized by high population density and high branch road density. The values of accident type, cross-section position, and road type are connected with pedestrians and motor vehicles.

Zone 2
The factors that should be focused on in zone 2 include accident type, cross-section position, visibility, weather, and time interval.
In particular, attention should be paid to the following situations: the value of accident type is 2 or 3; the value of cross-section position is 6 or 4; the value of road type is 3 or 2.
Zone 2 surrounds the city center and is mainly characterized by a relatively high density of expressway arterial roads. The values of accident type, cross-section position, and road type are connected with vehicle traffic.

Zone 3
The factors that should be focused on in zone 3 include lighting condition, road type, visibility, accident type, physical isolation facility, signal control mode, and central isolation facility.
In particular, attention should be paid to the following situations: the value of lighting condition is 3 or 4; the value of visibility is 3 or 4; the value of accident type is 2 or 4; the value of signal control mode is 2 or 1.
Zone 3 lies at some distance from the city center and is mainly characterized by a relatively high density of high-grade roads. Therefore, unlike in zone 1 and zone 2, the values of some traffic facility and environmental indicators, such as physical isolation facility, lighting condition, accident type, signal control mode, and visibility, are connected with high-grade roads.

Zone 4
The factors that should be focused on in zone 4 include accident type, time interval, cross-section position, road type, and lighting condition.
In particular, attention should be paid to the following situations: the value of accident type is 3 or 2; the value of time interval is 1 or 2; the value of cross-section position is 5 or 4; the value of road type is 2.
Zone 4 is the suburban area of the city and is mainly characterized by low population density and nighttime vehicle crossing. Therefore, the values of accident type, road type, time interval, and cross-section position are connected with low population density and nighttime vehicle crossing. For example, the time interval values indicate nighttime.

Conclusions
Taking Beijing as an example, 3982 accident records were analyzed for the whole city and for the different urban functional zones. A set of influence factors on injury severity of traffic accidents was set up, covering accident attribute, occurrence time, infrastructure, management status, and environmental condition. These factors were preselected with Pearson's chi-squared test. The impact of the values of these factors on injury severity was calibrated with binary logistic regression analysis. Additionally, the degree of influence of the factors was analyzed with classification and regression tree analysis.
There are both similarities and differences among the urban functional zones. There are two common influence factors (accident type and cross-section position), six personalized influence factors (lighting condition, visibility, signal control, road physical isolation facility, occurrence period, and road type), and nine weak influence factors. The results of binary logistic regression analysis and classification and regression tree analysis are basically the same. By combining the two methods, the factors that deserve attention in each urban functional zone, and the factor values that need special attention, are determined. It can be concluded that the differences in the influence factors on injury severity among zones are connected with the attributes of the zones.
Author Contributions: Study concept and design: Z.S. and Y.C.; data collection: Z.S. and J.W.; analysis and interpretation of results: Z.S., J.W., and H.L.; draft manuscript preparation: Z.S. and J.W. All authors reviewed the results and approved the final version of the manuscript.