Prediction of Crash Severity as a Way of Road Safety Improvement: The Case of Saint Petersburg, Russia

Abstract: This article investigates factors that explain road crash severity levels in Saint Petersburg, Russia, during the 2015–2021 period. The research takes into account factors such as lighting conditions, weather conditions, infrastructure factors, human factors, accident types, and vehicle category and color to assess their influence on crash severity. The most influential accident type is run-off-road crashes, which are associated with an 11.2% increase in fatal accidents. Among road infrastructure conditions, road barrier shortcomings are the biggest contributor to the increase in fatal accidents (2.8%). A lack of road lighting, which is also a road infrastructure condition, has a significant effect on fatal outcomes, increasing them by 12.6%, and this is the most influential factor in the analysis. The obtained results may serve as a basis for Saint Petersburg authorities to develop new road safety policies.


Introduction
Road safety is a key factor in sustainable development. It can increase the economic growth of a region by changing the functional purposes of certain territories and constructing cities. Moreover, it has a direct impact on population growth. The complex innovative development of a region requires the systematic development of the transport industry in terms of increasing road safety, which will ensure not only an increase in the efficiency of transport logistics but also an overall improvement in the quality of people's lives and regional sustainability, along with its economic, social, and environmental components. This is why many studies have focused on improving road safety, e.g., Orsini et al. [1], Anastasopoulos et al. [2], and Karpova et al. [3]. Such studies highlight ways of improving road infrastructure to reduce fatal crashes and serious injuries. Many researchers [4][5][6][7][8][9][10][11] have investigated the distribution of severity levels to determine the causes of injury and to propose measures for decreasing the severe and fatal consequences of road crashes. These studies report the same pattern in the severity level distribution, with slight severity being the most common and fatality the least. Hence, predicting a fatal outcome can be difficult.
Road safety is one of the most important sustainable development issues of the 2030 Agenda for Sustainable Development and the key Russian national project "Safe Quality Roads" for 2019-2030, the aim of which is to improve Russia's road infrastructure and road safety. To achieve this aim, it is crucial to understand the most dangerous shortcomings of the existing road infrastructure. Under the project, the number of deaths per 100,000 people should be reduced from the initial 13 in 2017 to 8.4 by 2024 and 4 by 2030. In 2022, there were 9.59 deaths in road crashes per 100,000 individuals [12]. The main weaknesses of the road infrastructure cause more severe injuries and more fatalities in road crashes. Hence, understanding them can help the government implement road safety projects more effectively in terms of ensuring people's safety and obtaining such results in an expedited manner. Many different studies discuss dangerous road infrastructure problems [13][14][15], which include the influence of traffic lights, road barriers, and signs on increasing crash severity. However, for a more comprehensive analysis, studies by Billah et al. [13], Chen et al. [14], and others examined the influence of other factors on crash severity, for example, lighting [14][15][16] and weather conditions [15,16], human factors [14][15][16][17][18], and vehicle characteristics [12,13][19][20][21][22][23][24][25][26]. Azhar et al. [20], Pillajo-Quijia et al. [27], Chen et al. [14], and others [28][29][30][31][32][33][34] highlighted the influence of night time on increasing crash severity. Chen et al. [15] and Park et al. [17] observed the impact of weather conditions on crash fatality and concluded that precipitation causes more severe crash outcomes. Onieva-García et al. [18] and Chen et al. [14] showed that the gender and age of road users are factors in the severity of crashes. Wang et al. [19], Billah et al. [13], and Azhar et al. [20] considered how different types of road accidents can influence fatality levels. According to these studies, rollover and run-off-roadway accidents have the highest impact.
Therefore, every year, there is an increasing number of studies investigating the nature of the occurrence of road crashes in various territories to minimize their consequences for authorities and the population, which indicates the relevance of the topic and the importance of analyzing the factors affecting crashes.

Materials and Methods
The research required data that included the factors affecting the severity level and frequency of road traffic accidents in St. Petersburg. This dataset was obtained from "Karta DTP" [35]. It covered 2015-2021 and contained 37,585 observations. For the dependent variable "severity level," three output classes were defined: slight injury, severe injury, and fatal outcome. Table 1 presents the distribution of road accidents by severity level during the period under review in St. Petersburg. The independent variables for the study consisted of 38 dummy variables and a participant count that took values from 1 to 26. Table 2 presents the independent variables in further detail, along with information about coding, values, and labels.
Based on the data and the literature review of previous studies, ordered probit regression was selected as the tool for the analysis. Ordered probit models explain variation in an ordered categorical dependent variable (severity level) as a function of independent variables. The categories need only be ordered. In this case, we had three categories of road accident severity level, the lowest being "slight injury" and the highest being "fatal". To check for cases of high correlation between the variables, we computed the phik correlation for our dataset, which was converted to binary variables; Figure 1 depicts the resulting correlation matrix.
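For reference, the standard textbook form of a three-category ordered probit model can be written as follows. The notation (latent severity $y_i^*$, regressor vector $x_i$, coefficient vector $\beta$, cutpoints $\mu_1 < \mu_2$) is assumed here for illustration and is not taken from the article.

```latex
\begin{align}
y_i^* &= x_i^\top \beta + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, 1), \\
y_i &=
\begin{cases}
1 \ (\text{slight injury}) & \text{if } y_i^* \le \mu_1, \\
2 \ (\text{severe injury}) & \text{if } \mu_1 < y_i^* \le \mu_2, \\
3 \ (\text{fatal outcome}) & \text{if } y_i^* > \mu_2,
\end{cases} \\
P(y_i = 1 \mid x_i) &= \Phi(\mu_1 - x_i^\top \beta), \\
P(y_i = 2 \mid x_i) &= \Phi(\mu_2 - x_i^\top \beta) - \Phi(\mu_1 - x_i^\top \beta), \\
P(y_i = 3 \mid x_i) &= 1 - \Phi(\mu_2 - x_i^\top \beta),
\end{align}
```

Here $\Phi$ is the standard normal distribution function. The three probabilities sum to one, so any change in a regressor shifts probability mass between categories without changing the total.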
The phik correlation coefficient provides information regarding the magnitude of the association, or correlation, between categorical and interval variables. Additionally, the phik correlation captures non-linear relationships as well as linear ones. A coefficient above 0.7 for a pair of explanatory variables is a cause for concern, because such variables can cause errors in the estimation. If these variables are needed, only the more appropriate one of the pair should be selected for the subsequent estimations. We observed a high correlation (more than 0.7) between two pairs of variables: "avp" and "hp", and "acb" and "vb." In these cases, the correlation equals 1, which reflects the fact that pedestrians are involved in all vehicle-pedestrian collisions; the same holds for car-on-bike collisions and the presence of bikes in accidents. Hence, the variables in each pair cannot be considered in one sample, which is taken into account in determining the samples for the analysis.
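The screening rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `flag_correlated_pairs` is hypothetical, and in the toy matrix only the two correlations equal to 1 come from the text; all other values are made up.

```python
from itertools import combinations

def flag_correlated_pairs(corr, threshold=0.7):
    """Return (var_a, var_b, value) for every pair whose correlation exceeds the threshold."""
    flagged = []
    for a, b in combinations(sorted(corr), 2):
        value = corr[a][b]
        if value > threshold:
            flagged.append((a, b, value))
    return flagged

# Toy symmetric correlation matrix. Only the 1.0 entries are reported in the text;
# the small off-diagonal values are invented for illustration.
corr = {
    "avp": {"avp": 1.0, "hp": 1.0, "acb": 0.05, "vb": 0.04},
    "hp": {"avp": 1.0, "hp": 1.0, "acb": 0.06, "vb": 0.05},
    "acb": {"avp": 0.05, "hp": 0.06, "acb": 1.0, "vb": 1.0},
    "vb": {"avp": 0.04, "hp": 0.05, "acb": 1.0, "vb": 1.0},
}

flagged = flag_correlated_pairs(corr)
```

With this toy input, the two pairs named in the text are the only ones flagged, so one variable from each pair would have to be left out of any given model.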
This study is aimed at testing which factors impact road accident severity. Four ordered probit models with differences in the analyzed samples were used to achieve this objective:

1. A model with a full sample that contained all observations and independent variables, except for pedestrian gender (hpg), to avoid the problem of missing values because pedestrians were not present in all cases of road crashes.
2. A model that used the same full sample as the first model but without the variables from the category "accident type" (avc, avp, apf, aco, acb, asv, aro, aror).
3. A model with the subsample "Collisions" that contained cases of vehicle collisions (where the variable "avc" takes the value of 1) and did not include the pedestrian's gender as an independent variable.
4. A model with the subsample "Pedestrian accidents" that contained cases of vehicle-pedestrian collisions (where the variable "avp" takes the value of 1), in which the gender of the pedestrian was included as the variable "hpg"; the variables "wh" and "hp" were omitted because of collinearity, as there were no cases with hurricanes (all "wh" values equal 0), whereas pedestrians as road users were present in all cases (all "hp" values equal 1).
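The four specifications above can be sketched as row filters plus regressor lists. This is a minimal sketch under stated assumptions: the helper names `varying` and `build_samples` are hypothetical, the toy rows are invented, and dropping every regressor that is constant within a subsample is a generalization of the reason given for omitting "wh" and "hp", not a procedure documented in the article.

```python
ACCIDENT_TYPES = ["avc", "avp", "apf", "aco", "acb", "asv", "aro", "aror"]

def varying(rows, variables):
    """Keep only variables that take more than one value in the given rows."""
    return [v for v in variables if len({r[v] for r in rows}) > 1]

def build_samples(rows, variables):
    """Return {model name: (rows, regressors)} for the four specifications."""
    base = [v for v in variables if v != "hpg"]  # models 1-3 exclude pedestrian gender
    collisions = [r for r in rows if r["avc"] == 1]
    pedestrian = [r for r in rows if r["avp"] == 1]
    return {
        "full": (rows, base),
        "full_no_accident_types": (rows, [v for v in base if v not in ACCIDENT_TYPES]),
        # Assumption: constant columns are dropped automatically in the subsamples.
        "collisions": (collisions, varying(collisions, base)),
        "pedestrian": (pedestrian, varying(pedestrian, variables)),
    }

# Invented toy data using a handful of the article's variable codes.
variables = ["avc", "avp", "wp", "wh", "hp", "hpg"]
rows = [
    {"avc": 1, "avp": 0, "wp": 0, "wh": 0, "hp": 0, "hpg": None},
    {"avc": 1, "avp": 0, "wp": 1, "wh": 1, "hp": 0, "hpg": None},
    {"avc": 0, "avp": 1, "wp": 0, "wh": 0, "hp": 1, "hpg": 1},
    {"avc": 0, "avp": 1, "wp": 1, "wh": 0, "hp": 1, "hpg": 0},
]
samples = build_samples(rows, variables)
```

In the toy pedestrian subsample, "wh" is always 0 and "hp" is always 1, so both drop out, which mirrors the collinearity argument in the fourth specification.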
The ordered probit regression in Stata, which was used to compute the models, provides coefficients that indicate only the direction (positive or negative) of a variable's impact on the dependent variable and its significance, averaged over all the output categories of the dependent variable. Therefore, the study also presents the average marginal effects for each injury severity type for a more detailed conclusion about the influence of the observed factors on severity level. This allows for estimating the effect of every independent variable on the probability of every category of the dependent variable.
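To make the idea of an average marginal effect concrete, the sketch below uses one common definition for a dummy regressor: the change in each category probability when the dummy switches from 0 to 1, averaged over observations. All numbers (cutpoints, coefficient, linear indices) are hypothetical, and the code is not a reproduction of the Stata routine used in the study.

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def category_probs(xb, cut1, cut2):
    """P(slight), P(severe), P(fatal) for linear index xb and cutpoints cut1 < cut2."""
    p_slight = norm_cdf(cut1 - xb)
    p_fatal = 1.0 - norm_cdf(cut2 - xb)
    p_severe = 1.0 - p_slight - p_fatal
    return (p_slight, p_severe, p_fatal)

def dummy_effect(xb_base, beta, cut1, cut2):
    """Change in each category probability when a dummy switches from 0 to 1."""
    off = category_probs(xb_base, cut1, cut2)
    on = category_probs(xb_base + beta, cut1, cut2)
    return tuple(b - a for a, b in zip(off, on))

def average_effect(base_indices, beta, cut1, cut2):
    """Average the per-observation effects over all observations."""
    effects = [dummy_effect(xb, beta, cut1, cut2) for xb in base_indices]
    return tuple(sum(e[k] for e in effects) / len(effects) for k in range(3))

# Hypothetical values, not estimates from the article.
ame = average_effect([-0.2, 0.0, 0.4], beta=0.8, cut1=0.3, cut2=1.8)
```

Because the three probabilities always sum to one, the three effects sum to zero: a positive coefficient lowers the probability of the lowest category and raises that of the highest, which is the pattern visible in the reported marginal effects.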

Full Sample Analysis
At the beginning of our research work, we examined our full sample with 38 variables and 37,585 observations. The results of the ordered probit regression are presented in Table 3. It contains the coefficients of the regression modeling with the values of z-statistics and the marginal effects with their standard errors. The asterisks depict the intervals of the p-value (*** p < 0.01, ** p < 0.05, * p < 0.1).
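The link between a z-statistic and the asterisk notation can be illustrated as follows. The sketch assumes the usual two-sided test against the standard normal distribution; the function names are hypothetical and the z-values in the usage line are examples, not values from Table 3.

```python
from math import erf, sqrt

def two_sided_p(z):
    """Two-sided p-value of a z-statistic under the standard normal distribution."""
    cdf = 0.5 * (1.0 + erf(abs(z) / sqrt(2.0)))
    return 2.0 * (1.0 - cdf)

def stars(p):
    """Map a p-value to the asterisk notation used in the tables."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""

# Example z-statistics chosen for illustration only.
labels = [stars(two_sided_p(z)) for z in (2.7, 2.0, 1.7, 1.0)]
```

Under this convention, an estimate marked with two or three asterisks is significant at the 95% confidence level, and one asterisk corresponds to the 90% level.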
It is important to note that the following variables obtained significant coefficient estimates at the 95% confidence level: li, lio, lim, wp, avp, aco, asv, aro, aror, ihm, iprs, among others.
1. Under the "light" category of regressors, every variable except "twilight" had a significant impact on the severity level. The most influential factor was missing illumination at night time (lim). Relative to the daytime baseline, this factor increased the probability of severe (18%) and fatal (10.7%) outcomes and decreased the probability of slight injuries (28.7%).
2. Under the "weather" category of the independent variables, only the condition "precipitation" had a significantly positive impact on the second and third response categories (4.5% and 1.3%, respectively) and a significantly negative impact on the first one (−5.8%).
4. As seen from the infrastructure conditions, only the regressors connected with road signs (irsa and irsm) had non-significant estimates. The absence of road barriers (irba) had the highest coefficients (1: −10.9%; 2: 8%; 3: 2.8%), whereas the absence of a pedestrian restraint system at desired locations (iprs) had the lowest ones (1: −4.1%; 2: 3.2%; 3: 0.9%). Defective traffic light (itl) had significant coefficients for slight injuries and severe injuries at the 95% confidence level, while the coefficient for the fatal category was significant only at the 90% confidence level.
5. Under human factors, all the factors were significant for the prediction of the severity level, and the presence of pedestrians as road users in accidents had the most influence (1: −13.4%; 2: 10.5%; 3: 3%). The driver's gender (hdg) coefficients were 1: 6.4%; 2: −5.2%; 3: −1.2%. These estimates show that male drivers increased the probability of serious injuries after road accidents by 5.2% and fatal outcomes by 1.2%, while female drivers increased the probability of slight injuries by 6.4%.
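A quick consistency check on the figures quoted above: since the probabilities of the three severity categories sum to one, the three marginal effects of any regressor should sum to zero, up to rounding. The sketch below applies this check to the full-sample values reported in the text; the tolerance of 0.15 is an assumption reflecting three values rounded to one decimal place.

```python
# Marginal effects (slight, severe, fatal) in percent, as quoted in the text.
effects = {
    "lim": (-28.7, 18.0, 10.7),
    "wp": (-5.8, 4.5, 1.3),
    "irba": (-10.9, 8.0, 2.8),
    "iprs": (-4.1, 3.2, 0.9),
    "hp": (-13.4, 10.5, 3.0),
    "hdg": (6.4, -5.2, -1.2),
}

TOLERANCE = 0.15  # assumed bound on accumulated rounding error

residuals = {name: round(sum(values), 1) for name, values in effects.items()}
consistent = all(abs(r) <= TOLERANCE for r in residuals.values())
```

All six regressors pass the check, with residuals no larger than 0.1 in absolute value, which is what rounding alone would produce.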

Full Sample Analysis without Accident Types
A regression analysis on the same full sample, but without the accident type variables, is provided to show how the coefficient estimates change without this category of factors, in preparation for further modeling with the subsamples of the most frequent accident types. The results are shown in Table 4. The asterisks depict the intervals of the p-value (*** p < 0.01, ** p < 0.05, * p < 0.1).
The following variables obtained significant coefficients at the 95% confidence level: li, lio, lim, wp, wf, ihm, iprs, irba, isd, hpc, hp, hdg, vm, vpt, vb, ves, vl, vh, vsp, cb, cg. In comparison with the previous model with accident types, the estimates of two variables became significant: accidents with bicycles (vb), so that all the variables from the vehicle category now significantly influenced the severity level, and the weather condition fog (wf), although only for the first two response categories, slight injuries (−17.8%) and severe injuries (12.5%). The coefficients of the "defective traffic light" regressor (itl) were non-significant.
The following conclusions can be drawn from comparing Tables 3 and 4:
1. The estimates were quite similar, but some differences in values were noted.
2. In the "light" regressors, missing illumination (lim) at night time again had the largest impact on severity level, and the estimates were larger in magnitude: for the category 1 response, the coefficient decreased by 2.8% (to −31.5%); for category 2, it increased by 0.9% (to 18.9%); and for category 3, it increased by 1.9% (to 12.6%).
3. For the weather category of regressors, there were two significant regressors. The coefficients of the precipitation regressor (wp) increased by approximately 1%. As mentioned earlier, the regressor representing the fog condition (wf) became significant, and its estimates were higher than those of "wp" (1: −17.8%; 2: 12.5%).
4. In both the first model and the current one, the variables connected to road sign problems (irsa and irsm) were non-significant for prediction, but in the current model, the "itl" regressor was also non-significant. Again, the absence of road barriers had the highest coefficients: for response 1, the coefficient decreased by 1.7% (to −12.6%); for response 2, it increased by 1.2% (to 9.2%); and for response 3, it increased by 0.5% (to 3.3%). The lowest coefficients were obtained with the "absence of the pedestrian restraint system" regressor (iprs). However, the estimates were smaller in magnitude than in the previous model: for response 1, the coefficient increased by 1% (to −3.1%); for response 2, it decreased by 0.8% (to 2.4%); and for response 3, it decreased by 0.2% (to 0.7%).
5. As for human factors, the estimates were very close to the results of the previous model.
6. All the regressors that contained information about the vehicle category had significant coefficients. Accidents with bicycles had the lowest values (1: −4.6%; 2: 3.5%; 3: 1%) in this model, rather than light vehicles as in the previous model.
7. As for color factors, the estimates were very close to the results of the previous model.

"Collisions" Sample Analysis
The most frequent accident type was vehicle collision. It covered 14,948 observations and formed 40% of all accidents. This is why we estimated a separate regression model for this subsample. All the results are presented in Table 5. The asterisks depict the intervals of the p-value (*** p < 0.01, ** p < 0.05, * p < 0.1).
The results of the "collisions" subsample can be summarized as follows:
3. As seen from the infrastructure conditions, only the regressors connected with horizontal markings (ihm), surface distress (isd), and traffic lights (itl; for the first two response categories) were significant at the 95% confidence level. Surface distress had the greatest impact on the response (1: −17%; 2: 13.6%; 3: 3.4%).
4. Under human factors, all the factors were significant for the prediction of severity level, and the presence of pedestrians as road users in accidents was the most influential factor (1: −18.6%; 2: 14.7%; 3: 3.9%). The driver's gender (hdg) coefficients were the lowest (1: 1.9%; 2: −1.7%; 3: −0.3%). These estimates show that male drivers increased the probability of serious injuries after road accidents by 1.7% and fatal outcomes by 0.3%, while female drivers increased the probability of slight injuries by 1.9%.
The color category shows that white-colored (cw) vehicles significantly (at the 95% confidence level) influenced the outcome probabilities: they increased the probability of slight injuries by 1.8% and decreased the probabilities of severe injuries by 1.6% and fatal outcomes by 0.3%.

"Pedestrian" Sample Analysis
Vehicle-pedestrian collisions were the second most frequent accident type in the whole sample. They covered 13,888 observations and formed 37% of the full sample. The results of the regression analysis of the "pedestrian" subsample are presented in Table 6.
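The subsample shares quoted for the two most frequent accident types follow directly from the observation counts given in the text, as this short arithmetic check shows (the variable names are illustrative).

```python
TOTAL_OBSERVATIONS = 37585  # full sample size reported in the text

counts = {
    "vehicle collisions": 14948,
    "vehicle-pedestrian collisions": 13888,
}

# Share of the full sample, rounded to whole percent as in the text.
shares = {name: round(100 * n / TOTAL_OBSERVATIONS) for name, n in counts.items()}
```

The rounded shares are 40% and 37%, matching the figures stated for the "collisions" and "pedestrian" subsamples.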
The results of the "pedestrian" subsample can be summarized as follows:
1. All the regressors under the "light" category, except for the twilight condition (lt), had a significant impact on severity level: night time with missing illumination (lim), night time with illumination on (li), and night time with illumination off (lio). The "lim" regressor had the highest coefficients, as in all our models (1: −29.5%; 2: 15.6%; 3: 13.9%). The "li" regressor had the smallest effect on severity level, as in the "collisions" model (1: −4.5%; 2: 3.3%; 3: 1.2%).
3. As seen from the infrastructure conditions, only the regressors connected with the absence of a pedestrian restraint system (iprs) and road barriers (irba) were significant at the 95% confidence level. The absence of road barriers had the greatest impact on the response (1: −13.1%; 2: 8.7%; 3: 4.4%).
4. Under human factors, all factors were significant for the prediction of severity level. The driver's gender (hdg) coefficients were the greatest (1: 9%; 2: −6.9%; 3: −2.1%). These estimates show that male drivers increased the probability of serious injuries after road accidents by 6.9% and fatal outcomes by 2.1%, while female drivers increased the probability of slight injuries by 9%. The pedestrian's gender (hpg) coefficients were the lowest but nevertheless significant (1: 2.5%; 2: −1.8%; 3: −0.6%). These estimates show that male pedestrians increased the probability of serious injuries after road accidents by 1.8% and fatal outcomes by 0.6%, while female pedestrians increased the probability of slight injuries by 2.5%.

Discussion
The results of the current research allow us to conclude the following.
Lighting conditions: As shown by Azhar et al. [20], Chen et al. [14,15], and Pillajo-Quijia et al. [27], the time of day significantly influences the severity level of road crashes. Their studies highlighted the significant impact of daytime on decreased crash severity. Azhar et al. [20] and Chen et al. [14] observed that illumination decreased crash fatality. The current research confirms the influence of daytime on decreasing crash severity. In all the models, missing illumination at night time had a positive relationship, with the highest coefficients, with severe injuries and fatal outcomes of crashes. This factor increased crash fatality by 10.3-13.9%: the highest impact was observed in crashes with pedestrians (13.9%) and the lowest in the model of vehicle collisions (10.3%). It increased the probability of severe injuries by 15.6-23.7%; here, conversely, the highest coefficient was obtained by the model of vehicle collisions (23.7%) and the lowest by the "pedestrian" sample model (15.6%). The probability of slight injuries decreased by 28.7-34%: the highest impact was obtained by the model of vehicle collisions (−34%) and the lowest by the full sample model (−28.7%), whereas the "pedestrian" sample model showed −29.5%. Therefore, it can be concluded that missing road illumination, which is also a road infrastructure shortcoming, had its largest effect on decreasing the probability of slight injuries and its smallest on fatality. Moreover, its effect on fatality was highest in "pedestrian" crashes and lowest in vehicle collisions, whereas its effect on severe injuries was, conversely, highest in vehicle collisions and lowest in "pedestrian" crashes.
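The ranges quoted for missing illumination can be reproduced from the per-model marginal effects reported in the results sections. The sketch below collects those values (in percent) and finds, for each severity category, the smallest and largest effect in absolute terms; the model labels and the helper name `effect_range` are illustrative.

```python
# Marginal effects of missing illumination (lim) as (slight, severe, fatal), in percent.
lim_effects = {
    "full sample": (-28.7, 18.0, 10.7),
    "full sample without accident types": (-31.5, 18.9, 12.6),
    "collisions": (-34.0, 23.7, 10.3),
    "pedestrian": (-29.5, 15.6, 13.9),
}
CATEGORIES = ("slight", "severe", "fatal")

def effect_range(effects, index):
    """Return ((smallest |effect|, model), (largest |effect|, model)) for one category."""
    by_size = sorted(effects, key=lambda model: abs(effects[model][index]))
    lowest, highest = by_size[0], by_size[-1]
    return (abs(effects[lowest][index]), lowest), (abs(effects[highest][index]), highest)

ranges = {name: effect_range(lim_effects, i) for i, name in enumerate(CATEGORIES)}
```

The resulting ranges are 10.3-13.9% for fatality, 15.6-23.7% for severe injuries, and 28.7-34% for slight injuries, with the extremes falling in the same models as described in the paragraph above.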
Weather conditions: As presented by Park et al. [17], Chen et al. [14], Pillajo-Quijia et al. [27], and Azhar et al. [20], precipitation significantly worsens crash severity. This is also confirmed by the current results. The "precipitation" factor had a significant impact on crash severity in all the observed models. It increased fatality by 1.3-1.7%: the lowest coefficient was obtained in the full sample and vehicle collisions models (1.3%) and the highest in the "pedestrian" sample model (1.7%). Severe injuries increased by 4.2-6.6%; here, conversely, the lowest coefficient was obtained in the "pedestrian" sample model (4.2%) and the highest in the vehicle collisions model (6.6%). Slight injuries decreased by 5.8-7.9%: the lowest impact was obtained in the "pedestrian" sample and full sample models (−5.8%) and the highest in the vehicle collisions model (−7.9%). Additionally, the results show a significant impact of fog in the "full sample without accident types" model; its estimates were even higher than those of the "precipitation" factor: fog decreased the probability of a slight outcome by 17.8% and increased that of a severe outcome by 12.5%. In the "collisions" and "pedestrian" models, fog increased the probability of a severe outcome by 21.4% and 14.8%, respectively. Additionally, hurricanes significantly decreased accidents with slight injuries (50.7%) and increased ones with severe injuries (24.7%) in the cases of vehicle-pedestrian collisions.
Human factors: As observed by Park et al. [17], Onieva-García et al. [18], Chen et al. [15], Islam and Mannering [16], and Azhar et al. [20], the gender and age of road users have a significant impact on severity. In our case, all observed human factors significantly influenced severity in all models. A higher participant count was associated with a higher severity level and crash fatality. According to the full sample model, each additional participant increased fatality by 0.8% and severe injuries by 3.1%, whereas in the "pedestrian" model, the effect on fatality was twice as high (1.6%), and the effect on severe injuries was also higher (4.7%). The "collisions" model shows results that are quite similar to the full sample analysis: the participant count increased fatality by 0.7% and severe injuries by 4.1%. The analysis of gender differences shows that female road users (pedestrians and drivers) increased the probability of slight injuries as a crash outcome, whereas male road users increased the probability of fatality and more severe injuries. This is consistent with the studies of Islam and Mannering [16] and Azhar et al. [20].
Road accidents: As suggested by Wang et al. [19], Billah et al. [13], and Azhar et al. [20], rollover and run-off-roadway accident types have the highest impact. Our research confirms this. The highest impact was produced by the run-off-road accident type, collisions with obstacles, rollover accidents, and vehicle-pedestrian collisions. The run-off-road type increased the probability of crash fatality by 11.2% and of severe injuries by 18.1%, and decreased the probability of slight injuries by 29.3%. Collisions with obstacles increased fatality by 9.4% and severe injuries by 17.7%, whereas rollover accidents increased fatality by 5.5% and severe injuries by 12.9%. Vehicle-pedestrian collisions also increased fatality and severe injuries, but by smaller amounts (2% and 7.5%, respectively).
Vehicle category: As concluded by Zeng et al. [11], Billah et al. [13], and Azhar et al. [20], motorcycles are the most dangerous vehicle category in road crashes. Our research also confirms this statement: the motorcycle category had the highest coefficients in all models (full sample model: slight: −20.1%; severe: 14.1%; fatal: 6%), which may be related to the construction of this vehicle category, which is dangerous for the rider. The exception is the "pedestrian" model, which identified public transport as the most dangerous vehicle category for vehicle-pedestrian accidents: it increased fatality by 15.3% and severe injuries by 15.8%, which may be related to the frequency of this vehicle category in vehicle-pedestrian collisions. In that model, accidents with motorcycles had the lowest significant estimate, and only for severe injuries (10.7%), contrary to the other models' results. Accidents with light vehicles were significant but had the lowest coefficients in all observed models.
Vehicle color: According to Newstead and D'Elia [26] and Eustace et al. [36], white is the safest color for cars because of its visibility on the road. Newstead and D'Elia [26] found a significant relationship between white color and decreased crash severity; however, the study of Eustace et al. [36] did not show any significant results. According to our research, white color significantly decreases the severity level in vehicle collisions: the probability of a fatal outcome decreases by 0.3%, that of severe injuries decreases by 1.6%, and that of slight injuries increases by 1.8%. However, in the model with vehicle-pedestrian collisions, white color conversely increases the severity level (slight: −5%; severe: 3.6%; fatal: 1.3%).
This means that the white color can be better seen by drivers, but it is not useful when there is a collision between a vehicle and a pedestrian. In this sample, red, blue, and green colors also significantly influenced crash severity. The full sample model depicts only blue and green colors as factors increasing fatality (0.6% for both colors) and severity (blue: 2%, green: 2.2%). It can be concluded that bright colors can influence the visibility of cars on the road, as well as the behavior of road users in terms of psychology, but this finding should be tested with a more comprehensive analysis.

Conclusions
In this study, we analyzed the accident data of St. Petersburg. The dataset covered the years 2015-2021 and consisted of 37,585 observations. We considered different conditions that influence crash severity, such as lighting, weather, road infrastructure, human factors, vehicle category, and vehicle color. The results show that all these kinds of conditions have an impact on crash severity. The main contribution of the study is to provide information about the causes of changes in the distribution of crash consequences by severity level (slight, severe, and fatal). The analysis was performed using ordered probit regressions on four samples: the full sample with all examined variables, the full sample without accident types, the "collisions" sample with vehicle collisions, and the "pedestrian" sample with vehicle-pedestrian collisions.
The paper's results allow us to summarize the following. Of the 37,585 accidents, slight outcomes accounted for 61% and fatal outcomes for only 4%. However, as mentioned earlier, the government is focused on decreasing crashes and their outcomes according to the national project [12].
The main results of the study are as follows:
• Missing road illumination can be considered a shortcoming of road infrastructure, and it had the highest impact on crash severity; it increased fatality by 10.3-13.9%.
• Precipitation is the main factor negatively influencing crash severity: it increases fatality (by 1.3-1.7%) and severe injuries (by 4.2-6.6%).
• The road infrastructure conditions with the most influence on severity level are the absence of road barriers, the absence of restraint systems for pedestrians at desired locations, defective traffic lights, and problems with horizontal road markings.
• The more participants in the crash, the higher the crash severity. Every crash participant decreased the probability of slight injuries by 3.9%.
• Female road users increase the probability of slight outcomes compared to male road users, who are associated with an increase in severe and fatal outcomes.
• Motorcycles are the most dangerous vehicle in road crashes; they increased fatality by 6% and severe injuries by 14.1%.
• Run-off-road, collision with obstacle, and rollover accidents are the most dangerous accident types; they increased fatality by 11.2%, 9.4%, and 5.5%, respectively.
• White-colored cars were associated with lower crash severity, but only in the cases of vehicle collisions.
• Bright vehicle colors such as blue and green can impact the severity, increasing the probability of fatal crashes.
The article considers the isolated factors' influence on road accidents in St. Petersburg; however, it should be mentioned that these factors mostly have a mutual impact on accident severity. Therefore, the current research focuses only on individual factor effects, and the results obtained are limited.
Nevertheless, the results show the need for St. Petersburg authorities to pay special attention to road transport infrastructure, namely, its shortcomings. The following shortcomings have the greatest impact on the increase in accident severity: absence of road illumination, absence of road barriers, absence of restraint systems for pedestrians at desired locations, defective traffic light, and problems with horizontal road markings.
Road infrastructure shortcomings, especially problems with illumination, are the most influential factors in increasing accident severity, although not all of them are equally influential. That is why, for cost optimization, it was necessary to determine the shortcomings that are most dangerous for road users. Addressing them would allow the government to decrease accident mortality and increase the share of accidents with slight consequences.
Additional results, such as those on weather and the contradictory results of the color analysis (different results for different accident types), can be a basis for further research that analyzes the interactions between factors and their mutual effect on accident severity.