Article

Modelling Road Work Zone Crashes’ Nature and Type of Person Involved Using Multinomial Logistic Regression

1 Department of Civil Engineering and Architecture, University of Beira Interior, 6200-358 Covilhã, Portugal
2 CERIS, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(3), 2674; https://doi.org/10.3390/su15032674
Submission received: 14 January 2023 / Revised: 27 January 2023 / Accepted: 30 January 2023 / Published: 2 February 2023
(This article belongs to the Special Issue Traffic Flow, Road Safety, and Sustainable Transportation)

Abstract

The United Nations and World Health Organization sustainable development goals “Good health and well-being” and “Sustainable cities and communities” alert governments and researchers to road safety problems and to the need to mitigate them. In Portugal, after the economic crisis of 2008–2013, a significant number of road assets require investment in maintenance and rehabilitation. The areas where these actions take place are called work zones. Considering the particularities of these areas, this work aims to identify the main factors that affect the occurrence of work zone crashes. It applies multinomial logistic regression to official data on road crashes that occurred in mainland Portugal during the period 2010–2015. Multinomial logistic regression models are usually developed for crash and injury severity. This work instead studied the feasibility of developing predictive models for crash nature (collision, run-off-road and running over pedestrians) and for type of person involved in the crash (driver, passenger and pedestrian), considering only one covariate (the number of persons involved in the crash). For the two predictive models obtained, the variables road environment (urban/rural), horizontal geometric design (straight/curve), pavement grip conditions (good/bad), heavy vehicle involvement, and injury severity (fatalities, serious and slight injuries) were identified as the preponderant factors in a universe of 230 investigated variables. The results point to an increase in work zone crash probability due to driver actions such as running straight and excessive speed for the prevailing conditions.

1. Introduction

1.1. Framework and Objectives

Road crashes are considered a public health problem worldwide [1,2,3,4,5]. According to the World Health Organization (WHO), in 2019 road injuries were the seventh leading cause of death in low-income countries and the tenth in lower-middle and upper-middle income countries [6]. This problem is also a concern addressed by the United Nations (UN). Its “2030 Agenda for Sustainable Development” [2] includes two sustainable development goals that are relevant in the context of road safety:
Goal 3—Ensure healthy lives and promote well-being for all at all ages: By 2020, halve global road fatalities and injuries.
Goal 11—Make cities and human settlements inclusive, safe, resilient, and sustainable: By 2030, provide access to safe, affordable, accessible, and sustainable transport systems for all, improving road safety notably by expanding public transport, with special attention to the needs of those in vulnerable situations (women, children, persons with disabilities and elders).
For most developed countries, where the planned national road networks are practically complete, the next step in the road management process involves the maintenance and rehabilitation of these networks. Therefore, an increase in the need for intervention on road networks is expected, along with an increase in the importance and study of issues related to work zone (WZ) safety. In these areas, when compared to non-work zone scenarios, a higher probability of fatal and injury crashes is expected, as well as an increase in crash frequency [7,8,9,10,11,12].
Considering the proven feasibility of using binary and probit logistic regression techniques with the information collected in the Portuguese police crash report [13], and the use of the multinomial logistic regression almost exclusively to establish crash and injury severity models, the main goal of the study is to verify the feasibility of identifying main factors that contribute to Portuguese WZ crashes. The multinomial logistic regression approach will be used in order to establish predictive models for crash nature (collision, run off road and running over pedestrians) and type of person involved (driver, passenger and pedestrian), considering one covariate (the number of persons involved in the crash).
The structure of the paper is organized into four sections. In the Introduction, the framework and scope of the work, as well as a literature review on road WZ crash data analysis using logistic regression or similar techniques, are presented. The modeling approach adopted in the multinomial logistic regression application is described in the Method section. The Models and discussion section is dedicated to the analysis of the Portuguese WZ crash data and discussion of results. Finally, the main findings, limitations and future work directions are summarized in Conclusions.

1.2. Literature Review

Taking into account the particularities of work zone areas, several authors have focused their research on identifying the factors that affect the occurrence of WZ crashes and the injury severity level, using logistic and probit regression approaches or other similar methodologies [9,10,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]. The research conducted and the main findings of 16 of these studies are described next.
Khattak et al. [14,15] considered the ordinal categorical nature of North Carolina (USA) work zone data when developing ordered probit models for the analysis of injury severity, as well as linear regression models for the analysis of total crash harm (the combined effect of both frequency and severity of injuries). The study sample focused on the WZ crashes that occurred in the year 2000 (3383 WZ crashes), especially those involving heavy vehicles. The study showed that collisions involving heavy vehicles happened more frequently when WZ activities were carried out with no traffic control device in place, and that multivehicle crashes involving this type of vehicle were more injurious when compared to non-work zone crashes.
The study carried out by Qi et al. [16] focused on 1994–2001 data on rear-end collisions that occurred in WZ located in the state of New York, USA. Ordered probit models were developed to analyze crash severity and to recommend measures for reducing the frequency of rear-end crashes. The study concluded that, at the time, the NYSDOT’s crash database system was the most comprehensive in use in the USA, especially regarding WZ data collection practices. Based on the model results, the authors suggested some actions that can be adopted to reduce WZ crash frequency, namely: the installation of bands that alert drivers to the proximity of the WZ through sound vibrations, speed displays, and the use of drone radar as a warning measure for law enforcement. It was also concluded that WZ crash frequency and severity are correlated with the construction or maintenance phase of the work, the population density and the territorial area occupied.
Li & Bai [17,18,25] resorted to statistical techniques such as Pearson’s chi-square, chi-square likelihood ratio probability and logistic regression, to study the characteristics of Kansas’s (USA) highway WZ fatal and injury crashes, between 1992 and 2004 (157 fatal crashes and 4443 injury crashes). The study focused on the role that human factors play in WZ crash occurrence, aiming to reduce the likelihood of their occurrence by promoting education campaigns, research, and the effective use of temporary traffic control devices. The study identified several human behaviors as contributing factors to WZ crashes, such as disrespect for the safety distance, traffic signs, road markings, speed limits, as well as driving under the influence of alcohol. The authors also found that for fatal crashes, head-on was the dominant type of crash, and that the involvement of trucks, unfavorable light conditions and complicated road geometries contributed to a larger percentage of crashes. On the other hand, most injury crashes involved light-duty vehicles only and rear-end was the dominant injury crash type.
Meng et al. [12] used Southeast Michigan (USA) work zone crash data to calibrate a probabilistic quantitative risk assessment model. The database contained 89 fatal, 10,142 injury and 35,036 property damage only crashes. The authors modeled the records of the crashes that occurred between 1999 and 2008 in long-term WZ (>3 days) using the Monte Carlo technique and statistical techniques such as the chi-square test. To obtain the casualty risk, the model combines an estimate of the WZ crash frequency from a regression model, an event tree for crash scenario determination, and consequence estimation models for the scenarios in the event tree. The probability of fatal or injured victims is measured by taking the individual risk for the driver and passenger categories. Societal risk is represented by the relationship between the frequency and the total number of casualties caused by vehicle crashes. The study showed that reducing speed was more effective in mitigating individual fatality and injury risks than reducing the emergency medical service response time.
In 2011, the analysis performed by Elghamrawy [19] included the available data and reports on work zone crashes that occurred in the state of Illinois (USA) during the period 1996–2005. Crash severity indices for serious injury, multi-vehicle, and multi-injury crashes were developed using the (ordered) logistic regression method. The research also carried out field experiments to evaluate the effectiveness of temporary rumble strips placed before and at the edge of work zones, investigating and quantifying their effect on the most inattentive drivers. The results showed that the WZ speed limit, type of barrier, length and the start time of the works were significant variables that directly influenced the occurrence of this type of crash. In addition, the combination of free-flow speed, compliance with the established speed limit and type of construction or rehabilitation activity influenced the crash type.
Yang et al. [20] proposed an approach for crash frequency estimation that involves the use of a measurement error (ME) model (applied to work zone length) integrated with the traditional negative binomial (NB) model. The aim was to identify risk factors in work zone safety evaluation. A dataset of 60 work zones in New Jersey (USA), for the period from 2004 to 2010, obtained from the NJDOT crash database, was used to estimate the models. Results suggest that both work zone length and traffic volume are positively associated with crash occurrence in WZ, and that variations in WZ speed can result in an increase of crashes.
Later, in a state-of-the-art review, Yang et al. [10] considered 82 WZ research papers, from January 1962 to July 2013, from the point of view of WZ crash data analysis and safety modelling. The main contributing factors identified in the selected studies were crash classification (total number of crashes that occurred during the construction period, divided by the distance traveled (km) by vehicles in the construction area), crash severity, posted speed limit, annual average daily traffic, driver and vehicle characteristics, type of occurrence, location, contributing circumstances, and actions and maneuvers preceding the crash. The authors also found that multicollinearity between variables is hardly addressed, and that variables and contributing factors are essentially selected through syntheses based on the information contained in statistical bulletins, instead of through an analytical approach to how such variables and their respective groupings should be selected. The analysis indicated that there are significant gaps in understanding the relationship between injury severity and possible risk factors, and that few studies have explored the actual level of injury recorded in hospital patient reports. Failure to report all WZ crashes and to collect detailed WZ information was also identified.
Osman et al. [21] undertook an extensive empirical analysis of the different types of injury severity models developed for WZ crashes involving large heavy vehicles. The authors used multinomial logit, nested logit, ordered logit, and generalized ordered logit techniques to develop WZ crash models. The database consisted of WZ crashes that occurred in the state of Minnesota (USA) between 2003 and 2012. The authors concluded that the generalized ordered logit model provided a superior data fit when compared to all the other models. The contributing factors that increased the risk of serious injuries in WZ crashes involving large heavy vehicles were crash occurrence during daylight, no access control, closure of one or more traffic lanes, failures in WZ transition areas, presence of workers on the road, higher speed limits, poor pavement grip conditions and crash occurrence on main arteries.
In 2018, the authors published a new study in which different types of WZ configurations adopted in the state of Minnesota were analyzed for crashes that occurred in the same period as the research published in 2016 (14,351 crashes involving at least one passenger car) [22]. The analysis employed ordered and unordered probit models to obtain an analytical injury severity model as a function of different WZ configurations (lane closure, lane change, intersections, roadside works and intermittent or moving activity, among others). It was found that 85% of fatal WZ victims were drivers and passengers of light vehicles, and that heavy vehicle involvement influenced the severity level of the injury. It was also identified that the adoption of temporary traffic control plans is the solution that best adapts to each type of activity. Partial access control, rural road environment, night-time, weekends, and horizontal curves were identified as factors that increase the probability of serious injuries. Across WZ configurations, significant differences were observed in the effects of the following factors: access control, number of lanes, road class and surface condition, speed limit, WZ area, presence of workers, time of day, number of vehicles involved, and truck involvement.
In Portugal, several studies have applied logistic and probit regression techniques to WZ crash data obtained from police crash reports [13,30,31]. The official 2013–2015 WZ crash data (1767 work zone crashes) was used in [13] to assess the feasibility of applying binary and probit logistic regression techniques to identify the main factors contributing to WZ crashes. The authors developed 11 binary and probit logistic regression models (for pedestrian, angle, rear-end and run-off-road crashes; for unexpected obstacle, excessive speed, disregard for vertical signs and disregard for safety distance; and for youth, adult and senior age groups), concluding that it was possible to identify risk factors and that the estimated coefficients and goodness-of-fit test values were very similar for both logit and probit link functions. The modeling results pointed to excessive speed, disregard for vertical signs, luminosity, intersections, and motorcycle and heavy vehicle involvement as the most significant risk factors. As mentioned by [10,22], the study also concluded that it is necessary to collect additional information to allow for more complex analyses and more representative results, and to determine a measure of crash exposure.
Finally, Ashqar et al. [26] identified and quantified the impact of risk factors that can lead to high-severity injury and/or fatal WZ crashes. For this purpose, the authors analyzed a set of 379 severe crashes that occurred in work zones along highway I-94 (state of Michigan, USA, 2016) using frequency analyses, logistic regression and the Random Forest machine learning algorithm. Mechanisms for the improvement of WZ traffic control, lighting conditions and public education programs aimed at high-risk drivers were some of the directions pointed out by the study results to reduce risk in those areas. According to the authors, when compared to logistic regression, Random Forest has the advantage of producing meaningful results even for a small sample size.
Table 1 and Table 2 present a summary of the methods, scope and variables adopted in the presented studies.
Most studies have been developed in the USA and the majority point out that crash data constitutes the raw material necessary to assist decision-making regarding the implementation of road safety measures in WZ. However, there are still challenges related to the collection, storage, integration of various sources of information and provision of complete databases that adequately support research and investment [10,13,26,32].
As can be seen in Table 2, the level of detail of the information varies across studies and can influence the statistical technique adopted in the analysis. Despite the studies’ different approaches, the data normally considered are those related to road environment conditions, such as luminosity and weather, road geometry, road environment (rural/urban), speed limit, number and type of vehicles involved in the crash, WZ traffic control device/type, heavy vehicle involvement and crash severity. These are also the data generally collected in police crash reports [13].
Studies are essentially focused on analyzing crash or injury severity, the involvement of a specific vehicle type (usually heavy vehicles), or a specific crash type, such as rear-end crashes. Regarding the methods adopted, logistic regression is often preferred, since it allows a better understanding of the risk factors and of how they can increase the probability of occurrence of the dependent variable under study. This is mainly due to the easy interpretation of odds ratios. However, machine learning techniques are increasingly used, especially to obtain meaningful results for small sample sizes.
To conclude, all the studies note that the main benefit of WZ crash studies is that they are fundamental for guiding intervention strategies and defining appropriate measures to minimize the occurrence of WZ crashes.

2. Method

The selection of the regression technique and the data processing operations were guided by the aim of investigating the possibility of modelling the nature of, and the type of person involved in, crashes that occurred in Portuguese road WZ. Data from police crash reports, made available by the Portuguese National Road Safety Authority, was used as input in the modelling process.

2.1. Multinomial Logistic Regression

Regression techniques can be defined as a set of statistical and probabilistic approaches, usually used for data analysis of different natures, from which the relationship between the behaviour of a given phenomenon and one or more potentially predictive variables is investigated [33,34]. This process may or may not result in a cause-and-effect relationship [35].
In the present study, the phenomena to be modelled are the crash nature, considering the categories collision (code 0), run-off-road (code 1) and running over pedestrians (code 2); and the type of person involved in the crash, with pedestrian (code 0), passenger (code 1) and driver (code 2) categories. These phenomena are translated by polychotomous nominal qualitative variables (they present more than two mutually exclusive categories) and, therefore, the regression technique to be adopted must be a categorical regression, in this case, a multinomial logistic regression [34,35]. Multinomial regression is defined as a generalized linear model used to estimate probabilities for the m categories of a qualitative dependent variable Y, using a set of explanatory variables X [34,35,36]. In multinomial regression, one of the categories of the dependent variable must be chosen as a reference, in order to compare it with the remaining categories considered. This choice may or may not be arbitrary, depending on the orientation of the study. However, it is relevant to note that this does not change the model, it only changes the way of interpreting the data [37,38].
Therefore, if the dependent variable that represents the phenomenon under study has 3 response categories (m = 3), as considered in this study, coded with ‘0’, ‘1’ and ‘2’, the number of equations of the estimated logit model will be (m − 1). To normalize the system relative to a category of the dependent variable, the logistic regression coefficient vector β of the response category taken as reference is set equal to 0 (for example β0 = 0). Thus, the probability P of the dependent variable Y taking the value of each of the 3 categories is given by expressions (1) to (3) [33], the odds of each of the categories of Y relative to the reference category 0 are given by expressions (4) and (5), and expressions (6) and (7) establish the logit model.
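$$P(Y=0 \mid X) = \frac{e^{X\beta_0}}{e^{X\beta_0} + e^{X\beta_1} + e^{X\beta_2}} = \frac{e^{0}}{e^{0} + e^{X\beta_1} + e^{X\beta_2}} = \frac{1}{1 + e^{X\beta_1} + e^{X\beta_2}} \tag{1}$$
$$P(Y=1 \mid X) = \frac{e^{X\beta_1}}{e^{X\beta_0} + e^{X\beta_1} + e^{X\beta_2}} = \frac{e^{X\beta_1}}{e^{0} + e^{X\beta_1} + e^{X\beta_2}} = \frac{e^{X\beta_1}}{1 + e^{X\beta_1} + e^{X\beta_2}} \tag{2}$$
$$P(Y=2 \mid X) = \frac{e^{X\beta_2}}{e^{X\beta_0} + e^{X\beta_1} + e^{X\beta_2}} = \frac{e^{X\beta_2}}{e^{0} + e^{X\beta_1} + e^{X\beta_2}} = \frac{e^{X\beta_2}}{1 + e^{X\beta_1} + e^{X\beta_2}} \tag{3}$$
$$\frac{P(Y=1 \mid X)}{P(Y=0 \mid X)} = e^{X\beta_1} \tag{4}$$
$$\frac{P(Y=2 \mid X)}{P(Y=0 \mid X)} = e^{X\beta_2} \tag{5}$$
$$\ln\frac{P(Y=1 \mid X)}{P(Y=0 \mid X)} = X\beta_1 \tag{6}$$
$$\ln\frac{P(Y=2 \mid X)}{P(Y=0 \mid X)} = X\beta_2 \tag{7}$$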
where:
  • Y is the qualitative dependent variable for a model where the response variable assumes m response categories (for this case m = 3, with codes 0, 1 and 2).
  • X is the matrix of independent variables (explanatory variables).
  • β0, β1 and β2 are the logistic regression coefficient vectors for response categories 0, 1 and 2, respectively (with β0 = 0 for the reference category).
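As an illustration of expressions (1) to (7), the following minimal Python sketch evaluates the three category probabilities for a single observation, with category 0 as reference. The function name, the explanatory variables and all coefficient values are hypothetical and chosen only for the example; they are not the coefficients estimated in this study.

```python
import math

def multinomial_probabilities(x, beta_1, beta_2):
    """Return [P(Y=0|x), P(Y=1|x), P(Y=2|x)] with category 0 as reference (beta_0 = 0)."""
    # Linear predictors X*beta_k; the first element of x is 1 for the intercept.
    eta_1 = sum(xi * b for xi, b in zip(x, beta_1))
    eta_2 = sum(xi * b for xi, b in zip(x, beta_2))
    # Common denominator of expressions (1) to (3): e^0 + e^(X*beta_1) + e^(X*beta_2).
    denom = 1.0 + math.exp(eta_1) + math.exp(eta_2)
    return [1.0 / denom, math.exp(eta_1) / denom, math.exp(eta_2) / denom]

# Hypothetical observation: intercept, number of persons involved, one dummy variable.
x = [1.0, 2.0, 1.0]
beta_1 = [-0.5, 0.3, 0.8]    # illustrative coefficients, category 1 vs. category 0
beta_2 = [-1.2, -0.1, 1.5]   # illustrative coefficients, category 2 vs. category 0

probs = multinomial_probabilities(x, beta_1, beta_2)
odds_1_vs_0 = probs[1] / probs[0]              # expression (4): equals exp(x . beta_1)
logit_2_vs_0 = math.log(probs[2] / probs[0])   # expression (7): equals x . beta_2
print(probs, odds_1_vs_0, logit_2_vs_0)
```

Because the reference category only fixes β0 = 0, choosing a different reference changes the coefficients that are reported but not the three probabilities.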
The model is fitted using the maximum likelihood method. The significance of the fitted model is assessed using the likelihood ratio test (LRT). The LRT compares the goodness of fit of two models based on the ratio of their likelihoods. In this case, the likelihood of the reduced model (intercept only) was compared with the likelihood of the full model (with all variables) [35]. The goodness-of-fit measures adopted were the Pseudo-R2 of Cox & Snell, Nagelkerke and McFadden. No strong guidance was found in the literature on how Pseudo-R2 measures should be interpreted [39,40]. These measures are more useful when comparing competing models obtained for the same data (the model with the largest pseudo-R2 statistic is the best). However, in Hensher and Stopher [41], McFadden himself states that a McFadden pseudo-R2 between 0.2 and 0.4 indicates a very good model fit.
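The quantities mentioned above can be obtained directly from the log-likelihoods of the intercept-only and full models. The sketch below uses the standard definitions of the likelihood ratio statistic and of the three pseudo-R2 measures; the function name and the numerical inputs are hypothetical and do not correspond to the models reported in this paper.

```python
import math

def fit_summary(ll_null, ll_full, n):
    """Likelihood ratio statistic and pseudo-R2 measures from two log-likelihoods.

    ll_null: log-likelihood of the intercept-only model
    ll_full: log-likelihood of the model with all selected variables
    n:       number of cases used in the estimation
    """
    # LRT statistic; compared against a chi-square distribution whose degrees of
    # freedom equal the difference in the number of estimated parameters.
    lrt = -2.0 * (ll_null - ll_full)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)
    # Nagelkerke rescales Cox & Snell by its maximum attainable value.
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    mcfadden = 1.0 - ll_full / ll_null
    return {"lrt": lrt, "cox_snell": cox_snell,
            "nagelkerke": nagelkerke, "mcfadden": mcfadden}

# Hypothetical log-likelihoods and sample size, for illustration only.
summary = fit_summary(ll_null=-1000.0, ll_full=-750.0, n=1000)
print(summary)
```

With these illustrative inputs the McFadden value falls inside the 0.2 to 0.4 range quoted above.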
The significance of the model coefficients was assessed with the Wald test. The area under the receiver operating characteristic (ROC) curve, which ranges from 0 to 1, was used to measure the model’s ability to discriminate whether a specific condition is present or not. According to Hosmer and Lemeshow [33], an area under the ROC curve (AUC) of 0.5 indicates that the model has no discriminant ability; for 0.5 < AUC < 0.7 the discrimination is weak; for 0.7 ≤ AUC < 0.8 it is acceptable; for 0.8 ≤ AUC < 0.9 it is good; and for AUC ≥ 0.9 the discrimination is exceptional.
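A minimal sketch of how the AUC can be computed and classified with the thresholds above is given next. It uses the rank interpretation of the AUC (the proportion of positive/negative pairs in which the positive case receives the higher predicted probability). The function names and the predicted probabilities are hypothetical, and treating values below 0.5 as "none" is an assumption, since that range is not covered by the quoted scale.

```python
def auc(scores_pos, scores_neg):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def discrimination_label(value):
    """Map an AUC value to the Hosmer and Lemeshow scale quoted in the text."""
    if value >= 0.9:
        return "exceptional"
    if value >= 0.8:
        return "good"
    if value >= 0.7:
        return "acceptable"
    if value > 0.5:
        return "weak"
    return "none"  # assumption: values at or below 0.5 are treated as no discrimination

# Hypothetical predicted probabilities for one category (present vs. not present).
predicted_when_present = [0.9, 0.8, 0.6, 0.55]
predicted_when_absent = [0.7, 0.5, 0.4, 0.3]

area = auc(predicted_when_present, predicted_when_absent)
label = discrimination_label(area)
print(area, label)
```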
Unlike what happens with other statistical data analysis procedures, in multinomial logistic regression, special attention is given to the sample size, especially for cases in which the explanatory variables present characteristics close to collinearity. This care is justified for very small samples with highly correlated variables, such as the number of fatal victims, since they can lead to erroneous or unreliable inferences based on the regression model obtained [26,32]. General guidelines recommend that maximum likelihood estimation, including logistic regression, must consider at least 100 cases. They also state that 500 cases are adequate in most applications, and that there should be at least 10 cases per predictor [42].
To diagnose correlation between explanatory variables, Tolerance and VIF (Variance Inflation Factor) were determined. According to Hoffman, Miles and Sze et al. [43,44,45], the closer the tolerance and VIF values are to 1, the lower the multicollinearity. Generally, VIF values above 4 or tolerance values below 0.25 indicate that multicollinearity may exist. To select variables with predictive power, the Forward Stepwise selection method was used. This method starts with a model that includes only the intercept and adds, at each step, the variable (predictor) that causes the largest statistically significant change in the −2 Log Likelihood of the model. The final (best) model is the one that includes only important predictors [33,35,46].
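The collinearity diagnostics can be illustrated with a short sketch. It relies on the known identity that the VIF of predictor j is the j-th diagonal element of the inverse of the predictors' correlation matrix, with tolerance equal to 1/VIF. The function names and the data are hypothetical; in the study itself these values were obtained with SPSS.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of the predictor columns (none may be constant)."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for small matrices)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif_and_tolerance(columns):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    vifs = [inverse[j][j] for j in range(len(columns))]
    return vifs, [1.0 / v for v in vifs]

# Hypothetical predictors: persons involved, urban dummy, vehicles involved.
persons = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0]
urban = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
vehicles = [1.0, 2.0, 3.0, 4.0, 4.0, 1.0, 2.0, 4.0]  # built to correlate with persons

vifs, tolerances = vif_and_tolerance([persons, urban, vehicles])
# Rule of thumb from the text: VIF above 4 or tolerance below 0.25.
flagged = [v > 4.0 or t < 0.25 for v, t in zip(vifs, tolerances)]
print(vifs, tolerances, flagged)
```

In this constructed example the two deliberately correlated predictors are flagged, while the dummy variable is not.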
The procedure described and adopted in the analysis was performed using the software IBM SPSS®—Statistical Package for the Social Sciences, version 25.0 [47].

2.2. Data Processing

Efficient safety management needs a significant amount of data regarding the conditions under which crashes occur. The quantity and quality of the recorded data is essential to identify crash contributing factors and to properly diagnose safety conditions, allowing the detection of current and future safety needs, the definition of safety measures and the prediction of strategies’ effects over time.
To allow a valid and meaningful analysis of the road WZ crash phenomenon, the preparation of information started with the processing and organization of the 2010–2015 data provided by the Portuguese National Road Safety Authority (ANSR), originally spread over 3 related databases, into a single database of WZ crashes that occurred in mainland Portugal.
In the Portuguese Police Crash Report [48], WZ crashes are recorded in section “B4: Pavement”. This section consists of four fields that describe the pavement type (B4.1), its state of conservation (B4.2), the presence of obstacles on the pavement or roadworks (B4.3), and the pavement grip condition (B4.4). WZ crashes are therefore reported together with crashes due to obstacles on the pavement, by selecting one of the following B4.3 filling options: ‘Non-existent’, ‘Unsigned’, ‘Insufficiently signed’, and ‘Correctly signed’.
For the present study, ANSR provided three distinct databases for crashes with at least one victim and field B4.3 coded as ‘Unsigned’, ‘Insufficiently signed’, ‘Correctly signed’ and ‘Undefined’: a database with information about vehicles and drivers involved in the crash, also including general information (location, date/time, nature, road environment conditions and characteristics); a database with information about injured passengers; and a database with information on injured pedestrians. The databases were related to each other through the crash identification number.
Decisions taken throughout the data preparation process were based on the recommendations and procedures adopted by [18,20,22,49,50,51]. In these studies, authors concluded that a significant part of WZ crashes is not reported, and that among those that are reported, some exhibit incomplete or incorrect crash report completion. The analysis of the Portuguese data showed that there is a significant percentage of reports with no information in the field B4.3, which does not allow concluding whether the crash occurred in a WZ or not. This fact can distort the real WZ crashes scenario, generating ambivalence in the validation of parameters, variables and models, which may result in erroneous conclusions. However, over the years, there has been a progressive decrease of non-completion cases, which makes it possible to envisage more representative future analyses (38% in 2010, 24% in 2011, 11% in 2012, 10% in 2013, 9% in 2014 and 8% in 2015) [13,31,52].
Considering that a road crash can involve one or more drivers, passengers, or pedestrians, just as it can involve one or more vehicles, a new variable was prepared to connect the three original databases. This new variable was the type of person involved in the crash: driver, passenger, and pedestrian. Based on the crash identification number, a new quantitative variable to represent the number of persons involved in the crash was also created. In accordance with this new quantitative variable, crashes with 6 or more persons involved were excluded from the database (1 crash with 44 persons involved, 1 with 13, 1 with 11, 4 with 9, 2 with 8, 11 with 7, and 23 with 6), since the minimum sample size conditions to fit the assumptions of multinomial logistic regression were not verified.
When processing the information, it was possible to verify that about 31% of a total of 13,109 records of persons involved in WZ crashes (4053 records) presented B4.3 coded as ‘Undefined’. These data were not considered, as the information recorded does not make it possible to state whether the crash occurred in a WZ area or due to the presence of obstacles on the pavement. Regarding records with other inadequate information completion, which resulted in the exclusion of another 2004 persons involved in WZ crashes, the following elimination criteria were considered:
(a)
Records coded as ‘Undefined’ in report fields A3: Crash type (with property damage only, fatalities, serious or minor injuries); A4: Crash nature (run-off-road, collision or running over pedestrians); A6: Gender and/or Date of birth; B2.1: Road horizontal geometry (straight or curved); B2.2: Road vertical geometry (level, slope or hump); B2.4: Road crash location (on the track, side, sidewalk, dedicated track or parking lot); B2.5: Road intersection (outside the intersection, at a level or uneven intersection); B4.1: Pavement type (unpaved, bituminous, concrete or stone); B4.2: Pavement condition (good, regular or bad); B4.4: Pavement grip condition (dry and clean, humid, wet, etc.); and B5.1: Markings on the pavement (no marks or barely visible, with markings separating the direction of traffic or with markings separating the direction and lanes of traffic).
(b)
Records without location data (no information in field A2).
(c)
Records of drivers under the age allowed by law to drive (field A6).
(d)
Records with no information about the year of license to drive (field E1.3).
In summary, taking into account the above, the database was initially organized based on the crash identification number and then it was reorganized by the type of person involved in the crash (new variable), which resulted in a single line of information per person involved in the WZ crash.
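The reorganization described above can be sketched in a few lines of Python. The sketch stacks three record lists into one row per person, derives the number of persons per crash from the crash identification number, and applies only two of the exclusions mentioned (B4.3 coded as ‘Undefined’ and crashes with 6 or more persons). The field names `crash_id` and `b4_3`, the function name and the sample records are hypothetical, and the sketch assumes that the B4.3 code is available in the driver records; the remaining elimination criteria are omitted.

```python
from collections import Counter

def build_person_table(drivers, passengers, pedestrians, max_persons=5):
    """Return one row per person involved, tagged with person type and persons per crash."""
    # Assumption: general crash information (here only B4.3) comes with the driver records.
    b4_3_by_crash = {rec["crash_id"]: rec["b4_3"] for rec in drivers}

    persons = []
    for person_type, records in (("driver", drivers),
                                 ("passenger", passengers),
                                 ("pedestrian", pedestrians)):
        for rec in records:
            row = dict(rec)
            row["person_type"] = person_type  # new variable linking the three databases
            persons.append(row)

    # New quantitative variable: number of persons involved in each crash.
    counts = Counter(row["crash_id"] for row in persons)

    kept = []
    for row in persons:
        row["n_persons"] = counts[row["crash_id"]]
        if row["n_persons"] > max_persons:
            continue  # crashes with 6 or more persons involved are excluded
        if b4_3_by_crash.get(row["crash_id"]) == "Undefined":
            continue  # cannot tell whether the crash occurred in a work zone
        kept.append(row)
    return kept

# Hypothetical records, for illustration only.
drivers = [
    {"crash_id": 1, "b4_3": "Correctly signed"},
    {"crash_id": 1, "b4_3": "Correctly signed"},
    {"crash_id": 2, "b4_3": "Undefined"},
    {"crash_id": 3, "b4_3": "Unsigned"},
    {"crash_id": 4, "b4_3": "Insufficiently signed"},
]
passengers = [{"crash_id": 1}] + [{"crash_id": 3} for _ in range(5)]
pedestrians = [{"crash_id": 2}, {"crash_id": 4}]

table = build_person_table(drivers, passengers, pedestrians)
type_counts = Counter(row["person_type"] for row in table)
print(len(table), dict(type_counts))
```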
The procedure described resulted in a final database with 7052 persons involved (drivers, passengers, and pedestrians), corresponding to a total of 3504 WZ crashes. Table 3 presents the number of records with B4.3 field ‘Undefined’ and the number of WZ crashes and type of person involved. This database was used in the IBM SPSS® program to perform the following statistical analysis: an exploratory analysis to identify the main characteristics related to the variables considered in the study, and the development of multinomial logistic models for crash nature (collision, run-off-road and run over pedestrians) and type of person involved in the crash (driver, passenger, and pedestrian).

3. Models and Discussion

Based on the data provided by ANSR, whose treatment resulted in a single database organized by type of person involved in WZ crashes, an exploratory analysis, the selection of significant variables, and the statistical tests recommended for multinomial logistic regression were performed.
In total, 230 predictive nominal qualitative variables were studied and converted into dummy variables, based on the results obtained from the exploratory analysis. The quantitative predictive variable considered in the modelling process as a covariate was the number of people involved in the crash, with five categories: 1, 2, 3, 4 and 5. The outcomes of the collinearity analysis and forward stepwise regression performed for the predictive variables resulted in a set of significant explanatory variables to be considered in the models. For the crash nature model, 10 variables were selected, and for the type of the person involved, 7. These variables are in line with those identified in similar studies [10,12,13,14,15,16,17,18,19,20,21,22,25,26,30,31,32].
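As an illustration of the dummy coding mentioned above, the sketch below maps a nominal variable to 0/1 indicator columns, leaving the reference category coded as all zeros. The function name `dummy_code` and the sample values are assumptions made for this example; the study itself used IBM SPSS.

```python
def dummy_code(values, reference):
    """Map each non-reference category to a 0/1 indicator column."""
    categories = sorted(set(values) - {reference})
    return {
        category: [1 if v == category else 0 for v in values]
        for category in categories
    }


# Hypothetical sample of a road environment field, with urban as reference.
environment = ["urban", "rural", "urban", "rural", "rural"]
dummies = dummy_code(environment, reference="urban")
```

A variable with k categories yields k - 1 indicator columns, which is why a two-category variable such as road environment appears as a single predictor in the models.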
In the modelling process, the reference category ‘collision’ was considered for the crash nature model (most common type of WZ crash identified in several studies [10,16,17,18,19,25,26] and in the database), and ‘pedestrian’ for the type of person involved model (type of person involved presenting more severe injuries in the database). The explanatory variables selection and multinomial modelling results are presented and discussed in Section 3.1 and Section 3.2.

3.1. Model 1: WZ Crash Nature

3.1.1. Results

Table 4 provides a summary of the cases processed in the analysis. It presents: the number of elements (N) in each of the explanatory variable categories (5 categories for quantitative variable X1 and dummy code for qualitative variables X2 to X11); the corresponding percentage values; the valid, missing, and total values; and two collinearity diagnostic factors, tolerance and VIF.
A summary of the model fitting criteria and of the effect selection tests obtained in the forward stepwise regression, considering a significance level of 0.05, is presented in Table 5. Table 6 presents the information regarding the model fit to the data, and Table 7 presents the likelihood ratio tests results and the model fitting criterion for each of the explanatory variables (predictors).
The estimates of the model parameters, the Wald test and the odds ratios Exp(β) are presented in Table 8. Finally, the equations of the probabilistic models obtained from the information presented in Table 8 are used to calculate the probability that a person involved in a WZ crash belongs to a given crash nature category (each person involved is classified in the category with the highest probability). The results obtained in the classification are shown in Table 9.
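The classification rule described above follows from the standard multinomial logit form, in which the reference category has a linear predictor of zero. A minimal sketch, assuming hypothetical linear predictor values rather than the Table 8 estimates:

```python
import math


def category_probabilities(linear_predictors, reference):
    """Multinomial logit probabilities.

    linear_predictors maps each non-reference category to
    eta_j = b0_j + sum_k(b_jk * x_k); the reference category has eta = 0.
    """
    exp_eta = {cat: math.exp(eta) for cat, eta in linear_predictors.items()}
    denominator = 1.0 + sum(exp_eta.values())
    probabilities = {cat: value / denominator for cat, value in exp_eta.items()}
    probabilities[reference] = 1.0 / denominator
    return probabilities


def classify(probabilities):
    """Assign the category with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)


# Hypothetical linear predictors for one person (not values from the paper).
etas = {"run-off-road": -0.4, "run over pedestrian": -1.5}
probs = category_probabilities(etas, reference="collision")
predicted = classify(probs)
```

Because both hypothetical predictors are negative, the reference category receives the largest probability and the person is classified as a collision case.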

3.1.2. Analysis of Results

In the WZ crash nature model, the probability of a subject belonging to Y0 (collision), Y1 (run-off-road) and Y2 (run over pedestrian) was determined through a set of 10 predictor variables (X2 to X11 in Table 4). The tolerance and VIF test results showed that the selected variables were not influenced or inflated by their correlation with the other considered variables, meaning that no multicollinearity was identified. The results of the stepwise analysis using the forward entry method (see Table 5) also confirm the relevance of the selected variables. The successive reduction of AIC and BIC values, in line with the likelihood ratio tests, indicates that the model improves with the entry of each variable. Also, since the chi-square p-value is less than 0.05, it is possible to conclude that the model fits the data significantly.
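The two collinearity diagnostics are related by VIF = 1 / tolerance, where tolerance = 1 - R² and R² comes from regressing one predictor on the others. The sketch below covers only the simplified two-predictor case, in which that R² equals the squared Pearson correlation; the dummy columns are hypothetical and are not data from the study.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def tolerance_and_vif(x, other):
    """Two-predictor case: R^2 of x regressed on `other` equals r^2."""
    r_squared = pearson_r(x, other) ** 2
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance


# Hypothetical 0/1 dummy columns (illustrative only).
rural = [0, 1, 0, 1, 1, 0, 1, 0]
curve = [0, 1, 0, 0, 1, 0, 0, 1]
tol, vif = tolerance_and_vif(rural, curve)
```

A tolerance close to 1 (VIF close to 1) means the predictor shares little variance with the other one; with more than two predictors, the R² would have to come from a multiple regression instead.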
The model fitting information presented in Table 6 makes it possible to conclude that there is at least one independent variable that significantly influences the nature of the WZ crash (p-value ≤ 0.001). The values of AIC, BIC and the likelihood ratio tests indicate that the final complete model (with all variables) has a better fit than the reduced model (intercept only). There is no strong guidance on how the pseudo-R² of Cox and Snell, Nagelkerke and McFadden should be interpreted; however, the obtained value of 0.321 for McFadden indicates a good fit of the model. The likelihood ratio test results obtained for the independent variables (Table 7), for α = 0.05, show that all the variables considered significantly affect the WZ crash nature (p ≤ 0.001). The best model, i.e., the one with the lowest AIC and BIC, is the model that incorporates all variables.
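For reference, the commonly used textbook definitions of the fit measures mentioned above are given below. The notation is an assumption of this note, not the paper's: L_0 and L_1 are the maximized likelihoods of the intercept-only and final models, k is the number of estimated parameters and n is the number of observations. The exact constants reported by a given software package may differ.

```latex
\begin{align}
\mathrm{AIC} &= -2\ln L_1 + 2k, \\
\mathrm{BIC} &= -2\ln L_1 + k\ln n, \\
G^2 &= -2\left(\ln L_0 - \ln L_1\right), \\
R^2_{\text{McFadden}} &= 1 - \frac{\ln L_1}{\ln L_0}, \\
R^2_{\text{Cox--Snell}} &= 1 - \left(\frac{L_0}{L_1}\right)^{2/n}, \\
R^2_{\text{Nagelkerke}} &= \frac{R^2_{\text{Cox--Snell}}}{1 - L_0^{2/n}}.
\end{align}
```

Here G² is the likelihood ratio statistic, compared against a chi-square distribution with degrees of freedom equal to the number of parameters added to the intercept-only model.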
From the Wald test results (Table 8), for the run-off-road category, it is concluded that only variable X8 (passenger car) is not significant. Thus, it is possible to state that the remaining independent variables allow distinguishing the probabilities of the run-off-road category versus the collision category, and that all the variables allow distinguishing the probabilities of the run over pedestrian versus the collision category. For the run-off-road category, variables X1, X2, X3, X4, X9, X10 and X11 do not increase the probability of being involved in a run-off-road WZ crash (26.2% of persons involved) relative to a collision (60.2% of persons involved). On the other hand, when compared to a collision, variables X5, X6 and X7 increase the probability of being involved in a run-off-road WZ crash. Concerning the run over pedestrian category, variables X1, X2, X3, X5, X6, X7, X8 and X10 do not increase the probability of being involved in a run over pedestrian WZ crash (13.6% of persons involved) relative to a collision. However, X4, X9 and X11 considerably increase the probability of being involved in a run over pedestrian WZ crash.
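The significance decision for an individual coefficient can be illustrated with the usual single-parameter Wald statistic, W = (β / SE)², compared against a chi-square distribution with one degree of freedom. The sketch below is an assumption about the standard form of the test, not a reproduction of the SPSS output, and the coefficient and standard error are hypothetical.

```python
import math


def wald_test(beta, std_error):
    """Wald chi-square (1 df) and its p-value for a single coefficient."""
    wald = (beta / std_error) ** 2
    # For 1 df, P(chi2 > W) = P(|Z| > sqrt(W)) = erfc(sqrt(W / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value


# Hypothetical estimate and standard error (not taken from Table 8).
wald, p = wald_test(beta=0.5, std_error=0.2)
significant = p < 0.05
```

A coefficient whose p-value exceeds 0.05, as reported for X8 in the run-off-road category, does not help to separate that category from the reference category.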
As shown in Table 9, the predicted model correctly classifies 73.2% of the cases, representing a rate of correct classifications that is 1.63 times higher than the one obtained by chance (0.602² + 0.262² + 0.136² = 0.449 = 44.9%), demonstrating the usefulness of the model for classifying new observations. The predictive power of the model can also be evaluated using the ROC curves. Figure 1 presents the ROC curves for the estimated cell probability in the collision, run-off-road and run over pedestrian response categories. Graph (a) shows an area under the ROC curve of 0.811, which indicates a good discriminating ability for this category. Graphs (b) and (c), with areas under the ROC curve of 0.735 and 0.602, indicate, respectively, an acceptable and a poor discriminating ability for the run-off-road and run over pedestrian categories.
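The proportional by-chance rate used above is the sum of the squared category shares. A short sketch of the arithmetic, using the shares and the classification rate quoted in the text (the function name is an assumption of this example):

```python
def proportional_chance_accuracy(shares):
    """Expected accuracy when cases are assigned at random in proportion to category size."""
    return sum(share ** 2 for share in shares)


# Shares of collision, run-off-road and run over pedestrian quoted in the text.
shares = [0.602, 0.262, 0.136]
chance = proportional_chance_accuracy(shares)
improvement = 0.732 / chance  # observed rate of correct classifications / chance rate
```

With these inputs the chance rate is approximately 0.4495 and the ratio is approximately 1.63, in line with the figures reported above.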

3.2. Model 2: Type of Person Involved in WZ Crashes

3.2.1. Results

Similar to Section 3.1.1, Table 10, Table 11, Table 12, Table 13, Table 14 and Table 15 present the results obtained for Model 2: type of person involved in WZ crashes.

3.2.2. Analysis of Results

In the type of person involved in WZ crashes model, the probability of a subject belonging to Y0 (pedestrian), Y1 (passenger) and Y2 (driver) was identified through a set of 7 predictor variables (X2 to X8). Multicollinearity was not identified and the results of the stepwise analysis using the forward entry method confirmed the relevance of the 7 variables considered (see Table 10 and Table 11). The successive reduction of AIC and BIC values, in line with the likelihood ratio tests, makes it possible to state that the model improves with the entry of each variable. It is also possible to conclude that the model fits the data significantly, since the chi-square p-value is less than 0.05.
Regarding the model fitting information (Table 12), it is possible to conclude that there is at least one independent variable that significantly influences the type of person involved in WZ crashes (p-value ≤ 0.001). The values of AIC, BIC and the likelihood ratio tests indicate that the final complete model (with all independent variables) represents a better fit than the reduced model (intercept only). As previously mentioned, there is no strong guidance on how the pseudo-R² of Cox and Snell, Nagelkerke and McFadden should be interpreted. Still, for the McFadden pseudo-R², according to [41], the value of 0.126 does not indicate a good fit of the model. For the independent variables and α = 0.05, the likelihood ratio test results in Table 13 show that all the variables considered significantly affect the type of person involved in WZ crashes (p ≤ 0.001). The best model is the model with all variables (lowest AIC and BIC).
According to the Wald test results (Table 14), all the variables allow distinguishing the probabilities of the passenger category versus the pedestrian reference category. For the driver category, only variable X1 (number of persons involved in the crash) is not significant. The remaining independent variables make it possible to distinguish the probabilities of the driver category versus the pedestrian category. Although the statistical criteria are met, the results obtained for the model must be interpreted carefully, since the reference category (pedestrian) presented a low percentage of correct classifications, namely 4.8% (see Table 15). Having pedestrian (7.7% of persons involved) as the reference category, all variables for the passenger category increase the probability of being involved in a WZ crash as a passenger (17.4% of persons involved). For the driver category, variables X2 to X8 increase the probability of being involved as a driver (74.9% of persons involved).
The predicted model correctly classifies 74.6% of the cases (Table 15), which represents a rate of correct classifications 1.25 times higher than the one obtained by chance (0.077² + 0.174² + 0.749² = 0.597 = 59.7%). The predictive power of the final model was also evaluated using ROC curves. Figure 2 presents the ROC curves for the estimated cell probability in the pedestrian, passenger, and driver response categories. Graph (a) shows an area under the ROC curve of 0.741, which means that there is acceptable discrimination of subjects for this category, and graphs (b) and (c) show areas under the ROC curve of 0.641 and 0.541, meaning poor subject discrimination for these categories.
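The area under a ROC curve for one response category can be read as the probability that a randomly chosen member of that category receives a higher predicted probability than a randomly chosen non-member. The sketch below computes it by pairwise comparison for a one-versus-rest setting; the scores and labels are hypothetical, and this is not necessarily how SPSS computes the reported areas.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (ties count as 1/2)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities for one category and 0/1 membership.
scores = [0.9, 0.8, 0.4, 0.35, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
area = auc(scores, labels)
```

An area of 0.5 corresponds to no discrimination, which is why values such as 0.541 are read as poor and values above 0.8 as good.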

3.3. Discussion of Results

In addition to checking the feasibility of establishing the models, the decision to develop WZ crash nature and type of person involved in WZ crashes global models, rather than a model of injury severity, is justified. This can be inferred since the injury variable categories, for the database considered, do not present a similar distribution of subjects, as recommended by Marôco [35], especially regarding fatal and seriously injured victims (1.0% fatal victims, 3.4% seriously injured, 60.3% slightly injured and 35.3% unharmed).
It was also confirmed that the database did not present specific WZ quantitative variables that could be used as covariates in the analysis, such as the number of obstacles, the annual average daily traffic (AADT), the total WZ length, or the number of lanes closed, among others. To overcome this limitation and allow the feasibility of multinomial regression analysis, the quantitative variable X1 (number of persons involved in the crash) was prepared and included in the analysis.
Considering the more favorable distribution of subjects for the WZ crash nature and type of person involved in WZ crashes categories, models were developed for these two variables. Both models proved to be statistically significant at the 5% significance level. As mentioned previously, the crash nature model presents 73.2% of correct classifications. This value is considerably higher than the proportional percentage of correct classifications by chance, which is 44.9% (about 1.63 times higher). For the type of person involved, the percentage of correct classifications is 74.6% and the proportional percentage of correct classifications by chance is 59.7% (about 1.25 times higher). For this last model, the increase in correct classifications, when compared to the classification by chance, is not as expressive as for the crash nature model. Even so, according to Marôco [35], the model can be considered to have classificatory properties (increase of at least 25%).
Regarding the WZ crash nature model, the quantitative variable X1 (number of people involved in a WZ crash) that was created did not show a positive influence on the model (Exp(β) < 1). When examining the injury variables (X2 to X4), the analysis indicates that they do not increase the probability of a subject being involved in a crash. The exception is X4 (light injury = 0) for the run over pedestrian category relative to the reference category collision, since being a fatal victim, seriously injured or unharmed, in contrast to being slightly injured, increases the chances of being involved in a run over pedestrian crash by 92.2% (Exp(β) = 1.922). As expected, due to the involvement of vulnerable users (pedestrian workers) [21,22], when compared to run-off-road and collision, the run over pedestrian category presents a high proportion of fatalities and serious injuries in WZ crashes, which confirms the obtained result.
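The percentages quoted throughout this discussion are obtained from the odds ratios as (Exp(β) − 1) × 100, so values above 1 read as increases in the odds and values below 1 as reductions. A short sketch, using odds ratios reported in this section for the crash nature model (the function name is an assumption of this example):

```python
def percent_change_in_odds(odds_ratio):
    """Convert an odds ratio Exp(beta) into a percentage change in the odds."""
    return (odds_ratio - 1.0) * 100.0


# Odds ratios reported in this section for the crash nature model.
reported = {
    "X4, run over pedestrian": 1.922,
    "X5, run-off-road": 1.302,
    "X5, run over pedestrian": 0.180,
}
changes = {name: round(percent_change_in_odds(value), 1)
           for name, value in reported.items()}
```

These percentages describe changes in the odds relative to the reference category (collision), not changes in the probability itself.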
For the road environment variable (X5, urban = 0), being involved in a crash in a rural environment, compared to a crash in an urban environment, increases the chances of being involved in a run-off-road crash relative to a collision by 30.2% (Exp(β) = 1.302). This environment was also identified in [13,22] as a risk factor that increases the probability of WZ crashes, especially those involving serious injuries. In contrast, for running over pedestrian, it reduces the chances of a subject being involved by 82% (Exp(β) = 0.180). In run-off-road crashes, this increase is apparently associated with the practice of high speeds.
For the road horizontal geometric design (X6, alignment = 0), being involved in a crash that occurs on a curve, relative to a crash on a straight alignment, increases the chances of a run-off-road crash relative to a collision by 87.0% (Exp(β) = 1.870), while for run over pedestrian there is a reduction of 30.1% (Exp(β) = 0.699). Bad pavement grip conditions (X7, dry and clean = 0) are directly related to road environment conditions, such as adverse weather and the presence of mud, sand, or oil on the pavement. For subjects involved in crashes that occurred with poor grip conditions, relative to those involved in crashes with good grip conditions (dry and clean pavement), there is an increase of 57.7% (Exp(β) = 1.577) in the chances of being involved in a run-off-road crash. There is also a reduction of 17.9% (Exp(β) = 0.821) for the case of run over pedestrian, when compared to the reference category collision. In view of these results and of the literature [13,18,22], it is considered of fundamental importance to ensure efficient traffic control devices, traffic signs and speed limits, especially in adverse weather conditions and complex road geometries.
Regarding the intervening vehicles, a positive influence was not found for the passenger car variable (X8, passenger car = 0) when modelling the crash nature. This result is in line with the fact that there is a predominance of subjects involved in WZ crashes driving or being passengers in passenger cars (77.7% of people involved). Considering the results obtained and the literature consulted [14,15,17,18,21,22,25], and despite the reduced number of cases in the analyzed database (348 cases), it is suggested that an isolated analysis of crashes involving one or more heavy vehicles is performed, since they present homogeneous characteristics in terms of injury severity, involved persons (drivers) and type of crash (collision).
For variable X9 (running straight = 0), being involved in a crash in which the actions and maneuvers were not running straight before the crash (including driver actions and maneuvers before the crash and actions of pedestrian victims), compared to running straight, reduces the chances of being involved in a run-off-road crash by 64.9% (Exp(β) = 0.351). Under the same conditions and when compared to a collision, it significantly increases the chances of being involved in a run over pedestrian crash by 303.6% (Exp(β) = 4.036). Most road crash studies, including the ones presented in the literature review section, have identified running straight as a risk factor that increases the probability of crashes, since it favors the practice of higher speeds.
Involvement in a crash with other driver actions and maneuvers prior to the crash, as well as with actions of the pedestrian victims, relative to the involvement of a road worker (X10, WZ pedestrian worker victim = 0), reduces the chances of being involved in a run-off-road crash by 91.6% (Exp(β) = 0.084). It also reduces the chances of being involved in a run over pedestrian crash by 98.8% (Exp(β) = 0.012), in comparison to a collision. As mentioned before, pedestrians working on the road are the most vulnerable subjects in WZ. Additionally, it is considered relevant to develop a model for run over pedestrian crashes, since there were 14 fatal and 50 seriously injured victims as a consequence of this type of crash, presenting a relatively homogeneous distribution in terms of injuries.
For the circumstances described in other information complementary to actions and maneuvers (such as disrespect for vertical signs, road markings or traffic lights, among others), relative to the practice of excessive speed for prevailing conditions (X11, excessive speed for prevailing conditions = 0), which is an illegal and unadvised behavior in WZ, the chances of being involved in a run-off-road crash are reduced by 43.7% (Exp(β) = 0.563). However, the chances for the run over pedestrian category increase by 167.9% (Exp(β) = 2.679), when compared with collision.
Having collision as a reference category, variables X9 and X11 revealed a positive influence on the crash nature model for the run over pedestrian crash type. These results indicate the need to develop effective methods for signaling, as well as for warning and directing traffic in WZ. Some examples of these methods are the adoption of temporary traffic control and speed reduction devices, and more visible inspections (police presence). These actions aim to reduce the impact of human behavior as a contributing factor to the occurrence of WZ crashes [12,16,18,19,25,26].
When compared to the crash nature model, the results of the type of person involved model are more difficult to interpret, since the pedestrian category was used as the reference and the characteristics of crashes in which pedestrians are involved are quite different from those verified for the driver and passenger categories. This fact is reinforced by the results obtained in the classification of cases (Table 15), since the percentages of correct classifications for the pedestrian and passenger categories are low when compared to the driver category.
The number of persons involved in the selected WZ crashes (X1) revealed a positive influence on the type of person involved model for the passenger category. The chances of being involved as a passenger in a WZ crash, in comparison to a pedestrian, are multiplied by 2.182 (Exp(β)) per unit increase in the number of persons involved, corresponding to an increase in the odds of 118.2%. Regarding crashes with one driver involved (which corresponds to a single vehicle involved), considering the ROC curves (especially for the driver and passenger categories) and the representativeness of this category in variable X1 (16.5%, see Table 3), it is recommended to investigate this scenario separately from crashes involving multiple vehicles. The need to study the effect of the type and number of vehicles involved was also identified and addressed by Meng et al. and Osman et al. [12,22].
For the passenger and driver categories, variables related to injury severity (X2 and X3, fatality = 0, seriously injured = 0) revealed a positive influence on the model. These groups have the highest number of persons involved with injuries (mostly slight injuries) when compared with the pedestrian category.
For variable X4 (urban = 0), being involved in a WZ crash in a rural environment, relative to an urban environment, increases the chances of being involved as a passenger by 578.2% (Exp(β) = 6.782) and as a driver by 512.4% (Exp(β) = 6.124), when compared to the pedestrian category. Involvement in a crash that occurs on a horizontal curve (X5, alignment = 0), in comparison to a crash on a straight alignment, increases the chances of being involved as a passenger by 108.4% (Exp(β) = 2.084) and as a driver by 59.2% (Exp(β) = 1.592). It was also verified that adverse pavement grip conditions, relative to good grip conditions (X6, dry and clean = 0), increase the chances of being involved as a passenger by 53.5% (Exp(β) = 1.535) and as a driver by 35.8% (Exp(β) = 1.358), compared to the pedestrian category. The results obtained are in line with what would be expected from the comparison of passenger and driver with the pedestrian reference category.
Being involved as a subject in other types of vehicles considered in the Portuguese police crash report, relative to the class heavy vehicle (X7, heavy vehicle involvement = 0), increases the chances of being involved in a WZ crash as a passenger by 498.3% (Exp(β) = 5.983), and as a driver by 73.5% (Exp(β) = 1.735), when compared to pedestrians. It should be noted that the passenger category is naturally associated with passenger cars, that the driver category is mostly associated with passenger cars, and that, for the other types of intervening vehicles, these categories are heterogeneously distributed.
Finally, being involved in a WZ crash during the weekend, in comparison to the working weekdays (X8, working weekday = 0), increases the chances of being involved as a passenger by 176.3% (Exp(β) = 2.763) and as a driver by 83.3% (Exp(β) = 1.833), relative to a pedestrian. Considering that during the weekend road work zones are generally inactive, the results obtained reflect a greater propensity for driver behaviors associated with disrespect for imposed speed limits and signage, as stated by Osman et al. [22].

4. Conclusions

Aligned with the maintenance needs of consolidated road networks and with the WHO and UN objectives to improve road safety, the presented study aims to identify the main risk factors that increase the probability of WZ crash occurrence, considering the crash nature and the type of person involved, using multinomial logistic regression. Multinomial logistic regression has been mainly used to analyze WZ crash and injury severity. Therefore, its application to the crash nature and type of person involved, considering the information available in the Portuguese police crash report, which was organized in a single database by type of person involved, constitutes a new application and a contribution to the study and promotion of WZ safety.
In this study, two analytical models were established, confirming the feasibility of using multinomial logistic regression to predict the occurrence probabilities for the categories considered for crash nature (collision, run-off-road and run over pedestrian) and type of person involved in WZ crashes (pedestrian, passenger and driver).
In a universe of 230 available independent variables, 10 nominal qualitative variables with predictive effects were selected for the WZ crash nature model: fatality, seriously injured, slightly injured, road environment, horizontal geometric design, pavement grip condition, passenger car, driver action, pedestrian worker victim action and excessive speed for prevailing conditions. Considering the type of person involved in WZ crashes, the following 7 predictive nominal qualitative variables were considered: fatality, seriously injured, road environment, horizontal geometric design, pavement grip condition, heavy vehicle involvement and working weekday.
From the fitting information, quality of fit, pseudo-R², likelihood ratio tests, predictive parameters’ estimates, as well as from correct and random classifications, it was concluded that both models discriminate through classification and have predictive power, as verified by the ROC curves. However, the WZ crash nature model presented better results. It is believed that this outcome is related to a more favorable distribution of subjects by categories in the WZ crash nature model, and to the use of pedestrians as the reference category in the type of person involved model (pedestrians’ characteristics and type of involvement are quite different from those of passengers and drivers).
Taking collision as a reference category, a global analysis of the WZ crash nature predictive model showed that rural environment, horizontal curve design and poor pavement grip conditions, compared to the urban environment, horizontal alignment design and good pavement grip conditions (dry and clean), increase the chances of a run-off-road crash by 30.2%, 87.0% and 57.7%, respectively. For the run over pedestrian category, it was found that fatal, seriously injured and unharmed victims, compared to a subject with minor injuries, increase the chances of being involved in a run over pedestrian crash by 92.2%. Run over pedestrian crashes have the highest number of fatalities and serious injuries in the set of WZ crashes analyzed. The chances of being involved in a run over pedestrian crash as a subject in a vehicle that performs other actions and maneuvers before the crash or other complementary information, including pedestrian victims’ actions, compared to a subject in a vehicle running straight or moving at excessive speed, increase by 303.6% and 167.9%, respectively.
As mentioned in the discussion of results, the interpretation of the type of person involved in WZ crashes model results must be carefully performed. According to the predictive model, the passenger category is positively influenced by all the considered predictive variables. For the driver category, only the number of persons involved in the crash does not significantly influence the probability of being involved in a crash as a driver, which can be explained by the fact that most of the persons involved in WZ crashes are drivers. Variables with the most significant positive influence are injured and unharmed victims, the rural environment, and the involvement of vehicles other than heavy vehicles, compared to fatal victims, the urban environment and the involvement of heavy vehicles, increasing the chances of being involved as a passenger by 755.8%, 578.2% and 498.3%, respectively. This result could be explained by the fact that passengers essentially travel in passenger cars. The chances of being involved as an unharmed or slightly injured driver, compared to being a fatal or seriously injured victim, increase by more than 600%. For the driver category and considering the road environment variable, the chances of being involved in a WZ crash in a rural environment, compared to an urban environment, increase by 512.4%.
The lack of more detailed WZ information, namely of quantitative variables such as the WZ length and traffic volume, is considered the main limitation of this study. It is believed that detailed WZ information would allow more representative and advanced crash analyses. For future work, crash nature models using multinomial logistic regression considering only drivers, and separate analyses for crashes involving a single vehicle and for those with multiple intervening vehicles, are suggested. It is also suggested to verify the possibility of applying other statistical techniques, such as generalized linear models, Monte Carlo, genetic algorithms, random forest and neural networks, to Portuguese data, as well as comparing the results obtained, when possible, with those of multinomial logistic regression.

Author Contributions

Conceptualization, B.S. and A.V.; methodology, B.S. and A.V.; validation, A.V., B.S. and L.P.-S.; formal analysis, A.V.; investigation, A.V., B.S. and L.P.-S.; writing—original draft preparation, B.S.; writing—review and editing, B.S. and L.P.-S.; supervision, B.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding, but it is part of the research activity carried out at Civil Engineering Research and Innovation for Sustainability (CERIS) research center, which received financial support from the Fundação para a Ciência e a Tecnologia (FCT) in the framework of project UIDB/04625/2020.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of data. Data were obtained from the Portuguese National Road Safety Authority (ANSR, Autoridade Nacional de Segurança Rodoviária) and are available from the corresponding author if authorized by ANSR.

Acknowledgments

The authors acknowledge the Portuguese National Road Safety Authority (ANSR—Autoridade Nacional de Segurança Rodoviária) for providing data and assistance, CERIS—Civil Engineering Research and Innovation for Sustainability (ECI/04625), and the University of Beira Interior for support and funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. OECD-ITF. Road Safety Annual Report 2020. Available online: https://www.itf-oecd.org/sites/default/files/docs/irtad-road-safety-annual-report-2020_0.pdf (accessed on 9 November 2022).
  2. United Nations. A/RES/70/1 Transforming Our World: The 2030 Agenda for Sustainable Development. 2015. Available online: https://sdgs.un.org/sites/default/files/publications/21252030%20Agenda%20for%20Sustainable%20Development%20web.pdf (accessed on 9 November 2022).
  3. United Nations. A/RES/74/299 Improving Global Road Safety. 2020. Available online: https://digitallibrary.un.org/record/3879711?ln=en (accessed on 9 November 2022).
  4. WHO. Global Status Report on Road Safety 2018. Available online: https://www.who.int/publications/i/item/9789241565684 (accessed on 9 November 2022).
  5. WHO. The Top 10 Causes of Death. 2020. Available online: https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death (accessed on 10 November 2022).
  6. WHO. Road Traffic Injuries. 2022. Available online: https://www.who.int/news-room/fact-sheets/detail/road-traffic-injuries (accessed on 10 November 2022).
  7. Ambros, J.; Turek, R.; Elgner, J.; Křivánková, Z.; Valentová, V. Effectiveness evaluation of section speed control in Czech motorway work zones. Safety 2020, 6, 38.
  8. La Torre, F.; Domenichini, L.; Nocentini, A. Effects of stationary work zones on motorway crashes. Saf. Sci. 2017, 92, 148–159.
  9. Silverstein, C.; Schorr, J.; Hamdar, S.H. Work zones versus nonwork zones: Risk factors leading to rear-end and sideswipe collisions. J. Transp. Saf. Secur. 2016, 8, 310–326.
  10. Yang, H.; Ozbay, K.; Ozturk, O.; Xie, K. Work zone safety analysis and modeling: A state-of-the-art review. Traffic Inj. Prev. 2015, 16, 387–396.
  11. Ullman, G.; Ullman, B.; Finley, M. Analysis of crashes at active night work zones. In Proceedings of the Transportation Research Board 85th Annual Meeting, Washington, DC, USA, 22–26 January 2006.
  12. Meng, Q.; Weng, J.; Qu, X. A probabilistic quantitative risk assessment model for the long-term work zone crashes. Accid. Anal. Prev. 2010, 42, 1866–1877.
  13. Santos, B.; Trindade, V.; Polónia, C.; Picado-Santos, L. Detecting risk factors of road work zone crashes from the information provided in police crash reports: The case study of Portugal. Safety 2021, 7, 12.
  14. Khattak, A.; Rodriguez, D.; Targa, F.; Rocha, M. Understanding the Role of Truck-Driver, Occupational and High-Risk Roadway Factors in Truck-Involved Collisions; CURS Report No. 2003-04; CURS: Nashville, TN, USA, 2003.
  15. Khattak, A.; Targa, F. Injury severity and total harm in truck-involved work zone crashes. Transp. Res. Rec. 2004, 1877, 106–116.
  16. Qi, Y.; Srinivasan, R.; Teng, H.; Baker, R. Frequency of Work Zone Accidents on Construction Projects: Final Report C-01-61. 2005. Available online: https://rosap.ntl.bts.gov/view/dot/16159 (accessed on 21 November 2022).
  17. Li, Y.; Bai, Y. Fatal and injury crash characteristics in highway work zones. In Proceedings of the Transportation Research Board 87th Annual Meeting, Washington, WA, USA, 13–17 January 2008. [Google Scholar]
  18. Li, Y.; Bai, Y. Comparison of characteristics between fatal and injury accidents in the highway construction zones. Saf. Sci. 2008, 46, 646–660. [Google Scholar] [CrossRef]
  19. Elghamrawy, T. Optimizing Work Zone Practices for Highway Construction Projects. Ph.D. Thesis, University of Illinois, Champaign, IL, USA, 2011. [Google Scholar]
  20. Yang, H.; Ozbay, K.; Ozturk, O.; Yildirimoglu, M. Modeling work zone crash frequency by quantifying measurement errors in work zone length. Accid. Anal. Prev. 2013, 55, 192–201. [Google Scholar] [CrossRef] [PubMed]
  21. Osman, M.; Paleti, R.; Mishra, S.; Golias, M. Analysis of injury severity of large truck crashes in work zones. Accid. Anal. Prev. 2016, 97, 261–273. [Google Scholar] [CrossRef] [PubMed]
  22. Osman, M.; Paleti, R.; Mishra, S. Analysis of passenger-car crash injury severity in different work zone configurations. Accid. Anal. Prev. 2018, 111, 161–172. [Google Scholar] [CrossRef] [PubMed]
  23. Weng, J.; Du, G.; Ma, L. Driver injury severity analysis for two work zone types. Proc. Inst. Civ. Eng.-Transp. 2016, 169, 97–106. [Google Scholar] [CrossRef]
  24. Sze, N.; Song, Z. Factors contributing to injury severity in work zone related crashes in New Zealand. Int. J. Sustain. Transp. 2019, 13, 148–154. [Google Scholar] [CrossRef]
  25. Li, Y.; Bai, Y. Highway work zone risk factors and their impact on crash severity. J. Transp. Eng. 2009, 135, 694–701. [Google Scholar] [CrossRef]
  26. Ashqar, H.; Shaheen, Q.; Ashur, S.; Rakha, H. Impact of risk factors on work zone crashes using logistic models and Random Forest. In Proceedings of the IEEE Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–21 September 2021. [Google Scholar] [CrossRef]
  27. Weng, J.; Meng, Q. Analysis of driver casualty risk for different work zone types. Accid. Anal. Prev. 2011, 43, 1811–1817. [Google Scholar] [CrossRef] [PubMed]
  28. Bédard, M.; Guyatt, G.; Stones, M.; Hirdes, J. The independent contribution of driver, crash, and vehicle characteristics to driver fatalities. Accid. Anal. Prev. 2002, 34, 717–727. [Google Scholar] [CrossRef]
  29. Harb, R.; Radwan, E.; Yan, X.; Pande, A.; Abdel-Aty, M. Freeway work-zone crash analysis and risk identification using multiple and conditional logistic regression. J. Transp. Eng. 2008, 134, 203–214. [Google Scholar] [CrossRef]
  30. Santos, B.; Picado-Santos, L. Impact assessment of road work zones in operating conditions, safety and user costs. In Pavement and Asset Management, 1st ed.; Crispino, M., Ed.; CRC Press/Balkema Taylor & Francis: London, UK, 2019; pp. 595–604. ISBN 978-0-367-20989-6. [Google Scholar]
  31. Santos, B.; Picado-Santos, L.; Trindade, V. Using binary logistic regression to explain the impact of accident factors on work zone crashes. In Proceedings of the RSS 2017—Road Safety & Simulation International Conference, The Hague, The Netherlands, 17–19 October 2017. [Google Scholar]
  32. Abdulhafedh, A. Road traffic crash data: An overview on sources, problems, and collection methods. J. Transp. Technol. 2017, 7, 206–219. [Google Scholar] [CrossRef]
  33. Hosmer, D.; Lemeshow, S.; Sturdivant, R. Applied Logistic Regression, 3rd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2013. [Google Scholar] [CrossRef]
  34. Washington, S.; Karlaftis, M.; Mannering, F.; Anastasopoulos, P. Statistical and Econometric Methods for Transportation Data Analysis, 3rd ed.; Chapman & Hall/CRC: New York, NY, USA, 2020. [Google Scholar] [CrossRef]
  35. Marôco, J. Análise Estatística Com o SPSS Statistics, 6th ed.; ReportNumber—Análise e gestão de informação, Lda; Biblioteca Nacional de Portugal: Pêro Pinheiro, Portugal, 2014. [Google Scholar]
  36. Chen, J.; Lewis, B.; Marathe, A.; Marathe, M.; Swarup, S.; Vullikanti, A. Individual and collective behavior in public health epidemiology. Handb. Stat. 2017, 36, 329–365. [Google Scholar] [CrossRef]
  37. Fávero, L.; Belfiore, P.; Silva, F.; Chan, B. Análise de Dados: Modelagem Multivariada Para Tomada de Decições, 3rd ed.; Elsevier: Rio de Janeiro, Brasil, 2009; ISBN 9788535230468. [Google Scholar]
  38. Bham, G.; Javvadi, B.; Manepalli, U. Multinomial logistic regression model for single-vehicle and multivehicle collisions on urban U.S. highways in Arkansas. J. Transp. Eng. 2012, 138, 786–797. [Google Scholar] [CrossRef]
  39. Osborne, J. Best Practices in Logistic Regression; SAGE Publications, Ltd.: Thousand Oaks, CA, USA, 2017. [Google Scholar] [CrossRef]
  40. Pituch, K.; Stevens, J. Applied Multivariate Statistics for the Social Sciences: Analyses with SAS and IBM’s SPSS, 6th ed.; Taylor and Francis: Oxfordshire, UK, 2015. [Google Scholar] [CrossRef]
  41. Hensher, D.; Stopher, P. Behavioural Travel Modelling; Taylor and Francis Inc.: London, UK, 2021. [Google Scholar] [CrossRef]
  42. Long, J. Regression Models for Categorical and Limited Dependent Variables, 1st ed.; SAGE Publications: Thousand Oaks, CA, USA, 1997; ISBN 9780803973749. [Google Scholar]
  43. Hoffmann, R. Análise de Regressão: Uma Introdução à Econometria; Escola Superior de Agricultura Luiz de Queiroz: Piracicaba, Brazil, 2017. [Google Scholar] [CrossRef]
  44. Miles, J. Tolerance and variance inflation factor. In Wiley StatsRef: Statistics Reference Online; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2014. [Google Scholar] [CrossRef]
  45. Sze, N.; Wong, S.; Lee, C. The likelihood of achieving quantified road safety targets: A binary logistic regression model for possible factors. Accid. Anal. Prev. 2014, 73, 242–251. [Google Scholar] [CrossRef] [PubMed]
  46. Harrel, F., Jr. Regression Modeling Strategies—With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis, 2nd ed.; Springer Series in Statistics; Springer: Cham, Switzerland, 2015. [Google Scholar] [CrossRef]
  47. IBM Corp. IBM SPSS Statistics for Windows; IBM Corp.: Armonk, NY, USA, 2017. [Google Scholar]
  48. Autoridade Nacional de Segurança Rodoviária (ANSR). Boletim Estatístico de Acidentes de Viação, Portugal. 2014. Available online: http://www.ansr.pt/Estatisticas/BEAV/Documents/BEAV.pdf (accessed on 9 November 2022).
  49. Pahukula, J.; Hernandez, S.; Unnikrishnan, A. A time of day analysis of crashes involving large trucks in urban areas. Accid. Anal. Prev. 2015, 75, 155–163. [Google Scholar] [CrossRef] [PubMed]
  50. Roque, C.; Moura, F.; Lourenço-Cardoso, J. Detecting unforgiving roadside contributors through the severity analysis of ran-off-road crashes. Accid. Anal. Prev. 2015, 80, 262–273. [Google Scholar] [CrossRef]
  51. Tay, R.; Choi, J.; Kattan, L.; Khan, A. A multinomial logit model of pedestrian-vehicle crash severity. Int. J. Sustain. Transp. 2011, 5, 233–249. [Google Scholar] [CrossRef]
  52. Vieira, A. Analysis of Road Work Zone Crash Data Using Multinomial Logistic Regression. Master’s Thesis, University of Beira Interior, Covilhã, Portugal, 2019. [Google Scholar]
Figure 1. ROC Curves. Estimated cell probability for: (a) Collision (AUC = 0.811), (b) Run-off-road (AUC = 0.735), (c) Run over pedestrian (AUC = 0.602).
Figure 2. ROC Curves. Estimated cell probability for: (a) Pedestrian (AUC = 0.741), (b) Passenger (AUC = 0.641), (c) Driver (AUC = 0.541).
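The AUC values quoted in the two ROC figure captions are computed per outcome category, from the estimated probability of that category against all other categories. The sketch below shows one common way to obtain such a value, the rank-based (Mann-Whitney) estimate. The function name and the toy scores are hypothetical and are not taken from the crash database; the article's own values come from SPSS.

```python
def auc_one_vs_rest(scores, is_target):
    """Rank-based (Mann-Whitney) AUC for one outcome category against the rest.

    scores    : estimated probability of the target category for each case
    is_target : True when the case actually belongs to the target category
    """
    pos = [s for s, t in zip(scores, is_target) if t]
    neg = [s for s, t in zip(scores, is_target) if not t]
    if not pos or not neg:
        raise ValueError("need at least one case in each group")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # target case ranked above a non-target case
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (illustration only, not from the article)
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_is_target = [True, True, True, False, False, False]
toy_auc = auc_one_vs_rest(toy_scores, toy_is_target)
print(round(toy_auc, 3))
```

An AUC of 0.5 corresponds to ranking no better than chance, which is why the values near 0.54 to 0.60 in the captions indicate weak discrimination for those categories.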
Table 1. Summary of work zone crash studies using logistic regression approaches and other similar methodologies.

| Reference | Year | Method | Scope |
|---|---|---|---|
| Khattak et al. [14] | 2003 | Ordered probit; linear regression | Heavy vehicle involvement; injury severity and frequency |
| Khattak et al. [15] | 2004 | Ordered probit | Heavy vehicle involvement; injury severity |
| Qi et al. [16] | 2005 | Ordered probit | Rear-end collisions; crash severity |
| Li & Bai [17,18,25] | 2006–2009 | Logistic regression; crash frequency analysis | Fatal and injury crashes; human factors |
| Meng et al. [12] | 2010 | Probabilistic quantitative risk assessment: regression model, event tree, consequence estimation models | Long-term WZ; casualty risk; crash severity |
| Elghamrawy [19] | 2011 | Ordered logistic regression | Fatal and injury crashes; crash severity and cost |
| Yang et al. [20] | 2013 | Negative binomial regression | Crash frequency; single vehicle; drivers’ injury severity |
| Osman et al. [21] | 2016 | Multinomial logit; nested logit; ordered logit; generalized ordered logit | Heavy vehicle involvement; injury severity |
| Osman et al. [22] | 2018 | Mixed generalized ordered probit | Injury severity of passenger-car crashes |
| Santos et al. [13,30,31] | 2017–2021 | Logistic regression; probit regression | Type of crash; contributing factor; driver age group |
| Ashqar et al. [26] | 2021 | Crash frequency analysis; logistic regression; machine learning Random Forest algorithm | Crash severity |
Table 2. Summary of studies and explanatory variables adopted in logistic regression approaches and other similar methodologies applied to road work zone crash analysis.
| Category and variables | Khattak et al. [14] | Khattak et al. [15] | Qi et al. [16] | Li & Bai [17,18,25] | Meng et al. [12] | Elghamrawy [19] | Yang et al. [20] | Osman et al. [21] | Osman et al. [22] | Santos et al. [13,30,31] | Ashqar et al. [26] |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Year | 2003 | 2004 | 2005 | 2006–2009 | 2010 | 2011 | 2013 | 2016 | 2018 | 2017–2021 | 2021 |
Temporal:
hour and minutes (a), day of the week (b), month (c)
a, b, c a, b (a)(a, b)
| Road environment conditions: light (a), weather (b), road surface (c) | a, b | a, b | a, b, c | a, b, c | a | a, b, c | - | a, b, c | b, c | a, b, c | a, b, c |
| Road and WZ conditions: road class (a), road geometry (b), road configuration (1) (c), number of lanes (d), lane width (e), intersection (f), bridge (g), pavement type (h), rural/urban (i), AADT/peak hour (j), speed limit (k), WZ type (2) (l), WZ marked with sign/cones (m), access/traffic control device (n), pedestrian/pedestrian worker (o), WZ activity (ongoing) (p), WZ duration (q), WZ length (r), type of work being done (s), WZ effect on the roadway (3) (t) | c, k, l, m, n, o, p, q, s, t | c, k, l, m, n, p, s, t | a, b, f, l, n, o, q, s, t | a, b, d, h, i, k, n | i, j, q, r | a, c, d, e, f, j, k, l, n | b, j, k | b, d, f, g, i, j, k, n, o, t | a, b, d, g, i, k, n, o, t | b, f, i, k | a, b, d, f, k, l, m, p |
Driver:
age (a), gender (b), physical condition (c), illegal behaviours/poor judgment (d), driving under the influence (alcohol, drugs, etc.) (e), driver’s vision obstruction (f)
a, b, c, d, e, f d, e, fa, b, d, ea, e b, e a, d, e
Vehicle:
type (a), age (b), estimated speed at impact (c)
a
c
aaaaa
b
c
Crash information:
crash location (4) (a), n.º of vehicles involved (b), n.º of persons involved (c), HV involvement (d), motorcycle involvement (e), crash type/severity (f), contributing factors (g), pre-collision actions (h), first/most harmful event (i)
a, b, c, d, f, h, ia, b, c, d, f, ia, b, d, ga, b, f, g, hb, d, fb, c, d, f, gd, f, g, ha, ba, b, dd, e, g, ha
AADT—Average annual daily traffic; HV—Heavy vehicle; (1) One-way, Two-way, divided, not divided, unprotected, median barrier; (2) Construction, maintenance, utility, intermittent/moving; (3) Lane closed, shoulder/median closed, road closed—retour opposing side, lanes shift/become narrow, other/unknown, none; (4) Before work area, adjacent to work area, in work area approach taper.
Table 3. Global database numbers.

| Year | N.º of records coded as ‘Undefined’ in field B4.3 | N.º of WZ crashes (1) | N.º of drivers involved | N.º of passengers involved | N.º of pedestrians involved | Total n.º of persons involved |
|---|---|---|---|---|---|---|
| 2010 | 1558 | 830 | 1263 | 336 | 116 | 1715 |
| 2011 | 968 | 706 | 1107 | 247 | 97 | 1451 |
| 2012 | 424 | 544 | 814 | 162 | 93 | 1069 |
| 2013 | 419 | 511 | 755 | 192 | 91 | 1038 |
| 2014 | 364 | 451 | 650 | 145 | 72 | 867 |
| 2015 | 320 | 462 | 694 | 143 | 75 | 912 |
| Total | 4053 | 3504 | 5283 | 1225 | 544 | 7052 |
(1) obstacles on the pavement and roadwork areas (crash report field B4.3).
Table 4. Case processing summary, tolerance and VIF.
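
| Dependent variable | Code | Categories | N | % |
|---|---|---|---|---|
| Crash nature | 0 | Run over pedestrians | 957 | 13.6% |
| | 1 | Run-off-road | 1851 | 26.2% |
| | 2 | Collision | 4244 | 60.2% |

| Independent variables | Code | Categories | N | % | Tolerance | VIF |
|---|---|---|---|---|---|---|
| N. of persons involved in the crash (covariate) | X1 | 1 person involved (1) | 1167 | 16.5% | 0.853 | 1.172 |
| | | 2 persons involved (2) | 2992 | 42.4% | | |
| | | 3 persons involved (3) | 1656 | 23.5% | | |
| | | 4 persons involved (4) | 832 | 11.8% | | |
| | | 5 persons involved (5) | 405 | 5.8% | | |
| Fatality | X2 | No (0) | 6982 | 99.0% | 0.969 | 1.031 |
| | | Yes (1) | 70 | 1.0% | | |
| Seriously injured | X3 | No (0) | 6813 | 96.6% | 0.915 | 1.093 |
| | | Yes (1) | 239 | 3.4% | | |
| Slightly injured | X4 | No (0) | 2802 | 39.7% | 0.868 | 1.152 |
| | | Yes (1) | 4250 | 60.3% | | |
| Road environment (urban) | X5 | No (0) | 2650 | 37.6% | 0.914 | 1.094 |
| | | Yes (1) | 4402 | 62.4% | | |
| Horizontal geometric design (alignment) | X6 | No (0) | 1884 | 26.7% | 0.978 | 1.023 |
| | | Yes (1) | 5168 | 73.3% | | |
| Pav. grip condition (dry and clean) | X7 | No (0) | 2431 | 34.5% | 0.970 | 1.031 |
| | | Yes (1) | 4621 | 65.5% | | |
| Passenger car | X8 | No (0) | 1575 | 22.3% | 0.907 | 1.103 |
| | | Yes (1) | 5477 | 77.7% | | |
| Driver action (running straight) | X9 | No (0) | 2183 | 31.0% | 0.938 | 1.067 |
| | | Yes (1) | 4869 | 69.0% | | |
| Pedestrian worker victim action (in WZ) | X10 | No (0) | 6965 | 98.8% | 0.958 | 1.043 |
| | | Yes (1) | 87 | 1.2% | | |
| Excessive speed for prevailing conditions | X11 | No (0) | 6563 | 93.1% | 0.969 | 1.032 |
| | | Yes (1) | 489 | 6.9% | | |

| Summary | N | % |
|---|---|---|
| Valid values | 7052 | 100% |
| Missing values | 0 | 0% |
| Total values | 7052 | |
| Subpopulation | 535 a | |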
a The dependent variable has only one value observed in 332 (62.1%) subpopulations.
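The Tolerance and VIF columns of Table 4 carry the same information, because the variance inflation factor is the reciprocal of the tolerance (VIF = 1 / tolerance, with tolerance = 1 − R² of each predictor regressed on the others). A minimal sketch that recomputes VIF from the printed tolerances follows; small differences in the third decimal are expected because the printed tolerances are already rounded.

```python
# Tolerance and VIF values as printed in Table 4 (crash nature model)
tolerance = {
    "X1": 0.853, "X2": 0.969, "X3": 0.915, "X4": 0.868, "X5": 0.914, "X6": 0.978,
    "X7": 0.970, "X8": 0.907, "X9": 0.938, "X10": 0.958, "X11": 0.969,
}
printed_vif = {
    "X1": 1.172, "X2": 1.031, "X3": 1.093, "X4": 1.152, "X5": 1.094, "X6": 1.023,
    "X7": 1.031, "X8": 1.103, "X9": 1.067, "X10": 1.043, "X11": 1.032,
}

# VIF is the reciprocal of tolerance
vif = {name: 1.0 / tol for name, tol in tolerance.items()}

for name in tolerance:
    print(f"{name}: recomputed VIF = {vif[name]:.3f}, printed VIF = {printed_vif[name]:.3f}")
```

All recomputed values stay close to 1, far below the thresholds usually taken as a sign of problematic multicollinearity.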
Table 5. Summary of variables selected by the forward stepwise method.

| Step | Action | Variable code | AIC | BIC | −2 Log Likelihood | Chi-square (a) | df | Sig. (p-value) |
|---|---|---|---|---|---|---|---|---|
| 0 | Entered | Intercept | 7206.624 | 7220.346 | 7202.624 | | | |
| 1 | Entered | X1 | 5118.382 | 5145.826 | 5110.382 | 2092.242 | 2 | <0.001 |
| 2 | Entered | X9 | 4412.906 | 4454.073 | 4400.906 | 709.475 | 2 | <0.001 |
| 3 | Entered | X4 | 3965.229 | 4020.118 | 3949.229 | 451.677 | 2 | <0.001 |
| 4 | Entered | X5 | 3658.287 | 3726.897 | 3638.287 | 310.943 | 2 | <0.001 |
| 5 | Entered | X10 | 3508.108 | 3590.441 | 3484.108 | 154.179 | 2 | <0.001 |
| 6 | Entered | X3 | 3398.055 | 3494.110 | 3370.055 | 114.052 | 2 | <0.001 |
| 7 | Entered | X6 | 3288.628 | 3398.405 | 3256.628 | 113.427 | 2 | <0.001 |
| 8 | Entered | X8 | 3180.663 | 3304.162 | 3144.663 | 111.965 | 2 | <0.001 |
| 9 | Entered | X7 | 3124.939 | 3262.161 | 3084.939 | 59.724 | 2 | <0.001 |
| 10 | Entered | X11 | 3081.704 | 3232.647 | 3037.704 | 47.235 | 2 | <0.001 |
| 11 | Entered | X2 | 3053.036 | 3217.702 | 3005.036 | 32.668 | 2 | <0.001 |

Method: Forward Stepwise. (a) Variable entry is based on the chi-square likelihood ratio test. AIC is the Akaike Information Criterion. BIC is Schwarz’s Bayesian Information Criterion. df is the degrees of freedom.
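The AIC and BIC columns of Table 5 can be recovered from the −2 log-likelihood column with the usual definitions, AIC = −2LL + 2k and BIC = −2LL + k·ln(n). The sketch below assumes that each model term (the intercept, the covariate X1, or one binary factor) contributes one parameter to each of the two non-reference outcome equations, so k = 2 × number of terms, and that n is the 7052 valid cases of Table 4. The function name is illustrative only.

```python
import math

N_CASES = 7052      # valid cases reported in Table 4
N_EQUATIONS = 2     # outcome categories minus the reference category


def information_criteria(neg2ll, n_terms):
    """Return (AIC, BIC) for a model with n_terms terms, intercept included.

    Assumption: every term adds one parameter per non-reference equation.
    """
    k = N_EQUATIONS * n_terms
    aic = neg2ll + 2 * k
    bic = neg2ll + k * math.log(N_CASES)
    return aic, bic


# Step 0: intercept only; step 11: intercept plus X1..X11 (12 terms)
aic0, bic0 = information_criteria(7202.624, 1)
aic11, bic11 = information_criteria(3005.036, 12)
print(f"step 0 : AIC = {aic0:.3f}, BIC = {bic0:.3f}")
print(f"step 11: AIC = {aic11:.3f}, BIC = {bic11:.3f}")
```

Both criteria fall at every step of the table, which is consistent with each entered variable improving the fit by more than its parameter penalty.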
Table 6. Model fitting information.

| Model | AIC | BIC | −2 Log Likelihood | Chi-square | df | Sig. (p-value) |
|---|---|---|---|---|---|---|
| Intercept only | 7206.624 | 7220.346 | 7202.624 | | | |
| Final | 3053.036 | 3217.702 | 3005.036 | 4197.588 | 22 | <0.001 |

Goodness-of-fit

| Pseudo R-square | Pseudo R2 |
|---|---|
| Cox and Snell | 0.449 |
| Nagelkerke | 0.532 |
| McFadden | 0.321 |
Table 7. Independent variables likelihood ratio test results.

| Effect | AIC of reduced model | BIC of reduced model | −2 Log Likelihood of reduced model | Chi-square | df | Sig. (p-value) |
|---|---|---|---|---|---|---|
| Intercept | 3053.036 | 3217.702 | 3005.036 (a) | 0.000 | 0 | |
| X1 | 4418.717 | 4569.660 | 4374.717 | 1369.681 | 2 | <0.001 |
| X2 | 3081.704 | 3232.647 | 3037.704 | 32.668 | 2 | <0.001 |
| X3 | 3172.531 | 3323.475 | 3128.531 | 123.495 | 2 | <0.001 |
| X4 | 3575.313 | 3726.257 | 3531.313 | 526.277 | 2 | <0.001 |
| X5 | 3330.868 | 3481.811 | 3286.868 | 281.832 | 2 | <0.001 |
| X6 | 3144.972 | 3295.915 | 3100.972 | 95.935 | 2 | <0.001 |
| X7 | 3101.301 | 3252.245 | 3057.301 | 52.265 | 2 | <0.001 |
| X8 | 3166.438 | 3317.381 | 3122.438 | 117.401 | 2 | <0.001 |
| X9 | 3580.781 | 3731.724 | 3536.781 | 531.744 | 2 | <0.001 |
| X10 | 3189.141 | 3340.084 | 3145.141 | 140.104 | 2 | <0.001 |
| X11 | 3096.417 | 3247.360 | 3052.417 | 47.381 | 2 | <0.001 |

The chi-square statistic is the difference in −2 log-likelihood between the final model and a reduced model. The reduced model is formed by omitting an effect from the final model. The null hypothesis is that all parameters of that effect are 0. (a) This reduced model is equivalent to the final model because omitting the effect does not increase the degrees of freedom.
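The note to Table 7 defines the test statistic as a difference in −2 log-likelihood. A minimal sketch of that calculation follows, using the final-model value 3005.036 (the −2 log-likelihood in the Intercept row of Table 7) and the reduced-model value for X2. The p-value line relies on the closed form of the chi-square survival function for exactly 2 degrees of freedom, exp(−x/2); other degrees of freedom would need a statistics library. The function name is illustrative only.

```python
import math

NEG2LL_FINAL = 3005.036   # -2 log-likelihood of the final model (Table 7, Intercept row)


def lr_test_df2(neg2ll_reduced, neg2ll_final=NEG2LL_FINAL):
    """Likelihood ratio test of one effect that carries 2 parameters.

    Returns (chi_square, p_value). The p-value formula exp(-x / 2) is the
    exact chi-square survival function for df = 2 only.
    """
    chi_square = neg2ll_reduced - neg2ll_final
    p_value = math.exp(-chi_square / 2.0)
    return chi_square, p_value


# Effect X2 (fatality): reduced-model -2LL taken from Table 7
chi2_x2, p_x2 = lr_test_df2(3037.704)
print(f"X2: chi-square = {chi2_x2:.3f}, p-value = {p_x2:.2e}")
```

Even X2, the effect with the smallest statistic in the table, is far below the 0.001 threshold used in the article.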
Table 8. Model parameter estimates.

| Crash nature (a) | Parameter | β | Std. error | Wald | df | Sig. (b) (p-value) | Exp(β) (c) (odds ratio) | 95% CI for Exp(β): lower bound | 95% CI for Exp(β): upper bound |
|---|---|---|---|---|---|---|---|---|---|
| Run-off-road | Intercept | 9.314 | 0.749 | 154.439 | 1 | 0.000 | | | |
| | X1 | −1.233 | 0.046 | 726.329 | 1 | <0.001 | 0.291 | 0.266 | 0.319 |
| | X2 (=0) | −1.975 | 0.339 | 33.830 | 1 | <0.001 | 0.139 | 0.071 | 0.270 |
| | X3 (=0) | −2.257 | 0.202 | 125.050 | 1 | <0.001 | 0.105 | 0.070 | 0.155 |
| | X4 (=0) | −1.753 | 0.097 | 328.494 | 1 | <0.001 | 0.173 | 0.143 | 0.209 |
| | X5 (=0) | 0.264 | 0.075 | 12.298 | 1 | <0.001 | 1.302 | 1.123 | 1.508 |
| | X6 (=0) | 0.626 | 0.076 | 67.960 | 1 | <0.001 | 1.870 | 1.611 | 2.170 |
| | X7 (=0) | 0.456 | 0.073 | 39.449 | 1 | <0.001 | 1.577 | 1.368 | 1.819 |
| | X8 (=0) | 0.032 | 0.082 | 0.151 | 1 | 0.698 | 1.032 | 0.879 | 1.213 |
| | X9 (=0) | −1.046 | 0.091 | 132.227 | 1 | <0.001 | 0.351 | 0.294 | 0.420 |
| | X10 (=0) | −2.472 | 0.599 | 17.023 | 1 | <0.001 | 0.084 | 0.026 | 0.273 |
| | X11 (=0) | −0.574 | 0.129 | 19.816 | 1 | <0.001 | 0.563 | 0.438 | 0.725 |
| Running over pedestrians | Intercept | 6.012 | 0.785 | 58.723 | 1 | 0.000 | | | |
| | X1 | −1.182 | 0.065 | 335.410 | 1 | <0.001 | 0.307 | 0.270 | 0.348 |
| | X2 (=0) | −0.842 | 0.417 | 4.074 | 1 | 0.044 | 0.431 | 0.190 | 0.976 |
| | X3 (=0) | −0.739 | 0.239 | 9.536 | 1 | 0.002 | 0.478 | 0.299 | 0.763 |
| | X4 (=0) | 0.653 | 0.091 | 51.973 | 1 | <0.001 | 1.922 | 1.609 | 2.295 |
| | X5 (=0) | −1.714 | 0.128 | 178.035 | 1 | <0.001 | 0.180 | 0.140 | 0.232 |
| | X6 (=0) | −0.358 | 0.106 | 11.344 | 1 | 0.001 | 0.699 | 0.568 | 0.861 |
| | X7 (=0) | −0.197 | 0.091 | 4.671 | 1 | 0.031 | 0.821 | 0.687 | 0.982 |
| | X8 (=0) | −1.228 | 0.128 | 92.221 | 1 | <0.001 | 0.293 | 0.228 | 0.376 |
| | X9 (=0) | 1.395 | 0.087 | 254.695 | 1 | <0.001 | 4.036 | 3.400 | 4.790 |
| | X10 (=0) | −4.426 | 0.542 | 66.671 | 1 | <0.001 | 0.012 | 0.004 | 0.035 |
| | X11 (=0) | 0.985 | 0.270 | 13.360 | 1 | <0.001 | 2.679 | 1.579 | 4.544 |

(a) The reference category is Collision. (b) According to [35], an effect is statistically highly significant if p-value < 0.001; statistically significant if 0.001 < p-value < 0.05; statistically marginally or almost significant if 0.05 < p-value < 0.10; and not statistically significant if p-value > 0.10. (c) Colour code of the original table: “green”, positive influence; “red”, negative influence; “black”, no influence.
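The coefficients of Table 8 define two linear predictors relative to the reference category (Collision), and category probabilities follow from the multinomial logit (softmax) transformation. The sketch below is illustrative only. It assumes the usual SPSS reading of the “(=0)” labels, namely that a coefficient is added when the binary variable equals 0 and that the “Yes (1)” level has a coefficient of zero, and that X1 enters as the number of persons involved. The example case is hypothetical.

```python
import math

# Coefficients copied from Table 8; "collision" is the reference (logit fixed at 0).
# ASSUMPTION: X2..X11 coefficients apply when the variable equals 0 ("(=0)" rows).
COEF = {
    "run_off_road": {
        "intercept": 9.314, "X1": -1.233, "X2": -1.975, "X3": -2.257,
        "X4": -1.753, "X5": 0.264, "X6": 0.626, "X7": 0.456,
        "X8": 0.032, "X9": -1.046, "X10": -2.472, "X11": -0.574,
    },
    "run_over_pedestrian": {
        "intercept": 6.012, "X1": -1.182, "X2": -0.842, "X3": -0.739,
        "X4": 0.653, "X5": -1.714, "X6": -0.358, "X7": -0.197,
        "X8": -1.228, "X9": 1.395, "X10": -4.426, "X11": 0.985,
    },
}


def predict(case):
    """Return the estimated probability of each crash nature for one case."""
    logits = {"collision": 0.0}
    for outcome, b in COEF.items():
        z = b["intercept"] + b["X1"] * case["X1"]
        for k in range(2, 12):
            name = f"X{k}"
            if case[name] == 0:          # coefficient applies to the "No (0)" level
                z += b[name]
        logits[outcome] = z
    denom = sum(math.exp(z) for z in logits.values())
    return {outcome: math.exp(z) / denom for outcome, z in logits.items()}


# Hypothetical case: 2 persons involved, no fatality or serious injury, slight
# injury, urban, X6 = X7 = X8 = X9 = 1, no pedestrian worker, no excessive speed
case = {"X1": 2, "X2": 0, "X3": 0, "X4": 1, "X5": 1, "X6": 1,
        "X7": 1, "X8": 1, "X9": 1, "X10": 0, "X11": 0}
probs = predict(case)
for outcome, p in probs.items():
    print(f"{outcome}: {p:.3f}")
```

Under these assumptions the three probabilities sum to one, and the category with the highest probability is the one counted as predicted in a classification table.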
Table 9. Classification.

| Observed | Predicted: Collision | Predicted: Run-off-road | Predicted: Run over pedestrian | Percent correct |
|---|---|---|---|---|
| Collision | 3540 | 506 | 198 | 83.4% |
| Run-off-road | 556 | 1248 | 47 | 67.4% |
| Run over pedestrian | 571 | 10 | 376 | 39.3% |
| Overall percentage | 66.2% | 25.0% | 8.8% | 73.2% |
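The percentages in Table 9 follow directly from the cell counts. As a minimal sketch: the right-hand column is the share of each observed category that was predicted correctly, the bottom row is the share of all cases assigned to each predicted category, and the corner value is the overall accuracy.

```python
labels = ["Collision", "Run-off-road", "Run over pedestrian"]

# Rows: observed category; columns: predicted category (counts from Table 9)
confusion = [
    [3540, 506, 198],
    [556, 1248, 47],
    [571, 10, 376],
]

total = sum(sum(row) for row in confusion)

# Share of each observed category that is predicted correctly
percent_correct = {
    labels[i]: 100.0 * confusion[i][i] / sum(confusion[i]) for i in range(3)
}

# Share of all cases assigned to each predicted category
predicted_share = [
    100.0 * sum(row[j] for row in confusion) / total for j in range(3)
]

# Overall accuracy: correctly classified cases over all cases
overall = 100.0 * sum(confusion[i][i] for i in range(3)) / total

print(percent_correct)
print([round(s, 1) for s in predicted_share], round(overall, 1))
```

The gap between the overall figure and the result for running over pedestrians shows how the most frequent category dominates the overall accuracy.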
Table 10. Case processing summary, tolerance and VIF.
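
| Dependent variable | Code | Categories | N | % |
|---|---|---|---|---|
| Type of person involved | 0 | Pedestrian | 544 | 7.7% |
| | 1 | Passenger | 1225 | 17.4% |
| | 2 | Driver | 5283 | 74.9% |

| Independent variables | Code | Categories | N | % | Tolerance | VIF |
|---|---|---|---|---|---|---|
| N. of persons involved in the crash (covariate) | X1 | 1 person involved (1) | 1167 | 16.5% | 0.948 | 1.054 |
| | | 2 persons involved (2) | 2992 | 42.4% | | |
| | | 3 persons involved (3) | 1656 | 23.5% | | |
| | | 4 persons involved (4) | 832 | 11.8% | | |
| | | 5 persons involved (5) | 405 | 5.8% | | |
| Fatality | X2 | No (0) | 6982 | 99.0% | 0.995 | 1.005 |
| | | Yes (1) | 70 | 1.0% | | |
| Seriously injured | X3 | No (0) | 6813 | 96.6% | 0.992 | 1.008 |
| | | Yes (1) | 239 | 3.4% | | |
| Road environment (urban) | X4 | No (0) | 2650 | 37.6% | 0.937 | 1.067 |
| | | Yes (1) | 4402 | 62.4% | | |
| Horizontal geometric design (alignment) | X5 | No (0) | 1884 | 26.7% | 0.984 | 1.016 |
| | | Yes (1) | 5168 | 73.3% | | |
| Pav. grip condition (dry and clean) | X6 | No (0) | 2431 | 34.5% | 0.981 | 1.020 |
| | | Yes (1) | 4621 | 65.5% | | |
| Heavy vehicle involvement | X7 | No (0) | 6704 | 95.1% | 0.984 | 1.016 |
| | | Yes (1) | 348 | 4.9% | | |
| Working weekday | X8 | No (0) | 1813 | 25.7% | 0.991 | 1.009 |
| | | Yes (1) | 5239 | 74.3% | | |

| Summary | N | % |
|---|---|---|
| Valid values | 7052 | 100% |
| Missing values | 0 | 0% |
| Total values | 7052 | |
| Subpopulation | 226 a | |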
a The dependent variable has only one value observed in 110 (48.7%) subpopulations.
Table 11. Summary of variables selected by the forward stepwise method.

| Step | Action | Variable code | AIC | BIC | −2 Log Likelihood | Chi-square (a) | df | Sig. (p-value) |
|---|---|---|---|---|---|---|---|---|
| 0 | Entered | Intercept | 2690.999 | 2704.721 | 2686.999 | | | |
| 1 | Entered | X1 | 1895.983 | 1923.427 | 1887.983 | 799.016 | 2 | <0.001 |
| 2 | Entered | X4 | 1725.956 | 1767.122 | 1713.956 | 174.027 | 2 | <0.001 |
| 3 | Entered | X3 | 1633.382 | 1688.271 | 1617.382 | 96.573 | 2 | <0.001 |
| 4 | Entered | X8 | 1566.389 | 1634.999 | 1546.389 | 70.993 | 2 | <0.001 |
| 5 | Entered | X7 | 1517.454 | 1599.787 | 1493.454 | 52.934 | 2 | <0.001 |
| 6 | Entered | X5 | 1485.635 | 1581.690 | 1457.635 | 35.819 | 2 | <0.001 |
| 7 | Entered | X2 | 1459.336 | 1569.113 | 1427.336 | 30.299 | 2 | <0.001 |
| 8 | Entered | X6 | 1450.037 | 1573.536 | 1414.037 | 13.299 | 2 | <0.001 |

Method: Forward Stepwise. (a) Variable entry is based on the chi-square likelihood ratio test. AIC is the Akaike Information Criterion. BIC is Schwarz’s Bayesian Information Criterion. df is the degrees of freedom.
Table 12. Model fitting information.

| Model | AIC | BIC | −2 Log Likelihood | Chi-square | df | Sig. (p-value) |
|---|---|---|---|---|---|---|
| Intercept only | 2690.999 | 2704.721 | 2686.999 | | | |
| Final | 1450.037 | 1573.536 | 1414.037 | 1272.962 | 16 | <0.001 |

Goodness-of-fit

| Pseudo R-square | Pseudo R2 |
|---|---|
| Cox and Snell | 0.165 |
| Nagelkerke | 0.217 |
| McFadden | 0.126 |
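The three pseudo R² values of Table 12 can be reproduced from the likelihood ratio chi-square of the final model (1272.962) and the category counts of Table 10. The sketch below uses the standard definitions and makes one assumption that should be stated plainly: the null log-likelihood is computed at case level from the marginal category counts, which is not the same quantity as the −2 log-likelihood that the table reports for the intercept-only model. Variable names are illustrative.

```python
import math

# Category counts of the dependent variable (Table 10)
counts = {"Pedestrian": 544, "Passenger": 1225, "Driver": 5283}
lr_chi2 = 1272.962          # likelihood ratio chi-square of the final model (Table 12)

n = sum(counts.values())

# ASSUMPTION: case-level log-likelihood of the intercept-only model, obtained
# from the marginal proportions of the three categories
ll_null = sum(c * math.log(c / n) for c in counts.values())

# Cox and Snell: 1 - (L0 / L1)^(2/n) = 1 - exp(-chi2 / n)
cox_snell = 1.0 - math.exp(-lr_chi2 / n)

# Nagelkerke: Cox and Snell rescaled by its maximum attainable value
nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))

# McFadden: 1 - LL1 / LL0 = chi2 / (-2 * LL0)
mcfadden = lr_chi2 / (-2.0 * ll_null)

print(f"Cox and Snell = {cox_snell:.3f}")
print(f"Nagelkerke    = {nagelkerke:.3f}")
print(f"McFadden      = {mcfadden:.3f}")
```

Under this assumption the three printed values agree with the goodness-of-fit block of the table to three decimals.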
Table 13. Independent variables likelihood ratio test results.

| Effect | AIC of reduced model | BIC of reduced model | −2 Log Likelihood of reduced model | Chi-square | df | Sig. (p-value) |
|---|---|---|---|---|---|---|
| Intercept | 1450.037 | 1573.536 | 1414.037 (a) | 0.000 | 0 | |
| X1 | 2179.746 | 2289.523 | 2147.746 | 733.709 | 2 | <0.001 |
| X2 | 1476.190 | 1585.967 | 1444.190 | 30.153 | 2 | <0.001 |
| X3 | 1552.724 | 1662.501 | 1520.724 | 106.687 | 2 | <0.001 |
| X4 | 1673.644 | 1783.421 | 1641.644 | 227.607 | 2 | <0.001 |
| X5 | 1478.061 | 1587.838 | 1446.061 | 32.024 | 2 | <0.001 |
| X6 | 1459.336 | 1569.113 | 1427.336 | 13.299 | 2 | <0.001 |
| X7 | 1496.907 | 1606.684 | 1464.907 | 50.870 | 2 | <0.001 |
| X8 | 1509.464 | 1619.241 | 1477.464 | 63.427 | 2 | <0.001 |

The chi-square statistic is the difference in −2 log-likelihood between the final model and a reduced model. The reduced model is formed by omitting an effect from the final model. The null hypothesis is that all parameters of that effect are 0. (a) This reduced model is equivalent to the final model because omitting the effect does not increase the degrees of freedom.
Table 14. Model parameter estimates.

| Type of person involved (a) | Parameter | β | Std. error | Wald | df | Sig. (b) (p-value) | Exp(β) (c) (odds ratio) | 95% CI for Exp(β): lower bound | 95% CI for Exp(β): upper bound |
|---|---|---|---|---|---|---|---|---|---|
| Passenger | Intercept | −7.548 | 0.627 | 144.972 | 1 | 0.000 | | | |
| | X1 | 0.780 | 0.057 | 184.951 | 1 | <0.001 | 2.182 | 1.950 | 2.442 |
| | X2 (=0) | 2.150 | 0.483 | 19.795 | 1 | <0.001 | 8.588 | 3.330 | 22.146 |
| | X3 (=0) | 1.434 | 0.230 | 38.994 | 1 | <0.001 | 4.197 | 2.676 | 6.584 |
| | X4 (=0) | 1.914 | 0.157 | 148.787 | 1 | <0.001 | 6.782 | 4.986 | 9.225 |
| | X5 (=0) | 0.734 | 0.135 | 29.719 | 1 | <0.001 | 2.084 | 1.600 | 2.714 |
| | X6 (=0) | 0.429 | 0.119 | 12.945 | 1 | <0.001 | 1.535 | 1.215 | 1.939 |
| | X7 (=0) | 1.789 | 0.284 | 39.806 | 1 | <0.001 | 5.983 | 3.432 | 10.429 |
| | X8 (=0) | 1.016 | 0.138 | 53.909 | 1 | <0.001 | 2.763 | 2.106 | 3.624 |
| Driver | Intercept | −2.781 | 0.430 | 41.928 | 1 | 0.000 | | | |
| | X1 | −0.030 | 0.052 | 0.346 | 1 | 0.556 | 0.970 | 0.876 | 1.074 |
| | X2 (=0) | 1.980 | 0.321 | 38.095 | 1 | <0.001 | 7.243 | 3.862 | 13.583 |
| | X3 (=0) | 2.043 | 0.182 | 125.901 | 1 | <0.001 | 7.712 | 5.398 | 11.019 |
| | X4 (=0) | 1.812 | 0.146 | 154.502 | 1 | <0.001 | 6.124 | 4.602 | 8.150 |
| | X5 (=0) | 0.465 | 0.119 | 15.351 | 1 | <0.001 | 1.592 | 1.262 | 2.009 |
| | X6 (=0) | 0.306 | 0.102 | 8.946 | 1 | 0.003 | 1.358 | 1.111 | 1.659 |
| | X7 (=0) | 0.551 | 0.188 | 8.610 | 1 | 0.003 | 1.735 | 1.201 | 2.508 |
| | X8 (=0) | 0.606 | 0.125 | 23.454 | 1 | <0.001 | 1.833 | 1.434 | 2.342 |

(a) The reference category is Pedestrian. (b) According to [35], an effect is statistically highly significant if p-value < 0.001; statistically significant if 0.001 < p-value < 0.05; statistically marginally or almost significant if 0.05 < p-value < 0.10; and not statistically significant if p-value > 0.10. (c) Colour code of the original table: “green”, positive influence; “black”, no influence.
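The Exp(β), confidence interval and Wald columns of Table 14 are all functions of the coefficient and its standard error: the odds ratio is exp(β), the 95% Wald interval is exp(β ± z·SE) with z ≈ 1.96, and the Wald statistic is (β/SE)². The sketch below applies these standard formulas to one row of the table (Driver, X5). Because the printed β and SE are rounded to three decimals, the recomputed values can differ slightly from the printed ones. Function names are illustrative.

```python
import math

Z_95 = 1.959964   # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z_95):
    """Return (odds ratio, lower bound, upper bound) of the Wald interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def wald_statistic(beta, se):
    """Wald chi-square statistic with 1 degree of freedom."""
    return (beta / se) ** 2


# Driver equation, X5 (=0): beta = 0.465, SE = 0.119 (Table 14)
or_x5, lower_x5, upper_x5 = odds_ratio_ci(0.465, 0.119)
wald_x5 = wald_statistic(0.465, 0.119)
print(f"odds ratio = {or_x5:.3f}, 95% CI = [{lower_x5:.3f}, {upper_x5:.3f}], Wald = {wald_x5:.2f}")
```

An interval that excludes 1, as here, corresponds to a coefficient that differs significantly from zero at the 5% level.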
Table 15. Classification.
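
| Observed | Predicted: Pedestrian | Predicted: Passenger | Predicted: Driver | Percent correct |
|---|---|---|---|---|
| Pedestrian | 26 | 6 | 512 | 4.8% |
| Passenger | 4 | 204 | 1017 | 16.7% |
| Driver | 19 | 230 | 5034 | 95.3% |
| Overall percentage | 0.7% | 6.2% | 93.1% | 74.6% |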
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Vieira, A.; Santos, B.; Picado-Santos, L. Modelling Road Work Zone Crashes’ Nature and Type of Person Involved Using Multinomial Logistic Regression. Sustainability 2023, 15, 2674. https://doi.org/10.3390/su15032674
