1. Introduction
The European Union is facing multiple interconnected challenges, from climate change to worsening air pollution, and from a stagnating number of road deaths to increasing urbanization. These challenges are exacerbated by rising obesity and an ageing population [1]. The rapid increase in motorization and the growing use of private motor vehicles affect non-renewable energy consumption, pollution, obesity, congestion, and collisions. What is more, the United Nations reported that 99% of the world’s urban population breathes polluted air [2]. Cities are responsible for more than 70% of global greenhouse gas emissions, which is a significant threat to human health worldwide, especially considering that more than half of the world’s population now lives in cities and that seven out of ten people are expected to live in urban areas by 2050.
Some European governments are currently applying walking strategies at a national level. Since 2017, the English government has adopted a Walking Investment Strategy [3] with the aim of increasing walking levels to 300 stages per person per year. A similar national walking promotion strategy has been in place in Finland since 2018 [4]. Among its targets, the Finnish program aims to increase the walking modal share by 30% by 2030. Including pedestrian safety in every step of the planning, design, implementation, and management process is another key factor in ensuring that pedestrians’ main problems are identified and then addressed.
Besides being carbon- and emission-free, walking is also the most common mode of transport, forming part of our everyday lives and trips. Progress in road safety has been made in recent years. Nevertheless, there is still evidence that safety improvements are not equally shared by all road users, and the safety of vulnerable road users has not improved as much as that of vehicle drivers. Indeed, pedestrian crashes still represent a serious issue in the EU. Over the period 2010–2018, the number of pedestrian deaths decreased by 2.6% on average each year in the EU, compared to a 3.1% annual reduction in motorised road user deaths [1]. In the same period, in Italy, the number of pedestrian deaths decreased annually by only 0.1% [5]. Zegeer and Bushell [6] further found a greater pedestrian risk in urban areas, where both pedestrian and vehicle activities are most intense. Thus, there is an ever-growing need for better knowledge among planners and engineers about the possible countermeasures that may balance the safety needs of pedestrians, drivers, and all other road users. For a serious shift to walking, mainly for local journeys in densely populated areas, the design of urban spaces needs to change, establishing a modal priority on the basis of the vulnerability of road users. Hence, a study on the identification of pedestrian crash patterns appears strategic for planning, designing, and managing a safer transport system and for guiding safer urban development.
Extensive prior research focused on identifying the contributory factors of severe and fatal crashes using econometric models, mainly the multinomial logit (e.g., [7,8,9,10]) and the ordered logit models [11,12]. The need for models capable of capturing unobserved heterogeneity and highlighting hidden correlations among data has led to the implementation of the mixed logit (or random parameters) model [13,14,15,16,17,18]. Currently, the mixed logit is considered a precise estimator and the most widely used, proven, and consolidated model that explicitly accounts for crash-specific variations in the effects of explanatory variables. The model allows the parameter effects to vary in magnitude across individual crashes, even ranging from negative to positive impacts [19], or to be fixed within an observation group [20].
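To make the random-parameters idea concrete, the following is a minimal Python sketch and not the model estimated in this study. It assumes a binary outcome (pedestrian crash or not), a single explanatory variable whose coefficient is normally distributed across crashes, and hypothetical values for the mean and standard deviation of that coefficient; all function names are illustrative. The probability is approximated by averaging the logit probability over random draws of the coefficient, and a second helper shows how a normally distributed coefficient can be positive for some crashes and negative for others.

```python
import math
import random


def logit_probability(utility: float) -> float:
    """Standard binary logit probability for a given utility."""
    return 1.0 / (1.0 + math.exp(-utility))


def simulated_mixed_logit_probability(
    x: float,
    intercept: float,
    beta_mean: float,
    beta_sd: float,
    n_draws: int = 5000,
    seed: int = 0,
) -> float:
    """Approximate a mixed logit probability by simulation.

    The coefficient of x is assumed to be normally distributed across
    crashes (hypothetical mean and standard deviation), so the probability
    is the average of logit probabilities over random draws of it.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        beta = rng.gauss(beta_mean, beta_sd)  # crash-specific coefficient
        total += logit_probability(intercept + beta * x)
    return total / n_draws


def share_of_positive_effects(beta_mean: float, beta_sd: float) -> float:
    """Share of crashes for which a normal random coefficient is positive."""
    return 0.5 * (1.0 + math.erf(beta_mean / (beta_sd * math.sqrt(2.0))))
```

With a standard deviation of zero the simulated value collapses to the ordinary fixed-parameter logit probability, which is a convenient sanity check for a sketch of this kind.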
According to the review of the existing literature, recent research has also applied machine learning algorithms. As data-driven models, they are to be preferred with large datasets [21]. They are free from the a priori probabilistic and parametric assumptions about the phenomena under study that are typical of econometric models. A downside of machine learning tools is their difficulty in uncovering causality. Nevertheless, some machine learning methods, such as the rule discovery technique and classification trees, show better capabilities in detecting valuable information. Particularly powerful for dealing with prediction and classification problems, association rules (e.g., [22,23,24,25,26]) and classification trees (e.g., [24]) have been used in several studies to identify sets of patterns or rules affecting pedestrian crash severity. Prior studies performed by Montella et al. [27] showed that both classification trees and association rules straightforwardly detected non-trivial associations among crash patterns and their interdependencies in the data. The tree structure allowed a graphical visualization of the phenomenon investigated, whereas the association rules revealed previously unknown information in the data. Moreover, the results provided by the two approaches were never conflicting, and the joint use of the two machine learning tools as complementary methods was encouraged.
Several studies investigated the possible advantages provided by the combined use of econometric models and machine learning tools [28,29]. The implicit assumption in developing a traditional statistical model is that it will reveal causal effects while preserving the best prediction accuracy. However, the latest applications of machine learning tools, together with the issues of causality in traditional statistical modelling, advise safety analysts to find a compromise between uncovering causality and prediction accuracy. The main finding of previous studies comparing logit models and data-driven methods is that the traditional models and the machine learning tools agree on many aspects, including the importance of the variables and the direction of association between several explanatory variables and the response variable, and that their joint use provides a trade-off between predictive accuracy and the soundness and interpretability of the results [13,14].
Since previous research found that the joint application of econometric and data-driven approaches is successful in providing non-trivial insights about crash contributory patterns and their interdependencies, this paper applied both an econometric model, namely the mixed logit model, and two machine learning tools, namely association rules and the classification tree algorithm, to evaluate the patterns contributing to a greater propensity for pedestrian crashes. These methods have generally been used to analyse crash severity, whereas this study applies them to detect the features associated with an increase in the proportion of pedestrian crashes.
The aims of the study are (1) to detect the road infrastructure, environmental, vehicle, and driver-related patterns that affect the overrepresentation of pedestrian crashes in Italy, and (2) to identify safety countermeasures to mitigate the detected pedestrian crash patterns.
The paper is organized as follows: Section 2 presents the crash data and the related descriptive statistics, Section 3 introduces the methodology, Section 4 provides the results on pedestrian crash occurrence, Section 5 compares the results provided by the different methods, Section 6 provides the discussion, and Section 7 presents the conclusions.
2. Crash Data
The Italian National Institute of Statistics (Istat, Rome, Italy) provided the crash data used in this study. The database includes only fatal and injury crashes that occurred on Italian roads from 2014 to 2018. Crash severity is recorded at two levels, injury crashes and fatal crashes, without distinction between slight and serious injuries. Consistent with the datasets from Australasia, the European Union, and the United States [30], the Istat database defines a fatal crash as a crash in which at least one person dies in the crash or within the 30 days following it. Crashes are classified through 118 variables describing the crash characteristics (including the time, the location, and the presumed circumstances of the crash), the roadway characteristics and the environmental conditions, the traffic units (including the vehicle characteristics), and the people involved in the crash (including the characteristics of drivers, passengers, and pedestrians). Further variables regarding detailed crash information and driver psychophysical states were provided by Istat for research support. The final dataset included 15 categorical variables and consisted of 874,847 crashes, of which 101,032 were pedestrian crashes (Table 1 and Table 2), representing 11.55% of the total crashes. Among the pedestrian crashes, 2.94% were fatal. Of all fatal crashes (n = 15,780), almost one out of five involved a pedestrian (18.81%).
The lighting variable, classified as a binary variable (day/night), was obtained by evaluating sunrise and sunset times with the “SUNCALC” R package.
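The day/night classification can be sketched as follows. This is a minimal Python illustration rather than the procedure actually run in the study: it assumes that sunrise and sunset have already been computed for the crash date and location, and the function name, the example times, and the choice to treat the exact sunrise instant as "day" are all assumptions.

```python
from datetime import datetime


def classify_lighting(crash_time: datetime, sunrise: datetime, sunset: datetime) -> str:
    """Return "day" if the crash happened between sunrise and sunset, else "night".

    Sunrise and sunset are assumed to be precomputed for the crash date and
    location (for example, by a solar-position library).
    """
    return "day" if sunrise <= crash_time < sunset else "night"


# Hypothetical sunrise and sunset times for one date and location.
sunrise = datetime(2018, 1, 15, 7, 35)
sunset = datetime(2018, 1, 15, 17, 0)

evening_crash = classify_lighting(datetime(2018, 1, 15, 18, 30), sunrise, sunset)
midday_crash = classify_lighting(datetime(2018, 1, 15, 12, 0), sunrise, sunset)
```

Because sunrise and sunset depend on both date and location, a real pipeline would look them up per crash record before applying a rule of this kind.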
5. Comparison between the Econometric and the Machine Learning Methods
To compare the results of the mixed logit and the machine learning models, the significant explanatory variables, as well as their impact on the probabilities of pedestrian crash occurrence, are discussed below.
5.1. Roadway
Area was identified as a contributory factor only by the rule discovery technique, with urban areas associated with pedestrian crash occurrence. Both the mixed logit and the machine learning tools, instead, identified the road type variable, providing consistent results and detecting an overrepresentation of pedestrian crashes on urban municipal roads. Consistency was also found for alignment: all the methods detected tangent alignment as a contributory pattern. The association rules further identified signalised and unsignalised intersections, combined with driver manoeuvring, as contributing to pedestrian crash occurrence.
5.2. Environment
Both the mixed logit model and the association rules identified the day of the week as a significant pattern: the probability of pedestrian crash occurrence increases on weekdays. Night-time increases the pedestrian crash propensity. Rainy and snowy weather conditions increase the likelihood of pedestrian crash occurrence. The effect of rain was captured by both the mixed logit model and the association rules, whereas fog and high winds, which contribute to a decrease in pedestrian crash occurrence, were significant only in the mixed logit.
5.3. Vehicles
The type of vehicle involved is decisive: it influences the likelihood of observing a pedestrian crash. The results of the mixed logit model and the association rules were consistent, pointing out that crashes involving a car or a truck, rather than a bike or a powered two-wheeler (PTW), are more likely to be pedestrian crashes. New vehicles (registered less than 10 years ago) are positively associated with pedestrian crash occurrence. These results suggest that innovations in vehicle technology intended to reduce the likelihood of crashes fail to detect pedestrians and do not take adequate account of their safety.
5.4. Drivers
Driver behaviour exhibited a significant effect in both the mixed logit model and the machine learning tools. Driver manoeuvring contributes to the overrepresentation of pedestrian crashes. The classification tree further found that inappropriate behaviour, such as speeding and travelling in the wrong direction, contributes to pedestrian crashes. Furthermore, the association rules and the classification tree identified drivers disobeying pedestrian crossing facilities as critical.
The relation between the driver psychophysical state and pedestrian crashes was identified only by the mixed logit model: poor eyesight is associated with an increase in pedestrian crash propensity.
Driver age was correlated with pedestrian crash overrepresentation; in particular, the involvement of elderly drivers (at least 75 years old) was identified by both groups of methods. Male driver involvement in pedestrian crash overrepresentation was found to be significant, with a random effect, only in the mixed logit.
6. Discussion
The study results identified several patterns associated with an overrepresentation of pedestrian crashes. The roadway attributes contributing to an increase in pedestrian crash propensity were urban areas, urban municipal roads, tangent alignment, and intersections combined with drivers’ manoeuvring. These results indicate that the roadway patterns affecting the occurrence of pedestrian crashes differ from those affecting pedestrian crash severity. Indeed, highly dense urban settings may provide more facilities for pedestrians whereas rural areas are likely to have poor infrastructure for accommodating pedestrians [36,37,38]. Despite this, pedestrian crashes are overrepresented on urban roads whereas fatal pedestrian crashes are overrepresented on other road types. Therefore, pedestrian-oriented safety countermeasures are strongly required for all road types. Based on the study results, on urban roads, special emphasis should be given to pedestrian treatments at mid-block locations. Walking should be prioritised in every new infrastructure scheme, in the design of regenerated streets in areas experiencing land development, and even during maintenance treatments. This may create an opportunity to reconsider aspects of street design that accommodate safe pedestrian mobility [39] and to better incorporate pedestrian–vehicle safety considerations at locations where pedestrian crashes are more likely to occur [40,41,42]. The establishment of a suitable road user hierarchy should be based on safety, vulnerability, and sustainability, with walking at the top of the hierarchy. The creation of pedestrian paths together with the reduction of space dedicated to vehicles is not easy for habitual road users to understand and accept. Hence, national, provincial, and municipal policies should work on public acceptance and emphasize each city’s interest and investment in developing safe and accessible streets.
Interestingly, the probability of pedestrian crashes at roundabouts is lower than at unsignalised and signalised intersections (ORs equal to 0.23, 0.38, and 0.44, respectively). Hence, roundabouts provide relevant safety benefits, decreasing the fatal pedestrian crash probability as well as the pedestrian crash probability. This is due to the reduction in pedestrian–vehicle conflict points and to lower vehicle speeds [43,44]. This result is quite relevant considering that roundabouts in Italy often have undesired features that negatively influence their safety [45,46]. Based on the study results, if the warrant conditions for the installation of roundabouts are satisfied, converting unsignalised and signalised intersections into roundabouts is strongly recommended. Refuge islands on the legs of roundabouts further increase pedestrian safety [47].
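As a reading aid for the odds ratios quoted above, the following minimal Python sketch shows how an odds ratio translates a baseline probability into an implied probability. The baseline used here is an assumption chosen purely for illustration (the overall pedestrian crash share of 11.55%); it is not the probability in the model's reference category, so the resulting values are not study results. The function and variable names are hypothetical.

```python
def apply_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Convert a baseline probability to the probability implied by an odds ratio."""
    if not 0.0 < baseline_probability < 1.0:
        raise ValueError("baseline_probability must be strictly between 0 and 1")
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Odds ratios reported in the text.
ODDS_RATIOS = {"roundabout": 0.23, "unsignalised": 0.38, "signalised": 0.44}

# Illustrative baseline only: the overall pedestrian crash share (11.55%),
# NOT the probability in the model's reference category.
BASELINE = 0.1155

implied = {name: apply_odds_ratio(BASELINE, value) for name, value in ODDS_RATIOS.items()}
```

Whatever baseline is assumed, an odds ratio below 1 lowers the implied probability, and the ordering of the three intersection types follows the ordering of their odds ratios.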
The environmental patterns associated with an increase in pedestrian crash propensity were night-time; dry pavement; wet pavement combined with older drivers (≥75) or with drivers’ manoeuvring; weekdays; the autumn, winter, and spring seasons; rain; and snow. Pedestrian visibility in darkness is a well-known safety concern. In the dark, both drivers’ and pedestrians’ sight is reduced, whereas their reaction times to avoid potential conflicts increase. Furthermore, higher driving speeds are generally observed at night, increasing the crash risk. The combination of these conditions increases the required braking distance of vehicles and leads to more severe impacts when crashes occur. Traffic calming as well as low-speed zones in areas with significant pedestrian activity are the most effective solutions to mitigate pedestrian crash frequency at night. Providing adequate pedestrian visibility at night further gives drivers sufficient time to identify and appropriately react to other road users and hazards [48]. Pedestrian visibility at night can be improved by lighting pedestrian crossings with light-emitting diodes (LEDs). Flashing in-curb LEDs, pedestrian-activated overhead beacons at crosswalks, and in-pavement warning lights with advance signing are effective strategies to warn motorists of pedestrian crossings, increasing their attention, especially at night [49,50]. Campaigns to raise awareness of the importance of using reflective clothing to improve pedestrian conspicuity at night are also recommended [51,52].
The vehicle patterns associated with an increase in pedestrian crash propensity were trucks, cars, and vehicles aged at most 10 years. Although the high severity of truck–pedestrian crashes has already been documented by prior research [53,54], this study further detected a detrimental relation between trucks and pedestrian crash occurrence. To mitigate the consequences of such crashes, traffic management strategies separating pedestrian flows from truck routes may be implemented.
The driver patterns associated with an increase in pedestrian crash propensity were manoeuvring, speeding, illegal travel direction, defective sight, very young age (≤17), medium age (45–64), and old age (≥65). Previous research found that complex vehicular manoeuvres increase pedestrian crash occurrence, mainly at intersections [55]. The speeding behaviour of drivers was also found to increase the risk of conflicts and the associated crash risk [56]. Driver disobedience at pedestrian crossing facilities was also identified as a pattern contributing to pedestrian crash overrepresentation. The mixed logit model showed a significant odds ratio (equal to 1.41) for drivers with sight issues, indicating an increased likelihood of pedestrian crashes. The rule discovery technique and the CART algorithm identified drivers’ disobedience at pedestrian crossing facilities as the strongest predictor. Consistent with previous studies [39], the quality and complexity of the walking environment, exacerbated by poor visibility in the proximity of road crossing opportunities, increase the possibility of pedestrian–vehicle conflicts. Empirical studies have proved the effectiveness of appropriate design modifications aimed at reducing pedestrian crashes and removing barriers to walking [6]. The use of bulb-outs to improve pedestrian visibility is further encouraged. Provided at junction corners, bulb-outs shorten the pedestrian crossing distance and offer a better view of oncoming vehicles. Previous research has found that their presence affects vehicle operating speeds: on-site measurements revealed lower speeds in sections where bulb-outs are located [57]. Other scholars suggest narrowing the road cross-section (bulb-outs) and introducing pedestrian crossings with blinking lights that turn on automatically when a pedestrian is detected [58]. Furthermore, safety awareness and education campaigns should target drivers on pedestrian right-of-way, since education campaigns are fundamental to stimulating safety-oriented actions.
This study further identified a greater propensity of older drivers for pedestrian crashes, probably because of their slower reaction times and more difficult interaction with pedestrians.
7. Conclusions
The investigation of the patterns affecting pedestrian crash occurrence is not as well developed a topic as pedestrian crash severity. Whereas many studies aimed at reducing fatal and severe pedestrian crashes, the main aim of this paper was to help raise awareness among practitioners and provide better guidance for planning and designing pedestrian infrastructure that is not only safe but also accessible and sustainable, in order to prevent pedestrian crashes and move towards a vision of walkable cities. This study used an econometric model, namely the mixed logit model, and two machine learning tools, namely the rule discovery technique and the CART algorithm, to analyse the road infrastructure, environmental, vehicle, and driver-related patterns affecting the overrepresentation of pedestrian crashes in Italy. These methods have generally been used to analyse crash severity, whereas this study applied them to detect the features affecting pedestrian crash occurrence.
The dataset contains 874,847 road crashes resulting in fatalities or injuries that occurred in Italy from 2014 to 2018. Of these, 101,032 were pedestrian crashes.
The results of the two groups of methods provide strong evidence of the importance of promoting sustainable urban complete street planning and development, as well as raising awareness in support of safer behaviour, if walking is to become an effective, and above all safe, solution against private car dependence, traffic noise, air pollution, health problems, and pedestrian vulnerability. To this aim, walking should be at the top of the hierarchy in every new infrastructure scheme as well as in street regeneration designs.
The methodological approach adopted in this study was effective in uncovering relations between road infrastructure, environmental, vehicle, and driver-related patterns and the overrepresentation of pedestrian crashes. The latest applications of machine learning tools suggest that analysts must opt for a compromise between prediction accuracy and uncovering causality, trying to achieve prediction accuracy while identifying exhaustive and reliable factors contributing to crashes. In this regard, the results of this study support the use of the econometric model and the machine learning tools as complementary approaches. The mixed logit provided an indication of the impact of each pattern on pedestrian crash occurrence, whereas the association rules and the classification tree detected the associations among the patterns, with insights into how the co-occurrence of several factors could be detrimental to pedestrian safety. Furthermore, the strength of the co-occurrence of the patterns affecting pedestrian crash occurrence can be measured via the lift increase for the association rules and the posterior classification ratio for the classification tree. The factors contributing most to pedestrian crashes are the patterns providing the highest increase in lift values (association rules) or the splitter modalities providing the highest proportion of pedestrian crashes in a node relative to the root node of the tree. By contrast, the mixed logit model provides information about the direction and magnitude of the effects of the variable indicators. Through the joint use of econometric methods and machine learning tools, the analyst can exploit the interpretability of the results of the econometric methods and the ability of the machine learning tools to provide comprehensible scenarios (such as those provided by association rules and the classification tree), further highlighting the co-occurrence and the relative strength of the patterns that contribute to vehicle–pedestrian crashes.
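The two strength measures mentioned above can be illustrated with a minimal Python sketch. It uses the standard definition of lift (rule confidence divided by the support of the consequent) and reads the posterior classification ratio as the share of pedestrian crashes in a node divided by the share in the root node, which is an interpretation of the description in the text rather than a formula taken from the study. The dataset totals come from the paper; the pattern counts and function names are hypothetical.

```python
TOTAL_CRASHES = 874_847        # from the dataset description
PEDESTRIAN_CRASHES = 101_032   # from the dataset description


def lift(n_antecedent: int, n_both: int, n_consequent: int, n_total: int) -> float:
    """Lift of a rule A -> B: confidence(A -> B) divided by support(B)."""
    confidence = n_both / n_antecedent
    consequent_support = n_consequent / n_total
    return confidence / consequent_support


def posterior_classification_ratio(
    n_node: int, n_node_pedestrian: int, n_root: int, n_root_pedestrian: int
) -> float:
    """Share of pedestrian crashes in a tree node relative to the root node."""
    return (n_node_pedestrian / n_node) / (n_root_pedestrian / n_root)


# Hypothetical counts: 10,000 crashes match a pattern, 3,000 of them pedestrian crashes.
example_lift = lift(10_000, 3_000, PEDESTRIAN_CRASHES, TOTAL_CRASHES)
example_pcr = posterior_classification_ratio(
    10_000, 3_000, TOTAL_CRASHES, PEDESTRIAN_CRASHES
)
```

Under these definitions, a value above 1 means that pedestrian crashes are overrepresented among the crashes matching the pattern. When a tree node and a rule antecedent select the same crashes, the two measures coincide, which is one way to see why the two tools can be read side by side.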
According to the results obtained in the study, safety countermeasures have been proposed. Including pedestrian safety in every step of the planning, design, implementation, and management process is a key factor in ensuring that pedestrians’ main problems are identified and addressed.
The insights gained from the study may help to raise awareness among local authorities and transport agencies in planning and designing appropriate spaces for pedestrians. Furthermore, the results of the study may also be considered by the automotive industry to address the important challenge of how vehicle onboard devices can prevent pedestrian crashes.
A significant contribution of this paper lies in the detection of the detrimental impact of drivers’ psychophysical states and behaviours on pedestrian crashes. The availability of such information in the data is crucial, as it reveals the need for safety awareness and education campaigns to promote safety-oriented actions.