Econometric and Machine Learning Methods to Identify Pedestrian Crash Patterns

Walking plays an important role in overcoming many current challenges, and governments and local authorities are encouraging healthy and environmentally sustainable lifestyles. Nevertheless, pedestrians are the most vulnerable road users, and crashes involving pedestrians are a serious concern. Thus, the identification of pedestrian crash patterns is crucial to select appropriate safety countermeasures. The aims of the study are (1) to identify the road infrastructure, environmental, vehicle, and driver-related patterns that are associated with an overrepresentation of pedestrian crashes, and (2) to identify safety countermeasures to mitigate the detected pedestrian crash patterns. The analysis applied an econometric model, namely the mixed logit model, and two machine learning tools, namely the association rules and the classification tree algorithm, to analyse the patterns contributing to the overrepresentation of pedestrian crashes in Italy. The dataset consists of 874,847 crashes, including 101,032 pedestrian crashes, that occurred in Italy from 2014 to 2018. The methodological approach adopted in the study was effective in uncovering relations among road infrastructure, environmental, vehicle, and driver-related patterns, and the overrepresentation of pedestrian crashes. The mixed logit provided an indication of the impact of each pattern on pedestrian crash occurrence, whereas the association rules and the classification tree detected the associations among the patterns, with insights on how the co-occurrence of several factors could be detrimental to pedestrian safety. Drivers' behaviour and psychophysical state turned out to be crucial patterns related to the overrepresentation of pedestrian crashes. Based on the identified crash patterns, safety countermeasures have been proposed.


Introduction
The European Union is facing multiple interconnected challenges, from climate change and worsening air pollution to a stagnant number of road deaths and increasing urbanization. These challenges are exacerbated by rising obesity and an ageing population [1]. The rapid increase in motorization and the growing use of private motor vehicles are affecting non-renewable energy consumption, pollution, obesity, congestion, and collisions. What is more, the United Nations reported that 99% of the world's urban population breathes polluted air [2]. Cities are responsible for more than 70% of global greenhouse gas emissions, which is a significant threat to human health worldwide, especially considering that more than half the world's population now lives in cities and that an estimated seven out of ten people will live in urban areas by 2050.
Among the EU countries, some governments are currently applying walking strategies at a national level. Since 2017, the English government has adopted a Walking Investment Strategy [3] with the aim of increasing walking to 300 stages per person per year. A similar national walking promotion strategy has been adopted in Finland since 2018 [4]. Among its targets, the Finnish programme aims to increase the walking modal share by 30% by 2030. Including pedestrian safety in every step of the planning, design, implementation, and management process is another key factor to ensure that the main problems faced by pedestrians are identified and then addressed.
Besides being carbon- and emission-free, walking is also the most common mode of transport, forming part of our everyday lives and trips. Progress in road safety has been made in recent years. Nevertheless, there is still evidence that safety improvements are not equally shared by all road users, and the safety of vulnerable road users has not improved as much as that of vehicle drivers. Pedestrian crashes, indeed, still represent a serious issue in the EU. Over the period 2010-2018, the number of pedestrian deaths decreased by 2.6% on average each year in the EU compared to a 3.1% annual reduction in motorised road user deaths [1]. In the same period, in Italy, the number of pedestrian deaths decreased annually by only 0.1% [5]. Zegeer and Bushell [6] further found a greater pedestrian risk in urban areas, where both pedestrian and vehicle activities are most intense. Thus, there is an ever-growing need for better knowledge among planners and engineers about the possible countermeasures that may balance the safety needs of pedestrians, drivers, and all road users. For a serious shift to walking, mainly for local journeys in densely populated areas, the design of urban spaces needs to change, establishing a modal priority on the basis of the vulnerability of road users. Hence, a study on the identification of pedestrian crash patterns appears strategic for planning, designing, and managing a safer transport system to guide safer urban development. Extensive prior research focused on the identification of contributory factors of severe and fatal crashes using econometric models, mainly the multinomial logit (e.g., [7][8][9][10]) and the ordered logit models [11,12]. The need for models capable of capturing unobserved heterogeneity and highlighting hidden correlations among data has led to the implementation of the mixed logit (or random parameters) model [13][14][15][16][17][18].
Currently, the mixed logit is considered a precise estimator and the most used, proven, and consolidated model that explicitly accounts for crash-specific variations in the effects of explanatory variables. The model implies that the parameter effects can vary in magnitude across individual crashes, also ranging from negative to positive impacts [19], or be fixed within an observation group [20].
According to the review of the existing literature, recent research has also applied machine learning algorithms. Recognized as data-driven models, they are to be preferred with large datasets [21]. They are free from the a priori probabilistic and parametric assumptions about the phenomena under study that are typical of the econometric models. A downside of the machine learning tools is their difficulty in uncovering causality. Nevertheless, some machine learning methods, such as the rule discovery technique and the classification trees, show better capabilities in detecting valuable information. Particularly powerful for dealing with prediction and classification problems, the association rules (e.g., [22][23][24][25][26]), as well as the classification trees (e.g., [24]), have been used in several studies to find patterns affecting pedestrian crash severity by identifying sets of patterns or rules. Prior studies performed by Montella et al. [27] showed that both the classification trees and the association rules straightforwardly detected non-trivial associations among crash patterns and their interdependencies in the data. The tree structure allowed a graphical visualization of the phenomenon investigated whereas the association rules revealed information previously unknown in the data. Moreover, the results provided by the two different approaches were never conflicting, and the joint use of the two machine learning tools as complementary methods was encouraged.
Several studies investigated the possible advantages provided by the combined use of econometric models and machine learning tools [28,29]. The implicit assumption in developing a traditional statistical model is that it will reveal causal effects while preserving the best prediction accuracy. However, the latest applications of machine learning tools, together with the issues of causality in traditional statistical modelling, advise safety analysts to find a compromise between uncovering causality and prediction accuracy. When choosing among the logit models or the data-driven methods, the main result provided by previous studies is that the traditional models and the machine learning tools agree on many aspects, including the importance of the variables and the direction of association between several explanatory variables and the response variable, and their joint use provides a trade-off between the predictive accuracy and the soundness and interpretability of the results [13,14].
Since previous research found that the joint application of the econometric and data-driven approaches is successful in providing non-trivial insights about crash contributory patterns and their interdependencies, this paper applied both an econometric model, namely the mixed logit model, and two machine learning tools, namely the association rules and the classification tree algorithm, to evaluate the patterns contributing to the greater propensity of pedestrian crashes. These methods have generally been used to analyse crash severity, whereas this study applies this methodological approach to detect the features associated with an increase in the proportion of pedestrian crashes.
The aims of the study are (1) to detect the road infrastructure, environmental, vehicle, and driver-related patterns that affect the overrepresentation of pedestrian crashes in Italy, and (2) to identify safety countermeasures to mitigate the detected pedestrian crash patterns.
The paper is organized as follows: Section 2 shows the crash data and the related descriptive statistics, Section 3 introduces the methodology, Section 4 provides the results of pedestrian crash occurrence, Section 5 reports a comparison of the results provided by the different methods, Section 6 provides the discussion followed in Section 7 by the conclusions.

Crash Data
The Italian National Institute of Statistics (Istat, Rome, Italy) provided the crash data used in this study. The database includes only fatal crashes or crashes with injuries that occurred on Italian roads from 2014 to 2018. Crash severity is recorded at two levels, injury crashes and fatal crashes, without distinction between slight and serious injuries. Consistent with the datasets from Australasia, the European Union, and the United States [30], the Istat database defines a fatal crash as a crash where at least one person dies in the crash or within the 30 days following it. Crashes are classified through 118 variables describing the crash characteristics (including the time, the location of the crash, and the presumed circumstances of crashes), the roadway characteristics and the environmental conditions, the traffic units (including the vehicle characteristics), and the people involved in the crash (including the characteristics of drivers, passengers, and pedestrians). Further variables regarding detailed crash information and driver psychophysical states were provided by Istat for research support. Finally, the dataset included 15 categorical variables and consisted of 874,847 crashes, of which 101,032 were pedestrian crashes (Tables 1 and 2), representing 11.55% of the total crashes. Among the pedestrian crashes, 2.94% were fatal. Regarding all fatal crashes (n = 15,780), almost one fatal crash out of five involved a pedestrian (18.81%). The variable lighting, classified as a binary variable (day/night), was obtained by evaluating the sunrise and sunset times with the "SUNCALC" R package.
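As a quick consistency check on the figures quoted above, the short Python sketch below recomputes the pedestrian share and the number of fatal pedestrian crashes implied by the two reported percentages. Only the counts and percentages stated in the text are used; the variable names are illustrative.

```python
# Counts and percentages as reported in the text (Istat data, 2014-2018).
TOTAL_CRASHES = 874_847
PEDESTRIAN_CRASHES = 101_032
FATAL_CRASHES = 15_780

# Share of all crashes that involve a pedestrian.
pedestrian_share = PEDESTRIAN_CRASHES / TOTAL_CRASHES

# Number of fatal pedestrian crashes implied by each reported percentage:
# 2.94% of pedestrian crashes, and 18.81% of all fatal crashes.
fatal_ped_from_ped_crashes = 0.0294 * PEDESTRIAN_CRASHES
fatal_ped_from_fatal_crashes = 0.1881 * FATAL_CRASHES

print(f"Pedestrian share of all crashes: {pedestrian_share:.2%}")
print(f"Implied fatal pedestrian crashes: "
      f"{fatal_ped_from_ped_crashes:.0f} vs {fatal_ped_from_fatal_crashes:.0f}")
```

The two implied counts differ only by the rounding of the published percentages, so the reported shares are mutually consistent.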

Method
This study presents the analysis of the road infrastructure, environmental, vehicle, and driver-related patterns affecting pedestrian crash propensity in Italy through the implementation of the mixed logit model, the rule discovery technique, and the CART algorithm. The entire dataset containing 874,847 crashes was used in the analysis. All 15 variables presented in Tables 1 and 2 were tested as potential explanatory variables. The dependent variable was pedestrian crash occurrence, a binary response equal to yes if a pedestrian crash occurred and no otherwise.

The Mixed Logit Model
The mixed logit model is a random utility model that associates each outcome category $j$ (in this study, a crash being classified as involving or not involving a pedestrian) with a utility $U_{ij}$ for crash $i$, given by the sum of the systematic component $V_{ij}$ and the unobservable stochastic error $\varepsilon_{ij}$:
$$U_{ij} = V_{ij} + \varepsilon_{ij} = \beta_j x_{ij} + \varepsilon_{ij} \qquad (1)$$
where $x_{ij}$ are the characteristics that may potentially affect a pedestrian crash, $\beta_j$ are the parameters to be estimated, and $\varepsilon_{ij}$ is the disturbance term. The assumption that the estimated parameters are fixed is relaxed, so that one or more coefficients can vary across crashes or be fixed within a group of crashes [20]. In that case, each $\beta_j$ can be random and is derived as:
$$\beta_j = \bar{\beta}_j + \sigma_j \nu_j \qquad (2)$$
where $\beta_j$ is the column vector of random parameters capturing unobserved crash-specific attributes, $\bar{\beta}_j$ is the mean of the random coefficient, $\sigma_j$ is its standard deviation, and $\nu_j$ is a random term with a specified distribution (standard normal for a normally distributed parameter). The probability $P_i(j)$ that a crash $i$ ($i = 1, ..., I$) is classified in category $j$ ($j = 1, ..., J$), that is, as a pedestrian crash or not a pedestrian crash, is given by:
$$P_i(j) = \int \frac{\exp(\beta_j x_{ij})}{\sum_{k=1}^{J} \exp(\beta_k x_{ik})} f(\beta \mid \theta)\, d\beta \qquad (3)$$
where $f(\beta \mid \theta)$ is the density function of $\beta$ and $\theta$ describes that density in terms of mean and variance. The model was developed using the forward stepwise procedure with a p-value at most equal to 0.05. Finally, McFadden's Pseudo $R^2$ index was used to assess how well the model fits the data:
$$\text{Pseudo } R^2 = 1 - \frac{LL_{full}}{LL_0} \qquad (4)$$
where $LL_{full}$ is the log-likelihood of the model of interest, which includes all statistically significant variables, and $LL_0$ is the log-likelihood of the null model.
The R-cran environment with "Rchoice" was used to perform the mixed logit model. For each significant coefficient, the Odds Ratio (OR) was assessed to evaluate the relative amount by which the odds of the outcome increased (OR > 1) or decreased (OR < 1) when the value of the corresponding indicator variable is set equal to 1.
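To make these ingredients more concrete, the following Python sketch approximates the probability of a binary mixed logit by averaging the logit formula over draws of a normally distributed coefficient, and also computes McFadden's Pseudo R² and an odds ratio. It is a minimal illustration and not the "Rchoice" procedure used in the study: the single-covariate setup, the function names, and all numerical inputs are assumptions made for the example.

```python
import math
import random


def logit_probability(intercept, beta, x):
    """Binary logit probability for the linear index intercept + beta * x."""
    return 1.0 / (1.0 + math.exp(-(intercept + beta * x)))


def simulated_mixed_logit_probability(intercept, mean, sd, x,
                                      draws=10_000, seed=42):
    """Monte Carlo approximation of a mixed logit probability.

    The random coefficient is drawn from a normal distribution with the
    given mean and standard deviation, and the logit probability is
    averaged over the draws.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        beta = rng.gauss(mean, sd)
        total += logit_probability(intercept, beta, x)
    return total / draws


def mcfadden_pseudo_r2(ll_full, ll_null):
    """McFadden's Pseudo R2: one minus the ratio of the log-likelihoods."""
    return 1.0 - ll_full / ll_null


def odds_ratio(coefficient):
    """Odds ratio of an indicator variable set to 1: exp(coefficient)."""
    return math.exp(coefficient)


# Hypothetical inputs, chosen only for illustration.
p = simulated_mixed_logit_probability(intercept=-2.0, mean=0.5, sd=0.3, x=1)
print(f"Simulated probability: {p:.3f}")
print(f"Pseudo R2: {mcfadden_pseudo_r2(-440.0, -1000.0):.2f}")
print(f"Odds ratio: {odds_ratio(0.5):.2f}")
```

Setting the standard deviation to zero collapses the simulation to the ordinary fixed-parameter logit, which is a convenient sanity check for this kind of sketch.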

Machine Learning Models
Two machine learning tools, namely association rules and classification trees, were used to detect pedestrian crash patterns.

Association Rules
The association rules are a descriptive-analytic method that extracts information from big data in the form of rules A→B. Each rule is made up of at least one pattern, called the antecedent (indicated with A), and a consequent (indicated with B). In our analysis, the consequent is the pedestrian crash. The Apriori algorithm (proposed by Agrawal et al. [31]) examines all candidate item-sets. The valid rules must satisfy minimum values of support, confidence, and lift. The support represents the percentage of the entire data set covered by the rule (Equation (5)), the confidence evaluates the reliability of the inference of the rule (Equation (6)), and the lift measures the statistical interdependence of the rule (Equation (7)):
$$S(A \rightarrow B) = \frac{\#(A \cap B)}{N} \qquad (5)$$
$$C(A \rightarrow B) = \frac{S(A \rightarrow B)}{S(A)} \qquad (6)$$
$$L(A \rightarrow B) = \frac{S(A \rightarrow B)}{S(A)\, S(B)} \qquad (7)$$
where $N$ is the total number of crashes, $\#(A \cap B)$ is the number of crashes in which both A and B occur, and $S(A) = \#(A)/N$ and $S(B) = \#(B)/N$ are the supports of the antecedent and of the consequent. Each rule with one antecedent and one consequent is a 2-item rule and is used as a starting point. Each rule with two antecedents and one consequent is a 3-item rule, and so on. Each rule with n + 1 items is retained only if its lift exceeds the lift of the corresponding n-item rule by at least 5% (lift increase, LIC) [32,33].
The LIC value is calculated as follows:
$$LIC = \frac{L(A_n \rightarrow B)}{L(A_{n-1} \rightarrow B)} \qquad (8)$$
where $A_{n-1}$ is the antecedent of the (n-1)-item rule and $A_n$ is the antecedent of the n-item rule. Support (S), confidence (C), lift (L), and lift increase (LIC) threshold values were set as follows: S ≥ 0.1%, C ≥ 4.0%, L ≥ 1.2, and LIC ≥ 1.05. The association rules were mined in the R-cran software environment using the package "arules".
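The four rule measures can be illustrated on a toy data set. The Python sketch below is not the "arules" implementation used in the study; it computes support, confidence, lift, and LIC directly from their definitions on ten invented crash records, and the item labels and function names are assumptions. Only the threshold values are taken from the text.

```python
from typing import FrozenSet, List, Optional, Set

Crash = FrozenSet[str]

# Thresholds as stated in the text.
MIN_SUPPORT, MIN_CONFIDENCE, MIN_LIFT, MIN_LIC = 0.001, 0.04, 1.2, 1.05


def support(crashes: List[Crash], items: Set[str]) -> float:
    """Share of crashes that contain every item in `items`."""
    return sum(1 for crash in crashes if items <= crash) / len(crashes)


def confidence(crashes: List[Crash], antecedent: Set[str], consequent: str) -> float:
    """Support of the whole rule divided by the support of the antecedent."""
    return support(crashes, antecedent | {consequent}) / support(crashes, antecedent)


def lift(crashes: List[Crash], antecedent: Set[str], consequent: str) -> float:
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(crashes, antecedent, consequent) / support(crashes, {consequent})


def lift_increase(crashes: List[Crash], shorter: Set[str], longer: Set[str],
                  consequent: str) -> float:
    """Lift of the n-item rule over the lift of the (n-1)-item rule."""
    return lift(crashes, longer, consequent) / lift(crashes, shorter, consequent)


def meets_thresholds(s: float, c: float, l: float, lic: Optional[float] = None) -> bool:
    """Check a rule against the thresholds; LIC applies only to extended rules."""
    ok = s >= MIN_SUPPORT and c >= MIN_CONFIDENCE and l >= MIN_LIFT
    return ok and (lic is None or lic >= MIN_LIC)


# Ten invented crash records (hypothetical data).
crashes: List[Crash] = [
    frozenset({"manoeuvring", "intersection", "pedestrian"}),
    frozenset({"manoeuvring", "intersection", "pedestrian"}),
    frozenset({"manoeuvring", "pedestrian"}),
    frozenset({"manoeuvring"}),
    frozenset({"intersection"}),
    frozenset({"intersection"}),
    frozenset({"pedestrian"}),
    frozenset(), frozenset(), frozenset(),
]

two_item = {"manoeuvring"}
three_item = {"manoeuvring", "intersection"}
print(lift(crashes, two_item, "pedestrian"))
print(lift(crashes, three_item, "pedestrian"))
print(lift_increase(crashes, two_item, three_item, "pedestrian"))
```

In this toy example the three-item rule has a higher lift than the two-item rule it extends, so it would pass the LIC check; with real data the same comparison is repeated each time an antecedent is added.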

Classification Trees
A classification tree is an oriented graph where the root node (containing all data) is divided by a splitter into a finite number of leaf nodes [34]. We developed the CART binary tree proposed by Breiman et al. [35]. Each of the road infrastructure, environmental, vehicle, and driver-related patterns considered in the study is a candidate for splitting. The splitting variable is chosen to separate the observations into two groups that are as homogeneous as feasible. To perform each split, the Gini index, or node impurity, is assessed (as a measure of the total variance among all classes in the node). The impurity is given by:
$$i_Y(t) = 1 - \sum_{j} p(j \mid t)^2 \qquad (9)$$
where $i_Y(t)$ is the impurity of node $t$ and $p(j \mid t)$ is the proportion of crashes in node $t$ belonging to class $j$. The total impurity of any tree $T$ is given by:
$$i_Y(T) = \sum_{t \in \tilde{T}} i_Y(t)\, p(t) \qquad (10)$$
where $i_Y(T)$ is the total impurity of the tree $T$, $p(t) = N(t)/N$ is the weight of node $t$, $N(t)$ is the number of crashes falling in node $t$, $N$ is the total number of crashes, and $\tilde{T}$ is the set of terminal nodes of the tree $T$.
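The impurity calculation can be sketched in a few lines of Python. The example below is a minimal illustration, not the SPSS CART procedure used in the study: it computes the Gini impurity of a node and the weighted impurity of a set of terminal nodes, using invented class counts and hypothetical function names.

```python
from typing import Dict, List


def gini_impurity(class_counts: Dict[str, int]) -> float:
    """Gini impurity of a node: one minus the sum of squared class shares."""
    n = sum(class_counts.values())
    return 1.0 - sum((count / n) ** 2 for count in class_counts.values())


def tree_impurity(terminal_nodes: List[Dict[str, int]]) -> float:
    """Total impurity: node impurities weighted by each node's share of crashes."""
    total = sum(sum(node.values()) for node in terminal_nodes)
    return sum(gini_impurity(node) * sum(node.values()) / total
               for node in terminal_nodes)


# Invented counts: a root node and the two children produced by one split.
root = {"pedestrian": 12, "other": 88}
left = {"pedestrian": 10, "other": 10}
right = {"pedestrian": 2, "other": 78}

impurity_before = gini_impurity(root)
impurity_after = tree_impurity([left, right])
impurity_reduction = impurity_before - impurity_after
print(impurity_before, impurity_after, impurity_reduction)
```

A candidate split is attractive when the impurity reduction is large; CART evaluates this quantity for every candidate splitter and keeps the best one.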
The tree growing process was stopped based on two criteria: (1) the impurity reduction is less than 0.0001 (minimum default value); and (2) the tree can have at most four levels. At each node, the class assignment depends on the greatest value of the posterior classification ratio (PCR). The PCR compares the classification in a node with the root node classification [27]:
$$PCR(j \mid t) = \frac{p(j \mid t)}{p(j \mid t_{root})} \qquad (11)$$
where $p(j \mid t)$ is the proportion of crashes in node $t$ belonging to class $j$ and $t_{root}$ is the tree root node.
For each node, the class $j^*$ with the greatest value of PCR is assigned to that node:
$$j^*(t) = \arg\max_{j} PCR(j \mid t) \qquad (12)$$
Then, to integrate the classification tree and the association rule discovery results, the classification tree was transformed into rules. All the splits are the antecedents of the rule while the class $j^*$ determines the consequent. The association rule thresholds of support (S), confidence (C), lift (L), and lift increase (LIC) were also evaluated for each terminal node $t$.
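The effect of assigning classes by PCR rather than by simple majority can be seen in a small Python sketch. The class counts and function names below are invented for illustration and do not come from the study's tree.

```python
from typing import Dict


def posterior_classification_ratio(node: Dict[str, int],
                                   root: Dict[str, int]) -> Dict[str, float]:
    """Class share in the node divided by the class share in the root node."""
    n_node = sum(node.values())
    n_root = sum(root.values())
    return {cls: (node.get(cls, 0) / n_node) / (root[cls] / n_root)
            for cls in root}


def assign_class(node: Dict[str, int], root: Dict[str, int]) -> str:
    """Return the class with the greatest PCR in the node."""
    ratios = posterior_classification_ratio(node, root)
    return max(ratios, key=ratios.get)


# Invented counts (hypothetical): pedestrian crashes are a minority at the root.
root = {"pedestrian": 12, "other": 88}
node_a = {"pedestrian": 10, "other": 10}
node_b = {"pedestrian": 2, "other": 78}

print(posterior_classification_ratio(node_a, root), assign_class(node_a, root))
print(posterior_classification_ratio(node_b, root), assign_class(node_b, root))
```

In the first node pedestrian crashes are only half of the observations, yet the node is assigned to the pedestrian class because that share is far above the root share. This is why the PCR suits a minority outcome such as pedestrian crashes.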
The classification tree was carried out with SPSS 26 software (IBM, Armonk, NY, USA).

Mixed Logit Model
The mixed logit model exhibited a McFadden Pseudo R² of 0.56, indicating an excellent fit. Overall, 14 independent variables and 44 indicators were statistically significant (see Table 3) with fixed effects. The indicator variable for male driver gender resulted in a normally distributed random parameter with a statistically significant standard deviation, indicating the presence of unobserved heterogeneity in the data. The mean and standard deviation were respectively equal to 0.18 and 0.17, implying that for 86% of the crashes the probability of a pedestrian crash is increased by the presence of a male driver whereas, for the remaining 14% of the crashes, it is decreased.

Roadway
As expected, urban municipal roads, considered as the baseline, show a greater propensity for pedestrian crashes while motorways show a lower propensity. Road alignment has a key role in pedestrian crashes. The simplest alignment, the tangent segment, has a higher propensity for pedestrian crashes while roundabouts have a lower probability of pedestrian crashes (OR = 0.23). Interestingly, pedestrian crashes at roundabouts are underrepresented compared to signalised and unsignalised intersections.

Environment
Results show a statistically significant higher probability of pedestrian crashes on weekdays, in winter (OR = 1.58), autumn (OR = 1.43), and spring (OR = 1.19), and in darkness. Notably, the weather conditions associated with pedestrian crashes are rain and snow, while wet, snowy, and slippery pavements are all associated with a decrease in pedestrian crash probability.

Vehicles
Assuming cars as the baseline condition, trucks are overrepresented in pedestrian crashes (OR = 1.24), while the involvement of powered two-wheelers (PTWs) and bicycles shows a lower probability of pedestrian crashes (OR = 0.69 and 0.36, respectively). Furthermore, older vehicles and vehicles with defects have a lower probability of pedestrian crashes (i.e., a higher probability of other crash types).

Drivers
The significant driver variables were behaviour (with a positive coefficient for manoeuvring, which includes right-turn, left-turn, and U-turn manoeuvres), psychophysical state (with a positive coefficient for defective eyesight, OR = 4.10), age (with an increase in the probability of being involved in a pedestrian crash for older drivers), and gender (a random parameter, with male gender associated with a higher probability of pedestrian crashes for 86% of the observations, OR = 1.20).
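The split between crashes where a male driver raises the pedestrian crash probability and crashes where it lowers it follows from the normal distribution of the random parameter. The Python sketch below recomputes it from the mean (0.18) and standard deviation (0.17) reported for this parameter in the mixed logit results; the function names are illustrative.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def share_with_positive_effect(mean: float, sd: float) -> float:
    """Probability that a normal random parameter is greater than zero."""
    return normal_cdf(mean / sd)


# Mean and standard deviation reported for the male-driver parameter.
MEAN, SD = 0.18, 0.17

share_positive = share_with_positive_effect(MEAN, SD)
odds_ratio_at_mean = math.exp(MEAN)

print(f"Share of crashes with a positive effect: {share_positive:.3f}")
print(f"Odds ratio at the mean coefficient: {odds_ratio_at_mean:.2f}")
```

With these inputs the share is about 0.855 and the odds ratio about 1.20, in line with the roughly 86%/14% split and the OR reported in the text.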

Machine Learning Models
The rule discovery tool generated 63 valid rules. In detail, the algorithm identified three two-item rules, 14 three-item rules, and 31 four-item rules (Table 4), as well as 15 five-item rules (Table 5). The rules were ordered by decreasing lift and then grouped according to the number of items.
The CART tree is reported in Figure 1. The algorithm provided eight terminal nodes, two of which (node 1 and node 14) predicted pedestrian crashes and are shown in red. Moreover, only these nodes (rules T_1 and T_14 in Table 6) satisfied the LIC criterion (Equation (8)), identifying driver behaviour, road type, and alignment as predictors. The PCR was evaluated for all the nodes. However, in the tree, it is reported only for the terminal nodes to show how representative each terminal node is of the predicted class. Node 1 exhibited a very high PCR, equal to 8.70, which indicates the robustness of this terminal node for pedestrian crash classification.

Roadway
Together with drivers aged ≥75 and drivers manoeuvring, tangent alignment, intersections, urban areas, and urban municipal roads were associated with pedestrian crashes. Urban roads and tangent alignment were also patterns identified by the tree.

Environment
The rules highlighted environmental conditions associated with pedestrian crashes such as night-time, wet pavement, rainy weather, winter, and autumn.

Vehicles
As regards vehicle type, both trucks and cars were associated with pedestrian crashes. As for the vehicle age, newer cars were associated with pedestrian crashes.

Drivers
All rules have a driver factor as the first antecedent. Among them, thirty-nine rules have elderly drivers (aged ≥75) as the first antecedent, twenty-three rules have driver manoeuvring as the first antecedent, and one rule has the driver's failure to yield to pedestrians on a zebra crossing as the antecedent (rule 1, L = 8.66). The rule with the driver's failure to yield to pedestrians on a zebra crossing was also identified by the classification tree (rule T_1, L = 8.66). Driver behaviour was also the primary split of the classification tree.

Interaction among Contributory Factors
The association rules and the classification tree showed several combinations of patterns associated with an overrepresentation of pedestrian crashes (Tables 4-6). The combinations of driver manoeuvring and intersection (rules 38 and 39) were identified as the strongest three-item rules, with a lift greater than 7 and a LIC greater than 4, meaning that vehicle manoeuvring at intersections is associated with a greater probability of pedestrian crashes than vehicle manoeuvring on segments or at roundabouts. Adding car involvement to driver manoeuvring and unsignalised intersection (rule 39) increased the lift to 8.61 (rule 40). The five-item rule with the highest lift included the combined presence of older drivers (≥75), night-time, tangent alignment, and urban municipal road (rule 49, L = 4.26). Manoeuvring, speeding, and illegal travel direction were also identified by the classification tree (rule T_14, L = 1.70), combined with urban roads on tangent alignment. The association of such driver behaviours with urban roads on tangents increases the probability of the occurrence of pedestrian crashes by almost 50%.
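Because the lift is the ratio between a rule's confidence and the overall share of pedestrian crashes (Equations (6) and (7)), a reported lift can be converted back into the share of pedestrian crashes among the crashes matching the rule. The Python sketch below does this for three of the lifts quoted above, using the dataset totals from the crash data section. The lifts are rounded in the text, so the results are approximate, and the function name is illustrative.

```python
# Overall share of pedestrian crashes in the dataset (101,032 of 874,847).
BASELINE_SHARE = 101_032 / 874_847


def implied_confidence(lift: float, baseline: float = BASELINE_SHARE) -> float:
    """Share of pedestrian crashes among crashes matching a rule: lift * baseline."""
    return lift * baseline


# Lifts as quoted in the text.
reported_lifts = {"rule 40": 8.61, "rule 49": 4.26, "rule T_14": 1.70}
implied = {rule: implied_confidence(value) for rule, value in reported_lifts.items()}

for rule, share in implied.items():
    print(f"{rule}: about {share:.1%} of matching crashes involve a pedestrian")
```

Read this way, a lift close to 8.6 means that almost every crash matching the rule involves a pedestrian, whereas a lift of 1.70 corresponds to roughly one matching crash in five.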

Comparison between the Econometric and the Machine Learning Methods
To compare the results of the mixed logit and the machine learning models, the significant explanatory variables, as well as their impact on the probabilities of pedestrian crash occurrence, are discussed below.

Roadway
Area as a contributory factor was identified only by the rule discovery technique with the urban areas associated with the pedestrian crash occurrence. Both the mixed logit and the machine learning tools, instead, identified the road type variable. They provided consistent results detecting an overrepresentation of pedestrian crashes on urban municipal roads. Consistency was also found for alignment. All the methods detected the tangent alignment as a contributory pattern. The association rules further identified signalised and unsignalised intersections, combined with driver's manoeuvring, contributing to the pedestrian crash occurrence.

Environment
Both the mixed logit model and the association rules identified the day of the week as a significant pattern: the probability of pedestrian crash occurrence increases on weekdays. Night-time increases the pedestrian crash propensity. Rainy and snowy weather conditions increase the likelihood of pedestrian crash occurrence. The effect of rain was captured by both the mixed logit model and the association rules, whereas fog and high winds, which contribute to a decrease in pedestrian crash occurrence, were significant only in the mixed logit.

Vehicles
The type of vehicle involved is decisive, as it influences the likelihood of observing a pedestrian crash. The results of the mixed logit model and the association rules were consistent, pointing out that cars and trucks, rather than bicycles and PTWs, are associated with a higher risk of pedestrian crash occurrence. New vehicles (vehicles registered less than 10 years ago) are positively associated with pedestrian crashes. These results suggest that the innovation in vehicle technology equipment intended to reduce the likelihood of crashes fails to detect pedestrians and does not take adequate account of their safety.

Drivers
Driver behaviour exhibited a significant effect in both the mixed logit model and the machine learning tools. Driver manoeuvring contributes to the overrepresentation of pedestrian crashes. Inappropriate behaviour, such as speeding and travelling in the wrong direction, was found by the classification tree to contribute further to pedestrian crashes. Furthermore, the association rules and the classification tree identified drivers disobeying pedestrian crossing facilities as critical.
The relation between the driver psychophysical state and the pedestrian crashes was identified only by the mixed logit model. Poor eyesight conditions involve an increase in pedestrian crash propensity.
Driver age was correlated with pedestrian crash overrepresentation; in particular, the involvement of elderly drivers (at least 75 years old) was identified by both groups of methods. Male driver involvement in pedestrian crash overrepresentation was found significant, with a random effect, only in the mixed logit.

Discussion
The study results identified several patterns associated with an overrepresentation of pedestrian crashes. The roadway attributes contributing to an increase in pedestrian crash propensity were urban areas, urban municipal roads, tangent alignment, and intersections combined with driver manoeuvring. These results indicate that the roadway patterns affecting the occurrence of pedestrian crashes differ from those affecting pedestrian crash severity. Indeed, highly dense urban settings may provide more facilities for pedestrians whereas, in rural areas, the infrastructure accommodating pedestrians is likely to be poor [36][37][38]. Despite this, pedestrian crashes are overrepresented on urban roads whereas fatal pedestrian crashes are overrepresented on other road types. Therefore, pedestrian-oriented safety countermeasures are strongly required for all road types. Based on the study results, on urban roads, special emphasis should be given to pedestrian treatments at mid-block locations. Walking should be prioritised in every new infrastructure scheme, when regenerating streets in areas experiencing land development, and even during maintenance treatments. This may create an opportunity to reconsider some aspects of street design so as to accommodate safe pedestrian mobility [39] and better incorporate pedestrian-vehicle safety considerations at locations where pedestrian crashes are more likely to occur [40][41][42]. The establishment of a suitable road user hierarchy should be based on safety, vulnerability, and sustainability, with walking at the top of the hierarchy. The creation of pedestrian paths together with the reduction of space allocated to vehicles is not easy for habitual road users to understand and accept. Hence, national, provincial, and municipal policies should work on public acceptance and emphasize each city's interest and investment in developing safe and accessible streets that allow for safe movements.
Interestingly, the probability of pedestrian crashes at roundabouts is lower than at unsignalised and signalised intersections (ORs respectively equal to 0.23, 0.38, and 0.44). Hence, the safety benefits of roundabouts are relevant both in decreasing the fatal pedestrian crash probability and in reducing the pedestrian crash probability. This is due to the reduction of pedestrian-vehicle conflict points and to lower vehicle speeds [43,44]. This is quite a relevant result considering that in Italy there are often roundabouts with undesired features that negatively influence roundabout safety [45,46]. Based on the study results, if the warrant conditions for the installation of roundabouts are satisfied, converting unsignalised and signalised intersections into roundabouts is strongly recommended. Refuge islands on the legs of roundabouts further increase the safety of pedestrians [47].
The environmental patterns affecting the increase in pedestrian crash propensity were night-time, dry pavement, wet pavement combined with older drivers (≥75) or with driver manoeuvring, weekdays, the autumn, winter, and spring seasons, rain, and snow. Pedestrian visibility in darkness is a well-known safety concern. Both drivers' and pedestrians' sight is reduced in darkness, whereas their reaction times to avoid potential conflicts increase. Furthermore, higher driving speeds are generally observed at night, increasing the crash risk. The combination of these conditions increases the required braking distance of vehicles and leads to a higher impact at the time of the crash.
Traffic calming as well as low-speed zones in areas with significant pedestrian activity are the most effective solutions to mitigate pedestrian crash frequency at night. Adequate pedestrian visibility during the night-time further gives drivers sufficient time to identify and appropriately react to other road users and hazards [48]. Pedestrian visibility during the night-time can be improved by lighting pedestrian crossings with light-emitting diodes (LEDs). Flashing in-curb LEDs as well as pedestrian-activated overhead beacons at crosswalks or in-pavement warning lights with advance signing are effective strategies to warn motorists of pedestrian crossings, increasing their attention, especially at night [49,50]. Campaigns to raise awareness of the importance of using reflective clothing can also improve pedestrian conspicuity at night [51,52].
The vehicle patterns affecting the increase in pedestrian crash propensity were truck, car, and vehicles aged at most 10 years. Although the severity of truck-pedestrian crashes has already been found by prior research [53,54], this study further detected a detrimental relation between trucks and pedestrian crash occurrence. To mitigate the consequences of such crashes, traffic management strategies may be implemented separating pedestrian flow and truck routes.
The driver patterns associated with an increase in pedestrian crash propensity were manoeuvring, speeding, illegal travel direction, defective sight, very young age (≤17), medium age (45-64), and old age (≥65). Previous research found that complex vehicular manoeuvres increase pedestrian crash occurrence, mainly at intersections [55]. Drivers' speeding behaviour was also found to increase the risk of conflicts and the associated crash risk [56]. Drivers' disobedience at pedestrian crossing facilities was also identified as a pattern contributing to pedestrian crash overrepresentation. The mixed logit model showed a significant odds ratio (equal to 1.41) for drivers with sight issues, indicating an increased likelihood of pedestrian crashes. The rule discovery and the CART algorithm identified drivers' disobedience at pedestrian crossing facilities as the strongest predictor. Consistent with previous studies [39], the quality and complexity of the walking environment, exacerbated by poor visibility in the proximity of road crossing opportunities, increase the possibility of pedestrian-vehicle conflicts. Empirical studies have proved the effectiveness of appropriate design modifications aimed at reducing pedestrian crashes and removing barriers to walking [6]. The use of bulb-outs to improve pedestrian visibility is further encouraged. Provided at junction corners, bulb-outs shorten the pedestrian crossing distance and offer a better view of oncoming vehicles. Previous research has found that their presence affects vehicles' operating speeds: on-site measurements revealed lower speeds in sections where bulb-outs are located [57]. Other scholars suggest narrowing the road cross-section (bulb-outs) and introducing pedestrian crossings with blinking lights that turn on automatically when a pedestrian is detected [58]. Furthermore, safety awareness and education campaigns should target drivers on pedestrian right-of-way.
To stimulate individuals towards safety-oriented actions, education campaigns are fundamental.
This study further identified a greater propensity of older drivers for pedestrian crashes, probably because of their slower reaction times and their greater difficulty in interacting with pedestrians.
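As a reading aid for the odds ratio of 1.41 reported above for drivers with sight issues, the following minimal Python sketch shows how an odds ratio relates to a logit coefficient and to a change in probability. It assumes that the odds ratio is the exponential of the (mean) coefficient, as is usual for logit-type models, and it uses the overall share of pedestrian crashes in the dataset as a purely illustrative baseline; the function names are hypothetical and the shifted probability is not a result of the study.

```python
import math

# Odds ratio reported in the text for drivers with sight issues.
ODDS_RATIO_DEFECTIVE_SIGHT = 1.41

# Crash counts reported for the 2014-2018 dataset.
N_CRASHES = 874_847
N_PEDESTRIAN_CRASHES = 101_032


def coefficient_from_odds_ratio(odds_ratio: float) -> float:
    """Return the logit coefficient, assuming odds_ratio = exp(coefficient)."""
    return math.log(odds_ratio)


def shifted_probability(baseline_probability: float, odds_ratio: float) -> float:
    """Apply an odds ratio to a baseline probability, all else held fixed."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


beta = coefficient_from_odds_ratio(ODDS_RATIO_DEFECTIVE_SIGHT)

# Illustrative baseline only: the sample share of pedestrian crashes,
# not a model-based probability.
baseline = N_PEDESTRIAN_CRASHES / N_CRASHES
shifted = shifted_probability(baseline, ODDS_RATIO_DEFECTIVE_SIGHT)

print(f"coefficient = {beta:.3f}")
print(f"baseline share = {baseline:.3f}, shifted share = {shifted:.3f}")
```

The sketch makes explicit that an odds ratio multiplies the odds, not the probability, so the change in probability depends on the baseline chosen.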

Conclusions
The patterns affecting pedestrian crash occurrence have not been investigated as thoroughly as those affecting pedestrian crash severity. Whereas many studies have aimed at reducing fatal and severe pedestrian crashes, the main aim of this paper was to help raise awareness among practitioners and provide better guidance for planning and designing pedestrian infrastructure that is safe, but also accessible and sustainable, so as to prevent pedestrian crashes and move towards a vision of walkable cities. This study used an econometric model, namely the mixed logit model, together with the rule discovery technique and the CART algorithm as machine learning tools, to analyse the road infrastructure, environmental, vehicle, and driver-related patterns affecting pedestrian crash overrepresentation in Italy. The mixed logit, the rule discovery, and the CART algorithm have generally been used to analyse crash severity, whereas this study applied this methodological approach to detect the features affecting pedestrian crash occurrence.
The dataset contains 874,847 road crashes resulting in fatalities or injuries that occurred in Italy from 2014 to 2018. Of these, 101,032 were pedestrian crashes.
The results of the two groups of methods provide strong evidence of the importance of promoting sustainable urban complete-street planning and development, as well as raising awareness in support of safer behaviour, if walking is to become an effective, and above all safe, solution to private car dependence, traffic noise, air pollution, disease, and pedestrian vulnerability. To this end, walking should be at the top of the hierarchy in every new infrastructure scheme as well as in street regeneration designs.
The methodological approach adopted in this study was effective in uncovering relations among road infrastructure, environmental, vehicle, and driver-related patterns, and the overrepresentation of pedestrian crashes. The latest applications of machine learning tools suggest that analysts must accept a compromise between prediction accuracy and uncovering causality, trying to achieve accurate predictions while also identifying exhaustive and reliable factors contributing to crashes. Nevertheless, the results of this study support the use of the econometric model and the machine learning tools as complementary approaches. The mixed logit provided a clue on the impact of each pattern on pedestrian crash occurrence, whereas the association rules and the classification tree detected the associations among the patterns, with insights on how the co-occurrence of several factors could be detrimental to pedestrian safety. Furthermore, the strength of the co-occurrence of the patterns affecting pedestrian crash occurrence can be measured via the lift increase for the association rules and the posterior classification ratio for the classification tree. The factors contributing most to pedestrian crashes are the patterns providing the highest increase in the lift values (association rules) or the splitter modalities providing the highest proportion of pedestrian crashes in a node relative to the root node of the tree. By contrast, the mixed logit model provides information about the direction and magnitude of the effect of each variable indicator. By jointly using the econometric methods and the machine learning tools, the analyst can exploit both the interpretability of the econometric results and the ability of the machine learning tools to provide comprehensible scenarios (such as those provided by the association rules and the classification tree), further highlighting the co-occurrence and the relative strength of the patterns that contribute to vehicle-pedestrian crashes.
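The two strength measures mentioned above can be illustrated with a minimal Python sketch. It assumes the standard definition of lift, takes the lift increase as the ratio between the lift of a rule and the lift of the same rule with one item fewer, and takes the posterior classification ratio as the share of pedestrian crashes in a node divided by the share in the root node. All counts below are made-up toy numbers and all function names are hypothetical; none of them come from the study.

```python
def lift(n_total: int, n_antecedent: int, n_consequent: int, n_both: int) -> float:
    """Lift of the rule antecedent -> consequent, computed from raw counts.

    Equivalent to P(A and C) / (P(A) * P(C)).
    """
    return (n_both * n_total) / (n_antecedent * n_consequent)


def lift_increase(lift_extended: float, lift_base: float) -> float:
    """Ratio between the lift of an extended rule and the lift of its base rule."""
    return lift_extended / lift_base


def posterior_classification_ratio(
    n_node: int, n_node_target: int, n_root: int, n_root_target: int
) -> float:
    """Share of the target class in a node relative to its share in the root."""
    return (n_node_target * n_root) / (n_node * n_root_target)


# Toy counts (illustrative only, not taken from the study).
N_TOTAL = 1000        # all crashes
N_PEDESTRIAN = 100    # pedestrian crashes among them

# Base rule {night} -> pedestrian: 200 night crashes, 40 involving pedestrians.
lift_base = lift(N_TOTAL, 200, N_PEDESTRIAN, 40)

# Extended rule {night, speeding} -> pedestrian: 50 crashes, 20 with pedestrians.
lift_extended = lift(N_TOTAL, 50, N_PEDESTRIAN, 20)

lic = lift_increase(lift_extended, lift_base)

# Tree node that isolates the same 50 crashes, compared with the root node.
pcr = posterior_classification_ratio(50, 20, N_TOTAL, N_PEDESTRIAN)

print(f"lift (base) = {lift_base}, lift (extended) = {lift_extended}")
print(f"lift increase = {lic}, posterior classification ratio = {pcr}")
```

Under these definitions, the lift of a rule whose consequent is a pedestrian crash and the posterior classification ratio of a node that isolates the same crashes coincide, which is one way to see why the two tools can be read side by side.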
Based on the results obtained in the study, safety countermeasures have been proposed. Including pedestrian safety in every step of the planning, design, implementation, and management process is key to ensuring that pedestrians' main problems are identified and addressed.
The insights gained from the study may help to raise awareness among local authorities and transport agencies in planning and designing appropriate spaces for pedestrians. Furthermore, the results of the study may also be considered by the automotive industry to address the important challenge of how vehicle onboard devices can prevent pedestrian crashes.
A significant contribution of this paper lies in the detection of the detrimental impact of drivers' psychophysical states and drivers' behaviours on pedestrian crashes. The availability of such information in the data is crucial, as it highlights the need for safety awareness and education campaigns to increase safety-oriented actions.

Conflicts of Interest:
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.