Development of Macro-Level Safety Performance Functions in the City of Naples

This paper presents macro-level safety performance functions and aims to provide empirical tools for planners and engineers to conduct proactive analyses, promote more sustainable development patterns, and reduce road crashes. In the past decade, several studies have been conducted for crash modeling at a macro-level, yet in Italy, macro-level safety performance functions have been neither calibrated nor used until now. Therefore, for Italy to be able to fully benefit from applying these models, it is necessary to calibrate the models to local conditions. Generalized linear modelling techniques were used to fit the models, and a negative binomial distribution error structure was assumed. The study used a sample of 15,254 crashes which occurred in the period of 2009–2011 in Naples, Italy. Four traffic analysis zone (TAZ) levels were used, as one of the aims of this paper is to check the extent to which these zoning levels help in addressing the issue. The models were developed by the stepwise forward procedure using explanatory Socio-Demographic (S-D), Transportation Demand Management (TDM), and Exposure variables. The most significant variables were: children and young people placed in re-education projects, population, population aged 65 and above, population aged 25 to 44, male population, total vehicle kilometers traveled, average congestion level, average speed, number of trips originating in the TAZ, number of trips ending in the TAZ, total number of trips, and number of bus stops served per hour. An important result of the study is that the number of children and young people placed in re-education projects is negatively associated with the frequency of crashes, i.e., it has a positive safety effect. This demonstrates the effectiveness of education projects, especially on children from disadvantaged neighbourhoods.


Introduction
Road safety has been increasingly regarded as one of the most important transportation concerns in urban areas. Over the last few decades, the development of safety performance functions has enabled traffic engineers and road safety researchers to identify important factors related to the occurrence of crashes on specific highway elements or on transportation networks [1].
A safety performance function (SPF) is an equation used to predict the average number of crashes per year at a location as a function of exposure and, in some cases, site characteristics [2]. This type of model belongs to the family of generalized linear models (GLM) with a non-normal error structure [3].
The SPFs can be classified into two different spatial aggregation scales: micro-level and macro-level. In the first scale, the study units are based on small homogeneous road entities, such as roadway segments, ramps, and intersections [4,5]. The micro-level factors refer to variables aggregated at the segment/intersection level, including traffic data and geometric data (e.g., number of lanes, road functional classification). In macro-level analysis, the study units are based on geographic areas (zonal-level) to investigate the influence of socio-economic, demographic, land use, and infrastructure-related factors on crash occurrence [4]. Several studies have been conducted for crash modelling at a macro-level, exploring various zonal systems: block groups, census tracts, ZIP code areas, and traffic analysis zones (TAZs) [6][7][8][9][10][11][12][13]. Most of these zonal systems were developed for different specific uses. The prevalent spatial unit in macro-level analysis is the TAZ. A TAZ may consist of one or more census blocks, block groups, or census tracts, but usually it is a spatial aggregation of census blocks. TAZ boundaries generally coincide with identifiable physical barriers such as major streets and water bodies, and they are delineated in such a way that within each TAZ the land use activities are relatively homogeneous [7].
The objective of these models is to establish relationships between the number of crashes per traffic analysis zone and neighbourhood traits (explanatory variables), such as traffic, road network characteristics, socioeconomic and demographic features, land use, dwelling unit, and employment type. Macro-level safety performance functions that are consistent with aggregate travel demand models have been developed to provide empirical tools for planners and engineers to conduct proactive analyses, promote more sustainable development patterns, and reduce the road crash burden on communities worldwide. These models have great potential to promote increasingly sustainable development patterns by combining several redeeming features from pre-existing models. Specifically, these features include improvements in land use and infrastructure efficiency, a reduction in environmental impact, increased walkability, and an improved neighbourhood social environment [38].
Some of the dependent variables modeled in previous studies include: total crashes; severe injury crashes; peak morning crashes; property damage only (PDO) crashes; total number of fatalities; total number of injuries; pedestrian crashes; and number of crashes involving elderly drivers [7,11,20,39,40].
Although it is not the most significant predictor of crashes, exposure is a key determinant of traffic safety. The relationship between crash occurrence and exposure is fairly straightforward: the higher the exposure, the greater the possibility for a crash to occur [18]. The most common exposure variables, annual average daily traffic (AADT) or vehicle kilometers traveled (VKT), were used along with average zonal operating speed (SPD) and average zonal volume to capacity ratio (VC). Hadayeghi et al. [7] found that VKT had significant effects on crash occurrence in a nonlinear relationship. Lovegrove et al. [15,16] confirmed earlier research regarding the dominant influence of VKT on crash predictions of all types. They also highlighted the significant role that congestion (VC) and average zonal operating speed (SPD) play in safety evaluation.
Several studies observed that a low socioeconomic status and deprivation increase the fatality risk or the risk of being injured in traffic [17][18][19]. An area's socioeconomic deprivation level is usually measured by proxy factors such as total population (TPOP), population density (POPD), household density (NHD), and percentage employed (EMPP) within the TAZ [9]. Ladron de Guevara et al. [20] observed that population density and the number of employees (employment density) played a significant role in predicting crashes [18]. Lee et al. [24] also observed that a lower proportion of households without an available vehicle within a ZIP code was negatively associated with the risk of pedestrians being involved in a crash. Several authors have suggested factors including age and sex to explain crash risk [25]. Wier et al. [26] have shown that the proportion of the population living in poverty, and the number of people aged 65 and older as a percentage of the total population, were significant predictors of crashes. Similarly, Ukkusuri et al. [27] found that the proportion of the uneducated (without any schooling) population had a positive effect on pedestrian crashes, while Lascala et al. [28] concluded that the proportion of high school graduates was inversely correlated with pedestrian injury collisions.
Several road network factors were considered in macroscopic studies, such as zonal lane kilometers (TLKM), percentage of each road class (ALKMP, LLKMP), intersection density (INTD), signal density (SIGD), intersection type (I3WP, IALP), and the average curvature of roadways (CRVD). Some studies have shown that roadway density has a positive association with total crashes [19] and fatal crashes [12]. Hadayeghi et al. [21] and Gomes et al. [22] observed that intersection density, number of households, the number of major road kilometres, and the number of vehicle kilometers traveled all had significant effects on crash occurrence. Cai et al. [6] found that the length of sidewalks and the length of bike lanes have a positive effect on crash frequency.
Transportation Demand Management (TDM) strategies have hardly ever been implemented to improve traffic safety. Their main objectives are usually the reduction of congestion and emissions, as well as travel costs and energy, by means of reducing travel demand and consequently vehicle distance traveled, although their impact on traffic safety should not be neglected [29][30][31]. However, different individual daily trips and land use are examined in numerous crash investigations. Wedagama et al. [8] found that residential population density, manufacturing, retail trade and services industries were positively related to the number of road traffic crashes. Kim and Yamashita [41] observed that areas with mixed residential and commercial land use have a higher frequency of crashes. Moreover, Pulugurtha et al. [23] also observed that land use characteristics such as urban residential and mixed-use development are strongly associated with the number of crashes in a TAZ.
SPFs are critical to local and state transportation agencies due to their ability to identify regions with potential safety concerns [42]. Therefore, for a jurisdiction or nation to fully benefit from applying these models, it is necessary to calibrate or recalibrate them to local conditions [43]. This is because crash occurrence frequency and the associated under- and over-dispersion in crash data can vary significantly across an area. The need for calibrating SPFs to a specific area is clearly recognized by the American Association of State Highway and Transportation Officials (AASHTO) due to variations in factors associated with safety, such as road geometry and conditions, environmental factors, geographic characteristics, crash characteristics, and reporting thresholds, all of which can be unique to a specific area [2,42].
Since macro-level SPFs have been neither calibrated nor used in Italy until now, this paper aims to fill this research gap by developing safety performance functions to investigate the relationship between crash frequency and its contributing factors at the TAZ level, using data from Naples, Italy. In this way, the paper provides Italian local and state transportation agencies with tools to conduct proactive road safety planning.
The models were developed using recorded crashes in the period of 2009-2011. To analyze different aspects of road safety, 17 dependent variables were investigated, which were divided into six main categories: (1) crash severity; (2) vehicle type; (3) crash location; (4) crash type; (5) traffic conditions; and (6) lighting conditions. There are 53 explanatory variables, which were chosen according to previous analysis of the literature, including factors describing traffic intensity, land use, employment type, socioeconomic and demographic features, and traffic network characteristics.

Data
The study data refer to the city of Naples, the regional capital of Campania, in Southern Italy. Naples is the third-largest municipality in Italy, with an area of 117.27 km², 960,000 inhabitants, and a very high density of 8157.79 inhabitants per km².

Traffic Analysis Zones
The TAZ levels adopted in the study are obtained from the layer of the 4343 census zones of the city of Naples. The first TAZ level includes 831 zones, obtained using the following zoning criteria [44,45]:
(1) Homogeneous socioeconomic characteristics for each zone's population.
(2) Minimizing the number of intra-zonal trips.
(3) Recognizing physical, political, and historical boundaries.
(4) Generating only connected zones and avoiding zones that are completely contained within another zone.
(5) Devising a zonal system in which the number of households, population, area, or trips generated and attracted are nearly equal in each zone.
(6) Basing zonal boundaries on census zones.
Previous studies have shown that this aggregation has two disadvantages: small zone size in urban areas and a high percentage of zonal boundary crashes. As seen, one zoning criterion for TAZs is to minimize the number of intra-zonal trips, which results in a small area for each TAZ. Thus, it is difficult to analyze traffic crashes within these small zones at the macroscopic level. Moreover, the small size of zones creates many zones with zero crash frequencies, especially with regard to rarely occurring crashes such as severe, fatal or pedestrian crashes. The second issue is connected to the zoning criteria, whereby TAZs are often delineated by arterial roads, and therefore many crashes occur on these boundaries. The existence of boundary crashes may invalidate the assumptions of modelling, which is based only on the characteristics of the zone where the crash occurred [45][46][47].
A simple way to overcome these two issues was proposed by Lee et al. [47] and applied in this study, which consists of aggregating contiguous small areas with similar attribute values (in our case crash characteristics). However, to meet the needs of the various dependent variables analyzed in this study, four different levels of aggregation were performed: 831, 402, 208 and 107 TAZs (see Figure 1).

Crash Data
The crash data were obtained from micro-data collected by the various police forces in the urban area for the 3-year period from 2009 to 2011. These data include 50 fields for each crash, containing crash-related, road-related, traffic unit-related, and person-related information [48,49].
The original crash database consisted of 15,254 crashes. In order to link crashes with each TAZ, crashes needed to be geocoded on the GIS road map. However, geocoding was only possible for 14,781 crashes due to missing and poor-quality data. The poor quality of information related to crash location is undoubtedly one of the most critical problems of the databases. In Italy, there are two fundamental issues related to crash location. The first concerns the incongruence between the national database format and the crash report form. The Italian database format requires the highway name, linear referencing and GPS coordinates, but the highway police crash form does not contain fields for geographical coordinates, and most of the police units do not have GPS devices. The second issue concerns missing information recorded by the different police forces. Montella et al. [49][50][51][52] found that in 36% of the crashes the location was completely missing. Due to these problems, the information reported in the database is often incomplete, making the location of crashes difficult to determine in some cases and impossible in others.
In order to provide more complete information that helps to identify regions with potential safety concerns, it was decided to analyze 17 dependent variables. The dependent variables were divided into six main categories: (1) crash severity; (2) vehicle type; (3) crash location; (4) crash type; (5) traffic conditions; and (6) lighting conditions (Table 1).
Several studies have also emphasized how the explanatory variables have different impacts for each level of severity [53]. Huang et al. [54] investigated the relations between crash frequency and a variety of aggregate road features, traffic patterns, and demographic and socio-economic characteristics. In that study, models for all crash frequency and severe crash frequency were developed, and they were statistically different. Similar results were obtained by Hadayeghi et al. [7], who differentiated the dependent variables into the number of all collisions and the number of fatal and injury collisions. Yasmin and Eluru [55] provide a review of earlier studies examining macro-level SPFs at the various levels of injury severity. In the present study, the severity is based on the most severe injury to any person involved in the crash and was classified into three variables [50]:
− Total crashes (C);
− Property damage only crashes (PDO);
− Severe (fatal and non-fatal) injury crashes (C_s).
Pedestrians and powered two-wheelers (PTWs) are often referred to as "vulnerable road users" [56]. The European Commission has proposed halving the overall number of road deaths in the European Union by 2020, defining the protection of vulnerable road users as a specific objective of the road safety action program [7,55–58]. The vehicle type was distinguished into:
− Crashes where at least one car was involved (C_car);
− Crashes where at least one truck was involved (C_truck);
− Crashes where at least one powered two-wheeler was involved (C_ptw);
− Crashes where at least one pedestrian was involved (C_ped).
In the literature, there are several models to estimate crash frequencies on roadway segments or at intersections [2,59–62]. However, most of these models examined traffic safety at the microscopic level to find factors affecting crash occurrence among the geometric design and/or traffic characteristics of roadway entities, and suggested specific engineering solutions to reduce traffic crashes. To extend these analyses to the macro level, the crash location was distinguished into:
− Crashes occurring on curve or tangent elements (C_seg);
− Crashes occurring within intersections (C_int).
Previous studies have compared single-vehicle crashes with multi-vehicle crashes and found substantial differences between these two types of crashes [63,64]. Single-vehicle crashes have been shown to differ from multi-vehicle crashes in a number of aspects, which relate to road conditions, time aspects, or driver characteristics. Single-vehicle crashes are frequently associated with a disproportionate number of serious and fatal crashes. To understand the dynamics involved, the crash type was distinguished into:
− Single-vehicle crashes, in which only one vehicle is involved (C_sv);
− Multi-vehicle crashes, in which more than one vehicle is involved (C_mv).
The literature on the relationship between traffic volume and safety at road sections shows that crash frequency increases with increasing congestion levels [65,66]. The impact of traffic levels in urban areas on injury severity may differ between the morning and the afternoon time periods [67]. To explore the relationship between safety and congestion in greater detail, the traffic conditions category was divided into four dependent variables:
− Peak day crashes, occurring in the part of the day during which traffic congestion on roads is highest (from 7 a.m. to 10 a.m.) (C_peakday);
− Peak night crashes, occurring in the part of the night during which traffic congestion on roads is highest (from 4 p.m. to 9 p.m.) (C_peaknight);
− Off-peak day crashes, occurring in the part of the day during which traffic congestion on roads is lower (from 10 a.m. to 4 p.m.) (C_off-peak-day);
− Off-peak night crashes, occurring in the part of the night during which traffic congestion on roads is lower (from 9 p.m. to 4 a.m.) (C_off-peak-night).
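The four time windows above can be expressed as a small classification rule. The following is a minimal sketch, not the procedure used in the study: the function name, the class labels, and the treatment of each window as half-open (start hour included, end hour excluded) are assumptions. Hours from 4 a.m. to 7 a.m. are not assigned to any class in the text, so the sketch returns None for them.

```python
from typing import Optional


def traffic_condition(hour: int) -> Optional[str]:
    """Map a crash hour (0-23) to one of the four traffic-condition classes.

    Windows are treated as half-open intervals (an assumption):
    7-10 peak day, 10-16 off-peak day, 16-21 peak night, 21-4 off-peak night.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if 7 <= hour < 10:
        return "peak_day"
    if 10 <= hour < 16:
        return "off_peak_day"
    if 16 <= hour < 21:
        return "peak_night"
    if hour >= 21 or hour < 4:
        return "off_peak_night"
    return None  # 4 a.m. to 7 a.m. is not covered by the four classes


# Hypothetical crash hours, used only to illustrate the rule.
labels = [traffic_condition(h) for h in (8, 12, 18, 23, 2, 5)]
```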
Similarly to the previous category, the lighting conditions category studies how driver behaviour can vary throughout the day. Thus, two variables were considered:
− Total crashes during day (C_day);
− Total crashes during night (C_night).
In Table 1, descriptive statistics of crash data at the TAZ level are reported.They include the total crash numbers for each dependent variable with the relative mean and standard deviation for each TAZ level.

The Explanatory Variables
The explanatory variables used in this study were carefully selected based on previous literature and their expected influence on traffic. They were then divided into three themes (see Tables 2 and 3): Socio-Demographic (S-D), Transportation Demand Management (TDM), and Exposure. In Table 2, each explanatory variable is associated with its code and unit of measure.
In Table 3, the mean and the standard deviation of each explanatory variable are reported for each TAZ level.
The socio-demographic data were obtained from the Italian National Institute of Statistics (ISTAT) database. On a municipal level, the field of observation is made up of the habitually domiciled population (residents) as well as the population currently present [7]. The following units are measured: families; cohabitants; persons temporarily present on the census date; domiciles; other types of lodging; and buildings. For each census zone, 269 socio-demographic variables were provided. All variables consisted of measured data, and most of these variables were aggregated manually to each TAZ. From a preliminary analysis of variables and a comparison with the literature, only 50 variables were used.
A further challenge to the implementation of the database is posed by the calculation of all traffic-related explanatory variables. In fact, unlike motorway contexts, which are largely monitored and have relevant traffic data readily available (e.g., entry/exit counts and mainstream sensors), urban networks require the implementation of proper models. For this reason, a state-of-the-art transport model [44,68] capable of simulating the whole multimodal transport system in the metropolitan area of Naples has been applied.
Notably, the corresponding road transport system is classified as urban, but with a significant number of motorway connections between the town centre and the districts in the suburbs. At a glance, the road supply model accounts for about 54,000 nodes and 115,500 links (of which 10.91% are motorway connections and ramps), implemented in TransCAD on the basis of a TeleAtlas graph; the transit supply model is complex as well, with about 14,000 bus stops and rail stations, and 1785 road and rail transit services. The demand model, which follows the typical four-stage structure, has been estimated on the basis of 5000 telephone surveys and on a prior O-D matrix based on ISTAT data available on systematic trips. An extensive validation was performed using traffic counts collected at various points of the network, selected also on the basis of the methodology proposed by Simonelli et al. [69].
Overall, the morning peak hour was effectively modelled through a Stochastic User Equilibrium (SUE) assignment. This allowed for the calculation of a wide range of standard traffic-related variables for each traffic zone related to all available modes, to be included as possible explanatory variables. Finally, it is worth noting that the detailed database underlying the applied transport model also allowed for the estimation of zonal supply-related traffic variables for the transit system. More specifically, the following set of variables was calculated for each zone:
− Supply characteristics: length of the road network, number of transit services stops per hour.
− Demand characteristics: generated/attracted trips.
− Morning peak hour SUE: vehicles×km, average degree of congestion, average speed.
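To illustrate how link-level assignment results can be rolled up into the zonal variables listed above, a minimal sketch follows. The record layout (zone, flow, length_km, capacity, speed_kmh) and the function name are hypothetical. Vehicles×km is the sum of flow times link length; the text does not state how congestion and speed were averaged over links, so an unweighted mean is assumed here.

```python
from collections import defaultdict


def zonal_traffic_variables(links):
    """Aggregate link records into per-zone VKT, average V/C and average speed.

    Each link is a dict with hypothetical keys: zone, flow (veh/h),
    length_km, capacity (veh/h) and speed_kmh.
    """
    acc = defaultdict(lambda: {"vkt": 0.0, "vc": [], "spd": []})
    for link in links:
        z = acc[link["zone"]]
        z["vkt"] += link["flow"] * link["length_km"]      # vehicles x km
        z["vc"].append(link["flow"] / link["capacity"])   # volume/capacity
        z["spd"].append(link["speed_kmh"])
    return {
        zone: {
            "VKT": z["vkt"],
            "VC": sum(z["vc"]) / len(z["vc"]),     # unweighted mean (assumption)
            "SPD": sum(z["spd"]) / len(z["spd"]),  # unweighted mean (assumption)
        }
        for zone, z in acc.items()
    }


# Made-up link data for two zones.
links = [
    {"zone": 1, "flow": 1000, "length_km": 0.5, "capacity": 2000, "speed_kmh": 40},
    {"zone": 1, "flow": 500, "length_km": 1.0, "capacity": 1000, "speed_kmh": 30},
    {"zone": 2, "flow": 300, "length_km": 2.0, "capacity": 1200, "speed_kmh": 50},
]
zonal = zonal_traffic_variables(links)
```

A length-weighted or flow-weighted mean would be an equally plausible choice and would only change the two averaging lines.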

Model Description
In this study the dependent variables take only non-negative integer values, so their statistical treatment differs from that of a normally distributed variable, which can assume any real value, positive or negative, integer or fractional. Poisson or Negative Binomial (NB) regression models are better suited to the random, discrete, and non-negative nature of crash occurrence [70]. One notable characteristic of crash-frequency data is that they are overdispersed, i.e., the variance exceeds the mean of the crash counts. When overdispersed data are present, estimating a common Poisson model can result in biased and inconsistent parameter estimates, which in turn could lead to erroneous inferences regarding the factors that determine crash frequencies. Following common practice [3,71], generalized linear modelling techniques were used to fit the models and a negative binomial distribution error structure was assumed.
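A quick way to see whether a set of zonal crash counts is overdispersed is to compare the sample variance with the sample mean. The sketch below is illustrative only: the counts are made up, and the function name is hypothetical. A ratio well above 1 points away from a Poisson model, for which the variance equals the mean.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the sample variance-to-mean ratio of a list of crash counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the index is undefined")
    return variance(counts) / m


# Made-up crash counts for ten zones (not data from the study).
zone_counts = [0, 2, 1, 7, 3, 0, 12, 4, 1, 20]
index = dispersion_index(zone_counts)
overdispersed = index > 1  # variance exceeds the mean
```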
The selected model form expresses the predicted annual crash frequency, E(Y), as a function of the explanatory variables x_i through the model parameters a_i and b_i.
The distribution of Y around E(Y) = µ is negative binomial, with an expected value and variance of:
\[ E(Y) = \mu, \qquad \mathrm{Var}(Y) = \mu + \frac{\mu^{2}}{\kappa} \]
where κ is the dispersion parameter of the negative binomial distribution. The dispersion parameter is estimated iteratively from the model residuals, with the method of maximum likelihood being the most widely used. Because the variance decreases as κ increases, the value of κ can also be used to compare the goodness of fit of various models fitted to the same data, in that the larger the value of κ, the smaller the variance and the better the model [72,73]. Separate multivariate models were developed for each crash variable and for each TAZ level. The model parameters and the dispersion parameter were estimated by forward stepwise selection of variables for logistic regression using an estimation of maximum likelihood [74].
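The role of κ can be illustrated numerically. The sketch below assumes the common parameterization Var(Y) = µ + µ²/κ, which matches the statement that the variance decreases as κ increases. The function name and the example values are hypothetical.

```python
def nb_variance(mu: float, kappa: float) -> float:
    """Negative binomial variance, assuming Var(Y) = mu + mu**2 / kappa."""
    if mu < 0 or kappa <= 0:
        raise ValueError("mu must be >= 0 and kappa must be > 0")
    return mu + mu ** 2 / kappa


# For a fixed mean of 5 crashes, a larger kappa gives a smaller variance,
# approaching the Poisson case (variance = mean) as kappa grows.
variances = [nb_variance(5.0, k) for k in (0.5, 2.0, 10.0)]
```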
The forward stepwise approach to choosing a model begins with a null model and adds terms sequentially until further additions do not improve the fit. At each stage it selects the term which gives the greatest improvement in fit [75,76]. The decision on whether or not to keep a variable in the model was based on two criteria. The first is whether the t-ratio of the variable's estimated coefficient is significant at the 5% level. The second criterion is based on the improvement of the goodness of fit measures of the model which includes that variable. A stepwise variation of this procedure retests, at each stage, terms added at previous stages to see if they are still significant. This method requires finding the value of the coefficient that maximizes the conditional likelihood.
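The forward procedure described above can be sketched generically. This is a simplified illustration rather than the SPSS procedure used in the study: it relies on a single user-supplied score where lower is better (for example an AIC-like value), whereas the study combined a 5% significance test with goodness of fit measures. The function name, the toy scoring function, and its numbers are hypothetical.

```python
from typing import Callable, List, Sequence


def forward_stepwise(candidates: Sequence[str],
                     score: Callable[[List[str]], float]) -> List[str]:
    """Greedy forward selection; `score` returns a value where lower is better."""
    selected: List[str] = []
    best = score(selected)  # start from the null model
    remaining = list(candidates)
    while remaining:
        # Try adding each remaining term and keep the one that improves most.
        trial_score, trial_var = min((score(selected + [v]), v) for v in remaining)
        if trial_score >= best:
            break  # no further addition improves the fit
        selected.append(trial_var)
        remaining.remove(trial_var)
        best = trial_score
    return selected


# Toy score: a fixed gain per variable minus a penalty of 2 per term.
GAINS = {"VKT": 30.0, "POP": 10.0, "SPD": 1.0}


def toy_score(variables: List[str]) -> float:
    return 100.0 - sum(GAINS[v] for v in variables) + 2.0 * len(variables)


chosen = forward_stepwise(["SPD", "POP", "VKT"], toy_score)
```

In this toy case SPD is left out because its gain does not offset the penalty for one more parameter.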
The models were developed using the GLM (Generalized Linear Model) procedure in SPSS software [77]. Multiplicative interaction terms were incorporated in the models, in addition to the analysis of main effects for each variable selected. The interaction terms were created by combining two explanatory variables at a time and considering all the possible combinations.
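Building all two-variable multiplicative interaction terms is a simple combinatorial step. The following sketch shows one way to do it; the function name, the "A*B" naming of the new terms, and the data values are illustrative assumptions.

```python
from itertools import combinations


def pairwise_interactions(columns):
    """Return every two-variable product term.

    `columns` maps a variable code to its list of per-zone values; the result
    maps "A*B" to the element-wise product of the two columns.
    """
    terms = {}
    for a, b in combinations(columns, 2):
        terms[f"{a}*{b}"] = [x * y for x, y in zip(columns[a], columns[b])]
    return terms


# Made-up values for two zones and three explanatory variables.
data = {"POP": [1, 2], "SPD": [3, 4], "VKT": [5, 6]}
interactions = pairwise_interactions(data)
```

With n explanatory variables this yields n(n-1)/2 candidate terms, so the candidate set grows quickly as variables are added.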

Measuring Goodness of Fit
Several measures can be used to assess the goodness of fit of the models [78,79]. To measure the goodness of fit in linear regression models, the coefficient of determination R² is often used. However, the R² measure is only appropriate for linear regression, with its continuous dependent variables. The R² statistic is a measure of the percentage of unconditional variance of the dependent variable explained by the available covariates. It is considered meaningful only in measuring the goodness of fit of normal linear regression models with additive mean functions, in which the conditional variance of the dependent variable is not a function of its conditional mean. Because crash prediction models are non-normal and functional forms are typically nonlinear, the R² is not appropriate as a goodness of fit measure. To get around this problem, a number of statisticians have developed other goodness of fit measures, such as: Pseudo R², normalized adjusted R² (R²_n), dispersion parameter-based R² (R²_α), Akaike information criterion (AIC) and Bayesian information criterion (BIC). In this study, R²_α and AIC were used, as they are the goodness of fit measures most commonly used in this type of analysis [80][81][82].
The R²_α uses the size of the dispersion parameter in the conventional negative binomial regression model as a yardstick to determine how well the variance of the data is explained, and is calculated as follows [82]:
\[ R^{2}_{\alpha} = 1 - \frac{\kappa_{\min}}{\kappa} \]
where κ_min = smallest dispersion parameter possible, obtained by having no covariates in the model (i.e., by assuming that all sites have an identical prediction estimate equal to the mean over all sites), and κ = dispersion parameter for the calibrated model.
For a given data set, the smallest dispersion parameter value, κ_min, is first estimated by fitting the observed data Y with a negative binomial distribution that includes no covariates. The main advantage of this measure is its simplicity, in addition to being bounded between 0, when no covariate is included, and 1, when covariates are perfectly specified.
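As a worked illustration, the measure can be computed directly from two dispersion parameters. The sketch assumes the form R²_α = 1 − κ_min/κ, which is consistent with the definitions in the text (0 when no covariate is included, approaching 1 as κ grows). The function name and the numerical values are hypothetical.

```python
def r2_alpha(kappa: float, kappa_min: float) -> float:
    """Dispersion-based R2, assuming R2_alpha = 1 - kappa_min / kappa."""
    if kappa <= 0 or kappa_min <= 0:
        raise ValueError("dispersion parameters must be positive")
    return 1.0 - kappa_min / kappa


# Made-up values: the no-covariate model gives kappa_min = 0.8,
# the calibrated model gives kappa = 2.0.
fit = r2_alpha(kappa=2.0, kappa_min=0.8)
null_fit = r2_alpha(kappa=0.8, kappa_min=0.8)  # no covariates: 0
```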
The AIC is an estimator of the relative quality of statistical models; the smaller the statistic, the better the model [83,84]. The AIC value is calculated as follows:
\[ \mathrm{AIC} = -2\,ML + 2p \]
where ML = maximum log-likelihood of the fitted model and p = number of parameters in the model.
The first term in the AIC equation measures the badness of fit, or bias, when the maximum likelihood estimates of the parameters are used. The second term measures the complexity of the model, thus penalizing the model for using more parameters. The goal for selecting the best model is to choose the best fit with the least complexity.
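The trade-off between fit and complexity can be shown with a short calculation using the standard definition AIC = −2·ML + 2p. The log-likelihood values and parameter counts below are made up for illustration.

```python
def aic(max_log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: -2 * ML + 2 * p."""
    return -2.0 * max_log_likelihood + 2 * n_params


# Two hypothetical models fitted to the same data.
aic_a = aic(max_log_likelihood=-520.3, n_params=3)  # simpler model
aic_b = aic(max_log_likelihood=-518.9, n_params=6)  # slightly better fit, more terms
preferred = "A" if aic_a < aic_b else "B"
```

Here model B fits slightly better, but the penalty for three extra parameters outweighs the gain, so model A has the lower AIC.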

Results and Discussion
The models were developed using the stepwise forward procedure, adding one explanatory variable at each step. A total of 68 regression models (17 dependent variables at each of the 4 TAZ levels) were developed in order to examine the relationships between zonal crashes and a suite of factors describing traffic intensity, land use, employment type, socioeconomic and demographic features, and traffic network characteristics.
Results of the stepwise procedure for all crash variables are shown in the tables below. Table 4 shows the results of the crash severity group: total crashes (C); property damage only crashes (PDO); severe injury crashes (C_s). Table 5 shows the results of the crash vehicle type group: crashes where at least one car was involved (C_car); crashes where at least one truck was involved (C_truck); crashes where at least one powered two-wheeler was involved (C_ptw); crashes where at least one pedestrian was involved (C_ped). Table 6 shows the results of the crash location group: crashes occurring on curve or tangent elements (C_seg); crashes occurring within intersections (C_int). Table 7 shows the results of the vehicle type group: single-vehicle crashes, in which only one vehicle is involved (C_sv); multi-vehicle crashes, which involve more than one vehicle (C_mv). Table 8 shows the results of the traffic conditions group: peak day crashes, which occur in the part of the day during which traffic congestion on roads is highest (from 7 a.m. to 10 a.m.) (C_peakday); peak night crashes, which occur in the part of the night during which traffic congestion on roads is highest (from 4 p.m. to 9 p.m.) (C_peaknight); off-peak day crashes, which occur in the part of the day during which traffic congestion on roads is lower (from 10 a.m. to 4 p.m.) (C_off-peak-day); off-peak night crashes, which occur in the part of the night during which traffic congestion on roads is lower (from 9 p.m. to 4 a.m.) (C_off-peak-night). Table 9 shows the results of the crash lighting condition group: crashes during the day (C_day); crashes during the night (C_night).
Analysis of the results shows that the goodness of fit of the models improves as the number of TAZ decreases: R²_α increases and AIC decreases, except for C_ptw, C_off-peak-day, and C_off-peak-night, for which R²_α decreases when going from 208 to 107 TAZ. Judging by the goodness-of-fit measures, the best zoning level is the one with 208 TAZ. This finding is consistent with previous studies. Xu et al. [85,86] observed that zoning schemes with a higher number of zones tend to have more significant variables, more stable coefficient estimates, and smaller standard errors, but worse model performance. Moreover, Lee et al. [47] confirmed that a higher level of aggregation of TAZ provides the best estimation models with less dispersion, but also demonstrated that a zone that is too large may lose many local features.
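The traffic conditions groups used in Table 8 can be expressed as a simple classification of the crash hour. The sketch below is illustrative only: the function name is hypothetical, the treatment of interval boundaries (start hour included, end hour excluded) is an assumption, and hours between 4 a.m. and 7 a.m., which the definitions above do not assign to any group, return None.

```python
from typing import Optional


def traffic_condition_group(hour: int) -> Optional[str]:
    """Map a crash hour (0-23) to a traffic conditions group."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if 7 <= hour < 10:
        return "peak day"          # 7 a.m. to 10 a.m.
    if 10 <= hour < 16:
        return "off-peak day"      # 10 a.m. to 4 p.m.
    if 16 <= hour < 21:
        return "peak night"        # 4 p.m. to 9 p.m.
    if hour >= 21 or hour < 4:
        return "off-peak night"    # 9 p.m. to 4 a.m.
    return None                    # 4 a.m. to 7 a.m.: not assigned in the text


groups = [traffic_condition_group(h) for h in (8, 12, 18, 23, 2, 5)]
```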
For all models, exposure variables were the most significant predictors and positively associated with the number of crashes in each TAZ, as suggested and frequently shown in the literature [9]. In all models, exposure variables such as the length of the road network (TRKM), the average congestion level (V/C) and the average speed (SPD) all gave significant results. These outcomes are consistent with the literature. Lovegrove et al. [87] and Wei and Lovegrove [88] found that regional congestion levels (V/C) were directly associated with the crash prediction model, and estimated that decreasing V/C values would result in decreasing crash estimates. This suggested that the average V/C value for a given traffic zone could be used as a surrogate indicator of road safety. Xie et al. [89] found that street length has a positive impact on crash occurrence.
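To illustrate how the average V/C value could act as a surrogate safety indicator, the sketch below evaluates a log-linear safety performance function for two congestion levels. The functional form, the function name, and all coefficient and zone values are assumptions chosen for illustration; they are not the estimates reported in Tables 4 to 9.

```python
import math


def predicted_crashes(trkm: float, vc: float, b0: float, b_trkm: float, b_vc: float) -> float:
    """Expected zonal crashes under an assumed log-link form:
    E[crashes] = exp(b0 + b_trkm * ln(TRKM) + b_vc * V/C)."""
    return math.exp(b0 + b_trkm * math.log(trkm) + b_vc * vc)


# Hypothetical coefficients and zone attributes (not from this study).
coef = dict(b0=-1.0, b_trkm=0.8, b_vc=1.5)

base = predicted_crashes(trkm=50.0, vc=0.8, **coef)      # current congestion level
improved = predicted_crashes(trkm=50.0, vc=0.6, **coef)  # lower congestion level

# Under this form the ratio depends only on the change in V/C:
# improved / base = exp(b_vc * (0.6 - 0.8)).
ratio = improved / base
```

With a positive V/C coefficient, any reduction in the average V/C value lowers the predicted number of crashes, which is the sense in which V/C can serve as a surrogate indicator.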
In the models, the statistically significant demographic variables are resident population (POP), population aged 65 and above (Pop ≥ 65), male population (MaPop), and population aged 25 to 44 (25 ≤ Pop < 45). In particular, Pop ≥ 65 is associated with eight dependent variables: C, PDO, C_s, C_car, C_ped, C_seg, C_sv and C_off-peak-day. These results are in line with other studies such as Montella et al. [57], in which the older population showed a greater propensity toward fatal crashes. The studies conducted by Noland and Quddus [19] and Aguero-Valverde and Jovanis [12] showed that a higher percentage of elderly population is associated with a higher number of road crashes, while, according to Amoh-Gyimah et al. [9], the elderly population percentage was positively associated with minor injury pedestrian crashes. A possible explanation is that the elderly may have weak eyesight and usually take longer to cross a street, thus increasing their exposure to vehicle traffic [90]. C_peakday is associated with resident population (POP), which is consistent with the research of Abdel-Aty et al. [10,91], Hadayeghi et al. [21], and Xie et al. [89]. The male population (MaPop) variable affects C_ptw. Montella et al. [56] found that male PTW drivers, in combination with other variables, were significantly correlated with fatal crashes. The interaction between male population and population aged 25 to 44 (MaPop*25 ≤ Pop < 45) affects C_off-peak-night and C_night. These outcomes are consistent with many studies, which have shown that aggressive driving tendencies decrease with age and that driver gender is directly correlated with aggressive driving [92,93].
The results also showed that increased crashes were associated with increases in workers per resident (WKGD). In particular, WKGD is associated with C_car, C_truck, C_seg, C_peaknight and C_day. These results confirm earlier research by Lovegrove et al. [15] and Kim et al. [39,94]. The difference between C_peaknight and C_peakday is related to the different activities carried out during the day. During the peak night hours, work-related trips prevail, while during the morning hours trips are more diverse, such as travel to school or shops.
The children and young people included in socio-educational projects (MinRe-edu) variable negatively affects the frequency of crashes. In particular, MinRe-edu is associated with C, PDO, C_s, C_ptw, C_ped, C_peakday, C_off-peak-day, C_off-peak-night, C_day and C_night. These projects provide care to children from disadvantaged neighbourhoods and organize after-school educational activities and workshops, which include music, art, cooking, sports, games and leisure activities. The negative sign of this variable suggests that these projects also promote less aggressive driving habits in young people. The results of the present study confirm the positive effects of an active learning-based educational program [95]. This result highlights that the road user is the first link in the road safety chain. Whatever the technical measures in place, the effectiveness of a road safety policy ultimately depends on the users' behaviour. For this reason, education, training and enforcement are essential [58].
Regarding the transportation network, it was observed that as the number of trips increases, crashes also tend to increase. In particular, total trips (TRIP_t) is associated with C, PDO, C_car, C_truck, C_ptw, C_sv, C_mv and C_day. TRIP_p is associated with C_seg, and TRIP_a with C_s, C_ped and C_int. These results confirm earlier research by Abdel-Aty et al. [91], Dong et al. [96], and Naderan and Shahi [97]. A TDM scenario may be developed to reduce trips of a specific purpose, and the related number of crashes could then be predicted. Hbus is the number of bus stops served in one hour in the area and is an indirect measure of bus stop capacity. Hbus is a proxy for pedestrian traffic, so its positive sign indicates that crashes increase with pedestrian activity. The association between increased collisions and increased bus stops (BS) is consistent with the research of Kim et al. [39,94], Wei et al. and Rhee et al. [98]. A larger number of subway stations was found to increase traffic crashes. Bus stops attract pedestrian activity, and an increase in such nodes would increase the possibility of conflict between pedestrians and vehicle traffic and, at bus stops, between buses, other vehicles and pedestrians [89]. Pedestrian traffic is most likely unprotected, and therefore pedestrian routes must be improved.
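A TDM scenario of this kind could be evaluated along the following lines. The sketch assumes a power-form relationship between zonal trips and crashes, E[crashes] = a0 * TRIP^a1; this form, the zone labels, the trip counts, and the parameter values are all hypothetical and serve only to show the mechanics.

```python
from typing import Dict, Tuple


def spf_trips(trips: float, a0: float, a1: float) -> float:
    """Assumed power-form SPF: expected crashes = a0 * trips ** a1."""
    return a0 * trips ** a1


def scenario_totals(
    zone_trips: Dict[str, float], reduction: float, a0: float, a1: float
) -> Tuple[float, float]:
    """Total predicted crashes before and after a uniform trip reduction."""
    before = sum(spf_trips(t, a0, a1) for t in zone_trips.values())
    after = sum(spf_trips(t * (1.0 - reduction), a0, a1) for t in zone_trips.values())
    return before, after


# Hypothetical zones, trip counts, and parameters (not from this study).
zone_trips = {"zone A": 12000.0, "zone B": 8000.0, "zone C": 3000.0}
before, after = scenario_totals(zone_trips, reduction=0.10, a0=0.01, a1=0.7)
change = after / before  # equals 0.9 ** 0.7 under the assumed form
```

Under this assumed form, a uniform 10% trip reduction scales predicted crashes in every zone by 0.9^a1, so an exponent below 1 implies a less than proportional reduction in crashes.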

Conclusions
Incorporating safety considerations into the transportation planning process in a comprehensive way has emerged as a strategy for improving transportation safety in recent years.
Macro-level safety performance functions were developed in this study to provide decision support tools for planners to consider safety in the transportation planning process, and to promote more sustainable land use and transport patterns. The objective of this study was to develop a series of macro-level safety performance functions that are consistent with aggregate travel demand models. These functions might be very helpful for administrations that do not have quality crash data and need to identify the areas with the highest number of crashes.
In total, 68 models were developed using recorded crashes in the period 2009–2011 in the city of Naples. To analyze different aspects of road safety, 17 dependent variables were investigated for four TAZ levels. The first result is that, judging by the goodness-of-fit measures of the 68 models, the optimal scale was the zoning level with 208 TAZ. This result shows that traditional zoning schemes might not be the optimal systems for regional safety analysis. In this study, the optimal zoning was obtained by aggregating contiguous small areas with similar crash characteristics.
The main significant variables were: children and young people included in socio-educational projects, population, population aged 65 and above, population aged 25 to 44, male population, total vehicle kilometers traveled, average congestion level, average speed, number of trips originating in the TAZ, number of trips ending in the TAZ, number of total trips, and number of bus stops served per hour.
Most of these variables are consistent with the literature, except for the MinRe-edu variable (children and young people included in socio-educational projects). Although a large number of road safety education programs exist, very few studies use crashes as an evaluation criterion; most use intermediate variables such as knowledge, attitudes and (self-reported) safe behaviour [95,99]. This study highlights the positive influence of socio-educational projects, connecting the presence of such projects to a reduction in crashes.
The findings of this study highlight that road safety management must take a more comprehensive approach, with a broader range of policy tools applied to a wider range of the component parts that comprise the road system. This approach must recognize that the road user is the first link in the road safety chain and that, to achieve greater safety, socio-educational projects have to be included. This is in line with the first objective of the Road Safety Action Program 2011–2020 proposed by the European Commission: improve education and training of road users [58].

Table 1. Descriptive statistics of crash data at the TAZ level.

Table 2. Explanatory variables: socio-demographic, exposure and transportation demand management.

Table 3. Descriptive statistics of explanatory variables at the TAZ level.

Table 4. SPFs (crash severity group): Parameter estimates and goodness-of-fit measures.
Note: Standard errors of the parameter estimates are reported in brackets.

Table 5. SPFs (vehicle type group): Parameter estimates and goodness-of-fit measures.

Table 6. SPFs (crash location group): Parameter estimates and goodness-of-fit measures.
Note: Standard errors of the parameter estimates are reported in brackets.

Table 7. SPFs (vehicle type group): Parameter estimates and goodness-of-fit measures.

Table 8. SPFs (traffic conditions group): Parameter estimates and goodness-of-fit measures.

Table 9. SPFs (lighting conditions group): Parameter estimates and goodness-of-fit measures.
Note: Standard errors of the parameter estimates are reported in brackets.