Identification of Vehicle-Pedestrian Collision Hotspots at the Micro-Level Using Network Kernel Density Estimation and Random Forests: A Case Study in Shanghai, China

The improvement of pedestrian safety plays a crucial role in developing a safe and friendly walking environment, which can contribute to urban sustainability. A preliminary step in improving pedestrian safety is to identify hazardous road locations for pedestrians. This study proposes a framework for the identification of vehicle-pedestrian collision hot spots by integrating information about both the likelihood of vehicle-pedestrian collisions occurring and the potential for reducing them. First, a vehicle-pedestrian collision density surface was produced via network kernel density estimation. By assigning a threshold value, possible vehicle-pedestrian collision hot spots were identified. To obtain the potential for vehicle-pedestrian collision reduction, random forests were employed to model the density with a set of variables describing vehicle and pedestrian flows. The potential for crash reduction was then measured as the difference between the observed vehicle-pedestrian crash density and the prediction produced by the random forest models. The final hot spots were determined by excluding those with a crash reduction value of no more than zero. The method was applied to the identification of hazardous road locations for pedestrians in a district in Shanghai, China. The results indicate that the method is useful for decision-making support.


Introduction
People start and end most of their trips on foot in their daily lives. However, mainly due to the lack of awareness, pedestrians are often at high risk of death and injury. According to the World Health Organization [1], approximately 1.24 million traffic deaths occur annually on the world's roads, of which about 22% involve pedestrians. As walking positively influences health and the environment, encouraging walking can help develop a sustainable community. Despite a shift from motorized to sustainable transport modes (such as walking and cycling) that has focused attention on pedestrian safety, there is still much room for improvement to ensure a safe walking environment for pedestrians.
A preliminary step to improve pedestrian safety is to identify hazardous road locations for pedestrians. This task plays a crucial role in safety countermeasure proposals and resource allocation. From a geographical perspective, hazardous road locations are usually represented by clusters of traffic collisions. In the literature, extensive research has focused on the detection of traffic collision concentrations at the micro level [2][3][4][5][6][7][8][9][10][11][12]. The studies can be categorized into two types [13,14]. The first is the link-attribute type, where the road network is segmented into basic spatial units (BSUs) and traffic collisions are treated as attributes attached to the BSUs. The other is the event-based type, where individual traffic collision events represented by x and y coordinates in space are analyzed. In traffic collision analysis, kernel density estimation (KDE) is one of the most popular event-based approaches [15] and has been widely applied to the identification of hazardous road locations. Although some researchers have employed traditional planar KDE [16][17][18], which estimates density in two-dimensional space where traffic collisions are weighted based on Euclidean distance, there has been a growing trend toward network KDE (NKDE), which estimates density in a one-dimensional space where distance is calculated along the road network, mainly because traffic collisions are a network-constrained phenomenon. For instance, Xie and Yan [5] developed a novel NKDE approach to estimate the density of network-constrained point events and applied it to the analysis of 2005 traffic crash data in the Bowling Green, Kentucky, USA area. The results indicate that NKDE is more appropriate than standard planar KDE for density estimation of traffic collisions, since the latter is likely to overestimate the density values.
In the context of road safety, hazardous road locations are usually referred to as traffic collision "hotspots", "blackspots", "sites with promise", or "high risk locations". A number of previous studies employed different methods to detect traffic collision hot spots based on traffic collision frequency and rate [19][20][21][22] aggregated by BSUs. Unlike spatial analysts, who are interested in spatial analytical techniques for the detection of traffic collision clusters, traffic safety researchers are more concerned with the definition of hazardous road locations. Although a simple ranking approach is the most convenient way of defining a traffic collision hotspot, the method is considered naive and is likely to cause a large number of false positives. To handle this, previous studies have proposed other measures to define hazardous (or unsafe) road locations. For instance, McGuigan [23,24] measured the "potential of accident reduction", which was calculated as the difference between the observed and the expected crash count at a site given exposure. Mahalel et al. [25] suggested that locations selected for treatment should maximize the expected total reduction of traffic collisions. The premise of these studies is that only excess traffic collisions can be prevented by appropriate treatments [26]. However, most of these studies focused on vehicle-vehicle collisions and dealt with collision frequency. The method has not yet been applied to vehicle-pedestrian collision density.
As there has been no consensus on the best method of detecting hazardous road locations, this study proposes an integrated micro-level method that incorporates both traffic crash intensity and the potential for reduction to identify vehicle-pedestrian collision hot spots. The reasons for developing the method are twofold. Firstly, there is a growing trend among nations worldwide to set a "zero" tolerance vision in terms of fatalities to protect road users. To realize the ambitious target of zero road fatalities and serious injuries, researchers and engineers should be concerned with locations where traffic collisions happen frequently. Secondly, in safety practice, resources are usually insufficient for treating every hazardous road location. Policy-makers may not be interested in traffic crash clusters that only result from high traffic volume. They may, instead, want to know which hazardous road locations would produce the maximum reduction in traffic deaths and injuries when appropriately treated. In this light, we attempted to develop a framework that integrates both crash density and reduction potential as information sources for decision-making support for pedestrian safety.
The following section first introduces the steps for identifying vehicle-pedestrian collision hot spots, with emphasis on the models we used to analyse vehicle-pedestrian collisions. The study area and data are introduced in Section 3, and the results are presented and discussed in Section 4, followed by conclusions and further research directions in Section 5.

Method
The proposed framework for the identification of vehicle-pedestrian collision hot spots involves three steps: producing a vehicle-pedestrian collision density surface, measuring the potential for vehicle-pedestrian collision reduction, and identifying the vehicle-pedestrian collision hot spots. This section introduces the models and approaches employed in each step.

Generation of Vehicle-Pedestrian Collision Density Surface
The NKDE method was used for detecting the vehicle-pedestrian collision hot spots, following the approach in Xie and Yan [5] and Loo et al. [12]. First, by analogy with standard planar KDE, where the entire two-dimensional space is divided into regular grids, the roads were divided into BSUs at equal intervals to ensure regularly spaced locations along the network for density estimation [5]. Next, the center points of the BSUs were obtained as reference points. For each reference point (RP) i, the density estimate, f(i), is calculated by Equation (1), where b is the bandwidth, d_ij is the network distance between reference point i and vehicle-pedestrian traffic collision j, and Kern(.) is a kernel function that measures the distance decay effect, such as Uniform, Triangle, Quartic, Triweight, and Gaussian [27]. In this study, the length of a BSU was set as 200 m, and the Quartic function, given by Equation (2), was chosen as the kernel function. Although the BSU length and the choice of kernel function may have limited influence on the results, the selection of bandwidth has a significant impact on the resultant density surface [4,5,12]. A small bandwidth may produce a sharp density pattern and result in a large number of tiny isolated clusters, whereas a broad bandwidth produces a smooth density surface where hazardous road locations are likely to be mixed with safe neighboring locations. In this research, the bandwidth was chosen as 250 m, an intermediate value, to ensure an appropriate density surface.
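The density estimate described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the common kernel-sum form f(i) = sum over j of (1/b) * Kern(d_ij / b), and it assumes the normalised one-dimensional quartic kernel (15/16)(1 - u^2)^2 for u < 1 and 0 otherwise. The scaling constant in particular is an assumption and may differ from the one used in the study. The function names and the sample distances are hypothetical.

```python
def quartic_kernel(u):
    """Quartic (biweight) kernel; the 15/16 constant is an assumed 1-D normalisation."""
    return (15.0 / 16.0) * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def nkde_density(network_distances, bandwidth):
    """Density at one reference point from network distances (m) to each collision."""
    return sum(quartic_kernel(d / bandwidth) for d in network_distances) / bandwidth


# Hypothetical network distances (m) from one reference point to four collisions.
distances_m = [0.0, 100.0, 240.0, 400.0]
density = nkde_density(distances_m, bandwidth=250.0)
print(f"density = {density:.5f}")
```

Collisions farther than the bandwidth along the network (the 400 m event here) contribute nothing, which is why a small bandwidth yields sharp, isolated peaks and a large one smooths neighbouring locations together.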

Calculation of Potential of Vehicle-Pedestrian Collision Reduction
The potential for vehicle-pedestrian collision reduction was measured as the difference between the observed and the estimated crash density values. The former is obtained using Equations (1) and (2); the latter can be calculated by modelling the vehicle-pedestrian crash density with variables that describe not only vehicle volume but also pedestrian flow. Although traditional statistical models have been widely used in traffic collision modelling [28][29][30], applying machine learning methods [31][32][33] has become a growing trend. A typical example is Chang [31], who analysed freeway collisions with neural network (NN) approaches and found that NN models had better predictive performance because of their exceptional ability to approximate complicated nonlinear relationships. However, NNs have limited ability to illustrate the influence of risk factors due to the "black-box" drawback and are likely to cause a severe over-fitting problem. To balance the explanatory ability of risk factors and the accuracy of traffic collision prediction, we employed the random forest (RF) method [34,35] for modeling traffic collisions, because the technique is relatively robust to outliers and can evaluate the relative importance of potential predictors [36]. The RF technique is being increasingly applied in many research fields, such as classification of land cover [37], identification of fire occurrence [38], mapping of oil spills [39], detection of gold potential [40], and diagnosis of tree health [41]; however, it has rarely been applied to the modeling of traffic collision density.
RF was first proposed by Breiman [35]. The technique relies on the "bagging" method that constructs each tree independently by using a bootstrap sample of the dataset [42]. A random forest consists of many trees, each of which is generated by drawing bootstrap samples from the original dataset, with "out-of-bag" (OOB) data for validation. Unlike in standard trees, where each node is split using the best among all predictors, in a random forest each node is split by randomly sampling a subset of predictors and choosing the best split among those variables [34]. The outcome of the RF is determined by averaging the predictions of all the trees [35]. The importance of each predictor can be estimated by examining the increase in prediction error when permuting the OOB data for that variable and leaving all others unchanged. Two commonly used measures in RFs for assessing variable importance are the mean decrease in accuracy and the decrease in node impurity. As the former indicator is considered a more reliable measure [43], it was used for measuring the variable importance in this study.
This study employed the scikit-learn (SKlearn, The French Institute for Research in Computer Science and Automation, Rocquencourt, France) toolkit [44], which provides machine learning tools in Python for data mining and data analysis. In SKlearn, the RandomForestRegressor tool was used for implementing the RF algorithm. It contains several parameters that allow users to tune the model, including n_estimators (the number of decision trees), criterion (the function used to measure the quality of a split), max_depth (the maximum depth of a decision tree), and min_samples_split (the minimum number of samples required to split a node). SKlearn also provides functions that enable users to measure the prediction accuracy of the model, such as cross_val_score, mean_squared_error, mean_absolute_error, and r2_score; the last three compute the values of mean squared error, mean absolute error, and R², respectively. The feature_importances_ attribute is used for measuring the importance of each variable.
Although independent validation samples are not necessary for RF, they allow the assessment of the generalization capability of the method [38,45]. In this light, the dataset was randomly divided into two parts: 70% for calibration and 30% for validation. The procedure was repeated n times, resulting in n sub-samples. The final predicted density value was determined by averaging predictions from RF models based on the n sub-samples. The potential for vehicle-pedestrian collision reduction was then obtained by calculating the difference between the observed vehicle-pedestrian collision density and the final prediction. In this study, n was set to five.
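The permutation idea behind the mean-decrease-in-accuracy measure can be illustrated with a small standard-library sketch. It is a simplification under stated assumptions: it permutes columns of a single held-out dataset rather than each tree's own OOB sample, and `toy_predict` is a hypothetical stand-in for a fitted forest.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Increase in MSE after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    increases = []
    for k in range(n_features):
        column = [r[k] for r in rows]
        rng.shuffle(column)  # break the link between feature k and the target
        permuted = [r[:k] + [v] + r[k + 1:] for r, v in zip(rows, column)]
        error = mean_squared_error(targets, [predict(r) for r in permuted])
        increases.append(error - baseline)
    return increases


# Hypothetical stand-in for a fitted model: uses feature 0 and ignores feature 1.
def toy_predict(row):
    return 2.0 * row[0]


rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
targets = [toy_predict(r) for r in rows]
importances = permutation_importance(toy_predict, rows, targets, n_features=2)
print(importances)
```

A feature the model never uses shows no increase in error when shuffled, while a feature the prediction depends on shows a clear increase.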
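The repeated 70/30 procedure can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the study's code: the data are synthetic, the three predictor columns are hypothetical exposure variables, the forest settings are arbitrary, and predicting every BSU with each of the five models before averaging is one plausible reading of the description.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 BSUs, 3 hypothetical exposure variables.
X = rng.random((200, 3))
observed_density = 0.02 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(0.0, 0.002, 200)

n_repeats = 5  # the text sets n to five
predictions = []
for seed in range(n_repeats):
    # 70% calibration, 30% validation, with a different random split each time.
    X_cal, X_val, y_cal, y_val = train_test_split(
        X, observed_density, test_size=0.3, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_cal, y_cal)
    score = r2_score(y_val, model.predict(X_val))
    print(f"sub-sample {seed}: validation R^2 = {score:.2f}")
    predictions.append(model.predict(X))  # assumed: predict every BSU

# Final prediction is the average over the sub-sample models.
predicted_density = np.mean(predictions, axis=0)
# Positive values mean more observed density than exposure alone would suggest.
reduction_potential = observed_density - predicted_density
```

Averaging over several random splits reduces the dependence of the final prediction on any single calibration sample.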

Identification of Vehicle-Pedestrian Collision Hot Spots
The potential vehicle-pedestrian collision hot spots were first detected by setting a threshold value for crash density. For each of these locations, the potential for vehicle-pedestrian collision reduction was examined. If the value was no more than zero, the site was treated as a false positive and was excluded from the hot spots. The final hazardous road locations for pedestrians only included those with the potential for collision reduction above zero. Following Harirforoush and Bellalite [4], the threshold value was set to three standard deviations above the mean value in this research.
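The two-stage screening described above can be written compactly. The sketch below is illustrative only: the function name and sample values are hypothetical, and the use of the population standard deviation and a strict "greater than" comparison against the threshold are assumptions.

```python
from statistics import mean, pstdev


def identify_hot_spots(observed, predicted, n_sd=3.0):
    """Return (threshold, candidate indices, final hot spot indices)."""
    threshold = mean(observed) + n_sd * pstdev(observed)
    # Stage 1: locations whose observed density exceeds the threshold.
    candidates = [i for i, d in enumerate(observed) if d > threshold]
    # Stage 2: keep only candidates with reduction potential above zero.
    final = [i for i in candidates if observed[i] - predicted[i] > 0]
    return threshold, candidates, final


# Hypothetical densities for 50 BSUs: 48 quiet segments and two dense ones.
observed = [0.005] * 48 + [0.08, 0.09]
predicted = [0.005] * 48 + [0.10, 0.04]
threshold, candidates, final = identify_hot_spots(observed, predicted)
print(f"threshold = {threshold:.4f}, candidates = {candidates}, final = {final}")
```

In this toy case both dense segments pass the density threshold, but the first is dropped because its observed density is below what the exposure-based prediction would suggest.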

Study Area and Data
We analysed vehicle-pedestrian collisions occurring in 2015 in Changning District, which is located in the urban core of Shanghai, China. The vehicle-pedestrian collision data were collected from the Shanghai 110 Calling Center. The total length of arterial, secondary, and branch roads in this district is about 295 km. In 2015, 1200 vehicle-pedestrian collisions occurred in the district. Figure 1 shows the spatial distribution of vehicle-pedestrian crashes in the study area. In traffic safety research, the analysis is usually conducted based on crash data observed for 3- to 5-year periods; however, this study only used a dataset for one year. The reasons for this are twofold. First, given the length of the road network in the study area, 1200 vehicle-pedestrian collisions are able to depict overall pedestrian safety. It is not necessary to pool 3- or 5-year datasets to ensure the representativeness of the events. Second, since the late 2000s, the Shanghai Police has enforced a set of safety rules, which may result in significant yearly variation in safety performance. As mentioned earlier, to determine the potential for vehicle-pedestrian collision reduction, the vehicle-pedestrian collision density should be modeled by RF with variables that describe both vehicle and pedestrian volume. Because it is challenging and extremely costly to collect detailed information on the vehicle and pedestrian volume along roads, we employed proxy variables that may reflect the spatial variation in flows.
One crucial variable delineating traffic volume is the Global Positioning System (GPS) data extracted from GPS-equipped taxis. Such on-vehicle GPS data have been widely used in various fields such as urban traffic surveillance, trip pattern identification, city structure recognition, and traffic safety on arterial roads [46][47][48][49], but have not been applied to the modeling of vehicle-pedestrian collision density. The data were collected from nearly 13,000 GPS-equipped taxis from Shanghai Qiangsheng Holding Co., Ltd. (Shanghai, China). The Qiangsheng family owns about 25% of the total number of taxis, which represents 4-7% of the vehicle population in Shanghai [49]. The Qiangsheng taxi GPS tracking point database contains information including vehicle identification (ID), time, speed, and longitude and latitude recorded by GPS receivers on the vehicles about every 10 s. With locational information, GPS points were plotted onto a map. A map-matching process was then conducted to ensure that the tracking points were assigned to appropriate roads [50]. For each reference point, the number of taxis that passed was calculated. In this study, taxi GPS tracking data from 1-7 March 2016 were used for the calculation. The average daily taxi flow was introduced as the vehicle exposure variable for the vehicle-pedestrian collision density prediction models. One crucial issue is that the travel patterns and characteristics of taxicabs may differ from those of general traffic. A typical problem is that unoccupied taxis tend to cluster in some specific types of places such as shopping malls and metro stations. Including unoccupied taxis may cause overestimation of traffic flow in these locations. As trajectories of occupied taxis are more likely to reflect travel demands and hence the variation in real traffic, only taxis with passengers were included in the sample.
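The taxi-flow variable can be illustrated with a short sketch. The record layout is hypothetical: it assumes the points have already been map-matched to a reference point and carry an occupancy flag. Counting each taxi at most once per day per reference point is also an assumption; the study may have counted individual passes.

```python
from collections import defaultdict


def average_daily_taxi_flow(matched_points, n_days):
    """Average number of distinct occupied taxis per day at each reference point."""
    seen = defaultdict(set)  # reference point -> {(day, taxi_id)}
    for taxi_id, day, rp_id, occupied in matched_points:
        if occupied:  # keep only taxis carrying passengers
            seen[rp_id].add((day, taxi_id))
    return {rp_id: len(pairs) / n_days for rp_id, pairs in seen.items()}


# Hypothetical map-matched records: (taxi_id, day, reference_point_id, occupied).
points = [
    ("T1", "2016-03-01", "RP1", True),
    ("T1", "2016-03-01", "RP1", True),   # repeated GPS fix from the same taxi and day
    ("T2", "2016-03-01", "RP1", False),  # unoccupied, ignored
    ("T2", "2016-03-02", "RP1", True),
    ("T3", "2016-03-02", "RP2", True),
]
flow = average_daily_taxi_flow(points, n_days=2)
print(flow)
```

Because GPS fixes arrive roughly every 10 s, deduplicating by taxi and day keeps slow-moving taxis from being counted many times at the same location.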
In addition to vehicle flow, the pedestrian volume plays a crucial role in vehicle-pedestrian safety models. In the absence of detailed pedestrian flow data, we employed a set of variables that comprehensively reflect characteristics of pedestrian flow. As different uses of land may suggest diverse activities of human beings, which influence different features of pedestrian flow [51][52][53], we employed land use data to reflect the spatial variation in pedestrian exposure. Point of Interest (POI) data that could be used to further segment the activities were also introduced into the RF model to incorporate more detailed features of pedestrian flow. In this research, land use data were derived from Landsat (National Aeronautics and Space Administration, Washington, DC, USA) images from 2014 with a spatial resolution of 30 m. POIs were collected from Baidu, Inc. (Beijing, China) in 2014. The company provides application programming interfaces whereby users are allowed to develop programs for collecting POI information from Baidu Map. As some land use and POI variables are highly correlated, not all types of land use and POIs were integrated into the prediction models. Table 1 describes the variables that were finally introduced in the vehicle-pedestrian collision density models. The result of the collinearity test for these variables was 3.4, reflecting little collinearity. Due to data availability, we used the 2015 vehicle-pedestrian collision data, taxi GPS data from 2016, and land use and POI datasets from 2014. Since Changning District is located in the urban area of Shanghai, where the features of the built environment did not vary significantly from 2014 to 2016, it was reasonable to conduct the analysis based on datasets collected in different years during this period.

Results and Discussion
There were 1723 BSUs after the segmentation process. Following Equations (1) and (2), the vehicle-pedestrian density surface was produced, and the mean and standard deviation values were 0.008 and 0.01, respectively. The threshold value for identifying potential vehicle-pedestrian collision hot spots was computed as 0.038 (0.008 + 3 × 0.01), which resulted in 35 possible hazardous road locations for pedestrians.
The RF models were established using GridSearchCV in SKlearn for parameter adjustment. In this study, n_estimators, max_depth, and min_samples_split were set from 100 to 200, 2 to 30, and 2 to 20, respectively. The values of the mean cross-validation score, mean squared error, median absolute error, and R² for each sample are presented in Table 2. Regardless of the sample, the value of R² was above 0.60. The mean cross-validation scores were about 0.60 and fluctuated only slightly, which suggests that the results were relatively stable. The values of the mean squared error and median absolute error were small. All these indicators reflect that the RF models could explain, to a large extent, the variation in vehicle-pedestrian collision density when vehicle and pedestrian exposure variables were considered. The result also indicates that the occurrence of vehicle-pedestrian collisions may result from exposures (vehicle and pedestrian flows in this study), as well as from some risk factors that require further investigation for treatment. This is the reason why it was essential to consider the potential for collision reduction. As mentioned before, the RF technique is well suited to the complicated nonlinear relationship between the vehicle (or pedestrian) flow and the occurrence of vehicle-pedestrian collisions. Although it may have some black-box problems, RF is capable of providing the importance of variables (also called "features" in RF).
Figure 2 shows the value of the importance for each variable with different samples. Although the importance of each variable varied across samples, two variables, the number of retail shops and the taxi flow, ranked as the top two regardless of which sample was used. The mean feature importance of the two variables among the five samples was 0.3 and 0.15, respectively, indicating their ability to predict the occurrence of vehicle-pedestrian collisions. As mentioned before, previous studies have already investigated the relationship between land use characteristics and the occurrence of traffic crashes involving pedestrians [30,51], and it was found that vehicle-pedestrian collisions were more likely to happen in commercial areas. In this study, the commercial land was further segmented into different types of places such as retail shops and restaurants. The average importance value of the number of retail shops ranked in first place (see NoRetShp in Figure 2); the value of the restaurant count ranged from 0.04 to 0.08. This may have occurred because different kinds of activities may produce diverse types of pedestrian flow, thus significantly influencing the occurrence of vehicle-pedestrian collisions. The findings suggest that introducing POIs into the vehicle-pedestrian crash prediction models is desirable.
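A grid search over the three parameters named above might look like the following sketch. It is not the study's configuration: the data are synthetic, only the end points of each quoted range are searched to keep the example fast, and the choice of three folds and R² scoring is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.random((120, 3))  # synthetic stand-in for the exposure variables
y = 0.02 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(0.0, 0.002, 120)

# Only the end points of the ranges quoted in the text; a real search would be denser.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [2, 30],
    "min_samples_split": [2, 20],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,          # assumed number of folds
    scoring="r2",  # assumed scoring metric
)
search.fit(X, y)
print(search.best_params_)
print(f"mean cross-validated R^2 of best setting: {search.best_score_:.2f}")
```

The fitted `search.best_estimator_` is refit on all supplied data with the best parameter combination and can then be used for prediction.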
The final predicted vehicle-pedestrian collision density was produced by averaging the predictions of the five samples, and the potential for collision reduction was then calculated by subtracting the prediction from the observed vehicle-pedestrian collision density. Altogether, there were 634 BSUs with collision reduction potential. By comparing the resultant locations with those detected by merely setting the density threshold value, 4 of 35 potential hot spots were excluded. Figure 3 shows the spatial distribution of hot spots that were finally determined as hazards for pedestrians (see solid black lines in Figure 3), as well as locations with no crash reduction potential (see solid red lines in Figure 3). It can be observed from the figure that hot spots were also clustered, resulting in several hot zones for pedestrians. Some notable hot spots in this district (see the ellipse in Figure 3) were located in Tian Shan Road, Gu Bei Road, Mao Tai Road, Lou Shan Guan Road, and South Yu Ping Road. If the potential for vehicle-pedestrian collision reduction was not considered, the length of the roads that required further examination, including those colored in both black and red in the figure, was 2.7 km in total. When the proposed integrated method was applied, only 1.8 km of road segments were identified as hazardous. This allows engineers and policy-makers to focus their efforts on locations where there might be a higher likelihood of improving pedestrian safety.
Notably, in the absence of detailed vehicle and pedestrian exposure information at the micro level, we employed three variables (taxi flow, land use, and POI data) to reflect the variation in traffic and pedestrian characteristics across the study area, following previous studies on the relationship between the vehicle volume (or pedestrian flow) and taxi flow (or land use characteristics) [52][53][54]. Although the focus of this research was not the validation of the three variables as proxies of vehicle and pedestrian flow, the way in which vehicle and pedestrian exposure can be measured has always been an area of interest in road safety research [30]. With more experiments on the feasibility of proxy variables being performed in the future, better tools can be developed to increase the precision of the estimation, and the proposed method in this research could be further improved.

Conclusions
The improvement in pedestrian safety plays a crucial role in developing a safe and friendly walking environment to help ensure urban sustainability. Given the importance of hot spot detection in safety management, we proposed a framework for the identification of hazardous road locations

Figure 1. Spatial distribution of vehicle-pedestrian collisions in 2015 in the study area.

Figure 2. Feature importance of variables in each sample.

Figure 3. Spatial distribution of vehicle-pedestrian collision hot spots.

Table 1. Description of variables in the vehicle-pedestrian collision density models.

Table 2. Results of Random Forest (RF) models.