Examining Hotspots of Tra ﬃ c Collisions and their Spatial Relationships with Land Use: A GIS-Based Geographically Weighted Regression Approach for Dammam, Saudi Arabia

: Examining the relationships between vehicle crash patterns and urban land use is fundamental to improving crash predictions, creating guidance, and comprehensive policy recommendations to avoid crash occurrences and mitigate their severities. In the existing literature, statistical models are frequently used to quantify the association between crash outcomes and available explanatory variables. However, they are unable to capture the latent spatial heterogeneity accurately. Further, the vast majority of previous studies have focused on detailed spatial analysis of crashes from an aggregated viewpoint without considering the attributes of the built environment and land use. This study ﬁrst uses geographic information systems (GIS) to examine crash hotspots based on two severity groups, seven prevailing crash causes, and three predominant crash types in the City of Dammam, Kingdom of Saudi Arabia (KSA). GIS-based geographically weighted regression (GWR) analysis technique was then utilized to uncover the spatial relationships of tra ﬃ c collisions with population densities and relate it to the land use of each neighborhood. Results showed that Fatal and Injury (FI) crashes were mostly located in residential neighborhoods and near public facilities having low to medium population densities on highways with relatively higher speed limits. Distribution of hotspots and GWR-based analysis for crash causes showed that crashes due to “sudden lane deviation” accounted for the highest proportion of crashes that were concentrated mainly in the Central Business District (CBD) of the study area. Similarly, hotspots and GWR analysis for crash types revealed that “collisions between motor vehicles” constitute a signiﬁcant proportion of the total crashes, with epicenters mostly stationed in high-density residential neighborhoods. The outcomes of this study could provide analysts and practitioners with crucial insights to understand the complex inter-relationships between tra ﬃ c safety and land use. It can provide useful guidance to policymakers for better planning and e ﬀ ective management strategies to enhance safety at zonal levels.


Road Safety in Kingdom of Saudi Arabia
Road traffic collisions pose a critical public health concern worldwide. It is estimated that over 1.3 million peoples are killed, and 50 million others are injured due to traffic crashes every year [1].

Definition of Crash Hotspots
To reduce the burden of traffic crashes, it is critical to examine the time and locations of crashes where they occur more frequently. Hotspot identification (HSID) is a vital task for road traffic safety programs. Locations that have clusters of high concentrations of crashes are commonly known as crash-prone locations or hotspots. The number of crashes occurring at any specific site or road segment during a certain period of time is a non-negative integer and probabilistic in nature [6,7]. Even though traffic crashes are unpredictable at the micro-level and random by nature, statistical models can yield reliable estimations of expected crash frequencies as a function of explanatory variables such as traffic flow, site characteristics, and road geometry data at the macro-level [8,9]. In previous studies, numerous empirical relationships between such predictor variables and vehicle crashes have been established [10][11][12]. Such crash prediction models are useful in determining the critical risk factors, evaluating design and management alternatives, and improving the safety for road users [13,14].

Existing Methods for Crash Hotspots Identification
Crash HSID methods are frequently studied under two main categories: observed crash frequency and expected crash frequency [15,16]. The methods based on the crash frequency or crash rate is calculated as the number of crashes per vehicle-kilometer for road segments or per vehicle-entering at road junctions. However, methods solely based on observed crash frequency are considered inefficient, since crashes are random events that are likely to change both spatially and temporally. This randomness of crash occurrences can be attributed to several factors, including the conditions of the drivers, vehicles, roadways, traffic, and the physical environment. Given the limitations of methods based on crash frequency, the methods based on expected frequency are gaining rapid attention, whereby expected measures for HSID are estimated by different statistical models, including basic Poisson models, Negative Binomial models, and Empirical Bayesian methods [17]. It is believed that methods based on expected crash frequency will more accurately reflect the expected risk levels at specific locations during a given time period [18].
Numerous methods have been suggested for HSID of traffic crashes in recent years [16,[19][20][21]. Each method has its own characteristics, benefits, and shortcomings, making each suitable for HSID under specific conditions. The most commonly used methods for HSID for traffic crashes include the kernel density estimation method (KDE) [22,23], local indicators of spatial association (LISA) method [24], and the nearest neighborhood hierarchical (NNH) clustering method [25]. KDE is a non-parametric technique frequently adopted to estimate the probability density function of a random variable [26]. To measure local crash risk, each spatial unit is assigned a local density estimate (LDE) where spatial units exceeding a certain given LDE threshold are labeled as hazardous. If a group of neighboring spatial units shares the LDE higher than that certain threshold, the road segment and surrounding junctions can be identified as a hotspot [27]. Similarly, with the LISA approach, each spatial unit is allocated a LISA index. This index evaluates the spatial association for observed crashes at adjacent units. Higher LISA index would indicate an increased spatial concentration of crashes and vice-versa. Finally, the NNH is a hierarchical clustering technique that clusters the spatially distributed points together based on predefined criteria [28]. Clustering is repeated until all the points within the threshold distance are grouped into single cluster (referred to as hotspot), or the clustering criteria fails.
Empirical Bayes (EB) and full Bayesian (FB) are other approaches and techniques that are increasingly being used for HSID [20,29]. The EB method accounts for both historical and expected crash trends to yield the crash estimates for similar sites. It also accounts for the regression-to-mean phenomenon. Unlike other conventional HSID methods, previous studies suggest that the EB method is more reliable and accurate for HSID [30,31]. Regression-to-the-mean (RTM) implies the short-term fluctuations in crash frequencies around the average/expected crash frequency over a particular period. When a period with a comparatively high crash frequency is observed, it is statistically probable that the following period will have a comparatively low crash frequency (even in the absence of no countermeasures). This tendency also applies to the high probability that a low crash frequency period will be followed by a high crash frequency period. On the other hand, the FB HSID method uses a probability function (as Poisson-lognormal distribution) for crash frequency and caters to the area-specific myriad of risk factors as safety performance functions (SPFs) [17,32].

Relationship between Land Use and Crash Hotspots
Identifying crash clusters or hotspots provide key insights to safety specialists for a better understanding of crash patterns and management of road safety. Understanding the role of the built environment and urban land use is an essential problem in road safety studies. The relationship between land use and traffic crashes is apparent since different land use tends to attract and generate different types of trips [33]. Trip making behavior is a crucial predictor variable for determining the nature and volume of traffic. Although several previous studies have reported that traffic violations and crashes are closely related to characteristics of drivers and travel behavior [34][35][36][37], yet it is rational to assume that they are likely to increase as the land use intensifies. Earlier studies have proposed different methods to explore spatio-temporal variations of crashes [15,18]. Only a few have investigated the influence of land use on the crash patterns [33,38]. Further, conventional studies have primarily focused on determining crash hotspots based on aggregated crash frequencies, thus ignoring the significance of individual crash characteristics and contributing factors. However, it is well established that "not all crashes are equal," making it inevitable to identify hotspots for different crash attributes.
A number of previous studies reported that that urban zones with low densities such as rural and residential areas have high injury severities when compared to those with high densities such as commercial and congested residential areas [39,40]. A study conducted by Xie et al. used generalized structural equation modeling (GSEM) to assess the effect of land-use conversion on severe crashes (SC) [41]. The study results showed that urban residential, business and commercial, and mixed residential-commercial land uses had the highest risk of exposure to SC. Pulugurtha et al. utilized a negative binomial model to explore the impact of land use characteristics on zonal risk estimation and found that urban residential land use and mixed areas were highly correlated with crash frequencies [42]. In another study, the applicability of the empirical Bayes (EB) method was examined for the association between land use traffic collisions in three districts of Hong Kong [43]. It was concluded that commercial areas were more hazardous based on average crash counts and as well as injury severity. Chen et al. also investigated the effect of the built environment on pedestrian crashes [44]. The authors noted that they are inversely related with (i) high densities of sidewalks, (ii) high proportions of steep gradients, (iii) high proportions of industrial land use and employment, (iv) areas with lower average speed limits, and (v) lower number of bus stops. Thisstudy further demonstrated that local characteristics ISPRS Int. J. Geo-Inf. 2020, 9, 540 4 of 22 such as land use and urban population densities do have noticeable effects on traffic crashes. A better understanding of the relationship between vehicle crashes and land use is fundamental in improving crash prediction and providing sound policy recommendations that could reduce the number of crash occurrences as well as their severities.

Previous Studies for HSID Using GIS
Recently, geographic information systems technologies (GIST) have drawn widespread attention for the identification of crash hotspots. The GIS-based HSID process uses several methods and techniques, such as the traditional kernel density estimations, nearest neighbor analysis, and K-functions. A recent study examined the performance of four clustering methods in a GIS platform to observe crash patterns in the complex urban networks of Mashhad city, Iran [45]. The results revealed that crashes in the study area were more clustered in certain parts of the city. Additionally, in Iran, Shariat et al. adopted a GIS-based network kernel density estimation (NKDE) approach for detecting high-risk crash locations in the Markazi province [46]. They used Moran's I spatial autocorrelation coefficient to determine whether crashes were clustered together or not. Similarly, Ulak et al. compared the performance of Local Moran's I, Getis-Ord Gi*, KLINCS, and KLINCS-IC to gain better insights in the identification of hotspots in the eastern US state of Florida [47]. Alternative spatial weights (such as distance, free-flow travel time, and congested travel time) based sensitivity analysis were undertaken to examine the influence of bandwidths on the identified hotspots. Local Moran's I considering the weight distance, achieved the highest prediction accuracy followed by KLINCS-IC with free-flow weight travel time. Bíl et al. used KDE based clustering methods for spatiotemporal analysis of crash hotspots using nine years (2010-2018) crash data from Czech Republic [48]. The authors classified the hotspots based on three attributes, i.e., hotspot emergence, their stability, and disappearance. It was noted that only 100 hotspots were stable over the entire period.
Dereli and Erdogan also used GIS-aided spatial statistical methods (Poisson regression, empirical Bayesian, negative binomial regression) for determining crash hotspots [49]. Nine years of crash data (2005-2013) for 2408 state roads in Turkey were collected. Out of the 32,107 sub-segments, 126 were classified as hotspots. In another study, Yalcin and Duzgun employed GIS for the identification of hotspots for two-wheeled vehicles [50]. Three methods of spatial crash pattern analysis (nearest neighbor distance, kernel density, and the K-function) on road networks were used to find crashes clustered along the main transportation routes in the study area. Yuan et al. proposed a novel Firefly Clustering Algorithm coupled with GIS for accurate identification of crash hotspots in the urban areas of Licheng district, situated in the east of Jian, China [51]. Results showed that Euclidean distance-based search method could easily overestimate the number of hotspots, especially near the urban road intersections, while the proposed method is more robust under different scenarios. Butt et al. also utilized GIS-based geo-statistical surveillance for the identification of crash hotspots in the district of Rawalpindi, Pakistan [52]. They found that high-risk sites were located in both commercial and residential areas near hospitals schools, public transit services, and airports. Le and Lin investigated the significance of GIS-based statistical analytic techniques for spatio-temporal variation of crash severity index (SI) in Hanoi, Vietnam [53]. During the first phase of the study, the KDE method was utilized to analyze hotspots according to different time intervals (daytime, night-time, peak/off-peak periods) and seasons. The results were then presented using co-map technique. Results showed that both methods provided relatively similar hotpot locations, but due to the integration of SI, a ranking of some hotspots was different from others. The authors included SI in crash hotspot analysis for precise determination and ranking of hotspots locations.
Recently, few studies have used geographically weighted regression (GWR) for analysis of traffic crashes. Zheng et al. examined the spatial variation of factors contributing to crash harm using GWR in southeastern Virginia [54]. The map of coefficients with GWR showed detailed insights where certain factors are associated with higher crash harm. In another study, Soroori et al. utilized geographically weighted Poisson and negative binomial regression (GWPR and GWNBR) to model the relationship between crash injury frequency and transportation macro-level variables such as transport infrastructures, traffic characteristics, driver socio-economic factors, and land use attributes [55]. Results showed that proposed methods are more robust than GWR by capturing the variable's spatial heterogeneity more accurately. Zhibin, in their study, also showed that GWPR was useful in capturing the spatially non-stationary relationships between predictor variables at the county level [56]. Similarly, Li et al. compared the performance of the GWR calibrated statistical models with other models calibrated using the ordinary least squares (OLS) technique for predicting crashes on 245 intersections in Chicago [57]. The authors used crash data collected from 2008 to 2010 and performed analysis of variance tests that revealed that models calibrated with GWR achieved improved crash predictability. Another study conducted in Flanders, Belgium, also demonstrated that GWPR outperforms their corresponding generalized linear models (GLMs) in capturing the spatial heterogeneity of fatal and injury crashes [58]. Li et al. employed GIS-based Bayesian approach for analyzing spatio-temporal patterns of motor vehicle crashes in Texas and found that the proposed method is promising in identifying the relative crash risks [59]. Huang et al. also examined the relationship of built-environment and land use with urban traffic crashes through the application of GWR analysis [60]. Their study showed that variables such as intersection density, percentage of local road mileage, and percentage of commercial land use have a stable relationship with traffic crashes. Ye et al. also reported that urban population density and the presence of subway stations are directly related to the occurrence of traffic crashes [61].

Contributions of the Current Study
A critical review of the existing literature indicates that various methods were proposed for HSID in various parts of the world, with each having its own suitability and characteristics. Existing studies have mostly focused on statistical modeling-based approaches to uncover associations between crash outcome and predictor variables. However, these methods fail to capture the spatial heterogeneity and non-stationarity of traffic crashes. The relationship between land use and traffic crashes is obvious since different land uses generates/attracts different types of trips, yet only a few studies have explored this topic. Furthermore, previous studies have mostly focused on aggregated frequencies-based clustering of traffic crashes, which has limited applications to the implementation of effective countermeasures. Thus, it is essential to explore the crash patterns based on individual disaggregated crash attributes to mitigate the burden of specific severity class, crash causes, and crash types.
Only few studies have examined a detailed spatial analysis of crashes in KSA. This study is aimed to utilize the geographically weighted regression (GWR) method for unraveling the relationship between traffic crashes with population density and urban land use for the city of Dammam, Saudi Arabia. Crash hotspots and GWR based maps for three prevailing crash types (vehicle collisions, hit fixed object, and hit pedestrians), seven predominant crash causes (distractions, speeding, sudden lane deviations, not giving way, poor roadway, fatigue driving, and traffic violations), and two severity groups (fatal and injury (FI) and property damage only (PDO)) were developed at the zonal level in the study area. The findings of this study could provide essential guidance for road safety agencies and policymakers to identify the high-risk and crash-prone locations and to adopt appropriate measures for improving zonal-level safety.
The remainder of this paper is organized as follows. Section 2 provides a brief description of the study area. Section 3 presents the data and methods. The results are presented and discussed in Section 4. Finally, Section 5 highlights the study conclusions, implications of the current study, and prospects for future studies.

Study Area
The city of Dammam is the capital of the Eastern Province of Saudi Arabia, and it was chosen as the area of interest for this study (Figure 1). While the city has a total area of 653 km 2 , only 450 km 2 was chosen for this study since the remaining 203 km 2 are vacant lands that are yet to be developed [62].
Dammam currently has a total population of over 1.1 million [63]. After the oil boom during the 1970s, the city's population grew exponentially over the past few decades. With the rapid population growth, the number of registered vehicles in the city has surged from 0.4 million in 2001 to 1.3 million in 2018 [64]. Dammam currently does not have any public transportation system. The rapid rate of motorization and increased auto-ownership have severe consequences as far as road safety is concerned. The number of traffic crashes have increased exponentially over the past ten years resulting in hundreds of deaths, injuries, and costing billions of dollars in losses to the economy.  The population distribution and the land use map of the Dammam Metropolitan area are shown in Figure 2. A total of 82 occupied residential neighborhoods were considered in the analysis. Distribution of the population is diverse among the different neighborhoods (ranging between 5000 and 30,000 residents). Nearly 40% of the city's population are expatriates coming from surrounding Arab, South Asian, and Southeast Asian countries. Due to their diverse backgrounds and cultures with various driving styles and training, it is difficult for them to follow the Saudi driving rules and regulations. The city is a major hub to oil industries, public and private businesses, public services, academic institutions, and various commercial activities contributing to the traffic generations. As shown in Figure 2, the entire city can be primarily divided into five different land use zones (i.e., residential, commercial, mixed commercial-residential, industrial/employment, and public facilities and services). Almost 65% of the area is pure residential areas with high-density neighborhoods surrounding the downtown. Low-to-medium population densities are located mostly within the mixed commercial and residential zones along the coastal areas in the north-eastern parts of the city. The population distribution and the land use map of the Dammam Metropolitan area are shown in Figure 2. A total of 82 occupied residential neighborhoods were considered in the analysis. Distribution of the population is diverse among the different neighborhoods (ranging between 5000 and 30,000 residents). Nearly 40% of the city's population are expatriates coming from surrounding Arab, South Asian, and Southeast Asian countries. Due to their diverse backgrounds and cultures with various driving styles and training, it is difficult for them to follow the Saudi driving rules and regulations. The city is a major hub to oil industries, public and private businesses, public services, academic institutions, and various commercial activities contributing to the traffic generations. As shown in Figure 2, the entire city can be primarily divided into five different land use zones (i.e., residential, commercial, mixed commercial-residential, industrial/employment, and public facilities and services). Almost 65% of the area is pure residential areas with high-density neighborhoods surrounding the downtown. Low-to-medium population densities are located mostly within the mixed commercial and residential zones along the coastal areas in the north-eastern parts of the city.

Data and Methods
There were several sets of data that were collected and processed for this study. First, a database containing the detailed vehicular crash dataset containing their latitude and longitudinal coordinates between the years 2009 and 2016 was collected from the Dammam Traffic Department. The data cover all types of motor vehicle crashes during the study period. The local traffic police department is mainly responsible for recording detailed information about the crash event. Details on different crash characteristics are collected by emergency response crew on standard crash reporting proformas. The nearby police patrolling unit is required to reach the crash scene within 7 min of the event happening. The crew consists of a medical staff unit and trained observers for noting crash information. This information includes the drivers, sociodemographic attributes (age, gender, education, profession etc.), vehicle characteristics, weather characteristics, lighting conditions, and

Data and Methods
There were several sets of data that were collected and processed for this study. First, a database containing the detailed vehicular crash dataset containing their latitude and longitudinal coordinates between the years 2009 and 2016 was collected from the Dammam Traffic Department. The data cover all types of motor vehicle crashes during the study period. The local traffic police department is mainly responsible for recording detailed information about the crash event. Details on different crash characteristics are collected by emergency response crew on standard crash reporting proformas. The nearby police patrolling unit is required to reach the crash scene within 7 min of the event happening. The crew consists of a medical staff unit and trained observers for noting crash information. This information includes the drivers, sociodemographic attributes (age, gender, education, profession etc.), vehicle characteristics, weather characteristics, lighting conditions, and road inventory from crash locations. The final crash database on the excel spreadsheet is then complied by extracting the details from individual crash report files remotely in office. In addition to crash data, information about the boundaries of the residential neighborhoods with their population size (2010) and detailed land use was collected from the Dammam Municipality's office. Data on land use and population zoning were also collected from Dammam municipality.
To examine the pattern of distribution for the crashes, the nearest neighbor index (NNI) for the three crash attribute categories (crash severity groups, crash causes, and crash types) were calculated in ArcGIS v.10.6 to examine their pattern of distribution and knowing if the crashes are clustered together in few specific locations, are dispersed, or occurred randomly throughout the study area. The NNI is one of the common indexes used to examine point patterns, and it is calculated using Equation (1) [65]. In the equation D nnd is the average distance between each crash location and the nearest crash and D ran is the distance that would be expected if the crashes were randomly distributed [65]: where: In the equations, n is the total number of crashes within the study area; d i is the distance between each crash location, and its nearest closest crashes location, and A is the total area for the study (450 km 2 ). The NNI value near 0 would indicate clustering, while a value close to 1 would suggest the crashes are randomly distributed. An NNI value greater than 1 suggests that the crashes are spatially dispersed. In this study, crash points that had NNI values less than 0.6 were considered to be clustered, and the optimized hotspot analysis algorithm in ArcGIS was used to find the locations of the hotspots for each of the concerned crash attribute categories. In this algorithm, a fishnet polygon formed by cells of 250 m side length is created based on the number of crashes within each polygon. All the hotspot areas were mapped with more than a 90% confidence level.
In the final step of the analysis, a geographically weighted regression (GWR) method was used to examine the spatial relationships between the number of crashes (by per neighborhood and the population within that neighborhood). The mathematical equation for GWR is given in Equation (3) [50].
where x ij is the jth predictor variable (only population), β j (u i ,v i ) is the jth coefficient, (u i ,v i ) is the vector form of x and y coordinates, and y i is the rate of the crash within each neighborhood. Finally, the ε i is the error term determined from the observed standardized score and the predicted score [66]. A fixed kernel was used to conduct the GWR analysis, using the AICc (corrected Akaike Information Criterion) to determine the optimal bandwidth parameter.

Temporal Distribution of Crashes
The temporal variations of the traffic crashes within the study area are shown in Figure 3. Figure 3a shows the yearly fluctuations in the total number of crashes, injuries, and fatalities. A total of 11,539 crashes were recorded, resulting in 806 deaths and over 6000 injuries. Close evaluation of Figure 3a indicates that year 2012 witnessed the highest number of crashes, injuries, and traffic casualties. This sharp increase in crash record stirred an immediate response from road safety officials and concerned authorities. Various preventive measures such as revising the speed limits, strictly enforcing traffic rules and regulations, and increasing traffic violation fines were implemented. The steady decrease in the observed number of crashes in the following years may be attributed partly due to the regression-to-mean (RTM) phenomenon and partly due to implemented interventions. However, the proportion of injuries and deaths is well above the desired road safety targets in the country.
In Figure 3b, the variation in mean monthly crashes for the entire study period is presented. It can be observed that the summer months, including June, July, and August, carried the highest number of crashes followed by November and December. The large proportion of accidents during summer months may be attributed to a greater number of intracity trips during summer vacations as well as due to tire failures due to extreme heat in months of summer in KSA [5,67,68]. The frequency distribution of mean crashes among a different day of the weeks are plotted in Figure 3c. In KSA, Fridays and Saturdays are considered the weekends. Since the residents usually stay at home to attend the Friday noon prayers and spend the afternoon at home with friends and family, Friday has the lowest rate of total crashes. However, the residents travel to supermarkets, restaurants, and medical facilities during Saturdays. Therefore, Saturday has the highest rate of traffic crashes. During the weekdays, the number of crashes is fairly consistent. Finally, Figure 3d shows the hourly fluctuations in average daily traffic crashes. The distribution of hourly crash patterns is intuitive based on traffic exposure and travel during the day.
ISPRS Int. J. Geo-Inf. 2020, 9, x FOR PEER REVIEW 9 of 23 steady decrease in the observed number of crashes in the following years may be attributed partly due to the regression-to-mean (RTM) phenomenon and partly due to implemented interventions. However, the proportion of injuries and deaths is well above the desired road safety targets in the country. In Figure 3b, the variation in mean monthly crashes for the entire study period is presented. It can be observed that the summer months, including June, July, and August, carried the highest number of crashes followed by November and December. The large proportion of accidents during summer months may be attributed to a greater number of intracity trips during summer vacations as well as due to tire failures due to extreme heat in months of summer in KSA [5,67,68]. The frequency distribution of mean crashes among a different day of the weeks are plotted in Figure 3c. In KSA, Fridays and Saturdays are considered the weekends. Since the residents usually stay at home to attend the Friday noon prayers and spend the afternoon at home with friends and family, Friday has the lowest rate of total crashes. However, the residents travel to supermarkets, restaurants, and medical facilities during Saturdays. Therefore, Saturday has the highest rate of traffic crashes. During the weekdays, the number of crashes is fairly consistent. Finally, Figure 3d shows the hourly fluctuations in average daily traffic crashes. The distribution of hourly crash patterns is intuitive based on traffic exposure and travel during the day.

Crash Hotspot Analysis
The first objective of this study is to examine the presence of hotspots by crash severity, causes, and types. It was done by assessing the NNI values for the crash incidences (Table 1). Since all the crash attributes had NNI values less than 0.6, the hotspot analysis was performed for all the crash severity, causes, and types.

Hotspot Analysis by Crash Severity
In Figure 4, the crash hotspot locations based on severity indices (i.e., property damage only (PDO) or those involving fatal and injury (FI)) are shown. It is established that "not all crashes are equal", the extent of damage associated with the specific crash is dependent on several factors including urban land use, features of the built environment, crash type, driver alertness, and response to the event, road designs, vehicle characteristics, number of vehicles involved, the speed at the time of impact, and many others. PDO crashes were prevalent in medium to high density residential and public facility zones where permissible speed limits (<50 kmph) are relatively low, and traffic volumes are significantly high. Since the drivers are usually more alert during such circumstances, the chances of any significant injuries and casualties are low. On the other hand, the FI crashes were clustered on routes having flexible driving speed limits in urban areas having low-to-medium population densities. Apart from over-speeding, other factors such as driver distractions, fatigue driving, sudden lane deviations, and crashes involving pedestrians were significant contributors to FI crashes.

Crash Hotspot Analysis
The first objective of this study is to examine the presence of hotspots by crash severity, causes, and types. It was done by assessing the NNI values for the crash incidences (Table 1). Since all the crash attributes had NNI values less than 0.6, the hotspot analysis was performed for all the crash severity, causes, and types.

Hotspot Analysis by Crash Severity
In Figure 4, the crash hotspot locations based on severity indices (i.e., property damage only (PDO) or those involving fatal and injury (FI)) are shown. It is established that "not all crashes are equal", the extent of damage associated with the specific crash is dependent on several factors including urban land use, features of the built environment, crash type, driver alertness, and response to the event, road designs, vehicle characteristics, number of vehicles involved, the speed at the time of impact, and many others. PDO crashes were prevalent in medium to high density residential and public facility zones where permissible speed limits (<50 kmph) are relatively low, and traffic volumes are significantly high. Since the drivers are usually more alert during such circumstances, the chances of any significant injuries and casualties are low. On the other hand, the FI crashes were clustered on routes having flexible driving speed limits in urban areas having low-to-medium population densities. Apart from over-speeding, other factors such as driver distractions, fatigue driving, sudden lane deviations, and crashes involving pedestrians were significant contributors to FI crashes.

Hotspot Analysis by Crash Causes
In Figure 5, the distribution of hotspots by prevailing crash causes (i.e., driver distractions, not giving way, over-speeding, poor roadway, driver fatigued or falling asleep behind the steering wheel, and sudden lane deviations) is shown. Such crashes are mainly distributed along major and minor urban residential streets. However, a high concentration of crashes occurred at congested major intersections havinghigh population density in the surrounding areas. It is evident from the figure that although the extent and spread of hotspot zones by crash causes are different, they have one main common epicenter. This observation is intuitive because the epicenter for all crash categories is located in the downtown or central business districts of the Dammam Metropolitan Area. Speeding has the highest percentage (46.4%), followed by sudden lane change (43.9%). The distribution of crashes caused by driver sleep/exhaustion has the lowest percentage (19.9%) of the hotspots (Table 1). Crashes caused by traffic violations were another prevailing crash category that was mapped. These include violations of traffic control devices (TCD) like traffic signals, stop signs, road markings, non-compliance with the pedestrian crossings, and distractions caused by the use of mobile phones while driving. About 29% of the total crashes in this class fall within the hotspot zones.

Hotspot Analysis by Crash Types
The distribution of hotspots by crash types are shown in Figure 6. It was found that crashes between two or more modes of motorized transportations are the most predominant crash types accounting for over 68% of the total crashes. A significant proportion of collisions were rear-end crashes in the dilemma zones near the signalization intersections, which primarily resulted in non-fatal crashes. Approximately 53% of the total collisions belonged to hotspots. The next prevailing type of crash hotspot in this category was "hit fixed objects crashes" that included a collision with a pole, tree, and parked vehicles, etc. Only 17% of fixed object crashes were marked as hotspots. Although pedestrian crash hotspots constitute a low percentage (19.3%) of total crashes in this category, it resulted in a high number of injuries and fatalities. Elderly pedestrians, in particular, were more prone to severe crashes. In general, the patterns of crash hotspots by prevailing crash types (mapped in Figure 6) were also concentrated in the metropolitan downtown area.

Hotspot Analysis by Crash Types
The distribution of hotspots by crash types are shown in Figure 6. It was found that crashes between two or more modes of motorized transportations are the most predominant crash types accounting for over 68% of the total crashes. A significant proportion of collisions were rear-end crashes in the dilemma zones near the signalization intersections, which primarily resulted in nonfatal crashes. Approximately 53% of the total collisions belonged to hotspots. The next prevailing type of crash hotspot in this category was "hit fixed objects crashes" that included a collision with a pole, tree, and parked vehicles, etc. Only 17% of fixed object crashes were marked as hotspots. Although pedestrian crash hotspots constitute a low percentage (19.3%) of total crashes in this category, it resulted in a high number of injuries and fatalities. Elderly pedestrians, in particular, were more prone to severe crashes. In general, the patterns of crash hotspots by prevailing crash types (mapped in Figure 6) were also concentrated in the metropolitan downtown area.

GWR Analysis for Land Use Neighborhood Population and Crash Counts Severity, Causes, and Types
The second objective of this study was to examine the relationships between the total population of each neighborhood (Figure 2) against the number of vehicle crashes based on their characteristics (severity, causes, and types) occurring in that neighborhood during the study period. As discussed in the methods section, such relationships were examined using the GWR analyses. The results of the GWR analyses are shown in Figures 7-9. Each figure shows the standard deviations of the residuals (SDR), indicating the neighborhoods where the model over-and under-predicted the value of the dependent variable (crash characteristics). Neighborhoods with SDR values below −1.5 have crash characteristics very low when compared to the population in those neighborhoods. On the other hand, neighborhoods having the SDR value above 1.5 have value for crash characteristics that are significantly very high when compared to the population in those neighborhoods. Neighborhoods with SDR values between −0.5 and +0.5 have relatively proportional crash characteristics compared to the neighborhood populations.

GWR Analysis for Land Use Neighborhood Population and Crash Counts Severity, Causes, and Types
The second objective of this study was to examine the relationships between the total population of each neighborhood (Figure 2) against the number of vehicle crashes based on their characteristics (severity, causes, and types) occurring in that neighborhood during the study period. As discussed in the methods section, such relationships were examined using the GWR analyses. The results of the GWR analyses are shown in Figures 7-9. Each figure shows the standard deviations of the residuals (SDR), indicating the neighborhoods where the model over-and under-predicted the value of the dependent variable (crash characteristics). Neighborhoods with SDR values below −1.5 have crash characteristics very low when compared to the population in those neighborhoods. On the other hand, neighborhoods having the SDR value above 1.5 have value for crash characteristics that are significantly very high when compared to the population in those neighborhoods. Neighborhoods with SDR values between −0.5 and +0.5 have relatively proportional crash characteristics compared to the neighborhood populations.

GWR Analysis for Neighborhood Crash Severity
In Figure 7, the SDR values for the relationships between the population and the crash severities (FI and PDO crashes) are presented. It can be seen that fatal and injury (FI) crashes are mainly located in the neighborhoods with residential and mixed residential-commercial land uses where there are relatively high population densities. These neighborhoods are subjected to frequent driving, and the residents are exposed to high traffic while commuting for daily activities. A small proportion of FI crashes also occurred in industrial/employment neighborhoods and zones. Commercial zones have the lowest proportion of these crashes. Property damage only (PDO) crashes are mainly concentrated in residential and industrial/employment land use zones with low population densities and with light and/or slow-moving vehicles. The commercial zones and neighborhoods with public facilities zones have a low share of PDO crashes.

GWR Analysis for Neighborhood Crash Severity
In Figure 7, the SDR values for the relationships between the population and the crash severities (FI and PDO crashes) are presented. It can be seen that fatal and injury (FI) crashes are mainly located in the neighborhoods with residential and mixed residential-commercial land uses where there are relatively high population densities. These neighborhoods are subjected to frequent driving, and the residents are exposed to high traffic while commuting for daily activities. A small proportion of FI crashes also occurred in industrial/employment neighborhoods and zones. Commercial zones have the lowest proportion of these crashes. Property damage only (PDO) crashes are mainly concentrated in residential and industrial/employment land use zones with low population densities and with light and/or slow-moving vehicles. The commercial zones and neighborhoods with public facilities zones have a low share of PDO crashes.

GWR Analysis for Neighborhood Crash Causes
In Figure 8, the GWR analysis for population and predominant crash causes in the study area are shown. It is evident from the figure that distractions, speeding, sleep/fatigue crashes have similar patterns of association with densely populated neighborhoods in the residential areas. Crashes due to sudden lane changes are also more prominent in residential and mixed commercial-residential areas. A small percentage of crashes in residential and industrial/employment zones occurred due to poor roadway and surface conditions.

GWR Analysis for Neighborhood Crash Causes
In Figure 8, the GWR analysis for population and predominant crash causes in the study area are shown. It is evident from the figure that distractions, speeding, sleep/fatigue crashes have similar patterns of association with densely populated neighborhoods in the residential areas. Crashes due to sudden lane changes are also more prominent in residential and mixed commercial-residential areas. A small percentage of crashes in residential and industrial/employment zones occurred due to poor roadway and surface conditions.  Figure 9 portrays the relationship between the population and the distribution of crashes by prevailing crash types. The results show that collisions and hitting fixed objects are predominant in residential neighborhoods with high population densities in the northwestern parts of the study area. The streets are narrow in these areas. The relationship is weak with commercial, mixed commercial-residential, and public facilities. Fixed object crashes have a high positive correlation with sparsely built residential and industrial/employment zones. Mixed commercial-residential and commercial areas in the southeast of the city also have a slight positive affinity with fixed object crashes. Hit pedestrian crashes have a positive association with medium to highly dense residential land use and mixed commercial-residential areas. ISPRS Int. J. Geo-Inf. 2020, 9, x FOR PEER REVIEW 17 of 23 Figure 8. Distribution of the SDR values for different crash causes. Figure 8. Distribution of the SDR values for different crash causes.

GWR Analysis for Neighborhood Crash Types
The streets are narrow in these areas. The relationship is weak with commercial, mixed commercialresidential, and public facilities. Fixed object crashes have a high positive correlation with sparsely built residential and industrial/employment zones. Mixed commercial-residential and commercial areas in the southeast of the city also have a slight positive affinity with fixed object crashes. Hit pedestrian crashes have a positive association with medium to highly dense residential land use and mixed commercial-residential areas.

Conclusions
This study aimed to examine the relationship between traffic crashes with urban population density and land use in the context of the Dammam Metropolitan Area on the eastern coast of the KSA. The GWR analysis was utilized to investigate spatial heterogeneity and non-stationarity of traffic crashes. Crash hotspots and GWR based hazard maps for prevailing crash types, crash causes, and two severity groups (FI and PDO) were developed at the zonal level in the study area. Three crash types and seven predominant crash causes were found significant (>90%) for hotspots mapping. It was found that hotspots for FI crashes were located in residential neighborhoods and near public facility areas with low to medium population densities and along highways with relatively high travel speeds. On the contrary, hotspots for PDO crashes were concentrated in residential and industrial/commercial zones having medium to high-density populations. These areas had significantly high traffic volume with low traveling speed limits. This distribution of hotspots based on crash severity is expected since high population densities in these areas are accompanied by large

Conclusions
This study aimed to examine the relationship between traffic crashes with urban population density and land use in the context of the Dammam Metropolitan Area on the eastern coast of the KSA. The GWR analysis was utilized to investigate spatial heterogeneity and non-stationarity of traffic crashes. Crash hotspots and GWR based hazard maps for prevailing crash types, crash causes, and two severity groups (FI and PDO) were developed at the zonal level in the study area. Three crash types and seven predominant crash causes were found significant (>90%) for hotspots mapping. It was found that hotspots for FI crashes were located in residential neighborhoods and near public facility areas with low to medium population densities and along highways with relatively high travel speeds. On the contrary, hotspots for PDO crashes were concentrated in residential and industrial/commercial zones having medium to high-density populations. These areas had significantly high traffic volume with low traveling speed limits. This distribution of hotspots based on crash severity is expected since high population densities in these areas are accompanied by large traffic volumes and low travel speeds, which reduces the chances of severe injuries and fatalities. This observation is consistent with previous studies [27,[39][40][41]69]. Lee and Khattak showed that severe crashes are more likely to occur on the outskirts of the city, while low severity crashes are clustered together within the city [69].
Distribution of hotspots and GWR analysis for crash causes revealed that epicenters for crashes considered in this category were located in the CBD of the study area. Crashes due to sudden lane deviation constitute the highest proportion of observed crashes and are mostly located in the residential and mixed commercial-residential areas. Fatigue driving mainly concentrated in medium-density residential and commercial zones accounted for the lowest proportion of crashes in this category. Industrial/employment zones had very few collisions (by all crash causes) when compared to their population densities. Similar observations were also reported by previous studies [42,43]. Similarly, hotspots and GWR analysis by crash types resulted in similar patterns of crash distribution, with CBD being the epicenter for all crash types. Vehicle collisions representing a significant proportion of total crashes were found to be more predominant in high-density residential neighborhoods along with narrow and congested links. However, commercial and public facility zones having low population densities had fewer collisions. Fixed objects and hit pedestrian crashes were prevailing in commercial and mixed commercial and residential areas with medium population densities. A study conducted by Dai et al. reported that pedestrian crashes are clustered together on segments with a greater number of public transit stops, and those located near dense households [70].
Knowledge gained from this study provides useful insights for a better understanding of the complex interrelationship between traffic safety and land use. It can help officials and road safety agencies to pinpoint the high-risk crash locations at the disaggregated level by specific crash characteristics, i.e., crash cause, crash type, and/or crash severity. Such a detailed analysis could also provide essential guidance to policymakers for better planning and management strategies, and efficient use of appropriate resources to enhance the zonal-level safety. One of the drawbacks of the current study is that it did not explicitly include land use in the GWR model. Unfortunately, the percentages of land use within the individual neighborhood were not available, which may be considered in future studies. Future studies could consider the detailed features of the built environment, such as access management tools, pedestrian-oriented tools, and population sociodemographic for assessing spatio-temporal variations of crashes. It would also be interesting to explore the effect of other policy, social, and engineering factors such as police enforcement, traffic arrangements/volumes, infrastructure design, etc., in forthcoming studies. Studies could also compare the performance of GWR with other models such as Bayesian spatial models and ransom parameter model that also account for spatial heterogeneity in identifying crash-prone locations. Finally, studies could focus on a detailed analysis of hotspots for individual vulnerable road users (VRUs) groups such as pedestrians, motorcyclists, etc., in connection with features of the built environment.