Predicting Pedestrian Crashes in Texas’ Intersections and Midblock

ABSTRACT


INTRODUCTION
Despite being the oldest and most environmentally friendly form of transportation, walking has become increasingly risky in the U.S. While total walk-miles traveled (WMT) are estimated to have risen 16% between 2009 and 2017 (BTS, 2019), the number of reported pedestrian deaths rose 46% (GHSA, 2020). Texas averaged 1.14 pedestrian deaths per 100,000 residents in 2019 (GHSA, 2020), which is 26% higher than the U.S. average of 0.900. Transportation planners and policymakers can reduce crash risks by implementing countermeasures based on benefit-cost analyses (BCA). However, such analyses work best with site-specific evaluations, which are difficult to perform at scale due to a lack of detailed road feature variables.
Between 2010 and 2020, the number of intersection crashes doubled in both Texas and the City of Austin, while midblock segment crashes rose 30% and 75%, respectively (Figure 1). Figure 1 also shows that midblock pedestrian fatalities are more prevalent than those at intersections (where vehicle speeds are often lower and pedestrians are more expected), at a rate of more than 3 to 1.
Most past pedestrian safety studies use macro-level information, with data aggregated at traffic analysis zones (TAZs) (Siddiqui et al., 2012), census tracts (Wier et al., 2009), census block groups (Noland et al., 2013), and zip codes (Ukkusuri et al., 2012), while studies at the point level are more limited. A key limitation of past intersection-safety studies has been small sample size, due to a lack of information across wide networks. Xie et al. (2018) used a Bayesian measurement error model with 262 signalized intersections in Hong Kong and found that the number of crossing pedestrians and passing vehicles, the presence of curb parking, and the presence of shops were associated with higher pedestrian crash counts, while the presence of playgrounds came with lower counts. Pulugurtha and Sambhara (2011) used 176 randomly selected signalized intersections in Charlotte, North Carolina, to understand the factors affecting pedestrian crash counts, estimating a negative binomial (NB) count model with different buffer widths (0.25 mile, 0.5 mile, and 1 mile) to extract data at the intersection level. They found that a 0.5-mile buffer width for extracting demographic, socio-economic, and land use characteristics yields better estimates for signalized intersections with low pedestrian activity, while a 1-mile buffer width yields better estimates for signalized intersections with high pedestrian activity.
Unfortunately, estimating exposure variables, like site-level or highly local WMT, is challenging since pedestrian volumes are rarely available (unlike annualized vehicle counts). Studies have relied on surrogate measures, like the presence of schools and businesses, car ownership, pavement condition, sidewalk width, bus ridership, intersection control type, and presence of sidewalk barriers, to develop analyses, as seen in Lee et al. (2019).
Midblock-level analyses are more common in past research since midblock crashes tend to be more severe and more common, and most transportation departments maintain corridor design variables. Kwayu et al. (2019) analyzed two years of crash information from Michigan and found that the average midblock pedestrian crash took place while pedestrians were crossing 130 feet from the nearest intersection or crosswalk. Key predictors of more pedestrian deaths in these settings were a lack of lighting during a nighttime crash, crashes involving an older pedestrian, and crashes along corridors that carry higher traffic volumes. Diogenes and Lindau (2010) developed a Poisson regression using 21 midblock crosswalks and found that pedestrian crash counts rise with the presence of busways and bus stops, road widths, more traffic lanes, and higher volumes of pedestrians and vehicles.
Analyses at both intersections and midblock segments for the same locations are limited and tend to focus on small areas (Lightstone et al., 2001) or are based on comparisons of crash characteristics (Sandt and Zegeer, 2006). To the authors' knowledge, no published work has yet developed crash count models for both intersections and midblock segments across thousands of locations.
The main goal of this research is to develop a micro-level analysis of pedestrian crashes at intersections and midblock segments using historical pedestrian crash information from police reports in Texas. The contributions of this work include (1) a method to estimate intersection and midblock segment geometries with roadway characteristics, which can be used to better understand crashes at different locations; (2) NB pedestrian crash count models, estimated with the developed geometries and historical pedestrian crash information from police reports in Texas, to understand the associated factors at the intersection and midblock levels; and (3) a specific case study of the City of Austin, estimated to understand local factors (Rahman et al., 2021).

Purpose of Geometry Estimation
To facilitate the analysis of pedestrian-related crashes, it is necessary to spatially model crash locations with respect to known roadways, intersections, and traffic signals. This allows crash-prone "hotspots" tied to intersections and segments to be analyzed. This research focuses on crashes that take place at midblock locations or places that are not associated with an intersection between public streets. Two criteria are considered in determining whether a crash is "midblock." First, the original police report contains a field that indicates whether the officer categorized the crash as happening at an intersection. Unfortunately, it is sometimes unclear whether that intersection is the center "box" area where public streets typically cross, a right-turn yield link, or a roundabout. Second, comparing the geographic crash location against a map may show that the crash occurred far enough from an intersection to be considered a midblock crash.

Uniform Segments
To support research analysis efforts, an underlying representation of roadway segments is prepared, and pedestrian-related crashes are then associated with the nearest of these segments. A geographic database of road segments with "multi-line string" geometry, found in the TxDOT Roadway Inventory, serves as the starting representation for all roadways in Texas. Each segment comes labeled with street name, physical roadway characteristics such as functional class and lane count, estimated daily traffic volume, and maintenance information. In that inventory, each roadway consists of one or more segments. These segments are generally bidirectional, except for those that represent one-way streets and divided highways.
A challenge in using the TxDOT Roadway Inventory is that individual roadway segments may be extremely short (a minimum of 5 feet) to represent a high rate of changes in inventory values along a roadway, or extremely long (up to 44 miles) for roadways that see few changes. Project activities benefit from fairly consistently spaced segments, necessitating a remapping effort to create derived datasets of mostly uniform segments. Two derived sets are created for this research: one with a 1-mile target segment length and one with a 0.1-mile target segment length. In each, key attributes of the underlying TxDOT Roadway Inventory segments that overlap the most are mapped to the new segments. To create these sets, an algorithm divides each roadway into a new set of segments with target length ℓ using these rules:
• If the original roadway is shorter than 1.25ℓ miles, then the derived roadway is represented with one segment of the same length.
• If the original roadway is shorter than 2(1.25ℓ) miles, then the derived roadway is represented with two segments of equal length.
• Otherwise, the derived roadway is represented as starting and ending with segments no shorter than 0.75ℓ miles on either end, with ℓ-mile segments in between.
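The three splitting rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual implementation: the function name `split_roadway` is hypothetical, and the text does not say how the leftover length is shared between the two end segments, so the sketch assumes it is split equally (which keeps each end between 0.75ℓ and 1.25ℓ).

```python
import math


def split_roadway(length_mi: float, target_mi: float) -> list[float]:
    """Return the derived segment lengths (miles) for one roadway.

    Hypothetical helper illustrating the three rules in the text.
    """
    if length_mi <= 0 or target_mi <= 0:
        raise ValueError("lengths must be positive")
    # Rule 1: short roadways stay as a single segment.
    if length_mi < 1.25 * target_mi:
        return [length_mi]
    # Rule 2: moderately short roadways become two equal segments.
    if length_mi < 2 * 1.25 * target_mi:
        return [length_mi / 2, length_mi / 2]
    # Rule 3 (assumption): fit as many full target-length interior segments
    # as possible while leaving at least 0.75 * target at each end, then
    # split the leftover length equally between the two end segments.
    n_inner = math.floor((length_mi - 1.5 * target_mi) / target_mi)
    end_len = (length_mi - n_inner * target_mi) / 2
    return [end_len] + [target_mi] * n_inner + [end_len]


if __name__ == "__main__":
    for miles in (1.0, 2.0, 2.5, 3.2):
        print(miles, split_roadway(miles, 1.0))
```

Under this assumption the derived lengths always sum to the original roadway length, so no part of the inventory geometry is dropped or duplicated.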

Intersections and Signals
Mapping crash locations to nearby intersections is also important for this research, in order to distinguish intersection crashes from midblock crashes. Unfortunately, the TxDOT Roadway Inventory does not offer explicit intersection point geometry. Although one can analyze the Roadway Inventory to evaluate where roadway segments intersect, it is impossible to determine where bridges exist, leading to numerous false-positive intersections, especially around expressway interchanges.
OpenStreetMap (OpenStreetMap contributors, 2021a) was leveraged to positively map intersection and signal locations and apply them to appropriate locations in the TxDOT Roadway Inventory geometry. To do this, queries to a local instance of the Overpass API (OpenStreetMap contributors, 2021b) were run over approximately 30 × 30-mile tiles to return all candidate intersection seed points throughout Texas. One caveat of this initial set is that many OpenStreetMap intersections on divided roads of all types exist once for each direction, whereas the intersections needed for the TxDOT Roadway Inventory are needed once per roadway. Other criteria were sought in positively identifying this first set of candidate intersections:
• It has a signal (tag "highway": "traffic_signals"). This will also catch signals for midblock crossings. Or,
• It is met by more than one motorway that has a different type and name combination.
• Nodes serviced by only motorways and motorway links are labeled as a "junction."
• Nodes that are joined by the ends of just 2 OpenStreetMap roadways are not counted, as they are likely a continuous stretch of roadway.
Next, to merge candidate intersections positioned close to each other, the DBSCAN clustering algorithm (Ester and Kriegel, 1996) available in the PostGIS/PostgreSQL database (PostGIS, 2021) was used to combine intersections that are less than 250 feet from each other. Finally, roadway segments in the 1-mile and 0.1-mile uniform segment sets were associated with the clustered candidate intersection locations through a nearest-proximity search, which associates a candidate intersection with a segment if it lies less than 130 feet from the roadway geometry.
This approximation supports the initial rounds of this research. The nearest-neighbor method of matching intersections to geometry by proximity alone was found to still produce erroneous matches, especially around closely positioned urban expressway on- and off-ramps. However, because this research emphasizes urban streets and corridors that do not lie along expressways, the success rate of the nearest-neighbor matching approach has empirically been sufficient. In future work, a map-matching strategy (Perrine et al., 2015) is anticipated to map valid pathways through the OpenStreetMap roadway network to the underlying TxDOT Roadway Inventory geometry, allowing intersections to be tied more successfully to only the roadway segments that are truly connected with them.
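The clustering and proximity steps can be illustrated with a small, dependency-free Python sketch. It is a stand-in, not the PostGIS workflow used in the study: PostGIS exposes DBSCAN through `ST_ClusterDBSCAN`, whereas the code below merges points by connected components, which matches DBSCAN only under the assumption of a minimum cluster size of one (the text does not state that parameter). The function names are hypothetical, and coordinates are assumed to be already projected to feet.

```python
import math

CLUSTER_FT = 250.0  # merge candidates closer than this (from the text)
MATCH_FT = 130.0    # associate an intersection with a segment within this reach


def cluster_points(points, eps_ft=CLUSTER_FT):
    """Merge points linked by chains of neighbours closer than eps_ft.

    Returns one centroid per cluster (union-find over all point pairs).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < eps_ft:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return [
        (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
        for g in groups.values()
    ]


def dist_to_segment(p, a, b):
    """Shortest distance from point p to the straight segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    # Projection parameter, clamped so the closest point stays on the segment.
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


if __name__ == "__main__":
    seeds = [(0, 0), (100, 0), (200, 0), (1000, 0)]
    for c in cluster_points(seeds):
        near = dist_to_segment(c, (0, 50), (300, 50)) < MATCH_FT
        print(c, "matched" if near else "unmatched")
```

The pairwise loop is quadratic in the number of points, which is acceptable for a tile-sized illustration but is one reason a spatially indexed database is a natural choice at statewide scale.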

Estimated Geometries
An example of the estimated geometries is shown in Figure 2. The 0.1-mile roadway segments are shown along with the intersections in a close-up of the City of Austin downtown area. The geometry matches the roadway map. Figure 3 shows the total roadway segments and intersections for the City of Austin and for the entire state. A total of 700 thousand (~1-mile uniform) segments drawn from 575 thousand segments are used to describe the State of Texas, while the City of Austin consists of 20 thousand (0.1-mile uniform) segments and 41 thousand intersections. These geometries are described in detail in Tables 1 and 2, which show summary statistics of infrastructure characteristics as well as pedestrian crash counts based on the crash type and map-matching process.

CRASH COUNT MODELING
An NB count model was used for pedestrian crash counts. The expected number of pedestrian crashes ($\mu_i$) at the $i$th intersection or midblock segment is expressed as follows:
$$\mu_i = \exp\left(\beta_0 + \sum_k \beta_k x_{ik} + \varepsilon_i\right) \qquad (1)$$
where $x_{ik}$ is the $k$th covariate, $\beta_k$ is its coefficient, $\varepsilon_i$ is a Gamma-distributed random error term, $y_i$ represents the total pedestrian crash count at intersection or segment $i$ with mean $E(y_i) = \mu_i$ and variance $\mathrm{Var}(y_i) = \mu_i + \alpha\mu_i^2$, and $\alpha$ is the dispersion parameter ($\alpha = 0$ for a Poisson model).
Additionally, a sensitivity analysis was applied to the NB estimates to understand the covariates' effects. Specifically, each covariate in turn is changed by one standard deviation (or switched, for binary variables). The modified variables are passed to the model to calculate the prediction. Then, the difference between the mean of the original predictions and the mean of the modified predictions is calculated to represent the contribution of that covariate. Because of its appropriateness and suitable fit for modeling count data, this methodology has been applied in other research, including pedestrian crash occurrences at the segment level (Rahman et al., 2021) and e-scooter count models (Dean and Zuniga-Garcia, 2022).
The sensitivity analysis results are summarized in Figures 4 and 5. The dispersion parameter ($\alpha$) of the four models is greater than one, indicating that the data are over-dispersed and that an NB model is preferred over a Poisson model.
The Texas model (Table 3) shows a positive correlation between WMT and the number of pedestrian crashes in both the intersection and midblock-segment models, likely due to increased exposure levels. However, previous research also found that the relationship between crash exposure and crash rates is non-linear, with rates falling off dramatically as walk levels rise, presumably due to drivers expecting more pedestrians in high-WMT zones and safer pedestrian environments that encourage walking (Wang and Kockelman, 2013).
The signalized intersection indicator in the intersection model is found to be
among the most significant variables in the model. The sensitivity analysis indicates that the number of pedestrian crashes doubles at signalized intersections compared with unsignalized intersections, with everything else held constant. Although signals are relatively safer than other control types for high pedestrian activity areas, higher usage increases the risk of crashes, as found in previous research (Lee et al., 2019; Xie et al., 2018). Similarly, the number of approaches (intersections) and the number of intersections crossed by the roadway segment show a positive, significant effect on the rate of pedestrian crashes. Specifically, a one-standard-deviation increase in the number of approaches (0.67) leads to a 31% increase in pedestrian crashes, while a one-standard-deviation increase of 2.8 intersections crossed raises the crash rate by 29%.
Among the highway design variables, the estimates indicate that higher DVMT significantly increases the number of pedestrian crashes. Interestingly, the effect of DVMT is larger for midblock segments than for intersections, as suggested by the sensitivity analysis, where a one-standard-deviation increase corresponds to 52% more crashes at intersections and 187% more crashes at midblock sections. Other variables, such as the number of lanes and lane width, also contribute to higher pedestrian crash rates. Higher posted speed limits and wider medians, in contrast, tend to coincide with a reduction in crash rates. Roadways with high speed limits carry higher risk, and pedestrians tend to avoid these areas because pedestrian infrastructure is limited. However, research findings indicate that, although crash frequency is lower, severity is significantly higher in areas with high speeds (Bernhardt and Kockelman, 2021; Rahman et al., 2021; Zhao et al., 2021). One-way roads are related to fewer crashes at midblock segments, while this variable was not significant for intersections at a 95% confidence level. The effect on midblock crashes is likely due to the reduced exposure on segments (less distance to cross).
The intersection model suggests that variables from both major and minor approaches have a similar effect. The minor approach has a slightly smaller (but still significant) impact, except for the number of lanes, which shows no significant effect. However, the one-way road (minor) indicator variable is statistically significant and, as found for midblock segments, its effect is negative, suggesting that one-way roads lead to fewer pedestrian crashes at intersections as well. Traffic attributes such as AADT and the percentage of trucks are also strongly related to higher crash rates at intersections.
Roadway functional class enters the model through an indicator variable for arterials, with local and collector roads as the reference (freeways and highways are not included in the model). The results indicate a positive and significant effect: arterial roads tend to have more pedestrian crashes than local and collector roads, for both intersections and midblock segments. This effect is consistent with previous research, in which arterials are characterized as a health problem due to high pedestrian crash frequency, excessive noise, and pollution (McAndrews et al., 2017). TxDOT-maintained roads ("on-system roads") show a negative effect, suggesting that the number of pedestrian crashes is lower on these roads than on other roadways. However, this finding contrasts with a previous analysis of pedestrian crashes that used the TxDOT Roadway Inventory geometry (Rahman et al., 2021). This may be because this study developed an analysis at a smaller scale, separating intersection crashes from midblock crashes, while previous studies aggregated crashes and analyzed only segment-level variables.
Population is accounted for through the land use variable, where urbanized areas (population of 50,000 to 200,000) are compared to rural, small urban, and large urbanized areas (refer to Table 1 for more details). Compared to urbanized areas, large urbanized areas (population greater than 200,000) show a positive coefficient, suggesting more crashes. In contrast, small urban areas show a negative effect, suggesting that the rate of crashes is lower in these areas. This effect is consistent with expectations, since denser areas have more pedestrians and exposure is higher due to the presence of more vehicles.
The distance to the nearest hospital is also analyzed; the coefficient is found to be negative and significant at a 95% confidence level. This suggests that the number of crashes is lower in areas farther from a hospital, which is consistent with the finding of more crashes in denser areas. The sensitivity analysis indicates that a one-standard-deviation increase in distance (approximately 5 miles) leads to a decrease of 11% (intersections) and 5% (midblock segments) in pedestrian crashes. However, it is important to mention that, although the number of crashes is low, the distance to the hospital can be critical for response time and prompt injury treatment.
The presence of public transportation is an indirect measure of pedestrian exposure. In this study, transit information is included in the model through two variables. The first is an indicator variable for the presence of transit within a 0.25-mile buffer from the geometry centroid, and the second is the count of transit stops in this area. As expected, both variables indicate that the number of pedestrian crashes is higher in the presence of transit, at a 95% confidence level. The sensitivity analysis suggests that the crash rate is about 50% higher than in areas with the same characteristics but without transit stops.
The Texas model includes an indicator variable for locations in the City of Austin, to compare crashes there with the rest of the state. Interestingly, the number of intersection crashes in Austin is higher than in the rest of Texas, but the number of midblock crashes is lower. Similarly, Figure 1 suggests that the number of intersection crashes in Austin is comparable to the number of midblock crashes, mainly in recent years (after 2017).
The Austin-specific model in Table 4 shows an intersection model that is less sensitive to WMT and signalized intersections than the Texas model, although both remain significant. It is likely that pedestrian crashes at non-signalized intersections are more frequent in this area than in the state of Texas as a whole. The number of approaches also shows a positive coefficient. For midblock segments, the number of intersections crossed is not significant, possibly due to the size of the segments: in this case the segments are 0.1 mile long, and the number of intersections crossed is much lower than with the 1-mile segments used in the Texas model.
DVMT has a positive, significant effect on the number of crashes in the City of Austin. However, the sensitivity analysis suggests that the effect is smaller than in the Texas model: a one-standard-deviation increase in DVMT leads to a 40% (intersection) and 72% (midblock) increase in the number of pedestrian crashes. The effect of the speed limit is significant only for midblock segments, not for the intersection model, and the sensitivity change is comparable to the rest of the state. The number of lanes also has a positive and significant coefficient. Variables such as lane width, median width, and one-way roads are not significant at a 95% confidence level for the intersection model (for both major and minor approaches).
In terms of traffic attributes, AADT is positive and significant for the intersection model (as in Texas). However, the percentage of trucks is negative in both models (as opposed to the positive effect for intersections in Texas). This suggests that pedestrians in Austin are less likely to be involved in a crash at intersections with a high number of trucks than pedestrians in the rest of the state.
The functional class indicator for arterial roadways is not significant, and no conclusion can be drawn from this variable. As in the Texas model, the City of Austin model suggests that on-system roads have fewer crashes than other roadways. The distance to hospitals is not relevant in the Austin model, likely due to the small area selected for the model, where multiple hospitals are located across the city. The exposure variable for transit presence is significant (midblock) and suggests a positive correlation with the number of pedestrian crashes. However, the transit variables are less sensitive in this model than in the Texas model.
Land use variables such as population density, employment density, and average household income were approximated using the CAMPO data at the TAZ level to analyze the effects on the city's pedestrian crash rate. Population density has a positive and significant effect, as expected. A one-standard-deviation increase (3.3 thousand individuals per square mile) led to a 15% (intersections) and 21% (midblock) increase in crash rates. Employment density, in contrast, does not have an effect in the intersection model. A significant finding is that areas with a higher average household income tend to have fewer pedestrian crashes: an increase of $41,000 in average household income led to a reduction of 32% (intersections) and 39% (midblock) in pedestrian crash rates. Finally, an indicator variable for the CBD highlights the importance of this area, with midblock crashes (240%) being more sensitive in this area than intersection crashes (78%), though both models show a relevant effect.
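The sensitivity percentages reported in this section can be illustrated with a short Python sketch of the procedure described earlier: change one covariate, re-predict, and compare mean predictions. This is a sketch under stated assumptions, not the authors' code. The function names and all coefficient values are hypothetical, the random error term is left out of the prediction, and the result is expressed as a percentage of the original mean prediction, which the text implies but does not state.

```python
import math
from statistics import mean


def nb_variance(mu, alpha):
    """NB variance for mean mu and dispersion alpha (alpha = 0 gives Poisson)."""
    return mu + alpha * mu ** 2


def predicted_mean(intercept, coefs, row):
    """Expected crash count under a log-link count model (error term omitted)."""
    return math.exp(intercept + sum(b * x for b, x in zip(coefs, row)))


def sensitivity_pct(intercept, coefs, rows, k, delta):
    """Percent change in the mean prediction when covariate k shifts by delta.

    Use delta = one standard deviation for continuous covariates; for an
    indicator, pass rows with the indicator set to 0 and delta = 1.
    """
    base = mean(predicted_mean(intercept, coefs, r) for r in rows)
    shifted = mean(
        predicted_mean(
            intercept, coefs, [x + delta if j == k else x for j, x in enumerate(r)]
        )
        for r in rows
    )
    return 100.0 * (shifted - base) / base


if __name__ == "__main__":
    # Hypothetical coefficients and covariate rows, for illustration only.
    rows = [[0.0, 1.0], [1.0, 0.0], [2.0, 3.0]]
    pct = sensitivity_pct(0.1, [0.5, -0.2], rows, k=0, delta=1.0)
    print(f"{pct:.1f}% change in mean predicted crashes")
```

Because the model is log-linear, shifting covariate $k$ by $\Delta$ multiplies every prediction by $\exp(\beta_k \Delta)$, so the percent change reduces to $100\,(\exp(\beta_k \Delta) - 1)$ regardless of the other covariates.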

SUMMARY AND CONCLUSIONS
In this research, historical pedestrian crash information from police reports in Texas is used to understand the factors associated with crash rates at the intersection and midblock levels. Developing micro-level analyses is challenging due to the lack of geographic information and characterization at a statewide scale. Therefore, one of the main contributions of this study is a methodology to spatially model crash locations with respect to known roadways, intersections, and traffic signals. Geometry estimates at the intersection and midblock levels are obtained, and information from the roadway infrastructure inventory and other sources is assigned to characterize these geometries. Information such as traffic control (signalized intersections), highway design variables, traffic attributes, and land use from multiple sources is combined with the location of the crashes (separated into intersection and midblock crashes) to provide a comprehensive analysis of the roadway network in the State of Texas.
An NB model is used to identify major factors influencing pedestrian crashes at the intersection and midblock levels. Models for the State of Texas are developed along with a case study of the City of Austin, one of the areas with the highest number of crashes in the state, to understand specific factors affecting the city's crash rate. The main results suggest that signalized intersections have a higher pedestrian crash rate and that higher DVMT increases the likelihood of pedestrian crashes, with midblock segments being more vulnerable: a one-standard-deviation increase in DVMT corresponds to 52% more crashes at intersections and 187% more at midblock sections. Variables such as the number of lanes and lane width contribute to higher pedestrian crash rates, while higher posted speed limits and wider medians tend to coincide with a reduced crash rate. Arterial roads tend to have more pedestrian crashes than local and collector roads. Land use variables indicate that areas with a greater population tend to have more crashes. The analysis of the Austin area suggests that the CBD is critical in both models, with midblock crashes (240%) being more sensitive in this area than intersection crashes (78%). A significant inequity was also found in the area: an increase of $41,000 in average household income leads to a reduction of 32% (intersections) and 39% (midblock) in pedestrian crash rates.
FIGURE 1 Description of pedestrian crashes at intersections and midblock (Texas Department of Transportation, 2020). Panels: (a) Pedestrian crashes in Texas; (b) Pedestrian crashes in the City of Austin; (c) Severity levels, intersection crashes; (d) Severity levels, midblock crashes.
FIGURE 2 Roadway segments and intersections in the City of Austin downtown area.
FIGURE 3 Description of the roadway segments and intersections. Panels: (a) Roadway segments, City of Austin; (b) Intersections, City of Austin; (c) Roadway segments, Texas; (d) Intersections, Texas.

TABLE 2 Summary statistics of variables, City of Austin
The results from the NB model are summarized in Table 3 (Texas) and Table 4 (Austin). Two models are estimated for each location: an intersection model and a midblock-level model.