1. Introduction
Railroad trespassing persists as a significant and perilous issue, causing numerous fatalities, injuries, and operational interruptions globally. In the United States, trespassing on railroads results in hundreds of deaths annually, representing a considerable proportion of rail-related casualties [1]. Despite increased awareness and ongoing safety initiatives, the prevalence of railroad trespassing continues to be a serious public safety concern. While behavioral and demographic aspects are commonly investigated in relation to trespassing, the physical and spatial environment where these incidents occur has received less scrutiny. A literature review reveals that the built and natural environment, including elements such as pedestrian crossings, fencing, land use, and proximity to rail tracks, significantly influences the likelihood of trespassing [2,3]. However, comprehensive investigations of these spatial correlations, particularly through geographic information systems (GIS), remain limited. Most existing studies rely on aggregated data or anecdotal evidence, providing little insight into the micro-level characteristics that elevate the probability of trespassing occurrences. There is a critical need for location-specific, data-driven analyses that incorporate spatial and environmental attributes to enhance the effectiveness of prevention strategies. Spatial correlation refers to the degree to which observations located near one another exhibit similar values, reflecting the spatial dependence described by Tobler’s First Law of Geography [4]. It is commonly quantified using global or local measures such as Moran’s I [5], Geary’s C [6], or Local Indicators of Spatial Association (LISA) statistics [7].
This study addresses this need by investigating the relationship between the physical characteristics of the railroad environment and the spatial distribution of trespassing incidents. Utilizing GIS data and advanced spatial modeling methods, this research identifies high-risk locations based on established environmental features and constructs a framework for identifying trespassing hotspots. By employing spatial analysis and unsupervised learning methodologies, this study aims to go beyond retrospective analysis and support proactive safety planning. The primary objectives of this study are: (1) to analyze the spatial distribution of railroad trespassing incidents within North Carolina, USA; (2) to identify the physical and environmental factors most closely associated with these incidents; and (3) to develop a model to identify potential hotspots to aid rail authorities in targeting preventive measures. Ultimately, the findings are expected to facilitate more effective, geographically targeted interventions, decrease trespassing-related injuries and fatalities, and contribute to the expanding body of knowledge on spatial risk modeling in transportation safety. The next section discusses previous studies that have used these techniques in different contexts. Subsequent sections detail the methodology employed in this study, the results of the analysis, and their implications for transportation safety policy.
2. Literature Review
Railroad trespassing poses a persistent and complex challenge, endangering individuals, disrupting rail services, and jeopardizing public safety. Understanding its root causes is crucial for developing effective prevention strategies [8]. Research highlights a complex interplay of factors, identifying railroad infrastructure design, surrounding land use, and human activity patterns as key drivers of trespassing incidents [9]. Silla & Luoma [10] found that residents near tracks often lack sufficient legal crossings, which prompts trespassing, especially in districts where homes are separated from city centers by railroad lines. Furthermore, commercial areas near stations often experience higher rates of crime, including trespassing, although with patterns that differ from those of residential areas and vary by day and time [11].
The extent of rail infrastructure, typically measured in rail miles or track length, is a foundational exposure variable in trespassing risk analysis. Studies have shown that expanding the physical interface between rail operations and the surrounding environment increases the opportunity for unauthorized access [12,13]. This aligns with the exposure-opportunity framework in injury prevention, where risk is a function of both the frequency of potential encounters and the inherent hazards present [14]. A study by Searcy et al. [14] further found that location-specific characteristics, including rail mileage, explained nearly half (48.9%) of the variation in daily pedestrian trespassing events in the 10-site subset it analyzed. Similarly, Kang et al. [15] used a mixed-effects negative binomial model at the county level across the United States and found that rail track length significantly influenced the frequency of trespassing crashes, alongside demographic factors such as population density and age structure.
Pedestrian crossings such as grade crossings, underpasses, overpasses, and unauthorized (informal) paths represent critical points of interface between rail infrastructure and public movement [14]. Findings from the Federal Railroad Administration (FRA) indicate that a substantial proportion of pedestrian trespasser fatalities occur within 1,000 feet of a grade crossing, underscoring the importance of crossing density and design in risk mitigation [16]. According to a report by the Florida Department of Transportation Freight and Multi-modal Operations Office [17], the presence, type, and accessibility of crossings shape pedestrian behavior, influencing whether individuals use legal routes or resort to trespassing as a shortcut. Searcy et al. [14] further showed that proximity to authorized crossings was inversely related to trespassing frequency: sites with distant or inaccessible crossings experienced higher rates of illegal crossing, while those with nearby, well-designed crossings saw fewer trespassing events. Crossings per mile and crossings per unit area are commonly used as exposure metrics in predictive models and cluster analysis; including them in risk models is essential for capturing the accessibility landscape, understanding behavioral motivations, and designing targeted interventions [14].
Another factor considered is population density, which reflects the concentration of people living, working, or moving near rail infrastructure and thereby modulates the frequency of potential rail–pedestrian interactions. Consequently, high-density areas are hypothesized to experience greater trespassing risk due to increased pedestrian flows, land-use pressures, and a greater likelihood of informal access points [14]. A substantial body of research confirms that population density is a significant predictor of trespassing incidents. Kang et al. [15] found that at the county level in the U.S., higher population density was associated with increased rail trespass crash frequency, even after controlling for rail miles and train traffic. Similarly, in the Czech Republic, urbanized areas with high residential and industrial development reported trespassing frequencies as high as 10 cases per hour. Grabušić and Barić [18] quantified the effect, noting that a population density of 100 people per 1.5 km² led to an increase in trespassing accidents from 4.8% to 8.18%. Surveys indicate that the majority of trespassers are local residents and that incidents are more likely to occur close to home, especially in urban environments [13].
Closely related to population density is the zoning and land-use composition of an area, which defines the functional character of areas adjacent to rail infrastructure and shapes both the motivations for and the patterns of trespassing. Research consistently demonstrates that land use and zoning are key determinants of trespassing behavior. Skládaná et al. [8] found that the pattern of functional area types, especially the combination of housing, shopping, industrial, and public services, was a crucial factor in the motivation for railroad trespassing in the Czech Republic. Regression models also showed that the density of pedestrian attractors, such as schools, social services, and restaurants, within one mile of observation sites was significantly associated with increased trespassing events [14]. For instance, in Florida, trespassing hotspots were frequently located where residential neighborhoods bordered recreational facilities without reasonable legal pedestrian routes, prompting shortcut behavior [17]. Land-use variables are also critical in risk-based prioritization and intervention design. Agencies such as the Long Island Rail Road (LIRR) and New Jersey Transit (NJT) incorporate land-use context into hazard analyses and fencing policies, recognizing that proximity to schools, parks, and commercial establishments increases the need for targeted mitigation [12].
In terms of analysis, Geographic Information Systems (GIS) provide a comprehensive framework for managing and exploring spatially referenced data, enabling researchers to examine geographic patterns and relationships across multiple scales [19]. GIS has been extensively utilized in transportation safety for various purposes, including risk mapping and hotspot prediction. Li et al. [20], for instance, used GIS to map the locations of intra-city motor vehicle crashes and their hotspots. Bilim [21] similarly used GIS for spatial autocorrelation and kernel density estimation to characterize the distribution of pedestrian road crashes and identify the most critical locations. GIS techniques such as Kernel Density Estimation (KDE), Getis-Ord Gi*, and Moran’s I have proven useful in identifying accident-prone areas by clustering accidents and identifying black spots, which is crucial for planning and safety interventions [22,23]. By combining physical data such as infrastructure layouts, topography, and land use with human-related data, including population density, behavioral patterns, and socioeconomic indicators, GIS facilitates a comprehensive understanding of the factors contributing to risk in a given area. This integrative capacity makes it possible to analyze how environmental and human variables interact to create or exacerbate hazardous conditions. Moreover, GIS enhances decision-making by offering detailed, location-specific insights into potential hazards, areas of vulnerability, and high-risk zones [24].
Generalized Linear Models (GLMs), particularly those employing the negative binomial distribution, are commonly used to analyze count data such as crash frequencies. These models are especially suited to overdispersion (variance exceeding the mean), a frequent characteristic of crash data. To further enhance predictive power and account for the spatial characteristics inherent in safety data, researchers have increasingly integrated spatial regression techniques, as well as a combination of generalized maximum likelihood estimation (GMLE) and the generalized extreme value mixture model (GEVMM), which has been shown to be robust in complex probabilistic modeling problems [25]. Notably, Geographically Weighted Poisson Regression (GWPR) and Geographically Weighted Negative Binomial Regression (GWNBR) have been applied to capture spatial dependency and heterogeneity in crash occurrences [26]. These approaches allow model parameters to vary across geographic space, thus providing localized insights that traditional global models may overlook.
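As a small illustration of what overdispersion means for count data, the sketch below simulates negative binomial (NB2) counts as a gamma-Poisson mixture, whose variance is μ + αμ², and compares their dispersion index (variance divided by mean) with that of Poisson counts, for which the index is close to 1. This is a toy written for this explanation, not drawn from any cited study; the helper names and the values μ = 3 and α = 0.5 are hypothetical.

```python
import numpy as np


def dispersion_index(counts):
    """Variance-to-mean ratio; about 1 for Poisson data, above 1 when overdispersed."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()


def nb2_alpha_moments(counts):
    """Method-of-moments NB2 dispersion from Var = mu + alpha * mu**2 (clipped at 0)."""
    counts = np.asarray(counts, dtype=float)
    mu, var = counts.mean(), counts.var(ddof=1)
    return max((var - mu) / mu**2, 0.0)


rng = np.random.default_rng(42)
mu, alpha = 3.0, 0.5  # hypothetical mean count and dispersion parameter
# Gamma-Poisson mixture: lambda ~ Gamma(shape=1/alpha, scale=alpha*mu) yields NB2 counts
lam = rng.gamma(shape=1 / alpha, scale=alpha * mu, size=50_000)
nb_counts = rng.poisson(lam)
poisson_counts = rng.poisson(mu, size=50_000)

print(dispersion_index(poisson_counts), dispersion_index(nb_counts), nb2_alpha_moments(nb_counts))
```

A Poisson GLM fitted to counts like `nb_counts` would understate the variance and overstate significance, which is why the negative binomial family is preferred when the dispersion index is well above 1.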
Although railroad trespassing is increasingly recognized as a significant safety concern, current research predominantly emphasizes behavioral, demographic, or temporal dimensions, with inadequate consideration of physical and spatial determinants. Existing spatial analyses often provide descriptive insights but fail to integrate data pertaining to physical infrastructure, such as access routes, barriers, land utilization, or adjacency to urban areas. This deficiency curtails a holistic understanding of how the built environment influences trespassing incidents, thereby impeding the formulation of targeted preventative measures. Furthermore, the application of unsupervised learning within a GIS to identify potential hotspots based on environmental attributes remains limited. This study addresses these limitations by synthesizing granular spatial data with analytical techniques to identify physical factors correlated with trespassing and to produce actionable hotspot maps. By merging physical infrastructure characteristics with geospatial behavioral patterns, this research promotes a more holistic and forward-thinking strategy for railroad safety management.
4. Results of Spatial Autocorrelation
4.1. Global Spatial Autocorrelation
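Global Moran’s I measures the overall spatial dependence in a numeric variable $y$ observed on $n$ spatial units with weights $w_{ij}$. Using row-standardized weights ($\sum_j w_{ij} = 1$ for every $i$), we compute

$$I = \frac{n}{S_0}\,\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(y_i-\bar{y})(y_j-\bar{y})}{\sum_{i=1}^{n}(y_i-\bar{y})^2}, \qquad S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij} = n.$$

Positive $I$ indicates that neighboring units tend to have similar values (positive autocorrelation), negative $I$ indicates dissimilarity (a checkerboard pattern), and for a random pattern the expectation satisfies $E[I] = -1/(n-1)$.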
Weights: k-nearest neighbors (KNN) weights (k = 10) with row standardization are used to ensure connectivity and comparable neighbor influence across ZIP codes. Sensitivity is also assessed at k = 8 and k = 12.
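To make the formula and the weighting scheme concrete, the following minimal NumPy sketch builds a dense row-standardized KNN weight matrix and evaluates global Moran’s I. It is not the study's code (the analysis used libpysal and esda); the function names `knn_weights` and `morans_i` and the toy coordinates are hypothetical. The brute-force distance matrix is only practical for a few thousand units, and ties among equidistant neighbors are broken arbitrarily by `np.argsort`.

```python
import numpy as np


def knn_weights(coords, k):
    """Dense row-standardized k-nearest-neighbor weight matrix (brute force, O(n^2) memory)."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Pairwise Euclidean distances between unit locations (e.g., ZIP centroids)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a unit is never its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]  # indices of the k closest units per row
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return w / w.sum(axis=1, keepdims=True)  # row standardization: each row sums to 1


def morans_i(y, w):
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = y - mean(y)."""
    z = np.asarray(y, dtype=float) - np.mean(y)  # deviations from the mean
    s0 = w.sum()  # equals n when w is row-standardized
    return (len(z) / s0) * (z @ w @ z) / (z @ z)


# Hypothetical example: 20 units on a line with a smooth trend in y.
# The study's own sensitivity check uses k = 8, 10, 12 on real ZIP codes.
coords = np.column_stack([np.arange(20), np.zeros(20)])
y_trend = np.arange(20.0)
for k in (2, 4, 6):
    print(k, round(morans_i(y_trend, knn_weights(coords, k)), 3))
```

The loop mirrors the idea of the neighborhood sensitivity analysis: the same statistic is recomputed under several values of k while the data stay fixed.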
Hypothesis 1.
- H0: No spatial autocorrelation (spatial randomness); y is exchangeable across locations.
- H1: Positive spatial autocorrelation exists (one-sided alternative).
Permutation inference: A randomization test with 999 permutations, conditional on the fixed spatial weights, was performed using the following procedure:
1. Hold the spatial topology (the weight matrix W) fixed.
2. Randomly permute y across locations to generate the reference distribution of I under H0.
3. Compute a pseudo p-value as the proportion of statistics at least as extreme as the observed I (per the chosen alternative), counting the observed statistic itself, i.e., p = (m + 1)/(999 + 1), where m is the number of permuted statistics at least as extreme. With 999 permutations, the minimum attainable p-value is therefore 1/1000 = 0.001.
4. Report I, the permutation-based p-value, and a z-score computed from the permutation distribution.
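The four steps above can be sketched in a few lines of NumPy. This is an illustrative re-implementation under stated assumptions, not the esda code used in the study (esda's `Moran` class reports analogous quantities as `p_sim` and `z_sim`); the function name `moran_permutation_test` and the toy line-shaped neighbor structure are hypothetical.

```python
import numpy as np


def moran_permutation_test(y, w, permutations=999, seed=0):
    """One-sided (H1: positive autocorrelation) permutation test for global Moran's I."""
    rng = np.random.default_rng(seed)
    z = np.asarray(y, dtype=float) - np.mean(y)
    scale = len(z) / w.sum()

    def stat(v):
        return scale * (v @ w @ v) / (v @ v)

    observed = stat(z)
    # Steps 1-2: W stays fixed while the values are shuffled across locations
    sims = np.array([stat(rng.permutation(z)) for _ in range(permutations)])
    # Step 3: pseudo p-value; counting the observed statistic makes 1/(permutations+1) the floor
    m = np.count_nonzero(sims >= observed)
    p_sim = (m + 1) / (permutations + 1)
    # Step 4: z-score of the observed I relative to the permutation distribution
    z_sim = (observed - sims.mean()) / sims.std(ddof=1)
    return observed, p_sim, z_sim


# Hypothetical example: 30 units on a line, each linked to its immediate neighbors
n = 30
w = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i + 1):
        if 0 <= j < n:
            w[i, j] = 1.0
w /= w.sum(axis=1, keepdims=True)  # row standardization
i_obs, p_val, z_val = moran_permutation_test(np.arange(n, dtype=float), w)
```

Because the observed statistic is counted in the numerator, the smallest reportable p-value with 999 permutations is 0.001, which matches the floor stated in step 3.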
The incident rate data were exported as a GeoPackage and analyzed with GeoPandas, libpysal, and esda in a Python environment using Anaconda® desktop version 2.5.0 [54]. Global Moran’s I for the trespassing casualty rate (per 10,000 population), using KNN (k = 10) row-standardized weights and 999 permutations, indicated positive spatial autocorrelation (I = 0.101, z = 7.025, p = 0.001, the smallest p-value attainable with 999 permutations). The mean distance to the 10th nearest neighbor is 80,849.1 ft (approximately 15.3 miles).
The Moran’s I values in Table 3 illustrate how the global autocorrelation signal varies with the neighborhood definition. Moran’s I declines steadily as k increases, because a larger k smooths local contrasts by averaging over more neighbors. In contrast, the permutation z-score generally increases with k up to the largest neighborhoods examined (k = 20), indicating that, even as the effect size becomes more conservative, the statistic remains highly atypical under spatial randomness and thus strongly significant across all reasonable values of k. Together, the table and plots imply a robust, modest positive autocorrelation: the precise magnitude of I depends on k, but the inference is stable. Thus, k = 10 offers a balanced neighborhood size (local enough to preserve corridor structure, large enough to avoid isolated units), with corroborating sensitivity at k = 8 and k = 12.
4.2. Local Indicators of Spatial Association (LISA)
This section uses Local Moran’s I to move from the global question (does clustering exist?) to the local question: where, exactly, are the clusters and spatial outliers? For each ZIP code, we compare the trespassing casualty rate per 10,000 residents with the rates of its neighbors, using k-nearest neighbors (k = 10) with row-standardized weights to represent local spatial context. Significance is obtained by permutation testing (999 permutations) for each unit; because many tests are run simultaneously, we controlled for multiple testing using the Benjamini–Hochberg false discovery rate (FDR) procedure. The resulting LISA map classifies places as High–High (HH) and Low–Low (LL) clusters, interpreted as hotspots and cold spots, alongside High–Low (HL) and Low–High (LH) spatial outliers that may indicate emerging risk or protective pockets. These outputs guide the prioritization of engineering, enforcement, or education measures and provide a basis for robustness checks, such as alternative k values, distance bands, or exposure metrics (e.g., rates per rail mile or per crossing).
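The local statistic and its per-unit conditional permutation test can be sketched as follows. This is a minimal illustration, not the study's code; PySAL's `esda.moran.Moran_Local` is the production implementation of the same idea. The function name `local_moran`, the directional ("folded") counting rule for the pseudo p-value, and the toy two-block data are assumptions made for this sketch.

```python
import numpy as np


def local_moran(y, w, permutations=999, seed=0):
    """Local Moran's I_i with conditional-permutation pseudo p-values."""
    rng = np.random.default_rng(seed)
    z = np.asarray(y, dtype=float) - np.mean(y)
    n = len(z)
    m2 = (z @ z) / n  # second moment of the deviations
    lag = w @ z  # spatial lag: weighted average of neighbors' deviations
    local_i = z * lag / m2  # I_i = (z_i / m2) * sum_j w_ij z_j
    p_sim = np.empty(n)
    for i in range(n):
        nbr_w = w[i, w[i] > 0]  # weights attached to unit i's neighbors
        others = np.delete(z, i)  # conditional: z_i stays at location i
        sims = np.empty(permutations)
        for s in range(permutations):
            draw = rng.choice(others, size=nbr_w.size, replace=False)
            sims[s] = z[i] * (nbr_w @ draw) / m2
        # Count permuted values at least as extreme, in the direction of the observed I_i
        if local_i[i] >= 0:
            extreme = np.count_nonzero(sims >= local_i[i])
        else:
            extreme = np.count_nonzero(sims <= local_i[i])
        p_sim[i] = (extreme + 1) / (permutations + 1)
    return local_i, lag, p_sim


# Hypothetical example: a line of 30 units, a high-rate block next to a low-rate block
n = 30
w = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i + 1):
        if 0 <= j < n:
            w[i, j] = 1.0
w /= w.sum(axis=1, keepdims=True)
y = np.r_[np.full(15, 10.0), np.zeros(15)]
local_i, lag, p_sim = local_moran(y, w, permutations=199)
```

Holding z_i fixed and reshuffling only the other values is what distinguishes the conditional test from the global one; the returned `lag` is also what determines the HH/LL/HL/LH quadrant of each unit.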
The resulting classification shows that the majority of ZIP codes were not significant (ns = 483), indicating no detectable local association after correction. A sizeable set formed Low–Low clusters (LL = 244), i.e., areas with low rates surrounded by similarly low-rate neighbors (cold spots). In contrast, High–High clusters (HH = 13) were relatively rare but represent the most defensible hotspots: locations with elevated rates embedded within high-rate neighborhoods. We also observed a small number of spatial outliers: Low–High (LH = 21), comparatively low-rate ZIP codes adjacent to high-rate neighbors (potential protective pockets), and High–Low (HL = 2), isolated high-rate ZIP codes amid low-rate surroundings (possible emerging hotspots). Overall, these counts imply that while most of the state does not exhibit significant local association after FDR adjustment, a compact set of hotspots and a small number of outliers merit targeted investigation.
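The FDR correction and the quadrant labeling behind these counts can be sketched as below. The helper names `benjamini_hochberg` and `lisa_labels` are hypothetical, and q = 0.05 is an assumed default because the FDR level used in the study is not recoverable from this section. Quadrants follow the Moran scatterplot convention: the sign of a unit's own deviation z_i versus the sign of its spatial lag.

```python
import numpy as np


def benjamini_hochberg(p, q=0.05):
    """Boolean mask of discoveries under Benjamini-Hochberg FDR control at level q."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m  # i/m * q for ranks i = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k_max = np.max(np.nonzero(below)[0])  # largest rank meeting its threshold
        reject[order[: k_max + 1]] = True  # step-up: reject every hypothesis up to that rank
    return reject


def lisa_labels(z, lag, significant):
    """Moran-scatterplot quadrant per unit; 'ns' where the test is not significant."""
    labels = np.full(z.shape, "ns", dtype=object)
    labels[significant & (z > 0) & (lag > 0)] = "HH"
    labels[significant & (z < 0) & (lag < 0)] = "LL"
    labels[significant & (z > 0) & (lag < 0)] = "HL"
    labels[significant & (z < 0) & (lag > 0)] = "LH"
    return labels
```

In use, the per-unit pseudo p-values from the local test would be passed to `benjamini_hochberg`, and its mask would feed `lisa_labels` together with the deviations and spatial lags; tallying the labels gives counts of the kind reported above.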
Figure 1 and Figure 2 visualize the contrast between intensity-based hotspots (Getis-Ord Gi*) and similarity-based local clusters (LISA).
As presented in Figure 3, positive Local Moran’s I values indicate locations that resemble their neighbors (clustering), while negative values would indicate spatial outliers. Points higher on the plot have smaller permutation p-values (greater statistical evidence). The orange, annotated points identify ZIP codes that form High–High clusters after FDR adjustment; these represent the most defensible hotspots, where elevated trespassing casualty rates are embedded within similarly high-rate neighborhoods. Most ZIP codes cluster near zero Local Moran’s I with little statistical evidence (large p-values), indicating no detectable local association after multiple-testing correction. This is consistent with a sparse landscape punctuated by a compact set of statistically significant hotspots.
4.3. Sensitivity Analysis
To assess robustness to the neighborhood definition, we repeated the global test with k-nearest-neighbor weights for k ∈ {8, 10, 12}. Results were stable: Moran’s I ranged from 0.095 to 0.114 across these specifications (at k = 10, I = 0.101, z = 7.025, p = 0.001), indicating a consistent, modest positive spatial autocorrelation irrespective of reasonable changes in k. As expected, a larger k slightly smooths local variation and reduces I, but significance remains unchanged. In local analyses (LISA), cluster detection proved more sensitive to k and to multiple-testing control: with FDR correction, k = 10 yielded a small set of hotspots (HH = 13), whereas k = 8 and k = 12 produced no FDR-significant clusters despite similar patterns in the unadjusted p-values. Accordingly, we report k = 10 (row-standardized) as the primary specification and include k = 8 and k = 12, as well as unadjusted versus FDR-adjusted results, as sensitivity checks. Thus, conclusions about the presence of global clustering are robust, while the exact set of local hotspots depends on neighborhood choice and correction method.
5. Results of Hotspot Analysis
Using the k-means clustering approach, the optimal number of clusters was determined with the elbow method, which evaluates the within-cluster sum of squares (inertia) as a function of the number of clusters. As shown in Figure 4, inertia decreases sharply as the number of clusters increases from k = 1 to k = 4, after which the rate of improvement diminishes substantially. This inflection point indicates that additional clusters beyond k = 4 yield only marginal reductions in within-cluster variance. Based on this pattern, a four-cluster solution was selected as the most parsimonious and interpretable representation of the data. The silhouette score of 0.50 indicates reasonable cluster cohesion and separation, supporting the interpretability of the selected k-means clustering solution.
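The elbow and silhouette diagnostics can be illustrated with a self-contained NumPy sketch: Lloyd's k-means with k-means++ seeding (keeping the lowest-inertia run out of several), plus the mean silhouette coefficient. In practice a library such as scikit-learn would normally be used; the function names, the two-dimensional toy features, and the four well-separated synthetic groups below are hypothetical and stand in for the study's standardized ZIP-level variables, whose exact preprocessing is not described in this section.

```python
import numpy as np


def kmeans(X, k, n_init=10, max_iter=300, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns (labels, centroids, inertia) of best run."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # k-means++: each new seed is drawn with probability proportional to its
        # squared distance from the nearest seed chosen so far
        centroids = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(-1).min(axis=1)
            centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centroids = np.array(centroids)
        for _ in range(max_iter):
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
            labels = d2.argmin(axis=1)  # assign each point to its nearest centroid
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        inertia = d2[np.arange(len(X)), labels].sum()  # within-cluster sum of squares
        if best is None or inertia < best[2]:
            best = (labels, centroids, inertia)
    return best


def silhouette_score(X, labels):
    """Mean silhouette coefficient with Euclidean distances; needs at least two clusters."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # singleton clusters get a silhouette of 0
        a = d[i, own].sum() / (own.sum() - 1)  # mean distance to the rest of its own cluster
        b = min(d[i, labels == c].mean() for c in clusters if c != labels[i])  # nearest other cluster
        s[i] = (b - a) / max(a, b)
    return s.mean()


# Hypothetical standardized features: four well-separated groups of 25 points each
rng = np.random.default_rng(1)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(25, 2)) for c in centers])
inertias = [kmeans(X, k)[2] for k in range(1, 7)]  # values to plot for the elbow curve
labels, centroids, _ = kmeans(X, 4)
score = silhouette_score(X, labels)
```

On such synthetic data the inertia drops sharply up to k = 4 and flattens afterward, which is the elbow pattern described above; real ZIP-level features typically produce a less pronounced elbow and a lower silhouette than these artificial groups.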
Table 4 presents the centroid values for each cluster, which represent the average characteristics of ZIP codes belonging to each cluster group. These centroids provide insight into the structural environments associated with varying levels of rail trespassing risk. Cluster 0 is characterized by relatively high population density (Table 4) and a predominance of residential land use (approximately 71%). Rail exposure in this cluster is moderate, with an average of approximately 3.9 rail miles and 6 grade crossings per ZIP code. The combination of dense residential development and moderate rail infrastructure suggests that this cluster represents urban or suburban residential areas where pedestrian interactions with rail infrastructure may occur frequently.
Cluster 1 exhibits the highest levels of rail infrastructure exposure among the four clusters, with an average of approximately 6.9 rail miles and 11 crossings per ZIP code. Land-use composition in this cluster is dominated by industrial activity (approximately 51%), with relatively low residential presence. Population density is moderate (Table 4). These characteristics indicate that Cluster 1 likely represents industrial rail corridors or freight-oriented environments where rail infrastructure is heavily concentrated. Cluster 2 displays the lowest levels of rail exposure, with approximately 1.0 rail mile and 3 crossings per ZIP code on average. Land use is overwhelmingly commercial (approximately 89%), while residential and industrial land uses are minimal. Population density is moderate (Table 4). This cluster appears to represent commercial districts or retail corridors with limited rail infrastructure presence.
Cluster 3 is characterized by predominantly agricultural land use (approximately 82%) and the lowest population density among the clusters (Table 4). Rail exposure is relatively low to moderate, with approximately 2.3 rail miles and 4 crossings per ZIP code. These characteristics suggest that Cluster 3 corresponds to rural or agricultural environments where rail lines traverse sparsely populated areas. Overall, the clustering results reveal four structurally distinct rail environments: (1) dense residential corridors with moderate rail exposure, (2) industrial rail corridors with high infrastructure concentration, (3) commercial areas with limited rail presence, and (4) rural agricultural regions with low population density. These environmental archetypes provide a useful framework for analyzing spatial variation in trespassing risk across the study area.
5.1. Cluster Sensitivity Analysis
To assess the robustness of the clustering solution, a sensitivity analysis was conducted by repeating the k-means clustering procedure across multiple runs with different random initializations. Because the k-means algorithm relies on random centroid initialization, different starting points can produce different cluster assignments. Evaluating the stability of the clustering results helps ensure that the identified clusters reflect inherent structure in the data rather than artifacts of the initialization process. The clustering procedure was therefore repeated 50 times using identical input variables but varying the random seed controlling centroid initialization. For each run, ZIP-code observations were assigned to one of four clusters (k = 4). The similarity between cluster assignments across runs was quantified using the Adjusted Rand Index (ARI), a widely used metric of agreement between two clustering solutions that corrects for chance. ARI equals 1 for identical partitions (up to relabeling of clusters) and has an expected value of 0 under random agreement; small negative values are possible when agreement is below chance.
Pairwise ARI values were computed for all 1,225 pairs of the 50 clustering runs, producing a distribution of stability scores. The mean, minimum, and maximum pairwise ARI values all indicate extremely high clustering stability: cluster assignments remained nearly identical across repeated runs, demonstrating that the identified clusters are highly robust to variations in centroid initialization.
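The stability check can be reproduced in outline with a contingency-table implementation of the ARI and a loop over all pairs of runs. This sketch is written for illustration (scikit-learn's `adjusted_rand_score` computes the same index); the names `adjusted_rand_index` and `pairwise_ari` and the tiny example label vectors are hypothetical.

```python
from itertools import combinations
from math import comb

import numpy as np


def adjusted_rand_index(a, b):
    """Hubert-Arabie ARI between two labelings, computed from their contingency table."""
    a, b = np.asarray(a), np.asarray(b)
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=int)
    np.add.at(table, (ai, bi), 1)  # n_ij: points in cluster i of a and cluster j of b
    sum_ij = sum(comb(int(x), 2) for x in table.ravel())
    sum_a = sum(comb(int(x), 2) for x in table.sum(axis=1))
    sum_b = sum(comb(int(x), 2) for x in table.sum(axis=0))
    expected = sum_a * sum_b / comb(len(a), 2)  # chance-expected pair agreement
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions are a single cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def pairwise_ari(runs):
    """Mean, minimum, and maximum ARI over all pairs of clustering runs."""
    scores = [adjusted_rand_index(a, b) for a, b in combinations(runs, 2)]
    return float(np.mean(scores)), float(np.min(scores)), float(np.max(scores))


# Relabeled but identical partitions agree perfectly
print(adjusted_rand_index([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1]))
```

With 50 runs, `pairwise_ari` evaluates 50 × 49 / 2 = 1,225 pairs; because the index is invariant to how clusters are numbered, runs that find the same partition under different label orders still score 1.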
5.2. Hotspot Probability and Risk Metrics
Table 5 provides details of the various indices, while Figure 5 illustrates the spatial distribution of the Cluster Risk Index (CRI) across ZIP codes in North Carolina. Higher CRI values (shown in darker red) correspond to ZIP codes that belong to cluster typologies characterized by both a high probability of severe rail-related incidents and elevated relative risk compared to the statewide average. The map reveals a clear spatial concentration of high-risk ZIP codes, particularly in dense urban areas and industrial rail corridors, while rural regions generally exhibit low CRI values.
The distribution of CRI across ZIP codes is shown in Figure 6. The distribution is nearly symmetric, with slight positive skewness (0.179), indicating a slightly longer right tail formed by a small number of higher-risk ZIP codes. The negative excess kurtosis (−1.61) indicates a platykurtic distribution with lighter tails than a normal distribution, reflecting a relatively even spread of risk across the study area with few extreme outliers. Visual inspection of the CRI distribution reveals multiple distinct peaks, corresponding to the cluster-based segmentation of ZIP codes, which further supports the presence of structurally differentiated risk environments. Risk levels were classified using empirical percentile thresholds, with the 90th and 95th percentiles defining high-risk and priority intervention zones, respectively. This percentile-based approach enables data-driven identification of critical areas while preserving the relative distribution of risk across the study area. Overall, the CRI map provides a concise, policy-relevant visualization that integrates clustering and hotspot analysis results into a unified decision-support framework.
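The shape statistics and percentile thresholds can be sketched as follows, taking CRI values as given (how the CRI itself is computed is defined in Table 5 and is not reproduced here). The helper names are hypothetical. Skewness and kurtosis use population moments, matching the defaults of `scipy.stats.skew` and `scipy.stats.kurtosis`; a negative value such as −1.61 can only be excess kurtosis, since Pearson kurtosis is never below 1.

```python
import numpy as np


def skew_and_excess_kurtosis(x):
    """Population (biased) skewness and excess kurtosis (normal distribution = 0)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d**2)
    skewness = np.mean(d**3) / m2**1.5
    excess_kurtosis = np.mean(d**4) / m2**2 - 3.0
    return skewness, excess_kurtosis


def classify_by_percentile(cri, high=90, priority=95):
    """Label ZIP codes 'priority' (>= 95th pct), 'high' (>= 90th pct), or 'other'."""
    cri = np.asarray(cri, dtype=float)
    p_high, p_priority = np.percentile(cri, [high, priority])  # linear interpolation
    labels = np.where(cri >= p_priority, "priority", np.where(cri >= p_high, "high", "other"))
    return labels, (p_high, p_priority)


# Hypothetical CRI values for illustration only
cri_example = np.arange(1, 101, dtype=float)
labels, (p90, p95) = classify_by_percentile(cri_example)
```

Because the thresholds are empirical percentiles, roughly 10% of ZIP codes always fall at or above the high-risk cut and roughly 5% at or above the priority cut, regardless of the absolute CRI scale.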
6. Discussion
Recent advances in spatial analysis, particularly the integration of spatial autocorrelation and cluster-based hotspot characterization, have enabled researchers such as Habib et al. [55] and Mekonnen et al. [56] to systematically examine the spatial structure of incidents at granular levels such as ZIP codes, facilitating targeted interventions and resource allocation. In this study, the spatial autocorrelation analysis showed a modest but highly significant positive spatial dependence in rail casualty rates, indicating that high-risk ZIP codes do not occur in isolation but are geographically clustered. This pattern suggests that the risk of rail trespassing is influenced by shared context and environmental factors that extend beyond individual ZIP codes, such as continuity of the rail corridor, urban development characteristics, and regional land-use structure. The presence of spatial clustering also reinforces the need to move beyond independent-unit assumptions and supports the use of spatially informed methods for identifying and prioritizing high-risk areas. From a practical perspective, these findings imply that targeted interventions may be more effective when coordinated across neighboring ZIP codes along the same high-risk corridors or urban areas.
The cluster analysis also revealed heterogeneity in the contexts underlying this spatial clustering of incidents, rather than a single dominant risk factor. Rural ZIP codes with low rail mileage and agriculture-dominated land use consistently had very low hotspot probabilities, consistent with the idea that low trespassing risk reflects both limited rail exposure and sparse population. Conversely, two distinct high-risk cluster types emerged: industrial rail corridors and dense urban mixed-use ZIP codes. While both cluster types showed elevated hotspot probabilities, the mechanisms plausibly driving risk differed substantially between them. High-risk areas in industrial rail corridors were mostly associated with long stretches of railroad and numerous crossings, where encounters between railroad activities and the surrounding environment are frequent. These areas are probably affected by freight operations, switching yards, and at-grade crossings that increase exposure independently of residential density.
In contrast, dense urban ZIP codes, despite having a moderate amount of rail mileage, exhibited high hotspot probabilities, suggesting that population density and mixed land-use patterns play key roles in increasing trespass risk. In these contexts, pedestrian activity, the proximity of residential and commercial uses to rail infrastructure, and informal access points may contribute more strongly to risk than infrastructure volume alone. Integrating cluster types with hotspot probabilities and relative risks through the CRI offers a cohesive way to turn spatial patterns into actionable information. While spatial autocorrelation indicates where clustering occurs, the CRI shows which contexts consistently generate risk. The marked difference in CRI between clusters, ranging from negligible risk in rural ZIP codes to almost certain hotspot occurrence in urban and industrial clusters, underscores the need for interventions tailored to local conditions.
The findings of this study are broadly consistent with prior research on transportation safety and spatial risk analysis. The observed association between rail infrastructure characteristics and increased trespassing risk aligns with previous studies that identify infrastructure exposure as a key determinant of accident occurrence [12,13]. The study’s focus on North Carolina is supported by a growing body of empirical research on rail trespassing in the United States and internationally. Searcy et al. [14] and the Institute for Transportation Research and Education (ITRE) have documented the prevalence and spatial distribution of trespassing events across North Carolina, using both FRA data and direct observation to identify high-risk corridors and communities. Their findings corroborate this study’s identification of urban–industrial corridors and mixed-use environments as primary hotspots. Silla & Luoma [10] and Grabušić and Barić [18] additionally highlight the role of insufficient legal crossings, poor urban planning, and land-use barriers in driving trespassing behavior, reinforcing the importance of structural exposure metrics in risk assessment. Comparative analyses also reveal the influence of reporting practices, data quality, and local context on observed patterns, underscoring the need for standardized methodologies and cross-jurisdictional collaboration in rail safety research.
A key contribution of this study is the development of a framework that is both interpretable and scalable for practical transportation safety analysis. For such a framework to be useful, practitioners and decision makers must be able to understand how model inputs influence spatial risk assessments. Unlike black-box machine learning approaches, the proposed framework relies on transparent statistical and spatial analytical methods, including spatial autocorrelation metrics such as Moran’s I, and geographically referenced variables such as rail density, population density, and land-use characteristics. Because the model parameters directly correspond to measurable environmental and demographic variables, the contribution of each factor to predicted trespassing risk can be readily interpreted by analysts and policymakers. In this regard, the present framework aligns with interpretable engineering modeling paradigms that emphasize physically meaningful variables and transparent probabilistic structures, which make the relationships between inputs and outputs explicit and thereby support operational decision-making and system accountability [57]. The methodology is also extendable beyond North Carolina to other states or to national-scale railroad safety analyses.
7. Conclusions
This study combined spatial autocorrelation analysis with cluster-based hotspot characterization to examine the spatial structure and contextual drivers of rail trespassing risk at the ZIP-code level. The results provide converging evidence that incidents are not randomly distributed in space, but instead exhibit statistically significant spatial clustering shaped by distinct combinations of rail infrastructure, land-use composition, and population density. Together, these findings advance understanding of how and why rail trespassing hotspots emerge across different spatial contexts. These results have direct implications for rail safety planning and resource allocation. The findings suggest that a single approach to preventing trespassing may not be effective. In industrial rail corridors, interventions might need to focus on engineering controls, crossing treatments, and coordination with freight operations; dense urban areas could benefit more from pedestrian-focused strategies such as access control, urban design modifications, and targeted public education. Identifying adjacent high-risk ZIP codes also points to opportunities for corridor-level strategies rather than site-specific interventions. By combining spatial autocorrelation with cluster-based hotspot detection, this study offers a multi-level analytical framework for assessing trespassing risks along rail tracks.
However, this study is still subject to several limitations that also point to important avenues for future research. The results presented in this study are based on observational data and statistical associations, and therefore should not be interpreted as causal relationships. While the clustering analysis and spatial risk modeling identify patterns and correlations among environmental, demographic, and infrastructure variables and railroad trespassing incidents, these relationships do not imply direct causation. Several potential confounding factors may influence the observed associations. For instance, regional population mobility, pedestrian accessibility to rail corridors, proximity to public service facilities, and informal crossing behavior may affect trespassing risk but are not explicitly captured in the current dataset. These unobserved variables may partially explain the spatial patterns identified in the analysis. Thus, future research should extend this work by incorporating causal inference frameworks, such as quasi-experimental designs or agent-based simulation models, to better understand the mechanisms driving railroad trespassing behavior and to evaluate the effectiveness of targeted interventions.