Quantifying the Risk Impact of Contextual Factors on Pedestrian Crash Outcomes in Data-Scarce Developing Country Settings

Mubiru, Joel; Evdorides, Harry

doi:10.3390/futuretransp5040151

by Joel Mubiru * and Harry Evdorides

Department of Civil Engineering, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK

* Author to whom correspondence should be addressed.

Future Transp. 2025, 5(4), 151; https://doi.org/10.3390/futuretransp5040151

Submission received: 27 August 2025 / Revised: 9 October 2025 / Accepted: 11 October 2025 / Published: 22 October 2025

Abstract

Pedestrian crashes remain a leading cause of road traffic fatalities in developing countries (DCs); yet reliable crash data are scarce, constraining the ability to model pedestrian safety risks and evaluate countermeasure effectiveness. This study developed a methodological process for estimating the influence of contextual factors on pedestrian crashes using artificial data. The process integrated literature-derived trend analysis, artificial data generation, external face validity checks, correlation analysis, stepwise negative binomial regression, sensitivity testing, and mapping of results against the International Road Assessment Programme (iRAP) framework. Of the 26 contextual factors considered, 20 were retained in the negative binomial (NB) models, while six were excluded due to weak or inconsistent trend data. Results showed that behavioural and institutional factors, including ad hoc countermeasure implementation, gender composition of pedestrian flows, and vehicle age or technology, exerted stronger influence on crash outcomes than several geometric variables typically emphasised in global models. External validity testing confirmed broad alignment of the artificial dataset with published values, while sensitivity analysis demonstrated the robustness of factor influence values (Fi) across bootstrap resampling and scenario perturbations. The Fi values derived are illustrative rather than decision-ready, reflecting the artificial-data basis of this study. Nonetheless, the findings highlight methodological proof of concept that artificial-data modelling can provide credible and context-sensitive insights in data-scarce environments. Mapping results to the iRAP framework revealed complementarity, with opportunities to extend global models by incorporating behavioural and institutional variables more systematically. 
The approach provides a replicable pathway for improving pedestrian safety assessment in DCs and informs the development of an enhanced iRAP effectiveness model in subsequent research. Future applications should prioritise empirical calibration with real-world crash datasets and support policymakers in integrating behavioural and institutional factors into countermeasure prioritisation and safety planning.

Keywords: pedestrian safety; contextual risk factors; artificial data; negative binomial model; data-scarce environments; iRAP; developing countries

1. Introduction

Pedestrian safety remains a pressing challenge in developing countries (DCs), where pedestrians account for a disproportionately high share of road traffic fatalities [1]. Existing safety assessment frameworks, such as the International Road Assessment Programme (iRAP), rely on countermeasure effectiveness values derived largely from high-income country (HIC) data [2]. While robust in well-documented contexts, such models may misrepresent actual risk dynamics in DCs due to fundamental differences in traffic operations, enforcement, infrastructure quality, and road user behaviour [3,4].

Accurate modelling of pedestrian crash risk in DCs is hindered by sparse and often unreliable crash data, with underreporting rates as high as 84% in some regions [5]. Traditional statistical modelling therefore faces limitations in these contexts, necessitating innovative approaches that leverage the available literature, expert knowledge, and proxy datasets [6]. To address this challenge, the authors recently conducted a systematic literature review (SLR), which identified 23 contextual factors that influence the effectiveness of pedestrian safety countermeasures in DCs [7]. These factors are categorised into four groups: traffic exposures and operational characteristics, land use and planning, demographics, and infrastructure and roadway characteristics [8,9,10]. This body of evidence provided a basis for developing methodological frameworks that can function in data-scarce environments.

The present study builds upon the findings of this SLR and has two primary objectives: first, to quantify the relative influence of contextual factors on pedestrian crash outcomes by generating artificial datasets informed by literature-derived distributions and applying correlation analysis, regression modelling, and regression coefficient transformations to derive risk factor influence values (Fi); second, to map the transformed regression results against global frameworks such as iRAP, thereby identifying important contextual factors that may not be adequately reflected in existing predictive tools.

It is important to emphasise that this study does not seek to provide empirically generalisable estimates of risk factor influence values (Fi). Rather, it presents an illustrative methodological process that can be replicated and calibrated when reliable crash data become available in DCs. To this end, this study follows a structured approach:

Extracting trend data for contextual factors from literature sources.
Generating a representative artificial dataset based on ranges and distributions reported in the literature, with outputs visualised as histograms and boxplots.
Checking the external face validity of the generated data.
Estimating the relative influence value (Fi) of each factor on crash frequency through pairwise correlation, stepwise regression, and transformation of regression coefficients.
Performing sensitivity analysis and Fi uncertainty checks.
Mapping regression outputs against iRAP’s pedestrian crash risk framework to identify potential gaps.

Given the artificial data design, there is no expectation of conventional statistical significance. The aim is to demonstrate a replicable process for estimating Fi in data-scarce settings, to be calibrated when reliable crash data become available.

2. Materials and Methods

2.1. Data Collection and Factor Selection

According to the systematic review reported in [7], a total of 33 contextual factors were identified as influencing the effectiveness of countermeasures across both high-income countries (HICs) and DCs/low- and middle-income countries (LMICs). Of these, 23 factors were relevant to LMICs. During data collection for the present study, some of these factors required further disaggregation to align with available evidence in the literature. For example, the age group was classified into three categories, and gender was separated into male and female categories. This process increased the number of independent variables used in this study from 23 to 26, as exhibited in Table 1 and Table 2. The dependent variable was the frequency of fatal pedestrian crashes.

2.2. Extracting Trend Data of Each Factor from Literature Sources

In total, 26 contextual risk factors were identified as influencing countermeasure effectiveness in developing countries. These variables were grouped into four thematic categories: traffic exposure and operations, land use and planning, demographics, and infrastructure and roadway factors [8,9,10]. These categories reflect consistently identified domains influencing pedestrian crash frequency in DCs and provide a structured framework for both data extraction and subsequent modelling.

Trend values (minimum, maximum, mean, and standard deviation) for each contextual factor were derived from a broad range of studies using a snowball sampling approach [11]. This method, complemented by convenience sampling, allowed inclusion of peer-reviewed articles, grey literature, and institutional reports, particularly from low- and middle-income countries, covering observational surveys, transport assessments, and crash risk analyses.

Statistical parameters from this literature formed the basis for generating artificial datasets. By using published ranges and measures of central tendency [8,10,11], the artificial data were designed to mirror the variability observed in real-world pedestrian safety contexts, supporting methodological transparency, reproducibility, and readiness for future calibration with empirical field data. Trend values were manually extracted into Excel with reference links for traceability, as shown in Table 1.

2.3. Artificial Data Generation

Given severe data sparsity and underreporting in DCs (e.g., up to 84% underreporting in low-income countries, as reported by Job and Wambulwa [5]), artificial datasets of 2000 random samples per variable [12] were generated using the literature-derived ranges and distributions from Table 1 as inputs. Sampling was constrained to the observed minima and maxima and targeted the literature-derived means and standard deviations to ensure realism.

The generation process was implemented in Python 3.7 (Spyder IDE) using NumPy for random number generation, SciPy for statistical distribution fitting, Pandas for dataset structuring, and Matplotlib for data visualisation [13].

The approach was designed to simulate realistic but artificial data distributions based on the following process:

NumPy was used to generate 2000 random artificial data values for each variable. NumPy’s random number capabilities are widely used in scientific computing for simulation and statistical modelling tasks [14].
Truncated normal distributions were applied to continuous variables using SciPy’s truncnorm function [15], so that all values fell within the literature-derived minimum and maximum while approximating the specified mean and standard deviation [16].
A random binary distribution, equivalent to a Bernoulli distribution [17], was used for categorical/binary variables based on the reported mean values.
The generated values were normalised and rescaled using Pandas so that their mean and standard deviation closely matched the literature-derived targets [18].
Histograms and boxplots were generated using Matplotlib to visually verify variable distributions [19].
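The sampling steps above can be sketched as follows. This is a minimal illustration and not the Appendix A script: the helper names `sample_truncnorm` and `sample_bernoulli` and the fixed seed are assumptions made for the example, and the rescaling and plotting steps are omitted. The inputs are the Table 1 values for speed and traffic rule enforcement.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(42)  # fixed seed: an assumption, for repeatability only
N = 2000                         # sample size reported in the study


def sample_truncnorm(lo, hi, mean, sd, size, rng):
    """Draw from a normal(mean, sd) distribution truncated to [lo, hi]."""
    # scipy's truncnorm takes its bounds in standard-deviation units around loc.
    a, b = (lo - mean) / sd, (hi - mean) / sd
    return truncnorm.rvs(a, b, loc=mean, scale=sd, size=size, random_state=rng)


def sample_bernoulli(p, size, rng):
    """Draw 0/1 values with P(1) = p, taken here as the reported mean."""
    return rng.binomial(1, p, size=size)


# Literature-derived inputs from Table 1.
speed = sample_truncnorm(30.0, 65.0, 42.48, 9.38, N, rng)  # Speed (km/h)
enforcement = sample_bernoulli(0.50, N, rng)               # Enforcement (1/0)

print("speed range:", speed.min(), speed.max(), "mean:", speed.mean())
print("enforcement share of ones:", enforcement.mean())
```

Truncation generally shifts the sample mean and standard deviation away from the `loc` and `scale` inputs when the bounds are not symmetric about the mean, which is one reason a subsequent rescaling step may be needed.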

Outputs were cross-checked using Microsoft Excel for validation of randomisation patterns and value ranges.

The Python script used to generate the artificial data is provided in Appendix A.

The summary of the generated artificial data distribution characteristics for each variable is presented in Table 2. The distribution checks (histograms and boxplots) for each factor are presented in Appendix B.

Note that all scripts and some output files are provided as appendices for reproducibility.

2.4. External Face Validity and Dependence of Generated Data

External face validity was assessed by comparing the published means from Table 1 with the 95 per cent confidence intervals (CIs) of the generated values reported in Table 2. For each factor, if the published mean fell within the 95 per cent CI of the generated estimate, the artificial dataset was judged to be consistent with external evidence. Factors for which the published means fell outside the 95 per cent CI were highlighted as residual gaps.

For each factor, the standard error (SE) was calculated as $s / \sqrt{n}$, where s is the sample standard deviation and n = 2000. Ninety-five per cent confidence intervals for the mean were computed using the normal approximation formula $\bar{X} \pm 1.96 \times SE$, following standard practice in the statistical literature [20,21,22].
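As a worked illustration of this check, the sketch below recomputes the SE and 95% CI from the rounded summary values in Table 2 and compares them with the published means in Table 1. The function names are hypothetical, and because the inputs are rounded the bounds may differ from the tabulated ones in the last decimal place.

```python
import math


def ci95_from_summary(mean, sd, n):
    """Return (SE, lower, upper) for the mean using the normal approximation."""
    se = sd / math.sqrt(n)
    return se, mean - 1.96 * se, mean + 1.96 * se


def consistent_with_literature(published_mean, mean, sd, n=2000):
    """True when the published mean lies inside the generated data's 95% CI."""
    _, lower, upper = ci95_from_summary(mean, sd, n)
    return lower <= published_mean <= upper


# name: (published mean from Table 1, generated mean and SD from Table 2)
checks = {
    "Speed (km/h)": (42.48, 42.67, 9.06),
    "Pedestrian to Vehicle Volume Ratio": (1.09, 1.19, 1.11),
}

for name, (published, mean, sd) in checks.items():
    se, lower, upper = ci95_from_summary(mean, sd, 2000)
    ok = consistent_with_literature(published, mean, sd)
    print(f"{name}: SE={se:.3f}, CI=[{lower:.2f}, {upper:.2f}], consistent={ok}")
```

With these inputs the published speed mean falls inside the interval, whereas the published pedestrian-to-vehicle ratio mean (1.09) falls below the lower bound and would be flagged as a residual gap.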

2.5. Estimating the Influence of Risk Factors on Pedestrian Crash Outcomes

After generating artificial datasets, the next step was to apply risk modelling using this data. The process involved conducting correlation analysis and stepwise regression modelling, followed by an exponential transformation of the regression coefficients to produce Fi values, which represent the relative influence of each risk factor. This process was used to demonstrate the practical application of risk modelling with artificial data.

2.5.1. Correlation Analyses

Spearman’s correlation was chosen because it evaluates the strength of monotonic relationships between variables based on ranked values [23]. It works well for mixed variable types because it is non-parametric and only depends on ranks, not scale or distribution.

To calculate the correlation between each pair of variables, the following steps were followed:

Ranked the values of the independent variable (X) across all the 2000 random observations. Replaced each row value for the variable with their corresponding ranks.
Ranked the fatal pedestrian crash counts/dependent variable (Y) across 2000 random observations.
Calculated the Spearman’s correlation coefficient between the two ranked pairs of variables using the following correlation formula:

ρ = 1 - \frac{6 \sum_{i = 1}^{n} d_{i}^{2}}{n (n^{2} - 1)}

(1)

where

ρ is the Spearman correlation coefficient;
d_i is the difference in ranks between the two variables (e.g., d_i = rank(X_i) − rank(Y_i));
n is the number of observations (where n = 2000).
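The three steps and Equation (1) can be sketched in plain Python as below. This is an illustrative sketch and not the Appendix C script; the helper names are assumptions. Note that the rank-difference formula is exact only when there are no tied values. Binary factors and crash counts contain many ties, in which case the usual approach is to assign average ranks and compute the Pearson correlation of the ranks (as `scipy.stats.spearmanr` does).

```python
def ranks(values):
    """Rank values from 1 to n (assumes no ties, e.g. continuous draws)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho from squared rank differences, as in Equation (1)."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Small hypothetical example: two rank swaps among five observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
print(spearman_rho(x, y))
```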

This technique was applied in two stages. First, Spearman’s rank correlation was used to evaluate the monotonic relationship between each independent variable and the dependent variable (fatal pedestrian crash count). The results of this analysis are presented in Table 3. Second, pairwise correlations were computed among each pair of variables to assess the presence of multicollinearity, with the results summarised in Table 4.

The Python scripts used to compute the Spearman correlation coefficients and generate the output tables for both steps are provided in Appendix C.

These diagnostics were intentionally descriptive, not inferential, given the artificial nature of the data.

2.5.2. Stepwise Regression Modelling

All 26 contextual factors identified through the literature review and disaggregation were considered for inclusion in the regression analysis. Stepwise modelling procedures were applied to identify significant predictors of fatal pedestrian crashes.

Six negative binomial (NB) regression models were developed to predict fatal pedestrian crash frequencies. The NB model was chosen due to its ability to handle over-dispersed count data, where the variance exceeds the mean [24]. In this case, the variance (σ² = 4.305) was greater than the mean (μ = 2.022).

According to Cameron and Trivedi [25], NB2 (quadratic variance) is the standard used in most crash-frequency modelling literature. It is also the default in Python’s statsmodels Generalised Linear Models (GLM) implementation, where variance increases quadratically with the mean. Therefore, under the NB2 parameterisation, the distribution of counts is defined as:

Y_{i} \sim \mathrm{NB} (μ_{i}, α), \quad E [Y_{i} \mid X_{i}] = μ_{i}, \quad \mathrm{Var} (Y_{i} \mid X_{i}) = μ_{i} + α μ_{i}^{2}

(2)

where $μ_{i}$ is the expected number of crashes at location $i$ and $α > 0$ is the dispersion parameter.

The mean was linked to covariates through the log link:

\log (μ_{i}) = β_{0} + \sum_{k = 1}^{K} β_{k} X_{k i}

(3)

where $X_{k i}$ are the predictor variables, $K$ is the number of predictors, and $β_{k}$ are the regression coefficients, estimated by maximum likelihood.

The 6 NB models were fitted according to the following predictor groups:

Model 1: Constant only (baseline);
Model 2: Traffic exposure and operational variables (e.g., mixed traffic conditions);
Model 3: Land use and planning variables (e.g., road use);
Model 4: Demographics (e.g., age group);
Model 5: Infrastructure and roadway variables (e.g., coverage of pedestrian infrastructure);
Model 6: Full model (combined all variables).

The general form for fitting the NB model on the artificial data was as follows:

E [y_{i}] = μ_{i} = \exp (β_{0} + β_{1} X_{1 i} + β_{2} X_{2 i} + \dots + β_{K} X_{K i})

(4)

where

y_i is the crash count at site i, with expected value μ_i;
X_1i, X_2i, …, X_Ki are the independent/predictor variables;
β₀, β₁, …, β_K are the coefficients estimated by maximum likelihood.

Each coefficient β_k corresponds to the change in the log of the expected crash count per one-unit increase in predictor X_k, holding the other predictors constant.

Coefficients, Wald Statistics, and Significance Testing

Coefficients (β_k) were estimated using Maximum Likelihood Estimation (MLE). Their sign and size indicate the direction (+/−) and magnitude of association, and the exponential transformation exp(β_k) gives the multiplicative effect on crash frequency. To test whether a coefficient differs significantly from zero, Wald statistics were calculated as follows:

z = \frac{β}{S E (β)}

(5)

where SE is the standard error of the coefficient.

A high absolute value (typically |z| > 1.96, corresponding to the 5% significance level) indicates statistical significance.
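As a worked example of Equation (5), the sketch below recomputes the Wald statistic and a two-sided normal p-value for one row of Table 5 (ad hoc implementation of countermeasures, Model 3). The function names are hypothetical, and the inputs are the rounded tabulated values, so the results are close to but not identical with the reported z-value of −1.707 and p-value of 0.088.

```python
import math


def wald_z(beta, se):
    """Wald statistic: coefficient divided by its standard error."""
    return beta / se


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Rounded values from Table 5 (Model 3, ad hoc implementation of countermeasures).
beta, se = -0.496, 0.291
z = wald_z(beta, se)
p = two_sided_p(z)
print(f"z = {z:.3f}, p = {p:.3f}, |z| > 1.96: {abs(z) > 1.96}")
```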

Dispersion Parameter (Alpha)

The negative binomial model introduces a dispersion parameter α to account for overdispersion as follows:

\mathrm{Var} (y_{i}) = μ_{i} + α μ_{i}^{2}

(6)

An α greater than zero indicates overdispersion, in which case the NB model is preferred to the Poisson model, to which it reduces as α approaches zero.

Log-Likelihood Function and Goodness-of-Fit Metrics

The contribution of each observation to the NB log-likelihood, expressed using the dispersion/shape parameter $r = 1 / α$, is:

{LL}_{\mathrm{Observation}, i} = \log Γ (y_{i} + r) - \log Γ (r) - \log (y_{i}!) + r \log (\frac{r}{r + μ_{i}}) + y_{i} \log (\frac{μ_{i}}{r + μ_{i}})

(7)

where LL is the log-likelihood and Γ is the gamma function.

The overall log-likelihood of the model (LL_model) is the sum of the log-likelihoods of each site/observation (in this case, 2000 observations), given using the following formula:

{LL}_{\mathrm{Model}} = \sum_{i = 1}^{n} {LL}_{\mathrm{Observation}, i}

(8)

Model adequacy was further evaluated using:

Restricted log-likelihood ( ${LL}_{\mathrm{null}}$ ) of the null (intercept-only) model;
McFadden’s pseudo-R² statistic/log-likelihood ratio index (ρ²), given by:

ρ^{2} = 1 - \frac{{LL}_{\mathrm{Model}}}{{LL}_{\mathrm{null}}}

(9)

Akaike Information Criterion (AIC), which is given as:

\mathrm{AIC} = - 2 \cdot {LL}_{\mathrm{Model}} + 2 k

(10)

where k is the number of estimated parameters included in the model.
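Equations (7) to (10) can be evaluated directly, as in the sketch below. It uses only the standard library (`math.lgamma` gives log Γ); the function names and the toy inputs are assumptions for illustration, not values from the study.

```python
import math


def nb2_loglik_obs(y, mu, alpha):
    """Log-likelihood contribution of one observation (Equation 7), r = 1/alpha."""
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)  # log(y!) = lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu)))


def nb2_loglik(ys, mus, alpha):
    """Model log-likelihood: sum over observations (Equation 8)."""
    return sum(nb2_loglik_obs(y, mu, alpha) for y, mu in zip(ys, mus))


def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo-R-squared (Equation 9)."""
    return 1.0 - ll_model / ll_null


def aic(ll_model, k):
    """Akaike Information Criterion (Equation 10), k estimated parameters."""
    return -2.0 * ll_model + 2 * k


# Toy example: hypothetical counts, fitted means, and a constant null mean.
counts = [0, 1, 3, 2, 5]
fitted = [0.8, 1.2, 2.5, 2.0, 4.0]
null_mean = sum(counts) / len(counts)
ll_model = nb2_loglik(counts, fitted, alpha=0.5)
ll_null = nb2_loglik(counts, [null_mean] * len(counts), alpha=0.5)
print(ll_model, ll_null, mcfadden_r2(ll_model, ll_null), aic(ll_model, k=2))
```

A quick sanity check on the per-observation formula is that the implied probabilities sum to one over all counts for any fixed mean and dispersion.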

Equations (6)–(10) were formulated based on an adapted example of pedestrian risk modelling conducted in Kolkata, India, as presented by Mukherjee and Mitra [10]. Their work provided a practical foundation for structuring risk exposure and estimating the influence of contextual factors on crash frequency in a data-challenged environment. This research builds upon and modifies that approach to reflect the operational realities of developing countries, thereby ensuring methodological relevance while leveraging an established framework.

It is important to note that each model was evaluated based on coefficient direction, relative magnitude, and thematic alignment and not statistical significance.

Implementation of NB modelling in Python

All models were fitted in Python using the statsmodels Generalised Linear Model (GLM) with a negative binomial family and log link. The Python code for fitting the six NB regression models and exporting model coefficients, standard errors, p-values, confidence intervals, and NB fit metrics is detailed in Appendix C. The modelling outcomes are exhibited in Table 5.

2.5.3. Transforming NB Coefficients into Risk Factor Influence Values (Fi)

NB model coefficients were converted into factor influence values (F_i) by exponentiation:

F_{i} = e^{β_{k}}

(11)

where $F_{i} > 1$ suggests increased risk, $F_{i} < 1$ suggests a protective effect, and $F_{i} = 1$ suggests no effect.

These Fi values are equivalent to incidence rate ratios (IRRs), which indicate the multiplicative change in expected crash counts per one-unit increase in X_k. This interpretation of exponentiated coefficients is standard in crash-frequency modelling [25,26]. The Fi/IRR values were therefore taken as the risk factor values of interest in this research and are presented as part of Table 6.
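To illustrate the transformation, the sketch below exponentiates three rounded coefficients and their confidence bounds copied from Table 5. The dictionary layout and helper name are assumptions for the example; the resulting values are illustrative and are not a substitute for the Fi values reported in Table 6.

```python
import math

# name: (coefficient, CI lower, CI upper), rounded values copied from Table 5.
coefficients = {
    "Ad hoc implementation of countermeasures (%)": (-0.496, -1.066, 0.073),
    "Vehicle age technology (%)": (0.142, -0.296, 0.580),
    "Log Average Daily Traffic Volume": (-0.146, -0.391, 0.098),
}


def influence(beta):
    """Factor influence value: exponentiated coefficient (rate ratio)."""
    return math.exp(beta)


results = {}
for name, (beta, lower, upper) in coefficients.items():
    fi = influence(beta)
    results[name] = fi
    direction = "increased risk" if fi > 1 else "protective" if fi < 1 else "no effect"
    print(f"{name}: Fi = {fi:.3f} "
          f"[{influence(lower):.3f}, {influence(upper):.3f}] ({direction})")
```

Because several factors are coded as proportions between 0 and 1, a one-unit increase can exceed the observed range (for example, 0.60 to 0.90 for ad hoc implementation), so the magnitudes should be read with the variable scale in mind.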

2.6. Sensitivity Analysis and F_i Uncertainty

Sensitivity analysis was performed to assess the robustness of the Fi values to assumptions underlying the artificial data generation and modelling framework. This reflects good practice in applied statistical modelling, where stability under perturbation is an essential test of reliability [26,27]. Two complementary procedures were applied:

Bootstrap resampling: 1000 bootstrap samples of the synthetic dataset were drawn with replacement. For each bootstrap, negative binomial (NB) models were re-estimated and Fi values recalculated. This allowed the derivation of 95% confidence intervals.
Scenario perturbation: The synthetic dataset was resampled under varying sample sizes (n = 1000; 2000; 5000) and with multiplicative noise applied to predictor variables (5% and 10%). For each scenario, Fi values were recalculated, and stability of variable rankings was assessed using Kendall’s τ correlation with baseline rankings.
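The two procedures can be sketched in simplified form as below. This is not the Appendix D script. To stay dependency-free, the sketch replaces the full NB refit with a shortcut that holds only for a model with an intercept and a single binary predictor under a log link, where exp(β) equals the ratio of the two group means; the study itself re-estimated NB models. All data, names, and the use of 200 rather than 1000 resamples are assumptions for illustration.

```python
import random


def rate_ratio(x, y):
    """exp(beta) for one binary predictor under a log link: ratio of group means."""
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]
    return (sum(y1) / len(y1)) / (sum(y0) / len(y0))


def bootstrap_ci(x, y, n_boot=1000, seed=1):
    """Percentile 95% interval of the rate ratio over resamples with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rate_ratio([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


def kendall_tau(a, b):
    """Kendall's tau-a between two equally long score lists (assumes no ties)."""
    n = len(a)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return s / (n * (n - 1) / 2)


# Hypothetical data: a binary factor and counts with a true rate ratio of 0.75.
rng = random.Random(0)
n = 2000
x = [rng.randrange(2) for _ in range(n)]
y = [sum(rng.random() < (0.15 if xi else 0.20) for _ in range(10)) for xi in x]

point = rate_ratio(x, y)
lower, upper = bootstrap_ci(x, y, n_boot=200)
print(f"Fi = {point:.3f}, bootstrap 95% interval = [{lower:.3f}, {upper:.3f}]")

# Ranking stability: hypothetical baseline Fi values versus a perturbed scenario.
baseline = [0.61, 0.86, 1.02, 1.15]
perturbed = [0.64, 1.03, 0.88, 1.12]
print("Kendall tau:", kendall_tau(baseline, perturbed))
```

A τ close to 1 would indicate that the ordering of factors by Fi is largely preserved under the perturbation.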

The Python script used in the sensitivity analysis is included in Appendix D.

2.7. Comparative Analyses (Mapping of Factors to NB Model and iRAP Framework)

A comparative analysis was conducted to map which of the 26 DC contextual factors were represented in the negative binomial regression model and in the current iRAP pedestrian crash risk framework [28]. The objective was to pinpoint contextual factors absent from both the NB model results and the existing iRAP framework. The comparison results are presented in Table 6.

Table 1. Literature-based frequency data characteristics for the different variables.

Characteristics	Variables	Variable Type	Minimum	Maximum	Mean (μ)	Standard Deviation (δ)	References	Country
Safety Performance	Fatal Pedestrian Crash Statistics	Continuous	0.00	13.00	1.83	2.29	[10]	India
Traffic Exposures and Operational Characteristics	Log (Average Daily Traffic Volume)	Continuous	4.24	5.47	4.71	0.22	[10]	India
	Log (Average Daily Pedestrian Volume)	Continuous	3.33	5.25	4.58	0.35	[10]	India
	Speed (km/h)	Continuous	30.00	65.00	42.48	9.38	[10]	India
	Pedestrian to Vehicle Volume Ratio/Mixed Traffic Conditions	Continuous	0.05	9.20	1.09	1.23	[10]	India
	Vehicle age/technology (%)	Continuous	0.50	0.90	0.70	0.13	[29]	Nigeria, Ghana, Ethiopia, Kenya
	Compliance/Presence of Overtaking Tendency of Vehicle (1/0)	Categorical	0.00	1.00	0.67	0.48	[10]	India
	Enforcement of Traffic rules (Yes = 1; No = 0)	Categorical	0.00	1.00	0.50	0.50	[10,30]	India
	Public safety awareness level (%)	Continuous	0.31	0.68	0.50	0.13	[31]	Bangladesh
	Driver safety awareness level (%)	Continuous	0.38	0.54	0.48	0.13	[32]	Qatar
	Time of the Day (visibility) (1/0)	Categorical	0.00	1.00	0.49	0.50	[33]	India
Land use and Planning	Hierarchical Road Classification/Road Use (%)	Continuous	0.16	0.80	0.45	0.20	[34]	Brazil, Colombia, Tanzania
	Design Configuration (%)	Continuous	0.10	0.55	0.30	0.15	[35,36,37]	Ethiopia, India
	Ad hoc implementation of countermeasures (%)	Continuous	0.60	0.90	0.75	0.10	[38,39,40,41]	Uganda, India, Ghana
	Encroachment of Footpath by Street vendors (%)	Continuous	0.00	1.00	0.61	0.36	[10]	India
	Human Capacity of responsible agencies (Adequate = 1, Poor = 0)	Categorical	0.00	1.00	0.50	0.30	[42]	World Bank
Demographics	Age group (%)	Below 18 years (%)	0.00	0.90	0.09	0.15	[33]	India
		18–49 years (in %)	0.06	1.00	0.79	0.15	[33]	India
		50+ years (%)	0.00	0.33	0.11	0.07	[33]	India
	Gender (%)	Male pedestrians (%)	0.02	0.90	0.73	0.15	[33]	India
	Gender (%)	Female (%)	0.11	0.35	0.23	0.12	[43]	USA
	Employed population (%)	Continuous	0.40	0.70	0.55	0.10	[44]	World Bank
Infrastructure and Roadway Factors	Maintenance Practices/level (%)	Continuous	0.05	0.40	0.20	0.10	[35,36]	Ghana and Ethiopia
	Coverage of pedestrian infrastructure (%)	Continuous	0.20	0.60	0.40	0.10	[41]	India
	Vandalism of Street Furniture (Never = 1; Sometimes = 0.5; Always = 0)	Categorical	0.00	1.00	0.70	0.20	[45]	Turkey
	Age of the countermeasure (years)	Continuous	0.50	10.00	5.00	2.50	[46]	USA
	Appropriate location of countermeasure (1/0)	Categorical	0.00	1.00	0.60	0.20	[37]	Ethiopia

Table 2. Summary of generated artificial data characteristics for each variable.

Characteristics	Variables	Variable Type	Minimum	Maximum	Mean (μ)	Median	Standard Deviation (δ)	StdErr	95% CI Lower Bound	95% CI Upper Bound
Safety Performance	Fatal Pedestrian Crash Statistics	Continuous	0.00	10.67	2.03	1.50	2.06	0.05	1.94	2.12
Traffic Exposures and Operational Characteristics	Log (Average Daily Traffic Volume)	Continuous	4.24	5.47	4.71	4.71	0.22	0.00	4.70	4.72
	Log (Average Daily Pedestrian Volume)	Continuous	3.37	5.25	4.58	4.59	0.35	0.01	4.56	4.59
	Speed (km/h)	Continuous	30.00	65.00	42.67	42.00	9.06	0.20	42.27	43.06
	Pedestrian to Vehicle Volume Ratio/Mixed Traffic Conditions	Continuous	0.05	6.07	1.19	0.95	1.11	0.02	1.14	1.24
	Vehicle age/technology (%)	Continuous	0.50	0.90	0.70	0.70	0.12	0.00	0.69	0.71
	Compliance/Presence of Overtaking Tendency of Vehicle (1/0)	Categorical	0.00	1.00	0.66	1.00	0.48	0.01	0.63	0.68
	Enforcement of Traffic rules (Yes = 1; No = 0)	Categorical	0.00	1.00	0.51	1.00	0.50	0.01	0.48	0.53
	Public safety awareness level (%)	Continuous	0.31	0.68	0.50	0.50	0.12	0.00	0.49	0.51
	Driver safety awareness level (%)	Continuous	0.38	0.54	0.47	0.49	0.07	0.00	0.47	0.47
	Time of the Day (visibility) (1/0)	Categorical	0.00	1.00	0.48	0.00	0.50	0.01	0.46	0.50
Land use and Planning	Hierarchical Road Classification/Road Use (%)	Continuous	0.16	0.80	0.45	0.44	0.19	0.00	0.44	0.46
	Design Configuration (%)	Continuous	0.10	0.55	0.30	0.29	0.14	0.00	0.30	0.31
	Ad hoc implementation of countermeasures (%)	Continuous	0.60	0.90	0.75	0.75	0.09	0.00	0.75	0.75
	Encroachment of Footpath by Street vendors (%)	Continuous	0.00	1.00	0.60	0.64	0.32	0.01	0.58	0.61
	Human Capacity of responsible agencies (Adequate = 1, Poor = 0)	Categorical	0.00	1.00	0.50	0.00	0.50	0.01	0.48	0.52
Demographics	Age group (%)	Below 18 years (%)	0.00	0.69	0.11	0.07	0.13	0.00	0.10	0.12
		18–49 years (in %)	0.21	1.00	0.79	0.80	0.15	0.00	0.78	0.79
		50+ years (%)	0.00	0.33	0.11	0.10	0.07	0.00	0.11	0.11
	Gender (%)	Male pedestrians (%)	0.13	0.90	0.72	0.74	0.14	0.00	0.72	0.73
	Gender (%)	Female (%)	0.11	0.35	0.23	0.23	0.09	0.00	0.23	0.23
	Employed population (%)	Continuous	0.40	0.70	0.55	0.55	0.09	0.00	0.55	0.55
Infrastructure and Roadway Factors	Maintenance Practices/level (%)	Continuous	0.05	0.40	0.20	0.20	0.10	0.00	0.20	0.21
	Coverage of pedestrian infrastructure (%)	Continuous	0.20	0.60	0.40	0.40	0.10	0.00	0.40	0.40
	Vandalism of Street Furniture (Never = 1; Sometimes = 0.5; Always = 0)	Categorical	0.00	1.00	0.70	1.00	0.46	0.01	0.68	0.72
	Age of the countermeasure (years)	Continuous	0.50	10.00	5.01	5.04	2.47	0.06	4.90	5.12
	Appropriate location of countermeasure (1/0)	Categorical	0.00	1.00	0.61	1.00	0.49	0.01	0.59	0.63

Table 3. Spearman correlation between independent variables and pedestrian crash count.

Variable	Min	Max	Mean	Std Dev	Spearman Rho	T-Statistic	p-Value
Log Average Daily Traffic Volume	4.240	5.470	4.710	0.220	−0.029	−1.285	0.199
Log Average Daily Pedestrian Volume	3.364	5.250	4.580	0.349	0.030	1.336	0.182
Speed (km/h)	30.000	65.000	42.697	8.988	0.029	1.282	0.200
Pedestrian to Vehicle Volume Ratio	0.050	6.087	1.179	1.128	0.000	0.010	0.992
Vehicle age technology (%)	0.500	0.900	0.700	0.122	0.003	0.124	0.901
Overtaking Tendency (1/0)	0.000	1.000	0.668	0.471	−0.037	−1.662	0.097
Traffic Rule Enforcement (1/0)	0.000	1.000	0.517	0.500	0.009	0.387	0.699
Public Safety Awareness (%)	0.310	0.680	0.499	0.119	−0.026	−1.161	0.246
Driver Safety Awareness (%)	0.380	0.540	0.468	0.069	0.017	0.755	0.450
Time of Day Visibility (1/0)	0.000	1.000	0.491	0.500	0.011	0.490	0.624
Road Use (%)	0.160	0.800	0.452	0.191	0.001	0.039	0.969
Design Configuration (%)	0.100	0.550	0.302	0.139	0.018	0.791	0.429
Ad hoc implementation of countermeasures (%)	0.600	0.900	0.751	0.094	−0.032	−1.407	0.160
Footpath Encroachment (%)	0.000	1.000	0.596	0.326	0.008	0.342	0.733
Human Capacity of Agencies (1/0)	0.000	1.000	0.498	0.500	0.013	0.581	0.562
Age < 18 (%)	0.000	0.685	0.111	0.128	−0.021	−0.926	0.355
Age 18–49 (%)	0.181	1.000	0.788	0.147	0.031	1.373	0.170
Age 50+ (%)	0.000	0.330	0.111	0.069	−0.011	−0.487	0.627
Male Pedestrians (%)	0.158	0.900	0.725	0.143	0.036	1.616	0.106
Female Pedestrians (%)	0.110	0.350	0.230	0.092	−0.023	−1.016	0.310
Employed Population (%)	0.400	0.700	0.550	0.094	0.021	0.947	0.344
Maintenance Practices (%)	0.050	0.400	0.201	0.097	−0.006	−0.251	0.802
Pedestrian Infrastructure Coverage (%)	0.200	0.600	0.400	0.099	0.008	0.374	0.709
Street Furniture Vandalism (0/0.5/1)	0.000	1.000	0.680	0.467	−0.024	−1.074	0.283
Age of Countermeasure years	0.500	10.000	5.006	2.466	−0.022	−0.968	0.333
Appropriate Countermeasure Location (1/0)	0.000	1.000	0.608	0.488	0.033	1.468	0.142

Table 4. Spearman correlation matrix between each pair of variables.

	FT	T	P	S	R	VAT	OT	TR	PSA	DSA	TD	RU	DC	CA	FE	HCA	AG1	AG2	AG3	MP	FP	EP	MTP	PIC	SFV	AC
FT	1.000
T	−0.029	1.000
P	0.030	0.004	1.000
S	0.029	−0.021	−0.019	1.000
R	0.000	−0.038	0.005	0.017	1.000
VAT	0.003	−0.005	0.013	−0.013	−0.033	1.000
OT	−0.037	0.017	−0.001	−0.023	−0.011	0.018	1.000
TR	0.009	0.003	−0.075	0.019	−0.030	−0.034	0.009	1.000
PSA	−0.026	0.025	−0.002	0.018	0.019	−0.039	0.063	0.006	1.000
DSA	0.017	0.031	0.013	0.003	0.004	−0.001	0.003	−0.025	0.006	1.000
TD	0.011	0.012	−0.007	0.001	0.018	0.024	0.054	−0.016	0.007	−0.005	1.000
RU	0.001	0.001	−0.016	0.018	0.025	0.016	−0.013	0.015	0.015	0.021	−0.017	1.000
DC	0.018	0.017	0.004	−0.003	−0.003	−0.028	−0.009	0.002	0.004	−0.013	−0.020	0.014	1.000
CA	−0.031	0.026	−0.005	−0.024	0.011	0.001	0.021	−0.047	0.011	0.000	0.001	0.017	−0.025	1.000
FE	0.008	−0.011	−0.017	0.017	0.012	0.000	0.009	−0.003	−0.023	0.027	0.018	0.011	0.022	0.056	1.000
HCA	0.013	−0.006	0.034	0.033	0.021	0.019	−0.011	−0.008	−0.009	0.004	0.005	0.025	−0.008	0.009	−0.009	1.000
AG1	−0.021	0.015	−0.004	0.009	−0.035	−0.017	−0.002	−0.027	0.025	0.005	−0.031	0.022	0.019	0.007	0.032	−0.056	1.000
AG2	0.031	−0.025	0.025	−0.010	−0.003	−0.014	0.002	0.029	−0.007	−0.001	−0.001	0.019	0.003	−0.021	−0.029	0.017	−0.017	1.000
AG3	−0.011	0.012	0.004	0.010	0.023	0.023	0.019	−0.043	−0.013	0.008	0.027	−0.002	0.008	0.021	−0.008	−0.004	−0.022	0.033	1.000
MP	0.036	0.005	−0.002	0.011	−0.027	0.020	0.070	−0.021	0.019	0.036	−0.036	−0.015	−0.035	0.026	−0.004	0.005	−0.003	0.025	0.043	1.000
FP	−0.023	0.018	−0.012	−0.022	0.012	0.009	−0.002	−0.001	0.010	0.043	−0.013	−0.023	0.014	−0.022	0.012	−0.006	0.018	0.021	−0.019	0.008	1.000
EP	0.021	−0.017	0.030	−0.005	0.012	−0.021	0.016	−0.020	0.018	0.008	−0.002	0.015	−0.009	0.002	0.007	−0.005	−0.058	0.026	0.001	0.013	−0.017	1.000
MTP	−0.006	−0.038	−0.008	−0.013	0.013	−0.025	−0.003	−0.004	0.011	0.014	0.025	0.003	−0.007	0.013	−0.003	0.029	0.043	−0.007	−0.006	0.015	−0.057	0.023	1.000
PIC	0.008	−0.012	−0.043	0.012	−0.040	−0.019	0.008	−0.038	−0.045	0.029	0.001	0.013	−0.004	0.003	−0.002	0.023	−0.007	0.019	−0.036	0.012	−0.052	0.021	0.022	1.000
SFV	−0.024	0.054	0.001	−0.018	0.020	−0.004	−0.034	−0.019	−0.026	−0.040	0.027	0.010	0.003	−0.007	0.011	0.030	−0.005	−0.023	−0.008	−0.026	0.016	−0.013	0.001	0.009	1.000
AC	−0.022	−0.009	−0.013	0.021	−0.019	−0.038	0.003	−0.029	−0.001	−0.024	0.015	−0.038	0.009	0.019	0.006	−0.001	−0.017	−0.002	−0.021	0.032	−0.009	0.048	−0.001	−0.021	−0.007	1.000

FT = fatal pedestrian crash frequency, T = log average daily traffic volume, P = log average pedestrian volume, S = speed in km/h, R = pedestrian-to-vehicle volume ratio, VAT = vehicle age/technology, OT = overtaking tendency, TR = traffic rule enforcement, PSA = public safety awareness, DSA = driver safety awareness, TD = time of the day, RU = road use, DC = design configuration, CA = ad hoc implementation of countermeasures, FE = footpath encroachment, HCA = human capacity of agencies, AG1 = age < 18 years, AG2 = age 18–49 years, AG3 = age 50+ years, MP = male pedestrian, FP = female pedestrian, EP = employed population, MTP = maintenance practices, PIC = pedestrian infrastructure coverage, SFV = street furniture vandalism, AC = age of countermeasure, ACL = appropriate countermeasure location.

Table 5. Negative binomial regression results for the six models.

Coefficient (β)	StdErr	z-Value	P > |z|	CI Lower	CI Upper	Variable	Model
0.704	0.027	25.750	0.000	0.650	0.758	intercept	Model_1_Baseline
0.680	0.720	0.943	0.345	−0.732	2.092	const	Model_2_Traffic
−0.146	0.125	−1.174	0.240	−0.391	0.098	Log Average Daily Traffic Volume	Model_2_Traffic
0.109	0.078	1.393	0.164	−0.045	0.263	Log Average Daily Pedestrian Volume	Model_2_Traffic
0.003	0.003	0.832	0.405	−0.003	0.008	Speed (km/h)	Model_2_Traffic
0.004	0.024	0.150	0.881	−0.044	0.051	Pedestrian to Vehicle Volume Ratio	Model_2_Traffic
0.142	0.224	0.635	0.526	−0.296	0.580	Vehicle age technology (%)	Model_2_Traffic
1.018	0.239	4.262	0.000	0.550	1.486	const	Model_3_Land_Use
0.060	0.143	0.416	0.677	−0.221	0.340	Road Use (%)	Model_3_Land_Use
0.112	0.196	0.569	0.569	−0.273	0.497	Design Configuration (%)	Model_3_Land_Use
−0.496	0.291	−1.707	0.088	−1.066	0.073	Ad hoc implementation of countermeasures (%)	Model_3_Land_Use
−0.005	0.084	−0.065	0.948	−0.170	0.159	Footpath Encroachment (%)	Model_3_Land_Use
0.446	0.267	1.666	0.096	−0.078	0.970	const	Model_4_Demographic
0.065	0.214	0.302	0.763	−0.355	0.484	Age < 18 (%)	Model_4_Demographic
0.154	0.186	0.826	0.409	−0.212	0.519	Age 18–49 (%)	Model_4_Demographic
−0.073	0.398	−0.183	0.854	−0.852	0.706	Age 50+ (%)	Model_4_Demographic
0.088	0.192	0.460	0.645	−0.288	0.464	Male Pedestrians (%)	Model_4_Demographic
−0.140	0.298	−0.468	0.639	−0.724	0.445	Female Pedestrians (%)	Model_4_Demographic
0.191	0.292	0.655	0.512	−0.381	0.764	Employed Population (%)	Model_4_Demographic
0.741	0.149	4.972	0.000	0.449	1.033	const	Model_5_Infrastructure
−0.100	0.283	−0.354	0.723	−0.654	0.454	Maintenance Practices (%)	Model_5_Infrastructure
0.077	0.277	0.277	0.782	−0.466	0.620	Pedestrian Infrastructure Coverage (%)	Model_5_Infrastructure
−0.032	0.058	−0.553	0.581	−0.147	0.082	Street Furniture Vandalism (0/0.5/1)	Model_5_Infrastructure
−0.011	0.011	−0.961	0.336	−0.032	0.011	Age of Countermeasure (years)	Model_5_Infrastructure
0.045	0.056	0.799	0.424	−0.065	0.155	Appropriate Countermeasure Location (1/0)	Model_5_Infrastructure
0.765	0.819	0.934	0.350	−0.840	2.369	const	Model_6_Full
−0.145	0.125	−1.159	0.247	−0.391	0.100	Log Average Daily Traffic Volume	Model_6_Full
0.103	0.079	1.310	0.190	−0.051	0.257	Log Average Daily Pedestrian Volume	Model_6_Full
0.002	0.003	0.802	0.423	−0.004	0.008	Speed (km/h)	Model_6_Full
0.004	0.024	0.177	0.860	−0.043	0.052	Pedestrian to Vehicle Volume Ratio	Model_6_Full
0.149	0.225	0.663	0.507	−0.291	0.589	Vehicle age technology (%)	Model_6_Full
0.049	0.144	0.344	0.731	−0.232	0.331	Road Use (%)	Model_6_Full
0.126	0.197	0.640	0.522	−0.260	0.512	Design Configuration (%)	Model_6_Full
−0.460	0.292	−1.578	0.114	−1.031	0.111	Ad hoc implementation of countermeasures (%)	Model_6_Full
0.000	0.084	0.002	0.998	−0.165	0.165	Footpath Encroachment (%)	Model_6_Full
0.057	0.215	0.264	0.792	−0.365	0.478	Age < 18 (%)	Model_6_Full
0.135	0.187	0.725	0.469	−0.231	0.502	Age 18–49 (%)	Model_6_Full
−0.083	0.399	−0.208	0.835	−0.864	0.698	Age 50+ (%)	Model_6_Full
0.109	0.193	0.568	0.570	−0.268	0.487	Male Pedestrians (%)	Model_6_Full
−0.152	0.300	−0.507	0.612	−0.740	0.436	Female Pedestrians (%)	Model_6_Full
0.203	0.293	0.691	0.490	−0.372	0.777	Employed Population (%)	Model_6_Full
−0.116	0.284	−0.408	0.683	−0.672	0.441	Maintenance Practices (%)	Model_6_Full
0.056	0.279	0.202	0.840	−0.490	0.603	Pedestrian Infrastructure Coverage (%)	Model_6_Full
−0.026	0.059	−0.447	0.655	−0.141	0.089	Street Furniture Vandalism (0/0.5/1)	Model_6_Full
−0.010	0.011	−0.920	0.358	−0.032	0.012	Age of Countermeasure (years)	Model_6_Full
0.047	0.056	0.832	0.406	−0.064	0.157	Appropriate Countermeasure Location (1/0)	Model_6_Full

Table 6. Fit metrics for the developed NB models.

Model	No. of Parameters Included (K)	No. of Observations (n)	LL_model	LL_null	McFadden’s Pseudo-R²	AIC
Model 1 (Baseline)	1	2000	−3846.554	−3846.554	0.000000	7695.108
Model 2 (Traffic exposure and operational variables)	6	2000	−3846.077	−3846.554	0.000124	7704.153
Model 3 (Land use and planning variables)	5	2000	−3845.035	−3846.554	0.000395	7700.070
Model 4 (Demographic variables)	7	2000	−3844.866	−3846.554	0.000439	7703.732
Model 5 (Infrastructure and roadway variables)	6	2000	−3844.559	−3846.554	0.000519	7701.119
Model 6 (Full model)	21	2000	−3840.842	−3846.554	0.001485	7723.683

3. Results

3.1. Distribution of Trend Data and Artificial Datasets for Each Factor

Table 1 presents the trend values (minimum, maximum, mean, and standard deviation) for the variables as extracted from the literature, along with their sources. These values provided the statistical boundaries for generating artificial datasets.

Using distributions from Table 1 above, artificial datasets of 2000 random samples per variable were generated in Python. The descriptive statistics of the generated datasets are summarised in Table 2, showing that the artificial data closely approximated the literature-derived boundaries while maintaining internal variability.
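The generation step can be sketched as follows. This is a minimal illustration, assuming a normal distribution truncated to the literature-derived minimum and maximum; the distribution family, the `generate_factor` helper, the seed, and the numeric trend values are assumptions made for illustration and are not taken from Table 1 or from the code in Appendix A.

```python
import numpy as np
from scipy.stats import truncnorm

N_SAMPLES = 2000  # sample size per variable, as stated in the text


def generate_factor(minimum, maximum, mean, sd, n=N_SAMPLES, seed=0):
    """Draw n values from a normal(mean, sd) truncated to [minimum, maximum]."""
    # truncnorm expects its bounds in standard-deviation units around loc.
    a = (minimum - mean) / sd
    b = (maximum - mean) / sd
    rng = np.random.default_rng(seed)
    return truncnorm.rvs(a, b, loc=mean, scale=sd, size=n, random_state=rng)


# Hypothetical trend values (placeholders, not Table 1 entries).
sample = generate_factor(minimum=0.40, maximum=1.00, mean=0.75, sd=0.12)

# Descriptive statistics analogous to those summarised in Table 2.
summary = {
    "min": float(sample.min()),
    "max": float(sample.max()),
    "mean": float(sample.mean()),
    "sd": float(sample.std(ddof=1)),
}
print(summary)
```

Truncation keeps every draw inside the published range, at the cost of pulling the sample mean slightly towards the wider side of the interval when the bounds are asymmetric.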

Validation of these datasets was also undertaken using histograms and boxplots. For illustration, Figure 1 presents the distribution of ad hoc implementation of countermeasures, showing both the histogram and boxplot outputs. Similar plots were produced for all the factors and are provided in Appendix B. These visualisations confirm that the artificial datasets reflected realistic patterns and did not deviate from the empirical trends reported in the literature.

The histogram of the “Ad hoc implementation of countermeasures (%)” variable shows a bimodal distribution, with peaks centred at approximately 0.6 and 0.9. This indicates that different areas take distinct approaches in implementing pedestrian safety countermeasures, with some relying heavily on retrospective measures, while others apply them only occasionally. Although the mean and median are both close to 0.75, this average may hide the two underlying patterns. To better visualise this, a Kernel Density Estimation (KDE) curve was used, which smooths the data and confirms the presence of two clear peaks. KDE is a non-parametric method used to estimate the probability density function of a continuous variable and is especially helpful for identifying multiple modes in a dataset without depending on histogram binning [47]. This pattern may reflect disparities in planning philosophies, with some jurisdictions prioritising pedestrian safety as a primary concern, while others address it only after an incident. Such divergence may be rooted in differing regulatory environments, funding limitations, or urban planning priorities.
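A KDE-based check for bimodality might look like the sketch below. It assumes a hypothetical two-component sample with clusters near 0.6 and 0.9 (the component spread, seed, grid, and prominence threshold are illustrative choices, not the study's settings) and uses SciPy's `gaussian_kde` with its default bandwidth together with `find_peaks` to locate the modes.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Hypothetical bimodal sample: two equally weighted clusters near 0.6 and 0.9.
values = np.concatenate([
    rng.normal(0.6, 0.05, 1000),
    rng.normal(0.9, 0.05, 1000),
])

# Gaussian KDE with SciPy's default (Scott) bandwidth.
kde = gaussian_kde(values)
grid = np.linspace(0.3, 1.2, 500)
density = kde(grid)

# Local maxima of the density; the prominence filter drops negligible bumps.
peak_idx, _ = find_peaks(density, prominence=0.1 * density.max())
modes = grid[peak_idx]

print("estimated modes:", np.round(modes, 2))
print("mean:", round(float(values.mean()), 2),
      "median:", round(float(np.median(values)), 2))
```

The mean and median both fall in the trough between the two clusters, which is why summary statistics alone can mask this kind of structure.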

Overall, several variables (e.g., vehicle age, public safety awareness, female pedestrians, and employed population) exhibited bimodal or skewed distributions, reflecting heterogeneity in DC contexts. In contrast, others (e.g., pedestrian infrastructure coverage) showed near-normal patterns.

3.2. External Face Validity of Generated Data

Out of the 27 factors considered (including the dependent variable), 23 had published means that lay within the 95% confidence intervals of the generated values, indicating broad alignment between the artificial dataset and the external evidence base. Four factors fell outside the 95% confidence intervals: fatal pedestrian crash statistics (published mean 1.83, generated 2.03, CI 1.94–2.12), pedestrian-to-vehicle volume ratio or mixed traffic conditions (published mean 1.09, generated 1.19, CI 1.14–1.24), driver safety awareness level (published mean 0.48, generated 0.47, CI 0.47–0.47), and age group below 18 years (published mean 0.09, generated 0.11, CI 0.10–0.12). These discrepancies highlight residual gaps that reflect either variability in the published literature or the limitations of simulating data in resource-constrained contexts.
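The face validity rule reduces to checking whether a published mean lies inside the confidence interval of the generated mean. The sketch below applies that rule to the four flagged factors using the values reported above; the `mean_ci` helper uses a normal approximation (z = 1.96), which is an assumption about how the intervals were built, and the toy sample is hypothetical.

```python
import math
import statistics


def mean_ci(sample, z=1.96):
    """Normal-approximation 95% confidence interval for the sample mean."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - z * se, m + z * se


def within_ci(published_mean, lower, upper):
    """True when the published mean lies inside [lower, upper]."""
    return lower <= published_mean <= upper


# Values reported in the text for the four flagged factors:
# (published mean, CI lower, CI upper).
flagged = {
    "fatal pedestrian crashes": (1.83, 1.94, 2.12),
    "pedestrian-to-vehicle volume ratio": (1.09, 1.14, 1.24),
    "driver safety awareness": (0.48, 0.47, 0.47),
    "age below 18 years": (0.09, 0.10, 0.12),
}
results = {name: within_ci(*vals) for name, vals in flagged.items()}
print(results)  # every entry is False, i.e. outside the interval

# Toy sample (hypothetical) showing how an interval would be computed.
toy = [1, 2, 3, 4, 5] * 400
low, high = mean_ci(toy)
print(round(low, 3), round(high, 3))
```

With 2000 observations the interval is narrow, so even small absolute differences (for example 0.48 against 0.47) can fall outside it.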

3.3. Correlation Analysis

Pairwise Spearman’s correlation results between each factor and pedestrian crash counts are presented in Table 3.

Although correlation magnitudes were generally weak (p > 0.005), directionally useful associations were evident. For example, traffic rule enforcement, driver safety awareness, and human capacity of agencies showed positive correlations with crash counts, while ad hoc implementation of countermeasures, overtaking tendency, and public safety awareness were negatively correlated.

Pairwise correlation among each pair of variables is reported in Table 4.

Pairwise Spearman’s correlation coefficients (Table 4) showed generally weak associations among predictors, with absolute ρ values ranging between 0.00 and 0.07 for most variable pairs. These magnitudes are well below conventional multicollinearity thresholds (|ρ| > 0.7), suggesting that dependence among predictors is minimal.
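A multicollinearity screen of this kind can be sketched with SciPy's `spearmanr`. The predictor matrix below is a hypothetical stand-in of independently drawn columns (five variables, 2000 rows, arbitrary seed) rather than the study's dataset; only the 0.7 threshold comes from the text.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(7)

# Hypothetical stand-in for the predictor matrix: 2000 rows, 5 columns drawn
# independently, mirroring the independence assumption of the artificial data.
names = ["T", "P", "S", "R", "VAT"]
X = rng.normal(size=(2000, len(names)))

# With a 2-D input of more than two columns, spearmanr returns the full
# rho matrix (columns are treated as variables).
rho, _ = spearmanr(X)

# Largest absolute off-diagonal correlation.
off_diag = ~np.eye(len(names), dtype=bool)
max_abs_rho = float(np.abs(rho[off_diag]).max())

THRESHOLD = 0.7  # multicollinearity threshold quoted in the text
print(f"max |rho| = {max_abs_rho:.3f}; flagged = {max_abs_rho > THRESHOLD}")
```

Independently generated columns yield near-zero coefficients by construction, so a screen like this mainly confirms that the generation step introduced no unintended dependence.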

3.4. Regression Analysis (Negative Binomial Models)

Table 5 presents the outputs of the six negative binomial regression models fitted to the artificial datasets, and Table 6 shows the NB fit metrics for the six developed models.

As expected, none of the modelled variables reached conventional levels of statistical significance, which is consistent with the constraints of using artificial datasets. Even so, the negative binomial (NB) coefficients still offered valuable inputs for deriving risk factor influence values (Fi). The relative magnitudes of these coefficients suggested that demographic and institutional variables (such as employed population and ad hoc implementation of countermeasures) may exert greater potential influence than infrastructural factors, although this finding should be viewed as illustrative rather than definitive.

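Model fit indicators showed only marginal gains over the null specification in Models 2–6: McFadden’s pseudo-R² values were positive but very small (0.0001–0.0015), and AIC scores were higher than that of the baseline model (Table 6), indicating that the added parameters did not improve parsimony-adjusted fit.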
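The fit metrics in Table 6 can be recomputed from the reported log-likelihoods. The sketch below assumes the standard definitions, McFadden's pseudo-R² = 1 − LL_model/LL_null and AIC = 2K − 2·LL_model with K counting the intercept; the function names are illustrative, and results may differ from the table in the last digit because the log-likelihoods are rounded.

```python
# Null-model log-likelihood and (K, LL_model) per model, copied from Table 6.
LL_NULL = -3846.554
models = {
    "Model 1": (1, -3846.554),
    "Model 2": (6, -3846.077),
    "Model 3": (5, -3845.035),
    "Model 4": (7, -3844.866),
    "Model 5": (6, -3844.559),
    "Model 6": (21, -3840.842),
}


def mcfadden_r2(ll_model, ll_null=LL_NULL):
    """McFadden's pseudo-R2: 1 - LL_model / LL_null."""
    return 1.0 - ll_model / ll_null


def aic(k, ll_model):
    """Akaike Information Criterion: 2K - 2 * LL_model."""
    return 2 * k - 2 * ll_model


for name, (k, ll) in models.items():
    print(f"{name}: pseudo-R2 = {mcfadden_r2(ll):.6f}, AIC = {aic(k, ll):.3f}")
```

Because AIC adds 2 for every parameter, a model must raise the log-likelihood by more than one unit per added parameter before its AIC falls.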

Out of the 26 contextual factors initially considered, 20 were retained in the NB models. Six variables were excluded during the stepwise regression process because of weak or inconsistent data. These were overtaking tendency, traffic rule enforcement, public safety awareness, driver safety awareness, time of day, and the human capacity of agencies (see Table 7). The final models thus provide estimates for 20 contextual factors, which were subsequently transformed into risk factor influence values (Fi).

3.5. Transforming NB Coefficients into Risk Factor Influence Values (Fi)

The coefficients obtained from the negative binomial models for the 20 retained factors were transformed into risk factor influence values (Fi) to quantify the relative contribution of each contextual factor to pedestrian crash outcomes.

The six NB models produced varied β values, but none met the conventional threshold for statistical significance as mentioned earlier. Importantly, each model was evaluated based on coefficient direction, relative magnitude, and thematic alignment. The results, therefore, illustrate methodological feasibility rather than providing empirically validated estimates. The Fi values were calculated as the exponential transformation of NB coefficients (e^β) using Equation (11).

Illustrative examples of Fi values included the following:

Ad hoc implementation of countermeasures had a risk factor value of 0.63, indicating a 37% reduction in expected safety benefits when countermeasures are implemented after an accident has happened rather than before.
Female pedestrians had a risk factor value of 0.86, reinforcing gender-specific vulnerability that remains unaddressed in current global frameworks.
Employed population (1.22) and age 18–49 (1.15) showed the highest positive risk values among demographic variables. This suggests that areas with a high concentration of working-age pedestrians face elevated pedestrian crash risks, even when standard countermeasures are applied.
Vehicle age/technology (1.16) also exhibited an elevated risk value, pointing to the indirect effects of outdated or poorly maintained vehicle fleets, another non-iRAP parameter.
Design configuration (1.14) and road use (1.05), both geometric variables already covered in iRAP, showed moderate risk increases. However, their explanatory power appeared weaker compared to social–behavioural and institutional variables.

More details can be found in Table 7.
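The transformation from coefficients to Fi can be illustrated with a few of the reported values. Equation (11) is not reproduced in this section, so the sketch assumes the plain exponential form Fi = e^β described above and uses the full-model (Model 6) coefficients from Table 5; that choice of model is an assumption. Results for other factors may differ in the last digit because Table 5 reports rounded coefficients.

```python
import math

# Model 6 (full model) coefficients for selected factors, from Table 5.
beta = {
    "Ad hoc implementation of countermeasures": -0.460,
    "Female pedestrians": -0.152,
    "Vehicle age/technology": 0.149,
    "Road use": 0.049,
}

# Fi = exp(beta): multiplicative change in the expected crash count
# per unit increase of the factor under the NB log link.
fi = {name: math.exp(b) for name, b in beta.items()}

for name, value in fi.items():
    change = (value - 1.0) * 100.0
    print(f"{name}: Fi = {value:.2f} ({change:+.0f}% relative to Fi = 1)")
```

Values below 1 correspond to negative coefficients and values above 1 to positive ones, so the direction of each factor's influence can be read directly from Fi.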

3.6. Sensitivity Analysis

The full NB model (Model 6), which served as the baseline for the sensitivity analysis, produced a log-likelihood of −3840.8, an AIC of 7723.7, and a deviance of 1841.3, indicating that the specification converged.

The bootstrap analysis showed that Fi values were stable across resamples, with narrow 95% confidence intervals for most predictors (Table 7), except for average daily traffic volume and pedestrian infrastructure coverage. This suggests that small data perturbations do not substantially alter the direction or magnitude of estimated risk influences.

Scenario testing highlighted differences in rank stability across sample sizes and noise levels. For n = 1000, Kendall’s τ ranged from 0.61 (no noise perturbation) to 0.69 (10% noise perturbation), with p < 0.001. For n = 2000, τ values were lower, between 0.34 and 0.36, but remained statistically significant (p < 0.05) (Table 8). These results indicate that, while absolute Fi values were robust, the relative ranking of weaker predictors was more sensitive to perturbation when larger samples were drawn.
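Rank stability of this kind can be measured with Kendall's τ between a baseline Fi vector and a perturbed one. The sketch below is a simplification: it perturbs the Fi values directly with multiplicative noise instead of re-fitting the NB model on perturbed data, and the factor subset, noise levels, seed, and `perturbed_tau` helper are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import kendalltau

# Fi values for a subset of factors (exp of Model 6 coefficients in Table 5).
beta = np.array([-0.460, -0.152, 0.203, 0.149, 0.126, 0.049, -0.116, 0.056])
fi_base = np.exp(beta)

rng = np.random.default_rng(3)


def perturbed_tau(fi, noise_sd):
    """Kendall's tau between baseline Fi and a multiplicatively perturbed copy."""
    noisy = fi * (1.0 + rng.normal(0.0, noise_sd, size=fi.size))
    tau, p_value = kendalltau(fi, noisy)
    return tau, p_value


for noise_sd in (0.0, 0.05, 0.10):
    tau, p = perturbed_tau(fi_base, noise_sd)
    print(f"noise sd = {noise_sd:.2f}: tau = {tau:.2f}, p = {p:.4f}")
```

Factors whose Fi values lie close together (here, those near 1) swap ranks under small perturbations, which lowers τ even when no individual Fi changes much.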

3.7. Comparative Analysis with iRAP Framework (Mapping of Factors to NB Model and iRAP Framework)

The comparative analysis identified 16 contextual factors not currently included in iRAP’s pedestrian crash risk framework (Table 9).

Among these, five factors, namely overtaking tendency, traffic rule enforcement, public safety awareness, driver safety awareness, and human capacity of agencies, were neither captured in NB modelling outputs nor covered by iRAP. Their omission highlights potential blind spots in the current prediction models, which may lead to overestimation of countermeasure performance in DC contexts.

4. Discussion

This paper demonstrated how literature trends and artificial data can be used to simulate modelling processes in data-constrained contexts. The results reflect a methodological process designed to assess risk relationships, not to infer statistical causality. The methodological approach offers a significant contribution to the study of pedestrian safety in data-scarce contexts by showing how artificial datasets, informed by literature-derived parameters, can be used to model and analyse contextual risk factors. This is particularly relevant for developing countries (DCs), where empirical crash data are often unavailable, unreliable, or inconsistent across jurisdictions [8,9,48]. The use of structured simulations, grounded in peer-reviewed studies and grey literature, ensures that the artificial data not only mirrors the statistical properties of real-world observations but also preserves contextual relevance [49].

4.1. External Face Validity

External face validity checks confirmed broad alignment between the artificial dataset and published DC evidence, with only four variables diverging (Table 2). These gaps underscore the importance of later empirical calibration. The residual differences likely arose from the narrow variance of some generated variables and from differences between local studies and the broader evidence base. Nevertheless, the overall consistency supports the use of artificial data as a credible foundation for exploratory modelling [6,30].

4.2. Correlation and Regression Analysis

The Spearman correlation analysis and subsequent negative binomial (NB) regression modelling revealed several noteworthy patterns. While statistical significance could not be meaningfully assessed owing to the absence of real-world inter-variable dependencies, the practical implications of the derived risk values (Fi) were evident. Behavioural and institutional variables such as ad hoc implementation of countermeasures [9], female pedestrian proportion [50], and vehicle age or technology [51] displayed stronger risk factor influence values than several geometric variables already embedded within the iRAP framework. This highlights the systemic exclusion of socio-behavioural determinants in mainstream road safety assessment tools and supports previous critiques that global frameworks often inadequately represent the urban complexities of DCs [28,52,53,54].

The creation of multiple NB models grouped by variable typology (exposure, land use, demographics, and infrastructure) also provided insight into domain-specific influences on crash frequency. Although infrastructure variables demonstrated a logical alignment with iRAP, their risk values were generally lower compared to demographic and institutional variables. This suggests that the highest safety returns may come from broader governance and behavioural reforms rather than physical redesign alone [9,54]. This reflects a shift in thinking within the urban transport safety community, where “soft” interventions like awareness, compliance, and institutional reforms are increasingly acknowledged as vital complements to traditional engineering solutions [8,50].

4.3. Sensitivity Analysis

The sensitivity analysis confirmed that Fi values derived from the artificial dataset were generally robust to bootstrap resampling and scenario perturbations. Confidence intervals around the Fi estimates were narrow for most predictors, supporting the reliability of the point estimates. Rank-order stability was moderate to high for smaller samples (τ = 0.61–0.69) but weaker for larger resamples (τ = 0.34–0.36). This pattern reflects the fact that variables with Fi values close to unity were more prone to reordering under perturbations, while stronger predictors remained stable. Importantly, no major shifts in directionality were observed, and the key behavioural and institutional predictors continued to display higher Fi values relative to geometric factors.

By quantifying both bootstrap uncertainty and rank-order sensitivity, this study addresses a key methodological concern in artificial-data modelling. The results suggest that the modelling approach provides credible estimates of contextual factor influence in data-scarce environments, while also clarifying the limitations of variables with weak explanatory power [16,25].

4.4. Mapping to iRAP

The comparative analysis between the NB-included variables and the iRAP attributes reveals important thematic misalignments. While iRAP effectively captures geometric design and speed parameters, it largely omits contextual and behavioural dimensions such as traffic rule enforcement, public safety awareness, and institutional capacity [54,55]. These omissions likely contribute to the persistent “effectiveness gap” observed in the implementation of safety countermeasures in DCs.

This interpretation is consistent with broader critiques of road safety frameworks in DCs, where globalised models have been shown to underrepresent institutional realities, cultural behaviours, and governance capacity [5,30]. Such gaps highlight the need for complementary approaches that extend existing frameworks rather than replace them.

Artificial-data modelling offers one such pathway by incorporating underrepresented variables into existing predictive frameworks. The proposed context-adjusted iRAP effectiveness variant model would integrate both empirical weightings derived from NB regression and literature-based weights for excluded variables, thereby enhancing sensitivity and relevance in DC settings [28,54].

4.5. Methodological Contributions

Beyond empirical insights, this study demonstrates the methodological value of artificial data as a bridge for testing the viability of incorporating underrepresented variables into predictive frameworks. Despite limitations such as the lack of empirical validation and potential overfitting, this study successfully demonstrated that credible and reproducible risk models can be developed using literature-informed simulation [9,49]. The structured generation process, using Python-based statistical libraries like NumPy and SciPy [14,15], ensured adherence to statistical principles while enabling traceability, a critical component of transparent data analysis practice.

5. Limitations

This study has several limitations that should be acknowledged. Although 26 contextual factors relevant to DCs were identified through literature review and data disaggregation, only 20 were retained in the negative binomial models. Six factors, namely overtaking tendency, traffic rule enforcement, public safety awareness, driver safety awareness, time of day, and the human capacity of agencies, were excluded due to weak or inconsistent trend data. This restriction inevitably reduces the comprehensiveness of the Fi estimates.

The study relied on artificially generated datasets derived from literature-informed parameterisations. While this approach enabled analysis in a data-scarce environment, it assumes independence between most variables and cannot fully reproduce the complex dependence structures present in real-world crash data. As a result, the estimated Fi values are best regarded as illustrative rather than decision-ready.

None of the reported statistical associations should be interpreted as causal. The p-values presented in Table 3 and Table 5 are provided only for completeness and do not carry inferential weight. The Fi values instead demonstrate a replicable process for estimating contextual factor influence, which can be calibrated once more reliable crash data become available.

Although external validity checks were performed against published DC statistics, more robust calibration and validation exercises are required. Future work should therefore focus on (i) strengthening local crash data systems, (ii) incorporating realistic dependence structures between variables, and (iii) recalibrating Fi estimates using empirical crash datasets. This calibration roadmap will provide a stronger basis for integrating Fi values into the current effectiveness models, thereby strengthening their assessment capability.

6. Conclusions

This study developed and demonstrated a methodological process for estimating the influence of contextual factors on pedestrian crashes in data-scarce environments. The process combined literature-derived trend analysis, artificial data generation, external validity checks, correlation and regression modelling, sensitivity analysis, and mapping to existing frameworks.

The findings revealed that behavioural and institutional factors such as ad hoc implementation of countermeasures, gender composition of pedestrian flows, and vehicle age or technology exerted a stronger influence on pedestrian crash outcomes than several geometric attributes typically emphasised in global models. These results are consistent with evidence that socio-behavioural and governance conditions are key determinants of road safety in developing countries.

The external face validity assessment confirmed broad alignment of the artificial dataset with published values, with only a small number of residual gaps. Sensitivity analysis further demonstrated that Fi values were robust to bootstrap resampling and scenario perturbations, although predictors with Fi close to unity exhibited greater instability. Taken together, these findings strengthen confidence in the proposed approach and confirm its potential to generate meaningful insights in the absence of empirical crash data.

By mapping results to the iRAP framework, this study showed areas of convergence while also highlighting additional contextual factors absent from current models. Rather than replacing iRAP, this work demonstrates how artificial-data modelling can complement and extend its scope, particularly for developing-country environments where behavioural and institutional determinants play a significant role.

Overall, this research provides methodological proof of concept that artificial-data approaches can help close knowledge gaps in road safety modelling. While limitations remain, including the reliance on assumptions, the lack of empirical interdependencies, and the need for calibration with real-world datasets, this study contributes a replicable framework that can support more context-sensitive safety assessment in data-scarce settings. A practical calibration pathway is therefore outlined: assemble minimally sufficient crash datasets, quality-check exposure and enforcement indicators, re-estimate NB coefficients, recompute Fi with uncertainty, and update the enhanced iRAP model accordingly.

7. Recommendations

Building on these findings, several recommendations are proposed:

Integration with existing frameworks
Artificial-data modelling should be used to complement established tools such as iRAP by incorporating contextual factors absent from current models, including institutional capacity, enforcement, and safety awareness. This would enhance the predictive sensitivity of safety assessments in developing countries.
Empirical calibration and validation
Future research should apply this methodology to real-world crash datasets as they become available. Such empirical calibration will be essential to refine Fi estimates and validate the robustness of artificial-data models.
Refinement of sensitivity approaches
Sensitivity testing should be expanded to include alternative approaches, such as Bayesian inference or hierarchical modelling, which may capture uncertainty more comprehensively in data-scarce environments.
Application to enhanced effectiveness modelling
The Fi values derived here, alongside weighting schemes developed in subsequent research, should inform the construction of an enhanced iRAP effectiveness model. This has the potential to address the performance gap identified in earlier reviews and improve the targeting of countermeasures in developing countries.
Policy and practice
Policymakers and practitioners should recognise that behavioural and institutional interventions such as sustained enforcement, awareness campaigns, and investment in agency capacity may yield equal or greater safety benefits than infrastructure redesign alone.

Author Contributions

Conceptualisation, J.M. and H.E.; methodology, analysis, and writing, J.M.; supervision, H.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting the reported results can be obtained from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to acknowledge the support of the Commonwealth Scholarship Commission and the University of Birmingham for providing the necessary resources.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AIC	Akaike Information Criterion
CI	Confidence Interval
DC	Developing Country
GLM	Generalised Linear Model
HIC	High-Income Country
IDE	Integrated Development Environment
iRAP	International Road Assessment Programme
IRR	Incidence Rate Ratio
KDE	Kernel Density Estimation
LMIC	Low- and Middle-Income Countries
NB	Negative Binomial
SLR	Systematic Literature Review
WHO	World Health Organisation

Appendix A. Python Code for Generating Artificial Data for All the Variables

Appendix B. Histograms and Boxplots Showing Distribution for Various Variables

Appendix C. Python Scripts for Spearman’s Correlation, Negative Binomial Modelling with Model Fit Metrics

Appendix D. Python Code for Sensitivity Analysis and Fi Uncertainty Check

References

WHO. Pedestrian Safety: A Road Safety Manual for Decision-Makers and Practitioners; World Health Organisation: Geneva, Switzerland, 2023.
Karathodorou, N.; Graham, D.; Richter, T.; Ruhl, S.; La Torre, F.; Domenichini, L.; Yannis, G.; Dragomanovits, A.; Laio, A. Development of a crash modification factors model in Europe. In Proceedings of the 17th International Conference Road Safety On Five Continents (RS5C 2016), Rio de Janeiro, Brazil, 17–19 May 2016. Statens väg-och transportforskningsinstitut.
National Academies of Sciences, Engineering, and Medicine. Pedestrian Safety Prediction Methodology; The National Academies Press: Washington, DC, USA, 2008.
Kraidi, R.; Evdorides, H. Pedestrian safety models for urban environments with high roadside activities. Saf. Sci. 2020, 130, 104847.
Job, R.S.; Wambulwa, W.M. Features of low-income and middle-income countries making road safety more challenging. J. Road Saf. 2020, 31, 79–84.
Thierry, M.; Vet, J.; Uddin, K.B.; Wegman, F. A New Methodology for Road Crash Data Collection in Bangladesh Using Local Record Keepers. J. Road Saf. 2023, 34, 1–11.
Mubiru, J.; Evdorides, H. Pedestrian Safety in Developing Countries: A Systematic Literature Review & Gap Analysis; Manuscript under review in the Journal of Future Transportation; University of Birmingham: Birmingham, UK, 2025.
Lin, P.-S.; Guo, R.; Bialkowska-Jelinska, E.; Kourtellis, A.; Zhang, Y. Development of countermeasures to effectively improve pedestrian safety in low-income areas. J. Traffic Transp. Eng. (Engl. Ed.) 2019, 6, 162–174.
Mukherjee, D.; Mitra, S. Identification of Pedestrian Risk Factors Using Negative Binomial Model. Transp. Dev. Econ. 2020, 6, 4.
Mukherjee, D.; Mitra, S. Modelling risk factors for fatal pedestrian crashes in Kolkata, India. Int. J. Inj. Control Saf. Promot. 2020, 27, 197–214.
Parker, C.; Scott, S.; Geddes, A. Snowball sampling. In SAGE Research Methods Foundations; 2019. Available online: https://eprints.glos.ac.uk/6781/1/6781%20Parker%20and%20Scott%20%282019%29%20Snowball%20Sampling_Peer%20reviewed%20pre-copy%20edited%20version.pdf (accessed on 16 June 2025).
Caird, J.K.; Willness, C.R.; Steel, P.; Scialfa, C. A meta-analysis of the effects of cell phones on driver performance. Accid. Anal. Prev. 2008, 40, 1282–1293.
Sundaram, J.; Gowri, K.; Devaraju, S.; Gokuldev, S.; Jayaprakash, S.; Anandaram, H.; Manivasagan, C.; Thenmozhi, M. An Exploration of Python Libraries in Machine Learning Models for Data Science. In Advanced Interdisciplinary Applications of Machine Learning Python Libraries for Data Science; Biju, S.M., Mishra, A., Kumar, M., Eds.; IGI Global Scientific Publishing: Hershey, PA, USA, 2023; pp. 1–31.
Harris, C.R.; Millman, K.J.; van der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J.; et al. Array programming with NumPy. Nature 2020, 585, 357–362.
Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat. Methods 2020, 17, 261–272.
Freedman, D.A. Statistical Models: Theory and Practice; Cambridge University Press: Cambridge, UK, 2009.
Lee, A. Generating random binary deviates having fixed marginal distributions and specified degrees of association. Am. Stat. 1993, 47, 209–215.
McKinney, W. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2012.
Yim, A.; Chung, C.; Yu, A. Matplotlib for Python Developers: Effective Techniques for Data Visualization with Python; Packt Publishing Ltd.: Birmingham, UK, 2018.
Altman, D.G.; Bland, J.M. Standard deviations and standard errors. BMJ 2005, 331, 903.
Rice, J.A. Mathematical Statistics and Data Analysis; Thomson/Brooks/Cole Belmont: Monterey, CA, USA, 2007; Volume 371.
Montgomery, D.C.; Runger, G.C. Applied Statistics and Probability for Engineers; John Wiley & Sons: Hoboken, NJ, USA, 2010.
Hauke, J.; Kossowski, T. Comparison of values of Pearson’s and Spearman’s correlation coefficients on the same sets of data. Quaest. Geogr. 2011, 30, 87–93.
Mukherjee, D.; Mitra, S. Pedestrian safety analysis of urban intersections in Kolkata, India using a combined proactive and reactive approach. J. Transp. Saf. Secur. 2022, 14, 754–795.
Cameron, A.C.; Trivedi, P.K. Regression Analysis of Count Data; Cambridge University Press: Cambridge, UK, 2013.
Washington, S.; Karlaftis, M.G.; Mannering, F.; Anastasopoulos, P. Statistical and Econometric Methods for Transportation Data Analysis; Chapman and Hall/CRC: Boca Raton, FL, USA, 2020.
Freedman, D.; Pisani, R.; Purves, R. Statistics, 4th ed.; W.W. Norton & Company: New York, NY, USA, 2007.
iRAP. iRAP Specification, Manuals and Guides. 2021. Available online: https://irap.org/specifications/ (accessed on 31 July 2025).
Earth.Org. Millions of Highly-Polluting Used Cars “Dumped” on Developing Countries-UN. 2020. Available online: https://earth.org/cars-developing-countries/ (accessed on 15 August 2025).
Heydari, S.; Hickford, A.; McIlroy, R.; Turner, J.; Bachani, A.M. Road safety in low-income countries: State of knowledge and future directions. Sustainability 2019, 11, 6249.
Chowdhury, T.; Rifaat, S.M.; Tay, R. Characteristics of Pedestrians in Bangladesh Who Did Not Receive Public Education on Road Safety. Sustainability 2022, 14, 9909.
Shaaban, K. Impact of experience and training on traffic knowledge of young drivers. Open Transp. J. 2021, 15, 61–68.
Mukherjee, D.; Mitra, S. Comprehensive Study of Risk Factors for Fatal Pedestrian Crashes in Urban Setup in a Developing Country. Transp. Res. Rec. 2020, 2674, 100–118.
Victoria Transport Policy Institute. Developing Country Transport Demand Management: Transportation Demand Management in Lower-Income Regions. 2019. Available online: https://www.vtpi.org/tdm/tdm75.htm (accessed on 15 August 2025).
Frimpong, L.K. Enhancing Pedestrian Safety in African Cities. 2022. Available online: https://www.researchgate.net/publication/363415723_Enhancing_Pedestrian_Safety_in_African_Cities#fullTextFileContent (accessed on 15 August 2025).
Jia, W.; Tesfaye, B.; Alcala, Y.M. How Can We Make Cities Safer for Pedestrians? Some Insights from Ethiopia. 2022. Available online: https://blogs.worldbank.org/en/transport/how-can-we-make-cities-safer-pedestrians-some-insights-ethiopia (accessed on 15 August 2025).
Walelign Bishaw, T.; Dolebo, G.N.; Singh, R.B. Evaluating pedestrian facilities for enhancing pedestrian safety in Addis Ababa city. Front. Sustain. Cities 2024, 6. [Google Scholar] [CrossRef]
Damsere-Derry, J.; Ebel, B.E.; Mock, C.N.; Afukaar, F.; Donkor, P.; Kalowole, T.O. Evaluation of the effectiveness of traffic calming measures on vehicle speeds and pedestrian injury severity in Ghana. Traffic Inj. Prev. 2019, 20, 336–342. [Google Scholar] [CrossRef]
Osuret, J.; Namatovu, S.; Biribawa, C.; Balugaba, B.E.; Zziwa, E.B.; Muni, K.; Ningwa, A.; Oporia, F.; Mutto, M.; Kyamanywa, P.; et al. State of pedestrian road safety in Uganda: A qualitative study of existing interventions. Afr. Health Sci. 2021, 21, 1498–1506. [Google Scholar] [CrossRef]
Sabi Boun, S.; Janvier, R.; Marc, R.E.J.; Paul, P.; Senat, R.; Demes, J.A.E.; Burigusa, G.; Chaput, S.; Maurice, P.; Druetz, T. Environmental measures to improve pedestrian safety in low- and middle-income countries: A scoping review. Glob. Health Promot. 2024, 31, 44–55. [Google Scholar] [CrossRef]
Times News Network. Pedestrian Life a No-Go in Bhopal as BMC Sidesteps Duties & Fails to Walk the Talk. 2025. Available online: https://timesofindia.indiatimes.com/city/bhopal/pedestrian-life-a-no-go-in-bhopal-as-bmc-sidesteps-duties-fails-to-walk-the-talk/articleshow/121241233.cms (accessed on 15 August 2025).
Bliss, T.; Breen, J.M. Road Safety Management Capacity Reviews and Safe System Projects Guidelines (Updated Edition); World Bank Group: Washington, DC, USA,, 2013; Available online: https://documents.worldbank.org/en/publication/documents-reports/documentdetail/400301468337261166 (accessed on 15 August 2025).
Zhu, M.; Zhao, S.; Coben, J.H.; Smith, G.S. Why more male pedestrians die in vehicle-pedestrian collisions than female pedestrians: A decompositional analysis. Inj. Prev. 2013, 19, 227–231. [Google Scholar] [CrossRef]
International Labour Organization. Labor Force Participation Rate, Total (% of Total Population Ages 15+) (Modeled ILO Estimate). 2025. Available online: https://data.worldbank.org/indicator/SL.TLF.CACT.ZS (accessed on 15 August 2025).
Arisoy, N. Measuring students’ preferences for urban furniture vandalism in Selçuk University Campus in Turkey: A case study. Arch. Agric. Environ. Sci. 2020, 5, 426–430. [Google Scholar] [CrossRef]
Zegeer, C.; Srinivasan, R.; Lan, B.; Carter, D.; Smith, S.; Sundstrom, C.; Thirsk, N.J.; Zegeer, J.; Lyon, C.; Ferguson, E.; et al. Development of Crash Modification Factors for Uncontrolled Pedestrian Crossing Treatments; National Academies of Sciences, Engineering, and Medicine: Washington, DC, USA, 2017; 162p. [Google Scholar] [CrossRef]
Silverman, B.W. Density Estimation for Statistics and Data Analysis; Routledge: Oxfordshire, UK, 1986. [Google Scholar]
Zafri, N.M.; Khan, A. A spatial regression modeling framework for examining relationships between the built environment and pedestrian crash occurrences at macroscopic level: A study in a developing country context. Geogr. Sustain. 2022, 3, 312–324. [Google Scholar] [CrossRef]
Huang, H.; Abdel-Aty, M. Multilevel data and Bayesian analysis in traffic safety. Accid. Anal. Prev. 2010, 42, 1556–1565. [Google Scholar] [CrossRef]
Yang, J.; Gauli, N.; Shiwakoti, N.; Tay, R.; Deng, H.; Chen, J.; Nepal, B.; Li, J. Examining the Factors Influencing Pedestrian Behaviour and Safety: A Review with a Focus on Culturally and Linguistically Diverse Communities. Sustainability 2025, 17, 6007. [Google Scholar] [CrossRef]
Ghasedi, M.; Sarfjoo, M.; Bargegol, I. Prediction and Analysis of the Severity and Number of Suburban Accidents Using Logit Model, Factor Analysis and Machine Learning: A case study in a developing country. SN Appl. Sci. 2021, 3, 13. [Google Scholar] [CrossRef]
Mukherjee, D.; Mitra, S. A comprehensive study on factors influencing pedestrian signal violation behaviour: Experience from Kolkata City, India. Saf. Sci. 2020, 124, 104610. [Google Scholar] [CrossRef]
Tiwari, G. Progress in pedestrian safety research. Int. J. Inj. Control Saf. Promot. 2020, 27, 35–43. [Google Scholar] [CrossRef] [PubMed]
Hossain, S.; Maggi, E.; Vezzulli, A. Factors influencing the road accidents in low and middle-income countries: A systematic literature review. Int. J. Inj. Control Saf. Promot. 2024, 31, 294–322. [Google Scholar] [CrossRef] [PubMed]
Mukherjee, D. Analyzing key determinants of pedestrian risky behaviors at urban signalized intersections: Insights from Kolkata City, India. Int. J. Inj. Control Saf. Promot. 2025, 32, 201–229. [Google Scholar] [CrossRef] [PubMed]

Figure 1. (a): Histogram showing the distribution of ad hoc implementation of countermeasures. (b): Boxplot showing the distribution of ad hoc implementation of countermeasures.

Table 7. Negative binomial model coefficients and risk factor influence (Fi) values with bootstrap-derived 95% confidence intervals.

Variable/Factor	NB Coefficient (β)	Risk Factor Influence (F_i = e^β)	Bootstrap 95% CI for F_i, Lower Bound (2.5th Percentile)	Bootstrap Median of F_i (50th Percentile)	Bootstrap 95% CI for F_i, Upper Bound (97.5th Percentile)
Log (Avg Daily Traffic Volume)	−0.150	0.86	0.910	1.126	1.380
Log (Avg Daily Pedestrian Volume)	0.10	1.11	0.862	0.982	1.113
Speed (km/h)	0.00	1.00	0.994	1.000	1.005
Pedestrian/Vehicle Volume Ratio	0.00	1.00	0.951	0.991	1.032
Vehicle Age/Technology (%)	0.15	1.16	0.634	0.918	1.364
Road Use (%)	0.05	1.05	0.806	1.009	1.278
Design Configuration (%)	0.13	1.14	0.603	0.841	1.150
Ad hoc implementation of countermeasures (%)	−0.46	0.63	0.767	1.231	1.957
Footpath Encroachment (%)	0.00	1.00	0.785	0.900	1.026
Age < 18 (%)	0.06	1.06	0.921	1.316	1.867
Age 18–49 (%)	0.14	1.15	0.798	1.100	1.478
Age 50+ (%)	−0.08	0.92	0.594	1.150	2.178
Male Pedestrians (%)	0.11	1.12	0.708	0.960	1.297
Female Pedestrians (%)	−0.15	0.86	0.521	0.833	1.383
Employed Population (%)	0.20	1.22	0.832	1.318	2.187
Maintenance Practices (%)	−0.10	0.90	0.702	1.097	1.714
Pedestrian Infrastructure Coverage (%)	0.07	1.07	0.407	0.632	0.973
Street Furniture Vandalism	−0.03	0.97	0.901	0.998	1.109
Age of Countermeasure (years)	−0.01	0.99	0.994	1.010	1.027
Appropriate Countermeasure Location (1/0)	0.04	1.04	0.891	0.980	1.079
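
As a worked illustration of how the columns of Table 7 relate, the sketch below recomputes F_i = e^β for a few coefficients copied from the table, then summarises a set of bootstrap replicates with 2.5th, 50th and 97.5th percentiles. The replicate values in `boot_betas` and the helper names `factor_influence` and `percentile` are assumptions made for this example only; they are not the authors' data or code.

```python
import math

# NB coefficients (beta) copied from Table 7 for a subset of factors.
betas = {
    "Log (Avg Daily Traffic Volume)": -0.150,
    "Vehicle Age/Technology (%)": 0.15,
    "Ad hoc implementation of countermeasures (%)": -0.46,
    "Employed Population (%)": 0.20,
}


def factor_influence(beta: float) -> float:
    """Return F_i = exp(beta): the multiplicative change in expected
    crash count per one-unit increase in the factor (log-link NB model)."""
    return math.exp(beta)


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of an ascending list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


# Point estimates, rounded to two decimals as in Table 7.
fi = {name: round(factor_influence(b), 2) for name, b in betas.items()}

# HYPOTHETICAL bootstrap replicates of a single coefficient (illustrative only).
boot_betas = [-0.60, -0.52, -0.48, -0.46, -0.41, -0.35, -0.30]
boot_fi = sorted(math.exp(b) for b in boot_betas)

lower = percentile(boot_fi, 2.5)
median = percentile(boot_fi, 50)
upper = percentile(boot_fi, 97.5)

for name, value in fi.items():
    print(f"{name}: F_i = {value:.2f}")
print(f"bootstrap percentiles: {lower:.3f}, {median:.3f}, {upper:.3f}")
```

A value of F_i below 1 indicates that the expected crash count falls as the factor increases, and a value above 1 indicates that it rises. In a real analysis each bootstrap replicate would come from refitting the NB model on a resampled dataset.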

Table 8. Sensitivity of Fi rankings under scenario perturbations (Kendall’s τ correlation test).

Scenario (Sample Size n and Noise Level)	Kendall’s τ	p-Value
n = 1000, noise = 1.0	0.611	<0.001
n = 1000, noise = 0.05	0.632	<0.001
n = 1000, noise = 0.10	0.695	<0.001
n = 2000, noise = 1.0	0.358	0.028
n = 2000, noise = 0.05	0.337	0.040
n = 2000, noise = 0.10	0.211	0.209
n = 5000, noise = 1.0	0.442	0.006
n = 5000, noise = 0.05	0.516	0.001
n = 5000, noise = 0.10	0.558	<0.001
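
The rank-agreement statistic in Table 8 can be illustrated with a small, dependency-free sketch. It computes Kendall's τ (the tau-a form, which assumes no tied values) between a baseline set of F_i values and a set re-estimated under a perturbed scenario. The `perturbed_fi` values and the function name `kendall_tau_a` are hypothetical and serve only to show the calculation. Because several F_i values in Table 7 are tied, a tie-aware implementation such as SciPy's `kendalltau` would be the more usual choice in practice.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a for two paired sequences, assuming no ties:
    (concordant pairs - discordant pairs) / total number of pairs."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Baseline F_i values for five factors, copied from Table 7.
baseline_fi = [0.86, 1.11, 1.16, 0.63, 1.22]

# HYPOTHETICAL F_i values under a perturbed scenario (illustrative only).
perturbed_fi = [0.90, 1.05, 1.20, 0.70, 1.15]

tau = kendall_tau_a(baseline_fi, perturbed_fi)
print(f"Kendall's tau = {tau:.3f}")
```

Here nine of the ten factor pairs keep their relative order and one pair swaps, so τ = (9 − 1)/10 = 0.8. Values near 1 indicate that the F_i ranking is largely preserved under the perturbation.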

Table 9. Comparison of contextual factors with NB model outputs and iRAP framework.

Variable/Factor	Coefficient (β)	Risk Factor (F_i) = e^β	In NB Model	iRAP Covered	Practical Notes
Log (Avg Daily Traffic Volume)	−0.150	0.86	included	included	iRAP uses traffic flow
Log (Avg Daily Pedestrian Volume)	0.10	1.11	included	included	Pedestrian exposure proxy
Speed (km/h)	0.00	1.00	included	included	iRAP core attribute
Pedestrian/Vehicle Volume Ratio	0.00	1.00	included	excluded	Traffic exposure factor
Vehicle Age/Technology (%)	0.15	1.16	included	excluded	The age of the vehicle fleet is crucial in DCs
Overtaking Tendency	N/A	N/A	excluded	excluded	Critical in DCs
Traffic Rule Enforcement	N/A	N/A	excluded	excluded	Institutional variable
Public Safety Awareness (%)	N/A	N/A	excluded	excluded	Critical in DCs
Driver Safety Awareness (%)	N/A	N/A	excluded	excluded	Critical in DCs
Time of Day Visibility	N/A	N/A	excluded	included	Lighting is a proxy
Road Use (%)	0.05	1.05	included	included	Functional classification included
Design Configuration (%)	0.13	1.14	included	included	Includes medians, crossings, etc.
Ad hoc implementation of countermeasures (%)	−0.46	0.63	included	excluded	Planning sequence not captured
Footpath Encroachment (%)	0.00	1.00	included	excluded	Informal sector factor
Human Capacity of Agencies	N/A	N/A	excluded	excluded	Institutional capacity—not modelled
Age < 18 (%)	0.06	1.06	included	included	Covered under the Star Rating for Schools
Age 18–49 (%)	0.14	1.15	included	excluded	High-activity demographic
Age 50+ (%)	−0.08	0.92	included	excluded	Vulnerable group not addressed
Male Pedestrians (%)	0.11	1.12	included	excluded	Demographic dimension
Female Pedestrians (%)	−0.15	0.86	included	excluded	Gender exposure gap
Employed Population (%)	0.20	1.22	included	excluded	Mobility-related risk
Maintenance Practices (%)	−0.10	0.90	included	included	Maintenance quality implied in iRAP
Pedestrian Infrastructure Coverage (%)	0.07	1.07	included	included	iRAP footpath attribute
Street Furniture Vandalism	−0.03	0.97	included	excluded	Social disorder indicator
Age of Countermeasure (years)	−0.01	0.99	included	excluded	Asset age is important in DCs
Appropriate Countermeasure Location (1/0)	0.04	1.04	included	included	Part of iRAP’s star logic

N/A was used for factors not modelled/excluded from the NB model.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Mubiru, J.; Evdorides, H. Quantifying the Risk Impact of Contextual Factors on Pedestrian Crash Outcomes in Data-Scarce Developing Country Settings. Future Transp. 2025, 5, 151. https://doi.org/10.3390/futuretransp5040151

2.6. Sensitivity Analysis and F_i Uncertainty