Bayesian Modeling of Traffic Accident Rates in Poland Based on Weather Conditions

Filapek, Adam; Faruga, Łukasz; Baranowski, Jerzy

doi:10.3390/app15137332


by Adam Filapek, Łukasz Faruga and Jerzy Baranowski *

Department of Automatic Control & Robotics, AGH University of Kraków, 30-059 Kraków, Poland

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(13), 7332; https://doi.org/10.3390/app15137332

Submission received: 22 May 2025 / Revised: 24 June 2025 / Accepted: 27 June 2025 / Published: 30 June 2025

(This article belongs to the Special Issue Application of Artificial Intelligence and Semantic Mining Technology)


Abstract

Road traffic accidents pose a substantial global public health burden, resulting in significant fatalities and economic costs. This study employs Bayesian Poisson regression to model traffic accident rates in Poland, focusing on the intricate relationships between weather conditions and socioeconomic factors. Analyzing both yearly county-level and weekly nationwide data from 2020 to 2023, we created four distinct models examining the relationships between accident occurrence and predictors including temperature, humidity, precipitation, population density, passenger car registrations, and road infrastructure. Model evaluation, based on WAIC and PSIS-LOO criteria, demonstrated that integrating both weather and socioeconomic variables enhanced predictive accuracy. Results showed that socioeconomic variables—especially passenger car registrations—were strong predictors of accident rates over longer timeframes and across localized regions. In contrast, weather variables, particularly temperature and humidity, were more influential in explaining short-term fluctuations in nationwide accident counts. These findings provide a statistical foundation for identifying high-risk conditions and guiding targeted interventions. The study supports Poland’s national road safety goals by offering evidence-based strategies to reduce accident-related fatalities and injuries.

Keywords: road traffic accidents; Bayesian modeling; Poisson regression; weather conditions; socioeconomic factors

1. Introduction

Road Traffic Accidents (RTAs) represent a major global public health concern, resulting in approximately 1.19 million deaths annually and leaving between 20 and 50 million people with non-fatal injuries [1]. In addition to the human toll, RTAs impose substantial economic burdens, with estimated costs ranging from 1% to 3% of national GDPs [2]. In recognition of this, the United Nations General Assembly proclaimed 2021–2030 the Second Decade of Action for Road Safety, aiming to reduce road traffic deaths and injuries by at least 50 percent [3]. Similar resolutions have been adopted by the WHO [4] and the European Commission [5], among many others.

In recent years, Poland has emerged as a leader in road safety improvement within the European Union, achieving a staggering 44% reduction in road accident fatalities between 2013 and 2023 [6]. However, despite these gains, traffic accidents continue to pose substantial societal and economic burdens. According to the Polish Road Safety Observatory, Poland recorded over 21,000 traffic accidents in 2024, resulting in nearly 1900 fatalities and more than 24,000 injuries [7].

Systematic solutions to lower the number of RTAs are needed, but without understanding the underlying causes it is infeasible to eliminate their tragic outcomes. The complex nature of traffic accidents stems from the interplay of diverse factors, involving traffic conditions, human factors, vehicle design and condition, infrastructure characteristics, and environmental variables [8]. Among these, traffic exposure is the most fundamental, suggesting that minimizing traffic volume would be the most effective way to reduce the number of accidents [9]. However, since traffic reduction is rarely feasible, other contributing factors must be carefully analyzed. Vehicle speed plays a critical role in road safety, affecting both the risk and severity of crashes [10,11]. Human factors substantially influence accident risk, with both fatigue and distraction emerging as key concerns [12]. Driver distractions span numerous categories, including mobile phone use, navigation system interactions, passenger conversations, audio system adjustments, and in-vehicle dining [13]. Fatigue degrades driving performance progressively with extended driving duration, with monotonous driving environments further contributing to decreased driver alertness [14]. Vehicle-related factors encompass both technological advancements and maintenance issues. Advanced safety systems such as Forward Collision Warning (FCW) and Autonomous Emergency Braking (AEB) have demonstrated remarkable effectiveness, contributing to the reduction of front-to-rear crashes and associated injuries [15]. Beyond technology, inadequate maintenance, such as insufficient tire tread depth, affects accident rates in a meaningful way, particularly on wet road surfaces [16]. Road infrastructure plays a critical role in both accident prevention and mitigation. Thoughtfully designed pedestrian infrastructure, including refuge islands and signalized crossings, can substantially reduce pedestrian-related accidents [17]. 
The concept of “forgiving” road infrastructure suggests that well-designed roads can help mitigate the consequences of driver errors [18].

Numerous studies have established a clear link between adverse weather conditions and increased crash frequency and severity, highlighting the importance of analyzing this relationship [19]. The impact of precipitation demonstrates temporal nuances, as rainfall following extended dry periods presents heightened hazards [20], while snow days have fewer fatal crashes than dry days, but more nonfatal-injury crashes [21]. Low friction on snow- and ice-covered roads amplifies collision risk, underscoring the importance of road maintenance in adverse conditions [22]. Temperature also influences traffic safety, with studies linking rising temperatures to an increased risk of crashes [23,24], particularly when extreme heat impairs driver concentration or encourages riskier behavior [25]. Conversely, subzero temperatures do not necessarily pose a direct threat unless they result in ice formation and snow accumulation [22]. Driver behavior adapts to adverse weather conditions, with reduced speeds and increased following distances observed during rainfall events, effects that intensify with precipitation severity [26]. Risk perception among drivers varies by weather condition and vehicle type, but among all, snow-covered roads pose the highest risk [27].

The complex interrelationships between weather variables, driver behavior, regional factors, and accident patterns underscore the need for advanced modeling approaches. Researchers have applied various methodological frameworks to better understand these dynamics. Time series models have proven particularly valuable for capturing temporal dependencies in accident data [28,29]. In particular, ARIMA (Autoregressive Integrated Moving Average) models were used to establish associations between weather factors and trauma occurrences from collisions [30]. Park et al. [24] employed a Generalized Additive Model (GAM) with a Poisson distribution combined with meta-analysis techniques to identify temperature thresholds above which traffic accidents significantly increased in urban areas. Multivariate modeling approaches have enabled the simultaneous modeling of multiple factors. El-Basyouny et al. [31] proposed multivariate safety performance functions with multiple regression links to model both the frequency and severity of collisions in relation to weather variables and exposure proxies, achieving high predictive precision. Poisson regression models have been widely applied to accident count data, given their suitability for discrete, non-negative outcomes. Fridstrøm et al. [9] employed this approach to decompose the contribution of various factors, including randomness, exposure, and weather, to variations in road accident counts. The spatial dimension of weather–accident relationships has been addressed through innovative methods. Jaroszweski et al. [32] introduced a weather radar approach to analyze rainfall impacts on urban accidents, providing superior spatial and temporal resolution compared to traditional station-based analyses. This methodology proved generally more effective, highlighting the importance of contextual factors in methodological selection. 
Risk assessment frameworks have also been developed to quantify the relative hazards associated with specific weather conditions. Malin et al. [33] employed Palm probability theory—derived from random point processes—to estimate the relative accident risks across different road weather conditions. This approach revealed particularly elevated risks during icy rain and on slippery road surfaces, with counterintuitive findings regarding the vulnerability of different road types during adverse conditions. Meta-analytical approaches have provided valuable integration of diverse research findings. Chudy-Laskowska et al. [34] used multidimensional comparative analysis to analyze traffic safety and road infrastructure development indicators between different regions. Qiu et al. [19] conducted a meta-analysis on the effects of adverse weather on traffic crashes to generalize research findings. This way, they excluded local phenomena and found the generalized influence of the different factors. Theofilatos et al. [35] highlighted the potential for real-time traffic and weather data to enable not only more precise estimation of individual parameter effects but also identification of complex interaction effects between traffic and weather variables. This increasing data granularity supports the development of predictive models with greater temporal and spatial specificity.

Many of the described modeling approaches face limitations in their ability to fully account for uncertainty, incorporate prior knowledge, and represent complex hierarchical or spatio-temporal structures. They often rely on restrictive assumptions and lack the flexibility needed to capture nonlinear interactions. These limitations underscore the rationale for employing Bayesian statistical frameworks, which offer a coherent probabilistic foundation for addressing such complexities. By incorporating prior information, quantifying uncertainty, and updating beliefs as new data become available, they effectively capture hierarchical structures, spatial and temporal dependencies, and intricate parameter interactions [36]. Bayesian hierarchical models have proven particularly effective in addressing spatial and temporal dependencies in accident data [37]. By pooling information across different locations or time periods, these models enhance estimation precision, especially in data-sparse areas. Similarly, Bayesian network models have been employed to represent the complex causal relationships between various accident-contributing factors [38].

Recent studies reinforce the growing value of Bayesian methods, particularly in weather-related accident analysis. First, they effectively handle the inherent variability and uncertainty in both accident occurrences and weather measurements. Second, they allow for the integration of prior information from past studies or expert knowledge, which is particularly valuable when dealing with complex weather–accident relationships. Third, they provide a natural framework for modeling hierarchical structures, such as accidents nested within geographical regions or time periods [39]. Zeng et al. [40] proposed a Bayesian bivariate conditional autoregressive model that jointly analyzes daytime and nighttime crash frequencies while accounting for spatial correlations across traffic analysis zones. Their findings reveal significant spatial effects and high correlation between crash periods, with key predictors including land use and traffic exposure. Wen et al. [41] expanded this perspective by developing a Bayesian spatio-temporal model that captures not only the main effects but also interactions between road geometry and weather factors, such as wind-speed–slope and precipitation–curve combinations, demonstrating their impact on crash risk. Meanwhile, Zeng et al. [42] introduced a Bayesian multivariate spatio-temporal interaction framework for ranking hazardous sites based on severity-weighted crash frequencies, showing that models incorporating spatial and temporal dependencies outperform simpler alternatives. Nowakowska [43,44] developed a Bayesian logistic regression model for classifying accident severity, incorporating various informative priors—including method of moments, maximum likelihood estimation, and two-stage Bayesian updating—alongside a novel Boot prior proposal. 
Results indicated that models trained on balanced datasets outperformed those trained on unbalanced data, with the two-stage Bayesian updating and Boot prior models achieving the highest classification accuracy. Additionally, the research explored spatial and temporal aspects of Bayesian modeling, where priors were derived from road-specific historical data or the latest available accident records. These findings underscore the importance of prior selection in Bayesian models and their adaptability for analyzing accident risks in both spatial and temporal contexts.

The present study aims to address this research gap by developing a comprehensive Bayesian model for analyzing the relationship between weather conditions and traffic accident rates across Poland. By leveraging national-scale data and accounting for spatial and temporal dependencies, this research seeks to provide a more nuanced understanding of how specific weather and socioeconomic parameters influence accident risk in different regions of Poland. The findings will have practical implications for traffic management, road safety policy, and weather warning systems tailored to Polish conditions.

This paper is structured as follows. Section 1 establishes the study’s rationale, reviews the relevant literature, and lays the foundation for analyzing factors influencing RTAs in Poland. Section 2 presents the dataset and preprocessing methods, along with an overview of the Bayesian models employed. Section 3 examines the results and analyzes the findings derived from these models. Section 4 summarizes the conclusions, discusses implications, and suggests potential avenues for future research.

2. Materials and Methods

2.1. Data

The dataset employed in this study is a comprehensive aggregation of road traffic accidents recorded across Poland between 2015 and 2023, a period selected to ensure data integrity and completeness. Records prior to 2015 exhibited significant data quality issues, including a high incidence of missing values, while 2023 represented the most recent full year of data available during the dataset’s development. The resulting dataset contains approximately 250,000 individual accident records, each integrating police-reported accident characteristics with two contextual layers. For each accident, the dataset includes location-specific weather conditions—temperature, relative humidity, hourly precipitation, and 24 h cumulative precipitation—estimated using universal kriging with elevation drift. It also incorporates a suite of county-level socioeconomic indicators, such as population density, road infrastructure metrics, vehicle fleet composition, and historical accident statistics, providing a comprehensive profile for every recorded event. Additionally, the dataset provides reference baselines: daily average weather conditions for each county and yearly socioeconomic indicators for various Polish administrative units (counties, voivodeships, and national aggregates).

This dataset was developed in our previous work [45], where traffic accident data were sourced from the Accident and Collision Recording System (SEWIK), the official police-maintained registry of traffic incidents, socioeconomic metrics from Statistics Poland (GUS), Poland’s principal government agency for official statistics, and meteorological data from the Institute of Meteorology and Water Management (IMGW), the state service for meteorological and hydrological monitoring. The creation of this dataset involved extensive preprocessing to ensure its quality and analytical robustness. Key steps included the standardization of administrative unit names to resolve inconsistencies, the removal of physically implausible meteorological outliers, and the clipping of precipitation values below the minimum detectable threshold. As a result of these procedures, the data utilized in this study are clean, requiring no further imputation. Detailed preprocessing methodology can be found in the referenced work.

The multidimensional structure of this dataset enables modeling across various temporal and geographic scales. Socioeconomic data facilitates detailed modeling of accident location patterns, while meteorological parameters allow for fine-grained temporal analysis. Given these capabilities, we focus on two distinct modeling challenges with different scale considerations:

Modeling yearly accident counts at the county level
Modeling weekly accident counts at the national level

For our analysis, we examined the period from 2020 to 2023, utilizing weather parameters (temperature, humidity, and precipitation) based on data from IMGW and three socioeconomic indicators (population density, passenger car registrations, and road density) reported by GUS. The selection of these variables is directly informed by established road safety literature and the preliminary findings presented in our foundational work [45]. Weather parameters such as temperature, humidity, and precipitation were chosen for their well-documented effects on driver behavior and road surface conditions. Our previous analysis confirmed these relationships, revealing a consistent positive correlation between temperature and accident frequency (correlation coefficient of 0.09) and a notable negative correlation for humidity (−0.10). The selected socioeconomic indicators serve as robust proxies for traffic exposure and infrastructure characteristics. Specifically, population density and passenger car registrations were included due to their strong positive correlation with total accident counts (0.45 and 0.82, respectively), while road density provides a crucial measure of infrastructure development, showing a meaningful correlation (0.31). While the original dataset provides a detailed breakdown of vehicle registrations by category (e.g., passenger cars, trucks, buses), the analytical scope of this paper is focused on modeling the overall frequency of traffic accidents. Consequently, our models aggregate accidents across all vehicle types, allowing for a comprehensive assessment of the combined impact of the selected factors on total accident risk at the county level. Based on the original dataset, specific preprocessing was conducted for each analytical problem.

2.1.1. County-Level Modeling Data

For county-level yearly analysis, temperature and humidity were calculated as yearly averages of daily mean values for each county. Precipitation was represented as the sum of daily precipitation measurements within each county for a given year. Socioeconomic factors were incorporated directly as their respective yearly measurements for each county. The variables used in these models and their descriptions are presented in Table 1.
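The aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout and the key names (`county`, `year`, `temp`, `humidity`, `precip`) are hypothetical and are not taken from the original dataset.

```python
from collections import defaultdict
from statistics import fmean


def aggregate_county_year(daily_rows):
    """Collapse daily county records into one row per (county, year).

    Each input row is a dict with hypothetical keys:
    'county', 'year', 'temp', 'humidity', 'precip'.
    """
    groups = defaultdict(list)
    for row in daily_rows:
        groups[(row["county"], row["year"])].append(row)

    result = {}
    for key, rows in groups.items():
        result[key] = {
            # Yearly average of the daily mean values
            "mean_temperature": fmean(r["temp"] for r in rows),
            "mean_humidity": fmean(r["humidity"] for r in rows),
            # Yearly sum of the daily precipitation values
            "total_precipitation": sum(r["precip"] for r in rows),
        }
    return result


# Made-up daily records, for illustration only
daily = [
    {"county": "A", "year": 2020, "temp": 10.0, "humidity": 80.0, "precip": 1.5},
    {"county": "A", "year": 2020, "temp": 14.0, "humidity": 70.0, "precip": 0.0},
    {"county": "B", "year": 2020, "temp": 8.0, "humidity": 90.0, "precip": 2.0},
]
yearly = aggregate_county_year(daily)
```

Temperature and humidity are averaged while precipitation is summed, so the yearly precipitation value reflects the total amount rather than a typical day.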

To address the varying scales and distributions of the selected variables, appropriate normalization techniques were applied, and the resulting distributions are presented in Figure 1. Temperature and humidity underwent z-score normalization due to their approximately normal distributions, while precipitation and all socioeconomic indicators were subjected to log1p normalization to address their right-skewed distributions. This normalization approach facilitates more effective model fitting and coefficient interpretation by reducing the impact of outliers and bringing variables to comparable scales.

2.1.2. Nationwide Modeling Data

For the nationwide weekly analysis, a different aggregation approach was employed. Temperature was calculated as the average of all mean daily temperatures across counties within each week. Humidity was processed using the same methodology. Precipitation was represented as the total sum of daily precipitation measurements across all counties for each week. Socioeconomic indicators were incorporated as the national aggregates for the corresponding year. The variables used in these models and their descriptions are provided in Table 2.

The normalization approach for weekly data followed a similar pattern to that used for county-level data, with certain adaptations to account for the different temporal scales. Temperature and humidity underwent z-score normalization to standardize seasonal variations. Precipitation was subjected to log1p normalization to address its right-skewed distribution. The yearly national socioeconomic data were min–max normalized to facilitate meaningful year-to-year comparisons while maintaining the relative scale of annual changes. The resulting distributions of the predictors are presented in Figure 2.
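The three transformations mentioned in this section (z-score, log1p, and min–max) can be illustrated with a short standard-library sketch. The input values are made up, and the use of the population standard deviation in the z-score is an assumption; the text does not say which estimator was used.

```python
import math
from statistics import fmean, pstdev


def zscore(values):
    """Center on the mean and scale by the (population) standard deviation."""
    mu = fmean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


def log1p_transform(values):
    """log(1 + x): compresses a right-skewed, non-negative variable; maps 0 to 0."""
    return [math.log1p(v) for v in values]


def min_max(values):
    """Rescale linearly so the smallest value is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative values only, not taken from the dataset
temperature = [2.0, 8.0, 14.0, 20.0]
precipitation = [0.0, 3.0, 15.0, 120.0]
cars_per_year = [25.1, 25.9, 26.5, 27.0]

temp_z = zscore(temperature)
precip_log = log1p_transform(precipitation)
cars_scaled = min_max(cars_per_year)
```

Because log1p maps zero to zero, dry periods need no special handling, which is one practical reason to prefer it over a plain logarithm for precipitation.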

2.2. Models

Our approach to modeling traffic accident counts in Poland employs generalized linear models based on the Poisson distribution. The Poisson distribution is particularly suitable for modeling traffic accident counts as it is specifically designed for count data, where observations are non-negative integers representing the number of occurrences within a fixed interval. This distribution assumes events occur independently at a constant average rate, which aligns with the random nature of traffic accidents when aggregated across geographic areas or time periods. The Poisson distribution’s characteristic property that its mean equals its variance often approximately holds for accident count data at appropriate scales of aggregation, making it a statistically sound choice for our analytical framework.

The probability mass function of the Poisson distribution is given by:

P(Y = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}

(1)

where k is the number of occurrences (accident counts in our case) and λ is the expected number of occurrences, which represents both the mean and variance of the distribution.
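As a quick numerical check of the statement that the mean and variance coincide, the probability mass function can be evaluated directly. The sketch below is illustrative; the rate of 4 and the truncation of the support at 60 are arbitrary choices.

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson variable with rate lam, computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


lam = 4.0
support = range(0, 60)  # the tail beyond 60 is negligible for lam = 4

total = sum(poisson_pmf(k, lam) for k in support)
mean = sum(k * poisson_pmf(k, lam) for k in support)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in support)
```

Working in log space avoids overflow in the factorial and in the power term when counts are large, as they are for aggregated accident data.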

In our Poisson regression models, we relate the parameter λ to our predictor variables through a logarithmic link function:

\log(\lambda) = \eta = X\beta

(2)

This log-link function ensures that the predicted values remain positive, which is essential since accident counts cannot be negative. The linear predictor η is composed of the predictor matrix X (containing weather or socioeconomic variables, or both) and the coefficient vector β.
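One consequence of the log link is that coefficients act multiplicatively: raising a predictor by one unit multiplies the expected count by exp(β) for that predictor. The following sketch illustrates this with made-up numbers; the coefficient values and the leading column of ones (standing in for a constant term) are assumptions for the example only.

```python
import math


def expected_rate(beta, x):
    """Expected count lambda = exp(x . beta) under the log link."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    return math.exp(eta)


# Hypothetical coefficients; the first entry pairs with a constant column of ones
beta = [3.0, 0.5, -0.2]

base = expected_rate(beta, [1.0, 1.0, 2.0])
shifted = expected_rate(beta, [1.0, 2.0, 2.0])  # second predictor raised by one unit

# The ratio depends only on the coefficient of the predictor that changed
ratio = shifted / base
```

Here `ratio` equals exp(0.5), roughly a 65% increase in the expected count, regardless of the values of the other predictors.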

We developed four distinct models, categorized by their temporal and spatial granularity: two models for yearly county-level analysis and two models for weekly nationwide analysis. Each model configuration was designed to address specific research questions regarding the relationship between weather conditions, socioeconomic factors, and traffic accident occurrence.

The core of our Bayesian framework is captured in the following model structure:

\begin{aligned}
y_{i} &\sim \text{Poisson}(e^{\eta_{i}}) \\
\eta_{i} &= \alpha + X_{i}\beta \\
\alpha &\sim \text{Normal}(\mu_{\alpha}, \sigma_{\alpha}) \\
\beta_{k} &\sim \text{Normal}(0, \sigma_{\beta}) \quad \text{for all } k \in K
\end{aligned}

(3)

where y_{i} represents the observed accident count for observation i, η_{i} is the log-rate parameter of the Poisson distribution, α is the intercept term, and β contains the coefficients for each predictor. K represents the set of all predictors in our model, which may include weather variables, socioeconomic variables, or both, depending on the specific model configuration. The prior distributions for both the intercept and coefficients are normal distributions with specified hyperparameters.

To ensure numerical stability, we constrained the linear predictor by clamping it at a maximum log-rate, log(λ_{max}), which prevents potential overflow issues during the sampling process.

Table 3 presents the hyperparameter values used for all four models.

The overall structure of the models and the corresponding relationships between the parameters are presented in Figure 3. All models were implemented in Stan and run using CmdStanPy, utilizing the NUTS-HMC sampler [46] to generate draws.
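For readers who want to see how such a model might look in practice, the sketch below embeds a Stan program for the structure in Equation (3) and fits it through CmdStanPy. It is not the authors' code: the data names (`mu_alpha`, `sigma_alpha`, `sigma_beta`, `max_log_rate`), the file name, and the way the clamp is written are all assumptions made for illustration.

```python
# Hypothetical Stan program for the Poisson GLM with normal priors.
# All variable names are assumptions; the original implementation is not shown in the text.
STAN_PROGRAM = """
data {
  int<lower=1> N;                 // number of observations
  int<lower=1> K;                 // number of predictors
  matrix[N, K] X;                 // normalized predictors
  array[N] int<lower=0> y;        // accident counts
  real mu_alpha;                  // prior mean of the intercept
  real<lower=0> sigma_alpha;      // prior sd of the intercept
  real<lower=0> sigma_beta;       // prior sd of the coefficients
  real max_log_rate;              // upper clamp on the linear predictor
}
parameters {
  real alpha;
  vector[K] beta;
}
model {
  vector[N] eta = alpha + X * beta;
  for (n in 1:N) {
    eta[n] = fmin(eta[n], max_log_rate);   // clamp to avoid overflow in exp()
  }
  alpha ~ normal(mu_alpha, sigma_alpha);
  beta ~ normal(0, sigma_beta);
  y ~ poisson_log(eta);                    // Poisson with log-rate eta
}
"""


def fit(data, stan_path="poisson_glm.stan"):
    """Write the program to disk, compile it, and draw samples with NUTS."""
    from cmdstanpy import CmdStanModel  # imported lazily; requires a CmdStan install

    with open(stan_path, "w") as handle:
        handle.write(STAN_PROGRAM)
    model = CmdStanModel(stan_file=stan_path)
    # Sampler settings follow those reported in the text: 4 chains,
    # 500 warm-up and 2000 sampling iterations.
    return model.sample(data=data, chains=4, iter_warmup=500, iter_sampling=2000)
```

Using `poisson_log` keeps the computation on the log scale, so the rate is never exponentiated explicitly inside the likelihood.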

2.2.1. Model 1: County-Level Socioeconomic Model

The first yearly county-level model focuses on establishing a baseline relationship between accident occurrence and socioeconomic factors. This model incorporates county-specific sociodemographic variables, including population density, passenger car registrations, and road network density. The model accounts for the spatial heterogeneity of accident counts across different counties while maintaining temporal aggregation at the yearly level.

The linear predictor for this model is defined as:

\begin{aligned}
\log(\lambda_{i}) = \alpha &+ \beta_{1} \cdot \text{county\_population\_density}_{i} \\
&+ \beta_{2} \cdot \text{county\_passenger\_cars}_{i} \\
&+ \beta_{3} \cdot \text{county\_road\_density}_{i}
\end{aligned}

(4)

where λ_{i} represents the expected accident count for county i in a given year.

2.2.2. Model 2: County-Level Multifactorial Model

The second yearly county-level model extends Model 1 by incorporating meteorological variables. This model includes annual averages of temperature, humidity, and total precipitation for each county, alongside the socioeconomic factors. This extension aims to capture the combined influence of weather conditions and socioeconomic characteristics on accident occurrence at the county level.

The linear predictor for this model is expanded to:

\begin{aligned}
\log(\lambda_{i}) = \alpha &+ \beta_{1} \cdot \text{county\_population\_density}_{i} + \beta_{2} \cdot \text{county\_passenger\_cars}_{i} \\
&+ \beta_{3} \cdot \text{county\_road\_density}_{i} + \beta_{4} \cdot \text{county\_mean\_temperature}_{i} \\
&+ \beta_{5} \cdot \text{county\_mean\_humidity}_{i} + \beta_{6} \cdot \text{county\_total\_precipitation}_{i}
\end{aligned}

(5)

2.2.3. Model 3: Nationwide Weather Model

Transitioning to a finer temporal resolution, the first weekly nationwide model examines the relationship between nationwide accident counts and weekly meteorological conditions. This model aggregates accident data across all counties for each week of the study period and relates these counts to country-average weather conditions. The weekly granularity allows for capturing seasonal patterns and shorter-term weather effects.

The linear predictor for this model is structured as:

\begin{aligned}
\log(\lambda_{i}) = \alpha &+ \beta_{1} \cdot \text{weekly\_mean\_temperature}_{i} \\
&+ \beta_{2} \cdot \text{weekly\_mean\_humidity}_{i} \\
&+ \beta_{3} \cdot \text{weekly\_total\_precipitation}_{i}
\end{aligned}

(6)

where λ_{i} represents the expected nationwide accident count for week i.

2.2.4. Model 4: Nationwide Multifactorial Model

The second weekly nationwide model combines the weather variables from Model 3 with nationwide socioeconomic indicators. Since socioeconomic data varies annually rather than weekly, these factors remain constant for all weeks within the same year but change across different years. This model attempts to capture both the short-term effects of weather fluctuations and the longer-term influence of socioeconomic conditions on accident occurrence.

The linear predictor for this enhanced model is:

\begin{aligned}
\log(\lambda_{i}) = \alpha &+ \beta_{1} \cdot \text{weekly\_mean\_temperature}_{i} + \beta_{2} \cdot \text{weekly\_mean\_humidity}_{i} \\
&+ \beta_{3} \cdot \text{weekly\_total\_precipitation}_{i} + \beta_{4} \cdot \text{yearly\_population\_density}_{i} \\
&+ \beta_{5} \cdot \text{yearly\_passenger\_cars}_{i} + \beta_{6} \cdot \text{yearly\_road\_density}_{i}
\end{aligned}

(7)

where the socioeconomic predictors are assigned the annual national values corresponding to the year containing week i.

2.3. Prior Predictive Checks

Before fitting the models to observed data, prior predictive checks were performed to assess whether or not the specified priors lead to plausible predictions. This involves generating simulated data using only the prior distributions—without conditioning on any observed outcomes—to evaluate whether or not the model can produce realistic values given the assumed parameter ranges. Each model was executed with 500 warm-up iterations and 2000 sampling iterations across four chains to generate these prior-based simulations.
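The idea of a prior predictive check can be sketched without any sampler: draw parameters from the priors, push them through the clamped log link, and simulate counts. The example below uses only the Python standard library. The hyperparameter values, the clamp, and the small predictor matrix are hypothetical and do not reproduce the settings used in the study.

```python
import math
import random


def poisson_draw(lam, rng):
    """Exact Poisson draw: count unit-rate exponential arrivals before time lam."""
    t, k = 0.0, 0
    while True:
        t += rng.expovariate(1.0)
        if t > lam:
            return k
        k += 1


def prior_predictive(X, mu_alpha, sigma_alpha, sigma_beta, max_log_rate, n_draws, seed=0):
    """Simulate count vectors using only the priors (no observed outcomes)."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_draws):
        alpha = rng.gauss(mu_alpha, sigma_alpha)
        beta = [rng.gauss(0.0, sigma_beta) for _ in X[0]]
        counts = []
        for row in X:
            eta = alpha + sum(b * x for b, x in zip(beta, row))
            eta = min(eta, max_log_rate)  # clamp the log-rate before exponentiating
            counts.append(poisson_draw(math.exp(eta), rng))
        sims.append(counts)
    return sims


# Hypothetical normalized predictors (3 observations, 2 predictors) and hyperparameters
X = [[-1.0, 0.3], [0.0, 0.0], [1.2, -0.5]]
sims = prior_predictive(
    X, mu_alpha=4.0, sigma_alpha=0.5, sigma_beta=0.2, max_log_rate=8.0, n_draws=200
)
```

Comparing the spread of the simulated counts with the range of the observed counts shows whether the chosen hyperparameters are plausible before any data are used for fitting.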

2.3.1. Model 1

For Model 1, the prior specification followed an iterative refinement approach based on empirical testing. The hyperparameter values were determined through systematic exploration of the parameter space until the prior predictive distribution was satisfactory. While informed by the general scale of traffic accident data, the final prior selection emerged from practical testing rather than from strict theoretical derivation based on domain knowledge of traffic accident patterns. The selected parameters, as presented in Figure 4, ensure the prior predictive distribution adequately encompasses the observed data range while maintaining appropriate uncertainty bounds for socioeconomic predictors. Resulting histograms of prior samples are presented in Figure 5.

2.3.2. Model 2

Model 2 extends the prior structure to accommodate additional meteorological predictors while preserving consistent comparison with Model 1. The prior predictive check, as presented in Figure 6, confirms that the expanded parameter space appropriately covers the empirical distribution of accident counts, validating our prior selections for the multifactorial county-level analysis. Additionally, predictive distributions for priors are presented in Figure 7.

2.3.3. Model 3

For the weekly nationwide analysis in Model 3, we adjusted the prior hyperparameters using the same empirical approach as for Model 1, adapted to the weekly temporal interval and the national geographic scale. Histograms of the resulting priors are illustrated in Figure 8. The modified configuration, presented in Figure 9, reflects the distinct variance structure of weekly aggregated data while maintaining sufficient coverage of the observed accident distribution.

2.3.4. Model 4

Model 4 utilizes identical prior specifications to Model 3 while incorporating both meteorological and socioeconomic predictors. The prior predictive check, as presented in Figure 10, demonstrates that, despite the increased dimensionality, our hyperparameter selection maintains appropriate coverage of the observed data space, ensuring robust posterior inference for the comprehensive nationwide model. Additionally, histograms of the prior parameters are presented in Figure 11.

3. Results

3.1. Posterior Predictive Checks

The reliability of our Bayesian models was assessed through posterior predictive checks, comparing model-generated predictions against observed data. All models employed consistent sampling parameters: 2000 iterations for sampling, 500 iterations for warm-up, and 4 independent chains.

3.1.1. Model 1

Figure 12 presents the posterior predictive distribution and a scatter plot of actual versus predicted values. The scatter plot reveals a positive correlation between the two, demonstrating the model’s capacity to capture general patterns in yearly accident counts across counties. Most of the variation occurs at the lower end, with some in the middle range and minimal variation at the higher end. This suggests that the model’s ability to reliably predict extreme values may be limited by the relatively small number of samples in the higher accident count ranges.

To provide a more standardized evaluation of model performance, we aggregated the posterior predictions to compute yearly county-level accident rates and compared them against the observed rates. As illustrated in Figure 13, the socioeconomic model demonstrates good predictive accuracy. The scatter plot reveals a strong linear relationship between the observed and predicted rates on a log–log scale, with data points closely distributed around the identity line. The high coefficient of determination ($R^{2} = 0.828$) confirms this visual assessment, indicating that the model successfully explains approximately 82.8% of the variance in county-level accident rates.
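The text does not spell out how the coefficient of determination was computed. One plausible reading, sketched below, evaluates the predictions against the identity line on log-transformed rates, i.e. one minus the ratio of residual to total sum of squares. Both this definition and the function name `r_squared_log` are assumptions, and the rates are synthetic placeholders.

```python
import math


def r_squared_log(observed, predicted):
    """R^2 of predictions against the identity line on log-transformed values."""
    log_obs = [math.log(v) for v in observed]
    log_pred = [math.log(v) for v in predicted]
    mean_obs = sum(log_obs) / len(log_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(log_obs, log_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in log_obs)
    return 1.0 - ss_res / ss_tot


# Synthetic accident rates (placeholders, not study data).
observed_rates = [0.8, 1.1, 1.5, 2.4, 3.9]
predicted_rates = [0.9, 1.0, 1.7, 2.2, 3.5]
print(f"R^2 on log scale: {r_squared_log(observed_rates, predicted_rates):.3f}")
```

A squared Pearson correlation on the same log-transformed values would be an equally common choice and can differ from this quantity when predictions are biased.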

Figure 14 shows that the posterior distribution for the intercept exhibits a well-defined, narrow credible interval, indicating a stable baseline accident count across counties. Among the socioeconomic predictors, passenger car ownership emerges as the most influential factor, with a consistently strong positive effect on yearly accident counts. Population density also shows a positive association with accident counts, while road density demonstrates a negative relationship. The narrow credible intervals for all parameters suggest high precision in our parameter estimates.
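To give a feel for the magnitude of these effects, the sketch below converts the posterior medians reported in the Figure 14 caption into multiplicative changes in the expected accident count. It assumes a log link and that each socioeconomic predictor enters the linear predictor as a plain log1p value without further rescaling, so that doubling (1 + predictor) shifts the transformed value by ln 2. If the actual preprocessing differs, the numbers would change accordingly. The function name `rate_ratio` is hypothetical.

```python
import math

# Posterior medians reported in the Figure 14 caption (Model 1).
posterior_medians = {
    "population_density": 0.032,
    "passenger_cars": 1.190,
    "road_density": -0.112,
}


def rate_ratio(beta, delta):
    """Multiplicative change in the expected count for a shift `delta` in the predictor."""
    return math.exp(beta * delta)


# Assumption: doubling (1 + x) shifts log1p(x) by ln(2).
delta = math.log(2.0)
for name, beta in posterior_medians.items():
    ratio = rate_ratio(beta, delta)
    print(f"{name}: expected count multiplied by {ratio:.3f}")
```

Under these assumptions, a coefficient above 1 for passenger cars implies that expected accident counts grow slightly faster than proportionally with the number of registered cars, while the road density coefficient implies a modest reduction.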

3.1.2. Model 2

The predictive distribution of Model 2, as presented in Figure 15, demonstrates improvements over Model 1, with better alignment between observed and predicted frequencies across the accident count spectrum. The scatter plot shows tighter clustering around the identity line, particularly for counties with moderate accident counts, suggesting enhanced predictive performance.

Following the same procedure, we evaluated the performance of the multifactorial model on predicting yearly accident rates. The results, presented in Figure 16, show a similarly strong correlation between observed and predicted values, with a slightly tighter clustering of points around the identity line compared to Model 1. The performance evaluation yielded a coefficient of determination of $R^{2} = 0.833$, indicating a marginal but positive improvement over the socioeconomic-only model. This suggests that, while socioeconomic factors are the primary drivers of variance in accident rates, the inclusion of weather-related variables provides additional explanatory power, resulting in a more accurate and comprehensive model.

Model 2, as presented in Figure 17, maintains the stable intercept estimate while adding explanatory power through additional predictors. The socioeconomic factors retain their directional effects from Model 1, with passenger car ownership continuing to show the strongest positive relationship with accident counts, followed by population density, while road density maintains its negative association. The inclusion of weather predictors (mean temperature, mean humidity, and total precipitation) enhances the model’s capacity to account for regional variations in traffic accidents, particularly in counties with distinctive climatic characteristics that may influence accident patterns beyond the core socioeconomic factors.

3.1.3. Model 3

As presented in Figure 18, the density comparison between observed and predicted accident counts provides insight into model performance across different ranges of accidents. The model captures the overall shape of the accident distribution, though with some areas of divergence. The positive correlation in the relationship between actual and predicted values demonstrates the model’s ability to track general trends in weekly accident counts. This analysis highlights potential areas for refinement, particularly for weeks with unusually high or low accident counts.

The posterior distribution for the intercept, as presented in Figure 19, exhibits stability with a narrow credible interval, indicating a well-estimated baseline accident count. Among the weather predictors, temperature emerges as the most influential factor, with a consistently positive effect on weekly accident counts. Humidity also demonstrates a positive association with accident counts, while precipitation shows a slight negative relationship. The narrow credible intervals for all parameters indicate high precision in our estimates.

3.1.4. Model 4

The predictive distribution of Model 4, as presented in Figure 20, is comparable to that of Model 3, with both showing similar alignment between observed and predicted frequencies across the accident count spectrum. The scatter plot indicates a comparable level of clustering around the identity line, suggesting that predictive accuracy remains consistent between the two models.

The introduction of socioeconomic factors in Model 4, as presented in Figure 21, maintains the stable intercept estimate while adding explanatory power. Population density and passenger car ownership demonstrate strong positive relationships with accident counts, while road density shows a negative association. Weather predictors retain their directional effects from Model 3, reinforcing the consistency of temperature and humidity as positive predictors and precipitation as a slight negative predictor of traffic accidents.

3.2. Model Comparison

Our model evaluation strategy employed two widely recognized Bayesian predictive performance metrics: the Widely Applicable Information Criterion (WAIC) and Pareto-Smoothed Importance Sampling Leave-One-Out (PSIS-LOO) cross-validation. Both metrics estimate out-of-sample predictive accuracy while accounting for model complexity, though they differ in their computational approaches. WAIC approximates Bayesian cross-validation using the log pointwise posterior predictive density with a correction for the effective number of parameters. PSIS-LOO, meanwhile, uses importance sampling with Pareto smoothing to stabilize the estimation process, particularly for models with influential observations.
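The WAIC computation described above can be sketched in a few lines. The example below follows the standard formulation: the log pointwise predictive density minus a penalty equal to the summed posterior variance of the pointwise log-likelihood. It is a minimal illustration on synthetic numbers, not the implementation used in this study, and the helper names are hypothetical.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values) / len(values))


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def waic(log_lik):
    """log_lik[s][i]: log-likelihood of observation i under posterior draw s."""
    per_observation = list(zip(*log_lik))  # transpose to observation-major order
    lppd = sum(log_mean_exp(column) for column in per_observation)
    p_waic = sum(sample_variance(column) for column in per_observation)
    return {"lppd": lppd, "p_waic": p_waic, "elpd_waic": lppd - p_waic}


# Three posterior draws, two observations (synthetic numbers).
log_lik = [
    [-1.2, -0.7],
    [-1.0, -0.9],
    [-1.4, -0.8],
]
result = waic(log_lik)
print(result)
```

PSIS-LOO starts from the same matrix of pointwise log-likelihoods but reweights the posterior draws for each left-out observation, which is considerably more involved and is usually left to a dedicated library.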

The Expected Log Pointwise Predictive Density (ELPD) serves as our primary comparison metric. While higher ELPD values indicate better posterior predictive accuracy when using the model to predict new data points, the metric itself has no interpretable unit and cannot be compared across different analyses or datasets. Even within a single analysis, absolute ELPD values lack contextual meaning. However, because ELPD reliably ranks models by predictive accuracy, it remains a robust and informative criterion for comparing models fitted to the same data. When comparing models, we also consider the Standard Error (SE) of each ELPD estimate and the standard error of the ELPD difference between models (dSE) to assess whether performance differences can be distinguished from estimation uncertainty [47].

3.2.1. County-Level Models

The county-level analysis revealed differences in predictive performance between the two competing models. As shown in Figure 22 and Tables 4 and 5, no substantial differences were observed between the WAIC and PSIS-LOO criteria; therefore, only the results from the latter are discussed. Model 2 performed better, achieving a higher ELPD of −12,788.50, compared to −12,872.12 for Model 1. The ELPD difference of 83.61 suggests a modest improvement in predictive accuracy. Model 2 also had a higher effective number of parameters (p-LOO = 152.42 vs. 92.02), reflecting its greater complexity. It received a slightly higher model weight (0.54 vs. 0.46), suggesting a mild preference, but no strong evidence of superiority.

However, the standard errors (436.22 for Model 2 and 447.29 for Model 1) indicate that the observed ELPD difference falls well within the range of uncertainty. This suggests that both models provide comparable predictive performance. While Model 2 may provide added flexibility, Model 1’s simpler structure could be advantageous in applications where interpretability or computational efficiency is especially important.

3.2.2. Nationwide Models

The weekly nationwide analysis revealed a clearer distinction in predictive performance between the two models compared to the previous county-level analysis. As shown in Figure 23 and Tables 6 and 7, no substantial differences were observed between the WAIC and PSIS-LOO criteria; therefore, only the results from the latter are discussed. Model 4 outperformed Model 3, achieving a higher ELPD of −1940.79, compared to −2019.78 for Model 3. The ELPD difference of 78.99 suggests a meaningful improvement in predictive accuracy. Model 4 also received a higher model weight (0.64 vs. 0.36), providing moderate evidence in its favor.

The models also differ in complexity, as indicated by their effective number of parameters: Model 4 has a higher p-LOO of 70.39, compared to 51.86 for Model 3. This reflects Model 4’s greater flexibility in capturing variation in the data. Although the increase in complexity may raise concerns about overfitting, the improvement in predictive performance and the higher model weight suggest that this added complexity is warranted.

However, the standard errors (115.07 for Model 4 and 105.23 for Model 3) indicate that the ELPD difference is small relative to the uncertainty of the individual estimates. The standard error of the difference (dSE) of 43.17 further suggests that, while the observed advantage of Model 4 is notable, it is not definitive.
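The relationship between an ELPD difference and its standard error can be made concrete with a short sketch. The function below computes both quantities from pointwise ELPD contributions in the usual way (sum of pointwise differences, and the square root of n times their sample variance). The final lines relate the two values reported above, which works out to roughly 1.8 standard errors. The function name `elpd_difference` is hypothetical, and the pointwise contributions themselves are not part of the text.

```python
import math


def elpd_difference(pointwise_a, pointwise_b):
    """Return (difference, dSE) from pointwise ELPD contributions of two models."""
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return sum(diffs), math.sqrt(n * variance)


# Values reported in the text for the nationwide models (Model 4 vs. Model 3).
elpd_diff, dse = 78.99, 43.17
ratio = elpd_diff / dse
print(f"difference / dSE = {ratio:.2f}")
```

A ratio below about two is commonly read as an advantage that is suggestive rather than conclusive, which matches the cautious interpretation given here.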

4. Conclusions

This study investigated the relationship between weather conditions, socioeconomic factors, and traffic accident counts in Poland. Bayesian Poisson regression models were employed across two temporal and spatial scales. The first set of models focused on yearly accident counts at the county level. Model 1, which included only socioeconomic predictors, revealed that passenger car ownership had a strong positive association with accident frequency, while population density showed a modest positive effect. This aligns with the well-established principle of traffic exposure, a fundamental concept in traffic safety research [9]. Road density exhibited a negative effect, suggesting that better-developed road networks may contribute to accident reduction. This finding is consistent with literature on the benefits of “forgiving” and thoughtfully designed infrastructure, which can mitigate driver error even in areas with high traffic volumes [17,18]. Posterior predictive checks indicated that the model captured the central tendency of the data well, though some underprediction occurred in counties with higher accident counts.

Model 2 extended this framework by incorporating weather variables alongside socioeconomic predictors. While the socioeconomic effects remained stable, confirming the robustness of those associations, weather-related variables added explanatory power. Specifically, mean humidity had a positive effect on accident counts, while both mean temperature and total precipitation had negative effects. The seemingly counterintuitive negative effects of temperature and precipitation at the annual level may reflect broader climate patterns rather than event-specific risks. Counties with higher average annual temperatures and more precipitation may experience fewer days with highly hazardous conditions, which are known to significantly increase collision risk [21,22]. The positive association with humidity could be a proxy for conditions with reduced visibility, such as fog or mist, which are known hazards. The inclusion of weather factors improved model fit, particularly in terms of predictive accuracy across counties with mid-to-high accident frequencies.

The weekly nationwide analysis was conducted using Models 3 and 4. Model 3, which considered only weather variables, identified temperature as having a strong positive effect, with humidity also contributing positively, and precipitation showing a slight negative effect. The positive effect of temperature at the weekly scale, in contrast to the yearly finding, is highly consistent with international studies that link short-term temperature increases and heatwaves to a higher risk of crashes [24,25]. The slight negative effect of precipitation, while counterintuitive, can be explained by adaptive driver behavior; research has shown that drivers often reduce speeds and increase following distances during rainfall [26]. While the model captured overall patterns reasonably well, discrepancies were observed mostly in the middle range of accident counts, suggesting room for refinement. Model 4 combined weather and socioeconomic predictors at the weekly scale. This integrated model showed improved predictive performance across most of the accident frequency range. The effects of individual predictors were consistent with earlier models: socioeconomic variables retained their expected directional effects, and weather variables mirrored those in Model 3. Posterior predictive checks demonstrated that Model 4 achieved the best alignment between observed and predicted accident counts. However, the model still exhibited discrepancies in the mid-range of accident counts, suggesting that additional, unmeasured factors may exert a meaningful influence.

Despite its contributions, this study has several limitations. Temporal aggregation at the weekly and yearly levels may obscure short-term dynamics and transient weather effects. The absence of key covariates—such as driver behavior and road surface conditions—may introduce unmeasured confounding. The aggregation of counties’ socioeconomic data could hide differences between urbanized and rural areas. Additionally, the analysis does not distinguish between types of transport vehicles, which may exhibit different risk profiles. Finally, the reliance on historical data limits the model’s immediate applicability for real-time prediction or adaptive interventions.

Future research should pursue several extensions. First, the Bayesian Poisson model adopted here could be expanded to include spatial and spatio-temporal components, enabling more refined estimation of regional and temporal patterns. Second, incorporating a cognitive model of the human as a transport operator would help capture the behavioral dimensions of accident risk, particularly under adverse weather. Third, aggregating data by vehicle type could uncover mode-specific risk factors. Lastly, conducting cross-country comparisons would strengthen the generalizability and transferability of the results.

Author Contributions

Conceptualization, A.F. and Ł.F.; methodology, A.F., Ł.F., and J.B.; software, A.F. and Ł.F.; validation, A.F. and J.B.; formal analysis, A.F.; investigation, A.F.; resources, J.B.; data curation, Ł.F.; writing—original draft preparation, A.F.; writing—review and editing, A.F. and J.B.; visualization, Ł.F.; supervision, J.B.; project administration, J.B.; funding acquisition, J.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by AGH Subvention for Scientific Activity.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on Zenodo as: Faruga, Ł., Filapek, A., & Baranowski, J. (2025). Traffic Accident Analysis in Poland: Integrating Weather Data and Sociodemographic Factors [Data set]. Zenodo. https://doi.org/10.5281/zenodo.15731344.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

WAIC	Widely Applicable Information Criterion
PSIS-LOO	Pareto-Smoothed Importance Sampling Leave-One-Out
RTA	Road Traffic Accident
GDP	Gross Domestic Product
WHO	World Health Organization
FCW	Forward Collision Warning
AEB	Autonomous Emergency Braking
ARIMA	Autoregressive Integrated Moving Average
GAM	Generalized Additive Model
DIC	Deviance Information Criterion
HPD	Highest Probability Density
SEWIK	Accident and Collision Recording System
GUS	Statistics Poland
IMGW	Institute of Meteorology and Water Management
NUTS	No-U-Turn Sampler
HMC	Hamiltonian Monte Carlo
DAG	Directed Acyclic Graph
ELPD	Expected Log Pointwise Predictive Density
SE	Standard Error
dSE	Standard Error of the Difference
CI	Credible Interval

References

1. WHO. Road Traffic Injuries. Available online: https://www.who.int/news-room/fact-sheets/detail/road-traffic-injuries (accessed on 17 February 2025).
2. WHO. Global Status Report on Road Safety 2023; World Health Organization: Geneva, Switzerland, 2023; pp. 16–18. ISBN 978-92-4-008651-7.
3. UN General Assembly. Improving Global Road Safety A/RES/74/299. Available online: https://docs.un.org/en/A/RES/74/299 (accessed on 6 March 2025).
4. Global Plan for the Decade of Action for Road Safety 2021–2030. Available online: https://www.who.int/publications/m/item/global-plan-for-the-decade-of-action-for-road-safety-2021-2030 (accessed on 6 March 2025).
5. ECDG (Emergency Care Data Group) Transport for Mobility. Next Steps Towards ‘Vision Zero’—EU Road Safety Policy Framework 2021–2030; European Commission: Geneva, Switzerland, 2020.
6. Road Deaths in the European Union—Latest Data. Available online: https://etsc.eu/euroadsafetydata/ (accessed on 23 June 2025).
7. Road Traffic Accident Data for 2020–2024. Available online: https://obserwatoriumbrd.pl/statystyki/ (accessed on 6 March 2025). (In Polish).
8. Chand, A.; Jayesh, S.; Bhasi, A.B. Road traffic accidents: An overview of data sources, analysis techniques and contributing factors. Mater. Today Proc. 2021, 47, 5135–5141.
9. Fridstrøm, L.; Ifver, J.; Ingebrigtsen, S.; Kulmala, R.; Thomsen, L.K. Measuring the contribution of randomness, exposure, weather, and daylight to the variation in road accident counts. Accid. Anal. Prev. 1995, 27, 1–20.
10. Aljanahi, A.A.M.; Rhodes, A.H.; Metcalfe, A.V. Speed, speed limits and road traffic accidents under free flow conditions. Accid. Anal. Prev. 1999, 31, 161–168.
11. Aarts, L.; van Schagen, I. Driving speed and the risk of road crashes: A review. Accid. Anal. Prev. 2006, 38, 215–224.
12. Simons-Morton, B.G.; Guo, F.; Klauer, S.G.; Ehsani, J.P.; Pradhan, A.K. Keep Your Eyes on the Road: Young Driver Crash Risk Increases According to Duration of Distraction. J. Adolesc. Health 2014, 54, S61–S67.
13. Chand, A.; Bhasi, A.B. Effect of driver distraction contributing factors on accident causations—A review. AIP Conf. Proc. 2019, 2134, 060004.
14. Gastaldi, M.; Rossi, R.; Gecchele, G. Effects of Driver Task-related Fatigue on Driving Performance. Procedia—Soc. Behav. Sci. 2014, 111, 955–964.
15. Cicchino, J.B. Effectiveness of forward collision warning and autonomous emergency braking systems in reducing front-to-rear crash rates. Accid. Anal. Prev. 2017, 99, 142–152.
16. Pečeliūnas, R.; Žuraulis, V.; Droździel, P.; Pukalskas, S. Prediction of Road Accident Risk for Vehicle Fleet Based on Statistically Processed Tire Wear Model. Promet—Traffic Transp. 2022, 34, 619–630.
17. Budzynski, M.; Gobis, A.; Guminska, L.; Jelinski, L.; Kiec, M.; Tomczuk, P. Assessment of the Influence of Road Infrastructure Parameters on the Behaviour of Drivers and Pedestrians in Pedestrian Crossing Areas. Energies 2021, 14, 3559.
18. Kiso, F.; Džananović, A.; Šabanović Karičić, S. Development of road infrastructure safety management system according to updates of EU Directive 96/2008/EC. Sci. Eng. Technol. 2021, 1, 35–41.
19. Qiu, L.; Nixon, W.A. Effects of Adverse Weather on Traffic Crashes: Systematic Review and Meta-Analysis. Transp. Res. Rec. 2008, 2055, 139–146.
20. Keay, K.; Simmonds, I. Road accidents and rainfall in a large Australian city. Accid. Anal. Prev. 2006, 38, 445–454.
21. Eisenberg, D.; Warner, K.E. Effects of Snowfalls on Motor Vehicle Collisions, Injuries, and Fatalities. Am. J. Public Health 2005, 95, 120–124.
22. Abohassan, A.; El-Basyouny, K.; Kwon, T.J. Effects of Inclement Weather Events on Road Surface Conditions and Traffic Safety: An Event-Based Empirical Analysis Framework. Transp. Res. Rec. 2022, 2676, 51–62.
23. He, L.; Liu, C.; Shan, X.; Zhang, L.; Zheng, L.; Yu, Y.; Tian, X.; Xue, B.; Zhang, Y.; Qin, X.; et al. Impact of high temperature on road injury mortality in a changing climate, 1990–2019: A global analysis. Sci. Total Environ. 2023, 857, 159369.
24. Park, J.; Choi, Y.; Chae, Y. Heatwave impacts on traffic accidents by time-of-day and age of casualties in five urban areas in South Korea. Urban Clim. 2021, 39, 100917.
25. Basagaña, X.; Escalera-Antezana, J.P.; Dadvand, P.; Llatje, Ò.; Barrera-Gómez, J.; Cunillera, J.; Medina-Ramón, M.; Pérez, K. High Ambient Temperatures and Risk of Motor Vehicle Crashes in Catalonia, Spain (2000–2011): A Time-Series Analysis. Environ. Health Perspect. 2015, 123, 1309–1316.
26. Billot, R.; El Faouzi, N.E.; De Vuyst, F. Multilevel Assessment of the Impact of Rain on Drivers’ Behavior: Standardized Methodology and Empirical Analysis. Transp. Res. Rec. 2009, 2107, 134–142.
27. Hjelkrem, O.A.; Ryeng, E.O. Chosen risk level during car-following in adverse weather conditions. Accid. Anal. Prev. 2016, 95, 227–235.
28. Brijs, T.; Karlis, D.; Wets, G. Studying the effect of weather conditions on daily crash counts using a discrete time-series model. Accid. Anal. Prev. 2008, 40, 1180–1190.
29. Basagaña, X.; de la Peña-Ramirez, C. Ambient temperature and risk of motor vehicle crashes: A countrywide analysis in Spain. Environ. Res. 2023, 216, 114599.
30. Abe, T.; Tokuda, Y.; Ohde, S.; Ishimatsu, S.; Nakamura, T.; Birrer, R.B. The influence of meteorological factors on the occurrence of trauma and motor vehicle collisions in Tokyo. Emerg. Med. J. 2008, 25, 769–772.
31. El-Basyouny, K.; Kwon, D.W. Assessing Time and Weather Effects on Collision Frequency by Severity in Edmonton Using Multivariate Safety Performance Functions. In Proceedings of the Transportation Research Board 91st Annual Meeting, Washington, DC, USA, 22–26 January 2012; p. 12-0494.
32. Jaroszweski, D.; McNamara, T. The influence of rainfall on road accidents in urban areas: A weather radar approach. Travel Behav. Soc. 2014, 1, 15–21.
33. Malin, F.; Norros, I.; Innamaa, S. Accident risk of road and weather conditions on different road types. Accid. Anal. Prev. 2019, 122, 181–188.
34. Chudy-Laskowska, K.; Pisula, T. Bezpieczeństwo w ruchu drogowym w Polsce w przekroju województw. Analiza porównawcza. TTS Tech. Transp. Szyn. 2013, 20, 1895–1906.
35. Theofilatos, A.; Yannis, G. A review of the effect of traffic and weather characteristics on road safety. Accid. Anal. Prev. 2014, 72, 244–256.
36. Gelman, A.; Shalizi, C.R. Philosophy and the practice of Bayesian statistics. Br. J. Math. Stat. Psychol. 2013, 66, 8–38.
37. Aguero-Valverde, J. Direct Spatial Correlation in Crash Frequency Models: Estimation of the Effective Range. J. Transp. Saf. Secur. 2014, 6, 21–33.
38. Deublein, M.; Schubert, M.; Adey, B.T.; García de Soto, B. A Bayesian network model to predict accidents on Swiss highways. Infrastruct. Asset Manag. 2015, 2, 145–158.
39. Huang, H.; Abdel-Aty, M. Multilevel data and Bayesian analysis in traffic safety. Accid. Anal. Prev. 2010, 42, 1556–1565.
40. Zeng, Q.; Wen, H.; Wong, S.C.; Huang, H.; Guo, Q.; Pei, X. Spatial joint analysis for zonal daytime and nighttime crash frequencies using a Bayesian bivariate conditional autoregressive model. J. Transp. Saf. Secur. 2020, 12, 566–585.
41. Wen, H.; Zhang, X.; Zeng, Q.; Sze, N.N. Bayesian spatial-temporal model for the main and interaction effects of roadway and weather characteristics on freeway crash incidence. Accid. Anal. Prev. 2019, 132, 105249.
42. Zeng, Q.; Xu, P.; Wang, X.; Wen, H.; Hao, W. Applying a Bayesian multivariate spatio-temporal interaction model based approach to rank sites with promise using severity-weighted decision parameters. Accid. Anal. Prev. 2021, 157, 106190.
43. Nowakowska, M. Selected aspects of prior and likelihood information for a Bayesian classifier in a road safety analysis. Accid. Anal. Prev. 2017, 101, 97–106.
44. Nowakowska, M. Spatial and temporal aspects of prior and likelihood data choices for Bayesian models in road traffic safety analyses. Eksploat. I Niezawodn.—Maint. Reliab. 2016, 19, 68–75.
45. Faruga, Ł.; Filapek, A.; Baranowski, J. Dataset for Traffic Accident Analysis in Poland: Integrating Weather Data and Sociodemographic Factors. Appl. Sci. 2025. submitted.
46. Hoffman, M.D.; Gelman, A. The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo. arXiv 2011, arXiv:1111.4246.
47. Johnson, A.A.; Ott, M.Q.; Dogucu, M. Bayes Rules! An Introduction to Applied Bayesian Modeling. Available online: https://www.bayesrulesbook.com/chapter-10 (accessed on 20 June 2025).

Figure 1. Normalized distributions of predictors used in county-level yearly accident modeling. Top row: Weather factors including (a) mean temperature (z-score normalized), (b) mean humidity (z-score normalized), and (c) total precipitation (log1p normalized). Bottom row: Socioeconomic factors including (d) population density, (e) passenger car count, and (f) road density, all log1p normalized. The distributions illustrate the variability of these factors across Polish counties during the 2020–2023 study period.

Figure 2. Normalized distributions of weather factors used in nationwide weekly accident modeling. The histograms show (a) mean temperature (z-score normalized), (b) mean humidity (z-score normalized), and (c) total precipitation (log1p normalized). These distributions represent the aggregated weekly weather conditions across Poland during the 2020–2023 study period. Distributions of yearly variables were excluded from the figure as they offered little additional insight for the analysis.

Figure 3. Directed Acyclic Graph (DAG) of the model structure; pc: passenger_cars, pd: population_density, rd: road_density, t: temperature, h: humidity, p: precipitation.

Figure 4. Prior predictive check for Model 1 in the yearly county-level traffic accident analysis. The histogram compares the prior predictive distribution (red) against the actual observed accident counts (blue) on a logarithmic density scale. The prior distribution adequately covers the range of observed values, confirming that our model’s parameter space is sufficiently broad to encompass the actual data. This indicates appropriate prior specification for the Bayesian modeling of yearly traffic accidents across Polish counties.

Figure 5. Prior predictive distributions for Model 1 in the yearly county-level accident count analysis. The top row presents the overall histogram of prior samples for socioeconomic predictors: (a) population density, (b) number of passenger cars, and (c) road density. The bottom row displays the prior distribution for (d) the model intercept.

Figure 6. Prior predictive check for Model 2 in the yearly county-level traffic accident analysis. Similar to Model 1, this histogram illustrates the comparison between prior predictions (red) and observed data (blue) using logarithmic density scaling. The chosen prior specifications successfully span the entire range of empirical accident counts, demonstrating that Model 2’s parameter space appropriately accommodates the observed data patterns across Polish counties. This confirms the suitability of both prior distributions for subsequent Bayesian inference.

Figure 7. Prior predictive distributions for Model 2 in the yearly county-level accident count analysis. The first row presents the overall histogram of prior samples for primary socioeconomic predictors: (a) population density, (b) number of passenger cars, and (c) road density. The middle row displays prior distributions for weather factors: (d) temperature, (e) humidity, and (f) precipitation. The last row shows the prior distribution for (g) the model intercept.

Figure 8. Prior predictive distributions for Model 3 in the weekly nationwide accident count analysis. The top row presents the overall histogram of prior samples for weather predictors: (a) mean temperature, (b) mean humidity, and (c) total precipitation. The bottom row displays the prior distribution for (d) the model intercept.

Figure 9. Prior predictive check for Model 3 in the weekly nationwide traffic accident analysis. The histogram compares the prior predictive distribution (red) against the actual observed accident counts (blue) on a logarithmic density scale. The prior distribution adequately covers the range of observed values, confirming that our model’s parameter space is sufficiently broad to encompass the actual data. This indicates appropriate prior specification for the Bayesian modeling of weekly traffic accidents across Poland.

Figure 10. Prior predictive check for Model 4 in the weekly nationwide traffic accident analysis. Similar to Model 3, this histogram illustrates the comparison between prior predictions (red) and observed data (blue) using logarithmic density scaling. The chosen prior specifications successfully span the entire range of empirical accident counts, demonstrating that Model 4’s parameter space appropriately accommodates the observed data patterns across Polish voivodeships. This confirms the suitability of both prior distributions for subsequent Bayesian inference.

Figure 11. Prior predictive distributions for Model 4 in the weekly nationwide accident count analysis. The first row presents the overall histogram of prior samples for weather predictors: (a) mean temperature, (b) mean humidity, and (c) total precipitation. The second row displays prior distributions for socioeconomic predictors: (d) population density, (e) number of passenger cars, and (f) road density. The bottom row shows the prior distribution for (g) the model intercept.

Figure 12. Model 1 posterior predictive checks: (a) Comparison of observed accident frequency distribution (blue) versus model-predicted distribution (red), showing the model capturing the central tendency with some discrepancies in the higher accident ranges. (b) Scatter plot of observed versus predicted yearly accident counts with the identity line (red dashed). Points clustering around this line indicate accurate predictions, with increasing variance at higher accident counts. The coefficient of determination ($R^{2} = 0.810$) indicates good model fit.

Figure 13. Observed vs. predicted yearly county-level accident rates from Model 1. This scatter plot displays the model’s predicted rates against the observed rates on a log–log scale. The coefficient of determination ($R^{2} = 0.828$) indicates that the socioeconomic factors in the model account for approximately 82.8% of the variance in the observed accident rates, signifying a good model fit.
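
The captions report a coefficient of determination but do not state exactly how it was computed. A minimal sketch, assuming the common definition $1 - SS_{res}/SS_{tot}$ measured against the identity line, with an optional log transform to mirror a log–log plot, could look like this. The function name `r_squared` and the sample numbers are illustrative assumptions, not data or code from the study.

```python
import numpy as np

def r_squared(observed, predicted, log_scale=False):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    With log_scale=True both inputs are log-transformed first, which mirrors
    reading the fit off a log-log scatter plot. This is an assumption: the
    captions do not say which definition the authors used.
    """
    y = np.asarray(observed, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    if log_scale:
        y, y_hat = np.log(y), np.log(y_hat)
    ss_res = np.sum((y - y_hat) ** 2)      # squared distance to the identity line
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation of the observations
    return 1.0 - ss_res / ss_tot

# Made-up illustrative rates, not data from the study.
observed = np.array([1.0, 2.0, 4.0, 8.0])
predicted = np.array([1.1, 1.9, 4.2, 7.5])

r2_raw = r_squared(observed, predicted)
r2_log = r_squared(observed, predicted, log_scale=True)
print(f"R^2 on the raw scale: {r2_raw:.3f}")
print(f"R^2 on the log scale: {r2_log:.3f}")
```

The two values generally differ, because the log transform down-weights errors at large counts; this is one plausible reason the count-based and rate-based captions report different figures for the same model.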

Figure 14. Posterior distributions for model parameters: (a) Posterior density for the model intercept ($α$), showing a narrow, approximately normal distribution with a high-precision estimate of yearly accident counts across Polish counties (median: −8.839 [95% CI: −8.974, −8.705]). (b) Posterior densities for the three socioeconomic predictors: population density (median: 0.032 [95% CI: 0.020, 0.046]) (top), passenger car ownership (median: 1.190 [95% CI: 1.178, 1.202]) (middle), and road density (median: −0.112 [95% CI: −0.137, −0.087]) (bottom). Population density shows a slight positive effect, passenger car ownership demonstrates a strong positive effect, while road density has a negative effect.
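
To read these coefficients as effect sizes, it can help to exponentiate them. The sketch below assumes the Poisson regression uses a log link, so that $\exp(β)$ is the multiplicative change in the expected accident count per one-unit increase of a predictor on whatever scale the model used. That scale (raw, standardized, or log-transformed) is not stated in this caption, so the numbers are only a rough guide. The dictionary keys and the helper `rate_ratio` are hypothetical names.

```python
import math

# Posterior medians quoted in the Figure 14 caption (Model 1).
posterior_medians = {
    "population_density": 0.032,
    "passenger_cars": 1.190,
    "road_density": -0.112,
}

def rate_ratio(beta):
    """Multiplicative change in the expected count per one-unit increase
    of a predictor, assuming a log link: exp(beta)."""
    return math.exp(beta)

rate_ratios = {name: rate_ratio(b) for name, b in posterior_medians.items()}
for name, rr in rate_ratios.items():
    print(f"{name}: x{rr:.3f} per unit of the (possibly transformed) predictor")
```

Under that assumption, a ratio above 1 corresponds to the "positive effect" wording in the caption and a ratio below 1 to the "negative effect" wording.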

Figure 15. Model 2 posterior predictive checks: (a) Comparison of observed accident frequency distribution (blue) versus model-predicted distribution (red), showing alignment similar to that of Model 1. (b) Scatter plot of observed versus predicted yearly accident counts with the identity line (red dashed). The coefficient of determination ($R^{2} = 0.813$) indicates a slightly better model fit than Model 1.

Figure 16. Observed vs. predicted yearly county-level accident rates for Model 2. The log–log scatter plot demonstrates a strong positive correlation, with points clustering around the identity line (red dashed). The coefficient of determination ($R^{2} = 0.833$) indicates a very good model fit and a slight improvement in predictive performance compared to Model 1.

Figure 17. Posterior distributions for Model 2 parameters: (a) Intercept posterior density showing a well-defined, narrow distribution with high estimate precision. (b) Socioeconomic predictors, including population density—positive effect (top), passenger car ownership—strong positive effect (middle), and road density—negative effect (bottom). (c) Weather predictors showing mean temperature—negative effect (top), mean humidity—positive effect (middle), and total precipitation—negative effect (bottom) on accident counts across counties.

Figure 18. Model 3 posterior predictive checks: (a) Comparison of observed accident frequency distribution (blue) versus model-predicted distribution (red), showing the model capturing the central tendency with notable discrepancies in the 300–400 accident range, where the model predicts higher frequencies than observed. (b) Scatter plot of observed versus predicted weekly accident counts with the identity line (red dashed). Points clustering around this line indicate accurate predictions, with increasing variance at both lower and higher accident counts, suggesting areas for potential model improvement. The coefficient of determination ($R^{2} = 0.607$) indicates moderate model fit.

Figure 19. Posterior distributions for model parameters: (a) Posterior density for the model intercept ($α$), showing a narrow, approximately normal distribution with a high-precision estimate of weekly accident counts across Poland. (b) Posterior densities for the three weather predictors: mean temperature (top), mean humidity (middle), and total precipitation (bottom). Temperature shows a strong positive effect, humidity demonstrates a moderate positive effect, while precipitation has a slight negative effect.

Figure 20. Model 4 posterior predictive checks: (a) Comparison of observed accident frequency distribution (blue) versus model-predicted distribution (red), showing improved alignment compared to Model 3, particularly in the 350–450 accident range. (b) Scatter plot of observed versus predicted weekly accident counts with the identity line (red dashed). The reduced scatter around the line indicates improved predictive performance across most of the data range, particularly in the 300–500 accident range. The coefficient of determination ($R^{2} = 0.643$) indicates improved model fit compared to Model 3.

Figure 21. Posterior distributions for Model 4 parameters: (a) Intercept posterior density showing a well-defined, narrow distribution with high estimate precision. (b) Socioeconomic predictors including population density—positive effect (top), passenger car ownership—strong positive effect (middle), and road density—negative effect (bottom). (c) Weather predictors showing consistent effects with Model 3: temperature—positive effect (top), humidity—positive effect (middle), and precipitation—slight negative effect (bottom).

Figure 22. Comparison of yearly county-level models using the (a) PSIS-LOO and (b) WAIC criteria. The dashed vertical line marks the reference point for model comparison: the expected log pointwise predictive density (ELPD-LOO or ELPD-WAIC) of the best-performing model (Model 2). Both model comparison metrics yield consistent results.

Figure 23. Comparison of weekly nationwide models using the (a) PSIS-LOO and (b) WAIC criteria. The dashed vertical line indicates the reference point for model comparison based on the expected log pointwise predictive density (ELPD-LOO or ELPD-WAIC). Both model comparison metrics yield consistent results.

Table 1. Variables used in the yearly county-level accident count modeling.

Variable	Description
county_mean_temperature	Annual average of daily mean temperatures per county (°C)
county_mean_humidity	Annual average of daily mean relative humidity per county (%)
county_total_precipitation	Annual accumulated precipitation per county (mm)
county_population_density	Population per square kilometer
county_passenger_cars	Number of registered passenger vehicles per county
county_road_density	Length of paved municipal and county roads per 100 km²

Table 2. Variables used in the weekly nationwide accident count modeling.

Variable Name	Description
weekly_mean_temperature	Weekly average of daily mean temperatures across all counties (°C)
weekly_mean_humidity	Weekly average of daily mean relative humidity across all counties (%)
weekly_total_precipitation	Total weekly precipitation across all counties (mm)
yearly_population_density	National population density for the corresponding year (inhabitants/km²)
yearly_passenger_cars	Total number of registered passenger vehicles nationwide
yearly_road_density	National average of municipal and county paved roads per 100 km²

Table 3. Prior distribution parameters for the Bayesian models.

Parameter	Yearly County-Level Models	Weekly Nationwide Models
$μ_{α}$	3.0	2.0
$σ_{α}$	0.3	0.5
$σ_{β}$	0.2	0.15
$\log (λ_{max})$	6.7	6.8
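
The values in Table 3 can be fed into a simple prior predictive simulation. The sketch below is only an illustration under several assumptions that this table does not confirm: the intercept is drawn from Normal($μ_{α}$, $σ_{α}$), each slope from Normal(0, $σ_{β}$), the predictors are standardized, the link is logarithmic, and $\log (λ_{max})$ acts as an upper cap on the log rate. The function name, the random design matrix, and the sample sizes are hypothetical. The paper's models may include further structure (for example exposure offsets), so the simulated counts are not expected to reproduce the published prior predictive figures.

```python
import numpy as np

def prior_predictive_counts(mu_alpha, sigma_alpha, sigma_beta, log_lambda_max,
                            n_predictors=3, n_obs=200, n_draws=500, seed=0):
    """Draw Poisson counts from the priors alone (no data are used).

    Assumed structure: log(lambda) = alpha + X @ beta, capped at log_lambda_max.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical standardized predictors standing in for the real covariates.
    X = rng.standard_normal((n_obs, n_predictors))
    alpha = rng.normal(mu_alpha, sigma_alpha, size=n_draws)
    beta = rng.normal(0.0, sigma_beta, size=(n_draws, n_predictors))
    log_lam = alpha[:, None] + beta @ X.T          # shape (n_draws, n_obs)
    log_lam = np.minimum(log_lam, log_lambda_max)  # assumed role of log(lambda_max)
    return rng.poisson(np.exp(log_lam))

# Prior settings from Table 3.
yearly = prior_predictive_counts(3.0, 0.3, 0.2, 6.7)
weekly = prior_predictive_counts(2.0, 0.5, 0.15, 6.8)

for label, draws in (("yearly county-level", yearly), ("weekly nationwide", weekly)):
    lo, hi = np.percentile(draws, [2.5, 97.5])
    print(f"{label}: 95% of simulated counts fall between {lo:.0f} and {hi:.0f}")
```

A prior predictive check then compares a histogram of such simulated counts with the observed counts, looking for priors that cover the empirical range without placing much mass on implausible values.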

Table 4. Comparison of county-level models based on PSIS-LOO criterion for accident count prediction.

Model	Rank	ELPD-LOO	p-LOO	ELPD Diff	Weight	SE	dSE
Model 2	0	−12,788.50	152.42	0.00	0.54	436.22	0.00
Model 1	1	−12,872.12	92.02	83.61	0.46	447.29	67.57

Table 5. Comparison of county-level models based on WAIC criterion for accident count prediction.

Model	Rank	ELPD-WAIC	p-WAIC	ELPD Diff	Weight	SE	dSE
Model 2	0	−12,789.84	153.76	0.00	0.54	436.54	0.00
Model 1	1	−12,874.76	94.67	84.93	0.46	448.08	67.63

Table 6. Comparison of weekly nationwide models based on PSIS-LOO criterion for accident count prediction.

Model	Rank	ELPD-LOO	p-LOO	ELPD Diff	Weight	SE	dSE
Model 4	0	−1940.79	70.39	0.00	0.64	115.07	0.00
Model 3	1	−2019.78	51.86	78.99	0.36	105.23	43.17

Table 7. Comparison of weekly nationwide models based on WAIC criterion for accident count prediction.

Model	Rank	ELPD-WAIC	p-WAIC	ELPD Diff	Weight	SE	dSE
Model 4	0	−1941.50	72.26	0.00	0.64	115.10	0.00
Model 3	1	−2019.21	51.29	77.70	0.36	105.12	43.28
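
One way to read Tables 4–7 is to express each ELPD difference in units of its own standard error (dSE). The short calculation below does this with the values copied from the tables. The helper name `diff_in_se_units` is an illustrative assumption, and interpreting the ratio against a threshold is a heuristic rather than something the tables themselves state.

```python
# (comparison label, ELPD difference, standard error of the difference dSE),
# copied from Tables 4-7.
comparisons = [
    ("County, PSIS-LOO: Model 2 vs Model 1", 83.61, 67.57),
    ("County, WAIC: Model 2 vs Model 1", 84.93, 67.63),
    ("Weekly, PSIS-LOO: Model 4 vs Model 3", 78.99, 43.17),
    ("Weekly, WAIC: Model 4 vs Model 3", 77.70, 43.28),
]

def diff_in_se_units(elpd_diff, d_se):
    """ELPD difference divided by the standard error of that difference."""
    return elpd_diff / d_se

ratios = {label: diff_in_se_units(d, s) for label, d, s in comparisons}
for label, ratio in ratios.items():
    print(f"{label}: difference is {ratio:.2f} standard errors")
```

The ratios come out at roughly 1.2 to 1.3 for the county-level models and about 1.8 for the weekly models. Both criteria therefore rank the multifactorial models first, while the uncertainty in each difference remains of the same order as the difference itself, which is consistent with the weights being split between the models rather than concentrated on one.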

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Filapek, A.; Faruga, Ł.; Baranowski, J. Bayesian Modeling of Traffic Accident Rates in Poland Based on Weather Conditions. Appl. Sci. 2025, 15, 7332. https://doi.org/10.3390/app15137332

AMA Style

Filapek A, Faruga Ł, Baranowski J. Bayesian Modeling of Traffic Accident Rates in Poland Based on Weather Conditions. Applied Sciences. 2025; 15(13):7332. https://doi.org/10.3390/app15137332

Chicago/Turabian Style

Filapek, Adam, Łukasz Faruga, and Jerzy Baranowski. 2025. "Bayesian Modeling of Traffic Accident Rates in Poland Based on Weather Conditions" Applied Sciences 15, no. 13: 7332. https://doi.org/10.3390/app15137332

APA Style

Filapek, A., Faruga, Ł., & Baranowski, J. (2025). Bayesian Modeling of Traffic Accident Rates in Poland Based on Weather Conditions. Applied Sciences, 15(13), 7332. https://doi.org/10.3390/app15137332

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.
