Using Neural Networks to Predict the Frequency of Traffic Accidents by Province in Poland

Gorzelańczyk, Piotr; Zabel, Jacek; Sokolovskij, Edgar

doi:10.3390/app15169108


by Piotr Gorzelańczyk ¹, Jacek Zabel ² and Edgar Sokolovskij ³,*

¹ Department of Transport, Stanislaw Staszic State University of Applied Sciences in Pila, Podchorazych Str. 10, 64-920 Pila, Poland

² Laboratory of Vision Science and Optometry, Faculty of Physics and Astronomy, Adam Mickiewicz University of Poznan, Uniwersytetu Poznanskiego Str. 2, 61-614 Poznan, Poland

³ Department of Automobile Engineering, Faculty of Transport Engineering, Vilnius Gediminas Technical University, Plytinės Str. 25, 10105 Vilnius, Lithuania

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(16), 9108; https://doi.org/10.3390/app15169108

Submission received: 7 July 2025 / Revised: 8 August 2025 / Accepted: 17 August 2025 / Published: 19 August 2025

(This article belongs to the Special Issue Simulations and Experiments in Design of Transport Vehicles)


Abstract

Road traffic fatalities remain a significant global issue, despite a gradual decline in recent years. Although the number of accidents has decreased—partly due to reduced mobility during the pandemic—the figures remain alarmingly high. To further reduce these numbers, it is crucial to identify regions with the highest accident rates and predict future trends. This study aims to forecast traffic accident occurrences across Poland’s provinces. Using official police data on annual accident statistics, we analyzed historical trends and applied predictive modeling in Statistica to estimate accident rates from 2022 to 2040. Several neural network models were employed to generate these projections. The findings indicate that a significant reduction in road accidents is unlikely in the near future, with rates expected to stabilize rather than decline. The accuracy of predictions was influenced by the random sampling distribution used in model training. Specifically, a 70-15-15 split (70% training, 15% testing, and 15% validation) yielded an average error of 1.75%, and an 80-10-10 split reduced the error to 0.63%, demonstrating the impact of sample allocation on predictive performance. These results highlight the importance of dataset partitioning in accident forecasting models.

Keywords: traffic accident; forecasting; neural networks

1. Introduction

Annual fatalities caused by road accidents remain alarmingly high. According to data from the World Health Organization (WHO), approximately 1.3 million lives are lost globally each year due to traffic collisions. Road crashes are the leading cause of mortality among individuals aged 5 to 29. In response, the United Nations has set an ambitious goal to reduce traffic-related fatalities and serious injuries by 50% before 2030 [1].

The losses caused by traffic accidents, together with the cost of investigating their circumstances, represent a substantial burden in every country. Consequently, research is increasingly concerned not only with analyzing the causes and consequences of traffic accidents, their reconstruction, and the investigation of their circumstances [2,3], but also with assessing accident risk [4,5] and predicting accidents [6]. Such work allows future risks to be managed better and countermeasures to be planned.

2. Literature Review

Various approaches have been developed to predict accident rates. Among these, time series analysis is one of the most widely applied methods [7]. However, this technique has certain drawbacks, including difficulties in evaluating prediction accuracy based on historical data and frequent autocorrelation in residual components [8]. Researchers have explored alternative models; for instance, Procházka et al. [9] employed a multi-seasonal forecasting model, while Sunny et al. [10] applied Holt–Winters exponential smoothing. A key limitation of the latter is its inability to account for external variables in predictions [11,12].

Forecasting the number of traffic accidents has attracted attention both for its social dimension and for its practical significance for transport policy, and a variety of methods have been used to date. The most common are time series models, such as ARIMA or Holt–Winters, but they are limited in their ability to integrate exogenous variables and are sensitive to autocorrelation in the residuals [13]. Regression models are also used for forecasting; they rely on linear relationships and require stationary data, so their effectiveness decreases when the phenomena are nonlinear [14]. ANOVA models and other classical statistical methods form a further group. They are most often used in regional forecasting, for example in Poland [15], but rest on assumptions that, if violated, lead to erroneous conclusions.

A growing number of studies indicate that neural networks are more effective in forecasting nonlinear and seasonal phenomena [16]. MLP (multilayer perceptron) models allow complex data, such as accident counts, to be analyzed while taking seasonality and local trends into account. With regard to the COVID-19 pandemic, numerous studies have confirmed a significant reduction in road accidents during lockdowns, of up to 30% [17]; the trend returned to previous values after restrictions were lifted.

3. Materials and Methods

In recent years, the number of traffic accidents in Poland has been decreasing, which is partly attributed to the COVID-19 pandemic and legislative action. According to the Police Headquarters, 30,288 accidents were reported in 2019, with 2909 fatalities. Fatalities fell to 2491 in 2020 (−14% against 2019). The trend continued in the following years: in 2021, there were 22,816 accidents (2245 killed), while in 2022, 21,322 accidents occurred (1896 killed). A further decline in 2023 (20,936 accidents, 1893 killed) shows that the number of incidents remained at a reduced level after the pandemic restrictions were lifted. There was a slight rebound in 2024, with 21,519 accidents and 1896 fatalities reported.

The trend is also evident in the charts. The CSO analysis shows a general decline in accidents, fatalities, and injuries from 2013 to 2022. For comparison, 22.6 thousand accidents and 2212 fatalities were observed in 2021, while 21.5 thousand accidents and 1896 deaths were reported in 2024. Overall, the number of people killed fell from 2909 in 2019 to 1896 in 2024, a reduction of about 35%, with a drop of roughly 14% already in the first pandemic year.

Poland is home to about 38 million people (Table 1). Its total area is 312,705 square kilometers, divided into 16 provinces (Table 2, Figure 1). Across the provinces studied, the number of accidents fell by more than 56% on average between 2001 and 2021. The decrease was largest in Kuyavia–Pomerania (70%) and Podlaskie Province (69%) and smallest in Lubusz Province (32%). The number of traffic accidents in a province depends on its population. Poland’s accident rate is exceptionally high compared with other EU countries. Consequently, every effort should be made to reduce this number and to identify the provinces with the highest number of traffic accidents (Figure 2 and Figure 3).

Table 1. Population of Poland from 2001 to 2020 [18].

Year	Lower Silesia	Kuyavia–Pomerania	Lublin Province	Lubusz Province	Lodz Province	Lesser Poland	Masovia	Opole Province
2001	2,909,622	2,069,747	2,201,720	1,008,983	2,617,318	3,236,268	5,121,681	1,066,438
2002	2,904,694	2,069,166	2,196,992	1,008,196	2,607,380	3,237,217	5,128,623	1,061,009
2003	2,898,313	2,068,142	2,191,172	1,008,786	2,597,094	3,252,949	5,135,732	1,055,667
2004	2,893,055	2,068,258	2,185,156	1,009,168	2,587,702	3,260,201	5,145,997	1,051,531
2005	2,888,232	2,068,253	2,179,611	1,009,198	2,577,465	3,266,187	5,157,729	1,047,407
2006	2,882,317	2,066,371	2,172,766	1,008,520	2,566,198	3,271,206	5,171,702	1,041,941
2007	2,878,410	2,066,136	2,166,213	1,008,481	2,555,898	3,279,036	5,188,488	1,037,088
2008	2,877,059	2,067,918	2,161,832	1,008,962	2,548,861	3,287,136	5,204,495	1,033,040
2009	2,876,627	2,069,083	2,157,202	1,010,047	2,541,832	3,298,270	5,222,167	1,031,097
2010	2,917,242	2,098,711	2,178,611	1,023,215	2,542,436	3,336,699	5,267,072	1,017,241
2011	2,916,577	2,098,370	2,171,857	1,023,158	2,533,681	3,346,796	5,285,604	1,013,950
2012	2,914,362	2,096,404	2,165,651	1,023,317	2,524,651	3,354,077	5,301,760	1,010,203
2013	2,909,997	2,092,564	2,156,150	1,021,470	2,513,093	3,360,581	5,316,840	1,004,416
2014	2,908,457	2,089,992	2,147,746	1,020,307	2,504,136	3,368,336	5,334,511	1,000,858
2015	2,904,207	2,086,210	2,139,726	1,018,075	2,493,603	3,372,618	5,349,114	996,011
2016	2,903,710	2,083,927	2,133,340	1,017,376	2,485,323	3,382,260	5,365,898	993,036
2017	2,902,547	2,082,944	2,126,317	1,016,832	2,476,315	3,391,380	5,384,617	990,069
2018	2,901,225	2,077,775	2,117,619	1,014,548	2,466,322	3,400,577	5,403,412	986,506
2019	2,900,163	2,072,373	2,108,270	1,011,592	2,454,779	3,410,901	5,423,168	982,626
2020	2,891,321	2,061,942	2,095,258	1,007,145	2,437,970	3,410,441	5,425,028	976,774
Year	Subcarpathia	Podlaskie Province	Pomerania	Silesia	Holy Cross	Warmia-Masuria	Greater Poland	West Pomerania
2001	2,104,138	1,209,439	2,178,337	4,741,816	1,299,382	1,428,469	3,350,437	1,698,402
2002	2,105,050	1,207,704	2,183,636	4,731,533	1,295,885	1,428,449	3,355,279	1,697,718
2003	2,097,248	1,205,117	2,188,918	4,714,982	1,291,598	1,428,885	3,359,932	1,696,073
2004	2,097,975	1,202,425	2,194,041	4,700,771	1,288,693	1,428,714	3,365,283	1,694,865
2005	2,098,263	1,199,689	2,199,043	4,685,775	1,285,007	1,428,601	3,372,417	1,694,178
2006	2,097,564	1,196,101	2,203,595	4,669,137	1,279,838	1,426,883	3,378,502	1,692,838
2007	2,097,338	1,192,660	2,210,920	4,654,115	1,275,550	1,426,155	3,386,882	1,692,271
2008	2,099,495	1,191,470	2,219,512	4,645,665	1,272,784	1,427,073	3,397,617	1,692,957
2009	2,101,732	1,189,731	2,230,099	4,640,725	1,270,120	1,427,118	3,408,281	1,693,198
2010	2,127,948	1,203,448	2,275,494	4,634,935	1,282,546	1,453,782	3,446,745	1,723,741
2011	2,128,687	1,200,982	2,283,500	4,626,357	1,278,116	1,452,596	3,455,477	1,722,739
2012	2,129,951	1,198,690	2,290,070	4,615,870	1,273,995	1,450,697	3,462,196	1,721,405
2013	2,129,294	1,194,965	2,295,811	4,599,447	1,268,239	1,446,915	3,467,016	1,718,861
2014	2,129,187	1,191,918	2,302,077	4,585,924	1,263,176	1,443,967	3,472,579	1,715,431
2015	2,127,657	1,188,800	2,307,710	4,570,849	1,257,179	1,439,675	3,475,323	1,710,482
2016	2,127,656	1,186,625	2,315,611	4,559,164	1,252,900	1,436,367	3,481,625	1,708,174
2017	2,129,138	1,184,548	2,324,251	4,548,180	1,247,732	1,433,945	3,489,210	1,705,533
2018	2,129,015	1,181,533	2,333,523	4,533,565	1,241,546	1,428,983	3,493,969	1,701,030
2019	2,127,164	1,178,353	2,343,928	4,517,635	1,233,961	1,422,737	3,498,733	1,696,193
2020	2,121,229	1,173,286	2,346,671	4,492,330	1,224,626	1,416,495	3,496,450	1,688,047

Table 2. Area and population by province in Poland in 2020 [18].

Province	Area (ha)	Area (km²)	Population (total)	Population (per km²)
Poland	31,270,525	312,705	38,265,013	122
Lower Silesia	1,994,670	19,947	2,891,321	145
Kuyavia–Pomerania	1,797,134	17,971	2,061,942	115
Lublin Province	2,512,246	25,123	2,095,258	83
Lubusz Province	1,398,793	13,988	1,007,145	72
Łodz Province	1,821,895	18,219	2,437,970	134
Lesser Poland	1,518,279	15,183	3,410,441	225
Masovia	3,555,847	35,559	5,425,028	153
Opole Province	941,187	9412	976,774	104
Subcarpathia	1,784,576	17,846	2,121,229	119
Podlaskie Province	2,018,702	20,187	1,173,286	58
Pomerania	1,832,368	18,323	2,346,671	128
Silesia	1,233,309	12,333	4,492,330	364
Holy Cross	1,171,050	11,710	1,224,626	105
Warmia-Masuria	2,417,347	24,173	1,416,495	59
Greater Poland	2,982,650	29,826	3,496,450	117
West Pomerania	2,290,472	22,905	1,688,047	74

Figure 1. Location of provinces in Poland [19].

Figure 2. Number of road accidents in Poland by province from 2009 to 2021 [20].

Figure 3. Number of road accidents in Poland by province in 2021 [20].

To predict traffic accident rates across Polish provinces, multiple neural network models were employed. A key strength of this approach lies in its ability to simulate human cognitive processes. These networks are structured with interconnected nodes, each processing its inputs through adjustable weights and biases to produce an output. For this study, Statistica’s artificial neural network (ANN) module was used to iteratively optimize these weights during training.

The predictive framework relied on multilayer perceptron (MLP) networks, each featuring a single hidden layer. The input layer comprised 10 neurons, corresponding to the historical traffic accident data (time series) for each province. The hidden layer’s architecture was tested with two to eight neurons, while the output layer consisted of a single neuron generating the predicted accident values. Model performance was contingent upon two factors: the selected network topology (e.g., neuron count in the hidden layer) and the parameter configuration (e.g., weight optimization during training).
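The topology described above can be illustrated with a minimal forward-pass sketch. It assumes a tanh hidden layer and a linear output neuron, which is only one of the activation combinations considered in the study; the function names and the random initial weights are hypothetical and do not come from Statistica.

```python
import math
import random


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a single-hidden-layer perceptron with one output neuron."""
    # Hidden layer: weighted sum of the inputs plus bias, passed through tanh (assumed).
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Output neuron: linear combination of the hidden activations (assumed linear).
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def random_mlp(n_inputs, n_hidden, seed=0):
    """Create small random weights and zero biases (illustrative initialisation only)."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return w_hidden, b_hidden, w_out, 0.0


def count_parameters(n_inputs, n_hidden):
    """Number of adjustable weights and biases in an n_inputs-n_hidden-1 network."""
    return n_inputs * n_hidden + n_hidden + n_hidden + 1


if __name__ == "__main__":
    for n_hidden in range(2, 9):  # two to eight hidden neurons, as in the text
        print(f"MLP 10-{n_hidden}-1: {count_parameters(10, n_hidden)} parameters")
```

Under these assumptions, the smallest network tested (10-2-1) has 25 adjustable parameters and the largest (10-8-1) has 97.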

A neural network is a computational model inspired by the human nervous system, designed to process information through interconnected layers. These models are structured in such a way that data—whether in the form of images, audio, text, or numerical values—are introduced at the input layer and subsequently processed through intermediate layers before generating a final output. During training, the system learns patterns in the input data, enabling it to analyze complex information and make decisions based on learned relationships.

At the heart of such systems are artificial neurons, which serve as simplified representations of their biological counterparts. Just like biological neurons, artificial ones receive multiple inputs, process them, and produce a single output signal. These components mimic the function of dendrites in the brain, allowing for the transformation of input data into a meaningful response. Neural networks are a foundational element of artificial intelligence, supporting the development of intelligent systems capable of organizing knowledge hierarchically and making autonomous decisions [21].

Their versatility enables neural networks to be applied in numerous domains. For example, streaming platforms utilize them to recommend shows and movies based on user behavior, while services like Google Translate rely on them for language translation. In e-commerce, they assist in customizing product recommendations during auctions, and in the transport sector, they are used to forecast traffic accident occurrences [22,23,24,25].

In this context, a neural network model was applied to estimate the number of traffic accidents across the provinces studied. One of the major advantages of such systems lies in their ability to simulate cognitive functions of the human brain. These models operate through networks composed of input nodes, adjustable weights, bias values, and output nodes.

Neural networks emulate the processes of the nervous system through mathematical formulations. Their architecture typically involves several layers, starting with an input layer that assimilates data of various types. The system then propagates this information through one or more hidden layers—often referred to as transition layers—before producing a result at the output layer (Figure 4). Each stage plays a critical role in extracting relevant features from the data and generating accurate predictions.

For the modeling process, Statistica 13.3, which includes dedicated tools for neural network analysis, was used to optimize the weight values during training. A multilayer perceptron (MLP) architecture was implemented, with a hidden layer composed of a variable number of neurons, ranging from two to eight, depending on the case. The output layer consisted of a single neuron that produced the forecasted values of traffic accident occurrences in the studied regions.

The neural network model developed in this study is based primarily on historical time series data of road accident frequency for each province in Poland, covering the years 2001 to 2021. These data served as the sole input to the multilayer perceptron (MLP) networks used for prediction. The input layer consisted of 10 neurons, each representing accident counts from the ten most recent years in the training set. The model architecture also included a hidden layer (with two to eight neurons) and an output neuron producing the projected number of accidents.
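One plausible way to turn an annual series into inputs for a 10-neuron input layer is a sliding window of lagged values. The sketch below assumes this construction; the text does not state exactly how Statistica forms its samples, and `make_windows` is a hypothetical helper.

```python
def make_windows(series, n_lags=10):
    """Build (inputs, target) pairs: each target year is predicted from the n_lags preceding years."""
    inputs, targets = [], []
    for end in range(n_lags, len(series)):
        inputs.append(list(series[end - n_lags:end]))  # the ten most recent values
        targets.append(series[end])                    # the value to be forecast
    return inputs, targets


if __name__ == "__main__":
    # Placeholder values standing in for 21 annual observations (2001-2021); not real data.
    placeholder_series = list(range(21))
    x, y = make_windows(placeholder_series)
    print(len(x), "samples of", len(x[0]), "inputs each")
```

With 21 annual observations and 10 lags, this construction yields 11 input-target pairs per province, which is worth keeping in mind when interpreting percentage-based data splits.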

Although the model does not explicitly incorporate exogenous variables such as traffic volume, meteorological conditions, or infrastructure characteristics, these factors are indirectly reflected in the data used, for instance in the following ways:

Demographic and infrastructural differences between provinces influence the historical accident rates and are thus implicitly embedded in the model.
Legislative changes, such as increased penalties for violations introduced in 2022, and external shocks like the COVID-19 pandemic, are represented through fluctuations in the time series.
Temporal patterns, including long-term trends and seasonal variations, are captured during model training.

In essence, while the model does not isolate specific causal factors, their cumulative effects are learned by the network through historical data. Future work may include enriching the model with additional explanatory variables to improve its predictive capabilities and interpretability.

This methodology underscores the flexibility of ANNs in modeling complex, nonlinear relationships within traffic safety data. Forecast quality was assessed using the following forecast errors, calculated with Equations (1)–(5):

ME (mean error):

$ME = \frac{1}{n} \sum_{i = 1}^{n} (Y_{i} - Y_{p}),$ (1)

MAE (mean absolute error):

$MAE = \frac{1}{n} \sum_{i = 1}^{n} |Y_{i} - Y_{p}|,$ (2)

MPE (mean percentage error):

$MPE = \frac{1}{n} \sum_{i = 1}^{n} \frac{Y_{i} - Y_{p}}{Y_{i}},$ (3)

MAPE (mean absolute percentage error):

$MAPE = \frac{1}{n} \sum_{i = 1}^{n} \frac{|Y_{i} - Y_{p}|}{Y_{i}},$ (4)

SSE (root mean square error):

$SSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(Y_{i} - Y_{p})}^{2}}$ (5)

where

n is the length of the forecast horizon;

Y_i is the observed value of traffic accidents;

Y_p is the forecast value of traffic accidents.
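As an illustration, Equations (1)–(5) can be computed as in the following sketch. The function name `forecast_errors` is hypothetical, the key "SSE" follows the paper's symbol for Equation (5), which as written includes a square root, and the sample values are invented for demonstration only.

```python
import math


def forecast_errors(observed, forecast):
    """Return ME, MAE, MPE, MAPE and SSE as defined in Equations (1)-(5)."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("observed and forecast must be non-empty and of equal length")
    n = len(observed)
    diffs = [y - p for y, p in zip(observed, forecast)]  # Y_i - Y_p
    return {
        "ME": sum(diffs) / n,
        "MAE": sum(abs(d) for d in diffs) / n,
        # Percentage errors are returned as fractions; multiply by 100 for percent.
        "MPE": sum(d / y for d, y in zip(diffs, observed)) / n,
        "MAPE": sum(abs(d) / y for d, y in zip(diffs, observed)) / n,
        "SSE": math.sqrt(sum(d * d for d in diffs) / n),
    }


if __name__ == "__main__":
    # Invented numbers, not taken from the study.
    for name, value in forecast_errors([100.0, 200.0], [110.0, 190.0]).items():
        print(name, round(value, 4))
```

Because MPE and MAPE divide by the observed value, the sketch assumes strictly positive observations, which holds for annual accident counts at province level.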

Neural network models with the lowest mean percentage error and mean absolute percentage error were used to forecast the frequency of traffic accidents.

4. Results

Differences in the number of traffic accidents between provinces were examined using the Kruskal–Wallis nonparametric test, which indicated that the provinces do not share a common median. The test statistic was 260.4, with p < 0.001. It can therefore be concluded that accident counts differ statistically significantly between provinces.
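For readers unfamiliar with the test, the Kruskal–Wallis H statistic can be computed from pooled ranks as in the sketch below. This is a generic textbook implementation with tie correction, not the Statistica routine used in the study, and the example groups are invented.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (with tie correction) for a list of samples."""
    # Pool all observations, remembering which group each one came from.
    pooled = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and all receive the average rank.
        avg_rank = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        i = j
    h = 12.0 / (n_total * (n_total + 1)) * sum(
        r * r / len(group) for r, group in zip(rank_sums, groups)
    ) - 3.0 * (n_total + 1)
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    if correction == 0.0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


if __name__ == "__main__":
    # Invented toy samples, not the provincial accident data.
    print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Under the null hypothesis, H is approximately chi-square distributed with k − 1 degrees of freedom, where k is the number of groups (15 for 16 provinces).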

Moreover, as Figure 2 and Figure 3 show, the number of accidents differs noticeably between provinces. Figure 5 clearly shows that the most populous provinces, Masovia and Silesia, also have the highest frequency of traffic accidents, while Lubusz Province and Podlaskie Province have the lowest frequency.

Road Accident Forecasting

The number of road accidents in the provinces under consideration was projected using annual data from the Polish Police from 2001 to 2021 [20]. The following random sample splits were used in Statistica:

- Training 70%, testing 15%, and validation 15%;
- Training 80%, testing 10%, and validation 10%.

Most studies use a random split of 70% for training, 15% for testing, and 15% for validation. Because no study could be found that examines whether and how the split affects forecasting performance, a second split of 80% training, 10% testing, and 10% validation was also adopted. The analysis was carried out for each split and for the following numbers of trained networks: 20, 40, 60, 80, 100, and 200. Training several networks reduces the risk that learning halts prematurely at a local minimum, although increasing this number also produces a large surplus of candidate networks.
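A random partition of samples into the three subsets can be sketched as follows. The helper `split_indices` and its seed are hypothetical; Statistica performs this sampling internally, and its exact procedure is not described in the text.

```python
import random


def split_indices(n_samples, train_frac, test_frac, seed=0):
    """Randomly partition sample indices into training, testing and validation subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    n_train = round(n_samples * train_frac)
    n_test = round(n_samples * test_frac)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]    # whatever remains
    return train, test, validation


if __name__ == "__main__":
    for fractions in ((0.70, 0.15), (0.80, 0.10)):
        parts = split_indices(100, *fractions)
        print([len(p) for p in parts])
```

With small sample counts, rounding means the realised proportions can deviate noticeably from the nominal 70-15-15 or 80-10-10 values.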

These settings were varied to show how their choice affected the results. The neurons in the hidden and output layers were allowed to use logistic, linear, exponential, and hyperbolic tangent activation functions. The activation function determines a neuron’s output value: it is applied to the neuron’s total weighted excitation to produce its output signal.
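The four activation functions named above can be written out as in this short sketch. The text does not give the formulas; in particular, the negative-exponential form used for "exponential" is an assumption and may differ from the definition in Statistica.

```python
import math

# Each function maps a neuron's total excitation s to its output signal.
ACTIVATIONS = {
    "logistic": lambda s: 1.0 / (1.0 + math.exp(-s)),  # range (0, 1)
    "linear": lambda s: s,                             # identity
    "exponential": lambda s: math.exp(-s),             # assumed negative-exponential form
    "tanh": math.tanh,                                 # range (-1, 1)
}


def neuron_output(inputs, weights, bias, activation="tanh"):
    """Apply the chosen activation to the weighted sum of inputs plus bias."""
    excitation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return ACTIVATIONS[activation](excitation)


if __name__ == "__main__":
    for name in ACTIVATIONS:
        print(name, neuron_output([1.0, 2.0], [0.5, -0.25], 0.0, name))
```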

Each layer has a specific purpose. The input layer feeds data into the network; here, the data are the total numbers of traffic accidents recorded during the study period. The hidden layer processes the values passed on by the input neurons. The output layer, the last layer, produces the network’s output values once the data have been processed.

The error function minimized during training was the sum of squared errors of the network output. The iterative BFGS (Broyden–Fletcher–Goldfarb–Shanno) learning algorithm was used. This algorithm is quite stable and is relatively insensitive to errors in the directional (line-search) minimization. Moreover, it does not require the Hessian matrix to be computed explicitly. A further advantage is its wide region of convergence, which allows the optimization to approach a solution without a carefully chosen starting point [27].
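To make the idea concrete, the following is a compact, generic BFGS sketch that minimizes a sum of squared errors for a two-parameter linear model. It is not the Statistica implementation: the backtracking line search, tolerances, and toy data are all assumptions chosen for illustration.

```python
def bfgs_minimize(f, grad, x0, max_iter=200, tol=1e-8):
    """Minimise f with BFGS, keeping an approximation of the inverse Hessian."""
    n = len(x0)
    x = list(x0)
    h_inv = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = grad(x)
    for _ in range(max_iter):
        if max(abs(gi) for gi in g) < tol:
            break
        # Search direction d = -H_inv * g (no explicit Hessian is ever formed).
        d = [-sum(h_inv[i][j] * g[j] for j in range(n)) for i in range(n)]
        # Backtracking line search with the Armijo sufficient-decrease condition.
        step, fx = 1.0, f(x)
        slope = sum(gi * di for gi, di in zip(g, d))
        while f([xi + step * di for xi, di in zip(x, d)]) > fx + 1e-4 * step * slope:
            step *= 0.5
            if step < 1e-12:
                break
        x_new = [xi + step * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 1e-12:  # curvature condition; skip the update otherwise
            rho = 1.0 / sy
            hy = [sum(h_inv[i][j] * y[j] for j in range(n)) for i in range(n)]
            yhy = sum(yi * hyi for yi, hyi in zip(y, hy))
            for i in range(n):
                for j in range(n):
                    h_inv[i][j] += ((1.0 + rho * yhy) * rho * s[i] * s[j]
                                    - rho * (hy[i] * s[j] + s[i] * hy[j]))
        x, g = x_new, g_new
    return x


# Toy data lying exactly on y = 2x + 1 (invented, not accident data).
XS, YS = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]


def sse(params):
    """Sum of squared errors of the linear model a*x + b."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in zip(XS, YS))


def sse_grad(params):
    """Gradient of the sum of squared errors with respect to (a, b)."""
    a, b = params
    residuals = [a * x + b - y for x, y in zip(XS, YS)]
    return [2.0 * sum(r * x for r, x in zip(residuals, XS)), 2.0 * sum(residuals)]


if __name__ == "__main__":
    print(bfgs_minimize(sse, sse_grad, [0.0, 0.0]))
```

The inverse-Hessian approximation is built only from successive gradient differences, which is why the method avoids evaluating second derivatives of the network error.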

The Statistica automatic search module for optimal networks was used in the computations. For each combination of the number of trained networks and the random sample split, the five best networks were retained. From the resulting sixty networks, those with the lowest MAPE values were selected.
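One reading of the figure of sixty is six network-count settings times two data splits times five retained networks per run. The sketch below enumerates that grid and picks the candidate with the lowest MAPE; the candidate records and their MAPE values are invented placeholders.

```python
from itertools import product

NETWORK_COUNTS = [20, 40, 60, 80, 100, 200]  # numbers of trained networks per run
SPLITS = [(70, 15, 15), (80, 10, 10)]        # training-testing-validation percentages
RETAINED_PER_RUN = 5                         # networks kept by the automatic search


def candidate_runs():
    """All (network count, split) combinations examined."""
    return list(product(NETWORK_COUNTS, SPLITS))


def select_best(candidates):
    """Return the candidate with the lowest MAPE."""
    return min(candidates, key=lambda c: c["mape"])


if __name__ == "__main__":
    print(len(candidate_runs()) * RETAINED_PER_RUN, "candidate networks")
    # Placeholder candidates with invented MAPE values.
    demo = [
        {"name": "MLP 10-4-1", "mape": 0.021},
        {"name": "MLP 10-6-1", "mape": 0.009},
        {"name": "MLP 10-2-1", "mape": 0.015},
    ]
    print(select_best(demo)["name"])
```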

The best-trained neural networks used to forecast the number of traffic accidents in Poland by province are displayed in Table 3 and Table 4. In these tables, “MLP n-n-n” denotes a multilayer perceptron, with the three numbers giving the number of neurons in the input, hidden, and output layers. The quality of network learning and validation was then evaluated; values closer to 1 indicate better performance. The subsequent columns contain the neuron activation functions, defined previously, for which the error was lowest. The number after BFGS indicates the number of epochs required to train the network and find the solution with the lowest learning error.

To better understand how the choice of training–validation–testing split affects forecasted accident levels, the results from the 70-15-15 and 80-10-10 configurations were compared for the year 2040 across all provinces. Although both approaches produced similar trends, the predicted values in some regions (e.g., Lesser Poland, Silesia) differed by more than 5%. To visualize this, a comparative difference map may be created in future work, using GIS tools to display forecast divergence. Such maps can help identify provinces where model sensitivity is higher and forecast confidence is lower, thereby supporting more informed decisions in traffic safety planning.

5. Discussion

Based on current police reports, the previous statistical analysis was expanded to include data from 2022 to 2024. In summary, in 2022, 21,322 accidents (1896 killed, 24,743 injured) were reported; in 2023, 20,936 accidents (1893 killed, 24,125 injured) occurred; and in 2024, 21,519 accidents (1896 killed, 24,782 injured) were reported. Compared with earlier forecasts, there is general agreement on the numerical range, although the details differ. For example, classical trend models predicted a further decline in accidents, which actually occurred until 2023, but there was a rebound in 2024. In contrast, the neural network forecast by province, assuming stabilization, was confirmed observationally: the number of accidents in 2024 was similar to that of 2022 (it increased only slightly). The new data also enabled a more precise assessment of the error of the models—the results show that the neural nets maintain high accuracy (an error of 1–2%), while the simple trend models yield higher deviations, especially in the short horizon.

Based on the data shown above, the number of traffic accidents in Poland is expected to decrease each year until it reaches a stable level, regardless of the province. The results are affected by the choice of random sample split. With the 70-15-15 split (70% training, 15% testing, and 15% validation), the average percentage error was 1.75%, whereas with the 80-10-10 split the error was 0.63%.

It is important to note that the pandemic affected the results. During the pandemic, the number of road accidents decreased by 30% on average [28]. Afterward, however, the number of road accidents returned to its pre-pandemic level.

The COVID-19 pandemic was a key factor disrupting accident data. The period of traffic restrictions (2020–2021) saw a decrease in the number of incidents, with fatalities reaching their lowest level up to that point (2245 people) in 2021. As the authors note, “the pandemic significantly reduced the frequency of accidents on the road,” which is also confirmed by trend analysis. After the restrictions were lifted, traffic gradually increased, contributing to a rebound in the number of accidents in 2024.

Moreover, for the first investigation (70-15-15), the most beneficial activation functions are Tanh for the hidden layer and the linear function for the output layer. For the second investigation (80-10-10), the Tanh and exponential functions are advantageous.

Based on the expected number of accidents per province presented above, it can be concluded that traffic accidents will probably decrease over the next few years and then stabilize. It should be noted that the results were affected by the pandemic (Figure 6 and Figure 7).

Moreover, Figure 6 and Figure 7 present the projected number of traffic accidents in Poland’s provinces from 2022 to 2040, based on two different training–validation–testing splits. Figure 6 corresponds to the model trained with a 70-15-15 split (70% training, 15% testing, and 15% validation), while Figure 7 presents the results using the 80-10-10 split.

The figures visualize the expected stabilization in the number of traffic accidents across provinces after an initial decline. In both cases, the neural network models forecast relatively flat trends after 2025, with minor variations between provinces. However, the model trained on the 80-10-10 split generally produces smoother curves and smaller fluctuations, which aligns with its lower mean absolute percentage error (MAPE = 0.63%) compared to the 70-15-15 model (MAPE = 1.75%).

Notably, provinces with higher historical accident rates—such as Masovia and Silesia—maintain higher predicted values, while regions like Lubusz and Podlaskie exhibit lower and more stable forecasted accident levels. These differences reflect the patterns observed in the historical data and were learned by the neural network during training.

The visual comparison between the two figures also demonstrates the effect of data partitioning on model stability and forecast variance, underlining the importance of selecting optimal training configurations in neural network forecasting.

6. Conclusions

The number of accidents in each province of Poland was projected with neural networks in Statistica software. The software minimized both the mean absolute percentage error and the mean absolute error when determining the weights.

The study found that forecasts of the number of traffic accidents based on data from 2001 to 2021 indicated either a further decrease in the number of accidents (trend models) or their stabilization (neural networks). The data for 2022–2024 confirm that, after a period of decline, there was a rebound, with the number of accidents at a slightly higher level in 2024. However, according to the UNRSC assumptions, there is a long-term favorable trend (fewer deaths than 5 years ago). In addition, the use of neural networks (MLPs) allows for high forecast accuracy, which can support police planning and prevention efforts. The results suggest maintaining effective enforcement (e.g., higher fines from 2022) and investing in infrastructure (expansion of the highway network, sidewalks, etc.) to continue the trend of reducing casualties. At the same time, it is worth including additional factors (traffic volume, meteorological conditions, and pedestrian participation) in predictive models, as proposed in the literature.

Classical trend methods are intuitive, but they do not take into account the impact of new variables and can fail under non-stationary changes (e.g., the pandemic or legislative amendments). Neural networks offer flexibility and usually lower predictive error, but their effectiveness depends on correct parameter selection and training set balance. Further research should test different architectures (e.g., deep networks or LSTMs) and compare them with other ML algorithms (e.g., random forests, Bayesian networks). It is also important to validate predictions on new data: the introduction of 2022–2024 data into the analysis shows that models must adapt to unexpected perturbations in the data.

Based on the gathered data, it can be concluded that the pandemic decreased the number of traffic accidents. A further drop in traffic accidents is expected over the next few years before stabilization. The pandemic may also affect how the findings should be interpreted. The calculated forecast errors show how accurate the models are.

In view of the obtained forecasts, efforts must be made to further reduce the number of traffic incidents. Poland has already introduced regulations that increased the fines for traffic offenses on Polish roads from 1 January 2022. The pandemic likely influenced the study’s conclusions because it drastically altered the number of traffic accidents.

In summary, the latest research and data confirm the general trends seen since the early 2000s. However, seasonal effects, crisis events (pandemic), and legislative changes need to be taken into account for the forecasts to be reliable. Integrating statistical methods with machine learning and continuously updating data (CSO, police) will allow for more precise road policy planning and improved road safety.

In a subsequent study, the authors plan to forecast the overall number of traffic accidents using a range of statistical techniques and taking into account additional factors that affect accident rates, such as traffic volume, weather, and the age of the driver involved in the crash.

Although the current study focuses on predicting the number of traffic accidents at the province level, future research should address the spatial dimension of road safety. In particular, identifying “black spots” (i.e., road segments with the highest accident frequency) would enable more targeted and actionable interventions. This would require the inclusion of geo-referenced accident data, which were not available in the present dataset. Future extensions of this work may combine neural network forecasting with geospatial analysis to enhance traffic safety planning.

Compared to previous neural network-based forecasting studies, the present work introduces several improvements. First, it analyzes traffic accident trends over a long time horizon (2001–2021) across all Polish provinces, capturing regional differences and post-pandemic effects. Second, the impact of training–testing–validation splits (70-15-15 vs. 80-10-10) on forecast accuracy is explicitly tested, which is rarely addressed in the literature. Third, the use of multiple MLP architectures with error-driven model selection (based on MAPE, MAE, etc.) ensures that the best-performing model is chosen for each province. Lastly, predictions are validated with the newest available data (2022–2024), providing robust insight into the model’s generalization capabilities. Together, these features enhance both the scientific and practical value of the approach.
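The error-driven model selection mentioned above can be illustrated with a minimal sketch. It assumes the common textbook definitions of ME, MAE, MPE, and MAPE (the exact formulas used in Statistica may differ), and the function names, network labels, and accident counts below are hypothetical values chosen only for illustration.

```python
from statistics import mean

def forecast_errors(actual, predicted):
    """Return ME, MAE, MPE and MAPE for two equal-length sequences.

    Assumes textbook definitions with residual = actual - predicted;
    percentage errors are undefined if any actual value is zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    return {
        "ME": mean(residuals),
        "MAE": mean(abs(r) for r in residuals),
        "MPE": 100 * mean(r / a for r, a in zip(residuals, actual)),
        "MAPE": 100 * mean(abs(r) / abs(a) for r, a in zip(residuals, actual)),
    }

def select_best(actual, candidates, criterion="MAPE"):
    """Pick the candidate whose forecasts minimise the absolute chosen error."""
    scored = {name: forecast_errors(actual, pred) for name, pred in candidates.items()}
    best = min(scored, key=lambda name: abs(scored[name][criterion]))
    return best, scored[best]

# Hypothetical accident counts and forecasts, for illustration only.
actual = [2000, 1900, 1800, 1750]
candidates = {
    "MLP 10-2-1": [2100, 1850, 1700, 1800],
    "MLP 10-6-1": [2010, 1890, 1820, 1740],
}
best_name, best_errors = select_best(actual, candidates)
print(best_name, best_errors)
```

Selecting on MAPE rather than MAE makes provinces with very different accident counts comparable, which is one possible reason to prefer it when a separate model is chosen for each province.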

One limitation of this study is its geographic scope, as it focuses exclusively on Polish provinces. While this ensures homogeneity of data and legal context, it limits the ability to compare the results internationally. Future research should aim to incorporate international case studies—especially from countries with comparable traffic conditions and publicly available data. Applying the same neural network approach to multiple countries would enable a robust comparison of model performance and accident trends in different road safety environments.

The methodology proposed in this study is transferable and scalable to other countries or larger regions, provided that relevant data are available. This is confirmed by other studies by the authors on this subject. To replicate the forecasting model, the following steps are recommended:

1. Collect historical road accident data by region for a minimum of 10 consecutive years.
2. Preprocess the data to ensure completeness and consistency.
3. Adapt the neural network architecture to match the data scope (e.g., number of input neurons = number of years).
4. Train multiple MLP models with varying hidden layer sizes and activation functions.
5. Validate the model using a holdout dataset or cross-validation and select the best-performing version based on MAPE or MAE.
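The data preparation part of these steps can be sketched as follows. This is only one possible reading of the procedure: it assumes that the 10 input neurons of an MLP 10-h-1 network receive the 10 preceding annual values and that samples are assigned to the training, testing, and validation sets at random. The helper names and the synthetic series are hypothetical, and the network training itself (done in Statistica in this study) is not shown.

```python
import random

def make_windows(series, n_inputs=10):
    """Turn an annual series into (inputs, target) pairs using lagged values."""
    if len(series) <= n_inputs:
        raise ValueError("series must be longer than the input window")
    return [
        (series[i:i + n_inputs], series[i + n_inputs])
        for i in range(len(series) - n_inputs)
    ]

def random_split(samples, train=0.70, test=0.15, seed=0):
    """Shuffle samples and split them into training, testing and validation sets.

    The validation set receives whatever remains after the first two sets.
    """
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(train * len(shuffled))
    n_test = round(test * len(shuffled))
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],
    )

# Hypothetical series: 21 annual counts standing in for 2001-2021.
series = [3000 - 60 * year for year in range(21)]
samples = make_windows(series, n_inputs=10)

# The two partitions compared in this study.
split_70 = random_split(samples, train=0.70, test=0.15)
split_80 = random_split(samples, train=0.80, test=0.10)
print([len(part) for part in split_70], [len(part) for part in split_80])
```

With a series of this length, the testing and validation sets contain only one or two samples each, so the measured errors can be sensitive to which years happen to be drawn into each set.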

Author Contributions

Conceptualization, P.G., J.Z. and E.S.; methodology, P.G., J.Z. and E.S.; software, P.G., J.Z. and E.S.; validation, P.G., J.Z. and E.S.; formal analysis, P.G., J.Z. and E.S.; investigation, P.G., J.Z. and E.S.; resources, P.G., J.Z. and E.S.; data curation, P.G., J.Z. and E.S.; writing—original draft preparation, P.G., J.Z. and E.S.; writing—review and editing, P.G., J.Z. and E.S.; visualization, P.G., J.Z. and E.S.; supervision, P.G., J.Z. and E.S.; project administration, P.G., J.Z. and E.S.; funding acquisition, E.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

WHO. The Global Status on Road Safety 2018; WHO: Geneva, Switzerland, 2018; 403p, ISBN 9789241565684. Available online: https://iris.who.int/bitstream/handle/10665/276462/9789241565684-eng.pdf?sequence=1 (accessed on 17 April 2022).
Žuraulis, V.; Sokolovskij, E. Vehicle velocity relation to slipping trajectory change: An option for traffic accident reconstruction. Promet—Traffic Transp. 2018, 30, 395–406.
Stepanović, N.; Tubić, V.; Milenković, M.; Halaj, K. The impact of basic traffic flow characteristics on traffic accident occurrence on 2-lane rural roads in Serbia. Transport 2025, 40, 64–73.
Pečeliūnas, R.; Žuraulis, V.; Droździel, P.; Pukalskas, S. Prediction of road accident risk for vehicle fleet based on statistically processed tire wear model. Promet—Traffic Transp. 2022, 34, 619–630.
Lyashuk, O.; Mironov, D.; Martyniuk, V.; Aulin, V.; Tson, O.; Maruschak, P. Risk analysis of road traffic accidents in Ukraine. Transport 2024, 39, 350–359.
Gorzelańczyk, P.; Sokolovskij, E. Using neural networks to forecast the amount of traffic accidents in Poland and Lithuania. Sustainability 2025, 17, 1846.
Helgason, A. Fractional integration methods and short time series: Evidence from a simulation study. Polit. Anal. 2016, 24, 59–68.
Forecasting Based on Time Series. 2022. Available online: http://pis.rezolwenta.eu.org/Materialy/PiS-W-5.pdf (accessed on 17 April 2022).
Procházka, J.; Flimmel, S.; Čamaj, M.; Bašta, M. Modelling the Number of Road Accidents. In Applications of Mathematics and Statistics in Economics. International Scientific Conference, Szklarska Poręba, 30 August–3 September 2017; Publishing House of the University of Economics in Wrocław: Wrocław, Poland, 2017.
Sunny, C.M.; Nithya, S.; Sinshi, K.S.; Vinodini, V.M.D.; Lakshmi, A.K.G.; Anjana, S.; Manojkumar, T.K. Forecasting of Road Accident in Kerala: A Case Study. In Proceedings of the 2018 International Conference on Data Science and Engineering (ICDSE), Kochi, India, 7–9 August 2018.
Dudek, G. Forecasting Time Series with Multiple Seasonal Cycles Using Neural Networks with Local Learning. In Artificial Intelligence and Soft Computing ICAISC 2013; Lecture Notes in Computer Science; Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; p. 7894.
Szmuksta-Zawadzka, M.; Zawadzki, J. Forecasting on the Basis of Holt-Winters Models for Complete and Incomplete Data. Research Papers of the Wrocław University of Economics 2009, No. 38. Available online: http://bazekon.icm.edu.pl/bazekon/element/bwmeta1.element.ekon-element-000164937729 (accessed on 17 April 2022).
Lavrenz, S.; Vlahogianni, E.; Gkritza, K.; Ke, Y. Time series modeling in traffic safety research. Accid. Anal. Prev. 2018, 117, 368–380.
Al-Madani, H. Global road fatality trends’ estimations based on country-wise microlevel data. Accid. Anal. Prev. 2018, 111, 297–310.
Chudy-Laskowska, K.; Pisula, T. Prognozowanie liczby wypadków drogowych na Podkarpaciu. Logistics 2015, 4, 2782–2796.
Gorzelanczyk, P. Application of neural networks to forecast the number of road accidents in provinces in Poland. Heliyon 2023, 9, e12767.
Jurkovic, M.; Gorzelanczyk, P.; Kalina, T.; Jaros, J.; Mohanty, M. Impact of the COVID-19 pandemic on road traffic accident forecasting in Poland and Slovakia. Open Eng. 2022, 12, 578–589.
Central Statistical Office. Available online: www.gus.pl (accessed on 17 April 2022).
Province Names in English. Available online: https://polandtravel.agency/regions-of-poland/ (accessed on 17 April 2022).
Statistic Road Accident. Available online: https://statystyka.policja.pl/ (accessed on 17 April 2022).
Lake, B.M.; Ullman, T.D.; Tenenbaum, J.B.; Gershman, S.J. Building machines that learn and think like people. Behav. Brain Sci. 2017, 40, e253.
Becoming Human. How Netflix Uses AI, Data Science, and Machine Learning—From a Product Perspective. Available online: www.becominghuman.ai (accessed on 10 August 2019).
Forbes. The Amazing Ways eBay Is Using Artificial Intelligence to Boost Business Success. Available online: www.forbes.com (accessed on 12 September 2020).
Wu, Y.; Schuster, M.; Chen, Z.; Le, Q.V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K.; et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv 2020, arXiv:1609.08144.
Oronowicz-Jaśkowiak, W. The application of neural networks in the work of forensic experts in child abuse cases. Adv. Psychiatry Neurol. 2019, 28, 273–282.
Wójcik, A. Autoregressive Vector Models as a Response to the Critique of Multi-Equation Structural Econometric Models; Publishing House of the University of Economics in Katowice: Katowice, Poland, 2014; Volume 193.
Baron, B.; Pasierbek, A. Comparison of the performance of coupled gradient and quasi-Newtonian BFGS algorithms in the problem of optimizing power distribution in a power system. Electrics 2009, 3, 211.
Gorzelańczyk, P. Change in the Mobility of Polish Residents during the COVID-19 Pandemic. Commun.-Sci. Lett. Univ. Zilina 2022, 24, A100–A111.

Figure 4. Neural network models [26].

Figure 5. Average number of traffic accidents by province from 2001 to 2021.

Figure 6. Projected number of road accidents in 2022–2040 for the 70-15-15 study group.

Figure 7. Projected number of road accidents in 2022–2040 for the 80-10-10 study group.

Table 3. Summary of neural network training for random sample sizes of 70% training, 15% testing, and 15% validation.

Province	Network Number	Network Name	Learning Algorithm	Activation (Hidden)	Activation (Output)	ME	MAE	MPE	MAPE	SSE
Lower Silesia	20	MLP 10-3-1	BFGS 8	Exponential	Linear	22.06	105.58	0.23%	4.48%	199.39
Kuyavia–Pomerania	200	MLP 10-2-1	BFGS 4	Tanh	Tanh	6.39	64.20	0.19%	5.83%	79.33
Lublin Province	20	MLP 10-6-1	BFGS 4	Tanh	Tanh	15.71	63.21	0.95%	4.81%	78.17
Lubusz Province	60	MLP 10-6-1	BFGS 5	Tanh	Logistic	1.70	29.54	0.18%	4.69%	38.85
Łódź Province	80	MLP 10-2-1	BFGS 11	Logistic	Tanh	72.07	134.71	2.92%	4.54%	190.24
Lesser Poland	40	MLP 10-7-1	BFGS 33	Logistic	Exponential	115.52	138.82	4.51%	5.15%	276.59
Masovia	40	MLP 10-3-1	BFGS 6	Exponential	Linear	72.14	186.35	2.43%	5.21%	248.13
Opole Province	40	MLP 10-6-1	BFGS 10	Linear	Linear	3.58	45.73	1.69%	7.28%	60.00
Subcarpathia	80	MLP 10-8-1	BFGS 14	Exponential	Tanh	24.55	363.96	2.23%	20.76%	674.90
Podlaskie Province	40	MLP 10-6-1	BFGS 19	Logistic	Tanh	8.49	31.52	2.81%	6.16%	47.92
Pomerania	20	MLP 10-4-1	BFGS 21	Logistic	Tanh	40.84	70.60	2.29%	3.47%	139.26
Silesia	60	MLP 10-4-1	BFGS 7	Logistic	Linear	29.29	122.23	0.24%	3.24%	152.45
Holy Cross	100	MLP 10-8-1	BFGS 7	Logistic	Tanh	17.02	48.83	2.18%	4.55%	61.91
Warmia-Masuria	200	MLP 10-6-1	BFGS 9	Logistic	Tanh	0.52	36.85	0.18%	3.10%	52.19
Greater Poland	60	MLP 10-4-1	BFGS 1	Exponential	Exponential	9.98	392.95	3.07%	14.06%	477.32
West Pomerania	100	MLP 10-2-1	BFGS 7	Logistic	Linear	15.13	59.73	1.87%	5.09%	70.69
Average						28.44	118.43	1.75%	6.40%	177.96

Table 4. Summary of neural network training for random sample sizes of 80% training, 10% testing, and 10% validation.

Province	Network Number	Network Name	Learning Algorithm	Activation (Hidden)	Activation (Output)	ME	MAE	MPE	MAPE	SSE
Lower Silesia	200	MLP 10-2-1	BFGS 0	Linear	Exponential	29.65	97.51	0.02%	4.10%	207.57
Kuyavia–Pomerania	100	MLP 10-6-1	BFGS 13	Logistic	Linear	2.25	28.07	0.06%	2.82%	34.20
Lublin Province	20	MLP 10-4-1	BFGS 10	Tanh	Logistic	2.72	71.71	0.50%	5.81%	80.54
Lubusz Province	60	MLP 10-8-1	BFGS 24	Tanh	Exponential	0.47	10.55	0.00%	1.65%	12.90
Łódź Province	40	MLP 10-2-1	BFGS 5	Tanh	Logistic	43.99	197.34	0.06%	5.94%	272.65
Lesser Poland	20	MLP 10-2-1	BFGS 5	Logistic	Logistic	94.48	467.44	0.87%	15.28%	540.21
Masovia	80	MLP 10-8-1	BFGS 3	Tanh	Linear	25.07	202.75	1.54%	5.56%	261.11
Opole Province	100	MLP 10-6-1	BFGS 18	Tanh	Linear	6.19	24.53	1.63%	4.08%	37.95
Subcarpathia	80	MLP 10-2-1	BFGS 4	Logistic	Linear	5.06	114.15	1.68%	7.69%	141.80
Podlaskie Province	20	MLP 10-4-1	BFGS 16	Linear	Linear	1.51	30.69	0.86%	5.47%	42.34
Pomerania	60	MLP 10-3-1	BFGS 7	Logistic	Tanh	5.38	87.26	0.57%	3.97%	123.00
Silesia	200	MLP 10-4-1	BFGS 46	Logistic	Exponential	17.85	41.76	0.82%	1.56%	64.00
Holy Cross	80	MLP 10-4-1	BFGS 14	Tanh	Exponential	4.95	26.67	0.26%	2.19%	33.00
Warmia-Masuria	80	MLP 10-4-1	BFGS 20	Tanh	Exponential	0.60	7.58	0.10%	0.56%	10.00
Greater Poland	40	MLP 10-8-1	BFGS 19	Tanh	Exponential	8.16	131.72	0.17%	5.01%	181.00
West Pomerania	100	MLP 10-5-1	BFGS 3	Exponential	Exponential	7.21	87.72	0.93%	7.55%	114.22
Average						15.97	101.72	0.63%	4.95%	134.81
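The averages reported in Tables 3 and 4 can be reproduced from the per-province values. The short sketch below does this for MAPE, using only the figures printed in the two tables. The variable names are hypothetical, and the province-by-province comparison is an illustrative reading of the tables rather than part of the original analysis.

```python
# MAPE (%) per province, copied from Table 3 (70-15-15) and Table 4 (80-10-10).
provinces = [
    "Lower Silesia", "Kuyavia-Pomerania", "Lublin", "Lubusz", "Łódź",
    "Lesser Poland", "Masovia", "Opole", "Subcarpathia", "Podlaskie",
    "Pomerania", "Silesia", "Holy Cross", "Warmia-Masuria",
    "Greater Poland", "West Pomerania",
]
mape_70 = [4.48, 5.83, 4.81, 4.69, 4.54, 5.15, 5.21, 7.28,
           20.76, 6.16, 3.47, 3.24, 4.55, 3.10, 14.06, 5.09]
mape_80 = [4.10, 2.82, 5.81, 1.65, 5.94, 15.28, 5.56, 4.08,
           7.69, 5.47, 3.97, 1.56, 2.19, 0.56, 5.01, 7.55]

# Unweighted mean over the 16 provinces, rounded as in the tables.
avg_70 = round(sum(mape_70) / len(mape_70), 2)
avg_80 = round(sum(mape_80) / len(mape_80), 2)

# Provinces whose MAPE is higher under the 80-10-10 split than under 70-15-15.
higher_with_80 = [
    name for name, m70, m80 in zip(provinces, mape_70, mape_80) if m80 > m70
]
print(avg_70, avg_80, higher_with_80)
```

The recomputed means match the "Average" rows of both tables. The comparison also shows that the lower average error of the 80-10-10 split does not hold for every province, so the improvement should be read as an average effect.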

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
