A Case Study on Predicting Road Casualties Among Young Car Drivers in the Republic of Serbia Using Machine Learning

Bačkalić, Svetlana; Kanović, Željko; Bačkalić, Todor

doi:10.3390/safety11040107


by Svetlana Bačkalić *, Željko Kanović and Todor Bačkalić

Faculty of Technical Sciences, University of Novi Sad, 21000 Novi Sad, Serbia

* Author to whom correspondence should be addressed.

Safety 2025, 11(4), 107; https://doi.org/10.3390/safety11040107

Submission received: 31 August 2025 / Revised: 2 November 2025 / Accepted: 6 November 2025 / Published: 10 November 2025

(This article belongs to the Special Issue The Safe System Approach to Road Safety)


Abstract

Road traffic accidents are a major global public health concern, ranking among the top three causes of death worldwide and constituting the leading cause of death among individuals aged 15–29. Monitoring traffic safety status and trends is a vital element of effective road safety management. This study investigates road traffic casualties involving young car drivers (aged 18–24) in the Republic of Serbia from 1997 to 2024, analyzing historical patterns and introducing a predictive model for casualty outcomes. The analytical framework employs machine learning techniques, specifically Long Short-Term Memory (LSTM) networks, to estimate the number of casualties (FSI = Fatal + Serious Injuries) based on various contributing factors. Accurate prediction of accident outcomes is essential for designing targeted road safety measures and reducing casualty numbers.

Keywords:

traffic safety; machine learning; young drivers; prediction model; casualties

1. Introduction

Road traffic injuries represent a major global public health issue that requires coordinated and collective efforts to prevent traffic accidents and their consequences. Each year, approximately 1.19 million people lose their lives on the roads, while an additional 30 to 50 million suffer injuries [1]. Road traffic accidents are among the top three causes of death globally and represent the leading cause of death for young adults aged 15–29 years [2,3]. Although individuals of all age groups are affected by traffic accidents, young people are disproportionately represented in accident statistics across all countries [4,5]. Between 2015 and 2019 in the EU, an average of 1215 young car drivers (aged 18–24) died in crashes each year, accounting for 16% of all car driver fatalities during that period [6]. The distribution of fatalities among young people by transport mode in the EU27 over 2010–2019 shows that around 40% of fatalities among people aged 18–24 were car drivers [7].

The factors most commonly associated with poor road safety outcomes among young people relate to lifestyle, inexperience, and the non-use of protective systems. Speeding, fatigued driving, drunk driving, and overloading/overcrowding are primary accident causes, while attitudes, driving behavior, experience, personality, and skill also play significant roles [8,9]. Another critical factor increasingly recognized in the literature is driver distraction. Distraction can arise from internal sources, such as mind-wandering or mobile phone use, and from external sources, such as roadside advertising or complex traffic environments. Studies show that both internal and external distractions significantly increase crash risk, often interacting with other behavioral and environmental factors to raise crash likelihood further [10,11]. This underscores the importance of treating distraction as a key determinant of road safety, particularly for young drivers, who are more susceptible to cognitive and attentional lapses. Road accidents are also influenced by external factors, including weather conditions (such as rainfall, temperature, and windstorms), lighting conditions (daylight or nighttime), time of day, day of the week (weekday or weekend), month, and season; these are important determinants of road traffic accidents and should therefore be included when training predictive models [12]. Mitigating the risks associated with young drivers requires a comprehensive set of countermeasures, including improvements in training, education, testing, communication, enforcement, and technology.

Monitoring and analyzing the temporal distribution of traffic accidents and their consequences makes it possible to observe trends in these phenomena and thus to gain knowledge that is highly useful for planning and implementing activities aimed at improving traffic safety [13,14,15,16]. Policies, strategies, and action plans with the clear goal of reducing the number of fatalities and injuries in traffic accidents are based on the analysis and understanding of existing problems in this area. Several studies have employed conventional statistical analysis techniques to examine factors related to casualties and to predict road accident frequency [17,18,19,20,21]. Although such models offer mathematical interpretability and facilitate a clearer understanding of the effects of individual predictor variables, they are often constrained by limitations, including low predictive accuracy and the potential for biased parameter estimation [22]. In response to these shortcomings, researchers have made significant efforts to investigate the applicability of machine learning (ML) techniques in road safety modeling [23,24,25], and ML techniques have been increasingly adopted as robust alternatives to traditional analytical methods for studying factors related to casualties and accident frequency [23,26,27]. Traffic accident data are inherently nonlinear, and accurate prediction often requires domain-specific knowledge. Deep learning methods can learn this underlying structure directly from the data, enabling more accurate and adaptive accident or casualty forecasting. Moreover, machine learning techniques facilitate the extraction of meaningful patterns and insights from large, complex, and heterogeneous datasets.
Among the most commonly used ML methods in this context are decision trees (DT) [28], Bayesian networks [29], classification and regression trees (CART) [30], extreme gradient boosting (XGBoost) [29], and random forests (RF) [31]. Artificial neural networks (ANN) have been successfully applied in traffic modeling and in the prediction of exceeding traffic load severity [32]. More recently, deep neural network models have been introduced in traffic accident forecasting [12,33]. For time-series prediction problems, the most commonly used neural network architectures are recurrent neural networks (RNN) [34]. According to the literature, Long Short-Term Memory (LSTM) networks have proven to be among the most efficient and accurate of these [35,36]. Therefore, an LSTM architecture was used in the present study for FSI frequency forecasting [24,27].

Although previous studies have examined young drivers in Serbia, important gaps remain. Research shows that inexperience, risky behaviors, and low risk perception contribute to high crash rates, and that interventions such as graduated driver licensing (GDL) programs improve attitudes but have limited impact on actual crashes [37]. Other studies highlight gender, urban/rural, and behavioral differences [38,39], yet these analyses typically rely on conventional statistics and do not integrate long-term trends or predictive modeling. The effects of major events, such as legislative changes or the COVID-19 pandemic, on FSI also remain largely unexplored. This study addresses these gaps by applying LSTM-based deep learning models to national datasets, enabling the prediction and comprehensive understanding of young driver casualties (FSI) and informing improvements in traffic safety policy and targeted interventions.

The aim of this study is to present and analyze trends in the number of fatalities and serious injuries (FSI) among young drivers in the Republic of Serbia over the period 1997–2024, and to develop accurate short-term forecasts of casualties (FSI) for two specific periods within that interval. These periods follow two critical turning points: 2010, when Serbia introduced a new Road Traffic Safety Law, and 2020, the year of the COVID-19 lockdown. One of the main goals is to apply deep learning methods to estimate and quantify the impact of these two events on the trend of casualties (FSI) among young drivers in Serbia, which is, to the best of our knowledge, the first such study in Serbia.

The central hypothesis of the study is that machine learning algorithms (MLAs), particularly deep learning architectures such as LSTM, can reliably predict short-term trends in traffic accident outcomes involving young car drivers based on historical data.

The specific objectives of the study are as follows: (1) to analyze the temporal trends of young driver casualties (FSI) in Serbia over a 28-year period; (2) to identify key contributing factors to young driver casualties (FSI) using available national datasets; (3) to predict the number of young driver casualties (FSI) following two specific events, namely the introduction of the new Road Traffic Safety Law and the onset of the COVID-19 pandemic; and (4) to evaluate the practical implications of model predictions for public policy and youth-targeted road safety programs.

This study contributes to the field of road safety research in several key ways. First, it is among the first studies to apply deep learning techniques (LSTM networks) to the prediction of youth-related traffic casualties in Serbia. Second, it provides a thorough empirical analysis of nearly three decades of national accident data. Third, it offers actionable insights to inform evidence-based traffic safety planning and targeted policy interventions for young drivers.

This article is organized as follows: Section 2 presents the description of the problem. Section 3 outlines the methodology applied in this empirical case study. Section 4 reports the results and analysis of young driver casualties (FSI), including forecasting, using deep learning methods. Section 5 discusses the key insights and implications for traffic safety policy, and Section 6 concludes the study.

2. Problem Description

The Republic of Serbia, situated in Southeast Europe, remains below the EU average in terms of socioeconomic development and continues to face substantial challenges in road traffic safety. In 2024, the estimated population was about 6.5 million [40], with approximately 2.4 million registered motor vehicles. In 2023, road traffic accidents claimed the lives of 503 individuals, including 163 passenger vehicle drivers (32% of all fatalities). Among these drivers, 17 were young people aged 18–24 (15 men and 2 women), accounting for 12% of all passenger car driver fatalities [41]. Notably, over 38% of traffic fatalities among young people were the drivers themselves, and this pattern is likely to persist without effective interventions. The adoption of in-vehicle safety systems (ABS, ESC, airbags) and Intelligent Transport Systems (ITS) has improved traffic safety in developing countries [42]. In Serbia, however, their impact is limited by a vehicle fleet dominated by imported used cars lacking modern safety features and by modest ITS infrastructure. Consequently, fluctuations in safety outcomes are more closely associated with legislative interventions (such as stricter penalties, mandatory seat belt use, and drink-driving laws) than with technological improvements. Following the 2010 Road Traffic Safety Law, fatalities declined moderately, reflecting the benefits of regulation, enforcement, and driver education. Despite these gains, young drivers remain overrepresented in accidents, underscoring the persistent influence of behavioral factors and the complex interplay of human, technical, and institutional elements.

In response to these challenges, Serbia introduced a new Road Traffic Safety Law at the end of 2009, followed by a revised version in 2020, aiming to strengthen the legal framework and improve road safety for young drivers [43]. The Road Traffic Safety Law in Serbia, which came into force in 2010, introduced numerous innovations in the field of traffic regulation. Among the most significant were the implementation of strategic traffic safety measures, the establishment of a traffic safety financing system, the introduction of a penalty points system, improvements in road infrastructure safety, and the organization of traffic safety campaigns.

For many years, the Republic of Serbia has ranked near the bottom of road safety performance lists. The latest Annual Report, which evaluates the progress of 32 European and associated countries, shows that Serbia recorded the highest road mortality rate in 2024, with 78 road deaths per million inhabitants [44]. Over the past three decades, the country has experienced a turbulent period marked by political instability, armed conflicts, sanctions, economic transition, privatization, a rising standard of living, the COVID-19 pandemic, and several amendments to the Road Traffic Safety Law. A particularly concerning aspect of Serbia’s poor road safety record is the high mortality rate among young car drivers. The core of the issue lies in the unacceptably high number of fatalities within this group of road users, as well as significant year-to-year fluctuations in their occurrence.

Three indicators were used to assess the road safety performance of young car drivers: (1) Vulnerability indicator (VI), (2) FSI rate per 100,000 people (FSI/pop), and (3) FSI rate per 10,000 registered motor vehicles (FSI/rmv). Vulnerability indicator (VI) is calculated by dividing the proportion of FSI in a given age group (within the total number of all FSI) by the proportion that the same age group represents in the total population. If the VI is greater than 1, the observed age group is considered overrepresented among the casualties. The analysis of the VI shows that, throughout the entire period, the observed age group of young drivers was a vulnerable group, with values ranging from 1.67 (in 2016) to a maximum of 2.24 (in 2006 and 2011). The FSI rate per 100,000 people exhibited a downward trend from 2007 to 2014, after which it has shown a consistent year-on-year increase. Over the past decade, the highest value of FSI rate per 100,000 population was 3.0 and it was recorded in 2024. The FSI rate per 10,000 vehicles also declined between 2007 and 2014, after which it remained constant at 0.6 FSI of young drivers per 10,000 registered vehicles throughout the 2014–2024 period (Figure 1).
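The three indicators follow directly from their definitions above. The Python sketch below is illustrative only: the function names are hypothetical, and the FSI counts in the example are invented for demonstration; only the population (about 6.5 million) and vehicle (about 2.4 million) totals are taken from the text.

```python
def vulnerability_indicator(fsi_group, fsi_total, pop_group, pop_total):
    """VI = (group's share of all FSI) / (group's share of the population).

    VI > 1 means the age group is overrepresented among casualties.
    """
    fsi_share = fsi_group / fsi_total
    population_share = pop_group / pop_total
    return fsi_share / population_share


def fsi_per_100k_population(fsi, population):
    """FSI rate per 100,000 inhabitants (FSI/pop)."""
    return fsi / population * 100_000


def fsi_per_10k_vehicles(fsi, registered_vehicles):
    """FSI rate per 10,000 registered motor vehicles (FSI/rmv)."""
    return fsi / registered_vehicles * 10_000


# Hypothetical FSI counts; population and vehicle totals as stated in the text.
print(vulnerability_indicator(200, 2000, 325_000, 6_500_000))
print(fsi_per_100k_population(130, 6_500_000))
print(fsi_per_10k_vehicles(120, 2_400_000))
```

Because VI is a ratio of two shares, it is dimensionless and directly comparable across years even when the population changes, whereas the two rates depend on the chosen exposure base.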

3. Methodology

This study analyzes the long-term trends and predictions in the number of FSI among young car drivers (aged 18–24) in the Republic of Serbia over the period of 1997–2024. The methodology consists of descriptive and inferential statistical techniques suitable for trend analysis and prediction.

3.1. Data Collection and Pre-Processing

To properly understand the problem, it is necessary to define the dataset and its sources. The main data sources are the Road Traffic Safety Agency (RTSA), the Statistical Office of the Republic of Serbia, and the Republic Hydrometeorological Service of Serbia. The data are published in different file types (data files or PDF files) and through various channels (online databases, annual reports, statistical yearbooks).

The data on traffic accidents and casualties used in this research were obtained from the official database maintained by the RTSA of the Republic of Serbia, which represents the national reference source for road safety statistics. This database includes all officially reported traffic accidents resulting in fatalities or injuries and contains detailed attributes related to accident type, location, time, vehicle, and driver characteristics. It therefore provides a representative and comprehensive basis for longitudinal analysis of traffic safety trends in Serbia. Although the structure and management of the database have undergone minor updates during the 1997–2024 period, the core variables used in this study—particularly the number of young driver casualties and fatalities—have been consistently defined throughout, ensuring comparability over time.

To maintain data integrity, annual data were cross-checked against the official RTSA statistical yearbooks and national transport safety reports. Missing or inconsistent entries were handled through systematic cleaning, validation, and aggregation procedures to ensure a uniform and reliable time-series suitable for machine learning analysis.

The prepared dataset includes the following:

Number of traffic accident casualties categorized by FSI (Fatalities + Serious Injuries);
Number of vehicles;
Role in accident (passenger car drivers only);
Age group (restricted to 18–24 years old);
Gender of the involved drivers (male/female);
Year;
Month;
Hour;
Weekend or weekday;
Season;
Urban or rural area;
Weather conditions;
Number of licensed young drivers.

The data can be displayed interactively on the webpage of the Serbian RTSA or downloaded as an .xls file [40]. After accounting for potential missing observations, a total of 31,863 traffic accident casualties among drivers aged 18 to 24 between January 1997 and December 2024 were used in the analysis, including 25,010 male and 4430 female drivers.

A fatal injury to a car driver is defined in accordance with international practice as a death occurring within 30 days of an injury sustained in a traffic accident. Definitions of non-fatal injuries, however, are not fully harmonized across countries and often depend on national regulations and data collection systems. In this study, the classification of serious injuries follows the official definitions used by the RTSA and the Serbian police, referring to injuries that require hospital treatment, as documented in police or medical records. In this paper, the term casualties refers to the combined total of fatally and seriously injured young drivers (FSI). These outcomes are addressed within the national road safety strategies and plans of developed countries, whose legislative frameworks adopt a Safe System approach drawing on the principles of Vision Zero [4].

3.2. Modeling Strategy

The data analysis was conducted using a dual approach: classical descriptive analysis of absolute and relative traffic safety indicators, and prediction of the number of FSI using an LSTM network.

3.2.1. Descriptive Statistics

The nature of traffic safety phenomena, their temporal dimensions, and their frequent fluctuations make descriptive analysis well suited for determining the specifics and characteristics of changes over a given time period. Time-based absolute indicators of young driver casualties (FSI), which express the extent of the problem per year, month, day, or hour, are indispensable for phenomenological description.

In developed countries, the measurement, monitoring, and comparison of traffic safety most often rely on relative indicators that reflect exposure to risk [4]. Key exposure measures include the number of inhabitants (used to compute FSI per 100,000 people) and the number of registered vehicles (used to compute FSI per 10,000 registered motor vehicles). Monitoring and analyzing changes in both absolute and relative indicators is of utmost importance for decision-makers at national and regional levels. Descriptive statistics have many advantages: they are easy to apply, produce clear visualizations suitable for tracking changes over time, and are well accepted by decision-makers and other stakeholders. However, they do not allow for deeper analysis of causal relationships among influencing factors, nor do they enable precise prediction of future events.

To address this limitation, an LSTM network was used, as it enables forecasting of specific phenomena by considering multiple, heterogeneous influencing factors simultaneously.

3.2.2. LSTM Network

The LSTM network is a deep learning algorithm introduced in 1997 [35] as an improvement on the traditional recurrent neural network (RNN), which is known to suffer from vanishing and exploding gradients [45]. The main purpose of LSTM is dynamic system modeling and time-series prediction. Its key improvement over the traditional RNN is the presence of memory cells and gating mechanisms, which allow it to preserve the cell state over time and to control the flow of information.

Unlike a traditional RNN, which has a single neural network layer in each cell, an LSTM has four neural network layers in each cell. The basic diagram of an LSTM cell is depicted in Figure 2, while Figure 3 presents LSTM cells connected in a network.

In Figure 2 and Figure 3, C is the cell state, x represents the input, and h is the output of the cell; t denotes the time instance. The first layer, known as the forget gate layer, uses a sigmoid activation function to determine how much of the previous cell state will be “thrown away”. The output of this layer is a number between 0 and 1, calculated as

f_{t} = \sigma(W_{f} \cdot [h_{t-1}, x_{t}] + b_{f}), \qquad (1)

where σ is the logistic sigmoid function, W_f is the weight matrix, and b_f is the bias vector of the forget gate. The same notation for weights and biases is used hereafter for each layer.

The next step within the cell determines how much new information will be stored in the cell state. This step involves a sigmoid layer, called the input gate layer, and a tanh layer that produces a vector of candidate values; their outputs are calculated as

i_{t} = \sigma(W_{i} \cdot [h_{t-1}, x_{t}] + b_{i}), \qquad (2)

\tilde{C}_{t} = \tanh(W_{C} \cdot [h_{t-1}, x_{t}] + b_{C}). \qquad (3)

The notation is analogous to that of Equation (1). The new cell state is then updated as

C_{t} = f_{t} \cdot C_{t-1} + i_{t} \cdot \tilde{C}_{t}. \qquad (4)

To calculate the output of the cell, the sigmoid layer called the output gate layer is applied first:

o_{t} = \sigma(W_{o} \cdot [h_{t-1}, x_{t}] + b_{o}). \qquad (5)

Finally, the tanh function is applied to the cell state, and the result is multiplied by o_t:

h_{t} = o_{t} \cdot \tanh(C_{t}). \qquad (6)

The LSTM network consists of a chain of such cells, which enables forecasting of a time series over a defined number of time steps t.
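Equations (1)–(6) can be traced step by step in a few lines of NumPy. The sketch below is a minimal forward pass for one cell, not the MATLAB implementation used in the study; the parameter dictionary, the random weights, and the small hidden size of 4 (instead of 200) are illustrative assumptions. The products in Equations (4) and (6) are element-wise, while W · [h_{t−1}, x_t] is a matrix-vector product.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following Equations (1)-(6).

    p holds W_f, W_i, W_C, W_o with shape (hidden, hidden + n_inputs), acting on
    the concatenation [h_{t-1}, x_t], and b_f, b_i, b_C, b_o with shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])       # (1) forget gate
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])       # (2) input gate
    c_tilde = np.tanh(p["W_C"] @ z + p["b_C"])   # (3) candidate values
    c_t = f_t * c_prev + i_t * c_tilde           # (4) element-wise state update
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])       # (5) output gate
    h_t = o_t * np.tanh(c_t)                     # (6) cell output
    return h_t, c_t


# Illustrative sizes: 11 input features and a hidden state of 4 units,
# with random (untrained) weights.
rng = np.random.default_rng(0)
n_inputs, hidden = 11, 4
params = {}
for gate in ("f", "i", "C", "o"):
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden, hidden + n_inputs))
    params[f"b_{gate}"] = np.zeros(hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(12, n_inputs)):      # 12 monthly time steps
    h, c = lstm_cell_step(x_t, h, c, params)
```

With all weights and biases set to zero, every gate outputs 0.5 and the candidate values are 0, so the cell simply halves the previous state; this is a quick sanity check of Equation (4).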

In this paper, a variant of LSTM with external inputs is applied: in addition to past values of the predicted time series, other input parameters that influence the prediction are used. All of these input parameters are arranged in the same data format as the input x. The LSTM network architecture used in this study is identical to the one defined in [34], and the network parameters (the weights and biases in Equations (1), (2), (3), and (5)) are adjusted during training.
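One way to arrange an LSTM with external inputs is to stack the FSI series and the external factors into one feature matrix and pair each month's inputs with the next month's FSI. The Python sketch below assumes this one-step-ahead arrangement; the paper states only that past FSI values and external factors are inputs, so the pairing, the function name, and the example values are hypothetical.

```python
import numpy as np


def make_sequences(fsi, exog):
    """Build input/target sequences for an LSTM with external inputs.

    fsi:  shape (N,)    monthly FSI counts
    exog: shape (N, k)  monthly external factors (weather, motorization, ...)
    Returns X of shape (k + 1, N - 1), laid out as features x time with FSI in
    the first row, and Y of shape (1, N - 1), where Y[:, t] is the FSI count of
    the month after the inputs in X[:, t] (one-step-ahead targets).
    """
    fsi = np.asarray(fsi, dtype=float)
    exog = np.asarray(exog, dtype=float)
    if exog.ndim == 1:
        exog = exog[:, None]                    # single external factor
    if exog.shape[0] != fsi.shape[0]:
        raise ValueError("fsi and exog must cover the same months")
    features = np.column_stack([fsi, exog])     # (N, k + 1)
    X = features[:-1].T                          # inputs for months 1 .. N-1
    Y = fsi[1:].reshape(1, -1)                   # targets for months 2 .. N
    return X, Y


# Hypothetical four months with two external factors each
X, Y = make_sequences([5, 7, 6, 8],
                      [[12.1, 40.0], [15.3, 55.0], [19.8, 62.0], [22.4, 70.0]])
```

Keeping FSI in the first row makes it easy to feed the network's own forecast back in as the next input during multi-step forecasting.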

4. Results

4.1. Data Pre-Processing

During data collection, it was observed that the majority of data were available on a monthly basis, while a few were only annual (number of vehicles and number of licensed young drivers). Data available only on an annual basis were linearly interpolated to a monthly basis to ensure compatibility with other datasets and facilitate their accurate inclusion in the model.
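Linear interpolation of annual figures to a monthly grid can be sketched with `numpy.interp`. Where each annual value sits within its year is not stated in the study; the sketch assumes it is anchored at mid-year, and the function name and example values are hypothetical.

```python
import numpy as np


def annual_to_monthly(years, annual_values):
    """Linearly interpolate annual values onto a monthly grid.

    Assumption (not stated in the study): each annual figure is placed at the
    middle of its year and each month at its own midpoint. `years` must be
    consecutive and increasing. Months before the first or after the last
    anchor take the nearest annual value, which is np.interp's default
    behaviour outside the data range.
    """
    years = np.asarray(years, dtype=float)
    anchors = years + 0.5                                   # mid-year positions
    month_mid = years[0] + (np.arange(12 * len(years)) + 0.5) / 12.0
    return np.interp(month_mid, anchors, np.asarray(annual_values, dtype=float))


# Hypothetical vehicle counts (in thousands) for two consecutive years
monthly = annual_to_monthly([2000, 2001], [100.0, 112.0])
```

Anchoring at mid-year keeps each year's monthly average close to the reported annual figure; anchoring at January or December instead would shift the interpolated curve by half a year.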

All analyses and calculations were performed with data on a monthly basis but are presented in the diagrams on an annual basis to provide a clearer and more comprehensive graphical representation.

4.2. Analysis of Road Casualty Data Among Young Drivers in the Republic of Serbia

Temporal Analysis of Young Driver Casualties (FSI)

During the observed period, a total of 883 young drivers lost their lives and 4636 sustained serious injuries. The number of FSI among young drivers showed a steady increase from 2002 to 2008, followed by a downward trend until 2014. From 2014 onward, the number of FSI has exhibited a consistent year-over-year increase. In 2024, the number of FSI was higher than in 2014 (Figure 4).

The temporal distribution of young driver casualties (FSI) highlights distinct periods of elevated and reduced risk. In the analyzed period, the highest numbers of young drivers killed or seriously injured were recorded in October (567, 10.3%), August (554, 10.0%), and July (545, 9.9%). July also registered the highest monthly fatality count (103, 11.7%). In contrast, the lowest casualty counts occurred between January and April, with February showing the smallest total (305 fatalities and serious injuries) and the fewest fatalities (48, 5.4%) (Figure 5a).

Day-of-week analysis shows a marked increase in FSI beginning on Fridays and peaking over the weekend. Weekends are the most hazardous period for young drivers, with 2453 FSI (44.4%) and the highest number of fatalities (402, 45.5%) (Figure 5b).

Hourly analysis of FSI clearly shows that the most critical period of the day for young drivers is between 1:00 and 3:00 a.m., when a total of 1392 fatalities and serious injuries (25.2%) were recorded. The highest number of fatalities occurred at 3:00 a.m. (93, 10.5%). Compared with all other road users, young drivers showed a particularly high incidence of FSI on Fridays and weekends during the night, from midnight until the morning, when 290 (32.8%) fatalities and 1244 (37.3%) serious injuries were recorded (Figure 5c).

The machine learning model was applied to analyze data aggregated on a monthly basis.

4.3. LSTM Network Architecture

As already noted, the goal of the paper is to forecast FSI for two specific periods between 1997 and 2024. For that purpose, an LSTM network was defined using the available data on traffic accident casualties among young drivers together with other significant factors, such as the motorization rate (the number of registered passenger cars per 1000 inhabitants), the percentage of young people in the population, and weather conditions (air temperature, insolation, precipitation, and the number of days with rain, wet snow, snow, and fog). All these data are available for each month between 1997 and 2024.

Based on the available dataset, the neural network architecture presented in Figure 6 was defined. It consists of four layers: (1) the sequence input layer, which normalizes all input features, arranges them in a data matrix (of dimensions number of input features × number of time instances N), and passes them to (2) the LSTM layer, with a defined number of hidden units (X); (3) the fully connected layer, which forms the response, i.e., the forecasted output sequence; and (4) the regression layer, which calculates the loss function and the accuracy of the predicted data. The last layer is needed only during network training and can be omitted in forecasting tasks. The network was implemented in MATLAB using the Deep Learning Toolbox. The total number of input features is 11 (10 impact factors and the actual number of FSI), and there is one output (the predicted data sequence, i.e., the number of FSI in a specified time period).
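The normalization performed by the sequence input layer can be illustrated on an 11 × N matrix. The paper does not specify which normalization was used, so the Python sketch below assumes per-feature z-score scaling; the function name and the random placeholder data are hypothetical.

```python
import numpy as np


def zscore_rows(X, mean=None, std=None):
    """Z-score a (features x time) matrix feature by feature.

    If mean/std are not given they are computed from X (e.g. the training
    window); passing them explicitly lets test and forecast data be scaled
    with the training statistics.
    """
    X = np.asarray(X, dtype=float)
    if mean is None:
        mean = X.mean(axis=1, keepdims=True)
    if std is None:
        std = X.std(axis=1, keepdims=True)
    std = np.where(std == 0, 1.0, std)      # constant features: avoid division by zero
    return (X - mean) / std, mean, std


# 11 features (10 impact factors + FSI) x 156 months, random placeholder data
rng = np.random.default_rng(1)
X = rng.normal(loc=50.0, scale=10.0, size=(11, 156))
Z, mu, sigma = zscore_rows(X)
```

Returning the statistics matters because the forecast must be mapped back to absolute FSI counts with the same mean and standard deviation that were used during training.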

Calculation and prediction using the LSTM network were divided into two parts. First, all available data from January 1997 to December 2009 were used to forecast the number of FSI for the following three years (2010–2012). These dates were chosen because the new Road Traffic Safety Law was introduced in November 2009, and the aim was to estimate its real-world influence. Second, the period from January 2012 to December 2019 was used to forecast the following 12 months, i.e., all of 2020, the year in which the COVID-19 pandemic had its strongest effects (quarantine, mobility limitations, etc.). Here, the goal was to estimate the impact of the pandemic on casualty numbers. In both cases, the output of the network was the number of young drivers killed or seriously injured (FSI). The assumption is that external factors cannot directly determine the final outcome of an accident (fatal versus serious injury), so the output is defined as the sum of these two outcome categories.
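The two fitting and forecasting windows can be expressed as index ranges over the monthly series that starts in January 1997. The Python sketch below uses hypothetical helper names; it reproduces the window lengths implied by the text (156 and 96 fitting months, 36 and 12 forecast months).

```python
def month_index(year, month, start=(1997, 1)):
    """0-based position of (year, month) in a monthly series that begins at `start`."""
    return (year - start[0]) * 12 + (month - start[1])


def window(first, last):
    """Slice covering the months from `first` to `last`, both inclusive."""
    return slice(month_index(*first), month_index(*last) + 1)


def n_months(s):
    return s.stop - s.start


# Part 1: fit on Jan 1997 to Dec 2009, forecast Jan 2010 to Dec 2012
fit_1, forecast_1 = window((1997, 1), (2009, 12)), window((2010, 1), (2012, 12))
# Part 2: fit on Jan 2012 to Dec 2019, forecast Jan to Dec 2020
fit_2, forecast_2 = window((2012, 1), (2019, 12)), window((2020, 1), (2020, 12))
```

Applied to an array holding the full 1997–2024 monthly series, `series[fit_1]` and `series[forecast_1]` select the data for the first part; in both parts the forecast window starts immediately after the fitting window.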

4.4. Network Training and Testing

In the first part of the research, the total length of the data sequence was 156 (13 years, 12 months each). This data was divided into two sets: a training (first 90%) and a test set (the remaining 10%).
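For time series, the 90/10 split is chronological: the test set is the most recent block rather than a random sample. The Python sketch below shows this; rounding the boundary down is an assumption, since the paper gives only the proportions, and the function name is hypothetical.

```python
import math


def chronological_split(series, train_fraction=0.9):
    """Split a time series into a leading training part and a trailing test part.

    No shuffling: the test set is the most recent block, so the model is
    evaluated on months that come after everything it was trained on.
    Rounding the boundary down is an assumption.
    """
    n_train = math.floor(train_fraction * len(series))
    return series[:n_train], series[n_train:]


months = list(range(156))          # stand-in for the 156 monthly observations
train, test = chronological_split(months)
```

Under this rounding, the 156-month sequence yields 140 training and 16 test months, and the 96-month sequence of the second part yields 86 and 10.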

The network was trained on the training dataset using 200 hidden units (the number of neurons in the hidden LSTM layer). This parameter corresponds to the amount of information that the layer retains between time steps, also called the hidden state, and is usually set empirically by trial and error. The ADAM optimizer was used for network parameter optimization, with a maximum of 250 epochs, a gradient threshold of 1, and an initial learning rate of 0.005. The input data were normalized, as LSTM networks are very sensitive to differences in the magnitudes of input data. Accuracy was estimated using RMSE, whose training value converged to 0.26 for normalized values, corresponding to 0.89 for absolute values (number of FSI per month).
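RMSE is reported both in normalized units and in absolute units (FSI per month). The Python sketch below assumes z-score normalization: because the mean cancels in the error, the absolute RMSE then equals the normalized RMSE multiplied by the standard deviation used for the target. The helper names and all values are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def denormalize(z, mean, std):
    """Invert z-score scaling: z = (y - mean) / std  =>  y = z * std + mean."""
    return np.asarray(z, dtype=float) * std + mean


# Hypothetical values in normalized units, with assumed training
# statistics mean = 13 and std = 2.
z_true = np.array([-1.5, -0.5, 0.5, 1.5])
z_pred = np.array([-1.0, -0.5, 0.0, 2.0])
rmse_norm = rmse(z_true, z_pred)
rmse_abs = rmse(denormalize(z_true, 13.0, 2.0), denormalize(z_pred, 13.0, 2.0))
```

Reporting both values is useful: the normalized RMSE is comparable across training runs, while the absolute RMSE is directly interpretable as an error in casualties per month.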

The network was tested on the test dataset using the same criterion, RMSE, which was 0.29 for the normalized values, i.e., 1.15 for the absolute data values. These values indicate that the trained network performed well, i.e., the prediction error within the interval from 1998 to 2009 is very low. In addition to RMSE, a scale-free metric was also used, namely the Mean Absolute Scaled Error (MASE). The MASE value for testing was 0.48; since a value below 1 means that the model's errors are on average smaller than those of a naive benchmark forecast, this further supports the quality of the model.
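MASE, as defined by Hyndman and Koehler, divides the forecast's mean absolute error by the in-sample mean absolute error of a naive forecast on the training data. The paper does not state which naive benchmark was used (non-seasonal, m = 1, or seasonal, m = 12 for monthly data), so the Python sketch below leaves m as a parameter; the example series are hypothetical.

```python
import numpy as np


def mase(y_train, y_true, y_pred, m=1):
    """Mean Absolute Scaled Error.

    Forecast MAE divided by the in-sample MAE of the naive forecast
    y_hat_t = y_{t-m} on the training series (m=1: random walk;
    m=12: seasonal naive for monthly data). Values below 1 mean the model's
    errors are, on average, smaller than those of the naive benchmark.
    """
    y_train = np.asarray(y_train, dtype=float)
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    errors = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return float(np.mean(errors) / scale)


# Hypothetical series: a training sequence and a two-month test window
score = mase(y_train=[1, 2, 3, 4, 5], y_true=[6, 7], y_pred=[6.5, 6.5])
```

Because the denominator comes from the training data, MASE stays well defined even when test values are close to zero, which is one reason it is preferred over percentage errors for low monthly counts.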

Finally, the forecasting for the period from 2010 to 2012 was conducted, and the results are presented in Figure 7 (monthly level) and Figure 8 (yearly level).

The numerical data presented in Figure 7 and Figure 8 are given in Table 1.

RMSE for the forecasted series is 0.625 for the normalized values, i.e., 8.51 for the absolute values. The forecast accuracy after 1 January 2010 is thus markedly lower than for the previous period, and the out-of-sample gap is substantial. Because this discrepancy is evident from RMSE alone, additional metrics and a full statistical analysis are omitted.

The hypothesis of this part of the research was that the newly introduced Road Traffic Safety Law contributed to a decrease in the number of FSI, and the results obtained by this forecasting method support this assumption. Since the introduction of this law was the only significant change in the external impact factors, we conclude that it had a positive effect on road traffic safety.
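This reasoning treats the LSTM forecast as a counterfactual: the FSI that pre-2010 dynamics would have produced without the new law. The Python sketch below shows one way such a gap could be summarized from monthly observed and forecast series; the function name, the summary statistics, and the example numbers are illustrative assumptions, not the study's procedure or data.

```python
import numpy as np


def forecast_gap(observed, forecast):
    """Compare observed monthly FSI with a forecast fitted on pre-event data.

    A positive total gap means fewer casualties were observed than the
    pre-event dynamics would have produced; attributing the gap to the
    event assumes no other major change occurred in the forecast window.
    """
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    gap = forecast - observed
    return {
        "total_gap": float(gap.sum()),
        "relative_gap": float(gap.sum() / forecast.sum()),
        "months_below_forecast": int(np.sum(observed < forecast)),
    }


# Hypothetical three-month example (not the study's data)
summary = forecast_gap(observed=[8, 9, 10], forecast=[10, 10, 10])
```

Counting the months below the forecast complements the total gap: a consistent sign across months is harder to explain by random fluctuation than a single large deviation.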

The second part of the research was conducted using the same method. The total length of the data sequence was 96 (8 years, from 2012 to 2019), and the set was divided into training and test datasets using the same proportion (90–10). The same network parameters were used as in the first part. RMSE for training was 0.27 for the normalized data and 0.91 for absolute values. In the testing phase RMSE was 0.30 for the normalized, or 1.21 for the absolute data values and the MASE value was 0.46, which means that the achieved accuracy of the network was similar to the first research part.

After the training and testing phase, forecasting for 2020 was conducted, with RMSE equal to 0.27 for normalized and 1.78 for absolute values. The MASE value for this period was 0.58. The numerical results are presented in Figure 9 and Table 2. The forecasted values were again generally higher than the real data, but the difference is considerably smaller than in the first part (Figure 8).

5. Discussion

This study examined the temporal patterns and underlying dynamics of FSI among young passenger car drivers in Serbia over a 28-year period (1997–2024). Using Long Short-Term Memory (LSTM) neural networks, the study introduced a novel deep learning approach to forecast short-term variations in the number of FSI and to assess the impact of two key events: the introduction of the new Road Traffic Safety Law in 2010 and the COVID-19 lockdown in 2020.

The results confirmed that the LSTM model achieved satisfactory predictive performance, as reflected in low RMSE values and the model’s ability to capture complex temporal dependencies in the data. Unlike traditional statistical models, which often assume linear relationships, the LSTM approach effectively modeled the nonlinear and dynamic nature of traffic casualty data. The model’s sensitivity to time-dependent fluctuations also allowed for a more realistic reflection of behavioral, environmental, and policy-related influences on crash outcomes.

A comparison of the actual and predicted values revealed that the model consistently forecasted higher casualty numbers than those observed in the post-intervention periods, by a wide margin in 2010–2012 and by a much smaller one in 2020. This finding suggests that both the 2010 Road Traffic Safety Law and the 2020 COVID-19 restrictions had a tangible positive impact on reducing FSI among young drivers. The legislative reform, in particular, strengthened enforcement mechanisms, introduced a penalty point system, and improved infrastructure safety standards, all of which contributed to the declining trend in casualties. Similarly, the temporary reduction in traffic volume during the COVID-19 lockdowns produced a short-term decline in casualties, further validating the model’s ability to detect the effects of external disruptions.
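Whether the over-forecasting was consistent can be checked month by month. The sketch below counts, from the values transcribed from Table 1 and Table 2, the months in which the forecast was above, equal to, or below the observed count. In the tabulated values, the forecast exceeds the observed count in all 36 months of 2010–2012 and in 11 of 12 months of 2020, with November 2020 a tie. The check is descriptive only and says nothing about why the forecast was higher; `forecast_direction` is an illustrative name.

```python
# Monthly (actual, predicted) FSI transcribed from Table 1 and Table 2.
PERIODS = {
    "2010-2012": (
        [10, 4, 9, 13, 12, 19, 9, 25, 25, 13, 18, 14,
         19, 8, 12, 11, 12, 14, 14, 18, 12, 20, 17, 16,
         9, 2, 8, 12, 16, 11, 19, 18, 12, 16, 12, 10],
        [25, 11, 12, 18, 19, 20, 24, 26, 26, 24, 24, 23,
         21, 18, 13, 17, 19, 21, 25, 29, 27, 24, 23, 22,
         20, 15, 14, 19, 19, 23, 26, 27, 23, 21, 21, 20],
    ),
    "2020": (
        [15, 7, 7, 3, 10, 7, 15, 4, 13, 15, 13, 14],
        [16, 9, 8, 5, 12, 11, 16, 6, 14, 16, 13, 15],
    ),
}


def forecast_direction(actual, predicted):
    """Count months where the forecast was above, equal to, or below actual."""
    pairs = list(zip(actual, predicted))
    above = sum(p > a for a, p in pairs)
    equal = sum(p == a for a, p in pairs)
    below = sum(p < a for a, p in pairs)
    return above, equal, below


for name, (act, pred) in PERIODS.items():
    above, equal, below = forecast_direction(act, pred)
    print(f"{name}: above={above}, equal={equal}, below={below}")
```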

However, despite legislative and technological progress, young drivers continue to represent a disproportionately high-risk group. Their overrepresentation in accident statistics points to persistent behavioral and psychological factors, such as inexperience, overconfidence, and risk-taking, that offset the benefits of technological and regulatory improvements. These results are consistent with previous research emphasizing that road safety outcomes are shaped by a combination of human, technical, and institutional elements. Thus, improving safety among young drivers requires not only continuous legal enforcement but also targeted education, awareness campaigns, and early intervention programs.

The findings also highlight the importance of maintaining comprehensive and high-quality traffic safety databases. Although the RTSA dataset provides a reliable foundation for long-term analysis, continued efforts are needed to improve data completeness, standardization, and temporal granularity. Expanding the database to include behavioral indicators or near-miss data would allow for even deeper analytical insights.

Overall, the discussion underscores that while deep learning models such as LSTM provide powerful predictive capabilities, their greatest value lies in supporting evidence-based policy formulation. The integration of these tools with traditional analytical approaches, coupled with ongoing improvements in data collection and public awareness, can significantly enhance Serbia’s capacity to reduce the number of FSI among young drivers.

Beyond the Serbian context, the methodological framework proposed in this study can be adapted for use in other regions with similar data structures and traffic safety challenges. The approach allows policymakers and researchers to conduct “what-if” simulations, assess the effectiveness of implemented safety measures, and identify potential future risks. Moreover, by integrating additional explanatory variables—such as road infrastructure characteristics, behavioral indicators, and vehicle safety technologies—the predictive accuracy and interpretability of the model can be further enhanced.

6. Conclusions

The present study analyzed the long-term trends in FSI involving young car drivers (aged 18–24) in the Republic of Serbia from 1997 to 2024 and evaluated the influence of major events, including the introduction of the new Road Traffic Safety Law in 2010 and the COVID-19 pandemic in 2020. By applying Long Short-Term Memory (LSTM) neural networks to historical accident data, the study demonstrated the potential of deep learning approaches for short-term forecasting of FSI.

The results confirmed that the LSTM model achieved satisfactory predictive accuracy and effectively captured the nonlinear and temporal dependencies in the data. The forecasts indicated that both legislative reforms and temporary mobility restrictions had a measurable positive impact in reducing the number of FSI among young drivers. These findings support the hypothesis that machine learning algorithms can overcome the limitations of traditional statistical models and provide deeper insights into complex, multifactorial road safety dynamics.

The developed model provides a practical foundation for traffic safety planning, enabling virtual experiments, identification of future trend changes, and evaluation of new policy measures. Continued improvement of data quality and inclusion of additional explanatory factors—such as behavioral indicators, vehicle safety technologies, and spatial components—would further enhance its predictive performance and policy relevance.

One contribution of this study lies in demonstrating the potential of deep learning techniques as future decision-support tools for traffic safety management. The developed LSTM model enables the simulation of various scenarios, identification of potential turning points, and assessment of the impact of future policy measures. At its current stage of development, however, serving as a decision-support tool is not the model’s primary function. In addition, the modeling framework allows the inclusion of new explanatory variables, such as socioeconomic indicators, vehicle fleet composition, or environmental conditions, which could further enhance its prediction accuracy and policy relevance.

However, the study has several limitations. The predictive model relies primarily on historical time-series data and does not explicitly incorporate behavioral, infrastructural, or environmental variables, or the adoption of Intelligent Transport Systems (ITS) and in-vehicle technologies, which may also influence casualty trends. Moreover, as with most deep learning models, interpretability remains a challenge—while the model performs well in prediction, understanding the relative contribution of individual factors requires further methodological development.

Overall, this study demonstrates that advanced machine learning techniques can substantially improve the reliability of traffic casualty forecasting and provide actionable insights for safety-related interventions. The forecasting results can guide the allocation of enforcement resources, such as targeted traffic patrols or awareness campaigns, and inform the design of preventive measures, including driver education programs and infrastructure improvements. Strengthening traffic law enforcement, promoting the adoption of ITS and in-vehicle safety technologies, and enhancing driver education can, when combined with predictive analytics, enable more proactive and effective strategies to reduce FSI among young drivers and improve overall road safety.

Author Contributions

Conceptualization, S.B. and Ž.K.; methodology, S.B. and Ž.K.; validation, S.B., Ž.K., and T.B.; formal analysis, S.B. and Ž.K.; investigation, S.B. and T.B.; data curation, S.B. and T.B.; writing—original draft preparation, S.B. and Ž.K.; writing—review and editing, T.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been supported by the Provincial Secretariat for Higher Education and Scientific Research (Autonomous Province of Vojvodina) through project no. 003077543 2024 09418 003 000 000 001 01 001 “Development of a spatially based intelligent traffic safety management system”, and the APC has been partially funded by the Ministry of Science, Technological Development and Innovation (Contract No. 451-03-137/2025-03/200156) and the Faculty of Technical Sciences, University of Novi Sad, through project “Scientific and Artistic Research Work of Researchers in Teaching and Associate Positions at the Faculty of Technical Sciences, University of Novi Sad 2025” (No. 01-50/295).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ML	Machine learning
DT	Decision trees
CART	Classification and regression trees
XGBoost	Extreme gradient boosting
RF	Random forest
LSTM	Long Short-Term Memory
MLAs	Machine learning algorithms
RTSA	Road Traffic Safety Agency of the Republic of Serbia
FSI	Fatally and seriously injured
VI	Vulnerability indicator
RNN	Recurrent Neural Network
RMSE	Root Mean Squared Error
MASE	Mean Absolute Scaled Error

References

World Health Organization. Global Status Report on Road Safety 2023; World Health Organization: Geneva, Switzerland, 2023.
National Center for Statistics and Analysis. Young Drivers: 2022 Data (Traffic Safety Facts. Report No. DOT HS 813 601); National Highway Traffic Safety Administration: Washington, DC, USA, 2024.
Department for Transport. Reported Road Casualties in Great Britain: Younger Driver Factsheet, 2023; Department for Transport: London, UK, 2024.
International Transport Forum. Road Safety Annual Report 2024; OECD Publishing: Paris, France, 2024.
Carson, J.; Jost, G.; Meinero, M. Ranking EU Progress on Road Safety, 18th Road Safety Performance Index Report; European Transport Safety Council: Brussels, Belgium, 2024.
European Commission. Road Safety Thematic Report—Young Novice Drivers. European Road Safety Observatory; European Commission, Directorate General for Transport: Brussels, Belgium, 2023.
European Commission. European Road Safety Observatory: Facts and Figures—Young People—2021. Available online: https://road-safety.transport.ec.europa.eu/system/files/2022-01/F%26F_young_people_20211221.pdf (accessed on 25 August 2025).
Ulleberg, P.; Rundmo, T. Personality, attitudes and risk perception as predictors of risky driving behaviour among young drivers. Saf. Sci. 2003, 41, 427–443.
Lajunen, T.; Summala, H. Driving experience, personality, and skill and safety-motive dimensions in drivers’ self-assessments. Pers. Individ. Differ. 1995, 19, 307–318.
Traficante, S.; Tinella, L.; Lopez, A.; Bosco, A.; Koppel, S.; Spano, G.; Napoletano, R.; Ricciardi, E.; Caffò, A.O. Driving-related factors affecting mind-wandering behind the wheel: A systematic review. Transp. Rev. 2025, 45, 948–969.
Unsworth, N.; McMillan, B.D. Similarities and differences between mind-wandering and external distraction: A latent variable analysis of lapses of attention and their relation to cognitive abilities. Acta Psychol. 2014, 150, 14–25.
Pourroostaei Ardakani, S.; Liang, X.; Mengistu, K.T.; So, R.S.; Wei, X.; He, B.; Cheshmehzangi, A. Road car accident prediction using a machine-learning-enabled data analysis. Sustainability 2023, 15, 5939.
Bandi, P.; Silver, D.; Mijanovich, T.; Macinko, J. Temporal trends in motor vehicle fatalities in the United States, 1968 to 2010—A joinpoint regression analysis. Inj. Epidemiol. 2015, 2, 4.
Lović Obradović, S.; Rabiei-Dastjerdi, H.; Matović, S. Identifying spatiotemporal variability of traffic accident mortality. Evidence from the City of Belgrade, Serbia. Cent. Eur. J. Geogr. Sustain. Dev. 2022, 4, 78–93.
Ren, K.; Miao, L.; Lyu, J. The temporal trend of road traffic mortality in China from 2004 to 2020. SSM Popul. Health 2023, 24, 101527.
Melchor, I.; Nolasco, A.; Moncho, J.; Quesada, J.A.; Pereyra-Zamora, P.; García-Senchermés, C.; Salinas, M. Trends in mortality due to motor vehicle traffic accident injuries between 1987 and 2011 in a Spanish region (Comunitat Valenciana). Accid. Anal. Prev. 2015, 77, 21–28.
Arora, Y.K.; Kumar, S. Statistical approach to predict road accidents in India. In Computing in Engineering and Technology, Proceedings of ICCET 2019, Nagoya, Japan, 12–15 April 2019; Springer: Singapore, 2019; pp. 189–196.
Bamel, K.; Dass, S.; Jaglan, S.; Suthar, M. Statistical analysis and development of accident prediction model of road safety conditions in Hisar City. In Proceedings of the IOP Conference Series: Earth and Environmental Science, Online, 24–25 November 2021; IOP Publishing: Bristol, UK, 2021; Volume 889, p. 012034.
Jindal, R.K.; Agarwal, A.K.; Sahoo, A.K. Envisaging the road accidents using regression analysis. Int. J. Adv. Sci. Technol. 2020, 29, 1708–1716.
Wu, Q.; Zhang, G.; Zhu, X.; Liu, X.C.; Tarefder, R. Analysis of driver injury severity in single-vehicle crashes on rural and urban roadways. Accid. Anal. Prev. 2016, 94, 35–45.
Zeng, Q.; Gu, W.; Zhang, X.; Wen, H.; Lee, J.; Hao, W. Analyzing freeway crash severity using a Bayesian spatial generalized ordered logit model with conditional autoregressive priors. Accid. Anal. Prev. 2019, 127, 87–95.
Yahaya, M.; Fan, W.; Fu, C.; Li, X.; Su, Y.; Jiang, X. A machine-learning method for improving crash injury severity analysis: A case study of work zone crashes in Cairo, Egypt. Int. J. Inj. Control Saf. Promot. 2020, 27, 266–275.
Zhang, J.; Li, Z.; Pu, Z.; Xu, C. Comparing prediction performance for crash injury severity among various machine learning and statistical methods. IEEE Access 2018, 6, 60079–60087.
Silva, P.B.; Andrade, M.; Ferreira, S. Machine learning applied to road safety modeling: A systematic literature review. J. Traffic Transp. Eng. (Engl. Ed.) 2020, 7, 775–790.
Mannering, F.; Bhat, C.R.; Shankar, V.; Abdel-Aty, M. Big data, traditional data and the tradeoffs between prediction and causality in highway-safety analysis. Anal. Methods Accid. Res. 2020, 25, 100113.
Obasi, I.C.; Benson, C. Evaluating the effectiveness of machine learning techniques in forecasting the severity of traffic accidents. Heliyon 2023, 9, e19172.
Chen, J.; Liu, P.; Wang, S.; Zheng, N.; Guo, X. Prediction and interpretation of crash severity using machine learning based on imbalanced traffic crash data. J. Saf. Res. 2025, 93, 185–199.
Gutierrez-Osorio, C.; Pedraza, C. Modern data sources and techniques for analysis and forecast of road accidents: A review. J. Traffic Transp. Eng. (Engl. Ed.) 2020, 7, 432–446.
Yang, Y.; Wang, K.; Yuan, Z.; Liu, D. Predicting freeway traffic crash severity using XGBoost-Bayesian network model with consideration of features interaction. J. Adv. Transp. 2022, 2022, 4257865.
Hashmienejad, S.H.A.; Hasheminejad, S.M.H. Traffic accident severity prediction using a novel multi-objective genetic algorithm. Int. J. Crashworthiness 2017, 22, 425–440.
Yan, M.; Shen, Y. Traffic accident severity prediction based on random forest. Sustainability 2022, 14, 1729.
Ventura, R.; Barabino, B.; Maternini, G. Prediction of the severity of exceeding design traffic loads on highway bridges. Heliyon 2024, 10, e23374.
Lee, K.; Eo, M.; Jung, E.; Yoon, Y.; Rhee, W. Short-term traffic prediction with deep neural networks: A survey. IEEE Access 2021, 9, 54739–54756.
Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270.
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
Afrin, T.; Yodo, N. A Long Short-Term Memory-based correlated traffic data prediction framework. Knowl.-Based Syst. 2022, 237, 107755.
Stanojević, P.; Lajunen, T.; Jakšić, D.; Jovanović, D.; Matović, B. Effectiveness of implementing a Graduated Driver Licensing (GDL) law among young Serbian drivers. J. Saf. Res. 2022, 83, 339–348.
Bačkalić, S.; Jovanović, D.; Rajčević, S.; Pljakić, M. Analysis of casualties of young car drivers in traffic accidents by gender differences: Temporal trends in the Republic of Serbia. Int. J. Adolesc. Youth 2025, 30, 2536799.
Pešić, A.; Stephens, A.N.; Newnam, S.; Čičević, S.; Pešić, D.; Trifunović, A. Youth perceptions and attitudes towards road safety in Serbia. Systems 2022, 10, 191.
Statistical Office of the Republic of Serbia. Statistical Release—Estimated Population. 2024. Available online: https://www.stat.gov.rs/en-US/vesti/statisticalrelease/?p=17030&a=18&s=1801 (accessed on 25 August 2025).
Road Traffic Safety Agency. Integrated Database of Traffic Safety Characteristics. (In Serbian: Integrisana Baza Podataka o Obeležjima Bezbednosti Saobraćaja). Available online: https://bazaabs.abs.gov.rs/absPortal/ (accessed on 25 August 2025).
Shaaban, K.; Elamin, M.; Alsoub, M. Intelligent transportation systems in a developing country: Benefits and challenges of implementation. Transp. Res. Procedia 2021, 55, 1373–1380.
Official Gazette of Republic of Serbia. Road Traffic Safety Law; No. 24/2020; Official Gazette of Republic of Serbia: Belgrade, Serbia, 2020.
European Transport Safety Council. Ranking EU Progress on Road Safety. 19th Road Safety Performance Index (PIN) Report; European Transport Safety Council: Brussels, Belgium, 2025.
Sherstinsky, A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys. D 2020, 404, 132306.

Figure 1. Road safety indicators for passenger car drivers aged 18–24 in Serbia (1997–2024).

Figure 2. The basic diagram of an LSTM network cell.

Figure 3. LSTM cells connected in a network.

Figure 4. Casualty (FSI) trends among car drivers aged 18–24 in Serbia, 1997–2024.

Figure 5. Temporal distribution of young passenger car driver casualties (FSI) in Serbia, 1997–2024.

Figure 6. The LSTM model architecture.

Figure 7. Actual and predicted values of FSI, monthly level.

Figure 8. Actual and predicted values of FSI, yearly level.

Figure 9. Actual and predicted number of FSI in 2020, monthly level.

Table 1. Actual and predicted number of casualties (FSI) of young car drivers during the period of 2010–2012.

Month	Actual FSI (2010)	Predicted FSI (2010)	Actual FSI (2011)	Predicted FSI (2011)	Actual FSI (2012)	Predicted FSI (2012)
January	10	25	19	21	9	20
February	4	11	8	18	2	15
March	9	12	12	13	8	14
April	13	18	11	17	12	19
May	12	19	12	19	16	19
June	19	20	14	21	11	23
July	9	24	14	25	19	26
August	25	26	18	29	18	27
September	25	26	12	27	12	23
October	13	24	20	24	16	21
November	18	24	17	23	12	21
December	14	23	16	22	10	20
TOTAL	171	252	173	259	145	248

Table 2. Actual and predicted number of FSI among young car drivers in 2020.

	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec	TOTAL
Actual	15	7	7	3	10	7	15	4	13	15	13	14	123
Predicted	16	9	8	5	12	11	16	6	14	16	13	15	141

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Bačkalić, S.; Kanović, Ž.; Bačkalić, T. A Case Study on Predicting Road Casualties Among Young Car Drivers in the Republic of Serbia Using Machine Learning. Safety 2025, 11, 107. https://doi.org/10.3390/safety11040107


