1. Introduction
Road traffic injuries represent a major global public health issue that requires coordinated and collective efforts to effectively prevent traffic accidents and their consequences. Each year, approximately 1.19 million people lose their lives on the roads, while an additional 30 to 50 million suffer injuries [
1]. A worrying fact is that road traffic accidents are among the top three causes of death globally and represent the leading cause of death for young adults aged 15–29 years [
2,
3]. Although individuals of all age groups are affected by traffic accidents, young people are disproportionately represented in accident statistics across all countries [
4,
5]. Between 2015 and 2019 in the EU, an average of 1215 young car drivers (aged 18–24) died in crashes each year, accounting for 16% of all car driver fatalities during that period [
6]. In the EU27 in the period 2010–2019, car drivers accounted for around 40% of fatalities among young people aged 18–24 [
7].
The factors most often associated with low road safety levels among young people are lifestyle, inexperience, and the non-use of protective systems. Speeding, fatigued driving, drunk driving, and overloading/overcrowding are primary accident causes. In addition, attitudes, driving behavior, experience, personality, and skill play significant roles [
8,
9]. Another critical factor that has been increasingly recognized in the literature is driver distraction. Distraction can arise from internal sources, such as mind-wandering or mobile phone use, and external sources, such as roadside advertising or complex traffic environments. Studies show that both internal and external distractions significantly increase the risk of accidents, often interacting with other behavioral and environmental factors to exacerbate crash likelihood [
10,
11]. This underscores the importance of considering distraction as a key determinant of road safety, particularly for young drivers, who are more susceptible to cognitive and attentional lapses. Road accidents are also influenced by external factors, including weather conditions (such as rainfall, temperature, and windstorms), lighting conditions (daylight or nighttime), time of day, day of the week (weekday or weekend), month, and season. These factors are important determinants of road traffic accidents and should therefore be used in the training of predictive models [
12]. In order to mitigate the risks associated with young drivers, it is essential to implement a comprehensive set of countermeasures. These should include improvements in the areas of training, education, testing, communication, enforcement, and technology.
Monitoring and analyzing the temporal distribution of traffic accidents and their consequences makes it possible to observe trends and thus gain knowledge that is useful in planning and implementing activities aimed at improving traffic safety [
13,
14,
15,
16]. The creation of policies, strategies, and action plans that have a clear goal of reducing the number of fatalities and injuries in traffic accidents is based on analysis and understanding of existing problems in this area. Several studies have employed conventional statistical analysis techniques to examine factors related to casualties and predict road accident frequency [
17,
18,
19,
20,
21]. Although such models offer mathematical interpretability and facilitate a clearer understanding of the effects of individual predictor variables, they are often constrained by limitations, including low predictive accuracy and the potential for biased parameter estimation [
22]. In response to these shortcomings, researchers have made significant efforts to investigate the applicability of machine learning (ML) techniques in road safety modeling [
23,
24,
25]. ML techniques have been increasingly adopted as robust alternatives to traditional analytical methods for studying factors related to casualties and accident frequency [
23,
26,
27]. Traffic accident data are inherently nonlinear and often require domain-specific knowledge for accurate prediction. Deep learning methods can autonomously learn this underlying knowledge directly from the data, enabling more accurate and adaptive accident or casualty forecasting. Moreover, machine learning techniques facilitate the extraction of meaningful patterns and insights from large, complex, and heterogeneous datasets. Among the most commonly utilized ML methods in this context are decision trees (DT) [
28], Bayesian networks [
29], classification and regression trees (CART) [
30], extreme gradient boosting (XGBoost) [
29], and random forest (RF) [
31]. Artificial Neural Networks (ANN) have been successfully applied in traffic modeling and prediction of exceeding traffic load severity [
32]. Recently, applications of Deep Neural Network Models have been introduced in traffic accident forecasting [
12,
33]. Generally, when problems regarding time-series prediction are considered, the most common neural network architectures used are Recurrent Neural Networks [
34]. According to the literature, Long Short-Term Memory (LSTM) networks are among the most efficient and accurate of them [
35,
36]. Therefore, a network architecture of this type was used in the present study for FSI frequency forecasting [
24,
27].
Although previous studies have examined young drivers in Serbia, important gaps remain. Research shows that inexperience, risky behaviors, and low risk perception contribute to high crash rates, and interventions such as graduated driver licensing (GDL) programs improve attitudes but have limited impact on actual crashes [
37]. Other studies highlight gender, urban/rural, and behavioral differences [
38,
39], yet these analyses typically rely on conventional statistics and do not integrate long-term trends or predictive modeling. The effects of major events, such as legislative changes or the COVID-19 pandemic, on FSI are also largely unexplored. This study addresses these gaps by applying LSTM-based deep learning models to national datasets, enabling the prediction and comprehensive understanding of young driver casualties (FSI) and providing insights for improving traffic safety policy and targeted interventions.
The aim of this study is to present and analyze the trends in the numbers of fatalities and serious injuries (FSI) among young drivers in the Republic of Serbia over the period 1997–2024, and to develop accurate short-term forecasts of casualties (FSI) for two specific periods within it. These periods follow two critical turning-point years: 2010, when Serbia introduced a new Road Traffic Safety Law, and 2020, when the COVID-19 lockdown took place. One of the main goals is to apply deep learning methods to estimate and quantify the impact of these two events on the trend of casualties (FSI) among young drivers, which is, to the best of our knowledge, the first such study in Serbia.
The central hypothesis of the study is that machine learning algorithms (MLAs), particularly deep learning architectures such as LSTM, can reliably predict short-term trends in traffic accident outcomes involving young car drivers based on historical data.
The specific objectives of the study are as follows: (1) To analyze the temporal trends of young driver casualties (FSI) in Serbia over a 28-year period. (2) To identify key contributing factors to young driver casualties (FSI) using available national datasets. (3) To predict the number of young driver casualties (FSI) following two specific events: the introduction of the new Road Traffic Safety Law and the COVID-19 pandemic. (4) To evaluate the practical implications of model predictions for public policy and youth-targeted road safety programs.
This study contributes to the field of road safety research in several key ways. First, it is one of the first studies to apply deep learning techniques (LSTM networks) to the prediction of youth-related traffic casualties in Serbia. Second, it provides a thorough empirical analysis of nearly three decades of national accident data. Third, it offers actionable insights to inform evidence-based traffic safety planning and targeted policy interventions for young drivers.
This article is organized as follows:
Section 2 presents the description of the problem.
Section 3 outlines the methodology applied in this empirical case study.
Section 4 reports the results and analysis of young driver casualties (FSI), including forecasting, using deep learning methods.
Section 5 discusses the key insights and implications for traffic safety policy, and
Section 6 concludes the study.
2. Problem Description
The Republic of Serbia, situated in Southeast Europe, remains below the EU average in terms of socioeconomic development and continues to face substantial challenges in the area of road traffic safety. In 2024, the estimated population was about 6.5 million [
40], with approximately 2.4 million registered motor vehicles. In 2023, road traffic accidents claimed the lives of 503 individuals, including 163 passenger vehicle drivers—representing 32% of all fatalities. Among these drivers, 17 were young individuals aged 18–24 (15 men and 2 women), accounting for 12% of all passenger car driver fatalities [
41]. Notably, over 38% of traffic fatalities among youth were the drivers themselves, and the trend is likely to continue without effective interventions. The adoption of in-vehicle safety systems (ABS, ESC, airbags) and Intelligent Transport Systems (ITS) has improved traffic safety in developing countries [
42]. In Serbia, their impact is limited by a fleet dominated by imported used cars lacking modern safety features and by modest ITS infrastructure. Consequently, fluctuations in safety outcomes are more closely associated with legislative interventions (such as stricter penalties, mandatory seat belt use, and drink-driving laws) than with technological improvements. Following the 2010 Road Traffic Safety Law, fatalities declined moderately, reflecting the benefits of regulation, enforcement, and driver education. Despite these gains, young drivers remain overrepresented in accidents, underscoring the persistent influence of behavioral factors and the complex interplay of human, technical, and institutional elements.
In response to these challenges, Serbia introduced a new Road Traffic Safety Law at the end of 2009, followed by a revised version in 2020, aiming to strengthen the legal framework and improve road safety for young drivers [
43]. The Road Traffic Safety Law in Serbia, which came into force in 2010, introduced numerous innovations in the field of traffic regulation. Among the most significant were the implementation of strategic traffic safety measures, the establishment of a traffic safety financing system, the introduction of a penalty points system, improvements in road infrastructure safety, and the organization of traffic safety campaigns.
For many years, the Republic of Serbia has ranked near the bottom of road safety performance lists. The latest Annual Report, which evaluates the progress of 32 European and associated countries, shows that Serbia recorded the highest road mortality rate in 2024, with 78 road deaths per million inhabitants [
44]. Over the past three decades, the country has experienced a turbulent period marked by political instability, armed conflicts, sanctions, economic transition, privatization, a rising standard of living, the COVID-19 pandemic, and several amendments to the Road Traffic Safety Law. A particularly concerning aspect of Serbia’s poor road safety record is the high mortality rate among young car drivers. The core of the issue lies in the unacceptably high number of fatalities within this group of road users, as well as significant year-to-year fluctuations in their occurrence.
Three indicators were used to assess the road safety performance of young car drivers: (1) Vulnerability indicator (VI), (2) FSI rate per 100,000 people (FSI/pop), and (3) FSI rate per 10,000 registered motor vehicles (FSI/rmv). Vulnerability indicator (VI) is calculated by dividing the proportion of FSI in a given age group (within the total number of all FSI) by the proportion that the same age group represents in the total population. If the VI is greater than 1, the observed age group is considered overrepresented among the casualties. The analysis of the VI shows that, throughout the entire period, the observed age group of young drivers was a vulnerable group, with values ranging from 1.67 (in 2016) to a maximum of 2.24 (in 2006 and 2011). The FSI rate per 100,000 people exhibited a downward trend from 2007 to 2014, after which it has shown a consistent year-on-year increase. Over the past decade, the highest value of FSI rate per 100,000 population was 3.0 and it was recorded in 2024. The FSI rate per 10,000 vehicles also declined between 2007 and 2014, after which it remained constant at 0.6 FSI of young drivers per 10,000 registered vehicles throughout the 2014–2024 period (
Figure 1).
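The three indicators defined above can be sketched in a few lines of Python. This is only an illustration of the stated definitions, not the authors' implementation: the function names and the FSI and youth population counts are hypothetical, and the use of the total population and the total vehicle fleet as denominators is an assumption. Only the population (about 6.5 million) and the vehicle count (about 2.4 million) are taken from the text.

```python
def vulnerability_indicator(fsi_group, fsi_total, pop_group, pop_total):
    """VI: the group's share of all FSI divided by its share of the population."""
    fsi_share = fsi_group / fsi_total
    pop_share = pop_group / pop_total
    return fsi_share / pop_share


def fsi_rate(fsi_count, denominator, per):
    """FSI per `per` units of the denominator (people or registered vehicles)."""
    return fsi_count / denominator * per


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
fsi_young, fsi_all = 200, 1250
pop_young = 520_000
# Approximate figures quoted in the text for Serbia.
pop_all = 6_500_000
vehicles = 2_400_000

vi = vulnerability_indicator(fsi_young, fsi_all, pop_young, pop_all)
fsi_per_pop = fsi_rate(fsi_young, pop_all, 100_000)   # FSI/pop
fsi_per_rmv = fsi_rate(fsi_young, vehicles, 10_000)   # FSI/rmv

print(f"VI = {vi:.2f}, FSI/pop = {fsi_per_pop:.2f}, FSI/rmv = {fsi_per_rmv:.2f}")
```

With these illustrative inputs the group holds 16% of all FSI but only 8% of the population, so the VI is 2 and the group counts as overrepresented.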
4. Results
4.1. Data Pre-Processing
During data collection, it was observed that the majority of data were available on a monthly basis, while a few were only annual (number of vehicles and number of licensed young drivers). Data available only on an annual basis were linearly interpolated to a monthly basis to ensure compatibility with other datasets and facilitate their accurate inclusion in the model.
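A minimal sketch of such an interpolation is given here in Python. It assumes that every annual value refers to the same point in its year, so that consecutive values are exactly 12 months apart; the text does not state which anchoring was used. The function name and the vehicle counts are hypothetical.

```python
def annual_to_monthly(annual_values):
    """Linearly interpolate annual values onto a monthly grid.

    Assumes consecutive annual values are 12 months apart. Returns
    12 * (n - 1) + 1 values, running from the first annual value
    to the last one inclusive.
    """
    monthly = []
    for start, end in zip(annual_values, annual_values[1:]):
        step = (end - start) / 12
        monthly.extend(start + step * m for m in range(12))
    monthly.append(annual_values[-1])
    return monthly


# Hypothetical annual numbers of registered vehicles for three years.
vehicles_annual = [1_200_000, 1_260_000, 1_320_000]
vehicles_monthly = annual_to_monthly(vehicles_annual)

print(len(vehicles_monthly))   # number of monthly points
print(vehicles_monthly[:3])    # first three monthly values
```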
All analyses and calculations were performed with data on a monthly basis but are presented in the diagrams on an annual basis to provide a clearer and more comprehensive graphical representation.
4.2. Analysis of Road Casualty Data Among Young Drivers in the Republic of Serbia
Temporal Analysis of Young Driver Casualties (FSI)
During the observed period, a total of 883 young drivers lost their lives and 4636 sustained serious injuries. The number of FSI among young drivers showed a steady increase from 2002 to 2008, followed by a downward trend until 2014. From 2014 onward, the number of FSI has exhibited a consistent year-over-year increase. In 2024, the number of FSI was higher than ten years ago (
Figure 4).
The temporal distribution of young driver casualties (FSI) highlights distinct periods of elevated and reduced risk. In the analyzed period, the highest numbers of young driver fatalities and seriously injured young drivers were recorded in October (567, 10.3%), August (554, 10.0%), and July (545, 9.9%). July also registered the highest monthly fatality count (103, 11.7%). In contrast, the lowest casualty rates occurred between January and April, with February showing the smallest total (305 fatalities and serious injuries) and the fewest fatalities (48, 5.4%) (
Figure 5a).
Day-of-week analysis shows a significant increase in FSI beginning on Fridays, peaking over weekends. Weekends constitute the most hazardous period for young drivers, with 2453 FSI (44.4%) and the highest number of fatalities (402, 45.5%) (
Figure 5b).
Hourly analysis of FSI clearly shows that the most critical period of the day for young drivers is between 1:00 and 3:00 a.m., when a total of 1392 fatalities and serious injuries were recorded (25.2%). The highest number of fatalities occurred at 3:00 a.m. (93, 10.5%). A particularly high incidence of FSI in this category of road users compared to all other road users was recorded on Fridays and during the weekend over the nighttime period from midnight until the morning, when 290 (32.8%) fatalities and 1244 (37.3%) serious injuries were recorded (
Figure 5c).
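The percentages quoted in this subsection follow directly from the reported totals. The short Python sketch below reproduces some of them from the counts given in the text; the helper name `share` is introduced here only for illustration.

```python
FATALITIES = 883
SERIOUS_INJURIES = 4636
TOTAL_FSI = FATALITIES + SERIOUS_INJURIES  # all FSI in the observed period


def share(count, total):
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    return round(100 * count / total, 1)


# Monthly FSI counts reported in the text.
monthly_fsi = {"October": 567, "August": 554, "July": 545}
monthly_shares = {month: share(n, TOTAL_FSI) for month, n in monthly_fsi.items()}

weekend_share = share(2453, TOTAL_FSI)        # weekend FSI
july_fatality_share = share(103, FATALITIES)  # July fatalities

print(TOTAL_FSI, monthly_shares, weekend_share, july_fatality_share)
```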
The machine learning model was applied to analyze data aggregated on a monthly basis.
4.3. LSTM Network Architecture
As already noted, the goal of the paper is to forecast FSI in two specific periods between 1997 and 2024. For that purpose, an LSTM network was defined, using the available data on traffic accident casualties among young drivers along with other significant factors, such as motorization rate (the number of registered passenger cars per 1000 inhabitants), the percentage of young people in the population, and weather conditions (air temperature, insolation, precipitation, and the number of days with rain, wet snow, snow, and fog). All of these data are available for each month between 1997 and 2024.
The neural network architecture, defined on the basis of the available dataset, is presented in
Figure 6 and consists of four layers: (1) the sequence input layer, which normalizes all input features, formats them into a data matrix (with dimensions equal to the number of input features × the number of time instances N), and passes them to (2) the LSTM layer, with a defined number of hidden units (X); (3) the fully connected layer, which forms the response, i.e., the forecasted output sequence; and (4) the regression layer, which calculates the loss function and the accuracy of the predicted data. The last layer is necessary only during network training and can be omitted in forecasting tasks. The network was implemented in Matlab, using Deep Learning Toolbox. The total number of input features is 11 (10 impact factors and the actual number of FSI), and the number of outputs is one (the predicted data sequence, i.e., the number of FSI in a specified time period).
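To make the data flow through the LSTM and fully connected layers concrete, the following Python sketch runs a standard LSTM cell over a short sequence and maps each hidden state to one output value. It is not the authors' Matlab implementation: the weights are random, the hidden size is reduced to 4 to keep the example small, and all function names are hypothetical. The parameter count at the end assumes the standard four-gate LSTM formulation with one bias vector per gate.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, U, h, b):
    """Compute W·x + U·h + b for one gate (matrices as nested lists)."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def lstm_step(x, h, c, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    pre = {name: affine(W, x, U, h, b) for name, (W, U, b) in params.items()}
    i = [sigmoid(v) for v in pre["input"]]        # input gate
    f = [sigmoid(v) for v in pre["forget"]]       # forget gate
    g = [math.tanh(v) for v in pre["candidate"]]  # candidate cell state
    o = [sigmoid(v) for v in pre["output"]]       # output gate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def init_params(n_inputs, n_hidden, rng):
    """Random small weights and zero biases for the four gates."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    names = ("input", "forget", "candidate", "output")
    return {n: (mat(n_hidden, n_inputs), mat(n_hidden, n_hidden), [0.0] * n_hidden)
            for n in names}


def lstm_param_count(n_inputs, n_hidden):
    """Learnable parameters of a standard LSTM layer (4 gates)."""
    return 4 * (n_hidden * n_inputs + n_hidden * n_hidden + n_hidden)


rng = random.Random(0)
n_inputs, n_hidden = 11, 4  # 11 features as in the text; 4 hidden units for the demo
params = init_params(n_inputs, n_hidden, rng)
w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]  # fully connected layer
b_out = 0.0

h, c = [0.0] * n_hidden, [0.0] * n_hidden
sequence = [[rng.gauss(0, 1) for _ in range(n_inputs)] for _ in range(6)]  # 6 months
outputs = []
for x in sequence:
    h, c = lstm_step(x, h, c, params)
    outputs.append(sum(w * hk for w, hk in zip(w_out, h)) + b_out)

print(len(outputs), lstm_param_count(11, 200))
```

Under these assumptions, an LSTM layer with 11 inputs and 200 hidden units has 169,600 learnable parameters, which gives a sense of the model size relative to the length of the monthly series.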
Calculation and prediction using the LSTM network were divided into two parts. First, all available data from January 1997 to December 2009 were used to forecast the number of FSI for the following three years (2010–2012). These dates were chosen because the new Road Traffic Safety Law was introduced in November 2009, and the aim was to estimate its real-world effect. Second, the period from January 2012 to December 2019 was used to forecast the following 12 months, i.e., all of 2020, the year in which the COVID-19 pandemic had its strongest effects (quarantine, mobility limitations, etc.). As previously stated, the goal was to estimate the impact of the pandemic on casualty numbers. In both cases, the output of the network was the number of traffic accidents with fatal and serious injuries among young drivers. The assumption is that external factors cannot directly influence the final consequence of an accident (fatal or serious injury), so the output is defined as the sum of these two types of accident.
4.4. Network Training and Testing
In the first part of the research, the total length of the data sequence was 156 (13 years, 12 months each). These data were divided into two sets: a training set (the first 90%) and a test set (the remaining 10%).
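A chronological 90/10 split of this kind can be sketched as follows. The function name is hypothetical, and rounding the training length down is an assumption; for the two sequence lengths used in this study (156 and 96 months), rounding down and rounding to the nearest integer give the same result.

```python
def chronological_split(series, train_fraction=0.9):
    """Split a time-ordered series into leading training and trailing test parts."""
    n_train = int(len(series) * train_fraction)  # rounds down (assumption)
    return series[:n_train], series[n_train:]


# Month indices stand in for the actual monthly observations.
part1 = list(range(156))  # January 1997 to December 2009
part2 = list(range(96))   # January 2012 to December 2019

train1, test1 = chronological_split(part1)
train2, test2 = chronological_split(part2)

print(len(train1), len(test1))  # training and test months, first part
print(len(train2), len(test2))  # training and test months, second part
```

The split preserves temporal order, so the test set always consists of the most recent months before the forecast horizon.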
The network was trained on the training dataset, using a total of 200 hidden units (the number of neurons in the hidden LSTM layer), a parameter that corresponds to the amount of information that the layer remembers between time steps, also called the hidden state. This parameter is usually set empirically, by trial and error. The Adam optimizer was used for network parameter optimization, with a maximum of 250 epochs, a gradient threshold of 1, and an initial learning rate of 0.005. The input data were normalized, as LSTM networks are very sensitive to differing magnitudes of input data. Accuracy was estimated using the root mean square error (RMSE), whose value for training converged to 0.26 for normalized values, which corresponds to 0.89 for the absolute values (number of FSI per month).
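The relationship between RMSE on normalized and on absolute values can be illustrated with a small Python sketch. It assumes z-score normalization with the population standard deviation; the exact normalization used in the study is not specified here, and the monthly FSI values and function names are hypothetical.

```python
import math


def zscore_fit(values):
    """Mean and population standard deviation of the values."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, std


def zscore(values, mean, std):
    return [(v - mean) / std for v in values]


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical monthly FSI counts and model outputs.
fsi_actual = [18, 22, 15, 30, 27, 20]
fsi_predicted = [17, 24, 16, 27, 28, 21]

mean, std = zscore_fit(fsi_actual)
rmse_abs = rmse(fsi_actual, fsi_predicted)
rmse_norm = rmse(zscore(fsi_actual, mean, std), zscore(fsi_predicted, mean, std))

print(f"absolute RMSE = {rmse_abs:.3f}, normalized RMSE = {rmse_norm:.3f}")
```

When both series are scaled with the same mean and standard deviation, the absolute RMSE equals the normalized RMSE multiplied by that standard deviation.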
The network was tested on the test dataset using the same criterion, RMSE, which was 0.29 for the normalized values, i.e., 1.15 for the absolute data values. These values indicate that the trained network was of high quality, i.e., the prediction error within the interval from 1998 to 2009 is very low. Besides RMSE, scale-free metrics were also included, in particular the Mean Absolute Scaled Error (MASE). The value of MASE for testing was 0.48; a value below 1 means that the model outperforms a naive benchmark forecast, which confirms the quality of the model.
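For readers unfamiliar with MASE, a minimal Python sketch is given here. It scales the mean absolute error of the forecast by the in-sample mean absolute error of a naive forecast that repeats the value observed `season` steps earlier. Whether the study used the one-step (`season=1`) or a seasonal (`season=12`) naive benchmark is not stated, so the default is an assumption; the data and function name are hypothetical.

```python
def mase(train, actual, predicted, season=1):
    """Mean Absolute Scaled Error relative to a naive forecast on the training data."""
    naive_errors = [abs(train[t] - train[t - season]) for t in range(season, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    return mae / scale


# Hypothetical monthly FSI counts.
train = [20, 24, 19, 27, 22, 26]
actual = [25, 21, 28]
predicted = [24, 23, 26]

value = mase(train, actual, predicted)
print(f"MASE = {value:.3f}")
```

Because the error is divided by a benchmark error measured in the same units, the result does not depend on the scale of the series.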
Finally, the forecasting for the period from 2010 to 2012 was conducted, and the results are presented in
Figure 7 (monthly level) and
Figure 8 (yearly level).
RMSE for the forecasted series is 0.625 for the normalized values, i.e., 8.51 for the absolute values. The quality of the forecast for data after 1 January 2010 is thus significantly lower than for the previous period, and the out-of-sample gap is substantial. Because the discrepancy is evident, no additional metrics are introduced to quantify it, and a complete statistical analysis is omitted.
The hypothesis of this research was that the newly introduced Road Traffic Safety Law contributed to a decrease in the number of traffic accidents. Because the network was trained only on pre-2010 data, its forecast represents the trend expected without the law, and the observed values fall below this forecast, which supports the hypothesis. Since the introduction of the law was the only significant change in the external impact factors, we conclude that it had a positive effect on road traffic safety.
The second part of the research was conducted using the same method. The total length of the data sequence was 96 (8 years, from 2012 to 2019, 12 months each), and the set was divided into training and test datasets in the same proportion (90–10). The same network parameters were used as in the first part. RMSE for training was 0.27 for the normalized data and 0.91 for the absolute values. In the testing phase, RMSE was 0.30 for the normalized and 1.21 for the absolute data values, and the MASE value was 0.46, which means that the achieved accuracy of the network was similar to that in the first part of the research.
After the training and testing phase, forecasting for 2020 was conducted, with RMSE equal to 0.27 for normalized, and 1.78 for absolute values. The value of MASE for this period was 0.58. The numerical results obtained are presented in
Figure 9 and
Table 2. It is noticeable that the forecasted values were, again, generally higher than the real data, but the difference is significantly lower than in the first part (
Figure 8).
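The comparison between forecasted and observed values can be summarized as a simple gap between the two series. The Python sketch below shows one way to compute it; the monthly values are hypothetical and do not reproduce the study's results, and the function name is introduced only for illustration.

```python
def forecast_gap(forecast, observed):
    """Total difference between forecast and observed counts, and its share of the forecast."""
    total_forecast = sum(forecast)
    total_observed = sum(observed)
    difference = total_forecast - total_observed
    return difference, 100 * difference / total_forecast


# Hypothetical monthly FSI values for a 12-month forecast horizon.
forecast_2020 = [14, 13, 15, 16, 18, 19, 21, 20, 19, 18, 15, 14]
observed_2020 = [13, 12, 9, 7, 14, 18, 20, 19, 18, 17, 13, 12]

gap, gap_pct = forecast_gap(forecast_2020, observed_2020)
print(f"forecast exceeds observed by {gap} FSI ({gap_pct:.1f}% of the forecast)")
```

A positive gap means that fewer casualties were observed than the pre-event trend would suggest, which is the pattern interpreted in this study as the effect of the event.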
5. Discussion
This study examined the temporal patterns and underlying dynamics of FSI among young passenger car drivers in Serbia over a 28-year period (1997–2024). Using Long Short-Term Memory (LSTM) neural networks, the study introduced a novel deep learning approach to forecast short-term variations in the number of FSI and assess the impact of two key events: the introduction of the new Road Traffic Safety Law in 2010 and the COVID-19 lockdown in 2020.
The results confirmed that the LSTM model achieved satisfactory predictive performance, as reflected in low RMSE values and the model’s ability to capture complex temporal dependencies in the data. Unlike traditional statistical models, which often assume linear relationships, the LSTM approach effectively modeled the nonlinear and dynamic nature of traffic casualty data. The model’s sensitivity to time-dependent fluctuations also allowed for a more realistic reflection of behavioral, environmental, and policy-related influences on crash outcomes.
A comparison of the actual and predicted values revealed that the model consistently forecasted slightly higher casualty numbers than those observed in the post-intervention periods. This finding suggests that both the 2010 Road Traffic Safety Law and the 2020 COVID-19 restrictions had a tangible positive impact on reducing FSI among young drivers. The legislative reform, in particular, strengthened enforcement mechanisms, introduced a penalty point system, and improved infrastructure safety standards, all of which contributed to the declining trend in casualties. Similarly, the temporary reduction in traffic volume during the COVID-19 lockdowns produced a short-term decline in casualties, further validating the model’s ability to detect the effects of external disruptions.
However, despite legislative and technological progress, young drivers continue to represent a disproportionately high-risk group. Their overrepresentation in accident statistics points to persistent behavioral and psychological factors, such as inexperience, overconfidence, and risk-taking, that limit the benefits of technological and regulatory improvements. These results are consistent with previous research emphasizing that road safety outcomes are shaped by a combination of human, technical, and institutional elements. Thus, improving safety among young drivers requires not only continuous legal enforcement but also targeted education, awareness campaigns, and early intervention programs.
The findings also highlight the importance of maintaining comprehensive and high-quality traffic safety databases. Although the RTSA dataset provides a reliable foundation for long-term analysis, continued efforts are needed to improve data completeness, standardization, and temporal granularity. Expanding the database to include behavioral indicators or near-miss data would allow for even deeper analytical insights.
Overall, the discussion underscores that while deep learning models such as LSTM provide powerful predictive capabilities, their greatest value lies in supporting evidence-based policy formulation. The integration of these tools with traditional analytical approaches, coupled with ongoing improvements in data collection and public awareness, can significantly enhance Serbia’s capacity to reduce the number of FSI among young drivers.
Beyond the Serbian context, the methodological framework proposed in this study can be adapted for use in other regions with similar data structures and traffic safety challenges. The approach allows policymakers and researchers to conduct “what-if” simulations, assess the effectiveness of implemented safety measures, and identify potential future risks. Moreover, by integrating additional explanatory variables—such as road infrastructure characteristics, behavioral indicators, and vehicle safety technologies—the predictive accuracy and interpretability of the model can be further enhanced.
6. Conclusions
The present study analyzed the long-term trends in FSI involving young car drivers (aged 18–24) in the Republic of Serbia from 1997 to 2024 and evaluated the influence of major events, including the introduction of the new Road Traffic Safety Law in 2010 and the COVID-19 pandemic in 2020. By applying Long Short-Term Memory (LSTM) neural networks to historical accident data, the study demonstrated the potential of deep learning approaches for short-term forecasting of FSI.
The results confirmed that the LSTM model achieved satisfactory predictive accuracy and effectively captured the nonlinear and temporal dependencies in the data. The forecasts indicated that both legislative reforms and temporary mobility restrictions had a measurable positive impact in reducing the number of FSI among young drivers. These findings support the hypothesis that machine learning algorithms can overcome the limitations of traditional statistical models and provide deeper insights into complex, multifactorial road safety dynamics.
The developed model provides a practical foundation for traffic safety planning, enabling virtual experiments, identification of future trend changes, and evaluation of new policy measures. Continued improvement of data quality and inclusion of additional explanatory factors—such as behavioral indicators, vehicle safety technologies, and spatial components—would further enhance its predictive performance and policy relevance.
One of the contributions of this study lies in the future potential of deep learning techniques as decision-support tools for traffic safety management. The developed LSTM model enables the simulation of various scenarios, identification of potential turning points, and assessment of the impact of future policy measures, although decision support is not its primary function at the current stage of development. In addition, the modeling framework allows the inclusion of new explanatory variables, such as socioeconomic indicators, vehicle fleet composition, or environmental conditions, which could further enhance its prediction accuracy and policy relevance.
However, the study has several limitations. The predictive model relies primarily on historical time-series data and does not explicitly incorporate behavioral, infrastructural, or environmental variables, or the adoption of Intelligent Transport Systems (ITS) and in-vehicle technologies, which may also influence casualty trends. Moreover, as with most deep learning models, interpretability remains a challenge—while the model performs well in prediction, understanding the relative contribution of individual factors requires further methodological development.
Overall, this study demonstrates that advanced machine learning techniques can substantially improve the reliability of traffic casualty forecasting and provide actionable insights for safety-related interventions. The forecasting results can guide the allocation of enforcement resources, such as targeted traffic patrols or awareness campaigns, and inform the design of preventive measures, including driver education programs and infrastructure improvements. Strengthening traffic law enforcement, promoting the adoption of ITS and in-vehicle safety technologies, and enhancing driver education can, when combined with predictive analytics, enable more proactive and effective strategies to reduce FSI among young drivers and improve overall road safety.