Article

Patterns of Human Injuries and Fatalities in Fire Incidents in Serbia: A Comprehensive Statistical and Data Mining Analysis

by Nikola Mitrović¹, Vladica Stojanović²,*, Mihailo Jovanović², Željko Grujčić³ and Dragan Mladjan¹
¹ Department of Criminalistics, University of Criminal Investigation and Police Studies, 196 Cara Dušana Street, 11000 Belgrade, Serbia
² Department of Computer Sciences and Informatics, University of Criminal Investigation and Police Studies, 196 Cara Dušana Street, 11000 Belgrade, Serbia
³ Department of Informatics, Mathematics and Statistics, School of Engineering Management (FIM), Beopolis University, 11000 Belgrade, Serbia
* Author to whom correspondence should be addressed.
Fire 2026, 9(4), 146; https://doi.org/10.3390/fire9040146
Submission received: 19 February 2026 / Revised: 18 March 2026 / Accepted: 31 March 2026 / Published: 2 April 2026
(This article belongs to the Special Issue Fire Safety and Sustainability)

Abstract

This manuscript is a continuation of the research published in Fire 2025, 8(8), 302, i.e., it examines the cause-and-effect relationships of fires in the Republic of Serbia from the aspect of human safety. Among others, variables related to the gender, age, and injury severity of fire victims are introduced, to which various methods of statistical analysis and stochastic modeling are first applied. Continuous age variables are modeled using the flexible Generalized Additive Models for Location, Scale, and Shape (GAMLSS) framework, where the Generalized Normal Distribution (GND) is identified as the optimal generative model for injuries, while a Reflected Log-Normal Distribution with positive support (RefLOGND+) provides the best fit for fatalities. The quality of this modeling is formally verified, and the probabilities of injury and death of individuals in certain age categories are predicted, revealing a pronounced concentration of injuries in the working-age population and a markedly higher relative risk of fatal outcomes among elderly individuals. Thereafter, by applying certain Data Mining (DM) techniques, primarily the Apriori algorithm, the most frequent association rules are identified, indicating typical patterns and the demographic structure of injuries and deaths in fires in Serbia. Finally, using the CART (Classification and Regression Trees) algorithm, several decision trees are formed that describe the impact and relationship of different causes of fires on injury and death in fires. In this way, some important and frequent patterns are observed that indicate key fire risk factors that significantly affect the demographic structure of human casualties. The results thus obtained provide a basis for developing targeted strategies for fire prevention and improving emergency response planning.

1. Introduction

Fires have long represented a major challenge to human society. Scientific investigation is therefore essential, enabling deeper understanding, prevention, and mitigation of their consequences. Within this framework, forensic analysis plays a central role: identifying fire origins, reconstructing events, and assessing impacts on casualties and damage. Such analyses provide the foundation for preventive strategies and insurance frameworks and continue to drive extensive research into fire causality. In recent years, predictive models based on machine learning have become particularly influential in this field [1,2,3,4]. More specifically, Ahn et al. [5] integrated different datasets and variables to predict fire hazard in buildings, Choi et al. [6] used Google Earth Engine to compare multiple machine learning approaches for similar tasks, while Gunduz et al. [7] focused on detecting burned areas using machine learning.
In addition to these approaches, stochastic modeling is emerging as a powerful tool to capture the inherent uncertainty of fire dynamics [8]. To this end, Hui et al. [9] used stochastic Petri nets to model fire risk in urban smart cities, while Masoudian [10] developed stochastic simulations of wildfire spread that incorporate random wind and fuel variability. Such probabilistic frameworks allow for more realistic predictions than deterministic models, especially in complex environments where randomness plays a crucial role. At the same time, the Association Rule Mining (ARM) technique has been shown to be effective in uncovering hidden patterns in fire data. For example, Chan and Ye [11] applied ARM to petrochemical fires, revealing correlations between causes and emergency responses, while Ayhan et al. [12] demonstrated the use of ARM in the analysis of severe accidents, highlighting its potential for identifying critical attribute combinations. The broader application of ARM to fire datasets has also demonstrated its ability to extract useful knowledge for risk prevention and mitigation [13].
Despite the growing body of research on fire prediction and accident analysis, relatively little attention has been devoted to probabilistic modeling of the demographic structure of fire victims and to the integration of stochastic modeling with data mining (DM) approaches in order to analyze fire-related human casualties. Building on this foundation, the present study investigates cause-and-effect relationships in fire incidents with a particular emphasis on human safety. This choice is motivated by well-established recent theoretical stochastic results by Stojanović et al. [14,15], and it extends research conducted by Mitrović et al. [16]. It should be noted that the previous study by Mitrović et al. [16] primarily focused on clustering fire patterns, while the present research extends that framework toward demographic risk modeling of fire victims (gender and age), probabilistic analysis of their characteristics, and the use of DM techniques to uncover structural relationships between fire causes, environments, and casualty outcomes. Therefore, the main novelties of this study are:
i.
Introduction of demographic modeling via a flexible class of Generalized Additive Models for Location, Scale, and Shape (GAMLSS) distributions,
ii.
Predictive age-group probabilities based on fitted cumulative distribution functions (CDFs),
iii.
Gender- and age-segmentation using the Classification and Regression Trees (CART) algorithm, and
iv.
Separation of exposure-driven vs. vulnerability-driven fire risk mechanisms.
In this way, the proposed framework contributes to fire risk assessment by quantifying demographic risk patterns, identifying high-risk population groups, and distinguishing between exposure-driven and vulnerability-driven mechanisms of fire casualties.
The rest of the manuscript is structured as follows: Section 2 presents the dataset and variables relevant to the analysis of human safety in fires, with a particular focus on injuries and fatalities in fire incidents. The stochastic and DM methods applied in this study, along with their practical application, are also described here. Section 3 presents the main results obtained from stochastic modeling and DM techniques. Specifically, empirical distributions of continuous variables describing the ages of injured and deceased individuals are fitted using different theoretical stochastic models, while ARM is used to discover significant patterns linking explanatory factors to fire outcomes. In addition, classification models based on decision trees are used to estimate the impact of multiple independent variables on the occurrence of injuries and fatalities. Section 4 discusses these findings from the perspective of identifying details and regularities that can improve public safety and reduce fire risks, while Section 5 provides some concluding remarks.

2. Materials and Methods

To analyze the demographic structure and causal patterns of fire casualties, this study combines stochastic modeling with DM techniques. Stochastic modeling is used to formally describe the probabilistic structure of victim age distributions and to estimate age-specific risk patterns. Flexible distribution families within the GAMLSS framework are particularly suitable for this purpose because they allow simultaneous modeling of location, dispersion, skewness, and kurtosis, which are often present in real demographic data. Moreover, DM techniques are applied to identify hidden relationships among categorical attributes describing fire incidents. Specifically, ARM enables the discovery of frequent co-occurrence patterns between fire causes, environments, and demographic characteristics, while CART classification provides interpretable decision structures that reveal conditional pathways leading to injury or fatal outcomes. The combination of these approaches allows both probabilistic modeling of demographic risk and structural analysis of fire incident characteristics, providing a comprehensive analytical framework for the study of fire casualties. This section first describes the dataset used in this study, with a special emphasis on those variables that relate to the demographic structure of injured and killed persons. Thereafter, the theoretical foundations of the basic techniques used in further analysis of the observed dataset are given.

2.1. Dataset and Basic Variables

In this study, a dataset of persons injured and killed in fires and explosions on the territory of the Republic of Serbia in the period from 1 January 2005 to 31 December 2015 is analyzed. The data were provided by the Ministry of Internal Affairs of the Republic of Serbia and represent the most recent available dataset with the detailed demographic and contextual attributes required for the analyses conducted in this paper. The dataset records a total of 2385 injured persons, of whom age was known for 2075 and unknown for 310. It also records 912 deceased persons, of whom age was known for 853 and unknown for 59.
After filtering out records with missing demographic information, the final dataset used in the analysis consisted of 2075 injured persons and 853 fatalities with known age and gender. Cases with missing age information were excluded to ensure reliable modeling of age distributions and demographic structures. Although this filtering slightly reduces the overall sample size, the dataset is still large enough to support robust statistical modeling and data mining analysis.
In addition, note that an official report by the International Federation of Fire and Rescue Services (CTIF) [17] indicates that the age-standardized fire mortality rate in Serbia (ICD-10 codes X00–X09) is 0.42, while the injury rate is 1.3 per 100 fires. As illustrated in Table 1, both values are slightly lower than the global averages but remain higher than those reported in most countries in the region. Consequently, human safety in fire incidents remains an important public safety issue, both in Serbia and internationally, highlighting the continuing need for effective fire prevention strategies. These comparative indicators provide an international context for the Serbian dataset analyzed in this study and underline the broader relevance of investigating demographic risk patterns in fire casualties.
Further, a set of eight variables is used to describe the injured and killed persons, allowing them to be classified according to the values of these variables. Together, the variables capture key information about the cause and location of the incident, the age of the fire victims, the year and time of the incident, gender, and the day of the week on which the incident occurred. The basic characteristics of these variables are shown in Table 2 below. Due to their length, the value sets for the variables “Cause of Fire (CoF)” and “Fire Object Type (FOT)” are given in Appendix A. In addition, injuries in the dataset of injured persons were analyzed separately by severity, i.e., segmented into minor and serious injuries.
Additionally, the Injury/Fatality classification can be considered the output (target) variable over the aggregated dataset. For all of the above variables (usually called attributes), the information gain ($IG$) is then determined; it measures the reduction in entropy achieved by splitting the set $S$ on an attribute $A$, according to the formula:
$$IG(A) = H(S) - H(A \mid S)$$
Here, similar to Mitrović et al. [16]:
$$H(S) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
is the entropy of the set $S$ with respect to its classes $S_1, \ldots, S_n$, where $p_1, \ldots, p_n$ are the a posteriori probabilities of choosing the classes $S_1, \ldots, S_n$, and:
$$H(A \mid S) = \sum_{i=1}^{n} \frac{|S_i|}{|S|} \, H(S_i)$$
is the entropy remaining after dividing $S$ into the subsets $S_i$ according to the values of attribute $A$. As can be seen, the CoF attribute has the highest $IG$ value, so it can be taken as the primary attribute during classification, i.e., in the formation of decision trees (see Section 3).
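To make the $IG$ computation concrete, a minimal Python sketch of the three formulas above is given below. The function names and the six toy records are hypothetical illustrations of the arithmetic only; they are neither the authors' code nor data from the Serbian dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """H(S) = -sum_i p_i log2 p_i over the class proportions of `labels`."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())


def conditional_entropy(attribute, labels):
    """H(A|S) in the notation above: size-weighted entropy of the subsets S_i
    obtained by splitting S on the values of attribute A."""
    n = len(labels)
    subsets = {}
    for a, y in zip(attribute, labels):
        subsets.setdefault(a, []).append(y)
    return sum(len(s) / n * entropy(s) for s in subsets.values())


def information_gain(attribute, labels):
    """IG(A) = H(S) - H(A|S)."""
    return entropy(labels) - conditional_entropy(attribute, labels)


# Hypothetical toy records: cause of fire (attribute) vs. outcome (class).
cause = ["electrical", "electrical", "open flame", "open flame", "smoking", "smoking"]
outcome = ["injury", "injury", "injury", "fatality", "fatality", "fatality"]
print(round(information_gain(cause, outcome), 3))  # prints 0.667
```

Here $H(S) = 1$ bit (three injuries, three fatalities), only the mixed "open flame" subset keeps any entropy, so $H(A \mid S) = 1/3$ and $IG = 2/3$.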

2.2. Statistical Analysis and Stochastic Modeling

Statistical analysis of fire casualties provides a basic understanding of the dataset, highlighting recurring causes and demographic profiles. This kind of analysis begins with a descriptive investigation of the observed dataset, with the aim of identifying structural patterns and distributional properties. Frequency distributions are examined for the most important target nominal variables that, as already mentioned, are of interest from a safety and demographic perspective. These are the severity of injuries among injured persons, as well as the gender and age of injured and deceased persons, observed within different groups. This type of analysis provides initial insight into the demographic structure of injured and deceased individuals, thereby providing guidance for further, more precise research into more complex relationships and patterns.
Building on these descriptive insights, stochastic modeling is used to fit probability distributions to the observed age data of injuries and fatalities. To this end, two numerical variables are additionally introduced: X for the age of the injured and Y for the age of the deceased, each interpreted as a realization of a continuous random variable. To describe the age distributions of the injured and deceased theoretically, several symmetric and asymmetric candidate distributions (normal, generalized normal, reflected log-normal, etc.) are used. The adequacy of each candidate is evaluated in several ways, using information criteria, error metrics, and goodness-of-fit tests, ensuring both statistical rigor and empirical validity:
  • Information Criteria: The Akaike Information Criterion (AIC), as well as the Bayesian Information Criterion (BIC), are used to balance model fit with complexity. Both AIC and BIC penalize model complexity, with BIC applying a stronger penalty. Lower values indicate a better balance between fit quality and parsimony.
  • Mean Square Error (MSE): This metric provides a direct measure of how well the theoretical distribution fits the original data, i.e., how closely it approximates the observed frequencies. MSE is calculated as the mean squared deviation between the empirical and fitted cumulative distribution functions, so smaller MSE values indicate that the distribution reflects the empirical age variability more accurately, and vice versa.
  • Goodness-of-Fit Tests: Several statistical tests were applied to formally assess the agreement between empirical and theoretical distributions:
    a.
    The Kolmogorov–Smirnov (KS) test evaluates the maximum distance between the empirical cumulative distribution function (ECDF) and the theoretical CDF.
    b.
    The Anderson–Darling (AD) test places more weight on the tails of the distribution, making it particularly useful for detecting deviations in extreme age groups (e.g., very young or very old casualties).
    c.
    The Cramér–von Mises (CVM) test complements the KS and AD tests by integrating the squared differences between the ECDF and theoretical CDF across the entire domain. In this way, CVM provides a smooth, distribution-free assessment of fit quality and is particularly robust for continuous age data.
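As an illustration of how these criteria can be computed side by side, the following Python sketch uses SciPy (an assumption about tooling; the study itself used R). It fits a candidate distribution by maximum likelihood and reports AIC, BIC, a CDF-based MSE (the mean squared gap between the ECDF and the fitted CDF), and one-sample KS, AD and CVM statistics. Two caveats apply: the `kstest` and `cramervonmises` p-values assume a fully specified distribution, so they are only approximate when the parameters are estimated from the same data; and the AD statistic is computed from its standard formula without a p-value. The function name and the synthetic ages are hypothetical.

```python
import numpy as np
from scipy import stats


def fit_and_assess(data, dist):
    """Fit a scipy.stats distribution by maximum likelihood and compute
    AIC, BIC, a CDF-based MSE and one-sample KS, AD and CVM statistics."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    params = dist.fit(x)                      # MLE of shape(s), loc and scale
    fitted = dist(*params)
    k = len(params)                           # number of estimated parameters
    loglik = np.sum(fitted.logpdf(x))
    aic = 2 * k - 2 * loglik
    bic = k * np.log(n) - 2 * loglik          # heavier penalty than AIC once n >= 8

    u = np.clip(fitted.cdf(x), 1e-12, 1 - 1e-12)  # fitted CDF at the sorted data
    ecdf = np.arange(1, n + 1) / n
    mse = np.mean((ecdf - u) ** 2)            # mean squared ECDF-vs-CDF deviation

    ks = stats.kstest(x, fitted.cdf)
    cvm = stats.cramervonmises(x, fitted.cdf)
    i = np.arange(1, n + 1)
    # Anderson-Darling A^2 from its standard computing formula (no p-value here).
    ad = -n - np.mean((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1])))

    return {"k": k, "AIC": aic, "BIC": bic, "MSE": mse,
            "KS": ks.statistic, "KS_p": ks.pvalue,
            "AD": ad, "CVM": cvm.statistic, "CVM_p": cvm.pvalue}


# Synthetic ages (not the study data) compared under two candidate models.
rng = np.random.default_rng(0)
ages = rng.normal(47, 19.6, size=500)
for name, dist in [("normal", stats.norm), ("generalized normal", stats.gennorm)]:
    m = fit_and_assess(ages, dist)
    print(f"{name}: AIC={m['AIC']:.1f}, BIC={m['BIC']:.1f}, KS p={m['KS_p']:.3f}")
```

Because the metrics are returned together, candidates can be ranked on all three perspectives at once, as done for the X- and Y-variables in Section 3.2.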
Note that all the above metrics and goodness-of-fit tests were computed in the statistically oriented programming language “R” (version 4.5.0) using the package “twosamples” [18], so that both the test statistics and the corresponding p-values are readily obtained.
By combining information criteria, error metrics, and goodness-of-fit tests, the adequacy of the selected stochastic models is evaluated from several complementary perspectives, ensuring the statistical robustness and reliability of the obtained results. The best distributions are then selected and used to predict fire risk for specific age groups. Thus, stochastic modeling allows us to identify the most appropriate probability model for each group of victims. Symmetric distributions (normal, generalized normal, etc.) were found to be adequate candidates for injuries (X-variable), reflecting their balanced age profile. In contrast, asymmetric and heavy-tailed distributions were required for fatalities (Y-variable), capturing the negative skewness and excess kurtosis (kurtosis above 3) observed in the empirical data. This methodological rigor provides a robust framework for quantifying uncertainty, validating model adequacy, and supporting subsequent risk analysis.

2.3. Data Mining Techniques

In addition to stochastic modeling, DM techniques are applied to uncover hidden structures and relationships within the fire casualty dataset. Two complementary approaches are used for this purpose; they are described below (for further details, see, e.g., Hastie et al. [19]).

2.3.1. Association Rules (Apriori Algorithm)

Association rules belong to the group of unsupervised machine learning methods. In this study, the Apriori algorithm is used to identify frequent co-occurrences between attributes such as fire cause, age group, gender, and outcome (injury or death). It operates by iteratively identifying frequent itemsets that satisfy a predefined minimum support threshold and then generating association rules of the form $A \Rightarrow B$, where $A$ and $B$ are disjoint sets of attribute values (also called events), usually termed the antecedent and the consequent, respectively. Each rule is evaluated using the following metrics:
a.
Support is the percentage of records that (simultaneously) contain attributes A and B. Using event probabilities, support is defined by the expression:
$$\mathrm{Support}(A \Rightarrow B) = P(A \cap B).$$
b.
Confidence represents the conditional probability of B, given A:
$$\mathrm{Confidence}(A \Rightarrow B) = P(B \mid A)$$
c.
Lift is the ratio of observed co-occurrence to expected independence, highlighting non-trivial associations:
$$\mathrm{Lift}(A \Rightarrow B) = \frac{\mathrm{Confidence}(A \Rightarrow B)}{P(B)}$$
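Thus, values of $\mathrm{Lift} > 1$ indicate a positive association between events A and B (they co-occur more often than expected under independence), and the larger the lift, the stronger the association.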
d.
Leverage represents the difference between observed and expected co-occurrence:
$$\mathrm{Leverage}(A \Rightarrow B) = P(A \cap B) - P(A)\,P(B)$$
e.
Conviction is a measure of the reliability of an implication, penalizing cases where A occurs without B:
$$\mathrm{Conviction}(A \Rightarrow B) = \frac{1 - P(B)}{1 - \mathrm{Confidence}(A \Rightarrow B)}$$
Thus, a high conviction value indicates that there is a strong implication between the occurrence of events A and B.
In this way, rules with high support and confidence identify stable, recurrent patterns, while rules with high lift but lower conviction highlight risk factors that significantly increase the probability of fatal outcomes. This methodology allows for a shift from descriptive summaries to probabilistic inference of risk profiles, revealing both deterministic and stochastic patterns in accident outcomes.
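The Apriori procedure and the five rule metrics can be sketched in plain Python as follows. This is an unoptimized illustration, assuming that each casualty record is encoded as a set of "attribute=value" items; the records, the item encoding and the 0.4 support threshold are hypothetical and are not the settings used in this study. On real data, an established implementation (e.g., the `apriori()` function of the R package arules) would normally be preferred.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support (share of
    transactions containing it) is at least min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while candidates:
        # Count step: support of each candidate k-itemset.
        level = {}
        for c in candidates:
            support = sum(c <= t for t in transactions) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets with exactly k items.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def rule_metrics(antecedent, consequent, support):
    """Support, confidence, lift, leverage and conviction of antecedent => consequent
    (assumes antecedent | consequent is a frequent itemset)."""
    p_ab = support[antecedent | consequent]
    p_a, p_b = support[antecedent], support[consequent]
    confidence = p_ab / p_a
    return {
        "support": p_ab,
        "confidence": confidence,
        "lift": confidence / p_b,
        "leverage": p_ab - p_a * p_b,
        # Conviction is infinite for rules that never fail (confidence = 1).
        "conviction": float("inf") if confidence == 1 else (1 - p_b) / (1 - confidence),
    }


# Hypothetical casualty records encoded as sets of "attribute=value" items.
records = [
    {"gender=M", "age=20-64", "outcome=injury"},
    {"gender=M", "age=20-64", "outcome=injury"},
    {"gender=F", "age=65+", "outcome=fatality"},
    {"gender=M", "age=65+", "outcome=fatality"},
    {"gender=F", "age=20-64", "outcome=injury"},
]
sup = apriori(records, min_support=0.4)
rule = rule_metrics(frozenset({"gender=M"}), frozenset({"outcome=injury"}), sup)
print(rule)
```

For the toy rule "gender=M ⇒ outcome=injury", support is 0.4, confidence is 2/3, lift is 10/9 (slightly above 1), leverage is 0.04 and conviction is 1.2, which illustrates how a rule can be frequent yet only weakly informative.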

2.3.2. Decision Trees (CART Algorithm)

Decision tree models represent classification structures in a hierarchical form, where data are recursively partitioned according to the values of explanatory variables. In this study, the CART algorithm [20] is applied to model the relationships between fire characteristics and casualty outcomes. The strength of this algorithm lies in its ability to solve both classification and regression problems, as well as in the ease of interpreting its results. CART is therefore a versatile tool for decision-tree analysis, as it can handle both categorical and continuous (numerical) variables.
The CART algorithm starts with the entire dataset and builds the tree by following these steps:
  • Initial partition: The algorithm examines all possible partitions (by all attributes and their values) and chooses the one that leads to the greatest improvement in the “purity” of the data.
  • Criteria for choosing the best partition: Two different criteria are applied here:
    a.
    The criterion for classification is the minimization of the Gini index:
    $$\mathrm{Gini}(t) = 1 - \sum_{i=1}^{C} p_i^2$$
    where $t$ is the observed node, $C$ is the number of classes, and $p_i$ is the proportion of cases belonging to class $i$ in node $t$. A lower Gini index (closer to zero) indicates that node $t$ is purer, i.e., that its data predominantly belong to the same class.
    b.
    For regression, minimization of the residual sum of squared errors (RSSE) is used:
    $$RSSE = \sum_{i=1}^{N} (y_i - \bar{y})^2$$
    where $y_i$ is an actual value of the dependent (target) variable, $\bar{y}$ is the average value at node $t$, and $N$ is the number of instances at this node. In this case, the goal of the CART algorithm is to find the data partition that yields the smallest total RSSE across all nodes.
  • Recursive tree building: The process is repeated for each subset created by the split until one of the stopping criteria is reached (minimum number of samples in a node, maximum depth, or node purity).
  • Tree pruning: After the tree is built, a pruning algorithm is applied to remove redundant branches that do not contribute significantly to the accuracy of the model.
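The split-selection step (the first two bullets above) can be sketched in plain Python as shown below; recursion, stopping rules and pruning are omitted. As a stated simplification, only one-vs-rest binary splits of a categorical attribute are searched, whereas full CART also considers arbitrary subsets of categories. The function names and toy records are hypothetical. In practice a library implementation would normally be used; for example, scikit-learn's `DecisionTreeClassifier` (with `criterion="gini"`, `max_depth`, `min_samples_leaf`, and `ccp_alpha` for cost-complexity pruning) is based on an optimized version of CART, but it expects categorical attributes to be numerically encoded.

```python
from collections import Counter


def gini(labels):
    """Gini(t) = 1 - sum_i p_i^2 over the class proportions in node t."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def rsse(values):
    """RSSE = sum_i (y_i - mean)^2 for the target values in a regression node."""
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values)


def split_cost(left, right, regression=False):
    """Cost of a binary split: total RSSE of the two children for regression,
    size-weighted Gini index of the two children for classification."""
    if regression:
        return rsse(left) + rsse(right)
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_one_vs_rest_split(attribute, target, regression=False):
    """Try every split 'attribute == v' vs. 'attribute != v' and return the
    (v, cost) pair with the lowest cost (ties keep the first value in sorted order)."""
    best = None
    for v in sorted(set(attribute)):
        left = [y for a, y in zip(attribute, target) if a == v]
        right = [y for a, y in zip(attribute, target) if a != v]
        if not left or not right:
            continue
        cost = split_cost(left, right, regression)
        if best is None or cost < best[1]:
            best = (v, cost)
    return best


# Hypothetical toy records (not from the dataset): cause of fire vs. outcome.
cause = ["electrical", "electrical", "open flame", "open flame", "smoking", "smoking"]
outcome = ["injury", "injury", "injury", "fatality", "fatality", "fatality"]
print(best_one_vs_rest_split(cause, outcome))
```

Isolating either "electrical" or "smoking" produces one pure child and lowers the weighted Gini index from 0.5 to 0.25, whereas isolating "open flame" leaves both children maximally mixed.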
Applying the previous procedure creates a decision tree that is usually easy to understand, interpret, and visualize. Note that by combining association rules with decision trees, the analysis captures both horizontal patterns of co-occurrence (Apriori algorithm) and vertical hierarchical dependencies (CART algorithm). This dual perspective provides a comprehensive view of the risk structure: association rules highlight frequent and strong combinations of attributes, while decision trees reveal conditional paths leading to injuries or fatalities.

3. Results

The results of the analysis of human safety in fire accidents are presented here in three complementary stages. First, descriptive statistics provide an overview of the demographic structure of fire casualties, highlighting differences in age, gender, and injury severity between injured and deceased individuals. Second, stochastic modeling is applied to fit candidate probability distributions to the observed age data, allowing for a rigorous assessment of symmetry, skewness, and kurtosis effects. Finally, data mining techniques, including association rule mining and decision tree induction, are employed to uncover hidden patterns and conditional pathways that characterize the risk profiles of casualties. Together, the results thus obtained offer a comprehensive empirical foundation for subsequent interpretation and discussion.

3.1. Descriptive Statistics

This section presents the demographic and descriptive characteristics of fire casualties, focusing on age, gender, and injury severity. Descriptive statistics of the empirical distributions of these variables are shown in Table 3. Men clearly predominate among the injured, and further analysis indicates that this is largely due to occupational causes, to which men are more often exposed. On the other hand, the share of women among fatalities is considerably higher than among the injured. This suggests that women, although less frequently injured, face a relatively higher risk of a fatal outcome, probably due to their age structure.
Regarding the age of the injured, the dominant group is the 20-to-64-year-old group, which accounts for 72.4% of all injured persons. Among the deceased, the dominant group is those over 65 years of age, who account for almost 59% of all deaths. Thus, injuries are concentrated in the working-age population, whereas fatalities disproportionately affect the elderly, underscoring age as a critical risk factor. Finally, most injuries have minor consequences, but the substantial proportion of serious injuries indicates the need for prevention and protection in work environments.
In addition, the ages of the injured and deceased are also treated as two separate numerical variables. For this purpose, the statistical indicators usually used to summarize continuous attributes are calculated: measures of central tendency, dispersion, and shape of the distribution (skewness and kurtosis). These values, shown in Table 4, lead to conclusions similar to those drawn from Table 3. In other words, injuries are clustered around the working-age population, with a symmetric distribution (mean ≈ median = 47, skewness ≈ 0) and moderate variability.
In contrast, fatalities are concentrated in the older population, with a markedly negatively skewed distribution (mean ≈ 65, median = 71, skewness < 0) and pronounced leptokurtosis (kurtosis greater than 3). This indicates that deaths are concentrated among older victims, while the heavy tails reflect a notable presence of extreme cases, i.e., fatalities occur among both the youngest and the oldest. Thus, a fundamental demographic difference between fire injuries and fire deaths is observed: injuries predominantly affect the economically active population, while deaths disproportionately affect the elderly.
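The summaries in Tables 3 and 4 are straightforward to reproduce. A minimal Python sketch is given below (function names hypothetical; SciPy and NumPy are assumed). It reports Pearson kurtosis, for which the normal distribution gives 3, the reference value used above when calling a distribution leptokurtic; note that `scipy.stats.kurtosis` returns excess kurtosis unless `fisher=False` is passed. The age cut points are an assumption mirroring the 20-64 and 65+ groups discussed in this section.

```python
import numpy as np
from scipy import stats


def describe_ages(ages):
    """Summary statistics of the kind reported in Table 4. Kurtosis is the
    Pearson (non-excess) value, for which a normal distribution gives 3."""
    a = np.asarray(ages, dtype=float)
    return {
        "n": a.size,
        "mean": a.mean(),
        "median": float(np.median(a)),
        "sd": a.std(ddof=1),
        "skewness": float(stats.skew(a)),
        "kurtosis": float(stats.kurtosis(a, fisher=False)),
    }


def age_group_shares(ages, cut_points=(20, 65)):
    """Shares of ages in [0, 20), [20, 65) and [65, inf); the cut points are
    an assumption mirroring the 20-64 and 65+ groups discussed above."""
    groups = np.digitize(np.asarray(ages, dtype=float), cut_points)
    return np.bincount(groups, minlength=len(cut_points) + 1) / groups.size


print(describe_ages([1, 2, 3, 4, 5]))
print(age_group_shares([10, 30, 50, 70]))
```

For the toy sample 1 to 5, skewness is 0 and Pearson kurtosis is 1.7, i.e., a flat, platykurtic profile; applied to the real age variables, the same functions yield the kind of indicators discussed above.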
Note that in addition to the demographic characteristics presented above, the dataset also includes several contextual attributes describing the temporal and spatial conditions of fire incidents, such as accident hour groups, seasonal variation, and fire object types. Although these factors are not the primary focus of the descriptive analysis, they are incorporated in the subsequent data mining procedures, where their relationships with injury and fatal outcomes are explored through association rule mining and decision tree modeling.

3.2. Stochastic Modeling

To formalize the observed demographic patterns, stochastic modeling is applied to fit candidate probability distributions to the age data of injuries and fatalities. To this end, the GAMLSS methodology proposed in [21] was used, which allows flexible modeling not only of the mean (location) $\mu$ but also of the scale $\sigma$, shape $\nu$, and kurtosis $\tau$ parameters of the distribution. This makes it particularly suitable for analyzing the age distributions of injuries and fatalities, as the data exhibit asymmetry and departures from Gaussian kurtosis that classical approaches cannot fully capture. For a distribution with parameters $\mu, \sigma, \nu, \tau$, GAMLSS defines a linear predictor for each parameter:
$$\eta_k = g_k(\theta_k) = X_k \beta_k + \sum_{j} f_{jk}(x_{jk}), \quad k = 1, 2, 3, 4,$$
where $g_k$ is the link function for parameter $\theta_k$ (with $\theta_1 = \mu$, $\theta_2 = \sigma$, $\theta_3 = \nu$, $\theta_4 = \tau$), $X_k \beta_k$ is the parametric (e.g., regression) component, and $f_{jk}(x_{jk})$ are smooth functions (e.g., splines). In our study, the parameters are estimated using a penalized maximum likelihood method [22], while optimization is performed via Newton-Raphson and Fisher scoring algorithms [23]. For the smooth functions, a backfitting algorithm iteratively adjusts the parametric and nonparametric components [24], and the entire estimation procedure is carried out using the “gamlss.dist” package in the statistical software “R” [25].
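To make the role of the link functions concrete, the following Python sketch fits the simplest GAMLSS-type model: intercept-only linear predictors for a GND, with an identity link for $\mu$ and log links for $\sigma$ and $\nu$, so the optimizer works on unconstrained values while both parameters stay positive. With no smooth terms, the penalized likelihood reduces to the ordinary likelihood. This is an assumption-laden illustration, not the R procedure described above: it uses SciPy's Nelder-Mead optimizer instead of Newton-Raphson/Fisher scoring and backfitting, and it relies on SciPy's `gennorm` distribution (shape `beta` playing the role of $\nu$) for the generalized normal density. The function name and synthetic data are hypothetical.

```python
import numpy as np
from scipy import optimize, stats


def fit_gnd_intercept_only(x):
    """ML fit of a GND with intercept-only predictors and GAMLSS-style links:
    identity for mu, log for sigma and nu (so that sigma > 0 and nu > 0)."""
    x = np.asarray(x, dtype=float)

    def neg_loglik(eta):
        # Inverse links map the unconstrained predictors back to parameters.
        mu, sigma, nu = eta[0], np.exp(eta[1]), np.exp(eta[2])
        return -np.sum(stats.gennorm.logpdf(x, nu, loc=mu, scale=sigma))

    # Start from the Gaussian special case nu = 2, where sigma = sqrt(2) * SD.
    start = np.array([np.median(x), np.log(np.sqrt(2) * x.std()), np.log(2.0)])
    res = optimize.minimize(neg_loglik, start, method="Nelder-Mead",
                            options={"maxiter": 2000})
    mu, log_sigma, log_nu = res.x
    return {"mu": mu, "sigma": np.exp(log_sigma), "nu": np.exp(log_nu),
            "loglik": -res.fun, "converged": res.success}


# Synthetic example (not the study data): ages drawn from a GND with nu = 3.1.
rng = np.random.default_rng(1)
ages = stats.gennorm.rvs(3.1, loc=47, scale=30, size=2000, random_state=rng)
print(fit_gnd_intercept_only(ages))
```

Adding covariates would replace each intercept with a regression term $X_k \beta_k$ (and possibly smoothers), which is exactly what the general predictor $\eta_k$ allows.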
The fitting of the age distribution of the injured (X-variable) is described in more detail below. As already mentioned, descriptive statistics showed that this distribution is symmetric, with moderate variability (SD ≈ 19.6) and mild platykurtosis (kurtosis ≈ 2.4, slightly below the Gaussian value of 3). To formalize these patterns, four candidate distributions were fitted: Normal Distribution (ND), Generalized Normal Distribution (GND), Student’s t-distribution (TD), and Generalized t-distribution (GTD). These distributions were chosen because of their symmetry, which matches the empirical distribution of the age of injured persons. The estimated parameter values of these distributions, along with the corresponding goodness-of-fit statistics mentioned above, are shown in Table 5 below.
Based on this, it is clear, for instance, that although the GTD achieved a competitive AIC, it did not pass the goodness-of-fit tests and was therefore not considered adequate despite its flexibility. Among the remaining candidates, the ND also failed the goodness-of-fit tests, while the TD offered no improvement: its degrees-of-freedom parameter diverged ($\nu \to +\infty$), so that it effectively reduced to the normal distribution. In contrast, the GND achieved the lowest AIC/BIC values, minimized the CDF-based mean square error (MSE), and was not rejected by the KS, AD, or CVM tests at the 0.05 significance level ($p > 0.05$). Additionally, the estimated shape parameter of the GND ($\nu \approx 3.1$) indicates a moderate deviation from Gaussian tails (the Gaussian case corresponds to $\nu = 2$). Thus, the GND, given by the probability density function:
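$$f(x \mid \mu, \sigma, \nu) = \frac{\nu}{2\sigma\,\Gamma(1/\nu)} \exp\left(-\left|\frac{x - \mu}{\sigma}\right|^{\nu}\right), \quad x \in \mathbb{R},\ \sigma > 0,\ \nu > 0,$$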
where $\Gamma(\cdot)$ is the gamma function, is the only candidate that provides an adequate fit. Accordingly, the GND best fits the age distribution of the injured and can be used as a symmetric age profile whose tails are slightly lighter than Gaussian (since $\nu > 2$), which is consistent with the descriptive statistics (near-zero skewness and kurtosis below 3).
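Because the tail behavior of the GND is governed by $\nu$ alone, its Pearson kurtosis has the closed form $\Gamma(5/\nu)\,\Gamma(1/\nu)/\Gamma(3/\nu)^2$, independent of $\mu$ and $\sigma$. The short Python check below (SciPy is an assumption about tooling; the function name is hypothetical) evaluates this expression and cross-checks it against `scipy.stats.gennorm`, which reports excess kurtosis.

```python
from scipy import stats
from scipy.special import gamma


def gnd_kurtosis(nu):
    """Pearson kurtosis of the GND with shape nu:
    Gamma(5/nu) * Gamma(1/nu) / Gamma(3/nu)**2 (independent of mu and sigma)."""
    return gamma(5.0 / nu) * gamma(1.0 / nu) / gamma(3.0 / nu) ** 2


for nu in (1.0, 2.0, 3.1):
    # SciPy reports excess kurtosis, hence the + 3 for the cross-check.
    scipy_k = float(stats.gennorm.stats(nu, moments="k")) + 3.0
    print(nu, round(gnd_kurtosis(nu), 3), round(scipy_k, 3))
```

The expression gives 6 for $\nu = 1$ (Laplace), 3 for $\nu = 2$ (normal), and a value below 3 for $\nu \approx 3.1$, close to the empirical kurtosis of about 2.4 reported for the injured. A shape parameter above 2 therefore corresponds to lighter-than-Gaussian tails.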
This is also illustrated in Figure 1, which shows the empirical distribution together with the competing distributions (left), as well as a quantile-quantile (Q-Q) plot of the empirical age distribution against the GND (right). Both panels confirm that the GND fits the empirical age distribution of injured persons best; accordingly, it was selected as the generative model for subsequent inference, prediction, and risk quantification.
In the case of stochastic modeling of the age distribution of fatalities (Y-variable), descriptive statistics showed that the distribution was negatively skewed (mean ≈ 64.5, median = 71, skewness ≈ −1.06) with high variability (SD ≈ 21.5) and pronounced leptokurtosis (kurtosis ≈ 3.61). These values indicate a distribution centered in older age groups, with negative skewness and heavier tails. This matches the empirical profile: fatalities are concentrated among elderly individuals, with a sharp peak and an extended left tail, which indicates non-negligible mortality probabilities in the younger population.
To formalize these patterns, as can be seen in Table 6, four candidate distributions were fitted: Normal distribution (ND), Weibull distribution (WD), Generalized Gamma distribution (GGD), and Reflected Log-Normal distribution with positive support (RefLOGND+), given by the probability density function:
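$$f(y \mid \mu, \sigma, c) = \frac{1}{\Phi\left(\frac{\log c - \mu}{\sigma}\right)} \cdot \frac{1}{(c - y)\,\sigma\sqrt{2\pi}} \exp\left(-\frac{\left(\log(c - y) - \mu\right)^2}{2\sigma^2}\right), \quad 0 < y < c$$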
Here, μ is the location parameter (e.g., mean value), σ is the scale parameter, c > 0 is the reflection constant, and Φ ( · ) is the standard normal CDF. Note that, in this case, the reflection constant is chosen according to equality c = max ( Y ) + 5 = 103 , thus ensuring that the RefLOGND+ has only positive values. Obviously, the ND, WD, and GGD failed the goodness-of-fit tests, so the RefLOGND+ uniquely provided an adequate fit. As in the previous one with GND, the RefLOGND+ achieved the lowest AIC/BIC and MSE values and is the only one not rejected by the KS, AD, and CVM tests.
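The denominator Φ((log c − μ)/σ) is the probability that the untruncated reflected variable c − Y falls below c, i.e., that Y > 0, so it rescales the density to integrate to one on (0, c). The Python sketch below (standard library only; `reflogn_pdf` is a hypothetical helper name) checks this numerically with the Table 6 estimates for fatalities (μ = 3.500, σ = 0.5534, c = 103).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal; .cdf plays the role of Phi


def reflogn_pdf(y: float, mu: float, sigma: float, c: float) -> float:
    """RefLOGND+ density: log-normal density of (c - y), renormalised to 0 < y < c."""
    if not 0.0 < y < c:
        return 0.0
    u = c - y  # reflected variable, log-normal before truncation
    core = math.exp(-((math.log(u) - mu) ** 2) / (2.0 * sigma ** 2)) / (
        u * sigma * math.sqrt(2.0 * math.pi)
    )
    # Phi((log c - mu) / sigma) = P(c - Y < c) = P(Y > 0): the truncation constant
    return core / STD_NORMAL.cdf((math.log(c) - mu) / sigma)


# Table 6 estimates for the fatality ages (reflection constant c = 103)
MU, SIGMA, C = 3.500, 0.5534, 103.0

# Midpoint rule over (0, c): should return (numerically) one
n = 200_000
h = C / n
mass = h * sum(reflogn_pdf((i + 0.5) * h, MU, SIGMA, C) for i in range(n))
print(f"total probability on (0, c): {mass:.5f}")
```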
Table 6. Estimated values of competing distributions and goodness-of-fit statistics (Y-variable).
| Distribution | μ | σ | ν | AIC | BIC | MSE | KS | AD | CVM |
|---|---|---|---|---|---|---|---|---|---|
| ND | 64.52 | 21.47 | – | 7737.5 | 7747.0 | 3.89 × 10⁻³ | 0.1346 * (≈0.000) | 39,049 * (≈0.000) | 6.617 * (≈0.000) |
| WD | 70.88 | 3.313 | – | 7876.9 | 7886.4 | 5.08 × 10⁻³ | 0.1485 * (≈0.000) | 46,542 * (≈0.000) | 8.765 * (≈0.000) |
| GGD | 81.83 | 0.1549 | 19.44 | 7521.5 | 7535.8 | 3.59 × 10⁻³ | 0.1137 * (≈0.000) | 25,268 * (≈0.000) | 5.896 * (≈0.000) |
| RefLOGND+ | 3.500 | 0.5534 | – | 7465.1 | 7474.6 | 9.63 × 10⁻⁴ | 0.0638 (0.0530) | 5121.6 (0.1655) | 1.1093 (0.1545) |

Parameters: μ, σ, ν; information criteria and errors: AIC, BIC, MSE; test statistics (p-values in parentheses): KS, AD, CVM. * p < 0.05.
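To make the test-statistic columns of Table 6 concrete, the sketch below computes the textbook one-sample Kolmogorov–Smirnov (D), Cramér–von Mises (W²), and Anderson–Darling (A²) statistics for a sample against a fully specified CDF. It illustrates the standard formulas and is not the authors' implementation; p-values are omitted because, with parameters estimated from the same data, they generally require simulation or adjusted tables. The function name `gof_statistics` is hypothetical.

```python
import math


def gof_statistics(sample: list[float], cdf) -> dict[str, float]:
    """One-sample KS D, Cramer-von Mises W^2 and Anderson-Darling A^2 against `cdf`.

    Assumes every cdf(x) lies strictly inside (0, 1), as A^2 takes logarithms.
    """
    u = sorted(cdf(x) for x in sample)  # probability-integral transform, ordered
    n = len(u)
    # KS: largest gap between the empirical step function and the model CDF
    d_plus = max((i + 1) / n - ui for i, ui in enumerate(u))
    d_minus = max(ui - i / n for i, ui in enumerate(u))
    # CVM: squared distances to the plotting positions (2i - 1) / (2n)
    w2 = 1.0 / (12 * n) + sum((ui - (2 * i + 1) / (2 * n)) ** 2 for i, ui in enumerate(u))
    # AD: tail-weighted version, pairing the i-th smallest with the i-th largest
    a2 = -n - sum(
        (2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i])) for i in range(n)
    ) / n
    return {"KS": max(d_plus, d_minus), "CVM": w2, "AD": a2}


def uniform_cdf(x: float) -> float:
    """CDF of Uniform(0, 1), used as a sanity check."""
    return min(max(x, 0.0), 1.0)


stats = gof_statistics([0.1, 0.3, 0.5, 0.7, 0.9], uniform_cdf)
print(stats)
```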
A similar conclusion can be drawn from Figure 2, which shows the empirical distribution of the age of the deceased fitted with the four competing distributions, along with the Q-Q plot for the RefLOGND+. The empirical distribution clearly shows negative asymmetry with bounded support and concentration near the upper limit, which naturally motivates reflecting a right-skewed distribution. Thus, the RefLOGND+ is the most appropriate generative model for the age of the deceased, offering both descriptive accuracy and predictive utility: it captures the negative skewness and leptokurtosis observed in the data, outperforming the normal, Weibull, and generalized gamma alternatives. The Q-Q plot also shows that the RefLOGND+ fits the younger age categories less well. However, its accuracy for older age groups, which account for most deaths, makes it particularly suitable for risk assessment and simulation of future age-fatality patterns, which we now describe in more detail.
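Because the RefLOGND+ is a reflected, truncated log-normal, its quantile function has a closed form, which is all a Q-Q plot like the one in Figure 2 needs. Inverting F_Y(y) = 1 − Φ(b(y))/Φ(a), with a = (log c − μ)/σ and b(y) = (log(c − y) − μ)/σ, gives y_p = c − exp(μ + σ·Φ⁻¹((1 − p)Φ(a))). The sketch below pairs sorted ages with these theoretical quantiles at plotting positions (i − 0.5)/n, a common convention that we assume here; the function names and the listed ages are hypothetical, while the parameters are the Table 6 estimates.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal: .cdf is Phi, .inv_cdf is its inverse


def reflogn_cdf(y: float, mu: float, sigma: float, c: float) -> float:
    """F_Y(y) = 1 - Phi(b(y)) / Phi(a) for 0 < y < c."""
    if y <= 0.0:
        return 0.0
    if y >= c:
        return 1.0
    a = (math.log(c) - mu) / sigma
    b = (math.log(c - y) - mu) / sigma
    return 1.0 - PHI.cdf(b) / PHI.cdf(a)


def reflogn_quantile(p: float, mu: float, sigma: float, c: float) -> float:
    """Closed-form inverse of reflogn_cdf for 0 < p < 1."""
    a = (math.log(c) - mu) / sigma
    b = PHI.inv_cdf((1.0 - p) * PHI.cdf(a))
    return c - math.exp(mu + sigma * b)


def qq_pairs(ages, mu, sigma, c):
    """(theoretical, empirical) points for a Q-Q plot, plotting positions (i - 0.5) / n."""
    xs = sorted(ages)
    n = len(xs)
    return [(reflogn_quantile((i + 0.5) / n, mu, sigma, c), x) for i, x in enumerate(xs)]


MU, SIGMA, C = 3.500, 0.5534, 103.0  # Table 6 estimates
pairs = qq_pairs([19, 45, 62, 70, 74, 79, 83, 88], MU, SIGMA, C)
for theo, emp in pairs:
    print(f"{theo:6.1f}  {emp:4d}")
```

Points far below the diagonal at the low end of such a plot correspond to the weaker fit in the younger age categories noted above.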
Using the empirically “best” distributions (GND for injuries and RefLOGND+ for fatalities), age-group predictions can be generated. Because the selected models passed the goodness-of-fit tests, the predicted probabilities are not merely smoothed frequencies but model-based risk estimates for each age group. Predicted values for each age interval [a, b) are obtained from the CDF of the selected model; more precisely, for injuries (X) and fatalities (Y), we calculate:

$$ P(a \le X < b) = F_X(b) - F_X(a), \qquad P(a \le Y < b) = F_Y(b) - F_Y(a), $$

where F_X(x) and F_Y(y) are the CDFs of the GND and RefLOGND+, respectively. The predicted values are shown in Figure 3. For injuries (X-variable), the probability is symmetrically concentrated in middle adulthood, peaking at 35–49 years (≈ 0.26), remaining high at 50–64 years (≈ 0.25), and gradually decreasing thereafter. For fatalities, in contrast, the probabilities remain low until midlife, then rise sharply for the 50–64 group (≈ 0.20), peak at 65–79 (≈ 0.35), and stay high for the 80+ group (≈ 0.28). This pattern is consistent with the selected models, reflecting activity-related exposure as the driver of injuries and age-related frailty as the driver of mortality. These predictions therefore support differentiated prevention strategies: exposure control for the working-age population aged 20–64 years and targeted interventions addressing the vulnerability of the population aged 65+ years.
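The CDF differences above can be sketched in Python with the standard library only. The GND CDF follows from the density as F_X(x) = 1/2 + sign(z)·P(1/ν, |z|^ν)/2, with z = (x − μ)/σ and P the regularized lower incomplete gamma function (computed here by its power series), while the RefLOGND+ CDF is F_Y(y) = 1 − Φ(b(y))/Φ(a), with a = (log c − μ)/σ and b(y) = (log(c − y) − μ)/σ. Several choices are our assumptions: the rounded parameters (μ ≈ 47.8, σ ≈ 19.6, ν ≈ 3.1 for the GND; the Table 6 estimates for the RefLOGND+), the cut points of the two youngest groups, and the interpretation of the reported GND σ as a standard deviation, converted to the scale of the density via scale = σ·√(Γ(1/ν)/Γ(3/ν)). If σ is instead the scale itself, it should be passed directly. The output is therefore illustrative and need not match Figure 3 exactly.

```python
import math
from statistics import NormalDist


def reg_lower_gamma(s: float, x: float) -> float:
    """Regularized lower incomplete gamma P(s, x), summed from its power series."""
    if x <= 0.0:
        return 0.0
    if x > 200.0:  # upper tail is negligible in double precision here
        return 1.0
    term = total = 1.0 / s
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= x / (s + n)
        total += term
    return total * math.exp(s * math.log(x) - x - math.lgamma(s))


def gnd_cdf(x: float, mu: float, scale: float, nu: float) -> float:
    """GND CDF: 1/2 +/- P(1/nu, |z|**nu) / 2 with z = (x - mu) / scale."""
    z = (x - mu) / scale
    half = 0.5 * reg_lower_gamma(1.0 / nu, abs(z) ** nu)
    return 0.5 + half if z >= 0 else 0.5 - half


def sd_to_scale(sd: float, nu: float) -> float:
    """Scale of a GND with standard deviation sd (Var = scale^2 Gamma(3/nu) / Gamma(1/nu))."""
    return sd * math.sqrt(math.gamma(1.0 / nu) / math.gamma(3.0 / nu))


def reflogn_cdf(y: float, mu: float, sigma: float, c: float) -> float:
    """RefLOGND+ CDF: 1 - Phi(b(y)) / Phi(a) on 0 < y < c."""
    if y <= 0.0:
        return 0.0
    if y >= c:
        return 1.0
    phi = NormalDist().cdf
    return 1.0 - phi((math.log(c - y) - mu) / sigma) / phi((math.log(c) - mu) / sigma)


# Age intervals [a, b); the open-ended groups absorb the rest of each support
GROUPS = {"0-19": (-math.inf, 20), "20-34": (20, 35), "35-49": (35, 50),
          "50-64": (50, 65), "65-79": (65, 80), "80+": (80, math.inf)}


def group_probs(cdf) -> dict[str, float]:
    """P(a <= age < b) = F(b) - F(a) for each age group."""
    return {g: cdf(b) - cdf(a) for g, (a, b) in GROUPS.items()}


NU = 3.1  # assumption: reported GND sigma ~ 19.6 treated as a standard deviation
injury = group_probs(lambda x: gnd_cdf(x, 47.8, sd_to_scale(19.6, NU), NU))
fatal = group_probs(lambda y: reflogn_cdf(y, 3.500, 0.5534, 103.0))
for g in GROUPS:
    print(f"{g:>6}: injuries {injury[g]:.3f}   fatalities {fatal[g]:.3f}")
```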
To further quantify the structural divergence between injuries and fatalities across age groups, we introduce the relative risk index (RR), defined for each age interval a_i as follows:

$$ RR_i = \frac{P(a_i \mid F)}{P(a_i \mid I)} $$

Here, P(a_i | F) and P(a_i | I) denote the predictive probabilities obtained from the fitted RefLOGND+ and GND distributions, respectively. This ratio measures the relative overrepresentation of a given age group among fatalities compared with injuries. Values RR_i > 1 indicate that the age group is proportionally more frequent among fatalities than among injuries, while RR_i < 1 indicates dominance of non-fatal outcomes.

For visualization, the relative risk curve is presented in Figure 3 on a logarithmic scale, which makes ratios above and below the neutral threshold RR_i = 1 symmetric (e.g., RR_i = 2 and RR_i = 1/2 lie equally far from it). For age groups below 65 years, the relative risk remains below unity, confirming that fire incidents in these categories are predominantly injury-driven and related to exposure mechanisms. After age 65, however, the index increases sharply, exceeding 2 in the 65–79 group and reaching values above 5 in the 80+ group. In other words, the share of the oldest age group among fatalities is more than five times its share among injuries. The relative risk curve therefore provides quantitative confirmation of the demographic divergence identified by stochastic modeling. It supports the interpretation that fire injuries are primarily an exposure-driven phenomenon concentrated in the working-age population, while fire deaths are characterized by physiological fragility, vulnerability, and reduced adaptive capacity in older age.
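Given the group probabilities from the two models, RR_i is a simple element-wise ratio. The sketch below uses hypothetical probabilities (not the fitted values) and prints log10 RR_i alongside RR_i to show why the logarithmic scale treats over- and underrepresentation symmetrically around RR_i = 1. The function name `relative_risk` is ours.

```python
import math


def relative_risk(p_fatal: dict[str, float], p_injury: dict[str, float]) -> dict[str, float]:
    """RR_i = P(a_i | F) / P(a_i | I) for each age group with a positive injury probability."""
    return {g: p_fatal[g] / p_injury[g] for g in p_fatal if p_injury.get(g, 0.0) > 0.0}


# Hypothetical group probabilities (each set sums to 1), not the fitted values
p_injury = {"<40": 0.50, "40-64": 0.40, "65+": 0.10}
p_fatal = {"<40": 0.10, "40-64": 0.30, "65+": 0.60}

rr = relative_risk(p_fatal, p_injury)
for group, value in rr.items():
    side = "fatality-dominated" if value > 1 else "injury-dominated"
    # log10 puts RR = 2 and RR = 1/2 at the same distance from the neutral line RR = 1
    print(f"{group:>6}: RR = {value:5.2f}, log10 RR = {math.log10(value):+.3f} ({side})")
```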

3.3. Data Mining

This section presents the results of the analysis of data on the injured and deceased population in fires, obtained using DM techniques. First, frequent co-occurrence rules are identified using the Apriori algorithm, described in the previous section. Then, a classification analysis is performed using the CART algorithm to obtain decision trees with the aim of determining the most important predictors leading to injuries or fatalities.

3.3.1. Results of the Apriori Algorithm

The Apriori algorithm was run in WEKA 3.9.6 [26], which enables reproducible and transparent processing of categorical records. The minimum support was set to 3% for injuries and 7% for fatalities. These thresholds were chosen to equalize the absolute frequency required for rule generation, so that the minimum number of instances needed to form an association is approximately the same for both sets (62 and 60, respectively). They also allow the discovery of rare but significant rules linking different attributes related to fire injuries and deaths. In contrast, a high minimum confidence of 95% was set, and confidence was also used as the ranking metric, ensuring high reliability of the obtained association rules. The algorithm was run for 20 iterations, and the resulting numbers of large itemsets of length L_i, i = 1, …, 5, show considerable structural complexity in both the injury and fatality sets. As Table 7 shows, numerous stable attribute combinations exceed the defined frequency thresholds.
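For readers unfamiliar with the algorithm, the level-wise logic of Apriori (join, prune by the downward-closure property, then rule generation by confidence) is sketched below in plain Python. This is a minimal illustration on hypothetical records, not WEKA's implementation; in particular, WEKA's Apriori also lowers the minimum support iteratively until the requested number of rules is found, which this sketch omits. The function name and the attribute encodings are our own.

```python
from itertools import combinations


def apriori(transactions, min_support, min_confidence):
    """Level-wise Apriori: frequent itemsets by support, then rules A -> C by confidence."""
    n = len(transactions)

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # L1: frequent single items
    items = {i for t in transactions for i in t}
    level = [frozenset([i]) for i in items if count(frozenset([i])) >= min_support * n]
    support = {s: count(s) / n for s in level}
    k = 1
    while level:
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in support for s in combinations(c, k - 1))}
        level = [c for c in candidates if count(c) >= min_support * n]
        support.update({c: count(c) / n for c in level})

    # Rule generation: split each frequent itemset into antecedent and consequent
    rules = []
    for itemset, supp in support.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                conf = supp / support[ante]
                if conf >= min_confidence:
                    rules.append((set(ante), set(itemset - ante), supp, conf))
    return sorted(rules, key=lambda rule: -rule[3])


# Hypothetical injury records encoded as attribute=value items
records = [
    {"Gender=Male", "CoF=Welding", "Day=Weekday"},
    {"Gender=Male", "CoF=Welding", "Day=Weekday"},
    {"Gender=Male", "CoF=Welding", "Day=Weekend"},
    {"Gender=Male", "CoF=Vehicle", "Day=Weekday"},
    {"Gender=Female", "CoF=Stove", "Day=Weekday"},
    {"Gender=Male", "CoF=Vehicle", "Day=Weekend"},
]
rules = apriori(records, min_support=0.3, min_confidence=0.95)
for ante, cons, supp, conf in rules:
    print(sorted(ante), "->", sorted(cons), f"supp={supp:.2f} conf={conf:.2f}")
```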
Applied to the fire data, association rules thus reveal “hidden” patterns linking fire causes, time, facilities, age groups, and outcomes (injuries or fatalities). Table 8 and Table 9 show the ten associations with the highest confidence values for each dataset, which we analyze in more detail below.
Within the injury population, the strongest rules show clear demographic and contextual patterns, as shown in Table 8. Confidence, Lift, Leverage, and Conviction are used to assess the strength and significance of the rules. All association rules have high Confidence (0.95–0.98), moderate Lift (1.25–1.31), Leverage close to zero (0.01–0.02), and strong Conviction (3.74–8.11). This means that in almost all structured contexts of fire injuries (industry, traffic, explosions, weekdays, minor injuries), injuries are significantly more common among men. The riskiest “male” contexts are welding/grinding machines in industry on weekdays, general industrial causes, and traffic fires. The transport sector shows a smaller but still clearly identifiable risk disproportion, primarily among professional drivers, who are highly exposed to fire risk.
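The four measures have simple closed forms in terms of the supports of the antecedent A, the consequent C, and their co-occurrence: confidence P(C | A), lift confidence/supp(C), leverage supp(A ∪ C) − supp(A)·supp(C), and conviction (1 − supp(C))/(1 − confidence). The sketch below evaluates these standard definitions on hypothetical counts chosen to fall within the ranges reported for Table 8; software packages, WEKA included, may differ in small implementation details, so the values are illustrative only. The function name `rule_metrics` is ours.

```python
def rule_metrics(n: int, n_ante: int, n_cons: int, n_both: int) -> dict[str, float]:
    """Interest measures for A -> C from counts: |D| = n, |A|, |C| and |A and C|."""
    s_a, s_c, s_ac = n_ante / n, n_cons / n, n_both / n
    conf = s_ac / s_a
    return {
        "support": s_ac,
        "confidence": conf,                # P(C | A)
        "lift": conf / s_c,                # > 1: A raises the chance of C
        "leverage": s_ac - s_a * s_c,      # excess co-occurrence over independence
        "conviction": (1 - s_c) / (1 - conf) if conf < 1 else float("inf"),
    }


# Hypothetical: 1000 injury records, 760 male; a rule whose antecedent
# covers 60 records holds (consequent "Gender = Male") in 57 of them
m = rule_metrics(n=1000, n_ante=60, n_cons=760, n_both=57)
print({k: round(v, 4) for k, v in m.items()})
```

With a consequent as common as “Gender = Male”, lift cannot grow much above 1 even at very high confidence, which is why the injury rules combine high confidence with only moderate lift.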
The fatality dataset, in contrast, shows completely different risk patterns: demographic, etiological, and seasonal mortality factors are considerably more specific. As Table 9 shows, residential fires dominate among fatalities, since residential spaces combine high exposure (many people stay in them) with vulnerable groups (the elderly and the sick). The dominant causes of fire are heating elements, cigarette use, and open flames, which makes such fires extremely risky, especially for the elderly population. The associations here are thus related not to professional contexts but to vulnerable groups, residential environments, and specific causes of fire. Moreover, the association metrics reveal two distinct clusters. Rules for residential fires have significant but moderate metric values, with approximately equal Lift (1.22–1.24) and similar Conviction (4.4–6.9). In contrast, rules for fires caused by open flames (Conf. ≈ 0.97, Lift ≈ 5.7, Conv. ≈ 18–23) indicate extremely strong relationships between antecedents and consequents; that is, flames in open spaces (yards, unregulated areas) represent a disproportionately risky scenario.
Some of these facts are also shown in Figure 4, a log-log scatter plot of Lift versus Conviction for the association rules obtained in the fire analysis. The rules fall into three clusters, “Gender = Male”, “FOT = Residential Buildings”, and “CoF = Open Flame”; the first two have similar Lift values (≈ 1.2–1.3), while the third (“Open Flame”) stands out with extremely high values of both metrics. The log-log visualization thus separates the clusters effectively, supporting the robustness of the rules and aiding their interpretation.

3.3.2. Results of Classification Modeling

In this part, all observed data are integrated into one general set; that is, all cases of injuries and fatalities in fires are treated as the two classes of a new target (output) variable called “IFO” (Injury/Fatal/Outcome). This variable is analyzed in relation to the other attributes, using three separate data segments as sets of instances: the male, female, and older (65 and older) populations. The idea behind this segmentation is to examine similarities, but also differences, in the occurrence of injuries and deaths from fires between genders, as well as in the elderly population as the most at-risk category. To this end, the previously described CART algorithm was applied using the “Decision Tree” tool in the SPSS statistical software environment [27], with ten-fold cross-validation during tree construction to prevent overfitting. This yields three decision trees that segment the impact of different fire characteristics on outcomes relevant to human safety. Due to their size, parts of these trees are given in Appendix A (Figure A1, Figure A2 and Figure A3), and some of their basic metrics are shown in Table 10.
All the resulting decision trees are quite extensive (several dozen nodes), have considerable depth (on the order of 10 levels of branching), and have approximately equal, relatively small standard errors. The number of terminal nodes (“leaves”), each of which assigns a final class, reflects how finely each tree partitions the cases into outcome-specific segments. Together with the classification results reported below, this indicates that all three trees are highly predictive, with the potential to identify risky situations and enable targeted action (e.g., preventive measures in high-risk categories). Nevertheless, the structures of the trees differ significantly. For the male and older groups, the primary branching variable is “Cause of Fire” (“COF”), which is expected because this variable has already been shown to have the highest information gain (IG) value (see Table 1). In addition, a pure binary tree is obtained for the older population. Conversely, for the female population, the primary branching variable is “Age of Individuals”, which is treated here as a continuous numerical variable. As already mentioned, the CART algorithm segments this variable using a regression technique and RSSE values, which improves the precision of the tree itself.
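To illustrate how a CART-style tree can choose a cut point on a continuous predictor such as “Age of Individuals”, the sketch below scans midpoints between consecutive distinct ages and keeps the threshold with the lowest weighted Gini impurity, the standard CART criterion for a categorical target. This is a simplified stand-in under that assumption: the text describes the SPSS procedure in terms of a regression technique and RSSE values, and SPSS's exact splitting rules may differ. The ages, outcomes, and function names are hypothetical.

```python
def gini(labels: list[str]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(k) / n) ** 2 for k in set(labels)) if n else 0.0


def best_age_split(ages: list[float], outcomes: list[str]) -> tuple[float, float]:
    """Threshold t minimising the weighted Gini of (age < t) vs (age >= t), and that impurity."""
    pairs = sorted(zip(ages, outcomes))
    n = len(pairs)
    best = (float("nan"), float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal ages
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [o for _, o in pairs[:i]]
        right = [o for _, o in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical female cases: I = injury, F = fatality
ages = [22, 31, 38, 45, 52, 67, 74, 81, 86, 90]
ifo = ["I", "I", "I", "I", "I", "F", "I", "F", "F", "F"]
threshold, impurity = best_age_split(ages, ifo)
print(f"split at age < {threshold}: weighted Gini = {impurity:.3f}")
```

A full tree repeats this search over every candidate predictor at every node and recurses on the two resulting subsets.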
Table 11 provides the classification results for all three decision trees, namely the total numbers of correctly and incorrectly predicted outcomes for the target variable “IFO”, which distinguishes injuries from fatalities. For all three datasets, the classification can be considered satisfactory, as the number of correct predictions is much higher than the number of incorrect ones. In addition, the incorrect predictions are fairly evenly distributed between the two classes (especially for the elderly set), indicating that the models are not systematically biased toward either outcome, while overfitting is controlled by the cross-validation described above. In this way, different combinations of attributes make the model more stable and allow for more accurate prediction and decision-making.
Based on these classification tables, the quality of the resulting decision trees can be analyzed in more detail using the following commonly used quantitative measures:

$$ \text{Accuracy} = \frac{TP + TN}{N}, \quad \text{Precision} = \frac{TP}{TP + FP}, \quad \text{Sensitivity (Recall)} = \frac{TP}{TP + FN}, \quad \text{Specificity} = \frac{TN}{TN + FP}, \quad F\text{-measure} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} $$

Here, TP, FP, TN, and FN denote the numbers of true positive, false positive, true negative, and false negative predictions, respectively, and N = TP + FP + TN + FN is the total size of the observed dataset. Either outcome (“injury” or “fatality”) can be designated as “positive”; the choice only determines which class the class-specific measures refer to. The values of all these quality measures for the decision trees are shown in Table 12. All trees have high accuracy, as a result of the favorable distribution of values in the classification matrices given in Table 11.
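The measures above follow directly from the four cells of a 2 × 2 classification matrix. The sketch below evaluates them on a hypothetical matrix with “fatality” as the positive class, and also illustrates the symmetry between the two outcomes noted in the text: swapping which outcome counts as positive exchanges sensitivity and specificity. The function name is ours.

```python
def classification_measures(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, precision, sensitivity (recall), specificity and F-measure from a 2x2 table."""
    n = tp + fp + tn + fn
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical matrix, "fatality" as positive: 90 fatalities and 80 injuries correct,
# 20 fatalities predicted as injuries (FN), 10 injuries predicted as fatalities (FP)
m = classification_measures(tp=90, fp=10, tn=80, fn=20)
print({k: round(v, 3) for k, v in m.items()})

# Same matrix with "injury" as positive: TP and TN swap, and so do FP and FN
swapped = classification_measures(tp=80, fp=20, tn=90, fn=10)
```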
The obtained models thus have satisfactory quality measures, with accuracy and precision most pronounced in the female population (approximately 95%) and specificity particularly pronounced in the male population (almost 98%). The decision tree for the elderly population has the best sensitivity (approximately 93%), primarily due to the almost uniform distribution of false positive and false negative cases. The classification performance of the models can also be assessed with ROC (Receiver Operating Characteristic) curves, polygonal lines whose horizontal axis shows the false positive rate (1 − specificity) and whose vertical axis shows the true positive rate (sensitivity). The area under the ROC curve (AUC) is then another measure of classification quality, namely the ability of the model to correctly distinguish between the two classes. Figure 5 shows AUC values of 0.92–0.95 for all decision-tree models, confirming their strong ability to discriminate between injuries and fatalities.
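An ROC curve for a decision tree can be built from the predicted class probabilities of its leaves. The sketch below traces the ROC polygon by lowering the threshold over the distinct scores (cases with tied scores, common for tree leaves, are moved together), computes the area by the trapezoidal rule, and cross-checks it against the equivalent rank-based (Mann–Whitney) form of the AUC. The scores and function names are hypothetical, and this is not the SPSS procedure used for Figure 5.

```python
def roc_points(scores, labels):
    """(FPR, TPR) vertices of the ROC polygon; labels are 1 (positive) or 0."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # move all cases tied at this score across the threshold together
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC polygon by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(scores, labels):
    """Equivalent Mann-Whitney form: P(score_pos > score_neg) + 0.5 P(tie)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted fatality probabilities (1 = fatality, 0 = injury)
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 1, 0, 1, 0]
pts = roc_points(scores, labels)
print(pts, auc_trapezoid(pts), auc_rank(scores, labels))
```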

4. Discussion

In this section, statistical, stochastic, and DM results are integrated to identify key differences in demographic profiles, risk patterns, and practical implications. An important methodological aspect of this study is that probabilistic modeling (GND and RefLOGND+ distributions) and DM analysis (Apriori and CART algorithms) are mutually supportive. In this way, the combined statistical, stochastic, and data analysis provides a comprehensive view of the demographic, temporal, and causal structures underlying injuries and fatalities in different types of fires in Serbia. By integrating descriptive indicators, probabilistic modeling, association rule analysis, and classification trees, several consistent and practically relevant patterns emerge, which will be discussed in more detail below. It should also be noted that the analyzed dataset covers the period 2005–2015; therefore, the observed patterns should be interpreted within the context of that time frame, as subsequent technological, demographic, or regulatory developments may have influenced more recent fire risk characteristics.

4.1. Demographic Divergence Between Injuries and Fatalities

The central contribution of this study is the clear statistical separation between the age profiles of injured and deceased individuals. Thus, injuries predominantly affect the working-age population (20–64 years), showing an almost symmetric age distribution (generalized normal distribution). In contrast, deaths are concentrated among the elderly (65+ years), with a markedly asymmetric and strongly skewed profile best represented by a reflected log-normal distribution. This demographic divergence highlights two fundamentally different risk mechanisms:
  • Injuries result from exposure during industrial or mechanical activities (cutting, grinding, electrical work).
  • Fatalities correspond to the risk caused by vulnerability, where physiological fragility, reduced reaction time, chronic diseases, and limited mobility play a dominant role.
These results indicate two structurally different mechanisms of fire risk. Injuries are primarily associated with exposure-related risks, typically linked to occupational activities and industrial environments. In contrast, fatal outcomes are predominantly related to vulnerability-related risks, particularly affecting elderly individuals in residential settings. This distinction becomes clearer when the results of probabilistic modeling and data mining analyses are considered together. Stochastic modeling reveals the age-related probability structure of fire casualties, while association rule mining and decision tree analysis identify the most frequent contextual combinations of fire causes, environments, and demographic attributes. Taken together, these complementary perspectives reveal consistent underlying patterns of fire risk mechanisms and suggest that effective prevention strategies should address both occupational exposure and vulnerability-related risks in ageing populations.

4.2. Interpretation of Stochastic Modeling Results

The analysis of the age structure of people injured and killed in fires revealed clear differences in distribution shape and predictive patterns. For injured people, descriptive statistics indicate a symmetric distribution around a mean of approximately 48 years, with moderate dispersion and mild leptokurticity. Among the deceased, the distribution was negatively asymmetric, with a mean of approximately 65 years, a median shifted towards older ages (71 years), and pronounced leptokurticity. These findings, already at a descriptive level, suggest that injuries occur predominantly in the working-age population, while mortality affects older populations.
Candidate distributions from the flexible GAMLSS family were used for formal modeling. For the injured population, the Normal distribution, Student’s t-distribution, Generalized t-distribution (GTD), and Generalized Normal distribution (GND) were tested. The normal distribution and the GTD failed the goodness-of-fit tests, while the t-distribution showed excessive flexibility and outliers in the tails. Ultimately, the GND emerged as the only statistically acceptable candidate: it had the lowest AIC/BIC values and the smallest mean square error (MSE), and it was the only one not rejected by the Anderson–Darling, Cramér–von Mises, and Kolmogorov–Smirnov tests. Its estimated parameters (μ ≈ 47.8, σ ≈ 19.6, ν > 3) indicate a symmetric distribution with controlled tails, consistent with the empirical findings.
For the fatalities, the candidates were the Normal, Weibull, Generalized Gamma, and Reflected Log-Normal (RefLOGND+) distributions. The Normal and Weibull distributions could not capture the pronounced negative skewness, while the Generalized Gamma showed some flexibility but was still rejected by the fit tests. The RefLOGND+ proved optimal: it had the lowest AIC/BIC values and was the only one not rejected by the goodness-of-fit tests (p-values above 0.05 in the AD, CVM, and KS tests). Reflection around the constant c = 103 transforms the negative asymmetry into the positive skewness that the log-normal distribution naturally describes, with estimated parameters μ ≈ 3.5 and σ ≈ 0.55. Thus, the RefLOGND+ successfully captures the empirical profile: a concentration of mortality in older ages, with a pronounced peak and heavy tails.
Finally, the predictive probabilities by age group, obtained as differences of the CDFs of the selected distributions, further confirm the differences between the injured and the deceased. Injuries occur most often in the working-age group (35–64), with the highest probability in the 35–49 group (0.260) and a stable, high probability in the 50–64 group (0.250). Mortality, in contrast, rises sharply after the age of 65: the highest probability is in the 65–79 group (0.350), and a high value is maintained in the oldest group, 80+ (0.280). These results clearly indicate different risk mechanisms: among the injured, exposure and activity dominate, while among the deceased, physiological vulnerability and a limited ability to react in old age play a crucial role. Overall, the combination of the GND for injuries and the RefLOGND+ for fatalities provides a methodologically consistent framework for describing and predicting the age structure of fire victims. The resulting models not only provide a statistically valid description of the data but also enable practical risk prediction by age group. Based on these findings, preventive measures should be differentiated: for younger and middle-aged people, the focus should be on exposure control and safety procedures, while for older people, the priority should be to reduce vulnerability through faster detection, evacuation, and medical support. In addition, the log-scaled relative risk curve provides a quantitative transition point, visually and numerically confirming the shift from exposure-dominated to vulnerability-dominated fire outcomes.

4.3. Patterns Revealed by Association Rule Mining

Association rule analysis in WEKA (Apriori algorithm, minimum support 3–7%, minimum confidence 95%) also revealed different underlying mechanisms for injuries and fatalities. For fire-related injuries in Serbia, it shows several consistent and conceptually coherent patterns that reflect the structure of occupational exposures and specific fire contexts. In all extracted rules, the item “Gender = Male” emerged as the dominant effect, with high Confidence (0.95–0.98) and moderate but significant Lift (1.25–1.31). The strongest rules were observed in scenarios involving welding and grinding, as well as in fires in industrial and manufacturing plants, especially when these events occurred on weekdays, primarily in the afternoon (12–18 h). A somewhat weaker but still significant pattern involves fires in transport, vehicles, and vessels. These results are expected, as these activities are predominantly performed by male workers in industrial sectors in Serbia, resulting in structurally higher exposure to fire hazards. These findings are also in line with international reports that highlight metalworking, work at high temperatures, and the transportation of flammable materials as the most hazardous sources of workplace fires, especially when safety protocols or protective equipment are inconsistently applied.
The association rules for fatal outcomes indicate a well-defined set of risks predominantly associated with residential buildings, with very high and stable Confidence (0.97–0.98) and stable Lift (1.22–1.24). The most prominent antecedents (cigarette butts, fireplaces/stoves, and unspecified causes) show that fatal fires tend to occur in enclosed, mostly nocturnal, and less controlled environments, where fire detection is difficult and evacuation options are limited. The high confidence values indicate that, within the fatality dataset, these antecedents almost always co-occur with a residential fire location. Demographic factors also stand out: people over 80 years old form a high-risk group, which is consistent with reduced mobility and slower reactions in emergency situations. Compared with injuries, where the risk mainly arises in work and industrial environments, fatal fires are less diverse but much more consistent: they are concentrated in households and associated with carelessness, heating elements, and vulnerable population groups. This clearly distinguishes the preventive priorities: occupational safety to prevent injuries, versus early detection, safe heating, and support for the elderly population to prevent fire fatalities. Finally, rules for fires caused by open flames have a very high Lift (≈ 6) and Conviction (≈ 20), indicating extremely strong associations related to open flames in non-urban spaces. This group of association rules is a key indicator of fatal fires, identifying open flames in unregulated spaces as the most critical scenario.

4.4. Conditional Pathways Identified by Decision Trees

The classification results obtained using the CART algorithm show that the obtained classification models reliably distinguish fatal outcomes from injuries in all three analyzed population segments. In the female population, the overall accuracy of the model is high, with a marked ability to accurately identify fatalities (281 correctly classified fatalities versus only 24 misassigned injuries). It is particularly significant that the primary branching variable is “Age of Individuals”, indicating that age is a key determinant of risk in women. This confirms the finding from the association rules, according to which older women (especially 80+) represent the most vulnerable group from the point of view of fatal outcomes. In men, the model also shows marked separation, with over 1500 correctly classified injuries and stable recognition of fatalities (480 correctly identified). The dominant branching according to the variable “Cause of Fire” indicates that in the male population, the primary risks are related to specific operational activities and technical sources of ignition (welding, grinding, and other work processes), which directly coincides with previous results from the analysis of injuries in industrial settings.
Finally, the decision tree for the elderly population shows the clearest differentiation of fatalities (466 correctly classified fatalities, with only 36 misclassified). Here, too, the key predictor is the cause of the fire, indicating that critical situations in older age are most often associated with household fires (heaters, chimneys, fireplaces), with poorer mobility and slower reaction times in older people being a factor that further increases the likelihood of death. Taken as a whole, the classification results confirm a clear risk structure that is consistent with the findings from the association rules. In men and the elderly, the most important role is played by the characteristics of the cause of the fire, while in women, the strongest predictive effect is related to age. In all three CART models, injuries are seen to be far more consistent and easier to predict, as evidenced by the relatively low number of misclassifications in all three segments. On the contrary, fatal outcomes are rarer and thus statistically less stable, so although the models capture them relatively well, they are always more difficult to classify. Thus, the obtained risk structure indicates that preventive measures must be segmented: technical-operational interventions for men, safe use of heating devices and control of risky habits for the elderly, and specific protection measures for elderly women as the most vulnerable population category.

4.5. Practical Implications for Fire Safety Policy

The results of this study have several practical implications for fire safety management and risk prevention strategies. For the male working-age population, where ignition sources such as welding, grinding, and other technical operations dominate the risk structure, policies should prioritize the implementation of safety protocols, improved training, and stricter supervision of industrial fire hazards. In contrast, the findings for women, and especially the elderly, show that residential buildings are the primary site of death, mainly due to smoking, heating appliances, and reduced mobility. This highlights the need for targeted interventions such as subsidized smoke detectors, safer heating technologies, community-based monitoring programs for older households, and public campaigns aimed at preventing night-time fires. Open-flame fires in outdoor spaces are particularly risky, representing a critical cause of mortality, and therefore require specific safety protocols. Overall, the results emphasize that fire prevention strategies cannot be one-size-fits-all; instead, they require tailored, evidence-based approaches that address the specific mechanisms leading to injury versus death in different segments of the population.
These findings point to several practical directions for fire risk mitigation, particularly in the areas of occupational fire safety, targeted prevention measures for elderly populations in residential environments, and the promotion of early fire detection systems in high-risk settings. Although the identified patterns provide useful insights into demographic and contextual fire risks, the dataset used in this study includes a limited set of attributes describing fire incidents, and, therefore, some potentially relevant factors (e.g., socioeconomic conditions, health status, or environmental conditions) could not be incorporated into the analysis. Consequently, the practical implications discussed above should be understood as indicative directions for fire risk assessment and prevention strategies rather than definitive policy recommendations.

5. Concluding Remarks

This study provides insight into the structure of fire-related risks in Serbia, revealing consistent, interpretable, and highly effective patterns across demographic groups and incident types. The combination of statistical and stochastic methods together with association rule mining techniques and CART classification yielded clear, stable, and relevant results, with the resulting models achieving high accuracy and strong discriminatory power between fire injury and fatality. In this way, it has been shown, among others, that injuries and fatalities in fires are not only quantitatively different but also qualitatively different in their contexts and causes. The relative risk analysis further highlights the demographic divergence between injuries and fatalities, indicating that the oldest population groups exhibit several times higher relative representation in fatal outcomes compared to non-fatal fire injuries. For instance, industrial and transportation environments have been shown to be more associated with injuries, with the most common being those in the specific work environments and occupations of the male population. Conversely, residential buildings are the dominant cause of fatalities, while the most critical scenario, open flames, appears to be the strongest and most reliable predictor of fatal outcomes. These findings provide a solid basis for targeted prevention strategies and policy interventions.
However, as highlighted in the previous section, the presented analysis is subject to certain limitations, primarily due to the content and structure of the available dataset. In particular, a deeper analysis of social behavior, weather conditions, and living environment was not possible, since the dataset does not include detailed socioeconomic, environmental, or health-related variables, such as weather conditions at the time of the fire, educational level, employment status, or the previous health status of injured and deceased individuals. Moreover, the dataset analyzed in this study is the most recent one available in Serbia that contains sufficiently detailed demographic attributes of fire casualties. Therefore, although the present results provide meaningful insight into the demographic and contextual structure of fire injuries and fatalities, they should be interpreted as limited to the available observation period and variables. Future research could extend the present analysis once updated datasets become available and could further enrich the proposed framework by incorporating temporal, spatial, and other contextual attributes to increase predictive power. In addition, more advanced predictive tools and simulation-based approaches built on statistical and data mining scenarios could be developed, which may further contribute to fire prevention and public safety planning.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/fire9040146/s1.

Author Contributions

Conceptualization, N.M., V.S. and M.J.; methodology, N.M., V.S. and M.J.; software, N.M., V.S. and M.J.; validation, N.M., V.S. and M.J.; formal analysis, N.M., V.S. and Ž.G.; investigation, N.M., V.S. and M.J.; resources, N.M. and Ž.G.; data curation, N.M., V.S. and D.M.; writing—original draft preparation, N.M., V.S. and M.J.; writing—review and editing, Ž.G. and D.M.; visualization, N.M., V.S. and D.M.; supervision, Ž.G. and D.M.; project administration, M.J. and Ž.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors sincerely thank the Ministry of Internal Affairs of the Republic of Serbia, which officially provided the dataset presented in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1 and Table A2 contain the sets of values for the nominal variables “Cause of Fire (CoF)” and “Fire Object Type (FOT)”, respectively. Figure A1, Figure A2 and Figure A3 show specific branches of the decision trees for the female, male, and elderly (65+) populations, respectively.
Table A1. Description of the grouped Cause of Fire (“CoF”) variable.
| Cause of Fire (CoF) | Number of Cases |
|---|---|
| Burning low vegetation | 34 |
| Candle | 192 |
| Chimney | 78 |
| Cigarette butt | 193 |
| Electric lamp | 25 |
| Electrical appliances | 98 |
| Electrical installation | 272 |
| Explosion | 424 |
| Furnace hearth | 225 |
| Heating devices | 164 |
| Intention | 86 |
| Lightning | 6 |
| Open flame | 262 |
| Other causes | 258 |
| Self-immolation | 10 |
| Undetermined | 406 |
| Vehicle | 44 |
| Welding/Grinding | 151 |
Table A2. Description of the grouped Fire Object Type (“FOT”) variable.
| Value | Fire Object Type (FOT) | Number of Cases |
|---|---|---|
| Residential buildings | Residential house, Apartment building, Cottage, Barrack, Log cabin, Camp | 1753 |
| Industrial and production facilities | Factory, Plant, Workshop, Mill, Dryer, Refinery, Silo, Bakery, Laundry, Warehouse | 215 |
| Agricultural buildings | Farm, Economy, Hives, Other | 76 |
| Public and institutional buildings/healthcare and social institutions | Hospital, Healthcare institution, Nursing home, Spa, Clinic, Auxiliary hospital, Red Cross, School, Police station, Municipality, Barracks, Market, Court, Student dormitory | 32 |
| Hospitality and commercial buildings | Hotel, Restaurant, Tavern, Club, Shopping center, Store, Butcher shop, Kiosk, Newsstand, Commercial building/Business premises/Company | 216 |
| Transport, vehicles and vessels | Airplane, Vehicle, Locomotive, Ship, Boat, Watercraft, Garage, Gas station | 232 |
| Energy and infrastructure facilities | Gas pipeline, Transformer station, Heating plant | 10 |
| Sports and cultural facilities | Stadium, Sports center, Theater | 4 |
| Open spaces/others | Courtyard, Cemetery, Open space, Dump | 390 |
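As a simple consistency check, the case counts in Table A1 and Table A2 should each add up to the total number of casualties in Table 3 (2075 injured plus 853 deceased persons). The sketch below copies the counts into dictionaries (the variable and function names are illustrative, not taken from the study) and sums them.

```python
# Case counts copied from Table A1 (grouped CoF) and Table A2 (grouped FOT).
COF_COUNTS = {
    "Burning low vegetation": 34, "Candle": 192, "Chimney": 78,
    "Cigarette butt": 193, "Electric lamp": 25, "Electrical appliances": 98,
    "Electrical installation": 272, "Explosion": 424, "Furnace hearth": 225,
    "Heating devices": 164, "Intention": 86, "Lightning": 6,
    "Open flame": 262, "Other causes": 258, "Self-immolation": 10,
    "Undetermined": 406, "Vehicle": 44, "Welding/Grinding": 151,
}
FOT_COUNTS = {
    "Residential buildings": 1753,
    "Industrial and production facilities": 215,
    "Agricultural buildings": 76,
    "Public and institutional buildings/healthcare and social institutions": 32,
    "Hospitality and commercial buildings": 216,
    "Transport, vehicles and vessels": 232,
    "Energy and infrastructure facilities": 10,
    "Sports and cultural facilities": 4,
    "Open spaces/others": 390,
}
N_INJURIES, N_FATALITIES = 2075, 853  # totals from Table 3


def marginal_totals():
    """Return (CoF total, FOT total, injured + deceased persons)."""
    return sum(COF_COUNTS.values()), sum(FOT_COUNTS.values()), N_INJURIES + N_FATALITIES


print(marginal_totals())  # (2928, 2928, 2928)
```

Both groupings therefore partition the same 2928 casualty records.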
Figure A1. One branch of the decision tree for classification of injuries/fatalities outcomes (Gender = Female).
Figure A2. One branch of the decision tree for classification of injuries/fatalities outcomes (Gender = Male).
Figure A3. One branch of the decision tree for classification of injuries/fatalities outcomes (Age of Individuals = 65+).

References

  1. Abid, F. A Survey of Machine Learning Algorithms Based Forest Fires Prediction and Detection Systems. Fire Technol. 2021, 57, 559–590.
  2. Wood, D.A. Prediction and Data Mining of Burned Areas of Forest Fires: Optimized Data Matching and Mining Algorithm Provides Valuable Insight. Artif. Intell. Agric. 2021, 5, 24–42.
  3. Alkhatib, R.; Sahwan, W.; Alkhatieb, A.; Schütt, B. A Brief Review of Machine Learning Algorithms in Forest Fires Science. Appl. Sci. 2023, 13, 8275.
  4. Rubí, J.N.S.; Paulo de Carvalho, H.P.; Paulo, R.L.G. Application of Machine Learning Models in the Behavioral Study of Forest Fires in the Brazilian Federal District region. Eng. Appl. Artif. Intell. 2023, 118, 105649.
  5. Ahn, S.; Won, J.; Lee, J.; Choi, C. Comprehensive Building Fire Risk Prediction Using Machine Learning and Stacking Ensemble Methods. Fire 2024, 7, 336.
  6. Choi, J.; Yun, Y.; Chae, H. Forest Fire Risk Prediction in South Korea Using Google Earth Engine: Comparison of Machine Learning Models. Land 2025, 14, 1155.
  7. Gündüz, H.İ.; Torun, A.T.; Gezgin, C. Post-Fire Burned Area Detection Using Machine Learning and Burn Severity Classification with Spectral Indices in İzmir: A SHAP-Driven XAI Approach. Fire 2025, 8, 121.
  8. McNorton, J.R.; Di Giuseppe, F.; Pinnington, E.; Chantry, M.; Barnard, C. A Global Probability-of-Fire (PoF) Forecast. Geophys. Res. Lett. 2024, 51, e2023GL107929.
  9. Hui, M.; Ni, F.; Liu, W.; Liu, J.; Chen, N.; Zhou, X. SPN-Based Dynamic Risk Modeling of Fire Incidents in a Smart City. Appl. Sci. 2025, 15, 2701.
  10. Masoudian, S. Stochastic Modelling of Wildfire Spread (PhD Abstract). Bull. Aust. Math. Soc. 2025, 12, 399–400.
  11. Chen, Z.; Ye, B. Association Rule Analysis of Petrochemical Fire Accidents Based on the Apriori Algorithm. In Proceedings of the 9th International Conference on Fire Science and Fire Protection Engineering (ICFSFPE), Chengdu, China, 18–20 October 2019.
  12. Ayhan, B.U.; Doğan, N.B.; Tokdemir, O.B. An Association Rule Mining Model for the Assessment of the Correlations Between the Attributes of Severe Accidents. J. Civ. Eng. Manag. 2020, 26, 315–330.
  13. Mahmood, I.N.; Aliedane, H.A.; Abuzaraida, M.A. Applications of Data Mining in Mitigating Fire Accidents Based on Association Rules. Int. J. Interact. Mob. Technol. 2021, 15, 158–169.
  14. Stojanović, V.S.; Jovanović Spasojević, T.; Bojičić, R.; Pažun, B.; Langović, Z. Cauchy–Logistic Unit Distribution: Properties and Application in Modeling Data Extremes. Mathematics 2025, 13, 255.
  15. Stojanović, V.S.; Bakouch, H.S.; Alomair, G.; Daghestani, A.F.; Grujčić, Ž. A Flexible Unit Distribution Based on a Half-Logistic Map with Applications in Stochastic Data Modeling. Symmetry 2025, 17, 278.
  16. Mitrović, N.; Stojanović, V.S.; Jovanović, M.; Mladjan, D. Forensic and Cause-and-Effect Analysis of Fire Safety in the Republic of Serbia: An Approach Based on Data Mining. Fire 2025, 8, 302.
  17. The CTIF World Fire Statistics, Report No. 29. 2024. Available online: https://www.ctif.org/world-fire-statistics (accessed on 14 March 2026).
  18. Dowd, C. “Twosamples”: Fast Permutation Based Two Sample Tests, R package, version 2.0.1, 2023. Available online: https://twosampletest.com (accessed on 27 January 2026).
  19. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009.
  20. Aggarwal, C. Data Mining: The Textbook; Springer: Berlin/Heidelberg, Germany, 2015.
  21. Rigby, R.A.; Stasinopoulos, D.M. Generalized Additive Models for Location, Scale and Shape. J. R. Stat. Soc. C 2005, 54, 507–554.
  22. Mutoh, A.; Booth, A.S.; Stallrich, J.W. Revisiting Penalized Likelihood Estimation for Gaussian Processes. arXiv 2025, arXiv:2511.18111.
  23. Longford, N.T. A Fast Scoring Algorithm for Maximum Likelihood Estimation in Unbalanced Mixed Models with Nested Random Effects. Biometrika 1987, 74, 817–827.
  24. Hastie, T.; Tibshirani, R. Generalized Additive Models. In Wiley StatsRef: Statistics Reference Online; Wiley: Hoboken, NJ, USA, 2014.
  25. Rigby, R.A.; Stasinopoulos, D.M.; Heller, G.Z.; De Bastiani, F. Distributions for Modeling Location, Scale, and Shape: Using GAMLSS in R; Chapman and Hall/CRC: Boca Raton, FL, USA, 2019.
  26. Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA Data Mining Software: An Update. ACM SIGKDD Explor. Newsl. 2009, 11, 10–18.
  27. Baizyldayeva, U.B.; Uskenbayeva, R.K.; Amanzholova, S.T. Decision Making Procedure: Applications of IBM SPSS Cluster Analysis and Decision Tree. World Appl. Sci. J. 2013, 21, 1207–1212.
Figure 1. Left plot: Empirical and fitted age distributions of injuries (X). Right plot: Q-Q plot of the empirical data against the fitted GND.
Figure 2. Left plot: Empirical and fitted age distributions of deaths (Y). Right plot: Q-Q plot of the empirical data against the fitted RefLOGND+.
Figure 3. Age-specific predictive probabilities of injuries and fatalities, as well as the corresponding logarithmically scaled relative risk index derived from optimally fitted stochastic models (GND and RefLOGND+, respectively).
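Figure 3 reports age-specific probabilities derived from the fitted models. As a hedged illustration of how such a band probability P(a ≤ X < b) could be obtained for the injury model, the sketch below integrates a generalized normal (power exponential) density with the GND estimates from Table 5 (μ = 47.78, σ = 19.65, ν = 3.105) using Simpson's rule. It assumes a parameterization in which σ is the standard deviation (as in the GAMLSS power exponential family, to our understanding), which is consistent with σ being close to the sample StDev of 19.62 in Table 4. The function names are hypothetical and this is not the authors' code.

```python
import math

# GND estimates for injury ages X (Table 5); sigma is assumed to be the SD.
MU, SIGMA, NU = 47.78, 19.65, 3.105


def gnd_pdf(x, mu=MU, sigma=SIGMA, nu=NU):
    """Generalized normal density with shape nu, parameterized by mean and SD."""
    # Convert the SD to the usual scale alpha, using Var = alpha^2 * G(3/nu) / G(1/nu).
    alpha = sigma * math.sqrt(math.gamma(1.0 / nu) / math.gamma(3.0 / nu))
    z = abs(x - mu) / alpha
    return nu / (2.0 * alpha * math.gamma(1.0 / nu)) * math.exp(-(z ** nu))


def band_probability(a, b, n=2000):
    """P(a <= X < b) by composite Simpson's rule; n must be even."""
    h = (b - a) / n
    total = gnd_pdf(a) + gnd_pdf(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * gnd_pdf(a + i * h)
    return total * h / 3.0


# Age bands of Table 3 (the GND has unbounded support, so a small mass lies below 0).
for lo, hi in [(0, 10), (10, 20), (20, 35), (35, 50), (50, 65), (65, 80), (80, 150)]:
    print(f"[{lo}, {hi}): {band_probability(lo, hi):.3f}")
```

The resulting band probabilities can be set against the empirical injury shares by age group in Table 3.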
Figure 4. Logarithmic scatterplot of Lift and Conviction values for the three clusters obtained by the association rules.
Figure 5. ROC curves of the decision tree classifications: (a) Gender = Female; (b) Gender = Male; (c) Age of Individuals = Elderly (65+).
Table 1. Fire injury and fatality rates per 100 fires in Serbia and selected countries in the region (source: CTIF 2024 report [17]).
| Country | Injuries (per 100 Fires) | Deaths (per 100 Fires) |
|---|---|---|
| Bulgaria | 0.8 | 0.43 |
| Greece | 0.3 | 0.24 |
| Croatia | 1.0 | 0.22 |
| Hungary | 4.3 | 0.52 |
| Serbia | 1.3 | 0.42 |
| World average | 1.5 | 0.53 |
Table 2. Variables used in fire safety risk analysis.
| Ord. Num. | Variable (Attribute) | Type | Values | Description | IG (Injuries/Fatalities) |
|---|---|---|---|---|---|
| 1. | Cause of the Fire (CoF) | Nominal | / | The various causes of fires. | 0.1299 |
| 2. | Age-group | Nominal | 0–9, …, 80+ | Age-based classification. | 0.1243 |
| 3. | Fire Object Type (FOT) | Nominal | / | The various categories of fire objects. | 0.0549 |
| 4. | Year of Fire Occurrence (YFO) | Ordinal | 2005, …, 2015 | The year the fire occurred. | 0.0142 |
| 5. | Accident Hour Group (AHG) | Nominal | Interval I–IV | Time interval of fire occurrence (0–6 h, 6–12 h, 12–18 h, 18–24 h). | 0.0130 |
| 6. | Season | Nominal | Winter, Spring, Summer, Autumn | Season in which the fire occurred. | 0.0108 |
| 7. | Gender | Nominal | Female, Male | Classification by gender. | 0.0090 |
| 8. | Day of Day (DoD) | Nominal | Weekday, Weekend | Day of the week when the fire occurred. | 0.0014 |
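Assuming the IG column reports the usual entropy-based information gain (in bits) of each attribute with respect to the binary injury/fatality outcome, it can be recomputed from a contingency table of counts. The sketch below uses the gender counts from Table 3; with them it gives approximately 0.0090, in line with the Gender row of Table 2. The function names are illustrative.

```python
import math


def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def information_gain(table):
    """IG of a nominal attribute; table maps value -> [injuries, fatalities]."""
    class_totals = [sum(col) for col in zip(*table.values())]
    n = sum(class_totals)
    # Entropy of the outcome remaining after splitting on the attribute.
    conditional = sum(sum(row) / n * entropy(row) for row in table.values())
    return entropy(class_totals) - conditional


# Gender x outcome counts from Table 3: [injuries, fatalities].
GENDER = {"Female": [510, 305], "Male": [1565, 548]}
print(f"{information_gain(GENDER):.4f}")  # 0.0090
```

The same function applies to any other nominal attribute once its value-by-outcome counts are available.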
Table 3. Demographic distribution of fire injuries and fatalities in Serbia.
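| Variables | Injuries: Frequency | Injuries: Percentage | Fatalities: Frequency | Fatalities: Percentage |
|---|---|---|---|---|
| Gender | | | | |
| Female | 510 | 24.6% | 305 | 35.8% |
| Male | 1565 | 75.4% | 548 | 64.2% |
| Age groups (years) | | | | |
| 0–9 | 53 | 2.5% | 28 | 3.3% |
| 10–19 | 80 | 3.8% | 8 | 0.9% |
| 20–34 | 442 | 21.3% | 56 | 6.6% |
| 35–49 | 551 | 26.6% | 81 | 9.5% |
| 50–64 | 508 | 24.5% | 178 | 20.9% |
| 65–79 | 319 | 15.4% | 248 | 29.0% |
| 80+ | 122 | 5.9% | 254 | 29.8% |
| Injury Severity | | | | |
| Minor | 1356 | 65.3% | – | – |
| Severe | 719 | 34.7% | – | – |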
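The demographic divergence between injuries and fatalities can be illustrated directly from Table 3 by dividing each age group's share among fatalities by its share among injuries. This empirical ratio is a simplification used only for illustration; the relative risk index in Figure 3 is derived from the fitted GND and RefLOGND+ models instead. Names in the sketch are hypothetical.

```python
# Age-group counts from Table 3 as (injuries, fatalities).
AGE_COUNTS = {
    "0-9": (53, 28), "10-19": (80, 8), "20-34": (442, 56), "35-49": (551, 81),
    "50-64": (508, 178), "65-79": (319, 248), "80+": (122, 254),
}


def representation_ratios(counts=AGE_COUNTS):
    """Share of each age group among fatalities divided by its share among injuries."""
    n_inj = sum(i for i, _ in counts.values())  # 2075 injured persons
    n_fat = sum(f for _, f in counts.values())  # 853 deceased persons
    return {g: (f / n_fat) / (i / n_inj) for g, (i, f) in counts.items()}


for group, ratio in representation_ratios().items():
    print(f"{group:>6}: {ratio:.2f}")
```

A ratio above 1 means that the group is over-represented among fatal outcomes; for the 80+ group the ratio is roughly five.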
Table 4. Descriptive statistics of the age structure of fire injuries and deaths.
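| Variable | Min | 1st Qu. | Median | 3rd Qu. | Max | Mean | Var | StDev | Skew | Kurt | Sum |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Injuries (X) | 0 | 33 | 47 | 62 | 92 | 47.66 | 384.8 | 19.62 | 3.84 × 10⁻⁴ | 2.394 | 2075 |
| Fatalities (Y) | 1 | 53 | 71 | 81 | 98 | 64.52 | 461.7 | 21.49 | −1.061 | 3.606 | 853 |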
Table 5. Estimated values of competing distributions and goodness-of-fit statistics (X-variable).
| Distr. | μ | σ | ν | AIC | BIC | MSE | KS (p-value) | AD (p-value) | CVM (p-value) |
|---|---|---|---|---|---|---|---|---|---|
| ND | 47.66 | 19.61 | – | 18,244 | 18,255 | 3.42 × 10⁻⁴ | 0.0467* (0.0185) | 28,321* (0.0180) | 2.247* (0.0275) |
| TD | 47.66 | 19.61 | 1.62 × 10⁴² | 18,246 | 18,262 | 3.43 × 10⁻⁴ | 0.0434* (0.0325) | 22,450* (0.0425) | 1.5033 (0.0775) |
| GTD | 47.78 | 32.40 | 400.02 | 18,194 | 18,216 | 1.62 × 10⁻³ | 0.0675* (~0.000) | 119,651* (~0.000) | 4.178* (~0.000) |
| GND | 47.78 | 19.65 | 3.105 | 18,192 | 18,209 | 2.06 × 10⁻⁴ | 0.0376 (0.0935) | 10,223 (0.255) | 0.5462 (0.425) |

* p < 0.05.
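Because AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, the two criteria reported for the same model should differ by k(ln n − 2). The check below assumes n = 2075 injured persons and counts k as the number of parameter columns filled in Table 5 (two for ND, three for TD and GND). GTD is left out because its BIC − AIC gap of 22 suggests a fourth parameter that the table does not list. Names are illustrative.

```python
import math

N = 2075  # sample size of X (injured persons, Table 3)

# (AIC, BIC, k) per model; k is assumed to be the number of parameter columns filled in Table 5.
FITS = {"ND": (18244, 18255, 2), "TD": (18246, 18262, 3), "GND": (18192, 18209, 3)}


def implied_bic(aic, k, n=N):
    """AIC = 2k - 2 lnL and BIC = k ln(n) - 2 lnL imply BIC = AIC + k (ln n - 2)."""
    return aic + k * (math.log(n) - 2)


for name, (aic, bic, k) in FITS.items():
    print(f"{name}: reported BIC = {bic}, implied BIC = {implied_bic(aic, k):.1f}")

best = min(FITS, key=lambda m: FITS[m][0])
print("Lowest AIC:", best)  # GND
```

Since both criteria are reported as integers, agreement within one unit is the most that can be expected.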
Table 7. Number of generated large itemsets with minimal and total support in the Apriori algorithm.
Dataset L ( 1 ) L ( 2 ) L ( 3 ) L ( 4 ) L ( 5 ) Instance
SupportTotal
Injuries3825042522023622075
Fatalities23947915060853
Table 8. Observed patterns and metrics of associative rules within the injured population.
| Order | Antecedents (A) | Consequents (B) | Number of Cases (A) | Number of Cases (A ∪ B) | Conf. | Lift | Lev. | Conv. |
|---|---|---|---|---|---|---|---|---|
| 1. | AHG = Interval III, CoF = Welding/Grinding | Gender = Male | 66 | 65 | 0.98 | 1.31 | 0.01 | 8.11 |
| 2. | Age-group = 20–34, FOT = Industrial and production facilities | Gender = Male | 70 | 68 | 0.97 | 1.29 | 0.01 | 5.73 |
| 3. | DoD = Weekday, CoF = Welding/Grinding | Gender = Male | 114 | 110 | 0.96 | 1.28 | 0.01 | 5.60 |
| 4. | CoF = Welding/Grinding | Gender = Male | 144 | 138 | 0.96 | 1.27 | 0.01 | 5.06 |
| 5. | Season = Winter, FOT = Industrial and production facilities | Gender = Male | 70 | 67 | 0.96 | 1.27 | 0.01 | 4.30 |
| 6. | CoF = Welding/Grinding, Injury Severity = Minor | Gender = Male | 88 | 84 | 0.95 | 1.27 | 0.01 | 4.33 |
| 7. | FOT = Industrial and production facilities | Gender = Male | 181 | 172 | 0.95 | 1.26 | 0.02 | 4.45 |
| 8. | AHG = Interval III, FOT = Transport, vehicles, and vessels | Gender = Male | 76 | 72 | 0.95 | 1.26 | 0.01 | 3.74 |
| 9. | CoF = Explosion, FOT = Industrial and production facilities | Gender = Male | 76 | 72 | 0.95 | 1.26 | 0.01 | 3.74 |
| 10. | DoD = Weekday, FOT = Industrial and production facilities | Gender = Male | 146 | 138 | 0.95 | 1.25 | 0.01 | 3.99 |
| | Average values: | | | | 0.96 | 1.27 | 0.01 | 4.91 |
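The rule metrics in Table 8 can be recomputed from case counts alone. In Table 8 the consequent is Gender = Male, so n_B is the 1565 male injured persons and n is the 2075 injured persons from Table 3. The tabulated conviction values are reproduced only when 1 is added to the denominator, a smoothing variant that, to our understanding, matches the conviction used by WEKA's Apriori (the study cites WEKA); treat that detail as an assumption. The function name is illustrative.

```python
def rule_metrics(n_a, n_ab, n_b, n):
    """Confidence, lift, leverage and conviction of a rule A -> B from counts.

    n_a: cases with antecedent A; n_ab: cases with both A and B;
    n_b: cases with consequent B; n: total cases in the dataset.
    """
    conf = n_ab / n_a
    p_b = n_b / n
    lift = conf / p_b
    leverage = n_ab / n - (n_a / n) * p_b
    # Conviction with +1 in the denominator (assumed smoothing, avoids division by zero).
    conviction = n_a * (1 - p_b) / (n_a - n_ab + 1)
    return conf, lift, leverage, conviction


# Rule 1 of Table 8: 66 cases with A, 65 with A and B.
print([round(v, 2) for v in rule_metrics(66, 65, 1565, 2075)])  # [0.98, 1.31, 0.01, 8.11]
```

The same call with the fatality totals (n = 853) applies to Table 9, given the count of the consequent among the deceased.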
Table 9. Observed patterns and metrics of associations within the population of the deceased.
| Order | Antecedents (A) | Consequents (B) | Number of Cases (A) | Number of Cases (A ∪ B) | Conf. | Lift | Lev. | Conv. |
|---|---|---|---|---|---|---|---|---|
| 1. | AHG = Interval I, Age-group = 80+ | FOT = Residential buildings | 66 | 65 | 0.98 | 1.24 | 0.01 | 6.89 |
| 2. | CoF = Furnace hearth | FOT = Residential buildings | 97 | 95 | 0.98 | 1.24 | 0.02 | 6.75 |
| 3. | CoF = Cigarette butt, Gender = Male | FOT = Residential buildings | 96 | 94 | 0.98 | 1.24 | 0.02 | 6.68 |
| 4. | FOT = Open spaces/others | CoF = Open flame | 82 | 80 | 0.98 | 5.70 | 0.08 | 22.65 |
| 5. | CoF = Cigarette butt | FOT = Residential buildings | 116 | 113 | 0.97 | 1.23 | 0.02 | 6.05 |
| 6. | DoD = Weekday, CoF = Furnace hearth | FOT = Residential buildings | 75 | 73 | 0.97 | 1.23 | 0.02 | 5.22 |
| 7. | CoF = Heating devices | FOT = Residential buildings | 72 | 70 | 0.97 | 1.23 | 0.02 | 5.01 |
| 8. | FOT = Open spaces/others, Gender = Male | CoF = Open flame | 69 | 67 | 0.97 | 5.67 | 0.06 | 19.06 |
| 9. | DoD = Weekday, FOT = Open spaces/others | CoF = Open flame | 67 | 65 | 0.97 | 5.67 | 0.06 | 18.51 |
| 10. | CoF = Undetermined, Gender = Female | FOT = Residential buildings | 64 | 62 | 0.97 | 1.22 | 0.01 | 4.45 |
| | Average values: | | | | 0.97 | 2.57 | 0.03 | 10.13 |
Table 10. Metric characteristics of the obtained decision trees.
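| Tree Metric | Female | Male | Elderly (65+) |
|---|---|---|---|
| Instances | 815 | 2113 | 943 |
| Total nodes | 47 | 113 | 57 |
| Terminal nodes | 24 | 57 | 29 |
| Tree depth | 10 | 11 | 9 |
| Input variables | 7 | 7 | 7 |
| Estimate | 0.175 | 0.116 | 0.204 |
| Std. Error | 0.013 | 0.007 | 0.013 |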
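Two quick consistency checks on Table 10, each under a stated assumption: if every CART split is binary, a tree with L terminal nodes has 2L − 1 nodes in total; and if "Estimate" is a misclassification (risk) proportion p over n instances, its standard error should be close to the binomial value √(p(1 − p)/n). The names in the sketch are hypothetical.

```python
import math

# (instances n, terminal nodes L, total nodes, Estimate) from Table 10.
TREES = {
    "Female": (815, 24, 47, 0.175),
    "Male": (2113, 57, 113, 0.116),
    "Elderly (65+)": (943, 29, 57, 0.204),
}


def tree_checks(trees=TREES):
    """Per tree: binomial SE of Estimate (3 decimals) and whether nodes == 2L - 1."""
    out = {}
    for name, (n, leaves, nodes, est) in trees.items():
        se = math.sqrt(est * (1 - est) / n)  # SE of a proportion, assuming Estimate is one
        out[name] = (round(se, 3), nodes == 2 * leaves - 1)
    return out


for name, (se, binary) in tree_checks().items():
    print(f"{name}: SE = {se:.3f}, nodes == 2L - 1: {binary}")
```

Under these assumptions all three trees satisfy both checks, and the computed standard errors coincide with the Std. Error row.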
Table 11. Classification table for the resulting decision trees (“IFO” target variable and three different segmentations of the dataset).
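| Observed (rows) / Predicted (columns) | Female: Fatalities | Female: Injuries | Female: Total | Male: Fatalities | Male: Injuries | Male: Total | Elderly (65+): Fatalities | Elderly (65+): Injuries | Elderly (65+): Total |
|---|---|---|---|---|---|---|---|---|---|
| Fatalities | 281 | 24 | 305 | 480 | 68 | 548 | 466 | 36 | 502 |
| Injuries | 15 | 495 | 510 | 34 | 1531 | 1565 | 37 | 404 | 441 |
| Total | 296 | 519 | 815 | 514 | 1599 | 2113 | 503 | 440 | 943 |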
Table 12. Classification quality measures for observed fire causes and the three obtained decision trees.
| Measure | Female | Male | Elderly (65+) |
|---|---|---|---|
| Accuracy | 0.9521 | 0.9517 | 0.9226 |
| Precision | 0.9493 | 0.9339 | 0.9264 |
| Sensitivity | 0.9213 | 0.8759 | 0.9283 |
| Specificity | 0.9706 | 0.9783 | 0.9161 |
| F-measure | 0.9351 | 0.9040 | 0.9274 |
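The measures in Table 12 can be recomputed from the confusion matrices in Table 11 by treating fatalities as the positive class. With this convention the sketch below reproduces the tabulated values to four decimals; the dictionary and function names are illustrative.

```python
# Confusion matrices from Table 11, fatalities as the positive class:
# (TP, FN, FP, TN) = (fatal->fatal, fatal->injury, injury->fatal, injury->injury).
CONFUSION = {
    "Female": (281, 24, 15, 495),
    "Male": (480, 68, 34, 1531),
    "Elderly (65+)": (466, 36, 37, 404),
}


def quality_measures(tp, fn, fp, tn):
    """Standard binary classification measures from confusion-matrix counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # recall, true-positive rate
    return {
        "Accuracy": (tp + tn) / (tp + fn + fp + tn),
        "Precision": precision,
        "Sensitivity": sensitivity,
        "Specificity": tn / (tn + fp),  # true-negative rate
        "F-measure": 2 * precision * sensitivity / (precision + sensitivity),
    }


for tree, cm in CONFUSION.items():
    print(tree, {k: round(v, 4) for k, v in quality_measures(*cm).items()})
```

The F-measure computed this way equals 2TP/(2TP + FP + FN), the harmonic mean of precision and sensitivity.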
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Mitrović, N.; Stojanović, V.; Jovanović, M.; Grujčić, Ž.; Mladjan, D. Patterns of Human Injuries and Fatalities in Fire Incidents in Serbia: A Comprehensive Statistical and Data Mining Analysis. Fire 2026, 9, 146. https://doi.org/10.3390/fire9040146

AMA Style

Mitrović N, Stojanović V, Jovanović M, Grujčić Ž, Mladjan D. Patterns of Human Injuries and Fatalities in Fire Incidents in Serbia: A Comprehensive Statistical and Data Mining Analysis. Fire. 2026; 9(4):146. https://doi.org/10.3390/fire9040146

Chicago/Turabian Style

Mitrović, Nikola, Vladica Stojanović, Mihailo Jovanović, Željko Grujčić, and Dragan Mladjan. 2026. "Patterns of Human Injuries and Fatalities in Fire Incidents in Serbia: A Comprehensive Statistical and Data Mining Analysis" Fire 9, no. 4: 146. https://doi.org/10.3390/fire9040146

APA Style

Mitrović, N., Stojanović, V., Jovanović, M., Grujčić, Ž., & Mladjan, D. (2026). Patterns of Human Injuries and Fatalities in Fire Incidents in Serbia: A Comprehensive Statistical and Data Mining Analysis. Fire, 9(4), 146. https://doi.org/10.3390/fire9040146
