Fire Risk Assessment Models Using Statistical Machine Learning and Optimized Risk Indexing

Abstract: It is very difficult to accurately predict the occurrence of a fire, yet doing so is essential for protecting human life and property. We therefore study fire hazard prediction and evaluation methods to cope with fire risks. In this paper, we propose three models based on statistical machine learning and optimized risk indexing for fire risk assessment. We build logistic regression, deep neural network (DNN), and fire risk indexing models, and compare the performance of the proposed and traditional models using real investigated data related to fire occurrence in Korea. In general, fire prediction models currently in use do not provide satisfactory levels of accuracy, because the factors affecting fire occurrence are very diverse and fire occurrences are very sparse. To improve the accuracy of fire occurrence prediction, we first build logistic regression and DNN models. In addition, we construct a fire risk indexing model as a further improved model for fire prediction. To compare our models with the current fire prediction model, we use real fire data investigated in Korea between 2011 and 2017. The experimental results confirm that the prediction accuracy of the proposed method is superior to that of the existing fire occurrence prediction model. Therefore, we expect the proposed model to contribute to evaluating fire risk in buildings and factories in the field of fire insurance and to calculating fire insurance premiums.


Introduction
Fires have been devastating to human life and property, and people have made various efforts to deal with them. One such effort is predicting the possibility of fire. However, predicting fire occurrence is very difficult in practice [1], because many variables affect fire and fire occurrences make up only a very small share of the total data. That is, a fire occurrence data set is very sparse because most of its values are zeros. To address this problem, we study novel models to predict fire occurrence in this paper. Fire risk evaluation is a popular approach to fire prediction [1]. Rishickesh et al. (2019) studied a model to predict forest fires using various machine learning algorithms such as logistic regression, support vector machines, random forest, and boosting. They showed that logistic regression and gradient boosting performed better than the other algorithms, with accuracies of 0.6826 for logistic regression with principal component analysis (PCA) and 0.6838 for gradient boosting without PCA. All other algorithms had lower accuracy than these two. In general, a prediction accuracy of 0.6826 or 0.6838 is not satisfactory, so the performance of fire occurrence prediction models needs to be improved. We consider logistic regression as one of our candidate models for fire risk prediction. In our research, we apply the model not to forests but to factories and buildings, and try to increase the accuracy of fire prediction using newly proposed methods.
There are a number of approaches to fire risk evaluation, including fire risk indexing for buildings, factories, forests, etc. [2][3][4][5]. The fire risk index is an index made up of variables that describe fire hazards and prevention, and it is used for evaluating fire hazards. Madaio et al. (2016) developed the 'Firebird' framework for predicting fire risk and prioritizing fire inspections. The authors used support vector machines and random forest for fire risk prediction, and built an interactive map for prioritizing fire inspections. In the experimental results of the 'Firebird' system, random forest performed better than support vector machines. In both studies, by Rishickesh et al. (2019) and Madaio et al. (2016), the accuracy of the fire prediction models was not satisfactorily high, which confirms that it is difficult to predict fire occurrence accurately. Watts (2016) introduced fire risk indexing as another method for predicting fire risk. In his research, fire risk indexing is a heuristic model based on the knowledge and experience of fire experts for fire safety [4]. The fire risk index consists of factors (variables) representing the influences on fire risk, and it contributes to the quantification of fire risk. Therefore, we use fire risk indexing to model the hazard and prevention of fire in our paper. Sakennaite and Vaidogas (2010) compared fire risk indexing with fire risk analysis. The authors assessed fire safety by means of fire risk indexing, using various variables related to geometry and fire-specific data for building fire risk indexing and analysis [5]. Nikolopoulos et al. (2018) illustrated model performance for the prediction of post-fire debris flow occurrence. Using a contingency table, they evaluated the performance of three approaches: rainfall thresholds, logistic regression, and random forest [6]. They found that the random forest model had the best predictive performance [6].
From previous research on fire occurrence prediction, we confirmed that the analytical method providing the best prediction performance differs according to the specific prediction field. In our paper, we propose a novel fire risk assessment model using fire risk indexing and analysis. We apply statistical machine learning and optimized risk indexing models for fire risk assessment. The Korea Fire Protection Association (KFPA) conducts fire safety inspections and is sponsored by fire insurers in Korea [7]. KFPA uses a fire risk indexing method, known as the KFPA Fire Risk Index (KFRI), to evaluate the fire risk of buildings. To improve KFRI, KFPA has developed many models, including statistical machine learning and optimization of fire risk indexing. Therefore, we propose novel models to improve the performance of the KFRI for fire risk assessment and prediction. We organize this paper as follows. Section 2 introduces statistical machine learning for fire risk prediction. Section 3 proposes our statistical machine learning and optimized risk indexing models for fire risk assessment. Section 4 illustrates the performance of our proposed models for fire prediction and evaluation using real investigated data related to fire risk from the KFPA. Lastly, Section 5 presents our conclusions and future work.

Statistical Machine Learning
Statistics is defined as learning from data [8]. Machine learning makes machines (computers) intelligent by learning from data [9]. Statistical machine learning, then, applies statistics to machine learning. In general, statistics uses the concept of inference through estimation and hypothesis testing [10]. In addition, statistics commonly makes a normality assumption about the data [11]. Statistical machine learning therefore aims to improve the performance of existing machine learning by using the inference and normality assumptions of statistics [12]. Regression is a representative method of statistics [13], and deep learning is a popular machine learning approach [14]. In this paper, we use logistic regression from regression and deep neural networks from deep learning for statistical machine learning. Many methods are available for statistical machine learning, most of them for classification, prediction, and clustering. The aim of this paper is to study fire risk assessment using a predictive model. Therefore, we use statistical machine learning models for fire risk forecasting and assessment.

Fire Risk Assessment Models Using Statistical Machine Learning and Optimized Risk Indexing
In this paper, we construct a fire risk model using statistical machine learning and optimized risk indexing. The data related to fire risk consist of explanatory variables (X) affecting the occurrence of fire and response variable (Y), indicating the frequency of fire occurrence.

Statistical Machine Learning for Fire Occurrence Prediction
Since the response variable contains frequency data, we use count data analysis methods among statistical machine learning models [15]. Among discrete probability distributions, the Poisson and binomial distributions can be applied to analyze fire occurrence data. In the proposed study, we focus on modeling the possibility of fire, so we build a predictive model based on the binomial distribution. A random variable $Y$ follows a binomial distribution with parameters $n$ and $p$ when its probability function is given as follows [16]:

$$P(Y = y) = \binom{n}{y} p^{y} (1 - p)^{n - y}, \quad y = 0, 1, \ldots, n \tag{1}$$

where $n$ is the number of Bernoulli trials, and $p$ is the probability of success. The expectation $E(Y)$ and variance $Var(Y)$ of $Y$ are $np$ and $np(1 - p)$, respectively. Each $y$ has a binary value (1: fire occurred, 0: no fire occurred) representing whether a fire has occurred. So, we build a logistic regression model based on the binary response variable $Y$ to forecast fire risk. The logistic regression model is defined as follows [17]:

$$\log\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon \tag{2}$$

where $p$ is $P(Y = 1)$, the probability that a fire occurs, and $\log\frac{p}{1 - p}$ is the logit function of $p$. $(x_1, x_2, \ldots, x_k)$ are the explanatory variables that affect $Y$. In addition, the error term $\varepsilon$ follows a normal distribution with mean 0 and variance $\sigma^2$. From the logistic model of equation (2), we get the following model for predicting the probability of fire occurrence ($p$):

$$\hat{p} = \frac{1}{1 + \exp\left(-(b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k)\right)} \tag{3}$$

where $b_i$ represents the estimate of the regression parameter $\beta_i$ obtained by minimizing $\varepsilon$.
We consider another method of statistical machine learning: we apply deep neural networks (DNN) to fire risk assessment. The data set for the DNN is $(y, x_1, x_2, \ldots, x_k)$ [18]. As in the logistic regression model, $y$ is a response variable representing whether a fire occurred, and $(x_1, x_2, \ldots, x_k)$ are the explanatory variables affecting $Y$. Our DNN consists of input, hidden, and output layers. The input and output layers correspond to $X$ and $Y$, respectively. We design the size and structure of the hidden layers to improve model performance. Figure 1 shows the proposed DNN model for fire risk prediction. In Figure 1, $w$ is the vector of connecting weights from the input to the hidden layers, and $w^T x$, the linear combination of $x$ with $w$, is called the combination function. Similarly, $\theta^T h$ is the linear combination of $h$ with $\theta$, where $\theta$ is the vector of connecting weights from the hidden to the output layer. $f(\cdot)$ is a function that transforms $\theta^T h$ into a value between 0 and 1. This function is called the activation function, and we use the logistic growth function as follows [9,12]:

$$f(z) = \frac{1}{1 + e^{-z}} \tag{4}$$

We train the DNN model with a perceptron cost function that minimizes the difference between the predicted value $\hat{y}$ and the real value $y$.
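To make the two models above concrete, the following minimal Python sketch evaluates the logistic function, the fitted logistic regression probability, and one forward pass through a network with a single hidden layer. It is an illustration under stated assumptions, not the authors' implementation: the function names (`sigmoid`, `logit`, `predict_fire_probability`, `forward`) are hypothetical, bias terms are omitted in the network, and the hidden units are assumed to use the same logistic activation as the output.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function f(z) = 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p: float) -> float:
    """Logit function log(p / (1 - p)), the inverse of the logistic function."""
    return math.log(p / (1.0 - p))


def predict_fire_probability(b0: float, b: list, x: list) -> float:
    """Predicted P(Y = 1) from fitted coefficients b0, b_1..b_k and inputs x_1..x_k."""
    return sigmoid(b0 + sum(bi * xi for bi, xi in zip(b, x)))


def forward(x: list, W: list, theta: list) -> float:
    """One forward pass: h_j = f(w_j^T x) for each hidden unit, then y_hat = f(theta^T h).

    W holds one weight vector w_j per hidden unit (assumption: one hidden layer,
    no bias terms, logistic activation in the hidden layer as well).
    """
    h = [sigmoid(sum(wi * xi for wi, xi in zip(w_j, x))) for w_j in W]
    return sigmoid(sum(t * hj for t, hj in zip(theta, h)))


# Hypothetical coefficients and inputs, chosen only for illustration.
p_hat = predict_fire_probability(b0=-2.0, b=[1.0], x=[2.0])
y_hat = forward(x=[1.0, 2.0], W=[[0.3, -0.1], [0.2, 0.4]], theta=[0.5, -0.5])
```

Because the output passes through the logistic function, `y_hat` always lies strictly between 0 and 1 and can be read as a fire occurrence probability.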

New Fire Risk Indexing for Fire Risk Assessment
Lastly, we carry out an optimized risk indexing. In this paper, we use various variables (components) related to fire occurrence. Table 1 illustrates each component, module with components, and category with modules for the fire risk modeling.
Some components such as years, floors, size, and distance from the fire brigade are generated objectively, but most other components are made rather subjectively. Therefore, we perform the proposed methods for fire risk assessment using the variables in Table 1. We illustrate our proposed procedure for fire risk assessment in Figure 2.
To assess fire risk for each building and factory, we use fire occurrence data provided by the KFPA. This data set has a large number of variables related to fire occurrence. In this paper, we consider two approaches to fire risk assessment. First, we use statistical machine learning and DNN for fire occurrence prediction. Next, we build a risk index of fire occurrence, which we call New KFRI (NKFRI). We use prediction accuracy and lift value as performance measures for statistical machine learning and fire risk indexing, respectively. Using the results, we carry out fire risk assessment.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 5 of 11
Table 1. Components, modules and categories of KFRI for manufacturing facilities.

Category | Module | Components
Hazards | Ignition hazards (I) | Fire facility (i1); Gas facility (i2); Hazardous material facility (i3); Electricity facility (i4)
... | Fire protection equipment and system (F) | ...; Total flooding gas system (F5)
... | Fire alarm system (A), A = A1 × A2 × A3 | Fire detection system (A1); Emergency alarm system (A2); Emergency notifying system (A3)
... | Passive fire protection system (V), V = V1 × V2 | Fire compartment (V1); Evacuation system (V2)
... | Smoke control system and auxiliary equipment required for fire brigade (S), S = S1 × S2 | Smoke control system (S1); Auxiliary equipment required for fire brigade (S2)
In this paper, we carried out logistic regression analysis and DNN learning using 18 explanatory variables and one response variable. We divided the entire data into training and test data sets, randomly extracting 70% of the data for training and the remaining 30% for testing. After various trials and errors, we finally designed our DNN architecture as follows: hidden layers, 3; hidden nodes in each hidden layer, 100; activation function, sigmoid; learning rate, 0.8; learning momentum, 0.5; epochs, 100; and weight decay, 0.000001. We used the R language and its packages for the fire data analysis [19]. Using the training data set, we constructed the forecasting models, and we used the test data set to evaluate their performance. Table 2 shows the comparison of prediction accuracy between the models. KFRI is the predictive model currently in use, and its accuracy is that obtained when the KFRI is used to predict whether a fire has occurred. The results again show that fire prediction is a very difficult task. The prediction accuracy of logistic regression was higher than that of KFRI, but the performance improvement was not significant. Finally, the accuracy of fire prediction of the DNN model is significantly higher than that of KFRI or logistic regression.
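The evaluation protocol described above (a random 70%/30% split, model fitting on the training part, accuracy on the test part) can be sketched as follows. This is a hedged illustration only: the authors worked in R with packages that are not named here, whereas this sketch uses plain Python, fits the logistic regression by batch gradient descent on the log-loss, and runs on synthetic two-variable data rather than the KFPA data. All function names, the seeds, the learning rate, and the 0.5 decision threshold are assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_test_split(rows, train_frac=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def predict(b0, b, x):
    """Predicted probability of fire for one observation."""
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))


def fit_logistic(rows, lr=0.5, epochs=200):
    """Fit intercept b0 and coefficients b by batch gradient descent on the log-loss."""
    k = len(rows[0][0])
    n = len(rows)
    b0, b = 0.0, [0.0] * k
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * k
        for x, y in rows:
            err = predict(b0, b, x) - y  # gradient of the log-loss w.r.t. the linear term
            g0 += err
            for j in range(k):
                g[j] += err * x[j]
        b0 -= lr * g0 / n
        b = [bj - lr * gj / n for bj, gj in zip(b, g)]
    return b0, b


def accuracy(rows, b0, b, threshold=0.5):
    """Share of observations whose thresholded prediction matches the label."""
    hits = sum(int((predict(b0, b, x) >= threshold) == (y == 1)) for x, y in rows)
    return hits / len(rows)


# Synthetic stand-in data: two explanatory variables, label 1 when x1 + x2 > 0.
rng = random.Random(42)
data = []
for _ in range(200):
    x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    data.append((x, 1 if x[0] + x[1] > 0 else 0))

train, test = train_test_split(data, train_frac=0.7, seed=0)
b0, b = fit_logistic(train)
test_acc = accuracy(test, b0, b)
```

Real fire data is far sparser than this balanced toy set, so accuracy alone can look high for a model that rarely predicts a fire; the sketch only demonstrates the mechanics of the split and the accuracy measure.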
For further performance improvement, we considered fire risk indexing. The fire risk indexing has been recommended and used as a rapid assessment to evaluate the fire risk of alternative concepts for large buildings. KFPA also uses this fire risk indexing method known as one of the KFRI models.
In Table 1, we show the components (variables) for fire risk indexing and the calculation formulas of KFRI for manufacturing facilities. The hazard components of KFRI are placed in the numerator and the countermeasure components are placed in the denominator to reflect the risk of a building. We calculate the KFRI by equation (6):

$$\text{KFRI} = \frac{B \times I \times P}{M \times F \times A \times V \times S \times G} \tag{6}$$

where
B = Basic (intrinsic) hazards such as number of floors, structure, size, fire load, etc.
I = Ignition hazards due to fire, gas, and electrical facilities, and hazardous materials
P = Process hazards, applied only to factories, consisting of basic process, hazardous material treating process, hot work process, etc.
M = Building safety management based on fire drill, education, hot work, smoking control, etc.
F = Fire protection equipment and system such as fire extinguisher, fire sprinkler, standpipe, etc.
A = Fire alarm system such as fire detecting system, notifying system, etc.
V = Fire compartment and evacuation system
S = Smoke control system and auxiliary equipment required for fire brigade
G = Public fire service
The higher the risk, the higher the value of KFRI. Each component of KFRI is calculated by combining its weight and the inspection results of KFPA surveyors. The weights of each component were originally decided by the analytical hierarchy process (AHP) with experienced KFPA field surveyors. After collecting enough inspection and fire incident data over many years, we tried various approaches to improve its performance. New KFRI (NKFRI) was considered in order to compare the risks among factories with a total area of more than 3,000 m² and other specific buildings designated by Korean law [20].
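The index structure described above, with hazards in the numerator and countermeasures in the denominator, can be illustrated with a short Python sketch. It assumes the modules combine multiplicatively, which is consistent with the module formulas in Table 1 (for example A = A1 × A2 × A3) but should be read as an assumption here. The function name `kfri` and all scores are hypothetical; real module scores come from weights and KFPA inspection results.

```python
from math import prod

# Module symbols as defined in the text: hazards go in the numerator,
# countermeasures in the denominator.
HAZARDS = ("B", "I", "P")
COUNTERMEASURES = ("M", "F", "A", "V", "S", "G")


def kfri(scores: dict) -> float:
    """Ratio of the product of hazard modules to the product of countermeasure modules.

    Assumption: modules combine by multiplication, mirroring the module
    formulas in Table 1.
    """
    numerator = prod(scores[name] for name in HAZARDS)
    denominator = prod(scores[name] for name in COUNTERMEASURES)
    if denominator == 0:
        raise ValueError("countermeasure scores must be non-zero")
    return numerator / denominator


# Hypothetical scores: a neutral building where every module equals 1.0.
neutral = {name: 1.0 for name in HAZARDS + COUNTERMEASURES}

# A module score built from its components, as in A = A1 * A2 * A3.
alarm_module = prod([1.0, 0.5, 2.0])

higher_hazard = dict(neutral, B=2.0)      # more basic hazard raises the index
better_protection = dict(neutral, F=2.0)  # stronger protection lowers the index
```

Under this reading, doubling a hazard module doubles the index, while doubling a countermeasure module halves it, which matches the statement that a higher KFRI means higher risk.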
In the NKFRI model, we performed fire risk indexing for factories. NKFRI introduces the concepts of likelihood and severity of fire more specifically by rearranging variables and then assigning an optimal weight to each component based on its deviation. One of the main reasons for developing NKFRI is to make it easier to compare risk among factories that have similar processes. To evaluate the relative standing of a value within a group, the "PERCENTRANK" function of Excel 2016 was used.
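For readers working outside a spreadsheet, the relative standing of a value within a group can be sketched in Python as below. The function `percent_rank` is a hypothetical helper intended to mirror an inclusive percent rank for values that are present in the data: the number of smaller values divided by n − 1. Excel additionally interpolates for values that are absent from the data and truncates the result to three significant digits by default; both behaviours are omitted here, and the handling of tied values is an assumption.

```python
def percent_rank(values, x):
    """Inclusive percent rank of x within values, for x present in values.

    Returns (number of values strictly smaller than x) / (n - 1), so the
    smallest value maps to 0.0 and the largest to 1.0.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to rank")
    if x not in values:
        raise ValueError("interpolation for absent values is not implemented")
    smaller = sum(1 for v in values if v < x)
    return smaller / (n - 1)


# Hypothetical index values for a group of ten factories.
group = [13, 12, 11, 8, 4, 3, 2, 1, 1, 1]
ranks = {x: percent_rank(group, x) for x in sorted(set(group))}
```

Ranking within a group of similar factories makes the index values comparable even when their raw scales differ between groups.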
After calculating NKFRI, the "PERCENTRANK" function is applied again to finally compare the relative risk within a group. Table 3 shows the variable lists applied when evaluating fire risk.
Table 3. Application range of variables.

Application Range Variables
All
To keep the independence of data, the data was separated into a model construction set and a model verification set. The fire data from 2011-2017 was used to find the optimized weights of each factor, which were then verified using the fire data from 2018-2019. Table 4 shows the weight change for the likelihood index of modules. The components and modules of the likelihood and severity indexes were selected by experienced KFPA surveyors from the fire engineering point of view. In this paper, we tried to optimize the weights of each component for the likelihood index.
Inspection data was combined with fire incident data and then classified according to whether a fire had happened. To optimize the weights of each component, we used the concept of deviation, based on the mean value of the likelihood index with and without a fire incident. Using a tailor-made program, the deviation was calculated while changing the weight of each component within a designated range, in order to find the weights that maximize the deviation of the likelihood index, which consists of the fire-occurrence-related components of KFRI. The deviation is defined in terms of:
X = the mean value of the fire frequency-related KFRI for buildings with fire
Y = the mean value of the fire frequency-related KFRI for buildings without fire
The performance of the optimized weights of each component was evaluated using the lift value defined below:

$$\text{Lift} = \frac{\text{Top 10\% lift}}{\text{Baseline lift}}$$

where
Baseline lift: ratio of the number of fires included in the overall data before building the model.
Top 10% lift: ratio of fires in the top 10% of the data sorted in descending order of the fire frequency-related KFRI.
The lift value is non-negative, and the larger it is, the better the performance of the model. Table 5 illustrates the lift values of the likelihood index of the KFRI and proposed NKFRI models. To compare likelihood performance, identical components were used for both models, and for KFRI the original weights decided by the AHP were used. The weights obtained using the 2011-2017 data were applied to the 2018-2019 data, and the likelihood index was then recalculated for each model. Comparing the lift value using the original weights (KFRI) with that using the optimized weights (NKFRI) confirmed that the proposed NKFRI improved the lift value by 41.01%. This verifies the improved performance of our proposed work, and this paper thereby contributes to fire risk assessment for various buildings and factories.
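The lift value described above can be computed as in the following Python sketch. It reads "ratio of fires" as the share of buildings that had a fire, which is an interpretation rather than something stated explicitly, and the function name `top_decile_lift` and the toy data are hypothetical.

```python
def top_decile_lift(scores, fired, top_frac=0.10):
    """Fire rate among the highest-scoring buildings divided by the overall fire rate.

    scores: index value per building (higher means riskier).
    fired:  1 if the building had a fire, else 0 (same order as scores).
    """
    n = len(scores)
    baseline = sum(fired) / n
    if baseline == 0:
        raise ValueError("lift is undefined when no fires are recorded")
    # Sort building positions by descending index value and keep the top share.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    k = max(1, round(n * top_frac))
    top_rate = sum(fired[i] for i in order[:k]) / k
    return top_rate / baseline


# Toy data: 20 buildings, 4 fires, two of them among the two highest scores.
scores = list(range(20))
fired = [1 if i in (0, 5, 18, 19) else 0 for i in range(20)]
lift = top_decile_lift(scores, fired)

# An uninformative index that ranks no fires in its top 10% has lift 0.
fired_low = [1 if i in (0, 1, 2, 3) else 0 for i in range(20)]
lift_low = top_decile_lift(scores, fired_low)
```

In the toy data the top 10% (two buildings) both had fires, giving a top rate of 1.0 against a baseline of 0.2. A lift of 1 would mean the index ranks buildings no better than the baseline rate.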

Discussion
The goal of this paper was to predict the occurrence of fire accurately. In this paper, we considered logistic regression, DNN, and optimized risk indexing for fire forecasting. From our experimental results, we found that the prediction accuracy of DNN is better than others such as KFRI or logistic regression. The accuracy of the prediction of the DNN model is 0.7514, which is an improved result compared to the prediction accuracy by the previous study of Rishickesh et al. (2019), 0.6838.
In addition, we proposed an optimized risk indexing for fire prediction and obtained a lift value of 2.1421 for this index. This means that the ratio of fires among the top 10% of buildings ranked by the index is 2.1421 times the baseline ratio in the overall data, that is, the ratio before building the model. Therefore, using the proposed fire risk indexing with a lift value of 2.1421, we can predict fire occurrence efficiently for fire risk management.

Conclusions
In this paper, we proposed models for fire risk assessment using statistical machine learning and optimized risk indexing. In general, predicting fire risk is very difficult because fires occur very rarely and the causes that affect fire occurrence are very diverse. So, we need more advanced models for fire risk prediction and assessment. Currently, the KFRI is widely used in practice in Korea to evaluate fire risk and then decide the discount rate of the fire insurance premium. In this paper, we compared our proposed models with the KFRI and previous research. For fire risk forecasting and management, we considered and constructed three models: logistic regression, DNN, and NKFRI (new fire risk index). The logistic regression analysis and the DNN learning algorithm are based on statistical machine learning, while the NKFRI is an optimized risk indexing model that extends KFRI. In the experimental results, we found that the DNN and NKFRI provide better performance than the traditional KFRI or previous research. However, despite its better performance, it is difficult to actually apply machine learning to sensitive areas such as insurance premium decisions, because it is hard to explain the results clearly.
Therefore, in our future work, we will give priority to solving this problem. To increase the explanatory power of the model, we will apply probability theory and statistical inference to fire prediction models based on machine learning and fire risk indexing. Because of the characteristics of fire occurrence, it is very difficult to accurately predict fire occurrence and risk. Although the accuracy of fire prediction by our proposed model improved on existing research results, it did not reach the highest level. So, additional research strategies are needed to develop a model that can improve the performance of fire prediction. In this paper, we considered a DNN with large hidden layers and many nodes and obtained a prediction accuracy of 0.7514. We will consider convolutional neural networks, which use the convolution operation, to raise fire prediction accuracy above 0.7514. One of the advantages of our models is that they are easy to combine with other information. In this big data era, more and more sources of information related to fire are likely to be opened to the public. The proposed process also makes it easy to find the optimal weights, whereas the AHP process requires considerable time and collaboration among engineers to obtain proper weights for each component. In addition, we will consider more advanced models such as Bayesian deep learning for improved fire risk prediction.
This paper contributes to the real domains related to fire risk assessment such as calculation and imposition of fire insurance premiums. Traditionally, the calculation of fire hazard ratings for buildings and factories relies on the subjective knowledge of a group of fire experts, but the results of our study have enabled this work to be quantifiable and objective. In our research, we focused on the fire risk forecasting and assessment for buildings and factories. We also expect that the application of this paper can be extended to fire risk related to forests and other domains.