Applying Bayesian Models to Reduce Computational Requirements of Wildfire Sensitivity Analyses

Abstract: Scenario analysis and improved decision-making for wildfires often require a large number of simulations to be run on state-of-the-art modeling systems, which can be both computationally expensive and time-consuming. In this paper, we propose using a Bayesian model for estimating the impacts of wildfires using observations and prior expert information. This approach allows us to benefit from rich datasets of observations and expert knowledge on fire impacts, and to investigate the influence of different priors in order to determine the best model. Additionally, we use the values predicted by the model to assess the sensitivity of each input factor, which can help identify conditions contributing to dangerous wildfires and enable fire scenario analysis in a timely manner. Our results demonstrate that using a Bayesian model can reduce the resources and time required by current wildfire modeling systems by up to a factor of two while still providing a close approximation to true results.


Introduction
Each year, natural disasters such as wildfires cause harm to people and significant destruction of physical infrastructure. To better understand these phenomena and reduce their impact, various natural hazard models have been developed over the years [1,2]. The effectiveness of dynamic systems based on these models often depends on how quickly they can predict the unfolding of events. Currently, several simulation models, such as Firemap [3], SiroFire [4], Prolif [5], Farsite [6], Pyrocart [7], Firemaster [8], FireStation [9], Prometheus [10], Spark [11], and Phoenix [12], are used to predict the spread of wildfires across a landscape based on pre-existing fire spread models. Deriving accurate risk metrics often requires a large number of fire simulations, and the inputs to these wildfire spread models have associated uncertainties that can influence the resulting rate of spread and, consequently, the area burned by the fire. Quantifying the sensitivity of the resulting output to these parameters is useful for worst-case scenario analysis in operational risk management [13]. However, such scenario analysis for effective wildfire management requires additional simulations to be run under various combinations of input values, which can be both computationally expensive and time-consuming [14].
Several methods and tools are available for conducting sensitivity analysis in the environmental and wildfire context. These include variance-based approaches such as Sobol's method [15], Cukier's method [16], and Saltelli's method [17]; density-based methods such as Krzykacz's method [18], Plischke's method [19], and Pianosi's method [20]; neural network methods [21,22]; Taylor series expansion [23]; and polynomial chaos expansion [24]. Additionally, several tools such as the Monte Carlo Analysis Toolbox (MCAT) [25], Eikos [26], MATLODE [27], SAFE (Sensitivity Analysis for Everybody) [28], SALib [29], and OpenTURNS (Open source Treatment of Uncertainty, Risk 'N Statistics) [30] have been developed for sensitivity analysis. However, these tools can only be used once the sets of input and output values are available. While these methods and tools are useful for conducting sensitivity analyses of fire simulations, they do not consider the computational requirements of running the fire simulations under a large number of input combinations, as required by these methods and tools. As a result, using a conventional wildfire management system to run any of these methods or tools for scenario analysis in emergency planning and management may be prohibitively time-consuming, hindering the ability to make better-informed decisions to minimize the extent of damage caused by the disaster. In this work, we investigate whether Bayesian models can reduce the computational requirements of such analyses while maintaining a close approximation to true results and assessing wildfire impacts.
The Bayesian approach is a widely used method in the literature for constructing models to explain various phenomena. It involves fitting a probabilistic model to a given set of data to summarize and predict new observations. In the wildfire domain, the Bayesian approach has been used in several applications: prediction of the likelihood of large fires [31,32], projection of wildfire activities [32,33], estimation of fire suppression costs and resource allocation [34,35], estimation of the size of extreme fires [36][37][38], wildfire risk assessment [39][40][41][42], and prediction/modeling of wildfire behavior [43][44][45][46][47]. One advantage of using Bayesian models in wildfire applications is that they allow prior knowledge to be combined with observation data sets of any size to make approximate estimations. Wildfire management practices that require a large number of simulations, such as scenario analysis, can therefore benefit from a Bayesian model to reduce computational costs while still providing realistic predictions.
The main objective of this paper is to investigate the application of a Bayesian model for combining prior knowledge and data to make predictions for wildfire management practices, such as scenario analysis. These predictions will help to significantly reduce the computational requirements of such analyses as only a fraction of the entire input combinations would now be required to be run using the fire simulation tool. The proposed Bayesian model is used to estimate the impacts of wildfires at a location based on three major meteorological inputs: temperature, relative humidity, and wind speed collected from weather stations. The choice of these parameters is based on the experimental setup of previous works [48,49], and the goal of this investigation is to determine the potential time and resource savings that can be achieved through the use of a Bayesian model in wildfire modeling systems.
We build two Bayesian models for comparison, one with and one without a latent effect. The latent effect is investigated to determine whether it improves the accuracy of the model; if it does not, it can be omitted for the sake of simplicity. Additionally, we evaluate the performance of the models under different priors for the hyperparameters associated with the input parameters to the fire simulations. We then apply the best-performing Bayesian model to estimate the impact of a wildfire (fire size) based on the available data and prior information.
The predicted values, along with the data, are used to estimate the sensitivity of the fire area to the input parameters, which can enable scenario analysis in wildfire modeling systems.
The remainder of the paper is organized as follows. In Section 2, we provide a detailed overview of the workflow for this study. In Section 3, we explain the experimental setup for the study, while in Section 4, we present the results and discuss our findings. Finally, in Section 5, we offer conclusions and suggestions for future work.

Figure 1 illustrates the overall workflow of the study, which investigates the reduction of computational requirements for sensitivity analysis in wildfire management through the use of a Bayesian model. The following section outlines the Bayesian model fitting process and then describes the experimental setup for its application.

Bayesian Model Fitting
For Bayesian model fitting, we first investigated the influence of latent effects in the model before identifying the best prior for the precision parameter in the model.

Influence of Latent Process
In our Bayesian model fitting, the joint posterior distribution of the parameters $\theta$ is conditional on the observed data $y$, and hence we write the posterior distribution as follows.
$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \tag{1}$$
where $p(y \mid \theta)$ is the likelihood of the data given all the parameters $\theta$, $p(\theta)$ is the prior distribution of the parameters, and $p(y)$ is the marginal likelihood. $p(y)$ is a normalizing constant and is mathematically equal to:
$$p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta \tag{2}$$
The dimension of the posterior distribution $p(\theta \mid y)$ usually depends on the parameter dimension, i.e., on $\theta$, and can be obtained using joint posteriors as follows.
$$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta) \tag{3}$$
In other words, the posterior distribution of the parameters can be estimated by scaling the product of the likelihood and the prior.
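The scaling described above can be illustrated numerically. The following is a minimal sketch, not the paper's model: it assumes a single scalar parameter (a mean) with a normal likelihood and a normal prior, evaluates likelihood times prior on a grid, and rescales so the values sum to one. The function names, the observations, and all numeric settings are hypothetical.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution (used for both likelihood and prior)."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def grid_posterior(y, grid, prior_mean, prior_sd, noise_sd):
    """Posterior weights on a grid: likelihood x prior, rescaled to sum to one."""
    log_post = []
    for theta in grid:
        log_lik = sum(normal_logpdf(obs, theta, noise_sd) for obs in y)
        log_prior = normal_logpdf(theta, prior_mean, prior_sd)
        log_post.append(log_lik + log_prior)
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    peak = max(log_post)
    unnormalized = [math.exp(v - peak) for v in log_post]
    total = sum(unnormalized)  # plays the role of the normalizing constant p(y)
    return [u / total for u in unnormalized]


# Hypothetical observations and a weakly informative prior.
y = [9.8, 10.4, 10.1, 9.9]
grid = [8.0 + 0.01 * i for i in range(401)]  # candidate values from 8 to 12
post = grid_posterior(y, grid, prior_mean=0.0, prior_sd=100.0, noise_sd=0.5)
post_mean = sum(t * p for t, p in zip(grid, post))
print(round(post_mean, 3))
```

With such a weak prior, the posterior mean sits close to the sample mean of the observations, which is the behavior one would expect when the data are left to drive the posterior.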
We considered the Bayesian model shown in Equation (4).
$$y \sim N(\mu, \sigma^2 I_n) \tag{4}$$
where $y$ is the fire area in hectares, which follows a normal distribution with latent mean $\mu$ and variance $\sigma^2$. Here, $I_n$ is the identity matrix of order $n$. For each fire area, we use the Spark input data, i.e., the meteorological information, to identify the latent process $\mu$, as given in Equation (5).
We define the term $\Sigma$ as the variance-covariance matrix of the correlated fire areas. Note that, for simplicity, we use a fixed correlation structure $S$ for the fire areas; thus we write $\Sigma = \sigma_s^2 S$. This leads us to place a prior distribution on $\sigma_s^2$ instead of on the whole correlation structure $\Sigma$.
To relate the model parameters in Equations (4) and (5) to the posterior distribution in Equation (3), we define $\theta = (\mu, \sigma^2, \Sigma)$, and the log of the joint posterior distribution of the model parameters is given by Equation (6), where $\Delta = \beta_0 1_n - \sum_{j=1}^{3} \beta_j x_j$, and $P(\theta)$ is the joint prior distribution of the model parameters $\beta_j$, $j = 0, 1, 2, 3$, $\sigma_s^2$, and $\sigma^2$. Under the INLA structure, we approximate the joint posterior distribution in Equation (6) and consider the posterior marginals. Hence, we write the marginals for the latent process $\mu$ as follows.
$$P(\mu_j \mid y) = \int P(\mu_j \mid \sigma^2, y)\, \pi(\sigma^2)\, d\sigma^2 \tag{7}$$
In addition to the Bayesian model that takes latent correlated effects of fire areas into account, we also consider a simpler version of the model that does not consider these effects.

For Bayesian model updating, we use the Integrated Nested Laplace Approximation (INLA) [50] framework. INLA is a newer approach for computing Bayesian models that is less computationally expensive than popular Markov chain Monte Carlo (MCMC) methods, such as Gibbs sampling [51], while providing similar solutions for the posterior distributions of the parameters of interest.
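The integral in Equation (7) can be approximated by a weighted sum over a small grid of variance values, which is the general idea behind integrating out hyperparameters numerically. The sketch below is an illustration under simplifying assumptions, not the INLA algorithm: it uses a single scalar latent mean with a conjugate normal prior, so the conditional posterior given $\sigma^2$ is available in closed form, and it takes the grid weights as given. The function names, grid values, and weights are hypothetical.

```python
def conditional_posterior(y, sigma2, prior_mean, prior_var):
    """Normal posterior of a scalar mean given a fixed noise variance sigma2."""
    n = len(y)
    precision = 1.0 / prior_var + n / sigma2
    mean = (prior_mean / prior_var + sum(y) / sigma2) / precision
    return mean, 1.0 / precision


def marginal_moments(y, sigma2_grid, weights, prior_mean, prior_var):
    """Mean and variance of the mixture sum_k w_k * P(mu | sigma2_k, y)."""
    total = sum(weights)
    w = [wk / total for wk in weights]  # normalize the integration weights
    conds = [conditional_posterior(y, s2, prior_mean, prior_var) for s2 in sigma2_grid]
    mean = sum(wk * m for wk, (m, _) in zip(w, conds))
    second_moment = sum(wk * (v + m * m) for wk, (m, v) in zip(w, conds))
    return mean, second_moment - mean * mean


# Hypothetical data, two candidate variances, and equal weights.
y = [1.0, 2.0, 3.0]
mix_mean, mix_var = marginal_moments(
    y, sigma2_grid=[1.0, 4.0], weights=[0.5, 0.5], prior_mean=0.0, prior_var=1e6
)
print(round(mix_mean, 3), round(mix_var, 3))
```

The marginal variance is wider than the conditional variance at the smaller $\sigma^2$ value because uncertainty about the variance itself is carried through to the latent mean.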

Sensitivity to the Priors
To analyze the sensitivity of the models to the priors, we considered six different prior distributions for the parameters in the model. In this paper, we present the sensitivity to the priors only for the precision parameters, as the distribution of the fire area is more sensitive in the tails. The priors we consider are the half-Cauchy, half-t, log-gamma, half-normal, penalized complexity (PC log-gamma), and improper uniform priors, as listed in Table 1. The parameter values of the priors were chosen to make the priors weakly informative and to let the observation data set drive the posteriors. We included the penalized complexity prior in our sensitivity analysis to investigate whether priors based on probability statements about the parameters (PC priors [52]) result in better model performance than default priors. We then analyzed the sensitivity by examining the posterior marginals of the three weather input parameters: temperature, relative humidity, and wind speed.

Evaluation Metrics
To compare the models with and without latent effects and different priors, we use several metrics: the marginal likelihood (MLK) [53], Deviance Information Criterion (DIC) [54], Watanabe-Akaike Information Criterion (WAIC) [55], and Conditional Predictive Ordinates (CPO) [56]. The MLK is the probability of the observed data values in the fitted model and can be used to estimate posterior probabilities in the model. DIC and WAIC are model performance criteria that measure the complexity of the model by considering the goodness of fit and penalty term, along with the effective number of parameters. CPO is the posterior probability of observing a value when the model is fitted using all the data except the observation in question. In this paper, we use the CPOs of all the observations, transformed through a log transformation, as shown in Equation (8).
where n is the total number of observations.
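To make the CPO and WAIC definitions concrete, the sketch below estimates both from pointwise likelihood values evaluated at posterior draws. This is a sampling-based stand-in chosen for illustration; INLA computes these quantities from its own approximations rather than from draws. The mean of the log-CPO values is used here as one common summary, and the sign and scaling of the paper's Equation (8) may differ. All function names and numbers are hypothetical.

```python
import math
import statistics


def cpo_from_draws(lik):
    """CPO_i as the harmonic mean over draws; lik[s][i] = p(y_i | theta_s)."""
    n_draws, n_obs = len(lik), len(lik[0])
    return [
        n_draws / sum(1.0 / lik[s][i] for s in range(n_draws))
        for i in range(n_obs)
    ]


def mean_log_cpo(lik):
    """Average of log(CPO_i) over the n observations (an assumed summary)."""
    cpo = cpo_from_draws(lik)
    return sum(math.log(c) for c in cpo) / len(cpo)


def waic(lik):
    """WAIC = -2 * (lppd - p_waic); needs at least two posterior draws."""
    n_draws, n_obs = len(lik), len(lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        column = [lik[s][i] for s in range(n_draws)]
        lppd += math.log(sum(column) / n_draws)
        # Effective number of parameters: variance of the log-likelihood over draws.
        p_waic += statistics.variance([math.log(v) for v in column])
    return -2.0 * (lppd - p_waic)


# Hypothetical likelihood values: 3 posterior draws, 2 observations.
lik = [[0.30, 0.10], [0.35, 0.12], [0.28, 0.09]]
print(round(mean_log_cpo(lik), 3), round(waic(lik), 3))
```

Because the harmonic mean never exceeds the arithmetic mean, each CPO value is at most the corresponding in-sample predictive density, which reflects the leave-one-out nature of the measure.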

Fire Simulation Tool-Spark
We used Spark [11] to simulate and predict the spread of wildfires under different conditions and fuel types. It provides a flexible platform for simulating wildfire behavior in various vegetation types by allowing the integration of different packages and models, such as wind field generation and topographic correction, fire ignition models, fire-line interactions, firebrand transport, and fire transmission models. The simulations require input data on fire behavior, land classification, fuel load, topography, and weather to produce output metrics such as total burned area, fire intensity, and the number of urban cells burned. In addition to predicting fire progression, Spark can also predict firebrand dynamics and risk metrics for fire severity and impact. It can model firebreaks, spot fires, and the coalescence of different parts of the fire over time and is able to run simulations for multiple fire perimeters simultaneously. The calculations in Spark are parallelized using the OpenCL framework. More information on Spark can be found in [57].

Weather Inputs
For this study, we selected three weather inputs: temperature, relative humidity, and wind speed, based on the experimental setup of previous research [14,48]. The ranges and distributions for these inputs, as provided in Table 2, follow the same experimental design. These values were chosen to cover operational weather conditions for wildfire modeling in the Australian context and can easily be modified as needed. Wildfires tend to grow more aggressively under conditions characterized by high wind speed, high temperature, and low relative humidity. The other static inputs for fire simulations were taken from configurations and records maintained by the Tasmanian government and the Tasmania Fire Service (TFS) [58]. All simulations were run for five hours, and the cumulative burned area during this period was reported as an output.

Wildfire Management Practice Use Case-Scenario Analysis
We created a wildfire management use case of scenario analysis to identify the worst conditions for aggressive wildfires. The scenario analysis is enabled by the results of the sensitivity analysis, which quantifies the relative influence of each input parameter on fire simulations. The sensitivity analysis uses two sets of data: one with true observations and the other with Bayesian model-predicted values. The study area and evaluation metrics for the application of the Bayesian model in the use case are discussed further below.

Study Area
Tasmania was chosen as the study area for the use case due to the frequent occurrence of wildfires in the region, the availability of high-quality land datasets that can be used in operational wildfire simulation tools, and the systematic grid configuration of fire start locations, which has been well-studied [14,59]. During the 2018-2019 wildfire season, Tasmania experienced 841 wildfires that burned 310,311 hectares of forest [59]. The Tasmania Fire Service has established a grid of 68,048 potential fire start locations at a distance of 1 km, regardless of the type of land, with locations on water bodies shifted to the nearest land location. The model fitting is based on a dataset of fire simulations run at a single location, considering different combinations of the input parameters. Tasmania also has a detailed, high-resolution dataset of simulations, as maintained in one of our previous works [60].

Sensitivity Analysis
We estimated the sensitivity indices of the input parameters in the fire simulation for various numbers of true observations (sample sizes) used in the Bayesian models. The choice of the priors for the precision in the model was based on our initial findings, and this choice was complemented with varied sample sizes to predict the values of fire area for combinations of input parameters. For example, for a sample size of 4000, 4000 random true observations were used to construct a fitted Bayesian model, which then predicted the fire areas for the next 4000 sampled combinations of input parameters. The means of the predicted values were considered for the estimation of the sensitivity indices.
For the sensitivity analysis of fire size to input weather parameters, 8000 samples for different weather input combinations within their ranges were generated using Saltelli's sampling method [61]. The sensitivity indices were estimated with a variance-based sensitivity analysis (SA) method (Sobol analysis [15]) using the Python framework SALib [29]. The choice of the sampling method aligns with the method for estimating the sensitivity indices.
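The variance-based indices described above can be sketched without SALib to show what is being estimated. The code below is an assumption-laden illustration: it uses plain pseudo-random sampling instead of the quasi-random Sobol sequence behind Saltelli's scheme, a first-order estimator in the style of Saltelli and a total-order estimator in the style of Jansen, and a toy linear function in place of the Spark simulator. The function `sobol_indices`, the toy model, and the bounds are hypothetical.

```python
import random


def sobol_indices(model, bounds, n_base, seed=0):
    """First-order and total-order Sobol indices using n_base * (d + 2) model runs."""
    rng = random.Random(seed)
    d = len(bounds)

    def draw():
        return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_base)]

    A, B = draw(), draw()  # two independent sample matrices
    fA = [model(x) for x in A]
    fB = [model(x) for x in B]
    both = fA + fB
    mean = sum(both) / len(both)
    var = sum((v - mean) ** 2 for v in both) / len(both)

    first, total = [], []
    for i in range(d):
        # AB_i: rows of A with column i replaced by the same column of B.
        fAB = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # Centering fB leaves the expectation unchanged and reduces noise.
        v_i = sum((fb - mean) * (fab - fa) for fa, fb, fab in zip(fA, fB, fAB)) / n_base
        vt_i = sum((fa - fab) ** 2 for fa, fab in zip(fA, fAB)) / (2 * n_base)
        first.append(v_i / var)
        total.append(vt_i / var)
    return first, total


def toy_model(x):
    """Toy stand-in for a simulator: additive, so first-order equals total-order."""
    return 3.0 * x[0] + 2.0 * x[1] + 1.0 * x[2]


first, total = sobol_indices(toy_model, bounds=[(0.0, 1.0)] * 3, n_base=20000)
print([round(s, 2) for s in first])
```

For this additive toy model the analytic first-order indices are 9/14, 4/14, and 1/14, so the estimates can be checked directly. In a workflow like the one described here, `model` would be replaced by a lookup of either simulated or model-predicted fire areas for each sampled input combination.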

Evaluation Metrics
To further investigate the potential of the Bayesian model in reducing the computational requirements of scenario analysis in wildfire management, we used two measures: similarity and the reduction in computational requirements. The measure of similarity, calculated using Pearson's correlation coefficient [62], assessed the closeness between the true observed data and the values predicted by the Bayesian model for the same input combinations. The correlation coefficient ranges from -1 to 1, with values closer to 1 indicating better model performance. Additionally, we estimated the reduction in computational requirements by calculating the total number of data values predicted by the Bayesian model, which gives a theoretical upper limit of the possible reduction in computational requirements. For example, if the Bayesian model predicted half of the data values used for sensitivity analysis, the computational requirements would be reduced by up to a factor of 2.
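Both evaluation measures are simple enough to state as code. The sketch below computes Pearson's correlation coefficient from its definition and the upper-bound reduction factor as the ratio of total input combinations to those that still need a simulator run. The function names are hypothetical, the sample values are made up, and the cost of fitting the Bayesian model is deliberately ignored, as in the upper-limit argument above.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def reduction_factor(n_total, n_predicted):
    """Upper bound on the saving: total runs divided by runs still simulated."""
    return n_total / (n_total - n_predicted)


# Hypothetical true and predicted fire areas (hectares) for five input combinations.
true_area = [120.0, 340.0, 90.0, 510.0, 260.0]
pred_area = [135.0, 310.0, 110.0, 480.0, 275.0]
print(round(pearson_r(true_area, pred_area), 3))
print(reduction_factor(8000, 4000))  # half of the values predicted
```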

Results and Discussion
In this section, we present and discuss the Bayesian model fitting, the role of the latent process, and the sensitivity of the model to different priors. We also address uncertainty quantification in the Bayesian model with an increased dataset and the estimation of sensitivity indices using the model-predicted values and the available dataset for scenario analysis.

Table 3 shows a comparison of the two fitted models. All the metrics favored the model with a latent process over the model without one. The model with a latent process had a higher marginal likelihood than the model without one, indicating that the observed values were more likely under the former. Additionally, the lower values of DIC and WAIC for the model with the latent process indicated a better fit. The CPO* values showed that both models fit the data well, with the higher values for the model with a latent process indicating slightly better performance. These metrics demonstrate that the latent process in the Bayesian model fitting contributed to slightly improved performance. Based on our analysis, the model with the latent process should be preferred. However, because the differences between the metric values for the two models were small, both models can be used for predictions with appropriate priors.

Table 3. Comparative analysis of Bayesian models. All the evaluation metrics favor the model with random noise (the latent process), which performed marginally better than the model without random noise. Because the differences between the metric values are small, both models may be used with proper priors.

Figure 2 presents the posterior marginals of the temperature, relative humidity, and wind speed model parameters (i.e., the β coefficients) for various precision priors in the model. The selection of priors can significantly affect the posterior distribution of the fitted model parameters, as demonstrated by the posterior marginals of the input parameters. While most priors yielded similar posterior marginals, the half-Cauchy prior resulted in a markedly displaced posterior marginal. In addition, we compared the performance of the models with different priors based on four evaluation metrics and established a preference order for the priors. Figure 3 displays the preference orders for the priors according to these evaluation metrics, with 1 representing the highest performance and 6 the lowest. The preference order for DIC coincided with the preference order for CPO, and both are depicted as a single order in the figure. Overall, the PC log-gamma prior yielded the best model performance, while the uniform and half-Cauchy priors performed relatively poorly. The PC log-gamma prior did, however, have the worst value for the MLK metric. On balance, the Bayesian model with the PC log-gamma prior for the precision was the most effective in our analysis and should be employed in any further applications of the model in wildfire management practices.

Similarity between True and Predicted Values
Before estimating the sensitivity indices of the input variables in the fire simulations as an application of the Bayesian model fitting, we calculated the similarity score between the values predicted by the fitted models (considering the means) and the true data values. Figure 4 illustrates the correlation coefficients calculated for the various fitted models. The similarity to the true observations increases as the number of true observations considered in the model fitting increases. Although each successive model was fitted with an additional 1000 observations, the improvement in the correlation coefficients was not substantial (only 0.02 from 4000 to 7000 observations).

Figure 5 displays the sensitivity indices estimated for the input parameters in the fitted model, compared to those estimated using actual values obtained from the wildfire simulations. The indices based on the full data sets from the actual model runs indicate that relative humidity, wind speed, and temperature had contributions of 72%, 19%, and 9%, respectively. When the fitted model predicted 4000 values using 4000 true observations, the contributions of the parameters, in the same order, were 80%, 13%, and 7%, which are similar to the true values. As the fitted model considered more data points for model fitting and predicted fewer data points, the estimated indices became closer to the true values. These levels of influence of the input parameters align with the mean values of the posterior marginals of the three input parameters. Estimating the sensitivity of the fire area to input parameters through sensitivity analysis, in combination with analysis of the posterior marginals, provides important insights into the factors contributing to destructive wildfires and facilitates scenario analysis. Wildfires tend to grow rapidly under high values of temperature and wind speed coupled with low values of relative humidity.

Reduced Computational Requirements
The model fitted with only 4000 data points in our demonstration produced results that were close to the true values. These findings have important implications for state-of-the-art wildfire management systems, as they suggest that we can trade the time and computational resources required to run the 4000 simulations whose values the Bayesian model predicts for the time and resources needed to build the Bayesian model. Without the Bayesian model, all of these fire simulations must be run, and the computational requirements are correspondingly higher. Our application of a Bayesian model for worst-case scenario analysis through sensitivity analysis in wildfire management practices demonstrated promising results, as we were able to obtain close-to-true values with only 4000 fire simulations instead of 8000, saving computational and time resources by up to a factor of 2. Obtaining close-to-true predictions quickly during wildfire emergencies is crucial for effective wildfire management.

Conclusions and Future Work
Wildfire modeling systems are critical for understanding the spread of fires and making informed decisions during emergencies. Obtaining close-to-true predictions quickly is crucial in these situations. However, state-of-the-art wildfire management practices, such as scenario analysis, often require a large number of wildfire simulations to be run, which can be computationally expensive and time-consuming. In this study, we demonstrated how probabilistic models built using the Bayesian approach can be used to improve wildfire management practices. We also examined the impact of a latent effect on the performance of Bayesian models and the sensitivity of the model to different priors for precision. Our application of Bayesian models in estimating the sensitivity of fire size to input parameters showed that this approach can reduce the computational cost and time of wildfire applications by a factor of up to two while still providing close approximations to the true values.
The study presents promising results but has several limitations. The Bayesian model used assumes a linear relationship between inputs and outputs, and only three input parameters and a single output, the fire area, were considered. To further improve the study, we plan to build more complex hierarchical Bayesian models with multiple inputs and outputs from fire simulations. Additionally, the study is currently limited to a specific location in Tasmania, and future research will expand this to consider the influence of spatial and fuel characteristics that may vary with fire start locations. Finally, the complexity of building the Bayesian model in terms of algorithms and computation is not considered and will be examined in future research.