Comparing Machine Learning and Time Series Approaches in Predictive Modeling of Urban Fire Incidents: A Case Study of Austin, Texas

Abstract: This study examines urban fire incidents in Austin, Texas using machine learning (Random Forest) and time series (autoregressive integrated moving average, ARIMA) methods for predictive modeling. Based on a dataset from the City of Austin Fire Department, it addresses the effectiveness of these models in predicting fire occurrences and the influence of fire types and urban district characteristics on predictions. The findings indicate that ARIMA models generally excel in predicting most fire types, except for auto fires. Additionally, the results highlight significant differences in model performance across urban districts, indicating an impact of local features on fire incidence prediction. The research offers insights into temporal patterns of specific fire types, which can provide useful input to urban planning and public safety strategies in rapidly developing cities. The findings also emphasize the need for tailored predictive models, based on local dynamics and the distinct nature of fire incidents.


Introduction
The United States has experienced rapid urban expansion in recent years. However, as cities expand, so does the risk of fires [1][2][3][4]. These fires not only pose an acute danger to urban residents but also jeopardize the long-term health and development of cities. It is necessary to understand the patterns of these fires better as cities develop and change [5][6][7]. Developing a data-driven approach to predict fire incidents is not only of paramount importance for protecting public safety, but it is also essential for understanding how human-environment interactions may impact the occurrence of urban fires [8,9].
Previous studies applied machine learning approaches to fire prediction, detection, and spread rate analysis [10][11][12][13][14]. For instance, researchers used machine learning algorithms to improve the classification of burn zones, which classifies how broadly an area may burn [15,16]. Machine learning models also can help identify environmental features that increase fire risks, such as extended drought conditions [17,18]. Machine learning models are also used for fire incident prediction. For example, Bayes Network and Naive Bayes have been used to predict the likelihood of fire breakouts based on a probabilistic framework and the spatiotemporal information of previous incidents [12]. Sevinc, Kucuk and Goltas [19] used Bayes networks to analyze the possible causes of a forest fire ignition. Another study by Szpakowski and Jensen [20] reviewed how remote sensing imagery and land use land cover data have been used in fire ecology, such as fire risk prediction, active fire detection, and burn severity assessment. Another widely used technique in fire prediction is the Random Forest model, which is a decision tree-based ensemble model that combines multiple decision trees to make predictions using features like temperature, humidity, vegetation type, and past fire occurrences. Random Forest has often been applied in the forecasting of urban fire outbreaks, and its effective performance has been reported in many instances [10][11][12]21,22]. For example, Song, Kwan, Song and Zhu [23] used Random Forest to predict the occurrence of fire outbreaks in Hefei City, China. While the Random Forest accuracy assessments of the predicted fire locations fell short of the more position-dependent spatial econometric models, the Random Forest model was successful in outputting the relevant environmental variables that were attributable to the fire outbreaks [23]. In another study, an evaluation of the Random Forest model was conducted in Yichun, China. The goal of this study was to assess Random Forest's ability to extrapolate risk patterns, fire outbreak drivers, and the spatial distribution of urban fire occurrences. The results showed that Random Forest was successful in making these assertions [24].
Artificial Neural Networks (ANNs) are another machine learning algorithm employed in various fire prediction studies. Using networks with multiple layers, these models can automatically extract relevant attributes from input data. For example, one wildfire prediction study used multi-year fire incident data that were collected in the Montesinho Natural Park of Portugal. This study showed that ANNs succeeded in locating potential sites of large-scale wildfire outbreaks but struggled with smaller-scale fire incidents [25]. Another study completed in Heilongjiang, Northeast China compared ANNs with logistic regression models to determine the most effective algorithm for wildfire outbreak prediction. The study used wildfire outbreak data, along with coinciding climate and topographic factors, to build and test the accuracy of the chosen predictive models. These authors found that the ANN model performed with the highest accuracy in predicting wildfire outbreaks, except in areas that were in or near urban zones [26]. Apart from the aforementioned drawbacks of ANNs when it comes to small-scale and urban fire outbreaks, ANN models also are affected by the highly sensitive and often site-specific relationships between the input parameters and outputs, which are difficult to untangle due to the "black box" nature of this algorithm [27,28]. While an ANN can achieve high predictive accuracy in one study area, it can require additional adjustments when the study area changes, with no clear picture of the relationship between the input data and the results [29].
In addition to machine learning models, researchers also applied various time series statistical models to fire incident prediction, and autoregressive integrated moving average (ARIMA) models are a commonly used technique in this field [13,30]. ARIMA models were developed in the 1970s and generate predictions from the statistical structure of time series variables [31]. This model captures temporal dependencies in data and has been widely applied in predicting forest fires. For example, Ma, Liu and Zhang [32] used a seasonally optimized ARIMA (SARIMA) model to predict the frequency of fire outbreaks in China from 2003 to 2017. Their study reported that the SARIMA model performed well in predicting the frequency of fire outbreaks, with excellent Root Mean Squared Error values [32]. Similarly, in a study by Zhang, Zhou, Weng and Zhang [33], an ARIMA model was used to predict urban fire outbreaks using fire rescue requests as a proxy for urban fire occurrences. This study found that ARIMA models, with sufficient historical data, are accurate and useful tools for informing fire departments on where and how to allocate resources to mitigate and respond to urban fire outbreaks [33]. In addition, ARIMA models have been widely adopted to predict wildfires and fire occurrences within the wildland urban interface [34,35].
Although these methods have been used for both urban fires and wildfires, the majority of studies focused on wildfires, and there has not been sufficient research that compares the performances of machine learning models and traditional time series models when predicting urban fire incidents. In addition, previous research did not focus on how these models may perform differently for different fire types. To fill this gap, our research expands the analyses in [36] and aims to test the effectiveness of the Random Forest model and the ARIMA model in predicting urban fires. We are also interested in exploring whether and how the occurrence of fire incidents depends on a collection of variables, such as urban districts with different socioeconomic factors and the type of fires, based on a dataset from the City of Austin Fire Department [36]. We chose Random Forest and ARIMA, instead of other machine learning methods or time series models (e.g., STARIMA), as these were the most commonly used methods for fire predictions from each of the two categories based on our literature review. We feel that a comparison of these two basic methods, instead of their numerous variations, can provide researchers with more valuable input. We chose Random Forest instead of ANNs in the category of machine learning based on previous studies showing that ANNs underperformed in urban areas [26]. The research questions that we are trying to answer are two-fold: (1) whether the ARIMA model or the Random Forest model works better when predicting urban fire incidents, and (2) whether the type of fire or the specific urban district impacts the performance of fire incident predictions.

Data
The dataset used in this study is a collection of fire incidents within Austin, Texas that is collected and maintained by the City of Austin Fire Department. The dataset spans January 2009 to December 2018 and contains information such as the time and date of the fire incident, the fire type (e.g., trash fire, grass fire, auto fire), and a latitude and longitude record for each individual incident (Table 1). Austin is one of the fastest-growing cities in the United States, and analyzing the spatial patterns of urban fire incidents is valuable for enhancing Austin's overall safety and resilience against fire-related incidents. The study area in Austin is divided into ten city council districts (Figure 1). Tables 2 and 3 display the number of fires by year and by city council district. As can be seen from Figure 1, the city center (e.g., Districts 1, 3, 9) exhibits the most significant concentration of fire incidents, likely due to the higher population densities. In contrast, the outskirts of the city, such as Districts 2, 5, 6, 8, and 10, display a sparser distribution of fire reports. Although the total number of fires fluctuates over the years (Table 2), there are no substantial differences in the spatial distribution of fires across different years.

Methodology
The objective of this research was to evaluate the performance of Random Forest and ARIMA models in predicting urban fire incidents. Additionally, we aimed to investigate how these models may perform differently across various urban areas and with different types of fires. Therefore, as part of the data preprocessing, we first summarized the number of fire occurrences by month, fire type, and city council district. Figure 2 shows an example of the monthly data after preprocessing. The highlighted entry shows the number of trash fires in each month for city council district 3. The first component ("3") indicates the city council district, the second component ("TRASH-Trash Fire") denotes the fire type, and the third component consists of a series of 120 bracketed numbers. Each number represents the monthly count of a specific type of fire in a given district, starting with January 2009 and ending with December 2018. We then conducted the following three analyses to examine the research questions posed in Section 1 (Figure 3).
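The aggregation step described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the record layout (date, fire type, district) and the sample rows are hypothetical, and only the 120-month span and the fire type labels come from the text.

```python
from collections import defaultdict
from datetime import date

# Hypothetical incident records: (incident date, fire type, council district).
incidents = [
    (date(2009, 1, 5), "TRASH-Trash Fire", 3),
    (date(2009, 1, 20), "TRASH-Trash Fire", 3),
    (date(2009, 3, 2), "TRASH-Trash Fire", 3),
    (date(2018, 12, 31), "AUTO-Auto Fire", 9),
]

START_YEAR, N_MONTHS = 2009, 120  # January 2009 through December 2018


def month_index(d):
    """Map a date to 0..119, where 0 is January 2009."""
    return (d.year - START_YEAR) * 12 + (d.month - 1)


def monthly_counts(records):
    """Return {(district, fire_type): [count for each of the 120 months]}."""
    series = defaultdict(lambda: [0] * N_MONTHS)
    for d, fire_type, district in records:
        idx = month_index(d)
        if 0 <= idx < N_MONTHS:  # ignore records outside the study period
            series[(district, fire_type)][idx] += 1
    return dict(series)


counts = monthly_counts(incidents)
trash_d3 = counts[(3, "TRASH-Trash Fire")]
print(trash_d3[:3])  # counts for January, February, and March 2009
```

Each resulting list corresponds to one district and fire type combination, in the same shape as the 120-number entry shown in Figure 2.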

Generic Analysis Based on a Random Forest Model
We first constructed a Random Forest regression model based on the monthly fire incident data. Random Forest is an ensemble learning method primarily used for classification and regression tasks. It constructs multiple decision trees during the training process. Each tree is built from a sample drawn with replacement (i.e., bootstrap sample) from the training set. Compared to models based on single decision trees, Random Forest has greater accuracy and can handle datasets with more complex features [14,37]. When constructing a Random Forest model, the features are the input variables that are used to predict the output. Because feature selection has a big influence on the model's performance, it is important to select the appropriate features. Additionally, features can be ranked according to their importance (i.e., the degree to which each feature improves the model's accuracy). In this research, due to availability, the features used for the Random Forest model were (1) the type of fire, (2) the city council district, and (3) the prior five years of fire incident data for a specific fire type/city council district combination based on a moving window. For example, to predict the number of fires in January 2016 during training, the model used the 60 (12 × 5) monthly fire counts from 2011-2015 as the input. The purpose of this model is to rank the importance of features, such as the type of fire or the city council district.
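The moving-window features described above can be illustrated with a short sketch. The function name `make_window_rows` and the toy series are hypothetical; the sketch only shows how each target month can be paired with its fire type, district, and the 60 preceding monthly counts, and it is not the authors' implementation.

```python
WINDOW = 60  # 5 years x 12 months of lagged counts


def make_window_rows(series, fire_type, district, window=WINDOW):
    """Build (features, target) pairs from one monthly count series.

    Each feature row holds the fire type, the district, and the `window`
    monthly counts immediately preceding the target month.
    """
    rows = []
    for t in range(window, len(series)):
        features = [fire_type, district] + list(series[t - window:t])
        rows.append((features, series[t]))
    return rows


# Toy series: 120 months where the count simply equals the month index.
toy_series = list(range(120))
rows = make_window_rows(toy_series, "TRASH-Trash Fire", 3)

features, target = rows[0]
print(len(rows))      # number of (features, target) pairs
print(len(features))  # 2 categorical features + 60 lagged counts
print(target)         # value at month index 60 (January 2014)
```

Before such rows are passed to a regressor, the categorical fire type would need a numeric encoding; which encoding the study used is not stated, so that step is left out here.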


Comparative Analysis between Random Forest and ARIMA for Different Fire Types
ARIMA models are often useful in time series forecasting due to their flexibility in modeling stationary, non-stationary, and seasonal patterns [26,28,33,34]. An ARIMA model is expressed as ARIMA (p, d, q), where the parameters p, d, and q are non-negative integers representing the autoregressive, integrated, and moving average parts of the model, respectively, and are interpreted as follows:
• p (Autoregressive Parameter): This indicates the extent to which the current value of the series is linearly dependent on its previous values. For example, it shows how the value in March is related to the values in preceding months like February, January, etc.
• d (Integrated Parameter): This represents the number of non-seasonal differences needed to make a time series stationary. For example, if a time series shows a linear trend, you might use d = 1 (i.e., differencing once by subtracting the previous value from each current value) to transform it into a stationary series.
• q (Moving Average Parameter): This denotes the number of lagged forecast errors in the prediction equation. The parameter q can be seen as a measure of the uncertainty in the time series analysis.
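As a small worked example of the d parameter, the sketch below differences a toy series with a linear trend. The helper `difference` and the toy values are illustrative only.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y'[t] = y[t] - y[t-1]."""
    out = list(series)
    for _ in range(d):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


linear_trend = [3 + 2 * t for t in range(8)]  # 3, 5, 7, ...
print(difference(linear_trend, d=1))  # constant series: the trend is removed
print(difference(linear_trend, d=2))  # all zeros
```

Differencing once turns the trending series into a constant one, which is why d = 1 is often enough for a series with a linear trend.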
For each time series, we first tested whether it was stationary and determined if there was a need for differencing. We then used the stepwise auto_arima function in the pmdarima Python package, which automatically selects the best functional form of the ARIMA model based on Akaike Information Criterion values. The construction of ARIMA models provides quantitative evidence of how the occurrence of fire incidents has changed over time, and the fitted parameters can be applied for the prediction and estimation of future patterns.
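A possible way to wire these two steps together with pmdarima is sketched below. Several choices here are assumptions that the text does not confirm: the use of an ADF test to estimate the differencing order, the non-seasonal setting, and the helper names `fit_auto_arima` and `forecast_next_year`.

```python
def fit_auto_arima(train_counts):
    """Fit an ARIMA model with stepwise, AIC-based order selection."""
    # Imported lazily so this sketch can be loaded without pmdarima installed.
    import pmdarima as pm
    from pmdarima.arima import ndiffs

    # Assumption: estimate the differencing order with an ADF test.
    d = ndiffs(train_counts, test="adf")
    return pm.auto_arima(
        train_counts,
        d=d,
        seasonal=False,               # assumption: plain ARIMA(p, d, q)
        stepwise=True,                # stepwise search over (p, q)
        information_criterion="aic",  # select the order by AIC
        suppress_warnings=True,
        error_action="ignore",        # skip candidate orders that fail to fit
    )


def forecast_next_year(model, horizon=12):
    """Forecast the next `horizon` months from a fitted model."""
    return model.predict(n_periods=horizon)
```

With a list of monthly training counts, `forecast_next_year(fit_auto_arima(train_counts))` would return twelve monthly forecasts that can be compared against the held-out year.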
To compare the performance between Random Forest and ARIMA, we constructed models for each type of fire and computed their mean absolute error (MAE). We used MAE instead of the mean absolute percentage error because there are zero values in the time series.
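The reason zero counts rule out the mean absolute percentage error can be seen in a short example. The toy counts below are made up for illustration; the two functions follow the standard definitions of MAE and MAPE.

```python
def mean_absolute_error(actual, predicted):
    """MAE: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """MAPE: average of |actual - predicted| / |actual|, in percent."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100 * total / len(actual)


actual = [4, 0, 2, 6]     # toy monthly counts, including a zero month
predicted = [3, 1, 2, 4]

print(mean_absolute_error(actual, predicted))  # (1 + 1 + 0 + 2) / 4

try:
    mean_absolute_percentage_error(actual, predicted)
except ZeroDivisionError:
    print("MAPE is undefined when an actual count is zero")
```

MAE stays well defined for months with no incidents, whereas MAPE divides by the actual count and fails as soon as one month has zero fires.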

Comparative Analysis between Random Forest and ARIMA for Different Urban Districts
Similar to the previous step, to compare Random Forest and ARIMA, we constructed models for each urban district and compared their MAE. Note that both the ARIMA model and the Random Forest model use the last 12 months' data for testing and the preceding 9 years for training. However, the Random Forest model differs in that it incorporates the 5 years of data immediately preceding each data point in the training set as input features. After comparing the results using 1, 3, 5, and 7 years at the trial and error stage, we determined that utilizing five years produced the best MAE for the Random Forest model.
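The split described above can be made concrete with a small sketch. It assumes that Random Forest training targets are restricted to months whose full 5-year window lies inside the training period; the text does not spell this out, and the helper names are hypothetical.

```python
TEST_MONTHS = 12  # January-December 2018
WINDOW = 60       # 5 years of lagged monthly counts


def split_series(series, test_months=TEST_MONTHS):
    """Hold out the final `test_months` values for testing."""
    return series[:-test_months], series[-test_months:]


def count_rf_targets(series, window=WINDOW, test_months=TEST_MONTHS):
    """Count Random Forest training and testing targets that have a full
    `window` of preceding months available as input features."""
    n_train_months = len(series) - test_months
    n_train_targets = n_train_months - window
    return n_train_targets, test_months


toy_series = list(range(120))  # stand-in for 120 monthly counts
train, test = split_series(toy_series)
print(len(train), len(test))         # months available for fitting / testing
print(count_rf_targets(toy_series))  # Random Forest targets per series
```

Under this assumption, a longer window leaves fewer training targets per series, which is one trade-off to keep in mind when comparing window lengths of 1, 3, 5, and 7 years.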

Creating a Random Forest Model Based on Monthly Fire Data
For the Random Forest model detailed in Section 2, the testing data consisted of the last 12 months of data, from January 2018 to December 2018. These data encompassed ten city council districts and five main types of fire incidents, resulting in a total of 600 scenarios (twelve months × five fire incident types × ten districts) in the testing set. The five incident types are the five most common types of urban fires in our dataset, namely TRASH-Trash Fire, GRASS-Small Grass Fire, BOX-Structure Fire, AUTO-Auto Fire, and ELEC-Electrical Fire.
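The size of the testing set follows directly from enumerating every month, fire type, and district combination, as the short sketch below shows. The variable names are illustrative.

```python
from itertools import product

MONTHS = range(1, 13)     # January-December 2018
DISTRICTS = range(1, 11)  # ten city council districts
FIRE_TYPES = [
    "TRASH-Trash Fire",
    "GRASS-Small Grass Fire",
    "BOX-Structure Fire",
    "AUTO-Auto Fire",
    "ELEC-Electrical Fire",
]

# One scenario per (month, fire type, district) combination.
scenarios = list(product(MONTHS, FIRE_TYPES, DISTRICTS))
print(len(scenarios))  # 12 * 5 * 10
```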
The result showed an MAE of 2.635 for all fire types. From Figure 4, we can observe a few patterns:

• Figure 4 shows that the predicted values were effective in reflecting the overall pattern of urban fire occurrences. The closeness of the two lines (expected and predicted) for the majority of samples suggests a good fit for standard scenarios.
• The MAE of 2.635 reported for all fire types offers valuable insights into the performance of our predictive model. It is particularly noteworthy given the diverse set of 600 scenarios within our testing set, spanning various districts and types of fire incidents. This MAE indicates that on average, the model's predictions deviated from the actual numbers by approximately 2.635 incidents. Although this signifies a relatively low error margin across the entire dataset, it is essential to delve deeper into the distribution of these errors.
• There were noticeable spikes in the expected values (i.e., sharp peaks) that the predicted values did not capture. This shows that the model fails to capture some of the extreme values, which is a common challenge in predictive modeling, especially for models that often average out the predictions like Random Forest.
• The prediction errors do not appear to be uniformly distributed across all samples. There are clusters of samples with larger errors which can correspond to specific types of fires or districts. After inspecting the raw data, it seems that the clusters of high error mostly occur in the city center and during summer months when fire incidents peak.
When analyzing the importance of different features in the Random Forest model, the fire type appears to be the most important feature, which suggests that the model's accuracy may improve if we construct separate models based on the fire type. In addition, we are also interested in the performance differences between ARIMA and Random Forest. Therefore, the rest of this section emphasizes the comparison of model performances by fire type and urban district.

Comparing ARIMA and Random Forest by Fire Type
To compare the performance between Random Forest and ARIMA, we constructed models for each type of fire and compared their MAE. Upon testing the stationarity of our time series, we found that the time series for all fire types appear stationary, so there was no need for differencing.
Also, for forecasting models, there is a notable difference between ARIMA and the Random Forest regressor when it comes to handling residuals. ARIMA relies on the assumption of normally distributed residuals for accurate predictions, given its parametric nature. Hence, it is important to examine the normality of residuals of ARIMA models. On the other hand, the Random Forest regressor is not bound by such constraints and does not require the residuals to be distributed normally. Therefore, we created the residual plot in Figure 5 and the Q-Q plots for ARIMA models in Figure 6. The residuals appear to be oscillating around zero without a clear pattern, and there is no obvious trend, seasonality, or repeated structure, which would indicate that the ARIMA models have captured the underlying process well. In the Q-Q plots, it seems that the residuals for all fire types follow the red line quite closely, especially in the central quantiles. There are some deviations at the ends (tails), which suggests that there may be some outliers, or that the distributions have heavier tails than the normal distribution. This is common in real-world data. The residuals of Random Forest models also follow normality in our tests, but since Random Forest models do not require the normality of residuals, only the ARIMA model residuals are included in this section.
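To make the Q-Q comparison concrete, the sketch below computes the point pairs that such a plot displays, using only the Python standard library. The residual values are hypothetical, and the plotting positions (i - 0.5) / n are one common convention rather than necessarily the one behind Figure 6.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Return (theoretical, sample) quantile pairs for a normal Q-Q plot.

    Residuals are standardized, sorted, and paired with standard-normal
    quantiles at plotting positions (i - 0.5) / n.
    """
    n = len(residuals)
    mu, sigma = mean(residuals), stdev(residuals)
    sample = sorted((r - mu) / sigma for r in residuals)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))


# Toy residuals (hypothetical), roughly symmetric around zero.
toy_residuals = [-2.1, -1.0, -0.4, 0.0, 0.3, 0.9, 2.3]
for theo, samp in qq_points(toy_residuals):
    print(f"{theo:+.2f}  {samp:+.2f}")
```

Points that lie close to the diagonal indicate approximately normal residuals, while departures at the first and last pairs correspond to the tail deviations discussed above.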
Table 4 shows the comparison of MAE for the Random Forest and the ARIMA models based on the testing set. As can be seen from Table 4 and Figure 7, overall, ARIMA outperformed Random Forest in predicting the occurrence of fire incidents for most fire types; however, the performance varied for different fire types. For example, in predicting trash fires, ARIMA's MAE was 21.92% lower than that of Random Forest. Similarly, ARIMA outperformed Random Forest in predicting grass fires and electrical fires with a decrease in MAE of 17.77% and 18.01%, respectively.
Interestingly, the Random Forest model has a significantly lower MAE for auto fires compared to ARIMA, with the MAE for ARIMA being 95.36% higher. This might be due to nonlinear relationships or interactions between variables that Random Forest captured more effectively than ARIMA did. In other words, the ARIMA model is designed to capture time-series data's temporal structures (e.g., trends and seasonality), so the model may not work as well if the frequency of auto fires does not show a strong temporal pattern.
Figure 8 shows a comparison of temporal patterns of two fire types: small grass fire and auto fire. The two types of fires exhibit distinct seasonal trends. The frequency of grass fires shows a particular pattern of increase or decrease during the spring and summer months, which could be related to factors like temperature, rainfall, drought conditions, and vegetation growth. The pattern for auto fires is less seasonally dependent and more consistent month to month.
In summary, while ARIMA appears to be the better model overall for predicting fire incidents in this analysis, the choice of model should consider the specific type of fire being predicted. The superior performance of Random Forest in predicting auto fires could imply that certain fire types may have underlying patterns better captured by the more complex, nonlinear methods employed by Random Forest algorithms. It would be beneficial to further investigate the characteristics of auto fire incidents that lead to this discrepancy in model performance and potentially explore hybrid models or feature engineering to improve predictions across all fire types.
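When reading the percentage differences reported for the fire types, it helps to keep in mind that "X% lower" and "X% higher" use different baselines. The sketch below assumes each percentage is computed relative to the other model's MAE, which the text does not state explicitly; the MAE values are hypothetical and are not taken from Table 4.

```python
def pct_lower(mae_model, mae_baseline):
    """How much lower mae_model is than mae_baseline, in percent."""
    return 100 * (mae_baseline - mae_model) / mae_baseline


def pct_higher(mae_model, mae_baseline):
    """How much higher mae_model is than mae_baseline, in percent."""
    return 100 * (mae_model - mae_baseline) / mae_baseline


# Hypothetical MAE values for illustration only (not taken from Table 4).
mae_rf, mae_arima = 2.0, 1.5
print(pct_lower(mae_arima, mae_rf))   # ARIMA relative to Random Forest
print(pct_higher(mae_rf, mae_arima))  # Random Forest relative to ARIMA
```

The same pair of MAE values yields 25% in one direction and about 33% in the other, so percentages quoted as "lower" and "higher" are not directly comparable in magnitude.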

Comparing ARIMA and Random Forest by Urban District
Similarly, we compared the performance of ARIMA and Random Forest in different urban districts (Table 5). Like the residual and Q-Q plots for different fire types, Figures 9 and 10 show randomly distributed residuals, suggesting that the ARIMA models have captured the underlying process well. Table 5 shows that the Random Forest model performed better in five districts (Districts 1, 3, 8, 9, and 10), where its MAE values were slightly lower than those of ARIMA. This performance difference might be attributed to Random Forest's ability to handle complex, nonlinear data patterns and interactions between multiple predictors, which can be characteristic of urban fire incidents. On the other hand, the ARIMA model performed better in the other five districts. This could indicate that fire occurrences in these districts follow more predictable temporal trends, which ARIMA can capture more effectively.
The models show varied performance across the ten districts (Figure 11). From Figure 11, the spatial pattern in terms of the performance difference is not as obvious as expected. This may be due to the underlying diversity in the characteristics and dynamics of fire incidents across the city. Various factors, such as the number of fire incidents, size of the area, and urban density, could influence the models' performance. However, we can still observe an urban/suburban difference. For example, urban districts typically have higher population densities and more infrastructure; therefore, these areas may experience a higher frequency of different types of fires, such as structural fires in multi-story buildings, compared to suburban districts. The Random Forest model might be better at handling the complexity and variety of urban fires due to its ability to capture nonlinear relationships and interactions between unknown variables. From Figure 11, we can see that Districts 3 and 9 (i.e., where Random Forest has a lower MAE) are located in the city center. Although Districts 1, 8, and 10 cover large suburban areas, most of the fire incidents in these districts happened close to the city center (cf. Figure 1). Suburban districts, on the other hand, might have fires that are more related to residential areas, which may follow more seasonal or temporal patterns and potentially make ARIMA more suitable. For example, Districts 2 and 4, where ARIMA performs better, might exhibit more homogeneous (possibly suburban) characteristics with fire incidents that follow a more predictable temporal pattern.
To further understand the complex correlation between model performance and urban districts, it would be beneficial to consider additional data layers, such as district-specific socioeconomic characteristics, to better interpret why certain districts are better predicted by one model than the other.
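The district-level comparison reported in Table 5 amounts to computing the MAE of each model in each district and noting which value is lower. The sketch below illustrates that bookkeeping under stated assumptions: the helper names, the dictionary layout, and all numbers are hypothetical, and the forecasts are placeholders and not outputs of the fitted models.

```python
def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def better_model_by_district(actual, forecasts):
    """Return {district: (best_model_name, {model_name: mae})}.

    actual:    {district: [observed counts]}
    forecasts: {model_name: {district: [predicted counts]}}
    """
    result = {}
    for district, observed in actual.items():
        scores = {
            name: mae(observed, preds[district])
            for name, preds in forecasts.items()
        }
        result[district] = (min(scores, key=scores.get), scores)
    return result


# Hypothetical monthly counts for two districts (illustrative only).
actual = {1: [4, 6, 5], 2: [3, 2, 4]}
forecasts = {
    "ARIMA": {1: [5, 5, 5], 2: [3, 2, 3]},
    "RandomForest": {1: [4, 6, 6], 2: [1, 2, 2]},
}

result = better_model_by_district(actual, forecasts)
```

In this toy example the lower MAE belongs to Random Forest in district 1 and to ARIMA in district 2. Keeping the per-model scores alongside the winner makes it easy to see how small the margin is, which matters when the MAE values differ only slightly.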

Limitations
Although the results in Sections 3.1-3.3 provide valuable insights into the predictive capabilities of ARIMA and Random Forest models for urban fire incidents, this study has several limitations. One limitation is that the historical data were collected between 2009 and 2019, which may not fully capture the changing dynamics of urban environments, especially in a fast-developing city like Austin. Changes in other factors, such as urban planning and building codes over time, can also alter the landscape of fire incidents. Therefore, it may not always be reliable to use past fire data for future predictions.
Also, we chose city urban districts instead of census tracts or census block groups because the latter often contain many areas with zero incidents. By selecting city urban districts, we aimed to reduce the skewing effect of sparsely populated or less incident-prone areas that may not provide a realistic picture of fire incident patterns. Future studies can look into how the choice of spatial unit may impact the model fitting and the prediction results.
Another limitation worth considering is that the Random Forest model tends to underestimate the values. This could be because some features are missing when training the model. Also, there was some noise and outlier data in the expected values. Random Forest models can sometimes smooth out noise, which might look like an underestimation in the presence of volatile data. In this research, we only considered factors like fire type and district characteristics due to the limitation of data, but there can be other factors, such as weather conditions or traffic patterns, that may improve the model's accuracy.
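One way to check whether a model systematically underestimates, as opposed to simply being inaccurate, is to look at the signed error in addition to the MAE. The sketch below is an assumption-laden illustration and not part of the original analysis: the function names and the example values are hypothetical.

```python
def mean_signed_error(actual, predicted):
    """Average of (predicted - actual); negative values mean underestimation."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


def share_underpredicted(actual, predicted):
    """Fraction of observations where the prediction is below the actual value."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(1 for a, p in zip(actual, predicted) if p < a) / len(actual)


# Hypothetical volatile series with two spikes (illustrative only).
actual = [3, 9, 4, 12, 5]
predicted = [4, 6, 4, 8, 5]

bias = mean_signed_error(actual, predicted)
share = share_underpredicted(actual, predicted)
```

In this example the predictions miss both spikes from below, so the mean signed error is negative even though most months are predicted closely. This is the pattern one would expect if smoothing of volatile data is what appears as underestimation.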
In addition, this study was only conducted at a city council district level in Austin, Texas. The models' performance in Austin may not necessarily reflect their potential accuracy in other urban areas, as different cities have unique urban layouts, socio-demographic factors, policies, etc. The delineation of city council districts can obscure localized patterns and outliers and also introduce the modifiable areal unit problem (MAUP), meaning that the results may be different if we conduct the analysis at a census tract or a census block group level. However, in the initial data exploration, we discovered that both the census tract and the census block group levels yield too many polygons with zero fire incidents.
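The trade-off between spatial detail and zero-incident polygons can be illustrated by computing the share of empty units at two aggregation levels. This is a minimal sketch under stated assumptions: the tract and district identifiers, the tract-to-district mapping, and the incident list are all hypothetical and do not represent Austin's actual geography.

```python
from collections import Counter


def zero_share(incident_units, all_units):
    """Fraction of spatial units that recorded no incidents."""
    if not all_units:
        raise ValueError("all_units must not be empty")
    counts = Counter(incident_units)
    return sum(1 for u in all_units if counts[u] == 0) / len(all_units)


# Hypothetical mapping of six tracts to two districts (illustrative only).
tract_to_district = {
    "T1": "D1", "T2": "D1", "T3": "D1",
    "T4": "D2", "T5": "D2", "T6": "D2",
}
# Hypothetical incidents, each tagged with the tract where it occurred.
incident_tracts = ["T1", "T1", "T4"]

tract_zero = zero_share(incident_tracts, list(tract_to_district))
district_zero = zero_share(
    [tract_to_district[t] for t in incident_tracts],
    sorted(set(tract_to_district.values())),
)
```

Here four of the six tracts have no incidents, while neither district is empty. The same incidents therefore look sparse at the finer level and well populated at the coarser one, which is the practical reason given above for working at the district level despite the MAUP concern.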

Conclusions
This study conducted a predictive analysis of two models, ARIMA and Random Forest, based on a dataset of urban fire incidents across various districts of Austin, Texas. Overall, the ARIMA model is capable of modeling how fire incidents have changed over time in a parametric way and has proven to be a dependable way to forecast future trends based on these changes. On the other hand, the Random Forest model has also shown notable effectiveness in certain city areas. This suggests that the specific kind of fire and the unique features of each urban district greatly affect how well each model works.
Based on the comparative study results, we found that both the type of fire and the district have a substantial influence on the models' performance. The ARIMA model outperformed the Random Forest model for most fire types except for auto fires. The results from the city district analysis demonstrated an interesting pattern in model efficacy, with Random Forest outperforming ARIMA in five districts and vice versa in the other five. This balanced variance in predictive accuracy highlights the importance of considering local district characteristics, including socioeconomic factors and the specific nature of fire incidents, when selecting a predictive model. Overall, this research contributes to the existing literature by filling the gap in comparative studies of machine learning and time series models for urban fire prediction. The results can provide valuable input for urban planning and public safety by creating more targeted and effective fire prevention strategies in rapidly growing urban areas like Austin.
Future work can focus on incorporating other factors, such as socioeconomic profiles, weather patterns, and land use patterns, into the predictive models. This may help to uncover complex interactions and dependencies that are not apparent from the fire incident data alone. Also, expanding the geographical scope to include other cities could help generalize the findings and allow the models to be tested and validated in diverse urban settings. Researchers can investigate the performance of other machine learning algorithms, especially algorithms with temporal capabilities, like Long Short-Term Memory networks, when handling the sequential nature of time-series data. The use of deep learning techniques could potentially reveal deeper insights into the predictive factors of urban fires. Lastly, burn severity data can be incorporated into future research to assess its impact on fire behavior and propagation, which can provide valuable insights for fire management and urban planning strategies.

Figure 1. Yearly fire incidents in the ten Austin city council districts.
ISPRS Int. J. Geo-Inf. 2024, 13, x FOR PEER REVIEW 5 of 16
the third component consists of a series of 120 bracketed numbers. Each number represents the monthly count of a specific type of fire in a given district, starting with January 2009 and ending December 2018.

Figure 4. Predicted values based on a Random Forest model [36].

Figure 5. ARIMA model residuals for all five types of fires.

Figure 6. Q-Q plots for ARIMA models by fire type.

Figure 8. Comparing the temporal patterns of small grass fire and auto fire.

Figure 9. ARIMA model residuals for all districts.

Figure 10. Q-Q plots for ARIMA models by district.

Figure 11. Comparing Random Forest and ARIMA by city council district.

Table 1. Example records from the fire incident dataset.

Table 2. Fire incidents by year.

Table 3. Fire incidents by city council district.

Table 5. Comparing Random Forest and ARIMA at the city council district level.