An Assessment of the Effectiveness of Tree-Based Models for Multi-Variate Flood Damage Assessment in Australia

Abstract: Flood is a frequent natural hazard that has significant financial consequences for Australia. In Australia, physical losses caused by floods are commonly estimated by stage-damage functions. These methods usually consider only the depth of the water and the type of buildings at risk. However, flood damage is a complicated process, and it is dependent on a variety of factors which are rarely taken into account. This study explores the interaction, importance, and influence of water depth, flow velocity, water contamination, precautionary measures, emergency measures, flood experience, floor area, building value, building quality, and socioeconomic status. The study uses tree-based models (regression trees and bagging decision trees) and a dataset collected from 2012 to 2013 flood events in Queensland, which includes information on structural damages, impact parameters, and resistance variables. The tree-based approaches show water depth, floor area, precautionary measures, building value, and building quality to be important damage-influencing parameters. Furthermore, the performance of the tree-based models is validated and contrasted with the outcomes of a multi-parameter loss function (FLFArs) from Australia. The tree-based models are shown to be more accurate than the stage-damage function. Consequently, considering more parameters and taking advantage of tree-based models is recommended. The outcome is important for improving established Australian flood loss models and assisting decision-makers and insurance companies dealing with flood risk assessment.


Introduction
In recent decades, flood risk has been growing due to climate change and the increasing vulnerability of properties at risk [1][2][3]. In Australia, floods are the most costly of all disaster types [4], contributing 29% of the total disaster cost borne by the nation's economy and the built environment [5,6]. Accordingly, flood risk management is attracting more attention [7][8][9], and results are used to inform disaster management policy and support the development of risk reduction measures [10,11]. Flood risk management has to be based upon an appropriate evaluation of flood hazard and flood vulnerability [12,13], including an assessment of damage and of the effectiveness of risk reduction measures [14][15][16]. Therefore, loss estimation and consequence assessment are an indispensable part of flood risk management [17,18]. However, the stage-damage functions commonly used in Australia usually consider only the water depth and the type of building at risk, although flood damage depends on a variety of other factors. The objective of this study is to employ tree-based data mining methods to examine the effect and importance of damage-influencing parameters using a dataset collected from 2012 to 2013 flood events in Queensland. The performance of the tree-based models is also compared with the outcomes of a newly established multi-parameter loss function (FLFArs) from Australia.

Study Area and Data
For this study, two areas were chosen. The first survey area is the city of Bundaberg in Queensland, Australia, located in the vicinity of the Burnett River waterway north of the state capital, Brisbane (Figure 1). The Burnett River catchment is located in South-East Queensland, with the main system incorporating the rivers of Three Moon Creek, Burnett River, Nogo Creek, Auburn River and the Boyne River, in addition to many other creeks and tributaries. The total Burnett River catchment area is approximately 33,000 square kilometres. This area is bounded by the catchments of the Fitzroy and Kolan Rivers to the north; the Dawson and Condamine Rivers to the east and the Brisbane and Mary Rivers to the south. The Burnett River catchment has had a long history of flooding that has impacted both the urban centres and rural areas [43]. The Bundaberg ground elevation and the Burnett River catchment are illustrated in Figures 2 and 3. In recent years, the city of Bundaberg has experienced some extreme flood events. The most recent flood responses from Bundaberg Regional Council date back to the floods in November 2010, January 2013, February 2013, and February 2015 [2]. During the flood event in January 2013, 200 businesses were inundated, and over 2000 residents and 70 hospital patients were evacuated. Furthermore, the performance of lifelines was disrupted, and infrastructure was impacted [44]. This flood event, which occurred from 21 to 29 January 2013, was a result of Tropical Cyclone Oswald. The associated rainfall and flooding had a catastrophic effect on Queensland, and the event is considered to be the worst flood experienced in Bundaberg's recorded history. The height of the floodwaters in Bundaberg city from the Burnett River reached 9.53 metres at its peak, and over 2000 properties were affected [2]. The extent of the water depth is illustrated in Figure 4.
Bundaberg Regional Council estimated that the public infrastructure damage from the flood event of 2013 was approximately AUD 103 million [2]. The second study area is the city of Roma, located on Bungil Creek, a tributary of the Condamine River in the Maranoa region in Queensland (Figure 5). The flood event in 2012 is considered to be the worst flood experienced in Roma's history, having inundated 444 homes. This flood event, which occurred from late January to early February 2012, was a result of heavy rainfall. The boundary of the flood is illustrated in Figure 6. The Maranoa Regional Council estimated that the public infrastructure damage from the natural disaster events of 2012 was approximately AUD 50 million [2]. The return periods of both flood events have been estimated to be approximately 100 years, based on the flood frequency analyses [43].
Map of Bundaberg Regional Council [45].
Figure 5. Map of Maranoa Regional Council [49].
The empirical dataset used for this study (457 loss cases from the 2013 flood and 150 loss cases from the 2012 flood) was gathered after these two flood events from the Queensland Reconstruction Authority, a governmental responder organisation to Queensland disaster events. The official dataset, which was collected by either two or three post-disaster on-site surveys based on a standardised procedure and unified survey guidelines, provides data on the intensity of hazard (i.e., water depth, information on water contamination, and information on flow velocity), characteristics of buildings (i.e., material, floor space, construction type, number of building storeys, information on utilities and solar panels, and emergency measures undertaken), and the magnitude of losses. It is worth mentioning that for every building, the magnitude of damage has been described in terms of the affected structural components. Accordingly, based on the average value of damaged items relative to the total value of the structure, the descriptions of damages have been converted into a percentage of damages [2]. Further complementary data (e.g., building age, length of residency, average replacement building value, the number of residents, and socioeconomic status) was collected from the National Exposure Information System of Australia [51]. Consequently, the final dataset provides 20 attributes on 607 inundations. Candidate predictors are either extracted directly from one attribute (e.g., water depth or building area) or transformed from several attributes (e.g., building quality or flow velocity). Data preparation and data transformation are discussed further below.

• Water depth and water contamination: this information was collected in two post-disaster surveys. The value of water depth fluctuated between 0 cm and 700 cm above ground. However, for 96% of buildings, this attribute was equal to or less than 350 cm. Also, the existence of sewage, biological, or chemical contamination has been checked and reported by visual inspection and smell. Accordingly, water contamination was ranked based on the reported material and the existing chemical hazards, from 0 (no contamination) to 2 (chemical contamination), with 1 representing only sewage contamination.

• Flow velocity: flow velocity was assessed according to the comments of inspectors about the amount of water penetration inside of buildings, the volume of deposited materials, and the type of sediment next to the house (mud, sand, gravel or stone). Afterwards, this information was transformed and ranked as calm (1: no deposit or only mud sediment), medium (2: sand sediment or a considerable amount of water penetration), or high (3: gravel or stone sediment or high volume of deposits) flow velocity.

• Emergency measures: the dataset provides information about whether or not people undertook any action against water infiltration, e.g., pumping water out or cut-off of electricity supply. Subsequently, these actions were ranked from 0 (no measure was undertaken) to 3 (many measures were undertaken), with 1 representing that only water was pumped out, and 2 representing that only electricity supply was cut off. The "cut-off of electricity supply" measure had a greater weight due to the high value of electrical equipment [2].

• Precaution measures: the indicators of precaution measures were defined and ranked based on the construction type (3: high-set open under, 2: low-set with suspended floor, or 1: high-set enclosed under or slab on ground); protection of utilities and power system against water impacts (1: no protection, 2: protected); availability of solar-panel power provider (1: not available, 2: available); and the number of building storeys (1: one-storey buildings, 2: two-storey buildings). Eventually, precaution measure indicators were calculated and weighted by multiplying the above ranks.

• Flood experience: the areas of study have experienced a variety of flood events in recent years [2,52]. Therefore, this parameter has been assessed and averaged according to the length of residency. Overall, about 11% of households moved into the areas one year or less before the events, weighted 1. About 31% of families settled there in the last five years, weighted as 2. Residents with more than five years length of residency were weighted 3.

• Building quality: this item is a function of age (i.e., constructed pre- or post-1981) and material (e.g., timber, brick, concrete, or metal) of buildings. Age of buildings was weighted 1 if the structure was constructed pre-1981 and 2 if it was constructed post-1981. Also, the resistance of different materials against impacts of water is judged and ranked: 1 for timber, 2 for brick, and 3 for concrete or metal, according to the Australian building guidelines for flood prone areas [53]. Finally, this candidate predictor is defined by multiplying the weight of age by the weight of the material.

• The value and floor space of building: for every building, the value was calculated by multiplying the total area reported by the inspectors by the average replacement value per square metre extracted from the National Exposure Information System of Australia [51]. In this study, besides considering the area of the buildings, the contribution of the residents' density to the extent of losses has been taken into account. Accordingly, floor space of the building was calculated per person, by dividing the total area by the number of residents.

• Socioeconomic status: this category includes information about ownership status and monthly income (i.e., low: $1-$599, middle: $600-$1,999, or high: greater than $2,000). Also, it represents buildings whose residents need special attention (i.e., aged less than five or more than 65; needing assistance with a core activity; or do not speak English well) or low education residents (i.e., the highest educational attainment of all building residents is year 11 or below).

(Table 1). Table 2 shows the Pearson correlation coefficient of the final candidate predictors and the loss ratio. As expected, and as other researchers have claimed [2,15,24], water depth has the highest absolute correlation with loss ratios (Figure 7). However, many other variables, such as flow velocity, contamination, precaution measure, floor space per person, the value of the affected building, and building quality, are also significantly correlated to damage ratio.
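The composite indicators described in the list above (precaution measures, building quality, and floor space per person) can be sketched as follows. This is a minimal illustration only: the function names, dictionary keys, and argument layout are hypothetical and do not come from the survey dataset, and treating buildings with more than two storeys as rank 2 is an assumption.

```python
import math

# Ranks as described in the text; the key names themselves are hypothetical.
MATERIAL_RANK = {"timber": 1, "brick": 2, "concrete": 3, "metal": 3}
CONSTRUCTION_RANK = {
    "high_set_open_under": 3,
    "low_set_suspended_floor": 2,
    "high_set_enclosed_under": 1,
    "slab_on_ground": 1,
}


def building_quality(built_post_1981: bool, material: str) -> int:
    """Age weight (1 pre-1981, 2 post-1981) multiplied by the material rank."""
    age_weight = 2 if built_post_1981 else 1
    return age_weight * MATERIAL_RANK[material]


def precaution_indicator(construction: str, utilities_protected: bool,
                         has_solar_panels: bool, storeys: int) -> int:
    """Product of the four precaution ranks listed in the text."""
    ranks = (
        CONSTRUCTION_RANK[construction],
        2 if utilities_protected else 1,
        2 if has_solar_panels else 1,
        2 if storeys >= 2 else 1,  # assumption: more than two storeys also ranks 2
    )
    return math.prod(ranks)


def floor_space_per_person(total_area_m2: float, residents: int) -> float:
    """Total floor area divided by the number of residents."""
    return total_area_m2 / residents
```

For example, a post-1981 brick building receives a building quality of 2 x 2 = 4 under this scheme.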

Statistical Methods
Regression trees and bagging decision trees were applied to determine the prominent damage-influencing parameters, to understand their effect on the extent of structural damage, and to compare the performance of the tree-based models with an established flood loss function. The tree-based analyses were performed with the Weka machine learning software [54].

Regression Trees
Regression trees are machine learning methods for constructing prediction models from data where the target variables are continuous values [55]. Tree-based regression models are known for their simplicity and efficiency when dealing with domains with a large number of variables and data [56]. They are constructed by sub-dividing the predictor data space into smaller areas such that in each split, the dataset is partitioned into two sub-spaces. In this regard, each non-terminal (decision) node is labelled with a question and the binary branches are labelled with the answers. Subdivision should be performed in such a way that the predictive accuracy is maximised, and errors are minimised. In other words, the algorithm searches over all possible split values of all predictor variables to identify the split which minimises an error criterion. Overall, trees should be complex enough to take advantage of information that increases predictive power, while simple enough to ignore random noise that does not enhance the accuracy of results [15].
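The exhaustive split search described above can be illustrated with a short sketch. The text does not name the error criterion, so the sum of squared errors around each child mean is assumed here; the function names `sse` and `best_split` are illustrative and are not taken from Weka.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets):
    """Search every feature and every midpoint between adjacent distinct values.

    rows    : list of feature lists, one per building
    targets : list of loss ratios, aligned with rows
    Returns (feature_index, threshold, error), or None if no split is possible.
    """
    best = None
    for j in range(len(rows[0])):
        levels = sorted({row[j] for row in rows})
        for lo, hi in zip(levels, levels[1:]):
            threshold = (lo + hi) / 2
            left = [t for row, t in zip(rows, targets) if row[j] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[j] > threshold]
            error = sse(left) + sse(right)  # assumed error criterion
            if best is None or error < best[2]:
                best = (j, threshold, error)
    return best
```

Applying `best_split` recursively to the two resulting subsets would grow a full tree; each leaf then predicts the mean loss ratio of the records it contains.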
If a decision tree model is fully grown, it may lose some generalisation capability, and if the training data contains any errors, it can lead to poor performance on unseen cases. This issue is known as overfitting and needs careful attention [57,58]. One way to avoid overfitting is tree pruning, which was employed in this study. Tree pruning is a technique in machine learning that decreases the size of decision trees by removing sections of the tree that contribute little predictive power. Pruning reduces the complexity of the final model and hence improves predictive accuracy by reducing overfitting [59].
In this study, the target variables were relative structural loss values and trees were constructed using the entire dataset. The structure of the tree is thus formed by a sequence of repeated binary partitioning questions, from the root node to the terminal nodes (or leaves). Terminal node values give the average loss ratio of all data values of the terminal node [15]. In other words, the predicted loss ratio for a building is the average loss ratio of the training records that fall into the same leaf.
The prediction error used for Figure 8 is estimated by a 10-fold cross-validation technique based on the average absolute deviation of the estimated ratios from the observed values (MAE). In this regard, the shuffled data was first partitioned into 10 equally-sized segments (folds). A tree was computed 10 times. In each iteration, a different fold of the data was held out for model testing while the remaining nine folds were used for model training. Eventually, the error was averaged over all constructed models [6,60].
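The 10-fold cross-validation procedure described above can be sketched as follows. This is a minimal, library-free illustration: the helper names and the `fit` callback convention are assumptions, and `fit_mean` is a deliberately trivial stand-in for a regression tree so that the example stays self-contained.

```python
import random


def mean_absolute_error(observed, predicted):
    """Average absolute deviation of the estimates from the observations."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def k_fold_mae(xs, ys, fit, k=10, seed=0):
    """Shuffle, split into k near-equal folds, and average the held-out MAE.

    fit(train_x, train_y) must return a function mapping one x to a prediction.
    Assumes len(ys) >= k so that no fold is empty.
    """
    indices = list(range(len(ys)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        predictions = [predict(xs[i]) for i in held_out]
        observed = [ys[i] for i in held_out]
        fold_errors.append(mean_absolute_error(observed, predictions))
    return sum(fold_errors) / k


def fit_mean(train_x, train_y):
    """Placeholder model: always predicts the training mean."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean
```

Any model can be plugged in through `fit`, which is how trees of different sizes could be compared on the same folds.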

Bagging Decision Trees
The bagging predictor is a method for generating multiple versions of a predictor and using these to obtain an aggregated predictor. The multiple versions are formed by making bootstrap replicates of the entire dataset and using each replicate to grow a new regression tree. The response of a bagging decision tree is the average of all individual regression trees. Bootstrapping and ensemble averaging make the response robust to variation in the data and reduce overfitting. Tests on real and simulated datasets using regression trees have shown that compared to an individual regression tree, bagging can substantially enhance the stability and accuracy of the model's performance [15][61][62][63][64]. For each individual regression tree, about one-third of the data is not drawn into its bootstrap replicate and is therefore not used for training that tree. This segment, called out-of-bag data, serves as the observation data for error estimation and feature importance assessment.
The quality of a bagging tree, used for exploring the feature importance, is measured by the average prediction error of the regression trees on the observation data they were not trained on (out-of-bag data). To assess a feature, the values of that variable in the out-of-bag examples are randomly permuted, and the increase in the out-of-bag error is measured: the greater the increase, the more important the feature [15,26,62].
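A compact sketch of bootstrap aggregation with out-of-bag permutation importance is given below. To stay short it uses one-split trees (stumps) as the base learner and absolute error as the out-of-bag error, both of which are simplifying assumptions rather than the configuration used in the study; all function names are illustrative.

```python
import random


def fit_stump(rows, targets):
    """One-split regression tree; returns a predict(row) function."""
    overall = sum(targets) / len(targets)
    best = None  # (error, feature, threshold, left_mean, right_mean)
    for j in range(len(rows[0])):
        levels = sorted({r[j] for r in rows})
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2
            left = [t for r, t in zip(rows, targets) if r[j] <= thr]
            right = [t for r, t in zip(rows, targets) if r[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or err < best[0]:
                best = (err, j, thr, lm, rm)
    if best is None:
        return lambda row: overall
    _, feature, threshold, left_mean, right_mean = best
    return lambda row: left_mean if row[feature] <= threshold else right_mean


def bagged_importance(rows, targets, n_trees=200, seed=0):
    """Mean increase in out-of-bag MAE after permuting each feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    increase = [0.0] * n_features
    used = 0
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap replicate
        in_bag = set(sample)
        oob = [i for i in range(n) if i not in in_bag]  # roughly one-third of records
        if not oob:
            continue
        predict = fit_stump([rows[i] for i in sample], [targets[i] for i in sample])
        base = sum(abs(targets[i] - predict(rows[i])) for i in oob) / len(oob)
        for j in range(n_features):
            shuffled = [rows[i][j] for i in oob]
            rng.shuffle(shuffled)  # break the link between feature j and the target
            err = 0.0
            for i, value in zip(oob, shuffled):
                row = list(rows[i])
                row[j] = value
                err += abs(targets[i] - predict(row))
            increase[j] += err / len(oob) - base
        used += 1
    return [total / max(used, 1) for total in increase]
```

A feature that the trees never use yields an increase of zero, while permuting a feature that drives the splits raises the out-of-bag error.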

Comparing the Performance of the Tree-Based Models with FLFArs
The tree-based models constructed in the previous stages, based on the entire dataset, were utilised for loss ratio estimation and comparison with the stage-damage function. For a meaningful comparison, all models should be derived from the same dataset [15]. Accordingly, the performance of the tree-based model was compared with a newly established multi-parameter flood loss model (FLFArs) [2], which has been derived from the same flood event data.
The results of the damage models have been compared with the following resampling procedure. First, 100 records are randomly drawn from the original dataset, and each model is applied to this random sample. Errors in the estimates from the aforementioned models relative to the actual values are evaluated by three error measures: mean absolute error (MAE), root mean square error (RMSE), and correlation coefficient. This step is then repeated 200 times, by which point the averaged errors have converged to stable values. Finally, the performance of the damage models is compared according to the converged values of the averaged errors (Figure 11).
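The resampling procedure above can be sketched as follows for a single model whose predictions have already been computed. Drawing each sample without replacement and using the Pearson coefficient as the correlation measure are assumptions, since the text does not specify either; the function names are illustrative.

```python
import math
import random


def error_measures(observed, predicted):
    """Return (MAE, RMSE, Pearson correlation) for two aligned sequences."""
    n = len(observed)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    # Correlation is undefined when either sequence is constant.
    r = cov / math.sqrt(var_o * var_p) if var_o > 0 and var_p > 0 else float("nan")
    return mae, rmse, r


def resampled_errors(observed, predicted, sample_size=100, repetitions=200, seed=0):
    """Average the three error measures over repeated random samples."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(repetitions):
        # Assumption: each sample of records is drawn without replacement.
        idx = rng.sample(range(len(observed)), sample_size)
        measures = error_measures([observed[i] for i in idx],
                                  [predicted[i] for i in idx])
        totals = [t + m for t, m in zip(totals, measures)]
    return tuple(t / repetitions for t in totals)
```

Running `resampled_errors` once per model on the same observed loss ratios gives directly comparable averaged MAE, RMSE, and correlation values.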

Regression Trees
Regression trees were created in different sizes. Figure 8 compares the various trees based on the cost error parameter. The largest tree was stopped with 19 terminal nodes (Figure 9). As stated before, trees should be complex enough to take advantage of information that increases predictive power, while simple enough to ignore random noise that does not enhance the accuracy of results [15]. Accordingly, after applying the tree pruning technique to all sizes of regression trees, the tree with 19 terminal nodes and the minimum error value (0.0652) was selected. In this tree, five predictors out of the 13 candidates were used and correlated with loss ratios. Table 3 shows how many times these predictors were used in decision nodes and how these parameters are correlated with loss ratios. A positive correlation means that the loss ratio increases as the candidate predictor increases and decreases as it decreases; a negative correlation means the reverse. Water depth is the most significant predictor, appearing in nine decision nodes and correlating positively with the loss ratio. This outcome is as expected, and accords with previous research [11,32]. After water depth, floor area (space area per person) is the most important influencing factor, correlating negatively with loss ratio. The space area may become important if the depth of water is greater than 64 cm. This result accords with the findings of Thieken et al. (2005) and Merz et al. (2013), who showed that the building loss ratio decreases if the total floor space of the building exceeds 139 m² or 120 m² [15,24]. However, in this study, the area of the building reduces the extent of losses if it exceeds 150 m² per person (Figure 9).
Another important factor that correlates negatively with the extent of losses is the precautionary measures. In the pruned tree with 19 leaves, the precautionary measures are important only for larger water depths (>177.5 cm). This outcome is opposite to the results of the studies in Germany, where the effects of the precautionary measures were significant only for shallow water depths [15,39]. This difference can be explained by the flood characteristics and the precaution measures considered. As stated, in this study, water depth was the most significant impact factor. On the other hand, the construction type (i.e., how much the first floor has been raised up) and the number of building storeys had the most influential effects on the weighting of the precautionary measures. Accordingly, when the flood depth is shallow and the hazard has little impact, these measures do not significantly affect the calculated extent of losses. However, when the impact of the flood (water depth) is considerable, precautionary measures will markedly reduce the extent of losses, either by substantially decreasing the water depth on the floor of the building or by protecting the building fabrics placed at higher levels.
As with precautionary measures, building quality has an inverse effect on the structural loss ratios if the water depth is greater than 177.5 cm. This accords with the above finding that water depth is the greatest influencing factor of the floods, and the resistance parameters are meaningful if the depth of water (hazard impact) is significant. The building value indicator was also present in three decision nodes of the right part of the tree. Nonetheless, its correlation with the loss ratio is not clear. In other words, on this dataset and in large flood depths, variation in the building value does not have a defined relationship with the trend of the loss ratio. This can be interpreted as a weak local correlation between this predictor and the loss ratio, or as an inherent uncertainty in the data.
Merz et al. (2013) showed that the effects of flow velocity and water contamination are significant only if the depth of water is shallow and the level of energy head is low [15,65]. Since in this study these predictors are reported simultaneously with large flood depths, they do not have a major effect on the extent of the damage. Other defined indicators such as emergency measures, flood experience, and socioeconomic status do not have an evident meaningful relationship with the loss ratios, although some parameters (e.g., water contamination, flow velocity and socioeconomic status) might be related to the loss ratios if an unpruned tree was grown on the dataset. As stated, although unpruned trees might have better performance on the original data, overfitting could degrade their performance on an independent dataset. Accordingly, the authors have not developed unpruned trees for this part of the study. Furthermore, due to the joint effects of parameters, the interaction of emergency measures should also be discussed in the context of warnings and alerts issued during the event.

Bagging Decision Trees
As mentioned earlier, the bagging decision tree is formed by making bootstrap replicates of the entire dataset and using each replicate for growing a new regression tree. This step was repeated up to 200 times until the average of the ensemble errors became stable. Afterwards, the feature importance and the ranking of the predictors were calculated based on the results achieved from random permutation. The resulting ranking of the predictors is water depth, space area per person, precautionary measures, building value, building quality and flow velocity (Figure 10). Other candidates show slight feature importance. This ranking is very similar to the results obtained from the regression trees (see Table 3).

Performance of the Applied Damage Models
In this part of the study, the performance of the tree-based models was compared with the FLFArs multi-parameter flood loss function. As mentioned before, both approaches (the tree-based models and the stage-damage function) were derived based on the same dataset.
To compare the performance of the tree-based models with FLFArs, 200 sets of 100 affected buildings were randomly drawn from the original dataset; each model was applied to every building record and the errors were calculated and averaged over all samples.
Results show that there is a distinct improvement in the tree-based models' performance over the FLFArs model, which is due to the consideration of more candidate predictors. Also, there is a small improvement in the performance of the bagging decision tree compared to the regression tree. The indicators of this improvement are the higher value of the correlation coefficients, the lower value of the errors, and the lower variation of the results. This improvement is due to the reduction in variance and the greater accuracy of the model (Figure 11). In Figure 11, MAE represents the average absolute deviation of the estimated ratios from the observed values and is a quantity used to measure how close the estimates are to the empirical data. The RMSE also expresses the variation of the estimated ratios from the observed ratios. It signifies the standard deviation of the differences between the modelled values and observed values [41,66].
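For reference, the two error measures described above can be written in their standard form. The notation is an assumption introduced here: $y_i$ is the observed loss ratio of building $i$, $\hat{y}_i$ is the corresponding model estimate, and $n$ is the number of buildings in a sample.

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right| ,
\qquad
\mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^{2} }
```

Because the differences are squared before averaging, RMSE penalises large individual deviations more heavily than MAE, so RMSE is never smaller than MAE on the same sample.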

Performance of the Applied Damage Models
In this part of the study, the performance of the tree-based models was compared with FLFA rs multi-parameter flood loss function. As mentioned before, both approaches (the tree-based models and the stage-damage function) were derived based on the same dataset.
To compare the performance of the tree-based models with FLFA rs , 200 sets of 100 affected buildings were randomly drawn from the original dataset; each model was applied to every building record and the errors were calculated and averaged over all samples.
Results show that there is a distinct improvement in the tree-based models' performance over the FLFA rs model, which is due to the consideration of more candidate predictors. Also, there is a small improvement in the fulfilment of the bagging decision tree compared to the regression tree. The metrics are the higher value of the correlation coefficients, the lower value of the errors, and the lower variation of the results. This improvement is due to the reduction in the variances of the dataset and the greater accuracy of the model (Figure 11). In Figure 11, MAE represents the average absolute deviation of the estimated ratios from the observed values and is a quantity used to measure how close the estimates are to the empirical data. The RMSE also expresses the variation of the estimated ratios from the observed ratios. It signifies the standard deviation of the differences between the modelled values and observed values [41,66].

Conclusions
Flood damage assessment is an important component of flood risk management, since inaccurate damage estimation wastes the effort, money, and resources of the organisations involved in risk mitigation. Most flood damage models propose simplified approaches based on the type or use of the elements at risk and the inundation depth. However, flood damage is a complicated process that depends on a variety of factors. Accordingly, traditional stage-damage functions are subject to significant uncertainties, since some influencing factors are usually neglected. If water depth is the only hydraulic factor considered, the models are not flexible enough to be transferred to and used in a new study area. On the other hand, multi-variable models are also subject to uncertainty, since each additional variable introduces an additional source of uncertainty. This study used a multi-variate statistical analysis to explore the interaction and effect of many influencing parameters on the extent of flood losses. To this end, tree-based approaches (regression trees and bagging decision trees) were applied to a dataset collected from the 2012 to 2013 flood events in Queensland. Previous studies have shown that tree-based models are very effective in identifying the significant damage-influencing parameters and their interactions with the extent of losses, since they can extract the local relevance of every predictor. Accordingly, this study has taken advantage of this approach.
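The difference between a single regression tree and a bagging decision tree can be illustrated with a deliberately reduced sketch. It assumes a single predictor (for example water depth) and depth-1 trees ("stumps"), whereas the models in this study use many predictors and deeper trees; the function names are hypothetical and the code is not the implementation used here.

```python
import random


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one predictor by minimising squared error."""
    mean_all = sum(ys) / len(ys)
    best_sse, best_split = float("inf"), None
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values,
    # so both sides of every candidate split contain at least one record.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if sse < best_sse:
            best_sse, best_split = sse, (threshold, left_mean, right_mean)
    if best_split is None:
        # All predictor values are identical: predict the overall mean.
        return lambda x: mean_all
    threshold, left_mean, right_mean = best_split
    return lambda x: left_mean if x <= threshold else right_mean


def fit_bagged(xs, ys, n_trees=50, seed=0):
    """Bagging: fit one stump per bootstrap resample and average the predictions."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of indices
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)
```

Because each stump sees a different bootstrap resample, the averaged prediction is less sensitive to individual records than any single tree, which is the variance-reduction effect usually attributed to bagging.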
The results for the Australian dataset show that water depth is the most significant predictor, correlating positively with the loss ratio. After water depth, floor space per person is the most important influencing factor, correlating negatively with the loss ratio. This predictor is substantial if the depth of water is greater than 64 cm and the area of the building exceeds 150 m² per person. Another important factor that correlates negatively with the extent of losses is the use of precautionary measures, which are important only for large flood depths (>177.5 cm). This outcome is opposite to the results of the studies in Germany, where the effects of precautionary measures were significant only for shallow water depths. As with precautionary measures, building quality has an inverse effect on the structural loss ratios if the water depth is greater than 177.5 cm. The building value indicator also appeared in three decision nodes of the tree; however, its correlation with the loss ratio is not specified. In this study area, water contamination and flow velocity were not correlated with the loss ratios. It has also been shown that socioeconomic status does not play a fundamental role in flood loss mitigation in the study areas. As the results of the tree-based approaches show, the following damage-influencing parameters are important: water depth, floor space per person, precautionary measures, building value, and building quality. The high importance of water depth is in accordance with traditional stage-damage functions. However, to the best of our knowledge, the influences of the other parameters have not been studied comprehensively for flood damage assessment in Australia.
Finally, the performance of the tree-based models was compared with the outcomes of a newly established multi-parameter flood loss function (FLFA rs ) from Australia. Because they consider more parameters, the tree-based models estimate the extent of losses more accurately. However, the evaluation of model performance in this paper is based on random samples that are not independent of the data used for model development. Hence, the comparison does not give information about the transferability of the models.
Accordingly, it is recommended that further development of Australian flood damage models consider more candidate predictors (especially the important parameters identified in this study) and take advantage of tree-based models. Further research will examine a more comprehensive dataset to explore the significance of other influencing factors (e.g., return period, long-duration flooding, sediment loading, and early warning) and will use an independent dataset to evaluate the transferability of the tree-based models in time and space.