Assessment of Business Interruption of Flood-Affected Companies Using Random Forests

Abstract: Losses due to floods have dramatically increased over the past decades, and losses of companies, comprising direct and indirect losses, have a large share of the total economic losses. Thus, there is an urgent need to gain more quantitative knowledge about flood losses, particularly losses caused by business interruption, in order to mitigate the economic loss of companies. However, business interruption caused by floods is rarely assessed because of a lack of sufficiently detailed data. A survey was undertaken to explore processes influencing business interruption, which collected information on 557 companies affected by the severe flood in June 2013 in Germany. Based on this data set, the study aims to assess the business interruption of directly affected companies by means of a Random Forests model. Variables that influence the duration and costs of business interruption were identified by the variable importance measures of Random Forests. Additionally, Random Forest-based models were developed and tested for their capacity to estimate business interruption duration and associated costs. The water level was found to be the most important variable influencing the duration of business interruption. Other important variables, relating to the estimation of business interruption duration, are the warning time, perceived danger of flood recurrence and inundation duration. In contrast, the amount of business interruption costs is strongly influenced by the size of the company, as assessed by the number of employees, emergency measures undertaken by the company and the fraction of customers within a 50 km radius. These results provide useful information and methods for companies to mitigate their losses from business interruption. However, the heterogeneity of companies is relatively high, and sector-specific analyses were not possible due to the small sample size. Therefore, further sector-specific analyses on the basis of more flood loss data of companies are recommended.


Introduction
Losses due to floods have dramatically increased over the past decades and amount to an estimated global annual average loss of US $104 billion [1,2]. At about 50%, flood events account for the highest share of economic losses due to natural hazards in Germany during the last six decades [3]. The losses of companies, including losses due to business interruption, contribute a large portion of these losses [4]. To mitigate flood losses, continuous improvement in flood risk management is necessary.
For an optimal allocation of funds, cost-benefit analyses are performed, of which a central part is the estimation of benefits, i.e., averted flood losses. Cost-benefit analyses that exclude certain loss categories, such as losses due to business interruption, lead to sub-optimal decisions [5]. Thus, quantitative knowledge about processes that determine business interruption, as well as models to estimate business interruption time and resultant losses, are necessary. However, even though business interruption losses are expected to exceed the direct flood losses of companies [6,7], little is known about them, and only a few models for estimating the duration of, or losses due to, business interruption are available.
The objective of this study is to improve our understanding of the processes during floods that lead to business interruption and resultant losses in Germany. That is, to identify the most important variables determining business interruption duration and costs. On this basis, multivariable models for the estimation of business interruption duration and costs are developed and validated.

Literature Review
Losses due to business interruption occur in industry and commerce, in areas that are directly affected by floods. They occur due to the immediate flood impact, yet they result not only from physical contact between the floodwater and assets but also from the interruption of business processes, which often lasts much longer than the direct impact of the flood [8]. For instance, business interruptions take place if employees are not able to do their job because their workplace is destroyed and they do not have access to an alternative working site. Business interruption losses are sometimes referred to as direct damage, as they occur due to the immediate impact of the hazard (see e.g., [9,10]), but are also sometimes referred to as primary indirect damage because the losses do not result from physical damage to property but from the interruption of economic processes (e.g., [11]). However, the models used to estimate losses due to business interruption differ from those used for either direct or indirect damage.
Common business interruption losses refer to the loss of revenue from the reduction of the flow of services, economic output or profit [6]. However, business interruption losses may be modelled as losses to stocks, e.g., when they are calculated as a fixed ratio of property losses, as conducted by the models of Anuflood [12] and Rapid Appraisal Method (RAM) [13], and as losses to flows. Stocks refer to a quantity of, e.g., money at a single point in time. Flows are defined as the outputs or services of stocks over time [14]. In most models, business interruption losses are estimated as losses of flows for a certain period of time. As a measure of the sum of flows in a company, the value added is often used [15].
Various variables are taken into account by existing models to define the susceptibility of production processes to floods. Impact variables considered are the water level (e.g., [16][17][18]), flood duration (e.g., [17,18]), and return period (e.g., [19,20]). For instance, the model in [20] uses four classes of return period; e.g., floods with a return period of 50 years are estimated to lead to a business interruption of two months. Resistance variables taken into account are differences in the economic sectors [17][18][19], number of employees [16] and the value added [18,20]. For example, the model of [18] distinguishes 16 different manufacturing branches and 17 different branches in retail, distribution, office and leisure services. Unfortunately, in many studies, it remains rather unclear on which basis these variables and their quantitative influence on the business interruption were identified and quantified.
In recent years, several studies have demonstrated that machine learning approaches, particularly tree-based methods, perform well in determining flood damage-influencing factors, achieving a more precise description of the damage processes. For instance, [21] applied bagging decision trees and regression trees to quantify the importance of various factors for the amount of flood damage to residential buildings. The study in [22] used Random Forests to calculate the variable importance for the damage estimation of various company sectors and assets. Both claimed that tree-based models are particularly suitable for the analysis of flood damage processes, as they are able to capture nonlinear and non-monotonic dependencies between predictor and response variables, and they take interactions between the predictors into account. Thus, in this study, we also used Random Forests to identify the most important variables determining business interruption duration and costs.
Three main approaches to estimating losses due to business interruption can be distinguished [8]: (1) Applying sector-specific reference values, e.g., loss of added value per employee and day [19]; (2) comparisons of production output between flood and non-flood years (e.g., [23]); and (3) approaches that calculate production losses using a fixed share of direct damages (e.g., [12,13]). The latter two approaches are rather coarse and involve more uncertainties than the first. They are therefore particularly useful for rapid assessments in the case of, for example, emergency planning and budgeting [8].
Examples of the first approach are the following. The authors of [16] present a probabilistic model for estimating the business interruption loss of industrial sectors caused by urban flooding in Japan. They use functional fragility curves and accelerated failure time models to estimate the extent of damage to production capacity and production-affected time, including stagnation and recovery time. The study in [24] developed a semi-quantitative framework to assess the entrepreneurial and regional-economic flood impacts of one specific production facility. Their approach relies mainly on quantitative flood hazard modelling, resulting in detailed inundation area and water level maps for the commercial area, as well as a rather qualitative vulnerability assessment based on co-development with the company. The US model, HAZUS, calculates monetary business interruption losses by summing up relocation expenses, which "include the cost of shifting and transferring, and the rental of temporary space", rental income losses and capital-related losses, which reflect the income losses of the proprietor of the company [17]. All these loss types depend on business interruption duration, as well as on several "cost per day and area" factors, such as the proprietors' income loss per day and square foot. These examples illustrate the conclusion of [8], i.e., that business interruption loss models are diverse, usually rather simplistic, and often non-transparent and unvalidated.

Description of Flood in June 2013
In June 2013, large-scale flooding occurred in many Central European countries, i.e., in Switzerland, Austria, the Czech Republic, Slovakia, Poland, Hungary, Croatia, Serbia and, particularly, in Germany, where almost all main river basins were affected [25][26][27].
The event of 2013 was especially characterized by extraordinarily high antecedent moisture. During the second half of May 2013, exceptional rainfall amounts, due to a quasi-stationary upper-level trough over Central Europe, had been witnessed. This circulation pattern triggered a sequence of surface lows on its eastern side that repeatedly transported warm and humid air from South-eastern Europe to Central Europe [27]. By the end of May, rainfall was at 200% of the average monthly amount  in large areas of Germany. Regionally, more than 300% was reached [28]. The intense and widespread precipitation that finally triggered the June 2013 flood occurred at the end of May and beginning of June. It was caused by a cut-off low that slowly moved, with its center, from France (29th May) over Northern Italy (30th May) to Eastern Europe (1st June). Overall, a combination of large-scale lifting, orographically-induced lifting and embedded convection resulted in persistent and widespread rainfall. The most intense precipitation occurred in the Danube catchment in the alpine areas of Southern Bavaria and Northern Austria [27]. For example, at the weather station Aschau-Stein of the German Weather Service in the Chiemgau Alps, a rainfall total of 405.1 mm within 96 h (the 30th of May to the 2nd of June) was registered [28].
The spatially extended and intense (but not extraordinarily intense) precipitation from the end of May until the beginning of June, in combination with the high antecedent catchment wetness, was the main driver of the June 2013 flood [26,27]. Severe flooding occurred, especially along the Elbe River and its tributaries, Saale and Mulde, in the federal states of Saxony, Thuringia and Saxony-Anhalt, and along the Danube River in the federal state of Bavaria. Return periods of peak discharge exceeded 100 years at many gauges along the Elbe River, from Dresden to Lenzen, as well as in the Mulde and Saale catchments. Along a reach of 350 km down the Elbe River, between Coswig and the weir at Geesthacht, as well as down the Saale river, record-breaking water levels were registered [26,29]. In the Danube catchment, return periods of discharges of more than 100 years were observed along the Danube river, downstream of Regensburg and along the rivers Inn and Salzach. In Passau, the highest water level since 1501, due to the superposition of the flood waves from the Inn and Danube rivers, was observed [30]. Using an adapted method from [31], which determines and assesses large-scale flooding based on discharge data from 162 gauges from all over the country, the flood of June 2013 can be regarded, in hydrological terms, as the most severe flood in Germany for at least the past 60 years [26]. At several locations, embankments were unable to withstand the floodwater, resulting in dike breaches and the inundation of the hinterland, e.g., 5 breaches in the Saxon part of the river Elbe and 24 failures along the river Mulde [31]. Three of these breaches had dramatic dimensions, with large-scale inundations: near Deggendorf at the Danube River, near Groß Rosenburg at the confluence of the Saale and Elbe rivers and near Fischbeck at the Elbe River [26].
As a result, 12 out of the 16 federal states were affected by the flood, of which 8 declared a state of emergency (see Figure 1 for a geographic overview). The most affected federal states, where together more than 90% of the economic losses occurred, were Saxony, Saxony-Anhalt, Bavaria and Thuringia [4]. The flood caused 14 fatalities, 128 people were injured, 600,000 people were affected and the total direct losses amounted up to EUR 8 billion [4]. The German insurance industry paid EUR 1.65 billion in compensation [32].

Description of the Company Survey
The data result from a survey, conducted after the June 2013 flood in Germany [4]. Computer Aided Telephone Interviews (CATI) were carried out approximately one year after the flood, between May and June 2014, by a pollster (SOKO Institute, Bielefeld). In total, 557 interviews were taken on the basis of lists of affected streets of the whole flood-affected area (see Figure 1). These lists were compiled on the basis of information from affected districts or municipalities, flood reports and press releases, as well as with the help of flood masks derived from satellite data (DLR, Center for Satellite Based Crisis information, https://www.zki.dlr.de/). The telephone numbers were generally retrieved from the commercial telephone directory (yellow pages), and all researched telephone numbers were contacted. About 90 questions with the following topics were asked in the survey: Flood impact parameters (e.g., contamination and water level), early warning, emergency measures, precautionary measures, company characteristics, flood damage (direct losses and business interruption), and flood experience. As not all questions were applicable in all cases and, for many questions, lists of possible answers were given (with either a single answer or multiple answers possible), the interviews took only 34 minutes on average. At the beginning of the telephone call, the person on the phone was asked who in the company has the best knowledge about the flood event and the incurred losses. Then, the interview was undertaken with this person, and, in most cases, this was a member of the management board. In total, 557 interviews were completed. For further details about the survey and the data processing, see [4,33,34].

Description of Collected Data and Developed Indicators
The 13 variables, used as potential predictors for business interruption duration or cost (predictor variables) and the two response variables, which were all derived from the data set, are shown in Table 1. They were selected according to their potential to influence company flood damage, as indicated in previous studies [4,15,22,33].
Flood impact variables are the water level, the contamination indicator and the inundation duration. The water level and the inundation duration were reported by the interviewees in cm above ground and in hours or days, respectively. The contamination indicator is the weighted sum of contaminants, such as heating oil, sewage water and chemical substances. These contaminants are weighted according to their damage potential [34]. The perceived danger of another disastrous flood event was rated by the interviewees on a rank scale from 1 (=very unlikely) to 6 (=very likely).
Variables that describe the companies' precaution are the adaptation ratio and the availability of flood insurance. These variables were derived from questions about the long-term precautionary measures the company had undertaken before the 2013 flood event (checklist with different measures, e.g., the availability of flood insurance, adapted use of the flood-prone area, with multiple possible answers). The measures entitled "adapted use of flood-prone area", "relocation of susceptible equipment" and "relocation of dangerous substances" are classified as adaptation measures. The adaptation ratio corresponds to the share of the implemented measures compared to all of the relevant or possible measures for the specific company. The relevance of the respective measures was a question posed to the interviewees. For more details, see [22].
The warning-related variables are the warning lead time and the emergency indicator. The warning lead time is the time in hours or days between the time when the company became aware of the upcoming flood (due to, e.g., an official warning, warning by employees, own observation) and the time when the inundation of the business premises occurred. The emergency indicator counts the emergency measures undertaken before and during the flood.
Eight different measures (e.g., the availability of an emergency plan, conducting emergency exercises every year, installation of water barriers and installation of water pumps, as indicated in [22]) were named in the surveys, and a maximum of four were conducted. These measures can also be classified into (and should be part of) a so-called Business Continuity Strategy (BCS). The BCS is a comprehensive framework on disaster preparedness, response and recovery, which aims to ensure the continuity of business in the case of any form of internal or external impacts of catastrophic events, such as technological, man-made or natural disasters.
The variables that describe the company characteristics are the sector, the number of employees, the spatial situation, and the share of suppliers and customers within a 50 km radius. The sectors were assigned, according to Nomenclature statistique des activités économiques dans la Communauté européenne (NACE) Rev. 2 [35], into the following classes: Agricultural sector, manufacturing sector, commercial sector, financial sector and service sector. The agricultural sector could not be used for the analyses because only few observations were made. The variable spatial situation describes the company site, e.g., the business premises with more than one building and less than one floor in an externally used building. The shares of suppliers and customers within a 50 km radius were asked to get information about the local interdependence of companies within the affected region. The business interruption duration and the monetary business interruption damage were given by the interviewees.
Some basic analyses and plots are made to describe the correlation structures in the data set, i.e., scatter plots with linear correlation and correlation matrix based on Spearman's rank correlation coefficient. A correlation matrix can be used to depict significant and insignificant correlations between the variables to facilitate further analysis [21,22,36].
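As an illustration of the rank-based measure mentioned above, the following minimal Python sketch computes Spearman's rank correlation coefficient as the Pearson correlation of average ranks. The function names and the five data points are hypothetical and are not taken from the survey; the significance test used for the correlation matrix is not covered here.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical observations (not survey data): water level and interruption duration.
water_level_cm = [10, 40, 80, 150, 300]
duration_days = [2, 5, 4, 30, 400]
rho = spearman(water_level_cm, duration_days)
```

Because only the ranks enter the calculation, the extreme duration of 400 days has no more influence on rho than any other largest value would, which is why the measure is comparatively robust to outliers.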

Random Forests
As already introduced in Section 2, previous studies showed the suitability of the application of tree-based models in flood damage modeling [21,22,37,38]. This paper gives only a brief overview of the functionality of Random Forests, and we refer to [39] for a more in-depth introduction of the method.
A Random Forest is, when used for regression, an ensemble of many regression trees which are organized into different nodes, namely, root nodes, split nodes and leaf nodes. The purpose of the trees is to subdivide a data set into less heterogeneous subsets, with regard to a response variable, by means of predictor variables. The data set is subdivided at the split nodes until a stop criterion is fulfilled, and the stop criterions vary with different algorithms. Hence, the data set chunks end up in leaf nodes containing only data points whose variables meet the threshold values of the split nodes of the tree. A prediction of a single tree for a new data point is usually given by the mean value of all data points present in the leaf node, in which the new data point ends up. In this case, the prediction of the forest for a new data point is the mean of the predictions made by the single trees.
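The prediction step described above can be sketched in a few lines of Python. This is only an illustration of how a new data point travels through split nodes to a leaf and how the tree predictions are averaged; the `Node` class, the feature names and the two hand-built trees are hypothetical, and no tree-growing algorithm is shown.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Node:
    # Split node: `feature` and `threshold` are set and `leaf_values` is None.
    # Leaf node: `leaf_values` holds the training responses that reached the leaf.
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # branch taken when value <= threshold
    right: Optional["Node"] = None   # branch taken when value > threshold
    leaf_values: Optional[list] = None


def predict_tree(node, x):
    """Follow the splits until a leaf is reached, then return the leaf mean."""
    while node.leaf_values is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return mean(node.leaf_values)


def predict_forest(trees, x):
    """The forest prediction is the mean of the single-tree predictions."""
    return mean(predict_tree(tree, x) for tree in trees)


# Two hypothetical trees with one split each (values are made up).
tree_a = Node(feature="wl", threshold=50,
              left=Node(leaf_values=[2, 4]), right=Node(leaf_values=[20, 40]))
tree_b = Node(feature="employees", threshold=10,
              left=Node(leaf_values=[1, 3]), right=Node(leaf_values=[10, 30]))

new_point = {"wl": 120, "employees": 5}
prediction = predict_forest([tree_a, tree_b], new_point)  # (30 + 2) / 2 = 16
```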
Random Forest algorithms apply an internal bagging to split the input data set into two samples for the construction of single trees. One sample usually consists of about two thirds of the input data set. This sample is used for the construction of the tree. The remaining third of the input data set is called Out-of-Bag observations (OOB). The OOB observations are used internally to estimate the accuracy of the resultant model.
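The in-bag and Out-of-Bag split can be illustrated with a short Python sketch. It assumes classic bootstrap sampling with replacement, which leaves roughly 63% of the observations in the bag and roughly 37% out of it; whether the software used in this study draws with or without replacement is not stated in the text, and the function name and seed are hypothetical.

```python
import random


def bootstrap_split(n, rng):
    """Draw n indices with replacement; indices never drawn form the OOB set."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob


rng = random.Random(2013)                 # fixed seed for reproducibility
in_bag, oob = bootstrap_split(557, rng)   # 557 = number of surveyed companies
oob_fraction = len(oob) / 557             # close to one third on average
```

Each tree would be grown on its own in-bag sample, so every observation is Out-of-Bag for a subset of the trees and can be used to assess those trees without touching their training data.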
The Classification and Regression Tree (CART) algorithm is the most widely used algorithm to construct a Random Forest. Some studies, however, recognized a bias, with respect to variable selection, toward variables with different scales and many possible splits within the CART algorithm [40][41][42][43][44][45]. Hence, the Conditional Inference Tree (CIT) algorithm was developed to overcome this bias and improve the interpretability of the trees [46].
The CIT algorithm is used in this study to model the impacts of business interruption, since the data sets used contain variables with different scales and many possibilities of splitting. The data analysis was conducted with the statistical programming language and environment "R" (R Foundation, Vienna, Austria, version 3.3.3) [47]. The package "party" (R Foundation, Vienna, Austria, version 1.2) was used to compute the Random Forests [45,46,48]. Each Random Forest consists of 1000 trees (ntree), and 3 variables were randomly chosen as candidate variables at each node for splitting (mtry). Each leaf node consists of at least 7 observations. In this study, the number of predictor variables used to grow the Random Forest is 13. These variables are listed in Table 1. Splitting of the data set for model validation is described in Section 3.2.4.

Stage-Damage-Function
For method comparison, stage damage functions (SDF) are also used to predict business interruption costs and duration. SDFs use the water level as the only predictor variable. In Germany, stage-damage functions, in the form of square-root functions, are widely used to estimate the damage to companies [49].
$$D = a + b\sqrt{h}$$
where
D: the damage (either business interruption costs or business interruption duration)
a, b: parameters (the subscript indicates the related case)
h: water level above the ground surface in cm
The parameters a and b of the square-root function are fitted to the respective training data set, which is also used to train the Random Forest. For business interruption costs, the fitting resulted in the parameter values a_C = −121795.4 and b_C = 293649, and, for business interruption duration, the parameter values amount to a_D = 7.41 and b_D = 46.42.
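A minimal Python sketch of fitting such a function is given below. It assumes the square-root form D = a + b·sqrt(h) and an ordinary least-squares fit; neither the exact functional form nor the fitting procedure is confirmed by the text, and the function names and the four synthetic data points are hypothetical.

```python
from math import sqrt


def fit_sdf(levels_cm, damages):
    """Least-squares fit of D = a + b * sqrt(h) (assumed form); returns (a, b)."""
    roots = [sqrt(h) for h in levels_cm]
    n = len(roots)
    mean_root = sum(roots) / n
    mean_damage = sum(damages) / n
    # Simple linear regression of the damage on sqrt(h).
    b = (sum((r - mean_root) * (d - mean_damage) for r, d in zip(roots, damages))
         / sum((r - mean_root) ** 2 for r in roots))
    a = mean_damage - b * mean_root
    return a, b


def sdf(h_cm, a, b):
    """Evaluate the assumed stage-damage function at water level h_cm."""
    return a + b * sqrt(h_cm)


# Synthetic training points generated from a = 2 and b = 3 (not survey data).
levels = [0, 25, 100, 225]
damages = [2, 17, 32, 47]
a_fit, b_fit = fit_sdf(levels, damages)
estimate = sdf(400, a_fit, b_fit)   # 2 + 3 * 20 = 62 for these synthetic points
```

Because the function is linear in sqrt(h), the fit reduces to a simple linear regression, which keeps the comparison model deliberately simple next to the multivariable Random Forest.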

Variable Importance
In this study, we also use Random Forests to assess the individual relevance of a predictor variable from a set of input predictor variables in order to estimate business interruption costs and duration. Random Forests estimate the so-called variable importance by randomly permuting the predictor variable values to simulate the absence of the respective variable [22]. Subsequently, by comparing the OOB prediction accuracy resulting from the predictions based on original and permuted values, the importance of the particular predictor variable is derived [22]. In other words, the amount that the prediction accuracy decreases as a consequence of the permutation of the predictor variable values is used as a measure for variable importance.
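The permutation idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it works with any fitted prediction function and a single held-out data set, whereas the Random Forest procedure described above uses the OOB observations of each tree; the function names, the toy model and the data are hypothetical.

```python
import random


def mse(predict, rows, targets):
    """Mean squared error of a prediction function on a data set."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, feature, rng):
    """Increase in error after shuffling one predictor across the observations."""
    baseline = mse(predict, rows, targets)
    shuffled = [r[feature] for r in rows]
    rng.shuffle(shuffled)
    # Copy each row and overwrite only the permuted predictor.
    permuted_rows = [{**r, feature: v} for r, v in zip(rows, shuffled)]
    return mse(predict, permuted_rows, targets) - baseline


# Toy model that depends on the water level only (hypothetical).
model = lambda r: 2 * r["wl"]
rows = [{"wl": wl, "emp": emp}
        for wl, emp in [(10, 3), (40, 50), (80, 7), (150, 120), (300, 12), (20, 1)]]
targets = [2 * r["wl"] for r in rows]

rng = random.Random(0)
importance_wl = permutation_importance(model, rows, targets, "wl", rng)
importance_emp = permutation_importance(model, rows, targets, "emp", rng)
```

Shuffling a predictor that the model ignores (here `emp`) leaves the error unchanged, so its importance is zero, while shuffling a predictor the model relies on can only degrade this perfectly fitting toy model.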

Model Validation
A splitting of the input data sets is applied to allow for a comparison between the model results. The sampling method used is the Jackknife, which was developed to assess the stability of estimates [50]. Of the input data set, 75% is used for the training of the Random Forest and the SDFs, while the remaining 25% serves as a basis for the validation of both models using three different error measures:
The Mean Absolute Error (MAE):
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|est_i - obs_i\right|$$
The Root Mean Square Error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(est_i - obs_i\right)^2}$$
The Mean Bias Error (MBE):
$$\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}\left(est_i - obs_i\right)$$
where
est: estimated value
obs: observed value
n: number of observations
The MAE measures the mean absolute deviation of the predicted values from the observed values, the RMSE is the square root of the average of the squared errors, and the MBE captures systematic overestimation or underestimation by the models.
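The validation step can be sketched in Python as follows, assuming the standard definitions of the three error measures with the error taken as estimated minus observed value (so that a positive MBE indicates overestimation). The function names, the seed and the three example values are hypothetical; the random 75/25 split shown here is a simple illustration and not necessarily the exact resampling scheme of the study.

```python
import random
from math import sqrt


def mae(est, obs):
    """Mean Absolute Error."""
    return sum(abs(e - o) for e, o in zip(est, obs)) / len(obs)


def rmse(est, obs):
    """Root Mean Square Error."""
    return sqrt(sum((e - o) ** 2 for e, o in zip(est, obs)) / len(obs))


def mbe(est, obs):
    """Mean Bias Error; positive values indicate overestimation."""
    return sum(e - o for e, o in zip(est, obs)) / len(obs)


def train_validation_split(n, train_fraction, rng):
    """Shuffle the indices 0..n-1 and cut them into training and validation parts."""
    indices = list(range(n))
    rng.shuffle(indices)
    cut = int(n * train_fraction)
    return indices[:cut], indices[cut:]


train_idx, valid_idx = train_validation_split(557, 0.75, random.Random(1))

# Made-up estimates and observations to show the three measures.
est = [3, 5, 10]
obs = [1, 5, 14]
errors = (mae(est, obs), rmse(est, obs), mbe(est, obs))
```

In this toy example, the errors of +2 and −4 partly cancel in the MBE but not in the MAE or RMSE, which is why the MBE is read as a measure of systematic bias rather than of accuracy.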
An exemplary single tree of a random forest, with the response variable business interruption cost, is shown in Figure 2. This tree consists of one root node, two split nodes or decision nodes and four leaf nodes, with a minimum of 8 observations.
Water 2018, 10, x 9 of 16

Correlations with Business Interruption
The scatter plot in Figure 3 shows the linear correlation between business interruption cost bic and business interruption duration bid in the observation data. The low value of the coefficient of determination R² (0.0206) indicates a relatively weak linear relationship between these two variables. Additionally, the influence of outliers seems to be high. This can be explained by the high heterogeneity of companies with respect to business processes and volume, size and sector. It is quite understandable that, e.g., one week of business interruption will lead to much higher costs for a manufacturing company with 100 employees than for a service company with 2 employees. A previous analysis revealed significant differences between the sectors in nearly all phases of flood management [51]. For instance, the manufacturing sector was shown to have comparatively the best preparedness and precaution status but, due to the high assets and business volumes, also the highest total direct damage [51]. The correlation matrix of the 15 variables, comprising the 13 predictor and the two response variables, as described in Table 1, is shown in Figure 4. It is based on Spearman's rank correlation, which is nonparametric, relatively robust to outliers and does not rely on the linearity of the statistical dependence of the variables involved. Each color of the correlation matrix represents a correlation coefficient interval with a size of 0.2. The sizes of the colored boxes indicate the strength of the respective correlation. The white empty boxes indicate a non-significant correlation between variables. The correlation coefficients are relatively weak and range from −0.41 to 0.46, and a large number of coefficients are close to zero. Despite the heterogeneity of the companies, business interruption cost bic has the highest positive and significant correlation with the business interruption duration bid (0.46).
Moreover, business interruption cost is also significantly correlated with the company size (0.39), water level wl (0.28), inundation duration d (0.2), perceived danger of flood event recurrence pror (0.14) and the emergency indicator emeri (0.15). In contrast, the variables customers within a 50 km radius c50 (−0.24), spatial situation spats (−0.22) and sector sec (−0.18) have a negative correlation with the business interruption cost.
The water level wl shows the highest positive and significant correlation with the business interruption duration bid (0.35). The inundation duration d (0.31), contamination coni (0.15), perceived danger of flood event recurrence pror (0.16) and warning time wt (0.17) positively and significantly correlate with the business interruption duration bid. The only significant negative correlation with business interruption duration bid is given by the company size variable (−0.18).
The scatter plots of business interruption costs bic in relation to the water level wl are shown in Figure 5a, and business interruption duration bid in relation to the water level wl is shown in Figure 5b. The water level wl shows relatively high correlations with business interruption costs bic and duration bid (Figure 3), and it is found to be the most important variable for flood damage in a variety of studies [15,21,22,52,53]. The value of the response variable (y-axis) increases with the increase of the water level (x-axis) in both cases. In comparison, as already indicated by the Spearman correlation, the water level seems to be more important for determining the business interruption duration than for determining business interruption costs (Figures 4 and 5). However, according to the values of the coefficient of determination R², the linear correlations are rather weak, which is probably due in part to the high influence of outliers.

Important Variables Determining Business Interruption
In general, correlation coefficients can only reflect pairwise and monotonic relationships between variables, whereas Random Forests are capable of assessing non-monotonic and multivariate relationships. In other words, variable importance measures based on Random Forests can capture influences of variables on business interruption-related flood consequences that are not detected by traditional correlation coefficients [22]. Additionally, it has been shown before that multivariate algorithms are better suited to describing the complex damage processes during (and after) flooding [21,22,39]. Therefore, the variable importance measure based on a Random Forest is applied, and the results are used for the development of the Random Forest model.
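One common way to quantify variable importance is to permute a single predictor and record how much the prediction error increases. The sketch below shows this idea in a model-agnostic form. It is an illustration only: the stand-in model is a fixed toy function and not a Random Forest, the data are hypothetical, and whether the importance measure used in this study corresponds exactly to this variant is an assumption.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of model predictions against targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_features, repeats=20, seed=0):
    """Mean increase in MSE when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(model, shuffled, targets) - baseline)
        importances.append(sum(increases) / repeats)
    return importances


def toy_model(row):
    """Stand-in predictor (hypothetical): duration depends on water level only."""
    water_level_cm, n_employees = row
    return 0.5 * water_level_cm


# Hypothetical rows: (water level in cm, number of employees).
rows = [(10, 5), (50, 200), (120, 12), (200, 40), (80, 3)]
targets = [toy_model(r) for r in rows]

imp = permutation_importance(toy_model, rows, targets, n_features=2)
print(imp)  # first entry positive, second entry 0.0
```

In an actual analysis, `toy_model` would be replaced by the prediction function of the fitted forest, and the error would be evaluated on data not used for fitting.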
The variable importance analysis based on the Random Forest algorithm reveals that the company size is by far the most important variable for predicting business interruption costs bic (Figure 6). This result confirms the findings of [54], wherein the size of the company was found to be one of the major predictors for estimating business interruption after the Nisqually earthquake in 2001 in Seattle, USA. The variable emergency indicator, which is positively correlated with the size of the company (Figure 4), also seems to be of considerable relevance for predicting business interruption costs. Other variables, namely the water level wl and the number of customers/suppliers within a radius of 50 km c50/s50, are far less important for determining business interruption costs. Since the water level wl does not seem very important for determining business interruption costs bic, it appears problematic to use the water level as a predictor for business interruption costs, e.g., using stage-damage functions. The most important variable for predicting business interruption duration bid is the water level wl (see Figure 6). Many previous studies likewise designate the water level as the most important variable for the estimation of direct flood damage as well as business interruption [15–18,21,22,53,54]. Thus, in this case, stage-damage functions might be useful tools for estimating business interruption duration. The variables perceived danger of flood event recurrence and insurance coverage are also relatively important for predicting business interruption duration.
A normalization of the variables bic and bid could enable a better comparison between different companies. For example, the variable bic could be normalized with the annual turnover of the companies. However, these kinds of data were not available for this study. Future studies, especially with a focus on modelling, should consider a normalization of the response variables.
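The normalization suggested above can be sketched as follows. The company records, field names and turnover values are hypothetical, since such data were not available in this study; the function simply expresses costs as a share of annual turnover and skips companies without usable turnover information.

```python
def normalized_bic(bic_eur, annual_turnover_eur):
    """Return interruption costs as a fraction of annual turnover.

    Returns None if the turnover is missing or not positive.
    """
    if annual_turnover_eur is None or annual_turnover_eur <= 0:
        return None
    return bic_eur / annual_turnover_eur


# Hypothetical companies (values are illustrative, not survey data).
companies = [
    {"name": "small workshop", "bic": 15_000, "turnover": 300_000},
    {"name": "large plant", "bic": 150_000, "turnover": 30_000_000},
    {"name": "unknown turnover", "bic": 20_000, "turnover": None},
]

shares = {c["name"]: normalized_bic(c["bic"], c["turnover"]) for c in companies}
for name, share in shares.items():
    print(name, share)
```

In this illustrative case, the large plant reports ten times the absolute costs of the small workshop but loses a ten times smaller share of its turnover, which is the kind of comparison that absolute costs alone cannot provide.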

Models Estimating Business Interruption
The validation results of the Random Forest model and the SDF reveal very high errors and hardly any difference between the two models (Figure 7). The validation of the Random Forest model for predicting business interruption costs resulted in a median MAE of EUR 222,515, a median MBE of EUR 2675 and a median RMSE of EUR 617,010 (Figure 7). The error measures of the SDF predicting business interruption costs were slightly worse, with a median MAE of EUR 275,756, a median MBE of EUR 18,050 and a median RMSE of EUR 683,185. The validation of both models predicting business interruption duration revealed almost no difference between the Random Forest model and the SDF with respect to error statistics: the median MAEs were 54.08 days and 54.67 days, the median MBEs were −0.01 days and 0.04 days and the median RMSEs were 78.73 days and 78.38 days, respectively (Figure 7). Compared with the empirical median and mean business interruption costs (EUR 15,000 and EUR 173,356) and durations (15 days and 54 days) of all companies in our sample, it must be concluded that neither model is able to provide reasonably accurate estimates.
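For reference, the three error measures reported above can be computed as in the following minimal sketch. The sign convention for the MBE (predicted minus observed) is an assumption, and the observed and predicted durations are hypothetical values chosen only to show how the measures behave.

```python
import math


def error_metrics(observed, predicted):
    """Return MAE, MBE and RMSE for paired observations and predictions.

    Errors are defined as predicted minus observed (assumed convention),
    so a positive MBE indicates overestimation on average.
    """
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mbe = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {"MAE": mae, "MBE": mbe, "RMSE": rmse}


# Hypothetical interruption durations in days.
observed = [5, 15, 30, 200]
predicted = [40, 50, 45, 60]

m = error_metrics(observed, predicted)
print(m["MAE"])             # 56.25
print(m["MBE"])             # -13.75
print(round(m["RMSE"], 2))  # 74.62
```

Because over- and underestimates cancel in the MBE, a value close to zero can coexist with a large MAE and RMSE, as is the case for the duration models above. The RMSE exceeds the MAE whenever the error magnitudes differ, since it weights large individual errors more heavily.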

Conclusions
This study improves the understanding of flood-related business interruption duration and costs in Germany, based on analyses of empirical data from 557 companies affected by the June 2013 flood. Due to the high heterogeneity, with respect to the business processes and volume, as well as the size and sector, of the companies, there is only a relatively weak linear relationship between business interruption costs and duration. The processes leading to business interruption costs and duration are complex, with various variables influencing these flood consequences.
The results of the variable importance identification show that water depth is the most important flood impact parameter for the magnitude of business interruption duration, whereas business interruption costs are mainly driven by the size of the affected company. That is to say, business interruption duration seems to be mainly driven by the hazard severity characteristics, and business interruption costs appear to be rather determined by company characteristics. We could further show that other variables, such as the perceived danger of flood event recurrence, insurance coverage and emergency indicators, can also have a significant influence on business interruption-related flood consequences.
Neither the SDF nor the developed Random Forest-based loss model is able to estimate business interruption costs or duration with reasonable accuracy. Thus, the data analyses and estimation attempts can only partly explain the effects that determine business interruption duration and resultant costs, although the empirical data used are unique with regard to both the data volume and level of detail. One reason for this might be the large heterogeneity of commercial companies. A sufficient representation of the processes leading to the occurrence and magnitude of business interruption seems to require an even larger and more comprehensive database. Consequently, more empirical damage data on the commercial (sub-)sector level need to be collected to further facilitate the development of reliable business interruption loss models as well as a more in-depth understanding of the processes involved. Once such a database is available, future research should, on the one hand, strive for a reduction of uncertainties in the estimation of business interruption and, on the other hand, analyze potential changes in a company's vulnerability with respect to business interruption.