Application of Sensitivity Analysis for Process Model Calibration of Natural Hazards

Sensitivity analysis (SA) describes how variations in model inputs affect its outputs. Its inclusion can support the systematic calibration of numerical models to back-calculate intensity properties of past torrent events that would otherwise be difficult or impossible to collect during their occurrence. Sensitivity analysis for model calibration is assessed with the back-calculation of a known torrent event. In particular, FLO-2D, a cell-based numerical model, is used to simulate the 2005 debris flow event that occurred in Brienz, Switzerland. Just under 4000 simulations were completed with ranges of physically reasonable parameter values. Model results were compared in three dimensions with both sediment deposition extents (x, y) and estimated deposition heights (z) from available post-event images. The comparisons highlighted that more accurate input and validation data, namely the flow behavior of hazardous processes and post-event deposition heights, are required to produce stronger agreement between simulated and observed results. Furthermore, the application of SA for model calibration supports systematic exploration of the large parameter spaces characteristic of complex phenomena like natural hazard events. These findings demonstrate how important model input factors can be identified, which provides guidance for future data collection efforts to capture both the rheology and the spatial distribution of hazards more accurately.


Introduction
Torrents are defined as steep waterways in mountainous environments [1]. Hazardous torrent processes are characterized by the rapid propagation of large quantities of available sediments, debris, and water from an upslope source via a transit zone, to a downslope depositional area where human settlements may be established. In theory, torrent processes can be further differentiated as debris flows, hyperconcentrated flows, or fluvial sediment transport, based on the respective characteristics and dominant processes of each event [2,3].
While conventional approaches based the classification process on flow behavior alone [4], peak discharge has since been recommended as a complementary criterion [3,5]. Observable flow characteristics reflect variable concentrations of water and sediment, which provide insight into the internal physics of the flow. For instance, debris flows typically produce thicker, more hummocky and lobate depositions, and are characterized as very rapid to extremely rapid flows of saturated, non-plastic materials along steep channels, headed by a coarse surge front [3]. While debris floods are associated with the transport of considerable quantities of coarse sediments, the flows are generally characterized as thin, wide sheets of materials [3]. Relative to debris floods, hyperconcentrated or sediment-laden flows transport smaller, albeit still notable, quantities of fine sediment in suspension [6,7]. In particular, mudflows have been defined as very to extremely rapid flows of saturated plastic debris in a channel, and are characterized by significantly larger quantities of water with respect to the amount of solid source materials; the plasticity index is greater than 5% [8]. Following the definition by Bradley and McCutcheon [9], mudflows also have sufficient viscosity to transport sizable boulders, and natural and anthropogenic debris, within a matrix of smaller-sized particles. It is possible for the flow behavior of a single torrent event to evolve as it occurs, depending on the types, velocity, and quantities of materials propagated downwards. In this respect, events may often be more realistically described as a combination or evolution of the aforementioned torrent process types.
The distribution of these materials downslope from the active zone is further influenced by underlying site characteristics (e.g., topography, presence of confined or unconfined preferential pathways). Due to the combined effects of composition, flow behavior and site-specific characteristics, torrent processes are associated with variable peak discharges, sediment transport capacities, momentums, and subsequently differential potentials to cause damage to elements at risk upon impact.
Assessing the physical vulnerability of elements at risk (e.g., affected buildings) due to these processes is a part of consequence analysis within risk assessment, where the intensity of a given event is related to the damages sustained [10,11]. The design and implementation of effective risk mitigation strategies is dependent on the results of such analyses. The estimation of expected direct losses, as a result of the hazard process patterns with respect to the properties of exposed elements, is possible with the derivation of representative physical vulnerability functions [12].
However, a persistent challenge in vulnerability studies on torrent processes is the high uncertainty and limited amount of direct, field-based observation data available, since its collection during the occurrence of these types of events is difficult or impossible [13]. Detailed analyses have been conducted for events that resulted in high losses in both Austria and Switzerland. These analyses generally report on the triggering and boundary conditions of the particular torrent process, the process evolution, the extent of runout zones, and estimates of eroded materials deposited downslope (e.g., [14,15]). However, comprehensive information on processes, damage patterns, and their interactions with structural building properties has not been adequately documented to date.
To address this challenge, proxies have been adapted to inform about event intensities, including, but not limited to, sediment deposition heights, velocities, and impact pressures [16,17]. It is of interest to replicate past events with process models to determine if simulated intensity proxy data can be considered in further consequence analysis. Following Mazzorana et al. [12], process modeling comprises the first three of five steps towards accurately assessing the physical vulnerability of the built environment. A range of recognized methods have been applied to different process models including empirical [7,18-23], empirical-statistical combined with simple flow equations [24], topographic gradient-based [25], numerical-based with the integration of shallow water equations [26-41], and smoothed particle hydrodynamics (SPH) or Lagrangian [42-46] (see References [47,48] for review). Of the numerical models, one-dimensional (e.g., DAN-W [28]; DFEM-1D [49]) or two-dimensional (e.g., FLO-2D [27]; RAMMS-DF [35,36]; TopRunDF [7]; MassMov2D [34]) runout modeling approaches can be adapted. Rickenmann [50] provides a comprehensive overview of the advantages and limitations of each type of modeling approach as a combination of how flows are propagated and distributed on the alluvial fan. Furthermore, flows are represented as either single-phased, homogeneous fluids with constant rheological properties, or dual-phased, heterogeneous matrices. Each approach has specific data requirements, which may limit its application to data-scarce case studies. The different model approaches also highlight tradeoffs between minimizing computational time and various degrees of accuracy in simulated results. Consequently, the choice of a suitable model is determined by data availability and by how well a given approach represents a complex, heterogeneous 3D problem, while minimizing the computation time required to return a solution.
Furthermore, known sources of uncertainties can have adverse implications on model results [51]. Sources associated with debris flow models include, but are not limited to, the quality of model input and calibration data, initial and boundary conditions, how accurately the model structure represents events of interest, the sensitivity of defined parameters, and the calibration method applied. Accurate topographic representation is imperative to model both the propagation of debris flows within the torrent channel and the lateral distribution of materials that exceed bankfull conditions onto the alluvial fan [52]. For example, Rickenmann et al. [32] demonstrated that debris flow model results are sensitive to the presence of local topographic features, which can divert flows and consequently determine where material is deposited on alluvial fans. Moreover, topographic inputs used in debris flow models are generated infrequently and may not reflect the conditions at the time the event occurred. This temporal mismatch results in modelling with topography that does not accurately represent initial conditions and can include differences in channel slope steepness, channel width, and inaccurate representation of mitigation structures along the channel [53]. Additional uncertainties in debris flow modelling stem from the lack of basic data required to reconstruct the characteristics of debris flow events. These may include the point where the debris flow event was initiated, the hydrograph peak and duration [16,53], and the volume of material at the start of the event and further entrained during its course [54]. In lieu of actual data, working assumptions are made to reconstruct ranges of plausible data values, which contain inherent uncertainties that are introduced into the simulated results. For example, Rickenmann et al. [32] attributed sources of model errors to the lack of detailed data needed to describe a debris flow consisting of multiple surges, which was described with a simpler, single-surge hydrograph instead. The degree of model complexity can also be a source of uncertainty. For instance, sediment entrainment during a debris flow changes the volume of materials and flow behavior [55,56]. Simpler debris flow models do not replicate entrainment, but partly consider this process by including additional quantities of material at the debris initiation point, while more complex models explicitly replicate the spatially and temporally distributed erosion of channel materials as the flow is propagated [57]. However, while a more complex model that includes entrainment can better replicate debris flow heights near the point of initiation [57], this additional process introduces another source of uncertainty through additional parameters (e.g., erosion rate), which require values that may not be known. Event and site-specific parameters for debris flow models are estimated through calibration, and efforts usually focus on flow resistance [48] and rheological parameters [58]. These parameters are associated with wide ranges of plausible values in the literature, and it may be difficult to determine which values are most representative for a given event. The majority of debris flow model research uses a trial-and-error approach to calibration [58], and may not arrive at optimal parameter sets because calibration is time consuming and the process may be ended prematurely [52]. Methods exist to efficiently derive parameter sets (e.g., genetic algorithms), but these methods are rarely, if ever, employed in debris flow modelling.
Given the inherent complexities that characterize natural hazard processes and the contributing sources of uncertainties, modeling past torrent events requires the exploration of relatively large parameter spaces, even with model simplifications. Sensitivity analysis (SA) describes how varying inputs in a numerical model subsequently varies its outputs [59]. The inclusion of SA for model calibration supports a better understanding of model behavior, its parameterization, and its associated uncertainties [60].
In this respect, the inclusion of SA offers several advantages as a part of a sound model calibration and evaluation framework. Firstly, it is instrumental in reducing the number of parameters that require calibration. Calibration is an example of an inverse problem, where the best agreement between simulated and observed reference data is obtained with parameter combinations and values that result in higher model performance. Secondly, SA supports the identification of the degree of influence that input factors have on model simulation results. Thirdly, SA can highlight limitations of model calibration due to residual sources of uncertainties once parameter uncertainties are accounted for [59]. Consequently, the inclusion of SA into the modelling process identifies optimal results while highlighting application-specific limitations with greater efficiency. Furthermore, epistemic uncertainty about natural hazard phenomena that occur in complex systems remains prevalent [61]. In particular, the combination of model and input data limitations compromises the ability to learn about the most influential parameter(s) within the systems of interest to a sufficiently high degree of accuracy. Uncertainties due to imperfect knowledge about initial conditions and simplified representations of model inputs are generally addressed with ensembles of model predictions, where each simulation represents a different choice of parameter combinations and values. A structured statistical approach to assess parametric uncertainty and model performance is preferable, where decisions about parameters can be made in a transparent and explicit way, using methods that can be easily understood [62]. This study addresses the two aforementioned challenges: firstly, the large parameter spaces that need to be explored to fully capture the complexities of hazardous torrent events, and secondly, the evaluation of model performance to gain a better understanding of epistemic uncertainties.
In response to the first challenge, the utility of SA for model calibration is assessed with the back-calculation of the 2005 debris flow event that occurred in Brienz, Switzerland. Taking all of these requirements into consideration, FLO-2D [27], a simplified, physically-based, two-dimensional numerical model, was used to model the event. Single-phased models like FLO-2D are commonly used to simulate debris flows by researchers and practitioners working in the risk community (Table 1) as a computationally efficient first step to gain insight into these complex processes [32,63]. In the study conducted by De Blasio et al. [38], it was suggested that, in principle, larger blocks interspersed within a mud matrix may behave comparably to a pure Bingham fluid for certain types of flows. This working assumption was also adopted to minimize computation time and so support the evaluation of a wider model parameter space. Furthermore, the application of more complex models often requires input data and initial conditions that exceed what is available [64,65]. With respect to specific model data requirements, FLO-2D results have been reported to be strongly influenced by topography [58]; the availability of a high resolution SwissALTI3D digital elevation model (DEM) [66] satisfied this requirement. Previous studies (e.g., [58]) have also reported that FLO-2D is capable of generating accurate runout distances and capturing the distribution of materials across the fan through back-calculation. Finally, to support subsequent investigations of building vulnerability with simulated hazard intensities, FLO-2D is capable of generating sediment deposition (flow) heights, in addition to flow velocities and impact pressures.
Model performance is assessed against post-event observations of sediment deposition extent, sediment deposition heights, and a single point estimate of flow velocity for close to 4000 completed simulations. Focusing only on studies conducted with the commonly applied FLO-2D model, Table 1 presents an overview of published literature on the assessment of simulated results specific to the back-calculation of torrent events. To date, the performance of simulated torrent events has generally been assessed by visual comparison, with limited instances where quantitative or hybrid approaches are applied to consider validation data in three dimensions. In particular, while comparisons of observed sediment deposition extents are more common, the additional inclusion of deposition heights is limited. For instance, only three known studies with FLO-2D assessed results with both post-event observations of sediment deposition extent and deposition heights. In these cases, sample sizes of observed points were limited (n < 20), and no further statistical analyses were published at the time this manuscript was prepared. Furthermore, other studies that quantified model performance presented the percent of over- or under-prediction by sediment deposition extent, or only visually displayed the simulated results without quantification.
In light of these findings, statistically-based performance metrics are applied in this study to address the second challenge. This supports the quantitative formulation of aggregated uncertainties related to data quality, parameterization, and model suitability; it is a step towards effectively identifying priorities for data collection and future modeling efforts based on quantitative methods. In effect, the methods proposed in this study combine SA for calibration with the evaluation of model performance and behavior, using a set of statistically-based metrics to support the process in a systematic and efficient way.

Site Description
The proposed framework is applied to a well-documented, high magnitude debris flow event that occurred along the Glyssibach in Brienz, Switzerland in 2005. A detailed description of the study site and characteristics of the debris flow was obtained from post-event documentation (i.e., Lokale Lösungsorientierte Ereignisanalyse or LLE) prepared by local experts and published in Reference [71].
Brienz is located at the fan of the Glyssibach, a 1.98 km² alpine catchment situated between 563 m and 2039 m above sea level. The underlying bedrock on location is Lower Cretaceous limestone. The combination of marl and siliceous limestone found in the vicinity and contributions of weathered schist provided sources of sediment for the torrent processes.
Between 19 and 23 August 2005, continuous rainfall varying between 6 and 16 mm/h fell in the upstream catchment. This resulted in 320 mm of rain over 72 h, which gradually transported finer sediments downwards along the Glyssibach into Lake Brienz, prior to the onset of the main debris flow event. An eyewitness account described the northern bank, where the channel meets the lake, reaching its retention capacity. The subsequent backpropagation of sediment was deposited upwards along the main channel until the limit, which is identified in Figure 1. While solid materials did not breach the confines of the channel, witnesses reported small quantities of water spilling onto the banks. This observation indicates reduced carrying capacity prior to the occurrence of the main event.
The debris flow was triggered by a landslide in the early morning on 23 August 2005 and lasted for approximately 15 min. The maximum discharge (Q_max) was estimated to be between 140 and 160 m³/s. Available sediment was entrained and approximately 72,000 m³ of total bulk volume was subsequently propagated along the main channel into the settlement, deposited on the alluvial fan, and subsequently flowed into Lake Brienz. Flow velocities between 6 and 10 m/s at the Glyssibrücke were estimated with the superelevation approach. This is based on the forced vortex equation, where the difference in surface elevation of a debris flow is determined as it travels within a known channel curvature [73]. Debris flow materials were described by local experts to be relatively homogeneous, resulting in a viscoplastic mud matrix composed primarily of fine fractions of silt and sand. In particular, the mud matrix of the 2005 event is estimated to have been comprised of 30-70% water; 5-8% of the total sediment volume is comprised of clay. The channelized mud matrix was capable of entraining coarse, unconsolidated materials (e.g., vehicles, large woody debris) and transported boulders between 3 and 5 m in diameter to the bottom of the alluvial fan. For this reason, the 2005 event is referred to as a debris flow.
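As a rough illustration of the superelevation approach, the forced vortex relation is commonly written as v = sqrt(g · R_c · Δh / (k · B)), where R_c is the radius of curvature of the bend, Δh the difference in flow surface elevation between the outer and inner bank, B the flow width, and k a correction factor. The sketch below assumes this form. The function name and all input values are hypothetical and are not the measurements behind the Glyssibrücke estimate.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def superelevation_velocity(radius_m, delta_h_m, width_m, k=1.0):
    """Mean flow velocity (m/s) from the forced vortex relation.

    v = sqrt(g * R_c * delta_h / (k * B))
    k is a correction factor; k = 1 is the clear-water case (assumption).
    """
    if min(radius_m, delta_h_m, width_m, k) <= 0:
        raise ValueError("all inputs must be positive")
    return math.sqrt(G * radius_m * delta_h_m / (k * width_m))


# Hypothetical bend geometry, for illustration only.
v = superelevation_velocity(radius_m=50.0, delta_h_m=1.5, width_m=12.0)
print(f"estimated velocity: {v:.1f} m/s")
```

Because the velocity scales with the inverse square root of k, the choice of correction factor alone can shift the estimate considerably, which is one reason such back-calculated velocities are usually reported as a range.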

Methods
Generally, a pre-screening stage is carried out to optimize modeling efforts by focusing on a subset of inputs based on relative importance [74,75]. Figure 2 outlines the required inputs and expected outputs of the dynamic FLO-2D simulation model following the pre-screening stage. Both sets of boundary conditions and model parameter values reflect characteristics of the study site and the 2005 debris flow event described in Section 2. These characteristics were translated into ranges of physically plausible parameter values used to define the parameter space of interest; Table 2 summarizes key parameters and associated ranges of values identified from various sources. In particular, full ranges of acceptable input values are defined in the FLO-2D model reference manuals. Truncated a priori ranges of these values were further defined based on a combination of literature-based values from debris flow modeling studies with FLO-2D and values from expert knowledge about the debris flow that occurred specifically in Brienz. Certain identified parameters are set as initial conditions with fixed values based on recommendations in the literature, with respect to the degree of uncertainty in required model parameters. For example, accurate definition of uncertain flow behavior was prioritized over simulations designed to investigate the impact of changing the spatial resolution of the computational grid. Furthermore, the amount of time that would be required to evaluate both input factors and model parameters was also a consideration. These considerations effectively minimized the parameter space and optimized the search for parameter combinations that best match the observed reference data.
In this study, a global sensitivity analysis was conducted, where the model output was obtained by varying inputs across their entire feasible space rather than around a reference value. One-at-a-time (OAT) or all-at-a-time (AAT) methods can be applied to calibrate each input factor independently or all input factors simultaneously, respectively [76]. In this particular study, the AAT method was initially applied to explore the wider parameter space by considering model output sensitivity to both direct changes to input factors and to joint input interactions; this corresponds to 1764 model runs, or about 45% of the total simulations completed for this study. The OAT method was applied to specific parameters once the most influential input factors were identified from evaluations of AAT results. Global sensitivity analysis was performed semi-automatically with the implementation of Sikuli scripts [77] coupled with an executable batch file program that was provided by the developers of FLO-2D. Due to constraints with the batch program (i.e., a maximum of 20 simulations can be run in series at a time), fully automated calibration was not supported.
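The difference between the two sampling strategies, and the effect of the 20-run batch limit, can be sketched as follows. This is a minimal illustration and not the scripts used in the study: the factor names and ranges are hypothetical (they are not the values in Table 2), and uniform random sampling for AAT and evenly spaced steps for OAT are assumptions.

```python
import random

# Hypothetical input factors and a priori ranges (not the values in Table 2).
RANGES = {
    "sediment_concentration": (0.30, 0.55),
    "viscosity_coefficient": (0.001, 0.1),
    "sediment_detention_m": (0.0, 0.5),
}


def sample_aat(ranges, n, seed=0):
    """All-at-a-time: every factor is varied simultaneously in each run."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        for _ in range(n)
    ]


def sample_oat(ranges, baseline, steps):
    """One-at-a-time: one factor is stepped across its range, others fixed."""
    runs = []
    for name, (lo, hi) in ranges.items():
        for i in range(steps):
            run = dict(baseline)
            run[name] = lo + (hi - lo) * i / (steps - 1)
            runs.append(run)
    return runs


def batches(runs, size=20):
    """Split runs into groups that respect the 20-simulation batch limit."""
    return [runs[i:i + size] for i in range(0, len(runs), size)]


aat_runs = sample_aat(RANGES, n=50)
baseline = {name: (lo + hi) / 2 for name, (lo, hi) in RANGES.items()}
oat_runs = sample_oat(RANGES, baseline, steps=5)
print(len(aat_runs), len(oat_runs), [len(b) for b in batches(aat_runs)])
```

AAT runs can reveal joint interactions between factors because several values change per run, whereas each OAT run differs from the baseline in a single factor and therefore isolates its direct effect.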
A distinction is made between model outputs and their translation into summary scalar variables with different performance assessment functions; these variables (e.g., global root mean square error, fitness measure F) quantify model performance by comparing simulated results to observed reference data [59]. In particular, completed simulations were evaluated against reference observation data to determine whether higher agreement was obtained, and new sets of input factor values were defined based on the results. This procedure was repeated iteratively in phases until no further gain in model performance was detected with respect to the available event data. Consequently, the evaluation of model outputs with summary scalar variables describes how influential variations of input factors are.
Furthermore, the results provide insight about the feasibility of using simulated intensity proxies to further develop physical vulnerability curves (e.g., [16]).
In this study, pre-screening began with the identification of accepted ranges of required input values for the numerical model [78]. Truncated ranges of physically feasible values were defined to reflect the specific 2005 debris flow event based on expert consultation. Additionally, unknown values were identified from literature describing potentially similar study sites and processes. Information from past SA studies with FLO-2D (e.g., [67-69,79]) was compiled; findings highlighted the degree to which the model outputs varied with the definition of certain conditions and parameters. Certain inputs were defined as initial conditions with fixed input values, while others were defined as input factors with a range of plausible values, based on the findings on relative parameter importance, on available information about the 2005 event, and on the amount of computational time required to explore a given number of parameters. For the 2005 mudflow in Brienz, initial conditions were defined with data about the underlying topography, the bulk volume of sediment and water propagated downslope, the inclusion of mitigation structures and building footprints into the topography, surface roughness, and the spatial resolution of model grid elements (Figure 2; Table 2). Sensitivity analysis for model calibration was applied to the defined input factors, which effectively reduced the size of the original problem by minimizing the model parameter space.
A swissALTI3D digital elevation model (DEM; [66]) was obtained at 2 m spatial resolution and was resampled to 5 m. While decreasing the spatial resolution of grid elements limits the ability to perform highly detailed analyses, the results provided an overview of reasonable reproductions of the simulated process with greater computational efficiency. To account for any notable changes represented in the DEM since the occurrence of the mudflow event, additional or modified mitigation structures were changed to reflect the topographic conditions in 2005 based on information from expert consultation and review of pre-event orthophotographs. Furthermore, the DEM was adapted to include witness accounts of sediment backpropagation up to channel bankfull conditions until the limit of the blue arrow indicated in Figure 2, prior to the onset of the main debris flow event.

Input Factors for Calibration
Following the definition of initial conditions, site-specific ranges of physically feasible input factor values are defined based on expert knowledge and published literature. This ensures that simulated outputs, produced by varying these constrained values, are realistic for a given process and event. Different combinations of input factors represent the simplified physics that govern the flow behavior of the process and how the materials spread laterally across the alluvial fan and downslope from the defined triggering location.
For all of the simulations, the same total bulk volume is transported downslope. However, the volumetric sediment concentration (C_v), which expresses the proportion of solid material in the sediment-water mixture, is treated as an input factor and calibrated. Within the solid component, different grain size distributions can be defined. This reflects the importance of grain size on energy dissipation in granular flows, which translates to different effects on geophysical flow mobility [81].
An inflow mud hydrograph is required for each simulation and is generated based on the defined C_v and discharge data. The amount of precipitation that initiated the debris flow could not be determined from rain or discharge records most proximal to the study site. Consequently, an expert-based estimate of maximum discharge from the main channel was used to produce a series of mud hydrographs based on variations of different volumetric sediment concentrations.
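One simple way to picture how a family of mud hydrographs follows from a discharge estimate and a chosen C_v is sketched below. The triangular shape, the position of the peak, a constant C_v over the whole event, and the function name are all assumptions made for illustration; the text does not state how the hydrographs in the study were actually constructed. The bulk volume (72,000 m³) and duration (about 15 min) are the values reported in the site description.

```python
def triangular_mud_hydrograph(bulk_volume_m3, duration_s, c_v, n_steps=10,
                              peak_fraction=0.3):
    """Return (time_s, total_q, water_q, sediment_q) tuples in m^3/s.

    Assumes a triangular shape whose area equals the bulk volume, so the
    peak discharge is 2 * V / T. The sediment share of the discharge is c_v.
    """
    if not 0.0 < c_v < 1.0:
        raise ValueError("c_v must lie between 0 and 1")
    q_peak = 2.0 * bulk_volume_m3 / duration_s
    t_peak = peak_fraction * duration_s
    rows = []
    for i in range(n_steps + 1):
        t = duration_s * i / n_steps
        if t <= t_peak:
            q = q_peak * t / t_peak  # rising limb
        else:
            q = q_peak * (duration_s - t) / (duration_s - t_peak)  # falling limb
        rows.append((t, q, q * (1.0 - c_v), q * c_v))
    return rows


# One hydrograph per candidate sediment concentration (hypothetical values).
for c_v in (0.35, 0.45, 0.55):
    rows = triangular_mud_hydrograph(72000.0, 900.0, c_v)
    peak = max(q for _, q, _, _ in rows)
    print(f"C_v={c_v:.2f}  peak total discharge={peak:.0f} m3/s")
```

Under these assumptions the peak works out to 2 × 72,000 / 900 = 160 m³/s, which falls at the upper end of the expert estimate of 140 to 160 m³/s. Changing C_v leaves the total discharge unchanged and only shifts the split between water and sediment.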

Calibration of Rheological Values
Flow behavior is partially characterized by the volumetric sediment concentration of the fluid matrix, C_v:

$$C_v = \frac{V_s}{V_s + V_w}$$

where V_s and V_w represent the respective sediment and pore water volumes in a given mixture [82].
Empirical data on relatively homogeneous, single-phased geophysical flows have shown that the amount of interstitial fluid and its associated properties determine both the viscosity and yield strength [83-85]. In general, both viscosity and yield stress increase exponentially with the volumetric concentration of fine sediments. The empirical coefficients or material parameters, α_i and β_i, are key parameters within the FLO-2D model that define the internal resistance of the mudflow materials. In particular, the study conducted by D'Agostino et al. [69] demonstrated that the simulated results are highly dependent on the correct definition of rheological parameters. While these values are typically determined by lab-based analyses of representative field samples collected shortly post-event (e.g., [67,70,82,83]) to describe process flow behavior, it may not always be possible to obtain such samples. In lieu of post-event field samples, rheological values have been calibrated from existing values reported in the literature (e.g., [67,70]).
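The exponential dependence described above is commonly written as α · exp(β · C_v), with one (α, β) pair for viscosity and another for yield stress. The sketch below assumes this form to show how strongly both quantities respond to small changes in C_v. The coefficient values are placeholders chosen for illustration and are not taken from Table 3 or from any laboratory analysis; units are left unspecified for the same reason.

```python
import math


def volumetric_concentration(v_sediment, v_water):
    """C_v = V_s / (V_s + V_w)."""
    return v_sediment / (v_sediment + v_water)


def exponential_rheology(c_v, alpha, beta):
    """Return alpha * exp(beta * c_v); same form for viscosity and yield stress."""
    return alpha * math.exp(beta * c_v)


# Placeholder coefficients for illustration only (hypothetical values).
ALPHA_VISC, BETA_VISC = 0.05, 20.0
ALPHA_YIELD, BETA_YIELD = 0.1, 25.0

for c_v in (0.30, 0.40, 0.50):
    eta = exponential_rheology(c_v, ALPHA_VISC, BETA_VISC)
    tau_y = exponential_rheology(c_v, ALPHA_YIELD, BETA_YIELD)
    print(f"C_v={c_v:.2f}  viscosity={eta:.3g}  yield stress={tau_y:.3g}")
```

With β values of this order, raising C_v by 0.1 multiplies the result by exp(0.1 · β), which is close to an order of magnitude. This illustrates why simulated results depend so heavily on the rheological coefficients.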
Consequently, in addition to the aforementioned input factors, rheological values were also calibrated for this study due to the lack of more detailed information about the sediments that were transported. A selection of rheological values (x_1-4) was compiled from the literature for consideration and subsequent calibration in FLO-2D (Table 3).
Specific to FLO-2D, the additional calibration of an input factor referred to as sediment detention (x_7) modifies the defined rheology by introducing a uniform height threshold (in m) to each computational grid cell, so that the volume of materials propagating downslope is retained until the defined height is exceeded.
Observed reference data was based on contributions of volunteered geographic information (VGI) from local residents, experts, and authorities. On the building level, available data included post-event photographs taken by residents. Point-based estimates of sediment deposition heights (Figures 3 and 4; in meters) and degrees of loss were determined for each affected building.
All height estimations are associated with a confidence level, primarily based on the quality of the post-event photograph and the ability to determine deposition heights accurately. Figure 4 shows three examples of photographs of affected buildings in Brienz. In the top left, sediment deposition heights can be observed on the outer wall of the building; however, the location of the ground (i.e., reference) cannot be clearly determined.
As a result, this estimated observed height would be assigned a lower confidence level (2 or 3: somewhat uncertain) than the height estimated from the photograph in the top right (1: very certain), where the ground can be observed. While sediment deposition heights can be observed in the orthophotograph (bottom) for several buildings, the associated confidence level (4: highly uncertain) is much lower due to the angle at which the photograph is taken. Figure 4 shows the spatial distribution of estimated sediment deposition heights with respect to associated confidence levels.
Post-event orthophotographs [72] provide a synoptic view of the affected site and building level damages. The sediment deposition extent (Figure 3) was delineated from an orthophotograph that shows the limits of where the debris flow materials were deposited in Brienz. In summary, three sources of reference data were available to support model validation, namely the sediment deposition extent, estimates of sediment deposition heights distributed across the alluvial fan area, and a point estimate of flow velocity (i.e., between 6 and 10 m/s) at the Glyssibrücke.

Model Description
FLO-2D [27] is a two-dimensional, physically-based distributed flood routing model that is also capable of simulating non-Newtonian sediment flows. In particular, cumulative shear stress in hyperconcentrated sediment flows is simulated, which includes debris flows, mudflows and mud floods [58]. Cumulative shear stress in the model comprises five components, namely cohesive yield stress, Mohr-Coulomb shear, viscous, turbulent, and dispersive shear stress. A quadratic rheological model function of sediment concentration adds turbulent and dispersive terms to the original Bingham equation when the aforementioned components are expressed as shear rates [86]. Furthermore, the model is driven with the definition of a mudflow hydrograph and conserves the defined volume of sediment and water over a square grid-based, user-defined computational domain based on a cell storage approach. Variable combinations of model parameter values result in a specific rheology or flow behavior, which effectively describes how the mud matrix is distributed over a given landscape. In particular, the geophysical flows are modeled as simplified Bingham fluids, which are defined by yield stress and viscosity coefficients [86]. The rheology can be further calibrated with the definition of the model-specific sediment detention input factor.
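For orientation, the five-component shear stress and the quadratic rheological model described above are usually written in the following form. The symbols are assumptions introduced here for illustration, since this section does not define its own notation: τ_c is the cohesive yield stress, τ_mc the Mohr-Coulomb shear, τ_v, τ_t and τ_d the viscous, turbulent and dispersive stresses, η the dynamic viscosity, du/dy the shear rate, and C a lumped coefficient for the turbulent and dispersive terms.

```latex
\begin{align}
  \tau &= \tau_c + \tau_{mc} + \tau_v + \tau_t + \tau_d \\
  \tau &= \tau_y + \eta \,\frac{du}{dy} + C \left(\frac{du}{dy}\right)^{2},
  \qquad \tau_y = \tau_c + \tau_{mc}
\end{align}
```

Setting C = 0 recovers the Bingham equation, in which flow behavior is governed by the yield stress and viscosity alone.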

Performance Metrics
While numerical models have been previously applied to replicate hazardous events, it is instrumental to gain a sound understanding about simulated outputs, in addition to model- and calibration-specific limitations, with respect to available data. Performance metrics provide a means to assess these components in a systematic and reproducible way. In this study, two sets of performance assessments are carried out to address two different objectives. In particular, the assessments provide insight into model performance with respect to validation data and parameter importance on simulated results.
Model performance is evaluated on the degree of agreement between simulated results with observed data, which is quantified by a series of summary scalar variables. In this study, a three-dimensional evaluation was conducted with validation data describing both the sediment deposition extent (x, y) and sediment deposition heights (z). Furthermore, a single, point-based comparison of simulated and estimated flow velocities was performed to determine whether the simulated outputs were within a reasonable range.
A measure of fitness, F, quantifies the binary (i.e., yes or no) agreement between observed and simulated sediment deposition extent per grid cell as a percentage over the study site. Both instances of over- and under-prediction in the simulated extent were penalized with respect to the observed extent [87].
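The exact expression from [87] is not reproduced in this section. A minimal sketch of one common formulation that matches the description is given here: the number of cells flagged in both grids divided by the number of cells flagged in either grid, so that over- and under-prediction both lower the score. The function name and the small example grids are hypothetical.

```python
from typing import Sequence

Grid = Sequence[Sequence[int]]  # 1 = deposition in cell, 0 = no deposition


def fitness_score(observed: Grid, simulated: Grid) -> float:
    """Percent agreement: cells flagged in both grids / cells flagged in either."""
    both = 0
    either = 0
    for obs_row, sim_row in zip(observed, simulated):
        for obs, sim in zip(obs_row, sim_row):
            if obs and sim:
                both += 1
            if obs or sim:
                either += 1
    if either == 0:
        raise ValueError("neither grid contains deposition cells")
    return 100.0 * both / either


# Hypothetical 3 x 4 extents (illustration only).
observed = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
simulated = [
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
print(f"F = {fitness_score(observed, simulated):.1f}%")
```

In this toy case four cells agree, one observed cell is missed and two cells are over-predicted, so both kinds of error enlarge the denominator without adding to the numerator.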
The global root mean square error (gRMSE) provides insight on the overall agreement between observed and simulated sediment deposition heights and is expressed in meters. Once the summary scalar variables were computed, model outputs were ranked by highest F scores and lowest gRMSE values to identify combinations of input factors that matched the validation data most closely.
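The gRMSE and the ranking step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the record type, the `max_confidence` filter (which uses the confidence levels 1 to 4 described for the reference heights) and all numeric values and run names are hypothetical.

```python
import math
from typing import List, NamedTuple


class HeightPoint(NamedTuple):
    observed_m: float
    simulated_m: float
    confidence: int  # 1 = very certain ... 4 = highly uncertain


def global_rmse(points: List[HeightPoint], max_confidence: int = 4) -> float:
    """RMSE in meters over points whose confidence level is <= max_confidence."""
    used = [p for p in points if p.confidence <= max_confidence]
    if not used:
        raise ValueError("no reference points at the requested confidence level")
    squared = sum((p.simulated_m - p.observed_m) ** 2 for p in used)
    return math.sqrt(squared / len(used))


# Hypothetical reference points (illustration only).
points = [
    HeightPoint(1.0, 0.4, 1),
    HeightPoint(0.5, 0.5, 1),
    HeightPoint(2.0, 0.8, 3),
    HeightPoint(1.5, 0.3, 4),
]
grmse1 = global_rmse(points, max_confidence=1)  # very certain points only
grmse4 = global_rmse(points)                    # all confidence levels

# Hypothetical runs as (F score in %, gRMSE in m); rank by each metric.
runs = {"run_a": (48.0, 0.95), "run_b": (41.0, 0.70), "run_c": (52.0, 1.10)}
by_f = sorted(runs, key=lambda name: runs[name][0], reverse=True)
by_rmse = sorted(runs, key=lambda name: runs[name][1])
print(grmse1, grmse4, by_f, by_rmse)
```

The two rankings need not agree: a run can match the extent well while missing the heights, which is why both metrics are reported.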
Models of complex phenomena, such as natural hazards, are often characterized by a large number of input factors that may interact in non-linear ways [88]. To minimize the uncertainty associated with SA findings, a larger number of simulations is required to increase the sampling density [89]. This is especially important for the exploration of high-dimensional parameter spaces, so that there is a higher chance of capturing complex interactions among parameter sets. In this study, 3876 simulations representing different input factor combinations were completed. Further assessment of input factor combinations was conducted with two statistical approaches, namely regression trees and random forests.
Regression trees [90] begin with a root node and end with terminal nodes or leaves. The root node of the regression tree accounts for the total number of completed simulation runs or training set of data that are being considered in the analysis, with respect to a defined objective function (e.g., higher F or lower gRMSE). The training set is recursively partitioned into subsets of simulations that reflect combinations of input factor values that satisfy the objective function. The complexity parameter, cp, is a value that defines the amount by which splitting the current node improves the relative error. In the case of ANOVA splitting, the regression tree continues to grow as long as each split increases the overall r² by at least the defined cp value. In this study, two regression tree models were generated, based on average gRMSE and F scores, respectively.
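The role of cp can be illustrated with a simplified recursive partitioning sketch. It keeps a split only if the reduction in the sum of squared errors, relative to the root node, is at least cp, which corresponds to the r² criterion described above. This is not the software used in the study: pruning and cross-validation are omitted, and the input factor names and values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, float]


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows: List[Row], target: str) -> Optional[Tuple[float, str, float]]:
    """Return (SSE reduction, factor, threshold) of the best binary split, if any."""
    parent = sse([r[target] for r in rows])
    best = None
    for factor in (k for k in rows[0] if k != target):
        levels = sorted({r[factor] for r in rows})
        for low, high in zip(levels, levels[1:]):
            threshold = (low + high) / 2
            left = [r[target] for r in rows if r[factor] < threshold]
            right = [r[target] for r in rows if r[factor] >= threshold]
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, factor, threshold)
    return best


def grow(rows: List[Row], target: str, cp: float,
         root_sse: Optional[float] = None, depth: int = 0) -> List[str]:
    """Grow a tree and return its rules as indented text lines."""
    values = [r[target] for r in rows]
    if root_sse is None:
        root_sse = sse(values)
    split = best_split(rows, target)
    indent = "  " * depth
    # Stop when no split exists or the gain in r-squared is below cp.
    if split is None or root_sse == 0 or split[0] / root_sse < cp:
        return [f"{indent}leaf: n={len(rows)}, mean={sum(values) / len(values):.2f}"]
    _, factor, threshold = split
    left = [r for r in rows if r[factor] < threshold]
    right = [r for r in rows if r[factor] >= threshold]
    return ([f"{indent}{factor} < {threshold:g}"]
            + grow(left, target, cp, root_sse, depth + 1)
            + [f"{indent}{factor} >= {threshold:g}"]
            + grow(right, target, cp, root_sse, depth + 1))


# Hypothetical simulation summary (illustration only).
runs = [
    {"detention_m": 0.1, "cv": 0.35, "grmse_m": 1.10},
    {"detention_m": 0.1, "cv": 0.55, "grmse_m": 1.05},
    {"detention_m": 0.5, "cv": 0.35, "grmse_m": 0.72},
    {"detention_m": 0.5, "cv": 0.55, "grmse_m": 0.80},
    {"detention_m": 1.0, "cv": 0.35, "grmse_m": 0.70},
    {"detention_m": 1.0, "cv": 0.55, "grmse_m": 0.78},
]
for line in grow(runs, "grmse_m", cp=0.05):
    print(line)
```

Lowering cp in this sketch lets the tree add a further split on the second factor, which mirrors how different cp definitions lead to different tree structures.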
A random forest is based on a set of independently generated regression trees [91]. To further the analysis, a measure of predictor importance (%IncMSE) was returned after a random forest was grown. If a predictor is important in the model, randomly assigning it realistic values (i.e., permuting predictor values over the dataset) should result in a worse prediction with the same model. A comparison can be made with respect to a predictive measure (i.e., mean squared error or MSE) by comparing the model prediction results based on original and permuted datasets. It is expected that predictions with the original dataset are better than ones made with the permuted dataset.
Consequently, %IncMSE represents the increase in the mean squared error (MSE) of predictions, which is estimated by out-of-bag cross validation. First, a regression forest is grown. The MSE associated with the original model is computed and assigned to MSE0. Each predictor variable is then permuted and a new model error, MSE(x), is calculated. The %IncMSE of the predictor is determined by (MSE(x) − MSE0)/MSE0 × 100%, that is, the relative difference between the new model error MSE(x) and the reference model error MSE0. The values are scaled to support the comparison over multiple predictors. The higher the %IncMSE, the more important the predictor is considered to be.
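The permutation procedure can be sketched with the formula given above. To keep the example short and self-contained, a fixed toy model stands in for the random forest and the out-of-bag estimation is not reproduced; the function names, the toy model and the data are all hypothetical.

```python
import random
from typing import Callable, Dict, List

Row = Dict[str, float]


def mse(predict: Callable[[Row], float], rows: List[Row], target: str) -> float:
    """Mean squared error of the model predictions against the target column."""
    return sum((predict(r) - r[target]) ** 2 for r in rows) / len(rows)


def percent_inc_mse(predict: Callable[[Row], float], rows: List[Row], target: str,
                    predictor: str, repeats: int = 20, seed: int = 0) -> float:
    """Average (MSE(x) - MSE0) / MSE0 * 100 over repeated permutations of one predictor."""
    rng = random.Random(seed)
    mse0 = mse(predict, rows, target)
    if mse0 == 0:
        raise ValueError("reference MSE is zero; relative increase is undefined")
    increases = []
    for _ in range(repeats):
        shuffled = [r[predictor] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, predictor: v} for r, v in zip(rows, shuffled)]
        increases.append((mse(predict, permuted, target) - mse0) / mse0 * 100.0)
    return sum(increases) / repeats


def toy_predict(row: Row) -> float:
    """Hypothetical stand-in model: uses detention only, ignores specific gravity."""
    return 1.2 - 0.5 * row["detention_m"]


# Hypothetical simulation summary (illustration only).
rows = [
    {"detention_m": 0.1, "specific_gravity": 2.60, "grmse_m": 1.12},
    {"detention_m": 0.3, "specific_gravity": 2.70, "grmse_m": 1.03},
    {"detention_m": 0.5, "specific_gravity": 2.50, "grmse_m": 0.97},
    {"detention_m": 0.7, "specific_gravity": 2.65, "grmse_m": 0.84},
    {"detention_m": 0.9, "specific_gravity": 2.55, "grmse_m": 0.76},
    {"detention_m": 1.1, "specific_gravity": 2.75, "grmse_m": 0.64},
]
for name in ("detention_m", "specific_gravity"):
    print(name, percent_inc_mse(toy_predict, rows, "grmse_m", name))
```

A predictor the model does not use yields an increase of zero in this sketch. With a fitted forest and out-of-bag samples, such a predictor can fluctuate slightly around zero, which is how a negative %IncMSE can arise.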
While random forests do not have the same explanatory power as individual regression trees [90,92,93], they offer the advantages of limiting overfitting while minimizing errors due to bias. This is because an individual regression tree is sensitive to changes in the training data and splitting criteria, which can result in different tree structures and subsequent explanations [90]. Thus, results from both statistical methods are complementary and provide insight on model behavior as a function of the relative sensitivity of individual parameters and the effects of their joint interactions.

Model Performance Assessments
A total of 3876 simulations were completed with the FLO-2D model, each representing the evaluation of a different combination of the seven input factors described in Table 2. Summary scalar variables were calculated to determine how closely simulations matched the observed debris flow event. For all completed simulations, the F scores ranged between 24.28% and 53.42%, which represent the lowest and the highest percent of binary agreement between the simulated and observed sediment deposition extent. The global root mean square error ranged between 0.60 and 1.16 m when considering only estimated reference heights associated with very high confidence (gRMSE1), and between 0.87 and 1.23 m when data points of all confidence levels were considered (gRMSE4). Based on this result, only gRMSE1 was subsequently used to rank the performance of simulation runs.
Tables 4 and 5 present the highest overall performing simulations ranked based on F and gRMSE1 scores, respectively. In general, the simulations with lower gRMSE1 scores are associated with a difference between simulated and reference sediment deposition heights of approximately 0.10 m lower than height differences associated with the top performing simulations ranked on the F score. However, the corresponding agreement between simulated and observed extents is 8-9% lower.
Furthermore, Table 6 presents the highest performing simulations from each group of rheological values that were evaluated in this study (Table 2) based on F scores. The ranges provide a first impression about model sensitivity to different parameter combinations, especially to definitions of rheology (i.e., input factors x1-4 and x7).
In general, F scores were low (i.e., a maximum of 53% agreement matching the observed extent) and gRMSE1 values were high (i.e., >0.6 m difference between the simulated and observed heights). Summary scalar variables associated with the highest performing simulations for each type of rheology (i.e., A-F) are presented in Table 6. Figure 5 illustrates the simulated results that were generated based on the definition of two different rheologies (i.e., B and F) and input factor combinations. While there are certain similarities in terms of deposition extent (i.e., fairly close reproduction on the right side of the alluvial fan towards the mountains and under-prediction of extent in the bottom-left side), there are notable differences in the spatial distribution of sediment heights within the modelled extent. Additionally, the flow behavior that is defined in each simulation affects the flow velocities across the alluvial fan. The linkages among flow behavior, sediment deposition extent and heights, and flow velocity are further exemplified in the comparison of the two selected simulations. While the modelled output of B53 captures the sediment deposition heights relatively better than the output of F300, the maximum velocity is notably lower than that estimated at the Glyssibrücke (i.e., 2-4 m/s versus 6-10 m/s); the point-based simulated velocity of the F300 output matched that of the estimate provided by local experts. However, the agreement between simulated and estimated deposition heights is lower and a larger quantity of simulated debris flow materials was deposited into Lake Brienz.
Scatterplots with simple linear regression models compared the performance of simulated sediment deposition heights against all of the reference height data and a subset consisting of only very certain height data (Figure 5). The adjusted r² value [76] associated with using all of the data ranged from 0.01 to 0.07, while the adjusted r² value associated with the subset was relatively higher and ranged from 0.09 to 0.24 (Table 6). For the highest performing model output (i.e., B53), 24.57% of the variance between simulated and reference heights could be explained, an increase of about 17% from using all available observation heights associated with variable confidence levels.
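For reference, the adjusted r² of a simple linear regression between simulated and reference heights can be computed as sketched here. It uses the standard adjustment 1 − (1 − r²)(n − 1)/(n − p − 1) with p = 1 predictor. The function name and the sample heights are hypothetical, and the sketch is not necessarily the procedure of [76].

```python
from typing import Sequence


def adjusted_r2(observed: Sequence[float], simulated: Sequence[float],
                n_predictors: int = 1) -> float:
    """Adjusted r-squared for a simple linear regression of observed on simulated."""
    n = len(observed)
    if n != len(simulated) or n <= n_predictors + 1:
        raise ValueError("need matching samples and n > n_predictors + 1")
    mean_x = sum(simulated) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in simulated)
    syy = sum((y - mean_y) ** 2 for y in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(simulated, observed))
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Hypothetical heights in meters (illustration only).
simulated_m = [1.0, 2.0, 3.0, 4.0]
observed_m = [1.0, 3.0, 2.0, 4.0]
print(f"adjusted r2 = {adjusted_r2(observed_m, simulated_m):.2f}")
```

The adjustment penalizes small samples, so restricting the comparison to very certain heights raises the adjusted r² only if the gain in fit outweighs the loss of data points.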
As the model outputs did not reproduce the debris flow event in a highly satisfactory way (e.g., F > 75%; gRMSE < 0.5 m), it is difficult to justify further inclusion of simulated results to develop physical vulnerability curves. However, the wide ranges of both F scores and gRMSE values indicate that some parameter combinations did perform better than others.

Parameter Importance and Model Behaviour Assessments
To investigate model behavior as a function of the relative sensitivity of individual parameters and the effects of their joint interactions, two statistical methods described in Section 2.2 were applied.
Two regression trees were grown to support this study. The resulting tree models can change according to the defined cp value; each tree describes the ranges of input factor values that generally produce simulations with higher or lower agreement with observed data. For instance, in both of the regression trees, simulated outputs with higher degrees of agreement with observed data were produced by increasing the surface detention input factor from the default value of 0.03 m to a certain limit before performance no longer improves. In Figure 6, the blue circle indicates the cluster of simulation runs (n = 21) that is associated with the lowest averaged gRMSE1 values (0.68 m). These simulations have defined surface detention values < 1.4 m, a volumetric sediment concentration < 47%, and rheological parameters where the viscosity coefficient β2 < 28 and yield stress coefficient α1 < 356 × 10⁶. In Figure 7, the cluster of simulation runs (n = 228) indicated by the blue circle produced results with the highest averaged F scores (44% agreement between observed and simulated extents). These simulations were defined by surface detention values < 0.7 m, a volumetric sediment concentration < 32%, and the viscosity coefficient β2 ≥ 23.
Based on the results of the random forest model (Figure 8), the predictors within the FLO-2D model are identified and ranked. Relative parameter importance is determined based on highest %IncMSE. In this study, volumetric sediment concentration, surface detention, viscosity coefficients, and yield stress coefficients are, in that order, the most important input factors in the FLO-2D model. Specific gravity is associated with a negative %IncMSE. This indicates that the randomly permuted predictor values performed better than the original values. Consequently, while Tables 4 and 5 show notable variability in the specific gravity values associated with top performing simulations, model outputs are insensitive to these changes and the predictor is not considered to play an important role in the model with respect to the contributions of other input factors.

Sensitivity Analysis for Model Calibration
The utility of conducting SA for the calibration of a process model to reproduce a past event was assessed with a series of performance metrics. In natural hazard studies, observations or reference data have inherent uncertainties attributed to a range of sources such as lack of standardized data collection and interpretation guidelines, pre-processing errors, and uncertainties introduced with spatial averaging. Furthermore, sources of uncertainties also exist with respect to model limitations and working assumptions. With the acknowledgement of these uncertainties, absolute corroboration or rejection of a model on the basis of performance metrics alone has its limitations. Nevertheless, the results of a sensitivity analysis provide useful guidance on the identification of parameter importance on model behavior and effectively reduce the amount of parameter space to explore.
Based on the results presented in Tables 3 and 4, it can be observed that the accurate definition of rheological parameters is the basis for capturing representative flow behavior. Additional input factors such as surface detention and volumetric sediment concentration were identified by both regression trees and the random forest model to have a notable effect on improving the overall agreement between simulated and reference data through further calibration. In particular, sediment detention values effectively modify rheological properties by introducing a height threshold to each computational grid cell, so that the debris flow materials are retained for a longer time before further propagating downwards as the detention height threshold is exceeded. In effect, the surface detention value modifies the flow viscosity. Volumetric sediment concentration values are directly used to generate mud hydrographs, which influence the definition of flow characteristics. The results of the sensitivity analysis helped to refine a reasonable range of values for the two input factors for further calibration and highlighted the importance of rheological properties, in addition to showing that variations in specific gravity did not result in notable changes in model outputs. Through the investigation of parameter importance and their joint interactions, we can continue to build a knowledge base for a more efficient locally-based calibration within sub-regions of the parameter space that may be of interest [59]. With respect to process modelling of past torrent events with FLO-2D, it is recommended that in situ sediment samples be collected to represent flow behavior as accurately as possible.

Model Performance Assessment: Potentials and Limitations
Several working assumptions should be considered to determine if the FLO-2D model is suited to replicating torrent events and the potential limitations if it is applied. Firstly, certain gravity-driven geophysical flows that consist of solid particles within a fluid (e.g., snow avalanches, lava, mud and debris flows) are assumed to behave as relatively homogeneous, single-phase flows at a macroscopic scale [28]. In reality, the materials that comprise the matrix may be more accurately described as multi-phased. Furthermore, in the case of Brienz, the matrix was also capable of transporting boulders between 3 and 5 m in diameter. Since particles within granular flows experience frictional and collisional interactions, discrete element modeling is needed to account for these types of interactions in order to effectively describe flow behaviors [81]. The treatment of such geophysical flows as a single-phase phenomenon effectively reduces the complexity of the problem [94,95]. Consequently, both pore pressure and inter-grain frictional effects are considered to be negligible and bulk rheological properties (i.e., constant dynamic viscosity and yield stress) are assumed to apply to the entire matrix in FLO-2D. Furthermore, the theory and working assumption of viscoplasticity may not be valid for describing all geophysical flows, where Coulomb plasticity or other models may describe the behavior more accurately [96,97]. In the study conducted by Bertolo and Bottino [98], the authors expressed that given the complexity of geophysical flows, multiple rheological models may be necessary to capture the full range of characteristics of a specific debris flow. This recommendation also applies to different events occurring at the same location.
Under the assumption of homogeneity, the geophysical flow modelled in FLO-2D should primarily be composed of fine materials. Ideally, laboratory-based assessments of collected field samples can then be conducted to infer rheological properties about the materials [60]. However, field samples of the torrent event of interest may not be available. For these cases, it may be possible to simulate an event using rheological values cited in literature from materials with comparable mineralogical compositions and/or grain size distributions (e.g., [69]).
Model results provide insight about the feasibility of simulating past natural hazard events with limited field-based model input data. The study focused on determining whether ranges of FLO-2D input factor values could be adequately estimated from available data and whether model calibration could generate simulated results with a strong agreement with the reference data. In particular, the definition of flow behavior based on rheological values from literature highlighted the limitations of this approach. For this study, where the exact grain size distribution is unknown and the estimated range of the volumetric concentration is wide (i.e., 30-70%), the event was simulated based on assumptions of homogeneity. Rheological parameter values obtained from literature varied in terms of their mineralogical compositions. Furthermore, it is important to note that flow behavior within a torrent can change within the duration of a given event. Consequently, the defined range of parameters captures generalized flow physics. In lieu of additional information to constrain available rheological values in literature, those that were associated with comparable grain size distributions to the study site were favored.
In a study by Sosio et al. [99], shear stress was observed to vary for flows comprised of greater amounts of solid content at relatively higher volumetric concentrations (Cv = 45-63%). This variability has been attributed to the local effects of air bubbles or the formation of grain clusters in the mixture, which violate the continuum assumption for mud matrices characterized by a higher Cv. This could provide some explanation for the lower agreement of simulated and observed extents when using literature-based values corresponding to materials from torrent events characterized by a lower Cv to calibrate for higher Cv values (i.e., 45-70%) in this study.
Additionally, O'Brien and Julien [84] observed that for mudflow matrices with < 20% volumetric sand concentrations, the viscosity corresponds with that of the silt-clay fraction of the mixture. This relationship is important since rheological analysis to determine viscosity and yield stress values is only conducted with the fine fraction of a given mixture. The specific maximum threshold of grain size that is included in a given rheological analysis is independently determined by each working group. For example, lab-based rheological analyses were conducted based on grain sizes < 0.425 mm and < 0.063 mm from field samples, in studies conducted by Sosio et al. [67] and Boniello et al. [70], respectively. This highlights potential limitations with determining the rheological properties of a complex event, such as the 2005 debris flow, based only on the defined fine fraction of the associated grain size distribution. Uncertainty is further introduced when rheological values derived with this reductionist approach are applied to calibrate a different study site such as Brienz. Furthermore, considering the working assumptions of homogeneity while reviewing the range of materials that are identifiable in post-event photographs, it is clear that the impact of grain sizes beyond the defined fine fraction and larger pieces of debris within the mud matrix cannot be accounted for with this specific model. However, simulated results are generated based on the consistent application of rules. Consequently, outputs will have a consistent degree of error attributed to problem oversimplification or the effects of spatial averaging. This is in contrast to variable, often human errors that can be introduced from data collection and subjective interpretation stages, especially when multiple parties and participants are involved.
The FLO-2D model respects mass conservation. Consequently, over-prediction of the sediment deposition extent is related to the consistent under-prediction of sediment deposition heights. Additionally, the accurate definition of the flow behavior through the rheological parameters characterizes the velocity at which the flow propagates downwards and the manner in which it spreads laterally across the alluvial fan. Lower performing simulations were generally characterized by an over-prediction of the extent, low deposition heights and high velocities attributed to flows with low viscosity (Figure 5, simulation F300). On the contrary, relatively higher performing simulations were generally characterized by a combination of under- and over-prediction of the extent, relatively higher deposition heights and low velocities attributed to more viscous flows (Figure 5, simulation B53).
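The link between extent and height under mass conservation can be made explicit with a short worked relation. The notation is introduced here for illustration only: V is the deposited volume, A_c the area of one grid cell, N the number of cells in the simulated deposition extent, h_i the deposition height in cell i, and \bar{h} the mean deposition height. Material leaving the domain (for example, into the lake) is ignored in this simplification.

```latex
\begin{equation}
  V = A_c \sum_{i=1}^{N} h_i = A_c \, N \, \bar{h}
  \quad\Longrightarrow\quad
  \bar{h} = \frac{V}{A_c \, N}
\end{equation}
```

For a fixed volume V, a simulated extent that covers more cells (larger N) must therefore have a lower mean deposition height.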
The results of this study illustrate that while the visual agreement between simulated and observed deposition extents may look promising, quantifying the degree of agreement with a three-dimensional approach provides a clearer picture about the feasibility of reproducing a past event under the aforementioned constraints. In particular, this addresses past observations that while process models often require calibration, many simulated results have yet to be comprehensively evaluated against field events [32]. In the case of reproducing past torrent events, additional agreement between simulated outputs and reference deposition heights and/or available flow velocities is necessary to determine the accuracy of the defined flow behavior. The current study effectively extends the analysis initiated by past studies that were conducted with the consideration of only a limited number of sediment deposition heights (i.e., n < 20). It is a prerequisite to accurately define flow behavior to be able to generate representative intensity proxies (e.g., velocities and impact pressures) via process modeling. Only then can the inclusion of accurately simulated intensity proxies contribute to the derivation of physical vulnerability curves for consequence analysis.

Conclusions
Based on the analysis results to date, sources of uncertainties from the model, input data and reference data limit further inclusion of simulated intensity proxies from this study for further development of physical vulnerability curves. Nevertheless, the proposed method and tools support the quantitative representation of aggregated uncertainties related to data quality, parameterization and model suitability. In light of these findings, the following recommendations may be of interest to researchers and practitioners in the natural hazards and risk community:
The findings of the sensitivity analysis demonstrated how the most influential input factors within a given model can be identified. This provides guidance on setting priorities for future data collection and process modeling efforts. In particular, the definition of flow behavior is a key prerequisite to obtaining more accurate simulated results and is predicated upon acquiring representative post-event field samples. Collecting field samples of materials to determine rheology in the lab generates much higher agreement in modelled results than the use of rheology reported in literature from other study sites. Additionally, Tiranti and Deangeli [100] presented a method of interest that predicts probable rheologies of alpine debris flows, based on the availability of source area data such as lithology.
Exploration of alternative methods to determine deposition heights consistently, so as to reduce uncertainties about the accuracy of validation data (i.e., reduce the range of confidence levels), is imperative before a more accurate understanding of process and model behavior is possible. For instance, advances in satellite technology, especially after 2015, enable researchers to produce highly accurate and up-to-date digital terrain models that reflect landscape characteristics well, especially as shorter revisit periods make images captured immediately after an event more readily available. Differences in the landscape before and after a hazard event can be calculated from a pre-event image and a post-event image acquired over the affected area immediately after the event. Cavalli et al. [101] presented a method to detect geomorphic changes in mountain catchments based on increasingly available high-resolution DEM data. Alternatively, the difference between pre- and post-event orthophotographs containing height information can also be used to calculate the sediment deposition height across an area of interest.
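The differencing of pre- and post-event elevation data described above can be sketched as follows. This is a minimal illustration that assumes both grids are co-registered, gap-free, and share the same cell size; the function name, the `min_detectable_change` threshold, and all values are hypothetical and are not taken from Cavalli et al. [101] or from this study.

```python
import numpy as np


def deposition_heights(pre_dem, post_dem, min_detectable_change=0.0):
    """Post-event minus pre-event elevation, keeping only deposition above a threshold."""
    pre = np.asarray(pre_dem, dtype=float)
    post = np.asarray(post_dem, dtype=float)
    if pre.shape != post.shape:
        raise ValueError("pre- and post-event grids must have the same shape")
    change = post - pre
    # Cells with erosion, or with change below the assumed detection
    # threshold, are set to zero so that only deposition remains.
    return np.where(change > min_detectable_change, change, 0.0)


# Hypothetical 2 x 2 elevation grids (m above sea level).
pre = [[100.0, 101.0], [102.0, 103.0]]
post = [[100.5, 101.0], [101.8, 104.2]]

heights = deposition_heights(pre, post, min_detectable_change=0.1)

# Deposited volume, assuming square cells of 2 m x 2 m (hypothetical).
cell_area = 4.0
volume = float(heights.sum() * cell_area)
print(heights)
print(f"deposited volume: {volume:.1f} m^3")
```

The threshold stands in for the vertical uncertainty of the elevation data; in practice it would need to be derived from the accuracy of the two surveys rather than chosen freely.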
Exploration of alternative process models with different working assumptions and approaches, together with consideration of models that can be calibrated automatically to support the efficient exploration of larger parameter spaces, is instrumental. Models with the latter characteristic can reach optimal solutions in less computational time.
In summary, the conclusions of this study are twofold. Firstly, the results support previous findings (e.g., [102]) that the main limitation of the FLO-2D model lies in its use of simplified Bingham equations to represent complex debris flow physics. In doing so, the granular flow of heterogeneous materials such as debris and rock fragments within an interstitial mud [103] is reduced

Figure 1. Vectorized post-event orthophotograph [72] capturing the distribution of mud and debris in Brienz.

Figure 2. Schematic illustrating how the different sources of data are fed into the dynamic numerical model to generate simulated outputs; summary scalar variables are subsequently produced to support model performance evaluations and provide insight about the potential use of simulated intensity proxies to further develop physical vulnerability curves.

Figure 3. Two main sets of reference data are available for the study site including sediment deposition extent and 317 point estimates of sediment deposition heights that are associated with spatial coordinates and confidence levels.

Figure 4. Selected examples of post-event photographs of affected buildings in Brienz; sediment deposition heights estimated from the photographs are associated with variable levels of confidence.

Figure 5. The two selected maps exemplify the agreement between simulated and observed sediment deposition extents, in addition to the spatial distribution of the simulated deposition heights for two of the best performing simulation runs; simulated maximum velocities provide a quick comparison with the point-based velocity estimate of 6-8 m/s that was determined by local experts at the Glyssibrücke.

Figure 8. The results of the random forest analysis rank model input factors based on importance.

Table 1. Summary of published literature that assessed FLO-2D model results.

Table 2. Overview of required inputs in FLO-2D and feasible ranges of values specific to the 2005 debris flow event that occurred in Brienz.

Table 3. Summary of selected rheological values from the literature to evaluate with the FLO-2D model with respect to the 2005 debris flow event, in lieu of location-specific post-event field samples; a unique ID is assigned to each group of values.

Table 4. Overview of top 10 performing simulations based on highest F fitness measurement values.

Table 5. Overview of the top 10 performing simulations based on lowest gRMSE1 values calculated with very certain reference height data only.

Table 6. Comparison of parameter combinations and summary scalar variables for the top performing simulation runs for each of the six sets of rheologies obtained from published literature to back-calculate the 2005 debris flow in Brienz.