Advanced Uncertainty Quantification for Flood Inundation Modelling

Flood hazards present a significant risk to the UK, with homes, businesses and critical infrastructure exposed to a mixture of fluvial, surface water and coastal flooding. Climate change is expected to influence river flows, changing the frequency and magnitude of future flood events. Flood hazard assessments are used by decision-makers to implement policies and engineering interventions to reduce the impacts of these flood events. Probabilistic flood modelling can explore input and parameter uncertainties in flood models to fully quantify inundation uncertainty. However, probabilistic methods incur large computational costs, which limits their application. This paper investigates a range of uncertainty quantification methods (traditional Full Monte Carlo (FMC), Kriging and multi-fidelity Monte Carlo (MFMC)) to ease the trade-off between accuracy and cost. Results suggest that Kriging can reduce computational costs by 99.9% over FMC. The significantly increased efficiency has the potential to improve future policy and engineering decisions, reducing the impacts of future flood events.


Introduction
Since 2000, the UK has experienced over GBP 24 Bn in direct economic damages from over 45 significant natural-hazard related disasters including the 2013/14 and 2015/16 storm seasons, affecting over half a million people [1]. The indirect economic damages remain unquantified but are likely even larger. Critically, such natural hazards in the UK are projected to increase in frequency, magnitude, and intensity [2–4]. Designing and optimising cost-effective flood resilience and adaptation plans requires accurate estimation of changes in present and future flood inundation at a range of scales, from regional strategic plans to development-scale assessments. To inform these assessments, scientists and engineers routinely use hydro-dynamic models to simulate flood events and deliver predicted inundation extents; however, these complex numerical models are computationally heavy. This poses scientific challenges to users facing the tension between increasing accuracy (model resolution) and increasing computational burden [5,6].
Future estimates of flood hydrology are required to drive hydro-dynamic models, and producing them is fraught with uncertainties associated with, among others, climate projections, underlying data and flood hydrology methods, as well as the ability to project the impacts of change on river flows [4,7]. Engineering interventions to reduce flood hazard footprints must account for future climate uncertainties in the input hydrograph (i.e., uncertainties in the extreme flows) and be capable of understanding the implications of the entire range of flows for a given return period event. As such, a full probabilistic assessment of the input uncertainty is required. However, due to the computational demands associated with uncertainty quantification, it is common for this to be performed in a deterministic manner using climate change uplifts or sensitivity analysis [8,9].
Water 2024, 16, 1309
Probabilistic assessments of future flood hazards are a necessary, yet costly, step towards greater flood resilience. Exploring the uncertainties associated with input and parameter spaces is a complex issue at the heart of many scientific fields [10–15]. Understanding the entire range of possible outputs provides decision-makers with the most information, allowing informed choices to reduce the impacts of flood events. Gaining a full understanding of the output space, with distributions converged to equilibrium, normally requires an approach using thousands of simulations, which rapidly becomes computationally unfeasible. One such technique is Full Monte Carlo (FMC). The gold standard for probabilistic assessments, FMC uses a brute force approach to quantifying uncertainties, requiring tens of thousands of simulations to provide a robust solution [16]. Such a resource-intensive approach is not possible in time-sensitive commercial flood hazard assessments, and uncertainties thus remain unquantified.
Previous work by the authors has investigated reducing computational demands of uncertainty quantification using stratified sampling (e.g., Latin Hypercube Sampling) and multi-level Monte Carlo (MLMC) methods across three case study locations [5,6]. Aitken et al. (2022) compared MLMC and Latin hypercube sampling (LHS) to traditional FMC, testing both speed and accuracy. Four time limits were used to show the increased accuracy under realistic time constraints; minimum costs for convergence of the output distribution were investigated across the three methods. Results suggested that advanced uncertainty quantification methods, such as MLMC, make high-resolution probabilistic modelling computationally feasible with a 99.5% reduction in core hour costs. However, alternative probabilistic methods may further reduce costs and/or improve the accuracy of these methods. Two such approaches are explored herein: Kriging and multi-fidelity Monte Carlo (MFMC). Originally developed for geostatistical assessments [17], Kriging is a proxy modelling technique that uses a small set of training samples to model the relationship between input and output values. From this smaller, cheaper set of simulations, a larger set of results can be interpolated in seconds, providing output flooded area values for the entire input distribution. Kriging has been used in a number of different fields (including precipitation mapping [18]) but is yet to be applied to flood hazard modelling. The second method, MFMC, builds upon Kriging by combining proxy models across multiple resolutions into a multi-level framework with the aim of further reducing computational costs. MFMC model designs exist in oil reservoir modelling, where two subsurface flow models were investigated with an order of magnitude speedup [19], and in climate modelling of simple climate systems [20]. However, the implementation of Kriged models in a multi-level framework is a novel aspect of MFMC in both flood modelling and across scientific domains.
It is important to note that these two advanced methods are applied across the output flooded area distribution and not during the running of the hydraulic model. This paper explores the potential opportunity for combining Kriging and MLMC into an MFMC method. This is a novel approach for uncertainty quantification in flood hazard assessments. The main purpose of this work is to compare the ability of Kriging and MFMC methods to reduce computational costs associated with high-resolution results. This has been performed by assessing the ability of advanced uncertainty quantification (UQ) methods to match traditional Full Monte Carlo distributions and the minimum costs required to achieve the same errors.

Uncertainty Quantification Methods
Understanding the probability associated with flood hazard assessments is a critical step towards reducing the impacts of increasingly frequent, devastating flood events [21]. Assessing a range of hydrological inputs requires probabilistic methods. This captures the range and distribution of plausible flood footprints associated with a given return period event.

Full Monte Carlo
Uncertainty assessments have traditionally been performed using Full Monte Carlo (FMC). FMC has been applied in many scientific fields [10,11,22,23] and has been demonstrated in flood hazard assessments [5,24–27]; however, the computational demands are significant. In the case of uncertain inflow, uncertainty quantification methods randomly sample inputs from a defined range of estimates and individually run the corresponding hydrographs through a flood model to produce a flooded area prediction. With FMC, thousands of runs are undertaken to deliver an equilibrium distribution of potential flooded area, which quantifies the uncertainty associated with the hydrological input. Combining the individual extents into a probabilistic map provides more detailed insight into potential flood events.
The major drawback of FMC lies with the large sample sizes required to ensure convergence of the results, creating unfeasible computational costs. A sample size of approximately 10,000 simulations [16] is needed to ensure coverage of the uncertain parameter and converged outputs. Sampling errors converge with O(1/√n), for sample size n, independent of the parameter dimensionality. This slow rate of convergence is a consequence of the random sampling technique, which tends to suffer from clustering. Thousands of simulations may be possible for cheap, low-resolution models, but this approach rapidly becomes unfeasible for more finely discretised models. However, FMC remains the gold standard for probabilistic methods and, as such, is the benchmark to which novel uncertainty quantification (UQ) methods can be compared. This ensures that accuracy can be maintained while cost-reducing measures are investigated.
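The O(1/√n) convergence can be illustrated with a minimal Python sketch. The function `toy_flooded_area` is a hypothetical closed-form stand-in for a hydraulic model, and the uniform inflow range is an illustrative assumption rather than the distribution used in this study.

```python
import math
import random
import statistics

# Illustrative inflow range (m^3/s); an assumption, not the fitted distribution.
LOW, HIGH = 60.0, 170.0


def toy_flooded_area(peak_inflow):
    """Hypothetical stand-in for a flood model: flooded area from peak inflow."""
    return 1.0e4 * math.sqrt(peak_inflow)


# Exact mean of the toy output for a uniform inflow, used as the reference.
REFERENCE = 1.0e4 * (2.0 / 3.0) * (HIGH ** 1.5 - LOW ** 1.5) / (HIGH - LOW)


def fmc_mean(n, rng):
    """Plain Monte Carlo estimate of the mean output from n random inflows."""
    outputs = [toy_flooded_area(rng.uniform(LOW, HIGH)) for _ in range(n)]
    return statistics.fmean(outputs)


def rms_error(n, repeats=200, seed=1):
    """Root-mean-square error of the n-sample estimate over repeated experiments."""
    rng = random.Random(seed)
    squared = [(fmc_mean(n, rng) - REFERENCE) ** 2 for _ in range(repeats)]
    return math.sqrt(statistics.fmean(squared))


if __name__ == "__main__":
    # Quadrupling n should roughly halve the error.
    for n in (100, 400, 1600):
        print(n, round(rms_error(n), 1))
```

Under these assumptions, a 16-fold increase in sample size reduces the error by a factor of roughly four, which is why halving the sampling error of a 10,000-run FMC would require about 40,000 runs.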

Multi-Level Monte Carlo (MLMC)
Multi-level Monte Carlo (MLMC) is a UQ technique that can deliver high-resolution probabilistic outputs as a combination of a few fine grid models with a larger sample of cheaper coarse grid models. Coarse grid models are used to estimate a value of interest before adjusting this with an error calculated between the fine and coarse resolutions. A full description of the MLMC method with comparisons to FMC and LHS can be found in Aitken et al. (2022). MLMC provides a robustly tested output distribution that has already been compared directly to FMC methods in a number of fields [6,28,29]. Aitken et al. (2022) identified the triple combination MLMC method to be the most effective for the three case studies examined in this paper. The work compared three MLMC models (5 m-10 m; 5 m-20 m; 5 m-10 m-20 m) with LHS and FMC. It was found that the 5-10-20 m MLMC combination reduces the costs to produce converged outputs with FMC sampling accuracy. Therefore, MLMC will herein refer to this triple combination version.
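As a minimal sketch of the idea, the Python below estimates a fine-grid mean from many coarse runs plus a few paired fine-minus-coarse corrections. The function `toy_model`, its grid-dependent bias and the sample counts are hypothetical choices for illustration; they are not the LISFLOOD-FP models or the sample sizes of Aitken et al. (2022).

```python
import math
import random
import statistics


def toy_model(peak_inflow, grid_m):
    """Hypothetical flood model whose output carries a grid-dependent bias."""
    return 1.0e4 * math.sqrt(peak_inflow) * (1.0 + 0.002 * grid_m)


def mlmc_mean(levels, n_samples, rng, low=60.0, high=170.0):
    """Multi-level estimate of the finest-grid mean.

    levels    : grid sizes ordered coarsest to finest, e.g. [20, 10, 5]
    n_samples : number of runs per level (many coarse, few fine)
    """
    estimate = 0.0
    for i, (grid, n) in enumerate(zip(levels, n_samples)):
        inflows = [rng.uniform(low, high) for _ in range(n)]
        if i == 0:
            # Coarsest level: plain Monte Carlo on the cheap model.
            terms = [toy_model(q, grid) for q in inflows]
        else:
            # Correction level: the same inflow is run on both grids so that
            # the difference has a small variance and needs few samples.
            coarser = levels[i - 1]
            terms = [toy_model(q, grid) - toy_model(q, coarser) for q in inflows]
        estimate += statistics.fmean(terms)
    return estimate


if __name__ == "__main__":
    rng = random.Random(42)
    print(round(mlmc_mean([20, 10, 5], [2000, 100, 20], rng)))
```

The design choice that matters is the pairing of inflows within each correction level: because both grids see the same input, most of the sampling noise cancels in the difference.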

Kriging
Proxy modelling is an alternative UQ method that emulates the behaviour of a simulation model at a much lower computational cost. Proxy models use known results to determine the relationship between input and output variables and interpolate for the unknown values [12,30,31]. A trained proxy model can emulate simulations for a large number of input values in minutes, reducing computational costs considerably [32].
Kriging is one such proxy model, which assumes a Gaussian process to produce interpolated values [33]. Commonly used in geostatistics [17,34], Kriging models are developed using a set of training samples, and the accuracy of the model is thus dependent upon the sample size and sampling approach. A minimum number of samples is required to ensure accurate interpolation. A well-trained proxy model will then be able to produce as many samples as necessary in seconds. In this study, Kriging was performed using the 'buildKriging' function in the R package 'SPOT' (version 2.1.8 [35]) with the function based on code by Forrester et al. (2008) [36]. Due to the reliance upon training samples, Kriging is highly dependent upon the sampling approach: random, low-discrepancy, or orthogonal. It has been found that implementing an equidistant Sobol sequence provides the best coverage of the input distribution (Supplementary Material S1). Therefore, from this point onwards, Kriging will refer to a Kriged proxy model with Sobol sampled input values.
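The workflow can be sketched in a few lines of Python. This is not the SPOT 'buildKriging' implementation: it is a bare one-dimensional Gaussian-process interpolator with a fixed correlation length and a small nugget, both chosen here for illustration. The training inputs come from the base-2 van der Corput sequence, which in one dimension yields the same points as a Sobol sequence (up to ordering), and `toy_flooded_area` is a hypothetical stand-in for the hydraulic model.

```python
import math


def van_der_corput(i):
    """i-th point (i >= 1) of the base-2 van der Corput low-discrepancy sequence."""
    value, denom = 0.0, 1.0
    while i:
        denom *= 2.0
        i, bit = divmod(i, 2)
        value += bit / denom
    return value


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def correlation(u1, u2, length):
    """Gaussian (squared-exponential) correlation between two inputs."""
    return math.exp(-0.5 * ((u1 - u2) / length) ** 2)


def build_kriging_sketch(us, ys, length=0.2, nugget=1e-6):
    """Fit a constant-mean Kriging interpolator; length and nugget are assumed."""
    mean = sum(ys) / len(ys)
    k = [[correlation(a, b, length) + (nugget if i == j else 0.0)
          for j, b in enumerate(us)] for i, a in enumerate(us)]
    weights = solve(k, [y - mean for y in ys])

    def predict(u):
        return mean + sum(w * correlation(u, ui, length) for w, ui in zip(weights, us))

    return predict


def toy_flooded_area(u):
    """Hypothetical model: u in [0, 1] is mapped to an inflow, then to an area."""
    inflow = 60.0 + 110.0 * u
    return 1.0e4 * math.sqrt(inflow)


# Twelve "expensive" training runs, then the proxy is free to evaluate.
TRAIN_U = [van_der_corput(i) for i in range(1, 13)]
TRAIN_Y = [toy_flooded_area(u) for u in TRAIN_U]
proxy = build_kriging_sketch(TRAIN_U, TRAIN_Y)

if __name__ == "__main__":
    for u in (0.1, 0.4, 0.9):
        print(u, round(proxy(u)), round(toy_flooded_area(u)))
```

Once `proxy` is built, evaluating it for thousands of sampled inflows costs a negligible amount compared with a single hydraulic model run, which is the source of the cost savings discussed in this paper.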
The convergence of each Kriged model was tested to ensure accurate output distributions from the smallest training set possible. A convergence error, defined as the Kolmogorov-Smirnov (KS) test error between two consecutive Kriged model outputs, was used to determine whether the model had reached equilibrium. Further KS error testing of the converged model against outputs produced using larger sample sizes was performed to ensure complete convergence, removing the potential for a local convergence error minimum. For the work herein, a convergence error of 0.01 was used, equivalent to a 1% error in the cumulative distribution functions.
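A minimal Python sketch of this convergence check follows. It computes the two-sample KS statistic (the largest vertical distance between two empirical cumulative distribution functions) and compares it with the 0.01 tolerance. The function names are illustrative, and how the two output samples are generated is left to the caller.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum distance between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    distance = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x so that ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        distance = max(distance, abs(i / len(a) - j / len(b)))
    return distance


def has_converged(previous_outputs, current_outputs, tolerance=0.01):
    """True when consecutive proxy outputs differ by less than the tolerance."""
    return ks_statistic(previous_outputs, current_outputs) < tolerance


if __name__ == "__main__":
    print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

In use, `previous_outputs` and `current_outputs` would be the flooded areas predicted by proxies trained on consecutive training-set sizes, evaluated on the same set of sampled inflows.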

Multi-Fidelity Monte Carlo
The final method tested in this paper is a novel multi-fidelity Monte Carlo (MFMC) approach combining proxy modelling (Section 2.1.3) within a multi-level framework (Section 2.1.2). The principle of multi-fidelity approaches is to combine low-cost, low-fidelity models with fewer high-fidelity models to speed up the estimation of the quantity of interest [19,37]. MFMC applies the Kriging approach, with training sets sampled using a low-discrepancy sampling approach (Sobol sequencing) in a multi-level structure. This produces fine-grid-accuracy results with a lower, more manageable computational cost.
MFMC uses the same equations as MLMC with the addition of proxy-modelled Kriging terms, where l_0 and L are the coarsest and finest grids, respectively; H^K_l is the Kriging model estimation at level l; and H is the quantity of interest. Similar to Kriging, a well-trained MFMC model will be able to produce a large number of outputs in seconds.
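The governing expression is not reproduced in this text. As a hedged illustration only, and assuming it follows the standard MLMC telescoping sum with each model output replaced by its Kriging estimate, it would take the following form, using the symbols defined above:

```latex
% Assumed form: standard MLMC telescoping sum with Kriging proxies H^K_l
% substituted for direct model outputs. Not recovered from the source text.
\mathbb{E}\left[H_L\right] \approx \mathbb{E}\left[H^{K}_{l_0}\right]
  + \sum_{l = l_0 + 1}^{L} \mathbb{E}\left[H^{K}_{l} - H^{K}_{l-1}\right]
```

The first term is the cheap coarse-grid estimate, and each term in the sum is a correction between consecutive resolutions.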
The multi-level aspect of MFMC means that there are 3 different combinations of 3 model resolutions, which could be used to produce 5 m-resolution outputs, namely: 5-10-20, 5-10, and 5-20. However, for the sake of clarity, this paper will focus on the triple-combination 5-10-20 model. A comparison of MFMC combinations is included in the Supplementary Material S2.

The MFMC algorithm proceeds as follows:
1. Set a sequence of grid resolutions l = l_0, ..., L, and fix the number of training samples N_t and the accuracy ε.
2. Starting with l = l_0, create a Kriging proxy model using N_t samples and check the convergence; if the criteria are met, go to step 3. Otherwise, add more samples.
3. Perform N_l = N_up samples of the Kriging model at level l and compute the resulting level estimate.
4. Update the estimates of the mean, the variance of the estimator and the cost for each mesh/grid.
5. Solve the optimization problem and update the required number of samples N_l.
6. Evaluate extra samples at each level and then check the criteria are met as above. For each level, more samples can be added to satisfy the criteria as necessary.
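The optimization problem in the algorithm is not spelled out in this text. As a sketch under the assumption that it is the standard MLMC allocation (minimise total cost subject to the estimator variance not exceeding ε²), the required samples per level can be computed as below. The variances, costs and tolerance in the example are invented for illustration.

```python
import math


def optimal_samples(variances, costs, eps):
    """Standard MLMC allocation: N_l = eps^-2 * sqrt(V_l / C_l) * sum_k sqrt(V_k * C_k).

    variances : per-level variance of the level term (coarsest first)
    costs     : per-level cost of one sample
    eps       : target standard error of the combined estimator
    """
    total = sum(math.sqrt(v * c) for v, c in zip(variances, costs))
    return [math.ceil(math.sqrt(v / c) * total / eps ** 2)
            for v, c in zip(variances, costs)]


def estimator_variance(variances, n_samples):
    """Variance of the multi-level estimator for a given allocation."""
    return sum(v / n for v, n in zip(variances, n_samples))


if __name__ == "__main__":
    # Hypothetical levels: variance falls and cost rises towards the fine grid.
    n = optimal_samples([4.0, 1.0, 0.25], [1.0, 4.0, 16.0], eps=0.1)
    print(n, estimator_variance([4.0, 1.0, 0.25], n))
```

Levels with high variance and low cost receive the most samples, which is why the coarse grid carries the bulk of the evaluations.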

Comparing Methods
To assess the performance of the novel uncertainty algorithms, two particular considerations must be tested: (i) statistical sampling accuracy for a given computational cost (i.e., in situations where tests are performed with time constraints); and (ii) the computational cost to achieve a given statistical sampling accuracy (i.e., where computational resource is constrained). These tests have been performed and the results are presented for a range of scenarios (Section 2.3).
The study explored a robust set of scenarios:
1. A set of realistic commercial time constraints (6 h, 12 h, 24 h, and 48 h) was designed to test statistical sampling accuracy.
2. For computational cost calculations, the minimum time requirements to achieve a sampling error equivalent to converged results (i.e., as with FMC (n = 10,000) [16]) are compared and the speedup over FMC is determined.
The second test is computed by calculating the total computational cost associated with converged errors: MLMC and MFMC convergence is a built-in feature of the multi-level method algorithm [29]; Kriging convergence is dependent upon the training sample sizes and the input-output relationship with convergence once more a built-in feature.
Additionally, the mean arctangent absolute percentage error (MAAPE) and Kolmogorov-Smirnov (KS) test are used to test the convergence of flooded area output distributions to the distribution of the FMC flooded outputs.
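For reference, MAAPE is commonly defined as the mean of arctan(|(A − F)/A|) over paired reference values A and estimates F. The Python sketch below uses that definition. Pairing the two distributions through matched quantiles or sorted samples is an assumption made here, since the pairing is not specified in this text.

```python
import math


def maape(reference, estimate):
    """Mean arctangent absolute percentage error (in radians, range 0 to pi/2)."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    total = 0.0
    for a, f in zip(reference, estimate):
        if a == 0.0:
            # arctan of an infinite ratio is pi/2; a zero error contributes nothing.
            total += 0.0 if f == 0.0 else math.pi / 2.0
        else:
            total += math.atan(abs((a - f) / a))
    return total / len(reference)


if __name__ == "__main__":
    fmc_quantiles = [1.90e6, 2.00e6, 2.10e6]      # hypothetical flooded areas (m^2)
    proxy_quantiles = [1.91e6, 2.00e6, 2.08e6]    # hypothetical proxy outputs (m^2)
    print(maape(fmc_quantiles, proxy_quantiles))
```

Unlike the ordinary percentage error, the arctangent keeps each term bounded, so a single near-zero reference value cannot dominate the average.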

Study Areas
Three Scottish case studies as presented in Aitken et al. (2022) [6] have been used to investigate reduced-cost uncertainty quantification methods: Dyce (River Don, NE Scotland, UK), Inverurie (River Don/River Urie, NE Scotland, UK), and Glasgow (River Clyde, W Scotland, UK) (see Figure 1). The different domain sizes, computational costs, and topographic and domain characteristics provide a diverse set of studies for robust testing of the advanced methods (Table 1). The model of Dyce on the River Don provides a simple case study with one channel, and a mixture of urban and rural land use in the riparian zone. Inverurie is upstream of Dyce on the River Don. This model includes the confluence of the Rivers Don and Urie and a more heavily urbanised floodplain. The Clyde model flows through Glasgow in a highly urbanised and industrial location.

Numerical Modelling: Hydraulic Model: LISFLOOD-FP
This research has employed LISFLOOD-FP [39], a 2D reduced physics hydro-dynamic model, due to the computational efficiency offered. The LISFLOOD-FP model is a 1D-2D hybrid model using a finite-difference solver [40]. Other commercially available 2D models such as Telemac2D, TuFLOW or HECRAS2D could be used. LISFLOOD-FP was chosen based on the speed of computation for one simulation, which was critical for this research.

Data for the models, including river channel bathymetry, DEM and floodplain land use, were sourced from the Scottish Environmental Protection Agency (SEPA) [41]. All three models use the sub-grid channel solver and accelerated floodplain solver. National River Flow Archive gauging stations have been used as upstream boundaries for each of the locations (see Table 1). Five models of each location were built at 20 m, 10 m, 5 m, 2.5 m and 1 m discretisation to allow a variety of UQ methods combining different resolutions.
Three models were built and calibrated using PSO following Aitken et al. (2022) [6].


Model Resolution
Hydraulic model resolution has a large influence on the simulated flood extent (finer discretisation (1 m/2.5 m grids) should result in higher accuracy [42]) and model efficiency (finer discretisation (1 m/2.5 m grids) will result in longer run times [43]). Achieving a compromise between cost and accuracy relies on an expert understanding of the models and domains.
Five different resolution models (20 m/10 m/5 m/2.5 m and 1 m grid) were built for each case study. A 5 m resolution is identified as the fiducial model, where increasing discretisation (e.g., 2.5 m or 1 m grids) does not represent significant improvement (within 1.5% of the inundated area) in the inundation extent but does increase the computational burden (see Figure 2). Consequently, a 5 m-resolution model was considered for the rest of the research.


Estimating the Uncertain Inflows
One of the most uncertain inputs in flood modelling is the hydrological boundary condition [5,27]. Quantifying uncertainty in flood hydrology translates through the flood modelling chain into uncertainties in flood inundation extent and depth. Uncertainties in the hydrology can arise from several sources, for example, the gauging data (in this case, the data from the NRFA), from the extreme value distribution fitting to estimate return period events, or from cascading uncertainties from climate models to future hydrological projections.
In this paper, only a single uncertain parameter was considered: inflow variability. A 1:30 year return period event was estimated using the recorded gauge data (NRFA). This return period event is chosen as a medium-magnitude, medium-frequency event to demonstrate the applicability of the investigated methods. In the UK, a 1:30 year event is often considered for planning purposes, whereas for flood alleviation schemes the chosen event is often more extreme and less frequent (e.g., 1:100-year or 1:200-year return period event) [44].
A Generalised Extreme Value (GEV) distribution was fit to the 30-year annual maximum data from 1970 to 2000 using recorded gauge data. Maximum likelihood estimation was used to fit the data prior to calculating a 95% confidence interval. This follows the approach documented in the Flood Estimation Handbook [45] and applied to hydrological studies across the UK [21,46]. Generalised Pareto (GP) or generalised logistic (GLO) distributions are also used to fit flow series data within the UK [47] but are not considered in this study. This provides extreme value distributions from which peak inflows can be sampled for each case study. These are given as follows: Dyce, 62.79–172.02 m³/s; Inverurie, 38.93–106.2 m³/s; Glasgow, 269.23–759.63 m³/s. Full input distributions can be found in Aitken et al. (2022) [6]. Using the recorded flow data, a hydrograph shape from a recent event was selected (December 2015 (Dyce and Inverurie), December 1994 (Glasgow)) and used as the hydrograph shape for scaling purposes (Figure 3). Extreme value estimates were calculated for each gauge in the three modelled domains. Alternative methods of distribution generation include L-Moments [4,48] and ordinary moments [49]; however, neither has been used in this assessment.
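The two operations described here, reading a return-period flow off a fitted GEV distribution and scaling an observed hydrograph shape to a sampled peak, can be sketched in Python as follows. The GEV parameters and the hydrograph ordinates are hypothetical, and the sign convention for the shape parameter (positive values give a heavy upper tail) is an assumption; hydrological texts often use the opposite sign.

```python
import math


def gev_quantile(p, loc, scale, shape):
    """Flow with non-exceedance probability p for a GEV(loc, scale, shape)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y = -math.log(p)
    if abs(shape) < 1e-12:
        # Gumbel limit of the GEV distribution.
        return loc - scale * math.log(y)
    return loc + scale / shape * (y ** (-shape) - 1.0)


def return_period_flow(t_years, loc, scale, shape):
    """Annual-maximum flow exceeded on average once every t_years."""
    return gev_quantile(1.0 - 1.0 / t_years, loc, scale, shape)


def scale_hydrograph(shape_flows, target_peak):
    """Rescale an observed hydrograph so that its peak equals target_peak."""
    factor = target_peak / max(shape_flows)
    return [q * factor for q in shape_flows]


if __name__ == "__main__":
    # Hypothetical GEV parameters (m^3/s) and a made-up hydrograph shape.
    peak = return_period_flow(30, loc=100.0, scale=20.0, shape=0.0)
    print(round(peak, 1), scale_hydrograph([10.0, 40.0, 25.0, 12.0], peak))
```

Each sampled peak inflow would be passed through `scale_hydrograph` to produce the upstream boundary condition for one model run.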

Results
Previous research by the authors has identified multi-level Monte Carlo (MLMC) as a more efficient uncertainty quantification method than FMC with no loss in output accuracy [6]. Results from that study suggest that MLMC methods reduce the total computational cost by at least 99.5% for all three case studies investigated. As such, the following methods will be compared directly to FMC and MLMC (specifically the three-resolution (5-10-20) version) using the accuracy tests and computational cost assessments described in Section 2.2. For clarity, these include the time required to match FMC outputs and the accuracy of flooded area output distributions produced by running as many simulations as possible in the allotted time constraints.
Every result is compared at a 5 m grid resolution, although MLMC and MFMC use coarser grids to increase computational efficiency. This has been identified as the most appropriate resolution for each case study, providing manageable individual simulation costs with low amounts of discretisation error (Section 3.2).

Dyce: Time Constraints
Four time constraints representative of commercial models have been used to explore performance within computational limits: 6 h, 12 h, 24 h, and 48 h. Results have been compared to FMC and MLMC model outputs, testing the accuracy of the advanced UQ methods within similar confined costs from the previous paper by Aitken et al. (2022) [6] (note that the largest time constraint from the previous study has been replaced with a smaller 6 h limit, reflecting the improvements gained from the advanced methods). Violin plots are used to compare the shape and range of output flooded area distributions (Figure 4). For each time limit, MLMC, Kriging and MFMC results are compared directly to FMC outputs. This provides insight into which model replicates the gold-standard FMC distribution the most effectively.
Results show that Kriging produces the most accurate outputs for every time limit considered. The Kriging method captures the general long-tailed distribution shape and matches FMC extremes even at the lowest 6 h limit (equating to 12 training simulations) (see Figure 4). The complex bi-modal distribution shape is captured for the larger two limits, more efficiently than with the other methods tested.
MFMC results show greater efficiency than MLMC, but the performance is not as strong as Kriging. Output distributions match the general FMC shape for upper time constraints (faster than MLMC) but miss the detailed inflexions that Kriging captures. The extremes of the FMC results are replicated across all times investigated as a result of the Sobol sequenced training values. However, the distribution shape is not picked up as accurately as Kriging. The general long-tailed shape is captured across all times with the bi-modal shape beginning to form with the higher time constraints.
Both Kriging and MFMC methods outperform MLMC across each of the four realistic time constraints. Implementing either of the advanced methods produces more accurate results than multi-level Monte Carlo for both the extremes and shape of FMC. Kriging was found to converge towards the complex output distribution more efficiently than other methods tested.

Dyce: Comparison of Methods: Required Simulations for Error Convergence
Computational costs were compared for the three uncertainty quantification techniques where the sampling error was set to match FMC (n = 10,000). To do this, the number of simulations to achieve converged results for the 5 m-resolution model was recorded. Convergence is a built-in feature of both MLMC and MFMC methods and is dependent upon the ratios of computational cost and variance between resolution levels.
Introducing a Kriged model reduces the number of simulations to 59 on the 5 m model, a 1.42-fold speed up over MLMC (see Table 2). Implementing a proxy modelling approach reduces the time required to match the FMC output distribution to less than 30 h. This large reduction in cost equates to a 169-fold speedup over FMC. Both MAAPE and K-S test results indicate strong agreement between the output distributions with a 0.21% mean error and 0.029 maximum distance.
Combining three Kriged models in an MFMC model reduces the number of 5 m simulations required for the proxy model but has a larger overall cost than Kriging. MFMC improves costs over FMC and MLMC with a 1.3-fold speedup over the latter (see Table 2). However, the need to create and train three Kriged models results in a 2.4 h increase in costs over Kriging. MFMC produces 0.19% mean error values across the distribution, 0.02% lower than Kriging and 1.8% lower than MLMC. The K-S test d-value is also found to be lower than other methods tested. These improvements over Kriging may be expected due to the larger costs required.
High-resolution probabilistic modelling of the Dyce domain can be achieved in under 30 h, a feasible cost for commercial uses. By implementing a Kriged approach with Sobol sampling, a full uncertainty quantification of Dyce has been completed faster than previously possible. The MAAPE and K-S values indicate strong agreement between the results, which were obtained from 59 samples (169 times faster than FMC).

Inverurie
Two uncertainty quantification methods have been investigated for the Inverurie case study: Kriging and MFMC. These are compared to previous results from an FMC and MLMC approach [6].

Inverurie: Time Constraints
Following a similar methodology to Dyce, the UQ method accuracy has been tested across four realistic time constraints: 6 h, 12 h, 24 h, and 48 h. The output distributions have been compared using violin plots (Figure 5) to highlight the importance of replicating both the distribution shape and extreme regions.

Results show that Kriging is the most effective uncertainty quantification method tested. Distribution extremes are captured throughout the four tests for both MFMC and Kriging, with the latter able to replicate the distribution shape more effectively (Figure 5).
Applying a Kriged model produces accurate results matching the FMC distribution from 12 h upwards. The lowest 6 h time limit deviates slightly from the FMC results at around 2.1 × 10⁶ m². Nevertheless, the overall performance of Kriging for Inverurie is significantly improved in comparison to current uncertainty quantification methods (FMC, Latin Hypercube Sampling, and MLMC).
MFMC is able to match the distribution extremes and general shape but struggles to model inflexion details in the distribution. Sobol sequencing allows accurate modelling of the extremes and quickly picks up the long-tailed distribution shape. However, the detailed fluctuations of the distribution are not captured as a consequence of three-resolution Kriged models diluting the results. This suggests differences in the distribution shape between resolutions. Although outperforming MLMC, the three-model MFMC approach is less efficient than a single Kriged model.
Both advanced UQ methods outperform MLMC across each of the four time constraints investigated. The ability to reduce the number of simulations with no loss in output distribution reinforces the potential of these methods to significantly reduce the computational burden associated with probabilistic flood modelling.

Inverurie: Comparison of Methods: Required Simulations for Error Convergence
Minimum costs required to achieve the same sampling errors have been assessed for the three UQ methods. In each case, the target error was equivalent to an FMC approach with n = 10,000 samples. This highlights which method has the most efficient convergence towards an equilibrium distribution.
Implementing a low-discrepancy Kriged proxy model requires 14 flood model runs to replicate the shape and extremes of traditional probabilistic methods. The efficiency of Kriging for Inverurie allows FMC outputs in less than 14 h, a 714-fold speedup over traditional probabilistic methods (Table 3). Compared to MLMC, this is a further 38 h drop in computational costs. Both MAAPE and K-S tests indicate a strong correlation between the Kriged outputs and FMC results. MAAPE errors between output distributions are less than 0.001%, a consequence of the simple inflow-output relationship observed for Inverurie. MFMC requires the second lowest cost at 28.8 h, 15 h slower than Kriging. Combining multiple-resolution Kriged models produces the same FMC sampling errors with a 1.8-fold speedup over MLMC. The MAAPE values and K-S distances are larger than Kriging but small enough to suggest that the differences are insignificant. Results indicate that MFMC can efficiently reduce the computational demands of uncertainty quantification, although with larger costs than pure Kriging.
Kriging and MFMC outperform MLMC, with the former requiring only 14 simulations for accurate high-resolution probabilistic outputs. This large cost reduction highlights the ability of these methods to accurately assess inflow uncertainty in less time than previously thought possible.
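The proxy-model workflow described above can be illustrated with a minimal sketch, given here in Python for portability although the study's own code is written in R. It trains a one-dimensional Kriged (Gaussian process) interpolator on 14 low-discrepancy inflow samples and then draws 10,000 proxy outputs at negligible cost. Several elements are assumptions made for illustration only: `toy_flood_model` is a hypothetical stand-in for a hydraulic simulation, inflow is normalised to [0, 1] and sampled uniformly rather than from the fitted flow distribution, the design uses a base-2 van der Corput sequence (the one-dimensional analogue of a Sobol sequence) with both end points added, and the Matérn 3/2 correlation length is fixed rather than estimated.

```python
import math
import random


def van_der_corput(n):
    """First n points of the base-2 van der Corput low-discrepancy sequence."""
    points = []
    for index in range(1, n + 1):
        k, x, denom = index, 0.0, 1.0
        while k:
            denom *= 2.0
            k, bit = divmod(k, 2)
            x += bit / denom
        points.append(x)
    return points


def matern32(r, length=0.3):
    """Matern 3/2 correlation; the length scale is an untuned assumption."""
    a = math.sqrt(3.0) * abs(r) / length
    return (1.0 + a) * math.exp(-a)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_kriging(xs, ys, nugget=1e-10):
    """Return a predictor that interpolates (xs, ys) with a constant mean."""
    mean = sum(ys) / len(ys)
    cov = [[matern32(xi - xj) + (nugget if i == j else 0.0)
            for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    weights = solve(cov, [y - mean for y in ys])

    def predict(x):
        return mean + sum(w * matern32(x - xi) for w, xi in zip(weights, xs))

    return predict


def toy_flood_model(u):
    """HYPOTHETICAL stand-in for one flood model run (normalised units)."""
    return u + 0.1 * math.sin(3.0 * u)


# 14 training runs: both extremes plus 12 low-discrepancy interior points.
design = sorted([0.0, 1.0] + van_der_corput(12))
surrogate = fit_kriging(design, [toy_flood_model(u) for u in design])

# Probabilistic output: 10,000 proxy evaluations instead of 10,000 model runs.
rng = random.Random(0)
samples = [surrogate(rng.random()) for _ in range(10_000)]
```

Only the 14 calls to `toy_flood_model` correspond to expensive simulations; every further sample is a weighted sum of correlation values, which is why additional outputs can be generated in seconds once the proxy is trained.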

Glasgow
As for Inverurie, two advanced uncertainty quantification methods have been investigated for Glasgow. MFMC and Kriging have been compared directly to FMC (n = 2500) and MLMC results. The larger simulation costs for Glasgow (72 min for a 5 m run) prevent a full 10,000 simulations for FMC; thus, a sample size of 2500 was selected as a 50% reduction in the error convergence. Unfeasible FMC costs further reinforce the need for significant cost reductions. Glasgow has a simple uni-modal output distribution, which is likely to improve UQ method efficiency.
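One way to read the sample-size choice above is through the standard Monte Carlo result that the sampling error of a mean scales as 1/√n. The short sketch below, in Python, works through that arithmetic together with the serial run time implied by the quoted 72 min per simulation. The unit standard deviation is a placeholder, and the assumption that every run costs exactly 72 min and is executed sequentially is made here for illustration only.

```python
import math


def mc_standard_error(sigma, n):
    """Standard error of a Monte Carlo mean from n independent samples."""
    return sigma / math.sqrt(n)


SIGMA = 1.0            # placeholder output standard deviation (assumption)
MINUTES_PER_RUN = 72   # cost of one 5 m Glasgow run, as quoted in the text

# Quartering the sample size doubles the sampling error.
error_ratio = mc_standard_error(SIGMA, 2500) / mc_standard_error(SIGMA, 10_000)

# Serial compute time for each FMC sample size, in hours.
hours_2500 = 2500 * MINUTES_PER_RUN / 60
hours_10000 = 10_000 * MINUTES_PER_RUN / 60

print(f"error ratio (n=2500 vs n=10,000): {error_ratio:.1f}")
print(f"FMC cost: {hours_2500:.0f} h (n=2500), {hours_10000:.0f} h (n=10,000)")
```

Under these assumptions, even the reduced FMC sample requires thousands of hours of serial computation, which illustrates why a full 10,000-run analysis was not feasible for Glasgow.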

Glasgow: Time Constraints
Output distributions have been compared across four realistic time constraints: 6 h, 12 h, 24 h, and 48 h. The extreme values and shape of flooded area distributions have been compared to current methods using violin plots (Figure 6). Results show that Kriging and MFMC are capable of replicating FMC output distributions for every time constraint tested. The two advanced UQ methods match the extremes and shape of the FMC distribution accurately even at the lowest (6 h) time constraint (see Figure 6). In contrast, MLMC results only begin to reflect the correct distribution at the 48 h time limit. Lower time limits are poorly captured by MLMC, requiring a larger number of simulations than permitted.
It is clear from these results that Kriging and MFMC outperform MLMC in quantifying the inflow uncertainty for this case study. Once more, advanced UQ methods are able to converge to an equilibrium distribution much faster than MLMC. The simple output distribution and linear inflow-flooded area relationship induce rapid convergence of the proxy models and allow significantly lower computational costs than traditional methods.

Glasgow: Comparison of Methods: Required Simulations for Error Convergence
Once more, the minimum costs required to achieve the same accuracy as FMC (n = 2500) have been assessed across the two new UQ methods, with MLMC providing a further comparison.
Kriging has the lowest computational demand, requiring 10 simulations to replicate the FMC output distribution (see Table 4). Implementing a proxy modelling approach allows probabilistic assessment of Glasgow in 12 h, significantly faster than MLMC. This large reduction in costs can be attributed to the simple unimodal shape of the output probability density function and the relatively simple inflow-flooded area relationship, both of which are a consequence of the topographical constraints of the modelled area. MAAPE and K-S values suggest a strong correlation between output distributions with a mean error value of less than 0.05% (see Table 4). This small error combined with the 1000-fold speedup over FMC highlights the potential of this method to drastically reduce the computational costs of probabilistic flood modelling without any loss in distribution accuracy. Multi-fidelity methods perform well for Glasgow but continue to have a larger computational burden than individual Kriged models. MFMC increases uncertainty quantification efficiency over MLMC with 19.4 h required for converged results, a 5.4-fold speedup (Table 4). However, the multi-level framework dilutes the high-resolution outputs and increases costs compared to Kriging. MFMC produces slightly larger errors than Kriging, but both tests show a high degree of accuracy towards FMC outputs.
Both advanced UQ methods have been shown to significantly reduce computational costs over FMC and MLMC. The efficiency of these new methods allows accurate probabilistic assessments in remarkably low time, vastly improving the feasibility of flood model uncertainty quantification.
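The two agreement measures referred to in these comparisons can be sketched as follows, in Python. MAAPE is taken here as the mean of arctan(|(reference − estimate)/reference|), and the K-S distance as the largest gap between two empirical cumulative distribution functions. How the study pairs values before computing MAAPE is not restated in this section, so comparing sorted samples of equal size (matched quantiles) is an assumption, as are the function names.

```python
import math


def maape(reference, estimate):
    """Mean arctangent absolute percentage error of paired values (radians)."""
    total = 0.0
    for ref, est in zip(reference, estimate):
        if ref == 0.0:
            # arctan of an infinite ratio tends to pi/2; identical zeros give 0.
            total += 0.0 if est == 0.0 else math.pi / 2.0
        else:
            total += math.atan(abs((ref - est) / ref))
    return total / len(reference)


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    distance = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every observation equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        distance = max(distance, abs(i / len(a) - j / len(b)))
    return distance


# Illustrative values only (m^2); not results from the study.
fmc_areas = [1.90e6, 2.00e6, 2.10e6, 2.30e6]
proxy_areas = [1.91e6, 2.00e6, 2.09e6, 2.31e6]

print(maape(sorted(fmc_areas), sorted(proxy_areas)))
print(ks_distance(fmc_areas, proxy_areas))
```

Both measures equal zero for identical samples. MAAPE is returned in radians and would need to be multiplied by 100 to be quoted as a percentage-style figure.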

Discussion
Two advanced uncertainty quantification techniques have been tested for three distinct Scottish case studies. These have been compared to the gold standard FMC method and a previously investigated multi-level Monte Carlo (MLMC). Each case study brings a unique inflow-flooded area relationship, discretisation errors and computational costs. Assessing model performance across all three locations can go beyond the case study locations and provide robust insights into UQ model performance for flood modelling in general.
Kriging outperforms the other methods across each of the three case studies, producing accurate results with the lowest costs. Implementing a low-discrepancy sampling approach produces efficient proxy models that can replicate FMC results in fewer than 15 simulations for specific case studies. Sobol sequencing input samples ensures that the maximum and minimum inflow values are captured and that the interpolation errors are minimised. The well-trained Kriged model matches FMC output distributions with less than 0.6% of simulations (dropping to 0.15% and 0.1% for Inverurie and Glasgow, respectively). Kriging produces results with a speedup between 1.42 and 8.7 over MLMC. Furthermore, the ability to create additional samples in seconds allows in-depth analysis of specific flows.
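The simulation fractions and speedups quoted for Kriging can be checked with simple arithmetic, sketched below in Python. The training-set sizes (59, 14 and 10 runs) are those reported in the text; taking a 10,000-sample FMC analysis as the common reference, and assuming equal cost per run so that speedup equals the ratio of run counts, are assumptions made here because they reproduce the quoted figures.

```python
FMC_REFERENCE_RUNS = 10_000  # assumed common FMC reference sample size

# Kriging training runs reported for each case study.
training_runs = {"Dyce": 59, "Inverurie": 14, "Glasgow": 10}

summary = {}
for site, runs in training_runs.items():
    summary[site] = {
        # Share of the FMC simulations that the proxy model needs, in percent.
        "percent_of_fmc": 100.0 * runs / FMC_REFERENCE_RUNS,
        # Speedup when every simulation has the same cost.
        "speedup": FMC_REFERENCE_RUNS / runs,
    }

for site, row in summary.items():
    print(f"{site}: {row['percent_of_fmc']:.2f}% of FMC runs, "
          f"{row['speedup']:.0f}-fold speedup")
```

Under these assumptions, the largest training set (Dyce) stays below the 0.6% bound, while Inverurie and Glasgow correspond to the 714-fold and 1000-fold speedups reported in the results.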
Results suggest that MFMC produces accurate results in less time than MLMC and FMC. Across all three case studies, MFMC models are able to match the FMC distribution extremes and shape faster than MLMC, producing a speedup of between 1.3 and 5.4 over traditional multi-level methods. The order of magnitude speedup of MFMC over FMC is in agreement with previous studies [19]. MFMC is able to significantly reduce the computational costs but is less effective than pure Kriging for every location. The multilevel structure of MFMC appears to dilute the shape of the output distribution and produce less accurate results than Kriging. Including multiple resolutions that do not share the same distribution shape or extremes weakens the effectiveness of the high-resolution Kriged model. In the case where all three models are extremely similar, this would be less of an issue; however, the necessity for finer grid models is simultaneously lessened. MFMC may only be more effective than Kriging when the disparity between coarse grid and fine grid computational costs is greater than that tested herein. In most single dimensional problems, a well-trained high-resolution Kriged model would most likely be the best option. In transferring this research finding to other locations, it seems sensible to start with Kriging as the preferred method in the first instance, before moving to the more complex approach of MFMC.
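The way a multi-fidelity estimator combines resolutions can be illustrated with a generic two-model sketch in Python. It implements the standard control-variate form of the MFMC mean estimator: a small number of fine-grid evaluations is corrected by many cheap coarse-grid evaluations, weighted by a coefficient estimated from the shared samples. This is not the three-resolution Kriged configuration used in the study, and it targets only the mean rather than the full output distribution; `fine_model`, `coarse_model` and the uniform inflow sample are hypothetical placeholders.

```python
import math
import random
import statistics


def fine_model(u):
    """HYPOTHETICAL high-resolution response to a normalised inflow u."""
    return u + 0.1 * math.sin(3.0 * u)


def coarse_model(u):
    """HYPOTHETICAL cheaper, biased low-resolution response."""
    return 0.9 * u + 0.05


def mfmc_mean(f_hi, f_lo, inputs, m_hi):
    """Two-model multi-fidelity Monte Carlo estimate of the mean of f_hi.

    f_hi is evaluated on the first m_hi inputs only; f_lo on all inputs.
    """
    lo_all = [f_lo(u) for u in inputs]
    lo_shared = lo_all[:m_hi]
    hi = [f_hi(u) for u in inputs[:m_hi]]

    mean_hi = statistics.fmean(hi)
    mean_lo = statistics.fmean(lo_shared)
    covariance = sum((h - mean_hi) * (l - mean_lo)
                     for h, l in zip(hi, lo_shared)) / (m_hi - 1)
    # Control-variate weight: correlation * sd(hi) / sd(lo) = cov / var(lo).
    alpha = covariance / statistics.variance(lo_shared)

    return mean_hi + alpha * (statistics.fmean(lo_all) - mean_lo)


rng = random.Random(1)
inflows = [rng.random() for _ in range(2000)]

# 20 fine-grid runs corrected by 2000 coarse-grid runs.
estimate = mfmc_mean(fine_model, coarse_model, inflows, m_hi=20)
print(estimate)
```

The correction term is only as useful as the correlation between resolutions. When the coarse and fine responses differ in shape, the weight shrinks and the coarse runs contribute little, which is consistent with the dilution effect discussed above.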
Kriging has been shown to outperform MFMC for a single input parameter analysis; however, MFMC is likely to be preferred for high-dimensional problems. The work herein has presented a simple application of Kriging to quantify inflow uncertainty for flood hazards. Larger dimensional problems may produce different conclusions due to the known limitations of Gaussian process models with increasing dimensionality [50]. In such cases, MLMC and MFMC are likely to reduce computational costs more effectively than Kriging; however, this is yet to be investigated.
Varying efficiency of Kriging and MFMC methods at different locations highlights the dependence upon inflow-output relationships. The strong linear correlation between peak inflow and flooded area for Inverurie and Glasgow means that Kriging can quickly and accurately represent this relationship with a small set of training samples (10 to 15). On the other hand, Dyce requires a much larger training set (59) to match the complex flooded area distribution and the highly non-linear relationship. Larger training sample sets relative to the other case studies are likely a result of the more complex output flooded area distribution. The bimodal distribution with inflexions cannot be accurately modelled with small sample sizes. Additional case studies with a range of output distributions are required to fully validate this theory. However, the number of simulations cannot be known a priori and currently needs expert understanding of the domain and topography to estimate. This consideration needs further investigation for transferring this research to new case studies.
The next steps for these methods will be to create a larger database of studies to determine if there are situations where MFMC outperforms Kriging. Understanding the driving force behind method efficiency using multiple case studies will enable the development of a decision framework for method selection based on known catchment characteristics and modelling factors. Additionally, pluvial cases may be considered to ensure applicability across hydraulic modelling. Establishing a methodology of best practice and providing commercial modellers with a consistent guide of uncertainty quantification methods will hopefully promote a larger uptake of uncertainty assessments, both in flood modelling and beyond as the methods become more refined. This will ensure that future societies are as resilient as possible to the ever-increasing impacts of climate change.

Conclusions
This paper has investigated a range of uncertainty quantification techniques new to flood modelling. Robust assessment of these methods has been performed across three case studies with a range of topographic features, domain sizes, fluvial complexities and computational costs. The overarching aim of the work is to improve probabilistic flood hazard assessments to incentivise commercial uptake. This has been achieved using Kriging and MFMC. A range of methods has been provided that are able to reduce computational demands of high-resolution uncertainty assessments.
Two main conclusions can be drawn from the results presented:
1. Kriging and MFMC significantly reduce the computational costs required for probabilistic modelling. Both methods reduce the computational costs by at least 99.4% and at most 99.99% over FMC across the three different case studies. These case studies are considered to be representative of small-medium-scale flood modelling assessments, and it is thus expected that these results would be transferable to other catchments.
2. High-resolution Kriged methods require the lowest computational costs and return the highest degree of accuracy. As few as 10 simulations can accurately replicate the entire output distribution (although the exact number required will vary between locations and is highly dependent upon the inflow-flooded area relationship/topography).
Kriging-Sobol is the most effective method for each of the case studies investigated, with the lowest convergence costs and highest degree of accuracy within constrained time limits. This method has proven itself to be highly efficient at representing both simple and complex output distributions whilst allowing further simulations to be produced in seconds. This research suggests that Kriging can improve probabilistic flood modelling. Thinking more broadly, the results suggest that these techniques could be applied to other natural hazard assessments including, for example, coastal flooding, tsunami or landslide hazards where an assessment of uncertainty is required. This would add to the current literature combining physically based analysis with machine learning methods to quantify uncertainty [51,52].
Future flood hazard assessments can now be performed using Kriged proxy models at 0.1% of FMC computational costs. The ability to fully quantify inflow uncertainty in future flood events will prevent unnecessary economic and environmental damage and increase societal resilience to flood hazards.

Data Availability Statement:
The codes that were used for the quantification of uncertainty using R language (version 4.0.2 (22 June 2020)) can be found on GitHub: https://github.com/ga41/MFMCand-Kriging (accessed on 10 November 2022). This repository was created by Gordon Aitken.

Figure 1. Case study locations within Scotland.

Figure 2. Convergence of flooded area discretisation errors with increasing resolution for three Scottish case studies. Red dots indicate flooded area outputs from the independently calibrated resolution models; blue dots denote the flooded area produced at each resolution using the 2.5 m-resolution parameters. Based on these results, the 5 m model is chosen for the rest of the research as it provides a good balance of accuracy and computational speed.

Figure 3. Normalised standard hydrograph shape for each model; scaled to the flow range as needed using range calculated from GEV analysis.

Figure 4. Output PDFs for varying computational time constraints: (a) 6 h, (b) 12 h, (c) 24 h, and (d) 48 h, for Dyce and compared to FMC results. Horizontal black lines correspond to the minimum and maximum flooded area of the FMC distribution.

Figure 5. Output PDFs for varying computational time constraints: (a) 6 h, (b) 12 h, (c) 24 h, and (d) 48 h, for Inverurie and compared to FMC results. Horizontal black lines correspond to the minimum and maximum flooded area of the FMC distribution.

Water 2024, 16, 1309

Figure 6. Output PDFs for varying computational time constraints: (a) 6 h, (b) 12 h, (c) 24 h, and (d) 48 h, for Glasgow and compared to FMC results. Horizontal black lines correspond to the minimum and maximum flooded area of the FMC distribution.

Figure S1: Output PDFs for varying Kriging sampling approaches at different locations, (a) Dyce, (b) Inverurie and (c) Glasgow, and time constraints compared to FMC results. Horizontal black lines correspond to the minimum and maximum flooded area of the FMC distribution.

S2: MFMC Combination Analysis: Figure S2: Output PDFs for varying MFMC combinations at different locations, (a) Dyce, (b) Inverurie and (c) Glasgow, and time constraints compared to FMC results. Horizontal black lines correspond to the minimum and maximum flooded area of the FMC distribution.

Author Contributions: G.A. performed the simulations and statistical analysis. G.A., L.B. and M.A.C. analysed results. All authors have read and agreed to the published version of the manuscript.

Funding: This work was supported by EPSRC-EP/N030419/1.

Table 2. Required sample size and associated run costs for Dyce.

Table 3. Required sample size and associated run costs for Inverurie.

Table 4. Required sample size and associated run costs for Glasgow.