Rainfall Threshold for Flash Flood Warning Based on Model Output of Soil Moisture: Case Study Wernersbach, Germany

Convective rainfall can cause dangerous flash floods within less than six hours. Thus, simple approaches are required for issuing quick warnings. The flash flood guidance (FFG) approach pre-calculates rainfall levels (thresholds) potentially causing critical water levels for a specific catchment. Afterwards, only rainfall and soil moisture information are required to issue warnings. This study applied the principle of FFG to the Wernersbach Catchment (Germany) with excellent data coverage using the BROOK90 water budget model. The rainfall thresholds were determined for durations of 1 to 24 h by running BROOK90 in “inverse” mode, identifying rainfall values for each duration that led to exceedance of critical discharge (fixed value). After calibrating the model based on its runoff, we ran it in hourly mode with four precipitation types and various levels of initial soil moisture for the period 1996–2010. The rainfall threshold curves showed a very high probability of detection (POD) of 91% for the 40 extracted flash flood events in the study period; however, the false alarm rate (FAR) of 56% and the critical success index (CSI) of 42% should be improved in further studies. The proposed adjusted FFG approach has the potential to provide reliable support in flash flood forecasting.


Introduction
Flash floods (FF) are considered one of the most dangerous flood types due to their sudden occurrence and potentially severe impacts. The term "flash" reflects the rapid reaction of a drainage network with water levels reaching a critical stage within only minutes to a few (usually less than six) hours after the onset of a heavy rainfall event [1][2][3]. This leaves an extremely short flood warning time, which can cause tremendous socioeconomic damage [4,5]. Typical consequences of such events include local flooding, soil erosion, debris and destruction of buildings and infrastructure, which are potentially dangerous for human life [6,7]. The majority of flash floods take place in small to medium-sized and often ungauged catchments. In the near future, these events are likely to increase in frequency and intensity with the impact of climate change [8].
Flash flood conditions are usually difficult to model, monitor and forecast [3,9]. Due to the fast rise in discharge and water level, a flood warning based on the evaluation of measured precipitation or stream gauges would often be too late to prevent a serious impact. Instead, the flood or hazard potential must be estimated from meteorological or hydrological forecasts. One commonly used approach for flash flood warning is so-called flash flood guidance (FFG), where flood warnings are issued solely based on pre-event soil moisture conditions and rainfall forecast information. The method uses a simple comparison of accumulated (forecasted) rainfall with critical values of rainfall [1,2]. These rainfall thresholds are estimated once based on catchment characteristics, and flood warnings are issued when they are exceeded for the specific rain forecast. The method was originally designed and implemented by the US National Weather Service in the 1970s and has been operated in several countries [2,4,10]. However, a method for deriving rainfall thresholds for early warning of precipitation-related flooding may differ depending on the hypothesis adopted, such as probabilistic or deterministic [11]. De Luca and Versace (2017) discussed in detail various schemes that can be used for rainfall thresholds and suggested careful consideration for selecting a scheme for rainfall threshold estimation to avoid confusion in its use in early warning.
This study used a relatively simple and flexible model that can be later applied to gauged as well as ungauged catchments. BROOK90 delivers soil moisture based on the water balance, indicating the pre-event catchment state in the case of FF. Most recent studies on flash flood warnings based on the FFG approach used only rainfall-runoff models for single events, as flash floods often happen under extremely heavy rainfall events with short durations; hence, only a limited number of events are available [12,13]. Pre-event soil moisture was taken into account in the calibration and validation processes [14,15]. However, the boundary conditions of a catchment for different durations and seasons were neglected. Rainfall thresholds derived from a single-event approach might be unrepresentative and have limitations in statistical analysis. Antecedent soil moisture varies from event to event and plays an important role in runoff generation in a catchment. A storm event may be irrelevant in a dry period; however, it can cause flooding in a wet period when the soil is already saturated [16,17]. This is why rainfall thresholds need to be determined under several soil moisture conditions.
When long historical records are available, statistical means of precipitation data before the event can be applied for warning thresholds. However, such long-term records are rare, especially in flash flood-prone catchments. An approach based on synthetic hyetographs with different shapes and durations as values of rainfall producing a critical discharge was proposed to overcome the limitation of historical records [12]. This approach requires assumptions regarding both the temporal evolution of the designed rainfall and pre-event catchment conditions. However, the main drawback of these approaches is the use of an event-based model.
Here, we propose an adjusted method of FFG that takes the limitations of previous studies into account and overcomes the associated drawbacks. Instead of considering the pre-event wetness of individual events for flood development, we examine a wide range of soil moisture from dry to wet as an input to run a rainfall-runoff model. The use of synthetic precipitation with different intensities and durations also allows us to evaluate FF cases more holistically and overcome the statistical problem due to the rare occurrence of such events. The FFG approach was applied to the Wernersbach catchment within the Tharandt Forest as a test case to (i) take advantage of the reliable and multifold long-term data available for the catchment, and (ii) investigate the potential use of the BROOK90 model as a tool for FFG.

Catchment Characteristics
The study area is a small forested catchment of 4.6 km² in Tharandt Forest south of Dresden in eastern Germany (Figure 1), which has been part of many studies [18,19]. The terrain is relatively flat with an average slope of 3% and an elevation ranging from 322 to 424 m a.s.l. The catchment is dominantly covered with coniferous (spruce) trees (>80%) and contains soil dominated by loamy silt, Dystric Cambisols, Podsols and Stagnosols [20]. A more detailed presentation of the geologic and land use characteristics is provided in [21]. This catchment was selected because (a) it is a well-monitored experimental catchment with long-term hydrological and meteorological data records (more than 50 years), (b) reference measurements are available to derive vegetation parameters, and (c) numerous studies and expert knowledge are available.
Floods occur mostly in summer and are partly caused by very intense rainfall. For instance, the extreme flood event in 2002, with a total daily sum of 312 mm between August 12th and 13th measured at Zinnwald-Georgenfeld, set a new record in Germany [22]. The maximum discharge during this event was 10 m³/s, 280 times greater than the mean runoff. Soils in the catchment are able to store a large amount of rain water before surface flow starts occurring [23,24]. The discharge records used in this study are for discharge at the catchment outlet. Discharge is calculated by empirical equations for two stage-discharge relationships, one for low base flow with water stages lower than 331 mm and one for high base flow with water stages higher than 331 mm. The parameters of the empirical curves are validated twice a year with flow measurement devices. During the extreme flood events in 1980 and 2002, discharge data are derived from interpolated data from surrounding states since the gauge weir was overtopped.

Short Model Description
The BROOK90 model is a lumped-parameter water budget model designed for small, uniform catchments. It produces a good representation of evapotranspiration (ET) and soil moisture by applying the well-known Penman-Monteith equation twice: once for the canopy and once for the soil surface [26]. To describe soil water movement, the model applies the Richards equation and the near-saturation interpolation of the scheme of Clapp and Hornberger (1978) [27]. The model requires daily data for precipitation, Tmin, Tmax, solar radiation, vapor pressure and wind speed. However, it can also be operated with reduced daily inputs of precipitation and min/max temperature, while other input values are generated by the model. Thus, it is widely applied to estimate water fluxes at the soil-plant-atmosphere interface on a daily basis [21,23,28]. Nevertheless, the input of precipitation at higher temporal resolutions, such as hourly resolution, is possible, potentially improving the representation of fast components of the water budget, such as interception or interflow. A detailed chart can be found in [26].
Discharge is generated with different flow paths, such as vertical bypass, seepage, surface flow and lateral subsurface flow. Most of the flow parameters are empirical and are set according to the general understanding of the modeler. However, the model cannot accommodate lateral transfer of water downstream. The lack of a routing mode limits the application to catchments in which flow is generated locally. To address some heterogeneity in larger catchments (up to 100 km²), we allow the model to run for various combinations of land use and soil characteristics. The catchment response is then derived from the superposition of individual runs weighted according to their spatial contribution to the catchment area. This allows the introduction of a kind of hydrological response unit (HRU) but contributes additional uncertainty. While the BROOK90 model is not recommended for direct flood modelling, we apply it for the partitioning of precipitation into ET, storage change and discharge. The simulated discharge is merely used as an indicator to evaluate the critical flooding stage as outlined below.
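The area-weighted superposition of individual runs can be illustrated with a minimal Python sketch. It assumes that each run yields a discharge series expressed as a depth per time step (for example mm/h), so that weighting by area fraction is meaningful. The function name and the example values are hypothetical and are not part of BROOK90.

```python
from typing import List, Sequence


def superpose_discharge(unit_discharges: Sequence[Sequence[float]],
                        area_fractions: Sequence[float]) -> List[float]:
    """Combine per-unit discharge series into one catchment series.

    unit_discharges: one discharge series (depth per time step) per response unit.
    area_fractions:  area share of each unit; normalised here so they sum to 1.
    """
    if len(unit_discharges) != len(area_fractions):
        raise ValueError("one area fraction is required per discharge series")
    total_area = sum(area_fractions)
    if total_area <= 0:
        raise ValueError("area fractions must sum to a positive value")
    n_steps = len(unit_discharges[0])
    if any(len(series) != n_steps for series in unit_discharges):
        raise ValueError("all discharge series must have the same length")

    weights = [a / total_area for a in area_fractions]
    # Weighted sum over units for every time step.
    return [
        sum(w * series[t] for w, series in zip(weights, unit_discharges))
        for t in range(n_steps)
    ]


# Illustrative values only: two units covering 25% and 75% of the catchment.
combined = superpose_discharge([[1.0, 2.0], [3.0, 4.0]], [0.25, 0.75])
```

Because no routing is involved, the combined series inherits the timing of the individual runs, which is consistent with the limitation stated above.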

Flash Flood Guidance Setup
FFG is an effective flood warning system for small or medium-sized mountainous catchments with the potential danger of intense and destructive flooding with a short warning time. It does not intend to predict the timing of flooding, but tries to identify potential flood occurrence. The method compares rainfall forecasts with so-called rainfall thresholds for different antecedent soil moisture conditions (AMCs). Rainfall thresholds are rainfall intensities that lead to critical discharge in the catchment. A flood warning is issued if the corresponding thresholds are exceeded. Rainfall thresholds are derived with the following three steps as described in Figure 2.
Step 1: Estimation and classification of antecedent soil moisture. Following [16,29,30], the soil moisture in the catchment is grouped into values corresponding to "wet", "moderately saturated", or "dry" conditions to account for different AMCs. For this process, the BROOK90 hydrological model was used to simulate the catchment's water balance from 1970 to 2016, identifying the daily moisture conditions. The model performs well under different data input conditions, and detailed model setup and performance validation are described in [24].
The 33rd and 66th percentiles (0.33 and 0.66 quantiles) are derived from the historical soil moisture value distribution to separate the three aforementioned classes. Each class, namely, AMC I (dry soil), AMC II (moderately saturated soil), and AMC III (wet soil), is defined as the wetness condition at the beginning of a rain event. This step is referred to as the current catchment state in Figure 2.
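The classification into AMC I to III can be sketched in Python as follows. This is a minimal illustration, not the code used in the study: the linear-interpolation quantile, the handling of values that fall exactly on a class boundary, and all names are assumptions.

```python
from typing import Iterable, Tuple


def quantile(values: Iterable[float], q: float) -> float:
    """Quantile by linear interpolation between order statistics (assumed method)."""
    data = sorted(values)
    if not data:
        raise ValueError("at least one value is required")
    position = q * (len(data) - 1)
    lower = int(position)
    upper = min(lower + 1, len(data) - 1)
    fraction = position - lower
    return data[lower] + fraction * (data[upper] - data[lower])


def amc_bounds(soil_moisture_history: Iterable[float]) -> Tuple[float, float]:
    """Return the 0.33 and 0.66 quantiles of the historical soil moisture."""
    history = list(soil_moisture_history)
    return quantile(history, 0.33), quantile(history, 0.66)


def classify_amc(soil_moisture: float, lower: float, upper: float) -> str:
    """Map a pre-event soil moisture value to AMC I (dry), II or III (wet)."""
    if soil_moisture <= lower:      # boundary handling is an assumption
        return "AMC I"
    if soil_moisture <= upper:
        return "AMC II"
    return "AMC III"


# Illustrative history: soil moisture values 0..100 in arbitrary units.
low, high = amc_bounds(range(101))
state = classify_amc(90, low, high)   # a wet pre-event state
```

In operational use, the bounds would be computed once from the long-term simulation, and only `classify_amc` would be evaluated at the start of each rain event.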
Step 2: Runoff threshold identification. Runoff is considered critical when flooding starts, exceeding the so-called bank-full flow. A method commonly used to identify this value uses a 2-year discharge return interval [10]; other methods derive it from available historical data and hydraulic geometry using, for example, stage-discharge curves of the considered riverbed. In the case of the Wernersbach catchment, statistical values give very small and implausible discharge values, explainable by the small catchment size. Values based on hydraulic geometry are larger, leading to a very small sample size. The critical water stage (Qs) was therefore defined as 50 cm, equivalent to the average high discharge, which is considered representative of flood events in this specific catchment. This value (Qs, in Figure 2) was compared with the simulated discharge (q, in Figure 2) after a model run to identify the rainfall threshold.
Step 3: Rainfall threshold estimation. Identifying critical rainfall values that can potentially cause a flood requires a large sample of different rainfall events and their corresponding runoff to test the physical boundary at which the river is full under different initial catchment conditions. We increased the sample size by running the BROOK90 model for the summer months from April to September for the study period 1996-2010 with synthetic rainfall inputs and different rainfall intensity distributions, namely, step, triangular, decreasing and increasing (Figure 2). These designed rainfalls are also called hyetotypes [12]. Only summer months were included in the analysis since flash flood events are mainly caused by convective rainfall, which mostly takes place during the summer, particularly in Germany [7]. The study site was parameterized from available measurements and literature. As mentioned above, the BROOK90 model is primarily based on physical laws. Thus, soil properties (density, grain size distribution, humus content of soil horizons) are combined with site properties (slope, exposure) and meteorological measurements (air temperature, humidity, radiation, wind speed, precipitation) to form explicitly site-specific conditions. The model's flow parameters, which have no physical meaning, were estimated empirically from the daily discharge measurements using the Parameter ESTimation program [31]. The Nash-Sutcliffe efficiency (NSE) values for the calibration period and the validation period (1991-2016) are 0.61 and 0.82, respectively. For each day with its corresponding original catchment conditions (taken from the water balance calculations in Step 1), rainfall of different durations from 1 to 24 h and different configurations (step, triangular, decreasing and increasing) was fed into the model (Figure 2), taking temporal variations in rain events into account.
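The four hyetotypes can be generated with a short Python sketch. The exact shapes used in the study are given only graphically (Figure 2), so the linear ramps and the symmetric triangle below are assumptions, as are the function and parameter names.

```python
from typing import List


def hyetograph(total_mm: float, duration_h: int, shape: str) -> List[float]:
    """Distribute a rainfall total over hourly steps for one hyetotype.

    shape: "step" (uniform), "increasing" (linear ramp up),
           "decreasing" (linear ramp down) or "triangular" (peak in the middle).
    The ramp shapes are assumed for illustration.
    """
    n = duration_h
    if n < 1:
        raise ValueError("duration must be at least one hour")
    if shape == "step":
        weights = [1.0] * n
    elif shape == "increasing":
        weights = [float(i + 1) for i in range(n)]
    elif shape == "decreasing":
        weights = [float(n - i) for i in range(n)]
    elif shape == "triangular":
        weights = [float(min(i + 1, n - i)) for i in range(n)]
    else:
        raise ValueError(f"unknown hyetotype: {shape}")
    # Scale the weights so that the hourly depths sum to the requested total.
    scale = total_mm / sum(weights)
    return [w * scale for w in weights]


# Example: 30 mm in 3 h with increasing intensity gives 5, 10 and 15 mm.
rain = hyetograph(30.0, 3, "increasing")
```

All shapes carry the same total, so any difference in the resulting rainfall threshold reflects only the temporal distribution of intensity.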
A maximum of 24 h was chosen since critical discharge in flash flood situations is usually reached sooner than six hours after the rain begins. For each rain configuration and duration, the amount of rainfall was increased until the model output reached or exceeded the critical discharge value within the corresponding duration (when q > Qs). The sample was divided into three pre-event soil moisture conditions (AMCs), and three final curves, one for each AMC category, were established. The R version of the BROOK90 model (freely available at https://github.com/rkronen/Brook90_R, accessed on 25 January 2018) was used for the derivation since it is much more flexible than the original version concerning data input and adaptation of the model to the user's needs.
[32][33][34][35], considering the hydro-climatic settings and catchment size of our study area. Discharge data were considered to reflect a potential flood event when the critical value was exceeded within the first 48 h after the relevant rain event started. This methodology is similar to that used by [34], where the storm duration was defined for an integrated high-resolution dataset of high-intensity European and Mediterranean flash floods. The identified relevant rainfall-runoff events are then classified into AMC categories according to the soil conditions at the beginning of the rain event (just before the rain started).
A total of 40 summertime events between 1996 and 2010 were extracted from the discharge and precipitation series (Table 1). The total event precipitation ranged between 21 mm and 272.3 mm. The maximum hourly precipitation values (Pmax) ranged from 3 mm to 41 mm.
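The "inverse" search for a rainfall threshold, in which the rainfall amount is increased until the simulated discharge reaches the critical value, can be sketched in Python as follows. BROOK90 is replaced here by a deliberately crude bucket model (`toy_peak_discharge`) that only serves to make the example runnable; it is a hypothetical stand-in and does not represent BROOK90 physics. The step size, the upper limit and all names are likewise assumptions.

```python
from typing import Callable, List, Optional


def toy_peak_discharge(hourly_rain_mm: List[float], soil_deficit_mm: float) -> float:
    """Hypothetical stand-in model: rain first fills the soil storage deficit,
    and any excess leaves the catchment as discharge within the same hour."""
    deficit = soil_deficit_mm
    peak = 0.0
    for rain in hourly_rain_mm:
        absorbed = min(rain, deficit)
        deficit -= absorbed
        peak = max(peak, rain - absorbed)
    return peak


def find_rainfall_threshold(peak_model: Callable[[List[float]], float],
                            duration_h: int,
                            q_crit: float,
                            step_mm: float = 0.5,
                            max_mm: float = 500.0) -> Optional[float]:
    """Increase the rainfall total until the modelled peak reaches q_crit.

    Uses a step (uniform) hyetotype. Returns the first total that causes
    exceedance, or None if max_mm is reached without exceedance.
    """
    k = 1
    while k * step_mm <= max_mm:
        total = k * step_mm
        hourly = [total / duration_h] * duration_h
        if peak_model(hourly) >= q_crit:
            return total
        k += 1
    return None


# Dry soil (10 mm deficit) versus wet soil (no deficit), 1 h rainfall,
# critical discharge of 2 mm/h (all values illustrative).
dry = find_rainfall_threshold(lambda r: toy_peak_discharge(r, 10.0), 1, 2.0)
wet = find_rainfall_threshold(lambda r: toy_peak_discharge(r, 0.0), 1, 2.0)
```

Even this toy setup reproduces the qualitative behaviour described in the text: the drier initial state requires more rainfall to reach the critical discharge. A bisection search would need fewer model runs than the incremental loop, but the loop mirrors the procedure described above more directly.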

Application of Flash Flood Guidance
After estimating rainfall thresholds for the individual catchment and soil moisture states, the FFG approach can be applied to rainfall events. Once it starts raining or once rain is forecasted, all rain is accumulated until the rain event is over. Soil moisture at hour "0" (just before the rain starts) is classified as AMC I-III to choose the correct threshold curve. The rain information is updated from the rain forecast.
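The operational comparison of accumulated rainfall with the threshold curve of the current AMC class can be sketched in Python. The threshold values below are purely illustrative and are not taken from Figure 5; the data layout (one threshold per hourly duration) and the function names are assumptions.

```python
from itertools import accumulate
from typing import Dict, List, Optional


def first_exceedance_hour(hourly_rain_mm: List[float],
                          threshold_by_duration_mm: List[float]) -> Optional[int]:
    """Return the first hour (1-based) at which accumulated rainfall exceeds
    the threshold for that duration, or None if it never does."""
    for hour, cumulative in enumerate(accumulate(hourly_rain_mm), start=1):
        if hour > len(threshold_by_duration_mm):
            break  # no threshold defined beyond the longest duration
        if cumulative > threshold_by_duration_mm[hour - 1]:
            return hour
    return None


def issue_warning(hourly_rain_mm: List[float],
                  amc_class: str,
                  curves: Dict[str, List[float]]) -> bool:
    """Select the curve for the pre-event AMC class and test for exceedance."""
    return first_exceedance_hour(hourly_rain_mm, curves[amc_class]) is not None


# Illustrative threshold curves (mm) for durations of 1, 2 and 3 h.
curves = {
    "AMC I": [20.0, 30.0, 40.0],    # dry soil: higher thresholds
    "AMC III": [12.0, 18.0, 24.0],  # wet soil: lower thresholds
}
forecast = [10.0, 10.0, 10.0]       # hourly rainfall forecast in mm

warn_wet = issue_warning(forecast, "AMC III", curves)  # exceeds after 2 h
warn_dry = issue_warning(forecast, "AMC I", curves)    # never exceeds
```

The same forecast therefore triggers a warning for a wet catchment but not for a dry one, which is the behaviour the threshold curves are meant to encode.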
Examples of selected events from Table 1 are illustrated in Figure 3 to demonstrate how rainfall thresholds and different pre-event soil moisture conditions and rainfall intensities are linked. Depending on the characteristics of an event, the consequences can be very different. In the example of Figure 3a, the rainfall (red curve) does not exceed the rainfall threshold curve (i.e., there was not "enough" rain during this event); hence, no warning would be issued. In Figure 3b the accumulated rainfall exceeds the corresponding AMC III curve, and a warning is issued. Note that in this example, the rain would also exceed the curves if the initial wetness state were different. However, even though accumulated rainfall exceeds the rainfall thresholds in the case of wet and moderate soil moisture states (Figure 3c), no warning is issued as the catchment is dry and can hold more water. The situation shown in Figure 3d would lead to a warning because the accumulated rainfall exceeds the AMC III curve. For a drier catchment (AMC I), no warning would be issued for the same accumulated rainfall.
Table 2. The characteristics of the four selected events are listed in Table 1. The events were selected on purpose to illustrate the differences in issuing an alarm. The false alarm is not illustrated here.

Validation of FFG
The discharge for both events in Figure 3a,c in fact exceeds the critical discharge value; hence, a warning should also have been issued for the left examples. This shows that the operational mode of the FFG approach requires careful validation. For this validation, the method is tested on the historical events that were previously defined in Table 1. The rainfall information is cumulative, as shown in Figure 3, as if it were forecast information. If a flood warning is issued (the rainfall threshold is exceeded), we count it as a correct alarm (CA, see Table 2). If the rainfall threshold is not exceeded, it is a so-called missing alarm (MA) since the discharge exceeds the critical value. For a complete evaluation, we also evaluated all other rainfall events from the period 1997-2010. A rain event was defined as previously described, and the cumulative value was compared with the rainfall thresholds. If the critical rainfall level was not reached, as expected, the event was counted as a correct missing alarm (CMA). The cases in which a warning is issued based on rainfall but without subsequent high discharge were classified as false alarms (FAs). Obviously, MAs are potentially dangerous and should be avoided if possible. However, if we reduce the MA frequency to 0 (which is theoretically possible by setting the rainfall thresholds very low), the FAs will naturally increase to a very high level since a warning will be issued almost every time it rains. This modification would decrease the quality of FFG and would make the approach obsolete and unnecessary. Hence, the quality of prediction can be evaluated by using the hit alarm rate or probability of detection (POD), the false alarm rate (FAR) and the critical success index (CSI). The POD is determined by comparing the correctly predicted events with the events actually observed. The FAR relates the incorrectly forecast events (false alarms) to all events for which a warning was issued.
Adapting the concept of Schaeffer (1990) [36], the CSI is the ratio of correctly predicted events to the total number of events that were either predicted or observed. The three statistics are defined as follows:
$$\mathrm{POD} = \frac{CA}{CA + MA}, \qquad \mathrm{FAR} = \frac{FA}{CA + FA}, \qquad \mathrm{CSI} = \frac{CA}{CA + MA + FA}$$
FFG performs well if the forecast system has a POD of 1 and a FAR of 0. Consequently, the CSI, which can be derived from POD and FAR, will also take a value of 1. This evaluation method is a comprehensive approach that can cover all potential cases and improves the estimation accuracy of the FFG approach for our study catchment.
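The three scores can be computed from the alarm counts with a few lines of Python. The sketch assumes the standard contingency-table definitions, POD = CA/(CA+MA), FAR = FA/(CA+FA) and CSI = CA/(CA+MA+FA), which reproduce the values reported in this study for the decreasing hyetotype (CA = 11, MA = 1, FA = 14). The function name is hypothetical.

```python
from typing import Tuple


def skill_scores(ca: int, ma: int, fa: int) -> Tuple[float, float, float]:
    """Return (POD, FAR, CSI) from correct, missing and false alarm counts."""
    if ca + ma == 0 or ca + fa == 0:
        raise ValueError("scores are undefined without observed or forecast events")
    pod = ca / (ca + ma)        # share of observed events that were warned
    far = fa / (ca + fa)        # share of issued warnings that were false
    csi = ca / (ca + ma + fa)   # correct alarms over all warned or observed events
    return pod, far, csi


# Counts reported in the text for the decreasing hyetotype.
pod, far, csi = skill_scores(ca=11, ma=1, fa=14)
print(f"POD={pod:.1%}  FAR={far:.1%}  CSI={csi:.1%}")
```

Correct missing alarms (CMA) do not enter any of the three scores, which is why they remain informative even when non-events greatly outnumber flood events.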

Rainfall Thresholds
This approach results in more than 241,000 model runs, covering the wide range of pre-event condition states in the catchment, from almost drought to almost saturated conditions, as shown in Figure 4. The high number of simulations resulted from the combination of 2340 days in the study period with 24 durations and four different rainfall shapes. This process requires a tremendous computational effort because each simulation is run in an "inverse mode" to detect a rainfall threshold. Each point in Figure 4 displays a rainfall threshold value that caused simulated discharge within its duration to exceed the discharge threshold as defined in Step 2. Figure 4a-c shows a range of rainfall thresholds for durations from 1 to 24 h. At first glance, the amount of rainfall increases exponentially as the event duration increases. For instance, for step rainfall under dry soil conditions, the critical rainfall ranged from 5 to 130 mm for durations of 1 to 24 h. Classifying these values according to AMCs reveals an interesting distribution. Specifically, the higher the pre-event soil moisture is, the smaller the rainfall intensity needed to cause potential flooding. It is also observed that wetter soil has a smaller range between the upper boundary and lower boundary of the rainfall thresholds (Figure 4c).
Under wet conditions, the Wernersbach catchment required only 11.5 to 13.5 mm of precipitation in one hour to reach the flooding stage. On average, dry soil requires more rainfall input to cause critical discharge, regardless of the type of designed rainfall. Furthermore, the rainfall threshold values are also variable and depend on the hyetotypes and antecedent soil moisture. The median values of critical rainfall were extracted as representatives for further evaluation (Figure 4d-f). The 24 h rainfall threshold differs by almost 50 mm depending on the hyetotype. Comparing critical rainfall among the hyetotypes, the threshold decreased in the following order: decreasing-triangular-step-increasing. This result is due to the increasing rainfall intensity of the hyetotypes in the same order. By grouping critical rainfall events (median values of rainfall thresholds) according to soil moisture classes for the hyetotypes (Figure 5), we can clearly see the impact of antecedent soil moisture. In all hyetographs, the pattern remained consistent, as the drier soil required more precipitation than the wetter soil. However, the differences among rainfall thresholds in soil classes varied in the hyetographs. For instance, for a duration of 24 h, the ranges of rainfall thresholds between wet soil and dry soil were 25, 29, 21, and 30 mm for the increasing, step, triangular and decreasing hyetotypes, respectively. In operational application of the proposed framework, Figure 5 can be used as a practical tool for decision-making. When the temporal distribution of the storm and the pre-event soil moisture are estimated, corresponding curves can be chosen for comparison with the accumulated rainfall. Table 3 shows the results of validation and summarizes the categories of CA, MA and FA for different pre-event soil moisture and rainfall configurations.
Since we noticed that most rain events decreased in intensity, we focused on the results for the corresponding hyetotype H3, as highlighted. We can see that 11 events were correctly forecast, 1 event was forecasted as an MA, 14 events were forecasted as FAs, and 11 events had CA. These results led to the evaluation criteria of a POD of 91.7%, an FAR of 56% and a CSI of 42.3%, which is comparable with similar studies [37,38]. Based on this result, for the rain gauge data, the threshold-based forecasting system seems to have reliable performance. On the other hand, the FAR is not as low as desired. The main cause of the high FAR is the number of long low-intensity events in the validation. Montesarchio, Lombardo and Napolitano (2009) also obtained an FAR of 75% in their study and identified the same cause for a catchment in northern Italy. This study also pointed out an interesting result: FA events took place only under dry to moderate AMCs. This result can be explained by the fact that most rainfall infiltrated the soil due to low intensity and did not generate surface runoff. However, the model was not able to describe this process, which is shown in Figure 6 and discussed in the next section.

Performance Evaluation
The correctly forecasted events were those with high discharge values and those with generally large rainfall amounts. An important characteristic was that most of the events had preceding rain that started with peak rain.

Role of Discharge Simulation
As briefly mentioned above, the routing process in the BROOK90 model has been omitted to focus on the details of the factors controlling evaporation [26]. An apparent weak point of the model is the missing delay in simulated discharge, which should reflect the concentration time of the catchment. Therefore, the model responded quickly, which resulted in peak discharge immediately after rainfall, which is not the case in reality. The peak discharge in the catchment often occurs much later, after sub-surface flow processes occur. As a result, the timing of the peak discharge is not well captured; thus, even if the cumulative rainfall exceeds the reference threshold, the observed discharge may still be under the critical value. This discrepancy can be clearly seen in Figure 6 (left side), as the peak discharge in the simulation appeared 3 h earlier than the observed peak discharge. The concentration time was longer, which led to the discharge curve being rather flat during the rainfall event. Hence, the threshold discharge was already exceeded by the simulated discharge, while the observed discharge was still under the threshold. In addition, the discrepancies were particularly significant under dry soil conditions, where infiltration dominated the hydrological process. The results in Table 3 illustrate that among 14 FAs, 9 were found in AMC I (dry soil class). The results clearly demonstrate the role of soil moisture conditions in the prediction skill for flash flood events. Figure 6 (right side) shows an interesting result for the extreme event in 2002. A large rainfall amount seems to overcome the problem of simulating discharge. This finding indicates that this approach will work considerably well for extreme events. Moreover, in ungauged catchments, neither hydraulic geometry nor hydraulic data are available, which makes it more difficult to estimate the critical discharge.
Thus, output from the BROOK90 model can be used as a reference source of discharge information. When the data input is sufficiently long, a critical discharge value can be estimated using the return period method. Thus, this method enables more robust application in poorly gauged catchments.

Conclusions and Outlook
This case study has tested the definition of a rainfall threshold methodology. The thresholds were estimated by running a physically based model with synthetic rainfall under various pre-event moisture conditions of the catchment. Thus, it resulted in a whole range of rainfall threshold curves categorized according to the hyetotype of rain input and antecedent soil moisture. Under dry conditions, critical discharge is caused by higher rainfall than under wet soil conditions. Thus, in addition to the high precipitation intensities, antecedent soil moisture also plays an important role in the estimation of rainfall thresholds for flash floods. This finding was consistent with that of Penna et al. (2011), who investigated the influence of soil moisture on threshold runoff generation processes in an alpine headwater catchment. Depending on soil type, soil depth, and pre-event soil moisture, soils in the catchment can store a large amount of rainwater before surface flow occurs.
Using rain gauge data, validation with 40 selected events in the study period led to a correct rate greater than 91% for identifying the critical wetness state in the considered catchment. The relatively high FAR can be explained by the limitation of the rainfall-runoff model as well as the selection of critical discharge. The proposed adjusted FFG approach has the potential to provide reliable support in flash flood forecasting. Deriving the thresholds is a one-time action, and operational use requires little information, being based solely on rainfall forecasting and daily soil moisture information as well as available information on study site characteristics. The R-Br90 version is a good and handy tool for this application. This version allows the model to be run in batch mode to investigate the catchment under various pre-event conditions and data inputs. However, a more detailed rainfall-runoff model is needed to improve warning accuracy.
Nevertheless, the actual framework was tested only for a 'perfect' prediction without uncertainties in meteorological variables, especially in rainfall. Determining the quality of different meteorological forecasts (predictability) of heavy precipitation events will be a task in future investigations.
Further investigations will focus on precipitation input derived from numerical weather prediction and radar data sets. Several improvements to the model and method should lead to improved prediction skill. For instance, the choice of critical discharge is crucial for model verification with observed events; thus, a sensitivity analysis is needed to define a critical value for method performance. Additionally, the computed runoff could be improved by a spatially distributed model and better temporal resolution. Then, a critical antecedent soil moisture level derived from the BROOK90 model could serve as the starting condition for running a more complex hydrological model that in turn checks for the alarm level. This combination would allow the monitoring of soil moisture status with very good coverage of critical head catchments with little computational effort, while more complex modelling could be performed only in selected situations. However, to implement the proposed method operationally, we recommend additional reductions in computational time via an integrated modelling framework.