1. Introduction
Currently, river and stream flood forecasts are usually made using Quantitative Precipitation Estimates (QPEs) [1,2] instead of Quantitative Precipitation Forecasts (QPFs) because large errors can exist in the QPFs. However, because QPE is not available until after the rainfall begins, a hydrologic forecast cannot be made until the precipitation event is underway and the full impact of the event cannot be captured in the streamflow forecast until the rain has ended. This is especially problematic in small river basins where the precipitation event may not have ended before flash flooding has already started to occur. Thus, reliance on QPE for flood forecasting limits the ability of forecasters to predict rapid streamflow changes with much lead time, reducing the time emergency managers have to inform and prepare safety personnel and the general public for potential flooding. Earlier flood warnings could be made if QPFs were used as input to hydrologic models instead of QPEs [3,4,5,6].
One problem in the use of QPFs in hydrologic models is that precipitation forecasts often have substantial errors, especially for warm-season rainfall [7,8], which is typically associated with relatively small-scale and intense thunderstorms that have weak large-scale forcing. The small scale and high intensity of this type of precipitation often make it particularly challenging to forecast possible resulting flash flooding [9,10], especially because displacement errors in the simulated precipitation systems are common [11,12]. Additionally, the lack of larger-scale forcing means that small-scale forcing mechanisms play a more important role in initiating and sustaining warm-season precipitation systems, so that higher resolution is necessary in numerical models to forecast warm-season rainfall. Unfortunately, even models run with fine grids can still struggle to accurately predict small-scale forcing and the resulting precipitation [13,14,15], at least partly because existing limited observational networks that supply the initial conditions for the models may only partially resolve these small-scale mechanisms. In addition, errors can arise from the use of parameterizations for the boundary layer or microphysical processes [16,17,18]. Since it is so challenging to forecast warm-season precipitation events using deterministic model runs, an increasing emphasis has been placed on ensemble forecasting [19,20,21].
Ensemble Prediction Systems (EPSs) assist with forecasting, since EPSs show the uncertainty of an event through probabilistic forecasts [12,22], and the ensemble mean of a variable is often more accurate than values from any individual ensemble member [23,24]. EPSs can use multiple models, different physical parameterizations, and different initial and lateral boundary conditions, thus accounting for many of the uncertainties that can lead to errors in atmospheric models. Werner et al. [25] tested the use of an EPS in streamflow forecasts and discovered that using the mean areal precipitation of the EPS improved the Ranked Probability Skill Score (RPSS) values when compared to the reforecast archived model forecasts. However, Ebert et al. [26] found that the ensemble mean for heavy QPF was usually smaller than that observed due to the averaging process smoothing the extremes from each member, since the locations of predicted heavy rain typically differed among members. The resulting mean QPF field was often unrealistically smooth and light.
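The smoothing effect described by Ebert et al. can be illustrated with a toy example. The sketch below uses three hypothetical one-dimensional rainfall fields (the values and grid are invented for illustration, not taken from any ensemble) in which each member predicts the same 60 mm core at a different grid point.

```python
# Hypothetical 1-D rainfall fields (mm) from three ensemble members.
# Each member forecasts a 60 mm core, but at a different grid point.
members = [
    [0, 60, 10, 0, 0, 0],
    [0, 0, 10, 60, 0, 0],
    [0, 0, 0, 10, 60, 0],
]


def ensemble_mean(fields):
    """Grid-point average across all members."""
    n = len(fields)
    return [sum(column) / n for column in zip(*fields)]


mean_field = ensemble_mean(members)

# Largest value in the averaged field versus the average of member maxima.
peak_of_mean = max(mean_field)
mean_of_peaks = sum(max(m) for m in members) / len(members)

print(f"peak of ensemble mean: {peak_of_mean:.1f} mm")
print(f"mean of member peaks:  {mean_of_peaks:.1f} mm")
```

Every member contains a 60 mm maximum, yet the ensemble-mean field peaks well below that value because the maxima do not overlap in space.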
Since the precipitation patterns from an ensemble mean may not be physically realistic, a better approach can be to use the individual QPFs from the ensemble members. For example, Davolio et al. [5] showed how QPFs from a multi-model ensemble were useful in generating flood forecasts, although shortcomings with the forecasted timing and shape of the river’s hydrograph were noted. The small scale and intense nature of typical warm-season rainfall aggravate QPF errors associated with both timing and location, such that when input into the hydrologic model, significant errors in watershed-level predictions can occur, especially for small basins [27,28]. Carlberg et al. [29] found that increasing the number of ensemble members by randomly shifting the individual members’ QPF to account for displacement errors led to more skillful probabilistic streamflow forecasts than those of the original EPS. The current operational Hydrologic Ensemble Forecast Service (HEFS) at the National Oceanic and Atmospheric Administration in the United States applies bias correction to the ensemble member QPFs that are used as input to create probabilistic streamflow forecasts [30].
Individual member QPFs are commonly used to create probabilistic QPF (PQPF) [31,32], which provides a measure of uncertainty in the rainfall forecasts. Although one could simply assign probabilities at grid points based on the number of members showing precipitation above a threshold, often these forecasts are calibrated by comparing the model forecasts to observations for a period of time to account for systematic biases in the ensembles. In addition, smoothers are often used, which effectively take into account information from nearby grid points [19], possibly mitigating, in small measure, the displacement errors common to precipitation forecasts. Assuming value is added to the PQPFs by these approaches, the rainfall amounts associated with the PQPFs could be used instead of individual member QPF as input to hydrologic models to create probabilistic streamflow forecasts. Presently, the National Weather Service’s (NWS) North Central River Forecast Center (NCRFC) generates a probabilistic streamflow forecast using both QPE measurements from the United States Geological Survey and NWS’s Stage IV rainfall data and PQPFs from the NWS Weather Prediction Center using the 5%, 50%, and 95% exceedances. However, they and others who have attempted to use PQPF in similar manners acknowledge that there may be problems in the direct use of PQPF (S. Connely, NCRFC, 2020, personal communication), suggesting that further work is necessary to document how well the technique may work to provide probabilistic streamflow forecasts.
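The simplest form of PQPF mentioned above, the fraction of members exceeding a threshold, followed by a smoother, can be sketched as follows. The member values, the 25 mm threshold, and the three-point weights are hypothetical choices made for illustration; operational systems typically use calibrated probabilities and two-dimensional (for example, Gaussian) smoothers.

```python
def raw_pqpf(member_fields, threshold_mm):
    """Fraction of members exceeding the threshold at each grid point."""
    n = len(member_fields)
    return [
        sum(1 for value in column if value > threshold_mm) / n
        for column in zip(*member_fields)
    ]


def smooth(probs, weights=(0.25, 0.5, 0.25)):
    """Three-point weighted smoother; edge points reuse their own value."""
    padded = [probs[0]] + list(probs) + [probs[-1]]
    return [
        sum(w * padded[i + k] for k, w in enumerate(weights))
        for i in range(len(probs))
    ]


# Hypothetical 1-D QPF (mm) from four members at four grid points.
members = [
    [0, 30, 5, 0],
    [0, 0, 30, 0],
    [0, 0, 28, 4],
    [0, 2, 0, 30],
]

raw = raw_pqpf(members, threshold_mm=25.0)
smoothed = smooth(raw)
print(raw)       # raw member-fraction probabilities
print(smoothed)  # probabilities after neighborhood smoothing
```

After smoothing, the first grid point receives a nonzero probability even though no member placed heavy rain there, which is how a smoother can partly account for displacement errors.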
The objective of this study is to determine how skillful probabilistic streamflow forecasts are if they are made using high-resolution ensemble PQPF information directly, with the probability of exceedance rainfall amounts determined from the probabilities that had been assigned to several QPF thresholds as opposed to using member QPFs. Such an approach takes advantage of the fact that statistical techniques such as Gaussian smoothers add value to the ensemble PQPFs. Two operational/quasi-operational high-resolution ensembles are used to compute the rain amounts associated with various probabilities of exceedance in a similar manner to NCRFC’s PQPF forecasts. Then, these values are input into a hydrologic model to test how well this PQPF application works for short-term, warm-season probabilistic streamflow prediction. The streamflow predictions are compared to the operational predictions from the NCRFC.
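One way to obtain the rainfall amount associated with a given probability of exceedance from probabilities assigned to several QPF thresholds is to interpolate between thresholds. The sketch below assumes linear interpolation and clamps at the lowest and highest thresholds; the interpolation scheme, the function name, and the threshold values are assumptions for illustration and may differ from the procedure used in this study.

```python
def amount_at_exceedance(thresholds_mm, exceed_probs, target_prob):
    """Rainfall amount exceeded with probability target_prob.

    thresholds_mm must be increasing and exceed_probs non-increasing;
    exceed_probs[i] is the probability of exceeding thresholds_mm[i].
    Linear interpolation between thresholds is an assumed choice.
    """
    pairs = list(zip(thresholds_mm, exceed_probs))
    if target_prob >= pairs[0][1]:
        return pairs[0][0]      # clamp at the lowest threshold
    if target_prob <= pairs[-1][1]:
        return pairs[-1][0]     # clamp at the highest threshold
    for (t_lo, p_lo), (t_hi, p_hi) in zip(pairs, pairs[1:]):
        if p_hi <= target_prob <= p_lo:
            if p_lo == p_hi:
                return t_lo
            frac = (p_lo - target_prob) / (p_lo - p_hi)
            return t_lo + frac * (t_hi - t_lo)
    raise ValueError("exceed_probs must be non-increasing")


# Hypothetical PQPF at one grid point for one accumulation period.
thresholds = [0.0, 10.0, 25.0, 50.0]
probs = [1.0, 0.8, 0.4, 0.1]

for p in (0.95, 0.50, 0.05):
    amount = amount_at_exceedance(thresholds, probs, p)
    print(f"{int(p * 100)}% exceedance: {amount:.2f} mm")
```

Note that the 5% amount is clamped at the highest threshold here, so the choice of thresholds directly limits the largest rainfall amount that can be assigned.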
3. Results
A comparison of the discharges predicted by the different exceedance probabilities with observed discharges averaged over the full sample of events can be seen in Table 3. As would be expected, the percent differences were positive for the low probability of exceedance forecasts and became negative for the high probability forecasts. Since the switch from positive to negative percent differences happened between the 75% and 90% probability values, and not closer to 50% as might be expected, a positive bias in the streamflow forecasts is suggested. Both ensembles produced basin-average rainfall amounts that exceeded 225 mm for the 5% exceedance probability forecast. These values are greater than the record 24-h maximum rainfall totals for most places in this region, and thus, they are especially unrealistic as basin averages. The rainfall amounts and percent differences in the discharge forecasts were higher for HREF-based forecasts than for HRRRE-based forecasts. Rainfall inputs from the HRRRE and HREF were more similar for high and low exceedance probabilities than for the values in the middle (the biggest differences were at the 25% probability value), resulting in similar percent differences in the discharge forecasts at those probabilities (Table 3). For the low rainfall amounts associated with a high probability of exceedance, a large portion of that rain would go into soil storage in the SAC-SMA model, limiting the amount of water available to produce discharge. At the low exceedance values, the extremely high amounts of rainfall (in both precipitation forecasts) would result in much of the rainfall going to runoff and producing similarly high discharges in the SAC-SMA, despite some differences in the basin-average precipitation inputs.
HRRRE-based forecasts had an average RPS of 0.29 (standard deviation, 0.06), which was better than the HREF-based forecasts with an average RPS of 0.36 (standard deviation, 0.09) (Table 4). This difference in RPSs is due to HREF producing higher rainfall amounts compared to HRRRE, which results in forecasts that more seriously over-predict discharge in the higher flow categories. However, when compared to the operational NCRFC forecasts, the HRRRE-based and HREF-based streamflow forecasts had better RPSs; the average RPS for the NCRFC was 0.59 with a standard deviation of 0.07 (Table 4). The poorer RPS for the NCRFC forecasts was due to consistent underprediction of the magnitude of peak discharge for the three different probability values used in these forecasts. Since the NCRFC had a smaller spread in predicted discharge magnitude, the percentage of cases for which the observed discharge remained below that of any of the streamflow forecasts was lower, 60% of events, compared to HRRRE and HREF forecasts, which both were able to capture 100% of the discharge events.
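For readers unfamiliar with the RPS, the following sketch shows one common formulation: the sum of squared differences between the cumulative forecast and cumulative observed distributions over ordered categories, normalized by the number of categories minus one. The normalization, the three flow categories, and the probabilities are assumptions for illustration; the category definitions used in this study are given in its methods and are not reproduced here. Lower scores are better, and a perfect forecast scores 0.

```python
def ranked_probability_score(forecast_probs, observed_category):
    """RPS for one forecast over ordered categories (0 = perfect).

    forecast_probs: probability assigned to each ordered category.
    observed_category: index of the category that was observed.
    Normalizing by (K - 1) is one common convention, assumed here.
    """
    cumulative_forecast = 0.0
    score = 0.0
    for k, prob in enumerate(forecast_probs):
        cumulative_forecast += prob
        cumulative_observed = 1.0 if k >= observed_category else 0.0
        score += (cumulative_forecast - cumulative_observed) ** 2
    return score / (len(forecast_probs) - 1)


# Hypothetical forecast that favors the highest of three flow categories.
forecast = [0.1, 0.3, 0.6]

rps_high_observed = ranked_probability_score(forecast, observed_category=2)
rps_low_observed = ranked_probability_score(forecast, observed_category=0)
print(rps_high_observed, rps_low_observed)
```

The same forecast is penalized far more when the lowest category is observed, which mirrors how forecasts that place too much probability in the high-flow categories receive poorer scores.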
For the HRRRE-based and HREF-based forecasts, RPSs for individual basins ranged from 0.16 to 0.45, except for the Root River at Pilot Mound, MN (RPMM5), which had values of 0.49 and 0.61 (Figure 4). Forecasts for this basin are likely less skillful because RPMM5 is the largest basin studied, resulting in especially large rainfall inputs to the SAC-SMA when the heavy amounts associated with low probabilities of exceedance are applied to all grid points for all time periods. This would make this basin more likely than the others included in the study to experience unrealistically large discharges and indicates that the PQPF application presented here is likely not suitable for larger watersheds.
The streamflow forecast that most accurately captured the correct relative frequency of the observed peak discharge was HRRRE-95%, followed closely by NCRFC-95% (Figure 5). Forecasts NCRFC-50% and NCRFC-5% greatly underpredicted the frequency of peak discharge amounts, whereas the HREF and HRRRE forecasts overpredicted at those probability levels. Overall, HRRRE did a better job at predicting the frequency of occurrences for the different probability values, having only a slight overprediction in HRRRE-95% and HRRRE-90% that worsened from HRRRE-75% through HRRRE-5%, where there were zero observed occurrences (Figure 5). HREF had poorer results with overprediction, with an observed frequency of exceedance around 77% for HREF-95% and observed frequencies of zero at HREF-25%.
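The comparison between nominal exceedance probabilities and observed frequencies can be expressed compactly. In the sketch below, the function name and all discharge values are hypothetical; for a reliable forecast, the observed peak should exceed the X% exceedance forecast in roughly X% of events.

```python
def observed_exceedance_frequency(forecast_peaks, observed_peaks):
    """Fraction of events in which the observed peak exceeded the forecast."""
    exceeded = sum(
        1 for forecast, observed in zip(forecast_peaks, observed_peaks)
        if observed > forecast
    )
    return exceeded / len(forecast_peaks)


# Hypothetical peak discharges (m^3/s) for five events.
observed = [14.0, 13.0, 7.0, 20.0, 11.0]
forecast_95 = [10.0, 12.0, 8.0, 15.0, 9.0]          # 95% exceedance forecasts
forecast_05 = [200.0, 180.0, 150.0, 220.0, 170.0]   # 5% exceedance forecasts

freq_95 = observed_exceedance_frequency(forecast_95, observed)
freq_05 = observed_exceedance_frequency(forecast_05, observed)
print(freq_95, freq_05)
```

An observed frequency below the nominal probability, as in both hypothetical cases here, indicates that the forecast discharges are too high at that probability level.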
To provide some additional insight into the performance of the streamflow forecasting technique, a flash flood event that occurred in the city of Ames, Iowa, on 14 June 2018 is described in more detail. Ames is located at the junction of Squaw Creek and the Skunk River (Figure 3). On 13 June, both streams were at or below the median discharge of ≈4.2 m³ s⁻¹. In the early morning hours (07 UTC) of 14 June, a line of multicellular convection in connection with a cold front moved over the watersheds and produced heavy rain for the next twelve hours. According to the Iowa Mesonet (https://mesonet.agron.iastate.edu/), the system deposited 107 mm of rainfall with a peak rainfall rate of 40.9 mm h⁻¹. Other gauges monitored by Community Collaborative Rain, Hail and Snow Network (CoCoRaHS) volunteers in the area had total measured accumulations as large as 178 mm. Stage IV data showed that the heaviest total rainfall occurred over the Squaw Creek basin, with the majority of the 6-h accumulation occurring between forecast hours 12 and 18, or 12–18 UTC. The observed peak discharge for the Skunk River was 89.2 m³ s⁻¹ (action stage is 122 m³ s⁻¹), while Squaw Creek’s peak discharge was 120 m³ s⁻¹ (action stage is 108 m³ s⁻¹).
HRRRE and HREF output from the runs at 00 UTC 14 June was used to forecast the discharge for this event. Both the HRRRE and HREF PQPF values suggested that rainfall was going to occur in the region of the Skunk and Squaw basins (Figure 6). HRRRE’s probability forecasts had a northward shift compared to the observed Stage IV precipitation, while HREF’s forecasts correctly focused the heaviest rain over the basins.
For the Skunk River, the HRRRE- and HREF-based forecasts produced hydrographs that were similar to the observed in both shape and timing (Figure 7). Note that an additional run of the SAC-SMA was completed using Stage IV measured precipitation data to indicate how the forecast model would perform with QPE. The probability of exceedance value associated with a discharge most similar to the observed discharge was 50%, which resulted in a peak slightly lower than the observed peak discharge (Table 5). The NCRFC forecast largely underpredicted the event on the Skunk River; the discharge associated with only a 5% probability of exceedance was small, at 6.4 m³ s⁻¹, which was almost 14 times smaller than that observed (Table 6).
The forecasted hydrographs for Squaw Creek were similar to the observed in terms of shape and timing (Figure 8). HRRRE-25% and HREF-50% were the discharge forecasts closest to the observed peak for the two precipitation forecasts tested (Table 5). The NCRFC-5% exceedance discharge forecast was again small: only 10.3 m³ s⁻¹ (Table 6). Although the forecasts overall overpredicted peak discharge in this example case, the use of ensemble PQPF to generate discharge forecasts associated with a flash flood before rainfall began would have provided a more skillful forecast than what was available from the NCRFC approach.
Given that the frequencies of the observed discharges were poorly matched to the forecast exceedance probabilities, post-processing of the forecasts would likely be one way to improve them. As one preliminary test, a simple calibration was performed by iteratively removing a fraction of the forecasted discharge from the discharge predicted at each exceedance probability threshold until the forecasted probabilities of exceedance agreed with the observed frequency of discharge. All 109 cases were used to determine the appropriate average adjustment for each exceedance probability discharge. The size of the reduction in water amount varied greatly with the probability of exceedance level and the ensemble being considered. In many cases, more than 50% of the water had to be removed. Using this adjustment, the average RPS for the calibrated HREF-based forecasts improved by 0.1, while HRRRE-based scores improved by a smaller amount of 0.02. Another way to calibrate such forecasts would be to adjust the probability of exceedance so that it matched the observed frequency. Using just the 79 events for which the forecasts had been compared to observed frequencies in Figure 5, a test was performed to determine the impact on the RPSs when the exceedance probabilities were adjusted to match the observed frequencies. This adjustment substantially lowered the discharges associated with the 5% and 50% exceedances, with less adjustment needed for the 95% value. With this test applied to all 109 cases, the average RPS of the adjusted HREF-based forecasts improved by 0.14, which was a nearly 50% improvement, while HRRRE-based scores improved by 0.05, which was an improvement of over 20%. Our sample of cases was not large enough to allow us to split it into a true training set and a separate test set for both of the calibration tests, although the second test did include 30 events independent of those used to adjust the exceedance probabilities.
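The first calibration test can be sketched as a simple search for a common scaling factor. The code below is an illustration under stated assumptions, not the procedure used to produce the results above: it assumes a single multiplicative factor per exceedance level, a fixed 1% step, and hypothetical discharge values, and it stops as soon as the observed exceedance frequency reaches the nominal probability.

```python
def exceedance_frequency(forecasts, observations, scale=1.0):
    """Fraction of events whose observed peak exceeds the scaled forecast."""
    hits = sum(1 for f, o in zip(forecasts, observations) if o > f * scale)
    return hits / len(forecasts)


def calibrate_scale(forecasts, observations, nominal_prob, step=0.01):
    """Reduce forecasts by a common factor until they match nominal_prob.

    Returns the largest scale (searched downward from 1.0 in fixed steps)
    at which the observed exceedance frequency reaches nominal_prob.
    """
    scale = 1.0
    while scale > 0.0:
        if exceedance_frequency(forecasts, observations, scale) >= nominal_prob:
            return scale
        scale = round(scale - step, 10)  # avoid floating-point drift
    return 0.0


# Hypothetical 50% exceedance forecasts and observed peaks (m^3/s).
forecast_50 = [100.0, 80.0, 120.0, 90.0]
observed = [40.0, 50.0, 30.0, 60.0]

scale_50 = calibrate_scale(forecast_50, observed, nominal_prob=0.50)
removed_fraction = 1.0 - scale_50
print(f"scale: {scale_50:.2f}, water removed: {removed_fraction:.0%}")
```

Because the factor is fitted to the same events it is evaluated on, a sketch like this shares the limitation noted above: without an independent test set, the improvement in skill is likely to be optimistic.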
Techniques such as the Schaake Shuffle [52], which reorders the ensemble output to recover variability in the forcing variables, thereby eliminating the uniform precipitation forcing in time and space, might also lead to improvement. A limiting factor in completing this technique is that it requires the user to have a sizeable sample of historical data to force into the correct order based on climatology. Another post-processing technique to reduce errors in probabilistic streamflow forecasts is the “logistic regression” discussed in [53]. Crochemore et al. [54] showed that bias correcting the precipitation forecasts prior to input into the hydrologic model can lead to improved streamflow forecast skill as measured with the continuous ranked probability skill score. However, they also state that improving the reliability of precipitation forecasts does not always improve the reliability of the streamflow forecasts, and watersheds that had the most “room for improvement” benefitted the most from the bias correction [54]. The testing of more complex pre- and post-processing methods was beyond the scope of the present work but should be examined in future research.
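The core reordering step of the Schaake Shuffle can be sketched briefly. In this simplified illustration, the ensemble values at each grid point (or time step) are sorted and then reassigned so that their rank order matches the rank order of a historical sample at the same point. The data are hypothetical, and details such as how the historical dates are selected are omitted.

```python
def schaake_shuffle(ensemble, historical):
    """Reorder ensemble values to follow the rank structure of history.

    ensemble[v][m]:   value of member m for variable v (grid point or time).
    historical[v][d]: observed value on historical date d for variable v.
    Both must have the same number of columns (members == dates).
    """
    shuffled = []
    for ens_values, hist_values in zip(ensemble, historical):
        sorted_ens = sorted(ens_values)
        # Indices of historical dates ordered from smallest to largest value.
        order = sorted(range(len(hist_values)), key=lambda i: hist_values[i])
        reordered = [0.0] * len(ens_values)
        for rank, date_index in enumerate(order):
            reordered[date_index] = sorted_ens[rank]
        shuffled.append(reordered)
    return shuffled


# Hypothetical rainfall (mm): two grid points, three members / three dates.
ensemble = [[5.0, 20.0, 10.0], [12.0, 3.0, 8.0]]
historical = [[1.0, 9.0, 4.0], [2.0, 10.0, 5.0]]

result = schaake_shuffle(ensemble, historical)
print(result)
```

In the historical sample, the second date was the wettest at both grid points, so after the shuffle the second member carries the largest value at both points. The marginal distribution at each point is unchanged; only the pairing across points is altered.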
4. Discussion
This study examined a technique that derived rainfall time series for different probabilities of exceedance from ensemble PQPF and used it as input to the SAC-SMA hydrologic model to generate ensemble streamflow forecasts. Two different convection-allowing ensembles, HRRRE and HREF, were tested for 109 events across 11 small-scale basins throughout the Upper Midwest. A variety of different techniques were used to analyze forecasts of peak discharge, and comparisons were made with probabilistic streamflow forecasts generated by the NCRFC.
The HRRRE-based forecasts had the best ability to predict streamflow as indicated by the lowest average RPS of 0.29, and the best agreement between the predicted exceedance probabilities and the frequency that observed discharges exceeded these values. HRRRE likely performed better than HREF because it had lower predicted precipitation amounts for all of the probability of exceedance values examined, which were more similar to the observed rainfall. For HREF-based forecasts, the higher predicted precipitation amounts led to discharge forecasts that, on average, overpredicted the peak discharge. Finally, the NCRFC forecasts frequently underpredicted observed discharge, resulting in the worst RPSs and an increase in the number of times the observed discharges exceeded forecasted discharges associated with the exceedance probabilities. A case study focused on the 14 June 2018 Ames flood event suggests that the forecasting technique presented here may provide improved information compared to current forecasting methods to give emergency personnel and the public early information about the possibility of streamflow rises.
The discharge values associated with low probability of exceedance forecasts for both the HRRRE- and HREF-based forecasts were unreasonably large, with no observations having discharges as high as those predicted. The high rainfall amounts associated with low probabilities of exceedance are applied at all grid points within a basin, resulting in unreasonably high discharge forecasts. This problem was especially prevalent for the Root River, the largest basin examined, indicating that the overprediction worsens with basin size, as would be expected due to the application of the same exceedance amounts at all grid points in a basin.
This problem could be reduced in the future by calibrating the probability of exceedance values, such as by adjusting them to match observed frequencies during a training period, or by decreasing rainfall amounts associated with given probabilities of exceedance so that the magnitude of the predicted discharges would be reduced, or by calibration of the streamflow forecasts related to the exceedance values themselves. Two simple preliminary tests applied to the streamflow forecasts resulted in improvements in skill for the HREF-based forecast, with smaller improvements for the HRRRE-based one. Another refinement that might improve this technique would be to use an analysis of the spatial distribution of QPE from multiple warm-season rainfall events to determine the typical areal pattern of precipitation. This analysis of the spatial distribution would require the ability to distinguish between different types of convective systems occurring in the model output. Then, the distribution pattern could be used to adjust the rainfall amounts associated with PQPF values over the basins. Combining these techniques would allow the use of multiple different exceedance values in the spatial averaging, instead of a single blanket exceedance probability. This would allow more accurate precipitation forcing to be fed into the hydrologic model during each timestep and take advantage of the presumed value added by the analysis that enters into the PQPF that is absent from the raw QPF members.