Ensemble Radar-Based Rainfall Forecasts for Urban Hydrological Applications

Radar rainfall forecasting is of major importance for predicting flows in the sewer system and enhancing early flood warning systems in urban areas. In this context, reducing radar rainfall estimation uncertainties can improve rainfall forecasts. This study utilises an ensemble generator that assesses radar rainfall uncertainties based on historical rain gauge data as ground truth. The ensemble generator is used to produce probabilistic radar rainfall forecasts (radar ensembles). The radar rainfall forecast ensembles are compared against ensembles produced by a stochastic ensemble generator. The rainfall forecasts are used to predict sewer flows in a small urban area in the north of England using an Infoworks CS model. Uncertainties in radar rainfall forecasts are assessed using relative operating characteristic (ROC) curves, and the results show that the radar ensembles outperform the stochastic ensemble generator in the first hour of the forecasts. The forecast predictability is, however, rapidly lost after 30 min lead-time. This implies that knowledge of the statistical properties of the radar rainfall errors can help to produce more meaningful radar rainfall forecast ensembles.


Introduction
Flooding is a very common natural disaster around the world, and its frequency and intensity are expected to rise [1]. Structural measures are not only very expensive but are also built to protect a particular region from an estimated flood level [2,3]. More intense floods that can overcome the protective structures will inevitably occur, and the consequences will be increasingly severe, as the concentration of people and infrastructure in urban areas is on the rise as well. Therefore, alternative measures using flood forecasting and warning systems are in demand, and forecasting floods several hours ahead can allow a timely emergency response to take place [4]. Rainfall-runoff modelling has an important function when issuing flood warnings [2]. Over the last couple of decades, much research has been done to increase the reliability of rainfall forecasting [5][6][7][8]. Hydrodynamic models for real-time flow predictions, utilising radar rainfall and radar-based rainfall forecasts, can be used for the real-time control of drainage systems in urban areas [9]. It is, however, essential to know the uncertainties related to the radar rainfall measurements in order to produce reliable forecasts.
For urban flood forecasting applications, rainfall data should have high temporal (e.g., 5 min or lower) and spatial (e.g., 1 km or lower) resolutions [10,11]. Weather radars measure rainfall in real time and provide high-resolution data for short-term precipitation forecasting, which is also known as nowcasting. Accurate rainfall forecast inputs are one of the main factors that influence accurate flow predictions. Increasing the predictability of these forecasts has proven to be a challenge given the small spatial scales involved, and therefore reducing radar rainfall-related uncertainties can improve radar-based rainfall forecasts [8,12]. This paper aims at quantifying the value of radar-based rainfall forecasts and their associated uncertainties in urban flood forecasting applications.
Weather radar rainfall is affected by different sources of errors that propagate into the rainfall forecasts produced by nowcasting models [13,14]. Weather radars send microwaves and receive backscattered radiation from precipitation particles through the radar reflectivity (Z), which is related to the rainfall rate (R) using a semi-empirical equation of the form Z = aR^b [15,16]. However, the parameters a and b of the Z-R equation are known to be dependent on the precipitation type and the raindrop size distribution. Using a constant Z-R equation contributes to an error in radar rainfall estimation [16,17]. In order to reduce uncertainties due to parameter bias in the Z-R equation, Hasan [18] developed an error model to assess how rain gauge uncertainties affect the Z-R equation. Although the model was shown to improve rainfall estimation, the difference is less than 5%. According to Gorgucci et al. [19], radar estimations of precipitation have a large degree of uncertainty (higher than 100%) when the reflectivity error is only a few dBZ for C-band radars. Radar calibration should be carried out regularly to avoid inaccurate readings [17]. Lack of accurate determination of the radar calibration constant leads to systematic errors in the observations. Some of the radar rainfall errors are associated with ground clutter echoes, which occur when the radar beam comes across ground targets [16]. Much ground clutter is permanent, and a ground clutter map can be used to identify these non-meteorological echoes. However, ground clutter maps are unable to identify radar echoes caused by anomalous propagation conditions [17,20]. In conditions of anomalous propagation, the radar beam is bent towards the ground due to changes in the refractive index of the atmosphere, producing unwanted ground echoes that can be misinterpreted as heavy precipitation [17,20]. Polarimetric weather radars have shown promise in identifying anomalous propagation [21]. Elevated terrain can cause occultation of the main part of the radar beam in some cases. This can only be corrected if less than 60% of the radar beam is obstructed [16]. Variation in the Vertical Profile of Reflectivity (VPR) is related to changes in the size, phase and shape distribution of hydrometeors [22]. Dual-polarisation radars allow hydrometeor types such as rain, melting snow and snow to be identified and corrections to be applied [23]. This can be used to improve the rainfall estimation on the ground. When the distance from a radar increases, there is naturally an increase in the radar sampling volume as well. At a higher altitude, the distribution of hydrometeors changes, leading to a difference between measured rainfall and rainfall that actually falls on the ground [17,24]. During heavy rainfall events, attenuation might occur when radars with small wavelengths are used (e.g., C-band or X-band radars) [16,19,25,26]. Finally, the radar rainfall can be adjusted with ground rain gauge measurements using bias correction methods or geostatistical approaches [27,28].
Once the radar data have been corrected for the different error sources, a radar-based rainfall forecasting model, such as nowcasting, can be applied. Nowcasting models based on the extrapolation of Lagrangian trajectories produce rainfall forecasts with a few hours lead-time (with increased loss of performance after 2-3 h ahead). They have a significant role in enhancing rainfall warning systems, especially when predicting extreme events or flash floods [29,30]. Nowcasting models use a sequence of weather radar images, and the generated precipitation forecasts produced with this technique have a high spatial and temporal resolution [7]. However, uncertainties inherent to the nowcasting model result in an increasing loss of forecasting skill after 1 h lead-time. Uncertainties in radar nowcasts are caused mainly by [30][31][32]: uncertainties inherent in the specific nowcasting model used, errors in radar rainfall estimation, uncertainties due to the temporal development of the velocity field, and uncertainties caused by precipitation processes such as growth and decay not being taken into account.
Rinehart & Garvey [33] developed a pattern recognition model based on a correlation coefficient to calculate motion vectors in storms, known as TREC (tracking radar echo with correlation). The TREC algorithm has since been modified, and new models are based on this approach. Sokol et al. [34] compared forecasts produced using two different models: the COTREC model (Continuity of Tracking Radar Echoes by Correlation vectors), which is based only on extrapolation from radar images and assumes that rainfall trajectories do not change with time, and the SAMR model (Statistical Advective Method Radar), which utilises the same technique but additionally includes a statistical model to correct precipitation estimations. Results showed that SAMR provides slightly better results, but it is not capable of predicting new storms or of accurately forecasting any significant changes in existing ones. The GANDOLF scheme (Generating Advanced Nowcasts for Deployment in Operational Land-based Forecasts) was developed to increase the predictability of convective rainfall. An object-oriented model of convection is used that incorporates a model of the life cycle of convective clouds [6]. This advection scheme showed deficiencies in severe rainfall events. An attempt to address this issue was made by dividing the rain analysis into blocks and forcing adjacent blocks to have a smooth transitional variation of the velocity. Bowler et al. [12] derived a new optical flow algorithm, enhancing the GANDOLF system's capability to calculate the advection field. This algorithm was used to further develop STEPS (Short-Term Ensemble Prediction System) [7]. In the STEPS model, ensemble radar nowcasts are blended with numerical weather prediction (NWP) forecasts. This is because NWP models have been shown to have better forecasting skill after several hours lead-time and can improve the ability to forecast growth and decay of precipitation when blended with a nowcasting model [7]. NWP models require high computing power, and although the forecast resolutions are increasing [34,35], they are still more computationally demanding than nowcasting models. Blending nowcasting with NWP forecasts has been shown to improve the forecasting skill [9], and it has been successfully used in real-time applications [7,36]. The uncertainties in the radar rainfall analysis and the temporal evolution of precipitation were accounted for by using a stochastic perturbation system, in which probabilistic forecasts are produced by adding spatially correlated stochastic noise to the deterministic forecast. Although the STEPS performance is higher than GANDOLF's, results for moderate and heavy rain still do not match the observed precipitation accurately. A newer version of STEPS has been developed to take into account radar errors [36], using a statistical model to generate ensembles proposed by Germann et al. [14].
Even with extensive research to correct radar errors and improve rainfall estimation, residual errors are present in processed radar images, and they are an important source of errors in the first hours of the forecasts [14,31,37]. Ensemble forecasts consist of a number of different forecasts, all of them equally likely, for the same space and time. Ensemble forecasts can be obtained by adding spatially correlated stochastic noise, as described above, to the initial radar analyses in order to produce a series of equally likely forecasts. An advantage of producing ensemble radar-based forecasts is that they provide additional information about the uncertainties in the forecasts, either from the nowcasting model or from the radar analyses [5,9,34,38]. Germann et al. [14] proposed a radar rainfall ensemble scheme that uses rain gauge data as a reference to estimate the residual errors in radar rainfall. This approach was coupled with a semi-distributed rainfall-runoff model to demonstrate the propagation of the radar rainfall uncertainty in the simulation of river flows [14]. Rico-Ramirez et al. [37] implemented the model proposed by Germann et al. [14] and assessed how uncertainties in radar rainfall propagate into urban drainage flows. The results showed that this model is able to capture the temporal and spatial correlations of the radar rainfall errors. The results also showed that the uncertainties in radar rainfall could explain some of the uncertainties observed in the simulated flow volumes, but additional uncertainties in the sewer flow model structure play an important role.
As shown in this section, the majority of past papers attempt to model the uncertainties in radar rainfall, but this leaves a research gap with regard to assessing and quantifying the value of radar rainfall ensembles in short-term rainfall forecasting (i.e., nowcasting), as well as in urban sewer flow forecasting. This study aims to model radar rainfall uncertainties in radar nowcasting to assess how these uncertainties propagate in the prediction of sewer flows of a small urban area. Section 2 of this paper describes the methodology and the data sets used. The results from the rainfall probabilistic forecasts and the results of applying these ensembles in the hydraulic model are shown in Section 3. A discussion of the results is presented in Section 4, and Section 5 summarises the conclusions of this work.

Materials and Methods
Composite radar data from the UK Met Office, available through the British Atmospheric Data Centre (BADC) [39], were used in this study. The radar data have a temporal resolution of 5 min and a spatial resolution of 1 km × 1 km. The dataset used in this study is from the year 2008 and can be correlated with the available flow observations. The UK Met Office radar network consists of C-band weather radars capable of producing high-resolution precipitation data for the UK [39]. Ground clutter is identified using ground clutter maps. The Z-R equation is constant for all types of rainfall (a = 200 and b = 1.6). The radar data processing includes an algorithm to correct for rain attenuation, which can be significant at C-band frequencies. The algorithm can be unstable in cases of severe attenuation, and further uncertainties can occur with attenuation correction in cases where the weather radars are not properly calibrated [17]. Currently, there are 18 C-band weather radars in the UK [37]. Most of these radars have been upgraded with dual-polarisation technology in the last few years. This will result in significant improvements in terms of data quality (e.g., better identification of non-meteorological echoes), attenuation correction and rainfall estimation. Images from adjacent radars and previous images from the same radar are analysed in order to discard corrupted images. Anomalous propagation is removed from the radar data using a combination of the radar data and Meteosat images. In order to take into account the variation in the vertical profile of reflectivity, an idealised vertical profile is identified at each radar pixel; this is defined by the background reflectivity factor and incorporates simple parameterisations. A radar horizon is used to correct for occultation of the radar beam [17]. The Met Office radar data processing system also includes correction algorithms for uncertainties due to noise filtering, antenna pointing, mean field bias and conversion from Cartesian to polar coordinates [40].
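The constant Z-R relation quoted above can be illustrated with a short sketch. It assumes reflectivity is supplied in dBZ and converted to linear units before inverting Z = aR^b with a = 200 and b = 1.6; the function name is hypothetical and this is not the Met Office processing code.

```python
import numpy as np

def reflectivity_to_rain_rate(dbz, a=200.0, b=1.6):
    """Invert Z = a * R**b for the rain rate R (mm/h).

    dbz is reflectivity in dBZ; a and b are the constant Z-R
    parameters quoted in the text (assumed to apply everywhere).
    """
    z_linear = 10.0 ** (np.asarray(dbz, dtype=float) / 10.0)  # mm^6 m^-3
    return (z_linear / a) ** (1.0 / b)

# Example: rain rates for three reflectivity values
rates = reflectivity_to_rain_rate([20.0, 30.0, 40.0])
```

Because R scales as Z^(1/b), a fixed reflectivity error in dB translates into a fixed multiplicative error in rain rate, which is one way to see why calibration errors of a few dBZ matter.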
Rain gauge data were provided by the UK Environment Agency. The rain gauge data are freely available upon request (national.requests@environment-agency.gov.uk) under the Open Government Licence. A network of 229 tipping bucket rain gauges (TBRs) with 15 min temporal resolution was available for the study area in the north of England. Although rain gauges are used as a reference and are known to be reliable precipitation instruments, they are also subject to measurement errors. Uncertainties are related to gauge calibration, wind effect, wetting-evaporation losses, timing errors, hydrodynamic water flow instabilities, blockages, malfunction and underestimation during heavy rainfall [37,41]. In order to minimise inaccuracies in the measurements, different rain gauges were compared, and only reliable rain gauge data were used. Rain gauge data that presented significant deviation when compared to the surrounding rain gauges or showed anomalous behaviour (e.g., blockages) were discarded (22 rain gauges in our case). It is also necessary to deal with sampling errors when comparing weather radar and rain gauge data. Rain gauge measurements are point measurements at ground level, whereas weather radar measurements are made at a higher altitude and with a larger sampling volume in space. Therefore, part of the discrepancy between both measurements is due to differences in the sampling volumes [42]. Provided the rain gauge data are quality-controlled, they can be considered as ground truth measurements to validate the radar rainfall observations.
A sequence of three radar images with a time-step of 5 min was used to produce forecasts every 5 min with a 3-h forecasting lead-time and 1 km spatial resolution. The STEPS model provided by the UK Met Office was used in this analysis. STEPS is a rainfall-forecasting model that blends rainfall extrapolation nowcasts with NWP rainfall forecasts. The nowcasting module isolates small characteristics (estimation of advection field, temporal evolution of rainfall and spectral decomposition) into multiplicative cascades. This ensures that features which cannot be accurately predicted by the model are substituted by stochastic noise. The model assumes that the rate of temporal evolution of rainfall and the temporal development of velocity fields remain stationary during the forecast. Even with the uncertainties inherent to this model, radar nowcasting produces more skilful forecasts than NWP forecasts up to 2 h lead-time, and therefore STEPS uses a multi-cascade approach to blend the two components [7,36]. For this study, the nowcasting component of the STEPS model was used to produce the forecasts up to 3 h ahead. Liguori and Rico-Ramirez [43] produced probabilistic nowcasts using the STEPS model and concluded that increasing the number of ensemble members beyond 10-20 does not effectively increase the accuracy of the forecast. In different research papers, between 20 and 30 ensemble members are commonly used [44][45][46][47][48][49][50]. In this research, each probabilistic forecast is formed by 25 ensemble members.
The radar rainfall errors were modelled using the model proposed by Germann et al. [14]. This model uses the difference between the radar measurements and a reference (e.g., rain gauge observations) obtained from a large historical data set. The perturbation fields are calculated through the covariance matrix of the residual errors, by using radar and rain gauge measurements. Values with no rainfall are excluded from the calculations. Temporal correlation of the errors is imposed using an autoregressive model AR(2) [37]. The perturbation fields are added to the radar rainfall (R) in the log domain, resulting in radar rainfall ensemble members. Radar rainfall ensembles (RE) attempt to assess in a realistic way the residual uncertainties that remain even after the correction algorithms are applied to the radar data. These radar rainfall ensembles are used instead of the original radar images to generate forecasts using the nowcasting model from STEPS. A deterministic forecast is produced using each of the 25 radar ensemble members. As a result, these 25 forecasts together form the ensemble forecast. Ensemble forecasts based on this method will be referred to as radar ensembles in this paper, while ensemble forecasts generated using the stochastic ensemble generator by STEPS will be referred to as STEPS ensembles. Note that the STEPS ensembles are generated by adding spatially correlated noise to the deterministic radar forecasts.
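The idea of perturbing radar rainfall in the log domain with spatially and temporally correlated errors can be sketched as follows. This is a simplified illustration under several assumptions, not the implementation of Germann et al. [14]: residuals are taken as natural-log ratios of gauge to radar rainfall at gauge locations only, spatial correlation is imposed with a Cholesky factor of the residual covariance, and the AR(2) coefficients `phi1` and `phi2` are hypothetical placeholders rather than fitted values. All function names are illustrative.

```python
import numpy as np

def ar2_innovation_std(phi1, phi2):
    """Innovation std that gives a stationary AR(2) process unit variance."""
    var = (1.0 + phi2) * ((1.0 - phi2) ** 2 - phi1 ** 2) / (1.0 - phi2)
    if abs(phi2) >= 1.0 or var <= 0.0:
        raise ValueError("AR(2) coefficients are not stationary")
    return np.sqrt(var)

def perturbation_fields(residuals, n_steps, phi1=0.5, phi2=0.2, seed=0):
    """Draw log-domain perturbations with the mean and covariance of residuals.

    residuals: array (n_samples, n_sites) of log(gauge / radar), with dry
    samples already removed; at least two sites are assumed.
    """
    rng = np.random.default_rng(seed)
    residuals = np.asarray(residuals, dtype=float)
    n_sites = residuals.shape[1]
    mean = residuals.mean(axis=0)
    cov = np.cov(residuals, rowvar=False)
    # Small jitter keeps the Cholesky factorisation numerically stable.
    chol = np.linalg.cholesky(cov + 1e-9 * np.eye(n_sites))
    sigma = ar2_innovation_std(phi1, phi2)
    y_prev2 = rng.standard_normal(n_sites)
    y_prev1 = rng.standard_normal(n_sites)
    fields = np.empty((n_steps, n_sites))
    for t in range(n_steps):
        # AR(2) recursion imposes temporal correlation on unit-variance noise.
        y = phi1 * y_prev1 + phi2 * y_prev2 + sigma * rng.standard_normal(n_sites)
        fields[t] = mean + chol @ y  # Cholesky factor imposes spatial covariance
        y_prev2, y_prev1 = y_prev1, y
    return fields

def radar_ensemble(radar, residuals, n_members=25, **kwargs):
    """Adding a perturbation in the log domain multiplies R by exp(perturbation)."""
    radar = np.asarray(radar, dtype=float)  # shape (n_steps, n_sites)
    members = [
        radar * np.exp(perturbation_fields(residuals, radar.shape[0], seed=k, **kwargs))
        for k in range(n_members)
    ]
    return np.stack(members)
```

Each member would then be passed to the nowcasting model in place of the original radar sequence. A full implementation would also need to extend the perturbations from gauge locations to the whole radar grid, which this sketch does not attempt.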
The rainfall forecast output was used as an input in a hydrodynamic sewer network model built in the Infoworks CS software package to simulate the flows in the sewer network. Rainfall-runoff processes and the flow through the sewer network conduits were modelled utilising the Infoworks model provided by Yorkshire Water for research purposes. The sewer system is mainly combined, being used to carry both wastewater and rainfall runoff. The urban area, Ilkley, is located in the Pennine hills and has an area of 11.06 km² (Figure 1). This study utilises the geographic grid reference of the Ordnance Survey National Grid for the radar rainfall data [51]. Infoworks CS uses both rainfall-runoff volume and runoff routing models to simulate flows in the catchment, and in this study the New UK percentage Runoff model, the Wallingford model and the Double Linear Reservoir model were used. The full St Venant equations are used in the model to calculate the flows in the sewer conduits [9]. Schellart et al. [52] and Liguori et al. [9] provide further information about the Infoworks CS model used in this study. The calibration of the hydrological urban model was performed using current industrial standards [53], and data from three storm days and one dry day from events that happened between March and April 2000 were used. Data from 5 tipping bucket rain gauges and flow monitors within the urban area were used for calibration.
Data from 7 depth monitors, 4 additional rain gauges (not shown in Figure 1) and 16 flow monitors are available from 2007 until 2009 within the urban area [9]. In this study, data from 2008 were used for validation. Liguori et al. [9] provide more detailed information about the urban model, and their Figure 2 presents the location of the flow monitor used in this study.
The capacity of the hydrodynamic model to simulate flows was assessed using radar and gauge data from 15 April 2008 until 31 December 2008. The period was chosen to include a wide range of events. The root mean square error (RMSE) was calculated by comparing the simulated flow with the measured flow. The RMSE was only computed for measured flows higher than 0.1 m³/s to exclude flows measured in dry periods and to minimise uncertainties related to the flow measurement.
An overview of the events with the start date and time, duration, peak flow, maximum average rainfall and storm type is shown in Table 1. Events with high peak flow were selected, and flow forecasts were performed every 30 min, including forecasts of low, medium and high flows. In order to classify the storms as convective or stratiform, each pixel of the radar scan was classified as either stratiform or convective using the algorithm proposed by Steiner et al. [54]. A storm was classified as convective if more than 3% of the pixels of the study area were convective for a period of more than 3 h. The events that did not fulfil these requirements were classified as stratiform [37].
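The thresholded RMSE described above can be written compactly. The sketch below assumes the simulated and measured flows are already aligned on the same time steps and expressed in m³/s; the function name is hypothetical.

```python
import numpy as np

def thresholded_rmse(simulated, measured, threshold=0.1):
    """RMSE over time steps where the measured flow exceeds the threshold."""
    simulated = np.asarray(simulated, dtype=float)
    measured = np.asarray(measured, dtype=float)
    mask = measured > threshold  # drop dry-period flows
    if not mask.any():
        return float("nan")  # no wet-period samples to evaluate
    return float(np.sqrt(np.mean((simulated[mask] - measured[mask]) ** 2)))
```

Note that the mask is built from the measured flow only, so a simulated flow that is non-zero during a dry period does not contribute to the score.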
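The storm-level rule above can be sketched as follows, assuming a per-pixel convective/stratiform mask is already available for every radar scan (the Steiner et al. [54] pixel classification itself is not reproduced here). The sketch also assumes that "a period of more than 3 h" means a consecutive run of scans and approximates the run duration as the number of scans times the 5 min time-step; both are interpretations, not statements from the source.

```python
import numpy as np

def classify_storm(convective_mask, step_minutes=5, min_fraction=0.03, min_hours=3.0):
    """Label a storm from a boolean array of shape (n_scans, ny, nx).

    True marks a convective pixel. The storm is convective when the
    convective fraction exceeds min_fraction for a consecutive run
    lasting longer than min_hours (assumed interpretation).
    """
    mask = np.asarray(convective_mask, dtype=bool)
    fraction = mask.reshape(mask.shape[0], -1).mean(axis=1)
    longest = run = 0
    for exceeds in fraction > min_fraction:
        run = run + 1 if exceeds else 0
        longest = max(longest, run)
    duration_minutes = longest * step_minutes
    return "convective" if duration_minutes > min_hours * 60 else "stratiform"
```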

Results
Radar rainfall ensembles were generated using both the method described by Germann et al. [14] and the STEPS ensembles [7]. Figures 2 and 3 illustrate an example of radar scans and two different radar ensemble members at different time steps. In the images it is possible to see how the rainfall is developing according to the radar measurements. In Figure 3 the forecast started at 02:00, so the results show forecasts with lead-times of 30 min, 60 min and 90 min. There are clear differences between the radar rainfall and the rainfall forecasts, and the results show how these differences increase with lead-time. Using different ensembles to estimate and forecast rainfall can give more information about how the storm will develop according to the initial uncertainty of the radar rainfall measurement.

Rainfall Forecasting
To assess the predictability of the ensemble rainfall forecasts, relative operating characteristic (ROC) curves were calculated. ROC curves have been widely used to analyse uncertainties in probabilistic forecasting systems and can measure the ability of a model to identify the occurrence of an event correctly. The method is based on a binary system, where yes/no forecasts and yes/no observations are computed [9]. For a sequence of thresholds, a 'hit-rate' (HR) (proportion of events correctly forecasted) and a 'false-alarm rate' (FAR) (proportion of non-events that were incorrectly forecasted as events) are computed and used to define the ROC curve [55]. The better the forecast, the higher the HR (and the lower the FAR). The area beneath the ROC curve should be above 0.5 (random forecast) and is equal to 1 when the model can perfectly forecast whether an event occurs or not [9]. Some of the results obtained for a selection of events with rainfall intensities equal to or higher than 0.1 mm/h, 1.0 mm/h and 3.0 mm/h are presented in Figures 4-7 at different forecasting lead-times.
To summarise the results from the events analysed, the areas beneath the ROC curves for different rainfall thresholds and forecasting lead-times are plotted in Figure 8. The results show that the forecast skill decreases with both increasing forecasting lead-time and higher rainfall intensities.
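A minimal sketch of how a ROC area could be computed from an ensemble forecast is given below. It assumes the forecast probability at each pixel is the fraction of ensemble members at or above the rainfall threshold, that the false-alarm rate is the number of false alarms divided by the number of observed non-events, and that the area is obtained with the trapezoidal rule; the function name and the number of probability levels are illustrative choices.

```python
import numpy as np

def roc_area(ensemble, observed, rain_threshold, n_levels=11):
    """Area under the ROC curve for one rainfall threshold.

    ensemble: array (n_members, n_points) of forecast rainfall (mm/h)
    observed: array (n_points,) of observed rainfall (mm/h)
    """
    ensemble = np.asarray(ensemble, dtype=float)
    observed = np.asarray(observed, dtype=float)
    prob = (ensemble >= rain_threshold).mean(axis=0)  # forecast probability
    event = observed >= rain_threshold                # yes/no observation
    n_event, n_non_event = event.sum(), (~event).sum()
    if n_event == 0 or n_non_event == 0:
        return float("nan")  # ROC is undefined without both outcomes
    hit, false_alarm = [0.0], [0.0]  # start the curve at the origin
    # Lowering the probability level issues more warnings, so both rates grow.
    for level in np.linspace(1.0, 0.0, n_levels):
        warn = prob >= level
        hit.append((warn & event).sum() / n_event)
        false_alarm.append((warn & ~event).sum() / n_non_event)
    hit, false_alarm = np.array(hit), np.array(false_alarm)
    # Trapezoidal rule over the (FAR, HR) points
    return float(np.sum(0.5 * (hit[1:] + hit[:-1]) * np.diff(false_alarm)))
```

Under these assumptions, an ensemble whose members all match the observations gives an area of 1, while an ensemble that assigns the same probability to every pixel gives 0.5, the random-forecast value mentioned in the text.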

Sewer Flow Simulations
The hydrodynamic model was verified using radar rainfall and the additional rain gauge data within the urban area as input. Figure 9 presents the results comparing the measured flow with the flow simulated using radar data, and Figure 10 shows the results comparing measured flows and flows simulated using gauge data. The simulation using gauge data presents a slightly better result than the one using radar data. The results show that the hydrodynamic model simulated the flows accurately in many cases. To exclude dry periods, the RMSE was only calculated for measured flows over the threshold of 0.1 m³/s. The RMSE for the radar flow simulations is 0.0956 m³/s and for the gauge flow simulation is 0.0838 m³/s using a time period from 15 April 2008 to 31 December 2008.

Forecasting Sewer Flows
The sewer flow simulation results obtained by using radar rainfall, rain gauge measurements and rainfall forecasts were compared against the measured sewer flows. Because the forecasted peak flow sometimes appeared a few minutes later (or earlier) compared to the flow observations, it was decided to compare the peak flows within a particular time window (e.g., 30 min or 1 h). The results are presented in Figures 11-13.
RMSE calculations were carried out to assess the performance of the ensembles. Due to the time lag present between the measured peak flow and the ensemble-simulated peak flows, a cross-correlation correction was performed between the measured flows and ensemble flows. As the lags are not consistent, the cross-correlation was adjusted for each case. The RMSE for each ensemble member was calculated for measured flows higher than 0.1 m³/s (Figure 14). Table 2 presents the mean of the RMSE for both STEPS and radar ensembles. Given the fact that there is a significant loss of forecast efficiency after 1 h lead-time, the RMSE is only shown for this period. In most events, the time lags between measured flow and forecasts are less than 15 min.
The flow forecasts for Ilkley were only reliable for lead-times up to 30 min, although in some cases the predictability was maintained up to 1 h ahead, depending upon the nature of the rainfall event.
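The windowed peak comparison can be sketched as below, assuming the ensemble flows and the measured flow share the same time axis and that the window is given as start and stop indices. The function names and the choice of the 25th, 50th and 75th percentiles (as used in a boxplot) are illustrative.

```python
import numpy as np

def peak_in_window(flow, start, stop):
    """Maximum flow between two time indices (stop is exclusive)."""
    return float(np.max(np.asarray(flow, dtype=float)[start:stop]))

def ensemble_peak_summary(ensemble_flows, measured_flow, start, stop):
    """Compare the measured peak with the spread of ensemble peaks."""
    peaks = np.array([peak_in_window(member, start, stop) for member in ensemble_flows])
    q25, q50, q75 = np.percentile(peaks, [25, 50, 75])
    measured_peak = peak_in_window(measured_flow, start, stop)
    return {
        "measured_peak": measured_peak,
        "q25": float(q25),
        "q50": float(q50),
        "q75": float(q75),
        # True when the observation falls inside the interquartile range
        "within_iqr": bool(q25 <= measured_peak <= q75),
    }
```

Comparing peaks rather than time-matched values makes the summary insensitive to timing errors of a few minutes, provided the window is wide enough to contain both peaks.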
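One possible form of the lag correction is sketched below. It assumes the lag is searched over a small range of whole time steps, chosen per ensemble member as the shift that maximises the correlation with the measured flow, and that the RMSE is then computed on the aligned series for measured flows above 0.1 m³/s. The function names and the `max_lag` default are hypothetical, and this is not necessarily the procedure used by the authors.

```python
import numpy as np

def align(measured, forecast, lag):
    """Pair measured[t] with forecast[t + lag]; positive lag means a late forecast."""
    if lag >= 0:
        return measured[: len(measured) - lag], forecast[lag:]
    return measured[-lag:], forecast[:lag]

def best_lag(measured, forecast, max_lag):
    """Lag in time steps that maximises the correlation of the aligned series."""
    best, best_corr = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        m, f = align(measured, forecast, lag)
        if m.std() == 0 or f.std() == 0:
            continue  # correlation undefined for a constant series
        corr = np.corrcoef(m, f)[0, 1]
        if corr > best_corr:
            best, best_corr = lag, corr
    return best

def lag_corrected_rmse(measured, forecast, max_lag=3, threshold=0.1):
    """Return (lag, RMSE) after shifting the forecast by its best lag."""
    measured = np.asarray(measured, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    lag = best_lag(measured, forecast, max_lag)
    m, f = align(measured, forecast, lag)
    mask = m > threshold  # keep wet-period flows only
    if not mask.any():
        return lag, float("nan")
    return lag, float(np.sqrt(np.mean((f[mask] - m[mask]) ** 2)))
```

Applying `lag_corrected_rmse` to each ensemble member and averaging the results would give a mean RMSE per event of the kind reported in Table 2.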

Discussion

Rainfall Forecasting
The tendency observed in this study is that the forecast accuracy decreases with longer lead-time and higher rainfall intensities, as shown in Figure 8. The first forecasted hour shows a high forecast ability; however, with the increase of lead-time the ability is rapidly reduced. The forecast skill also decreases with higher rainfall thresholds, indicating that high rainfall intensities are more challenging to forecast and are subject to more errors. In most cases the ROC curve is below the random forecast line for a threshold of 3 mm/h, demonstrating that in these situations the model fails to forecast higher rainfall intensities efficiently. The ROC curves shown in Figures 4-7 demonstrate how the forecast efficiency declines after the first forecasted hour for a threshold of 3 mm/h. For a rainfall threshold of 0.1 mm/h, all the events for both probabilistic forecasts produce skilful predictions up to 3 h lead-time. With an increased lead-time, however, the forecast ability also decreases.
The events analysed produced good forecasts up to 1 h lead-time for all rainfall thresholds, but overall the radar ensembles performed slightly better than the STEPS ensembles during this period. The STEPS ensembles lose forecasting skill less rapidly and more steadily between the time-steps used when the thresholds are between 0.1 mm/h and 1.0 mm/h. Significant forecast accuracy is lost after the first hour for the radar ensembles. The area beneath the ROC curve is reduced by approximately 20% for rainfall intensities higher than 0.1 mm/h. For a threshold of 3.0 mm/h, the area beneath the ROC curve is reduced by around 24% after the first forecasted hour. For lead-times longer than 1 h, the STEPS ensembles perform better than the radar ensembles in most cases. Since radar errors are the predominant source of uncertainties in the first forecasted hour, an ensemble generator based on the modelling of the radar residual errors is expected to produce more accurate results at the beginning of the forecast [14,50].
For higher rainfall intensities, the decrease in forecasting accuracy can be up to 27% per hour after 1 h lead-time. In most cases with a 3.0 mm/h threshold and lead-time longer than 2 h, the forecast is unable to predict the rainfall intensities accurately at these small spatial scales and presents areas beneath the ROC curve equal to or less than 0.5. The loss of efficiency in the forecast is consistent with other studies [9,56]. Because regions with high-intensity rainfall are smaller, there is a decline in the performance of the ensemble rainfall forecasts for higher rainfall thresholds. The areas beneath the ROC curves for the radar ensembles are on average 10% higher than for the STEPS ensembles up to 30 min lead-time. The difference falls to 6% when the forecasting lead-time is up to 1 h.

Sewer Flow Simulations
For low flow situations, the model can accurately estimate flows using radar and gauge data. However, for flows over 0.5 m³/s, the gauge estimations underestimate the flow peaks in most cases. Results using radar data produce better estimates for higher flows and are able to predict peak intensities more accurately for flows around 0.5 m³/s. For higher intensity flows the model also underestimates the flow in most cases. For low-intensity flows, both the gauge and radar estimations produce accurate results and mimic the flow pattern. Flow simulations with rain gauges have a lower RMSE (0.084 m³/s) than those using radar rainfall (0.096 m³/s). These results are expected given the fact that the rain gauges used to simulate the flow are located within the urban area.

Forecasting Sewer Flows
In the event shown in Figure 11, the radar probabilistic forecast is able to simulate the peaks in the first hour of the forecast and also accurately simulate the second flow peak.The STEPS probabilistic forecasts underestimated both flow peaks.For this event, it is clear that using rain gauge data to produce the radar rainfall ensembles adds valuable information to improve the forecasts.This allows the flow peak and flows patterns to be forecasted better using the radar ensembles.For radar ensembles, the measured flow (0.509 m 3 /s) is very close to the 75th percentile flow (0.418 m 3 /s).For the STEPS ensembles, the measured flow is only captured by the more extreme ensembles, as can be seen on the boxplot.The flow simulations using radar data and the deterministic forecast show similar results to the STEPS ensembles peak flows, both underestimating the flow peaks.The simulation using rain gauge data replicate the first flow peak more accurately but underestimate the second one.The radar ensembles have the advantage of combining information from both the radar and the rain gauges and are able to predict both flow peaks.The second large flow peak occurred around 2 h lead-time, and cannot be forecasted by any of the probabilistic or deterministic forecasts.This indicates that at lead-time longer than 1 h the flow forecast loses its forecasting skill.For this event, the radar ensembles perform better in analysing the RMSE mean.For the first hour, the mean is approximately 15% smaller for radar ensembles when compared to the STEPS ensembles.
The simulated flow peak in Figure 12 is overestimated using radar rainfall.Flow simulations using rain gauge data are much closer to the measured peak flow, but there is a delay in time of a few minutes.In this example, both radar and STEPS forecasts can capture the flow peak, however, the radar ensembles produce a flow forecast with a higher spread than the STEPS forecast.The measured peak flow (0.563 m 3 /s) falls between the 25th and 75th percentile for the radar ensembles (Figure 12b) in the first forecasted hour.In the second hour of the forecast, the STEPS ensembles forecast the flow intensity more accurately than the radar ensembles.The radar ensembles produce a much larger spread and therefore there is an overestimation of the flow by a large part of the ensembles.This leads to an increased RMSE (0.287 m 3 /s) for the first hour of the forecasts and a higher value than using the STEPS ensembles (0.0169 m 3 /s).The STEPS ensembles produce a smaller RMSE mean for the whole duration of the event.
Figure 13 shows that the nowcasting model's efficiency is rapidly lost with increasing lead-time. The figure shows the results of forecasting the same rainfall event at different starting times (30 min apart). The forecasts for both ensembles initiated at 02:30 on 1 August 2008 (Figure 13a) fail to predict the flow peak correctly, and the time lag between the forecasted flow and the measured flow is higher than for shorter lead-times. In the forecasts initiated at 03:00 (Figure 13b), the peak flow falls into the first forecasted hour and the forecast replicates the peak flow better. In this case, both ensembles were able to capture the peak flow; however, only some of the radar ensemble members reproduce it correctly. Because the STEPS ensembles have a higher spread, the peak flow can be forecasted by more ensemble members under these circumstances. The event in Figure 13 has the highest measured flow among all the selected events. Accurately forecasting high rainfall intensities has proved to be more challenging, but both forecasts were able to predict the peaks at a short forecasting lead-time. The RMSE mean for the forecast initiated at 02:30 is higher (0.405 m³/s for radar ensembles and 0.430 m³/s for STEPS ensembles) and, in contrast to the other events, the flows were forecasted with a delay of around 20 min. Starting the forecast 30 min later improved the ability to predict flow peaks, and the RMSE for the radar ensembles was nearly a third of that of the previous forecast. In the event represented in Figure 13, there were no rain gauge data available, so the advantage of using radar rainfall ensembles is more evident. The RMSE indicates that the radar ensembles outperform the STEPS ensembles during the first forecasted hour.
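The RMSE mean used to compare the two ensembles can be illustrated with a minimal sketch. It assumes that "RMSE mean" denotes the average, over ensemble members, of each member's RMSE against the measured flow within the evaluation window (for example, the first forecasted hour). The function name `rmse_mean` and the flow values are hypothetical and are not taken from the study.

```python
import numpy as np

def rmse_mean(ensemble_flows, measured_flow):
    """Average over ensemble members of the per-member RMSE (assumed definition).

    ensemble_flows: array of shape (n_members, n_times), simulated flows in m3/s
    measured_flow:  array of shape (n_times,), observed flows in m3/s
    """
    ens = np.asarray(ensemble_flows, dtype=float)
    obs = np.asarray(measured_flow, dtype=float)
    # RMSE of each member along the time axis
    per_member = np.sqrt(np.mean((ens - obs) ** 2, axis=1))
    return float(per_member.mean())

# Hypothetical flows for a two-member ensemble over four time steps
measured = np.array([0.10, 0.30, 0.50, 0.30])
members = np.array([
    [0.10, 0.30, 0.50, 0.30],   # matches the measurement exactly
    [0.20, 0.40, 0.60, 0.40],   # constant 0.1 m3/s overestimation
])
score = rmse_mean(members, measured)
```

Restricting both arrays to the time steps of the first forecasted hour before calling the function gives the kind of 0-1 h comparison discussed above.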
Across the 13 events, performance at longer forecasting lead-times is case-dependent, but both radar ensembles and STEPS ensembles tend to lose accuracy as the forecasting lead-time increases.
In the majority of the events there is a time lag between the measured and forecasted flows; however, this time lag is not consistent. In some events, the flow peaks are predicted in advance of the actual flow, and in other cases the forecasted flow is delayed compared to the measured flow. The simulated flows for both radar and gauge do not present the same time lag, confirming that these are uncertainties inherited from the rainfall forecast. Because the area studied is a small urban area with both permeable and impermeable surfaces, the catchment response time is very short, and any uncertainties related to the rainfall forecast have an almost immediate effect on the flow forecast. Tests performed with the ensemble forecasts showed that displacing high-intensity rainfall pixels by only a few kilometres could have a great impact on the flow simulation. This highlights the importance of improving the accuracy of rainfall forecasts for applications in urban areas.
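One simple way to quantify the time lag discussed above is to compare the timing of the forecasted and measured flow peaks. The sketch below is only an illustration under stated assumptions: both series share the same regular time step, the helper `peak_lag_minutes` and the 5 min step are hypothetical, and the flow values are invented for the example.

```python
import numpy as np

def peak_lag_minutes(forecast_flow, measured_flow, step_minutes):
    """Lag of the forecasted peak relative to the measured peak.

    Positive values mean the forecasted peak arrives late,
    negative values mean it is predicted in advance.
    """
    forecast = np.asarray(forecast_flow, dtype=float)
    measured = np.asarray(measured_flow, dtype=float)
    # Index of the first maximum in each series
    lag_steps = int(np.argmax(forecast)) - int(np.argmax(measured))
    return lag_steps * step_minutes

# Hypothetical hydrographs sampled every 5 min (assumed time step)
measured = [0.1, 0.2, 0.5, 0.3, 0.2, 0.1]
late_forecast = [0.1, 0.1, 0.2, 0.3, 0.5, 0.2]
early_forecast = [0.2, 0.5, 0.3, 0.2, 0.1, 0.1]

late_lag = peak_lag_minutes(late_forecast, measured, step_minutes=5)
early_lag = peak_lag_minutes(early_forecast, measured, step_minutes=5)
```

Applying the same function to each ensemble member would give a distribution of lags, which makes the inconsistency of the lag between events easier to see.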

Conclusions
This work assessed how radar rainfall uncertainties propagate from the radar rainfall measurements into radar rainfall forecasts, and further on into urban sewer flow forecasting. The work also compared the accuracy of flow forecasts produced using two different rainfall ensemble generators. A stochastic ensemble generator, which adds spatially correlated noise to the deterministic forecast, was used as a reference (STEPS ensembles). An ensemble generator that adds spatially correlated noise based on the residual error between radar rainfall and rain gauge measurements was used to assess the radar rainfall uncertainties (radar ensembles) and how these uncertainties propagate into the simulated sewer flows of an urban area.
Results from the rainfall forecasts show that both ensembles can produce accurate forecasts for lead-times up to 3 h for all rainfall intensities larger than or equal to 0.1 mm/h. With intensities larger than 1 mm/h, the results vary depending on the event. Skilful forecasts could be produced up to 1 h lead-time in most cases, and in some events it was possible to produce a skilful forecast even when the lead-time was 3 h. For a high rainfall threshold (larger than 3 mm/h), reliable forecasts were produced for at least 1 h lead-time. The radar ensembles produced slightly better results than the STEPS ensembles in the first hour of the forecasts. After this, the radar ensembles lost accuracy more rapidly than the STEPS ensembles.
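The threshold-based skill assessment summarised above relies on ROC curves. The following sketch shows, in a hedged way, how ROC points can be obtained from an ensemble for one rainfall threshold: the forecast probability at each pixel is taken as the fraction of members at or above the threshold, and hit rate and false alarm rate are computed for several probability levels. The function `roc_points`, the probability levels and the rainfall values are assumptions for illustration and do not reproduce the verification set-up of the study.

```python
import numpy as np

def roc_points(ensemble, observed, rain_threshold, prob_levels):
    """Return (false alarm rate, hit rate) pairs, one per probability level.

    ensemble: array (n_members, n_points) of forecast rainfall in mm/h
    observed: array (n_points,) of reference rainfall in mm/h
    """
    ens = np.asarray(ensemble, dtype=float)
    obs = np.asarray(observed, dtype=float)
    # Forecast probability = fraction of members reaching the threshold
    prob = np.mean(ens >= rain_threshold, axis=0)
    event = obs >= rain_threshold
    points = []
    for level in prob_levels:
        warn = prob >= level
        hits = np.sum(warn & event)
        misses = np.sum(~warn & event)
        false_alarms = np.sum(warn & ~event)
        correct_negatives = np.sum(~warn & ~event)
        n_events = hits + misses
        n_non_events = false_alarms + correct_negatives
        hit_rate = hits / n_events if n_events else float("nan")
        far = false_alarms / n_non_events if n_non_events else float("nan")
        points.append((float(far), float(hit_rate)))
    return points

# Hypothetical four-member ensemble at four pixels, threshold 1.0 mm/h
ensemble = [[2.0, 0.0, 0.0, 2.0],
            [2.0, 0.0, 2.0, 0.0],
            [2.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0]]
observed = [3.0, 0.0, 0.0, 1.5]
points = roc_points(ensemble, observed, rain_threshold=1.0,
                    prob_levels=[0.25, 0.5])
```

Plotting hit rate against false alarm rate for all probability levels gives the ROC curve; points further above the diagonal indicate more skill at that threshold and lead-time.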
The flow forecasts in the urban area were generated using an Infoworks CS model. The radar ensembles produced better results than the STEPS ensembles in the first hour of the forecasts and were able to better reproduce the flow peaks. The RMSE mean for the radar ensembles was lower than the RMSE mean for the STEPS ensembles for most of the events in the first forecasted hour. This was true even in cases where the flow simulated using the radar rainfall was overestimated or underestimated, so the radar ensembles were able to better reproduce the flow hydrograph. With lead-times longer than 1 h, all the forecasts lose predictability regardless of which ensemble generator is used.
The radar ensembles produced reproducible forecasts with improved accuracy at predicting the flow peaks when compared to the STEPS ensembles. However, there were events where neither of the ensembles could adequately forecast the flow peaks. In general, this happened in cases where both radar and rain gauge simulated flows were very different from the observed flow.
The results show a time lag of some minutes between the measured and forecasted peak flows. However, even with this limitation, it is possible to improve the forecast of peak flows in urban areas using the method proposed in this study, and it can be used in real time to enhance existing warning systems in urban areas with up to one hour lead-time. The nowcast skill can potentially be improved by blending radar nowcasts with NWP forecasts, especially as the forecasting lead-time increases. With up to one hour lead-time, the nowcast has a major impact on the forecast, and this improvement enhances the flow forecasts during this period. Future work can incorporate uncertainties caused by growth and decay of precipitation using, for instance, the method described by Foresti et al. [57].
Author Contributions: M.C. conceived the methodology, performed the calculations and wrote the original draft. M.A.R.-R. provided supervision and guidance in every aspect of the research and participated in the editing and review of the paper.

Funding: The first author acknowledges the funding from the Science without Borders Programme and the CAPES Foundation to carry out this project.

Figure 1. Location of the urban catchment (inside the orange circle), radar locations (red squares) and rain gauge positions (blue circles) in the study region.

Figure 7. ROC curves for the event on 14 October 2008 starting at 15:00, for radar ensembles with rainfall thresholds (th) equal to 0.1 mm/h (a), 1.0 mm/h (b) and 3.0 mm/h (c), and for STEPS ensembles with thresholds equal to 0.1 mm/h (d), 1.0 mm/h (e) and 3.0 mm/h (f).

Figure 14. RMSE for radar and STEPS ensembles for 0-1 h after the start of the forecast.

Table 1. Event start dates, duration, peak measured flow, maximum average rainfall and storm types (S: stratiform; C: convective) for the Ilkley urban catchment.

Table 2. RMSE mean for radar and STEPS ensembles for 0-1 h after the start of the forecast.