Evaluation of Radar-Gauge Merging Techniques to Be Used in Operational Flood Forecasting in Urban Watersheds

Demand for radar Quantitative Precipitation Estimates (QPEs) as precipitation forcing to hydrological models in operational flood forecasting has increased in the recent past. It is practically impossible to get error-free QPEs due to the intrinsic limitations of weather radar as a precipitation measurement tool. Adjusting radar QPEs with gauge observations, which combines the advantages of both while minimizing their weaknesses, increases the accuracy and reliability of radar QPEs. This study deploys several techniques to merge two dual-polarized King City radar (WKR) C-band and two KBUF Next-Generation Radar (NEXRAD) S-band operational radar QPEs with rain gauge data for the Humber River (semi-urban) and Don River (urban) watersheds in Ontario, Canada. The relative performances are assessed against an independent gauge network by comparing hourly rainfall events. The Cumulative Distribution Function Matching (CDFM) method performed best, followed by Kriging with Radar-based Error correction (KRE). Although both WKR and NEXRAD radar QPEs improved significantly, NEXRAD Level III Digital Precipitation Array (DPA) provided the best results. All methods performed better for low- to medium-intensity precipitation but deteriorated with increasing rainfall intensity. All methods outperformed radar-only QPEs for all events, but the agreement is best in the summer.


Introduction
Flooding has been identified as the world's deadliest natural disaster after earthquakes and tsunamis due to the associated damage to life, property, economy, and infrastructure [1,2]. Therefore, flood mitigation procedures are essential for regions susceptible to flooding. A well-developed flood forecasting system that can deliver accurate and reliable forecasts with proper lead time is a vital part of nonstructural flood management. At present, flood forecasting and warning utilizing hydrological models to forecast river flow have a widespread application in disaster management [3,4]. In urban watersheds, the frequency and magnitude of flooding are mostly influenced by precipitation [5,6]. Therefore, the accuracy and reliability of hydrological model predictions depend heavily on the accuracy of the forcing data, especially the Quantitative Precipitation Estimates (QPEs) [7,8]. Additionally, accurate QPEs produce model parameter sets that represent watershed characteristics after hydrological radar QPEs by reducing the distance to the radar stations [51]. However, both methods mentioned above require significant investment.
Besides the methods mentioned above, a significant improvement has been made by adjusting radar QPEs with gauge observations (hereafter radar-gauge merging). The radar-gauge merging produces accurate, reliable, real-time QPEs that capture variations both in space and time. Radar-gauge merging to adjust radar QPEs to match with gauge observations is broadly discussed in the literature [52][53][54]. Previous research has shown that the radar-gauge merging is useful to improve radar QPEs that can be used as an additional source for hydrological applications [23,52,54,55]. Even though radar-gauge merging has been used in improving radar QPEs since the start of the operational use of weather radars in the 1970s, Ochoa-Rodriguez et al. [54] emphasize the lack of understanding of the full potential of application at the spatial-temporal resolutions required for urban hydrology. High-quality precipitation estimates with high temporal resolution are needed for urban hydrological applications due to small-sized urban catchments and a high degree of imperviousness in response to fast rainfall [36,54]. Hence, urban watersheds are very sensitive to the spatial and temporal variability of rainfall, and further research is essential before using radar QPEs in operational urban hydrology with confidence.
Even though studies on radar-gauge merging are common, to date, they have not adequately addressed applicability at the resolution and scale required for urban hydrological applications. Most of the studies have focused on large-scale applications, frequently country-wide, using a time step above 24 h or event-based accumulations [22,51,52,56–65]. Urban hydrological studies need reliable QPEs at high spatial (a few km) and temporal (hourly or sub-hourly) resolutions [53,66]. A few studies have investigated radar-gauge merging techniques focusing on the urban scale but were limited to a single technique under limited climatological and infrastructure conditions [23,67–69]. To date, only two inter-comparison studies have been conducted focusing on urban scales [68,70]. The first study evaluated radar-gauge merging techniques based on a rain gauge network of three gauges [69]. The latter research provides an introductory understanding of the relative performances of merging methods [70]. Therefore, further research on smaller-scale urban catchments is essential [53,54]. The most recent categorization of the existing radar-gauge merging methods was introduced by Ochoa-Rodriguez et al. [54] based on the potential for urban hydrological application: (1) radar bias adjustment methods, (2) rain gauge interpolation methods using spatial radar association as additional information, and (3) radar-rain gauge integration methods. Only two preliminary studies have investigated the three categories of merging methods defined by Ochoa-Rodriguez et al. [54,68,71]. Therefore, when applying radar-gauge merging methods, it is beneficial to consider methods that preserve small-scale features in the merged estimates at fine scale, the limited availability of rain gauges, and the computational requirements for operational use. Furthermore, radar-gauge merging applications using modern dual-polarization radars must be evaluated.
Most of the previous studies have evaluated merging techniques using single polarized radar [51,53]. As mentioned before, the dual-polarized radars have the potential to produce more accurate and reliable radar QPE compared to single polarized radar [33,50]. As emphasized by Ochoa-Rodriguez et al. [68], the quality of initial radar products affects the relative performance of merging methods and, ultimately, the quality of merged radar precipitation estimates. Therefore, it is vital to investigate how these new dual-polarized products may impact the performance of different radar-gauge merging techniques.
In this study, different existing radar-gauge merging techniques for operational use in urban watersheds have been implemented to obtain the best estimation of precipitation in two watersheds in the Greater Toronto Area (GTA), Canada: the semi-urban Humber River and the urban Don River watersheds. The operational application of radar-gauge merging in flood forecasting is not yet implemented widely in Canada [51]. This is important because a recent study conducted in the GTA reported a rainfall estimate of about 94.0 mm with the R(K_DP) algorithm using the dual-polarized King City radar (WKR) for a heavy rainfall event over a 2-h period on 8 July 2013, whereas the rain gauge recorded 126 mm over the same period [72]. In the same study, an accumulation of 109.2 mm of rain is reported for the same storm event with the R(Z) algorithm using the KBUF NEXRAD S-band radar [72]. The authors suggested the differences are caused by ground clutter contamination, path attenuation, and radome wetting. To use radar QPEs as additional precipitation forcing for hydrological models in the GTA, the reported differences must be minimized. In this study, two dual-polarized WKR C-band radar hourly QPEs and two KBUF NEXRAD S-band radar hourly QPEs have been adjusted using radar-gauge merging techniques to minimize the difference with gauge measurements. Moreover, several radar-gauge merging techniques with varying degrees of complexity have been evaluated to assess which method best suits these two urban and semi-urban watersheds in the GTA. The merging techniques implemented include Mean Field Bias correction (MFB), Frequency and Intensity Correction (FIC), Local Intensity Scaling (LOCI), Cumulative Distribution Function Matching (CDFM), Range-Dependent bias Adjustment (RDA), Modified Brandes Spatial Adjustment (MBSA), and Kriging with Radar-based Error correction (KRE).
The MFB, RDA, MBSA, and KRE methods are often used in the literature for radar-gauge merging; however, the CDFM, FIC, and LOCI methods are seldom discussed in the literature as radar-gauge merging techniques. In addition to the radar-gauge merging techniques, Ordinary Kriging (OK) was included in the evaluation as a benchmark since it is a widely used interpolation method for hydrological applications and is often used as a reference.
This work aims to evaluate radar-gauge merging techniques using two dual-polarized WKR C-band radar and two KBUF NEXRAD S-band operational radar hourly QPEs (the first attempt to the best of our knowledge) to assess which best suits the two urban and semi-urban watersheds in the GTA, Canada. The objectives of this study are (1) to evaluate different existing radar-gauge merging techniques of various degrees of complexity to facilitate hydrological model runs for operational flood forecasting in urban watersheds; (2) to verify the reliability and accuracy of dual-polarized radar QPEs from WKR C-band and KBUF NEXRAD S-band QPEs as an additional data source for hydrological model calibration; (3) to assess the relative strength of radar-gauge merging techniques at different rain intensities; and (4) to illustrate the performance of radar-gauge merging techniques for different rainfall events. Additionally, Section 2 provides a detailed description of the study area. Characteristics of the radar and the rain gauge network and detailed descriptions of different methods used for merging can be found in Section 3. Section 4 presents the results and discussion of the evaluation of radar-gauge methods. Section 5 draws general conclusions.

Study Area
The Humber River watershed and Don River watershed, located in the GTA, Ontario, Canada, are the two watersheds of interest in this study (Figure 1). Toronto and Region Conservation Authority (TRCA) currently manages both watersheds. The watersheds form part of the Great Lakes basin and have good coverage from the King City Canadian WKR C-band radar and the Buffalo KBUF NEXRAD USA S-band radar. The radar-gauge merging methods are applied on a square domain containing both watersheds.
The Humber River watershed is the largest semi-urban watershed in the GTA and is home to over 800,000 people. The watershed covers an area of 911 km² and consists of approximately 54% rural land, 33% urban land, and 13% urbanizing land [73]. For further details on the Humber River watershed, readers are referred to the TRCA Watershed Features Humber River website [73].
The other watershed in this study is the Don River watershed, which is nearly 350 km² in size and home to over a million residents. It is a fully urbanized watershed, with approximately 80% developed areas [74]. Further details of the Don River watershed can be found on the TRCA Don River website [74].

Rain Gauge Data
Hourly rainfall accumulations provided by tipping-bucket rain gauges from 2012 to 2017 were gathered from TRCA data archives. TRCA operates a dense gauge network (one gauge per ∼75 km² in the Humber River and one gauge per ∼116 km² in the Don River) with 18 precipitation monitoring locations across the two watersheds. All TRCA gauge stations are located within a ∼40 km radius of the WKR Canadian C-band radar station and within the usable range (<180 km) of the KBUF NEXRAD S-band radar (Figure 1). [75]. Reflectivity data for this study were collected from the King City weather radar facility located north of Toronto, Ontario (Figure 1). The WKR radar performs dual-polarization POLarimetric Plan Position Indicator (POLPPI) scans at 0.5-degree elevation, completed in about 1 min, every 10 min with 0.25 km range and 0.5-degree azimuth resolution [33]. Reflectivity values were corrected for attenuation using the modified ZPHI algorithm [33,76]. Radar QPEs were estimated using different rain rate estimator algorithms, R(Z), R(Z, K_DP), and R(K_DP, Z_DR), listed in Table 1. In this study, the average over 11 × 11 radar pixels around the nearest gauge location was used to limit the effect of wind drift [77]. Considering the resolution of the radar bins, the final radar QPE grid size is about 3 km in range (11 × 0.25 km) by 4 km in azimuth (11 × 0.5° = 5.5°, which spans roughly 4 km at a distance of 40 km), with each grid cell containing a 10-min rainfall accumulation. Radar-estimated rain rates are not interpolated in time and are assumed to be constant over the scanning interval of 10 min. To be consistent with the gauge data, the 10-min rainfall was integrated in time to derive 1-h rainfall accumulations.
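The grid-size arithmetic and the 10-min to hourly accumulation described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the sample rain rates are hypothetical, and each rain rate is held constant over its 10-min scan interval, as stated in the text.

```python
import math

def averaged_footprint_km(n_bins=11, range_res_km=0.25, az_res_deg=0.5,
                          distance_km=40.0):
    """Approximate footprint of an n_bins x n_bins block of polar radar bins."""
    radial_km = n_bins * range_res_km
    # Arc length = radius * angle (angle in radians).
    arc_km = distance_km * math.radians(n_bins * az_res_deg)
    return radial_km, arc_km

def hourly_accumulation_mm(rates_mm_per_h, scan_minutes=10):
    """Accumulate rain rates (mm/h), each held constant for scan_minutes."""
    return sum(rate * scan_minutes / 60.0 for rate in rates_mm_per_h)

radial, arc = averaged_footprint_km()
print(f"footprint: {radial:.2f} km x {arc:.2f} km")

six_scans = [0.0, 3.0, 12.0, 6.0, 3.0, 0.0]  # hypothetical rates, mm/h
hourly_total = hourly_accumulation_mm(six_scans)
print(f"hourly total: {hourly_total:.1f} mm")
```

With the default values, the block of bins spans 2.75 km in range and a little under 4 km along the arc, which is consistent with the approximately 3 × 4 km grid quoted in the text.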

KBUF NEXRAD S-Band Dual-Polarized Radar QPEs
NEXRAD Level III S-band radar QPE data from the Buffalo radar station (Latitude 42.94639, Longitude −78.72278) were downloaded from the National Centers for Environmental Information (NCEI) archives. The KBUF radar collects reflectivity values about every 6 min at 0.25 km in range and 0.5 degree in azimuth. Afterward, rain rates are calculated using the Precipitation Processing System (PPS) algorithm R = 0.017 Z^0.714. Two NEXRAD Level III one-hour precipitation products, Digital Precipitation Array (DPA) and one-hour precipitation (OHA), were used for the study (Table 1). The operational NEXRAD QPEs have a temporal resolution of 60 min and a spatial resolution of 4 × 4 km [78].
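As a small illustration of the PPS relationship R = 0.017 Z^0.714 quoted above, the sketch below converts reflectivity to rain rate. It assumes the input is given in dBZ and converted to linear Z (mm^6 m^-3) first; the function name is hypothetical, and none of the operational PPS processing steps are represented.

```python
def rain_rate_from_dbz(dbz, a=0.017, b=0.714):
    """Rain rate R (mm/h) from reflectivity in dBZ using R = a * Z**b."""
    z_linear = 10.0 ** (dbz / 10.0)  # dBZ -> linear Z (mm^6 m^-3)
    return a * z_linear ** b

for dbz in (20, 30, 40, 50):
    print(dbz, "dBZ ->", round(rain_rate_from_dbz(dbz), 2), "mm/h")
```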

Radar-Gauge Merging Methods
The following radar-gauge merging techniques were carefully chosen for this study based on extensive operational use in urban areas, the ability to be implemented, prominence in past literature, varying degrees of complexity, and several location-specific factors such as gauge density, proximity to the radar station, basin size, and time step of adjustment [51,80].

Mean Field Bias Correction (MFB)
This method removes the bias introduced through the uncertainty in the radar calibration or an erroneous coefficient in the Z-R relationship [81]. MFB assumes that the radar QPEs are affected by a uniform multiplicative error. Therefore, a single adjustment factor (C_mfb) is estimated (Equation (1) [82]) and applied to the entire radar field [82]. An alternative adjustment that depends on the mean assessment factor (C_maf) (Equation (2) [82]) is also implemented. In both equations, C_mfb and C_maf are the correction factors, and G_i and R_i are the gauge value and the corresponding radar value associated with gauge i.
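The two adjustment factors can be sketched as follows. Equations (1) and (2) are not reproduced in this text, so the forms used here are assumptions based on the common definitions: C_mfb as the ratio of summed gauge to summed radar rainfall, and C_maf as the mean of the individual gauge/radar ratios. The function names and sample values are hypothetical.

```python
def mfb_factor(gauge, radar):
    """Assumed C_mfb: summed gauge rainfall divided by summed radar rainfall."""
    return sum(gauge) / sum(radar)

def maf_factor(gauge, radar):
    """Assumed C_maf: mean of the gauge/radar ratios (pairs with radar > 0)."""
    ratios = [g / r for g, r in zip(gauge, radar) if r > 0]
    return sum(ratios) / len(ratios)

gauge = [2.0, 4.0, 6.0]   # hypothetical hourly gauge totals, mm
radar = [1.0, 4.0, 3.0]   # radar values at the same gauges, mm

c_mfb = mfb_factor(gauge, radar)
c_maf = maf_factor(gauge, radar)

# The single factor is applied uniformly to every radar value in the field.
adjusted = [c_mfb * r for r in radar]
print(c_mfb, c_maf, adjusted)
```

Under these assumed forms, C_mfb forces the adjusted radar total to equal the gauge total, whereas C_maf gives each gauge equal weight regardless of rainfall amount, so the two factors can differ noticeably when a few gauges dominate the rainfall.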

Frequency and Intensity Correction (FIC)
This method maps the distribution of the radar rainfall onto that of the gauge rainfall at a given location [83]. FIC first truncates the radar rainfall distribution at a point that approximately replicates the long-term observed relative frequency of rainfall. Then the truncated radar rainfall intensity distribution is mapped onto a gamma distribution fitted to the observed gauge intensity distribution. The frequency of the hourly radar rainfall is corrected by fitting a threshold value x_R to truncate the empirical distribution of the radar rainfall data. The radar rainfall threshold x_R is calculated (Equation (3) [83]) using the empirical cumulative distributions of both gauge and radar rainfall. In this method, 0.1 mm was used as the minimum observed precipitation amount x for an hour to be considered wet (hours with precipitation ≥ threshold) based on the literature [83]. The corrected radar rainfall x^Cor_i on hour i is calculated as in Equation (4) [83] by mapping the truncated radar rainfall intensity distribution F_R(x) (e.g., a fitted gamma or empirical distribution) to a gamma distribution fitted to the observed intensity distribution F_G(x). In Equations (3) and (4), F(·) and F^−1(·) represent a cumulative distribution function (CDF) and its inverse, the subscript R indicates radar rainfall, and the subscript G indicates observed gauge rainfall. This study considered both the gamma (GG) and the empirical (EG) distribution for the truncated radar data. Precipitation at a location without a coincident gauge record was obtained by correction from the precipitation values from the WKR C-band grid point closest to the gauge station. For NEXRAD radar, it was obtained from the closest radar pixel to the prediction grid cell with coincidental gauge records.
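The frequency-correction step can be illustrated with empirical distributions alone. The sketch below is an assumption-laden illustration, not the study's implementation: it takes the radar threshold as the radar value at the same empirical non-exceedance rank as the 0.1 mm gauge threshold, and it omits the gamma-based intensity mapping of Equation (4). Function names and data are hypothetical.

```python
def wet_hour_threshold(gauge, radar, gauge_wet_min=0.1):
    """Radar threshold whose exceedance count matches the gauge wet-hour count."""
    dry_fraction = sum(1 for g in gauge if g < gauge_wet_min) / len(gauge)
    ranked = sorted(radar)
    n_dry = round(dry_fraction * len(ranked))
    if n_dry >= len(ranked):
        return float("inf")  # the gauge record contains no wet hours
    return ranked[n_dry]     # smallest radar value still counted as wet

def truncate(radar, threshold):
    """Set radar hours below the threshold to zero (treated as dry)."""
    return [r if r >= threshold else 0.0 for r in radar]

gauge = [0.0, 0.0, 0.0, 0.05, 0.3, 1.2, 2.5, 0.0, 4.0, 0.6]   # mm, hypothetical
radar = [0.1, 0.2, 0.0, 0.3, 0.4, 1.0, 3.0, 0.15, 5.0, 0.8]   # mm, hypothetical

x_r = wet_hour_threshold(gauge, radar)
truncated = truncate(radar, x_r)
print(x_r, truncated)
```

In this example the radar reports light rain in hours when the gauge is dry, so the threshold rises above 0.1 mm until both records contain the same number of wet hours.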

Local Intensity Scaling (LOCI)
The LOCI method corrects the wet time-step frequency and then the intensity [84]. First, a model wet-hour threshold x_{R,h} is determined from the hourly radar rainfall for an hour h to ensure that the threshold exceedance matches the wet-hour frequency in the gauge observations. A correction factor x^Cor_{R,h,i} is then calculated using Equation (5) [84], in which µ is the mean intensity (mm wh^−1); "wh" denotes a wet hour, i.e., an hour with precipitation greater than or equal to the threshold.

Cumulative Distribution Function Matching (CDFM)
The CDFM method uses a sorting algorithm to match the CDF of the observed data (radar rainfall) to that of a reference data set (gauge) [85]. CDFM matches the CDF of the radar rainfall (cdf_R), based on polynomial fitting, with the CDF of the historical gauge data (cdf_G). The radar-estimated QPEs are re-scaled so that the empirical CDFs of the radar and gauge data sets match (Equation (6) [85]). A third-degree polynomial model is employed for this purpose (Equation (7) [85]). In Equation (6), the two variables are the gauge data and the transformed radar rainfall data, respectively. In Equation (7), BR is the bias-corrected radar rainfall, P indicates the coefficients of the polynomial model, and S is the raw radar rainfall.
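A minimal sketch of CDF matching with a third-degree polynomial follows. It assumes the polynomial maps ranked radar values directly onto ranked gauge values (BR = P0 + P1·S + P2·S² + P3·S³) and that both samples have equal length; the exact formulation of Equations (6) and (7) may differ. The helper `polyfit` is a plain least-squares solver written for this illustration, and all names and data are hypothetical.

```python
def polyfit(x, y, degree):
    """Least-squares polynomial coefficients [P0, P1, ...] via normal equations."""
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    m = [[sum(xi ** (i + j) for xi in x) for j in range(n)]
         + [sum(yi * xi ** i for xi, yi in zip(x, y))]
         for i in range(n)]
    for col in range(n):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                f = m[row][col] / m[col][col]
                m[row] = [a - f * b for a, b in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_cdfm(radar, gauge, degree=3):
    """Fit a polynomial mapping ranked radar values onto ranked gauge values."""
    return polyfit(sorted(radar), sorted(gauge), degree)

def apply_cdfm(coeffs, s):
    """Bias-corrected rainfall for raw radar value s, floored at zero."""
    return max(0.0, sum(p * s ** k for k, p in enumerate(coeffs)))

radar = [0.5, 4.0, 1.0, 6.0, 2.0, 3.0, 1.5, 5.0]     # hypothetical hourly values, mm
gauge = [2.0, 1.0, 12.0, 4.0, 8.0, 3.0, 6.0, 10.0]   # order is irrelevant: both are ranked

coeffs = fit_cdfm(radar, gauge)
corrected = apply_cdfm(coeffs, 2.5)
print(coeffs, corrected)
```

Because only the ranked values enter the fit, the method corrects the distribution of radar rainfall rather than individual radar-gauge pairs, which is why any random component of the radar error remains after matching.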

Range-Dependent Bias Adjustment (RDA)
This method is based on the Baltic Sea Experiment (BALTEX) adjustment proposed by Michelson et al. [86] and assumes that radar biases are a function of the distance from the radar tower [87]. Range dependencies are due to beam broadening, overshooting of the beam, increasing height of the measurements, and attenuation effects [88]. The relationship between the R/G ratio and the distance from the radar station is expressed in the log-scale. The range dependence is approximated by a second-order polynomial whose coefficients are determined from observations using a least-squares fit (Equation (8) [53]). In the resulting range-dependent multiplicative factor, r is the distance from the radar tower to the radar bin, and a, b, and c are the coefficients obtained from the least-squares fit.
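The range-dependent fit can be sketched as below. Equation (8) is not reproduced in this text, so the form is an assumption: the code fits log10(G/R) = a·r² + b·r + c at the gauge sites, so that the resulting factor multiplies the radar value (fitting R/G instead would simply invert the factor). The helper `polyfit`, the function names, and the synthetic data are all hypothetical.

```python
import math

def polyfit(x, y, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via normal equations."""
    n = degree + 1
    m = [[sum(xi ** (i + j) for xi in x) for j in range(n)]
         + [sum(yi * xi ** i for xi, yi in zip(x, y))]
         for i in range(n)]
    for col in range(n):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                f = m[row][col] / m[col][col]
                m[row] = [p - f * q for p, q in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_rda(ranges_km, gauge, radar):
    """Fit log10(G/R) = a*r^2 + b*r + c using pairs where both values are > 0."""
    pairs = [(r, math.log10(g / q))
             for r, g, q in zip(ranges_km, gauge, radar) if g > 0 and q > 0]
    c, b, a = polyfit([p[0] for p in pairs], [p[1] for p in pairs], 2)
    return a, b, c

def rda_factor(r_km, a, b, c):
    """Multiplicative correction factor at distance r_km from the radar."""
    return 10.0 ** (a * r_km ** 2 + b * r_km + c)

# Synthetic check: build gauge values from a known range dependence.
true_a, true_b, true_c = 0.0002, -0.004, 0.05
ranges_km = [5.0, 10.0, 20.0, 30.0, 40.0, 50.0]
radar = [2.0] * len(ranges_km)
gauge = [q * rda_factor(r, true_a, true_b, true_c) for r, q in zip(ranges_km, radar)]

a, b, c = fit_rda(ranges_km, gauge, radar)
factor_25km = rda_factor(25.0, a, b, c)
print(a, b, c, factor_25km)
```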

Modified Brandes Spatial Adjustment (MBSA)
In contrast to MFB, the MBSA assumes that the biases are spatially dependent [57]. The MBSA distributes correction factors across the radar field. A distance-weighting scheme with a smoothing factor is used to determine the influence of a known data point on the interpolated value of a specific radar bin. The correction factors (C_i) are calculated at each gauge location at a set time step (e.g., hourly) (Equation (9) [53]). Then, weights (WT) for each radar bin i from each gauge location are determined following the Barnes objective analysis to produce the calibration field [89]. A negative exponential weighting (Equation (10) [53]) is used to calculate the weights. All correction factors are then interpolated across the entire radar field using two passes (F_1: Equation (11) [53] and F_2: Equation (12) [53]). Finally, the spatially interpolated correction factors at each radar bin are multiplied by the radar-estimated rainfall (Equation (13) [53]) to get the bias-corrected radar rainfall.
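The two-pass interpolation can be sketched as follows. Equations (9) to (13) are not reproduced in this text, so the forms are assumptions in the spirit of a Barnes analysis: C_i = G_i/R_i, weights WT = exp(−d²/k), a first pass F_1 as the weighted mean of C_i, and a second pass F_2 that adds the weighted mean of the residuals C_i − F_1 using a smaller smoothing parameter. The smoothing values k1 and k2, the coordinates, and all names are hypothetical.

```python
import math

def barnes_weights(point, gauges_xy, k):
    """Negative-exponential weights exp(-d^2 / k) from each gauge to a point."""
    return [math.exp(-((point[0] - gx) ** 2 + (point[1] - gy) ** 2) / k)
            for gx, gy in gauges_xy]

def weighted_mean(weights, values):
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def mbsa_factors(bins_xy, gauges_xy, gauge_mm, radar_at_gauge_mm,
                 k1=100.0, k2=30.0):
    """Two-pass interpolation of gauge/radar correction factors to radar bins."""
    c = [g / r for g, r in zip(gauge_mm, radar_at_gauge_mm)]  # requires radar > 0
    # First pass evaluated at the gauges gives the residuals for the second pass.
    f1_at_gauges = [weighted_mean(barnes_weights(p, gauges_xy, k1), c)
                    for p in gauges_xy]
    residual = [ci - fi for ci, fi in zip(c, f1_at_gauges)]
    factors = []
    for p in bins_xy:
        f1 = weighted_mean(barnes_weights(p, gauges_xy, k1), c)
        f2 = f1 + weighted_mean(barnes_weights(p, gauges_xy, k2), residual)
        factors.append(f2)
    return factors

gauges_xy = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]   # km, hypothetical
bins_xy = [(5.0, 5.0), (2.0, 8.0)]                   # radar bin centres, km
radar_bins_mm = [3.0, 1.0]

# With identical gauge/radar ratios, the interpolated factor is uniform.
factors = mbsa_factors(bins_xy, gauges_xy, [2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
corrected = [f * r for f, r in zip(factors, radar_bins_mm)]
print(factors, corrected)
```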

Kriging
The Kriging methods determine the precipitation value at a grid point/pixel at a non-gauged location by using gauge measurements at neighboring locations [61]. In this study, two kriging methods have been implemented: Ordinary Kriging (OK) and Kriging with Radar-based Error correction (KRE). The Ordinary Kriging method interpolates the precipitation from gauge observations at several locations [90]. The KRE method uses the radar field to estimate the error of the OK that is created using gauge data [91].

Ordinary Kriging (OK)
The OK defines a variogram symbolizing the spatial variability of the observed precipitation field. First, a parametric variogram, γ(h), is generated using the gauge measurements. Weighted contributions from surrounding gauges are then used to calculate rainfall values at unknown points (Equations (14) and (15) [61]). After that, the OK system is produced by minimizing the estimation variance using a Lagrange multiplier µ_1 (Equation (16) [61]). Finally, merged rainfall values at x_0 are determined using the n values obtained by solving a matrix system (Equation (17) [61]) that formalizes the above conditions, including the constraint that the Kriging weights sum to one.

Kriging with Radar-Based Error Correction (KRE)
This method attempts to diminish the bias while minimizing the variance of the error [53]. The KRE method combines radar and gauges by fitting gauge data into the observed precipitation field R(s) based on the radar data [91]. First, gauge data are Kriged using OK to create a gauge Kriging field G_K(s) to obtain the best linear unbiased rainfall estimates. Then, the radar-based Kriging precipitation field R_K(s) is generated with the same variogram using radar QPEs at the corresponding gauge stations. This process produces an interpolated rainfall field that retains the mean field of the original radar data. After that, the deviation between the observed and interpolated radar values, ε_R(s), is calculated at each grid point (Equation (18) [91]). Finally, the deviation field ε_R(s) is applied to the Kriged gauge field G_K(s) to obtain the merged rainfall field (Equation (19) [91]) that preserves the mean-field deviation and the spatial structure of the radar rainfall.
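The final combination step of KRE can be sketched as below, assuming the two Kriged fields have already been computed elsewhere. The code takes the deviation as ε_R(s) = R(s) − R_K(s) and adds it to G_K(s); flooring negative results at zero is an added assumption, not something stated in the text. The function name and values are hypothetical, and the Kriging itself is not implemented here.

```python
def kre_merge(radar, radar_kriged, gauge_kriged):
    """Merged field = Kriged gauge field + (radar - Kriged radar), floored at 0."""
    merged = []
    for r, rk, gk in zip(radar, radar_kriged, gauge_kriged):
        eps = r - rk                       # radar deviation at this grid point
        merged.append(max(0.0, gk + eps))  # floor at zero: assumption
    return merged

radar = [1.0, 4.0, 0.5]          # original radar field, mm (hypothetical)
radar_kriged = [2.0, 2.5, 2.0]   # radar values at gauges, Kriged to the grid
gauge_kriged = [3.0, 3.5, 1.0]   # gauge values Kriged to the grid

merged = kre_merge(radar, radar_kriged, gauge_kriged)
print(merged)
```

The Kriged gauge field supplies the rainfall level, while the radar deviation restores the small-scale spatial structure that interpolation between gauges smooths out.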

Evaluation of Radar-Gauge Merging Techniques
Following Koistinen and Puhakka [92], several assumptions were made before conducting radar-gauge merging. Firstly, rain gauge data were used as a ground reference, assuming they are accurate for each respective gauge location. However, gauge data can be unreliable due to human errors, irregularities of topography, wind-induced under-catch, wetting, and evaporation losses [93]. Secondly, it was assumed that there is no spatial mismatch between radar and gauge measurements and that both are valid for the same location in time and space. However, there is always a spatial mismatch because point rain gauges and spatially integrated weather radar sample different volumes at different heights [94]. While radar samples a volume above a surface of approximately 4 × 4 km for NEXRAD radar and 3 × 4 km for WKR radar, rain gauges measure precipitation over an eight-inch diameter surface area, which contributes to differences in measured precipitation. Thirdly, the radar was assumed to capture the relative spatial and temporal variabilities of precipitation successfully. Even if attenuation and ground clutter are addressed during signal processing, other limitations such as anomalous propagation, radar calibration errors, wind effects, growth of precipitation, variations in the Z-R relationship, and the presence of hail or other hydrometeors can affect the radar precipitation [95]. Fourthly, the assumption of a constant rain rate over the scans leads to a temporal sampling error [69,96]. Finally, the relations derived from the comparison between gauges and radar were assumed to be valid for other locations in time and space.
Performance of nine radar-gauge merging techniques was investigated by using 18 rainfall events, totaling 278 h (i.e., all events) of rainfall that occurred in spring, summer, and fall periods from 2012 to 2017 (Table 2). Since the Z-R relationships for WKR C-band radar QPEs are only valid for liquid rainfall and do not account for low melting layers and possible bright band contamination, winter precipitation was excluded from the analysis [33]. Events were determined as the time where at least half of the gauges (9 out of 18) recorded a precipitation amount > 0 mm to the time where half of the gauges start re-recording zero. Additionally, intensity, availability of both radar and gauge precipitation, data continuity (limited number of missing values), reasonable accumulation of rainfall, and coverage of the watershed were considered during event selection. The methods were evaluated by comparing the merged radar QPEs to the gauge measurements and radar only QPEs (hereafter referred to as RO) to determine what method generates the best precipitation estimates. First, radar-gauge merging was applied for all events (278 h) at the same time to evaluate the performance of each technique for each radar QPE for an hourly time-step of accumulation. Then radar-gauge merging was separately applied for the events to represent the performance by the event. Missing gauge values were ignored during the analysis. The merged WKR QPE at the nearest gauge point or the combined NEXRAD radar pixel where the ground observation is collected was compared with the reference gauge points. As commonly practiced in hydrological studies, two-thirds of the gauges (12) were assigned for merging method implementing purpose (denoted by triangles in Figure 1), and the remaining six gauge stations (denoted by squares in Figure 1) were used for validation as reference gauges [61]. 
To get the maximum coverage within the watershed, stations that are located a minimum of 10 km apart from each other were selected for the verification. To assess the added value of the radar-gauge merging compared to the spatial distribution of the rain gauge data alone, the OK method was tested as a benchmark. In addition, the performance of each merging method for rainfall intensity thresholds was assessed to determine the effect of rainfall intensity on the merging methods.
The merging techniques were applied to hourly precipitation amounts to keep the spatial and temporal advantages accessible by radar as the error due to spatial and temporal variations in gauge estimates are averaged out for more extended time steps, especially at 24 h or above or event-based accumulation times. The selection of appropriate space and time scales for radar-gauge merging to remove systematic bias in urban and semi-urban watersheds is always a compromise. It heavily depends on the specific area of interest. Even though sub-hourly data provides more reliable results during radar-gauge merging for urban watersheds, hourly precipitation was used in this study due to several reasons. Firstly, high temporal resolution data limitation is a challenge in these two watersheds. Only hourly precipitation gauge data is available from most of the TRCA gauge stations for the study period from 2012 to 2017. Additionally, available NEXRAD radar data for the study area are in hourly temporal resolution. Secondly, sub-hourly time steps can result in random errors [97]. Thirdly, adjusting radar QPEs in long-term accumulations to be applied in short term hydrological applications is often used in literature and proven to be effective [97][98][99][100]. Fourthly, the hydrological models have been improved recently; however, many of them are not yet ready to use high temporal resolution (sub-hourly) inputs because the output is always for an hour or more than that [5]. Finally, hourly streamflow simulations using hydrological models could adequately capture peak flows at the two watersheds because the calculated time of concentration of both basins is higher than one hour [101]. In terms of operational flood forecasting, streamflow and possible inundated areas along a river must be sent to the authorities and public as early as possible. 
High-resolution radar QPEs are costly and also could delay the flood forecasting process because of the high model computation time when it comes to real-world application [5].
The spatial resolution of the radar data used in the study may not be optimal for urban hydrological applications, and the limited availability of high spatial resolution data is a challenge. NEXRAD radar data are available for the study area at a resolution of nearly 4 × 4 km. However, hourly NEXRAD radar with 4 × 4 km spatial resolution has shown some potential for operational use in urban-scale watersheds in the USA [102]. After pixel averaging to limit the effect of wind drift, the resolution of WKR is ~3 × 4 km. The pixel averaging was performed following the advice of severe weather scientists at the WKR radar station, who have long hands-on experience with WKR radar QPEs. Even though the spatial sampling error increases with pixel averaging, the effect decreases with increasing accumulation time and is therefore relatively low at longer time scales [52]. Moreover, on a 1261 km² (Humber River, 911 km² + Don River, 350 km²) grid, the radar provides better spatial resolution than the gauges. For the Humber River and Don River watersheds, the WKR C-band radar gives the equivalent of about 75 and 29 rain gauges across the basins, respectively. In contrast, the rain gauges yield 12 and 3 measurements (Figure 1). The NEXRAD S-band radar provides the equivalent of about 56 and 21 rain gauges across the Humber River and Don River basins, respectively. In addition, the use of the radar QPE resolution mentioned earlier might save computation time, facilitating operational flood forecasting.
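The equivalent gauge counts quoted above appear to follow from dividing each watershed area by the radar pixel area and rounding down. The short sketch below reproduces that arithmetic under this assumption; the function name is hypothetical.

```python
def equivalent_gauges(area_km2, pixel_km):
    """Number of whole radar pixels that fit in the watershed area."""
    pixel_area = pixel_km[0] * pixel_km[1]
    return int(area_km2 // pixel_area)

areas_km2 = {"Humber": 911.0, "Don": 350.0}
wkr = {name: equivalent_gauges(a, (3.0, 4.0)) for name, a in areas_km2.items()}
nexrad = {name: equivalent_gauges(a, (4.0, 4.0)) for name, a in areas_km2.items()}
print("WKR:", wkr)
print("NEXRAD:", nexrad)
```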
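The formulas for the evaluation statistics, including the correlation (r), BIAS (%), and RMSE (mm), use the following notation: P_G is the gauge measurement, P̄_G is the average gauge measurement, P_R is the radar rainfall, P̄_R is the average radar rainfall, and N is the number of radar-gauge data pairs available.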
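Since the formulas themselves are not reproduced in this text, the sketch below uses common textbook definitions as assumptions: RMSE and MAE on radar minus gauge differences, BIAS as the total radar minus gauge difference relative to the gauge total (in percent), and the Pearson correlation. RMSF is omitted because its exact definition cannot be recovered from the text. Function names and data are hypothetical.

```python
import math

def rmse(gauge, radar):
    """Root mean square error (mm)."""
    return math.sqrt(sum((r - g) ** 2 for g, r in zip(gauge, radar)) / len(gauge))

def mae(gauge, radar):
    """Mean absolute error (mm)."""
    return sum(abs(r - g) for g, r in zip(gauge, radar)) / len(gauge)

def bias_percent(gauge, radar):
    """Assumed BIAS: total (radar - gauge) relative to the gauge total, in %."""
    return 100.0 * sum(r - g for g, r in zip(gauge, radar)) / sum(gauge)

def correlation(gauge, radar):
    """Pearson correlation coefficient between gauge and radar values."""
    g_mean = sum(gauge) / len(gauge)
    r_mean = sum(radar) / len(radar)
    cov = sum((g - g_mean) * (r - r_mean) for g, r in zip(gauge, radar))
    g_var = sum((g - g_mean) ** 2 for g in gauge)
    r_var = sum((r - r_mean) ** 2 for r in radar)
    return cov / math.sqrt(g_var * r_var)

gauge = [1.0, 2.0, 3.0, 4.0]   # hypothetical hourly gauge values, mm
radar = [2.0, 2.0, 4.0, 4.0]   # matching radar values, mm

print(rmse(gauge, radar), mae(gauge, radar),
      bias_percent(gauge, radar), correlation(gauge, radar))
```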
The following flow chart (Figure 2) provides an overview of the methods and evaluation process for this study.
Water 2020, 12, x FOR PEER REVIEW

Results and Discussion
Figure 3 shows the average RMSE (mm), MAE (mm), BIAS (%), correlation (r), and RMSF (dB) of the radar-only QPE and merged radar QPE values for 278 h (i.e., all events) at the grid point nearest to the gauge point for the King City WKR C-band radar and at the corresponding grid cell for the KBUF NEXRAD S-band radar. The agreement between gauge observations and radar QPEs improved after applying radar-gauge merging; however, the degree of improvement varies for each method as well as for each radar QPE (Figure 3). Overall, the CDFM method appears as the best-performing method, followed by KRE, among the nine merging techniques used in this study. Matching of the CDFs of the radar and gauge data sets was carried out with polynomial fits of degree 1 to 5 to remove the systematic differences between the two data sets. The CDFM method with the third-degree polynomial appeared to perform best, with the lowest RMSE and highest correlation for all four radar QPEs. The reported average RMSE with the third-degree polynomial for C1, N1, C2, and N2 is 1.0 mm, 0.8 mm, 0.9 mm, and 1.7 mm, respectively. The corresponding correlation values for C1, N1, C2, and N2 are 0.88, 0.91, 0.89, and 0.89. The error remaining after applying CDFM may be caused by the random component of the radar QPE errors, which is not removed by the CDFM method. Even though the CDFM method is not often used in radar-gauge merging, it has been successfully applied in previous literature to bias-correct other gridded inputs to hydrological models, such as soil moisture [103–105] and snow depths [106,107]. For example, Leach et al. [106] have reported a significant reduction of average RMSE after applying CDFM bias correction for Snow Data Assimilation System (SNODAS) snow depths (67.30 mm to 38.45 mm) as well as SNODAS snow water equivalent (SWE) data (19.99 mm to 5.19 mm). A significant improvement of average NSE for SNODAS snow depths (0.24 to 0.76) and SNODAS snow water equivalent data (−5.7 to 0.55) has also been reported. Furthermore, a mean correlation of 0.87 ± 0.02 and a mean RMSE of 0.05 ± 0.02 were reported after CDF matching between Soil Moisture and Ocean Salinity (SMOS) microwave radiometer data and local soil moisture observations [104]. Although the CDFM method is not very common in radar-gauge merging, the results suggest that it can be successfully used to match radar QPEs with gauge observations.
The KRE method shows the second-best performance in terms of the metrics calculated for all events. In this study, a bounded linear function that can be fitted to the experimental variogram was used, and the data are assumed to be isotropic. Unbounded functions (stable, exponential, and Gaussian) were also explored in the study to perform Kriging. Several other bounded functions, including circular, spherical, and pentaspherical, were also explored with gauge observations to find the best method to be used for radar-gauge merging. The bounded linear variogram outperforms all other variogram models, with a recorded correlation of 0.88, BIAS of −1.05%, RMSE of 1.7 mm, and MAE of 0.8 mm. Both the KBUF NEXRAD S-band (N1 and N2) and WKR C-band radar (C1 and C2) QPEs have been enhanced after application of radar-gauge merging; nevertheless, the degree of improvement differs for each radar QPE.
In comparison with the radar-only QPEs, the RMSE decreases for all radar-gauge merging methods (Figure 3). After applying the CDFM method, the percent decrease in RMSE relative to the radar-only QPEs ranges from 79.85% to 90.50% for C1, N1, C2, and N2 (Table 3). The NEXRAD Level III (DPA) product shows the highest percent decrease, followed by the WKR C-band multi-parameter rain rate estimator using KDP and ZDR. All radar-gauge merging methods except MFB-maf and MBSA reduced the RMSE by more than 50% for C1, C2, and N1 (Table 3). The RMSF decreased for all four radar QPEs after applying every radar-gauge merging method. In contrast to RMSE, N2 shows a relatively larger RMSF decrease than the other three radar QPEs, exceeding 70% after MFB-mfb, MFB-maf, RDA, and FIC-GG (Table 3). As mentioned before, RMSE amplifies large differences, which may be caused by inaccurate data that lead to anomalous values. Anomalous reflectivity values caused by the bright band effect may explain the discrepancies in RMSE for N2. NEXRAD is likely to measure reflectivity at the height of the bright band because the KBUF NEXRAD measures precipitation relatively far (~106 km) from the watershed, which creates a discrepancy in sampling heights between the KBUF NEXRAD S-band and WKR C-band reflectivity measurements.

As seen in Figure 3, the MAE is considerably reduced after radar-gauge merging for all methods used in the study. All four radar QPEs improved, except for C1 after MFB-mfb. The MAE values after the RDA and FIC-GG merging methods are relatively similar for all radar QPEs.
The MAE values for C1, N1, C2, and N2 are 2.21 mm, 2.32 mm, 2.15 mm, and 2.34 mm, respectively, after RDA and 2.07 mm, 2.08 mm, 2.03 mm, and 2.21 mm after FIC-GG. The largest percent decrease in MAE is recorded after CDFM, exceeding 90% for C1, C2, and N1 (Table 3). Most of the methods, except MFB-maf and MBSA, reduced the MAE by more than 50% for both the WKR C-band and NEXRAD S-band radar QPEs.
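The verification statistics discussed in this section can be computed from paired hourly radar and gauge values as in the sketch below. The RMSE, MAE, and correlation follow their standard definitions; the percent BIAS form (total radar minus total gauge, relative to total gauge) is an assumption, and RMSF is omitted because its exact definition is not given in this section. All names and values are illustrative.

```python
import math


def rmse(radar, gauge):
    """Root mean square error between paired radar and gauge values (mm)."""
    return math.sqrt(sum((r - g) ** 2 for r, g in zip(radar, gauge)) / len(gauge))


def mae(radar, gauge):
    """Mean absolute error between paired radar and gauge values (mm)."""
    return sum(abs(r - g) for r, g in zip(radar, gauge)) / len(gauge)


def percent_bias(radar, gauge):
    """Assumed form: total radar minus total gauge, relative to total gauge (%)."""
    return 100.0 * (sum(radar) - sum(gauge)) / sum(gauge)


def pearson_r(radar, gauge):
    """Pearson correlation coefficient between radar and gauge values."""
    n = len(gauge)
    mean_r, mean_g = sum(radar) / n, sum(gauge) / n
    cov = sum((r - mean_r) * (g - mean_g) for r, g in zip(radar, gauge))
    var_r = sum((r - mean_r) ** 2 for r in radar)
    var_g = sum((g - mean_g) ** 2 for g in gauge)
    return cov / math.sqrt(var_r * var_g)


# Hypothetical pairs (mm): the radar reports half of the gauge rainfall
radar = [1.0, 2.0, 3.0]
gauge = [2.0, 4.0, 6.0]
print(rmse(radar, gauge), mae(radar, gauge),
      percent_bias(radar, gauge), pearson_r(radar, gauge))
```

In this example the radar is perfectly correlated with the gauges yet strongly biased, which shows why correlation and BIAS are reported separately.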
The average negative BIAS (underestimation) for all radar QPEs is reduced after radar-gauge merging; however, the degree of bias reduction varies with the method and with the radar QPE (Figure 3). The BIAS is relatively higher for both KBUF NEXRAD S-band radar QPEs before and after radar-gauge merging (Figure 3). After radar-gauge merging, the KRE method shows a low and relatively persistent BIAS for each radar QPE (average values between −5% and −15%). This persistent bias can be adjusted through hydrological model calibration [108].
The correlation between radar QPEs and gauge measurements increases considerably after applying the radar-gauge merging techniques (Figure 3). For all four radar QPEs, the correlation values after merging are relatively high and consistent for the CDFM method compared to the other methods. The percent increase in r exceeds 100% for all four radar QPEs after bias correction with the CDFM method (Table 3). After CDFM, the relatively complex KRE method shows the highest correlations between radar and gauge measurements, with r values of 0.88, 0.87, 0.76, and 0.78 for C1, N1, C2, and N2, respectively. The percent increase in r for the KRE method also exceeds 100% for all four radar QPEs (Table 3). The KRE method outperforms the simple radar-gauge merging methods because it uses optimal interpolation to combine gauge and radar observations while taking the covariance structure of the data into account, which reduces bias and minimizes variance [52]. The MBSA method, which uses negative exponential weighting (Barnes's objective analysis) to interpolate the differences between radar and gauge measurements, ranks next after KRE, with relatively high and consistent correlation values for all four radar QPEs (Figure 3). The MFB-mfb, MFB-maf, RDA, LOCI, FIC-EG, and FIC-GG methods show similar and relatively satisfactory performance for all four radar QPEs, with r values ranging from 0.40 to 0.60 and relatively high percent increases (Table 3). Simple methods such as MFB can significantly improve radar QPEs and therefore remain among the most commonly used radar-gauge merging methods for operational applications, especially in many national meteorological services [52,60,65].
In contrast to the MFB adjustment, the RDA method assumes that the radar QPE error increases with distance from the radar tower because of beam broadening and overshooting, and it therefore improves the KBUF NEXRAD QPEs more than the WKR C-band radar QPEs (Figure 3).
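To make the contrast concrete, a generic mean field bias adjustment can be sketched as follows. It applies one multiplicative factor, the ratio of total gauge rainfall to total radar rainfall at the gauge locations, to the whole radar field. The MFB-mfb and MFB-maf variants evaluated in this study are not defined in this section, so the code shows only the general idea, and all names and values are hypothetical.

```python
def mean_field_bias(radar_at_gauges, gauge_values):
    """Single field-wide factor: total gauge rainfall over total radar rainfall."""
    total_radar = sum(radar_at_gauges)
    if total_radar <= 0.0:
        return 1.0  # no radar rainfall to scale; leave the field unchanged
    return sum(gauge_values) / total_radar


def apply_mfb(radar_field, factor):
    """Scale every radar pixel by the same factor (no spatial dependence)."""
    return [factor * value for value in radar_field]


# Hypothetical values (mm): radar pixels at three gauges and the gauge totals
factor = mean_field_bias([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(factor, apply_mfb([0.0, 1.5, 4.0], factor))
```

Because the factor is uniform in space, this kind of adjustment cannot represent an error that grows with range, which is the assumption that RDA adds.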
Moreover, after radar-gauge merging, the correlation values of the radar QPEs rank as N1 > N2 > C2 ≈ C1. Among the four radar QPEs, N2 shows the largest improvement in correlation after bias correction, with percent increases ranging from 94.97% to 206.89% (Table 3). For example, the correlation of 0.29 between the raw radar QPE and the gauges improves to 0.89 after applying CDFM, a percent increase of 206.89% (Table 3). All the other methods also show a percent increase greater than 90% for N2 (Table 3). The improvement for N2 is most apparent after applying KRE, which gives the highest percent decrease in RMSE, BIAS, RMSF, and MAE and the second-highest percent increase in correlation (Table 3). Before radar-gauge adjustment, N2 performs worst among the radar QPEs, with RMSE, MAE, BIAS, r, and RMSF of 8.4 mm, 7.3 mm, −52.01%, 0.29, and 9.6 dB, respectively.
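The percent changes quoted from Table 3 follow from simple arithmetic on the values before and after merging. The short sketch below reproduces the N2 correlation example from the text; the function name is illustrative.

```python
def percent_change(before, after):
    """Signed percent change relative to the value before merging."""
    return 100.0 * (after - before) / abs(before)


# N2 correlation before (0.29) and after (0.89) CDFM, as reported in the text
print(round(percent_change(0.29, 0.89), 2))
```

The result is about 206.9%, in line with the 206.89% reported for N2. A negative result indicates a decrease, as for RMSE or MAE.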
Overall, as seen in Figure 3, the CDFM and KRE methods perform better than gauge-only OK. However, the OK method provides precipitation estimates with accuracy similar to or better than that of the other radar-gauge merging techniques, and it is more accurate than all four radar-only QPEs. The metrics estimated for the gauge-only OK method are similar to those of the MFB, RDA, LOCI, FIC, and MBSA radar-gauge merging techniques. As mentioned before, the rain gauge network density is relatively high for both the Humber River and Don River watersheds, with one gauge per 75 km² and one gauge per 116 km², respectively. The notable accuracy of gauge-only OK is attributed to the proximity of the merged gauge stations to the verification gauges: the distance between a verification gauge and the nearest merged gauge in the study area varies from 3.5 km to 9.5 km. Since the gauge-only OK method performs better than the radar-only QPEs, radar QPEs must be adjusted with gauge measurements before they are used as precipitation inputs if they are to add value.

Figure 4 compares box-and-whisker plots of the gauge measurements with those of each radar QPE after radar-gauge merging. For C1, C2, and N1, the interquartile range (IQR) of the merged radar QPEs is at the same level as that of the reference gauge measurements, and the IQRs overlap with one another for all radar-gauge merging methods. Additionally, the medians of C1, C2, and N1 lie within the IQR of the gauge measurements. The first quartile, median, third quartile, and range of C1, C2, and N1 nearly coincide with those of the gauge measurements, implying that all radar-gauge merging methods perform equally well for C1, C2, and N1. The first quartile, median, third quartile, and range of N1 after applying the CDFM method match the gauge measurements, implying that N1 with the CDFM method can be used with high confidence as an additional source of precipitation for hydrological model calibration.
The IQR for N2 does not overlap with that of the reference gauges or of C1, C2, and N1 for any radar-gauge merging method except KRE. For example, after FIC-EG correction, the IQR for N2 lies entirely above those of the reference gauge measurements and of C1, C2, and N1. Additionally, except for KRE, its median always lies either below or above the IQR of the gauge and the other radar QPEs. Therefore, it can be concluded that the radar-gauge merging methods work better on C1, C2, and N1 than on N2. However, after applying KRE, the IQR of N2 overlaps with those of the gauge measurements and the other radar QPEs, and the median lies within the IQR of the gauge data as well as of C1, C2, and N1. Therefore, the KRE method performs well for all radar QPEs compared to the other methods.
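The box-plot comparison above rests on two checks: whether the IQR of a merged product overlaps the IQR of the gauges, and whether its median falls inside the gauge IQR. A minimal sketch of these checks is given below. The quartile convention (the default of Python's `statistics.quantiles`) and the sample values are assumptions and need not match those used for Figure 4.

```python
import statistics


def quartiles(values):
    """Return first quartile, median, and third quartile of a sample."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return q1, median, q3


def iqr_agreement(merged, gauge):
    """Check IQR overlap and whether the merged median lies in the gauge IQR."""
    m_q1, m_med, m_q3 = quartiles(merged)
    g_q1, _, g_q3 = quartiles(gauge)
    overlap = m_q1 <= g_q3 and g_q1 <= m_q3
    median_inside = g_q1 <= m_med <= g_q3
    return overlap, median_inside


# Hypothetical hourly accumulations (mm)
gauge = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
close_product = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
biased_product = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
print(iqr_agreement(close_product, gauge))
print(iqr_agreement(biased_product, gauge))
```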
As mentioned above, the N2 QPEs are unstable for radar-gauge merging methods except for KRE. The percent detection (d) of precipitation was calculated for radar only QPEs using the following equation to find out the capability of each radar QPE to capture the precipitation compared to gauges.
$$\mathrm{Detection}\ (d) = \frac{n_{P_R > 0,\ P_G > \mathrm{thresh}}}{n_{P_G > \mathrm{thresh}}} \times 100$$

where $P_G$ is the gauge measurement, $\bar{P}_G$ is the average gauge measurement, $P_R$ is the radar rainfall, $\bar{P}_R$ is the average radar rainfall, $n_{P_R > 0,\ P_G > \mathrm{thresh}}$ is the number of radar-gauge pairs for which the radar records precipitation and the corresponding gauge observation exceeds the specified threshold (0 mm), and $n_{P_G > \mathrm{thresh}}$ is the number of radar-gauge pairs for which the gauge value exceeds the specified threshold (>0 mm).
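The detection measure defined above counts, among the hours in which the gauge exceeds the threshold, the share in which the radar also records precipitation. A minimal sketch under that reading follows; the function name and the sample values are illustrative.

```python
def percent_detection(radar, gauge, thresh=0.0):
    """Share (%) of wet gauge hours in which the radar also records rainfall."""
    # Keep only pairs where the gauge exceeds the threshold
    wet_pairs = [(r, g) for r, g in zip(radar, gauge) if g > thresh]
    if not wet_pairs:
        return float("nan")  # detection is undefined without wet gauge hours
    hits = sum(1 for r, _ in wet_pairs if r > 0.0)
    return 100.0 * hits / len(wet_pairs)


# Hypothetical hourly pairs (mm): 2 of the 4 wet gauge hours are detected
radar = [0.0, 1.2, 0.4, 0.0, 2.0]
gauge = [0.5, 1.0, 0.0, 0.2, 3.0]
print(percent_detection(radar, gauge))
```

Radar rainfall recorded while the gauge is dry (the third pair) does not enter the measure, so it describes missed precipitation rather than false alarms.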
The calculated percent detection for C1, C2, N1, and N2 is 84%, 84%, 78%, and 58%, respectively. Detection is higher for the WKR radar-only QPEs than for the NEXRAD radar-only QPEs, and it is better for N1 than for N2. The WKR radar measures precipitation at a distance of 37 km from the farthest edge of the watershed and therefore shows high detection, whereas the KBUF NEXRAD measures precipitation ~106 km from the watershed, resulting in lower detection. The low detection of N2 partly explains why it performs worst among the radar QPEs used in the study. However, the KRE method uses the radar observations of rainfall to assess the errors introduced when Kriging interpolates between the rain gauge observations, and then conditions the Kriged gauge field accordingly. Unlike bias reduction methods, the KRE method uses the spatial association of the radar rainfall field to support the interpolation of gauge measurements. Because it interpolates rainfall between radar and gauges, missing values are infilled during the radar-gauge merging process. The spatial variability observed by the radar is retained, while Kriging interpolation reduces the variance by using the information available in the vicinity of the gauges, where they provide accurate information on the rainfall field. This infilling led to a relatively larger improvement for the N2 radar QPEs than for the other radar QPEs. It is especially beneficial for hydrological model calibration, where a continuous precipitation time series is often necessary. As suggested by Zhang et al. [109], a lack of detection is difficult to address during the calibration of hydrologic models. Therefore, the KRE method is recommended if the original data contain missing values.
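The KRE procedure described above can be sketched as follows, assuming the usual conditional-merging formulation: the gauge values are Kriged onto the grid, the radar values sampled at the gauge locations are Kriged in the same way, and the difference between the radar field and its Kriged version is added to the Kriged gauge field. The ordinary Kriging solver, the bounded linear variogram parameters (`sill`, `rng`), and all data are illustrative assumptions rather than the configuration used in this study.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def bounded_linear(h, sill=1.0, rng=30.0):
    """Bounded linear variogram: linear up to the range, constant beyond it."""
    return sill * min(h / rng, 1.0)


def ordinary_kriging(points, values, targets, variogram=bounded_linear):
    """Ordinary Kriging of point values onto target locations."""
    n = len(points)
    A = [[variogram(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    A.append([1.0] * n + [0.0])  # unbiasedness constraint (weights sum to 1)
    estimates = []
    for t in targets:
        b = [variogram(math.dist(p, t)) for p in points] + [1.0]
        w = solve(A, b)
        estimates.append(sum(wi * vi for wi, vi in zip(w[:n], values)))
    return estimates


def kre_merge(gauge_xy, gauge_vals, grid_xy, radar_grid, radar_at_gauges):
    """Kriged gauge field plus the radar deviation from its own Kriged field."""
    gauge_field = ordinary_kriging(gauge_xy, gauge_vals, grid_xy)
    radar_kriged = ordinary_kriging(gauge_xy, radar_at_gauges, grid_xy)
    return [max(0.0, g + (r - rk))
            for g, r, rk in zip(gauge_field, radar_grid, radar_kriged)]


# Hypothetical gauges (km coordinates, mm) and a tiny target grid
gauge_xy = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (8.0, 8.0)]
gauge_vals = [2.0, 4.0, 1.0, 3.0]
radar_at_gauges = [1.0, 2.5, 0.5, 1.5]
grid_xy = gauge_xy + [(5.0, 5.0)]
radar_grid = radar_at_gauges + [1.2]
print(kre_merge(gauge_xy, gauge_vals, grid_xy, radar_grid, radar_at_gauges))
```

At the gauge locations the merged field reproduces the gauge values, while between gauges it keeps the spatial pattern seen by the radar. Solving one Kriging system per grid cell also illustrates why this method is more computationally demanding than a single bias factor.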
Figure 5 shows the hourly radar rainfall accumulations before and after radar-gauge merging for all four radar QPEs for the 278 h of analysis (i.e., all events), plotted against the hourly gauge accumulations. The scatter plots illustrate the relative strengths of each radar-gauge merging method at different rain intensities. The one-to-one line (dotted line) indicates unbiased radar precipitation estimates. In general, the hourly precipitation values cluster more closely along the one-to-one line after radar-gauge merging than for the radar-only QPEs. Therefore, all radar-gauge merging methods are effective in adjusting radar QPEs to match the observed gauge measurements. The CDFM and KRE methods generally agree best with the gauge observations, followed by MBSA. However, the KRE method slightly overestimates the gauge measurements. For precipitation intensities from 1 mm/h to 10 mm/h, the QPEs lie much closer to the one-to-one line for all four radar QPEs, especially after applying the CDFM method. The KRE method performs reasonably well for all intensities and for all four radar QPEs. However, the agreement is relatively low for precipitation below 1 mm/h and becomes progressively worse at higher rainfall intensities. In heavy rainfall events, path attenuation affects the reflectivity values and, ultimately, the radar precipitation estimates [33]. Even though attenuation is addressed using the ZPHI algorithm for all methods, the attenuation correction can be overshadowed by the extensive radome attenuation at the WKR C-band radar station, resulting in poor rainfall estimates [33]. The CDFM method performs reasonably well for a wide range of intensities, including intensities > 10 mm/h. However, the CDFM method overestimates lower accumulation amounts. A similar overestimation is observed for the remaining radar-gauge merging methods as well.
Even though some scatter remains between the radar QPEs and the gauge-reported hourly precipitation after radar-gauge merging, less scatter is observed for N1 than for C1, C2, and N2. Even though N2 shows acceptable performance after CDFM radar-gauge merging, some inconsistencies exist at medium and higher intensities. These inconsistencies may result from the lower detection of precipitation by the NEXRAD S-band radar compared to the WKR C-band radar, as the two watersheds are closer to the WKR C-band radar than to the KBUF NEXRAD radar. Although the WKR radar suffers from attenuation of return echoes, its detection is high because it measures precipitation at a distance of ~37 km at the furthest edge of the watershed. For N2, the low detection is successfully addressed by the KRE method, as most of the values after KRE correction plot close to the one-to-one line.
The average correlation between the WKR C-band and KBUF NEXRAD S-band radar QPEs and gauge observations before and after radar-gauge merging for each event is presented in Figure 6a-d. The correlation values vary from event to event. For C1, C2, and N1, the average correlation over the 18 events after applying the CDFM method is higher than that of the other radar-gauge merging methods for all events, followed by KRE and MBSA (Figure 6a-c).
The correlation values vary considerably between radar QPEs as well as between events before radar-gauge merging. As can be seen in Figure 6, after radar-gauge merging, the correlation values are approximately similar for all radar QPEs and for all events, especially after CDFM. For C1, C2, and N1, the correlation values after applying the CDFM method range from 0.80 to 0.99 for all events. However, for N2, the CDFM method shows some inconsistencies (Figure 6d), which may be due to the lack of detection discussed before. For N2, the KRE method shows the highest correlation with the gauge data among the radar-gauge merging methods, followed by CDFM and MBSA (Figure 6d). Even though the MFB, RDA, LOCI, and FIC methods show higher correlation values than the radar-only QPEs over the 278 h (i.e., all events), little or no improvement is observed for individual events for C1, C2, and N2. However, for N1, all radar-gauge merging techniques show a considerable increase in correlation for four summer events (13, 14, 15, and 16) and smaller but still substantial improvements for summer events 3, 4, and 10 (Figure 6b). Radar-gauge merging is more effective for summer events than for fall and spring events, especially for N1. The performance of radar-gauge merging methods and, eventually, the quality of merged rainfall products are affected by the quality of the original radar QPE products [68]. The RMSE and correlation values before radar-gauge merging are relatively good for N1 compared to the other raw radar QPEs (C1, C2, and N2). This quality of the N1 radar QPEs may have contributed to the successful merging for N1. On the other hand, the bright band effect might have affected the quality of radar QPEs in early spring and late fall, when radar-gauge merging is apparently not able to improve the radar precipitation estimates successfully.
Figure 6e-h shows the average RMSE between radar-gauge hourly accumulation pairs before (RO) and after applying radar-gauge merging for radar QPEs for each event separately. Calculated RMSE shows the same trend as the correlation values and varies from event to event. Generally, the RMSE is relatively lower for all four radar QPEs compared to raw radar QPEs after applying radar-gauge merging. The average RMSE for QPEs from all four radar QPEs after applying CDFM method ( The RMSE values vary substantially between different radar QPEs as well as between different events before the application of radar-gauge merging. Even though RMSE is reduced after radar-gauge merging, RMSE is nonetheless high for high-intensity events (e.g., events 3 and 5) and relatively small in low-intensity events (e.g., events 7, 12, and 18). Although radar-gauge merging reduces the RMSE, above mentioned limitations in radar precipitation estimates [23,41] impede the accuracy and reliability of radar QPEs.
As indicated by the metrics for 278 h (i.e., all events), the gauge-only OK method performs better than the radar-only QPEs for individual events in terms of r and RMSE (Figure 6). After applying the CDFM, KRE, and MBSA merging techniques, C1, C2, and N1 perform better than the OK method. As stated before, the relatively dense gauge network may explain why OK is more accurate than the MFB, RDA, FIC, and LOCI radar-gauge merging techniques. The strong performance of gauge-only OK could be due to the ability of the high-density gauge network to describe the spatial variability of the precipitation field. For every event, radar QPEs add value only after radar-gauge merging. Therefore, radar-only QPEs must be adjusted with an appropriate merging technique before they are used as an additional precipitation source for event-based hydrological models in operational flood forecasting.

Figure 6. Average correlation (a-d) and RMSE (e-h) between the hourly accumulation of merged radar QPEs and gauge measurements for each event. Note: Summer events: 1, 3, 4, 5, 9, 10, and 13-17; Spring events: 2, 7, and 8; Fall events: 6, 11, 12, and 18.

Figure 7 shows an example of radar-gauge merging results after applying the KRE method. The figure shows the spatial distribution of accumulated precipitation derived from radar-gauge merging using KRE for event 3, which took place from 8 July 2013 1800 UTC to 9 July 2013 0200 UTC. It compares the gauge (G), NEXRAD radar-only (RO), and KRE-merged (KRE) precipitation fields. The KRE precipitation field displays features from both the gauges and the radar-only QPE. The KRE method adds information in areas where the gauges record precipitation but the radar-only QPE is uncertain.
The agreement between the precipitation recorded at the reference gauges and the merged radar QPEs is higher than that for the radar-only QPEs, especially in the southeastern parts of the watershed.
Besides the factors discussed above, the selection of an optimal radar-gauge merging method is influenced by location-specific environmental and operational factors [53]. Therefore, selecting an appropriate radar-gauge merging technique requires understanding the impact of factors such as the density of the gauge network, storm characteristics, proximity to the radar tower, the response time (time of concentration) of the watersheds, and the time step of adjustment. The rain gauge densities of the semi-urban Humber River and urban Don River watersheds are relatively high, at one gauge per ~75 km² and one gauge per ~116 km², respectively. According to McKee and Binns [53], rain gauge density can play a significant role in determining the optimal radar-gauge merging method. Highly dense rain gauge networks can sometimes characterize the spatial variability of the rainfall field and hence increase the confidence in radar-gauge merging methods. For example, Goudenhoofdt and Delobbe [52] studied the sensitivity of radar-gauge merging methods to rain gauge density and concluded that a decrease in density affects spatial adjustment methods more than simple bias reduction techniques. As shown in this study, both bias reduction techniques (CDFM, MFB, MBSA, RDA) and error variance minimization techniques (KRE) produce better results because of the dense gauge network. Storm characteristics, such as intensity, also affect the accuracy of radar-gauge merging [110].
The results of this study also suggest that only the CDFM and KRE methods show good agreement between the observed gauge measurements and the merged radar QPEs over a wide range of intensities; for all other methods, the agreement deteriorates with increasing intensity. Furthermore, the proximity to the radar tower affects the accuracy of radar-gauge merging in two ways. Firstly, the accuracy of the radar-estimated precipitation and of the radar-gauge merging methods deteriorates with increasing distance from the radar tower. Secondly, the lower detection of the KBUF NEXRAD radar compared to the WKR C-band radar, caused by beam broadening, beam overshooting, and beam attenuation, affects the raw radar QPE quality, the radar-gauge merging techniques, and ultimately the merged radar QPEs. After applying RDA to the KBUF NEXRAD radar QPEs, a substantial improvement was observed compared to the WKR C-band radar QPEs, especially in terms of correlation. As Gjertsen et al. [80] suggested, relatively small urban basins with a time of concentration on the order of hours, which require rainfall estimation at small spatial and temporal scales, benefit greatly from radar QPEs. Therefore, the time step of radar-gauge merging plays a vital role in selecting radar-gauge merging methods for urban watersheds. For longer periods (e.g., daily or event-based temporal resolutions), the magnitude of spatiotemporal sampling errors becomes stable because error fluctuations are averaged over time [53,80]. Even though the errors are reduced, short-term variations are missed, which affects the accuracy of merged radar QPEs. In this study, an hourly time step is used because the study targets operational flood forecasting in urban watersheds where the response time is on the order of hours. Since radar-gauge merging is performed at an hourly time step, the spatially dependent KRE method works better than the MFB bias reduction method.
The variations between radar and gauges are more pronounced at shorter time steps, and therefore more weight is placed on the gauge observations when error variance methods such as KRE are used. Bias reduction methods such as MFB average out these variations and hence produce less accurate merged radar QPEs than KRE. Apart from all the above-mentioned factors, data management and computational requirements must also be considered. Error variance methods such as KRE require more computational power than simple methods such as MFB. Because of these factors, the results from this study are transferable only between relatively small urban watersheds with similar environmental and operational factors, but not to large rural basins.

Conclusions
Various methods for combining QPEs from the dual-polarized WKR C-band and KBUF NEXRAD S-band operational radars with precipitation data from a rain gauge network have been implemented for two watersheds in the GTA, Ontario, Canada: the semi-urban Humber River and the urban Don River. Nine radar-gauge merging techniques are compared against an independent gauge network of hourly rainfall measurements using 18 rainfall events that occurred from 2012 to 2017, totaling 278 h (i.e., all events). Additionally, this study has investigated the impact of the quality and quantity of different radar QPEs on the performance of radar-gauge merging. Several statistical measures, namely correlation (r), BIAS (%), MAE (mm), RMSE (mm), and RMSF (dB), have been computed to evaluate the performance of the selected radar-gauge merging methods.
Based on the verification study, all radar-gauge merging methods outperformed the radar-only QPEs. However, performance varies with the merging method as well as with the radar QPE. The CDFM method with a third-degree polynomial is the best-performing method, followed by KRE. Since the KRE method uses information from radar to interpolate gauge data, it can be effectively used to merge radar QPEs with missing gauge values. The persistent bias reported for the KRE method could be addressed adequately during hydrological model calibration. If the distance from the radar tower to the gauges in the watershed varies drastically, the RDA method, which takes the distance from the radar tower into account during radar-gauge merging, is recommended. The relatively simple MFB method shows satisfactory performance along with reduced computational demand. Since radar-gauge merging primarily addresses systematic errors, the inevitable random component of the radar QPE errors is responsible for most of the remaining differences between merged radar QPEs and reference gauge observations. The gauge-only OK method outperforms the radar-only QPEs, as well as several merged QPEs, because the dense gauge network allows it to characterize the spatial variability of the precipitation field.
The effectiveness of the various radar-gauge merging methods and, ultimately, the quality of merged radar QPEs are affected by the quality and quantity of the radar-only QPEs. Both NEXRAD S-band and WKR C-band radar QPEs improved after radar-gauge merging; however, the NEXRAD Level III (DPA) showed the most improvement. For that reason, all applicable corrections for errors such as attenuation and the VPR effect must be applied to improve raw radar QPEs before applying radar-gauge merging. Both merged NEXRAD and WKR radar QPEs show acceptable agreement with reference gauges and, therefore, can be used as an additional data source for hydrological model calibration with high confidence. All merged radar QPEs performed well for low- and medium-intensity precipitation, from 1 mm/h to 10 mm/h. The CDFM and KRE methods perform better over a wide range of intensities; nevertheless, performance deteriorates with increasing rainfall intensity. The event-based evaluation shows that all merged radar QPEs outperformed the radar-only QPEs for all 18 events. The improvement is considerable after applying CDFM, KRE, and MBSA. Radar-gauge merging performed best in the summer season, when contamination due to the bright band effect is minimal compared to late fall and early spring.
Since this study addresses systematic spatio-temporal errors at an hourly time step, radar-gauge merging would aid in developing accurate and continuous precipitation data for hydrological model calibration for flood forecasting in relatively small urban and semi-urban watersheds with a time of concentration on the order of hours. Although the results presented in this study provide some direction on the best methods to use for radar-gauge merging, there is no guarantee that similar performance will be obtained at all other locations in the radar domain. Therefore, it is recommended that additional information about environmental and operational factors, such as rain gauge density, rainfall intensity, proximity to the radar tower, the response time of the watershed, and the time step of the radar-gauge merging, be used to select the best merged radar precipitation estimate. In the future, evaluation of the accuracy and computational demand of radar-gauge merging methods when coupled with hydrological models used for operational flood forecasting is recommended. In addition, the radar-gauge merging methods must be re-evaluated when sub-hourly data are available.