Comparison of Hydrologic Model Performance Statistics Using Rain Gauge and NEXRAD Precipitation Input at Different Watershed Spatial Scales and Rainfall Return Frequencies for the Upper St. Johns River, Florida USA

Water resources numerical models depend on various input hydrologic field data. As models become increasingly complex and model simulation times expand, it is critical to understand the inherent value of the different input datasets available. One important category of model input is precipitation data; for hydrologic models, precipitation inputs are perhaps the most critical. Common precipitation model inputs include either rain gauge data or remotely-sensed data such as Next-Generation Radar (NEXRAD) data. NEXRAD data provide a higher level of spatial resolution than point rain gauge coverage, but are subject to more extensive data pre- and post-processing, along with additional computational requirements. This study first documents the development and initial calibration of a HEC-HMS model of a subtropical watershed in the Upper St. Johns River Basin in Florida, USA. The study then compares the calibration performance of the same HEC-HMS model using either rain gauge or NEXRAD precipitation inputs. The results are further discretized by comparing key calibration statistics, such as the Nash–Sutcliffe Efficiency, at different spatial scales and rainfall return frequencies. The study revealed that at larger spatial scales the calibration performance of the model was about the same for the two precipitation datasets, while NEXRAD showed some benefit for smaller watersheds. Similarly, NEXRAD data were superior for precipitation events with smaller return frequencies.


Introduction
Computer models that simulate hydrologic runoff processes are essential tools for understanding and describing the overall hydrologic cycle. They are routinely used for important studies regarding water management, water quality issues, land use changes, flood inundation, and many other forecasting applications. The success of current model development and subsequent hydrologic prediction lies in the proper selection of model input parameters. Some researchers consider the spatial and temporal variability of precipitation to be the main source of input data uncertainty when rainfall-runoff models are applied [1]. Since precipitation is the main driving mechanism for hydrologic models, selecting a suitable meteorological input dataset for precipitation is perhaps the most critical step in model development.
Traditionally, precipitation measurements from rain gauges or meteorological stations have been used as the only reliable source of precipitation in watershed modeling [2]. The benefit of rain gauges is their ability to obtain a precise point value for precipitation, with minimal data processing needed for use in hydrologic model applications. The main limitation of relying on rain gauges is that the network cannot supply information about rainfall occurring between the gauges, and as a result, it may not fully capture rainfall events with high spatial variability [3,4]. To use these point measurements over a watershed, areal averaging of the measured precipitation amounts is necessary. A commonly used method, the Thiessen Polygon method, constructs a polygon network from the rain gauge locations, applies each gauge's rainfall uniformly over its polygon, and weights each gauge by the area its polygon covers.
In recent years, technological advancements have provided rainfall estimates derived from remotely-sensed data. The use of remotely-sensed, radar-based data for rainfall estimation has evolved, presenting the opportunity for more accurate rainfall estimates. This has led to the use of radar-based rainfall products, such as Next-Generation Weather Radar (NEXRAD) data, available in the USA and select overseas locations, in many hydrologic modeling applications because they provide spatially continuous estimates at relatively fine spatial resolution. The NEXRAD system was initiated in the early 1990s by the United States National Weather Service (NWS) and consists of a national network of radars known as WSR-88D (Weather Surveillance Radar-1988 Doppler) [5]. Due to the indirect nature of radar rainfall measurement, NEXRAD data may be subject to many sources of uncertainty, such as radar hardware factors (antenna, transmitter, and receiver), ground clutter, anomalous beam propagation, radar beam overshooting, and range effects caused by increasing beam elevation and degradation of resolution due to beam spreading [2,6,7]. Additionally, one of the largest sources of error can be the chosen reflectivity-to-rainfall (Z-R) relationship, because it directly determines the amount of precipitation estimated. Many empirical Z-R relationships have been developed because different climate conditions and rainfall characteristics affect raindrop size distributions [8].
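To make the Z-R step concrete, the sketch below inverts the power-law form Z = aR^b to convert reflectivity in dBZ to rain rate. The coefficients are assumptions for illustration only: a = 300, b = 1.4 is the widely cited WSR-88D default convective relationship, and a = 200, b = 1.6 is the Marshall-Palmer relationship; the Z-R relationship behind the SJRWMD product is not stated here, and the function name is hypothetical.

```python
def dbz_to_rain_rate(dbz: float, a: float = 300.0, b: float = 1.4) -> float:
    """Convert radar reflectivity (dBZ) to rain rate R (mm/h) using Z = a * R**b.

    Defaults (a=300, b=1.4) are an illustrative assumption, not the study's Z-R choice.
    """
    z_linear = 10.0 ** (dbz / 10.0)       # dBZ -> linear reflectivity Z (mm^6/m^3)
    return (z_linear / a) ** (1.0 / b)    # invert Z = a * R**b for R


# Compare two common Z-R relationships for the same reflectivity values.
for dbz in (30.0, 40.0, 50.0):
    r_default = dbz_to_rain_rate(dbz)                      # Z = 300 R^1.4
    r_marshall_palmer = dbz_to_rain_rate(dbz, 200.0, 1.6)  # Z = 200 R^1.6
    print(f"{dbz:.0f} dBZ: {r_default:.1f} vs {r_marshall_palmer:.1f} mm/h")
```

Running the loop shows the two relationships giving similar rates at moderate reflectivity but diverging substantially at high reflectivity, which is one way to see why the Z-R choice is a major error source.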
Because acquiring and applying precipitation data becomes more time-consuming and expensive as spatial resolution increases (e.g., through the incorporation of NEXRAD data), determining which data source provides satisfactory results in a hydrologic model is critical. Due to the limitations inherent in gauge-measured and radar-derived data, it is important to understand the quality of the data (in terms of precipitation quantity), possible bias, and systematic offsets. Several authors have found that NEXRAD underestimates rainfall compared to rain gauge measurements [4,9], whereas Mazari [10] concluded that radar data were comparable to rain gauge measurements. Other researchers recommend understanding the potential differences in precipitation values, the quality of radar data, and the overall bias of radar-generated rainfall compared to rain gauge rainfall quantities, as these directly affect hydrologic simulations [11].
The potential for improved accuracy of hydrologic model simulations and forecasts using radar data instead of point gauge data has been studied previously [1,11–13]. Other research determined that NEXRAD data were generally less accurate than gauge-only input in predicting streamflow volumes with the Hydrologic Engineering Center-Hydrologic Modeling System (HEC-HMS) [14] for two basins in central Tennessee, USA. The results of Kalin and Hantush [1] showed that during calibration of a Soil and Water Assessment Tool (SWAT) model [15,16] in the Pocono Creek Watershed (~120 km²), the NEXRAD- and rain gauge-driven model performance statistics were comparable and the simulated hydrographs were similar to the observed flow hydrographs. Looper and Vieux [17] investigated streamflow prediction accuracy using radar-derived precipitation estimates and gauge observations in a physics-based distributed (PBD) hydrologic model, Vflo [18,19], and determined that the radar data yielded more accurate hydrologic predictions for flash flood events. These studies reach mixed conclusions regarding whether radar-derived precipitation is a better alternative to traditional fixed instrument-based rain gauge data. Only limited comparison has been completed of how any improvement in simulated watershed response relates to spatial scale and rainfall intensity.
This study was designed to assess the intrinsic value of two precipitation input types (rain gauge versus NEXRAD) in the development of a rainfall-runoff hydrologic simulation model. To assess and compare the value of the two input data types, a hydrologic simulation model was developed for subtropical watersheds located in central Florida, USA; these watersheds are typical of low-topographic-drive systems found around the globe. Available stream gauges in the watershed were used to compare simulated discharges to observed discharges. The development of the model using HEC-HMS is briefly discussed, while the research focuses on the overall value of the two precipitation input datasets. To measure the value of each dataset, a comparative assessment was completed of the improvement or decline in calibration and validation performance statistics of the hydrologic model with respect to (i) watershed scale and (ii) rainfall return frequency. These proxies were then used to determine when NEXRAD data provide little to no advantage over rain gauge data estimated using the Thiessen Polygon method across the model domain. This work is important because it illustrates that automatically using the most complicated data input for hydrologic models may not be prudent if the resulting simulations deliver no advantage in model performance or accuracy. Ultimately, the research revealed that at larger spatial scales the calibration performance of the model was approximately the same for the two precipitation datasets, while NEXRAD showed some benefit for smaller watersheds. Similarly, NEXRAD data were superior for precipitation events with smaller return frequencies.

Materials and Methods
As noted in the Introduction, this research compares the inherent value of two different precipitation input datasets for use in rainfall-runoff hydrologic modeling, using a hydrologic model of a portion of the St. Johns River (SJR) watershed for the comparative assessment. The study area is the SJR, located in northeast Florida, USA. The general topography of the project area is flat, with an average slope of approximately 0.003 m/km, which gives the river a more lacustrine than riverine character [20]. The climate of central and eastern Florida is mainly humid subtropical, similar to many other regions of the world at the same latitude. Average rainfall is between 1100 and 1500 mm per year [21]. The watershed comprises multiple sub-basins of various sizes and hydrologic properties that drain toward the flow way of the SJR. Due to its size, the SJR has been divided into major "Basins" by the St. Johns River Water Management District (SJRWMD), a state agency whose work focuses on water supply, water quality, natural systems management, and flood protection (SJRWMD, 2012). The Upper St. Johns River Basin (USJRB) forms the headwaters of the SJR and has an area of approximately 4530 km². This Basin consists mainly of marsh and agricultural land, with multiple storage areas used for flood control and environmental management. Additionally, multiple flood control projects within the area include flood levees and water control structures. The Middle St. Johns River Basin (MSJRB) is downstream of the USJRB and encompasses approximately 3100 km². It includes a variety of natural land types, but a majority of the basin consists of highly urbanized areas. For this study, a majority of the USJRB and a portion of the MSJRB were simulated. The model domain covers ~5200 km² with a rain gauge density of approximately 217 km² per gauge.
The watershed was divided into multiple sub-basins, delineated previously by the U.S. Army Corps of Engineers (USACE) and SJRWMD. Figure 1 shows the model domain with sub-basin delineation and available rain gauge locations. Gauges are maintained by the United States Geological Survey (USGS), USACE, South Florida Water Management District (SFWMD) and SJRWMD.

HEC-HMS Model
The HEC-HMS version 3.5 hydrologic modeling platform simulates precipitation-runoff processes of natural and controlled watershed systems for a wide range of geographic areas and watershed sizes [14]. HEC-HMS can compute multiple components of the runoff process, including runoff volume (accounting for infiltration and impervious percentage), direct runoff, baseflow, and channel flow. Hydrologic elements (sub-basin, reach, reservoir, junction, diversion, source, and sink) are used within HEC-HMS to represent the physical processes of a watershed, with one or more mathematical models available for each computation. HEC-HMS can also model water control facilities, including diversions, reservoirs, and detention facilities, which may include control structures or outlet works. Meteorologic conditions over a watershed can be specified using precipitation and evapotranspiration input data. The HEC-HMS platform allows the hydrologic processes of the Upper and Middle St. Johns River Basins to be simulated in one large-scale model with adequate detail to determine changes in runoff processes due to variations in the precipitation input conditions.
The category of mathematical model chosen for this study was an event-based, lumped-parameter model. For each sub-basin, HEC-HMS computes outflow by subtracting losses from the applied precipitation, transforming the excess precipitation into direct runoff, and adding baseflow. The USDA Soil Conservation Service (SCS) Curve Number Method [22] was selected as the Loss Method for calculating infiltration in each sub-basin element. The SCS runoff equation is an empirical model developed for estimating runoff potential from a rainfall event based on the relationship between soil type, land use, and antecedent soil conditions. The Curve Number, which sets the potential maximum retention of each sub-basin, was estimated using the Florida Land Use and Cover Classification System (FLUCCS) [23] and State Soil Geographic (STATSGO) database information. The SCS unit hydrograph Transform Method was used to compute the resulting surface runoff hydrograph from the calculated lag time. The Baseflow Method represents runoff from prior precipitation and subsurface volumes for each sub-basin; the constant monthly baseflow method was selected.
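The Curve Number loss calculation can be illustrated with the standard SCS runoff equation in metric units: Q = (P - Ia)^2 / (P - Ia + S) for P > Ia (otherwise Q = 0), with S = 25400/CN - 254 (mm) and the conventional initial abstraction Ia = 0.2S. This is a minimal event-total sketch under those assumptions; HEC-HMS applies the method to cumulative precipitation at each time step to build an excess-precipitation hyetograph, which is not reproduced here. The function name is hypothetical.

```python
def scs_runoff_mm(p_mm: float, cn: float, ia_ratio: float = 0.2) -> float:
    """SCS Curve Number direct runoff depth Q (mm) for an event rainfall P (mm).

    ia_ratio = 0.2 is the conventional initial-abstraction ratio (an assumption here).
    """
    if not 0.0 < cn <= 100.0:
        raise ValueError("CN must be in (0, 100]")
    s = 25400.0 / cn - 254.0        # potential maximum retention S (mm)
    ia = ia_ratio * s               # initial abstraction Ia (mm)
    if p_mm <= ia:
        return 0.0                  # all rainfall abstracted; no direct runoff
    return (p_mm - ia) ** 2 / (p_mm - ia + s)


print(round(scs_runoff_mm(100.0, 80.0), 1))
```

For example, a 100 mm event on a sub-basin with CN = 80 gives S = 63.5 mm, Ia = 12.7 mm, and roughly 50.5 mm of direct runoff.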
The channel network is created with reach elements, which represent the main flow way of the St. Johns River. The Muskingum-Cunge Routing Method, which is based on conservation of mass and the diffusion representation of the conservation of momentum, was used to route channel flow within the reach elements. For the St. Johns River, one of the largest routing challenges is properly modeling floodplain storage. According to HEC-HMS guidance [24], flood flows through extremely flat and wide floodplains may not be modeled adequately as one-dimensional flow. To overcome potential overestimation of flow due to inadequate representation of storage, a loss method was applied in areas known to have low runoff potential, surface water withdrawals, or recharge.
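The Muskingum-Cunge step can be sketched with constant parameters: the travel time K = dx/c and weighting factor X = 0.5(1 - Q/(B S0 c dx)) are derived from a reference discharge Q, top width B, bed slope S0, and wave celerity c, and the usual Muskingum coefficients C0, C1, C2 then route the inflow hydrograph. This is an illustrative sketch only; HEC-HMS's implementation varies the parameters in time and chooses its own dx and dt, and all function names and input values here are hypothetical.

```python
def muskingum_cunge_coeffs(dx_m, dt_s, celerity_ms, q_ref_m3s, width_m, slope):
    """Constant-parameter Muskingum-Cunge routing coefficients (C0, C1, C2).

    K = dx / c and X = 0.5 * (1 - Q / (B * S0 * c * dx)), following Cunge's
    matching of numerical and physical diffusion. Inputs are illustrative.
    """
    k = dx_m / celerity_ms                                                  # travel time K (s)
    x = 0.5 * (1.0 - q_ref_m3s / (width_m * slope * celerity_ms * dx_m))   # weighting factor X
    denom = k * (1.0 - x) + 0.5 * dt_s
    c0 = (0.5 * dt_s - k * x) / denom
    c1 = (0.5 * dt_s + k * x) / denom
    c2 = (k * (1.0 - x) - 0.5 * dt_s) / denom
    return c0, c1, c2                                                       # C0 + C1 + C2 == 1


def route_reach(inflow, coeffs, initial_outflow=None):
    """Route an inflow hydrograph: O[j+1] = C0*I[j+1] + C1*I[j] + C2*O[j]."""
    c0, c1, c2 = coeffs
    outflow = [inflow[0] if initial_outflow is None else initial_outflow]
    for j in range(1, len(inflow)):
        outflow.append(c0 * inflow[j] + c1 * inflow[j - 1] + c2 * outflow[-1])
    return outflow


coeffs = muskingum_cunge_coeffs(dx_m=2000.0, dt_s=3600.0, celerity_ms=1.0,
                                q_ref_m3s=10.0, width_m=50.0, slope=0.001)
print(route_reach([10.0, 30.0, 60.0, 40.0, 20.0, 10.0], coeffs))
```

With a very small bed slope S0, the expression for X can become negative, which is consistent with the difficulty of representing flat, wide floodplains as one-dimensional flow noted above.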
Four simulation periods between 2007 and 2011 were used for calibration and validation of the model, each representing a different rainfall return frequency. The calibration period extended from August 2008 to October 2008 and represented a return frequency of between 5 and 10 years for a 24 h duration. The validation periods occurred in October 2007, March 2010, and October 2011, with return frequencies of less than 1 year, approximately 1 year, and between 10 and 25 years, respectively, all for a 24 h duration. Two sets of daily precipitation data were used in the models: rain gauge observations and radar-derived estimates. USGS stream gauge measurements are available at multiple locations within the model. Certain gauges are at the outlets of sub-basins and thereby represent the area of the upstream sub-basin(s). Gauges located within the flow way of the SJR represent all upstream sub-basins, which allows a wide range of spatial scales to be analyzed. This paper provides an abbreviated summary of the extensive model development and calibration effort; the reader is referred to Tancreto [25] for complete details regarding the model. The complete thesis can be downloaded from the University of North Florida Digital Commons.

Precipitation Input to Models
The SJRWMD and SFWMD each operate a large network of rain gauges in or near the project area. A majority of the data have undergone quality assurance procedures, which improve the accuracy and reliability of the data. Spatial estimates of precipitation across the entire model domain were developed using the Thiessen Polygon method, which defines an individual area of influence surrounding each gauge through a Thiessen Polygon network. Each polygon is formed by the perpendicular bisectors of the lines joining adjacent gauges and represents an area of effectively uniform depth. The gauge data, collected at a single point, are assumed to be representative of the entire Thiessen Polygon. A common problem with the Thiessen Polygon method arises when the gauge network is sparse: the polygons become large, and areal rainfall is implicitly assumed to be spatially homogeneous over large areas.
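Because every point inside a Thiessen polygon is closer to that polygon's gauge than to any other gauge, the Thiessen weights for a sub-basin can be approximated by assigning equal-area grid cells inside the sub-basin to their nearest gauge. The sketch below does this in plain Python; it is a raster approximation assumed for illustration, not the ArcGIS 10.1 polygon-overlay workflow used in the study, and the function name, arguments, and example geometry are hypothetical. Coordinates are assumed to be in a projected (planar) system.

```python
import math


def thiessen_weights(cells, gauges):
    """Approximate Thiessen (nearest-gauge) area weights for one sub-basin.

    cells:  list of (x, y) centres of equal-area grid cells inside the sub-basin
    gauges: dict mapping gauge id -> (x, y) gauge location (same projected CRS)
    Returns a dict mapping gauge id -> fraction of the sub-basin area it represents.
    """
    counts = {gid: 0 for gid in gauges}
    for cx, cy in cells:
        # A cell lies in the Thiessen polygon of its nearest gauge.
        nearest = min(gauges, key=lambda gid: math.hypot(gauges[gid][0] - cx,
                                                         gauges[gid][1] - cy))
        counts[nearest] += 1
    return {gid: n / len(cells) for gid, n in counts.items()}


# Hypothetical 10 x 5 km sub-basin on a 1 km grid with two gauges at its ends.
cells = [(x + 0.5, y + 0.5) for x in range(10) for y in range(5)]
print(thiessen_weights(cells, {"A": (0.0, 2.5), "B": (10.0, 2.5)}))
```

A finer grid gives a closer approximation to the exact polygon areas; gauges outside the sub-basin correctly receive zero weight if no cell is nearest to them.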
Hourly WSR-88D NEXRAD radar data, gauge-adjusted using the network of rain gauges within the project area, were obtained from the SJRWMD for this study. In this adjustment, the gauge and radar data are combined to calculate a gauge-radar ratio, which is applied in a radar calibration algorithm to derive a gauge-adjusted rainfall dataset. Additionally, proprietary geographic information system (GIS) algorithms are used to reduce or eliminate discontinuities and ground clutter in the radar station data [26]. The SJRWMD radar data product is generated on a 2 km pixel-resolution grid that covers the entire project area. The data were aggregated to a daily time step to ensure consistent temporal scaling across the two precipitation input types.
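The SJRWMD adjustment algorithm is proprietary and its details are not given here [26]. As a rough illustration of the gauge-radar ratio idea, the sketch below assumes the simplest form, a single mean-field gauge/radar (G/R) ratio applied to every pixel, followed by the hourly-to-daily aggregation described above. The function names are hypothetical, and the aggregation assumes the hourly series starts at a day boundary.

```python
def gauge_radar_adjust(radar_at_gauges, gauge_obs, radar_field):
    """Scale a radar rainfall field by a single gauge/radar (G/R) ratio.

    radar_at_gauges: radar totals at the pixels containing gauges
    gauge_obs:       matching gauge totals (same order)
    radar_field:     radar values for all pixels to be adjusted
    """
    radar_total = sum(radar_at_gauges)
    if radar_total <= 0.0:
        return list(radar_field)           # no radar rain at gauges: leave unadjusted
    ratio = sum(gauge_obs) / radar_total   # mean-field G/R ratio
    return [ratio * r for r in radar_field]


def hourly_to_daily(hourly):
    """Sum an hourly rainfall series (length a multiple of 24) into daily totals."""
    if len(hourly) % 24 != 0:
        raise ValueError("series must contain whole days of hourly values")
    return [sum(hourly[i:i + 24]) for i in range(0, len(hourly), 24)]


print(gauge_radar_adjust([2.0, 4.0], [3.0, 6.0], [1.0, 2.0, 10.0]))
print(hourly_to_daily([0.5] * 48))
```

Operational schemes typically compute such ratios locally or per time step rather than once for the whole domain; the single ratio here is only the simplest case.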
Both precipitation input types required preprocessing for use within HEC-HMS because the methods selected to compute runoff processes are lumped, so each sub-basin requires a single areal precipitation input. The precipitation method selected in HEC-HMS was the Gauge Weights method, which uses separate parameter data for each gauge and for each sub-basin in the model. ArcGIS 10.1 [27] was used to compute the Thiessen polygons and the relative area of each sub-basin (from the sub-basin shapefile) within each polygon.
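The lumped precipitation for a sub-basin is a weighted average of gauge records. The sketch below assumes the weights are the Thiessen area fractions computed in GIS and normalizes by their sum; it illustrates the idea behind gauge weighting rather than HEC-HMS's internal implementation, and the names and values are hypothetical. For NEXRAD input, the same function could be applied with each 2 km pixel treated as a "gauge" weighted by the fraction of the sub-basin it covers (an assumption about how pixels would be aggregated).

```python
def areal_precipitation(weights, series):
    """Weighted areal-average precipitation for one sub-basin.

    weights: gauge id -> weight (e.g. Thiessen area fraction of the sub-basin)
    series:  gauge id -> list of daily precipitation depths (mm), equal lengths
    """
    total_weight = sum(weights.values())
    n_days = len(series[next(iter(weights))])
    return [
        sum(w * series[gid][day] for gid, w in weights.items()) / total_weight
        for day in range(n_days)
    ]


basin_weights = {"A": 0.25, "B": 0.75}                 # hypothetical gauge weights
daily = {"A": [4.0, 0.0, 12.0], "B": [8.0, 2.0, 0.0]}  # hypothetical daily depths (mm)
print(areal_precipitation(basin_weights, daily))
```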

Comparison of Precipitation Measurements
The quality of radar-rainfall measurements, although continuously advancing, remains largely uncertain. It is important to characterize the relationship between the precipitation datasets, as they directly affect hydrologic modeling results. Because the comparison of precipitation inputs in hydrologic simulation is the main objective of this research, the degree of similarity between the rainfall datasets must be analyzed. Comparative statistics aid in understanding the potential differences in precipitation values, the quality of the radar data, and the overall bias of radar-generated rainfall compared to rain gauge rainfall quantities. The two statistical measures employed are the bias (B) and the root mean square difference (RMSD), as used in several studies [4,9,11]. The estimation bias (B) is the ratio of the difference between the radar total and the rain gauge total to the rain gauge total, as given in Equation (1):

$$B = \frac{\sum_{i=1}^{n} R_i - \sum_{i=1}^{n} G_i}{\sum_{i=1}^{n} G_i} \qquad (1)$$
The RMSD was calculated to quantify the deviation of the radar estimates from the observed gauge values. The RMSD is the square root of the mean squared difference between the paired radar and gauge values; it equals the standard deviation of the differences only when their mean difference is zero. It is used to evaluate the agreement between the study rain gauge and NEXRAD data. The RMSD is given in Equation (2):

$$\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(R_i - G_i\right)^2} \qquad (2)$$
where $G_i$ and $R_i$ represent the ith-day precipitation of the gauge and radar, respectively, and n is the number of radar-gauge pairs.
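Equations (1) and (2) translate directly into code. The sketch below computes both from paired daily series at one comparison location; the series values are made up for illustration, the function names are hypothetical, and multiplying B by 100 gives the percent bias reported in Table 1.

```python
import math


def total_bias(gauge, radar):
    """Equation (1): (sum of radar - sum of gauge) / sum of gauge."""
    gauge_total = sum(gauge)
    return (sum(radar) - gauge_total) / gauge_total


def rmsd(gauge, radar):
    """Equation (2): root mean square difference of paired daily values."""
    if len(gauge) != len(radar) or not gauge:
        raise ValueError("need equal-length, non-empty paired series")
    return math.sqrt(sum((r - g) ** 2 for g, r in zip(gauge, radar)) / len(gauge))


g = [10.0, 0.0, 20.0, 10.0]   # hypothetical daily gauge depths (mm)
r = [12.0, 1.0, 18.0, 13.0]   # hypothetical daily radar depths (mm)
print(f"bias = {100 * total_bias(g, r):.1f}%, RMSD = {rmsd(g, r):.2f} mm")
```

Note that the two measures are complementary: daily over- and underestimates can cancel in B while still contributing to RMSD.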
In previous studies, radar estimates have been compared with the corresponding gauge observations [6,11,12], normally using the radar pixels (single grid cells) that contain gauges. As discussed earlier, NEXRAD data processing uses the gauge observations to adjust, correct, and sometimes replace radar estimates, which causes a lack of independence between the two sets of estimates at these locations [11]. Therefore, a direct comparison of the 2 × 2 km grid cell with the corresponding rain gauge (point) data may not adequately represent the bias present within the dataset. To quantify the total bias between the datasets, the precipitation inputs of selected sub-basins were chosen as the comparative sampling locations. Additionally, the distance from the rain gauge location to the centroid of each sub-basin was computed. This was important because, as stated above, bias between the gauge and radar data may be reduced (through radar-gauge correction) at locations close to the gauges. The coefficient of determination between bias and this distance was therefore used to assess whether the calculated biases were influenced by gauge location. The purpose of this comparison was not to examine the accuracy of the NEXRAD data in full detail, but to provide insight into the bias that may be present when performing simulations. The sub-basins chosen for comparison are located throughout the modeled area and are believed to be representative of the entire basin. Table 1 shows that, for seven representative sub-basin rainfall measurements, the total percent bias ranges from −36.1% to 53.3%. The average overall biases for 2007 and 2008 were the greatest, at 16% and 9%, respectively, while the average biases for 2010 and 2011 were close to zero.
This may reflect continuing improvements in radar estimation accuracy in recent years, resulting in higher-quality radar data. The large bias values indicate that substantial differences may exist between the radar estimates and the corresponding gauge measurements for many of the sub-basins within the model. Additionally, a majority of the sub-basins show a positive bias, which indicates that the NEXRAD data may overestimate rainfall compared to the corresponding gauges. The total bias was plotted against the distance from the rain gauge to the sub-basin centroid, and linear regression yielded coefficients of determination ranging from 0.13 to 0.45. Therefore, no strong correlation between total bias and distance from the rain gauge could be detected. The RMSD values quantify the random differences between the datasets and thus the magnitude of error; as shown in Table 1, RMSD values of up to 12.7 mm were calculated. Errors of this magnitude can significantly affect hydrologic runoff calculations through over- or underestimation of rainfall amounts.
The comparative statistical tests show that bias and error can occur between the radar and rain gauge datasets. The model calibration process largely acts to correct this error or bias in precipitation data through parameter adjustment [13]. Many studies discuss the importance of model recalibration when switching precipitation products [2,11,13,28]. Therefore, minor recalibration of the model was performed for each precipitation product to ensure that the model parameters were acceptable and produced the best possible hydrologic simulation results.

Model Calibration and Validation
The HEC-HMS model was calibrated and validated using observed storm events in the study area. The initial calibration was completed using the rain gauge data. Once the model was deemed calibrated, the NEXRAD-derived data were input and the model was recalibrated. The primary goal of calibration is to match the simulation results to the observed data as closely as possible. The calibration locations were determined by the discharge gauges with data available during the calibration timeframe and are shown in Figure 2. Table 2 summarizes the initial calibration performance of the model using rain gauge precipitation input, which was considered highly satisfactory. Each of the goodness-of-fit statistics is discussed in more detail below. After the initial calibration effort was completed, calibration and validation performance were compared for both precipitation input datasets. Having both observed discharge data and simulation results allows a direct evaluation of model performance under the two precipitation inputs. This comparison used two statistical measures: the coefficient of determination (r²) and the Nash-Sutcliffe efficiency coefficient (NSE). These statistics were used to quantitatively compare the hydrologic simulation results and determine which precipitation input method yielded the most accurate results for the various simulation events.
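The coefficient of determination (r²) is the squared correlation between the observed and simulated values; it describes the proportion of the variance in the observed data explained by a linear relationship with the simulated data and is used to assess goodness of fit at each calibration point. It indicates how much of the variability in observed discharge the model reproduces. The coefficient of determination ranges from 0 to 1, with 1 indicating a perfect fit with all variance explained. Because r² measures only linear association, a simulation with a consistent magnitude bias can still attain a high r². The coefficient of determination is given in Equation (3):

$$r^2 = \frac{\left[\sum_{i=1}^{n}(O_i - \bar{O})(S_i - \bar{S})\right]^2}{\sum_{i=1}^{n}(O_i - \bar{O})^2 \sum_{i=1}^{n}(S_i - \bar{S})^2} \qquad (3)$$

where $O_i$ is the observed data on the ith day; $S_i$ is the simulated data on the ith day; $\bar{O}$ and $\bar{S}$ are the observed and simulated mean values, respectively; and n is the number of observations.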
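Equation (3) can be computed directly from paired observed and simulated series. The same formula gives the r² of a simple linear regression, such as the bias-versus-distance regression described earlier. The function name and example series below are hypothetical.

```python
def r_squared(observed, simulated):
    """Equation (3): squared correlation of observed and simulated series."""
    n = len(observed)
    o_bar = sum(observed) / n
    s_bar = sum(simulated) / n
    cov = sum((o - o_bar) * (s - s_bar) for o, s in zip(observed, simulated))
    var_o = sum((o - o_bar) ** 2 for o in observed)
    var_s = sum((s - s_bar) ** 2 for s in simulated)
    return cov ** 2 / (var_o * var_s)


obs = [1.0, 2.0, 3.0, 4.0]       # hypothetical daily discharge
sim = [2.0 * q for q in obs]     # simulation that doubles every value
print(r_squared(obs, sim))
```

The doubled simulation still yields r² = 1 despite overestimating every value by 100%, which illustrates why r² alone does not capture magnitude bias.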
The NSE [29] is a normalized statistic that determines the relative magnitude of the residual variance compared to the measured data variance and indicates how well the plot of observed versus simulated data fits the 1:1 line [30]. The NSE is given in Equation (4):

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}(O_i - S_i)^2}{\sum_{i=1}^{n}(O_i - \bar{O})^2} \qquad (4)$$

where $O_i$ is the observed data on the ith day; $S_i$ is the simulated data on the ith day; $\bar{O}$ is the observed mean value; and n is the number of observations.
NSE ranges from −∞ to 1: NSE = 1 corresponds to a perfect match between simulated and observed discharge; NSE = 0 indicates that the model predictions are as accurate as the mean of the observed data; and NSE < 0 occurs when the observed mean is a better predictor than the model, which indicates unacceptable performance [30]. The St. Johns River Water Supply Impact Study [20] used the Nash-Sutcliffe statistic to describe the calibration performance of its hydraulic model. Following a similar methodology, the NSE values are divided into intervals that describe performance: 0.75 < NSE ≤ 1 is rated "very good", 0.65 < NSE ≤ 0.75 "good", 0.50 < NSE ≤ 0.65 "satisfactory", and NSE ≤ 0.50 "unsatisfactory".
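Equation (4) and the performance ratings can be sketched as follows. Treating each interval's upper bound as inclusive is an assumption about boundary values; the function names and example series are hypothetical. The three example simulations are a perfect match, a constant equal to the observed mean, and a simulation that doubles every observed value.

```python
def nse(observed, simulated):
    """Equation (4): Nash-Sutcliffe efficiency."""
    o_bar = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - o_bar) ** 2 for o in observed)
    return 1.0 - residual / spread


def nse_rating(value):
    """Map an NSE value to the performance ratings listed above."""
    if value > 0.75:
        return "very good"
    if value > 0.65:
        return "good"
    if value > 0.50:
        return "satisfactory"
    return "unsatisfactory"


obs = [1.0, 2.0, 3.0, 4.0]   # hypothetical daily discharge
for sim in ([1.0, 2.0, 3.0, 4.0], [2.5] * 4, [2.0, 4.0, 6.0, 8.0]):
    value = nse(obs, sim)
    print(f"NSE = {value:.2f} ({nse_rating(value)})")
```

The three cases give NSE values of 1, 0, and strongly negative, matching the interpretation of the NSE range given above.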
The Wilcoxon signed-rank test is a nonparametric hypothesis test for paired (repeated) measurements; it tests whether the median of the paired differences between two related groups is zero, and is therefore also known as a paired difference test. The test ranks the absolute values of the nonzero paired differences and sums the ranks by sign to obtain the test statistic W, whose distribution under the null hypothesis (that the distributions of the paired groups are equal) is known. From W, a z-score and p-value can be calculated. If the calculated p-value is less than the significance level (α), the null hypothesis is rejected and the difference between the groups is considered statistically significant. A significance level of 0.05 was selected for all tests. The test was used to determine whether the differences in NSE and r² between the NEXRAD-driven and rain gauge-driven simulations were statistically significant. If the null hypothesis is rejected, the difference between the simulation results is unlikely to have occurred by chance, and the simulation results from the two precipitation input types do indeed differ.
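A minimal sketch of the signed-rank test with the normal approximation is shown below, written without external libraries. It reports W+ (the rank sum of positive differences); statistical packages differ in whether they report W+, W-, or the smaller of the two, and may apply tie, continuity, or exact small-sample corrections, so its output need not match the W values reported in this paper. The function name and example NSE values are hypothetical.

```python
import math


def wilcoxon_signed_rank(before, after):
    """Two-sided Wilcoxon signed-rank test using the normal approximation.

    Pairs with zero difference are dropped; tied |differences| get average ranks.
    No tie or continuity correction is applied to the variance.
    Returns (w_plus, z, p), where w_plus is the rank sum of positive differences.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1        # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided p = 2 * (1 - Phi(|z|))
    return w_plus, z, p


gauge_nse = [0.42, -0.10, 0.55, 0.61, 0.20, -0.35, 0.70]   # hypothetical
nexrad_nse = [0.60, 0.45, 0.58, 0.59, 0.52, 0.40, 0.74]    # hypothetical
print(wilcoxon_signed_rank(gauge_nse, nexrad_nse))
```

Positive differences here mean the NEXRAD-driven simulation scored higher at that location; for small samples, an exact test is generally preferred over the normal approximation.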

Simulation Results at Different Rainfall Return Frequencies
The four simulation runs, representing the different rainfall return frequencies, were completed using the two rainfall input types. The final statistical results are presented on a per-simulation basis, within a comparative framework designed to determine which precipitation input yielded more accurate results. The r² and NSE values for each precipitation input type for the model calibration and validation periods are shown in Tables 3 and 4, respectively.
The model performance measures during the 2008 calibration period show strong agreement between the simulated and observed streamflow data, as indicated by the high r² and NSE values. The outcomes for the two precipitation input types were compared using the Wilcoxon signed-rank test, which determined that the distributions of the two groups did not differ significantly (W = 34.5, z = 0.13, p = 0.917 for r² and W = 325, z = 0.297, p = 0.797 for NSE). Both p-values suggest that there is no significant improvement in simulated streamflow when using the NEXRAD data as compared to the rain gauge data.
The validation performance results for the October 2007 simulation event vary considerably by discharge location, with a majority of locations showing relatively low performance values. The NEXRAD r² values were compared against the rain gauge r² values to determine improvement or decline. The median r² values were 0.53 and 0.70 for the rain gauge and NEXRAD data, respectively, a favorable increase in performance. However, the Wilcoxon signed-rank test gave W = 55 and p = 0.233, so the null hypothesis could be rejected only at a significance level of about 0.23, well above the selected α of 0.05; the improvement in r² is therefore not statistically significant. With rain gauge input, many locations had negative NSE values, indicating that the observed mean was a better predictor than the model. The use of the NEXRAD data substantially improved NSE at ten USGS discharge gauge locations. The median NSE values were −0.06 and 0.53 for the rain gauge and NEXRAD data, respectively. For the NSE results, the calculated W value was 81 with a p-value of 0.01, indicating a statistically significant difference in medians at the 0.05 level. Therefore, the results suggest that the NEXRAD-driven simulations estimate flow more accurately than the rain gauge simulations for rainfall return frequencies of less than 1 year at a duration of 24 h.
The 2010 and 2011 validation performance results show a strong relationship between the simulated and observed streamflow data for both precipitation input types. The r² and NSE values are relatively close to one for a majority of the locations. An interesting observation from the 2010 and 2011 validation effort was that minimal recalibration of the model was needed between precipitation input types. The Wilcoxon signed-rank test indicated no significant improvement in simulated streamflow using the NEXRAD data over the rain gauge data for either validation event.

Simulation Results at Different Spatial Scales
The model performance measures were compared across simulation events both for individual sub-basins and at locations downstream of multiple sub-basins. The purpose of this analysis was to determine whether, at certain sub-basin sizes, the NSE or r² values would improve with NEXRAD input across the simulation runs. To do so, the r² and NSE values for all simulation events were compared using the Wilcoxon signed-rank test within three sub-basin size groupings: small (less than 250 km²), medium (250 to 1000 km²), and large (over 1000 km²). For the small sub-basins, the p-values for the r² and NSE comparisons were 0.002 and 0.0002, respectively. Given these p-values, the difference in computed medians, and the previously noted improvement in r² and NSE, it was inferred that the NEXRAD data perform better at smaller sub-basin sizes. For the medium sub-basins, p-values of 0.818 for r² and 0.719 for NSE were computed. Because these p-values are much higher than the 0.05 significance level, the null hypothesis cannot be rejected: there is no statistically significant evidence that the distributions differ, and thus NEXRAD input does not improve r² and NSE values at the medium sub-basin size. For the large sub-basins, the p-values for the r² and NSE comparisons were 0.003 and 0.05, respectively. The r² difference is therefore significant at the 0.05 level, whereas the NSE p-value equals the significance level, so under the criterion p < α the null hypothesis is not rejected for NSE. This NSE p-value is, however, at the threshold of significance, suggesting that the rain gauge data may produce more accurate results than the NEXRAD data.
The p-value, higher median value, and previously determined decline in results when using NEXRAD data would suggest that rain gauge data produces more accurate results at the large sub-basin scale.
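The grouped comparison described above can be sketched in Python as follows. This is a minimal illustration, not the authors' workflow. It assumes one (area, gauge metric, NEXRAD metric) record per comparison location, and it assigns areas of exactly 250 or 1000 km² to the medium class because the text leaves those boundaries open. Zero differences are discarded, and an exact two-sided p-value is computed by enumerating the signed-rank null distribution. The helper names and the example records are hypothetical.

```python
from collections import defaultdict


def size_class(area_km2):
    """Size classes from the text; exact 250 and 1000 km^2 boundaries are assumed 'medium'."""
    if area_km2 < 250:
        return "small"
    if area_km2 <= 1000:
        return "medium"
    return "large"


def signed_rank_test(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples x and y.

    Zero differences are dropped. Tied |differences| receive average ranks, and the
    null distribution is enumerated over those ranks. Returns (W_plus, p_value).
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank |d| with average ranks for ties; store ranks doubled so they stay integers.
    order = sorted(range(n), key=lambda k: abs(d[k]))
    rank2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            rank2[order[k]] = i + j + 2  # 2 * mean of ranks i+1 .. j+1
        i = j + 1
    w2 = sum(r for r, diff in zip(rank2, d) if diff > 0)
    # Under H0 each rank is positive or negative with probability 1/2:
    # count the sign patterns that produce each (doubled) W+ value.
    counts = {0: 1}
    for r in rank2:
        nxt = defaultdict(int)
        for s, c in counts.items():
            nxt[s] += c
            nxt[s + r] += c
        counts = nxt
    total = 2 ** n
    lower = sum(c for s, c in counts.items() if s <= w2) / total
    upper = sum(c for s, c in counts.items() if s >= w2) / total
    return w2 / 2, min(1.0, 2 * min(lower, upper))


def compare_by_size(records):
    """records: iterable of (area_km2, gauge_metric, nexrad_metric) tuples."""
    groups = defaultdict(lambda: ([], []))
    for area, gauge, nexrad in records:
        gauge_vals, nexrad_vals = groups[size_class(area)]
        gauge_vals.append(gauge)
        nexrad_vals.append(nexrad)
    return {cls: signed_rank_test(nx, g) for cls, (g, nx) in groups.items()}


# Hypothetical NSE pairs for ten small sub-basins where NEXRAD always scores higher.
records = [(100 + i, 0.50 + 0.01 * i, 0.60 + 0.02 * i) for i in range(10)]
print(compare_by_size(records))  # {'small': (55.0, 0.001953125)}
```

An exact enumeration is used here because each size class contains only a modest number of paired values, so the normal approximation to the signed-rank statistic may be rough. For routine work, an established statistics library is preferable to this hand-rolled version.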

Discussion
In this paper, HEC-HMS hydrologic simulation performance using rain gauge and NEXRAD precipitation input was compared at varying spatial scales and rainfall return frequencies for a large subtropical watershed with minimal topographic relief. In addition to the comparative model performance analysis, the total bias between NEXRAD and rain gauge precipitation within the study area was investigated. Precipitation measurements arguably have the most critical influence on model performance, so the need for high-quality input data is apparent. Comparing hydrologic simulation results driven by radar and rain gauge input helps identify the conditions under which the more cumbersome, but finer-resolution, radar data yields the greatest gain. This research provides guidance on the spatial scales and rainfall return frequencies for which radar data would yield more accurate hydrologic results.
Calibration and validation of a HEC-HMS hydrologic model of the Upper and Middle St. Johns River Basins were completed for four storm simulation periods, each representing a different rainfall return frequency. The model was first calibrated using precipitation data from rain gauges located within or near the watershed boundary. NEXRAD data were then obtained as an alternative precipitation input, and rain gauge and NEXRAD precipitation estimates were compared at seven locations within the model domain. This comparison showed that NEXRAD total precipitation exceeded gauge total precipitation for the majority of the sub-basins and that recalibration between precipitation inputs was necessary.
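The location-by-location comparison of precipitation totals can be summarized with a short Python sketch like the one below. The text does not give the bias formula, so the sketch assumes two common choices. The first is the per-location percent difference of the NEXRAD total relative to the gauge total. The second is an overall gauge-to-radar ratio (sum of gauge totals divided by sum of NEXRAD totals), of the kind often used as a mean-field bias factor. Location IDs and depths are hypothetical.

```python
def precipitation_bias(nexrad_totals, gauge_totals):
    """Compare event precipitation totals (same units) keyed by location ID.

    Assumes gauge totals are non-zero. Returns the per-location percent difference
    of NEXRAD relative to gauge, the number of locations where NEXRAD exceeds the
    gauge, and the overall gauge/NEXRAD ratio (< 1 means NEXRAD is higher overall).
    """
    common = sorted(set(nexrad_totals) & set(gauge_totals))
    if not common:
        raise ValueError("no locations in common")
    pct = {
        loc: 100.0 * (nexrad_totals[loc] - gauge_totals[loc]) / gauge_totals[loc]
        for loc in common
    }
    n_higher = sum(1 for loc in common if nexrad_totals[loc] > gauge_totals[loc])
    ratio = sum(gauge_totals[loc] for loc in common) / sum(
        nexrad_totals[loc] for loc in common
    )
    return pct, n_higher, ratio


# Hypothetical event totals in millimetres at three comparison points.
nexrad = {"A": 110.0, "B": 95.0, "C": 120.0}
gauge = {"A": 100.0, "B": 100.0, "C": 100.0}
pct, n_higher, ratio = precipitation_bias(nexrad, gauge)
print(pct, n_higher, round(ratio, 3))  # {'A': 10.0, 'B': -5.0, 'C': 20.0} 2 0.923
```

A ratio below 1, as in this toy example, indicates that NEXRAD reads higher than the gauges in aggregate. That is the general situation the calibration comparison above describes.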
Model performance was evaluated both visually and statistically against observed USGS hydrograph data. Two performance measures, the Nash-Sutcliffe efficiency (NSE) and the coefficient of determination (r²), were used to quantitatively compare the NEXRAD and rain gauge simulations with the observed USGS discharge. In addition, a statistical hypothesis test, the Wilcoxon signed-rank test, was used to evaluate the difference in model performance between the two precipitation input types. Overall, the NSE and r² values for the 2008, 2010, and 2011 simulations were similar and very promising (the majority exceeded 0.75), indicating that the model predicts streamflow with a high level of accuracy using either NEXRAD or rain gauge input. The Wilcoxon signed-rank test results indicate no significant improvement or decline in simulated streamflow accuracy when using NEXRAD input for rainfall return frequencies of approximately the 1-year, 24-h event and greater. For the 2007 event, the NEXRAD precipitation data predicted the magnitude and timing of the peak better than the rain gauge data, as reflected in higher r² and NSE values, and the difference in medians is statistically significant at α = 0.05. Thus, NEXRAD data were shown to produce more accurate streamflow simulations for rainfall return frequencies below the 1-year, 24-h event. The model performance measures were also compared across simulation events at multiple spatial scales.
These comparisons showed that NEXRAD data produced more accurate simulated streamflow for the small sub-basins or watershed areas (less than 250 km²); neither NEXRAD nor rain gauge input showed consistent improvement or decline in streamflow accuracy for the medium sub-basins or watershed areas (250 km² < x < 1000 km²); and rain gauge data produced more accurate simulated streamflow for the large sub-basins or watershed areas (over 1000 km²). The differences in median performance statistics for the small and large sub-basins were statistically significant.
The results of this study suggest that at small spatial scales and low return frequencies, NEXRAD data may produce more accurate streamflow estimates. Given the performance of the radar data in this study, it may be inferred that spatially averaging rain gauge data with Thiessen polygons provides similar or more accurate rainfall estimates at large spatial scales and higher rainfall return frequencies. This may indicate that the rain gauge network is not dense enough to capture the spatial variability of smaller storm systems, which may be more convective in nature. In addition, smaller spatial scales are highly sensitive to rainfall input, which increases the value of the higher spatial resolution of NEXRAD data. It is important to note that the results of this study are conditioned on the modeling platform and precipitation data used. Many factors should be analyzed before a precipitation input is selected, such as the spatial and temporal scale of the model, rain gauge data availability (coverage), radar data quality, the rainfall event, and the model structure and spatial discretization. The conclusions of this study may not apply to all watersheds, so care should be taken when applying them to future modeling efforts. Further research should concentrate on identifying the rainfall return frequency threshold below which NEXRAD data may provide more accurate results at varying temporal scales. All simulation periods analyzed in this research used a daily time step over a relatively short duration. Introducing radar data at an hourly time step may further improve model performance statistics, possibly improving NEXRAD simulation results at certain return frequencies.
Further research on model spatial discretization and its relation to streamflow accuracy at different spatial scales is also needed. Because of the large size of the model domain, the spatial discretization used in this project was relatively coarse; therefore, the full range of potential benefits, including improvements in model accuracy, may not have been realized.
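The Thiessen polygon averaging of gauge data discussed in this section can be approximated on a grid, as in the minimal Python sketch below. This is an illustration under stated assumptions, not the study's GIS procedure. Each equal-area cell centre inside a sub-basin is assigned to its nearest gauge, and the fraction of cells captured by each gauge is used as that gauge's Thiessen weight. Gauge IDs, coordinates, and depths are hypothetical.

```python
import math
from collections import Counter


def thiessen_weights(cells, gauges):
    """Approximate Thiessen weights from equal-area grid cells.

    cells:  list of (x, y) centres of equal-area cells covering one sub-basin
    gauges: dict mapping gauge ID -> (x, y) location in the same coordinate system
    Each cell is assigned to its nearest gauge (ties go to the first gauge listed),
    and a gauge's weight is the fraction of cells it captures.
    """
    nearest = Counter(
        min(gauges, key=lambda g: math.dist(centre, gauges[g])) for centre in cells
    )
    return {g: nearest[g] / len(cells) for g in gauges}


def areal_mean(weights, depths):
    """Basin-average depth as the weight-averaged gauge depths."""
    return sum(w * depths[g] for g, w in weights.items())


# Hypothetical 10 x 10 km sub-basin on a 1 km grid with one gauge on each side.
cells = [(i + 0.5, j + 0.5) for i in range(10) for j in range(10)]
gauges = {"G1": (0.0, 5.0), "G2": (10.0, 5.0)}
w = thiessen_weights(cells, gauges)
print(w, areal_mean(w, {"G1": 40.0, "G2": 60.0}))  # {'G1': 0.5, 'G2': 0.5} 50.0
```

With radar input, each cell would instead carry its own precipitation estimate. A small convective cell falling between the two gauges would therefore appear in a radar-based average but could be missed entirely by this gauge-based one. This contrast is consistent with the small-basin results reported above.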
In summary, this study has shown that automatically using a larger and more complex precipitation input dataset is not justified in flatter, subtropical watersheds. In future work, model developers should carefully weigh whether more complicated and costly model inputs provide benefits that justify their cost, since those benefits may be negligible.
Funding: This study was partially funded by grant #27884 from the St. Johns River Water Management District (SJRWMD) to the University of North Florida as a subcontractor to the University of Central Florida. It was part of a larger study on the economic evaluation of wetlands for the St. Johns River.