1. Introduction
Hydrologic models are essential for improving our understanding of the various components of the hydrologic cycle and are valuable tools for water resources modeling, drought and flood forecasting, and climate change impact assessment studies. In recent decades, efforts have been made to advance hydrologic modeling and forecasting through the use of spatially distributed models [
1,
2,
3,
4,
5,
6]. Spatially distributed models are supported by readily available geographic information system (GIS) data and rapidly increasing computing power, and it has been anticipated that spatially distributed models would provide more accurate and timely hydrologic information due to their innate ability to account for basin heterogeneities and spatially distributed inputs [
7,
8,
9,
10,
11]. Critically, distributed models are able to simulate hydrologic responses at interior locations within the basin drainage network, a benefit not afforded by lumped models.
Quantifying hydrologic variability across a range of scales is a primary focus of the prediction in ungauged basins (PUB) initiative, where ungauged basins are described as “one with inadequate records (in terms of both data quantity and quality) of hydrological observations to enable computation of hydrological variables of interest at the appropriate spatial and temporal scales, and to the accuracy acceptable for practical applications” [
12]. Despite decades of research, hydrologic behavior in ungauged basins is still often misunderstood or misrepresented in model applications [
13]. This is largely because there is an incomplete process understanding of the multi-scale spatial-temporal heterogeneities across different landscapes and climates [
14].
From an operational standpoint, distributed models provide an opportunity to expand the information available in forecasts, including streamflow at interior locations, uncertainty assessment, and soil moisture conditions [
10,
11,
15,
16]. Fang et al. [
17] showed that the use of appropriately structured flexible, distributed models with a-priori estimation from on-site measurements could produce robust estimations of snowpack, soil moisture, and streamflow at multiple scales. Another study by Reed et al. [
18], as part of the first phase of the Distributed Model Inter-Comparison Project (DMIP), reported that when calibrated to the outlet of larger parent basins, distributed models produced reasonable performance at interior locations where no explicit calibration was conducted. However, when these models were calibrated at the outlet of relatively smaller parent basins, a degradation of the performance at interior locations was observed. They argued that the lack of explicit calibration at interior sites does not fully explain poor model performance at these points.
Given the increasing demands for higher resolution information at scales necessary to support water resources relevant forecasts [
14], the Office of Hydrologic Development (OHD) developed the Hydrology Laboratory Research Distributed Hydrologic Model (HL-RDHM) [
1,
2]. The HL-RDHM uses the spatially distributed version of the Sacramento Soil Moisture Accounting model (SAC-SMA) [
19] and supports investigations such as forecasting at an increased number of locations (interior points), improving flash flood information, and utilizing a framework to test new spatially distributed products [
20]. While distributed models can represent the spatial variability of watershed response in a more realistic manner than the traditional lumped modeling, it is also necessary to properly account for the spatial variability of precipitation to accurately characterize the hydrologic response [
21,
22]. In this study, we aim (1) to quantify how the accuracy of simulated discharge changes with basin scale for a distributed model calibrated only to the outlet, and (2) to characterize the impact of the spatial resolution of precipitation inputs on the accuracy of simulated streamflow across basin scales. Both these objectives further the understanding of modeling and forecasting for small ungauged basins.
Using the HL-RDHM framework, we tested two precipitation products with varying spatial resolutions, the North American Land Data Assimilation System Phase 2 (NLDAS-2) with a nominal 12.5 km resolution and the Stage IV quantitative precipitation estimate with an approximate 4 km resolution, as hydrologic model input. Discharge simulations were conducted for three watersheds in the North-Central United States for the evaluation period 2003–2016. The model was calibrated with observed discharge from each watershed outlet, defined as the location of a United States Geological Survey (USGS) gage. Interior locations were left uncalibrated and used to evaluate model performance and predictive uncertainty for interior ungauged basins.
2. Materials and Methods
2.1. Hydrologic Model
The HL-RDHM is a spatially distributed hydrologic model that includes a rainfall-runoff model (the Sacramento Soil Moisture Accounting Model; SAC-SMA) (NOAA/NWS, Silver Spring, MD, USA), a snow model (SNOW-17) (NOAA/NWS, Silver Spring, MD, USA), and hillslope watershed and channel routing models to simulate hydrologic processes. We used the modified version of the SAC-SMA, referred to as the SAC-SMA Heat Transfer model (SAC-HT) (NOAA/NWS, Silver Spring, MD, USA), within the HL-RDHM V3.5.1. The SAC-HT incorporates a physically-based frozen ground model [
23]. The SAC-HT accounts for storage and flow in the subsurface using a two-layer soil structure. Each zone has free water storages that represent water drained by gravitational forces and tension water storages that represent water removed through evaporation and transpiration. Discharge contributions to the stream occur from both upper and lower zones representing surface runoff, interflow, and baseflow processes. Surface runoff occurs when precipitation exceeds the interflow and percolation capacities and the upper zone tension storage is full [
24]. The SNOW-17 is a snow accumulation and ablation model that uses air temperature as an index to determine energy exchange across the snow–air interface [
25]. The surface and subsurface flow routing processes are based on hillslope slope, roughness, and drainage density. Water is routed from upstream to downstream cells by using a cell-to-cell connectivity file. The connectivity sequence was developed by the OHD and defined using surface flow directions derived from a digital elevation model (DEM). Inputs to the model are precipitation, temperature, and potential evapotranspiration (PET). PET is commonly input as daily values that are uniformly interpolated into the specified model time step.
The spatial resolution of the HL-RDHM is based on the Next Generation Weather Radar (NEXRAD) Hydrologic Rainfall Analysis Project (HRAP) grid coordinate system [
1,
8]. The HRAP coordinate system is a polar stereographic projection at a nominal grid spacing of 4 km × 4 km, or 1 HRAP pixel; however true pixel size varies with latitude [
26]. Model simulations were run at an hourly time step and at the default 1 HRAP spatial resolution with a yearlong initialization period (2002) to allow model states to equilibrate.
2.2. Study Area
Three basins in the north central United States, one basin in Iowa and two basins in Minnesota (
Figure 1), were selected for study based on discharge data availability at and upstream of the basin outlet. All watersheds are within the Upper Mississippi River basin and are forecast points of the National Weather Service (NWS) North Central River Forecast Center (NCRFC) in Chanhassen, MN. The drainage area of the basins range from 542 km
2 to 3493 km
2 (
Table 1,
Table 2 and
Table 3). Each basin is partitioned into 5–8 subbasins (
Figure 1) depending on the location of available upstream data. The subbasins, or interior locations, range in size from 19 km
2 to 2429 km
2 (
Table 1,
Table 2 and
Table 3).
The study area has topographic relief of less than 200 m of total elevation change [
27]. The landscape surrounding the Squaw and Le Sueur basins, located in Iowa and Southern Minnesota, is dominated by agriculture with large areas of row crops such as corn and soybeans. This agricultural land is typically made up of the nutrient-rich Mollisol soil order defined by the United States Department of Agriculture soil taxonomy. Land use in South-Eastern Minnesota, the location of the Cannon River basin, is predominantly agriculture or forestland. Based on the Natural Resources Conservation Service Soil Survey Geographic (SSURGO) database, soils in the Cannon River basin are a combination of both mollisols and alfisols; the alfisols soil group is a clay-enriched subsurface region most commonly found in forested areas. In Iowa and Southern Minnesota, artificial drainage networks are commonly used to drain soils and lower the water table to promote crop growth. Field monitoring studies have suggested that drainage from tiles increases annual baseflow in streams [
28,
29]. Tile drainage is not explicitly included in the utilized model.
Each study basin experiences similar climatic conditions, and depending on the time of year, is subject to various meteorological extremes. In summer months, convective thunderstorms are common due to moisture transported from the Gulf of Mexico. These convective systems account for most of the annual precipitation for the study area, occurring from early spring into late summer. Annual precipitation amounts averaged over 12 years ranges from 773 mm in southern Minnesota (45° N) to 950 mm in central Iowa (42° N) (
Table 4). Conversely, winters are cool and drier with precipitation commonly occurring in the form of snow.
2.3. Discharge Data
Hourly and daily discharge data for the basin outlets were obtained from the USGS (
Table 1,
Table 2 and
Table 3). Discharge observations at upstream sites within the Minnesota basins were available from the Minnesota Department of Natural Resources (MN DNR) Stream Hydrology Unit [
30]. The MN DNR collects and quality controls discharge data following the USGS protocols. Data for upstream sites in the Squaw Creek were collected by the Iowa Flood Center (IFC) [
5] using a network of automated stage sensors. These sensors collect and relay stage data to a central database at 15 min intervals. The data are available in near real-time through the Iowa Flood Information System (IFIS) [
31]. There were 22 bridge-mounted stage sensors installed in Squaw Creek at the time of this study; a subset of these sites were selected based on data quality and site accessibility.
To convert IFC stage data to discharge, we obtained model-generated rating curves developed by the IFC using site-specific one-dimensional HEC-RAS models built from Iowa statewide LIDAR data, existing cross-sectional survey data, and bridge plans (Ricardo Mantilla, University of Iowa, personal communication, 2 February 2016). Throughout 2015 and 2016, we conducted periodic field measurements to verify and reduce uncertainty in the modeled curves. Between four to eight manual discharge measurements were taken at each upstream location in Squaw Creek. During periods of lower flows, where channel depths were less than 1.5 feet, a wading measurement was conducted using a hand-held FlowTracker, which employs an Acoustic Doppler Velocimeter (ADV). For moderate to high flows, measurements were taken from the bridge using an Acoustic Doppler Current Profiler (ADCP). A measurement transect was completed by navigating the boat in a straight line normal to the direction of flow, while the ADCP collects depth, distance, and velocity data. A minimum of four transects were made to ensure the average difference of the discharge from each transect was less than 6%. Manual discharge measurements were added to modeled rating curves and a polynomial equation was applied to fit a curve through both modeled and measured points along the curve.
2.4. Model Inputs
PET data were obtained from the OHD and consisted of long-term average PET at the HRAP resolution. The PET values represent mid-month values from which daily values are linearly interpolated. PE adjustment factors for each month account for the seasonal variability in vegetation through the course of a year [
20].
Hourly air temperature (at 2-m) and precipitation from the NLDAS-2 were used. These data are derived using the National Center for Environmental Protection’s (NCEP) Eta-model-based Data Assimilation System (EDAS) [
32]. The NLDAS-2 [
33] hourly precipitation product is created by NCEP through merger of satellite, radar, and observation-based data and generated at a 1/8th degree (~12.5 km) spatial resolution [
34]. In this study, the geographic coordinates of each 1/8th degree pixel were converted to the HRAP coordinate system and disaggregated into a 4-km resolution for input into the HL-RDHM. NLDAS-2 precipitation grids are available from 1996 to present.
Given the focus on localized, summer rainfall events, we included a second analysis using the NCEP Stage IV product, which has a higher spatial resolution than the NLDAS-2. Stage IV is a mosaic of the multi-sensor hourly/6-hourly Stage III analyses produced by the 12 RFCs in the continental United States (CONUS) via manual quality control performed on the Stage III data. Stage IV has a spatial resolution of 16 km
2 and is a combination of ground-based gage observations and radar calculated reflectivity. The hydrological and meteorological communities commonly use this product as a reference for gridded rainfall estimates due to its national coverage and high spatial and temporal resolutions [
35,
36]. Hourly Stage IV data is available at HRAP resolution from 2002 to present.
Initial model simulations indicated discharge tended to be underestimated with both precipitation inputs. Studies have indicated a low bias in the NLDAS-2 derived precipitation product, depending on the region [
33,
37]. Similarly, Prat and Nelson [
38] determined that Stage IV exhibits a considerable low bias in comparison with surface observations on a seasonal scale, with differences ranging from −18% to −2% for winter and from −28% to 8% for summer. In our analysis, we found that the NLDAS-2 and Stage IV inputs were consistently lower than the mean areal precipitation (MAP) data provided by the NCRFC (our ground-truth) for each study basin. To correct for the low bias, a precipitation adjustment factor was developed by calculating the average difference between the individual precipitation product (NLDAS-2 and Stage IV) and the MAP data from 2002 to 2013 (
Table 4) following an approach used in Spies et al. [
27]. This precipitation adjustment factor can be defined within HL-RDHM and is applied to each HRAP pixel in the specified basin.
2.5. Model Calibration and Validation
There are 16 parameters in the HL-RDHM used to reflect basin characteristics and parameter values can vary by grid. Default a-priori parameter grids [
8] were used as an initial starting point for parameter calibration for each basin. To improve calibration efficiency, basin-scale parameter multipliers, rather than the parameters in each grid, were calibrated and applied to the a-priori parameter grids. This method greatly reduces the number of parameters needed in the calibration process, while maintain the spatial information provided by the a-priori parameters [
20]. Prior studies show that this technique is an effective way of improving a-priori model parameter grids [
1,
27,
39].
The parameter multipliers were calibrated using an automated Stepwise Line Search (SLS) procedure [
20]. The SLS technique successively steps through each parameter to optimize the multi-scale objective function for each parameter. If values remain static for three consecutive loops, the parameter multiplier is removed from further manipulation. Ten parameters were selected for calibration with multiplier ranges based on previous studies [
27,
40,
41,
42,
43,
44,
45]. Identifying the maximum and minimum multipliers for each parameter was done using the following method [
27]:
where
xmin and
xmax represent the minimum and maximum multipliers,
pmin and
pmax are the minimum and maximum values for the parameter, and α indicating the mean basin a-priori parameter value.
The model was calibrated for each precipitation input (NLDAS-2 and Stage IV) separately and using observed discharge at the outlet as the calibration criteria. To assess the accuracy of the model for simulating discharge at interior points that may be ungauged, no calibration was done using data from interior locations. The data set spans 2002–2016 for which we have a continuous record of input data. The calibration (2003–2009) and validation (2010–2016) period used a one year initialization period (2002) to allow model states to equilibrate. The calibration period contained both very wet and dry periods.
To capture the flashy response times of the small upstream basins, it was necessary to run the HL-RDHM at an hourly time step, requiring the use of hourly discharge aggregated from the instantaneous data. To verify the suitability of the hourly USGS discharge data for this study, we calibrated the HL-RDHM parameters to both hourly (real-time) and daily (quality controlled) observed discharge in two separate trials for each site, using both precipitation inputs. Time periods with missing hourly data (occurring for winter months due to icing), were omitted from the calibration process. Overall, the differences in model evaluation scores (defined in
Section 2.6) were minor; calibration to hourly data produced higher NSE scores while the calibration to daily data produced better Pbias scores. Given the minimal differences between using hourly and daily discharge data for calibration, study results will focus on the hourly simulations.
2.6. Evaluation Statistics
Four statistical metrics were used to evaluate simulated discharge: Percent Bias (Pbias), coefficient of determination (R
2), Nash–Sutcliffe efficiency (NSE), and mean absolute percent deviation (MAPD) are shown below.
Pbias measures the relative tendency of simulated flows to be larger or smaller than observed counterparts. Positive values indicate overestimation, negative values indicate underestimation, and the optimal value is zero. NSE measures the relative magnitude of the residual variance compared to the measured data variance. Values range from −∞ to 1.0, with 1.0 being the optimal value where model simulation match observed discharge. MAPD is the mean absolute error as a percentage of the deviation from observed.
Model evaluation criteria established by Moriasi et al. [
46] define a model simulation as “satisfactory” if NSE > 0.50 and Pbias ±25% for streamflow simulations with a monthly timestep. Given that models exhibit poorer performance when evaluated at shorter time steps [
47], we used an NSE > 0.40 threshold to identify a result as acceptable for an individual basin at the hourly timestep. R
2, Pbias, and MAPD are percent error metrics and are suitable for comparing across different data sets [
48], they are used evaluated how model error changes across basin scales.
4. Discussion
The simulation results for each basin outlet and corresponding upstream location indicated that model efficiencies increase with increasing basin size. Based on our findings, we infer that the HL-RDHM model can produce satisfactory streamflow estimates for basins down to ~250 km2. Below this threshold, model skill became less consistent and more variable, especially for basins less than 50 km2. We found NSE values to be below the acceptable threshold for subbasins that represented less than 20–40% of the total watershed area. The NSE <0.40 threshold generally corresponded to Pbias value >±30%, R2 value <0.30, and MAPD values >30%.
The scale effect can be explained by a simpler rainfall–runoff relationship in larger basins as these larger systems integrate many of the complexities seen at small scales, thus creating better modeling conditions for larger basins [
49]. At the smallest scales studied, both model calibration and the resolution of the input data can become problematic [
50]. We calibrated only using information at the basin outlet and the decreased model skill with upstream sites may be associated with the calibration results being unable to reflect the various physical and hydrologic characteristics of subbasins, especially when estimating high flows in smaller subbasins [
51].
Spies et al. [
27] suggested that the HL-RDHM (specifically the SAC-HT) may not be able to capture the full range of hydrologic conditions experienced by watersheds in this region. For this same study area, they found that model performance was better during the wetter months of spring when watershed response was faster, whereas performance was poorer in the drier late summer when watershed response was damped due to lower soil moisture and higher surface roughness from extensive row crop vegetation cover. Spies et al. [
27] noted that the model was limited in its ability to represent the watershed consistently well across all time periods; in this study we find the model is also unable to represent watersheds consistently well at all spatial scales tested. In both studies, a single set of parameters was identified for each watershed. Parameters that are varied in time and space may improve model performance in this region.
The use of the 4 km resolution of the HRAP cell network may be another source of model error. It is plausible that the watershed area specified in the connectivity file (used for routing) could be significantly different from the USGS estimate, 10% or more in some cases [
52]. This issue is more common in basins less than 1000 km
2 because basin boundaries cannot be precisely captured by the coarse resolution grid [
20]. This is a significant challenge for small basins where the confluence of multiple streams could be occurring within a single cell.
Basin characteristics, such as basin slope, may also play a role in the accuracy of the simulations. Spies et al. [
27] found that the HL-RDHM performed the worst in basins characterized by relatively flat topography (<1.0% average slope) for this same general study area. However, differences in the average slopes for the basins studied here vary less than 1–2%. In basins with low slopes and poorly drained soils, agricultural tiles the presence of subsurface tile drainage adds to the uncertainty of simulations because they were not explicitly represented in the model.
Overall, model statistics were better when using the NLDAS-2 product rather than Stage IV. Our conclusions surrounding these two precipitation products are similar to Wu et al. [
53], who determined that out of nine Quantitative Precipitation Estimate (QPE) products, including Stage IV, model skill was highest for simulations forced with NLDAS-2 precipitation. Wu et al. [
53], who also studied basins in Eastern Iowa but with a much larger drainage area (ranging from 2816 km
2 to 32,381 km
2), determined that when using NLDAS-2 precipitation to force model simulations, model performance tends to increase for downstream basins.
The importance of the spatial representation of precipitation likely increases with decreasing basin size, particularly when simulating discharge at the smallest scales. Smaller basins have less capacity to dampen out inputs and corresponding input errors, therefore, small-scale variability and bias in precipitation inputs has the potential to cause greater errors in simulated streamflow for small basins [
18,
54]. Faurès et al. [
55] demonstrated that even for very small basins less than 1 km
2 subject to convective events, precipitation variability must be considered. However, Brath et al. [
56] investigated a basin in north central Italy with a drainage area 1050 km
2, which is considered “mid-sized” relative to watersheds used in our study, and suggested that the spatial representation of rainfall was not important to the streamflow response at the outlet.
Stage IV is often regarded as the reference or ground-truth in Quantitative Precipitation Estimate (QPE) [
57]. However, Chen et al. [
57] concluded that Stage IV hourly data underestimated roughly 10% of precipitation volume in the NCRFC region. A primary reason for this is the limited sensitivity of the radar to frozen precipitation, resulting in underrepresented precipitation across the United States. As temperatures fall below freezing or a temperature inversion occurs, the effective range of the WSR-88D radar used to generate Stage IV data decreases, decreasing the likelihood of accurately retrieving frozen precipitation. In Iowa, many of these events may be viewed as insignificant because light or frozen precipitation do not generally result in flooding events. However, these precipitation types can be frequent enough to influence local water budgets and affect soil moisture [
57], which in turn influence the accuracy of simulated low flow discharges and initial soil moisture conditions leading into larger events in the spring and summer. Precipitation either missed or underrepresented during the winter months may have affected the calculated precipitation adjustment factor, which is used to improve the water budget of the model. The adjustment factor was created using analysis of precipitation throughout the year. If the accuracy of the precipitation data varied by season, for example underrepresented winter precipitation but accurate summer precipitation, the adjustment factor may have been larger than necessary to account for the erroneously low precipitation in the cold months. This could negatively affect discharge simulations by increasing the intensity of summer events and lead to overestimation of high flows. We could have further manipulated the adjustment factor to improve results for Stage IV simulations, but to remain consistent in our approach we chose to adjust the Stage IV input the same way as the NLDAS-2.
The accuracy of the discharge data at non-USGS sites may also be influencing results, and for the Squaw Creek basin in particular. Mantilla and Krajewski [
58] state that the accuracy of the LiDAR derived cross-section used to create the modeled rating curves is not optimal, a primary reason for not including them in routine forecasts. We had few manual measurements with which to validate the rating curve for larger flows in the Squaw Creek. Due to the time period during which measurements were taken and the short time to peak in the small basins (<6 h), observations are limited to flows in the intermediate range for larger subbasins and sub-intermediate flows for small basins. Thus, discharge estimates for high stage observations likely are the most uncertain.
5. Conclusions
While distributed models are capable of producing simulations at interior points, data limitations make understanding how well the model perform at a range of spatial scales challenging. The objective of the current study was to investigate how the accuracy of a spatially distributed model, when calibrated only to the basin outlet, changes with respect to basin size to gain insight into forecasting in small, likely ungauged, basins. The ability of the HL-RDHM to produce NSE values above 0.4 for several upstream locations indicates the potential for using this distributed model to simulate discharge at interior basins, when only calibrating the model to the basin outlet. However, for the NLDAS-2 precipitation product, the skill of the model decreased for subbasins less than 250 km2, this represents 20–40% of the total basin area depending on the watershed. For basins smaller than 250 km2, the relationship of model skill to basin area becomes less predictable. These basin size thresholds were larger when using the Stage IV, and average model performance was poorer when using Stage IV precipitation.
In the cases shown, the coarser resolution NLDAS-2 product performed better overall, although for some mid-summer events the finer resolution Stage IV produced more accurate hydrographs. This study corroborates others illustrating that inputs with higher spatial and/or temporal resolution may not necessarily result in more accurate simulations of hydrologic variables [
39,
54,
59,
60], including subbasin streamflow. The continued improvement of spatially distributed inputs, calibration approaches, and model structures will increase the potential for using a spatially distributed model to advance current emergency management and decision-support services. Application of distributed models at finer watershed scales should be a continued focus of model development research.