An Assessment of Mean Areal Precipitation Methods on Simulated Stream Flow: A SWAT Model Performance Assessment

: Accurate mean areal precipitation (MAP) estimates are essential input forcings for hydrologic models. However, the selection of the most accurate method to estimate MAP can be daunting because there are numerous methods to choose from (e.g., proximate gauge, direct weighted average, surface-ﬁtting, and remotely sensed methods). Multiple methods ( n = 19) were used to estimate MAP with precipitation data from 11 distributed monitoring sites, and 4 remotely sensed data sets. Each method was validated against the hydrologic model simulated stream ﬂow using the Soil and Water Assessment Tool (SWAT). SWAT was validated using a split-site method and the observed stream ﬂow data from ﬁve nested-scale gauging sites in a mixed-land-use watershed of the central USA. Cross-validation results showed the error associated with surface-ﬁtting and remotely sensed methods ranging from − 4.5% to − 5.1%, and − 9.8% to − 14.7%, respectively. Split-site validation results showed the percent bias (PBIAS) values that ranged from − 4.5% to − 160%. Second order polynomial functions especially overestimated precipitation and subsequent stream ﬂow simulations (PBIAS = − 160) in the headwaters. The results indicated that using an inverse-distance weighted, linear polynomial interpolation or multiquadric function method to estimate MAP may improve SWAT model simulations. Collectively, the results highlight the importance of spatially distributed observed hydroclimate data for precipitation and subsequent steam ﬂow estimations. The MAP methods demonstrated in the current work can be used to reduce hydrologic model uncertainty caused by watershed physiographic differences.


Introduction
Accurate mean areal precipitation (MAP) estimates are essential forcing data for hydrologic models [1,2]. However, quantifying MAP can be confounding, particularly when metrological conditions, topography, and/or land use influence the spatiotemporal variability of precipitation [3]. There are multiple techniques that have been used to estimate MAP including (but not limited to) nearest neighbor (or proximate gauge), direct weighted average, surface-fitting, and remote sensing (e.g., radar and satellite-based) methods. Selecting the most suitable method to estimate MAP can be daunting, in part, because there are so many methods to choose from, each with distinct strengths and weaknesses.
Studies have shown a correlation between gauge density and the accuracy of MAP estimates [4]. In regions without observed data, MAP can be simulated using statistical weather generators. However, complex surface-fitting methods can be a disadvantage. A less frequently applied alternative approach to cross-validation analysis of MAP models is to validate the models against a hydrologic model simulated stream flow [17]. There remains a great need for a hydrologic model validation of multiple MAP methods across the array of contemporary MAP modeling techniques (i.e., proximate gauge, direct weighted average, surface-fitting, and remotely sensed methods). In addition, Masih et al. [4] and Ly et al. [27] concluded that there is a need for further testing of the effects of precipitation input on the SWAT model output.
In response to the call for further investigation, a novel split-site validation of multiple MAP techniques (e.g., proximate gauge, direct weighted average, surface-fitting, and remotely sensed methods) is presented using observed precipitation and stream flow data collected at nested-scales in a mixed-land-use catchment. The overarching purpose of the current work was to show which MAP method(s) are best-suited for hydrologic simulation modeling efforts so that managers can more readily focus on viable efforts in other developing watersheds with similar physiographic characteristics as the study catchment. The objectives of the current work were to: (i) quantify the effects of 19 different MAP methods on uncalibrated SWAT model performance for stream flow, and (ii) perform a split-site validation of SWAT model simulated stream flow using multiple years of observed stream flow data collected at nested-scales from a mixed-land-use watershed of the central USA.

Site Description
The study catchment, Hinkson Creek Watershed (HCW), is a rapidly urbanizing mixed-land-use (forest, agriculture, and urban) watershed located in Boone County, Missouri, USA (Figure 1). HCW has a total drainage area of approximately 230 km 2 . Elevation ranges from 274 m above mean sea level (AMSL) in the headwaters to 177 m AMSL near the watershed outlet. The climate in HCW is dominated by continental polar air masses in the winter and maritime and continental tropical air masses in the summer [32]. The wet season occurs primarily during March through June [33]. A 16-year climate record (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015) obtained from the Sanborn Field climate station located on the University of Missouri campus showed that the mean annual total precipitation was 1036 mm, and the mean annual air temperature was 13.3 • C, respectively. Daily minimum and maximum air temperatures ranged from −23.1 • C in the winter to 41.3 • C in the summer.
The HCW was instrumented with hydroclimate (n = 5) and meteorological (n = 6) monitoring sites in 2009. Precipitation was sensed using TE525WS tipping bucket rain gauges with an accuracy of +1% at 2.5 cm/h rainfall rate to −3.5% at 5.1 to 7.6 cm/h rainfall rate during the study period [water year's (WY's) 2009 to 2015]. A total of eleven precipitation monitoring sites were used for the current work including five Hinkson Creek hydroclimate sites (noted above), and six additional meteorological sites (The Columbia Regional Airport, and the five Missouri Agricultural Experimental sites) located in Boone County, Missouri, USA (Table 1, Figure 1). Site #1 was the sub-basin with the lowest precipitation gauge density. However, the precipitation gauge at the outlet was less than 14 km from the furthest ridge of the sub-basin at Site #1, thus resulting in a relatively high-resolution spatial sampling design. The five HCW hydroclimate sites were positioned to partition the study catchment into five sub-basins using a nested-scale experimental watershed study design each with different dominate land use types [32,34]. At the time of this research, agricultural land use decreased by 18.4% and urban land use increased by 21.6% with the downstream distance from Site #1 located in the headwaters to Site #5 located near the watershed outlet.
The stage was measured at each Hinkson Creek gauging site using Sutron Accubar ® constant flow bubblers with an accuracy of 0.02% at 0-7.6 m to 0.05% at 7.6-15.4 m [35][36][37][38]. Precipitation and stage data were stored on Campbell Scientific CR-1000 data loggers. Methods used to estimate the discharge met the U.S. Geological Survey (USGS) standards [39]. The stream flow was measured using FLO-MATE™ Marsh McBirney flow meters and wading rods when the stage was less than 1-m deep.
Storm flows were measured using a U.S. Geological Survey Bridge Board™. Stage-discharge rating curves were generated using the incremental cross section method [40].

MAP Modeling
Errors in precipitation estimates occur at the point measurement of precipitation (e.g., precipitation gauge), and during conversion of the point measurements to MAP. Error in precipitation measurement at a gauge may be caused by various factors including, but not limited to, wind, splash, evaporation, mechanical failure, improper instrument calibration, a plugged gauge orifice, isolated objects (e.g., trees, buildings, and fences), observation errors, and other factors [41,42]. Because point measurement is fraught with complications, it is important to examine the quality of precipitation data before undertaking hydrologic analyses [41,42]. Therefore, double-mass plots were created to check for undercatch due to obstructions at each gauging site where the precipitation was measured [43]. The results from the double-mass curve analyses indicated undercatch precipitation at Site #1 presumably attributable to a tree line induced fetch issue. The precipitation gauge at Site #1 was moved away from the tree line to approximately 250 meters south east of Site #1 on 16 June 2011. The undercatch issue was corrected as per methods used by Wagner et al. [44], where a calculated percentage was added to the time series recorded at Site #1 before the site was moved (Site #1 was installed on 1 February 2009). Additionally, any missing precipitation data (i.e., data gaps) were estimated using inverse distance weighting methods [45].
After rainfall data post-processing, 19 different methods were used to estimate MAP for each sub-basin in HCW including two proximate gauge methods, three direct weighted average methods, ten surface-fitting methods, and three satellite-based methods ( Table 2). The MAP methods chosen for this assessment ranged from simplistic to complex in order to address the needed balance between computational complexity and the ease of use. Nineteen MAP methods were included in the analysis to avoid erroneous selection of the most convenient method while at the same time providing a comprehensive overview (and validation) of many of the different types of methods available. The proximate gauge MAP models (i.e., Centroid and Outlet) were the simplest methods and were dependent on the fewest number of precipitation gauges. For example, the centroid method used data from three precipitation gauges (Sites #1, #2, and #4), and the outlet method used data from five precipitation gauges. For the centroid method, MAP was estimated as the precipitation measured at the precipitation gauge nearest the centroid of a sub-basin. For the outlet method, MAP was estimated as the precipitation measured at the outlet of each sub-basin (Sites #1, #2, #3, #4, and #5).

Direct-Weighted Average Methods
Direct weighted averages are computationally simplistic methods used to generate estimates of MAP with the following general equation [31]: where MAP is mean areal precipitation for G gauges termed g = 1, g = 2, . . . , G, p g is the precipitation measured at each gauge, and w g are the weights based on the sub-basin area that satisfy: The coefficients for each of the direct weighted averaging models were fit using ArcGIS. The arithmetic average is the simplest direct weighted averaging method. Each gauge has an equal weight in the computation of a single arithmetic average for the entire watershed. The equation used to calculate the arithmetic average follows [31]: The arithmetic average method is useful in estimating missing precipitation data using data collected from nearby gauging stations. However, this method does not represent the spatial variability observed in the precipitation data.
The Thiessen method is a direct weighted average method performed by dividing a region into sub-regions centered on each monitoring site [7]. Each sub-basin does not line up with the Thiessen sub-region. However, the fraction of each Thiessen sub-region contributing to each sub-basin can be computed, and weighted averages can be used to estimate MAP for each sub-basin. The equations used to compute the spatial average follow: where a g is the area of each sub-region. Bethlahmy [8] developed the two-axis method where weights are derived using basic geometry. The two-axes include a major axis and a minor axis. First, three lines are drawn, (1) the longest line that can be drawn in the region, (2) its perpendicular bisector (the minor axis), and (3) is the perpendicular bisector of the minor axis (the major axis). Then, two lines are drawn from each gauge to the furthest end of each axis, and the angle of the two lines is recorded. Finally, the equations used to calculate MAP are as follows: where Å is the sum of all of the angles α g .

Surface-Fitting MAP Methods
Surface-fitting algorithms were also used to generate MAP from point gauges using the ArcGIS software. The general equations used to calculate the weighted averages for each cell are as follows [42]: where w jg are weights for each cell, p j is precipitation value for each cell, and J is the total number of all cells. The difference between various surface-fitting methods used is how the weights are calculated. Surface fitting algorithms can be classified as deterministic or statistical methods. Global polynomial surface (GPS) and local polynomial surface (LPS) functions are deterministic methods commonly used to interpolate rainfall [9]. All point measurements are considered when using the GPS method. Conversely, only point measurements within a defined radius are considered with the LPS method. For both GPS and LPS methods, the order of the polynomial function must be defined by the end user. A linear plane results from a 1st order polynomial, one turning point (i.e., curve) results in a surface generated by a 2nd order polynomial, two curves are present for the 3rd order polynomial, and so on. Unlike the GPI and LPI methods, Radial Basis Functions (RBFs) are an exact interpolator. Some examples of RBFs include spline and multiquadric interpolation. RBFs generate a smooth surface and the degree of smoothness is dependent on the kernel parameter [9]. Conversely, IDW is an inexact interpolator. Thus, unlike RBFs, IDW fitted-surfaces may not be exactly equal to the point measurements used to generate the surface. The assumption behind the method of IDW is that each unknown point in the surface is more alike its closest neighboring point measurement [10]. The equations used to calculate IDW is [31]: where d pj is distance from point j to the gauge, and −m was an exponent assigned by the end user.
Kriging is a statistical surface-fitting method where the weights to each cell minimize the variance of the interpolation error [11]. Empirical semivarience is computed using input data and a curve is fit to develop a semivariogram. Then, predictions are made for each cell using kriging weights generated from the semivariogram [31]. Some examples of kriging methods include ordinary kriging (OK) and universal kriging (UK). The OK method generates a smooth surface, minimizes the standard error, but no spatial trend in precipitation is accounted for. Conversely, the UK method can be used to account for the presence of a spatial trend or drift.
The surface-fitting methods were applied using the ArcGIS 'model builder' to efficiently interpolate observed point measured precipitation data into MAP for each HCW sub-basin (n = 5). A split-site method was used to calibrate and validate the surface-fitting models and remotely sensed data against point gauge measured precipitation. Sites #1-5, Capen Park, Bradford Center, and Columbia Regional Airport were used for calibration. Sanborn Field, South Farm, and Jefferson Farm monitoring sites were used for validation. Three model performance criteria were used to evaluate the model performance including percent bias (PBIAS), root mean square error (RMSE), and mean absolute error (MAE) [35,46,47].

Remotely-Sensed MAP Methods
MAP was also estimated using three satellite-based datasets. The three satellite-based precipitation data included PRISM, TRMM, and CHIRPS data. The PRISM data were downloaded from an Oregon State University website [48]. Daily precipitation data grids (4 km raster images) corresponding to the 'AN81d' data set were downloaded one year at a time in the .bil format. Area-averaged TRMM Multi-Satellite Precipitation Analysis (3b42) data were downloaded at a daily time step from the Giovanni website [49]. Area-averaged CHIRPS data were downloaded at a daily time step from the ClimateServ website for this research [50]. Additionally, MAP was estimated using a level III daily precipitation product of Next Generation Radar Data (NEXRAD) which were bulk downloaded from a National Weather Service File Transfer Protocol server [51].
Models were created in ArcGIS using the 'model builder' to extract precipitation data from each surface raster file to each precipitation monitoring site (n = 11) for validation. Three model performance criteria were used to evaluate the model performance including PBIAS, RMSE, and MAE [35,46,47]. Sanborn Field, South Farm, and Jefferson Farm monitoring sites were used for validation.

SWAT Modeling
After MAP was generated and the MAP models were validated, the model output was formatted for input into SWAT (SWAT 2012 (Revision 627 packaged with ArcSWAT 2012.10_2.16)) to test the effects of the 19 different MAP methods on the uncalibrated SWAT model output of the stream flow. An uncalibrated model performance assessment was appropriate in the current work. This is justifiable considering that the SWAT model was designed to generate reliable predictions in ungauged basins [6]. These results are important for end users who rely on remotely-sensed precipitation data. A 30 m digital elevation model, SSURGO soils, and the 2011 National Land Cover Data sets were used for the spatial input data required to create a SWAT project. The SCS curve number method, the Muskingum method, and the Penman-Monteith method were selected to simulate rainfall-runoff, channel routing, and evapotranspiration processes, respectively. Channel degradation, stream water quality, and algae-biochemical oxygen demand-dissolved oxygen simulations were all set to 'active'. Calibration parameters were set to reflect physically realistic values for the watershed [52]. Agricultural HRU's were set to a corn-soybean rotation with three fertilization operations (anhydrous ammonia, elemental nitrogen, and elemental phosphorus) and tandem disk tillage in the spring. Grazing pasture HRU's were set to hay growing operations, fertilizer operations (elemental nitrogen, and elemental phosphorus), and grazing operations realistic for HCW. Urban HRU's were set to a tall fescue growing operation with lawn fertilization and street sweeping operations. Water abstractions were not implemented in the SWAT model.
For each SWAT project (n = 19, one for every MAP method tested), the precipitation text files were updated from the "txt in out" folder under the "default" scenario. The uncalibrated SWAT model output was tested to isolate only the effects of the precipitation input on SWAT simulated stream flow. Additionally, the SWAT model was calibrated using the auto-calibration software SWAT-cup and validated using a split-site method for stream flow at daily, monthly, and annual time steps [5,52]. Site #4 (the HCW USGS operated gauge) was used for calibration. Sites #1, #2, #3, and #5 were reserved for validation.
In the current work, one global sensitivity analysis was applied using SWAT-cup as per methods used by Szczesniak and Pinieski [4] and Masih et al. [17]. A global sensitivity analysis was run using SWAT's default method for precipitation input (centroid method) and the top six most sensitive calibration parameters were selected for the calibration of all MAP models. The six calibration parameters selected included the Soil Conservation Service (SCS) curve number (CN2), surface runoff lag time (SURLAG), available water capacity of the soil (SOL_AWC), manning's n value for overland flow (OV_N), base flow alpha factor (ALPHA_BF), and groundwater delay (GW_DELAY). User-specified physically meaningful minimum and maximum bounds were set for CN2 (absolute −5 . Model outputs were reduced by averaging to the monthly and annual time steps for analysis because daily simulations are typically poorer than monthly and annual simulations [47,53]. Uncalibrated and calibrated model performances were assessed using three model evaluation criteria for NSE, R 2 , and PBIAS, shown in Table 3 [52]. Each MAP method was ranked by model performance values (PBIAS, NSE, and R 2 ) at each time interval (daily, monthly, and annual) for easy comparison. Then, a relative overall model performance rank was quantified by summing the ranks of the individual model performance values across all time intervals.

MAP Modeling
When the observed precipitation from all eleven monitoring sites were averaged, the results showed that the average annual total precipitation ranged from 743 mm during water year (WY) 2012 to 1543 mm during WY 2010 ( Table 4). The observed precipitation at Sanborn Field and South Farm was within 6 mm of the all-site mean, but the precipitation at Jefferson Farm was 67 mm less than the seven-year mean. Double-mass curve analyses did not indicate precipitation undercatch due to gauge obstructions at the validation sites, thus the differences in annual total precipitation were presumably due to spatial variability of precipitation during the study period.
The results from the one-way ANOVA tests showed no significant differences (p > 0.05) in the seven-year average precipitation between sites in agreement with a recent publication by Hubbart et al. [33]. Hubbart et al. [33] showed that the differences in precipitation between urban and rural sites were not significant at the 95% confidence level in the HCW. However, precipitation was shown to be slightly greater by 3.3% in the urban area of the watershed indicating a slight influence of urban land use on total annual precipitation in the rapidly urbanizing lower elevations of HCW [38].
When surface-fitting methods and remote sensing methods were validated, the results showed that the models consistently overestimated the annual total precipitation (Table 6). Results from one-way ANOVA tests showed that the seven-year average MAP was significantly (p < 0.01) greater by a range of 5.9% to 15.7% for all four remotely sensed methods relative to all other methods tested. This was an important finding from the current work with implications for end-users that need to use remotely-sensed precipitation data for hydrologic modeling efforts. However, there were no significant differences (p > 0.05) in the seven-year average MAP between proximate gauge, direct-weighted average, and surface-fitting methods.
The lack of significant differences between annual average MAP estimated by proximate gauge, direct-weighted average, and surface fitting methods was surprising in the current work, and unanticipated. It was further surprising considering that the findings from other studies showed improvements in hydrological output with computational complexity and gauge density [4,17]. For example, there were differences in the equations for each MAP method (n = 19 methods) ranging across a simple-complex gradient of computational rigor. Also notable were the marked differences in gauge density between the centroid method (n = 3 gauges), the outlet method (n = 5 gauges), and direct weighted averaging and surface-fitting methods (n = 11 gauges). The lack of statistical differences was counter to the literature, and thus should be of interest, considering differences in climate and physiography between the study sites.
Validation results showed variable results for maximum daily MAP estimates. For example, maximum daily MAP was underestimated by greater than 50% for two remote sensing methods (i.e., CHIRPS and PRISM), and two direct weighted average methods (i.e., Average and Two-Axis). Conversely, maximum daily MAP was overestimated by more than 160% for the 2nd order polynomial functions (i.e., GPI_2 and LPI_2). The 2nd order polynomial functions generated unrealistic estimates of the daily maximum MAP because the model output increased with the distance upslope of Site #1. Conversely, the 1st order polynomial functions were within 16% of the observed maximum daily MAP. Thus, in the current work, the linear polynomial functions were more accurate than the nonlinear order polynomial functions. Polynomial functions of cubic or greater order that result in a surface with multiple turning points (i.e., a rippled or wavy surface) were not tested.

Uncalibrated Model Performance
In comparison to previous work, this study is the first to show the effects of MAP input on uncalibrated SWAT stream flow at daily, monthly, and annual time steps. Considering that the model calibration influences the model output, the uncalibrated results completely isolated the effects of precipitation input on the SWAT model performance. There was a general trend for uncalibrated model performance to increase as the stream flow data were reduced by averaging from daily to annual time steps. For example, uncalibrated daily SWAT model performance ratings for stream flow were 'unsatisfactory' for every MAP method tested due to low NSE and R 2 values ( Table 7). As the streamflow data were reduced by averaging from a daily to monthly time step, NSE and R 2 values increased leading to model performance that ranged from 'unsatisfactory' to 'satisfactory'. SWAT model simulations of annual stream flow ranged from 'unsatisfactory' to 'good'.
Overall model performance ratings were more limited by NSE and R 2 values at daily timesteps and by PBIAS at annual timesteps. For example, the results showed that the NSE and R 2 values improved by an average of 0.60 and 0.47, respectively, as the stream flow data were reduced by averaging from daily to annual time steps, while the PBIAS ratings were nearly equal for daily, monthly, and annual time steps. These results indicated that one threshold for PBIAS is appropriate for all timesteps (e.g., daily, monthly, and annual) in agreement with the model performance criteria recently published by Moriasi et al. [46], and point to the potential benefit of considering different NSE and R 2 thresholds for each timestep.
Notably, some MAP methods resulted in 'unsatisfactory' model performance across all time scales reflected in the PBIAS values ±15% even when the NSE and R 2 values were adequate. There may be a need to reconsider changing the PBIAS threshold to ±25% as proposed by Moriasi et al. [47] to better capture adequate model performance at monthly and annual timesteps. These results also highlighted a problem with the threshold dependent overall model performance rating criteria when selecting the best MAP method. The problem was resolved by ranking the model performance output to highlight the best MAP method tested.
When the uncalibrated SWAT model performance values were ranked, the results showed that the remotely sensed PRISM data set performed best overall (i.e., all time steps tested considered) ( Table 7). The PRISM method resulted in the greatest NSE and R 2 results at monthly and annual timesteps, indicating that the PRISM method was best-suited for long-term estimates of stream flow in the study catchment (Table 7). However, the results showed that the daily estimates of stream flow were 'unsatisfactory' for the uncalibrated models. These results are in general agreement with results provided by Srinivasan et al. [54] who used similar GIS-based methods to aggregate 4 km PRISM data for SWAT model precipitation input, and ran the model ungauged (i.e., uncalibrated) to show that the uncalibrated SWAT model can produce satisfactory estimates of hydrology at an annual time step in the Upper Missouri River Basin with PBIAS values within ±10%, and NSE values ranging from 0.51 to 0.95. Unlike the results from Srinivasan et al. [54], the model performance was also 'satisfactory' for the monthly stream flow in the current work, but only near the watershed outlet because the model did not simulate hydrology well in the agricultural dominated headwaters. Examination of the model performance ratings at the validation sites (Sites #1, #2, #3, and #5) indicated that the uncalibrated SWAT model performance ratings were unsatisfactory at every time step at Sites #1 and #2 due to overestimation of the stream flow in the predominantly agricultural headwaters (e.g., PBIAS values less than −175%), likely (in part) due to clay pan soils characterized by increased surface runoff rates that the model simulations over-accounted for [55], considering that Baffaut et al. [55] reported on the complications of SWAT modeling applications in watersheds with clay pan soils in the central US. In the current work, the PBIAS values ranged from −51.0 (Two-Axis) to less than −163.0 (GPI_2, LPI_2) at Site #1 ( Figure 2). Conversely, the stream flow was generally underestimated at Site #5 with PBIAS values greater than 15% for most of the MAP methods tested except for four MAP methods (i.e., GPI_2, LPI_2, PRISM, and NEXRAD). The stream flow at urban Site #5 was likely underestimated (at least in part) due to the impacts of increased impervious surfaces on the stream flow which the model did not simulate well.
Water 2017, 9,459 13 of 21 step at Sites #1 and #2 due to overestimation of the stream flow in the predominantly agricultural headwaters (e.g., PBIAS values less than −175%), likely (in part) due to clay pan soils characterized by increased surface runoff rates that the model simulations over-accounted for [55], considering that Baffaut et al. [55] reported on the complications of SWAT modeling applications in watersheds with clay pan soils in the central US. In the current work, the PBIAS values ranged from −51.0 (Two-Axis) to less than −163.0 (GPI_2, LPI_2) at Site #1 ( Figure 2). Conversely, the stream flow was generally underestimated at Site #5 with PBIAS values greater than 15% for most of the MAP methods tested except for four MAP methods (i.e., GPI_2, LPI_2, PRISM, and NEXRAD). The stream flow at urban Site #5 was likely underestimated (at least in part) due to the impacts of increased impervious surfaces on the stream flow which the model did not simulate well.

Calibrated Model Performance
Model calibration improved the model performance at Site #4 ( Table 8). The differences in fitted calibration parameter values for each MAP method are shown in Figure 3. When calibrated SWAT model performance was ranked, the IDW_2 method performed best overall (i.e., all time steps tested were considered) ( Table 8). The IDW_2 method has also been shown to work well in other locations with considerably different climate and physiographic characteristics compared to the study catchment [4,17,26]. For example, Tuo et al. [26] showed that the IDW method was the most accurate method tested for SWAT model precipitation input on SWAT simulated stream flow in an alpine Aldige river basin of north-eastern Italy. Szczesniak and Pinieski (2015) showed that IDW was more accurate than the centroid method in the Sulejów reservoir catchment in central Poland. Masih et al. [4] successfully tested the IDW method versus mountainous semiarid catchments in the Karkheh River basin, Iran. Thus, the results show that IDW_2 is a robust interpolation method that can improve SWAT model performance in various regions globally.

Calibrated Model Performance
Model calibration improved the model performance at Site #4 (Table 8). The differences in fitted calibration parameter values for each MAP method are shown in Figure 3. When calibrated SWAT model performance was ranked, the IDW_2 method performed best overall (i.e., all time steps tested were considered) ( Table 8). The IDW_2 method has also been shown to work well in other locations with considerably different climate and physiographic characteristics compared to the study catchment [4,17,26]. For example, Tuo et al. [26] showed that the IDW method was the most accurate method tested for SWAT model precipitation input on SWAT simulated stream flow in an alpine Aldige river basin of north-eastern Italy. Szczesniak and Pinieski (2015) showed that IDW was more accurate than the centroid method in the Sulejów reservoir catchment in central Poland. Masih et al. [4] successfully tested the IDW method versus mountainous semiarid catchments in the Karkheh River basin, Iran. Thus, the results show that IDW_2 is a robust interpolation method that can improve SWAT model performance in various regions globally.    The results from the current work and other studies showed mixed results from remotely sensed data. Tuo et al. [26] tested the influence of satellite-based data sets (TRMM and CHIRPS) on SWAT simulated stream flow. The CHIRPS data resulted in 'satisfactory' simulations of the stream flow [26]. However, the TRMM data set was associated with 'unsatisfactory' model performance by Tuo et al. [26]. In the current work, TRMM and CHIRPS data were ranked at 16th and 17th, respectively, indicating poor overall model performance relative to the other MAP methods tested. However, it is important to note that model calibration improved the model performance adequately at an annual timestep using the TRMM and CHIRPS input data in the current work. Examination of the model performance ratings at the validation sites (Sites #1, #2, #3, and #5) indicated that the PBIAS values decreased to less than −68% in the agricultural headwaters due to notable differences in soils, vegetation, land use, and underlying geology that the SWAT model did not simulate well (Figure 4). The study catchment was divided by two Level II Ecoregions [35]. The agricultural headwaters were within the Great Plains ecoregion while the urbanizing lower elevations were in the Ozark Highlands ecoregion. Thus, the split-site validation results indicated that calibrating the model at only the watershed outlet is not advised in watersheds with substantial changes in watershed physiographic characteristics. These results have implications for water resource management practices based on the SWAT model output in large basins (e.g., Upper Mississippi River Basin) that often contain multiple ecoregions coupled with differences in watershed characteristics. The results from the current work and other studies showed mixed results from remotely sensed data. Tuo et al. [26] tested the influence of satellite-based data sets (TRMM and CHIRPS) on SWAT simulated stream flow. The CHIRPS data resulted in 'satisfactory' simulations of the stream flow [26]. However, the TRMM data set was associated with 'unsatisfactory' model performance by Tuo et al. [26]. In the current work, TRMM and CHIRPS data were ranked at 16th and 17th, respectively, indicating poor overall model performance relative to the other MAP methods tested. However, it is important to note that model calibration improved the model performance adequately at an annual timestep using the TRMM and CHIRPS input data in the current work. Examination of the model performance ratings at the validation sites (Sites #1, #2, #3, and #5) indicated that the PBIAS values decreased to less than −68% in the agricultural headwaters due to notable differences in soils, vegetation, land use, and underlying geology that the SWAT model did not simulate well (Figure 4). The study catchment was divided by two Level II Ecoregions [35]. The agricultural headwaters were within the Great Plains ecoregion while the urbanizing lower elevations were in the Ozark Highlands ecoregion. Thus, the split-site validation results indicated that calibrating the model at only the watershed outlet is not advised in watersheds with substantial changes in watershed physiographic characteristics. These results have implications for water resource management practices based on the SWAT model output in large basins (e.g., Upper Mississippi River Basin) that often contain multiple ecoregions coupled with differences in watershed characteristics. The MAP input data affected the SWAT model simulated water balance variables (Table 9, Figure 5). For example, ET and stream flow accounted for the majority of the water balance. However, the greatest differences in the water balance components were in the amount of total stream flow that was the base flow as shown in Figure 5. For example, the SWAT model simulated BFI's ranged from The MAP input data affected the SWAT model simulated water balance variables (Table 9, Figure 5). For example, ET and stream flow accounted for the majority of the water balance. However, the greatest differences in the water balance components were in the amount of total stream flow that was the base flow as shown in Figure 5. For example, the SWAT model simulated BFI's ranged from 11% to 48% while the observed BFI was 25% for the study catchment [36]. BFI was overestimated when the MAP methods overestimated the annual total precipitation (GPI_2, LPI_2, PRISM, and NEXRAD). In the current work, to attenuate the overestimations of precipitation generated using the GPI_2, LPI_2, PRISM, and NEXRAD MAP methods, SWAT-cup autocalibration routines resulted in the greatest reductions of curve numbers (CN2-5) coupled with the greatest increases in overland roughness (OV_N+0.25) and available water capacity of the soil (SOL_AWC+0.05) (Figure 3). The combined effects of minimizing the curve numbers, maximizing manning's n for overland flow, and maximizing the available soil water capacity resulted in physically unrealistic BFI's for the study catchment. These results point to the need to consider the compounding effects of multiple calibration parameters on the model output when setting physically relevant bounds for the calibration parameters using autocalibration software like SWAT-cup. Additionally, the results highlight the importance of the precipitation input inaccuracy for hydrologic model output. Table 9. Mean areal precipitation (MAP) input effects on water balance variables after SWAT model calibration in the Hinkson Creek Watershed, Missouri, USA. SWAT models were forced with 19 different precipitation methods. All results are shown in depth of water (mm). The percent of total water input is shown parenthetically. Precip. is precipitation, ET is evapotranspiration, and Ground water is deep aquifer recharge.

MAP Method
Precip.

ET Surface Runoff (mm) (%) Baseflow Soil Water Ground Water
The NEXRAD data set led to an overestimated BFI by up to 23%, which was surprising considering that the level III NEXRAD data have been shown to produce more accurate hydrologic model performance compared to satellite-derived precipitation products, and a better choice for MAP inputs for hydrologic model performance than point gauges [56][57][58]. For example, Tobin and Bennett (2009) [58] used level III NEXRAD data for input into the SWAT model and the results showed that the three day average stream flow was associated with NSE values ranging from 0.60 to 0.88, while the TRMM 3B42 data set yielded more variable results with NSE values ranging from 0.38 to 0.94. However, in the current work, the NEXRAD precipitation data set was associated with the greatest overestimation in annual total precipitation (MAE = 152 mm) which indicated potential radar accuracy issues. Radar accuracy issues have been attributed to the proximity of the nearest radars which were located approximately 200 km away from the study catchment. For example, Simpson et al. (2016) [59] showed that the precipitation radar accuracy degrades at distances greater than 120 km due to increased beam light and volume coverage. These results may indicate a need to increase the radar gauge density in the US to reduce radar accuracy issues. The NEXRAD data set led to an overestimated BFI by up to 23%, which was surprising considering that the level III NEXRAD data have been shown to produce more accurate hydrologic model performance compared to satellite-derived precipitation products, and a better choice for MAP inputs for hydrologic model performance than point gauges [56][57][58]. For example, Tobin and Bennett (2009) [58] used level III NEXRAD data for input into the SWAT model and the results showed that the three day average stream flow was associated with NSE values ranging from 0.60 to 0.88, while the TRMM 3B42 data set yielded more variable results with NSE values ranging from 0.38 to 0.94. However, in the current work, the NEXRAD precipitation data set was associated with the greatest overestimation in annual total precipitation (MAE = 152 mm) which indicated potential radar accuracy issues. Radar accuracy issues have been attributed to the proximity of the nearest radars which were located approximately 200 km away from the study catchment. For example, Simpson et al. (2016) [59] showed that the precipitation radar accuracy degrades at distances greater than 120 km due to increased beam light and volume coverage. These results may indicate a need to increase the radar gauge density in the US to reduce radar accuracy issues.
Ultimately, in agreement with the previous studies [4,17,26,27], the centroid method is not the most accurate MAP method ( Table 8). The centroid method is useful in its simplicity. However, in regions with more than one gauge, precipitation data can be interpolated to estimate MAP using direct weighted averaging methods, or surface-fitting methods. Also, if satellite-based methods become more accurate as is anticipated, then point gauge interpolation methods may eventually become obsolete. Considering ArcSWAT is already compatible with ArcGIS, and all of the surfacefitting methods used in the current work were generated using geostatistical tools in ArcGIS, a conversion is possible. However, a balance must be achieved between spatial complexity and ease of use [5]. In the interim, 3rd party python scripts can be written by end-users that need to iterate point measured precipitation data across attribute table fields to generate daily time series' of MAP in ArcGIS to improve the SWAT model output for the stream flow. Ultimately, in agreement with the previous studies [4,17,26,27], the centroid method is not the most accurate MAP method ( Table 8). The centroid method is useful in its simplicity. However, in regions with more than one gauge, precipitation data can be interpolated to estimate MAP using direct weighted averaging methods, or surface-fitting methods. Also, if satellite-based methods become more accurate as is anticipated, then point gauge interpolation methods may eventually become obsolete. Considering ArcSWAT is already compatible with ArcGIS, and all of the surface-fitting methods used in the current work were generated using geostatistical tools in ArcGIS, a conversion is possible. However, a balance must be achieved between spatial complexity and ease of use [5]. In the interim, 3rd party python scripts can be written by end-users that need to iterate point measured precipitation data across attribute table fields to generate daily time series' of MAP in ArcGIS to improve the SWAT model output for the stream flow.

Conclusions
While there are many MAP methods to choose from, selecting the most accurate method for a given modeling application is essential for realistic hydrologic simulations. Internationally used alternative proximate gauge and direct-weighted average MAP estimation techniques may not be the most accurate MAP techniques for semi-distributed watershed-scale continuous time hydrologic models like the Soil and Water Assessment Tool (SWAT). The inverse distance weighted, linear local polynomial interpolation, and the multiquadric function surface-fitting methods were the most accurate methods for the SWAT model estimates of stream flow in this study. Conversely, the 2nd order polynomial MAP methods were the least effective, with unsatisfactory SWAT model performance at daily, monthly, and annual intervals.
Remotely sensed precipitation input data for hydrologic models are useful because the majority of the Earth's land surface is not monitored with networks of precipitation gauges. The PRISM data set was associated with 'satisfactory' uncalibrated SWAT model performance for the monthly stream flow. Additionally, all satellite-based datasets tested resulted in a rating of 'good' for annual stream flow following SWAT model calibration, except for the NEXRAD data that overestimated the total precipitation. Thus, the results support the use of the PRISM data set for ungauged watersheds regionally in catchments that may have stream flow gauges, but lack the precipitation monitoring sites to estimate MAP using surface-fitting methods. Additional model performance testing and validation will be needed as satellite-based precipitation data sets are updated. Networks of ground-based precipitation monitoring sites will also continue to be important for ground truthing.
Split-site SWAT model validation results indicated a need for multiple stream flow gauging sites in watersheds with notable differences in soils, vegetation, land use, and/or underlying geology. Model calibration at the watershed outlet alone was coupled with model uncertainty that increased with the distance from the urban site of calibration to the agricultural headwaters (−68% to −160% overestimations in stream flow) due to differences in watershed characteristics and two Level II Ecoregions. Collectively, the results highlight a need for networks of distributed meteorological monitoring sites, coupled with nested hydrologic gauging sites where multiple years of hydroclimate data can be collected to: (i) accurately quantify MAP using surface-fitting methods, (ii) continue to validate remotely sensed data sets, and (iii) partition large basins into sub-basins each with different dominate watershed characteristics for accurate watershed-scale continuous-time hydrologic model simulations.