Improving SWAT Model Calibration Using Soil MERGE (SMERGE)

: This study examined eight Great Plains moderate-sized (832 to 4892 km 2 ) watersheds. The Soil and Water Assessment Tool (SWAT) autocalibration routine SUFI-2 was executed using twenty-three model parameters, from 1995 to 2015 in each basin, to identify highly sensitive parameters (HSP). The model was then run on a year-by-year basis, generating optimal parameter values for each year (1995 to 2015). HSP were correlated against annual precipitation (Parameter-elevation Regressions on Independent Slopes Model—PRISM) and root zone soil moisture (Soil MERGE—SMERGE 2.0) anomaly data. HSP with robust correlation (r > 0.5) were used to calibrate the model on an annual basis (2016 to 2018). Results were compared against a baseline simulation, in which optimal parameters were obtained by running the model for the entire period (1992 to 2015). This approach improved performance for annual simulations generated from 2016 to 2018. SMERGE 2.0 produced more robust results compared with the PRISM product. The main virtue of this approach is that it constrains parameter space, minimizesing equiﬁnality and promotesing modeling based on more physically realistic parameter values.


Introduction
The Soil and Water Assessment Tool (SWAT) is a physically based model with demonstrated global applications and has been validated at the watershed scale through the publication of thousands of referred papers (see [1]). The SWAT model is moderate in terms of complexity, i.e., it is a semi-distributed model where the watershed is divided into subbasins, in which water balance is calculated on a daily basis. Many SWAT modeling studies have focused on matching simulated and observed streamflow at the basin's outlet. Calibration based on multiple gauges within a basin has been demonstrated to more realistically capture surface flow throughout an entire watershed (e.g., [2,3]). However, this approach, while an improvement, can fail to provide a realistic depiction of landscape conditions that strongly influence runoff production. During recent years, hydrologists have begun to leverage remote sensing observations to improve model calibration and achieve a more accurate picture of processes at a watershed scale. Examples of such studies that utilized the SWAT model span diverse aspects of the hydrologic cycle and include quantifying total terrestrial water [4,5], soil moisture [6][7][8], evapotranspiration [9][10][11], and groundwater recharge [12,13].
Since SWAT was designed as a tool to first and foremost simulate runoff, issues can arise when simulating other fluxes and state variables, such as soil moisture or evapotranspiration. New approaches have been developed to facilitate the incorporation of remotely sensed data to support watershed scale studies [14]. Of particular promise are data assimilation (DA) techniques adopted from the atmospheric science community, which have been increasingly applied to watershed hydrology studies [15][16][17]. However, the improvements that can be potentially conferred by DA have limitations. DA has difficulty in improving streamflow performance under high flow conditions [15,16] because runoff production is largely decoupled from the control of soil moisture under these circumstances. In addition, SWAT has some structural issues related to how soil moisture is accounted for that limits the benefits of DA of root zone soil moisture (RZSM) in this model. For example, the authors in reference [17] used DA to incorporate RZSM into SWAT and achieved worse results than open loop simulations. This is because the physics of the SWAT model without modification are not sufficiently complicated to account for vertical coupling between different soil layers. Despite these issues, soil moisture remains an important control on surface runoff production. One of the most important parameters within SWAT is the Curve Number (CN2), which is initialized based on the moisture content within soils. Therefore, finding a way of leveraging soil moisture to support more realistic modeling of streamflow remains important.
Another approach that provides a more holistic prospective is a mass balance accounting of the overall water budget. This method has yielded meaningful insights particularly at the regional and watershed scales (e.g., [18][19][20]). In reference [21], it is indicated that inter-seasonal and inter-annual variations in surface water storage volumes, as well as their impact on precipitation (P), evapotranspiration (ET), surface water storage (S), and runoff (Q), are not well understood. There remains a fundamental lack of knowledge, both in terms of spatial and temporal scales, regarding the hydrologic processes that influence each of the terms of the basic hydrologic equation. Incorporation of multiple observations (both in situ and remotely sensing) into model calibration can force modeling to be based on more realistic parameter selection. Therefore, the objective of this study is to demonstrate whether diverse remote sensing observations can improve simulated SWAT streamflow in eight Great Plains watersheds.

Watersheds Examined
Eight, moderate-sized (832 to 4892 square km) watersheds were examined (Table 1; Figure 1). Basins generally have a dendritic drainage pattern with a rounded shape, except for Chickaskia (CH) and Ninnescah (NI), which are elongated. Bird Creek (BC), CH, Little Arkansas (LA), and Little Nemaha (LN) flow in general toward the southeast. Black Vermillion (BV) drainage is oriented southwest and Walnut (WN) toward the south. Mill Creek (MC) and NI flow toward the east. The SWAT model is subdivided into subbasins as computational units. To enhance inter-comparability of the results, the number of subbasins was set as consistently as possible. The eight basins had subdued topography typical of the Great Plains region. Overall relief varied between 130 to 313 m in the examined watersheds (Table 1). In terms of soils, most watersheds were dominated by some variants of loam within the top layer that roughly correspond with the upper root zone. The only exception was NI, where loamy sand was the most abundant texture. Land use/land cover in five watersheds was dominated by agricultural activity (BV, CH, LA, LN, NI). BC, MC, and WN also had significant rangeland and grasses.

SWAT Model Input
The SWAT model incorporates landscape information about elevation, soils, and land use/land cover. Elevation data were derived from the National Map Download service from the United States Geological Survey. Tiles of this seamless product were downloaded in an ArcGrid format, with a spatial resolution of 1 arc-second. For soils, the Digital General Soil Map of the United States or STATSGO2 (United States (US) Department of Agriculture, Washington, DC, USA) developed by the National Cooperative Soil Survey, was selected. This product was downloaded from the Natural Resources Conservation Service Geospatial Data Gateway in polygon shapefile format and it has an inherent 1:250,000 spatial scale. Finally, land use/land cover data came from the 2011 National Land Cover Data Set, accessed through the Natural Conservation Service Geospatial Data Gateway. This product was developed by the Multi-Resolution Land Characteristics Consortium. It was obtained in a GeoTIFF format with a spatial resolution of 30 m.
Hydrometeorological data (daily precipitation and temperature) were obtained from the PRISM Climate Group at Oregon State University. This product has a 4 km spatial resolution covering all of the continental United States (CONUS). To support execution of the SWAT model, daily PRISM data for all SWAT subbasins was averaged using zonal statistics based on the intersection of PRISM grid cell with each subbasin. In addition, annual basin-wide averages for precipitation were obtained for all eight basins by simple averaging.

Other Soft Data
For RZSM, the SMERGE 2.0 product (US National Aeronautics and Space Administration, NASA, Washington, DC, USA) was selected [22]. This product provided particularly robust results

SWAT Model Input
The SWAT model incorporates landscape information about elevation, soils, and land use/land cover. Elevation data were derived from the National Map Download service from the United States Geological Survey. Tiles of this seamless product were downloaded in an ArcGrid format, with a spatial resolution of 1 arc-second. For soils, the Digital General Soil Map of the United States or STATSGO2 (United States (US) Department of Agriculture, Washington, DC, USA) developed by the National Cooperative Soil Survey, was selected. This product was downloaded from the Natural Resources Conservation Service Geospatial Data Gateway in polygon shapefile format and it has an inherent 1:250,000 spatial scale. Finally, land use/land cover data came from the 2011 National Land Cover Data Set, accessed through the Natural Conservation Service Geospatial Data Gateway. This product was developed by the Multi-Resolution Land Characteristics Consortium. It was obtained in a GeoTIFF format with a spatial resolution of 30 m.
Hydrometeorological data (daily precipitation and temperature) were obtained from the PRISM Climate Group at Oregon State University. This product has a 4 km spatial resolution covering all of the continental United States (CONUS). To support execution of the SWAT model, daily PRISM data for all SWAT subbasins was averaged using zonal statistics based on the intersection of PRISM grid cell with each subbasin. In addition, annual basin-wide averages for precipitation were obtained for all eight basins by simple averaging.

Other Soft Data
For RZSM, the SMERGE 2.0 product (US National Aeronautics and Space Administration, NASA, Washington, DC, USA) was selected [22]. This product provided particularly robust results in the Great Plains region and reflects a product that blends equally remote sensing and land surface model datasets. SMERGE 2.0 is available at a daily time step and has a 0.125-degree spatial resolution. Like with PRISM precipitation data, SMERGE was averaged for each of the eight basins on an annual basis (1992 to 2018). Unlike PRISM data anomalies, not raw volumetric data were used for RZSM.
Three ET products were applied to this study (Moderate Resolution Imagining Spectrometer, MODIS16A2v5, US NASA Earth Observing System Data and Information System (EOSDIS) Land Processes Distributed Active Archive Center (DAAC), [23]; Simplified Surface Energy Balance, SSEBopv4, US Geologic Survey, Center for Integrated Data Analytics, Middleton, Wisconsin [24]; Global Land Evaporation: the Amsterdam Model, GLEAMv3.3a, Vrije Universiteit Amsterdam, The Netherlands [25]). The MODIS product was obtained in a monthly HDF file with a 1 km resolution. This dataset was extracted into a raster layer and zonal statistics tools were utilized to obtain the average of the pixels intersecting with the watershed outline. The same method was applied to SSEBop, which was obtained in a GeoTiff raster format with a 0.009-degree spatial resolution in monthly files. GLEAM was available in netCDF files in a grid of 0.25 × 0.25 degrees with a monthly temporal resolution. The values of the grids with centroids within the watershed were extracted and summed to obtain the average. Values were summed to calculate an annual estimate of ET (2016-2018) for the eight examined watersheds.
Total terrestrial water was estimated from the NASA Gravity Recovery and Climate Experiment (GRACE) using the GRCTellus JPL-Mascons dataset [26,27]. This product combined monthly gravity solutions from GRACE and GRACE-FO, as determined from the JPL RL06Mv2 mascon solution with the coastline resolution improvement filter. The GRACE product was available in a monthly netCDf file (missing values exist) at a 0.5-degree spatial resolution. The average of intersecting GRACE grids with the watershed was summed to obtain an annual estimate (2015-2018) of total terrestrial water change.
Finally, to constrain an interception-related SWAT parameter, the MOD15A2H Terra version 006 combined Leaf Area Index (LAI) and Fraction of Photosynthetically Active Radiation (US NASA EOSDIS Land Processes DAAC, Sioux Falls, ND, USA) was used [28]. This product had a 500 m spatial resolution and was an eight-day composite dataset based on the best available from acquisitions from each period. Values were aggregated to obtain a basin-wide estimate of LAI.

SWAT Model Setup
In SWAT, the automatic watershed delineation tool was used to define the stream network and number of subbasins within a watershed. Subbasin number was based on the area of the watershed present upstream of the beginning point for each tributary channel. Within each subbasin, water balance calculations were based on the aerially weighted proportions of unique combinations of soil and land use, referred to as hydrologic response units (HRU). Each HRU had a unique Curve Number (CN), adjusted for antecedent moisture conditions, which was used to determine infiltration and surface runoff within each subbasin. Another component of the SWAT model that enhanced its ability to calculate the water balance within each subbasin was the calculation of daily potential evapotranspiration values using the Priestley-Taylor method [29]. SWAT does not consider the spatial location of HRUs within each subbasin, and consequently, this is why SWAT is not considered a fully distributed model. Excessive runoff generated within each subbasin was conceptually routed as overland flow. Once overland flow water intersected a stream reach or channel, water was routed downstream using the variable storage method [30]. To facilitate autocalibration of the SWAT model, the stand-alone SUFI-2 autocalibration [31] routine was utilized. Of the autocalibration programs available for the SWAT model, SUFI-2 converges on an optimal solution with a relatively small number of executed simulations (500 to 1000 model runs; [32,33]) and was ran at a daily time step. In addition, SUFI-2 provided an estimate of parameter sensitivity. Note that HRU parameter values were averaged at the subbasin level. Only highly sensitive parameters (HSP; p-value < 0.02) were varied after the first two global simulations executed, which are described below.
To evaluate model performance, standard objective measures were used, including the mass balance error (MBE) and Nash-Sutcliffe efficiency coefficients (NS). To collapse these metrics into one measure, all model results were evaluated based on the Relative Performance Scale [34]. This combined metric was based on the criterion of reference [35] (see Table 2). To calculate the RPS, both the MBE and NS were translated into a single RPS metric. For example, if a simulation has a NS = 0.75 and MBE of 15%, these values constituted provisional RPS values of 3.00 and 2.00, respectively. To be conservative, the lower provisional RPS value was always selected so that in this example, the final RPS value assigned was 2.00. The best model run for each simulation type was evaluated with a single RPS score to facilitate inter-comparison of results.

Global Simulation Series
For global simulations, one RPS value was calculated for the entire simulation period (1995 to 2015) in each watershed; shorter for MC (2005 to 2015). This simulation series consisted of two model runs. The initial simulation was referred to as Base_Q. In this model run, there were no constraints on parameters values, except for the outer bounds established by reference [29]. Parameter value ranges for the Base_Q are presented in Table 3.
The next type of global simulation consisted of iterative model runs, which constrained parameters to improve objective metrics and was referred to as IT_Q. Model parameters were limited in two ways: (a) using a priori data to set CANMX and ALPHA_BF and (b) examining Dotty plots ( Figure 2) to identify limits for optimal performance for variable HSP. The CANMX parameter was set by using MODIS_LAI product and the following equation [36].
where S Max was the maximum water storage within the canopy, ƒ was a specific factor dependent upon vegetation type, and LAI was determined from MODIS_LAI product (MOD15A2). ALPHA_BF was determined using the baseflow program from reference [37] and was set within a factor of two of the calculated value. Parameter sensitivity was examined and HSP were identified (Table 4). These parameters can be divided into two groups (variable and non-variable). Examination of Dotty Plots can further constrain variable, HSP values. While some of the parameters lack an optimal range of values (Figure 2a), others do not ( Figure 2b). An iterative approach was used in adjusting parameters with optimal values to yield better performance. HSP specifically, in all basins, the tightening of the range of CH_K2 improved results. Additional iterations in BC and WN focused on CH_N2; in NI, with CN2; in WN, on OV_N. The tightening of variable HSP values had a beneficial impact on the final IT_Q model executed. From this simulation, the values of non-sensitive (p > 0.02) and non-variable HSP are set from the optimum parameter values calculated (Table 5). Only variable, HSP (Table 6) were left unconstrained in subsequent modeling series.

Individual Year-By-Year Series
Individual year-by-year model runs between 1995 to 2015 were executed. In this modeling series, objective results were obtained for each year (n = 21). All variable, HSP were correlated with annual SMERGE 2.0 RZSM anomalies and raw PRISM precipitation. Note that years with unacceptable RPS values were omitted from this analysis. Correlation values based on this modeling series weare presented in Table 7. The range and average for variable, HSP are shown in Table 8. Only parameters with a correlation (r) that exceeds 0.5 were considered in the third modeling series described next.

Final Calibration Year-By-Year Series
Information from the two prior modeling series was leveraged to improve calibration in the final year-by-year calibration series (2016 to 2018). Unacceptable objective metrics (RPS < 1.00 for Sens_Q) were used to omit the following basin-year combinations (BC-2016, BC-2018, NI-2017, and WN-2017).
Four model runs that were executed in this series, which include: (a) Global_Q that applied IT_Q parameter values (Tables 4 and 6) on a year-by-year basis (i.e., 2016 Global_Q); (b) Sens_Q in which variable, HSP (Table 8) were set between the range observed during the 1995 to 2015 runs; (c) SMERGE_Parameter (i.e., 2016 SMERGE_CN2); (d) PRISM_Parameter (i.e., 2016 PRISM_CN2). For the parameter-based model runs variable, HSP were set at ±10% of the average value obtained between 1995 to 2015 (Table 8), except for the highly correlated parameters (HCP). HCP values were calculated using the SMERGE 2.0 RZSM anomaly or raw PRISM precipitation for the examined year (i.e., 2016; Table 9) using the 1995 to 2015 regression relationship. The parameter range for HCP was set at ±10% of the calculated value.

Mass Balance Calculations
Streamflow (Q) simulated from year-by-year series (2016 to 2018) was compared against USGS gauge observed streamflow (with a nominal ±10% error). The range of simulated Q were produced by extracting all simulations in a model run that yielded acceptable results (RPS > 1.00). Streamflow was calculated based on mass balance within each basin based on: where P was the annual average PRISM precipitation value within a watershed; ∆S was the change in annual terrestrial water determined from the GRACE product; ET was evapotranspiration and was estimated with three products (MODIS16A2v5; GLEAM v.3.3a; SSEBop v.4). A nominal 10% error was applied to calculated Q values. Note that LN-2018 was omitted in the mass balance analysis because of incomplete observed USGS streamflow data at the end of 2018.

SWAT Simulations
The initial global series included Base_Q and IT_Q simulations. Only BV and CH have acceptable Base_Q simulations. The IT_Q results are dramatically better. Only MC was not satisfactory. BC, LA, NI, and WN were satisfactory to good, CH good to very good, and BV and NH exceeded the threshold for a very good simulation. Figure 3 combines the results from all eight basins into box plots. The average for the Base_Q model runs had an RPS value of 0.781, which was unacceptable. The iterative approach IT_Q improved objective results, with an average RPS of 1.886. T-test comparison between Base_Q and IT_Q simulations yielded a significant difference (based on t-test results) between the means of these model runs (p value = 0.0058). This comparison shows how constraining parameters improved model performance.
The final calibration year-by-year series had four types of model runs that included Global_Q, Sens_Q, SMERGE_Parameter, and PRISM_Parameter ( Figure 4). Global_Q, which was based on IT_Q parameter values, had an average RPS of 1.282, considered satisfactory to good. In BC and NI, performance was unsatisfactory for all years (2016 to 2018). In other basins, simulations varied greatly on a year-by-year basis. In BV, results ranged from unsatisfactory to very good and in LA and MC, from unsatisfactory to good. LN recorded results between satisfactory and very good. In CH and WN, performance ranges between unsatisfactory to satisfactory. The three other model series (Sens_Q, SMERGE_Parameter, and PRISM_Parameter) leveraged the results from the individual year-by-year simulation series (1995 to 2015) to constrain parameter values. These series yielded average RPS values (2.032 to 2.623), which ranged from good to very good in terms of performance. Notable improvements in the three-model series over Global_Q were noted for the following basin-year combination, which recorded an over 2.00 increase in RPS values (BV-2018, CH-2018, LA-2018, MC-2016, and NI-2016). Table 10 provided a summary of t-test results for these simulation series. The Global_Q model run had a statistically significant difference compared with the three other simulation series. Conversely, Sens_Q, SMERGE_Parameter, and PRISM_Parameter did not differ significantly between each other (Table 10). These results demonstrated a range of optimal solutions achieved with differing parameter values-a prime example of how equifinality can limit the utility of hydrologic simulations.

Mass Balance Comparisons
To avoid the equifinality constraint, calculated Q, which used P, ET, and S, was examined. Observed, SWAT simulated, and calculated Q are compared in Figures 5-10 for the years 2016 to 2018. Similar results were obtained in most the watersheds, as discussed below. Note that BC-2016, BC-2018, and NI-2017 were omitted because no acceptable simulations were obtained during these years. LN-2018 was omitted because complete observational data were not available for 2018.
Calculated Q for all three products compared poorly against USGS gauge observed Q and in general, grossly overestimated this value. MODIS16A2v5 nearly always generated a highly inflated calculated Q value.

Mass Balance Comparisons
To avoid the equifinality constraint, calculated Q, which used P, ET, and ∆S, was examined. Observed, SWAT simulated, and calculated Q are compared in Figures 5-10 for the years 2016 to 2018. Similar results were obtained in most the watersheds, as discussed below. Note that BC-2016, BC-2018, and NI-2017 were omitted because no acceptable simulations were obtained during these years. LN-2018 was omitted because complete observational data were not available for 2018.
Calculated Q for all three products compared poorly against USGS gauge observed Q and in general, grossly overestimated this value. MODIS16A2v5 nearly always generated a highly inflated calculated Q value.  (Figure 5b,c, Figures 7b and 9a,b), which had GLEAM v.3.3a as the lowest calculated Q value. For MC-2016 (Figure 9a), SSEBop v.4 had the highest value.
In terms of the final calibration year-by-year series, the simulated Q much better matched with the USGS gauge observed Q. Sens_Q generally had a wider range of simulated Q values compared with either SMERGE_Parameter or PRISM_Parameter, in which parameter values were more constrained. The exception to this is in BV and MC, which had two HPCs ( Table 9) (Figures 5b,c, 7b, 9a,b), which had GLEAM v.3.3a as the lowest calculated Q value. For MC-2016 (Figure 9a), SSEBop v.4 had the highest value.
In terms of the final calibration year-by-year series, the simulated Q much better matched with the USGS gauge observed Q. Sens_Q generally had a wider range of simulated Q values compared with either SMERGE_Parameter or PRISM_Parameter, in which parameter values were more constrained. The exception to this is in BV and MC, which had two HPCs ( Table 9)

Discussion
This work is a fundamental example of how constraining parameter values in a hydrologic model can improve objective performance measures during autocalibration. Even so the issue of equifinality remains (e.g., [38]). During a model run, there were a large number of simulations that clustered close to the optimum for objective performance over a broad range of parameter values (Figure 2). There remains an imperative to further reduce the number of unconstrained model parameters, which is a means of minimizing the impact of equifinality [39,40]. In this work, parameter sensitivity and variability were leveraged as an approach to accomplish the above objective. Only HSP that exhibited significant variability were left unconstrained. The impact of this approach greatly improved performance for both global (1995 to 2015) and final calibration year-by-year (2016 to 2018) series. Therefore, by the application of parameter constraints, objective metrics transcended the limitations imposed within initial model runs by equifinality; or more simply put, by focusing on parameters that really matter, model performance dramatically improved.
Water 2020, 12, x FOR PEER REVIEW 14 of 22

Discussion
This work is a fundamental example of how constraining parameter values in a hydrologic model can improve objective performance measures during autocalibration. Even so the issue of equifinality remains (e.g., [38]). During a model run, there were a large number of simulations that clustered close to the optimum for objective performance over a broad range of parameter values ( Figure 2). There remains an imperative to further reduce the number of unconstrained model parameters, which is a means of minimizing the impact of equifinality [39,40]. In this work, parameter sensitivity and variability were leveraged as an approach to accomplish the above objective. Only HSP that exhibited significant variability were left unconstrained. The impact of this approach greatly improved performance for both global (1995 to 2015) and final calibration year-by-year (2016 to 2018) series. Therefore, by the application of parameter constraints, objective metrics transcended the limitations imposed within initial model runs by equifinality; or more simply put, by focusing on parameters that really matter, model performance dramatically improved.   Another interesting result was the slight preference for leveraging RZSM (SMERGE 2.0) over precipitation (PRISM) in improving model performance. This result should not be surprising. While there is no doubt that precipitation is a critical hydrologic variable, no model will yield meaningful results if erroneous precipitation is utilized. Still, it is soil moisture and not precipitation that directly modulates the rainfall-runoff response at a watershed scale. Specifically, antecedent soil moisture strongly governs streamflow response. In all eight examined basins, the Curve Number (CN2) was consistently the most sensitive parameter, emphasizing the importance of soil moisture in controlling streamflow. Indeed, research [41] demonstrates how improved soil moisture accounting by incorporating more realistic Curve Number values can enhance streamflow predictions.
Another approach to minimize equifinality involves the incorporation of remote sensing observations to directly constrain model parameter values [4][5][6][7][8][9][10][11][12][13]. Obviously, the utility of this approach relies on the robustness of the remote sensing data applied. As indicated previously, the Another interesting result was the slight preference for leveraging RZSM (SMERGE 2.0) over precipitation (PRISM) in improving model performance. This result should not be surprising. While there is no doubt that precipitation is a critical hydrologic variable, no model will yield meaningful results if erroneous precipitation is utilized. Still, it is soil moisture and not precipitation that directly modulates the rainfall-runoff response at a watershed scale. Specifically, antecedent soil moisture strongly governs streamflow response. In all eight examined basins, the Curve Number (CN2) was consistently the most sensitive parameter, emphasizing the importance of soil moisture in controlling streamflow. Indeed, research [41] demonstrates how improved soil moisture accounting by incorporating more realistic Curve Number values can enhance streamflow predictions.
Another approach to minimize equifinality involves the incorporation of remote sensing observations to directly constrain model parameter values [4][5][6][7][8][9][10][11][12][13]. Obviously, the utility of this approach relies on the robustness of the remote sensing data applied. As indicated previously, the model structure of SWAT has issues with the direct assimilation of soil moisture data [24]. ET is another important flux with the water budget at a watershed scale. In the mass balance approach, utilized ET is at least two orders of magnitude greater than ∆S. The fact that PRISM precipitation data was used to drive SWAT simulations and that robust results were obtained supports the general accuracy of this dataset. Therefore, the discrepancies that exist between the calculated and simulated Q values must largely lay with three ET datasets used in this study. MODIS16A2v5 tends to perform less well in the central Great Plains, where there is a more limited vegetation cover [42]. The GLEAM product also had issues in this region. At its core, GLEAM assimilates satellite estimated precipitation that has a strong tendency to overestimate summertime particularly over the Great Plains [43,44]. These errors can generate a strong positive bias in annual ET estimates from this region. Because of these issues, we opted not to utilize data from these ET products to constrain SWAT parameter values. model structure of SWAT has issues with the direct assimilation of soil moisture data [24]. ET is another important flux with the water budget at a watershed scale. In the mass balance approach, utilized ET is at least two orders of magnitude greater than S. The fact that PRISM precipitation data was used to drive SWAT simulations and that robust results were obtained supports the general accuracy of this dataset. Therefore, the discrepancies that exist between the calculated and simulated Q values must largely lay with three ET datasets used in this study. MODIS16A2v5 tends to perform less well in the central Great Plains, where there is a more limited vegetation cover [42]. The GLEAM product also had issues in this region. At its core, GLEAM assimilates satellite estimated precipitation that has a strong tendency to overestimate summertime particularly over the Great Plains [43,44]. These errors can generate a strong positive bias in annual ET estimates from this region. Because of these issues, we opted not to utilize data from these ET products to constrain SWAT parameter values.  Future work will further validate the approaches articulated in this study beyond the Great Plains within a broad range of land covers, soil, and climatic regimes. From reference [22], SMERGE 2.0 provided optimal estimates for RZSM in the Southern to Northern Great Plains, the Central Valley of California, and scattered areas in both Southwestern and Southeastern CONUS. In highly forested regions, including much of Northwestern and Eastern CONUS, land surface models like Noah may provide a better estimate of RZSM. In scattered areas from Southern California to Arizona and the corn belt extending from Iowa to Illinois, SMERGE 2.0 also underperforms and land surface model estimates of RZSM are preferable. Interestingly, in no region within CONUS are satellite soil moisture estimates preferred. Either the land surface model or merged land surface model and the satellite soil moisture retrievals yield optimal estimates of RZSM [22]. Future work will further validate the approaches articulated in this study beyond the Great Plains within a broad range of land covers, soil, and climatic regimes. From reference [22], SMERGE 2.0 provided optimal estimates for RZSM in the Southern to Northern Great Plains, the Central Valley of California, and scattered areas in both Southwestern and Southeastern CONUS. In highly forested regions, including much of Northwestern and Eastern CONUS, land surface models like Noah may provide a better estimate of RZSM. In scattered areas from Southern California to Arizona and the corn belt extending from Iowa to Illinois, SMERGE 2.0 also underperforms and land surface model estimates of RZSM are preferable. Interestingly, in no region within CONUS are satellite soil moisture estimates preferred. Either the land surface model or merged land surface model and the satellite soil moisture retrievals yield optimal estimates of RZSM [22].

Conclusions
To summarize, the key results are as follows: (1) The final calibration year-by-year simulation series (2016 to 2018), which was based on executing SWAT on an annual basis, outperformed the global simulations series, in which one objective metric was calculated based on the entire analysis period (1995 to 2015).
(2) For the final calibration year-by-year simulation series, four model runs were executed (Global_Q; Sens_Q; SMERGE_Parameter; PRISM_Parameter). The Global_Q simulation, which was based on parameter values fixed during the global simulation series, underperformed compared with other model runs, in which parameter values were constrained with information derived from individual year-by-year models as well as SMERGE 2.0 RZSM anomaly and PRISM precipitation data.
(3) SMERGE_Parameter simulations had slightly higher RPS values compared with PRISM_Parameter simulations and also better matched with USGS gauge observed Q.
(4) Calculated Q based on a mass balance approach did not consistently match observed Q, unlike SWAT simulated Q. The highest calculated Q was yielded by using the MODIS16A2v5 ET product, followed by GLEAM v.3.3a, and SSEBop v.4, which best matched with observed Q. The significant implication derived from this work is the demonstration that constraining parameter values can markedly improve SWAT model performance. In addition, that RZSM from SMERGE 2.0 can be leveraged to also greatly improve SWAT model performance. Therefore, this

Conclusions
To summarize, the key results are as follows: (1) The final calibration year-by-year simulation series (2016 to 2018), which was based on executing SWAT on an annual basis, outperformed the global simulations series, in which one objective metric was calculated based on the entire analysis period (1995 to 2015).
(2) For the final calibration year-by-year simulation series, four model runs were executed (Global_Q; Sens_Q; SMERGE_Parameter; PRISM_Parameter). The Global_Q simulation, which was based on parameter values fixed during the global simulation series, underperformed compared with other model runs, in which parameter values were constrained with information derived from individual year-by-year models as well as SMERGE 2.0 RZSM anomaly and PRISM precipitation data.
(3) SMERGE_Parameter simulations had slightly higher RPS values compared with PRISM_Parameter simulations and also better matched with USGS gauge observed Q.
(4) Calculated Q based on a mass balance approach did not consistently match observed Q, unlike SWAT simulated Q. The highest calculated Q was yielded by using the MODIS16A2v5 ET product, followed by GLEAM v.3.3a, and SSEBop v.4, which best matched with observed Q. The significant implication derived from this work is the demonstration that constraining parameter values can markedly improve SWAT model performance. In addition, that RZSM from SMERGE 2.0 can be leveraged to also greatly improve SWAT model performance. Therefore, this work highlights how diverse remote sensing data can be used to support hydrologic modeling of streamflow at the watershed scale providing more physically realistic results.
Water 2020, 12, x FOR PEER REVIEW 18 of 22 work highlights how diverse remote sensing data can be used to support hydrologic modeling of streamflow at the watershed scale providing more physically realistic results.