3.1. VIC Parameter Trends and Relationships
Investigating how each VIC parameter influenced median KGE values (comparing Qm vs. Qg) from the 39 gauges for all 10,000 iterations (Figure 3) highlights one parameter, usoilD, above all others as having a strong signal. As usoilD increases from 0.2 to 1.4 m, the median KGE tends to increase, and as usoilD increases from 1.4 to 2.0 m, median KGE values show an overall decreasing trend. The range of median KGE values for each tested value of usoilD also differs drastically: a soil layer depth of 0.2 m gives median KGE values ranging from –0.14 to 0.37, whereas a soil layer depth of 1.4 m gives median KGE values ranging from 0.28 to 0.51. For the lower range of soil depth values (0.2–1.0 m), median KGE values tend to be highest when bi, Dsmax, and Ds are smaller, but as usoilD increases to 2.0 m, the best KGE values occur when bi, Dsmax, and Ds are larger. In other words, the model parameters shift to balance their impacts on runoff production. For example, a smaller usoilD indicates less evapotranspiration (ET) and greater runoff. Smaller bi values lead to more infiltration, supporting more ET and less runoff. Thus, to account for the higher runoff production due to smaller usoilD values, bi values must be smaller. A smaller Dsmax is indicative of a lower baseflow and therefore also decreases the amount of water delivered to the channel network. Smaller values of Ds, the fraction of Dsmax where nonlinear baseflow begins, again tend to decrease channel discharge. When the median KGE values are the greatest (usoilD = 1.2–1.4 m), these four parameters balance each other, resulting in values nearer the middle of each parameter range that most accurately emulate gauge discharge. The right side of Figure 3 shows that the other VIC parameters do not exhibit a strong signal with median KGE. The bi values show a slight increase in median KGE values at around 0.16–0.20 for the upper range of median KGE values, yet the smaller the value of bi, the less likely the iteration is to have a KGE below zero. For this model, surface runoff rather than baseflow often controls discharge during events, so the impact of Dsmax and Ds is not as pronounced. Nevertheless, similar to bi, smaller values of Dsmax and Ds reduce the likelihood of the median KGE being less than zero. Values greater than 8 mm/d for Dsmax and 0.2 for Ds do not seem to offer much advantage over one another. No obvious patterns emerge in the relationships of bi with other variables, as [13] also found. Dsmax and Ds do not seem to be correlated with each other in relation to median KGE values, either. We acknowledge that because we set Dsmax and Ds as constant through the whole Ohio River basin network, we are not able to test the correlation between parameters as effectively as other studies with spatially gridded optimal values have done [13].
3.2. Iteration Comparisons among Timeseries
When calibrating the model using the different discharge timeseries to obtain median KGE values, the same trends exist in all parameters, although the ranges in KGE values may differ, for example with usoilD and its relationship with bi
). For the median KGE of the 39 gauges, Qg and Qgs have similar ranges, but Qs has markedly larger ranges of median KGE values, especially when usoilD is equal to 0.2 m (ranging from –0.38 to 0.31). Note that all median KGE values are above the –0.41 KGE threshold discussed in the methods, indicating that the model improves upon the mean flow benchmark, although individual gauges do have values below the threshold. The maximum KGE from an individual gauge out of the 39 is also shown; for all three timeseries (Qg, Qgs, Qs) vs. Qm, these maximum KGE values are between 0.47 and 0.82. Given that the maximum possible KGE is 1.0, the modeled discharge agrees well with each timeseries. The range of maximum KGE values is the smallest using Qg, but the best maximum KGE values were achieved in calibration using Qgs, which is not immediately intuitive. A possible explanation is given in the discussion.
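The KGE threshold of –0.41 mentioned above can be illustrated with a short sketch. It assumes the standard 2009 formulation of the Kling-Gupta efficiency (correlation, variability ratio, and bias ratio); the variant actually used is defined in the methods, so this form is an assumption. The helper names `kge` and `median_kge` and the sample values are hypothetical.

```python
import math
from statistics import mean, median, pstdev


def kge(sim, obs):
    """Kling-Gupta efficiency of a simulated series against a reference series.

    Assumed form: KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2),
    with r the Pearson correlation, alpha the ratio of standard deviations,
    and beta the ratio of means (simulated over reference).
    """
    mu_s, mu_o = mean(sim), mean(obs)
    sd_s, sd_o = pstdev(sim), pstdev(obs)
    if sd_o == 0 or mu_o == 0:
        raise ValueError("reference series needs nonzero mean and variance")
    if sd_s == 0:
        # A constant simulation carries no correlation information.
        r = 0.0
    else:
        cov = mean([(s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)])
        r = cov / (sd_s * sd_o)
    alpha = sd_s / sd_o
    beta = mu_s / mu_o
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def median_kge(sim_by_gauge, obs_by_gauge):
    """Median KGE over all gauges (dicts mapping gauge id to a series)."""
    return median(kge(sim_by_gauge[g], obs_by_gauge[g]) for g in obs_by_gauge)


# Illustrative values only, not data from the study.
obs = [1.0, 2.0, 3.0, 6.0]
# Predicting the mean flow everywhere gives r = 0, alpha = 0, beta = 1,
# so KGE = 1 - sqrt(2), roughly -0.41.
benchmark = kge([mean(obs)] * len(obs), obs)
```

Under this formulation, a simulation equal to the reference scores 1.0, and a constant mean-flow prediction scores 1 − √2, which is where a threshold of about –0.41 comes from.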
The parameter sets that produced the highest median KGE value from the 39 gauges for each timeseries comparison are shown in Table 2. The KGE performance is most reliant on usoilD, and thus, intuitively, the highest median KGE values align with similar usoilD values for each comparison. Qg and Qs are both optimized with 1.4 m of upper soil layer depth, while Qgs is optimized when the depth is 1.2 m. Optimized values for Ds and Dsmax vary greatly across the timeseries. For reference, the average mean annual flow (MAF) measured at the gauge locations is 1.5 mm/d, which was found by dividing the MAF by the drainage area per site, and it is an order of magnitude smaller than Dsmax. Qg and Qs are optimized when bi is 0.16, but Qgs is optimized when it is 0.04. The hydrologic models run with these three parameter sets are referred to as models A, B, and C for the selected top iteration of each timeseries: Qg, Qgs, and Qs, respectively.
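The conversion of MAF to a depth in mm/d, so that it can be compared with Dsmax, can be sketched as follows. The input units (m³/s for flow, km² for drainage area) are an assumption, and the function name and the sample gauge values are hypothetical.

```python
def maf_to_mm_per_day(maf_m3_per_s, drainage_area_km2):
    """Convert a mean annual flow to an area-normalized depth in mm/d.

    Assumes flow in cubic meters per second and drainage area in square
    kilometers; both units are assumptions made for this sketch.
    """
    seconds_per_day = 86400.0
    area_m2 = drainage_area_km2 * 1.0e6
    depth_m_per_day = maf_m3_per_s * seconds_per_day / area_m2
    return depth_m_per_day * 1000.0  # meters to millimeters


# Hypothetical gauge: 200 m^3/s draining 12,000 km^2.
example_depth = maf_to_mm_per_day(200.0, 12000.0)
```

For this hypothetical gauge the result is 1.44 mm/d, the same order of magnitude as the 1.5 mm/d average quoted above.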
Example hydrographs for the gauge site White River at Newberry, IN (USGS 03360500) illustrate each timeseries and the optimized modeled discharge over a two-month period in 2011 (Figure 5a). This gauge will be observed twice in the 21-day orbit cycle, on days 5 and 17. This gauge performs well, giving KGE values greater than 0.71 over the 9-year period for all three timeseries comparisons. Figure 5b shows 2011 discharge ranked from largest to smallest for all three calibrated models. Since model B, the model calibrated using Qgs, uses an optimal depth of 1.2 m, as opposed to 1.4 m for the others, less infiltration into the soil can occur. When the soil is saturated, more runoff is generated, and the magnitude of the model B discharge is higher. During low-flow conditions, there is less separation between modeled discharge results because less runoff occurs in general. Since the optimized values of both bi and usoilD were the same for model A and model C, which were calibrated according to the top KGE iteration of Qg and Qs, respectively, the difference in modeled hydrographs is attributed to Dsmax and Ds, which control subsurface flow, or baseflow. The combination of Dsmax and Ds for model A results in higher baseflow values than for model C, which is more clearly visible in the hydrograph during peak flows.
To further determine whether results based on calibrating Qm with each timeseries are consistent, the top 100 iterations out of the 10,000 were also compared with each other (Figure 6). The iterations with the top 100 median KGE values are ranked according to the median KGE of Qm vs. each individual timeseries (Qg, Qgs, Qs), and thus, each iteration has three median KGE values. Figure 6 shows three different ranking orders of the top 100 iterations, although the iterations in the top 100 are not necessarily the same in all three plots. For each plot, the top 100 iterations are the 100 iterations with the greatest median KGE for that comparison, and the other KGE values for these iterations are also shown. For example, the first plot shows the top 100 iterations as defined by ranking the median KGE values of Qm vs. Qg. The same parameter sets that give the top iterations of Qm vs. Qg also yield KGE values for Qm vs. Qgs and Qm vs. Qs, which are displayed on the plots as well. When the top iterations are selected and shown in the ranked order of Qg and Qgs, the magnitudes of their highest median KGE values do not differ greatly, with their highest values between 0.50 and 0.52. The highest median KGE values for Qs remain close to 0.46 for all its top 100 iterations. It is interesting to note that many Qs KGE values close to 0.46 appear in the iterations selected when ranking according to Qg and Qgs, yet the top 20 median KGE values for Qgs only seem to appear on the middle plot, which is ranked according to Qgs. In other words, the top parameter sets selected for Qgs are not the same as the top iterations selected when ranking by Qg and Qs. The optimal parameters obtained from ranking according to Qg KGE generate the highest KGE values for all three timeseries collectively (the leftmost plot in Figure 6), suggesting that calibration using Qg gives more robust model parameters.
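The ranking procedure described above can be sketched on a toy example: rank iterations by median KGE against one timeseries, keep the top k, and read off the other two KGE values for those same iterations. The six iterations and their KGE values below are invented for illustration (k = 3 rather than 100), and the helper name `top_k` is hypothetical.

```python
def top_k(scores, k):
    """Indices of the k largest scores, best first (ties keep input order)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Invented median KGE values for six iterations; not data from the study.
median_kge = {
    "Qg":  [0.50, 0.48, 0.51, 0.30, 0.45, 0.10],
    "Qgs": [0.49, 0.52, 0.50, 0.31, 0.47, 0.12],
    "Qs":  [0.46, 0.40, 0.45, 0.25, 0.46, 0.05],
}

k = 3
top = {name: top_k(values, k) for name, values in median_kge.items()}

# Iterations that make the top k under every ranking.
shared = set(top["Qg"]) & set(top["Qgs"]) & set(top["Qs"])

# One panel of the comparison: iterations ranked by Qg, with the KGE of the
# same parameter set against the other two timeseries alongside.
qg_panel = [
    (i, median_kge["Qg"][i], median_kge["Qgs"][i], median_kge["Qs"][i])
    for i in top["Qg"]
]
```

In this toy case the Qg ranking keeps iterations 2, 0, and 1, while only iterations 0 and 2 are in the top three under all rankings, which mirrors the observation that the top sets differ between timeseries.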
Breaking down the top 100 iterations into histograms of the optimal parameter values selected (Figure 7) highlights patterns similar to those discussed above for the best iterations recorded in Table 2. Again, calibration using Qg, Qgs, and Qs leads in all cases to usoilD values between 1.2 and 1.6 m, although Qg tends toward 1.6 m in the top 100 iterations, while Qgs tends toward 1.2 m and Qs is split relatively evenly between 1.2 and 1.4 m as the optimum upper soil layer depth. For bi, the mode selected when calibrating with Qg is 0.2, while Qgs produces the most iterations in the top 100 with 0.12, and Qs is again split relatively evenly between 0.12 and 0.16. It is interesting to note that Qgs’s top run has a bi value of 0.04, although fewer than five iterations in the top 100 also have this value. bi values above 0.24 are not in the top 100 iterations for any of the timeseries. Dsmax and Ds have a wide range of values in the top 100 iterations for all three timeseries. The only value that is not represented for Dsmax is 4 mm/d. No single value is obviously better than the others, although 12 mm/d seems to be the mode for both Qg and Qgs, while 40 mm/d is the mode for Qs. Ds has modes at 0.3 (Qg), 0.5 (Qgs), and 0.6 (Qs), although again, the spread over the full range of possible values is large. The large spread of Ds and Dsmax in the top 100 supports the finding that usoilD and bi control the variability in KGE values much more than Ds and Dsmax do.
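As a minimal sketch of how such histograms and modes could be tabulated, the snippet below counts how often each parameter value occurs among a list of top parameter sets. The three parameter sets are invented placeholders, and the function names are hypothetical.

```python
from collections import Counter


def parameter_histograms(param_sets):
    """Count occurrences of each value, per parameter, across parameter sets."""
    names = param_sets[0].keys()
    return {name: Counter(p[name] for p in param_sets) for name in names}


def parameter_modes(param_sets):
    """Most frequent value of each parameter across the parameter sets."""
    histograms = parameter_histograms(param_sets)
    return {name: counts.most_common(1)[0][0] for name, counts in histograms.items()}


# Invented stand-in for a "top 100" list; not values from the study.
top_sets = [
    {"usoilD": 1.4, "bi": 0.16, "Dsmax": 12, "Ds": 0.3},
    {"usoilD": 1.4, "bi": 0.20, "Dsmax": 40, "Ds": 0.3},
    {"usoilD": 1.2, "bi": 0.16, "Dsmax": 12, "Ds": 0.6},
]
modes = parameter_modes(top_sets)
```

A parameter whose counts are concentrated on one or two values (as for usoilD here) is a controlling parameter in the sense used above, whereas a flat histogram indicates little influence on KGE.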
Since every parameter iteration gives three KGE values, one for each timeseries compared to a modeled timeseries result, we compute the percent difference between KGE values per iteration (Figure 8). We aim to determine the effects that SWOT temporal sampling, SWOT uncertainty, and both combined have on KGE values for all 10,000 modeled iterations. Do only the top iterations have comparable KGE values? How do all iterations compare to one another? Figure 8 shows a density curve for each of the three percent difference comparisons, and thus the area under each curve is equal to 1. Taking the area under the curve between ±30% difference indicates that 96% of iterations have less than 30% absolute difference when only temporal sampling differences (Qg vs. Qgs) are considered. Meanwhile, 62% of iterations have less than 30% absolute difference when just uncertainty is considered (Qgs vs. Qs), and 70% of iterations have less than 30% absolute difference when both sampling and uncertainty are considered (Qg vs. Qs). Subtracting KGE values from each other per iteration and taking the average of these differences gives –0.01, 0.10, and 0.09 for Qg – Qgs, Qgs – Qs, and Qg – Qs, respectively. The comparisons involving uncertainty both have density curves that are skewed to the right, with most percent differences greater than zero, indicating that comparing Qm with Qg or Qgs almost always gives a higher median KGE than comparing Qm with Qs, as expected. Moreover, 68% of iterations lie to the left of zero when comparing Qg vs. Qgs KGE values, indicating that Qgs often gives a higher KGE value than Qg. Since adding temporal sampling (Qg vs. Qgs) tends to lead to negative percent differences and adding uncertainty (Qgs vs. Qs) tends to lead to larger positive percent differences, adding the two together (Qg vs. Qs) logically leads to positive percent differences as well, though not as large in magnitude as those from uncertainty alone; therefore, more iterations have a larger percent difference for Qgs vs. Qs than for Qg vs. Qs.
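The summary statistics quoted above (share of iterations within ±30%, share below zero, mean difference) can be sketched as follows. The exact percent-difference definition is not given in this section; the sketch assumes the difference is taken relative to the magnitude of the first KGE value, which is an assumption. All function names and sample values are hypothetical.

```python
from statistics import mean


def percent_difference(kge_a, kge_b):
    """Signed percent difference of kge_b from kge_a.

    Assumed definition: 100 * (kge_a - kge_b) / |kge_a|, so a positive value
    means the first comparison scored higher. kge_a must be nonzero.
    """
    return 100.0 * (kge_a - kge_b) / abs(kge_a)


def fraction_within(diffs, limit):
    """Share of values whose absolute size is below the limit."""
    return sum(1 for d in diffs if abs(d) < limit) / len(diffs)


def fraction_below_zero(diffs):
    """Share of negative values."""
    return sum(1 for d in diffs if d < 0) / len(diffs)


# Invented per-iteration median KGE values; not data from the study.
kge_qg = [0.50, 0.40, 0.20, -0.10]
kge_qs = [0.45, 0.30, 0.10, -0.20]

diffs = [percent_difference(a, b) for a, b in zip(kge_qg, kge_qs)]
share_within_30 = fraction_within(diffs, 30.0)
share_negative = fraction_below_zero(diffs)
mean_difference = mean(a - b for a, b in zip(kge_qg, kge_qs))
```

With these invented values, two of the four iterations fall within ±30%, none is negative, and the mean KGE difference is 0.0875; with real data the same three summaries correspond to the area under the density curve between ±30%, the area left of zero, and the average of Qg – Qs.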
These results lead to the question: What are the implications of these differing KGE results for calibrated models in a practical sense? For the optimal parameter sets determined by KGE for models A, B, and C, the KGE values for each model timeseries compared to Qg, Qgs, or Qs are within ±0.03 of each other (Table S2). How do the models compare for metrics such as annual maximum and mean annual flows? We compare each modeled annual hydrograph for all 39 gauges with the truth timeseries, Qg, and compute the percent difference between them for all 9 years, giving 351 data points (39 gauges × 9 years) per statistic/model comparison, shown as histograms (Figure 9). When the percent difference between Qg and model A is calculated, 68% of mean annual flow statistics lie within ±30% of the truth. Comparing with models B and C gives similar results, with 69% and 71% of values within ±30%, respectively. For mean annual flow, the model discharge shows no obvious skew toward under- or overestimation relative to Qg discharge. Annual maximum values give percent errors that are skewed to the right; model discharges tend to underestimate peak flows, regardless of the parameter sets selected. Seventy-five percent, 68%, and 76% of annual maximums for models A, B, and C, respectively, give positive percent errors. It is interesting to note that since models A and C have the same optimal values for the controlling parameters, bi and usoilD, their values are more similar, as is evident in the shape of their histograms. When median values of the annual statistics for the 39 gauges are analyzed, on average, models A and C only differ by about 1% for each statistic (Table S2). Model B gives a slightly higher mean annual flow value averaged over the 9-year period than models A and C (1% and 2% difference, respectively) while differing from the two models by approximately 10% in annual peak flow. In general, model B indicates that calibrating a model based on SWOT temporal sampling alone (Qgs) could produce hydrograph values with a more accurate representation of peak flows.
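A minimal sketch of the annual-statistic comparison follows. It assumes the percent error is computed as truth minus model, relative to truth, which matches the statement that positive errors correspond to underestimated peaks; that sign convention, the function names, and the sample flows are assumptions for illustration.

```python
from statistics import mean


def annual_stats(daily_by_year):
    """Mean annual flow and annual maximum for each year of daily discharge."""
    return {year: (mean(q), max(q)) for year, q in daily_by_year.items()}


def annual_percent_errors(truth_by_year, model_by_year):
    """Percent error of the model per year, for mean and maximum flow.

    Assumed convention: 100 * (truth - model) / truth, so a positive value
    means the model underestimates the truth.
    """
    truth = annual_stats(truth_by_year)
    model = annual_stats(model_by_year)
    errors = []
    for year in sorted(truth):
        (t_mean, t_max), (m_mean, m_max) = truth[year], model[year]
        errors.append((year,
                       100.0 * (t_mean - m_mean) / t_mean,
                       100.0 * (t_max - m_max) / t_max))
    return errors


# Invented three-day "year" for one gauge; not data from the study.
truth_flows = {2010: [10.0, 20.0, 60.0]}
model_flows = {2010: [12.0, 18.0, 45.0]}
errors = annual_percent_errors(truth_flows, model_flows)

# One value per gauge and year gives the sample size quoted in the text.
n_points = 39 * 9
```

Here the model underestimates the mean by about 16.7% and the peak by 25%. Repeating the calculation for every gauge and year gives the 351 values per histogram.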
Except for bi, calibrated parameters are spatially uniform throughout the entire basin; thus, we also investigated KGE as a function of drainage area to determine whether each model varied in performance for upstream vs. downstream locations (Figure S1). Although no significant trendline was discernible for any of the models, upstream locations with smaller drainage areas have a wider spread of KGE values; however, gauges with large drainage areas are not as well represented among the selected gauges. The plot does reinforce the trends observed for the top three models: A, B, and C. Model C often gives the worst KGE value of the three. Model B tends to have the best KGE value for each location, although model A often has values comparable to those of model B as well.
3.3. Sensitivity and Validation Analysis
Since the SWOT mission is currently expected to last at least 3 years, we scale down the number of measurements for our calibration experiment from 9 years to 3 years to see whether and how results change. We test three 3-year periods (2010–2012, 2013–2015, 2016–2018) and find that the median and maximum KGE values do not change drastically for the selected best iterations (Table S3). The most influential parameter on model results, usoilD, continues to exhibit the same pattern for best iterations among timeseries with a slightly larger range, obtaining values between 1.0 and 1.4 m, depending on the timeseries used, as opposed to between 1.2 and 1.4 m as in the 9-year experiment. The range of bi increased as well, spanning 0.04 to 0.32 across the best selected iterations; 0.32 is selected once, for the Qs comparison for test years 2013–2015, giving the lowest KGE value out of all the optimal sets. Dsmax has the same range as the best iterations previously (12–40 mm/d), while Ds expands its range to span all tested values (0.1–1.0). If the median KGE value is averaged over the top 100 iterations for each ranking order (Qg, Qgs, Qs), median KGE values stay between 0.39 and 0.50 regardless of the time period tested (Table S4). Using 3 years instead of 9 years does decrease median KGE values slightly, but only by 0.04 on average. Changing the time period tested for calibration does not change the model results significantly.
For validation purposes, we compare median and maximum KGE values for the 9-year study period using top iteration parameters gleaned from each calibration time period per timeseries (Table S5). Although we acknowledge that the tested 3-year calibration periods are within the 9-year period, two-thirds of the 9 years are unused in calibration. Therefore, for consistency in comparing KGE values, we use the 9-year window as the study period regardless of calibration length. On average, 9-year median and maximum KGE values decrease by 9% and 5%, respectively, when results from the 9-year calibration top parameters are compared with those from the 3-year calibration top parameters for any given iteration or timeseries. For the study period, 9-year median KGE values are within 0.36–0.50 and 9-year maximum KGE values are within 0.62–0.79 when optimizing the model from a 3-year calibration period. The model performance decreases slightly when only 3 years are used for calibration compared to when all 9 years are used.